Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
Spark SQL: Another 16x faster after Tungsten
SPARC processor has dramatic advantages over x86 on Apache Spark
Brad Carlile, Sr. Director, Solution Architecture Engineering, SAE
Room 311, Wednesday, Feb 8, 2:40 pm – 2:55 pm
February 7, 2017
Additional info on proof points: http://blogs.oracle.com/bestperf

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
SPARC DAX in Apache Spark is proof-of-concept and not product

Spark is Exciting – Lots of software innovation
Spark's whole ecosystem totally appeals to the performance expert in me
• Great Ecosystem for Analytics
  – Spark forms a continuum with other technologies (Kafka, Solr, …)
  – Lots of Oracle's customers using Spark
• Spark is fast & scalable because of clean design
  – Spark's in-memory focus
• Spark speaks SQL & DataFrames
  – Perfect marriage for Data Scientists & Hardware Acceleration
  – …but what is the status of processor performance?

Modern Analytics is about Your Data & External Data
Better information when you can analyze all relevant data
• Analytics is now your data PLUS external data
  – Oracle Database In-Memory: fastest on SPARC
  – Continuously analyzing the same data in different ways: fastest if data is stored in-memory
• Big Data can be used to describe external data
  – Social networks, public data: sentiments, weather, traffic, events, ratings, trends, IoT sensors, behaviors, …
  – Lots of ETL/filtering needed to find useful data: Big Data SQL
[Figure: External Data (Weather, Tweets, Sensors, Social) combined with Your Data (Order, Customer)]

x86 Innovation in Per Core Performance has Stalled
x86 per core performance flat for 4 generations
IS THERE A COMPUTE BOTTLENECK? REALLY? Is it really latency?
MIT Tech Review: "Chipmaker Intel has signaled a slowing of Moore's Law, … which played a role in just about every major advance in engineering and technology for decades."
[Chart: core performance vs. x86 E5v2 (0.8x to 2.0x) by year, 2012 to 2016, for x86 E5, E5v2, E5v3, E5v4; annotation: "Why NOT up HERE?"]
"per core = (server performance) / (server core count)"

SPARC's Innovations are Continually Increasing Core Performance
Innovative HW Design gets around "x86 bottleneck"
• SPARC has faster cores for Database, Java, Apps, ...
• SPARC has more cores per chip which also drives efficiency
[Chart: core performance vs. x86 E5v2 (0.8x to 2.0x) by year, 2012 to 2016; x86: flat for four generations; SPARC (DB & Java/JVM): SPARC M7, SPARC S7]
"per core = (server performance) / (server core count)"

Innovative HW Design gets around "x86 bottleneck", x86 per core perf stalled
SPARC S7 is 1.6x to 2.1x faster than x86
It's more important how you use transistors than the number of transistors you make (Moore's Law)
[Chart: core performance vs. x86 E5v2 (0.8x to 2.0x) by year, 2012 to 2016; series: Java-JVM, OLTP DB, Memory Bandwidth GB/s; x86 in dashed lines (E5, E5v2, E5v3, E5v4); SPARC in solid lines (SPARC T5, SPARC M7, SPARC S7)]
"per core = (server performance) / (server core count)"

SPECjbb2015 MultiJVM SPARC S7
SPARC S7 core 1.5x to 1.9x faster than x86 E5v4 core on MaxjOps/core
• SPARC processor innovations improve Java and JVM performance
  – x86 performance is flat generation-to-generation
• Tradeoff between tuning for MaxjOps or CritjOps on x86
[Chart: CritjOps/core and MaxjOps/core (0 to 6,000) for S7-2, Huawei, HP, HP, Cisco (x86 E5v4) and Huawei, Lenovo (x86 E5v3)]
"per core = (server performance) / (server core count)" See disclosure slide

System             Processor           Chip, core  MaxjOPS  CritjOPS
SPARC S7-2         4.27GHz SPARC S7    2, 16        65,790    35,812
Huawei RH2288H v3  2.2GHz E5v4 22c     2, 44       121,381    38,595
Cisco C220         2.2GHz E5v4 22c     2, 44        94,667    71,951
Huawei RH2288H v3  2.3GHz E5v3 18c     2, 36        98,673    28,824
Lenovo x240 M5     2.3GHz E5v3 18c     2, 36        80,889    43,654
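
The per-core definition quoted on the slide can be checked against the table. The following is a minimal sketch, not the author's tooling: the class and method names are hypothetical, and the inputs are the max-jOPS and core counts listed above.

```java
// Sketch: per core = (server performance) / (server core count),
// applied to SPECjbb2015 MultiJVM max-jOPS values from the table above.
// Class and method names are illustrative only.
final class PerCoreMaxJops {
    static double perCore(double serverScore, int coreCount) {
        return serverScore / coreCount;
    }

    public static void main(String[] args) {
        double s7 = perCore(65_790, 16);        // SPARC S7-2, 2 chips, 16 cores
        double huaweiV4 = perCore(121_381, 44); // Huawei RH2288H v3, E5v4, 44 cores
        double ciscoV4 = perCore(94_667, 44);   // Cisco C220, E5v4, 44 cores

        System.out.printf("S7 max-jOPS/core: %.1f%n", s7);
        System.out.printf("S7 vs Huawei E5v4: %.2fx%n", s7 / huaweiV4);
        System.out.printf("S7 vs Cisco E5v4:  %.2fx%n", s7 / ciscoV4);
    }
}
```

With these inputs the two ratios land at roughly 1.5x and 1.9x, which is where the slide's "1.5x to 1.9x" range appears to come from.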

Java (JVM) is the Language of the Cloud & FOSS
Java & the underlying JVM is the design target for SPARC
JVM = Java Virtual Machine - Runtime

Big Data & DevOps       Description                 Written to
Oracle Fusion           Enterprise Apps             Java
Apache Spark            In-memory Analytics         JVM
Apache Cassandra        NoSQL database              Java
Apache Hadoop           Disk to disk MapReduce      Java
Apache HDFS             filesystem                  Java
Apache Lucene           Doc text Indexing           Java
Apache Solr             text/doc/web search         Java
Jersey/Grizzly (REST)   REST Interface              Java
Apache Hive             SQL on HDFS                 Java
Neo4j                   Graph                       Java
Scala                   Language                    JVM
Akka (REST)             REST for Big Data           JVM
Apache Accumulo         key/value store             Java
Apache Flink            Batch & Stream Processing   Java (JVM)
Apache Kafka            Streaming Message Broker    JVM
Apache log4j            Log event manager           Java
Apache Samza            Stream processing           JVM
Apache Storm            Stream processing           Java & Clojure
Apache Yarn             Cluster job manager         Java
Apache Zookeeper        Config & group services     Java
H2O.ai                  ML Apps libraries           Java, Python, R
Apache HBase            NoSQL database              Java

Cassandra NoSQL 2.2.6 (Yahoo Cloud Serving Benchmark)
SPARC core is 2.0x faster than E5v3 core
• SPARC S7 2.0x faster per core than E5v3 Haswell
• SPARC's Java/JVM advantages benefit Cassandra performance
  – YCSB – Yahoo Cloud Serving Benchmark – 300M
    • Mixed Load Workload A (50% Read / 50% Update)
• More info at: http://blogs.oracle.com/bestperf

Cassandra      Chip, core  Processor      Ave Ops/s  K Ops/s per core  SPARC per core Advantage
SPARC S7-2     2, 16       4.27 SPARC S7  69,858     4.4K              2.0x
E5v3 Haswell   2, 36       2.3 E5v3       90,184     2.5K              1.0x

Oracle Database also Optimized for Analytic Queries
Apache Spark needs to implement these optimizations!
SQL optimized for analytic queries
• Column Store is optimal for Analytics
  – Analytics looks across transactions at specific columns
  – Database columns are Machine Learning features
    • Columns far more efficient when analyzing features (efficient bandwidth & CPU needs)
    • Can exploit data characteristics: >20x benefit
• Storage optimizations – data repeatedly analyzed
  – Vector processing of dictionary-encoded compares needs to be added to Spark when persisting DataFrames in memory
  – Layered compression means 10's of TB of data fit into TB's of memory
• Processing optimizations
  – Bloom filter join processing, operator pushdown, metadata optimization
Big Feature Revolution: # features collected is exploding, 10's to 100's
[Figure: Order: F1 F2 … F150; Customer: F1 F2 … F78 … F100]

Oracle Database In-Memory At Scale & SPARC DAX
• OLTP uses proven row format
• Analytics use in-memory Column format
• 10x faster analytics due to software
• Columnar compression means huge databases can now fit in-memory
• Oracle database stores BOTH row and column formats
• Simultaneously active and transactionally consistent
• 10x to 20x faster: SPARC DAX (Data Accelerator) HW innovation
[Figure: Oracle DB holding a Row Format (cached OLTP transactions; whole DB in memory, compressed) and a Column Format (whole DB in memory, compressed); 10-100's TBytes row store]

SPARC Dramatically Faster In-Memory SQL Analytics in DB
SPARC is multi-generational lead in performance: 14 Billion-row/sec per core
• SPARC S7 9.4x faster per core than x86 E5v4
  – SPARC S7-2 422 queries/min vs 2-chip x86 56 queries/min
    • S7-2 (16-cores total), 2-chip x86 E5v4 (20-cores total)
  – x86 per core performance flat for 4 generations
• DAX offloads In-Memory Scans
  – DAX offload allows cores to increase throughput
• 160 GB in memory represents 1 TB DB on disk
• 6.2x compression with Oracle 12.2 In-memory
[Chart: In-Memory SQL Analytics: 9.4x faster SPARC core advantage vs. E5v4 x86; SPARC DAX offload frees cores for other processing; 6x compression, 6x IM capacity]

SPARC's Multi-generational Leapfrog in Performance
Radical Innovation: Integrated Offload offers 10x faster performance!
• Integrated Offload
  – Data Analytics Acceleration (DAX)
  – Encryption & Security
• It's more important how you use transistors than Moore's Law (the number of transistors you make)
Other vendors focused on Detached GPUs
• Detached GPUs are poorly designed for SQL
• Slow GPU interconnection robs performance
[Figure: memory bandwidth comparison. x86 and IBM Power: half bandwidth, cores 100% utilized, NO OFFLOAD, NO OPEN cores. SPARC S7: DAX offload.]

Graphic Processing Units (GPUs)
GPUs Only Help Very Compute Intensive Algorithms
Complex ML/Statistics often not fast on GPUs… "Can only compute as fast as move data"
• Detached GPUs have limited bandwidth
  – GPUs fit for lots of simple graphics
    • Graphics have tiny data movement: coords & textures in, video frame out
  – Big Data ML much more demanding: GPUs have limited applications
• ML Learning & Scoring
  – Some ML Learning can be compute intensive, but only done initially to create Model
• Nvidia Math Libraries only have BLAS3 routines
  – BLAS2 and BLAS1 do NOT have compute intensity therefore are not in library
  – Complicated ML has mix of BLAS1, BLAS2, BLAS3
GPU compute often stalled by lack of bandwidth
[Figure, bandwidth lines to scale: local memory-to-chip bandwidth of 169 GB/s (SPARC M7 32-core) and 57 GB/s (each E5v4 22-core); GPUs attached via current PCIe and future NVLink]

SQL & the Many Flavors of DSL – All potentials for DAX
SQL can be written in Domain Specific Languages (aka Language Integrated Queries)
• SQL:
  – SELECT count(*) FROM citizen WHERE citizen.age > 18
• Apache Spark SQL (DSL)
  – val voters: Long = citizen.filter($"age" > 18).count()
• Java Streams
  – long voters = arrayList.parallelStream().filter(citizen -> citizen.olderThan(18)).count();
• Eclipse Collections (formerly Goldman Sachs Collections)
  – int voters = fastList.count(citizen -> citizen.olderThan(18));
Apache Spark feeds both formats into the SQL optimizer
Joins can be written & optimized

SPARC DAX Integrated with Java JDK8 Streams
DAX accelerates Java Streams: 3.6x to 21.8x faster
• Java Streams provide SQL-style code in Java
  – Java Streams is a perfect fit for DAX acceleration
• DAX & Java Streams
  – Integer Stream filter, allMatch, anyMatch, noneMatch, map (ternary operator), toArray & count functions
  – Simple code changes to use DAX
    • import com.Oracle.Stream instead of import java.util.Stream
    • DaxIntStream instead of IntStream
• Public access: http://SWiSdev.oracle.com/DAX
[Chart: Speedup with DAX, 10 Million Rows: Outlier 3.6x, %tile 4.0x, Top-N 4.1x, Filter 10.7x, allMatch 21.8x]
Java Stream API: filter2_data.parallelStream().filter(w -> w.temp < 100).count()
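
For readers who want to see the filter/count and allMatch patterns named on the slide in runnable form, here is a minimal sketch using only the standard JDK `java.util.stream.IntStream`. It does not use the DAX stream classes, whose API is not shown in the slides; the class name, method names, and sample temperatures are made up for illustration.

```java
import java.util.stream.IntStream;

// Sketch of the SQL-style stream operations mentioned above, on plain JDK streams.
// This is NOT the DAX-accelerated API; it only shows the shape of the code
// that such an accelerator would target.
final class StreamFilterCount {
    // Equivalent in spirit to: SELECT count(*) ... WHERE temp < limit
    static long countBelow(int[] temps, int limit) {
        return IntStream.of(temps).parallel().filter(t -> t < limit).count();
    }

    // True only if every value is below the limit (the allMatch case).
    static boolean allBelow(int[] temps, int limit) {
        return IntStream.of(temps).allMatch(t -> t < limit);
    }

    public static void main(String[] args) {
        int[] temps = {72, 101, 99, 100, 35}; // hypothetical sample data
        System.out.println(countBelow(temps, 100));
        System.out.println(allBelow(temps, 100));
    }
}
```

Note that `count()` returns a `long`, so the result variable must be `long` rather than `int`.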

Apache Spark: SQL and DSL Both Optimized by Catalyst Optimizer
Ex: "Select all books by authors born after 1980 named 'Paulo' from books & authors"
SQL
  SELECT * FROM author a JOIN book b ON a.id = b.author_id WHERE a.year_of_birth > 1980 AND a.first_name = 'Paulo' ORDER BY b.title
Query DSL (Domain Specific Language)
  val joinDF = author.as("a").join(book.as("b"), $"a.id" === $"b.author_id").filter($"a.year_of_birth" > 1980).filter($"a.first_name" === "Paulo").orderBy($"b.title")
Both forms go to the Catalyst Optimizer, which produces an Optimized Plan (can I reorder various operations?), followed by Whole-Stage CodeGen (new in Spark 2.0, this is Project Tungsten)

Apache Spark 2.1 Whole-stage CodeGen Performance
• Let's evaluate performance on a Full Table Scan of 600M rows
  – Time to Scan = 0.16 sec. Impressive: 104.2 Million Rows/sec per core
2-chip x86 E5v3 (Haswell), total 36 cores, 72 threads/VCPU
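
The per-core scan rate follows directly from the three numbers on this slide (600M rows, 0.16 sec, 36 cores). A small sketch of that arithmetic, with a hypothetical class name:

```java
// Sketch: rows/sec per core = rows / seconds / cores,
// using the figures quoted on the slide.
final class ScanRate {
    static double rowsPerSecPerCore(long rows, double seconds, int cores) {
        return rows / seconds / cores;
    }

    public static void main(String[] args) {
        double rate = rowsPerSecPerCore(600_000_000L, 0.16, 36);
        System.out.printf("%.1f million rows/sec per core%n", rate / 1e6);
    }
}
```

The result is about 104.2 million rows/sec per core, matching the slide.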

By Evaluating Delivered Bandwidth, We can see we are still Not Yet Efficient
• Evaluating performance on Full Table Scan of 600M rows
  – Time to Scan = 0.16 sec. Impressive: 104.2 Million Rows/sec per core
• Going back to 1st principles: Scanning is about data movement
  – What is bandwidth of 2.1 in-memory Scan? 14 GB/s (Spark 2.0.1)
  – What is the system memory bandwidth? 114 GB/s (Stream Triad)
  – 7.6x more bandwidth available!
• Other queries deliver less: "SELECT count(*) FROM lineorder WHERE lo_quantity BETWEEN 10 and 20"
  – 19x more bandwidth available! -- BUT only delivering 6 GB/s on the system
2-chip x86 E5v3 (Haswell), total 36 cores, 72 threads/VCPU

After Spark 2.1: Another 10x in Performance Still Possible
Select count(*) from store_sales where ss_item_sk > 100 and ss_item_sk < 1000
1. Volcano Iterator model: plan interpretation
  – Open; Next data element; perform predicate; Close
2. "College Freshman" Java: Whole-Stage CodeGen pipelined code (Project Tungsten)
  – for (ss_item_sk in store_sales) { if (ss_item_sk > 100 and ss_item_sk < 1000) { count += 1 } }
3. Tuned true vectorization library: highly-tuned code operates on contiguous column
  – vectorRangeFilter(n, VECTOR_OP_GT, 100, VECTOR_OP_LT, 1000, store_sales, result, result_cnt)
4. Hardware Acceleration further accelerates scanning (SPARC DAX)
  – vectorRangeFilter(n, VECTOR_OP_GT, 100, VECTOR_OP_LT, 1000, store_sales, result, result_cnt)
    • x86 AVX2 (Graphics Instruction) – deliver 70 GB/s per chip
    • Oracle's SPARC M7 processor – deliver 140 GB/s per chip
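
Steps 2 and 3 above can be illustrated in plain Java. This is a hedged sketch only: `countInRange` mirrors the fused loop that whole-stage code generation aims for, and `rangeFilter` is a hypothetical stand-in for the slide's `vectorRangeFilter` call shape (it is ordinary scalar Java, not a SIMD or DAX implementation). Both assume the column is already a contiguous `int[]`.

```java
// Sketch of a two-predicate range scan over one contiguous column.
final class RangeFilterScan {
    // Step 2 style: one fused loop, no per-row iterator calls.
    static long countInRange(int[] column, int lowExclusive, int highExclusive) {
        long count = 0;
        for (int v : column) {
            if (v > lowExclusive && v < highExclusive) {
                count++;
            }
        }
        return count;
    }

    // Step 3 style call shape: scan the whole column at once, write the
    // positions of matching rows into result, and return how many matched.
    static int rangeFilter(int[] column, int lowExclusive, int highExclusive, int[] result) {
        int matches = 0;
        for (int i = 0; i < column.length; i++) {
            if (column[i] > lowExclusive && column[i] < highExclusive) {
                result[matches++] = i;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] ssItemSk = {50, 100, 101, 500, 999, 1000, 2000}; // hypothetical data
        int[] result = new int[ssItemSk.length];
        System.out.println(countInRange(ssItemSk, 100, 1000));
        System.out.println(rangeFilter(ssItemSk, 100, 1000, result));
    }
}
```

The column-at-a-time shape matters because a library or offload engine can then process the range test over the whole array without returning to the query engine per row.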

SPARC DAX Accelerating Apache Spark 2.1.0
Spark SQL on SPARC S7-2 DAX over 15.6x faster per core than x86
• Proof-of-Concept Prototype on SPARC S7: Apache Spark 2.1 POC: DAX & in-memory columnar
  – 2-chip SPARC S7 DAX: 9.8 Billion rows/sec on 2-predicate scan
    • SPARC S7-2 (16 total cores): 0.061 sec
    • 2-chip x86 (20 total cores): 0.760 sec, latest generation E5v4 Broadwell
  – These advantages are over and above Tungsten's improvements
• SPARC DAX offloads critical functions
  – SQL: filtering, dictionary encoding, 1 & 2 predicate, join processing, additional compression, etc.
• Libdax open API: http://SWiSdev.oracle.com/DAX
[Chart: Spark on SPARC S7: x86 vs. SPARC DAX, 15.6x faster per core than x86]
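
The 15.6x per-core figure can be reproduced from the elapsed times and core counts on this slide, assuming both systems scanned the same data. A minimal sketch (class and method names are hypothetical):

```java
// Sketch: per-core throughput is proportional to 1 / (seconds * cores),
// so the per-core speedup is the ratio of (seconds * cores) products.
final class DaxPerCoreSpeedup {
    static double perCoreSpeedup(double baseSeconds, int baseCores,
                                 double newSeconds, int newCores) {
        return (baseSeconds * baseCores) / (newSeconds * newCores);
    }

    public static void main(String[] args) {
        // x86 E5v4: 0.760 sec on 20 cores; SPARC S7-2 with DAX: 0.061 sec on 16 cores
        double speedup = perCoreSpeedup(0.760, 20, 0.061, 16);
        System.out.printf("%.1fx faster per core%n", speedup);
    }
}
```

The elapsed-time ratio alone is about 12.5x; normalizing by core count (20 vs. 16) gives roughly 15.6x.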

In-Memory Columnar Two-level Compression: Oracle DB
Same technique needs to be added to Apache Spark
• Efficient In-Memory Columnar Table
• Dictionary encoding gives huge compression
  – 50 US states only need 6 bits (<1 byte) vs. "South Dakota" needs 12 characters or 192 Unicode bits = 32x smaller
  – Ozip/RLE compression is then applied on top of dictionary encoding
• Innovations that Apache Spark still needs
  – Directly scan dictionary-encoded data!
  – Range scans (2-predicate) in one step
  – Ozip decompression without writing to memory
  – Save Min & Max for scan elimination
  – Can use dictionary for "featurization" of data for ML
[Figure: Column value list (South Dakota, Tennessee, Utah, Texas, South Dakota, Texas, Utah) is dictionary-encoded to 0 1 3 2 0 2 3; Column Min: South Dakota, Max: Utah; Dictionary VALUE/ID: South Dakota 0, Tennessee 1, Texas 2, Utah 3; OZip + RLE then applied (capacity: low, Ozip)]
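
The dictionary-encoding figure can be reproduced with a few lines of Java. This is an illustrative sketch under stated assumptions, not Oracle's in-memory format: it assumes the dictionary is the sorted list of distinct values with the position as the ID, and that each ID is stored in ceil(log2(cardinality)) bits. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Sketch of single-column dictionary encoding.
final class DictionaryEncoding {
    // Sorted distinct values; the array index is the dictionary ID.
    static String[] buildDictionary(String[] column) {
        return new TreeSet<>(Arrays.asList(column)).toArray(new String[0]);
    }

    // Replace each value with its dictionary ID.
    static int[] encode(String[] column, String[] dictionary) {
        int[] ids = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            ids[i] = Arrays.binarySearch(dictionary, column[i]);
        }
        return ids;
    }

    // Smallest bit width that can represent `cardinality` distinct IDs (at least 1).
    static int bitsNeeded(int cardinality) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(cardinality - 1));
    }

    public static void main(String[] args) {
        String[] column = {"South Dakota", "Tennessee", "Utah", "Texas",
                           "South Dakota", "Texas", "Utah"};
        String[] dict = buildDictionary(column);
        System.out.println(Arrays.toString(encode(column, dict)));

        int rawBits = "South Dakota".length() * 16;   // 12 characters at 16 bits each
        System.out.println(rawBits / bitsNeeded(50)); // 50 states need 6 bits
    }
}
```

Because the dictionary is sorted, ID order matches value order, which is what allows range predicates and min/max checks to run directly on the encoded integers.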
Verizon re-invented this as well (Spark Summit 2017 Boston): Real-time Analytical Query Processing & Predictive Model, Debasish Das (Verizon)

libdax Open API for Free & Open Source Software (FOSS)
Libdax designed for key scan & dictionary operations to accelerate a variety of software
• Designed to accelerate wide variety of Oracle & FOSS software
  – Examples:
    • Oracle DB In-memory 9.5x faster with DAX
    • Apache Spark SQL over 16x faster with DAX
    • ActiveViam 6x to 8x faster with DAX
    • Java Streams 3.6x to 21.8x faster with DAX
• Open API for libdax – sign up for free systems to actually develop/try code
  – Published sample codes
  – https://swisdev.oracle.com

SPARC Processor is fastest for Statistics & ML (Machine Learning)
SPARC is Fastest for Spark Applications
SPARC DAX can dramatically accelerate Spark SQL

The Basic Analytics Flow
A lot of time spent in Data Munging
• Data can come from many sources
  – Databases, NoSQL, csv, feeds…
• We need to prepare it
  – Data Munging of all sorts!
• Analyze the data
  – Find the "right way" to analyze it
    • ML, Graph, SQL…
[Figure: sources (Databases; NoSQL, Search, …; Streaming: Kafka, Storm, …) feed Data Munging, then Analytics, then RESULTS]

Continuous Analytics Cycle: Results Often Enrich Transactions
Analytics is more than one pipeline stream
• Continuous iterations around the data analytics wheel
  – Save, catalog, and re-use all things: data, SQL, code, and analytics
• In-memory advantages at each stage
  – SPARC's DAX & leading bandwidth is key
• Many sources of data
  – Internal proprietary, public data, external streaming, archives
[Figure labels, analytics wheel: Databases; NoSQL; Streaming (Kafka, Storm, …); ETL (SQL); Feature Extract, Generate & Transform (SQL); ML & Graph (FP); Results (SQL); Reports; Enhance Transactions; In-Memory: reuse data & analysis]

Computational Graph Algorithms: PageRank & Single-Source Shortest Path (SSSP)
Graph: SPARC M7 up to 1.5x faster per core than x86
• Graph computations accelerated by SPARC's memory bandwidth
  – Bellman-Ford/SSSP (single-source shortest path) – optimal route or connection
  – PageRank – measuring website importance

Graph Algorithm     Workload Size               4-chip X86 E5v3  4-chip SPARC T7-4  SPARC per chip Advantage  SPARC per core Advantage
SSSP Bellman-Ford   448M vertices, 17.2B edges  39.2s            14.7s              2.7x                      1.5x
SSSP Bellman-Ford   233M vertices, 8.6B edges   21.3s            8.5s               2.5x                      1.4x
PageRank            448M vertices, 17.2B edges  136.7s           62.6s              2.2x                      1.2x
PageRank            233M vertices, 8.6B edges   72.1s            27.6s              2.6x                      1.5x

"per core = (server performance) / (server core count)"

Oracle Database Advanced Analytics Option: Machine Learning on SPARC (Training)
ML training: SPARC M7 up to 2.0x faster per core than x86
• Oracle Advanced Analytics in Oracle Database 12.2
  – SPARC M7 faster per core on training, which is 64-bit floating point intensive
  – In-memory 640 million records, Airline On-time dataset
SGD (Stochastic Gradient Descent), IPM (Interior Point Method)

Training: Creating Model from data   Attributes  X5-4 4-chip  T7-4 4-chip  SPARC per chip Advantage  SPARC per core Advantage
Supervised
  SVM IPM Solver                     900         1442s        404s         3.6x                      2.0x
  GLM Classification                 900         331s         154s         2.1x                      1.2x
  SVM SGD Solver                     9000        157s         84s          1.9x                      1.1x
  GLM Regression                     900         78s          55s          1.4x                      0.8x
Cluster Model
  Expectation Maximization           9000        763s         455s         1.7x                      0.9x
  K-Means                            9000        232s         161s         1.4x                      0.8x

"per core = (server performance) / (server core count)"

Oracle Database Advanced Analytics Option: Machine Learning on SPARC (Prediction)
ML prediction: SPARC 3.0x-4.8x faster per core than x86
• Oracle Advanced Analytics in Oracle Database 12.2
  – SPARC M7 much faster per core on scoring, which is bandwidth-intensive
  – In-memory 1 Billion records, Airline On-time dataset
SGD (Stochastic Gradient Descent), IPM (Interior Point Method)

Prediction: Using model on 1B records  Attributes  X5-4 4-chip  T7-4 4-chip  SPARC per chip Advantage  SPARC per core Advantage
Supervised
  SVM IPM Solver                       900         206s         24s          8.6x                      4.8x
  GLM Regression                       900         166s         25s          6.6x                      3.7x
  GLM Classification                   900         156s         25s          6.2x                      3.5x
  SVM SGD Solver                       9000        132s         24s          5.5x                      3.1x
Cluster Model
  K-Means                              9000        222s         35s          6.3x                      3.6x
  Expectation Maximization             9000        243s         40s          6.1x                      3.4x

"per core = (server performance) / (server core count)"

Spark ML DataFrame Algorithms use BLAS1 & some BLAS2
Lower compute intensity of 0.66 to 2.0: BLAS1/BLAS2 limits ML performance, BLAS3 is faster
Sparse Linear Algebra has even lower compute intensity
• ALS (Alternating Least Squares matrix factorization)
  – BLAS2: _SPR, _TPSV
  – BLAS1: _AXPY, _DOT, _SCAL, _NRM2
• Logistic regression classification
  – BLAS2: _GEMV
  – BLAS1: _DOT, _SCAL
• Generalized linear regression
  – BLAS1: _DOT
• Gradient-boosted tree regression
  – BLAS1: _DOT
• GraphX SVD++
  – BLAS1: _AXPY, _DOT, _SCAL
• Neural Net Multi-layer Perceptron
  – BLAS3: _GEMM
  – BLAS2: _GEMV
"One can only compute as fast as one can move data"
  Max Flops = Compute-intensity * (Memory-BW / bytes-element)
  Compute-Intensity = Flops / element
  Single FP = 4 bytes/element
  Double FP = 8 bytes/element
HINT: Spark developers need to write sub-block matrix implementations to get more BLAS3 usage in Spark ML… Why? BLAS3 routines are more efficient, because they use less memory bandwidth and reduce the number of method calls. Fastest LAPACK solvers use BLAS3.

GPUs Only Accelerate Some Compute Kernels
Detached GPU memory bandwidth is bottleneck for most computations

Max possible Gflops is shown for block = 100, over a PCIe GPU link at 12 GB/s and a future NVLink GPU link at 80 GB/s.

Routine Level          Operation    Flops       Mem    Compute Intensity  Tesla K80 peak  PCIe @ 12 GB/s          % peak  NVLink @ 80 GB/s        % peak
BLAS1 (vector)         y = ax + y   2n          3n     2/3                935 Gflops      1 Gflops                0.1%    7 Gflops                0.7%
BLAS2 (matrix-vector)  y = Ax + y   2n^2 + 2n   n^2    2                  935 Gflops      3 Gflops                0.3%    20 Gflops               2.1%
BLAS3 (matrix-matrix)  C = AB + C   2n^3        4n^2   n/2                935 Gflops      75 Gflops               8.0%    500 Gflops              53.5%
FFT                    C = FFT(A)   n log(n)    2n     log(n)/2           935 Gflops      239 Gflops (1M 1D-FFT)  25%     935 Gflops (1M 1D-FFT)  Near peak

Only 6 BLAS3 routine types in Nvidia Library: gemm, syrk, syr2k, trsm, trmm, symm
Can write codes using BLAS2/1 in GPUs but resulting compute intensity must be large to accelerate
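
The BLAS rows of the table follow from the deck's formula, Max Flops = Compute-intensity * (Memory-BW / bytes-element). The sketch below is an illustration with hypothetical names; it assumes 8 bytes per element (double precision), which is an inference on my part because it reproduces the table's values, and it does not attempt the FFT row.

```java
// Sketch: bandwidth-bound ceiling on Gflops for a kernel with a given
// compute intensity (flops per element) over a link of a given bandwidth.
final class BandwidthBound {
    static double maxGflops(double flopsPerElement, double bandwidthGBs, int bytesPerElement) {
        return flopsPerElement * (bandwidthGBs / bytesPerElement);
    }

    static double percentOfPeak(double gflops, double peakGflops) {
        return 100.0 * gflops / peakGflops;
    }

    public static void main(String[] args) {
        final int doubleBytes = 8;   // assumption: double precision elements
        final double peak = 935.0;   // Tesla K80 peak Gflops, from the table
        double[] intensity = {2.0 / 3.0, 2.0, 100 / 2.0}; // BLAS1, BLAS2, BLAS3 with block = 100
        String[] name = {"BLAS1", "BLAS2", "BLAS3"};

        for (int i = 0; i < name.length; i++) {
            double pcie = maxGflops(intensity[i], 12.0, doubleBytes);
            double nvlink = maxGflops(intensity[i], 80.0, doubleBytes);
            System.out.printf("%s: PCIe %.1f Gflops (%.1f%%), NVLink %.1f Gflops (%.1f%%)%n",
                    name[i], pcie, percentOfPeak(pcie, peak),
                    nvlink, percentOfPeak(nvlink, peak));
        }
    }
}
```

Only the BLAS3 case, where intensity grows with block size, gets within sight of the device peak; the BLAS1 and BLAS2 ceilings stay at a few percent regardless of how fast the GPU itself is.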

SPARC Advantages on Analytics Across the Spectrum
Key underlying technologies provide dramatic performance advantages
• SQL: Oracle Database, Big Data with DAX
• NoSQL: Cassandra NoSQL, Oracle NoSQL
• Machine Learning: Spark MLlib, Advanced Analytics
• Graph & Spatial: Spark GraphX, Oracle Spatial & Graph
Underlying technologies: SPARC's Java & JVM advantages; SQL DAX Offload; SPARC DAX offload acceleration; Oracle Developer Perflib math libraries
Per-core advantages shown on the slide: up to 10x, up to 5x, up to 1.5x, and up to 1.9x faster per core

Required Benchmark Disclosure Statement
Must be in SPARC S7 & M7 Presentations with Benchmark Results
• Additional Info: http://blogs.oracle.com/bestperf
• Copyright 2016, Oracle &/or its affiliates. All rights reserved. Oracle & Java are registered trademarks of Oracle &/or its affiliates. Other names may be trademarks of their respective owners.
• SPEC and the benchmark name SPECjEnterprise are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 7/6/2016. SPARC S7-2, 14,400.78 SPECjEnterprise2010 EjOPS (unsecure); SPARC S7-2, 14,121.47 SPECjEnterprise2010 EjOPS (secure); Oracle Server X6-2, 27,803.39 SPECjEnterprise2010 EjOPS (unsecure); IBM Power S824, 22,543.34 SPECjEnterprise2010 EjOPS (unsecure); IBM x3650 M5, 19,282.14 SPECjEnterprise2010 EjOPS (unsecure).
• SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results from http://www.spec.org as of 6/29/2016. SPARC S7-2 (16-core) 65,790 SPECjbb2015-MultiJVM max-jOPS, 35,812 SPECjbb2015-MultiJVM critical-jOPS; IBM Power S812LC (10-core) 44,883 SPECjbb2015-MultiJVM max-jOPS, 13,032 SPECjbb2015-MultiJVM critical-jOPS; SPARC T7-1 (32-core) 120,603 SPECjbb2015-MultiJVM max-jOPS, 60,280 SPECjbb2015-MultiJVM critical-jOPS; Huawei RH2288H v3 (44-core) 121,381 SPECjbb2015-MultiJVM max-jOPS, 38,595 SPECjbb2015-MultiJVM critical-jOPS; HP ProLiant DL360 Gen9 (44-core) 120,674 SPECjbb2015-MultiJVM max-jOPS, 29,013 SPECjbb2015-MultiJVM critical-jOPS; HP ProLiant DL380 Gen9 (44-core) 105,690 SPECjbb2015-MultiJVM max-jOPS, 52,952 SPECjbb2015-MultiJVM critical-jOPS; Cisco UCS C220 M4 (44-core) 94,667 SPECjbb2015-MultiJVM max-jOPS, 71,951 SPECjbb2015-MultiJVM critical-jOPS; Huawei RH2288H V3 (36-core) 98,673 SPECjbb2015-MultiJVM max-jOPS, 28,824 SPECjbb2015-MultiJVM critical-jOPS; Lenovo x240 M5 (36-core) 80,889 SPECjbb2015-MultiJVM max-jOPS, 43,654 SPECjbb2015-MultiJVM critical-jOPS; SPARC T5-2 (32-core) 80,889 SPECjbb2015-MultiJVM max-jOPS, 37,422 SPECjbb2015-MultiJVM critical-jOPS; SPARC S7-2 (16-core) 66,612 SPECjbb2015-Distributed max-jOPS, 36,922 SPECjbb2015-Distributed critical-jOPS; HP ProLiant DL380 Gen9 (44-core) 120,674 SPECjbb2015-Distributed max-jOPS, 39,615 SPECjbb2015-Distributed critical-jOPS; HP ProLiant DL360 Gen9 (44-core) 106,337 SPECjbb2015-Distributed max-jOPS, 55,858 SPECjbb2015-Distributed critical-jOPS; HP ProLiant DL580 Gen9 (96-core) 219,406 SPECjbb2015-Distributed max-jOPS, 72,271 SPECjbb2015-Distributed critical-jOPS; Lenovo Flex System x3850 X6 (96-core) 194,068 SPECjbb2015-Distributed max-jOPS, 132,111 SPECjbb2015-Distributed critical-jOPS.
• SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of October 25, 2015 from www.spec.org and this report. 1 chip results SPARC T7-1: 1200 SPECint_rate2006, 1120 SPECint_rate_base2006, 832 SPECfp_rate2006, 801 SPECfp_rate_base2006; SPARC T5-1B: 489 SPECint_rate2006, 440 SPECint_rate_base2006, 369 SPECfp_rate2006, 350 SPECfp_rate_base2006; Fujitsu SPARC M10-4S: 546 SPECint_rate2006, 479 SPECint_rate_base2006, 462 SPECfp_rate2006, 418 SPECfp_rate_base2006. IBM Power 710 Express: 289 SPECint_rate2006, 255 SPECint_rate_base2006, 248 SPECfp_rate2006, 229 SPECfp_rate_base2006; Fujitsu CELSIUS C740: 715 SPECint_rate2006, 693 SPECint_rate_base2006; NEC Express5800/R120f-1M: 474 SPECfp_rate2006, 460 SPECfp_rate_base2006.
• Two-tier SAP Sales and Distribution (SD) standard application benchmarks, SAP Enhancement Package 5 for SAP ERP 6.0 as of 5/16/16: SPARC M7-8, 8 processors / 256 cores / 2048 threads, SPARC M7, 4.133 GHz, 130000 SD Users, 713480 SAPS, Solaris 11, Oracle 12c SAP, Certification Number: 2016020, SPARC T7-2 (2 processors, 64 cores, 512 threads) 30,800 SAP SD users, 2 x 4.13 GHz SPARC M7, 1 TB memory, Oracle Database 12c, Oracle Solaris 11, Cert# 2015050. HPE Integrity Superdome X (16 processors, 288 cores, 576 threads) 100,000 SAP SD users, 16 x 2.5 GHz Intel Xeon Processor E7-8890 v3 4096 GB memory, SQL Server 2014, Windows Server 2012 R2 Datacenter Edition, Cert# 2016002. IBM Power System S824 (4 processors, 24 cores, 192 threads) 21,212 SAP SD users, 4 x 3.52 GHz POWER8, 512 GB memory, DB2 10.5, AIX 7, Cert# 201401. Dell PowerEdge R730 (2 processors, 36 cores, 72 threads) 16,500 SAP SD users, 2 x 2.3 GHz Intel Xeon Processor E5-2699 v3 256 GB memory, SAP ASE 16, RHEL 7, Cert# 2014033. HP ProLiant DL380 Gen9 (2 processors, 36 cores, 72 threads) 16,101 SAP SD users, 2 x 2.3 GHz Intel Xeon Processor E5-2699 v3 256 GB memory, SAP ASE 16, RHEL 6.5, Cert# 2014032. SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmarks.

Why Apache Spark is best on SPARC
Design innovations at every level
• SPARC's leadership in fast JVM performance speeds Scala
  – SPARC design focus on Java & JVM benefits all Apps that use JVM
• SPARC's huge lead in delivered memory bandwidth
  – SPARC accelerates Spark's in-memory design: Big data apps need fast data movement
• SPARC DAX to provide huge offload acceleration in Apache Spark
  – Contributing code to Apache Spark SQL/DataFrames to use DAX
• SPARC's leading floating-point performance for Machine Learning
  – Designed for fast Floating Point (FP) on sparse & advanced algorithms

SPARC Advantages for Every Form of Analytics
SPARC is 1.3x to over 10x faster per core on Analytics than x86
Checkmark columns on the slide: (1) In-memory Columnar SW & Software in Silicon (SPARC DAX); (2) SPARC's efficient CPU & memory design
• Database Analytics: Analytic SQL Queries for database ✔ ✔
• Java Analytics: Java Code that probes data, SQL-like code ✔ ✔
• Machine Learning (ML): Automatically finding hidden patterns & correlations ✔ ✔
• Graph: Automatically find patterns in Social networks, etc. ✔
• NoSQL: Fast access to semi-structured & unstructured data ✔
• Spatial: Fast throughput on geometric data ✔
Feature generation & Selection for ML is often SQL

SPARC DAX Scan Performance versus x86 E5v3 (AVX2)
DAX offload scans: SPARC S7 only uses 25% cores and x86 uses 50% cores
[Chart: Range Scan (used in SQL "Between"): billions of rows/sec (0 to 200) vs. bits to encode data, log2(cardinality), from 2 to 16; series: S7-1 DAX 8-core, S7-2 DAX 16-core, E5v3 AVX2 18-core]
Oracle has open API for DAX: https://swisdev.oracle.com/

"Big" Features: Attributes/Qualities to Analyze
Big Data is also Big Features: 10's to 100's of features to choose, create, & manage
• Organizing features is critical to the analytics cycle
[Figure labels: RAW DATA (public & private); ETL SQL (extract, transform, & load); Joins SQL; Feature DB SQL; Feature Select & Featurization (behaviors) SQL; Statistics, ML (Machine Learning); Results Report]

Machine Learning (ML): Scoring/Prediction versus Training/Learning Characteristics
Prediction/Scoring operates on huge amounts of data with low compute intensity

                                                    ML Score/Prediction    ML Learn/Train
% of activity                                       Most Data              *Initial
Computation                                         O(n^2) Matrix-vector   O(n^3) Matrix-matrix
Data                                                O(n^2)                 O(n^2)
Compute Intensity (Compute/Data)                    Low constant           O(n)
SPARC Advantage (due to Memory Bandwidth & design)  3x to 6x per core      Up to 1.3x per core

* Initial and then occasionally update models data samples
Can train on one server then move model to prediction server (ex: StubHub). Models live 1 week!
[Figure: Training Set feeds Training/Learning, which produces the Model (every 4 weeks); Data to Evaluate feeds Scoring/Prediction, which produces Results]