Interesting Times Ahead
Bob Lucas
Operational Director, USC – Lockheed Martin Quantum Computing Center
University of Southern California
Institute for Defense Analyses/Center for Computing Sciences
Livermore Software Technology Corporation
ANSYS, Leidos, and Micron
Your name here :-)
Overview
• Déjà vu, all over again?
  – I’m reminded of my youth
• Software Perspective
  – It’ll be harder this time around
• Co-design
  – Let’s get the systems we need
Supercomputing in the late 1980s
• Cray-2 was my first supercomputer
  – Supercomputing Research Center (now CCS)
• Shared memory, vector mainframe
  – O($10M)
  – Four 250 MHz ECL CPUs
  – Three warm bodies
  – People were cheap
Image: NASA Cray-2, from Wikipedia
Disruptive Technology
• Field-Effect Transistor
  – Patent filed in 1925
• Metal-Oxide-Semiconductor
  – Invented in 1959
• Complementary MOSFETs
  – Latch-up solved in mid-1980s
  – Lots of DOD $$$ (e.g., SEMATECH)
• Personal Computers
  – High volume
  – Low cost, O($1K)
Image: IBM PC, from Wikipedia
Driving down the cost of HPC
Figure: log performance versus years, with curves for scalable COTS and mainframes. Image: Cosmic Cube.
Seymour’s perspective
Figure: log performance versus years, with curves labeled "bunch of chickens" and "sturdy ox".
Decade of Innovation
Images: Convex SPP, Intel Touchstone Delta, IBM SP1, Cray T3D, TMC CM-5, MasPar
Today’s Linux Server Clusters
• Cheap Hardware
  – Commodity volumes
• Free software
  – If you can install it yourself
• Often inefficient
  – Memory wall
  – Message passing
• Just Buy More…
  – Amazon buys it by the acre
  – Cloud service model
  – People are expensive
Images: Beowulf at Caltech; Northern VA data center
Another technology transition is happening
• Compiled source code performance has plateaued
  – Libraries still benefit from wider SIMD ALUs
Déjà vu?
• John Shalf saw Tsugio Makimoto’s talk at ISC ’06
  – PCs haven’t been the low-cost, high-growth market for a decade!
  – Are they today’s mainframes, ready to be undercut by cheaper H/W?
John’s Vision of SoC for HPC (circa 2008)
With Marty Deneroff
• Processor core (ARM, Tensilica, RISC-V), with extra “options” like DP FPU, ECC
  – IP license cost: $0-$500k
• NoC fabric (Arteris, Denali, other OMAP-4)
  – IP license cost: $200k-$350k
• DDR3 1600 memory controller (Denali/Cadence, SiCreations) + PHY and programmable PLL
  – IP license: $250-$350k
• PCIe Gen3 root complex
  – IP license: $250k
• Integrated FLASH controller
  – IP license: $150k
• 10 GigE or IB DDR 4x channel
  – IP license: $150k-$250k
Diagram: memory controllers with attached DRAM, PCIe, FLASH controller, IB or GigE, and an accelerator block labeled "Accelerator?"
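The per-block license figures on this slide can be added up to get a feel for the total IP bill. The sketch below is only back-of-the-envelope arithmetic: it assumes one license per block, reads "$250-$350k" as $250k-$350k, and ignores fabrication, integration, and verification costs, none of which the slide quantifies. The dictionary and function names are illustrative.

```python
# Per-block IP license ranges in thousands of US dollars, as listed on the slide.
# Assumption: "$250-$350k" for the memory controller means $250k-$350k.
IP_LICENSE_COST_K = {
    "processor core": (0, 500),
    "NoC fabric": (200, 350),
    "DDR3 1600 memory controller": (250, 350),
    "PCIe Gen3 root complex": (250, 250),
    "integrated FLASH controller": (150, 150),
    "10 GigE or IB DDR 4x channel": (150, 250),
}


def total_license_range(costs):
    """Sum the low and high ends of each block's license range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low_k, high_k = total_license_range(IP_LICENSE_COST_K)
print(f"Total IP licensing: ${low_k}k to ${high_k}k")
```

Under those assumptions the licenses come to between roughly $1M and $1.85M, which is small next to the O($10M) mainframe price quoted earlier in the talk.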
Innovative accelerators for AI
• Low-precision arithmetic for ML
  – Volta & TPU
• Neuromorphic for vision
  – TrueNorth
• Annealers for optimization
  – Fujitsu & D-Wave
• Startups
  – Cerebras, EMU & SambaNova
Overview
• Déjà vu, all over again?
  – I’m reminded of my youth
• Software Perspective
  – It’ll be harder this time around
• Co-design
  – Let’s get the systems we need
Value of software
• Hardware fabrication largely automated
  – O($1B) of NRE can be amortized over a big production run
• Software is more of an art form, or craft
  – Productivity is O(1) SLOC per hour
  – LS-DYNA represents over 1M hours of labor
• Software is often more valuable than hardware
  – GM used to have O(10K) IBM POWER processors
  – They spent more money on software licenses
• Trust and acceptance by users is priceless
It’s hard to displace existing software
• Math, science, and engineering predate computing
  – We’ve had seven decades to build codes for them
  – Modifying existing codes is often the easiest path forward
    • And it preserves existing investment
  – Initially successful ASCI burn codes predated ASCI
• Imagine displacing Windows, Google, or Facebook
  – Hundreds of millions of SLOCs
  – You’d need to replicate their data too
• Easier to compete in a new marketplace
  – Apple outflanked its competitors with iPod and iPhone
  – Today’s innovation in AI
Disruption in application software
• Nevertheless, sometimes new software is required
  – New research problems or markets (e.g., ML today)
  – New hardware or software technology to exploit
    • LLNL’s brand-new DYNA3D was rewritten for the Cray 1 in 1979
    • Makoto Asai created Geant4 to use new software technology, C++
• Successful software then evolves
  – LLNL’s DYNA3D was adopted by commercial companies
    • Like NASTRAN, SPICE, and many other public or academic codes
  – LS-DYNA is now O(10M) lines of source code
    • Primarily F77, but increasingly F95 and C
  – LSTC started working on message passing in 1993
    • Before MPI was released
Exponential growth of demand
“… the current code is limited to 4096 processes so I cannot run the job up to the 96k cores I wanted to.”
Scaling LS-DYNA
• Users’ requirements are unbounded
  – Rolls-Royce wants virtual certification
  – Models are O(100M) elements, and growing
• Trying to scale LS-DYNA accordingly
  – Working with Cray, NCSA, and Rolls-Royce
Image: Rolls-Royce dummy engine model
2017 LS-DYNA behavior vs cores
Figure: LS-DYNA sparse matrix reordering and factorization on Blue Waters (8 threads/MPI rank). Figures courtesy of Erman Guleryuz (NCSA).
Figure: Available memory on MPI rank 0 while running a 105M DOF engine model in LS-DYNA on Blue Waters, with panels for 2048, 4096, 8192, and 16384 cores. Figures courtesy of Erman Guleryuz (NCSA).
Figure: ops tree for etree-std-neg1, 188054 supernodes, depth 7
Bounds on change
mpirun -np 2048 mppdyna i=control.k ncpu=8
• Change needs to come from “below”
  – Libraries written in C can be linked with Fortran
    • E.g., Metis, MPI and MUMPS
  – OpenMP is a graceful extension of the language
    • Brings back happy memories of autotasking
    • UPC too
• Fix what’s broken
  – Multifrontal elimination tree is a DAG
  – Traverse branches with ICL’s PaRSEC?
    • #3 on my LSTC to-do list
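The point that the multifrontal elimination tree is a DAG can be made concrete with a small dependency-counting sketch. It is not PaRSEC and does not use any PaRSEC API; it only shows, for a hypothetical seven-supernode tree, which supernodes become ready once their children are factored. The function name and the `parent` array encoding are assumptions made for illustration.

```python
def schedule_elimination_tree(parent):
    """Group supernodes into waves that could be factored concurrently.

    parent[i] is the parent of supernode i, or -1 for a root.
    A supernode becomes ready once all of its children are done.
    """
    pending_children = [0] * len(parent)
    for p in parent:
        if p != -1:
            pending_children[p] += 1

    # Leaves have no children, so they are ready immediately.
    ready = [i for i, count in enumerate(pending_children) if count == 0]
    waves = []
    while ready:
        wave = sorted(ready)
        waves.append(wave)
        ready = []
        for node in wave:
            p = parent[node]
            if p != -1:
                pending_children[p] -= 1
                if pending_children[p] == 0:
                    ready.append(p)
    return waves


# Hypothetical tree: 0 and 1 feed 2, 3 and 4 feed 5, 2 and 5 feed the root 6.
parent = [2, 2, 6, 5, 5, 6, -1]
waves = schedule_elimination_tree(parent)
print(waves)
```

A task-based runtime would not wait for a whole wave to finish; it would start each supernode as soon as its own children complete. The waves here just make the available branch-level parallelism visible, and show how it shrinks toward the root.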
Heterogeneous nodes
• Floating point accelerators (APU, GPU)
• I’m 0 for 1 at integrating NVIDIA GPUs into LS-DYNA
  – There are lots of DGEMM calls in a multifrontal code
  – First to receive the CUBLAS. Perhaps too early?
• ANSYS succeeded
  – Reverse engineered our experiment (twice!)
  – Their users’ models are different from LSTC’s (solids)
  – They’ve gone beyond me, exploiting GPU memory B/W too
• Time for another look
  – CUDA Fortran this time around
  – NVIDIA’s helping
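To show where the DGEMM calls in a multifrontal code come from, here is a minimal pure-Python sketch of the Schur complement update on one frontal matrix. It is not LS-DYNA's code and calls no BLAS; `gemm_update` is a hypothetical helper that mimics the DGEMM operation C = alpha*A*B + beta*C on nested lists, and the tiny front is invented for illustration.

```python
def gemm_update(c, a, b, alpha=-1.0, beta=1.0):
    """Return alpha * (a @ b) + beta * c for nested-list matrices (the DGEMM operation)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [
            beta * c[i][j] + alpha * sum(a[i][k] * b[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Hypothetical front with a 1x1 pivot block F11 and a 2x2 trailing block F22.
f11 = 4.0
f12 = [[2.0, 0.0]]             # 1x2 block to the right of the pivot
f21 = [[2.0], [1.0]]           # 2x1 block below the pivot
f22 = [[5.0, 1.0], [1.0, 3.0]]

# Scale the column below the pivot: L21 = F21 / F11.
l21 = [[row[0] / f11] for row in f21]

# Schur complement S = F22 - L21 * F12, which is passed up to the parent front.
schur = gemm_update(f22, l21, f12)
print(schur)
```

In a real front the pivot block holds many columns, so this update is a large dense matrix-matrix product. That is why handing it to a GPU BLAS looks attractive.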
Overview
• Déjà vu, all over again?
  – I’m reminded of my youth
• Software Perspective
  – It’ll be harder this time around
• Co-design
  – Let’s get the systems we need
Managing complexity
• The ground is shifting underneath us
• Software approaches to shield developers
  – Domain Specific Languages
  – Kokkos
  – LANL’s Ristra
• Why not an equivalent hardware effort?
  – Other half of co-design
  – Lower the bar for the software people
Unique role for HPC community
• Peter Kogge’s EXECUBE processor-in-memory
  – IBM had 80386 IP
  – IBM had DRAM fabs
    • Several of them
• Today, only Samsung makes both logic and memory
• Specialization constrains imagination
• Logic already exists in the memory
  – What should Micron add?
  – How would it impact the host?
Image: Hybrid Memory Cube (self test, self repair, scrubbing, refresh, autonomous functions)
Steer the integration of new technology
• Intel’s Apache Pass
  – 3D XPoint in a DDR4 form factor
• Multiple configurations offered
  – High bandwidth file system
  – High bandwidth swap space
  – Direct user access
• I want “door #3”
  – Keep sparse matrix factors in 3D XPoint
    • Eliminate file I/O abstraction (and overhead) for out-of-core
  – The rest of LS-DYNA can live in DDR4
    • Or better, HBM
Image: 3D XPoint illustration
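The "door #3" idea, addressing out-of-core factors directly instead of going through read and write calls, can be loosely illustrated with an ordinary memory-mapped file. This is only an analogy: a mapped file on a conventional file system stands in for byte-addressable persistent memory, and nothing here is Intel's interface or LS-DYNA's out-of-core layer. The file name and helper functions are hypothetical.

```python
import mmap
import os
import struct
import tempfile


def store_factor_block(path, values):
    """Write a block of doubles once, standing in for factor entries kept in persistent memory."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}d", *values))


def read_factor_entry(mapped, index):
    """Load one double by byte offset, with no explicit read() call or staging buffer."""
    return struct.unpack_from("d", mapped, index * 8)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "factor.bin")  # hypothetical file name
    store_factor_block(path, [1.0, 2.5, -3.0, 4.25])
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapped:
            entry = read_factor_entry(mapped, 2)
            # Update an entry in place through the mapping.
            struct.pack_into("d", mapped, 3 * 8, 8.5)
            updated = read_factor_entry(mapped, 3)

print(entry, updated)
```

The solver side then sees the factor as an array it can index, rather than as a file it has to seek and read in blocks.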
SoC ecosystem provides a big knob
• Anton
  – Two orders-of-magnitude more capability
• Green Flash
  – Two orders-of-magnitude less energy
Bespoke systems
• We have the talent to exploit SoC technology
  – Spread thinly over the HPC user community
  – LBL’s David Donofrio designed chips for Intel and Apple
  – CCS’s Bill Carlson hacked GNU C to create the initial UPC
  – USC’s Jeff Draper got a sole source award from DARPA
• System vendors certainly do
  – Still HPE’s business model?
    • It was before Greg Astfalk retired
  – Sunway can too
    • TaihuLight
  – Low-volume business model
    • Sell engineering, not chips
B2 NRE ~2X production cost
Address things that limit performance
• Replacing Pentiums with “free” RISC-Vs won’t be enough
  – Only free until you fab them
• Distributed address space
  – I want my E-registers back
    • UPC-like global memory abstraction
    • Surely the patents have expired by now
• Virtual memory hierarchy
  – Non-unit stride, gather/scatter, indirect addressing, etc.
    • Utah’s Impulse project
  – Si Hammond’s store with floating point accumulate
    • Perform “one-touch” functions in the memory
• Synchronization
  – I have lots of 8-byte MPI_ALLREDUCEs for error status
  – Blue Gene/L had a combining network
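The complaint about 8-byte allreduces is about latency rather than bandwidth. The sketch below simulates recursive doubling, one common software algorithm for small allreduces, to show that every rank pays about log2(P) message rounds to agree on a single error code. It is a plain-Python simulation, not MPI code and not a description of any particular MPI library; the function name and the power-of-two restriction are simplifying assumptions.

```python
def allreduce_max(values):
    """Simulate a recursive-doubling allreduce (max) across len(values) ranks.

    Assumes the rank count is a power of two. Returns the per-rank results
    and the number of exchange rounds needed.
    """
    ranks = len(values)
    if ranks == 0 or ranks & (ranks - 1):
        raise ValueError("rank count must be a power of two")
    current = list(values)
    rounds = 0
    distance = 1
    while distance < ranks:
        # Each rank swaps its partial result with the partner whose rank differs in one bit.
        current = [max(current[r], current[r ^ distance]) for r in range(ranks)]
        distance *= 2
        rounds += 1
    return current, rounds


# Hypothetical error flags: rank 5 reports error code 3, every other rank reports 0.
status = [0, 0, 0, 0, 0, 3, 0, 0]
result, rounds = allreduce_max(status)
print(result, rounds)
```

With the payload this small, each round costs essentially one network latency. Doing the combining in the network hardware, as the slide notes Blue Gene/L did, takes those rounds out of software.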
Make SoC ecosystem HPC friendly
• Create IP specifically for HPC
  – Our needs will differ from commodity markets
• Fill software gaps too
  – LANL reportedly finds ARM’s Fortran environment lacking
• Invest in ECAD R&D
  – Reduce the cost of engineering SoC systems
  – Andreas Olofsson is working this
• Enduring advantage
  – GOTS IP, not available to others
Summary
• There’s still “gas in the tank” for CMOS
  – Rich Linderman’s phrase
  – Specialization offers a path forward, beyond Moore’s Law
• Evolution
  – HPC software is often more valuable than the hardware
  – The pace of change must allow for adaptation
• Collaborate
  – Science and engineering have much in common
  – One destructive engine test is O($100M)
    • Roughly the same cost as Anton