NERSC-9: Nicholas J. Wright, NERSC-9 Chief Architect. NUG meeting, March 24, LBNL. 3/24/16

Page 2

NERSC Timeline

[Figure: timeline with year markers 2015, 2016, 2016-18, 2020, 2021, 2024, 2028. Milestones shown:]

•  NRP complete, 12.5 MW
•  NERSC-8 Cori Phase II
•  NERSC-8 Cori Phase I
•  CRT 25 MW upgrade
•  CRT 35+ MW upgrade
•  NERSC-10: Capable Exascale for broad Science
•  Staff move in
•  NERSC-9: 150-300 Petaflops
•  NERSC-11: 5-10 Exaflops
•  Edison Move Complete

Page 3

APEX 2020 Current Status

•  3rd joint SC/NNSA procurement
•  After Trinity/NERSC-8 (2016) & CORAL (2018)
•  RFP draft technical specs released Nov. 10, 2015
•  2nd Draft released March 11th

http://www.lanl.gov/projects/apex/_assets/docs/APEX2020_draft_tech_specs_v2.0.pdf

[Figure: schedule for 2015 through 2020 showing RFP, Contract, Delivery, and Non-Recurring Engineering]

Page 4

NERSC needs to transition to energy efficient architectures

Manycore or Hybrid is the only approach that crosses the exascale finish line

[Figure: Energy per Flop (pJ), log scale from 1.E+01 to 1.E+07, versus date from 1/1/1992 to 1/1/2024. Series: Heavyweight, Heavyweight Scaled, Heavyweight Constant, Lightweight, Lightweight Scaled, Lightweight Constant, Heterogeneous, Heterogeneous Scaled, Historical, CMOS Projection Hi Perf, CMOS Projection Low Power, UHPC Goal]
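The link between energy per flop and facility power can be made concrete with a unit conversion. The sketch below is illustrative only: the function names are hypothetical, and the sample inputs (an exaflop machine at 20 pJ/flop, or under a 20 MW budget) are assumptions chosen for round numbers, not values read from the plot. It counts only the energy attributed to floating-point throughput.

```python
def power_megawatts(petaflops: float, picojoules_per_flop: float) -> float:
    """Sustained power in MW for a given flop rate and energy per flop.

    (flop/s) * (J/flop) = W, with 1 PF = 1e15 flop/s and 1 pJ = 1e-12 J.
    """
    watts = (petaflops * 1e15) * (picojoules_per_flop * 1e-12)
    return watts / 1e6


def energy_budget_pj_per_flop(petaflops: float, megawatts: float) -> float:
    """Largest energy per flop (pJ) that fits a power budget at a flop rate."""
    joules_per_flop = (megawatts * 1e6) / (petaflops * 1e15)
    return joules_per_flop * 1e12


if __name__ == "__main__":
    # Assumed example: 1000 PF (one exaflop) at 20 pJ/flop.
    print(f"{power_megawatts(1000, 20):.1f} MW")
    # Assumed example: the same flop rate held to a 20 MW budget.
    print(f"{energy_budget_pj_per_flop(1000, 20):.1f} pJ/flop")
```

Because the relationship is linear, every factor of ten removed from the energy per flop removes a factor of ten from the power needed at a fixed flop rate.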

Page 5

Throughput vs Single Thread: Perf Trade-off

Intel Federal LLC Proprietary. The information on this page is subject to the use and disclosure restrictions provided on the second page to this document.

[Figure (Belli Kuttanna): relative IPC (0.00 to 1.40) versus Normalized Power at 22nm (0.00 to 1.80). Points labeled SKL, BDW, HSW, IVB, SNB, NHM, PNR, MRM, YNH, BNS, SLM, STW, GLM, PLM, TMT. Annotation: 2.5-3.5x power, <2x freq]

Haswell:Silvermont IPC: ~3x, Power: ~5x
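The quoted ratios imply a per-core efficiency trade-off that a few lines of arithmetic make explicit. This is a sketch under stated assumptions: it takes the approximate slide figures (Haswell at about 3x the IPC and 5x the power of Silvermont) at face value, and the fixed-power throughput estimate further assumes a perfectly parallel workload. The function names are hypothetical.

```python
def ipc_per_watt_ratio(ipc_ratio: float, power_ratio: float) -> float:
    """IPC per unit power of the big core relative to the small core."""
    return ipc_ratio / power_ratio


def small_core_throughput_gain(ipc_ratio: float, power_ratio: float) -> float:
    """Aggregate IPC of small cores filling one big core's power budget,
    relative to that big core (assumes perfectly parallel work)."""
    small_cores_in_budget = power_ratio  # each small core draws 1 power unit
    return small_cores_in_budget * (1.0 / ipc_ratio)


if __name__ == "__main__":
    IPC_RATIO = 3.0    # "~3x" on the slide, Haswell relative to Silvermont
    POWER_RATIO = 5.0  # "~5x" on the slide, Haswell relative to Silvermont
    print(f"big-core IPC per watt: {ipc_per_watt_ratio(IPC_RATIO, POWER_RATIO):.2f}x")
    print(f"small-core throughput at equal power: "
          f"{small_core_throughput_gain(IPC_RATIO, POWER_RATIO):.2f}x")
```

Under these assumptions the large core delivers about 0.6x the IPC per watt, which is the throughput argument for manycore designs; the single-thread advantage of the large core is what is given up.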

Page 6

Abstract Machine Model for Exascale

[Figure: abstract machine model diagram. Components: 3D Stacked Memory (Low Capacity, High Bandwidth); Fat Cores; Thin Cores / Accelerators; DRAM / NVRAM (High Capacity, Low Bandwidth); Coherence Domain; Core; Integrated NIC for Off-Chip Communication]

Page 7

Edison - 2012

[Figure: abstract machine model diagram. Components: 3D Stacked Memory (Low Capacity, High Bandwidth); Fat Cores; Thin Cores / Accelerators; DRAM / NVRAM (High Capacity, Low Bandwidth); Coherence Domain; Core; Integrated NIC for Off-Chip Communication]

Page 8

Cori (NERSC-8) - 2016

[Figure: abstract machine model diagram. Components: 3D Stacked Memory (Low Capacity, High Bandwidth); Fat Cores; Thin Cores / Accelerators; DRAM / NVRAM (High Capacity, Low Bandwidth); Coherence Domain; Core; Integrated NIC for Off-Chip Communication]

Page 9

NERSC-9 (2020)? – An exascale-era architecture

[Figure: abstract machine model diagram. Components: 3D Stacked Memory (Low Capacity, High Bandwidth); Fat Cores; Thin Cores / Accelerators; DRAM / NVRAM (High Capacity, Low Bandwidth); Coherence Domain; Core; Integrated NIC for Off-Chip Communication]

Page 10

Layer | NERSC-7 (Edison) 2013 | NERSC-8 (Cori) 2016 | NERSC-9 2020
High Bandwidth Memory per node | None | 16 GB, >400 GB/sec | More!
DRAM per node | 64 GB, ~100 GB/sec | 96 GB, 90-100 GB/sec | Some
NV-DIMM (byte addressable) | None | None | Maybe
Non-Volatile (Page addressable) | None | 1.5 PB, 1.5 TB/sec | 10s PBs, 10s TB/sec
Spinning Disk – /scratch | 8 PB, 130 GB/sec | 28 PB, 700 GB/sec | Collapsed layer: >50 PBs, ~1 TB/sec
Spinning Disk – longer term (/project) | ~30 PB, ~70 GB/sec | ~50 PB, ~100 GB/sec | (part of the collapsed layer)
Tape | ~40 PB, ~10 GB/sec | ~100 PB, ~20 GB/sec | ~100s PB, ~10s GB/sec
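One way to compare the layers in this table is the time needed to read or write a layer's full capacity at its full bandwidth. The sketch below is an illustration, assuming decimal units (1 PB = 1e15 bytes) and taking the quoted peak figures at face value; the helper name `sweep_seconds` is hypothetical.

```python
PB, TB, GB = 1e15, 1e12, 1e9  # decimal units, an assumption


def sweep_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Seconds to stream a layer's whole capacity at its peak bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


# (capacity, bandwidth) pairs copied from the table.
LAYERS = {
    "Cori non-volatile (page addressable)": (1.5 * PB, 1.5 * TB),
    "Cori /scratch": (28 * PB, 700 * GB),
    "Edison /scratch": (8 * PB, 130 * GB),
}

if __name__ == "__main__":
    for name, (capacity, bandwidth) in LAYERS.items():
        hours = sweep_seconds(capacity, bandwidth) / 3600
        print(f"{name}: {hours:.1f} h")
```

Under these assumptions the Cori non-volatile layer can be swept in about 1000 seconds, while Cori /scratch needs roughly 11 hours, which shows how much more bandwidth per byte the non-volatile layer provides.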

Page 11

Market Survey: Storage Technologies are Changing

•  NVRAM technologies are cost effective for bandwidth today
   –  Burst Buffers in Trinity/Cori (2016) & CORAL (2018)
•  In 2020
   –  Will any spinning disk be needed for capacity? Cost is the limiting factor
   –  NVRAM: How much? What kind(s)? Where to put it in the machine? What software (runtime/scheduler/OS) enhancements will be needed?
•  Workflows!
   –  Fusion, Climate, QCD, ALS, JGI, Materials, Sky Survey
   –  https://www.nersc.gov/assets/apex-workflows-v2.pdf

Page 12

APEX will Define Workflows to Optimize Platform Storage

•  A workflow is a description of the steps needed to obtain results in a scientific investigation
•  The workflow lifecycle typically consists of many computational and data transformation steps
   –  Running simulations and/or experiments
   –  Analyzing output data
   –  Managing data to aid the scientific investigation, including collecting information to benefit future studies and help future validation of results
•  Whitepaper released which describes other storage use cases present in APEX workflows
   –  Based upon extensive requirements gathering exercise
   –  Includes estimates of data volumes and lifetimes for multiple NERSC, LANL, LLNL and SNL workflows
•  Overall goal is to provide a framework to reason about platform storage design decisions
   –  Allows vendor to innovate and be flexible
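The workflow definition on this slide maps naturally onto a small data structure. The sketch below is one hypothetical way to record steps and their data products so that storage needs can be grouped by retention. All class, field, and step names and all sizes are illustrative assumptions; the retention labels (temporary, campaign, forever) are borrowed from the pipeline diagrams on the following pages.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Retention(Enum):
    TEMPORARY = "temporary"
    CAMPAIGN = "campaign"
    FOREVER = "forever"


@dataclass
class DataProduct:
    name: str
    size_tb: float  # hypothetical size estimate
    retention: Retention


@dataclass
class Step:
    name: str
    outputs: list[DataProduct] = field(default_factory=list)


def volume_by_retention(steps: list[Step]) -> dict[Retention, float]:
    """Total estimated TB per retention class across all steps."""
    totals: dict[Retention, float] = defaultdict(float)
    for step in steps:
        for product in step.outputs:
            totals[product.retention] += product.size_tb
    return dict(totals)


if __name__ == "__main__":
    # Made-up workflow with made-up sizes, for illustration only.
    workflow = [
        Step("simulate", [DataProduct("checkpoint", 100.0, Retention.TEMPORARY),
                          DataProduct("timesteps", 40.0, Retention.CAMPAIGN)]),
        Step("analyze", [DataProduct("analysis", 2.0, Retention.FOREVER)]),
    ]
    for retention, tb in volume_by_retention(workflow).items():
        print(retention.value, tb)
```

Grouping volume by retention class is the kind of summary that lets a storage tier be sized for its lifetime requirement rather than for total data produced.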

Page 13

Simulation Science Pipeline

[Figure: simulation science pipeline, placing data products by Data Retention Time (Temporary, Campaign, Forever) across Phases S1 to S5, between Job Begin and Job End.
Processing stages: Setup/Parameterize/Create Geometry; Simulate Physics; Down-Sample; Post-Process; Viz.
Data products: Initial Input Deck; Sim Input Deck; Initial State; Checkpoint Dump; Timestep Data Set; Sampled Data Set; Analysis Data Set.
Annotations: Γ*JMTTI; Checkpoint Dump, 4–8x per week; 5-15x per pipeline; Timestep Data Set, 5–10x per week]

Page 14

HTC Science Pipeline or UQ Science Pipeline

[Figure: HTC and UQ science pipelines, placing data products by Data Retention Time (Temporary, Campaign, Forever) across Phases H1/U1, H2/U2, H3/U3.
Processing stages: Generate and/or Gather Input Data; HTC Analysis or UQ Simulation; Analysis.
Data products: Shared Input; Private Input; Checkpoint Dump; File-based Comm.; Analysis Data Sets.
Annotations: Checkpoint Dump, 4–8x per week; 5-15x per pipeline]

Page 15

Target System Configuration

 | NERSC-8 | NERSC-9 Target
SSP | >5x Edison | >20x Edison
Baseline Memory Capacity | 1.1 PB | >3 PiB
Burst Buffer | 1.5 PB, 1.5 TB/s | >90 PB, >5 TB/s
Disk | 22 PB, 744 GB/s |

Page 16

Market Surveys have Formed the Basis of our Requirements Development

•  The Crossroads/NERSC-9 (CN9) teams had many formal (Face-to-Face) and informal (telecon) interactions with vendors over the last 15 months
   –  Interactions continue leading up to the RFP release
•  Market Surveys and interactions focused on major prime and technology provider candidates:

Page 17

Technical Specifications Include Findings From Workload Analysis and Requirements Workshops

•  NERSC workload analysis performed as part of the procurement activities
   –  http://portal.nersc.gov/project/mpccc/baustin/NERSC_2014_Workload_Analysis_30Oct2015.pdf
•  NERSC has held one requirements workshop per office looking at 2017 requirements
   –  http://www.nersc.gov/science/hpc-requirements-reviews

Page 18

Over 650 applications run on NERSC resources

[Figure: pie chart of top application codes on Hopper and Edison by hours used, Jan–Dec 2014. Labels: VASP, MILC, Espresso, CESM, GYRO, LAMMPS, NAMD, chroma, xgc, gtc, M3D, WRF, AMD, tgyro, cp2k, BerkeleyGW, qlua, S3D, osiris, gts, Gaussian, EWI3D, sextet.x, NWCHEM, nimrod, madam_toast, ART, run_wmc, phoenix, ChomboCrunch, overlap_inverter, geneeffBeamEMGeo, python-mpi, Nyx, transFn, Gromacs, molpro, NCAR-LES, CompoaRunxaorsa, Gadget, lsppstg, DLPOLY, elm_6f, Amber, >600 Others]

•  13 codes make up 50% of workload
•  25 codes make up 66% of workload
•  50 codes make up 80% of workload
•  Remaining codes (over 600) make up 20% of workload.
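Coverage figures of this kind (a handful of codes accounting for most of the hours) can be derived from per-code usage with a sorted cumulative sum. The sketch below shows the idea on invented numbers; the function name and the sample usage table are hypothetical, not NERSC data.

```python
def codes_to_cover(hours_by_code: dict[str, float], fraction: float) -> int:
    """Smallest number of top codes whose hours reach `fraction` of the total."""
    total = sum(hours_by_code.values())
    running = 0.0
    ranked = sorted(hours_by_code.values(), reverse=True)
    for count, hours in enumerate(ranked, start=1):
        running += hours
        if running >= fraction * total:
            return count
    return len(ranked)


if __name__ == "__main__":
    # Invented usage table, for illustration only.
    usage = {"code_a": 50.0, "code_b": 25.0, "code_c": 15.0, "code_d": 10.0}
    for fraction in (0.5, 0.6, 0.8):
        print(f"{fraction:.0%} of hours: top {codes_to_cover(usage, fraction)} codes")
```

Applied to a real usage log, the same function would reproduce statements such as "13 codes make up 50% of workload" and indicate how many benchmarks are needed to represent a chosen share of the workload.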

Page 19

NERSC Benchmarks Were Chosen to Represent the Workload

[Figure: pie chart of top algorithms on NERSC systems by core hours used, Jan–Dec 2014. Labels: Density Functional Theory, Lattice QCD, Molecular Dynamics, Continuum Fusion, Bio-Informatics, PIC Fusion, Climate, Scalable Solvers, Quantum Chemistry, CMB, Seismic, PDSF]

•  Regrouped top codes by similar algorithms.
•  A small number of benchmarks can represent a large fraction of the workload.
   –  miniDFT
   –  MILC
   –  GTC
   –  Meraculous
•  Includes Genepool and PDSF systems.

Page 20

APEX plans to use "mini-apps", some full apps for system evaluation

MiniApp | Description | Language
miniDFT (Quantum Espresso) | Plane-wave Density Functional Theory (DFT) | Fortran
MILC | Lattice Quantum Chromodynamics (QCD). Sparse matrix inversion, CG | C
GTC-P | Particle-in-cell magnetic fusion | C
UMT | Unstructured-Mesh deterministic radiation Transport | C/C++/Fortran
SNAP | Neutron particle transport application | Fortran
PENNANT | Unstructured finite element | C
Meraculous | De novo genome assembly | UPC
MiniPIC | Particle in cell for accelerators | C++
HPCG | High Performance Conjugate Gradient | C

Page 21

Goals and Objectives for the NERSC-9 Project

1.  Provide a significant increase in computational capabilities over the Edison system, at least 16x on a set of representative DOE benchmarks
2.  Platform needs to meet the needs of extreme computing and data users by accelerating workflow performance
3.  Platform should provide a vehicle for the demonstration and development of exascale-era technologies
4.  Delivery in the 2020 timeframe
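A target such as "at least 16x on a set of representative DOE benchmarks" needs a rule for combining per-benchmark results. The slide does not say which rule applies; the sketch below assumes a geometric mean of per-benchmark speedups, which is a common choice for averaging ratios. The function name and the speedup values are invented for illustration.

```python
from statistics import geometric_mean


def meets_target(speedups: dict[str, float], target: float = 16.0) -> bool:
    """True if the geometric mean of per-benchmark speedups reaches target.

    Aggregating with a geometric mean is an assumption here, not the
    project's stated method.
    """
    return geometric_mean(list(speedups.values())) >= target


if __name__ == "__main__":
    # Invented speedups over a baseline system, for illustration only.
    proposal_a = {"miniDFT": 32.0, "MILC": 16.0, "GTC-P": 16.0}
    proposal_b = {"miniDFT": 8.0, "MILC": 8.0, "GTC-P": 16.0}
    print("proposal A meets 16x:", meets_target(proposal_a))
    print("proposal B meets 16x:", meets_target(proposal_b))
```

A geometric mean keeps one very large speedup from masking weak results elsewhere, which is why it is often preferred over an arithmetic mean for this kind of comparison.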

Page 22

NERSC-9 Will Provide Capabilities for DOE Data-Intensive Users in 2020

•  NERSC-9 will build upon the successes of the data different components of Cori
•  End to end workflow requirements and performance are critical for the design and optimization of the system
•  Overall goal is to enable seamless data motion with dynamic allocation and scheduling of resources
   –  Enable first steps towards exascale-era storage system
   –  Vendor community excited about engagement and collaboration opportunities

Page 23

APEX 2020 – NRE on the Path to Exascale

•  The APEX 2020 systems NRE topics will target areas that
   –  achieve higher application performance,
   –  improve support for data-intensive computing, and,
   –  enable greater ease of use by advancing new technologies on the path to the exascale systems in 2023
•  The Crossroads and NERSC-9 platforms NRE topics are
   –  Technologies for the exploration of new and novel programming models concepts
   –  A platform integrated storage system that supports new models for moving and managing data seamlessly
   –  Systems with scalable management capabilities to enhance the reliability, resilience, power and energy usage characteristics

Page 24

Summary

•  NERSC-9 will be a 2020 machine that meets the needs of all NERSC users
•  NERSC will continue its NESAP program in support of NERSC-9
•  NERSC will partner with vendors on Non-Recurring Engineering projects to maximize the usability and performance of the machine

Page 25

Questions?

Page 26

The Application Transition Program is designed to continue users on the path to exascale

•  Technical specifications ask for a Center of Excellence
   –  Establishment of a collaboration between the Labs, the chosen OEM, and key technology providers, e.g. processor, is essential to meet the goal of making efficient use of the platform in a timely manner
•  Center of Excellence (CoE) based upon previous DOE efforts
   –  NERSC Exascale Scientific Applications Program (NESAP)
   –  CAAR & ESP programs at ORNL & ANL
•  Center of Excellence (CoE) leverages some or all of:
   –  SSI metric applications
   –  NERSC Exascale Scientific Applications Program (NESAP)
   –  Select applications expected to use the machine shortly after operational readiness/acceptance

Page 27

The Application Transition Program will provide development resources for users

•  Early access to key technologies and programming environments is essential for application transition
   –  Programming environment is crucial
•  Access to emulation and simulation capabilities as early as possible
   –  key contribution of technology providers
•  Early Access Development System
   –  One or more iterations of increasing scale
      •  Eventually 2-10% of final system size
•  Development testbeds
   –  To investigate select advanced technology areas
      •  E.g. Network, power management, burst buffer
   –  Same or different composition of hardware depending on topic

Page 28

APEX Non Recurring Engineering (NRE): Philosophy

•  Technical Specifications ask for NRE proposals
•  NRE contracts potentially 10-15% of platform budgets
•  Other topics that have potential to impact path to exascale will be considered
•  Focus on topics that provide added value beyond planned vendor roadmap activities
•  NRE collaborations will have impact on follow-on platforms procured by the U.S. Department of Energy's NNSA and Office of Science.