Partnership Initiative Computational Science – Using HPC efficiently
Dieter Kranzlmüller
Munich Network Management Team, Ludwig-Maximilians-Universität München (LMU) & Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities
From Michael Resch's Talk
• Prof. Resch:
  - "Education and Training – HPC and the challenges we see in HPC"
  - "It is a question of algorithms, of methods, of how you use the system"
  - "For a successful simulation, you need to know computer science, mathematics, and the respective field of application"
  - "Focus at HLRS on engineering"
  - "Training is necessary!"
Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
With approx. 230 employees, serving more than 100,000 students and more than 30,000 employees, including 8,500 scientists
• European Supercomputing Centre
• National Supercomputing Centre → GCS
• Regional Computer Centre for all Bavarian Universities
• Computer Centre for all Munich Universities
Photo: Ernst Graf
SuperMUC Phase 1 + 2
LRZ Building Extension
Picture: Ernst A. Graf
Picture: Horst-Dieter Steinhöfer
Figure: Herzog+Partner for StBAM2 (Staatliches Hochbauamt München 2)
SuperMUC Architecture (summary of the system diagram)
• 18 thin node islands (each >8,000 cores; SB-EP, 16 cores/node, 2 GB/core)
• 1 fat node island (8,200 cores; WM-EX, 40 cores/node, 6.4 GB/core) → SuperMIG
• Interconnect: non-blocking within each island, pruned tree (4:1) between islands
• I/O nodes, NAS with 80 Gbit/s
• $HOME: 1.5 PB / 10 GB/s; snapshots/replica: 1.5 PB (separate fire section)
• GPFS for $WORK and $SCRATCH: 10 PB, 200 GB/s
• Archive and backup: ~30 PB; disaster recovery site
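A quick back-of-the-envelope check of the total core count, assuming 8,192 cores per thin island (the exact per-island figure is not stated on the slide): 18 × 8,192 + 8,200 = 147,456 + 8,200 = 155,656 cores, which matches the SuperMUC core count quoted in the "Increasing numbers" table below.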
Power Consumption at LRZ
[Chart: annual electricity consumption (Stromverbrauch) in MWh, y-axis from 0 to 35,000]
Cooling SuperMUC
SuperMUC Phase 2 @ LRZ
High Energy Efficiency
✓ Usage of Intel Xeon E5-2697 v3 processors
✓ Direct liquid cooling
  - 10% power advantage over an air-cooled system
  - 25% power advantage due to chiller-less cooling
Photos: Torsten Bloth, Lenovo
✓ Energy-aware scheduling
  - 6% power advantage
  - ~40% power advantage
  - Total annual savings of ~2 million € for SuperMUC Phase 1 and 2
Slide: Herbert Huber
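One possible reading of how the individual measures add up (an assumption; the slide does not state how the figures combine): if the 10%, 25% and 6% advantages act multiplicatively, the remaining power draw is 0.90 × 0.75 × 0.94 ≈ 0.63, i.e. a combined saving of roughly 37%, consistent with the quoted ~40%.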
Increasing numbers
Date | System            | Flop/s            | Cores
2000 | HLRB-I            | 2 Tflop/s         | 1,512
2006 | HLRB-II           | 62 Tflop/s        | 9,728
2012 | SuperMUC          | 3,200 Tflop/s     | 155,656
2015 | SuperMUC Phase II | 3.2 + 3.2 Pflop/s | 229,960
SuperMUC Jobsize 2015 (in Cores)
Challenges on Extreme Scale Systems
• Size: number of cores > 100,000
• Complexity / heterogeneity
• Reliability / resilience
• Energy consumption as part of Total Cost of Ownership (TCO)
  - Execute codes with optimal power consumption (or within a certain power band) → frequency scaling (see the sketch below)
  - Optimize for energy-to-solution → allows more codes within a given budget
  - Improved performance → (in most cases) improved energy-to-solution
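A minimal sketch of the energy-to-solution idea behind frequency scaling. The power and runtime model, and every number in it, are illustrative assumptions rather than SuperMUC measurements: memory-bound phases barely speed up with the clock while power grows strongly with frequency, so the energy minimum often lies below the nominal clock.

    /* energy_to_solution.c - illustrative model of frequency scaling (all numbers assumed). */
    #include <stdio.h>

    int main(void) {
        const double f_nom = 2.7;      /* nominal clock in GHz (assumption) */
        const double t_nom = 3600.0;   /* runtime at nominal clock in seconds (assumption) */
        const double p_nom = 250.0;    /* node power at nominal clock in watts (assumption) */
        const double cpu_bound = 0.6;  /* fraction of runtime that scales with the clock */

        double best_f = 0.0, best_e = 1e30;
        for (double f = 1.2; f <= 2.7001; f += 0.1) {
            /* runtime: only the CPU-bound share stretches when the clock is lowered */
            double t = t_nom * (cpu_bound * f_nom / f + (1.0 - cpu_bound));
            /* power: simple cubic dependence on frequency */
            double p = p_nom * (f / f_nom) * (f / f_nom) * (f / f_nom);
            double e = p * t / 3.6e6;  /* energy-to-solution in kWh per node */
            printf("f=%.1f GHz  t=%6.0f s  P=%5.1f W  E=%.3f kWh\n", f, t, p, e);
            if (e < best_e) { best_e = e; best_f = f; }
        }
        printf("lowest energy-to-solution at %.1f GHz (%.3f kWh per node)\n", best_f, best_e);
        return 0;
    }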
1st LRZ Extreme Scale Workshop
• July 2013: 1st LRZ Extreme Scale Workshop
• Participants: 15 international projects
• Prerequisites: successful run on 4 islands (32,768 cores)
• Participating groups (software packages): LAMMPS, VERTEX, GADGET, waLBerla, BQCD, Gromacs, APES, SeisSol, CIAO
• Successful results (>64,000 cores): invited to participate in the PARCO Conference (Sept. 2013), including a publication of their approach
1st LRZ Extreme Scale Workshop
• Regular SuperMUC operation
  - 4 islands maximum
  - Batch scheduling system
• Entire SuperMUC reserved for 2.5 days for the challenge:
  - 0.5 days for testing
  - 2 days for executing
  - 16 (of 19) islands available
• Consumed computing time for all groups:
  - 1 hour of runtime = 130,000 CPU hours
  - 1 year in total
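A rough sanity check, assuming about 8,192 cores per island (not stated on this slide): 16 islands × 8,192 cores ≈ 131,000 cores, so one hour of wall-clock time on the reserved partition indeed corresponds to roughly 130,000 core hours, as quoted above.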
Results (sustained TFlop/s on 128,000 cores)
Name     | MPI        | #cores  | Description         | TFlop/s per island | TFlop/s max
Linpack  | IBM        | 128,000 | TOP500              | 161                | 2,560
Vertex   | IBM        | 128,000 | Plasma physics      | 15                 | 245
GROMACS  | IBM, Intel | 64,000  | Molecular modelling | 40                 | 110
SeisSol  | IBM        | 64,000  | Geophysics          | 31                 | 95
waLBerla | IBM        | 128,000 | Lattice Boltzmann   | 5.6                | 90
LAMMPS   | IBM        | 128,000 | Molecular modelling | 5.6                | 90
APES     | IBM        | 64,000  | CFD                 | 6                  | 47
BQCD     | Intel      | 128,000 | Quantum physics     | 10                 | 27
Extreme Scaling Continued
• Lessons learned → stability and scalability
• LRZ Extreme Scale Benchmark Suite (LESS) will be available in two versions: public and internal
• All teams will have the opportunity to run performance benchmarks after upcoming SuperMUC maintenances
• 2nd LRZ Extreme Scaling Workshop → 2-5 June 2014
  - Full system production runs on 18 islands with sustained Pflop/s (4 h SeisSol, 7 h GADGET)
  - 4 existing + 6 additional full system applications
  - High I/O bandwidth in user space possible (66 GB/s of 200 GB/s max)
  - Important goal: minimize energy × runtime (3-15 W/core); a rough worked example follows below
• 3rd Extreme Scale-Out with the new SuperMUC Phase 2
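Illustrative arithmetic only (the 3-15 W/core band is from the slide; the run parameters are assumed): at 5 W/core, a 4-hour full-system run on about 128,000 cores draws roughly 128,000 × 5 W = 640 kW, i.e. about 2.6 MWh of energy for that single run, which is why energy × runtime rather than runtime alone is the quantity worth minimizing.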
Extreme Scale-Out SuperMUC Phase 2
• 12 May - 12 June 2015 (30 days)
• Selected group of early users
• Nightly operation: general queue, max. 3 islands
• Daytime operation: special queue, max. 6 islands (full system)
• Total available: 63,432,000 core hours
• Total used: 43,758,430 core hours (utilisation: 68.98%)
Lessons learned (2015):
• Preparation is everything
• Finding Heisenbugs is difficult
• MPI is at its limits
• Hybrid (MPI+OpenMP) is the way to go (see the sketch below)
• I/O libraries are getting even more important
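A minimal sketch of the hybrid MPI+OpenMP pattern referred to above: few MPI ranks per node with OpenMP threads inside each rank, which reduces the number of MPI endpoints at extreme scale. This is a generic illustration, not code from any of the listed applications; the file name and build line are assumptions.

    /* hybrid_hello.c - minimal MPI + OpenMP hybrid skeleton (illustrative).
     * Typical build (may differ per system): mpicc -fopenmp hybrid_hello.c -o hybrid_hello */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;
        /* Request threaded MPI; FUNNELED means only the master thread calls MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local = 0.0;
        /* Node-local work is parallelised with OpenMP threads ... */
        #pragma omp parallel reduction(+:local)
        {
            local += omp_get_thread_num() + 1;   /* placeholder for real work */
        }

        /* ... and only the per-rank results travel through MPI. */
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks x %d threads, global sum = %.0f\n",
                   nranks, omp_get_max_threads(), global);

        MPI_Finalize();
        return 0;
    }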
4th Extreme Scale Workshop 2016
• 4-day workshop (29 February - 3 March 2016)
• 13 projects
• 147,456 cores in 9,216 nodes
• 14.1 million CPU hours
• Max. time per job: 6 h
• Daily and nightly operation mode
#  | Application | Field             | Institution           | PI
1  | INDEXA      | CFD               | TU München            | M. Kronbichler
2  | MPAS        | Climate Science   | KIT                   | D. Heinzeller
3  | Inhouse     | Material Science  | TU Dresden            | F. Ortmann
4  | HemeLB      | Life Science      | UC London             | P. Coveney
5  | KPM         | Chemistry         | FAU Erlangen          | M. Kreutzer
6  | SWIFT       | Cosmology         | U Durham              | M. Schaller
7  | LISO        | CFD               | TU Darmstadt          | S. Kraheberger
8  | ILDBC       | Lattice Boltzmann | FAU Erlangen          | M. Wittmann
9  | Walberla    | Lattice Boltzmann | FAU Erlangen          | Ch. Godenschwager
10 | GASPI       | Framework         | ITWM Kaiserslautern   | M. Kühn
11 | GADGET      | Cosmology         | LMU München           | K. Dolag
12 | VERTEX      | Astrophysics      | MPI for Astrophysics  | T. Melson
13 | PSC         | Plasma            | LMU München           | K. Bamberg
VERTEX: Simulation Code for Supernova Explosions (plasma + neutrino dynamics)
A. Marek and team (Max Planck Institute for Astrophysics, Garching)
COMPAT → CompBioMed (Peter Coveney, UCL)
• Molecular modelling simulation running on all cores of SuperMUC Phase 1 + 2
• Docking simulation of potential drugs for breast cancer
• Goal: a demonstration of feasibility with the power of high performance computing
• 37 hours total run time
• 241,672 cores
• 8,900,000 CPU hours
• Tools developed in the EU projects MAPPER and COMPAT:
  http://www.compat-project.eu/
• Lessons learned → CompBioMed, a Centre of Excellence in Computational Biomedicine: http://www.compbiomed.eu
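As a quick consistency check of the figures above: 241,672 cores × 37 hours ≈ 8.94 million core hours, matching the quoted 8,900,000 CPU hours.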
Summary from the 4th LRZ Extreme Scale Workshop
Lessons learned
1. Although Phase 1 has been a well running system for some time now (since 2011), some quirks and problems in the system were still found and fixed.
2. The main focus was on the performance optimization of the user codes and led to great results.
3. One code was using Phase 1 + Phase 2 for the first time (all available 241,672 cores for 37 h).
4. Applications from JUQUEEN and Piz Daint show how the general purpose architecture of SuperMUC compares to specialized architectures like GPUs or Blue Gene.
5. Application codes now reach the Pflop/s range.
Extreme Scaling - Conclusions
• The number of compute cores and the complexity (and heterogeneity) are steadily increasing, introducing new challenges and issues
• Users need the possibility to reliably execute (and optimize) their codes on full-size machines with more than 100,000 cores
• The Extreme Scaling Workshop series @ LRZ offers a number of incentives for users → next workshop in spring 2017
• The lessons learned from the Extreme Scaling Workshops are very valuable for the operation of the centre
  - Improve performance of applications and energy consumption during operations
  - Improve reliability/stability of the hardware and software environment under extreme conditions
  - Learn how to use the infrastructure and prepare processes for operations
Partnership Initiative Computational Sciences πCS
• Individualized services for selected scientific groups – flagship role
  - Dedicated point of contact
  - Individual support and guidance and targeted training & education
  - Planning dependability for use-case-specific optimized IT infrastructures
  - Early access to latest IT infrastructure (hardware and software) developments and specification of future requirements
  - Access to IT competence network and expertise at CS and Math departments
• Partner contribution
  - Embedding IT experts in user groups
  - Joint research projects (including funding)
  - Scientific partnership – equal footing – joint publications
• LRZ benefits
  - Understanding the (current and future) needs and requirements of the respective scientific domain
  - Developing future services for all user groups
  - Thematic focusing: Environmental Computing
piCS Cookbook
1. Choose focus topics to serve as a lighthouse
   - National agreement within GCS: LRZ focuses on Environment (& Energy)
2. Choose user communities
   - Already active at LRZ?
   - Not active at LRZ?
3. Invite them for introductory piCS workshops
   - Show faces & tour
   - Discussion on joint topics, requirements, interests, ...
4. Establish links between communities and specific points of contact
   - Whom to talk to if there are questions?
   - When to talk to them? In general, as early as possible
   - Maybe place people into the research groups (weekly, for a certain period)
5. Run joint lectures (e.g. hydrometeorology and computer science)
6. Apply for joint projects
7. Use HPC efficiently
Partnership Initiative Computational Science – Using HPC efficiently
Dieter Kranzlmüller
kranzlmueller@lrz.de