Going further with Damaris: Energy/Performance Study in Post-Petascale I/O Approaches
Matthieu Dorier, Orçun Yildiz, Shadi Ibrahim, Gabriel Antoniu, Anne-Cécile Orgerie
2nd workshop of the JLESC, Chicago, November 2014

Challenge (2009): Make CM1's I/O scale for the future Blue Waters system
Image credit: Leigh Orf, Bob Wilhelmson

Traditional I/O approach
Diagram labels: 100,000+ processes; 100 to 1000 I/O servers; 10,000+ processes; transfer to another cluster; I/O bursts; bottleneck; too much data, too many files; offline, after simulation; periodic checkpoints.

Time-partitioning I/O (or "why you didn't get results in time for the deadline…")
Simulation periodically stops to perform I/O
File-per-process approach
• Too many files
• Hard to read back
• High metadata overhead
Collective I/O approach
• Requires synchronization
• Data communication steps

Solution: Damaris

Damaris in a few key concepts
• Dedicated I/O cores
  • these cores do not perform any computation
• Shared memory
  • to improve memory usage by avoiding copies
• Plugin system
  • adaptability/flexibility
  • connection with visualization software
• Simple API and external XML-based metadata

Damaris in the context of the Joint Lab
• Starting point
  • Nov. 2009: preliminary discussions on I/O challenges for Blue Waters at the 2nd JLPC workshop
• First steps
  • 2010: Start of the collaboration of the KerData team (INRIA) with NCSA: Matthieu Dorier's MS internship @UIUC with Franck Cappello and Marc Snir
  • 2011: Collaboration extended to ANL (Rob Ross, Tom Peterka)
  • Other internships and mutual visits followed
• Damaris at the core of several joint projects
  • 2012: FACCTS (2012) – PIs: Rob Ross, Gabriel Antoniu
  • 2013-2014: Data@Exascale Associate Team (INRIA, ANL, UIUC)
    • PIs: Gabriel Antoniu (INRIA), Rob Ross (ANL), Marc Snir (UIUC)
  • 2014-2016: a WP within the NextGN PUF ANL-INRIA (+partners)
  • Other joint projects in preparation

Evolution of Damaris
[Timeline figure, 2011 to 2014. Milestones: development starting; Version 1.0; time partitioning; dedicated nodes; in situ visualization enabled with VisIt. Publications and awards: ICS ACM SRC 2nd Prize, Cluster 2012, IPDPS 2013 PhD forum, LDAV 2013, DIDC 2014, IPDPS 2014 PhD forum. Applications: CM1, Nek5000, OLAM, GTC. Platforms: Grid'5000, Kraken, Titan, Intrepid, Blue Waters.]

People involved
INRIA
• Matthieu Dorier
• Gabriel Antoniu
• Lokman Rahmani
• Shadi Ibrahim
• Orçun Yildiz
• Anne-Cécile Orgerie
NCSA
• Roberto Sisneros
• Dave Semeraro
ANL
• Franck Cappello
• Marc Snir
• Rob Ross
• Tom Peterka
• Dries Kimpe
Internships:
• Matthieu Dorier (1st year master) - 2010
• Matthieu Perin (1st year master) - 2011
• Sergiu Vicol (1st year bachelor) - 2012
• Catalina Nita (1st year master) - 2013
• Orçun Yildiz (2nd year master) - 2014
External users/contributors:
• Leigh Orf (Central Michigan)
• Francieli Zanon Boito (UFRGS)
• Rodrigo Kassick (UFRGS)

Damaris 1.0: state of the implementation
• 3 modes
  • Synchronous (time-partitioning)
  • Dedicated core(s)
  • Dedicated node(s)
• Very simple API for C, C++ and Fortran simulations
• XML-based data description
• Enable/Disable shared memory
• Plugin system (C++ plugins)
• Connection to VisIt for in situ visualization
• About 20,000 lines of C++ code, based on MPI
• Depends on Boost, Xerces-C, XSD, (optionally VisIt)
• http://damaris.gforge.inria.fr
• Potential plans for integration within the VisIt package
• Potential plans for use as one of the default backends in CM1

Three I/O approaches in Damaris: Time Partitioning, Dedicated Core(s), Dedicated Node(s)
• Switch between modes using configuration file
  • <dedicated cores="X" nodes="Y"/>
• Time partitioning
  • Good at small scale, bad at larger scales
• Dedicated cores
  • Good when many cores/node, when memory can be afforded
• Dedicated nodes
  • Good when few cores/node and memory on a node is enough
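
As an illustration of how the mode switch could be read from the configuration element, here is a minimal Python sketch. It is not Damaris code: the function name `io_mode` is hypothetical, and the rule that zero dedicated cores and zero dedicated nodes means time partitioning is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def io_mode(fragment: str) -> str:
    """Classify the I/O approach described by a <dedicated .../> element.

    Assumption (not stated on the slide): cores="0" and nodes="0" selects
    time partitioning; a positive value selects the matching dedicated mode.
    """
    elem = ET.fromstring(fragment)
    cores = int(elem.get("cores", "0"))
    nodes = int(elem.get("nodes", "0"))
    if nodes > 0:
        return "dedicated nodes"
    if cores > 0:
        return "dedicated cores"
    return "time partitioning"


print(io_mode('<dedicated cores="1" nodes="0"/>'))  # dedicated cores
print(io_mode('<dedicated cores="0" nodes="1"/>'))  # dedicated nodes
print(io_mode('<dedicated cores="0" nodes="0"/>'))  # time partitioning
```

The point of the sketch is only that the simulation code stays unchanged: the I/O approach is selected by two attribute values in the external configuration file.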

Focus of this talk: How much does the I/O approach impact energy efficiency?
Other talks related to this collaboration: Lokman Rahmani, Gabriel Antoniu

Goals
Study the impact of:
• The I/O approach (dedicated cores, nodes, etc.)
• The I/O frequency (time between I/O phases)
• The underlying architecture
On:
• Simulation performance
• Energy consumption

Experimental setup: CM1 on Grid'5000
Nancy site:
• Graphene cluster (4 cores/node)
• 20G InfiniBand network
• 6 PVFS servers
• EATON Power Distribution Units
• CM1 on 32 nodes (128 cores)
Rennes site:
• Parapluie cluster (24 cores/node)
• 20G InfiniBand network
• 3 PVFS servers
• EATON Power Distribution Units
• CM1 on 16 nodes (384 cores)

Impact of the I/O approach (G5K/Nancy)
And the winner is…
Longer runtime + I/O variability = lower power usage
[Plot; lower = better]

Impact of the I/O frequency (G5K/Nancy)
• Time-partitioning: linear dependency between frequency and energy consumption
• Dedicated resources: when doing I/O every 10 iterations, DC(1), DC(2) and DN(7:1) cannot keep up, resulting in higher energy consumption
• When dedicated resources can keep up, the energy consumption does not depend on I/O anymore
[Plot; lower = better]

Impact of the architecture (G5K/Nancy, Rennes)
Dedicating 1 core is the best approach on the Rennes site (24 cores per node)
Dedicating 1 node every 8 is the best approach on the Nancy site (4 cores per node)
Different number of cores per node = different optimal I/O approach
[Plot; lower = better]

Overall power/runtime results
Nancy: 1 dedicated node for 7 simulation nodes
Rennes: 1 dedicated core per 24-core node

Can we model the energy efficiency under different I/O approaches?
Can we predict the best one?

Model's hypothesis
• Application is computation-intensive
• I/O fully overlaps with computation
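
Energy model: general case
$$E = T_{sim} \times P_{sim}$$
$$T_{sim} = \frac{T_{base} \times n_{iterations}}{\left(n_{cores/node} \times s_{core}(n_{cores/node})\right)\left(n_{nodes} \times s_{nodes}(n_{nodes})\right)}$$
where $T_{base}$ is the time for 1 iteration on 1 core, $n_{iterations}$ is the number of iterations, and $s_{core}$ and $s_{nodes}$ are the scalability functions w.r.t. the number of cores and the number of nodes.
Simplification 1: the scalability w.r.t. the number of cores per node does not depend on the number of nodes, and the scalability w.r.t. the number of nodes does not depend on the number of cores per node.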
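
The general-case model can be evaluated directly. The following Python sketch is only an illustration of the two formulas; the function names and all numeric values are hypothetical, and the scalability functions are taken as perfect (always 1.0) purely for the example.

```python
def simulation_time(t_base, n_iterations, cores_per_node, n_nodes, s_core, s_nodes):
    """T_sim = T_base * n_iterations /
       ((cores_per_node * s_core(cores_per_node)) * (n_nodes * s_nodes(n_nodes)))."""
    core_term = cores_per_node * s_core(cores_per_node)
    node_term = n_nodes * s_nodes(n_nodes)
    return t_base * n_iterations / (core_term * node_term)


def energy(t_sim, p_sim):
    """E = T_sim * P_sim."""
    return t_sim * p_sim


# Hypothetical inputs: 10 s per iteration on 1 core, 100 iterations,
# 4 cores/node, 8 nodes, perfect scalability, P_sim = 200 W.
perfect = lambda n: 1.0
t_sim = simulation_time(10.0, 100, 4, 8, perfect, perfect)
print(t_sim)                 # 31.25 (seconds)
print(energy(t_sim, 200.0))  # 6250.0 (joules)
```

In practice `s_core` and `s_nodes` would come from calibration runs, so they are passed in as functions rather than hard-coded.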
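
Energy model for dedicated nodes
$$P_{sim} = \frac{P_{max}\, c + \frac{1}{2}\left(P_{idle} + P_{max}\right) d}{c + d}$$
where $c$ is the number of simulation nodes, $d$ is the number of dedicated nodes, $P_{max}$ is the max power of a node, and $P_{idle}$ is the idle power of a node.
Simplification 2: The power of a dedicated node is the average of max and idle.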
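
A short Python sketch of the dedicated-node power formula, reading it as P_sim = (P_max·c + ½(P_idle + P_max)·d) / (c + d). The function name and the wattages are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def power_dedicated_nodes(p_max, p_idle, c, d):
    """Average power per node with c simulation nodes and d dedicated nodes.

    Simulation nodes are assumed to draw p_max; each dedicated node is
    assumed to draw the average of p_idle and p_max (Simplification 2).
    """
    if c + d <= 0:
        raise ValueError("at least one node is required")
    p_dedicated = 0.5 * (p_idle + p_max)
    return (p_max * c + p_dedicated * d) / (c + d)


# Hypothetical node: 200 W at full load, 100 W idle, 7 simulation nodes
# for 1 dedicated node (the 7:1 configuration).
print(power_dedicated_nodes(200.0, 100.0, c=7, d=1))  # 193.75
```

With d = 0 the expression reduces to P_max, which matches the power used for nodes that only run the simulation.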

Energy model for dedicated cores
$$P_{sim} = P_{max}$$
Simplification 3: The power of a node running the simulation does not change significantly when some of the cores are dedicated to I/O.
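
To show how the dedicated-core case plugs into E = T_sim × P_sim, here is a minimal Python sketch. It assumes that dedicating cores reduces the number of cores per node available to the simulation, which is an interpretation and not something the slide states. The function name and all numbers are hypothetical.

```python
def energy_dedicated_cores(t_base, n_iterations, cores_per_node, dedicated_cores,
                           n_nodes, p_max,
                           s_core=lambda n: 1.0, s_nodes=lambda n: 1.0):
    """Energy with P_sim = P_max and fewer compute cores per node.

    Assumption: only (cores_per_node - dedicated_cores) cores run the
    simulation; the scalability functions default to perfect scaling.
    """
    compute_cores = cores_per_node - dedicated_cores
    if compute_cores <= 0:
        raise ValueError("no cores left for the simulation")
    t_sim = t_base * n_iterations / (
        (compute_cores * s_core(compute_cores)) * (n_nodes * s_nodes(n_nodes))
    )
    return t_sim * p_max


# Hypothetical inputs: 24-core nodes with 1 dedicated core, 16 nodes,
# 23 s per iteration on 1 core, 16 iterations, P_max = 200 W.
print(energy_dedicated_cores(23.0, 16, 24, 1, 16, 200.0))  # 200.0
```

Under this reading, dedicating a core costs compute time (23 cores instead of 24) but not extra power, which is why the approach suits nodes with many cores.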

Model calibration (with CM1 on G5K/Rennes)
Scalability of CM1 w.r.t. the number of cores per node and w.r.t. the number of nodes
Power consumption of 8 nodes on the Rennes site of G5K (max power and idle power)

Model validation (CM1 on G5K/Rennes)
• Five runs (error bars = min-max)
• Worst relative error between model and observation: 4%
• Larger variations with DN(7:1): probably due to network contention
• Best approach predicted (and observed): 1 dedicated core/node

Model validation (CM1 on G5K/Nancy)
• Five runs (error bars = min-max)
• Worst relative error between model and observation: 5.7%
• Best approach predicted (and observed): 1 dedicated node for 7 non-dedicated nodes

Model's accuracy: summary
Site     Approach                  Accuracy
Rennes   Dedicated cores (1)       96.0%
Rennes   Dedicated cores (2)       96.9%
Rennes   Dedicated nodes (15:1)    97.3%
Rennes   Dedicated nodes (7:1)     98.0%
Nancy    Dedicated cores (1)       95.0%
Nancy    Dedicated nodes (15:1)    94.3%
Nancy    Dedicated nodes (7:1)     95.0%

Conclusion

Conclusion and future directions
Contributions:
• Insight on energy/performance of I/O approaches
• All available within Damaris
• Energy model for dedicated cores and dedicated nodes
• Validation on Grid'5000 with CM1
Model's limitations:
• Valid for computation-intensive applications
• Does not include network-related energy consumption
• Does not include the energy consumption in the storage system
Future work:
• Validation with other simulations
• Tradeoff between compression, performance and energy consumption

A bit of advertisement: Darshan-Web
Demo: http://darshan-web.irisa.fr
Installation tutorial: http://darshan-ruby.gforge.inria.fr