Memory Organisation Cartography & Analysis
Beniamine David, Guillaume Huard, Bruno Raffin
March 19, 2015
Outline
1 Context and motivations
2 MOCA: Design, Visualisation, Performance evaluation
3 Conclusions and future work
Beniamine David (MOAIS) 2/26
I Context and motivations
How to optimize a complex application?
SOFA [Allard et al., 2007, Nesme et al., 2009]
Simulation framework designed for scientists ⇒ precise
Interactive simulation ⇒ fast
High abstraction level ⇒ hard to profile
How to improve such applications?
Get a deep understanding of the code?
Extract and optimize "costly" parts?
Limitations
How to identify "costly" parts?
Are optimizations made outside the framework still efficient inside it?
HPC machines
Parallel machines: multiple sockets, multiple cores, instruction-level parallelism
Gap between memory and processors: over 100 CPU cycles for one memory access
Complex memory architecture: cache hierarchy, NUMA, accelerators
Profiling tools
Likwid [Treibig et al., 2010]
PAPI [Weaver et al., 2013]
OProfile [Levon, 2000]
. . .
Limits
Not suitable for complex applications
Produce large amounts of data that are hard to interpret
Focus on CPUs
Advanced tools
VTune [Reinders, 2005]
HPCToolkit [Adhianto et al., 2010]
Paraver [Pillet et al., 1995]
Limits
Complex to use
Still produce a lot of data
Still focus on CPUs
Memory analysis
MemProf [Lachaize et al., 2012]
Relies on Instruction-Based Sampling: works only with some AMD CPUs
Only shows remote NUMA access patterns
SPCD [Diener et al., 2012, Diener et al., 2013, Cruz et al., 2014]
Based on hardware modification and/or simulation tools
Analysis is used inside a runtime, not provided to the user
No temporal information
Heapinfo [Beniamine, 2013]
Instrumentation using Valgrind
Intercepts and records every access to allocated data structures
Cartography view of memory accesses
Drawbacks
Valgrind instrumentation ⇒ slow, serializes threads
Naive spatio-temporal merge to reduce the amount of information
Visualisation method does not scale
II MOCA
1 Design
Objectives
Detect patterns: dispersion between structures, dispersion inside a structure, frequency, linearity
Detect sharing: concurrent accesses, read/write concurrency, shared structures
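The sharing objectives can be made concrete with a small sketch. It assumes a hypothetical trace format, one (thread, page, access type, time slice) tuple per recorded access, which is not MOCA's actual trace format; the labels (private, shared read-only, shared read/write) and the "same time slice" notion of concurrency are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical trace record: (thread_id, page_number, access_type, time_slice),
# where access_type is "R" (read) or "W" (write).
Record = Tuple[int, int, str, int]


def classify_pages(records: Iterable[Record]) -> Dict[int, str]:
    """Label each page as private, shared read-only, or shared read/write."""
    threads: Dict[int, Set[int]] = defaultdict(set)  # page -> threads touching it
    written: Set[int] = set()                         # pages written at least once
    for tid, page, kind, _ in records:
        threads[page].add(tid)
        if kind == "W":
            written.add(page)
    labels: Dict[int, str] = {}
    for page, tids in threads.items():
        if len(tids) == 1:
            labels[page] = "private"
        elif page in written:
            labels[page] = "shared-rw"  # read/write concurrency is possible
        else:
            labels[page] = "shared-ro"
    return labels


def concurrent_pages(records: Iterable[Record]) -> List[int]:
    """Pages accessed by at least two threads within the same time slice."""
    seen: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for tid, page, _, t in records:
        seen[(page, t)].add(tid)
    return sorted({page for (page, _), tids in seen.items() if len(tids) > 1})


if __name__ == "__main__":
    trace = [(0, 10, "R", 0), (1, 10, "R", 0), (0, 11, "W", 0),
             (1, 12, "W", 1), (0, 12, "R", 2)]
    print(classify_pages(trace))    # {10: 'shared-ro', 11: 'private', 12: 'shared-rw'}
    print(concurrent_pages(trace))  # [10]
```

Page 12 is shared with a writer but never accessed by two threads in the same slice, so it is shared without being concurrently accessed: the distinction the two objectives draw.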
Design
Linux Kernel Module
Intercepts page faults
Generates false page faults
Groups accesses into chunks
Periodically invalidates each page of the current chunk
Keeps an independent record for each thread/process
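The chunking mechanism can be modelled in user space to see why periodically invalidating pages yields a trace. This is a minimal sketch, not the kernel module: `ThreadRecorder`, `access` and `wakeup` are hypothetical names, and a "page fault" here is simply an access to a page missing from the `present` set.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ThreadRecorder:
    """Hypothetical per-thread record (one independent record per thread)."""
    tid: int
    present: Set[int] = field(default_factory=set)        # pages accessible without faulting
    chunk: Set[int] = field(default_factory=set)          # pages seen in the open chunk
    chunks: List[Set[int]] = field(default_factory=list)  # closed chunks, i.e. the trace
    faults: int = 0

    def access(self, page: int) -> None:
        # Only an access to an invalidated page faults; repeated accesses to the
        # same page inside one chunk are invisible, so each page is logged once.
        if page not in self.present:
            self.faults += 1
            self.chunk.add(page)
            self.present.add(page)

    def wakeup(self) -> None:
        # Periodic wakeup: close the current chunk, then invalidate each of its
        # pages so that the next access to them triggers a (false) page fault.
        if self.chunk:
            self.chunks.append(self.chunk)
        self.present -= self.chunk
        self.chunk = set()


if __name__ == "__main__":
    rec = ThreadRecorder(tid=0)
    for page in [1, 2, 1, 1, 3]:
        rec.access(page)
    rec.wakeup()
    for page in [1, 1, 4]:
        rec.access(page)
    rec.wakeup()
    print([sorted(c) for c in rec.chunks])  # [[1, 2, 3], [1, 4]]
    print(rec.faults)                       # 5
```

In this model a shorter wakeup interval closes chunks more often: finer temporal information, at the cost of more faults and more recorded events.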
Page fault handling
Figure: MOCA page fault handling, hooked into the Linux page fault handler. Flowchart steps: Pagefault(task, @); Is task monitored?; Should we monitor it?; Monitor(task); Add to chunk(task, @); Is it a false page fault?; Fix false page fault(task, @); Resume execution; Handle page fault (Linux).
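One plausible reading of the page fault flowchart, written as a plain Python function. The decision order (monitoring check, chunk recording, false-fault check) follows the figure's labels, but the data structures are assumptions: monitored tasks as a set, invalidated pages as a set of (task, page) pairs, and a caller-supplied `should_monitor` policy. `handle_fault` and `PAGE_SIZE` are hypothetical names; the real module makes these decisions inside the kernel.

```python
from typing import Callable, Dict, Set, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def handle_fault(task: int, addr: int,
                 monitored: Set[int],
                 should_monitor: Callable[[int], bool],
                 invalidated: Set[Tuple[int, int]],
                 chunks: Dict[int, Set[int]]) -> str:
    """Return "resume" if the fault is absorbed here, "linux" to run the normal handler."""
    page = addr // PAGE_SIZE
    if task not in monitored:
        if not should_monitor(task):
            return "linux"                    # unmonitored task: plain Linux path
        monitored.add(task)                   # Monitor(task)
    chunks.setdefault(task, set()).add(page)  # Add to chunk(task, @)
    if (task, page) in invalidated:           # Is it a false page fault?
        invalidated.discard((task, page))     # Fix false page fault(task, @)
        return "resume"                       # resume execution directly
    return "linux"                            # genuine fault: Linux handles it


if __name__ == "__main__":
    def never(task: int) -> bool:
        return False  # hypothetical policy: never start monitoring new tasks

    monitored, invalidated, chunks = {1}, {(1, 2)}, {}
    print(handle_fault(1, 2 * PAGE_SIZE + 8, monitored, never, invalidated, chunks))  # resume
    print(handle_fault(1, 5 * PAGE_SIZE, monitored, never, invalidated, chunks))      # linux
    print(handle_fault(7, 0, monitored, never, invalidated, chunks))                  # linux
    print({t: sorted(p) for t, p in chunks.items()})                                  # {1: [2, 5]}
```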
2 Visualisation
Tools
Traces are imported into Framesoc [Pagano et al., 2013]
Visualisation with Ocelotl [Dosimont et al., 2014]
Data aggregation: trade-off between information displayed and information loss
Easy navigation through the trace
Possibility to focus on one zone / one type of event
Views
Two different types of views
Cartography view: memory accesses over time, shared / private memory zones, shows the global memory behaviour
Memory Gantt: per-thread cartography, compares thread behaviour, shows memory use imbalance
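Both views can be derived from the same access records. The sketch below bins records into fixed-width time slices, a deliberately simple stand-in for Ocelotl's aggregation, which is not reproduced here. The record layout (thread, page, timestamp in ms) and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical trace record: (thread_id, page_number, timestamp_ms)
Record = Tuple[int, int, float]


def cartography(records: Iterable[Record], slice_ms: float) -> Dict[Tuple[int, int], Set[int]]:
    """Global view: (time slice, page) -> threads that accessed that page in that slice.
    A cell with more than one thread is a shared zone, otherwise a private one."""
    cells: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for tid, page, ts in records:
        cells[(int(ts // slice_ms), page)].add(tid)
    return dict(cells)


def memory_gantt(records: Iterable[Record], slice_ms: float) -> Dict[int, Dict[int, int]]:
    """Per-thread view: thread -> time slice -> number of distinct pages touched."""
    rows: Dict[int, Dict[int, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for tid, page, ts in records:
        rows[tid][int(ts // slice_ms)].add(page)
    return {tid: {t: len(p) for t, p in sorted(row.items())}
            for tid, row in sorted(rows.items())}


if __name__ == "__main__":
    trace = [(0, 1, 0.5), (1, 1, 0.7), (0, 2, 1.2), (0, 3, 1.4), (1, 4, 2.1)]
    shared = sorted(cell for cell, tids in cartography(trace, 1.0).items() if len(tids) > 1)
    print(shared)                    # [(0, 1)]
    print(memory_gantt(trace, 1.0))  # {0: {0: 1, 1: 2}, 1: {0: 1, 2: 1}}
```

In the Gantt output, thread 0 touches two pages in slice 1 while thread 1 touches none: the kind of imbalance the per-thread view is meant to expose.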
Cartography view
Figure: Cartography view of a naïve parallel matrix multiplication
Memory Gantt
Figure: Memory Gantt view of a naïve parallel matrix multiplication.
3 Performance evaluation
Experimental setup
Virtual machine built with Kameleon [Ruiz et al., 2015]
Recipe: http://moais.imag.fr/membres/david.beniamine/kameleon/deb_virt_dev_RR_MOCA.tar.gz
Application: extremely naïve parallel matrix multiplication
Matrix size: 1000² doubles
Machine   OS              Processor            #cores   Mem (GiB)
Host      Debian Wheezy   Intel Xeon E5-1607   6        16
Guest     Debian Jessie   Intel Xeon E5-1607   2        4
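A quick footprint estimate shows how many pages the experiment touches, which is relevant since MOCA observes accesses through page faults, at page granularity. The sketch assumes three matrices (two inputs, one output) and 4 KiB pages; neither is stated on the slide, and `footprint` is a hypothetical helper.

```python
from typing import Tuple


def footprint(n: int, matrices: int = 3, elem_bytes: int = 8,
              page_bytes: int = 4096) -> Tuple[int, int, int, int]:
    """Bytes and minimum page count for one n x n matrix, then for all matrices.
    An unaligned buffer may span one extra page, hence "minimum"."""
    per_matrix = n * n * elem_bytes
    pages = -(-per_matrix // page_bytes)  # ceiling division
    return per_matrix, pages, matrices * per_matrix, matrices * pages


if __name__ == "__main__":
    one_bytes, one_pages, all_bytes, all_pages = footprint(1000)
    print(one_bytes, one_pages)                    # 8000000 1954
    print(round(all_bytes / 2**20, 2), all_pages)  # 22.89 5862
```

Under these assumptions the working set (about 23 MiB) fits easily in the guest's 4 GiB.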
Wakeup interval
Figure: execution time (ms) and number of recorded events as functions of the wakeup interval (0 to 120 ms).
False page faults
Figure: execution time (ms) per false page fault type (hashmap, hack, none, nomonitor) and number of recorded events per false page fault type (hashmap, hack, none).
III Conclusions and future work
Conclusions
New kind of performance analysis [Beniamine et al., 2015]
Memory access patterns
Temporal evolution
Thread sharing information
Detect imbalanced memory use
Fast analysis
Perspectives
Short term
Improve the memory Gantt view
Fine grain sharing detection
More experiments
Explain benchmark behaviour (NAS, PARSEC, . . . )
Long term
Work on real applications
Describe a methodology
IV Bibliography
Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., and Tallent, N. R. (2010). HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685–701.
Allard, J., Cotin, S., Faure, F., Bensoussan, P.-J., Poyer, F., Duriez, C., Delingette, H., and Grisoni, L. (2007). SOFA - an Open Source Framework for Medical Simulation. In Medicine Meets Virtual Reality (MMVR 15), Palm Beach, United States.
Beniamine, D. (2013). Cartographier la mémoire virtuelle d'une application de calcul scientifique [Mapping the virtual memory of a scientific computing application]. In ComPAS'2013 / RenPar'21, Grenoble, France.
Beniamine, D., Corre, Y., Dosimont, D., and Huard, G. (2015). Memory Organisation Cartography & Analysis. Research Report 8694, INRIA Grenoble.
Cruz, E. H. M., Diener, M., Alves, M. A. Z., and Navaux, P. O. A. (2014). Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols. Journal of Parallel and Distributed Computing, 74(3):2215–2228.
Diener, M., Cruz, E. H. M., and Navaux, P. O. A. (2012). Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory. In Parallel Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pages 532–543.
Diener, M., Cruz, E. H. M., and Navaux, P. O. A. (2013). Communication-Based Mapping Using Shared Pages. In Parallel Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pages 700–711.
Dosimont, D., Schnorr, L. M., Huard, G., and Vincent, J.-M. (2014). A Trace Macroscopic Description based on Time Aggregation. Technical Report RR-8524.
Lachaize, R., Lepers, B., and Quema, V. (2012). MemProf: A Memory Profiler for NUMA Multicore Systems. In USENIX 2012 Annual Technical Conference (USENIX ATC 12), pages 53–64, Boston, MA. USENIX.
Levon, J. (2000). OProfile Manual. Victoria University of Manchester.
Nesme, M., Kry, P. G., Jeřábková, L., and Faure, F. (2009). Preserving Topology and Elasticity for Embedded Deformable Models. In ACM SIGGRAPH 2009 Papers, SIGGRAPH '09, pages 52–1, New York, NY, USA. ACM.
Pagano, G., Dosimont, D., Huard, G., Marangozova-Martin, V., and Vincent, J. M. (2013). Trace Management and Analysis for Embedded Systems. In Embedded Multicore SoCs (MCSoC), 2013 IEEE 7th International Symposium on, pages 119–122.
Pillet, V., Labarta, J., Cortes, T., and Girona, S. (1995). PARAVER: A Tool to Visualize and Analyze Parallel Code. In Nixon, P., editor, Proceedings of WoTUG-18: Transputer and occam Developments, pages 17–31.
Reinders, J. (2005). VTune performance analyzer essentials. Intel Press.
Ruiz, C., Harrache, S., Mercier, M., and Richard, O. (2015). Reconstructable Software Appliances with Kameleon. SIGOPS Oper. Syst. Rev., 49(1):80–89.
Treibig, J., Hager, G., and Wellein, G. (2010). LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego, CA.
Weaver, V. M., Terpstra, D., McCraw, H., Johnson, M., Kasichayanula, K., Ralph, J., Nelson, J., Mucci, P., Mohan, T., and Moore, S. (2013). PAPI 5: Measuring power, energy, and the cloud. In Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on, pages 124–125.