
Memory Organisation Cartography & Analysis

Beniamine David, Guillaume Huard, Bruno Raffin

March 19, 2015


Outline

1 Context and motivations

2 MOCA: Design, Visualisation, Performance evaluation

3 Conclusions and future work

Beniamine David (MOAIS) 2/26


I Context and motivations


How to optimize a complex application?

SOFA [Allard et al., 2007, Nesme et al., 2009]

Simulation framework designed for scientists ⇒ precise

Interactive simulation ⇒ fast

High abstraction level ⇒ hard to profile

How to improve such applications?

Get a deep understanding of the code?

Extract and optimize the “costly” parts?

Limitations

How to identify the “costly” parts?

Is an optimization done outside the framework still efficient inside it?


HPC machines

Parallel machines: multiple sockets, multiple cores, instruction-level parallelism

Gap between memory and processors: over 100 CPU cycles for one memory access

Complex memory architecture: cache hierarchy, NUMA, accelerators


Profiling tools

Likwid [Treibig et al., 2010]

PAPI [Weaver et al., 2013]

OProfile [Levon, 2000]

. . .

Limits

Not suited to complex applications

Produce large amounts of data that are hard to interpret

Focus on CPUs


Advanced tools

VTune [Reinders, 2005]

HPCToolkit [Adhianto et al., 2010]

Paraver [Pillet et al., 1995]

Limits

Complex to use

Still a lot of data

Still focusing on CPUs


Memory analysis

MemProf [Lachaize et al., 2012]

Instruction-Based Sampling: works only with some AMD CPUs

Only shows NUMA remote-access patterns

SPCD [Diener et al., 2012, Diener et al., 2013, Cruz et al., 2014]

Based on hardware modification and/or simulation tools

Analysis used within a runtime system, not exposed to the user

No temporal information


Heapinfo [Beniamine, 2013]

Instrumentation using Valgrind

Intercepts and records all accesses to allocated data structures

Cartography view of the memory accesses

Drawbacks

Valgrind instrumentation ⇒ slow, serializes threads

Naive spatio-temporal merge to reduce the amount of information

Visualisation method does not scale


II MOCA


II MOCA 1 Design


Objectives

Detect patterns: dispersion between structures, dispersion inside a structure, frequency, linearity

Detect sharing: concurrent access, read/write concurrency, shared structures
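The sharing objectives can be illustrated with a minimal sketch that assumes a hypothetical trace of `(thread id, page, access type)` tuples; this is not MOCA's trace format or analysis code. Each page is labelled private (one thread), shared read-only, or shared with writes (several threads, at least one writer), the last being where read/write concurrency may occur.

```python
from collections import defaultdict
from typing import Iterable, Literal

# Hypothetical trace record: (thread id, page address, access type)
Access = tuple[int, int, Literal["R", "W"]]


def classify_pages(trace: Iterable[Access]) -> dict[int, str]:
    """Label each page as 'private', 'shared-read' or 'shared-write'."""
    threads = defaultdict(set)  # page -> set of threads that touched it
    written = set()             # pages written at least once
    for tid, page, kind in trace:
        threads[page].add(tid)
        if kind == "W":
            written.add(page)

    labels = {}
    for page, tids in threads.items():
        if len(tids) == 1:
            labels[page] = "private"
        elif page in written:
            labels[page] = "shared-write"  # candidate for read/write concurrency
        else:
            labels[page] = "shared-read"
    return labels


trace = [
    (0, 0x1000, "R"), (1, 0x1000, "R"),  # read by two threads
    (0, 0x2000, "W"), (0, 0x2000, "R"),  # touched by thread 0 only
    (1, 0x3000, "R"), (2, 0x3000, "W"),  # shared and written
]
labels = classify_pages(trace)
```

This sketch ignores time: telling truly concurrent sharing apart from successive phases would require grouping accesses per time window before classifying.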


Design

Linux Kernel Module

Intercept page faults: generate false page faults

Group accesses into chunks: periodically invalidate each page of the current chunk

Independent record for each thread/process
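A minimal user-space model of this recording scheme, under stated assumptions: pages are plain integers, every page starts invalid, time is an explicit `now` argument, and `ChunkRecorder`, `access` and `wakeup` are hypothetical names. In the kernel module the trap comes from the MMU on an invalidated page; here it is simulated by a set membership test.

```python
from collections import defaultdict


class ChunkRecorder:
    """Hypothetical model of one thread's record: a list of (time, pages) chunks."""

    def __init__(self) -> None:
        self.valid: set[int] = set()  # pages accessible without trapping
        self.faults = 0               # number of (simulated) false page faults
        self.chunks: list[tuple[int, frozenset[int]]] = []

    def access(self, page: int) -> None:
        # Only an access to an invalidated page traps; the page then becomes
        # valid, so later accesses to it in the same chunk go unseen.
        if page not in self.valid:
            self.faults += 1
            self.valid.add(page)

    def wakeup(self, now: int) -> None:
        # Close the current chunk, then invalidate each of its pages so that
        # their next access traps again and is recorded in the next chunk.
        if self.valid:
            self.chunks.append((now, frozenset(self.valid)))
        self.valid = set()


rec = defaultdict(ChunkRecorder)  # independent record per thread id
rec[0].access(0x1000)
rec[0].access(0x1000)             # same chunk: no new fault
rec[0].access(0x2000)
rec[1].access(0x1000)
for tid in (0, 1):
    rec[tid].wakeup(now=10)
rec[0].access(0x1000)             # invalidated at t=10: traps again
rec[0].wakeup(now=20)
```

The model makes the granularity trade-off visible: within one chunk only the first access to each page is seen, so a shorter wakeup period gives finer timing at the cost of more traps.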


Page fault handling

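Figure: page fault handling in Linux and in MOCA. A page fault Pagefault(task, @) goes through three yes/no tests (Is task monitored? Should we monitor it? Is it a false page fault?) leading either to MOCA's actions (Monitor(task), Add to chunk(task, @), Fix false page fault(task, @), Resume execution) or to the regular Linux page fault handler (Handle page fault).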
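The decision flow can be sketched as follows. This is a hedged reading of the flowchart's labels, not MOCA's kernel code: the exact wiring of the yes/no branches, the rule deciding whether a new task should be monitored (here: its parent is monitored), and the use of a set of pages invalidated by MOCA to recognise false page faults are all assumptions, and `MocaState` and `on_page_fault` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MocaState:
    """Hypothetical bookkeeping for the sketch; names are illustrative."""
    monitored: set[int] = field(default_factory=set)                # monitored task ids
    parent: dict[int, int] = field(default_factory=dict)            # task id -> parent task id
    invalidated: set[tuple[int, int]] = field(default_factory=set)  # (task, page) invalidated by MOCA
    chunks: dict[int, list[int]] = field(default_factory=dict)      # task id -> pages in current chunk


def on_page_fault(state: MocaState, task: int, page: int) -> str:
    """Return 'resume' if MOCA handles the fault, 'linux' to fall through."""
    if task not in state.monitored:
        # Assumed policy: monitor a task whose parent is monitored.
        if state.parent.get(task) not in state.monitored:
            return "linux"                          # regular Linux page fault handler
        state.monitored.add(task)                   # Monitor(task)
    state.chunks.setdefault(task, []).append(page)  # Add to chunk(task, @)
    if (task, page) in state.invalidated:           # Is it a false page fault?
        state.invalidated.discard((task, page))     # Fix false page fault(task, @)
        return "resume"                             # Resume execution
    return "linux"                                  # real fault: Linux handles it


s = MocaState(monitored={1}, parent={2: 1, 3: 99}, invalidated={(1, 0x1000)})
print(on_page_fault(s, 1, 0x1000))  # resume: page invalidated by MOCA
print(on_page_fault(s, 2, 0x2000))  # linux: new child of task 1, real fault
print(on_page_fault(s, 3, 0x3000))  # linux: unrelated task, not recorded
```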


II MOCA 2 Visualisation


Tools

Traces imported into Framesoc [Pagano et al., 2013]

Visualisation with Ocelotl [Dosimont et al., 2014]

Data aggregation: trade-off between the information displayed and the information lost

Easy navigation through the trace

Possibility to focus on a zone or on one type of event
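Ocelotl's aggregation relies on information-theoretic criteria [Dosimont et al., 2014]; the sketch below swaps in the simplest possible stand-in, fixed-width time bins, only to illustrate the trade-off: fewer bins mean less to display but more detail lost. The `(timestamp, event type)` input format and the `aggregate` function are assumptions for illustration.

```python
from collections import Counter


def aggregate(events: list[tuple[float, str]], t_end: float, n_bins: int) -> list[Counter]:
    """Count events of each type in n_bins equal time slices of [0, t_end]."""
    width = t_end / n_bins
    bins = [Counter() for _ in range(n_bins)]
    for t, kind in events:
        i = min(int(t // width), n_bins - 1)  # clamp t == t_end into the last bin
        bins[i][kind] += 1
    return bins


events = [(0.5, "R"), (1.2, "W"), (1.8, "R"), (3.9, "R")]
fine = aggregate(events, t_end=4.0, n_bins=4)    # keeps when the write happened
coarse = aggregate(events, t_end=4.0, n_bins=1)  # one summary, timing lost
```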


Views

Two different types of views

Cartography view: memory accesses over time; shared / private memory zones; shows the global memory behaviour

Memory Gantt: per-thread cartography; compares thread behaviour; shows memory-use imbalance
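One simple way to quantify the memory-use imbalance that a memory Gantt view makes visible is the ratio of the busiest thread's access count to the mean. The metric and the per-thread count input are assumptions for illustration, not a measure defined by MOCA.

```python
from statistics import mean


def imbalance(accesses_per_thread: dict[int, int]) -> float:
    """max / mean of per-thread access counts; 1.0 means perfectly balanced."""
    counts = list(accesses_per_thread.values())
    avg = mean(counts)
    return max(counts) / avg if avg else 1.0


print(imbalance({0: 100, 1: 100, 2: 100, 3: 100}))  # 1.0
print(imbalance({0: 250, 1: 50, 2: 50, 3: 50}))     # 2.5
```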


Cartography view

Figure: Cartography view of a naïve parallel matrix multiplication


Memory Gantt

Figure: Memory-gantt view of a naïve parallel matrix multiplication.


II MOCA 3 Performance evaluation


Experimental setup

Virtual machine built with Kameleon [Ruiz et al., 2015]
Recipe: http://moais.imag.fr/membres/david.beniamine/kameleon/deb_virt_dev_RR_MOCA.tar.gz

Application: extremely naïve parallel matrix multiplication

Matrix size: 1000 × 1000 doubles

Machine   OS              Processor            #cores   Mem (GiB)
Host      Debian Wheezy   Intel Xeon E5-1607   6        16
Guest     Debian Jessie   Intel Xeon E5-1607   2        4
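For scale: with 8-byte doubles, each 1000 × 1000 matrix takes 1000 × 1000 × 8 = 8,000,000 bytes, i.e. about 1,954 pages of 4 KiB (the page size is an assumption). The benchmark's code is not given; the sketch below assumes the usual form of an extremely naïve parallel multiplication, a triple loop with the rows of C split statically across threads, and `naive_parallel_matmul` is a hypothetical name. Because of CPython's global interpreter lock it illustrates the memory access pattern (each thread walks its own rows of A and C but the whole of B), not a real parallel speed-up.

```python
import math
import threading


def naive_parallel_matmul(a, b, n_threads=2):
    """C = A x B with the rows of C split into contiguous blocks, one per thread."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    rows_per_thread = math.ceil(n / n_threads)

    def worker(tid: int) -> None:
        start = tid * rows_per_thread
        for i in range(start, min(start + rows_per_thread, n)):  # private rows of A and C
            for j in range(p):
                s = 0.0
                for k in range(m):  # walks a column of B: B is shared by all threads
                    s += a[i][k] * b[k][j]
                c[i][j] = s

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c


print(naive_parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```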


Wakeup interval

Figure: execution time (ms) and number of recorded events, each plotted against the wakeup interval (0 to 120 ms).


False page faults

Figure: execution time (ms) for each false page fault type (hashmap, hack, none, nomonitor), and number of recorded events for each type (hashmap, hack, none).


III Conclusions and future work


Conclusions

New kind of performance analysis [Beniamine et al., 2015]

Memory access patterns

Temporal evolution

Thread sharing information

Detect imbalanced memory use

Fast analysis


Perspectives

Short term

Improve the memory Gantt view

Fine-grained sharing detection

More experiments

Explain benchmark behaviour (NAS, PARSEC, . . . )

Long term

Work on real applications

Describe a methodology


IV Bibliography

Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., and Tallent, N. R. (2010). HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685–701.

Allard, J., Cotin, S., Faure, F., Bensoussan, P.-J., Poyer, F., Duriez, C., Delingette, H., and Grisoni, L. (2007). SOFA - an Open Source Framework for Medical Simulation. In Medicine Meets Virtual Reality (MMVR 15), Palm Beach, United States.

Beniamine, D. (2013). Cartographier la mémoire virtuelle d'une application de calcul scientifique [Mapping the virtual memory of a scientific computing application]. In ComPAS'2013 / RenPar'21, Grenoble, France.

Beniamine, D., Corre, Y., Dosimont, D., and Huard, G. (2015). Memory Organisation Cartography & Analysis. Research Report 8694, INRIA Grenoble.

Cruz, E. H. M., Diener, M., Alves, M. A. Z., and Navaux, P. O. A. (2014). Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols. Journal of Parallel and Distributed Computing, 74(3):2215–2228.

Diener, M., Cruz, E. H. M., and Navaux, P. O. A. (2012). Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory. In Parallel Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pages 532–543.

Diener, M., Cruz, E. H. M., and Navaux, P. O. A. (2013). Communication-Based Mapping Using Shared Pages. In Parallel Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pages 700–711.

Dosimont, D., Schnorr, L. M., Huard, G., and Vincent, J.-M. (2014). A Trace Macroscopic Description based on Time Aggregation. Technical Report RR-8524.

Lachaize, R., Lepers, B., and Quema, V. (2012). MemProf: A Memory Profiler for NUMA Multicore Systems. In USENIX 2012 Annual Technical Conference (USENIX ATC 12), pages 53–64, Boston, MA. USENIX.

Levon, J. (2000). OProfile Manual. Victoria University of Manchester.

Nesme, M., Kry, P. G., Jeřábková, L., and Faure, F. (2009). Preserving Topology and Elasticity for Embedded Deformable Models. In ACM SIGGRAPH 2009 Papers, SIGGRAPH '09, pages 52–1, New York, NY, USA. ACM.

Pagano, G., Dosimont, D., Huard, G., Marangozova-Martin, V., and Vincent, J. M. (2013). Trace Management and Analysis for Embedded Systems. In Embedded Multicore SoCs (MCSoC), 2013 IEEE 7th International Symposium on, pages 119–122.

Pillet, V., Labarta, J., Cortes, T., and Girona, S. (1995). PARAVER: A Tool to Visualize and Analyze Parallel Code. In Nixon, P., editor, Proceedings of WoTUG-18: Transputer and occam Developments, pages 17–31.

Reinders, J. (2005). VTune performance analyzer essentials. Intel Press.

Ruiz, C., Harrache, S., Mercier, M., and Richard, O. (2015). Reconstructable Software Appliances with Kameleon. SIGOPS Oper. Syst. Rev., 49(1):80–89.

Treibig, J., Hager, G., and Wellein, G. (2010). LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego, CA.

Weaver, V. M., Terpstra, D., McCraw, H., Johnson, M., Kasichayanula, K., Ralph, J., Nelson, J., Mucci, P., Mohan, T., and Moore, S. (2013). PAPI 5: Measuring power, energy, and the cloud. In Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on, pages 124–125.