Towards Exascale
ROMA
June 2014
PhD Students' Day (Journée des doctorants)
The story starts with a box ...
... that contains lots of little boxes.
The Titan supercomputer:
• 404 m² (the big box)
• 299,008 processor cores (the small boxes)
• 17.59 PetaFlops
• 8.2 MW
• 693.6 TiB of RAM
• 240 GB/s transfer speed to RAM
Image courtesy of Oak Ridge National Laboratory, U.S. Dept. of Energy
Then what is Exascale?
×1000, but in the same box.
Linear algebra: problems get bigger and bigger
Code_Aster, Carter (e.g., finite elements)
→ Solution of sparse systems Ax = b
Often the most expensive part of numerical simulation codes. Sparse direct methods to solve Ax = b:
• Decompose A in the form LU, LDLt or LLt
• Solve the triangular systems Ly = b, then Ux = y
3D example in earth science: acoustic wave propagation, 27-point finite-difference grid
Current goal [SEISCOPE project]: LU on the complete Earth, n = N³ = 1000³
Extrapolation on a 1000 × 1000 × 1000 grid: 55 exaflops, 200 TBytes for the factors, 40 TBytes for active memory!
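The two phases of a direct method, factorization A = LU followed by the triangular solves Ly = b and Ux = y, can be illustrated with a tiny pure-Python sketch; this is my own dense toy example of the principle, not the sparse solver used in the project:

```python
# Minimal sketch of a direct method: A = LU, then L y = b, then U x = y.

def lu_factor(A):
    """Doolittle LU factorization without pivoting (dense, illustrative)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def solve(A, b):
    L, U = lu_factor(A)
    n = len(b)
    y = [0.0] * n                      # forward substitution: L y = b
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n                      # backward substitution: U x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = solve(A, b)    # exact solution: x = [1/11, 7/11]
```

In a real sparse solver the factorization is reordered and done on sparse data structures precisely to keep the factors (the 200 TBytes above) as small as possible.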
Resilience
The main known problem for Exascale is resilience.
[Figure: execution timeline alternating computation with checkpoints; the time to checkpoint eats into the total time]
What if there is 1000× the processing power? It gets worse.
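The trade-off behind this picture is often quantified with the Young/Daly first-order approximation of the optimal checkpoint period; the numeric values below are hypothetical, added only as an illustration of why 1000× more components makes things worse:

```python
import math

def optimal_period(C, mtbf):
    """Young/Daly first-order approximation of the optimal checkpoint period."""
    return math.sqrt(2.0 * C * mtbf)

def waste(T, C, mtbf):
    """First-order waste: checkpoint overhead + expected half-period lost per failure."""
    return C / T + T / (2.0 * mtbf)

# Illustrative numbers: checkpointing costs 10 min, platform MTBF is 1 day.
C, mtbf = 10.0, 24.0 * 60.0
T_opt = optimal_period(C, mtbf)            # ~170 min between checkpoints
# 1000x more components divides the platform MTBF by ~1000:
T_exa = optimal_period(C, mtbf / 1000.0)   # ~5.4 min: barely above C itself
```

At the exascale period the machine would spend most of its time checkpointing, which is exactly the problem the following techniques try to mitigate.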
Fault-tolerance techniques
• Rollback recovery strategies: all processors periodically stop computing and checkpoint (save the state of the parallel application onto resilient storage).
• Coordinated checkpointing
  + No need to log messages
  − All processors need to roll back
  − I/O congestion
• Uncoordinated checkpointing
  − Need to log messages
  − Slows down failure-free execution and increases checkpoint size/time
  + Faster re-execution with logged messages
• Hierarchical checkpointing
  − Need to log inter-group messages
  + Only processors from the failed group need to roll back
  + Faster re-execution with logged messages
  + Rumor: scales well to very large platforms
Replication
Model
• A parallel application comprising n (sequential) processes
• Each process is replicated g ≥ 2 times
• A processing element executes a single replica
• The application fails when all replicas in one replica group have been hit by failures
[Figure: the n replica groups, numbered 1, 2, ..., i, ..., n]
Objective
• Show when adding replication to periodic checkpointing is beneficial
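Under this model, a simple closed form gives the probability that the whole application is down; the independence assumption and the numbers below are mine, for illustration only:

```python
def app_failure_prob(n, g, p):
    """Probability the application fails: at least one of the n replica
    groups has all g of its replicas hit, each replica failing
    independently with probability p (illustrative assumption)."""
    group_survives = 1.0 - p ** g
    return 1.0 - group_survives ** n

# Without replication (g = 1) a single hit kills the application;
# with g = 2 a group dies only when both of its replicas are hit.
p_no_rep = app_failure_prob(n=1000, g=1, p=1e-3)   # ~0.63
p_dup    = app_failure_prob(n=1000, g=2, p=1e-3)   # ~1e-3
```

The price, of course, is that replication halves the number of useful processing elements, which is why it is only beneficial in some regimes.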
Prediction
• Predictor (recall, precision), window-based predictions
• Predictions must be provided at least Cp seconds in advance
[Figure: three execution timelines: regular mode (an error costs the lost work Tlost, plus downtime D and recovery R), a prediction without failure (a proactive checkpoint Cp taken inside the window Wreg), and a prediction with failure (proactive checkpoint Cp, then downtime D and recovery R)]
Objective
• Characterize when prediction is useful.
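One way to see why a predictor helps, sketched under my own simplifying assumptions (a predictor with recall r leaves a fraction 1 − r of failures unpredicted, and each of those costs half a checkpoint period of lost work on average):

```python
import math

def waste(T, C, mtbf, recall):
    """First-order waste per time unit: periodic checkpoints cost C/T,
    and only the unpredicted fraction (1 - recall) of failures forces a
    rollback of, on average, half a period (simplified model)."""
    return C / T + (1.0 - recall) * T / (2.0 * mtbf)

def best_period(C, mtbf, recall):
    """Minimizer of the waste above: the Young/Daly period stretched
    by a factor 1 / sqrt(1 - recall)."""
    return math.sqrt(2.0 * C * mtbf / (1.0 - recall))

C, mtbf = 10.0, 1440.0
T0 = best_period(C, mtbf, recall=0.0)   # no predictor
T8 = best_period(C, mtbf, recall=0.8)   # predictor catching 80% of failures
# A good predictor lets us checkpoint less often: T8 = T0 / sqrt(0.2)
```

This sketch ignores the cost of the proactive checkpoints Cp and the false alarms penalized by precision, which is what the full analysis has to balance.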
Kinds of errors
Hard errors
• Easy to detect
• Easy to localize and characterize
• Expensive to correct
Soft errors
• Hard to detect
• Hard to localize and characterize
• Easy to correct (sometimes)
Silent errors
How to spot them
• Add some redundancy
• Error detecting codes
• Selective reliability
How to handle them
• Majority vote among the replicas
• Error correcting codes
• Checkpoint recovery
Finding the best trade-off
Let us consider an iterative method.
• Correction at each step:
  • increases the cost of a single iteration
  • no time wasted on checkpoints
  • good for low error rates
• Checkpointing + detection at each step:
  • small overhead at each iteration (detection)
  • periodic time loss for checkpointing
  • the checkpoint interval can be tailored to the error rate
Solution: combine the two techniques.
Dealing with verifications
It is not always possible to use error detection/correction codes at each step. What if we still want to use checkpoints and recoveries?
Problem
• We don't know when the error occurred
• We don't know if the last checkpoint is valid
We need a verification mechanism to check that there were no silent errors in previous computations and that the checkpoints are correct. But this has a cost!
Checkpoints and Verifications
We assume there are no errors during checkpoints (fewer error sources when doing I/O).
Simple approach: perform a verification before each checkpoint to eliminate the risk of corrupted data.
Pattern over time: w V C  w V C  w V C  w V C
Is this better?
Pattern over time: w C  w V C  w C  w V C  w C
With k checkpoints and one verification
With multiple checkpoints, the problem is to find when the error occurred.
Pattern over time: V C  w C  w C  w C  w C  w V, then on a failed verification: R V  R V  R V (roll back checkpoint by checkpoint, verifying each, until a valid one is found)
Solution
• The problem is very similar with k verifications and one checkpoint
• With constant C, V and R we can find an optimal solution to this problem (i.e., one that minimizes the expected execution time).
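The rollback-and-verify search in the diagram can be costed with a small sketch; the cost model and the parameter values below are my own illustrative assumptions, not the optimal solution mentioned above:

```python
def relive_cost(k, j, w, V, R, C):
    """Hypothetical cost model: after the end-of-period verification
    fails, roll back checkpoint by checkpoint (recovery R + verification V)
    until a valid state is found, then re-execute and re-checkpoint the
    lost segments. Assumes the silent error struck segment j (1-based,
    out of k), corrupting every checkpoint from j on."""
    lost = k - j + 1
    return lost * (R + V) + lost * (w + C)

def expected_cost(k, w, V, R, C):
    """Expected recovery cost, assuming the error segment is uniform."""
    return sum(relive_cost(k, j, w, V, R, C) for j in range(1, k + 1)) / k

# With k = 4 segments, on average (k + 1) / 2 = 2.5 segments are redone.
e = expected_cost(k=4, w=10.0, V=1.0, R=2.0, C=1.0)
```

Minimizing this expectation over the placement of checkpoints and verifications is the optimization problem the slide refers to.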
What about DAGs?
Let us consider a Directed Acyclic Graph (DAG) where:
• Nodes represent tasks
• Edges correspond to precedence constraints
We make several important assumptions in this model:
• All tasks are executed by all p processors (which amounts to linearizing the task graph and executing all tasks sequentially)
• Each task has its own indivisible work of size w
Problem: where do we have to place the checkpoints and the verifications in order to minimize the expected execution time of all the tasks?
Starting with simple graphs
We have analytical formulas for the expected time to successfully execute each of these graphs.
• We can find the optimal expected time to successfully execute the fork graph and the linear chain using a polynomial dynamic-programming algorithm.
• The join is probably NP-complete because of the combinatorial explosion of the possibilities.
[Figure: a fork graph (T0 feeding T1, ..., Ti, ..., Tn), a join graph (T0, ..., Ti, ..., Tn feeding Tf) and a linear chain T0 → T1 → ... → Ti → ... → Tn]
Future work: investigate the optimal checkpointing and verification problem for general DAGs.
Memory
Another concern: bandwidth to memory: 240 GB/s.
When the system grows 10 times, bandwidth to memory should grow 20 times!
Since we are not good with architecture, we focus on algorithms.
Pebble Game
[Figure: a four-vertex DAG whose vertices have capacities 3, 2, 4 and 1, each shown as "pebbles held / capacity"]
Two moves:
• Add a pebble to a vertex.
• Remove a pebble from a vertex.
One rule:
• To add a pebble to a vertex, each of its predecessors must hold a number of pebbles equal to its own weight.
One goal:
• Every vertex must be filled at least once, and the maximum number of pebbles in use must be minimized.
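The moves, the rule and the goal can be encoded in a few lines; the graph below is my own small example, not necessarily the one drawn on the slide:

```python
# Minimal sketch of the weighted pebble game described above.

class PebbleGame:
    def __init__(self, weights, preds):
        self.weights = weights              # vertex -> capacity (weight)
        self.preds = preds                  # vertex -> list of predecessors
        self.pebbles = {v: 0 for v in weights}
        self.filled = set()                 # vertices filled at least once
        self.in_use = 0
        self.max_in_use = 0                 # the quantity to minimize

    def add(self, v):
        # Rule: every predecessor must currently hold its full weight.
        assert all(self.pebbles[u] == self.weights[u] for u in self.preds[v])
        self.pebbles[v] += 1
        self.in_use += 1
        self.max_in_use = max(self.max_in_use, self.in_use)
        if self.pebbles[v] == self.weights[v]:
            self.filled.add(v)

    def remove(self, v):
        assert self.pebbles[v] > 0
        self.pebbles[v] -= 1
        self.in_use -= 1

# Example: a (weight 3) feeds b (weight 4); c (weight 2) has no predecessor.
g = PebbleGame(weights={'a': 3, 'b': 4, 'c': 2},
               preds={'a': [], 'b': ['a'], 'c': []})
for _ in range(3): g.add('a')    # fill a
for _ in range(4): g.add('b')    # a is full, so b can be filled: peak of 7
for _ in range(4): g.remove('b') # reclaim b's pebbles
for _ in range(3): g.remove('a')
for _ in range(2): g.add('c')
# g.max_in_use == 7; all of a, b, c were filled at least once
```

The interesting question is which play order minimizes `max_in_use`, which models the peak memory of a traversal.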
Pebble Game: an example run
Filling the weight-3 vertex: the pebble counter goes 1, 2, 3 (max 3).
Filling the weight-4 vertex: the counter goes 4, 5, 6, 7 (max 7).
Emptying the weight-4 vertex: the counter drops back to 3 (max stays 7).
Filling the weight-2 vertex: the counter goes 4, 5; then the weight-1 vertex: 6.
The run uses at most 7 pebbles, the peak being reached when the weight-3 and weight-4 vertices are full simultaneously.
Another model
Definition: let G be a DAG with weighted edges and vertices, and π a topological order.
• We define Me(π, x) (memory edges) as the set of edges e_uv such that π(u) < π(x) ≤ π(v)
• We call the cost of π at vertex v the value
  Cost(π, v) = w(v) + Σ_{u ∈ N⁺(v)} c(e_vu) + Σ_{e_ux ∈ Me(π, v)} c(e_ux)
• We define the cost of an order as:
  Cost(π) = max{ Cost(π, v) : v ∈ G }
Our goal: minimize Cost(π)
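The definition translates directly into code; the helper names and the three-vertex example below are mine, for illustration:

```python
# Cost of a topological order pi: for each vertex v, its weight, plus its
# output edges, plus every memory edge e_ux with pi(u) < pi(v) <= pi(x).

def cost(order, weights, edges):
    """order: list of vertices (a topological order);
    weights: vertex -> w(v); edges: (u, x) -> c(e_ux)."""
    pos = {v: i for i, v in enumerate(order)}
    costs = {}
    for v in order:
        out = sum(c for (u, x), c in edges.items() if u == v)
        mem = sum(c for (u, x), c in edges.items()
                  if pos[u] < pos[v] <= pos[x])
        costs[v] = weights[v] + out + mem
    return max(costs.values()), costs

weights = {'a': 1, 'b': 2, 'c': 1}
edges = {('a', 'b'): 2, ('a', 'c'): 3, ('b', 'c'): 1}
peak, per_vertex = cost(['a', 'b', 'c'], weights, edges)
# peak == 8, reached while processing b: w(b)=2, output edge b->c (1),
# and the live edges a->b (2) and a->c (3) held in memory.
```

Minimizing this peak over all topological orders is the goal stated above.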
Another model
[Figure: three snapshots, before, during and after the processing of vertex v, distinguishing the already-processed vertices from the unprocessed ones and showing which edges are held in memory]
Energy
One last problem: energy.
[Figure: power densities of 2 W/cm² and 80 W/cm²; Titan draws 8.2 MW]
Thermal wall: we cannot increase the clock frequency of a chip any further: it would melt.
Speed Scaling
One can modify the execution speed f of any task, f ∈ [fmin, fmax].
Let Ti, of weight wi, be executed on processor pj:
[Figure: timeline of processor pj; increasing the speed fi shortens the execution time Exe(wi, fi), decreasing it stretches the task]
The energy consumption of the execution of task Ti at speed fi:
Ei(fi) = Exe(wi, fi) · fi³ = wi · fi²
→ (dynamic part of the classical energy model)
Unfortunately, there are some more drawbacks (reliability):
[Figure: the reliability Ri(fi) drops as the speed fi decreases below the reliable speed frel, where it is worth Ri(frel)]
Ri(fi) ≈ 1 − λ0 · e^(−d·fi) · Exe(wi, fi)
A solution: two executions!
Ri = 1 − (1 − Ri(fi^(1))) · (1 − Ri(fi^(2)))
[Figure: task Ti is executed twice, as Ti^(1) at speed fi^(1) on p1 and Ti^(2) at speed fi^(2) on p2, both finishing by time ti]
Energy consumption with two executions:
Ei = wi · (fi^(1))² + wi · (fi^(2))²
[Figure: the energy curve Ei(fi); running both executions at the same speed fi costs wi·fi² + wi·fi² = 2·Ei(fi), so at speed frel/√2 the two executions together cost exactly Ei(frel), the energy of a single execution at frel]
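The key point of the figure, that two executions at frel/√2 cost no more energy than one at frel, can be checked directly (the weight and speed values below are arbitrary):

```python
import math

def energy(w, f):
    """Dynamic energy of one execution: E(f) = Exe(w, f) * f^3 = w * f^2."""
    return w * f * f

w, f_rel = 5.0, 2.0
single = energy(w, f_rel)                        # one execution at f_rel
double = 2.0 * energy(w, f_rel / math.sqrt(2.0)) # two executions at f_rel/sqrt(2)
# double == single: duplicating the task at speed f_rel/sqrt(2) costs no
# extra energy, while both executions must fail for the task to fail.
```

The catch, handled by the constraints on the next slide, is that the slower executions must still meet the deadline and each must satisfy the minimum-reliability threshold.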
To sum up
We need to find, for each task:
• the number of executions (one or two)
• their speeds
• their mapping (processor)
in order to minimize the energy consumption under the constraints:
• ∀i, ti ≤ D (bounded makespan)
• ∀i, Ri(Ti) ≥ Ri(frel) (minimum reliability)
Two kinds of results
Theoretical:
• FPTAS for linear chains;
• Inapproximability for independent tasks;
• With a relaxation β on the makespan constraint, we can approximate the optimal solution within 1 + 1/β², for all β ≥ max(2 − 3/(2p+1), 2 − (p+2)/(4p+2)).
But also simulations for general DAGs.
Sparse direct solvers: main research issues
Code_Aster, EDF pump, nuclear backup circuit
[Figure: 3D seismic velocity model; depth, dip and cross axes in km; velocities from 3000 to 6000 m/s]
Frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project
Extrapolation on a 1000 × 1000 × 1000 grid: 55 exaflops, 200 TBytes for the factors, 40 TBytes for active memory!
Main algorithmic issues
• Parallel algorithmic issues: synchronization avoidance, mapping of irregular data structures, scheduling.
• Performance scalability: time, but also memory per processor, when increasing the number of processors (and the problem size).
• Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific (elliptic PDE) solvers.
Execution of malleable task trees
• This is one of the problems raised by Exascale
• Motivation: linear algebra, sparse matrix factorizations, ...
• Principle: many processors are available → we can parallelize the tree, but also the tasks themselves
• Difficulty: parallelization is not perfect; the more processors we allocate to a task, the more losses occur
[Figure: an example task tree rooted at 0]
• In the model developed (the time to complete a task of length L with p processors is L/p^α, for 0 < α < 1), the makespan-optimal processor allocation over the tree looks like the distribution of electric charges → a nice structure to work with
• This model neglects some relevant constraints, such as memory limits or granularity: other models are designed to handle them
[Figure: an example task tree rooted at 0]
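The L/p^α model can be sketched on two independent subtrees: the makespan is minimized by sharing the processors so that both finish simultaneously (a hypothetical helper, treating processors as divisible):

```python
# Minimal sketch of the malleable model: execution time is L / p**alpha.
# Splitting p processors between two independent tasks so that both
# finish at the same time (the names below are my own).

def balanced_split(L1, L2, p, alpha):
    """Share p processors between tasks of lengths L1 and L2 so that
    L1 / p1**alpha == L2 / p2**alpha."""
    ratio = (L1 / L2) ** (1.0 / alpha)    # p1 / p2
    p2 = p / (1.0 + ratio)
    return p - p2, p2

L1, L2, p, alpha = 8.0, 1.0, 10.0, 0.5
p1, p2 = balanced_split(L1, L2, p, alpha)
t1 = L1 / p1 ** alpha
t2 = L2 / p2 ** alpha
# t1 == t2: both subtrees finish simultaneously
```

Propagating this equalization up the tree is what gives the allocation its electric-charge-like structure; memory limits and integer processor counts are what the refined models add.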