Using Today’s Fastest Chips to Design the Chips of Tomorrow
Mauro Calderara, Sascha Brück, Mathieu Luisier
Apr 08 2016 Mauro Calderara

Overview
What we want to do → Quantum Transport: electrons and structures
How we do it → How GPUs saved the day

Probably you’re familiar with this

Zooming in

The future?
(link to video: http://iis.ee.ethz.ch/~mauro/movie_SC15.avi)

From a somewhat more abstract POV
[Figure: electrons (e) entering and leaving a device, marked “?”]

This is what we’re ultimately interested in!
How do electrons behave with respect to the device?
Change in parameters → change in behavior?
Applies not just to transistors:
Batteries
Storage devices
...
[Figure: electrons (e) passing through a device; tunable parameters: gate voltage, dimensions, material properties]

How would we do that? The “easy” case:
→ the device behaves like bulk material

How would we do that? The “difficult” case:
→ the device behaves like an atomic structure

The cost of going small
Why is this “easy” ... and this “difficult”?

The cost of going small
Bulk-like device: can assume it is “infinite” and use a semi-empirical model.
Atomic-scale device: very finite! Need to do it from first principles.
[Figure: runtime of both approaches]

The cost of going small
[Figure: runtime comparison]
Semi-empirical → O(Hours)
First principles → O(Months)

Overview
What we want to do → Quantum Transport: electrons and structures
How we do it → How GPUs saved the day

Where does all that time go?
[Figure: runtime breakdown, annotated “~ 40x”]
Invert the matrix from before (selectively!) using a recursive algorithm.
Solve an eigenvalue problem (not discussed here).

Avoiding the inversion: use a sparse solver instead
Instead of trying to invert selectively, solve the system using a generic sparse solver package
Gain: speed, parallelism, capacity for somewhat larger systems
Cost: the code is now memory-bandwidth (mem-BW) bound
And: not such a good fit for GPUs ...
[Figure: runtime breakdown, annotated “~ 40x”]
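A minimal sketch of this approach, assuming SciPy's general-purpose sparse LU (`scipy.sparse.linalg.splu`) as a stand-in for the sparse solver package (the talk does not name one): instead of inverting a block-tridiagonal matrix, factorize it once and solve only for the right-hand sides that select the first and last block columns of the inverse. The test matrix and the helper names `block_tridiagonal` and `first_last_block_columns` are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def block_tridiagonal(n_blocks, b, seed=0):
    """Hypothetical test matrix: random block-tridiagonal with a shifted diagonal."""
    rng = np.random.default_rng(seed)
    blocks = [[None] * n_blocks for _ in range(n_blocks)]
    for i in range(n_blocks):
        blocks[i][i] = rng.standard_normal((b, b)) + 4 * b * np.eye(b)
        if i + 1 < n_blocks:
            blocks[i][i + 1] = rng.standard_normal((b, b))
            blocks[i + 1][i] = rng.standard_normal((b, b))
    return sp.bmat(blocks, format="csc")  # None entries are all-zero blocks


def first_last_block_columns(A, b):
    """Solve A G = [E_first, E_last] with a sparse LU instead of inverting A."""
    n = A.shape[0]
    rhs = np.zeros((n, 2 * b))
    rhs[:b, :b] = np.eye(b)          # selects the first block column of A^{-1}
    rhs[n - b:, b:] = np.eye(b)      # selects the last block column of A^{-1}
    lu = splu(A)                     # factorize once (sparse direct solver)
    return lu.solve(rhs)             # only 2*b columns of the inverse are formed


A = block_tridiagonal(n_blocks=6, b=3)
cols = first_last_block_columns(A, b=3)
```

Only `2*b` of the `n` columns of the inverse are ever computed. The sparse factorization and triangular solves are dominated by irregular, indexed memory accesses, which is one reason solvers of this kind tend to be limited by memory bandwidth rather than by arithmetic.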

Tackling the eigenvalue problem
We’ve been able to solve that one
[Figure: runtime breakdown, annotated “~ 200x”]
Good speedup so far
(now: O(Days), still not
quite there...)
But
Apr 08 2016 Mauro Calderara 18
Now what?
run
tim
e
~ 70x overall
Mem-BW bound by sparse solver
? Advisor PhD student

A Sparse Solver for Transport Problems running on GPUs
Inverting the sparse system is not feasible
In our case: also not necessary
Need the first and last block rows only
If we can compute these fast, we can
interleave the solving step with the boundary condition (BC) computation
obtain the full solution very efficiently
[Figure: the inverse A⁻¹ of the sparse system]

Obtaining the first and last block columns of the inverse
Recursive algorithm based on the Schwinger-Dyson equation:
$$\text{for } i = N, \dots, 1: \quad X_i \leftarrow \left(A_{i,i} - A_{i,i+1} X_{i+1}\right)^{-1} A_{i,i-1}$$
$$\text{for } i = 2, \dots, N: \quad Q_i \leftarrow -X_i\, Q_{i-1}$$
Per block: xGEMM + xGESV + xGEMM
Very fast on accelerators
Parallelizable
[Figure: recursion over blocks N+1, N, N−1, N−2 of X and A]
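The recursion above can be sketched in NumPy. Assumptions that are not on the slide: the blocks are stored as three Python lists (`diag`, `upper`, `lower`, a hypothetical layout); MATLAB-style `M \ B` is written as `np.linalg.solve(M, B)`; X_{N+1} = 0; and because a sweep all the way down to i = 1 would need a coupling A_{1,0} that lies outside the matrix (for example a boundary term), this sketch stops at i = 2 and obtains Q_1 = (A_{1,1} − A_{1,2} X_2)^{-1} from the first block row of A G = I. The last block column comes from running the same recursion on the block-reversed matrix. Function names are illustrative only.

```python
import numpy as np


def first_block_column(diag, upper, lower):
    """First block column Q_i = G[i, 1] of G = A^{-1} for a block-tridiagonal A.

    Hypothetical storage (0-based): diag[i] = A[i, i], upper[i] = A[i, i+1],
    lower[i] = A[i+1, i].
    """
    n = len(diag)
    b = diag[0].shape[0]
    X = [None] * n
    # Backward sweep ("for i = N:1"), stopped at the second block.
    # X_{N+1} = 0 is assumed, so the last block has no coupling term.
    for i in range(n - 1, 0, -1):
        S = diag[i].copy()
        if i + 1 < n:
            S -= upper[i] @ X[i + 1]                 # xGEMM
        X[i] = np.linalg.solve(S, lower[i - 1])      # xGESV: S \ A[i, i-1]
    # First block row of A G = I gives Q_1 = (A_11 - A_12 X_2)^{-1} (assumption).
    S0 = diag[0] - upper[0] @ X[1] if n > 1 else diag[0]
    Q = [np.linalg.solve(S0, np.eye(b))]
    # Forward sweep ("for i = 2:N"): Q_i = -X_i Q_{i-1}.
    for i in range(1, n):
        Q.append(-X[i] @ Q[i - 1])                   # xGEMM
    return np.vstack(Q)


def last_block_column(diag, upper, lower):
    """Last block column, via the same recursion on the block-reversed matrix."""
    b = diag[0].shape[0]
    Q_rev = first_block_column(diag[::-1], lower[::-1], upper[::-1])
    blocks = [Q_rev[k * b:(k + 1) * b] for k in range(len(diag))]
    return np.vstack(blocks[::-1])


# Usage on a small random block-tridiagonal matrix (diagonal shifted for stability).
rng = np.random.default_rng(1)
n, b = 5, 4
diag = [rng.standard_normal((b, b)) + 4 * b * np.eye(b) for _ in range(n)]
upper = [rng.standard_normal((b, b)) for _ in range(n - 1)]
lower = [rng.standard_normal((b, b)) for _ in range(n - 1)]
first = first_block_column(diag, upper, lower)   # shape (n*b, b)
last = last_block_column(diag, upper, lower)     # shape (n*b, b)
```

Each block step costs one matrix product and one dense solve for X_i, plus one product for Q_i: the xGEMM + xGESV + xGEMM pattern. These are dense kernels with high arithmetic intensity, which is what makes them a good fit for GPUs.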

A Sparse Solver for Transport Problems running on GPUs
Runs on GPUs, compute bound
Interleaves with the EV (eigenvalue) computation
Memory efficient
Much faster than sparse solvers
Whole simulation: O(Hours)
[Figure: roofline plot, arithmetic intensity [log(FLOP/byte)] vs. performance [log(FLOP/s)], annotated “~ 10x / 80x”]
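A minimal roofline sketch of the plot's axes: attainable performance is min(peak, arithmetic intensity × memory bandwidth). The peak and bandwidth below are hypothetical round numbers for a double-precision GPU, not values from the talk. The intensities are textbook estimates: a CSR sparse matrix-vector product (used here as a stand-in for memory-bound sparse-solver kernels) does 2 flops per nonzero while reading about 12 bytes, and an n × n double-precision GEMM does 2n³ flops on about 24n² bytes, assuming ideal data reuse.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped by compute peak or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def machine_balance(peak_gflops, bandwidth_gbs):
    """Intensity (flop/byte) above which a kernel becomes compute bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical GPU: 1000 GFLOP/s double-precision peak, 250 GB/s memory bandwidth.
PEAK, BW = 1000.0, 250.0

spmv_intensity = 2 / 12                       # CSR SpMV: 2 flops per (8-byte value + 4-byte index)
n = 1024
gemm_intensity = 2 * n**3 / (3 * 8 * n**2)    # = n / 12 flop/byte with ideal reuse

for name, ai in [("SpMV", spmv_intensity), ("GEMM", gemm_intensity)]:
    perf = attainable_gflops(ai, PEAK, BW)
    bound = "compute" if ai >= machine_balance(PEAK, BW) else "memory"
    print(f"{name}: {ai:.2f} flop/byte -> {perf:.0f} GFLOP/s ({bound} bound)")
```

With these numbers the machine balance is 4 flop/byte: the SpMV estimate (about 0.17) sits far below it on the bandwidth slope, while GEMM (about 85 for n = 1024) reaches the compute peak.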

Summary
Transforming a sparse problem into a dense one can be a good thing
Large speedup over the state of the art (15x - 150x)
Significant increase in capacity (100’000 atoms → 10x - 100x)
Uses hybrid resources very efficiently (15 PFlop/s sustained)
Made ballistic ab-initio quantum transport (QT) simulations of realistic structures a reality

(link to video: http://iis.ee.ethz.ch/~mauro/movie_Ag_Switch.avi)

Questions?