Using Today’s Fastest Chips to Design the Chips of Tomorrow

Mauro Calderara, Sascha Brück, Mathieu Luisier

Title to be determined - GPU Technology Conferenceon-demand.gputechconf.com/gtc/2016/presentation/s6651-mauro-calde… · What we want to do → Quantum Transport: electrons and structures

Apr 08 2016 Mauro Calderara 2

Overview

What we want to do → Quantum Transport: electrons and structures

How we do it → How GPUs saved the day

Probably you’re familiar with this

Zooming in

The future?

(link to video: http://iis.ee.ethz.ch/~mauro/movie_SC15.avi)

From a somewhat more abstract POV

[Figure: a box labeled “Device” with electrons (e) around it and a question mark (?)]
This is what we’re ultimately interested in!

How do electrons behave w.r.t. the device?

Change in parameters → change in behavior?

Applies not just to transistors

Batteries

Storage devices

...

[Figure: “Device” with electrons (e); parameters shown: gate voltage, dimensions, material properties]

How would we do that? The “easy” case:

→ device behaves like bulk material

How would we do that? The “difficult” case:

→ device behaves like atomic structure

The cost of going small

Why is this “easy” ... and this “difficult”?

Can assume it is “infinite” and use a semi-empirical model.

Very finite! Need to do it from first principles.

[Figure: run time of the two approaches]

Semi-empirical → O(Hours)

First principles → O(Months)

How we do it → How GPUs saved the day

Where does all that time go?

[Figure: run time bar chart, annotated “~ 40x”]

Invert the matrix from before (selectively!) using a recursive algorithm.

Solve an eigenvalue problem (not discussed here).

Avoiding the inversion, use a sparse solver instead

Instead of trying to invert selectively, solve the system using a generic sparse solver package

Gain: speed, parallelism, capacity for somewhat larger systems

Cost: code is now memory-bandwidth bound

And: not such a good fit for GPUs ...

[Figure: run time bar chart, annotated “~ 40x”]

Tackling the eigenvalue problem

We’ve been able to solve that one

[Figure: run time bar charts, annotated “~ 200x”]

Now what?

Good speedup so far (now: O(Days), still not quite there...)

But: memory-bandwidth bound by the sparse solver

[Figure: run time bar chart, annotated “~ 70x overall”; two figures labeled “Advisor” and “PhD student” with a question mark]
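The slides do not say how the ~ 40x, ~ 200x and ~ 70x figures relate to one another. A minimal sketch of one possible reading, assuming (this is not stated in the talk) that ~ 40x and ~ 200x are per-phase speedups of the solve and eigenvalue phases and that ~ 70x is the end-to-end result; the function names are hypothetical:

```python
def overall_speedup(fractions, speedups):
    """Amdahl-style combination of per-phase speedups.

    fractions: share of the ORIGINAL run time spent in each phase (sums to 1).
    speedups:  speedup achieved in each phase.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # New run time, as a share of the old one, is the sum of f_k / s_k.
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))


def implied_fraction(speedup_a, speedup_b, speedup_total):
    """Share of the original run time in phase a that makes a two-phase
    code with per-phase speedups (speedup_a, speedup_b) reach speedup_total."""
    return (1.0 / speedup_total - 1.0 / speedup_b) / (
        1.0 / speedup_a - 1.0 / speedup_b
    )


if __name__ == "__main__":
    # Assumed reading of the slide numbers: solve ~40x, eigenvalue ~200x, total ~70x.
    f_solve = implied_fraction(40.0, 200.0, 70.0)
    print(f"implied share of original run time in the solve phase: {f_solve:.2f}")
    print(f"check: {overall_speedup([f_solve, 1.0 - f_solve], [40.0, 200.0]):.1f}x")
```

Under that reading the three numbers are consistent if roughly 46 % of the original run time was spent in the solve phase. The more general point matches the slide: the phase with the smaller speedup (here the memory-bandwidth-bound sparse solve) dominates what is left.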

A Sparse Solver for Transport Problems running on GPUs

Inverting the sparse system is not feasible

In our case: also not necessary

Need first and last block rows only

If we can compute this fast, we can

interleave the solving step with the BC computation

obtain the full solution very efficiently

[Figure: schematic of the matrix inverse]
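Why a few blocks of the inverse can be enough is not spelled out on the slide. A minimal sketch of one possible reading, assuming (not stated in the talk) that the right-hand side coming from the boundary conditions is nonzero only in its first and last block rows. In that case the solution needs only the first and last block columns of the inverse, which is the quantity the next slide computes. All names are hypothetical, and the dense `numpy.linalg.solve` call stands in for the real solver:

```python
import numpy as np


def boundary_columns(a, k):
    """First and last k columns of inv(a), without forming the full inverse."""
    n = a.shape[0]
    e = np.zeros((n, 2 * k))
    e[:k, :k] = np.eye(k)        # selects the first block column
    e[n - k:, k:] = np.eye(k)    # selects the last block column
    cols = np.linalg.solve(a, e)
    return cols[:, :k], cols[:, k:]


def solve_boundary_rhs(q_first, q_last, b_first, b_last):
    """Solution of a @ x = b when b is zero outside its first and last blocks."""
    return q_first @ b_first + q_last @ b_last


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 12, 3
    a = rng.standard_normal((n, n)) + 20.0 * np.eye(n)
    b = np.zeros(n)
    b[:k] = rng.standard_normal(k)
    b[n - k:] = rng.standard_normal(k)
    q_first, q_last = boundary_columns(a, k)
    x = solve_boundary_rhs(q_first, q_last, b[:k], b[n - k:])
    print(np.allclose(x, np.linalg.solve(a, b)))
```

The two block columns do not depend on the right-hand side values, which is one reason they could be computed while the boundary-condition step is still running.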

Obtaining the first and last block columns of the inverse

Recursive algorithm based on the Schwinger-Dyson equation

for i = N:1
$X_i \leftarrow (A_{i,i} - A_{i,i+1} X_{i+1}) \backslash A_{i,i-1}$

for i = 2:N
$Q_i \leftarrow -X_i Q_{i-1}$

xGEMM + xGESV + xGEMM

Very fast on accelerators

Parallelizable

[Figure: block structure of A and X, with block indices N-2, N-1, N, N+1]
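The two loops above can be sketched in NumPy for the first block column of the inverse of a block-tridiagonal matrix. This is an illustrative CPU sketch, not the authors' GPU code. It assumes that $X_{N+1}$ is taken as zero and that the first block is $Q_1 = (A_{1,1} - A_{1,2} X_2)^{-1}$, neither of which the slide states explicitly; the function names and the list-of-blocks layout are hypothetical:

```python
import numpy as np


def first_block_column_of_inverse(diag, lower, upper):
    """Blocks Q_1..Q_N of the first block column of inv(A), A block tridiagonal.

    Zero-based storage of the blocks:
      diag[i]  = A_{i,i}     for i = 0..N-1
      lower[i] = A_{i+1,i}   for i = 0..N-2
      upper[i] = A_{i,i+1}   for i = 0..N-2
    """
    n = len(diag)
    x = [None] * n
    # s holds A_{i,i} - A_{i,i+1} X_{i+1}; the last block has no right neighbor.
    s = diag[n - 1]
    for i in range(n - 1, 0, -1):                  # backward sweep
        x[i] = np.linalg.solve(s, lower[i - 1])    # dense solve (xGESV-like)
        s = diag[i - 1] - upper[i - 1] @ x[i]      # dense product (xGEMM-like)
    q = [np.linalg.inv(s)]                         # assumed start: Q_1
    for i in range(1, n):                          # forward sweep
        q.append(-x[i] @ q[i - 1])                 # dense product (xGEMM-like)
    return q


def assemble(diag, lower, upper):
    """Dense copy of the block-tridiagonal matrix, used only for checking."""
    n, b = len(diag), diag[0].shape[0]
    a = np.zeros((n * b, n * b))
    for i in range(n):
        a[i * b:(i + 1) * b, i * b:(i + 1) * b] = diag[i]
    for i in range(n - 1):
        a[(i + 1) * b:(i + 2) * b, i * b:(i + 1) * b] = lower[i]
        a[i * b:(i + 1) * b, (i + 1) * b:(i + 2) * b] = upper[i]
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, b = 6, 4
    diag = [rng.standard_normal((b, b)) + 20.0 * np.eye(b) for _ in range(n)]
    lower = [rng.standard_normal((b, b)) for _ in range(n - 1)]
    upper = [rng.standard_normal((b, b)) for _ in range(n - 1)]
    q = np.vstack(first_block_column_of_inverse(diag, lower, upper))
    reference = np.linalg.inv(assemble(diag, lower, upper))[:, :b]
    print(np.allclose(q, reference))
```

Every step is a dense solve or a dense matrix product on one block, which is where the xGEMM + xGESV + xGEMM pattern comes from. The last block column follows from the mirrored recursion, sweeping forward first and then backward.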

A Sparse Solver for Transport Problems running on GPUs

Runs on GPUs, compute bound

Interleaves with EV computation

Memory efficient

Much faster than sparse solvers

Whole simulation: O(Hours)

[Figure: Performance [log(FLOPS)] vs. Arithmetic Intensity [log(FLOPS/Byte)], annotated “~ 10x / 80x”]
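The axes of the plot, performance against arithmetic intensity, are those of a roofline-style model. A minimal sketch of that model, to illustrate why dense block operations can be compute bound while a sparse solve tends to be memory-bandwidth bound. The peak and bandwidth figures are made-up placeholders, not measurements from the talk, and the function names are hypothetical:

```python
def attainable_flops(arithmetic_intensity, peak_flops, mem_bandwidth):
    """Roofline bound: limited either by compute or by memory traffic.

    arithmetic_intensity: FLOPs per byte moved to or from memory.
    peak_flops:           peak compute rate in FLOP/s.
    mem_bandwidth:        memory bandwidth in byte/s.
    """
    return min(peak_flops, arithmetic_intensity * mem_bandwidth)


def gemm_intensity(n, bytes_per_element=8):
    """Idealized intensity of C = A @ B + C for n x n matrices:
    2 n^3 FLOPs, with A, B, C read once and C written once (4 n^2 elements)."""
    return (2.0 * n ** 3) / (4.0 * n ** 2 * bytes_per_element)


if __name__ == "__main__":
    peak = 1.0e12        # placeholder: 1 TFLOP/s
    bandwidth = 2.0e11   # placeholder: 200 GB/s
    # Rough textbook estimate for a sparse matrix-vector product:
    # 2 FLOPs per nonzero, about 12 bytes per nonzero (value plus index).
    sparse_ai = 2.0 / 12.0
    dense_ai = gemm_intensity(1024)
    for name, ai in (("sparse", sparse_ai), ("dense", dense_ai)):
        bound = attainable_flops(ai, peak, bandwidth)
        kind = "compute" if bound >= peak else "memory"
        print(f"{name}: {ai:.2f} FLOP/byte, {kind} bound")
```

With these placeholder numbers the dense case sits on the flat part of the roofline and the sparse case on the sloped part. The actual gap depends on the hardware and on block sizes, neither of which the slides give.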

Summary

Transforming a sparse problem to a dense one can be a good thing

Large speedup over state of the art (15x - 150x)

Significant increase in capacity (100’000 atoms → 10x - 100x)

Uses hybrid resources very efficiently (15 PF sustained)

Made ballistic ab-initio QT simulations for realistic structures a reality

(link to video: http://iis.ee.ethz.ch/~mauro/movie_Ag_Switch.avi)

Questions?