High Performance Computing Department, Simula Research Laboratory… · 2016. 1. 26. · Johannes...

Monodomain Simulations of Cardiac Electrophysiology over Unstructured 3D Heart Geometry

Patient Data

Johannes Langguth, High Performance Computing Department, Simula Research Laboratory, Oslo, Norway

Partitioning

Scalable Heterogeneous CPU-GPU Computations for Unstructured Tetrahedral Meshes

[Figure: partitioning of the global problem. Labels at node level: MPI separator; Node interior part. Labels within a node: MPI separator; CPU interior part; CPU-GPU separator; GPU 0 separator; GPU 0 interior part; GPU 1 separator; GPU 1 interior part.]
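The labels above distinguish interior parts from separators. The following is a minimal sketch of one way such a split could be computed, not the poster's method: it assumes a vertex-to-part assignment and an adjacency list are already available, and both `adjacency` and `part_of` are hypothetical names introduced for illustration. A vertex is treated as a separator vertex if any neighbor lies in a different part, and as interior otherwise.

```python
from collections import defaultdict

def split_interior_separator(adjacency, part_of):
    """Classify each vertex as interior or separator within its part.

    adjacency: dict mapping vertex -> iterable of neighboring vertices
    part_of:   dict mapping vertex -> part id (e.g. a node or a device)
    Both inputs are illustrative assumptions, not the poster's data layout.
    """
    interior = defaultdict(list)
    separator = defaultdict(list)
    for v in sorted(adjacency):
        own = part_of[v]
        # A vertex with a neighbor in another part sits on the boundary.
        if any(part_of[u] != own for u in adjacency[v]):
            separator[own].append(v)
        else:
            interior[own].append(v)
    return dict(interior), dict(separator)

# Tiny path graph 0-1-2-3-4-5 split into two parts "A" and "B".
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
part_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
interior, separator = split_interior_separator(adjacency, part_of)
```

Applying the same function once with parts set to nodes and again, inside each node, with parts set to devices would give a two-level split similar in spirit to the one in the figure labels, though the poster does not state how its hierarchy is built.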

3D Tetrahedral Mesh

[Figure: Heterogeneous Nodes. Link labels: QPI 32 GB/s; 51.2 GB/s (two links); 8 GB/s (two links); MPI over InfiniBand.]

[Figure: task schedule per thread (threads 0 to 15) over time during one compute round. Task labels: Compute CPU main part; Compute MPI separator; Compute CPU-GPU separator; Compute GPU 0 separator; Compute GPU 1 separator; Compute GPU 0 main part; Compute GPU 1 main part; Send MPI separator to other nodes; Receive MPI separator from other node; Send CPU-GPU separator to GPU 0; Send CPU-GPU separator to GPU 1; Send GPU 0 separator to GPU 1; Send GPU 1 separator to GPU 0; Send GPU 0 separator to host; Send GPU 1 separator to host; CPU swap; GPU 0 swap; GPU 1 swap; Idle.]
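The task labels suggest that separator values are computed and sent while the main (interior) parts are still being computed, followed by a buffer swap. The sketch below shows that overlap pattern in plain Python under stated assumptions: the ordering is one plausible reading of the labels, a background thread stands in for MPI and device transfers, and `update` and `send` are hypothetical callables rather than anything named on the poster.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_round(state, separator_ids, interior_ids, update, send):
    """One round: update separator entries, send them in the background,
    update interior entries meanwhile, then swap to the new buffer.

    `update` and `send` are hypothetical stand-ins for the numerical
    kernel and for the MPI / device transfers, respectively.
    """
    new_state = list(state)  # second buffer, swapped in at the end
    for i in separator_ids:
        new_state[i] = update(state, i)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The transfer runs concurrently with the interior loop below.
        pending = pool.submit(send, {i: new_state[i] for i in separator_ids})
        for i in interior_ids:
            new_state[i] = update(state, i)
        pending.result()  # wait for the transfer before swapping
    return new_state  # the "swap": the caller uses this as the next state

sent = []

def demo_update(state, i):
    # Toy kernel: mean of a cell and its two neighbors, clamped at the ends.
    lo, hi = max(i - 1, 0), min(i + 1, len(state) - 1)
    return (state[lo] + state[i] + state[hi]) / 3.0

result = compute_round([0.0, 3.0, 6.0, 9.0], [0, 3], [1, 2], demo_update, sent.append)
```

Computing the separator entries first is what makes the overlap possible: their values are ready to send before the larger interior computation starts, so communication time can be hidden behind it.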

Calcium Handling

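[Figure: simulation snapshots at t = 150 ms, 250 ms, 500 ms, 600 ms, 750 ms, 1100 ms, 1250 ms, and 1350 ms (panel labels A, B, C). Color scale "scalars" from -87.2 to 40, with ticks at -75, -50, -25, 0, and 25.]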

[Figure: Scaling Performance. Vertical axis: GFLOPs, 0 to 3000. Horizontal axis ticks: 16, 32, 64, 128. Series: GPU only; Heterogeneous; Communication Disabled.]
