Guy Gueritz Oil & Gas Business Development Mathieu Dubois Senior HPC Consultant

Guy Gueritz Oil & Gas Business Development - GTC On-Demand ...on-demand.gputechconf.com/gtc/2012/presentations/S... · Mathieu Dubois Senior HPC Consultant . ... supercomputer suite


Guy Gueritz Oil & Gas Business Development

Mathieu Dubois Senior HPC Consultant


2 ©Bull, 2012 GPU Tech Conference 2012 – San Jose

1. Hybrid Architectures for Seismic Imaging

- BULL Profile in HPC
- Hybrid Architectures
- Example: Reverse Time Migration

2. Parallel Programming for Hybrid Architectures

- GPU Activities at BULL: building an expertise
- Tools and Programming Environments
- Numerical methods
- Scalability


1. Hybrid Architectures For Seismic Imaging

Guy GUERITZ

Oil & Gas Business Development

Grenoble Advanced Competency & Services Center


[Figure: Bull group profile. Business divisions: BCS, BIS, BSS, BIP]

Revenue 2010 > 2013: €1.2 billion > €1.35 – 1.45 billion
(Maint. & PRS, Services, Hardw. & systems, Fulfillment, Critical systems)

Profitability 2010 > 2013: Direct margin, Indirect costs, EBIT; EBIT €50-60 million

Shareholders

Crescendo Ind.    20%
France Télécom     8%
FSI                5%
NEC                2%
Floating          65%
Total            100%

2011 figures

Revenue         €1,301 M (+4.6%)
Gross margin    +4.2%
EBIT            +23%
Employees       9,000


[Chart: income by year in M€, axis 0 to 200]

2007     37
2008     70
2009     98
2010    152
2011    181

180M€+ income in 2011 (w/o maintenance)

Three petaflop-scale systems

- 2010: Tera 100, the first petaflops-scale system ever designed and developed in Europe, one of the most efficient in its category (84% @ Linpack)

- 2010-2011: Genci / Curie (France) - 2 Pflops

- 2011-2012: IFERC – 1.5 Pflops

Other recent key projects

- KNMI (Netherlands): meteorology

- Barcelona Supercomputing Center (Spain): 186 Tflops (hybrid)

- Société Générale (France) : 350 Tflops

- Dassault Aviation (France) : 100 Tflops

- AWE (UK) : 250 Tflops

Launch of Extreme Factory (HPC pay per use) and Mobull (HPC mobile data center)

- Extreme Factory: Renault, Exa, LL Products, Classified

- Mobull: U_Perpignan, Cenaero


Services: Design, Architecture, Project Management, Optimisation

supercomputer suite

StoreWay

Hardware platforms

Software environments

Interconnect

Storage systems

Built from standard components, optimized by Bull’s innovation


- Structural Mechanics Implicit
- Structural Mechanics Explicit
- Computational Fluid Dynamics
- Electro-Magnetics
- Computational Chemistry
- Quantum Mechanics
- Reservoir Simulation
- Rendering / Ray Tracing
- Climate / Weather
- Ocean Simulation
- Data Analytics
- Molecular Dynamics
- Computational Biology
- Seismic Processing


GPU-based extensions and systems (bullx B505 accelerator blades, NVIDIA® Tesla™ M2090 GPU processors)

System                                        B505 blades   M2090 GPUs   GPU cores
TERA 100 (GPU-based extension)                198           396          202,752
CURIE (GPU-based extension)                   144           288          147,456
Barcelona Supercomputing Centre (GPU system)  126           252          129,024
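The three core counts quoted above are consistent with two GPUs per B505 blade and 512 CUDA cores per Tesla M2090. A minimal check follows; the per-blade and per-GPU constants are stated as assumptions and the function name is hypothetical.

```python
# Consistency check of the GPU extension figures quoted above.
# Assumptions: 2 GPUs per bullx B505 blade, 512 CUDA cores per Tesla M2090.
GPUS_PER_BLADE = 2
CORES_PER_M2090 = 512

# Blade counts as quoted on the slide.
BLADES = {
    "TERA 100": 198,
    "CURIE": 144,
    "Barcelona Supercomputing Centre": 126,
}


def derived_counts(blades):
    """Return the (gpus, cores) pair implied by a blade count."""
    gpus = blades * GPUS_PER_BLADE
    return gpus, gpus * CORES_PER_M2090


for name, blades in BLADES.items():
    gpus, cores = derived_counts(blades)
    print(f"{name}: {blades} blades, {gpus} GPUs, {cores} GPU cores")
```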


Petrobras: leader in the Brazilian petrochemical sector, and one of the largest integrated energy companies in the world

Need

A supercomputing system:

- to be installed at Petrobras’ new Data Center, at the University Campus of Rio de Janeiro
- equipped with GPU accelerator technology
- dedicated to the development of new subsurface imaging techniques to support oil exploration and production

Solution

A hybrid architecture coupling 66 general-purpose servers to 66 GPU systems:

- 66 bullx R422 E2 servers, i.e. 132 compute nodes or 1056 Intel® Xeon® 5500 cores, providing a peak performance of 12.4 Tflops
- 66 NVIDIA® Tesla S1070 GPU systems, i.e. 63360 cores, providing an additional theoretical performance of 246 Tflops
- 1 bullx R423 E2 service node
- Ultra fast InfiniBand QDR interconnect
- bullx cluster suite and Red Hat Enterprise Linux
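The node and core counts of the Petrobras configuration can be reproduced with a few multiplications. The sketch below assumes two compute nodes per R422 E2 server and eight Xeon cores per node (both implied by the 66 / 132 / 1056 figures), and four GPUs of 240 cores each per Tesla S1070 system; the function names are hypothetical.

```python
# Assumptions: 2 compute nodes per bullx R422 E2 server, 8 Xeon cores per node,
# 4 GPUs per Tesla S1070 system, 240 cores per GPU.
SERVERS = 66
S1070_SYSTEMS = 66


def cpu_side(servers, nodes_per_server=2, cores_per_node=8):
    """Return (compute nodes, CPU cores) for a number of servers."""
    nodes = servers * nodes_per_server
    return nodes, nodes * cores_per_node


def gpu_side(systems, gpus_per_system=4, cores_per_gpu=240):
    """Return (GPUs, GPU cores) for a number of S1070 systems."""
    gpus = systems * gpus_per_system
    return gpus, gpus * cores_per_gpu


nodes, cpu_cores = cpu_side(SERVERS)
gpus, gpu_cores = gpu_side(S1070_SYSTEMS)
print(f"{nodes} compute nodes, {cpu_cores} CPU cores")
print(f"{gpus} GPUs, {gpu_cores} GPU cores")
```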


source:

exascale.org


(Animation courtesy of the Institute of Geophysics in Hamburg)


Forward Pass

• First Recursion – forward in time
• Model downgoing wavefield, store snapshots of wavefield at set time intervals

Backward Pass

• Second Recursion – reverse time
• Compute backward extrapolation of wavefield snapshots starting with receiver data

Correlate Forward + Backward Snapshots

• Apply imaging condition
• Correlate forward + backward samples together
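The three steps can be illustrated with a toy one-dimensional example of the zero-lag cross-correlation imaging condition. This is only a sketch: the wavefields are idealised one-cell pulses rather than solutions of the wave equation, and every name in it is hypothetical.

```python
# Toy illustration of the zero-lag cross-correlation imaging condition.
# The "wavefields" are idealised unit pulses moving one cell per timestep.

def pulse_snapshots(n_cells, n_steps, position_at):
    """Build one snapshot per timestep holding a unit pulse at position_at(t)."""
    snapshots = []
    for t in range(n_steps):
        field = [0.0] * n_cells
        x = position_at(t)
        if 0 <= x < n_cells:
            field[x] = 1.0
        snapshots.append(field)
    return snapshots


def imaging_condition(source_snaps, receiver_snaps):
    """image[x] = sum over t of source(x, t) * receiver(x, t)."""
    n_cells = len(source_snaps[0])
    image = [0.0] * n_cells
    for src, rcv in zip(source_snaps, receiver_snaps):
        for x in range(n_cells):
            image[x] += src[x] * rcv[x]
    return image


N_CELLS, N_STEPS, REFLECTOR = 7, 7, 3

# Forward pass: the source pulse leaves x = 0 at t = 0 and travels downwards.
source = pulse_snapshots(N_CELLS, N_STEPS, lambda t: t)

# Backward pass: the echo recorded at x = 0 at t = 2 * REFLECTOR,
# extrapolated backwards in time from the receiver.
receiver = pulse_snapshots(N_CELLS, N_STEPS, lambda t: 2 * REFLECTOR - t)

# Correlation: the two fields only coincide at the reflector position.
image = imaging_condition(source, receiver)
print(image)
```

The image is non-zero only where the forward and backward fields are in the same cell at the same time, which in this toy setup is the reflector.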


Turning waves

Prismatic waves

Diving waves

Strong reflections

Multiples


3D Gridded Model

- Wave equation discretized into derivatives at set timesteps

- 3D grid size & resolution corresponds to wavelength (max. frequency) & aperture size

Time Approximation by Finite Differences

- Differential equations transformed into finite difference equations at set timesteps

- Explicit scheme: one element is calculated recursively from several previously calculated points & timesteps

Fourier Methods

- Transforms between time & frequency domains

- Eliminates some cumulative errors found in FD approximations
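As an illustration of the explicit scheme, the sketch below advances a one-dimensional wavefield with a second-order stencil in time and space. It is a reduced stand-in for the 3D case, not the code used in the study; the function names and the fixed (zero) end points are assumptions.

```python
def step(p_prev, p_curr, courant):
    """Advance a 1D wavefield by one timestep with the explicit scheme.

    Each new value depends on three points of the current timestep and one
    point of the previous timestep. courant = v * dt / dx (dimensionless).
    The two end points are held at zero.
    """
    c2 = courant * courant
    p_next = [0.0] * len(p_curr)
    for i in range(1, len(p_curr) - 1):
        laplacian = p_curr[i + 1] - 2.0 * p_curr[i] + p_curr[i - 1]
        p_next[i] = 2.0 * p_curr[i] - p_prev[i] + c2 * laplacian
    return p_next


def propagate(p_prev, p_curr, courant, n_steps):
    """Apply the recursion n_steps times and return the latest wavefield."""
    for _ in range(n_steps):
        p_prev, p_curr = p_curr, step(p_prev, p_curr, courant)
    return p_curr


# A unit pulse that was in cell 2 one step ago and is in cell 3 now,
# i.e. a pulse travelling to the right.
n = 10
previous = [0.0] * n
previous[2] = 1.0
current = [0.0] * n
current[3] = 1.0

after = propagate(previous, current, courant=1.0, n_steps=3)
print(after.index(1.0))
```

With a Courant number of exactly 1 this 1D scheme moves the pulse one cell per step without distortion, which makes the recursion easy to follow by hand.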


Grid size

- Frequency content
- Choice of FD scheme

Aperture

- Too big = too costly computationally
- Too small = depending on geology, may miss reflections

Storing downgoing wavefield

- Snapshots
- 'Virtual receivers' at model boundaries
- Random boundaries

Code parallelization

- 3D loops in OpenMP, CUDA

Domain decomposition

- MPI implemented to fit local memory
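The domain decomposition point can be sketched as a slab split along one grid axis, with halo planes exchanged between neighbouring ranks. This is a bookkeeping illustration only (no MPI calls); the function names, the slab layout and the halo width are assumptions.

```python
def decompose(n_planes, n_ranks, halo):
    """Split n_planes grid planes into n_ranks contiguous slabs.

    Returns one (start, stop, lo_halo, hi_halo) tuple per rank: [start, stop)
    is the owned range, and the halo widths give the number of neighbouring
    planes that must be received on each side before a stencil update.
    """
    base, extra = divmod(n_planes, n_ranks)
    slabs = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        stop = start + size
        lo_halo = halo if rank > 0 else 0
        hi_halo = halo if rank < n_ranks - 1 else 0
        slabs.append((start, stop, lo_halo, hi_halo))
        start = stop
    return slabs


def slab_bytes(slab, nx, ny, bytes_per_cell=4):
    """Memory needed for one wavefield array on a rank, halos included."""
    start, stop, lo_halo, hi_halo = slab
    return (stop - start + lo_halo + hi_halo) * nx * ny * bytes_per_cell


slabs = decompose(n_planes=10, n_ranks=3, halo=2)
for rank, slab in enumerate(slabs):
    print(rank, slab, slab_bytes(slab, nx=100, ny=100))
```

Choosing the number of ranks so that `slab_bytes` fits the memory of one node (or one GPU) is the sense in which the decomposition is made "to fit local memory".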


Storing all wavefield snapshots

- Simple method but generates enormous data

- Requires large capacity, fast-access on-node storage

- Node I/O impacts performance

Checkpointing

- Storing pairs of consecutive snapshots at specified time intervals

Storing boundary history only

- Record wavefield at edges & bottom of model

- 'Virtual' receivers

- Recursive calculation, so can regenerate downgoing wavefield

Random boundaries

- Make boundaries random reflectors

- Extrapolate twice, once forward (no storage), once backward (generates downgoing wavefield)
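A rough storage estimate shows why the choice of strategy matters. The sketch below compares saving the full wavefield at every timestep with saving only the four sides and the bottom of the model. The grid size, number of timesteps and boundary thickness are hypothetical, and the overlap of faces along edges is ignored.

```python
def snapshot_bytes(nx, ny, nz, n_saved, bytes_per_cell=4):
    """Storage for n_saved full single-precision wavefield snapshots."""
    return nx * ny * nz * bytes_per_cell * n_saved


def boundary_bytes(nx, ny, nz, n_steps, thickness=1, bytes_per_cell=4):
    """Storage for the four vertical sides plus the bottom at every timestep.

    Each face is `thickness` cells deep; shared edges are counted twice,
    so this is a slight over-estimate.
    """
    sides = 2 * (nx * nz + ny * nz)
    bottom = nx * ny
    return (sides + bottom) * thickness * bytes_per_cell * n_steps


# Hypothetical model: 512^3 cells, 4000 timesteps.
NX = NY = NZ = 512
N_STEPS = 4000

full = snapshot_bytes(NX, NY, NZ, N_STEPS)
edges = boundary_bytes(NX, NY, NZ, N_STEPS)
print(f"all snapshots : {full / 1e12:.2f} TB")
print(f"boundary only : {edges / 1e9:.2f} GB")
print(f"ratio         : {full / edges:.1f}")
```

Checkpointing sits between the two extremes: it stores full snapshots but only at selected intervals, so its cost scales with `n_saved` rather than with the number of timesteps.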


[Figure sequence: node architecture, built up step by step]

- Multi-core CPU sockets
- CPUs connected to RAM via independent memory channels
- I/O Hub: 2 – 4 GPUs per node
- I/O Hub: local mass storage (spinning or solid-state drives)
- I/O Hub: node – node interconnect


[Figure: bullx product range]

- Water cooling
- bullx S supernodes
- bullx blades (B500 series and DLC B700 series)
- bullx R series
- Storage
- Architecture
- ACCELERATORS
- supercomputer suite


• 2 x Intel Xeon 5600
• 2 x NVIDIA M2090
• 2 x IB QDR

7U

2.1 TFLOPS

Embedded Accelerator for high performance with high energy efficiency


Front view: 2 x CPUs, 2 x GPUs

Exploded view

Double-width blade

- 2 NVIDIA Tesla M2090 GPUs
- 2 Intel® Xeon® 5600 quad/hexa-core CPUs
- 1 dedicated PCI-e 16x connection for each GPU
- Double InfiniBand QDR connections between blades


[Figure: block diagrams comparing a Multi GPU System with the bullx B505 Accelerator Blade. Components shown: Westmere EP CPUs linked by QPI, I/O Controllers (Tylersburg), GPUs, IB, GBE. Link labels: QPI 12.8GB/s each direction; 31.2GB/s; PCIe 16x 8GB/s; PCIe 8x 4GB/s]


RTM Example: Salt Diapir

Object of study

- Demonstrate imaging quality of RTM
- Show GPU speedup

Paradigm ECHOS 1.1

- Uses AXE RTM libraries

Multi-client data imaged with PSDM

Data courtesy of J. Schlegtenhorst


2 cables, 8 streamers each

(2*8) * 408 traces

16*3.3 MB = 52.8 MB

Streamer interval 100m

Far offset 5300m

Shot pattern 5000m X 700m

Sub-volume 10 Km x 6 Km x 12 Km

12.5m x 12.5m grid

fmax=25 Hz - fmax=40 Hz

Data courtesy of J. Schlegtenhorst


CDP Grid Inline 2701

Data courtesy of J. Schlegtenhorst


Optimum Grid Inline 2701

Data courtesy of J. Schlegtenhorst


CDP Grid Inline 2891

Data courtesy of J. Schlegtenhorst


Optimum Grid Inline 2891

Data courtesy of J. Schlegtenhorst


Single Shot Runtime (30Hz)

- New B510 Sandybridge blades (16 cores, 4 channels to memory): RTM image 2h 41m
- B505 Westmere GPU blades (2 x M2090 GPU): RTM image 15m
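The two runtimes correspond to a blade-to-blade speedup of roughly 10.7x for this single shot at 30 Hz. The short calculation below uses only the two figures quoted above; the helper name is hypothetical.

```python
def to_minutes(hours=0, minutes=0):
    """Convert an hours/minutes pair to minutes."""
    return 60 * hours + minutes


cpu_minutes = to_minutes(hours=2, minutes=41)  # B510 blade, 16 Sandy Bridge cores
gpu_minutes = to_minutes(minutes=15)           # B505 blade, 2 x M2090

speedup = cpu_minutes / gpu_minutes
print(f"{cpu_minutes} min vs {gpu_minutes} min: {speedup:.1f}x")
```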


Run Times: 16 cores (1 node, 2 sockets SandyBridge)

               25 Hz         30 Hz         35 Hz         40 Hz
OPTIMUM GRID   43 min        1 h 17 min    2 h 07 min    3 h 23 min
CDP GRID       2 h 19 min    2 h 41 min    2 h 58 min    3 h 28 min
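Expressed as ratios, the table shows the advantage of the optimum grid shrinking as the maximum frequency rises. The sketch below only restates the table in minutes and divides the two rows; the dictionary and function names are hypothetical.

```python
# Run times from the table above, converted to minutes.
RUN_TIMES_MIN = {
    # frequency (Hz): (optimum grid, CDP grid)
    25: (43, 2 * 60 + 19),
    30: (1 * 60 + 17, 2 * 60 + 41),
    35: (2 * 60 + 7, 2 * 60 + 58),
    40: (3 * 60 + 23, 3 * 60 + 28),
}


def cdp_over_optimum(freq_hz):
    """Ratio of CDP-grid run time to optimum-grid run time at one frequency."""
    optimum, cdp = RUN_TIMES_MIN[freq_hz]
    return cdp / optimum


for freq in sorted(RUN_TIMES_MIN):
    print(f"{freq} Hz: CDP grid takes {cdp_over_optimum(freq):.2f}x the optimum-grid time")
```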


Choice of hybrid architecture depends on several factors

- Algorithm & numerical method employed

- Correlation strategy used (local storage requirements)

- Grid & aperture sizes

- Frequencies involved

- Size of survey

As RTM becomes more generally used, system scalability will be of critical importance

- Processor & co-processor technologies evolving rapidly

- Software environment maturing

- Economics of hybrid approach gaining hold


2. Parallel Programming For Hybrid Architectures

Mathieu DUBOIS

Senior Application Engineer - Hardware Accelerators Expert

Applications & Performance Team

Grenoble Advanced Competency & Services Center


3 sites: Grenoble (A), Angers (B), Les Clayes-sous-Bois (C)

15 full-time dedicated engineers

- 14 performance engineers
- 1 system administrator
- coming from different scientific domains
- Software & Hardware Expertise

2 benchmarking systems

- Benchmarking System in Angers: TOP500 ranked (110), 107 Tflops
- HPC Lab in Grenoble


Presales

- Common operations
- Technical answers to “calls for tender”
- Consulting (architecture)

Services

- “Extreme Computing Competence Center”
- Specific mission: porting, integration and optimization of user applications in their bullx environment
- Training

Support

- High Level support (L3)

Technology watch

Development of internal tools


Activities

- Training
- Benchmarking
- Proof Of Concept / Code Migration
- Code Optimisation
- Technology Watch & Performance Evaluation

Areas

- Physics, Chemistry, Biology
- Oil & Gas
- Life Science
- Security & Finance


Barcelona Supercomputing Center

•252 M2090

•103 Tflop Linpack score

•Ranked 114 at TOP500

•Ranked 7 at GREEN500 (#1 in Europe)

GENCI

•288 M2090

•110 Tflop Linpack score

•Ranked 102 at TOP500

•Ranked 8 at GREEN500

CEA - Tera 100

•390 M2090

•154 Tflop Linpack score

•Ranked 75 at TOP500


BULL’s expertise in GPU environments is well recognized


Award and Active Development of Major Scientific Applications

2010 - First Prize: Dimitri Komatitsch

- SPECFEM3D (geodynamics)
- GPU version in development

2009 - First Prize: Luigi Genovese

- BigDFT (nanosciences)
- CUDA & OpenCL version available


[Figure: GPU programming environments placed along two axes, listed here in slide order]

Performance: PGI Accelerator, HMPP, OpenCL, Fortran CUDA, CUDA C

Simplicity: PGI Accelerator, HMPP, Fortran CUDA, CUDA C, OpenCL


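Isotropic Wave Equation:

$$\frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2} + \frac{\partial^2 P}{\partial z^2} = \frac{1}{v^2}\,\frac{\partial^2 P}{\partial t^2}$$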

order-k in space stencil (here k is 4)

memory bandwidth bound code
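For k = 4, the spatial part of the equation can be sketched with the standard fourth-order central-difference weights (-1/12, 4/3, -5/2, 4/3, -1/12) along each axis. The code below is a plain Python illustration on nested lists, not the production kernel; the function name, the uniform spacing `h` and the test field are assumptions.

```python
# Fourth-order central-difference weights for a second derivative.
C0, C1, C2 = -5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0


def laplacian_4th(p, i, j, k, h):
    """Order-4 Laplacian of the 3D field p at interior point (i, j, k).

    Reads 13 values: the centre plus 4 neighbours along each of the 3 axes.
    """
    acc = 3.0 * C0 * p[i][j][k]
    for r, c in ((1, C1), (2, C2)):
        acc += c * (p[i + r][j][k] + p[i - r][j][k]
                    + p[i][j + r][k] + p[i][j - r][k]
                    + p[i][j][k + r] + p[i][j][k - r])
    return acc / (h * h)


# Test field x^2 + 2 y^2 + 3 z^2, whose exact Laplacian is 2 + 4 + 6 = 12.
n, h = 7, 0.5
field = [[[(i * h) ** 2 + 2.0 * (j * h) ** 2 + 3.0 * (k * h) ** 2
           for k in range(n)]
          for j in range(n)]
         for i in range(n)]

value = laplacian_4th(field, 3, 3, 3, h)
print(value)
```

Each output point needs 13 reads for about as many multiply-adds, so the ratio of arithmetic to memory traffic is low, which is consistent with the code being memory bandwidth bound.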


[Figure: CUDA execution and memory hierarchy]

- thread: per-thread local memory
- block of threads: per-block shared memory
- sequential kernels (kernel 1, kernel 2): per-GPU global memory


http://developer.download.nvidia.com/CUDA/CUDA_Zone/papers/gpu_3dfd_rev.pdf

First Approach:

- 3k + 1 elements needed for 1 output value

Better Approach:

- Some data are reused for several output values
- Perform the calculation from shared memory
- Latency of shared memory is 2 orders of magnitude lower than that of global memory

One order of magnitude increased performance comparing one GPU to one CPU core

One can also overlap computation with data transfer when saving the output wavefield.
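The benefit of reuse can be quantified with a simplified count of global-memory reads per output value. The sketch assumes one common tiling scheme: a 2D tile plus its halo is staged once per plane in shared memory, and values along the third axis are kept in registers. The tile size and the function names are hypothetical, and the accounting ignores caches.

```python
def naive_reads_per_output(k):
    """Order-k stencil in 3D: k neighbours per axis plus the centre point."""
    return 3 * k + 1


def tiled_reads_per_output(bx, by, k):
    """Global-memory reads per output for a bx-by-by tile staged in shared memory.

    The tile needs a halo of k/2 cells on each of its four sides
    (corner cells are not needed by an axis-aligned stencil).
    """
    radius = k // 2
    loaded = bx * by + 2 * radius * (bx + by)
    return loaded / (bx * by)


K = 4
print("first approach :", naive_reads_per_output(K), "reads per output")
print("16x16 tile     :", tiled_reads_per_output(16, 16, K), "reads per output")
```

Under these assumptions a 16x16 tile reduces the traffic from 13 reads to 1.5 reads per output value for k = 4.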


Code based on an extension of the pseudo spectral method called the pseudo-analytic model

Modifies the Fourier Transform of the Laplacian operator

correcting the propagation errors from the finite differences scheme

Obtain nearly non-dispersive wave propagation

Original source code in Fortran 90 using OpenMP and MKL FFTs

One shot per node

Obvious Hot Spots :

FFTs

Laplacian
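To show why FFTs and the Laplacian are coupled hot spots, the sketch below evaluates a second derivative the plain pseudo-spectral way: transform, multiply by the Fourier symbol -k², transform back. It does not include the pseudo-analytic modification of that symbol described above, and a naive O(n²) DFT stands in for the MKL or CUFFT calls; all names are hypothetical.

```python
import cmath
import math


def dft(values, sign):
    """Plain O(n^2) discrete Fourier transform.

    sign = -1 gives the forward transform, sign = +1 the unscaled inverse.
    """
    n = len(values)
    return [sum(values[idx] * cmath.exp(sign * 2j * math.pi * q * idx / n)
                for idx in range(n))
            for q in range(n)]


def spectral_second_derivative(samples, length):
    """Second derivative of a periodic signal sampled on [0, length)."""
    n = len(samples)
    spectrum = dft(samples, -1)
    for q in range(n):
        mode = q if q <= n // 2 else q - n      # signed mode number
        k = 2.0 * math.pi * mode / length       # wavenumber
        spectrum[q] *= -(k * k)                 # Fourier symbol of d2/dx2
    return [value.real / n for value in dft(spectrum, +1)]


n, length = 16, 2.0 * math.pi
xs = [length * idx / n for idx in range(n)]
wave = [math.sin(3.0 * x) for x in xs]

second = spectral_second_derivative(wave, length)
# The exact second derivative of sin(3x) is -9 sin(3x).
max_error = max(abs(d + 9.0 * w) for d, w in zip(second, wave))
print(max_error)
```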


[Chart: time (sec) per kernel, axis 0 to 400: kernel 1, kernel 2, kernel 3, kernel 4, kernel 5, FFTs]

85% of the time is spent in one subroutine

In this subroutine 6 kernels are identified:

- kernel1: 31%
- kernel2: 13%
- kernel3: 2%
- kernel4: 1%
- kernel5: 18%
- FFT: 35%
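Because 15% of the run time lies outside the profiled subroutine, Amdahl's law bounds what porting that subroutine alone can deliver. The sketch below uses the 85% figure and the kernel shares quoted above; the kernel speedup factors in the loop are hypothetical values chosen for illustration, not measurements.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: the non-accelerated fraction keeps its original run time."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Shares of the hot subroutine, in percent, as listed above.
KERNEL_SHARE = {
    "kernel1": 31, "kernel2": 13, "kernel3": 2,
    "kernel4": 1, "kernel5": 18, "FFT": 35,
}
HOT_FRACTION = 0.85  # fraction of total run time spent in the subroutine

print("shares add up to", sum(KERNEL_SHARE.values()), "%")
for factor in (2, 4, 10):  # hypothetical speedups of the whole subroutine
    print(f"subroutine {factor}x faster -> application "
          f"{overall_speedup(HOT_FRACTION, factor):.2f}x faster")
print(f"upper bound: {1.0 / (1.0 - HOT_FRACTION):.2f}x")
```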


bullx B505 server with:

- 2 Intel Westmere 4-core processors @ 2.67 GHz
- 24 GB DDR3 @ 1333 MHz
- 2 NVIDIA M2090 GPUs

Software and tools

- NVIDIA CUDA 4.1
- Intel Compilers version 12 and Intel MPI 4
- PGI compilers 11

Use the CUFFT library and write call wrappers

Write a CUDA kernel for each of the 5 subroutine kernels (avoid transfers)

Compare CUDA C, Fortran CUDA, HMPP


Simplified Fortran porting

- No need for Fortran C CUDA interfaces
- No problem with conversion of unit memory stride access in multidimensional arrays
- API simplified and identical to Fortran 90

!Define variables on CPUs
real, pinned, allocatable, dimension(:,:,:) :: A_host

!Define variables on GPUs
real, device, allocatable, dimension(:,:,:) :: A_device

!allocate them in a single call
allocate( A_host(nx,ny,nz), A_device(nx,ny,nz) )

!transfer data between CPU/GPU
A_device = A_host

Same performance between CUDA C & Fortran CUDA


Before HMPP 3: no possibility to call external CUDA libraries

Now

#pragma hmpp <cublas> group, target=cuda
#pragma hmpp <cublas> acquire
#pragma hmppalt cublas call, name="cublasSgemm"
sgemm(trans,trans,n,n,n,alpha,A,n,B,n,beta,C,n)

Replace at compilation with

#pragma hmpp <cublas> group, target=cuda
#pragma hmpp <cublas> acquire
#pragma hmppalt cublas declare, name="cublasSgemm", extend(error,…), fallback=true
void MycublaSgemm(int* proxyError, char transa, char transb, int m, int n, int k,
                  float alpha, const float *A, int lda, const float *B, int ldb,
                  float beta, float *C, int ldc)
{
  deviceDataA = hmpprt_data_get_device_adress(A);
  (...)
  cublasSgemm(transa,transb,m,n,k,alpha,deviceDataA,lda,deviceDataB,ldb,beta,deviceDataC,ldc);
}


4x speed-up between 1 M2090 and 8 Xeon cores

Data transfers are reduced to 1 second

[Chart: time (sec) per kernel, axis 0 to 400, for kernel 1 to kernel 5 and FFTs; series: 8 cores Xeon, M2090 GPU]


RTM is an embarrassingly parallel application (over shots)

On standard CPU servers:

- Compute 1 shot per node
- Take advantage of all the CPU cores for full MKL FFT performance
- Overall performance will increase with new generation processors

On GPU servers:

- 4x speed-up for one shot using one GPU
- Compute 1 shot per GPU available on the server
- Halve the number of servers for the same speed-up
- Keep the same number of servers but with double the speed-up

Limitations:

- Small data set for benchmarking
- Problem size may be too big to fit in today's GPU memory


BULL has built its expertise on real customer requests:

- Trainings
- POC in Oil & Gas, Finance, Life Science, Material Science
- Advice for cluster architecture definition
- Pro-activity

BULL's expertise is recognized:

- Successful POC with significant speed-up and cost reduction
- Acknowledgment in scientific publications
- Help with code migration and optimization
