Porting Scalable Parallel CFD Application HiFUN on NVIDIA GPU

D. V. Krishnababu (1), N. Munikrishna (1), Nikhil Vijay Shende (1), N. Balakrishnan (2), Thejaswi Rao (3)

1. S & I Engineering Solutions Pvt. Ltd., Bangalore, India
2. Aerospace Engineering, Indian Institute of Science, Bangalore, India
3. NVIDIA Graphics Pvt. Ltd., Bangalore, India

GPU Technology Conference, Silicon Valley, March 26–29, 2018

Introduction (http://www.sandi.co.in)

The HiFUN Software
• High Resolution Flow Solver on Unstructured Meshes.
• A Computational Fluid Dynamics (CFD) Flow Solver.
• Primary product of the company SandI.
• Robust, fast, accurate and efficient tool.

About SandI
• A technology company.
• Incubated at the Indian Institute of Science, Bangalore.
• Promotes high-end CFD technologies with uncompromising quality standards.

Features of HiFUN (http://www.sandi.co.in/home/products)

General

Well Validated

Figures: AIAA DPW, SPICES, AIAA HiLiftPW.

Super Scalable Workload: 165 Million Volumes

Simulation   CPU Cores   Time (Hours / Days)
RANS         256         30 / 1.25
RANS         10000       1
URANS        256         108 / 4.5
URANS        10000       3
DES          256         525 / 22
DES          10000       15
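The table can also be read as a resource-cost comparison. The sketch below is minimal and assumes the wall-clock times are in hours exactly as tabulated. The 10000-core times look rounded, so the results are approximate. Core-hours are cores × hours. The ratio of reference core-hours to large-run core-hours equals the strong-scaling parallel efficiency, which is the actual speed-up divided by the ideal speed-up of 10000/256 ≈ 39.

```python
"""Core-hours and strong-scaling efficiency for the 165-million-volume runs."""

# Wall-clock hours keyed by CPU core count, taken from the table above.
RUNS = {
    "RANS": {256: 30.0, 10000: 1.0},
    "URANS": {256: 108.0, 10000: 3.0},
    "DES": {256: 525.0, 10000: 15.0},
}


def core_hours(cores: int, hours: float) -> float:
    """Total compute resource consumed by one run."""
    return cores * hours


def efficiency(times: dict, ref_cores: int = 256, cores: int = 10000) -> float:
    """Parallel efficiency = actual speed-up / ideal speed-up.

    Algebraically this equals reference core-hours / large-run core-hours.
    """
    actual = times[ref_cores] / times[cores]
    ideal = cores / ref_cores
    return actual / ideal


if __name__ == "__main__":
    for name, times in RUNS.items():
        print(
            f"{name:5s} {core_hours(256, times[256]):8.0f} -> "
            f"{core_hours(10000, times[10000]):8.0f} core-hours, "
            f"efficiency {efficiency(times):.0%}"
        )
```

This gives efficiencies of roughly 77% (RANS), 92% (URANS) and 90% (DES). The 10000-core runs finish 30 to 36 times sooner and consume somewhat more core-hours.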

SandI–NVIDIA Collaboration

2014 - Joint Development Initiative Kicks Off

2015 - NVIDIA Innovation Award

2016 - GTCx Mumbai; HiFUN in GPU Apps Catalogue; GTC 2016: Poster Presentation

2018 - GTC 2018

Way Ahead
• HiFUN on NVIDIA Pascal and Volta GPUs.
• NVLink with IBM Power CPUs.

HiFUN on NVIDIA GPU

Hybrid Supercomputers
• Consist of CPUs and NVIDIA GPUs.
• Require less power to achieve the same FLOPS.
• Require less cooling and space.

GPU
• Thousands of computing cores sharing the same RAM.
• Higher memory bandwidth.
• High data transfer overheads with the CPU.

Parallelization Model on GPU
• Shared memory.
• Many FLOPS per byte of data transferred from CPU to GPU.
• Rethink the parallelization of CFD algorithms.
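A simple cost model shows why many FLOPS are needed per byte transferred. In this minimal sketch, a kernel must move `bytes_moved` bytes over the CPU-GPU link and perform `flops` operations, and transfer and compute are assumed not to overlap. The rates in the example are hypothetical placeholders, not measurements from any system in this talk. Offload pays off only when B/bw + F/R_gpu < F/R_cpu, that is, when F/B exceeds the break-even intensity computed below.

```python
"""Break-even arithmetic intensity for offloading a kernel to the GPU."""


def cpu_time(flops: float, cpu_rate: float) -> float:
    """Seconds to run the kernel on the CPU (no transfer needed)."""
    return flops / cpu_rate


def offload_time(flops: float, bytes_moved: float, gpu_rate: float, link_bw: float) -> float:
    """Seconds to ship the data over the link, then compute on the GPU (no overlap)."""
    return bytes_moved / link_bw + flops / gpu_rate


def breakeven_intensity(cpu_rate: float, gpu_rate: float, link_bw: float) -> float:
    """FLOPs per transferred byte at which offload and CPU-only times are equal."""
    if gpu_rate <= cpu_rate:
        raise ValueError("offload cannot pay off unless the GPU is faster")
    return (1.0 / link_bw) / (1.0 / cpu_rate - 1.0 / gpu_rate)


if __name__ == "__main__":
    # Hypothetical rates: 0.5 TFLOP/s CPU, 2 TFLOP/s GPU, 12 GB/s link.
    cpu, gpu, bw = 0.5e12, 2.0e12, 12e9
    print(f"offload pays off above {breakeven_intensity(cpu, gpu, bw):.1f} FLOPs/byte")
```

With these placeholder rates the threshold is about 56 FLOPs per byte. That is well above the intensity of many memory-bound CFD kernels, so the data has to be reused on the GPU across many operations rather than shipped for each one.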

Parallelization Challenges

• General-purpose algorithms.
• Implicit schemes: global data dependence.
• Complex, multi-layered unstructured data structures.
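One common way to expose parallelism in edge-based residual loops on unstructured meshes is edge coloring. It is shown here only as an illustration; the slides do not say HiFUN uses it. Each edge scatters its flux to its two end nodes, so two GPU threads handling edges that share a node would race on the same memory location. Greedily assigning colors so that no two edges of one color share a node makes every color class safe to process fully in parallel. Atomic updates are the usual alternative. The sketch runs the colored loops serially in Python, and the mesh, fluxes and sign convention are hypothetical.

```python
"""Greedy edge coloring for race-free scatter on an unstructured mesh."""

from collections import defaultdict


def color_edges(edges):
    """Assign each edge the smallest color not already used at either end node."""
    used = defaultdict(set)  # node -> colors of edges already touching it
    colors = []
    for a, b in edges:
        c = 0
        while c in used[a] or c in used[b]:
            c += 1
        colors.append(c)
        used[a].add(c)
        used[b].add(c)
    return colors


def accumulate_residual(n_nodes, edges, fluxes, colors):
    """Scatter edge fluxes to nodes one color at a time.

    Convention (hypothetical): flux f leaves node a and enters node b.
    Edges within one color touch disjoint nodes, so the inner loop could
    run in parallel on a GPU without write conflicts.
    """
    residual = [0.0] * n_nodes
    for color in sorted(set(colors)):
        for (a, b), f, c in zip(edges, fluxes, colors):
            if c == color:
                residual[a] += f
                residual[b] -= f
    return residual


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # hypothetical 4-node mesh
    colors = color_edges(edges)
    print(colors)  # [0, 1, 2, 0]
    print(accumulate_residual(4, edges, [1.0, 2.0, 3.0, 4.0], colors))
```

Greedy coloring does not guarantee the minimum number of colors, but fewer colors mainly affects how many separate parallel launches are needed, not correctness.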

Constraints
• No compromise on distributed-memory scalability.
• Source code maintainability should not suffer.
• Software portability should not suffer.

Parallel Strategy

• Accelerate single-node performance via an offload model.
• Hybrid: MPI and OpenACC directives.

Offload Model
• Computationally intensive parts are offloaded to the GPU.
• Optimal data communication between CPU and GPU.
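The benefit of keeping data on the GPU can be estimated by counting the bytes that cross the CPU-GPU link. This is a minimal sketch, and all sizes in it are hypothetical. In a naive offload, every kernel copies its fields to the device and back. With resident data, the fields cross the link once in each direction for the whole run, and only the partition-boundary (halo) buffers needed for the MPI exchange travel every iteration. In OpenACC, the resident pattern corresponds to wrapping the iteration loop in a `data` region and moving halo buffers with `update` directives. The slides do not say whether HiFUN is organized exactly this way.

```python
"""Bytes moved over the CPU-GPU link: per-kernel copies vs resident data."""


def bytes_naive(n_iters: int, n_kernels: int, field_bytes: int) -> int:
    """Each kernel copies its fields to the GPU and back, every iteration."""
    return n_iters * n_kernels * 2 * field_bytes


def bytes_resident(n_iters: int, field_bytes: int, halo_bytes: int) -> int:
    """Fields copied in once and out once; halos go to the host and back each iteration."""
    return 2 * field_bytes + n_iters * 2 * halo_bytes


if __name__ == "__main__":
    # Hypothetical run: 1000 iterations with 10 kernels each,
    # 1 GB of field data per rank and 10 MB of halo data per exchange.
    iters, kernels, field, halo = 1000, 10, 10**9, 10**7
    naive = bytes_naive(iters, kernels, field)
    resident = bytes_resident(iters, field, halo)
    print(f"naive: {naive / 1e12:.1f} TB, resident: {resident / 1e9:.1f} GB, "
          f"ratio {naive / resident:.0f}x")
```

The resident traffic grows with the halo size rather than the full field size. This is one reason the partition boundaries, and hence the MPI decomposition, still matter on the GPU.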

Figures: Onera M6 Wing, NASA CRM, NASA Trap Wing.

Configurations & Workloads (Million)
• Onera M6 Wing: 1.1, 9.3, 12.12, 15.4
• NASA CRM: 6.2, 26.5, 30
• NASA Trap Wing: 20, 66

Simulation Type
• Steady RANS simulations.

Computing Platform: NVIDIA PSG

Node Configuration
• Two 16-core (hexadeca-core) Intel(R) Xeon(R) Haswell processors.
• Eight NVIDIA Tesla K80 GPUs.
• GPU memory = 12 GB.
• Total CPU memory per node = 256 GB.
• InfiniBand interconnect.

Software
• PGI Compiler 16.7
• Open MPI 1.10.2
• OpenACC 2.0
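With several GPUs per node and a hybrid MPI + OpenACC code, each MPI rank has to be bound to one device. A common convention, assumed here rather than taken from the slides, is round-robin binding by node-local rank. Open MPI exposes the node-local rank through the `OMPI_COMM_WORLD_LOCAL_RANK` environment variable. In C or Fortran, the chosen index would then be passed to the OpenACC runtime routine `acc_set_device_num`. The Python sketch below only computes the mapping.

```python
"""Round-robin binding of node-local MPI ranks to GPUs (illustrative)."""

import os
from typing import Mapping, Optional


def local_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Node-local MPI rank as exported by Open MPI's launcher.

    Falls back to 0 when the variable is absent (e.g. a serial run);
    that default is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))


def gpu_for_rank(rank: int, gpus_per_node: int) -> int:
    """Device index for a node-local rank; ranks wrap around when they outnumber GPUs."""
    if gpus_per_node < 1:
        raise ValueError("need at least one GPU per node")
    return rank % gpus_per_node


def binding_table(ranks_per_node: int, gpus_per_node: int) -> dict:
    """Map every node-local rank to its device index."""
    return {r: gpu_for_rank(r, gpus_per_node) for r in range(ranks_per_node)}


if __name__ == "__main__":
    me = local_rank()
    print(f"local rank {me} -> GPU {gpu_for_rank(me, 4)}")
    print(binding_table(8, 4))
```

When ranks outnumber GPUs on a node, several ranks share a device. For example, with 4 ranks and 2 GPUs, ranks 0 and 2 both land on GPU 0.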

Parallel Performance Parameters

Ideal Speed-up
• Ratio of the number of nodes used for a given run to the reference number of nodes.

Actual Speed-up
• Ratio of the time per iteration using the reference number of nodes to the time per iteration using the number of nodes for the given run.

Accelerator Speed-up
• Ratio of the time per iteration obtained using a given number of CPUs to the time per iteration obtained using the same number of CPUs working in tandem with GPUs.
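The three definitions translate directly into code. In this minimal sketch, the node counts and times per iteration in the example are hypothetical, not the measured HiFUN values. Dividing the actual speed-up by the ideal speed-up gives the parallel efficiency, a derived quantity that the slide itself does not define.

```python
"""Parallel performance parameters as defined on the slide."""


def ideal_speedup(nodes: int, ref_nodes: int) -> float:
    """Nodes used for this run divided by the reference number of nodes."""
    return nodes / ref_nodes


def actual_speedup(t_iter_ref: float, t_iter: float) -> float:
    """Time/iteration on the reference nodes divided by time/iteration for this run."""
    return t_iter_ref / t_iter


def accelerator_speedup(t_iter_cpu: float, t_iter_cpu_gpu: float) -> float:
    """Time/iteration with CPUs only divided by time/iteration with the same CPUs plus GPUs."""
    return t_iter_cpu / t_iter_cpu_gpu


if __name__ == "__main__":
    # Hypothetical timings in seconds per iteration.
    ideal = ideal_speedup(nodes=8, ref_nodes=2)
    actual = actual_speedup(t_iter_ref=4.0, t_iter=1.1)
    print(f"ideal {ideal:.1f}, actual {actual:.2f}, efficiency {actual / ideal:.0%}")
    print(f"accelerator speed-up {accelerator_speedup(12.0, 3.0):.1f}")
```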

Single Node Performance

Figure: Accelerator speed-up on 2 GPUs.

Observations
• Increasing the grid size increases GPU utilization and accelerator speed-up.
• It is important to load the GPU completely.

Figure: Varying number of GPUs (% increase).

Observations
• Increasing the number of GPUs increases the accelerator speed-up.
• Using 4 GPUs per node is optimal.

Figure: Time to RANS solution (hours).

Observations
• Time to solution on a 1 million grid ∼ 15 minutes.
• Time to solution on a 30 million grid ∼ half a day.
• A single node serves as a desktop supercomputer.

Multi-node Performance

Figure: Parallel speed-up, 66 million workload.

Observations
• Near-linear speed-up using 2 GPUs per node.
• Speed-up drops for larger numbers of nodes and/or more GPUs, due to lower GPU utilization.
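Simple arithmetic supports the utilization argument: the number of volumes each GPU receives shrinks as nodes or GPUs per node are added. This minimal sketch covers the 66 million workload. The node counts listed are illustrative. The minimum load per GPU needed to keep a GPU busy is a hypothetical parameter, since the slides do not give one; it would have to be measured.

```python
"""Volumes per GPU for the 66-million-volume workload."""

WORKLOAD = 66_000_000  # total volumes, from the slide


def volumes_per_gpu(total: int, nodes: int, gpus_per_node: int) -> float:
    """Even split of the mesh across all GPUs in the run."""
    return total / (nodes * gpus_per_node)


def underloaded(total: int, nodes: int, gpus_per_node: int, min_per_gpu: float) -> bool:
    """True if each GPU gets fewer volumes than the (hypothetical) threshold."""
    return volumes_per_gpu(total, nodes, gpus_per_node) < min_per_gpu


if __name__ == "__main__":
    for gpn in (2, 4, 8):
        row = [volumes_per_gpu(WORKLOAD, n, gpn) / 1e6 for n in (1, 2, 4, 8)]
        print(f"{gpn} GPUs/node:", "  ".join(f"{v:6.2f}M" for v in row))
```

On 8 nodes, each GPU holds about 4.1 million volumes with 2 GPUs per node but only about 1 million with 8 GPUs per node. This is consistent with the observed drop in speed-up at the larger GPU counts.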

Figure: Normalized time per iteration, 66 million workload.

Observations
• Time per iteration drops as the number of nodes and/or GPUs increases.
• Time to solution with 8 nodes ∼ 4 hours.

Concluding Remarks

• Offload model used to port HiFUN to GPU.
• A GPU-based computing node is powerful enough to serve as a desktop supercomputer.
• HiFUN is ideally suited to solving grand-challenge problems on GPU-based hybrid supercomputers.
• An OpenACC directive-based offload model is an attractive option for porting legacy CFD codes to GPU.
