Stan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA

Stan Posey, CAE Industry Development ... - The ANSYS … · ANSYS Maxwell ANSYS Nexxim (Signal Integrity) ... CUDA 4.0.17 ANSYS Mechanical 14 Preview on GPU Workstation ... Santa



2 ANSYS 2011 Regional Conferences

NVIDIA and HPC Evolution of GPUs

Public, based in Santa Clara, CA | ~$4B revenue | ~5,500 employees

Founded in 1993, with primary business in the semiconductor industry

Products for graphics in workstations, notebooks, mobile devices, etc.

Began R&D of GPUs for HPC in 2004, released first Tesla and CUDA in 2007

Development of GPUs as a co-processing accelerator for x86 CPUs

HPC Evolution of GPUs: 3 Years With 3 Generations

2004: Began strategic investments in GPU as HPC co-processor

2006: G80, first GPU with built-in compute features, 128 cores; CUDA SDK Beta

2007: Tesla 8-series based on G80, 128 cores – CUDA 1.0, 1.1

2008: Tesla 10-series based on GT200, 240 cores – CUDA 2.0, 2.3

2009: Tesla 20-series, code named "Fermi", up to 512 cores – CUDA SDK 3.0, 3.2


NVIDIA and ANSYS Collaboration Focus

NVIDIA Provides Business and Engineering Investments in ANSYS Technology Developments

NVIDIA GPU Status levels: Available Today | Updates for 2011 | Product Evaluation | Research Evaluation

Structural Mechanics
- ANSYS Mechanical 13 (SMP, Single GPU)
- ANSYS Mechanical 14 (DMP, Improved PCG)
- ANSYS Mechanical 15 (Multi-GPU, Multi-node)

Fluid Dynamics
- ANSYS CFD 14 (Radiation HT, beta)
- ANSYS CFD 15 (Solver, other models)

Electromagnetics
- ANSYS HFSS
- ANSYS Maxwell
- ANSYS Nexxim (Signal Integrity)


How ANSYS Software Works With GPUs

ANSYS computes the heavy workloads of matrix solvers on the GPU and other routines on the CPU

ANSYS Mechanical GPU acceleration is user-transparent

Jobs launch and complete without additional user steps

1. ANSYS job launched on CPU

2. Solver operations sent to GPU

3. GPU sends results back to CPU

4. ANSYS job completes on CPU


Important Considerations for ANSYS and GPUs

Core ANSYS focus is on direct and iterative linear solvers
- Others (models, matrix assembly) move to GPUs in progressive stages

Most ANSYS software employs a domain parallel method
- GPU computing fits this method, preserves DANSYS investments
- ANSYS 13 focus was SMP solvers; ANSYS 14 focus is DANSYS solvers

ANSYS software is parallel and scales well for multi-core CPUs
- Direct solvers use a scheme of computations on both GPU and CPU
- Iterative solvers have computations on GPU, matrix assembly on CPU
- Investigations include GPU performance against multi-core CPU only
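To make the iterative-solver split concrete, here is a minimal sketch of a Jacobi-preconditioned conjugate gradient (JCG) solve in plain Python. It is not ANSYS code and runs on the CPU only. The function names, the tiny 3x3 matrix, and the tolerance are illustrative assumptions. The matrix-vector product inside the loop is the kind of repeated kernel that a GPU-accelerated solver would offload, while the matrix itself is assembled beforehand on the CPU.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (the repeated kernel in each iteration)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 200) -> Vector:
    """Solve A x = b for a symmetric positive-definite A with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner: inverse diagonal
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing the tolerance by zero
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # converged on relative residual
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# "Matrix assembly on CPU": a small symmetric positive-definite system (illustrative only)
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = jacobi_pcg(A, b)
```

For this system the exact solution is (1, 2, 3), so `x` should agree with it to within the solver tolerance.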


ANSYS Presentation at NVIDIA GTC 2010

Sep 20-23, 2010, San Jose Convention Center, San Jose, California, USA

Accelerating System Level Signal Integrity Simulation with GPU, Dr. Ekanathan Palamadai, ANSYS

Nexxim 13.0 Convolution Results for Tesla C2050 (lower is better):

Intel Nehalem 8 core CPU, OpenMP: 108 H

NVIDIA Tesla C2050 GPU, OpenMP: 4 H

Single Precision ~27x

Double Precision ~13x

Speedup combines GPU and other SW changes


ANSYS CFD 14.0 to Offer (Beta) GPU Capability

ANSYS CFD preliminary results of radiation heat transfer view-factor computation on GPUs vs. CPUs

Radiation HT Applications:
- Underhood cooling
- Cabin comfort HVAC
- Furnace simulations
- Solar loads on buildings
- Combustor in turbine
- Electronics passive cooling

Other ANSYS CFD Evaluations:
- Models (e.g. disperse phase)
- Implicit equation solvers

NOTE: Growing CPU time of view-factor computations inhibits proper inclusion of radiation HT effects

NOTE: GPU time remains low even as view-factor computations grow very large


ANSYS Announcement of NVIDIA CUDA Support

"This initial development for GPU computing demonstrates our focus on evolving ANSYS software to take advantage of important technology trends in high-performance computing," said Dipankar Choudhury, vice president of corporate product strategy and planning at ANSYS. "We work to achieve optimized software performance, across the full spectrum of HPC technologies, so that our customers get maximum value from their investment in HPC. Here, our technical collaboration with NVIDIA has resulted in a significant benefit for our mutual customers."


Details of ANSYS Mechanical for NVIDIA GPUs

ANSYS Mechanical 13: Collaboration on SMP direct sparse and PCG/JCG iterative solvers – CUDA 3.2 support in 13.0 SP2

Initial release for both Linux and Windows 64-bit, and single GPU per job – multi-GPU under evaluation for future release:
- Model limits for direct depend on largest front sizes: GPUs good for ~1M DOF to ~8M DOF for 6GB Tesla C2075 or Quadro 6000
- Model limits for iterative depend on GPU memory: GPUs good for ~1M DOF to ~5M DOF for 6GB Tesla C2075 or Quadro 6000

ANSYS Mechanical 14: Collaboration on DMP solvers – Nov 2011
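As a rough screen, the model-size guidance for a 6 GB GPU can be written as a range check. This is a sketch only: the dictionary name, function name, and solver keys are hypothetical. The slide marks its bounds with "~", so they are approximate guidance rather than hard limits enforced by the software.

```python
# Approximate DOF ranges quoted for a 6 GB Tesla C2075 or Quadro 6000.
# Keys and names are illustrative, not part of any ANSYS interface.
GPU_DOF_RANGE = {
    "direct": (1_000_000, 8_000_000),     # limited by largest front sizes
    "iterative": (1_000_000, 5_000_000),  # limited by GPU memory
}


def in_quoted_gpu_range(dof: int, solver: str) -> bool:
    """Return True when the model size falls inside the quoted range for the solver type."""
    low, high = GPU_DOF_RANGE[solver]
    return low <= dof <= high


turbine_ok = in_quoted_gpu_range(2_100_000, "direct")   # the 2,100 K DOF turbine model
small_ok = in_quoted_gpu_range(200_000, "iterative")    # hypothetical small model
```

Here `turbine_ok` is `True` and `small_ok` is `False`, since a 200 K DOF model falls below the lower bound quoted for GPU benefit.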


ANSYS Mechanical Results of Solver Acceleration

NOTE: Results of ANSYS Mechanical for Tesla C2050 and Intel Xeon 5560

From NAFEMS World Congress, May 2011, Boston, MA, USA: "Accelerate FEA Simulations with a GPU" by Jeff Beisheim, ANSYS

System Configuration:
- Xeon 5560, 2.8 GHz, 2 sockets, 8 cores
- 32 GB memory
- Win XP SP2 64-bit
- Tesla C2050 GPU

[Charts: GPU Solver Kernel Speedups; GPU Overall Simulation Speedups]


ANSYS 13 and 14P3 Performance Study by NVIDIA

HP Z800 Workstation Configuration - $9,403
- Windows 7 Professional 64-bit or CentOS
- 2 x Xeon® X5650 HC 2.66 GHz CPUs 12MB/1333 (12 cores)
- NVIDIA Quadro 2000 1 GB Graphics
- HP 24 GB (6x4GB) DDR3-1333 ECC memory
- HP 500 GB SATA 7200 HDD
- Add HP 24 GB (6x4GB) for total 48 GB - $1,800 (included)
Source: h10010.www1.hp.com/wwpc/pscmisc/vac/us/en/sm/workstations/z800.html

+ NVIDIA Tesla C2075 GPU – about $2,000
NOTE: Directly configures with HP Z800 as above and with other workstations and servers [vendors appear later in presentation]

ANSYS Mechanical Model – V13sp-5
- Turbine geometry of 2,100 K DOF and mostly SOLID187 FEs
- Single load step, static, large deflection nonlinear
- Use of ANSYS Mechanical 13 SMP and 14P3 DANSYS direct sparse solver

ANSYS Mechanical Model – V13cg-2 [Study still a work in progress]
- Engine block (static, linear) of 6,270 K DOF and mostly SOLID187 FEs
- Use of ANSYS Mechanical 14P3 SMP PCG iterative solver


ANSYS Mechanical 13 on GPU Workstation

AVAILABLE TODAY

V13sp-5 Model:
- Turbine geometry
- 2,100 K DOF
- SOLID187 FEs
- Static, nonlinear
- One load step
- Direct sparse

ANSYS Mechanical Times in Seconds (lower is better)

| Cores | Xeon 5670 2.93 GHz Westmere (Dual Socket) | Xeon 5670 2.93 GHz Westmere + Tesla C2075 | Speedup |
| 1 | 2449 | 526 | 4.7x |
| 2 | 1484 | 450 | 3.3x |
| 4 | 846 | 414 | 2.0x |
| 6 | 633 | 395 | 1.6x |
| 8 | 560 | 358 | 1.6x |
| 12 | 512 | 359 | 1.4x |

Core counts: 1, 2, 4, 6 (1 Socket); 8, 12 (2 Socket)

NOTE: Results Based on ANSYS Mechanical 13.0 SP2 SMP Solver Jun 2011

NOTE: Add a Tesla C2075 to use with 6 cores: now 30% faster than 12, with 6 available for other tasks

Results from HP Z800 Workstation, 2 x Xeon X5670 2.93GHz 48GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17
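The speedup labels on this chart appear to be CPU-only time divided by CPU+GPU time at the same core count. The sketch below recomputes them, assuming the bar values pair up by core count as shown in the data dictionaries. That pairing is a reading of the chart, chosen because it reproduces every label, and the variable names are illustrative.

```python
# Wall times in seconds keyed by core count, read from the
# ANSYS Mechanical 13.0 SP2 SMP chart for the V13sp-5 model.
cpu_seconds = {1: 2449, 2: 1484, 4: 846, 6: 633, 8: 560, 12: 512}
gpu_seconds = {1: 526, 2: 450, 4: 414, 6: 395, 8: 358, 12: 359}


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Ratio of baseline time to accelerated time (higher is better)."""
    return baseline_seconds / accelerated_seconds


# Speedup from adding the GPU at each core count, rounded like the chart labels.
speedups = {
    cores: round(speedup(cpu_seconds[cores], gpu_seconds[cores]), 1)
    for cores in cpu_seconds
}

# The "6 cores + GPU vs. 12 cores" comparison from the slide note.
six_plus_gpu_vs_twelve = speedup(cpu_seconds[12], gpu_seconds[6])
```

With these inputs `six_plus_gpu_vs_twelve` is roughly 1.30, which matches the "30% faster than 12" note.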


ANSYS Mechanical 14 Preview on GPU Workstation

AVAILABLE NOV 2011

V13sp-5 Model:
- Turbine geometry
- 2,100 K DOF
- SOLID187 FEs
- Static, nonlinear
- One load step
- Direct sparse

ANSYS Mechanical Times in Seconds (lower is better)

| Cores | Xeon 5670 2.93 GHz Westmere (Dual Socket) | Xeon 5670 2.93 GHz Westmere + Tesla C2075 | Speedup |
| 1 | 1848 | 444 | 4.2x |
| 2 | 1192 | 342 | 3.5x |
| 4 | 846 | 314 | 2.7x |
| 6 | 564 | 273 | 2.1x |
| 8 | 516 | 270 | 1.9x |
| 12 | 399 | (not shown) | |

Core counts: 1, 2, 4, 6 (1 Socket); 8, 12 (2 Socket)

NOTE: Results Based on ANSYS Mechanical 14.0 Preview 3 DMP Solver Aug 2011

NOTE: Add a Tesla C2075 to use with 6 cores: now 46% faster than 12, with 6 available for other tasks

Results from HP Z800 Workstation, 2 x Xeon X5670 2.93GHz 48GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17


ANSYS Mechanical for 12-Core GPU Workstation

NOTE: Comparison of ANSYS Mechanical 14.0 Preview 3 DMP vs. 13.0 SP2 SMP for Tesla GPU

V13sp-5 Model:
- Turbine geometry
- 2,100 K DOF
- SOLID187 FEs
- Static, nonlinear
- One load step
- Direct sparse

ANSYS Mechanical Times in Seconds (lower is better)

| Cores | Xeon 5670 + Tesla C2075 for 13.0 SP2 SMP | Xeon 5670 + Tesla C2075 for 14.0 P3 DMP | Gain |
| 4 | 414 | 314 | 32% |
| 6 | 395 | 273 | 45% |
| 8 | 358 | 270 | 33% |

Results from HP Z800 Workstation, 2 x Xeon X5670 2.93GHz 48GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17
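The percentage labels on this chart are consistent with "old time divided by new time, minus one". The sketch below applies that formula to the three core counts. The function name is illustrative, and pairing each percentage with a core count is an assumption that happens to reproduce all three labels.

```python
# CPU+GPU wall times in seconds keyed by core count (V13sp-5 model).
smp_13sp2_seconds = {4: 414, 6: 395, 8: 358}  # ANSYS Mechanical 13.0 SP2 SMP
dmp_14p3_seconds = {4: 314, 6: 273, 8: 270}   # ANSYS Mechanical 14.0 Preview 3 DMP


def percent_faster(old_seconds: float, new_seconds: float) -> float:
    """How much faster the new run is, as a percentage of the new run's time."""
    return (old_seconds / new_seconds - 1.0) * 100.0


gains = {
    cores: round(percent_faster(smp_13sp2_seconds[cores], dmp_14p3_seconds[cores]))
    for cores in smp_13sp2_seconds
}
```

Note that this is a throughput-style gain. Expressed as a reduction in wall time, the 4-core case would be about 24% rather than 32%.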


Study on System Memory Effects at 4 Cores

V12sp-5 Model:
- Turbine geometry
- 2,100 K DOF
- SOLID187 FEs
- Static, nonlinear
- One load step
- Direct sparse

ANSYS Mechanical Times in Seconds (lower is better)

| System memory | Xeon 5560 2.8 GHz Nehalem 4 Cores (Dual Socket) | Xeon 5560 2.8 GHz Nehalem 4 Cores + Tesla C2050 | Speedup |
| 48 GB In-memory | 830 | 426 | 2.0x |
| 32 GB Out-of-memory | 1214 | 682 | 1.7x |
| 24 GB Out-of-memory | 1524 | 1155 | 1.3x |

34 GB required for in-memory solution

NOTE: Greatest benefit for CPU and CPU+GPU is in-memory solution

NOTE: Results Based on ANSYS Mechanical 13.0 SMP Direct Solver Sep 2010

Results from HP Z800 Workstation, 2 x Xeon X5560 2.8GHz CPUs, 48GB memory, MKL 10.25; Tesla C2050, CUDA 3.1
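The in-memory versus out-of-memory labels follow from comparing system memory with the 34 GB the slide says the in-memory solution needs. The sketch below is a simplification: it compares total installed memory, whereas a real run depends on memory actually available to the solver. The function name and default value are illustrative.

```python
REQUIRED_IN_MEMORY_GB = 34.0  # stated on the slide for this model


def solution_mode(system_memory_gb: float, required_gb: float = REQUIRED_IN_MEMORY_GB) -> str:
    """Classify a configuration by whether the solve fits in system memory."""
    return "in-memory" if system_memory_gb >= required_gb else "out-of-memory"


# The three memory configurations compared in the study.
modes = {gb: solution_mode(gb) for gb in (48, 32, 24)}
```

Only the 48 GB configuration clears the threshold, which lines up with the chart labels.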


ANSYS Mechanical 14 Preview on GPU Workstation

AVAILABLE NOV 2011

V13cg-2 Model:
- Engine block
- 6,270 K DOF
- SOLID187 FEs
- Static, linear
- PCG iterative

ANSYS Mechanical Times in Seconds (lower is better)

| Cores | Xeon 5670 2.93 GHz Westmere (Dual Socket) | Xeon 5670 2.93 GHz Westmere + Tesla C2075 | Speedup |
| 1 | 1758 | 280 | 6.3x |
| 2 | 1175 | (not shown) | |
| 4 | 817 | 161 | 5.1x |
| 6 | 721 | 153 | 4.7x |
| 8 | 732 | 147 | 5.0x |
| 12 | 828 | 146 | 5.7x |

Core counts: 1, 2, 4, 6 (1 Socket); 8, 12 (2 Socket)

NOTE: Results Based on ANSYS Mechanical 14.0 Preview 3 SMP Solver Aug 2011

NOTE: Results for SMP only

Results from HP Z800 Workstation, 2 x Xeon X5670 2.93GHz 48GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17


ANSYS Mechanical 14 Preview on GPU Workstation

AVAILABLE NOV 2011

V13cg-2 Model:
- Engine block
- 6,270 K DOF
- SOLID187 FEs
- Static, linear
- PCG iterative

ANSYS Mechanical Times in Seconds (lower is better)

| Cores | Xeon 5670 2.93 GHz Westmere - DANSYS | Xeon 5670 2.93 GHz Westmere - SMP | Xeon 5670 2.93 GHz Westmere - SMP + Tesla C2075 | Speedup (SMP + GPU vs. SMP) |
| 1 | 1829 | 1758 | 280 | 6.3x |
| 2 | 1048 | 1175 | (not shown) | |
| 4 | 682 | 817 | 161 | 5.1x |
| 6 | 605 | 721 | 153 | 4.7x |
| 8 | 666 | 732 | 147 | 5.0x |
| 12 | 465 | 828 | 146 | 5.7x |

Core counts: 1, 2, 4, 6 (1 Socket); 8, 12 (2 Socket)

NOTE: Results Based on ANSYS Mechanical 14.0 Preview 3 Solvers Aug 2011

NOTE: DANSYS outperforms SMP

Results from HP Z800 Workstation, 2 x Xeon X5670 2.93GHz 48GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17


How ANSYS is Licensed for NVIDIA GPUs

ANSYS Base License: Unlocks up to 2 CPU Cores

ANSYS HPC Pack: Unlocks up to 8 CPU Cores; Unlocks 1 computational GPU*

ANSYS HPC Core Licensees: Contact ANSYS to enquire

* Academic customers: GPU feature is bundled with ANSYS Base License
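The licensing rules on this slide can be summarized as a small lookup. This is a hypothetical sketch, not an ANSYS licensing API. The `Entitlement` type, the function name, and the choice of a single GPU for the academic case are assumptions. Multiple HPC Packs and HPC Core licenses are left out because the slide gives no numbers for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    max_cpu_cores: int
    max_gpus: int


def entitlement(has_hpc_pack: bool, academic: bool = False) -> Entitlement:
    """Cores and GPUs unlocked under the rules stated on the slide."""
    if has_hpc_pack:
        # One HPC Pack: up to 8 CPU cores plus 1 computational GPU.
        return Entitlement(max_cpu_cores=8, max_gpus=1)
    # Base license: up to 2 CPU cores; the GPU feature is bundled for academic customers
    # (assumed here to mean one GPU).
    return Entitlement(max_cpu_cores=2, max_gpus=1 if academic else 0)


base = entitlement(has_hpc_pack=False)
pack = entitlement(has_hpc_pack=True)
academic_base = entitlement(has_hpc_pack=False, academic=True)
```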


ANSYS 14 Performance Gain > 4X vs. Base License

Factors Gain Over Base License Results

| Configuration | Speed-up (CPU or GPU) | Solution Cost |
| Base License 2 Core | 1.0 | 1.0 |
| ANSYS HPC Pack 6 Cores | 2.1 | 1.35 |
| ANSYS HPC Pack 8 Cores | 2.3 | 1.35 |
| ANSYS HPC Pack 6 Cores + GPU | 4.4 | 1.38 |

Solution Cost Basis:
- ANSYS base license
- ANSYS HPC Pack
- $10K for Workstation
- $2K for Tesla C2075

Performance Basis, V13sp-5 Model:
- 2,100 K DOF
- SOLID187 FEs
- Static nonlinear
- One load step
- Direct sparse

NOTE: Invest 38% more over Base License for a gain of over 4x!

NOTE: Based on ANSYS Mechanical 14.0 Preview 3 DMP Solver Aug 2011 and Model V13sp-5

Results from HP Z800 Workstation, 2 x Xeon X5670 2.93GHz 48GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17
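The speed-up factors on this chart can be reproduced from the ANSYS Mechanical 14.0 Preview 3 DMP timings shown earlier for the same model, taking the 2-core run as the base. The sketch below does that and then divides each speed-up by the relative solution cost. The pairing of times with configurations and the "speed-up per unit cost" metric are illustrative assumptions, not figures from the slide.

```python
BASE_SECONDS = 1192  # 2 cores, CPU only (Base License)

# Wall times in seconds for each licensed configuration (V13sp-5 model, 14.0 P3 DMP).
config_seconds = {
    "Base License 2 Core": 1192,
    "ANSYS HPC Pack 6 Cores": 564,
    "ANSYS HPC Pack 8 Cores": 516,
    "ANSYS HPC Pack 6 Cores + GPU": 273,
}

# Relative solution cost as labelled on the chart (Base License = 1.0).
relative_cost = {
    "Base License 2 Core": 1.0,
    "ANSYS HPC Pack 6 Cores": 1.35,
    "ANSYS HPC Pack 8 Cores": 1.35,
    "ANSYS HPC Pack 6 Cores + GPU": 1.38,
}

# Speed-up over the base license, rounded like the chart labels.
speedup_over_base = {
    name: round(BASE_SECONDS / seconds, 1) for name, seconds in config_seconds.items()
}

# Speed-up gained per unit of relative cost (a derived, illustrative metric).
speedup_per_cost = {
    name: round(speedup_over_base[name] / relative_cost[name], 2)
    for name in config_seconds
}
```

Under these assumptions the 6 cores + GPU configuration returns about 3.19 units of speed-up per unit of cost, against about 1.56 for 6 cores alone.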


NVIDIA Use of ANSYS Software for Product Design

ANSYS Icepak – active and passive cooling of IC packages

ANSYS Mechanical – large deflection bending of PCBs

ANSYS Mechanical – comfort and fit of 3D emitter glasses

ANSYS Mechanical – shock & vib of solder ball assemblies


NVIDIA HPC Case Study: Performance Gain of 77x

ANSYS Mechanical Simulations by NVIDIA for Design of 3D Emitter Glasses

Simulation for prediction of comfort, fit, and handling

Study optimized on CPU platform before applying GPU

Once impossible model parameterization now practical


Recommended System Configurations

Workstations with Tesla GPUs

Existing System
• Tesla C2050 (3 GB)
• Tesla C2075 (6 GB)

New System Purchase
• Total 6-8 CPU cores
• Total 48 GB of CPU memory
• Disk with minimum 500 GB
• Tesla C2075 + Quadro 2000 for pre/post
-- OR --
• Quadro 6000 (6 GB)

Servers with Tesla GPUs

Existing System
• Tesla S2050 (12 GB or 3 GB/GPU)

New System Purchase
• Total 4 CPUs, 6-8 CPU cores each
• Total 4 x16 PCIe (one for each GPU)
• Total 96 to 128 GB of CPU memory
• Disk with minimum 2000 GB (scratch)
• Tesla M2070 or Tesla M2090


Summary and Next Steps

ANSYS Software supports NVIDIA GPUs for Computation
- ANSYS 13.0 since Nov 2010; new features coming in ANSYS 14.0

Joint Collaboration on ANSYS 13.0 is only the beginning
- Collaboration ongoing in all disciplines of CSM, CFD and CEM

Learn more about ANSYS and NVIDIA GPU solution
- More at: www.nvidia.com/object/tesla-ansys-accelerations.html
- Want to try ANSYS on NVIDIA GPUs? Contact [email protected]


Thank You. Questions?

Stan Posey | CAE Industry Development | [email protected]

NVIDIA, Santa Clara, CA, USA