ANSYS High Performance Computing User Group 2010 © 2010 CAE Associates




ANSYS High Performance Computing

User Group 2010

© 2010 CAE Associates

Parallel Processing in ANSYS

ANSYS offers two parallel processing methods:

Shared-memory ANSYS: Shared-memory ANSYS uses the shared-memory architecture in ANSYS, meaning it uses multiple processors on a single machine. Most, but not all, of the solution phase runs in parallel when using the shared-memory architecture. Many solvers in ANSYS can use the shared-memory architecture. In addition, pre- and postprocessing can make use of the multiple processors, including graphics operations, processing of large CDB files, and other data- and compute-intensive operations.


Parallel Processing in ANSYS

Distributed ANSYS: Distributed ANSYS can run over a cluster of machines or use multiple processors on a single machine. It works by splitting the model into different parts and distributing those parts to each machine/processor. By solving only a portion of the entire model on each machine/processor, the processing time and memory requirements can be reduced.

— With Distributed ANSYS, the entire solution phase runs in parallel, including the stiffness matrix generation, linear equation solving, and results calculations.

— If you are running Distributed ANSYS on a single machine with multiprocessors, then the non-solution phases (for example, pre- and postprocessing) will run in shared-memory parallel mode, making use of the multiprocessors for graphics and other data- and compute-intensive operations, as is done in shared-memory ANSYS.
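The split-and-distribute idea above can be sketched with a toy example. This is illustrative Python only, not ANSYS code: `solve_part` is a hypothetical stand-in for solving one partition of the model, and threads stand in for the separate processes or machines a real distributed run would use.

```python
# Toy illustration of the split-and-distribute idea (NOT ANSYS code).
# Distributed ANSYS partitions the model and hands each part to a
# separate process or machine; this sketch uses threads only to keep
# the example self-contained.
from concurrent.futures import ThreadPoolExecutor

def solve_part(part):
    # Hypothetical stand-in for solving one partition of the model:
    # here we just sum the element "loads" in the chunk.
    return sum(part)

def solve_distributed(loads, n_parts=4):
    # Partition the "model" into roughly equal chunks.
    size = (len(loads) + n_parts - 1) // n_parts
    parts = [loads[i:i + size] for i in range(0, len(loads), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial = list(pool.map(solve_part, parts))
    # Combining partial results corresponds to assembling the
    # per-domain solutions into the full solution.
    return sum(partial)

print(solve_distributed(list(range(1000))) == sum(range(1000)))  # True
```

The final combine step is why each worker needs less memory: no single worker ever holds the whole model.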


FEA Benchmark Problem

Bolted Flange with O-Ring

Nonlinear material properties (Hyperelastic O-Ring)

Large Deformation

Nonlinear Contact

1 Million Degrees of Freedom

ANSYS 12.1


Datahal DHCAD5650 High End Workstation

SuperMicro 4U Tower Black SATA 5.25 Bays 8 Hotswap EATX3 800W RPS

Dual Hex Core (12 cores total) — Intel® XEON 5650 2.66GHz Processors

24 GB RAM — (4) 6GB (3 x 2GB) – 1333MHz DDR3/PC3-10600 – Non-ECC – DDR3 SDRAM – 240-Pin DIMM

Four 300GB Toshiba SAS 15,000 RPM 16MB 3.5IN drives in RAID 0.

One 500GB SATA Hard Drive for the Operating System

Nvidia Quadro FX 1800 Video Card

LG DVD/RW


FEA Benchmark Performance

Single Machine - SMP Sparse vs. MPP DSPARSE

[Figure: solver speedup vs. number of cores (0-14) for the SMP SPARSE and MPP DSPARSE solvers]
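Solver speedup, as plotted in charts like the one above, is conventionally wall time on one core divided by wall time on N cores; dividing again by N gives parallel efficiency. A minimal sketch with illustrative timings (not the benchmark's actual numbers):

```python
def speedup(t_serial, t_parallel):
    """Parallel speedup: serial wall time divided by parallel wall time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_cores):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_cores

# Illustrative numbers only: a job taking 10000 s on 1 core and
# 2500 s on 8 cores shows 4x speedup at 50% parallel efficiency.
print(speedup(10000, 2500))        # 4.0
print(efficiency(10000, 2500, 8))  # 0.5
```

Efficiency below 1.0 at higher core counts is why the speedup curves flatten rather than climb linearly.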

CFX Benchmark Problem

Flow around an airfoil

1 Million Elements

3-D, Steady State, Compressible Flow

k-ε Turbulence Model


CFX Benchmark Performance

CFX Parallel Performance

[Figure: CFX solver speedup vs. number of cores (0-14)]

Disk Drive Speed

The bolted flange analysis was run on the two different drives of our high-end workstation to compare disk speed influence on solution time.

The RAID array completed the solution almost twice as fast as the SATA drive:

— Run #1: PCG Solver, 12 CPU, In-Core, RAID Array. Wall time = 8754 sec.
— Run #2: PCG Solver, 12 CPU, In-Core, SATA Drive. Wall time = 16822 sec.
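The "almost twice as fast" figure follows directly from the two wall times:

```python
# Wall times (seconds) reported for the two runs above.
raid_time = 8754   # Run #1: RAID 0 array
sata_time = 16822  # Run #2: single SATA drive

ratio = sata_time / raid_time
print(f"RAID completed the solve {ratio:.2f}x faster than SATA")  # ~1.92x
```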


Hyperthreading

Hyperthreading allows one physical processor to appear as two logical processors to the operating system. This allows the operating system to perform two different processes simultaneously.

It does not, however, allow the processor to do two of the same type of operation simultaneously (e.g., floating-point operations).

This form of parallel processing is only effective when a system has many lightweight tasks.


Hyperthreading and ANSYS

The bolted flange analysis was run with Hyperthreading on and then again with it off to determine its influence.

— Run #1: PCG Solver, 12 CPUs, Hyperthreading Off. Wall time = 8754 sec.
— Run #2: PCG Solver, 24 CPUs, Hyperthreading On. Wall time = 8766 sec.

An LS-Dyna analysis was also run in the same manner as above, with the following results.

— Run #1: 12 CPUs, Hyperthreading Off. Wall time = 19560 sec.
— Run #2: 24 CPUs, Hyperthreading On. Wall time = 32918 sec.
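Comparing the wall times shows hyperthreading was essentially neutral for the ANSYS PCG run but a large penalty for LS-Dyna:

```python
# Wall times (seconds) from the runs above.
ansys_off, ansys_on = 8754, 8766    # 12 physical vs. 24 logical CPUs
dyna_off, dyna_on = 19560, 32918

ansys_change = 100 * (ansys_on - ansys_off) / ansys_off
dyna_change = 100 * (dyna_on - dyna_off) / dyna_off
print(f"ANSYS PCG: {ansys_change:+.1f}% wall time with HT on")  # +0.1%
print(f"LS-Dyna:   {dyna_change:+.1f}% wall time with HT on")   # +68.3%
```

This is consistent with the earlier point: the solvers are dominated by floating-point work, which the duplicated logical processors cannot accelerate.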


High Performance Computing (HPC) Product Configurations

Presented by: Tony Solazzo

HPC Changes

ANSYS HPC solutions now support multiphysics
— A single solution enables parallel processing for all physics and levels of fidelity - fluids, structures, thermal, and electromagnetics
• Mechanical products: ANSYS Multiphysics, ANSYS Mechanical/Emag, ANSYS Mechanical/CFD-Flo, ANSYS Mechanical, ANSYS Structural, ANSYS Professional NLS, ANSYS Professional NLT, ANSYS AUTODYN, ANSYS AUTODYN Single Task, ANSYS Emag
• Fluids products: ANSYS CFD, ANSYS FLUENT, ANSYS CFX, ANSYS CFD-Flo, ANSYS Icepak, ANSYS POLYFLOW
• Solver products: ANSYS Multiphysics Solver, ANSYS Mechanical Solver, ANSYS Structural Solver, ANSYS Emag Solver, ANSYS CFD Solver, ANSYS FLUENT Solver, ANSYS CFX Solver, ANSYS CFD-Flo Solver
— Eliminates the need to separately acquire and deploy parallel processing for separate simulation domains
— Increases the value from your overall investment in high performance computing and ANSYS multiphysics solutions


HPC Changes

How are they packaged in Version 12?
— HPC Configurations
• ANSYS HPC - Individual processor based
• ANSYS HPC Pack - Sold in groups of 8 processors
— Each simulation consumes one or more packs
• ANSYS HPC Workgroup - Provides parallel capacity for multiple users and multiple simulations


Overview ANSYS HPC Packs

ANSYS HPC Packs enable high-fidelity insight
— Each simulation consumes one or more packs
— Parallel capability enabled increases quickly with added packs

[Chart: parallel enabled (cores) vs. packs per simulation - 1 pack: 8 cores, 2 packs: 32, 3 packs: 128, 4 packs: 512, 5 packs: 2048]
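Each added pack quadruples the parallel capacity, so the enabled-core tiers follow a simple pattern. A sketch (an observation about the published tiers, not an official ANSYS formula):

```python
def cores_enabled(packs):
    """Cores enabled when `packs` HPC Packs are combined on one simulation.

    Matches the published tiers (1 -> 8, 2 -> 32, 3 -> 128, 4 -> 512,
    5 -> 2048): each additional pack quadruples parallel capacity.
    """
    return 8 * 4 ** (packs - 1)

print([cores_enabled(n) for n in range(1, 6)])  # [8, 32, 128, 512, 2048]
```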

Enabling Insight ANSYS HPC Packs

Example: Customer owns (5) HPC Packs
— Can run (5) 8-processor based projects

[Chart: five concurrent solver jobs, each consuming 1 pack (8 cores)]

Enabling Insight ANSYS HPC Packs

Example: Customer owns (5) HPC Packs
— Can run (3) 8-processor based projects
— Can run (1) 32-processor based project

[Chart: three 1-pack jobs (8 cores each) plus one 2-pack job (32 cores)]

Enabling Insight ANSYS HPC Packs

Example: Customer owns (5) HPC Packs
— Can run (1) 32-processor based project
— Can run (1) 128-processor based project

[Chart: one 2-pack job (32 cores) plus one 3-pack job (128 cores)]

Enabling Insight ANSYS HPC Packs

Example: Customer owns (5) HPC Packs
— Can run (1) 2048-processor based project

[Chart: one 5-pack job (2048 cores)]

Enabling Productivity - ANSYS HPC Workgroup Solution

ANSYS HPC Workgroup provides parallel capacity for multiple users and multiple simulations
— Volume access to parallel processes
— Available in blocks from 128 to 2048 processes
— Shared across any number of simulation tasks on a single server


Enabling Productivity - ANSYS HPC Workgroup Solution

ANSYS HPC Workgroup
— ANSYS HPC Workgroup 128
— ANSYS HPC Workgroup 256
— ANSYS HPC Workgroup 512
— ANSYS HPC Workgroup 1024
— ANSYS HPC Workgroup 2048

[Diagram: ANSYS HPC Workgroup parallel block on an HPC server]
