© 2012 ANSYS, Inc. June 21, 2012 1
Boost Your Productivity Through HPC - for FEA & CFD
Duraivelan Dakshinamoorthy, Ph.D
Agenda
• Introduction
• Parallel Scalability
  – FEA
  – CFD
• HPC advancements
Why HPC?
Why HPC for ANSYS Users: Enhanced Insight and Productivity
It’s all about getting better insight into product behavior, quicker!
HPC enables high fidelity
• Include details, for reliable results
• “Get it right the first time”
• Innovate with confidence
HPC enables design exploration and optimization
• Consider multiple design ideas
• Optimize the design
• Ensure performance across a range of conditions
HPC – A Software Development Imperative
• Clock speed – leveling off
• Core counts – growing (CPUs) and exploding (GPUs)
• Future performance depends on highly scalable parallel software
Source: http://www.lanl.gov/news/index.php/fuseaction/1663.article/d/20085/id/13277
ANSYS HPC Leadership: A History of HPC Performance
2010–2011:
► Ideal scaling to 3072 cores (fluids)
► Hybrid parallelization (fluids)
► Network-aware partitioning (fluids)
► Large finite antenna arrays (HFSS 14)
► GPU acceleration with DMP (structures)
Today’s multi-core / many-core hardware evolution makes HPC a software development imperative. ANSYS is committed to maintaining performance leadership.
ANSYS Mechanical Scaling
Impressive speed and scaling improvements in the 13.0 release
Focus on resolving bottlenecks in the distributed-memory solvers (DANSYS):
• Sparse direct solver
  – Parallelized equation ordering
  – 40% faster with updated Intel MKL
• Preconditioned Conjugate Gradient (PCG) iterative solver
  – Parallelized preconditioning step
• Support for the unsymmetric eigensolver
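The PCG solver named above is built on the preconditioned conjugate gradient iteration. The sketch below is a minimal pure-Python version, assuming a small dense symmetric positive-definite matrix and a simple Jacobi (diagonal) preconditioner; the slide does not say which preconditioner ANSYS uses, so the function names and the preconditioner choice are illustrative only. The lines computing `z` are the preconditioning step, the part of the iteration that the 13.0 release parallelized.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    # Dense matrix-vector product y = A x
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for a symmetric positive-definite A (illustrative sketch)."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner: M^-1 = diag(A)^-1
    x = [0.0] * n
    r = list(b)                                   # residual r0 = b - A x0, with x0 = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioning step: z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                   # step length along search direction p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [m * ri for m, ri in zip(inv_diag, r)]  # preconditioning step again
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # next A-conjugate direction
        rz = rz_new
    return x


print(jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))  # close to [1/11, 7/11]
```

With a diagonal preconditioner, applying M^-1 is an element-wise multiply, which is trivially parallel; production preconditioners are more elaborate, which is why parallelizing that step is worth a release note.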
ANSYS Mechanical Scaling
[Figure: two scaling charts plotted against number of cores]
• 10.7 million degrees of freedom: static, linear, structural; 1 load step
• 1 million degrees of freedom: harmonic, linear, structural; 4 frequencies
Hardware: Intel Xeon E5-2690 processors (2.9 GHz, 16 cores total), 128 GB of RAM, SUSE Linux Enterprise Server
ANSYS Mechanical Scaling
• 6 million degrees of freedom: plasticity, contact, bolt pretension; 4 load steps
• 1 HPC Pack
What about GPU Computing?
CPUs and GPUs work in a collaborative fashion, connected by a PCI Express channel:
• CPU (multi-core processor): typically 4-8 cores; powerful, general purpose
• GPU (many-core processor): typically hundreds of cores; great for highly parallel code, within memory constraints
ANSYS Mechanical SMP – GPU Speedup
[Figure: solver kernel speedups and overall speedups]
Hardware: Intel Xeon 5560 processors (2.8 GHz, 8 cores total), 32 GB of RAM, Windows XP SP2 (64-bit), Tesla C2050 (ECC on, WDDM driver)
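The gap between solver-kernel speedups and overall speedups on this slide is what Amdahl's law predicts: if only a fraction f of the run time is accelerated by a factor s, the whole run speeds up by 1 / ((1 - f) + f / s). A minimal sketch; the 80% kernel fraction and 10x kernel speedup below are hypothetical, since the slide does not give the actual split.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-run speedup when only part of the run is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / kernel_speedup)


# Hypothetical split: the solver kernel is 80% of run time and runs 10x faster on the GPU
print(round(overall_speedup(0.8, 10.0), 2))  # 3.57
# However fast the kernel gets, the other 20% caps the overall speedup at 1 / 0.2 = 5x
print(round(overall_speedup(0.8, 1e12), 2))  # 5.0
```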
Distributed ANSYS – GPU Speedup @ 14.0
Vibroacoustic harmonic analysis of an audio speaker
• Direct sparse solver
• Quarter-symmetry model with 700K DOF:
  – 657,424 nodes
  – 465,798 elements
  – higher-order acoustic fluid elements (FLUID220/221)
Distributed ANSYS results (baseline is 1 core):
Cores | GPU | Speedup
2 | no | 2.25
4 | no | 4.29
2 | yes | 11.36
4 | yes | 11.51
• With GPU, ~11x speedup on 2 cores!
• 15-25% faster than SMP with the same number of cores
[Figure: speedup at 2 and 4 cores for SMP, DANSYS, SMP+GPU and DANSYS+GPU]
Windows workstation: two Intel Xeon 5530 processors (2.4 GHz, 8 cores total), 48 GB RAM, NVIDIA Quadro 6000
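Dividing the GPU and CPU-only rows of the table at the same core count isolates what the GPU itself contributes. The numbers are the slide's; the helper name is illustrative.

```python
# Speedups over the 1-core baseline, from the Distributed ANSYS table above
SPEEDUP = {
    (2, False): 2.25,
    (4, False): 4.29,
    (2, True): 11.36,
    (4, True): 11.51,
}


def gpu_gain(cores: int) -> float:
    """Extra speedup from adding the GPU at a fixed CPU core count."""
    return SPEEDUP[(cores, True)] / SPEEDUP[(cores, False)]


for cores in (2, 4):
    print(f"{cores} cores: GPU adds {gpu_gain(cores):.2f}x")  # 5.05x, then 2.68x
```

The GPU's relative contribution shrinks from about 5x on 2 cores to about 2.7x on 4 cores. One plausible reading, in line with Amdahl's law, is that once the GPU handles the solver kernel, additional CPU cores have less remaining work to speed up.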
ANSYS Mechanical 14.0 Performance for Tesla C2075
V13sp-5 model: turbine geometry, 2,100K DOF, SOLID187 FEs; static, nonlinear; one iteration; direct sparse solver
ANSYS Mechanical times in seconds (lower is better):
Cores | Xeon 5670 2.93 GHz Westmere (dual socket) | Xeon 5670 + Tesla C2075 | GPU speedup
1 | 1848 | 444 | 4.2x
2 | 1192 | 342 | 3.5x
4 | 846 | 314 | 2.7x
6 | 564 | 273 | 2.1x
8 | 516 | 270 | 1.9x
12 | 399 | n/a | n/a
1 socket: up to 6 cores; 2 sockets: 8 and 12 cores
Results from HP Z800 Workstation, 2 x Xeon X5670 2.93 GHz, 48 GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17
ANSYS Mechanical HPC Case Study: Enabling Enhanced Productivity
“Parallel processing makes it possible to evaluate five to 10 design iterations per day, enabling Cognity to rapidly improve their design...”
- Rae Younger, Managing Director, Cognity Limited
Application: Stress analysis of hydraulic deflection housing
Software: ANSYS Mechanical HPC
Solution: Critical to meet delivery-time requirements for this project
Business solution: Ability to complete the design in approx. 70% less time than would have been required
ANSYS Mechanical HPC Case Study: Enabling Enhanced Productivity
“By optimizing our solver selection and workstation configuration, and including GPU acceleration, we’ve been able to dramatically reduce turnaround time — from over two days to just an hour. This enables the use of simulation to examine multiple design ideas and gain more value out of our investment in simulation.”
- Berhanu Zerayohannes, Senior Mechanical Engineer, NVIDIA
Application: Deflection and bending of 3-D glasses
Software: ANSYS Mechanical HPC
Solution: From 60 hours per simulation to 47 minutes (77x speedup)
Business solution: Ability to ensure robust performance of the 3-D glasses by examining multiple design ideas
Copyright 2011 NVIDIA Corporation. All rights reserved.
ANSYS CFX Partitioning
Optimize parallel partitioning in multi-core clusters (CFX, beta)
• The partitioner determines the number of connections between partitions and optimizes partition-to-host assignments
Re-use previous results to initialize calculations on large problems (CFX, beta)
• Large-case interpolation for cases with more than ~100M nodes
Clean-up of the coupled partitioning option for multi-domain cases (CFX)
• Eliminates ‘isolated’ partition spots
Dramatically reduced partitioning times for cases with fluid-solid interfaces and very large numbers of regions
[Figure: partitions P1–P8 assigned to Compute Node 1 and Compute Node 2]
The partitioning step finds the adjacency between partitions; partitions with maximum adjacency are grouped on the same compute node.
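The slide states the goal (group the most adjacent partitions on the same node) but not the CFX algorithm. The sketch below is a generic greedy heuristic under that goal, using a hypothetical adjacency between eight partitions (P1–P8 numbered 0–7); it compares the resulting off-node connection count with a naive round-robin placement.

```python
from typing import Dict, List, Tuple

# Hypothetical adjacency: shared faces between partitions (0-based, so P1 -> 0)
Adjacency = Dict[Tuple[int, int], int]


def link(weights: Adjacency, a: int, b: int) -> int:
    # Connection strength between two partitions, regardless of key order
    return weights.get((a, b), 0) + weights.get((b, a), 0)


def group_partitions(n_parts: int, weights: Adjacency, n_nodes: int, per_node: int) -> List[List[int]]:
    """Greedily place strongly connected partitions on the same compute node."""
    unassigned = set(range(n_parts))
    groups: List[List[int]] = []
    for _ in range(n_nodes):
        if not unassigned:
            break
        # Seed each node with the partition that has the most connections overall
        seed = max(sorted(unassigned), key=lambda p: sum(link(weights, p, q) for q in range(n_parts)))
        group = [seed]
        unassigned.remove(seed)
        # Then keep adding the partition most connected to this node's current set
        while len(group) < per_node and unassigned:
            best = max(sorted(unassigned), key=lambda p: sum(link(weights, p, q) for q in group))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups


def off_node_traffic(groups: List[List[int]], weights: Adjacency) -> int:
    # Sum of connections whose two partitions sit on different nodes
    node_of = {p: i for i, g in enumerate(groups) for p in g}
    return sum(c for (a, b), c in weights.items() if node_of[a] != node_of[b])


# Two tightly coupled chains of four partitions, weakly joined into a ring
weights = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (4, 5): 10, (5, 6): 10, (6, 7): 10, (3, 4): 1, (0, 7): 1}
greedy = group_partitions(8, weights, n_nodes=2, per_node=4)
round_robin = [[0, 2, 4, 6], [1, 3, 5, 7]]
print(greedy, off_node_traffic(greedy, weights), off_node_traffic(round_robin, weights))
```

The greedy placement keeps each chain on one node, so only the two weak links (weight 2) cross the interconnect, against 62 for round-robin. Real partitioners use more refined graph methods; this only illustrates why adjacency-aware placement reduces off-node communication.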
ANSYS CFX Scalability - Not just for large meshes
Problem description:
Case | Definition file | Type | Number of nodes | Turbulence model | Other physical model
Case 1 | perf_Airfoil_R12.def | External aero | 9,933,000 | KE | Compressible, total energy
Case 2 | perf_AirliftReactor_R12.def | Multiphase | 474,993 | KE | Inhomogeneous
Case 3 | perf_IndyCar_R12.def | External aero | 483,460 | KE | Incompressible
Case 4 | perf_Internal_R12.def | Multiphase | 943,175 | KE | Thermal energy
Case 5 | perf_LeMansCar_R12.def | External aero | 1,864,025 | KE | Thermal energy
Case 6 | perf_Pump_R12.def | Rotating machine | 1,305,718 | KE | Frozen rotor
ANSYS CFX Test Case 1 – Airfoil
[Figure: speedup vs. number of cores (up to 50), measured performance vs. linear scaling]
Mesh count: 9,930,000 nodes
ANSYS CFX Test Case 2 – Air Lift Reactor
[Figure: speedup vs. number of cores (up to 10), measured performance vs. linear scaling]
Mesh count: 474,993 nodes
ANSYS CFX Test Case 3 – Indy Car
[Figure: speedup vs. number of cores (up to 10), measured performance vs. linear scaling]
Mesh count: 483,460 nodes
ANSYS CFX Test Case 4 – Internal
[Figure: speedup vs. number of cores (up to 20), measured performance vs. linear scaling]
Mesh count: 943,175 nodes
ANSYS CFX Test Case 5 – Le Mans
[Figure: speedup vs. number of cores (up to 35), measured performance vs. linear scaling]
Mesh count: 1,864,025 nodes
ANSYS CFX Test Case 6 – Pump
[Figure: speedup vs. number of cores (up to 20), measured performance vs. linear scaling]
Mesh count: 1,305,718 nodes
ANSYS Fluent Parallel Scalability
Consistently improved scalability across releases (Sedan, 4M cells)
[Figure panels: solver rating vs. number of cores]
• Intel Harpertown: releases 6.3.0 and 12.0.0, up to 600 cores
• Xeon X5560 @ 2.80 GHz (Nehalem EP): releases 12.0.0 and 13.0.0, up to 600 cores
• Intel Westmere: releases 13.0.0 and 14.0.0, up to 2,000 cores
ANSYS Fluent Parallel Scalability
Consistently improved scalability across releases (Truck, 111M cells)
[Figure panels: solver rating vs. number of cores]
• SGI ICE 8400EX, Intel 6-core: releases 12.0.0 and 13.0.0, up to 2,000 cores
• Intel Westmere hex-core 2.93 GHz: releases 13.0.0 and 14.0.0, up to 5,000 cores
• Intel Harpertown: releases 6.3.0 and 12.0.0, up to 1,200 cores
Hybrid Parallelisation
Hybrid parallel: fast shared-memory communication (OpenMP) within a machine speeds up overall solver performance, while distributed memory (MPI) is used between machines.
Scalability, Including Monitors
• Scalability to higher core counts
• Simulations with monitors, including plotting and printing
[Figure: example scaling data with R14 monitors; F1 car, hex-core mesh, 130 million cells, monitors enabled; up to 3072 cores]
Monitor support optimizations maintain scalability expectations.
ANSYS Fluent Read and Write Times
File I/O has typically been a bottleneck for large grids.
Consistently reduced I/O times across releases
[Figure panels: time to read 111M truck case; time to write 111M truck case; time to read 200M cavity case]
ANSYS Fluent Auto-Partitioning
Auto-partitioning is now very quick: less than 10 s to process 800M cells!
The serial pre-partitioning step is no longer required.
Time to partition the cavity case over 768 cores:
Mesh size (cells) | 200M | 400M | 600M | 800M
Time (s) | 2.914 | 4.706 | 6.617 | 9.86
Time to partition the 111M truck case (truck_111m):
Cores | 192 | 384 | 768 | 1536
Time (s) | 5.307 | 4.542 | 6.177 | 8.109
Fluids I/O
Fluent and CFX use a “singular” file structure
• There is one global set of files, and every process writes to them.
This methodology breaks down at large core counts, where file I/O becomes a bottleneck.
• CFX deals with this by using inline compression (cdat).
• Fluent has inline compression (cdat) and, since v12.x, also supports a parallel data file (pdat).
Parallel file system support in ANSYS Fluent
– ~10x-20x speedup for data write
– Eliminates the scaling bottleneck for data-intensive simulations on large clusters (e.g., transient flows)
[Figure: serial I/O vs. parallel I/O in ANSYS Fluent]
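The slide does not describe the pdat format itself. As a generic illustration of why parallel writes relieve the single-writer bottleneck, the sketch below lets several workers write their own block of one shared file at precomputed offsets, in the spirit of MPI-IO. The function name, the thread-based "ranks" and the file layout are assumptions, not Fluent's implementation; `os.pwrite` is available on Unix-like systems only.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate
from typing import List


def parallel_write(path: str, chunks: List[bytes]) -> None:
    """Write chunks[i] (one per hypothetical 'rank') into one shared file, concurrently."""
    # Exclusive prefix sum of chunk sizes: rank i starts where rank i-1 ends
    offsets = [0] + list(accumulate(len(c) for c in chunks))[:-1]
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        def write_block(i: int) -> None:
            data, pos = memoryview(chunks[i]), offsets[i]
            while data:  # pwrite may write fewer bytes than requested
                n = os.pwrite(fd, data, pos)
                data, pos = data[n:], pos + n

        with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
            list(pool.map(write_block, range(len(chunks))))  # list() re-raises worker errors
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "result.dat")
    parallel_write(out, [b"rank0|", b"rank1-data|", b"r2|"])
    with open(out, "rb") as f:
        print(f.read())  # b'rank0|rank1-data|r2|'
```

Because each worker owns a disjoint byte range, no worker waits for another to finish, whereas a "singular" scheme funnels all data through one writer.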
Fluids I/O
Asynchronous I/O for Linux Fluent: total write time 3-5x quicker over NFS
Mesh | File | Location | Async I/O | Time (s)
15M | Case | NFS | OFF | 217
15M | Case | NFS | ON | 62
15M | Data | NFS | OFF | 113
15M | Data | NFS | ON | 8
30M | Case | NFS | OFF | 207
30M | Case | NFS | ON | 75
30M | Data | NFS | OFF | 144
30M | Data | NFS | ON | 10
Even larger speedups on bigger cases and on local disk (up to 10x)
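The "3-5x quicker" headline can be checked against the table by summing the case- and data-file write times for each mesh. The values are the slide's; the helper name is illustrative.

```python
# Write times in seconds over NFS, from the table above: (async off, async on)
TIMES = {
    ("15M", "case"): (217, 62),
    ("15M", "data"): (113, 8),
    ("30M", "case"): (207, 75),
    ("30M", "data"): (144, 10),
}


def total_write_speedup(mesh: str) -> float:
    """Speedup of case + data write time when asynchronous I/O is switched on."""
    off = sum(t_off for (m, kind), (t_off, t_on) in TIMES.items() if m == mesh)
    on = sum(t_on for (m, kind), (t_off, t_on) in TIMES.items() if m == mesh)
    return off / on


for mesh in ("15M", "30M"):
    print(mesh, round(total_write_speedup(mesh), 2))  # 15M 4.71, then 30M 4.13
```

Both totals fall in the quoted 3-5x range; the data files alone improve far more (113 s to 8 s is about 14x), while the case files improve about 3x.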
Parallel Scaling for Complex Physics: Innovation in Fluent 14.5 for Discrete Phase Performance
[Figure: original partitions, new (banded) partition boundary, and contours of particle weights]
Time in seconds; 100x100 mesh, 30000 particles, 2 machines with 8 cores each:
Partitioning | MPI | Hybrid
Original | 180.7 | 78.1
Banded partition | 159.5 | 47.6
Hybrid particle tracking balances the load within a machine, while the enhanced partitioning spreads it across machines.
Time in seconds; DEM, 1 time step, 44k cells, 600k particles, 12x Intel Westmere 2.93 GHz:
Original | Hybrid
33.47 | 8.89
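The slide shows the idea (new partition boundaries that follow the particle-weight distribution) but not the algorithm. One simple way to get weight-balanced bands is to cut a running (prefix) sum of per-row particle weights into equal shares. The sketch below does that for a 1-D list of hypothetical row weights; it is not Fluent's implementation.

```python
from itertools import accumulate
from typing import List


def banded_partition(row_weights: List[float], n_bands: int) -> List[int]:
    """Assign each mesh row to a band so that bands carry roughly equal particle weight."""
    total = sum(row_weights)
    if total <= 0 or n_bands < 1:
        raise ValueError("need positive total weight and at least one band")
    bands = []
    for cum, w in zip(accumulate(row_weights), row_weights):
        centre = cum - w / 2  # cumulative weight at the middle of this row
        bands.append(min(int(centre / total * n_bands), n_bands - 1))
    return bands


def band_loads(row_weights: List[float], bands: List[int], n_bands: int) -> List[float]:
    # Total particle weight handled by each band
    loads = [0.0] * n_bands
    for w, b in zip(row_weights, bands):
        loads[b] += w
    return loads


# Hypothetical: particles concentrated in rows 4 and 5 (0-based) of an 8-row mesh
weights = [1, 1, 1, 1, 10, 10, 1, 1]
banded = banded_partition(weights, 2)
equal_rows = [0, 0, 0, 0, 1, 1, 1, 1]  # naive split by row count
print(banded, band_loads(weights, banded, 2), band_loads(weights, equal_rows, 2))
# [0, 0, 0, 0, 0, 1, 1, 1] [14.0, 12.0] [4.0, 22.0]
```

Because the running sum only increases, each band is a contiguous strip of rows, which keeps partition boundaries simple while evening out the particle work.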
HPC Fluids Demonstration Case: The 50:50:50 Method
• To demonstrate the 50:50:50 method – Volvo XC60 vehicle model
  – Four shape parameters
  – RBF Morph (integrated within FLUENT) to define shape parameters
  – Grid morphing in parallel
• ANSYS Workbench (framework to automate the process)
  – To drive shape parameters
  – To create the DOE
  – To perform goal-driven optimization
50 design points in the design space (EXTENT)
50 million cells used in the CFD simulation of each design point (ACCURACY)
50 hours total elapsed time to simulate all the design points (SPEED)
“One-Click”: the entire design space is simulated and post-processed completely automatically after the initial baseline case setup
HPC Fluids Demonstration Case
STEP 1: Prepare meshed model for baseline vehicle shape
STEP 2: CFD solver setup, define shape parameters
STEP 3: Generate DOE using input shape parameters
STEP 4: Morph vehicle shape, run CFD simulation
STEP 5: Collate data, perform optimization
Mesh morpher integrated within the FLUENT solver; solver (FLUENT), optimizer (DX) and post-processor (CFD-Post) integrated within ANSYS Workbench
Shape parameters illustrated: boat tail angle, long roof drop angle, front spoiler
HPC Fluids Demonstration Case
Times in seconds unless stated:
Task | 768 cores | 384 cores | 288 cores | 240 cores | 144 cores
Baseline case (i.e. design point 1):
Read volume mesh of baseline case into the CFD solver and apply solver settings | 225 | 340 | 365 | 481 | 228
CFD solution | 6979 | 11153 | 14409 | 17256 | 27246
Write CFD data file | 681 | 538 | 558 | 600 | 532
Each subsequent design point:
Morph vehicle shape | 84 | 59 | 65 | 69 | 100
CFD solution | 1284 | 1754 | 2208 | 2630 | 4100
Write CFD data file | 734 | 559 | 572 | 621 | 532
Total run time (wall clock) needed for all 50 design points (hours) | 30.80 | 35.63 | 42.98 | 50.28 | 72.19
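The total row follows from the per-task times if the baseline run counts as design point 1 and each of the remaining 49 design points costs one morph, one CFD solution and one data-file write. The sketch below reproduces the table's totals under that reading; the names are illustrative.

```python
# Per-task times in seconds from the table above, keyed by core count:
# (baseline read+setup, baseline solve, baseline write,
#  per-design morph, per-design solve, per-design write)
TIMES = {
    768: (225, 6979, 681, 84, 1284, 734),
    384: (340, 11153, 538, 59, 1754, 559),
    288: (365, 14409, 558, 65, 2208, 572),
    240: (481, 17256, 600, 69, 2630, 621),
    144: (228, 27246, 532, 100, 4100, 532),
}
N_DESIGN_POINTS = 50


def total_hours(cores: int) -> float:
    """Wall-clock hours for the baseline plus the other 49 design points."""
    read, solve0, write0, morph, solve, write = TIMES[cores]
    baseline = read + solve0 + write0
    per_point = morph + solve + write
    return (baseline + (N_DESIGN_POINTS - 1) * per_point) / 3600


for cores in TIMES:
    print(cores, f"{total_hours(cores):.2f} h")  # 30.80, 35.63, 42.98, 50.28, 72.19
```

The same arithmetic shows where the 50-hour target is met: at 288 cores and above, while 240 cores lands just over 50 hours.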
HPC Fluids Demonstration Case
Compute Cluster Details
1. Intel’s Endeavor Cluster
2. Intel Xeon X5670 (dual socket)
3. Clock speed 2.93 GHz
4. Six cores per socket (12 cores per node)
5. 24 GB RAM @ 1333 MHz, SMT ON, Turbo ON
6. QDR Infiniband
7. RHEL Server Release 6.1
ANSYS Fluids HPC Case Study: Enabling Enhanced Insight
“Petrobras relies on ANSYS software for its superior parallel scalability, together with advanced multiphase models and dynamic meshing.”
- Carlos Alberto Capela Moraes, Technical Consultant, CENPES (Petrobras R&D Center)
Application: Transient multiphase simulation of sand transportation
Software: ANSYS CFD HPC
Solution: Consider more detailed, accurate and complete 3-D flow assurance simulations than ever before
Business solution: Ability to understand critical scenarios and complex operations of upstream processing systems
ANSYS Fluids HPC Case Study: Enabling Enhanced Insight
“ANSYS HPC technology has ensured that we can test and implement changes quickly and competitively. This allows us to turn around simulation results for multiple designs between race qualifications on Fridays and Saturdays...”
- Nathan Sykes, CFD Team Leader, Red Bull Racing
Application: Car aerodynamics, braking, exhaust systems
Software: ANSYS Fluent HPC
Solution: Ability to obtain high-fidelity insight into car performance in shorter turnaround times
Business solution: ANSYS HPC crucial to Red Bull Racing Formula One Championships in 2010 & 2011
ANSYS Fluids HPC Case Study: Enabling Enhanced Insight
“ANSYS HPC technology is enabling Cummins to use larger models with greater geometric details and more-realistic treatment of physical phenomena...”
- John Horsley, Engineer, Cummins Turbo Technologies
Application: Turbochargers of diesel engines
Software: ANSYS CFX HPC
Solution: High-fidelity results; 12 times faster; ability to simultaneously evaluate 5 full-stage compressor or turbine designs in a few hours
Business solution: Ability to bring new products to market in less time while substantially reducing expenses
HPC for High Fidelity CFD
EURO/CFD
• Model sizes up to 200M cells (ANSYS Fluent)
• 2011: cluster of 700 cores
  – 64-256 cores per simulation
[Figure: model size vs. turnaround time: 3 million cells (6 days), 10 million (5 days), 25 million (4 days), 50 million (2 days), with increasing spatial-temporal accuracy and complexity of physical phenomena]
Physics covered: supersonic, multiphase, radiation, compressibility, conduction/convection, transient, optimisation / DOE, dynamic mesh, LES, combustion, aeroacoustics, fluid-structure interaction
HPC Advancements in Licensing
Scalable licensing
• ANSYS HPC (per-process)
• ANSYS HPC Pack
  – Each simulation consumes one or more Packs
  – The number of parallel-enabled cores increases quickly with added Packs:
Packs per simulation | 1 | 2 | 3 | 4 | 5
Parallel enabled (cores) | 8 | 32 | 128 | 512 | 2048
• ANSYS HPC Workgroup
  – 128 to 2048 parallel cores, shared across any number of simulations on a single server
• ANSYS HPC Enterprise
  – Similar to HPC Workgroup, but deploy and use anywhere in the world
Single solution for multiphysics and any level of fidelity
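The five points in the Packs chart follow a simple pattern: one Pack enables 8 cores and each additional Pack multiplies that by four, i.e. cores = 2 × 4^packs. A sketch of that relationship, fitted to the chart's values for 1 to 5 Packs only (the function name is illustrative):

```python
def cores_enabled(packs: int) -> int:
    """Parallel-enabled cores for one simulation using `packs` HPC Packs (fitted to the chart)."""
    if not 1 <= packs <= 5:
        raise ValueError("the chart only covers 1 to 5 Packs")
    return 2 * 4 ** packs


print([cores_enabled(p) for p in range(1, 6)])  # [8, 32, 128, 512, 2048]
```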
HPC Advancements Through Partnerships
• ANSYS maintains close technical collaboration with the leaders in HPC
• This mutual commitment ensures that you get the greatest possible value from your overall HPC investment
• Some current examples:
  – Optimized performance on multicore processors from Intel, with R&D focused on Intel’s Many Integrated Core (MIC) architecture
    • Over 60% performance boost for the latest Intel® Xeon® E5-2600 processor (Sandy Bridge) family compared to the previous Intel (Westmere) generation
  – GPU computing accelerates ANSYS Mechanical today, with very active R&D engagement with NVIDIA across the full portfolio
  – ANSYS and HP – tuning performance and productivity at any scale
  – ANSYS and IBM – optimized cluster and storage architectures for ANSYS
  – ANSYS and Cray – support for extreme scalability of ANSYS CFD on the Cray XE, up to thousands of cores
HPC Advancements in Processor Technology - ANSYS Mechanical Parallel Scalability on Xeon E5
Intelligent Performance for structural/thermal simulation
[Figure: ANSYS Mechanical 14 relative performance (higher is better): 6-core Xeon X5675 = 1, 8-core Xeon E5-2680 = 1.63]
The memory capacity of the Intel® Xeon® processor E5-2600 product family allows even the largest workloads to be handled in-core, significantly improving run times.
Intel® AVX delivers a significant speedup in factoring and solving the assembled FE equation matrix, which is floating-point intensive.
Users will usually see a significant reduction in simulation run times, even for the largest models, due to the additional cores and larger memory capacity. This allows customers to run larger, higher-fidelity models and more iterations within given time and cost constraints, improving product quality and enabling innovation.
Data Source: Intel approved/published results as of February 1, 2012.
HPC Advancements in Processor Technology - ANSYS Fluent Parallel Scalability on Xeon E5
Leading Performance for fluid flow simulation
The memory bandwidth of the Intel® Xeon® processor E5-2600 product family allows excellent scalability and per-core performance.
Support for higher-speed memory DIMMs, added on-core capacity for memory loads, and a larger cache size are key to extending performance and scalability.
Higher memory bandwidth has a pronounced impact on fully coupled solver applications, which are the most memory intensive. Sedan_4m is shown as an example of fully coupled solver performance; Truck_14m is representative of segregated solver performance. The horizontal line at 1.63 represents the geomean speedup over 6 standard benchmarks.
[Figure: ANSYS Fluent 14 relative performance (higher is better), 6-core Xeon X5675 = 1 vs. 8-core Xeon E5-2680: Sedan_4m 1.86, Truck_14m 1.53, each shown against the geomean]
Data Source: Approved/published results as of February 1, 2012.
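The "geomean" line summarizes several benchmark speedups with a geometric mean, the usual way to average ratios. The slide gives only two of the six benchmark values, so the sketch below applies the formula to those two for illustration; its result is not the six-benchmark figure on the slide.

```python
import math
from typing import Sequence


def geomean(values: Sequence[float]) -> float:
    """Geometric mean: exp of the mean of the logs (values must be positive)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Relative speedups (E5-2680 vs. X5675) for the two Fluent benchmarks shown
print(round(geomean([1.86, 1.53]), 3))  # 1.687
```

Unlike an arithmetic mean, the geometric mean treats a 2x gain and a 0.5x loss as cancelling out, which is why it is preferred for averaging speedup ratios.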
HPC Advancements in Processor Technology - ANSYS CFX Parallel Scalability on Xeon E5
Good scalability and more operations per clock make obtaining results on Intel® Xeon® E5 1.68x faster than on Intel Xeon 5600 platforms. For the end user, this means faster turnaround, or solving larger tasks with the same resources, along with lower TCO.
[Figure: ANSYS CFX 14 relative performance (higher is better), 6-core Xeon X5675 = 1 vs. 8-core Xeon E5-2680: LeMansCar 1.76, Stage Compressor 1.49, each shown against the geomean]
Data Source: Approved/published results as of February 1, 2012.
Upcoming HPC Advancements
• ANSYS focus on HPC is ongoing…
  – Architecting for extreme scalability
    • Performance at 100’s and 10,000’s of cores for FEA and CFD, respectively
    • Innovative mechanical solvers: multilevel PCG, 2D parallel DSPARSE fronts
    • GP-GPUs for radiation, UDFs, DEM and possibly other CFD solvers
    • Hybrid distributed/shared memory and vector processing paradigms
  – Scalability across all components and the full simulation process
    • Meshing, setup, solver, I/O, visualization, optimization…
    • Distributed parallel meshing integrated with the solver
    • Parallel for linear dynamics, including mode superposition-based analyses
  – Ongoing optimization and performance tuning
    • Dynamic load balancing, optimized resource mapping, compiler evaluation
  – Usability
    • Multi-component parallel execution environment, job scheduler support
    • Hardware fault tolerance, system performance tracking and debugging
• All to achieve next-generation capability / performance!
Information Available
ANSYS HPC Partner Solutions – http://www.ansys.com/About+ANSYS/Partner+Programs/Strategic+HPC+Partnerships
– Reference configurations
• Performance data
• White papers
• Sales contact points
Performance Data – http://www.ansys.com/benchmarks
Information Available
ANSYS Platform Support • http://www.ansys.com/Support/Platform+Support
– Platform Support Policies
– Supported Platforms
– Supported Hardware
– Tested systems
ANSYS Resource Library • http://www.ansys.com/demoroom/
– Search for HPC!
ANSYS Advantage • Online Magazine
Information Available
Customer Portal • http://www1.ansys.com/customer/
– Knowledge Resources
– Installation and Systems FAQs
Customer Support • http://www1.ansys.com/customer/
• Portal, Email or Phone
Global ANSYS network providing Comprehensive Support
“Take Home” Points / Discussion
ANSYS HPC performance enables scaling for high fidelity
– What could you learn from a 10M (or 100M) cell / DOF model?
– What could you learn if you had time to consider 10x more design ideas?
– Scaling applies to “all physics”, “all hardware” (desktop and cluster)
ANSYS continually invests in software development for HPC
• Committed to leading-edge scalability, performance and usability
• Maximized value from your HPC investment
Comments / Questions / Discussion
THANK YOU