Understanding Hardware Selection to Speed Up Your Simulations
October 2019
Wim Slagter, PhD
ANSYS, Inc.
Major Barrier - Turnaround Time Limitations
Source: Intel-ANSYS Simulation Survey 2014
Problem Statement
“I am not achieving the performance and throughput I was expecting from my hardware & software”
Image courtesy of Intel Corporation
Building A Balanced System Is The Key To Improving Your Experience
If your system is slow, so are your engineers and analysts.
Components to balance: processors, memory, storage, networks.
Image courtesy of Intel Corporation
What Hardware Configuration to Select?
CPUs? GPUs? SMP vs. DMP? Interconnects? Clusters? HDD vs. SSD?
Agenda
• HPC Terminology
• Hardware Considerations
• Solution Reference Architecture
• Supporting “HPC Resources Anywhere”
• HPC Parallel & Parametric Licensing
Agenda - HPC Terminology
HPC Hardware Terminology
A cluster consists of Machine 1 (or Node 1) through Machine N (or Node N), connected by an interconnect (GigE or InfiniBand). Each machine contains Processor 1 (or Socket 1), Processor 2 (or Socket 2), and optionally a GPU.
Shared Memory Parallel
• Shared memory parallel (SMP) systems share a single global memory image that may be distributed physically across multiple cores, but is globally addressable.
• OpenMP is the industry standard.
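For illustration, a minimal OpenMP loop in C (not ANSYS code): all threads read and write the same globally addressable arrays, with no explicit data exchange.

```c
/* Minimal SMP sketch: every thread works on a slice of the same
 * arrays in one shared address space -- no data is copied.
 * Build: cc -O2 -fopenmp smp_demo.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) b[i] = (double)i;

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];           /* shared memory, parallel loop */

    printf("threads available: %d, a[42] = %f\n",
           omp_get_max_threads(), a[42]);
    return 0;
}
```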
Distributed Memory Parallel
• Distributed memory parallel (DMP) processing assumes that the physical memory for each process is separate from all other processes.
• Parallel processing on such a system requires some form of message-passing software to exchange data between the cores.
• MPI (Message Passing Interface) is the industry standard for this.
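For illustration, a minimal MPI program in C (not ANSYS code): each rank owns private memory, and data moves only through explicit messages.

```c
/* Minimal DMP sketch: each rank owns a private value; ranks can only
 * combine results through explicit message passing.
 * Build: mpicc -O2 dmp_demo.c ; run: mpirun -np 4 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, sum = 0.0;
    /* No rank can read another rank's memory directly; a collective
     * message-passing call gathers the contributions instead. */
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %f\n", size, sum);
    MPI_Finalize();
    return 0;
}
```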
Agenda - Hardware Considerations
What Hardware Configuration to Select?
CPUs? GPUs? SMP vs. DMP? Interconnects? Clusters? HDD vs. SSD?
Scalability on Workstations - ANSYS Fluent 2019 R2
• HP Z8 G4 Workstation
• 2x Intel Xeon Platinum 8160 (2.1-3.7 GHz, 24 cores) CPUs
• 192 GB RAM (2666 MHz, 8 GB x 24 DIMMs)
• 1 TB HP Z Turbo Drive G2 (NVMe SSD)

Speed-up ratio vs. number of CPU cores (two benchmark cases):

Cores:         4     8     12    16    32    36    48
Aircraft:      1.00  1.82  2.67  3.43  5.47  5.59  5.82
Landing gear:  1.00  1.81  2.61  3.32  5.46  5.73  6.00
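The flattening of these curves is the behavior Amdahl's law predicts once the serial fraction of the work dominates. A small C sketch, with the parallel fraction p as a purely illustrative assumption (not fitted to these measurements):

```c
/* Amdahl's law: S(n) = 1 / ((1-p) + p/n), where p is the fraction of
 * the work that parallelizes. The slide normalizes to the 4-core run,
 * so we report S(n)/S(4). p below is an assumption for illustration. */
#include <stdio.h>

static double amdahl(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

int main(void) {
    const double p = 0.975;                     /* assumed parallel fraction */
    const int cores[] = {4, 8, 12, 16, 32, 36, 48};
    double s4 = amdahl(p, 4);
    for (int i = 0; i < 7; i++)
        printf("%2d cores: relative speedup %.2f\n",
               cores[i], amdahl(p, cores[i]) / s4);
    return 0;
}
```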
Performance Comparison of an Old and a New Workstation - ANSYS Fluent 2019 R2

Workstations: Z820 Workstation (dual Intel Xeon E5-2697v2, 2.7-3.5 GHz, 12 cores; RAM 1866 MHz, 4 channels) vs. Z8 G4 Workstation (dual Intel Xeon Platinum 8160, 2.1-3.7 GHz, 24 cores; RAM 2666 MHz, 6 channels)
MPI: IBM-MPI
Number of CPU cores tested: 4 / 8 / 12 / 16 / 24 / 32 / 36
Benchmark models: 2 cases, aircraft and landing gear
Performance Comparison of an Old and a New Workstation - ANSYS Fluent 2019 R2

Speed-up ratio vs. number of CPU cores:

Aircraft:      4     8     12    16    24    32    36
Z820:          1.00  1.72  1.78  1.97  2.04  -     -
Z8 G4:         1.07  1.94  2.82  3.65  4.95  5.80  5.91

Landing gear:  4     8     12    16    24    32    36
Z820:          1.00  1.77  2.29  2.64  2.81  -     -
Z8 G4:         1.08  2.08  2.97  3.83  5.20  6.05  6.30

142% speed-up at 24 cores for the small CFD simulation (aircraft); 85% speed-up at 24 cores for the mid-size CFD simulation (landing gear).

The memory differs between the two machines (2666 MHz vs. 1866 MHz), and they may differ in memory bandwidth as well. At 4 cores the performance is very similar; as more cores are used, the results start to diverge, indicating a memory bandwidth difference. In short, both memory speed and bandwidth likely play a role.
Optimized for the Latest HPC Architectures - ANSYS Mechanical 19.x
ANSYS Features & Capabilities

Optimized for Intel Xeon Gold processors:
• Upgraded to the Intel MKL 2017 Update 2 libraries on Linux and Windows
• Provides access to the AVX-512 instruction set
• Biggest speedup gains achieved in the sparse direct solver

Benchmark results (geometric mean solve times):

         Iterative solver   Direct solver
R19.0    572 sec            425 sec
R19.1    539 sec            404 sec

• R19 benchmark set (DMP); geometric mean values for each class of benchmarks
• Run on 1, 2, 4, 8, 16, and 32 cores
• 2x Intel Xeon Gold 6148 (2.4 GHz, 40 cores total), 192 GB RAM, Linux CentOS 7.3

R19.1 performs ~10% faster than R19.0 on Skylake systems.
Optimized for the Latest HPC Architectures - ANSYS Mechanical 2019 R2

[Chart: core solver rating on 32 cores, normalized to Intel Xeon E5-2699 v3 (Haswell), for benchmarks V19cg-1 through V19sp-5; processors compared: Intel Xeon E5-2699 v3 (Haswell), Intel Xeon E5-2697 v4 (Broadwell), Intel Xeon Gold 6254 (Cascade Lake). The labeled data points range from 1.19 to 1.82.]
Optimized for the Latest HPC Architectures - ANSYS Fluent 2019 Rx

[Chart: ANSYS Fluent standard benchmark, Aircraft Wing 14M cells; rating (jobs/day) vs. MPI tasks (32-512 cores); higher is better. Series: Intel Xeon Gold 6142 (16c/2.6 GHz/150W) with 2019 R1 (19.3.0) vs. Intel Xeon Gold 6242 (16c/2.8 GHz/150W) with 2019 R3 (19.5.0).]
Optimized for the Latest HPC Architectures - ANSYS CFX 2019 R3

[Chart: rating (jobs per day) vs. number of compute nodes (1-8); higher is better. Intel Xeon processors compared: Xeon E5-2690v4, Xeon Gold 6154, Platinum 8268.]
Performance Comparison of Intel Xeon Processors - ANSYS CFD 18.1

[Charts: speed-up vs. number of cores (0-32) for ANSYS CFX 18.1 and ANSYS Fluent 18.1, comparing E5-2680v2 (HDD), E5-2697v4 (HDD), and Gold 6150 (HDD). Each series is the average of model_1 and model_2, with Turbo Boost on and off, for each CPU. Generation-over-generation gains of 35% and 43% are highlighted.]
Performance Comparison of Intel Xeon Processors - ANSYS Fluent 2019 R2

Speed-up ratio vs. number of CPU cores:

Aircraft:            4     8     12    16    32    36    48
Xeon Platinum 8160:  1.00  1.82  2.67  3.43  5.47  5.59  5.82
Xeon Gold 6154:      1.03  1.85  2.71  3.41  5.04  5.09  -

Landing gear:        4     8     12    16    32    36    48
Xeon Platinum 8160:  1.00  1.81  2.61  3.32  5.46  5.73  6.00
Xeon Gold 6154:      1.09  1.98  2.86  3.54  5.17  5.09  -

Beyond 32 cores, the Xeon Gold 6154 tends to stall compared with the Xeon Platinum 8160.
Performance Comparison of Intel Xeon Processors - ANSYS 2019 R1

• A newer generation (in this case: Cascade Lake) might show a significant performance gain over the previous generation, but may require more cores.
• Hardware vendors often do not provide an apples-to-apples comparison in the terms we would expect.
• Keep this in mind when comparing one processor with another.

1.2x and 2.4x relative performance based on increased core count, respectively.
Processor Performance Comparisons

• A newer generation might show a significant performance gain over the previous generation, but may require more cores.
• Hardware vendors often do not provide an apples-to-apples comparison in the terms we would expect.
• Keep this in mind when comparing one processor with another.

[Chart: ANSYS Fluent 18.1 increased performance with the Intel Xeon Gold 6148 processor; workload: sedan_4m. Configurations: 2S Intel Xeon E5-2698 v3, 2S Intel Xeon E5-2697 v4, Intel Xeon Gold 6148. Callouts: "up to YY% faster", "up to 13% faster", "up to 60% faster @ 13% more cores".]
Processor Performance Comparisons - ANSYS Mechanical 2019 R2

• For Mechanical, the situation is different because of the AVX-512 support introduced between the E5 v4 and Gold processor generations.

[Chart: core solver rating on 32 cores for benchmark V19cg-1, normalized to Haswell; Intel Xeon E5-2699 v3 (Haswell), Intel Xeon E5-2697 v4 (Broadwell), Intel Xeon Gold 6254 (Cascade Lake); the top bar reaches 1.65.]
Processor Performance Comparisons - Intel Xeon Skylake vs. AMD EPYC (Naples)
Processor Performance Comparisons - Intel Xeon Cascade Lake vs. AMD EPYC (Naples)

Hardware specifics:
• EPYC 7601: AMD EPYC 7601, 2 sockets, 32 cores per socket, Mellanox EDR interconnect
• CLX-9242: Intel Xeon Platinum 9242, 2 sockets, 48 cores per socket, Intel OPA interconnect
• CLX-8260L: Intel Xeon Platinum 8260L (CLX-SP), 2 sockets, 24 cores per socket, Intel OPA interconnect (single rail)
Processor Performance Comparisons - Intel Xeon Cascade Lake vs. AMD EPYC (Naples)

• About 1.5x performance compared to EPYC; better with larger numbers of cores/nodes.
• About 1.3x performance compared to EPYC; better with larger numbers of cores/nodes.
Processor Performance Comparisons - Intel Cascade Lake vs. AMD EPYC (Rome)
Understanding the effect of clock speed - ANSYS CFD

ANSYS Fluent 19.2, Intel Xeon Gold 6144:
• Normal: 3.5 GHz
• Overclocked: 4.09 GHz

[Chart: higher is better; gain of 5% on CentOS, 10% on Windows.]

Overclocking can increase speed-up by 10% on a Windows system; about half of that gain on Linux.
Understanding the effect of clock speed - ANSYS Mechanical

• Effect of increased core operating frequencies on the DMP benchmarks running on 12 cores
• Influence is highest for the sparse solver benchmarks

Using a higher clock speed is always helpful to realize productivity gains.
Turbo Boost (Intel) - ANSYS Mechanical

• Relative to 1 core, Turbo Boost on the E5 processor family delivers good performance gains in many cases.
Turbo Boost (Intel) - ANSYS CFD

Using Turbo Boost / Turbo Core can be helpful to realize productivity gains.
Turbo Boost (Intel) - ANSYS CFD

[Charts: speed-up from Turbo Boost (on vs. off) for E5-2697v4 and Gold 6150 at 4, 10, 16, and 32 cores, for a 3M-cell and a 30M-cell single-phase case. Measured gains range from about 1.03x-1.08x at higher core counts up to about 1.07x-1.14x at lower core counts.]

Using Turbo Boost / Turbo Core can be helpful to realize productivity gains, particularly at lower core counts.
Hyper-threading - Evaluation of Hyper-Threading on ANSYS Fluent Performance

iDataPlex M3 (Intel Xeon X5670, 2.93 GHz), Turbo: on. Measurement is the improvement relative to Hyper-Threading off.

[Chart: improvement due to Hyper-Threading for models eddy_417K, turbo_500K, aircraft_2M, sedan_4M, truck_14M; HT off (12 threads on 12 physical cores) vs. HT on (24 threads on 12 physical cores); higher is better. Improvements stay close to 1.0.]

Hyper-threading is NOT recommended.
Understanding the effect of memory bandwidth - Is 24 Cores Equal to 24 Cores?

3 x 8 = 24 cores vs. 2 x 12 = 24 cores
Understanding the effect of memory bandwidth - Is 24 Cores Equal to 24 Cores?

[Chart: a 22% performance gain is highlighted between the two 24-core configurations.]
Understanding the effect of memory bandwidth

A 10-core processor has higher performance per core than a 12-core processor. Consider memory bandwidth per core!
Understanding the effect of memory bandwidth - Is 20 Cores Equal to 20 Cores?

Using fewer cores per node can be helpful to realize productivity gains.
Understanding the effect of memory bandwidth

Using fewer cores per node can be helpful to realize productivity gains.
Understanding the effect of memory channels

A 10-core processor has higher performance per core than a 12-core processor. Consider memory per core!
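One way to see the channel/bandwidth ceiling on your own machine is a crude STREAM-style triad: if doubling the thread count no longer increases the measured GB/s, the memory system, not the cores, has become the bottleneck. A rough C/OpenMP sketch (not the official STREAM benchmark; the array size is an arbitrary assumption chosen to exceed cache):

```c
/* Crude STREAM-style triad to estimate sustained memory bandwidth.
 * Build: cc -O2 -fopenmp triad.c ; vary OMP_NUM_THREADS and compare. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 10000000L   /* ~240 MB across three arrays: larger than any cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];                /* 2 reads + 1 write */
    double t1 = omp_get_wtime();

    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("%d threads: %.1f GB/s\n",
           omp_get_max_threads(), gbytes / (t1 - t0));
    free(a); free(b); free(c);
    return 0;
}
```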
Understanding the effect of memory speed - ANSYS CFD

Impact of DIMM speed on ANSYS Fluent application performance (Intel Xeon X5670, 2.93 GHz; Hyper-Threading off, Turbo on; 12 active threads per node; improvement measured relative to 1066 MHz memory).

[Chart: impact of memory speed (1066 MHz vs. 1333 MHz) for models eddy_417K, turbo_500K, aircraft_2M, sedan_4M, truck_14M.]

• Some processor types have slower memory speeds by default
• In the past, memory speed was more influential
• With current processors, the effect of memory speed is minimal
• On other processors, non-optimal filling of the memory channels can lower the effective memory speed

Memory speed is shown to have a measurable but small effect of approximately 2%.
SMP vs. DMP

Distributed memory parallel (DMP) outperforms shared memory parallel (SMP) computing.

[Charts: speedup factor vs. number of cores for ANSYS Mechanical; the SMP panel spans 4-16 cores (speedup axis to 5x), the DMP panel spans 64-256 cores (speedup axis to 50x).]
Recap

• Faster cores mean a faster solution
• Faster memory means a slightly faster solution
• Memory bandwidth is an important factor for (linear) scalability
• Turbo Boost / Turbo Core modes do give some benefit, especially at low core counts per node
• In general, hyper-threading should not be used, because of licensing implications
• Be careful when looking at comparisons! Make sure you are comparing like with like!
What Hardware Configuration to Select?
CPUs? GPUs? SMP vs. DMP? Interconnects? Clusters? HDD vs. SSD?
Take Advantage of the Latest HPC Architectures - ANSYS Mechanical 19.0 with NVIDIA GPU
ANSYS Application Examples
Take Advantage of the Latest HPC Architectures - ANSYS Mechanical 19.x with NVIDIA GPU
ANSYS Application Examples

• R19 benchmark suite run on a Linux server with 2 Intel Xeon E5-2695 v3 processors
• 256 GB RAM, SSD, 1 NVIDIA P100 16 GB PCIe card, CentOS 7.2

Comparison of entire simulation solution time (not just the calculations that are accelerated on the GPU).
Take Advantage of the Latest HPC Architectures - ANSYS Mechanical 19.x with NVIDIA GPU
ANSYS Application Examples

• 4.2 million DOF; sparse solver; nonlinear static analysis involving contact, plasticity, and gasket elements
• Linux cluster; each compute node contains 2 Intel Xeon E5-2690 v4 (2.1 GHz, 8c) processors, 256 GB RAM, 2 NVIDIA Tesla P100, CentOS 7.4

Comparison of entire simulation solution time (not just the calculations that are accelerated on the GPU).

[Chart: relative speedup for DMP on 16 cores with 0, 1, and 2 GPUs, for R19.0 and R19.1; speedup axis to 4x.]
Take Advantage of the Latest HPC Architectures - ANSYS Mechanical 18.1 with NVIDIA GPU
ANSYS Application Examples

• R18 benchmark suite run on a PNY workstation with 2 Intel Xeon E5-2620 v4 (2.1 GHz, 8c) processors
• 128 GB RAM, 2x SSD RAID 0, 1 NVIDIA Quadro GP100, Windows 10

Comparison of entire simulation solution time (not just the calculations that are accelerated on the GPU).

Elapsed time speed-up from "10 CPU cores" to "9 CPU cores with 1 GP100" for 1 HPC Pack:

V18cg-1: 1.89   V18cg-2: 1.39   V18cg-3: 1.33   V18ln-1: 1.30   V18ln-2: 1.40
V18ln-3 (DSLP): 1.65   V18ln-4 (Impeller): 1.61   V18sp-1: 1.13   V18sp-2: 1.03
V18sp-3: 1.54   V18sp-4: 2.26   V18sp-5: 1.59   V18sp-6 (Coupling): 1.18
Take Advantage of the Latest HPC Architectures - ANSYS Mechanical 18.1 with NVIDIA GPU
ANSYS Application Examples

With a GPU accelerator, the job speeds up by 2.5x with 2 cores, 2.1x with 4 cores, and 1.7x with 8 cores. With a GPU accelerator and 16 cores, the job speeds up by 6.33x. (Higher is better.)

Hardware configuration:
• HP Z840 workstation with dual E5-2699 v4 (2.2 GHz), 128 GB 2400 MHz memory
• Optional NVIDIA card: Tesla K40c or Quadro GP100
Take Advantage of the Latest HPC Architectures - ANSYS Mechanical 18.1 and 17.0 with NVIDIA GPU
ANSYS Application Examples
NVIDIA-GPU Solution Fit for ANSYS Mechanical

GPUs accelerate the solver part of the analysis; consequently, problems with high solver workloads benefit the most from GPUs:
• Characterized by both high DOF counts and high factorization requirements
• Models with solid elements (such as castings) and >500K DOF see good speedups

Performance is better in DMP mode than in SMP mode.

GPU and system memories both play important roles in performance:
• Sparse solver:
  - Bulkier and/or higher-order FE models are good candidates and will be accelerated
  - If the model exceeds 5M DOF, use a single GPU with 12 GB memory (Tesla K40, Quadro K6000 / GP100, P100)
• PCG/JCG solver:
  - The memory saving (MSAVE) option must be turned off to enable GPUs
  - Models with a lower Level of Difficulty value (Lev_Diff) are better suited for GPUs
Optimized for the Latest HPC Architectures - ANSYS Fluent 19.0
ANSYS Application Example
Optimized for the Latest HPC Architectures - ANSYS Fluent 18.1
ANSYS Application Example

Case details:
• Boeing landing gear analysis; 15 million mixed cells; 100 iterations
Hardware configuration:
• Dual Intel Xeon E5-2698 v3 (2.3 GHz), 256 GB, Tesla P100
Optimized for the Latest HPC Architectures - ANSYS Fluent 18.1
ANSYS Application Example

Case details:
• F1 race car model; 140 million hexahedral cells; pseudo-transient solver off; 100 iterations
Hardware configuration:
• Dual Intel Xeon E5-2698 v3 (2.3 GHz), 256 GB, Tesla P100
Optimized for the Latest HPC Architectures - ANSYS Fluent 18.1
ANSYS Application Example

Case details:
• External flow over a truck; 14 million mixed cells; run until convergence
Hardware configuration:
• Dual Intel Xeon E5-2698 v3 (2.3 GHz), 256 GB, Tesla P100
Optimized for the Latest HPC Architectures - ANSYS Fluent 18.0
ANSYS Application Example

Case details:
• 9.6 million cell pipe benchmark
Hardware configuration:
• Cluster of XL250 Gen9 nodes with E5-2690 v4, 128 GB 2400 MHz memory, and 2 NVIDIA K80s per node
Optimized for the Latest HPC Architectures - ANSYS Fluent 17.0
ANSYS Application Example

• 2x Intel Xeon Broadwell-EP (Xeon E5-2690 v4, 2.6 GHz) 16-core CPUs, Quadro GP100, Windows 10, 256 GB RAM
• Customer model: automotive water-cooled engine jacket (5.5M cells)

Comparison of time until convergence (not just the calculations that are accelerated on the GPU).
70 © 2019 ANSYS, Inc. October 24, 2019
NVIDIA-GPU Solution Fit for ANSYS Fluent
NVIDIA-GPU Solution Fit for ANSYS Fluent - Supported Hardware Configurations

Requirements:
● Homogeneous process distribution (not: some nodes with 16 processes and some with 12)
● Homogeneous GPU selection (not: some nodes with 2 GPUs and some with 1)
● The number of processes must be an exact multiple of the number of GPUs (not: 15 processes with 2 GPUs)
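A quick pre-flight check of these rules is easy to script. The helper below is a hypothetical C sketch, not part of any ANSYS tool:

```c
/* Hypothetical pre-flight check for the rules above: the per-node
 * process count must divide evenly across the per-node GPUs, and the
 * layout must be homogeneous across nodes. Illustrative only. */
#include <stdio.h>

static int valid_gpu_layout(int procs_per_node, int gpus_per_node) {
    if (gpus_per_node == 0) return 1;            /* CPU-only run */
    return procs_per_node % gpus_per_node == 0;  /* e.g. 16 % 2 == 0 -> OK */
}

int main(void) {
    printf("16 procs, 2 GPUs: %s\n", valid_gpu_layout(16, 2) ? "OK" : "invalid");
    printf("15 procs, 2 GPUs: %s\n", valid_gpu_layout(15, 2) ? "OK" : "invalid");
    return 0;
}
```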
NVIDIA-GPU Solution Fit for ANSYS Fluent - Power Consumption Study

• Adding GPUs to a CPU-only node resulted in a 2.1x speed-up while reducing energy consumption by 38%.
NVIDIA-GPU Solution Fit for ANSYS Fluent

GPUs accelerate the AMG solver portion of the CFD analysis, and thus benefit problems with a relatively high %AMG:
• Coupled solvers have a high %AMG, in the range of 60-70%
• Fine meshes and low-dissipation problems have a high %AMG

In some cases, pressure-based coupled solvers offer faster convergence compared to segregated solvers (problem-dependent).

The whole problem must fit on the GPUs for the calculations to proceed:
• With the pressure-based coupled solver, each million cells needs approx. 4 GB of GPU memory
• High-memory cards such as Tesla K80, Quadro K6000 / GP100, or P100 are ideal

Moving scalar equations such as turbulence to the GPU (using the 'scalar yes' option in 'amg-options') may not bring much benefit because of their low workloads.

Performance is better at lower CPU core counts:
• A ratio of 3 or 4 CPU cores to 1 GPU is recommended
Optimized for the Latest HPC Architectures - ANSYS HFSS Transient 18.1
ANSYS Application Examples

[Chart: speedup (axis to 9x) for models cauer, DifferentialVia, Dipole_PML, F35_800MHz, GSM_Antenna, PECMine; series: Xeon E5-2687W, Tesla K40, Quadro GP100.]

Comparison of entire simulation solution time (not just the calculations that are accelerated on the GPU).
• 2x Intel Xeon E5-2687W (3.1 GHz) 8-core CPUs
• Tesla K40, Quadro GP100, 256 GB RAM, Windows 7 x64
Optimized for the Latest HPC Architectures - ANSYS HFSS Transient 18.1
ANSYS Features & Capabilities

• Automatic job assignment for parametric sweeps or network analyses with multiple excitations
• Speedup scales linearly with the number of GPUs
• GPUs attached to displays are auto-detected and excluded from GPU acceleration

GPU monitoring via nvidia-smi; CPU monitoring via Windows Task Manager.
Optimized for the Latest HPC Architectures - ANSYS HFSS 18.1
ANSYS Application Examples

• 2x Intel Xeon Haswell-EP (Xeon E5-2695 v3, 2.3 GHz) 14-core CPUs
• Tesla K80, Tesla P100, 256 GB RAM, CentOS 7.2 64-bit

Comparison of entire simulation solution time (not just the calculations that are accelerated on the GPU).

[Chart: speedup (axis 0-4) for models F35, HitachiCar, EBG_Ground_plane; configurations: 8 CPU cores, 8 CPU cores + 1x K80, 8 CPU cores + 1x P100.]
Optimized for the Latest HPC Architectures - ANSYS Maxwell 3D 18.1
ANSYS Application Examples

• Benchmark model: T.E.A.M. Problem 21, eddy current loss in a power transformer
• 2x Intel Xeon Haswell-EP (Xeon E5-2695 v3, 2.3 GHz) 14-core CPUs
• Tesla K80, Tesla P100, 256 GB RAM, CentOS 7.2 64-bit

Comparison of entire simulation solution time (not just the calculations that are accelerated on the GPU).

[Chart: relative solution time (axis 0-1.4) for 8 CPU cores, 8 CPU cores + 1x K80, and 8 CPU cores + 1x P100.]
NVIDIA-GPU Solution Fit for ANSYS HFSS & Maxwell 3D

GPUs accelerate the hybrid solver in HFSS Transient:
• The GPU-accelerated hybrid solver clearly outperforms the implicit solver
• High-frequency problems with uniform meshes and a high operating frequency benefit the most
• Good speedups can usually be achieved starting from 140K DOFs
• The hybrid solver in HFSS Transient detects projects not suitable to run on GPUs (speedup < 1x) and falls back to CPUs automatically

GPUs accelerate the multi-frontal sparse direct solver in HFSS and Maxwell 3D:
• Good speedups can usually be achieved starting from 2M DOFs
• Double precision only
• These solvers also use GPUs only if there is a potential speedup
What Hardware Configuration to Select?
CPUs? GPUs? SMP vs. DMP? Interconnects? Clusters? HDD vs. SSD?
Understanding the effect of the interconnect

• Need fast interconnects to feed fast processors
  - Two main characteristics for each interconnect: latency and bandwidth
  - Distributed ANSYS is highly bandwidth bound

+--------- D I S T R I B U T E D   A N S Y S   S T A T I S T I C S ------------+

Release: 14.5   Build: UP20120802   Platform: LINUX x64   Date Run: 08/09/2012   Time: 23:07
Processor Model: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz

Total number of cores available    : 32
Number of physical cores available : 32
Number of cores requested          : 4 (Distributed Memory Parallel)
MPI Type: INTELMPI

Core  Machine Name  Working Directory
----------------------------------------------------
 0    hpclnxsmc00   /data1/ansyswork
 1    hpclnxsmc00   /data1/ansyswork
 2    hpclnxsmc01   /data1/ansyswork
 3    hpclnxsmc01   /data1/ansyswork

Latency time from master to core 1 = 1.171 microseconds
Latency time from master to core 2 = 2.251 microseconds
Latency time from master to core 3 = 2.225 microseconds

Communication speed from master to core 1 = 7934.49 MB/sec  (same machine)
Communication speed from master to core 2 = 3011.09 MB/sec  (QDR InfiniBand)
Communication speed from master to core 3 = 3235.00 MB/sec  (QDR InfiniBand)
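Latency and bandwidth figures like those in the log above can be approximated with a simple MPI ping-pong between two ranks. A minimal C sketch (illustrative, not the benchmark ANSYS uses; run with at least 2 ranks placed across the interconnect of interest):

```c
/* Ping-pong between ranks 0 and 1: empty messages estimate latency,
 * 1 MB messages estimate sustained bandwidth.
 * Build: mpicc -O2 pingpong.c ; run: mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

#define REPS  1000
#define BYTES (1 << 20)            /* 1 MB message for the bandwidth test */

static char buf[BYTES];

static double pingpong(int rank, int bytes) {
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    return (MPI_Wtime() - t0) / REPS;           /* average round-trip time */
}

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank < 2) {                             /* needs at least 2 ranks */
        double t_lat = pingpong(rank, 0);       /* empty message: latency  */
        double t_bw  = pingpong(rank, BYTES);   /* 1 MB message: bandwidth */
        if (rank == 0)
            printf("latency ~ %.2f us, bandwidth ~ %.0f MB/sec\n",
                   t_lat / 2 * 1e6, 2.0 * BYTES / t_bw / 1e6);
    }
    MPI_Finalize();
    return 0;
}
```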
Understanding the effect of the interconnect - ANSYS Fluent

Exhaust model: 7.6M cells; transient simulation with explicit time stepping for an engine startup cycle. Fujitsu PRIMERGY CX250 HPC systems (E5-2690 v2 with 20 and E5-2697 v2 with 24 cores per node, respectively). For CFD we can see the performance of InfiniBand vs. GigE: GigE starts to drop off after 2 nodes.
Understanding the effect of the interconnect - ANSYS Fluent

• Fluent 18.0 performance measured using benchmark cases ranging from 2 to 14 million cells.
• Intel Xeon E5 v4 processor family, up to 96 nodes (3456 cores).
• At lower core counts (~576 cores) Intel Omni-Path and EDR InfiniBand perform comparably; at higher core counts Omni-Path outperforms by ~25-47%.
Understanding the effect of the interconnect - ANSYS Fluent

• For the Combustor 12 million cell model, OPA performs ~33% better than EDR InfiniBand (using 36 nodes, 3456 cores).
• For the Open Racecar 280 million cell case, OPA maintains nearly linear scalability up to a ~7000-core run.
Understanding the effect of the interconnect - ANSYS Fluent 2019 R2

Speed-up ratio vs. number of CPU cores:

Aircraft:    4     8     12    16    32    36    48
IntelMPI:    1.00  1.82  2.67  3.43  5.47  5.59  5.82
IBM-MPI:     1.01  1.82  2.65  3.43  5.45  5.56  4.15
MS-MPI:      0.99  1.80  2.65  3.36  5.09  4.96  4.51

Landing gear: 4     8     12    16    32    36    48
IntelMPI:     1.00  1.81  2.61  3.32  5.46  5.73  6.00
IBM-MPI:      1.01  1.94  2.78  3.58  5.65  5.88  4.57
MS-MPI:       1.03  1.94  2.80  3.60  5.67  5.87  6.21

With IBM-MPI and MS-MPI, saturation occurred at 48 cores in the small CFD simulation (aircraft). MS-MPI is the fastest MPI in the mid-size CFD simulation (landing gear).

• HP Z8 G4 Workstation
• 2x Intel Xeon Platinum 8160 (2.1-3.7 GHz, 24 cores) CPUs
• 192 GB RAM (2666 MHz, 8 GB x 24 DIMMs)
• 1 TB HP Z Turbo Drive G2 (NVMe SSD)
Understanding the effect of the interconnect - ANSYS Fluent

For several MPI benchmarks, HPC-X exhibits higher performance and better scalability.
Understanding the effect of the interconnect - ANSYS Mechanical

For ANSYS Mechanical, GigE does not scale to more than 1 node!
Understanding the effect of the interconnect - ANSYS Mechanical

V13sp-5 model: turbine geometry; 2,100K DOF; SOLID187 FEs; static, nonlinear; one iteration; direct sparse solver; Linux cluster (8 cores per node).

[Chart: interconnect performance, rating (runs/day, axis 0-60) at 8, 16, 32, 64, and 128 cores; Gigabit Ethernet vs. DDR InfiniBand.]

Using faster interconnects can be helpful to realize productivity gains, particularly at higher core/node counts.
Recap

• 10GigE and InfiniBand are recommended for HPC clusters.
  o InfiniBand is currently recommended for large clusters.
  o QDR should be more than adequate for small to medium clusters; FDR for large clusters.
• Beyond 1 node, you will see performance decrease with GigE.
  o Mechanical users should not use GigE at all if their jobs span more than one node.
What Hardware Configuration to Select?
CPUs? GPUs? SMP vs. DMP? Interconnects? Clusters? HDD vs. SSD?
Understanding the effect of the disks/storage - ANSYS Mechanical

• Need fast hard drives to feed fast processors; check the bandwidth specs
  - ANSYS Mechanical can be highly I/O bandwidth bound: the sparse solver in the out-of-core memory mode does lots of I/O
  - Distributed ANSYS can be highly I/O latency bound: the seek time to read/write each set of files causes overhead
• Consider SSDs: high bandwidth and extremely low seek times
• Consider RAID configurations:
  - RAID 0 for speed
  - RAID 1, 5 for redundancy
  - RAID 10 for speed and redundancy
Understanding the effect of the disks/storage - ANSYS Mechanical 18.1

When the working directory is assigned to the Z Turbo Drive G2 and the BMT models for the CG solver are run with more than 16 cores, the job speeds up by 1.4x. With the BMT models for the sparse solver and more than 16 cores, the job speeds up by 1.8-2.6x. (Higher is better.)

Hardware configuration:
• HP Z840 workstation with dual E5-2699 v4 (2.2 GHz), 128 GB 2400 MHz memory
• Optional storage: Micron SATA SSD (no RAID) or HP Z Turbo Drive G2 512 GB (no RAID)
Understanding the effect of the disks/storage - ANSYS Mechanical 18.1

[Chart: rating vs. number of cores.]

Using faster disks can be helpful to realize productivity gains, particularly at higher core/node counts.
Understanding the effect of the disks/storage - ANSYS Fluent

Landing gear noise predictions using scale-resolving simulations (180M cell model using the pressure-based segregated solver).
Understanding the effect of the disks/storage - ANSYS Fluent

Asynchronous I/O for Fluent on Linux: total write time is 3-5x quicker over NFS, with even larger speed-ups on bigger cases and local disk (up to 10x).

Mesh  File  Location  Async I/O  Time
15M   Cas   NFS       OFF        217 s
15M   Cas   NFS       ON         62 s
15M   Dat   NFS       OFF        113 s
15M   Dat   NFS       ON         8 s
30M   Cas   NFS       OFF        207 s
30M   Cas   NFS       ON         75 s
30M   Dat   NFS       OFF        144 s
30M   Dat   NFS       ON         10 s
Recap

• I/O is very important for the Mechanical solver
  o RAID 0 is mandatory for multiple disks
  o SSDs (or 15k SAS drives) recommended for speed
• For most customers, Fluent and CFX won't require fast local disk access (for most types of job)
• Parallel file systems can meet the requirements of both types of solver
Agenda - Solution Reference Architecture
ANSYS CFX/Fluent Starter Cluster
ANSYS Mechanical Starter Cluster
Agenda - Supporting “HPC Resources Anywhere”
Supporting “HPC Resources Anywhere”

ANSYS Features & Capabilities
● ANSYS supports and certifies the leading remote display software solutions (VNC, DCV, Exceed onDemand, and Microsoft Remote Desktop).
● ANSYS supports and certifies the leading job schedulers (LSF, PBS Professional, UGE/SGE, MOAB/Torque, Microsoft HPC).
● ANSYS supports cloud portal workflows (IBM Platform Application Center, Altair Compute Manager, NICE EnginFrame, ANSYS EKM).

Customer Benefits
● Easy access to more powerful HPC resources; simulate models that were simply impossible in the past.
● Collaborate virtually from anywhere with any client device.
● Increase HPC resource utilization while lowering IT support overhead.
● Reduce network overload and security concerns by eliminating the movement of big simulation data sets!
Supporting “HPC Resources Anywhere” - Remote Display Support

ANSYS Features & Capabilities

ANSYS 2019 R3 supports the following remote display solutions:
• NICE Desktop Cloud Visualization (DCV) 2017.4 (Linux server + Linux/Windows client)
• OpenText Exceed onDemand 8 SP11 (Linux server + Linux/Windows client)
• OpenText Exceed TurboX 12.0 (Linux server + Linux/Windows client)
• VNC Connect 6.4 (with VirtualGL 2.6) (Linux server + Linux/Windows client)
• Microsoft Remote Desktop (on Windows clusters)

Hardware requirements for remote visualization servers:
• GPU-capable video cards
• Large amounts of RAM, so that multiple users can run ANSYS applications and pre/post-processing concurrently
Supporting “HPC Resources Anywhere” - Virtual Desktop (VDI) Support

ANSYS Features & Capabilities

Support for virtual GPU:
• For less graphically intensive work; the GPU is shared between multiple virtual machines (VMs)

GPU pass-through still gives the best performance:
• One GPU per VM, up to 8 VMs per machine (K1, K2 cards); memory constraints will be the limit in any case

Supported at ANSYS 2019 R3.
Agenda - HPC Parallel & Parametric Licensing
HPC Parallel & Parametric Licensing

❏ Standalone HPC licenses: basic option to license individual cores for low-level HPC (e.g. 5-6 cores)
❏ HPC Pack: HPC product rewarding volume parallel processing for high-fidelity individual simulations
❏ HPC Workgroup: HPC product rewarding volume parallel processing for increased simulation throughput, shared among engineers throughout a single location or the world
❏ HPC Parametric Pack: HPC product enabling simultaneous execution of multiple design points while consuming just one set of licenses

Parallel enabled (total cores) per HPC Packs per simulation:

Packs:        1     2     3      4      5       6       7
Total cores:  8+4   32+4  128+4  512+4  2048+4  8192+4  32768+4
              = 12  = 36  = 132  = 516  = 2052  = 8196  = 32772
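The pack progression follows a simple pattern: each additional pack quadruples the parallel capacity (2 x 4^packs cores), with the 4 built-in HPC tasks added on top. A minimal C sketch reproducing the table above (illustrative only, not an official ANSYS licensing tool):

```c
/* Cores enabled per number of HPC Packs: each pack quadruples the
 * parallel capacity (2 * 4^packs); the 4 built-in HPC tasks that come
 * with the solver license are added on top. */
#include <stdio.h>

int main(void) {
    long parallel = 2;
    for (int packs = 1; packs <= 7; packs++) {
        parallel *= 4;                        /* 8, 32, 128, ... 32768 */
        printf("%d pack(s): %ld + 4 = %ld total cores\n",
               packs, parallel, parallel + 4);
    }
    return 0;
}
```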
What's new in ANSYS 19.0?

More products are now using ANSYS HPC:
• Standalone HPC licenses, HPC Packs, and HPC Workgroup become more flexible and work across physics with all ANSYS Mechanical, Fluids, and Electronics products*

4 built-in HPCs now across all physics:
• 4 built-in HPCs are now included in Mechanical, Fluids, and Electronics products, including ANSYS AIM and ANSYS Chemkin Enterprise

HPC Packs are now additive:
• HPC Packs become additive to the 4 built-in HPCs (e.g. 1 HPC Pack licenses 8 + 4 = 12 total cores, 2 HPC Packs license 32 + 4 = 36 total cores, etc.)

* Impacted products: ANSYS Mechanical Pro, Premium, Enterprise; ANSYS CFD Premium and Enterprise; ANSYS Chemkin-Pro and Enterprise; ANSYS AIM; ANSYS HFSS; ANSYS Q3D Extractor; ANSYS Maxwell; ANSYS Icepak; ANSYS SIwave; used with the ANSYS Mechanical, CFD, and Maxwell 3D solvers.

Note: the R19.0 license manager is required. For ANSYS Mechanical and Fluids products the changes are backward compatible; for ANSYS Electronics products the changes are compatible with version 19.0 and forward.
Note: built-in HPCs are linked to a solver seat and cannot be shared with other solver seats!
Note: the single, standalone HPCs are not additive to the Packs.
ANSYS HPC Parametric Pack License

HPC license for running parametric FEA or CFD simulations on multiple CPU cores simultaneously, and more cost-effectively.

Key benefits:
• Automatically and simultaneously execute design points while consuming just one set of application licenses
• Scalable: the number of simultaneous design points enabled increases quickly with added packs
• Amplifies the complete workflow, because design points can include the execution of multiple applications (pre, meshing, solve, HPC, post)

Number of simultaneous design points enabled:

HPC Parametric Pack licenses:  1   2   3    4    5
Simultaneous design points:    4   8   16   32   64
HPC Parametric Packs to Reduce Time for Design Variations

Without HPC: one solver key runs a sequential series of design points (dp1, dp2, dp3, dp4), leaving cores unused.
With one solver key and one HPC Parametric Pack, the design points run simultaneously: 94% reduced time to innovation.

HPC Parametric Packs amplify both solver licenses and HPC licenses (the equivalent of four solver keys, or one solver key + 4 HPC keys), allowing you to drastically reduce time to innovation without the cost of additional solver or HPC licenses.
Licensing GPUs for Computing

GPU acceleration can be enabled through all ANSYS HPC product licenses: ANSYS HPC, ANSYS HPC Pack, and ANSYS HPC Workgroup.

Fluids / Structural products: 1 GPU requires 1 HPC task, as long as GPUs ≤ CPU cores. Examples:
● 2 HPC licenses enable up to 3 CPU cores + 3 GPUs through the available 6 HPC tasks
● 1 HPC Pack enables up to 6 CPU cores + 6 GPUs through the available 12 HPC tasks
● 2 HPC Packs enable up to 18 CPU cores + 18 GPUs through the available 36 HPC tasks

Electronics products: 1 GPU is unlocked by every 8 HPC tasks. Examples:
● 4 HPC licenses enable 1 GPU through the available 8 HPC tasks
● 1 HPC Pack enables up to 12 CPU cores + 1 GPU through the available 12 HPC tasks
● 2 HPC Packs enable up to 36 CPU cores + 4 GPUs through the available 36 HPC tasks
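These task-accounting rules are simple enough to sanity-check in a few lines. A minimal C sketch (an illustrative calculator only, not an official ANSYS licensing tool):

```c
/* Illustrative HPC-task accounting under the rules above. */
#include <stdio.h>

int main(void) {
    /* Fluids/Structural: each GPU consumes one HPC task (GPUs <= cores) */
    int cores = 3, gpus = 3;
    printf("Fluids/Structural: %d cores + %d GPUs = %d HPC tasks\n",
           cores, gpus, cores + gpus);          /* 6 tasks */

    /* Electronics: every block of 8 HPC tasks unlocks one GPU */
    int tasks = 36;
    printf("Electronics: %d HPC tasks unlock %d GPU(s)\n",
           tasks, tasks / 8);                   /* 36 tasks -> 4 GPUs */
    return 0;
}
```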
Pay-per-use HPC at ANSYS 2019 R2

HPC Packs and HPC Parametric Packs are supported by usage-based ANSYS Elastic Units (AEUs) for on-premise and off-premise (i.e. cloud) deployments. Optimal for intermittent use and/or peak demands in HPC!

[Table: consumption rates of 1, 2, 4, and 8 AEU/h across product categories, including Geometry Interfaces, ECAD Translators, Pre/Post, Solver, DSO Optimization, HPC Pack, HPC Parametric Pack, AIM, Maxwell 2D, and Simplorer.]
Which Type of Licensing is Right for Me?

• HPC license cost per core decreases as more licenses are purchased, either as HPC Packs or as HPC Workgroups.
• ANSYS HPC and ANSYS HPC Workgroup give flexible use of a pool of licenses.
• ANSYS HPC Pack gives quick scale-up, but is more restrictive in how users can use it.
• That added flexibility is why the HPC Workgroup options cost more than the HPC Packs.
• HPC Parametric Pack enables more cost-effective licensing for design exploration and optimization.
Wrap-up - Licensing

• Multiple licensing options fit different requirements.
• HPC Packs for quick scale-up; HPC Workgroup for flexibility.
• GPUs are treated the same as cores in the licensing model.
• As you scale up, license cost per core decreases, so per-core pricing becomes less of an issue:
  - Running on 2,000 cores instead of 20 cores costs about 1.5x, not 100x.
  - Filling up a 1024-core instead of a 128-core cluster with 32-core jobs will cut the license price per job in half!
  - Enabling 64 instead of 4 simultaneous design points costs about 3x, not 16x.
Additional Resources - Tools to Check Out!

• ROI Estimator: www.ansys.com/ws-roi-estimator
• Hardware advice: www.ansys.com/support/platform-support, www.ansys.com/hpc-webinars, www.ansys.com/hpc-cluster-appliance
• Get a free HPC benchmark: www.ansys.com/free-hpc-benchmark
Additional Resources - IT White Papers

White papers:
• Speed Simulation & Innovation
• The Value of High-Performance Computing for Simulation
• Debunking Six Myths of High-Performance Computing
• Higher Performance Across a Wide Range of ANSYS Fluent Simulations with the Intel Xeon Gold 6148 Processor
• Value of HPC for Ensuring Product Integrity
• Optimizing Business Value in High-Performance Engineering Computing
• ANSYS Fluent with Fujitsu PRIMERGY HPC: HVAC for Built Environment
• ANSYS Application Benchmarking on Dell PowerEdge VRTX
• HPE Reference Architecture for Small and Medium Enterprises
• ANSYS Fluent Brings CFD Performance with Intel Processors and Fabrics
Additional Resources - IT White Papers & Technical Briefs

Technical briefs:
• Dell EMC HPC System for Manufacturing: ANSYS Application Performance
• SGI Technology Guide for ANSYS Mechanical Analysts
• SGI Technology Guide for ANSYS Fluent Analysts
• Workstations for FEA Simulation
• HP Reference Architecture for Small and Medium Enterprises

White papers:
• Mechanical Engineer Productivity Boosted by Higher-Core CPUs
• Focus on Faster Mechanical Simulation
• Workstations for FEA Simulation
• Intel Solid-State Drives Increase Productivity of Product Design and Simulation
Additional Resources - IT Webinars

Recorded webinars (more, and upcoming ones, are available):
• Understand How High-Performance Compute Can Accelerate Your Simulation Throughput
• Speed-up Your Desktop ANSYS FEA Simulations with ANSYS HPC
• Getting the Most from Your ANSYS Simulation Applications
• How to Evaluate and Improve the Performance of ANSYS Mechanical
• Extreme Scalability for High-Fidelity CFD Simulations
• Industry Perspectives on Extreme Scalability for High-Fidelity CFD Simulations
Thank You!

• Connect with me: [email protected]
• Connect with ANSYS, Inc.: LinkedIn ANSYSInc, Twitter @ANSYS, Facebook ANSYSInc
• Follow our blog: ansys-blog.com