Learn How to Speed Up Your Simulations with More Powerful Servers
Tony DeVarco and Norbert Bianchin, Hewlett Packard Enterprise
Jeremy Rappenecker, Victaulic
Wim Slagter, Diana Collier and Pierre Louat, ANSYS
Part of ANSYS IT Solutions Webcast Series
October 16, 2019
www.ansys.com/Solutions/Solutions-by-Role/IT-Professionals
IT & Cloud Solutions for ANSYS
IT solutions webcast series from ANSYS and our partners
Our goal is to provide ANSYS customers with:
- Recommendations on hardware and system specifications, on-premises and in the cloud
- Best-practice configuration, setup and management
- Roadmap and vision for planning
Past topics included:
- Improved Engineering Simulation Productivity with HPC to Run ANSYS Software
- Understand How High-Performance Compute Can Accelerate Your Simulation Throughput
- Speed-up your Desktop ANSYS FEA Simulations with ANSYS HPC
- Getting the Most from Your ANSYS Simulation Applications
- Focus on Faster Mechanical Simulation
ANSYS Focus on IT Solutions
More simulations in less time
HPC scale-up and collaboration
IT is the enabler for more effective use of engineering simulation
Larger, more complex and accurate simulation
Some Observations – Are You Also Compute Bound?
Roughly half of our users exclusively run on a laptop/desktop!
17% of our users have hardware that is >3 years old!
More Observations – Misconception about HPC for FEA
Mechanical clearly scales beyond a workstation computer!
Distributed ANSYS Enhancements
Improved scaling for the sparse direct solver across releases: Release 16.0, Release 18.0, Release 19.1, 2019 R1 and 2019 R3
[Chart: DMP Scaling Performance – speedup vs. number of cores (128 to 4096) for 2019 R2 and 2019 R3, showing increased performance over time and parallel efficiency > 50%]
Case Characteristics:
• SP8mdof benchmark
• Sparse solver
• Nonlinear transient analysis involving creep and NLGEOM,ON
Hardware Configuration:
• 2 Intel Xeon Platinum 8260L @ 2.4GHz (48 cores total)
• 192 GB RAM, SSD, RHEL 7.6
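Parallel efficiency in a scaling chart like the one above is simply the measured speedup divided by the ideal (linear) speedup over the baseline core count. A minimal sketch of the arithmetic, using hypothetical elapsed times rather than the benchmark's actual measurements:

```python
# Speedup and parallel efficiency relative to a 128-core baseline run.
# All elapsed times here are illustrative placeholders, not measured data.

baseline_cores, baseline_seconds = 128, 3600.0

runs = {256: 1900.0, 512: 1050.0, 1024: 620.0}  # cores -> elapsed seconds

for cores, elapsed in runs.items():
    speedup = baseline_seconds / elapsed
    ideal = cores / baseline_cores          # linear speedup over the baseline
    efficiency = speedup / ideal
    print(f"{cores:5d} cores: {speedup:4.2f}x speedup, {efficiency:5.1%} efficiency")
```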
More Observations & How Do We Address Them?
Today's topics are being addressed by Norbert Bianchin and Tony DeVarco
ROI calculator for refreshing your hardware for Fluent and Mechanical workloads
www.ansys.com/ws-roi-estimator
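The estimator's exact inputs aren't reproduced here, but the underlying idea is ordinary payback arithmetic. A sketch with made-up figures (every number below is a hypothetical placeholder, not output from the ANSYS estimator):

```python
# Back-of-the-envelope hardware-refresh ROI. All figures are hypothetical.

hardware_cost = 25_000.0        # new HPC node or workstation (USD)
hours_saved_per_week = 10.0     # engineering time freed by faster solves
engineer_hourly_cost = 75.0     # fully loaded labor rate (USD/hour)

weekly_savings = hours_saved_per_week * engineer_hourly_cost
payback_weeks = hardware_cost / weekly_savings
print(f"Payback in ~{payback_weeks:.0f} weeks ({payback_weeks / 52:.1f} years)")
```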
Today's Agenda and Speakers
• Beyond the Desktop: The Value of Using HPC to Run ANSYS Solvers – Norbert Bianchin, WW Hybrid HPC & AI GBU Program Manager and Free ANSYS Performance Benchmark Program Manager, Hewlett Packard Enterprise
• ANSYS 2019 Benchmark Results and Reference Configurations – Tony DeVarco, HPC Solutions Manager, Hewlett Packard Enterprise
• HPC Benchmark at Victaulic – Jeremy Rappenecker, Project Engineer – Fluid Control Technology, Victaulic
• Questions & Answers – Tony DeVarco and Norbert Bianchin, Hewlett Packard Enterprise; Jeremy Rappenecker, Victaulic; Wim Slagter, Diana Collier and Pierre Louat, ANSYS
BEYOND THE DESKTOP: THE VALUE OF USING HPC TO RUN ANSYS SOLVERS
Norbert Bianchin, WW Hybrid HPC & AI GBU Program Manager and Free ANSYS Performance Benchmark Program Manager
October 16th, 2019
FREE ANSYS PERFORMANCE BENCHMARK PROGRAM
“ANSYS research shows that 50% of engineers are simulating on workstations exclusively.”
Today you need to work faster, manipulating bigger models in less time than before, to remain competitive
Let’s see how the Free Performance Benchmark Program is designed to answer these important questions – in precise detail, based on an analysis of your own workloads and current IT setup.
SOLUTION VALUE FOR ANSYS CUSTOMERS
Target customers: workstation users limited by their current capabilities for pre-processing, post-processing and solving
WHY
Overcome workstation limitations with HPC - more compute power and more memory (faster turn-around time, more design variations, finer meshes, more physics, coupling, run simulations in parallel, parameter studies)
As the old adage says, "time is money" – doing more, and doing it better, in less time is all benefit for you!
• Seamless integration with the same Windows-based interface
• Easy HPC deployment and access with no IT overhead
• Easy HPC resource update or upgrade (move to the latest hardware or software release on demand in an off-premises data center)
SOLUTION VALUE FOR ANSYS CUSTOMERS
Target customers: workstation users limited by their current capabilities for pre-processing, post-processing and solving
HOW
This program is FREE, so there is no reason not to try it
Proof of performance as a service to compare against your workstation solution using your dataset
No additional effort for you!
SOLUTION VALUE FOR ANSYS CUSTOMERS
Target customers: workstation users limited by their current capabilities for pre-processing, post-processing and solving
WHAT
• The ROI Calculator tool summarizes the financial gains
• HPC benchmarks done by ANSYS engineers
• Infrastructure proposals:
  - Workstation upgrade
  - Windows-based HPC compute nodes or cluster for end-to-end pre, post and solve
  - Linux-based HPC clusters to offload the Linux solver part
• Service proposals:
  - Managed services / a single point of contact for HPE and ANSYS support
  - ANSYS software pre-deployed in a cloud data center using UberCloud services (UberCloud is an ANSYS cloud-hosting partner)
• Delivery on-premises or off-premises
HPC SOLUTION PORTFOLIO TO MEET YOUR NEEDS
❶ Keep your workflows unchanged in your Windows environment
Simulation data
❷ A simple (but limited) path to HPC capabilities is to upgrade your workstation
On-premises
Windows based HPC compute node for end-to-end pre, post and solvers
DL 380 Gen10
Windows based HPC clusters for end-to-end pre, post and solvers
Apollo 2000 Gen10
Linux based HPC clusters for solvers
Apollo 6000 Gen10
Off-premises
❸ Scale to take advantage of HPC without disrupting your operating model
ANSYS Workbench embedded and remote connection to cloud partners for solvers
Apollo 2000
HARDWARE USED FOR THE BENCHMARK
                          ANSYS                             ADVANIA CLUSTER
Machine                   1 node                            Up to 4 compute nodes
OS                        Windows Server 2016               CentOS Linux release 7.3
Cores (per machine)       Up to 40                          Up to 128 (hyper-threading disabled)
Processor model           Intel Xeon Gold 6148 @ 2.40GHz    Dual-socket Intel Xeon E5-2683 v4 (Broadwell) @ 2.10GHz
Memory (per machine)      384 GB                            256 GB
Storage (per machine)     1.9 TB SSD                        1.9 TB SSD
Interconnect              N/A                               Intel Omni-Path
SAMPLE ANSYS MECHANICAL BENCHMARK CUSTOMER RESULT
Customer model information:
• DOF: 553,527
• Cores: 4
• Elapsed time: 3.1 hr
Baseline (customer): 2-3 runs per day

Manual use during an 8-hour working day:
• HPC-enabled: up to 10 runs per day
• 4X productivity increase – instead of only 2-3 runs (design iterations) during working hours, 10 runs are now feasible

Scheduler use to run 24 hours:
• HPC-enabled: up to 30 runs per day
• 12X productivity increase – instead of only 2-3 runs (design iterations) during working hours, 30 runs are now feasible
• Using an HPC cluster and job scheduler maximizes hardware and license utilization, yielding an even higher productivity increase!
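The runs-per-day figures follow directly from the elapsed time of one run. A small sketch of the arithmetic: the 3.1-hour baseline is the customer figure from the slide, while the roughly 0.8-hour HPC-enabled time is inferred from the quoted 10 runs per 8-hour day:

```python
import math

# Design iterations per day as a function of elapsed time per run.

def runs_per_day(elapsed_hours: float, window_hours: float) -> int:
    return math.floor(window_hours / elapsed_hours)

print(runs_per_day(3.1, 8))   # baseline, manual use: 2 (i.e., 2-3 runs/day)
print(runs_per_day(0.8, 8))   # HPC-enabled, working day: 10
print(runs_per_day(0.8, 24))  # HPC + job scheduler, 24 hours: 30
```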
FREE PERFORMANCE BENCHMARK PROGRAM PROCESS
Registration – sign up using the registration form on the Free HPC Benchmark web page: www.ansys.com/free-hpc-benchmark
First Contact – ANSYS validates your information and requests your application dataset
Your Model Uploaded – you send the model to ANSYS through a secure channel
Convergence & Data Clean-up – dataset optimization and tuning
FREE PERFORMANCE BENCHMARK PROGRAM PROCESS
Benchmark – your model is submitted to the HPC cluster by ANSYS
Results Meeting – the benchmark results are discussed, and a report is sent to you showing the speedup along with HPC solutions that fit your needs
Solution – a proposal offering the best HPC solutions, services and licenses
You're All Set – for better, faster and more simulations and results
Jeremy RappeneckerProject Engineer – Fluid Control Technology
HPC Benchmark at Victaulic
Background History
• Since 1919 (First Patent Granted)
• Headquartered in Easton, PA
• Products (size range: 1/2” to 96”)
  - Grooved pipe couplings and fittings
  - Valves
  - Grooving tools
  - Fire suppression
• More than 2,000 patents granted since 1919
• Tallest building to deepest mine
Current Conditions and Simulation Goals
• Desktop specification
  - 6 cores, 16 GB RAM, 500 GB SSD
• Model details
  - Original elapsed time to solve: 60 hours
  - Improved elapsed time to solve: 14 hours
  - Element count: 87,021; DOF count: 596,960
• Physical testing
  - Flow testing
  - Large diameter (>12”)
  - Equipment limitations
  - Resource constraints
• Simulation goals
  - Detailed 3D models
  - CFD
  - Fluid-structure interaction
  - Development and discovery
  - Increase design iterations
• Questions
  - What do we need?
  - How much do we need?
  - Where is the best value?
Results From HPC Benchmark
[Chart: Elapsed Time vs. Cores – elapsed time in seconds, with no GPU and with 1 GPU]
[Chart: Elapsed Time vs. Cores (scalability) – speedup factor (x faster, up to ~7x), with no GPU and with 1 GPU]
What's Next?
• Evaluate data
• Which direction?
  - Workstations
  - HPC cluster
• Develop hardware specification
• Build business case
  - Design iteration cost
  - Physical testing
  - Existing data
  - Tooling capital
  - Project timelines
THANK YOU
ANSYS 2019 BENCHMARK RESULTS AND REFERENCE CONFIGURATIONS
Tony DeVarco, HPC Solutions Manager, Hewlett Packard Enterprise
October 16th, 2019
HPE BENCHMARK ENVIRONMENT FOR ANSYS

                               HPE Apollo 2000 Gen10 Cluster   HPE Apollo 6000 Gen10 Cluster
                               (XL170r Gen10 nodes)            (XL230k Gen10 nodes)
Processor model & clock speed  Intel Xeon Gold 6142, 2.6 GHz   Intel Xeon Gold 6242, 2.8 GHz
Total cores per compute node   16 cores/socket (32 cores)      16 cores/socket (32 cores)
Memory per node                192GB                           192GB
Memory clock                   2666 MHz                        2666 MHz
Network interconnect           EDR InfiniBand                  EDR InfiniBand
Linux OS                       Red Hat Enterprise Linux 7.6    Red Hat Enterprise Linux 7.6
Turbo                          On                              On
Total cores available          128 nodes / 4096 cores          16 nodes / 512 cores
ANSYS MECHANICAL 2019 R1 STANDARD BENCHMARKS
ANSYS MECHANICAL 2019 R1 RUNNING ON A SINGLE NODE
INTEL XEON GOLD 6142 2.6 GHZ PROCESSOR
ANSYS Mechanical standard benchmark datasets:
• Power Supply Module (V19cg-1)
• Tractor Rear Axle (V19cg-2)
• Engine Block (V19cg-3)
• Gear Box (V19ln-1)
• Radial Impeller (V19ln-2)
• Peltier Cooling Block (V19sp-1)
• Semi-Submersible (V19sp-2)
• Speaker (V19sp-3)
• Turbine (V19sp-4)
• BGA (V19sp-5)
[Chart: seconds elapsed (lower is better) for each benchmark when scaling within one node at 1, 2, 4, 8, 16 and 32 cores]
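For readers who want to reproduce this kind of sweep, here is a sketch of scripting the core-count scan from the command line. The `mapdl` executable name and the benchmark input file name are installation-specific assumptions; `-b` (batch), `-dis` (distributed-memory parallel) and `-np` (core count) are the standard MAPDL launch options:

```python
import subprocess

# Sweep core counts for one ANSYS Mechanical benchmark in DMP mode.
# "mapdl" and "v19sp-5.dat" are assumptions about the local installation.

benchmark = "v19sp-5.dat"  # assumed local copy of the BGA benchmark input

for cores in (1, 2, 4, 8, 16, 32):
    subprocess.run(
        ["mapdl", "-b", "-dis", "-np", str(cores),
         "-i", benchmark, "-o", f"v19sp-5_{cores}c.out"],
        check=True,  # stop the sweep if a run fails
    )
```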
ANSYS MECHANICAL 2019 R3 SCALING UP TO 4 NODES (128 CORES)
USING INTEL XEON GOLD 6242 2.8 GHZ PROCESSOR
ANSYS Mechanical standard benchmark datasets:
• Power Supply Module (V19cg-1)
• Tractor Rear Axle (V19cg-2)
• Engine Block (V19cg-3)
• Gear Box (V19ln-1)
• Radial Impeller (V19ln-2)
• Peltier Cooling Block (V19sp-1)
• Semi-Submersible (V19sp-2)
• Speaker (V19sp-3)
• Turbine (V19sp-4)
• BGA (V19sp-5)
[Chart: node scaling – seconds elapsed (lower is better) per benchmark, plus the geometric mean, at 1, 2, 3 and 4 nodes]
ANSYS MECHANICAL 2019 UP TO 4 NODES (128 CORES)
2019 R1 XEON GOLD 6142 (SKL) AND 2019 R3 XEON GOLD 6242 (CCL)
ANSYS Mechanical standard benchmark datasets
[Chart: geometric mean of seconds elapsed (lower is better) for SKL (Gold 6142) and CCL (Gold 6242) at 32, 64, 96 and 128 cores]
Gold 6242 is 15% faster on average
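The geometric mean is the right average here because it weights every benchmark equally regardless of its absolute runtime, so one long-running case cannot dominate the comparison. A minimal sketch with placeholder timings (the real per-benchmark times are in the chart, not reproduced here):

```python
import math

# Geometric mean of elapsed times: exp(mean(log t)). Values are illustrative.

skl_seconds = [120.0, 95.0, 60.0, 45.0]  # hypothetical Gold 6142 results
ccl_seconds = [104.3, 82.6, 52.2, 39.1]  # hypothetical Gold 6242 results

def geomean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratio = geomean(skl_seconds) / geomean(ccl_seconds)
print(f"Gold 6242 faster by {ratio - 1:.0%} on average")  # ~15% with these numbers
```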
ANSYS FLUENT 2019 R1 AND R3 STANDARD BENCHMARKS
ANSYS FLUENT 2019 R1 SPEEDUP RUNNING ON A SINGLE NODE WITH INTEL XEON GOLD 6142 2.6 GHZ PROCESSORS
ANSYS Fluent Standard Benchmark Aircraft Wing 14M Cells
[Chart: ANSYS Fluent 2019 R1 Aircraft Wing 14M benchmark on Intel Xeon Gold 6142 2.60 GHz – rating (jobs/day, higher is better) vs. MPI tasks from 1 to 32 cores on one node]
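The Fluent "rating" metric is jobs per day: the number of solves that would fit back-to-back into the 86,400 seconds of one day. A one-line sketch with a hypothetical elapsed time:

```python
# Fluent benchmark rating = seconds per day / elapsed seconds per solve.

SECONDS_PER_DAY = 86_400
elapsed_seconds = 432.0  # hypothetical wall-clock time for one solve

rating = SECONDS_PER_DAY / elapsed_seconds
print(f"Rating: {rating:.0f} jobs/day")  # 200 jobs/day for this example
```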
ANSYS FLUENT 2019 PERFORMANCE COMPARISON: INTEL XEON GOLD 6142 2.6 GHZ VS. INTEL XEON GOLD 6242 2.8 GHZ
ANSYS Fluent Standard Benchmark Aircraft Wing 14M Cells
[Chart: rating (jobs/day, higher is better) vs. MPI tasks (32 to 512 cores) – Gold 6142 (16c/2.6GHz/150W) with 2019 R1 (19.3.0) vs. Gold 6242 (16c/2.8GHz/150W) with 2019 R3 (19.5.0)]
ANSYS FLUENT 2019 PERFORMANCE COMPARISON: INTEL XEON GOLD 6142 2.6 GHZ VS. INTEL XEON GOLD 6242 2.8 GHZ
ANSYS Fluent Standard Benchmark Exhaust System 33M Cells
[Chart: rating (jobs/day, higher is better) vs. MPI tasks (32 to 512 cores) – Gold 6142 with 2019 R1 (19.3.0) vs. Gold 6242 with 2019 R3 (19.5.0)]
ANSYS FLUENT 2019 PERFORMANCE COMPARISON: INTEL XEON GOLD 6142 2.6 GHZ VS. INTEL XEON GOLD 6242 2.8 GHZ
ANSYS Fluent Standard Benchmark F1 Racecar 140M Cells
[Chart: rating (jobs/day, higher is better) vs. MPI tasks (64 to 512 cores) – Gold 6142 with 2019 R1 (19.3.0) vs. Gold 6242 with 2019 R3 (19.5.0)]
ANSYS CFX 2019 R3 STANDARD BENCHMARKS
HPE ANSYS BENCHMARK ENVIRONMENT FOR INTEL XEON PROCESSORS USED FOR RUNNING THE LEMANS CAR MODEL

                               HPE SGI 8600 Cluster           HPE Apollo 6000 Gen10 Cluster     SGI ICE XA (legacy system)
                                                              (XL170r Gen10 nodes)
Processor model & clock speed  Intel Xeon Gold 6154, 3.0 GHz  Intel Xeon Platinum 8268, 2.9 GHz Intel Xeon E5-2690 v4, 2.6 GHz
Total cores per compute node   18 cores/socket (36 cores)     24 cores/socket (48 cores)        14 cores/socket (28 cores)
Memory per node                192GB                          192GB                             128GB
Memory clock                   2666 MHz DDR4                  2933 MHz DDR4                     2400 MHz DDR4
Network interconnect           Intel Omni-Path (OPA)          Intel Omni-Path (OPA)             EDR InfiniBand
Linux OS                       Red Hat Enterprise Linux 7.6   Red Hat Enterprise Linux 7.6      SLES 11 SP3
Turbo                          On                             On                                On
Total cores available          288 nodes / 10,368 cores       288 nodes / 13,824 cores          128 nodes / 3,584 cores
ANSYS CFX 2019 R3 SCALING UP TO 8 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
LeMans Car Model
External flow over a LeMans car. The case has approximately 1.8 million nodes (10 million elements, all tetrahedral) and solves compressible fluid flow with heat transfer using the k-epsilon turbulence model.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 8) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
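A sketch of how such a node sweep might be driven with CFX's command-line solver. The `cfx5solve` launcher and its `-def`/`-partition` options are standard; the benchmark file name and the 36-cores-per-node figure (matching the Gold 6154 system in the table above) are assumptions of this sketch:

```python
import subprocess

# Run the LeMans case at 1, 2, 4 and 8 nodes of 36 cores each.
# "perf_LeMans_Car.def" is an assumed name for the benchmark definition file.

def_file = "perf_LeMans_Car.def"
cores_per_node = 36  # Xeon Gold 6154 nodes from the table above

for nodes in (1, 2, 4, 8):
    subprocess.run(
        ["cfx5solve", "-def", def_file,
         "-partition", str(nodes * cores_per_node)],
        check=True,
    )
```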
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Automotive Pump Model
Flow in an automotive pump. A total of approximately 600,000 nodes (1.6 million elements, a combination of tetrahedra, prisms and pyramids) over three domains with multiple frames of reference; solves incompressible fluid flow using the k-epsilon turbulence model.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Airfoil 10M Model
External flow over an airfoil. The case has approximately 9.9 million nodes (9.4 million elements, all hexahedral) and solves compressible fluid flow with heat transfer using the SST turbulence model.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Airfoil 50M Model
External flow over an airfoil: the same case as the 10M model, refined to 50M cells.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Airfoil 100M Model
External flow over an airfoil: the same case as the 10M model, refined to 100M cells.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS FLUENT/CFX: HPE APOLLO 2000 GEN10 STARTER CLUSTER
Server Options:
• Either 1 ProLiant DL360 Gen10 head node (external) or a single XL170r (within the Apollo 2000 chassis)
• 2-4 ProLiant XL170r Gen10 1U compute servers
Apollo 2000 Gen10 chassis:
• Processors: 32 cores per compute node using Intel Xeon Gold 6242 2.8 GHz processors
• Up to 128 cores with four compute nodes
• Local scratch: one 480GB NVMe SSD drive
Memory for the Cluster:
• Compute nodes: 192GB
• Head node: 96GB or 192GB depending on role
Cluster Interconnect:
• Integrated Gigabit or 10 Gigabit Ethernet, InfiniBand or Intel Omni-Path (for jobs scaling beyond two nodes, InfiniBand or Omni-Path is recommended)
Operating Environment:
• Red Hat Enterprise Linux v7.x
• SUSE Linux Enterprise Server v12 SP2
• Windows Server 2019
Workloads:
• Suited for Fluent up to ~260M cells
• Suited for CFX from 74M up to 260M nodes
NOTE: All memory channels must be populated, and with equal amounts of RAM; otherwise you could see up to a 40% decrease in performance. Please file an ANSYS service request to help refine your configuration before making a purchase.
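Balanced memory means every channel carries the same amount of RAM. For the dual Gold 6242 nodes above, this processor generation has six memory channels per socket, so 192GB divides evenly into twelve 16GB DIMMs; a trivial sanity check:

```python
# Check that a memory size splits evenly across all populated channels.
# Six channels per socket matches this Xeon generation (an assumption to
# verify against your exact CPU model).

sockets, channels_per_socket = 2, 6
total_memory_gb = 192

channels = sockets * channels_per_socket
dimm_gb, remainder = divmod(total_memory_gb, channels)
assert remainder == 0, "memory cannot be split evenly across channels"
print(f"{channels} channels x {dimm_gb}GB DIMMs = {total_memory_gb}GB (balanced)")
```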
ANSYS MECHANICAL: HPE APOLLO 2000 GEN10 STARTER CLUSTER
Server Options:
• Either 1 ProLiant DL360 Gen10 head node (external) or a single XL170r (within the Apollo 2000 chassis)
• 2-4 ProLiant XL170r Gen10 1U compute servers
Apollo 2000 Gen10 chassis:
• Processors: 32 cores per compute node using Intel Xeon Gold 6242 2.8 GHz processors
• Up to 128 cores with four compute nodes
• 2 x 1TB write-intensive NVMe SSD drives in RAID 0 for local scratch
Accelerator Options:
• NVIDIA P100 or V100 GPUs (requires the XL190r Gen10 blade)
Memory for the Cluster:
• Compute nodes: 384GB
• Head node: 96GB or 192GB depending on role
Cluster Interconnect:
• Integrated Gigabit or 10 Gigabit Ethernet, InfiniBand or Intel Omni-Path (for jobs scaling beyond two nodes, InfiniBand or Omni-Path is recommended)
Operating Environment:
• Red Hat Enterprise Linux v7.x
• SUSE Linux Enterprise Server v12 SP2
• Windows Server 2019
Workloads:
• Suited for Mechanical from 80M up to 550M DOF depending on the solver used
NOTE: All memory channels must be populated, and with equal amounts of RAM; otherwise you could see up to a 40% decrease in performance. Please file an ANSYS service request to help refine your configuration before making a purchase.
Final Remarks
Wrap Up / Next Steps
www.ansys.com/about-ansys/partner-ecosystem/high-performance-computing-partners/hpe
www.ansys.com/ws-roi-estimator
Check out the ROI!
For hardware check these pages out!
www.ansys.com/support/platform-support
www.ansys.com/hpc-webinars
www.ansys.com/hpc-cluster-appliance
Benefit from the Resources on the HPE page!
www.ansys.com/free-hpc-benchmark
Join the Free HPC Benchmark!
Join the Simulation Conversation!
Read. Comment. Join the conversation!
The new and improved ANSYS blog is live at:
ansys.com/blog