Learn How to Speed Up Your Simulations with More Powerful Servers
Tony DeVarco and Norbert Bianchin, Hewlett Packard Enterprise
Jeremy Rappenecker, Victaulic
Wim Slagter, Diana Collier and Pierre Louat, ANSYS
Part of ANSYS IT Solutions Webcast Series
October 16, 2019
www.ansys.com/Solutions/Solutions-by-Role/IT-Professionals
IT & Cloud Solutions for ANSYS
IT solutions webcast series from ANSYS and our partners
Our goal is to provide ANSYS customers with:
- Recommendations on hardware and system specifications, on-premises and in the cloud
- Best-practice configuration, setup and management
- Roadmap and vision for planning
Past topics included:
- Improved Engineering Simulation Productivity with HPC to Run ANSYS Software
- Understand How High-Performance Compute Can Accelerate Your Simulation Throughput
- Speed-up your Desktop ANSYS FEA Simulations with ANSYS HPC
- Getting the Most from Your ANSYS Simulation Applications
- Focus on Faster Mechanical Simulation
ANSYS Focus on IT Solutions
More simulations in less time
HPC scale-up and collaboration
IT is the enabler for more effective use of engineering simulation
Larger, more complex and accurate simulation
Some Observations – Are You Also Compute Bound?
Roughly half of our users exclusively run on a laptop/desktop!
17% of our users have hardware that is >3 years old!
More Observations – Misconception about HPC for FEA
Mechanical clearly scales beyond a workstation computer!
Distributed ANSYS Enhancements
Improved scaling for the sparse direct solver across releases: Release 16.0, Release 18.0, Release 19.1, 2019 R1 and 2019 R3
[Chart: DMP Scaling Performance – speedup vs. number of cores (128 to 4096) for 2019 R2 and 2019 R3, showing increased performance over time and parallel efficiency > 50%]
Case Characteristics:
• SP8mdof benchmark
• Sparse solver
• Nonlinear transient analysis involving creep and NLGEOM,ON
Hardware Configuration:
• 2 Intel Xeon Platinum 8260L @ 2.4GHz (48 cores total)
• 192 GB RAM, SSD, RHEL 7.6
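Parallel efficiency in a scaling chart like the one above is simply the measured speedup divided by the ideal (linear) speedup over the baseline core count. A minimal sketch of the arithmetic, using hypothetical elapsed times rather than the benchmark's actual measurements:

```python
# Speedup and parallel efficiency relative to a 128-core baseline run.
# All elapsed times here are illustrative placeholders, not measured data.

baseline_cores, baseline_seconds = 128, 3600.0

runs = {256: 1900.0, 512: 1050.0, 1024: 620.0}  # cores -> elapsed seconds

for cores, elapsed in runs.items():
    speedup = baseline_seconds / elapsed
    ideal = cores / baseline_cores          # linear speedup over the baseline
    efficiency = speedup / ideal
    print(f"{cores:5d} cores: {speedup:4.2f}x speedup, {efficiency:5.1%} efficiency")
```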
More Observations & How Do We Address Them?
Today's topics are being addressed by Norbert Bianchin and Tony DeVarco
ROI calculator for refreshing your hardware for Fluent and Mechanical workloads
www.ansys.com/ws-roi-estimator
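The estimator's exact inputs aren't reproduced here, but the underlying idea is ordinary payback arithmetic. A sketch with made-up figures (every number below is a hypothetical placeholder, not output from the ANSYS estimator):

```python
# Back-of-the-envelope hardware-refresh ROI. All figures are hypothetical.

hardware_cost = 25_000.0        # new HPC node or workstation (USD)
hours_saved_per_week = 10.0     # engineering time freed by faster solves
engineer_hourly_cost = 75.0     # fully loaded labor rate (USD/hour)

weekly_savings = hours_saved_per_week * engineer_hourly_cost
payback_weeks = hardware_cost / weekly_savings
print(f"Payback in ~{payback_weeks:.0f} weeks ({payback_weeks / 52:.1f} years)")
```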
Today's Agenda and Speakers
• Beyond the Desktop: The Value of Using HPC to Run ANSYS Solvers – Norbert Bianchin, WW Hybrid HPC & AI GBU Program Manager and Free ANSYS Performance Benchmark Program Manager, Hewlett Packard Enterprise
• ANSYS 2019 Benchmark Results and Reference Configurations – Tony DeVarco, HPC Solutions Manager, Hewlett Packard Enterprise
• HPC Benchmark at Victaulic – Jeremy Rappenecker, Project Engineer – Fluid Control Technology, Victaulic
• Questions & Answers – Tony DeVarco and Norbert Bianchin, Hewlett Packard Enterprise; Jeremy Rappenecker, Victaulic; Wim Slagter, Diana Collier and Pierre Louat, ANSYS
BEYOND THE DESKTOP: THE VALUE OF USING HPC TO RUN ANSYS SOLVERS
Norbert Bianchin, WW Hybrid HPC & AI GBU Program Manager and Free ANSYS Performance Benchmark Program Manager
October 16th, 2019
FREE ANSYS PERFORMANCE BENCHMARK PROGRAM
“ANSYS research shows that 50% of engineers are simulating on workstations exclusively.”
Today you need to work faster, manipulating bigger models in less time than before, to remain competitive
Let’s see how the Free Performance Benchmark Program is designed to answer these important questions – in precise detail, based on an analysis of your own workloads and current IT setup.
SOLUTION VALUE FOR ANSYS CUSTOMERS
Target customers: workstation users limited by their current capabilities for pre-processing, post-processing and solving
WHY
Overcome workstation limitations with HPC - more compute power and more memory (faster turn-around time, more design variations, finer meshes, more physics, coupling, run simulations in parallel, parameter studies)
As the old adage says, "time is money" – doing more, and doing it better, in less time is all benefit for you!
• Seamless integration with the same Windows-based interface
• Easy HPC deployment and access with no IT overhead
• Easy HPC resource update or upgrade (move to the latest hardware or software release on demand in an off-premises data center)
SOLUTION VALUE FOR ANSYS CUSTOMERS
Target customers: workstation users limited by their current capabilities for pre-processing, post-processing and solving
HOW
This program is FREE, so there is no reason not to try it
Proof of performance as a service to compare against your workstation solution using your dataset
No additional effort for you!
SOLUTION VALUE FOR ANSYS CUSTOMERS
Target customers: workstation users limited by their current capabilities for pre-processing, post-processing and solving
WHAT
• The ROI Calculator tool summarizes the financial gains
• HPC benchmarks done by ANSYS engineers
• Infrastructure proposals:
  - Workstation upgrade
  - Windows-based HPC compute nodes or cluster for end-to-end pre, post and solve
  - Linux-based HPC clusters to offload the Linux solver part
• Service proposals:
  - Managed services / a single point of contact for HPE and ANSYS support
  - ANSYS software pre-deployed in a cloud data center using UberCloud services (UberCloud is an ANSYS cloud-hosting partner)
• Delivery on-premises or off-premises
HPC SOLUTION PORTFOLIO TO MEET YOUR NEEDS
❶ Keep your workflows unchanged in your Windows environment
Simulation data
❷ A simple (but limited) path to HPC capabilities is to upgrade your workstation
On-premises
Windows based HPC compute node for end-to-end pre, post and solvers
DL 380 Gen10
Windows based HPC clusters for end-to-end pre, post and solvers
Apollo 2000 Gen10
Linux based HPC clusters for solvers
Apollo 6000 Gen10
Off-premises
❸ Scale to take advantage of HPC without disrupting your operating model
ANSYS Workbench embedded and remote connection to cloud partners for solvers
Apollo 2000
HARDWARE USED FOR THE BENCHMARK
                          ANSYS                             ADVANIA CLUSTER
Machine                   1 node                            Up to 4 compute nodes
OS                        Windows Server 2016               CentOS Linux release 7.3
Cores (per machine)       Up to 40                          Up to 128 (hyper-threading disabled)
Processor model           Intel Xeon Gold 6148 @ 2.40GHz    Dual-socket Intel Xeon E5-2683 v4 (Broadwell) @ 2.10GHz
Memory (per machine)      384 GB                            256 GB
Storage (per machine)     1.9 TB SSD                        1.9 TB SSD
Interconnect              N/A                               Intel Omni-Path
SAMPLE ANSYS MECHANICAL BENCHMARK CUSTOMER RESULT
Customer model information:
• DOF: 553,527
• Cores: 4
• Elapsed time: 3.1 hr
Baseline (customer): 2-3 runs per day

Manual use during an 8-hour working day:
• HPC-enabled: up to 10 runs per day
• 4X productivity increase – instead of only 2-3 runs (design iterations) during working hours, 10 runs are now feasible

Scheduler use to run 24 hours:
• HPC-enabled: up to 30 runs per day
• 12X productivity increase – instead of only 2-3 runs (design iterations) during working hours, 30 runs are now feasible
• Using an HPC cluster and job scheduler maximizes hardware and license utilization, yielding an even higher productivity increase!
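The runs-per-day figures follow directly from the elapsed time of one run. A small sketch of the arithmetic: the 3.1-hour baseline is the customer figure from the slide, while the roughly 0.8-hour HPC-enabled time is inferred from the quoted 10 runs per 8-hour day:

```python
import math

# Design iterations per day as a function of elapsed time per run.

def runs_per_day(elapsed_hours: float, window_hours: float) -> int:
    return math.floor(window_hours / elapsed_hours)

print(runs_per_day(3.1, 8))   # baseline, manual use: 2 (i.e., 2-3 runs/day)
print(runs_per_day(0.8, 8))   # HPC-enabled, working day: 10
print(runs_per_day(0.8, 24))  # HPC + job scheduler, 24 hours: 30
```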
FREE PERFORMANCE BENCHMARK PROGRAM PROCESS
Registration – sign up using the registration form on the Free HPC Benchmark web page: www.ansys.com/free-hpc-benchmark
First Contact – ANSYS validates your information and requests your application dataset
Your Model Uploaded – you send the model to ANSYS through a secure channel
Convergence & Data Clean-up – dataset optimization and tuning
FREE PERFORMANCE BENCHMARK PROGRAM PROCESS
Benchmark – your model is submitted to the HPC cluster by ANSYS
Results Meeting – the benchmark results are discussed, and a report is sent to you showing the speedup along with HPC solutions that fit your needs
Solution – a proposal offering the best HPC solutions, services and licenses
You're All Set – for better, faster and more simulations and results
Jeremy RappeneckerProject Engineer – Fluid Control Technology
HPC Benchmark at Victaulic
Background History
• Since 1919 (First Patent Granted)
• Headquartered in Easton, PA
• Products (size range: 1/2” to 96”)
  - Grooved pipe couplings and fittings
  - Valves
  - Grooving tools
  - Fire suppression
• More than 2,000 patents granted since 1919
• Tallest building to deepest mine
Current Conditions and Simulation Goals
• Desktop specification
  - 6 cores, 16 GB RAM, 500 GB SSD
• Model details
  - Original elapsed time to solve: 60 hours
  - Improved elapsed time to solve: 14 hours
  - Element count: 87,021; DOF count: 596,960
• Physical testing
  - Flow testing
  - Large diameter (>12”)
  - Equipment limitations
  - Resource constraints
• Simulation goals
  - Detailed 3D models
  - CFD
  - Fluid-structure interaction
  - Development and discovery
  - Increase design iterations
• Questions
  - What do we need?
  - How much do we need?
  - Where is the best value?
Results From HPC Benchmark
[Chart: Elapsed Time vs. Cores – elapsed time in seconds, with no GPU and with 1 GPU]
[Chart: Elapsed Time vs. Cores (scalability) – speedup factor (x faster, up to ~7x), with no GPU and with 1 GPU]
What's Next?
• Evaluate data
• Which direction?
  - Workstations
  - HPC cluster
• Develop hardware specification
• Build business case
  - Design iteration cost
  - Physical testing
  - Existing data
  - Tooling capital
  - Project timelines
THANK YOU
ANSYS 2019 BENCHMARK RESULTS AND REFERENCE CONFIGURATIONS
Tony DeVarco, HPC Solutions Manager, Hewlett Packard Enterprise
October 16th, 2019
HPE BENCHMARK ENVIRONMENT FOR ANSYS

                               HPE Apollo 2000 Gen10 Cluster   HPE Apollo 6000 Gen10 Cluster
                               (XL170r Gen10 nodes)            (XL230k Gen10 nodes)
Processor model & clock speed  Intel Xeon Gold 6142, 2.6 GHz   Intel Xeon Gold 6242, 2.8 GHz
Total cores per compute node   16 cores/socket (32 cores)      16 cores/socket (32 cores)
Memory per node                192GB                           192GB
Memory clock                   2666 MHz                        2666 MHz
Network interconnect           EDR InfiniBand                  EDR InfiniBand
Linux OS                       Red Hat Enterprise Linux 7.6    Red Hat Enterprise Linux 7.6
Turbo                          On                              On
Total cores available          128 nodes / 4096 cores          16 nodes / 512 cores
ANSYS MECHANICAL 2019 R1 STANDARD BENCHMARKS
ANSYS MECHANICAL 2019 R1 RUNNING ON A SINGLE NODE
INTEL XEON GOLD 6142 2.6 GHZ PROCESSOR
ANSYS Mechanical standard benchmark datasets:
• Power Supply Module (V19cg-1)
• Tractor Rear Axle (V19cg-2)
• Engine Block (V19cg-3)
• Gear Box (V19ln-1)
• Radial Impeller (V19ln-2)
• Peltier Cooling Block (V19sp-1)
• Semi-Submersible (V19sp-2)
• Speaker (V19sp-3)
• Turbine (V19sp-4)
• BGA (V19sp-5)
[Chart: seconds elapsed (lower is better) for each benchmark when scaling within one node at 1, 2, 4, 8, 16 and 32 cores]
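For readers who want to reproduce this kind of sweep, here is a sketch of scripting the core-count scan from the command line. The `mapdl` executable name and the benchmark input file name are installation-specific assumptions; `-b` (batch), `-dis` (distributed-memory parallel) and `-np` (core count) are the standard MAPDL launch options:

```python
import subprocess

# Sweep core counts for one ANSYS Mechanical benchmark in DMP mode.
# "mapdl" and "v19sp-5.dat" are assumptions about the local installation.

benchmark = "v19sp-5.dat"  # assumed local copy of the BGA benchmark input

for cores in (1, 2, 4, 8, 16, 32):
    subprocess.run(
        ["mapdl", "-b", "-dis", "-np", str(cores),
         "-i", benchmark, "-o", f"v19sp-5_{cores}c.out"],
        check=True,  # stop the sweep if a run fails
    )
```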
ANSYS MECHANICAL 2019 R3 SCALING UP TO 4 NODES (128 CORES)
USING INTEL XEON GOLD 6242 2.8 GHZ PROCESSOR
ANSYS Mechanical standard benchmark datasets:
• Power Supply Module (V19cg-1)
• Tractor Rear Axle (V19cg-2)
• Engine Block (V19cg-3)
• Gear Box (V19ln-1)
• Radial Impeller (V19ln-2)
• Peltier Cooling Block (V19sp-1)
• Semi-Submersible (V19sp-2)
• Speaker (V19sp-3)
• Turbine (V19sp-4)
• BGA (V19sp-5)
[Chart: node scaling – seconds elapsed (lower is better) per benchmark, plus the geometric mean, at 1, 2, 3 and 4 nodes]
ANSYS MECHANICAL 2019 UP TO 4 NODES (128 CORES)
2019 R1 XEON GOLD 6142 (SKL) AND 2019 R3 XEON GOLD 6242 (CCL)
ANSYS Mechanical standard benchmark datasets
[Chart: geometric mean of seconds elapsed (lower is better) for SKL (Gold 6142) and CCL (Gold 6242) at 32, 64, 96 and 128 cores]
Gold 6242 is 15% faster on average
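The geometric mean is the right average here because it weights every benchmark equally regardless of its absolute runtime, so one long-running case cannot dominate the comparison. A minimal sketch with placeholder timings (the real per-benchmark times are in the chart, not reproduced here):

```python
import math

# Geometric mean of elapsed times: exp(mean(log t)). Values are illustrative.

skl_seconds = [120.0, 95.0, 60.0, 45.0]  # hypothetical Gold 6142 results
ccl_seconds = [104.3, 82.6, 52.2, 39.1]  # hypothetical Gold 6242 results

def geomean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratio = geomean(skl_seconds) / geomean(ccl_seconds)
print(f"Gold 6242 faster by {ratio - 1:.0%} on average")  # ~15% with these numbers
```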
ANSYS FLUENT 2019 R1 AND R3 STANDARD BENCHMARKS
ANSYS FLUENT 2019 R1 SPEEDUP RUNNING ON A SINGLE NODE WITH INTEL XEON GOLD 6142 2.6 GHZ PROCESSORS
ANSYS Fluent Standard Benchmark Aircraft Wing 14M Cells
[Chart: ANSYS Fluent 2019 R1 Aircraft Wing 14M benchmark on Intel Xeon Gold 6142 2.60 GHz – rating (jobs/day, higher is better) vs. MPI tasks from 1 to 32 cores on one node]
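The Fluent "rating" metric is jobs per day: the number of solves that would fit back-to-back into the 86,400 seconds of one day. A one-line sketch with a hypothetical elapsed time:

```python
# Fluent benchmark rating = seconds per day / elapsed seconds per solve.

SECONDS_PER_DAY = 86_400
elapsed_seconds = 432.0  # hypothetical wall-clock time for one solve

rating = SECONDS_PER_DAY / elapsed_seconds
print(f"Rating: {rating:.0f} jobs/day")  # 200 jobs/day for this example
```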
ANSYS FLUENT 2019 PERFORMANCE COMPARISON: INTEL XEON GOLD 6142 2.6 GHZ VS. INTEL XEON GOLD 6242 2.8 GHZ
ANSYS Fluent Standard Benchmark Aircraft Wing 14M Cells
[Chart: rating (jobs/day, higher is better) vs. MPI tasks (32 to 512 cores) – Gold 6142 (16c/2.6GHz/150W) with 2019 R1 (19.3.0) vs. Gold 6242 (16c/2.8GHz/150W) with 2019 R3 (19.5.0)]
ANSYS FLUENT 2019 PERFORMANCE COMPARISON: INTEL XEON GOLD 6142 2.6 GHZ VS. INTEL XEON GOLD 6242 2.8 GHZ
ANSYS Fluent Standard Benchmark Exhaust System 33M Cells
[Chart: rating (jobs/day, higher is better) vs. MPI tasks (32 to 512 cores) – Gold 6142 with 2019 R1 (19.3.0) vs. Gold 6242 with 2019 R3 (19.5.0)]
ANSYS FLUENT 2019 PERFORMANCE COMPARISON: INTEL XEON GOLD 6142 2.6 GHZ VS. INTEL XEON GOLD 6242 2.8 GHZ
ANSYS Fluent Standard Benchmark F1 Racecar 140M Cells
[Chart: rating (jobs/day, higher is better) vs. MPI tasks (64 to 512 cores) – Gold 6142 with 2019 R1 (19.3.0) vs. Gold 6242 with 2019 R3 (19.5.0)]
ANSYS CFX 2019 R3 STANDARD BENCHMARKS
HPE ANSYS BENCHMARK ENVIRONMENT FOR INTEL XEON PROCESSORS USED FOR RUNNING THE LEMANS CAR MODEL

                               HPE SGI 8600 Cluster           HPE Apollo 6000 Gen10 Cluster     SGI ICE XA (legacy system)
                                                              (XL170r Gen10 nodes)
Processor model & clock speed  Intel Xeon Gold 6154, 3.0 GHz  Intel Xeon Platinum 8268, 2.9 GHz Intel Xeon E5-2690 v4, 2.6 GHz
Total cores per compute node   18 cores/socket (36 cores)     24 cores/socket (48 cores)        14 cores/socket (28 cores)
Memory per node                192GB                          192GB                             128GB
Memory clock                   2666 MHz DDR4                  2933 MHz DDR4                     2400 MHz DDR4
Network interconnect           Intel Omni-Path (OPA)          Intel Omni-Path (OPA)             EDR InfiniBand
Linux OS                       Red Hat Enterprise Linux 7.6   Red Hat Enterprise Linux 7.6      SLES 11 SP3
Turbo                          On                             On                                On
Total cores available          288 nodes / 10,368 cores       288 nodes / 13,824 cores          128 nodes / 3,584 cores
ANSYS CFX 2019 R3 SCALING UP TO 8 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
LeMans Car Model
External flow over a LeMans car. The case has approximately 1.8 million nodes (10 million elements, all tetrahedral) and solves compressible fluid flow with heat transfer using the k-epsilon turbulence model.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 8) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
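A sketch of how such a node sweep might be driven with CFX's command-line solver. The `cfx5solve` launcher and its `-def`/`-partition` options are standard; the benchmark file name and the 36-cores-per-node figure (matching the Gold 6154 system in the table above) are assumptions of this sketch:

```python
import subprocess

# Run the LeMans case at 1, 2, 4 and 8 nodes of 36 cores each.
# "perf_LeMans_Car.def" is an assumed name for the benchmark definition file.

def_file = "perf_LeMans_Car.def"
cores_per_node = 36  # Xeon Gold 6154 nodes from the table above

for nodes in (1, 2, 4, 8):
    subprocess.run(
        ["cfx5solve", "-def", def_file,
         "-partition", str(nodes * cores_per_node)],
        check=True,
    )
```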
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Automotive Pump Model
Flow in an automotive pump. A total of approximately 600,000 nodes (1.6 million elements, a combination of tetrahedra, prisms and pyramids) over three domains with multiple frames of reference; solves incompressible fluid flow using the k-epsilon turbulence model.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Airfoil 10M Model
External flow over an airfoil. The case has approximately 9.9 million nodes (9.4 million elements, all hexahedral) and solves compressible fluid flow with heat transfer using the SST turbulence model.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Airfoil 50M Model
External flow over an airfoil: the same case as the 10M model, refined to 50M cells.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS CFX 2019 R3 SCALING UP TO 16 NODES OF VARIOUS GENERATIONS OF INTEL XEON PROCESSORS
Airfoil 100M Model
External flow over an airfoil: the same case as the 10M model, refined to 100M cells.
[Chart: rating (jobs per day, higher is better) vs. number of compute nodes (1 to 16) for Xeon E5-2690 v4, Xeon Gold 6154 and Xeon Platinum 8268]
ANSYS FLUENT/CFX: HPE APOLLO 2000 GEN10 STARTER CLUSTER
Server Options:
• Either 1 ProLiant DL360 Gen10 head node (external) or a single XL170r (within the Apollo 2000 chassis)
• 2-4 ProLiant XL170r Gen10 1U compute servers
Apollo 2000 Gen10 chassis:
• Processors: 32 cores per compute node using Intel Xeon Gold 6242 2.8 GHz processors
• Up to 128 cores with four compute nodes
• Local scratch: one 480GB NVMe SSD drive
Memory for the Cluster:
• Compute nodes: 192GB
• Head node: 96GB or 192GB depending on role
Cluster Interconnect:
• Integrated Gigabit or 10 Gigabit Ethernet, InfiniBand or Intel Omni-Path (for jobs scaling beyond two nodes, InfiniBand or Omni-Path is recommended)
Operating Environment:
• Red Hat Enterprise Linux v7.x
• SUSE Linux Enterprise Server v12 SP2
• Windows Server 2019
Workloads:
• Suited for Fluent up to ~260M cells
• Suited for CFX from 74M up to 260M nodes
NOTE: All memory channels must be populated, and with equal amounts of RAM; otherwise you could see up to a 40% decrease in performance. Please file an ANSYS service request to help refine your configuration before making a purchase.
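Balanced memory means every channel carries the same amount of RAM. For the dual Gold 6242 nodes above, this processor generation has six memory channels per socket, so 192GB divides evenly into twelve 16GB DIMMs; a trivial sanity check:

```python
# Check that a memory size splits evenly across all populated channels.
# Six channels per socket matches this Xeon generation (an assumption to
# verify against your exact CPU model).

sockets, channels_per_socket = 2, 6
total_memory_gb = 192

channels = sockets * channels_per_socket
dimm_gb, remainder = divmod(total_memory_gb, channels)
assert remainder == 0, "memory cannot be split evenly across channels"
print(f"{channels} channels x {dimm_gb}GB DIMMs = {total_memory_gb}GB (balanced)")
```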
ANSYS MECHANICAL: HPE APOLLO 2000 GEN10 STARTER CLUSTER
Server Options:
• Either 1 ProLiant DL360 Gen10 head node (external) or a single XL170r (within the Apollo 2000 chassis)
• 2-4 ProLiant XL170r Gen10 1U compute servers
Apollo 2000 Gen10 chassis:
• Processors: 32 cores per compute node using Intel Xeon Gold 6242 2.8 GHz processors
• Up to 128 cores with four compute nodes
• 2 x 1TB write-intensive NVMe SSD drives in RAID 0 for local scratch
Accelerator Options:
• NVIDIA P100 or V100 GPUs (requires the XL190r Gen10 blade)
Memory for the Cluster:
• Compute nodes: 384GB
• Head node: 96GB or 192GB depending on role
Cluster Interconnect:
• Integrated Gigabit or 10 Gigabit Ethernet, InfiniBand or Intel Omni-Path (for jobs scaling beyond two nodes, InfiniBand or Omni-Path is recommended)
Operating Environment:
• Red Hat Enterprise Linux v7.x
• SUSE Linux Enterprise Server v12 SP2
• Windows Server 2019
Workloads:
• Suited for Mechanical from 80M up to 550M DOF depending on the solver used
NOTE: All memory channels must be populated, and with equal amounts of RAM; otherwise you could see up to a 40% decrease in performance. Please file an ANSYS service request to help refine your configuration before making a purchase.
Final Remarks
Wrap Up / Next Steps
www.ansys.com/about-ansys/partner-ecosystem/high-performance-computing-partners/hpe
www.ansys.com/ws-roi-estimator
Check out the ROI!
For hardware check these pages out!
www.ansys.com/support/platform-support
www.ansys.com/hpc-webinars
www.ansys.com/hpc-cluster-appliance
Benefit from the Resources on the HPE page!
www.ansys.com/free-hpc-benchmark
Join the Free HPC Benchmark!
Join the Simulation Conversation!
Read. Comment. Join the conversation!
The new and improved ANSYS blog is live at:
ansys.com/blog