© 2016 HBM
Improving the Performance of nCode DesignLife Simulations
Jon Aldred, Director, Product Management
2016 nCode User Group Meeting
October 5-6, 2016 · www.ncode.com
How do we perform a quicker CAE fatigue prediction?
• Put simply:
• SMARTER: Reduce the number of calculation locations in your model
• SMARTER: Reduce the number of data points in your loading
• POWER: Use more computing power
• The translation stage that reads the finite element results can take some time. Increasing the MemoryBufferSize preference so that the whole file is translated in memory can be a big time saver!
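One way to reduce the number of data points in the loading is peak-valley extraction: on a monotonic ramp, the intermediate points carry no cycle information, so only the turning points need to be kept. The sketch below is a simplified illustration of that idea; it ignores the gate percentage that DesignLife's PeakValley compression applies, and the function name is ours, not DesignLife's.

```python
def peak_valley(signal):
    """Keep only the turning points (peaks and valleys) of a load history.

    Points on monotonic ramps between turning points do not affect
    cycle counting, so dropping them shrinks the loading without
    changing the fatigue result. Simplified: no gate/threshold applied.
    """
    if len(signal) < 3:
        return list(signal)
    out = [signal[0]]
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # slope changes sign: turning point
            out.append(cur)
    out.append(signal[-1])
    return out

history = [0, 2, 5, 3, 1, 4, 6, 2, 0]
print(peak_valley(history))  # -> [0, 5, 1, 6, 0]
```

Here a 9-point history compresses to 5 turning points; on measured road-load data the reduction is typically far larger.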
Processing Threads
• Processing threads are needed to take advantage of the multi-core processors in current computer hardware.
• Because CPU clock speeds are now improving only slowly, parallel computing in the form of multi-core processors is required to improve overall processing performance.
• On a single PC, this is known as shared-memory processing (SMP).
• nCode DesignLife uses 2 threads as standard, meaning that 2 processing cores can be used simultaneously for DesignLife computations.
• Additional Processing Thread licenses enable more processing cores to be used simultaneously.
• Each Processing Thread requires a license or 150 CDS units.
Thread Scaling
• An increased number of threads gives excellent scaling for faster analysis.
• This example shows almost linear scaling with 8 threads.
• Times are for the analysis run only and do not include FE translation time.

Example:
• Analysis: EN with AbsMaxPrincipal combination method. Loading is a duty cycle with 91 events and 135 channels; 80% PeakValley compression results in 16,356 data points per channel. 6,000 nodes analyzed.
• Computer hardware for Windows: 2 x quad-core 2GHz, 4GB RAM, Windows 7 64-bit.
• Computer hardware for Linux: 2 x hex-core Xeon X5680 3.33GHz, 64-bit SuSE Linux 11 SP1.
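Near-linear scaling at 8 threads implies that almost all of the run time parallelises. Amdahl's law makes this concrete; the 98% parallel fraction below is a hypothetical figure chosen for illustration, not a measurement of DesignLife.

```python
def amdahl_speedup(threads, parallel_fraction):
    """Amdahl's law: ideal speedup on `threads` cores when a fraction
    `parallel_fraction` of the serial runtime parallelises perfectly."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)

# Hypothetical 98% parallel fraction (illustrative only):
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(n, 0.98), 2))  # 8 threads -> ~7x
```

Even a 2% serial remainder caps 8-thread speedup at about 7x, which is consistent with the "almost linear" scaling reported on this slide.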
Distributed Processing
• Distribute analysis jobs across multiple machines or nodes of a compute cluster to improve simulation throughput.
• Uses the Intel, Microsoft, or IBM implementations of the MPI (Message Passing Interface) standard for communication between processes.
• A batch interface program is provided to simplify the running of distributed jobs.
• Use with an HPC cluster, or just split a job across multiple PCs!
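The fatigue problem distributes naturally because each analysis location is independent: a master partitions the locations, workers analyse their chunks, and the master gathers the results. The sketch below illustrates that scatter/gather pattern using local Python threads as a stand-in for MPI ranks; the function names and the fake damage values are ours, not DesignLife's.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, workers):
    """Partition the analysis locations into one chunk per worker."""
    k, m = divmod(len(items), workers)
    chunks, start = [], 0
    for i in range(workers):
        end = start + k + (1 if i < m else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def analyse_chunk(node_ids):
    """Stand-in for the per-location fatigue calculation: returns a
    fake damage value for each node id."""
    return {n: 1.0 / (n + 1) for n in node_ids}

# 4 local workers stand in for 4 compute nodes: the "master" scatters
# chunks, the workers analyse them, and the master gathers the results.
nodes = list(range(100))
results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    for partial in pool.map(analyse_chunk, split(nodes, 4)):
        results.update(partial)
print(len(results))  # -> 100
```

In the real product the scatter and gather happen over MPI between machines rather than between threads, but the partitioning logic is the same idea.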
Threads and Compute Nodes
• A DesignLife batch job can be distributed across multiple machines or compute nodes, where each machine can use multiple processing threads.
• For example, a single batch job for strain-life analysis can run distributed across 3 quad-core computers.
• This uses normal DesignLife licenses on the "master" computer plus a Distributed Processing license (or 250 CDS units) that enables the job to be distributed to "slaves".
• Slave processes only require Processing Thread licenses.

License usage (example using 4 cores on each of 3 machines, 12 threads in total):
• Master: Fundamentals, DesignLife Base, CAE Strain (E-N), Distributed Processing, Processing Thread (x2)
• Slave: Processing Thread (x4)
• Slave: Processing Thread (x4)
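Using the CDS figures quoted on these slides (150 units per Processing Thread, 250 units for Distributed Processing), the CDS cost of the thread and distribution licenses in the 3-machine example works out as below. This is illustrative arithmetic only: it assumes the master's 2 standard threads need no extra license, and it omits the base module costs, which are not given here.

```python
# Illustrative CDS tally for the 3-machine example above (12 cores total).
# Assumptions: 150 CDS per Processing Thread license, 250 CDS for
# Distributed Processing (figures from the slides); the master's two
# standard threads are included with DesignLife; base modules omitted.
CDS_PER_THREAD = 150
CDS_DISTRIBUTED = 250
STANDARD_THREADS = 2          # included with DesignLife on the master

master_threads = 4
slave_threads = [4, 4]

licensed_threads = (master_threads - STANDARD_THREADS) + sum(slave_threads)
total_cds = CDS_DISTRIBUTED + licensed_threads * CDS_PER_THREAD
print(licensed_threads, total_cds)  # -> 10 1750
```

This matches the diagram, which shows Processing Thread (x2) on the master and (x4) on each slave.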
Reduced Run Time with Distributed Processing
• Distributing large batch jobs across multiple machines produces a significant reduction in analysis time.
• Speed-up is very scalable as more threads are added across more nodes.
• Actual reduction in run time depends on the specific job. Times are for the analysis run only and do not include FE translation time.

Example:
• Analysis: EN with CriticalPlane combination method. 5,499 shells analyzed, 6 load cases. Duty cycle with 7 events, combined full. 1,402,880 points.
• Computer hardware for Linux: HP Westmere Intel 3.33 GHz HPC, 64-bit SuSE Linux 11 SP1, IBM MPI.
Faster Analysis with Distributed Processing
• Distributed Processing is very scalable across multiple nodes.
• In this example, analysis with 8 nodes was almost linearly 8 times faster.
• Each node used 12 threads, for a total of 96 threads.
• Times are for the analysis run only and do not include FE translation time.

Example (same job as the previous slide):
• Analysis: EN with CriticalPlane combination method. 5,499 shells analyzed, 6 load cases. Duty cycle with 7 events, combined full. 1,402,880 points.
• Computer hardware for Linux: HP Westmere Intel 3.33 GHz HPC, 64-bit SuSE Linux 11 SP1, IBM MPI.
Real world example
• Full automotive body model for both sheet steel and spot welds.
• Run time with 6 threads was over 39 hours.
• With 96 threads over 8 nodes (processes), total run time was under 8 hours.
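A quick back-of-envelope check of those figures (which are approximate, since the slide says "over 39 hours" and "under 8 hours"): the wall-clock speedup is nearly 5x, while the thread count grew 16x, so per-thread efficiency drops as the job spans more nodes, which is typical once inter-node communication enters the picture.

```python
# Rough scaling check for the body-model example, using the approximate
# figures quoted on the slide: 39 h on 6 threads, 8 h on 96 threads.
base_hours, base_threads = 39.0, 6
dist_hours, dist_threads = 8.0, 96

speedup = base_hours / dist_hours                # wall-clock speedup
efficiency = speedup / (dist_threads / base_threads)
print(round(speedup, 1), round(efficiency, 2))   # -> 4.9 0.3
```

Even at reduced per-thread efficiency, turning a 39-hour overnight-plus job into a same-day 8-hour run is the practical win the slide is making.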
Summary
• There is an increasing need for faster fatigue analysis.
• nCode DesignLife scales nearly linearly with:
• an increasing number of processing threads on a single computer
• an increasing number of computers in a cluster
www.hbmprenscia.com