MSC.Software VPD Conference | July 17-19, 2006 | Huntington Beach, California
Clusters: Mainstream Technology for CAE
Alanna Dwyer, HPC Division, HP
Linux and Clusters Sparked a Revolution in High Performance Computing!
• Supercomputing performance now affordable and accessible
• Linux enabled the use of industry-standard technologies
• Many more users and new applications
• Cluster growth rate is over 50% per year! (volume is half of HPC)
• Now a critical resource in meeting today’s CAE challenges
• Increasingly complex CAE analysis demands more
  • larger models; more jobs to run; longer runs
• Market is responding, adding enterprise RAS features to clusters
  • Treating CLUSTERS like PRODUCTS, not custom deployments
• Integration with large SMP systems allows one to optimize resource deployment
  • Some jobs just can’t be distributed…
Why cluster?
• Budget
  • Price-performance (a 10+ GFLOPS system today for under $4K)
• Scale beyond practical SMP limits
  • Faster time to market and profit, improved insights
• Resource consolidation
  • Centralized management, optimized utilization
• Clusters aren’t just for compute engines
  • Can apply the same principles to file systems and visualization
  • Can help deal with the exponential growth in the volume of simulation data
Application Experience – User Application (Courtesy of NTUST)
• A large-scale FE model (nonlinear continuum mechanics)
• In the year 2000, a computing time of 80 days was required on a single CPU
• 14 AMD Athlon 1600+ processors with Myrinet: 67 hours
• 96 processor cores of HP Opteron 270 at the NTUST cluster: < 12 hours
• A home-made application, ported in less than a day
NTUST: National Taiwan University of Science and Technology
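As a rough cross-check of the run times quoted above, the small C sketch below computes the implied speedups. It is indicative only: it treats the year-2000 single-CPU run as the baseline for both cluster runs, which mixes hardware generations.

    /* Hedged sketch: speedup implied by the NTUST run times quoted above.
       Using the year-2000 single-CPU run as the common baseline mixes
       processor generations, so these are indicative figures only. */
    #include <stdio.h>

    int main(void) {
        const double serial_hours  = 80.0 * 24.0;  /* 80 days on 1 CPU (year 2000) */
        const double athlon_hours  = 67.0;         /* 14 AMD Athlon 1600+ CPUs, Myrinet */
        const double opteron_hours = 12.0;         /* "< 12 hours" on 96 Opteron 270 cores */

        printf("14-CPU Athlon cluster  : %.1fx faster than the 2000 baseline\n",
               serial_hours / athlon_hours);
        printf("96-core Opteron cluster: > %.0fx faster than the 2000 baseline\n",
               serial_hours / opteron_hours);
        return 0;
    }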
SMP vs. Cluster (farm) Example
[Chart: MSC.Nastran XLTDF comparison – total elapsed time vs. number of processes (1, 2, 4) for an Integrity rx5670 4-way SMP, an Integrity rx2620 2-node cluster, and a ProLiant DL145 G2 2-node cluster.]
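For context on why elapsed time does not drop linearly with process count, the sketch below applies Amdahl's law; the single-process time and serial fraction are hypothetical values chosen for illustration, not data from the chart above.

    /* Hedged sketch (illustrative only, not measured data): Amdahl's-law
       estimate of elapsed time versus process count, using a hypothetical
       single-process time and non-parallelizable fraction. */
    #include <stdio.h>

    int main(void) {
        const double t1 = 6000.0;    /* hypothetical 1-process elapsed time, seconds */
        const double serial = 0.15;  /* hypothetical non-parallelizable fraction */

        for (int p = 1; p <= 4; p *= 2) {
            double t = t1 * (serial + (1.0 - serial) / p);
            printf("%d processes: %.0f s elapsed, speedup %.2fx\n", p, t, t1 / t);
        }
        return 0;
    }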
CAE Application Sub-Segments
CAE Domain        | Pre/Post                  | Structures                     | Impact               | Fluids
Parallelized      | Serial (SMP*)             | SMP (MPI*)                     | MPI                  | MPI
Job Scalability   | 1 – 4 (8*) cores          | 2 – 16 (32*) cores, 32 – 64 GB | 4 – 128 (256*) cores | 4 – 128 (256*) cores
Typical Solution  | Workstation or SMP server | Integrity SMP or Farm          | X64 Cluster          | X64 Cluster
CPU cycles – Auto | All jobs                  | 30%                            | 60%                  | 10%
CPU cycles – Aero | All jobs                  | 30%                            | 20%                  | 50%
(*emerging capability)
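The Impact and Fluids columns above rely on distributed-memory (MPI) parallelism. The sketch below shows the generic MPI pattern in C; it is illustrative only (a toy domain decomposition and reduction), not code from any MSC solver.

    /* Hedged sketch (generic MPI, not MSC solver code): the distributed-memory
       pattern behind the "MPI" entries above.  Each rank owns a slice of the
       model, computes a local contribution, and the partial results are
       combined with a collective reduction. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Toy "domain decomposition": 1,000,000 elements split across ranks. */
        const long n_total = 1000000;
        long n_local = n_total / size + (rank < n_total % size ? 1 : 0);

        double local_sum = 0.0;            /* e.g. a local strain-energy contribution */
        for (long i = 0; i < n_local; i++)
            local_sum += 1.0;              /* placeholder for real per-element work */

        double global_sum = 0.0;
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks, global sum = %.0f (expected %ld)\n", size, global_sum, n_total);

        MPI_Finalize();
        return 0;
    }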
HPC Cluster Implementation challenges
• System and workload management
• Scalable performance
• Scalable data management
• Interconnect/network complexity
• Application availability and scalability
• Power and cooling
• Acquisition and deployment
Latest Advancements in Clustering
• Multi-core delivering continued price-performance improvements
• Improvements in clustering software and tools
• More applications are being developed and tuned to leverage cluster/DMP solutions
• Principles of compute clusters being applied to storage and visualization
• InfiniBand now established in HPC
• Solutions now coming to market that address power and cooling concerns
HP Unified Cluster Portfolio
Applications: ISVs standardizing on HP-MPI
Powerful Solver Technology – AMLS
Molpro – University of Cardiff
“One of the top reasons that we went with HP-MPI is that we've had a great working relationship with HP. It was a win-win for ANSYS, HP and our customers - in terms of cost, interconnects, support and performance compared to other message passing interfaces for Linux and Unix. In addition, I've always had great turnaround from HP in response to hardware and software issues.”
Lisa Fordanich, Senior Systems Specialist, ANSYS www.ansys.com/services/ss-interconnects.htm
“HP-MPI is an absolute godsend,” notes Keith Glassford, director of the Materials Science division at San Diego, CA-based Accelrys Software Inc. “It allows us to focus our energy and resources on doing what we’re good at, which is developing scientific and engineering software to solve customer problems.”
CAE Reference Architecture
[Architecture diagram: client workstations; remote workstations via RGS; LAN; high-availability front end with job scheduler; compute clusters; compute SMPs; pre/post SMP; visualization cluster; Scalable File Share (metadata and object data); direct-attached disk array (or SFS); InfiniBand switched fabric interconnect.]
A Cluster Alternative to Direct Attached Storage: HP Scalable File Share (SFS)
• Applying the principles of clusters to file systems and storage enables the sharing of data sets without a performance penalty
• MSC.Nastran is fast on HP SFS: replace extra-disk fat nodes with flexible storage
• Traditional approach:
  • Special nodes in the cluster with multiple local JBOD disks
  • Expensive and hard to manage
• New approach:
  • Use a fast, centralized, virtualized HP SFS filesystem
  • Similar performance
  • Lower cost: shared rather than dedicated storage
  • Easier to use: any node in the cluster can run Nastran
  • Higher reliability: RAID 6 instead of RAID 0
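A minimal sketch of the reliability argument behind the last bullet: RAID 0 loses data on any single disk failure, while RAID 6 tolerates two concurrent failures. The disk count and per-disk failure probability below are hypothetical, and rebuild windows and correlated failures are ignored.

    /* Hedged sketch (not from the slides): why RAID 6 is more reliable than RAID 0.
       RAID 0 loses data if ANY member disk fails; RAID 6 survives up to two
       concurrent disk failures.  Disk count, failure probability, and the
       neglect of rebuild windows are simplifying assumptions. */
    #include <stdio.h>
    #include <math.h>

    static double binom(int n, int k) {        /* n choose k */
        double r = 1.0;
        for (int i = 1; i <= k; i++) r *= (double)(n - k + i) / i;
        return r;
    }

    int main(void) {
        int n = 10;        /* disks in the array (hypothetical) */
        double p = 0.03;   /* per-disk failure probability over the period (hypothetical) */

        double raid0_loss = 1.0 - pow(1.0 - p, n);   /* any single failure is fatal */

        double survive = 0.0;                        /* RAID 6 survives 0, 1 or 2 failures */
        for (int k = 0; k <= 2; k++)
            survive += binom(n, k) * pow(p, k) * pow(1.0 - p, n - k);
        double raid6_loss = 1.0 - survive;

        printf("RAID 0 data-loss probability: %.4f\n", raid0_loss);
        printf("RAID 6 data-loss probability: %.6f\n", raid6_loss);
        return 0;
    }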
MSC.Nastran Benchmark XXCMD
• Standard MSC benchmark
• XXCMD: solution of the natural frequencies of an automotive body
• Performs a medium amount of I/O compared to real-life industry customer datasets (4 TB of I/O with a block size of 256 KB)
• Multiple jobs running simultaneously: no shared data
• Customers typically use direct-attached storage for each host
  • 1 controller and 5 drives per job are recommended for good throughput
• SFS performance
  • 1 Object Storage Server node and 4 enclosures (with arrays of SATA drives) for every 4 hosts achieved excellent performance
  • No degradation for up to 16 hosts, and small degradation from 16 to 32 hosts
  • Significant (~6x) advantage over a small SCSI configuration
[Chart: MSC.Nastran benchmark XXCMD – performs medium I/O (smaller is better); elapsed time (sec) vs. number of hosts (1, 2, 4, 8, 16, 32), with 2 jobs per host, for SFS, MSA, and SCSI storage.]
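A back-of-envelope sketch of what the 4 TB / 256 KB I/O profile quoted above implies for a shared file system; the run time and host count used in the bandwidth estimate are hypothetical, chosen only for illustration.

    /* Hedged sketch: back-of-envelope I/O requirements for the XXCMD numbers
       quoted above (4 TB of I/O, 256 KB block size).  The job run time and
       host count below are hypothetical. */
    #include <stdio.h>

    int main(void) {
        const double total_io_bytes = 4.0 * 1024 * 1024 * 1024 * 1024;  /* 4 TB, from the slide */
        const double block_bytes    = 256.0 * 1024;                     /* 256 KB, from the slide */

        const double job_seconds = 6.0 * 3600.0;   /* hypothetical 6-hour run */
        const int hosts = 16, jobs_per_host = 2;   /* hypothetical load; 2 jobs/host mirrors the chart */

        double ops          = total_io_bytes / block_bytes;
        double per_job_bw   = total_io_bytes / job_seconds;             /* bytes/s */
        double aggregate_bw = per_job_bw * hosts * jobs_per_host;

        printf("I/O operations per job       : %.0f\n", ops);
        printf("Per-job bandwidth            : %.2f MB/s\n", per_job_bw / (1024 * 1024));
        printf("Aggregate bandwidth (%d x %d): %.2f GB/s\n",
               hosts, jobs_per_host, aggregate_bw / (1024.0 * 1024 * 1024));
        return 0;
    }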
Key Considerations in Designing a Solution
• What processor and interconnect for the mix of jobs
• Centralized resource or single-purpose systems
  • Can applications co-exist?
  • Economics of consolidation
• Environmentals: power, cooling, weight, space
• Roll your own system or acquire a total solution
• Production scalability requirements
  • Performance
  • Availability and reliability
  • Manageability (provisioning, booting, monitoring, upgrades)
• Budget, of course – and TCO
For more information see www.hp.com/go/hptc
Cluster Platform Express: www.hp.com/go/cp-express
Implementations of CAE Reference Architecture:
AMD Opteron example
Fast: Opteron Workstation for Pre/Post (HP xw9300 Workstation)
• xw9300
• 2 Dual Core Opteron 2.6 GHz CPUs
• 2 internal 146 GB drives
• 32 GB memory
• DVD

Faster: Opteron Server for Structural Analysis (ProLiant DL585 Server with Disk Array)
• DL585 in a 22U rack with factory integration
• 4 Dual Core Opteron CPUs
• 2 internal 146 GB drives
• 32 GB memory
• MSA30 Dual Bus

Fastest: Opteron Cluster for CFD and Impact Analysis (HP Cluster Platform 4000 compute cluster)
• CP 4000 in a 42U rack, Sidewinder option
• DL385 head node for cluster administration
• DL145 G2 compute nodes with two Dual Core Opteron CPUs, each with 1 internal drive and 4 GB memory (1 GB/core)
• DL585 front-end node with 64 GB for grid generation and domain decomposition
• XC Software Operating Environment support