MSC.Software VPD Conference | July 17-19, 2006 | Huntington Beach, California
Clusters: Mainstream Technology for CAE
Alanna Dwyer, HPC Division, HP
Linux and Clusters Sparked a Revolution in High Performance Computing!
• Supercomputing performance now affordable and accessible
• Linux enabled the use of industry-standard technologies
• Many more users and new applications
• Cluster growth rate is over 50% per year! (volume is half of HPC)
• Now a critical resource in meeting today’s CAE challenges
• Increasingly complex CAE analysis demands more
  • larger models; more jobs to run; longer runs
• Market is responding, adding enterprise RAS features to clusters
  • Treating CLUSTERS like PRODUCTS, not custom deployments
• Integration with large SMP systems allows one to optimize resource deployment
  • Some jobs just can’t be distributed…
Why cluster?
• Budget
  • Price-performance (a 10+ GFLOPS system today for under $4K)
• Scale beyond practical SMP limits
  • Faster time to market and profit, improved insights
• Resource consolidation
  • Centralized management, optimized utilization
• Clusters aren’t just for compute engines
  • Can apply the same principles to file systems and visualization
  • Can help deal with the exponential growth in the volume of simulation data
Application Experience – User Application (Courtesy of NTUST)
• A large-scale FE model (nonlinear continuum mechanics)
• In the year 2000, a computing time of 80 days was required on a single CPU
• 14 AMD Athlon 1600+ processors with Myrinet: 67 hours
• 96 processor cores of HP Opteron 270 at the NTUST cluster: < 12 hours
• A home-made application, ported in less than a day
NTUST: National Taiwan University of Science and Technology
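As a rough cross-check of the run times quoted above, the small C sketch below computes the implied speedups. It is indicative only: it treats the year-2000 single-CPU run as the baseline for both cluster runs, which mixes hardware generations.

    /* Hedged sketch: speedup implied by the NTUST run times quoted above.
       Using the year-2000 single-CPU run as the common baseline mixes
       processor generations, so these are indicative figures only. */
    #include <stdio.h>

    int main(void) {
        const double serial_hours  = 80.0 * 24.0;  /* 80 days on 1 CPU (year 2000) */
        const double athlon_hours  = 67.0;         /* 14 AMD Athlon 1600+ CPUs, Myrinet */
        const double opteron_hours = 12.0;         /* "< 12 hours" on 96 Opteron 270 cores */

        printf("14-CPU Athlon cluster  : %.1fx faster than the 2000 baseline\n",
               serial_hours / athlon_hours);
        printf("96-core Opteron cluster: > %.0fx faster than the 2000 baseline\n",
               serial_hours / opteron_hours);
        return 0;
    }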
SMP vs. Cluster (farm) Example
[Chart: MSC.Nastran XLTDF comparison – total elapsed time vs. number of processes (1, 2, 4) for an Integrity rx5670 4-way SMP, an Integrity rx2620 2-node cluster, and a ProLiant DL145 G2 2-node cluster.]
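For context on why elapsed time does not drop linearly with process count, the sketch below applies Amdahl's law; the single-process time and serial fraction are hypothetical values chosen for illustration, not data from the chart above.

    /* Hedged sketch (illustrative only, not measured data): Amdahl's-law
       estimate of elapsed time versus process count, using a hypothetical
       single-process time and non-parallelizable fraction. */
    #include <stdio.h>

    int main(void) {
        const double t1 = 6000.0;    /* hypothetical 1-process elapsed time, seconds */
        const double serial = 0.15;  /* hypothetical non-parallelizable fraction */

        for (int p = 1; p <= 4; p *= 2) {
            double t = t1 * (serial + (1.0 - serial) / p);
            printf("%d processes: %.0f s elapsed, speedup %.2fx\n", p, t, t1 / t);
        }
        return 0;
    }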
CAE Application Sub-Segments
CAE Domain        | Pre/Post                  | Structures                     | Impact               | Fluids
Parallelized      | Serial (SMP*)             | SMP (MPI*)                     | MPI                  | MPI
Job Scalability   | 1 – 4 (8*) cores          | 2 – 16 (32*) cores, 32 – 64 GB | 4 – 128 (256*) cores | 4 – 128 (256*) cores
Typical Solution  | Workstation or SMP server | Integrity SMP or Farm          | X64 Cluster          | X64 Cluster
CPU cycles – Auto | All jobs                  | 30%                            | 60%                  | 10%
CPU cycles – Aero | All jobs                  | 30%                            | 20%                  | 50%
(*emerging capability)
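The Impact and Fluids columns above rely on distributed-memory (MPI) parallelism. The sketch below shows the generic MPI pattern in C; it is illustrative only (a toy domain decomposition and reduction), not code from any MSC solver.

    /* Hedged sketch (generic MPI, not MSC solver code): the distributed-memory
       pattern behind the "MPI" entries above.  Each rank owns a slice of the
       model, computes a local contribution, and the partial results are
       combined with a collective reduction. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Toy "domain decomposition": 1,000,000 elements split across ranks. */
        const long n_total = 1000000;
        long n_local = n_total / size + (rank < n_total % size ? 1 : 0);

        double local_sum = 0.0;            /* e.g. a local strain-energy contribution */
        for (long i = 0; i < n_local; i++)
            local_sum += 1.0;              /* placeholder for real per-element work */

        double global_sum = 0.0;
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks, global sum = %.0f (expected %ld)\n", size, global_sum, n_total);

        MPI_Finalize();
        return 0;
    }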
HPC Cluster Implementation challenges
• System and workload management
• Scalable performance
• Scalable data management
• Interconnect/network complexity
• Application availability and scalability
• Power and cooling
• Acquisition and deployment
Latest Advancements in Clustering
• Multi-core delivering continued price-performance improvements
• Improvements in clustering software and tools
• More applications are being developed and tuned to leverage cluster/DMP solutions
• Principles of compute clusters being applied to storage and visualization
• InfiniBand now established in HPC
• Solutions now coming to market that address power and cooling concerns
HP Unified Cluster Portfolio
Applications: ISVs standardizing on HP-MPI
Powerful Solver Technology – AMLS
Molpro – University of Cardiff
“One of the top reasons that we went with HP-MPI is that we've had a great working relationship with HP. It was a win-win for ANSYS, HP and our customers - in terms of cost, interconnects, support and performance compared to other message passing interfaces for Linux and Unix. In addition, I've always had great turnaround from HP in response to hardware and software issues.”
Lisa Fordanich, Senior Systems Specialist, ANSYS www.ansys.com/services/ss-interconnects.htm
“HP-MPI is an absolute godsend,” notes Keith Glassford, director of the Materials Science division at San Diego, CA-based Accelrys Software Inc. “It allows us to focus our energy and resources on doing what we’re good at, which is developing scientific and engineering software to solve customer problems.”
CAE Reference Architecture
[Architecture diagram: client workstations; remote workstations via RGS; LAN; high-availability front end with job scheduler; compute clusters; compute SMPs; pre/post SMP; visualization cluster; Scalable File Share (metadata and object data); direct-attached disk array (or SFS); InfiniBand switched fabric interconnect.]
A Cluster Alternative to Direct Attached Storage: HP Scalable File Share (SFS)
• Applying the principles of clusters to file systems and storage enables the sharing of data sets without a performance penalty
• MSC.Nastran is fast on HP SFS: replace extra-disk fat nodes with flexible storage
• Traditional approach:
  • Special nodes in the cluster with multiple local JBOD disks
  • Expensive and hard to manage
• New approach:
  • Use a fast, centralized, virtualized HP SFS filesystem
  • Similar performance
  • Lower cost: shared rather than dedicated storage
  • Easier to use: any node in the cluster can run Nastran
  • Higher reliability: RAID 6 instead of RAID 0
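A minimal sketch of the reliability argument behind the last bullet: RAID 0 loses data on any single disk failure, while RAID 6 tolerates two concurrent failures. The disk count and per-disk failure probability below are hypothetical, and rebuild windows and correlated failures are ignored.

    /* Hedged sketch (not from the slides): why RAID 6 is more reliable than RAID 0.
       RAID 0 loses data if ANY member disk fails; RAID 6 survives up to two
       concurrent disk failures.  Disk count, failure probability, and the
       neglect of rebuild windows are simplifying assumptions. */
    #include <stdio.h>
    #include <math.h>

    static double binom(int n, int k) {        /* n choose k */
        double r = 1.0;
        for (int i = 1; i <= k; i++) r *= (double)(n - k + i) / i;
        return r;
    }

    int main(void) {
        int n = 10;        /* disks in the array (hypothetical) */
        double p = 0.03;   /* per-disk failure probability over the period (hypothetical) */

        double raid0_loss = 1.0 - pow(1.0 - p, n);   /* any single failure is fatal */

        double survive = 0.0;                        /* RAID 6 survives 0, 1 or 2 failures */
        for (int k = 0; k <= 2; k++)
            survive += binom(n, k) * pow(p, k) * pow(1.0 - p, n - k);
        double raid6_loss = 1.0 - survive;

        printf("RAID 0 data-loss probability: %.4f\n", raid0_loss);
        printf("RAID 6 data-loss probability: %.6f\n", raid6_loss);
        return 0;
    }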
MSC.Nastran Benchmark XXCMD
• Standard MSC benchmark
• XXCMD: solution of the natural frequencies of an automotive body
• Performs a medium amount of I/O compared to real-life industry customer datasets (4 TB of I/O with a block size of 256 KB)
• Multiple jobs running simultaneously: no shared data
• Customers typically use direct-attached storage for each host
  • 1 controller and 5 drives per job are recommended for good throughput
• SFS performance
  • 1 Object Storage Server node and 4 enclosures (with arrays of SATA drives) for every 4 hosts achieved excellent performance
  • No degradation for up to 16 hosts, and small degradation from 16 to 32 hosts
  • Significant (~6x) advantage over a small SCSI configuration
[Chart: MSC.Nastran benchmark XXCMD – performs medium I/O (smaller is better); elapsed time (sec) vs. number of hosts (1, 2, 4, 8, 16, 32), with 2 jobs per host, for SFS, MSA, and SCSI storage.]
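A back-of-envelope sketch of what the 4 TB / 256 KB I/O profile quoted above implies for a shared file system; the run time and host count used in the bandwidth estimate are hypothetical, chosen only for illustration.

    /* Hedged sketch: back-of-envelope I/O requirements for the XXCMD numbers
       quoted above (4 TB of I/O, 256 KB block size).  The job run time and
       host count below are hypothetical. */
    #include <stdio.h>

    int main(void) {
        const double total_io_bytes = 4.0 * 1024 * 1024 * 1024 * 1024;  /* 4 TB, from the slide */
        const double block_bytes    = 256.0 * 1024;                     /* 256 KB, from the slide */

        const double job_seconds = 6.0 * 3600.0;   /* hypothetical 6-hour run */
        const int hosts = 16, jobs_per_host = 2;   /* hypothetical load; 2 jobs/host mirrors the chart */

        double ops          = total_io_bytes / block_bytes;
        double per_job_bw   = total_io_bytes / job_seconds;             /* bytes/s */
        double aggregate_bw = per_job_bw * hosts * jobs_per_host;

        printf("I/O operations per job       : %.0f\n", ops);
        printf("Per-job bandwidth            : %.2f MB/s\n", per_job_bw / (1024 * 1024));
        printf("Aggregate bandwidth (%d x %d): %.2f GB/s\n",
               hosts, jobs_per_host, aggregate_bw / (1024.0 * 1024 * 1024));
        return 0;
    }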
Key Considerations in Designing a Solution
• What processor and interconnect for the mix of jobs
• Centralized resource or single-purpose systems
  • Can applications co-exist?
  • Economics of consolidation
• Environmentals: power, cooling, weight, space
• Roll your own system or acquire a total solution
• Production scalability requirements
  • Performance
  • Availability and reliability
  • Manageability (provisioning, booting, monitoring, upgrades)
• Budget, of course – and TCO
For more information see www.hp.com/go/hptc
Cluster Platform Express: www.hp.com/go/cp-express
Implementations of CAE Reference Architecture:
AMD Opteron example
Fast: Opteron Workstation for Pre/Post (HP xw9300 Workstation)
• xw9300
• 2 Dual Core Opteron 2.6 GHz CPUs
• 2 internal 146 GB drives
• 32 GB memory
• DVD

Faster: Opteron Server for Structural Analysis (ProLiant DL585 Server with Disk Array)
• DL585 in a 22U rack with factory integration
• 4 Dual Core Opteron CPUs
• 2 internal 146 GB drives
• 32 GB memory
• MSA30 Dual Bus

Fastest: Opteron Cluster for CFD and Impact Analysis (HP Cluster Platform 4000 compute cluster)
• CP 4000 in a 42U rack, Sidewinder option
• DL385 head node for cluster administration
• DL145 G2 compute nodes with two Dual Core Opteron CPUs, each with 1 internal drive and 4 GB memory (1 GB/core)
• DL585 front-end node with 64 GB for grid generation and domain decomposition
• XC Software Operating Environment support