Dell HPC
GPU Computing Approach
• Hardware and software change rapidly
– New CPUs
– New GPUs
– New Interconnects
– New software
• All of these happen at different rates and at different times
• GPU applications are evolving very rapidly
• How do you adapt to these changes? How do you protect your investment? How do you adapt to new and evolving applications?
• Be Flexible
Great example of flexibility
• From initial development to “final” code version – performance improves by a factor of 9!
• Software changes made during development result in hardware changes
Implementation
• Develop on something smaller such as a laptop or workstation
• Deploy production applications onto cluster
• For cluster deployments:
– Move GPUs into an external PCIe chassis
• Allows CPUs and GPUs to be changed independently
• Allows network to be changed independently
• Optimize power and cooling for GPUs and CPUs separately
• Add GPUs to host nodes as applications evolve
– It may be 1 GPU today and 8 GPUs tomorrow
Dell C410x
• 3U PCIe chassis – 16 slots (10 in front, 6 in back) – all x16
– 8 PCIe connections to host nodes (1-8 slots per connection)
• Redundant power supplies (4x 1400W)
• BMC (IPMI 2.0) on-board
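To make the C410x's flexible host-to-GPU mapping concrete, the following sketch checks a proposed assignment of GPU slots to host connections against the limits listed above: 16 slots in total, 8 host connections, and 1-8 slots per connection in use. The function name and the `{connection: gpu_count}` input format are hypothetical, chosen only for illustration; they do not correspond to any Dell management tool.

```python
from typing import Dict, List

# Chassis limits taken from the C410x description above.
TOTAL_SLOTS = 16
HOST_CONNECTIONS = 8
MAX_SLOTS_PER_CONNECTION = 8


def validate_c410x_mapping(mapping: Dict[int, int]) -> List[str]:
    """Check a hypothetical {host_connection: gpu_count} plan.

    Connections are numbered 1-8; connections left out of the dict are
    unused. Returns a list of problems (empty if the plan fits).
    """
    problems = []
    for conn, count in mapping.items():
        if not 1 <= conn <= HOST_CONNECTIONS:
            problems.append(f"connection {conn} does not exist")
        if not 1 <= count <= MAX_SLOTS_PER_CONNECTION:
            problems.append(f"connection {conn}: {count} slots (allowed 1-8)")
    total = sum(mapping.values())
    if total > TOTAL_SLOTS:
        problems.append(f"{total} slots requested, chassis has {TOTAL_SLOTS}")
    return problems


print(validate_c410x_mapping({1: 1}))              # []
print(validate_c410x_mapping({1: 8, 2: 4, 3: 4}))  # []
print(validate_c410x_mapping({1: 8, 2: 8, 3: 2}))  # ['18 slots requested, chassis has 16']
```

Going from "1 GPU today" to "8 GPUs tomorrow" on a given host is then just a change to that host's entry in the plan, with no change to the host node itself.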
Host nodes:
• C6100:
– 4-in-2U
– 2S Intel with IB mezz card (x8)
– PCIe x16 HIC card
– Redundant power
• C6145:
– 2x 4S AMD boards in 2U
– (4) x16 slots – 3 are open
› 1 has iPASS connector
– IB mezz card (x8)
– Redundant power
Host/GPU combinations
• Many combinations are possible
– Intel or AMD?
– How many GPUs per node?
– How many lanes per GPU?
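One way to reason about the "how many lanes per GPU?" question is to estimate each GPU's share of a host link when several GPUs behind one connection transfer at the same time. This sketch assumes PCIe 2.0 signalling (5 GT/s per lane with 8b/10b encoding, about 500 MB/s per lane in each direction), which the slides do not state, and it ignores protocol overhead, so the numbers are upper bounds rather than measurements.

```python
# Assumption: PCIe 2.0 links. 5 GT/s per lane, 8b/10b encoding
# (8 payload bits per 10 line bits), then bits -> bytes.
MT_PER_S_PER_LANE = 5000
MB_PER_S_PER_LANE = MT_PER_S_PER_LANE * 8 / 10 / 8  # 500.0 MB/s per direction


def link_bandwidth_mb_s(lanes: int) -> float:
    """Peak one-direction payload bandwidth of a link with `lanes` lanes."""
    return lanes * MB_PER_S_PER_LANE


def per_gpu_share_mb_s(lanes: int, gpus_sharing: int) -> float:
    """Worst-case share per GPU when every GPU behind one host link
    transfers at the same time and the link is split evenly."""
    if gpus_sharing < 1:
        raise ValueError("at least one GPU must share the link")
    return link_bandwidth_mb_s(lanes) / gpus_sharing


for gpus in (1, 2, 4, 8):
    print(f"x16 link, {gpus} GPU(s): {per_gpu_share_mb_s(16, gpus):.0f} MB/s each")
```

Whether that sharing matters depends on the application: a code that stays resident on the GPU tolerates many GPUs per link, while one that moves data to and from the host on every step benefits from fewer GPUs per connection.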
Internal vs. External: NAMD
Chart: NAMD STMV benchmark, steps/second (number of GPUs in parentheses)
• SuperMicro (2): 0.95
• C410x / C6100 (2): 0.82
Internal vs. External: CUDASW++
Chart: CUDASW++ performance (GFLOPS) vs. query length, C410x / C6100 (2) compared with SuperMicro (2)
Scalability: NAMD
Chart: NAMD STMV benchmark, steps/second (number of GPUs in parentheses)
• CPU: 0.10
• C410x / C6100 (1): 0.47
• C410x / C6100 (2): 0.84
• C410x / C6100 (4): 1.52
• SuperMicro (2): 0.95
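The chart values can be turned into the usual scaling metrics. The sketch below pairs each value with its legend entry in the order they appear, then computes speedup over the CPU-only run and parallel efficiency relative to one GPU, defined here as rate_n / (n × rate_1). These metric definitions are common conventions, not ones stated on the slide.

```python
# Steps/second read from the NAMD STMV scalability chart above.
CPU_ONLY = 0.10
C410X = {1: 0.47, 2: 0.84, 4: 1.52}  # number of GPUs -> steps/second
SUPERMICRO_2_GPUS = 0.95


def speedup(rate: float, baseline: float) -> float:
    """How many times faster `rate` is than `baseline`."""
    return rate / baseline


def parallel_efficiency(rate_n: float, rate_1: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved with n GPUs."""
    return rate_n / (n * rate_1)


for n, rate in C410X.items():
    print(f"{n} GPU(s): {speedup(rate, CPU_ONLY):.1f}x vs CPU, "
          f"{parallel_efficiency(rate, C410X[1], n):.0%} efficiency")
print(f"External vs internal, 2 GPUs: {C410X[2] / SUPERMICRO_2_GPUS:.0%}")
```

With these values, four external GPUs reach about 15x the CPU-only rate at roughly 81% parallel efficiency, and the two-GPU C410x result is about 88% of the internal SuperMicro result.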
Impact of CUDA versions
• Heisenberg Spin Glass (HSG) Model
– Spin Glass modeling is a technique used in statistical mechanics to simulate and predict the behavior of various physical phenomena
• HSG is multi-GPU capable using MPI
– Recent upgrade to CUDA 4.0
• Two code versions:
– MPI-based
› GPUs communicate by sending data to the host, which then forwards it to the appropriate GPU
– CUDA 4.0
› GPUs communicate directly (no host)
• Compare performance
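The practical difference between the two HSG versions is how many copies each message needs. The toy model below (plain Python, no GPU required) counts them; the `Gpu` class, the `send` function, and the IOH numbering are hypothetical. In real CUDA 4.0 code the direct path would typically use the runtime's peer-to-peer support, for example `cudaDeviceEnablePeerAccess` followed by `cudaMemcpyPeer`; this sketch does not claim that is how HSG itself is written.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gpu:
    """Toy stand-in for a GPU: an id, the I/O hub (IOH) its PCIe link
    attaches to, and a byte buffer playing the role of device memory."""
    gpu_id: int
    ioh: int
    memory: bytearray


def send(src: Gpu, dst: Gpu, nbytes: int, use_gpu_direct: bool) -> List[str]:
    """Move nbytes from src to dst and return the copies performed.

    use_gpu_direct=False models the MPI-based version: data always goes
    GPU -> host -> GPU. use_gpu_direct=True models the CUDA 4.0 version,
    which copies GPU -> GPU directly, but (on the Intel systems discussed
    here) only when both GPUs share an IOH; otherwise it falls back.
    """
    if use_gpu_direct and src.ioh == dst.ioh:
        dst.memory[:nbytes] = src.memory[:nbytes]  # one peer-to-peer copy
        return [f"GPU{src.gpu_id} -> GPU{dst.gpu_id}"]
    host_buffer = bytes(src.memory[:nbytes])       # copy 1: device to host
    dst.memory[:nbytes] = host_buffer              # copy 2: host to device
    return [f"GPU{src.gpu_id} -> host", f"host -> GPU{dst.gpu_id}"]


a = Gpu(0, ioh=0, memory=bytearray(b"spin"))
b = Gpu(1, ioh=0, memory=bytearray(4))
c = Gpu(2, ioh=1, memory=bytearray(4))
print(send(a, b, 4, use_gpu_direct=False))  # ['GPU0 -> host', 'host -> GPU1']
print(send(a, b, 4, use_gpu_direct=True))   # ['GPU0 -> GPU1']
print(send(a, c, 4, use_gpu_direct=True))   # ['GPU0 -> host', 'host -> GPU2']
```

The last call falls back to staging through the host because the two GPUs sit on different IOHs, which is why packing multiple GPUs onto one IOH matters for the direct path.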
HSG results
• CUDA 4.0 (GPU Direct) is 15-30% faster than MPI
• For Intel systems, GPU Direct requires all GPUs to be connected to the same IOH
• The C410x lets you attach multiple GPUs to a single IOH
Realities
• HPC storage is about 15-25% of the cost of a system but accounts for about 90% of the problems
• HPC storage is about solutions, not just hardware
– Hardware, file system, client, management/monitoring, documentation, best practices, sizing and performance guidance, services and support
• No single file system or solution (nor two, nor even three) satisfies all of the various requirements
– Recent IDC study: 25 customers = 13 file systems
• Applications/processes drive storage solutions (just like compute), but
– Very few customers understand the IO characteristics of their apps
• Access frequency requirements don’t match the underlying storage platform
– A very large percentage of data is never touched again once it is approximately 2-4 weeks old
HPC Storage Solutions Aren’t Easy
• Ignoring Cost – name the Top 3 storage attributes
1. Performance
2. Reliability
3. Capacity
• Difficult or impossible to get all 3 attributes in a single solution with HPC price constraints
• Can we get all 3 attributes in different solutions and integrate them?
– Maintains the attributes while improving flexibility and increasing options
Flexibility, Adaptability, and Options
• The importance of performance for a given piece of data changes over the data’s lifetime
– At first, performance is very important
– After a period of time, performance is less important
• Why keep data on high-performance storage if it isn’t being used?
• Based on applications and the importance of performance, there are three basic categories of data requirements:
1. Fast Scratch
• Performance, performance, performance
2. Primary (/home)
• Reliability
3. Long-term
• Capacity (very little performance)
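A simple way to act on these categories is an aging policy that flags files on fast scratch that nobody has accessed recently, so they can be moved to long-term storage. The sketch below is one such policy under stated assumptions: the 14-day cutoff is a hypothetical value chosen from the 2-4 week observation above, and the function name is illustrative.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400
# Hypothetical cutoff: most data goes untouched after roughly 2-4 weeks,
# so 14 days is used here as an example threshold.
STALE_AFTER_DAYS = 14


def migration_candidates(scratch_dir: str,
                         now: Optional[float] = None,
                         stale_after_days: float = STALE_AFTER_DAYS) -> List[Path]:
    """List files under scratch_dir not accessed within stale_after_days.

    Uses the file access time (st_atime). Mount options such as noatime
    or relatime make atime coarse or stale, so a production policy may
    prefer modification time or file-system-level tracking instead.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * SECONDS_PER_DAY
    return sorted(p for p in Path(scratch_dir).rglob("*")
                  if p.is_file() and p.stat().st_atime < cutoff)
```

Calling `migration_candidates("/scratch")` (a hypothetical mount point) would return the files a mover process could relocate from fast scratch to long-term storage, keeping the high-performance tier for data that is still active.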
Dell’s approach to delivering HPC storage solutions
• Dell is delivering solutions using two approaches:
– Complete solutions - Fully vetted, tested, supported
› Come with end-to-end support from Dell and partners
› Detailed documentation including best practices, performance and sizing guidance
› Deployment services if necessary
– Roll-it-your-own
› Dell creates technical whitepapers containing:
– Recommended configurations
– Details on configuration
– Best practices and sizing guidance
› Customer buys hardware and uses whitepapers as a reference guide
› Full Dell warranty and support on Dell components
– Limited or no deployment services; no solution type services
• Over time, deliver building blocks that will integrate into the larger storage ecosystem
Fast Scratch Storage
• Requirements:
– Very fast (above 1.4 GB/s) – more than NFS
– Scalability in performance and capacity
– Cost effective
– Reliability is not necessarily a primary requirement
• Roll-Your-Own reference configurations and supporting data
Cambridge University Developed Lustre Reference Configuration
– Detailed whitepaper discussing the architecture and performance analysis of the Lustre solution deployed at the University of Cambridge
– The deployment steps and best practices listed in the paper can be used to architect similar Lustre solutions using Dell server and storage products
– Work is currently in progress to develop a reference architecture using the latest-generation Dell PowerEdge servers and PowerVault storage
• Complete Dell HPC Fast Scratch Solutions
Dell | Terascala High Performance Computing Storage Solution (DT-HSS)
– Third-generation Lustre solution from Dell and Terascala, referred to as DT-HSS3
– Utilizes Dell’s latest-generation 6Gb/s SAS-based PowerVault MD series storage
The Dell | Terascala HPC Storage Solution (DT-HSS3)
• Unique scale-out storage appliance for throughput-intensive applications
• Fully supported storage appliance that leverages Lustre, the industry’s leading open-source parallel file system
• Simple, linear scalability
– Up to 6.2 GB/s read and 4.2 GB/s write throughput per base object pair; scale aggregate performance by adding object pairs
– 48 TB to petabytes in a single namespace
– Pre-defined configurations from 48 TB to 336 TB in a single rack (building blocks)
– Configurations serve as building blocks for larger and faster solutions
• Rich management including hardware and file system monitoring
– Automated Install & Maintenance, Health Monitoring, Failover Solution, Root Cause Analysis
Diagram: Metadata Storage Server (MDS) pair; Object Storage Server (OSS) pair
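Because throughput is quoted per object pair and aggregate performance scales by adding pairs, sizing reduces to a ceiling division. The sketch below assumes the linear scaling claimed on the slide holds exactly and ignores client, network, and metadata limits; the function names are hypothetical.

```python
import math
from typing import Tuple

# Per-pair throughput quoted above for DT-HSS3 (GB/s).
READ_GBPS_PER_PAIR = 6.2
WRITE_GBPS_PER_PAIR = 4.2


def object_pairs_needed(read_gbps: float, write_gbps: float) -> int:
    """Smallest number of OSS object pairs that meets both targets,
    assuming perfectly linear scaling."""
    return max(1,
               math.ceil(read_gbps / READ_GBPS_PER_PAIR),
               math.ceil(write_gbps / WRITE_GBPS_PER_PAIR))


def aggregate_gbps(pairs: int) -> Tuple[float, float]:
    """(read, write) throughput for a given number of object pairs."""
    return pairs * READ_GBPS_PER_PAIR, pairs * WRITE_GBPS_PER_PAIR


pairs = object_pairs_needed(read_gbps=20, write_gbps=10)
print(pairs, aggregate_gbps(pairs))
```

For a 20 GB/s read and 10 GB/s write target this gives four pairs; the read target is the binding constraint, so the write side ends up with headroom.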
Primary Storage
• Requirements:
– Performance is usually not a major concern
– Reliability is important
– Ease of use is important
• Typically used for home directories, user data, application data, and results
• NFS is a widely used protocol for this use case
• Roll-Your-Own reference configurations and supporting data:
– Dell PowerVault MD1200 as a Network File System Backend Storage Solution
– Optimizing Dell PowerVault MD1200 Storage Arrays for High Performance Computing (HPC) Deployments
• Complete Dell HPC NFS Storage Solutions
– Dell HPC NFS Storage Solution (NSS)
› Leverages Dell PowerEdge and PowerVault storage
› 24-96TB (raw storage) in a single namespace using Red Hat XFS file system
› Dell developed tuning and best practices
The Dell HPC NFS Storage Solution
Diagram: NFS Gateway; … Storage – MD1200; Expansion MD1200s
• Takes the guesswork out of NFS configurations
– Appliance approach to inexpensive NFS solutions
• Range of capacity
– Up to 96TB in a single namespace
• HA configuration options
• Good performance
– NFS performance of up to 1.47 GB/s for writes and 2.4 GB/s for reads
– 6Gbps SAS, optional IB or 10GigE
– Tuned storage and file system configurations
• Cost effective
• Reliable and supported
– Proven hardware
– 3 years of Dell support, including XFS support
– Redundant power supplies, connections, plus drive spares kit
• Easy to install
– Dell configuration and deployment: whitepaper and Dell PS
– Affordable installation services available
Benefits of Dell NSS
• Performance-tuned NFS server for the best possible performance
– No need to experiment with tuning options – already tuned
Chart: NFS throughput (KB/s) vs. number of clients (2, 4, 8, 12, 16, 24, 32), tuned vs. not tuned; the chart is annotated with a 30% difference
NSS Options
NSS:
• Single NFS Gateway
– PERC H800 RAID card(s) in NFS gateway
› Dell MD1200 JBODs connected to RAID cards
– RAID-60 or RAID-60+LVM
NSS-HA:
• Two Active-Passive NFS Gateways
– Dell MD3200 RBOD contains RAID card
– Dell MD1200 JBODs are connected to RBOD
– RAID-6 + LVM
Common Aspects:
• NFS Gateway – Dell Server (R710)
– RAID-1 for OS (plus 1 hot-spare)
– RAID-0 for additional swap space
– 3 years of support on OS, file system, hardware
– Cold spares (disks)
– IB, 10GigE options
– RHEL 5.5 OS
– Red Hat Scalable File System (XFS)
– Dell ProSupport
NSS Large Solution: 96 TB
Summary:
• QDR IB or 10GigE
• Raw capacity: 96 TB
• Formatted capacity: ~80 TB
• RAID-60 and LVM
– RAID-6 within each MD1200
– RAID-0 across MD1200 pairs
– LVM to combine LUNs
• 10GigE NFS performance
– Peak sequential read: 850 MB/s
– Peak sequential write: 1,180 MB/s
• InfiniBand NFS performance
– Peak sequential read: 1,350 MB/s
– Peak sequential write: 1,470 MB/s
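The ~80 TB formatted figure follows from the RAID layout. This sketch assumes one 12-drive RAID-6 group per MD1200 (the MD1200 holds twelve 3.5-inch drives), so two drives in every twelve hold parity; file system overhead and decimal/binary unit differences are ignored. Under the same assumption the arithmetic also fits the NSS-HA configuration, where RAID-6 is applied within each MD3200/MD1200.

```python
def raid6_usable_tb(raw_tb: float, drives_per_group: int = 12,
                    parity_drives: int = 2) -> float:
    """Usable capacity when all drives are split into equal RAID-6 groups.

    Each group loses `parity_drives` drives' worth of space to parity.
    Striping the groups together with RAID-0 (the "0" in RAID-60) and
    combining LUNs with LVM add no further capacity overhead.
    """
    if drives_per_group <= parity_drives:
        raise ValueError("a RAID-6 group needs more drives than parity drives")
    return raw_tb * (drives_per_group - parity_drives) / drives_per_group


print(raid6_usable_tb(96))  # 80.0
```

The result does not depend on drive size: whatever the capacity per drive, a 12-drive RAID-6 group keeps 10/12 of its raw space.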
NSS-HA: Large
Diagram: NSS-HA servers (Dell R710), PowerVault MD3200, PowerVault MD1200, SAS (6Gbps), IB or 10GigE, GigE, power cords
Summary:
• Raw capacity: 96 TB
• Formatted capacity: ~80 TB
• RAID-6 and LVM
– RAID-6 within each MD3200/MD1200
– LVM to combine LUNs
• 10GigE NFS performance
– Peak sequential read: 560 MB/s
– Peak sequential write: 1,130 MB/s
• InfiniBand NFS performance
– Peak sequential read: 2,430 MB/s
– Peak sequential write: 1,274 MB/s
Summary
• The two most recent trends:
• GPU Computing
– GPU computing is still evolving
› Hardware (CPUs, GPUs, interconnect) and software (CUDA)
– Best course of action is to remain flexible
– Ability to upgrade CPUs, GPUs, or software independently of one another
– External PCIe chassis affords flexibility
› Good host nodes
• Data Management and Storage
– Overall, it’s the largest problem for users today
– Focus on performance (fast scratch), reliability (primary), and capacity (long-term)
› Develop a product for each piece and integrate them together
– Roll-it-your-own and fully supported solutions are available
– Tools for data management are becoming increasingly critical