Source: hpcadvisorycouncil.com/events/2011/european_workshop/pdf/6_Dell.pdf

Dell HPC

Dr. Jeffrey Layton

Enterprise Technologist - HPC

GPU Computing at Dell

GPU Computing Approach

• Hardware changes rapidly

– New CPUs

– New GPUs

– New Interconnects

– New software

• All of these happen at different rates and at different times

• GPU applications are evolving very rapidly

• How do you adapt to these changes? How do you protect your investment? How do you adapt to new and evolving applications?

• Be Flexible

Great example of flexibility

• From initial development to “final” code version – performance improves by a factor of 9!

• Software changes during development result in hardware changes

Implementation

• Develop on something smaller such as a laptop or workstation

• Deploy production applications onto cluster

• For cluster deployments:

– Move GPUs to external PCIe chassis

• Allows CPUs and GPUs to be changed independently

• Allows network to be changed independently

• Optimize power and cooling for GPUs and CPUs separately

• Add GPUs to host nodes as applications evolve

– It may be 1 GPU today and 8 GPUs tomorrow

Dell C410x

• 3U PCIe chassis – 16 slots (10 in front, 6 in back) – all x16

– 8 PCIe connections to host nodes (1-8 slots per connection)

• Redundant power supplies (4x 1400W)

• BMC (IPMI 2.0) on-board
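
The chassis limits above can be turned into a quick sanity check for a planned GPU layout. This is a minimal Python sketch, assuming each host connection serves exactly one host node and that any split within the stated 1-8 range is allowed; the real chassis may only support specific slot mappings. The function name `free_slots_after` and the node names are hypothetical.

```python
# Sketch only: models the C410x figures from the slide (16 slots, 8 host
# connections, 1-8 slots per connection). Names are hypothetical; this is
# not a Dell tool or API.
TOTAL_SLOTS = 16
HOST_CONNECTIONS = 8
MAX_SLOTS_PER_CONNECTION = 8


def free_slots_after(gpus_per_host):
    """Return the number of unused slots, or raise ValueError if the plan does not fit."""
    if len(gpus_per_host) > HOST_CONNECTIONS:
        raise ValueError("more host nodes than host connections")
    for host, count in gpus_per_host.items():
        if not 1 <= count <= MAX_SLOTS_PER_CONNECTION:
            raise ValueError(f"{host}: {count} GPUs is outside 1-{MAX_SLOTS_PER_CONNECTION}")
    used = sum(gpus_per_host.values())
    if used > TOTAL_SLOTS:
        raise ValueError(f"{used} GPUs requested but only {TOTAL_SLOTS} slots exist")
    return TOTAL_SLOTS - used


today = {"node1": 1, "node2": 1}      # one GPU per host node
tomorrow = {"node1": 8, "node2": 8}   # same hosts, chassis fully populated
print(free_slots_after(today))
print(free_slots_after(tomorrow))
```

The point of the check is the one the slides make: the host nodes stay the same while the GPU count per node changes.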

Host nodes:

• C6100:

– 4-in-2U

– 2S Intel with IB mezz card (x8)

– PCIe x16 HIC card

– Redundant power

• C6145:

– 2x 4S AMD boards in 2U

– (4) x16 slots – 3 are open

› 1 has iPASS connector

– IB mezz card (x8)

– Redundant power

Host/GPU combinations

• Many combinations are possible

– Intel or AMD?

– How many GPUs per node?

– How many lanes per GPU?
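
One way to approach the "lanes per GPU" question is to divide a single x16 host link among the GPUs behind it. The sketch below assumes PCIe 2.0 at a theoretical 500 MB/s per lane per direction (the slides only state that the slots are x16) and that all GPUs transfer at the same time. It ignores protocol overhead, so treat the numbers as upper bounds. The function name is hypothetical.

```python
# Assumption: PCIe 2.0, about 500 MB/s per lane per direction (theoretical).
PCIE2_MB_PER_LANE = 500


def host_link_share(gpus, lanes=16, mb_per_lane=PCIE2_MB_PER_LANE):
    """Return (effective lanes per GPU, MB/s per GPU) when `gpus` GPUs
    share one host link and all of them transfer simultaneously."""
    if gpus < 1:
        raise ValueError("need at least one GPU")
    lanes_per_gpu = lanes / gpus
    return lanes_per_gpu, lanes_per_gpu * mb_per_lane


for n in (1, 2, 4, 8):
    lanes_each, mb_each = host_link_share(n)
    print(f"{n} GPU(s): {lanes_each:g} lanes each, about {mb_each:g} MB/s each")
```

Applications that keep data resident on the GPU are less sensitive to this share than applications that stream data over the host link.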

Internal vs. External: NAMD

[Figure: NAMD – STMV Benchmark. Y axis: steps/second (0 to 1.2). Series: SuperMicro (2), C410x / C6100 (2). Data labels: 0.95, 0.82.]

Internal vs. External: CUDASW++

[Figure: CUDASW++. Y axis: GFLOPS (0 to 30). X axis: query length. Series: C410x / C6100 (2), SuperMicro (2).]

Scalability: NAMD

[Figure: NAMD, STMV. Y axis: steps/second (0 to 1.6). Series: CPU, C410x / C6100 (1), C410x / C6100 (2), C410x / C6100 (4), SuperMicro (2). Data labels: 0.10, 0.47, 0.84, 1.52, 0.95.]
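
The scalability figures can be summarized as speedup and scaling efficiency. The sketch below assumes the chart's data labels map to the series in legend order and that the number in parentheses is the GPU count; neither is stated explicitly on the slide. The function names are illustrative.

```python
# Data labels copied from the chart, mapped to series in legend order.
# That mapping is an assumption; so is reading "(n)" as the GPU count.
STEPS_PER_SECOND = {
    "CPU": 0.10,
    "C410x / C6100 (1)": 0.47,
    "C410x / C6100 (2)": 0.84,
    "C410x / C6100 (4)": 1.52,
}


def speedup(rate, baseline_rate):
    """How many times faster `rate` is than `baseline_rate`."""
    return rate / baseline_rate


def scaling_efficiency(rate, gpus, single_gpu_rate):
    """Fraction of ideal linear scaling achieved with `gpus` GPUs."""
    return rate / (gpus * single_gpu_rate)


cpu = STEPS_PER_SECOND["CPU"]
one_gpu = STEPS_PER_SECOND["C410x / C6100 (1)"]
for gpus in (1, 2, 4):
    rate = STEPS_PER_SECOND[f"C410x / C6100 ({gpus})"]
    print(
        f"{gpus} GPU(s): {speedup(rate, cpu):.1f}x over CPU, "
        f"{scaling_efficiency(rate, gpus, one_gpu):.0%} of linear scaling"
    )
```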

Impact of CUDA versions

• Heisenberg Spin Glass (HSG) Model

– Spin Glass modeling is a technique used in statistical mechanics to simulate and predict the behavior of various physical phenomena

• HSG is multi-GPU capable using MPI

– Recent upgrade to CUDA 4.0

• Two code versions:

– MPI based

› GPUs communicate by sending data to the host, then to the appropriate GPU

– CUDA 4.0

› GPUs communicate directly (no host)

• Compare performance

HSG results

• CUDA 4.0 (GPU Direct) is 15-30% faster than MPI

• For Intel systems, GPU Direct requires all GPUs to be connected to the same IOH

• C410x allows you to expand to multiple GPUs per single IOH
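
The same-IOH constraint can be expressed as a simple grouping check. The sketch below is illustrative only: the GPU-to-IOH maps are hypothetical examples, not measured topologies, and the function names are not part of CUDA or any Dell tool.

```python
from collections import defaultdict


def peer_groups(gpu_to_ioh):
    """Group GPU ids by the IOH they are attached to."""
    groups = defaultdict(list)
    for gpu, ioh in sorted(gpu_to_ioh.items()):
        groups[ioh].append(gpu)
    return dict(groups)


def can_communicate_directly(gpu_to_ioh, gpus):
    """True if every GPU in `gpus` sits behind the same IOH."""
    return len({gpu_to_ioh[g] for g in gpus}) == 1


# Hypothetical layouts: two internal GPUs split across two IOHs, versus
# four external GPUs reached through a single host connection.
internal = {0: "IOH0", 1: "IOH1"}
external = {0: "IOH0", 1: "IOH0", 2: "IOH0", 3: "IOH0"}

print(peer_groups(internal))
print(can_communicate_directly(internal, [0, 1]))
print(can_communicate_directly(external, [0, 1, 2, 3]))
```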

Data Management and Storage

Realities

• HPC storage is about 15-25% of the cost of a system but accounts for about 90% of the problems

• HPC Storage is about Solutions not just hardware

– Hardware, file system, client, management/monitoring, documentation, best practices, sizing and performance guidance, services and support

• No one, two, or even three file systems/solutions satisfy all of the various requirements

– Recent IDC study: 25 customers = 13 file systems

• Applications/Processes drive solutions (just like compute), but:

– Very few customers understand the IO characteristics of the apps

• Access frequency requirements don’t match the underlying storage platform

– A very large percentage of data is never touched again once it is approximately 2-4 weeks old

HPC Storage Solutions Aren’t Easy

• Ignoring Cost – name the Top 3 storage attributes

1. Performance

2. Reliability

3. Capacity

• Difficult or impossible to get all 3 attributes in a single solution with HPC price constraints

• Can we get all 3 attributes in different solutions and integrate them?

– Maintains the attributes, improves flexibility, and increases options

Flexibility, Adaptability, and Options

• The importance of performance changes over the life of the data

– At first, performance is very important

– After a period of time, the performance is less important

• Why keep data on high-performance storage that isn’t being used?

• Based on applications and performance importance there are three basic categories of data requirements:

1. Fast Scratch

• Performance, performance, performance

2. Primary (/home)

• Reliability

3. Long-term

• Capacity (very little performance)
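
To see how much data on fast scratch is a candidate for a slower tier, a small scan by last access time is often enough. This is a minimal sketch, assuming a 28-day idle threshold and a hypothetical `/scratch` mount point. It relies on access times, which are unreliable on file systems mounted with `noatime` (and coarse with `relatime`).

```python
import os
import time


def cold_files(root, max_idle_days=28, now=None):
    """Yield (path, idle_days) for files under `root` whose last access
    is more than `max_idle_days` in the past."""
    now = time.time() if now is None else now
    cutoff_seconds = max_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            idle_seconds = now - st.st_atime
            if idle_seconds > cutoff_seconds:
                yield path, idle_seconds / 86400


SCRATCH = "/scratch"  # hypothetical mount point; yields nothing if absent
for path, idle_days in cold_files(SCRATCH):
    print(f"{idle_days:6.0f} days idle  {path}")
```

The output is only a candidate list; what to move and where is a policy decision for the site.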

Dell’s approach to deliver HPC storage solutions

• Dell is delivering solutions using two approaches:

– Complete solutions - Fully vetted, tested, supported

› Come with end-to-end support from Dell and partners

› Detailed documentation including best practices, performance and sizing guidance

› Deployment services if necessary

– Roll-your-own

› Dell creates technical whitepapers containing:

– Recommended configurations

– Details on configuration

– Best practices and sizing guidance

› Customer buys hardware and uses whitepapers as a reference guide

› Full Dell warranty and support on Dell components

– Limited or no deployment services; no solution type services

• Over time, deliver building blocks that will integrate into the larger storage ecosystem

Fast Scratch Storage

• Requirements:

– Very fast (above 1.4 GB/s) – more than NFS can deliver

– Scalability in performance and capacity

– Cost effective

– Reliability is not necessarily a primary requirement

• Roll-Your-Own reference configurations and supporting data

Cambridge University Developed Lustre Reference Configuration

– Detailed whitepaper discussing architecture and performance analysis of the Lustre solution deployed at University of Cambridge

– The deployment steps and best practices listed in the paper can be used to architect similar Lustre solutions using Dell server and storage products

– Work is currently in progress to develop a reference architecture using the latest-generation Dell PowerEdge servers and PowerVault storage

• Complete Dell HPC Fast Scratch Solutions

Dell | Terascala High Performance Computing Storage Solution (DT-HSS)

– Third generation Lustre solution from Dell and Terascala referred to as DT-HSS3

– Utilizes Dell’s latest generation 6Gb/s SAS based PowerVault MD series storage

http://www.dell.com/downloads/global/solutions/200-DELL-CAMBRIDGE-SOLUTIONS-WHITEPAPER-20072010b.pdf

http://content.dell.com/us/en/enterprise/d/hpcc/Storage_Lustre.aspx

The DELL | Terascala HPC Storage Solution (DT-HSS3)

• Unique scale-out storage appliance for throughput-intensive applications

• Fully supported storage appliance that leverages Lustre, the industry’s leading open-source parallel file system

• Simple, linear scalability

– Up to 6.2 GB/s of read and 4.2 GB/s of write throughput per base object pair; scale aggregate performance by adding object pairs

– 48 TB to petabytes in a single namespace

– Pre-defined configurations from 48 TB to 336 TB in a single rack

– Configurations serve as building blocks for larger and faster solutions

• Rich management including hardware and file system monitoring

– Automated Install & Maintenance, Health Monitoring, Failover Solution, Root Cause Analysis
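
The linear-scaling claim lends itself to a quick sizing estimate. The sketch below assumes perfectly linear scaling from the quoted "up to" per-pair figures, so the result is an optimistic lower bound on the number of object pairs, not sizing guidance. The function name is hypothetical.

```python
import math

# Per-pair peak figures quoted on the slide ("up to" values).
READ_GBPS_PER_PAIR = 6.2
WRITE_GBPS_PER_PAIR = 4.2


def object_pairs_needed(read_gbps, write_gbps):
    """Smallest number of object pairs that meets both throughput
    targets, assuming perfectly linear scaling (an assumption)."""
    # The small epsilon keeps floating-point noise from rounding an
    # exact multiple up to the next integer.
    pairs_for_read = math.ceil(read_gbps / READ_GBPS_PER_PAIR - 1e-9)
    pairs_for_write = math.ceil(write_gbps / WRITE_GBPS_PER_PAIR - 1e-9)
    return max(1, pairs_for_read, pairs_for_write)


for read_target, write_target in ((5, 3), (12, 8), (20, 10)):
    pairs = object_pairs_needed(read_target, write_target)
    print(f"{read_target} GB/s read, {write_target} GB/s write: {pairs} pair(s)")
```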

[Diagram: Metadata Storage Server (MDS) Pair and Object Storage Server (OSS) Pair]

Primary Storage

• Requirements:

– Performance is usually not a big deal

– Reliability is important

– Ease of use is important

• Typical usage for home directories, user data, application data and results

• NFS is a widely used protocol for this use case

• Roll-Your-Own reference configurations and supporting data:

– Dell PowerVault MD1200 as a Network File System Backend Storage Solution

– Optimizing Dell PowerVault MD1200 Storage Arrays for High Performance Computing (HPC) Deployments

• Complete Dell HPC NFS Storage Solutions

– Dell HPC NFS Storage Solution (NSS)

› Leverages Dell PowerEdge and PowerVault storage

› 24-96TB (raw storage) in a single namespace using Red Hat XFS file system

› Dell developed tuning and best practices

http://content.dell.com/us/en/enterprise/d/business~solutions~whitepapers~en/Documents~hpc-pv-md1200-nfs.pdf.aspx

http://www.webbuyersguide.com/resource/brief.aspx?id=17680&sitename=dellhpc

http://content.dell.com/us/en/enterprise/spredir.ashx/hpcc/storage-dell-nss

The Dell HPC NFS Storage Solution

[Diagram labels: NFS Gateway; Storage – MD1200; Expansion MD1200s]

• Takes the guesswork out of NFS configurations

– Appliance approach to inexpensive NFS solutions

• Range of capacity:

– Up to 96TB in a single namespace

• HA Configuration options

• Good performance

– Up to 1.47 GB/s for writes and 2.4 GB/s for reads for NFS performance

– 6Gbps SAS, optional IB or 10GigE

– Tuned storage and file system configurations

• Cost Effective

• Reliable and supported

– Proven hardware

– 3 years support with Dell including XFS support

– Redundant power supplies, connections, plus drive spares kit

• Easy to install

– Dell configuration and deployment: Whitepaper and Dell PS

– Affordable installation services available

Benefits of Dell NSS

• Performance tuned NFS server

– Best possible performance

– No need to experiment with tuning options – already tuned

[Figure: throughput (KB/s, 0 to 1,400,000) versus number of clients (2, 4, 8, 12, 16, 24, 32). Series: tuned, not tuned. Annotation: 30%.]

NSS Options

NSS

• Single NFS Gateway

– PERC H800 RAID card(s) in NFS gateway

› Dell MD1200 JBODs connected to RAID cards

– RAID-60 or RAID-60+LVM

NSS-HA

• Two Active-Passive NFS Gateways

– Dell MD3200 RBOD contains RAID card

– Dell MD1200 JBODs are connected to RBOD

– RAID-6 + LVM

Common Aspects

• NFS Gateway – Dell Server (R710)

– RAID-1 for OS (plus 1 hot-spare)

– RAID-0 for additional swap space

– 3 years of support on OS, file system, hardware

– Cold spares (disks)

– IB, 10GigE options

– RHEL 5.5 OS

– Red Hat Scalable File System (XFS)

– Dell ProSupport

NSS Large Solution: 96 TB

Summary

QDR IB or 10GigE

Raw capacity: 96TB

Formatted capacity: ~80TB

RAID-60 and LVM

– RAID-6 within each MD1200

– RAID-0 across MD1200 pairs

– LVM to combine LUNs

10GigE NFS Performance

– Peak Sequential Read: 850 MB/s

– Peak Sequential Write: 1,180 MB/s

InfiniBand NFS Performance

– Peak Sequential Read: 1,350 MB/s
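
The raw and formatted capacities quoted above are consistent with simple RAID-6 arithmetic. The sketch below assumes four MD1200 enclosures with twelve 2 TB drives each and one RAID-6 group per enclosure; the slides give only the 96TB total, so the enclosure count and drive size are assumptions. File system overhead is ignored.

```python
def raid6_usable_tb(drives, drive_tb):
    """Usable capacity of one RAID-6 group: two drives' worth goes to parity."""
    if drives < 4:
        raise ValueError("RAID-6 needs at least 4 drives")
    return (drives - 2) * drive_tb


def raid60_usable_tb(enclosures, drives_per_enclosure, drive_tb):
    """RAID-0 striping across RAID-6 groups adds no further overhead."""
    return enclosures * raid6_usable_tb(drives_per_enclosure, drive_tb)


# Assumed layout (not stated on the slide): 4 enclosures x 12 drives x 2 TB.
ENCLOSURES, DRIVES, DRIVE_TB = 4, 12, 2
raw_tb = ENCLOSURES * DRIVES * DRIVE_TB
usable_tb = raid60_usable_tb(ENCLOSURES, DRIVES, DRIVE_TB)
print(f"raw {raw_tb} TB, usable {usable_tb} TB before file system overhead")
```

Any layout built from 12-drive RAID-6 groups gives the same 10/12 ratio, which is why the result does not depend strongly on the assumed drive size.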

NSS-HA: Large

[Diagram labels: NSS-HA Servers (Dell R710, two); PowerVault MD3200; PowerVault MD1200. Connections: GigE, power cords, IB or 10GigE, SAS (6Gbps).]

Summary

Raw capacity: 96TB

Formatted capacity: ~80TB

RAID-6 and LVM

– RAID-6 within each MD3200/1200

– LVM to combine LUNs

10GigE NFS Performance

– Peak Sequential Read: 560 MB/s

InfiniBand NFS Performance

– Peak Sequential Read: 2,430 MB/s

Summary

• Two most recent trends:

• GPU Computing

– GPU Computing is still evolving

› Hardware (CPUs, GPUs, Interconnect), and software (CUDA)

– Best course of action is to remain flexible

– Ability to upgrade CPUs, GPUs, or software independently of each other

– External PCIe chassis affords flexibility

› Good host nodes

• Data Management and Storage

– Overall it’s the largest problem for users today

– Focus on performance (fast-scratch), reliability (primary), and capacity (long-term)

› Develop a product for each piece and integrate them together

– Roll-your-own and fully supported solutions are available

– Tools for data management are becoming highly critical

Thanks!