HPC Storage Integration - Dell NFS Storage Solution (NSS) Overview Onur Celebioglu HPC Advisory Council Workshop [email protected]


Dell HPC

The Dell HPC NFS Storage Solution (NSS): Agenda

• When is NFS needed?

• Where does NFS fit?

– NFS as a function of scale and throughput

• HPC Challenges

• Development of NSS

– What does Dell NSS do for you?

• Starting point for NSS configurations

• NSS Configurations

• NSS Performance

Confidential 2


When is NFS needed?

• Virtually all clusters, regardless of size, need a shared file system

– At the very minimum for /home and applications

• For small to medium systems NFS can also serve as primary file system for jobs

– No need for high-speed scratch file system for most application profiles

• Even for large systems, NFS can serve user /home data and applications

– High performance is not required here, just reliability and ease of use/management


HPC Challenges: Complexity, Performance, Cost


• Challenge #1 is Compute

› Scalable applications

› Simplified deployment

› Cost effective nodes

• Challenge #2 is the Interconnect

› Low latency networks

› Robust management tools

› Costs as a % of node cost

• Challenge #3 is Storage & I/O

› Common file system

› Good performance and cost effective

› Reliable, easy to configure and manage

› Performance Tuned Configurations

The NSS addresses this


The Dell HPC NFS Storage Solution

[Diagram labels: NFS Gateway; Storage – MD1200; Expansion MD1200s]

• Takes the guesswork out of NFS configurations

– Appliance approach to inexpensive NFS solutions

• Range of capacity

– 24TB to 96TB in a single namespace

• Good performance

– 240 MB/s to 1.45 GB/s NFS throughput

– 6Gbps SAS, optional IB or 10GigE

– Tuned storage and file system configurations

• Cost effective

• Reliable and supported

– Proven hardware

– 3 years support with Dell including XFS support

– Redundant power supplies, connections, plus drive spares kit

• Easy to install

– Dell configuration and deployment: Whitepaper and Dell PS

– Affordable custom installation services available


Development of Dell NSS

• NSS Goals/Requirements:

– No proprietary hardware/software: true Open Storage

– Appliance approach

• Approach: Dell examined options in terms of:

– File systems, RAID levels and configuration

– LVM configuration

– File system and server OS tuning

– NFS server configuration

• Results:

– NFS Gateway

› Dell PowerEdge™ R710

› Extreme reliability, cost effectiveness, expandability

– Storage

› Dell PowerVault™ MD1200

› Great performance (6 Gbps SAS), reliability, and $/GB

– A baseline set of configurations

› Great combination of performance and reliability

› Three defined configurations based on capacity


RAID Configurations: Seq Write

[Chart: MD1200 + H800, NL SAS, Seq Write, DD. Y axis: Throughput (MB/s), 0 to 1400. X axis: 6, 12, 24, 36, 48 drives. Series: R6/6, R5/6, R6/12.]

RAID Configurations: Seq Read

[Chart: MD1200 + H800, NL SAS, Seq Read, DD. Y axis: Throughput (MB/s), 0 to 2000. X axis: 6, 12, 24, 36, 48 drives. Series: R6/6, R5/6, R6/12.]

RAID Configurations: Reliability Modeling

• R6/12 MTTDL is much higher than R5/6: ~27,000 years at 84 drives

[Chart: Reliability, R50 based on R5/6, 2TB/Drive. Y axis: MTTDL (Years), 0 to 160. X axis: 6, 12, 24, 36, 48, 60, 72, 84 drives.]

Assumptions:

• 2TB drives

• MTTF of disk: 600K hours

• Hot spare drives

• Bit error rate: 10^-15
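To make the reliability assumptions concrete, here is a minimal Python sketch of a common textbook MTTDL approximation for RAID-50 built from 6-drive RAID-5 groups. It is not Dell's reliability model. The drive size, disk MTTF, and bit error rate come from the slide; the 24-hour rebuild time and the function names are assumptions made for illustration only.

```python
import math

DRIVE_BYTES = 2e12          # 2TB drives (from the slide)
DISK_MTTF_HOURS = 600_000   # MTTF of disk: 600K hours (from the slide)
BIT_ERROR_RATE = 1e-15      # unrecoverable bit error rate (from the slide)
HOURS_PER_YEAR = 24 * 365


def ure_probability(drives_read: int) -> float:
    """Chance of at least one unrecoverable read error while reading
    `drives_read` full drives: 1 - (1 - BER)^bits."""
    bits = drives_read * DRIVE_BYTES * 8
    return -math.expm1(bits * math.log1p(-BIT_ERROR_RATE))


def raid5_group_mttdl_hours(group_size: int, rebuild_hours: float) -> float:
    """Approximate MTTDL of one RAID-5 group.

    Data is lost if, after a first drive failure, either a second drive
    fails during the rebuild or the rebuild hits an unrecoverable read error.
    """
    first_failure_rate = group_size / DISK_MTTF_HOURS
    p_second_failure = (group_size - 1) * rebuild_hours / DISK_MTTF_HOURS
    p_ure = ure_probability(group_size - 1)
    return 1.0 / (first_failure_rate * (p_second_failure + p_ure))


def raid50_mttdl_years(total_drives: int, group_size: int = 6,
                       rebuild_hours: float = 24.0) -> float:
    """MTTDL of a RAID-50 set: any group losing data loses the whole set.
    rebuild_hours is an assumed value, not taken from the slides."""
    groups = total_drives // group_size
    group_hours = raid5_group_mttdl_hours(group_size, rebuild_hours)
    return group_hours / groups / HOURS_PER_YEAR


if __name__ == "__main__":
    for drives in (6, 12, 24, 36, 48, 60, 72, 84):
        print(f"{drives:3d} drives: {raid50_mttdl_years(drives):7.1f} years")
```

Under these assumptions, rebuilding a 6-drive RAID-5 group means reading five 2TB drives, which gives roughly a 7.7% chance of an unrecoverable read error. That term dominates the estimate, which is the usual argument for RAID-6 with large drives.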

Benefits of Dell NSS

• Performance tuned NFS server

– Best possible performance

– No need to experiment with tuning options: already tuned

[Chart: Throughput (KB/s), 0 to 1,400,000, versus number of clients (2, 4, 8, 12, 16, 24, 32). Series: tuned, not tuned. Annotation: 30%.]

The Dell HPC NFS Storage Solution (NSS)

Configurations


Configuration Starting Point: How much capacity? What network?

[Chart: NFS Throughput (MB/s), 0 to 1600, for NSS Small (24TB), NSS Medium (48TB), and NSS Large (96TB). Series: 10GigE Read, 10GigE Write, InfiniBand Read, InfiniBand Write.]

NFS Gateway

[Diagram label: QDR IB]

Dell PowerEdge R710 (NFS Gateway):

• (2) 2.4 GHz Intel Westmere CPUs (4 cores)

• 24 - 48GB memory

– Varies by configuration

• RAID-1 OS w/ hot spare

• RAID-0 swap space (2 drives)

• 3 years support including FS

• PERC H800

– 1GB battery-backed cache

• RAID-6 or RAID-60

– Varies by configuration

• Tuned LVM for certain configurations

• IB or 10GigE data interface

• GigE management connection

Up to 30% better performance of NSS vs. untuned

Software

• Client:

– NFSv3 compatible client

– If using InfiniBand on the clients:

› OFED 1.5.1

• NSS Server:

– Red Hat Enterprise Linux (RHEL) 5.5

– NFSv3

– Red Hat Scalable File System: XFS version 2.10.2-7

– If using IPoIB:

› OFED 1.5.1

– LVM
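As a rough illustration of the client side, the sketch below builds a standard Linux `mount` command line that requests NFS version 3. The server name, export path, and mount point are hypothetical placeholders, not values from the NSS configuration, and the helper function is invented for this example.

```python
import shlex


def nfsv3_mount_command(server, export, mount_point, options=("vers=3",)):
    """Return the argument list for mounting an NFS export over NFSv3.

    `server`, `export`, and `mount_point` are supplied by the caller; the
    values used below are placeholders for illustration only.
    """
    return ["mount", "-t", "nfs", "-o", ",".join(options),
            f"{server}:{export}", mount_point]


if __name__ == "__main__":
    # Hypothetical gateway host name and export path.
    cmd = nfsv3_mount_command("nss-gateway", "/export/home", "/home")
    print(shlex.join(cmd))
```

The function only builds the command; running it requires root privileges on a client that can reach the gateway.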


NSS Small Configuration: 24TB

[Diagram: NFS gateway with QDR IB and one MD1200: (12) 2TB 7.2K NL-SAS]

Summary

• Raw capacity: 24TB

– Expandable to 96TB

• Formatted capacity: ~20TB

– Expandable to ~80TB

• RAID-6

• 10GigE NFS Performance

– Peak Sequential Read: 275 MB/s

– Peak Sequential Write: 550 MB/s

• InfiniBand NFS Performance (IPoIB)

– Peak Sequential Read: 440 MB/s

– Peak Sequential Write: 890 MB/s

NSS Medium Solution: 48TB

[Diagram: NFS gateway with QDR IB and two MD1200s, each with (12) 2TB 7.2K NL-SAS]

Summary

• Raw capacity: 48TB

– Expandable to 96TB

• Formatted capacity: ~40TB

– Expandable to ~80TB

• RAID-60

– RAID-6 within each MD1200

– RAID-0 across MD1200s

• 10GigE NFS Performance

– Peak Sequential Read: 490 MB/s

– Peak Sequential Write: 840 MB/s

• InfiniBand NFS Performance

– Peak Sequential Read: 755 MB/s

– Peak Sequential Write: 1,350 MB/s

NSS Large Solution: 96TB

[Diagram label: QDR IB]

Summary

• Raw capacity: 96TB

• Formatted capacity: ~80TB

• RAID-60 and LVM

– RAID-6 within each MD1200

– RAID-0 across MD1200s

– LVM to combine LUNs

• 10GigE NFS Performance

– Peak Sequential Read: 850 MB/s

– Peak Sequential Write: 1,180 MB/s

• InfiniBand NFS Performance

– Peak Sequential Read: 1,350 MB/s

– Peak Sequential Write: 1,470 MB/s
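The raw and formatted capacities quoted for the three configurations follow from simple RAID-6 arithmetic: each 12-drive MD1200 gives up two drives' worth of space to parity. The short Python sketch below reproduces that calculation. The enclosure count for NSS Large (four) is inferred from 96TB raw at 24TB per enclosure rather than stated on the slides, and file system overhead is ignored.

```python
DRIVES_PER_MD1200 = 12    # (12) 2TB 7.2K NL-SAS per enclosure
DRIVE_TB = 2
RAID6_PARITY_DRIVES = 2   # RAID-6 within each MD1200

# Enclosure counts; the value for NSS Large is inferred, not quoted.
ENCLOSURES = {"NSS Small": 1, "NSS Medium": 2, "NSS Large": 4}


def raw_capacity_tb(enclosures: int) -> int:
    """Total capacity of all drives, before RAID."""
    return enclosures * DRIVES_PER_MD1200 * DRIVE_TB


def raid60_usable_tb(enclosures: int) -> int:
    """Usable capacity with one RAID-6 group per enclosure, striped
    together (RAID-0 or LVM). File system overhead is not subtracted."""
    data_drives = DRIVES_PER_MD1200 - RAID6_PARITY_DRIVES
    return enclosures * data_drives * DRIVE_TB


if __name__ == "__main__":
    for name, count in ENCLOSURES.items():
        print(f"{name}: raw {raw_capacity_tb(count)}TB, "
              f"usable about {raid60_usable_tb(count)}TB")
```

The results (24/20, 48/40, and 96/80 TB) line up with the raw and approximate formatted capacities listed for the Small, Medium, and Large configurations.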

The Dell HPC NFS Storage Solution (NSS)

Performance


Experimental Configurations


10GigE NFS Performance: Sequential Read

[Chart: NFS 10GigE Sequential Reads. Y axis: Throughput (KB/s), 0 to 900,000. X axis: Threads (Nodes): 1, 2, 4, 8, 16, 24, 32. Series: NSS Small, NSS Medium, NSS Large.]

• 10GigE NFS gateway with GigE clients

• Performance peaks:

– NSS Small: 8 nodes doing IO

– NSS Medium: 24 nodes doing IO

– NSS Large: No peak over range tested

10GigE NFS Performance: Sequential Write

[Chart: NSS 10GigE Sequential Writes. Y axis: Throughput (KB/s), 0 to 1,400,000. X axis: Threads (Nodes): 1, 2, 4, 8, 16, 24, 32. Series: NSS Small, NSS Medium, NSS Large.]

• 10GigE NFS gateway with GigE clients

• Peaks:

– NSS Small: 8 nodes doing IO

– NSS Medium: 8 nodes doing IO

– NSS Large: 16 nodes doing IO

InfiniBand (IPoIB) NFS Performance: Sequential Read

[Chart: NSS IPoIB Sequential Reads. Y axis: Throughput (KB/s), 0 to 1,600,000. X axis: Threads (Nodes): 1, 2, 4, 8, 16, 24, 32. Series: NSS Small, NSS Medium, NSS Large.]

• IPoIB NFS gateway with QDR IB clients

• Peaks:

– NSS Small: 1 node doing IO (fairly level until 4-8 nodes)

– NSS Medium: 4 nodes doing IO (not much drop-off)

– NSS Large: 8 nodes doing IO (good performance over range)

InfiniBand (IPoIB) NFS Performance: Sequential Write

[Chart: NSS IPoIB Sequential Writes. Y axis: Throughput (KB/s), 0 to 1,600,000. X axis: Threads (Nodes): 1, 2, 4, 8, 16, 24, 32. Series: NSS Small, NSS Medium, NSS Large.]

• IPoIB NFS gateway with QDR IB clients

• Peaks:

– NSS Small: 1 node doing IO (steady drop-off to 16 nodes)

– NSS Medium: 2 nodes doing IO (good performance for up to 8-16 nodes)

– NSS Large: 4 nodes doing IO (good performance over range of nodes tested)

Random Read IOPS Performance: 10GigE and InfiniBand

• Both 10GigE and IB have about the same performance

– Performance dictated by controllers and disks, not network

[Chart: NSS IPoIB Random Read IOPS. Y axis: IOPS, 0 to 5000. X axis: Threads (Nodes): 1, 2, 4, 8, 16, 24, 32. Series: NSS Small, NSS Medium, NSS Large.]

Random Write IOPS Performance: 10GigE and InfiniBand

• Both 10GigE and IB have about the same performance

– Performance dictated by controllers and disks, not network

[Chart: NSS IPoIB Random Write IOPS. Y axis: IOPS, 0 to 2500. X axis: Threads (Nodes): 1, 2, 4, 8, 16, 24, 32. Series: NSS Small, NSS Medium, NSS Large.]

Metadata – File Create

[Chart: NSS IPoIB - Metadata - File Create. Y axis: IOPS, 0 to 30,000. X axis: Nodes (Threads): 1, 2, 4, 8, 16, 32. Series: Small, Medium, Large.]

Observations

• Performance of IB (IPoIB) for single node is excellent

– One reason: in the 10GigE tests the clients connect over GigE, whereas in the IPoIB tests they connect over QDR IB

• For most node counts, IPoIB is better than 10GigE

• Many applications have just one MPI process doing IO

– You only have a very small number of nodes performing IO

– In these cases, using NFS over IB (IPoIB) will greatly help your I/O performance
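To put a number on the IPoIB advantage, the following Python sketch computes the ratio of peak IPoIB throughput to peak 10GigE throughput for each configuration, using only the peak sequential figures quoted on the configuration slides. The dictionary layout and function name are illustrative choices; the ratios compare peaks only and say nothing about behaviour at a specific node count.

```python
# Peak sequential NFS throughput in MB/s, as quoted on the configuration slides.
PEAK_MB_S = {
    "NSS Small":  {"10GigE": {"read": 275, "write": 550},
                   "IPoIB":  {"read": 440, "write": 890}},
    "NSS Medium": {"10GigE": {"read": 490, "write": 840},
                   "IPoIB":  {"read": 755, "write": 1350}},
    "NSS Large":  {"10GigE": {"read": 850, "write": 1180},
                   "IPoIB":  {"read": 1350, "write": 1470}},
}


def ipoib_speedup(config: str, operation: str) -> float:
    """Ratio of peak IPoIB throughput to peak 10GigE throughput."""
    peaks = PEAK_MB_S[config]
    return peaks["IPoIB"][operation] / peaks["10GigE"][operation]


if __name__ == "__main__":
    for config in PEAK_MB_S:
        for operation in ("read", "write"):
            ratio = ipoib_speedup(config, operation)
            print(f"{config} {operation}: {ratio:.2f}x")
```

Every ratio is above 1, which is consistent with the observation that IPoIB generally outperforms 10GigE in these tests.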


NSS-HA Large Solution: 96TB

[Diagram labels: QDR IB, QDR IB]

Summary

• Raw capacity: 96TB

• Formatted capacity: ~80TB

• RAID-60 and LVM

– RAID-6 within each MD enclosure

– HA-LVM to combine LUNs

• InfiniBand NFS Performance

– Peak Sequential Read: 2.4 GB/s

– Peak Sequential Write: 1.3 GB/s

NSS-HA Large Performance


• Improved Read performance due to embedded RAID controller

• Writes around 1.3 GB/s with write cache mirroring enabled


The Dell HPC NFS Storage Solution (NSS)

Summary


Summary

• Virtually all clusters, regardless of size, need a shared file system

– For small to medium systems NFS can also serve as primary file system for jobs

– Even for large systems, NFS can serve user /home data and applications

• Dell NSS is designed to take the guesswork out of NFS configurations

• Three pre-configured systems: 24TB, 48TB, and 96TB, with a QDR IB or 10GigE connection to the network

– Range of capacity

– Tuned configurations (good performance)

– Cost Effective

– Easy to configure

– Affordable Dell deployment/installation available
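As a small worked example of choosing among the three pre-configured systems, the sketch below returns the smallest configuration whose approximate formatted capacity covers a requirement. The capacities are the approximate formatted figures from the configuration slides; the function and its selection rule are illustrative assumptions, not a Dell sizing tool.

```python
from typing import Optional

# Approximate formatted capacities in TB, smallest first (from the slides).
FORMATTED_TB = [("NSS Small", 20), ("NSS Medium", 40), ("NSS Large", 80)]


def smallest_config(required_tb: float) -> Optional[str]:
    """Return the smallest configuration that fits `required_tb`,
    or None when the requirement exceeds the largest configuration."""
    for name, capacity_tb in FORMATTED_TB:
        if required_tb <= capacity_tb:
            return name
    return None


if __name__ == "__main__":
    for need in (15, 35, 80, 100):
        print(f"{need}TB needed: {smallest_config(need)}")
```

Capacity is only the first question; the choice of 10GigE or InfiniBand then depends on the throughput needed and the network already in place.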
