© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Breaking IO Performance Barriers: Scalable Parallel File System for AWS Paresh G. Pattani, Ph.D. Sr. Director, High Performance Data Solutions Intel Corporation July 10, 2014

Breaking IO Performance Barriers: Scalable Parallel File System for AWS

DESCRIPTION

Across all industries worldwide, HPC is helping innovative users achieve breakthrough results, from leading-edge academic research to data-intensive applications such as weather prediction and large-scale manufacturing in the aerospace and automotive sectors. As HPC-powered simulations continue to grow ever larger and more complex, scientists are looking for cost-effective, high-performance compute resources that are available when they need them. Access to on-demand infrastructure creates opportunities to experiment and try new speculative models. AWS provides computing infrastructure that allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high bandwidth, low-latency networking, and very high compute capabilities. Driven by the flexibility and affordability of AWS, many HPC and big data workloads are transitioning from on premises entirely onto AWS. But as with on-premises HPC, maximizing the performance of "HPC cloud" workloads requires fast and highly scalable storage. Intel® Cloud Edition for Lustre Software has been purpose-built for use with the dynamic computing resources available from Amazon Web Services to provide the fast, massively scalable storage software resources needed to accelerate performance, even on complex workloads.

Page 2: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

The need for parallel storage

Page 3: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Parallel Storage Needs

• Performance: Time spent storing and retrieving data is time not spent on compute. Fast storage maximizes processing utilization.

• Scalability: Growing datasets require greater amounts of storage and the ability to expand existing storage.

• Reliability: Large clusters and critical workloads require a comprehensive focus on data availability.

Page 4: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Scale Out Storage Using Lustre*

• Purpose-built for HPC

• Distributed, Parallel, Vast Global Namespace

• Linux server based

• Linux, Windows and Mac client support

• Support for 100,000+ Clients

• Designed for Reliable Storage

• Now available on AWS Marketplace lustre.intel.com/cloudedition

* Some names and brands may be claimed as the property of others.

Page 5: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Intel Strategy for Lustre* Storage

Extend core Lustre* for use across HPC and enterprise applications

Intel Enhanced Lustre* – HPC Clouds

Extend core Lustre* with key features for new markets and use cases

Push Lustre* onto HPC cloud infrastructure

Open-source innovation driving performance at scale

Open Source - Powerful storage foundation for exascale applications

Increased scale and streaming bandwidth

Accelerate maturity, lower risk and grow the ecosystem

Page 6: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Use Models: Cloud Resources for HPC

1 Augment: burst peak workloads and supplement resources

2 Transition: move on-premises HPC to cloud infrastructure

3 Deploy: launch new applications exclusively to the cloud

Page 7: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Key HPC Markets Using Lustre* Today

• Large-scale Manufacturing

• Weather and Climate

• Life Sciences

• Energy

• Finance

Page 8: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

What Does Intel® Cloud Edition for Lustre* Software Look Like?

Page 9: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Lustre* Components

Management (MGS, MGT)

• Lustre* mount service

• Initial point of contact for clients

Metadata (MDS, MDT)

• Namespace of file system

• File layouts, no data

• Scalable

Storage (OSS, OST)

• File content stored as objects

• Striped across targets

• Scales to 100+
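To make "file content stored as objects, striped across targets" concrete, here is a minimal sketch of a round-robin (RAID 0 style) layout. It is an illustration only, not Lustre's actual implementation: the function name `locate_stripe` and the default stripe size and stripe count are assumptions chosen for the example.

```python
def locate_stripe(offset, stripe_size=1 << 20, stripe_count=4):
    """Map a byte offset in a file to (target index, offset inside that target's object).

    Assumes a plain round-robin layout; stripe_size and stripe_count are
    illustrative values, not settings taken from the slides.
    """
    stripe_number = offset // stripe_size        # which stripe-sized chunk of the file
    target_index = stripe_number % stripe_count  # chunks rotate across the targets
    # Every full rotation across all targets adds one chunk to each object.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return target_index, object_offset


if __name__ == "__main__":
    for off in (0, 1 << 20, 4 << 20):
        print(off, locate_stripe(off))
```

Because consecutive chunks land on different targets, a large sequential read or write can be served by several storage servers at once, which is the property the benchmarks later in the deck exercise.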

Page 10: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Deploying a Storage Cluster

Page 14: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Monitoring & Command Line Interface

Page 15: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Performance

Page 16: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Large File Benchmark

Comparing 3 Lustre* cluster configurations with an increasing number of OSSs:

• 4 OSS

• 8 OSS

• 16 OSS

The MGS and MDS configurations are the same in all three.

We use 32 clients.

Instance configuration:

• MGS: m1.medium, 94 MB/sec

• MDS: m3.2xlarge, EBS Optimized, RAID0, 8x 40GB Standard, 110 MB/sec

• OSS: m3.2xlarge, EBS Optimized, 8x 100GB Standard, 110 MB/sec

• Client: m3.2xlarge, 110 MB/sec

Page 17: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

IOR Sequential Read FPP

[Figure: throughput (MB/sec, axis 0 to 1600) vs. N. Clients (1, 2, 4, 8, 16, 32) for 4 OSS, 8 OSS and 16 OSS. Annotations: "Client’s network bottleneck", "OSS’s network bottleneck" (twice), "Close to the OSS network".]
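The bottleneck annotations on this chart can be read through a simple ceiling model. The sketch below assumes that the 110 MB/sec figure listed for the m3.2xlarge instances is a per-instance network limit that applies equally to clients and OSSs; the slides do not state this explicitly, and the function name is hypothetical.

```python
def throughput_ceiling(n_clients, n_oss, per_node_mb_s=110):
    """Upper bound on aggregate throughput in MB/sec.

    Assumption: each client and each OSS can move at most per_node_mb_s,
    so the smaller side of the cluster sets the limit.
    """
    return min(n_clients, n_oss) * per_node_mb_s


if __name__ == "__main__":
    for n_oss in (4, 8, 16):
        row = [throughput_ceiling(c, n_oss) for c in (1, 2, 4, 8, 16, 32)]
        print(f"{n_oss:>2} OSS: {row}")
```

Under this model the curve for a given configuration rises with the client count until the number of clients matches the number of OSSs and then flattens, which is one way to interpret the switch from a client-side to an OSS-side bottleneck.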

Page 18: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

IOR Sequential Write FPP

[Figure: throughput (MB/sec, axis 0 to 1600) vs. N. Clients (1, 2, 4, 8, 16, 32) for 4 OSS, 8 OSS and 16 OSS. Annotations: "Client’s network bottleneck", "OSS’s network bottleneck" (twice), "Ops….".]

Page 19: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Aggregate Performance During Run

• LTOP is available, and we use it to record OST activity during the IOR run.

• With a simple Python script we create this graph, “aggregate performance vs. time”, to analyze the problem.

[Figure: aggregate performance (MB/sec) vs. time. Labels: 1920 MB/sec; annotation: "Long tail".]
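The script itself is not shown in the deck. As a rough sketch of the idea, the code below sums per-OST throughput samples into an aggregate series over time and finds where a trailing "long tail" begins. The input format (tuples of timestamp, OST name, MB/sec) is hypothetical and is not LTOP's actual output format, and the 50% threshold is an arbitrary choice for illustration.

```python
from collections import defaultdict


def aggregate_by_time(samples):
    """Sum per-OST throughput into one (timestamp, total MB/sec) series.

    samples: iterable of (timestamp, ost_name, mb_per_sec) tuples
    (a hypothetical record layout, not LTOP's native format).
    """
    totals = defaultdict(float)
    for timestamp, _ost, mb_per_sec in samples:
        totals[timestamp] += mb_per_sec
    return sorted(totals.items())


def long_tail_start(series, fraction=0.5):
    """Return the timestamp where the final run of samples below
    fraction * peak begins, or None if the run ends at full speed."""
    if not series:
        return None
    threshold = fraction * max(value for _t, value in series)
    tail_start = None
    for timestamp, value in reversed(series):
        if value >= threshold:
            break
        tail_start = timestamp
    return tail_start
```

A long tail in such a series means a few processes are still writing after most have finished, so the aggregate rate over the whole run is lower than the peak rate.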

Page 20: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Compare Lustre* and NFS

Page 21: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Small File Benchmark

Simulated EDA Benchmark

• Simulate workload by compiling a package

• untar; configure; make;

• Python wrapper parallelizes on cluster using MPI

• Calculate score based on (total workload/runtime)

32 Clients

• Linux, c3.xlarge

Compare with NFS

• Linux, i2.4xlarge

• 4x EBS RAID0
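The scoring rule (total workload / runtime) can be sketched as follows. This is not the benchmark's actual wrapper: a local thread pool stands in for the MPI-based parallelization, the `job` callable stands in for the untar, configure and make sequence, and counting workload as "number of jobs" is an assumption, since the slides do not define the unit.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def score(total_workload, runtime_seconds):
    """Score = total workload / runtime, as described on the slide."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return total_workload / runtime_seconds


def run_benchmark(job, n_jobs, n_workers):
    """Run n_jobs copies of job on n_workers threads and return the score.

    The thread pool is a stand-in for MPI ranks spread across a cluster.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(lambda _i: job(), range(n_jobs)))
    runtime = time.perf_counter() - start
    return score(n_jobs, runtime)


if __name__ == "__main__":
    # A sleep stands in for one compile job.
    print(run_benchmark(lambda: time.sleep(0.05), n_jobs=8, n_workers=4))
```

With a fixed amount of work per job, a higher score means the same workload finished sooner, so the score rewards a file system that keeps up as more processes are added.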

Page 22: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Lustre* Configuration

1 MGT

• m3.medium

1 - 4 MDTs

• m3.2xlarge

• 8x 40GB EBS

4 OSTs

• c3.xlarge

• 8x 40GB EBS

Page 23: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

EDABench – Lustre* vs. NFS

[Figure: EDABench Score (Compile), axis 0 to 12000, vs. Processes (32 clients): 1, 2, 4, 8, 16, 32, 64, 128. Series: 1 MDT, 2 MDTs, 4 MDTs, NFS.]

Page 24: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Storage Instance Cost Comparison

• EBS Optimized for all storage instances

• Global Support for Lustre*

• Does not include EBS cost

Cluster Option              Total Cost / Hour
Lustre* – 1xMDT + 4xOSS     $2.00
Lustre* – 2xMDT + 4xOSS     $2.69
Lustre* – 4xMDT + 4xOSS     $4.07
NFS – i2.4xlarge            $3.51
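A short worked calculation shows what the Lustre* rows imply about the cost of each additional MDT. The prices come from the table above and are stored in cents to avoid floating-point rounding; reading the difference between rows as a fixed cost per MDT server is an inference, not something the slide states.

```python
# Hourly cost in cents, keyed by number of MDTs (4 OSS in every case).
LUSTRE_COST_CENTS = {1: 200, 2: 269, 4: 407}
NFS_COST_CENTS = 351


def marginal_cost_per_mdt(costs):
    """Cost added per extra MDT between consecutive table rows, in cents."""
    points = sorted(costs.items())
    return [(c2 - c1) / (m2 - m1) for (m1, c1), (m2, c2) in zip(points, points[1:])]


if __name__ == "__main__":
    steps = marginal_cost_per_mdt(LUSTRE_COST_CENTS)
    print(steps)                                # cents per extra MDT
    print(LUSTRE_COST_CENTS[1] - steps[0])      # implied cost of everything else
    print(LUSTRE_COST_CENTS[2] < NFS_COST_CENTS < LUSTRE_COST_CENTS[4])
```

Both steps come out at 69 cents per hour per MDT, and the NFS instance price falls between the 2xMDT and 4xMDT Lustre* configurations.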

Page 25: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Intel® Cloud Edition for Lustre* software

Page 26: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Status Today

• Available on AWS Marketplace

• Set up in less than 10 minutes

• Try it for yourself: lustre.intel.com/cloudedition

• Contact: lustre.intel.com/contactus

Page 27: Breaking IO Performance Barriers: Scalable Parallel File System for AWS

Thank You.