
Towards a High-Performance and Scalable Storage System for Workflow Applications

Emalayan Vairavanathan

The University of British Columbia

Department of Electrical and Computer Engineering


Background: Workflow Applications

[Figure: modFTDock workflow]

• A large number of independent tasks collectively work on a problem

• Common characteristics
  • File-based communication
  • Large number of tasks
  • Large amount of storage I/O
  • Regular data access patterns


Background – ModFTDock on the Argonne Blue Gene/P

[Figure: a workflow runtime engine dispatches application tasks to compute nodes, each with local storage; the tasks share a central storage system (e.g., GPFS, NFS)]

• Scale: 40,960 compute nodes
• 1.2 M docking tasks
• File-based communication
• Large I/O volume
• I/O throughput < 1 MBps per core


Background – Central Storage Bottleneck

[Figure: Montage workflow (512 BG/P CPU cores, GPFS). Source: Z. Zhang et al., SC'12]


Contributions - Alleviating the storage I/O bottleneck

Intermediate Storage System
• Designed and implemented a prototype
• Integrated with the workflow runtime
• Evaluated with applications on BG/P
Publication: The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System. S. Al-Kiswany, Emalayan Vairavanathan, L. B. Costa, H. Yang, M. Ripeanu. Submitted - FAST '13.

Workflow-aware Storage System
• Identified new data access patterns
• Studied the viability of a workflow-aware storage system
Publication: A Workflow-Aware Storage System: An Opportunity Study. Emalayan Vairavanathan, S. Al-Kiswany, L. B. Costa, Z. Zhang, D. Katz, M. Wilde, M. Ripeanu. CCGRID '12. Acceptance rate: 27%.
Publication: A case for Workflow-Aware Storage: An Opportunity Study using MosaStore. Emalayan Vairavanathan, S. Al-Kiswany, A. Barros, L. B. Costa, H. Yang, G. Fedak, D. Katz, M. Wilde, M. Ripeanu. Submitted - FGCS Journal.

MosaStore Storage System
• Experimental platform for other studies
Publication: Predicting Intermediate Storage Performance for Workflow Applications. L. B. Costa, A. Barros, Emalayan Vairavanathan, S. Al-Kiswany, M. Ripeanu. Submitted - CCGRID '13.


Intermediate Storage System

[Figure: application tasks on the compute nodes, each with local storage, share an intermediate storage layer through a POSIX API; the workflow runtime engine stages data in from, and out to, the central storage system (e.g., GPFS, NFS)]

Opportunities:
• Underutilized resources


Evaluation - modFTDock on Blue Gene/P

20-40% improvement

2x improvement


A Workflow-aware Storage System

[Figure: the workflow runtime engine performs task scheduling, deploys the shared intermediate storage on the compute nodes, and stages data in and out of the central storage system (e.g., GPFS); application tasks, each with local storage, access the intermediate storage through a POSIX API]

Opportunities
• Dedicated intermediate storage
• Exposing data location
• Regular data access patterns

Workflow-aware Intermediate Storage


Data Access Patterns in Workflow Applications

Pattern                 Optimization
• Pipeline              Locality and location-aware scheduling
• Broadcast             Replication
• Reduce                Collocation and location-aware scheduling
• Scatter and Gather    Block-level data placement

Wozniak et al., PDSW'09; Katz et al., BlueWater; Shibata et al., HPDC'10


Data Access Patterns in ModFTDock

[Figure: ModFTDock workflow, annotated with its broadcast, reduce, and pipeline patterns]


Evaluation - Baselines

MosaStore, NFS, and node-local storage vs. workflow-aware storage

[Figure: the compared configurations (MosaStore, NFS, local storage, workflow-aware storage), shown on compute nodes running application tasks with local storage, a shared intermediate storage, and stage in/out to a central storage system (e.g., GPFS, NFS)]


Evaluation - Platform

• Cluster of 20 machines, each with an Intel Xeon 4-core 2.33-GHz CPU, 4-GB RAM, a 1-Gbps NIC, and RAID-1 on two 300-GB 7200-rpm SATA disks

• Central storage: NFS server with an Intel Xeon E5345 8-core 2.33-GHz CPU, 8-GB RAM, a 1-Gbps NIC, and six SATA disks in a RAID 5 configuration

The NFS server is better provisioned.


Evaluation – Benchmarks and Application

Synthetic benchmark

Application and workflow runtime engine
• Montage
• modFTDock

Workload   Pipeline                 Broadcast       Reduce
Small      100 KB, 200 KB, 10 KB    100 KB, 1 KB    10 KB, 100 KB
Medium     100 MB, 200 MB, 1 MB     100 MB, 1 MB    10 MB, 200 MB
Large      1 GB, 2 GB, 10 MB        1 GB, 10 MB     100 MB, 2 GB


Synthetic Benchmark - Pipeline

Average runtime for medium workload

Optimization: Locality and location-aware scheduling

3x improvement in workflow time
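
A minimal sketch of location-aware scheduling for the pipeline pattern, assuming the storage system can report which node holds a file. The function `pick_node` and the `file_location` dictionary are hypothetical illustrations, not the interface of the evaluated prototype.

```python
from typing import Dict, List


def pick_node(input_file: str,
              file_location: Dict[str, str],
              idle_nodes: List[str]) -> str:
    """Prefer the idle node that already stores the task's input file."""
    preferred = file_location.get(input_file)
    if preferred in idle_nodes:
        return preferred  # local read: no network transfer needed
    # Fall back to any idle node; the input is then read remotely.
    return idle_nodes[0]


# Hypothetical pipeline: the previous stage wrote "a.out" on node "n2".
file_location = {"a.out": "n2"}
print(pick_node("a.out", file_location, ["n1", "n2", "n3"]))  # n2
print(pick_node("b.out", file_location, ["n1", "n3"]))        # n1
```

Combined with writing each stage's output to the local node, this keeps successive pipeline stages on the same machine.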


Synthetic Benchmarks - Broadcast


Optimization: Replication

Average runtime for medium workload on disk

60% improvement in the runtime
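
One way to see why replication can help the broadcast pattern: with more replicas, fewer readers contend for each copy. The sketch below assumes readers are assigned to replica holders round-robin; `assign_readers` and the node names are hypothetical and do not describe the prototype's actual replica selection.

```python
from collections import Counter
from typing import Dict, List


def assign_readers(readers: List[str], replicas: List[str]) -> Dict[str, str]:
    """Spread readers over the nodes holding a replica, round-robin."""
    return {r: replicas[i % len(replicas)] for i, r in enumerate(readers)}


def max_load(assignment: Dict[str, str]) -> int:
    """Largest number of readers served by any single replica holder."""
    return max(Counter(assignment.values()).values())


readers = [f"task{i}" for i in range(8)]
print(max_load(assign_readers(readers, ["n0"])))                    # 8
print(max_load(assign_readers(readers, ["n0", "n1", "n2", "n3"])))  # 2
```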


Evaluation – Montage

Montage workflow

Total application time on five different systems

10% improvement in the runtime


Contributions - Alleviating the storage I/O bottleneck

Intermediate Storage System
• Designed and implemented a prototype
• Integrated with the workflow runtime
• Evaluated with applications on BG/P
Publication: The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System. S. Al-Kiswany, Emalayan Vairavanathan, L. B. Costa, H. Yang, M. Ripeanu. Submitted - FAST '13.

Workflow-aware Storage System
• Identified new data access patterns
• Studied the viability of a workflow-aware storage system
Publication: A Workflow-Aware Storage System: An Opportunity Study. Emalayan Vairavanathan, S. Al-Kiswany, L. B. Costa, Z. Zhang, D. Katz, M. Wilde, M. Ripeanu. CCGRID '12. Acceptance rate: 27% (one of the top 15 papers).
Publication: A case for Workflow-Aware Storage: An Opportunity Study using MosaStore. Emalayan Vairavanathan, S. Al-Kiswany, A. Barros, L. B. Costa, H. Yang, G. Fedak, D. Katz, M. Wilde, M. Ripeanu. Submitted - FGCS Journal.

MosaStore Storage System
• Experimental platform for other studies
Publication: Predicting Intermediate Storage Performance for Workflow Applications. L. B. Costa, A. Barros, Emalayan Vairavanathan, S. Al-Kiswany, M. Ripeanu. Submitted - CCGRID '13.


THANK YOU


BACKUP SLIDES


Background – Many-task workflows

Large amount of legacy code

Rapid application development

Portability (workstation – supercomputers)

Easy to debug

Implicit fault-tolerance

Expression of natural parallelism


Background – Motivation

Many-task applications are becoming popular

Better utilization of costly hardware and energy savings (a lot of time is spent executing workflow applications)

Better scalability and high performance will help to solve large problems more accurately

Large number of available workflow applications


Blue Gene/P Architecture

[Figure: Blue Gene/P architecture diagram]

• 40,960 compute nodes (160K cores)
• Torus network: 6.4 Gbps per link
• Tree network: 850 MBps x 640
• 640 I/O nodes
• 10 Gbps switch complex: 10 Gb/s x 128
• GPFS: deployed on 128 file server nodes (3 petabytes storage capacity)
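
As a rough back-of-the-envelope check, the link figures above can be turned into an aggregate per-core bandwidth bound. The sketch assumes 4 cores per compute node (consistent with "160K cores"), decimal units (1 Gb/s = 125 MB/s), and that the narrower of the two aggregate paths limits throughput to central storage; these are assumptions for illustration, not measurements from the talk.

```python
COMPUTE_NODES = 40_960
CORES_PER_NODE = 4                 # assumption, consistent with "160K cores"
TREE_LINKS, TREE_MBPS = 640, 850   # "850 MBps x 640"
GPFS_LINKS, GPFS_GBITS = 128, 10   # "10 Gb/s x 128"

cores = COMPUTE_NODES * CORES_PER_NODE
tree_mbps = TREE_LINKS * TREE_MBPS              # aggregate tree network, MB/s
gpfs_mbps = GPFS_LINKS * GPFS_GBITS * 1000 / 8  # Gb/s converted to MB/s

# The narrower aggregate path bounds what every core can get on average.
per_core = min(tree_mbps, gpfs_mbps) / cores

print(cores, tree_mbps, gpfs_mbps, round(per_core, 2))
# 163840 544000 160000.0 0.98
```

Under these assumptions the bound comes out just under 1 MB/s per core, in line with the "IO throughput < 1MBps / core" figure quoted earlier in the deck.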


Example Workflow Software Stack

[Figure: layered software stack, top to bottom]
• Swift script
• Swift compiler
• Intermediate code
• Workflow runtime engine (e.g., Swift)
• Task dispatching service (e.g., Coasters), exchanging tasks / notifications with the engine and the workers
• Workers, which perform storage I/O
• Shared storage system


Intermediate Storage System

MosaStore

• Each file is divided into fixed-size chunks.

• Chunks are stored on the storage nodes.

• The manager maintains a block-map for each file.

• A POSIX interface provides access to the system.

[Figure: MosaStore distributed storage architecture]
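
The chunking and block-map idea can be sketched as a single-process toy. Everything here is an assumption for illustration: the class `TinyStore`, the 4-byte `CHUNK_SIZE`, the chunk naming, and the round-robin placement are hypothetical and are not MosaStore's actual code or defaults.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo (assumed value)


class TinyStore:
    """Single-process stand-in for a manager plus a set of storage nodes."""

    def __init__(self, node_names: List[str]) -> None:
        self.node_names = node_names
        # Storage nodes: node name -> {chunk id -> chunk bytes}
        self.nodes: Dict[str, Dict[str, bytes]] = {n: {} for n in node_names}
        # Manager metadata: file name -> ordered list of (chunk id, node name)
        self.block_map: Dict[str, List[Tuple[str, str]]] = {}

    def write(self, name: str, data: bytes) -> None:
        entries = []
        for offset in range(0, len(data), CHUNK_SIZE):
            index = offset // CHUNK_SIZE
            # Round-robin placement across nodes (an assumption for the demo).
            node = self.node_names[index % len(self.node_names)]
            chunk_id = f"{name}#{index}"
            self.nodes[node][chunk_id] = data[offset:offset + CHUNK_SIZE]
            entries.append((chunk_id, node))
        self.block_map[name] = entries

    def read(self, name: str) -> bytes:
        # Follow the block-map in order and fetch each chunk from its node.
        return b"".join(self.nodes[node][cid]
                        for cid, node in self.block_map[name])


store = TinyStore(["n0", "n1"])
store.write("result.dat", b"hello world")
print(store.block_map["result.dat"])
print(store.read("result.dat"))
```

Because the block-map records where every chunk lives, the same metadata can in principle be used to expose data location to a scheduler.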


Contribution - Intermediate Storage System

Support a set of POSIX APIs (random read and write, delete, close)

Garbage-collection

Replication (eager and lazy)

Client-side caching

MosaStore Storage System


Viability study – Changes in MosaStore

• Optimized data placement for the pipeline pattern

Priority to local writes and reads

• Optimized data placement for the reduce pattern

Collocating files in a single benefactor

• Replication mechanism optimized for the broadcast pattern

Parallel replication

• Data block placement for the scatter and gather patterns
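
The pattern-specific placement choices listed above can be sketched as one decision function. This is a hedged illustration: `place_file`, its parameters, the default of two replicas, and the fallback of striping across all nodes are assumptions, and block-level placement for scatter and gather is left out.

```python
from typing import List, Optional


def place_file(pattern: Optional[str],
               writer_node: str,
               nodes: List[str],
               collocation_node: str,
               replicas: int = 2) -> List[str]:
    """Return the nodes that should hold a newly written file."""
    if pattern == "pipeline":
        # Priority to local writes: the next stage can read locally.
        return [writer_node]
    if pattern == "reduce":
        # Collocate all inputs of the reduce task on a single node.
        return [collocation_node]
    if pattern == "broadcast":
        # Replicate so that many readers do not hit one copy.
        start = nodes.index(writer_node)
        count = min(replicas, len(nodes))
        return [nodes[(start + i) % len(nodes)] for i in range(count)]
    # No hint: assumed default of spreading data across all nodes.
    return list(nodes)


nodes = ["n0", "n1", "n2", "n3"]
print(place_file("pipeline", "n2", nodes, "n0"))      # ['n2']
print(place_file("reduce", "n2", nodes, "n0"))        # ['n0']
print(place_file("broadcast", "n3", nodes, "n0", 3))  # ['n3', 'n0', 'n1']
```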


Evaluation - Synthetic Benchmark on Blue Gene/P

100% performance gain in the application runtime

Pipeline benchmark Runtime at different scale


Synthetic Benchmarks - Reduce

Optimization: Collocation and location-aware scheduling

Average runtime for medium workload

2x improvement in the runtime


Synthetic benchmarks – Small workload

Reduce benchmark Broadcast benchmark


Evaluation – ModFTDock

ModFTDock workflow

Total application time on three different systems

20% improvement in the runtime


Evaluation – Montage per stage time

Total application time on five different systems