Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen. Oct. 15, 2013 @ U-REaSON Seminar. Data-Intensive Scalable Computing Laboratory (DISCL). Locality-driven High-level I/O Aggregation for Processing Scientific Datasets


Page 1

Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen

Oct. 15, 2013 @ U-REaSON Seminar

Data-Intensive Scalable Computing Laboratory (DISCL)

Locality-driven High-level I/O Aggregation for Processing Scientific Datasets


Page 2

Introduction

Scientific simulations now generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.

Example: VPIC (Vector Particle-in-Cell), plasma physics: 26 bytes per particle, 30 TB

Accessing and analyzing this data exposes poor I/O performance, caused by the mismatch between the logical access pattern and the physical data layout.

Page 3

Introduction

Scientific datasets and scientific I/O libraries: PnetCDF, HDF5, ADIOS

I/O stack (figure, top to bottom): PnetCDF, MPI-IO, parallel file systems

Scientific I/O libraries allow users to specify array-based logical input.

Problem: the logical view does not match the physical layout (logical-physical mismatch).
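To make the logical-physical mismatch concrete, the sketch below maps an array-style request (a start and a length per dimension) onto the linear element offsets it touches. It assumes a plain row-major (C-order) layout with no headers or chunking, which is a simplification; `subarray_runs` is a hypothetical helper and not part of PnetCDF, HDF5, or ADIOS.

```python
from itertools import product

def subarray_runs(dims, start, length):
    """Return (element_offset, run_length) pairs for a sub-array of a
    row-major (C-order) array with shape `dims`. Hypothetical helper."""
    # Stride of each dimension in elements, for row-major order.
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    runs = []
    # Only the last dimension is contiguous; every combination of the
    # leading indices starts a new contiguous run.
    leading = [range(s, s + n) for s, n in zip(start[:-1], length[:-1])]
    for idx in product(*leading):
        offset = sum(i * st for i, st in zip(idx, strides)) + start[-1]
        runs.append((offset, length[-1]))
    return runs

# One logical 2x3 block of a 4x6 array becomes two separate runs.
print(subarray_runs(dims=(4, 6), start=(1, 2), length=(2, 3)))
# -> [(8, 3), (14, 3)]
```

A single logical request therefore turns into several non-contiguous pieces in the file, and the number of pieces grows with the product of the leading dimension lengths.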

Page 4

Motivation

I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):

Independent I/O: process collaboration: No; call collaboration: No

Collective I/O: process collaboration: Yes; call collaboration: No

Nonblocking I/O: process collaboration: Yes; call collaboration: Yes

Page 5

Motivation

Contention on storage servers when aggregation is not aware of locality

[Figure: two-phase collective I/O for calls Call0, Call1, ..., Calli, with aggregators ag00 to ag03, ag10 to ag13, ..., agi0 to agi3]

Page 6

Performance with Overlapping Calls

Conclusion: overlapping among calls should be removed

Page 7

Idea: High-level I/O Aggregation

Logical input decomposition (figure):

Call0: start{0,0,0} length{100,200,200}, decomposed into start{0,0,0} length{100,200,100} and start{0,0,100} length{100,200,100}

Call1: start{10,20,100} length{10,300,400}, decomposed into start{10,20,100} length{10,150,400} and start{10,170,100} length{10,150,400}

Physical layout: sub0 and sub2 are grouped together (sub0sub2); sub1 and sub3 are grouped together (sub1sub3)
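The numbers on this slide can be reproduced with a small sketch. It assumes each call is split into equal halves along a single axis, which matches the values shown but is only an assumption about how the pieces were chosen; in the talk the decomposition is driven by the physical layout. `split_request` is a hypothetical helper name.

```python
def split_request(start, length, axis, parts):
    """Split a (start, length) sub-array request into `parts` equal pieces
    along `axis`. Hypothetical helper; assumes an even split is possible."""
    if length[axis] % parts != 0:
        raise ValueError("length along axis must be divisible by parts")
    step = length[axis] // parts
    pieces = []
    for k in range(parts):
        s = list(start)
        n = list(length)
        s[axis] = start[axis] + k * step  # shift the start of piece k
        n[axis] = step                    # each piece covers one step
        pieces.append((tuple(s), tuple(n)))
    return pieces

# Call0 is split along the last axis, Call1 along the middle axis.
call0 = split_request((0, 0, 0), (100, 200, 200), axis=2, parts=2)
call1 = split_request((10, 20, 100), (10, 300, 400), axis=1, parts=2)
print(call0)
print(call1)
```

The four resulting pieces have the same start and length values as the four sub-arrays listed on the slide.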

Page 8

Idea: High-level I/O Aggregation

Basic idea:

Figure out the overlapping among requests

Eliminate the overlapping before doing I/O

Challenges:

How to decompose the requests

How to aggregate the sub-arrays at a high level

Page 9

HiLa: High-Level I/O Aggregation

Way to figure out the physical layout:

Sub-correlation function

Sub-correlation set

Lustre striping: stripe size t; stripe count l

Dataset: dimension d; subset size m
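The slide does not define the sub-correlation function, so the sketch below shows only the ingredient it would need: under plain round-robin striping, which storage servers hold a given byte range. Here `stripe_size` and `stripe_count` stand for t and l; `servers_touched` is a hypothetical name, and a starting server index of 0 is assumed.

```python
def servers_touched(offset, nbytes, stripe_size, stripe_count):
    """Return the set of server indices that hold bytes
    [offset, offset + nbytes) under round-robin striping.
    Hypothetical helper, not the sub-correlation function from the talk."""
    if nbytes <= 0:
        return set()
    first = offset // stripe_size                # first stripe touched
    last = (offset + nbytes - 1) // stripe_size  # last stripe touched
    # After stripe_count consecutive stripes every server has been visited.
    n = min(last - first + 1, stripe_count)
    return {(first + k) % stripe_count for k in range(n)}

# With 4-byte stripes over 3 servers, bytes 10..13 span stripes 2 and 3.
print(servers_touched(10, 4, stripe_size=4, stripe_count=3))
```

Two sub-arrays whose server sets intersect compete for the same servers, which is the kind of relationship a locality-aware grouping would want to know before issuing I/O.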

Page 10

HiLa Algorithm: Prior Step

Prior step: calculate the sub-correlation set (a one-time analysis)

Page 11

Hila Algorithm: Decomposition

Main Steps: Request Decomposition and Aggregation
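The two main steps can be illustrated at the byte-range level. This is a minimal sketch under stated assumptions, not the HiLa algorithm itself: requests are taken as (offset, nbytes) pairs, striping is round-robin starting at server 0, and `decompose` and `aggregate` are hypothetical names.

```python
from collections import defaultdict

def decompose(offset, nbytes, stripe_size, stripe_count):
    """Cut a byte range at stripe boundaries.
    Yields (server, piece_offset, piece_nbytes)."""
    end = offset + nbytes
    while offset < end:
        stripe = offset // stripe_size
        chunk_end = min(end, (stripe + 1) * stripe_size)
        yield stripe % stripe_count, offset, chunk_end - offset
        offset = chunk_end

def aggregate(requests, stripe_size, stripe_count):
    """Group the pieces of all requests by server and merge ranges that
    overlap or touch. Returns {server: [(start, end), ...]} with
    half-open byte ranges."""
    per_server = defaultdict(list)
    for offset, nbytes in requests:
        for server, off, n in decompose(offset, nbytes, stripe_size, stripe_count):
            per_server[server].append((off, off + n))
    merged = {}
    for server, ranges in per_server.items():
        ranges.sort()
        out = [list(ranges[0])]
        for lo, hi in ranges[1:]:
            if lo <= out[-1][1]:          # overlaps or touches previous range
                out[-1][1] = max(out[-1][1], hi)
            else:
                out.append([lo, hi])
        merged[server] = [tuple(r) for r in out]
    return merged

# Two calls whose byte ranges overlap in bytes 6..9.
plan = aggregate([(0, 10), (6, 10)], stripe_size=4, stripe_count=2)
print(plan)
# -> {0: [(0, 4), (8, 12)], 1: [(4, 8), (12, 16)]}
```

After aggregation each byte appears in exactly one range, and every range belongs to a single server, so the overlap between the two calls is read only once.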

Page 12

Improvement with Hila

Performance Improved with Hila

Page 13

Improvement with Hila

FASM Improved with Hila

Page 14

Conclusion and Future Work

Conclusion:

The mismatch between logical access and physical layout can lead to poor performance.

We propose a locality-driven high-level aggregation approach (HiLa) that helps existing I/O methods by eliminating the overlapping among sub-array requests.

Future work:

Apply HiLa to write operations.

Integrate it with file systems.

Page 15

Locality-driven High-level I/O Aggregation for Processing Scientific Datasets

Thanks

Q&A

http://discl.cs.ttu.edu