33
A Data Diffusion A Data Diffusion Approach to Large Scale Approach to Large Scale Scientific Exploration Scientific Exploration Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Yong Zhao: Microsoft Ian Foster: Univ. of Chicago, CS & CI, Argonne National Laboratory, MCS Alex Szalay: The Johns Hopkins Univ., Dept. of Physics and Astronomy Funded by: NSF: TeraGrid DOE: Advanced Scientific Computing Research NASA: Ames Research Center 2007 Microsoft eScience Workshop at RENCI October 21 st , 2007

A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

A Data Diffusion A Data Diffusion Approach to Large Scale Approach to Large Scale

Scientific ExplorationScientific ExplorationIoan Raicu

Distributed Systems LaboratoryComputer Science Department

University of Chicago

Joint work with: Yong Zhao: Microsoft

Ian Foster: Univ. of Chicago, CS & CI, Argonne National Laboratory, MCSAlex Szalay: The Johns Hopkins Univ., Dept. of Physics and Astronomy

Funded by: NSF: TeraGrid

DOE: Advanced Scientific Computing ResearchNASA: Ames Research Center

2007 Microsoft eScience Workshop at RENCIOctober 21st, 2007

Page 2: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 2

Data Diffusion Example:AstroPortal Stacking Service• Purpose

– On-demand “stacks” of random locations within ~10TB dataset

• Challenge– Rapid access to 10-10K

“random” files– Time-varying load

• Solution– Dynamic acquisition of

compute, storage

S4 SloanData

+

+++

+

+

=

+

Web page or Web Service

Page 3: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 3

Falkon: a Fast and Light-weight tasK executiON framework

• Goal: enable the rapid and efficient execution of many independent jobs on large compute clusters

• Combines three techniques:– multi-level scheduling techniques to enable

separate treatments of resource provisioning– a streamlined task dispatcher able to achieve

order-of-magnitude higher task dispatch rates than conventional schedulers

– performs data diffusion and uses a data-aware scheduler to leverage the co-located computational and storage resources

Page 4: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 4

Execution Model• Dispatch Policy

– first-available, max-compute-util, max-cache-hit*• Caching Policy

– random, FIFO, LRU, LFU• Replay policy• Resource Acquisition Policy

– one-at-a-time, additive, exponential, all-at-once, optimal*

• Resource Release Policy– distributed, centralized*

Page 5: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 5

Falkon 2-Tier Architecture

• Tier 1: Dispatcher– GT4 Web Service accepting

task submissions from clients and sending them to available executors

• Tier 2: Executor– Run tasks on local resources

• Provisioner– Static and dynamic resource

provisioningWS

WS

ProvisionerCompute

Resources

Executor 1

Clients

Executor n

ComputeResource m

Compute Resource 1

Dispatcher

Page 6: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 7

Dispatch Throughput

0

100

200

300

400

500

0 32 64 96 128 160 192 224 256Number of Executors

Thro

ughp

ut (t

asks

/sec

)

WS Calls (no security)Falkon (no security)Falkon (GSISecureConversation)

Page 7: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 8

Execution Efficiency

0.90.910.920.930.940.950.960.970.980.99

1

0 32 64 96 128 160 192 224 256Number of Executors

Effic

ienc

y

Sleep 64Sleep 32Sleep 16Sleep 8Sleep 4Sleep 2Sleep 1

Page 8: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 10

Resource Provisioning0. provisioner registration1. task(s) submit2. resource allocation to GRAM3. resource allocation to LRM4. executor registration5. notification for work6. pick up task(s)7. deliver task(s) results8. notification for task result9. pick up task(s) results

Page 9: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 13

Dynamic Resource Provisioning

GRAM+PBS Falkon-15 Falkon-60 Falkon-120 Falkon-180 Falkon-∞

Ideal (32 nodes)

Queue Time (sec) 611.1 87.3 83.9 74.7 44.4 43.5 42.2Execution Time (sec) 56.5 17.9 17.9 17.9 17.9 17.9 17.8Execution

Time % 8.5% 17.0% 17.6% 19.3% 28.7% 29.2% 29.7%

• End-to-end execution time: – 1260 sec in ideal case– 4904 sec 1276 sec

• Average task queue time: – 42.2 sec in ideal case– 611 sec 43.5 sec

• Trade-off:– Resource Utilization for

Execution Efficiency

GRAM+PBS Falkon-15 Falkon-60 Falkon-120 Falkon-180 Falkon-∞

Ideal (32 nodes)

Time to complete

(sec) 4904 1754 1680 1507 1484 1276 1260Resouce

Utilization 30% 89% 75% 65% 59% 44% 100%Execution Efficiency 26% 72% 75% 84% 85% 99% 100%Resource

Allocations 1000 11 9 7 6 0 0

Page 10: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 14

Data Diffusion• Data caching at executor

local disk– RANDOM– FIFO– LRU– LFU

• Avoid the need for a shared file system

• Theoretical linear scalability with compute resources

0100200300400500600700800900

10001100

0.000

001

0.000

010.0

001

0.001 0.0

1 0.1 1 10 100

1000

File Size (MB)

Thro

ughp

ut (M

b/s)

Local ReadLocal Read+WriteGPFS ReadGPFS Read+Write

Page 11: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 15

Shared File SystemPerformance

0

500

1000

1500

2000

2500

3000

3500

4000

1E-06

0.001 0.0

1 0.1 1 10 100

1000

File Size (MB)

Rea

d Th

roug

hput

(Mb/

s)

64 Nodes32 Nodes16 Nodes8 Nodes4 Nodes2 Nodes1 Node

0

500

1000

1500

2000

2500

3000

3500

4000

1E-06

0.001 0.0

1 0.1 1 10 100

1000

File Size (MB)

Rea

d+W

rite

Thro

ughp

ut (M

b/s)

64 Nodes32 Nodes16 Nodes8 Nodes4 Nodes2 Nodes1 Node

Page 12: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 16

Local Disk Theoretical Performance

0

10000

20000

30000

40000

50000

60000

70000

1E-06

0.001 0.0

1 0.1 1 10 100

1000

File Size (MB)

Rea

d Th

roug

hput

(Mb/

s)

64 Nodes32 Nodes16 Nodes8 Nodes4 Nodes2 Nodes1 Node

0

10000

20000

30000

40000

50000

60000

70000

1E-06

0.001 0.0

1 0.1 1 10 100

1000

File Size (MB)

Rea

d+W

rite

Thro

ughp

ut (M

b/s)

64 Nodes32 Nodes16 Nodes8 Nodes4 Nodes2 Nodes1 Node

Page 13: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 19

Data Diffusion:Micro-Benchmarks

1. Model (local disk): local disk performance model2. Model (shared file system): shared file system (GPFS) performance model3. Falkon (next-available policy): load balancing across available executors, operates

on shared file system 4. Falkon (next-available policy) + Wrapper: same as (3), but tasks execute through

a wrapper that creates a temp scratch directory on the shared file system, makes symbolic links, executes task, and removes the temp scratch directory and symbolic links

5. Falkon (first-available policy – 0% locality): load balancing across available executors with data caching; 0% data locality with data read from the shared file system

6. Falkon (first-available policy – 100% locality): same as (5), but with the workload from (5) repeating four times; the caches are first populated (not as part of the timed experiment)

7. Falkon (max-compute-util policy – 0% locality): same as (5), but uses the data-aware scheduler to send tasks to the executors that have the most data cached; the workload has 0% data locality

8. Falkon (max-compute-util policy – 100% locality): same as (7), but with the workload repeated four times

Page 14: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 20

Micro-Benchmarks Results:1-64 Nodes

0

10000

20000

30000

40000

50000

60000

70000

1 2 4 8 16 32 64Number of Nodes

Rea

d Th

roug

hput

(Mb/

s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

0

10000

20000

30000

40000

50000

60000

70000

1 2 4 8 16 32 64Number of Nodes

Rea

d+W

rite

Thro

ughp

ut (M

b/s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

Page 15: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 21

Micro-Benchmarks Results:1 Node

0

200

400

600

800

1000

1200

1400

1B 1KB 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Rea

d Th

roug

hput

(Mb/

s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

0

200

400

600

800

1000

1200

1400

1B 1KB 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Rea

d+W

rite

Thro

ughp

ut (M

b/s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

Page 16: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 22

Micro-Benchmarks Results:8 Nodes

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1B 1KB 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Rea

d Th

roug

hput

(Mb/

s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1B 1KB 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Rea

d+W

rite

Thro

ughp

ut (M

b/s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

Page 17: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 23

Micro-Benchmarks Results:64 Nodes

0

10000

20000

30000

40000

50000

60000

70000

1B 1KB 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Rea

d Th

roug

hput

(Mb/

s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

0

10000

20000

30000

40000

50000

60000

70000

1B 1KB 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Rea

d+W

rite

Thro

ughp

ut (M

b/s)

1. Model (local disk)2. Model (shared file system)3. Falkon (next-available policy)4. Falkon (next-available policy) + Wrapper5. Falkon (first-available policy – 0% locality)6. Falkon (first-available policy – 100% locality)7. Falkon (max-compute-util policy – 0% locality)8. Falkon (max-compute-util policy – 100% locality)

Page 18: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 24

Task Dispatch Throughput:64 Nodes, 1 Byte files on GPFS

487 485

21

378 381 379

459

487

438

20

375 373 377

457

0

50

100

150

200

250

300

350

400

450

500

No Data Access 3. Falkon (next-available policy)

4. Falkon (next-available policy) +

Wrapper

5. Falkon (first-available policy –

0% locality)

6. Falkon (first-available policy –

100% locality)

7. Falkon (max-compute-util policy

– 0% locality)

8. Falkon (max-compute-util policy

– 100% locality)

Thro

ughp

ut (t

asks

/sec

)

ReadRead+Write

Page 19: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 25

Dispatch Policies100%100%100% 100%100%100%

63%

99%100%

31%

99%100%

16%

99%100%

8%

97%100%

4%

97%100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Cac

he H

it R

ate

(loca

l dis

k)

1 2 4 8 16 32 64Number of Nodes

first-availablemax-compute-utilmax-cache-hit

Page 20: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 29

Applications via Swift

13

Virtual Node(s)

SwiftScript

Abstractcomputation

Virtual DataCatalog

SwiftScriptCompiler

Specification Execution

Virtual Node(s)

Provenancedata

ProvenancedataProvenance

collector

launcher

launcher

file1

file2

file3

AppF1

AppF2

Scheduling

Execution Engine(Karajan w/

Swift Runtime)

Swift runtimecallouts

CC CC

Status reporting

Swift Architecture

Provisioning

FalkonResource

Provisioner

AmazonEC2

Application #Tasks/workflow #StagesATLAS: High Energy

Physics Event Simulation 500K 1

fMRI DBIC: AIRSN Image Processing 100s 12

FOAM: Ocean/Atmosphere Model 2000 3GADU: Genomics 40K 4

HNL: fMRI Aphasia Study 500 4NVO/NASA: Photorealistic

Montage/Morphology 1000s 16

QuarkNet/I2U2: Physics Science Education 10s 3 ~ 6

RadCAD: Radiology Classifier Training 1000s 5

SIDGrid: EEG Wavelet Processing, Gaze Analysis 100s 20

SDSS: Coadd, Cluster Search 40K, 500K 2, 8

SDSS: Stacking, AstroPortal 10Ks ~ 100Ks 2 ~ 4MolDyn: Molecular Dynamics 1Ks ~ 20Ks 8

Page 21: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 30

Applications

• Medicine: – fMRI (Falkon vs. GRAM)

• Astronomy: – Montage (Falkon vs. GRAM vs. MPI)– Stacking (Falkon with Data Diffusion vs.

Falkon using shared file system)

Page 22: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 31

Applications: fMRI

1239

2510

3683

4808

456866 992 1123

120 327 546 678

0

1000

2000

3000

4000

5000

6000

120 240 360 480Input Data Size (Volumes)

Tim

e (s

)

GRAMGRAM/ClusteringFalkon

• GRAM vs. Falkon: 85%~90% lower run time• GRAM/Clustering vs. Falkon: 40%~74% lower run time

Page 23: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 32

Applications: Montage

0

500

1000

1500

2000

2500

3000

3500

mProjec

t

mDiff/Fit

mBackg

round

mAdd(su

b)

mAdd total

Components

Tim

e (s

)

GRAM/ClusteringMPIFalkon

• GRAM/Clustering vs. Falkon: 57% lower application run time• MPI* vs. Falkon: 4% higher application run time• * MPI should be lower bound

Page 24: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 33

Applications: Stacking

• 320M+ objects• 1.5M+ files• 3.3 TB compressed / 9TB

uncompressed

0250500750

10001250150017502000225025002750

010

0000

2000

0030

0000

4000

0050

0000

6000

0070

0000

8000

0090

0000

1E+0

61E

+06

1E+0

6Files

Obj

ect C

ount

per

File

• SDSS DR5 Dataset

Page 25: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 34

Stacking Workloads

• Uniform sample from 320M objects• ZIPF distribution of object popularity

– alpha = 0.1, 1.0, 2.0Stacking Size (# of objects)

Object Distribution

Working Set Size (#

of files)

Working Set Size (# of objects)

Locality (accesses

per file)10000 ZIPF - alpha 0.1 1915 1924 5.2210000 ZIPF - alpha 1.0 2755 2819 3.6310000 ZIPF - alpha 2.0 6110 6259 1.6410000 UNIFORM 9771 10000 1.02

Page 26: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 35

Stacking Performance Evaluation

0

50

100

150

200

250

300

350

400

450

Web Server Shared FileSystem (WAN)

Shared FileSystem (LAN)

Data Diffusion

Original Dataset Location

Tim

e (s

ec)

ZIPF - alpha 0.1ZIPF - alpha 1.0ZIPF - alpha 2.0UNIFORM

Page 27: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 36

Object Size Effect

0

50

100

150

200

250

300

350

10x10 100x100 500x500Object Size

Tim

e (s

ec)

Shared File System (LAN)Data Diffusion

Page 28: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 37

Data Diffusion Effect

0 20 40 60 80 100 120 140

Data Diffusion (UNIFORM)

Data Diffusion (ZIPF 0.1)

Data Diffusion (ZIPF 1.0)

Data Diffusion (ZIPF 2.0)

Shared File System (LAN - UNIFORM)

Shared File System (LAN - ZIPF 0.1)

Shared File System (LAN - ZIPF 1.0)

Shared File System (LAN - ZIPF 2.0)

Time (sec)

1st Time2nd Time

Page 29: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 38

10K*2 Stacking Workload

0

100

200

300

400

500

600

700

800

900

1000

0 10 20 30 40 50 60 70 80 90 100 110 120Time (sec)

Thro

ughp

ut (C

utou

ts/s

ec)

10K Objects - UNIFORM 1st Time: Instantenous Throughput10K Objects - UNIFORM 1st Time: Average Throughput10K Objects - UNIFORM 2nd Time: Instantenous Throughput10K Objects - UNIFORM 2nd Time: Average Throughput

Page 30: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 39

100K Stacking Workload

0

50

100

150

200

250

0 100 200 300 400 500 600 700Time (sec)

Thro

ughp

ut (C

utou

ts/s

ec)

100K Objects - UNIFORM: Instantenous Throughput100K Objects - UNIFORM: Average Throughput

Page 31: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 40

Future Work

• Explore data pre-fetching policies• Update Swift to use data diffusion• 3-Tier Architecture (essential on BG/P)• Extend provisioner to support the Virtual

Workspace Service (opens door to EC2)• Globus Incubator Project• Alternate languages and technologies

Page 32: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 44

More Information• Web: http://people.cs.uchicago.edu/~iraicu/research/Falkon/• Related Projects:

– Swift: http://www.ci.uchicago.edu/swift/index.php– AstroPortal: http://people.cs.uchicago.edu/~iraicu/research/AstroPortal/index.htm

• Data Diffusion Collaborators:– Ioan Raicu, Computer Science Dept., The University of Chicago– Yong Zhao, Microsoft– Ian Foster, Math and Computer Science Div., Argonne National Laboratory & Computer

Science Dept., The University of Chicago – Alex Szalay, Department of Physics and Astronomy, The Johns Hopkins University

• Other Falkon Collaborators:– Catalin Dumitrescu, Computer Science Dept., The University of Chicago – Mike Wilde, Computation Institute, University of Chicago & Argonne National Laboratory

• Funding:– NASA: Ames Research Center, Graduate Student Research Program (GSRP)– DOE: Mathematical, Information, and Computational Sciences Division subprogram of the

Office of Advanced Scientific Computing Research, Office of Science, U.S. Dept. of Energy– NSF: TeraGrid

Page 33: A Data Diffusion Approach to Large Scale Scientific Explorationiraicu/research/presentations/2007... · 2011-10-09 · A Data Diffusion Approach to Large Scale Scientific Exploration

10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 45

More Information:Related Documents

• Ioan Raicu, Yong Zhao, Catalin Dumitrescu, Ian Foster, Mike Wilde. “Falkon: a Fast and Light-weight tasK executiON framework”, to appear at IEEE/ACM SuperComputing 2007.

• Ioan Raicu, Catalin Dumitrescu, Ian Foster. Dynamic Resource Provisioning in Grid Environments, to appear TeraGrid Conference 2007.

• Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Ioan Raicu, Tiberiu Stef-Praun, Mike Wilde. “Swift: Fast, Reliable, Loosely Coupled Parallel Computation”, to appear at IEEE Workshop on Scientific Workflows 2007.

• Yong Zhao, Mihael Hategan, Ioan Raicu, Mike Wilde, Ian Foster. “Swift: a Parallel Programming Tool for Large Scale Scientific Computations”, under review at Scientific Programming Journal, Special Issue on Dynamic Computational Workflows: Discovery, Optimization, and Scheduling.

• Ioan Raicu, Ian Foster, Alex Szalay. “Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets”, poster presentation, IEEE/ACM SuperComputing 2006.

• Ioan Raicu, Ian Foster, Alex Szalay, Gabriela Turcu. “AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis”, TeraGrid Conference 2006, June 2006.

• Alex Szalay, Julian Bunn, Jim Gray, Ian Foster, Ioan Raicu. “The Importance of Data Locality in Distributed Computing Applications”, NSF Workflow Workshop 2006.