PRUNE: A Preserving Run Environment for Reproducible Scientific Computing -Peter Ivie

PRUNE: A Preserving Run Environment for …escience-2016.idies.jhu.edu/wp-content/uploads/2016/11/...PRUNE: A Preserving Run Environment for Reproducible Scientific Computing -Peter


PRUNE: A Preserving Run Environment for Reproducible Scientific Computing

-Peter Ivie

Page 2: PRUNE: A Preserving Run Environment for …escience-2016.idies.jhu.edu/wp-content/uploads/2016/11/...PRUNE: A Preserving Run Environment for Reproducible Scientific Computing -Peter

Reproducibility

•  "[An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions]" –Jon Claerbout


Verify and Extend

•  Don’t re-invent the wheel

•  Stand on the shoulders of giants


PRUNE features

•  Designed for Big Data
•  Manage storage and compute resources
•  Reproducible workflow specifications
•  Share workflow with others
•  Reshare changes back
•  User defined granularity


Accepted philosophy: Preserve Later

[Diagram: workflow cycle with the stages Design, Execute, Observe, Share/Publish, and Preserve]

•  Libraries
•  Hardware
•  Network

•  System Administrators
•  Remote Collaborators
•  Graduated Students


Proposed philosophy: Preserve First

[Diagram: the "Preserve Later" cycle (Design, Execute, Observe, Share/Publish, Preserve) beside the proposed "Preserve First" cycle (Preserve, Execute, Unpreserve, Design, Share/Publish, Observe)]


Differences

•  Git: User decides when to preserve

•  Preserve ALL specification changes

•  Git: Code Commits separate from Code Execution

•  System Manages ALL computation

•  Remove unneeded items later on
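The contrast with Git drawn in these bullets can be sketched in a few lines: rather than waiting for an explicit commit, every specification change is stored the moment it is made, and unneeded entries are dropped later. This is only an illustration; the class and method names are hypothetical and are not part of PRUNE.

```python
import hashlib
import json
import time


class PreserveFirstLog:
    """Toy store (hypothetical): every specification change is kept on arrival."""

    def __init__(self):
        self.entries = {}  # content id -> record

    def record(self, spec):
        # Canonical JSON so that equal specifications get equal ids.
        blob = json.dumps(spec, sort_keys=True).encode("utf-8")
        cid = hashlib.sha1(blob).hexdigest()
        # setdefault keeps the first timestamp if the same spec is seen again.
        self.entries.setdefault(cid, {"spec": spec, "when": time.time()})
        return cid

    def unpreserve(self, keep):
        """Drop everything not listed in `keep`; return the removed ids."""
        removed = [cid for cid in self.entries if cid not in keep]
        for cid in removed:
            del self.entries[cid]
        return removed


log = PreserveFirstLog()
v1 = log.record({"cmd": "sort input.txt > output.txt"})
v2 = log.record({"cmd": "sort -r input.txt > output.txt"})  # an abandoned edit
assert len(log.entries) == 2   # both versions were preserved automatically
log.unpreserve(keep={v1})      # cleanup happens later, by choice
assert list(log.entries) == [v1]
```

The point of the design is that forgetting to preserve is impossible, while forgetting to clean up only costs storage.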

[Diagram: the "Preserve First" cycle with the stages Preserve, Execute, Unpreserve, Design, Share/Publish, and Observe]


What to Preserve

[Diagram: stacked layers Hardware, Kernel, Operating System, Software, Command, and Data, with side labels "Environment", "Virtual Machine / Container", and "Prune Task"]

Command: 'do < in.txt in.dat > out.txt o2.txt'

environment: envi_id1
arguments: [ file_id1, file_id2 ]
parameters: [ 'in.txt', 'in.dat' ]
returns: [ 'out.txt', 'o2.txt' ]
results: [ file_id3, file_id4 ]
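One way to read the task fields above: each argument is staged in a sandbox under the matching parameter name, the command runs there, and the files named in returns are collected as results. The sketch below follows that reading. `TaskSpec` and `run_task` are hypothetical names, inputs are passed as raw bytes instead of file ids, the environment layer (virtual machine or container) is left out, and a POSIX shell with `sort` is assumed.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    cmd: str       # shell command run inside the sandbox
    args: list     # input contents as bytes (stand-ins for file ids)
    params: list   # sandbox file names the inputs are staged under
    returns: list  # sandbox file names collected after the run
    env_vars: dict = field(default_factory=dict)


def run_task(task):
    """Stage the inputs, run the command, and return the output contents."""
    if len(task.args) != len(task.params):
        raise ValueError("each argument needs exactly one parameter name")
    with tempfile.TemporaryDirectory() as sandbox:
        for content, name in zip(task.args, task.params):
            with open(os.path.join(sandbox, name), "wb") as fh:
                fh.write(content)
        # LC_ALL=C makes the ordering used by sort independent of the locale.
        env = {**os.environ, "LC_ALL": "C", **task.env_vars}
        subprocess.run(task.cmd, shell=True, cwd=sandbox, env=env, check=True)
        results = []
        for name in task.returns:
            with open(os.path.join(sandbox, name), "rb") as fh:
                results.append(fh.read())
        return results


task = TaskSpec(
    cmd="sort input.txt > output.txt",
    args=[b"pear\napple\nfig\n"],
    params=["input.txt"],
    returns=["output.txt"],
)
(sorted_bytes,) = run_task(task)
```

Because the command only ever sees the parameter names, the same task description works no matter where the input data is actually stored.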


Overview

Workflow Version #2

User space (user interface):

    F1 = file_add( filename='./observed.dat' )
    E1 = envi_add( type='EC2', image='hep.beta' )
    E2 = envi_add( type='EC2', image='hep.stable' )
    T4 = task_add( cmd='simulate > output', returns=['output'], environment=E1 )
    T6 = task_add( args=[ T4[0] ], params=['in_data'], cmd='analyze < in_data > out_data', returns=['out_data'], environment=E2 )
    T5 = task_add( args=[ F1 ], ... )  (remaining arguments the same as above)
    T7 = task_add( cmd='plot in1 in2 out1 out2', args=[ T5[0], T6[0] ], params=['in1','in2'], returns=['out1','out2'], environment=E2 )
    export( [ T7[1] ], filename='./plot.jpg' )

[Diagram: PRUNE space and compute resources. Legend: File, Environment, Task. Nodes: files F1 to F9; environments E1 and E2; tasks T1(E1), T2(E1), T3(E1), T4(E2), T5(E2), T6(E2), T7(E2); stage labels Simulate, Analyze, Plot.]
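The calls on this slide return handles, and indexing a task handle (for example T7[1]) selects one of its declared outputs. A minimal in-memory stand-in makes that convention concrete. `MiniWorkflow` and `Handle` are hypothetical; the recorder only builds the dependency graph and executes nothing, and its task numbering starts at T1 and does not match the slide's.

```python
class Handle:
    """Opaque reference to a file, an environment, or one output of a task."""

    def __init__(self, ident, producer=None):
        self.ident = ident
        self.producer = producer  # id of the task that creates it, if any

    def __repr__(self):
        return self.ident


class MiniWorkflow:
    """Hypothetical recorder: stores task descriptions, runs nothing."""

    def __init__(self):
        self._counts = {}
        self.tasks = {}

    def _new(self, prefix):
        self._counts[prefix] = self._counts.get(prefix, 0) + 1
        return f"{prefix}{self._counts[prefix]}"

    def file_add(self, filename):
        return Handle(self._new("F"))

    def envi_add(self, type, image):
        return Handle(self._new("E"))

    def task_add(self, cmd, returns, environment, args=(), params=()):
        tid = self._new("T")
        self.tasks[tid] = {"cmd": cmd, "args": list(args),
                           "params": list(params), "environment": environment}
        # One handle per declared output, so T[0], T[1], ... follow `returns`.
        return [Handle(f"{tid}[{i}]", producer=tid) for i in range(len(returns))]

    def upstream(self, handle):
        """Ids of all tasks that must run before `handle` exists."""
        seen, stack = [], [handle]
        while stack:
            h = stack.pop()
            if h.producer is not None and h.producer not in seen:
                seen.append(h.producer)
                stack.extend(self.tasks[h.producer]["args"])
        return sorted(seen)


wf = MiniWorkflow()
F1 = wf.file_add(filename="./observed.dat")
E1 = wf.envi_add(type="EC2", image="hep.beta")
E2 = wf.envi_add(type="EC2", image="hep.stable")
sim = wf.task_add(cmd="simulate > output", returns=["output"], environment=E1)
ana_obs = wf.task_add(cmd="analyze < in_data > out_data", args=[F1],
                      params=["in_data"], returns=["out_data"], environment=E2)
ana_sim = wf.task_add(cmd="analyze < in_data > out_data", args=[sim[0]],
                      params=["in_data"], returns=["out_data"], environment=E2)
plot = wf.task_add(cmd="plot in1 in2 out1 out2", args=[ana_obs[0], ana_sim[0]],
                   params=["in1", "in2"], returns=["out1", "out2"], environment=E2)
needed = wf.upstream(plot[1])  # every task behind the second plot output
```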


Sample code: Merge sort

    #!/usr/bin/env python
    from prune import client
    prune = client.Connect()  # Use SQLite3

    ###### Import sources stage ######
    E1 = prune.env_add( type='EC2', image='ami-b06a98d8' )
    D1, D2 = prune.file_add( 'nouns.txt', 'verbs.txt' )


Sample code: Merge sort

    ###### Sort stage ######
    D3, = prune.task_add( returns=['output.txt'], env=E1,
                          cmd='sort input.txt > output.txt',
                          args=[D1], params=['input.txt'] )
    D4, = prune.task_add( returns=['output.txt'], env=E1,
                          cmd='sort input.txt > output.txt',
                          args=[D2], params=['input.txt'] )

    ###### Merge stage ######
    D5, = prune.task_add( returns=['merged_out.txt'], env=E1,
                          cmd='sort -m input*.txt > merged_out.txt',
                          args=[D3,D4], params=['input1.txt','input2.txt'] )
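For readers who want to check what the sort and merge stages should produce, the same computation can be written in plain Python without PRUNE. This is a reference sketch only: the sample words are made up, and Python compares strings by code point, which can differ from the locale-dependent order used by `sort`.

```python
import heapq


def sort_lines(text):
    """Counterpart of the sort stage: return the lines of `text` in order."""
    return sorted(text.splitlines())


def merge_sorted(*runs):
    """Counterpart of `sort -m`: combine runs that are already sorted."""
    return list(heapq.merge(*runs))


# Made-up stand-ins for nouns.txt and verbs.txt.
nouns = "time\nperson\nyear\nway\n"
verbs = "be\nhave\ndo\nsay\n"

merged = merge_sorted(sort_lines(nouns), sort_lines(verbs))
```

The merge step never re-sorts; it relies on both inputs being sorted already, which is why the two sort tasks must finish before the merge task can start.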


Sample code: Merge sort

    ###### Execute the workflow ######
    prune.execute( worker_type='local', cores=8 )
    #prune.execute( worker_type='wq', name='myapp' )

    ###### Export ######
    prune.export( D5, 'merged.txt' )  # Final data
    prune.export( D5, 'wf.prune', lineage=2 )


Sharable workflow description file

{"body": {"args": ["f908ff689b9e57f0055875d927d191ccd2d6deef:0", "319418e43783a78e3cb7e219f9a1211cba4b3b31:0"], "cmd": "sort -m input*.txt > merged_output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input1.txt", "input2.txt"], "precise": true, "returns": ["merged_output.txt"], "types": []}, "cbid": "e82855394e9dcdee03ed8a25c96c79245fd0481a", "size": 322, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.7171359}
{"body": {"args": ["29ae0a576ab660cb17bf9b14729c7b464fa98cca"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "f908ff689b9e57f0055875d927d191ccd2d6deef", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.484422}
{"body": {"args": ["48044131b31906e6c917d857ddd1539278c455cf"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "319418e43783a78e3cb7e219f9a1211cba4b3b31", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.6183109}
{"cbid": "29ae0a576ab660cb17bf9b14729c7b464fa98cca", "size": 144, "type": "file", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.2482941}
time person year Way …
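The exported records carry enough information to rebuild the dependency graph: an argument of the form "<cbid>:<n>" appears to name output n of the call with that cbid, while a bare id names a stored file. The sketch below parses such records and derives an execution order. It is an interpretation of the format shown on the slide, not PRUNE's own loader; the function names are hypothetical and the records are trimmed to the fields the sketch uses.

```python
import json

# Three call records and one file record, trimmed from the slide.
EXPORT = """\
{"body": {"args": ["f908ff689b9e57f0055875d927d191ccd2d6deef:0", "319418e43783a78e3cb7e219f9a1211cba4b3b31:0"], "cmd": "sort -m input*.txt > merged_output.txt"}, "cbid": "e82855394e9dcdee03ed8a25c96c79245fd0481a", "type": "call"}
{"body": {"args": ["29ae0a576ab660cb17bf9b14729c7b464fa98cca"], "cmd": "sort input.txt > output.txt"}, "cbid": "f908ff689b9e57f0055875d927d191ccd2d6deef", "type": "call"}
{"body": {"args": ["48044131b31906e6c917d857ddd1539278c455cf"], "cmd": "sort input.txt > output.txt"}, "cbid": "319418e43783a78e3cb7e219f9a1211cba4b3b31", "type": "call"}
{"cbid": "29ae0a576ab660cb17bf9b14729c7b464fa98cca", "type": "file"}
"""


def load_records(text):
    """Parse one JSON object per line and key it by its cbid."""
    records = {}
    for line in text.splitlines():
        if line.strip():
            rec = json.loads(line)
            records[rec["cbid"]] = rec
    return records


def producer(arg):
    """Strip the ':<n>' output index, leaving the id of the producing record."""
    return arg.split(":", 1)[0]


def run_order(records):
    """List call ids so that every call comes after the calls it depends on."""
    order, done = [], set()

    def visit(cbid):
        rec = records.get(cbid)
        # Files and ids missing from the export are inputs, not work to do.
        if cbid in done or rec is None or rec["type"] != "call":
            return
        done.add(cbid)
        for arg in rec["body"]["args"]:
            visit(producer(arg))
        order.append(cbid)

    for cbid in records:
        visit(cbid)
    return order


order = run_order(load_records(EXPORT))
```

The "env" value in these records equals the SHA-1 digest of empty input, which hints that the identifiers are content hashes, although the slides do not say so.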


Workflow evolution (US Censuses)

Stage 1: Uncompress (year+fragment)
Stage 2: Normalize (year+fragment)
Stage 3: Split by key (year+fragment+key)
Stage 4: Join fragments (year+key)
Stage 5: Pair by year (year1+year2+key)
Stage 6: Group matches (year1+year2+key)
Stage 7: Filter 1-1 matches (year1+year2+key)

[Figure: data flow through the seven stages for census years 1850 through 1940; the later stages pair consecutive censuses (1850 with 1860, 1860 with 1870, ..., 1930 with 1940)]
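The later stages of this workflow (split by key, pair by year, filter 1-1 matches) can be illustrated on a toy scale. The sketch assumes that "1-1 matches" means keys with exactly one record in each of the two census years; the key function, field names, and sample records are all made up for illustration and do not come from the talk.

```python
from collections import defaultdict


def split_by_key(records, key_fn):
    """Stages 3-4 in miniature: bucket one census year's records by key."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[key_fn(rec)].append(rec)
    return buckets


def one_to_one_matches(year1, year2, key_fn):
    """Stages 5-7 in miniature: pair two years, keep only unambiguous keys."""
    left = split_by_key(year1, key_fn)
    right = split_by_key(year2, key_fn)
    matches = []
    for key in sorted(left.keys() & right.keys()):
        if len(left[key]) == 1 and len(right[key]) == 1:
            matches.append((left[key][0], right[key][0]))
    return matches


def name_key(rec):
    # Hypothetical key function: surname plus first initial, lower-cased.
    return (rec["last"].lower(), rec["first"][:1].lower())


# Made-up sample records.
census_1850 = [
    {"first": "Ann", "last": "Lee"},
    {"first": "John", "last": "Smith"},
    {"first": "James", "last": "Smith"},  # shares a key with John Smith
]
census_1860 = [
    {"first": "Anna", "last": "Lee"},
    {"first": "J.", "last": "Smith"},
]

pairs = one_to_one_matches(census_1850, census_1860, name_key)
```

Swapping in a different `name_key` changes every bucket, which is why a new key function forces everything from the split stage onward to be recomputed, while a change to the final filter touches only the last step.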


Redefine filter criteria

[Figure: the seven-stage census workflow (Stage 1 Uncompress through Stage 7 Filter 1-1 matches) for census years 1850 to 1940, extended with the branch added by this change]


Redefine match criteria

[Figure: the seven-stage census workflow (Stage 1 Uncompress through Stage 7 Filter 1-1 matches) for census years 1850 to 1940, extended with the branch added by this change]


New key function chosen

[Figure: the seven-stage census workflow (Stage 1 Uncompress through Stage 7 Filter 1-1 matches) for census years 1850 to 1940, extended with the branch added by this change]


Re-normalize

[Figure: the seven-stage census workflow (Stage 1 Uncompress through Stage 7 Filter 1-1 matches) for census years 1850 to 1940, extended with the branch added by this change]


New input data

[Figure: the seven-stage census workflow (Stage 1 Uncompress through Stage 7 Filter 1-1 matches) for census years 1850 to 1940, extended with the branch added by this change]


Derivation History = Cachable Results
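The idea on this slide can be sketched as a cache keyed by how a result was derived: if the command, the identities of its inputs, and the environment are unchanged, the stored result is reused. `DerivationCache` is a hypothetical illustration, not PRUNE's implementation, and the lambdas stand in for real task execution.

```python
import hashlib
import json


class DerivationCache:
    """Toy result cache (hypothetical) keyed by a task's derivation."""

    def __init__(self):
        self.results = {}
        self.executions = 0  # how many times real work was done

    @staticmethod
    def derivation_id(cmd, arg_ids, env_id):
        spec = json.dumps({"cmd": cmd, "args": arg_ids, "env": env_id},
                          sort_keys=True)
        return hashlib.sha1(spec.encode("utf-8")).hexdigest()

    def run(self, cmd, arg_ids, env_id, compute):
        key = self.derivation_id(cmd, arg_ids, env_id)
        if key not in self.results:
            self.executions += 1
            self.results[key] = compute()
        return key, self.results[key]


cache = DerivationCache()


def workflow(merge_cmd):
    sort_cmd = "sort input.txt > output.txt"
    a, _ = cache.run(sort_cmd, ["nouns"], "E1", lambda: ["person", "time"])
    b, _ = cache.run(sort_cmd, ["verbs"], "E1", lambda: ["be", "do"])
    # The merge is keyed by the derivation ids of its inputs, not their bytes.
    return cache.run(merge_cmd, [a, b], "E1",
                     lambda: ["be", "do", "person", "time"])


workflow("sort -m input*.txt > merged_out.txt")     # run 1: three executions
workflow("sort -m input*.txt > merged_out.txt")     # run 2: nothing re-executes
workflow("sort -m -u input*.txt > merged_out.txt")  # run 3: only the merge runs
```

Because each key depends on the keys of upstream tasks, a change early in the workflow invalidates everything downstream, while a change near the end reuses all earlier results.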


Execution time cut in half for run #2


Quotas


Scalability

•  ~12,000 parallel cores
•  ~3 million tasks

•  Overhead ~1% above native wall clock


Sharing workflow between users

{"body": {"args": ["f908ff689b9e57f0055875d927d191ccd2d6deef:0", "319418e43783a78e3cb7e219f9a1211cba4b3b31:0"], "cmd": "sort -m input*.txt > merged_output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input1.txt", "input2.txt"], "precise": true, "returns": ["merged_output.txt"], "types": []}, "cbid": "e82855394e9dcdee03ed8a25c96c79245fd0481a", "size": 322, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.7171359}
{"body": {"args": ["29ae0a576ab660cb17bf9b14729c7b464fa98cca"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "f908ff689b9e57f0055875d927d191ccd2d6deef", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.484422}
{"body": {"args": ["48044131b31906e6c917d857ddd1539278c455cf"], "cmd": "sort input.txt > output.txt", "env": "da39a3ee5e6b4b0d3255bfef95601890afd80709", "env_vars": {}, "params": ["input.txt"], "precise": true, "returns": ["output.txt"], "types": []}, "cbid": "319418e43783a78e3cb7e219f9a1211cba4b3b31", "size": 241, "type": "call", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.6183109}
{"cbid": "29ae0a576ab660cb17bf9b14729c7b464fa98cca", "size": 144, "type": "file", "wfid": "a0230143-9b3a-4766-809d-5b7172e9b967", "when": 1476886144.2482941}
time person year Way …


Thank You!

For more information: [email protected]

•  http://ccl.cse.nd.edu/research/papers/
•  http://ccl.cse.nd.edu/software/prune/prune.html
•  Sample workflows
    –  Merge sort
    –  Pairwise comparisons (US Censuses)
    –  High-energy Physics