18
LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde [email protected]

LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Embed Size (px)

DESCRIPTION

Virtual Data Workflow Abstracts Grid Details

Citation preview

Page 1: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

LQCD Workflow DiscussionFermilab, 18 Dec 2006

Mike [email protected]

Page 2: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Virtual Data Origins:The Grid Physics Network

Enhance scientific productivity through…• Discovery, application and management of

data and processes at all scales• Using a worldwide data grid as a scientific

workstationThe key to this approach is Virtual Data –

creating and managing datasets through workflow “recipes” and provenance recording.

Page 3: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Virtual Data WorkflowAbstracts Grid Details

Page 4: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Complex Workflows Run on Grid

Galaxy Image Montage Workflow~1200 node workflow, 7 levels

Mosaic of M42 created onthe Teragrid using PegasusCourtesy NASA and NVO= Data

Transfer= Compute Job

Page 5: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Neuroscience Example• Create a library of typed procedures for popular fMRI

tools, that are pre-installed across Grid sites• Process structured data collections with simple

procedures – automated iteration• Establish a data cataloging scheme with metadata

annotation and search• Provide easy-to-use workflow execution portals –

command line environments that make clusters and Grids appear like workstations

• Work in an environment that is a fully-tracked “clean room” with accurate provenance and powerful search

Page 6: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Example Science Driver:Functional MRI Analysis

3a.h

align_warp/1

3a.i

3a.s.h

softmean/9

3a.s.i

3a.w

reslice/2

4a.h

align_warp/3

4a.i

4a.s.h 4a.s.i

4a.w

reslice/4

5a.h

align_warp/5

5a.i

5a.s.h 5a.s.i

5a.w

reslice/6

6a.h

align_warp/7

6a.i

6a.s.h 6a.s.i

6a.w

reslice/8

ref.h ref.i

atlas.h atlas.i

slicer/10 slicer/12 slicer/14

atlas_x.jpg

atlas_x.ppm

convert/11

atlas_y.jpg

atlas_y.ppm

convert/13

atlas_z.jpg

atlas_z.ppm

convert/15

Workflow courtesy James Dobson, Dartmouth Brain Imaging Center

Page 7: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

fMRI Dataset processingFOREACH BOLDSEQDV reorient (# Process Blood O2 Level Dependent Sequence input = [ @{in: "$BOLDSEQ.img"},

@{in: "$BOLDSEQ.hdr"} ], output = [@{out: "$CWD/FUNCTIONAL/r$BOLDSEQ.img"} @{out: "$CWD/FUNCTIONAL/r$BOLDSEQ.hdr"}], direction = "y", );END

DV softmean ( input = [ FOREACH BOLDSEQ @{in:"$CWD/FUNCTIONAL/har$BOLDSEQ.img"} END ], mean = [ @{out:"$CWD/FUNCTIONAL/mean"} ]);

Page 8: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Spatial normalization of functional runreorientRun

reorientRun

reslice_warpRun

random_select

alignlinearRun

resliceRun

softmean

alignlinear

combinewarp

strictmean

gsmoothRun

binarize

reorient/01

reorient/02

reslice_warp/22

alignlinear/03 alignlinear/07alignlinear/11

reorient/05

reorient/06

reslice_warp/23

reorient/09

reorient/10

reslice_warp/24

reorient/25

reorient/51

reslice_warp/26

reorient/27

reorient/52

reslice_warp/28

reorient/29

reorient/53

reslice_warp/30

reorient/31

reorient/54

reslice_warp/32

reorient/33

reorient/55

reslice_warp/34

reorient/35

reorient/56

reslice_warp/36

reorient/37

reorient/57

reslice_warp/38

reslice/04 reslice/08reslice/12

gsmooth/41

strictmean/39

gsmooth/42gsmooth/43gsmooth/44 gsmooth/45 gsmooth/46 gsmooth/47 gsmooth/48 gsmooth/49 gsmooth/50

softmean/13

alignlinear/17

combinewarp/21

binarize/40

reorient

reorient

alignlinear

reslice

softmean

alignlinear

combine_warp

reslice_warp

strictmean

binarize

gsmooth

Dataset-level workflow Expanded (10 volume) workflow

Page 9: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

VDL2 Workflow – Data types and “atomic” procedures

type file {} type fileNames{ file f[]; }

// Procedure to run "R" statistical package

(file t) bricRInvoke (file script, int iteration, file dataAll, file dataPerm){ app { bricRInvoke @filename(script) iteration @filename(dataAll) @filename(dataPerm); }}

// Procedure to run AFNI Clustering tool

(file t, file v) bricCluster (file clusterScript, int iteration, file randBrain, file brainFile, file specFile) { app { bricPerlCluster @filename(clusterScript) iteration @filename(randBrain) @filename(brainFile) @filename(specFile); }}

// Procedure to merge results based on statistical likelihoods

(file t) bricCentralize ( file fn[]) { app { bricCentralize @filenames(fn); } }

Page 10: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

VDL2 Workflow – Dataset iteration procedures

// Procedure to iterate over the data collection

(fileNames randCluster, fileNames dsetReturn) brain_cluster () {

int j[]=[1:2000];

file dataAll <fixed_mapper; file="obs.imit.all">; file dataPerm <fixed_mapper; file="perm.matrix.11">; file brainFile <fixed_mapper; file="colin_lh_mesh140_std.pial.asc">; file specFile <fixed_mapper; file="colin_lh_mesh140_std.spec">; file randScript <fixed_mapper; file="script.obs.imit.tibi">; file clusterScript <fixed_mapper; file="surfclust.tibi">;

fileNames randBrain <simple_mapper; prefix="rand.brain.set">;

foreach int i in j { randBrain.f[i] = bricRInvoke(randScript,i,dataAll,dataPerm); file rBrain=randBrain.f[i]; (randCluster.f[i], dsetReturn.f[i]) = bricCluster(clusterScript,i,rBrain, brainFile,specFile); } }

Page 11: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

VDL2 Workflow – Main Workflow Program

// Declare datasets

fileNames randBrain <simple_mapper; prefix="rand.brain.set">; fileNames randCluster <simple_mapper; prefix="Tmean.4mm.perm", suffix="_ClstTable_r4.1_a2.0.1D">; fileNames dsetReturn <simple_mapper; prefix="Tmean.4mm.perm", suffix="_Clustered_r4.1_a2.0.niml.dset">; file bricResult <fixed_mapper; file="OUT.TXT">;

// Main program – launches the whole workflow

(randCluster, dsetReturn) = brain_cluster();

bricResult= bricCentralize (randCluster.f);

Page 12: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

fMRI Virtual Data QueriesWhich transformations can process a “subject image”?• Q: xsearchvdc -q tr_meta dataType

subject_image input• A: fMRIDC.AIR::align_warp

List anonymized subject-images for young subjects: • Q: xsearchvdc -q lfn_meta dataType subject_image privacy anonymized subjectType young• A: 3472-4_anonymized.img

Show files that were derived from patient image 3472-3:• Q: xsearchvdc -q lfn_tree 3472-3_anonymized.img• A: 3472-3_anonymized.img

3472-3_anonymized.sliced.hdr atlas.hdr atlas.img … atlas_z.jpg 3472-3_anonymized.sliced.img

Page 13: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

US-ATLASData Challenge 2

Sep 10

Mid July

CPU

-day

Event generation using Virtual Data

Page 14: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

LIGO Inspiral Search Application

• Describe…

Inspiral workflow application is the work of Duncan Brown, Caltech,

Scott Koranda, UW Milwaukee, and the LSC Inspiral group

Page 15: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Virtual Data ApplicationsApplication Jobs / workflow Levels StatusATLASHEP Event Simulation

500K 1 In Use

LIGOInspiral/Pulsar

~700 2-5 Inspiral In Use

NVO/NASAMontage/Morphology

1000s 7 Both In Use

GADU/BLAST Genomics

40K 1 In Use

fMRI DBICAIRSN Image Proc

100s 12 In Devel

QuarkNetCosmicRay science

<10 3-6 In Use

SDSSCoadd image proc

40K 2 In Devel

SDSSGalaxy cluster search

500K 8 CS Research

GTOMOImage proc

1000s 1 In Devel

SCECEarthquake sim

- - In Devel

Page 16: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Dimensions of Provenance Data

Virtual DataCatalog

Virtual datarelationships

Metadataannotations

Derivationlineage

Page 17: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Virtual Data Catalog Schema

dvIDhoststart

durationexitcode

stats

Invocation

nmspacename

version

Call

passes passes

executescalls

binds references

describesuses

includes

nmspacename

version

Procedure

argnametype

direction

FormalArg

argnamevalue

ActualArg

wfidfromDV

toDV

Workflow

nmspacename

Dataset

objectpred

type/valuserdate

Annotation

1

1

1

1

1

1

*

*

*

*

*

1

11

1

1

1

1 describes

Page 18: LQCD Workflow Discussion Fermilab, 18 Dec 2006 Mike Wilde

Acknowledgements…many thanks to the entire OSG Collaboration and our application science partners in ATLAS, CMS, LIGO, SDSS, UChicago Human Neuroscience Laboratory, SCEC, and Argonne’s Computational Biology and Climate Science Groups of the Mathematics and Computer Science Division.

Workflow research and development:• ISI/USC: Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet

Singh, Mei-Hui Su, Karan Vahi, Jens Voeckler• U of Chicago: Ben Clifford, Ian Foster, Mihael Hategan, Tibi Stef-

Praun, Gregor von Laszewsk, Mike Wilde, Yong Zhao• www.griphyn.org/vds

GriPhyN and iVDGL are supported by the National Science Foundation

Many of the research efforts involved in this work are supported by the US Department of Energy, office of Science.