Kepler+PF+RWS, Podhorszki, Altintas et al. Provenance Challenge @ GGF18
RWS Provenance Experiments in Kepler (Kepler + PR + RWS)
Norbert Podhorszki
Ilkay Altintas
Bertram Ludaescher
in collaboration with
Shawn Bowers
Timothy McPhillips
Initial Provenance Framework (IPAW’06, Altintas et al.)
• Vision:
  – Modeled as a separate concern in the system
    • Optional drag and drop feature
  – Listen to execution and save information (customizable):
    • Context: who, what, where, when, and why that is associated with the run
    • Input data and its associated metadata
    • Workflow outputs and intermediate data products
    • Workflow definition (entities, parameters, connections): a specification of what exists in the workflow and can have a context of its own
    • Information about the workflow evolution (workflow trail)
Kepler System Architecture
[Figure: Kepler architecture diagram. Labeled components: Authentication; GUI (Vergil); SMS; Kepler Core Extensions; Ptolemy; Kepler GUI Extensions; Actor & Data Search; Type System Ext; Provenance Recorder; Kepler Object Manager; Documentation; Smart Re-run / Failure Recovery. Source: IPAW’06, Altintas et al.]
Kepler Provenance Recorder (IPAW’06, Altintas et al)
• Parametric and customizable
  – Different report formats
  – Variable levels of verbosity
    • all, some, medium, on error
  – Multiple cache destinations
• Saves information on
  – User name, Date, Run, etc.
Read-Write-ReSet Model (IPAW’06, McPhillips et al)
• r, r, …, r, w, w, …, w, r, …, r, w, …, w, …
  – one firing: a run of reads followed by a run of writes
• what about actor state? what about “real” dependencies?
• reset event s defines when actor “cuts off” dependencies
  – a semantic notion, known to the actor [developer] (or part of a higher-order scheme)
• r, r, …, r, w, w, …, w, [s!] r, …, r, w, …, w, …
[Figure: actor A3 with event stream r … r w … w and reset marker [s!]; annotation "PS ???"]
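The effect of the reset event can be illustrated with a small sketch. It assumes, as one reading of the RWS model and not as a description of the Kepler code, that each written token depends on every token the actor has read since its last reset event s. The function name rws_dependencies and the (event, token) tuple layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rws_dependencies(events: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each written token to the tokens read since the last reset.

    `events` is one actor's stream of (event, token) pairs, where event is
    'r' (read), 'w' (write) or 's' (state-reset; its token is ignored).
    """
    reads_since_reset: Set[str] = set()
    deps: Dict[str, Set[str]] = {}
    for event, token in events:
        if event == "r":
            reads_since_reset.add(token)
        elif event == "w":
            # Assumption: a write depends on all reads since the last reset.
            deps[token] = set(reads_since_reset)
        elif event == "s":
            # The reset "cuts off" dependencies on earlier reads.
            reads_since_reset.clear()
        else:
            raise ValueError(f"unknown RWS event: {event!r}")
    return deps


# Hypothetical token ids: two reads, one write, a reset, then read and write.
stream = [("r", "t1"), ("r", "t2"), ("w", "t3"),
          ("s", ""), ("r", "t4"), ("w", "t5")]
deps = rws_dependencies(stream)
no_reset = rws_dependencies([e for e in stream if e[0] != "s"])
```

With the reset in place, t5 depends only on t4; without it, t5 would also inherit t1 and t2 through the actor's accumulated reads.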
Goals of the PR+RWS Experiments
• Use the RWS model for Kepler workflows
  – both single-level and nested workflows (fun starts here :-)
• Extend the Kepler Provenance Recorder
  – Modify the methods of the provenance listener class
  – Classes to store execution data about the workflow
    • To generate the send-receive relations of the tokens correctly
    • To count actor firings correctly
• Disclaimer: Initially only one workflow run is targeted
  – (but approach can handle multiple actor firings due to pipeline parallelism)
  – future: queries over several runs and workflow-provenance
  – (others in Kepler already doing this; merge efforts in the future)
Implementation: Data Model
• Port-actor relationship
  – portTable(Port, Actor, type)
    • type is r as real and v as virtual (transparent)
• Token-object relationship
  – tokenTable(Token, Object)
• Object-value relationship
  – objectTable(Object, Value, Type)
    • type is currently not recorded
• RWS trace
  – traceTable(Port, Event, Token, FiringCounter)
    • event: r as read, w as write or s as state-reset
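The four relations can be pictured as simple record types. The following is a minimal sketch only: the Python class names, the field names kind and obj, the helper value_of_token, and all port, token and object ids are hypothetical; only the table names and column order come from the slide.

```python
from typing import List, NamedTuple, Optional


class PortRow(NamedTuple):       # portTable(Port, Actor, type)
    port: str
    actor: str
    kind: str                    # 'r' = real, 'v' = virtual (transparent)


class TokenRow(NamedTuple):      # tokenTable(Token, Object)
    token: str
    obj: str


class ObjectRow(NamedTuple):     # objectTable(Object, Value, Type)
    obj: str
    value: str
    kind: Optional[str] = None   # type is currently not recorded


class TraceRow(NamedTuple):      # traceTable(Port, Event, Token, FiringCounter)
    port: str
    event: str                   # 'r' = read, 'w' = write, 's' = state-reset
    token: str
    firing_counter: int


def value_of_token(token: str, token_table: List[TokenRow],
                   object_table: List[ObjectRow]) -> Optional[str]:
    """Join tokenTable and objectTable to find the value a token carries."""
    for t in token_table:
        if t.token == token:
            for o in object_table:
                if o.obj == t.obj:
                    return o.value
    return None


# Hypothetical rows: actor A writes token t1, which carries object o1 = "1".
ports = [PortRow(".WF15.A.output", ".WF15.A", "r")]
tokens = [TokenRow("t1", "o1")]
objects = [ObjectRow("o1", "1")]
trace = [TraceRow(".WF15.A.output", "w", "t1", 1)]
found = value_of_token("t1", tokens, objects)
```

Keeping tokens, objects and values in separate tables lets the trace refer to tokens by id, so the value is looked up only when a query needs it.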
Implementation: Class Hierarchy
• Extends the existing provenance execution listener with
  – Methods
  – More event listeners
  – Supporting classes
• RWSPortInfo, RWSActorInfo
  – Data structures for building and containing info about the workflow (and counters for event record)
• RWSEvent
  – Handles RWS events
Execution: Initialization
initialize() (initialization phase)
• Generate RWS portMap
  – for each port: RWSPortInfo (info locally known at a port)
  – for each port: RWSPortInfo (build connection info)
• Generate RWS actorMap
  – for each actor: RWSActorInfo
• Record static wf info
  – written to portTable
• Create new RWS event list
Execution: Event Handling and Modifications
• validate(): called before the model is executed (just before run)
  – Subscribe to token listeners (TokenSend, TokenGet)
• changeExecuted(): called when something is changed in the workflow (when the workflow is modified)
  – Re-generate RWS portMap
• Event handling methods are extended here
Execution: During the workflow run
When a token event occurs:
• TokenSendEvent()
  – New RWS event w
  – Print sent token's info (token id, object id, value)
  – For each connected transparent port: generate virtual TokenGet event
• TokenGetEvent()
  – New RWS event r
  – If it is a transparent port: generate virtual TokenSend event
• Output tables: tokenTable, traceTable, objectTable
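The event flow on this slide can be sketched as follows. This is an illustrative reconstruction, not the Kepler listener code: the class RWSRecorderSketch, its method names, and the port names ending in .out and .in are hypothetical, and the firing counter column of traceTable is omitted for brevity.

```python
from typing import Dict, List, Set, Tuple


class RWSRecorderSketch:
    """Records r/w events and adds virtual events at transparent ports."""

    def __init__(self, transparent_ports: Set[str],
                 connections: Dict[str, List[str]]) -> None:
        self.transparent_ports = transparent_ports
        self.connections = connections          # sending port -> connected ports
        self.token_table: List[Tuple[str, str]] = []       # (token, object)
        self.object_table: List[Tuple[str, str]] = []      # (object, value)
        self.trace_table: List[Tuple[str, str, str]] = []  # (port, event, token)

    def on_token_send(self, port: str, token: str, obj: str, value: str) -> None:
        self.trace_table.append((port, "w", token))   # new RWS event w
        self.token_table.append((token, obj))         # record sent token's info
        self.object_table.append((obj, value))
        for target in self.connections.get(port, []):
            if target in self.transparent_ports:
                self.on_token_get(target, token)      # virtual TokenGet

    def on_token_get(self, port: str, token: str) -> None:
        self.trace_table.append((port, "r", token))   # new RWS event r
        if port in self.transparent_ports:
            # Virtual TokenSend: the transparent port forwards the token.
            self.trace_table.append((port, "w", token))


# Hypothetical wiring: A sends into composite S, whose inner actor C reads.
rec = RWSRecorderSketch({".WF15.S.in"}, {".WF15.A.out": [".WF15.S.in"]})
rec.on_token_send(".WF15.A.out", "t1", "o1", "1")
rec.on_token_get(".WF15.S.C.in", "t1")
```

The virtual events give the composite actor's boundary port its own read and write entries in the trace, so a lineage query can pass through the composite even though the token is physically delivered straight to the inner actor.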
A Kepler Workflow Implementation
RWS TRACE
Table         # of elements   size in KB
portTable     81              4
tokenTable    30              2
objectTable   30              3
traceTable    86              6
Query 1.a
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
Answer a. list of actors that contributed to the result (21 actors). They appear in the reverse of their execution order.
?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
[ .pc.Convert_x, .pc.Slicer_x, .pc.SoftMean, .pc.Reslice3, .pc.Reslice2, .pc.Reslice4, .pc.Reslice1, .pc.AlignWarp3, .pc.RefImg, .pc.RefHdr, .pc.InputHdr3, .pc.InputImg3, .pc.AlignWarp2, .pc.InputHdr2, .pc.InputImg2, .pc.AlignWarp4, .pc.InputHdr4, .pc.InputImg4, .pc.AlignWarp1, .pc.InputImg1, .pc.InputHdr1]
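The definition of the Prolog predicate is not shown on the slides. As a rough illustration of how such an actor list could be derived from a trace, the sketch below walks backward from a token: it finds the actor that wrote it, then follows the tokens that actor read in the same firing. The function lineage_actors, the port names and the token ids are hypothetical, and matching reads to writes by firing counter is an assumption.

```python
from typing import Dict, List, Set, Tuple

Trace = Tuple[str, str, str, int]   # (port, event, token, firing_counter)


def lineage_actors(token: str, trace: List[Trace],
                   port_actor: Dict[str, str]) -> List[str]:
    """Actors that contributed to `token`, found breadth-first from the result."""
    actors: List[str] = []
    pending: List[str] = [token]
    seen: Set[str] = set()
    while pending:
        tok = pending.pop(0)
        if tok in seen:
            continue
        seen.add(tok)
        for port, event, t, firing in trace:
            if event != "w" or t != tok:
                continue
            actor = port_actor[port]
            if actor not in actors:
                actors.append(actor)
            # Assumption: the write depends on reads of the same firing.
            for p2, e2, t2, f2 in trace:
                if e2 == "r" and port_actor[p2] == actor and f2 == firing:
                    pending.append(t2)
    return actors


# Hypothetical three-step chain taken from the first stages of the workflow.
port_actor = {
    ".pc.InputImg1.out": ".pc.InputImg1",
    ".pc.AlignWarp1.in": ".pc.AlignWarp1",
    ".pc.AlignWarp1.out": ".pc.AlignWarp1",
    ".pc.Reslice1.in": ".pc.Reslice1",
    ".pc.Reslice1.out": ".pc.Reslice1",
}
trace = [
    (".pc.InputImg1.out", "w", "t1", 1),
    (".pc.AlignWarp1.in", "r", "t1", 1),
    (".pc.AlignWarp1.out", "w", "t2", 1),
    (".pc.Reslice1.in", "r", "t2", 1),
    (".pc.Reslice1.out", "w", "t3", 1),
]
result = lineage_actors("t3", trace, port_actor)
```

Because the walk starts at the result and moves upstream, the actors come out latest first, which matches the ordering noted for the answer above.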
Query 1.b
Answer b. list of intermediate values created by the workflow (26 values).
?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
["/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp3.warp", "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img", "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp2.warp", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp4.warp", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr"]
Improved PC workflow (cf. COMAD wf)
RWS TRACE
Table         # of elements   size in KB
portTable     42              2
tokenTable    51              3
objectTable   39              4
traceTable    150             9
• A more generic workflow that accepts any number of images
• Smaller number of actors
• This affects the number of values, as it requires additional array operations
• cf. also COMAD approach and Taverna approach (but we fire AlignWarp individually here)
Improved PC workflow
Query 1
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
Answer a. list of actors that contributed to the result (15 actors). They appear in the reverse of their execution order.
?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
[ .pca.Convert, .pca.Slicer , .pca.hdrrepeat, .pca.seqXYZ, .pca.imgrepeat, .pca.SoftMeanArray, .pca.imgarray, .pca.hdrarray, .pca.Reslice, .pca.AlignWarp, .pca.RefHdr, .pca.InputHdr, .pca.InputImg, .pca.RefImg, .pca.Ramp]
Query 1
Answer b. list of intermediate values created by the workflow (33 values). In addition to the original file names, it includes internal data values (arrays).
?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
[ "/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr", "x", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img", { "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img" }, { "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr" }, "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img", "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp", "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr", "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img", "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img", 1, etc...
Nested workflow tricky example
[Figure: nested workflow containing composite actor S]
The trick
• Multi-port of Ptolemy
  – two distinct channels going into S and out from S
  – A’s output is delivered to S.C
  – B’s output is delivered to S.D
  – S.C’s output is delivered to E
  – S.D’s output is delivered to F
Lineage of actors and values
Who contributed to the value D.2 that arrived at F?
?- q1('"D.2"', ActorList, ValueList).
ActorList = ['.WF15.S.D', '.WF15.S', '.WF15.B']
ValueList = ['"D.2"', '2', '2']
Who contributed to the value C.1 that arrived at E?
?- q1('"C.1"', ActorList, ValueList).
ActorList = ['.WF15.S.C', '.WF15.S', '.WF15.A']
ValueList = ['"C.1"', '1', '1']
Single-level lineage of actors and values
Who contributed to the value D.2 that arrived at F?
?- q1b('"D.2"', ActorList, ValueList).
ActorList = ['.WF15.S', '.WF15.B']
ValueList = ['"D.2"', '2']
Who contributed to the value C.1 that arrived at E?
?- q1b('"C.1"', ActorList, ValueList).
ActorList = ['.WF15.S', '.WF15.A']
ValueList = ['"C.1"', '1']
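The single-level actor lists are the nested ones with the actors inside S removed. One way to obtain that effect, shown here purely as an illustration and not necessarily how q1b is implemented, is to keep only actors that sit directly under the workflow in the dotted naming scheme. The helper single_level is hypothetical.

```python
from typing import List


def single_level(actors: List[str], workflow: str) -> List[str]:
    """Keep only actors directly contained in `workflow` (no nested actors)."""
    prefix = workflow + "."
    return [a for a in actors
            if a.startswith(prefix) and "." not in a[len(prefix):]]


# Nested lineage of value D.2, as returned by q1 on the previous slide.
full = ['.WF15.S.D', '.WF15.S', '.WF15.B']
top = single_level(full, '.WF15')
```

Applied to the nested answer for C.1, the same filter leaves '.WF15.S' and '.WF15.A'.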
Conclusions
• First attempt at combining the Kepler PR and the Kepler RWS provenance model
  – Both published in IPAW 2006
• Query 1 was successfully answered.
• Queries 2 and 3 are answerable, but have not been implemented yet.
• Queries on multiple runs and workflow design provenance are out of scope for this initial prototype.
  – Other groups in Kepler are focusing on this.
Some related references
• Provenance Framework/Recorder:
  – Provenance Collection Support in the Kepler Scientific Workflow System, I. Altintas, O. Barney, E. Jaeger-Frank, IPAW2006, Chicago, Illinois, May 2006.
• RWS Model:
  – A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows, Shawn Bowers, Timothy McPhillips, Bertram Ludaescher, Shirley Cohen, Susan B. Davidson. International Provenance and Annotation Workshop (IPAW'06), Chicago, Illinois, USA, May 3-5, 2006.