Workflow discovery in e-science. Antoon Goderis, Peter Li, Carole Goble. University of Manchester, UK. www.cs.man.ac.uk/~goderisa


Page 1: Workflow discovery in e-science

Workflow discovery in e-science

Antoon Goderis Peter Li Carole Goble

University of Manchester, UK

www.cs.man.ac.uk/~goderisa

Page 2: Workflow discovery in e-science

Agenda

• Web services in science

• Workflow re-use

• Workflow discovery

– Is workflow discovery a new problem?

– How do people match up workflows?

– Can we replicate the behaviour with tools?

• Conclusions

Page 3: Workflow discovery in e-science

• Workflows

– BPEL, SCUFL, MOML, VDL … descriptions

– Workflow engine

– Orchestrates (Web-) services

– Can be published as Web service

• Web services

– SOAP, WSDL description

– Readily invoked

Page 4: Workflow discovery in e-science

Science is highly distributed and connected

Page 5: Workflow discovery in e-science

The Web has revolutionised science

Page 6: Workflow discovery in e-science

Web services about to do the same?

Page 7: Workflow discovery in e-science

Scientific workflows

• e-science = supporting scientists to encode, enact, explain and share experimental procedures featuring lots of specialised data

• Case study: bioinformatics

– Understanding the DNA to behaviour link

– 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna

– Re-use and repurposing of workflows

– About 200 Taverna workflows shared at www.myExperiment.org

Page 8: Workflow discovery in e-science
Page 9: Workflow discovery in e-science

Scientific workflows

• e-science = supporting scientists to encode, enact, explain and share experimental procedures

• Case study: bioinformatics

– Understanding the DNA to life link

– 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna

– Re-use and repurposing of workflow fragments

– About 200 Taverna workflows shared at www.myExperiment.org

Page 10: Workflow discovery in e-science

Manchester, CS dept

Manchester Biology dept

Newcastle, CS dept

Page 11: Workflow discovery in e-science

Scientific workflows

• e-science = supporting scientists to encode, enact, explain and share experimental procedures

• Case study: bioinformatics

– Understanding the DNA to life link

– 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna

– Re-use and repurposing of workflow fragments

– About 200 Taverna workflows shared at www.myExperiment.org

Page 12: Workflow discovery in e-science
Page 13: Workflow discovery in e-science

One + Three questions

1. Can’t we just do it with ?

• Keyword search doesn’t seem to cut it

1. Is workflow discovery a new problem?

2. How do people match up workflows?

3. Can we replicate the behaviour with tools?

Page 14: Workflow discovery in e-science

my current workflow myExperiment.org

Page 15: Workflow discovery in e-science

my current workflow myExperiment.org

?

Page 16: Workflow discovery in e-science

1. Is workflow discovery a new problem?

• Discovery goal

– Service discovery: encapsulate found service

– Workflow discovery: edit found workflow

• Matching process

– Service discovery: match over signature

– Workflow discovery: match over signature and content (data and service flow)

• Starting context

– Service discovery: service or data

– Workflow discovery: service or data or workflow

Source: survey of 21 myGrid/Taverna users

Page 17: Workflow discovery in e-science

1. Is workflow discovery a new problem? Yes

• Discovery goal

– Service discovery: encapsulate found service

– Workflow discovery: edit found workflow

• Matching process

– Service discovery: match over signature

– Workflow discovery: match over signature and content (data and service flow)

• Starting context

– Service discovery: service or data

– Workflow discovery: service or data or workflow

Workflow discovery subsumes service discovery

Page 18: Workflow discovery in e-science

2. How do people match up workflows?


Page 19: Workflow discovery in e-science

3. Can we replicate the behaviour with tools?


Page 20: Workflow discovery in e-science

A user experiment with bioinformatics workflows


Page 21: Workflow discovery in e-science

Workflow discovery task

• Can I sensibly adapt an existing experimental procedure (workflow) with another one?

• Extend or replace

Page 22: Workflow discovery in e-science

Workflow corpus

• 66 similar workflows for Graves’ disease, all by a single author

• 1 + 5 workflows: one input workflow plus five others to match against

• Workflow diagram

• No documentation

• No annotation


Page 23: Workflow discovery in e-science

By the experts, for the experts

• 9 bioinformaticians and 4 developers at a Taverna training day

Page 24: Workflow discovery in e-science

Matching strategies

• Matching input workflow with 5 others

Page 25: Workflow discovery in e-science

Human on-line matching strategies!

• Traits

• Scores of attraction

• Yes or no

Page 26: Workflow discovery in e-science

Matching strategy: traits

• Men want: short term relationship; slim; students, artists, musicians, veterinarians; blonde; medium income

• Women want: long term relationship; tall; lawyers, financial execs, firemen; hair or shaved; high income

From an analysis of 30 000 profiles

Page 27: Workflow discovery in e-science

Matching strategy: scoring

Confidence level

Score

Percentile

www.AmIHotOrNot.com

Page 28: Workflow discovery in e-science

Matching strategy: yes or no

Page 29: Workflow discovery in e-science

Traits

• Predicted trait

– Biological subtask

– Biological supertask

– Shared inputs + outputs

– Same service type

– Shared service compositions

– Shared path between intermediary input and output

Page 30: Workflow discovery in e-science

Traits and score

• Predicted trait

– Biological subtask

– Biological supertask

– Shared inputs + outputs

– Same service type

– Shared service compositions

– Shared path between intermediary input and output

• Score of similarity, usefulness and confidence

– E.g. [1 Identical – 9 Not similar]

Page 31: Workflow discovery in e-science

The gold standard

• The collection of workflow similarity assessments

• Predictive traits, possibly interacting

1 + 5

Traits/score

Page 32: Workflow discovery in e-science

2. How do people match up workflows?

• Difficulty of task

– Biological relationship very difficult for 6 out of 9

– Shape similarity difficult for 4 out of 13

– Medium confidence

• Consistency

– Inter-participant disagreement on how to order biological similarity and shape similarity [Spearman rank order test]

• Predictive traits

– No one trait dominant between and within participants [Levene homogeneity of variance test]
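
The Spearman rank order test named above compares how two participants order the same candidate workflows. A minimal sketch in Python, assuming each participant gives one similarity score per candidate on the 1 to 9 scale; the function names and the scores are made up for illustration and are not data from the study:

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five candidate workflows (1 = identical, 9 = not similar).
participant_a = [1, 3, 5, 7, 9]
participant_b = [9, 7, 5, 3, 1]  # exactly the opposite ordering
participant_c = [2, 4, 4, 8, 9]  # similar ordering to a, with one tie

rho_ab = spearman(participant_a, participant_b)
rho_ac = spearman(participant_a, participant_c)
```

A rho close to +1 means two participants order the candidates the same way; a rho close to -1, as for `rho_ab` here, means they order them in opposite ways. The sketch does not handle a participant who gives every candidate the same score, since the rank variance is then zero.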

Page 33: Workflow discovery in e-science

Can we do better?

• Simpler tasks and workflows

• Taverna experienced users

• Workflow documentation and annotation

• Other factors in use, e.g. size difference

– Fix allowed factors

– Adopt black box approach: yes/no matching

Page 34: Workflow discovery in e-science

Automated discovery technique

• Unattributed graph matcher implementation by Messmer and Bunke

– Subgraph isomorphism detection; exponential time complexity

– DAGs and optimization for repository of graphs

• Workflows parsed as graphs

– Workflow input, workflow output and intermediate services as nodes

– Data links as edges

• Example node labels (from the workflow diagram): probeSetid, AffyMapper_seq, databaseid, Blastx, Results_Blastx
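
To make the graph view concrete, here is a minimal Python sketch of a workflow parsed as a directed graph, with a brute-force subgraph check that ignores node labels. It is not the Messmer and Bunke implementation, which is optimised for a repository of graphs. The node names come from the slide, but the data links between them, the fragment graphs and all function names are assumptions made for illustration:

```python
from itertools import permutations


def make_graph(edges):
    """Build a directed graph from (source, target) data links."""
    nodes = sorted({node for edge in edges for node in edge})
    return {"nodes": nodes, "edges": set(edges)}


def contains_subgraph(small, large):
    """Check whether `small` maps into `large`, ignoring node labels.

    Tries every injective assignment of small's nodes to large's nodes and
    accepts if every edge of `small` lands on an edge of `large`
    (a non-induced match). The search is exponential in the number of nodes.
    """
    for image in permutations(large["nodes"], len(small["nodes"])):
        mapping = dict(zip(small["nodes"], image))
        if all((mapping[a], mapping[b]) in large["edges"]
               for a, b in small["edges"]):
            return True
    return False


# Assumed data links between the node labels shown on the slide.
workflow = make_graph([
    ("probeSetid", "AffyMapper_seq"),
    ("AffyMapper_seq", "Blastx"),
    ("databaseid", "Blastx"),
    ("Blastx", "Results_Blastx"),
])

# Hypothetical fragments: a three-step chain, and three links into one service.
chain = make_graph([("in", "service"), ("service", "out")])
fan_in = make_graph([("a", "x"), ("b", "x"), ("c", "x")])

chain_found = contains_subgraph(chain, workflow)
fan_in_found = contains_subgraph(fan_in, workflow)
```

The chain is found because the workflow has a path of three nodes; the fan-in fragment is not, because no node in the assumed workflow has three incoming data links. Trying all assignments is what makes the cost exponential, which is why a repository-scale matcher needs the kind of optimisation the slide mentions.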

Page 35: Workflow discovery in e-science

Automated discovery technique

• Ranking based on

– shared nodes

– difference in size between input graph and repository graphs
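
A minimal Python sketch of a ranking built from these two criteria. The slide does not say how the criteria are combined, so the ordering used here (more shared nodes first, then smaller size difference) is an assumption, as are the function name and the repository entries `wf_a`, `wf_b` and `wf_c`:

```python
def rank_workflows(query_nodes, repository):
    """Rank repository workflows against a query workflow.

    Each result is (name, shared_nodes, size_difference). Results are sorted
    by shared nodes (descending), then size difference (ascending), then name.
    """
    query = set(query_nodes)
    scored = []
    for name, nodes in repository.items():
        candidate = set(nodes)
        shared = len(query & candidate)
        size_difference = abs(len(query) - len(candidate))
        scored.append((name, shared, size_difference))
    return sorted(scored, key=lambda item: (-item[1], item[2], item[0]))


# Query workflow, using node labels from the example diagram.
query = ["probeSetid", "AffyMapper_seq", "Blastx", "Results_Blastx"]

# Hypothetical repository of workflows, each given as a list of node labels.
repository = {
    "wf_a": ["probeSetid", "AffyMapper_seq", "Blastx", "Results_Blastx",
             "databaseid"],
    "wf_b": ["probeSetid", "AffyMapper_seq", "Blastx", "Results_Blastx",
             "databaseid", "extra_service_1", "extra_service_2"],
    "wf_c": ["Blastx", "Results_Blastx"],
}

ranking = rank_workflows(query, repository)
```

With these inputs `wf_a` and `wf_b` both share four nodes with the query, and `wf_a` ranks first because it is closer in size; `wf_c` shares only two nodes and ranks last.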

Page 36: Workflow discovery in e-science

3. Can we replicate the behaviour with tools? Kind of..

Average similarity assessments across participants


1 + 66

Traits/score

Page 37: Workflow discovery in e-science

Current work


12 + 21

Yes/no

Text clustering

OWL workflow ontology

Precision / recall

Graph matching
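
Precision and recall against a yes/no gold standard could be computed along the following lines. This is a minimal Python sketch; the function name and the workflow identifiers are hypothetical and not taken from the study:

```python
def precision_recall(retrieved, relevant):
    """Compare a tool's matches with the workflows people marked 'yes'.

    precision = fraction of retrieved workflows that are relevant
    recall    = fraction of relevant workflows that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: what a matcher returned vs. what participants accepted.
tool_matches = ["wf1", "wf2", "wf3", "wf4"]
gold_standard_yes = ["wf2", "wf4", "wf7"]

precision, recall = precision_recall(tool_matches, gold_standard_yes)
```

Here two of the four returned workflows are in the gold standard, so precision is 0.5, and two of the three gold-standard workflows were returned, so recall is two thirds.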

Page 38: Workflow discovery in e-science

Take home

• Scientists compose Web services for real, and share their results

• Workflow discovery is a real problem, which subsumes service discovery

• A range of matching strategies and techniques apply

• Evaluation is a challenge - gold standards hard to build

• Come and play at myExperiment.org

• References at www.cs.man.ac.uk/~goderisa