Creating and Sharing Re-usable Workflows in Cardiovascular Research: Lessons learned using Taverna

Ravi Madduri
University of Chicago
Argonne National Laboratory

www.ci.anl.gov
www.ci.uchicago.edu



About me

• Research Fellow at the Computation Institute, University of Chicago

• Lead architect for Workflow technologies in the caBIG project

• Workflow Working Group Chair and a key person in the BIRN project

• Interested in Informatics, Applications of High throughput data transfer, computing in Biomedical informatics



Agenda

• Introduction to Service Oriented Science (SoS)
• Introduction to caBIG as an example of SoS
• Introduce caGrid as an enabler of SoS vision
• Introduce Workflow concepts
• Talk about our implementation using Taverna
• Show a few Taverna workflows including the AutoQRS workflow from CVRG
• Lessons learned and future directions


Service-Oriented Science

People create services (data, code, instr.) …which I discover (& decide whether to use) …& compose to create a new function ... & then publish as a new service.

I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!

I hope that this “someone else” can manage security, reliability, scalability, …

“Service-Oriented Science”, Science, 2005


caBIG Goal and Vision

caBIG is a virtual web of interconnected data, individuals and organizations that redefines how research is conducted, care is provided, and patients/participants interact with the biomedical enterprise.

• Connect the cancer research community through a shareable, interoperable infrastructure

• Deploy and extend standard rules and a common language to more easily share information

• Build or adapt tools for collecting, analyzing, integrating and disseminating information associated with cancer research and care


caGrid

caBIG function dimensions:

• Clinical Data and Trials Management
• Biospecimen Management
• In Vivo Imaging
• Molecular Characterization


What is caGrid?

• Biomedical applications that share data all have common needs for syntactic and semantic interoperability

• caGrid is a software toolkit aimed at software developers creating Grid applications


caGrid provides

• Metadata services that add semantic information to all Grid services

• The GAARDS toolkit, a standard security platform

• Introduce: the ‘Eclipse’ for services development

• Index Service: a service registry for advertisement and discovery of capabilities


caGrid: nuts and bolts


A scientific workflow

• precisely defines a multi-step procedure, to seamlessly integrate and streamline local and remote heterogeneous computational and data resources to perform in silico scientific exploration.

Workflow Requirements

Service discovery

Data access

Service interaction

Security enforcement

Knowledge sharing


Overview of caGrid Workflow

[Figure: caGrid layer (data, instruments, computation resource; virtualization, security, connectivity); Cancer Data Standards Repository; workflow cycle of discovery, composition, orchestration, and analysis; community reuse and generation of workflows]

• Workflow as consumer
– Easily reuse services for complex experiments.
• Workflow as contributor
– Workflow as “best practice” wrapped as services.
• Workflow providing RoI for SOA


caGrid Workflow Suite

• (1) Service discovery
• (2) Data access
• (3) Service interaction
• (4) Security enforcement
• (5) Knowledge sharing

[Figure: the Taverna workbench and caFlow, annotated with (1) service discovery (Index, Metadata), (2) data access (data services, analytical services), (3) service invocation, (4) security enforcement (security services: authentication, credential delegation), and (5) knowledge sharing]


The caBIG Workflow System

[Figure: caGrid; Cancer Data Standards Repository; workflow cycle of discovery, composition, execution, and reuse; community reuse and generation of workflows]

• Service discovery based on cancer research metadata
• Data-flow modeling flavor
• caGrid activity
• State management (WSRF)
• Security (GSI)
• Implicit iteration: handle parallel execution
• WSRF and GSI enforcement
• A “Facebook” for caGrid workflows
• Workflow Execution Service
• Workflows in caGrid Portal
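
The idea of implicit iteration mentioned on this slide can be sketched roughly as follows. This is a minimal Python illustration, not Taverna's implementation: when a step that expects a single value receives a list, the step is invoked once per element, here in parallel. The function `to_upper` is a hypothetical stand-in for a remote service call.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke_with_implicit_iteration(service, value, max_workers=4):
    """Call `service` once for a single item, or once per element for a list."""
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as the inputs
            return list(pool.map(service, value))
    return service(value)


def to_upper(symbol):
    """Hypothetical stand-in for a remote service invocation."""
    return symbol.upper()


single = invoke_with_implicit_iteration(to_upper, "brca1")
many = invoke_with_implicit_iteration(to_upper, ["brca1", "tp53"])
```

The workflow author wires the step once; whether it runs one time or many depends only on the shape of the data that arrives.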


Semantic Service Discovery

• Semantic search – searches Index Service for registered caGrid services matching various search criteria:
– Service name, inputs, outputs, research center, class names, concept codes, etc.
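
To make the search criteria concrete, here is a minimal Python sketch of filtering a registry by some of them. It is an in-memory illustration only and does not use the Index Service API; the record fields, service names, and concept codes are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical registry entry; field names are assumptions."""
    name: str
    research_center: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    concept_codes: list = field(default_factory=list)


def search(registry, name_contains=None, research_center=None, concept_code=None):
    """Return the records that satisfy every criterion that was supplied."""
    results = []
    for svc in registry:
        if name_contains and name_contains.lower() not in svc.name.lower():
            continue
        if research_center and svc.research_center != research_center:
            continue
        if concept_code and concept_code not in svc.concept_codes:
            continue
        results.append(svc)
    return results


# Placeholder entries, not real registrations.
registry = [
    ServiceRecord("ArrayDataService", "Center A", outputs=["Experiment"], concept_codes=["X001"]),
    ServiceRecord("PreProcessService", "Center B", inputs=["Experiment"], concept_codes=["X002"]),
]
matches = search(registry, concept_code="X002")
```

Criteria that are left as `None` are ignored, so the same function covers name-only, property-only, and combined queries.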


Semantic Service Discovery

Service metadata

• Types of query
– String based
– Property based
– Semantic based


caBIG services palette

• As a result of semantic search or direct adding
– caBIG services appear in Taverna’s Service Panel
– Ready to be dragged and dropped into caGrid workflows


Data access: CQL Builder


Service interaction: managing state


Security enforcement

• Authentication
– Ability to invoke services secured by Grid Security Infrastructure (GSI)
– Integrated caGrid Security framework (GAARDS) with Taverna’s Credential Manager
– Transport Level Security
• Authorization
– Performed on the service side, based on the user’s credentials
• Credential Delegation Service Integration


Secure Grid services

• Taverna can invoke secure Grid services that require the user to log in to caGrid

• Taverna interacts with caGrid’s GAARDS infrastructure to obtain the user’s proxy:
– Authenticate the user with the user’s affiliated Authentication Service
– Obtain the user’s proxy from Dorian Service
– Default proxy lifetime: 12 hours
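
The two-step login described above can be sketched as follows. This is a toy Python model under stated assumptions: `FakeAuthenticationService` and `FakeDorian` are hypothetical stand-ins, not the GAARDS client API, and the only detail taken from the slide is the 12-hour default proxy lifetime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_PROXY_LIFETIME = timedelta(hours=12)  # default stated on the slide


@dataclass
class Proxy:
    user: str
    expires_at: datetime

    def is_valid(self, now):
        return now < self.expires_at


class FakeAuthenticationService:
    """Hypothetical stand-in: checks a password and returns an assertion."""

    def __init__(self, accounts):
        self._accounts = accounts

    def authenticate(self, username, password):
        if self._accounts.get(username) != password:
            raise PermissionError("authentication failed")
        return {"subject": username}


class FakeDorian:
    """Hypothetical stand-in for the service that issues proxies."""

    def issue_proxy(self, assertion, now, lifetime=DEFAULT_PROXY_LIFETIME):
        return Proxy(user=assertion["subject"], expires_at=now + lifetime)


def log_in(auth_service, dorian, username, password, now):
    assertion = auth_service.authenticate(username, password)  # step 1: authenticate
    return dorian.issue_proxy(assertion, now)                   # step 2: obtain proxy


now = datetime(2011, 1, 1, 9, 0)  # arbitrary example timestamp
proxy = log_in(FakeAuthenticationService({"alice": "s3cret"}), FakeDorian(),
               "alice", "s3cret", now)
```

Keeping authentication and proxy issuance as separate steps mirrors the slide: the first can differ per institution, while the second is common to the grid.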


Using secure caGrid services

• Involves:
1. Discovering a secure caGrid service from Taverna
2. Logging onto the selected caGrid to obtain a proxy certificate
3. Saving and managing caGrid proxies, usernames, and passwords


Configuring secure services (1/2)

• Authentication Service and Dorian Service URLs are required in order to obtain the user’s proxy

• Can be configured globally for all services from the same caGrid (in preferences)

• Can be configured individually for a particular caGrid service (overrides configuration from preferences)
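
The precedence rule on this slide (per-service settings override the per-caGrid preferences) can be illustrated with a small Python sketch. The dictionary keys and all URLs are placeholders invented for the example; they are not Taverna's actual preference names.

```python
def resolve_security_config(grid_defaults, service_overrides, grid_name, service_url):
    """Start from the caGrid-wide settings, then apply per-service overrides."""
    config = dict(grid_defaults.get(grid_name, {}))
    config.update(service_overrides.get(service_url, {}))
    return config


# Placeholder values for illustration only.
grid_defaults = {
    "training": {
        "authentication_service_url": "https://auth.example.org/AuthenticationService",
        "dorian_service_url": "https://dorian.example.org/Dorian",
    }
}
service_overrides = {
    "https://svc.example.org/SecureService": {
        "dorian_service_url": "https://other-dorian.example.org/Dorian",
    }
}

overridden = resolve_security_config(
    grid_defaults, service_overrides, "training", "https://svc.example.org/SecureService")
inherited = resolve_security_config(
    grid_defaults, service_overrides, "training", "https://svc.example.org/AnotherService")
```

A service with no individual configuration simply inherits both URLs from its caGrid.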


Configuring secure services (2/2)

• View a secure service’s details
• Configure the service’s security properties


Logging onto caGrid

• User is prompted for his caGrid username and password when any secure service is invoked from a workflow for the first time


Credential management

• Taverna obtains proxy for user from Dorian Service using user’s caGrid username and password

• Proxies are saved and managed by Credential Manager

• caGrid username and password can also be remembered
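
One way to picture why the user is prompted only on the first secure invocation is a cache keyed by caGrid. The Python sketch below is an assumption about the general pattern, not Taverna's Credential Manager: a proxy is reused until it expires, and only then is a new login performed. The `obtain_proxy` callable is hypothetical.

```python
from datetime import datetime, timedelta


class CredentialCache:
    """Toy cache: one proxy expiry time per caGrid, refreshed when stale."""

    def __init__(self, obtain_proxy):
        self._obtain_proxy = obtain_proxy  # callable(grid_name, now) -> expiry datetime
        self._expiry_by_grid = {}
        self.logins = 0                    # counts how often a login was needed

    def proxy_expiry(self, grid_name, now):
        expiry = self._expiry_by_grid.get(grid_name)
        if expiry is None or now >= expiry:
            # No proxy yet, or the saved one has expired: log in again.
            expiry = self._obtain_proxy(grid_name, now)
            self._expiry_by_grid[grid_name] = expiry
            self.logins += 1
        return expiry


cache = CredentialCache(lambda grid, now: now + timedelta(hours=12))
t0 = datetime(2011, 1, 1, 9, 0)  # arbitrary example timestamp
cache.proxy_expiry("training", t0)
cache.proxy_expiry("training", t0 + timedelta(hours=1))
logins_within_lifetime = cache.logins
cache.proxy_expiry("training", t0 + timedelta(hours=13))
logins_after_expiry = cache.logins
```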


Workflow execution service

Taverna Workflow Service wraps the Taverna execution engine into a WS-Resource and exposes operations such as createResource, startWorkflow, getStatus, and getOutput for user submitted workflows.

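[Figure: the Taverna Workbench and the Workflow Portlet use the Client API to call the Workflow Service operations createResource, startWorkflow, getStatus, and getOutput. The Workflow Service holds stateful resources (resource properties) addressed by an EPR and runs workflows on the Taverna Engine, which calls data services, analytical services, and caGrid and other services.]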


Workflow execution service

• Taverna Workflow Service
– Provides stateful resources that execute the workflows.
– Supports caGrid security architecture (GSI Security).
– Allows programmatic submission of workflows.
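
The lifecycle implied by the operations createResource, startWorkflow, getStatus, and getOutput can be sketched with an in-memory stand-in. This Python toy is not the caGrid client API: the status names, the resource identifier format, the workflow file name, and the synchronous `echo_engine` are all assumptions made for illustration.

```python
import itertools


class InMemoryWorkflowService:
    """Toy stand-in for a stateful workflow service."""

    def __init__(self, engine):
        self._engine = engine            # callable(workflow, inputs) -> outputs
        self._resources = {}
        self._ids = itertools.count(1)

    def create_resource(self, workflow):
        epr = f"resource-{next(self._ids)}"  # plays the role of an EPR
        self._resources[epr] = {"workflow": workflow, "status": "Pending", "output": None}
        return epr

    def start_workflow(self, epr, inputs):
        resource = self._resources[epr]
        resource["status"] = "Active"
        # A real service would run asynchronously; this sketch runs inline.
        resource["output"] = self._engine(resource["workflow"], inputs)
        resource["status"] = "Done"

    def get_status(self, epr):
        return self._resources[epr]["status"]

    def get_output(self, epr):
        resource = self._resources[epr]
        if resource["status"] != "Done":
            raise RuntimeError("workflow has not finished")
        return resource["output"]


def echo_engine(workflow, inputs):
    """Hypothetical engine that just reports what it was asked to run."""
    return {"result": f"{workflow} ran with {inputs}"}


service = InMemoryWorkflowService(echo_engine)
epr = service.create_resource("lymphoma.t2flow")
status_before = service.get_status(epr)
service.start_workflow(epr, ["experiment-1"])
output = service.get_output(epr)
```

Because each submission gets its own resource, a client can submit a workflow, disconnect, and later query status and output by reference.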


Access Taverna workflow via caGrid portal

Taverna Workflow Portlet is deployed in the caGrid Portal on the training Grid:

URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow

• The portlet currently lists a few workflows, with their descriptions, which can be browsed at the above URL.

• Users can select a workflow they are interested in running.

View : 1


Access Taverna workflow via caGrid portal

URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow

• Based on the number of input ports in the workflow, the portlet prompts the users to enter the input values in the textbox.

• For example, the Lymphoma workflow takes only one input, in the form of an Experiment ID that identifies the experiment that caArray uses for data collection.

• Hit Submit after entering the data.
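
As a rough illustration of prompting for one value per input port, the Python sketch below derives form fields from a list of port names and checks that every field was filled in before submission. It is not the portlet's code; the port name `experimentId` and the value `EXP-1` are hypothetical.

```python
def build_input_form(input_ports):
    """Create one text field per workflow input port."""
    return [{"label": port, "type": "text", "value": ""} for port in input_ports]


def collect_inputs(form, submitted):
    """Map submitted values back to ports, rejecting empty or missing fields."""
    missing = [f["label"] for f in form if not submitted.get(f["label"])]
    if missing:
        raise ValueError(f"missing input values: {missing}")
    return {f["label"]: submitted[f["label"]] for f in form}


form = build_input_form(["experimentId"])
inputs = collect_inputs(form, {"experimentId": "EXP-1"})
```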

View : 2


Access Taverna workflow via caGrid portal

URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow

• The portlet stores the user submitted workflows in the current session of the portal.

• Users can view all the active and completed workflows in the session.

• Clicking the Output button shows the output of the workflow.

• The portlet provides workflow-specific view resolvers to render the outputs. For example, the Lymphoma workflow currently displays its output in an HTML table.
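
A workflow-specific view resolver can be thought of as a lookup from workflow name to rendering function, with a generic fallback. The Python sketch below shows that pattern only; the registry key, the function names, and the sample rows are assumptions, not the portlet's implementation.

```python
from html import escape


def render_default(output):
    """Fallback view: show the raw output as preformatted text."""
    return f"<pre>{escape(repr(output))}</pre>"


def render_table(rows):
    """Render a list of rows as a plain HTML table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


# Hypothetical registry: workflow name -> renderer.
VIEW_RESOLVERS = {"lymphoma": render_table}


def render_output(workflow_name, output):
    return VIEW_RESOLVERS.get(workflow_name, render_default)(output)


table_html = render_output("lymphoma", [["sample1", "type A"]])
fallback_html = render_output("other", 5)
```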

Views : 3, 4, & 5


Knowledge Sharing

• Search ‘cabig’ in myExperiment, or
• Type http://www.myexperiment.org/search?type=workflows&query=cabig
• Type http://tinyurl.com/cabig-workflow
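
The myExperiment search link on this slide follows a simple query-string pattern, which can be built programmatically. The helper name below is made up; the parameter names `type` and `query` are taken from the URL shown on the slide.

```python
from urllib.parse import urlencode


def myexperiment_search_url(query, result_type="workflows"):
    """Build a search URL in the same form as the one shown on the slide."""
    params = urlencode({"type": result_type, "query": query})
    return f"http://www.myexperiment.org/search?{params}"


url = myexperiment_search_url("cabig")
```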

Discovery using myExperiment


Lymphoma Prediction Workflow

Microarray from tumor tissue → Microarray preprocessing → Lymphoma prediction
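
The three stages named on this slide form a linear pipeline, where each stage consumes the previous stage's output. The Python sketch below shows only that composition pattern: the stage bodies are toy placeholders with invented data and arithmetic, and they do not reproduce the real microarray preprocessing or lymphoma prediction services.

```python
from functools import reduce


def compose_pipeline(*stages):
    """Chain stages so each one receives the previous stage's result."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


def fetch_microarray(experiment_id):
    """Toy stand-in for retrieving microarray data (values are invented)."""
    return {"experiment": experiment_id, "values": [2.0, 4.0, 6.0]}


def preprocess(dataset):
    """Toy preprocessing: scale values by the largest one."""
    peak = max(dataset["values"])
    return {**dataset, "values": [v / peak for v in dataset["values"]]}


def predict(dataset):
    """Toy 'prediction': report the mean of the scaled values."""
    mean = sum(dataset["values"]) / len(dataset["values"])
    return {"experiment": dataset["experiment"], "score": round(mean, 3)}


workflow = compose_pipeline(fetch_microarray, preprocess, predict)
result = workflow("EXP-1")
```

Each stage can be swapped for another service with the same input and output shape, which is the kind of reuse the talk argues for.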


Lymphoma type prediction

Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)


AutoQRS Analysis Workflow

[Figure: workflow diagram. Input: WFDB binary and Patient ID. Services: WFDB data service, AutoQRS Analytical Service, AutoQRS Output Data Service, JSDL service. Steps and artifacts shown: Store WFDB, Retrieve WFDB Patient Record, Invoke Processing, Analysis Execution Record, AutoQRS XML Results.]


The Taverna workflow


The result in MS Excel


Accomplishments

• Lymphoma workflow
– Among the top 20 most viewed/downloaded workflows in myExperiment
– This is more impressive given that this workflow was uploaded much later than the other workflows

• Our BMC Bioinformatics article, “caGrid Workflow Toolkit: A Taverna based workflow tool for cancer Grid”, achieved “Highly Accessed” status relative to its age

• We are part of the CVRG Project that recently got renewed


Lessons Learned

• Lower the barriers to entry for sharing data and analytics

• Software is surprisingly hard to use for end users – more so if the benefit is not all too clear

• Return on Investment of a SOA is in creating reusable workflows (LEGO blocks)

• Workflows are only as good as the services we create

• Traditional SDLC does not always work in the favor of the end users

• 80-20 and KISS


Goals of Workflow Project in CVRG

• Deploy existing technology on the CVRG that can be used to store and execute workflows generated locally using the Taverna workbench

• Develop new technology that allows non-expert users to graphically compose and execute workflows via a web-interface.

• Extend the Taverna Engine and add support for invocation of REST-style services, so that users can annotate workflow inputs and outputs using ontology terms from NCBO BioPortal and other ontology repositories

• Develop specifications describing how workflows should be designed, validated, and documented, and support user development of workflows.

• Extend the technology so that workflows can be executed in a cloud-computing environment


Suggested Direction

• Hosted Workflow Solution
– SaaS workflow tools
• Globus Online
• Galaxy


Acknowledgements

• Univ. Chicago / ANL
– Ian Foster
– Dinanath Sulakhe
– Bo Liu

• Univ. Manchester, UK
– Carole Goble
– Stian Soiland-Reyes
– Alexandra Nenadic

• Inventrio
– Shannon Hastings
– Stephen Langella
– Scott Oster

• Other colleagues from Ohio State University, National Cancer Institute, JHU …


Journal papers & book chapters

• Composition as a Service. IEEE Internet Computing, 2010.
• A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid. CCPE, 2010.
• Data-driven Service Composition in Building SOA Solutions: A Petri Net Approach. IEEE T-ASE, 2010.
• Scientific workflows that enable Web-scale collaboration: combining the power of Taverna and caGrid. IEEE Internet Computing, 2008.
• Workflow in a Service Oriented Cyberinfrastructure Environment. In: Junwei Cao (Ed.), Cyberinfrastructure Technologies and Applications. Nova Science Publishers, 2008. (book chapter)


Conference papers

• Scientific workflows as services in caGrid: a Taverna and gRAVI approach. ICWS 2009.
• Wrap Scientific Applications as WSRF Grid Services using gRAVI. ICWS 2009.
• Orchestrating caGrid Services in Taverna. ICWS 2008.
• Building Scientific Workflow with Taverna and BPEL: a Comparative Study in caGrid. WESOA 2008.
• Build Grid Enabled Scientific Workflows using gRAVI and Taverna. SWBES 2008.


Contact information

• Ravi Madduri– [email protected]

• Computation Institute, Univ. Chicago– http://www.ci.uchicago.edu/