Taverna and myExperiment: Designing, Exchanging and Sharing of Scientific Workflows Katy Wolstencroft University of Manchester


Page 1: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Taverna and myExperiment: Designing, Exchanging and Sharing of Scientific Workflows

Katy Wolstencroft
University of Manchester

Page 2: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Connecting things Together

• Data Resources
  – Genome databases
  – Kinetic/metabolite data

• Analysis tools
  – Sequence alignment
  – Similarity searching
  – Pattern matching

• Knowledge Resources
  – Ontologies
  – Controlled vocabularies

Page 3: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

What is a Workflow?

A mechanism for connecting things together

• Workflows provide a general technique for describing and enacting a process
• Describes what you want to do, not how you want to do it
• Simple language specifies how bioinformatics processes fit together
• Processes are represented as web services

[Diagram: Sequence → RepeatMasker (web service) → GenScan (web service) → Blast (web service) → Predicted Genes out]
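The idea on this slide, a workflow as a chain of services in which each output feeds the next input, can be sketched in a few lines of Python. This is only an illustration: the three functions are hypothetical local stand-ins for the RepeatMasker, GenScan and Blast web services, and their toy logic is an assumption, not what those tools actually compute.

```python
from typing import Any, Callable, List


def repeat_masker(sequence: str) -> str:
    """Stand-in for a masking service: lower-case bases become N."""
    return "".join("N" if base.islower() else base for base in sequence)


def gen_scan(masked: str) -> List[str]:
    """Stand-in for a gene predictor: unmasked stretches count as 'genes'."""
    return [piece for piece in masked.split("N") if piece]


def blast(genes: List[str]) -> List[dict]:
    """Stand-in for a similarity search: one toy report per predicted gene."""
    return [{"query": gene, "length": len(gene)} for gene in genes]


# The workflow says WHAT is connected to what; run_workflow decides HOW.
WORKFLOW: List[Callable[[Any], Any]] = [repeat_masker, gen_scan, blast]


def run_workflow(steps: List[Callable[[Any], Any]], data: Any) -> Any:
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    for report in run_workflow(WORKFLOW, "ATGaaCCG"):
        print(report)
```

Keeping the step list separate from the runner mirrors the slide's point: the description of the experiment is data that can be shared, while the enactment is left to the engine.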

Page 4: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

What is a workflow?

• Business Process workflows
  – Tasks, schedules, dependencies (on staff time), and costs

• Scientific Workflows – on in silico data
  – Data throughput, dependencies (on analysis results)
  – Input, algorithm, output
  – Flow of information, scheduling of order, collection of results, intermediate results and provenance

• High level description of your experiment
• Workflow is the model of the experiment
  – Methods section in your publication
• Workflow can be shared and reused

Page 5: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Kepler

Triana

BPEL

Ptolemy II

Taverna

Page 6: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Workflow diagram

Tree view of workflow structure

Available services Taverna

Open source and extensible

Page 7: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

What is a web service?

NOT the same as services on the web (i.e. web forms)

Web services support machine-to-machine interaction over a network

Page 8: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Web Evolution

[Figure: technology and innovation at each stage of web evolution]

• TCP/IP (Connectivity): FTP, E-mail, Gopher
• HTML (Presentation): Web Pages, "Browse the Web"
• XML (Programmability): Web Services, "Program the Web"

Taken from: http://www.softstar-inc.com/

Page 9: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

How do you use Web Services?

• SOAP (Simple Object Access Protocol)
  – An XML protocol for passing messages

• WSDL (Web Services Description Language)
  – A machine-readable description of the operations supported

• Normally transferred over HTTP
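To make the SOAP bullet concrete, here is a minimal sketch of what such a message looks like when built with Python's standard library. The envelope namespace is the SOAP 1.1 one; the operation name `runBlast`, the namespace `http://example.org/blast` and the parameter names are hypothetical, and nothing is actually sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation: str, service_ns: str, params: dict) -> bytes:
    """Wrap one operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def soap_http_headers(soap_action: str) -> dict:
    """Headers typically sent when a SOAP 1.1 message is POSTed over HTTP."""
    return {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{soap_action}"',
    }


if __name__ == "__main__":
    # Hypothetical service namespace, operation and parameters.
    request = build_soap_request(
        "runBlast",
        "http://example.org/blast",
        {"sequence": "ATGGCC", "database": "example_db"},
    )
    print(request.decode("utf-8"))
    print(soap_http_headers("http://example.org/blast/runBlast"))
```

In practice a client would read the operation names and parameter types from the service's WSDL rather than hard-coding them as done here.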

Page 10: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Who Provides the Services?

• Open domain services and resources
• Taverna accesses 3500+ services
• Third party – we don’t own them – we didn’t build them
• All the major providers
  – NCBI, DDBJ, EBI …
• Enforce NO common data model
• Quality Web Services considered desirable

Page 11: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

What types of service?

• WSDL Web Services
• BioMart
• R-processor
• BioMoby
• Soaplab
• Local Java services
• Beanshell
• Workflows

• Coming soon: REST, Matlab ...?

Page 12: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Create and run workflows

Share, discover and reuse workflows

Manage the metadata needed and generated

RDF, OWL

Discover and reuse services

Feta

A Collection of Components

Page 13: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

What do Scientists use Taverna for?

– Data gathering, annotation and model building
– Data analysis from distributed tools
– Data mining and knowledge management
– Data curation and warehouse population
– Parameter sweeps and simulation

Users from Systems Biology, Proteomics, Sequence analysis, Protein structure prediction, Gene/protein annotation, Microarray data analysis, QTL studies, Cheminformatics, Medical image analysis, Public health care epidemiology, Heart model simulation, Phenotype studies, Phylogeny, Statistical analysis, Pharmacogenomics, Text mining, Astronomy, Music, Meteorology

Page 14: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Taverna: Selected Successful Cases of Adoption

Originally designed to support bioinformatics, now expanded into new areas

Page 15: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Annotation Pipelines

• Genome annotation pipelines
  – Bergen Center for Computational Science – Gene Prediction in Algal Viruses, a case study
• Workflow assembles evidence for predicted genes / potential functions
• Human expert can ‘review’ this evidence before submission to the genome database
• Data warehouse pipelines
  – e-Fungi – model organism warehouse
  – ISPIDER – proteomics warehouse
• Annotating the up/down regulated genes in a microarray experiment

Page 16: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows
Page 17: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Building models and knowledge management

• SBML population
• Comparing models and experimental data
• Mining text resources and building knowledge models

Page 18: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

[Peter Li, Doug Kell]

Systems Biology Model Construction

Automatic reconstruction of genome-scale yeast metabolism from distributed data in the life sciences to create and manipulate Systems Biology Markup Language (SBML) models.

Page 19: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

LibSBML Integration

• API consumer used to integrate libSBML directly into Taverna

• Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data. Peter Li, Juan I. Castrillo, Giles Velarde, Ingo Wassink, Stian Soiland-Reyes, Stuart Owen, David Withers, Tom Oinn, Matthew R. Pocock, Carole A. Goble, Stephen G. Oliver, Douglas B. Kell. Submitted to BMC Bioinformatics.

Page 20: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Data Analysis Pipelines

• Access to local and remote analysis tools
• You start with your own data / public data of interest
• You need to analyse it to extract biological knowledge

Page 21: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Trichuris muris

• Mouse whipworm infection: parasite model of the human parasite Trichuris trichiura

Understanding Phenotype
• Comparing resistant vs susceptible strains – Microarrays

Understanding Genotype
• Mapping quantitative traits – Classical genetics QTL

Joanne Pennock, Richard Grencis
University of Manchester

Page 22: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Trichuris muris

• Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.

• Manual experimentation: Two year study of candidate genes, processes unidentified

Joanne Pennock, Richard Grencis
University of Manchester

Page 23: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Trichuris muris

• Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.

• Manual experimentation: Two year study of candidate genes, processes unidentified

• JO IS A LAB BIOLOGIST
• JO HAS NEVER BUILT A WORKFLOW

Joanne Pennock, Richard Grencis
University of Manchester

Page 24: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

http://www.genomics.liv.ac.uk/tryps/trypsindex.html

Andy Brass, Steve Kemp, Paul Fisher

• Sleeping Sickness in African Cattle
• Caused by infection by parasite (Trypanosoma brucei)
• Some cattle breeds more resistant than others
• Differences between resistant and susceptible cattle?
• Can we breed cattle resistant to infection?

Fisher et al. (2007) A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Res. 35(16):5625-33

Page 25: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Why was the Workflow Approach Successful?

• Workflows are protocols – they can be reused or repurposed
• Workflow analysed each piece of data systematically
  – Eliminated user bias and premature filtering of datasets and results leading to single sided, expert-driven hypotheses
• The size of the QTL and amount of the microarray data made a manual approach impractical
• Workflows capture exactly where data came from and how it was analysed
• Workflow output produced a manageable amount of data for the biologists to interpret and verify
  – “make sense of this data” -> “does this make sense?”

Page 26: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Sharing Experiments

• Taverna supports the in silico experimental process for individual scientists

• How do you share your results/experiments/experiences with your
  – Research group
  – Collaborators
  – Scientific community

Page 27: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows
Page 28: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Just Enough Sharing….

• myExperiment can provide a central location for workflows from one community/group

• myExperiment allows you to say
  – Who can look at your workflow
  – Who can download your workflow
  – Who can modify your workflow
  – Who can run your workflow
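The four questions above amount to a per-workflow access policy. The sketch below shows one way such a policy could be modelled; it is purely illustrative and is not myExperiment's data model or API. The class name, the action names and the `*` wildcard are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

ACTIONS = ("view", "download", "modify", "run")
EVERYONE = "*"  # assumed wildcard meaning "any user"


@dataclass
class SharingPolicy:
    """Who may perform each action on one workflow (hypothetical model)."""

    owner: str
    grants: Dict[str, Set[str]] = field(
        default_factory=lambda: {action: set() for action in ACTIONS}
    )

    def allow(self, action: str, user: str) -> None:
        """Grant one action to one user, or to EVERYONE."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.grants[action].add(user)

    def can(self, user: str, action: str) -> bool:
        """The owner may do anything; others need an explicit grant."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if user == self.owner:
            return True
        allowed = self.grants[action]
        return user in allowed or EVERYONE in allowed


# "Just enough sharing": anyone may look, only a named collaborator may download.
policy = SharingPolicy(owner="jo")
policy.allow("view", EVERYONE)
policy.allow("download", "paul")
```

Separating the four actions lets an author publish a workflow for inspection without also handing out the right to change or run it.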

Page 29: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

The most important aspect of myExperiment - Designed by scientists

Ownership and Attribution

Page 30: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Packs

• Packs allow you to collect different items together, like you might with a "wish list" or "shopping basket"

• You can collect internal things (such as workflows, files and even other packs) as well as link to things outside myExperiment

• Your packs can then be shared, tagged, discovered and discussed easily on myExperiment

Page 31: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Bringing myExperiment to the Taverna User

myExperiment Plugin in Taverna

Page 32: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Running Workflows Through myExperiment

Taverna Remote Execution (T-REX)

Page 33: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
select ?friend1 ?friend2 ?acceptedat
where {
  ?z rdf:type <http://rdf.myexperiment.org/ontology#Friendship> .
  ?z myexp:has-requester ?x .
  ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y .
  ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}

All accepted Friendships including accepted-at time

Semantically-Interlinked Online Communities
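A query like the one on this slide is normally submitted to a SPARQL endpoint as the `query` parameter of an HTTP request. The sketch below only builds the request URL and does not contact any server; the endpoint address is an assumption made for illustration and should be checked against the myExperiment documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address, used here only to show the URL shape.
ENDPOINT = "http://rdf.myexperiment.org/sparql"

FRIENDSHIP_QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
select ?friend1 ?friend2 ?acceptedat
where {
  ?z rdf:type <http://rdf.myexperiment.org/ontology#Friendship> .
  ?z myexp:has-requester ?x .
  ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y .
  ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, FRIENDSHIP_QUERY))
```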

Page 34: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Service Discovery

Feta “old school”
• Semantic discovery
• Ability to find service mismatches
• Complex queries
• Closed curation
• Ugly GUI interface

BioCatalogue
• Discovery by tags, text and semantics
• Social curation
• Web based catalogue

Page 35: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Finding Services

There are over 3500 distributed services. How do we find an appropriate one?

• We need to annotate services by their functions (and not their names!)

• The services might be distributed, but a registry of service descriptions can be central and queried

• Annotated with terms from the myGrid ontology
• Questions we can ask: Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input
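The example question on this slide can be read as a filter over a central registry of annotated service descriptions. The following sketch illustrates the idea with a tiny in-memory registry; the service names, field names and annotation terms are invented for the example and are not taken from the myGrid ontology or from Feta.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceDescription:
    """One registry entry: a service annotated by function, not by name."""

    name: str
    task: str
    input_type: str
    input_format: str


# Hypothetical registry content.
REGISTRY = [
    ServiceDescription("align_a", "multiple sequence alignment", "protein sequence", "FASTA"),
    ServiceDescription("align_b", "multiple sequence alignment", "DNA sequence", "FASTA"),
    ServiceDescription("search_c", "similarity search", "protein sequence", "FASTA"),
    ServiceDescription("align_d", "multiple sequence alignment", "protein sequence", "EMBL"),
]


def find_services(
    registry: List[ServiceDescription],
    task: str,
    input_type: str,
    input_format: str,
) -> List[ServiceDescription]:
    """Return the services whose annotations match all three criteria."""
    return [
        service
        for service in registry
        if service.task == task
        and service.input_type == input_type
        and service.input_format == input_format
    ]


if __name__ == "__main__":
    for hit in find_services(
        REGISTRY, "multiple sequence alignment", "protein sequence", "FASTA"
    ):
        print(hit.name)
```

Exact string matching is the simplest possible choice. An ontology-backed registry could also return services annotated with more specific terms, which this sketch does not attempt.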

Page 36: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

myGrid Ontology

Logically separated into two parts:
• Service ontology
  – Physical and operational features of web services
• Domain ontology
  – Vocabulary for core bioinformatics data, data types and their relationships

Ontology developed in OWL

Page 37: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

myGrid ontology

• Example: BLAST (from the DDBJ)
  – Performs task: Alignment
  – Uses method: Similarity Search Algorithm
  – Uses resources: DNA/Protein sequence databases
  – Inputs:
    • biological sequence
    • database name
    • blast program
  – Outputs: Blast Report

Page 38: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Feta Search Result

Page 39: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Limitations of the Current Model

• Feta discovery tool is only accessible from the Taverna Workbench

• Only pertinent to Taverna users – other people need to find and use web services

• Focuses on finding services, but not workflows. For reuse, we need to do both

• Closed annotation system - myGrid curator provides service descriptions

Page 40: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

BioCatalogue: A Community Resource

• Expanding annotation to allow the community to join in
• What is the minimum annotation we need to find the service, and to execute it?
• Graduated annotation – bronze, silver, gold, platinum
• Record who annotated what and when, to address service versioning and status
• Service status monitors

Page 41: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

[Diagram: curation cycle linking Curation by Developers, Automated Curation, Curation by Experts and Curation by the Community, with arrows labelled seed, refine and validate]

BioCatalogue
Joint Manchester-EBI
Launch ISMB 2009

Page 42: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Current work

Page 43: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Speed and Scalability

Taverna 2 enactor
• Support for long running workflows
• Large scale data – industrial bioinformatics
• Data streaming
• Passing data by reference
• Integration with established computing platforms
  – caGrid, EGEE, KnowArc, Dutch e-Science Grid

Page 44: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

caGrid Plugin for Taverna

• Enables discovery of services in caGrid service registry

• Taverna support for GAARDS-secured caGrid services

Lymphoma type prediction workflow

Page 45: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Extensibility and ease of use

• Drag and drop workflow building
• More content
  – greater pool of workflows from myExperiment
• More components
  – Gathering together commonly used sets of services
• Service and workflow annotation checking
• Shim libraries – for connecting incompatible services
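A shim, as mentioned in the last bullet, is a small adapter placed between two services whose output and input formats do not match. The sketch below shows the idea for one common case, a service that returns FASTA feeding a service that expects a bare sequence string. Both services are hypothetical local stand-ins; only the shim in the middle is the point of the example.

```python
def fetch_sequence(accession: str) -> str:
    """Hypothetical upstream service: returns a record in FASTA format."""
    return f">{accession} example record\nATGGCC\nTTAA\n"


def count_bases(sequence: str) -> int:
    """Hypothetical downstream service: accepts only a plain sequence string."""
    if not sequence.isalpha():
        raise ValueError("expected a plain sequence without header or line breaks")
    return len(sequence)


def fasta_to_plain_sequence(fasta_text: str) -> str:
    """Shim: drop FASTA header lines and join the remaining sequence lines."""
    sequence_lines = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        sequence_lines.append(line)
    return "".join(sequence_lines)


def run_with_shim(accession: str) -> int:
    """Connect the two incompatible services through the shim."""
    return count_bases(fasta_to_plain_sequence(fetch_sequence(accession)))


if __name__ == "__main__":
    print(run_with_shim("X1"))
```

Collecting such adapters in a library means each format mismatch only has to be solved once and can then be reused across workflows.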

Page 46: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Remote Execution

Taverna Remote Execution Service (T-REX)
• Running workflows on a server
• Running workflows inside other applications

Taverna is for informatics people (bioinformaticians, cheminformaticians etc.). We need other interfaces for uptake by laboratory scientists and health workers.

Page 47: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Toolkits “Taverna Inside”

Workflows under the hood
• e-Laboratories (portals)
  – Systems Biology, e-Health
• Web based execution
  – Running workflows over the web through myExperiment
• Visualisation clients that call workflows in the background

Page 48: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

UTOPIA

Pettifer, Kell, University of Manchester

Page 49: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

Toolkits “Taverna Inside”

Workflow development pipeline

Workflows developed by bioinformaticians

Enacted locally

E-Labs and 3rd party clients

Social support for bioinformaticians to find and reuse workflows and expertise

Access to ready made workflows for biologists

Workflows enacted locally Taverna remote execution service (T-Rex)

Social support to find and reuse workflows and expertise

CONFIGURABLE access to ready made workflows for biologists

Workflows embedded in applications and combined with data management systems

Page 50: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

myGrid Team

Page 51: Taverna and myExperiment:  Designing, Exchanging and Sharing of Scientific Workflows

More Information
• myGrid
  – http://www.mygrid.org.uk
• Taverna
  – http://taverna.sourceforge.net
• myExperiment
  – http://www.myexperiment.org
  – http://rubyforge.org/projects/myexperiment/
  – http://wiki.myexperiment.org/
• BioCatalogue
  – http://www.biocatalogue.org

Thanks to Carole Goble, David De Roure, Stian Soiland-Reyes and Jiten Bhagat for slide contributions