BioIT Europe 2010 - BioCatalogue


BioCatalogue presentation at BioIT Europe Hannover 2010 by prof Carole Goble


The Reality of Web Services in the Life Sciences

Professor Carole Goble
carole.goble@manchester.ac.uk

University of Manchester, UK
myGrid Project

BioIT World Europe 2010, Hannover

http://www.biocatalogue.org

Web Services

• Programmatic Interfaces to Services.

• Machine-Machine communication

• Software Lego™ that works across the web and underpins enterprise SOA.

• Standard interfaces.
• Two big families:
– SOAP and REST.

Programmatic Interfaces to Services on the up…..

• Specialisation and segregation of methods from monolithic servers.

• Component packaging.
• Publishing data and analyses.
• Tools / resources integration.
• Applications, analytic workflows, workbenches and enterprise platforms
• Agile software development
• Remote and in-house execution
• Loosely coupled systems.

http://www.myexperiment.org/workflows/158.html

Service Providers and Consumers

• Core facility (EMBL-EBI, DDBJ, NCBI …)

• EMBL-EBI 8-10 million hits/month
• 329 services

• Community projects and labs

• Single Investigator projects

• Enterprises (e.g. Pharmas)

Public Private

Web Service Rhetoric

• Pistoia Alliance

• BioIT Alliance

• ELIXIR

• But not all rosy … see Christian Hauck’s talk 16.00 Thursday.

Web Service Technology Standards

• Simple Object Access Protocol (SOAP)
– Remote Procedure Call based
– HTTP used as a transport protocol only
– Web Service Description Language (WSDL) in XML, UDDI registry
– Extensible

• Representational State Transfer (REST)
– Resource (document) style
– HTTP and URI as the application protocol
– XML and JSON responses, usually
– GET / PUT / POST
– Lightweight, webby
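The contrast between the two families can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual interface: the `example.org` endpoints, the `getSequence` operation and the `urn:example` namespace are all hypothetical. The requests are built but never sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request
from xml.sax.saxutils import escape

# Hypothetical endpoints, used only for illustration.
SOAP_ENDPOINT = "http://example.org/soap/SequenceService"
REST_BASE = "http://example.org/rest/sequences"


def build_soap_request(accession: str) -> Request:
    """SOAP style: POST an XML envelope that names the remote procedure."""
    envelope = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soapenv:Envelope '
        'xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '
        'xmlns:ex="urn:example">'
        "<soapenv:Body>"
        f"<ex:getSequence><accession>{escape(accession)}</accession></ex:getSequence>"
        "</soapenv:Body>"
        "</soapenv:Envelope>"
    )
    return Request(
        SOAP_ENDPOINT,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        method="POST",
    )


def build_rest_request(accession: str) -> Request:
    """REST style: the resource is identified by the URI, the verb is plain HTTP GET."""
    query = urlencode({"format": "json"})
    return Request(f"{REST_BASE}/{quote(accession, safe='')}?{query}", method="GET")


soap = build_soap_request("P12345")
rest = build_rest_request("P12345")
print(soap.get_method(), soap.full_url)
print(rest.get_method(), rest.full_url)
```

In the SOAP request the operation lives inside the message body, so HTTP only carries it; in the REST request the URI and the HTTP verb together say what is being asked for.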

Bio Service Special Flavours

• Distributed Annotation Services (www.biodas.org)

• BioMOBY (www.biomoby.org)

• SADI

• SSWAP (iPlant Collaborative)

Where…can I find them? advertise mine?

What…do they do? can I use them?

How…do they work? up to date? reliable?

Who…provides them? recommends them? knows about them?

Reusing Public and Third Party Web Services

Web Service Description Language

<wsdl:message name="getGlimmersResponse">
  <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
</wsdl:message>
<wsdl:message name="aboutServiceRequest"/>
<wsdl:message name="getGlimmersRequest">
  <wsdl:part name="in0" type="xsd:string"/>
  <wsdl:part name="in1" type="xsd:string"/>
  <wsdl:part name="in2" type="xsd:string"/>
  <wsdl:part name="in3" type="xsd:string"/>
  <wsdl:part name="in4" type="xsd:string"/>
  <wsdl:part name="in5" type="xsd:string"/>
  <wsdl:part name="in6" type="xsd:string"/>
  <wsdl:part name="in7" type="xsd:int"/>
  <wsdl:part name="in8" type="xsd:string"/>

Pathport Web service from the Virginia Bioinformatics Institute http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsd

Name of the service

Uninformative names for parameters

What kind of string?
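A curator could flag this kind of description automatically. The sketch below is one possible check, not part of BioCatalogue: it parses a well-formed excerpt modelled on the Pathport fragment (only three of the nine request parts are reproduced) and reports message parts whose names look auto-generated. The `GENERIC_NAME` pattern and the function name are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Well-formed excerpt modelled on the fragment above; shortened for illustration.
WSDL_EXCERPT = """
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <wsdl:message name="getGlimmersRequest">
    <wsdl:part name="in0" type="xsd:string"/>
    <wsdl:part name="in7" type="xsd:int"/>
    <wsdl:part name="in8" type="xsd:string"/>
  </wsdl:message>
  <wsdl:message name="getGlimmersResponse">
    <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
  </wsdl:message>
</wsdl:definitions>
"""

# Assumed heuristic: names such as in0, arg3 or param12 carry no meaning.
GENERIC_NAME = re.compile(r"^(in|out|arg|param|input|output)\d+$", re.IGNORECASE)


def uninformative_parts(wsdl_text):
    """Return (message, part, type) triples whose part name looks auto-generated."""
    root = ET.fromstring(wsdl_text)
    findings = []
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        for part in message.iter(f"{{{WSDL_NS}}}part"):
            name = part.get("name", "")
            if GENERIC_NAME.match(name):
                findings.append((message.get("name"), name, part.get("type")))
    return findings


for message, part, xsd_type in uninformative_parts(WSDL_EXCERPT):
    print(f"{message}: part '{part}' ({xsd_type}) needs a human-readable annotation")
```

A check like this only finds the gap. Saying what `in0` actually means still needs a person who knows the service, which is the role of the annotations discussed later.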

Services In the Wild

Find
• EMBOSS clustalw program called ‘emma’

Execute
• SOAP / REST / Quasi-REST / REST-like

Understand
• Input0: string, Output0: string
• What does SeqRet actually do?
• Example data? Parameter configurations? Input-Output correlations?

Use
• Quality of Service, Monitoring, Robustness
• Volatility, Sustained, License, Conditions of Use

Cataloguing to avoid reinvention

• Investigator and project specific registries

• Community lists

• Specialist registries

• General catalogues and search engines

An Open, Public, Curated, Boutique Catalogue for Web Services serving the Life Sciences for the Bioinformatics Community

http://www.biocatalogue.org

Launched June 2009

Nucl Acids Res, June 2010, Web Servers issue doi: 10.1093/nar/gkq394

UNDERSTAND and USE

[Figure: service categories] Protein Seq. Alignment, Protein Structure Prediction, Protein Function Prediction, Nucleotide Seq. Alignment, RNA Structure Prediction, Gene Prediction, Text Mining, Ontology, Phylogeny, Microarray, Sequence Retrieval, Identifier Retrieval, Structure Retrieval, Literature Retrieval, Genomics, Proteomics, Systems Biology, Biostatistics, Chemoinformatics

Service Coverage

1719 services – SOAP and REST
– 92% with service description
– 57.5% with all ops/methods described

>60 classifications

Big players: EBI, NCBI, DDBJ etc.

60 operations on chemistry and chem-informatics data

[June 09 - Sep10]

Steady use: 2K+ unique IPs/month.

• Chiefly public services
• Community contributed
– Service Providers: 127
– Third Parties: 92 submitters
– 420 registered members
– 27 countries (UK>Spain>USA>Canada)

• Partners and registries
– EMBRACE Registry, SeekDa!, (BioMOBY, DAS)

• Automated crawling
• Manual mining

Building Content and Community

EMBL-EBI

DDBJ

NCBI

But these statistics have to be interpreted…..

Curation

Change logs

Quantitative Annotations

Tags

Semantic Annotations

Ontologies

Functional Capabilities

Provenance

Operational Capabilities

Operational Metrics

Use Policy

Social Status

Ratings

Attribution

Free text

Instrumentation

Usable and Useful

Understandable

Annotations

Bio-Services
• EDAM
• myGrid
• BioMOBY…

Bio-ontologies
• OBO Foundry
• BioPortal…

Services
• WSMO
• SAWSDL
• SA-REST…

Incremental Annotation

50,672

• accumulate, aggregation, types, attribution

Archived Service

Annotations

Attribution

Tagging

Social

Annotate Anything

Categories

Operations, Inputs, Outputs

Example use

• Availability
• API changes
• Test script sandbox
• Based on EMBRACE Registry Monitoring Framework
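The first two monitoring checks can be sketched as follows. This is an assumed, simplified design, not the EMBRACE or BioCatalogue implementation: an API change is detected by comparing a hash of the service description between checks, and availability is the fraction of checks in which the service was reachable. The class and function names are hypothetical, and fetching the description over the network is left out.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


def fingerprint(description: str) -> str:
    """Hash a service description, ignoring blank lines and surrounding whitespace."""
    lines = (line.strip() for line in description.splitlines())
    normalised = "\n".join(line for line in lines if line)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class ServiceMonitor:
    last_fingerprint: Optional[str] = None
    checks: List[bool] = field(default_factory=list)

    def record(self, reachable: bool, description: Optional[str] = None) -> bool:
        """Record one check; return True if the description changed since the last one seen."""
        self.checks.append(reachable)
        if not reachable or description is None:
            return False
        new = fingerprint(description)
        changed = self.last_fingerprint is not None and new != self.last_fingerprint
        self.last_fingerprint = new
        return changed

    def availability(self) -> float:
        """Fraction of recorded checks in which the service was reachable."""
        return sum(self.checks) / len(self.checks) if self.checks else 0.0


monitor = ServiceMonitor()
monitor.record(True, "<definitions><message name='run'/></definitions>")
monitor.record(False)  # the service did not answer this time
if monitor.record(True, "<definitions><message name='runJob'/></definitions>"):
    print("description changed: add an entry to the change log")
print(f"availability: {monitor.availability():.0%}")
```

A hash comparison only shows that something changed, not what. Deciding whether the change breaks clients still needs a diff or a test script run.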

Social Sharing

Feeds

WSDL, SAWSDL, SA-REST, WSMO

RDF and SPARQL

Service annotation formats

Gadgets, Apps

Customised and Private instances

A service / resource

Open Source (BSD)

Open Platform

Read (Write) REST APIs

EDAM, BioMOBY, myGrid, OBO family, BioXSD

Annotation Ontologies

People Powered Content

Reward and Attribution

Sensitivities

Tools

Bringing a Community together

Automation

Core Contribution & Curation

Coordination

Governance

Content Capture & Curation

Governance Blackhole

• Submission
• Content
• Ownership / submitter / curator responsibilities
• Responsibility migrations
• Service update
• Metadata update
• Notifications
• Withdrawal
• Take-down
• Archiving
• Preservation

Curating third party services is HARD

The Reality of Web Services in the Life Sciences

The Reality of (Expert) Crowd Sourcing Contributions for a Web Service Catalogue

Eight years ago Lincoln Stein said…

“An interface is a contract between data provider and data consumer”

Stein L. Creating a bioinformatics nation. Nature 2002;417:119-120.

A Public interface means a Public Service

• Thinking local not global
– Local configuration bake-ins
– Scalability – I/O and load
– Interface granularity and interaction chattiness

• Interface churn
– Silent API volatility
– BioCatalogue Change logs
– Web Interface trumps API
– Local application trumps dependent external ones

Ensembl API: updated on every release, not backward compatible with obscured versioning.

BioMART: exposed internal identifier formats and then changed them.

Preservation

(Public) Service Sustainability

Staff/funding/project churn
• 2 year availability, responsibility migration/hole, service decay -> application decay
• 58% developed by students, 24% stated not maintained
• (Schultheiss et al. (2010) PLoS Comp Biol (in review))
• 146 services archived, >90% availability

Sustainability strategy
• Make it portable
• Provide documentation
• Use existing frameworks and practices
• Involve the community and know your users
• Plan sunset or migration
• Funding models for sustainability

Schultheiss et al. (2010) PLoS Comp Biol (in review)

Geek Usability

Quasi-Standards

• http://xml.nig.ac.jp/rest/Invoke?service={x}&method={y}&...

• Which service? Need to know precisely what is expected for every service at the same endpoint

• http://xml.nig.ac.jp/{service}/{method}?...
• Service-method pairs

y

like

http://BASE/op?parameter={value}
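The two URL shapes quoted above can be compared side by side with a small Python sketch. It only builds URLs from the templates on this slide and makes no claim that the resulting addresses resolve. The service name `Blast` and the `program` parameter value are placeholders chosen for illustration, and both function names are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE = "http://xml.nig.ac.jp"  # host taken from the URL templates above


def invoke_style_url(service, method, **params):
    """Single endpoint: the service and method travel as ordinary query parameters."""
    query = urlencode({"service": service, "method": method, **params})
    return f"{BASE}/rest/Invoke?{query}"


def path_style_url(service, method, **params):
    """Service-method pair in the path: each pair gets its own address."""
    path = f"{BASE}/{quote(service, safe='')}/{quote(method, safe='')}"
    return f"{path}?{urlencode(params)}" if params else path


# Placeholder service, method and parameter, for illustration only.
print(invoke_style_url("Blast", "searchSimple", program="blastn"))
print(path_style_url("Blast", "searchSimple", program="blastn"))
```

With the single-endpoint form, nothing in the address distinguishes one service from another, so a client has to know in advance which parameters each `service` and `method` combination expects.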

Usability: The What and How are Implicit knowledge

• No or lots of docs, poor examples
• Complexity
• Interfaces and Operation
• Service families

Service

Operation

Input

Output

Parameters

Errors

Behaviour families

Function

Polymorphic

Patterns

e.g. KEGG, TFmodeller

e.g. searchSimple operation in BLAST DDBJ

e.g. InterProScan (EBI), RapidMiner, Soaplab Server

Domain Tasks

Invocable operations

query database program

searchSimple

Polymorphic: one operation, multiple functional units

BLAST (DDBJ)
1 Operation: searchSimple
5 Functional units

Functional unit                   Program   Database
proteinBlast                      blastp    protein (PD)
nucleotideBlast                   blastn    nucleotide (ND)
proteinNucleotideBlast            tblastn   nucleotide (ND)
nucleotideProteinBlast            blastx    protein (PD)
nucleotideBlastFrameTranslation   tblastx   nucleotide (ND)

PD: protein sequence database
ND: nucleotide sequence database
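One way to picture the functional-unit idea is a thin layer that maps each named task onto the single polymorphic operation. The Python sketch below is an illustration under assumptions, not the DDBJ interface: the argument names `program`, `database` and `query` follow the inputs listed for `searchSimple` above, the database names in `DATABASE_KINDS` are hypothetical, and no service is actually called.

```python
# Functional unit -> (BLAST program, kind of database it searches), as listed above.
FUNCTIONAL_UNITS = {
    "proteinBlast": ("blastp", "PD"),
    "nucleotideBlast": ("blastn", "ND"),
    "proteinNucleotideBlast": ("tblastn", "ND"),
    "nucleotideProteinBlast": ("blastx", "PD"),
    "nucleotideBlastFrameTranslation": ("tblastx", "ND"),
}

# Hypothetical database names and their kinds (PD = protein, ND = nucleotide).
DATABASE_KINDS = {"example_protein_db": "PD", "example_nucleotide_db": "ND"}


def search_simple_arguments(unit, query, database):
    """Translate a task-level request into arguments for the one searchSimple operation."""
    program, required_kind = FUNCTIONAL_UNITS[unit]
    actual_kind = DATABASE_KINDS[database]
    if actual_kind != required_kind:
        raise ValueError(
            f"{unit} needs a {required_kind} database, but {database} is {actual_kind}"
        )
    return {"program": program, "database": database, "query": query}


print(search_simple_arguments("nucleotideProteinBlast", "ATGGCC", "example_protein_db"))
```

The caller picks a task by name and gets an early error for a mismatched database, instead of discovering the valid program and database combinations by trial and error.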

Server Wrapper Pattern

• SOAPLab services operations

• clear | describe | getLastEvent | getResults | getResultsInfo | getStatus | run | runAndWaitFor | terminate | waitfor |

• All 100 or so services have same WSDL document.
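Why an identical interface says so little about function can be shown with an in-memory stand-in. The sketch below is not a Soaplab client: the operation names `run`, `getStatus` and `getResults` come from the list above, but their signatures, the job identifiers and the status strings `RUNNING` and `COMPLETED` are assumptions made for illustration.

```python
import itertools


class InMemoryWrappedTool:
    """Stand-in for one wrapped analysis tool; every instance exposes the same operations."""

    def __init__(self, tool, polls_until_done=2):
        self._tool = tool
        self._polls_until_done = polls_until_done
        self._jobs = {}
        self._ids = itertools.count(1)

    def run(self, inputs):
        """Start a job and return its identifier."""
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"inputs": inputs, "polls": 0}
        return job_id

    def getStatus(self, job_id):
        """Report RUNNING until the job has been polled often enough, then COMPLETED."""
        job = self._jobs[job_id]
        job["polls"] += 1
        return "COMPLETED" if job["polls"] >= self._polls_until_done else "RUNNING"

    def getResults(self, job_id):
        return {"output": self._tool(self._jobs[job_id]["inputs"])}


def run_and_wait_for(service, inputs, max_polls=10):
    """Generic client: identical code drives any wrapped tool."""
    job_id = service.run(inputs)
    for _ in range(max_polls):
        if service.getStatus(job_id) == "COMPLETED":
            return service.getResults(job_id)
    raise TimeoutError(f"{job_id} did not complete within {max_polls} polls")


# Two very different tools behind exactly the same interface.
reverse = InMemoryWrappedTool(lambda data: data["sequence"][::-1])
length = InMemoryWrappedTool(lambda data: len(data["sequence"]))
print(run_and_wait_for(reverse, {"sequence": "ACGT"}))
print(run_and_wait_for(length, {"sequence": "ACGT"}))
```

The client code is the same for both tools, so the interface description alone cannot tell a user which one reverses a sequence and which one measures it.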

The SOAP/REST technical view over services is not enough

Need a functional / task-oriented view

Functional Unit annotation

• Service description abstraction

• Services as functional tasks

• Within the boundary of a service

• Independent from technology used

Service

Operation

WSDL

REST

DAS

[Missier, et al 2010 Functional Units: Abstractions for Web Service Annotations]

Complexity because it’s a database really

SABIO–RK Service only

Taverna workflow

find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions.

Reflections

• Writing reusable, reliable (public) services with good and stable interfaces for others is hard

• A service interface is different to a web interface or a database query interface.

• Public interfaces – internal interfaces mismatch
• Publishing an interface is a publishing step.
• Technologist – User mismatch
• Eat your own dog food
• Takes resource, time and trouble
• But will pay off! We can’t afford to reinvent.

Enterprise Concerns: real or perceived?

• Security
– HTTPS trusted peers inside a firewall
– WS-Security and OAuth (REST)
– Or is it fear of using external data?

• Performance
– Signature granularity and chattiness
– Data shipping vs reference shipping
– XML and JSON are not the only formats

• Governance
– Service Level Agreements

Technical or social issues?

Collaborative Curating

• Socialising the community
• Rewarding contributors

• 10:90 long tail rule
• Content feedback spiral

• Feedback sensitivities
• Reputation protection

• Widen - Smart application feeds

• Resourced core content team

Cost of Crowd Curation

Take home

• Emerging, evolving, exciting and challenging Web service ecosystem

• BioCatalogue draws together services, knowledge and community to provide intelligence.

• Crowd collaboration to scale contribution, core to coordinate

• Open effort – contribute or adopt
• Core resource – for Alliances and Journals

• Social + technical challenges
• Christian Hauck’s talk 16.00 Thursday.

Credits

Thomas Laurent
Hamish McWilliams
Franck Tanoh
Jiten Bhagat
Carole Goble
Rodrigo Lopez
Eric Nzuobontane
Steve Pettifer
Katy Wolstencroft
Robert Stevens
David De Roure
Mannie Tagarira
Jerzy Orlowski
Sergejs Aleksejevs

Thank You

http://www.biocatalogue.org

About Us - http://wiki.biocatalogue.org

API Docs - http://apidocs.biocatalogue.org


Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: BioCatalogue: a universal catalogue of web services for the life sciences. Nucl. Acids Res., 2010. doi:10.1093/nar/gkq394
