myGrid: Upper-level Grid Services for the Bioinformatician
Prof. Carole Goble
http://www.mygrid.org.uk
Sun Microsystems BioGrid Symposium, Baltimore, USA, 4th-5th December 2002
UK eScience Programme
• Grid-enabled eScience
• Emphasis on information integration and knowledge management
• The Virtual Organisation view
• $180 million + industrial contributions
• Complete infrastructure of regional eScience centres, support and a UK computational Grid
• Started on Globus, though Unicore is used in EuroGrid with great success
• Centres donated equipment – highly heterogeneous
• Core component of the EU Grid FP6 programme
[Map of UK eScience centres: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL, Hinxton]
myGrid
• EPSRC UK eScience pilot project
• 01/01/02 - end 30/03/05
• Uses the UK Grid infrastructure
• Partners: IBM, Lion BioSciences, Millennium Pharmaceuticals & Oracle
• Not a computational grid project
• Building Grid middleware
• Higher level services: workflow, databases, knowledge management, provenance…
• Service-based: Open Grid Service Architecture early adopter
• Bioinformatics services are published as Web services and Grid Services
• Working with publicly available biological resources: e.g. EMBL-EBI
What is the Grid?
• Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
• On-demand, ubiquitous access to computing, data, and all kinds of services
• New capabilities constructed dynamically and transparently from distributed services
• No central location, No central control, No existing trust relationships, Little predetermination
• Uniformity, Pooling & Virtualisation
What is the Grid?
• In silico experiments
– Information harvesting & PSE (problem solving environments)
– Dynamically forming virtual organisations to solve problems
– Describing, searching for and weaving resources: people, applications, databases, content, instruments
– Orchestrating resources
– Support for scientific method: provenance, argumentation, opinion contextualisation etc.
• BioUtility & communities of practice
[Figure: the "E-Scientists" Environment above a Knowledge Grid, an Information Grid and a Data/Computation Grid]
Information Weaving
• Large amounts of different kinds of data & many applications.
• Highly heterogeneous.
– Different types, algorithms, forms, implementations, communities, service providers
• High autonomy.
• Highly complex and inter-related, & volatile.
• Much of it textual narrative
Circadian Rhythms
1. Has anyone else studied the effect of neurotransmitters on the circadian rhythms in Drosophila?
2. I’ve got a cluster of proteins from my experiment. How do their functions interrelate? And what are the proteins with a particular function?
3. Is a structure known for my protein? What other proteins have a similar structure?
4. Can I build a homology 3D model?
5. What is known about a homologous protein?
e-Science Q & A
• Who else has asked this question & can I use/adapt their approach?
– Workflow.
• What were the results at each stage?
– Dynamic Data Repositories.
• When was P12345 last updated? Which BLAST did I use?
– Provenance.
• Has PDB changed since I last ran this?
– Notification.
• Personalisation.
Courtesy of Mark Wilkinson (BioMOBY)
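The Provenance and Notification answers can be illustrated together: if each run records which release of each database it used, "Has PDB changed since I last ran this?" becomes a comparison against the current releases. A minimal sketch; `RunRecord`, `stale_runs` and the release labels are hypothetical, not part of myGrid's API.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical provenance record: which release of each database a run used."""
    run_id: str
    db_releases: dict = field(default_factory=dict)


def stale_runs(runs, current_releases):
    """Return (run_id, database) pairs whose database release has changed since the run."""
    changed = []
    for run in runs:
        for db, release in run.db_releases.items():
            # Databases with no known current release are treated as unchanged.
            if db in current_releases and current_releases[db] != release:
                changed.append((run.run_id, db))
    return changed


runs = [
    RunRecord("run-1", {"PDB": "r1", "SWISS-PROT": "r7"}),
    RunRecord("run-2", {"PDB": "r2"}),
]
print(stale_runs(runs, {"PDB": "r2", "SWISS-PROT": "r7"}))  # [('run-1', 'PDB')]
```

In a deployed system the check would more plausibly be triggered by a notification from the database provider than by polling; that wiring is not shown.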
myGrid
• Service based architecture
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
• Resource Interoperation
– Workflow coordination & Database integration.
– Experimental workflows rather than production workflows.
• Experimentation
– Provenance & Change Propagation
– Personalisation & Collaborative working.
• Security & ownership
• Knowledge-based, using metadata and ontologies
[Figure labels: RASMOL, Metadata, Knowledge (ontologies)]
Low level Grid Common Services (OGSI): Co-scheduling, data shipping, authentication, job execution, resource monitoring, database access …
Middle level Grid Common Services: Database access, distributed query processing, service discovery, workflow enactment, event notification
Upper level knowledge-based Grid Common Services: Semantic integration, knowledge based querying, workflow composition, visualisation, provenance mgt, semantic service discovery
Provenance, Personalisation, Security
BioMedical Services Library: DAS, workflow sets, integrated databases
Web Portal
Carp Gene expression analysis
TALISMAN annotation workbench
Workbench
Who is myGrid for?
myGrid users: biologists, IS specialists, infrequent and problem-specific users, bioinformaticians, tool builders, service providers, systems administrators, bioinformatics tool builders
myGrid Outcomes
• e-Scientists
– Environment built on toolkits for service access, personalisation & community.
– Talisman: InterPro family of pattern databases annotation
– UTOPIA: visual multiple sequence alignment
– Workbench for gene expression in Carp & Graves' disease
• Developers
– Protocols and service descriptions.
– myGrid-in-a-Box developers kit of core services.
– Reference implementation services & applications.
– Bio services.
Service based architecture
• Each bio resource is a service
– Database, archive, analysis, tool, person, instrument, a workflow …
• Each myGrid architectural component is a service
– Workflow enactment engine, event notification, registry, scheduler…
• OGSA early adopter.
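Treating every resource and every component as a service implies a registry through which services are published, discovered and decommissioned. The toy registry below is a sketch under assumptions: the `Service` fields, the `kind` vocabulary and the example.org endpoints are hypothetical, and an OGSA registry would be a remote service rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Service:
    name: str
    kind: str      # hypothetical vocabulary: "database", "analysis", "workflow-engine", ...
    endpoint: str  # hypothetical address


class ServiceRegistry:
    """Toy in-memory registry: publication, discovery and decommissioning."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, service: Service) -> None:
        # Republishing under the same name replaces the old description.
        self._services[service.name] = service

    def discover(self, kind: str) -> List[Service]:
        return [s for s in self._services.values() if s.kind == kind]

    def decommission(self, name: str) -> None:
        # Decommissioning an unknown name is a no-op.
        self._services.pop(name, None)


registry = ServiceRegistry()
registry.publish(Service("BLAST", "analysis", "http://example.org/blast"))
registry.publish(Service("EMBL", "database", "http://example.org/embl"))
registry.publish(Service("enactor", "workflow-engine", "http://example.org/enactor"))
print([s.name for s in registry.discover("analysis")])  # ['BLAST']
registry.decommission("BLAST")
print(registry.discover("analysis"))  # []
```

The architectural components (here the hypothetical "enactor") are registered exactly like the bio resources, which is the point the slide makes.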
[Diagram: Web services + Grid protocols → Open Grid Service Architecture]
Metadata + ontology
• Service registration, discovery, publication, composition, management.
• Data types & ontologies
• Service matchmaking
• Ontology editor, deployment server & reasoner
• Typing inputs and outputs of workflows
• Semantic Database integration
• Portal driving ….
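Service matchmaking over typed inputs and outputs can be sketched with a subsumption check: a service fits if the user's data is a kind of the service's input type and the service's output is a kind of the requested type. This sketch assumes a hand-written single-inheritance concept tree; the concept names and service descriptions are hypothetical, and a simple tree walk stands in for the DAML+OIL/OWL reasoner the slide mentions.

```python
from typing import Optional

# Hypothetical fragment of a bioinformatics ontology: concept -> parent concept.
PARENT = {
    "BiologicalSequence": "Thing",
    "ProteinSequence": "BiologicalSequence",
    "NucleotideSequence": "BiologicalSequence",
    "FunctionalAnnotation": "Thing",
    "GOTermList": "FunctionalAnnotation",
    "SequenceHits": "Thing",
}

# Hypothetical service descriptions: (name, input concept, output concept).
SERVICES = [
    ("blastp", "ProteinSequence", "SequenceHits"),
    ("go_annotator", "BiologicalSequence", "GOTermList"),
    ("translate", "NucleotideSequence", "ProteinSequence"),
]


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if `concept` is `ancestor` or a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def match(have: str, want: str) -> list:
    """Services whose input accepts `have` and whose output is a kind of `want`."""
    return [name for name, inp, out in SERVICES if is_a(have, inp) and is_a(out, want)]


print(match("ProteinSequence", "FunctionalAnnotation"))  # ['go_annotator']
print(match("NucleotideSequence", "BiologicalSequence"))  # ['translate']
```

Note the asymmetry: `go_annotator` accepts any `BiologicalSequence`, so it also matches a protein; `blastp` is excluded only because its output is not a kind of functional annotation.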
[Diagram: Web services, Grid protocols, OGSA and the Semantic Web (W3C: RDF, DAML+OIL, OWL)]
1. The user selects values from a drop-down list to create a property-based description of the required service. Values are constrained to offer only sensible alternatives.
2. Once the user has entered a partial description, they submit it for matching. The results are displayed below.
3. The user adds the operation to the growing workflow.
4. The workflow specification is complete and ready to match against those in the workflow repository.
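Steps 1 and 2 amount to validating a partial, property-based description against a fixed vocabulary and then filtering service descriptions by it. A sketch under stated assumptions: the property names, allowed values and the two service descriptions are invented for illustration, and exact equality is a simplification of the ontology-driven matching the slides describe.

```python
# Hypothetical vocabulary: the values each drop-down list would offer.
ALLOWED = {
    "task": {"alignment", "retrieval", "annotation"},
    "input": {"protein_sequence", "accession"},
    "output": {"alignment_report", "sequence_record", "go_terms"},
}

# Hypothetical property-based descriptions of registered services.
DESCRIPTIONS = {
    "blastp": {"task": "alignment", "input": "protein_sequence", "output": "alignment_report"},
    "emblfetch": {"task": "retrieval", "input": "accession", "output": "sequence_record"},
}


def validate(partial: dict) -> None:
    """Reject any property or value the drop-down lists would not offer."""
    for prop, value in partial.items():
        if value not in ALLOWED.get(prop, set()):
            raise ValueError(f"{value!r} is not an allowed value for {prop!r}")


def match_partial(partial: dict) -> list:
    """Names of services whose description agrees with every property given so far."""
    validate(partial)
    return sorted(
        name
        for name, desc in DESCRIPTIONS.items()
        if all(desc.get(prop) == value for prop, value in partial.items())
    )


print(match_partial({"task": "alignment"}))  # ['blastp']
print(match_partial({}))                     # ['blastp', 'emblfetch']
```

An empty description matches everything, so each value the user adds narrows the result list, which mirrors the incremental drop-down interaction.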
Integration & Coordination
• View-based Information Repository for XML data
• Database integration
– Access XML and RDBMS with OGSA-DAI
– Semantic database integration.
– Distributed query processing.
• Workflow
– Dynamic workflow enactment engine.
– Workflow repository
– User interactivity.
– Workflows linked with results
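Linking services into a workflow only makes sense when each step's output can feed the next step's input. As a hedged sketch, a workflow can be checked before enactment by comparing adjacent types; the `Step` record and type names are hypothetical, and exact string equality stands in for the ontology-based typing the slides describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hypothetical workflow step, typed by its input and output."""
    service: str
    input_type: str
    output_type: str


def check_workflow(steps: list) -> list:
    """Report every link where one step's output type differs from the next step's input type."""
    problems = []
    for prev, nxt in zip(steps, steps[1:]):
        if prev.output_type != nxt.input_type:
            problems.append(f"{prev.service} -> {nxt.service}: {prev.output_type} != {nxt.input_type}")
    return problems


good = [
    Step("fetch", "Accession", "ProteinSequence"),
    Step("annotate", "ProteinSequence", "GOTermList"),
    Step("display", "GOTermList", "Image"),
]
bad = [Step("fetch", "Accession", "ProteinSequence"), Step("display", "GOTermList", "Image")]
print(check_workflow(good))  # []
print(check_workflow(bad))   # ['fetch -> display: ProteinSequence != GOTermList']
```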
E-Science Support
• Data provenance and resource change management
– Workflow logs.
– Event notification service.
– Incremental view management.
– Workflow and query evolution.
• Personalisation
– Management of views over repositories.
– Personalisation of process flows.
– Annotation of data sets and workflows
– Dynamic creation of personal data sets.
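An event notification service is typically built on publish/subscribe: a consumer such as an incremental view or a workflow log subscribes to a topic, and a provider publishes when something changes. The class below is a single-process sketch; the topic names are hypothetical, and a Grid notification service would deliver events across the network with delivery guarantees this toy lacks.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, dict], None]


class EventNotifier:
    """Minimal in-process publish/subscribe: callbacks are registered per topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver `event` to every subscriber of `topic`; return how many were notified."""
        subscribers = self._subscribers.get(topic, [])
        for callback in subscribers:
            callback(topic, event)
        return len(subscribers)


received = []
notifier = EventNotifier()
notifier.subscribe("database-updated", lambda topic, event: received.append(event["db"]))
print(notifier.publish("database-updated", {"db": "PDB"}))  # 1
print(notifier.publish("workflow-finished", {}))            # 0
print(received)                                             # ['PDB']
```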
Bio-Science services
• Grid-enabled BioServices by the EMBL-European Bioinformatics Institute
– EMBOSS, SRS, Open BQS, BLAST, XEmbl and EmblFetch, Flybase, Gadfly …
• Applications using Gateway API
– TALISMAN (annotation tool used by InterPro)
– UTOPIA (sequence fingerprint analysis)
• Portal
• Workbench application
How do the functions of a cluster of proteins interrelate?
Some proteins in my personal repository
[Diagram: Portal, Personal Repository, Workflow Repository, Directory, Repository Client, Ontology Client, Workflow Client; metadata: Ontology, Service Type]
Find a service that takes a protein and gives its functions, and pick the best match.
Find another that displays the proteins based on their function. The ontology restricts inputs & outputs.
Build a workflow of composed services linked together
See if an appropriate workflow already exists. It could have been made by anyone willing to share it with you.
Pick one and enact it.
While it's running, it picks the best service instance that can run the service at that time.
[Diagram: Workflow Client, Service Selection Client, Service Directory, Workflow Enactment, Bioinformatic Services, Personal Repository, Repos. Client; data and provenance flows numbered 1 to 4]
Or you choose the instance yourself.
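Choosing the instance while the workflow runs is late binding: the workflow names an abstract service, and the enactor resolves it to a concrete instance at call time, unless the user has picked one. A sketch, assuming queue length as the "best" criterion; the `Instance` fields and host names are hypothetical, and myGrid's actual selection policy is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """One deployed copy of an abstract service (all fields hypothetical)."""
    service: str
    host: str
    queue_length: int  # load figure assumed to be reported at enactment time


def select_instance(instances: List[Instance], service: str,
                    user_choice: Optional[str] = None) -> Instance:
    candidates = [i for i in instances if i.service == service]
    if not candidates:
        raise LookupError(f"no instance of {service!r} is available")
    if user_choice is not None:
        # The user overrides automatic selection by naming a host.
        for inst in candidates:
            if inst.host == user_choice:
                return inst
        raise LookupError(f"{user_choice!r} does not offer {service!r}")
    # Otherwise bind late: take the least-loaded instance right now.
    return min(candidates, key=lambda i: i.queue_length)


instances = [
    Instance("blastp", "host-a", 5),
    Instance("blastp", "host-b", 1),
    Instance("emboss", "host-c", 0),
]
print(select_instance(instances, "blastp").host)                        # host-b
print(select_instance(instances, "blastp", user_choice="host-a").host)  # host-a
```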
The workflow finishes with the final display service
Results are put into your personal repository, with a concept from the ontology to tell you and myGrid what they mean.
A full provenance record is kept and linked with the results, so we could redo or reuse the workflow.
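A provenance record linked with results can be as simple as one log entry per step: which service ran, on which inputs, producing which outputs, and when. Keeping the inputs is what makes a rerun possible. A minimal sketch; `WorkflowRun`, `ProvenanceEntry` and the two stand-in services are hypothetical, and a real enactor would store the log alongside the results in the personal repository.

```python
import datetime
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProvenanceEntry:
    step: str
    service: str
    inputs: dict
    outputs: dict
    started: str  # ISO 8601 timestamp, UTC


@dataclass
class WorkflowRun:
    entries: List[ProvenanceEntry] = field(default_factory=list)

    def run_step(self, step: str, service: str, func: Callable[..., dict], inputs: dict) -> dict:
        """Call `func` with `inputs` and log what ran, with what, and what it produced."""
        started = datetime.datetime.now(datetime.timezone.utc).isoformat()
        outputs = func(**inputs)
        self.entries.append(ProvenanceEntry(step, service, dict(inputs), dict(outputs), started))
        return outputs

    def replay(self, services: Dict[str, Callable[..., dict]]) -> List[dict]:
        """Redo the run: call each logged service again with its recorded inputs."""
        return [services[e.service](**e.inputs) for e in self.entries]


# Hypothetical stand-ins for remote bioinformatics services.
def fake_fetch(acc):
    return {"seq": "MKV"}


def seq_length(seq):
    return {"length": len(seq)}


run = WorkflowRun()
first = run.run_step("1", "fake_fetch", fake_fetch, {"acc": "P12345"})
run.run_step("2", "seq_length", seq_length, first)
print([(e.service, e.outputs) for e in run.entries])
# [('fake_fetch', {'seq': 'MKV'}), ('seq_length', {'length': 3})]
print(run.replay({"fake_fetch": fake_fetch, "seq_length": seq_length}))
# [{'seq': 'MKV'}, {'length': 3}]
```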
HPC vs Bioinformatics
• Computational Biology vs Bioinformatics => HPC vs Info Grid
– Relationship between them? Shared components? Architectures?
– Information management matters!
Accelerating the scientific process is not just accelerating compute-intensive processes.
• HPC-style BioGrid
– Provenance? Personalisation? Metadata? Interactivity? Knowledge? Intermediate results to db; annotated logs…
We are not alone
• Other Efforts
– W3C semantic web, BioMOBY, I3C, OMG LSR, active ontology development in the community, DARPA
• Open Grid Service Architecture
– We believe!! Links with Web Services give many benefits.
– But it's a moving target …
– GGF is a zoo … over 40 research groups (RGs) and working groups (WGs), often overlapping.
Service Providers
• It's hard to get Service Providers' buy-in
– lower the barriers of entry
– make it reliable
– security & intellectual property management
– programmatic interfaces
• How do we migrate legacy applications?
– Whole bunch of apps and databases on the web
• Accounting matters
– Who is going to pay for all this?
Hotch potch
• Heterogeneity sucks
– Multi-policy of everything: security, access, accounting really matters in EU
– Getting a UK Grid to work is non-trivial
– Huge investment in system admin.
• Doing more than you could do before.
– Not just another predictable BLAST service over a bunch of machines
– Non-predictable analysis.
Not a silver bullet! It's just middleware, not magic
• Data quality
• Content management of databases (controlled vocabularies)
• Provenance and versioning policies
• Appropriate use of tools
• Computational inaccessibility of free text annotation
• Database accessibility through means other than point and click web interfaces.
Independent of the Grid!
Life Sciences Grid (LSG)
http://people.cs.uchicago.edu/~dangulo/LSG/
Summing up
• If you ignore the multi-organisational aspect of Grid
• If you ignore the heterogeneous aspect of Grid
• If you assume it's safe and free and fair
• Then it's not so hard.
The myGrid Team
Carole Goble, Norman Paton, Alvaro Fernandes, Stephen Pettifer, Luc Moreau, Dave De Roure, Chris Greenhalgh, Tom Rodden, John Brooke, Paul Watson, Alan Robinson, Rob Gaizauskas, Robert Stevens, Neil Wipat,
Matthew Addis, Nick Sharman, Rich Cawley, Simon Harper, Karon Mee, Simon Miles, Vijay Dailani, Xiaojian Liu, Tom Oinn, Martin Senger, Milena Radenkovic, Kevin Glover, Angus Roberts, Chris Wroe,
Mark Greenwood, Phil Lord, Neil Davis, Darren Marvin, Justin Ferris, Peter Li, Nedim Alpdemir, Luca Toldo, Robin McEntire, Anne Westcott, Tony Storey, Bernard Horan, Paul Smart, Robert Haynes
Spares
Knowledge Services
[Figure labels: Knowledge-based data/computation services; Knowledge-based information services; Data/computation services; Information services; e-Scientist environment; Text mining; Annotation; Base services; Semantic services; Knowledge services; Knowledge applications & networks; Collaboratory; Prediction; Applications; Resources]
myGrid Stack
• Web Portal; Gateway API; Workbench Apps Builder (Talisman); Custom Application; Demonstrator Application; UTOPIA; Workbench Demonstrator; Cold Carp Gene Expression; MSD Sequence annotation; …
• Provenance; Personalisation; Security
• BioMedical Services Library, e.g. Distributed Annotation Service
• User Agent; Presentation Services; Collaboration Support; Management Tools
• Layers: Base Services; Semantic-aware services; Fabric
• Semantic Data Integration; Provenance metadata; Versioning; QoS; Distributed Query; Database; Provenance Validation & Assessment; MIR Database Access; Workflow Enactment; Job Execution; Semantic Workflow Design; Third Party; Ontology Service; Event Notification; Semantic Discovery; Syntactic Discovery; ‘White Pages’ & ‘Yellow Pages’ Discovery; Device Access; Information Extraction
• Knowledge; Metadata; Annotation; Preferences; Reasoner; Availability; Service matcher
Service based architecture
• Find them.
– Publication, registration, discovery, matchmaking, deregistration.
• Organise them.
– Interoperation, composition, substitution.
• Run them.
– Execution, monitoring, exception handling.
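"Run them" pairs naturally with "substitution": when one service fails at run time, the enactor can fall back to an equivalent one and keep a record of each failure. A sketch under assumptions: the candidates are taken to be interchangeable already (in practice that judgement would come from discovery and matchmaking), and the service names and functions are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def run_with_substitution(candidates: Sequence[Tuple[str, Callable[[str], str]]],
                          payload: str) -> Tuple[str, str]:
    """Try equivalent services in order; return (name, result) from the first that succeeds."""
    failures = []
    for name, call in candidates:
        try:
            return name, call(payload)
        except Exception as exc:  # a production enactor would catch narrower errors
            failures.append(f"{name}: {exc}")  # monitoring: record each failure
    raise RuntimeError("all substitutes failed: " + "; ".join(failures))


def unreachable(_payload: str) -> str:
    raise ConnectionError("timeout")


def working(payload: str) -> str:
    return payload.upper()


print(run_with_substitution([("mirror-1", unreachable), ("mirror-2", working)], "mkv"))
# ('mirror-2', 'MKV')
```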