Proteome data integration: characteristics and challenges
K. Belhajjame1, R. Cote4, S.M. Embury1, H. Fan2, C. Goble1, H. Hermjakob, S.J. Hubbard1, D. Jones3, P. Jones4, N. Martin2, S. Oliver1,
C. Orengo3, N.W. Paton1, M. Pentony3, A. Poulovassilis2, J. Siepen, R.D. Stevens1, C. Taylor4, L. Zamboulis2, and W. Zhu4
1 University of Manchester; 2 Birkbeck College
3 University College London; 4 European Bioinformatics Institute
All Hands Meetings, 2005
Outline
Experimental proteomics
ISPIDER architecture
Example use cases
Conclusion
Experimental proteomics
The study of the set of proteins produced by an organism, with the aim of understanding their behaviour under varying conditions
An essential component for elucidating the biological functions of proteins
[Figure: experimental pipeline: Separation (2D gel electrophoresis) → Protein digestion (enzymatic digestion) → Mass spectrometry (MALDI-TOF) → Identification against a protein DB → Protein ID]
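The identification step of this pipeline can be illustrated with a toy peptide mass fingerprinting sketch: digest each database protein in silico, compute the peptide masses, and count how many observed MALDI-TOF peak masses each protein explains. This is a simplified illustration, not the search method used in ISPIDER. It assumes monoisotopic masses, a bare trypsin rule (cleave after K or R, but not before P), no missed cleavages or modifications, and a fixed ±0.5 Da tolerance; the function names and the toy database are hypothetical.

```python
"""Toy peptide mass fingerprinting: digest, compute masses, match peaks."""

# Standard monoisotopic amino-acid residue masses, in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def identify(observed: list[float], protein_db: dict[str, str],
             tol: float = 0.5) -> list[tuple[str, int]]:
    """Rank proteins by how many observed masses match one of their peptides."""
    scores = {}
    for name, seq in protein_db.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
        scores[name] = sum(
            1 for m in observed if any(abs(m - t) <= tol for t in theoretical)
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(identify([217.14, 260.15], {"P1": "AKGGK", "P2": "MMMM"}))
# [('P1', 2), ('P2', 0)]
```

Production search engines score matches probabilistically rather than by raw match counts, but the shape of the computation (theoretical masses from a sequence database compared with observed masses) is the same.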
Experimental proteomics
Development of new technologies for:
– protein separation (2D SDS-PAGE, HPLC, capillary electrophoresis)
– mass spectrometry (multidimensional protein identification)
Availability of publicly accessible protein sequence databases
Proteomics databases (PedroDB, gpmDB, PepSeeker, PRIDE, …)
Building experiments that involve orchestrating analysis services and processing and integrating data
Objectives of ISPIDER
A Grid dedicated to the creation of bioinformatics experiments for proteomics
Develop new proteome databases and services, or make existing ones Grid-enabled
Develop middleware support for developing and executing new proteome analyses, based on distributed query processing and workflow technologies
Undertake proteomic studies that demonstrate the effectiveness of the resulting infrastructure
Outline
Experimental proteomics
ISPIDER architecture
Example use cases
Conclusion and future directions
ISPIDER
[Figure: ISPIDER architecture, organised in four layers]
ISPIDER Proteomics Clients: Vanilla Query Client; 2D Gel Visualisation Client + Aspergil. Extensions + Phosph. Extensions; PPI Validation + Analysis Client; Protein ID Client
ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler; Instance Ident./Mapping Services; Proteomic Ontologies/Vocabularies; Source Selection Services; Data Cleaning Services
Existing e-Science Infrastructure: myGrid Ontology Services; myGrid DQP; AutoMed; myGrid Workflows
Public Proteomics Resources: Web services over existing resources and ISPIDER resources (PS, PF, TR, GS, FA, PPI, PID, PRIDE, PEDRo, Phos)
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package
Outline
Experimental proteomics
ISPIDER architecture
Example use cases
Conclusion and future directions
Value-added protein datasets
Motivation
Protein identification results are usually used as input to further analysis processes, for example:
– Gathering evidence for a biological hypothesis
– Suggesting new hypotheses
Objective
Augment the identification results with additional information about the identified proteins
Implementation
Taverna workflow system
Value-added protein datasets
[Figure: workflow components: PepMapper Web Service, GO Services, Auxiliary Services]
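A minimal sketch of the augmentation idea, assuming a callable that returns annotation terms for a protein accession. In the workflow described here that role is played by the GO services orchestrated in Taverna; in the sketch a dictionary of toy data (`TOY_GO`, with hypothetical accessions and term names) stands in for it, and `augment` and `term_counts` are hypothetical helper names rather than parts of any ISPIDER service.

```python
from collections import Counter
from typing import Callable

# Hypothetical stand-in for a GO annotation service: accession -> term names.
# Toy data only; these are not real database entries.
TOY_GO = {
    "PROT_A": ["term-x", "term-y"],
    "PROT_B": ["term-x"],
}


def augment(identified: list[str],
            annotate: Callable[[str], list[str]]) -> list[dict]:
    """Attach annotations to each identified protein; unknown ones get []."""
    return [{"accession": acc, "annotations": annotate(acc)} for acc in identified]


def term_counts(records: list[dict]) -> Counter:
    """Count how many identified proteins carry each annotation term."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(set(rec["annotations"]))  # count each protein once per term
    return counts


records = augment(["PROT_A", "PROT_B", "PROT_C"], lambda acc: TOY_GO.get(acc, []))
print(term_counts(records).most_common(1))  # [('term-x', 2)]
```

Counting terms shared across the identified set is one simple way such value-added data can suggest, or lend support to, a biological hypothesis.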
Genome-focused protein identification
Motivation
Currently, protein identification searches are performed over large data sets. This yields fewer false negatives, but false positives are also more likely.
Objective
More focused, and thus more efficient, protein identification
Implementation
Taverna workflow system
DQP, a service-based query processor
Genome-focused protein identification
[Figure: workflow components: DQP Web Service, IPI, PepMapper web service, GOA Web Service]
Example query:
select p.Name, p.Seq
from p in db_proteinSequences
where p.OS='HomoSapiens';
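The OQL query above selects the name and sequence of every protein sequence record whose organism attribute equals 'HomoSapiens'. The sketch below expresses the same select-from-where over an in-memory list, assuming a hypothetical record type with the three attributes the query uses; the toy entries are invented, and unlike DQP it does not distribute evaluation across services. Restricting the candidate set in this way is what makes the subsequent identification search more focused.

```python
from typing import NamedTuple


class ProteinSequence(NamedTuple):
    """Hypothetical record mirroring the attributes used in the OQL query."""
    Name: str
    Seq: str
    OS: str  # organism, compared against 'HomoSapiens' in the query


def select_by_organism(db: list[ProteinSequence],
                       organism: str) -> list[tuple[str, str]]:
    """select p.Name, p.Seq from p in db where p.OS = organism"""
    return [(p.Name, p.Seq) for p in db if p.OS == organism]


# Toy data for illustration only.
db_proteinSequences = [
    ProteinSequence("prot1", "MKTAYIAK", "HomoSapiens"),
    ProteinSequence("prot2", "MSSHEGGK", "MusMusculus"),
]
focused = select_by_organism(db_proteinSequences, "HomoSapiens")
print(focused)  # [('prot1', 'MKTAYIAK')]
```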
Integrated access to proteome databases
Motivation
The ability to analyse existing proteomics results en masse is limited by the heterogeneity of the schemas of the different databases
Objective
Provide integrated access to proteome databases through a common schema
Implementation
AutoMed, a framework for mapping heterogeneous schemata
DQP, a service-based query processor
Integrated access to proteome databases
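[Figure: integration architecture. User query → AutoMed Query Processor (with the AutoMed Repository and AutoMed wrappers) → AutoMed DQP Wrapper → OQL query → OGSA Distributed Query Processor → three OGSA-DAI activities over PRIDE, PedroDB and gpmDB; the OQL result flows back along the same path and is returned as the result]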
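Querying several sources through one common schema can be sketched as a simple global-as-view mapping: each source gets a function that translates its records into the common schema, and queries run over the union of the translated records. This is a deliberate simplification: AutoMed itself represents mappings as both-as-view transformation pathways, and the distributed evaluation performed by the OGSA-DAI-based DQP is not modelled. All source names, field names and records below are hypothetical and do not reflect the actual PRIDE, PEDRo or gpmDB schemas.

```python
from typing import Callable, Iterator

Record = dict[str, str]

# Toy source records with differing (invented) local schemas.
SOURCE_A: list[Record] = [{"acc": "X1", "organism": "HomoSapiens"}]
SOURCE_B: list[Record] = [
    {"protein_id": "X2", "species_name": "HomoSapiens"},
    {"protein_id": "X3", "species_name": "MusMusculus"},
]

# One mapping per source, from its local schema into the common schema.
MAPPINGS: dict[str, tuple[list[Record], Callable[[Record], Record]]] = {
    "source_a": (SOURCE_A,
                 lambda r: {"accession": r["acc"], "organism": r["organism"]}),
    "source_b": (SOURCE_B,
                 lambda r: {"accession": r["protein_id"],
                            "organism": r["species_name"]}),
}


def common_view() -> Iterator[Record]:
    """Yield every source record translated into the common schema."""
    for source, (rows, to_common) in MAPPINGS.items():
        for row in rows:
            yield {**to_common(row), "source": source}  # keep provenance


def query(organism: str) -> list[str]:
    """A query written once, against the common schema only."""
    return [r["accession"] for r in common_view() if r["organism"] == organism]


print(query("HomoSapiens"))  # ['X1', 'X2']
```

The point of the common schema is visible in `query`: it never refers to a source-specific field name, so adding a new source only requires adding a mapping.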
Conclusions
+ Available e-Science technologies provide rapid prototyping facilities for bioinformatics analyses
+ Combining such technologies is possible and opens up further possibilities (Taverna + DQP, AutoMed + DQP)
- Writing custom code is usually required for:
– Processing service output to extract inputs for subsequent services
– Transforming results between data formats
– Dealing with mismatches between identifiers
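Identifier mismatches of the kind listed above can often be reduced to a normalisation step followed by a lookup in a cross-reference table, with anything unresolved set aside for inspection. The sketch assumes identifiers are case-insensitive and may carry a trailing '.<version>' suffix; neither assumption holds for every identifier scheme. The cross-reference table and identifiers are toy values, and `normalise` and `reconcile` are hypothetical names.

```python
import re

# Matches a trailing version suffix such as ".3" (assumed identifier convention).
VERSION_SUFFIX = re.compile(r"\.\d+$")


def normalise(identifier: str) -> str:
    """Trim whitespace, upper-case, and drop a trailing '.<version>' suffix."""
    return VERSION_SUFFIX.sub("", identifier.strip().upper())


def reconcile(ids: list[str],
              crossref: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Map each raw identifier through the cross-reference table.

    Returns (mapped, unresolved): mapped is keyed by the raw identifier,
    and unresolved keeps the raw identifiers that had no entry.
    """
    mapped: dict[str, str] = {}
    unresolved: list[str] = []
    for raw in ids:
        key = normalise(raw)
        if key in crossref:
            mapped[raw] = crossref[key]
        else:
            unresolved.append(raw)
    return mapped, unresolved


# Hypothetical cross-reference table between two identifier schemes (toy values).
CROSSREF = {"SRC_0001": "TGT_A", "SRC_0002": "TGT_B"}
mapped, unresolved = reconcile([" src_0001.3", "SRC_0002", "SRC_0099"], CROSSREF)
print(unresolved)  # ['SRC_0099']
```

Returning the unresolved identifiers explicitly, instead of silently dropping them, gives a user something concrete to resolve by hand.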
Development of a user-guided environment for detecting and resolving mismatches
Development of proteomics client applications (PepMapper, PepSeeker and PRIDE)