Proteomics, ProteomeXchange, Proteins, Biohackathon
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
Dr. Yasset Perez-Riverol
Twitter: @ypriverol
GitHub: ypriverol
Bioinformatician - PRIDE Group
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Yasset Perez-Riverol
yperez@ebi.ac.uk
BioHackathon 2014, Miyagi, Japan (Nov 9-14, 2014)
Proteomics Services, EMBL-EBI
• IntAct: interactions
• PRIDE: MS/MS data
• UniProt: protein sequences
• Reactome: pathways
• BioModels
Overview
• The ProteomeXchange (PX) consortium
• PRIDE and ProteomeXchange
• PRIDE components.
• Current and future developments.
ProteomeXchange Consortium
• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make data available and reusable.
http://www.proteomexchange.org
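The common identifier space can be illustrated with a short check. This is a minimal sketch, assuming a PXD accession is the prefix "PXD" followed by exactly six digits, as in the accessions quoted later in this talk (PXD000320, PXD000561); the function names are hypothetical and not part of any ProteomeXchange tool.

```python
import re

# Assumption: a PXD accession is "PXD" followed by exactly six digits,
# matching the examples quoted in this talk (e.g. PXD000561).
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_accession(text):
    """Return True if the whole string looks like a PXD accession."""
    return PXD_PATTERN.fullmatch(text) is not None


def find_pxd_accessions(text):
    """Return every PXD-style accession found in free text, in order."""
    return re.findall(r"\bPXD\d{6}\b", text)


if __name__ == "__main__":
    print(is_pxd_accession("PXD000561"))
    print(find_pxd_accessions("Data are available as PXD000561 and PXD000865."))
```

A check like this is useful when harvesting dataset references from manuscripts, where accessions appear inside running text.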
ProteomeXchange data workflow
[Figure: ProteomeXchange data workflow. Diagram labels:]
• Submitted data: metadata / manuscript, raw data*, results
• Receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data)
• ProteomeCentral: metadata
• Disseminated data: researcher’s results, reprocessed results, raw data*
• Other resources: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs
Vizcaíno et al., Nat Biotechnol, 2014
MassIVE (UCSD)
http://proteomics.ucsd.edu/service/massive/
• Joined ProteomeXchange in June 2014
PASSEL: repository for SRM data
• Suitable for SRM assays
• Part of the PeptideAtlas set of resources.
http://www.peptideatlas.org/passel/
Farrah et al., Proteomics, 2012
PRIDE: PRoteomics IDEntifications database
http://www.ebi.ac.uk/pride/archive/
Vizcaíno et al., Nucleic Acids Res, 2014
PX Submission workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
   a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.
   b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation and submitter, based on ontologies and controlled vocabularies.
4. Other files (optional):
   a. QUANT: Quantification related results
   b. PEAK: Peak list files
   c. OTHER: Any other file type
   e. FASTA
Published
Raw Files
Other files
Ternent et al., Proteomics, 2014
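The difference between complete and partial submissions can be sketched as a small rule over the submitted file types. This is an illustration only, not the logic of the PX submission tool: the labels "RAW", "RESULT" (PRIDE XML or mzIdentML) and "SEARCH" (search engine output in its original form) and the function name are assumptions made for this example.

```python
from collections import Counter

# Assumed labels for this sketch:
#   RAW    - mass spectrometer output files (item 1 of the workflow)
#   RESULT - result files in PRIDE XML or mzIdentML (complete submission)
#   SEARCH - search engine output kept in its original form (partial submission)
#   QUANT, PEAK, FASTA, OTHER - optional files


def classify_submission(files):
    """Classify a list of (filename, file_type) pairs as a submission type."""
    counts = Counter(file_type for _name, file_type in files)
    if counts["RAW"] == 0:
        return "missing mass spectrometer output"
    if counts["RESULT"] > 0:
        return "complete"
    if counts["SEARCH"] > 0:
        return "partial"
    return "missing result files"


if __name__ == "__main__":
    example = [("run1.raw", "RAW"), ("run1.mzid", "RESULT"), ("db.fasta", "FASTA")]
    print(classify_submission(example))
```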
Complete submissions using mzIdentML
Search engine results + MS files
An increasing number of search engines and tools support export to mzIdentML 1.1:
- Mascot
- MS-GF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- ProCon (Proteome Discoverer, Sequest)
- Scaffold
- TPP via the idconvert tool (ProteoWizard)
- ProteinPilot (planned by the end of 2014)
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
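Because the referenced spectral files have to accompany an mzIdentML file, a submitter may want to check for missing files before uploading. The following is a minimal sketch, assuming the spectra files are declared in SpectraData elements with a location attribute, as in mzIdentML 1.1; the helper names are hypothetical and this is not how PRIDE validates submissions.

```python
import xml.etree.ElementTree as ET


def referenced_spectra_files(mzid_xml):
    """Return the file names referenced by SpectraData elements."""
    root = ET.fromstring(mzid_xml)
    names = []
    for element in root.iter():
        # Compare the local tag name so the XML namespace does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        location = element.get("location")
        if local_name == "SpectraData" and location:
            # Keep only the last path component of the location.
            names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def missing_spectra_files(mzid_xml, submitted_files):
    """Return referenced spectra files that are absent from the submission."""
    submitted = {name.replace("\\", "/").rsplit("/", 1)[-1] for name in submitted_files}
    return [name for name in referenced_spectra_files(mzid_xml) if name not in submitted]
```

Matching on the file name alone is a simplification: two files with the same name in different folders would be treated as the same file.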
mzTab
http://mztab.googlecode.com
• Metadata (key-value pairs): basic information about experiment and sample
• Protein (table-based): basic information about protein identifications
• Peptide (table-based): information about quantified peptides
• PSM (table-based): information about identified spectra
• Small Molecule (table-based): basic information about identified small molecules
J. Griss et al., MCP, 2014
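The section layout above maps onto the line prefixes used in mzTab 1.0 files, where each tab-separated line starts with a three-letter code (MTD for metadata; PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of each table). The sketch below is a simplified reader written for illustration; it is not the official mzTab library and it ignores validation.

```python
from collections import defaultdict

# Section names are labels chosen for this sketch; the prefixes are mzTab 1.0.
SECTION_BY_PREFIX = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "PSH": "psm", "PSM": "psm",
    "SMH": "small_molecule", "SML": "small_molecule",
}
HEADER_PREFIXES = {"PRH", "PEH", "PSH", "SMH"}


def parse_mztab(text):
    """Return (metadata dict, {section: list of row dicts}) for mzTab text."""
    metadata = {}
    headers = {}
    rows = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        section = SECTION_BY_PREFIX.get(prefix)
        if section is None:
            continue  # blank lines, COM comment lines, unknown prefixes
        if prefix == "MTD":
            if len(fields) >= 3:
                metadata[fields[1]] = fields[2]
        elif prefix in HEADER_PREFIXES:
            headers[section] = fields[1:]
        else:
            # Pair each value with the column name from the section header.
            rows[section].append(dict(zip(headers.get(section, []), fields[1:])))
    return metadata, dict(rows)
```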
PRIDE Components: Submission Process
• PRIDE Converter
• PRIDE Inspector
• PX Submission Tool
PRIDE Components: PX submission tool
• Captures the mappings between the different types of files.
• Adds the mandatory metadata annotation.
• Makes the file upload process straightforward for the submitter (all files are transferred using Aspera or FTP).
• Command line alternative: some scripting is needed.
[Diagram labels: Published, Raw, Other files, PX submission tool]
http://www.proteomexchange.org/submission
PRIDE Inspector 2.0
Available for complete submissions.
PRIDE Inspector 2.0 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab Quantitation (work in progress)
Wang et al., Nat Biotechnol, 2012
https://github.com/PRIDE-Toolsuite/
PRIDE Components: Pipelines and Visualization
Submission validation pipeline
• QC of submitted files.
• Metadata check.
Submission pipeline
• Add the project to the database (file locations, general statistics, metadata).
Publication pipeline
• Conversion of files to mzTab.
• Conversion of spectra peaks to MGF.
• Indexing of the information in a Solr server.
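The conversion of spectra peaks to MGF can be pictured with a small formatter. This is a minimal sketch of the Mascot Generic Format layout (one BEGIN IONS ... END IONS block per spectrum), not the code used in the PRIDE publication pipeline; the function name and its parameters are assumptions made for this example.

```python
def spectrum_to_mgf(title, precursor_mz, charge, peaks):
    """Format one spectrum as an MGF block.

    peaks is an iterable of (m/z, intensity) pairs; charge is a signed
    integer, or 0/None when the precursor charge is unknown.
    """
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz}",
    ]
    if charge:
        sign = "+" if charge > 0 else "-"
        lines.append(f"CHARGE={abs(charge)}{sign}")
    # Write peaks in ascending m/z order, one "m/z intensity" pair per line.
    for mz, intensity in sorted(peaks):
        lines.append(f"{mz} {intensity}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(spectrum_to_mgf("scan=1", 445.12, 2, [(200.5, 10.0), (100.25, 5.0)]))
```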
PRIDE Components: Services & Web components
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
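A link to a single dataset page can be built from the portal address above. The sketch below assumes that the dataset is selected with an "ID" query parameter; that parameter name is an assumption of this example and should be checked against the live portal. No network request is made.

```python
from urllib.parse import urlencode

# Portal address as given on the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession, base_url=BASE_URL):
    """Build a ProteomeCentral link for one PXD accession.

    Assumption: the dataset is addressed through an "ID" query parameter.
    """
    return f"{base_url}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000561"))
```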
ProteomeXchange: 1329 datasets up until October 2014
Origin: 271 USA
166 Germany
115 United Kingdom
73 Switzerland
70 China
68 Netherlands
67 France
55 Canada
44 Spain
42 Belgium
33 Sweden
31 Australia
31 Denmark
31 Japan
20 India
20 Norway
19 Taiwan
17 Ireland
16 Austria
14 Finland
14 Italy
12 Republic of Korea
11 Brazil
9 Russia
8 Israel
7 Singapore …
Type: 437 PRIDE complete
792 PRIDE partial
63 PeptideAtlas/PASSEL complete
14 MassIVE
23 reprocessed
Publicly accessible: 691 datasets, 52% of all
86% PRIDE
12% PASSEL
2% MassIVE
Data volume: Total: ~55 TB
Number of all files: ~131,000
PXD000320-324: ~5 TB
PXD000065: ~1.4 TB
Top species studied by at least 10 datasets: 577 Homo sapiens
165 Mus musculus
56 Saccharomyces cerevisiae
53 Arabidopsis thaliana
29 Rattus norvegicus
22 Escherichia coli
17 Bos taurus
16 Mycobacterium tuberculosis
13 Oryza sativa
13 Drosophila melanogaster
13 Glycine max
~290 species in total
Datasets/year: 2012: 102
2013: 527
2014: 700
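The figures on this slide can be cross-checked with a few lines of arithmetic. The numbers below are copied from the slide; the variable and function names are chosen for this sketch.

```python
total_datasets = 1329
public_datasets = 691

datasets_by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
datasets_by_year = {2012: 102, 2013: 527, 2014: 700}


def percentage(part, whole, digits=0):
    """Return part/whole as a percentage rounded to the given digits."""
    return round(100 * part / whole, digits)


# Both breakdowns add up to the reported total of 1329 datasets.
assert sum(datasets_by_type.values()) == total_datasets
assert sum(datasets_by_year.values()) == total_datasets

# 691 of 1329 datasets are public, which rounds to the 52% on the slide.
public_share = percentage(public_datasets, total_datasets)

# Share of the two PRIDE submission types among all datasets.
pride_share = percentage(437 + 792, total_datasets, 1)
```

The consistency of the two breakdowns suggests that every dataset is counted exactly once per category.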
Journals and Data Deposition
[Figure: number of submissions per journal]
Data Access?
[Figure: total numbers]
Future developments
• Make the data reusable.
• Integration of different protein expression resources:
  - PRIDE
  - PeptideAtlas
  - ProteomicsDB
  - Human Proteome Map
PXD Identifier   Hits     Dataset title
PXD000561        153512   A draft map of the human proteome
PXD000865        51639    Mass spectrometry based draft of the human proteome
PROXI
[Diagram labels: Clients; Web Services (PROXI); Registry; Repositories & Databases; Data]
Perez-Riverol Y, Proteomics, 2014
Conclusions
• ProteomeXchange is widely used.
• PRIDE contains most of the MS/MS datasets.
• It now has a new consortium member: MassIVE (UCSD).
• Around half of the datasets are already public.
• Several open source tools are available to facilitate the submission process.
• File transfer speed should not be a problem (Aspera support).
• Data deposition enables and promotes data reuse.
• ProteomeXchange is open to new members.
Acknowledgements
PRIDE Team
• Juan A. Vizcaíno (Group Leader)
• Attila Csordas
• Rui Wang
• Florian Reisinger
• Jose A. Dianes
• Tobias Ternent
• Yasset Perez-Riverol
• Noemi del Toro
• Henning Hermjakob
PeptideAtlas Team (ISB, Seattle)
• Eric Deutsch
• Terry Farrah
• Zhi Sun
MassIVE
• Nuno Bandeira
And many other PX partners and stakeholders
Questions?