Talk given at the PSI/ProteomeXchange workshop at HUPO 2014. It summarizes how to submit data to ProteomeXchange via PRIDE.
Submitting your data to
ProteomeXchange – A Mini-Tutorial
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
13th HUPO World Congress, Madrid, 7 October 2014
Overview
• The ProteomeXchange (PX) consortium
• How to submit and access data in PX via PRIDE
• How to access PX data
• Submitting data triggers data reuse
ProteomeXchange Consortium
• Goal: Development of a framework to allow
standard data submission and dissemination
pipelines between the main existing proteomics
repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE
(Cambridge, UK) and (very recently) MassIVE
(UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
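The common identifier space means every dataset gets a PXD accession, such as PXD000764 used later in this talk. A minimal Python sketch, assuming identifiers are the prefix "PXD" followed by exactly six digits as in the examples here (the helper names are hypothetical, not part of any ProteomeXchange tool), for checking or extracting accessions from text such as a manuscript:

```python
import re

# Assumption: "PXD" followed by exactly six digits, as in PXD000764 or PXD000561
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_identifier(text: str) -> bool:
    """Return True if `text` is a single well-formed PXD identifier."""
    return PXD_PATTERN.fullmatch(text.strip()) is not None


def find_pxd_identifiers(text: str) -> list[str]:
    """Extract every PXD identifier mentioned in free text, in order of appearance."""
    return re.findall(r"\bPXD\d{6}\b", text)
```

For example, `find_pxd_identifiers("Data are in PXD000764 and PXD000561.")` returns both accessions, while a malformed string like "PXD764" is rejected by `is_pxd_identifier`.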
ProteomeXchange data workflow
[Figure: researchers submit metadata/manuscript, raw data and results to the receiving repositories PRIDE (MS/MS data), MassIVE (MS/MS data) and PASSEL (SRM data); datasets are listed in ProteomeCentral and feed journals, UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs with researcher’s results and reprocessed results. Vizcaíno et al., Nat Biotechnol, 2014]
Overview
• The ProteomeXchange (PX) consortium
• How to submit and access data in PX via PRIDE
• How to access PX data
• Submitting data triggers data reuse
PRIDE (PRoteomics IDEntifications) database
http://www.ebi.ac.uk/pride
• Focused on MS/MS approaches
• Other data types can also be stored as “partial submissions”.
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
Manuscript just out detailing the process: Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset:
PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
PX data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: result files can be converted to PRIDE XML or to the mzIdentML data standard.
b. Partial submissions: for workflows not yet supported by PRIDE, search engine output files are stored and provided in their original form.
3. Metadata: a sufficiently detailed description of the sample origin, workflow, instrumentation and submitter.
4. Other files (optional):
a. QUANT: quantification-related results
b. PEAK: peak list files
c. GEL: gel images
d. OTHER: any other file type
e. FASTA: sequence database files
f. SP_LIBRARY: spectral libraries
[Diagram: file categories “Published”, “Raw files” and “Other files”]
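The file categories above can be treated as a checklist before submitting. A minimal Python sketch under stated assumptions, not taken from PRIDE's own validation code: every submission needs raw files, a complete submission additionally needs RESULT files (PRIDE XML or mzIdentML), and a partial submission needs the original SEARCH engine output. The enum values mirror the lowercase file types used in the PX summary file; `missing_categories` is a hypothetical helper.

```python
from enum import Enum


class FileType(Enum):
    # Categories from the PX data workflow for MS/MS data
    RAW = "raw"
    RESULT = "result"          # PRIDE XML or mzIdentML (complete submissions)
    SEARCH = "search"          # original search engine output (partial submissions)
    PEAK = "peak"
    QUANT = "quant"
    GEL = "gel"
    FASTA = "fasta"
    SP_LIBRARY = "sp_library"
    OTHER = "other"


# Assumed minimum requirements per submission type (illustrative only)
REQUIRED = {
    "complete": {FileType.RAW, FileType.RESULT},
    "partial": {FileType.RAW, FileType.SEARCH},
}


def missing_categories(submission_type: str, files: dict[str, FileType]) -> set[FileType]:
    """Return the required categories absent from `files` (a path -> FileType map)."""
    present = set(files.values())
    return REQUIRED[submission_type] - present
```

With one raw file and one mzIdentML file, the complete-submission check passes, while the partial-submission check reports that SEARCH output is missing.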
Complete vs partial submissions: experimental metadata
[Screenshots: complete and partial submissions]
General experimental metadata about the projects is similar. However, at the assay level, partial submissions contain less detailed information.
Complete vs partial submissions: processed results
[Screenshots: complete and partial submissions]
For complete submissions, the spectra can be connected to the processed identification results, and both can be visualized.
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to PRIDE XML or mzIdentML
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
How to perform a complete PX submission to PRIDE
• Complete submission:
• MS/MS data.
• Processed results can be converted to the PSI standard
mzIdentML or PRIDE XML.
• Partial submission:
• Any type of data (not SRM, which goes to PASSEL)
• E.g. top-down, data-independent acquisition, MS imaging (to come), etc.
• Processed results cannot be converted to a data standard.
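The decision rules on this slide can be written as a small function. A minimal Python sketch that encodes only what the slide states: SRM data go to PASSEL; MS/MS data whose processed results can be converted to mzIdentML or PRIDE XML become complete submissions; everything else becomes a partial submission. The free-text `data_type` labels and the function name are illustrative assumptions.

```python
def choose_route(data_type: str, results_exportable: bool) -> str:
    """Suggest the submission route for a dataset, following the slide's rules.

    data_type: illustrative label such as "MS/MS", "SRM", "top-down" or "DIA".
    results_exportable: True if processed results can be converted or exported
    to mzIdentML or PRIDE XML.
    """
    kind = data_type.strip().upper()
    if kind == "SRM":
        # SRM data are handled by PASSEL, not PRIDE
        return "PASSEL"
    if kind == "MS/MS" and results_exportable:
        return "PRIDE complete"
    # Any other data type, or MS/MS results that cannot be converted to a standard
    return "PRIDE partial"
```

For instance, a top-down dataset is routed to a partial submission even if some of its output could be exported, because the slide lists top-down among the partial-submission data types.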
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to mzIdentML or PRIDE XML
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
Complete submissions
[Diagram: search engines export the search engine results, together with the MS files, as mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- ProCon (Proteome Discoverer, SEQUEST)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (planned by the end of 2014)
- Others: library for X!Tandem conversion, lab-internal pipelines, …
The spectra files referenced in the mzIdentML files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML
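Because the spectra files referenced by an mzIdentML file must also be submitted, it helps to list those references before uploading. In mzIdentML 1.1 the input spectra files are declared as `SpectraData` elements with a `location` attribute. A minimal Python sketch using only the standard library; comparing by bare file name, rather than by full path, is a simplifying assumption, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def referenced_spectra_files(mzid_xml: str) -> set[str]:
    """Return the file names referenced by <SpectraData location="..."> elements."""
    root = ET.fromstring(mzid_xml)
    names = set()
    for elem in root.iter():
        # Compare the local tag name so any mzIdentML namespace version matches
        if elem.tag.rsplit("}", 1)[-1] == "SpectraData":
            location = elem.get("location")
            if location:
                # Normalise Windows separators, then keep only the file name
                names.add(PurePosixPath(location.replace("\\", "/")).name)
    return names


def missing_spectra_files(mzid_xml: str, submitted: list[str]) -> set[str]:
    """Spectra files referenced in the mzIdentML file but absent from the submission."""
    submitted_names = {PurePosixPath(p).name for p in submitted}
    return referenced_spectra_files(mzid_xml) - submitted_names
```

To check a file on disk, read it with `Path("results.mzid").read_text()` (file name illustrative) and pass the text in; an empty result from `missing_spectra_files` means every referenced spectra file is part of the submission.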
Before: file conversion
[Diagram: original data files (search output files + spectra files) → ‘RESULT’ file generation by file conversion with PRIDE Converter → final ‘RESULT’ file in PRIDE XML]
Now: native file export
[Diagram: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) → ‘RESULT’ file generation by native file export → final ‘RESULT’ file in mzIdentML, together with the spectra files]
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to PRIDE XML or mzIdentML
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
PRIDE Inspector 2.0
Available for complete submissions (Wang et al., Nat. Biotechnology, 2012)
PRIDE Inspector 2.0 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab Ident (work in progress)
http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to PRIDE XML or mzIdentML
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
PX submission tool
http://www.proteomexchange.org/submission
• Captures the mappings between the different types of files.
• Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
[Diagram: Published, Raw and Other files → PX submission tool]
• Command-line alternative: some scripting is needed.
PX submission tool: step by step
[Screenshots: Steps 1 to 9 of the PX submission tool]
Fast file transfer with Aspera
- Aspera is the default file transfer protocol to PRIDE, used by:
  - the PX submission tool
  - the command line
- Up to 50x faster than FTP
File transfer speed should not be a problem!
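To put "up to 50x faster" into perspective, a back-of-the-envelope calculation in Python. The 10 MB/s sustained FTP rate is a hypothetical figure chosen only for illustration, and 50x is the best case quoted above, not a guaranteed speed-up; 1.4 TB is the size reported for PXD000065 in the dataset statistics later in this talk. Decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are assumed.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move `size_tb` terabytes at a sustained `rate_mb_per_s`."""
    size_bytes = size_tb * 1e12
    seconds = size_bytes / (rate_mb_per_s * 1e6)
    return seconds / 3600


# Hypothetical scenario: a 1.4 TB dataset over an assumed 10 MB/s FTP link,
# versus the best-case 50x speed-up quoted for Aspera
ftp_rate = 10.0
ftp_hours = transfer_hours(1.4, ftp_rate)
aspera_hours = transfer_hours(1.4, ftp_rate * 50)
print(f"FTP: {ftp_hours:.1f} h, Aspera (best case): {aspera_hours * 60:.0f} min")
```

Under these assumptions the upload drops from about 39 hours to under an hour; real gains depend on network conditions.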
PX submission tool: HPP tags
Batch submissions on the command line
• Generate the PX summary file yourself (by default, the PX submission tool generates it).
MTD	first_name	John Arthur
MTD	last_name	Smith
MTD	email	[email protected]
MTD	affiliation	University of Cambridge
MTD	title	Human proteome
MTD	description	An experiment about human proteome
MTD	keyword	human, proteome
MTD	pubmed	12345
MTD	px	10.1000/182
MTD	pride_login	pride-user
FMH	file_id	file_type	file_path	file_mapping
FME	1	result	/path/to/pride/xml/files/pride-1.xml	7,8,9
FME	2	result	/path/to/pride/xml/files/pride-2.xml	4
FME	3	result	/path/to/mzidentml/files/mzidentml-1.xml	5,10
FME	4	raw	/path/to/raw/files/raw-1.bin
FME	5	raw	/path/to/raw/files/raw-2.bin
FME	6	raw	/path/to/raw/files/raw-3.bin
FME	7	raw	ftp://some.url/at/some/place/raw-4.bin
FME	8	search	/path/to/search/engine/output/search-1.out
FME	9	other	/path/to/other/file/other-1.e
FME	10	peak	/path/to/peak/list/mzml-1.xml
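When generating the summary file yourself, a small script helps keep the columns consistent. A minimal Python sketch, assuming the tab-separated layout of the example above: MTD key/value lines, one FMH header line, then one FME line per file with an optional comma-separated file_mapping. `PxFile` and `write_px_summary` are hypothetical names, not part of the PX submission tool.

```python
from dataclasses import dataclass, field


@dataclass
class PxFile:
    file_id: int
    file_type: str                  # e.g. "result", "raw", "search", "peak", "other"
    path: str
    mapping: list[int] = field(default_factory=list)  # ids of related files


def write_px_summary(metadata: dict[str, str], files: list[PxFile]) -> str:
    """Render metadata and file records as tab-separated PX summary lines."""
    lines = [f"MTD\t{key}\t{value}" for key, value in metadata.items()]
    lines.append("FMH\tfile_id\tfile_type\tfile_path\tfile_mapping")
    for f in files:
        row = ["FME", str(f.file_id), f.file_type, f.path]
        if f.mapping:
            # The mapping column is only written when the file points to others
            row.append(",".join(str(i) for i in f.mapping))
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

The returned text can then be saved next to the data files, for example with `Path(...).write_text(...)`, before everything is uploaded together.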
Batch submissions on the command line (2)
• Generate the PX summary file yourself (by default, the PX submission tool generates it).
• Put together all the files plus the PX summary file.
• Ask the PRIDE team for a specific upload directory (pride-
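Before uploading a batch, it is worth checking that the file mappings in the summary file are consistent. A minimal Python sketch, again assuming tab-separated columns as in the example summary file; the rule that every raw file should be referenced by some other file's mapping is an illustrative assumption, not a documented PRIDE requirement, and the function names are hypothetical.

```python
def parse_px_summary(text: str) -> tuple[dict[str, str], dict[int, dict]]:
    """Parse a PX summary file into (metadata, files keyed by file_id)."""
    metadata: dict[str, str] = {}
    files: dict[int, dict] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        row = line.split("\t")
        if row[0] == "MTD":
            metadata[row[1]] = row[2] if len(row) > 2 else ""
        elif row[0] == "FME":
            mapping = row[4] if len(row) > 4 else ""
            files[int(row[1])] = {
                "type": row[2],
                "path": row[3],
                "mapping": [int(x) for x in mapping.split(",") if x.strip()],
            }
        # FMH only names the columns, so it is skipped
    return metadata, files


def check_mappings(files: dict[int, dict]) -> list[str]:
    """Report unknown mapping targets and raw files that no other file points to."""
    problems = []
    mapped = set()
    for file_id, entry in files.items():
        for target in entry["mapping"]:
            if target not in files:
                problems.append(f"file {file_id} maps to unknown file {target}")
            mapped.add(target)
    for file_id, entry in files.items():
        if entry["type"] == "raw" and file_id not in mapped:
            problems.append(f"raw file {file_id} is not mapped to any result file")
    return problems
```

Run against the example summary file, this check would flag raw file 6, which no other entry maps to; whether that matters for a real submission is for the PRIDE curators to decide.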
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission.
• File conversion/export to PRIDE XML or mzIdentML
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
Post-submission steps
• PRIDE curators check the files:
  • Files must be valid against the schema
  • All the required annotations must be present
  • Basic QC check (e.g. detecting errors in PTM annotation)
• If everything is correct, the submission to PRIDE is completed.
• The author receives a PXD identifier, a reviewer username and password, and a DOI (for complete submissions).
Overview
• The ProteomeXchange (PX) consortium
• How to submit and access data in PX via PRIDE
• How to access PX data
• Submitting data triggers data reuse
ProteomeXchange data workflow (figure repeated from the ProteomeXchange consortium section; Vizcaíno et al., Nat Biotechnol, 2014)
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
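Individual datasets can be looked up in ProteomeCentral by their PXD identifier. A minimal Python sketch using only the standard library; the `ID` query parameter name is an assumption about the GetDataset interface, so check the portal before relying on it, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(pxd_id: str) -> str:
    """Build a ProteomeCentral link for one dataset (assumed "ID" parameter)."""
    return f"{BASE_URL}?{urlencode({'ID': pxd_id})}"


def fetch_dataset_page(pxd_id: str, timeout: float = 30.0) -> str:
    """Download the dataset page as text; requires network access."""
    with urlopen(dataset_url(pxd_id), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `dataset_url("PXD000764")` builds the link for the meningitis CSF dataset used as the example earlier in this talk.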
ProteomeXchange: 1,148 datasets up until August 2014
Type:
386 PRIDE complete
687 PRIDE partial
51 PeptideAtlas/PASSEL complete
1 MassIVE
23 reprocessed
Datasets/year:
2012: 102
2013: 527
2014: 519
Publicly accessible:
544 datasets, ~50% of all (90% PRIDE, 10% PASSEL)
Data volume:
Total: ~51 TB
Number of files: ~130,000
Largest datasets: PXD000320-324 (~5 TB), PXD000065 (~1.4 TB)
Origin (datasets per country):
235 USA
142 Germany
97 United Kingdom
67 Switzerland
64 Netherlands
62 China
60 France
48 Canada
43 Spain
36 Belgium
32 Sweden
29 Australia
26 Denmark
23 Japan
18 Taiwan
17 India
16 Ireland
14 Norway
14 Italy
12 Finland
11 Republic of Korea
10 Brazil
8 Austria
7 Israel
7 Singapore …
Top species (studied in at least 10 datasets):
510 Homo sapiens
142 Mus musculus
46 Saccharomyces cerevisiae
45 Arabidopsis thaliana
23 Rattus norvegicus
16 Escherichia coli
15 Bos taurus
15 Mycobacterium tuberculosis
13 Oryza sativa
12 Drosophila melanogaster
12 Glycine max
~265 species in total
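The counts on this slide are internally consistent. A quick Python check, plain arithmetic on the numbers above with nothing else assumed, confirms that both the breakdown by type and the breakdown by year add up to 1,148 datasets, that the 544 public datasets are about 47% of the total (which the slide rounds to 50%), and that PRIDE complete and partial submissions together account for about 93% of all datasets.

```python
# Counts copied from the slide (datasets up until August 2014)
by_type = {
    "PRIDE complete": 386,
    "PRIDE partial": 687,
    "PeptideAtlas/PASSEL complete": 51,
    "MassIVE": 1,
    "reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 519}
public_datasets = 544

total = sum(by_type.values())
public_share = public_datasets / total
pride_share = (by_type["PRIDE complete"] + by_type["PRIDE partial"]) / total

print(f"total={total}, by year={sum(by_year.values())}")
print(f"public: {public_share:.0%}, PRIDE: {pride_share:.0%}")
```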
Overview
• The ProteomeXchange (PX) consortium
• How to submit and access data in PX via PRIDE
• How to access PX data
• Submitting data triggers data reuse
Which are the most accessed datasets?
• PXD000561, 153,512 hits: “A draft map of the human proteome”. Kim et al., Nature, 2014. PMID: 24870542
• PXD000851, 111,587 hits: “Membrane proteomic analysis of colorectal cancer tissue”. Kume et al., MCP, 2014. PMID: 24687888
• PXD000865, 51,639 hits: “Mass spectrometry based draft of the human proteome”. Wilhelm et al., Nature, 2014. PMID: 24870543
Which are the most accessed datasets? Total numbers
[Chart of total access numbers]
CompOmics Open Source Analysis Pipeline
SearchGUI: Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9. http://searchgui.googlecode.com
PeptideShaker: Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology (in press). http://peptide-shaker.googlecode.com
Reshake PRIDE data!
Find the desired PRIDE project …
… inspect the project details …
… and start re-analyzing the data!
Conclusions
• Submission to ProteomeXchange via PRIDE is easy.
• Decide between complete and partial submissions.
• Different open source tools available to facilitate the process.
• File transfer speed should not be a problem (Aspera support)
Acknowledgements
PRIDE Team
Attila Csordas
Rui Wang
Florian Reisinger
Jose A. Dianes
Tobias Ternent
Yasset Perez-Riverol
Noemi del Toro
Henning Hermjakob
EU FP7 grant number 260558
PeptideAtlas Team (ISB, Seattle)
Eric Deutsch
Terry Farrah
Zhi Sun
Andrew R. Jones
Lennart Martens
Juan Pablo Albar
Martin Eisenacher
Gil Omenn
And many other PX partners and
stakeholders