mzTab: Proposal for a Simple Data Format for Proteomics Results
Johannes
PSI Meeting, Heidelberg, April 2011
EBI is an Outstation of the European Molecular Biology Laboratory.
Current Situation
• The necessity of standard data formats has become generally accepted
• Proteomics techniques are constantly evolving
• Proposed standard formats had to become very complex to adequately capture proteomics data
  – mzIdentML for identification data
  – mzQuantML for quantitative data
• Effective use of these data formats requires sophisticated bioinformatics knowledge
• Many researchers still use MS Excel to “look” at their data
Communication of Proteomics Results
• Proteomics resources require a mechanism to exchange basic proteomics results simply and efficiently
• Collaboration with colleagues from other scientific fields is increasingly important
  – Proteomics results must be shared with researchers outside of proteomics
• Proteomics data need to be easily accessible
Potential Current Problems
• Currently proposed standard formats are difficult to use without the Java APIs
• “Complete” standard formats are too complex and too large to quickly share the essential results
• Quick scripts (e.g., in Perl) for specific research questions are hard to write
  – A large amount of potential innovation could be lost
• Reading files requires special software
  – Further processing of the data (e.g., with statistical tools) is not easily possible
  – No standard tools to read / write mz*ML files are available
  – Custom-built software is required for many use cases otherwise covered by “Excel & friends”
mzTab – Aim
• To provide a simple and efficient way of exchanging proteomics data
  – Which protein / peptide was identified in a given experimental setting
• Easy to update and maintain
• Easy to use by the proteomics community, systems biologists, and providers of knowledge bases
mzTab – Target Audience
• Proteomics repositories (e.g., PRIDE, PeptideAtlas)
• Knowledge base resources (e.g., UniProt, HPRD)
• Researchers outside of proteomics
• Researchers analyzing proteomics data with limited bioinformatics knowledge / support
mzTab – proposed concept
• A tab-delimited file format
• Goals
  – Content should be “readable” using MS Excel
  – Should contain the minimal information proteomics repositories / knowledge bases need to exchange data
  – Data should be easily accessible using, e.g., scripting languages
  – One file should be able to contain multiple experiments / proteins from different resources
  – Aim: to represent the result of a query to, e.g., PRIDE using this format
  – Provide a simplistic summary of proteomics results
• Every entry contains a reference to the source data (in mzIdentML / mzQuantML format)
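Because a key goal is that plain scripting languages can read the content directly, a minimal Python sketch may help. It assumes a tab-separated table whose first line holds the column names; the column names and the single data row are borrowed from the protein-section example of this proposal, and `read_table` is a hypothetical helper, not part of any mzTab library. Only the standard-library `csv` module is used.

```python
import csv
import io

# Hypothetical protein table. Column names and the single row are taken from
# the protein-section example of this proposal; real files may differ.
SAMPLE = "accession\treliability\tpeptides\nP12345\t4\t2\n"


def read_table(text):
    """Parse a tab-delimited table into one dict per row, keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_table(SAMPLE)
for row in rows:
    # All values are plain strings; any type conversion is left to the caller.
    print(row["accession"], row["reliability"], row["peptides"])
```

This prints `P12345 4 2`. The same file opens in a spreadsheet as three columns, which is the “Excel-readable” property the proposal asks for.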
mzTab – proposed concept
• What the format does NOT aim to do:
  – Replace mzIdentML or mzQuantML
  – Contain the complete data of a proteomics experiment
  – Provide detailed evidence for the data
  – Allow a researcher to recreate the process which led to the results
  – Conform to reporting requirements (MIAPE, journal guidelines, etc.)
  – In short: be complete in any way
mzTab – Possible Format Specification
• Three sections
  – (Optional) Metadata section
  – (Required) Protein section
  – (Optional) Peptide section
• Can report proteomics data at different levels
  – Single experiments
  – Multiple (possibly linked) experiments
  – Data generated as the result of a query (possibly to multiple resources)
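A sketch of how a reader might separate the three sections, assuming (as in the section examples of this proposal) that each section opens with a `----<name>` line and closes with `----END`. The function `split_sections` and the sample text are illustrative only. Since only the protein section is required, the sketch raises an error when it is missing and tolerates absent metadata or peptide sections.

```python
# Hypothetical excerpt; section names follow the examples in this proposal.
SAMPLE = (
    "----metadata\n"
    "PRIDE_16649-species: [NEWT, 10090, Mouse,]\n"
    "----END\n"
    "----proteins\n"
    "accession\treliability\tpeptides\n"
    "P12345\t4\t2\n"
    "----END\n"
)


def split_sections(text):
    """Return {section_name: [lines]} using '----name' / '----END' markers."""
    sections = {}
    current = None  # name of the section being read; None outside sections
    for line in text.splitlines():
        marker = line.strip()
        if marker == "----END":
            current = None
        elif marker.startswith("----"):
            current = marker[len("----"):]
            sections.setdefault(current, [])
        elif current is not None and marker:
            sections[current].append(line)  # keep tabs and empty trailing cells
    if "proteins" not in sections:
        # The protein section is the only required one in this proposal.
        raise ValueError("missing required protein section")
    return sections


sections = split_sections(SAMPLE)
print(sorted(sections))  # ['metadata', 'proteins']
```

Lines outside any section are ignored, so one file could in principle hold several result blocks, in line with the goal of reporting multiple experiments or query results.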
mzTab – Metadata Section
----metadata
PRIDE_16649-title: The Synaptic Proteome during Development and Plasticity of the Mouse Visual Cortex
PRIDE_16649-species: [NEWT, 10090, Mouse,]
PRIDE_16649-tissue: [EFO, EFO:0000916, visual cortex,]
PRIDE_16649-instrument[1]-type: [MS, MS:1000287, TOF-MS,]
PRIDE_16649-search_engine: [MS, MS:1001207, Mascot, ]
PRIDE_16649-contact[1]-name: August B Smit
PRIDE_16649-contact[1]-email: [email protected]
PRIDE_16649-url: http://www.ebi.ac.uk/pride/q.do?accession=16649
----END
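The metadata lines follow a `<unit>-<field>: <value>` pattern, with controlled-vocabulary terms written in square brackets. The Python sketch below parses them under stated assumptions: the unit identifier (here `PRIDE_16649`) contains no hyphen, and the four bracketed fields are CV label, accession, name and an optional value, as the examples suggest. `CvParam`, `parse_value` and `parse_metadata` are hypothetical names, and the naive comma split would break on term names that themselves contain commas.

```python
from collections import namedtuple

# Assumed meaning of the four bracketed fields, inferred from examples such
# as [NEWT, 10090, Mouse,]: CV label, accession, name, optional value.
CvParam = namedtuple("CvParam", ["cv_label", "accession", "name", "value"])


def parse_value(raw):
    """Turn '[CV, acc, name, value]' into a CvParam; return other text as-is."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        # Naive split: names containing commas would need proper quoting.
        parts = [p.strip() for p in raw[1:-1].split(",")]
        if len(parts) == 4:
            label, accession, name, value = parts
            return CvParam(label, accession, name, value or None)
    return raw


def parse_metadata(lines):
    """Return {unit_id: {field: value}} from 'unit-field: value' lines."""
    units = {}
    for line in lines:
        # Split at the first ':' only; keys contain none, values (URLs) may.
        key, sep, raw = line.partition(":")
        if not sep:
            continue
        # The unit id (e.g. PRIDE_16649) is assumed to contain no '-'.
        unit_id, dash, field = key.strip().partition("-")
        if not dash:
            continue
        units.setdefault(unit_id, {})[field] = parse_value(raw)
    return units


meta = parse_metadata([
    "PRIDE_16649-species: [NEWT, 10090, Mouse,]",
    "PRIDE_16649-instrument[1]-type: [MS, MS:1000287, TOF-MS,]",
    "PRIDE_16649-url: http://www.ebi.ac.uk/pride/q.do?accession=16649",
])
print(meta["PRIDE_16649"]["species"].name)  # Mouse
```

Keying by unit identifier is what would let one file carry metadata for several experiments side by side.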
mzTab – Protein Section
----proteins
accession … reliability peptides … ambiguity_members
P12345 … 4 2 … P12346,P123457
…
----END
• A table holding the basic identification information
• Suggestions of how to include
  – quantitative data
  – multiple search engine scores
  – ambiguous modification positions
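One possible way to read protein rows is sketched below. It assumes the section body is tab-delimited with the header shown in the example, that `reliability` and `peptides` hold integers, and that `ambiguity_members` is a comma-separated list of accessions. Columns elided with “…” on the slide are left out, and `read_proteins` is a hypothetical helper.

```python
import csv
import io

# Hypothetical protein-section lines built from the slide example; the
# columns elided with "…" on the slide are simply left out here.
LINES = [
    "accession\treliability\tpeptides\tambiguity_members",
    "P12345\t4\t2\tP12346,P123457",
]


def read_proteins(lines):
    """Parse protein rows; ambiguity_members becomes a list of accessions."""
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    proteins = []
    for row in reader:
        members = row.get("ambiguity_members") or ""
        proteins.append({
            "accession": row["accession"],
            # Assumed to be integers, as in the slide example.
            "reliability": int(row["reliability"]),
            "peptides": int(row["peptides"]),
            "ambiguity_members": [m.strip() for m in members.split(",") if m.strip()],
        })
    return proteins


for protein in read_proteins(LINES):
    print(protein["accession"], protein["ambiguity_members"])
```

This prints `P12345 ['P12346', 'P123457']`. An empty `ambiguity_members` cell yields an empty list, so unambiguous proteins need no special handling.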
mzTab – Peptide Table
----peptides
sequence accession unit unique … reliability …
DIIL O00160 PRIDE_3381 false 5 …
VESVDL O00160 PRIDE_3381 true 4 …
----END
• A table holding the basic peptide information
• Suggestions of how to include
  – quantitative data
  – multiple search engine scores
  – ambiguous modification positions
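Since every peptide row carries a protein accession, a short script can regroup peptides by protein. This sketch uses the two rows from the example (columns elided with “…” omitted) and assumes tab-delimited rows and a `unique` column holding the strings `true` / `false`. `peptides_by_protein` and `unique_peptide_counts` are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict

# Peptide rows from the slide example; the columns elided with "…" are omitted.
LINES = [
    "sequence\taccession\tunit\tunique\treliability",
    "DIIL\tO00160\tPRIDE_3381\tfalse\t5",
    "VESVDL\tO00160\tPRIDE_3381\ttrue\t4",
]


def peptides_by_protein(lines):
    """Map accession -> list of (sequence, unit, is_unique) tuples."""
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        is_unique = row["unique"].strip().lower() == "true"
        grouped[row["accession"]].append((row["sequence"], row["unit"], is_unique))
    return dict(grouped)


def unique_peptide_counts(grouped):
    """Count the peptides flagged unique for each protein accession."""
    return {acc: sum(1 for _, _, u in peps if u) for acc, peps in grouped.items()}


grouped = peptides_by_protein(LINES)
print(unique_peptide_counts(grouped))  # {'O00160': 1}
```

Keeping the `unit` value in each tuple preserves the link back to the source experiment, which the proposal requires for every entry.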