Center for Mass Spectrometry and Proteomics | Phone | (612)625-2280 | (612)625-2279
© 2015 Regents of the University of Minnesota. All rights reserved.
Proteomics Data Analysis using Galaxy-P
Center for Mass Spectrometry and Proteomics
November 23rd 2015 Pratik Jagtap
http://www.cbs.umn.edu/msp
Proteomics Data Analysis using Galaxy-P
• Proteomics Workflow
• Search Databases
• Galaxy Platform
• Generating a Database within Galaxy-P
• Peaklist Conversion
• Search Algorithms
• Using Search Algorithms within Galaxy-P
• Protein Inference
• Using PeptideShaker within Galaxy-P
Documentation: http://z.umn.edu/augworkshopgalaxyp
Eng et al 2011 Mol Cell Proteomics. 10(11): R111.009522.
PROTEOMICS WORKFLOW
SEARCH DATABASES
[Figure: mass spectrum + reference protein database (from genomic annotation) → peptide spectral match]
Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality, annotated, non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions. http://en.wikipedia.org/wiki/Swiss-Prot
TrEMBL contains high-quality, computationally analyzed records that are enriched with automatic annotation. Translations of the annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered into TrEMBL. http://en.wikipedia.org/wiki/TrEMBL
PROTEOMIC DATABASES
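In a downloaded UniProtKB FASTA file, the two sections can be told apart by the header prefix: Swiss-Prot entries start with `sp|` and TrEMBL entries with `tr|`. The sketch below counts entries of each kind; the function names are hypothetical, the TrEMBL example header is made up, and the sequence lines are truncated placeholders.

```python
def uniprot_section(header: str) -> str:
    """Classify a UniProtKB FASTA header line by its database prefix."""
    prefix = header.lstrip(">").split("|", 1)[0]
    if prefix == "sp":
        return "Swiss-Prot"  # reviewed, manually annotated
    if prefix == "tr":
        return "TrEMBL"  # unreviewed, automatically annotated
    return "other"  # e.g. contaminant or custom entries


def count_sections(fasta_text: str) -> dict:
    """Count Swiss-Prot, TrEMBL and other entries in FASTA text."""
    counts = {"Swiss-Prot": 0, "TrEMBL": 0, "other": 0}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            counts[uniprot_section(line)] += 1
    return counts


if __name__ == "__main__":
    example = (
        ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606\n"
        "MVLSPADKTNVK\n"
        ">tr|EXAMPLE1|EXAMPLE1_HUMAN Made-up entry OS=Homo sapiens OX=9606\n"
        "MKTAYIAKQR\n"
    )
    print(count_sections(example))  # {'Swiss-Prot': 1, 'TrEMBL': 1, 'other': 0}
```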
CUSTOMIZED PROTEOMIC DATABASES
Sources of customized databases and how each is converted:
• Customized database repositories (CPTAC / UniMesh)
• Genomic DNA sequences: six-frame translation
• Expressed sequence tags / cDNA sequences: three-frame translation
• Metagenomic databases: translation
• RNASeq data: translation and database reduction workflows
All of these routes produce proteomic databases.
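A minimal sketch of three-frame and six-frame translation, assuming the standard genetic code (NCBI table 1) and plain DNA strings. Three-frame translation reads only the given strand (as for cDNA); six-frame translation also reads the reverse complement (as for genomic DNA). The function names are hypothetical, and the `split_at_stops` step with its `min_len` cutoff is an illustrative assumption about how translated frames are turned into searchable entries.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), amino acids in TCAG codon order; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement; unknown bases become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def translate(seq: str) -> str:
    """Translate codon by codon; codons with unknown bases become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frame(seq: str) -> list:
    """Translate the given strand in reading frames 0, 1 and 2."""
    seq = seq.upper()
    return [translate(seq[f:]) for f in range(3)]


def six_frame(seq: str) -> list:
    """Three forward frames followed by three reverse-complement frames."""
    seq = seq.upper()
    return three_frame(seq) + three_frame(reverse_complement(seq))


def split_at_stops(frames: list, min_len: int = 7) -> list:
    """Break each translated frame at stop codons and keep pieces of min_len or more."""
    return [piece for frame in frames for piece in frame.split("*") if len(piece) >= min_len]


if __name__ == "__main__":
    print(six_frame("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```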
LOOKING BEYOND THE KNOWN PROTEOME
Besides the reference protein database from genomic annotation, mass spectra can be searched against:
• Cancer / disease-related databases such as COSMIC, IARC p53, OMIM…
• Deep genome sequencing data from ICGC, TCGA and CPTAC
• RNASeq data (customized or combined)
• 6-frame translated DNA sequences and 3-frame translated cDNA sequences
Goal: identification of peptides corresponding to novel proteoforms.
GALAXY PLATFORM
Benefits of Galaxy
• A web-based bioinformatics data analysis platform.
• Software accessibility and usability.
• Shareability of tools, workflows and histories.
• Reproducibility, and the ability to test and compare results obtained with different parameter settings.
• Software tools can be used in sequence to build analytical workflows that can be reused, shared and creatively modified for multiple studies.
Goecks J et al Genome Biol. 2010;11(8):R86.
TOOLS & WORKFLOWS
• Software tools can be used in sequence to build analytical workflows that can be reused, shared and creatively modified for multiple studies.
For example, the Protein Database Downloader tool downloads UniProt protein FASTA databases for various organisms.
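Outside Galaxy, the same kind of download can be sketched against UniProt's REST "stream" endpoint. This is an assumption about the present-day UniProt API, not necessarily what the Protein Database Downloader tool calls internally; the function name is hypothetical, and the download itself is left commented out because it needs network access.

```python
from urllib.parse import urlencode

# Assumption: current UniProt REST endpoint that streams all matching entries
UNIPROT_STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_fasta_url(taxon_id: int, reviewed_only: bool = True) -> str:
    """Build a URL that streams all UniProtKB entries of one organism as FASTA."""
    query = f"organism_id:{taxon_id}"
    if reviewed_only:
        query += " AND reviewed:true"  # restrict to Swiss-Prot entries
    return f"{UNIPROT_STREAM_URL}?{urlencode({'query': query, 'format': 'fasta'})}"


if __name__ == "__main__":
    url = uniprot_fasta_url(9606)  # 9606 = NCBI taxonomy ID for Homo sapiens
    print(url)
    # To actually download (requires network access):
    # import urllib.request
    # urllib.request.urlretrieve(url, "human_swissprot.fasta")
```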
Galaxy-P: https://galaxyp.msi.umn.edu/
INPUTS: mass spectral data and a search database.
The dataset will be searched against a FASTA database containing human proteins, contaminant proteins, spiked-in proteins and a subset of a 3-frame translated cDNA database from Ensembl.
INPUTS:
a) MGF files from MGF Formatter (dataset collection).
b) ABRF-Spike4: FASTA sequences of the 4 spiked-in proteins.
c) FASTA file from Ensembl searches: a subset of the 3-frame translated cDNA database from Ensembl (our template for identifying novel proteoforms).
d) Human UniProt FASTA file + contaminant proteins.
Sample preparation and data acquisition:
• HeLa cell lysate with 4 proteins spiked in (10 fmol each)
• Digested overnight with trypsin
• Liquid chromatography fractionation (10 fractions)
• Thermo Finnigan Orbitrap Velos (Orbitrap MS, HCD MS/MS)
• RAW files converted to mzML files with msconvert, then to MGF files
1. Log in using your MSI login and password.
2. Click on http://z.umn.edu/history1, choose Import history and click ‘Start using this history’.
3. Click on http://z.umn.edu/workflow1 and choose Import to copy the workflow into your user workflows. On the confirmation screen, select ‘Start using this workflow’ to navigate to your workflows.
4. In the Workflows menu, select Run for Workflow 1 from the drop-down menu. Assign each input database from History 1 to the corresponding input of the workflow and click ‘Run’.
GENERATING A DATABASE
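Conceptually, the database-generation workflow combines several FASTA inputs (human UniProt + contaminants, the 4 spike-in proteins and the Ensembl 3-frame subset) into one search database. The sketch below assumes duplicates are identified by identical sequence and keeps the first header seen; the function names and toy records are hypothetical, and the Galaxy tools in the workflow may apply different rules.

```python
def read_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_unique(*fasta_texts: str) -> list:
    """Concatenate FASTA inputs, keeping only the first record of each sequence."""
    seen, merged = set(), []
    for text in fasta_texts:
        for header, seq in read_fasta(text):
            if seq not in seen:
                seen.add(seq)
                merged.append((header, seq))
    return merged


def to_fasta(records: list, width: int = 60) -> str:
    """Write (header, sequence) records as FASTA with wrapped sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    human = ">sp|P1|A_HUMAN\nMKV\nLLA\n>sp|P2|B_HUMAN\nGGG\n"   # toy records
    spikes = ">spike1\nMKVLLA\n>contam1\nPPP\n"                  # spike1 duplicates P1
    print(to_fasta(merge_unique(human, spikes)), end="")
```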
WORKFLOW 1
Tools used in the workflow
[Figure: INPUT: Select History 1 → Import history → Start using this history; WORKFLOW: Select Workflow 1 → Import workflow → Start using this workflow → Run Workflow 1]
http://z.umn.edu/history2
Eng et al 2011 Mol Cell Proteomics. 10(11): R111.009522.
MASS SPECTRAL DATA
RAW DATA CONVERSION TOOL
.RAW → msconvert (ProteoWizard) → mzML: http://z.umn.edu/msconvert
mzML → MGF Formatter → MGF: http://z.umn.edu/mgfformatter
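The MGF files produced at the end of this conversion are plain text: each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=value` parameters (such as `TITLE`, `PEPMASS`, `CHARGE`) followed by one "m/z intensity" pair per line. A minimal reader sketch, assuming well-formed input and ignoring global parameters outside spectrum blocks; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    params: dict = field(default_factory=dict)  # e.g. TITLE, PEPMASS, CHARGE
    peaks: list = field(default_factory=list)   # (m/z, intensity) pairs


def parse_mgf(text: str) -> list:
    """Parse MGF text into a list of Spectrum objects."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!/":  # skip blank and comment lines
            continue
        if line == "BEGIN IONS":
            current = Spectrum()
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a block are ignored here
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            current.params[key.upper()] = value
        else:
            parts = line.split()
            intensity = float(parts[1]) if len(parts) > 1 else 0.0
            current.peaks.append((float(parts[0]), intensity))
    return spectra


def precursor_mz(spectrum: Spectrum) -> float:
    """PEPMASS may hold 'm/z' or 'm/z intensity'; return the m/z part."""
    return float(spectrum.params["PEPMASS"].split()[0])


if __name__ == "__main__":
    mgf = "BEGIN IONS\nTITLE=scan=1\nPEPMASS=500.25 1000\nCHARGE=2+\n100.1 20\n200.2 30\nEND IONS\n"
    for s in parse_mgf(mgf):
        print(s.params["TITLE"], precursor_mz(s), len(s.peaks))
```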
Click on http://z.umn.edu/history2b, choose Import history and click ‘Start using this history’.
A face in the crowd: recognizing peptides through database search. Eng et al 2011 Mol Cell Proteomics. 10(11)
PROTEOMICS WORKFLOW
DATABASE SEARCH
Nesvizhskii et al Nature Methods - 4, 787 - 797 (2007)
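Database search starts by digesting every database protein in silico into candidate peptides. A minimal sketch of tryptic digestion, assuming the common rule "cleave after K or R unless the next residue is P"; the default of 2 missed cleavages and the length limits are illustrative assumptions (this workflow lists its missed-cleavage setting as "Not implemented"), and the function name is hypothetical.

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 2,
                   min_len: int = 6, max_len: int = 40) -> list:
    """Return sorted unique tryptic peptides, allowing up to `missed_cleavages`."""
    # Zero-width split after K/R when not followed by P; drop empty pieces
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = set()
    for i in range(len(pieces)):
        # Join i..j consecutive pieces: j - i is the number of missed cleavages
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return sorted(peptides)


if __name__ == "__main__":
    # K followed by P is not cleaved, so 'KPEPTIDEK' stays intact
    print(tryptic_digest("SAMPLERKPEPTIDEK", missed_cleavages=1, min_len=1))
    # ['KPEPTIDEK', 'SAMPLER', 'SAMPLERKPEPTIDEK']
```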
SEARCHGUI
Vaudel M. et al Proteomics (2011) 11(5) https://code.google.com/p/searchgui/
MULTIPLE SEARCH ALGORITHMS
• MyriMatch: Tabb et al, J Proteome Res. 2007, 6(2)
• Comet: Eng et al, Proteomics. 2013, 13(1)
• MS-GF+: Kim and Pevzner, Nat Commun. 2014, 5(1)
• OMSSA: Geer et al, J Proteome Res. 2004, 3(5)
• X! Tandem: Craig and Beavis, Bioinformatics. 2004, 20(9)
• MS Amanda: Dorfer et al, J Proteome Res. 2014, 13(8)
Click on http://z.umn.edu/history3b, choose Import history and click ‘Start using this history’.
Identification algorithms: OMSSA, MS-GF+ and Comet
Database search parameters:
1: Precursor accuracy unit: ppm
2: Precursor ion m/z tolerance: 10.0
3: Fragment ion m/z tolerance: 0.01
4: Enzyme: Trypsin
5: Number of missed cleavages: Not implemented
6: Database: input_database.fasta
7: Forward ion: b
8: Rewind ion: y
9: Fixed modifications: MMTS on C
10: Variable modifications: Oxidation of M
SEARCHGUI PARAMETERS
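To make the parameters above concrete, a sketch of how a candidate peptide is scored against them: compute its theoretical precursor m/z and b/y fragment ions from monoisotopic residue masses, then test the observed precursor against a 10 ppm window and count fragment peaks within 0.01. Assumptions: the fragment tolerance is in Da, MMTS on C is applied as the methylthio mass shift (+45.987721 Da), and the variable oxidation of M is omitted for brevity. All function names are hypothetical.

```python
PROTON = 1.007276  # Da
WATER = 18.010565  # Da, monoisotopic H2O
RESIDUE_MASS = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MMTS_ON_C = {"C": 45.987721}  # assumed fixed modification: methylthio on Cys


def residue_masses(peptide, fixed_mods=None):
    fixed = MMTS_ON_C if fixed_mods is None else fixed_mods
    return [RESIDUE_MASS[aa] + fixed.get(aa, 0.0) for aa in peptide]


def precursor_mz(peptide, charge=2, fixed_mods=None):
    """m/z of [M + zH]z+ for the peptide."""
    neutral = sum(residue_masses(peptide, fixed_mods)) + WATER
    return (neutral + charge * PROTON) / charge


def b_y_ions(peptide, fixed_mods=None):
    """Singly charged b ions (N-terminal, 'forward') and y ions (C-terminal, 'rewind')."""
    m = residue_masses(peptide, fixed_mods)
    b = [sum(m[:i]) + PROTON for i in range(1, len(m))]
    y = [sum(m[-i:]) + WATER + PROTON for i in range(1, len(m))]
    return b, y


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def within_ppm(observed, theoretical, tol_ppm=10.0):
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def count_fragment_matches(peak_mzs, theoretical, tol=0.01):
    """Number of theoretical ions with at least one observed peak within tol."""
    return sum(1 for t in theoretical if any(abs(p - t) <= tol for p in peak_mzs))


if __name__ == "__main__":
    b, y = b_y_ions("PEPTCDEK")
    print(round(precursor_mz("PEPTCDEK"), 4), [round(v, 4) for v in b])
    print(within_ppm(1000.005, 1000.0))  # 5 ppm off: True
```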
PEPTIDESHAKER
Vaudel et al Nature Biotechnology, 33, (2015)
http://galaxyproteomics.github.io/peptideshaker/
Slide from Alexey Nesvizhskii talk at http://www.scivee.tv/node/12671
PEPTIDESHAKER : PROTEIN INFERENCE
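Protein inference has to decide which proteins explain the identified peptides when peptides are shared between proteins. A common simplification is the parsimony (Occam's razor) principle: report the smallest set of proteins that explains all peptides. The sketch below is a generic greedy set-cover version of that idea, not PeptideShaker's actual algorithm; the function name and toy data are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides: dict) -> list:
    """Greedily pick proteins until every identified peptide is explained."""
    peptide_sets = {p: set(peps) for p, peps in protein_to_peptides.items()}
    remaining = set().union(*peptide_sets.values()) if peptide_sets else set()
    selected = []
    while remaining:
        # Protein explaining the most still-unexplained peptides (ties: sorted name order)
        best = max(sorted(peptide_sets), key=lambda p: len(peptide_sets[p] & remaining))
        gained = peptide_sets[best] & remaining
        if not gained:
            break
        selected.append(best)
        remaining -= gained
    return selected


if __name__ == "__main__":
    evidence = {
        "P1": {"pepA", "pepB", "pepC"},
        "P2": {"pepB", "pepC"},  # subset of P1: not needed
        "P3": {"pepD"},
    }
    print(parsimonious_proteins(evidence))  # ['P1', 'P3']
```

Greedy set cover is not guaranteed to find the minimal set in every case, and real tools also group indistinguishable proteins; the sketch only shows the principle.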
Click on http://z.umn.edu/history4b, choose Import history and click ‘Start using this history’.
4.3 PeptideShaker in Galaxy-P
PEPTIDESHAKER : TARGET-DECOY SEARCH
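In a target-decoy search, reversed (or otherwise scrambled) decoy sequences are appended to the target database, and the number of decoy hits above a score threshold estimates the number of false target hits. A minimal sketch, assuming reversed decoys, a concatenated search, higher scores being better, and the simple estimate FDR ≈ decoys / targets; the `DECOY_` prefix and function names are hypothetical, and tied scores are not handled specially.

```python
def reverse_decoys(records: list, prefix: str = "DECOY_") -> list:
    """Make one reversed decoy (header, sequence) record per target record."""
    return [(prefix + header, seq[::-1]) for header, seq in records]


def score_threshold(psms: list, max_fdr: float = 0.01):
    """psms: (score, is_decoy) pairs. Return the lowest score at which the
    estimated FDR (decoys / targets) is still <= max_fdr, or None."""
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda x: x[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


if __name__ == "__main__":
    print(reverse_decoys([("sp|P1|A_HUMAN", "MKVLLA")]))
    psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    print(score_threshold(psms, max_fdr=0.01))  # 9: keeping the decoy at 8 would exceed 1%
```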
PEPTIDESHAKER: OUTPUTS
Complex Workflows
Galaxy-P provides an integrated platform for every step of proteogenomic analysis.
• Build the target database: download and translate EST databases or perform gene prediction with Augustus.
• Numerous tools for identification and text manipulation.
• Workflow using BLAST to identify novel peptides.
• Tool to assess peptide-spectrum matches and visualize spectra.
• Visualize identified peptides on the genome.
• 140 steps: a seamless, integrated proteogenomic workflow.
Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J Proteome Res. DOI: 10.1021/pr500812t. Link: z.umn.edu/pgfirstlook
Links to workflows, webcast, pages, documentation and publications.
Workflows:
• Proteogenomic studies: http://z.umn.edu/pg140
• Metaproteomic studies: http://z.umn.edu/metaproteomics1
Webcast:
• Using ProteinPilot within Galaxy-P: z.umn.edu/ppingp
Pages:
• Proteogenomics page: z.umn.edu/proteinpilotpage
• Metaproteomics page: z.umn.edu/metaproteomicspage
Workshop / tutorial on proteogenomics:
• Mass Spectrometry-based Proteomics Data Analysis using Galaxy-P: z.umn.edu/gcc2015gp
Manuscripts
• Metaproteomic analysis using the Galaxy framework. Proteomics. (2015) doi: 10.1002/pmic.201500074. PMID: 26058579.
• Multi-omic data analysis using Galaxy. Nat Biotechnol. (2015) 33(2):137-9. doi: 10.1038/nbt.3134. PMID: 25658277
• Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J Proteome Res. (2014) 13(12):5898-908. doi: 10.1021/pr500812t. PMID: 25301683
• Proteomic profiles in acute respiratory distress syndrome differentiates survivors from non-survivors. PLoS One. (2014) 9(10):e109713. doi: 10.1371/journal.pone.0109713. PMID: 25290099
• Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations. Sheynkman GM, Johnson JE, Jagtap PD, Shortreed MR, Onsongo G, Frey BL, Griffin TJ, Smith LM. BMC Genomics. (2014) 15:703. doi: 10.1186/1471-2164-15-703. PMID: 25149441
QUESTIONS?
Follow us on twitter.com/usegalaxyp
Visit http://usegalaxyp.org
or http://galaxyp.msi.umn.edu