Center for Mass Spectrometry and Proteomics | Phone | (612)625-2280 | (612)625-2279

© 2015 Regents of the University of Minnesota. All rights reserved.

PROTEOINFORMATICS OVERVIEW

Center for Mass Spectrometry and Proteomics

August 20th 2015 Pratik Jagtap

http://www.cbs.umn.edu/msp


Outline
• PROTEOMICS WORKFLOW
• PEAKLIST PROCESSING
• Search Databases Overview
• Protein Identification
• Protein Validation and Quantification
• Publication Guidelines

Terminology
• RAW file
• Peaklist
• Peaklist processing
• Peptide-Spectral Match (PSM)
• Genome Assembly and annotation
• Variety of search databases


PROTEOMICS WORKFLOW

Eng et al. 2011 Mol Cell Proteomics. 10(11): R111.009522.


PROTEOMICS WORKFLOW

Mass Spectrometer → Mass spectral data (.RAW) → Processing → Search databases → Protein Identification → Statistical validation of Protein Identification → Protein Quantitation.


MASS SPECTRAL DATA


Cappadona et al. 2012 Amino Acids. 43(3): 1087–1108.

Eng et al. 2011 Mol Cell Proteomics. 10(11): R111.009522.


Peaklist Processing


RAW DATA CONVERSION TOOLS

.RAW files are read via the XRawfile library from the ThermoFinnigan Xcalibur software.
• ReAdW: .RAW → mzXML (http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW)
• msconvert (ProteoWizard): .RAW → mzML (http://proteowizard.sourceforge.net/)
• Others: Raw2MSM, extract_msn, DeconMSn, DTASuperCharge
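As a rough illustration of the conversion step, the Python sketch below only assembles a ProteoWizard msconvert command line; it does not run it. The helper name build_msconvert_command, the file name sample.RAW, and the output directory are assumptions made for this example. The --mzML, --mzXML, --mgf and -o options belong to msconvert, but the installed version's help output should be checked before relying on them.

```python
# Output formats mapped to the corresponding msconvert format flags.
FORMAT_FLAGS = {"mzML": "--mzML", "mzXML": "--mzXML", "mgf": "--mgf"}


def build_msconvert_command(raw_file, out_dir, fmt="mzML"):
    """Return the argument list for converting one RAW file (hypothetical helper)."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unsupported format: {fmt}")
    return ["msconvert", raw_file, FORMAT_FLAGS[fmt], "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical input file; on a machine with ProteoWizard installed,
    # the returned list could be passed to subprocess.run().
    print(" ".join(build_msconvert_command("sample.RAW", "converted")))
```

Building the argument list separately from execution keeps the sketch usable on machines without the vendor libraries that reading .RAW files requires.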


ORBITRAP: PROCESSING AND EFFECTS

Average ppm mass error and its standard deviation improve when MaxQuant-processed files are used.
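The ppm figures on this slide are taken here to mean precursor mass error. A minimal sketch of how an average and a standard deviation of ppm error could be computed follows; the m/z values are made up for illustration and are not data from the slide.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million relative to the theoretical value."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Made-up (observed, theoretical) m/z pairs, for illustration only.
pairs = [(500.0010, 500.0000), (750.0030, 750.0000), (1000.0010, 1000.0000)]
errors = [ppm_error(obs, theo) for obs, theo in pairs]

print(f"mean ppm error: {mean(errors):.2f}")
print(f"standard deviation: {stdev(errors):.2f}")
```

A lower mean indicates better calibration, and a lower standard deviation indicates more consistent mass measurement, which in turn allows a tighter precursor tolerance in the database search.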


Search Databases Overview


DATABASE SEARCH

Mass spectrum → search against database.
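One way to picture the search step is as candidate selection by precursor mass. The sketch below digests a toy database in silico with trypsin rules and keeps the peptides whose mass falls within a ppm tolerance of an observed mass. The protein names and sequences are made up, the function names are hypothetical, and real search engines go further by scoring fragment ions to produce peptide-spectral matches; this is only the first filtering step under those assumptions.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def tryptic_peptides(protein):
    """Cleave after K or R, but not before P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidates(observed_mass, database, tol_ppm=10.0):
    """Return (protein, peptide) pairs within tol_ppm of observed_mass."""
    hits = []
    for name, seq in database.items():
        for pep in tryptic_peptides(seq):
            mass = peptide_mass(pep)
            if abs(observed_mass - mass) / mass * 1e6 <= tol_ppm:
                hits.append((name, pep))
    return hits


# Toy database with made-up sequences.
database = {"protA": "AKRPGKC", "protB": "GGKSSR"}
print(candidates(peptide_mass("RPGK"), database))
```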


Salzberg, Genome Biology 2007, 8:102, doi:10.1186

DNA → GENOME → PROTEOMIC DATABASE.


GENOMIC AND PROTEOMIC DATABASES

Finished and Published Genomes
• 3551 Bacterial Genomes
• 211 Archaeal Genomes
• 58 Eukaryal Genomes
• 3363 Viral Genomes

http://www.genomesonline.org/index


PROTEOMIC DATABASES

CUSTOMIZED DATABASES


Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality annotated and non-redundant protein sequence database, which brings together experimental results, computed features and scientific conclusions. http://en.wikipedia.org/wiki/Swiss-Prot

TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation. The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in TrEMBL. http://en.wikipedia.org/wiki/TrEMBL
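In UniProt FASTA downloads, the two sections can be told apart from the header prefix: sp for Swiss-Prot entries and tr for TrEMBL entries. The sketch below shows one way to read that prefix; the function name is hypothetical, and the example headers are illustrative (the second accession and entry name are made up).

```python
def parse_uniprot_header(header):
    """Split a UniProt FASTA header into section, accession and entry name."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name = rest.split(" ", 1)[0]
    sections = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}
    return {
        "section": sections.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
    }


# Illustrative headers; the TrEMBL accession and entry name are made up.
headers = [
    ">sp|P02768|ALBU_HUMAN Albumin OS=Homo sapiens OX=9606 GN=ALB PE=1 SV=2",
    ">tr|A0A000XXX0|A0A000XXX0_HUMAN Uncharacterized protein OS=Homo sapiens OX=9606 PE=4 SV=1",
]
for h in headers:
    print(parse_uniprot_header(h))
```

Filtering on this prefix is a simple way to restrict a search database to reviewed entries only, or to report how many identifications rest on unreviewed sequences.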


UNIPROT DATABASE


The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq sequences form a foundation for medical, functional, and diversity studies. They provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses. http://www.ncbi.nlm.nih.gov/refseq/


CUSTOMIZED PROTEOMIC DATABASES

Sources feeding into proteomic databases:
• Customized database repositories (CPTAC / UniMesh)
• Genomic DNA sequences → six-frame translation
• Expressed sequence tags / cDNA sequences → three-frame translation
• Metagenomic databases → translation
• RNASeq data → translation and database reduction workflows
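Six-frame and three-frame translation can be sketched in a few lines of Python. The example assumes the standard genetic code and a made-up DNA sequence; the function names are hypothetical, and a production workflow would also split at stop codons and discard short open reading frames, which this sketch does not attempt.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; an incomplete trailing codon is dropped."""
    # Codons containing characters other than A, C, G, T become "X".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def three_frame_translation(dna):
    """Forward-strand reading frames 1, 2 and 3."""
    return [translate(dna[offset:]) for offset in range(3)]


def six_frame_translation(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return three_frame_translation(dna) + three_frame_translation(reverse)


# Made-up sequence for illustration; "*" marks a stop codon.
for frame in six_frame_translation("ATGGCCAAATAG"):
    print(frame)
```

Three frames suffice when the strand is already known, as with expressed sequence tags or cDNA, whereas genomic DNA needs all six because a gene may lie on either strand.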


Protein Identification