Protein Sequence Analysis - Overview - NIH Proteomics Workshop 2007 Raja Mazumder Scientific Coordinator, PIR Research Assistant Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center

Protein Sequence Analysis - Overview -

NIH Proteomics Workshop 2007

Raja Mazumder

Scientific Coordinator, PIR

Research Assistant Professor, Department of Biochemistry and Molecular Biology

Georgetown University Medical Center

Topics

Proteomics and protein bioinformatics (protein sequence analysis)

Why do protein sequence analysis?

Searching sequence databases

Post-processing search results

Detecting remote homologs

Clinical proteomics

From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695

Single protein and shotgun analysis

Adapted from: McDonald et al. (2002). Disease Markers 18:99-105

[Figure: single protein analysis versus shotgun analysis workflows]

Single protein analysis: mixture of proteins -> gel-based separation -> spot excision and digestion -> peptides from a single protein -> MS analysis

Shotgun analysis: mixture of proteins -> digestion of protein mixture -> LC or LC/LC separation -> peptides from many proteins -> MS/MS analysis

Figure label: Protein Bioinformatics

Protein bioinformatics: protein sequence analysis

Helps characterize protein sequences in silico and allows prediction of protein structure and function

Statistically significant BLAST hits usually signify sequence homology

Homologous sequences may or may not have the same function, but they almost always (with very few exceptions) share the same structural fold

Protein sequence analysis allows protein classification

Development of protein sequence databases

Atlas of Protein Sequence and Structure – Dayhoff (1966): the first sequence database (pre-bioinformatics), currently known as the Protein Information Resource (PIR)

Protein Data Bank (PDB) – structural database (1972); remains the most widely used database of structures

UniProt – The Universal Protein Resource (2003) is a central database of protein sequence and function created by joining the forces of the Swiss-Prot, TrEMBL and PIR protein database activities

Comparative protein sequence analysis and evolution

Patterns of conservation in sequences allow us to determine which residues are under selective constraint (and thus are likely important for protein function)

Comparative analysis of proteins is more sensitive than comparing DNA

Homologous proteins have a common ancestor

Different proteins evolve at different rates

Protein classification systems based on evolution: PIRSF and COG
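
As a minimal sketch of how conservation patterns can be read from an alignment, the Python below scores each column of a toy multiple sequence alignment by the fraction of sequences that share the most common residue. The function name `column_conservation` and the toy sequences are illustrative assumptions, not part of the workshop material; real analyses typically use substitution-aware conservation scores.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return, per column, the fraction of rows sharing the most common residue.

    Gap characters are never counted as the conserved residue, so a column
    with gaps scores lower than a fully occupied, identical column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


# Toy alignment (hypothetical sequences, for illustration only)
toy_alignment = ["ABACD", "AB-CD", "XBACE"]
scores = column_conservation(toy_alignment)
conserved_columns = [i for i, s in enumerate(scores) if s == 1.0]
```

Columns that score 1.0 are identical in every sequence and are the first candidates for residues under selective constraint.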

PIRSF and large-scale annotation of proteins

PIRSF is a protein classification system based on the evolutionary relationships of whole proteins

As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation

Comparing proteins

Amino acid sequence of protein generated from proteomics experiment

e.g. protein fragment DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT

The amino acids of two sequences can be aligned, and the number of identical residues (or an index of similarity) can then be used as a measure of relatedness.

Protein structures can be compared by superimposition
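
The residue counting described above can be sketched in a few lines of Python. This assumes the two sequences are already aligned (equal length, with `-` marking gaps). The function name `percent_identity` is an illustrative assumption; the first string is the start of the fragment shown above and the second is a made-up partner sequence. Conventions for the denominator vary between tools; here only columns without gaps are counted.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


seq_a = "DTIKDLLPNV"  # first ten residues of the example fragment
seq_b = "DTLKD-LPNI"  # hypothetical aligned partner, for illustration only
identity = percent_identity(seq_a, seq_b)
```

With these inputs, 7 of the 9 gap-free columns are identical, so `identity` is about 77.8.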

Protein sequence alignment

Pairwise alignment

a b a c d
a b _ c d

Multiple sequence alignment provides more information

a b a c d
a b _ c d
x b a c e

MSA is difficult to do for distantly related proteins
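
A pairwise alignment like the toy example above can be produced with the Needleman-Wunsch global alignment algorithm. The sketch below is a minimal version under assumed scores (match +1, mismatch -1, gap -1). The function name `global_align` and these scoring values are illustrative choices, and real protein alignments would normally use a substitution matrix and affine gap penalties instead.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1, gap_char="_"):
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best_score = global_align("abacd", "abcd")
```

For the toy strings, this yields `abacd` over `ab_cd` with a score of 3 (four matches and one gap).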

Protein sequence analysis overview

Protein databases: PIR (pir.georgetown.edu) and UniProt (www.uniprot.org)

Searching databases: peptide search, BLAST search, text search

Information retrieval and analysis: protein records at UniProt and PIR, multiple sequence alignment, secondary structure prediction, homology modeling

Universal Protein Resource

http://www.uniprot.org/

[Diagram: UniProt databases]

UniProt Archive (UniParc): sequences from Swiss-Prot, TrEMBL, PIR-PSD, RefSeq, GenBank/EMBL/DDBJ, EnsEMBL, PDB, patent data and other data

UniProt Knowledgebase (UniProtKB): automated merging of sequences, literature-based annotation and automated annotation

UniProt NREF (UniRef100, UniRef90, UniRef50): clustering at 100, 90, 50%

Peptide Search

ID mapping

Query Sequence

Unknown sequence is Q9I7I7

BLAST Q9I7I7 against the UniProt Knowledgebase (http://www.uniprot.org/search/blast.shtml)

Analyze results

BLAST results

Text search

Any Field: not specific

Text search results: display options

Move PubMed ID, Pfam ID and PDB ID into “Columns in Display”

specific

Text search results: add input box

Text search result with null/not null

UniProt beta site

http://beta.uniprot.org/

UniProtKB protein record

SIR2_HUMAN protein record

Are Q9I7I7 and SIR2_HUMAN homologs?

Check BLAST results

Check pairwise alignment

Protein structure prediction

Programs can predict secondary structure information with 70% accuracy

Homology modeling - prediction of ‘target’ structure from closely related ‘template’ structure
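
An accuracy figure like the one quoted above is commonly reported as a per-residue, three-state score (often called Q3): the percentage of residues whose predicted state (helix H, strand E or coil C) matches the observed state. Reading the slide's 70% as Q3 is an assumption. The sketch below computes that score for two made-up state strings, and the function name `q3_accuracy` is illustrative.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        return 0.0
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


# Hypothetical state strings, for illustration only (H = helix, E = strand, C = coil)
predicted = "CHHHHHCCEEEECC"
observed = "CCHHHHHCEEEEEC"
q3 = q3_accuracy(predicted, observed)
```

Here 11 of 14 positions agree, so `q3` is about 78.6.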

Secondary structure prediction

http://bioinf.cs.ucl.ac.uk/psipred/

Secondary structure prediction results

Sir2 structure

Homology modeling

http://www.expasy.org/swissmod/SWISS-MODEL.html

Homology model of Q9I7I7

Blue - excellent
Green - so so
Red - not good

Yellow - beta sheet
Red - alpha helix
Grey - loop

Sequence features: SIR2_HUMAN

Multiple sequence alignment

Multiple sequence alignment

Q9I7I7, Q82QG9, SIR2_HUMAN

Sequence features: CRAA_RABIT

Identifying Remote Homologs

Structure guided sequence alignment