Protein Sequence Analysis - Overview. Raja Mazumder, Senior Protein Scientist, PIR; Assistant Professor, Department of Biochemistry and Molecular Biology, Georgetown University Medical Center. NIH Proteomics Workshop 2004


Page 1: Protein Sequence Analysis - Overview

Raja Mazumder

Senior Protein Scientist, PIR

Assistant Professor, Department of Biochemistry and Molecular Biology

Georgetown University Medical Center

NIH Proteomics Workshop 2004

Page 2: Overview

Proteomics and protein bioinformatics (protein sequence analysis)

Why do protein sequence analysis?

Searching sequence databases

Post-processing search results

Detecting remote homologs

Page 3: Clinical Proteomics

From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695

Page 4: Single protein and shotgun analysis

[Figure: two proteomics workflows feeding into protein bioinformatics. Adapted from: McDonald et al. 2002. Disease Markers 18, 99-105]

Single protein analysis: mixture of proteins -> gel-based separation -> spot excision and digestion -> peptides from a single protein -> MS analysis

Shotgun analysis: mixture of proteins -> digestion of protein mixture -> LC or LC/LC separation -> peptides from many proteins -> MS/MS analysis

Page 5: Protein Bioinformatics: Protein sequence analysis

Helps characterize protein sequences in silico and allows prediction of protein structure and function

Statistically significant BLAST hits usually signify sequence homology

Homologous sequences may or may not have the same function but would always (with very few exceptions) have the same structural fold

Protein sequence analysis allows protein classification

Page 6: Development of protein sequence databases

Atlas of Protein Sequence and Structure: Dayhoff (1966), the first sequence database (pre-bioinformatics). Currently known as the Protein Information Resource (PIR)

Protein Data Bank (PDB): structural database (1972); remains the most widely used database of structures

UniProt: The United Protein Databases (UniProt, 2003) is a central database of protein sequence and function created by joining the forces of the SWISS-PROT, TrEMBL and PIR protein database activities

Page 7: Comparative protein sequence analysis and evolution

Patterns of conservation in sequences allow us to determine which residues are under selective constraints (are important for protein function)

Comparative analysis of proteins is more sensitive than comparing DNA

Homologous proteins have a common ancestor

Different proteins evolve at different rates

Protein classification systems based on evolution: PIRSF and COG
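One simple way to look at "patterns of conservation" is to score each column of a multiple sequence alignment by how often its most common residue occurs. The sketch below is illustrative only: the function name, the scoring rule (fraction of sequences carrying the most frequent residue, gaps ignored) and the toy alignment are assumptions, not something taken from the slides.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return, for each alignment column, the fraction of sequences
    that carry the most common (non-gap) residue in that column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]  # gaps are not counted
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


# Hypothetical toy alignment of three sequences, for illustration only.
toy_alignment = ["abacd", "ab-cd", "xbace"]
scores = column_conservation(toy_alignment)
```

Columns scoring 1.0 (here the second and fourth) are fully conserved and would be the first candidates for residues under selective constraint; real analyses usually use substitution-aware measures rather than this plain frequency.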

Page 8: PIRSF and large-scale functional annotation of proteins

The PIRSF structure is a network classification system based on the evolutionary relationships of whole proteins and domains

As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation

Page 9: Comparing proteins

Amino acid sequence of a protein generated from a proteomics experiment

e.g. protein fragment DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT

The amino acids of two sequences can be aligned, and we can easily count the number of identical residues (or use an index of similarity) to find the % identity (or % similarity).

Protein structures can be compared by superimposition
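Counting identical residues in an alignment can be sketched in a few lines. This is a minimal illustration, assuming the two sequences are already aligned to the same length with "-" for gaps, and assuming that the denominator is the number of aligned columns (other conventions, such as dividing by the shorter sequence length, are also in use). The second sequence in the example is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry
    the same residue. Columns where both sequences have a gap are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # an all-gap column carries no information
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# First ten residues of the fragment above against a made-up aligned partner.
identity = percent_identity("DTIKDLLPNV", "DTLKDL-PNV")
```

Here 8 of the 10 columns match, so `identity` is 80.0. An "index of similarity" would replace the `x == y` test with a lookup in a substitution matrix.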

Page 10: Protein sequence alignment

Pairwise alignment

  a b a c d
  a b _ c d

Multiple sequence alignment usually provides more information

  a b a c d
  a b _ c d
  x b a c e

Multiple alignment is difficult to do for distantly related proteins
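The pairwise example above can be reproduced with a small global alignment routine. The sketch below uses the Needleman-Wunsch dynamic programming scheme, which the slides do not name, with assumed scores (match +1, mismatch -1, gap -1); real protein alignments would use a substitution matrix and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1, gap_char="-"):
    """Global alignment of two strings with a linear gap penalty.
    Returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# The toy sequences from the slide, using "_" as the gap symbol as shown there.
aligned_a, aligned_b, best = needleman_wunsch("abacd", "abcd", gap_char="_")
```

With these scores the routine returns "abacd" over "ab_cd" with a score of 3 (four matches, one gap), matching the pairwise alignment on the slide.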

Page 11: Protein sequence analysis overview

Protein databases: PIR and UniProt

Searching databases: Peptide search, BLAST search, Text search

Information retrieval and analysis: Protein records at UniProt and PIR; Multiple sequence alignment; Secondary structure prediction; Homology modeling

Page 12: Universal Protein Knowledgebase (UniProt)

PIR (Protein Information Resource) has recently joined forces with EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics) to establish UniProt

[Figure: UniProt data flow. Diagram labels: Swiss-Prot, TrEMBL, PIR-PSD, RefSeq, GenBank/EMBL/DDBJ, EnsEMBL, PDB, Patent Data, Other Data; UniProt Archive; UniProt NREF (Clustering at 100, 90, 50%); UniProt Knowledgebase (Literature-Based Annotation, Automated Annotation, Classification)]

http://www.uniprot.org/

Page 13: Peptide Search
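The slide shows only a screenshot of the search form. As a rough illustration of the idea, a peptide search can be thought of as looking for an exact occurrence of a short peptide inside every database sequence. The sketch below assumes exact matching; the function name, the accessions and the sequences are all hypothetical and do not come from PIR or UniProt.

```python
def peptide_search(peptide, database):
    """Return (accession, 1-based start position) for every exact
    occurrence of `peptide` in the sequences of `database`."""
    hits = []
    for accession, sequence in database.items():
        start = sequence.find(peptide)
        while start != -1:
            hits.append((accession, start + 1))  # report 1-based positions
            start = sequence.find(peptide, start + 1)
    return hits


# Made-up miniature "database" for illustration only.
toy_database = {
    "TOY_1": "MKDTIKDLLPNVCAFPMEKG",
    "TOY_2": "MSTNPKPQRKDLLPNVDLLPNV",
    "TOY_3": "MAAAGGGSSS",
}
hits = peptide_search("DLLPNV", toy_database)
```

Here `hits` lists one match in TOY_1 (position 7) and two in TOY_2 (positions 11 and 17). A production search over millions of sequences would use an index rather than a linear scan.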

Page 14: Query Sequence

Unknown sequence is Q9I7I7

BLAST Q9I7I7 against the UniProt knowledgebase (http://www.pir.uniprot.org/search/blast.shtml)

Analyze results

Page 15: BLAST results

Page 16: Text Search

Page 17: Text search results: display options

Moving PubMed ID and PDB ID into “Columns in Display”

Page 18: Text search results: add input box

Page 19: Text Search Result with NULL/NOT NULL

Page 20: UniProt protein record

Page 21: SIR2_HUMAN protein record

Page 22: Are Q9I7I7 and SIR2_HUMAN close homologs?

Check BLAST results

Check pairwise alignment

Page 23: Protein structure prediction

Programs can predict secondary structure information with about 70% accuracy

Homology modeling: prediction of a 'target' structure from a closely related 'template' structure
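An accuracy figure like the one above is commonly reported as the percentage of residues whose predicted state matches the observed state. The sketch below assumes a three-state encoding (H = helix, E = strand, C = coil) and per-residue scoring; the slides do not say which measure the 70% refers to, and both example strings are made up.

```python
def q3_accuracy(predicted, observed):
    """Percent of residues whose predicted secondary structure state
    (H, E or C) equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not predicted:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


# Hypothetical 16-residue example, for illustration only.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"
accuracy = q3_accuracy(predicted, observed)
```

In this example 14 of 16 residues agree, giving 87.5; the two errors fall at the ends of the helix and the strand, which is typical of where predictors disagree with experimental assignments.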

Page 24: Secondary structure prediction

http://bioinf.cs.ucl.ac.uk/psipred/

Page 25: Secondary structure prediction results

Page 26: Sir2 Homolog-NAD Complex

Page 27: Homology modeling

http://www.expasy.org/swissmod/SWISS-MODEL.html

Page 28: Homology model of Q9I7I7

Blue - excellent; Green - so-so; Red - not good

Yellow - beta sheet; Red - alpha helix; Grey - loop

Page 29: Sequence features: SIR2_HUMAN

Page 30: Multiple sequence alignment

Page 31: Multiple sequence alignment

Q9I7I7, Q82QG9, SIR2_HUMAN

Page 32: Sequence features: CRAA_RABIT

Page 33: Identifying remote homologs

Page 34: Structure guided sequence alignment