
Protein Structure Databases, cont. 11/09/05
web.cs.iastate.edu/~cs544/Lectures/1109ProteinStructure.pdf


11/09/05 D Dobbs ISU - BCB 444/544X: Protein Structure Databases - cont. 1

11/9/05

Protein Structure Databases (continued)

Prediction & Modeling

Bioinformatics Seminars

Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff
Computational Epidemiology
Armin R. Mikler, Univ. North Texas
http://www.cs.iastate.edu/~colloq/#t3

Nov 10 Thurs 4:10 EEOB Seminar in 210 Bessey
Diversity and Evolution of Plant Immunity Genes: Insights from Molecular Population Genetics
Peter Tiffin, Univ. of Minnesota
http://www.cbs.umn.edu/tiffin/index.html

Bioinformatics Seminars
CORRECTION:

Next week - Baker Center/BCB Seminars: (seminar abstracts available at above link)

Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
Discovering transcription factor binding sites

Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
Modeling protein-protein interactions

Both seminars will be in Howe Hall Auditorium

Protein Structure & Function: Analysis & Prediction

Mon: Protein structure: basics; classification, databases, visualization
Wed: Protein structure databases - cont.
Thurs Lab: Protein structure databases; protein structure analysis & prediction
Fri: Protein structure prediction; protein-nucleic acid interactions

Reading Assignment (for Mon-Fri)
Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
  http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Check Errata: http://www.bioinformaticsonline.org/help/errata2.html

Additional reading assignments for BCB 544:
• Gene Prediction: Burge & Karlin 1997 JMB 268:78
  Prediction of complete gene structures in human genomic DNA
• Structure Prediction: Schueler-Furman…Baker, Science 310:638
  Progress in modeling of protein structures and interactions


Review last lecture:

Protein Structure: Basics

Protein Structure & Function

• Amino acid characteristics
• Structural classes & motifs
• Protein functions & functional families (not much - more on this later)
• Classification
• Databases
• Visualization

Amino Acids

Each of the 20 different amino acids has a different "R-group" (side chain) attached to Cα


Peptide bond is rigid and planar


Hydrophobic Amino Acids


Charged Amino Acids


Polar Amino Acids


Certain side-chain configurations are energetically favored (rotamers)

Ramachandran plot: "Allowable" psi & phi angles
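
As a hedged illustration of what the phi and psi angles on a Ramachandran plot are, the sketch below computes a signed dihedral angle from four atom positions. It assumes the usual backbone definitions (phi from C(i-1), N(i), Cα(i), C(i); psi from N(i), Cα(i), C(i), N(i+1)). The function name and the toy coordinates are illustrative and are not taken from the lecture.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical, not real atoms): a quarter turn and a trans arrangement.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

With real coordinates, applying this to each residue's backbone atoms would give the (phi, psi) pair that a Ramachandran plot places on its two axes.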

Glycine is the smallest amino acid: R group = H atom

• Glycine residues increase backbone flexibility because their side chain is only an H atom

Proline is cyclic

• Proline residues reduce flexibility of polypeptide chain
• Proline cis-trans isomerization is often a rate-limiting step in protein folding
• Recent work suggests it may also regulate ligand binding in native proteins - Andreotti

Cysteines can form disulfide bonds

• Disulfide bonds (covalent) stabilize 3-D structures
• In eukaryotes, disulfide bonds are found mainly in secreted proteins or extracellular domains

Globular proteins have a compact hydrophobic core

Packing of hydrophobic side chains into interior is main driving force for folding

Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit; these polar groups must be neutralized

Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds

Exterior surface of globular proteins is generally hydrophilic

Hydrophobic core formed by packed secondary structural elements provides compact, stable core

"Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues

Hydrophobic "patches" on protein surface are often involved in protein-protein interactions
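
One rough way to look for hydrophobic stretches like those described above is a sliding-window hydropathy average over the sequence. The sketch below assumes the Kyte-Doolittle hydropathy scale, which is not part of the lecture; the function name, window size, and test sequences are illustrative only.

```python
# Kyte-Doolittle hydropathy values (assumption: this scale is not named in the lecture).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(scores):
        return []
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Positive averages suggest buried (core-like) segments, negative ones exposed segments.
core_like = hydropathy_profile("IIIII", window=5)
surface_like = hydropathy_profile("KKKKK", window=5)
```

A profile like this is only a first hint; whether a segment is actually buried depends on the 3-D fold.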

Protein Secondary Structures

• α-Helices
• β-Sheets
• Loops
• Coils

α-helix: stabilized by H-bonds between backbone C=O of residue i and N-H of residue i+4 (every ~4th residue)

C = black, O = red, N = blue

Certain amino acids are "preferred" & others are rare in α-helices

• Ala, Glu, Leu, Met = good helix formers
• Pro, Gly, Tyr, Ser = very poor
• Amino acid composition & distribution varies, depending on location of helix in 3-D structure
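
The residue preferences listed above can be turned into a toy score for a sequence segment. This is only a sketch: the two residue sets come from the slide, but the scoring rule (formers minus poor formers, divided by length) and the function name are assumptions made for illustration, not an established prediction method.

```python
HELIX_FORMERS = set("AELM")  # Ala, Glu, Leu, Met (from the slide)
HELIX_POOR = set("PGYS")     # Pro, Gly, Tyr, Ser (from the slide)


def helix_tendency(segment):
    """Toy score in [-1, 1]: +1 if all residues are good formers, -1 if all are poor."""
    residues = segment.upper()
    if not residues:
        return 0.0
    formers = sum(aa in HELIX_FORMERS for aa in residues)
    poor = sum(aa in HELIX_POOR for aa in residues)
    return (formers - poor) / len(residues)


likely_helix = helix_tendency("AELMAELM")
unlikely_helix = helix_tendency("PGSYPG")
```

Real secondary structure predictors weight every residue type and its neighbors, so a score like this should be read as a teaching aid only.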

β-sheets - also stabilized by H-bonds between backbone atoms

Anti-parallel | Parallel

Loops

• Connect helices and sheets
• Vary in length and 3-D configurations
• Are located on surface of structure
• Are more "tolerant" of mutations
• Are more flexible and can adopt multiple conformations
• Tend to have charged and polar amino acids
• Are frequently components of active sites
• Some fall into distinct structural families (e.g., hairpin loops, reverse turns)

Coils

• Regions of 2° structure that are not helices, sheets, or recognizable turns
• Intrinsically disordered regions appear to play important functional roles

Globular proteins are built from recurring structural patterns

Motifs or supersecondary structures = combinations of 2° structural elements

Domains = combinations of motifs
• Independently folding unit (foldon)
• Functional unit

Simple motifs combine to form domains

6 main classes of protein structure

1) α Domains
   • Bundles of helices connected by loops
2) β Domains
   • Mainly antiparallel sheets, usually with 2 sheets forming sandwich
3) α/β Domains
   • Mainly parallel sheets with intervening helices, also mixed sheets
4) α+β Domains
   • Mainly segregated helices and sheets
5) Multidomain (α & β)
   • Containing domains from more than one class
6) Membrane & cell-surface proteins

α-domain structures: 4-helix bundles

β-sheets: up-and-down sheets & barrels

α/β-domains: leucine-rich motifs can form horseshoes

New today:

Protein Structure
• Databases
• Classification
• Visualization

Protein Structure Prediction
• Secondary structure
• Tertiary structure

Protein sequence databases

• UniProt (SwissProt, PIR, EBI)
  http://www.pir.uniprot.org
• NCBI Protein
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

More on these later: protein function prediction

Protein sequence & structure: analysis

• Diamond STING Millennium - many useful structure analysis tools, including Protein Dossier
  http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt) protein knowledgebase
  http://us.expasy.org/sprot
• InterPro sequence analysis tools
  http://www.ebi.ac.uk/interpro

Protein structure databases

• PDB Protein Data Bank (RCSB) - THE protein structure database
  http://www.rcsb.org/pdb/
• MMDB Molecular Modeling Database (NCBI Entrez) - has "added" value
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
• MSD Molecular Structure Database - especially good for interactions, binding sites
  http://www.ebi.ac.uk/msd
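
Entries in these databases have traditionally been distributed as fixed-column PDB text files, so a small parser helps show what a structure record contains. The sketch below reads one ATOM/HETATM line using the column positions of the legacy PDB format. The function name, the dictionary keys, and the sample atom (built with a format string so the columns line up) are illustrative assumptions, not data from a real entry.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record of a legacy PDB-format file."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None  # other record types (HEADER, REMARK, ...) are skipped
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "atom_name": line[12:16].strip(),   # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain_id": line[21],               # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
    }


# Hypothetical atom, laid out column by column with a format string.
SAMPLE_LINE = "{:<6}{:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " CA", "GLY", "A", 12, 11.104, 6.134, -6.504)

atom = parse_atom_line(SAMPLE_LINE)
```

For real work a maintained parser is the safer choice; this only shows why column positions, not whitespace splitting, define the fields.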

Protein structure classification

• SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/
• DALI/FSSP (recently moved to EBI & reorganized)
  • fully automated structure alignments
  • DALI server http://www.ebi.ac.uk/dali/index.html
  • DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
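
The four CATH levels named above are usually written as a dotted numeric code, one number per level. The sketch below splits such a code into its levels. The function name, the level keys, and the example code are illustrative assumptions; the class names in the lookup table reflect CATH's commonly cited top-level classes and should be checked against the CATH site.

```python
CATH_LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

# Top-level class names as commonly cited for CATH (assumption: verify against CATH itself).
CATH_CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def parse_cath_code(code):
    """Split a dotted CATH code such as '3.40.50.720' into named levels."""
    parts = [int(p) for p in code.strip().split(".")]
    if not 1 <= len(parts) <= len(CATH_LEVELS):
        raise ValueError("expected 1 to 4 dot-separated numbers: %r" % code)
    return dict(zip(CATH_LEVELS, parts))


example = parse_cath_code("3.40.50.720")  # illustrative input
example_class = CATH_CLASS_NAMES.get(example["class"], "unknown")
```

Two domains that share the first three numbers have the same class, architecture, and topology (fold) but may belong to different homologous superfamilies.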

Protein structure visualization

• Molecular Visualization Freeware:
  http://www.umass.edu/microbio/rasmol
• MolviZ.Org
  http://www.umass.edu/microbio/chime
• Protein Explorer
  http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm
• RASMOL (& many descendants: Protein Explorer, PyMol, MolMol, etc.)
  http://www.umass.edu/microbio/rasmol/index2.htm
• CHIME
  http://www.umass.edu/microbio/chime/getchime.htm
• Cn3D
  http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
• Deep View = Swiss-PDB Viewer
  http://www.expasy.org/spdbv

PDB (RCSB)
http://www.rcsb.org/pdb

RCSB PDB - Beta site
http://pdbbeta.rcsb.org/pdb/Welcome.do

RCSB PDB - New Tutorial
http://core1.rcsb.org/tutorial

NCBI Structure
http://www.ncbi.nlm.nih.gov/Structure

MMDB
http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

Cn3D
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

MMDB: Molecular Modeling Database

Derived from PDB structure records
Value added to PDB records, including:

• Integration with other ENTREZ databases & tools
• Conversion to parseable ASN.1 data description language
• Correction of numbering discrepancies in structure vs sequence
• Validation
• Addition of explicit chemical graph information

Structure neighbors determined by Vector Alignment Search Tool (VAST)

Searching MMDB

1CET

MMDB Structure Summary

• Cn3D viewer
• VAST neighbors
• BLAST neighbors

Cn3D: Displaying 2° Structures

Chloroquine

Cn3D: Displaying 3° Structures

Chloroquine

Cn3D: Structural Alignments

Chloroquine

NADH


Protein Explorer (RasMol/Chime)


Protein Explorer


SCOP - Structure Classification


CATH - Structure Classification


Structural Genomics

~ 30,000 "traditional" genes in human genome (not counting: ???)
~ 3,000 proteins in a typical cell
> 2 million sequences in UniProt
> 33,000 protein structures in the PDB

Experimental determination of protein structure lags far behind sequence determination!

Goal: Determine structures of "all" protein folds in nature, using combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction
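
To put the gap between sequences and structures in numbers, the counts quoted above can be compared directly. This is a back-of-the-envelope sketch: both figures are the slide's 2005 lower bounds, the variable names are illustrative, and PDB entries are not one-per-protein, so the ratio is only indicative.

```python
# Lower-bound counts quoted on the slide (2005).
uniprot_sequences = 2_000_000
pdb_structures = 33_000

# Fraction of known sequences that could have a solved structure, at most.
structure_fraction = pdb_structures / uniprot_sequences

# How many sequences exist per solved structure.
sequences_per_structure = uniprot_sequences / pdb_structures

summary = "about {:.1%} coverage, roughly {:.0f} sequences per structure".format(
    structure_fraction, sequences_per_structure)
```

Under these figures fewer than 2 in 100 known sequences could have an experimentally determined structure, which is the motivation for structure prediction.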

Structural Genomics Projects

TargetDB: database of structural genomics targets
http://targetdb.pdb.org

Protein Structure Prediction?

Protein Folding

"Major unsolved problem in molecular biology"

In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones

In vitro: many proteins fold spontaneously & many do not!

Steps in Protein Folding

1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction, some 2° structures rearranged

Native state? - assumed to be lowest free energy - may be an ensemble of structures

Protein Dynamics

• Protein in native state is NOT static
• Function of many proteins depends on conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol) (comparable to the energy of just a few H-bonds!)
• Folding involves changes in both entropy & enthalpy
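
A small stability difference still keeps nearly every molecule folded, which a two-state estimate makes concrete. The sketch below assumes a simple two-state (native vs. denatured) equilibrium at room temperature, K = exp(ΔG/RT). The function name, the temperature default, and the two-state model itself are assumptions added for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def folded_fraction(delta_g_unfolding, temperature_k=298.15):
    """Fraction of molecules in the native state for a two-state model.

    delta_g_unfolding: free energy of unfolding in kcal/mol
    (positive when the native state is more stable).
    """
    k_fold = math.exp(delta_g_unfolding / (R_KCAL_PER_MOL_K * temperature_k))
    return k_fold / (1.0 + k_fold)


low_end = folded_fraction(5.0)    # weakest stability quoted on the slide
high_end = folded_fraction(15.0)  # strongest stability quoted on the slide
no_preference = folded_fraction(0.0)
```

Because the relationship is exponential, losing only a few kcal/mol (for example through a destabilizing mutation) can shift a noticeable share of molecules to the denatured state.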

Protein Structure Prediction

• Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
• Protein folding:
  • determination of folding pathways
  • prediction of tertiary structure
  still largely unsolved problems
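
"Similar" and "dissimilar" sequences in the list above are usually quantified as percent identity over an alignment. The sketch below computes it for two already-aligned sequences of equal length, skipping any column that contains a gap. The function name and the gap-handling convention are assumptions (other denominators are also in use), and the example strings are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b)
               for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Made-up aligned fragments for illustration.
close_pair = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
gapped_pair = percent_identity("AC-EFG", "ACDEYG")
```

As the slide cautions, a high value here does not guarantee the same fold, and a low value does not rule it out.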