10/10/07 BCB 444/544 F07 ISU Dobbs #21 - Protein Secondary Structure Prediction
BCB 444/544
Lecture 21
Protein Structure Visualization, Classification & Comparison
Secondary Structure Prediction
#21_Oct10
Required Reading (before lecture)
Mon Oct 8 - Lecture 20
Protein Secondary Structure Prediction
• Chp 14 - pp 200 - 213
Wed Oct 10 - Lecture 21
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Assignments & Announcements
ALL: Homework #3
√ Due: Mon Oct 8 by 5 PM
• HW544: HW544Extra #1
√ Due: Task 1.1 - Mon Oct 1 by noon
Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM
• 444 "Project-instead-of-Final" students should also submit:
• HW544Extra #1
• √ Due: Task 1.1 - Mon Oct 8 by noon
• Due: Task 1.2 - Fri Oct 12 by 5 PM <Task 2 NOT required for BCB444 students>
Seminars this Week - Thurs:
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 11 Thurs
• Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
The Computational Microscope 2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
• Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
ReCombinatorics: Combinatorial Algorithms for Studying History of Recombination in Populations 3:30 PM in Howe Hall Auditorium
http://www.cs.iastate.edu/~colloq/new/gusfield.shtml
Seminars this Week - Fri:
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 12 Fri
• Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
TBA: "Structural Biology" (see URL below) 2:10 PM in 102 Sci
http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
• Dr. Srinivas Aluru (ECpE, ISU) - GDCB Seminar
Consensus Genetic Maps: A Graph Theoretic Approach 4:10 PM in 1414 MBB
http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf
Chp 12 - Protein Structure Basics
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
• Amino Acids
• Peptide Bond Formation
• Dihedral Angles
• Hierarchy
• Secondary Structures
• Tertiary Structures
• Determination of Protein 3-Dimensional Structure
• Protein Structure DataBank (PDB)
Protein Structure & Function
• Protein structure - primarily determined by sequence
• Protein function - primarily determined by structure
• Globular proteins: compact hydrophobic core & hydrophilic surface
• Membrane proteins: special hydrophobic surfaces
• Folded proteins are only marginally stable
• Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
Predicting protein structure and function can be very hard
-- & fun!
6 Main Classes of Protein Structure
1) α-Domains
Bundles of helices connected by loops
2) β-Domains
Mainly antiparallel sheets, usually 2 sheets forming a sandwich
3) α/β Domains
Mainly parallel sheets with intervening helices, mixed sheets
4) α+β Domains
Mainly segregated helices and sheets
5) Multidomain (containing domains from more than one class)
6) Membrane & cell-surface proteins
Protein Structure Databases
PDB - Protein Data Bank
http://www.rcsb.org/pdb/ (RCSB) - THE protein structure database
MMDB - Molecular Modeling Database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure (NCBI Entrez) - has "added" value
MSD - Molecular Structure Database
http://www.ebi.ac.uk/msd
Especially good for interactions & binding sites
PDB (RCSB) - recently "remediated" http://www.rcsb.org/pdb
Structure at NCBI http://www.ncbi.nlm.nih.gov/Structure
MMDB at NCBI http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
MMDB: Molecular Modeling Database
• Derived from PDB structure records
• "Value-added" to PDB records includes:
  • Integration with other ENTREZ databases & tools
  • Conversion to parseable ASN.1 data description language
  • Data also available in mmCIF & XML (also true for PDB now)
  • Correction of numbering discrepancies in structure vs sequence
  • Validation
  • Explicit chemical graph information (covalent bonds)
• Integrated tool for identifying structural neighbors: Vector Alignment Search Tool (VAST)
http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html
MSD: Molecular Structure Database
http://www.ebi.ac.uk/msd/
wwPDB: World Wide PDB http://www.wwpdb.org
Experimental Determination of 3D Structure
2 Major Methods to obtain high-resolution structures
1. X-ray Crystallography (most PDB structures)
2. Nuclear Magnetic Resonance (NMR) Spectroscopy
Note Advantages & Limitations of each method
• (See your lecture notes & textbook)
• For more info: http://en.wikipedia.org/wiki/Protein_structure
Other methods (usually lower resolution, at present):
• Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
• Electron microscopy (EM)
• Cryo-EM
• Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
  http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf
• Circular Dichroism (CD), several other spectroscopic methods
Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification
Protein Structure Visualization
RASMOL & descendants: PyMol, MolMol
http://www.umass.edu/microbio/rasmol/index2.htm
Cn3D - esp. good for structural alignments
http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
CHIME (Protein Explorer)
http://www.umass.edu/microbio/chime/getchime.htm
MolviZ.Org
http://www.umass.edu/microbio/chime
Deep View = Swiss-PDB Viewer
http://www.expasy.org/spdbv
PyMol http://pymol.sourceforge.net/
Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
Cn3D: Displaying 3° Structures
Chloroquine
Cn3D: Structural Alignments
NADH
Protein Explorer (Chime)
http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm
Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures:
1. Intermolecular -
2. Intramolecular -
3. Combined -
• DALI/FSSP (most commonly used)
Fully automated structure alignments
1. DALI server http://www.ebi.ac.uk/dali/index.html
2. DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
We will skip this for now
Protein Structure Classification
• SCOP = Structural Classification of Proteins
Levels reflect both evolutionary and structural relationships
http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
http://cathwww.biochem.ucl.ac.uk/latest/
• DALI - (recently moved to EBI & reorganized)
DALI Database (fold classification)
http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths & weaknesses….
SCOP - Structure Classification http://scop.mrc-lmb.cam.ac.uk/scop/
CATH - Structure Classification http://www.cathdb.info/latest/index.html
Chp 14 - Secondary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 14
Protein Secondary Structure Prediction
• Secondary Structure Prediction for Globular Proteins
• Secondary Structure Prediction for Transmembrane Proteins
• Coiled-Coil Prediction
Secondary Structure Prediction
Has become highly accurate in recent years (>85%)
• Usually 3 (or 4) state predictions:
• H = α-helix
• E = β-strand
• C = coil (or loop)
• (T = turn)
Secondary Structure Prediction Methods
• 1st Generation methods
  Ab initio - used relatively small dataset of structures available
  Chou-Fasman - based on amino acid propensities (3-state)
  GOR - also propensity-based (4-state)
• 2nd Generation methods
  Based on much larger datasets of structures now available
  GOR II, III, IV, SOPM
• 3rd Generation methods
  Homology-based & neural network based
  PHD, PSIPRED, SSPRO, PROF, HMMSTR
• Meta-servers
  Combine several different methods; consensus & ensemble based
  JPRED, PredictProtein
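To make "based on amino acid propensities" concrete, here is a minimal Python sketch of how a Chou-Fasman-style propensity can be estimated from labelled structures: the fraction of a residue type found in a given state, divided by the fraction of all residues found in that state. The function name and the tiny training set are made up for illustration; real propensity tables come from many solved structures, and this is not the full Chou-Fasman prediction procedure.

```python
from collections import Counter

def propensities(sequences, structures, state="H"):
    """Return {residue: propensity} for one secondary-structure state.

    propensity = (fraction of this residue type observed in `state`)
                 / (fraction of all residues observed in `state`)
    Values above 1 mean the residue favours the state; below 1, it avoids it.
    """
    residue_total = Counter()
    residue_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must be aligned")
        for aa, s in zip(seq, ss):
            residue_total[aa] += 1
            if s == state:
                residue_in_state[aa] += 1
    n_total = sum(residue_total.values())
    n_state = sum(residue_in_state.values())
    if n_state == 0:
        raise ValueError("state %r never occurs in the training data" % state)
    background = n_state / n_total
    return {aa: (residue_in_state[aa] / residue_total[aa]) / background
            for aa in residue_total}

# Toy, made-up training data (H = helix, E = strand, C = coil).
TOY_SEQS = ["AAEELLGGPP", "AELKGPGSVV"]
TOY_SS = ["HHHHHHCCCC", "HHHHCCCCEE"]

if __name__ == "__main__":
    helix = propensities(TOY_SEQS, TOY_SS, "H")
    for aa in sorted(helix, key=helix.get, reverse=True):
        print(aa, round(helix[aa], 2))
```

With so few residues the numbers are meaningless biologically; the point is only the ratio being computed.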
Secondary Structure Prediction Servers
Prediction Evaluation?
• Q3 score - % of residues correctly predicted (3-state) in cross-validation experiments
Best results? Meta-servers
• http://expasy.org/tools/ (scroll for 2° structure prediction)
• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
• JPred www.compbio.dundee.ac.uk/~www-jpred
• PredictProtein http://www.predictprotein.org/ Rost, Columbia
Best individual programs? ??
• CDM http://gor.bb.iastate.edu/cdm/ Sen…Jernigan, ISU
• GOR V http://gor.bb.iastate.edu/ Kloczkowski…Jernigan, ISU
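As a rough illustration of the Q3 score and of the consensus idea behind meta-servers, the sketch below scores 3-state prediction strings against an observed string and builds a per-residue majority vote. The function names, the tie-breaking rule (ties fall back to coil) and the example strings are assumptions for illustration; they do not describe how JPred or PredictProtein actually combine methods.

```python
from collections import Counter

def q3_score(predicted, observed):
    """Percentage of residues whose 3-state label (H/E/C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def consensus(predictions):
    """Per-residue majority vote over several prediction strings.

    Assumption: when the top two labels tie, fall back to coil ('C').
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions of equal length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append("C")
        else:
            result.append(ranked[0][0])
    return "".join(result)

if __name__ == "__main__":
    # Made-up observed structure and three made-up predictions.
    observed = "CHHHHHCCEEEEC"
    methods = ["CHHHHCCCEEEEC", "CCHHHHCCEEECC", "HHHHHHCCCEEEC"]
    for m in methods:
        print(m, round(q3_score(m, observed), 1))
    voted = consensus(methods)
    print(voted, round(q3_score(voted, observed), 1))
```

In a real evaluation, Q3 would be computed on proteins that were not used to train the predictor, as the cross-validation note above implies.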
Consensus Data Mining (CDM)
• Developed by Jernigan Group at ISU
• Basic premise: combination of 2 complementary methods can enhance performance by harnessing distinct advantages of both methods; combines FDM & GOR V:
  • FDM - Fragment Data Mining - exploits availability of sequence-similar fragments in the PDB, which can lead to highly accurate prediction - much better than GOR V - for such fragments, but such fragments are not available for many cases
  • GOR V - Garnier, Osguthorpe, Robson V - predicts secondary structure of less similar fragments with good performance; these are protein fragments for which FDM method cannot find suitable structures
• For references & additional details: http://gor.bb.iastate.edu/cdm/
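One way to picture the CDM premise is a per-residue switch: trust the fragment-based (FDM) label where a sufficiently similar fragment was found, and fall back to the GOR V label elsewhere. The sketch below is a hypothetical simplification under that assumption; the function name, the similarity scores and the 0.5 threshold are invented for illustration and are not taken from the CDM server.

```python
def combine_predictions(fdm_pred, fdm_similarity, gor_pred, threshold=0.5):
    """Pick the FDM label where fragment similarity is high, else the GOR V label.

    fdm_pred, gor_pred: 3-state strings (H/E/C) of equal length.
    fdm_similarity: per-residue similarity (0..1) of the best matching
        fragment; a hypothetical input, assumed to be computed elsewhere.
    threshold: assumed cut-off above which the FDM label is trusted.
    """
    if not (len(fdm_pred) == len(fdm_similarity) == len(gor_pred)):
        raise ValueError("all inputs must have the same length")
    return "".join(
        f if sim >= threshold else g
        for f, sim, g in zip(fdm_pred, fdm_similarity, gor_pred)
    )

if __name__ == "__main__":
    # Made-up example: good fragment matches at positions 0 and 2 only.
    print(combine_predictions("HHEE", [0.9, 0.2, 0.8, 0.1], "CCCC"))
```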
Secondary Structure Prediction: for Different Types of Proteins/Domains
For Complete proteins:
Globular Proteins - use methods previously described
Transmembrane (TM) Proteins - use special methods
(next slides)
For Structural Domains: many under development:
Coiled-Coil Domains (Protein interaction domains)
Zinc Finger Domains (DNA binding domains),
others…
SS Prediction for Transmembrane Proteins
Transmembrane (TM) Proteins
• Only a few in the PDB - but ~ 30% of cellular proteins are membrane-associated!
• Hard to determine experimentally, so prediction important
• TM domains are relatively 'easy' to predict!
Why? Constraints due to hydrophobic environment
2 main classes of TM proteins:
- α-helical
- β-barrel
SS Prediction for TM α-Helices
α-Helical TM domains:
• Helices are 17-25 amino acids long (span the membrane)
• Predominantly hydrophobic residues
• Helices oriented perpendicular to membrane
• Orientation can be predicted using "positive inside" rule
  Residues at cytosolic (inside or cytoplasmic) side of TM helix, near hydrophobic anchor are more positively charged than those on lumenal (inside an organelle in eukaryotes) or periplasmic side (space between inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to interactions among helices within membrane
Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic signal peptides (short hydrophobic sequences that target proteins to the endoplasmic reticulum, ER)
• Phobius - 94% accuracy - uses distinct HMM models for TM helices & signal peptide sequences
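The properties listed above (a hydrophobic stretch of roughly 17-25 residues, with more positive charge on the cytosolic side) can be illustrated with a classic sliding-window hydropathy scan. This is a swapped-in, much simpler technique than the HMM-based servers named on the slide. The sketch uses the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6; the function names, the 15-residue flank and the toy sequence are assumptions for illustration, and only the single best window is reported.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i+window]."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def best_tm_window(seq, window=19, threshold=1.6):
    """Return (start, end) of the most hydrophobic window, or None if it
    scores below the threshold. Indices are 0-based, end exclusive."""
    profile = hydropathy_profile(seq, window)
    if not profile:
        return None
    start = max(range(len(profile)), key=profile.__getitem__)
    if profile[start] < threshold:
        return None
    return (start, start + window)

def positive_inside(seq, segment, flank=15):
    """Guess the cytosolic side by counting Lys/Arg in the two flanks."""
    start, end = segment
    n_flank = seq[max(0, start - flank):start]
    c_flank = seq[end:end + flank]
    n_pos = n_flank.count("K") + n_flank.count("R")
    c_pos = c_flank.count("K") + c_flank.count("R")
    if n_pos == c_pos:
        return "undetermined"
    return "N-terminal side inside" if n_pos > c_pos else "C-terminal side inside"

# Made-up sequence: charged N-flank, 19-residue hydrophobic core, polar C-flank.
TOY_TM = "MSDKKRSRKG" + "LIVALLVAIGLLAVLIVAL" + "DESGTNEQDS"

if __name__ == "__main__":
    segment = best_tm_window(TOY_TM)
    print(segment, positive_inside(TOY_TM, segment))
```

A scan like this would also flag hydrophobic signal peptides, which is the confusion the slide attributes to TMHMM and HMMTOP and that Phobius addresses with separate models.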
SS Prediction for TM β-Barrels
β-Barrel TM domains:
• β-strands are amphipathic (partly hydrophobic, partly hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening
Servers? Harder problem, fewer servers…
TBBPred - uses NN or SVM (more on these ML methods later) Accuracy ?
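The "every 2nd residue is hydrophobic" pattern can be checked with a very simple score: the fraction of positions in a window that fit a strict hydrophobic/hydrophilic alternation, taking the better of the two possible phases. This is only a sketch of the pattern itself, not of how TBBPred works; the hydrophobic residue set, the window of 10 and the 0.9 cut-off are assumptions chosen for illustration.

```python
# Assumed hydrophobic set for this sketch; other definitions are possible.
HYDROPHOBIC = set("AVLIMFWY")

def alternation_score(segment):
    """Fraction of positions consistent with strict hydrophobic/polar
    alternation, using whichever phase fits better (range 0.5 to 1.0)."""
    if not segment:
        raise ValueError("segment must not be empty")
    flags = [aa in HYDROPHOBIC for aa in segment]
    # Phase 0: even indices hydrophobic (lipid-facing), odd indices polar.
    phase0 = sum(flag == (i % 2 == 0) for i, flag in enumerate(flags))
    # Every position fits exactly one phase, so phase 1 is the complement.
    phase1 = len(flags) - phase0
    return max(phase0, phase1) / len(flags)

def scan_for_strands(seq, window=10, min_score=0.9):
    """Return (start, score) for each window that alternates well."""
    hits = []
    for i in range(len(seq) - window + 1):
        score = alternation_score(seq[i:i + window])
        if score >= min_score:
            hits.append((i, score))
    return hits

if __name__ == "__main__":
    # Made-up sequence with an alternating stretch in the middle.
    print(scan_for_strands("DDDD" + "VTLSVNAQFS" + "DDDD"))
```

Overlapping windows around a true strand will also score well, so a practical tool would merge neighbouring hits and combine this signal with others.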
Prediction of Coiled-Coil Domains
Coiled-coils
• Superhelical protein motifs or domains, with two or more interacting α-helices that form a "bundle"
• Often mediate inter-protein (& intra-protein) interactions
'Easy' to detect in primary sequence:
• Internal repeat of 7 residues (heptad)
  • 1 & 4 = hydrophobic (facing helical interface)
  • 2,3,5,6,7 = hydrophilic (exposed to solvent)
• Helical wheel representation - can be used to manually detect these, based on amino acid sequence
Servers?
Coils, Multicoil - probability-based methods
2Zip - for Leucine zippers = special type of CC in TFs:
characterized by Leu-rich motif: L-X(6)-L-X(6)-L-X(6)-L
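Both signals described above are easy to sketch in code: the leucine zipper motif L-X(6)-L-X(6)-L-X(6)-L maps directly onto a regular expression, and the heptad rule can be tested by asking, for each of the 7 possible registers, what fraction of positions 1 and 4 are hydrophobic. The function names, the hydrophobic residue set and the toy sequence are assumptions for illustration; this is not the probability-based scoring used by Coils, Multicoil or 2Zip.

```python
import re

# L-X(6)-L-X(6)-L-X(6)-L, wrapped in a lookahead so overlapping hits are found.
LEUCINE_ZIPPER = re.compile(r"(?=L.{6}L.{6}L.{6}L)")

# Assumed hydrophobic set for this sketch.
HYDROPHOBIC = set("AVLIMFWY")

def find_leucine_zippers(seq):
    """Return 0-based start positions of the Leu-rich zipper motif."""
    return [m.start() for m in LEUCINE_ZIPPER.finditer(seq)]

def best_heptad_register(seq):
    """Try all 7 registers; return (offset, fraction) where fraction is the
    share of heptad positions 1 and 4 occupied by hydrophobic residues."""
    best = (0, 0.0)
    for offset in range(7):
        # Positions 1 and 4 of each heptad sit at register indices 0 and 3.
        core = [aa for i, aa in enumerate(seq) if (i - offset) % 7 in (0, 3)]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best[1]:
            best = (offset, fraction)
    return best

# Made-up sequence: four copies of a heptad with Leu at 1 and Val at 4.
TOY_CC = "LKEVEDK" * 4

if __name__ == "__main__":
    print(find_leucine_zippers(TOY_CC))
    print(best_heptad_register(TOY_CC))
```

The register search is the computational counterpart of reading a helical wheel by eye: a high fraction at one offset means the hydrophobic residues line up along one face of the helix.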
Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP