2d 3d Structure

  • Upload
    thkiran


  • 8/3/2019 2d 3d Structure


    Structure prediction methods

    (2D and 3D)

    Much of the text in the slides that follow is drawn, either verbatim or paraphrased, from the following texts:

    Bioinformatics (Baxevanis and Ouellette)

    Chapter 8: Predictive methods using protein sequences

    (Ofran and Rost) 198-219

    Chapter 9: Protein structure prediction and analysis

    (Wishart) 224-247

    Chapter 12: Creation and analysis of protein multiple sequence alignments

    (Barton) 333-336

    Proteins: Structures and molecular properties

    (Thomas Creighton)

    Topics Covered

    Overview of protein structure: primary, secondary, tertiary, and quaternary

    Overview of protein folding

    Secondary structure prediction methods

    Solvent accessibility prediction

    3D fold prediction

    Ab initio protein structure prediction

    Threading methods

    Community evaluation of protein structure prediction

    Critical Assessment of protein Structure Prediction (CASP): http://predictioncenter.org/

    EVA (real-time continuous evaluation of protein fold prediction methods): http://cubic.bioc.columbia.edu/eva/

    Methods for solving protein structures experimentally


    The importance of protein structure

    Bioinformatics is much more than just sequence analysis: many of the most interesting and exciting applications in bioinformatics today are concerned with structure analysis.

    The origins of bioinformatics actually lie in the field of structural

    biology

    Proteins are perhaps the most complex chemical entities in nature.

    No other class of molecule exhibits the variety and irregularity in shape, size, texture and mobility that can be found in proteins.

    Baxevanis & Ouellette (Ch. 9, p.224, Wishart)

    Hierarchical descriptions of proteins (follows the folding process)

    Primary structure: the amino acid sequence

    Secondary structure: regular local structure of linear segments of polypeptide chains (Creighton)

    Helices (~35% of residues)

    Beta sheet (~25% of residues)

    Both types predicted by Linus Pauling (Corey and Pauling, 1953)

    Other less common structures:

    Beta turns

    3₁₀ helices

    Loops

    Remaining unclassifiable regions termed random coil or unstructured regions

    http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm

    Tertiary structure: overall topology of the folded polypeptide chain (Creighton)

    Mediated by hydrophobic interactions between distant parts of protein

    Quaternary structure: aggregation of the separate polypeptide chains of a protein (Creighton)

    Baxevanis & Ouellette (Ch. 9, p.224, Wishart)


    Protein folding

    Folded conformations of globular

    proteins

    Most proteins are globular: natural proteins in solution are much smaller in their dimensions than comparable polypeptides with random or repetitive conformations and have roughly spherical shapes

    Denaturation: most proteins are robust to changes in their environment, until they (somewhat literally) fall apart

    Most proteins are robust to changes in temperature, pH and pressure, exhibiting little or no change until a point is reached at which there is a sudden change and loss of biological function

    Denaturing proteins has been used to explore folding pathways (e.g., Dobson CM, Evans PA, Radford SE. Understanding how proteins fold: the lysozyme story so far. Trends Biochem Sci. 1994)

    Creighton, Proteins Ch. 6


    Structural domains

    Folded structures of most small proteins are roughly spherical and remarkably compact

    Proteins with >200 aa tend to consist of two or more structural units, called domains

    Domains interact to varying extents, but less extensively than do structural elements within domains

    Some domain detection tools make use of this pattern, looking for covariation between positions as evidence of interaction

    Nagarajan and Yona, Automatic prediction of protein domains from sequence information using a hybrid learning system. Bioinformatics 2004

    Domains may not always be well segregated; some proteins have multiple domains with two or three polypeptide connections between domains

    See, for example, the SCOP interleaved domains

    Creighton, Proteins Ch. 6

    Structural domains (cont'd)

    Definition of domain is a subjective process done in different ways by different people

    Domains are most evident by their compactness

    Expressed quantitatively as the ratio of the surface area of a domain to the surface area of a sphere with the same volume

    Observed values are 1.65 +/- 0.08

    Course of polypeptide backbone through domain is irregular, but generally follows a moderately straight course through the domain and then makes a U-turn to recross the domain

    Overall impression: segments of somewhat stiff polypeptide chain interspersed with relatively tight turns or bends (almost always on the molecule's surface)

    Compared to behavior of a fire hose dropped in one spot

    Creighton, Proteins Ch. 6
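
The compactness measure described above (surface area of a domain divided by the surface area of a sphere of equal volume) can be sketched in a few lines. This is only an illustration: the function name and the idea of passing a precomputed surface area and volume are assumptions, and how those two quantities are measured for a real domain is not covered here.

```python
import math

def compactness_ratio(surface_area: float, volume: float) -> float:
    """Surface area of a body divided by the surface area of a sphere
    that encloses the same volume (1.0 means perfectly spherical)."""
    # Radius of the sphere with the same volume: V = 4/3 * pi * r^3
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    sphere_area = 4.0 * math.pi * radius ** 2
    return surface_area / sphere_area

# Sanity checks on simple shapes (not protein data)
r = 10.0
sphere_ratio = compactness_ratio(4.0 * math.pi * r ** 2, 4.0 / 3.0 * math.pi * r ** 3)
cube_ratio = compactness_ratio(6.0, 1.0)  # unit cube: area 6, volume 1
print(round(sphere_ratio, 3), round(cube_ratio, 3))
```

A sphere scores 1.0 by construction, so larger values indicate a rougher or less spherical surface.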


    Structural domains (cont'd)

    Creighton, Proteins Ch. 6

    Figure 6.13

    Driving forces in protein folding

    Complex combination of local and global forces

    Local forces drive secondary structure formation

    Repulsion between hydrophobic side chains of some amino acids and hydrophilic backbone of protein chain (intra-molecular)

    Interaction between side chains and surrounding solvent

    Subcellular environment (e.g., membrane, secreted, etc.)

    Pauling et al 1951

    Baxevanis & Ouellette (Ch. 9, Wishart)


    More driving forces in protein folding

    Hydrophobicity

    Hydrophobic residues need to be shielded from solvent

    Polar residues to the outside, hydrophobic to the inside

    Stronger interactions

    Hydrogen bonds, disulfide bridges

    Weak interactions

    Van der Waals, electrostatic, etc

    Recommended reading: Proteins (Thomas Creighton).

    Global effects on protein fold

    Long-range interactions (repulsive or

    attractive) between distant parts of structure

    These can override local effects

    E.g., chameleon protein:

    11 amino acids adopt helical structure in one region,

    and the same 11 amino acids adopt beta strand in

    another.

    Minor & Kim, 1996

    Baxevanis & Ouellette (Ch. 9, Wishart)


    Ligands and co-factors

    http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/E/Enzymes.html#coenzymes


    Information required for folding is (mostly) contained in the primary sequence

    Early on, proteins were shown to fold into their native structures in isolation

    This led to the belief that structure is determined by

    sequence alone (Anfinsen, 1973)

    Over the last decade, a significant number of proteins have

    been shown to not fold properly in the test tube (e.g.,

    requiring the assistance of chaperonins)

    Nevertheless, the native 3D structure is assumed to be in some energetic minimum

    This led to the development of ab initio folding methods

    Baxevanis & Ouellette (Ch. 9, Wishart)

    Folding pathways

    Evidence that local structure segments form first,

    and then pack against each other to form 3D fold

    Exploited in protein fold prediction, Rosetta method

    Simons, Bonneau, Ruczinski & Baker (1999). Ab initio Protein Structure Prediction of CASP III Targets Using ROSETTA. Proteins

    Semi-stable structural intermediates on folding pathway to lowest-energy conformation

    Prof. Susan Marqusee, Berkeley

    Baxevanis & Ouellette (Ch. 9, Wishart)


    Secondary structure

    Alpha helix structure

    http://www.web-books.com/MoBio/Free/Ch2C4.htm


    Amphipathic alpha helix

    http://www.web-books.com/MoBio/Free/Ch2C4.htm

    Beta strand

    http://www.web-books.com/MoBio/Free/Ch2C4.htm


    Beta sheet

    http://www.web-books.com/MoBio/Free/Ch2C4.htm

    Secondary Structure Prediction


    Why is secondary structure

    prediction important?

    Secondary structure diverges less rapidly

    than primary sequence

    Knowledge or prediction of 2ary structure

    improves detection and alignment of remote

    homologs

    3d-pssm, SAM T02 (fold prediction servers)

    Baxevanis & Ouellette (Ch. 9, Wishart)

    Focusing on single residues

    Early structure prediction methods focused on the structural characteristics of individual residues

    This enabled the larger problem to be decomposed into smaller easier-to-solve problems (enabling the combination of solutions to sub-problems to form a global solution)

    This also enabled methods to focus on detecting transmembrane regions, solvent-accessible residues, and other important features of molecules

    Baxevanis & Ouellette (Ch. 9, Wishart)


    Secondary structure prediction

    using MSA information?

    Labeling residues in a sequence as α-helix, β-sheet or turn/coil (3-state prediction).

    Accuracy of prediction enhanced by ~6% when

    multiple sequence alignments are used vs the use

    of a single sequence (Cuff & Barton, 1999)

    State of the art methods -- PSIPRED (Jones 1999)

    and JNET (Cuff & Barton, unpublished) have >76% accuracy for 3-state prediction.

    Baxevanis & Ouellette (Ch. 12, Barton)

    Amino acid patterns indicative of β-strand structures

    Short runs of conserved hydrophobic residues suggest a buried β-strand.

    An i, i+2, i+4 pattern of conserved hydrophobic residues suggests a surface β-strand.

    Conserved residues sharing the same physicochemical properties are likely to form one face of a strand.

    Baxevanis & Ouellette (Ch. 12, Barton)


    Amino acid patterns indicative of α-helical structures

    Conservation patterns of i, i+3, i+4, i+7 and variations (e.g., i, i+4, i+7) suggest an alpha helix

    Amphiphilic/conservation patterns (alternating hydrophobic and polar residues) following an i, i+3, i+4, i+7 pattern (and variations, e.g., i, i+4, i+7) are likely to represent surface helices

    Baxevanis & Ouellette (Ch. 12, Barton)
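
The spacing patterns above (i, i+3, i+4, i+7 for helices; i, i+2, i+4 for surface strands) lend themselves to a simple scan. The sketch below is a simplification: it looks at hydrophobic residues in a single sequence rather than conserved columns of an alignment, and the `HYDROPHOBIC` set and function name are illustrative assumptions, not part of any published method.

```python
# Assumed hydrophobic alphabet for illustration; other groupings are possible.
HYDROPHOBIC = set("AVILMFWC")

def pattern_hits(seq: str, offsets: tuple) -> list:
    """Return start positions i where every residue at i + offset is hydrophobic."""
    span = max(offsets)
    hits = []
    for i in range(len(seq) - span):
        if all(seq[i + k] in HYDROPHOBIC for k in offsets):
            hits.append(i)
    return hits

HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4)     # i, i+2, i+4

helix_like = pattern_hits("LKKLLKKL", HELIX_OFFSETS)
strand_like = pattern_hits("VKVKVK", STRAND_OFFSETS)
print(helix_like, strand_like)
```

A hit only says the spacing is compatible with the pattern; it is a weak signal on its own and would normally be combined with conservation across homologs.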

    Identifying loop regions

    Insertions and deletions are not well tolerated in the hydrophobic core.

    Regions of an MSA that include many gap characters are likely to indicate surface loops.

    Glycine and proline residues can be found in any secondary structure.

    However, conserved glycine/proline residues are strongly suggestive of loops.

    Baxevanis & Ouellette (Ch. 12, Barton)
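
As a rough illustration of the gap-based heuristic above, the following sketch computes the fraction of gap characters in each alignment column and flags gap-rich columns as candidate loops. The 0.5 threshold, the `-` gap symbol, and the function names are assumptions chosen for the example.

```python
def gap_fractions(msa: list) -> list:
    """Fraction of sequences with a gap ('-') in each alignment column."""
    ncols = len(msa[0])
    return [sum(row[c] == "-" for row in msa) / len(msa) for c in range(ncols)]

def candidate_loop_columns(msa: list, threshold: float = 0.5) -> list:
    """Columns whose gap fraction meets the (assumed) threshold."""
    return [c for c, f in enumerate(gap_fractions(msa)) if f >= threshold]

# Toy alignment, equal-length rows
toy_msa = ["ACD-EF",
           "AC--EF",
           "ACDGEF",
           "AC--EF"]
print(gap_fractions(toy_msa))
print(candidate_loop_columns(toy_msa))
```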


    Early schemes used observed preferences

    Various schemes give the amino acids numerical weights or rankings for their preferences, and several computer programs can predict the secondary structure from the given sequence.

    The simplest such scheme of Chou and Fasman, Ann. Rev. Biochem. (1978), examined the statistical distribution of amino acids in alpha helix, beta sheet and turns or loops, using a set of known protein structures from the Protein Data Bank.

    A novel sequence can then be scanned, and the tendency of each portion of the sequence to form secondary structure is assessed.

    http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
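
To make the idea of statistical preferences concrete, here is a minimal sketch of how a propensity could be estimated from labelled structures: the frequency of a residue within one state divided by its overall frequency. The toy sequences, the H/C labels, and the function name are assumptions for illustration; this is not the published Chou-Fasman parameter table.

```python
from collections import Counter

def propensities(sequences: list, labels: list, state: str) -> dict:
    """For each residue type, P(residue | state) / P(residue).
    Values above 1 mean the residue is over-represented in that state."""
    total = Counter()
    in_state = Counter()
    for seq, lab in zip(sequences, labels):
        for aa, s in zip(seq, lab):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_total = sum(total.values())
    n_state = sum(in_state.values())
    if n_state == 0:
        return {aa: 0.0 for aa in total}
    return {aa: (in_state[aa] / n_state) / (total[aa] / n_total) for aa in total}

# Toy data: H = helix, C = coil (not real structures)
toy_seqs = ["AAGA", "GGAG"]
toy_labels = ["HHCH", "CCHC"]
helix_prop = propensities(toy_seqs, toy_labels, "H")
print(helix_prop)
```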

    Improving secondary structure prediction

    Peer pressure (pressure from the neighbors): A minimum of 4 amino acids out of 6 should show alpha preference, or 3 out of 5 beta preference, or clusters of 2-3 breakers in a sequence of 4 are needed to set the secondary structure in any region, and individual misfits adopt the secondary structure of their neighbours.

    Learning secondary structure preferences from expanded data sets: More recent prediction schemes take advantage of larger data sets to examine amino acid preference for different regions in a helix or different positions in a tight turn.

    Up-weighting conserved residues: In addition, sequences of homologous proteins may be compared. The rationale is that highly conserved amino acids contribute more to the three-dimensional structure than unconserved, and different weightings can be introduced to the statistical analysis.

    Improved accuracy: The accuracy of prediction has risen from about 55% using the simple Chou-Fasman method, where the tendency is to overpredict, to about 80% using current methods.

    http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
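
The "peer pressure" rule can be sketched as a sliding-window count plus a smoothing pass. In this hedged example the set of helix-preferring residues is passed in by the caller (the `AELM` set used below is only an illustrative choice), and the function names are hypothetical.

```python
def helix_nuclei(seq: str, helix_formers: set, window: int = 6, min_formers: int = 4) -> list:
    """Start indices of windows where at least min_formers residues prefer helix
    (the '4 out of 6' rule)."""
    return [i for i in range(len(seq) - window + 1)
            if sum(aa in helix_formers for aa in seq[i:i + window]) >= min_formers]

def absorb_misfits(states: str) -> str:
    """A single residue flanked on both sides by the same state adopts that state."""
    out = list(states)
    for i in range(1, len(states) - 1):
        if states[i - 1] == states[i + 1] != states[i]:
            out[i] = states[i - 1]
    return "".join(out)

formers = set("AELM")  # assumed helix-preferring residues for the example
nuclei = helix_nuclei("AELMGPGPG", formers)
smoothed = absorb_misfits("HHHCHHH")
print(nuclei, smoothed)
```

The same window function could be reused for the "3 out of 5" strand rule by passing `window=5, min_formers=3` and a set of strand-preferring residues.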


    Basic types of secondary structure

    Helices (α and others)

    α is most common; 3.6 residues/turn

    Side chains project outward

    Structure is stabilized by hydrogen bonds between the carbonyl (CO) group of one amino acid and the amino (NH) group of the amino acid that is 4 positions C-terminal to it

    β-Strands (two or more strands interact to form a β-sheet)

    Other (sometimes called loop, coil, or non-regular)

    Baxevanis & Ouellette (Ch. 9, Wishart)

    The new generation of secondary structure prediction

    PHDsec (Rost et al 1994, Rost et al 1996)

    Based on machine learning concepts

    Training set: learn implicit rules, principles and model

    parameters from labelled data (sequences whose

    secondary structures are known for each position)

    Test set: sequences of unknown structure

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)


    Key to success

    The success of machine learning algorithms depends on the careful choice of the biologically based features used for training and a sufficiently large and accurate training set

    To enhance prediction accuracy on novel data, training data diversity is also critical

    Exploit knowledge that local environment is important: to predict 2ary structure of residue i, consider all residues in a window around i: i-n, ..., i, ..., i+n.

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)
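
One common way to present such a window to a learning algorithm is a one-hot encoding of each position. The sketch below is an assumption about how the input might be encoded, not a description of any particular published predictor; positions that fall off either end of the sequence are encoded as all zeros.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_features(seq: str, i: int, n: int) -> list:
    """One-hot encode residues i-n .. i+n as a flat list of 0/1 values.
    Out-of-range positions and unknown residues become all-zero blocks."""
    features = []
    for j in range(i - n, i + n + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= j < len(seq):
            idx = AMINO_ACIDS.find(seq[j])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features

# Window of radius 1 around the first residue of "ACD"
feats = window_features("ACD", 0, 1)
print(len(feats), sum(feats))
```

Each residue to be predicted yields one such vector of length (2n+1) x 20, paired with its known secondary structure label during training.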

    PHDsec

    Employs homology detection and a feed-forward artificial neural network

    Step 1: homolog search and MSA construction

    Step 2: label each position with conservation signal (across MSA) and observed substitutions

    Step 3: submit representative annotated sequence to a system of neural networks.

    Output is a prediction of the most likely secondary structure at each position, with the estimated confidence in that prediction

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)
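
Step 2 above can be illustrated with a small sketch that derives, for one alignment column, the observed substitution frequencies and a conservation score. The entropy-based score and the function names are assumptions chosen for the example; the actual conservation weight used by PHDsec may be defined differently.

```python
import math
from collections import Counter

def column_profile(msa: list, col: int) -> dict:
    """Relative frequency of each residue observed in one alignment column (gaps ignored)."""
    residues = [row[col] for row in msa if row[col] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}

def conservation(profile: dict) -> float:
    """1 minus normalized Shannon entropy: 1.0 = fully conserved, 0.0 = uniform over 20 residues."""
    if not profile:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in profile.values())
    return 1.0 - entropy / math.log2(20)

toy_alignment = ["AL", "AV", "AI", "AL"]
conserved = conservation(column_profile(toy_alignment, 0))
variable = conservation(column_profile(toy_alignment, 1))
print(round(conserved, 3), round(variable, 3))
```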


    Assessing performance evaluations

    Overall, the correct evaluation of performance

    for prediction methods is an art in itself; only a

    handful of methods turned out over time to not

    have been overestimated by their developers.

    Evaluation must be performed on a standard dataset

    Training and test data should be rigorously kept

    separate

    Standard deviations of estimates should be provided

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)

    Other problems with comparing

    different methods

    Performance reported in literature can take different forms

    Accuracy and coverage

    Positive (or negative) predictive power

    Sensitivity and specificity

    Machine learning terms (e.g., Matthews coefficients)

    Wilcoxon paired score signed rank tests

    Or be based on different criteria for success

    per residue

    per secondary structure element

    per protein

    Others measure performance only in cases where a prediction has high confidence (with a likelihood of a lower FP rate)

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)
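
Several of the per-residue measures listed above can be computed from the same pair of predicted and observed state strings. The sketch below uses standard textbook definitions; the function names and the toy strings are assumptions for illustration.

```python
import math

def q3(pred: str, obs: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    return sum(p == o for p, o in zip(pred, obs)) / len(obs)

def per_state_metrics(pred: str, obs: str, state: str) -> dict:
    """Sensitivity, specificity, positive predictive value and Matthews
    coefficient for one state treated as the positive class."""
    tp = sum(p == state and o == state for p, o in zip(pred, obs))
    fp = sum(p == state and o != state for p, o in zip(pred, obs))
    fn = sum(p != state and o == state for p, o in zip(pred, obs))
    tn = len(obs) - tp - fp - fn

    def ratio(a, b):
        return a / b if b else float("nan")

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, denom),
    }

observed = "HHHHCCCCEE"
predicted = "HHHCCCCCEH"
overall = q3(predicted, observed)
helix = per_state_metrics(predicted, observed, "H")
print(overall, helix)
```

Because these numbers answer different questions, two methods can rank differently depending on which one is reported, which is part of why published comparisons are hard to reconcile.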


    The EVA server

    Continuous assessment of the predictions of automatic servers using the same measurements, the same standards, and the same sequences for all methods

    New structures (pre-release to PDB) given to EVA by participating structural biologists. EVA submits the amino acid sequences to online servers.

    Predictions stored until release of 3D coordinates to PDB. Then the predicted (2D or 3D) structures can be compared against the solved structures, and given various scores.

    Approach enables the community to compare methods, and gives developers concrete feedback that is critical for method improvement.

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)

    How do the methods compare?

    Best methods now reach 76% accuracy at 3-state prediction (helix, strand, random coil)

    Rost 2001

    See EVA website for detailed comparisons

    Metaservers:

    Consensus approaches combining weighted predictions from different servers

    These almost always outperform individual methods

    Shown in both CASP and EVA

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)
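
A consensus of this kind can be as simple as a weighted vote at each position. The sketch below is a minimal illustration: the server names and weights are hypothetical, and real metaservers may combine per-state confidences rather than hard labels.

```python
from collections import defaultdict

def consensus(predictions: dict, weights: dict) -> str:
    """Weighted per-position vote over equal-length prediction strings.
    Ties are broken alphabetically by state label."""
    length = len(next(iter(predictions.values())))
    result = []
    for i in range(length):
        votes = defaultdict(float)
        for server, pred in predictions.items():
            votes[pred[i]] += weights[server]
        result.append(max(sorted(votes), key=votes.get))
    return "".join(result)

# Hypothetical servers and weights
toy_predictions = {"server_a": "HHC", "server_b": "HEC", "server_c": "EEC"}
toy_weights = {"server_a": 1.0, "server_b": 1.0, "server_c": 1.5}
combined = consensus(toy_predictions, toy_weights)
print(combined)
```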


    Caveats

    Even when an experimental structure is available, it is sometimes unclear where one secondary structure element ends and another begins

    Low-confidence predictions (and regions of disagreement across servers) can correspond to structurally ambiguous regions

    Real-life example: Prion protein (involved in bovine spongiform encephalopathy, Creutzfeldt-Jakob disease, etc).

    Region assumed to be responsible for aggregation believed to flip from experimentally determined helical structure to (predicted) strand in diseased individuals

    All the best secondary structure prediction methods predict this region to be beta (incorrect)

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)

    Secondary structure prediction programs

    PSI-PRED

    JNET (Cuff & Barton)

    PHD (Rost & Sander)

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)


    PSIPRED


    Solvent accessibility

    Solvent accessibility is the area of a protein's surface that is exposed to surrounding solvent.

    This information is critical for facilitating the detection of functionally (as opposed to structurally) critical residues

    Solvent-exposed positions have the potential to interact with other molecules, metal atoms or ions

    Entirely buried residues may help stabilize a protein's 3D fold, but cannot participate in

    an enzyme active site,

    a binding site in a DNA-binding protein, or

    an interaction site in a signal transduction component

    all of which require spatial accessibility of the residue to solvent

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)


    Measuring solvent accessibility

    Measured in square Angstroms

    Values range from 0 (entirely buried) to 300 (on surface)

    Two entirely exposed residues can have very different accessible areas

    Residues with long side chains expose a larger area to solvent than residues with short side chains

    Values typically normalized by the maximum possible for an amino acid, to measure the percentage of the residue that is accessible to solvent.

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)
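
The normalization described above is a simple division by a per-residue maximum. In the sketch below the two maximum values are illustrative placeholders only (an assumption for the example); in practice they should be taken from a published scale of maximum accessible surface areas.

```python
# Illustrative placeholder maxima in square Angstroms; substitute a published scale.
EXAMPLE_MAX_ASA = {"G": 84.0, "W": 227.0}

def relative_accessibility(residue: str, asa: float, max_asa: dict) -> float:
    """Percentage of a residue's maximum possible surface that is solvent exposed."""
    return min(100.0, 100.0 * asa / max_asa[residue])

# The same absolute exposure means very different things for a small and a large residue
gly = relative_accessibility("G", 60.0, EXAMPLE_MAX_ASA)
trp = relative_accessibility("W", 60.0, EXAMPLE_MAX_ASA)
print(round(gly, 1), round(trp, 1))
```

With these placeholder maxima, 60 square Angstroms leaves the glycine mostly exposed while the tryptophan is still largely buried, which is the reason for normalizing.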

    Conservation of solvent accessibility

    Homologous proteins with similar folds tend to conserve solvent accessibility values at buried positions (i.e., solvent accessibility between 0-10%);

    Exposed positions (values between 60-100%) show less conservation of solvent accessibility between homologs.

    Rost and Sander, 1994

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)


    Prediction methods: PHDacc and PROFacc

    Part of the PredictProtein service at Columbia

    U. (Burkhard Rost lab)

    Sequence alignment and profile construction

    using MaxHom

    Per-residue 10-state scheme, corresponding to

    predicted percentage of residue that is

    accessible (1=0-1%; 2=2-4%; etc)

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)

    Prediction methods: Jpred

    Cuff & Barton, 2000

    Prediction server predicting 2ary structure and solvent accessibility

    Sequence alignment and profile construction using PSI-BLAST and HMM methods

    Per-residue 3-state scheme, corresponding to predicted percentage of residue that is accessible (0%, 5%, 25%)

    Prediction outputs from two neural networks are combined to give an average relative solvent accessibility.

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)


    Solvent accessibility: Method performance

    No large-scale continuous system for evaluation is available (unlike the case for 2D and 3D structure prediction)

    Local sequence information is insufficient

    Accessibility to solvent appears to be influenced by nonlocal effects

    For two-state prediction (buried vs exposed), accuracy is between 75-85% for both PHDacc and PROFacc

    For more detailed definitions (e.g., percentage of exposure), accuracy is more difficult to measure.

    Correlation coefficient between predicted and measured solvent accessibility for PHDacc is 0.53

    Random guess would yield a correlation coefficient of zero

    Superior results require construction of a homology model

    Baxevanis & Ouellette Ch 8 (Ofran and Rost)
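
The two kinds of score mentioned above, two-state accuracy and a correlation coefficient, can be computed as in the following sketch. The Pearson formula is standard; the burial cutoff of 16% and the toy values are assumptions for the example, since the cutoff used in the reported figures is not stated here.

```python
import math

def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def two_state_accuracy(pred: list, obs: list, cutoff: float) -> float:
    """Fraction of residues placed on the same side of the burial cutoff."""
    return sum((p >= cutoff) == (o >= cutoff) for p, o in zip(pred, obs)) / len(obs)

# Toy relative accessibilities in percent (not real data)
predicted_acc = [5, 30, 80, 10]
measured_acc = [0, 50, 70, 40]
acc = two_state_accuracy(predicted_acc, measured_acc, cutoff=16)
corr = pearson(predicted_acc, measured_acc)
print(acc, round(corr, 3))
```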

    3D-structure prediction


    Basic premise: The function and structure of

    a protein are encoded in its primary sequence

    The amino acid sequence determines a protein's 3D structure, subcellular localization, intermolecular interactions, biochemical and physiological tasks, and (eventually) how and when it will be broken down into its component building blocks.

    Paraphrased from class text (Ofran and Rost), p 198

    How many unique protein folds are there?

    Many structural biologists believe that all protein domains will eventually be classified into only 1000 different fold classes

    Koonin et al 2002

    Structural Genomics Initiative is designed to populate that fold space

    Even with attempts to solve novel structures, upon examination of new structures, many are clearly members of existing structural classes

    Baxevanis & Ouellette (Ch. 9, Wishart)


    3D structure classification schemes All alpha (>50% helix; 30% beta sheet;


    Threading

    Limited to generating approximate models or suggesting approximate folds

    >5 Angstroms for 3D threading

    >3 Angstroms for 2D threading

    Name based on threading a tube (called a snake) through a plumbing system.

    Each unique threading of a sequence through the 3D model can be evaluated using empirically derived energy function or measure of packing efficiency

    Sequences can be scored based on how well they fit the model (i.e., the best score achievable)

    Baxevanis & Ouellette Ch 9 (Wishart)
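
To illustrate how one threading is scored and how the best score is picked, the sketch below slides a sequence along a template without gaps and sums a pairwise potential over the template's contacts. Everything here is a toy assumption: the potential values, the contact list, the gapless placement, and the function names. Real threading programs also consider gapped alignments and much richer energy functions.

```python
def threading_score(seq: str, alignment, contacts: set, potential: dict) -> float:
    """Sum a pairwise potential over template contacts.
    alignment[k] is the template position occupied by seq[k]."""
    placed = {pos: aa for aa, pos in zip(seq, alignment)}
    score = 0.0
    for a, b in contacts:
        if a in placed and b in placed:
            pair = tuple(sorted((placed[a], placed[b])))
            score += potential.get(pair, 0.0)  # unknown pairs contribute nothing
    return score

def best_gapless_threading(seq: str, template_len: int, contacts: set, potential: dict):
    """Try every gapless offset and return (lowest score, offset), or None if seq is too long."""
    best = None
    for offset in range(template_len - len(seq) + 1):
        alignment = range(offset, offset + len(seq))
        s = threading_score(seq, alignment, contacts, potential)
        if best is None or s < best[0]:
            best = (s, offset)
    return best

# Hypothetical potential (lower is better) and template contacts
toy_potential = {("L", "L"): -1.0, ("K", "L"): 0.5}
toy_contacts = {(0, 3), (1, 2)}
best = best_gapless_threading("LKKL", 5, toy_contacts, toy_potential)
print(best)
```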

    Three-dimensional threading

    First described by Novotny et al (1984)

    Rediscovered in early 1990s

    Jones et al 1992; Sippl & Weitckus 1992; Bryant & Lawrence 1993

    Based largely on heuristic contact potentials (interactions between pairs of residues)

    3D coordinates of theoretical structure (based on threading of sequence through PDB structure model) used to evaluate predicted contacts and derive a fitness score based on a pseudoenergy function

    Powerful for predicting 3D structure of unknown proteins, and for evaluating structure of known proteins

    Limitations found in this method:

    Interactions are not always conserved between distant homologs

    Computational complexity (very slow)

    Modest accuracy (early methods ignored amino acid information; model accuracy >5 Angstroms)

    Baxevanis & Ouellette Ch 9 (Wishart)


    Contact maps

    2D plots of distances between C-alpha atoms of all pairs of residues

    Observed interactions between amino acids used to form contact potentials for 3D threading methods

    Creighton, Proteins Ch. 6

    Figure 6.14
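
A contact map of this kind can be derived directly from C-alpha coordinates. The sketch below computes the full pairwise distance matrix and then thresholds it; the 8 Angstrom cutoff is a commonly used value but is an assumption here, as are the function names and the toy coordinates.

```python
import math

def distance_matrix(ca_coords: list) -> list:
    """Pairwise Euclidean distances between C-alpha positions."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def contact_map(ca_coords: list, cutoff: float = 8.0) -> list:
    """Boolean matrix: True where two different residues lie within the cutoff."""
    d = distance_matrix(ca_coords)
    n = len(d)
    return [[i != j and d[i][j] <= cutoff for j in range(n)] for i in range(n)]

# Toy coordinates along a line (Angstroms), not a real structure
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
print(toy_map[0])
```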

    Two-dimensional threading

    Sequence-profile methods; combine 2ary structure prediction (and possibly solvent accessibility) with standard profile methods to score and align proteins

    Improved accuracy through combined use of 2ary structure and amino acid similarity

    Much faster than standard 3D threading

    Model accuracy good but not excellent (RMSD >3 Angstroms)

    However, for model construction for proteins with no close homologs with solved structure, these methods are among the best

    Examples: UCSC SAMT99 (two-track HMMs), 3d-pssm, FUGUE

    Judged best by EVA

    Baxevanis & Ouellette Ch 9 (Wishart)
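
Model accuracy figures quoted in Angstroms are root-mean-square deviations (RMSD) between equivalent atoms of a model and a reference structure. The sketch below shows only the final calculation and assumes the two coordinate sets have already been optimally superimposed and paired atom by atom; the superposition step itself is not shown.

```python
import math

def rmsd(coords_a: list, coords_b: list) -> float:
    """Root-mean-square deviation between paired, pre-superimposed coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom displaced by 3 Angstroms along z
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(model, reference)
print(deviation)
```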


    Rosetta

    Hybrid ab initio and homology-based

    structure prediction

    David Baker

    The HMMSTR-Rosetta server

    http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php


    Assessing method performance

    Astral benchmark datasets

    Park et al

    CASP experiments

    EVA and Livebench

    Continuous evaluation of webservers


    Experimental methods for solving

    protein 3D structure

    Experimental determination of protein structure

    X-ray crystallography

    NMR spectroscopy


    X-ray crystallography

    Most accurate; can be applied to larger proteins

    Oldest method; first structure (myoglobin) determined in late 1950s (Kendrew et al 1958). More than 20K structures solved to date

    Method: Small protein crystals (measuring


    NMR spectroscopy

    Much newer: first NMR structure in 1983

    Allows biologists to study structure and dynamics of molecules in liquid state (or near-physiological environment)

    Structures solved by measuring how radio waves are absorbed by atomic nuclei

    Absorption measurement allows the determination of how much nuclear magnetism is transferred from one atom (or nucleus) to another

    Magnetization transfer measured through chemical shifts, J-couplings and nuclear Overhauser effects

    Measured parameters define a set of approximate structural constraints that are fed into a constraint minimization calculation (distance geometry or simulated annealing)

    Result is an ensemble of 15-50 structures that satisfy the experimental constraints

    These multiple structures are overlaid/superimposed on each other to produce "blurrograms"

    NMR result is potentially more reflective of true solution behavior of proteins; most proteins seem to exist in an ensemble of slightly different configurations

    Baxevanis & Ouellette (Ch. 9, Wishart)

    Limitations of NMR spectroscopy

    Size limitations: maximum of 30kD (~250aa)

    Solubility of molecule

    cannot be applied to membrane proteins

    Expensive: requires special isotopically labeled molecules

    Inherently less precise

    Baxevanis & Ouellette (Ch. 9, Wishart)


    Storing and retrieving protein structures

    The Protein Data Bank (PDB)

    First electronic database in bioinformatics

    Set up at Brookhaven National Laboratory by Walter Hamilton in 1971

    7 protein structures at database initiation

    Coordinates stored and distributed on punch cards and computer tape

    Currently

    22K structures (as of October 23, 2005)

    Coordinate distribution and deposition is electronic (via the World Wide Web)

    Moved to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998

    Primary archival center for experimentally determined 3D structures of proteins, nucleic acids, carbohydrates and complexes

    Separate repository for theoretical models

    Baxevanis & Ouellette (Ch. 9, Wishart)

    http://www.usm.maine.edu/~rhodes/ModQual/index.html


    http://www.usm.maine.edu/~rhodes/ModQual/index.html

    Summary

    Experimental determination of protein structure is expensive and not always straightforward

    Predictive methods are relied upon to obtain clues to protein fold (and function)

    Knowing what (which parts of a protein structure) you can believe and what you can't is critical for both experimental and predicted structures


    Summary (contd)

    Ab initio methods of protein fold prediction use physics-based energy minimization to simulate the process of protein folding

    These methods are generally less successful than homology-based fold prediction (limited to short peptides/small proteins)

    Exception: Rosetta/I-sites methods (Baker group) which employ both types of approach

    Threading methods fall into the homology-based class of approaches.

    2D profiles use 2ary structure (prediction/knowledge) as well as sequence information (and perhaps additional information).

    3D profiles use 3D models and assign scores to proteins based on inter-residue contacts based on the observed contacts in the original structure template and derived contact potentials from other structures

    Summary (cont'd)

    Community assessment of 2D and 3D structure prediction uses various approaches

    EVA and LiveBench (continuous real-time assessment of methods)

    CASP (Critical Assessment of Protein Structure Prediction)

    Benchmark datasets (e.g., Astral PDB40 for fold recognition)

    Reported accuracy of 2D structure prediction is between 75-77% (for best methods)

    Reported accuracy of comparative models derived by 3D structure prediction servers is harder to assess.

    Fold prediction (ignoring the comparative model construction) is fairly accurate for the best servers, provided:

    A homologous structure has already been deposited in the PDB

    That structure can be detected with a significant E-value using sequence information alone (e.g., by PSI-BLAST)

    The inclusion of 2ary structure prediction (e.g., in 2D profiles) can improve the alignment and give a modest boost to fold recognition accuracy when %ID is very low, but can also yield errors in prediction