Protein Prediction

  • 7/30/2019 Protein Prediction

    1/100

    Protein Secondary Structure Prediction (PSSP)

    Proteins

    Protein: from the Greek word PROTEUO, which means "to be first (in rank or influence)"

    Why are proteins important to us:

    Proteins make up about 15% of the mass of the average person and maintain the structural integrity of the cell.

    Enzymes act as biological catalysts

    Storage and transport: haemoglobin

    Antibodies

    Hormones: insulin

    Introduction to proteins

    Peptide bond

    Four levels of protein structure

    Conformational parameters of the secondary structure of a protein

    Dihedral angles / torsion angles / rotation angles

    φ = phi = N-Cα

    ψ = psi = Cα-C

    ω = omega = C-N
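
    As a rough illustration of how such angles can be computed from atomic coordinates, the sketch below returns the signed torsion angle about the bond between the middle two of four points. The function name `dihedral` and the example coordinates are illustrative assumptions. For φ the four atoms would be C(i-1), N, Cα, C; for ψ they would be N, Cα, C, N(i+1); for ω they would be Cα, C, N(i+1), Cα(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the fourth point is rotated a quarter turn
# about the central bond relative to the first.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

    Applying the same function to consecutive backbone atoms of each residue gives the (φ, ψ) pairs that are plotted in a Ramachandran plot.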

    Can these values be calculated?

    Ramachandran plot: devised by G. N. Ramachandran.

    Green region indicates the sterically permitted φ and ψ values for all residues except Gly and Pro.

    Yellow circles represent the conformational angles of several secondary structures: α-helix, parallel and antiparallel β-sheet.

    Helices

    H: α-helix

    G: 3₁₀-helix

    I: π-helix (extremely rare)

    Hydrogen bonding: i to i+3 (3₁₀-helix), i to i+4 (α-helix), i to i+5 (π-helix)

    Secondary Structure

    8 different categories (DSSP):

    H: α-helix (pitch 5.4 Å, rise 1.5 Å per residue)

    G: 3₁₀-helix

    I: π-helix (extremely rare)

    E: β-strand

    B: β-bridge

    T: β-turn

    S: bend

    L: the rest

    Three secondary structure states

    Prediction methods are normally trained and assessed for only 3 states (residues):

    H (helix), E (strands) and L (coil)

    There are many published 8-to-3 state reduction methods.

    Standard reduction methods are defined by the programs DSSP (Dictionary of Secondary Structure of Proteins), STRIDE, and DEFINE.

    Improvement in the predictive accuracy of different SSP (Secondary Structure Prediction) programs depends on the choice of the reduction method.
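
    To make the effect of the reduction method concrete, here is a minimal sketch of two possible 8-to-3 mappings. The scheme names `REDUCE_BROAD` and `REDUCE_STRICT` are invented for this example, and the two mappings are only assumed to be representative of published schemes.

```python
# Two illustrative 8-to-3 reduction schemes (names are hypothetical).
# Any DSSP state missing from a scheme is mapped to L (coil).
REDUCE_BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCE_STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp_string, scheme):
    """Map a string of 8-state DSSP codes onto the three states H, E and L."""
    return "".join(scheme.get(state, "L") for state in dssp_string)


eight_state = "HGIEBTSL"
broad = reduce_states(eight_state, REDUCE_BROAD)
strict = reduce_states(eight_state, REDUCE_STRICT)
```

    The same 8-state assignment yields different 3-state strings under the two schemes, so accuracies reported against differently reduced references are not directly comparable.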

    Protein Secondary Structure Prediction

    Techniques for the prediction of protein secondary structure provide information that is useful both in ab initio structure prediction and as an additional constraint for fold-recognition algorithms.

    Knowledge of secondary structure alone can help the design of site-directed or deletion mutants that will not destroy the native protein structure.

    For all these applications it is essential that the secondary structure prediction be accurate, or at least that the reliability for each residue can be assessed.

    Protein Secondary Structure Prediction

    If a protein sequence shows clear similarity to a protein of known three-dimensional structure, then the most accurate method of predicting the secondary structure is to align the sequences by standard dynamic programming algorithms, as homology modelling is much more accurate than secondary structure prediction for high levels of sequence identity.

    Secondary structure prediction methods are of most use when sequence similarity to a protein of known structure is undetectable.

    It is important that there is no detectable sequence similarity between the sequences used to train and test secondary structure prediction methods.

    Protein Secondary Structure

    Regular secondary structure: α-helices, β-sheets

    Irregular secondary structure: tight turns, random coils, bulges

    PSSP Algorithms

    There are three generations of PSSP algorithms:

    Early/first generation: based on statistical/rule-based information on single amino acids

    Second generation: based on windows (segments) of amino acids. Typically a window contains 11-21 amino acids

    Third generation: based on the use of windows on evolutionary information

    PSSP: First Generation

    First generation PSSP systems are based on statistical information on a single amino acid

    The most relevant algorithms:

    Chou-Fasman, 1974 (statistics based)

    GOR, 1978 (rule based)

    Both algorithms claimed 74-78% predictive accuracy, but when tested with better constructed datasets they proved to have a predictive accuracy of ~50% (Nishikawa, 1983)

    PSSP: Second Generation

    Based on the information contained in a window of amino acids (11-21 aa.)

    Most systems use algorithms based on:

    Statistical information

    Physico-chemical properties

    Sequence patterns

    Multi-layered neural networks

    Graph theory

    Multivariate statistics

    Expert rules

    Nearest-neighbour algorithms

    No Bayesian networks

    PSSP: Second Generation

    Main problems:

    Prediction accuracy

    PSSP: Third Generation

    PHD: first algorithm in this generation (1994)

    Evolutionary information improves the prediction accuracy to 72%

    Use of evolutionary information:

    1. Scan a database of known sequences with alignment methods to find similar sequences

    2. Filter the resulting list with a threshold to identify the most significant sequences

    3. Build amino acid exchange profiles based on the probable homologs (most significant sequences)

    4. The profiles are used in the prediction, i.e. in building the classifier
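
    Step 3 can be pictured with the small sketch below, which turns an already aligned and filtered set of homologs into per-column amino acid frequencies. The function name `build_profile` and the toy alignment are assumptions for illustration; real profile construction typically also applies sequence weighting and pseudocounts, which are omitted here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment):
    """Return one {amino acid: frequency} dict per alignment column.

    `alignment` is a list of equal-length aligned sequences; gap
    characters and any other non-standard symbols are ignored.
    """
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values())
        if total:
            profile.append({aa: n / total for aa, n in counts.items()})
        else:
            profile.append({})
    return profile


# Toy alignment of three hypothetical homologs.
profile = build_profile(["ACD", "ACE", "GCD"])
```

    Each column of the resulting profile, rather than a single residue, is then what a window-based classifier sees at that position.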

    PSSP: Third Generation

    Many of the second generation algorithms have been updated to the third generation

    The most important algorithms of today:

    Predator: Nearest-neighbour

    PSI-Pred: Neural networks

    SSPro: Neural networks

    SAM-T02: Homologs (Hidden Markov Models)

    PHD: Neural networks

    Due to the improvement of protein information in databases, i.e. better evolutionary information, today's predictive accuracy is ~80%

    It is believed that the maximum reachable accuracy is 88%

    First Generation PSSP

    Two classical methods that use previously

    determined propensities:

    Chou-Fasman

    Garnier-Osguthorpe-Robson

    Chou-Fasman method

    Uses a table of conformational parameters (propensities) determined primarily from measurements of secondary structure.

    Propensity = (frequency of amino acid X observed in element Y) / (frequency of element Y in database)

    Designations:

    H = strong former, h = former, I = weak former, i = indifferent, B = strong breaker, b = breaker; P = conformational parameter

    The Chou-Fasman method

    If you were asked to determine whether an amino acid in a protein of interest is part of an α-helix or a β-sheet, you might think to look in a protein database and see which secondary structures amino acids in similar contexts belonged to.

    The Chou-Fasman method (1974) is a combination of such statistics-based methods and rule-based methods.

    Steps of the Chou-Fasman algorithm:

    1. Calculate propensities from a set of solved structures. For each of the 20 amino acids i, calculate these propensities:

    $$\frac{P(i \mid \text{Helix})}{P(i)}, \qquad \frac{P(i \mid \text{Beta})}{P(i)}, \qquad \frac{P(i \mid \text{Turn})}{P(i)}$$

    Propensities > 1 mean that residue type i is likely to be found in the corresponding secondary structure type.
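
    A minimal sketch of step 1, assuming the solved structures have been flattened into one residue string and one equally long string of state labels. The function name `propensities` and the toy data are hypothetical.

```python
from collections import Counter


def propensities(residues, states, target):
    """Return P(i | target) / P(i) for every residue type i in `residues`.

    `residues` and `states` are equal-length strings; `target` is the state
    label of interest (for example "H" for helix). At least one position
    must carry the target label.
    """
    overall = Counter(residues)
    in_target = Counter(aa for aa, s in zip(residues, states) if s == target)
    n_total = len(residues)
    n_target = sum(in_target.values())
    return {
        aa: (in_target[aa] / n_target) / (overall[aa] / n_total)
        for aa in overall
    }


# Toy data: two Ala and one Gly are helical, one Gly is not.
helix_propensity = propensities("AAGG", "HHHL", "H")
```

    In this toy set Ala comes out above 1 and Gly below 1, which is the pattern the method looks for when labelling residues as formers or breakers.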

    Chou and Fasman propensities

    Amino acid   α-Helix   β-Sheet   Turn

    Favors α-helix:
    Ala          1.29      0.90      0.78
    Cys          1.11      0.74      0.80
    Leu          1.30      1.02      0.59
    Met          1.47      0.97      0.39
    Glu          1.44      0.75      1.00
    Gln          1.27      0.80      0.97
    His          1.22      1.08      0.69
    Lys          1.23      0.77      0.96

    Favors β-strand:
    Val          0.91      1.49      0.47
    Ile          0.97      1.45      0.51
    Phe          1.07      1.32      0.58
    Tyr          0.72      1.25      1.05
    Trp          0.99      1.14      0.75
    Thr          0.82      1.21      1.03

    Favors turn:
    Gly          0.56      0.92      1.64
    Ser          0.82      0.95      1.33
    Asp          1.04      0.72      1.41
    Asn          0.90      0.76      1.23
    Pro          0.52      0.64      1.91

    Arg          0.96      0.99      0.88

    Chou and Fasman

    Predicting helices:

    - find nucleation site: 4 out of 6 contiguous residues with P(α) > 1

    - extension: extend helix in both directions until a set of 4 contiguous residues has an average P(α) < 1 (breaker)

    - if average P(α) over whole region is > 1, it is predicted to be helical

    Predicting strands:

    - find nucleation site: 3 out of 5 contiguous residues with P(β) > 1

    - extension: extend strand in both directions until a set of 4 contiguous residues has an average P(β) < 1 (breaker)

    - if average P(β) over whole region is > 1, it is predicted to be a strand
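
    The helix rules can be sketched as follows, using the α-helix column of the propensity table. This is one simplified reading of the rules, not the published implementation: the function name `predict_helices`, the half-open index convention and the exact way the 4-residue breaker window slides are assumptions, and overlap resolution between helices and strands is left out.

```python
# Helix propensities P(alpha), keyed by one-letter code (values from the table).
P_ALPHA = {
    "A": 1.29, "C": 1.11, "L": 1.30, "M": 1.47, "E": 1.44, "Q": 1.27,
    "H": 1.22, "K": 1.23, "V": 0.91, "I": 0.97, "F": 1.07, "Y": 0.72,
    "W": 0.99, "T": 0.82, "G": 0.56, "S": 0.82, "D": 1.04, "N": 0.90,
    "P": 0.52, "R": 0.96,
}


def mean(values):
    return sum(values) / len(values)


def predict_helices(seq, table=P_ALPHA, window=6, min_formers=4, probe=4):
    """Return half-open (start, end) index pairs of predicted helical regions."""
    p = [table[aa] for aa in seq]
    n = len(p)
    regions = []
    i = 0
    while i + window <= n:
        # Nucleation: at least `min_formers` of `window` residues with P > 1.
        if sum(value > 1 for value in p[i:i + window]) < min_formers:
            i += 1
            continue
        start, end = i, i + window
        # Extend right while the 4-residue window ending at the new residue
        # still averages at least 1.
        while end < n and mean(p[end - probe + 1:end + 1]) >= 1:
            end += 1
        # Extend left while the 4-residue window starting at the new residue
        # still averages at least 1.
        while start > 0 and mean(p[start - 1:start - 1 + probe]) >= 1:
            start -= 1
        # Accept the region only if its overall average exceeds 1.
        if mean(p[start:end]) > 1:
            regions.append((start, end))
        i = end
    return regions


# Hypothetical sequence: a helix-favouring stretch flanked by Gly/Pro.
helices = predict_helices("GPGPAELMAEKLGPGP")
```

    The strand rules follow the same pattern with the β-sheet column, a 5-residue nucleation window and at least 3 formers.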

    2. Once the propensities are calculated, each amino acid is categorized using the propensities as one of:

    helix-former, helix-breaker, or helix-indifferent.

    (That is, helix-formers have high helical propensities, helix-breakers have low helical propensities, and helix-indifferent residues have intermediate propensities.)

    Each amino acid is also categorized as one of:

    sheet-former, sheet-breaker, or sheet-indifferent.

    For example, it was found (as expected) that glycine and proline are helix-breakers.

    3. Find nucleation sites.

    These are short subsequences with a high concentration of helix-formers (or sheet-formers).

    These sites are found with some heuristic rule (e.g. "a sequence of 6 amino acids with at least 4 helix-formers, and no helix-breakers").

    4. Extend the nucleation sites, adding residues at the ends, maintaining an average propensity greater than some threshold.

    5. Step 4 may create overlaps; finally, we deal with these overlaps using some heuristic rules.

    The GOR method (Garnier, Osguthorpe, Robson)

    Position-dependent propensities for helix, sheet or turn are calculated for each amino acid. For each position j in the sequence, eight residues on either side are considered.

    A helix propensity table contains information about the propensity for residues at 17 positions when the conformation of residue j is helical. The helix propensity table has 20 x 17 entries.

    Build similar tables for strands and turns.

    GOR simplification:

    The predicted state of AAj is the one with the highest sum of the position-dependent propensities of all residues around AAj.
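
    The simplification can be sketched as a sum over the 17-residue window followed by picking the best-scoring state. The table layout (a dict keyed by residue and offset), the function name `gor_predict` and the toy values are all assumptions made for this example; real GOR tables hold 20 x 17 entries per state derived from solved structures.

```python
HALF_WINDOW = 8  # eight residues on either side, 17 positions in total


def gor_predict(seq, tables):
    """Predict one state per residue from position-dependent propensities.

    `tables[state][(amino_acid, offset)]` is the contribution of
    `amino_acid` at position j + offset to residue j being in `state`.
    Missing entries count as 0, and positions beyond the sequence ends
    are skipped.
    """
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state, table in tables.items():
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = j + offset
                if 0 <= k < len(seq):
                    total += table.get((seq[k], offset), 0.0)
            scores[state] = total
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)


# Toy tables with made-up values, far sparser than real GOR tables.
toy_tables = {
    "H": {("A", 0): 1.0, ("A", -1): 0.5, ("A", 1): 0.5},
    "E": {("V", 0): 1.0, ("V", -1): 0.5, ("V", 1): 0.5},
    "C": {("G", 0): 1.0, ("P", 0): 1.0},
}
predicted = gor_predict("AAAVVVGG", toy_tables)
```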

    GOR can be used at: http://abs.cit.nih.gov/gor/ (current version is GOR IV)

    Suppose a_j is the amino acid that we are trying to categorize.

    GOR looks at the residues a_{j-8}, ..., a_j, ..., a_{j+8}.

    Intuitively, it assigns a structure based on probabilities it has calculated from protein databases. These probabilities are of the form P(structure of a_j | a_{j-8}, ..., a_{j+8}).

    Accuracy

    Both Chou and Fasman and GOR have been assessed and their accuracy is estimated to be Q3 = 60-65%.

    (Initially, higher scores were reported, but the experiments set up to measure Q3 were flawed, as the test cases included proteins used to derive the propensities!)

    Nearest Neighbour Method

    This method depends on the spatial structure of the central residue in each window, given knowledge of its neighbours.

    This method is hence called a memory/homology based method.

    Steps:

    Computes MS algorithm

    Computes the distance between homologous sections

    Example: NNSSP

    [Figure: modelling scheme of the nearest-neighbour method. Labels: training data, test data, new data, NN tree (binary search tree), validation, prediction, actual NN tree with data, SDV hyperplanes, DS manipulation]

    Neural Networks

    Single sequence methods - train network using sets

    of known proteins of certain types (all alpha, all beta,

    alpha+beta) then use to predict for query sequence

    NNPREDICT (>70% accuracy)

    NEURAL NETWORKS

    Inspired by the brain

    Traditional computers struggle to recognize and generalize patterns of the past for future actions

    The brain as an information processing system contains 10 billion nerve cells or neurons, and each neuron is connected to other neurons through about 10,000 synapses

    Brain: interconnected network of neurons that collect, process and disseminate electrical signals via synapses

    Neural network: interconnected network of units (or nodes) that collect, process and disseminate values via links

    Scheme for modeling a neural network:

    Building a random network

    Training the net on a training set

    Building a random network

    random selection of the type of node

    random selection of the parameters of the node

    random selection of the number of the inputs

    connecting the inputs and outputs until the net is larger

    running the training set over the net

    selecting the proper output

    removal of all nodes which do not contribute to the output

    Training the network

    The general idea behind this is to run the net on the training set, and every time it gives a right answer leave it as it is or strengthen the path. Every time it makes a mistake, penalize it. This can be done in several ways to incrementally modify the net:

    alter the parameters

    add/delete connections

    add/delete the nodes with their connections

    Typical methodology used to train a feed-forward network for secondary structure prediction is based on Qian and Sejnowski, 1988

    Alanine: 100000000

    Helix : 100000000
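
    The bit strings above are one-hot codes: each symbol switches on exactly one input or output unit. The sketch below encodes a whole sequence window this way. The alphabet ordering, the use of "-" as a spacer for positions beyond the chain ends, and the names `one_hot` and `encode_window` are assumptions for illustration.

```python
# 20 amino acids plus a spacer symbol (assumed ordering).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
STATES = "HEC"  # helix, strand, coil


def one_hot(symbol, alphabet):
    """Return a list with a single 1 at the position of `symbol`."""
    return [1 if symbol == s else 0 for s in alphabet]


def encode_window(seq, center, window=17):
    """Concatenate one-hot codes for the window centred on `center`."""
    half = window // 2
    bits = []
    for k in range(center - half, center + half + 1):
        symbol = seq[k] if 0 <= k < len(seq) else "-"
        bits.extend(one_hot(symbol, ALPHABET))
    return bits


inputs = encode_window("ACDEFGHIK", center=4)
target = one_hot("H", STATES)
```

    With 21 symbols per position, a 17-residue window produces 17 x 21 = 357 input values, of which exactly 17 are set to 1.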

    Different Network Topologies

    Single layer feed-forward networks

    Input layer projecting into the output layer

    [Figure: single-layer network with an input layer and an output layer]

    Different Network Topologies

    Multi-layer feed-forward networks

    One or more hidden layers. Each layer receives input only from previous layers.

    [Figure: 2-layer (1-hidden-layer) fully connected network with input, hidden and output layers]

    Different Network Topologies

    Recurrent networks

    A network with feedback, where some of its inputs are connected to some of its outputs (discrete time).

    [Figure: recurrent network with an input layer and an output layer]

    Features

    A typical training set consists of 100 non-homologous protein chains (15,000 training patterns)

    A net with an input window of 17, five hidden nodes in a single hidden layer and three outputs will have 357 input nodes and 1,808 weights.

    Predictions are made on a winner-takes-all basis
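
    The node and weight counts can be checked with a few lines of arithmetic. The sketch assumes 21 symbols per window position (20 amino acids plus a spacer) and one bias term per hidden and output node; under those assumptions the totals match the figures quoted above.

```python
def count_parameters(window, symbols, hidden, outputs):
    """Return (input nodes, trainable parameters) for a one-hidden-layer net."""
    inputs = window * symbols
    weights = inputs * hidden + hidden * outputs  # input-hidden and hidden-output links
    biases = hidden + outputs                     # one bias per non-input node (assumed)
    return inputs, weights + biases


n_inputs, n_parameters = count_parameters(window=17, symbols=21, hidden=5, outputs=3)
```

    With roughly 15,000 training patterns and under 2,000 free parameters, the network has several patterns per parameter to learn from.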

    PHD (Profile network from HeiDelberg)

    A program with several cascading neural networks.

    The method employs a two-layered feed-forward neural network:

    Sequence to structure (first layer)

    Structure to structure (second layer)

    Accuracy > 70%

    Window length is 13, and for every position in the window the frequencies of the 20 amino acids are calculated (20 x 13).

    Based on this, the output is a probability for each of the three possible classes (H, E and C).

    Evaluating Prediction Efficiency

    Jackknife test:

    Percentage of correctly classified residues:

    Correlation coefficient for each target class:
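
    The two measures named above can be sketched as follows. Q3 is taken here as the percentage of residues whose predicted state equals the observed state, and the per-class correlation is assumed to be the Matthews correlation coefficient, which is a common choice for this purpose; the function names and the toy strings are illustrative.

```python
import math


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def matthews(predicted, observed, state):
    """Matthews correlation coefficient for one target class."""
    pairs = list(zip(predicted, observed))
    tp = sum(p == state and o == state for p, o in pairs)
    tn = sum(p != state and o != state for p, o in pairs)
    fp = sum(p == state and o != state for p, o in pairs)
    fn = sum(p != state and o == state for p, o in pairs)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


observed = "HHHHEEEECC"
predicted = "HHHEEEEECC"
accuracy = q3(predicted, observed)
helix_correlation = matthews(predicted, observed, "H")
```

    In a jackknife test these measures are computed on proteins that were held out of training, one subset at a time.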

    Prediction of Transmembrane helices

    Two Ways:

    One is solely based on the construction principles of

    proteins associated with physico-chemical properties

    of amino acids. No concept of training is involved.

    The other is to collect data sets with known

    structures, extract features and use machine learning

    algorithms for predictions.

    Outline

    1. Importance of Transmembrane Proteins

    2. General Topologies

    3. Methods (and challenges) for Structural

    Studies of TM Proteins

    Eukaryotic cells have many membranes

    Transmembrane Proteins

    Cellular roles include:

    Communication between cells

    Communication between organelles and cytosol

    Ion transport, nutrient transport

    Links to extracellular matrix

    Receptors for viruses

    Connections for cytoskeleton

    Over 25% of proteins in complete genomes.

    Key roles in diabetes, hypertension, depression, arthritis, cancer, and many other common diseases.

    Targets for over 75% of pharmaceuticals.

    However, very few TM protein structures have been solved!

    Biological Membrane = Lipid Bilayer

    Approximately 30 Å thick

    Hydrophobic core + hydrophilic or charged headgroups

    Mixture of lipids that vary in type of head groups, lengths of acyl chains, number of double bonds

    (Some membranes also contain cholesterol)

    Membrane Bilayer with Proteins

    In order to be stable in this environment, a polypeptide chain needs to

    (1) contain a lot of amino acids with hydrophobic sidechains, and

    (2) fold up to satisfy backbone H-bond propensity - How?

    Structure Solution #1: Hydrophobic alpha-helix

    Satisfies polypeptide backbone hydrogen bonding

    Hydrophobic sidechains face outward into lipids

    Examples of Helix Bundle TM Proteins

    PDB = 1QHJ, PDB = 1RRC

    Single helix or helical bundles (> 90% of TM proteins)

    Examples: Human growth hormone receptor, Insulin receptor

    ATP binding cassette family - CFTR

    Multidrug resistance proteins

    7TM receptors - G protein-linked receptors

    Structure Solution #2: Beta-barrel

    Beta sheet satisfies backbone hydrogen bonds between strands

    Wrap sheet around into barrel shape

    Sidechains on the outside of the barrel are hydrophobic

    Examples of Beta Barrel TM Proteins

    PDB = 1EK9 PDB = 2POR

    Beta barrels - in outer membrane of Gram-negative bacteria, and some nonconstitutive membrane-acting toxins

    Examples: Porins

    General Topologies of TM Proteins

    Single helix or helical bundles and beta barrels

    Both topologies result in hydrophobic surfaces facing acyl chains of lipids

    Part protruding from membrane can be a very short sequence (a few amino acids), a loop, or large, independently folding domains

    Presence of Hydrophobic TM Domain can result in:

    Low levels of expression

    Difficulties in solubilization

    Difficulties in crystallization

    Attempting crystallization and structure solution of transmembrane proteins is considered difficult and risky.

    Difficult and risky, but still possible: TM Proteins of Known Structure

    Great summary and resource: http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html

    Bacteriorhodopsin, Rhodopsin

    Photosynthetic reaction centers

    Porins

    Light harvesting complexes

    Potassium channels

    Chloride channels

    Aquaporin

    Transporters

    Etc.

    Although few in number, each of these structures has been important for addressing key functions.

    Protein Folding Problem

    How does a one-dimensional amino

    acid sequence determine a specific

    three-dimensional structure?

    Or

    How can we read the sequence and predict that structure?

    General Idea

    We know what an alpha-helix or a beta strand

    looks like, so

    (1) figure out which parts of the sequence are helices and which parts are strands

    (2) figure out how they pack together

    For soluble proteins, neither is well predicted.

    But for transmembrane proteins ...

    TM Protein Structure Prediction, Step #1

    For alpha-helical transmembrane proteins, hydropathy plot analysis provides a fairly accurate method to predict which amino acids form membrane-spanning helices

    We can model the structure of an individual alpha helix fairly accurately.
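
    A hydropathy plot can be sketched as a sliding-window average over a per-residue hydrophobicity scale. The example below assumes the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, which are common choices for locating membrane-spanning helices; the function names and the test sequence are illustrative.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy for every full window, indexed from the first centre."""
    half = window // 2
    return [
        sum(KYTE_DOOLITTLE[a] for a in seq[c - half:c + half + 1]) / window
        for c in range(half, len(seq) - half)
    ]


def tm_candidates(seq, window=19, threshold=1.6):
    """Sequence positions whose window average reaches the threshold."""
    half = window // 2
    profile = hydropathy_profile(seq, window)
    return [k + half for k, value in enumerate(profile) if value >= threshold]


# Hypothetical sequence: a 21-residue leucine stretch flanked by aspartates.
sequence = "D" * 10 + "L" * 21 + "D" * 10
profile = hydropathy_profile(sequence)
centres = tm_candidates(sequence)
```

    Runs of consecutive candidate positions mark stretches long and hydrophobic enough to span the bilayer as a helix.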

    TM Protein Structure Prediction, Step #2

    How do the helices pack in the membrane?

    There are several labs studying known protein structures to identify factors involved in determining how transmembrane helices pack together (specificity of interaction and packing motifs):

    Hydrogen bonds

    Hydrophobicity

    Amino acids known to face the lumen of a channel

    Multiple sequence alignments

    Helix packing sequence motifs, etc.

    These kinds of information are then combined with protein docking and energy minimization programs to predict how the helices pack together.

    It is quite possible that studies of helical transmembrane proteins could lead to key information about the protein folding problem - how to predict protein structure from amino acid sequence

    Collect data sets with known structures

    Build model

    Align using ClustalW/hmmalign

    hmmbuild

    hmmsearch

    Calculate threshold

    Score distribution and training data (NN)

    Validation

    Prediction using machine learning techniques

    Summary

    Transmembrane proteins play many important roles in cellular processes in both health and disease

    Two general types of tertiary structure are found to cross the membranes: beta-barrels and alpha-helices

    Structural studies of TM proteins are impeded by difficulties in overexpression, purification and crystallization

    However, the few dozen structures that have been determined have provided key information about channels (gating, selectivity, etc.), energetics, transport, and other transmembrane processes

    Analysis of helical transmembrane protein structures may lead to accurate predictions of protein structure from amino acid sequence for this type of protein

    Prediction of protein conformations

    from protein sequences

    (3D Prediction)

    Protein Conformations

    Predict protein 3D structure from (amino acid) sequence

    Sequence → secondary structure → 3D structure → function

    Protein 3D Structure Detection

    X-ray crystallography

    NMR

    Expensive

    Slow

    Protein Structure

    Protein 3D structure → biological function

    Lock & key model of enzyme function (docking)

    Folding problem

    Protein sequence → 3D structure

    Structure prediction and alignment

    Protein design, drug design, etc.

    The holy grail of bioinformatics

    The Prediction Problem

    Can we predict the final 3D protein structure knowing

    only its amino acid sequence?

    Studied for 4 Decades

    Primary Motivation for Bioinformatics

    Based on this 1-to-1 Mapping of Sequence to

    Structure

    Still very much an OPEN PROBLEM

    Predicting Protein Structure

    Goal: find best fit of sequence to 3D structure

    Comparative (homology) modeling

    Construct 3D model from alignment to protein sequences with known structure

    Threading (fold recognition)

    Pick best fit to sequences of known 2D / 3D structures (folds)

    Ab initio / de novo methods

    Attempt to calculate 3D structure "from scratch"

    Molecular dynamics

    Energy minimization

    Lattice models

    PSP: Goals

    Accurate 3D structures. But not there yet.

    Good guesses

    Working models for researchers

    Understand the FOLDING PROCESS

    Get into the Black Box

    Only hope for some proteins

    25% won't crystallize, too big for NMR

    Best hope for novel protein engineering

    Drug design, etc.

    PSP: Major Hurdles

    Energetics

    We don't know all the forces involved in detail

    Too computationally expensive BY FAR!

    Conformational search impossibly large

    100 a.a. protein, 2 moving dihedrals per residue, 2 possible positions for each dihedral: 2^200 conformations!

    Levinthal's Paradox

    Longer than time of universe to search

    Proteins fold in a couple of seconds??

    Multiple-minima problem
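
    The size of the conformational search can be made concrete with a short calculation. The sampling rate of 10^13 conformations per second and the age of the universe of about 1.38 x 10^10 years are assumptions used only for scale; the function name is illustrative.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365.25
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate, for comparison only


def levinthal_search_time(residues=100, dihedrals_per_residue=2,
                          positions_per_dihedral=2, rate_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them)."""
    conformations = positions_per_dihedral ** (residues * dihedrals_per_residue)
    years = conformations / rate_per_second / SECONDS_PER_YEAR
    return conformations, years


conformations, years = levinthal_search_time()
ratio_to_universe_age = years / AGE_OF_UNIVERSE_YEARS
```

    Even with only two positions per dihedral, exhaustive enumeration takes vastly longer than the age of the universe, which is why folding cannot proceed by random search.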

    Tertiary Structure Prediction

    Major Techniques

    Comparative Modeling

    Homology Modeling

    Threading

    Template-Free Modeling

    De novo / ab initio methods

    Physics-Based

    Knowledge-Based

    Homology Modeling

    Steps

    Template selection

    Target-template alignment

    Model building

    Evaluation

    Repeated until a satisfactory model structure is achieved

    Threading

    a library of protein folds (templates)

    a scoring function to measure the fitness of a

    sequence -> structure alignment

    a search technique for finding the best alignment between a fixed sequence and structure

    a means of choosing the best fold from among the

    best scoring alignments of a sequence to all possible

    folds

    Ab initio methods

    The ab initio approach (Figure 6.25) ignores sequence homology and attempts to predict the folded state from fundamental energetics or physicochemical properties associated with the constituent residues. This involves modelling physicochemical parameters in terms of force fields that direct the folding. These constraints will reflect the energetics associated with charge, hydrophobicity and polarity, with the aim being to find a single structure of low energy.

    How to define the energy of a PROTEIN?

    How to find the conformations for which the energy is

    minimum?

    This approach is based on the thermodynamic argument that

    the native structure of a protein is the global minimum in

    the free energy profile.

    Generally the results are expressed as rmsd (root mean

    square deviations), reflecting the difference in positions

    between corresponding atoms in the experimental and

    calculated (predicted) structures.
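
    A minimal sketch of the rmsd calculation, assuming the two structures list corresponding atoms in the same order and have already been superimposed. The function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(
        (x - y) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for x, y in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom structures that differ by 2 units in one coordinate.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(experimental, predicted_model)
```

    In practice the predicted structure is first optimally rotated and translated onto the experimental one, so that the rmsd reflects differences in shape rather than in placement.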

    What determines the global minimum?

    1) Semi-empirical potential function

    Calculated as the sum of all the possible pairwise interactions between the atoms in the molecule

    E.g. AMBER force field

    Evaluates the N(N-1)/2 atom-atom interactions, which requires computational time proportional to N squared.

    Most of the computational methods look for the global minimum.

    SEMI-EMPIRICAL POTENTIAL FUNCTION

    [Figure: schematic diagram of different types of energy functions. Two panels plot energy against structure, each with its global minimum marked. Caption: N(N-1)/2 atom-atom interactions, e.g. AMBER force fields]

    To overcome this, reduce the resolution at which the potential function is calculated.

    Instead of the atom-atom potential, the united-atom potential would be an approximation.

    This approximation is also called a pseudo atom

    2) UNRES force field (Residues with Solvent)

    UNRES was originally designed and parameterized to locate native-like structures of proteins as the lowest in potential energy by unrestricted global optimization.

    Propensities of amino acids calculated.

    The backbone is represented as a sequence of α-carbon (Cα) atoms linked by virtual bonds designated as dC, with united peptide groups (p's) in their centers.

    Molecular Dynamics

    Computation of the dynamics or motion of a protein.

    1) The physical forces which influence the folding process are well represented by the semi-empirical force fields.

    2) The atoms of the protein move independently upon induction of the force fields.

    Calculated by Newton's laws of motion:

    F = ma (calculated for each femtosecond).
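
    The stepwise use of F = ma can be sketched with the velocity Verlet scheme, a standard way to integrate Newton's equations in small time steps. The example moves a single particle on a harmonic spring in arbitrary units; the spring stands in for a real force field, and the function name and parameters are assumptions for illustration.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate one particle in one dimension; returns final (x, v)."""
    a = force(x) / mass                   # a = F / m at the starting position
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
    return x, v


def spring_force(x):
    """Harmonic restoring force with unit spring constant (toy force field)."""
    return -x


x_final, v_final = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                                   mass=1.0, dt=0.01, steps=1000)
total_energy = 0.5 * v_final ** 2 + 0.5 * x_final ** 2
```

    In a protein simulation the same update is applied to every atom in three dimensions, with forces taken from the force field and a time step on the order of a femtosecond.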

    Motion influenced by temperature??

    Simulated annealing:

    At higher temperatures the motion will be greater in a shorter period of time.

    1) Initial global minimum of the atom is calculated first.

    2) Temperature is raised to 3000 K for a few femtoseconds and then gradually lowered to room temperature, 300 K

    The idea is that even if you start to predict the protein with the wrong structure (at 3000 K), the final corrections will lead to the correct conformation.

    Molecular dynamics: optimization tool
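
    The cooling idea can be illustrated with a small Metropolis Monte Carlo version of simulated annealing on a one-dimensional double-well energy. This swaps molecular dynamics for random moves to keep the example short, so it is an analogy rather than the procedure described above; the energy function, the scaling constant `k`, the geometric cooling schedule and all names are assumptions.

```python
import math
import random


def simulated_annealing(energy, x0, t_start=3000.0, t_end=300.0,
                        steps=5000, step_size=0.5, k=0.001, seed=0):
    """Return the lowest-energy (x, energy) seen while cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = x + rng.uniform(-step_size, step_size)
        e_candidate = energy(candidate)
        # Always accept downhill moves; accept uphill moves with a
        # probability that shrinks as the temperature drops.
        if e_candidate <= e or rng.random() < math.exp(-(e_candidate - e) / (k * t)):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def double_well(x):
    """Toy energy with two minima; the one at negative x is lower."""
    return (x * x - 1) ** 2 + 0.3 * x


start = 1.0  # begin in the higher of the two wells
best_x, best_e = simulated_annealing(double_well, start)
```

    At high temperature the walker crosses the barrier between the wells easily; as the temperature falls it is increasingly confined to whichever basin it occupies.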

    [Figure: trajectory over the energy landscape; axes: energy against conformational space]

    Other such simulations are:

    MONTE CARLO SIMULATIONS

    CONFORMATIONAL SPACE ANNEALING

    ROSETTA ALGORITHM

    CASP (Critical Assessment of Structure Prediction)

    LEVINTHAL PARADOX etc.

    ROSETTA ALGORITHM

    Break target sequence into fragments of 9 amino acids

    Create profile , X, for target

    Create profile, S, for similar PDB sequences

    Align profiles X, S to get best match fragment

    Use fragments as starting point for optimisation, using:

    - hydrophobic burial

    - polar side-chain interactions

    - hydrogen bonding between beta-strands

    - hard sphere repulsion (van der Waals)

    Create 1000 structures, and choose the cluster centre as the best prediction
