Proteins
Structural Bioinformatics
2
3
Specific databases of protein sequences and structures
• Swiss-Prot
• PIR
• TrEMBL (translated from DNA)
• PDB (three-dimensional structures)
4
Myoglobin: the first high-resolution protein structure
“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”
Solved in 1958 by John Kendrew's group at Cambridge University. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.
5
Why Protein Structure?
• Proteins are fundamental components of all living cells, performing a variety of biological tasks.
• Each protein has a particular 3D structure that determines its function.
• Protein structure is more conserved than protein sequence, and more closely related to function.
6
There Are Four Levels of Protein Structure
Primary: the linear amino acid sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded polypeptide chain.
Quaternary: the arrangement of several polypeptide chains.
7
Symbols for the 20 amino acids
A  ala  alanine          M  met  methionine
C  cys  cysteine         N  asn  asparagine
D  asp  aspartic acid    P  pro  proline
E  glu  glutamic acid    Q  gln  glutamine
F  phe  phenylalanine    R  arg  arginine
G  gly  glycine          S  ser  serine
H  his  histidine        T  thr  threonine
I  ile  isoleucine       V  val  valine
K  lys  lysine           W  trp  tryptophan
L  leu  leucine          Y  tyr  tyrosine
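As a small illustration of how the symbol table is used in practice, the sketch below spells out a one-letter sequence in three-letter codes. The function name `to_three_letter`, the capitalised spelling of the codes, and the `Xaa` placeholder for symbols outside the table are choices made for this example, not part of the slides.

```python
# One-letter -> three-letter codes, taken from the symbol table above
# (capitalised here by convention).
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence: str) -> str:
    """Spell out a one-letter sequence; symbols not in the table become 'Xaa'."""
    return "-".join(ONE_TO_THREE.get(res, "Xaa") for res in sequence.upper())


print(to_three_letter("TGQ"))  # Thr-Gly-Gln
```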
8
Secondary Structure
Secondary structure is usually divided into three categories:
• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop
9
Alpha Helix: Pauling (1951)
[Figure: α-helix; one turn spans 3.6 residues, pitch 5.4 Å]
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn.
• Stabilized by H-bonds in the backbone between C=O of residue n, and NH of residue n+4.
• Side-chains point out.
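The n to n+4 hydrogen-bond rule and the 3.6 residues per turn can be made concrete with a short sketch. The function names and the 1-based residue numbering are assumptions made for this example.

```python
RESIDUES_PER_TURN = 3.6  # value given on the slide


def helix_hbonds(first: int, length: int) -> list[tuple[int, int]]:
    """Backbone H-bond pairs (C=O of residue n, N-H of residue n+4)
    for a helix spanning residues first .. first + length - 1."""
    last = first + length - 1
    # The last residue that can donate its C=O inside the helix is last - 4.
    return [(n, n + 4) for n in range(first, last - 3)]


def helix_turns(length: int) -> float:
    """Number of turns made by a helix of the given length."""
    return length / RESIDUES_PER_TURN


print(helix_hbonds(first=1, length=10))
# [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
print(round(helix_turns(10), 2))  # 2.78
```

A helix of average length (10 residues) therefore has six such backbone hydrogen bonds and makes a little under three turns.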
10
Beta Strand: Pauling and Corey (1951)
• Different polypeptide chains run alongside each other and are linked together by hydrogen bonds.
• Each section is called a β-strand, and consists of 5-10 amino acids.
11
Beta Sheet
The strands lie adjacent to each other, forming a beta-sheet.
[Figure: (a) antiparallel and (b) parallel β-sheets; labelled distances 3.47 Å, 4.6 Å, 3.25 Å, 4.6 Å]
12
Loops
• Connect the secondary structure elements.
• Vary in length and shape.
• Are located at the surface of the folded protein and therefore may have an important role in biological recognition processes.
• Proteins that are evolutionarily related have the same helices and sheets but may vary in loop structures.
13
How is the 3D Structure Determined?
1. Experimental methods (best approach):
• X-ray crystallography.
• NMR.
• Others.
2. In-silico methods (partial solutions, based on similarity):
• Threading - needs a 3D structure, combinatorial complexity.
• Ab-initio structure prediction - not always successful.
14
X-ray crystallography
1. Obtain an ordered protein crystal.
2. Check X-ray diffraction.
• The crystal is bombarded with X-ray beams.
• Scattering of the beams by the electrons creates a diffraction pattern.
15
X-ray crystallography
3. Analyze the diffraction pattern and produce an electron density map.
4. Thread the known protein sequence into the density map.
16
X-ray crystallography
• The molecules must be very pure in order to produce perfect and stable crystals.
• The method is time-consuming and difficult.
17
NMR - Nuclear Magnetic Resonance (since 1945)
• A sample is immersed in a magnetic field and bombarded with radio waves.
• The nuclei of the molecule resonate (spin). This motion is measured and is specific to each type of molecule.
18
Principles of NMR
19
NMR - Nuclear Magnetic Resonance
• The NMR technique is very time-consuming and expensive.
• The sample has to be in a concentrated solution.
• The technique is limited to small, soluble molecules.
20
PDB: Protein Data Bank
• Holds 3D models of biological macromolecules (protein, RNA, DNA).
• All data are available to the public.
• Obtained by X-Ray crystallography (84%) or NMR spectroscopy (16%).
• Submitted by biologists and biochemists from around the world.
21
PDB – Protein Data Bank
http://www.rcsb.org/pdb/
22
How Many Structures?
PDB Content Growth
http://www.rcsb.org/pdb/holdings.html
23
Structure Prediction: Motivation
• Hundreds of thousands of gene sequences translated to proteins (GenBank, Swiss-Prot, PIR)
• Only about 28,000 solved structures (PDB)
• Experimental methods are time-consuming and not always possible
• Goal: Predict protein structure based on sequence information
24
Structure Prediction: Motivation
• Understand protein function
  – Locate binding sites
• Broaden homology
  – Detect similar function where sequence differs
• Explain disease
  – See effect of amino acid changes
  – Design suitable compensatory drugs
25
Prediction Approaches
• Primary (sequence) to secondary structure
  – Sequence characteristics
• Secondary to tertiary structure
  – Fold recognition
  – Threading against known structures
• Primary to tertiary structure
  – Ab initio modelling
26
Secondary structures have an amphiphilic nature: one face is polar and the other non-polar.
[Figure: an α-helix and a β-sheet, each with polar and non-polar faces labelled]
Can we predict the secondary structure from sequence?
27
Secondary Structure Prediction Methods
• Chou-Fasman / GOR Method
  – Based on amino acid frequencies
• Artificial Neural Network (ANN) methods
  – PHDsec and PSIpred
• HMM (Hidden Markov Model)
• Best accuracy now ~80%
28
Chou and Fasman (1974)
Name            P(a)  P(b)  P(turn)
Alanine          142    83     66
Arginine          98    93     95
Aspartic Acid    101    54    146
Asparagine        67    89    156
Cysteine          70   119    119
Glutamic Acid    151    37     74
Glutamine        111   110     98
Glycine           57    75    156
Histidine        100    87     95
Isoleucine       108   160     47
Leucine          121   130     59
Lysine           114    74    101
Methionine       145   105     60
Phenylalanine    113   138     60
Proline           57    55    152
Serine            77    75    143
Threonine         83   119     96
Tryptophan       108   137     96
Tyrosine          69   147    114
Valine           106   170     50
The table gives the propensity of each amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for being in an alpha helix or a beta sheet; it acts as a breaker).
Success rate of 50%
29
Secondary Structure Method Improvements
‘Sliding window’ approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 6/12
• Calculate a score for each window. If the score is above a threshold, predict that the window is an alpha helix/beta sheet
TGTAGPOLKCHIQWMLPLKK
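The sliding-window idea can be sketched with the Chou and Fasman P(a) and P(b) values from the table. This is a minimal illustration, not the published Chou-Fasman procedure: scoring a window by its mean propensity, the threshold of 100, and the neutral score given to symbols missing from the table (such as the `O` in the example sequence) are all assumptions made for this sketch.

```python
# P(a) and P(b) values copied from the Chou and Fasman (1974) table.
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
           "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
           "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
          "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
          "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}

NEUTRAL = 100  # assumption: score for symbols that are not in the table


def window_scores(seq, table, size):
    """Mean propensity of every window of `size` consecutive residues."""
    return [
        sum(table.get(res, NEUTRAL) for res in seq[i:i + size]) / size
        for i in range(len(seq) - size + 1)
    ]


def predict(seq, table, size, threshold):
    """Mark every residue covered by at least one window above `threshold`."""
    marked = [False] * len(seq)
    for i, score in enumerate(window_scores(seq, table, size)):
        if score > threshold:
            for j in range(i, i + size):
                marked[j] = True
    return marked


seq = "TGTAGPOLKCHIQWMLPLKK"
helix = predict(seq, P_ALPHA, size=12, threshold=100)  # illustrative threshold
strand = predict(seq, P_BETA, size=6, threshold=100)   # illustrative threshold
print(seq)
print("".join("H" if h else "-" for h in helix))
print("".join("E" if e else "-" for e in strand))
```

Window sizes of 12 and 6 follow the typical helix and strand lengths quoted on the slide; a residue marked by both predictions would need a tie-breaking rule, which this sketch leaves out.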
30
Improvements in the 1980’s
• Adding information from conservation in MSA
• Smarter algorithms (e.g. HMM, neural networks).
• Success rate rose to ~80%
31
PHDsec and PSIpred
• PHDsec
  – Rost & Sander, 1993
  – Based on sequence family alignments
• PSIpred
  – Jones, 1999
  – Based on a Position Specific Scoring Matrix generated by PSI-BLAST
• Both consider long-range interactions
32
HMM
• HMM enables us to calculate the probability of assigning a sequence of hidden states to the observation
Observation:  TGTAGPOLKCHIQWML
Hidden state: HHHHHHHLLLLBBBBB
p = ?
33
[Table: HMM probabilities, built from a large database of known secondary structures. Annotations on the figure:]
• The probability of beginning with an α-helix
• The probability of an α-helix residue being followed by an α-helix residue
• The probability of observing a residue that belongs to an α-helix followed by a residue belonging to a turn = 0.15
• The probability of observing alanine as part of a β-sheet
34
HMM
• The above table enables us to calculate the probability of assigning secondary structure to a protein
• Example
Observation:  TGQ
Hidden state: HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000020995
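The worked example can be checked with a few lines of code. The sketch below multiplies a start probability, emission probabilities and transition probabilities along one hidden-state path. Which of the six factors is a start, transition or emission term is inferred from their order in the product and is an assumption, as are the names `START`, `TRANSITION`, `EMISSION` and `path_probability`.

```python
# Probabilities read off the worked example; assigning each factor to a
# start / transition / emission term is an assumption based on their order.
START = {"H": 0.45}
TRANSITION = {("H", "H"): 0.8}
EMISSION = {("H", "T"): 0.041, ("H", "G"): 0.028, ("H", "Q"): 0.0635}


def path_probability(residues: str, states: str) -> float:
    """Joint probability of a residue sequence and one hidden-state path."""
    if not residues or len(residues) != len(states):
        raise ValueError("residues and states must be non-empty and equal in length")
    # First position: start probability times emission probability.
    p = START[states[0]] * EMISSION[(states[0], residues[0])]
    # Every later position: transition into the state, then emission.
    for prev, cur, res in zip(states, states[1:], residues[1:]):
        p *= TRANSITION[(prev, cur)] * EMISSION[(cur, res)]
    return p


p = path_probability("TGQ", "HHH")
print(f"{p:.4e}")  # 2.0995e-05
```

Multiplying the six factors gives about 2.1 x 10^-5. For longer sequences such products become very small, so implementations commonly sum log-probabilities instead.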
35
SS prediction using ANN
[Figure: inputs for one position; one input unit per symbol in ACDEFGHIKLMNPQRSTVWY. to encode the amino acid at that position]
36
PHDsec Neural Net
[Figure: inputs for one position (one unit per symbol in ACDEFGHIKLMNPQRSTVWY. for the amino acid at that position), feeding a hidden layer, feeding the outputs]
Outputs: H = helix, E = strand, C = coil
Confidence: 0 = low, 9 = high
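The input layer in the figure can be illustrated with a one-hot encoding: each position gets 21 units, one per symbol, and exactly one of them is set. The sketch covers only the input encoding, not the hidden layer or the outputs. The window half-width of 2 and the use of `.` for positions beyond the ends of the sequence are assumptions made for this example; the slides do not state either.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus '.', as in the figure


def encode_position(symbol: str) -> list[int]:
    """One-hot vector with one unit per alphabet symbol."""
    if len(symbol) != 1 or symbol not in ALPHABET:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return [1 if symbol == a else 0 for a in ALPHABET]


def encode_window(seq: str, centre: int, half_width: int = 2) -> list[int]:
    """Concatenate one-hot vectors for the window around `centre`.

    Positions past either end of the sequence are encoded as '.'
    (an assumption about what '.' stands for in the figure).
    """
    units: list[int] = []
    for i in range(centre - half_width, centre + half_width + 1):
        symbol = seq[i] if 0 <= i < len(seq) else "."
        units.extend(encode_position(symbol))
    return units


vec = encode_window("TGQ", centre=0)
print(len(vec))  # 105 = 5 positions x 21 units
print(sum(vec))  # 5, one active unit per position
```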
37
Secondary structure prediction
• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al, 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN