Protein Structure, Classification and Prediction
Bioinformatics X3
January 2005
P. Johansson, D. Madsen
Dept. of Cell & Molecular Biology,
Uppsala University
Overview
• Introduction to proteins, structure & classification
• Protein Folding
• Experimental techniques for structure determination
• Structure prediction
Proteins
• Proteins play a crucial role in virtually all biological processes with a broad range of functions.
• The activity of an enzyme, or the function of a protein in general, is governed by its three-dimensional structure.
20 amino acids - the building blocks
The Amino Acids
Hydrophilic or hydrophobic?
• Virtually all soluble proteins feature a hydrophobic core surrounded by a hydrophilic surface
• But the peptide backbone is inherently polar
• Solution: neutralize potential hydrogen-bond donors and acceptors by forming ordered secondary structure
Secondary Structure: α-helix
• 3.6 residues / turn
• Axial dipole moment
• Proline and glycine are disfavoured
• Common on protein surfaces
Secondary Structure: β-sheets
• Parallel or antiparallel
• Alternating side-chains
• No mixing
• Loops often have polar amino acids
Structural classification
• Databases
– SCOP, ‘Structural Classification of Proteins’, manual classification
– CATH, ‘Class Architecture Topology Homology’, based on the SSAP algorithm
– FSSP, ‘Families of Structurally Similar Proteins’, based on the DALI algorithm
– PClass, ‘Protein Classification’, based on the LOCK and 3Dsearch algorithms
Structural classification, CATH
• Class, four types:
– Mainly α
– α/β structures
– Mainly β
– No secondary structure
• Architecture (fold)
• Topology (superfamily)
• Homology (family)
Structural classification..
• Two types of algorithms
– Inter-molecular, 3D, rigid body: structural alignment in a common coordinate system (hard), e.g. the VAST and LOCK algorithms
– Intra-molecular, 2D, internal geometry: structural alignment using internal distances and angles, e.g. the DALI, STRUCTURAL and SSAP algorithms
Structural classification, SSAP
• SSAP, ‘Sequential Structure Alignment Program’
Basic idea: the similarity between residue i in molecule A and residue k in molecule B is characterised in terms of their structural surroundings.
This similarity can be quantified as a score, $S_{i,k}$.
Based on this similarity score and a specified gap penalty, dynamic programming is used to find the optimal structural alignment.
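The dynamic programming step can be sketched as a generic global alignment over a residue-residue score matrix. This is only an illustration, not the SSAP implementation: the function name `align`, the linear gap penalty and the toy score matrix in the usage note are assumptions made for the example.

```python
def align(score, gap):
    """Global alignment over a score matrix (Needleman-Wunsch style).

    score[i][k] plays the role of the similarity between residue i in A
    and residue k in B; gap is a linear gap penalty (an assumption here).
    Returns the best total score and the list of aligned index pairs.
    """
    n, m = len(score), len(score[0])
    # best[i][k]: best score for the first i residues of A vs the first k of B
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for k in range(1, m + 1):
        best[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            best[i][k] = max(
                best[i - 1][k - 1] + score[i - 1][k - 1],  # align i with k
                best[i - 1][k] - gap,                      # gap in B
                best[i][k - 1] - gap,                      # gap in A
            )
    # Trace back from the corner to recover which residues were paired.
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if best[i][k] == best[i - 1][k - 1] + score[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif best[i][k] == best[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    pairs.reverse()
    return best[n][m], pairs
```

With a hypothetical 3 x 2 score matrix such as `[[5, 0], [0, 0], [0, 5]]` and `gap=1`, the middle residue of A is left unpaired and the two high-scoring pairs are aligned.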
Structural classification, SSAP
[Figure: the structural neighbourhood of residue i in A compared with that of residue k in B]
Structural classification, SSAP..
Distance between residues i and j in molecule A: $d^A_{i,j}$
Similarity for two pairs of residues, (i, j) in A and (k, l) in B:
$$s_{ij,kl} = \frac{a}{|d^A_{i,j} - d^B_{k,l}| + b}, \qquad a, b \text{ constants}$$
Similarity between residue i in A and residue k in B:
$$S_{i,k} = \sum_{m=-n}^{n} \frac{a}{|d^A_{i,i+m} - d^B_{k,k+m}| + b}$$
Idea: $S_{i,k}$ is big if the distances from residue i in A to its 2n nearest neighbours are similar to the corresponding distances around k in B
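A minimal sketch of this residue-residue score, assuming each molecule is given as a list of (x, y, z) coordinates with one point per residue. The function names, the default values of a and b, and the choice to skip neighbours that fall outside the chain are assumptions for illustration. The m = 0 term is kept, as in the sum above; it only adds the constant a/b.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def residue_similarity(coords_a, coords_b, i, k, n, a=1.0, b=1.0):
    """Similarity of residue i in A and residue k in B over m = -n..n."""
    total = 0.0
    for m in range(-n, n + 1):
        ia, kb = i + m, k + m
        # Assumption: neighbours beyond either chain end are simply skipped.
        if not (0 <= ia < len(coords_a) and 0 <= kb < len(coords_b)):
            continue
        d_a = distance(coords_a[i], coords_a[ia])
        d_b = distance(coords_b[k], coords_b[kb])
        total += a / (abs(d_a - d_b) + b)
    return total
```

Two identical chains give the maximum score of (2n + 1) * a / b for an interior residue; any difference in the local distances lowers it.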
Structural classification, SSAP..
This works well for small structures and local structural alignments. However, insertions and deletions cause problems: unrelated distances are compared.
A : HSERAHVFIM..   (i=5)
B : GQ-VMAC-NW..   (k=4)
- The real algorithm uses dynamic programming on two levels: first to find which distances to compare when computing $S_{i,k}$, then to align the structures using these scores
Experimental techniques for structure determination
• X-ray Crystallography
• Nuclear Magnetic Resonance spectroscopy (NMR)
• Electron Microscopy/Diffraction
• Free electron lasers ?
X-ray Crystallography
• From small molecules to viruses
• Information about the positions of individual atoms
• Limited information about dynamics
• Requires crystals
NMR
• Limited to molecules up to ~50 kDa (good quality up to 30 kDa)
• Distances between pairs of hydrogen atoms
• Lots of information about dynamics
• Requires soluble, non-aggregating material
• Assignment problem
Electron Microscopy / Diffraction
• Low to medium resolution
• Limited information about dynamics
• Can use very small crystals (nm range)
• Can be used for very large molecules and complexes
Structure Prediction
GPSRYIV…?
Protein Folding
• Different sequence → Different structure
• The free energy difference between the folded and unfolded states is small, due to the large entropy decrease on folding: $\Delta G = \Delta H - T\Delta S$
Structure Prediction
Why are structure prediction, and especially ab initio calculations, hard?
• Many degrees of freedom / residue
• Remote noncovalent interactions
• Nature does not go through all conformations
• Folding assisted by enzymes & chaperones
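The first and third points can be made concrete with a back-of-the-envelope estimate in the spirit of Levinthal's argument. All three numbers below (conformations per residue, chain length, sampling rate) are illustrative assumptions, not values from these slides.

```python
# Rough estimate of the time needed to enumerate every conformation.
CONFORMATIONS_PER_RESIDUE = 3   # assumption: a few backbone states per residue
N_RESIDUES = 100                # assumption: a small protein
SAMPLES_PER_SECOND = 1e13       # assumption: one conformation per 0.1 ps
SECONDS_PER_YEAR = 3.15e7       # approximate


def exhaustive_search_years(states, n_residues, rate):
    """Years needed to visit every conformation once at the given rate."""
    total_conformations = states ** n_residues
    return total_conformations / rate / SECONDS_PER_YEAR


years = exhaustive_search_years(
    CONFORMATIONS_PER_RESIDUE, N_RESIDUES, SAMPLES_PER_SECOND
)
print(f"{years:.1e} years")
```

Under these assumptions the search would take far longer than the age of the universe, whereas real proteins fold in milliseconds to seconds, so folding cannot be an exhaustive search.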
Molecular dynamics
Ab initio calculations are used for smaller problems:
• Calculation of affinity
• Enzymatic pathways
Sequence Classification rev.
• Class : Secondary structure content
• Fold : Major structural similarity.
• Superfamily : Probable common evolutionary origin.
• Family : Clear evolutionary relationship.
Structure Prediction
• Search sequence data banks for homologs
• Search methods, e.g. BLAST, PSI-BLAST, FASTA…
• Homolog in PDB?
IVTY…PGGG HYW…QHG
Structure Prediction
Multiple sequence / structure alignment
• Contains more information than a single sequence for applications like homology modeling and secondary structure prediction
• Gives the location of conserved parts and of residues likely to be buried in the protein core or exposed to solvent
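As a rough illustration of how a multiple alignment exposes conserved positions, the sketch below scores each column by the fraction of sequences that share its most common residue. The toy alignment and the scoring rule are assumptions for illustration; real tools use more refined conservation measures.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue, per column.

    alignment: list of equal-length aligned sequences, with '-' for gaps.
    Gaps count against conservation because we divide by the number of
    sequences rather than by the number of non-gap residues.
    """
    n_seqs = len(alignment)
    conservation = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            conservation.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        conservation.append(most_common_count / n_seqs)
    return conservation


# Hypothetical toy alignment with a fully conserved three-residue motif.
toy_alignment = ["HFDGA", "HFDAV", "HFD-L"]
scores = column_conservation(toy_alignment)
```

In this toy case the first three columns score 1.0 and the last two score 1/3, which is the kind of signal used to locate conserved parts of a protein family.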
Multiple alignment example
[Figure: multiple alignment with the HFD fingerprint marked]
Secondary Structure Prediction
• Statistical analysis (old-fashioned):
– For each amino acid type, assign its ‘propensity’ to be in a helix, sheet, or coil.
• Limited accuracy, ~55-60%.
• Random prediction, ~38%.
MTLLALGINHKTAP...
CCEEEEEECCCCCC...
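Accuracy figures like these are per-residue percentages: the fraction of residues whose predicted state (helix, strand or coil) matches the observed one, a measure often called Q3. A minimal sketch follows; the function name and the two state strings are hypothetical and chosen only to show the calculation.

```python
def three_state_accuracy(observed, predicted):
    """Fraction of positions where the predicted state equals the observed one.

    Both arguments are strings over the states H (helix), E (strand), C (coil).
    """
    if len(observed) != len(predicted):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


# Hypothetical observed and predicted states for a 14-residue stretch.
observed = "CCEEEEEECCCCCC"
predicted = "CCCEEEEECCCCHH"
accuracy = three_state_accuracy(observed, predicted)  # 11 of 14 positions agree
```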
The Chou & Fasman Method
• Each residue is classified as:
– Hα/Hβ, strong helix / strand former.
– hα/hβ, weak helix / strand former.
– I, indifferent.
– bα/bβ, weak helix / strand breaker.
– Bα/Bβ, strong helix / strand breaker.
The Chou & Fasman Method..
• Score each residue:
– Helix: Hα/hα = 1, I = 0 or ½, Bα/bα = -1.
– Strand: Hβ/hβ = 1, I = 0 or ½, Bβ/bβ = -1.
• Helix nucleation:
– Score > 4 in a “window” of 6 residues.
• Strand nucleation:
– Score > 3 in a “window” of 5 residues.
• Propagate until score < 1 in a 4 residue “window”.
The Chou & Fasman Method..
Sequence:                           G  P  S  R  Y  I  V  T  L  A  N  G  K
Helix score:                       -1 -1  0  0 -1  1  1  0  1  1 -1 -1  1
Strand score:                      -1 -1 -1 .5  1  1  1  1  1  0  0 -1 -1
Nucleation, helix (window of 6):   -2 0 1 2 3 3 1   No nucl.
Nucleation, strand (window of 5):  -1.5 .5 2.5 4.5 5 4 3 1 -1
Propagate, strand (window of 4):   -2.5 -.5 1.5 … 3 1 -1
Result: GPSRYIVTLANGK
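The nucleation sums in this example can be reproduced with a sliding window. The sketch below uses the per-residue scores listed for GPSRYIVTLANGK; the function names are made up for illustration, and only the nucleation step is covered, not propagation.

```python
def window_sums(scores, width):
    """Sum of the scores in every window of the given width."""
    return [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]


def nucleation_sites(scores, width, threshold):
    """Start indices (0-based) of windows whose summed score exceeds threshold."""
    return [i for i, s in enumerate(window_sums(scores, width)) if s > threshold]


# Per-residue scores for GPSRYIVTLANGK, taken from the worked example.
helix_scores = [-1, -1, 0, 0, -1, 1, 1, 0, 1, 1, -1, -1, 1]
strand_scores = [-1, -1, -1, 0.5, 1, 1, 1, 1, 1, 0, 0, -1, -1]

helix_windows = window_sums(helix_scores, 6)           # never exceeds 4
strand_windows = window_sums(strand_scores, 5)         # peaks at 5
helix_sites = nucleation_sites(helix_scores, 6, 4)     # no helix nucleation
strand_sites = nucleation_sites(strand_scores, 5, 3)   # windows starting at R, Y, I
```

The helix sums never exceed 4, so no helix nucleates, while three consecutive strand windows exceed 3, which matches the "No nucl." and strand rows of the example.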
Modern methods
• Neural networks (e.g. the PHD server):
– Input: a number of protein sequences + secondary structure.
– Output: a trained network that predicts secondary structure elements with ~70% accuracy.
• Use many different methods and compare (e.g. the JPred server)!
Summary
• The function of a protein is governed by its structure
• Different sequence → Different structure
• PDB, protein data bank
• Secondary structure prediction is hard, tertiary structure prediction is even harder
• Use homologs whenever possible or different methods to assess quality