Protein Structure, Structure Classification and Prediction. Bioinformatics X3, January 2005. P. Johansson, D. Madsen, Dept. of Cell & Molecular Biology, Uppsala University


Protein Structure, Structure Classification and Prediction

Bioinformatics X3

January 2005

P. Johansson, D. Madsen

Dept. of Cell & Molecular Biology,

Uppsala University


Overview

• Introduction to proteins, structure & classification

• Protein Folding

• Experimental techniques for structure determination

• Structure prediction


Proteins

• Proteins play a crucial role in virtually all biological processes with a broad range of functions.

• The activity of an enzyme or the function of a protein is governed by the three-dimensional structure


20 amino acids - the building blocks


The Amino Acids


Hydrophilic or hydrophobic..?

• Virtually all soluble proteins feature a hydrophobic core surrounded by a hydrophilic surface

• But the peptide backbone is inherently polar, so how can it be buried in the core?

• Solution: neutralize potential hydrogen-bond donors and acceptors using ordered secondary structure


Secondary Structure: α-helix


Secondary Structure: α-helix

• 3.6 residues / turn

• Axial dipole moment

• Rarely contains proline or glycine

• Protein surfaces
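The 3.6 residues per turn figure fixes the geometry of the helix, and a short calculation makes that concrete. The sketch below assumes the textbook rise of 1.5 Å per residue, which is not stated on the slide, and the helper name `wheel_angle` is made up for illustration.

```python
from fractions import Fraction

RESIDUES_PER_TURN = Fraction(18, 5)  # 3.6 residues per turn, from the slide
RISE_PER_RESIDUE = Fraction(3, 2)    # 1.5 Å, assumed textbook value (not on the slide)

# Each residue advances the helix by 360 / 3.6 = 100 degrees.
DEGREES_PER_RESIDUE = 360 / RESIDUES_PER_TURN
# One full turn advances 3.6 * 1.5 = 5.4 Å along the helix axis.
PITCH = RESIDUES_PER_TURN * RISE_PER_RESIDUE


def wheel_angle(offset):
    """Angular position (degrees) of residue i + offset relative to residue i."""
    return (offset * DEGREES_PER_RESIDUE) % 360


for offset in (0, 3, 4, 7, 18):
    print(offset, float(wheel_angle(offset)))
```

Residues at offsets 3, 4 and 7 land within 60 degrees of the starting residue, which is why side chains spaced this way end up on the same face of the helix. The pattern repeats exactly every 18 residues (five turns).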


Secondary Structure: β-sheets


Secondary Structure: β-sheets

• Parallel or antiparallel

• Alternating side-chains

• No mixing

• Loops often have polar amino acids


Structural classification

• Databases

– SCOP, ’Structural Classification of Proteins’, manual classification

– CATH, ’Class Architecture Topology Homology’, based on the SSAP algorithm

– FSSP, ’Families of Structurally Similar Proteins’, based on the DALI algorithm

– PClass, ’Protein Classification’, based on the LOCK and 3Dsearch algorithms


Structural classification, CATH

• Class, four types:

– Mainly α

– α/β structures

– Mainly β

– No secondary structure

• Architecture (fold)

• Topology (superfamily)

• Homology (family)


Structural classification..


Structural classification..

• Two types of algorithms

– Inter-molecular, 3D, rigid body: structural alignment in a common coordinate system (hard), e.g. the VAST and LOCK algorithms

– Intra-molecular, 2D, internal geometry: structural alignment using internal distances and angles, e.g. the DALI, STRUCTURAL and SSAP algorithms
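One reason internal-geometry methods avoid the "hard" superposition step is that a matrix of intra-molecular distances does not change when the molecule is rotated or translated. The sketch below illustrates this with a handful of made-up coordinates; the function names are hypothetical and are not taken from DALI or SSAP.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points of one structure (the 2D view)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def rotate_z_and_shift(coords, angle, shift):
    """Apply a rigid-body move: rotation about the z axis, then a translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Toy coordinates (illustrative only, not a real protein).
structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.0, 2.2)]
moved = rotate_z_and_shift(structure, angle=1.1, shift=(10.0, -4.0, 7.5))

original_distances = distance_matrix(structure)
moved_distances = distance_matrix(moved)

# The coordinates differ, but every internal distance is preserved.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(original_distances, moved_distances)
    for a, b in zip(row_a, row_b)
)
print(same)
```

Two structures can therefore be compared through their distance matrices without first placing them in a common coordinate system.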


Structural classification, SSAP

• SSAP, ‘Sequential Structure Alignment Program’

Basic idea: the similarity between residue i in molecule A and residue k in molecule B is characterised in terms of their structural surroundings.

This similarity can be quantified as a score, S_ik.

Based on this similarity score and a specified gap penalty, dynamic programming is used to find the optimal structural alignment.
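The dynamic-programming step can be sketched as a global alignment over a precomputed matrix of residue scores. This is a generic Needleman-Wunsch style recursion with a linear gap penalty, shown as an assumption about the general approach rather than the actual SSAP implementation; the function name `align` and the toy score matrix are made up.

```python
def align(score, gap):
    """Global alignment over score[i][k] with a linear gap penalty.

    Returns the best total score and the list of aligned (i, k) index pairs.
    """
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for k in range(1, m + 1):
        best[0][k] = -gap * k

    # Fill: each cell extends a match, or opens a gap in one molecule.
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            best[i][k] = max(
                best[i - 1][k - 1] + score[i - 1][k - 1],
                best[i - 1][k] - gap,
                best[i][k - 1] - gap,
            )

    # Traceback from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if best[i][k] == best[i - 1][k - 1] + score[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif best[i][k] == best[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    pairs.reverse()
    return best[n][m], pairs


# Toy scores: molecule A has 3 residues, molecule B has 2.
toy_score = [[5, 0], [0, 0], [0, 5]]
total, pairs = align(toy_score, gap=1)
print(total, pairs)
```

In the toy case the middle residue of A is skipped at the cost of one gap so that both high-scoring pairs can be kept.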


Structural classification, SSAP

The structural neighborhood of residue i in A compared to residue k in B


Structural classification, SSAP..

Distance between residues i and j in molecule A: $d^A_{i,j}$

Similarity for two pairs of residues, i, j in A and k, l in B:

$$s_{ij,kl} = \frac{a}{\left| d^A_{i,j} - d^B_{k,l} \right| + b}, \qquad a, b \text{ constants}$$

Similarity between residue i in A and residue k in B:

$$S_{i,k} = \sum_{m=-n}^{n} \frac{a}{\left| d^A_{i,i+m} - d^B_{k,k+m} \right| + b}$$

Idea: $S_{i,k}$ is big if the distances from residue i in A to the 2n nearest neighbours are similar to the corresponding distances around k in B
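The residue-level similarity described above can be written out directly. The sketch below is a minimal reading of that score, assuming one 3D point per residue, 0-based indices, absolute distance differences, and a = b = 1 by default; neighbours that fall outside either chain are simply skipped, which is an assumption the slide does not address.

```python
import math


def residue_similarity(coords_a, coords_b, i, k, n, a=1.0, b=1.0):
    """S_ik: compare distances from i (in A) and k (in B) to sequence neighbours.

    The sum runs over offsets m = -n..n; the m = 0 term always adds a / b.
    """
    total = 0.0
    for m in range(-n, n + 1):
        ia, kb = i + m, k + m
        if not (0 <= ia < len(coords_a) and 0 <= kb < len(coords_b)):
            continue  # neighbour missing at a chain end (assumed handling)
        d_a = math.dist(coords_a[i], coords_a[ia])
        d_b = math.dist(coords_b[k], coords_b[kb])
        total += a / (abs(d_a - d_b) + b)
    return total


# Toy chains (illustrative only): points on a line, spaced 1 and 2 units apart.
line = [(float(x), 0.0, 0.0) for x in range(7)]
stretched = [(2.0 * x, 0.0, 0.0) for x in range(7)]

identical = residue_similarity(line, line, 3, 3, n=2)
different = residue_similarity(line, stretched, 3, 3, n=2)
print(identical, different)
```

Identical surroundings give the maximum value of (2n + 1) · a / b, and the score drops as the local distances diverge.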


Structural classification, SSAP..

This works well for small structures and local structural alignments. However, insertions and deletions cause problems, because unrelated distances are then compared.

A : HSERAHVFIM..   (i=5)

B : GQ-VMAC-NW..   (k=4)

The real algorithm uses dynamic programming on two levels: first to find which distances to compare when computing S_ik, then to align the structures using these scores


Experimental techniques for structure determination

• X-ray Crystallography

• Nuclear Magnetic Resonance spectroscopy (NMR)

• Electron Microscopy/Diffraction

• Free electron lasers ?


X-ray Crystallography


X-ray Crystallography..

• From small molecules to viruses

• Information about the positions of individual atoms

• Limited information about dynamics

• Requires crystals


NMR

• Limited to molecules up to ~50kDa (good quality up to 30 kDa)

• Distances between pairs of hydrogen atoms

• Lots of information about dynamics

• Requires soluble, non-aggregating material

• Assignment problem


Electron Microscopy / Diffraction

• Low to medium resolution

• Limited information about dynamics

• Can use very small crystals (nm range)

• Can be used for very large molecules and complexes


Structure Prediction

GPSRYIV…?


Protein Folding

• Different sequence → different structure

• The free energy difference between the folded and unfolded states is small, because the large entropy decrease on folding offsets most of the enthalpy gain:

ΔG = ΔH − TΔS
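A small worked example shows how a large favourable enthalpy change and a large entropy loss can nearly cancel in the free-energy relation on this slide. The numbers below are illustrative placeholders chosen for easy arithmetic, not measured values for any protein, and the function name is made up.

```python
def gibbs_free_energy_change(delta_h, temperature, delta_s):
    """Return dG = dH - T * dS (units must be consistent)."""
    return delta_h - temperature * delta_s


# Illustrative values only (assumed, not experimental data):
delta_h = -200.0   # kJ/mol, enthalpy change on folding
delta_s = -0.6     # kJ/(mol K), entropy change on folding
temperature = 300  # K

delta_g = gibbs_free_energy_change(delta_h, temperature, delta_s)
print(f"dG = {delta_g:.1f} kJ/mol")
```

With these placeholder values the −TΔS term contributes +180 kJ/mol, leaving a net ΔG of about −20 kJ/mol: a small difference between two large opposing terms.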


Structure Prediction

Why are structure prediction, and especially ab initio calculations, hard..?

• Many degrees of freedom / residue

• Remote noncovalent interactions

• Nature does not go through all conformations

• Folding assisted by enzymes & chaperones
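The point that nature does not go through all conformations can be made quantitative with the classic back-of-the-envelope estimate. The sketch below assumes 3 conformations per residue, a 100-residue chain, and 10^13 conformations sampled per second; all three numbers are conventional illustrative assumptions, not values from the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Count all chain conformations and the time needed to visit each once."""
    conformations = states_per_residue ** n_residues
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return conformations, years


conformations, years = exhaustive_search_years(100)
print(f"{conformations:.2e} conformations, about {years:.1e} years")
```

Under these assumptions an exhaustive search would take on the order of 10^27 years, so real folding must follow a directed pathway, and a prediction method cannot rely on brute-force enumeration either.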


Molecular dynamics

Ab initio calculations are used for smaller problems:

• Calculation of affinity

• Enzymatic pathways


Sequence Classification rev.

• Class : Secondary structure content

• Fold : Major structural similarity.

• Superfamily : Probable common evolutionary origin.

• Family : Clear evolutionary relationship.


Structure Prediction

• Search sequence data banks for homologs

• Search methods, e.g. BLAST, PSI-BLAST, FASTA…

• Homologue in PDB..?

IVTY…PGGG HYW…QHG


Structure Prediction

Multiple sequence / structure alignment

• Contains more information than a single sequence for applications like homology modeling and secondary structure prediction

• Gives location of conserved parts and residues likely to be buried in the protein core or exposed to solvent
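Locating conserved positions in a multiple alignment can be sketched as a per-column count. The example below scores each column by the fraction of sequences sharing the most common symbol; the three aligned sequences are invented for illustration, and treating a gap as just another symbol is a simplifying assumption.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common symbol, per column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    conservation = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        conservation.append(most_common_count / len(column))
    return conservation


# Toy alignment (made up): three fully conserved columns, then variable ones.
toy_alignment = ["HFDGA", "HFDAA", "HFD-S"]
scores = column_conservation(toy_alignment)
print(scores)
```

Columns with a score of 1.0 are fully conserved and are candidates for functionally or structurally important residues; real tools use more refined measures that account for amino acid similarity.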


Multiple alignment example

[Figure: multiple alignment; label: HFD fingerprint]


Secondary Structure Prediction

• Statistical analysis (old-fashioned):

– For each amino acid type, assign its ‘propensity’ to be in a helix, sheet, or coil.

• Limited accuracy, ~55-60%.

• Random prediction, ~38%.

MTLLALGINHKTAP...

CCEEEEEECCCCCC...


The Chou & Fasman Method

• Each residue is classified as:

– Hα/Hβ, strong helix / strand former.

– hα/hβ, weak helix / strand former.

– I, indifferent.

– bα/bβ, weak helix / strand breaker.

– Bα/Bβ, strong helix / strand breaker.


The Chou & Fasman Method..

• Score each residue:

– H/h = 1, I = 0 or ½, B/b = -1.

• Helix nucleation:

– Score > 4 in a “window” of 6 residues.

• Strand nucleation:

– Score > 3 in a “window” of 5 residues.

• Propagate until score < 1 in a 4 residue “window”.


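The Chou & Fasman Method..

Sequence: G P S R Y I V T L A N G K

Helix scores: -1 -1 0 0 -1 1 1 0 1 1 -1 -1 1

Strand scores: -1 -1 -1 .5 1 1 1 1 1 0 0 -1 -1

Nucleation, helix (window of 6): -2 0 1 2 3 3 1 (no nucleation)

Nucleation, strand (window of 5): -1.5 .5 2.5 4.5 5 4 3 1 -1

Propagate, strand (window of 4): -2.5 -.5 1.5 … 3 1 -1

Result: GPSRYIVTLANGK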
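The nucleation step of the worked example can be reproduced with a sliding-window sum. The sketch below uses the per-residue helix and strand scores and the thresholds given on these slides; the function names are hypothetical, and only the nucleation test is covered, not propagation.

```python
def window_sums(scores, width):
    """Sum of scores in every window of the given width, left to right."""
    return [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]


def nucleation_sites(scores, width, threshold):
    """Start indices (0-based) of windows whose summed score exceeds the threshold."""
    return [i for i, s in enumerate(window_sums(scores, width)) if s > threshold]


sequence = "GPSRYIVTLANGK"
helix_scores = [-1, -1, 0, 0, -1, 1, 1, 0, 1, 1, -1, -1, 1]
strand_scores = [-1, -1, -1, 0.5, 1, 1, 1, 1, 1, 0, 0, -1, -1]

# Helix: score > 4 in a window of 6. Strand: score > 3 in a window of 5.
helix_sites = nucleation_sites(helix_scores, width=6, threshold=4)
strand_sites = nucleation_sites(strand_scores, width=5, threshold=3)

print("helix nucleation windows:", helix_sites)
print("strand nucleation windows:", [sequence[i:i + 5] for i in strand_sites])
```

No helix window exceeds the threshold, while three overlapping strand windows do, which agrees with the "no nucleation" note for the helix row in the example.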


Modern methods

• Neural networks (e.g. the PHD server):

– Input: a number of protein sequences + secondary structure.

– Output: a trained network that predicts secondary structure elements with ~70% accuracy.

• Use many different methods and compare (e.g. the JPred server)!


Summary

• The function of a protein is governed by its structure

• Different sequence → different structure

• PDB, protein data bank

• Secondary structure prediction is hard, tertiary structure prediction is even harder

• Use homologs whenever possible or different methods to assess quality
