Macromolecular structure

Bioinformatics

Contents

Determination of protein structure
Structure databases
Secondary structure elements (SSE)
Tertiary structure
Structure analysis
  Structure alignment
  Domain recognition
Structure prediction
  Homology modelling
  Threading/fold recognition
  Secondary structure prediction
  ab initio prediction

Jacques van Helden (jvanheld@ucmb.ulb.ac.be)

Determination of protein structure

Structure

Crystallisation

Hanging drop method / vapour diffusion method

[Figure: hanging-drop setup. Labels: crystal; microscope slide; 1 - dilute protein solution; 2 - concentrated salt solution; microscope.]

Many different conditions of 1 and 2 must be tried.

Slide courtesy from Shoshana Wodak

[Figure: diffraction pattern; atomic model]

Determination of protein structure

Slide courtesy from Shoshana Wodak

A high-resolution protein structure: 1.5 - 2.0 Å resolution

The resolution problem

Slide courtesy from Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Interatomic forces

Covalent interactions
Hydrogen bonds
Hydrophobic/hydrophilic interactions
Ionic interactions
van der Waals force
Repulsive forces

Structure databases

PDB (Protein Data Bank)
  Official structure repository.
SCOP (Structural Classification Of Proteins)
  Structure classification. The top level reflects structural classes. The second level, called Fold, includes topological and similarity criteria.
CATH (Class, Architecture, Topology and Homologous superfamily)

PDB entry header

HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2

COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA 1D66 3

SOURCE (SACCHAROMYCES $CEREVISIAE) OVEREXPRESSED IN (ESCHERICHIA 1D66 4

SOURCE 2 $COLI) 1D66 5

AUTHOR R.MARMORSTEIN,S.HARRISON 1D66 6

REVDAT 1 15-APR-93 1D66 0 1D66 7

JRNL AUTH R.MARMORSTEIN,M.CAREY,M.PTASHNE,S.C.HARRISON 1D66 8

JRNL TITL /DNA$ RECOGNITION BY /GAL4$: STRUCTURE OF A 1D66 9

JRNL TITL 2 PROTEIN(SLASH)/DNA$ COMPLEX 1D66 10

JRNL REF NATURE V. 356 408 1992 1D66 11

JRNL REFN ASTM NATUAS UK ISSN 0028-0836 006 1D66 12

REMARK 1 1D66 13

REMARK 2 1D66 14

REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15

REMARK 3 1D66 16

REMARK 3 REFINEMENT. 1D66 17

REMARK 3 PROGRAM CORELS;TNT;XPLOR 1D66 18

REMARK 3 AUTHORS J.SUSSMAN;D.TRONRUD;A.BRUNGER 1D66 19

REMARK 3 R VALUE 0.230 1D66 20

REMARK 3 RMSD BOND DISTANCES 0.015 ANGSTROMS 1D66 21

REMARK 3 RMSD BOND ANGLES 2.9 DEGREES 1D66 22

REMARK 4 1D66 23

REMARK 4 THERE ARE TWO DNA CHAINS WHICH HAVE BEEN ASSIGNED CHAIN 1D66 24

REMARK 4 INDICATORS *D* AND *E*. THERE ARE TWO PROTEIN CHAINS 1D66 25

REMARK 4 WHICH HAVE BEEN ASSIGNED CHAIN INDICATORS *A* AND *B*. 1D66 26

REMARK 4 EACH PROTEIN - DNA COMPLEX CONTAINS FOUR BOUND CD IONS. 1D66 27

...
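As a rough illustration of how such a header can be read by a program, the sketch below groups a few of the records shown above by record name and extracts the resolution from the REMARK 2 line. It works on the whitespace-separated text as printed on the slide; real PDB files use fixed column positions, so a production parser would slice columns instead. The function names are hypothetical.

```python
import re
from collections import defaultdict

# A few records copied from the 1D66 header above (whitespace as printed on the slide).
HEADER_LINES = [
    "HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2",
    "COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA 1D66 3",
    "AUTHOR R.MARMORSTEIN,S.HARRISON 1D66 6",
    "REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15",
    "REMARK 3 R VALUE 0.230 1D66 20",
]


def group_by_record(lines):
    """Group header lines by record name (the first word of each line)."""
    records = defaultdict(list)
    for line in lines:
        name, _, rest = line.partition(" ")
        records[name].append(rest.strip())
    return dict(records)


def resolution_angstroms(lines):
    """Return the resolution given in the REMARK 2 record, or None if absent."""
    pattern = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS")
    for line in lines:
        match = pattern.match(line)
        if match:
            return float(match.group(1))
    return None


records = group_by_record(HEADER_LINES)
print(sorted(records))                      # record names found in the excerpt
print(resolution_angstroms(HEADER_LINES))   # resolution in Angstroms
```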

CATH - A protein domain classification

In CATH, protein domains are classified according to a hierarchical tree with 4 levels:
Class
Architecture
Topology
Homology

[Figure: illustration of the Class, Architecture and Topology levels. Figure from Shoshana Wodak]
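The four-level hierarchy can be handled as a simple data structure. The sketch below assumes that a domain is labelled with a dotted four-number code, one number per level (Class.Architecture.Topology.Homology); the example codes are illustrative only and the function names are hypothetical.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homology")


def parse_cath_code(code):
    """Split a dotted code such as '3.30.70.270' into four integers."""
    parts = tuple(int(part) for part in code.split("."))
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {len(parts)}")
    return parts


def deepest_shared_level(code_a, code_b):
    """Name of the deepest level at which two domains are classified together.

    Returns None when the two domains already differ at the Class level.
    """
    shared = None
    for level, a, b in zip(LEVELS, parse_cath_code(code_a), parse_cath_code(code_b)):
        if a != b:
            break
        shared = level
    return shared


# Two illustrative codes that agree on the first three levels only.
print(deepest_shared_level("3.30.70.270", "3.30.70.100"))
```

Walking down the levels in order mirrors the tree: two domains can only share a Topology if they already share Class and Architecture.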

CATH: structural classification of proteins [http://www.biochem.ucl.ac.uk/bsm/cath/]
SCOP: structural classification of proteins [http://scop.mrc-lmb.cam.ac.uk/scop/]
FSSP: fold classification based on structure alignments [http://www.sander.ebi.ac.uk/fssp/]
HSSP: homology derived secondary structure assignments [http://www.sander.ebi.ac.uk/hssp/]
DALI: classification of protein domains [http://www.ebi.ac.uk/dali/domain/]
VAST: structural neighbours by direct 3D structure comparison [http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml]
CE: structure comparisons by Combinatorial Extension [http://cl.sdsc.edu/ce.html]

Classifications of protein structures (domains)

Slide courtesy from Shoshana Wodak

Books

Branden, C. & Tooze, J. (1991). Introduction to Protein Structure. 1st ed. Garland Publishing Inc., New York and London.

Westhead, D.R., Parish, J.H. & Twyman, R.M. (2002). Bioinformatics. BIOS Scientific Publishers, Oxford.

Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. 1st ed. Cold Spring Harbor Laboratory Press, New York.

Gibas, C. & Jambeck, P. (2001). Developing Bioinformatics Computer Skills. O'Reilly.

Secondary structure elements

Secondary structure - α-helix

Source: Branden & Tooze (1991)

[Figure: α-helix with 3.6 residues per turn, showing a hydrogen bond. Atom legend: carbon, nitrogen, oxygen.]

Hydrophobicity of side-chain residues in helices

Source: Branden & Tooze (1999)

Blue: polar
Red: basic or acidic
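With 3.6 residues per turn, successive residues of an α-helix are offset by 360/3.6 = 100 degrees around the helix axis, which is what a helical wheel plot displays. The sketch below computes these angular positions and a crude measure of how strongly hydrophobic residues cluster on one face. The binary hydrophobic set and the function names are assumptions made for illustration, not a published hydrophobicity scale.

```python
import math

DEGREES_PER_RESIDUE = 100.0     # 360 degrees / 3.6 residues per turn
HYDROPHOBIC = set("AVLIMFWC")   # assumption: crude binary classification


def wheel_angles(sequence):
    """Angular position (degrees, 0 to 360) of each residue on a helical wheel."""
    return [(i * DEGREES_PER_RESIDUE) % 360.0 for i in range(len(sequence))]


def hydrophobic_moment(sequence):
    """Length of the summed unit vectors of hydrophobic residues, per residue.

    Values near 0 mean hydrophobic residues are absent or spread evenly around
    the helix; larger values mean they point in a common direction.
    """
    if not sequence:
        return 0.0
    x = y = 0.0
    for residue, angle in zip(sequence, wheel_angles(sequence)):
        if residue in HYDROPHOBIC:
            x += math.cos(math.radians(angle))
            y += math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


print(wheel_angles("AAAAA"))
print(hydrophobic_moment("LKKLLKKLLKKL"))
```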

Secondary structure - β sheets

[Figure: antiparallel and parallel β sheets]

Source: Branden & Tooze (1991)

Secondary structure - twist of β sheets

Mixed β sheet

Source: Branden & Tooze (1991)

Angles of rotation

Each dipeptide unit is characterized by two angles of rotation:
Phi (φ), around the N-Cα bond
Psi (ψ), around the Cα-C bond
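Both angles are dihedral (torsion) angles, each defined by four consecutive backbone atoms: phi by C(i-1), N(i), Cα(i), C(i) and psi by N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral angle from four 3D points with the standard atan2 formulation; the helper names are hypothetical and the coordinates in the usage line are toy values, not taken from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond (-180 to 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n2 = _cross(b2, b3)                       # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)              # dot product of the two plane normals
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Rotation around the N-Calpha bond of residue i."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Rotation around the Calpha-C bond of residue i."""
    return dihedral(n, ca, c, n_next)


# Toy coordinates: the fourth point is rotated a quarter turn from the first.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue of a structure gives the pairs of values that are plotted on a Ramachandran map.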

Image from Branden & Tooze (1999)

Dipeptide unit

The Ramachandran map

Slide courtesy from Shoshana Wodak

Dipeptide unit

Tertiary structure

Combinations of secondary structures

[Figure: retinol binding protein (PDB:1rpb), with a loop, an α-helix and a β-sheet labelled]

Analysis of structure

Question: is structure A similar to structure B?

[Figure: structure A and structure B]

Approach: structure alignments

Structure-structure alignment and comparison
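Once residues of the two structures have been put in correspondence, their similarity can be quantified. The sketch below uses one simple measure, the distance RMSD, which compares the intramolecular distance matrices of A and B and therefore needs no superposition. It assumes the residue correspondence is already known (the alignment itself is not computed here) and that each structure is given as a list of Cα coordinates; the function name is hypothetical, and this is a stand-in for illustration rather than the method of any particular alignment program.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Root mean square difference between corresponding pairwise distances.

    coords_a[i] and coords_b[i] must be equivalent residues. The result is 0
    when B is a rotated and translated copy of A.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("at least two residues are required")
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: B is A rotated by 90 degrees around z and shifted by (5, 5, 5).
a = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
b = [(5, 5, 5), (5, 6, 5), (3, 5, 5)]
print(distance_rmsd(a, b))
```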

Slide courtesy from Shoshana Wodak

[Figure: open form and closed form]

Citrate synthase, ligand-induced conformational changes

Domain motion and small structural distortions

Analyzing conformational changes

Slide courtesy from Shoshana Wodak

Defining Domains: What for?

Link between domain structure and function

Different structural domains can be associated with different functions

Enzyme active sites are often at domain interfaces; domain movements play a functional role

[Figure: Cathepsin D and DNA Methyltransferase]

Slide courtesy from Shoshana Wodak

[Figure: polypeptide chains with N and C termini marked, split into domains with 1 cut, 2 cuts and 4 cuts]

Slide courtesy from Shoshana Wodak

Methods for Identifying Domains

Underlying principle

Domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.
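A minimal sketch of this principle for the simplest case, a single cut of the chain into two contiguous segments: build a contact map from Cα coordinates, then pick the cut position with the fewest contacts between the two sides. The 8 Å contact cutoff, the minimum domain size and the function names are assumptions chosen for illustration; domains that need 2 or 4 cuts are not handled.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True when two different residues lie within the cutoff."""
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


def best_single_cut(contacts, min_size=2):
    """Return (cut, contacts_between) minimising contacts across one cut.

    Residues 0..cut-1 form the first group and cut..n-1 the second. Returns
    None when the chain is too short to give two groups of min_size residues.
    """
    n = len(contacts)
    best = None
    for cut in range(min_size, n - min_size + 1):
        between = sum(contacts[i][j] for i in range(cut) for j in range(cut, n))
        if best is None or between < best[1]:
            best = (cut, between)
    return best


# Toy chain: two compact groups of four residues, far apart from each other.
toy_coords = [(float(x), 0.0, 0.0) for x in (0, 1, 2, 3, 100, 101, 102, 103)]
print(best_single_cut(contact_map(toy_coords)))
```

The minimum size constraint prevents the trivial answer of cutting off one or two terminal residues, which always have few contacts with the rest of the chain.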

Domains From Contact Map

Lactate dehydrogenase

Slide courtesy from Shoshana Wodak

Structure prediction

Methods for structure prediction

Homology modelling
  Building a 3D model on the basis of the known structures of similar sequences
Threading
  Threading the sequence on all known protein structures, and testing the consistency
Secondary structure prediction
ab initio prediction of tertiary structure
  For proteins of normal size, it is almost impossible to predict structures ab initio.
  Some results have been obtained in the prediction of oligopeptide structures.
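The threading idea listed above can be illustrated with a deliberately simplified toy: each known structure is reduced to a string of per-position environments (B for buried, E for exposed), and a sequence is scored by how well hydrophobic residues fall on buried positions and polar residues on exposed ones. Everything here is an assumption made for illustration: the binary hydrophobic set, the +1/-1 scoring, the ungapped equal-length comparison and the function names. Real threading programs use empirical potentials and allow gaps.

```python
HYDROPHOBIC = set("AVLIMFWC")   # assumption: crude binary classification


def position_score(residue, environment):
    """+1 if the residue type suits the environment ('B' or 'E'), else -1."""
    buried = environment == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread_score(sequence, environments):
    """Consistency score of a sequence placed on one template, without gaps."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and template must have the same length")
    return sum(position_score(r, e) for r, e in zip(sequence, environments))


def best_template(sequence, templates):
    """Name of the highest-scoring template of matching length, or None."""
    scores = {
        name: thread_score(sequence, envs)
        for name, envs in templates.items()
        if len(envs) == len(sequence)
    }
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical templates described only by buried/exposed patterns.
toy_templates = {"fold1": "BEBE", "fold2": "EBEB"}
print(best_template("LKLK", toy_templates))
```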

Homology modelling - steps

Similarity search
Modelling of backbone
  Secondary structure elements
  Loops
Modelling of side chains
Refinement of the model
Verification
  Steric compatibility of the residues

Homology modelling - similarity search

Starting from a query sequence, search for similar sequences with known structure.
  Search for similar sequences in a database of protein structures.
  Multiple alignment.
  A weight can be assigned to each matching protein (higher score to more similar proteins).
The higher the sequence similarity, the more accurate the predicted structure.
  When structures are available for proteins with >70% similarity to the query, a good model can be expected.
  When the similarity is <40%, homology modelling gives poor results.
The lack of available structures constitutes one of the main limitations to homology modelling.
  In 2004, PDB contains
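The thresholds and the weighting of templates described on this slide can be sketched as follows. Percent identity over aligned positions is used as a simple stand-in for the similarity measure, which the slide does not define; the "intermediate" label for the 40-70% range, the proportional weighting and the function names are assumptions.

```python
def percent_identity(aligned_query, aligned_template):
    """Percentage of identical residues over the aligned (non-gap) columns."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue                # skip columns containing a gap
        compared += 1
        matches += q == t
    return 100.0 * matches / compared if compared else 0.0


def expected_quality(similarity_percent):
    """Map a similarity percentage to the expectation stated on the slide."""
    if similarity_percent > 70:
        return "good model expected"
    if similarity_percent < 40:
        return "poor results expected"
    return "intermediate"           # assumption: the slide gives no statement here


def template_weights(similarities):
    """Weights proportional to similarity, summing to 1 (more similar, higher)."""
    total = sum(similarities.values())
    if total == 0:
        raise ValueError("at least one template must have non-zero similarity")
    return {name: value / total for name, value in similarities.items()}


identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, expected_quality(identity))
print(template_weights({"template1": 80.0, "template2": 20.0}))
```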

Homology modelling - Backbone modelling

Modelling of secondary structure elements
  α-helices
  β-sheets
  For each secondary structure element of the template, align the backbone of query and template.
Loop modelling
  Databases of loop regions
  Loop main chain depends on the number of amino acids and on the neighbouring elements (α-α, α-β, β-α, β-β)

Homology modelling - Side chain modelling

Side-chain conformation (model building and energy refinement)
  Conserved side chains take the same coordinates as in the template.
  For non-conserved side chains, use rotamer libraries to determine the most favourable conformation.
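A toy sketch of the rotamer selection step: each candidate rotamer carries a library probability and the coordinates its side-chain atoms would occupy, and the most probable rotamer that does not clash with surrounding atoms is kept. This criterion is a simplification standing in for a proper energy evaluation; the rotamer names, probabilities, coordinates, the 3.0 Å clash distance and the function names are all hypothetical.

```python
import math


def has_clash(atoms, environment, min_distance=3.0):
    """True if any side-chain atom lies closer than min_distance to the environment."""
    return any(math.dist(a, e) < min_distance for a in atoms for e in environment)


def choose_rotamer(rotamers, environment, min_distance=3.0):
    """Return the name of the most probable clash-free rotamer, or None.

    rotamers maps a name to (probability, list of side-chain atom coordinates).
    """
    allowed = [
        (probability, name)
        for name, (probability, atoms) in rotamers.items()
        if not has_clash(atoms, environment, min_distance)
    ]
    if not allowed:
        return None
    return max(allowed)[1]


# Hypothetical library entry for one residue: three candidate conformations.
toy_rotamers = {
    "g+": (0.5, [(1.0, 0.0, 0.0)]),
    "t": (0.3, [(0.0, 5.0, 0.0)]),
    "g-": (0.2, [(0.0, 0.0, 5.0)]),
}
neighbour_atoms = [(1.5, 0.0, 0.0)]   # sits next to the most probable rotamer
print(choose_rotamer(toy_rotamers, neighbour_atoms))
```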

Homology modelling - refinement

After the steps above have been completed, the model can be refined by modifying the positions of some atoms in order to reduce the energy.
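The idea of moving atoms to reduce the energy can be sketched with steepest descent on a toy energy function. Here the energy contains only harmonic bond terms, k (d - d0)^2 for each bonded pair; real refinement uses a full force field with angle, torsion, electrostatic and van der Waals terms. The step size, iteration count, force constant and function names are assumptions for illustration.

```python
import math


def bond_energy(coords, bonds, k=1.0):
    """Sum of k * (d - d0)^2 over bonds given as (i, j, d0) triples."""
    return sum(k * (math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in bonds)


def gradient(coords, bonds, k=1.0):
    """Partial derivatives of the bond energy with respect to each coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0 in bonds:
        d = math.dist(coords[i], coords[j])
        if d == 0.0:
            continue                      # direction undefined for coincident atoms
        factor = 2.0 * k * (d - d0) / d
        for axis in range(3):
            delta = coords[i][axis] - coords[j][axis]
            grad[i][axis] += factor * delta
            grad[j][axis] -= factor * delta
    return grad


def minimise(coords, bonds, step=0.1, iterations=200, k=1.0):
    """Move every atom a small step against the gradient, repeatedly."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        for atom, g in zip(coords, gradient(coords, bonds, k)):
            for axis in range(3):
                atom[axis] -= step * g[axis]
    return [tuple(c) for c in coords]


# Toy model: one bond stretched to 2.0 with an ideal length of 1.5.
start = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_bonds = [(0, 1, 1.5)]
refined = minimise(start, toy_bonds)
print(bond_energy(start, toy_bonds), bond_energy(refined, toy_bonds))
```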
