Swiss Institute of Bioinformatics Torsten Schwede Biozentrum - Universität Basel Swiss Institute of Bioinformatics Klingelbergstr 50-70 CH - 4056 Basel, Switzerland Tel: +41-61 267 15 81 EMBnet course: Introduction to Protein Structure Bioinformatics Homology Modeling Lausanne, February 22, 2007

Homology Modeling - Vital-ITHomology Model(s) • Multiple sequence alignment for pairs > 40% identity or • Use structural alignment of templates to guide sequence alignment of target



How many structures do we know?

http://www.wwpdb.org/


Growth of the Protein Data Bank PDB

[ PDB: http://www.pdb.org ]

[Figure: total and yearly number of PDB entries]

[Figure: number of entries in TrEMBL, SwissProt and PDB, 1986 to 2006, on a logarithmic scale from 100 to 10,000,000]

No experimental structure for most protein sequences

(Sources: PDB, EBI, SIB)

In the near future, no experimental structure will be available for most of the known protein sequences.

Can we predict protein structures from genome sequences?

MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITKDEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRMLQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL


The protein sequence contains all information needed to create a correctly folded protein.

Can we predict the folding process of a protein structure from its sequence (ab initio)?

• Many proteins fold spontaneously to their native structure
• Protein folding is relatively fast (nsec – sec)
• Chaperones speed up folding, but do not alter the structure

MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITKDEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRMLQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL

Molecular Dynamics

$$
\mathcal{V}(\mathbf{r}^N) = \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2 + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2 + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl(1 + \cos(n\omega - \gamma)\bigr) + \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( 4\varepsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right)
$$


Ab initio protein folding simulation

[ http://www.research.ibm.com/bluegene/ ]

• Physical time for simulation: 10^-4 seconds
• Typical time-step size: 10^-15 seconds
• Number of MD time steps: 10^11
• Atoms in a typical protein and water simulation: 32'000
• Approximate number of interactions in force calculation: 10^9
• Machine instructions per force calculation: 1000
• Total number of machine instructions: 10^23
• Petaflop capacity computer (floating point operations per second): 1 petaflop (10^15)

Blue Gene will need 1-3 years to simulate 100 μsec.
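The estimate on this slide can be checked with a few lines of arithmetic. The sketch below only multiplies the figures quoted above; it assumes that the 1000 machine instructions apply to each of the 10^9 interactions (which is what reproduces the total of 10^23) and that the machine sustains its full peak rate, with one instruction counted as one floating point operation.

```python
# Back-of-the-envelope estimate using the figures quoted on the slide.
physical_time_s = 1e-4               # simulated time (100 microseconds)
time_step_s = 1e-15                  # one MD time step
interactions_per_step = 1e9          # interactions per force calculation
instructions_per_interaction = 1e3   # assumption: 1000 instructions per interaction
machine_speed = 1e15                 # 1 petaflop, treated as instructions per second

n_steps = physical_time_s / time_step_s
total_instructions = n_steps * interactions_per_step * instructions_per_interaction
wall_clock_s = total_instructions / machine_speed
wall_clock_years = wall_clock_s / (365.25 * 24 * 3600)

print(f"MD steps:           {n_steps:.0e}")
print(f"Total instructions: {total_instructions:.0e}")
print(f"Wall-clock time:    {wall_clock_years:.1f} years")
```

Under these assumptions the run takes on the order of three years, which sits at the upper end of the range quoted for Blue Gene.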

Growth of the Protein Data Bank PDB

[ PDB: http://www.pdb.org ]

[Figure: new folds per year versus “old” folds per year]


[ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ]

CATH - Protein Structure Classification

Class (C): derived from secondary structure content; assigned automatically.

Architecture (A): describes the gross orientation of secondary structures, independent of connectivity.

Topology (T): clusters structures according to their topological connections and numbers of secondary structures.

Homologous Superfamily (H): groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous.

Sequence similarity implies structural similarity?

[Figure: pairwise sequence identity (0 to 100) versus number of residues aligned. Above the curve: sequence identity implies structural similarity. Below the curve: "don't know" region.]

(B. Rost, Columbia, New York)

[Figure: percentage sequence identity / similarity (0 to 100) versus number of residues aligned (0 to 250), with separate curves for identity and similarity. Above the curves: sequence identity implies structural similarity. Below the curves: "don't know" region.]

(B. Rost, Columbia, New York)

Fold recognition / Threading

Find a compatible fold for a given sequence ....

>Protein XY
MSTLYEKLGGTTAVDLAVDKFYERVLQDDRIKHFFADVDMAKQRAHQKAFLTYAFGGTDKYDGRYMREAHKELVENHGLNGEHFDAVAEDLLATLKEMGVPEDLIAEVAAVAGAPAHKRDVLNQ

≈?

The number of protein folds that occur in nature is limited. Fold recognition can be used to:

• Identify templates for comparative modeling
• Assign protein function


Fold recognition / Threading

The "biological" perspective: Homologous proteins have evolved by molecular evolution from a common ancestor. If we can establish homology, we can predict aspects of structure and function of a new protein by analogy.

The "physical" perspective: The native conformation of a protein corresponds to a global free energy minimum of the protein / solvent system. To identify a compatible fold, the protein sequence is "threaded" through a library of folds, and empirical energy calculations are used to evaluate compatibility.

No single method is perfect. Consensus methods often perform better:

MetaPP: http://cubic.bioc.columbia.edu/predictprotein/

http://bioinfo.pl/meta/

Further reading: Adam Godzik, "Fold Recognition Methods", in:

"Structural Bioinformatics", Bourne & Weissig, Eds.


Protein Structure / Fold Databases

PDB: http://www.pdb.org

EBI-MSD http://www.ebi.ac.uk/msd/

SCOP http://scop.mrc-lmb.cam.ac.uk/scop/

CATH http://www.biochem.ucl.ac.uk/bsm/cath_new/

Fold Recognition Servers

Meta server: http://bioinfo.pl/meta/

3DPSSM / Phyre: http://www.sbg.bio.ic.ac.uk/servers/3dpssm/ and http://www.sbg.bio.ic.ac.uk/~phyre/

GenTHREADER: http://bioinf.cs.ucl.ac.uk/psipred/

FUGUE2: http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html

SAM: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html

FOLD: http://fold.doe-mbi.ucla.edu/

FFAS/PDBBLAST: http://bioinformatics.burnham-inst.org/


Evolution of the globin family:

Evolution of protein structure families

[Figure: RMSD of backbone atoms in core (0.0 to 2.5) versus percent identical residues in core (100 to 0)]

[ Chothia & Lesk (1986) ]

Common core = all residues that can be superposed in 3D

For proteins with > 60% identical residues, the core contains > 90% of all residues, deviating by less than 1.0 Å.
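The deviation plotted by Chothia & Lesk is a root-mean-square deviation (RMSD) over equivalent backbone atoms. As a minimal sketch, assuming the two structures have already been superposed and that atoms are paired by index, it can be computed as follows. The function name and the coordinates are illustrative only.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superposed and atoms are paired by index
    (e.g. equivalent backbone atoms of the common core).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Made-up coordinates in Å, for illustration only.
core_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
core_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(core_a, core_b))
```

A real comparison would first find the optimal superposition (for example with a least-squares fit), which this sketch leaves out.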

Homology modeling = Comparative protein modeling = Knowledge-based modeling

Idea: Using experimental 3D-structures of related family members (templates) to calculate a model for a new sequence (target).

Similar Sequence → Similar Structure

Comparative Modeling

Input: Known Structures (Templates), Target Sequence

1. Template Selection
2. Alignment Template - Target
3. Structure Modeling
4. Structure Evaluation & Assessment

Result: Homology Model(s)

Comparative Modeling

• Protein Data Bank PDB http://www.pdb.org

Database of templates

• Separate into single chains
• Remove bad structures (models)
• Create BLASTable database or fold library (profiles, HMMs)
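The first preparation step, separating entries into single chains, can be sketched in a few lines. The example below relies only on the fixed-column PDB format, in which the chain identifier of an ATOM record sits in column 22; the function name and the three sample records are made up for illustration, and filtering out bad structures or building a searchable database is not covered.

```python
from collections import defaultdict

def split_chains(pdb_text):
    """Group ATOM records of a PDB-format file by chain identifier.

    In the fixed-column PDB format the chain ID is column 22 (string index 21).
    """
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 21:
            chains[line[21]].append(line)
    return dict(chains)

# Made-up records for illustration only.
SAMPLE = "\n".join([
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.329   2.271  1.00 20.00           C",
    "ATOM      3  N   GLY B   1       1.000   2.000   3.000  1.00 20.00           N",
])

for chain_id, records in split_chains(SAMPLE).items():
    print(chain_id, len(records))
```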

Comparative Modeling

Template selection:

1. Sequence Similarity / Fold recognition
2. Structure quality (resolution, experimental method)
3. Experimental conditions (ligands and cofactors)

Comparative Modeling

• Multiple sequence alignment for pairs > 40% identity

or

• Use structural alignment of templates to guide sequence alignment of target

or

• Use separate profiles for template and targets
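The 40% threshold above refers to pairwise sequence identity, which can be read off an existing alignment. The sketch below counts identical residues over all columns in which neither sequence has a gap; this choice of denominator is an assumption (other conventions exist), and the two aligned fragments are made up.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identical residues over aligned (non-gap) columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0

# Made-up aligned fragments, for illustration only.
target = "MSTLYEKLGG-TAV"
template = "MSALYDKLGGATAV"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity")
print("multiple sequence alignment is an option" if identity > 40 else "consider structure-guided or profile alignment")
```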

Comparative Modeling

• Errors in template selection or alignment result in bad models

⇒ iterative cycles of alignment, modeling and evaluation

Build many models, choose the best.

Comparative Modeling

I. Manual Model building

II. Template based fragment assembly
– Composer (Sybyl, Tripos)
– SWISS-MODEL

III. Satisfaction of spatial restraints
– Modeller (Insight II, MSI)
– CPH-Models


[ http://www.expasy.org/spdbv/ ]

I. Manual Modeling


II. Template based fragment assembly

Find structurally conserved core regions


II. Template based fragment assembly

Build model core… by averaging core template backbone atoms (weighted by local sequence similarity with the target sequence). Leave non-conserved regions (loops) for later ….
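The averaging step can be illustrated with a small sketch. It assumes the templates are already superposed, that each template contributes the same core atoms in the same order, and, as a simplification, that each template carries a single weight (the slide describes weights based on local similarity, which could vary along the chain). All names and numbers are illustrative.

```python
def average_core(template_coords, weights):
    """Weighted average of equivalent backbone atom positions.

    template_coords: one list of (x, y, z) tuples per template, all superposed
                     and of equal length (same core atoms in the same order).
    weights:         one non-negative weight per template, e.g. its sequence
                     similarity to the target.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_atoms = len(template_coords[0])
    model = []
    for i in range(n_atoms):
        model.append(tuple(
            sum(w * coords[i][axis] for w, coords in zip(weights, template_coords)) / total
            for axis in range(3)
        ))
    return model

# Two made-up templates with two core atoms each.
templates = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
print(average_core(templates, weights=[3.0, 1.0]))
```

The model atoms end up closer to the template with the larger weight.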

II. Template based fragment assembly

Loop (insertion) modeling

Use the “spare part” algorithm to find compatible fragments in a Loop-Database, or “ab-initio” rebuilding (e.g. Monte Carlo, MD, GA, etc.) to build missing loops.

II. Template based fragment assembly

Side Chain placement

Find the most probable side chain conformation, using

• homologous structure information
• backbone-dependent rotamer libraries
• energetic and packing criteria


II. Template based fragment assembly

Rotamer Libraries

Only a small fraction of all possible side chain conformations is observed in experimental structures

Rotamer libraries provide an ensemble of likely conformations

The propensity of rotamers depends on the backbone geometry:

II. Template based fragment assembly

Energy minimization

Modeling methods will produce unfavorable contacts and bonds.

Energy minimization is used to

• regularize local bond and angle geometry
• relax close contacts and geometric strain

Extensive energy minimization will move coordinates away from the real structure ⇒ keep it to a minimum.

SWISS-MODEL uses the GROMOS 96 force field for a steepest descent energy minimization.
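To illustrate the idea of a short steepest descent run, the sketch below relaxes a toy chain of atoms on a line with harmonic bond terms only. This is not the GROMOS 96 force field; the bond length, force constant, step size and step count are arbitrary illustrative values.

```python
def bond_energy_and_gradient(x, l0=1.5, k=100.0):
    """Harmonic bond energy 0.5*k*(l - l0)^2 summed over consecutive atoms on a line."""
    energy = 0.0
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        delta = (x[i + 1] - x[i]) - l0
        energy += 0.5 * k * delta ** 2
        grad[i] -= k * delta
        grad[i + 1] += k * delta
    return energy, grad

def steepest_descent(x, n_steps=50, step=0.002):
    """Move every coordinate a small step against the gradient, n_steps times."""
    x = list(x)
    for _ in range(n_steps):
        _, grad = bond_energy_and_gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

# Made-up starting coordinates with strained bonds.
start = [0.0, 1.2, 3.1, 4.4]
relaxed = steepest_descent(start)
print(bond_energy_and_gradient(start)[0], bond_energy_and_gradient(relaxed)[0])
```

Limiting the number of steps mirrors the advice above: the geometry is regularized while the coordinates stay close to where they started.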

III. Satisfaction of Spatial restraints

[Figure: homology modeling by satisfaction of spatial restraints, with residues labeled by one-letter code]

III. Satisfaction of Spatial restraints

Alignment of target sequence with templates

Extraction of spatial restraints from templates

Modeling by satisfaction of spatial restraints

III. Satisfaction of Spatial restraints

Some features of a protein structure:

R: resolution of X-ray experiment
r: amino acid residue type
Φ, Ψ: main chain angles
t: secondary structure class
M: main chain conformation class
χ_i, c_i: side chain dihedral angle class
a: residue solvent accessibility
s: residue neighborhood difference
d: Cα - Cα distance
Δd: difference between two Cα - Cα distances

III. Satisfaction of Spatial restraints

Feature properties can be associated with

• a protein (e.g. X-ray resolution)
• residues (e.g. solvent accessibility)
• pairs of residues (e.g. Cα - Cα distance)
• other features (e.g. main chain classes)

How can we derive modeling restraints from this data?

A restraint is defined as a probability density function (pdf) p(x):

$$
p(x_1 \le x < x_2) = \int_{x_1}^{x_2} p(x)\,dx
$$

with

$$
p(x) > 0, \qquad \int p(x)\,dx = 1
$$

III. Satisfaction of Spatial restraints

Derive pdfs from frequency tables by smoothing:

a) Chi-1 angles of 11 Cys residues

b) smoothed distribution from a)

c) 297 Cys Chi-1 angles as control
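As a rough illustration of turning a sparse frequency table into a pdf, the sketch below adds a small pseudocount to each bin, averages each bin with its neighbours (wrapping around, since angles are periodic) and normalises the result. This simple kernel smoothing is a stand-in chosen for illustration, not the smoothing scheme used by any particular modeling program; the kernel, pseudocount and counts are made up.

```python
def smooth_to_pdf(counts, bin_width, kernel=(0.25, 0.5, 0.25), pseudocount=0.1):
    """Turn a periodic histogram (e.g. of chi-1 angles) into a smoothed pdf.

    A pseudocount keeps every bin strictly positive; each bin is then replaced
    by a weighted average of itself and its neighbours, and the result is
    normalised so that sum(p) * bin_width == 1.
    """
    n = len(counts)
    half = len(kernel) // 2
    padded = [c + pseudocount for c in counts]
    smoothed = [
        sum(w * padded[(i + offset - half) % n] for offset, w in enumerate(kernel))
        for i in range(n)
    ]
    area = sum(smoothed) * bin_width
    return [s / area for s in smoothed]

# Made-up counts for 11 observations in 30-degree bins covering 360 degrees.
counts = [0, 0, 4, 1, 0, 0, 2, 0, 0, 0, 3, 1]
pdf = smooth_to_pdf(counts, bin_width=30.0)
print([round(p, 4) for p in pdf])
```

The result satisfies both conditions for a pdf: every value is positive and the values integrate to one over the full angle range.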

III. Satisfaction of Spatial restraints

Combine basis pdfs to molecular probability density functions

[Figure panels: 0.2 < s' < 0.4; 0.2 < s'' < 0.4; 0.2 < s' < 0.4 and 0.4 < s'' < 0.6; 0.2 < s'' < 0.4 and 0.4 < s' < 0.6]

III. Satisfaction of Spatial restraints

Find the protein model with the highest probability

Variable target function:

• Start with a linear conformation model or a model close to the template conformation
• At first, use only local restraints
• Minimize some steps using a conjugate gradient optimization
• Repeat, introducing more and more long range restraints, until all restraints are used
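The schedule of the variable target function can be sketched on a toy problem. The example below places points on a line so that signed distance restraints between them are satisfied, admitting restraints between sequence neighbours first and longer-range ones later. It uses plain gradient descent instead of conjugate gradients, a quadratic penalty instead of pdf-derived restraints, and made-up restraints and starting positions; it only illustrates the order in which restraints are introduced.

```python
def violation(x, restraints):
    """Largest absolute restraint violation for positions x."""
    return max(abs(x[j] - x[i] - d) for i, j, d in restraints)

def minimise(x, restraints, n_steps=200, step=0.05):
    """A fixed number of gradient-descent steps on sum((x_j - x_i - d)^2)."""
    x = list(x)
    for _ in range(n_steps):
        grad = [0.0] * len(x)
        for i, j, d in restraints:
            r = x[j] - x[i] - d
            grad[j] += 2.0 * r
            grad[i] -= 2.0 * r
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

def variable_target_function(x, restraints):
    """Optimise with local restraints first, then admit longer-range ones."""
    max_range = max(j - i for i, j, _ in restraints)
    for current_range in range(1, max_range + 1):
        active = [(i, j, d) for i, j, d in restraints if j - i <= current_range]
        x = minimise(x, active)
    return x

# Toy restraints: point j should sit (j - i) units to the right of point i.
n = 6
restraints = [(i, j, float(j - i)) for i in range(n) for j in range(i + 1, n)]
start = [0.0, 0.3, 0.1, 0.9, 0.4, 0.2]
model = variable_target_function(start, restraints)
print(violation(start, restraints), violation(model, restraints))
```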


III. Satisfaction of Spatial restraints

Optimization schedule and progress

Model Accuracy Evaluation

EVA

Evaluation of Automatic protein structure prediction [ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva/ ]

CASP

Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction http://predictioncenter.org/casp7/

Evaluation of Automatic protein structure prediction

[ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva/ ]

[Figure: EVA workflow in three numbered steps, linking a new PDB release, a target sequence (e.g. MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK), prediction servers, and the evaluation of prediction accuracy]


Typical types of errors

Sequence alignment errors.

Loops which cannot be rebuilt.

Inappropriate template selection.

Subunit displacement.


Structural rearrangements ….

… cause problems for template selection and automated evaluation:

e.g. flap-region in adenylate kinases (1AKE, 4AKE)

e.g. DNA-binding domains (1AWC, 1ETC)

… because they are sequence independent.

Protein Structure Evaluation

Problem:

How can we identify errors in 3-dimensional protein structures (without knowing the correct answer)?

• Bond & Angle Geometry
• Molecular Interactions
• Empirical Force Fields
• Statistical Methods

Empirical Force Fields

e.g. GROMOS, CHARMM, AMBER, ...

Which types of errors in a protein structure can you identify with an empirical force field?

Which types of errors are not recognized?

Statistical Methods

Useful to identify regions with errors in geometry

Ramachandran Plot of backbone angles (ϕ,ψ)
• favored regions
• generously allowed regions
• disallowed regions

Amino acids with special properties:
• PRO: ϕ ≈ -60º
• GLY (marked with a separate symbol)

Similar plots for χ-angle distributions
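The backbone angles in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: ϕ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). A minimal sketch of the dihedral calculation from four points is given below; the function names and the test points are illustrative, and the sign follows the usual convention in which a clockwise rotation of the far bond, viewed along the central bond, is positive.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up points: a quarter turn and a trans arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

For ϕ of residue i, the four points would be the coordinates of C(i-1), N(i), Cα(i) and C(i) taken from the structure file.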

1D - 3D Checks

Probability for a feature to occur in a given environment, e.g.

• Solvent exposed / buried
• Hydrophobic / polar environment
• Electrostatic interactions
• Secondary structure

See: R. Luthy (1992) Assessment of protein models with three-dimensional profiles, Nature, 356(6364):83-5

Statistical Mean Force Potentials

Atomic non-local interaction energy.

[Figure: panels A and B, with marked positions I (Val13), * (Met80), II (Phe134), III (Ala182) and + (Ile86)]


Atom Type Definitions

Statistical Mean Force Potentials

[Figure: MFP (kcal/mol) versus distance (Å) for methyl-methyl pairs and for cysteine S-S pairs]

Use inverse Boltzmann law to derive an atomic Potential of Mean Force (Ū) from the observed number of atomic pairs (i,j) within a distance shell r±Δr in the training database of protein structures:

$$
\bar{U}(i,j,r) = -RT \ln \frac{N_{\text{observed}}(i,j,r)}{N_{\text{expected}}(i,j,r)}
$$

R: gas constant
T: temperature

N_expected is the expected number of atomic pairs (i,j) in the same distance shell if there were no interactions between atoms (reference state).
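The inverse Boltzmann relation is straightforward to evaluate for a single atom-pair type and distance shell. In the sketch below the gas constant is given in kcal/(mol·K); the temperature of 300 K and the pair counts are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def mean_force_potential(n_observed, n_expected, temperature=300.0):
    """U = -RT * ln(N_observed / N_expected) for one pair type and distance shell."""
    if n_observed <= 0 or n_expected <= 0:
        raise ValueError("counts must be positive to take the logarithm")
    return -R_KCAL * temperature * math.log(n_observed / n_expected)

# Made-up counts: one over-represented and one under-represented distance shell.
print(mean_force_potential(200, 100))
print(mean_force_potential(50, 100))
```

Pairs seen more often than expected receive a negative (favorable) energy, and pairs seen less often than expected receive a positive one.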


ANOLEA : (Atomic Non-Local Environment Assessment)

http://protein.bio.puc.cl/cardex/servers/anolea/

http://swissmodel.expasy.org/anolea/

ANOLEA

Detects local packing errors

Errors in alignments

[Figure: ANOLEA results for the correct structure (PDB: 1GES) and for a model with a wrong alignment]

PROCHECK

Checks the stereo-chemical quality of a protein structure, producing a number of plots analyzing its overall and residue-by-residue geometry.

• Covalent geometry
• Planarity
• Dihedral angles
• Chirality
• Non-bonded interactions
• Main-chain hydrogen bonds
• Disulphide bonds
• Stereochemical parameters
• Residue-by-residue analysis

Laskowski R A, MacArthur M W, Moss D S & Thornton J M (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst., 26, 283-291.

Morris A L, MacArthur M W, Hutchinson E G & Thornton J M (1992). Stereochemical quality of protein structure coordinates. Proteins, 12, 345-364.

WhatCheck / WhatIf

WHAT IF I check my structure?

Imagine ...
• An everyday situation in a biocomputing lab: "Should they use the structure?"
• An everyday situation in a crystallography lab: "Should they deposit the structure already?"

In a WHAT_CHECK report, each reported fact has an assigned severity:

error: Severe errors encountered during the analyses. Items marked as errors are considered severe problems requiring immediate attention.

warning: Either less severe problems or uncommon structural features. These still need special attention.

note: Statistical values, plots, or other verbose results of tests and analyses that have been performed.

WHAT IF: A molecular modeling and drug design program. G. Vriend, J. Mol. Graph. (1990) 8, 52-56.

Errors in protein structures. R.W.W. Hooft, G. Vriend, C. Sander, E.E. Abola, Nature (1996) 381, 272-272.

WhatCheck / WhatIf report for a bad model ...

whatcheck.txt

# 49 # Note: Summary report for users of a structure
This is an overall summary of the quality of the structure as compared with current reliable structures. This summary is most useful for biologists seeking a good structure to use for modelling calculations.

The second part of the table mostly gives an impression of how well the model conforms to common refinement constraint values. The first part of the table shows a number of constraint-independent quality indicators.

Structure Z-scores, positive is better than average:
1st generation packing quality : -2.550
2nd generation packing quality : -5.472 (bad)
Ramachandran plot appearance : -1.898
chi-1/chi-2 rotamer normality : -1.433
Backbone conformation : -2.173

RMS Z-scores, should be close to 1.0:
Bond lengths : 0.905
Bond angles : 1.476
Omega angle restraints : 0.921
Side chain planarity : 2.681 (loose)
Improper dihedral distribution : 1.771 (loose)
Inside/Outside distribution : 1.333 (unusual)


All checking tools are happy, so can I believe it now?

Models are not experimental facts !

Models can be partially inaccurate or sometimes completely wrong !

A model is a tool that helps to interpret biochemical data.

Some useful Evaluation Tools

ANOLEA : (Atomic Non-Local Environment Assessment)

• http://protein.bio.puc.cl/cardex/servers/anolea/
• http://swissmodel.expasy.org/anolea/

ProCheck

• http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

WhatCheck

• http://www.cmbi.kun.nl/gv/whatcheck/

Verify3D

• http://www.doe-mbi.ucla.edu/Services/Verify_3D/

Biotech Validation Suite for Protein Structures

• http://biotech.ebi.ac.uk:8400/

What can models be used for ?

“A Model must be wrong, in some respects, else it would be the thing itself. The trick is to see where it is right.”

(Henry A. Bent)

Model quality vs. sequence identity

[Figure: model quality versus sequence identity, with regions labeled Safe Zone, Twilight Zone and Midnight Zone]

What can models be used for ?

• Annotation by fold assignment
• 3D-motif searching, active site recognition
• Including NMR restraints
• Supporting site directed mutagenesis
• X-Ray Molecular replacement models
• Docking of small molecules
• Drug development; comparable to medium resolution NMR or low resolution X-ray structures

Application example: Understanding drug interactions

Knowledge of the 3-dimensional structures of target proteins allows us to understand the interactions of inhibitors and drugs with their target proteins.

Discovery of CK2α Inhibitors by in silico docking

Homology model of the target molecule:

Reference:

Discovery of a potent and selective protein kinase CK2 inhibitor by high-throughput docking. Vangrevelinghe E, Zimmermann K, Schoepfer J, Portmann R, Fabbro D, Furet P. Oncology Research, Novartis Pharma, Basle. J Med Chem. 2003 Jun 19;46(13):2656-62.

Medicines are not Effective in all Patients

Inter-individual differences in drug efficacy:

Group             Incomplete/absent efficacy
SSRI              10-25%
ACE-I             10-30%
Beta blockers     15-25%
Statins           30-70%
Beta2 agonists    40-70%

[ Spear BB (2001) Trends Mol Med;7(5):201-204 ]

Structural analysis of human mutations and nsSNPs

E.g. Changes in the electrostatic properties upon mutation

[Figure: electrostatic surface potentials, panels 1 to 8; color scale from -8 to +8 kT/e]

Public database holdings

[Figure: number of entries in TrEMBL, SwissProt and PDB, 1986 to 2004, on a logarithmic scale from 100 to 1'000'000]

Structural Genomics

• large scale experimental structure solution projects

Goal: Most of the sequences in a genome database should match at least one structure with a sufficient sequence identity allowing for reliable modeling.

Range of sequence space that can be modeled with acceptable accuracy.

The modeling error determines selection of targets for structural genomics.


Structural Genomics – Target Selection


Protein Modeling Resources

SWISS-MODEL http://swissmodel.expasy.org

Modeller http://www.salilab.org

WhatIf http://www.cmbi.kun.nl/whatif/

3D-JIGSAW http://www.bmm.icnet.uk/people/paulb/3dj/form.html

CPHmodels http://www.cbs.dtu.dk/services/CPHmodels/

SDSC1 http://cl.sdsc.edu/hm.html
