CS6104: Computational Structural Biology & Bioinformatics Homology Modeling and Molecular Docking David Bevan Dept of Biochemistry [email protected] April 6, 2004


Page 1: CS6104: Computational Structural Biology & Bioinformatics

CS6104: Computational Structural Biology & Bioinformatics

Homology Modeling and Molecular Docking

David Bevan
Dept of Biochemistry

[email protected]

April 6, 2004

Page 2: CS6104: Computational Structural Biology & Bioinformatics

Some Components of Molecular Modeling

• Visualization
• Molecular mechanics
• Quantum mechanics
• Molecular dynamics
• Homology modeling
• Molecular docking

Page 3: CS6104: Computational Structural Biology & Bioinformatics

Aims of Structural Genomics

• To determine or predict the 3D structures of all the proteins encoded in the genome

• Up to 40% of the known protein sequences have at least one segment related to one or more structures

=> Determine all of the folds
=> Use homology modeling to predict 3D structures

Page 4: CS6104: Computational Structural Biology & Bioinformatics

What is Homology?

Homology: having a common evolutionary origin

• Cannot be partial
• Assertion of homology is a hypothesis
• Hypothesis is usually based on extent of sequence similarity between proteins, though ultimately similar functions need to be demonstrated

Page 5: CS6104: Computational Structural Biology & Bioinformatics

Some Definitions

• Homologue (Homolog): proteins that are evolutionarily related

• Orthologue (Ortholog): homologues in different organisms that diverged through speciation

• Paralogue (Paralog): homologues that arose by gene duplication, typically found in the same organism

Page 6: CS6104: Computational Structural Biology & Bioinformatics

Homology Modeling (Comparative Structure Modeling)

• 3D structures conserved to greater extent than primary structures

• Develop models of protein structure based on structures of homologues

• Using known structure as a “template”, calculate 3D model of a protein for which only the sequence is known (the “target”)

• Model includes core, loops, and side chains

Page 7: CS6104: Computational Structural Biology & Bioinformatics

Steps in Homology Modeling

Page 8: CS6104: Computational Structural Biology & Bioinformatics

Template Selection

• Identify protein structures related to target and select those to be used as templates
• Involves searching a database, such as those at NCBI (e.g., with BLAST)
• Involves a certain amount of sequence alignment

Page 9: CS6104: Computational Structural Biology & Bioinformatics

Aligning Sequences

• Critical step in homology modeling
• Many options to consider
• Factors to consider
  – Which algorithm to use
  – Which scoring method to apply
  – Whether and how to assign gap penalties

Page 10: CS6104: Computational Structural Biology & Bioinformatics

Algorithms for Alignment

• Earliest method was that of Needleman and Wunsch (1970)

• Another widely adopted one was by Smith and Waterman (1981)

• More recent are the heuristic methods such as BLAST and FASTA, which are more approximate but much faster

Page 11: CS6104: Computational Structural Biology & Bioinformatics

Needleman-Wunsch Algorithm

• Global alignment method (i.e., match sequences along entire length)
• Produces an optimal alignment
• Steps
  – Setting up a matrix
  – Scoring the matrix
  – Identifying the optimal alignment
• Can be time-consuming
• Not effective for highly divergent proteins
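The three steps listed for the Needleman-Wunsch algorithm can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the course: the function name and the match, mismatch, and linear gap values are assumptions chosen for readability, and a real protein alignment would score residue pairs with a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch. Returns (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not values from the slides.
    """
    n, m = len(a), len(b)
    # Step 1: set up the matrix; the first row and column hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Step 2: score the matrix; each cell takes the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Step 3: identify the optimal alignment by tracing back from the last cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The matrix has (n+1) x (m+1) cells, which is why the method becomes time-consuming for long sequences or large databases.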

Page 12: CS6104: Computational Structural Biology & Bioinformatics

Smith and Waterman Algorithm

• Local alignment method
• Useful in database searching
• Similar to global alignment
  – Proteins arranged in a matrix
  – Optimal path along diagonal is sought
• Differences compared to global alignment
  – Can start alignment at internal position
  – Alignment does not have to extend to ends
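The two differences from global alignment can be seen in a short sketch: cell scores are never allowed to drop below zero, so an alignment can start at an internal position, and the traceback starts at the highest-scoring cell and stops at the first zero, so the alignment need not reach the ends. The function name and scoring values below are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment sketch. Returns (best_score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor of 0 lets a new local alignment begin at any cell.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    # Trace back from the best cell until the score returns to zero.
    i, j = end
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning "CCCGATTACACCC" with "TTGATTACATT" recovers only the shared internal segment and ignores the unrelated flanks.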

Page 13: CS6104: Computational Structural Biology & Bioinformatics

Heuristic Search Methods

• Developed to do rapid searches of large databases
• Not guaranteed to find globally optimal solution
• Rarely miss a significant match
• Identify regions of potential interest and then expand regions to identify alignment
• Include FASTA and BLAST
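The "identify regions, then expand" idea can be illustrated with a deliberately simplified seed-and-extend sketch. It is not the BLAST or FASTA algorithm: real tools score neighborhood words with a substitution matrix and use more elaborate extension rules. Here seeds are exact shared words of length k, extension is ungapped and stops at the first mismatch, and all function names are hypothetical.

```python
from collections import defaultdict

def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where both share an exact k-letter word."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]

def extend_seed(query, subject, i, j, k=3):
    """Extend a seed left and right without gaps while residues stay identical.

    Returns (query_start, subject_start, length) of the extended region.
    """
    left = 0
    while (i - left - 1 >= 0 and j - left - 1 >= 0
           and query[i - left - 1] == subject[j - left - 1]):
        left += 1
    right = 0
    while (i + k + right < len(query) and j + k + right < len(subject)
           and query[i + k + right] == subject[j + k + right]):
        right += 1
    return i - left, j - left, k + left + right
```

Because only word hits are examined, most of the alignment matrix is never filled in, which is the source of both the speed and the loss of a guaranteed optimum.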

Page 14: CS6104: Computational Structural Biology & Bioinformatics

Scoring Alignments

• Need some method of scoring to find optimal alignment
• Four general types of scoring have been applied
  – Identity: considers only identical residues
  – Genetic code: considers the number of base changes in DNA or RNA to interconvert codons for the amino acids
  – Chemical similarity: considers physico-chemical properties
  – Observed substitutions: considers substitution frequencies observed in alignments of sequences (*used the most*)

Page 15: CS6104: Computational Structural Biology & Bioinformatics

PAM Matrices

• Dayhoff mutation data matrix originally developed to study evolution of proteins
• Uses probability of one amino acid mutating to a second amino acid within a particular evolutionary time
• Denoted PAM (Point Accepted Mutation; also expanded as Percent Accepted Mutation)
• One PAM is a unit of evolutionary divergence in which 1% of the amino acids have been changed
• Uses substitution frequencies from alignments of very similar sequences and extrapolates to more distant relationships
• PAM40 will recognize short alignments of highly similar sequences
• PAM250 will recognize longer, weaker local alignments
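The extrapolation to more distant relationships is done by multiplying the PAM1 mutation probability matrix by itself: PAM250 is PAM1 raised to the 250th power. The sketch below shows this with a hypothetical two-letter alphabet in which each letter has a 1% chance of changing per PAM unit; the real Dayhoff matrix is 20 x 20 and is not reproduced here.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, n):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result

# Hypothetical PAM1-like matrix for a toy two-letter alphabet (an assumption for
# illustration): each row gives the probabilities of staying or changing in 1 PAM.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

pam250_toy = mat_pow(PAM1_TOY, 250)
```

In this toy case the probability of seeing the original letter after 250 PAMs is close to one half, which illustrates why 250 PAMs does not mean that every position has changed: repeated and reverse substitutions occur at the same site.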

Page 16: CS6104: Computational Structural Biology & Bioinformatics

BLOSUM Matrices

• Matrices based on alignments of more distantly related sequences
• Uses alignments of short regions of related sequences
• Sequences clustered into groups (blocks) based on similarity at some threshold value of percentage identity
• Blocks substitution matrices (BLOSUM) derived based on substitution frequencies
• BLOSUM90 matrix derived using threshold of 90% identity => very similar sequences
• BLOSUM30 matrix derived using threshold of 30% identity => highly divergent sequences
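Substitution frequencies are turned into matrix entries as log-odds scores: the observed frequency of a residue pair is compared with the frequency expected by chance, and BLOSUM entries are conventionally reported in half-bit units, i.e. 2 x log2(observed / expected), rounded to an integer. The sketch below uses a made-up two-residue frequency table, so the numbers are purely illustrative.

```python
import math

def log_odds_scores(pair_freq):
    """Convert joint pair probabilities into rounded half-bit log-odds scores.

    pair_freq maps ordered pairs (a, b) to probabilities that sum to 1 and are
    symmetric, so the background frequency of a is the sum over all (a, *).
    """
    background = {}
    for (a, _b), q in pair_freq.items():
        background[a] = background.get(a, 0.0) + q
    return {(a, b): round(2 * math.log2(q / (background[a] * background[b])))
            for (a, b), q in pair_freq.items()}

# Hypothetical pair frequencies (assumption): identical pairs are seen more
# often than chance, mixed pairs less often.
TOY_FREQ = {("L", "L"): 0.4, ("L", "K"): 0.1,
            ("K", "L"): 0.1, ("K", "K"): 0.4}

toy_scores = log_odds_scores(TOY_FREQ)
```

Pairs observed more often than expected get positive scores and pairs observed less often get negative scores, which is the pattern seen in the published matrices.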

Page 17: CS6104: Computational Structural Biology & Bioinformatics

Summary of PAM and BLOSUM Matrices

Less divergent  ---------------------------->  More divergent

BLOSUM90    BLOSUM80    BLOSUM62    BLOSUM45
PAM30       PAM120      PAM180      PAM240

Mouse vs Rat (less divergent); Mouse vs Bacteria (more divergent)

Page 18: CS6104: Computational Structural Biology & Bioinformatics

Building the 3D Model

• Rigid body assembly
  – Rigid bodies from aligned sequences
  – Core region, loops, and side chains
• Segment matching
  – Most hexapeptide segments can be clustered into ~100 structural classes
  – Use segments from homologs (or non-homologs) to build up structure

Page 19: CS6104: Computational Structural Biology & Bioinformatics

Building the 3D Model (cont.)

• Satisfaction of spatial restraints
  – Generate restraints from templates
  – Assume distances and angles between aligned template and target are similar
  – Minimize violations of all restraints using distance geometry or optimization techniques (e.g., with a force field)
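As a toy illustration of minimizing restraint violations, the sketch below treats each template-derived restraint as a target distance between two atoms and reduces the sum of squared violations by plain gradient descent. This is an assumption-laden simplification: programs such as MODELLER use probability-density restraints on many kinds of features and more capable optimizers. The function names, step size, and iteration count are all hypothetical.

```python
import math

def restraint_violation(coords, restraints):
    """Sum of squared differences between actual and target distances."""
    return sum((math.dist(coords[i], coords[j]) - target) ** 2
               for i, j, target in restraints)

def satisfy_restraints(coords, restraints, step=0.05, iterations=500):
    """Move atoms downhill on the violation function; returns new coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident atoms; skip this step
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= step * g[k]
    return coords
```

A usage example with three atoms: starting from arbitrary positions and restraints of 3.8, 3.8, and 5.5 Å (distances chosen only because they form a valid triangle), the optimizer drives the violation close to zero.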

Page 20: CS6104: Computational Structural Biology & Bioinformatics

Evaluating the 3D Model: PROCHECK

• Ramachandran plot
• Planar peptide bonds
• Side chain conformations that correspond to those in rotamer library
• Hydrogen bonding
• No bad atom-atom contacts
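A Ramachandran plot is a scatter of the backbone dihedral angles phi (defined by atoms C(i-1), N, CA, C) and psi (defined by N, CA, C, N(i+1)) for each residue. The sketch below computes a dihedral angle from four atomic positions; it is a generic geometry helper written for illustration, not code from PROCHECK, and the helper names are assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying this function to the appropriate atoms of every residue gives the (phi, psi) pairs whose distribution over allowed and disallowed regions is what the Ramachandran check reports.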

Page 21: CS6104: Computational Structural Biology & Bioinformatics

Evaluating the 3D Model: 3D-Profiler

• Based on statistical preferences of each of the 20 amino acids for particular environments within a protein
• Each residue position can be characterized by its environment
• Preferred environments for amino acids defined by three parameters
  – Area of each residue that is buried
  – Fraction of side-chain area that is covered by polar atoms (i.e., O and N)
  – Local secondary structure

Page 22: CS6104: Computational Structural Biology & Bioinformatics

Refining the 3D Model

• MD and energy minimization
• Application of restraints based on experimental data (e.g., NMR, fluorescence)

Page 23: CS6104: Computational Structural Biology & Bioinformatics

Threading

• Based on concepts of finite number of folds and conservation of 3D structure
• Challenge is to find fold(s) adopted by a given sequence
• Model a given amino acid sequence as each of several folds from a pre-specified library
• Models are evaluated by some criteria to identify the most likely fold (e.g., pairwise contact “energies”)

⇒ instead of using homologues as templates, uses library of folds

Page 24: CS6104: Computational Structural Biology & Bioinformatics

Applications of the Model

Page 25: CS6104: Computational Structural Biology & Bioinformatics

Arabidopsis thaliana Project

• 46 genes for family 1 β-glucosidases
  – 40 β-O-glucosidases
  – 6 β-S-glucosidases

• ~ 20 genes for family 35 β-galactosidases

An NSF 2010 Project

Page 26: CS6104: Computational Structural Biology & Bioinformatics

Model of Beta-Glucosidase

Red: Crystal structure Blue: Homology model

Page 27: CS6104: Computational Structural Biology & Bioinformatics

Crystal vs. Homology Model of β-Glucosidase

RMSD (heavy atoms) 2.53 Å

RMSD (backbone atoms) 1.82 Å
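The RMSD values quoted here are root-mean-square deviations over matched atoms in the two structures. A minimal sketch of the calculation is shown below; it assumes the two coordinate sets are already superimposed and listed in the same atom order, since the optimal superposition step is omitted. The function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))
```

Restricting the atom list to backbone atoms gives the backbone RMSD; including side-chain heavy atoms typically raises the value, since side chains are harder to place than the backbone.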

Page 28: CS6104: Computational Structural Biology & Bioinformatics

Possible Template Structures

• Maize Glu1
  – 1E1E: native enzyme
  – 1E1F: native enzyme in complex with PSG
  – 1E4L: E191D
  – 1E4N, 1E55, 1E56: E191D complexes
• Myrosinase
  – 1E4M: Sinapis alba, 1.20 Å
  – Many others
• Cyanogenic β-glucosidase from white clover - 1CBG
• Bacillus polymyxa β-glucosidase
  – 1BGA: native enzyme
  – 1BGG, 1E4I, 1TR1: complexes or mutants
• Bacillus circulans β-glucosidase - 1QOX

Page 29: CS6104: Computational Structural Biology & Bioinformatics

Approach to Homology Modeling

• Identify and align templates
• Align target sequences with templates
  – Biology Workbench (CLUSTALW)
  – MODELLER4 and MODELLER6
• Build models
  – MODELLER4 and MODELLER6
• Evaluate models
  – PROCHECK
  – PROSAII

Page 30: CS6104: Computational Structural Biology & Bioinformatics

Evaluation of Models

Gene         ProsaII z-score    Rama - % disallowed
At5g44640    -11.3              0.2
At1g26560    -10.6              0
At1g66280    -9.8               1.0
At2g44470    -10.9              0.5
At1g60090    -10.1              0
At5g25980    -9.8               0.2
Range        -8.3 to -11.4      0 to 1.3
Average      -10.2              0.4

Page 31: CS6104: Computational Structural Biology & Bioinformatics

Molecular Docking

• Receptor-ligand
• Enzyme-substrate
• Protein-DNA
• Protein-protein
• Receptor-drug

Attempt to predict the structure(s) of the intermolecular complex between two or more molecules.

Page 32: CS6104: Computational Structural Biology & Bioinformatics

General Considerations

• Molecular representations
  – Abstract or atoms
  – Flexible or fixed
• Juxtaposition of molecules
  – Interactive or automated
  – Search algorithm should create an optimum number of conformations that include experimental binding modes
• Evaluation of complementarity
  – Scoring function
  – Force field energy functions

Page 33: CS6104: Computational Structural Biology & Bioinformatics

Search Algorithms

• Molecular dynamics
• Monte Carlo methods
• Genetic algorithms
• Fragment-based methods
• Point complementarity methods
• Distance geometry methods
• Tabu searches
• Systematic searches
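As one concrete example from this list, a Monte Carlo search proposes random moves and accepts them with the Metropolis criterion: downhill moves are always accepted and uphill moves are accepted with probability exp(-ΔE/kT). The sketch below searches a one-dimensional toy energy function; in docking the "position" would be the ligand's translation, orientation, and torsions, and the energy would come from a scoring function. All names and parameter values are illustrative assumptions.

```python
import math
import random

def metropolis_accept(delta_e, kt, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)

def monte_carlo_search(energy, start, step=0.5, kt=0.1, n_steps=2000, seed=0):
    """Random-walk search over a 1-D coordinate; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        trial_e = energy(trial)
        if metropolis_accept(trial_e - e, kt, rng):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Occasional acceptance of uphill moves is what lets the search escape local minima, at the cost of needing many energy evaluations.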

Page 34: CS6104: Computational Structural Biology & Bioinformatics

Docking Algorithms

De novo Design    Virtual Screening
SPROUT            ICM
GrowMol           Surflex
SMoG              AutoDock
MCSS              SLIDE
GRID              FlexX/FlexE
LUDI              DOCK

Page 35: CS6104: Computational Structural Biology & Bioinformatics

AutoDock

• A suite of automated docking tools
• Designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor
• Consists of three programs
  – AutoTors: to define torsions in ligand
  – AutoGrid: to calculate grids
  – AutoDock: to perform the docking
  – Also includes GUI called AutoDockTools

Page 36: CS6104: Computational Structural Biology & Bioinformatics

AutoDock Grid Maps

• Pre-calculated and used as look-up tables
• Place probe atom at each grid point and calculate energy
• Grid for each type of atom (e.g., C, O, N, H)
• Interpolate ligand atom positions relative to grid points
• Energy based on van der Waals, electrostatics, and hydrogen bonding
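The look-up step can be sketched as trilinear interpolation: the energy of a ligand atom is estimated from the eight grid points surrounding its position, weighted by how close the atom is to each. The code below is an illustrative stand-in, not AutoGrid's file format or AutoDock's code; the function names, the nested-list grid layout, and the toy grid are assumptions.

```python
import math

def interpolate_energy(grid, origin, spacing, point):
    """Trilinearly interpolate a precomputed grid at an arbitrary point.

    grid[i][j][k] holds the probe energy at origin + spacing * (i, j, k);
    each grid dimension must contain at least two points.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    idx, frac = [], []
    for axis in range(3):
        f = (point[axis] - origin[axis]) / spacing
        if f < 0 or f > dims[axis] - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(math.floor(f)), dims[axis] - 2)  # keep i + 1 inside the grid
        idx.append(i)
        frac.append(f - i)
    (i, j, k), (tx, ty, tz) = idx, frac
    energy = 0.0
    for di, wx in ((0, 1 - tx), (1, tx)):
        for dj, wy in ((0, 1 - ty), (1, ty)):
            for dk, wz in ((0, 1 - tz), (1, tz)):
                energy += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return energy

def ligand_energy(maps, origin, spacing, atoms):
    """Sum interpolated energies; atoms is a list of (atom_type, (x, y, z))."""
    return sum(interpolate_energy(maps[t], origin, spacing, p) for t, p in atoms)

# Toy 2 x 2 x 2 map (assumption) whose energy equals the x index of the grid point.
TOY_MAP = [[[float(i) for _k in range(2)] for _j in range(2)] for i in range(2)]
```

Because the expensive receptor-probe energies are computed once per atom type, each candidate ligand pose during the search costs only a handful of table look-ups per atom.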

Page 37: CS6104: Computational Structural Biology & Bioinformatics

Structure-based Drug Design

• Directs discovery of a drug lead, a compound with at least micromolar affinity for a target

• Involves combination of docking (virtual ligand screening) and experimental assays (high throughput screening)

Page 38: CS6104: Computational Structural Biology & Bioinformatics

Docking and HTS Overview

Docking HTS

Page 39: CS6104: Computational Structural Biology & Bioinformatics

Virtual Ligand Screening

Page 40: CS6104: Computational Structural Biology & Bioinformatics

Retinoid Receptors

• Members of superfamily of nuclear receptors (transcription factors)

• Two families (RARs and RXRs)
• Subtypes denoted α, β, γ within each family

Page 41: CS6104: Computational Structural Biology & Bioinformatics

Functions of Receptors

• Normal processes
  – Morphogenesis
  – Differentiation
  – Metabolism
  – Homeostasis
• Pathologies
  – Teratogenicity
  – Endocrine disruption

Page 42: CS6104: Computational Structural Biology & Bioinformatics

Natural Ligands for Retinoid Receptors

[Chemical structures: trans-Retinoic acid and cis-Retinoic acid]

Page 43: CS6104: Computational Structural Biology & Bioinformatics

Docking of trans-Retinoic Acid to RARγ

Blue = crystal structure

Red = re-docked ligand

Page 44: CS6104: Computational Structural Biology & Bioinformatics

Docking of Retinoids

Designation    Structure               RMSD (Å)
t-RA           [structure drawing]     0.30
c-RA           [structure drawing]     0.37
BMS181156      [structure drawing]     0.97
BMS184394-R    [structure drawing]     0.29
CD564          [structure drawing]     0.23

Page 45: CS6104: Computational Structural Biology & Bioinformatics

Kinesins

Page 46: CS6104: Computational Structural Biology & Bioinformatics

Eg5

• A kinesin motor protein

• Slides microtubules of developing spindle apart to push centrosomes apart

Page 47: CS6104: Computational Structural Biology & Bioinformatics

Monastrol

• Inhibitor of Eg5
• An anti-mitotic agent

Page 48: CS6104: Computational Structural Biology & Bioinformatics

Docking to Human Eg5

Page 49: CS6104: Computational Structural Biology & Bioinformatics

Docking to Drosophila Eg5