GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling)

Aims of Structural Genomics

• High-throughput 3D structure determination and analysis

• To determine or predict the 3D structures of all the proteins encoded in the genome

• Up to 40% of the known protein sequences have at least one segment related to one or more known structures

=> Determine all of the folds
=> Use homology modeling to predict 3D structures

Growth in the PDB

What is Homology?

• Homology: having a common evolutionary origin

• Cannot be partial: proteins either are or are not homologous

• Assertion of homology is a hypothesis

• Hypothesis usually based on the extent of sequence similarity between proteins, though similar function should also be demonstrated

Some Definitions

• Homologues (homologs): proteins that are evolutionarily related

• Orthologues (orthologs): homologues in different organisms that diverged through speciation

• Paralogues (paralogs): homologues that arose through gene duplication, typically within the same organism

Basis of Homology Modeling

• 3D structures are conserved to a greater extent than primary structures (sequences)

• Develop models of protein structure based on the structures of homologues

• Using a known structure as a “template”, calculate a 3D model of a protein for which only the sequence is known (the “target”)

Steps in Homology Modeling

Template Selection

• Identify protein structures related to the target and select those to be used as templates

• Involves searching a database (e.g., BLAST at NCBI)

• Involves a certain amount of sequence alignment
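
The selection step can be sketched in Python, assuming the database search has already returned a list of hits. The `Hit` fields, the `select_templates` helper, and every threshold below are hypothetical illustrations, not values prescribed by BLAST or by this course; in practice other factors, such as the resolution of the template structure, also matter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One database-search hit (hypothetical fields for illustration)."""
    template_id: str   # identifier of a structure with known coordinates
    identity: float    # fraction of identical residues in the aligned region
    coverage: float    # fraction of the target sequence covered by the alignment
    e_value: float     # expectation value reported by the search


def select_templates(hits: List[Hit], min_identity: float = 0.30,
                     min_coverage: float = 0.70, max_e_value: float = 1e-5,
                     max_templates: int = 3) -> List[Hit]:
    """Keep significant, well-covering hits; rank by identity, then E-value.

    The default thresholds are hypothetical and would be tuned per target.
    """
    usable = [h for h in hits
              if h.identity >= min_identity
              and h.coverage >= min_coverage
              and h.e_value <= max_e_value]
    usable.sort(key=lambda h: (-h.identity, h.e_value))
    return usable[:max_templates]


hits = [
    Hit("tmplA", identity=0.45, coverage=0.90, e_value=1e-30),
    Hit("tmplB", identity=0.60, coverage=0.80, e_value=1e-40),
    Hit("tmplC", identity=0.25, coverage=0.95, e_value=1e-8),   # too divergent
    Hit("tmplD", identity=0.50, coverage=0.40, e_value=1e-12),  # covers too little
]
print([h.template_id for h in select_templates(hits)])  # ['tmplB', 'tmplA']
```

The identity cut-off reflects the point made above: the more divergent the template, the less reliable the alignment and therefore the model.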

Aligning Sequences

• Critical step in homology modeling
• Many options to consider
• Factors to consider
  – Which algorithm to use
  – Which scoring method to apply
  – Whether and how to assign gap penalties
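
To make the three factors concrete, here is a minimal global alignment (Needleman-Wunsch dynamic programming) in Python. The algorithm is fixed here, while the scoring method and the gap penalty are parameters: `score` can be an identity rule or a lookup into a substitution matrix, and `gap` is a simple linear penalty per gap position. Alignment tools commonly use affine gap penalties (separate opening and extension costs) instead; the function names and defaults here are illustrative assumptions.

```python
from typing import Callable, Tuple


def needleman_wunsch(a: str, b: str, score: Callable[[str, str], int],
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns (best score, aligned a, aligned b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                              # gap in b
                          F[i][j - 1] + gap)                              # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def identity(x: str, y: str) -> int:
    """Identity scoring: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


print(needleman_wunsch("HEAGAW", "HEGAW", identity))  # (3, 'HEAGAW', 'HE-GAW')
```

Swapping `identity` for a substitution-matrix lookup, or changing `gap`, can change which alignment is optimal, which is why these choices matter so much for the final model.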

Scoring Alignments

• Need some method of scoring to find the optimal alignment
• Four general types of scoring have been applied
  – Identity: considers only identical residues
  – Genetic code: considers the minimum number of base changes in DNA or RNA needed to interconvert codons for the two amino acids
  – Chemical similarity: considers physico-chemical properties
  – Observed substitutions: considers substitution frequencies observed in alignments of sequences (the most widely used)
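
The first two scoring types are simple enough to compute directly. The Python sketch below implements identity scoring and the genetic-code measure (the fewest nucleotide substitutions turning some codon of one amino acid into some codon of the other) using the standard genetic code; chemical-similarity and observed-substitution scores instead come from lookup tables such as PAM or BLOSUM. The function names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order at each position
# (TTT, TTC, TTA, TTG, TCT, ...), '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def identity_score(x: str, y: str) -> int:
    """Identity scoring: 1 if the residues are identical, else 0."""
    return int(x == y)


def min_base_changes(x: str, y: str) -> int:
    """Genetic-code measure: fewest base substitutions between any codon of
    amino acid x and any codon of amino acid y (0 if x == y).

    Smaller values mean the two amino acids are more easily interconverted.
    """
    codons_x = [c for c, aa in CODON_TABLE.items() if aa == x]
    codons_y = [c for c, aa in CODON_TABLE.items() if aa == y]
    return min(sum(p != q for p, q in zip(cx, cy))
               for cx in codons_x for cy in codons_y)


print(min_base_changes("L", "I"))  # 1 (e.g., CTT -> ATT)
print(min_base_changes("W", "F"))  # 2 (TGG -> TTT or TTC)
```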

Scoring Matrices

• PAM40 - short, highly similar sequences
• PAM160 - detecting members of a protein family
• PAM250 - longer, more divergent sequences
• BLOSUM90 - short, highly similar sequences
• BLOSUM80 - detecting members of a protein family
• BLOSUM62 - most effective in finding all potential similarities
• BLOSUM30 - longer, more divergent sequences

Higher PAM numbers and lower BLOSUM numbers correspond to more divergent sequences.

Log-Odds Matrix

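$$S_{i,j} = \log\left(\frac{q_{i,j}}{p_i\,p_j}\right)$$

$q_{i,j}$ = observed frequency of substitution between residues i and j
$p_i p_j$ = probability of occurrence of residues i and j in proteins (the product of their background frequencies)

A positive $S_{i,j}$ means the pair is substituted more often than expected by chance; a negative score, less often.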
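
The log-odds formula can be applied directly to counts of aligned residue pairs. This Python sketch uses a made-up four-pair "alignment" purely for illustration, counts each pair in both orders so that the matrix is symmetric, and uses base-2 logarithms (scores in bits). Published matrices involve further steps that it omits: BLOSUM clusters similar sequences within conserved blocks and reports scaled, rounded scores, while PAM extrapolates from alignments of closely related sequences.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def log_odds_matrix(aligned_pairs: Iterable[Pair], base: float = 2.0) -> Dict[Pair, float]:
    """Return S[i, j] = log(q_ij / (p_i * p_j)) for every observed residue pair."""
    # q_ij: observed frequency of i aligned with j (each pair counted both ways).
    pair_counts: Counter = Counter()
    for i, j in aligned_pairs:
        pair_counts[(i, j)] += 1
        pair_counts[(j, i)] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # p_i: background frequency of residue i among all aligned positions.
    residue_counts: Counter = Counter()
    for (i, _j), n in pair_counts.items():
        residue_counts[i] += n
    p = {res: n / total for res, n in residue_counts.items()}

    return {(i, j): math.log(qij / (p[i] * p[j]), base) for (i, j), qij in q.items()}


# Toy data only: A-A twice, A-S once, S-S once.
S = log_odds_matrix([("A", "A"), ("A", "A"), ("A", "S"), ("S", "S")])
print(round(S[("A", "A")], 3), round(S[("A", "S")], 3))  # 0.356 -0.907
```

In this toy data the A-S substitution is rarer than the background frequencies of A and S predict, so it receives a negative score, while the identities score positively.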

Building the 3D Model

• Rigid-body assembly
  – Rigid bodies taken from the aligned template structures
  – Core region, loops, and side chains

• Satisfaction of spatial restraints
  – Generate restraints from templates
  – Assume distances and angles between aligned template and target residues are similar
  – Minimize violations of all restraints using distance geometry or optimization techniques (e.g., with a force field)
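
The restraint-satisfaction idea can be illustrated in a few lines of Python. This is a deliberately minimal sketch, not how any particular modeling package works internally: the only restraints are alpha-carbon (CA-CA) distances copied from the template through the alignment, and the total violation (sum of squared distance errors) is reduced by plain gradient descent from a starting geometry. All function names and the toy coordinates are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Coord = List[float]
Restraints = Dict[Tuple[int, int], float]


def restraints_from_template(template: Sequence[Sequence[float]],
                             alignment: Dict[int, int]) -> Restraints:
    """Copy template CA-CA distances onto aligned target residue pairs.

    alignment maps target residue index -> template residue index.
    """
    aligned = sorted(alignment)
    return {(i, j): math.dist(template[alignment[i]], template[alignment[j]])
            for k, i in enumerate(aligned) for j in aligned[k + 1:]}


def violation(coords: Sequence[Sequence[float]], restraints: Restraints) -> float:
    """Sum of squared deviations from the restrained distances."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for (i, j), d in restraints.items())


def satisfy_restraints(start: Sequence[Sequence[float]], restraints: Restraints,
                       steps: int = 2000, lr: float = 0.01) -> List[Coord]:
    """Reduce restraint violations by gradient descent on the coordinates."""
    coords = [list(p) for p in start]          # copy; do not modify the input
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for (i, j), d in restraints.items():
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in diff)) or 1e-9
            # Gradient of (r - d)^2 with respect to coords[i] is 2(r - d) * diff / r.
            factor = 2.0 * (r - d) / r
            for k in range(3):
                grad[i][k] += factor * diff[k]
                grad[j][k] -= factor * diff[k]
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= lr * g[k]
    return coords


# Toy example: a 5-residue template, aligned one-to-one with the target.
template = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
            [3.0, 6.0, 1.5], [0.5, 5.0, 3.0]]
restraints = restraints_from_template(template, {t: t for t in range(5)})
rng = random.Random(0)
start = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(5)]
model = satisfy_restraints(start, restraints)
print(violation(start, restraints), "->", violation(model, restraints))
```

Distances alone cannot distinguish a structure from its mirror image, which is one reason practical methods also restrain angles, dihedrals, and stereochemistry.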

Evaluation of Model Quality

• Check for proper protein stereochemistry
  – ProCheck (http://biotech.ebi.ac.uk:8400/cgi-bin/sendquery)
    • Ramachandran plot, bond lengths, …
  – Whatif (http://www.cmbi.kun.nl/gv/servers/WIWWWI)

• Packing quality
  – Both web servers

• Fitness of sequence to structure
  – ProsaII (http://lore.came.sbg.ac.at/Services/prosa.html)
    • Program runs on Linux and Unix
  – Verify3D (http://www.doe-mbi.ucla.edu/Services/Verify_3D/)
    • Web server

Evaluating the 3D Model: Procheck

• Ramachandran plot
• Planar peptide bonds
• Side-chain conformations that correspond to those in the rotamer library
• Hydrogen bonding
• No bad atom-atom contacts
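
The Ramachandran plot is built from the backbone dihedral angles phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue) of each residue. The Python sketch below shows how those angles can be computed from atomic coordinates. The function names and the dictionary layout of the backbone atoms are assumptions for illustration; Procheck itself reads coordinate files and also classifies residues into allowed and disallowed regions, which this sketch does not attempt.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = [x / n for x in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: List[Dict[str, Vec]]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue from a list of {'N', 'CA', 'C'} coordinate dicts.

    phi is undefined for the first residue and psi for the last (None).
    """
    angles = []
    for i, res in enumerate(backbone):
        phi = (dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
               if i < len(backbone) - 1 else None)
        angles.append((phi, psi))
    return angles


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # ~0 (cis)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # ~180 (trans)
```

Plotting psi against phi for every residue gives the Ramachandran plot; residues far from the populated regions flag possible errors in the model.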

Evaluating the 3D Model: 3D-Profiler (Verify 3D)

• Based on the statistical preferences of each of the 20 amino acids for particular environments within a protein

• Residue positions characterized by environment
• Preferred environments defined by three parameters
  – Area of each residue that is buried
  – Fraction of side-chain area that is covered by polar atoms (i.e., O and N)
  – Local secondary structure
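
The environment-based scoring can be sketched in Python in three steps: classify each residue's environment from the three parameters, look up a score for that residue type in that environment, and smooth the per-residue scores along the chain with a sliding-window average. The class labels, the numeric cut-offs, the window size, and the score table below are hypothetical placeholders; the published 3D-profile method defines its own environment classes and derives its scores from statistics over known structures.

```python
from typing import Dict, List, Sequence, Tuple

Env = Tuple[str, str]

# Hypothetical cut-offs for illustration only (buried area in square angstroms).
PARTIAL_BURIAL, FULL_BURIAL = 40.0, 114.0
POLAR_CUTOFF = 0.45


def environment_class(buried_area: float, polar_fraction: float, secondary: str) -> Env:
    """Map (buried area, polar fraction, secondary structure) to a coarse label.

    secondary is assumed to be 'H' (helix), 'E' (strand) or 'C' (other).
    """
    if buried_area < PARTIAL_BURIAL:
        burial = "exposed"
    else:
        burial = "buried" if buried_area >= FULL_BURIAL else "partly_buried"
        # Split non-exposed sites by how much of the side chain is covered by O/N atoms.
        burial += "_polar" if polar_fraction >= POLAR_CUTOFF else "_apolar"
    return burial, secondary


def profile_scores(sequence: str, envs: Sequence[Env],
                   table: Dict[Tuple[str, Env], float]) -> List[float]:
    """Per-residue score of each amino acid in its environment (0 if not tabulated)."""
    return [table.get((aa, env), 0.0) for aa, env in zip(sequence, envs)]


def window_average(scores: Sequence[float], window: int = 21) -> List[float]:
    """Average each score with its neighbours; windows are truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


# Made-up table: leucine favours buried apolar helix sites, lysine favours exposure.
table = {("L", ("buried_apolar", "H")): 1.0, ("K", ("exposed", "H")): 0.8,
         ("K", ("buried_apolar", "H")): -1.0}
envs = [environment_class(150.0, 0.1, "H"), environment_class(10.0, 0.6, "H")]
print(profile_scores("LK", envs, table))   # [1.0, 0.8]
print(profile_scores("KL", envs, table))   # [-1.0, 0.0]
```

A misfolded or misaligned region tends to show up as a stretch of low averaged scores, because its residues sit in environments they rarely occupy in real proteins.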

Refining the 3D Model

• Molecular dynamics (MD) and energy minimization
• Application of restraints based on experimental data (e.g., NMR, fluorescence)
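
Experimental data usually enter refinement as extra energy terms added to the force field. One common form, sketched below in Python, is a flat-bottom harmonic restraint of the kind often used for NMR distance data: no penalty while the model distance lies between a lower and an upper bound, and a quadratic penalty outside that range. The function names, force constant, and bounds are illustrative assumptions, not values from any particular package.

```python
from typing import Sequence, Tuple


def flat_bottom_restraint(distance: float, lower: float, upper: float,
                          k: float = 10.0) -> float:
    """Penalty for a distance restraint with bounds [lower, upper] (0 inside)."""
    if distance < lower:
        return k * (lower - distance) ** 2
    if distance > upper:
        return k * (distance - upper) ** 2
    return 0.0


def restrained_energy(force_field_energy: float, distances: Sequence[float],
                      bounds: Sequence[Tuple[float, float]], k: float = 10.0) -> float:
    """Total energy = force-field term + sum of experimental restraint penalties."""
    return force_field_energy + sum(
        flat_bottom_restraint(d, lo, hi, k) for d, (lo, hi) in zip(distances, bounds))


print(flat_bottom_restraint(4.0, 1.8, 5.0))  # 0.0 (restraint satisfied)
print(flat_bottom_restraint(6.0, 1.8, 5.0))  # 10.0
```

Minimizing `restrained_energy` rather than the force-field energy alone pulls the model toward geometries consistent with the measurements without forcing exact distances.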

Applications of the Model
