Computational Biology, Part 10: Protein Structure Prediction and Display
Robert F. Murphy
Copyright 1996, 1999, 2001. All rights reserved.

Goal

Take the primary structure (sequence) and, using rules derived from known structures, predict the secondary structure most likely to be adopted by each residue.

Structural Propensities

Due to the size, shape and charge of its side chain, each amino acid may “fit” better in one type of secondary structure than another.

Classic example: the rigidity and side-chain angle of proline cannot be accommodated in an α-helical structure.

Structural Propensities

There are two ways to view the significance of this preference (or propensity):

It may control or affect the folding of the protein in its immediate vicinity (amino acid determines structure).

It may constitute selective pressure to use particular amino acids in regions that must have a particular structure (structure determines amino acid).

Secondary structure prediction

In either case, amino acid propensities should be useful for predicting secondary structure.

Two classical methods use previously determined propensities: Chou-Fasman and Garnier-Osguthorpe-Robson.

Chou-Fasman method

Uses a table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy.

The table consists of one “likelihood” for each structure for each amino acid.
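
One common way to turn residue-level structure annotations into such a propensity is to divide how often an amino acid occurs within a given structure by how often it occurs overall, so that values above 1 mean over-representation. The sketch below assumes that definition and a hypothetical one-letter state string (H = helix, E = sheet, C = coil); it is not necessarily the exact procedure Chou and Fasman used to build their table.

```python
from collections import Counter

def propensities(sequence, states):
    """Propensity P(aa, s) = (fraction of state-s residues that are aa)
    divided by (fraction of all residues that are aa)."""
    if len(sequence) != len(states):
        raise ValueError("sequence and state string must be the same length")
    total = len(sequence)
    aa_counts = Counter(sequence)             # occurrences of each amino acid
    state_counts = Counter(states)            # residues assigned to each state
    pair_counts = Counter(zip(sequence, states))
    table = {}
    for (aa, state), n in pair_counts.items():
        frac_in_state = n / state_counts[state]
        frac_overall = aa_counts[aa] / total
        table[(aa, state)] = frac_in_state / frac_overall
    return table

# Hypothetical toy annotation, for illustration only
table = propensities("AAMMVVGG", "HHHHEECC")
print(table[("A", "H")], table[("V", "E")])
```

A real table would be estimated from many proteins; with a single toy sequence the values are only illustrative.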

Chou-Fasman propensities (partial table)

Amino acid   Pα (helix)   Pβ (sheet)   Pt (turn)
Glu          1.51         0.37         0.74
Met          1.45         1.05         0.60
Ala          1.42         0.83         0.66
Val          1.06         1.70         0.50
Ile          1.08         1.60         0.50
Tyr          0.69         1.47         1.14
Pro          0.57         0.55         1.52
Gly          0.57         0.75         1.56

Chou-Fasman method

A prediction is made for each type of structure for each amino acid.

This can result in ambiguity if a region has high propensities for both helix and sheet (the higher value is usually chosen, with exceptions).
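
The “higher value usually chosen” rule can be sketched by averaging Pα and Pβ over a region and keeping the larger. This minimal sketch uses only the eight residues in the partial table above and ignores the exceptions the slide mentions; a tie is reported as sheet, which is an arbitrary choice.

```python
# Partial Chou-Fasman table from the slide: (P_alpha, P_beta, P_turn)
CHOU_FASMAN = {
    "E": (1.51, 0.37, 0.74),  # Glu
    "M": (1.45, 1.05, 0.60),  # Met
    "A": (1.42, 0.83, 0.66),  # Ala
    "V": (1.06, 1.70, 0.50),  # Val
    "I": (1.08, 1.60, 0.50),  # Ile
    "Y": (0.69, 1.47, 1.14),  # Tyr
    "P": (0.57, 0.55, 1.52),  # Pro
    "G": (0.57, 0.75, 1.56),  # Gly
}

def region_call(region):
    """Average P_alpha and P_beta over a region and return the larger call.

    Only residues present in the partial table are supported.
    """
    n = len(region)
    avg_alpha = sum(CHOU_FASMAN[aa][0] for aa in region) / n
    avg_beta = sum(CHOU_FASMAN[aa][1] for aa in region) / n
    call = "helix" if avg_alpha > avg_beta else "sheet"
    return call, avg_alpha, avg_beta

print(region_call("EMAV"))
print(region_call("VIYV"))
```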

Chou-Fasman method

Calculation rules are somewhat ad hoc. For example, the method for helix is:

Search for a nucleating region where 4 out of 6 a.a. have Pα > 1.03.

Extend until 4 consecutive a.a. have an average Pα < 1.00.

If the region is at least 6 a.a. long, has an average Pα > 1.03, and has average Pα > average Pβ, consider the region to be helix.
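
The three helix steps can be sketched as follows. This is one reading of the rules, with assumptions: extension is tested one residue at a time in each direction using the 4-residue window that includes the candidate residue, overlapping helices are not allowed, and residues missing from the partial table get a hypothetical neutral value of 1.00 for both Pα and Pβ.

```python
# (P_alpha, P_beta) from the partial table on the slide
PARTIAL = {"E": (1.51, 0.37), "M": (1.45, 1.05), "A": (1.42, 0.83),
           "V": (1.06, 1.70), "I": (1.08, 1.60), "Y": (0.69, 1.47),
           "P": (0.57, 0.55), "G": (0.57, 0.75)}
DEFAULT = (1.00, 1.00)  # hypothetical value for residues not in the table

def p_alpha(aa):
    return PARTIAL.get(aa, DEFAULT)[0]

def p_beta(aa):
    return PARTIAL.get(aa, DEFAULT)[1]

def avg(values):
    return sum(values) / len(values)

def predict_helices(seq):
    """Return (start, end) half-open index ranges predicted as helix."""
    n, helices, i = len(seq), [], 0
    while i + 6 <= n:
        # 1. Nucleation: at least 4 of 6 residues with P_alpha > 1.03
        if sum(p_alpha(aa) > 1.03 for aa in seq[i:i + 6]) >= 4:
            start, end = i, i + 6
            floor = helices[-1][1] if helices else 0
            # 2. Extend until a 4-residue window averages P_alpha < 1.00
            while end < n and avg([p_alpha(aa) for aa in seq[end - 3:end + 1]]) >= 1.00:
                end += 1
            while start > floor and avg([p_alpha(aa) for aa in seq[start - 1:start + 3]]) >= 1.00:
                start -= 1
            region = seq[start:end]
            mean_a = avg([p_alpha(aa) for aa in region])
            mean_b = avg([p_beta(aa) for aa in region])
            # 3. Accept the region as helix only if all three conditions hold
            if len(region) >= 6 and mean_a > 1.03 and mean_a > mean_b:
                helices.append((start, end))
                i = end
                continue
        i += 1
    return helices

print(predict_helices("GGGGEEMMAAGGGG"))
```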

Garnier-Osguthorpe-Robson

Uses a table of propensities calculated primarily from structures determined by X-ray crystallography.

The table consists of one “likelihood” for each structure for each amino acid for each position in a 17-amino-acid window.

Garnier-Osguthorpe-Robson

Analogous to searching for “features” with a 17-amino-acid-wide frequency matrix.

One matrix for each “feature”: α-helix, β-sheet, turn, coil.

The highest-scoring “feature” is found at each location.
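
The window scoring can be sketched as: for each residue, sum one matrix entry per window position (offsets -8 to +8) for each feature, then keep the highest-scoring feature. The matrices below are hypothetical toy values, not the published GOR parameters, and positions that fall off the end of the chain are simply skipped.

```python
HALF_WIDTH = 8  # 17-residue window: offsets -8..+8

def gor_predict(seq, matrices):
    """Assign each residue the state whose summed window score is highest.

    matrices[state] maps (offset, amino_acid) -> score; missing entries
    score 0, and window positions outside the chain are skipped.
    """
    prediction = []
    for i in range(len(seq)):
        best_state, best_score = None, float("-inf")
        for state, matrix in matrices.items():
            score = 0.0
            for offset in range(-HALF_WIDTH, HALF_WIDTH + 1):
                j = i + offset
                if 0 <= j < len(seq):
                    score += matrix.get((offset, seq[j]), 0.0)
            if score > best_score:  # ties keep the earlier state
                best_state, best_score = state, score
        prediction.append(best_state)
    return "".join(prediction)

# HYPOTHETICAL toy matrices: helix (H), sheet (E), turn (T), coil (C)
TOY = {
    "H": {(k, "A"): 0.1 for k in range(-HALF_WIDTH, HALF_WIDTH + 1)},
    "E": {(k, "V"): 0.1 for k in range(-HALF_WIDTH, HALF_WIDTH + 1)},
    "T": {(0, "P"): 0.3},
    "C": {(0, "G"): 0.3},
}

print(gor_predict("AAAA", TOY), gor_predict("VVVV", TOY))
```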

Accuracy of predictions

Both methods are only about 55-65% accurate.

A major reason is that while they consider the local context of each sequence element, they do not consider the global context of the sequence: the type of protein.

The same amino acids may adopt a different conformation in a cytoplasmic protein than in a membrane protein.
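
A percentage accuracy like this is commonly computed per residue: the fraction of residues whose predicted state matches the observed one (often called Q3 for a three-state helix/sheet/coil alphabet). The slide does not say which measure underlies its 55-65% figure, so this is an assumption about the usual convention.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHEECCC", "HHHHEECC"))
```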

“Adaptive” methods

Neural network methods: train a network using sets of known proteins, then use it to predict for the query sequence (e.g., nnpredict).

Homology-based methods: predict structure using rules derived only from proteins homologous to the query sequence (e.g., SOPM, PHD).

Neural Network methods

A neural network with multiple layers is presented with known sequences and structures; the network is trained until it can predict those structures given those sequences.

This allows the network to adapt as needed (it can consider neighboring residues, like GOR).
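
To let a network consider neighboring residues, each position is usually presented as a window of residues turned into numbers. A common scheme (assumed here; the slides do not describe nnpredict's actual input) is one-hot encoding: 21 units per window position, one per amino acid plus one extra unit for positions off the chain or unrecognized residues. The 17-residue window is borrowed from GOR for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNITS = len(AMINO_ACIDS) + 1   # 20 amino acids + 1 spacer unit
SPACER = len(AMINO_ACIDS)      # index of the spacer unit

def encode_window(seq, center, half_width=8):
    """One-hot encode the window centred on `center` as a flat 0/1 list."""
    vector = []
    for k in range(center - half_width, center + half_width + 1):
        unit = [0] * UNITS
        if 0 <= k < len(seq) and seq[k] in AMINO_ACIDS:
            unit[AMINO_ACIDS.index(seq[k])] = 1
        else:
            unit[SPACER] = 1  # off the chain, or an unrecognized residue
        vector.extend(unit)
    return vector

x = encode_window("ACDEFG", 0)
print(len(x))
```

Each such vector would be one training input, paired with the observed state of the central residue as the target.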

Neural Network methods

Different networks can be created for different types of proteins.

Homology-based modeling

Principle: from the sequences of proteins whose structures are known, choose a subset that is similar to the query sequence.

Develop rules (e.g., train a network) for just this subset.

Use these rules to make a prediction for the query sequence.
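
The first step, choosing the similar subset, can be sketched with percent identity over sequences that have already been aligned to the query. The 30% cut-off and the protein names are hypothetical; tools such as SOPM and PHD use their own alignment and selection procedures.

```python
def percent_identity(a, b):
    """Identity over aligned positions of two equal-length, pre-aligned
    sequences; positions with a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def similar_subset(query, known, threshold=30.0):
    """Names of known-structure proteins at or above the identity threshold."""
    return [name for name, seq in known.items()
            if percent_identity(query, seq) >= threshold]

# Hypothetical pre-aligned sequences of known structure
known = {"p1": "ACDEFGHIK", "p2": "ACDQWRTYV", "p3": "WWWWWWWWW"}
print(similar_subset("ACDEFGHIK", known))
```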

Retrieving 3D structures

Protein Data Bank (PDB)
  using a web browser (home page = http://www.pdb.bnl.gov/)
  using anonymous FTP

Entrez
  using a web browser

BLAST
  using a web browser
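
Retrieval can also be scripted. The sketch below assumes the present-day RCSB PDB file service (https://files.rcsb.org/download/<ID>.pdb), which has replaced the pdb.bnl.gov address above; the function names are hypothetical helpers, and the download itself needs network access.

```python
import urllib.request

# Assumed current RCSB download pattern; the 1996-2001 address differs
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"

def pdb_url(pdb_id):
    """Build the download URL for a 4-character PDB identifier."""
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a 4-character PDB ID")
    return RCSB_DOWNLOAD.format(pdb_id.upper())

def fetch_pdb(pdb_id, path=None):
    """Download a PDB entry to a local file and return the file path."""
    path = path or f"{pdb_id.lower()}.pdb"
    with urllib.request.urlopen(pdb_url(pdb_id)) as response, open(path, "wb") as out:
        out.write(response.read())
    return path

print(pdb_url("1rat"))
```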

Displaying Structures with RasMol

The GIF image of Ribonuclease A is static: we cannot rotate the molecule or recolor portions of it to aid visualization.

For this we can use RasMol, a public domain program available for a wide range of computers, including MacOS, Windows and Unix.

Displaying Structures with RasMol

Drs. David Hackney and Will McClure have developed an online tutorial for RasMol; a link may be found on the 03-310, 03-311 and 03-510 web pages.

PDB files

In order to optimally display, rotate and color the 3D structure, we need to download a copy of the coordinates for each atom in the molecule to our local computer.

The most common format for storage and exchange of atomic coordinates for biological molecules is PDB file format.
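
Atom coordinates in a PDB file sit in fixed-column ATOM and HETATM records, so a reader can slice each line by column position. This minimal sketch keeps only a few fields (serial number, atom name, residue name, chain, residue number, x/y/z in ångströms) and ignores alternate locations and insertion codes.

```python
def parse_atoms(lines):
    """Extract basic fields from ATOM/HETATM records using PDB column positions."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "serial": int(line[6:11]),       # columns 7-11
                "name": line[12:16].strip(),     # columns 13-16
                "res_name": line[17:20].strip(), # columns 18-20
                "chain": line[21],               # column 22
                "res_seq": int(line[22:26]),     # columns 23-26
                "x": float(line[30:38]),         # columns 31-38
                "y": float(line[38:46]),         # columns 39-46
                "z": float(line[46:54]),         # columns 47-54
            })
    return atoms
```

Typical use is `parse_atoms(open("1rat.pdb"))`, since iterating over a file yields its lines.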

PDB files

PDB file format is a text (ASCII) format, with an extensive header that can be read and interpreted either by programs or by people.

We can request either the header only or the entire file; the next screen requests the header only.
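
Because the header records come before the coordinates, a downloaded file can also be split locally into header and coordinate parts. This sketch assumes the header is everything before the first ATOM, HETATM or MODEL record; the “header only” option on the PDB server may have defined it somewhat differently.

```python
COORDINATE_RECORDS = ("ATOM  ", "HETATM", "MODEL ")

def split_header(text):
    """Return (header_lines, remaining_lines) for the text of a PDB file."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(COORDINATE_RECORDS):
            return lines[:i], lines[i:]
    return lines, []  # no coordinate records found

example = "HEADER    HYDROLASE\nCOMPND    EXAMPLE\nATOM      1  N   LYS A   1\nEND"
header, rest = split_header(example)
print(len(header), len(rest))
```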

http://www.pdb.bnl.gov/pdb-bin/opdbshort

http://www.pdb.bnl.gov/pdb-bin/send-pdb?filename=1rat&short=1

http://www.pdb.bnl.gov/pdb-bin/opdbshort

RasMol has a graphics window and a command window

PDB Retrieval & Display

PDB files can be downloaded from Entrez.

Second example: display structures of MHC proteins containing β2-microglobulin.

Useful RasMol commands

show sequence: lists all amino acids in each chain

select *a: selects all residues in chain A

colour red: displays the selected residues in red
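
To see what the first two commands report, the same information can be derived from parsed atom records. This is an illustration of their effect, not RasMol's own code; it assumes atom dictionaries with hypothetical keys chain, res_seq and res_name, and it ignores insertion codes.

```python
def chain_sequences(atoms):
    """Residue names per chain in file order, one entry per residue
    (roughly what RasMol's 'show sequence' lists)."""
    sequences, seen = {}, set()
    for atom in atoms:
        key = (atom["chain"], atom["res_seq"])
        if key not in seen:
            seen.add(key)
            sequences.setdefault(atom["chain"], []).append(atom["res_name"])
    return sequences

def select_chain(atoms, chain):
    """Atoms belonging to one chain (the effect of 'select *a' for chain A)."""
    return [atom for atom in atoms if atom["chain"].upper() == chain.upper()]

atoms = [
    {"chain": "A", "res_seq": 1, "res_name": "LYS"},
    {"chain": "A", "res_seq": 1, "res_name": "LYS"},
    {"chain": "A", "res_seq": 2, "res_name": "GLU"},
    {"chain": "B", "res_seq": 1, "res_name": "ILE"},
]
print(chain_sequences(atoms))
```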

Alternatives to RasMol

NCBI (providers of the Entrez service) has developed a public domain 3D viewer for molecules, Cn3D (“See in 3D”).

It is integrated into the Network Entrez client and is also available as a stand-alone helper application.

Alternatives to RasMol

It is often useful for an investigator or teacher to be able to save a series of views of one or more molecules so that they can be replayed (creating a script for a “movie” with preprogrammed changes in rotation, color, etc.).

Two programs that do this are CHIME and MAGE.
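
The idea of a preprogrammed sequence of views can be sketched by generating a plain command script. The output below uses RasMol-style commands already named in this section (select, colour) plus load and rotate; whether a given viewer pauses or redraws between steps, and how CHIME or MAGE store their own scripts, is not covered here. The function and its parameters are hypothetical.

```python
def movie_script(pdb_file, chain_colours, step_degrees=30, steps=12):
    """Build a RasMol-style script: load, colour chains, then rotate in steps."""
    lines = [f"load pdb {pdb_file}"]
    for chain, colour in chain_colours.items():
        lines.append(f"select *{chain}")
        lines.append(f"colour {colour}")
    lines.append("select all")
    # With the defaults, 12 steps of 30 degrees make one full turn about y
    lines.extend(f"rotate y {step_degrees}" for _ in range(steps))
    return "\n".join(lines) + "\n"

print(movie_script("1rat.pdb", {"a": "red"}))
```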

Alternatives to RasMol

CHIME (derived from RasMol source) is available as a browser plug-in.

MAGE is available as a stand-alone helper application.

Information on both is available through links on a HELP page at the PDB.

http://www.pdb.bnl.gov/pdb-bin/opdbshort

Structural homology

For new proteins whose 3D structure is not known, it is useful to be able to find proteins whose 3D structure is known and that are expected to have a structure similar to the unknown.

For proteins whose 3D structure is known, it is also useful to be able to find other proteins with similar structures.

Finding proteins with known structures based on sequence homology

If you want to find known 3D structures of proteins that are similar in primary amino acid sequence to a particular sequence, you can use the BLAST web page and choose the PDB database.

This is not the PDB database of structures; rather, it is a database of amino acid sequences for those proteins in the structure database.

Links are available to retrieve PDB files.
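
The same search can be submitted without the web form through NCBI's BLAST URL interface. This sketch assumes the Common URL API parameters CMD=Put, PROGRAM, DATABASE and QUERY at https://blast.ncbi.nlm.nih.gov/Blast.cgi; it only builds the submission URL, and retrieving results (a later CMD=Get request using the returned request ID) is left out. The query sequence is a hypothetical placeholder.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"  # assumed endpoint

def blast_pdb_submit_url(sequence):
    """URL that submits a protein BLAST search against the PDB sequence database."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastp",   # protein query vs. protein database
        "DATABASE": "pdb",     # sequences of proteins in the structure database
        "QUERY": sequence,
    }
    return BLAST_URL + "?" + urlencode(params)

print(blast_pdb_submit_url("KETAAAKFERQHMDS"))
```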

Finding proteins with similar structures to a known protein

For literature and sequence databases, Entrez allows neighbors to be found for a selected entry based on “homology” in terms (MEDLINE database) or in sequence (protein and nucleic acid sequence databases).

An experimental feature allows neighbors to be chosen for entries in the structure database.

Finding proteins with similar structures to a known protein

Proteins with similar structures are termed “VAST Neighbors” by Entrez (VAST refers to the method used to evaluate similarity of structure).

VAST or structure neighbors may or may not have sequence homology to each other.
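
As a simple illustration of scoring structural similarity, the root-mean-square deviation (RMSD) between matched atoms is widely used. This is not how VAST works; it is a swapped-in, simpler measure, and the sketch assumes the two coordinate lists are already paired atom-for-atom and optimally superimposed (finding that superposition is a separate step).

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates,
    assumed to be already matched and superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))

print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```

Because such a score depends only on coordinates, two proteins can score as close structural neighbors even when their sequences share little identity.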