NAME:_______________________
COLLEGE:________________________
MOLECULAR GRAPHICS & STRUCTURAL
BIOINFORMATICS PRACTICAL CLASS – MT 2011
Dr. Simon Newstead
Dr. Shabana Vohra
& other demonstrators
A digital version of this document is available at:
http://sbcb.bioch.ox.ac.uk/teaching/BioPrac_2012.pdf
COMMENTS:
GRADE:
INTRODUCTION
Bioinformatics is central to modern biochemistry, involving the use of computers to model and
analyse data. Its subdivisions include sequence analysis and searching, and protein structure
analysis and prediction. Structural bioinformatics is mainly concerned with the
computational analysis of information on protein structures. This practical is intended to provide you
with a "feel" for some basic aspects of bioinformatics. In a relatively short period of time it is not
possible to provide an exhaustive coverage of all aspects of the subject. However, we hope to convey
some of the excitement and interest associated with work in this field.
The aims of this practical are to introduce you to:
A simple molecular graphics program, PyMOL, that enables you to display and analyse
protein structures
Use of web based techniques to analyse protein structural properties
During this process you will analyse several protein structures. This will provide an introduction to the
diversity of protein structures. The practical is open-ended, in that you will be provided with
information on how to access the structure of any protein in which you are interested (provided that it
has been determined!).
A central aspect of this practical is that you are meant to learn how to use the programs by
experimenting with them, i.e. we will not tell you exactly what to do at each stage. This is an
important aspect of your scientific training. We will provide you with the resources and the
information but then you are expected to use your intelligence and biochemical knowledge to
fully exploit these resources.
OVERVIEW
There are two stages to this practical:
Use of PyMOL in a Windows environment to visualise and analyse protein structure.
Use of advanced graphics and molecular modelling tools for the analysis of protein structure
and function
This year we have timetabled an introductory lecture in week 4 that will describe the aims of
the practical and introduce you to the graphics programs we will use. We hope that during this week
you will be able to download the programs, PyMOL in particular, and experiment with them before the
practical. The computing laboratory will be available for 4 sessions. Demonstrators will be available
to assist you for the first 2 days of the practical, so you are encouraged to raise any issues with
the material in those first days. The remaining 2 sessions will be supervised by a single
demonstrator and are for finishing up the work.
REPORT WRITING
You are expected to complete each section of this worksheet, with printed diagrams and hand drawn
schematics where appropriate.
STAGE 1 - USING PyMOL FOR ANALYSING PROTEIN STRUCTURE
PYMOL BASICS.
Before we can examine a protein structure, we need a file containing atomic coordinates, i.e. the XYZ
coordinates of each atom within the protein molecule. Such coordinate files may be written in several
different formats. We will use the standard format for proteins, known as 'PDB' (Protein DataBank)
format. All proteins for which the structures have been determined have their coordinates deposited in
a computer database, maintained in the USA. This is accessible via the WWW (at www.rcsb.org).
We will illustrate the basics of using PyMOL with a protein that should be familiar to you, namely
bacteriorhodopsin. This is stored in PDB entry 3HAP (note the format of the entry code for a protein:
a digit followed by three letters or digits; in older entries the letters are often related in a
more or less obvious way to the name of the protein). You need to download this from www.rcsb.org;
choose the “text” option rather than an archive.
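To make the idea of a coordinate file concrete, the short Python sketch below pulls the XYZ
coordinates out of a single ATOM record using the fixed column positions of the PDB format. The
function name parse_atom_line and the sample values are illustrative assumptions; they are not part
of PyMOL and are not taken from the 3HAP entry.

```python
def parse_atom_line(line):
    """Extract a few fields from one PDB ATOM/HETATM record.

    PDB is a fixed-column format, so fields are sliced by position
    rather than split on whitespace.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "resi": int(line[22:26]),        # columns 23-26: residue number
        "xyz": (float(line[30:38]),      # columns 31-38: x in angstroms
                float(line[38:46]),      # columns 39-46: y
                float(line[46:54])),     # columns 47-54: z
    }


# A made-up record, built with explicit field widths so the columns line up.
sample = "{:<6s}{:5d} {:<4s} {:>3s} {}{:4d}{:4s}{:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " CA", "THR", "A", 5, "", 24.266, 25.413, 2.842)
print(parse_atom_line(sample))
```

Lines that are not ATOM or HETATM records (headers, remarks, and so on) return None, so the same
function can be applied to every line of a downloaded file.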
Start PyMOL from the ‘teaching software’ folder. PyMOL normally starts with two windows: the
External GUI Window and the Viewer Window. The external window consists of menus, buttons and
text boxes for manipulating the structure in the viewer. The viewer window displays all 3D
graphics, and direct user interaction with 3D models is done through the viewer. The viewer
window has an internal GUI, which allows you to perform actions on specific objects and specific
atom selections. It contains an object list, a mouse button configuration matrix, a frame indicator,
and a set of "VCR"-like controls for working with movies. The viewer also has a command line at
the bottom.
The following provides some information on the viewer and the commands available in PyMOL. Some
aspects will be demonstrated at the start of the practical. Note that help on PyMOL can be obtained by
opening the PyMOL manual webpages. PyMOL's quick demo, accessible through the built-in
Wizard menu, gets users started with all of the standard representations.
Displaying Coordinates
There are different ways of displaying coordinates. This will be demonstrated at the start of the
practical class. We will explore how to load a PDB file containing the XYZ coordinates of a protein
molecule, and how to select and display different components of the structure in different fashions.
Now use FILE > OPEN to open the pdb file, i.e. 3HAP.pdb.
When you load the molecule in PyMOL, it is displayed in the viewer and its name appears
in the object list. The file name is the default object name; you can rename the object if you want.
The coordinates are displayed in the default line representation. Other representations are cartoons,
ribbons, dots, spheres, surfaces, and meshes. You can change the representation to ribbons or cartoons
by selecting ‘ribbon’ or ‘cartoon’ in the ‘Show’ menu. To get rid of the lines, select ‘lines’ in the
‘Hide’ menu on the control panel. You can rotate (left), zoom (right) and move (centre) the molecule
with the mouse buttons.
You can perform the above task using commands:
Load molecule:
PyMOL> load <path to file><filename>
It will display a representation of the object in the viewer, and add the object's name to the control
panel.
PyMOL> show cartoon
PyMOL> hide lines
This changes the representation to cartoon and hides the lines.
Atom Selection
An important aspect is to select a subsection of a protein to examine in more detail. For example, you
might wish only to examine the protein backbone, or to look at just one domain of a multi-domain
protein. If you want to manipulate a subset of the atoms and bonds in a molecule, you can use atom
selections. You can select particular residues or atoms in a binding pocket, or hydrophobic
residues, or all the alanines in a helix, and so on. You can create a selection and name it to make it
easier to use again later. Selection-expressions range from single words to long complicated
expressions and are stored as objects. The default selection-expression is all, which refers to all the
atoms that are currently loaded. If a selection-expression is missing, PyMOL will apply the
command to all.
PyMOL> select selection-name, selection-expression
PyMOL> select nterm, resi 1-10
PyMOL> zoom nterm
PyMOL> show spheres, nterm
When you create a selection-name, PyMOL puts it in the control panel so you can apply control
panel functions to the selection using your mouse. Selections are manipulated like PyMOL objects
and are shown in parentheses. You can modify a selection's colour or representation and this will
affect only the selected region. If you name the selection, you will be able to manipulate it any
number of times.
Here are some widely used expressions:

Expression              Interpretation
*                       All atoms
resn cys                All cysteines
chain A                 Chain A
segi lig                Segment lig
resi 8+12+16+20-28      Residues 8, 12, 16 and 20-28
resn arg+his+lys        All Arg, His and Lys residues
ss h                    All α-helices
ss s                    All β-sheet residues
ss l                    All loops and turns
ss ""                   All residues with no assigned secondary structure
name n+ca+c+o           Protein backbone
name ca                 Alpha carbons
hetatm                  All atoms from HETATM records (non-protein atoms)
hydro                   Hydrogen atoms
resn asp+glu            Acidic residues
resn lys+his+arg        Basic residues
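The residue-list syntax used with resi can be read as "+ joins items, a-b is an inclusive range".
The Python sketch below expands such an expression into explicit residue numbers. It is only an
illustration of how to read the notation; expand_resi is a hypothetical helper, not a PyMOL
function, and PyMOL's own parser handles more cases.

```python
def expand_resi(expression):
    """Expand a PyMOL-style residue list such as '8+12+16+20-28'.

    '+' separates items and 'a-b' denotes an inclusive range. Negative
    residue numbers and insertion codes are not handled in this sketch.
    """
    residues = []
    for item in expression.split("+"):
        if "-" in item:
            start, end = item.split("-")
            residues.extend(range(int(start), int(end) + 1))
        else:
            residues.append(int(item))
    return residues


print(expand_resi("8+12+16+20-28"))
```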
Coloring
You can apply various colors to selections and objects using typed commands or pull-down menu
on the control panel. See the menu titled "Settings" to find out more about representations and
colors.
PyMOL> color color-name
PyMOL> color color-name, selection-expression
PyMOL> color yellow, resn cys # color cysteine yellow
PyMOL> color red, ss h # color helices red
PyMOL> color yellow, ss s # color beta sheet residues yellow
PyMOL> color green, ss l # color loops green
You can set the colour of the selections and of the background using the graphical user interface.
There are also 'predefined' colour schemes (via the drop-down menu), namely element, chain, secondary
structure and spectrum. Colouring by secondary structure colours α-helices red, β-sheets yellow, and
turns and all other residues green. You can also colour on the basis of crystallographic temperature
factors.
Labeling Atoms
If you right click on an atom it will show its identity and you can change its properties and label it.
You can use the button in the drop down menu to label it by different properties.
Saving the session and images
If you want to be able to return to the current state of PyMOL, then you can create a session-file.
Choose "Save Session" in the "File" menu and respond to the dialog box by naming the file with a
".pse" file-name extension. When you open the saved session-file, PyMOL's memory returns to the
state that was saved.
You can write a graphics file to store the image you have created in the viewer. To save an image
to a file, use the "Save Image" option in the "File" menu or type the png command:
PyMOL> png file-name
Printing
To print you need to 'save' your image in a graphics file, which will then be printed. Set the
background to white for clarity and to avoid wasting toner.
A WELL KNOWN EXAMPLE - BACTERIORHODOPSIN.
When analysing the structure of a protein molecule, the degree of detail included in the representation
must be appropriate to the level of the analysis. For example, analysis of the mechanism of a
photosynthetic membrane protein requires a more detailed representation than when comparing the
backbone folds of different proteins. We will now examine different levels of representation, again
using 3HAP (bacteriorhodopsin) as an example. So, as before, load 3HAP into PyMOL.
Use the various options in PyMOL to display bacteriorhodopsin in the following fashion:
Display the protein as a CARTOON colored by secondary structure whilst displaying the
retinal group in sticks and coloured by element and other hetero atoms in spheres coloured
green. You can remove waters if any. This will test your understanding of the select and
display options. You will need to experiment to get this to work!
1. What is the dominant secondary structure of this protein?
a. …………………………………………………………………………………………
2. What is the importance of the bound lipid elements in the protein’s structure and
biological function?
a. …………………………………………………………………………………………
3. How do the protein’s surface properties reflect both the bound
molecules and its natural environment?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
4. Describe the environment of the bound retinal
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
………………………………………………………………………………………
5. Draw by hand a schematic of the protein structure, highlighting functionally
important features
a.
…………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
………………………
…………………………………………………………………………………………
A β-BARREL MEMBRANE PROTEIN - OMPA.
This protein consists almost entirely of β-sheet, and is found in the outer membrane of E. coli. As for
bacteriorhodopsin, you need to download the coordinates from www.rcsb.org. Download and open the file
1bxw.
Select the backbone and colour on structure. Identify the central β-barrel structure. Analyse the
secondary structure, in terms of the first and last residues of each secondary structure element, to
produce a secondary structure diagram. Use arrows for β-strands (cylinders for α-helices, but not in
this protein!) and lines for loops. Label the diagram with the start and end residues and label the
strands of the β-barrel as S1, S2, … etc.
6. Draw by hand a schematic diagram of the protein’s topology. Hint:- a β-hairpin can
be drawn as flat arrows connected by loops; you might find it helpful to imagine
cutting open the β-barrel between the first and last strand, drawing it as if laid flat
on a surface. On this diagram label the strands S1 to S?? identified above.
7. Based on the patterns of hydrophobic residues on the protein surface how would you
expect the protein to orient in the membrane?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
8. Is this an antiparallel or parallel β barrel? What is the difference?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
9. What is the angle of the strands with respect to the pore axis? What is the angle of
the hydrogen bonds between the strands with respect to the axis?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
10. Where are the waters in the structure? What is their significance in this structure?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
11. There is a detergent molecule in the structure. Where did this come from and what is
the implication of its presence?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
STAGE 2 – ADVANCED TECHNIQUES IN MOLECULAR MODELLING
For each of the following methods, pick one of the following structures to apply the methods
to (name/pdbid)
MSCL 2OAR (elNemo)
KCSA 1K4C (elNemo)
RHODOPSIN 1U19
SOPIP2 1Z98
AMTB 1U77
SECY 1RHZ
H+/CL- TRANSPORTER 1KPL
BTUCD 1L7V
You may wish to start by running the elastic network modelling section in the background, and do
other sections whilst the calculations are running. Ensure that you download the biological unit,
not the asymmetric unit for multimers.
Elastic Network Modelling
The elastic network model is a commonly used technique in molecular modelling for calculating
the dynamics of proteins. It is a “coarse-grain” technique: it employs a highly abstract model to
speed up the calculations. In contrast to well-established methods such as atomistic molecular
dynamics, which can take days, the elastic network model typically takes less than an hour to run.
Here you will use a well-known web server, ElNémo, to calculate the normal modes of your protein.
You may find the server is offline or busy; if this is the case you will need to come back and attempt
this when it is available.
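As a rough illustration of the coarse-graining involved, the Python sketch below builds the
connectivity (Kirchhoff) matrix of a simple one-bead-per-residue network, in which any two beads
closer than a cutoff are joined by an identical spring. This is a toy under stated assumptions, not
ElNémo's implementation: the function name, the coordinates and the cutoff values are all
illustrative.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix for a one-bead-per-residue network.

    Off-diagonal entries are -1 for pairs closer than `cutoff` (in
    angstroms) and 0 otherwise; each diagonal entry is the number of
    springs attached to that bead.
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                matrix[i][j] = matrix[j][i] = -1
                matrix[i][i] += 1
                matrix[j][j] += 1
    return matrix


# Three hypothetical C-alpha positions 3.8 angstroms apart along a line.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
for row in kirchhoff_matrix(toy, cutoff=5.0):
    print(row)
```

Normal modes are obtained from the eigenvectors of a matrix of this kind (or of the related Hessian
in anisotropic models); that step needs a linear-algebra library and is omitted here.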
Download your protein of choice to the desktop and submit the structure to the service at
http://www.igs.cnrs-mrs.fr/elnemo/ to calculate the protein dynamics. Visualise the “modes” of
motion using the provided animations. The following reference may be helpful:
Atilgan et al. Anisotropy of fluctuation dynamics of proteins with an elastic network model.
Biophys J (2001) vol. 80 (1) pp. 505-515
1. Briefly describe the physical basis behind the elastic network model (you may wish
to read one of the papers listed on the website to answer this).
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
2. Draw by hand a schematic of the 3 most dominant modes of motion. How do they relate
to function? Your schematic should be a recognisable sketch of the motions of the
protein, with only functionally or dynamically important details included
a.
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
3. How well does the model predict the crystallographic B factors? Suggest some
reasons it might not predict them well.
a. …………………………………………………………………………………………
…………………………………………………………………………………………
………………………………………………………………………………………
Transmembrane Helix Prediction
Get the sequence of your selected protein (available from the PDB) in the FASTA format (a single
letter code).
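For reference, a FASTA record is a header line beginning with ">" followed by one or more lines of
single-letter sequence. The Python sketch below reads such text into a dictionary; parse_fasta and
the two demo records are hypothetical and serve only to show the layout of the format.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text.

    A line starting with '>' begins a new record; all following lines
    up to the next '>' are joined into one sequence string.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Hypothetical two-record example, not a real protein.
example = """>demo1
MKTAYIAK
QRQISFVK
>demo2
MLSAGLGL
"""
print(parse_fasta(example))
```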
Cut and paste the sequence into each of the transmembrane (TM) helix prediction programs. Run and
print the results.
The TM helix prediction programs to use are:
TMHMM – works via an advanced pattern search method (‘hidden Markov model’)
http://www.cbs.dtu.dk/services/TMHMM-2.0/
TopPred – uses a first generation method (hydrophobicity profile based)
http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
DAS – compares your sequence against a large database of known TM helices
http://www.sbc.su.se/~miklos/DAS/
In each case fill in the window with your sequence and follow the instructions on the webpage.
(All of these can also be accessed from the MSPS group website…
http://sbcb.bioch.ox.ac.uk/links.php#Databases_and_Servers
then scroll up to Structure Prediction)
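The hydrophobicity-profile idea behind the first-generation methods can be sketched as a
sliding-window average over the sequence. In the Python sketch below, the Kyte-Doolittle scale, the
19-residue window and the 1.6 threshold are assumptions chosen for illustration; TopPred's own
scale and default parameters may differ, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_windows(sequence, window=19, threshold=1.6):
    """Start positions (1-based) of windows scoring above `threshold`."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# Made-up sequence: a hydrophobic stretch flanked by charged residues.
print(candidate_windows("DDDDDLLLLLDDDDD", window=5))
```

Runs of consecutive high-scoring windows mark candidate TM segments. A profile of this kind carries
no information about topology, which is one way the more advanced methods differ.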
1. Each of these methods calculates the likelihood of a given sequence containing TM
helices. How do they differ in technique? How does this change the results you get back
and their interpretation?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
2. How well do the predictions match up with the experimentally determined structure?
Why might the different techniques fail?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
3. How similar are the predictions, and are there regions where they consistently disagree? Why
do you think this is?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
4. Do you think that the oligomerisation state has an influence on the results? Does the
function of the helix in the protein mechanism influence the result?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
5. Attach the resulting plots below & overleaf, with an annotated picture of the structure
showing the differences between prediction and experimental result.
Homology Search
The use of evolutionary information is an integral part of comparative modelling, and a database
search is usually one of the first steps undertaken when trying to obtain structural and functional
information relating to a novel target sequence. Two sequences are said to be homologous if they
derive from a common ancestor. Identification of homologous sequences is usually performed via a
database search with an algorithm such as BLAST or FASTA, which detects sequences that share
significant similarity with the query. The Basic Local Alignment Search Tool (BLAST) is one of the
most common methods for finding homologous sequences in a sequence database. The BLAST algorithm
finds high-scoring segment pairs (HSPs) between the query sequence and each database sequence, and
if an HSP's score exceeds a cut-off threshold, it is reported as a hit. BLAST uses substitution
matrices such as PAM and BLOSUM for comparison.
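BLAST begins by locating short word matches ("seeds") between the query and a database sequence,
which it then extends into HSPs. The Python sketch below shows only that seeding step, simplified
to exact matches; real protein BLAST also accepts similar, non-identical words scored with the
substitution matrix. The function name word_seeds and the two sequences are made up for
illustration.

```python
from collections import defaultdict


def word_seeds(query, subject, word_size=3):
    """Find exact shared words between two sequences.

    Returns (query_pos, subject_pos, word) tuples with 0-based positions.
    """
    # Index every word of the query by its starting position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        word = subject[j:j + word_size]
        for i in index.get(word, []):
            seeds.append((i, j, word))
    return seeds


# Two short made-up sequences.
print(word_seeds("MKVLAAG", "TTKVLAG"))
```

A smaller word size yields more seeds, and so a more sensitive but slower search, which is why the
word size is one of the adjustable algorithm parameters.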
Go to http://www.ebi.ac.uk/Tools/sss/ and choose NCBI BLAST to perform the searches. Copy and
paste the sequence into the box provided. Run the query with default parameters against different
databases (UniProt Knowledgebase, UniProtKB/Swiss-Prot, and protein structure sequences) and
record the number of hits with an E-value less than 0.001. Then alter the algorithm parameters,
i.e. word size and matrix (BLOSUM45 and PAM70), and report the difference in the results (you may
have to set the score and alignment options to 1000). Perform the search using one of the
following sequences and answer the questions below. You can download the sequences using the
link http://sbcb.bioch.ox.ac.uk/teaching/sequence.doc
>seq1
MFLWLKCFCTLIIVTIAKNSSAKIPHCKYDETINISHFKRLNDAYIYEHFEIPANLTGEFDYKELMDGSKVPTEFPNLRGCICKVRPCIRICCARKNILSNGECSDGVKNEIKLTMLDLTMQDILLTDPTLAELNMIPQYNSTELLILREQFQPCDEIVSLKRDEYTILKDGSILLHTSAEILSNDQYCLYPEIYSDFPETIRIINRRCYRNVMPGIAQLSVISVVGFILTLAVYLSVEKLRNLLGKCLICSLFSMFMEYFIWTDYFRLLQSICSAAGYMKYFFSMSSYLWFSVVSFHLWELFTSLNRHEPQYRFLIYNTFVWCTAAIPTVVIFSMNQMWENDPGKSEWLPLVGYFGCSVKDWNSSSWFYHIPIVILNSFNVIMFVLTAIYIWKVKKGVKSFAQHDERNTTCLEFNVQTYIQFVRLFLIMGASWLLDQLTRLAEDSHLLLDTIVLNLTVYLNAAFGILIFVLLILKSTFKMIMER
>seq2
QCDNAKGLKAFYDAIKYGPNHLMVFGGVCPSVTSIIAESLQGWNLVQLSFAATTPVLADKKKYPYFFRTVPSDNAVNPAILKLLQHYQWRRVGTLTQDVQRFSEVRNDLTGVLYGEDIEISDTESFSNDPCTSVKKLKGNDVRIILGQFDQNMAAKVFCCAYEENMYGSKYQWIIPGWYEPSWWEQVHTEANSSRCLRKNLLAAMEGYIGVDFEPLSSKQIKTISGKTPQQYEREYNNKRSGVGPSKFHGYAYDGIWVIAKTLQRAMETLHASSRHQRIQDFNYTDHTLGRIILNAMNETNFFGVTGQVVFRNGERMGTIKFTQFQDSREVKVGEYNAVADTLEIINDTIRFQGSEPPKDKTIILEQLRKISLPLYSILSALTILGMIMASAFLFFNIKNRNQKLIKMSSPYMNNLIILGGMLSYASIFLFGLDGSFVSEKTFETLCTVRTWILTVGYTTAFGAMFAKTWRVHAIFKNVKMKKKIIKDQKLLVIVGGMLLIDLCILICWQAVDPLRRTVERYSMEPDPAGRDISIRPLLEHCENTHMTIWLGIVYAYKGLLMLFGCFLAWETRNVSIPALNDSKYIGMSVYNVGIMCIIGAAVSFLTRDQPNVQFCIVALVIIFCSTITLCLVFVPKLITLRTNPDAATQNRRFQFTQNQKKEDSKTSTSVTSVNQASTSRLEGLQSENHRLRMKITELDKDLEEVTMQLQDTPEKTTYIKQNHYQELNDILNLGNFTESTDGGKAVLKNHLDQNPQLQWNTTEPSRTCKDPIEDINSPEHIQRRLSLQLPILHHAYLPSIGGVDASCVSPCVSPTASPRHRHVPPSFRVMVSGL
>seq3
MKNTFSLISVFWFLKISIIFCHLSDPRCFWRIKDAKNDLGDKETYCFFSIYTKQGYVKNDYFSWNLDKKVTPKTNHLIFSVYLAMEEINKNGHILPNISLLVNIECGLELYGERTGLAFKSEEFIPNYYCRNHRKYLIVLTTPKWGVSTSLGPLLYISRVPELYCGHFHLLLNDNEQFPHLYQISPKDTSLPLAMVSLVVHFRWNWIGAIVTNDDHGIQFLSELRGEMQKHIVCLSVAIIIQTEKFMALKEFRMNYNKIAMSSATVVIVYGDKDSPIQFTLIMWKSEGIWRIWVSVSQFDMITVIGDFLLYSSTGSFIFSHQQSEISGFEKFIQTVHPSNYSSEFSLAKLWWTYFTCSLPPSNCKKLKNCPIKTVFKWLFMTPIGMSMSDISYNLYNAMYAVAHSVHEMLLQQVDIWSTNAGTELEFDSWKMFSILKTLKFVNPAGDLVNMNQNLKQDTEYDIFYIPNFQKYYGLKMKIGRFSGYLPSGQQLYMSKEMMEWATDMDQILPSICSMPCRPGLRKSPQEEKDICCFVCNPCPENEISNMTNMDQCVKCPEDQYANEDQTLCLQKVVDVLDYRDPLGKSLAGFALCFSVLTSIVLCVFLKHRESPIVKANNQTLSYVLLISLIFCFICSLLYIGHPTMFICILQQTAFAIAFTVAASTVLAKTITVILAFKITVPGRMRWLLVSGAPKYIIFVCTMIQLIFCGIWLGTSPPFVETDVHMTHGHIIIVCNKGSVIAFYCVLGYMGSVALASFTVAFLSRKLPDTFNEAKLLTFSMLVFCSVWITFIPVYHSTKGKTMVAVEVFCILASSAGLLLCIYAPKCYIILLRPQKNSFYKFRKPHSKSENIS
>seq4
MLSAGLGLLMLVAVVEFLIGLIGNGSLVVWSFREWIRKFNWSSYNLIILGLAGCRFLLQWLIILDLSLFPLFQSSRWLRYLSIFWVLVSQASLWFATFLSVFYCKKITTFDRPAYLWLKQRAYNLSLWCLLGYFIINLLLTVQIGLTFYHPPQGNSSIRYPFESWQYLYAFQLNSGSYLPLVVFLVSSGMLIVSLYTHHKKMKVHSAGRRDVRAKAHITALKSLGCFLLLHLVYIMASPFSITSKTYPPDLTSVFIWETLMAAYPSLHSLILIMGIPRVKQTCQKILWKTVCARRCWGP
>seq5
MNTEALTRGLLFLSLVLTGVPGNAAVICAFLSLVHRDGHLSPADAIVLHLASVNLMVVGVRCLLEVLATFEIQNVFDDTGCKAVIFIYRTSRSLSIWLTFVLSAYQCLCVAPPGSRWAAVRTLVTQYLAVIFLGLWLINTSMSVAPLLFAVGARNDSRLMQNAINVEFCFLSFPSRLSRDANGAAQVGRDVVPMSLMAMASLILLVFLYRHSRQVQGLRGGGQRSAERRAAITVVTLVTLYVLFYGVDNGLWVYTLTVPQTLGSSLNTDL
1. Report the number of hits in each case
Search program   BLAST                   BLAST                  BLAST
Database         UniProt Knowledgebase   UniProtKB/Swiss-Prot   Protein structure sequences
No. of hits
2. Do you observe any difference in results when you change the matrix?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
3. Did you get any significant hits when you run the query against the protein
structure sequence and what is the percentage identity between the target
sequence and the most significant hit?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
4. What is the significance of an E-value?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
………………………………………………………………………………………
Homology Modelling
Homology modelling is used to construct a three-dimensional structure of a protein from its
amino acid sequence, based on the known structure of a homologous protein.
Homology modelling involves three steps:
Identify homologous sequences with known structure
Align the unknown sequence with the homologous sequences
Construct a model based on the coordinates of the known structure
There are various tools available to develop, refine and evaluate a homology model. We are
going to use SWISS-MODEL. In SWISS-MODEL you can either provide the sequence you want to model
and let the program perform the homology search and build a model (automated mode), or
provide an alignment of the target and template sequences, from which SWISS-MODEL builds a
homology model (alignment mode).
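The quality of a homology model depends heavily on how similar the target is to the template, which
is usually summarised as a percentage identity taken from the alignment. The Python sketch below
shows one common way to compute it. The choice of denominator (columns with no gap) is an
assumption; other tools divide by the alignment length or by the shorter sequence, so reported
values are not always comparable. The function name and the aligned fragment are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percentage of identical residues over columns with no gap.

    Both strings must come from the same alignment and therefore have
    equal length.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical alignment fragment.
print(percent_identity("MKV-LAG", "MRVTLAG"))
```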
Use one of the following UniProt sequence IDs and build a homology model with SWISS-MODEL
(http://swissmodel.expasy.org/)
5HT7R_HUMAN
MC4R_HUMAN
ACM2_HUMAN
1. Did Swiss-Model generate more than one model? Would you prefer one over the
other and why?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
2. Which template structure was used to build the homology model? What was the
percentage identity of the target sequence with the template sequence and is it
significant?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
3. Download the coordinates of the model as a pdb file and view it using PyMOL.
Generate the image of your model and include your image with a brief discussion
about the secondary structure and compare it with the template structure.
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
………………………………………………………………………………………
4. Discuss the quality of the model. Which parts of the protein do you think are
modeled better than others, and why?
a. …………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
Bibliography
1. PyMOL - The PyMOL Molecular Graphics System, Version 1.3, Schrödinger, LLC.
2. Suhre, K. & Sanejouand, Y.H., ElNémo: a normal mode web server for protein movement
analysis and the generation of templates for molecular replacement Nucleic Acids Research,
32, W610-W614, 2004.
3. Anders Krogh, Björn Larsson, Gunnar von Heijne, Erik L.L Sonnhammer, Predicting
transmembrane protein topology with a hidden Markov model: application to complete genomes,
Journal of Molecular Biology, Volume 305, Issue 3, 19 January 2001, Pages 567-580, ISSN
0022-2836, 10.1006/jmbi.2000.4315.
4. M. Cserzo, E. Wallin, I. Simon, G. von Heijne and A. Elofsson: Prediction of
transmembrane alpha-helices in prokaryotic membrane proteins: the Dense Alignment Surface
method; Prot. Eng. vol. 10, no. 6, 673-676, 1997
5. von Heijne, G. (1992) Membrane Protein Structure Prediction: Hydrophobicity Analysis
and the 'Positive Inside' Rule. J.Mol.Biol. 225, 487-494.
6. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local
alignment search tool." J. Mol. Biol. 215:403-410
7. Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace:
A web-based environment for protein structure homology modelling. Bioinformatics,
22,195-201.
STAGE 3 – INTRODUCTION TO ASSESSING CRYSTALLOGRAPHIC MODEL
QUALITY AND RE-BUILDING IN COOT.
The aim of this exercise is to introduce you to structure validation and analysis. By the end of this
tutorial you should be able to assess the limitations of crystallographic models and assess their
strengths and weaknesses.
You will already have downloaded and opened up the PDB files: 2bf6 ; 2aeq ; 3gvl in coot.
Q1. Go to the Protein Data Bank web site and search for the models using their PDB codes.
What do you notice about the resolution of the diffraction data collected for these crystals?
Q2. What does ‘Resolution’ mean in X-ray crystallography? (Hint: you will need to know the
Bragg equation). How does this relate to the electron density map you ultimately calculate?
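As a reminder of the form of the Bragg equation, nλ = 2d sin θ, the Python sketch below computes
the plane spacing d for first-order diffraction (n = 1). The wavelength and angle are hypothetical
values chosen for illustration, not taken from any of the three data sets, and bragg_spacing is an
illustrative name.

```python
import math


def bragg_spacing(wavelength, theta_degrees):
    """Plane spacing d from Bragg's law, n*lambda = 2*d*sin(theta), with n = 1.

    The result is in the same units as `wavelength`.
    """
    return wavelength / (2.0 * math.sin(math.radians(theta_degrees)))


# Hypothetical experiment: 1.0 angstrom X-rays and a Bragg angle of 30 degrees.
print(bragg_spacing(1.0, 30.0))
```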
Now have a look at the models and their respective maps in coot using the EDS feature from the
File drop down menu.
Q3. What do you see in the 0.9 Å map of 2bf6 compared to the one calculated at 3.0 Å for
2aeq? How does this relate to the question posed in the tutorial sheet concerning model
building errors and Q2 above?
From the validation drop down menu, assess each of the models using the geometry checker.
Which parts of the models appear to be ‘wrong’/not built correctly?
Q4. Consider the ligands in the three models. Which of these have been built incorrectly
and why? Do you think this has anything to do with the resolution of the maps/quality of the
data?
Q5. Consider the ligand in model 3gvl. What do you notice about the positioning of the
carboxyl group on ligand SLB/A that would cause you concern, quite apart from the
geometry?
Backbone torsion angle validation: The backbone conformation of a polypeptide is completely
defined by the sequence of the torsion angles along its peptide chain. The 2D scatter plot of the
phi-psi (φ-ψ) backbone torsion angle pairs for each residue, called the Ramachandran plot, shows
preferred regions where specific φ-ψ combinations cluster. The plot is equivalent to a 2D
projection of the conformational energy surface, signifying the probability of finding a residue with
a given torsion angle pair.
The φ-ψ torsion angle plot quickly shows the plausibility of the backbone geometry for a given
peptide chain. Importantly, the backbone torsion angles are generally not restrained in model
refinement and are thus an independent and reliable means of validation. Torsion angle pairs that
fall outside the allowed regions are highly improbable (although not impossible) because of the
high conformational energy of those conformations.
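Each point on the plot comes from two torsion (dihedral) angles, and a torsion angle is defined by
four consecutive atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The
Python sketch below computes a torsion angle from four sets of coordinates. It is a generic
geometric calculation, not Coot's implementation, and the test coordinates are made up.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points in space."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Four made-up points with the outer two at right angles about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```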
We will now assess the different models we have loaded into Coot. Bring up the Ramachandran plot
for each and investigate these using the features built into Coot. Tip: you can click on a point in
the Ramachandran plot and Coot will automatically centre on the corresponding residue in the main
window.
Q6. Draw a tripeptide and indicate the phi- and psi- angles. Why are these angles
constrained?
Q7. Why do we not do this for ligands?
Q8. Investigate the rotamer validation tool on any of the three models and familiarize
yourself with some of the common rotamers for different amino acids. Why do some amino
acids have more rotamer conformations than others?
I hope this practical has provided a brief introduction to the tools and methods we use routinely
to assess structural models in our research and investigate their biochemical properties.
Increasingly in the post structural genomics era, scientists are using the protein structure
models deposited in the PDB in their own research. In many cases these scientists will not be
trained biochemists or structural biologists and will assume the deposited models are accurate! This
is unfortunately not always the case! If you use structural models in your Part II project or decide
to investigate models as part of your learning here in Oxford, be sure you understand the limitations
of the data you are using. A good place to check any model deposited in the PDB is the
MolProbity web site (search for “MolProbity”). You can upload any PDB entry and run a series of
tests to assess the stereochemistry of the model and compare the results to other models reported at
similar resolution. If you are inclined, you can also work through the tutorials on this site.
Further reading:
Due to the limited time available for this practical class we have consciously kept the pace quick
and only lightly covered the topics. The main aim was to introduce you to the molecular graphics
programs you can use to investigate structural models in biology. Further information on these
topics can be found in the following texts:
Biomolecular Crystallography, Principles, Practice and Application. Bernhard Rupp. Garland
Science. Pretty much all you will need to know about protein X-ray crystallography