
NAME:_______________________

COLLEGE:________________________

MOLECULAR GRAPHICS & STRUCTURAL BIOINFORMATICS PRACTICAL CLASS – MT 2011

Dr. Simon Newstead – [email protected]

Dr. Shabana Vohra - [email protected]

& other demonstrators

A digital version of this document is available at:

http://sbcb.bioch.ox.ac.uk/teaching/BioPrac_2012.pdf

COMMENTS:

GRADE:


INTRODUCTION

Bioinformatics is central to modern biochemistry and involves the use of computers to model and analyse biological data. It has several subdivisions, such as sequence analysis and database searching, and protein structure analysis and prediction. Structural bioinformatics is mainly concerned with the computational analysis of protein structures. This practical is intended to give you a "feel" for some basic aspects of bioinformatics. In a relatively short period of time it is not possible to cover every aspect of the subject. However, we hope to convey some of the excitement and interest associated with work in this field.

The aims of this practical are to introduce you to:

A simple molecular graphics program, PyMOL, that enables you to display and analyse protein structures

Web-based tools for analysing the structural properties of proteins

During this process you will analyse several protein structures, which will provide an introduction to the diversity of protein structures. The practical is open-ended, in that you will be shown how to access the structure of any protein in which you are interested (provided that it has been determined!).

A central aspect of this practical is that you are meant to learn how to use the programs by experimenting with them, i.e. we will not tell you exactly what to do at each stage. This is an important aspect of your scientific training. We will provide you with the resources and the information, but you are then expected to use your intelligence and biochemical knowledge to exploit these resources fully.


OVERVIEW

There are three stages to this practical:

Use of PyMOL in a Windows environment to visualise and analyse protein structure.

Use of advanced graphics and molecular modelling tools for the analysis of protein structure and function.

Assessment of crystallographic model quality and model rebuilding in Coot.

This year we have timetabled an introductory lecture in week 4 that will describe the aims of the practical and introduce you to the graphics programs we will use. We hope that during this week you will download the programs, PyMOL in particular, and experiment with them before the practical. Demonstrators will be available for the first two days of the practical, and the practical itself will be supervised for the remaining two; you are encouraged to raise any issues with the material during the first days. The computing laboratory will be available for four sessions; in the first week there will be demonstrators to assist you. The remaining sessions will have a single demonstrator and are for finishing the work.

REPORT WRITING

You are expected to complete each section of this worksheet, with printed diagrams and hand-drawn schematics where appropriate.

STAGE 1 - USING PyMOL FOR ANALYSING PROTEIN STRUCTURE

PYMOL BASICS.

Before we can examine a protein structure, we need a file containing atomic coordinates, i.e. the X, Y and Z coordinates of each atom in the protein molecule. Coordinate files may be written in several different formats. We will use the standard format for proteins, known as 'PDB' (Protein Data Bank) format. Experimentally determined protein structures are deposited in the Protein Data Bank, a public database whose US site is accessible via the web at www.rcsb.org.
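PDB format is a fixed-column text format: each ATOM or HETATM record stores the atom name, residue, chain, coordinates and B-factor at set character positions. The minimal Python sketch below reads those fields, assuming the standard PDB column layout; the names `Atom` and `parse_pdb_atoms` are illustrative, and in practice PyMOL (or a library such as Biopython) does this parsing for you.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str   # "ATOM" (polymer) or "HETATM" (ligands, water, ...)
    name: str     # atom name, e.g. "CA"
    resn: str     # residue name, e.g. "ALA"
    chain: str    # chain identifier
    resi: int     # residue number
    x: float
    y: float
    z: float
    b: float      # temperature factor (B-factor)


def parse_pdb_atoms(lines):
    """Extract atoms from the ATOM/HETATM records of PDB-format text.

    Fields are read by character position (PDB is a fixed-column format),
    not by splitting on whitespace.
    """
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, CONECT, END, ...
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            resn=line[17:20].strip(),
            chain=line[21].strip(),
            resi=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            b=float(line[60:66]) if len(line) >= 66 else 0.0,
        ))
    return atoms
```

For example, `with open("3HAP.pdb") as fh: atoms = parse_pdb_atoms(fh)` would give one `Atom` per coordinate record in the downloaded file.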

We will illustrate the basics of using PyMOL with a protein that should be familiar to you, namely bacteriorhodopsin. This is stored in PDB entry 3HAP. Note the format of a PDB entry code: four characters, a digit followed by three letters or digits. In some older entries the last three characters were a mnemonic for the protein, but codes are assigned by the database and are generally not related to the protein's name. Download the entry from www.rcsb.org, choosing the "text" (PDB format) option rather than an archive.
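If you prefer to script the download, the sketch below checks that a code has the shape of a four-character PDB ID and builds a download address. The `https://files.rcsb.org/download/<ID>.pdb` pattern is RCSB's file download service at the time of writing and is an assumption that may change; some very large entries are distributed only in mmCIF format. Recent versions of PyMOL also provide a `fetch` command (e.g. `fetch 3hap`) that downloads entries directly when the computer has network access.

```python
import re
import urllib.request

# Four characters: a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def rcsb_pdb_url(pdb_id):
    """Return the (assumed) RCSB download URL for an entry in PDB text format."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a valid four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, filename=None):
    """Save an entry to a local file (requires network access)."""
    filename = filename or f"{pdb_id.upper()}.pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), filename)
    return filename
```

Calling `download_pdb("3hap")` would save the entry as `3HAP.pdb` in the current folder, ready for File > Open.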

Start PyMOL from the ‘teaching software’ folder. PyMOL normally starts with two windows: the External GUI window and the Viewer window. The external window contains menus, buttons and text boxes for manipulating the structure in the viewer. The viewer window displays all 3D graphics, and direct interaction with 3D models is done through the viewer. The viewer window has an internal GUI, which allows you to perform actions on specific objects and atom selections. It contains an object list, a mouse-button configuration matrix, a frame indicator and a set of "VCR"-like controls for working with movies. The viewer also has a command line at the bottom.

The following sections provide some information on the viewer and the commands available in PyMOL. Some aspects will be demonstrated at the start of the practical. Help on PyMOL can be obtained from the PyMOL manual web pages. PyMOL's quick demo, accessible through the built-in Wizard menu, introduces all of the standard representations.

Displaying Coordinates

There are different ways of displaying coordinates, which will be demonstrated at the start of the practical class. We will explore how to load a PDB file containing the XYZ coordinates of a protein molecule, and how to select different components of the structure and display them in different ways.

Now use File > Open to open the PDB file, i.e. 3HAP.pdb.

When you load the molecule, PyMOL displays it in the viewer and adds its name to the object list. The file name (without its extension) is the default object name; you can rename the object if you want. By default the coordinates are displayed as lines. Other representations include sticks, cartoons, ribbons, dots, spheres, surfaces and meshes. You can change the representation to ribbon or cartoon by selecting ‘ribbon’ or ‘cartoon’ in the ‘Show’ menu. To remove the lines, select ‘lines’ in the ‘Hide’ menu on the control panel. You can rotate (left button), zoom (right button) and move (middle button) the molecule with the mouse.

You can perform the same tasks using commands. To load a molecule:

PyMOL> load <path to file><filename>

This displays a representation of the object in the viewer and adds the object's name to the control panel.

PyMOL> show cartoon

PyMOL> hide lines

These commands change the representation to cartoon and hide the lines.

Atom Selection

An important skill is selecting a subset of a protein to examine in more detail. For example, you might wish to examine only the protein backbone, or to look at just one domain of a multi-domain protein. To manipulate a subset of the atoms and bonds in a molecule, you use atom selections. You can select particular residues or atoms in a binding pocket, all hydrophobic residues, all the alanines in a helix, and so on. You can create a selection and name it to make it easier to use again later. Selection-expressions range from single words to long, complicated expressions, and named selections are stored like objects. The default selection-expression is all, which refers to all the atoms that are currently loaded; if a selection-expression is missing, PyMOL will apply the command to all.

PyMOL> select selection-name, selection-expression

PyMOL> select nterm, resi 1-10

PyMOL> zoom nterm

PyMOL> show spheres, nterm

When you create a named selection, PyMOL adds it to the control panel so that you can apply control-panel functions to the selection with the mouse. Selections are manipulated like PyMOL objects, and their names are shown in parentheses. You can modify the colour or representation of a selection, and this will affect only the selected region. Because the selection is named, you can manipulate it any number of times.

Here are some widely used expressions:

Expression            Interpretation
*                     All atoms
resn cys              All cysteines
chain A               Chain A
segi lig              Segment lig
resi 8+12+16+20-28    Residues 8, 12, 16 and 20-28
resn arg+his+lys      All Arg, His and Lys residues
ss h                  All α-helix residues
ss s                  All β-strand (sheet) residues
ss l                  All loop (turn) residues
ss ""                 All residues that have not been assigned a secondary structure
name c+o+n+ca         Protein backbone atoms
name ca               Alpha carbons
het                   All hetero (non-protein) atoms
hydrogens             Hydrogen atoms
resn asp+glu          Acidic residues
resn lys+his+arg      Basic residues
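To see how the `+` (list) and `-` (range) operators in a residue expression combine, the hypothetical helper `expand_resi` below expands an expression such as `8+12+16+20-28` into explicit residue numbers. It is a simplified illustration, not PyMOL's own parser: it handles positive integers only, whereas real PDB files can contain negative residue numbers and insertion codes.

```python
def expand_resi(expr):
    """Expand a residue list such as '8+12+16+20-28' into sorted residue numbers.

    Simplified sketch: '+' separates items and 'a-b' is an inclusive range.
    Negative residue numbers and insertion codes are not supported.
    """
    residues = set()
    for item in expr.split("+"):
        item = item.strip()
        if "-" in item:
            start, end = (int(x) for x in item.split("-", 1))
            residues.update(range(start, end + 1))  # inclusive range
        else:
            residues.add(int(item))
    return sorted(residues)
```

So `resi 8+12+16+20-28` covers 12 residues: 8, 12, 16 and 20 to 28 inclusive.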

Coloring

You can apply colours to selections and objects using typed commands or the pull-down menus on the control panel. See the menu titled "Settings" to find out more about representations and colours.

PyMOL> color color-name

PyMOL> color color-name, selection-expression

PyMOL> color yellow, resn cys # color cysteine yellow


PyMOL> color red, ss h # color helices red

PyMOL> color yellow, ss s # color beta sheet residues yellow

PyMOL> color green, ss l # color loops green

You can set the colour of selections and of the background using the graphical user interface. There are also predefined colour schemes (via drop-down menus), namely element, chain, secondary structure and spectrum. Colouring by secondary structure colours α-helices red, β-strands yellow, and turns and all other residues green. You can also colour on the basis of crystallographic temperature factors (B-factors).
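Colouring by temperature factor maps each atom's B-factor onto a colour ramp, so that well-ordered (low-B) and mobile (high-B) regions stand out. The sketch below shows the idea with a linear blue-white-red ramp; the function name and the choice of ramp are illustrative assumptions, not PyMOL's internal code. In PyMOL itself, a command such as `spectrum b, blue_white_red` applies a comparable scheme.

```python
def b_factor_colours(b_values):
    """Map B-factors to RGB tuples on a blue-white-red ramp (low B -> high B)."""
    lo, hi = min(b_values), max(b_values)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all B-factors are equal
    colours = []
    for b in b_values:
        t = (b - lo) / span  # 0.0 = lowest B, 1.0 = highest B
        if t < 0.5:
            f = t / 0.5            # blue (0, 0, 1) -> white (1, 1, 1)
            colours.append((f, f, 1.0))
        else:
            f = (t - 0.5) / 0.5    # white (1, 1, 1) -> red (1, 0, 0)
            colours.append((1.0, 1.0 - f, 1.0 - f))
    return colours
```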

Labeling Atoms

If you right-click on an atom, PyMOL shows its identity, and you can change its properties and label it. You can also use the label ('L') button for an object in the object list to label atoms by different properties.

Saving the session and images

If you want to be able to return to the current state of PyMOL, you can create a session file. Choose "Save Session" in the "File" menu and, in the dialog box, give the file a ".pse" extension. When you open the saved session file, PyMOL returns to the state that was saved.

You can write a graphics file to store the image you have created in the viewer. To save an image to a file, use the "Save Image" option in the "File" menu or type the png command:

PyMOL> png file-name

Printing

To print, you need to save your image to a graphics file and then print that file. Set the background to white (e.g. PyMOL> bg_color white) for clarity and to avoid wasting toner.
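The typed commands in this section can also be collected into a PyMOL command script (a plain-text `.pml` file) and replayed later, for example by typing `@display.pml` at the PyMOL prompt. The Python sketch below builds the text of such a script; the file names and image settings (`dpi=300`, `ray=1`) are illustrative assumptions.

```python
def build_display_script(pdb_file, image_file, dpi=300):
    """Return the text of a PyMOL .pml script: load, cartoon, white background, PNG."""
    commands = [
        f"load {pdb_file}",
        "hide lines",
        "show cartoon",
        "bg_color white",                       # white background for printing
        f"png {image_file}, dpi={dpi}, ray=1",  # ray-trace, then save the image
    ]
    return "\n".join(commands) + "\n"
```

Writing `build_display_script("3HAP.pdb", "3HAP.png")` to a file named `display.pml` gives a script you can rerun whenever you need to regenerate the printed figure.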

A WELL-KNOWN EXAMPLE - BACTERIORHODOPSIN.

When analysing the structure of a protein molecule, the level of detail included in the representation must be appropriate to the level of the analysis. For example, analysing the mechanism of a photosynthetic membrane protein requires a more detailed representation than comparing the backbone folds of different proteins. We will now examine different levels of representation, again using 3HAP (bacteriorhodopsin) as an example. So, as before, load 3HAP into PyMOL.

Use the various options in PyMOL to display bacteriorhodopsin as follows:

Display the protein as a cartoon coloured by secondary structure, with the retinal group shown as sticks coloured by element and the other hetero atoms shown as spheres coloured green. Remove any water molecules. This will test your understanding of the select and display options. You will need to experiment to get this to work!

1. What is the dominant secondary structure of this protein?

a. …………………………………………………………………………………………

2. What is the importance of the bound lipid elements in the protein’s structure and

biological function?

a. …………………………………………………………………………………………

3. How do the protein’s surface properties reflect both the bound molecules and its natural environment?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

4. Describe the environment of the bound retinal

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

………………………………………………………………………………………

5. Draw by hand a schematic of the protein structure, highlighting functionally

important features

a.

…………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………


…………………………………………………………………………………………

…………………………………………………………………………………………

………………………

…………………………………………………………………………………………

A β-BARREL MEMBRANE PROTEIN - OMPA.

This protein consists almost entirely of β-sheet and is found in the outer membrane of E. coli. As for bacteriorhodopsin, you need to download the coordinates from www.rcsb.org. Download and open the file 1bxw.

Select the backbone and colour by secondary structure. Identify the central β-barrel. Analyse the secondary structure in terms of the first and last residues of each secondary structure element, and use this to produce a secondary structure diagram. Use arrows for β-strands (cylinders for α-helices, but there are none in this protein!) and lines for loops. Label the diagram with the start and end residues, and label the strands of the β-barrel S1, S2, S3 and so on.
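Turning a per-residue secondary-structure assignment into a list of elements with first and last residues is a simple run-length grouping. The sketch below assumes you already have residue numbers and one-letter codes in PyMOL's convention (H for helix, S for strand, L or blank for loop), collected for example with a command such as `iterate name CA, print(resi, ss)`; the function names are hypothetical. It does not detect chain breaks, so check for gaps in the residue numbering yourself.

```python
from itertools import groupby


def ss_elements(resi_list, ss_codes, keep=("H", "S")):
    """Collapse per-residue secondary-structure codes into elements.

    Returns (code, first_resi, last_resi) for each consecutive run of a code
    listed in `keep` (helices and strands by default).
    """
    elements = []
    for code, run in groupby(zip(resi_list, ss_codes), key=lambda pair: pair[1]):
        run = list(run)
        if code in keep:
            elements.append((code, run[0][0], run[-1][0]))
    return elements


def label_strands(elements):
    """Name the strands S1, S2, ... in chain order."""
    strands = [(first, last) for code, first, last in elements if code == "S"]
    return {f"S{i}": span for i, span in enumerate(strands, start=1)}
```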

6. Draw by hand a schematic diagram of the protein’s topology. Hint: a β-hairpin can be drawn as flat arrows connected by loops; you might find it helpful to imagine cutting open the β-barrel between the first and last strands and laying it flat on a surface. On this diagram, label the strands S1 to S?? identified above.

7. Based on the pattern of hydrophobic residues on the protein surface, how would you expect the protein to be oriented in the membrane?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

8. Is this an antiparallel or a parallel β-barrel? What is the difference?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

9. What is the angle of the strands with respect to the pore axis? What is the angle of

the hydrogen bonds between the strands with respect to the axis?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………
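For Q9, the tilt of a strand can be estimated as the angle between a vector running along the strand (for example, from its first to its last Cα atom) and the barrel axis. The sketch below does that arithmetic; how you obtain the axis (for instance, as the vector between the centres of the Cα atoms at the two ends of the barrel) is left to you, and all names are illustrative. Because a strand can run either "up" or "down" the barrel, the tilt is folded into the range 0-90°.

```python
import math


def angle_between(u, v):
    """Angle in degrees (0-180) between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp rounding error
    return math.degrees(math.acos(cos_t))


def strand_tilt(first_ca, last_ca, axis):
    """Tilt (0-90 degrees) of a strand, first CA -> last CA, relative to an axis."""
    direction = [b - a for a, b in zip(first_ca, last_ca)]
    theta = angle_between(direction, axis)
    return min(theta, 180.0 - theta)  # strand direction relative to axis is arbitrary
```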

10. Where are the waters in the structure? What is their significance in this structure?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

11. There is a detergent molecule in the structure. Where did this come from and what is

the implication of its presence?


a. …………………………………………………………………………………………

…………………………………………………………………………………………


STAGE 2 – ADVANCED TECHNIQUES IN MOLECULAR MODELLING

For each of the following methods, pick one of the structures below (name, PDB ID) and apply the method to it:

MSCL 2OAR (ElNémo)

KCSA 1K4C (ElNémo)

RHODOPSIN 1U19

SOPIP2 1Z98

AMTB 1U77

SECY 1RHZ

H+/CL- TRANSPORTER 1KPL

BTUCD 1L7V

You may wish to start with the elastic network modelling section, leaving the calculation running in the background while you work on the other sections. For multimeric proteins, make sure you download the biological unit, not the asymmetric unit.

Elastic Network Modelling

The elastic network model is a commonly used technique in molecular modelling for calculating the dynamics of proteins. It is a "coarse-grained" technique: it uses a highly simplified model of the protein to speed up the calculations. In contrast to well-established methods such as atomistic molecular dynamics, which take days, an elastic network model calculation typically takes less than an hour. Here you will use a well-known web server, ElNémo, to calculate the normal modes of your protein. You may find that the server is offline or busy; if so, you will need to come back and attempt this when it is available.

Download the coordinates of your chosen protein and submit the structure to the ElNémo server at http://www.igs.cnrs-mrs.fr/elnemo/ to calculate its dynamics. Visualise the "modes" of motion using the animations provided. The following reference may be helpful:

Atilgan, A.R. et al. (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J. 80, 505-515.
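The central idea of an elastic network model fits in a few lines: residues (usually represented by their Cα atoms) closer than a cut-off distance are joined by identical springs. The sketch below builds the connectivity (Kirchhoff) matrix of the Gaussian network model, the simplest isotropic variant. It is not ElNémo's implementation, which models full 3D motions, and the 7 Å cut-off is an assumed typical value. The normal modes come from the eigenvectors of this matrix, with the lowest non-zero modes describing the slowest, most collective motions; in this model, predicted B-factors are proportional to the diagonal elements of its pseudo-inverse.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Kirchhoff (connectivity) matrix of a Gaussian network model.

    coords: list of (x, y, z) C-alpha positions in angstroms.
    Off-diagonal entries are -1 for residue pairs within `cutoff` (a spring),
    and each diagonal entry is that residue's number of contacts.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma
```

Each row sums to zero, reflecting the fact that moving the whole protein rigidly stretches no springs.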

1. Briefly describe the physical basis behind the elastic network model (you may wish

to read one of the papers listed on the website to answer this).

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

2. Draw by hand a schematic of the three dominant (lowest-frequency) modes of motion. How do they relate to function? Your schematic should be a recognisable sketch of the motions of the protein, including only functionally or dynamically important details.

a.

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

3. How well does the model predict the crystallographic B-factors? Suggest some reasons why it might not predict them well.

a. …………………………………………………………………………………………

…………………………………………………………………………………………

………………………………………………………………………………………


Transmembrane Helix Prediction

Get the sequence of your selected protein (available from the PDB) in FASTA format (one-letter amino acid codes).

Cut and paste the sequence into each of the transmembrane (TM) helix prediction programs listed below. Run them and print the results.

The TM helix prediction programs to use are:

TMHMM – works via an advanced pattern search method (‘hidden Markov model’)

http://www.cbs.dtu.dk/services/TMHMM-2.0/

TopPred – uses a first-generation method based on a hydrophobicity profile

http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html

DAS – compares your sequence against a large database of known TM helices

http://www.sbc.su.se/~miklos/DAS/

In each case, paste your sequence into the input window and follow the instructions on the web page. (All of these programs can also be accessed from the MSPS group website, http://sbcb.bioch.ox.ac.uk/links.php#Databases_and_Servers; then scroll up to Structure Prediction.)
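To see what a first-generation, hydrophobicity-profile method does, the sketch below computes a Kyte-Doolittle hydropathy profile: the average hydropathy in a sliding window, reported at the window's central residue. Stretches where the average exceeds a threshold are flagged as possible TM helices. The 19-residue window and 1.6 threshold are commonly quoted Kyte-Doolittle choices, used here as assumptions; this illustrates the general approach and is not TopPred's actual algorithm or scale.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Return (centre_residue_1_based, mean_hydropathy) for each full window."""
    values = [KD.get(aa, 0.0) for aa in seq.upper()]  # unknown letters score 0
    half = window // 2
    return [(start + half + 1, sum(values[start:start + window]) / window)
            for start in range(len(values) - window + 1)]


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merge consecutive window centres above threshold into (first, last) ranges."""
    segments = []
    for centre, avg in hydropathy_profile(seq, window):
        if avg <= threshold:
            continue
        if segments and centre == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], centre)  # extend current segment
        else:
            segments.append((centre, centre))         # start a new segment
    return segments
```

The ranges returned are window centres, so a predicted helix extends roughly half a window beyond each end; comparing this crude profile with the TMHMM and DAS output can help with the questions below.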

1. Each of these methods estimates how likely it is that a given sequence contains TM helices. How do they differ in technique? How does this change the results you get back and their interpretation?

a. …………………………………………………………………………………………

…………………………………………………………………………………………


…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

2. How well do the predictions match up with the experimentally determined structure?

Why might the different techniques fail?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

3. How similar are the predictions, and are there regions where they consistently disagree? Why do you think this is?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

4. Do you think the oligomerisation state has an influence on the results? Does the function of the helix in the protein’s mechanism influence the result?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

5. Attach the resulting plots below & overleaf, with an annotated picture of the structure

showing the differences between prediction and experimental result.


Homology Search

The use of evolutionary information is an integral part of comparative modelling, and a database search is usually one of the first steps taken when trying to obtain structural and functional information about a novel target sequence. Two sequences are said to be homologous if they derive from a common ancestor. Homologous sequences are usually identified by searching a database with an algorithm such as BLAST or FASTA. The Basic Local Alignment Search Tool (BLAST) is one of the most common methods for finding homologous sequences in a sequence database. BLAST finds high-scoring segment pairs (HSPs) between the query sequence and each database sequence; if the score of an HSP exceeds a cut-off threshold, it is reported as a hit. BLAST scores the alignments using substitution matrices such as PAM and BLOSUM.
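The E-value that BLAST reports comes from Karlin-Altschul statistics: for a raw alignment score S between a query of length m and a database of total length n, the expected number of chance hits is E = K·m·n·exp(-λS), where λ and K depend on the scoring matrix and gap penalties. The sketch below evaluates this formula and the equivalent bit score S' = (λS - ln K)/ln 2, for which E = m·n·2^(-S'). The λ and K values are illustrative assumptions (of the order reported for BLOSUM62 with gap costs 11/1), and real BLAST also corrects the lengths for edge effects, so these numbers will not match BLAST output exactly.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalised (bit) score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expected number of chance hits: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative parameters only (assumed, roughly BLOSUM62 with gap costs 11/1).
LAMBDA, K = 0.267, 0.041
```

The formula makes two practical points visible: the E-value falls exponentially as the score rises, and it grows in proportion to the size of the database searched, which is one reason the same hit can have different E-values against UniProtKB and Swiss-Prot.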

Go to http://www.ebi.ac.uk/Tools/sss/ and select NCBI BLAST to perform the searches. Perform the search using one of the sequences below and answer the questions that follow; you can also download the sequences using the link http://sbcb.bioch.ox.ac.uk/teaching/sequence.doc. Copy and paste your chosen sequence into the box provided. Run the query with default parameters against different databases (UniProt Knowledgebase, UniProtKB/Swiss-Prot and protein structure sequences), and record the number of hits with an E-value less than 0.001. Then alter the algorithm parameters, i.e. the word size and the matrix (BLOSUM45 and PAM70), and report the differences in the results (you may have to set the score and alignment options to 1000).

>seq1
MFLWLKCFCTLIIVTIAKNSSAKIPHCKYDETINISHFKRLNDAYIYEHFEIPANLTGEFDYKELMDGSKVPTEFPNLRGCICKVRPCIRICCARKNILSNGECSDGVKNEIKLTMLDLTMQDILLTDPTLAELNMIPQYNSTELLILREQFQPCDEIVSLKRDEYTILKDGSILLHTSAEILSNDQYCLYPEIYSDFPETIRIINRRCYRNVMPGIAQLSVISVVGFILTLAVYLSVEKLRNLLGKCLICSLFSMFMEYFIWTDYFRLLQSICSAAGYMKYFFSMSSYLWFSVVSFHLWELFTSLNRHEPQYRFLIYNTFVWCTAAIPTVVIFSMNQMWENDPGKSEWLPLVGYFGCSVKDWNSSSWFYHIPIVILNSFNVIMFVLTAIYIWKVKKGVKSFAQHDERNTTCLEFNVQTYIQFVRLFLIMGASWLLDQLTRLAEDSHLLLDTIVLNLTVYLNAAFGILIFVLLILKSTFKMIMER

>seq2
QCDNAKGLKAFYDAIKYGPNHLMVFGGVCPSVTSIIAESLQGWNLVQLSFAATTPVLADKKKYPYFFRTVPSDNAVNPAILKLLQHYQWRRVGTLTQDVQRFSEVRNDLTGVLYGEDIEISDTESFSNDPCTSVKKLKGNDVRIILGQFDQNMAAKVFCCAYEENMYGSKYQWIIPGWYEPSWWEQVHTEANSSRCLRKNLLAAMEGYIGVDFEPLSSKQIKTISGKTPQQYEREYNNKRSGVGPSKFHGYAYDGIWVIAKTLQRAMETLHASSRHQRIQDFNYTDHTLGRIILNAMNETNFFGVTGQVVFRNGERMGTIKFTQFQDSREVKVGEYNAVADTLEIINDTIRFQGSEPPKDKTIILEQLRKISLPLYSILSALTILGMIMASAFLFFNIKNRNQKLIKMSSPYMNNLIILGGMLSYASIFLFGLDGSFVSEKTFETLCTVRTWILTVGYTTAFGAMFAKTWRVHAIFKNVKMKKKIIKDQKLLVIVGGMLLIDLCILICWQAVDPLRRTVERYSMEPDPAGRDISIRPLLEHCENTHMTIWLGIVYAYKGLLMLFGCFLAWETRNVSIPALNDSKYIGMSVYNVGIMCIIGAAVSFLTRDQPNVQFCIVALVIIFCSTITLCLVFVPKLITLRTNPDAATQNRRFQFTQNQKKEDSKTSTSVTSVNQASTSRLEGLQSENHRLRMKITELDKDLEEVTMQLQDTPEKTTYIKQNHYQELNDILNLGNFTESTDGGKAVLKNHLDQNPQLQWNTTEPSRTCKDPIEDINSPEHIQRRLSLQLPILHHAYLPSIGGVDASCVSPCVSPTASPRHRHVPPSFRVMVSGL

>seq3
MKNTFSLISVFWFLKISIIFCHLSDPRCFWRIKDAKNDLGDKETYCFFSIYTKQGYVKNDYFSWNLDKKVTPKTNHLIFSVYLAMEEINKNGHILPNISLLVNIECGLELYGERTGLAFKSEEFIPNYYCRNHRKYLIVLTTPKWGVSTSLGPLLYISRVPELYCGHFHLLLNDNEQFPHLYQISPKDTSLPLAMVSLVVHFRWNWIGAIVTNDDHGIQFLSELRGEMQKHIVCLSVAIIIQTEKFMALKEFRMNYNKIAMSSATVVIVYGDKDSPIQFTLIMWKSEGIWRIWVSVSQFDMITVIGDFLLYSSTGSFIFSHQQSEISGFEKFIQTVHPSNYSSEFSLAKLWWTYFTCSLPPSNCKKLKNCPIKTVFKWLFMTPIGMSMSDISYNLYNAMYAVAHSVHEMLLQQVDIWSTNAGTELEFDSWKMFSILKTLKFVNPAGDLVNMNQNLKQDTEYDIFYIPNFQKYYGLKMKIGRFSGYLPSGQQLYMSKEMMEWATDMDQILPSICSMPCRPGLRKSPQEEKDICCFVCNPCPENEISNMTNMDQCVKCPEDQYANEDQTLCLQKVVDVLDYRDPLGKSLAGFALCFSVLTSIVLCVFLKHRESPIVKANNQTLSYVLLISLIFCFICSLLYIGHPTMFICILQQTAFAIAFTVAASTVLAKTITVILAFKITVPGRMRWLLVSGAPKYIIFVCTMIQLIFCGIWLGTSPPFVETDVHMTHGHIIIVCNKGSVIAFYCVLGYMGSVALASFTVAFLSRKLPDTFNEAKLLTFSMLVFCSVWITFIPVYHSTKGKTMVAVEVFCILASSAGLLLCIYAPKCYIILLRPQKNSFYKFRKPHSKSENIS

>seq4
MLSAGLGLLMLVAVVEFLIGLIGNGSLVVWSFREWIRKFNWSSYNLIILGLAGCRFLLQWLIILDLSLFPLFQSSRWLRYLSIFWVLVSQASLWFATFLSVFYCKKITTFDRPAYLWLKQRAYNLSLWCLLGYFIINLLLTVQIGLTFYHPPQGNSSIRYPFESWQYLYAFQLNSGSYLPLVVFLVSSGMLIVSLYTHHKKMKVHSAGRRDVRAKAHITALKSLGCFLLLHLVYIMASPFSITSKTYPPDLTSVFIWETLMAAYPSLHSLILIMGIPRVKQTCQKILWKTVCARRCWGP

>seq5
MNTEALTRGLLFLSLVLTGVPGNAAVICAFLSLVHRDGHLSPADAIVLHLASVNLMVVGVRCLLEVLATFEIQNVFDDTGCKAVIFIYRTSRSLSIWLTFVLSAYQCLCVAPPGSRWAAVRTLVTQYLAVIFLGLWLINTSMSVAPLLFAVGARNDSRLMQNAINVEFCFLSFPSRLSRDANGAAQVGRDVVPMSLMAMASLILLVFLYRHSRQVQGLRGGGQRSAERRAAITVVTLVTLYVLFYGVDNGLWVYTLTVPQTLGSSLNTDL

1. Report the number of hits in each case.

Search program   BLAST                    BLAST                   BLAST
Database         UniProt Knowledgebase    UniProtKB/Swiss-Prot    Protein structure sequences
No. of hits

2. Do you observe any difference in results when you change the matrix?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

3. Did you get any significant hits when you ran the query against the protein structure sequences, and what is the percentage identity between the target sequence and the most significant hit?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

4. What is the significance of an E-value?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

………………………………………………………………………………………


Homology Modelling

Homology modelling is a method for constructing a three-dimensional model of a protein from its amino acid sequence, based on the experimentally determined structure of a homologous protein (the template).

Homology modelling involves three steps:

Identify homologous sequences with known structure

Align the unknown sequence with the homologous sequences

Construct a model based on the coordinates of the known structure
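The alignment in step 2 is also where the percentage identity you are asked to report comes from. The sketch below uses one common definition, identical residues divided by the number of aligned columns in which neither sequence has a gap; other tools divide by the full alignment length or by the length of the shorter sequence, so their values can differ. The function name and the `-` gap character are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Counts identical residues over columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * identical / len(pairs)
```

For the toy alignment `ACDE-G` / `ACDQHG`, five columns are ungapped and four of them match, giving 80%.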

There are various tools available to build, refine and evaluate homology models. We are going to use SWISS-MODEL. In SWISS-MODEL you can either provide the sequence you want to model, in which case the program performs the homology search and builds a model (automated mode), or provide an alignment of the target and template sequences, from which SWISS-MODEL builds a homology model (alignment mode).

Use one of the following UniProt IDs to build a homology model with SWISS-MODEL (http://swissmodel.expasy.org/):

5HT7R_HUMAN

MC4R_HUMAN

ACM2_HUMAN


1. Did SWISS-MODEL generate more than one model? Would you prefer one over the other, and why?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

2. Which template structure was used to build the homology model? What was the percentage identity of the target sequence with the template sequence, and is it significant?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

3. Download the coordinates of the model as a PDB file and view it in PyMOL. Generate an image of your model and include it with a brief discussion of its secondary structure, comparing it with the template structure.

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

………………………………………………………………………………………

4. Discuss the quality of the model. Which parts of the protein do you think are modelled better than others, and why?

a. …………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

Bibliography

1. The PyMOL Molecular Graphics System, Version 1.3, Schrödinger, LLC.

2. Suhre, K. & Sanejouand, Y.H. (2004) ElNémo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res. 32, W610-W614.

3. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567-580. doi:10.1006/jmbi.2000.4315.

4. Cserzo, M., Wallin, E., Simon, I., von Heijne, G. & Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673-676.

5. von Heijne, G. (1992) Membrane protein structure prediction: hydrophobicity analysis and the 'positive inside' rule. J. Mol. Biol. 225, 487-494.

6. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

7. Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. (2006) The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195-201.

STAGE 3 – INTRODUCTION TO ASSESSING CRYSTALLOGRAPHIC MODEL QUALITY AND REBUILDING IN COOT

The aim of this exercise is to introduce you to structure validation and analysis. By the end of this tutorial you should be able to assess the strengths, weaknesses and limitations of crystallographic models.

You should already have downloaded the PDB files 2bf6, 2aeq and 3gvl and opened them in Coot.


Q1. Go to the Protein Data Bank web site and search for the models using their PDB codes.

What do you notice about the resolution of the diffraction data collected for these crystals?

Q2. What does ‘resolution’ mean in X-ray crystallography? (Hint: you will need to know the Bragg equation.) How does this relate to the electron density map you ultimately calculate?
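As a reminder of the arithmetic behind the hint, Bragg's law, nλ = 2d sin θ, relates the spacing d of the diffracting lattice planes to the scattering angle 2θ. The resolution of a data set is the smallest d for which reflections were measured, i.e. the spacing corresponding to the highest scattering angle. The sketch below solves Bragg's law for d; the wavelength and angles used with it are arbitrary illustrative values.

```python
import math


def bragg_d_spacing(wavelength, two_theta_deg, order=1):
    """Solve Bragg's law n*lambda = 2*d*sin(theta) for d.

    wavelength and the returned spacing share the same units (e.g. angstroms);
    two_theta_deg is the scattering angle 2*theta in degrees.
    """
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength / (2.0 * math.sin(theta))
```

For example, `bragg_d_spacing(1.0, 60.0)` gives approximately 1.0 Å. Reflections at larger angles correspond to finer detail, and since sin θ cannot exceed 1, d can never be smaller than λ/2.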


Now look at the models and their respective maps in Coot, using the EDS feature in the File drop-down menu.

Q3. What do you see in the 0.9 Å map of 2bf6 compared to the one calculated at 3.0 Å for

2aeq? How does this relate to the question posed in the tutorial sheet concerning model

building errors and Q2 above?

From the validation drop-down menu, assess each of the models using the geometry checker. Which parts of the models appear to be ‘wrong’, i.e. not built correctly?

Q4. Consider the ligands in the three models. Which of these have been built incorrectly, and why? Do you think this has anything to do with the resolution of the maps or the quality of the data?

Q5. Consider the ligand in model 3gvl. What do you notice about the position of the carboxyl group on ligand SLB/A that would cause you concern, quite apart from its geometry?

Backbone torsion angle validation: the backbone conformation of a polypeptide is completely defined by the sequence of torsion angles along its chain. The 2D scatter plot of the phi-psi (φ-ψ) backbone torsion angle pairs for each residue, called the Ramachandran plot, shows preferred regions where specific φ-ψ combinations cluster. The plot is in effect a 2D projection of the conformational energy surface, indicating the probability of finding a residue with a given torsion angle pair.

The φ-ψ torsion angle plot quickly shows the plausibility of the backbone geometry of a given peptide chain. Importantly, the backbone torsion angles are generally not restrained in model refinement and are thus an independent and reliable means of validation. Torsion angle pairs that fall outside the allowed regions are highly improbable (although not impossible), because of the high conformational energy of such conformations.
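The φ and ψ angles are dihedral (torsion) angles defined by four consecutive backbone atoms: φ by C(i-1)-N(i)-Cα(i)-C(i) and ψ by N(i)-Cα(i)-C(i)-N(i+1). The sketch below computes a dihedral angle in degrees from four atomic positions, using the standard (IUPAC) sign convention; the function names are illustrative, and the points in the checks are made-up test coordinates, not atoms from the models.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector along central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applied to the C(i-1), N, Cα and C coordinates of a residue this gives φ; applied to N, Cα, C and N(i+1) it gives ψ, i.e. one point on the Ramachandran plot.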

We will now assess the different models we have loaded into Coot. Bring up the Ramachandran plot for each model and investigate it using the features built into Coot. Tip: you can click on a point in the Ramachandran plot and Coot will automatically centre the main window on that residue.

Q6. Draw a tripeptide and indicate the phi (φ) and psi (ψ) angles. Why are these angles constrained?

Q7. Why do we not do this for ligands?

Q8. Investigate the rotamer validation tool on any of the three models and familiarise yourself with some of the common rotamers of different amino acids. Why do some amino acids have more rotamer conformations than others?

I hope this practical has provided a brief introduction to the tools and methods we use routinely to assess structural models in our research and to investigate their biochemical properties. Increasingly, in the post-structural-genomics era, scientists are using the protein structure models deposited in the PDB in their own research. In many cases these scientists will not be trained biochemists or structural biologists and assume that the deposited models are accurate! Unfortunately, this is not always the case! If you use structural models in your Part II project, or decide to investigate models as part of your learning here in Oxford, be sure you understand the limitations of the data you are using. A good place to check any model deposited in the PDB is the MolProbity web site (search for "MolProbity"). You can upload any PDB entry, run a series of tests to assess the stereochemistry of the model, and compare the results with those for other models reported at similar resolution. If you are inclined, you can also work through the tutorials on that site.

Further reading:

Because of the limited time available for this practical class, we have deliberately kept the pace quick and covered the topics only lightly. The main aim was to introduce you to the molecular graphics programs you can use to investigate structural models in biology. Further information on these topics can be found in the following texts:

Biomolecular Crystallography: Principles, Practice and Application. Bernhard Rupp. Garland Science. Pretty much all you will need to know about protein X-ray crystallography.
