Molecular Replacement Martyn Winn CCP4 group, Daresbury Laboratory, UK

What do we know from the diffraction data?

• The point group of the new crystal form, the volume of the asymmetric unit and hence the likely number of molecules it contains (Matthews coefficient).

Note: we often cannot be sure of the space group, and will need to search for the solution in several candidate space groups.

[Space group determination rests on observing systematic absences in certain zones, e.g. only l = 4n reflections seen on the 00l axis. Is this a 4₁ screw axis? A 4₃ screw axis? (Absences alone cannot tell these two enantiomorphic possibilities apart.) Pointless tries to decide this. Check the Scala plots?

Or are there two or more molecules in the asymmetric unit in the same orientation but separated by (x,y,1/4)?]

• The quality of the experimental intensities:
  – Are they complete? Saturated at low resolution? Anisotropic?
  – Are the intensity statistics reasonable? Could the crystal be twinned?
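
A common first check for the twinning question is the second moment of the acentric intensities, <I²>/<I>². Wilson statistics predict about 2.0 for untwinned data and about 1.5 for a perfect merohedral twin. The minimal Python sketch below demonstrates this on simulated normalised intensities. The helpers simulate_untwinned and simulate_perfect_twin are illustrative only; real data analysis programs also normalise in resolution shells and treat centric reflections separately, which this sketch omits.

```python
import random


def second_moment(intensities):
    """Return <I^2>/<I>^2 for a list of normalised acentric intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / mean_i ** 2


def simulate_untwinned(n, rng):
    # Wilson statistics: acentric normalised intensities are exponential with mean 1.
    return [rng.expovariate(1.0) for _ in range(n)]


def simulate_perfect_twin(n, rng):
    # A perfect twin (twin fraction 0.5) averages two independent acentric intensities.
    return [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"untwinned:    {second_moment(simulate_untwinned(100_000, rng)):.2f}")     # about 2.0
    print(f"perfect twin: {second_moment(simulate_perfect_twin(100_000, rng)):.2f}")  # about 1.5
```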

Data analysis before MR

• Matthews coefficient → number of copies in the a.s.u.
• Native Patterson (translational NCS)
• B factor analysis
• Self RF (rotational NCS)
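
As a minimal sketch of the Matthews calculation: V_M = V_cell / (n_asu × N × MW), where n_asu is the number of asymmetric units in the cell, N the number of copies per asymmetric unit and MW the molecular weight in Da; the solvent fraction is approximately 1 - 1.23/V_M. The cell, space group and molecular weight in the example are hypothetical.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic Angstroms (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews(volume, n_asu, mw_da, n_mol):
    """Matthews coefficient V_M (A^3/Da) for n_mol copies per asymmetric unit.

    n_asu: number of asymmetric units in the cell (symmetry operators x centring)."""
    return volume / (n_asu * n_mol * mw_da)


def solvent_fraction(vm):
    # Standard approximation: 1.23 / V_M is the fraction of the cell occupied by protein.
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    v = cell_volume(50.0, 60.0, 70.0)          # hypothetical P212121 cell
    for n in (1, 2, 3):
        vm = matthews(v, n_asu=4, mw_da=12500.0, n_mol=n)
        print(n, round(vm, 2), round(solvent_fraction(vm), 2))
```

Trying N = 1, 2, 3 for this hypothetical cell gives V_M values of 4.2, 2.1 and 1.4 Å³/Da; copy numbers whose V_M falls well outside the range usually seen for protein crystals (roughly 1.7 to 3.5 Å³/Da) can normally be ruled out.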

NON-CRYSTALLOGRAPHIC TRANSLATION VECTOR (Thank you Airlie)

Crystal Patterson has an origin-sized peak at the translation vector.

Asymmetric unit of unknown crystal structure with non-crystallographic translation.

If the asymmetric unit contains two molecules related by a translation, then the native Patterson will have a large peak at the position representing this translation.

Unlike non-crystallographic rotations, non-crystallographic translations are not useful in structure determination. Use experimental phases?

In fact, they introduce awkward structure factor correlations not currently accounted for, and can make structures difficult to refine.

If there is more than one molecule in the asymmetric unit, you should always check for non-crystallographic translation.

MOLREP does this within the program
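
The origin-sized peak can be reproduced with a one-dimensional toy model. The Python sketch below places two copies of a hypothetical four-atom "molecule" in a 100-point cell, related by a pure translation t, computes the intensities |F|² and then the Patterson from the intensities alone. It illustrates the principle only; it is not how MOLREP implements the check, and real cells are three-dimensional with crystallographic symmetry.

```python
import cmath
import math


def structure_factors(density):
    """F(h) = sum_x rho(x) exp(2 pi i h x / N) for a 1D density on N grid points."""
    n = len(density)
    return [sum(rho * cmath.exp(2j * math.pi * h * x / n)
                for x, rho in enumerate(density) if rho)
            for h in range(n)]


def patterson(intensities):
    """P(u) = (1/N) sum_h |F(h)|^2 cos(2 pi h u / N): needs intensities only."""
    n = len(intensities)
    return [sum(i_h * math.cos(2 * math.pi * h * u / n)
                for h, i_h in enumerate(intensities)) / n
            for u in range(n)]


def nct_demo(n=100, t=37):
    """Two copies of a toy molecule related by a pure translation t (grid units)."""
    molecule = {5: 6.0, 9: 7.0, 14: 8.0, 20: 6.0}  # hypothetical positions and weights
    density = [0.0] * n
    for x, weight in molecule.items():
        density[x] += weight                 # copy A
        density[(x + t) % n] += weight       # copy B = copy A shifted by t
    p = patterson([abs(f) ** 2 for f in structure_factors(density)])
    # P(u) = P(-u), so search the half cell only, skipping the origin.
    u_peak = max(range(1, n // 2 + 1), key=lambda u: p[u])
    return u_peak, p[u_peak] / p[0]


if __name__ == "__main__":
    u, ratio = nct_demo()
    print(u, round(ratio, 2))
```

With the defaults, the largest non-origin peak lies at u = 37 (the translation) with half the origin height, well above every intramolecular peak.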

What else can we do if there are several molecules in the asymmetric unit?

• A self rotation function can be calculated from the measured data; it does not need a model.

• What can it show? If the molecule forms an oligomer, e.g. a dimer or trimer, then we will see a peak in the self rotation map.

• However this can be mixed up with crystal symmetry and be very confusing to interpret!

SELF-ROTATION FUNCTION (Thank you Airlie)

If there is more than one molecule in the asymmetric unit (and no NCT), then the rotation function of the Patterson on itself will give a peak at the angle corresponding to the relative rotation between the two.

The self rotation function does not need a model!

This is useful for confirming or determining how many copies of the structure you have in the asymmetric unit. It should therefore be one of the first things you do with a new data set. If non-crystallographic symmetry is present, it is extremely useful in MIR and density modification.

Crystal Patterson has the same two-fold symmetry near the origin (intra-molecular peaks only)

Asymmetric unit of unknown crystal with non-crystallographic two-fold symmetry

Self Rotation Function for S100

[Figure: self rotation map, annotated to show symmetry-related 2-folds]

Finding search models

Need a PDB file for a structurally similar protein. This usually means a homologous protein.

Either you have one already, or you search the Protein Data Bank.

Search is based on sequence alignment between the target protein and proteins in the PDB. Several bioinformatics tools can help here:

• OCA, MSDlite, MSDtarget: all use FASTA (www.ebi.ac.uk/msd)
• psiBLAST: iterative searching (www.ncbi.nlm.nih.gov/BLAST)
• FFAS: profile-profile alignment (ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)

Editing search models

Don’t use a raw PDB file for Molecular Replacement unless it is very similar (e.g. same protein, different conditions, ligand, etc.). Edit it to:

• remove residues that don’t occur in the target
• remove side chain atoms that don’t occur in the target
(these two assume a known alignment from model to target)
• remove uncertain regions of the model (check B factors, occupancies)
• remove flexible loops

Note that we don’t add anything!! Homology modelling?

Consider use of individual domains and multimers (see MrBUMP below)
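
The editing steps that depend only on the model itself (dropping uncertain atoms and flexible loops) can be sketched in a few lines of Python working directly on the fixed-column PDB format. The function name trim_model, the B-factor and occupancy cut-offs and the residue list are hypothetical choices for illustration; in practice, inspect the model and choose them case by case.

```python
def trim_model(pdb_lines, max_b=80.0, min_occ=0.5, drop_residues=()):
    """Keep ATOM records that pass B-factor and occupancy cut-offs.

    drop_residues: (chain_id, residue_number) pairs to delete, e.g. a flexible loop.
    HETATM records (waters, ligands) and all other record types are dropped.
    The default cut-offs are illustrative, not recommendations."""
    drop = set(drop_residues)
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]                 # column 22
        res_num = int(line[22:26])          # columns 23-26
        occupancy = float(line[54:60])      # columns 55-60
        b_factor = float(line[60:66])       # columns 61-66
        if (chain_id, res_num) in drop:
            continue                        # explicitly removed region (e.g. a loop)
        if occupancy < min_occ or b_factor > max_b:
            continue                        # poorly determined atom
        kept.append(line)
    return kept
```

For example, trim_model(open("model.pdb").read().splitlines(), drop_residues=[("A", 57), ("A", 58)]) would also cut residues 57 and 58 of chain A (file name and residues hypothetical); add an END record when writing the result out.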

Chainsaw

Norman Stein, Daresbury Lab.

MR model preparation: chainsaw

• Molecular replacement model preparation utility that edits a PDB search model according to a sequence alignment.

• Features:
  – Removes un-aligned residues from the model
  – Prunes non-conserved residues back to the gamma atom
  – Preserves more atoms than a polyalanine model

Example of 1mr6 used as a template for 1tgx (38% sequence identity)

[Figure panels: unmodified template; Chainsaw template; polyalanine template]

Running Chainsaw:

Inputs:
• complete PDB file
• model-to-target alignment

Alignment from:
• original search tool (FASTA, psiBLAST, etc.)
• multiple alignment (set of search models, protein family, etc.)
• hand-created
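
The residue-level rule described for Chainsaw can be written as a short Python sketch. It is not Chainsaw's code: the function names are hypothetical, it assumes a gapped alignment with "-" for gaps and heavy-atom names only, and Chainsaw itself offers other modes and may treat special cases (for example glycine or proline) differently.

```python
BACKBONE = {"N", "CA", "C", "O"}


def residue_actions(template_aln, target_aln):
    """Decide 'keep', 'prune' or 'delete' for each template residue.

    Both strings come from the same alignment and have equal length; '-' is a gap."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    actions = []
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res == "-":
            continue                  # insertion in the target: nothing is added
        if q_res == "-":
            actions.append("delete")  # template residue has no partner in the target
        elif t_res == q_res:
            actions.append("keep")    # conserved: keep the whole side chain
        else:
            actions.append("prune")   # not conserved: truncate at the gamma atom
    return actions


def pruned_atoms(atom_names):
    """Keep backbone, CB and gamma-position atoms (CG, CG1, CG2, OG, OG1, SG)."""
    return [a for a in atom_names if a in BACKBONE or a[1:2] in ("B", "G")]
```

Applying the actions to coordinates is then a matter of deleting, keeping or filtering each residue's atoms with pruned_atoms.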

Molrep

Alexei Vagin, York

http://www.ysbl.york.ac.uk/~alexei/molrep.html

Molrep: overview of functionality

Performs complete MR in a single step:

Expt. data (MTZ) + search model (PDB) → Molrep → positioned search model

• Individual steps for more difficult cases: CRF, TF, rigid-body
• Multi-copy search: locked CRF, dyad search
• Self RF
• Phased TF, spherically-averaged phased TF
• Improve search model
• Other search models: electron density map, NMR models
• Fit model in electron density map / EM map

MR for straightforward case via GUI:

[Screenshot: Molrep GUI, annotated with the fields to set: title; mode; MTZ file; MTZ labels; search model; then RUN IT!]

Other parameters

• Low resolution cut-off: Molrep uses a soft cut-off, Boff (keywords BOFF, COMPL, RESMIN)
• High resolution cut-off: Molrep uses a soft cut-off, Badd (keywords BADD, SIM)

$$|F|_{\mathrm{new}} = |F|_{\mathrm{input}}\,\exp(-B_{\mathrm{add}}\,s^2)\,\bigl(1 - \exp(-B_{\mathrm{off}}\,s^2)\bigr)$$

Defaults estimated.

• High resolution limit: absolute cut-off (RESMAX); default estimated
• Radius of Patterson sphere for CRF: default is twice the radius of gyration of the search model (keyword RAD; under Infrequently Used Parameters in the GUI)

DEFAULTS ARE GOOD
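
The soft cut-off formula, |F|new = |F|input × exp(-Badd s²) × (1 - exp(-Boff s²)), down-weights both ends of the data rather than truncating them. The Python sketch below evaluates the weight over a range of resolutions. The Badd and Boff values are hypothetical (Molrep estimates its own defaults), and taking s = sin θ/λ = 1/(2d) is an assumption about the convention; check the Molrep documentation before reading absolute numbers off it.

```python
import math


def soft_cutoff_weight(s, b_add, b_off):
    """Factor multiplying |F|: exp(-Badd s^2) * (1 - exp(-Boff s^2))."""
    s2 = s * s
    return math.exp(-b_add * s2) * (1.0 - math.exp(-b_off * s2))


if __name__ == "__main__":
    b_add, b_off = 20.0, 300.0        # hypothetical values, not Molrep defaults
    for d in (20.0, 10.0, 6.0, 4.0, 3.0, 2.0):
        s = 1.0 / (2.0 * d)           # ASSUMPTION: s = sin(theta)/lambda
        print(f"d = {d:4.1f} A   weight = {soft_cutoff_weight(s, b_add, b_off):.3f}")
```

The (1 - exp(-Boff s²)) term suppresses the lowest-resolution data, while exp(-Badd s²) acts like an extra B factor that fades out the highest-resolution terms.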

Cross Rotation Function

[Screenshot: Molrep log output, annotated: list of top RF peaks; polar angles; Euler angles (CCP4); R factor; pointer to more details]
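
Rotation function peaks are reported both as Euler angles and as polar angles. The Python sketch below converts one into the other, assuming the ZYZ Euler convention (rotation matrix Rz(α) Ry(β) Rz(γ)) and polar angles defined as ω, the angle of the rotation axis from z; φ, the azimuth of the axis from x; and κ, the rotation angle, all in an orthogonal frame. These conventions are assumptions to check against the program documentation. The axis extraction also breaks down at κ = 0 and near κ = 180°, so exact two-folds would need separate handling.

```python
import math


def _rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def _matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_to_matrix(alpha, beta, gamma):
    """ZYZ Euler angles (degrees) -> rotation matrix Rz(alpha) Ry(beta) Rz(gamma)."""
    a, b, g = (math.radians(x) for x in (alpha, beta, gamma))
    return _matmul(_matmul(_rot_z(a), _rot_y(b)), _rot_z(g))


def matrix_to_polar(r):
    """Rotation matrix -> (omega, phi, kappa) in degrees.

    omega: angle of the rotation axis from z; phi: azimuth of the axis from x;
    kappa: rotation angle. Undefined at kappa = 0 and unreliable near 180."""
    trace = r[0][0] + r[1][1] + r[2][2]
    kappa = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    two_sin = 2.0 * math.sin(kappa)
    # The antisymmetric part of R gives the rotation axis scaled by 2 sin(kappa).
    ux = (r[2][1] - r[1][2]) / two_sin
    uy = (r[0][2] - r[2][0]) / two_sin
    uz = (r[1][0] - r[0][1]) / two_sin
    omega = math.degrees(math.atan2(math.hypot(ux, uy), uz))
    phi = math.degrees(math.atan2(uy, ux))
    return omega, phi, math.degrees(kappa)


if __name__ == "__main__":
    print(matrix_to_polar(euler_to_matrix(0.0, 90.0, 0.0)))  # 90 degree turn about y
```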

Translation Function

[Screenshot: Molrep log output, annotated: list of solutions (top TF for each RF solution); polar angles; fractional translation; R factor; Score; contrast of solution]

Identification of solutions

SCORE = product of the correlation coefficient and the maximal value of the packing function

The packing function, integrated into the TF search, removes solutions with overlapping molecules.

CONTRAST = ratio of top score to mean score:

• > 2.5: definitely solution
• 1.8 to 2.5: solution
• 1.5 to 1.8: maybe solution
• 1.3 to 1.5: maybe not solution, but program accepts it
• < 1.3: probably not solution
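
The contrast rule of thumb is easy to express in code. In this Python sketch, which scores enter the mean is an assumption (here simply every score in the list passed in), and the function names are hypothetical; Molrep computes its own contrast and prints it in the log, so this is only a way of reading that number.

```python
def contrast(scores):
    """Ratio of the top score to the mean of all scores."""
    return max(scores) / (sum(scores) / len(scores))


def interpret_contrast(c):
    """Map a contrast value onto the rule-of-thumb bands for Molrep."""
    if c > 2.5:
        return "definitely solution"
    if c > 1.8:
        return "solution"
    if c > 1.5:
        return "maybe solution"
    if c > 1.3:
        return "maybe not solution, but program accepts it"
    return "probably not solution"


if __name__ == "__main__":
    tf_scores = [0.62, 0.21, 0.20, 0.19, 0.18]   # hypothetical TF scores
    c = contrast(tf_scores)
    print(round(c, 2), interpret_contrast(c))
```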

Finding more than one copy in the asu

By default, Molrep will estimate the number of copies to find. Override with the NMON keyword.

Program flow:

CRF → TF for first copy → fix first copy → TF for second copy → fix second copy → TF for third copy → ...

Solving complexes

• Choose first component (largest, highest similarity)
• Solve for first component (probably need to specify NMON explicitly)
• New Molrep job:
  – Model in: second component
  – Fixed in: positioned first component
• Repeat for all other components

Possibility to use the spherically-averaged phased TF, using phases from the first component

Phaser

Phaser website: http://www-structmed.cimr.cam.ac.uk/phaser/

Randy Read, Airlie McCoy, Cambridge

Performs complete MR in a single step:

Expt. data (MTZ) + search model (PDB) → Phaser → positioned search model

Use “MODE MR_AUTO” or “automated search” in the GUI.

• anisotropy correction
• fast rotation function
• fast translation function
• packing
• refinement and phasing

(loop over models)

More functionality ...

• All steps can be run separately
• Search over space groups:
  – MTZ space group and its enantiomorph
  – All space groups in the MTZ point group
  – Selected space groups
• Ensemble models (see later)
• Brute RF and TF: slow and accurate
• Normal mode analysis: generates perturbed models

MR for straightforward case via GUI:

[Screenshot: Phaser GUI, annotated with the fields to set: mode; MTZ file; target details; search model; specify search; then RUN IT!]

FRF

[Screenshot: Phaser log output, annotated: Euler angles (CCP4); top LLG and Z-scores for FRF]

FTF

[Screenshot: Phaser log output, annotated: fractional translation; top LLG and Z-scores for FTF; FRF solution number]

Packing

Phaser does a packing check after the FTF:
• Clashes = Cα atoms closer than 3 Å
• Default number of clashes allowed = 10 (beware, was 0 in older versions)
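
A minimal Python sketch of the clash count between two placed copies, using the 3 Å Cα cut-off and the default of 10 from this slide. Real packing checks also generate symmetry-related copies and lattice translations, which are omitted here, and the rule that a placement passes if it has at most max_clashes clashes is an assumption.

```python
import math


def count_clashes(ca_a, ca_b, cutoff=3.0):
    """Count C-alpha pairs, one atom from each copy, closer than cutoff (Angstrom)."""
    return sum(1 for p in ca_a for q in ca_b if math.dist(p, q) < cutoff)


def packs(ca_a, ca_b, max_clashes=10):
    """Accept a placement if it has at most max_clashes clashes (assumed rule)."""
    return count_clashes(ca_a, ca_b) <= max_clashes


if __name__ == "__main__":
    copy_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]   # hypothetical CA trace
    copy_2 = [(x + 1.5, y + 1.0, z) for x, y, z in copy_1]          # badly overlapping copy
    print(count_clashes(copy_1, copy_2), packs(copy_1, copy_2))
```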

Solution files:
• A .sol file is produced at the end of the job
• Contains a summary of all solutions
• Each solution contains rotations and usually translations (3DIM vs 6DIM)
• One line per model located
• The .sol file can be read back into Phaser in later jobs

Z-score: have I solved it?
• less than 5: no
• 5 - 6: unlikely
• 6 - 7: possibly
• 7 - 8: probably
• more than 8: definitely

RFZ = RF Z-score

TFZ = TF Z-score
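
A Z-score measures how many standard deviations a peak lies above the mean of the whole search. The Python sketch below shows the definition and the rule-of-thumb table for TFZ. Phaser computes its own Z-scores from the LLG values of its search, so z_score here only illustrates the idea; the function names and the LLG values are hypothetical.

```python
import statistics


def z_score(value, all_values):
    """How many standard deviations value lies above the mean of all_values."""
    return (value - statistics.fmean(all_values)) / statistics.pstdev(all_values)


def have_i_solved_it(tfz):
    """Rule-of-thumb reading of a translation-function Z-score (TFZ)."""
    if tfz > 8:
        return "definitely"
    if tfz > 7:
        return "probably"
    if tfz > 6:
        return "possibly"
    if tfz > 5:
        return "unlikely"
    return "no"


if __name__ == "__main__":
    llg_values = [12.0, 15.0, 9.0, 14.0, 11.0, 10.0, 13.0, 95.0]  # hypothetical LLGs
    z = z_score(max(llg_values), llg_values)
    print(round(z, 1), have_i_solved_it(z))
```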

Ensemble models

Phaser refers to search models as “ensembles”

Often, ensemble contains single model, as in traditional MR

But Phaser can use an ensemble of > 1 models, which may work better than any single model

Models in an ensemble must be superposed prior to use in Phaser - use e.g. Superpose in CCP4

N.B. Phaser will complain if:
  – MW of models in ensemble are too different
  – RMS between models is too large

(In Molrep, construct ensemble as pseudo-NMR PDB file)
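
Models in an ensemble must be superposed first. CCP4's Superpose works by secondary-structure matching; the Python sketch below instead uses the simpler Kabsch least-squares fit, which assumes you already know which atoms correspond (for example Cα atoms of residues paired by the alignment). It needs NumPy, and the function name superpose is hypothetical, not a CCP4 interface.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch least-squares fit of mobile onto target.

    mobile, target: (N, 3) arrays of corresponding atoms (e.g. C-alpha).
    Returns (moved copy of mobile, RMSD after fitting)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    m_centre, t_centre = mobile.mean(axis=0), target.mean(axis=0)
    p, q = mobile - m_centre, target - t_centre
    u, _, vt = np.linalg.svd(p.T @ q)            # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))       # -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # proper rotation
    moved = p @ rot.T + t_centre
    rmsd = float(np.sqrt(((moved - target) ** 2).sum(axis=1).mean()))
    return moved, rmsd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_a = rng.normal(scale=10.0, size=(50, 3))       # hypothetical CA coordinates
    angle = np.radians(35.0)
    rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
    model_b = model_a @ rz.T + [4.0, -2.0, 7.0] + rng.normal(scale=0.5, size=(50, 3))
    _, rmsd = superpose(model_a, model_b)
    print(f"RMSD after superposition: {rmsd:.2f} A")
```

The RMSD after fitting is the kind of quantity Phaser's "RMS between models is too large" check is concerned with.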

Finding more than one copy in the asu

Specify > 1 in Composition of the asymmetric unit (keyword COMPOSITION ... NUMBER)

Specify > 1 in Number of copies to search for (keyword SEARCH ... NUMBER)

Phaser will issue warnings if these numbers are wrong.

Program flow:

CRF → TF for first copy → fix first copy (possibly multiple sets) → CRF for second copy → TF for second copy → fix second copy (possibly multiple sets) → ...

Complexes

E.g. beta-blip example in Phaser tutorial: http://www-structmed.cimr.cam.ac.uk/phaser/tutorial/Phaser_MR_tute.html

As before, but:
• Define > 1 type of component (Composition of the asymmetric unit → Define another component)
• Define > 1 ensemble (Define ensembles → Add ensemble)
• Specify all searches (Search details → Add another search)
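
The same complex search can be written as keyword input. The Python sketch below assembles a script for a two-component search such as beta-blip. MODE MR_AUTO, COMPOSITION ... NUMBER and SEARCH ... NUMBER appear on these slides; HKLIN, LABIN, ENSEMBLE ... PDBFILE ... IDENTITY and ROOT are written from memory of Phaser's keyword interface and should be checked against the documentation for your Phaser version. The dictionary layout, file names, column labels and identities are hypothetical.

```python
def phaser_keywords(hklin, f_label, sigf_label, components, root="phaser_mr"):
    """Assemble keyword input for an automated Phaser search (MODE MR_AUTO).

    components: list of dicts with keys name, pdb, identity, sequence, number,
    one per type of component. Searches are written in the order given."""
    lines = ["MODE MR_AUTO",
             f"HKLIN {hklin}",
             f"LABIN F={f_label} SIGF={sigf_label}"]
    for c in components:   # one ensemble (search model) per component
        lines.append(f"ENSEMBLE {c['name']} PDBFILE {c['pdb']} IDENTITY {c['identity']}")
    for c in components:   # what the asymmetric unit contains
        lines.append(f"COMPOSITION PROTEIN SEQUENCE {c['sequence']} NUMBER {c['number']}")
    for c in components:   # what to look for
        lines.append(f"SEARCH ENSEMBLE {c['name']} NUMBER {c['number']}")
    lines.append(f"ROOT {root}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, labels and identities for a beta-blip style complex.
    script = phaser_keywords(
        "beta_blip.mtz", "Fobs", "Sigma",
        [{"name": "beta", "pdb": "beta.pdb", "identity": 1.0, "sequence": "beta.seq", "number": 1},
         {"name": "blip", "pdb": "blip.pdb", "identity": 1.0, "sequence": "blip.seq", "number": 1}],
        root="beta_blip_mr")
    print(script)
```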

MrBUMP

Ronan Keegan, Martyn Winn, Daresbury Lab.

The aim of MrBUMP

• An automation framework for Molecular Replacement.
• Particular emphasis on generating a variety of search models.
• Can be used to generate models only.
• In favourable cases, gives a “one-button” solution.
• In unfavourable cases, will suggest likely search models for manual investigation (lead generation).
• Wraps Phaser and/or Molrep.
• Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. Fasta, Mafft).
• Uses on-line databases (e.g. PDB, Scop).

The Pipeline

Target MTZ & sequence → Target details → Template search → Model preparation → Molecular replacement & refinement → Check scores and exit, or select the next model

Search for homologous proteins

• FASTA search of PDB
  – Sequence-based search using the sequence of the target structure.
  – Can be run locally if the user has the fasta34 program installed, or remotely using the OCA web-based service hosted by the EBI.
• All of the resulting PDB id codes are added to a list.
• These structures are called model templates.

Search for additional similar structures

• Additional structure-based search (optional)
  – Top hit from the FASTA search is used as the template structure for a secondary-structure-based search.
  – Uses the SSM web service provided by the EBI (a.k.a. MSDfold).
  – Any new structures found are added to the list.
  – Provides structural variation, not based on direct sequence similarity to the target.
• Manual addition
  – Can add additional PDB id codes to the list, e.g. from FFAS or psiBLAST searches.
  – Can add local PDB files.

Multiple Alignment

• After the set of PDB ids is collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence.

• Aims:
  – Score template structures in a consistent manner, in order to prioritise them for subsequent steps.
  – Extract the pairwise alignment between template and target for use in the Chainsaw step. The multiple alignment should give a better set of alignments than the original pairwise FASTA alignments.

Multiple Alignment

[Figure: multiple alignment displayed in Jalview 2.08.1 (Barton group, Dundee), annotated: target; model templates; pairwise alignment]

Currently supports ClustalW or MAFFT for multiple alignment.

Template Model Scoring

• Sequence identity:
  – Ungapped sequence identity, i.e. sequence identity of aligned target residues
• Alignment quality:
  – Dependent on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps.
  – The penalties given for gaps and for gap size are biased so that alignments that preserve domains of the structure score higher than those that spread the aligned residues out.
• Alignment scoring:
  – score = sequence identity × alignment quality

The top scoring models are then used for further processing.
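
A Python sketch of the score = identity × quality combination. ungapped_identity follows the definition on the slide. The alignment_quality function is a hypothetical stand-in (target coverage minus gap-opening and gap-extension penalties, which favours a few long gaps over many short ones, in the spirit of the domain-preserving bias described above), not MrBUMP's actual formula; its penalty values are arbitrary.

```python
def ungapped_identity(target_aln, template_aln):
    """Fraction of aligned (non-gap in both) positions that are identical."""
    pairs = [(a, b) for a, b in zip(target_aln, template_aln) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def alignment_quality(target_aln, template_aln, gap_open=0.05, gap_extend=0.01):
    """HYPOTHETICAL quality term: target coverage minus gap penalties."""
    n_target = sum(1 for a in target_aln if a != "-")
    if n_target == 0:
        return 0.0
    covered = sum(1 for a, b in zip(target_aln, template_aln) if a != "-" and b != "-")
    penalty, in_gap = 0.0, False
    for b in template_aln:
        if b == "-":
            # Opening a gap costs more than extending one, so one long gap
            # (e.g. a missing domain) is penalised less than many short ones.
            penalty += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            in_gap = False
    return max(0.0, covered / n_target - penalty)


def template_score(target_aln, template_aln):
    return ungapped_identity(target_aln, template_aln) * alignment_quality(target_aln, template_aln)


if __name__ == "__main__":
    print(round(template_score("ACDEFGHIK", "ACD--GHLK"), 3))   # hypothetical pair
```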

Domains

• Suitable templates for target domains may exist in isolation in PDB, or in combination with dissimilar domains

• In case of relative domain motion, may want to solve domains separately

Domains

• Domains search:
  – Top scoring templates from the multiple alignment are tested to see if they contain any domains.
  – Uses the SCOP database. This only lists domains that appear more than once in the PDB.
  – The database is scanned to see if domains exist for each of the PDBs in the list of templates.
  – Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.

Multimers

• Multimer search:
  – Search for quaternary structures that may be used as search models.
  – Better signal-to-noise ratio than a monomer, if the assembly is correct for the target.
  – Multimeric structures based on top templates are retrieved using the PQS service at the EBI, and added to the list of search models.
  – PQS will soon be replaced by the PISA service at the EBI (Eugene Krissinel).

Example:
1n5a SPLIT-ASU into 4 Oligomeric files of type TRIMERIC
1n5b SPLIT-ASU into 2 Oligomeric files of type DIMERIC
1n5c SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
1n5d SYMMETRY-COMPLEX Oligomeric file of type DIMERIC

Search Model Preparation

Search models prepared in four ways:

1. PDBclip
  – original PDB with waters removed, hydrogens removed, most probable conformations for side chains selected and chain IDs added if missing.
2. Molrep
  – Molrep contains a model preparation function which will align the template sequence with the target sequence and prune the non-conserved side chains accordingly.
3. Chainsaw
  – Can be given any alignment between the target and template sequences.
  – Non-conserved residues are pruned back to the gamma atom.
4. Polyalanine
  – Created by excluding all of the side chain atoms beyond the CB atom, using the Pdbset program.

Also create an ensemble model for Phaser based on the top 5 models.
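
A Python sketch of polyalanine truncation on the fixed-column PDB format, keeping N, CA, C, O and CB. This is not Pdbset, and the function name is hypothetical; relabelling residues as ALA is an optional extra here, as the slide only describes removing the atoms beyond CB.

```python
KEEP_ATOMS = {"N", "CA", "C", "O", "CB"}


def polyalanine(pdb_lines, rename=True):
    """Truncate side chains beyond CB, optionally relabelling residues as ALA.

    Only ATOM records are kept, so waters and ligands (HETATM) are dropped.
    Glycine has no CB and keeps its name."""
    out = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() not in KEEP_ATOMS:      # atom name, columns 13-16
            continue
        if rename and line[17:20] != "GLY":            # residue name, columns 18-20
            line = line[:17] + "ALA" + line[20:]
        out.append(line)
    out.append("END")
    return out
```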

Molecular Replacement and Refinement

• The search models can be processed with Molrep or Phaser or both.
• The resulting models from molecular replacement are passed to Refmac for restrained refinement.
• The change in the Rfree value during refinement is used as a rough estimate of how good the resulting model is:
  – “success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
  – “marginal”: final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5%
  – “failure”: otherwise
• MR scores and un-refined models are available for later inspection.
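
MrBUMP's success, marginal and failure categories can be expressed as a small Python function. Whether "dropped by 20%" means a relative or an absolute drop in Rfree is not stated on the slide; this sketch assumes a relative drop from the starting Rfree, and the function name is hypothetical.

```python
def classify_mrbump(rfree_initial, rfree_final):
    """Rough MrBUMP-style verdict from Rfree before and after refinement.

    ASSUMPTION: 'dropped by 20%' means a relative drop from the initial Rfree."""
    drop = (rfree_initial - rfree_final) / rfree_initial
    if rfree_final < 0.35 or (rfree_final < 0.50 and drop >= 0.20):
        return "success"
    if rfree_final < 0.48 or (rfree_final < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"


if __name__ == "__main__":
    for start, end in [(0.55, 0.33), (0.52, 0.46), (0.53, 0.52)]:   # hypothetical runs
        print(start, end, classify_mrbump(start, end))
```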

MrBUMP on compute clusters

• MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.
• Currently Sun Grid Engine enabled clusters are supported, but support will be added for LSF, Condor and other types of queuing system if there is enough demand.
• All nodes terminate when one finds a solution.

Pre-release version of MrBUMP

• Pre-release made available in Jan 06
• Simple installation
• Currently runs on Linux and OSX.
• Windows version almost ready.
• Comes with CCP4 GUI.
• Can also be run from the command line with keyword input
• First citation in Obiero et al., Acta Cryst. (2006). F62, 757-760
• Regular updates (currently version 0.3.2)

http://www.ccp4.ac.uk/MrBUMP

A few observations ...

• In difficult cases, success in MrBUMP may depend on the particular template, chain and model preparation method.
• Nevertheless, you may get several putative solutions.
• Ease of subsequent model re-building and model completion may depend on the choice of solution.

• First solution or check everything?
• Expectation that a quick solution is required; in fact, most users seem happy to let MrBUMP run for a long time (hours, days).

• Worth checking “failed” solutions!