bioexcel.eu
Assessing structure quality in the PDB archive
Presenter: Matthew Conroy
Host: Adam Carter
BioExcel Webinar Series
8 February, 2017
This webinar is being recorded
BioExcel Overview
• Excellence in Biomolecular Software
  - Improve the performance, efficiency and scalability of key codes
• Excellence in Usability
  - Devise efficient workflow environments with associated data integration
• Excellence in Consultancy and Training
  - Promote best practices and train end users
[Diagram. Components: Portal / Workbench, DMI Gateway, DMI Planner, DMI Optimiser, DMI Validator, DMI Enactor, DMI Executor, DMI Monitor, Registry, Repository, Data Source, Data Delivery Point. Roles: Domain Expert, DMI Expert, DADC Engineer. Arrows: DMI request, service invocation, data flow, monitoring flow.]
Interest Groups
• Integrative Modeling IG
• Free Energy Calculations IG
• Hybrid methods for biomolecular systems IG
• Biomolecular simulations entry level users IG
• Practical applications for industry IG
• Training
• Workflows
Support platforms: http://bioexcel.eu/contact
Forums, code repositories, chat channel, video channel
Audience Q&A session
Please use the Questions function in the GoToWebinar application.
Any other questions or points to discuss after the live webinar? Join the discussion at http://ask.bioexcel.eu.
Today’s Presenter
Matthew Conroy is a Scientific Curator at EMBL-EBI working with the Protein Data Bank in Europe. Before joining PDBe, he was solving structures of proteins by X-ray crystallography, NMR and electron microscopy.
Protein Data Bank in Europe
PDBe.org
Assessing structure quality in the PDB archive
Matthew Conroy
What is the Protein Data Bank (PDB)?
An archive of experimentally determined 3-dimensional structures of biological macromolecules
Proteins, nucleic acids, sugars
wwPDB.org
[Diagram: ‘The PDB’ FTP archive, surrounded by value-added data]
From the small… to the large
Models are interpretations of experimental data
Copper scavenger: <1 kDa
Zika virus: 4 MDa
What experimental data are available?
10% solution NMR spectroscopy
Data: restraints (mandatory since 2008), chemical shifts (mandatory since 2010)
88% X-ray crystallography
Data: structure factors (mandatory since 2008)
1% electron microscopy
Data: map in EMDB (mandatory since 2016)
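As a rough illustration of the deposition rules listed above, the sketch below looks up which experimental data one might expect to find for an entry, given its method and deposition year. The dictionary layout, the function name and the assumption that "mandatory since" includes the stated year are illustrative choices, not part of the talk.

```python
# Years from which each kind of experimental data became mandatory,
# as listed on the slide. The data structure and function name are
# illustrative assumptions.
MANDATORY_SINCE = {
    "X-ray crystallography": {"structure factors": 2008},
    "solution NMR": {"restraints": 2008, "chemical shifts": 2010},
    "electron microscopy": {"map in EMDB": 2016},
}


def expected_data(method, deposition_year):
    """Return the data types that were mandatory for this method and year.

    Assumes "mandatory since YEAR" includes depositions made in YEAR.
    """
    rules = MANDATORY_SINCE.get(method, {})
    return sorted(name for name, year in rules.items() if deposition_year >= year)


print(expected_data("solution NMR", 2009))
print(expected_data("solution NMR", 2012))
print(expected_data("electron microscopy", 2010))
```

Older entries may still have data deposited voluntarily, so an empty result only means that deposition was not yet required.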
Like in all science:
Some data are better than other data
Some models (interpretations of that data) are better than other models
For whatever you need to do, you need to:
• Find the best model
• Be aware of any potential limitations
Different techniques have different strengths
• “Structures are not absolute truths – they are models that fit the experimental data and therefore have uncertainty and subjectivity associated with them.”
Whaddaya Know: A Guide to Uncertainty and Subjectivity in Structural Biology
Trends in Biochemical Sciences, MacKay et al., 2017
DOI: 10.1016/j.tibs.2016.11.002
Poll
• What do you use PDB data for primarily?
• Template for homology model
• Protein-Protein complex prediction
• Small molecule (drug) docking studies
• Something else
All structures in the PDB are models explaining data
The PDB archive does not reject structures based on quality
But we do give an indication of that quality
• Essentially, all models are wrong, but some are useful.
• George E.P. Box
Validating PDB Data
What is the wwPDB doing to indicate structure quality?
Validation Task Forces
Method-specific Validation Task Forces
Advise us how best to validate:
• the models
• the experimental data
• fit of model to data
Most entries in the PDB archive come with validation reports
PDF documents downloadable from wwPDB sites
Also XML format for machine reading
         Model quality   Data quality*                       Fit of model to data*
X-ray    ✓               ✓                                   ✓
NMR      ✓               ✓                                   ✓
EM       ✓               Not yet, but on EBI’s EMDB pages    Not yet, but on EBI’s EMDB pages
* If data were deposited
On wwPDB sites (recalculated annually)
Available at deposition (for authors and referees)
Standalone server: validate.wwpdb.org
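Because the reports also exist as XML, they can be read by a script. The sketch below shows the general idea with Python's standard `xml.etree.ElementTree`. The sample document is hand-written, and its element and attribute names (`report`, `residue`, `outlier_types`) are placeholders for illustration only; they are not the wwPDB schema, which should be checked before parsing a real report.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample. Element and attribute names are
# placeholders and are NOT taken from the real wwPDB validation schema.
SAMPLE_REPORT = """
<report entry="example">
  <residue chain="A" number="48" name="VAL" outlier_types="ramachandran"/>
  <residue chain="A" number="62" name="ASP" outlier_types="bond_length,clash"/>
  <residue chain="A" number="63" name="GLY" outlier_types=""/>
</report>
"""


def residues_with_outliers(xml_text):
    """Return (chain, number, name, [outlier types]) for each flagged residue."""
    root = ET.fromstring(xml_text)
    flagged = []
    for res in root.iter("residue"):
        # Split the comma-separated list and drop empty entries.
        types = [t for t in res.get("outlier_types", "").split(",") if t]
        if types:
            flagged.append(
                (res.get("chain"), int(res.get("number")), res.get("name"), types)
            )
    return flagged


for item in residues_with_outliers(SAMPLE_REPORT):
    print(item)
```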
At PDBe, they’re available from search results
And also from each entry page
Reports start with a summary, then get more detailed
1. Overall quality at a glance
2. Residue-property plots
Highlight outliers
3. Detailed analysis of any potential issues
Start with a summary
• Overall quality at a glance
How does this structure compare to others in the PDB archive?
Several different metrics considered
Poor ranking doesn’t necessarily mean a structure is ‘wrong’
It may mean it’s not as ‘right’ as others
There can be justified reasons for outliers, e.g. strain
Are they supported by data?
We use these to rank results at PDBe
Quality
Overall quality at a glance, aka ‘Summary Sliders’
Atoms bumping into each other
Geometric quality is assessed using MolProbity
To compare a structure to others in the PDB
X-ray, NMR and EM are all judged the same way
Surprising bond angles
Compared to the whole PDB and to similar structures
Of course, these only tell you the model is chemically sensible
Good geometry doesn’t mean it’s right!
Overall quality at a glance, aka ‘Summary Sliders’
Extra metrics for X-ray structures
How well does the model back-predict the data?
Lower values are better
Residues not in electron density
The real-space R-value (RSR) measures fit between a residue and the data. The RSR Z-score (RSRZ) is a normalisation of this specific to the residue type and a resolution bin. Outliers have RSRZ >2
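The normalisation described above can be sketched as an ordinary Z-score: a residue's RSR is compared with the RSR values of residues of the same type in the same resolution bin. The reference values below are made up, and the use of a population standard deviation is an assumption; the exact reference set and statistics used in the wwPDB reports are not given in the talk.

```python
from statistics import mean, pstdev


def rsrz(rsr, reference_rsrs):
    """Z-score of one residue's RSR against reference RSR values for the
    same residue type and resolution bin (reference set is an assumption)."""
    mu = mean(reference_rsrs)
    sigma = pstdev(reference_rsrs)
    return (rsr - mu) / sigma


# Made-up reference values for one residue type in one resolution bin.
reference = [0.08, 0.10, 0.12, 0.10, 0.10]

for value in (0.11, 0.20):
    z = rsrz(value, reference)
    # Residues with RSRZ > 2 are reported as outliers.
    print(f"RSR={value:.2f}  RSRZ={z:.2f}  outlier={z > 2}")
```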
A look at X-ray data
From the diffraction pattern, a map can be calculated.
This indicates the location of electrons (and therefore atoms) in the crystal
Hence ‘electron density map’
Model is built into this map in an iterative process
Resolution indicates precision with which atoms can be placed
[Figure panels: 3.7 Å, 2.4 Å, 1.5 Å, 0.8 Å]
In low resolution data, models might simply indicate location/orientation of proteins
37 Å EM map of F1Fo ATPase to show role in shaping mitochondrial cristae
PDB entry 4b2q
Residue-property plots – per polymer chain
Residues are colour coded according to the number of geometric quality criteria with outliers, i.e. model quality:
Green = 0
Yellow = 1
Orange = 2
Red = 3 or more
Grey = unmodelled residues
Outliers are counted in these metrics:
• Bond length and angle
• Chirality/planarity
• Too-close contacts
• Ramachandran
• Sidechain torsion angle
• RNA sugar pucker
The red dot! For X-ray structures, a red dot above a residue indicates a poor fit to the electron density (i.e. an RSRZ outlier)
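The colour scheme of the residue-property plots maps directly onto a small lookup. The sketch below is a minimal illustration; the function name and the `modelled` flag are assumptions made for the example.

```python
def residue_colour(outlier_count, modelled=True):
    """Colour for a residue given how many geometric criteria have outliers.

    Green = 0, yellow = 1, orange = 2, red = 3 or more, grey = unmodelled.
    """
    if not modelled:
        return "grey"
    if outlier_count < 0:
        raise ValueError("outlier_count cannot be negative")
    if outlier_count >= 3:
        return "red"
    return ("green", "yellow", "orange")[outlier_count]


for count in range(5):
    print(count, residue_colour(count))
print("unmodelled", residue_colour(0, modelled=False))
```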
Visualising validation at PDBe
[Plot: likelihood of disorder. Legend: disordered in model ensemble / ordered in model ensemble]
NMR data quality and fit to data
• This is at the moment basic
1. Completeness of resonance assignments
   % of atoms for which a chemical shift is measured
2. Statistically unusual shifts
3. Random Coil Index
   • Do chemical shift and protein conformation agree?
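The first metric, completeness of resonance assignments, is a simple percentage. A minimal sketch follows, with atoms represented as hypothetical (residue number, atom name) pairs; which atoms count as "expected" in the real reports is not specified in the talk and is left to the caller here.

```python
def assignment_completeness(assigned_atoms, expected_atoms):
    """Percentage of expected atoms that have a measured chemical shift."""
    expected = set(expected_atoms)
    if not expected:
        raise ValueError("expected_atoms must not be empty")
    measured = expected & set(assigned_atoms)
    return 100.0 * len(measured) / len(expected)


# Hypothetical atoms, identified by (residue number, atom name).
expected = [(1, "H"), (1, "N"), (1, "CA"), (1, "HA")]
assigned = [(1, "H"), (1, "N"), (1, "CA")]

print(f"{assignment_completeness(assigned, expected):.1f}% assigned")
```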
EM data quality and fit to data
PDBe.org/EMD-8116
Not yet in the report, but available at EMDB: pdbe.org/emdb
Geometry and fit to data go hand-in-hand
Asp 62 has some outlier bond lengths
But these are not justified by the electron density
Val 48 in entry 3kse is a Ramachandran outlier
But the strained conformation is supported by data
An aside on assemblies
• Only the smallest part of a crystal structure is deposited to the PDB archive
• The whole can be generated by applying symmetry to this
• The part you’re interested in could be:
  - The file as it is
  - The file and symmetry
  - Only part of the file
HIV protease, entry 2az9
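Generating the whole from the deposited part amounts to applying a symmetry operator, a rotation plus a translation, to every atom. The sketch below uses made-up coordinates and a made-up two-fold rotation about the z axis; the real operators for an entry come with its deposited file and are not reproduced here.

```python
def apply_operator(coords, rotation, translation):
    """Apply x' = R x + t to every (x, y, z) coordinate."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


# Hypothetical deposited chain (three atoms) and a hypothetical operator.
deposited = [(1.0, 2.0, 3.0), (2.0, 2.5, 3.5), (3.0, 1.0, 4.0)]
TWO_FOLD_Z = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))  # 180 degrees about z
NO_SHIFT = (0.0, 0.0, 0.0)

partner = apply_operator(deposited, TWO_FOLD_Z, NO_SHIFT)
assembly = deposited + partner  # "the file and symmetry"

print(len(assembly), "atoms in the generated assembly")
print("first atom of the symmetry copy:", partner[0])
```

Applying a two-fold operator twice returns the original coordinates, which is a quick sanity check on any operator of this kind.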
Viewing assemblies at PDBe with LiteMol
Validation for small molecules
• There are thousands of amino acids and nucleic acid bases in the PDB
• But a small molecule could be unique
• So how can we tell what it should look like?
Summary of issues: X marks an outlier!
Is it the correct handedness?
Does it hit anything?
Is it chemically sensible?
Does it fit the data?
Validation for small molecules
1. Geometry
Compares bond lengths and angles with chemically similar fragments
Mogul - A knowledge-based library of molecular geometry derived from the Cambridge Structural Database (CSD)
Validation for small molecules
2. Fit to data
RSR: measure of how well a residue fits its local density
>0.5 means it is worth a second look
LLDF: ‘local ligand density fit’
Z-score of ligand RSR relative to nearby polymeric residues
>2 is flagged as unusual
LLDF: “Is the ligand data quality comparable to that of the binding site?”
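The two fit-to-data checks for a ligand can be sketched as follows. The RSR values are made up, the choice of which polymeric residues count as "nearby" is left to the caller, and the population standard deviation is an assumption; only the two thresholds (RSR > 0.5, LLDF > 2) come from the slide.

```python
from statistics import mean, pstdev


def lldf(ligand_rsr, neighbour_rsrs):
    """Z-score of the ligand RSR relative to nearby polymeric residues."""
    mu = mean(neighbour_rsrs)
    sigma = pstdev(neighbour_rsrs)
    return (ligand_rsr - mu) / sigma


def ligand_flags(ligand_rsr, neighbour_rsrs):
    """Return the fit-to-data warnings that apply to a ligand."""
    flags = []
    if ligand_rsr > 0.5:
        flags.append("RSR > 0.5")
    if lldf(ligand_rsr, neighbour_rsrs) > 2:
        flags.append("LLDF > 2")
    return flags


# Made-up RSR values for residues around a hypothetical binding site.
neighbours = [0.10, 0.12, 0.14, 0.12]

print(ligand_flags(0.13, neighbours))  # fits about as well as the site
print(ligand_flags(0.30, neighbours))  # fits much worse than the site
```

A ligand can be flagged by LLDF while staying under the RSR threshold, which is the point of the question quoted above: the comparison is with the binding site, not with an absolute scale.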
Difference density: the ‘red and green bits’!
Along with the electron density maps we’ve already seen comes a second map, indicating:
Areas where the model has too many atoms for the data
Data suggest too few atoms here
Areas where the model has too few atoms for the data
Is there a hint of a ligand bound to the Haem of this cytochrome P450?
Difference density can indicate modelling errors
Large green ‘blob’: there might be something else here
Difference density can indicate modelling errors
Aspirin in red density: its presence is unsupported by data (it is also not interacting with the protein much!)
Viewing ligands at PDBe: each has its own page
A summary of validation
It’s not black and white!
Structures can’t be assigned ‘good’ or ‘bad’ easily
Look at the validation report
• How many outliers? Are they justified? Are they talked about in the paper?
• Does the model fit the data?
Does the structure adequately explain what is known biologically?
A summary of validation
wwPDB validation reports are extensive and a good aid to identifying structure quality
Search results at PDBe rank by quality
These take into account validation metrics and resolution
PDBe pages allow validation to be viewed in 3D
Geometry + data for EM and X-ray
Also a PyMOL plugin to show geometry
Available at pymolwiki.org
Next Webinar
15th February, 2017
15:00 GMT / 16:00 CET
Robust solutions for cryoEM fitting and visualisation of interaction space
Gydo van Zundert with Mikael Trellet and Jörg Schaarschmidt
Find out more, and register at www.bioexcel.eu/webinars