35
A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry Department University of Wisconsin – Madison USA Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006

A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Embed Size (px)

Citation preview

Page 1: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps

Frank DiMaio, Jude ShavlikComputer Sciences Department

George PhillipsBiochemistry Department

University of Wisconsin – MadisonUSA

Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006

Page 2: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

X-ray Crystallography

ProteinCrystal

CollectionPlate

FFTFFT

ElectronDensity Map(“3D picture”)

X-ray beam

Page 3: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Given: Sequence + Density Map

N

O

H

N

O

N

O

N

ON

N

O

N

OO

Sequence + Electron Density Map

Page 4: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Find: Each Atom’s Coordinates

N

O

H

N

O

N

O

N

ON

N

O

N

OO

Page 5: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Our Subtask: Backbone Trace

Page 6: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

The Unit Cell 3D density function ρ(x,y,z) provided over unit cell Unit cell may contain multiple copies of the protein

Page 7: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

The Unit Cell 3D density function ρ(x,y,z) provided over unit cell Unit cell may contain multiple copies of the protein

Page 8: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Density Map Resolution

ARP/wARP(Perrakis et al. 1997)

TEXTAL(Ioerger et al. 1999)

Resolve(Terwilliger 2002)

Our focus

2Å2Å 3Å3Å 4Å4Å

Page 9: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Overview of ACMI (our method)

Local Match Algorithm searches for sequence-specific

5-mers centered at each amino acid Many false positives

Global Consistency Use probabilistic model to filter false positives Find most probable backbone trace

Global Consistency Use probabilistic model to filter false positives Find most probable backbone trace

Page 10: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

5-mer Lookup and Cluster

…VKH V LVSPEKIEELIKGY…

PDB

Cluster 1 Cluster 2

wt=0.67 wt=0.33NOTE: can be done in precompute step

Page 11: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

5-mer Search

6D search (rotation + translation) forrepresentative structures in density map

Compute “similarity”

Computed by Fourier convolution (Cowtan 2001)

Use tuneset to convert similarity score to probability

y

mapfragfrag yxyyxt 2 )()()()(

Page 12: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Convert Scores to Probabilities

5-mer representative

scores ti (ui)

search density map

Bayes’rule

probability distribution over unit cell

P(5-mer at ui | Map)

match to tuneset

scoredistributions

POSNEG

Page 13: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

In This Talk…

Where we are now

For each amino acid in the protein, we have a probability distribution over the unit cell

Where we are headedFind the backbone layout maximizing

P( | )iu Map

AAs P( | )ii

u Map

AA-pairs

P(conformation { , })i ji, j

u u

Page 14: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Pairwise Markov Field Models A type of undirected graphical model

Represent joint probabilities as product of vertex and edge potentials

Similar to (but more general than) Bayesian networks

edges

( )st s ts t

ψ u ,u

vertices

( | )s ss

u y

( | )p U y

u1 u3u2

y

Page 15: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Protein Backbone Model

ALA GLY LYS LEU

Each vertex is an amino acid Each label is location + orientation

Evidence y is the electron density map Each vertex (or observational) potential

comes from the 5-mer matching( | )i iu y

,i i iu x q

Page 16: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Protein Backbone Model

Two types of edge (or structural) potentials Adjacency constraints ensure adjacent amino acids

are ~3.8Å apart and in the proper orientation

ALA GLY LYS LEU

Page 17: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Protein Backbone Model

Two types of structural (edge) potentials Adjacency constraints ensure adjacent amino acids

are ~3.8Å apart and in the proper orientation Occupancy constraints ensure nonadjacent amino

acids do not occupy same 3D space

ALA GLY LYS LEU

Page 18: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Backbone Model Potential( | )p u Map

amino acid

( | )i ii

u Mapadjacent AAs

( )adj i j

i j

ψ u ,u

nonadjacent AAs

( )occ i j

i j

ψ u ,u

Constraints between adjacent amino acids:

(|| ||)x i jp x x= x( )adj i jψ u ,u ( )i jp u ,u

Page 19: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Constraints between nonadjacent amino acids:

0 if || ||

1 otherwisei j

occ i j

x x Kψ u ,u

( | )p u Map

Backbone Model Potential

amino acid

( | )i ii

u Mapadjacent AAs

( )adj i j

i j

ψ u ,u

nonadjacent AAs

( )occ i j

i j

ψ u ,u

Page 20: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Observational (“amino-acid-finder”) probabilities

-2 2

|

Pr(5mer ... at )i i

i i i

ψ u

s s u

Map

( | )p u Map

Backbone Model Potential

amino acid

( | )i ii

u Mapadjacent AAs

( )adj i j

i j

ψ u ,u

nonadjacent AAs

( )occ i j

i j

ψ u ,u

Page 21: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Probabilistic Inference

Exact methods are intractable Use belief propagation (BP) to approximate

marginal distributions

Want to find backbone layout that maximizes

( | ) ( | )i ip u p Map U Map

amino acid

( | )i ii

u Mapadjacent AAs

( )adj i j

i j

ψ u ,u

nonadjacent AAs

( )occ i j

i j

ψ u ,u

, ku k i

Page 22: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Belief Propagation (BP)

Iterative, message-passing method (Pearl 1988)

A message, , from amino acid i toamino acid j indicates where i expects to find j

An approximation to the marginal (or belief) ,is given as the product of incoming messages

nib

ni jm

Page 23: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

2 ( )GLY ALA ALAm x )(1GLYGLYALA xm

1 ( | )GLY GLYb x y0 ( | )ALA ALAb x y 0 ( | )GLY GLYb x y2 ( | )ALA ALAb x y

Belief Propagation Example

ALA GLY

Page 24: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Technical Challenges

Representation of potentials Store Fourier coefficients in Cartesian space At each location x, store a single orientation r

Speeding up O(N2X2) naïve implementation X = the unit cell size (# Fourier coefficients) N = the number of residues in the protein

Page 25: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Speeding Up O(N2X2) Implementation

O(X2) computation for each occupancy message Each message must integrate over the unit cell O(X log X) as multiplication in Fourier space

O(N2) messages computed & stored Approx N-3 occupancy messages with a single message O(N) messages using a message product accumulator

Improved implementation O(NX log X)

Page 26: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

1XMT at 3Å Resolution

1.12Å RMSd100% coverage

HIGH

LOW

0.17

0.82

prob(AA at location)

Page 27: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

1VMO at 4Å Resolution

3.63Å RMSd72% coverage

0.02

0.25

HIGH

LOW

prob(AA at location)

Page 28: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

1YDH at 3.5Å Resolution

1.47Å RMSd90% coverage

0.02

0.27

HIGH

LOW

prob(AA at location)

Page 29: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Experiments

Tested ACMI against other map interpretation algorithms: TEXTAL and Resolve

Used ten model-phased maps

Smoothly diminished reflection intensitiesyielding 2.5, 3.0, 3.5, 4.0 Å resolution maps

Page 30: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

RMS Deviation

0

2

4

6

8

10

12

2.0 2.5 3.0 3.5 4.0 4.5

ACMI

Textal

Resolve

ACMITextalResolve

Density Map Resolution

RM

S D

evia

tion ACMI

Page 31: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

0%

20%

40%

60%

80%

100%

2.0 2.5 3.0 3.5 4.0 4.5

ACMI

Textal

Resolv11e

Model Completeness

Density Map Resolution

0%

20%

40%

60%

80%

100%

2.0 2.5 3.0 3.5 4.0 4.5

ACMI

Textal

Resolve% c

hain

tra

ced

% r

esid

ues

iden

tifie

d

ACMI

Page 32: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Per-protein RMS Deviation

ACMI RMS Error

TE

XT

AL

RM

S E

rror

Res

olve

RM

S E

rror

0

2

4

6

8

10

12

14

16

0 2 4 6 8 10 12 14 16

2.5A3.0A3.5A4.0A

0

2

4

6

8

10

12

14

16

0 2 4 6 8 10 12 14 16

Page 33: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Conclusions

ACMI effectively combines weakly-matching templates to construct a full model

Produces an accurate trace even with poor-quality density map data

Reduces computational complexity from O(N2

X2) to O(N X log X)

Inference possible for even large unit cells

Page 34: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Future Work

Improve “amino-acid-finding” algorithm

Incorporate sidechain placement / refinement

Manage missing data Disordered regions Only exterior visible (e.g., in CryoEM)

Page 35: A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry

Acknowledgements

Ameet Soni

Craig Bingman

NLM grants 1R01 LM008796 and 1T15 LM007359