Studying Protein Motions | Dialogues with Data | Summary | Thanks
Parameter Estimation as a Problem in Statistical Thermodynamics
Keith A. Earle (1,2), David J. Schneider (3,4)
(1) Department of Physics, University at Albany, SUNY
(2) ACERT, Cornell University
(3) USDA
(4) Department of Plant Pathology, Cornell University
Thursday, 8 July 2010
Outline
1 Studying Protein Motions
    Resolving Multiple Time Scales
    Interpreting Magnetic Resonance Spectra
2 Dialogues with Data
    Building a Probability Distribution Function
    What can you do with your PDF?
    Putting Geometry and Statistical Mechanics to Work
T4 Lysozyme: ACERT and Hubbell group collaboration
Figure: Dynamic modes: global and local
Sensitivity of EPR: Frequency-dependent windows
Figure: Dynamic processes cover many decades in rate
Hamiltonian
Field-Swept vs. Time Domain
A Third Way: Stochastic Excitation
Figure: Blümich Prog. in NMR Spec. 19:331–417 (1987)
Magnetic resonance absorption is non-negative and normalized.
It can be treated as a probability density function.
This leads to statistical geometry.
Derivative vs Absorption Representation
Applied magnetic field 9 T
High field gives g and A resolution
Dynamics
The isotropic part of the spin interactions gives the line positions.
The anisotropic part of the spin interactions gives the line widths.
Collect spectra over a range of frequencies: Generate a data matrix
Figure: 131 R2 in ficoll solution
Zhang, et al., J. Phys. Chem. 114(16):5503–5521.
Complex, Heterogeneous Systems: The Blind Monk Problem
Figure: Simultaneous multifrequency line shape analysis: Many blind monks, pooling their knowledge, can learn a lot about the elephant.
Eliminate Unnecessary Detail
Figure: Tame the elephant, but don’t overwhelm the science
Start with a simple model
Figure: Simulation of an exchange-narrowed multiplet with noise
Analytical Expression for Lineshape
In the absence of noise, the line shape has the following form:
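\[ p(\omega|\theta) = \frac{1}{\pi(2I+1)}\, \Re\left[ \langle v |\, C^{-1}(\omega|\theta) \,| v \rangle \right], \tag{1} \]
\[ C(\omega|\theta) = \begin{pmatrix}
i(\Delta\omega + J) - \frac{3}{5T} & \frac{1}{5T} & \frac{2}{5T} \\
\frac{1}{5T} & i\,\Delta\omega - \frac{2}{5T} & \frac{1}{5T} \\
\frac{2}{5T} & \frac{1}{5T} & i(\Delta\omega - J) - \frac{2}{5T}
\end{pmatrix} \tag{2} \]
The expression for p(ω|θ) is non-negative and normalizable.
Treat it as a probability density function (PDF) [1].
[1] Streater, R. F. “Statistical Dynamics: A Stochastic Approach to Nonequilibrium Thermodynamics”, 2nd Edition.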
Parameter Sensitivity
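The Fisher information matrix is a way to quantify the parameter sensitivity of the model.
\[ g_{ij}(\theta) = \int d\omega \left( \frac{\partial \ln p(\omega|\theta)}{\partial \theta_i} \right) \left( \frac{\partial \ln p(\omega|\theta)}{\partial \theta_j} \right) p(\omega|\theta) \tag{3} \]
The determinant of the Fisher information matrix is a useful measure of parameter ‘stiffness’.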
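A minimal numerical sketch of Eq. (3) follows. It does not use the exchange line shape of the talk: as a stand-in it assumes a single normalized Lorentzian line with hypothetical parameters θ = (ω0, γ), approximates the derivatives of ln p by central differences, and integrates with the trapezoid rule. All function names and grid settings are illustrative choices.

```python
import math

def lorentzian(w, w0, gamma):
    """Normalized Lorentzian absorption line (illustrative stand-in PDF)."""
    return (gamma / math.pi) / ((w - w0) ** 2 + gamma ** 2)

def fisher_information(pdf, theta, grid, rel_step=1e-5):
    """Approximate g_ij = integral of (d_i ln p)(d_j ln p) p on a uniform grid."""
    n = len(theta)
    dw = grid[1] - grid[0]
    g = [[0.0] * n for _ in range(n)]
    for k, w in enumerate(grid):
        p = pdf(w, *theta)
        score = []
        for i in range(n):
            # Central difference of ln p with respect to parameter i.
            h = rel_step * max(abs(theta[i]), 1.0)
            up = list(theta)
            dn = list(theta)
            up[i] += h
            dn[i] -= h
            score.append((math.log(pdf(w, *up)) - math.log(pdf(w, *dn))) / (2 * h))
        # Trapezoid rule: half weight at the two end points.
        weight = 0.5 * dw if k in (0, len(grid) - 1) else dw
        for i in range(n):
            for j in range(n):
                g[i][j] += weight * score[i] * score[j] * p
    return g

# Hypothetical settings: line centre 0, half width 1, band of +/-400 half widths.
grid = [-400.0 + 0.02 * k for k in range(40001)]
g = fisher_information(lorentzian, [0.0, 1.0], grid)
det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]  # 'stiffness' measure
```

For this stand-in PDF both diagonal elements should come out close to 1/(2γ²) = 0.5, the known Fisher information of a Cauchy distribution, with a small deficit in the width element from truncating the tails of the band.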
Parameter Combinations
Eigenvalues and eigenvectors of the Fisher information identify significant parameter combinations.

Fisher Information      Eigenvalues        Eigenvectors ×100
[ 9270   −224 ]         [ 4960     0 ]     [  −5.2   −99.9 ]
[ −224   4970 ]         [    0  9280 ]     [ −99.9     5.2 ]

Table: Left: Matrix elements of the Fisher information. Center: Eigenvalues of the Fisher information matrix. Right: Eigenvectors of the Fisher information. Matrix element order: J, 1/T
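The entries of the table can be cross-checked with a closed-form eigendecomposition of a symmetric 2×2 matrix. The sketch below is only an illustration: the helper name `eig_sym_2x2` is hypothetical, and the overall sign of each eigenvector is arbitrary.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (ascending) and unit eigenvectors of [[a, b], [b, d]]."""
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        if a <= d:
            return (a, d), [(1.0, 0.0), (0.0, 1.0)]
        return (d, a), [(0.0, 1.0), (1.0, 0.0)]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    values = (mean - radius, mean + radius)
    vectors = []
    for lam in values:
        # (a - lam) x + b y = 0 is solved by (x, y) = (b, lam - a).
        x, y = b, lam - a
        norm = math.hypot(x, y)
        vectors.append((x / norm, y / norm))
    return values, vectors

# Fisher information from the table, parameter order (J, 1/T).
values, vectors = eig_sym_2x2(9270.0, -224.0, 4970.0)
for lam, (x, y) in zip(values, vectors):
    print(f"eigenvalue {lam:7.1f}  eigenvector x100: ({100 * x:6.1f}, {100 * y:6.1f})")
```

The small off-diagonal element means the eigenvectors are almost aligned with the J and 1/T axes, so the two parameters are nearly independent in this example.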
Software for computing exchange line shapes is available at the Earle group website [2].
[2] http://earlelab.rit.albany.edu. Thanks to Nabin Malakar for translating the original Octave scripts to a MATLAB-compatible form.
Analytical Derivatives
More realistic models: incorporate Zeeman and hyperfine interactions.
Depicting any coordinate system in a Cartesian way implies a Cartesian geometry, but few people take that geometry seriously. Once you differentiate vector fields or compute Taylor series expansions in the usual way, you have taken that geometry very seriously even if you don’t realize it.
Paraphrased from M. K. Murray and J. W. Rice, “Differential Geometry and Statistics” (Chapman and Hall, New York) 1993.
Distortion Energy
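Define a distortion energy
\[ U(\omega|\theta) = \tfrac{1}{2}\, g \left( S(\omega) - M(\omega|\theta) \right)^2 \tag{4} \]
The canonical distribution for the distortion energy follows from the method of Lagrange multipliers. Here, β specifies the mean energy: the observed mean squared residual.
\[ P(\omega|\theta) = \frac{\exp(-\beta U(\omega|\theta))}{\sum_{\omega \in \Omega} \exp(-\beta U(\omega|\theta))} \tag{5} \]
The denominator is the partition function for this system.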
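Eqs. (4) and (5) can be sketched in a few lines. The example assumes that S(ω) is the measured spectrum and M(ω|θ) the model spectrum sampled on the same grid, which the slide does not state explicitly; the data values, β, and function names are made up for illustration. The minimum energy is subtracted before exponentiating, a common numerical safeguard that leaves the normalized PDF unchanged.

```python
import math

def distortion_energy(signal, model, g=1.0):
    """Eq. (4): U = (1/2) g (S - M)^2 at every observation point."""
    return [0.5 * g * (s - m) ** 2 for s, m in zip(signal, model)]

def canonical_pdf(energies, beta):
    """Eq. (5): Boltzmann weights normalized by the partition function.

    Returns the PDF and ln Z.
    """
    u_min = min(energies)
    weights = [math.exp(-beta * (u - u_min)) for u in energies]
    z_shifted = sum(weights)
    pdf = [w / z_shifted for w in weights]
    log_z = math.log(z_shifted) - beta * u_min
    return pdf, log_z

# Hypothetical five-point 'spectrum' and two candidate models.
signal = [0.0, 0.2, 1.0, 0.2, 0.0]
model_good = [0.0, 0.2, 1.0, 0.2, 0.0]
model_bad = [0.0, 0.0, 0.5, 0.5, 0.0]
beta = 10.0

pdf_good, log_z_good = canonical_pdf(distortion_energy(signal, model_good), beta)
pdf_bad, log_z_bad = canonical_pdf(distortion_energy(signal, model_bad), beta)
```

With a perfect fit every residual vanishes, so the PDF is exactly uniform at 1/N and Z equals N; a poor model concentrates probability where the residual happens to be small.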
Parameter Dependence of PDF
Figure: Blue: PDF of optimum model. Red: PDF of suboptimal model
The PDF fluctuates around 1/N. Here, N is the number of observation points in the measurement band.
Parameter Sensitivity to the Signal-to-Noise Ratio
Figure: Left: S/N ≈ 100. Right: S/N ≈ 50. Top: Spectrum J = 0.3 [s⁻¹], T = 100 [s]. Bottom: Z
Benefit of Higher S/N
Figure: Red: S/N ≈ 50. Blue: S/N ≈ 100
The steps indicate progress along a linear path from J_i = 0.1 → J_f = 0.3 and T_i = 50 → T_f = 100.
The Big Picture
Optimum PDF is uniform (plus fluctuations): θ = θ_0
Suboptimal PDF has large excursions (large distortion energy): θ ≠ θ_0
Update model parameters to achieve uniform PDF: θ → θ_0.
Parameter optimization is a transport problem.
Compute entropy from the partition function:
\[ \Delta S = \sum_{\omega \in \Omega} P(\omega|\theta) \ln \left( \frac{P(\omega|\theta)}{P(\omega|\theta_0)} \right) \]
Estimate P(ω|θ_0) from the noise residual in the baseline: there is no need to know θ_0 a priori.
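The expression for ΔS has the form of a relative entropy (Kullback-Leibler divergence) between the current PDF and the reference PDF. The sketch below evaluates it for made-up numbers; as a simplifying assumption it takes the reference to be exactly uniform at 1/N instead of estimating it from the baseline noise, and the function name is illustrative.

```python
import math

def relative_entropy(p, q):
    """Sum over points of p * ln(p / q); terms with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

n_points = 5
reference = [1.0 / n_points] * n_points  # idealized optimum PDF (assumption)
suboptimal = [0.1, 0.1, 0.6, 0.1, 0.1]   # hypothetical PDF with one large excursion

delta_s = relative_entropy(suboptimal, reference)
delta_s_at_optimum = relative_entropy(reference, reference)
```

The quantity is zero when the two PDFs coincide and positive otherwise, so driving it toward zero moves θ toward θ_0.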
Thermal Equilibrium
From statistical mechanics we know that thermal equilibrium is achieved when
The free energy A = E − S/β is minimized.
The entropy S is a maximum.
E is fixed by the choice of β.
This is an alternative perspective on least squares minimization.
The heat capacity of a spectrum.
Heat capacity tells us about fluctuations in energy
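\[ C_V \equiv \beta^2 \left( \langle E_i^2 \rangle - \langle E_i \rangle^2 \right) \equiv \beta^2 (\Delta E)^2 \equiv \beta^2 \frac{\partial^2 \ln Z}{\partial \beta^2} \tag{6} \]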
Figure: Heat capacity (arb. units) as a function of parameter step
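The two forms of Eq. (6) can be compared numerically. The sketch below computes the heat capacity once from the energy fluctuations and once from a finite-difference second derivative of ln Z; the energy values, β, and step size are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def log_partition(energies, beta):
    """ln Z for a discrete set of distortion energies."""
    u_min = min(energies)
    shifted = sum(math.exp(-beta * (u - u_min)) for u in energies)
    return math.log(shifted) - beta * u_min

def heat_capacity(energies, beta):
    """C_V = beta^2 (<E^2> - <E>^2) under the canonical distribution."""
    u_min = min(energies)
    weights = [math.exp(-beta * (u - u_min)) for u in energies]
    z = sum(weights)
    mean_e = sum(w * u for w, u in zip(weights, energies)) / z
    mean_e2 = sum(w * u * u for w, u in zip(weights, energies)) / z
    return beta ** 2 * (mean_e2 - mean_e ** 2)

energies = [0.0, 0.1, 0.4, 0.9, 1.6]  # hypothetical distortion energies
beta = 2.0
h = 1e-3                              # finite-difference step in beta

second_derivative = (
    log_partition(energies, beta + h)
    - 2.0 * log_partition(energies, beta)
    + log_partition(energies, beta - h)
) / h ** 2

cv_fluctuation = heat_capacity(energies, beta)
cv_derivative = beta ** 2 * second_derivative
```

The two numbers should agree to within the finite-difference error, which is a convenient consistency check when the heat capacity is tracked along a parameter path.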
Composite Systems: I
The partition function for N copies of an ‘isolated’ spectrum is
\[ Z \to \frac{1}{N!} \left( \frac{V}{v_0}\, \zeta \right)^N . \tag{7} \]
ζ is the partition function defined earlier divided by V/v_0.
V/v_0 is the number of measurements ω ∈ Ω.
v_0 is the measurement resolution.
If the N copies are near the optimum parameter set θ = θ_0,
\[ S = N \left[ \ln \left( \frac{V}{N}\, \frac{1}{v_0} \right) + 1 \right] . \tag{8} \]
This is the analogue of the Sackur-Tetrode equation appropriate for this system.
Composite Systems: II
For a composite system of k spectral bands with N_j copies of each spectral band
\[ Z = \prod_{j=1}^{k} \frac{1}{N_j!} \left( \frac{V_j}{v_0^{(j)}}\, \zeta_j \right)^{N_j} \tag{9} \]
The entropy for such a system near the optimum parameter set is
\[ S = \sum_{j=1}^{k} N_j \left[ \ln \left( \frac{V_j}{N_j}\, \frac{1}{v_0^{(j)}} \right) + 1 \right] \tag{10} \]
Note that the numbers N_j allow us to weight different contributions to the entropy.
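Eq. (10) has the same form that Stirling's approximation, ln N! ≈ N ln N − N, gives for ln Z in Eq. (9) when every ζ_j is set to 1. Taking ζ_j = 1 near the optimum is our reading, not something the slide states. The sketch below compares the exact ln Z, computed with the log-gamma function, against the Eq. (10) expression for two hypothetical bands; all numbers and function names are illustrative.

```python
import math

def log_partition_composite(bands):
    """ln Z from Eq. (9); each band is a tuple (N_j, V_j, v0_j, zeta_j)."""
    total = 0.0
    for n, volume, v0, zeta in bands:
        # lgamma(n + 1) equals ln(n!) without overflow for large n.
        total += n * math.log(volume * zeta / v0) - math.lgamma(n + 1)
    return total

def entropy_near_optimum(bands):
    """Eq. (10): S = sum_j N_j [ln(V_j / (N_j v0_j)) + 1]."""
    return sum(n * (math.log(volume / (n * v0)) + 1.0) for n, volume, v0, _ in bands)

# Hypothetical bands: (copies N_j, band width V_j, resolution v0_j, zeta_j = 1).
bands = [(1000, 1.0e6, 1.0, 1.0), (500, 2.0e6, 2.0, 1.0)]

log_z = log_partition_composite(bands)
entropy = entropy_near_optimum(bands)
relative_gap = abs(entropy - log_z) / entropy
```

For copy numbers in the hundreds the two values differ only by the logarithmic Stirling correction, and changing N_j for one band changes that band's share of the total, which is the weighting noted above.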
Equilibrium: How to analyze multiple data sets
Problem from Reif [3]: Two substances with different specific heats C_A and C_B at temperatures T_A and T_B are brought into contact. What is the final temperature?
Answer:
\[ T_f = \frac{C_A T_A + C_B T_B}{C_A + C_B} \]
when C_A and C_B are independent of temperature. Extensions to more systems in contact are obvious.
This gives us a hint for a way to infer parameters from multiple data sets rationally.
[3] Reif, F. “Fundamentals of Statistical and Thermal Physics”, McGraw-Hill (New York, 1965)
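The extension to several systems in contact is a heat-capacity-weighted mean of the initial temperatures, which follows from requiring that the heat exchanged sums to zero. A minimal sketch with made-up values and an illustrative function name:

```python
def equilibrium_temperature(heat_capacities, temperatures):
    """Final temperature for temperature-independent heat capacities.

    Follows from energy conservation: sum_k C_k (T_f - T_k) = 0.
    """
    total_capacity = sum(heat_capacities)
    weighted = sum(c * t for c, t in zip(heat_capacities, temperatures))
    return weighted / total_capacity

# Two systems, as in the problem statement (hypothetical numbers).
t_two = equilibrium_temperature([1.0, 3.0], [300.0, 400.0])

# Three systems in contact.
t_three = equilibrium_temperature([1.0, 3.0, 4.0], [300.0, 400.0, 350.0])
```

The system with the largest heat capacity pulls the final temperature most strongly toward its own value.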
Composite Systems: III
Spectra from different bands will have different characteristic β_j.
The ‘stiffness’ term g_j in the distortion energy is also band dependent.
Adjusting the number of copies N_j of each subsystem allows one to tune β_j.
This is similar to the √N_j improvement in SNR due to signal averaging.
Treat the composite system in the ‘isothermal’ ensemble.
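The √N_j scaling mentioned in the list above can be illustrated with a small simulation. This sketch only demonstrates the signal-averaging effect for Gaussian noise; it does not model the tuning of β_j itself, and the trial count, seed, and function name are arbitrary choices.

```python
import math
import random

def averaged_noise_std(n_copies, sigma=1.0, n_trials=4000, seed=0):
    """Sample standard deviation of the mean of n_copies Gaussian noise draws."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_copies)) / n_copies
        for _ in range(n_trials)
    ]
    mu = sum(means) / n_trials
    return math.sqrt(sum((m - mu) ** 2 for m in means) / (n_trials - 1))

std_single = averaged_noise_std(1)
std_averaged = averaged_noise_std(16)
gain = std_single / std_averaged  # expected to be near sqrt(16) = 4
```

Averaging 16 copies reduces the noise amplitude by roughly a factor of four, so the signal-to-noise ratio of that band improves by about the same factor.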
Spin-labeling Example: NCp7 and TAR-DNA
Figure: 5NCp7 + SL TAR DNA: X, Q, W, D, G bands
Parameters derived from X-band experiments: Scholes research group.
Spin-labeling Example: Different Model
Figure: 5NCp7 + SL TAR DNA: Different component relative weights
Non-optimum parameters change C_V.
Sharp features have a larger dynamic range with respect to the noise.
Summary
Statistical Physics provides tools for exploring parameters.
Classical Thermodynamics can offer insights into the fitting process.
Constraints induce geometry in parameter space.
Outlook
Generalized coordinates and conjugate forces may help to foster further insights.
Volume allows one to define parameter ‘compressibilities’.
Implementing search algorithms on curved manifolds may allow further refinements of error estimates.
Students, Colleagues and Institutions
Yann Cotte, Philip Tuchscherer (M.Sc.)
Laxman Mainali, Indra Dev Sahu (Ph.D.)
Ariel Caticha, Kevin Knuth, Charles Scholes (Albany)
David Schneider (USDA)
Wayne Hubbell (UCLA)
Jack Freed; Boris Dzikovski, Wulf Hofbauer, Joe Moscicki, Dmitriy Tipikin, Ziwei Zhang, ... ACERTians past and present.
The Scarlet Piper of Albany
Figure: Mparilyn’s Mpingos Jig