The muffled bleating
of cynical presenters
disturbs my daydreams
Statistical Inference of Parameters from Least-Squares Fitting of Slow-Motional EPR Spectra
David E. Budil
ACERT Workshop on Analysis of ESR Data from Motional Dynamics
Cornell University, November 16-18, 2007
Getting started
1. Go to www.acert.cornell.edu/Workshop_2007/index_files/Page352.htm
2. Scroll down to the last session in the schedule and follow the “Download Workshop Materials” link
3. Select all files by checking the box at the top left
4. Press “Download” to save ACERT.zip to the desktop
5. Extract all files to Desktop\ACERT
6. Open Matlab
7. Navigate to C:\Documents and Settings\labuser\Desktop\ACERT
8. Type “nls eprldef”
9. Press “Fit” in the NLS GUI. Presto!
Humble supplicants
futilely wait for skeeve2
To bestow a node
Outline
1st part: Overview
• Matlab-based lineshape calculation programs
• Review of fitting algorithms
• Error estimation methods
2nd part: Applications
• EPRLL/EPRLF
• NLS
• Moving forward with Matlab
Matlab-based programs
• Matlab-callable EPRLL/EPRLF programs
• Calling EPRLL/EPRLF directly from Matlab
• NLS “driver” graphical user interface
• Minimization algorithms
  • Gauss-Newton
  • Nelder-Mead
ISBN-10: 0471731897 Buy now....
From the Shameless Commerce Dept…
Commercialism
rears its ugly head today
Buy one, get one free
Some parameter conventions
Cartesian tensor components (form = 1)
Pseudospherical tensor components (form = 2)
Pseudospherical component | Definition
g1 (isotropic) | (gx + gy + gz)/3
g2 (axial) | gz − (gx + gy)/2
g3 (rhombic) | gx − gy
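As a quick check of the two conventions, the Python sketch below converts Cartesian components (form = 1) to pseudospherical ones (form = 2) and back. The function names are illustrative and not part of NLS; the inverse is simply the three definitions above solved for gx, gy and gz, and the same algebra applies to any tensor expressed in these two forms.

```python
def to_pseudospherical(gx, gy, gz):
    """Cartesian (form = 1) -> pseudospherical (form = 2) components."""
    g1 = (gx + gy + gz) / 3      # isotropic
    g2 = gz - (gx + gy) / 2      # axial
    g3 = gx - gy                 # rhombic
    return g1, g2, g3


def to_cartesian(g1, g2, g3):
    """Inverse transformation, obtained by solving the three definitions."""
    gz = g1 + 2 * g2 / 3
    gx = g1 - g2 / 3 + g3 / 2
    gy = g1 - g2 / 3 - g3 / 2
    return gx, gy, gz


# Example with nitroxide-like g values (illustrative numbers only)
g_cart = (2.0089, 2.0058, 2.0021)
g_ps = to_pseudospherical(*g_cart)
g_back = to_cartesian(*g_ps)
```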
Dynamic parameters
Dynamic parameters in Matlab NLS are expressed as the base 10 logarithm of the corresponding dynamic parameter. Examples:
rx, ry, rz: log10 Rx, log10 Ry, log10 Rz
oss: log10 ωSS
The transformation between Cartesian and pseudospherical components takes a special form for the R tensor:
In log (NLS) space | In linear space
rbar = (rx + ry + rz)/3 | $\bar{R} = (R_x R_y R_z)^{1/3}$
n = rz − (rx + ry)/2 | $N = R_z / (R_x R_y)^{1/2}$
nxy = rx − ry | $N_{xy} = R_x / R_y$
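The Python sketch below applies the log-space definitions to an illustrative axial rotor (rates assumed to be in s⁻¹) and confirms that, back in linear space, they become the geometric-mean and ratio forms in the right-hand column. The function name is hypothetical.

```python
import math


def r_log_to_pseudospherical(rx, ry, rz):
    """rx, ry, rz are log10 Rx, log10 Ry, log10 Rz, as NLS stores them."""
    rbar = (rx + ry + rz) / 3     # log10 of the geometric-mean rate
    n = rz - (rx + ry) / 2        # log10 N, the axial anisotropy ratio
    nxy = rx - ry                 # log10 Nxy, the rhombic ratio
    return rbar, n, nxy


# Illustrative axial rotor: Rx = Ry = 1e7, Rz = 1e8
Rx, Ry, Rz = 1e7, 1e7, 1e8
rbar, n, nxy = r_log_to_pseudospherical(math.log10(Rx), math.log10(Ry), math.log10(Rz))

# Converting back to linear space gives the right-hand column of the table
Rbar_linear = 10 ** rbar          # (Rx*Ry*Rz)**(1/3)
N_linear = 10 ** n                # Rz/sqrt(Rx*Ry)
Nxy_linear = 10 ** nxy            # Rx/Ry
```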
MOMD Dearest
Microscopic Order
Macroscopic Disorder
[Figure: cage, diffusion axes]
A sloping shoulder
overturns the world we know
But SRLS will fit it
MOMD in labeled biomolecules
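[Figure: nitroxide (N-O) label with axes xm, ym, zm; director axis zD; laboratory axes xL, yL, zL; diffusion axes xR, yR, zR; Euler angles α, β, γ; rotated axes x′, y′]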
MOMD-like lineshape
The analysis is slow
The paper is long
Director in a labeled protein
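[Figure: nitroxide (N-O) label on a protein, with protein axes xM and zM, diffusion axis zR, its average orientation ⟨zR⟩, and the angle ζ]
zR traces out a trajectory in the protein frame.
Viewed in the label frame, zD traces out a trajectory from which the potential coefficients are calculated.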
The director axis zD is defined to be the average orientation of the z diffusion axis, ⟨zR⟩, in the protein frame.
$S_{20} = \frac{1}{2}\left\langle 3\cos^2\zeta - 1 \right\rangle$
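The order parameter is an average of ½(3cos²ζ − 1) over the sampled orientations. As an illustration, the Python sketch below averages it over a hypothetical list of ζ values in degrees, such as might be sampled along a zR trajectory; how those samples are obtained is an assumption, not something NLS provides.

```python
import math


def order_parameter(zeta_deg):
    """S20 = <(3 cos^2(zeta) - 1)/2>, averaged over sampled angles zeta (degrees)."""
    values = [0.5 * (3 * math.cos(math.radians(z)) ** 2 - 1) for z in zeta_deg]
    return sum(values) / len(values)


# Hypothetical samples of zeta along a zR trajectory
perfect = order_parameter([0.0, 0.0, 0.0])     # zR never leaves the director
perpendicular = order_parameter([90.0])        # zR perpendicular to the director
magic = order_parameter([math.degrees(math.acos(1 / math.sqrt(3)))])  # magic angle
```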
The ordering potential
$U(\Omega) = -k_B T \sum_{L,K} c^L_K \, \mathcal{D}^L_{0K}(\Omega)$
Assumptions:
• U is symmetric with respect to the director axis
• U has C2 symmetry axes perpendicular to the director axis
• The ordering axes and diffusion axes of the label coincide
Restrictions:
• $c^L_K$ are real-valued
• $c^L_{-K} = c^L_K$ (symmetric combinations)
• L, K even
Orienting potential
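Term | Functional form | Analogous atomic orbital
$\mathcal{D}_{20}(\theta)$ | $\frac{1}{2}(3\cos^2\theta - 1)$ | dz²
$\mathcal{D}_{22}(\theta,\phi)$ | $\sqrt{3/2}\,\sin^2\theta\,\cos 2\phi$ | dx²−y²
$\mathcal{D}_{40}(\theta)$ | $\frac{1}{8}(35\cos^4\theta - 30\cos^2\theta + 3)$ | gz⁴
$\mathcal{D}_{42}(\theta,\phi)$ | $\sqrt{5/8}\,\sin^2\theta\,(7\cos^2\theta - 1)\cos 2\phi$ | gz²(x²−y²)
$\mathcal{D}_{44}(\theta,\phi)$ | $\sqrt{35/32}\,\sin^4\theta\,\cos 4\phi$ | gx⁴−y⁴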
Two common director distributions:
• Elongated spot: c20 + c22
• Cone: c20 + c40
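To see how these two distributions arise, the Python sketch below evaluates the reduced potential U/kBT = −(c20 D20 + c22 D22 + c40 D40), using the functional forms from the table, and finds the most probable tilt by a simple grid search. The function names and coefficient values are illustrative assumptions, not NLS defaults. With c20 alone, U is lowest on the director (θ = 0). With c20 > 0 and c40 < 0, the minimum moves off-axis, which gives the cone. A c22 term makes the distribution depend on φ, which elongates the spot.

```python
import math


def d20(theta):
    return 0.5 * (3 * math.cos(theta) ** 2 - 1)


def d22(theta, phi):
    return math.sqrt(1.5) * math.sin(theta) ** 2 * math.cos(2 * phi)


def d40(theta):
    c2 = math.cos(theta) ** 2
    return (35 * c2 * c2 - 30 * c2 + 3) / 8


def reduced_potential(theta, phi, c20=0.0, c22=0.0, c40=0.0):
    """U(theta, phi)/kBT = -(c20 D20 + c22 D22 + c40 D40); angles in radians."""
    return -(c20 * d20(theta) + c22 * d22(theta, phi) + c40 * d40(theta))


def most_probable_tilt(c20, c40, step_deg=0.1):
    """Tilt angle (degrees, 0 to 90) minimizing U for an axial c20 + c40 potential."""
    best_angle, best_u = 0.0, float("inf")
    for k in range(int(round(90 / step_deg)) + 1):
        angle = k * step_deg
        u = reduced_potential(math.radians(angle), 0.0, c20=c20, c40=c40)
        if u < best_u:
            best_angle, best_u = angle, u
    return best_angle


spot_tilt = most_probable_tilt(2.0, 0.0)    # c20 only: ordering along the director
cone_tilt = most_probable_tilt(2.0, -2.0)   # c20 > 0, c40 < 0: cone near 39 degrees
```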
Matlab potential visualization tool: at the Matlab prompt, type “popnodist”
Basis set indices
Index | Physical interpretation | Range
L | Quantum number for total rotational angular momentum | 0 to Lemx
K | Quantum number for projection of rotational angular momentum on the molecular (diffusion) z axis | −L to L
M | Quantum number for projection of rotational angular momentum on the laboratory Z axis | −L to L
pI | Nuclear spin transition index: net change in Z-projection of nuclear magnetic moment | −2I to 2I
qI | Nuclear spin transition index: total number of nuclear spin quanta involved in a transition | |pI| − 2I to 2I − |pI|
Note: in EPRLL, pS = 0 (high-field approximation); in EPRLF, pS = 0, ±1.
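The nuclear-spin indices can be enumerated directly from the ranges in the table. The Python sketch below assumes that qI steps by 2 within its range, which follows if pI = m′ − m″ and qI = m′ + m″ for nuclear sublevels m′ and m″. This interpretation is an assumption here and is not spelled out in the table. The function name is hypothetical. A useful check is that the number of (pI, qI) pairs is (2I + 1)².

```python
def nuclear_transition_indices(two_i):
    """All (pI, qI) pairs for a nucleus of spin I, with two_i = 2I (an integer).

    pI runs from -2I to 2I; for each pI, qI runs from |pI| - 2I to 2I - |pI|
    in steps of 2 (assumed: pI = m' - m'', qI = m' + m'').
    """
    pairs = []
    for p in range(-two_i, two_i + 1):
        for q in range(abs(p) - two_i, two_i - abs(p) + 1, 2):
            pairs.append((p, q))
    return pairs


nitrogen14 = nuclear_transition_indices(2)    # I = 1
vanadium51 = nuclear_transition_indices(7)    # I = 7/2
```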
Choosing a Basis Set
Truncation parameters: Lemx, Lomx, Kmx, Mmx and pImx
Effects of an incomplete basis set:
( ) 2loglog 210min,10 +−= ⊥⊥ RRALemx
νν
0068.07.30075.040.8min,
+=
+=⊥
AR
“Rules of thumb” for basis set indices in EPRLL
Lomx ≅ 0.7 Lemx
Kmx ≅ 0.3 Lemx (N > 10) to 0.9 Lemx (N < 0.1)
pImx = 2 I (nuclear spin)
Mmx = Kmx (MOMD) or pImx ( U( Ω) = 0 )
Kmn = 0 (αD, γD = 0) or −Kmx
Mmn = 0 (γN = 0) or −Kmx
Outline
1st part: Overview
• Matlab-based lineshape calculation programs
• Review of fitting algorithms
  • Gauss-Newton methods
  • Nelder-Mead (downhill Simplex)
  • Separation of linear variables
• Error estimation methods
2nd part: Applications
• EPRLL/EPRLF
• NLS
• Moving forward with Matlab
General definitions for minimization
Fitting by unweighted sum of squares
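h: vector of experimental spectrum intensities
x: vector of ESR parameters
φ(x): vector of calculated spectrum intensities
f(x) = h − φ(x): vector of residuals
$s^2 = \|f(x)\|^2 = f(x)^T \cdot f(x)$
Fitting by weighted sum of squares
$\chi^2 = \|f(x)\|_{\sigma}^2 = f(x)^T \cdot \sigma^{-1} \cdot f(x)$
Here σ is the noise variance (weighting) matrix. For uniform, uncorrelated noise, σ = σn² I, where σn, the standard deviation of the noise at the spectrum baseline, can be estimated from a signal-free region; then χ² = s²/σn².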
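The Python sketch below gives a small numerical check of the two definitions. It computes s² and χ² for toy arrays and estimates σn as the standard deviation of a hypothetical signal-free baseline segment. It assumes uniform, uncorrelated noise, so the weighting reduces to 1/σn². The numbers are not real spectra.

```python
import statistics


def residuals(h, phi):
    """f(x) = h - phi(x), element by element."""
    return [a - b for a, b in zip(h, phi)]


def sum_of_squares(h, phi):
    """s^2 = f^T f."""
    return sum(r * r for r in residuals(h, phi))


def chi_squared(h, phi, sigma_n):
    """chi^2 = f^T (sigma_n^2 I)^-1 f = s^2 / sigma_n^2 for uniform noise."""
    return sum_of_squares(h, phi) / sigma_n ** 2


# Toy numbers (not real spectra)
h = [1.0, 2.0, 3.0]
phi = [1.0, 1.0, 1.0]
baseline = [0.1, -0.1, 0.1, -0.1]          # hypothetical signal-free region
sigma_n = statistics.pstdev(baseline)      # noise standard deviation
s2 = sum_of_squares(h, phi)
chi2 = chi_squared(h, phi, sigma_n)
```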
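Downhill Simplex
1. Order: sort the n + 1 vertices by function value, f(x1) ≤ f(x2) ≤ ... ≤ f(xn+1).
2. Reflection: compute xr = xo + α(xo − xn+1), where xo is the center of gravity of all points except xn+1. If f(x1) ≤ f(xr) < f(xn), compute a new simplex with xr, rejecting xn+1. Go to step 1.
3. Expansion: if f(xr) < f(x1), compute xe = xo + γ(xr − xo). If f(xe) < f(xr), compute the new simplex with xe; otherwise compute it with xr. Go to step 1.
4. Contraction: at this point f(xr) ≥ f(xn). Compute xc = xo + ρ(xn+1 − xo). If f(xc) < f(xn+1), compute the new simplex with xc and go to step 1. Otherwise go to step 5.
5. Shrink: replace every vertex except x1 by xi = x1 + δ(xi − x1), evaluate f at these n new vertices, and go to step 1.
Typical coefficients are α = 1, γ = 2, ρ = δ = 1/2.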
Simplex search over selected functions
[Figure: simplex search paths on the Rosenbrock “banana” function (two panels)]
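The five simplex steps map almost line for line onto code. The Python sketch below is a plain Nelder-Mead implementation, not the NLS code. It uses the customary reflection, expansion, contraction and shrink coefficients (α = 1, γ = 2, ρ = δ = 1/2) and a stopping rule based on the spread of function values, chosen for illustration. It minimizes the Rosenbrock banana function from the usual starting point (−1.2, 1). In Matlab, the built-in fminsearch implements a Nelder-Mead search of this kind.

```python
def rosenbrock(x):
    """The 'banana' test function; minimum f = 0 at (1, 1)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000,
                alpha=1.0, gamma=2.0, rho=0.5, delta=0.5):
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each axis
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # 1. Order: values[0] is the best vertex, values[-1] the worst
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        # Center of gravity of all vertices except the worst
        xo = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        # 2. Reflection
        xr = [xo[j] + alpha * (xo[j] - worst[j]) for j in range(n)]
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
            continue
        # 3. Expansion
        if fr < values[0]:
            xe = [xo[j] + gamma * (xr[j] - xo[j]) for j in range(n)]
            fe = f(xe)
            if fe < fr:
                simplex[-1], values[-1] = xe, fe
            else:
                simplex[-1], values[-1] = xr, fr
            continue
        # 4. Contraction (here fr >= f(xn))
        xc = [xo[j] + rho * (worst[j] - xo[j]) for j in range(n)]
        fc = f(xc)
        if fc < values[-1]:
            simplex[-1], values[-1] = xc, fc
            continue
        # 5. Shrink every vertex toward the best one
        best = simplex[0]
        simplex = [best] + [[best[j] + delta * (v[j] - best[j]) for j in range(n)]
                            for v in simplex[1:]]
        values = [values[0]] + [f(v) for v in simplex[1:]]

    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


x_best, f_best = nelder_mead(rosenbrock, [-1.2, 1.0])
```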
Gauss-Newton methods: the Jacobian
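$J(x_0) = \left(\frac{\partial f}{\partial x_1}\right)_{x_0} \oplus \left(\frac{\partial f}{\partial x_2}\right)_{x_0} \oplus \cdots = \begin{bmatrix} \left(\frac{\partial f_1}{\partial x_1}\right)_{x_0} & \left(\frac{\partial f_1}{\partial x_2}\right)_{x_0} & \cdots \\ \left(\frac{\partial f_2}{\partial x_1}\right)_{x_0} & \left(\frac{\partial f_2}{\partial x_2}\right)_{x_0} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$
where ⊕ denotes side-by-side concatenation of the column vectors ∂f/∂xj, each evaluated at x0.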
Example for 2 parameters: [Figure: MOMD spectrum and its derivatives ∂/∂(log10 R) and ∂/∂c20]
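When analytic derivatives of the spectrum are not available, J is usually approximated by finite differences, one column per parameter. The Python sketch below does this with forward differences. It uses a first-derivative Lorentzian line as a hypothetical stand-in for φ(x) with x = (center, width), and the step-size rule is an assumption. At the line center, the derivative with respect to the center should be −2/width, which gives a simple check.

```python
def lorentzian_derivative(field, center, width):
    """First-derivative Lorentzian line, a stand-in for phi(x)."""
    u = (field - center) / width
    return -2.0 * u / (1.0 + u * u) ** 2


def residuals(params, fields, data):
    """f(x) = h - phi(x) for parameters x = (center, width)."""
    center, width = params
    return [h - lorentzian_derivative(b, center, width) for b, h in zip(fields, data)]


def jacobian(func, params, rel_step=1e-6):
    """Forward-difference Jacobian: J[k][j] approximates d f_k / d x_j."""
    f0 = func(params)
    columns = []
    for j, p in enumerate(params):
        step = rel_step * max(abs(p), 1.0)
        shifted = list(params)
        shifted[j] = p + step
        f1 = func(shifted)
        columns.append([(b - a) / step for a, b in zip(f0, f1)])
    # Place the column vectors df/dx_j side by side
    return [list(row) for row in zip(*columns)]


fields = [3400.0 + 0.5 * k for k in range(-20, 21)]
data = [lorentzian_derivative(b, 3400.0, 2.0) for b in fields]
J = jacobian(lambda x: residuals(x, fields, data), [3400.0, 2.0])
```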
All problems are nails
if the only tool you have
works like a hammer
The Hessian (curvature) matrix
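$H_{ij}(x) = \frac{1}{2}\,\frac{\partial^2 \chi^2}{\partial x_i\,\partial x_j} \approx \sum_k \frac{\partial f_k}{\partial x_i}\,\frac{\partial f_k}{\partial x_j}, \qquad H(x) \approx J(x)^T \cdot J(x)$
(unit weights, so that χ² = ‖f(x)‖²)
If ‖f(x)‖² is approximated by a quadratic function in parameter space about x0:
$\|f(x_0 + \Delta x)\|^2 \approx \|f(x_0)\|^2 + 2\,[J(x_0)^T f(x_0)] \cdot \Delta x + \Delta x^T \cdot H(x_0) \cdot \Delta x$
we may step directly from x0 to the solution (Gauss-Newton step):
$x_{\min} \approx x_0 - H(x_0)^{-1}\,J(x_0)^T f(x_0)$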
H is also used in error estimation!
G-N methods typically approximate the Hessian using the Jacobian
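The Gauss-Newton step can be seen on a toy problem. The Python sketch below fits a two-parameter exponential decay a·exp(−kt) by repeating x ← x − H⁻¹Jᵀf with H ≈ JᵀJ. This model is a stand-in chosen because its Jacobian is analytic; it is not an EPR lineshape. The names and starting values are illustrative. On noiseless data, a start within about 10% converges to the true parameters in a few iterations.

```python
import math


def model(params, t):
    """phi(x; t) = a exp(-k t): a toy two-parameter model."""
    a, k = params
    return a * math.exp(-k * t)


def residuals(params, ts, data):
    return [h - model(params, t) for t, h in zip(ts, data)]


def jacobian(params, ts):
    """Analytic J[i][j] = d f_i / d x_j for f = h - a exp(-k t)."""
    a, k = params
    return [[-math.exp(-k * t), a * t * math.exp(-k * t)] for t in ts]


def solve2(m, v):
    """Solve the 2x2 system m s = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def gauss_newton(params, ts, data, n_iter=20):
    for _ in range(n_iter):
        f = residuals(params, ts, data)
        J = jacobian(params, ts)
        # H ~ J^T J and the gradient term J^T f
        H = [[sum(row[r] * row[c] for row in J) for c in range(2)] for r in range(2)]
        g = [sum(row[r] * fi for row, fi in zip(J, f)) for r in range(2)]
        step = solve2(H, g)
        # x_min ~ x0 - H^-1 J^T f
        params = [params[0] - step[0], params[1] - step[1]]
    return params


ts = [float(t) for t in range(10)]
data = [model((2.0, 0.5), t) for t in ts]      # noiseless synthetic data
fitted = gauss_newton([1.8, 0.45], ts, data)
```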
Separable parameters: scaling
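Least-squares scaling coefficients may be obtained from the calculated spectra:
b = A c
where b is the experimental spectrum, the columns of A are the calculated component spectra, and c = (c1, c2, ...)ᵀ holds the scaling coefficients.
Solution: A is not square (many spectral points, few components), so c = (AᵀA)⁻¹Aᵀb, the least-squares (pseudo-inverse) solution.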
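For a handful of components, the normal equations are small enough to solve directly. The Python sketch below forms AᵀA and Aᵀb from two hypothetical component spectra (Gaussian-derivative shapes chosen only for illustration) and solves them by Gaussian elimination. It recovers the coefficients of an exact two-component mixture.

```python
import math


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def least_squares_scaling(components, experimental):
    """Scaling coefficients c from the normal equations (A^T A) c = A^T b.

    `components` holds the calculated component spectra (the columns of A)."""
    m = len(components)
    ata = [[sum(p * q for p, q in zip(components[i], components[j])) for j in range(m)]
           for i in range(m)]
    atb = [sum(p * q for p, q in zip(components[i], experimental)) for i in range(m)]
    return solve(ata, atb)


def gaussian_derivative(i, center, width=4.0):
    u = (i - center) / width
    return -u * math.exp(-0.5 * u * u)


comp1 = [gaussian_derivative(i, 40.0) for i in range(101)]
comp2 = [gaussian_derivative(i, 60.0) for i in range(101)]
experimental = [0.7 * a + 0.3 * b for a, b in zip(comp1, comp2)]
c = least_squares_scaling([comp1, comp2], experimental)
```

In Matlab, the same least-squares solution for a rectangular A is returned by `c = A \ b`, which avoids forming AᵀA explicitly and behaves better when component spectra are nearly collinear.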
Five component fit
Yet the meaning is unclear
Tomorrow try six
Spectral “shifting”
[Figure: “calculated” and “experimental” spectra, and their cross-correlation vs. field lag (gauss)]
1. Calculate the cross-correlation.
2. The maximum occurs at a lag of −20 gauss.
3. Shift the “calculated” spectrum by −20 gauss.
Mismatch due to uncalibrated frequency, g
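The shift can be found by scanning the cross-correlation over a range of lags. The Python sketch below uses the sign convention r(L) = Σᵢ exp[i]·calc[i − L], so that shifting the calculated spectrum by the lag of maximum correlation aligns it with the experimental one. This matches the "maximum at −20 G, shift by −20 G" reading above. It assumes a uniform field grid; the lag in gauss is the lag in points times the field step. The test spectra are hypothetical Gaussian-derivative lines.

```python
import math


def best_lag(experimental, calculated, max_lag):
    """Lag (in points) maximizing sum_i exp[i] * calc[i - lag] over the overlap."""
    n = len(experimental)
    best_lag_value, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(experimental[i] * calculated[i - lag]
                    for i in range(n) if 0 <= i - lag < n)
        if score > best_score:
            best_lag_value, best_score = lag, score
    return best_lag_value


def shift(spectrum, lag):
    """Shift by `lag` points (positive = toward higher index), zero-filling the ends."""
    n = len(spectrum)
    return [spectrum[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]


def gaussian_derivative(i, center, width=5.0):
    u = (i - center) / width
    return -u * math.exp(-0.5 * u * u)


calc = [gaussian_derivative(i, 50) for i in range(101)]
expt = [gaussian_derivative(i, 40) for i in range(101)]
lag = best_lag(expt, calc, 30)        # multiply by the field step to get gauss
aligned = shift(calc, lag)
```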
Outline
1st part: Overview
• Matlab-based lineshape calculation programs
• Review of fitting algorithms
• Error estimation methods
  • Curvature matrix uncertainty estimation
  • Parameter variance and covariance
  • Indeterminate/linearly dependent parameters
  • Monte Carlo uncertainty estimation
2nd part: Applications
• EPRLL/EPRLF
• NLS
• Moving forward with Matlab
The Covariance Matrix
$C(x^*) = H(x^*)^{-1} \approx \left[J(x^*)^T J(x^*)\right]^{-1}$
The covariance matrix is calculated at the x* vector corresponding to a minimum
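$(x_p - x_p^*)^T \cdot C_p^{-1} \cdot (x_p - x_p^*) \le \Delta\chi^2 = p\,\chi^2\,F_{p,\,n-p,\,\alpha}$
Here x_p is the vector of p fitted parameters, C_p is its covariance matrix, n is the number of spectral points, χ² is taken per degree of freedom (χ²/(n − p)) at the minimum, and F_{p,n−p,α} is the upper α point of the F distribution with p and n − p degrees of freedom.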
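$\Delta\chi^2 = \begin{pmatrix} dx_i & dx_j \end{pmatrix} \cdot \begin{bmatrix} C_{ii} & C_{ij} \\ C_{ji} & C_{jj} \end{bmatrix}^{-1} \cdot \begin{pmatrix} dx_i \\ dx_j \end{pmatrix}$
Solving for dx_j along the ellipse:
$dx_j = \frac{1}{C_{ii}}\left[C_{ij}\,dx_i \pm \sqrt{\left(C_{ii}C_{jj} - C_{ij}^2\right)\left(C_{ii}\,\Delta\chi^2 - dx_i^2\right)}\,\right]$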
Error ellipse curves for 2 parameters
F distribution (can use Matlab finv, fcdf, fpdf, etc.)
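The ellipse formula is easy to check numerically. The Python sketch below computes both dx_j branches at a given dx_i and confirms that each point satisfies the quadratic form. It uses an illustrative 2×2 covariance block and a contour level of Δχ² = 2.30, the familiar 68.3% level for two parameters when the noise level is known. In practice, the threshold would come from the F-distribution expression above. The function names are hypothetical.

```python
import math


def ellipse_dxj(c_ii, c_ij, c_jj, delta_chi2, dx_i):
    """Both dx_j values on the error ellipse at a given dx_i, or None outside it."""
    det = c_ii * c_jj - c_ij ** 2
    disc = det * (c_ii * delta_chi2 - dx_i ** 2)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (c_ij * dx_i - root) / c_ii, (c_ij * dx_i + root) / c_ii


def quadratic_form(c_ii, c_ij, c_jj, dx_i, dx_j):
    """(dx_i dx_j) C^-1 (dx_i dx_j)^T for the 2x2 covariance block C."""
    det = c_ii * c_jj - c_ij ** 2
    return (c_jj * dx_i ** 2 - 2 * c_ij * dx_i * dx_j + c_ii * dx_j ** 2) / det


# Illustrative covariance block and contour level
C_ii, C_ij, C_jj = 4.0, 1.5, 1.0
DELTA_CHI2 = 2.30
lower, upper = ellipse_dxj(C_ii, C_ij, C_jj, DELTA_CHI2, 1.0)
half_width = math.sqrt(C_ii * DELTA_CHI2)   # largest |dx_i| on the ellipse
```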
Old One gives lecture
You cannot understand it
Consult your textbook
The Correlation Matrix
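$c_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\,C_{jj}}}$
where c_ij is the correlation of xi with xj, C_ij is the covariance of xi with xj, and C_ii and C_jj are the variances of xi and xj.
NLS reports the correlation matrix c and the variances Cii, from which the covariance may be calculated.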
High ( > 0.8 ) values of cij can result when two columns of J are too similar
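The Python sketch below shows how similar Jacobian columns produce a large correlation. It builds C = (JᵀJ)⁻¹ for two toy two-column Jacobians with unit weights and converts each to a correlation matrix. The Jacobians are made up for illustration. Nearly parallel columns with positive overlap give a correlation close to −1, because the fit can trade one parameter against the other. Orthogonal columns give zero correlation.

```python
import math


def covariance_from_jacobian(J):
    """C = (J^T J)^-1 for a Jacobian with two columns (unit weights)."""
    a = sum(row[0] * row[0] for row in J)
    b = sum(row[0] * row[1] for row in J)
    d = sum(row[1] * row[1] for row in J)
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def correlation_matrix(C):
    """c_ij = C_ij / sqrt(C_ii C_jj)."""
    n = len(C)
    return [[C[i][j] / math.sqrt(C[i][i] * C[j][j]) for j in range(n)] for i in range(n)]


# Nearly parallel columns vs. orthogonal columns (toy Jacobians)
J_similar = [[1.0, 1.1], [2.0, 2.0], [3.0, 2.9], [4.0, 4.2]]
J_distinct = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
c_similar = correlation_matrix(covariance_from_jacobian(J_similar))
c_distinct = correlation_matrix(covariance_from_jacobian(J_distinct))
```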
Caveats
Parameter uncertainty estimates from the curvature matrix are only accurate when:
• The only deviations between calculation and data are experimental noise
• The noise is normally distributed
• The σ² weighting factor can be calculated from the rms noise
• The χ² function is well approximated by a quadratic function in parameter space
• Uncertainties in the parameters themselves are normally distributed (cf. Keith Earle)
Another difficulty for Gauss-Newton
Gauss-Newton methods can wander aimlessly if J (derivatives of the spectrum) is inaccurate.
• Especially a concern in MOMD spectra
• “Spikes” in the derivative lead to an overestimate of the rate of improvement of fit for parameter xj
• xj will converge very slowly
[Figure: derivative spectra ∂/∂c20 calculated with nort = 10 and nort = 30]
The Downhill Simplex (Nelder-Mead) search is more robust in such cases.
Deviations above noise level
The cumulative distribution function of the residuals should exhibit the sigmoidal shape characteristic of a normal distribution.
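Besides inspecting the plot by eye, the Python sketch below reduces the comparison to one number. It computes the largest gap between the empirical CDF of the residuals and a normal CDF with the same mean and standard deviation, a Kolmogorov-Smirnov-type statistic. The two residual sets are synthetic: one follows idealized normal quantiles, and the other is spread uniformly. The uniform set shows a visibly larger gap.

```python
import math
from statistics import NormalDist


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def max_cdf_deviation(residuals):
    """Largest gap between the empirical CDF of the residuals and a normal CDF
    with the same mean and standard deviation."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    worst = 0.0
    for i, r in enumerate(sorted(residuals)):
        model = normal_cdf(r, mean, sd)
        # The empirical CDF jumps from i/n to (i+1)/n at r
        worst = max(worst, abs((i + 1) / n - model), abs(i / n - model))
    return worst


# Idealized normal residuals vs. uniformly spread residuals (synthetic)
normal_like = [NormalDist(0.0, 1e-4).inv_cdf((k + 0.5) / 200) for k in range(200)]
uniform_like = [-3e-4 + 6e-4 * k / 199 for k in range(200)]
```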
Monte Carlo Methods for Parameter Estimation
• “Bootstrap” Monte Carlo: re-fit the spectrum N times with a fraction (1/e) of the points randomly chosen to be duplicates
• Synthetic dataset Monte Carlo: start with the calculated best-fit curve, add synthetic noise, and re-fit N times to generate a sampling of parameters
• Random restart: choose a reasonable domain from which to draw N randomly distributed starting parameter sets, and restart the fitting algorithm from each
Each of these methods leads to a set of N values for each xi, from which the variance and covariance may be calculated.
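The synthetic-dataset variant can be illustrated without an EPR simulator. The Python sketch below replaces the spectrum model with a straight line that can be re-fit in closed form, purely to keep the example small. It adds Gaussian noise to the best-fit curve N times, re-fits each noisy copy, and computes the covariance of the N parameter sets. The noise level and names are hypothetical. For this linear model, the Monte Carlo standard deviation of the slope should approach the analytic value σ/√Σ(t − t̄)², and the intercept and slope come out anticorrelated.

```python
import math
import random


def fit_line(ts, ys):
    """Closed-form least-squares straight line; returns (intercept, slope)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sxx
    return y_mean - slope * t_mean, slope


def synthetic_monte_carlo(ts, best_fit, noise_sd, n_trials, seed=0):
    """Add synthetic Gaussian noise to the best-fit curve and re-fit n_trials times."""
    rng = random.Random(seed)
    intercept, slope = best_fit
    curve = [intercept + slope * t for t in ts]
    samples = []
    for _ in range(n_trials):
        noisy = [y + rng.gauss(0.0, noise_sd) for y in curve]
        samples.append(fit_line(ts, noisy))
    return samples


def covariance_matrix(samples):
    """Sample covariance of the N parameter sets (one tuple per trial)."""
    n = len(samples)
    p = len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(p)]
    return [[sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1)
             for b in range(p)] for a in range(p)]


ts = [0.1 * k for k in range(50)]
samples = synthetic_monte_carlo(ts, (1.0, 2.0), noise_sd=0.05, n_trials=2000, seed=1)
C = covariance_matrix(samples)
```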
Called or not called, the Captain will be present (Vocatus atque non vocatus Capitaneus aderit)
Practical exercises with Matlab NLS and Windows NLSL
1. Standalone EPRLL/EPRLF
   1. Comparison of methods for VO²⁺
   2. MOMD orientations
2. Simple NLS fitting procedure
   1. Comparison of Gauss-Newton/Simplex
   2. Parameter correlation
   3. Monte Carlo estimation of uncertainties
3. Basis set pruning
4. Multiple components in NLSL
5. Global analysis in NLSL
Segmentation fault
So close to a minimum
Let’s try EasySpin
Overview of Matlab NLS
Interactive: follow along in the Users’ Guide in the download directory
Some common pitfalls
Missing, truncated, or otherwise bizarre lineshapes can result from the following common mistakes:
• Frequency is inconsistent with the given g values/field range
• Line width is too narrow for the point spacing
• Basis set is insufficient for the given dynamic parameters/spectral bandwidth
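The first two pitfalls can be caught before any lineshape is calculated. The Python sketch below is a hypothetical pre-flight check, not an NLS function. It computes the resonance field B0 = hν/(gμB) for an isotropic g value and warns if B0 falls outside the field sweep. It also warns if the line width spans fewer than a chosen number of points. The three-points-per-width threshold is an assumption, and fields and widths are in gauss. The third pitfall, basis set size, needs the lineshape program itself.

```python
PLANCK = 6.62607015e-34            # J s
BOHR_MAGNETON = 9.2740100783e-24   # J/T


def resonance_field_gauss(g, freq_ghz):
    """B0 = h nu / (g muB), converted from tesla to gauss."""
    return PLANCK * freq_ghz * 1e9 / (g * BOHR_MAGNETON) * 1e4


def check_setup(g_iso, freq_ghz, field_min, field_max, n_points,
                linewidth, min_points_per_width=3):
    """Return warnings for two of the pitfalls above (fields and widths in gauss)."""
    warnings = []
    b0 = resonance_field_gauss(g_iso, freq_ghz)
    if not field_min < b0 < field_max:
        warnings.append(f"resonance near {b0:.0f} G is outside {field_min}-{field_max} G")
    spacing = (field_max - field_min) / (n_points - 1)
    if linewidth < min_points_per_width * spacing:
        warnings.append(f"line width {linewidth} G spans fewer than "
                        f"{min_points_per_width} points of {spacing:.3f} G")
    return warnings
```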
Example 1: Matlab EPRLL/EPRLF
This exercise compares X-band spectra of the slow-motional VO²⁺ ion calculated using EPRLL (high-field approximation) and EPRLF.
Use the following commands:
    P = vodef;
    spc1 = meprf( P );
    spc2 = meprl( P );
    plot( P.fld, [spc1, spc2] )
    legend( { 'EPRLF', 'EPRLL' } )
[Figure: X-band VO²⁺ spectra calculated with EPRLF and EPRLL, plotted vs. field]
Because the hyperfine interaction of the V nucleus is not small compared with the electron Zeeman interaction at X-band fields, the high-field approximation fails and EPRLF is needed.
Note that vodef.m is a copy of eprfdef.m that has been edited to set default parameters for VO²⁺.
Exercise 2: MOMD spectra
    P = momdldef;
    P.ideriv = 0;
    [spc, MOMD] = meprl( P );
    angle = 180*acos( (0:P.nort-1)/(P.nort-1) )/pi;
    surf( angle, P.fld, MOMD )
    xlabel( 'Tilt Angle' )
    ylabel( 'Field' )
[Figure: surface plot of the individual MOMD orientation spectra vs. tilt angle and field]
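The tilt-angle grid in this snippet is evenly spaced in the cosine of the tilt angle, not in the angle itself. The Python sketch below reproduces that grid for nort = 5 and confirms the equal cosine spacing. Equal cosine spacing is the natural choice for an isotropic distribution of directors, because the fraction of directors tilted by an angle in dψ is proportional to sin ψ dψ = −d(cos ψ). How meprl weights the orientations internally is not shown here, so that connection is an assumption.

```python
import math


def momd_tilt_angles(nort):
    """Same grid as angle = 180*acos((0:nort-1)/(nort-1))/pi in the Matlab snippet."""
    return [180.0 * math.acos(k / (nort - 1)) / math.pi for k in range(nort)]


angles = momd_tilt_angles(5)                    # 90, 75.5, 60, 41.4, 0 degrees
cosines = [math.cos(math.radians(a)) for a in angles]
```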
Basic fitting
1. In the Example Data subfolder, read in data file p1a.dat
2. Carry out a “nonlinear” minimization of Rx, Ry, and Rz
3. Find the correlation matrix (Covar button on GUI)
4. Repeat using the spherical representation:
   • Set R.form to 2
   • Set R.x to Rbar (isotropic R)
   • Set R.y to N (R∥ / R⊥)
Checking distribution of residuals
[Figure: cumulative distribution of the residuals (axis: Residual) and plot of the residuals vs. point index]
Save curves (button on GUI), then plot the residuals:
    f = p1a_data - p1a_sim;
    figure; plot( f );
Examine a histogram of the residuals:
    figure; hist( f, 30 )
Examine the cumulative probability distribution:
    p = (1:length(f)) / length(f);
    figure; plot( sort(f), p )
[Figure: histogram of the residuals]
Monte Carlo estimation
1. Select the Simplex MC method (name the output file “p1a.log”)
2. When the run is complete, edit p1a.log
3. Copy the matrix in the log file to a MATLAB variable (say, “M”)
Layout of M, for N fitted parameters:
• Columns 1 to N are the random starting points
• Columns N+1 to 2N are the minimization end points
• Column 2N+1 is the χ²
• Column 2N+2 is the number of iterations
Continuation of Rx, Ry, Rz search
Results from random restart
[Figure: 3-D scatter of starting and minimized points; axes log10 Rx, log10 Ry, log10 Rz]
Plot starting points and minimized points:
    figure; hold on; grid on
    plot3( M(:,1), M(:,2), M(:,3), 'b.' )   % starting points
    plot3( M(:,4), M(:,5), M(:,6), 'ro' )   % minimized points
    view(3)
Plot points within a χ² tolerance (χ² is column 2N+1 = 7 here; M(:,end) would be the iteration count):
    tol = min( M(:,7) ) * 1.10;
    keep = M(:,7) < tol;
    plot3( M(keep,4), M(keep,5), M(keep,6), 'go' )
Find the covariance of the selected points:
    C = cov( M(keep,4:6) )
Note: starting points are in columns 1-3; minimized points are in columns 4-6.
Acknowledgements
Support: NSF CHE, DBI; Army CDMRP
Students: Khaled Khairy, Stefano Gulla, Jamie Lawton
Colleagues: Keith Earle (Albany), David Schneider (Cornell), Jack Freed (ACERT)
Wood pulp futures up
The next Freed grant proposal
has no page limit