
The muffled bleating
of cynical presenters
disturbs my daydreams

Statistical Inference of Parameters from Least-Squares Fitting of Slow-Motional EPR Spectra

David E. Budil

ACERT Workshop on Analysis of ESR Data from Motional Dynamics

Cornell University
November 16-18, 2007

Getting started
1. Go to www.acert.cornell.edu/Workshop_2007/index_files/Page352.htm
2. Scroll down to the last session in the schedule and follow the “Download Workshop Materials” link
3. Select all files by checking the box at top left
4. Press “Download” to save ACERT.zip to the desktop
5. Extract all files to Desktop\ACERT
6. Open Matlab
7. Navigate to C:\Documents and Settings\labuser\Desktop\ACERT
8. Type “nls eprldef”
9. Press “Fit” in the NLS GUI. Presto!

Humble supplicants
futilely wait for skeeve2
To bestow a node

http://www.acert.cornell.edu/Workshop_2007/index_files/Page352.htm

Outline

1st part: Overview
• Matlab-based lineshape calculation programs
• Review of fitting algorithms
• Error estimation methods

2nd part: Applications
• EPRLL/EPRLF
• NLS
• Moving forward with Matlab

Matlab-based programs

Matlab-callable EPRLL/EPRLF programs
Calling EPRLL/EPRLF directly from Matlab
NLS “driver” graphical user interface
Minimization algorithms
• Gauss-Newton
• Nelder-Mead

ISBN-10: 0471731897 Buy now....

From the Shameless Commerce Dept…

Commercialism
rears its ugly head today
Buy one, get one free

http://www.amazon.com/Advanced-ESR-Methods-Polymer-Research/dp/0471731897/ref=si3_rdr_bb_product

Some parameter conventions

Cartesian tensor components (form = 1)

Pseudospherical tensor components (form = 2)

g1 (isotropic)   (gx + gy + gz)/3
g2 (axial)       gz − (gx + gy)/2
g3 (rhombic)     gx − gy
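As a quick check of the convention above, the following sketch converts a set of Cartesian g values to the pseudospherical components and back again. The numerical g values and the variable names are illustrative assumptions, not NLS defaults; the inverse formulas follow from solving the three definitions for gx, gy, gz.

```matlab
% Illustrative Cartesian g values (assumed, nitroxide-like)
gx = 2.0089; gy = 2.0058; gz = 2.0021;

% Cartesian (form = 1) -> pseudospherical (form = 2)
g1 = (gx + gy + gz)/3;      % isotropic
g2 = gz - (gx + gy)/2;      % axial
g3 = gx - gy;               % rhombic

% Pseudospherical -> Cartesian (algebraic inverse of the lines above)
gx_back = g1 - g2/3 + g3/2;
gy_back = g1 - g2/3 - g3/2;
gz_back = g1 + 2*g2/3;
```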

Dynamic parameters

Dynamic parameters in Matlab NLS are expressed as the base 10 logarithm of the corresponding dynamic parameter. Examples:

rx, ry, rz   log10 Rx, log10 Ry, log10 Rz
oss          log10 ωSS

The transformation between Cartesian and pseudospherical components takes a special form for the R tensor:

In log (NLS) space            In linear space
rbar = (rx + ry + rz)/3       $\bar{R} = (R_x R_y R_z)^{1/3}$
n = rz − (rx + ry)/2          $N = R_z / (R_x R_y)^{1/2}$
nxy = rx − ry                 $N_{xy} = R_x / R_y$
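The correspondence between the log-space and linear-space forms of the R tensor can be verified numerically. The sketch below uses made-up values for rx, ry, rz (assumptions chosen only for illustration) and confirms that the base-10 logarithm of each linear-space quantity equals the matching log-space combination.

```matlab
% Illustrative log10 diffusion rates (assumed values)
rx = 7.5; ry = 7.0; rz = 8.5;

% Pseudospherical components in log (NLS) space
rbar = (rx + ry + rz)/3;
n    = rz - (rx + ry)/2;
nxy  = rx - ry;

% The same quantities in linear space
Rx = 10^rx; Ry = 10^ry; Rz = 10^rz;
Rbar = (Rx*Ry*Rz)^(1/3);     % geometric mean rate
N    = Rz/sqrt(Rx*Ry);       % axial anisotropy
Nxy  = Rx/Ry;                % rhombic anisotropy

% Differences should vanish to rounding error
check = [log10(Rbar) - rbar, log10(N) - n, log10(Nxy) - nxy];
```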

MOMD Dearest

Microscopic Order

Macroscopic Disorder

Cage diffusion

axes

A sloping shoulder
overturns the world we know
But SRLS will fit it

MOMD in labeled biomolecules

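[Figure: axis systems for a spin-labeled biomolecule (N-O label), showing the director axis zD, the frames (xm, ym, zm), (xL, yL, zL), and (xR, yR, zR), and the angles α, β, γ]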

MOMD-like lineshape
The analysis is slow
The paper is long

Director in a labeled protein

[Figure: N-O label in a protein, showing the axes xM, yM, zM, the instantaneous zR axis, its average ⟨zR⟩, and the angle ζ between them]

zR traces out a trajectory in the protein frame.

Viewed in the label frame, zD traces out a trajectory from which the potential coefficients are calculated.

The director axis zD is defined to be the average orientation of the z diffusion axis ⟨zR⟩ in the protein frame.

$S_{20} = \frac{1}{2}\langle 3\cos^2\zeta - 1 \rangle$

The ordering potential

$U(\Omega) = -k_B T \sum_{L,K} c^L_K \, \mathcal{D}^L_{0K}(\Omega)$

Assumptions:
• U is symmetrical w.r.t. the director axis
• U has C2 symmetry axes perpendicular to the director axis
• Ordering axes and diffusion axes of the label coincide

Restrictions:
• $c^L_K$ real-valued
• $c^L_K = c^L_{-K}$ (symmetric combinations)
• L, K even

Orienting potential

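Term                      Functional form                                                      Analogous atomic orbital
$D_{20}(\theta)$          $\frac{1}{2}(3\cos^2\theta - 1)$                                     $d_{z^2}$
$D_{22}(\theta,\phi)$     $\sqrt{\frac{3}{2}}\,\sin^2\theta\,\cos 2\phi$                       $d_{x^2-y^2}$
$D_{40}(\theta)$          $\frac{1}{8}(35\cos^4\theta - 30\cos^2\theta + 3)$                   $g_{z^4}$
$D_{42}(\theta,\phi)$     $\sqrt{\frac{5}{8}}\,\sin^2\theta\,(7\cos^2\theta - 1)\cos 2\phi$    $g_{z^2(x^2-y^2)}$
$D_{44}(\theta,\phi)$     $\sqrt{\frac{35}{32}}\,\sin^4\theta\,\cos 4\phi$                     $g_{x^4-y^4}$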

Two common director distributions

Elongated spot: c20 + c22

Cone: c20 + c40

Matlab potential visualization tool
At the Matlab prompt type “popnodist”
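To see how the potential coefficients shape the director distribution, one can evaluate P(θ, φ) ∝ exp(−U/kBT) directly on an angular grid. The sketch below is not the popnodist tool; it is a minimal stand-alone illustration that assumes a c20 + c22 potential with made-up coefficient values and uses the symmetric-combination forms of the D20 and D22 terms. It also computes the order parameter S20 as the average of D20 over the distribution.

```matlab
% Illustrative potential coefficients (assumed values)
c20 = 2.0; c22 = 1.0;

theta = linspace(0, pi, 181);        % polar angle
phi   = linspace(0, 2*pi, 361);      % azimuthal angle
[TH, PH] = meshgrid(theta, phi);     % rows follow phi, columns follow theta

D20 = 0.5*(3*cos(TH).^2 - 1);
D22 = sqrt(3/2)*sin(TH).^2.*cos(2*PH);

U = -(c20*D20 + c22*D22);            % potential in units of kB*T
P = exp(-U);                         % unnormalized distribution
w = sin(TH);                         % solid-angle weight

% Normalize over the sphere: integrate over theta (dim 2), then phi
Z = trapz(phi, trapz(theta, P.*w, 2));
P = P/Z;

% Order parameter S20 = <D20>
S20 = trapz(phi, trapz(theta, D20.*P.*w, 2));
```

Calling `surf(theta*180/pi, phi*180/pi, P)` afterwards gives a quick view of the distribution; swapping the D22 term for D40 gives the cone-type case.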

Basis set indices

Index   Physical interpretation                                                                         Range
L       Quantum number for total rotational angular momentum                                            0 to Lemax
K       Quantum number for projection of rotational angular momentum on laboratory Z axis               −L to L
M       Quantum number for projection of rotational angular momentum on molecular Z axis                −L to L
pI      Nuclear spin transition index: net change in Z-projection of nuclear magnetic moment            −2I to 2I
qI      Nuclear spin transition index: total number of nuclear spin quanta involved in a transition     |pI| − 2I to 2I − |pI|

Note: in EPRLL, pS = 0 (high-field approximation); in EPRLF, pS = 0, ±1

Choosing a Basis Set

Truncation parameters: Lemx, Lomx, Kmx, Mmx and pImx

Effects of an incomplete basis set:

$L_{emx} = A\,(\log_{10} R_{\perp,\min} - \log_{10} R_\perp)^2 + 2$

$\log_{10} R_{\perp,\min} = 8.40 + 0.0075\,\nu$

$A = 3.7 + 0.0068\,\nu$

“Rules of thumb” for basis set indices in EPRLL

Lomx ≅ 0.7 Lemx

Kmx ≅ 0.3 Lemx (N > 10) to 0.9 Lemx (N < 0.1)

pImx = 2 I (nuclear spin)

Mmx = Kmx (MOMD) or pImx ( U( Ω) = 0 )

Kmn = 0 (αD, γD = 0) or −Kmx

Mmn = 0 (γN = 0) or −Kmx

Outline
1st part: Overview
• Matlab-based lineshape calculation programs
• Review of fitting algorithms
  • Gauss-Newton methods
  • Nelder-Mead (downhill Simplex)
  • Separation of linear variables
• Error estimation methods


General definitions for minimization

Fitting by unweighted sum of squares

h                   vector of experimental spectrum intensities
x                   vector of ESR parameters
φ(x)                vector of calculated spectrum intensities
f(x) = h − φ(x)     vector of residuals

$s^2 = \lVert f(x) \rVert^2 = f(x)^{T} \cdot f(x)$

Fitting by weighted sum of squares

$\chi^2 = \lVert f(x) \rVert^2_{\sigma} = f(x)^{T} \cdot \sigma^{-1} \cdot f(x)$

σ may be taken as the standard deviation of the noise at the spectrum baseline
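A minimal sketch of both objective functions follows. The line shape, the deterministic "noise" term, and the variable names are all assumptions made for illustration. The weight is applied as 1/σ² with a single scalar σ for the whole spectrum, which is one common way to realize the weighting when σ is the baseline noise standard deviation.

```matlab
B = linspace(-50, 50, 201)';               % field axis (arbitrary units)
phi_calc = -B./(1 + (B/10).^2).^2;         % toy derivative-like calculated spectrum
sigma = 0.01;                              % assumed noise standard deviation
h = phi_calc + sigma*sin(7*B);             % toy "experiment": calculation plus a small deviation

f = h - phi_calc;                          % vector of residuals
s2 = f'*f;                                 % unweighted sum of squares
chi2 = s2/sigma^2;                         % weighted sum of squares (scalar sigma)
chi2_reduced = chi2/(numel(f) - 2);        % per degree of freedom, assuming 2 fitted parameters
```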

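Downhill Simplex (f here denotes the function being minimized)
1. Order by the values at the vertices: $f(x_1) \le f(x_2) \le \dots \le f(x_{n+1})$.
2. Reflection: compute $x_r = x_o + \alpha (x_o - x_{n+1})$, where $x_o$ is the center of gravity of all points except $x_{n+1}$. If $f(x_1) \le f(x_r) < f(x_n)$, compute a new simplex with $x_r$, rejecting $x_{n+1}$. Go to step 1.
3. Expansion: if $f(x_r) < f(x_1)$, compute $x_e = x_o + \gamma (x_o - x_{n+1})$. If $f(x_e) < f(x_r)$, compute a new simplex with $x_e$; otherwise compute a new simplex with $x_r$. Go to step 1.
4. Contraction: if $f(x_r) \ge f(x_n)$, compute $x_c = x_{n+1} + \rho (x_o - x_{n+1})$. If $f(x_c) \le f(x_{n+1})$, compute a new simplex with $x_c$ and go to step 1. Otherwise go to step 5.
5. Shrink: compute the n new vertices $x_i = x_1 + \sigma_s (x_i - x_1)$ for $i = 2, \dots, n+1$ and evaluate f at each. Go to step 1.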

Simplex search over selected functions

[Figures: simplex search paths on the banana function]
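The banana-function search can be reproduced with Matlab's built-in Nelder-Mead routine, fminsearch. This sketch assumes that "banana function" refers to Rosenbrock's function and uses the conventional starting point (−1.2, 1); the tolerances are illustrative choices.

```matlab
% Rosenbrock's "banana" function, minimum at (1, 1)
banana = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;

x0 = [-1.2, 1];                                  % conventional starting point
opts = optimset('TolX', 1e-8, 'TolFun', 1e-8, ...
                'MaxFunEvals', 5000, 'MaxIter', 5000);

% Downhill simplex (Nelder-Mead) search
[xmin, fmin] = fminsearch(banana, x0, opts);
```

No derivatives are requested anywhere, which is why this search remains usable when the Jacobian of a spectrum is noisy or expensive.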

Gauss-Newton methods: the Jacobian

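$J(x_0) = \left(\frac{\partial f}{\partial x_1}\right)_{x_0} \oplus \left(\frac{\partial f}{\partial x_2}\right)_{x_0} \oplus \cdots = \begin{bmatrix} \left(\frac{\partial f_1}{\partial x_1}\right)_{x_0} & \left(\frac{\partial f_1}{\partial x_2}\right)_{x_0} & \cdots \\ \left(\frac{\partial f_2}{\partial x_1}\right)_{x_0} & \left(\frac{\partial f_2}{\partial x_2}\right)_{x_0} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$

[Figure: example for 2 parameters, showing a MOMD spectrum and its derivatives with respect to $\log_{10} R$ and $c_{20}$]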

All problems are nails
if the only tool you have
works like a hammer

The Hessian (curvature) matrix

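$H_{ij} = \left(\frac{\partial^2 \chi^2}{\partial x_i\,\partial x_j}\right) \approx \left(\frac{\partial \chi^2}{\partial x_i}\right)\left(\frac{\partial \chi^2}{\partial x_j}\right)$

$H(x) \approx J^{T}(x) \cdot J(x)$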

If $f^2(x)$ is approximated by a quadratic function in parameter space around $x_0$:

$f^2(x) \approx f^2(x_0) + [J^{T}(x_0) \cdot f(x_0)] \cdot (x - x_0) + \tfrac{1}{2}\,(x - x_0) \cdot H \cdot (x - x_0)$

we may step directly from $x_{current} = x_0$ to the solution (Gauss-Newton step):

$x_{min} \approx x_0 - H^{-1}(x_0) \cdot J^{T}(x_0) \cdot f(x_0)$

H is also used in error estimation!

G-N methods typically approximate the Hessian using the Jacobian.
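The Gauss-Newton iteration can be sketched on a toy problem. Everything here is an assumption made for illustration: a two-parameter Lorentzian stands in for the spectral calculation, the "experiment" is noise-free, and the Jacobian of the residuals f = h − φ(x) is obtained by forward finite differences. This is not the NLS implementation, only the bare update x ← x − (JᵀJ)⁻¹ Jᵀ f.

```matlab
B = linspace(-50, 50, 201)';
% Toy model: Lorentzian with x = [center; width]
model = @(x) x(2)^2 ./ ((B - x(1)).^2 + x(2)^2);

xtrue = [3; 8];
h = model(xtrue);            % synthetic noise-free "experimental" spectrum
x = [2; 7];                  % starting guess
delta = 1e-6;                % finite-difference step

for iter = 1:20
    f = h - model(x);                        % residual vector
    J = zeros(numel(B), numel(x));           % Jacobian of the residuals
    for j = 1:numel(x)
        dx = zeros(size(x)); dx(j) = delta;
        J(:, j) = ((h - model(x + dx)) - f)/delta;
    end
    H = J'*J;                                % Gauss-Newton Hessian approximation
    step = -H \ (J'*f);                      % Gauss-Newton step
    x = x + step;
    if norm(step) < 1e-10, break; end
end
```

There is no step-size control here; a production fit would normally add damping (Levenberg-Marquardt style) to guard against poor starting points.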

Separable parameters: scaling

b = A c

b: experimental spectrum
A: calculated component spectra
c: scaling coefficients (c1, c2)

Least-squares scaling coefficients may be obtained from the calculated spectra:

Solution: c = A⁻¹ b
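Because A generally has many more rows (field points) than columns (components), the "inverse" is applied in the least-squares sense. In Matlab this is what the backslash operator does. The sketch below assumes two made-up Gaussian component spectra stored as the columns of A.

```matlab
B = linspace(-50, 50, 201)';

% Two toy component spectra, one per column of A (assumed shapes)
A = [exp(-(B/5).^2), exp(-((B - 10)/15).^2)];

ctrue = [0.3; 0.7];          % scaling coefficients used to build the test spectrum
b = A*ctrue;                 % synthetic "experimental" spectrum

% Least-squares scaling coefficients: solves min ||A*c - b||
c = A \ b;
```

Since the scale factors are found in one linear solve, they do not need to be included among the nonlinearly searched parameters.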

Five component fit
Yet the meaning is unclear
Tomorrow try six

Spectral “shifting”

[Figure: calculated and “experimental” spectra, and their cross-correlation plotted against field lag (gauss)]

1. Calculate the cross-correlation: the maximum occurs at a lag of −20 gauss
2. Shift the “calculated” spectrum by −20 gauss

Mismatch due to uncalibrated frequency, g
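The shifting step can be sketched with base Matlab functions only. The derivative-like line shape, the field grid, and the −20 gauss offset are assumptions chosen to mirror the example; the cross-correlation is formed with conv rather than a toolbox routine.

```matlab
B = (-150:0.5:150)';                                   % field axis in gauss
lineshape = @(B0) -(B - B0).*exp(-((B - B0)/10).^2);   % toy derivative line shape

calc = lineshape(0);        % "calculated" spectrum
expt = lineshape(-20);      % "experimental" spectrum, offset by -20 gauss

% Cross-correlation via convolution with the reversed calculated spectrum
n = numel(B);
r = conv(expt, flipud(calc));          % element m corresponds to a lag of (m - n) points
[~, m] = max(r);
lag = (m - n)*(B(2) - B(1));           % lag in gauss

% Shift the calculated spectrum by the lag (zero fill outside the field range)
shifted = interp1(B, calc, B - lag, 'linear', 0);
```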

Outline
1st part: Overview
• Matlab-based lineshape calculation programs
• Review of fitting algorithms
• Error estimation methods
  • Curvature matrix uncertainty estimation
  • Parameter variance and covariance
  • Indeterminate/linearly dependent parameters
  • Monte Carlo uncertainty estimation


The Covariance Matrix

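$C(x^*) = c^{-1}(x^*) = [J^{T}(x^*) \cdot J(x^*)]^{-1}$

The covariance matrix C is calculated at the x* vector corresponding to a minimum; c is the curvature matrix.

$(x_p - x_p^*) \cdot C_p^{-1} \cdot (x_p - x_p^*) \le \Delta\chi^2_p = p\,\chi^2\,F^{\alpha}_{p,\,n-p}$

Error ellipse curves for 2 parameters:

$\Delta\chi^2 = \begin{pmatrix} dx_i & dx_j \end{pmatrix} \cdot \begin{bmatrix} c_{ii} & c_{ij} \\ c_{ji} & c_{jj} \end{bmatrix} \cdot \begin{pmatrix} dx_i \\ dx_j \end{pmatrix}$

$dx_j = c_{jj}^{-1}\left\{ -c_{ij}\,dx_i \pm \left[ (c_{ij}^2 - c_{ii} c_{jj})\,dx_i^2 + c_{jj}\,\Delta\chi^2 \right]^{1/2} \right\}$

F distribution (can use Matlab finv, fcdf, fpdf, etc.)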

Old One gives lecture
You cannot understand it
Consult your textbook

The Correlation Matrix

$c_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\,C_{jj}}}$

Here $c_{ij}$ is the correlation of xi with xj, $C_{ij}$ is the covariance of xi with xj, and $C_{ii}$, $C_{jj}$ are the variances of xi and xj.

NLS reports the correlation matrix c and the variances Cii, from which the covariance may be calculated.

High (> 0.8) values of cij can result when two columns of J are too similar.
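The effect of near-duplicate Jacobian columns on the correlation matrix can be seen in a small sketch. The Jacobian below is invented for illustration: its second and third columns are deliberately almost identical. The covariance is computed for unit noise variance (an assumption; multiply by σ² otherwise).

```matlab
t = (0:0.1:5)';

% Toy Jacobian: columns 2 and 3 are nearly the same function of t
J = [ones(size(t)), t, t + 0.01*t.^2];

C = inv(J'*J);               % covariance matrix (unit noise variance assumed)
s = sqrt(diag(C));           % parameter standard deviations
c = C ./ (s*s');             % correlation matrix, c_ij = C_ij/sqrt(C_ii*C_jj)

% Recover a covariance element from the correlation and the variances
C23 = c(2,3)*sqrt(C(2,2)*C(3,3));
```

Here |c(2,3)| comes out very close to 1, the signature of two parameters that the data cannot distinguish.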

Caveats

Parameter uncertainty estimates from the curvature matrix are only accurate when

• The only deviations between calculation and data are experimental noise
• Noise is normally distributed
• The σ2 weighting factor can be calculated as rms noise
• The χ2 function is well-approximated by a quadratic function in parameter space
• Uncertainties in the parameters themselves are normally distributed (cf. Keith Earle)

Another difficulty for Gauss-Newton

Gauss-Newton methods can wander aimlessly if J (derivatives of the spectrum) is inaccurate.

Especially a concern in MOMD spectra:
• “Spikes” in the derivative lead to an overestimate in the rate of improvement of fit for parameter xj
• xj will converge very slowly

[Figure: derivative of a MOMD spectrum with respect to c20, calculated with nort = 10 and nort = 30]

The Downhill Simplex (Nelder-Mead) search is more robust in such cases.

Deviations above noise level

Cumulative distribution function

Should exhibit sigmoidal shape characteristic of a normal distribution

Monte Carlo Methods for Parameter Estimation

“Bootstrap” Monte Carlo: re-fit the spectrum N times with a fraction (1/e) of the points randomly chosen to be duplicates

Synthetic Dataset Monte Carlo: start with the calculated best-fit curve, add synthetic noise, and re-fit N times to generate a sampling of parameters

Random Restart: choose a reasonable domain from which to draw N randomly distributed starting parameter sets and re-start the fitting algorithm

Each of these methods leads to a set of N values for each xi from which the variance and covariance may be calculated
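A minimal sketch of the synthetic-dataset variant follows. To keep it short, the model is linear in its parameters so that each "re-fit" is a single least-squares solve; that substitution, the model itself, the noise level, and N are all assumptions for illustration. With a real spectrum each pass would call the nonlinear fitting routine instead.

```matlab
t = (0:0.1:5)';
A = [ones(size(t)), exp(-t)];      % toy model, linear in its two parameters
xbest = [0.2; 1.5];                % stand-in for the best-fit parameters
ybest = A*xbest;                   % calculated best-fit curve
sigma = 0.05;                      % assumed noise standard deviation
N = 500;                           % number of synthetic datasets

X = zeros(N, numel(xbest));
for k = 1:N
    ysyn = ybest + sigma*randn(size(t));   % best-fit curve plus synthetic noise
    X(k, :) = (A \ ysyn)';                 % re-fit and store the parameters
end

Cmc = cov(X);                      % Monte Carlo covariance of the parameters
Ccurv = sigma^2*inv(A'*A);         % curvature-matrix estimate for comparison
```

For this well-behaved linear case the two covariance estimates agree to within sampling error; differences between them in a real fit are a hint that the caveats listed for the curvature-matrix estimate are not satisfied.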

Vocatus atque non vocatus Capitaneus aderit (“Called or not called, the Captain will be present”)

Practical exercises with Matlab NLS and Windows NLSL
1. Standalone EPRLL/EPRLF
   1. Comparison of methods for VO2+
   2. MOMD orientations
2. Simple NLS fitting procedure
   1. Comparison of Gauss-Newton/Simplex
   2. Parameter Correlation
   3. Monte Carlo estimation of uncertainties
3. Basis set pruning
4. Multiple components in NLSL
5. Global analysis in NLSL

Segmentation fault
So close to a minimum
Let’s try EasySpin

Overview of Matlab NLS
Interactive: follow along in the Users’ Guide in the download directory

Some common pitfalls

Missing, truncated, or otherwise bizarre lineshapes can result from the following common mistakes:
• Frequency is inconsistent with the given g values/field range
• Line width is too narrow for the point spacing
• Basis set is insufficient for the given dynamic parameters/spectral bandwidth

Example 1: Matlab EPRLL/EPRLF

This exercise compares X-band spectra of the slow-motional VO2+ ion calculated using EPRLL (high-field approximation) and EPRLF.

Note that vodef.m is a copy of eprfdef.m that has been edited to set default parameters for VO2+.

Use the following commands:

    P = vodef;
    spc1 = meprf( P );    % EPRLF calculation
    spc2 = meprl( P );    % EPRLL calculation (high-field approximation)
    plot( P.fld, [spc1,spc2] )
    legend( { 'EPRLF', 'EPRLL' } )

[Figure: EPRLF and EPRLL spectra plotted against field, 2600 to 4200]

Because the hyperfine interaction of the V nucleus cannot be neglected at X-band fields, EPRLF is needed.

Exercise 2: MOMD spectra

    P = momdldef;
    P.ideriv = 0;
    [spc,MOMD] = meprl(P);
    angle = 180*acos([0:P.nort-1]/(P.nort-1))/pi;    % tilt angles in degrees
    surf(angle,P.fld,MOMD)
    xlabel 'Tilt Angle'
    ylabel 'Field'

[Figure: surface plot of the MOMD component spectra against tilt angle and field]

Basic fitting

In the Example Data subfolder, read in the data file p1a.dat
Carry out a “nonlinear” minimization of Rx, Ry, and Rz
Find the correlation matrix (Covar button on GUI)
Repeat using the spherical representation:
• Set R.form to 2
• Set R.x to Rbar (isotropic R)
• Set R.y to N (R|| / R⊥)

Checking distribution of residuals

Save curves (button on GUI)

Plot residuals:

    f = p1a_data-p1a_sim;
    figure; plot(f);

Examine a histogram of the residuals:

    figure; hist(f,30)

Examine the cumulative probability distribution:

    p = [1:length(f)]/length(f);
    figure; plot( sort(f), p )

[Figures: residuals against point index; histogram of the residuals; cumulative distribution of the residuals]

Monte Carlo estimation

Select the Simplex MC method (name the output file “p1a.log”)
When the run is complete, edit p1a.log
Copy the matrix in the log file to a MATLAB variable (say, “M”):
• The 1st N columns are the random starting points
• Columns N+1 to 2N are the minimization end points
• Column 2N+1 is the χ2
• Column 2N+2 is the number of iterations

Continuation of Rx, Ry, Rz search

Results from random restart

[Figure: 3-D plot of starting points and minimized points; axes log10 Rx, log10 Ry, log10 Rz]

Note: starting points are in columns 1-3; minimized points are in columns 4-6.

Plot starting points and minimized points:

    plot3( M(:,1), M(:,2), M(:,3) )
    hold on                                   % keep the first plot when adding the next
    plot3( M(:,4), M(:,5), M(:,6), 'ro' )

Plot points within χ2 tolerance:

    chi2 = M(:,end-1);                        % χ2 is in column 2N+1; the last column holds the iteration count
    tol = min( chi2 ) * 1.10;
    keep = chi2 < tol;
    plot3( M(keep,4), M(keep,5), M(keep,6), 'go' )

Find covariance of selected points:

    cov( M(keep,4:6) )

Acknowledgements

Support: NSF CHE, DBI; Army CDMRP

Students: Khaled Khairy, Stefano Gulla, Jamie Lawton

Colleagues: Keith Earle (Albany), David Schneider (Cornell), Jack Freed (ACERT)

Wood pulp futures up
The next Freed grant proposal
has no page limit