Weighted Fit

8/3/2019 Weighted Fit


DOING PHYSICS WITH MATLAB

DATA ANALYSIS

WEIGHTED FIT

Ian Cooper

School of Physics, University of Sydney

[email protected]

MATLAB SCRIPTS

Go to the directory containing the m-scripts for Data Analysis

The Matlab scripts are used to analyse sets of experimental data.

weighted_fit.m    m-script used to fit a straight line to a set of experimental data, which calls the following functions:

fit_function.m    function used to evaluate the fitted function

part_dev_fit.m    function to evaluate partial derivatives of the fitted function

chi2test.m    function to evaluate the $\chi^2$ value

CURVE FITTING

LEAST SQUARES - UNCERTAINTIES IN THE DATA

In many experiments, the functional relationship between two variables x and y is investigated by measuring a set of n values of $(x_i, y_i)$ and their uncertainties $\Delta x_i$, $\Delta y_i$. The functional relationship between x and y can be written as

(1)   $y_f = f(a_1, a_2, \ldots, a_m; x) = f(\mathbf{a}; x)$

The goal is to find the unknown coefficients $\mathbf{a} = \{a_1, a_2, \ldots, a_m\}$ that fit the function $f(\mathbf{a}; x)$ to the set of n measurements. For a statistical analysis, it is difficult to consider simultaneously the uncertainties in both the x and y values. In this treatment, only the uncertainties $\Delta y_i$ in the y measurements will be considered. For the method of least squares, the best estimates of the coefficients a are those that minimize the $\chi^2$ value, given by equation (2)

(2)   $\chi^2 = \sum_{i=1}^{n} \left( \frac{y_i - f(\mathbf{a}; x_i)}{\Delta y_i} \right)^2$

This is simply the sum of the squared deviations of the measurements from the fitted function $f(\mathbf{a}; x)$, weighted by the uncertainties $\Delta y_i$ in the y values.

How to implement the method will be given in some detail so that you can modify the m-script weighted_fit.m to add your own functions to fit a set of measurements.
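As a minimal sketch of equation (2), the lines below evaluate $\chi^2$ for a trial straight line. The data values, the trial coefficients and the variable names are made up for illustration and are not taken from weighted_fit.m.

```matlab
% Sketch only: chi-squared of equation (2) for a trial straight line.
% All numbers below are illustrative assumptions.
x  = [1; 2; 3; 4];               % x measurements
y  = [2.1; 3.9; 6.2; 7.8];       % y measurements
dy = [0.1; 0.1; 0.2; 0.2];       % uncertainties in the y measurements

f = @(a, x) a(1) .* x + a(2);    % trial function y = a1*x + a2
a = [2; 0];                      % trial coefficients (not fitted values)

residual = (y - f(a, x)) ./ dy;  % deviations in units of the uncertainty
chi2 = sum(residual .^ 2);       % equation (2)
fprintf('chi2 = %.4f\n', chi2);
```

Each measurement here lies one uncertainty away from the trial line, so every term contributes about one to the sum.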



This m-script calls two extrinsic functions: fit_function to evaluate the fitted function and part_dev_fit to calculate the partial derivatives of the function with respect to the coefficients, $\partial f(\mathbf{a}; x) / \partial a_k$. There is also a call to the function chi2test.m to give a measure of the goodness of the fit. An iterative procedure based upon the method of Marquardt is used, in which the minimum of $\chi^2$ is found by adjusting the values of the coefficients through a damping factor u. The steps in the procedure and details of the m-script weighted_fit.m are described in the following sections.

USING THE M-SCRIPT weighted_fit.m

The measurements must be entered into a matrix called wData of dimension (n × 4). The analysis only uses the uncertainties $\Delta y$ associated with the y measurements. The uncertainties $\Delta x$ are only included to show any error bars when the measurements are plotted.

Measurements    Matrix

x               x  = wData(:, 1)

y               y  = wData(:, 2)

$\Delta x$      dx = wData(:, 3)

$\Delta y$      dy = wData(:, 4)

It is a good idea to clear all variables from the Workspace using the Matlab command clear all. For example, the data (x, y, $\Delta x$, $\Delta y$) could be entered into the matrix wData1 and saved to a file from the Command Window using the Matlab command save wData1. To use this data at any time in the Command Window, use the Matlab command load wData1. Then let wData = wData1 before running the m-script weighted_fit.m.
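The save and load steps above can be sketched as follows. The numbers are illustrative, and the file is written to the temporary folder (an assumption made so the sketch does not clutter the working directory); the functional forms save(...) and load(...) are used in place of the command forms.

```matlab
% Sketch: build the n-by-4 measurement matrix, save it, and reload it.
% The data values and the file location are illustrative assumptions.
x  = [0; 10; 20; 30];
y  = [0.1; 1.9; 4.2; 5.8];
dx = [1; 1; 1; 1];
dy = [0.2; 0.2; 0.2; 0.2];

wData1 = [x, y, dx, dy];                     % columns: x, y, dx, dy
fileName = fullfile(tempdir, 'wData1.mat');
save(fileName, 'wData1');                    % functional form of: save wData1

clear wData1                                 % remove the variable from the Workspace
S = load(fileName);                          % struct with the field wData1
wData = S.wData1;                            % matrix name expected by weighted_fit.m
disp(size(wData));
```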

To run this m-script, type weighted_fit in the Command Window. You will then be prompted to enter:

The minimum value for the fitted function.
The maximum value for the fitted function.
The title for the plot.
The X axis label.
The Y axis label.
The equation to be fitted to the data, by entering a number from 1 to 9

1: y = a1 * x + a2
2: y = a1 * x
3: y = a1 * x^2 + a2 * x + a3
4: y = a1 * x^2
5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4
6: y = a1 * x^a2
7: y = a1 * exp(- a2 * x)
8: y = a1 * (1 - exp(-a2 * x))
9: y = a1 * x^4 + a2 * x^3 + a3 * x^2 + a4 * x + a5
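One way to picture how an equation number selects a function is a cell array of function handles, sketched below. This is a hypothetical illustration of the menu above, not the contents of the author's fit_function.m; the names eqn, EqType and fh are assumptions.

```matlab
% Hypothetical sketch: the nine equation types as anonymous functions.
% a is the column vector of coefficients, x is the vector of x values.
eqn = cell(9, 1);
eqn{1} = @(a, x) a(1).*x + a(2);                               % linear
eqn{2} = @(a, x) a(1).*x;                                      % proportional
eqn{3} = @(a, x) a(1).*x.^2 + a(2).*x + a(3);                  % quadratic
eqn{4} = @(a, x) a(1).*x.^2;                                   % square law
eqn{5} = @(a, x) a(1).*x.^3 + a(2).*x.^2 + a(3).*x + a(4);     % cubic
eqn{6} = @(a, x) a(1).*x.^a(2);                                % power
eqn{7} = @(a, x) a(1).*exp(-a(2).*x);                          % exponential decay
eqn{8} = @(a, x) a(1).*(1 - exp(-a(2).*x));                    % exponential rise
eqn{9} = @(a, x) a(1).*x.^4 + a(2).*x.^3 + a(3).*x.^2 + a(4).*x + a(5);

EqType = 7;                 % equation number chosen at the prompt
a = [20; 0.05];             % illustrative coefficients
x = [0; 10; 20];
fh = eqn{EqType};           % pick the function handle
f = fh(a, x);               % evaluate the chosen function
disp(f');
```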



After the m-script has been executed, the following information is displayed in the Command Window:

The equation type.
Number of measurements.
The degrees of freedom (n − m), where n is the number of measurements and m is the number of coefficients ($a_1, a_2, \ldots, a_m$).
Measures of the goodness-of-fit: the $\chi^2$ value, the reduced $\chi^2$ value, a probability factor, and a description of how well the equation fits the data.
Coefficients in the array a {$a_1, a_2, \ldots, a_m$}.
The uncertainties in the coefficients in the array sigma.

The results are also displayed in two Figure Windows:

The XY plot of the measurements and the fitted equation.
The $\chi^2$ distribution indicating the goodness-of-fit.

CHI-SQUARED DISTRIBUTION

The chi-squared distribution $\chi^2$ ($\chi^2$ is a single entity and is not equal to $\chi \times \chi$) is very useful for testing the goodness-of-fit of a theoretical equation to a set of measurements. For a set of n independent random variables $x_i$ that have a Gaussian distribution with theoretical means $\mu_i$ and standard deviations $\sigma_i$, the chi-squared distribution $\chi^2$ is defined as

(3)   $\chi^2 = \sum_{i=1}^{n} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2$

$\chi^2$ is also a random variable because it depends upon the random variables $x_i$ and $\sigma_i$, and it follows the distribution

(4)   $P(\chi^2) = \frac{(\chi^2)^{\nu/2 - 1} \exp(-\chi^2/2)}{2^{\nu/2}\, \Gamma(\nu/2)}$

where $\Gamma(\,)$ is the gamma function and $\nu$ is the number of degrees of freedom, the sole parameter of the distribution, which is related to the number of independent variables in the sum. The mean of the distribution is $\mu = \nu$ and the variance is $\sigma^2 = 2\nu$. This distribution can be used to test a hypothesis that a theoretical equation fits a set of measurements. If an improbable chi-squared value is obtained, one must question the validity of the fitted equation. The chi-squared value characterizes the fluctuations in the measurements $x_i$, and so on average each term $\{(x_i - \mu_i)/\sigma_i\}^2$ should be about one. Therefore, one can define the reduced chi-squared value $\chi^2_r$ as

(5)   $\chi^2_r = \frac{\chi^2}{\nu}$
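The stated mean and variance can be checked numerically. The sketch below evaluates the density of equation (4) on a grid and integrates it with the trapezoidal rule; the grid limits, the choice $\nu$ = 6 and the variable names are assumptions made for illustration.

```matlab
% Sketch: evaluate the chi-squared density of equation (4) and check its
% normalisation, mean and variance numerically on a finite grid.
nu = 6;                                   % degrees of freedom (illustrative)
c2 = linspace(0, 80, 8001);               % grid of chi-squared values
P  = c2.^(nu/2 - 1) .* exp(-c2/2) ./ (2^(nu/2) * gamma(nu/2));

area    = trapz(c2, P);                   % should be close to 1
meanVal = trapz(c2, c2 .* P);             % should be close to nu
varVal  = trapz(c2, (c2 - nu).^2 .* P);   % should be close to 2*nu

chi2_r = 12 / nu;                         % equation (5) for chi2 = 12
fprintf('area = %.4f  mean = %.4f  variance = %.4f\n', area, meanVal, varVal);
```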



The extrinsic function chi2test.m can be used to display the $\chi^2$ distribution for a given number of degrees of freedom, and it gives the probability of a $\chi^2$ value exceeding a given $\chi^2$ value.

For example, chi2test(6, 12) gives prob = 6.2 %. This would imply that the hypothesis should be rejected because there is only a relatively small probability that a value of $\chi^2$ as large as 12 with $\nu$ = 6 degrees of freedom would occur by chance. The $\chi^2$ distribution given by equation (4) is shown in Fig. 1.

Fig. 1. Chi-squared distribution with parameters: 6 degrees of freedom, the $\chi^2$ value = 12 and the probability = 6.17 % that the $\chi^2$ value would be exceeded by chance.
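The probability of exceeding a given $\chi^2$ value is the upper tail of the distribution in equation (4), which can be written with the built-in regularized incomplete gamma function. The sketch below is an assumed equivalent of the probability calculation, not the author's chi2test.m; the name chi2prob is hypothetical.

```matlab
% Sketch: probability that chi-squared exceeds chi2Max for dof degrees of
% freedom, using the regularized upper incomplete gamma function.
% chi2prob is a hypothetical helper, not part of the author's scripts.
chi2prob = @(dof, chi2Max) gammainc(chi2Max / 2, dof / 2, 'upper');

prob = 100 * chi2prob(6, 12);      % percentage for dof = 6, chi2 = 12
fprintf('prob %% = %.2f\n', prob);
```

For 6 degrees of freedom the tail has the closed form exp(-6) * (1 + 6 + 18), about 6.2 %, in line with the value quoted above.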

prob = chi2test(dof, chi2Max)

Inputs:   dof = degrees of freedom;  chi2Max = value of $\chi^2$ to test
Output:   prob = probability that $\chi^2 > \chi^2_{max}$

Basically, we have set up a hypothesis that our measurements can be described by some analytical function $f(\mathbf{a}; x)$. We test the hypothesis by the value of $\chi^2$. $\chi^2$ is a measure of the total agreement between our measurements and the hypothesis. It can be assumed that the minimum value of $\chi^2$ is distributed according to the $\chi^2$ distribution with (n − m) degrees of freedom. Often the reduced $\chi^2$ value {$\chi^2_{reduced} = \chi^2/(n-m)$} is quoted as a measure of the goodness-of-fit

$\chi^2_{reduced}$ ~ 1    hypothesis is acceptable

$\chi^2_{reduced}$ > 1    hypothesis may not be acceptable



Example 1

The data for the extension of a spring is shown in Table 1. We can test the hypothesis that a straight line fits the data.

Table 1. Data for the extension of a spring due to a variable load.

The data is saved in the file wData1.mat. Clear all the variables in the Workspace, load the file wData1 and then set wData = wData1

wData
        0        0        0        0
  20.0000   0.5000   1.0000   0.0300
  55.0000   1.0000   3.0000   0.0500
  78.0000   1.5000   4.0000   0.0700
  98.0000   2.0000   5.0000   0.1000
 130.0000   2.5000   6.0000   0.1300
 154.0000   3.0000   8.0000   0.1500
 173.0000   3.5000   9.0000   0.1700
 205.0000   4.0000  10.0000   0.2000

Run the m-script weighted_fit and select the equation for the linear fit (equation 1).

The results of the weighted least squares fit are:

1: y = a1 * x + a2   linear
No. measurements = 9
Degree of freedom = 7
chi2 = 13.8177
Reduced chi2 = 1.97395
Probability = 5.43
*** Acceptable Fit ***
Coefficients a1, a2, ... , am
0.0195
0.0203
Uncertainties in coefficient
0.0005
0.0346

The hypothesis is acceptable and we can conclude:

The intercept is ($a_2$ = 0.02 ± 0.03) N, hence the intercept can be taken as zero.
The slope is ($a_1$ = 19.5 ± 0.5) N.m-1.
The spring obeys Hooke's Law F = k e, where the spring constant is k = (19.5 ± 0.5) N.m-1.

The graphical output is shown in Figs. 2 and 3.

Table 1 (data)
x: e (mm)          0     20     55     78     98    130    154    173    205
y: F (N)           0   0.50   1.00   1.50   2.00   2.50   3.00   3.50   4.00
$\Delta x$ (mm)    0      1      3      4      5      6      8      9     10
$\Delta y$ (N)     0   0.03   0.05   0.07   0.10   0.13   0.15   0.17   0.20



Fig. 2. Weighted least squares fit to the data in Table 1 (load F (N) against extension e (mm)). The slope of the line is (19.5 ± 0.5) N.m-1 and the intercept is (0.02 ± 0.03) N.

Fig. 3. Chi-squared distribution for the linear fit to the data in Table 1: dof = 7, chi2Max = 13.8177, reduced chi2 = 1.974, prob % = 5.43. The fit is taken as acceptable.



Example 2

The data from Example 1 is used to fit a linear function with uncertainties assigned to both the x and y values.

The results of the weighted least squares fit are:

1: y = a1 * x + a2   linear
No. measurements = 9
Degree of freedom = 7
chi2 = 2.08357
Reduced chi2 = 0.297652
Probability of exceeding chi2 value = 95.5
Coefficients a1, a2, ... , am
0.0194   0.0147
Uncertainties in coefficient
0.0011   0.0923
? Fit may be too good ?

Fig. 4. Plot of the data and fitted function for Example 1.

The results are similar but not identical to Example 1.

x: e (mm)          0     20     55     78     98    130    154    173    205
y: F (N)           0   0.50    1.0    1.5    2.0    2.5    3.0    3.5    4.0
$\Delta y$ (N)     0    0.1    0.1    0.1    0.2    0.2    0.2    0.3    0.3
$\Delta x$ (mm)    0      2      2      2      2      2      2      2      2


Fig. 5. Plot of the chi-squared distribution showing the probability of the $\chi^2$ value being exceeded.

Example 3

The data below is used to fit a power relation $y = a_1 x^{a_2}$.

The results of the weighted least squares fit are:

6: y = a1 * x^a2   power
No. measurements = 9
chi2 = 0.104884
Reduced chi2 = 0.0149834
Probability of exceeding chi2 value = 100
Coefficients a1, a2, ... , am
1.4379   0.5129
Uncertainties in coefficient
0.2088   0.0988
? Fit may be too good ?

x: m (kg)          0   0.02   0.05   0.10   0.15   0.20   0.30   0.35   0.40
y: T (s)           0   0.20   0.31   0.46   0.62   0.71   0.76   0.84   0.40
$\Delta y$ (s)     0   0.10   0.10   0.10   0.10   0.10   0.10   0.10   0.10
$\Delta x$ (kg)    0      0      0      0      0      0      0      0      0


Example 4

The data below is used to fit an exponential decay relation $y = a_1 e^{-a_2 x}$.

The results of the weighted least squares fit are:

7: y = a1 * exp(- a2 * x)   exponential decay
No. measurements = 9
chi2 = 1.06625
Reduced chi2 = 0.152321
Probability of exceeding chi2 value = 99.3
Coefficients a1, a2, ... , am
20.0000   0.0444
Uncertainties in coefficient
0.8857   0.0023
? Fit may be too good ?

Fig. 6. Plot of data and exponential decay fit for the data in Example 4.

x: t (s)             0     10     20     30     40     50     60     70     80
y: A (mm)           20   12.5    8.0    5.0    3.5    2.5    1.5    1.0    0.5
$\Delta y$ (mm)    1.0    1.0    1.0    0.5    0.5    0.5    0.5    0.5    0.5
$\Delta x$ (s)     1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0


Example 5

The weighted_fit m-script can be used for interpolation. For example, the viscosity of water is a function of temperature T, and tables give the viscosity at only fixed temperatures. By fitting a polynomial to the data, one can estimate the viscosity at a temperature between the fixed values.

The results for fitting a 3rd order polynomial are:

5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4   cubic polynomial
No. measurements = 9
Degree of freedom = 5
chi2 = 168.672
Reduced chi2 = 33.7343
Probability of exceeding chi2 value = -0.0217
Coefficients a1, a2, ... , am
-0.0000   0.0006   -0.0476   1.7562
Uncertainties in coefficient
0.0000   0.0000   0.0004   0.0045
??? Fit may not be acceptable ???

The results of fitting a 4th order polynomial are:

No. measurements = 9
chi2 = 13.5519
Reduced chi2 = 3.38797
Probability of exceeding chi2 value = 0.866
Coefficients a1, a2, ... , am
0.0000   -0.0000   0.0010   -0.0556   1.7781
Uncertainties in coefficient
0.0000   0.0000   0.0000   0.0008   0.0048
??? Fit may not be acceptable ???

The 4th order polynomial has a much lower reduced $\chi^2$ value and gives a better fit to the data than the 3rd order polynomial.

viscosity (mPa.s)
          Table     3rd order polynomial    4th order polynomial
20 °C     1.002     1.0212                  0.9996
24 °C     ---       0.9200                  0.9055

x: T (°C)                 0      10      20      30      40      50      60      80     100
y: $\eta$ (mPa.s)     1.783   1.302   1.002   0.800   0.651   0.548   0.469   0.354   0.281
$\Delta y$ (mPa.s)    0.005   0.005   0.005   0.005   0.005   0.005   0.005   0.005   0.005
$\Delta x$ (°C)         1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0


Fig. 7. 3rd and 4th order polynomial fits to the viscosity data of Example 5.

THE METHOD OF LEAST SQUARES

Step 1: Assigning the weights for finding uncertainties

Weights w are assigned from the uncertainties dy in the y measurements

w = 1/dy      if $dy_k$ = 0 then $w_k$ = 1 for any k

An adjustable parameter u, known as the damping factor, is initially set to 0.001 so that the coefficients a can be adjusted to minimize the $\chi^2$ value by simply adjusting the value of u.
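The weighting rule of Step 1 can be sketched in a few lines. The uncertainties below are illustrative, and the logical mask nz is an assumed way of handling the zero entries.

```matlab
% Sketch of Step 1: weights from the y uncertainties, with w = 1 where dy = 0.
dy = [0; 0.03; 0.05; 0.07];   % illustrative uncertainties; the first is zero

w  = ones(size(dy));          % default weight of 1 (used where dy == 0)
nz = (dy ~= 0);               % entries with a non-zero uncertainty
w(nz) = 1 ./ dy(nz);          % w = 1/dy for those entries

u = 0.001;                    % initial damping factor
disp(w');
```

Setting the weight to 1 avoids a division by zero, but it also means that a point entered with dy = 0 is weighted far less than points with small, non-zero uncertainties.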

Step 2: Set the starting values for the coefficients a

To use the least squares method, we have to estimate starting values for the coefficients a. If the equation can be made linear in some way, then we can solve n simultaneous equations to find the unknown values of a. For example,

    EqType = 5
    % f = a1 * x^3 + a2 * x^2 + a3 * x + a4   cubic polynomial
    % x and y are column vectors of the measurements
    xx(:,1) = x.^3;
    xx(:,2) = x.^2;
    xx(:,3) = x;
    xx(:,4) = ones(size(x));   % column for the constant term a4
    a = xx\y;                  % least squares solution of xx * a = y

If this can't be done, a simpler method is used to set the coefficients, or a = 1.

Step 3: Minimize the $\chi^2$ value

The counters for the number of data points n and number of coefficients m are

i = 1, 2, ... , n
k = 1, 2, ... , m
j = 1, 2, ... , m

The weighted difference matrix D is

$D_i = w_i (y_i - f_i)$      D = w .* (y - f)

The value of $\chi^2$ is then

$\chi^2$ = D' * D

where ' gives the transpose of a matrix.

We need to adjust the coefficients by an iterative method until we find the true minimum of $\chi^2$. For step L in the iterative procedure

a(L+1) = a(L) + da

and our desired goal is that $\chi^2\{a(L+1)\} < \chi^2\{a(L)\}$.

The minimum of $\chi^2(\mathbf{a})$ is given by the condition

$\frac{\partial \chi^2}{\partial a_k} = 0$

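For small variations of the coefficients, $\partial \chi^2 / \partial a_k$ at a(L+1) may be expanded in a Taylor series around a(L). If the expansion is truncated after the second term and the minimum condition is applied at a(L+1), we can use the approximation

$-\left.\frac{\partial \chi^2}{\partial a_k}\right|_{a(L)} = \sum_{j=1}^{m} \left.\frac{\partial^2 \chi^2}{\partial a_k\, \partial a_j}\right|_{a(L)} da_j$

This can be written in matrix form as

B = CUR * da

where

$B_k = -\frac{1}{2} \left.\frac{\partial \chi^2}{\partial a_k}\right|_{a(L)}$      and      $CUR_{kj} = \frac{1}{2} \left.\frac{\partial^2 \chi^2}{\partial a_k\, \partial a_j}\right|_{a(L)}$

Here da is the matrix for the increments in the coefficients and CUR is called the curvature matrix, as it expresses the curvature of $\chi^2(\mathbf{a})$ with respect to a.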

The B matrix, after performing the partial differentiation, can be written as

$B_k = \sum_{i=1}^{n} w_i \frac{\partial f_i}{\partial a_k}\, w_i (y_i - f_i)$

We need to calculate the partial derivatives of the fitted function with respect to the coefficients a. To make the program more general, the weighted partial derivatives pdf are calculated numerically by the function part_dev_fit using the central difference approximation to the derivative

$\frac{\partial f}{\partial a_k} \approx \frac{f(a_k + \Delta a_k) - f(a_k - \Delta a_k)}{2\, \Delta a_k}$

The matrix B written in matrix form is

B = pdf * D
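A minimal sketch of the weighted numerical derivatives and of B = pdf * D is given below for equation type 7. It is not the author's part_dev_fit.m: the step size rule for h, the data values and the layout of pdf (one row per coefficient, one column per measurement) are assumptions chosen so that pdf * D has the dimension (m × 1).

```matlab
% Sketch: weighted partial derivatives by central differences, then B = pdf * D.
% Step size h and the data values are illustrative assumptions.
f  = @(a, x) a(1) .* exp(-a(2) .* x);     % y = a1 * exp(-a2 * x)
x  = [0; 10; 20; 30];
y  = [20; 12.5; 8.0; 5.0];
dy = [1.0; 1.0; 1.0; 0.5];
w  = 1 ./ dy;                             % weights
a  = [20; 0.05];                          % current coefficients

m = numel(a);
n = numel(x);
pdf = zeros(m, n);                        % pdf(k,i) = w_i * df_i/da_k
for k = 1:m
    h = zeros(m, 1);
    h(k) = 1e-6 * max(abs(a(k)), 1);      % small change in coefficient k only
    pdf(k, :) = (w .* (f(a + h, x) - f(a - h, x)) ./ (2 * h(k)))';
end

D = w .* (y - f(a, x));                   % weighted differences, n-by-1
B = pdf * D;                              % m-by-1
disp(B');
```

For this function the analytic derivatives are exp(-a2 x) and -a1 x exp(-a2 x), which gives a simple check on the numerical values.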

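The curvature matrix CUR can be approximated by neglecting the terms that contain second derivatives of f

$CUR_{kj} = \sum_{i=1}^{n} w_i \frac{\partial f_i}{\partial a_k}\, w_i \frac{\partial f_i}{\partial a_j}$

CUR = pdf * pdf'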

The elements of the curvature matrix CUR may have different magnitudes. To improve the numerical stability, the elements can be scaled by the diagonal elements of the curvature matrix, and the damping factor u can be added to the diagonal elements to give the modified curvature matrix MCUR

$MCUR_{kj} = \frac{(1 + u\, \delta_{kj})\, CUR_{kj}}{\sqrt{CUR_{kk}\, CUR_{jj}}}$

where $\delta_{kj}$ is the Kronecker delta function ($\delta_{kj}$ = 1 if k = j, otherwise $\delta_{kj}$ = 0).

Therefore, we can approximate the incremental changes in the coefficients as

B = MCUR * da

da = (MCUR)^-1 * B = MCOV * B

where MCOV = MCUR^-1 is the modified covariance matrix.
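One damped step can be sketched as below. The values of CUR and B are illustrative. Because MCUR is scaled by $\sqrt{CUR_{kk} CUR_{jj}}$, this sketch also scales B before solving and unscales da afterwards; that bookkeeping is an assumption about how the scaling is applied and may differ from what weighted_fit.m does.

```matlab
% Sketch of one damped step with illustrative CUR and B.
% Assumption: B is scaled by sqrt(CUR_kk) and da is unscaled afterwards.
CUR = [4, 2; 2, 100];                         % curvature matrix (m-by-m)
B   = [1; 5];                                 % m-by-1
u   = 0.001;                                  % damping factor
m   = numel(B);

s    = sqrt(diag(CUR));                       % sqrt(CUR_kk)
MCUR = (1 + u * eye(m)) .* CUR ./ (s * s');   % modified curvature matrix
MCOV = inv(MCUR);                             % modified covariance matrix
da   = (MCOV * (B ./ s)) ./ s;                % increments in the coefficients
disp(da');
```

With this scaling, da satisfies (CUR + u * diag(diag(CUR))) * da = B, and for u = 0 it reduces to the undamped solution CUR \ B.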

The new estimates of the coefficients and the corresponding $\chi^2$ value can then be calculated

anew = a + da

To test the minimization of $\chi^2$ as part of the iterative process, the following is done

If $\chi^2_{new} > \chi^2_{old}$     Moving away from a minimum: keep the current a values and set u = 10 u. Repeat the iteration.

If $\chi^2_{new} < \chi^2_{old}$     Approaching a minimum: set a = anew and u = u / 10. Repeat the iteration.

If $|\chi^2_{new} - \chi^2_{old}| < 0.001$     Terminate the iteration. Set u = 0 and $\chi^2 = \chi^2_{new}$. Calculate: CUR, MCUR, MCOV
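The whole of Step 3 can be put together as the following sketch, which fits equation type 7 to noise-free synthetic data. It is an illustration under stated assumptions and not the code of weighted_fit.m: the data, the starting values, the derivative step h, the iteration cap of 200 and the scaling of B are all assumed.

```matlab
% Sketch of the damped iteration for y = a1 * exp(-a2 * x).
% Synthetic noise-free data; all numerical choices are illustrative.
f  = @(a, x) a(1) .* exp(-a(2) .* x);
x  = (0:10:80)';
y  = f([20; 0.05], x);                 % data generated from a1 = 20, a2 = 0.05
dy = 0.5 * ones(size(x));
w  = 1 ./ dy;                          % weights

a = [15; 0.03];                        % starting values
u = 0.001;                             % initial damping factor
m = numel(a);
chi2 = sum((w .* (y - f(a, x))).^2);

for iter = 1:200
    % weighted partial derivatives by central differences
    pdf = zeros(m, numel(x));
    for k = 1:m
        h = zeros(m, 1);
        h(k) = 1e-6 * max(abs(a(k)), 1);
        pdf(k, :) = (w .* (f(a + h, x) - f(a - h, x)) ./ (2 * h(k)))';
    end
    D    = w .* (y - f(a, x));         % weighted differences
    B    = pdf * D;
    CUR  = pdf * pdf';                 % curvature matrix
    s    = sqrt(diag(CUR));
    MCUR = (1 + u * eye(m)) .* CUR ./ (s * s');
    da   = (MCUR \ (B ./ s)) ./ s;     % assumed scaling of B and da

    aNew    = a + da;
    chi2New = sum((w .* (y - f(aNew, x))).^2);

    if chi2New > chi2
        u = 10 * u;                    % moving away: keep a, increase damping
    else
        converged = (chi2 - chi2New) < 0.001;
        a = aNew;                      % approaching: accept the new values
        chi2 = chi2New;
        u = u / 10;
        if converged
            break;                     % change in chi2 below 0.001
        end
    end
end
fprintf('a1 = %.4f  a2 = %.5f  chi2 = %.3g\n', a(1), a(2), chi2);
```

Because the synthetic data contain no noise, the coefficients should return to the values used to generate the data and $\chi^2$ should fall close to zero.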

Step 4: Output the results

The square roots of the diagonal elements of the covariance matrix COV give the uncertainties in the coefficients a

$sigma_k = \sqrt{COV_{kk}}$

where

$COV_{kj} = \frac{MCOV_{kj}}{\sqrt{CUR_{kk}\, CUR_{jj}}}$
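The uncertainties can be sketched as follows for an illustrative curvature matrix, with u = 0 as set at the end of the iteration. The numbers are made up; with u = 0 the matrix COV is simply the inverse of CUR.

```matlab
% Sketch of Step 4: uncertainties from the covariance matrix (u = 0).
CUR = [4, 2; 2, 100];                         % illustrative curvature matrix
m   = size(CUR, 1);
u   = 0;                                      % damping removed at the minimum

s     = sqrt(diag(CUR));                      % sqrt(CUR_kk)
MCUR  = (1 + u * eye(m)) .* CUR ./ (s * s');  % modified curvature matrix
MCOV  = inv(MCUR);                            % modified covariance matrix
COV   = MCOV ./ (s * s');                     % covariance matrix
sigma = sqrt(diag(COV));                      % uncertainties in the coefficients
disp(sigma');
```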

Basically, we have set up a hypothesis that our measurements can be described by some analytical function $f(\mathbf{a}; x)$. We need to be able to statistically test the hypothesis. This can be done using the value of $\chi^2$. $\chi^2$ is a measure of the total agreement between our measurements and the hypothesis. It can be assumed that the minimum value of $\chi^2$ is distributed according to the $\chi^2$ distribution with (n − m) degrees of freedom. Often the reduced $\chi^2$ value {$\chi^2_{reduced} = \chi^2/(n-m)$} is quoted as a measure of the goodness-of-fit

$\chi^2_{reduced}$ ~ 1    hypothesis is acceptable

$\chi^2_{reduced}$ > 1    hypothesis may not be acceptable

Also, the probability of the $\chi^2$ value being exceeded is given as another measure of the goodness-of-fit.

Finally, the measurements and fitted function are plotted together with a plot of the $\chi^2$ distribution.



References

G. Cowan, Statistical Data Analysis, (Clarendon Press, Oxford, 1998)
Advanced treatment of practical applications of statistics in data analysis.

A. Kharab and R. B. Guenther, An Introduction to Numerical Methods: A Matlab Approach, (Chapman & Hall/CRC, Boca Raton, 2002)
An introduction to theory and applications in numerical methods for researchers and professionals with a background only in calculus and basic programming. CD-ROM contains the Matlab code for many algorithms.

L. Lyons, A Practical Guide to Data Analysis for Physical Science Students, (Cambridge University Press, Cambridge, 1991)
Basic introduction to analysis of measurement and least squares fitting.

L. Kirkup, Experimental Methods, (John Wiley & Sons, Brisbane, 1994)
Introduction to the analysis and presentation of experimental data.

W. R. Leo, Techniques for Nuclear and Particle Physics Experiments, (Springer-Verlag, Berlin, 1987)
Chapter on Statistics and Treatment of Experimental Data including Curve Fitting: linear and non-linear fits.

W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes in C, (Cambridge University Press, Cambridge, 1992)
Chapter on modeling of data including linear and non-linear fits.

S. S. M. Wong, Computational Methods in Physics and Engineering, (Prentice Hall, Englewood Cliffs, 1992)
Contains a very good chapter on the Method of Least Squares and the Method of Marquardt.
