Upload
ranveerm
View
229
Download
0
Embed Size (px)
Citation preview
8/3/2019 Weighted Fit
1/17
DOING PHYSICS WITH MATLAB
DATA ANALYSIS
WEIGHTED FITIan Cooper
School of Physics, University of Sydney
MATLAB SCRIPTS
Goto the directory containing the m-scripts for Data Analysis
The Matlab scripts are used to analysis sets of experimental data.
weighted_fit.m m-script used to fit a straight line to set of experimental data
which calls the following functions
fit_function.m function used to evaluate the fitted function
part_dev_fit.m function to evaluate partial derivatives of the fitted function
chi2test.m function to evaluate the 2
value
CURVE FITTING
LEAST SQUARES - UNCERTAINITES IN THE DATAIn many experiments, the functional relationship between two variables x and y is
investigated by measuring a set ofn values of (xi,yi) and their uncertainty xi, yi. The
functional relationship betweenx andy can be written as
(1) yf=f(a1, a2, , am;x) = f(a;x)
The goal is to find the unknown coefficients a = {a1, a2, , am} to fit the function
f(a;x) to the set ofn measurements. For a statistical analysis, it is difficult to consider
simultaneously the uncertainties in both thex andy values. In this treatment, only the
uncertainties yi in the y measurements will be considered. For the method of leastsquares, to find the coefficients a, the best estimates are those that minimizes the
2
value,given by equation (zz.2)
(2)
2
2
1
( ; )an i i
i yi
y f x
=
=
This is simply the sum of the squared deviations of the measurements from the fitted
functionf(a;x) weighted by the uncertainties yi in they values.
How to implement the method will be given in some detail so that you can modify them-script weighted_fit.m to add your own functions to fit a set of measurements.
8/3/2019 Weighted Fit
2/17
This m-script calls two extrinsic functions: fit_function to evaluate the fittedfunction and part_dev to calculate the partial derivative of the function with respect
to the coefficients,( , )a
k
f x
a
. Also there is call to the m-script chi2trest.m to give a
measure the goodness of the fit. An iterative procedure is used based upon the
method of Marquardt where the minimum of2
is found by adjusting the value of thecoefficients through a damping factoru. The steps in the procedure and details of the
m-script weighted_fit.mare described in the following section.
USING THE M-SCRIPT weighted_fit.mThe measurements must be entered into a matrix called wData of dimension (n4).
The analysis only uses the uncertainties y associated with the y measurements. The
uncertainties xare only included to show any error bars when the measurements are
plotted.
Measurements Matrix
x x = wData(:, 1)
y y = wData(:, 2)
x dx = wData(:, 3)
y dy = wData(:, 4)
It is a good idea to clear all variables from the Workspace using the Matlab command
clear all. For example, the data (x,y,x, y) could be entered into the matrix wData1
and saved to a file from the Command Window using the Matlab command savewData1. To use this data at any time in the Command Window, use the Matlab
command load wData1. Then let wData = wData1 before running the m-script for
weight_fit.m.
To run this m-script, type weighted_fit in the command Window. You then will be
prompted to enter:
The minimum value for the fitted function. The maximum value for the fitted function. The title for the plot. TheXaxis label. The Yaxis label. The equation to be fitted to the data by entering a number from 1 to 9
1: y = a1 * x + a22: y = a1 * x
3: y = a1 * x^2 + a2 * x + a3
4: y = a1 * x^25: y = a1 * x^3 + a2 * x^2 + a3 * x + a4
6: y = a1 * x^a2
7: y = a1 * exp(- a2 * x)8: y = a1 * (1 - exp(-a2 * x))
9: y = a1 * x^4 + a2 * x^3 + a3 * x^2 + a4 * x +a5
8/3/2019 Weighted Fit
3/17
After the m-script has been executed, the following information is displayed in theCommand Window:
The equation type. Number of measurements. The degrees of freedom (n m) where n is the number of measurements and m
is the number of coefficients (a1, a2, , am ). Measure s for the good-of-fit - 2 value, reduced 2 value, and a probability
factor and a description of the fitted equation to the data.
Coefficients in the array a {a1, a2, ... , am). The uncertainties in the coefficients in the array sigma.
The results are also displayed in two Figure Windows:
TheXYplot of the measurements and the fitted equation. The 2 distribution indicating the good-of-fit
CHI-SQUARED DISTRIBUTIONThe chi-squared distribution 2 (2 is a single entity and is not equal to ) and isvery useful for testing the goodness-of-of fit of a theoretical equation to a set of
measurements. For a set ofn independent random variables xi that have a Gaussian
distribution with theoretical means i and standard deviations i, the chi-squared
distribution 2 defined as
(3)
2
2
n
i i
i i
x
=
2
is also a random variable because it depends upon the random variables xi and iand follows the distribution
(4)
( )1
2 22
exp2
( )
22
P x
=
where ( ) is the gamma function and is the number ofdegrees of freedom and isthe sole parameter related to the number of independent variables in the sum used to
describe the distribution. The mean of the distribution is = and the variance is
= 2. This distribution can be used to test a hypothesis that a theoretical equation
fits a set of measurements. If an improbable chi-squared value is obtained, one mustquestion the validity of the fitted equation. The chi-squared characterizes the
fluctuations in the measurementsxiand so on average {(xi - i) / i} should be about
one. Therefore, one can define the reduced chi-squared value 2r
as
(5)2
2
r
=
8/3/2019 Weighted Fit
4/17
The extrinsic function chi2test.m can be used to display the distribution for a given
degree of freedom and gives the probability of a chi-squared value exceeding a
given chi-squared value.
For example, chi2test(6, 12) prob = 6.2 %. This would imply that thehypothesis should be rejected because there is only a relatively small probability that
2
= 12 with = 6 degrees of freedom would occur by chance. The 2
distribution
given by equation (zz.4) is shown in Fig. 1.
Fig. 1 Chi-squared distribution with parameters: 6 degrees of freedom, the
2
value = 12 and the probability = 6.17% that the 2would be
exceeded by chance.
Basically, we have set up a hypothesis that our measurements can be described by
some analytical functionf(a;x). We test the hypothesis by the value of2.
2is a
measure of the total agreement between our measurements and the hypothesis. It can
be assumed that the minimum value of
2
is distributed according to the
2
0 5 10 15 20 25 300
0.02
0.04
0.06
0.08
0.1
0.12
0.14chi-squared distribtion
2
P(x)
dof = 6
chi2Max = 12
prob % = 6.17
prob = chi2test(dof, chi2Max)
Outputs
Probability 2 > 2max Value of2 to test
Degrees of freedom
Inputs
8/3/2019 Weighted Fit
5/17
distribution with (n-m) degrees of freedom. Often the reduced 2 value{2reduced =2/(n-m)} is quoted as a measure of the goodness-of-fit
2reduced ~ 1 hypothesis is acceptable
2reduced > 1 hypothesis may not be acceptable
8/3/2019 Weighted Fit
6/17
Example 1
The data for the extension of a spring is shown in Table 1. We can test the hypotheses
that a straight line fits the data.
Table 1. Data for the extension of a spring due to a variable load.
The data is saved in the file wData1.mat. Clear all the variables in the Workspace and
load the file wData1 then set wData=wData1
wData0 0 0 0
20.0000 0.5000 1.0000 0.0300
55.0000 1.0000 3.0000 0.050078.0000 1.5000 4.0000 0.0700
98.0000 2.0000 5.0000 0.1000
130.0000 2.5000 6.0000 0.1300154.0000 3.0000 8.0000 0.1500
173.0000 3.5000 9.0000 0.1700
205.0000 4.0000 10.0000 0.20000 0
Run the m-script weighted_fit and select the equation for the linear fit (equation 1).
The results of the weighted least squares fit are:
1: y = a1 * x + a2 linearNo. measurements = 9
Degree of freedom = 7
chi2 = 13.8177Reduced chi2 = 1.97395
Probability = 5.43
*** Acceptable Fit ***
Coefficients a1, a2, ... , am0.0195
0.0203
Uncertainties in coefficient0.0005
0.0346
The hypothesis is acceptable and we can conclude:
The intercept is (a2 = 0.02 0.03) N, hence the intercept can be taken as zero. The slope is as (a1 = 19.5 0.5 ) N.m-1. The spring obeys Hookes Law F k e= where the spring constant is
k= (19.5 0.5 ) N.m-1
The graphical output is shown in Figs. 2 and 3.
x: e (mm) 0 20 55 78 98 130 154 173 205
y: F(N) 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00x (mm) 0 1 3 4 5 6 8 9 10
(N) 0 0.03 0.05 0.07 0.1 0.13 0.15 0.17 0.2
8/3/2019 Weighted Fit
7/17
Fig. 2. Weighted least squares for to the data in Table 1. The slope of the
line is (19.5 0.5 N) and the intercept is (0.2 0.3).
Fig. 3. Ch-squared distribution for the linear fit to the data in Table 1. The
2 ~ 1, so the fit is acceptable enough the probability of the
Weighted least squares for to the data in Table 1. The slope of the
line is (19.5 0.5 N) and the intercept is (0.2 0.3).
0 5 10 15 20 25 300
0.02
0.04
0.06
0.08
0.1
0.12
0.14chi-squared distribtion
2
P(x)
dof = 7
chi2Max = 13.8177
chi2reduce
= 1.974
prob % = 5.43
0 20 40 60 80 100 120 140 160 180 200 220
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
extension e (mm)
load
F
(N)
8/3/2019 Weighted Fit
8/17
Example 2
The data from Example 1 is used to fit a linear function with uncertainties assigned to
both thex andy values.
The results of the weighted least squares fit are:
1: y = a1 * x + a2 linearNo. measurements = 9
Degree of freedom = 7chi2 = 2.08357
Reduced chi2 = 0.297652
Probability of exceeding chi2 value = 95.5Coefficients a1, a2, ... , am
0.0194 0.0147
Uncertainties in coefficient0.0011 0.0923
? Fit may be too good ?
Fig. 4. Plot of the data and fitted function for Example 1.
The results are similar but not identical to Example 1.
x: e (mm) 0 20 55 78 98 130 154 173 205
y: F(N) 0 0.50 1.0 1.5 2.0 2.5 3.0 3.5 4.0y (N) 0 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.3
x (mm) 0 2 2 2 2 2 2 2 2
0 50 100 150 200 2500
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5Weighted Least Squares Fit
X
Y
8/3/2019 Weighted Fit
9/17
Fig. 5. Plot of the chi-squared distribution showing the probability of
the 2
value being exceeded.
Example 3
The data below is used to fit a power relation 21
a y a x= .
The results of the weighted least squares fit are:6: y = a1 * x^a2 powerNo. measurements = 9
Degree of freedom = 7
chi2 = 0.104884Reduced chi2 = 0.0149834
Probability of exceeding chi2 value = 100Coefficients a1, a2, ... , am
1.4379 0.5129
Uncertainties in coefficient0.2088 0.0988
? Fit may be too good ?
x: m (kg) 0 0.02 0.05 0.10 0.15 0.20 0.30 0.35 0.40y: T(s) 0 0.20 0.31 0.46 0.62 0.71 0.76 0.84 0.40
y (s) 0 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10
x (kg) 0 0 0 0 0 0 0 0 0
0 5 10 15 20 25 300
0.02
0.04
0.06
0.08
0.1
0.12
0.14chi-squared dist ribtion
2
P(x)
dof = 7
chi2Max = 2.0836
prob % = 95.5
8/3/2019 Weighted Fit
10/17
Example 4
The data below is used to fit a power relation 21
a x y a e
= .
The results of the weighted least squares fit are:7: y = a1 * exp(- a2 * x) exponential decayNo. measurements = 9
Degree of freedom = 7
chi2 = 1.06625Reduced chi2 = 0.152321
Probability of exceeding chi2 value = 99.3
Coefficients a1, a2, ... , am
20.0000 0.0444Uncertainties in coefficient
0.8857 0.0023? Fit may be too good ?
Fig. 6. Plot of data and exponential decay fit for data in Example 6.7.
x: t(s) 0 10 20 30 40 50 60 70 80
y: A (mm) 20 12.5 8.0 5.0 3.5 2.5 1.5 1.0 0.5
y (mm) 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.5 0.5
x (s) 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
0 10 20 30 40 50 60 70 80 90 1000
5
10
15
20
25Weighted Least Squares Fit
X
Y
8/3/2019 Weighted Fit
11/17
Example 5
The weighted-fit m-script can be used for interpolation. For example, the viscosity
of water is a function of temperature Tand tables give the viscosity at only fixedtemperatures. By fitting a polynomial to the data, one can estimate the viscosity at a
temperature between the fixed values.
The results for fitting a 3rd order polynomial are:5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4 cubic polynomialNo. measurements = 9
Degree of freedom = 5chi2 = 168.672
Reduced chi2 = 33.7343Probability of exceeding chi2 value = -0.0217Coefficients a1, a2, ... , am
-0.0000 0.0006 -0.0476 1.7562
Uncertainties in coefficient0.0000 0.0000 0.0004 0.0045
??? Fit may not be acceptable ???
The results of fitting a 4th
order polynomial are:
No. measurements = 9
Degree of freedom = 4
chi2 = 13.5519Reduced chi2 = 3.38797
Probability of exceeding chi2 value = 0.866
Coefficients a1, a2, ... , am0.0000 -0.0000 0.0010 -0.0556 1.7781
Uncertainties in coefficient0.0000 0.0000 0.0000 0.0008 0.0048
??? Fit may not be acceptable ???
The 4th
order polynomial has a much lower reduced 2 value and gives a better fit to
the data than the 3rd order polynomial.
viscosity (mPa.s)
Table 3r oder
polynomial
4t oder
polynomial
20 C 1.002 1.0212 0.9996
24 C --- 0.9200 0.9055
x: T(C) 0 10 20 30 40 50 60 80 100
y: (mPa.s) 1.783 1.302 1.002 0.800 0.651 0.548 0.469 0.354 0.281
y (mm) 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005
x (s) 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
8/3/2019 Weighted Fit
12/17
Fig. 7. 3rd
and 4th
order polynomial fit to viscosity data of example 5.
-20 0 20 40 60 80 100 1200.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8Weighted Least Squares Fit
X
Y
-20 0 20 40 60 80 100 1200.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8Weighted Least Squares Fit
X
Y
8/3/2019 Weighted Fit
13/17
THE METHOD OF LEAST SQUARES
Step 1: Assigning the weights for finding uncertainties
Weights w are assigned to the uncertainties dy in the y measurements
w = 1/dy ifdyk= 0 then wk= 1 for any k
An adjustable parameteru known as the damping factor is initially set to 0.001 so that
the coefficients acan be adjusted to minimize the 2 value by simply adjusting thevalue ofu.
Step 2: Set the starting values for the coefficients a
To use the least squares method, we have to estimate starting values for the
coefficients a. If the equation can be made linear in some way, then we can solve n
simultaneous equations to find the unknown values ofa. For example,
EqType = 5
% f = a1 * x^3 + a2 * x^2 + a3 * x + a4 cubic polynomialxx(:,1) = x.^3;
xx(:,2) = x.^2;xx(:,3) = x;
a = xx\y;
If this cant be done, a simpler method is used to set the coefficients ora = 1.
Step 3: Minimize 2 value
The counters for the number of data points n and number of coefficients m are
i = 1, 2, , n
k= 1, 2, , mj = 1, 2, , m
The weighted difference matrix Dis
( )i i i i D w y f = D = w * (y f)
The value of2
is then
2 = D * D
where gives the transpose of a matrix.
We need to adjust the coefficients by an iterative method until we find the true
minimum of2. For stepL in the iterative procedure
a(L+1) = a(L) + da
8/3/2019 Weighted Fit
14/17
and our desired goal is that 2{a(L+1)} <
2{a(L)}.
The minimum of2(a) is given by the condition
2
0
ka
=
For small variations of the coefficients, the value of2{a(L+1)} may be expanded in
terms of a Taylors series around 2{a(L)} and if the expansion is truncated after thesecond term, we can use the approximation
2 2 2
a( ) a( )
m
L L j
jk k j
daa a a
=
This can be written in matrix form as
B = CUR* da
where2
1
2a( )k L
k
Ba
=
and2 2
( )
1
2akj L
k j
CURa a
=
where da is the matrix for the increments in the coefficients, CUR is called the
curvature matrix as it expresses the curvature of2(a) with respect to a.
The B matrix, after performing the partial differentiation can be written as
( )n
ik i i i i
i k
f B w w y f
a
=
We need to calculate the partial derivatives of the fitted function with respect to the
coefficients a. To make the program more general, the weighted partial derivates pdf
are calculated numerically by the function part_der using the difference
approximation to the derivative
( ) ( )
2
k k
k
f a f af
a
+ =
The matrix B written in matrix form is
B = pdf * D
The curvature matrix CURcan be approximated by
8/3/2019 Weighted Fit
15/17
1
ni i
kj i i
i k j
f fCUR w w
a a=
=
CUR = pdf * pdf
The elements of the curvature matrix CUR may have different magnitudes. To
improve the numerical stability, the elements can be scaled by the diagonal elements
of the curvature matrix and the damping factor u can be added to the diagonal
elements to give the modified curvature matrix MCUR
( )1 kj kjkj
kk jj
u CURMCUR
CUR CUR
+=
where kj is the Kronecker delta function (kj = 1 ifk=j otherwise kj = 0).
Therefore, we can approximate the incremental changes in the coefficients as
B = MCUR* da
da = (MCUR)-1 * B = MCOV * B
where MCOV = MCUR-1 is the modified covariance matrix.
The new estimates of the coefficients and the corresponding 2
value can then be
calculated
anew = a + da
To test the minimization 2 of as part of the iterative process, the following is done
If2new > 2old Moving away from a minimum, keep current a values and
set u = 10 uRepeat iteration.
If2new < 2old Approaching a minimum, set a = anew, u = u/ 10
Repeat iteration.
if | 2new < 2old | < 0.001
Terminate the iterationu = 0
2 = 2newCalculate: CUR, MCUR, MCOV
Step 4: Output the results
8/3/2019 Weighted Fit
16/17
The square root of the diagonal elements of the covariance matrix COV give the
uncertainties in the coefficients fora
k kk sigma COV =
where
kj
kj
kk jj
MCOVCOV
CUR CUR=
Basically, we have set up a hypothesis that our measurements can be described by
some analytical functionf(a;x). We need to be able to statistically test the hypothesis.
This can be done using the value of2.
2is a measure of the total agreement between
our measurements and the hypothesis. It can be assumed that the minimum value of
2
is distributed according to the 2 distribution with (n-m) degrees of freedom. Often the
reduced 2 value{2reduced = 2/(n-m)} is quoted as a measure of the goodness-of-fit
2reduced ~ 1 hypothesis is acceptable
2reduced > 1 hypothesis may not be acceptable
Also the probability of the 2 value being exceeded is given as another measure of thegoodness-of-fit.
Finally, the measurements and fitted function are plotted together with a plot of the 2distribution.
8/3/2019 Weighted Fit
17/17
References
G. Cowan,Statistical Data Analysis, (Clarendon Press, Oxford, 1998)
Advanced treatment of practical applications of statistics in data analysis.
A. Kharab and R.B. Guenther,An Introduction to Numerical Methods A MatlabApproach, (Chapman & Hall/CRC, Boca, 2002)
An introduction to theory and applications in numerical methods for researchers andprofessionals with a background only in calculus and basic programming. CD-ROM
contains the Matlab code for many algorithms
L. Lyons,A Practical Guide to Data Analysis for Physical Science Students,
(Cambridge University Press, Cambridge, 1991)
Basic introduction to analysis of measurement and least squares fitting.
L. Kirkup,Experimental Methods, (John Wiley & Sons, Brisbane, 1994)
Introduction to the analysis and presentation of experimental data.
W. R. Leo, Techniques for Nuclear and Particle Physics Experiments, (Spring-
Verlag, Berlin, 1987)
Chapter on Statistics and Treatment of Experimental Data including Curve Fitting:
linear and non linear fits.
W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery,Numerical
Recipes in C, (Cambridge University Press, Cambridge, 1992)
Chapter on modeling of data including linear and non linear fits.
S. S. M. Wong, Computational Methods in Physics and Engineering, (Prentice
Hall, Englewood Cliffs, 1992)
Contains a very good chapter on the Method of Least squares and the Method of
Marquardt.