Principles and Applications of Non-Linear Least Squares: An Introduction for Physical Scientists using Excel’s Solver Les Kirkup, Department of Applied Physics, Faculty of Science, University of Technology, Sydney, New South Wales 2007, Australia. email: [email protected] Version date: October 2003



Preamble

Least squares is an extremely powerful technique for fitting equations to data and is carried out in laboratories every day. Routines for calculating parameter estimates using linear least squares are most common, and many inexpensive pocket calculators are able to do this. As we move away from fitting the familiar equation, y = a + bx, to data, we usually need to employ computer based programs such as spreadsheets, or specialised statistical packages to do the 'number crunching'. In situations where an equation is complex, we may need to use non-linear least squares to fit the equation to experimental or observational data. Non-linear least squares is treated in this document with a focus on how Excel's Solver utility may be employed to perform this task.

Though I had originally intended to concentrate more or less exclusively on using Solver to carry out non-linear least squares (due to the general availability of Excel and the fact that I'd already written a text discussing data analysis using Excel!), several other related topics emerged including model identification, Monte Carlo simulations and uncertainty propagation. I have included something about those topics in this document. In addition, I have tried to include helpful worked examples to illustrate the techniques discussed. I hope the document serves its purpose (I had senior undergraduates and graduates in the physical sciences in mind when I wrote it) and I would appreciate any comments as to what might have been included (or discarded).

Page 3: Kirkup - Principles and Applications of Non-Linear Least Squares - An Introduction for Physical Scientists Using Excel Solver (2003)

3

CONTENTS

Section 1: Introduction
1.1 Reasons for fitting equations to data

Section 2: Linear Least squares
2.1 Standard errors in best estimates

Section 3: Extensions of the linear least squares technique
3.1 Using Excel to solve linear least squares problems
3.2 Limitations of linear least squares

Section 4: Excel's Solver add-in
4.1 Example of use of Solver
4.2 Limitations of Solver
4.3 Spreadsheet for the determination of standard errors in parameter estimates
4.4 Confidence intervals for parameter estimates

Section 5: More on fitting using non-linear least squares
5.1 Local Minima in SSR
5.2 Starting values
5.3 Starting values by curve stripping
5.4 Effect of instrument resolution and noise on best estimates
5.4.1 Adding normally distributed noise to data using Excel's Random Number Generator
5.4.2 Fitting an equation to noisy data
5.4.3 Relationship between sampling density and parameter estimates

Section 6: Linear least squares meets non-linear least squares

Section 7: Weighted non-linear least squares
7.1 Weighted fitting using Solver
7.2 Example of weighted fitting using Solver
7.2.1 Best estimates of parameters using Solver
7.2.2 Determining the D matrix
7.2.3 The weight matrix, W
7.2.4 Calculation of (DᵀWD)⁻¹
7.2.5 Bringing it all together

Section 8: Uncertainty propagation, least squares estimates and calibration
8.1 Example of propagation of uncertainties involving parameter estimates
8.2 Uncertainties in derived quantities incorporating least squares estimates
8.3 Example of propagation of uncertainties in derived quantities
8.4 Uncertainty propagation and nonlinear least squares
8.4.1 Example of uncertainty propagation in parameter estimates obtained by nonlinear least squares

Section 9: More on Solver
9.1 Solver Options
9.2 Solver Results

Section 10: Modelling and Model Identification
10.1 Physical Modelling
10.2 Data driven approach to discovering relationships
10.3 Other forms of modelling
10.4 Competing models
10.5 Statistical Measures of Goodness of Fit
10.5.1 Adjusted Coefficient of Multiple Determination
10.5.2 Akaike's Information Criterion (AIC)
10.5.3 Example

Section 11: Monte Carlo simulations and least squares
11.1 Using Excel's Random Number Generator
11.2 Monte Carlo simulation and non-linear least squares
11.3 Adding heteroscedastic noise using Excel's Random Number Generator

Section 12: Review

Acknowledgements

Problems

References


Section 1: Introduction

Scientists in all areas of the physical sciences search for defensible models that describe the way nature works. As a part of that search they often investigate the relationship between physical variables. As examples, they might want to know how the:

• electrical resistance of a superconductor depends on the temperature of the superconductor.
• width of an absorption peak in liquid chromatography depends on the flow of the mobile phase through a packed column.
• electrical permittivity of a solid depends on the moisture content in the solid.
• output voltage from a conductivity sensor depends on the electrical conductivity of the liquid in which the sensor is immersed.

A model that explains or describes the relationship between physical variables may be devised from first principles, or it may represent a new development of an established model. Whatever the situation, once a model has been devised, it is prudent to compare it to 'real' data obtained by experiment. One reason for doing this is to establish whether predictions of the model are consistent with experimental data. Consider a specific example in which nuclear radiation passes through material of thickness, x. The relationship between the intensity, I, of the radiation and x can be written,

I = Io exp(−µx) + B (1.1)

Io is the intensity recorded in the absence of the material when the background radiation is negligible, µ is the absorption coefficient of the material and B is the background intensity. The appropriateness (or otherwise) of equation 1.1 may be investigated for a particular material by considering radiation intensity versus material thickness data, as shown in figure 1.1.
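Equation 1.1 can be evaluated directly. The short Python sketch below tabulates the model intensity over the thickness range of figure 1.1, using the guessed parameter values Io = 800, µ = 1 and B = 10 that appear later in figure 1.2 (the function name is mine, introduced purely for illustration):

```python
import math

def intensity(x, I0=800.0, mu=1.0, B=10.0):
    """Equation 1.1: I = Io*exp(-mu*x) + B, with the guessed
    parameter values used in figure 1.2."""
    return I0 * math.exp(-mu * x) + B

# Model curve over the thickness range of figure 1.1 (0.0 to 2.0 cm)
for x in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"x = {x:.1f} cm, I = {intensity(x):7.1f} counts")
```

At x = 0 the model reduces to Io + B, which is one reason intercept-type readings are useful when guessing starting values.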


[Graph: Intensity (counts) plotted against Thickness (cm), for thicknesses from 0.0 to 2.0 cm]

Figure 1.1: Intensity versus thickness data. If equation 1.1 fairly describes the relationship between intensity and thickness, we should be able to find values for Io, µ and B such that the line generated by equation 1.1, when x varies between x = 0.0 and x = 2.0, ‘fits’ the data shown in figure 1.1 (i.e. passes close to the data points). We could begin by making an intelligent guess at values for Io, µ and B. Figure 1.2 shows the outcome of one attempt at guessing values for Io, µ and B.

[Graph: Intensity (counts) plotted against Thickness (cm), with a line generated using equation 1.1 taking Io = 800, µ = 1 and B = 10]

Figure 1.2: Line drawn through intensity versus thickness data using equation 1.1. It would have been fortuitous had the guesses for Io, µ and B given in figure 1.2 produced a line that passed close to the data. We could try other values for Io, µ and B and through a process of 'trial and error' improve the fit of the line to the data. However, it must be admitted that this is an inefficient way to fit any equation to data


and that guesswork must give way to a better approach. This is the main consideration of this document.

1.1 Reasons for fitting equations to data

It is possible to fit almost any equation to any data. However, a compelling reason for fitting an equation in the physical sciences is that it provides for an insightful interpretation of physical or chemical processes or phenomena. In particular, the fitting of an equation can assist in validating or refuting a theoretical model and allow for the determination of physically meaningful parameters1. As an example, the parameters in equation 1.1 have physical meaning. For example, µ in equation 1.1 is a quantity that characterises radiation absorption by a material. The applicability of equation 1.1 to a particular material is likely to have been studied by other workers. Therefore a value for µ as determined through analysing the data in figure 1.1 may be compared to that reported by others. There are situations in which an equation is fitted to data for the purpose of calibration and no attempt is made to relate parameters in the equation to physical constants. For example, the concentration of a particular chemical species might be determined using Atomic Absorption Spectroscopy (AAS). An instrument is calibrated by measuring the absorption of known concentrations of the species. A graph of absorption, y, versus concentration, x, is plotted. The next step is to fit an equation to the data. Using the equation, it is possible to determine species concentration from measurements made of absorption.

1 This issue is taken up again in section 10.


Section 2: Linear Least squares

Often in an experiment there is a known, expected or proposed relationship between variables measured during the experiment. In perhaps the most common situation, the relationship between the dependent (or response) variable, y, and the independent (or predictor) variable, x, may be expressed as,

y = a + bx (2.1)

Equation 2.1 is the equation of a straight line with intercept, a, and slope, b. In principle, we should be able to find the intercept and slope by drawing a straight line through the points. In practice, the intercept and slope cannot be known exactly, as this would require that we eliminate (or correct for) all sources of random and systematic error in the data. This is not possible. If it were possible to eliminate all sources of error, and assuming the relationship between x and y is linear, we could write the 'exact' relationship between x and y as,

y = α + βx (2.2)

where α is the 'true intercept' and β is the 'true slope'. α and β are often referred to as parameters2 and through applying techniques based on sound statistical principles, it is possible to establish best estimates of those parameters. We will represent the best estimates of α and β by the symbols a and b respectively3.

A powerful and widely used technique for establishing best estimates of parameters4 is that of least squares. The technique5 is versatile and allows parameters to be estimated when the relationship between x and y is more complex than that given by equation 2.1. For example, a, b and c in equations 2.3 to 2.5 may be determined using the technique of least squares.

• y = a + b/x + cx (2.3)
• y = a + bx + cz (here both x and z are independent variables) (2.4)
• y = a + b[1 − exp(cx)] (2.5)

In this discussion of least squares, the following assumptions are made:

1) There are no errors in the x values.
2) Errors in the y values are normally distributed with a mean of zero and a constant variance. Constant variance errors are sometimes referred to as homoscedastic errors.
3) Errors in the y values are uncorrelated, so that, for example, the error in the ith y value is not correlated to the error in the (i+1)th y value.

2 Sometimes referred to as population parameters or regression coefficients.
3 In some texts, the best estimates of α and β are written as α̂ and β̂ respectively.
4 Refer to chapters 6 and 7 of Kirkup (2002) for more details.
5 The technique is also widely referred to as regression.


The ith observed y value is written as yi and the ith value of x as xi. The ith predicted y value found using the equation of the line is written as ŷi, such that6,

ŷi = a + bxi (2.6)

The least squares technique of fitting equations to data requires the calculation of (yi − ŷi)². We sum (yi − ŷi)² from i = 1 to i = n, where n is the number of data points. The summation is written7,

SSR = Σ (yi − ŷi)² (2.7)
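As a concrete illustration of equation 2.7, SSR for a straight line can be computed in a few lines of Python. This is a sketch only; the data used are those of table 2.1 (exercise 1), and the two parameter pairs tried are illustrative:

```python
def ssr(a, b, x, y):
    """Sum of squares of residuals (equation 2.7) for the straight
    line y_hat = a + b*x (equation 2.6)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# x-y data of table 2.1
x = [2, 4, 6, 8, 10]
y = [70, 63, 49, 42, 31]

print(ssr(80.7, -4.95, x, y))   # SSR at the least-squares estimates
print(ssr(80.0, -5.00, x, y))   # a nearby pair of estimates gives a larger SSR
```

Any pair of estimates other than the least-squares pair yields a larger SSR, which is exactly the sense in which the least-squares estimates are 'best'.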

SSR is the Sum of Squares of the Residuals8. Strictly, equation 2.7 applies to fitting by ‘unweighted’ least squares. Weighted least squares is considered in section 7. The next stage is to find values of a and b which minimise SSR in equation 2.7. This is the key step in any least squares analysis, as values of a and b that minimise SSR are regarded as the best estimates obtainable of the parameters in an equation9. Best estimates could be found by ‘trial and error’, or by a systematic numerical search using a computer. When a straight line is fitted to data, an equation for the best line can be found analytically by partially differentiating SSR with respect to a and b in turn then setting the resulting equations equal to zero. Simultaneous equations obtained by this process are solved for a and b to give,

a = (Σxi² Σyi − Σxi Σxiyi) / (n Σxi² − (Σxi)²) (2.8)

and,

b = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²) (2.9)
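Equations 2.8 and 2.9 translate directly into code. The sketch below (Python; the function name is mine) checks the formulas against the data of table 2.1:

```python
def fit_line(x, y):
    """Best-estimate intercept a and slope b from equations 2.8 and 2.9."""
    n = len(x)
    sx  = sum(x)
    sy  = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx ** 2          # common denominator
    a = (sxx * sy - sx * sxy) / delta  # equation 2.8
    b = (n * sxy - sx * sy) / delta    # equation 2.9
    return a, b

# Data of table 2.1 (exercise 1)
a, b = fit_line([2, 4, 6, 8, 10], [70, 63, 49, 42, 31])
print(a, b)   # → 80.7 -4.95
```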

An elegant approach to determining a and b employs matrices. An added advantage of the matrix approach is that it may be conveniently extended to situations in which more complex equations are fitted to experimental data. The equations to be solved for a and b can be expressed in matrix form as:

6 ŷi is sometimes referred to as 'y hat'.
7 In future we assume that all summations are carried out between i = 1 to i = n, and therefore we omit the limits of the summations.
8 yi − ŷi is referred to as the ith residual.
9 The process by which estimates are varied until some condition (such as the minimisation of SSR) is satisfied is often called 'optimisation'.


[ n    Σxi  ] [ a ]   [ Σyi   ]
[ Σxi  Σxi² ] [ b ] = [ Σxiyi ]    (2.10)

Equation 2.10 can be written concisely as,

AB = P (2.11)

where,

A = [ n    Σxi  ]      B = [ a ]      P = [ Σyi   ]
    [ Σxi  Σxi² ]          [ b ]          [ Σxiyi ]

To determine elements a and b of the matrix B, equation 2.11 is manipulated to give,

B = A⁻¹P (2.12)

where A⁻¹ is the inverse matrix10 of the matrix, A. Matrix inversion and matrix multiplication are onerous to perform manually, especially if matrices are large. The built in matrix functions in Excel are well suited to estimating parameters in linear least squares problems.

Exercise 1
Table 2.1 contains x-y data which are shown plotted in figure 2.1.

Table 2.1: x-y data.

x    y
2    70
4    63
6    49
8    42
10   31


Figure 2.1: Linearly related x-y data. Using the data in table 2.1,

10 A-1 is used in the calculation of the standard errors in parameter estimates and is sometimes referred to as the ‘error matrix’.


i) find best estimates for the intercept, a, and the slope, b, of a straight line fitted to the data using linear least squares [80.7, -4.95].
ii) draw the line of best fit through the points.
iii) calculate the sum of squares of residuals, SSR [9.9].

2.1 Standard errors in best estimates

In addition to the best estimates, a and b, the standard errors in a and b are required as this allows confidence intervals11 to be quoted for the parameters α and β. Calculations of a and b depend on the measured y values. As a consequence, uncertainties in the y values contribute to the uncertainties in a and b. In order to calculate uncertainties in a and b, the usual starting point is to determine the standard errors in a and b, written as σa and σb respectively. σa and σb are given by12,

σa = σ (Σxi² / Δ)^(1/2) (2.13)

σb = σ (n / Δ)^(1/2) (2.14)

where

Δ = n Σxi² − (Σxi)² (2.15)

and

σ ≈ [ Σ (yi − ŷi)² / (n − 2) ]^(1/2) (2.16)

Alternatively, σa and σb may be determined using matrices13. The covariance matrix, V, contains elements which are the variances (as well as the covariances) of the best estimates of a and b. V may be written,14

V = σ² A⁻¹ (2.17)

A⁻¹ appears in equation 2.12. σ² can be found using equation 2.16. Standard errors in a and b are written explicitly as,

σa = σ [(A⁻¹)₁₁]^(1/2) (2.18)

σb = σ [(A⁻¹)₂₂]^(1/2) (2.19)

11 See Kirkup (2002) p226.
12 See Bevington and Robinson (1992).
13 See chapter 5 of Neter et al. (1996).
14 The covariance matrix is considered in more detail in section 9.


(A⁻¹)₁₁ and (A⁻¹)₂₂ are diagonal elements of the A⁻¹ matrix15.

Exercise 2 Using matrices, or otherwise, determine the standard errors in the intercept and slope of the best straight line through the data given in table 2.1. [1.9, 0.29]

15See Williams (1972).
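Because A is only 2×2, the matrix route of equations 2.12 and 2.17 to 2.19 can be sketched without any library support. The Python below uses the table 2.1 data; the printed standard errors can be compared with exercise 2's values of 1.9 and 0.29:

```python
import math

# Data of table 2.1
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [70.0, 63.0, 49.0, 42.0, 31.0]
n = len(x)
sx  = sum(x)
sxx = sum(xi * xi for xi in x)
sy  = sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# A matrix of equation 2.10 and its inverse (the 'error matrix')
det = n * sxx - sx * sx                # equals Delta of equation 2.15
A_inv = [[sxx / det, -sx / det],
         [-sx / det,  n / det]]

# B = A^-1 P (equation 2.12)
a = A_inv[0][0] * sy + A_inv[0][1] * sxy
b = A_inv[1][0] * sy + A_inv[1][1] * sxy

# sigma^2 from equation 2.16, standard errors from equations 2.18 and 2.19
sigma2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
sigma_a = math.sqrt(sigma2 * A_inv[0][0])
sigma_b = math.sqrt(sigma2 * A_inv[1][1])
print(a, b, sigma_a, sigma_b)
```

The same diagonal elements of A⁻¹ reappear whenever standard errors are wanted, which is why A⁻¹ is sometimes called the error matrix.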


Section 3: Extensions of the linear least squares technique The technique of least squares used to fit equations to experimental data can be extended in several ways:

• Weighting the fit. The assumption that the standard deviation in y values is the same for all values of x (a characteristic which is sometimes referred to as homoscedasticity16) may not be valid. When it is not valid, we need to ‘weight’ the fit, in effect forcing the line closer to those points that are known to higher precision. Weighted fitting is considered in section 7.

• More complex equations may be fitted to the data. Equations such as y = a + b/x + cx and y = a + bx + cx² are linear in the parameters and may be fitted using linear least squares. The added computational complexity, which can arise when there are more than two parameters to be estimated, favours fitting by matrix methods. These methods are most conveniently applied using a computer for matrix manipulation/inversion.

• Equations may be fitted using linear least squares in which the equations have more than one independent variable. As an example, the equation y = a + bx + cz may be fitted to data, where x and z are the independent variables (this is sometimes referred to as ‘multiple regression’).

3.1 Using Excel to solve linear least squares problems Excel is capable of fitting functions to data that are linear in parameters. This may be achieved by using one of the following features in Excel:

• The LINEST() function • The Regression tool in the Analysis ToolPak

Excel has no built in tool for performing weighted least squares, though a spreadsheet may be created to perform this procedure17. Excel does not provide an easy to use utility for fitting an equation to data requiring the application of non-linear least squares. However, with the aid of a powerful add-in called ‘Solver’ resident in Excel, fitting using non-linear least squares is possible. We will deal with Solver in sections 4 and 9, but first we consider non-linear least squares.

16 The condition where the variance in y values is not constant for all x, is referred to as ‘heteroscedasticity’. 17See Kirkup (2002), section 6.10.


3.2 Limitations of linear least squares

Quite complex functions can be fitted to data using linear least squares. As examples,

y = a + b ln x + c exp x (3.1)

y = a + bx + c/x² (3.2)

The equation to be fitted is inserted into equation 2.7. SSR is partially differentiated with respect to each parameter estimate in turn. The resulting equations are set equal to zero and solved to find best estimates of the parameters. It is worth highlighting that the 'linear' in linear least squares does not mean that a plot of y versus x will produce a graph containing data which lie along a straight line. 'Linear' refers to the fact that the partial derivatives, ∂SSR/∂a, ∂SSR/∂b etc. as described in section 2, are linear in the parameter estimates. Using this definition, equations 3.1 and 3.2 may be fitted to data using linear least squares.

Some relationships between physical variables require transformation before they are suitable for fitting by linear least squares. As an example, the variation of electrical resistance, R, with temperature, T, of some semiconductor materials is known to follow the relationship,

R = R0 exp(γ/T) (3.3)

where R0 and γ are constants. Taking natural logarithms of both sides of equation 3.3 and comparing the resulting equation with y = a + bx, we obtain,

ln R = ln R0 + γ (1/T) (3.4)
 y   =  a   + b   x

Taking the y values to be ln R and the x values to be 1/T, least squares may be used to find best estimates for ln R0 (and hence R0) and γ. If the errors in R have constant variance, then after transformation, the errors in ln R do not have constant variance. In this circumstance weighted fitting is required18.

Weighted fitting of equations using least squares matters most when the scatter in data is large. If data show small scatter, then the best estimates found using weighted least squares are very similar to the best estimates found by using unweighted least squares.

18 See Dietrich (1991) p303.
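The transformation of equation 3.3 into the linear form of equation 3.4 can be sketched in Python. The data below are synthetic and noise-free, generated from illustrative 'true' values R0 = 50 and γ = 3000 (the function name and values are mine, for demonstration only); an unweighted fit then recovers the generating parameters:

```python
import math

def linearise_and_fit(T, R):
    """Fit R = R0*exp(gamma/T) (equation 3.3) by unweighted linear least
    squares on the transformed data y = ln R, x = 1/T (equation 3.4).
    Returns (R0, gamma)."""
    xs = [1.0 / t for t in T]
    ys = [math.log(r) for r in R]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / delta   # a = ln R0
    b = (n * sxy - sx * sy) / delta     # b = gamma
    return math.exp(a), b

# Synthetic check: data generated exactly from R0 = 50, gamma = 3000 K
T = [250.0, 300.0, 350.0, 400.0]
R = [50.0 * math.exp(3000.0 / t) for t in T]
print(linearise_and_fit(T, R))
```

With real, noisy R values the transformed errors in ln R would no longer have constant variance, which is the point made above about the need for weighted fitting.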


Though transforming equations can assist in many situations, there are some equations that cannot be transformed into a form suitable for fitting by linear least squares. As examples,

y = a + bx/(c + x²) (3.5)

y = a + b exp(cx) (3.6)

y = a + b[1 − exp(cx)] (3.7)

y = a exp(bx) + c exp(dx) (3.8)

For equations 3.5 to 3.8 it is not possible to obtain a set of linear equations that may be solved for best estimates of the parameters. We must therefore resort to another method of finding best estimates. That method still requires that parameter estimates are found that minimise SSR.

SSR may be considered to be a continuous function of the parameter estimates. A surface may be constructed, sometimes referred to as a hypersurface19 in M dimensional space, where M is the number of parameters appearing in the equation to be fitted to data. The intention is to use non-linear least squares to discover estimates, a, b, c etc which yield a minimum in the hypersurface. As with linear least squares, these estimates are regarded as the best estimates of the parameters in the equation. Figure 3.1 shows a hypersurface which depends on estimates a and b.

Figure 3.1: Variation of SSR as a function of parameter estimates, a and b. This figure is adapted from rcs.chph.ras.ru/nlr.ppt by Alexey Pomerantsev.

19 See Bevington and Robinson (1992).



Fitting by non-linear least squares begins with reasonable guesses for the best estimates of the parameters. The objective is to modify the starting values in an iterative fashion until a minimum is found in SSR. The computational complexity of the iteration process means that non-linear least squares can only realistically be carried out using a computer. There are many documented ways in which the values of a, b, c etc. can be found which minimise SSR, including Grid Search (Bevington and Robinson, 1992), the Gauss-Newton method (Nielsen-Kudsk, 1983) and the Marquardt algorithm (Bates and Watts, 1988). Non-linear least squares is unnecessary when the derivatives of SSR with respect to the parameters are linear in parameters. In this situation linear least squares offers a more efficient route to determining best estimates of the parameters (and the standard errors in the best estimates). Nevertheless, a linear equation can be fitted to data using non-linear least squares. The answer obtained for best estimates of parameters and the standard errors in the best estimates should agree, irrespective of whether a linear equation is fitted using linear or non-linear least squares.20

20 We consider this in more detail in section 6.
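The grid-search idea mentioned above can be sketched in a few lines of Python. This is a deliberately crude toy implementation (not the GRG2 algorithm Solver uses), applied to synthetic, noise-free data generated from equation 3.7 with illustrative parameter values; the starting values, step and tolerance are arbitrary choices of mine:

```python
import itertools
import math

def ssr(params, x, y):
    """SSR for the model y_hat = a + b*(1 - exp(c*x)) (equation 3.7)."""
    a, b, c = params
    return sum((yi - (a + b * (1.0 - math.exp(c * xi)))) ** 2
               for xi, yi in zip(x, y))

def grid_search(x, y, start, step=1.0, tol=1e-4, max_scans=5000):
    """Crude grid-search minimisation of SSR: examine the 26 neighbouring
    combinations of the three estimates, move to the best if it lowers
    SSR, otherwise halve the step (i.e. refine the grid)."""
    best = list(start)
    best_ssr = ssr(best, x, y)
    scans = 0
    while step > tol and scans < max_scans:
        scans += 1
        cands = [[best[0] + da, best[1] + db, best[2] + dc]
                 for da, db, dc in itertools.product((-step, 0.0, step),
                                                     repeat=3)]
        cand = min(cands, key=lambda p: ssr(p, x, y))
        cand_ssr = ssr(cand, x, y)
        if cand_ssr < best_ssr:
            best, best_ssr = cand, cand_ssr   # downhill move
        else:
            step *= 0.5                        # no improvement: refine grid
    return best, best_ssr

# Synthetic, noise-free data from equation 3.7 with illustrative
# 'true' values a = 2.0, b = 8.0, c = -0.5
xs = [0.5 * i for i in range(1, 13)]
ys = [2.0 + 8.0 * (1.0 - math.exp(-0.5 * xi)) for xi in xs]

est, final_ssr = grid_search(xs, ys, start=[1.0, 1.0, -1.0])
print(est, final_ssr)
```

Grid search is far less efficient than Gauss-Newton or Marquardt methods, but it makes the idea of iteratively lowering SSR over the hypersurface explicit.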


Section 4: Excel's Solver add-in Solver, first introduced in 1991, is one of the many 'add ins' available in Excel21. Originally designed for business users, Solver is a powerful and flexible optimisation tool which is capable of finding (as an example) the best estimates of parameters using least squares. It does this by iteratively altering the numerical value of variables contained in the cells of a spreadsheet until SSR is minimised. To solve non-linear problems, Solver uses Generalized Reduced Gradient (GRG2) code developed at the University of Texas and Cleveland State University22. Features of Solver are best described by reference to a particular example. 4.1 Example of use of Solver Consider an experiment in which the rise of air temperature in an enclosure (such as a room) is measured as a function of time as heat passes through a window into the enclosure. Table 4.1 contains the raw data. Figure 4.1 displays the same data in graphical form. Table 4.1: Variation of air temperature in an enclosure with time.

Time (minutes)   Temperature (°C)
2                26.1
4                26.8
6                27.9
8                28.6
10               28.5
12               29.3
14               29.8
16               29.9
18               30.1
20               30.4
22               30.6
24               30.7

21 See Fylstra et al. (1998).
22 See Excel's online Help. See also Smith and Lasdon (1992).



Figure 4.1: Temperature variation with time inside an enclosure.

Through a consideration of the flow of heat into and out of an enclosure, a relationship may be derived for the air temperature, T, inside the enclosure as a function of time, t. The relationship can be expressed,

T = Ts + k[1 − exp(αt)] (4.1)

where Ts, k and α are constants. Equation 4.1 may be written in a form consistent with other equations appearing in this document. Using x and y for independent and dependent variables respectively and a, b and c for the parameter estimates, equation 4.1 becomes23,

y = a + b[1 − exp(cx)] (4.2)

To find best estimates, a, b and c, we proceed as follows:

1. Enter the raw data from table 4.1 into columns A and B of an Excel worksheet as shown in sheet 4.1.
2. Type =$B$15+$B$16*(1-EXP($B$17*A2)) into cell C2 as shown in sheet 4.1. Cells B15 to B17 contain the starting values for a, b and c respectively.
3. Use the cursor to highlight cells C2 to C13.
4. Click on the Edit menu. Click on the Fill option, then click on the Down option24.

23 Equation 4.2 is of the same form as that fitted to data obtained through fluorescent decay measurements, where the decay is characterised by a single time constant – see Walsh and Diamond (1995).
24 These steps are often abbreviated in Excel texts to Edit → Fill → Down.


Sheet 4.1: Temperature (y) and time (x) data from table 4.1 entered into a spreadsheet25.

     A         B        C
1    x (mins)  y (°C)   ŷ (°C)
2    2         26.1     =$B$15+$B$16*(1-EXP($B$17*A2))
3    4         26.8
4    6         27.9
5    8         28.6
6    10        28.5
7    12        29.3
8    14        29.8
9    16        29.9
10   18        30.1
11   20        30.4
12   22        30.6
13   24        30.7
14
15   a         1
16   b         1
17   c         1

Sheet 4.2 shows the values returned in the C column. As the squares of the residuals are required, these are calculated in column D. Sheet 4.2: Calculation of sum of squares of residuals.

     A         B        C           D
1    x (mins)  y (°C)   ŷ (°C)      (y − ŷ)² (°C²)
2    2         26.1     -5.38906    991.560654
3    4         26.8     -52.5982    6304.066229
4    6         27.9     -401.429    184323.2129
5    8         28.6     -2978.96    9045405.045
6    10        28.5     -22024.5    486333300.3
7    12        29.3     -162753     26498009287
8    14        29.8     -1202602    1.44632E+12
9    16        29.9     -8886109    7.89635E+13
10   18        30.1     -6.6E+07    4.31124E+15
11   20        30.4     -4.9E+08    2.35385E+17
12   22        30.6     -3.6E+09    1.28516E+19
13   24        30.7     -2.6E+10    7.01674E+20
14                      SSR =       7.14765E+20
15   a         1
16   b         1
17   c         1

The sum of the squares of residuals, SSR, is calculated in cell D14 by summing the contents of cells D2 through to D13. It is clear that the choices of starting values for a, b and c are poor, as the predicted values, y , in column C of sheet 4.2 bear no resemblance to the experimental values in column B. As a consequence, SSR is very large. Choosing good starting values for parameter estimates is often crucial to the

25 The estimated values of the dependent variable based on an equation like equation 4.2 must be distinguished from values obtained through experiment. Estimated values are represented by the symbol ŷ and experimental values by the symbol y.


success of fitting equations using non-linear least squares and we will return to this issue later. SSR in cell D14 is reduced by carefully altering the contents of cells B15 through to B17. Solver is able to adjust the parameter estimates in cells B15 to B17 until the number in cell D14 is minimised. To accomplish this, choose Tools on Excel's Menu bar and pull down to Solver. If Solver does not appear, then on the same pull down menu, select Add-Ins and tick the Solver Add-in box. After a short delay, Solver should be added to the Tools pull down menu. Click on Solver. The dialog box shown in figure 4.2 should appear.

Figure 4.2: Solver dialog box with cell references inserted.

Annotations accompanying figure 4.2:
• We want to minimise the value in cell D14, so D14 becomes our 'target' cell. Solver is capable of adjusting cell contents such that the value in the target cell is maximised, minimised or reaches a specified value. For least squares analysis we require the content of the target cell to be minimised.
• Excel alters the values in cells B15 to B17 in order to minimise the value in cell D14.
• It is possible to constrain the values in one or more cells (for example a parameter estimate can be prevented from assuming a negative value, if a negative value is considered to be 'unphysical'). No constraints are applied in this example.

After entering the information into the dialog box, click on the Solve button. After a few seconds Solver returns with the dialog box shown in figure 4.3.


Figure 4.3: Solver dialog box indicating that fitting has been completed.

Inspection of cells B15 to B17 in the spreadsheet indicates that Solver has adjusted the parameters. Sheet 4.3 shows the new parameters, SSR, etc.

Sheet 4.3: Best values for a, b and c returned by Solver when starting values are poor.

     A         B        C           D
1    x (mins)  y (°C)   ŷ (°C)      (y − ŷ)² (°C²)
2    2         26.1     24.75011    1.822210907
3    4         26.8     27.99648    1.431570819
4    6         27.9     29.10535    1.452872267
5    8         28.6     29.48411    0.781649268
6    10        28.5     29.61348    1.239842411
7    12        29.3     29.65767    0.127929367
8    14        29.8     29.67277    0.01618844
9    16        29.9     29.67792    0.049318684
10   18        30.1     29.67968    0.176666436
11   20        30.4     29.68028    0.517990468
12   22        30.6     29.68049    0.845498795
13   24        30.7     29.68056    1.039257719
14                      SSR =       9.500995581
15   a         15.24587
16   b         14.43473
17   c         -0.5371

SSR in cell D14 in sheet 4.3 is almost 20 orders of magnitude smaller than that in cell D14 in sheet 4.2. However, all is not as satisfactory as it might seem. Consider the best line through the points which utilises the parameter estimates in cells B15 through to B17 of sheet 4.3.
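The two values of SSR can be checked directly, since SSR is just a sum of squared residuals. A minimal Python sketch (an illustrative stand-in for the spreadsheet, using the data of table 4.1 and the parameter estimates Solver returns from poor starting values, sheet 4.3, and from good starting values, sheet 4.4):

```python
import numpy as np

# Data from table 4.1 (time in minutes, temperature in deg C)
x = np.arange(2, 25, 2, dtype=float)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9,
              30.1, 30.4, 30.6, 30.7])

def ssr(a, b, c):
    """Sum of squares of residuals for y_hat = a + b*(1 - exp(c*x))."""
    y_hat = a + b * (1.0 - np.exp(c * x))
    return np.sum((y - y_hat) ** 2)

ssr_local = ssr(15.24587, 14.43473, -0.5371)     # estimates from sheet 4.3
ssr_global = ssr(24.98118, 6.387988, -0.093668)  # estimates from sheet 4.4
print(ssr_local, ssr_global)   # approximately 9.501 and 0.2926
```

Evaluating SSR at both parameter sets makes the local/global distinction concrete: the same equation, fitted to the same data, supports two very different minima.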


Figure 4.4: Graph of y versus x showing the line based on equation 4.2 where a, b and c have the values given in sheet 4.3.

A plot of residuals (i.e. a plot of (yi − ŷi) versus xi) is often used as an indicator of the 'goodness of fit' of an equation to data, with trends in the residuals indicating a poor fit26. However, no plot of residuals is required in this case to reach the conclusion that the line on the graph in figure 4.4 is not a good fit to the experimental data. Solver has found a minimum in SSR, but this is a local minimum27 and the parameter estimates are of little worth. The source of the problem can be traced to the poorly chosen starting values (i.e. a = b = c = 1). Working from these initial estimates, Solver has discovered a minimum in SSR. However, there is another combination of parameter estimates that will produce an even lower value for SSR. Methods by which good starting values for parameter estimates may be obtained are considered in section 5.2. In the example under consideration here, we note (by reference to equation 4.2) that when x = 0, y = a. Drawing a line 'by eye' through the data in figure 4.1 indicates that, when x = 0, y ≈ 25.5 °C. Starting values for b and c may also be established by a similar preliminary analysis of the data, which we will consider in section 5.2. Denoting the starting values by a0, b0 and c0, we find28,

a0 = 25.5, b0 = 5.5 and c0 = -0.12

Inserting these values into sheet 4.2 and running Solver again gives the output shown in sheet 4.4 and in graphical form in figure 4.5.

26 See Cleveland (1994) and Kirkup (2002) for a discussion of residuals. 27 Local minima are discussed in section 5.1. 28 All parameter estimates in this example have units (for example the unit of c is min-1, assuming time is measured in minutes). For convenience units are omitted until the analysis is complete.

The fitted line shown in figure 4.4 is ŷ = 15.25 + 14.43[1 − exp(-0.5371x)].


Sheet 4.4: Best values for a, b and c returned by Solver when starting values for parameter estimates are good.

     A         B        C           D
1    x (mins)  y (°C)   ŷ (°C)      (y − ŷ)² (°C²)
2    2         26.1     26.07247    0.000757691
3    4         26.8     26.97734    0.031447922
4    6         27.9     27.72762    0.029716516
5    8         28.6     28.34972    0.062639751
6    10        28.5     28.86555    0.133625786
7    12        29.3     29.29326    4.54949E-05
8    14        29.8     29.64789    0.023136202
9    16        29.9     29.94195    0.001759666
10   18        30.1     30.18577    0.007356121
11   20        30.4     30.38793    0.00014558
12   22        30.6     30.55556    0.001974583
13   24        30.7     30.69456    2.96361E-05
14                      SSR =       0.29263495
15   a         24.98118
16   b         6.387988
17   c         -0.093668


Figure 4.5: Graph of y versus x showing the line and equation of the line based on a, b and c in sheet 4.4.

The sum of squares of residuals in cell D14 of sheet 4.4 is less than that in cell D14 of sheet 4.3. This indicates that the parameter estimates obtained using Solver when good starting values are used are rather better than those obtained when the starting values are poorly chosen. In addition, the line fitted to the data in figure 4.5 (where the line is based upon the new best estimates of the parameters) is far superior to the line fitted to the same data shown in figure 4.4. This is further reinforced by the plot of residuals shown in figure 4.6, which exhibits a random scatter about the x axis.

The equation of the line in figure 4.5 is ŷ = 24.98 + 6.388[1 − exp(-0.09367x)].



Figure 4.6: Plot of residuals based on the data and equation in figure 4.5.

4.2 Limitations of Solver

Solver is able to efficiently solve for the best estimates of parameters in an equation, such as those appearing in equation 4.2. However, Solver does not provide standard errors in the parameter estimates. Standard errors in estimates are extremely important, as without them it is not possible to quote a confidence interval for the estimates and so we cannot decide if the estimates are 'good enough' for any particular purpose. If there are three parameters to be estimated, the standard errors in the parameter estimates can be determined with the assistance of the matrix of partial derivatives given by29,

E =

| Σ(∂yi/∂a)²           Σ(∂yi/∂a)(∂yi/∂b)    Σ(∂yi/∂a)(∂yi/∂c) |
| Σ(∂yi/∂b)(∂yi/∂a)    Σ(∂yi/∂b)²           Σ(∂yi/∂b)(∂yi/∂c) |   (4.3)
| Σ(∂yi/∂c)(∂yi/∂a)    Σ(∂yi/∂c)(∂yi/∂b)    Σ(∂yi/∂c)²        |

where each sum runs over i = 1 to n.

The standard errors in a, b and c are the square roots of the diagonal elements of the covariance matrix, V, given by equation 2.17. Explicitly30,

σa = σ [(E⁻¹)₁₁]^(1/2)   (4.4)

σb = σ [(E⁻¹)₂₂]^(1/2)   (4.5)

29Note that this approach can be extended to any number of parameters. See Neter et al. (1996) chapter 13. 30 Compare these with equations 2.18 and 2.19.


σc = σ [(E⁻¹)₃₃]^(1/2)   (4.6)

where31,

σ ≈ [ (1/(n − 3)) Σ (yi − ŷi)² ]^(1/2)   (4.7)

A convenient way to calculate the elements of the E matrix is to write,

E = DᵀD   (4.8)

where Dᵀ is the transpose of the matrix D. D is given by,

D =

| ∂y₁/∂a   ∂y₁/∂b   ∂y₁/∂c |
| ∂y₂/∂a   ∂y₂/∂b   ∂y₂/∂c |
|   ⋮        ⋮        ⋮    |
| ∂yi/∂a   ∂yi/∂b   ∂yi/∂c |   (4.9)
|   ⋮        ⋮        ⋮    |
| ∂yn/∂a   ∂yn/∂b   ∂yn/∂c |

The partial derivatives in equation 4.9 are evaluated on completion of fitting an equation using Solver, i.e. at the values of a, b and c that minimise SSR. It is possible in some situations to determine the partial derivatives analytically. A more flexible approach, and one that is generally more convenient, is to use the method of 'finite differences' to find ∂y₁/∂a, ∂y₂/∂a etc. In general,

∂yi/∂a |b,c,xi ≈ [ y(a(1 + δ), b, c, xi) − y(a, b, c, xi) ] / δa   (4.10)

As double precision arithmetic is used by Excel, the perturbation, δ, in equation 4.10 can be as small as δ = 10⁻⁶ or 10⁻⁷.

Similarly, the partial derivatives ∂yi/∂b and ∂yi/∂c are approximated using,

∂yi/∂b |a,c,xi ≈ [ y(a, b(1 + δ), c, xi) − y(a, b, c, xi) ] / δb   (4.11)

31 n − 3 in the denominator of the term in the square brackets of equation 4.7 appears because the estimate of the population standard deviation in the y values requires that the sum of squares of residuals be divided by the number of degrees of freedom. The number of degrees of freedom is the number of data points, n, minus the number of parameters, p, in the equation. In this example, p = 3.


and,

∂yi/∂c |a,b,xi ≈ [ y(a, b, c(1 + δ), xi) − y(a, b, c, xi) ] / δc   (4.12)
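Equations 4.10-4.12 are straightforward to verify numerically. A minimal Python sketch (using the model of equation 4.2 and the best estimates from sheet 4.4) compares the finite-difference approximations with the analytic derivatives ∂ŷ/∂a = 1, ∂ŷ/∂b = 1 − exp(cx) and ∂ŷ/∂c = −bx exp(cx):

```python
import numpy as np

def y_hat(a, b, c, x):
    """Model of equation 4.2: y_hat = a + b*(1 - exp(c*x))."""
    return a + b * (1.0 - np.exp(c * x))

# Best estimates from sheet 4.4
a, b, c = 24.98118, 6.387988, -0.093668
x = 2.0
delta = 1e-6   # the perturbation of equation 4.10

# Finite-difference approximations (equations 4.10-4.12)
dy_da = (y_hat(a * (1 + delta), b, c, x) - y_hat(a, b, c, x)) / (delta * a)
dy_db = (y_hat(a, b * (1 + delta), c, x) - y_hat(a, b, c, x)) / (delta * b)
dy_dc = (y_hat(a, b, c * (1 + delta), x) - y_hat(a, b, c, x)) / (delta * c)

# Analytic derivatives for comparison (cf. row 2 of sheet 4.6)
exact_db = 1.0 - np.exp(c * x)       # approximately 0.17084 at x = 2
exact_dc = -b * x * np.exp(c * x)    # approximately -10.5934 at x = 2
print(dy_da, dy_db, dy_dc)
```

With δ = 10⁻⁶ the finite-difference values agree with the analytic derivatives to several significant figures, which is ample for constructing D.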

4.3 Spreadsheet for the determination of standard errors in parameter estimates

In an effort to clarify the process of estimating standard errors, we describe a step-by-step approach using an Excel spreadsheet32.

To find good approximations to the derivatives ∂y₁/∂a, ∂y₂/∂a etc., it is necessary to perturb a slightly (say to 1.000001×a) while leaving the parameter estimates b and c at their optimum values. Sheet 4.5 shows the optimum values, as obtained by Solver, for a, b and c in cells G20 to G22. Cell H20 contains the value 1.000001×a, cell I21 contains the value 1.000001×b and cell J22 contains the value 1.000001×c.

Sheet 4.5: Modification of best estimates of parameters.

     F   G            H              I              J
19       from Solver  b,c constant   a,c constant   a,b constant
20   a   24.98118     24.98120574    24.98118076    24.98118
21   b   6.387988     6.387988103    6.387994491    6.387988
22   c   -0.093668    -0.093668158   -0.093668158   -0.09367

We use the modified parameter estimates to calculate the numerator in equation 4.10. The denominator in equation 4.10 may be determined by entering the formula =$H$20-$G$20 into a cell on the spreadsheet.

The partial derivative ∂y₁/∂a |b,c,x1 is calculated by entering the formula =(H2-C2)/($H$20-$G$20) into cell L2 of sheet 4.633. By using Fill→Down, the formula may be copied into the cells in the L column so that the partial derivative is calculated for every xi. To obtain ∂yi/∂b and ∂yi/∂c, this process is repeated for columns M and N respectively of sheet 4.6. The contents of cells L2 to N13 become the elements of the D matrix given by equation 4.9.

32 It is possible to combine these steps into a macro or Visual Basic program (see Walkenbach, 2001). 33 The values in the C column of the spreadsheet are shown in sheet 4.4.


Sheet 4.6: Calculation of partial derivatives (column K is empty).

     H              I              J              L       M        N
1    ŷ with b,c     ŷ with a,c     ŷ with a,b     dy/da   dy/db    dy/dc
     constant       constant       constant
2    26.07249879    26.0724749     26.07247       1       0.17084  -10.5934
3    26.9773606     26.97733762    26.97734       1       0.31249  -17.5673
4    27.72764019    27.72761795    27.72762       1       0.42994  -21.8493
5    28.34974564    28.34972402    28.34972       1       0.52732  -24.1556
6    28.86557359    28.86555249    28.86555       1       0.60807  -25.0362
7    29.29327999    29.29325932    29.29326       1       0.67503  -24.911
8    29.64791909    29.64789877    29.6479        1       0.73055  -24.0978
9    29.94197336    29.94195334    29.94195       1       0.77658  -22.8355
10   30.18579282    30.18577304    30.18577       1       0.81475  -21.3012
11   30.38795933    30.38793976    30.38794       1       0.84639  -19.6247
12   30.5555887     30.55556929    30.55557       1       0.87264  -17.8993
13   30.69458107    30.69456181    30.69456       1       0.89439  -16.1907

Excel’s TRANSPOSE() function is used to transpose the D matrix. We proceed as follows:

• Highlight cells B24 to N26.
• In cell B24 type =TRANSPOSE(L2:N13).
• Press Ctrl+Shift+Enter to transpose the contents of cells L2 to N13 into cells B24 to N26.
• Multiply Dᵀ by D (using the MMULT matrix function in Excel) to give E, i.e.,

E = DᵀD =

| 12        7.65898    -246.062  |
| 7.65898   5.49356    -160.8741 |   (4.13)
| -246.062  -160.8741  5252.64   |

The MINVERSE() function in Excel is used to find the inverse of E, i.e.,

E⁻¹ =

| 2.239517   -0.48539   0.09005   |
| -0.48539   1.870672   0.03456   |   (4.14)
| 0.09005    0.03456    0.0054669 |

Two more steps are required to calculate the standard errors in the parameter estimates. The first is to calculate the square root of each diagonal element of the matrix E⁻¹. The second is to calculate σ using equation 4.7. Using the sum of squares of residuals appearing in cell D14 of sheet 4.4, we obtain,

σ ≈ [ (1/(12 − 3)) × 0.2926 ]^(1/2) = 0.1803

It follows that,


σa = σ [(E⁻¹)₁₁]^(1/2) = 0.1803 × (2.240)^(1/2) = 0.270   (4.15)

σb = σ [(E⁻¹)₂₂]^(1/2) = 0.1803 × (1.871)^(1/2) = 0.247   (4.16)

σc = σ [(E⁻¹)₃₃]^(1/2) = 0.1803 × (0.005467)^(1/2) = 0.0133   (4.17)
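The whole procedure, from building D by finite differences (equation 4.10) through E = DᵀD to equations 4.4-4.7, can be reproduced outside the spreadsheet. The following Python sketch (NumPy standing in for TRANSPOSE(), MMULT() and MINVERSE()) recovers the standard errors quoted above:

```python
import numpy as np

# Data from table 4.1
x = np.arange(2, 25, 2, dtype=float)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9,
              30.1, 30.4, 30.6, 30.7])

def y_hat(p, x):
    a, b, c = p
    return a + b * (1.0 - np.exp(c * x))

p = np.array([24.98118, 6.387988, -0.093668])  # best estimates (sheet 4.4)
delta = 1e-6

# Build the D matrix of equation 4.9 by finite differences (equation 4.10)
D = np.empty((len(x), 3))
for j in range(3):
    p_pert = p.copy()
    p_pert[j] *= 1.0 + delta
    D[:, j] = (y_hat(p_pert, x) - y_hat(p, x)) / (delta * p[j])

E = D.T @ D                     # equation 4.8
E_inv = np.linalg.inv(E)

# sigma from equation 4.7, with n - 3 degrees of freedom
n = len(x)
ssr = np.sum((y - y_hat(p, x)) ** 2)
sigma = np.sqrt(ssr / (n - 3))

std_errors = sigma * np.sqrt(np.diag(E_inv))   # equations 4.4-4.6
print(std_errors)   # approximately [0.270, 0.247, 0.0133]
```

The printed values match those of equations 4.15-4.17, confirming the spreadsheet arithmetic.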

4.4 Confidence intervals for parameter estimates

We use parameter estimates and their respective standard errors to quote a confidence interval for each parameter34. For the parameters appearing in equation 4.1,

Ts = a ± tX%,ν σa   (4.18)
k = b ± tX%,ν σb   (4.19)
α = c ± tX%,ν σc   (4.20)

tX%,ν is the critical value of the t distribution for the X% confidence level with ν degrees of freedom. t values are routinely tabulated in statistical texts. In this example ν = n − 3, where n is the number of data points. In table 4.1 there are 12 points, so that ν = 9. If we choose a confidence level of 95% (the commonly chosen level), t95%,9 = 2.262. Restoring the units of measurement and quoting 95% confidence intervals gives,

Ts = (24.98 ± 2.262 × 0.270) °C = (24.98 ± 0.61) °C
k = (6.388 ± 2.262 × 0.247) °C = (6.39 ± 0.56) °C
α = (-0.09367 ± 2.262 × 0.0133) min⁻¹ = (-0.094 ± 0.030) min⁻¹

Exercise 3

The amount of heat entering an enclosure through a window may be reduced by applying a reflective coating to the window. An experiment is performed to establish the effect of a reflective coating on the rise in air temperature within the enclosure. The temperature within the enclosure as a function of time is shown in table 4.2.

Time (minutes)   Temperature (°C)
2                24.9
4                25.3
6                25.4
8                25.8
10               26.0
12               26.3
14               26.4
16               26.6
18               26.5
20               26.8
22               27.0
24               26.9

Table 4.2: Data for exercise 3.

34 See Kirkup (2002), p226.


Fit equation 4.2 to the data in table 4.2. Find a, b and c and their respective standard errors. Note that good starting values for parameter estimates are required if fitting by non-linear least squares is to be successful. [a = 24.503 °C, σa = 0.128 °C, b = 3.0613 °C, σb = 0.227 °C, c = -0.0682 min⁻¹, σc = 0.0147 min⁻¹]


Section 5: More on fitting using non-linear least squares

There are several challenges to face when fitting equations to data using non-linear least squares. These can be summarised as,

1) Choosing an appropriate model to describe the relationship between x and y. 2) Avoiding local minima in SSR. 3) Establishing good starting values prior to fitting by non-linear least squares.

We consider 2) and 3) in this section. Model identification is considered in section 10.

5.1 Local Minima in SSR

When data are noisy, or starting values are far from the best estimates, a non-linear least squares fitting routine can become 'trapped' in a local minimum. To illustrate this situation, we draw on the analysis of data appearing in section 4.1. Equation 4.2 is fitted to the data in table 4.1 using the starting values given in sheet 4.2 and the best estimates a, b and c are obtained for the parameters. For clarity, the relationship between only one parameter estimate (c) and SSR is considered. Solver finds a minimum in SSR when c is about -0.53 and terminates the fitting procedure. The variation of SSR with c is shown in figure 5.1. The minimum in SSR in figure 5.1 is referred to as a local minimum, as there is another combination of parameter estimates that will give a lower value for SSR. The lowest value of SSR obtainable corresponds to a global minimum. It is the global minimum that we would like to identify in all least squares problems.


Figure 5.1: Variation of SSR with c when a local minimum has been found when equation 4.2 is fitted to data.

When starting values are used that are closer to the final values35, the non-linear fitting routine finds parameter estimates that produce a lower final value for SSR. Figure 5.2 shows the variation of SSR with c in the interval (-0.18 < c < -0.04).

35 See section 4.1.


Figure 5.2: Variation of SSR with c when a global minimum has been found when equation 4.2 is fitted to data.

A number of indicators can assist in identifying a local minimum, though there is no 'fool-proof' way of deciding whether a local or global minimum has been discovered. A good starting point is to plot the raw data along with the fitted line (as illustrated in figure 4.4). A poor fit of the line to the data could indicate,

• A local minimum has been found. • An inappropriate model has been fitted to the data.

When a local minimum in SSR is found, the standard errors in the parameter estimates tend to be large. As an example, the best estimates appearing in sheet 4.3 (resulting from being trapped in a local minimum), their respective standard errors and the magnitude of the ratio of these quantities (expressed as a percentage) are,

a = 15.25, σa = 9.27, so that |σa/a| × 100% = 61%
b = 14.43, σb = 9.17, so that |σb/b| × 100% = 64%
c = -0.5371, σc = 0.284, so that |σc/c| × 100% = 53%

When the global minimum in SSR is found (see sheet 4.4), the best estimates of the parameters, standard errors etc. are,

a = 24.98, σa = 0.270, so that |σa/a| × 100% = 1.1%
b = 6.388, σb = 0.247, so that |σb/b| × 100% = 3.9%
c = -0.09367, σc = 0.0133, so that |σc/c| × 100% = 14%

There is merit in fitting the same equation to data several times, each time using different starting values for the parameter estimates. If, after fitting, there is consistency between the final values obtained for the best estimates, then it is likely that the global minimum has been identified.
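The multi-start strategy is easy to automate. In the Python sketch below, a basic Levenberg-Marquardt iteration (written out with NumPy as an illustrative stand-in for Solver's own algorithm; the starting points are arbitrary choices) fits equation 4.2 from several starting values and keeps the result with the lowest SSR:

```python
import numpy as np

x = np.arange(2, 25, 2, dtype=float)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9,
              30.1, 30.4, 30.6, 30.7])

def residuals(p):
    a, b, c = p
    return y - (a + b * (1.0 - np.exp(c * x)))

def jacobian(p):
    a, b, c = p
    ecx = np.exp(c * x)
    # Derivatives of the model with respect to a, b and c
    return np.column_stack([np.ones_like(x), 1.0 - ecx, -b * x * ecx])

def fit(p0, n_iter=200):
    """Basic Levenberg-Marquardt: accept a step only if SSR decreases."""
    p, lam = np.array(p0, float), 1e-3
    for _ in range(n_iter):
        r, J = residuals(p), jacobian(p)
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), J.T @ r)
        p_try = p + step
        if np.sum(residuals(p_try) ** 2) < np.sum(r ** 2):
            p, lam = p_try, lam * 0.5   # accept, trust the model more
        else:
            lam *= 10.0                 # reject, take more cautious steps
    return p, np.sum(residuals(p) ** 2)

starts = [(1.0, 1.0, -1.0), (20.0, 10.0, -0.5), (25.5, 5.5, -0.12)]
fits = [fit(p0) for p0 in starts]
p_best, ssr_best = min(fits, key=lambda f: f[1])
print(p_best, ssr_best)
```

Comparing the final SSR values across starting points, rather than trusting a single run, is exactly the consistency check recommended above.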



5.2 Starting values

There are no general rules that may be applied in order to determine good starting values36 for parameter estimates prior to fitting by non-linear least squares. It is correct, but sometimes unhelpful, to remark that familiarity with the relationship being studied can assist greatly in deciding what might be reasonable starting values for parameter estimates. A useful approach to determining starting values is to begin by plotting the experimental data. Consider the data in figure 5.3, which has a smooth line drawn through the points 'by eye'.

[Figure 5.3 plots y (°C) against x (minutes), with a smooth line through the data intercepting the y axis at y ≈ 25.5 and approaching y ≈ 31 at large x.]

Figure 5.3: Line drawn 'by eye' through the data given in table 4.1.

If the relationship between x and y is given by equation 5.1, then we are able to estimate a and b by considering the data in figure 5.3 and a 'rough' line drawn through the data.

y = a + b[1 − exp(cx)]   (5.1)

Equation 5.1 predicts that y = a when x is equal to zero. From figure 5.3 we see that when x = 0, y ≈ 25.5 °C, so that a ≈ 25.5 °C. When x is large (and assuming c is negative), then y = a + b. Inspection of the graph in figure 5.3 indicates that when x is large, y ≈ 31.0 °C, i.e. a + b ≈ 31.0 °C. It follows that b ≈ 5.5 °C. If we write the starting values for a and b as a0 and b0 respectively, then a0 = 25.5 °C and b0 = 5.5 °C. In order to determine a starting value for c, c0, equation 5.1 is rearranged into the form37,

36 Sometimes referred to as initial estimates. 37 Starting values, a0 and b0, are substituted into the equation.



ln[1 − (y − a0)/b0] = c0 x   (5.2)

Equation 5.2 has the form of an equation of a straight line passing through the origin (i.e. y = bx). It follows that plotting ln[1 − (y − a0)/b0] versus x should give a straight line with slope c0.

[Figure 5.4 plots ln[1 − (y − a0)/b0] against x; the Trendline fit shown on the graph is y = -0.1243x + 0.2461.]

Figure 5.4: Line of best fit used to determine starting value for c.

Figure 5.4 shows a plot of ln[1 − (y − a0)/b0] versus x. The line of best fit and the equation of the line have been added using the Trendline option in Excel38,39. The slope of the line is approximately -0.12. The starting values may now be stated for this example, i.e.,

a0 = 25.5 °C, b0 = 5.5 °C, c0 = -0.12 min⁻¹
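The linearisation that produced figure 5.4 amounts to an ordinary straight-line fit, so the Trendline slope can be reproduced in a few lines. A Python sketch (data from table 4.1, with a0 and b0 read from the graph as in the text):

```python
import numpy as np

# Data from table 4.1
x = np.arange(2, 25, 2, dtype=float)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9,
              30.1, 30.4, 30.6, 30.7])

a0, b0 = 25.5, 5.5                   # read from the graph (figure 5.3)
z = np.log(1.0 - (y - a0) / b0)      # left-hand side of equation 5.2

c0, intercept = np.polyfit(x, z, 1)  # straight-line fit; the slope is c0
print(c0)   # approximately -0.124
```

The slope agrees with the Trendline value of -0.1243 and serves as the starting value c0.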

These starting values were used in the successful fit of equation 4.2 to the data given in table 4.1 (the output of the fit is shown in sheet 4.4).

5.3 Starting values by curve stripping

Establishing starting values in some situations is quite difficult and may require a significant amount of pre-processing of the data. For example, the fitting to data of an equation consisting of a sum of exponential terms, such as,

y = a exp(bx) + c exp(dx)   (5.3)

38 For details of Trendline see page 222 in Kirkup (2002). 39 An equation of the form y = a + bx was fitted to the data using Trendline in Excel. Alternatively, we could have fitted y = bx to the data. Either approach would have given an acceptable starting value for c0.


or40,

y = a exp(bx) + c exp(dx) + e exp(fx)   (5.4)

is particularly challenging, especially when data are noisy and/or the ratio of the parameters within the exponentials is less than approximately 3 (e.g. when the ratio b/d in equation 5.3 is less than 3)41. Fitting of equations such as equation 5.3 and equation 5.4 is quite common; for example, the kinetics of drug transport through the human body is routinely modelled using 'compartmental analysis'. Compartmental analysis attempts to predict concentrations of drugs as a function of time (e.g. in blood or urine). The relationship between concentration and time is often well represented by a sum of exponential terms. In analytical chemistry, excited state lifetime measurements offer a means of identifying components in a mixture. The decay of phosphorescence with time that occurs after illumination of the mixture may be captured. The decay can be represented by a sum of exponential terms. Fitting a sum of exponentials by non-linear least squares allows each component in the mixture to be discriminated42.

If an equation to be fitted to data consists of a sum of exponential terms, good starting values for parameter estimates are extremely important if local minima in SSR are to be avoided. It is also possible that, if starting values for the parameter estimates are too far from the optimum values, SSR will increase during the iterative process to such an extent that it exceeds the maximum floating point number that a spreadsheet (or other program) can handle. In this situation, fitting is terminated and an error message is returned by the spreadsheet.

Data in figure 5.5 have been gathered in an experiment in which the decay of photo-generated current in the wide band gap semiconductor cadmium sulphide (CdS) is measured as a function of time after photo-excitation of the semiconductor has ceased. There appears to be an exponential decay of the photocurrent with time. Theory indicates43 that there may be more than one decay mechanism for photoconductivity. That, in turn, suggests that an equation of the form given by equation 5.3 or equation 5.4 is appropriate.

40 Here we assume b >> d >> f. 41 See Kirkup and Sutherland (1988). 42 See Demas (1983). 43 See Bube (1960), chapter 6.


[Figure 5.5 plots photocurrent (arbitrary units, 0 to 100) against time (0 to 100 ms).]

Figure 5.5: Photocurrent versus time data for cadmium sulphide.

If equation 5.3 is to be fitted to data, how are starting values for parameter estimates established? If b is large (and negative) then the contribution of the first term in equation 5.3 to y is small when x exceeds some value, which we will designate x′. Equation 5.3 can now be written, for x > x′,

y ≈ c exp(dx)   (5.5)

Equation 5.5 can be linearised by taking natural logarithms of both sides of the equation. The next step is to fit a straight line to the transformed data to find approximate values for c and d, which we will designate c0 and d0 respectively. Now we revisit equation 5.3 and write, for x < x′,

y − c0 exp(d0 x) = a exp(bx)   (5.6)

Transforming equation 5.6 by taking natural logarithms of both sides of the equation, then fitting a straight line to the transformed data, will yield approximate values for a and b which can serve as starting values in a non-linear fit. For a more detailed discussion of how to determine starting values when an equation to be fitted consists of a sum of exponential terms, see Kirkup and Sutherland (1988).

5.4 Effect of instrument resolution and noise on best estimates

Errors in the dependent variable lead to uncertainties in parameter estimates44. If errors are very large, it may not be possible to establish reasonable parameter estimates. To illustrate the effect of errors on fitting, we consider the outcome of Monte Carlo simulations in which errors, in the form of normally distributed noise, are added to 'noise free' data45. After noise is added, non-linear least squares is performed to find best estimates of the parameters.

44 In the case of a ‘model violation’ (such that the equation fitted to the data is not appropriate) there would be non-zero residuals even if the data were error free. Such non-zero residuals would translate to uncertainties in parameter estimates. 45 Monte Carlo simulations are dealt with in section 11.
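The curve-stripping procedure of section 5.3 can be sketched numerically. The example below uses synthetic, noise-free data generated from equation 5.3; the parameter values and the split point x′ = 40 are illustrative assumptions, not values from the photocurrent experiment:

```python
import numpy as np

# Synthetic, noise-free two-exponential data, y = a*exp(b*x) + c*exp(d*x),
# with |b| >> |d| (all values are illustrative assumptions)
a_true, b_true, c_true, d_true = 80.0, -0.2, 20.0, -0.02
x = np.arange(0.0, 101.0, 2.0)
y = a_true * np.exp(b_true * x) + c_true * np.exp(d_true * x)

# Step 1: for x > x' (here x' = 40) the fast term is negligible, so
# equation 5.5 applies and ln(y) is approximately linear in x
tail = x > 40.0
d0, ln_c0 = np.polyfit(x[tail], np.log(y[tail]), 1)
c0 = np.exp(ln_c0)

# Step 2: 'strip' the slow component and fit the remainder (equation 5.6),
# restricting to small x where the fast term still dominates
head = x <= 20.0
y_stripped = y[head] - c0 * np.exp(d0 * x[head])
b0, ln_a0 = np.polyfit(x[head], np.log(y_stripped), 1)
a0 = np.exp(ln_a0)
print(a0, b0, c0, d0)
```

The recovered a0, b0, c0 and d0 lie close to the true values and would make good starting values for a full non-linear fit; with noisy data the estimates degrade, which is exactly why curve stripping is treated here only as a source of starting values.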


To study the effect of noise on parameter estimates, data are generated for an 'experiment' in which the temperature of water in a vessel is monitored as it cools in a laboratory. The equation relating temperature, T, to time, t, is written,

T = T∞ + (Ts − T∞) exp(−kt)   (5.7)

where T∞ is the temperature at infinite time (which is equal to room temperature), Ts is the starting temperature, and k is the rate constant for cooling. We choose (arbitrarily),

T∞ = 26 °C, Ts = 62 °C, k = 0.034 min⁻¹

Noise free data generated at 5 minute intervals between t = 0 and t = 55 minutes using equation 5.7 are shown in figure 5.6.
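The noise-free data of figure 5.6 follow directly from equation 5.7 and the chosen parameter values; a minimal sketch:

```python
import numpy as np

T_inf, T_s, k = 26.0, 62.0, 0.034     # the chosen 'true' values
t = np.arange(0.0, 56.0, 5.0)         # 5 minute intervals, t = 0 to 55 min

# Equation 5.7: T = T_inf + (T_s - T_inf) * exp(-k t)
T = T_inf + (T_s - T_inf) * np.exp(-k * t)
print(T[0], T[-1])   # 62.0 at t = 0, approximately 31.55 at t = 55
```

These twelve values are the 'noise free' column to which simulated noise is added in section 5.4.1.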


Figure 5.6: Noise free data of temperature versus time generated using equation 5.7.

Writing equation 5.7 using our usual convention for variables and parameter estimates gives,

y = a + (b − a) exp(−cx)   (5.8)

The next step is to fit equation 5.8 to the data, using the starting values a0 = 25, b0 = 60, c0 = 0.02. The fitting options46 are selected using the Solver Options dialog box as shown in figure 5.7:

46 Fitting options are discussed in section 9.1.


Figure 5.7: Solver Options used to fit equation 5.8 to the data in figure 5.6.

Using Solver, the following values were recovered for the best estimates of parameters and standard errors in the best estimates.

a            σa         b            σb         c          σc         SSR
25.99993547  3.66×10⁻⁵  61.99997243  1.45×10⁻⁵  0.0339998  7.65×10⁻⁸  2.73×10⁻⁹

Table 5.1: Best estimates of parameters and standard errors in parameters.

5.4.1 Adding normally distributed noise to data using Excel's Random Number Generator

To investigate the effect of errors on the fitting of equations to data, normally distributed noise47 of constant standard deviation (i.e. homoscedastic data) is added to noise free data48. Normally distributed noise can be added to the data by using the Random Number Generation tool in the Analysis ToolPak. The mean and standard deviation of the random numbers are controlled using the dialog box shown in figure 5.8. When adding noise, it is usual to select the mean to be zero. The standard deviation can have any value (the larger the value, the greater the 'noise'). For this example it is convenient to leave the standard deviation at its default value of one.

47 Also referred to as Gaussian noise. 48 Heteroscedastic noise can also be added to data with the aid of Excel’s Random Number Generation tool (see section 11.3).


Figure 5.8: Normally distributed noise with zero mean and standard deviation of one. Noise is generated using the Random Number Generation tool in Excel's Analysis ToolPak.

The 'experimental' data in column D (i.e. the data with noise added) are obtained by adding the values in column B to those in column C. Figure 5.8 shows the formula entered into cell D2. The next step is to use Fill→Down to enter the formula into cells D3 to D13. A plot of the y values with noise added (as given in column D of figure 5.8) versus x is shown in figure 5.9.
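The same noisy 'experimental' data can be produced outside Excel with a pseudo-random normal generator; a Python sketch (the seed is an arbitrary choice, made only for repeatability):

```python
import numpy as np

T_inf, T_s, k = 26.0, 62.0, 0.034
t = np.arange(0.0, 56.0, 5.0)
T_clean = T_inf + (T_s - T_inf) * np.exp(-k * t)     # equation 5.7

rng = np.random.default_rng(0)                       # arbitrary seed
noise = rng.normal(loc=0.0, scale=1.0, size=t.size)  # mean 0, std dev 1
T_exp = T_clean + noise                              # 'experimental' data

print(T_exp - T_clean)   # the added noise (column C of figure 5.8)
```

Increasing the `scale` argument corresponds to choosing a larger standard deviation in the Random Number Generation dialog box, i.e. noisier data.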


Figure 5.9: Data in figure 5.6 with addition of normally distributed noise.

5.4.2 Fitting an equation to noisy data

To show the effect errors have on the fitting of equation 5.8 to data, parameter estimates (and standard errors) are compared when temperature data,

• are noise free • are rounded to the nearest 0.1 °C, but no noise is added • have noise of standard deviation 0.2 °C added • have noise of standard deviation 1 °C added • have noise of standard deviation 5 °C added.

     A        B            C           D
1    x (min)  y (°C)       Noise (°C)  yexp (°C)
2    0        62           -0.30023    =B2+C2
3    5        56.3719334   -1.27768
4    10       51.62373162  0.244257
5    15       47.61784084  1.276474
6    20       44.23821173  1.19835
7    25       41.38693755  1.733133
8    30       38.98141785  -2.18359
9    35       36.95196551  -0.23418
10   40       35.23978797  1.095023
11   45       33.79528402  -1.0867
12   50       32.57660687  -0.6902
13   55       31.54845183  -1.69043


Temperature data over the period x = 0 to x = 55 minutes were generated in the manner described in section 5.4.1. The starting values for all fits were a0 = 25, b0 = 60, c0 = 0.02. Solver Options were as given in figure 5.7.

Noise     a        σa         b         σb         c          σc         SSR
None      25.9999  3.66×10⁻⁵  61.99997  1.45×10⁻⁵  0.0339998  7.65×10⁻⁸  2.73×10⁻⁹
RD:0.1⁴⁹  25.9870  0.0712     61.9995   0.0282     0.0339850  0.000149   0.01031
0.2       26.4735  0.378      61.9991   0.154      0.0345083  0.000821   0.3070
1.0       19.2820  4.54       61.1136   0.959      0.0245648  0.00487    12.63
5.0       25.4289  6.42       70.1401   4.82       0.0484832  0.0197     278.2

Table 5.1: Best estimates of parameters and standard errors in estimates.

As anticipated, the standard errors in the estimates increase as the noise increases. In order to indicate to what extent the estimates a, b and c differ from the true values, T∞ = 26 °C, Ts = 62 °C and k = 0.034 min⁻¹ respectively, percentage differences are presented in table 5.2.

Noise   a        |a − T∞|×100%/T∞   b        |b − Ts|×100%/Ts   c          |c − k|×100%/k
None    25.9999  0.000385           62.0000  4.84×10⁻⁵          0.0339998  0.000588
RD:0.1  25.9870  0.0500             61.9995  0.000806           0.0339850  0.0441
0.2     26.4735  1.82               61.9991  0.00145            0.0345083  1.50
1.0     19.2820  25.8               61.1136  1.43               0.0245648  27.8
5.0     25.4289  2.20               70.1401  13.1               0.0484832  42.6

Table 5.2: (Absolute) percentage difference between parameter estimates and true values.

Note that, on the whole, the percentage difference between the parameter estimates and the true values, as given in table 5.2, increases as the noise increases. However, examination of table 5.2 reveals that for noise of standard deviation 5, the estimate of T∞ is within ≈ 2% of the true value. This should be expected: as the added noise is random, there is a possibility that 'by chance' a good estimate for some parameter will be obtained even when the noise is quite large. However, if we were to repeat the simulation many times we would find that, on average, the percentage difference between the true values of the parameters and the parameter estimates would increase as the noise level increased.

5.4.3 Relationship between sampling density and parameter estimates

When repeat measurements are made of a single quantity (such as the time taken for a ball to free fall through a fixed distance) the standard error in the mean, σx̄, of the data is related to the standard deviation, σ, by50,

σx̄ = σ/√n   (5.9)

49 Denotes temperature values rounded to 0.1 °C. 50 See Kirkup (2002), ch1.


Equation 5.9 indicates that σ_x̄ reduces as 1/√n, i.e. if more measurements are made, we profit by a reduction in the standard error of the mean. It is anticipated that, in analysis by least squares, there is a similar reduction in the standard error of the parameter estimates as the number of measurements increases^51. To establish this, consider the analysis of data generated using equation 5.7 (with parameters T∞ = 26 °C, Ts = 62 °C, k = 0.034 min⁻¹), to which noise of unit standard deviation has been added, for data 'gathered' in the range x = 0 min to x = 60 min. Data are generated at evenly spaced intervals of time. The number of values was chosen to be n = 9, 16, 25, 33, 41, 49, 61, 91, 121. a, b and c and their respective standard errors were determined using (unweighted) non-linear least squares.
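The 1/√n behaviour of equation 5.9 is easy to verify directly. The following sketch (hypothetical code, not part of the original simulations) draws repeated samples of size n and measures the spread of the sample mean:

```python
# Monte Carlo check of equation 5.9: the spread of the mean of n repeat
# measurements shrinks as sigma/sqrt(n).  (Illustrative sketch only.)
import random
from statistics import mean, stdev

random.seed(1)
sigma = 1.0
spread = {}
for n in (10, 40, 160):
    # standard deviation of the sample mean over 2000 simulated 'experiments'
    means = [mean(random.gauss(0.0, sigma) for _ in range(n)) for _ in range(2000)]
    spread[n] = stdev(means)
    print(n, spread[n], sigma / n ** 0.5)
```

Each printed pair should agree closely: quadrupling n halves the standard error of the mean.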

Equation 5.8 was fitted to the data in order to establish best estimates and standard errors in the best estimates. Squaring the standard errors gives the variances, σ_a², σ_b² and σ_c², in the parameter estimates.

If an equation of the form given in equation 5.9 is valid for the standard errors in the parameter estimates, then plotting σ_a², σ_b² and σ_c² versus 1/n should produce a straight line. Figure 5.10 shows such plots.

[Figure 5.10a: variance of parameter estimate a versus 1/n. Best-fit line: y = 22.942x + 0.826, R² = 0.2831; one point is circled as an outlier.]

[Figure 5.10b: variance of parameter estimate b versus 1/n. Best-fit line: y = 5.8982x + 0.1142, R² = 0.892.]

51 Frenkel (2002) discusses the relationship between the standard errors in the parameter estimates and the number of data, n.

Page 41: Kirkup - Principles and Applications of Non-Linear Least Squares - An Introduction for Physical Scientists Using Excel Solver (2003)

41

[Figure 5.10c: variance of parameter estimate c versus 1/n. Best-fit line: y = 0.0001x + 2 × 10⁻⁶, R² = 0.9164.]

Figure 5.10: Variance of parameter estimates as a function of the number of data. Each graph shows the equation of the best straight line fitted to the points and the coefficient of determination, R². With the exception of the circled data point in figure 5.10a, the points in figures 5.10a to 5.10c appear to follow a linear relationship, indicating that the variance of the parameter estimates does decrease (at least approximately) as 1/n.


Section 6: Linear least squares meets non-linear least squares

It is possible to use the technique of non-linear least squares to fit linear equations to data. In such circumstances we expect the same values to emerge for the best estimates of the parameters and the standard errors in the estimates, irrespective of whether fitting is carried out by linear or non-linear least squares. To illustrate this, we consider an example in which the van Deemter equation is fitted to the gas chromatography data in table 6.1^52.

v (ml/min)   H (mm)
3.4          9.59
7.1          5.29
16.1         3.63
20.0         3.42
23.1         3.46
34.4         3.06
40.0         3.25
44.7         3.31
65.9         3.50
78.9         3.86
96.8         4.24
115.4        4.62
120.0        4.67

Table 6.1: Plate height versus flow rate data.

The relationship between plate height, H, and flow rate, v, can be written^53

H = A + B/v + Cv    (6.1)

where A, B and C are constants. Consistent with our convention of naming variables and parameter estimates, we rewrite equation 6.1 as

y = a + b/x + cx    (6.2)

a, b and c are estimates of the constants A, B and C respectively in equation 6.1. Equation 6.2 may be fitted to the data in table 6.1 using linear least squares. A convenient way to accomplish this is to use the Regression tool in the Analysis ToolPak in Excel^54. Figure 6.1 shows an Excel spreadsheet containing the data and the output of the Regression tool. To perform (linear) least squares with this tool, we place values of 1/x and x in adjacent columns (these appear in columns B and C of figure 6.1).

52See Moody H W (1982). 53See Snyder et al. (1997), p46. 54See Kirkup (2002) p 373.


Figure 6.1: Fitting equation 6.2 to data using the Regression tool in Excel’s Analysis ToolPak. Equation 6.2 is now fitted to the data in table 6.1 using Solver to perform non-linear least squares. The approach adopted for determining the best estimates and the standard errors in the best estimates is as described in sections 4, 4.1 and 4.2. As anticipated, both linear least squares and non-linear least squares return the same best estimates for the parameters and standard errors in the best estimates, as can be seen by inspection of figures 6.1 and 6.2.
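The same linear fit can be reproduced outside the spreadsheet. The sketch below (hypothetical code, not the worksheet of figure 6.1) forms the normal equations of linear least squares for the basis functions 1, 1/x and x and solves the resulting 3 × 3 system:

```python
# Fitting equation 6.2, y = a + b/x + c*x, to the table 6.1 data by solving
# the normal equations (X^T X) beta = X^T y.  (Illustrative sketch only.)
v = [3.4, 7.1, 16.1, 20.0, 23.1, 34.4, 40.0, 44.7, 65.9, 78.9, 96.8, 115.4, 120.0]
H = [9.59, 5.29, 3.63, 3.42, 3.46, 3.06, 3.25, 3.31, 3.50, 3.86, 4.24, 4.62, 4.67]

# design matrix with basis functions 1, 1/x and x
X = [[1.0, 1.0 / vi, vi] for vi in v]

def solve3(M, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rv] for row, rv in zip(M, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= f * A[col][k]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (A[r][3] - sum(A[r][k] * beta[k] for k in range(r + 1, 3))) / A[r][r]
    return beta

# normal equations: (X^T X) beta = X^T y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, H)) for i in range(3)]
a, b, c = solve3(XtX, Xty)
print(a, b, c)
```

The estimates returned should match those produced by the Regression tool and by Solver, since for a model linear in its parameters the least-squares minimum is unique.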



[Annotations in figure 6.2: best estimates of parameters; standard errors in estimates; standard deviation of y values; D matrix; Dᵀ matrix; E = DᵀD; inverse of E matrix; sum of squares of residuals.]

Figure 6.2: Spreadsheet for fitting equation 6.2 to data in table 6.1 using non-linear least squares.


Exercise 4

The Knox equation is widely used to represent the relationship between the plate height, H, and the velocity, v, of the mobile phase of a liquid chromatograph (LC)^55. The relationship may be written

H = Av^(1/3) + B/v + Cv    (6.3)

where A, B and C are constants. Table 6.2 shows LC data of plate height versus flow velocity published by Katz et al. (1983)^56.

H (cm)      v (cm/s)
0.004788    0.03027
0.003704    0.04527
0.003116    0.06507
0.002526    0.10023
0.002292    0.1306
0.002176    0.1653
0.002246    0.2488
0.002360    0.3185
0.002678    0.4792
0.002856    0.6028

Table 6.2: Data from Katz et al. (1983).

Use either linear or non-linear least squares to fit equation 6.3 to the data in table 6.2 and thereby obtain best estimates of A, B and C and the standard errors in the estimates. [0.002509 cm^(2/3)·s^(1/3), 0.0001232 cm²/s, 0.0008720 s, 0.000185 cm^(2/3)·s^(1/3), 3.12 × 10⁻⁶ cm²/s, 0.000326 s]

55See Kennedy and Knox, (1972). 56 The data were obtained with a benzyl acetate solute and a mobile phase of 4.48% (w/v) ethyl acetate in n – pentane.


Section 7: Weighted non-linear least squares

There are some occasions where the standard deviation of the errors in the y values is not constant (i.e. the errors exhibit heteroscedasticity). Such a situation may be revealed by plotting the residuals^57, (y − ŷ), versus x. If the errors are heteroscedastic, then weighted fitting is required. The purpose of weighted fitting is to obtain best estimates of the parameters by forcing the line close to the data that are known to high precision, while giving much less weight to those data that exhibit large scatter.

The starting point for weighted fitting using least squares is to define a sum of squares of residuals that takes into account the standard deviation in the y values. We write

χ² = Σ [(y_i − ŷ_i)/σ_i]²    (7.1)

We refer to χ² as the weighted sum of squares of residuals^58. σ_i is the standard deviation in the ith y value. The purpose of weighted fitting is to find best estimates of the parameters that minimise χ² in equation 7.1. If σ_i is constant, as it is in unweighted fitting using least squares, equation 7.1 can be replaced by equation 2.7. In this sense, equation 7.1 can be thought of as the more general formulation of least squares.

7.1 Weighted fitting using Solver

In order to establish best estimates of parameters using Solver when weighted fitting is performed, we use an approach similar to that described in section 4. For weighted fitting, an extra column in the spreadsheet containing the standard deviations σ_i is required. It is possible that the absolute values of σ_i are unknown and that only relative standard deviations are known. For example, equations 7.2 and 7.3 are sometimes used when weighted fitting is required:

σ_i ∝ √y_i    (7.2)

σ_i ∝ y_i    (7.3)

Weighted fitting can be carried out so long as:

• the absolute standard deviations in the y values are known, or
• the relative standard deviations are known.

In order to accomplish weighted non-linear least squares, we proceed as follows:

1) Fit the desired equation to the data by calculating χ² as given by equation 7.1. Use Solver to modify the parameter estimates so that χ² is minimised.

2) Determine the elements of the D matrix, as described in section 4.

57 See section 6.10 of Kirkup (2002).

58 Σ [(y_i − ŷ_i)/σ_i]² follows a chi-squared distribution, hence the use of the symbol χ².


3) Construct the weight matrix, W, in which the diagonal elements of the matrix contain the weights to be applied to the y values.

4) Calculate the weighted standard deviation, σ_w, given by

σ_w = [χ²/(n − p)]^(1/2)    (7.4)

χ² is given by equation 7.1, n is the number of data points and p is the number of parameters in the equation to be fitted to the data.

5) Calculate the standard errors in the parameter estimates, given by^59

σ(B) = σ_w [(DᵀWD)⁻¹]^(1/2)    (7.5)

B is the matrix containing elements equal to the best estimates of the parameters; the square roots are taken of the diagonal elements of (DᵀWD)⁻¹. σ_w is the weighted standard deviation, given by equation 7.4.

6) Calculate the confidence interval for each parameter appearing in the equation at a specified level of confidence (usually 95%).

To illustrate steps 1 to 6, we consider an example of weighted fitting using Solver.

7.2 Example of weighted fitting using Solver

The relationship between the current, I, through a tunnel diode and the voltage, V, across the diode may be written^60

I = AV(B − V)²    (7.6)

A and B are constants to be estimated using least squares. Table 7.1 shows current-voltage data for a germanium tunnel diode.

V (mV)   I (mA)
10       4.94
20       6.67
30       10.57
40       10.11
50       10.44
60       12.90
70       10.87
80       9.73
90       7.03
100      5.61
110      3.80
120      2.36

Table 7.1: Current-voltage data for a germanium tunnel diode.

59 See Neter et al. (1996). 60 See Karlovsky (1962).


Equation 7.6 could be fitted to the data using unweighted non-linear least squares (in the first instance it is usually sensible to use an unweighted fit, as the residuals may show little evidence of heteroscedasticity, in which case there is little point in performing a more complex analysis). In this example we are going to assume that the error in the y quantity is proportional to the size of the y quantity, i.e. that equation 7.3 is valid for these data. The data in table 7.1 are entered into a spreadsheet as shown in sheet 7.1 and are plotted in figure 7.1.

Sheet 7.1: Data from table 7.1 entered into a spreadsheet.

     A        B
1    x (mV)   y (mA)
2    10       4.94
3    20       6.67
4    30       10.57
5    40       10.11
6    50       10.44
7    60       12.90
8    70       10.87
9    80       9.73
10   90       7.03
11   100      5.61
12   110      3.80
13   120      2.36

Figure 7.1: Current-voltage data for a germanium tunnel diode (y (mA) versus x (mV)).


7.2.1 Best estimates of parameters using Solver

Consistent with the symbols used in other analyses in this document, we rewrite equation 7.6 as

y = ax(b − x)²    (7.7)

We can obtain a reasonable value for b, which we will use as a starting value, b₀, by noting that equation 7.7 predicts that y = 0 when x = b. By inspection of figure 7.1 we see that when y = 0, x ≈ 130 mV, so that b₀ = 130 mV. Equation 7.7 is rearranged to give

a = y/[x(b − x)²]    (7.8)

An approximate value for a (which we take to be the starting value, a₀) can be obtained by choosing any data pair from sheet 7.1 (say, x = 50 mV and y = 10.44 mA) and substituting these into equation 7.8 along with b₀ = 130 mV. This gives (to two significant figures) a₀ = 3.3 × 10⁻⁵.

Sheet 7.2 shows the calculated values of current (ŷ), based on equation 7.7, in column C. The parameter estimates are the starting values (3.3 × 10⁻⁵ and 130) in cells D17 and D18. Column D of sheet 7.2 contains the weighted squares of the residuals; their sum appears in cell D14.

Sheet 7.2: Fitted values and weighted sum of squares of residuals before optimisation occurs.

     B        C          D
1    y (mA)   ŷ          [(y − ŷ)/y]²
2    4.94     4.752      0.001448311
3    6.67     7.986      0.038927822
4    10.57    9.9        0.004017905
5    10.11    10.692     0.003313932
6    10.44    10.56      0.000132118
7    12.90    9.702      0.061457869
8    10.87    8.316      0.055205544
9    9.73     6.6        0.103481567
10   7.03     4.752      0.105001811
11   5.61     2.97       0.221453287
12   3.80     1.452      0.381793906
13   2.36     0.396      0.692562482
14   sum                 1.668796555
15
16   solver
17   a        3.30E-05
18   b        130


Running Solver (using the default settings – see section 9.1) gives the output shown in sheet 7.3. Sheet 7.3: Fitted values and weighted sum of squares of residuals after optimisation using Excel’s Solver.

     B        C            D
1    y (mA)   ŷ            [(y − ŷ)/y]²
2    4.94     4.451251736  0.009788509
3    6.67     7.671430611  0.022541876
4    10.57    9.797888132  0.005335934
5    10.11    10.96797581  0.007201911
6    10.44    11.31904514  0.007089594
7    12.90    10.98844764  0.02195801
8    10.87    10.11353481  0.004843048
9    9.73     8.831658167  0.008524277
10   7.03     7.280169211  0.00126636
11   5.61     5.596419449  5.86015E-06
12   3.80     3.917760389  0.000960354
13   2.36     2.381543539  8.33317E-05
14   sum                   0.089599066
15
16   from solver
17   a        2.289E-05
18   b        149.4440503

The weighted standard deviation is calculated using equation 7.4, i.e.

σ_w = [χ²/(n − p)]^(1/2) = [0.08959906/(12 − 2)]^(1/2) = 0.09465678    (7.9)

7.2.2 Determining the D matrix

In order to determine the matrix of partial derivatives, we calculate

∂y_i/∂a|_(b,x_i) ≈ {y[a(1 + δ), b, x_i] − y(a, b, x_i)}/(aδ)    (7.10)

and

∂y_i/∂b|_(a,x_i) ≈ {y[a, b(1 + δ), x_i] − y(a, b, x_i)}/(bδ)    (7.11)

δ is chosen to be 10⁻⁶ (see section 4.2). Sheet 7.4 shows the values of the partial derivatives in the D matrix.
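A quick numerical check of the forward-difference formulas 7.10 and 7.11 (a hypothetical sketch, not part of the spreadsheet) evaluated at the converged estimates of sheet 7.3, for the first data point x = 10 mV:

```python
# Forward-difference approximations to the partial derivatives of
# y = a*x*(b - x)^2 (equation 7.7), with fractional perturbation delta = 1e-6.
delta = 1e-6
a, b = 2.289e-5, 149.4440503   # converged estimates from sheet 7.3

def y_model(a, b, x):
    return a * x * (b - x) ** 2    # equation 7.7

x1 = 10.0
dyda = (y_model(a * (1 + delta), b, x1) - y_model(a, b, x1)) / (a * delta)
dydb = (y_model(a, b * (1 + delta), x1) - y_model(a, b, x1)) / (b * delta)
print(dyda, dydb)   # compare with row 2 of sheet 7.4
```

The printed values agree with the first row of sheet 7.4 to within the rounding of a and b.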


Sheet 7.4: Calculation of partial derivatives used in the D matrix.

     F                G                H             I
1    ŷ (b constant)   ŷ (a constant)   ∂y/∂a         ∂y/∂b
2    4.451256188      4.451261277      194446.4316   0.063842869
3    7.671438283      7.671448325      335115.2430   0.118528971
4    9.79789793       9.797912649      428006.4343   0.164058306
5    10.96798677      10.96800576      479120.0055   0.200430873
6    11.31905646      11.31907916      494455.9567   0.227646674
7    10.98845863      10.98848436      480014.2876   0.245705707
8    10.11354493      10.11357286      441794.9986   0.254607974
9    8.831666999      8.831696179      385798.0894   0.254353473
10   7.280176491      7.280205816      318023.5601   0.244942205
11   5.596425045      5.596453279      244471.4107   0.226374169
12   3.917764307      3.917790076      171141.6412   0.198649367
13   2.381545921      2.381567715      104034.2515   0.161767798

16        b constant    a constant
17   a    2.289E-05     2.289E-05
18   b    149.4440503   149.4441997

7.2.3 The weight matrix, W

The weight matrix is a square matrix with diagonal elements proportional to 1/σ_i² and all other elements equal to zero^61. In this example, σ_i is taken to be equal to y_i, so the diagonal elements are as given in sheet 7.5.

Sheet 7.5: Weight matrix for the tunnel diode analysis (while the weights are shown to only three decimal places, Excel retains all figures for the calculations). W is a 12 × 12 diagonal matrix with

diag(W) = (0.041, 0.022, 0.009, 0.010, 0.009, 0.006, 0.008, 0.011, 0.020, 0.032, 0.069, 0.180)

61 For details on the weight matrix, see Neter et (1996).



7.2.4 Calculation of (DᵀWD)⁻¹

To obtain the standard errors in the estimates a and b, we must determine (DᵀWD)⁻¹. Sheet 7.6 shows the several steps required. The steps consist of:

a) Calculation of the matrix WD. The elements of this matrix are shown in cells C37 to D48. (W is multiplied by D using the MMULT() function in Excel.)

b) Calculation of the matrix DᵀWD. The elements of this matrix are shown in cells G37 to H38.

c) Inversion of the matrix DᵀWD. The elements of the inverted matrix are shown in cells G41 to H42.

Sheet 7.6: Calculation of (DᵀWD)⁻¹.

WD:
7967.94045    0.002616
7532.55853    0.002664
3830.89566    0.001468
4687.5077     0.001961
4536.55955    0.002089
2884.5279     0.001477
3739.05374    0.002155
4075.06361    0.002687
6435.00139    0.004956
7767.87729    0.007193
11851.9142    0.013757
18678.9449    0.029045

DᵀWD:
2.2728E+10    15410.18847
15410.1885    0.013460574

(DᵀWD)⁻¹:
1.9662E-10    -0.0002251
-0.0002251    331.9969496

7.2.5 Bringing it all together

To calculate the standard errors in a and b, the weighted standard deviation (given by equation 7.4) is multiplied by the square root of the corresponding diagonal element of the (DᵀWD)⁻¹ matrix, i.e.

σ_a = σ_w × (1.9662 × 10⁻¹⁰)^(1/2) = 0.09465678 × 1.4022 × 10⁻⁵ = 1.327 × 10⁻⁶

and

σ_b = σ_w × (332.00)^(1/2) = 0.09465678 × 18.221 = 1.725

It follows that the 95% confidence intervals for A and B are

A = a ± t₉₅%,ν σ_a    (7.12)

B = b ± t₉₅%,ν σ_b    (7.13)

t₉₅%,ν is the t value corresponding to the 95% level of confidence and ν is the number of degrees of freedom.


In this example, the number of degrees of freedom is ν = n − p = 12 − 2 = 10. From statistical tables^62, t₉₅%,₁₀ = 2.228. It follows that (inserting units):

A = (2.29 ± 0.30) × 10⁻⁵ mA/(mV)³
B = (149.4 ± 3.8) mV

Exercise 5

Equation 7.6 may be transformed into a form suitable for fitting by linear least squares.

a) Show that equation 7.6 can be rearranged into the form

(I/V)^(1/2) = A^(1/2)B − A^(1/2)V    (7.14)

b) Plot a graph of (I/V)^(1/2) versus V.

c) Use unweighted linear least squares to obtain best estimates of A and B and standard errors in the best estimates^63. [2.30 × 10⁻⁵ mA/(mV)³, 149.8 mV, 2.0 × 10⁻⁶ mA/(mV)³, 1.5 mV]

d) Why is it preferable to use non-linear least squares to estimate the parameters, rather than to linearise equation 7.6 and then use linear least squares to find these estimates?

62 See, for example, Kirkup (2002) page 385. 63 Care must be exercised when calculating the uncertainty in the estimate of B as this requires use of both slope and the intercept and these are correlated. For more information see Kirkup (2002), page 232.


Section 8: Uncertainty propagation, least squares estimates and calibration

Establishing best estimates of parameters in an equation may be the main purpose of fitting an equation to experimental data. For example, in an experiment to study the variation of resistance, R, with time, t, in a photoconductor, the primary purpose of the fitting may be to obtain best estimates for the parameters A₁, A₂, B₁ and B₂ which appear in equation 8.1^64, which represents a possible relationship between R and t:

R = A₁ exp(B₁t) + A₂ exp(B₂t)    (8.1)

There are situations in which parameter estimates are used to calculate other quantities of interest. A common example involves gathering x-y data for the purpose of calibration. Once the best estimates of the parameters in the calibration equation have been determined, the equation is used to find 'values of x' from measured values of y. For example, if the relationship between x and y is

y = a + bx    (8.2)

then for a given (mean) value of y, ȳ, the corresponding value of x, x̂, can be determined. This is done by rearranging equation 8.2 and replacing y by ȳ and x by x̂, so that

x̂ = (ȳ − a)/b    (8.3)

One approach to calculating the standard error in x̂ is to assume that the errors in a, b and ȳ are uncorrelated. In this situation the standard error, σ_x̂, is given by

σ_x̂ = [(∂x̂/∂a)² σ_a² + (∂x̂/∂b)² σ_b² + (∂x̂/∂ȳ)² σ_ȳ²]^(1/2)    (8.4)

Unfortunately, the errors in the parameter estimates a and b are correlated^65, so it is not valid to use equation 8.4. To correctly determine σ_x̂, we must account for that correlation. We begin by determining the covariance matrix, V, given by

V = σ²A⁻¹    (8.5)

where σ² is the variance in the y values and A⁻¹ is the error matrix, as discussed in section 2. σ² is found using

64Equation 8.1 represents a possible relationship between R and t (Kirkup L and Cherry I, 1988). 65See Salter (2002).


σ² = Σ(y_i − ŷ_i)²/(n − 2)    (8.6)

where ŷ_i = a + bx_i, and n is the number of x-y data. In this example, A⁻¹ is the inverse of the matrix A, where

A = | n     Σx_i  |
    | Σx_i  Σx_i² |    (8.7)

There is an economical way to determine the elements of the matrix A, which is especially efficient when using a computer package that allows for matrix multiplication (such as Excel). A is written as

A = XᵀX    (8.8)

Xᵀ is the transpose of the matrix X, where X is given by

X = | 1   x₁  |
    | 1   x₂  |
    | 1   x₃  |
    | …   …   |
    | 1   x_n |    (8.9)

If f is some function of a and b, then^66

σ_f² = d_fᵀ V d_f    (8.10)

where

d_f = | ∂f/∂a |
      | ∂f/∂b |    (8.11)

As V = σ²A⁻¹, equation 8.10 can be rewritten as

σ_f² = σ² d_fᵀ A⁻¹ d_f    (8.12)

8.1: Example of propagation of uncertainties involving parameter estimates

Equation 8.12 is applied to data gathered in an experiment which considers the variation in pressure of a fixed mass and volume of gas as the temperature of the gas changes. The data are given in table 8.1. We will use the data to estimate the value of

66See Salter (2000).


the temperature at which the pressure of the gas is zero (this is termed the 'absolute zero' of temperature).

Table 8.1: Pressure versus temperature data.

θ (°C)   P (kPa)
-20      211
-10      218
0        224
10       238
20       247
30       251
40       259
50       265
60       277
70       288
80       294

Assume that the relation between pressure, P, and temperature, θ, can be written

P = A + Bθ    (8.13)

where A and B are parameters to be estimated using least squares. We will determine:

a) best estimates for A and B (written as a and b respectively);
b) standard errors, σ_a and σ_b, in a and b;
c) the intercept, θ̂_INT, of the best line through the data on the temperature axis;
d) the standard error in θ̂_INT, assuming errors in a and b are uncorrelated;
e) the standard error in θ̂_INT, assuming errors in a and b are correlated.

Solution

a) a and b may be determined in several ways, including using the LINEST() function in Excel^67. Applying the LINEST() function to the data in table 8.1 (with pressures converted to Pa) we obtain:

a = 226909 Pa
b = 836.36 Pa/°C

b) Using LINEST() in Excel to calculate σ_a and σ_b gives:

σ_a = 993.7 Pa
σ_b = 22.80 Pa/°C

67 See page 228 of Kirkup (2002).


c) The intercept, θ_INT, on the temperature axis occurs when P = 0. Rearranging equation 8.13 gives

θ_INT = −A/B    (8.14)

The best estimate of θ_INT, written θ̂_INT, is therefore

θ̂_INT = −a/b = −226909/836.36 = −271.3 °C    (8.15)

d) Assuming that the errors in a and b are uncorrelated, the usual propagation of uncertainties equation gives the standard error in θ̂_INT, σ_θ̂INT, as^68

σ_θ̂INT = [(∂θ̂_INT/∂a)² σ_a² + (∂θ̂_INT/∂b)² σ_b²]^(1/2)    (8.16)

Now,

∂θ̂_INT/∂a = −1/b    (8.17)

and

∂θ̂_INT/∂b = a/b²    (8.18)

It follows (using equation 8.16) that

σ_θ̂INT = [(−993.7/836.36)² + (226909 × 22.80/836.36²)²]^(1/2) = 7.49 °C

so that

θ̂_INT = (−271.3 ± 7.5) °C

e) In order to determine σ_θ̂INT when the correlation between a and b is accounted for, we write (following equation 8.7):

68 See page 390 of Kirkup (2002).


A = | n     Σθ_i  |   =   | 11    330   |
    | Σθ_i  Σθ_i² |       | 330   20900 |

Inverting the matrix A is accomplished using the MINVERSE() function in Excel^69. This gives

A⁻¹ = |  0.172727      −0.00272727     |
      | −0.00272727     9.09091 × 10⁻⁵ |    (8.19)

To determine σ_θ̂INT we use equation 8.12. It is convenient to rewrite equation 8.12 as

σ_θ̂INT² = σ² d_θ̂INTᵀ A⁻¹ d_θ̂INT    (8.20)

Now

σ² = Σ(P_i − P̂_i)²/(n − 2)    (8.21)

where

P̂_i = a + bθ_i    (8.22)

Values for a and b appear in part a) of this question. Using those estimates and equation 8.21, we find

σ² = 5717171 (Pa)²    (8.23)

From equations 8.11 and 8.15, d_θ̂INT is given by

d_θ̂INT = | ∂θ̂_INT/∂a |   =   | −1/b |
          | ∂θ̂_INT/∂b |       | a/b² |    (8.24)

Substituting the a and b obtained in part a) of this question gives

d_θ̂INT = | −0.0011957 |
          |  0.32439   |

Returning to equation 8.20, we have

σ_θ̂INT² = 5717171 × [(−0.0011957)² × 0.172727 + 2 × (−0.0011957)(0.32439)(−0.00272727) + (0.32439)² × 9.09091 × 10⁻⁵]

69 See page 285 in Kirkup (2002).


so that

σ_θ̂INT² = 68.20 (°C)², or σ_θ̂INT = 8.26 °C
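The whole of this worked example can be checked outside the spreadsheet. The sketch below (hypothetical code; pressures converted to Pa) refits the line and propagates the uncertainty into the intercept both with and without the a-b covariance:

```python
# Linear fit P = a + b*theta to the table 8.1 data, then the standard error
# in the temperature-axis intercept theta_INT = -a/b, computed as in parts
# d) (covariance ignored) and e) (covariance included).
from math import sqrt

theta = [-20, -10, 0, 10, 20, 30, 40, 50, 60, 70, 80]
P = [211e3, 218e3, 224e3, 238e3, 247e3, 251e3, 259e3, 265e3, 277e3, 288e3, 294e3]
n = len(theta)

Sx = sum(theta)
Sxx = sum(t * t for t in theta)
Sy = sum(P)
Sxy = sum(t * p for t, p in zip(theta, P))
det = n * Sxx - Sx * Sx
b = (n * Sxy - Sx * Sy) / det            # slope estimate
a = (Sy - b * Sx) / n                    # intercept estimate

var = sum((p - (a + b * t)) ** 2 for t, p in zip(theta, P)) / (n - 2)  # eq. 8.6
# elements of A^-1, the inverse of the normal-equations matrix (eq. 8.7)
iAaa, iAab, iAbb = Sxx / det, -Sx / det, n / det
sa, sb = sqrt(var * iAaa), sqrt(var * iAbb)

theta_int = -a / b
d = [-1 / b, a / b ** 2]                 # gradient of theta_INT w.r.t. (a, b)
# part d): uncorrelated -- the off-diagonal (covariance) term is ignored
s_uncorr = sqrt((d[0] * sa) ** 2 + (d[1] * sb) ** 2)
# part e): correlated -- full quadratic form sigma^2 d^T A^-1 d (eq. 8.12)
s_corr = sqrt(var * (d[0] ** 2 * iAaa + 2 * d[0] * d[1] * iAab + d[1] ** 2 * iAbb))
print(theta_int, s_uncorr, s_corr)
```

The two standard errors reproduce the ±7.5 °C and ±8.3 °C of parts d) and e).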

Now we write

θ̂_INT = (−271.3 ± 8.3) °C

This may be compared with θ̂_INT obtained in part d) of this question, where a and b were assumed to be uncorrelated, i.e. θ̂_INT = (−271.3 ± 7.5) °C. In this instance, failure to account for the correlation between a and b results in an underestimation of the standard error in θ̂_INT.

8.2 Uncertainties in derived quantities incorporating least squares estimates

Parameter estimates obtained using least squares, as well as other quantities that have uncertainty, may be brought together to determine a 'derived' quantity. The derived quantity has an uncertainty which may be calculated. As an example, consider the calibration line in figure 8.1, which is to be used to determine x̂_o when y = ȳ_o (in an analytical chemistry application, ȳ_o might represent the mean detector response of an instrument and x̂_o the predicted concentration of the analyte corresponding to that response).

[Figure 8.1: Calibration line fitted to x-y data, indicating the point (x̂_o, ȳ_o) on the line.]

Assuming the relationship between x and y in figure 8.1 is linear, then

ȳ_o = a + b x̂_o    (8.25)


or

x̂_o = (ȳ_o − a)/b    (8.26)

a and b are determined using least squares. As ȳ_o is not correlated with a or b, we write

σ_x̂o² = (∂x̂_o/∂ȳ_o)² σ_ȳo² + σ² d_x̂oᵀ A⁻¹ d_x̂o    (8.27)

From equation 8.26 we have

∂x̂_o/∂ȳ_o = 1/b

Also,

σ_ȳo² = σ²/m    (8.28)

where σ² is given by equation 8.6, and m is the number of repeat measurements made of the detector response for a particular (unknown) analyte concentration.

8.3: Example of propagation of uncertainties in derived quantities

In section 8.1 we considered data from an experiment in which the variation in pressure of a fixed mass and volume of gas was measured as the temperature of the gas changed. We will use those data, together with the additional information that four repeat measurements of pressure were made at an unknown temperature, giving a mean pressure

P̄_o = 2.54 × 10⁵ Pa

Adapting equation 8.26, we have

θ̂_o = (P̄_o − a)/b = (2.54 × 10⁵ − 226909)/836.36 = 32.39 °C

Using equation 8.28 and the value of σ² given in equation 8.23, we find^70

σ_P̄o² = σ²/m = 5717171/4 = 1.429 × 10⁶ (Pa)²

Rewriting equation 8.27 in terms of the variables in this question gives

70 The assumption made here is that the scatter in the y values remains constant, such that the estimate we make of the standard deviation in the y values during calibration is the same as that of the y values obtained for the unknown x value.


σ_θ̂o² = (∂θ̂_o/∂P̄_o)² σ_P̄o² + σ² d_θ̂oᵀ A⁻¹ d_θ̂o    (8.29)

Here d_θ̂o = (−1/b, −(P̄_o − a)/b²)ᵀ = (−0.0011957, −0.038731)ᵀ; note that its second element differs from that of d_θ̂INT in part e) above, since θ̂_o depends on b through (P̄_o − a)/b², not a/b². Evaluating both terms of equation 8.29,

σ_θ̂o² = (1/836.36)² × 1.429 × 10⁶ + 5717171 × 1.305 × 10⁻⁷ = 2.04 + 0.75

so that σ_θ̂o = 1.67 °C. Finally, we write

θ̂_o = (32.4 ± 1.7) °C

8.4: Uncertainty propagation and non-linear least squares

In general, parameter estimates obtained using non-linear least squares are correlated. Therefore, for derived quantities which incorporate parameter estimates, the covariance matrix must be used to establish the standard errors in those quantities. The first stage, as with any non-linear fitting, is to minimise the sum of squares of residuals, SSR, as described in sections 3 and 4. Suppose f is a function of the parameter estimates obtained through non-linear least squares. The variance in f, σ_f², may be written

σ_f² = σ² d_fᵀ E⁻¹ d_f    (8.30)

E⁻¹ is the inverse of the matrix E, where^71

E = DᵀD    (8.31)

D is given by

D = | ∂y₁/∂a   ∂y₁/∂b   ∂y₁/∂c |
    | ∂y₂/∂a   ∂y₂/∂b   ∂y₂/∂c |
    | …        …        …      |
    | ∂y_i/∂a  ∂y_i/∂b  ∂y_i/∂c |
    | …        …        …      |
    | ∂y_n/∂a  ∂y_n/∂b  ∂y_n/∂c |    (8.32)

71See section 4.2.


and^72

d_f = | ∂f/∂a |
      | ∂f/∂b |
      | ∂f/∂c |    (8.33)

8.4.1: Example of uncertainty propagation in parameter estimates obtained by non-linear least squares

In many situations, calibration data exhibit a slight curvature and it is a matter of debate whether it is appropriate to fit an equation of the form y = a + bx to the data. As an example, consider the data shown in table 8.2 and also in figure 8.2.

Table 8.2: Area versus concentration data for biochanin.

Conc. (x) (mg/l)   Area (y) (arbitrary units)
0.158              0.121342
0.158              0.121109
0.315              0.403550
0.315              0.415226
0.315              0.399678
0.631              1.839583
0.631              1.835114
0.631              1.835915
1.261              3.840554
1.261              3.846146
1.261              3.825760
2.522              8.523561
2.522              8.539992
2.522              8.485319
5.045              16.80701
5.045              16.69860
5.045              16.68172
10.09              34.06871
10.09              33.91678
10.09              33.70727

Close inspection of the data in figure 8.2 indicates that the relationship between area and concentration is not linear, but shows a slight but definite curvature. There are many candidates for the function that might be fitted to the data, but we must be wary of using a function with too many adjustable parameters (see section 10). We will fit the function

y = A + Bx^C    (8.34)

72 Equations 8.32 and 8.33 are appropriate where there are three best estimates, a, b and c, of the parameters in the equation fitted to the data. Both equations may be extended if the number of parameters to be estimated exceeds three.

[Figure 8.2: Calibration curve of area (arbitrary units) versus concentration (mg/l) for biochanin.]


to the data in table 8.2. Applying non-linear least squares, the best estimates for A, B and C, represented by a, b and c respectively, are:

    a = −0.5651,  b = 3.581,  c = 0.9790

When repeat measurements are made of the area under a chromatogram curve, the mean area can be determined. Using this mean we may estimate the concentration of the biochanin. We begin by rearranging equation 8.34, so that,

    x = ((y − A)/B)^(1/C)        (8.35)

Substituting a, b, c and ȳₒ into equation 8.35 gives the estimate of x, x̂ₒ, as,

    x̂ₒ = ((ȳₒ − a)/b)^(1/c)        (8.36)

As ȳₒ is not correlated with a, b or c, we can write,

    σ²x̂ₒ = σ² dᵀx̂ₒ E⁻¹ dx̂ₒ + (∂x̂ₒ/∂ȳₒ)² σ²ȳₒ        (8.37)

where,

    dx̂ₒ = ( ∂x̂ₒ/∂a   ∂x̂ₒ/∂b   ∂x̂ₒ/∂c )ᵀ        (8.38)

and,

    σ² = Σ(yᵢ − ŷᵢ)² / (n − 3)        (8.39)

Partially differentiating x̂ₒ in equation 8.36 with respect to a, b, c and ȳₒ respectively gives,

    ∂x̂ₒ/∂a = −(1/(bc)) ((ȳₒ − a)/b)^(1/c − 1)        (8.40)


    ∂x̂ₒ/∂b = −(1/(bc)) ((ȳₒ − a)/b)^(1/c)        (8.41)

    ∂x̂ₒ/∂c = −(1/c²) ((ȳₒ − a)/b)^(1/c) ln((ȳₒ − a)/b)        (8.42)

    ∂x̂ₒ/∂ȳₒ = (1/(bc)) ((ȳₒ − a)/b)^(1/c − 1)        (8.43)
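Equations 8.40 to 8.43 can be checked numerically. The following Python sketch (an illustration, not part of the original worked example) compares each analytic expression with a central finite difference of equation 8.36, evaluated at the parameter values used later in this section.

```python
import math

def x_hat(a, b, c, y0):
    """Equation 8.36: estimate of concentration from the mean area y0."""
    return ((y0 - a) / b) ** (1.0 / c)

a, b, c, y0 = -0.56512, 3.58138, 0.97901, 6.15513   # values from the worked example
u = (y0 - a) / b

# Analytic partial derivatives, equations 8.40 to 8.43
dxda = -(1.0 / (b * c)) * u ** (1.0 / c - 1.0)
dxdb = -(1.0 / (b * c)) * u ** (1.0 / c)
dxdc = -(1.0 / c ** 2) * u ** (1.0 / c) * math.log(u)
dxdy = +(1.0 / (b * c)) * u ** (1.0 / c - 1.0)

# Central finite differences of x_hat for comparison
h = 1e-6
fd = [
    (x_hat(a + h, b, c, y0) - x_hat(a - h, b, c, y0)) / (2 * h),
    (x_hat(a, b + h, c, y0) - x_hat(a, b - h, c, y0)) / (2 * h),
    (x_hat(a, b, c + h, y0) - x_hat(a, b, c - h, y0)) / (2 * h),
    (x_hat(a, b, c, y0 + h) - x_hat(a, b, c, y0 - h)) / (2 * h),
]

for analytic, numeric in zip([dxda, dxdb, dxdc, dxdy], fd):
    assert abs(analytic - numeric) < 1e-6
print(dxda, dxdb, dxdc, dxdy)
```

The printed values should agree with the −0.28908, −0.54245, −1.24891 and 0.28908 quoted in the text.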

After calibration, the area under the chromatogram curve is measured four times for a sample of unknown concentration. It is found that,

    ȳₒ = 6.15513        (8.44)

Fitting using non-linear least squares gives,

    a = −0.5651,  b = 3.581,  c = 0.9790

Substituting for a, b, c and ȳₒ in equation 8.36 gives the estimate of the unknown concentration, x̂ₒ, as

    x̂ₒ = ((ȳₒ − a)/b)^(1/c) = ((6.15513 + 0.5651)/3.581)^(1/0.9790) = 1.902 mg/l

Substituting for a, b, c and ȳₒ into equations 8.40 to 8.43, we obtain,

    ∂x̂ₒ/∂a = −0.289083879,  ∂x̂ₒ/∂b = −0.54244803,  ∂x̂ₒ/∂c = −1.248914487,  ∂x̂ₒ/∂ȳₒ = 0.289083879
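The residual variance σ² that enters equation 8.37 can be checked directly from the data of table 8.2. The following Python sketch (Python here simply stands in for the spreadsheet arithmetic) evaluates the residuals of y = A + Bx^C at the best estimates quoted above and forms σ² = SSR/(n − 3) of equation 8.39.

```python
# Residuals of the y = A + B*x**C fit to the biochanin data of table 8.2,
# evaluated at the best estimates quoted in the text.
x = [0.158, 0.158, 0.315, 0.315, 0.315, 0.631, 0.631, 0.631,
     1.261, 1.261, 1.261, 2.522, 2.522, 2.522,
     5.045, 5.045, 5.045, 10.09, 10.09, 10.09]
y = [0.121342, 0.121109, 0.403550, 0.415226, 0.399678,
     1.839583, 1.835114, 1.835915, 3.840554, 3.846146, 3.825760,
     8.523561, 8.539992, 8.485319, 16.80701, 16.69860, 16.68172,
     34.06871, 33.91678, 33.70727]

a, b, c = -0.5651, 3.581, 0.9790           # best estimates from the text

residuals = [yi - (a + b * xi**c) for xi, yi in zip(x, y)]
ssr = sum(r * r for r in residuals)         # sum of squares of residuals
sigma2 = ssr / (len(x) - 3)                 # equation 8.39, three parameters

print(ssr, sigma2)                          # sigma2 is close to the 0.02985
                                            # shown on sheet 8.1
```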


Sheet 8.1 shows the layout of a spreadsheet used to calculate x̂ₒ and σx̂ₒ.

Sheet 8.1: Annotated sheet showing calculation of x̂ₒ and σx̂ₒ.

      A              B           C          D
44    a             -0.56512    σa         0.088470
45    b              3.58138    σb         0.078995
46    c              0.97901    σc         0.009026
48    dx̂ₒ           -0.28908
49                  -0.54245
50                  -1.24891
52    dᵀx̂ₒ          -0.28908   -0.542448  -1.248914
54    V              0.00783   -0.00601    0.000646
55                  -0.00601    0.00624   -0.000705
56                   0.00065   -0.0007     8.15E-05
58    V dx̂ₒ          0.00019
59                  -0.00077
60                   0.00009
62    dᵀx̂ₒ V dx̂ₒ     0.00025
64    ∂x̂ₒ/∂ȳₒ        0.28908
65    (∂x̂ₒ/∂ȳₒ)²     0.08357
66    σ²             0.02985
67    σ²ȳₒ           0.00746
68    ȳₒ             6.15513
69    m              4
70    x̂ₒ             1.90193
71    σ²x̂ₒ           0.00087
72    σx̂ₒ            0.02948

The annotations on the sheet record the quantities used:
- rows 44 to 46: best estimates of the parameters, and the standard errors in the parameters;
- dx̂ₒ = (∂x̂ₒ/∂a, ∂x̂ₒ/∂b, ∂x̂ₒ/∂c)ᵀ, with elements given by equations 8.40 to 8.42;
- ∂x̂ₒ/∂ȳₒ, given by equation 8.43;
- σ² = Σ(yᵢ − ŷᵢ)²/(n − 3), as in equation 8.39;
- σ²ȳₒ = σ²/m, where m is the number of repeat measurements of the area;
- V = σ²E⁻¹;
- x̂ₒ = ((ȳₒ − a)/b)^(1/c), as in equation 8.36;
- σ²x̂ₒ = σ² dᵀx̂ₒ E⁻¹ dx̂ₒ + (∂x̂ₒ/∂ȳₒ)² σ²ȳₒ, as in equation 8.37.
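The matrix arithmetic performed on sheet 8.1 can be reproduced outside the spreadsheet. The following Python sketch combines dᵀx̂ₒ V dx̂ₒ with the ȳₒ term of equation 8.37, using the numbers shown on the sheet (the V matrix is transcribed directly from the sheet; the small 3 × 3 products are written out by hand).

```python
# Reproduce sheet 8.1: sigma^2_xhat = d^T V d + (dx/dy0)^2 * sigma^2_ybar
d = [-0.28908, -0.54245, -1.24891]          # d_xhat, equations 8.40 to 8.42
V = [[ 0.00783, -0.00601,  0.000646],       # V = sigma^2 * E^-1, from the sheet
     [-0.00601,  0.00624, -0.000705],
     [ 0.00065, -0.0007,   8.15e-05]]

Vd = [sum(V[i][j] * d[j] for j in range(3)) for i in range(3)]
dVd = sum(d[i] * Vd[i] for i in range(3))   # first term of equation 8.37

dxdy = 0.28908                              # equation 8.43
sigma2 = 0.02985                            # equation 8.39
m = 4                                       # repeat measurements of the area
sigma2_ybar = sigma2 / m

var_xhat = dVd + dxdy**2 * sigma2_ybar
print(dVd, var_xhat, var_xhat**0.5)         # about 0.00025, 0.00087 and 0.0295
```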


From sheet 8.1, x̂ₒ and σx̂ₒ are found to be:

    x̂ₒ = 1.90193,  σx̂ₒ = 0.029

which allows us to write: x = (1.902 ± 0.029) mg/l

Exercise 6
The following data were obtained during the calibration of an HPLC system using Ibuprofen. The area under the chromatograph peak is shown as a function of known concentration (expressed as mass per tablet) of Ibuprofen.

Table 8.3: Area under chromatograph peak as a function of concentration of Ibuprofen.

Mass per tablet (mg/tablet)   Area (arbitrary units)
103.9                         265053
103.9                         261357
139.3                         345915
139.3                         345669
180.1                         445684
180.1                         445753
200.3                         494700
200.3                         493846
219.9                         540221
219.9                         539610
278.1                         683881
278.1                         683991
305.7                         755890
305.7                         754901

Using the data in table 8.3:

a) Fit an equation of the form y = a + bx^c to the data in table 8.3, where y corresponds to the area under the chromatograph peak and x corresponds to the Ibuprofen concentration. Determine a, b and c and their respective standard errors.

b) A sample of Ibuprofen of unknown concentration is injected into the column of the calibrated HPLC. The mean area of three replicate measurements is found to be 405623. Use this information to estimate the concentration of Ibuprofen and the standard error in the estimate of the concentration.


Section 9: More on Solver

Solver was devised primarily for use by the business community and this is reflected in the features it offers. Solver comprises three optimisation algorithms:

1) For integer problems, Solver uses the Branch and Bound method73.
2) Where equations are linear, the Simplex method is used for optimisation74.
3) In the case of non-linear problems, the Generalised Reduced Gradient (GRG) method is adopted75.

It is the GRG method that is applied in our analyses, therefore most of this section is devoted to describing the features of Solver that relate to it. Though optimisation can often be carried out successfully with the default settings in the Solver Options dialog box, Solver possesses several options that can be adjusted by the user to assist the optimisation process, and we describe these next. The Solver dialog box, as shown in figure 4.2, offers the facility to constrain parameter estimates. The application of constraints requires careful consideration, as it is possible that Solver will locate a local minimum rather than the global minimum. The best estimates returned by Solver need to be compared with 'physical reality' before being accepted. Consider an example in which a parameter in an equation represents the speed of sound in air, v. If, after fitting, the best estimate of v is -212 m/s, it is fair to question whether this value is 'reasonable'. If it is not, then one course of action is to try new starting values for the parameter estimates. We could use the Constraints box in Solver to constrain the estimate of v so that it cannot take on negative values. This cannot guarantee that a physically meaningful value will be found for v, only that the value will be non-negative.

9.1 Solver Options
To view the Solver options shown in figure 9.1, click the Options button in the Solver dialog box. This dialog box may be used to modify, for example, the method by which the optimisation takes place. This, in turn, may provide a better fit, or reduce the fitting time, compared with the default settings.

73 See Wolsey L A, (1998). 74 See Nocedal, J, (1999). 75 See Smith and Lasdon, (1992).


Figure 9.1: Solver Options dialog box illustrating default settings. We now consider some of the options in the Solver Options dialog box. Max Time: This restricts the total time Solver spends searching for an optimum

solution. Unless there are many data, the default value of 100 s is usually sufficient. If the maximum time is set too low, such that Solver has not completed its search, then a message is returned 'The maximum time limit was reached; continue anyway?'. Clicking on the Continue button will cause Solver to carry on searching for a solution.

Iterations: This is the maximum number of iterations that Solver will execute before terminating its search. The default value is 100, but this can be increased to a limit of 32,767. Solver is likely to find an optimum solution before reaching such a limit or return a message that an optimum solution cannot be found. If the number of iterations is set too low, such that Solver has not completed its search, then a message will be returned 'The maximum iteration limit was reached; continue anyway?'. Clicking on the Continue button will cause Solver to carry on searching for a solution.

Precision and Tolerance:

These options are applicable to situations in which constraints have been specified. Specifying constraints is not advised and so we will not consider these options.

Convergence: As fitting proceeds, Solver compares the most recent solution (for our application this would be the value of SSR) with previous solutions. If the fractional reduction in the solution over five iterations is less than the value in the Convergence box, Solver reports that optimisation is complete. If this value is made very small (say, 10⁻⁶), Solver will continue iterating (and hence take longer to complete) than if the number is larger (say, 10⁻²).

Assume Linear Model

If this box is ticked then Solver uses the Simplex method to obtain best estimates of parameters. If the model to be fitted to data is linear, then fitting may be performed using the Regression Tool in the Analysis ToolPak. This is an attractive alternative as the Regression Tool returns best estimates, standard errors in estimates, confidence intervals and sum of squares of residuals. If the 'Assume Linear


Model' box is ticked, Solver will attempt to establish if the model is indeed linear. If Solver determines that the model is non-linear, the message is returned 'The conditions for Assume Linear Model are not satisfied'. To continue, it is necessary to return to the Solver Option dialog box and untick the Assume Linear Model option.

Assume Non-Negative

This constrains all estimates in an equation so that they cannot take on negative values.

Use Automatic Scaling

In certain problems there may be many orders of magnitude difference between the data, the parameter estimates and the value in the target cell. This can lead to rounding problems owing to the finite precision arithmetic performed by Excel. If the 'Use Automatic Scaling' box is ticked, then Solver will scale values before carrying out optimisation (and 'unscale' the solution values before entering them into the spreadsheet). It is advisable to tick this box for all problems.

Show Iteration Results

Ticking this box causes Solver to pause after each iteration, allowing new estimates of parameters and the value in the Target cell to be viewed. If parameter estimates are used to draw a line of best fit through the data, then the line will be updated after each iteration. Updating the fitted line on the graph after each iteration gives a valuable insight into the progress made by Solver to find best estimates of the parameters in an equation.

Estimates: Tangent or Quadratic

This determines the method used to find subsequent values of each parameter estimate at the outset of the search (i.e. either linear or quadratic extrapolation). Both methods produce the same final results for the examples described in this document.

Derivatives: Forward or Central

The partial derivatives of the function in the target cell with respect to the parameter estimates are found by the method of finite differences. It is possible to perturb the estimates 'forward' from a particular point (similar to that described in section 4.3), or to perturb the estimates forward and backward from the point in order to obtain better estimates of the partial derivatives. Both methods of determining the partial derivatives produce the same final results for the examples described in this document.

Search: Newton or Conjugate

Specifies the search algorithm. References for the quasi-Newton and conjugate gradient search methods used by Excel can be found in Safizadeh and Signorile (1993) and Perry (1978) respectively. Both methods produce the same final results for the examples described here.

Load Model and Save Model

We may wish to consider the effect on optimisation of using a particular combination of options, such as Tangent (Estimates), Central (Derivatives) and Conjugate (Search). It is tedious to record by hand which fitting conditions have been used, but Excel offers the facility to store the options by clicking on Save Model, followed by specifying the cells on the spreadsheet where the model conditions should be saved. These conditions can be recalled by clicking on Load Model and indicating the cells which contain the saved information.


9.2 Solver Results Once Solver completes optimisation, it displays the Solver Results dialog box shown in figure 9.2.

Figure 9.2: Solver Results dialog box.

Clicking on OK will retain the solution found by Solver (i.e. the starting parameters are permanently replaced by the final parameter estimates). At this stage Excel is able to present three reports: Answer, Sensitivity and Limits. Of the three, the Answer report is the most useful, as it gives the starting values of the parameter estimates and the associated SSR. The report also displays the final parameter estimates and the final SSR, allowing for easy comparison with the original values. An Answer report is shown in figure 9.3.

Figure 9.3: Answer report created by Excel.


Section 10: Modelling and Model Identification

There are several types of model that interest physical scientists. Physical and chemical models are based on the application of physical and chemical principles. Such principles are expected to have wide applicability and underlie phenomena observed inside and outside the laboratory. Equations founded on physical and chemical principles contain parameters that have physical meaning, rather than simply being anonymous constants in an equation. For example, a parameter in an equation could represent the radius of the Earth, the energy gap of a semiconductor or a rate constant in a chemical reaction. There are also essentially statistically based models that may, through consideration of experimental or observational data, assist in identifying the important variables and lend support to an empirical relationship between variables. A useful empirical equation is one that successfully describes the trend in the data but is not derived from a consideration of the fundamental principles underlying the relationship between variables. While both types of modelling are useful, most scientists would prefer the insight and predictive opportunities offered by good physical models to those with a purely statistical basis or support.

10.1 Physical Modelling
If a model based on physical and chemical principles is successful, in the sense that data gathered in experiments are consistent with the predictions of the model, then this lends support to the validity of the underlying principles. As an example, a physical principle described by Isaac Newton is that an attractive force exists between all bodies. That attractive force is termed the gravitational force. Newton went on to indicate how the gravitational force between two bodies depends on their respective masses and the separation between the bodies.
From this starting point, it is possible to predict the value of acceleration of a body when it is allowed to fall freely above the Earth’s surface. It is often the case that approximations are made so that the problem does not become too complicated76. In this example we might consider the Earth to be77:

a) a perfect sphere
b) not rotating
c) of uniform density

Once a prediction has been made as to how the acceleration of a body varies with distance above the Earth's surface, the next step is to determine by careful measurement how the acceleration actually depends on distance. If the approximations given by a), b) and c) above are valid, then the relationship between the free fall acceleration, g(h), and height, h, can be written:

76 Experienced physical scientists are able to simplify complex situations while retaining the key principles necessary to understand a particular physical process or phenomenon. 77 If it is found that the data are inconsistent with the 'simplified' theory, the approximations may have to be revisited and the model revised.


    g(h) = g₀ / (1 + h/R)²        (10.1)

where g₀ is the acceleration caused by gravity at the Earth's surface (i.e. when h = 0) and R is the radius of the Earth78. By gathering data of acceleration as a function of height, it should be possible to confirm or contest the validity of equation 10.1. It is also possible to infer from equation 10.1 that if the range of h values is limited to heights much less than the radius, R, then the acceleration, g(h), will decrease almost linearly with height79. Additionally, as the radius of the Earth is one of the parameters to be estimated, the estimate can be compared with the radius of the Earth as determined by other methods. Applying physical principles in order to establish an equation that successfully relates the variables is challenging. However, such an equation is often more satisfying, and has wider applicability, than an empirical equation.

10.2 Data driven approach to discovering relationships
As an alternative to a 'physical principles' approach to developing a relationship between physical variables, we could try a 'data driven' approach, such that trends observed in the data suggest a relationship between dependent and independent variables that might be valid. One weakness of this approach is that, even if the correct functional relationship between acceleration and height is discovered, we would be unlikely to recognise that hidden within the parameter estimates is an important physical constant, such as the radius of the Earth. For example, with respect to the study involving gravity described in section 10.1, we might carefully gather experimental data of the acceleration of free fall, g(h), for various heights, h, then plot g(h) versus h in order to discern the type of relationship between the two variables. Such a plot is shown in figure 10.1 for values of h in the range 0 to 20 km.

78 See Walker (2002), Chapter 12. 79 This can be shown by doing a binomial expansion of equation 10.1 (see problem 11 at the end of the article).
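The near-linearity referred to in footnote 79 is easy to demonstrate numerically. The sketch below (illustrative only: the values g₀ = 9.81 m/s² and R = 6.371 × 10⁶ m are assumed, not taken from the text) compares equation 10.1 with its binomial approximation g(h) ≈ g₀(1 − 2h/R) for heights up to 20 km.

```python
g0 = 9.81        # m/s^2, assumed surface value of g
R = 6.371e6      # m, assumed radius of the Earth

for h in [0.0, 5e3, 10e3, 15e3, 20e3]:
    exact = g0 / (1 + h / R) ** 2          # equation 10.1
    linear = g0 * (1 - 2 * h / R)          # binomial (first-order) approximation
    # For h << R the two expressions agree to a few parts in 10^5
    assert abs(exact - linear) / exact < 1e-4
    print(h, exact, linear)
```

This is why, over a limited range of h, the data are well described by a straight line.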


[Plot of g (m/s²), ranging from 9.740 to 9.820, versus height (m), 0 to 20000.]
Figure 10.1: Variation of the acceleration due to gravity with height above the Earth's surface.

Based on the data appearing in figure 10.1, there is a relationship between g(h) and h, but owing to the variability within the data, and perhaps the limited range over which the data were gathered, it is difficult to justify fitting an equation other than y = a + bx to these data.

10.3 Other forms of modelling
In the physical sciences we are often able to isolate and control important independent variables in an experiment. For example, past experience may suggest that the thickness of an aluminium film vacuum deposited onto a glass substrate is affected by the distance from the aluminium target to the substrate, the deposition time and the pressure of the gas in the vacuum chamber. Such isolation and control might be contrasted with situations often encountered in other areas of science (and in other disciplines, such as the health or medical sciences). Consider, as an example, the efficacy of a treatment in prolonging the life of a patient suffering with liver cancer. There may be many variables affecting patient longevity to be considered, including patient age, sex, race, past medical history, family medical history and socio-economic status. In fact, identifying which are the most important variables may be the finest achievement of the modelling/data analysis process, with little expectation that a functional relationship other than linear will emerge between independent and dependent variables. There are many areas of science in which a certain amount of data 'mining' or 'prospecting' is required to establish which variables are most important and which can be safely discarded. Here we will confine our considerations to the analysis of data which emerge from experiments in which the independent variables can be carefully controlled and measured.
10.4 Competing models Whether equations relating variables have been developed by first considering physical principles, past experience, or intelligent guesswork, there are circumstances in which two or more equations compete to offer the best explanation of the relationship between the variables. More terms can be added to an equation (including


terms that introduce extra independent variables) until the fit between equation and data is optimised, as measured by some suitable statistic such as those described in section 10.5. Careful experimental design can also help discriminate between equations. For example, if a model predicts a slightly non-linear relationship between the dependent and independent variables, it would be wise to make measurements over as wide a range of values of the independent variable as possible, to expose or exaggerate that non-linearity. Additionally, if the data show large scatter, there may be merit in investigating ways by which the noise can be reduced in order to improve the quality of the data. In situations in which we need to compare two or more equations, we can appeal to methods of data analysis to provide quantifiable means of distinguishing between models. It is these methods that we concentrate upon for the remainder of this section.

10.5 Statistical Measures of Goodness of Fit
There are several measures that can be used to assist in discriminating statistically which equation gives the best fit to data, including the Schwarz criterion, Mallows' Cp and the Hannan and Quinn Information Criterion80. Here we focus on two criteria, the Adjusted Coefficient of Multiple Determination, R²_ADJ, and the Akaike Information Criterion (AIC), as they are quite easy to implement and interpret.

10.5.1 Adjusted Coefficient of Multiple Determination
A measure of how well an equation is able to account for the relationship between the independent and dependent variables is given by the Coefficient of Multiple Determination, R², given by,

    R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²        (10.2)

where yᵢ is the ith observed value of y, ŷᵢ is the predicted y value found using the equation representing the best line through the points, and ȳ is the mean of the observed y values. Note that the numerator in the second term of equation 10.2 is the sum of squares of residuals, SSR. As more parameters (or independent variables) are added to the model we would expect SSR to decrease and, as a consequence, R² to tend to unity. If we were to use R² to help choose between equations, for example between,

    y = a + bx        (10.3)

and

    y = a + bx + cx²        (10.4)

then equation 10.4 would always be favoured over equation 10.3, owing to the extra flexibility the x² term provides for the line of best fit to pass close to the data points.

80 See Al-Subaihi (2002).
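That the x² term can never increase SSR is easily demonstrated. The sketch below is a toy illustration (the data are fabricated for the demonstration, and numpy's polyfit stands in for the least squares fitting): equations 10.3 and 10.4 are fitted to the same, genuinely linear, data and the two SSR values compared.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.8 * x + rng.normal(0, 0.5, x.size)   # truly linear data plus noise

def ssr(coeffs, x, y):
    """Sum of squares of residuals for a polynomial fit."""
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

ssr_linear = ssr(np.polyfit(x, y, 1), x, y)      # y = a + b x
ssr_quad = ssr(np.polyfit(x, y, 2), x, y)        # y = a + b x + c x^2

# The quadratic can always do at least as well as the straight line,
# even though the underlying relationship here is linear.
assert ssr_quad <= ssr_linear + 1e-9
print(ssr_linear, ssr_quad)
```

This is precisely why a statistic that penalises extra parameters, such as those described next, is needed.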


While the extra term in x² contributes to a reduction in SSR, it is possible that the reduction is only marginal. It seems reasonable that, while looking for an equation that reduces SSR, account should also be taken of the number of parameters, so as not to unfairly discriminate against equations with only a small number of parameters. One such statistic, the Adjusted Coefficient of Multiple Determination, R²_ADJ, is given by,81

    R²_ADJ = 1 − [(n − 1)/(n − M)] (1 − R²)        (10.5)

where R² is given by equation 10.2, n is the number of data and M is the number of parameters in the equation. When two or more equations are fitted to data, the equation favoured is that which gives the largest value of R²_ADJ.

10.5.2 Akaike's Information Criterion (AIC)
Another way to compare two (or more) equations fitted to data, where the equations have different numbers of parameters, is to use the Akaike Information Criterion82 (AIC). This criterion takes into account SSR, but also includes a term proportional to the number of parameters used. AIC may be written,

    AIC = n ln(SSR) + 2M        (10.6)

where n is the number of data and M is the number of parameters in the equation. The second term on the right hand side of equation 10.6 can be considered a 'penalty' term. If the addition of another parameter to an equation reduces SSR, then the first term on the right hand side of equation 10.6 becomes smaller. However, the second term increases by two for every extra parameter used. It follows that a modest decrease in SSR brought about by introducing an extra term into an equation may be more than offset by the accompanying increase in the penalty term. We conclude that, if two or more equations are fitted to data, the equation producing the smallest value of AIC is preferred. Care must be exercised when calculating SSR: if a transformation was required to facilitate fitting, the fitted values must be transformed back to the original units before calculating SSR, otherwise it is not possible to compare equations using R²_ADJ or AIC. Additionally, if weighted fitting is used, then the same weighting of the data must be used for all equations fitted to the data.

81 See Neter, Kutner, Nachtsheim and Wasserman for a discussion of equation 10.5. R²_ADJ is calculated by the Regression Tool in the Analysis ToolPak in Excel (see p 373 of Kirkup). 82 See Akaike (1974).


10.5.3 Example
As part of a study into the behaviour of electrical contacts made to a ceramic conductor, the data in table 10.1 were obtained for the temperature variation of the electrical resistance of the contacts.

Table 10.1: Resistance versus temperature for electrical contacts on a ceramic.

T (K)   R (Ω)     T (K)   R (Ω)
50      4.41      190     0.69
60      3.14      200     0.85
70      2.33      210     0.94
80      2.08      220     0.78
90      1.79      230     0.74
100     1.45      240     0.77
110     1.36      250     0.68
120     1.20      260     0.66
130     0.86      270     0.84
140     1.12      280     0.77
150     1.05      290     0.75
160     1.05      300     0.86
170     0.74
180     0.88

These data are shown plotted in figure 10.2.

[Plot of resistance (ohms), 0 to 5.0, versus temperature (K), 0 to 350.]

Figure 10.2: Resistance versus temperature data for electrical contacts made to a ceramic material.

It is suggested that there are two possible models that can be used to describe the variation of the contact resistance with temperature.

Model 1
The first model assumes that the contacts show semiconducting behaviour, where the relationship between R and T can be written,

    R = A exp(B/T)        (10.7)


where A and B are constants.

Model 2
Another equation proposed to describe the data assumes an exponential decay of resistance with increasing temperature, of the form,

    R = α exp(−βT) + γ        (10.8)

where α, β and γ are constants. We will use the adjusted coefficient of multiple determination and the Akaike information criterion to determine whether equation 10.7 or equation 10.8 better fits the data.

Solution
Both equation 10.7 and equation 10.8 were fitted using non-linear least squares. It is possible to linearise equation 10.7 by taking logarithms of both sides of the equation, then performing linear least squares; however, it is more convenient to use the Solver utility in Excel to perform non-linear least squares, as described in section 4 of this document. The results of the fitting are summarised in table 10.2. Note that the number of data in table 10.1 is n = 26.

Table 10.2: Parameter estimates and statistics obtained when fitting equations 10.7 and 10.8 to the data in table 10.1.

            Fitting R = A exp(B/T)    Fitting R = α exp(−βT) + γ
A, σA       0.4849, 0.0175            -
B, σB       111.0, 2.40               -
α, σα       -                         18.91, 2.14
β, σβ       -                         0.03391, 0.00196
γ, σγ       -                         0.7974, 0.0313
SSR         0.2709                    0.3191
AIC         -29.95                    -23.70
R²          0.9859                    0.9833
R²_ADJ      0.9853                    0.9818

Inspection of table 10.2 reveals that the equation R = A exp(B/T) is superior to R = α exp(−βT) + γ, as judged by both AIC and R²_ADJ. In this example the SSR is smaller for equation 10.7 than for equation 10.8. As equation 10.7 also has fewer parameters than equation 10.8, this alone would have been enough to encourage us to favour equation 10.7 as the better fit to the data.
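The AIC and R²_ADJ entries in table 10.2 follow directly from the SSR and R² values via equations 10.5 and 10.6, as this short Python check shows (the input values are transcribed from the table).

```python
import math

n = 26                                   # number of data in table 10.1

def aic(ssr, M):
    """Equation 10.6: AIC = n ln(SSR) + 2M."""
    return n * math.log(ssr) + 2 * M

def r2_adj(r2, M):
    """Equation 10.5: adjusted coefficient of multiple determination."""
    return 1 - (n - 1) / (n - M) * (1 - r2)

# Model 1: R = A exp(B/T), M = 2 parameters
aic1, adj1 = aic(0.2709, 2), r2_adj(0.9859, 2)
# Model 2: R = alpha exp(-beta T) + gamma, M = 3 parameters
aic2, adj2 = aic(0.3191, 3), r2_adj(0.9833, 3)

print(round(aic1, 2), round(adj1, 4))    # close to -29.95 and 0.9853
print(round(aic2, 2), round(adj2, 4))    # close to -23.70 and 0.9818
assert aic1 < aic2 and adj1 > adj2       # model 1 preferred on both criteria
```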


Section 11: Monte Carlo simulations and least squares

How effective is the technique of least squares at providing good estimates of parameters appearing in an equation fitted to experimental data? This question is both challenging and important. To begin with, it is not possible to be sure that an equation fitted to data is appropriate. Additionally, we cannot be sure that the assumptions usually made when applying the technique of least squares (e.g. that errors in the y values are normally distributed, with a mean of zero and a constant standard deviation) are valid. There is no way to be certain of what the parameters should be that appear in any equation fitted to 'real' data. However, it is possible to contrive a situation where we do know the underlying relationship between the dependent variable (y) and the independent variable (x), and how the errors are distributed. The starting point is to generate 'noise free' y values in some range of x values. The next stage is to add 'noise' of known standard deviation with the aid of a random number generator83. Data generated in this manner are submitted to a least squares routine which, in turn, calculates best estimates of the parameters appearing in an equation fitted to the data. The estimates are compared with the 'actual' parameters, allowing the error in the estimates84 to be determined. Generating and analysing data in this manner is an example of a Monte Carlo simulation. Such simulations are widely used in science to imitate situations that are too difficult, costly or time consuming to investigate through conventional experiments. The Monte Carlo approach is powerful and versatile. As examples, we may investigate 'experimentally',

• the performance of data analysis tools (for example, the speed and accuracy of rival algorithms for non-linear least squares can be compared).

• the consequence of choosing different sampling regimes (for example, the distribution of parameter estimates obtained when measurements are made at evenly spaced intervals of x can be compared with the distribution of parameter estimates obtained when replicate measurements are made at extreme values of x).

• the effect of homo- or heteroscedasticity on parameter estimates (for example, we may investigate the consequences of fitting an equation by unweighted least squares to data influenced by heteroscedastic noise).

• the effect of the magnitude of the ‘noise’ in the data on the standard errors of the parameter estimates.
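As an illustration of the second bullet point, the following Python sketch is a minimal stand-in for the spreadsheet simulation described below (a closed-form slope estimator replaces Solver). It repeatedly generates data from y = 3 + 1.5x with normally distributed noise of standard deviation two, and compares the spread of the slope estimates for the two sampling regimes of figures 11.1 and 11.2.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Closed-form least-squares slope for y = a + b x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def simulate(xs, trials, rng):
    """Standard deviation of the slope estimates over many noisy data sets."""
    slopes = []
    for _ in range(trials):
        ys = [3 + 1.5 * x + rng.gauss(0, 2) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return statistics.stdev(slopes)

rng = random.Random(1)
x_even = list(range(5, 21))              # x = 5, 6, ..., 20 (figure 11.1)
x_ends = [5] * 8 + [20] * 8              # replicates at the extremes (figure 11.2)

sd_even = simulate(x_even, 500, rng)
sd_ends = simulate(x_ends, 500, rng)
print(sd_even, sd_ends)                  # roughly 0.11 and 0.07

# Concentrating measurements at the extremes gives a more precise slope estimate
assert sd_ends < sd_even
```

The outcome agrees with the theoretical standard error of the slope, σ/√Σ(xᵢ − x̄)², which is smaller when Σ(xᵢ − x̄)² is made large by sampling at the extremes.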

83 or a pseudorandom number generator, as routinely found in statistics and spreadsheet packages. 84 error = true value of parameter − estimated value of parameter.

Page 79: Kirkup - Principles and Applications of Non-Linear Least Squares - An Introduction for Physical Scientists Using Excel Solver (2003)


11.1 Using Excel’s Random Number Generator The Random Number Generator in Excel offers a convenient means of adding normally distributed noise to otherwise noise free data85. The Random Number Generator is one of the tools in the Analysis ToolPak. The ToolPak is found by going to the Tools pull down menu on the Menu toolbar and clicking on Data Analysis. Figure 11.1 shows noise free y values in column B generated using the equation: y = 3 + 1.5x (11.1) Normally distributed noise with mean of zero and standard deviation of two is generated in the C column. In the D column the noise-free y-values are summed with the noise. The x-values are distributed evenly in the range x = 5 to x = 20.

     A    B            C         D
1    x    ynoise_free  noise     y
2    5    10.5         -0.00145  =B2+C2
3    6    12.0         -3.08168
4    7    13.5         -2.95189
5    8    15.0         0.06251
6    9    16.5         0.674352
7    10   18.0         2.402985
8    11   19.5         2.987836
9    12   21.0         3.183932
10   13   22.5         -1.49775
11   14   24.0         0.441671
12   15   25.5         2.453826
13   16   27.0         -1.03515
14   17   28.5         -1.77266
15   18   30.0         -0.43274
16   19   31.5         1.972357
17   20   33.0         0.010632

Figure 11.1: Normally distributed noise with zero mean and standard deviation of two added to y values. x values are in the range x = 5 to x = 20, with no replicates.

Figure 11.2 shows a similar range of x values, but in this case eight replicates are made at x = 5 and another eight at x = 20, with no values between these limits (such that the number of data in figures 11.1 and 11.2 is the same). Again, normally distributed noise with mean of zero and standard deviation of two is added to each of the y-values.

85 The Random Number Generator allows noise with distributions other than normal to be added to data. We will consider only normally distributed noise.


     K    L            M         N
1    x    ynoise_free  noise     y
2    5    10.5         -0.51868  =L2+M2
3    5    10.5         0.849941
4    5    10.5         0.91623
5    5    10.5         0.153223
6    5    10.5         1.798417
7    5    10.5         -0.67711
8    5    10.5         -2.12328
9    5    10.5         -3.16988
10   20   33.0         -0.11841
11   20   33.0         0.47371
12   20   33.0         -2.95645
13   20   33.0         1.563158
14   20   33.0         2.03966
15   20   33.0         0.874793
16   20   33.0         1.679191
17   20   33.0         -1.71295

Figure 11.2: Normally distributed noise with zero mean and standard deviation of two added to y values. Data are generated at x = 5 and x = 20. Eight replicate y values are generated at each x value.

Analysing the data shown in figures 11.1 and 11.2 using unweighted least squares gives the following estimates for parameters and standard errors in parameters. Note we refer to the data that are evenly distributed between x = 5 and x = 20, as given in figure 11.1, as ‘Even dist.’ and the data consisting of replicates at x = 5 and x = 20 as ‘Extreme dist.’. The outcome of the analysis using unweighted least squares is shown in table 11.1.

               a      σa      b      σb       R2
Even dist.     2.238  1.464   1.578  0.1099   0.9364
Extreme dist.  2.461  0.8297  1.538  0.05692  0.9812

Table 11.1: Parameter estimates and statistics for data in figures 11.1 and 11.2 found using unweighted least squares.

Errors in intercept and slope in table 11.1 are found by subtracting the estimates from the true values (3 and 1.5 respectively), as shown in table 11.2.

               Error in a   Error in b
Even dist.     0.7615       -0.07801
Extreme dist.  0.5386       -0.03845

Table 11.2: Errors in intercept and slope.

It is possible that the simulated data are unrepresentative of the effect of evenly distributed data compared to data gathered at extreme x values (as there are only two sets of data and, by chance, the ‘Extreme dist.’ could have been favoured over the ‘Even dist.’). This is where the power of the Monte Carlo approach emerges. The simulation may be repeated many times in order to establish whether designing an experiment with replicate measurements made at extreme x values does consistently produce parameter estimates with smaller standard errors.
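The repetition can be scripted as well as carried out on a spreadsheet. In the hypothetical Python sketch below (seed, replicate count and variable names are illustrative assumptions), 50 data sets are generated for each design and the scatter of the slope estimates is compared:

```python
import random
import statistics

def fit_slope(xs, ys):
    # Unweighted least squares estimate of the slope.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

random.seed(42)
designs = {"even": list(range(5, 21)),      # figure 11.1: x = 5 to 20
           "extreme": [5] * 8 + [20] * 8}   # figure 11.2: replicates at ends
spread = {}
for name, xs in designs.items():
    slopes = []
    for _ in range(50):                     # 50 simulated data sets
        ys = [3.0 + 1.5 * x + random.gauss(0.0, 2.0) for x in xs]
        slopes.append(fit_slope(xs, ys))
    spread[name] = statistics.stdev(slopes)
print(spread)
```

Under these assumptions the scatter in the slope estimates should come out smaller for the extreme design, consistent with the histograms that follow.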


Figures 11.3 and 11.4 show histograms of the estimates of a and b, which were determined by generating 50 sets of simulated data, based on adding noise of standard deviation of 2 to y values generated using equation 11.1.

[Histogram: frequency (0 to 14) versus estimate, a, of intercept (0.0 to 7.0); series: even dist., extreme dist.]

Figure 11.3: Histogram consisting of 50 estimates of intercept found by fitting the equation y = a + bx to simulated data.

[Histogram: frequency (0 to 25) versus estimate, b, of slope (1.2 to 1.8); series: even dist., extreme dist.]

Figure 11.4: Histogram consisting of 50 estimates of slope found by fitting the equation y = a + bx to simulated data.

Figures 11.3 and 11.4 provide convincing evidence of the benefits (as far as reducing standard errors in parameter estimates is concerned) of designing experiments in which extreme x values are favoured. This finding has a sound foundation based on statistical principles. For example, the standard error in the estimate of the slope is related to the x values, xi, by,86

σb = σ/[Σ(xi − x̄)^2]^(1/2) (11.2)

86 See Devore, 1991.


where x̄ is the mean of the x values and σ is the standard deviation of the experimental y values given by,

σ = [Σ(yi − ŷi)^2/(n − 2)]^(1/2) (11.3)

where n is the number of data. Equation 11.2 indicates that, for a fixed σ, σb becomes smaller for large deviations of x from the mean, i.e. for large values of |xi − x̄|. It is worth emphasising that the reduction of the standard errors and improved R2 are secured at some cost. What if the underlying relationship between x and y is not linear? Gathering data at two extremes of x assumes that the data are linearly related, and there is no way to test the validity of this assumption with data gathered in this manner.

11.2 Monte Carlo simulation and non-linear least squares

Let us now consider a situation requiring fitting by non-linear least squares. The equation to be fitted to data is given by,

y = A1exp(B1x) + A2exp(B2x) (11.4)

We choose (arbitrarily), A1 = 50, A2 = 50, B1 = -0.025 and B2 = -0.010. Fifty values of y are generated in the range x = 1 to x = 200. A graph of the noise free data, with y values calculated at equal increments of x beginning at x = 1, is shown in figure 11.5.

Figure 11.5: Noise free data generated using equation 11.4.



Next, noise is added with mean of zero and a constant standard deviation of unity (again chosen arbitrarily). The question arises: what values of x should be chosen to obtain estimates of A1, A2 etc which have the smallest standard errors? With normally distributed noise of zero mean and standard deviation of unity added to the y values, the graph looks typically like that shown in figure 11.6.

Figure 11.6: x – y data as shown in figure 11.5 with noise added.

50 replicate data sets were generated with noise added to the y values in figure 11.5. Upon the generation of each set, an equation of the form,

y = a1exp(b1x) + a2exp(b2x) (11.5)

where a1, a2, b1 and b2 are estimates of A1, A2, B1 and B2 respectively, was fitted to the data using non-linear least squares. Note that starting values for the non-linear fit, which are very important when fitting a function consisting of a sum of exponentials, were a1 = 50, a2 = 50, b1 = -0.025 and b2 = -0.010. Figure 11.7 shows a histogram of the a1 parameter estimates.
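When Solver performs this fit, the target cell holds the sum of squares of residuals, SSR, for equation 11.5. The hypothetical Python sketch below (seed and names are illustrative assumptions) generates one simulated data set with 50 equally spaced x values, and evaluates the SSR that Solver would be asked to minimise:

```python
import math
import random

def model(x, a1, b1, a2, b2):
    # Equation 11.5: a sum of two exponentials.
    return a1 * math.exp(b1 * x) + a2 * math.exp(b2 * x)

def ssr(params, data):
    # Sum of squares of residuals: the quantity Solver minimises.
    return sum((y - model(x, *params)) ** 2 for x, y in data)

random.seed(3)                                       # arbitrary
TRUE = (50.0, -0.025, 50.0, -0.010)
xs = [1.0 + i * (199.0 / 49.0) for i in range(50)]   # x = 1 to 200
data = [(x, model(x, *TRUE) + random.gauss(0.0, 1.0)) for x in xs]
print(ssr(TRUE, data))   # SSR evaluated at the true parameter values
```

Starting Solver from a1 = 50, a2 = 50, b1 = -0.025 and b2 = -0.010 then amounts to minimising this quantity over the four parameters.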



Figure 11.7: Distribution of parameter estimate a1.

Exercise 7
An alternative sampling regime to that used in section 11.2 is to choose smaller sample intervals in the region where the y values are changing most rapidly with x. A sampling regime that has this characteristic is given by,

xi = (1/λ)ln[(N + 1)/(N + 1 − i)] (11.6)

N is the total number of data, and λ is a constant which is determined by letting xi equal the maximum x value when i = N. Repeat the example given in section 11.2 (i.e. use the same starting equation and distribution of errors) using the new sampling regime described by equation 11.6 and perform 50 replicates. Plot a histogram of the distribution of the parameter estimate a1.

a) Is the standard deviation of the parameter estimates, a1, less than that obtained in section 11.2 when the xi values were evenly distributed?

b) Carry out an F test to compare the variances of the distribution of a1 obtained using both sampling regimes to establish if the difference in the characteristic width is statistically significant87.
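The sampling points themselves are easy to generate. The sketch below assumes the reading of equation 11.6 in which λ is fixed by requiring xi to equal the maximum x value at i = N (a hypothetical Python rendering, with N = 50 and a maximum x of 200 as in section 11.2):

```python
import math

def log_sampling(N, x_max):
    # Equation 11.6 with lam chosen so that x_N equals x_max.
    lam = math.log(N + 1) / x_max
    return [math.log((N + 1) / (N + 1 - i)) / lam for i in range(1, N + 1)]

xs = log_sampling(50, 200.0)
print(xs[0], xs[-1])   # spacing is finest at small x; the last point is 200
```

The intervals grow with i, so the region of rapid change at small x is sampled most densely, as the exercise intends.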

Exercise 8
In an experiment to determine the wavelength, λ, of an ultrasonic wave, the phenomenon of interference of waves from two sources of ultrasound is exploited.

87 See pages 342 to 346, and pages 369 to 371 in Kirkup (2002).

[Histogram for figure 11.7: frequency (0 to 16) versus parameter estimate, a1 (20 to 90)]


The relationship between the separation, y, between two successive maxima of the interfering waves and the separation of the sources of the waves, d, is given by:

y = λD/d (11.7)

where D is a constant.

Equation 11.7 is of the form y = bx, where x ≡ 1/d

and b ≡ λD.

How may values of d be chosen so as to minimise the standard error in the slope, b?

Simulation
Two approaches for choosing values of d are to be compared. The first generates y values as d is increased from 1 to 20 cm in steps of 1 cm. The other generates y values as the ratio 1/d is increased from 1/20 to 1 (i.e. 0.05 to 1). Taking λ equal to 0.76 cm and D = 50 cm, generate simulated values of y using equation 11.7:

a) for d = 1 to 20 cm, in steps of 1 cm.
b) for 1/d = 0.05 cm-1 to 1 cm-1, in steps of 0.05 cm-1.

For data generated by both methods a) and b), use Excel’s Random Number Generator to add normally distributed noise with mean of zero and standard deviation of unity.

Analysis
Use the LINEST() function in Excel to find the slope of the best line through the origin for the two sets of data generated. Replicate the simulation and least squares analysis fifty times and construct a histogram showing the distribution of the best estimates of the slope based on both distributions of x values.

Questions
Is there an obvious difference between the distributions of the parameter estimates based on the two sampling regimes?

a) Support your answer to the question above using an F test to compare the variances in the slope estimates.
b) Do you foresee any practical problems when a ‘real’ experiment is to be carried out using either sampling regime?
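For a line forced through the origin, the least squares slope has the closed form b = Σxiyi/Σxi^2, which is what LINEST() returns when the constant is set to zero. A hypothetical Python sketch of sampling regime a) (seed and names are illustrative assumptions):

```python
import random

def slope_through_origin(xs, ys):
    # Least squares slope for y = b*x (line forced through the origin).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

random.seed(11)                           # arbitrary
LAM, D = 0.76, 50.0                       # lambda = 0.76 cm, D = 50 cm
xs = [1.0 / d for d in range(1, 21)]      # regime a): d = 1..20 cm, x = 1/d
ys = [LAM * D * x + random.gauss(0.0, 1.0) for x in xs]
print(slope_through_origin(xs, ys))       # estimate of b = lambda*D = 38
```

Repeating the last two lines fifty times for each regime gives the two histograms the exercise asks for.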


11.3 Adding heteroscedastic noise using Excel’s Random Number Generator

When the standard deviation of measurement is not constant, but instead depends on the x value, the distribution of errors is said to be heteroscedastic. As far as fitting an equation to data using least squares is concerned, it is then necessary to use weighted fitting88. Heteroscedasticity may be revealed by plotting the residuals versus x, as shown in figure 11.8.

Figure 11.8: Residuals indicating a weighted fit is required. The trend of large residuals to small (or small to large) as x increases is a strong indication of a heteroscedastic error distribution.

Though heteroscedasticity may be revealed by a plot of residuals, the nature of the heteroscedasticity is not always clear. For example, when the dominant source of error is instrumental, it is common for the error, ei, to be proportional to the magnitude of the response, yi. We can use a Monte Carlo simulation to study the effect of heteroscedasticity, and to establish (for example) the consequences of fitting an equation to data with heteroscedastic errors using both unweighted and weighted least squares. We begin with an (arbitrary) equation from which we generate ‘noise free’ data. The equation is:

y = 2 – 4x (11.8)

Sheet 11.1 shows noise free data generated in the range x = 1 to x = 10, with noise generated using the Random Number Generator in Excel. In the C column there are normally distributed numbers with mean equal to zero and standard deviation equal to one. The values in the D column also have a normal distribution, but the standard deviation of the distribution at each x value depends on the magnitude of the y value in column B. Specifically, the standard deviation, σi, is given by:

σi = 0.1 × yi (11.9)

88 See page 264 of Kirkup (2002).
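The B to D columns of sheet 11.1 can be mirrored line by line; in the hypothetical Python sketch below (seed arbitrary), random.gauss stands in for the Random Number Generator and the hetero_noise column is formed exactly as the spreadsheet does, using the equation 11.9 scaling:

```python
import random

random.seed(7)   # arbitrary
rows = []
for x in range(1, 11):
    y_clean = 2 - 4 * x                  # equation 11.8, column B
    homo = random.gauss(0.0, 1.0)        # column C: unit-sigma noise
    hetero = 0.1 * y_clean * homo        # column D: equation 11.9 scaling
    rows.append((x, y_clean, homo, hetero, y_clean + hetero))
print(rows[0])   # (x, ynoise_free, homo_noise, hetero_noise, y)
```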



     A    B            C           D             E
1    x    ynoise_free  homo_noise  hetero_noise  y
2    1    -2           -1.0787     =0.1*B2*C2    =B2+D2
3    2    -6           -0.5726
4    3    -10          1.15598
5    4    -14          -0.0725
6    5    -18          0.67552
7    6    -22          0.44338
8    7    -26          -0.5806
9    8    -30          1.23376
10   9    -34          0.18546
11   10   -38          2.34588

Sheet 11.1: Generating data with heteroscedastic noise. ‘Experimental’ data appear in column E.

     A    B            C           D             E
1    x    ynoise_free  homo_noise  hetero_noise  y
2    1    -2           -1.0787     0.21575       -1.78425
3    2    -6           -0.5726     0.34356       -5.65644
4    3    -10          1.15598     -1.15598      -11.156
5    4    -14          -0.0725     0.10146       -13.8985
6    5    -18          0.67552     -1.21594      -19.2159
7    6    -22          0.44338     -0.97544      -22.9754
8    7    -26          -0.5806     1.50944       -24.4906
9    8    -30          1.23376     -3.70128      -33.7013
10   9    -34          0.18546     -0.63055      -34.6306
11   10   -38          2.34588     -8.91434      -46.9143

Sheet 11.2: Completed spreadsheet based on values in sheet 11.1.

Figure 11.9 shows a plot of y versus x based on the generated data in sheet 11.2. The line of best fit on the graph was found using Excel’s Trendline option and therefore represents an unweighted fit of the equation y = a + bx to the data. Figure 11.10 shows the (unweighted) y residuals plotted versus x. The trend in the residuals indicates that the errors have a heteroscedastic distribution and therefore weighted fitting is required.

Figure 11.9: Plot of x – y data as generated by sheet 11.1. The line of best fit (found using unweighted least squares) is shown on the graph.

[Fitted line: y = -4.5894x + 3.7994]


Figure 11.10: Distribution of residuals when an unweighted fit is carried out.

In order to compare unweighted and weighted fitting of heteroscedastic data, fifty sets of heteroscedastic data were generated in the manner described above. An equation of the form:

y = a + bx (11.10)

was fitted to the simulated data. Unweighted fitting was performed using the LINEST() function in Excel. Weighted fitting was performed with the aid of Solver89, where the weighting was chosen so that the standard deviation in the ith value was taken to be proportional to yi, i.e.,

σi ∝ yi (11.11)

Figures 11.11 and 11.12 compare the scatter in the parameter estimates when unweighted and weighted fitting is performed on data which are heteroscedastic. It is clear from both figures that the weighted fit produces a much narrower distribution of parameter estimates and is therefore preferred over the unweighted fit.

89 Note that the equation being fitted is linear in the parameters and so fitting can be accomplished using weighted linear least squares. However, as Excel does not possess an option that allows for easy fitting in this manner, it is easier to construct a spreadsheet that minimises (using Solver) the sum of squares of the weighted residuals, SSR, where: SSR = Σ[(yi − ŷi)/yi]^2.
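Because equation 11.10 is linear in a and b, a weighted sum of this kind can also be minimised in closed form using weights wi = 1/yi^2. The hypothetical Python sketch below (seed arbitrary) is not the Solver spreadsheet, but solves the same weighted normal equations:

```python
import random

def weighted_fit(xs, ys, ws):
    # Weighted linear least squares: minimise sum(w_i*(y_i - a - b*x_i)^2).
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx ** 2
    b = (sw * swxy - swx * swy) / det
    a = (swxx * swy - swx * swxy) / det
    return a, b

random.seed(5)   # arbitrary
xs = list(range(1, 11))
ys = []
for x in xs:
    y_clean = 2 - 4 * x                                    # equation 11.8
    ys.append(y_clean + 0.1 * y_clean * random.gauss(0.0, 1.0))
a_w, b_w = weighted_fit(xs, ys, [1.0 / y ** 2 for y in ys])   # w_i = 1/y_i^2
print(a_w, b_w)   # weighted estimates of a = 2 and b = -4
```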



[Histogram: frequency (0 to 35) versus parameter estimate, a (-1.5 to 4.5); series: unweighted, weighted]

Figure 11.11: Distribution of the parameter estimate, a, when unweighted and weighted fitting is carried out on fifty data sets.

[Histogram: frequency (0 to 25) versus parameter estimate, b (-4.6 to -3.2); series: unweighted, weighted]

Figure 11.12: Distribution of the parameter estimate, b, when unweighted and weighted fitting is carried out on fifty data sets.

Exercise 9
a) Use equation 11.8 to generate y values for x = 1, 2, 3 etc. up to x = 10.
b) Add normally distributed homoscedastic noise with mean of zero and standard deviation of unity to the y values generated in part a).
c) Fit equation 11.10 to the data using both unweighted and weighted least squares. For the weighted fit, assume that the relationship for the standard deviation in the y values given by equation 11.11 is valid.
d) Repeat part c) at least 40 times. Construct histograms of the scatter in both a and b for both weighted and unweighted fitting.
e) Calculate the mean and standard deviation of the a and b values you obtained in part d).
f) Is unweighted fitting by least squares demonstrably better than weighted fitting in this example?


Section 12: Review

This document focuses primarily on fitting equations to data using the technique of non-linear least squares. In particular, the use of the Solver tool packaged with Excel has been considered, and how it may be employed for non-linear least squares fitting. For completeness, some discussion of linear least squares has been included, along with the circumstances under which linear least squares is no longer viable.

A most important aspect of fitting equations to data is to be able to determine standard errors in the estimates made of any parameters appearing in an equation. Solver does not provide standard errors, so this document describes the means by which standard errors can be calculated using an Excel spreadsheet. An advantage of employing Excel is that some aspects of fitting by non-linear least squares which are normally hidden from view when using a conventional computer based statistics package can be made visible with Excel. I hope this leads to a deeper appreciation of non-linear least squares than simply entering numbers into a stats package and waiting for the fruits of the analysis to emerge.

Some general issues relating to fitting by non-linear least squares have been discussed, such as the existence of local minima in SSR and means by which good starting values may be established in advance of fitting. We have also considered briefly how equations fitted to data can be compared in order to determine which equation is the ‘better’ in a statistical sense, while at the same time emphasising that any equation fitted to data should be supported on a foundation of sound physical and/or chemical principles.

This document is not yet complete. I would like to include something in the future about identifying and treating outliers as well as points of ‘high leverage’.
Acknowledgements I would like to express my sincere thanks to Dr Mary Mulholland of the Faculty of Science at UTS and Dr Paul Swift (formerly of the same Faculty) for suggesting examples from chemistry and physics that may be usefully treated using non-linear least squares. From Luton University I acknowledge the assistance and encouragement of Professor David Rawson, Dr Barry Haggert and Dr John Dilleen. I thank my good friends John Harbottle and Peter Rowley for their excellent hospitality while I was in the UK in 2002 preparing some of this material. I also acknowledge a timely communication from Dr Marcel Maeder of Newcastle University (New South Wales) who queried the omission of Excel’s Solver from my book. I am grateful to Dr Maeder, as his query provided the spur to create this document. Finally, I thank the following organisations where parts of this document were prepared: University of Technology, Sydney, University of Paisley, UK, University of Luton, UK, and CSIRO, Lindfield, Australia.


Problems

1. Standard addition analysis is routinely used to establish the composition of a sample. In order to establish the concentration of Fe3+ in water, solutions containing known concentrations of Fe3+ were added to water samples90. The absorbance of each solution, y, was determined for each concentration of added solution, x. The absorbance/concentration data are shown in table P1.

Concentration (ppm), x    Absorbance (arbitrary units), y
0                         0.240
5.55                      0.437
11.10                     0.621
16.65                     0.809
22.20                     1.009

Table P1: Data for problem 1.

The relationship between absorbance, y, and concentration, x, may be written,

y = B(x − xC) (P1)

where B is the slope of the line of y versus x. xC is the intercept on the x axis, which represents the concentration of Fe3+ in the water before additions are made. Use non-linear least squares to fit equation P1 to the data in table P1. Determine,

a) best estimates of B and xC [0.03441 ppm-1, -7.009 ppm]
b) standard errors in B and xC [0.000277 ppm-1, 0.159 ppm].

2. Another way to analyse the data in table P1 is to write, y = A + Bx (P2) Here A is the intercept on the y axis at x = 0, and B is the slope. The intercept on the x axis, xC (found by setting y = 0 in equation P2) is given by,

xC = -A/B (P3)

Use linear least squares to fit equation P2 to the data in table P1. Determine,

a) best estimates of A, B and xC [0.2412, 0.03441 ppm-1, -7.009 ppm]
b) standard errors in the best estimates of A, B and xC [0.00376, 0.000277 ppm-1, 0.159 ppm].
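The numbers quoted in problems 1 and 2 can be checked with a few lines of code; the hypothetical Python sketch below fits equation P2 to the table P1 data by linear least squares and forms xC = -A/B as in equation P3:

```python
def fit_line(xs, ys):
    # Unweighted linear least squares for y = A + B*x.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    B = sxy / sxx
    return ybar - B * xbar, B

conc = [0.0, 5.55, 11.10, 16.65, 22.20]       # table P1, ppm
absb = [0.240, 0.437, 0.621, 0.809, 1.009]    # absorbance
A, B = fit_line(conc, absb)
x_c = -A / B        # intercept on the x axis, equation P3
print(A, B, x_c)    # approximately 0.2412, 0.03441, -7.009
```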

90 This problem is adapted from Skoog and Leary (1992).


Note that the errors in the best estimate of slope and intercept in equation P2 are correlated and so the normal ‘propagation of uncertainties’ method is not valid when calculating xC (see section 8.1).

3. In a study of first order kinetics, the volume of titrant required, V(t), to reach the end point of a reaction is measured as a function of time, t. The following data were obtained91.

t (s)    V(t) (ml)
145      4.0
314      7.6
638      12.2
901      15.6
1228     18.6
1691     21.6
2163     24.0
2464     24.8

Table P2: Data for problem 3.

The relationship between V and t can be written,

V(t) = V∞ − (V∞ − V0)exp(−kt) (P4)

where k is the rate constant. V∞ and V0 are also constants. Using non-linear least squares, fit equation P4 to the data in table P2. Determine,

a) best estimates of V∞, V0 and k [28.22 ml, 0.9906 ml, 0.0008469 s-1]
b) standard errors in the estimates of V∞, V0 and k [0.377 ml, 0.216 ml, 3.00 × 10-5 s-1]

91 These data were taken from Denton (2000).


4. Table P3 contains data obtained from a simulation of a chemical reaction in which noise of constant variance has been added to the data.92

Time, t (s)    Concentration, C (mol/l)
0              0.01000
20000          0.00862
40000          0.00780
60000          0.00687
80000          0.00648
100000         0.00595
120000         0.00536
140000         0.00507
160000         0.00517
180000         0.00450
200000         0.00482
220000         0.00414
240000         0.00359
260000         0.00354
280000         0.00324
300000         0.00333
320000         0.00309
340000         0.00285
360000         0.00349
380000         0.00273
400000         0.00271

Table P3: Simulated data taken from Zielinski and Allendoerfer (1997).

Assuming that the relationship between concentration, C, and time, t, can be written93,

C = C0/(1 + C0kt) (P5)

C0 is the concentration at t = 0 and k is the second order rate constant. Fit equation P5 to the data in table P3 to obtain best estimates for C0 and k and standard errors in the best estimates. [0.009852 mol/l, 0.0006622 l/mol⋅s, 0.00167 mol/l, 1.98 × 10-5 l/mol⋅s]

5. Table P4 gives the temperature dependence of the energy gap of high purity crystalline silicon. The variation of energy gap with temperature can be represented by the equation,

92 See Zielinski and Allendoerfer (1997). 93 The assumption is made that a second order kinetics model can represent the reaction.


Eg(T) = Eg(0) − αT^2/(β + T) (P6)

T (K)    Eg(T) (eV)
20       1.1696
40       1.1686
60       1.1675
80       1.1657
100      1.1639
120      1.1608
140      1.1579
160      1.1546
180      1.1513
200      1.1474
220      1.1436
240      1.1392
260      1.1346
280      1.1294
300      1.1247
320      1.1196
340      1.1141
360      1.1087
380      1.1028
400      1.0970
420      1.0908
440      1.0849
460      1.0786
480      1.0723
500      1.0660
520      1.0595

Table P4: Energy gap versus temperature data.

In equation P6, Eg(0) is the energy gap at absolute zero and α and β are constants. Fit equation P6 to the data in table P4 to find best estimates of Eg(0), α and β, as well as standard errors in the estimates. Use starting values 1.1, 0.0004 and 600 respectively for the estimates of Eg(0), α and β. [1.170 eV, 0.0004832 K-1, 662 K, 7.8 × 10-5 eV, 4.7 × 10-6 K-1, 11 K]


6. In an experiment to study phytoestrogens in Soya beans, an HPLC system was calibrated using known concentrations of the phytoestrogen, biochanin. Table P5 contains data of the area under the chromatograph absorption peak as a function of biochanin concentration.

Conc. (x) (mg/l)    Area (y) (arbitrary units)
0.158               0.121342
0.158               0.121109
0.315               0.403550
0.315               0.415226
0.315               0.399678
0.631               1.839583
0.631               1.835114
0.631               1.835915
1.261               3.840554
1.261               3.846146
1.261               3.825760
2.522               8.523561
2.522               8.539992
2.522               8.485319
5.045               16.80701
5.045               16.69860
5.045               16.68172
10.09               34.06871
10.09               33.91678
10.09               33.70727

Table P5: HPLC data for biochanin.

A comparison is to be made of two equations fitted to the data in table P5. The equations are,

y = A + Bx (P7)

and

y = A + Bx^C (P8)

Assuming an unweighted fit is appropriate, fit equations P7 and P8 to the data in table P5. For each equation fitted to the data, calculate the,

a) best estimates of parameters [-0.4021, 3.404 (mg/l)-1, -0.5650, 3.581 (mg/l)-0.979, 0.9790]

b) standard errors in estimates [0.0575, 0.0127 (mg/l)-1, 0.0885, 0.0790 (mg/l)-0.979, 0.00903]

c) sum of squares of residuals (SSR) [0.6652, 0.5074]
d) Akaike’s information criterion [-4.15, -7.57]
e) residuals. Draw a graph of residuals versus concentration.

Which equation better fits the data?


7. The relationship between critical current, Ic, and temperature, T, for a high temperature superconductor can be written,

Ic = 1.471A(1 − T/Tc)^(1/2) tanh[0.5341B(Tc/T)(1 − T/Tc)^(1/2)] (P9)

where A and B are constants and Tc is the critical temperature of the superconductor. For a high temperature superconductor with a Tc equal to 90.1 K, the following data for critical current and temperature were obtained:

T (K)    I (mA)
5        5212
10       5373
15       5203
20       4987
25       4686
30       4594
35       4245
40       4091
45       3861
50       3785
55       3533
60       3199
65       2903
70       2611
75       2279
80       1831
85       1098
90       29

Table P6: Critical current versus temperature data for a high temperature superconductor with critical temperature of 90.1 K.

Fit equation P9 to the data in table P6 to obtain best estimates for the parameters A and B and standard errors in the best estimates. [3199 mA, 11.7, 17.0 mA, 1.79]

8. A sensor developed to measure the electrical conductivity of salt solutions is calibrated using solutions of sodium chloride of known conductivity, σ. Table P7 contains data of signal output, V, of the sensor as a function of conductivity.


σ (mS/cm)    V (volts)
1.504        6.77
2.370        7.24
4.088        7.61
7.465        7.92
10.764       8.06
13.987       8.14
14.781       8.15
17.132       8.19
24.658       8.27
31.700       8.31
38.256       8.34

Table P7: Signal output from sensor as a function of electrical conductivity.

Assume that the relationship between V and σ is,

V = Vs + k[1 − exp(−ασ)] (P10)

where Vs, k and α are constants. Use unweighted non-linear least squares to determine best estimates of the constants and standard errors in the best estimates. [8.689 V, 1.460 V, 0.4281, 0.0190 V, 0.00740 V, 0.0108]

9. In a study of the propagation of an electromagnetic wave through a porous solid, the variation of relative permittivity, εr, of the solid was measured as a function of moisture content, νw (expressed as a fraction). Table P8 contains the data obtained in the experiment94.

νw       εr
0.128    8.52
0.116    7.95
0.100    7.65
0.095    7.55
0.077    7.08
0.065    6.82
0.056    6.55
0.047    6.42
0.035    5.97
0.031    5.81
0.025    5.69
0.022    5.55
0.017    5.38
0.013    5.26
0.004    5.08

Table P8: Variation of relative permittivity with moisture content.

Assume the relationship between εr and νw can be written,

94 Francois Malan 2002 (private communication).


εr = νw^2(√εw − √εm)^2 + 2νw(√εw − √εm)√εm + εm (P11)

where εw is the relative permittivity of water and εm is the relative permittivity of the (dry) porous material. Use (unweighted) non-linear least squares to fit equation P11 to the data in table P8 and hence obtain best estimates of εw and εm and standard errors in the best estimates. [55.44, 1.83, 5.067, 0.043]

10. Unweighted least squares requires the minimisation of SSR given by,

SSR = Σ(yi − ŷi)^2 (P12)

A technique sometimes adopted when optimising parameters in optical design situations is to minimise S4R, where,

S4R = Σ(yi − ŷi)^4 (P13)

Perform a Monte Carlo simulation to compare parameter estimates obtained when equations P12 and P13 are used to fit an equation of the form y = a + bx to simulated data. More specifically,

a) Use the function y = 2.1 – 0.4x to generate y values for x = 1, 2, 3 etc. up to x = 20.

b) Add normally distributed noise of mean equal to zero and standard deviation of 0.5 to the values generated in part a).

c) Find best estimates of a and b by minimising SSR and S4R as given by equations P12 and P13. (Suggestion: Solver may be used to minimise both SSR and S4R.)

d) Repeat steps b) and c) until 50 sets of parameter estimates have been obtained using equation P12 and P13.

e) Is there any significant difference between the parameter estimates obtained when minimising SSR and S4R?

f) Is there any significant difference between the variance in the parameter estimates when minimising SSR and S4R?


11. In section 10.1, the relationship between free fall acceleration, g(h) and height, h, was written:

g(h) = g0/(1 + h/R)^2 (P14)

To study the validity of equation P14, low noise data of free fall acceleration are gathered over a range of values of height, h. For h values small compared to the radius of the Earth, R, the acceleration will decrease almost linearly with height. Applying the binomial expansion to equation P14, we obtain the first order approximation,

g(h) = g_0(1 − 2h/R) (P15)
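The step from equation P14 to equation P15 uses the binomial expansion (1 + u)^−2 ≈ 1 − 2u for small u; as a sketch:

```latex
g(h) = g_0\left(1+\frac{h}{R}\right)^{-2}
     = g_0\left(1 - \frac{2h}{R} + \frac{3h^2}{R^2} - \cdots\right)
     \approx g_0\left(1 - \frac{2h}{R}\right), \qquad h \ll R
```

Retaining only the term linear in h/R gives equation P15.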

Contained in table P9 are data of the variation of acceleration with height above the Earth’s surface.

h (km)   g (m/s^2)
1000     7.33
2000     5.68
3000     4.53
4000     3.70
5000     3.08
6000     2.60
7000     2.23
8000     1.93
9000     1.69
10000    1.49

Table P9: Variation of acceleration due to gravity with height.

a) Use least squares to fit both equations P14 and P15 to the data in table P9 and determine best estimates for g_0 and R.

b) Calculate standard errors in the best estimates.

c) Calculate and plot the residuals for each equation fitted to the data in table P9.

d) Is equation P15 a reasonable approximation to equation P14 over the range of h values in table P9?

12. The electrical resistance, r, of a particular material at a temperature, T, may be described by,

r = A + BT (P16)

or

r = α + βT + γT^2 (P17)


where A, B, α, β, and γ are constants. Table P10 shows the variation of the resistance of an alloy with temperature.

Table P10: Resistance versus temperature data for an alloy

T (K)   r (Ω)
150     19.5
160     18.4
170     20.2
180     20.1
190     20.9
200     20.8
210     21.2
220     21.8
230     21.9
240     23.6
250     23.2
260     23.9
270     23.2
280     24.1
290     24.2
300     26.3
310     25.5
320     26.1
330     26.3
340     27.1
350     28.0

Using (unweighted) linear least squares, fit both equations P16 and P17 to the data in table P10 and determine for each equation,

a) estimates for the parameters [12.41 Ω and 4.30 × 10^-2 Ω/K; 14.0 Ω, 3.0 × 10^-2 Ω/K and 2.7 × 10^-5 Ω/K^2]

b) the standard error in each estimate [0.49 Ω and 0.19 × 10^-2 Ω/K; 2.2 Ω, 1.8 × 10^-2 Ω/K and 3.6 × 10^-5 Ω/K^2]

c) the standard deviation, σ, in each y value [0.5304 Ω, 0.5368 Ω]

d) the sum of squares of the residuals, SSR [5.344 Ω^2, 5.186 Ω^2]

e) the Akaike information criterion, AIC [39.20, 40.57]
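Both P16 and P17 are linear in their parameters, so ordinary least squares applies directly. The following sketch (using numpy rather than Solver, which is an assumption of this example) builds the two design matrices, solves for the parameter estimates, and computes the standard errors, SSR, σ, and AIC; the AIC is taken here as N ln(SSR) + 2P, the form that reproduces the bracketed answers.

```python
import numpy as np

# Table P10 data: resistance of an alloy versus temperature.
T = np.arange(150.0, 351.0, 10.0)
r = np.array([19.5, 18.4, 20.2, 20.1, 20.9, 20.8, 21.2, 21.8, 21.9, 23.6, 23.2,
              23.9, 23.2, 24.1, 24.2, 26.3, 25.5, 26.1, 26.3, 27.1, 28.0])

def linear_fit(X, y):
    """Unweighted linear least squares: estimates, standard errors, SSR, sigma, AIC."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # parameter estimates
    resid = y - X @ beta
    ssr = float(resid @ resid)                         # sum of squares of residuals
    sigma = np.sqrt(ssr / (n - p))                     # std deviation of y about the fit
    cov = sigma**2 * np.linalg.inv(X.T @ X)            # variance-covariance matrix
    se = np.sqrt(np.diag(cov))                         # standard errors in estimates
    aic = n * np.log(ssr) + 2 * p                      # AIC in the form N ln(SSR) + 2P
    return beta, se, ssr, sigma, aic

X16 = np.column_stack([np.ones_like(T), T])            # r = A + B*T          (P16)
X17 = np.column_stack([np.ones_like(T), T, T**2])      # r = a + b*T + c*T^2  (P17)

for name, X in (("P16", X16), ("P17", X17)):
    beta, se, ssr, sigma, aic = linear_fit(X, r)
    print(name, beta, se, f"SSR={ssr:.3f} sigma={sigma:.4f} AIC={aic:.2f}")
```

Running this reproduces the bracketed answers: for P16, A ≈ 12.41 Ω and B ≈ 4.30 × 10^-2 Ω/K with SSR ≈ 5.344 Ω^2 and AIC ≈ 39.20. The smaller AIC for P16 indicates that the extra quadratic term in P17 is not justified by these data.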


References

Akaike H A new look at the statistical model identification (1974) IEEE Transactions on Automatic Control 19 716-723.
Al-Subaihi A A Variable Selection in Multivariate Regression using SAS/IML (2002) http://www.jstatsoft.org/v07/i12/mv.pdf
Bard Y Nonlinear Parameter Estimation (1974) Academic Press, London.
Bates D M and Watts D G Nonlinear Regression Analysis and its Applications (1988) Wiley, New York.
Bevington P R and Robinson D K Data Reduction and Error Analysis for the Physical Sciences (1992) McGraw-Hill, New York.
Bube R H Photoconductivity of Solids (1960) Wiley, New York.
Cleveland W S The Elements of Graphing Data (1994) Hobart Press, New Jersey.
Conway D G and Ragsdale C T Modeling Optimization Problems in the Unstructured World of Spreadsheets (1997) Omega Int. J. Mgmt. Sci. 25 313-322.
Demas J N Excited State Lifetime Measurements (1983) Academic Press, New York.
Denton P Analysis of First Order Kinetics Using Microsoft Excel Solver (2000) Journal of Chemical Education 77 1524-1525.
Dietrich C R Uncertainty, Calibration and Probability: Statistics of Scientific and Industrial Measurement 2nd edition (1991) Adam Hilger, Bristol.
Frenkel R D Statistical Background to the ISO 'Guide to the Expression of Uncertainty in Measurement' (2002) CSIRO, Sydney p 43.
Fylstra D, Lasdon L, Watson J and Waren A Design and Use of Microsoft Excel Solver (1998) Interfaces 28 29-55.
Karlovsky J Simple Method for calculating the Tunneling Current in an Esaki Diode (1962) Phys. Rev. 127 419.
Katz E, Ogan K L and Scott R P W Peak Dispersion and Mobile Phase Velocity in Liquid Chromatography: The Pertinent Relationship for Porous Silica (1983) J. Chromatogr. 270 51-75.
Kennedy G J and Knox J H Performance of packings in high performance liquid chromatography. 1. Porous and surface layer supports (1972) J. Chromatogr. Sci. 10 549-556.
Kirkup L Data Analysis with Excel: An Introduction for Physical Scientists (2002) Cambridge University Press, Cambridge.


Kirkup L and Cherry I Temperature Dependence of Photoconductive Decay in Sintered Cadmium Sulphide (1988) Eur. J. Phys. 9 64-68.
Kirkup L and Sutherland J Curve Stripping and Non-Linear Fitting of Polyexponential Functions to Data using a Microcomputer (1988) Comp. in Phys. 2 64-68.
Moody H W The Evaluation of the Parameters in the Van Deemter Equation (1982) Journal of Chemical Education 59 290-291.
Neter J, Kutner M J, Nachtsheim C J and Wasserman W Applied Linear Regression Models (1996) Times Mirror Higher Education Group Inc.
Nielsen-Kudsk F A Microcomputer Program in Basic for Iterative, Non-Linear Data-Fitting to Pharmacokinetic Functions (1983) Int. J. Bio-Med. Comput. 14 95-107.
Nocedal J Numerical Optimization (1999) Springer, New York.
Perry A A Modified Conjugate Gradient Algorithm (1978) Operations Research 26 1073-1078.
Safizadeh M and Signorile R Optimization of Simulation via Quasi-Newton Methods (1994) ORSA J. Comput. 6 398-408.
Salter C Error Analysis using the Variance-Covariance Matrix (2000) Journal of Chemical Education 77 1239-1243.
Skoog D A and Leary J J Principles of Instrumental Analysis 4th edition (1992) Harcourt Brace, Fort Worth.
Smith S and Lasdon L Solving Large Sparse Nonlinear Programs Using GRG (1992) Journal on Computing 4 2-16.
Snyder L R, Kirkland J J and Glajch J L Practical HPLC Method Development 2nd edition (1997) Wiley, New York.
Walker J S Physics (2002) Prentice Hall, New Jersey.
Walkenbach J Excel 2002 Power Programming with VBA (2001) M&T Books, New York.
Walsh S and Diamond D Non-linear Curve Fitting Using Microsoft Excel Solver (1995) Talanta 42 561-572.
Williams I P Matrices for Scientists (1972) Hutchinson University Library, London.
Wolsey L A Integer Programming (1998) Wiley, New York.
Zielinski T J and Allendoerfer R D Least Squares Fitting of Nonlinear Data in the Undergraduate Laboratory (1997) Journal of Chemical Education 74 1001-1007.