Functions for Easier Curve Fitting
An overview of empirical relations that can be used to fit your data
Francis X. McConville, Impact Technology Consultants
Engineers often need to fit experimental data to an empirical relationship for extrapolation or modeling without resorting to a full mathematical treatment based on physical principles or theory. The most widely used platform in science for doing this is MS Excel, and while its "off-the-shelf" Trendline tool is useful, it is limited to only 4 or 5 simple functions. Excel's Solver add-in, on the other hand, offers a simple, powerful means to fit data to user-defined functions.
[Figure 1 panel: Shifted reciprocal, $y = 1/(a - x)$]
There are numerous commercial packages for data fitting and statistical analysis [1], but Excel is ubiquitous, and as E. G. John [2] aptly puts it, the use of Solver for data fitting is "simplicity itself". More recently, in this publication, du Plessis [3] further extolled the virtues of Excel's Solver function and described in detail how to use it for this purpose. For a synopsis, see the box "A quick look at using Excel Solver".
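The Solver approach amounts to adjusting the parameters of a user-defined function until the sum of squared differences between calculated and measured y values is as small as possible. As a rough illustration outside Excel, the Python sketch below does this for a two-parameter hyperbola, y = ax/(b + x). The data are synthetic, the names `sum_sq` and `pattern_search` are hypothetical, and a basic compass (pattern) search is assumed as a stand-in for Solver's own optimizer, which is not described here.

```python
def hyperbola(x, a, b):
    """Two-parameter saturation-growth form, y = a*x / (b + x)."""
    return a * x / (b + x)


def sum_sq(func, params, xs, ys):
    """Sum of squared residuals (calculated y minus measured y)."""
    return sum((func(x, *params) - y) ** 2 for x, y in zip(xs, ys))


def pattern_search(func, start, xs, ys, step=0.5, tol=1e-9, max_iter=100_000):
    """Adjust the parameters one at a time to reduce the sum of squares.

    A basic compass search, used here only as a stand-in for Solver's
    optimizer. The step is halved whenever no single move helps.
    """
    params = list(start)
    best = sum_sq(func, params, xs, ys)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                score = sum_sq(func, trial, xs, ys)
                if score < best:
                    params, best, improved = trial, score, True
                    break
        if not improved:
            step /= 2
    return params, best


# Synthetic, noise-free data generated from a = 2, b = 3 (illustrative only).
xs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
ys = [hyperbola(x, 2.0, 3.0) for x in xs]

# Starting values are guesses; sensible guesses make the search more reliable.
(a_fit, b_fit), ssr = pattern_search(hyperbola, [1.0, 1.0], xs, ys)
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}, sum of squares = {ssr:.2e}")
```

Swapping in a different fitting function only requires replacing `hyperbola`; the search itself does not depend on the form of the model.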
Once one has identified an appropriate function and achieved a good fit, more-advanced modeling tasks become easier; using calculus, one can obtain derivatives and integrals, and thus rates of change, areas under curves, and so on.
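As a small illustration of this point, suppose a data set has already been fit to the simple exponential form y = a·exp(bx). The sketch below evaluates the analytical derivative and integral of the fitted curve and checks them against crude numerical estimates. The parameter values a = 5.0 and b = -0.3 are made up for the example, and the function names are hypothetical.

```python
import math

A_FIT, B_FIT = 5.0, -0.3   # hypothetical fitted parameters for y = a*exp(b*x)


def y(x, a=A_FIT, b=B_FIT):
    """Fitted curve."""
    return a * math.exp(b * x)


def rate_of_change(x, a=A_FIT, b=B_FIT):
    """Analytical derivative dy/dx = a*b*exp(b*x)."""
    return a * b * math.exp(b * x)


def area_under_curve(x1, x2, a=A_FIT, b=B_FIT):
    """Analytical integral of y from x1 to x2: (a/b)*(exp(b*x2) - exp(b*x1))."""
    return (a / b) * (math.exp(b * x2) - math.exp(b * x1))


def trapezoid(f, x1, x2, n=10_000):
    """Numerical cross-check using the trapezoidal rule with n strips."""
    h = (x2 - x1) / n
    total = 0.5 * (f(x1) + f(x2)) + sum(f(x1 + i * h) for i in range(1, n))
    return total * h


# Central-difference estimate of the slope at x = 2 for comparison.
slope_numeric = (y(2.0 + 1e-6) - y(2.0 - 1e-6)) / 2e-6
print(rate_of_change(2.0), slope_numeric)
print(area_under_curve(0.0, 10.0), trapezoid(y, 0.0, 10.0))
```

With a closed-form fit, the analytical expressions avoid the noise amplification that comes from differentiating raw data directly.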
But selecting the best function for the curve-fitting exercise is never trivial. To simplify the task, this article illustrates a collection of 52 common one-, two- and three-parameter binary functions that cover quite a wide range of behavior and should provide a good starting point. Even the one- and two-parameter relationships can be very versatile in spite of their simplicity. The accompanying figures illustrate the basic curve shapes that these equations generate. The names of well-known functions are included where possible.
[Figure 1 panels, where legible: Simple exponential; Modified power; Pareto; Hyperbolic cosine, $y = a\cosh(x/a)$; $y = a^{1/x}$; Exponential (asymptotic), $y = 1 - e^{-ax}$; Exponential (rootfit)]
FIGURE 1. Equations (1) to (12) are functions with one fitting parameter, a
48 CHEMICAL ENGINEERING WWW.CHE.COM DECEMBER 2008
[Figure 2 panels, where legible: Linear, $y = ax + b$; Reciprocal, $y = 1/(a + bx)$; Exponential, $y = ae^{bx}$; Logarithmic, $y = a + b\ln x$; Rational, $y = a/(b + x)$; Exponential, $y = ae^{b/x}$; Power, $y = ax^{b}$; Hyperbola (saturation growth), $y = ax/(b + x)$; Exponential (asymptotic), $y = a(1 - e^{bx})$; Shifted power, $y = a(1 + x)^{b}$; $y = ax^{b/x}$]
FIGURE 2. Equations (13) to (32) are functions with two fitting parameters, a and b
For the curve-fitting exercise to be most effective, it is important that the data are in the correct form and that all units are consistent. For example, solubility data can often be fit to one of the simple logarithmic functions, but the best results are obtained if solubility is expressed as mole fraction and not weight percent. Thus, some understanding of the underlying principles proves valuable in selecting and properly applying the best empirical model. It is also best to use the minimum number of parameters that will give a good fit, or else the fit may become meaningless.
[Figure 3 panels, where legible: Inverse hyperbola; $y = a + b/x + c/x^{2}$; Logarithmic; Shifted power]
FIGURE 3. Equations (33) to (52) are functions with three fitting parameters, a, b and c
A common technique is linearizing a non-linear equation by rearranging it, thus simplifying the fit process. For example, Equation (9) can be linearized by plotting ln(1 - y) versus x. This results in a straight line with slope -a that passes through the origin. Another example is using Lineweaver-Burk plots to linearize enzyme kinetic data. This technique can work well in many cases, but it tends to distort the experimental error and amplify the effect of outlying data points. Thus, parameters derived in this way may not be as accurate as those obtained by fitting data to the native model. The use of a program like Excel obviates the need to utilize such methods.
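The distortion introduced by linearizing can be seen in a small numerical experiment. The sketch below assumes that Equation (9) is the asymptotic exponential y = 1 - exp(-ax), which is consistent with the ln(1 - y) linearization described above. It uses synthetic data with one slightly perturbed point near the asymptote, and compares the linearized estimate of a with a direct least-squares fit of the native model. The function names, the data, and the golden-section search used for the direct fit are all illustrative choices, not the article's method.

```python
import math

A_TRUE = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
ys = [1.0 - math.exp(-A_TRUE * x) for x in xs]
ys[-1] += 0.015   # small error on the point nearest the asymptote


def fit_linearized(xs, ys):
    """Least-squares line through the origin for ln(1 - y) versus x; slope = -a."""
    num = sum(x * math.log(1.0 - y) for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return -num / den


def sum_sq(a, xs, ys):
    """Sum of squared residuals for the native model y = 1 - exp(-a*x)."""
    return sum((1.0 - math.exp(-a * x) - y) ** 2 for x, y in zip(xs, ys))


def fit_native(xs, ys, lo=0.01, hi=5.0, tol=1e-10):
    """Golden-section search for the a that minimizes the sum of squares.

    Assumes the sum of squares has a single minimum between lo and hi.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if sum_sq(c, xs, ys) < sum_sq(d, xs, ys):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


a_lin = fit_linearized(xs, ys)
a_nat = fit_native(xs, ys)
print(f"true a = {A_TRUE}, linearized a = {a_lin:.4f}, native-model a = {a_nat:.4f}")
```

In this example the logarithm magnifies the small error on the last point, so the linearized estimate lands noticeably further from the true value than the native-model fit does.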
A QUICK LOOK AT USING EXCEL SOLVER
The Solver procedure is usually based on the "sum of least squares" approach. A typical spreadsheet setup for an x-y data set would look like this:
Column A: independent variable (x) data
Column B: experimental dependent variable (y) data
Column C: y values calculated using the curve-fitting function of interest
Column D: the difference between the values in Column C and Column B (residuals)
Column E: the square of the values in Column D
The values in Column E are then summed to generate a "solver parameter". Solver is set up and run to automatically adjust the values of equation parameters a, b and c in such a way as to minimize the value of the "solver parameter" (click Tools>Solver [4]). This generates the values a, b, and c that provide the best possible fit of the data. A simple spreadsheet demonstrating the approach is available at the author's website www.pprbook.com under "Templates".
It is also possible to treat a data set as a bimodal distribution and fit the data to two different functions, applying one function above a certain value and another function below that value. For example, a data set may be very linear up to a certain value of x, after which it exhibits exponential behavior. This approach can help the user achieve a purely empirical fit more quickly when required.
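One way such a two-function model might be set up is sketched below: a single calculated-y function switches between a straight line and an exponential at a chosen breakpoint, so that one sum of squared residuals covers both regions and any minimizer can adjust all four parameters together. The breakpoint `X_BREAK`, the parameter names, and the data are hypothetical and serve only to illustrate the idea.

```python
import math


def piecewise_model(x, x_break, m, c, a, b):
    """Straight line y = m*x + c up to x_break, exponential y = a*exp(b*x) above it."""
    if x <= x_break:
        return m * x + c
    return a * math.exp(b * x)


def sum_sq(params, x_break, xs, ys):
    """One sum of squared residuals covering both regions of the data."""
    return sum((piecewise_model(x, x_break, *params) - y) ** 2
               for x, y in zip(xs, ys))


# Synthetic data: linear up to x = 5, exponential beyond (illustrative only).
X_BREAK = 5.0
TRUE_PARAMS = (2.0, 1.0, 11.0 * math.exp(-2.0), 0.4)   # m, c, a, b
xs = [float(i) for i in range(11)]
ys = [piecewise_model(x, X_BREAK, *TRUE_PARAMS) for x in xs]

ssr_two_functions = sum_sq(TRUE_PARAMS, X_BREAK, xs, ys)
# Forcing the straight line over the whole range: move the break past the last x.
ssr_line_only = sum_sq(TRUE_PARAMS, max(xs) + 1.0, xs, ys)
print(ssr_two_functions, ssr_line_only)
```

The value of a here was chosen so that the two pieces meet at the breakpoint; in a real fit that continuity is not guaranteed unless it is imposed.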
For the various asymptotic models, a quick look at the function should indicate the expected behavior. For example, Equations (7) and (27) can be expected to approach a value of unity as x increases. Equation (20) approaches the value a as x increases when b is negative, but approaches 0 at diminishing values of x when b is positive. And functions such as the inverse hyperbola [Equation (35)] and the Chapman model [Equation (48)] asymptote to 0. Functions such as Box-Lucas [Equation (21)] fit data that reach a peak and then begin to decline.
Some of the models can exhibit unexpected behavior and unusual curve shapes when negative parameters are used. This raises the point that an important part of running the Solver is selecting the starting values of the fit parameters. Where a physical understanding does not provide a framework for selecting starting values, one must resort to trial and error. The values selected must make sense in the physical world. Checking how well the model fits will usually be accomplished first by eye, but can be quantified by using values such as the $R^2$ (goodness of fit) parameter, a unitless fraction that usually ranges in value from 1 for a perfect fit to 0 (or even negative values) for a poor fit.
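The article does not give a formula for the goodness-of-fit parameter. The sketch below assumes the common definition, R² = 1 - SS_res/SS_tot, and shows how the value runs from 1 for a perfect fit, through 0 for a model no better than the mean of the data, to negative values for a model that is worse than the mean. The function name and the data are illustrative.

```python
def r_squared(y_obs, y_calc):
    """R^2 = 1 - SS_res / SS_tot (a common definition, assumed here)."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((yo - yc) ** 2 for yo, yc in zip(y_obs, y_calc))
    ss_tot = sum((yo - mean_obs) ** 2 for yo in y_obs)
    return 1.0 - ss_res / ss_tot


y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]

print(r_squared(y_obs, y_obs))                       # perfect fit -> 1.0
print(r_squared(y_obs, [3.0] * 5))                   # always predicts the mean -> 0.0
print(r_squared(y_obs, [5.0, 4.0, 3.0, 2.0, 1.0]))   # wrong trend -> -3.0
```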
Good places to look for more information on curve fitting in general are the websites listed in References [5] and [6]. For a more advanced discussion of proper curve fitting using non-linear regression, multivariate analysis, dealing with outlying data points, applying weighting to the data, and other issues, see References [7-9].
The functions
The functions included here represent a number of the most common curve types - power, exponential, logarithmic and trigonometric among others. Some are well known and historically important, and many correspond directly to well-known physical models. For example, the Arrhenius kinetic equation $k = Ae^{-E/RT}$ is a classic example of a basic exponential form [Equation (19)], where: x = T; y = rate constant, k; a = Arrhenius constant, A; and b = -E/R.
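The Arrhenius mapping above can be written out directly. In the sketch below, the fitting form y = a·exp(b/x) is evaluated with a = A and b = -E/R, and the activation energy is recovered from a fitted b as E = -bR. The numerical values of A and E are hypothetical, and the function names are illustrative.

```python
import math

R_GAS = 8.314   # gas constant, J/(mol K)


def exp_reciprocal(x, a, b):
    """Fitting form matched to the Arrhenius equation: y = a*exp(b/x)."""
    return a * math.exp(b / x)


def arrhenius_to_fit_params(pre_factor, e_act):
    """Map (A, E) onto the fitting parameters (a, b) = (A, -E/R)."""
    return pre_factor, -e_act / R_GAS


def activation_energy(b):
    """Recover E from a fitted b, using b = -E/R."""
    return -b * R_GAS


A_FACTOR = 1.0e13   # hypothetical Arrhenius constant, 1/s
E_ACT = 80_000.0    # hypothetical activation energy, J/mol

a, b = arrhenius_to_fit_params(A_FACTOR, E_ACT)
k_300 = exp_reciprocal(300.0, a, b)   # rate constant at T = 300 K
k_310 = exp_reciprocal(310.0, a, b)   # rate constant at T = 310 K
print(k_300, k_310, activation_energy(b))
```

Note that x must be absolute temperature for the mapping b = -E/R to hold.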
Radioactive decay is a commonly cited example of an exponential model [Equation (18)]. The basic form of the equation is $L(t) = L(0)e^{-\lambda t}$. In our context, y = L(t), the number of decays per unit time; x = t, elapsed time; a = initial decay rate; and $b = -\lambda$, where $\lambda$ is the probability of a decay event during one time unit.
Similarly, the dilution of a species in a stirred tank being continuously fed fresh media ([A] = 0) is described by Equation (18). Here the relationship is $[A]/[A_0] = e^{-(q/V)t}$, where $[A_0]$ and $[A]$ are the concentrations of A at time = 0 and at time = t, q and V are the flowrate and system volume, and t is time.
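As a brief worked example of the dilution relationship, the sketch below computes the fraction of A remaining after a given time and, by rearranging the same equation, the time needed to reach a target fraction. The flowrate and volume are hypothetical values, and the function names are illustrative.

```python
import math


def fraction_remaining(t, q, volume):
    """[A]/[A0] = exp(-(q/V)*t) for a well-stirred tank fed with [A] = 0."""
    return math.exp(-(q / volume) * t)


def time_to_fraction(fraction, q, volume):
    """Rearranged form: t = -(V/q) * ln([A]/[A0])."""
    return -(volume / q) * math.log(fraction)


Q = 2.0      # hypothetical flowrate, L/min
V = 100.0    # hypothetical system volume, L

one_residence_time = V / Q                      # 50 min
print(fraction_remaining(one_residence_time, Q, V))
print(time_to_fraction(0.01, Q, V))             # minutes to dilute to 1% of [A0]
```

In the terms of the fitting form y = a·exp(bx), the normalized concentration corresponds to a = 1 and b = -q/V.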
The Michaelis-Menten enzyme kinetic model, $v = V_{max}S/(S + K_m)$, characterized by classic non-competitive substrate inhibition (saturation), fits the form of a hyperbola [Equation (16)]. Here x = substrate concentration, S; y = reaction rate, v; a = $V_{max}$; and b = Michaelis constant, $K_m$.
Other interesting examples are described in Bates and Watts [8]. Here the drop in biological oxygen demand (BOD) at a fixed rate constant was reported to follow an exponential decay of the form of Equation (20). And the change in intercellular concentration of ions due to membrane transport out of the cell is effectively fit to the form of Equation (47).
As with any other endeavor, the deeper the understanding of the mechanisms at work, the easier the selection of an appropriate model will be. Hopefully the equations collected here will simplify your data fitting tasks by highlighting typical behaviors and the models that describe them.
Edited by Gerald Ondrey
References
1. MathCAD, FindGraph, SigmaPlot, OriginLab, XlXtrFun. Easily found through an internet search.
2. John, E. G., Simplified Curve Fitting using Spreadsheet Add-ins, Int. J. Engineering Ed., 14 (5), pp. 375-380, 1998.
3. du Plessis, B. J., Using Spreadsheets as Curve Fitting Tools, Chem. Eng., 114 (5), pp. 66-69, May 2007.
4. If 'Solver' does not appear under the Tools menu in Excel, it may be necessary to activate the add-in by selecting the 'Solver add-in' checkbox under Tools>Add-ins.
5. http://www.aip.org/tip/INPHFA/vol-9/iss-2/p24.html (by Marko Ledvij at The Industrial Physicist)
6. http://www.curvefit.com/ (by GraphPad Software)
7. Draper, N. R. and Smith, H., "Applied Regression Analysis," 3rd edition, John Wiley & Sons, N.Y., 1998.
8. Bates, D. M. and Watts, D. G., "Nonlinear Regression Analysis and its Applications," John Wiley & Sons, N.Y., 1988.
9. Bevington, P. R., "Data Reduction and Error Analysis for the Physical Sciences," McGraw-Hill, N.Y., 1969.
Author
Francis McConville is a senior consultant for Impact Technology Consultants in Lincoln, Mass. He is the author of "The Pilot Plant Real Book - A Unique Handbook for the Chemical Process Industry," and an instructor for the Scientific Update professional training course "Secrets of Batch Process Scale-Up". He has over 25 years experience in the process industries, including 14 years as a pharmaceutical process development engineer at Sepracor, Inc. McConville holds a B.S. degree in Chemistry and M.S. degrees in Chemical Engineering and in Biotechnology from Worcester Polytechnic Institute in Massachusetts. He is a member of the ACS, ISPE, and a lifetime member of the AIChE. He may be reached at [email protected].