kyle
Regression Statistics
Engineers and Regression
Engineers often:
• Regress data to a model
    • Used for assessing theory
    • Used for predicting
    • Empirical or theoretical model
• Use the regression of others
    • Antoine Equation
    • DIPPR
[Figure: scatter plot of Y versus X]
What are the uncertainties of regression?
• Do the data fit the model?
• What are the errors in the prediction?
• What are the errors in the parameters?
Linear Regression
Two classes for a model
• Linear
• Non-linear
“Linear” refers to the parameters, not the independent variable (i.e., the model can be written in matrix form using linear algebra).
You can use the Mathcad function “linfit” on linear equations.
Example
$y = ax^2 + bx + c$
$y = \begin{bmatrix} x^2 & x & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix}$
• parameters are linear
• x is not linear
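The matrix form on this slide can be tried out numerically. The sketch below is only an illustration: it assumes NumPy as a stand-in for Mathcad's linfit, and the data values and the names `x`, `y`, and `A` are made up for the example.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2x^2 - 3x + 1 (illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x**2 - 3.0 * x + 1.0

# Each row of the design matrix is [x^2, x, 1]; a, b, c enter the model linearly
A = np.column_stack([x**2, x, np.ones_like(x)])

# Linear least squares: minimize ||A @ [a, b, c] - y||^2
params, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c = params
print(a, b, c)
```

Although the curve is quadratic in x, the fit is an ordinary linear least-squares solve because the unknowns a, b, and c appear only to the first power.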
Quiz
For which models can you use linear regression?
1. $y = ax^2 + bx + c$
2. $y = a e^{bx}$
3. $y = a + bT + cT^3 + dT^4 + eT^5$
4. $y = \exp\left(A - \frac{B}{T + C}\right)$
5. $y = mx + b$
Regression: Straight line
• Y = b0 + b1X
• Two parameters
Practice
• Open the Excel Regression example (1st order Example)
• If necessary, add in the Data Analysis ToolPak
• Regress the data in Columns B and C; select 95% confidence and the residual plot. You may include the column labels in the input range, but then you must check the Labels box.
• Copy the regressed slope and intercept into the m and b cells in column H
• The next slides will help us understand the output
Straight Line Model
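[Figure: straight line fitted to scattered data, annotated with the “X” data, the measured “Y”, the predicted “Y”, the slope, the intercept, and the residual (error)]
“Y” measured: $Y_i = b_0 + b_1 X_i + e_i$
“Y” predicted: $\hat{Y}_i = b_0 + b_1 X_i$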
Straight Line Model
residual (error): $e_i = Y_i - \hat{Y}_i$
sum squared error: $\sum_i e_i^2$
“Y” predicted: $\hat{Y}_i = b_0 + b_1 X_i$

X_i (“X” data)   Y_i (“Y” data)   Ŷ_i (“Y” predicted)   e_i (residual)
1                2.749032178      1.439088483           1.309943694
2                3.719910224      2.362003033           1.357907192
3                0.925995017      3.284917582           -2.35892257
4                2.623482686      4.207832132           -1.58434945
5                6.539797342      5.130746681           1.409050661
6                6.779909177      6.053661231           0.726247946
7                4.946150401      6.976575781           -2.03042538
8                9.674178069      7.89949033            1.774687739
9                7.61959821       8.82240488            -1.20280667
10               7.650020996      9.745319429           -2.09529843
11               11.514           10.66823398           0.845766021
12               13.18285068      11.59114853           1.591702152
13               13.28173635      12.51406308           0.767673275
14               13.60444592      13.43697763           0.16746829
15               12.79535218      14.35989218           -1.56454
16               17.82374778      15.28280673           2.540941056
17               14.55068379      16.20572128           -1.65503748

Sum squared error: 42.76608602
b0 = 0.516 (intercept), b1 = 0.923 (slope)
Number of fitted parameters: 2 for a two-parameter model
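The numbers on this slide can be reproduced outside Excel. The following is a minimal sketch in plain Python, assuming the X values are simply 1 to 17 as the data listing suggests; the variable names are illustrative and not taken from the spreadsheet.

```python
import math

# "Y" data from this slide; the "X" data are 1, 2, ..., 17
X = list(range(1, 18))
Y = [2.749032178, 3.719910224, 0.925995017, 2.623482686, 6.539797342,
     6.779909177, 4.946150401, 9.674178069, 7.61959821, 7.650020996,
     11.514, 13.18285068, 13.28173635, 13.60444592, 12.79535218,
     17.82374778, 14.55068379]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least-squares intercept b0 and slope b1 for Y = b0 + b1*X
Sxx = sum((x - x_bar) ** 2 for x in X)
Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b1 = Sxy / Sxx
b0 = y_bar - b1 * x_bar

# Predicted values, residuals, and the sum of squared errors
Y_hat = [b0 + b1 * x for x in X]
e = [y - yh for y, yh in zip(Y, Y_hat)]
sse = sum(r ** 2 for r in e)

# Standard error of the fit: divide by n - 2 because two parameters were fitted
s = math.sqrt(sse / (n - 2))
print(b0, b1, sse, s)
```

The fitted line has an intercept of about 0.516 and a slope of about 0.923, and the sum of squared errors comes out close to the 42.766 listed on the slide.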
The R2 Statistic
A useful statistic but not definitive
• Tells you how well the data fit the model.
• It does not tell you if the model is correct.
• How much of the distribution of the data about the mean is described by the model.
$\bar{Y}$: average measured y value
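The formula for R2 did not survive in this transcript. As a sketch, assuming the usual definition R2 = 1 - SSE/SST, where SST is the sum of squared deviations of the measured y values about their average, it can be computed as below. The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y_measured, y_predicted):
    """R^2 = 1 - SSE/SST, with SST taken about the average measured y."""
    y_bar = sum(y_measured) / len(y_measured)
    sse = sum((y - yp) ** 2 for y, yp in zip(y_measured, y_predicted))
    sst = sum((y - y_bar) ** 2 for y in y_measured)
    return 1.0 - sse / sst

# Hypothetical values, for illustration only
y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)                       # model reproduces every point
mean_only = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # model predicts the average everywhere
print(perfect, mean_only)  # 1.0 0.0
```

A model that does no better than the average of the data scores 0, and a model that passes through every point scores 1, which is why R2 says how well the data fit but not whether the model is correct.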
Problems with R2
[Figure: four panels, each a different data set plotted as Y versus X with its fitted line]
Panel 1: f(x) = 0.500090909090909 x + 3.00009090909091, R² = 0.666542459508775
Panel 2: f(x) = 0.499909090909091 x + 3.00172727272727, R² = 0.666707256898465
Panel 3: f(x) = 0.499727272727273 x + 3.00245454545455, R² = 0.666324041066559
Panel 4: f(x) = 0.5 x + 3.00090909090909, R² = 0.666242033727484
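The four fits quoted for these panels match the well-known Anscombe quartet, so the sketch below assumes that is the data set behind the slide (the transcript does not name it). It fits a straight line to each set in plain Python; the helper `fit_line` is a hypothetical name.

```python
# Anscombe's quartet (assumed to be the data behind the four panels)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
datasets = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit_line(x, y):
    """Return intercept b0, slope b1, and R^2 of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r2 = sxy ** 2 / (sxx * syy)  # equals 1 - SSE/SST for a straight-line fit
    return b0, b1, r2

results = {name: fit_line(x, y) for name, (x, y) in datasets.items()}
for name, (b0, b1, r2) in results.items():
    print(f"{name}: f(x) = {b1:.4f} x + {b0:.4f}, R2 = {r2:.4f}")
```

All four sets give nearly the same slope, intercept, and R2, even though only one of them is reasonably described by a straight line.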
Residuals (ei) should be normally distributed
[Figure: the same four data sets and fitted lines, each shown beside a plot of its residuals e against data point number (1 to 11)]
Statistics: Slope/Intercept
(1-α)100% Confidence Intervals
intercept: $b_0 \pm S_{b_0} t$
slope: $b_1 \pm S_{b_1} t$
where $t = f(\alpha/2,\ n-2)$
$S_{b_0}$ and $S_{b_1}$ are called “standard error” in Excel.
The 2 in $n-2$ is the number of fitted parameters: 2 for a two-parameter model.
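As a sketch of how these intervals are obtained, the code below applies them to the 17-point straight-line example from earlier in the deck. It assumes the standard textbook expressions for the standard errors of a straight-line fit, which the transcript does not show, and it takes the t value from a t-table instead of computing it (with SciPy installed, `scipy.stats.t.ppf(0.975, 15)` gives the same number).

```python
import math

# Straight-line example data from the earlier slide; X_i = 1, 2, ..., 17
X = list(range(1, 18))
Y = [2.749032178, 3.719910224, 0.925995017, 2.623482686, 6.539797342,
     6.779909177, 4.946150401, 9.674178069, 7.61959821, 7.650020996,
     11.514, 13.18285068, 13.28173635, 13.60444592, 12.79535218,
     17.82374778, 14.55068379]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
Sxx = sum((x - x_bar) ** 2 for x in X)
Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b1 = Sxy / Sxx
b0 = y_bar - b1 * x_bar

# Standard error of the fit, with n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s = math.sqrt(sse / (n - 2))

# Standard errors of the slope and intercept (assumed textbook formulas)
S_b1 = s / math.sqrt(Sxx)
S_b0 = s * math.sqrt(1.0 / n + x_bar ** 2 / Sxx)

t = 2.131  # two-sided 95% t value for n - 2 = 15 degrees of freedom, from a t-table

print(f"intercept: {b0:.3f} +/- {S_b0 * t:.3f}")
print(f"slope:     {b1:.3f} +/- {S_b1 * t:.3f}")
```

The values of `S_b0` and `S_b1` should correspond to the "Standard Error" column that Excel reports next to each coefficient.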
Statistics: Predicted Variable
(1-α)100% Confidence Intervals
$Y_p = \hat{Y}_p \pm S_{\hat{Y}} t$
where $t = f(\alpha/2,\ n-2)$
Confidence Interval on the Prediction
• Given a specific value for X, what is the error in $\hat{Y}$?
• As $n \to \infty$, $S_{\hat{Y}} \to 0$
Expected Range of Data
• If I take more data, where will the data fall? (Where will Y be found if measured at a given X?)
• As $n \to \infty$, $S_Y \to \sigma$
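The two limits can be checked numerically. This is a minimal sketch assuming the standard straight-line expressions for the two standard errors (the transcript does not show them); the function names, the x values, and `s = 1.0` are hypothetical.

```python
import math

def _spread_term(x_p, X):
    """1/n + (x_p - x_bar)^2 / Sxx for a straight-line fit to the x values X."""
    n = len(X)
    x_bar = sum(X) / n
    sxx = sum((x - x_bar) ** 2 for x in X)
    return 1.0 / n + (x_p - x_bar) ** 2 / sxx

def se_prediction(x_p, X, s):
    """Standard error of the predicted Y-hat at x_p (confidence interval on the prediction)."""
    return s * math.sqrt(_spread_term(x_p, X))

def se_new_data(x_p, X, s):
    """Standard error for a new measured Y at x_p (expected range of data)."""
    return s * math.sqrt(1.0 + _spread_term(x_p, X))

s = 1.0  # hypothetical standard error of the fit, standing in for sigma
rows = []
for n in (10, 100, 10000):
    X = [i / n for i in range(n)]  # hypothetical x values spread over [0, 1)
    rows.append((n, se_prediction(0.5, X, s), se_new_data(0.5, X, s)))
    print(rows[-1])
```

With more data the uncertainty in the fitted line shrinks toward zero, while the spread expected for a new measurement levels off at the scatter of the data itself.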
Generalized Linear Regression
Linear regression can be written in matrix form.

X    Y
21   186
24   214
32   288
47   425
50   455
59   539
68   622
74   675
62   562
50   453
41   370
30   274

Straight Line Model: $Y_i = b_0 + b_1 X_i + e_i$
Quadratic Model: $Y_i = b_0 + b_1 X_i + b_2 X_i^2 + e_i$
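Both models can be fitted with the same matrix machinery by changing only the columns of the design matrix. The sketch below uses the X, Y data listed on this slide and assumes NumPy; the helper `fit` and the variable names are illustrative, not part of the original material.

```python
import numpy as np

# X, Y data from this slide
X = np.array([21, 24, 32, 47, 50, 59, 68, 74, 62, 50, 41, 30], dtype=float)
Y = np.array([186, 214, 288, 425, 455, 539, 622, 675, 562, 453, 370, 274], dtype=float)

def fit(design, y):
    """Solve the least-squares problem y ~ design @ b; return b and the residuals."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b, y - design @ b

ones = np.ones_like(X)
b_line, e_line = fit(np.column_stack([ones, X]), Y)        # columns: 1, X
b_quad, e_quad = fit(np.column_stack([ones, X, X**2]), Y)  # columns: 1, X, X^2

print("straight line b0, b1:", b_line, "SSE:", e_line @ e_line)
print("quadratic b0, b1, b2:", b_quad, "SSE:", e_quad @ e_quad)
```

Adding the X^2 column can only lower the sum of squared errors or leave it unchanged, so a smaller SSE alone does not show that the quadratic model is the better choice.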
Excel Practice
1. Complete columns D-H in the 1st order example; look at the 95% confidence level of the data.
2. Complete the multiple regression Practice example
    a) Highlight all x columns for the x input
    b) Note that the x data could represent non-linear data such as T, 1/T, T^2, etc.
    c) Look at the regressed parameter associated with x1 and calculate the 95% confidence. Compare with Excel.
    d) Select residuals and residual plots. Note that there are two types of residual information:
        1) each independent variable has a residual plot where the x-axis is the data and the y-axis is the residual for the data;
        2) the summary output gives a residual for y (actual Y - predicted Y). If desired, you can plot it against the data point number.
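Steps b) and c) can be sketched outside Excel as follows. Everything in this example is hypothetical: the data are synthetic, the regressor columns T and 1/T are just one possible choice, and the standard errors use the usual covariance expression s^2 (X'X)^-1, which is assumed to be what Excel reports. The t value is taken from a t-table.

```python
import numpy as np

# Hypothetical data: y = 1 + 2*T + 3/T plus a small alternating disturbance
T = np.arange(1.0, 11.0)
y = 1.0 + 2.0 * T + 3.0 / T + 0.1 * (-1.0) ** np.arange(10)

# Regressor columns x1 = T and x2 = 1/T; the model is still linear in b0, b1, b2
X = np.column_stack([np.ones_like(T), T, 1.0 / T])
n, p = X.shape

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                       # residuals: actual Y minus predicted Y
s2 = (e @ e) / (n - p)              # residual variance with n - p degrees of freedom
cov = s2 * np.linalg.inv(X.T @ X)   # covariance matrix of the fitted parameters
se = np.sqrt(np.diag(cov))          # standard error of each parameter

t = 2.365  # two-sided 95% t value for n - p = 7 degrees of freedom, from a t-table
for name, bj, sj in zip(("b0", "b1", "b2"), b, se):
    print(f"{name} = {bj:.4f} +/- {t * sj:.4f}")
```

The half-width `t * se` for the x1 parameter is the quantity to compare against the lower and upper 95% bounds in Excel's regression output.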