georgettenicolaides
Using Regression to Describe Relationships in Data (Ch 3)
A. Simple Regression **
B. Multiple Regression
Using Simple Regression to Describe a Relationship
Regression analysis is a statistical technique used to describe relationships among variables.
The simplest case is one where a dependent (response) variable y may be related to an independent (explanatory, causal) variable x.
The equation expressing this relationship is the line:
$$y = b_0 + b_1 x$$
Algebra refresher
Graph of an Exact Relationship: y = 1 + 2x
[Figure: plot of y against x]
x    y
1    3
2    5
3    7
4    9
5    11
6    13
What about this?
Error in the Relationship
In real life, we usually do not have exact relationships.
Figure 3.2 shows a situation where y and x have a strong tendency to increase together but it is not perfect.
A good guess might be ŷ = 1 + 2.5x
Graph of a Relationship That is NOT Exact
x    y
1    3
2    2
3    8
4    8
5    11
6    13
[Figure: Regression Plot of y against x with fitted line y = -0.2 + 2.2 x; S = 1.48324, R-Sq = 90.6%, R-Sq(adj) = 88.2%]
Let us assume this relationship is approximately y = 1 + 1.25x
Residuals
Residuals = errors = deviations
For any point, its residual = ê = y - ŷ (y-hat)
We want them as small as possible… why?
Can be positive or negative
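As a rough illustration of the residual definition, the sketch below computes ê = y - ŷ for the six data points of the inexact example, using the fitted line y = -0.2 + 2.2 x reported on the regression plot. The function name `residuals` is made up for this example.

```python
# Data from the "NOT exact" example; the line is the one shown on the regression plot.
x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 8, 8, 11, 13]
b0, b1 = -0.2, 2.2

def residuals(b0, b1, xs, ys):
    """Return y - y_hat for each point, where y_hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(xs, ys)]

res = residuals(b0, b1, x, y)
for xi, yi, ei in zip(x, y, res):
    # A positive residual means the point lies above the line, a negative one below it.
    print(f"x={xi}  y={yi}  y_hat={b0 + b1 * xi:.1f}  residual={ei:.1f}")
```

Some residuals come out positive and some negative, which is why simply adding them up turns out to be a poor measure of fit.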
Figure 3.3 Deviations From the Line
[Figure: Regression Plot of y against x with fitted line y = -0.2 + 2.2 x (S = 1.48324, R-Sq = 90.6%, R-Sq(adj) = 88.2%), marking + deviations and - deviations from the line]
Computation Ideas (1)
We can search for a line that minimizes the sum of the residuals:
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)$$
While this is a good idea, it can be shown that any line passing through the point $(\bar{x}, \bar{y})$ will have this sum = 0.
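A quick numerical check of this claim, as a sketch only: using the six data points from the inexact example, it tries a few arbitrary slopes, chooses each intercept so that the line passes through the point of means, and sums the residuals. The slopes and the helper name `sum_of_residuals` are illustrative choices.

```python
from statistics import mean

x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 8, 8, 11, 13]
x_bar, y_bar = mean(x), mean(y)  # 3.5 and 7.5

def sum_of_residuals(b0, b1):
    """Sum of (y - y_hat) over all points for the line y_hat = b0 + b1 * x."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))

sums = {}
for slope in (-3.0, 0.0, 2.2, 10.0):
    # Choose the intercept that forces the line through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    sums[slope] = sum_of_residuals(intercept, slope)
    print(f"slope={slope:5.1f}  sum of residuals={sums[slope]:.10f}")
```

Every slope gives a sum that is zero up to floating-point rounding, so this criterion cannot single out one line.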
Computation Ideas (2)
We can work with absolute values and search for a line that minimizes:
$$\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
Such a procedure, called LAV or least absolute value regression, does exist but usually is found only in specialized software.
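For a data set as small as the six-point example, a LAV line can be found by brute force. The sketch below relies on a known property of least absolute value fitting of a straight line: some optimal line passes through at least two of the data points, so it is enough to try every pair. This is only an illustration; specialized software would typically use more efficient methods such as linear programming.

```python
from itertools import combinations

x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 8, 8, 11, 13]
points = list(zip(x, y))

def sum_abs_residuals(b0, b1):
    """Sum of |y - y_hat| for the line y_hat = b0 + b1 * x."""
    return sum(abs(yi - (b0 + b1 * xi)) for xi, yi in points)

# Try the line through every pair of points (all x values here are distinct).
candidates = []
for (x1, y1), (x2, y2) in combinations(points, 2):
    b1 = (y2 - y1) / (x2 - x1)
    b0 = y1 - b1 * x1
    candidates.append((sum_abs_residuals(b0, b1), b0, b1))

best_sad, best_b0, best_b1 = min(candidates)
print(f"LAV line: y = {best_b0:g} + {best_b1:g} x, sum of |residuals| = {best_sad:g}")
print(f"Line from the regression plot: sum of |residuals| = {sum_abs_residuals(-0.2, 2.2):g}")
```

The LAV line need not coincide with the least squares line, because the two criteria weigh large deviations differently.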
Computation Ideas (3)
By far the most popular approach is to square the residuals and minimize:
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
This procedure is called least squares and is widely available in software. It uses calculus to solve for the $b_0$ and $b_1$ terms and gives a unique solution.
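The calculus step can be sketched as follows, writing $\hat{y}_i = b_0 + b_1 x_i$ and calling the sum of squares $S(b_0, b_1)$ (a label introduced here for convenience). Setting both partial derivatives to zero gives two linear equations in $b_0$ and $b_1$; the solution is unique as long as the $x_i$ are not all equal.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; n b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \\
b_0 &= \bar{y} - b_1 \bar{x}, \qquad
b_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
           {\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}
\end{align*}
```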
Least Squares Estimators
There are several formulas for the $b_1$ term. If doing it by hand, we might want to use:
$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}$$
The intercept is $b_0 = \bar{y} - b_1 \bar{x}$
Figure 3.5 Computations Required for b_1 and b_0
          x_i    y_i    x_i^2    x_i y_i
          1      3      1        3
          2      2      4        4
          3      8      9        24
          4      8      16       32
          5      11     25       55
          6      13     36       78
Totals    21     45     91       196
Calculations
$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}$$
$$b_0 = \bar{y} - b_1 \bar{x} =$$
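One way to carry out these calculations is to plug the column totals from Figure 3.5 (n = 6, Σx = 21, Σy = 45, Σx² = 91, Σxy = 196) into the two formulas. The variable names below are illustrative.

```python
# Totals taken from the Figure 3.5 table.
n = 6
sum_x, sum_y = 21, 45
sum_x2, sum_xy = 91, 196

# Slope: (sum of xy - (1/n) * sum of x * sum of y) / (sum of x^2 - (1/n) * (sum of x)^2)
numerator = sum_xy - sum_x * sum_y / n      # 196 - 157.5 = 38.5
denominator = sum_x2 - sum_x ** 2 / n       # 91 - 73.5 = 17.5
b1 = numerator / denominator

# Intercept: y_bar - b1 * x_bar
x_bar, y_bar = sum_x / n, sum_y / n         # 3.5 and 7.5
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.1f}, b0 = {b0:.1f}")
```

Up to rounding, this gives the slope 2.2 and intercept -0.2 of the line shown on the regression plot.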
The Unique Minimum
The line we obtained was:
$$\hat{y} = -0.2 + 2.2x$$
This is the best equation, in the sense that it has the smallest sum of squared errors.
We guessed ŷ = 1 + 2.5x earlier.
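To see what "best" means in numbers, the sketch below compares the sum of squared residuals for the least squares line and for the earlier guess, and then nudges the least squares coefficients slightly to check that the sum only gets larger. The nudge size of 0.1 is an arbitrary choice for illustration.

```python
x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 8, 8, 11, 13]

def sse(b0, b1):
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

sse_ls = sse(-0.2, 2.2)     # least squares line
sse_guess = sse(1.0, 2.5)   # the earlier guess y = 1 + 2.5x

# Move each coefficient by -0.1, 0, or +0.1 (excluding the unchanged line itself).
steps = (-0.1, 0.0, 0.1)
nudged = [sse(-0.2 + d0, 2.2 + d1)
          for d0 in steps for d1 in steps if (d0, d1) != (0.0, 0.0)]

print(f"least squares line: {sse_ls:.2f}")
print(f"guessed line:       {sse_guess:.2f}")
print(f"smallest nudged:    {min(nudged):.2f}")
```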
We statistical software!
Examples of Regression as a Descriptive Technique
SU is concerned about the cost of adding new computers to an existing network. They obtained data on 14 existing campus computer labs.
They did a regression of cost in $'s v. the number of computers.
Pricing a Computer Network
ŷ [Cost] = 16594 + 650 [#computers]
Interpreting the equation in words…
Slope:
– On average, each additional computer costs $650.
– Or – The cost of the project increases by $650 for each additional computer.
Intercept:
– Must meet all of the following conditions:
– Fixed cost?
– Did we collect data at x = 0?
– Does it make practical sense to build a network of 0 computers?
Prediction:
– On average, the cost for adding 10 computers is $23,094 ($16594 + $650 x 10)
ŷ [Cost] = 16594 + 650 [#computers]
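The prediction and the slope interpretation can both be read off a small function that wraps the fitted equation. The function name `predicted_cost` is made up for this sketch.

```python
def predicted_cost(num_computers):
    """Average project cost in dollars from the fitted line 16594 + 650 * #computers."""
    return 16594 + 650 * num_computers

cost_10 = predicted_cost(10)                                # prediction for 10 computers
extra_per_computer = predicted_cost(11) - predicted_cost(10)  # this difference is the slope

print(f"10 computers: ${cost_10:,}")
print(f"one more computer adds: ${extra_per_computer:,}")
```

Evaluating the function at 0 computers returns the intercept, which is meaningful only if the conditions listed under "Intercept" hold.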
Ex. Estimating Residential Real Estate Values
The Tarrant County Appraisal District uses data such as house size, location and depreciation to help appraise property.
Here we look at how appraisal value ($'s) depends on size (sq feet) for a set of 100 homes.
The data are from 1990.
Tarrant County Real Estate
[Figure: scatter plot of VALUE against SIZE with fitted line]
ŷ [Value] = -50035 + 72.8 [sq feet]
Interpreting the equation in words
Slope:
– On average, each additional square foot increases the appraisal value of a house by $72.80.
– Better -- on average, each additional 100 sq feet raises the appraisal value of a house by $7,280.
– Or -- The appraisal value of a house rises by about $7,280 for each additional 100 square feet.
ŷ [Value] = -50035 + 72.8 [sq feet]
Interpreting the equation in words
Intercept:
– Is this the "fixed appraisal value?"
– Is the intercept within the range?
– A house with zero square feet???
Prediction:
– The value of a 1,500 square foot house is $59,165 (-50035 + 72.8 x 1500) on average.
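A short sketch of the three readings of this equation: the slope rescaled to a more natural unit (100 sq feet), the prediction for a 1,500 square foot house, and the value the line gives at zero square feet. The constant and function names are illustrative.

```python
INTERCEPT = -50035        # dollars
SLOPE_PER_SQFT = 72.8     # dollars per square foot

def predicted_value(sq_feet):
    """Average appraisal value in dollars from the fitted line."""
    return INTERCEPT + SLOPE_PER_SQFT * sq_feet

per_100_sqft = SLOPE_PER_SQFT * 100       # same slope, per 100 sq feet
value_1500 = predicted_value(1500)        # prediction for a 1,500 sq ft house
value_at_zero = predicted_value(0)        # just the intercept

print(f"per 100 sq feet: ${per_100_sqft:,.0f}")
print(f"1,500 sq feet:   ${value_1500:,.0f}")
print(f"0 sq feet:       ${value_at_zero:,.0f}")
```

The negative value at zero square feet is a reminder that the intercept here is not a "fixed appraisal value"; it only positions the line over the sizes that were actually observed.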
Ex. Forecasting Housing Starts
Here we analyze the relationship between US housing starts and mortgage rates. The rate used is the US average for new home purchases.
Annual data from 1963 to 2002 is used.
US Housing Starts
[Figure: scatter plot of STARTS against RATES with fitted line]
ŷ [starts] = 1726 - 22.2 [rates]
…does a relationship exist?
The plot shows there is little relationship in these data
– Is this reliable?
– What might improve this model?
Predict the price of a diamond based on its carats: Price (y) v. Caratage (x)
            Coef          SE Coef     T         P
Constant    -259.62591    17.31886    -14.99    2.52E-19
Caratage    3721.0249     81.78588    45.497    6.75E-40

The dependent variable is Wins
Predictor           Coef       SE Coef    T        P
Constant            -79.63     40.45      -1.97    0.059
Field Goals Made    0.04119    0.01380    2.99     0.006
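The slide does not explain the columns, but in regression output of this kind the T column is ordinarily the coefficient divided by its standard error. Under that assumption, the sketch below recomputes the ratio for each row of the two outputs above and prints it next to the reported value. The dictionary layout is just a convenience for this example.

```python
# (Coef, SE Coef, reported T) copied from the two outputs above.
outputs = {
    "diamond: Constant":      (-259.62591, 17.31886, -14.99),
    "diamond: Caratage":      (3721.0249, 81.78588, 45.497),
    "wins: Constant":         (-79.63, 40.45, -1.97),
    "wins: Field Goals Made": (0.04119, 0.01380, 2.99),
}

# Assumed relationship: T = Coef / SE Coef.
t_ratios = {name: coef / se for name, (coef, se, _) in outputs.items()}

for name, (coef, se, reported_t) in outputs.items():
    print(f"{name:24s} Coef/SE = {t_ratios[name]:8.3f}   reported T = {reported_t}")
```

The recomputed ratios agree with the reported T values up to the rounding of the printed coefficients.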
Predict # calls by CSR in call center
The regression equation is
CALLS = 13.7 + 0.744 MONTHS
Predictor    Coef       SE Coef    T        P
Constant     13.671     1.427      9.58     0.000
MONTHS       0.74351    0.06666    11.15    0.000
Assignment
Study Text 3.1 – 3.2 and lecture notes
Study table 3.14 on page 84
We will start HW/Lab 1 together.
A short quiz will cover the lab, text, and lecture material
Review Text Chapter 2.1-2.2 and PPt Review of Basic Statistics 2.1-2.2 (why? questions on descriptive statistics and plots will be on quiz also!)