Regression Models

Part 1: Simple Linear Model

Professor William Greene
Stern School of Business
IOMS Department
Department of Economics


Regression and Forecasting Models

Part 1 – Simple Linear Model


Theory

Demand Theory: Q = f(Price)
“The Law of Demand”: Demand curves slope downward.
What does “ceteris paribus” mean here?


Data on the U.S. Gasoline Market

Quantity = G = Expenditure / Price


Shouldn’t Demand Curves Slope Downward?

[Figure: Scatterplot of GasPrice vs G. Horizontal axis: G (0.30 to 0.65); vertical axis: GasPrice (0 to 140).]


Data on 62 Movies in 2010


Average Box Office Revenue is about $20.7 Million


Is There a Theory for This?

Scatter plot of box office revenues vs. number of “Can’t Wait To See It” votes on Fandango for 62 movies.


Average Box Office by Internet Buzz Index

[Figure legend: plotted marker = Average Box Office for Buzz in Interval]


Deterministic Relationship: Not a Theory

Expected High Temperatures, August 11-20, 2013, ZIP 10012, NY


Probabilistic Relationship: What Explains the Noise?

Fuel Bill = Function of Rooms + Random Variation


Movie Buzz Data: Probabilistic Relationship?


The Regression Model

$y = \beta_0 + \beta_1 x + \varepsilon$
y = dependent variable
x = independent variable
The ‘regression’ is the deterministic part, $\beta_0 + \beta_1 x$.
The ‘disturbance’ (noise) is $\varepsilon$.
The regression model is $E[y|x] = \beta_0 + \beta_1 x$.
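As a minimal sketch of how the deterministic part and the disturbance combine, the following Python snippet simulates data from a model of the form y = beta0 + beta1*x + eps. The parameter values, the normal distribution for the disturbance, and the function name `simulate` are illustrative assumptions, not values from the slides.

```python
import random

def simulate(beta0, beta1, xs, sigma, seed=0):
    """Draw y = beta0 + beta1*x + eps, with eps ~ Normal(0, sigma) (assumed)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0
xs = [float(x) for x in range(1, 11)] * 200   # each x value appears 200 times
ys = simulate(beta0, beta1, xs, sigma)

# The average of y at a fixed x should be close to E[y|x] = beta0 + beta1*x.
x_target = 4.0
ys_at_x = [y for x, y in zip(xs, ys) if x == x_target]
mean_at_x = sum(ys_at_x) / len(ys_at_x)
print(f"E[y|x=4] = {beta0 + beta1 * x_target:.2f}, sample mean = {mean_at_x:.2f}")
```

Individual observations scatter around the line, but their average at each x settles near the regression function, which is the sense in which the regression is the conditional mean.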


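Linear Regression Model
[Figure: the line $E[y|x] = \beta_0 + \beta_1 x$ drawn with x on the horizontal axis and y on the vertical axis; $\beta_0$ = y intercept, $\beta_1$ = slope.]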


The Model
  Constructed to provide a framework for interpreting the observed data
  What is the meaning of the observed relationship (assuming there is one)?
How it’s used
  Prediction: What reason is there to assume that we can use sample observations to predict outcomes?
  Testing relationships


The slope is the interesting quantity. Each additional year of education is associated with an increase of 3.611 in disability adjusted life expectancy.


A Cost Model
Electricity.mpj
Total cost in $Million
Output in Million KWH
N = 123 American electric utilities
Model: $\text{Cost} = \beta_0 + \beta_1\,\text{KWH} + \varepsilon$


Cost Relationship

[Figure: Scatterplot of Cost vs Output. Horizontal axis: Output (0 to 80,000); vertical axis: Cost (0 to 500).]


Sample Regression


Interpreting the Model
Cost = 2.44 + 0.00529 Output + e
Cost is $Million, Output is Million KWH.
Fixed Cost = Cost when output = 0
Fixed Cost = $2.44 Million
Marginal cost = Change in cost / change in output
= 0.00529 * $Million / Million KWH
= 0.00529 $/KWH = 0.529 cents/KWH.
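The interpretation of the fitted cost equation can be sketched in a few lines of Python. The coefficients 2.44 and 0.00529 are the ones reported on the slide; the function name `predicted_cost` and the output level of 10,000 Million KWH are assumptions made only for illustration.

```python
def predicted_cost(output_million_kwh, b0=2.44, b1=0.00529):
    """Fitted cost in $Million for output measured in Million KWH."""
    return b0 + b1 * output_million_kwh

# Fixed cost: the fitted cost when output is zero, in $Million.
fixed_cost = predicted_cost(0.0)

# Marginal cost: change in fitted cost per one-unit change in output,
# in $Million per Million KWH.
marginal_cost = predicted_cost(1.0) - predicted_cost(0.0)

# $Million per Million KWH equals $ per KWH; multiply by 100 for cents.
marginal_cost_cents_per_kwh = marginal_cost * 100

# Hypothetical output level, chosen inside the range shown in the scatterplot.
cost_at_10000 = predicted_cost(10_000)

print(f"Fixed cost: ${fixed_cost:.2f} Million")
print(f"Marginal cost: {marginal_cost_cents_per_kwh:.3f} cents/KWH")
print(f"Fitted cost at 10,000 Million KWH: ${cost_at_10000:.2f} Million")
```

Because the model is linear, the marginal cost is the same at every output level, so the difference between any two fitted costs one unit apart returns the slope.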


Covariation and Causality

[Figure: Fitted Line Plot, DALE = 35.16 + 3.611 EDUC. Horizontal axis: EDUC (0 to 12); vertical axis: DALE (20 to 80). S = 7.87034, R-Sq = 59.2%, R-Sq(adj) = 59.0%.]

Does more education make you live longer (on average)?


Causality?

Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86.

Ht.  Inc.    Ht.  Inc.    Ht.  Inc.
70   2990    68   2910    75   3150
67   2870    66   2840    68   2860
69   2950    71   3180    69   2930
70   3140    68   3020    76   3210
65   2790    73   3220    71   3180
73   3230    73   3370    66   2670
64   2880    70   3180    69   3050
70   3140    71   3340    65   2750
69   3000    69   2970    67   2960
73   3170    73   3240    70   3050

Estimated Income = -451 + 50.2 Height
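A short Python sketch, assuming the 30 height and income pairs are read off the slide in the order listed, fits the least squares line to these data. The variable names are illustrative, and the result should come out close to the slide's estimated equation.

```python
# (height in inches, income in $/month) pairs as listed on the slide.
data = [
    (70, 2990), (68, 2910), (75, 3150), (67, 2870), (66, 2840), (68, 2860),
    (69, 2950), (71, 3180), (69, 2930), (70, 3140), (68, 3020), (76, 3210),
    (65, 2790), (73, 3220), (71, 3180), (73, 3230), (73, 3370), (66, 2670),
    (64, 2880), (70, 3180), (69, 3050), (70, 3140), (71, 3340), (65, 2750),
    (69, 3000), (69, 2970), (67, 2960), (73, 3170), (73, 3240), (70, 3050),
]

n = len(data)
mean_h = sum(h for h, _ in data) / n
mean_inc = sum(inc for _, inc in data) / n

# Sums of cross-products and squares of deviations from the means.
sxy = sum((h - mean_h) * (inc - mean_inc) for h, inc in data)
sxx = sum((h - mean_h) ** 2 for h, _ in data)

b1 = sxy / sxx                 # slope: $/month per inch of height
b0 = mean_inc - b1 * mean_h    # intercept

print(f"Estimated Income = {b0:.0f} + {b1:.1f} Height")
```

The fit only measures how income and height move together in this sample; it does not by itself say whether height causes income.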


How to compute the y intercept, $b_0$, and the slope, $b_1$, in $y = b_0 + b_1 x$.


Least Squares Regression


Fitting a Line to a Set of Points

[Figure: Scatterplot of PerCapitaG vs Income. Horizontal axis: Income (21,000 to 27,000); vertical axis: PerCapitaG (5.6 to 6.4).]

Choose $b_0$ and $b_1$ to minimize the sum of squared residuals (Gauss’s method of least squares):

$$SS = \sum_{i=1}^{N} [y_i - b_0 - b_1 x_i]^2 = \sum_{i=1}^{N} [y_i - (b_0 + b_1 x_i)]^2 = \sum_{i=1}^{N} e_i^2$$

Residuals: $e_i = y_i - (b_0 + b_1 x_i) = y_i - \hat{y}_i$

Predictions: $\hat{y}_i = b_0 + b_1 x_i$
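The criterion can be sketched as a small Python function that takes a candidate intercept and slope and returns the sum of squared residuals. The five data points and the two candidate lines below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def sum_of_squares(b0, b1, xs, ys):
    """Sum of squared residuals e_i = y_i - (b0 + b1*x_i) for a candidate line."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(e * e for e in residuals)

# Hypothetical data, not from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_a = sum_of_squares(0.0, 2.0, xs, ys)   # candidate line y = 2x
ss_b = sum_of_squares(1.0, 1.5, xs, ys)   # candidate line y = 1 + 1.5x

print(f"SS for y = 2x:       {ss_a:.2f}")
print(f"SS for y = 1 + 1.5x: {ss_b:.2f}")
```

Of the two candidates, the one with the smaller sum of squares fits these points better; least squares carries this comparison over all possible lines.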


Computing the Least Squares Parameters b0 and b1

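4 numbers are needed:

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = 20.721, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 0.48242$$

$$\mathrm{Var}(x) = s_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 = 0.02453$$

$$\mathrm{Cov}(x,y) = s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = 1.784$$

$$b_1 = \frac{s_{xy}}{s_x^2} = \frac{1.784}{0.02453} = 72.7181$$

$$b_0 = \bar{y} - b_1 \bar{x} = 20.721 - (72.7181)(0.48242) = -14.36$$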
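The same arithmetic can be sketched in Python from the four summary numbers on the slide. The function name `least_squares_from_moments` is an assumption for illustration. Note that dividing the rounded inputs gives a slope of about 72.73 rather than the slide's 72.7181, presumably because the slide used unrounded values; the intercept still rounds to -14.36.

```python
def least_squares_from_moments(x_bar, y_bar, var_x, cov_xy):
    """Slope and intercept from the two means, Var(x), and Cov(x, y)."""
    b1 = cov_xy / var_x          # slope = s_xy / s_x^2
    b0 = y_bar - b1 * x_bar      # intercept = y_bar - b1 * x_bar
    return b0, b1

# Summary statistics as reported (rounded) on the slide.
b0, b1 = least_squares_from_moments(
    x_bar=0.48242, y_bar=20.721, var_x=0.02453, cov_xy=1.784
)

print(f"b1 = {b1:.3f}")
print(f"b0 = {b0:.2f}")
```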


b0 = -14.36
b1 = 72.718


Least Squares Uses Calculus

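$$SS = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2$$

$$\frac{\partial SS}{\partial b_0} = \frac{1}{N-1}\sum_{i=1}^{N} \frac{\partial (y_i - b_0 - b_1 x_i)^2}{\partial b_0} = \frac{1}{N-1}\sum_{i=1}^{N} 2(y_i - b_0 - b_1 x_i)(-1) = 0$$

$$\frac{\partial SS}{\partial b_1} = \frac{1}{N-1}\sum_{i=1}^{N} \frac{\partial (y_i - b_0 - b_1 x_i)^2}{\partial b_1} = \frac{1}{N-1}\sum_{i=1}^{N} 2(y_i - b_0 - b_1 x_i)(-x_i) = 0$$

The factor $\frac{1}{N-1}$ does not change the values of $b_0$ and $b_1$ that minimize the sum.

The solution is $b_0 = \bar{y} - b_1 \bar{x}$ where

$$b_1 = \frac{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2}$$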
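As a numerical check on the two first-order conditions, the sketch below fits a line with the closed-form solution and confirms that the residuals sum to zero and are orthogonal to x. The data are hypothetical and the function name `fit_line` is an assumption for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least squares: b1 = s_xy / s_x^2, b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx   # the 1/(N-1) factors cancel in the ratio
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, not from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]

# First-order conditions, up to the constant factor -2/(N-1):
foc_b0 = sum(residuals)                                # sum of e_i
foc_b1 = sum(e * x for e, x in zip(residuals, xs))     # sum of e_i * x_i

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"sum of residuals = {foc_b0:.2e}, sum of residual*x = {foc_b1:.2e}")
```

Both sums should be zero apart from floating-point rounding, which is what setting the two derivatives to zero requires.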


[Figure: the least squares line ($b_0 = -14.36$) and an alternative line ($b_0 = -20.00$) drawn through the same data, each labeled with its slope and its sum of squares.]

Least squares minimizes the sum of squared deviations from the line.


Summary
Theory vs. practice
Linear Relationship
  Deterministic
  Random, stochastic, ‘probabilistic’
  Mean is a function of x
Regression Relationship
  Causality vs. correlation
  Least squares
