
Page 1:

Part 1: Cross Sectional Data

• Simple Linear Regression Model – Chapter 2
• Multiple Regression Analysis – Chapters 3 and 4
• Advanced Regression Topics – Chapter 6
• Dummy Variables – Chapter 7
• Note: Appendices A, B, and C are additional review if needed.

Page 2:

2. The Simple Regression Model
2.1 Definition of the Simple Regression Model
2.2 Deriving the Ordinary Least Squares Estimates
2.3 Properties of OLS on Any Sample of Data
2.4 Units of Measurement and Functional Form
2.5 Expected Values and Variances of the OLS Estimators
2.6 Regression through the Origin

Page 3:

2.1 The Simple Regression Model
• Economics is built upon assumptions
 - assume people are utility maximizers
 - assume perfect information
 - assume we have a can opener
• The Simple Regression Model is based on assumptions
 - more assumptions are required for more detailed analysis
 - relaxing or disproving assumptions leads to more complicated models

Page 4:

2.1 The Simple Regression Model
• Recall the SIMPLE LINEAR REGRESSION MODEL:
 - relates two variables (x and y)
 - also called the two-variable linear regression model or bivariate linear regression model
 - y is the DEPENDENT or EXPLAINED variable
 - x is the INDEPENDENT or EXPLANATORY variable
 - y is a function of x

(2.1)  $y = \beta_0 + \beta_1 x + u$

Page 5:

2.1 The Simple Regression Model
• Recall the SIMPLE LINEAR REGRESSION MODEL:
 - u is the ERROR TERM or DISTURBANCE variable
 - u takes into account all factors other than x that affect y
 - u accounts for all "unobserved" impacts on y

(2.1)  $y = \beta_0 + \beta_1 x + u$

Page 6:

2.1 The Simple Regression Model

• Example of the SIMPLE LINEAR REGRESSION MODEL:
 - taste depends on cooking time
 - taste is explained by cooking time
 - taste is a function of cooking time
 - u accounts for other factors affecting taste (cooking skill, ingredients available, random luck, differing taste buds, etc.)

(i.e.)  $\text{taste} = \beta_0 + \beta_1 \, \text{cookingtime} + u$
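As a quick illustration (not from the slides; the parameter values and data are hypothetical), a short simulation of this bivariate model in Python:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical "true" parameters (illustrative only)
beta0, beta1 = 2.0, 0.5

cooking_time = rng.uniform(10, 60, size=100)   # x: minutes of cooking
u = rng.normal(0, 1, size=100)                 # unobserved factors (skill, luck, taste buds, ...)
taste = beta0 + beta1 * cooking_time + u       # y = beta0 + beta1*x + u, as in (2.1)
```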

Page 7:

2.1 The Simple Regression Model

• The SRM shows how y changes:
 - for example, if $\beta_1 = 3$, a 2-unit increase in x would cause a 6-unit change in y (2 × 3 = 6)
 - $\beta_1$ is the SLOPE PARAMETER
 - $\beta_0$ is the INTERCEPT PARAMETER or CONSTANT TERM (not always useful in analysis)

(2.2)  $\Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0$

Page 8:

2.1 The Simple Regression Model

 - note that this equation implies CONSTANT returns
 - the first unit of x has the same impact on y as the 100th unit
 - to avoid this we can include powers of x or change functional forms

(2.1)  $y = \beta_0 + \beta_1 x + u$

Page 9:

2.1 The Simple Regression Model

 - in order to achieve a ceteris paribus analysis of x's effect on y, we need assumptions about u's relationship with x
 - in order to simplify our assumptions, we first assume that the average of u in the population is zero:

(2.5)  $E(u) = 0$

 - if $\beta_0$ is included in the equation, it can always be modified to make (2.5) true
 - i.e., if $E(u) > 0$, simply increase $\beta_0$ (and shift u down by the same amount)

Page 10:

2.1 x, u and Dependence
 - we now need to assume that x and u are unrelated
 - if x and u are uncorrelated, u may still be correlated with functions of x such as x²
 - we therefore need a stronger assumption:

(2.6)  $E(u \mid x) = E(u) = 0$

 - the average value of u does not depend on x
 - the second equality comes from (2.5)
 - this is called the ZERO CONDITIONAL MEAN ASSUMPTION
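A rough sketch of what (2.6) means in a simulation (entirely illustrative; here u is drawn independently of x by construction, so the assumption holds): the average of u should be close to zero within any range of x values.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=10_000)
u = rng.normal(0, 1, size=10_000)   # independent of x, so E(u|x) = 0 by construction

# the average of u within each bin of x should be close to zero
edges = np.linspace(0, 10, 11)
bin_index = np.digitize(x, edges)
for b in range(1, 11):
    print(f"x in [{edges[b-1]:.0f}, {edges[b]:.0f}): mean u = {u[bin_index == b].mean():.3f}")
```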

Page 11:

2.1 Example
Take the regression:

(i.e.)  $\text{papermark} = \beta_0 + \beta_1 \, \text{paperquality} + u$

 - where u takes into account other factors affecting the paper's mark, in particular length exceeding 10 pages
 - assumption (2.6) requires that a paper's length does not depend on how good it is:

$E(\text{length} \mid \text{good paper}) = E(\text{length} \mid \text{bad paper}) = 0$

Page 12:

2.1 The Simple Regression Model

• Taking conditional expectations of (2.1) and using (2.6) gives us:

(2.8)  $E(y \mid x) = \beta_0 + \beta_1 x$

 - (2.8) is called the POPULATION REGRESSION FUNCTION (PRF)
 - a one-unit increase in x increases the expected value of y by $\beta_1$
 - $\beta_0 + \beta_1 x$ is the systematic (explained) part of y
 - u is the unsystematic (unexplained) part of y

Page 13:

2.2 Deriving the OLS Estimates
• In order to estimate $\beta_0$ and $\beta_1$, we need sample data
 - let $\{(x_i, y_i): i = 1, \dots, n\}$ be a sample of n observations from the population
 - here $y_i$ is explained by $x_i$ with error term $u_i$
 - $y_5$ indicates the observation of y from the 5th data point
 - this regression fits a "best fit" line through our data points:

(2.9)  $y_i = \beta_0 + \beta_1 x_i + u_i$

Page 14:

2.2 Deriving the OLS Estimates
These OLS estimates create a straight line going through the "middle" of the data points:

[Scatter plot: "Studying and Marks" — Marks (0–120) on the horizontal axis, Studying (0–8) on the vertical axis, with a fitted line through the points]

Page 15:

2.2 Deriving OLS Estimates
In order to derive OLS, we first need assumptions. We must first assume that u has zero expected value:

(2.10)  $E(u) = 0$

Secondly, we must assume that the covariance between x and u is zero:

(2.11)  $\text{Cov}(x, u) = E(xu) = 0$

(2.10) and (2.11) can also be rewritten in terms of x and y as:

(2.12)  $E(y - \beta_0 - \beta_1 x) = 0$

(2.13)  $E[x(y - \beta_0 - \beta_1 x)] = 0$

Page 16:

2.2 Deriving OLS Estimates
 - (2.12) and (2.13) imply restrictions on the joint probability distribution of the POPULATION
 - given SAMPLE data, these equations become:

(2.14)  $n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

(2.15)  $n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

 - notice that the "hats" above $\beta_0$ and $\beta_1$ indicate we are now dealing with estimates
 - this is an example of "method of moments" estimation (see Appendix C for a discussion)
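A minimal sketch (the data values are made up) that treats (2.14) and (2.15) as two linear equations in the two unknowns $\hat{\beta}_0$ and $\hat{\beta}_1$ and solves them directly:

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# (2.14): sum(y - b0 - b1*x) = 0      ->  n*b0      + sum(x)*b1    = sum(y)
# (2.15): sum(x*(y - b0 - b1*x)) = 0  ->  sum(x)*b0 + sum(x**2)*b1 = sum(x*y)
A = np.array([[len(x), x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

b0_hat, b1_hat = np.linalg.solve(A, b)
print(b0_hat, b1_hat)
```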

Page 17:

2.2 Deriving OLS Estimates
Using summation properties, (2.14) simplifies to:

(2.16)  $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$

Which can be rewritten as:

(2.17)  $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

Which is our OLS estimate for the intercept
 - therefore, given data and an estimate of the slope, the estimated intercept can be determined

Page 18:

2.2 Deriving OLS Estimates
By cancelling out 1/n and combining (2.17) and (2.15) we get:

$\sum_{i=1}^{n} x_i \big[ y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i \big] = 0$

Which can be rewritten as:

$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$

Page 19:

2.2 Deriving OLS Estimates
Recall the algebraic properties:

$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$

And

$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Page 20:

2.2 Deriving OLS Estimates
We can make the simple assumption that:

(2.18)  $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$

Which essentially states that not all x's are the same
 - i.e., you didn't do a survey where one question is "are you alive?"
 - this is essentially the key assumption needed to estimate $\hat{\beta}_1$

Page 21:

2.2 Deriving OLS Estimates
All this gives us the OLS estimate for $\beta_1$:

(2.19)  $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Note that assumption (2.18) ensures the denominator is not zero.
 - also note that if x and y are positively (negatively) correlated, $\hat{\beta}_1$ will be positive (negative)
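A minimal sketch (hypothetical data) computing the slope from (2.19) and then the intercept from (2.17):

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# (2.19): slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# (2.17): intercept = ybar - slope * xbar
b0_hat = y.mean() - b1_hat * x.mean()
print(b0_hat, b1_hat)
```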

Page 22:

2.2 Fitted Values
OLS estimates of $\beta_0$ and $\beta_1$ give us a FITTED value for y when $x = x_i$:

(2.20)  $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

 - there is one fitted or predicted value of y for each observation of x
 - the predicted y's can be greater than, less than or (rarely) equal to the actual y's

Page 23:

2.2 Residuals
The difference between the actual y values and the estimates is the ESTIMATED error, or residuals:

(2.21)  $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

 - again, there is one residual for each observation
 - these residuals ARE NOT the same as the actual error terms
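Continuing the same kind of sketch (again with made-up data), fitted values (2.20) and residuals (2.21) follow directly from the estimates:

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

y_hat = b0_hat + b1_hat * x   # (2.20) fitted values, one per observation
u_hat = y - y_hat             # (2.21) residuals, one per observation
print(y_hat)
print(u_hat)
```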

Page 24:

2.2 Residuals
The SUM OF SQUARED RESIDUALS can be expressed as:

(2.22)  $\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$

 - if $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to minimize (2.22), (2.14) and (2.15) are our FIRST ORDER CONDITIONS (FOC's) and we are able to derive the same OLS estimates as above, (2.17) and (2.19)
 - the term "OLS" comes from the fact that the sum of the squared residuals is minimized
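As a sanity check (illustrative only, using scipy's general-purpose minimizer rather than any routine named in the slides), minimizing (2.22) numerically on made-up data should reproduce the closed-form estimates:

```python
import numpy as np
from scipy.optimize import minimize

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(params):
    b0, b1 = params
    return np.sum((y - b0 - b1 * x) ** 2)   # (2.22)

numeric = minimize(ssr, x0=[0.0, 0.0]).x    # numerical minimizer of the SSR

# closed-form OLS estimates from (2.19) and (2.17)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

print(numeric, [b0_hat, b1_hat])   # should agree up to solver tolerance
```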

Page 25:

2.2 Why OLS?
Why minimize the sum of the squared residuals?
 - Why not minimize the residuals themselves?
 - Why not minimize the cube of the residuals?
 - not all minimization techniques can be expressed as formulas
 - OLS has the advantage of yielding unbiasedness, consistency, and other important statistical properties.

Page 26:

2.2 Regression Line
Our OLS regression supplies us with an OLS REGRESSION LINE:

(2.23)  $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

 - note that as this is an equation of a line, there are no subscripts
 - $\hat{\beta}_0$ is the predicted value of y when x = 0 (not always a valid value)
 - (2.23) is also called the SAMPLE REGRESSION FUNCTION (SRF)
 - different data sets will estimate different $\hat{\beta}$'s

Page 27:

2.2 Deriving OLS Estimates
The slope estimate:

(2.24)  $\hat{\beta}_1 = \Delta \hat{y} / \Delta x$

Shows the change in $\hat{y}$ when x changes, or alternatively,

(2.25)  $\Delta \hat{y} = \hat{\beta}_1 \Delta x$

The change in x can be multiplied by $\hat{\beta}_1$ to estimate the change in y.
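For instance, reusing the earlier numbers from (2.2): if $\hat{\beta}_1 = 3$ and x rises by 2 units, the predicted change is $\Delta\hat{y} = 3 \times 2 = 6$.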

Page 28:

2.2 Deriving OLS Estimates
• Notes:
1) As the arithmetic required to estimate OLS is tedious with more than a few data points, econometrics software (like Shazam) must be used.
2) A successful regression cannot establish causality; it can only comment on positive or negative relationships between x and y.
3) We often use the terminology "regress y on x" to mean estimating y = f(x).

Page 29:

2.3 Properties of OLS on Any Sample of Data

•Review

 - Once again, simple algebraic properties are needed in order to build OLS's foundation
 - the OLS estimates ($\hat{\beta}_0$ and $\hat{\beta}_1$) can be used to calculate fitted values ($\hat{y}$)
 - the residual ($\hat{u}$) is the difference between the actual y values and the fitted values ($\hat{y}$)

Page 30:

2.3 Properties of OLS
$\hat{u} = y - \hat{y}$

Here $\hat{y}$ underpredicts y.

[Scatter plot: "Studying and Marks" — Marks (0–120) on the horizontal axis, Studying (0–8) on the vertical axis, showing an actual point y, the fitted value $\hat{y}$ on the regression line, and the residual $\hat{u}$ between them]

Page 31:

2.3 Properties of OLS
1) From the FOC of OLS (2.14), the sum of all residuals is zero:

(2.30)  $\sum_{i=1}^{n} \hat{u}_i = 0$

2) Also from the FOC of OLS (2.15), the sample covariance between the regressors and the OLS residuals is zero:

(2.31)  $\sum_{i=1}^{n} x_i \hat{u}_i = 0$

Given (2.30), the left-hand side of (2.31) is proportional to the sample covariance between $x_i$ and $\hat{u}_i$.
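Both properties are easy to verify numerically (a sketch with made-up data; the sums are zero up to floating-point error):

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
u_hat = y - (b0_hat + b1_hat * x)

print(u_hat.sum())         # (2.30): zero up to floating-point error
print((x * u_hat).sum())   # (2.31): zero up to floating-point error
```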

Page 32:

2.3 Properties of OLS
3) The point $(\bar{x}, \bar{y})$ is always on the OLS regression line (from 2.16):

(2.16)  $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$

Further Algebraic Gymnastics:
1) From (2.30) we know that the sample average of the fitted y values equals the sample average of the actual y values:

$\bar{\hat{y}} = \bar{y}$

Page 33:

2.3 Properties of OLS
Further Algebraic Gymnastics:
2) (2.30) and (2.31) combine to prove that the sample covariance between $\hat{y}$ and $\hat{u}$ is zero.

Therefore OLS breaks down $y_i$ into two uncorrelated parts – a fitted value and a residual:

(2.32)  $y_i = \hat{y}_i + \hat{u}_i$

Page 34:

2.3 Sum of Squares
From the idea of fitted and residual components, we can calculate the TOTAL SUM OF SQUARES (SST), the EXPLAINED SUM OF SQUARES (SSE) and the RESIDUAL SUM OF SQUARES (SSR):

(2.33)  $\text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2$

(2.34)  $\text{SSE} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

(2.35)  $\text{SSR} = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Page 35:

2.3 Sum of Squares
SST measures the sample variation in y.
SSE measures the sample variation in $\hat{y}$ (the fitted component).
SSR measures the sample variation in $\hat{u}$ (the residual component).

These relate to each other as follows:

(2.36)  $\text{SST} = \text{SSE} + \text{SSR}$
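A short numerical check of (2.33)–(2.36) on made-up data:

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)       # (2.33)
sse = np.sum((y_hat - y.mean()) ** 2)   # (2.34)
ssr = np.sum(u_hat ** 2)                # (2.35)
print(sst, sse + ssr)                   # (2.36): equal up to rounding
```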

Page 36:

2.3 Proof of Squares
The proof of (2.36) is as follows:

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \big[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\big]^2$
$\qquad = \sum_{i=1}^{n} \big[\hat{u}_i + (\hat{y}_i - \bar{y})\big]^2$
$\qquad = \sum_{i=1}^{n} \hat{u}_i^2 + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$\qquad = \text{SSR} + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \text{SSE}$

Since the sample covariance between residuals and fitted values is zero (shown above),

(2.37)  $2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0$

and (2.36) follows.

Page 37:

2.3 Properties of OLS on Any Sample of Data

•Notes

 - An in-depth analysis of sample and inter-variable covariance is available in Appendix C for individual study
 - SST, SSE and SSR have differing interpretations and labels across econometric software packages. As such, it is always important to look up the underlying formula.

Page 38:

2.3 Goodness of Fit

 - Once we've run a regression, the question arises: "How well does x explain y?"
 - We can't answer that yet, but we can ask, "How well does the OLS regression line fit the data?"
 - To measure this, we use $R^2$, the COEFFICIENT OF DETERMINATION:

(2.38)  $R^2 = \dfrac{\text{SSE}}{\text{SST}} = 1 - \dfrac{\text{SSR}}{\text{SST}}$

Page 39:

2.3 Goodness of Fit

 - "$R^2$ is the ratio of the explained variation compared to the total variation"
 - "the fraction of the sample variation in y that is explained by x"
 - $R^2$ always lies between zero and one
 - if $R^2 = 1$, all actual points lie on the regression line (usually an error)
 - if $R^2 \approx 0$, the regression explains very little; OLS is a "poor fit"
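A small sketch (same style of made-up data) computing $R^2$ both ways in (2.38):

```python
import numpy as np

# made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)

print(sse / sst, 1 - ssr / sst)   # (2.38): the two expressions give the same R-squared
```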

Page 40:

2.3 Properties of OLS on Any Sample of Data

•Notes

 - A low $R^2$ is not uncommon in the social sciences, especially in cross-sectional analysis
 - econometric regressions should not be judged harshly solely because of a low $R^2$
 - for example, if $R^2 = 0.12$, then 12% of the sample variation in y is explained by x, which is better than the 0% explained before the regression