
INTERPRETATION OF A REGRESSION EQUATION

The scatter diagram shows hourly earnings in 2002 plotted against years of schooling, defined as highest grade completed, for a sample of 540 respondents from the National Longitudinal Survey of Youth 1979–.

[Figure: scatter diagram of hourly earnings ($), axis from -20 to 120, against years of schooling (highest grade completed), axis from 0 to 20.]


Highest grade completed means just that for elementary and high school. Grades 13, 14, and 15 mean completion of one, two and three years of college.


Grade 16 means completion of four years of college. Higher grades indicate years of postgraduate education.


. reg EARNINGS S

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  112.15
       Model |  19321.5589     1  19321.5589           Prob > F      =  0.0000
    Residual |  92688.6722   538  172.283777           R-squared     =  0.1725
-------------+------------------------------           Adj R-squared =  0.1710
       Total |  112010.231   539  207.811189           Root MSE      =  13.126

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.455321   .2318512    10.59   0.000     1.999876    2.910765
       _cons |  -13.93347   3.219851    -4.33   0.000    -20.25849   -7.608444
------------------------------------------------------------------------------

This is the output from a regression of earnings on years of schooling, using Stata.


For the time being, we will be concerned only with the estimates of the parameters. The variables in the regression are listed in the first column and the second column gives the estimates of their coefficients.


In this case there is only one variable, S, and its coefficient is 2.46. _cons, in Stata, refers to the constant. The estimate of the intercept is –13.93.
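As a rough illustration of how the two estimates combine into a fitted equation, the following Python sketch copies the values from the Coef. column of the Stata output and evaluates the fitted line. The names `B1`, `B2` and `fitted_earnings` are illustrative assumptions, not part of the original slides or of Stata.

```python
# Estimates copied from the Coef. column of the Stata output above.
B1 = -13.93347   # _cons: estimated intercept
B2 = 2.455321    # S: estimated slope coefficient


def fitted_earnings(s: float) -> float:
    """Fitted hourly earnings ($) for s years of schooling (hypothetical helper)."""
    return B1 + B2 * s


if __name__ == "__main__":
    # Rounded to two decimals, these are the figures quoted in the text.
    print(f"slope     = {B2:.2f}")
    print(f"intercept = {B1:.2f}")
    print(f"fitted earnings at S = 12: {fitted_earnings(12):.2f}")
```

Keeping the unrounded estimates in the constants and rounding only when printing avoids accumulating rounding error in later calculations.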


Here is the scatter diagram again, with the regression line shown.

[Figure: the scatter diagram of hourly earnings ($) against years of schooling (highest grade completed), with the fitted regression line.]

\( \widehat{EARNINGS} = -13.93 + 2.46\,S \)


What do the coefficients actually mean?


To answer this question, you must refer to the units in which the variables are measured.


S is measured in years (strictly speaking, grades completed), EARNINGS in dollars per hour. So the slope coefficient implies that hourly earnings increase by $2.46 for each extra year of schooling.
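The step from the fitted equation to this statement can be written out explicitly. In the sketch below, \(b_1\) and \(b_2\) are assumed labels for the estimated intercept and slope; the slides themselves only give the numerical values.

\[
\widehat{EARNINGS}(S+1) - \widehat{EARNINGS}(S)
  = \bigl[b_1 + b_2 (S+1)\bigr] - \bigl[b_1 + b_2 S\bigr]
  = b_2 = 2.46
\]

The intercept cancels, so the predicted change for one extra year of schooling is the same at every level of S, which is a consequence of fitting a straight line.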


We will look at a geometrical representation of this interpretation. To do this, we will enlarge the marked section of the scatter diagram.


[Figure: enlarged section of the scatter diagram with the regression line. Horizontal axis: years of schooling, 10.8 to 12.2. Vertical axis: hourly earnings ($), 7 to 21.]

The regression line indicates that completing 12th grade instead of 11th grade would increase earnings by $2.46, from $13.07 to $15.53, as a general tendency.
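The arithmetic behind these figures can be checked with a short Python sketch. It uses the unrounded estimates from the Stata output; the names `INTERCEPT`, `SLOPE` and `fitted` are illustrative assumptions. With the unrounded coefficients the grade 11 value comes out near 13.075, so its last cent depends on how the rounding is done.

```python
# Unrounded estimates from the Coef. column of the Stata output.
INTERCEPT = -13.93347
SLOPE = 2.455321


def fitted(s: float) -> float:
    """Fitted hourly earnings ($) at s years of schooling (hypothetical helper)."""
    return INTERCEPT + SLOPE * s


grade_11 = fitted(11)        # fitted earnings after completing 11th grade
grade_12 = fitted(12)        # fitted earnings after completing 12th grade
gain = grade_12 - grade_11   # equals the slope coefficient

if __name__ == "__main__":
    print(f"grade 11: {grade_11:.3f}")
    print(f"grade 12: {grade_12:.3f}")
    print(f"gain from one extra year: {gain:.3f}")
```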

[Figure annotations: a step of one year along the horizontal axis corresponds to a rise of $2.46 along the regression line, from $13.07 to $15.53.]

You should ask yourself whether this is a plausible figure. If it is implausible, this could be a sign that your model is misspecified in some way.


For low levels of education it might be plausible. But for high levels it would seem to be an underestimate.


What about the constant term? (Try to answer this question yourself before continuing with this sequence.)


Literally, the constant indicates that an individual with no years of education would have to pay $13.93 per hour to be allowed to work.


This does not make any sense at all. In former times craftsmen might require an initial payment when taking on an apprentice, and might pay the apprentice little or nothing for quite a while, but an interpretation of negative payment is impossible to sustain.


A safe solution to the problem is to limit the interpretation to the range of the sample data, and to refuse to extrapolate on the ground that we have no evidence outside the data range.
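One way to make this restriction concrete is to have a prediction helper refuse any value of S outside the observed range. The Python sketch below is only an illustration: the function name and the `s_min` and `s_max` parameters are assumptions, and the bounds used in the example (7 and 20) are placeholders, since the slides do not state the lowest and highest grades in the sample.

```python
# Unrounded estimates from the Coef. column of the Stata output.
B1 = -13.93347   # intercept
B2 = 2.455321    # slope


def fitted_earnings_in_range(s: float, s_min: float, s_max: float) -> float:
    """Return fitted hourly earnings ($), refusing to extrapolate.

    s_min and s_max are the smallest and largest values of S in the sample;
    the caller must supply them (they are not given in the slides).
    """
    if not (s_min <= s <= s_max):
        raise ValueError(
            f"S = {s} lies outside the sample range [{s_min}, {s_max}]; "
            "the regression provides no evidence there"
        )
    return B1 + B2 * s


if __name__ == "__main__":
    # Placeholder bounds, for illustration only.
    print(f"{fitted_earnings_in_range(12, 7, 20):.2f}")
    try:
        fitted_earnings_in_range(0, 7, 20)   # the constant term corresponds to S = 0
    except ValueError as err:
        print("refused:", err)
```

Under this design, the nonsensical negative wage at S = 0 is never reported, because the constant is used only to position the line within the data range.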


With this explanation, the only function of the constant term is to enable you to draw the regression line at the correct height on the scatter diagram. It has no meaning of its own.


Another solution is to explore the possibility that the true relationship is nonlinear and that we are approximating it with a linear regression. We will soon extend the regression technique to fit nonlinear models.


Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 1.4 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.

2012.10.28