8/13/2019 An Overview of Regression Analysis
Chapter 1
An Overview of Regression Analysis
Copyright © 2011 Pearson Addison-Wesley. All rights reserved.
Slides by Niels-Hugo Blunch, Washington and Lee University
What is Econometrics?
"Econometrics is too mathematical; it's the reason my best friend isn't majoring in economics"
"There are two things you are better off not watching in the making: sausages and econometric estimates"
"Econometrics may be defined as the quantitative analysis of actual economic phenomena"
"Blind people trying to describe an elephant just based on what they happen to be touching"
What is Econometrics? (cont.)
Econometrics literally means "economic measurement"
It is the quantitative measurement and analysis of actual economic and business phenomena, and so involves:
economic theory
statistics
math
observation/data collection
Econometrics attempts to bridge the gap between economic theory and the real world
What is Econometrics? (cont.)
Three major uses of econometrics:
Describing economic reality (what is the quantitative relationship between consumption and income?)
Testing hypotheses about economic theory (are potatoes a normal good or an inferior good?)
Forecasting future economic activity (what will the U.S. unemployment rate be in 2013?)
Econometrics vs. Statistics
Econometrics is based on statistics, but:
Econometrics deals with economic issues
Econometrics has many unique techniques
Econometrics is designed for economic data
Example
Consider this general and purely theoretical relationship (from microeconomic theory):
$Q = f(P, P_s, Y_d)$ (1.1)
$Q$: quantity demanded
$P$: price
$P_s$: price of a substitute
$Y_d$: disposable income
Econometrics allows this general and purely theoretical relationship to become explicit:
$Q = 27.7 - 0.11P + 0.03P_s + 0.23Y_d$ (1.2)
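To see what "explicit" buys us, the estimated equation (1.2) can be evaluated directly. The sketch below is only an illustration: the function name and the input values for price, substitute price, and income are hypothetical and do not come from the slides.

```python
def quantity_demanded(p, ps, yd):
    """Evaluate equation (1.2): Q = 27.7 - 0.11*P + 0.03*Ps + 0.23*Yd."""
    return 27.7 - 0.11 * p + 0.03 * ps + 0.23 * yd

# Hypothetical inputs, chosen only for illustration (not from the slides).
base = quantity_demanded(p=10.0, ps=12.0, yd=100.0)
higher_price = quantity_demanded(p=11.0, ps=12.0, yd=100.0)

print(round(base, 2))                 # 49.96
print(round(higher_price - base, 2))  # -0.11
```

Raising the price by one unit while holding the other two variables fixed lowers the predicted quantity by 0.11, which is exactly the coefficient on P.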
What is Econometrics? (cont.)
Econometric models/approaches:
Linear regression model
Nonlinear regression model
Time-series model
Panel data model
Discrete choice model
Simultaneous equations model
Event counts model
Duration-time model
This course will focus on the linear regression model
What is Regression Analysis?
Economic theory can give us the direction of a change, e.g., the change in the demand for DVDs following a price decrease (or price increase)
But what if we want to know not just "how?" but also "how much?"
Then we need:
A sample of data
A way to estimate such a relationship
One of the most frequently used ways is regression analysis
What is Regression Analysis? (cont.)
Formally, regression analysis is a statistical technique that attempts to explain movements in one variable, the dependent variable, as a function of movements in a set of other variables, the independent (or explanatory) variables, through the quantification of a single equation
Example
Return to the example from before:
$Q = f(P, P_s, Y_d)$ (1.1)
Here, $Q$ is the dependent variable and $P$, $P_s$, and $Y_d$ are the independent (explanatory) variables
Don't be deceived by the words "dependent" and "independent": a statistically significant regression result does not necessarily imply causality (that is, a cause-and-effect relationship)
We also need:
Economic theory
Common sense
Linear Model
The simplest example is:
$Y_i = \beta_0 + \beta_1 X_i \quad (i = 1, 2, \ldots, N)$ (1.3)
The $\beta$s are called coefficients
$\beta_0$ is the constant or intercept term
$\beta_1$ is the slope coefficient: the amount that Y will change when X increases by one unit; for a linear model, $\beta_1$ is constant over the entire function
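As a short worked check of the slope interpretation, evaluate the linear model at $X_i + 1$ and at $X_i$ and subtract. The symbol $\Delta Y$ is introduced here for illustration only and is not in the slides.

```latex
\Delta Y = \bigl[\beta_0 + \beta_1 (X_i + 1)\bigr] - \bigl[\beta_0 + \beta_1 X_i\bigr] = \beta_1
```

The difference does not depend on $X_i$, which is why the slope is the same at every point on the line.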
Figure 1.1 Graphical Representation of the Coefficients of the Regression Line
Linear Model (cont.)
Application of linear regression techniques requires that the equation be linear, such as (1.3)
By contrast, the equation
$Y = \beta_0 + \beta_1 X^2$ (1.4)
is not linear
What to do? First define
$Z = X^2$ (1.5)
Substituting into (1.4) yields:
$Y = \beta_0 + \beta_1 Z$ (1.6)
This redefined equation is now linear (in the coefficients $\beta_0$ and $\beta_1$ and in the variables Y and Z)
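The substitution in (1.5) can be tried out numerically. The sketch below makes two assumptions that are not in the slides at this point: the data are hypothetical and generated exactly from $Y = 2 + 3X^2$, and the line is fitted by ordinary least squares using a small helper, `ols_simple`, written here for illustration.

```python
def ols_simple(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data generated exactly from Y = 2 + 3*X^2 (no error term).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi ** 2 for xi in x]

z = [xi ** 2 for xi in x]        # the redefinition Z = X^2 from (1.5)

b0_z, b1_z = ols_simple(z, y)    # fit of (1.6): linear in Z
b0_x, b1_x = ols_simple(x, y)    # straight line in X, for contrast

print(round(b0_z, 6), round(b1_z, 6))
print(round(b0_x, 6), round(b1_x, 6))
```

Fitting Y on Z recovers the intercept 2 and slope 3 used to generate the data, while the straight line in X gives quite different numbers because it imposes the wrong functional form.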
Linear Regression Model
Is (1.3) a complete description of the origins of variation in Y?
No, there are at least four sources of variation in Y other than the variation in the included Xs:
Other potentially important explanatory variables may be missing (e.g., $X_2$ and $X_3$)
Measurement error
Incorrect functional form
Purely random and totally unpredictable occurrences
Inclusion of a stochastic error term ($\epsilon$) partially takes care of these other sources of variation in Y that are NOT captured by X, so that (1.3) becomes:
$Y = \beta_0 + \beta_1 X + \epsilon$ (1.7)
Example: Aggregate Consumption Function
Aggregate consumption as a function of aggregate income may be lower (or higher) than it would otherwise have been due to:
consumer uncertainty, which is hard (impossible?) to measure, i.e., it is an omitted variable
Observed consumption may be different from actual consumption due to measurement error
The true consumption function may be nonlinear but a linear one is estimated (see Figure 1.2 for a graphical illustration)
Human behavior always contains some element(s) of pure chance; unpredictable, i.e., random, events may increase or decrease consumption at any given time
Whenever one or more of these factors are at play, the observed Y will differ from the Y predicted from the deterministic part, $\beta_0 + \beta_1 X$
Figure 1.2 Errors Caused by Using a Linear Functional Form to Model a Nonlinear Relationship
General Regression Model
General regression model: $Y = E(Y|X) + \epsilon$
deterministic component: $E(Y|X)$
stochastic/random component: $\epsilon$
What is $E(Y|X)$?
The deterministic component can be thought of as the expected value of Y given X, namely $E(Y|X)$, i.e., the mean (or average) value of the Ys associated with a particular value of X
In mathematics, $E(Y|X)$ is the conditional expectation (that is, the expectation of Y conditional on X)
In the linear regression model, $E(Y|X) = \beta_0 + \beta_1 X$
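The idea that $E(Y|X)$ is the average of the Ys at a particular X can be checked with a small simulation. Everything specific in this sketch is an assumption made for illustration: the coefficient values, the normally distributed error with mean zero, and the helper names.

```python
import random

random.seed(0)

# Hypothetical population parameters, chosen only for this illustration.
BETA0, BETA1 = 1.0, 2.0

def draw_y(x):
    """One draw of Y = beta0 + beta1*X + eps, with eps ~ Normal(0, 1)."""
    return BETA0 + BETA1 * x + random.gauss(0.0, 1.0)

def average_y_given_x(x, n_draws=20000):
    """Sample average of Y at a fixed X, an estimate of E(Y|X = x)."""
    return sum(draw_y(x) for _ in range(n_draws)) / n_draws

for x in (0.0, 1.0, 3.0):
    # Compare the simulated average with the deterministic part beta0 + beta1*x.
    print(x, round(average_y_given_x(x), 2), BETA0 + BETA1 * x)
```

Individual draws of Y scatter around the line, but their average at each X settles close to $\beta_0 + \beta_1 X$, the deterministic component.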
Extending the Notation
Include the index of observations
Single explanatory variable case:
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad (i = 1, 2, \ldots, N)$ (1.10)
So there are really N equations, one for each observation
$\beta_0$ and $\beta_1$ are the coefficients (they are population parameters!)
The values of Y, X, and $\epsilon$ differ across observations
Three Types of Data in Econometrics
Subscript i for data on individuals (so-called cross-sectional data)
Subscript t for time series data (e.g., series of years, months, or days; daily exchange rates, for example)
Subscript it when we have both (for example, panel data)
Multivariate Linear Regression Model
The general case: multivariate linear regression model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \epsilon_i \quad (i = 1, 2, \ldots, N)$ (1.11)
Each of the slope coefficients gives the impact of a one-unit increase in the corresponding X on Y, holding the other included explanatory variables constant (i.e., ceteris paribus)
Example: Wage Regression
Let wages (WAGE) depend on:
years of work experience (EXP)
years of education (EDU)
gender of the worker (GEND: 1 if male, 0 if female)
Substituting into equation (1.11) yields:
$\text{WAGE}_i = \beta_0 + \beta_1 \text{EXP}_i + \beta_2 \text{EDU}_i + \beta_3 \text{GEND}_i + \epsilon_i$ (1.12)
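One way to read the coefficient on GEND in (1.12) is to compare the expected wage of a male and a female worker with the same experience and education. This is a sketch that relies on an added assumption not stated on this slide, namely that the error term has mean zero given the explanatory variables.

```latex
\begin{aligned}
E(\text{WAGE}_i \mid \text{EXP}_i, \text{EDU}_i, \text{GEND}_i = 1) &= \beta_0 + \beta_1 \text{EXP}_i + \beta_2 \text{EDU}_i + \beta_3 \\
E(\text{WAGE}_i \mid \text{EXP}_i, \text{EDU}_i, \text{GEND}_i = 0) &= \beta_0 + \beta_1 \text{EXP}_i + \beta_2 \text{EDU}_i \\
\text{difference} &= \beta_3
\end{aligned}
```

Under that assumption, $\beta_3$ is the expected wage gap between male and female workers, holding EXP and EDU constant.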
The Estimated Regression Equation
The regression equation considered so far is the "true" but unknown regression equation
Instead of "true," one might think about this as the population regression equation
The population regression equation has to be estimated, so we need estimators of the regression coefficients
The sample/estimated regression equation is:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
The signs on top of the estimators are called "hats," so that we have $\beta_0$-hat and $\beta_1$-hat (they are sample statistics, or estimators of the regression coefficients!)
For each sample we get a different set of estimated regression coefficients
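The Estimated Regression Equation (cont.)
$\hat{Y}_i$ is the estimated value (or an estimate) of $Y_i$
The residual, $e_i$, is given as
$e_i = Y_i - \hat{Y}_i$ (1.17)
Note that $e_i$ is different from the error term, $\epsilon_i$,
$\epsilon_i = Y_i - E(Y_i|X_i)$ (1.18)
$e_i$ is an estimate of $\epsilon_i$; the smaller the residuals $e_i$ are in absolute value, the better the fit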
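The difference between residuals and error terms is easiest to see in a simulation, where the "true" coefficients are known by construction. The sketch below assumes hypothetical coefficient values, a normally distributed error, and estimation by ordinary least squares; none of these specifics come from the slides.

```python
import random

random.seed(1)

# Hypothetical "true" (population) coefficients; unknown in practice.
BETA0, BETA1 = 5.0, 0.5

x = [float(i) for i in range(1, 21)]
errors = [random.gauss(0.0, 1.0) for _ in x]            # epsilon_i
y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, errors)]

# Ordinary least squares estimates of the intercept and slope.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

fitted = [b0_hat + b1_hat * xi for xi in x]             # Y_i-hat
residuals = [yi - fi for yi, fi in zip(y, fitted)]      # e_i = Y_i - Y_i-hat

print("estimates:", round(b0_hat, 3), round(b1_hat, 3))
print("sum of residuals:", round(sum(residuals), 10))
print("sum of errors:", round(sum(errors), 10))
```

The residuals are measured from the estimated line and the errors from the true line, so the two lists differ observation by observation. With an intercept in the model, least-squares residuals sum to zero by construction, while the simulated errors generally do not. Changing the seed gives a different sample and therefore different estimates.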
Figure 1.3 True and Estimated Regression Lines
Example: Using Regression to Explain Housing Prices
Houses are not homogeneous products that have generally known market prices
So, how to appraise a house against a given asking price?
Yes, it's true: many real estate companies actually use regression analysis to do this!
Consider a specific case: suppose the asking price was $230,000
Is this fair / too much / too little?
Example: Using Regression to Explain Housing Prices (cont.)
The answer depends on the size of the house (larger size, higher price)
So, collect cross-sectional data on prices (in thousands of dollars) and sizes (in square feet) for, say, 43 houses
Then say this yields the following estimated regression line:
$\widehat{\text{PRICE}}_i = 40.0 + 0.138\,\text{SIZE}_i$ (1.23)
Figure 1.5 A Cross-Sectional Model of Housing Prices
Example: Using Regression to Explain Housing Prices (cont.)
Note that the interpretation of the intercept term is problematic in this case (we'll get back to this later, in Section 7.1.2)
The literal interpretation of the intercept here is the price of a house with a size of zero square feet
Example: Using Regression to Explain Housing Prices (cont.)
How to use the estimated regression line / estimated regression coefficients to answer the question?
Just plug the particular size of the house that you are interested in (here, 1,600 square feet) into (1.23)
Alternatively, read off the estimated price using Figure 1.5
Either way, we get an estimated price of $260.8 thousand (= 40 + 0.138 × 1,600)
So, in terms of our original question, it's a good deal: go ahead and purchase!!
Note that we simplified a lot in this example by assuming that only size matters for housing prices (we ignore the year when the house was built, the distance to work, the number of bedrooms, etc.)
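The plug-in calculation above can be written out as a few lines of code. The coefficients, the house size, and the asking price are taken from the example; the function and variable names are chosen here for illustration.

```python
def estimated_price(size_sqft):
    """Estimated price in thousands of dollars from equation (1.23)."""
    return 40.0 + 0.138 * size_sqft

asking_price = 230.0               # thousands of dollars, from the example
estimate = estimated_price(1600)   # size in square feet, from the example
gap = estimate - asking_price      # positive: asking price is below the estimate

print(round(estimate, 1))  # 260.8
print(round(gap, 1))       # 30.8
```

The asking price sits about $30.8 thousand below the estimated price, which is the sense in which the slide calls it a good deal.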
Another Example: Weight and Height
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
$Y_i$ = the weight (in pounds) of the ith customer
$X_i$ = the height (in inches above 5 feet) of the ith customer
$\epsilon_i$ = the stochastic error term for the ith customer
The estimated regression function is:
Estimated Weight = 103.40 + 6.38 × Height (in inches above 5 feet)
$\hat{Y}_i = 103.40 + 6.38 X_i$
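Because X is measured in inches above 5 feet, a total height has to be converted before it is plugged in. A minimal sketch, using a hypothetical customer who is 5 feet 10 inches tall (the function name is also an illustration, not part of the slides):

```python
def estimated_weight(height_inches_total):
    """Estimated weight in pounds from Y-hat = 103.40 + 6.38 * X,
    where X is height in inches above 5 feet (60 inches)."""
    inches_above_5ft = height_inches_total - 60
    return 103.40 + 6.38 * inches_above_5ft

# Hypothetical customer who is 5 feet 10 inches tall (70 inches in total).
print(round(estimated_weight(70), 2))  # 167.2
```

For a customer exactly 5 feet tall, X is zero and the estimate is just the intercept, 103.40 pounds.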
Key Terms from Chapter 1
Econometrics
Regression analysis
Dependent variable
Independent variable(s)
Explanatory variable(s)
Causality
Linear regression model
Coefficient
Intercept
Slope
Error term
Conditional expectation
Multivariate linear regression model
Cross-sectional data
Time series data
Panel data
Residual