Econ 140
Multiple Regression Applications III
Lecture 18
Dummy variables
• Include qualitative indicators in the regression: e.g. gender, race, regime shifts.
• So far, we have only seen the change in the intercept for the regression line.
• Suppose now we wish to investigate if the slope changes as well as the intercept.
• This can be written as a general equation:
Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) + b5(Di*Marriedi) + ei
• Suppose first we wish to test for the difference between males and females.
Interactive terms
• For females and males separately, the model would be:
Wi = a + b1Agei + b2Marriedi + ei
– in so doing we argue that $\hat{b}_1$, $\hat{b}_2$ and $\hat{a}$ would be different for males and females
– we want to think about two sub-sample groups: males and females
– we can test the hypothesis that the intercept and partial slope coefficients will be different for these 2 groups
Interactive terms (2)
• To test our hypothesis we’ll estimate the regression equation above (Wi = a + b1Agei + b2Marriedi + ei) for the whole sample and then for the two sub-sample groups
• We test to see if our estimated coefficients are the same between males and females
• Our null hypothesis is:
H0 : aM = aF, b1M = b1F, b2M = b2F
Interactive terms (3)
• We have an unrestricted form and a restricted form
– unrestricted: used when we estimate for the sub-sample groups separately
– restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
– F-statistic:
$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 + n_2 - 2k)}$$
q = k, the number of parameters in the model
n = n1 + n2, where n is the complete sample size
Interactive terms (4)
• The sum of squared residuals for the unrestricted form will be:
SSRU = SSRM + SSRF
• L17_2.xls
– the data is sorted according to the dummy variable “female”
– there is a second dummy variable for marital status
– there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample
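The three-regression procedure can be sketched in code. This is a minimal illustration, not the spreadsheet itself: the data are simulated, and the names `ssr`, `age`, `married`, `female` and `wage` are hypothetical stand-ins for the columns in L17_2.xls.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X (X includes a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Simulated data (an assumption; these are NOT the L17_2.xls values).
rng = np.random.default_rng(0)
n = 40
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
female = np.repeat([0.0, 1.0], n // 2)  # sorted by the dummy, as on the slide
wage = 2.0 + 0.1 * age + 0.5 * married + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), age, married])
k = X.shape[1]            # parameters per equation: a, b1, b2
is_f = female == 1

ssr_r = ssr(X, wage)                  # restricted: whole sample
ssr_m = ssr(X[~is_f], wage[~is_f])    # male sub-sample
ssr_f = ssr(X[is_f], wage[is_f])      # female sub-sample
ssr_u = ssr_m + ssr_f                 # unrestricted: SSR_M + SSR_F

q = k
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - 2 * k))
print(f"SSR_R={ssr_r:.3f}  SSR_U={ssr_u:.3f}  F={F:.3f}")
```

The unrestricted sum of squared residuals can never exceed the restricted one, so F is non-negative by construction.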
Interactive terms (5)
• The output allows us to gather the necessary sum of squared residuals and sample sizes to construct the test statistic:
$$F^* = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 + n_2 - 2k)} = \frac{(16.261 - 7.495 - 5.093)/3}{(7.495 + 5.093)/(33 - 6)} = \frac{1.224}{0.466} = 2.626$$
– Since F0.05,3,27 = 2.96 > F*, we cannot reject the null hypothesis that the intercept and partial slope coefficients are the same for males and females
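The arithmetic behind the test statistic can be checked in a few lines. The figures below are the sums of squared residuals and sample sizes as best they can be read from the slide's worked example; the variable names are illustrative.

```python
# Values as read from the slide's worked example (assumed reading).
ssr_r = 16.261            # restricted: whole sample
ssr_u = 7.495 + 5.093     # unrestricted: sum of the two sub-sample SSRs
n, k = 33, 3              # complete sample size, parameters per equation
q = k

numerator = (ssr_r - ssr_u) / q
denominator = ssr_u / (n - 2 * k)
F = numerator / denominator

F_crit = 2.96             # F(0.05; 3, 27), as quoted on the slide
reject = F > F_crit

print(round(numerator, 3), round(denominator, 3), round(F, 3), reject)
# prints: 1.224 0.466 2.626 False
```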
Interactive terms (6)
• What if F* > F0.05,3, 27 ? How to read the results?
– There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately
– Or we could interact the dummy variables with the other variables
• To interact the dummy variables with the age and marital status variables, we multiply the dummy variable by the age and marital status variables to get:
Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) + b5(Di*Marriedi) + ei
Interactive terms (7)
• Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns
– one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms
– in this example, neither of the t-ratios is statistically significant, so we can’t reject the null hypothesis
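Constructing the interactive terms and reading off their t-ratios might look like the sketch below. The data are simulated (an assumption, not the spreadsheet values) with no true interaction effect, and the standard errors use the usual OLS formula s²(X'X)⁻¹.

```python
import numpy as np

# Simulated data (hypothetical; not the L17_2.xls values).
rng = np.random.default_rng(1)
n = 200
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
female = rng.integers(0, 2, n).astype(float)
wage = 2.0 + 0.1 * age + 0.5 * married - 0.3 * female + rng.normal(0, 1, n)

# Interactive terms: multiply the FEMALE column by AGE and by MARRIED.
names = ["const", "age", "married", "female", "female*age", "female*married"]
X = np.column_stack([np.ones(n), age, married, female,
                     female * age, female * married])

coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
resid = wage - X @ coef
s2 = resid @ resid / (n - X.shape[1])            # residual variance estimate
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_ratios = coef / se

for name, b, t in zip(names, coef, t_ratios):
    print(f"{name:>15}: coef={b:8.4f}  t={t:7.3f}")
```

The last two rows are the t-ratios on the interactive terms; values that are small in absolute terms are consistent with equal slopes across the two sub-samples.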
Interactive terms (8)
• If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero:
E(Wi|Di = 0) = a + b1Agei + b2Marriedi
• We can do the same for the second sub-sample (females):
E(Wi|Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi
• We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
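A quick numerical check of this point: with a full set of interactions, the single pooled regression implies the same coefficients as two separate sub-sample regressions. The data and the helper `ols` are hypothetical, used only for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X (X includes a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (an assumption) where intercept and slopes differ by group.
rng = np.random.default_rng(2)
n = 100
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
d = rng.integers(0, 2, n).astype(float)   # 1 = female, 0 = male
wage = (2 + 0.1 * age + 0.5 * married
        + d * (-1 + 0.05 * age + 0.2 * married)
        + rng.normal(0, 1, n))

# One pooled regression with the dummy and its interactions.
X_pooled = np.column_stack([np.ones(n), age, married, d, d * age, d * married])
a, b1, b2, b3, b4, b5 = ols(X_pooled, wage)

# Two separate sub-sample regressions.
X_sub = np.column_stack([np.ones(n), age, married])
male_coef = ols(X_sub[d == 0], wage[d == 0])
female_coef = ols(X_sub[d == 1], wage[d == 1])

implied_male = np.array([a, b1, b2])
implied_female = np.array([a + b3, b1 + b4, b2 + b5])
print(np.allclose(implied_male, male_coef), np.allclose(implied_female, female_coef))
```

The coefficients match, although the pooled regression imposes a single error variance, so its standard errors need not equal those from the separate fits.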
Phillips Curve example
• Phillips curve as an example of a regime shift.
• Data points from 1950 - 1970: There is a downward sloping, reciprocal relationship between wage inflation and unemployment
[Figure: wage inflation (W) plotted against unemployment (UN), 1950 - 1970]
Phillips Curve example (2)
• But if we look at data points from 1971 - 1996:
• From the data we can detect an upward sloping relationship
• ALWAYS graph the data between the 2 main variables of interest
[Figure: wage inflation (W) plotted against unemployment (UN), 1971 - 1996]
Phillips Curve example (3)
• There seems to be a regime shift between the two periods
– note: this is an arbitrary choice of regime shift - it was not dictated by a specific change
• We will use the Chow Test (F-test) to test for this regime shift
– the test will use a restricted form:
$$W_t = a + b\,\frac{1}{UN_t}$$
– it will also use an unrestricted form:
$$W_t = a + b_1 D_t + b_2\,\frac{1}{UN_t} + b_3 D_t\,\frac{1}{UN_t}$$
– D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996
Phillips Curve example (4)
• L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test:
• The null hypothesis will be:
H0 : b1 = b3 = 0
– we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient
• The F-statistic is (* indicates restricted):
$$F = \frac{\left(\sum \hat{e}^{*2} - \sum \hat{e}^{2}\right)/q}{\sum \hat{e}^{2}/(n - k)}$$
where q = 2
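A minimal sketch of this Chow test in code follows. The series are simulated (an assumption; they are not the L17_3.xls data), and k is taken to be the number of parameters in the unrestricted equation.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Simulated annual data, 1950-1996 (hypothetical values).
rng = np.random.default_rng(3)
years = np.arange(1950, 1997)
D = (years >= 1971).astype(float)        # regime dummy: 0 before 1971, 1 after
un = rng.uniform(3, 10, years.size)      # hypothetical unemployment rate
inv_un = 1.0 / un
w = 1 + 12 * inv_un + D * (8 - 30 * inv_un) + rng.normal(0, 0.5, years.size)

ones = np.ones(years.size)
X_r = np.column_stack([ones, inv_un])                  # restricted form
X_u = np.column_stack([ones, D, inv_un, D * inv_un])   # unrestricted form

ssr_r = ssr(X_r, w)
ssr_u = ssr(X_u, w)
n, k = X_u.shape      # k = 4 parameters in the unrestricted model (assumption)
q = 2                 # restrictions under H0: b1 = b3 = 0

F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(f"SSR*={ssr_r:.3f}  SSR={ssr_u:.3f}  F={F:.3f}")
```

A value of F above the critical value for (q, n - k) degrees of freedom would lead us to reject the null of no regime shift.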
Phillips Curve example (5)
• The expectation of wage inflation for the first time period:
$$E(W \mid D = 0) = a + b_2\,\frac{1}{UN}$$
• The expectation of wage inflation for the second time period:
$$E(W \mid D = 1) = (a + b_1) + (b_2 + b_3)\,\frac{1}{UN}$$
• You can use the spreadsheet data to carry out these calculations
Relaxing Assumptions
Today’s Plan
• A review of what we have learned in regression so far and a look forward to what will happen when we relax assumptions around the regression line
• Introduction to new concepts:
– Heteroskedasticity
– Serial correlation (also known as autocorrelation)
– Non-independence of independent variables
CLRM Revision
• Calculating the linear regression model (using OLS)
• Use of the sum of squared residuals: calculate the variance for the regression line and the mean squared deviation
• Hypothesis tests: t-tests, F-tests, $\chi^2$ test.
• Coefficient of determination ($R^2$) and the adjustment.
• Modeling: use of log-linear, logs, reciprocal.
• Relationship between F and $R^2$
• Imposing linear restrictions: e.g. H0: b2 = b3 = 0 (q = 2); H0: + = 1.
• Dummy variables and interactions; Chow test.
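One item in this list, the relationship between F and R2, can be written out explicitly. The sketch below assumes k counts all estimated parameters including the intercept, and that the F-statistic in question tests the null hypothesis that all slope coefficients are zero:

```latex
F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
```

For given n and k, a higher R2 therefore implies a larger F.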
Relaxing assumptions
• What are the assumptions we have used throughout?
• Two assumptions about the population for the bi-variate case:
1. E(Y|X) = a + bX (the conditional expectation function is linear)
2. V(Y|X) = $\sigma^2$ (conditional variances are constant)
• Assumptions concerning the sampling procedure (i = 1..n):
1. Values of Xi (not all equal) are prespecified
2. Yi is drawn from the subpopulation having X = Xi
3. Yi’s are independent
• Consequences are:
1. E(Yi) = a + bXi
2. V(Yi) = $\sigma^2$
3. C(Yh, Yi) = 0
– How can we test to see if these assumptions don’t hold?
– What can we do if the assumptions don’t hold?
Homoskedasticity
• We would like our estimates to be BLUE
• We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias).
• Heteroskedasticity: usually found in cross-section data (and longitudinal)
• In earlier lectures, we saw that the variance of $\hat{b}$ is
$$V(\hat{b}) = \frac{\sigma^2}{\sum x^2}$$
– This is an example of homoskedasticity, where the variance is constant
Homoskedasticity (2)
• Homoskedasticity can be illustrated like this:
[Figure: Y plotted against X, showing constant variance around the regression line at X1, X2 and X3]
Heteroskedasticity
• But we don’t always have constant variance $\sigma^2$
– We may have a variance that varies with each observation, or $\sigma_i^2$
• When there is heteroskedasticity, the variance around the regression line varies with the values of X
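A small simulation can make the idea concrete. In this sketch the error standard deviation is assumed to be proportional to X (a hypothetical choice), so the spread of the residuals around the fitted line grows with X.

```python
import numpy as np

# Simulated data (an assumption): error s.d. proportional to x.
rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(1, 10, n)
e = rng.normal(0, 1, n) * 0.5 * x     # heteroskedastic errors
y = 1 + 2 * x + e

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Compare residual variance at low and high values of x.
var_low = resid[x < 4].var()
var_high = resid[x > 7].var()
print(f"residual variance, x < 4: {var_low:.2f};  x > 7: {var_high:.2f}")
```

Under homoskedasticity the two variances would be roughly equal; here the second is much larger.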
Heteroskedasticity (2)
• The non-constant variance around the regression line can be drawn like this:
[Figure: Y plotted against X, showing non-constant variance around the regression line at X1, X2 and X3]
Serial (auto) correlation
• Serial correlation can be found in time series data (and longitudinal data)
• Under serial correlation, we have covariance terms
– where Yi and Yh are correlated or each Yi is not independently drawn
– This results in nonzero covariance terms
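$$V(\hat{b}) = \sigma^2 \sum_i c_i^2 + \sum_h \sum_{i \neq h} c_h c_i \sigma_{hi}$$
where $\sigma_{hi} = C(Y_h, Y_i)$ and the $c_i$ are the weights in $\hat{b} = \sum_i c_i Y_i$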
Serial (auto) correlation (2)
• Example: We can think of this using time series data such that unemployment at time t is related to unemployment in the previous time period t-1
• If we have a model with unemployment as the dependent variable Yt then
– Yt and Yt-1 are related
– et and et-1 are also related
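The link between et and et-1 can be illustrated with simulated errors. The sketch assumes a first-order autoregressive error process with coefficient `rho` (a hypothetical choice; the slides do not specify one) and measures the correlation between each residual and its predecessor.

```python
import numpy as np

# Simulated time series (an assumption): e_t = rho * e_{t-1} + u_t.
rng = np.random.default_rng(5)
n, rho = 500, 0.8
u = rng.normal(0, 1, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]

x = np.linspace(0, 10, n)
y = 1 + 0.5 * x + e

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Correlation between residuals at t and t-1.
r1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print(f"lag-1 residual correlation: {r1:.3f}")
```

With independent errors this correlation would be close to zero; a value well above zero points to serial correlation.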
Non-independence
• The non-independence of independent variables is the third violation of the ordinary least squares assumptions
• Remember from the OLS derivation that we minimized the sum of the squared residuals:
$$\frac{\partial \sum e^2}{\partial b} = 0, \quad \text{given that } e \neq g(X)$$
– we needed independence between the X variable and the error term
– if not, the values of X are not pre-specified
– without independence, the estimates are biased
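The bias can be seen in a simulation. This sketch compares the OLS slope when X is independent of the error term with the slope when X is built to be correlated with it; the data-generating process and the helper `slope` are hypothetical.

```python
import numpy as np

def slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    X = np.column_stack([np.ones(x.size), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])

# Simulated data (an assumption), true slope = 2 in both cases.
rng = np.random.default_rng(6)
n, true_b = 5000, 2.0
e = rng.normal(0, 1, n)

x_indep = rng.normal(0, 1, n)            # independent of e
x_dep = rng.normal(0, 1, n) + 0.8 * e    # correlated with e

b_indep = slope(x_indep, 1 + true_b * x_indep + e)
b_dep = slope(x_dep, 1 + true_b * x_dep + e)
print(f"X independent of e: b = {b_indep:.3f};  X correlated with e: b = {b_dep:.3f}")
```

The first estimate sits close to the true slope, while the second is pushed away from it and does not improve with a larger sample.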
Summary
• Heteroskedasticity and serial correlation
– make the estimates inefficient
– and therefore make the estimated standard errors incorrect
• Non-independence of independent variables
– makes estimates biased
– instrumental variables and simultaneous equations are used to deal with this third type of violation
• Starting next lecture we’ll take a more in-depth look at the three violations of the CLRM assumptions