Economics 130
Lecture 9
Functional Forms
Dummy or Indicator Variables
Projects (Teams)
Readings for This Week
Text:
• CH 6
• CH 2, 2.8 – 2.9
• CH 4, 4.4 – 4.6
• CH 5, 5.6 – 5.8, CH 7
Multiple Regression
• We continue with our second issue and add a fourth: how we evaluate these relationships:
– Where do we get data to do this analysis?
– How do we create the model relating the data?
– How do we relate data to one another?
– How do we evaluate these relationships?
Introducing Functional Forms
• How to formulate and estimate nonlinear
relationships.
Functional Forms
• Introduce concept – world is not linear
1. Power: $x^p$ raises the variable to the power p;
– Quadratic ($x^2$) and cubic ($x^3$) transformations.
2. The reciprocal: $1/x$.
3. The natural logarithm: $\ln(x)$. Variations using logs…
Functional Forms
[Figure: three panels plotting $Y = 3x$, $Y = x^2$, and $Y = \ln x$ for x = 1, …, 15]
Functional Forms
• Model with power functions:
– $y = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3 + e$
– Note: STILL LINEAR in the parameters; only x enters through non-linear functions.
• Model with reciprocal:
– $y = \beta_1 + \beta_2 (1/x) + e$
– Note: STILL LINEAR in the parameters; only x enters through a non-linear function.
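Because these models are linear in the parameters, ordinary least squares applies once the transformed regressor (here $1/x$) is computed as a new variable. The pure-Python sketch below uses simulated data only; the "true" values 358 and -1119 are borrowed from the Gretl output purely to make the simulation look familiar, and the helper name `ols_simple` is hypothetical.

```python
import random

def ols_simple(z, y):
    """Return (b1, b2) from regressing y on a constant and one regressor z."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b2 = s_zy / s_zz
    b1 = y_bar - b2 * z_bar
    return b1, b2

# Simulated data (assumption): x between 5 and 35, y = 358 - 1119*(1/x) + noise
random.seed(42)
x = [random.uniform(5, 35) for _ in range(500)]
y = [358 - 1119 / xi + random.gauss(0, 20) for xi in x]

# The "nonlinear" model is estimated by transforming x, then running ordinary OLS
inv_x = [1 / xi for xi in x]
b1, b2 = ols_simple(inv_x, y)
print(f"b1 = {b1:.1f}, b2 = {b2:.1f}")
```

The same idea covers the power model: add columns for $x^2$ and $x^3$ and run multiple regression on them.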
Functional Forms
• Example with reciprocal:
Functional Forms
• Gretl output:
• Model 1: OLS, using observations 1-40
• Dependent variable: y
• coefficient std. error t-ratio p-value
• ---------------------------------------------------------
• const 358.036 24.3579 14.70 2.91e-017 ***
• invx -1119.08 285.656 -3.918 0.0004 ***
• R-squared 0.287688
Functional Forms
• What is the impact of income on food expenditures when x=19.605?
A. 0.102
B. 10.21
C. -1119.078
D. 2.9116
E. 1119.078
F. None of the above
Functional Forms
• Why?
• The “slope” = $dy/dx = -\beta_2/x^2$
• $b_2 = -1119.08$
• Therefore the marginal effect on Y at x = 19.605 is
$-(-1119.08)/(19.605)^2 = 2.9116$
• Answer: D. 2.9116
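The slope calculation above can be checked in a few lines. This is a minimal sketch; the function name is hypothetical and the inputs are the Gretl estimate and the evaluation point from the question.

```python
def reciprocal_marginal_effect(b2, x):
    """Slope dy/dx of y = b1 + b2*(1/x), evaluated at x: -b2 / x**2."""
    return -b2 / x ** 2

# b2 from the Gretl output; x = 19.605 is the evaluation point in the question
me = reciprocal_marginal_effect(-1119.08, 19.605)
print(round(me, 4))  # prints 2.9116
```

Note that the slope changes with x, so the marginal effect must always be reported at a specific value of x.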
Functional Forms
1. The log-log model, $\ln(y) = \beta_1 + \beta_2 \ln(x) + e$: the parameter β2 is the elasticity
of y with respect to x, i.e., a 1% increase in x leads to
a β2% change in y. (E.g., if β2 = 1, a 1% increase in x raises y by 1%.)
2. The log-linear model, $\ln(y) = \beta_1 + \beta_2 x + e$: a one-unit increase in x leads
to (approximately) a 100×β2 percent change in y.
3. The linear-log model, $y = \beta_1 + \beta_2 \ln(x) + e$: a 1% increase in x leads to
(approximately) a β2/100 unit change in y.
Functional Forms
• Linear-Log Model: $FOOD\_EXP = \beta_1 + \beta_2 \ln(INCOME) + e$
• Fitted: $\widehat{FOOD\_EXP} = -97.18 + 132.165 \ln(INCOME)$
Functional Forms
• A 1% increase in X
implies how many unit
changes in Y?
• A 1% increase in x leads
to a β2/100 unit change
in y.
• Therefore:
1% increase leads to
132.16/100 unit change
in y = $1.32
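The β2/100 rule is an approximation. A minimal sketch comparing it with the exact change implied by the fitted linear-log equation (β2 = 132.165 from the slide; nothing else is assumed):

```python
import math

b2 = 132.165  # slope on ln(INCOME) in the fitted food-expenditure equation

approx = b2 / 100            # rule of thumb: 1% rise in x -> b2/100 units of y
exact = b2 * math.log(1.01)  # exact: b2 * [ln(1.01*x) - ln(x)] = b2 * ln(1.01)
print(f"approx = {approx:.4f}, exact = {exact:.4f}")
```

For a 1% change the two agree to within about a penny, which is why the approximation is used in practice.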
Functional Forms
• Log-Log Model: $\ln(FOOD\_EXP) = \beta_1 + \beta_2 \ln(INCOME) + e$
– Fitted: $\widehat{\ln(FOOD\_EXP)} = 3.96 + 0.555 \ln(INCOME)$
Functional Forms
• A 1% increase in x
implies what change in
y?
• A 1% increase in x leads
to a β2 % change in y.
• Therefore:
1% increase leads to an
(approximately) 0.555% change in y
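Reading β2 as "the % change in y for a 1% change in x" is itself an approximation. A minimal sketch of the exact figure, using only the fitted elasticity 0.555 from the slide:

```python
b2 = 0.555  # income elasticity from the fitted log-log equation

approx_pct = b2                     # rule of thumb: 1% rise in x -> b2 % change in y
exact_pct = 100 * (1.01 ** b2 - 1)  # exact: y is multiplied by 1.01**b2 when x rises 1%
print(f"approx = {approx_pct:.3f}%, exact = {exact_pct:.4f}%")
```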
Functional Forms
• Why care about
functional form?
• Data may not look
linear.
• Consider, for example,
our housing model.
Functional Forms
• Gretl output for the housing model:
• Model 1: OLS, using observations 1-14
• Dependent variable: price
• Coefficient Std. Error t-ratio p-value
• const 121.179 80.1778 1.5114 0.15888
• sqft 0.148314 0.021208 6.9933 0.00002 ***
• bedrms -23.9106 24.6419 -0.9703 0.35274
• R-squared 0.834673 Adjusted R-squared 0.804613
Functional Forms
• Gretl output for Linear Log Housing Model:
• Model 2: OLS, using observations 1-14
• Dependent variable: price
• Coefficient Std. Error t-ratio p-value
• const -1749.97 259.141 -6.7530 0.00003 ***
• l_sqft 299.97 39.9758 7.5039 0.00001 ***
• l_bedrms -145.094 84.7188 -1.7127 0.11478
• R-squared 0.852853 Adjusted R-squared 0.826099
Functional Forms
• Log-linear models are widely used in the human
capital literature.
• Suppose we believe there is a constant return
to education, r. Then after the first year of
education the wage is
$WAGE_1 = (1+r)\,WAGE_0$
• Therefore after s years:
$WAGE = WAGE_0 (1+r)^s$
Taking logarithms, with s = EDUC:
$\ln(WAGE) = \ln(WAGE_0) + \ln(1+r)\,EDUC = \beta_1 + \beta_2 EDUC$
where $\beta_1 = \ln(WAGE_0)$ and $\beta_2 = \ln(1+r)$.
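The derivation can be checked numerically: with a constant return r, each extra year adds the same amount, $\ln(1+r)$, to the log wage, so ln(WAGE) is a straight line in EDUC. A minimal sketch; the starting wage w0 = 5 and return r = 0.08 are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical values (not from the lecture): starting wage and constant return
w0 = 5.0
r = 0.08

educ = list(range(0, 17))
wage = [w0 * (1 + r) ** s for s in educ]
log_wage = [math.log(w) for w in wage]

# Year-to-year changes in ln(wage) are all equal to ln(1 + r)
steps = [log_wage[s + 1] - log_wage[s] for s in range(len(educ) - 1)]
beta1 = log_wage[0]  # intercept = ln(w0)
beta2 = steps[0]     # slope = ln(1 + r)
print(f"beta1 = {beta1:.4f} (ln w0 = {math.log(w0):.4f}), "
      f"beta2 = {beta2:.4f} (ln(1+r) = {math.log(1 + r):.4f})")
```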
Functional Forms
• Now let’s compare two models, one linear and
the other log linear.
Functional Forms
• First, linear model:
• Model 1: OLS, using observations 1-1000
• Dependent variable: wage
• coefficient std. error t-ratio p-value
• ---------------------------------------------------------
• const -4.91218 0.966788 -5.081 4.48e-07 ***
• educ 1.13852 0.0715497 15.91 5.59e-051 ***
• Mean dependent var 10.21302 S.D. dependent var 6.246641
• Sum squared resid 31092.99 S.E. of regression 5.581693
• R-squared 0.202366
Functional Forms
• Model 2: OLS, using observations 1-1000
• Dependent variable: l_wage
• coefficient std. error t-ratio p-value
• ---------------------------------------------------------
• const 0.788374 0.0848975 9.286 9.71e-020 ***
• educ 0.103761 0.00628307 16.51 2.40e-054 ***
• Mean dependent var 2.166837 S.D. dependent var 0.552806
• Sum squared resid 239.7676 S.E. of regression 0.490151
• R-squared 0.214621
Functional Forms
• For old times' sake: can you reject the null
hypothesis that β2 = 0 at a 1% significance
level?
coefficient std. error t-ratio p-value
educ 0.1038 0.00628307 16.51 2.40e-054 ***
Functional Forms
• So what’s the interpretation of these results?
• Remember: With log-linear model, a one-unit
increase in x leads to (approximately) a 100×β2
percent change in y.
Functional Forms
• So a 1 unit increase in
education results in:
A. 0.1038% increase in
wages
B. 10.38% increase in
wages
C. 100.38% increase in
wages
D. None of the above
Functional Forms
• Answer: B. 10.38% increase in wages
(100 × 0.1038 = 10.38)
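The 100×β2 rule is the small-β2 approximation. A minimal sketch comparing it with the exact percent change implied by the log-linear model, using the EDUC estimate 0.103761 from Model 2. Under the human-capital story above, β2 = ln(1+r), so the implied return is r = exp(β2) − 1.

```python
import math

b2 = 0.103761  # EDUC coefficient from the log-linear wage model (Model 2)

approx_pct = 100 * b2                 # the approximation used on the slide
exact_pct = 100 * (math.exp(b2) - 1)  # exact % change in wage for one more year of EDUC
implied_r = math.exp(b2) - 1          # since b2 = ln(1 + r)
print(f"approx = {approx_pct:.2f}%, exact = {exact_pct:.2f}%")
```

The gap (about 10.38% versus 10.93%) grows as β2 gets larger.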
Functional Forms
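• Also note that the R² for the log-linear model is
higher than for the linear model.
• In the two models, R² is:
• Linear: 0.202366
• Log-linear: 0.214621
• Strictly, these are not directly comparable: the linear model explains wage, while the log-linear model explains ln(wage).
• And this is a slight underestimate of the log-linear R²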
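One way to put the log-linear model on the same footing as the linear one is a "generalized" R²: the squared correlation between wage and the wage predicted by $\exp(b_1 + b_2 EDUC)$. The sketch below uses simulated data only (parameters loosely mimic Model 2 and are assumptions), and the helpers `ols_simple` and `corr` are hypothetical names.

```python
import math
import random

def ols_simple(z, y):
    """Return (b1, b2) from regressing y on a constant and one regressor z."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
    s_zz = sum((a - z_bar) ** 2 for a in z)
    b2 = s_zy / s_zz
    return y_bar - b2 * z_bar, b2

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    s_ab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    s_aa = sum((x - a_bar) ** 2 for x in a)
    s_bb = sum((y - b_bar) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

# Simulated sample (assumption): ln(wage) = 0.8 + 0.10*educ + noise
random.seed(1)
educ = [random.randint(8, 20) for _ in range(1000)]
wage = [math.exp(0.8 + 0.10 * e + random.gauss(0, 0.5)) for e in educ]

# Fit the log-linear model, then predict wage (not ln wage) and correlate with wage
b1, b2 = ols_simple(educ, [math.log(w) for w in wage])
wage_hat = [math.exp(b1 + b2 * e) for e in educ]
r2_generalized = corr(wage, wage_hat) ** 2
print(f"b2 = {b2:.4f}, generalized R^2 = {r2_generalized:.3f}")
```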
Functional Forms
• A final word about measuring elasticity.
• Log-log models can be used to calculate
elasticities.
• Let’s consider the Bus Travel Model
Functional Forms
• Here are the regression results of turning the Bus Travel Model into a Log-log model.
• Model 1: OLS, using observations 1-40
• Dependent variable: l_BUSTRAVL
• Coefficient Std. Error t-ratio p-value
• const 45.8457 9.61411 4.7686 0.00003 ***
• l_INCOME -4.73008 1.02119 -4.6319 0.00005 ***
• l_POP 1.82037 0.235733 7.7222 <0.00001 ***
• l_LANDAREA -0.970997 0.206807 -4.6952 0.00004 ***
• Mean dependent var 7.023257 S.D. dependent var 1.157544
• Sum squared resid 18.87667 S.E. of regression 0.724121
• R-squared 0.638768 Adjusted R-squared 0.608666
• F(3, 36) 21.21967 P-value(F) 4.35e-08
• Log-likelihood -41.73848 Akaike criterion 91.47697
• Schwarz criterion 98.23248 Hannan-Quinn 93.91955
Functional Forms
• $\widehat{\ln(BUSTRAVEL)} = 45.85 - 4.73 \ln(INCOME) + 1.82 \ln(POP) - 0.971 \ln(LANDAREA)$
• Income elasticity of demand = -4.73
• This appears to be highly elastic (< -1).
• Question: is it significantly different from -1?
Functional Forms
• Let’s use a two-sided t test to determine whether this
income elasticity is significantly
different from -1 (H0: β2 = -1).
• Test statistic: $t = (b_2 - c)/se(b_2)$
• $= (-4.73 - (-1))/1.021 = -3.65$, so $|t| = 3.65$.
• Critical value at α = .05 (36 df) = 2.03
• Conclusion: |t| > 2.03, so we reject H0; income is “significantly elastic”
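The test above as a short sketch, using the l_INCOME estimate and standard error from the Gretl output. The critical value 2.03 is hard-coded from the slide rather than computed, to keep the example dependency-free.

```python
b2 = -4.73008   # estimated income elasticity (l_INCOME coefficient)
se_b2 = 1.02119  # its standard error
c = -1.0         # hypothesized value under H0: beta2 = -1 (unit elastic)
t_crit = 2.03    # two-sided 5% critical value with 36 df, as quoted on the slide

t_stat = (b2 - c) / se_b2
reject = abs(t_stat) > t_crit
print(round(t_stat, 2), reject)  # prints -3.65 True
```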
Multiple Regression
• Dummy Variables (Indicator Variables)
Regressions with Dummy Variables
• Simple Introduction:
• Dummy variable is either 0 or 1.
• Use to turn qualitative (Yes/No) data into 1/0.
Multiple Regression
• Simple Regression with a Dummy Variable
$Y = \beta_1 + \beta_2 D + e$
• OLS estimation, confidence intervals, testing, etc. are carried out in the standard way
• The interpretation is a little different.
Dummy Variables
• Simple Regression with a Dummy Variable
• Fitted value for the ith observation (point on
the regression line):
Ŷi = b1 + b2Di
• Since Di = 0 or 1, either
Ŷi = b1 (when Di = 0)
or
Ŷi = b1 + b2 (when Di = 1)
Dummy Variables
• Example: Explaining house prices (continued)
• Regress Y = house price on D = dummy for air
conditioning (= 1 if house has air conditioning, = 0
otherwise).
• Result: b1 = 59,885; b2 = 25,996; b1 + b2 = 85,881
• Average price of house with air conditioning is $85,881
• Average price of house without air conditioning is $59,885
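Why b1 and b1 + b2 are group averages: regressing Y on a constant and a single 0/1 dummy reproduces the two group means exactly. A minimal sketch with hypothetical toy prices (in $1000s, not the lecture's sample); `ols_simple` is a hypothetical helper.

```python
def ols_simple(z, y):
    """Return (b1, b2) from regressing y on a constant and one regressor z."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
    s_zz = sum((a - z_bar) ** 2 for a in z)
    b2 = s_zy / s_zz
    return y_bar - b2 * z_bar, b2

# Hypothetical toy data: prices in $1000s and an air-conditioning dummy
price = [50, 60, 70, 80, 90]
ac = [0, 0, 0, 1, 1]

b1, b2 = ols_simple(ac, price)
mean_no_ac = sum(p for p, d in zip(price, ac) if d == 0) / ac.count(0)
mean_ac = sum(p for p, d in zip(price, ac) if d == 1) / ac.count(1)
print(b1, mean_no_ac)       # intercept equals the D = 0 group mean
print(b1 + b2, mean_ac)     # intercept + dummy coefficient equals the D = 1 group mean
```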
Dummy Variables
• Multiple Regression with Dummy Variables
• Example: Explaining house prices (continued)
• $Y = \beta_1 + \beta_2 D_1 + \cdots + \beta_k D_{k-1} + e$
• Regress Y = house price on D1 = driveway dummy and
D2 = rec room dummy
• Four types of houses:
• Houses with a driveway and a rec room (D1=1, D2=1)
• Houses with a driveway but no rec room (D1=1, D2=0)
• Houses with a rec room but no driveway (D1=0, D2=1)
• Houses with no driveway and no rec room (D1=0, D2=0)
Dummy Variables
Example: Explaining house prices (continued)
• If D1=1 and D2=1, then
Ŷi = b1 + b2 + b3 = 47,099 + 21,160 + 16,024 = 84,283
• “The average price of houses with a driveway and rec room is
$84,283”.
         Coeff.    Std. Error  t Stat  P-value  Lower 95%  Upper 95%
Inter.   47099.1   2837.6      16.60   2.E-50   41525      52673
D1       21159.9   3062.4       6.91   1.E-11   15144      27176
D2       16023.7   2788.6       5.75   1.E-08   10546      21502
Dummy Variables
• If D1 = 1 and D2 = 0, then
Ŷi = b1 + b2 = 47,099 + 21,160 = 68,259
• “The average price of houses with a driveway but no rec room is $68,259”.
• If D1 = 0 and D2 = 1, then
Ŷi = b1 + b3 = 47,099 + 16,024 = 63,123
• “The average price of houses with a rec room but no driveway is $63,123”.
• If D1 = 0 and D2 = 0, then
Ŷi = b1 = 47,099
• “The average price of houses with no driveway and no rec room is $47,099”.
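The four cases follow mechanically from the fitted equation. A minimal sketch using the rounded estimates from the table; the function name is hypothetical.

```python
b1, b2, b3 = 47_099, 21_160, 16_024  # rounded: intercept, driveway (D1), rec room (D2)

def fitted_price(d1, d2):
    """Fitted average price for a house type given the two 0/1 dummies."""
    return b1 + b2 * d1 + b3 * d2

for d1 in (1, 0):
    for d2 in (1, 0):
        print(f"D1={d1}, D2={d2}: ${fitted_price(d1, d2):,}")
```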
Dummy Variables
• Multiple Regression with Dummy and non-Dummy
Explanatory Variables
• Regress Y = house price on D = air conditioning
dummy and X = lot size:
$Y = \beta_1 + \beta_2 D + \beta_3 X + e$
• OLS estimates: b1 = 32,693, b2 = 20,175, b3 = 5.64
Dummy Variables
• For houses with an air conditioner, D = 1 and
Ŷi = (32,693 + 20,175) + 5.64X = 52,868 + 5.64X
• For houses without an air conditioner, D = 0 and
Ŷi = 32,693 + 5.64X
• Two different regression lines depending on whether the house has an air conditioner or not.
• The two lines have different intercepts but the same slope (i.e., the same marginal effect).
Dummy Variables
• Verbal ways of expressing OLS results:
• “An extra square foot of lot size will tend to add $5.64 onto the price of a house” (Note: no ceteris paribus qualification is needed, since the marginal effect is the same for houses with and without air conditioners)
• “Houses with air conditioners tend to be worth $20,175 more than houses with no air conditioners, ceteris paribus” (Note: here we do need the ceteris paribus qualification)
• “If we consider houses with similar lot sizes, those with air conditioners tend to be worth an extra $20,175”
Dummy Variables
• Another House Price Regression
• Regress Y = house price on D1 = dummy variable for driveway, D2 = dummy variable for rec room, X1 = lot size and X2 = number of bedrooms:
$Y = \beta_1 + \beta_2 D_1 + \beta_3 D_2 + \beta_4 X_1 + \beta_5 X_2 + e$
• OLS estimates: b1 = -2,736, b2 = 12,598, b3 = 10,969, b4 = 5.197, b5 = 10,564
Dummy Variables
1. If D1 = 1 and D2 = 1, then
Ŷi = (b1 + b2 + b3) + b4X1 + b5X2 = 20,831 + 5.197X1 + 10,564X2
This is the regression line for houses with a driveway and rec room.
2. If D1 = 1 and D2 = 0, then
Ŷi = (b1 + b2) + b4X1 + b5X2 = 9,862 + 5.197X1 + 10,564X2
This is the regression line for houses with a driveway but no rec room.
3. If D1 = 0 and D2 = 1, then
Ŷi = (b1 + b3) + b4X1 + b5X2 = 8,233 + 5.197X1 + 10,564X2
This is the regression line for houses with a rec room but no driveway.
4. If D1 = 0 and D2 = 0, then
Ŷi = b1 + b4X1 + b5X2 = -2,736 + 5.197X1 + 10,564X2
This is the regression line for houses with no driveway and no rec room.
Dummy Variables
• “Houses with driveways tend to be worth
$12,598 more than similar houses with no
driveway.”
• “If we consider houses with the same number
of bedrooms, then adding an extra square foot
of lot size will tend to increase the price of a
house by $5.197.”
• “An extra bedroom will tend to add $10,564 to
the value of a house, ceteris paribus”
Dummy Variables
• Interacting Dummy and Non-Dummy Variables
$Y = \beta_1 + \beta_2 D + \beta_3 X + \beta_4 Z + e$
• Where Z = DX.
• Z is either 0 (for observations with D = 0) or X (for observations with D = 1).
• If D = 1, then Ŷi = (b1 + b2) + (b3 + b4)X
• If D = 0, then Ŷi = b1 + b3X
• Two different regression lines corresponding to D = 0 and D = 1 exist and have different intercepts and slopes.
• The marginal effect of X on Y is different for D = 0 and D = 1.
Dummy Variables
• Regress Y = house price on D = air conditioner dummy, X = lot size and Z = DX
• OLS estimates: b1 = 35,684, b2 = 7,613, b3 = 5.02, b4 = 2.25
• The marginal effect of lot size on house price is $7.27 (= 5.02 + 2.25) for houses with air conditioners and only $5.02 for houses without.
• Increasing lot size will tend to add more to the value of a house if it has an air conditioner than if it does not.
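A minimal sketch of the two fitted lines implied by the interaction model, using the OLS estimates above. The function names are hypothetical; the check confirms that one extra square foot of lot size changes the fitted price by b3 + b4·D.

```python
b1, b2, b3, b4 = 35_684, 7_613, 5.02, 2.25  # OLS estimates: const, D, X, Z = D*X

def fitted_price(lot_size, ac):
    """Fitted price with Z = D*X: intercept and slope both shift when ac == 1."""
    z = ac * lot_size
    return b1 + b2 * ac + b3 * lot_size + b4 * z

def marginal_effect_lot(ac):
    """Effect of one extra square foot of lot size: b3 + b4*D."""
    return b3 + b4 * ac

print(marginal_effect_lot(1), marginal_effect_lot(0))
```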
Dummy Variables
• Issue here: Using dummy variables on the right side of regression, i.e., as INDEPENDENT variables
• Dummy variables are (0,1) variables
• The illustrative example:
• House prices depend upon house characteristics: size (square feet), location (may be dummy), # of bedrooms, bathrooms, age, whether has a pool (dummy), whether has a tile roof (dummy), etc.
• Note: These sorts of models are called “hedonic price” models.
Dummy Variables
Here again is our basic simple regression model:
$PRICE = \beta_1 + \beta_2 SQFT + e$
$D = 1$ if characteristic is present; $D = 0$ if characteristic is not present
For example:
$D = 1$ if property is in the desirable neighborhood; $D = 0$ if property is not in the desirable neighborhood
Dummy Variables
$PRICE = \beta_1 + \delta D + \beta_2 SQFT + e$
$E(PRICE) = (\beta_1 + \delta) + \beta_2 SQFT$ when $D = 1$
$E(PRICE) = \beta_1 + \beta_2 SQFT$ when $D = 0$
Dummy Variables
$PRICE = \beta_1 + \beta_2 SQFT + \gamma (SQFT \times D) + e$
$E(PRICE) = \beta_1 + (\beta_2 + \gamma) SQFT$ when $D = 1$
$E(PRICE) = \beta_1 + \beta_2 SQFT$ when $D = 0$
$\partial E(PRICE)/\partial SQFT = \beta_2 + \gamma$ when $D = 1$, and $\beta_2$ when $D = 0$
Dummy variables can also be used to determine whether the
marginal impact of one variable depends on the presence (or
absence) of the dummy characteristic. Does price per square
foot depend upon whether the house is in a good
neighborhood?
Dummy Variables
• Here’s a model with both intercept and slope
dummy effects. We will estimate it now.
$PRICE = \beta_1 + \delta D + \beta_2 SQFT + \gamma (SQFT \times D) + e$
Dummy Variables
• 1000 house sales from two similar neighborhoods,
one bordering a university, the other 3 miles away.
• Variables:
• price house price, in $1000
• sqft square feet of living area, in 100's
• age house age, in years
• utown =1 if close to university
• pool =1 if house has pool
• fplace =1 if house has fireplace
Dummy Variables
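• The Model:
$PRICE = \beta_1 + \delta_1 UTOWN + \beta_2 SQFT + \gamma (SQFT \times UTOWN) + \beta_3 AGE + \delta_2 POOL + \delta_3 FPLACE + e$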
• Sample Data:
Dummy Variables
• Model 2: OLS, using observations 1-1000
• Dependent variable: p
• coefficient std. error t-ratio p-value
• ---------------------------------------------------------
• const 24.5000 6.19172 3.957 8.13e-05 ***
• sqft 7.61218 0.245176 31.05 1.87e-148 ***
• utown 27.4530 8.42258 3.259 0.0012 ***
• usqft 0.0129940 0.00332048 3.913 9.72e-05 ***
• pool 4.37716 1.19669 3.658 0.0003 ***
• fplace 1.64918 0.971957 1.697 0.0901 *
• age -0.190086 0.0512046 -3.712 0.0002 ***
• Mean dependent var 247.6557 S.D. dependent var 42.19273
• Sum squared resid 230184.4 S.E. of regression 15.22521
• R-squared 0.870570 Adjusted R-squared 0.869788
• F(6, 993) 1113.183 P-value(F) 0.000000
Dummy Variables
• Premium for lots near the university is….
• Premium for having a pool is…
• Premium for having a fireplace is …
• Δ Price due to Δ in house size is . . .
Options for each question:
A. $24,500
B. $76
C. $4,377
D. $1.2994
E. $1,649
F. $27,453
Dummy Variables
Based on these regression results, we estimate:
the location premium, for lots near the university, to be
$27,453
the price per square foot to be $89.12 for houses near the
university, and $76.12 for houses in other areas.
that houses depreciate $190.10 per year
that a pool increases the value of a home by $4377.20
that a fireplace increases the value of a home by
$1649.20
Dummy Variables
Variable | Coefficient | X Units | Y Units | Marginal Effect
UTOWN (1 = close to university) | 27.453 | 1,0 | $1000s | 1 x 27.453 x 1000 = $27,453
SQFT (size in 100s) | 7.612 | 100s of SQFT | $1000s | 7.612 ÷ 100 = .07612 x 1000 = $76.12
USQFT (UTOWN x SQFT) | .01299 | 1,0 | $1000s | .01299 x 1000 = $12.99; + $76.12 = $89.11
AGE (years) | -.190 | Years | $1000s | 1 x -.190 x 1000 = -$190
FPLACE (1 = fireplace) | 1.649 | 1,0 | $1000s | 1 x 1.649 x 1000 = $1,649
POOL (1 = pool) | 4.377 | 1,0 | $1000s | 1 x 4.377 x 1000 = $4,377
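The unit conversions in the table can be scripted so that every coefficient is turned into dollars the same way. A minimal sketch using the Gretl estimates; the divisors follow the table's own conventions (SQFT is in 100s of square feet, and USQFT is treated per square foot exactly as the table does), and the dictionary names are hypothetical.

```python
DOLLARS_PER_UNIT = 1000  # price is measured in $1000s

# (estimate, divisor converting the X unit to one unit), following the table above
estimates = {
    "UTOWN": (27.4530, 1),     # dummy: 0 -> 1
    "SQFT": (7.61218, 100),    # SQFT is in 100s of square feet
    "USQFT": (0.0129940, 1),   # treated per square foot, as in the table
    "AGE": (-0.190086, 1),     # per year
    "POOL": (4.37716, 1),      # dummy: 0 -> 1
    "FPLACE": (1.64918, 1),    # dummy: 0 -> 1
}

effects = {name: est / div * DOLLARS_PER_UNIT for name, (est, div) in estimates.items()}
price_per_sqft_near_univ = effects["SQFT"] + effects["USQFT"]

for name, dollars in effects.items():
    print(f"{name:7s} {dollars:10.2f}")
print(f"price per sq ft near university: {price_per_sqft_near_univ:.2f}")
```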
Dummy Variables
• Other Interactions:
– Here we are interacting a dummy with a
continuous variable
– Also could interact with other dummies:
• Does gender discrimination in wages depend upon
race? Interact gender with race in wage equation.
– Can interact continuous variables
• Does price per square foot depend upon house age?
Interact AGE with SQFT.
Dummy Variables
• There are many examples of categorical data:
– Regions, seasons, time (years)
– Credit ratings
– School ratings
• In some cases, there is a natural ordering (higher is better, as with ratings)
• In others, there is no natural ordering (regions)
• Even in cases with natural ordering, however, there is no clear meaning to units:
– Going from rank 1 to 2 is not the same “quantity” as going from 2 to 3
• For these cases, define multiple dummies to account for different regions, seasons, years, ratings
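A minimal sketch of turning one categorical variable into several 0/1 dummies. It leaves out one base category, an assumption made here because a dummy for every category plus the intercept would be exactly collinear; each remaining coefficient is then read relative to the base. The helper name and region labels are hypothetical.

```python
def make_dummies(values, base):
    """Hypothetical helper: one 0/1 column per category except the base category."""
    categories = sorted(set(values) - {base})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

regions = ["south", "west", "northeast", "midwest", "south"]
dummies = make_dummies(regions, base="northeast")
for name, column in dummies.items():
    print(name, column)
```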
Dummy Variables
• Equivalence of two regressions
–First, begin with a base model that includes
interactions between dummy variables.
–A wage model, where wage may depend
upon
• Education
• Gender
• Race
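Dummy Variables
$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + e$
$E(WAGE) = \beta_1 + \beta_2 EDUC$ (WHITE MALE)
$E(WAGE) = (\beta_1 + \delta_1) + \beta_2 EDUC$ (BLACK MALE)
$E(WAGE) = (\beta_1 + \delta_2) + \beta_2 EDUC$ (WHITE FEMALE)
$E(WAGE) = (\beta_1 + \delta_1 + \delta_2 + \gamma) + \beta_2 EDUC$ (BLACK FEMALE)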
Dummy Variables
• Base Model Estimate:
Dummy Variables
• Calculate the marginal effects on hourly wage due to race and gender, relative to white males with the same education. Effects are additive!!
• Coefficient on BLACK = δ1 = -1.83. Therefore, the marginal effect of being a black male on hourly wage is -$1.83.
• Coefficient on FEMALE = δ2 = -2.55. Therefore, the marginal effect of being a (white) female on hourly wage is -$2.55.
• Coefficient on BLACK × FEMALE = γ = .59. Add δ1 + δ2 + γ = -3.79. Therefore, the marginal effect of being a black female on hourly wage is -$3.79.
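Because the effects are additive, the shift for each race/gender group can be written as one function of the two dummies. A minimal sketch with the estimates quoted above; the function name is hypothetical.

```python
delta1, delta2, gamma = -1.83, -2.55, 0.59  # BLACK, FEMALE, BLACK x FEMALE estimates

def wage_shift(black, female):
    """Shift in expected hourly wage relative to white males with the same EDUC."""
    return delta1 * black + delta2 * female + gamma * black * female

for black, female, label in [(0, 0, "white male"), (1, 0, "black male"),
                             (0, 1, "white female"), (1, 1, "black female")]:
    print(f"{label:12s} {wage_shift(black, female):+.2f}")
```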
Dummy Variables
• Do an F test of the hypothesis that gender and race don’t matter (H0: δ1 = δ2 = γ = 0):
– 3 restrictions (J), 5 parameters (K), N = 1000
– Restricted model: $WAGE = \beta_1 + \beta_2 EDUC + e$
– Result: F = 3.80
– Critical value = 2.60
– Since 3.80 > 2.60, reject the null.
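The F statistic compares the sum of squared residuals of the restricted and unrestricted models. The slides do not report the two SSE values, so the usage below plugs in hypothetical numbers purely to show the arithmetic; the function name is also hypothetical.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F = [(SSE_R - SSE_U) / J] / [SSE_U / (N - K)]."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))

# Hypothetical SSE values (not from the lecture), just to show the computation
f = f_statistic(sse_r=110.0, sse_u=100.0, j=2, n=105, k=5)
print(f)  # prints 5.0

# Decision rule as on the slide: reject H0 when F exceeds the critical value
print(3.80 > 2.60)
```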
Dummy Variables/Gretl Practice
• We are now going to return to a model we
looked at earlier for women’s labor force
participation
• This model used 1990 data from 50 states.
Gretl Practice
• Here is a model of the determinants of women’s labor force participation for all 50 states.
• WLFP = Participation rate (%) of women > 16 in the labor force
• YF = Annual median earnings by females (000s of $)
• YM = Annual median earnings by males (000s of $)
• EDUC = Female HS grads > 24 (%)
• UE = Unemployment rate (%)
• MR = Marriage rate (%) women over 16
• DR = Divorce rate (%)
• URB = % of state’s population that is urban
• WH = % of state’s female > 16 population who are white
Gretl Practice
• This is another “kitchen sink” model:
• $WLFP = \beta_1 + \beta_2 YF + \beta_3 YM + \beta_4 EDUC + \beta_5 UE + \beta_6 MR + \beta_7 DR + \beta_8 URB + \beta_9 WH + e$
Gretl Practice
• Model 1: OLS, using observations 1-50
• Dependent variable: wlfp
• Coefficient Std. Error t-ratio p-value
• const 44.5096 8.97496 4.9593 0.00001 ***
• yf 0.987983 0.407583 2.4240 0.01985 **
• ym -0.174345 0.306207 -0.5694 0.57221
• educ 0.285129 0.0931647 3.0605 0.00389 ***
• ue -1.61058 0.313617 -5.1355 <0.00001 ***
• mr -0.0782145 0.173139 -0.4517 0.65383
• dr 0.437371 0.258336 1.6930 0.09804 *
• urb -0.0926339 0.0333355 -2.7788 0.00820 ***
• wh -0.0874916 0.0398446 -2.1958 0.03382 **
• Mean dependent var 57.47400 S.D. dependent var 4.248784
• Sum squared resid 193.9742 S.E. of regression 2.175104
• R-squared 0.780710 Adjusted R-squared 0.737922
• F(8, 41) 18.24590 P-value(F) 2.90e-11
• Log-likelihood -104.8395 Akaike criterion 227.6790
• Schwarz criterion 244.8872 Hannan-Quinn 234.2319
Gretl Practice
• Model 4: OLS, using observations 1-50
• Dependent variable: wlfp
• Coefficient Std. Error t-ratio p-value
• const 41.346 5.55984 7.4365 <0.00001 ***
• yf 1.06712 0.364515 2.9275 0.00550 ***
• educ 0.258172 0.0708648 3.6432 0.00073 ***
• ue -1.59099 0.307647 -5.1715 <0.00001 ***
• dr 0.391632 0.235404 1.6637 0.10363
• urb -0.0876356 0.0311463 -2.8137 0.00742 ***
• wh -0.0850871 0.0391115 -2.1755 0.03527 **
• ym -0.198418 0.298664 -0.6644 0.51010
• Mean dependent var 57.47400 S.D. dependent var 4.248784
• Sum squared resid 194.9397 S.E. of regression 2.154396
• R-squared 0.779619 Adjusted R-squared 0.742888
• F(7, 42) 21.22554 P-value(F) 6.62e-12
• Log-likelihood -104.9636 Akaike criterion 225.9272
• Schwarz criterion 241.2234 Hannan-Quinn 231.7521
Gretl Practice
• Model 5: OLS, using observations 1-50
• Dependent variable: wlfp
• Coefficient Std. Error t-ratio p-value
• const 41.8336 5.47528 7.6405 <0.00001 ***
• yf 0.849264 0.158152 5.3699 <0.00001 ***
• educ 0.249152 0.0690987 3.6057 0.00080 ***
• ue -1.67758 0.276859 -6.0593 <0.00001 ***
• dr 0.434104 0.22508 1.9287 0.06039 *
• urb -0.0942172 0.0293363 -3.2116 0.00250 ***
• wh -0.0960861 0.0352037 -2.7294 0.00916 ***
• Mean dependent var 57.47400 S.D. dependent var 4.248784
• Sum squared resid 196.9882 S.E. of regression 2.140355
• R-squared 0.777303 Adjusted R-squared 0.746229
• F(6, 43) 25.01455 P-value(F) 1.55e-12
• Log-likelihood -105.2249 Akaike criterion 224.4499
• Schwarz criterion 237.8341 Hannan-Quinn 229.5467
Dummy Variables/Gretl Practice
• We are going to look at data from both 1980
and 1990 because it’s possible there was a
structural change in WLFP over that decade.
• We are going to create a dummy variable for
1990, called D90: D90 = 1 for 1990 observations and 0 for 1980 observations.
• Now we are going to incorporate interaction
terms, by multiplying D90 by the other
variables.
Dummy Variables/Gretl Practice
• Here is the model we are starting with:
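• $WLFP = \beta_1 + \beta_2 YF + \beta_3 YM + \beta_4 EDUC + \beta_5 UE + \beta_6 MR + \beta_7 DR + \beta_8 URB + \beta_9 WH$
$\quad + \delta D90 + \theta_1 (D90 \times YF) + \theta_2 (D90 \times YM) + \theta_3 (D90 \times EDUC)$
$\quad + \theta_4 (D90 \times UE) + \theta_5 (D90 \times MR) + \theta_6 (D90 \times DR)$
$\quad + \theta_7 (D90 \times URB) + \theta_8 (D90 \times WH) + e$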
Dummy Variables/Gretl Practice
• Model 2: OLS, using observations 1-100
• Dependent variable: WLFP
• Omitted due to exact collinearity: D90YM
• coefficient std. error t-ratio p-value
• ------------------------------------------------------------
• const 49.6235 10.5465 4.705 1.00e-05 ***
• YF 0.00470565 0.000948974 4.959 3.71e-06 ***
• YM -0.000133492 0.000273021 -0.4889 0.6262
• EDUC 0.286358 0.0586220 4.885 4.97e-06 ***
• UE -1.09155 0.269898 -4.044 0.0001 ***
• MR -0.210187 0.153263 -1.371 0.1739
• DR 0.208349 0.172456 1.208 0.2304
• URB -0.0665164 0.0299182 -2.223 0.0289 **
• WH -0.126810 0.0343940 -3.687 0.0004 ***
• D90 -4.85398 13.6340 -0.3560 0.7227
• D90YF -0.00376523 0.000880843 -4.275 5.08e-05 ***
• D90EDUC -0.00164444 0.111949 -0.01469 0.9883
• D90UE -0.537329 0.396399 -1.356 0.1789
• D90MR 0.127952 0.230950 0.5540 0.5811
• D90DR 0.239853 0.304639 0.7873 0.4333
• D90URB -0.0276884 0.0437808 -0.6324 0.5288
• D90WH 0.0369987 0.0517175 0.7154 0.4764
• R-squared 0.861932 Adjusted R-squared 0.835316
Dummy Variables/Gretl Practice
• Suspecting multicollinearity, we eliminate variables with insignificant coefficients one at a time.
Dummy Variables/Gretl Practice
• Model 3: OLS, using observations 1-100
• Dependent variable: WLFP
• coefficient std. error t-ratio p-value
• ----------------------------------------------------------
• const 47.6366 6.57840 7.241 1.52e-010 ***
• YF 0.00477939 0.000733949 6.512 4.28e-09 ***
• EDUC 0.275070 0.0455059 6.045 3.43e-08 ***
• UE -1.06141 0.245591 -4.322 4.02e-05 ***
• MR -0.207293 0.104894 -1.976 0.0512 *
• DR 0.281618 0.133697 2.106 0.0380 **
• URB -0.0784652 0.0206237 -3.805 0.0003 ***
• WH -0.111495 0.0242421 -4.599 1.40e-05 ***
• D90YF -0.00405375 0.000682124 -5.943 5.36e-08 ***
• D90UE -0.569355 0.327225 -1.740 0.0853 *
• D90MR 0.126361 0.0509756 2.479 0.0151 **
• R-squared 0.858214 Adjusted R-squared 0.842283
• F(10, 89) 53.87052 P-value(F) 1.98e-33
Dummy Variables/Gretl Practice
• Here is the final model:
• WLFP = 47.63 + .00478 YF - .00405 (D90 *
YF) + .275 EDUC – 1.06 UE - .569 (D90 *
UE) - .207 MR + .126 (D90 * MR) + .282 DR
- .078 URB - .111 WH
• Adjusted R2 = .842
Dummy Variables/Gretl Practice
For 1980, set D90 = 0:
• WLFP = 47.63 + .00478 YF + .275 EDUC - 1.06 UE - .207 MR + .282 DR - .078 URB - .111 WH
For 1990, set D90 = 1 and combine terms:
• WLFP = 47.63 + .00073 YF + .275 EDUC - 1.63 UE - .081 MR + .282 DR - .078 URB - .111 WH
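Setting D90 = 1 simply adds each interaction estimate to the matching 1980 coefficient. A minimal sketch of that bookkeeping for the three interacted variables, using the Model 3 estimates; the dictionary names are hypothetical.

```python
# 1980 coefficients (D90 = 0) and the D90 interaction estimates from Model 3
base_1980 = {"YF": 0.00477939, "UE": -1.06141, "MR": -0.207293}
d90_shift = {"YF": -0.00405375, "UE": -0.569355, "MR": 0.126361}

# 1990 coefficient = 1980 coefficient + interaction coefficient
coef_1990 = {name: base_1980[name] + d90_shift[name] for name in base_1980}
for name in base_1980:
    print(f"{name}: 1980 = {base_1980[name]:.5f}, 1990 = {coef_1990[name]:.5f}")
```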
Dummy Variables
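Observations:
1. The marginal effect of marriage is smaller in magnitude in 1990
(-.081 vs. -.207), suggesting more women stayed in the labor force
after marriage.
2. The larger (more negative) unemployment coefficient in 1990
(-1.63 vs. -1.06) suggests the discouraged worker hypothesis is
stronger in 1990.
3. YF: why does the effect of female earnings nearly vanish in 1990 (.00073 vs. .00478)?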
The Project
• Teams
• TOPICS