1 Decomposition Methods - Illustrative Example
• Follow-up to “Decomposition Methods in Economics” by Nicole Fortin,
Thomas Lemieux, and Sergio Firpo, in the Handbook of Labor
Economics (Volume 4A, 2011)
1.1 Oaxaca-Blinder and the Gender Pay Gap
• Case featured in O’Neill and O’Neill (2006), “What Do Wage Differentials
Tell Us about Labor Market Discrimination?” NBER WP 11240,
published in The Economics of Immigration and Social Policy, edited
by S. Polachek, C. Chiswick, and H. Rapoport. Research in Labor
Economics 24:293-357.
• Use 2000 wage data from the NLSY79 when the cohort was 35-43
years of age.
• Because the NLSY is a longitudinal survey, it records actual labor
market experience and an AFQT score.
• The sample is restricted to civilian wage and salary workers, thereby
omitting self-employed workers.
• The wage rate is the hourly wage as reported directly by those paid
by the hour. For those paid on another basis (day, week, month),
usual weekly earnings are divided by usual weekly hours.
1.2 Know the distribution of interest
• Plotting the density of wage by gender or the gender differential by
quantile illustrates potential issues of economic interest
. *** use vkdensity to check bandwidth ;
. foreach aut in silver scott hardle {;
  2. vkdensity lropc00 if female==0, epan `aut';
  3. vkdensity lropc00 if female==1, epan `aut';
  4. };
bandwidth choice (Silverman)= .09915983
bandwidth choice (Silverman)= .09591583
bandwidth choice (Scott)= .12950542
bandwidth choice (Scott)= .12191967
bandwidth choice (Hardle)= .11678825
bandwidth choice (Hardle)= .11296753
. kdensity lropc00 if female==0, gen(evalm1 densm1) width(0.10) nograph ;
. kdensity lropc00 if female==1, gen(evalf1 densf1) width(0.10) nograph ;
. graph twoway (histogram lropc00 if female==1, bin(50) lcolor(erose)
> fi(inten80) fcolor(erose) ) (histogram lropc00 if female==0, bin(50)
> lcolor(eltblue) fi(inten80) fcolor(eltblue) ) (connected densf1 evalf1,
> m(i) lp(dash) lw(medium) lc(red) ) (connected densm1 evalm1, m(i)
> lp(longdash) lw(medium) lc(blue) ) , ytitle("Density")
> ylabel(0.0 0.2 0.4 0.6 0.8) xlabel(1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5)
> xtitle("Log(wage)") legend(ring(0) pos(2) col(1) lab(1 "Women")
> lab(2 "Men") lab(3 " ") lab(4 " ") order(1 3 2 4) region(lstyle(none))
> symxsize(8) keygap(1) textwidth(25) ) saving(nlsy00_dens,replace)
. graph export nlysy00_dens.eps, replace
(file nlysy00_dens.eps written in EPS format)
[Figure: histograms and kernel density estimates of log wages for Women and Men; x-axis: Log(wage), y-axis: Density]
Figure 1: Densities of Male and Female Wages
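The bandwidth check above can be mimicked outside Stata. The following is only a sketch of the textbook Silverman rule of thumb for a Gaussian kernel, written in Python for the arithmetic; vkdensity is a user-written command and its exact formulas (in particular for the Epanechnikov kernel) may differ. The data are synthetic normal draws, not the NLSY extract, with a standard deviation of 0.55 chosen to be close to what the standard error of the male mean log wage implies.

```python
import random
import statistics

def silverman_bandwidth(values):
    """Silverman's rule of thumb for a Gaussian kernel:
    h = 0.9 * min(sd, IQR / 1.349) * n ** (-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    spread = min(sd, (q3 - q1) / 1.349)
    return 0.9 * spread * n ** (-0.2)

# Synthetic stand-in for male log wages (NOT the NLSY data).
random.seed(0)
fake_log_wages = [random.gauss(2.76, 0.55) for _ in range(2655)]

h = silverman_bandwidth(fake_log_wages)
print(f"rule-of-thumb bandwidth: {h:.3f}")
```

With a sample of this size and spread the rule lands near 0.10, the same order of magnitude as the width(0.10) used in the kdensity calls.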
. pctile evalf2=lropc00 if female==1 , nq(100) ;
. pctile evalm2=lropc00 if female==0 , nq(100) ;
. gen qdiff=evalm2-evalf2 if _n<100;
(5210 missing values generated)
. gen qtau=_n/100 if _n<100;
. graph twoway (line qdiff qtau if qtau>0.0 & qtau<1.0, connect(l)
> m(i) lw(medium) lc(black) ) , yline(.2333003, lpattern(solid) lcolor(red))
> yline(.2046703 .2619303, lpattern(dash) lcolor(erose) )
> xlabel(0.0 0.2 0.4 0.6 0.8 1.0) ylabel(0.0 0.1 0.2 0.3 0.4)
> xtitle("Quantile") ytitle("Log Wage Differential")
> saving(nlsy00_qplot,replace) ;
[Figure: male-female log wage differential by quantile, with horizontal lines at the mean gap and its band; x-axis: Quantile, y-axis: Log Wage Differential]
Figure 2: Gender Differential by Quantile
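The pctile/gen steps above amount to differencing the two groups' percentiles. A minimal Python sketch of the same idea follows; the function name quantile_gap is hypothetical, and the data are synthetic normal draws centred on the two reported mean log wages, not the NLSY extract.

```python
import random
import statistics

def quantile_gap(group_a, group_b, n_quantiles=100):
    """Return [(tau, q_a(tau) - q_b(tau))] for tau = 1/n, ..., (n-1)/n."""
    qa = statistics.quantiles(group_a, n=n_quantiles)
    qb = statistics.quantiles(group_b, n=n_quantiles)
    return [((i + 1) / n_quantiles, a - b)
            for i, (a, b) in enumerate(zip(qa, qb))]

# Synthetic log wages (NOT the NLSY data), centred on the reported means.
random.seed(1)
men = [random.gauss(2.763, 0.55) for _ in range(2655)]
women = [random.gauss(2.529, 0.52) for _ in range(2654)]

gaps = quantile_gap(men, women)
central = [gap for tau, gap in gaps if 0.15 <= tau <= 0.85]
print(f"{len(gaps)} percentiles computed")
print(f"gap between 15th and 85th percentiles: "
      f"{min(central):.3f} to {max(central):.3f}")
```

As in the Stata code, 100 quantile groups give 99 cut points, which is why qdiff and qtau are only defined for _n<100.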
• In the NLSY 2000, gender wage differentials essentially hover within
about two standard errors of the average differential of 0.233 (0.015)
over the entire 15th-85th percentile range
• This is not the case among highly paid executives in the ExecuComp
database (extending Bertrand and Hallock, 2001)
[Figure: Raw Gap by quantile and Mean Gap; x-axis: Quantile, y-axis: Differential in Log Total Compensation]
Figure 3: Gender Differential in Log Total Compensation by Quantile
Table 2. Means and OLS Regression Coefficients of Selected Variables from NLSY Log Wage Regressions for Workers Ages 35-43 in 2000

                                      (1) Means         (2)              (3)              (4)              (5)
Explanatory Variables                   Male  Female    Male Coef.       Female Coef.     Male Coef.       Pooled Coef.
Female                                     0       1                                                      -0.092 (0.014)
Education and skill level
  <10 yrs.                             0.053   0.032   -0.027 (0.043)   -0.089 (0.05)    -0.027 (0.043)   -0.045 (0.033)
  10-12 yrs (no diploma or GED)        0.124   0.104    ---              ---              ---              ---
  HS grad (diploma)                    0.326   0.298   -0.013 (0.028)   -0.002 (0.029)   -0.013 (0.028)   -0.003 (0.02)
  HS grad (GED)                        0.056   0.045    0.032 (0.042)   -0.012 (0.044)    0.032 (0.042)    0.006 (0.03)
  Some college                         0.231   0.307    0.164 (0.031)    0.101 (0.03)     0.164 (0.031)    0.131 (0.022)
  BA or equiv. degree                  0.155   0.153    0.380 (0.037)    0.282 (0.036)    0.380 (0.037)    0.330 (0.026)
  MA or equiv. degree                  0.041   0.054    0.575 (0.052)    0.399 (0.046)    0.575 (0.052)    0.468 (0.034)
  Ph.D or prof. Degree                 0.015   0.007    0.862 (0.077)    0.763 (0.1)      0.862 (0.077)    0.807 (0.06)
  AFQT percentile score (x.10)         4.231   3.971    0.042 (0.004)    0.041 (0.004)    0.042 (0.004)    0.042 (0.003)
L.F. withdrawal due to family resp.    0.129   0.547   -0.078 (0.025)   -0.083 (0.019)   -0.078 (0.025)   -0.067 (0.015)
Lifetime Work Experience
  Years worked civilian               17.160  15.559    0.038 (0.003)    0.030 (0.002)    0.038 (0.003)    0.033 (0.002)
  Years worked military                0.578   0.060    0.024 (0.005)    0.042 (0.013)    0.024 (0.005)    0.021 (0.004)
  % worked part-time                   0.049   0.135   -0.749 (0.099)   -0.197 (0.049)   -0.749 (0.099)   -0.346 (0.044)
Industrial Sectors
  Primary, Constr. & Utilities         0.186   0.087    ---              ---              0.059 (0.031)    ---
  Manufacturing                        0.237   0.120    0.034 (0.026)    0.140 (0.035)    0.093 (0.029)    0.072 (0.021)
  Education, Health, & Public Adm.     0.130   0.358   -0.059 (0.031)    0.065 (0.03)     ---             -0.001 (0.02)
  Other Services                       0.447   0.436    0.007 (0.024)    0.088 (0.029)    0.066 (0.026)    0.036 (0.018)
Constant                                                2.993 (0.156)    2.865 (0.144)    2.934 (0.157)    2.949 (0.105)

Dependent Var. (Log Hourly Wage)       2.763   2.529
Adj. R-Square                                           0.422            0.407            0.422            0.431
Sample size                             2655    2654

Note: The data is an extract from the NLSY79 used in O'Neill and O'Neill (2006). Industrial sectors were added (at a loss of 89 observations) to their analysis to illustrate issues linked to categorical variables. The other explanatory variables are age, dummies for black, hispanic, region, msa, central city. Standard errors are in parentheses.
1.3 Gender Wage Gap and Female Dummy
• What’s wrong with the sequential introduction of explanatory variables?
• The share attributed to each variable depends on the order in which
variables enter the “decomposition”
• Consider the “part-time” work example
Table 2b. OLS Regression Coefficients of Selected Variables from NLSY Log Wage Regressions for Workers Ages 35-43

                                       (1)              (2)              (3)              (5)
Explanatory Variables                  Pooled Coef.     Pooled Coef.     Pooled Coef.     Pooled Coef.
Female                                -0.233 (0.015)   -0.209 (0.015)   -0.113 (0.014)   -0.092 (0.014)
"Explained by % worked part-time"                      -0.025                            -0.021
Education and skill level
  <10 yrs.                             ---              ---             -0.092 (0.014)   -0.045 (0.033)
  10-12 yrs (no diploma or GED)        ---              ---              ---              ---
  HS grad (diploma)                    ---              ---             -0.092 (0.014)   -0.003 (0.02)
  HS grad (GED)                        ---              ---             -0.012 (0.044)    0.006 (0.03)
  Some college                         ---              ---              0.101 (0.03)     0.131 (0.022)
  BA or equiv. degree                  ---              ---              0.282 (0.036)    0.330 (0.026)
  MA or equiv. degree                  ---              ---              0.399 (0.046)    0.468 (0.034)
  Ph.D or prof. Degree                 ---              ---              0.763 (0.1)      0.807 (0.06)
  AFQT percentile score (x.10)         ---              ---              0.041 (0.004)    0.042 (0.003)
L.F. withdrawal due to family resp.    ---              ---             -0.083 (0.019)   -0.067 (0.015)
Lifetime Work Experience
  Years worked civilian                ---              ---              0.033 (0.002)    0.033 (0.002)
  Years worked military                ---              ---              0.021 (0.004)    0.021 (0.004)
  % worked part-time                   ---             -0.288 (0.055)    ---             -0.346 (0.044)
Industrial Sectors
  Primary, Constr. & Utilities         ---              ---              ---              ---
  Manufacturing                        ---              ---              0.084 (0.021)    0.072 (0.021)
  Education, Health, & Public Adm.     ---              ---              0.008 (0.021)   -0.001 (0.02)
  Other Services                       ---              ---              0.038 (0.018)    0.036 (0.018)
Constant                               2.763 (0.01)     2.777 (0.011)    2.955 (0.106)    2.949 (0.105)

Adj. R-Square                          0.046            0.051            0.422            0.431
Sample size                            2655 men, 2654 women

Note: The data is an extract from the NLSY79 used in O'Neill and O'Neill (2006). Industrial sectors were added (at a loss of 89 observations) to their analysis to illustrate issues linked to categorical variables. The other explanatory variables are age, dummies for black, hispanic, region, msa, central city. Standard errors are in parentheses.
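The order dependence can be checked by hand from the tables. The sketch below (Python, used only for the arithmetic) relies on the standard omitted-variable identity for a regression with two regressors: adding part-time to a regression on the female dummy alone changes the female coefficient by the part-time coefficient times the gender gap in mean part-time. All numbers are the rounded figures from Table 2 and Table 2b, so agreement is only up to rounding.

```python
# Figures copied from Table 2 (means) and Table 2b (pooled coefficients).
mean_pt_men, mean_pt_women = 0.049, 0.135   # means of "% worked part-time"
female_alone = -0.233       # col. (1): female dummy only
female_with_pt = -0.209     # col. (2): female dummy + % part-time
beta_pt_short = -0.288      # col. (2): coefficient on % part-time

# Part-time entered FIRST: omitted-variable identity for two regressors,
# change in female coef = beta_pt * (mean_pt_women - mean_pt_men).
explained_first = beta_pt_short * (mean_pt_women - mean_pt_men)

# Part-time entered LAST: female coefficient in col. (3) minus col. (5).
female_full_no_pt, female_full = -0.113, -0.092
explained_last = female_full_no_pt - female_full

print(f"part-time entered first: {explained_first:.3f}")
print(f"part-time entered last:  {explained_last:.3f}")
print(f"direct difference, col. (1) - col. (2): {female_alone - female_with_pt:.3f}")
```

The same variable "explains" about 0.025 of the gap when it is introduced first and about 0.021 when it is introduced last, because in the second case its effect is measured net of the other covariates.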
Table 3. Gender Wage Gap: Oaxaca-Blinder Decomposition Results (NLSY, 2000)

                                     (1)              (2)              (3)              (4)              (5)
Reference Group:                     Using Male Coef. Using Male Coef. Using Female     Using Weighted   Using Pooled
                                     from col. 2,     from col. 4,     Coef.            Sum              from col. 5,
                                     Table 2          Table 2                                            Table 2
Unadjusted mean log wage gap:
  E[ln(w_m)] - E[ln(w_f)]            0.233 (0.015)    0.233 (0.015)    0.233 (0.015)    0.233 (0.015)    0.233 (0.015)
Composition effects attributable to
  Age, race, region, etc.            0.012 (0.003)    0.012 (0.003)    0.009 (0.003)    0.011 (0.003)    0.010 (0.003)
  Education                         -0.012 (0.006)   -0.012 (0.006)   -0.008 (0.004)   -0.010 (0.005)   -0.010 (0.005)
  AFQT                               0.011 (0.003)    0.011 (0.003)    0.011 (0.003)    0.011 (0.003)    0.011 (0.003)
  L.F. withdrawal due to family      0.033 (0.011)    0.033 (0.011)    0.035 (0.008)    0.034 (0.007)    0.028 (0.007)
  Life-time work experience          0.137 (0.011)    0.137 (0.011)    0.087 (0.01)     0.112 (0.008)    0.092 (0.007)
  Industrial sectors                 0.017 (0.006)    0.017 (0.006)    0.003 (0.005)    0.010 (0.004)    0.009 (0.004)
  Total explained by model           0.197 (0.018)    0.197 (0.018)    0.136 (0.014)    0.167 (0.013)    0.142 (0.012)
Wage structure effects attributable to
  Age, race, region, etc.           -0.098 (0.234)   -0.098 (0.234)   -0.096 (0.232)   -0.097 (0.233)   -0.097 (0.24)
  Education                          0.045 (0.034)    0.045 (0.034)    0.041 (0.033)    0.043 (0.034)    0.043 (0.031)
  AFQT                               0.003 (0.023)    0.003 (0.023)    0.003 (0.025)    0.003 (0.024)    0.002 (0.025)
  L.F. withdrawal due to family      0.003 (0.017)    0.003 (0.017)    0.001 (0.004)    0.002 (0.011)    0.007 (0.01)
  Life-time work experience          0.048 (0.062)    0.048 (0.062)    0.098 (0.067)    0.073 (0.064)    0.092 (0.065)
  Industrial sectors                -0.092 (0.033)    0.014 (0.028)   -0.077 (0.029)   -0.085 (0.031)   -0.084 (0.032)
  Constant                           0.128 (0.213)    0.022 (0.212)    0.193 (0.211)    0.128 (0.213)    0.128 (0.216)
Total wage structure -
  Unexplained log wage gap           0.036 (0.019)    0.036 (0.019)    0.097 (0.016)    0.066 (0.015)    0.092 (0.014)

Note: The data is an extract from the NLSY79 used in O'Neill and O'Neill (2006). The other explanatory variables are age, dummies for black, hispanic, region, msa, central city. In column (1), the omitted industrial sector is "Primary, Construction, and Utilities". In column (2), the omitted industrial sector is "Education, Health and Public Admin". Standard errors are in parentheses. The means of the variables are reported in Table 2.
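With male coefficients as the reference, each composition effect in Table 3 is the gender gap in means times the male coefficient, summed over the variables in the group. The following sketch (Python, arithmetic only; the dictionary layout and the function name are illustrative choices, not part of the original analysis) rebuilds four of the column (1) entries from the rounded figures in Table 2, so it matches Table 3 only up to rounding.

```python
# (mean men, mean women, male coef.) copied from Table 2, cols. (1) and (2).
TABLE2 = {
    "AFQT": {"afqt": (4.231, 3.971, 0.042)},
    "family": {"withdrawal": (0.129, 0.547, -0.078)},
    "experience": {
        "civilian": (17.160, 15.559, 0.038),
        "military": (0.578, 0.060, 0.024),
        "part_time": (0.049, 0.135, -0.749),
    },
    "industry": {  # Primary, Constr. & Utilities is the omitted sector
        "manufacturing": (0.237, 0.120, 0.034),
        "edu_health": (0.130, 0.358, -0.059),
        "other": (0.447, 0.436, 0.007),
    },
}

def composition_effect(group):
    """Sum of (mean_men - mean_women) * beta_men over a group's variables."""
    return sum((xm - xf) * b for xm, xf, b in group.values())

explained = {name: composition_effect(g) for name, g in TABLE2.items()}
for name, value in explained.items():
    print(f"{name:<12}{value:8.4f}")
```

Life-time work experience dominates: lower female civilian experience and higher female part-time incidence, both priced at male coefficients, account for most of the explained gap.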
1.4 Oaxaca-Blinder in Stata
• Use the oaxaca package by Ben Jann (ETH Zurich)
st0151 from http://www.stata-journal.com/software/sj8-4
    SJ8-4 st0151. The Blinder-Oaxaca decomposition for linear... / The
    Blinder-Oaxaca decomposition for linear regression / models / by Ben Jann,
    ETH Zurich / Support: [email protected] / After installation, type help
    oaxaca
. *** Table 3, Column 2;
. oaxaca lropc00 age00 msa ctrlcity north_central south00 west hispanic black
> sch_10 diploma_hs ged_hs smcol bachelor_col master_col doctor_col
> afqtp89 famrspb wkswk_18 yrsmil78_00 pcntpt_22 primary eduheal othind,
> by(female) weight(1)
> detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
> groupaf:afqtp89,
> grouped:sch_10 diploma_hs ged_hs smcol bachelor_col master_col doctor_col ,
> groupfam:famrspb,
> groupex:wkswk_18 yrsmil78_00 pcntpt_22 ,
> groupind: primary eduheal othind) ;
Blinder-Oaxaca decomposition                      Number of obs   =       5309

          1: female = 0
          2: female = 1

------------------------------------------------------------------------------
     lropc00 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Differential |
Prediction_1 |   2.762557   .0106598   259.16   0.000     2.741664     2.78345
Prediction_2 |   2.529257   .0100367   252.00   0.000     2.509585    2.548928
  Difference |   .2333003   .0146413    15.93   0.000     .2046039    .2619967
-------------+----------------------------------------------------------------
Explained    |
    groupdem |   .0115371   .0032919     3.50   0.000     .0050851    .0179891
     grouped |  -.0124049   .0055175    -2.25   0.025     -.023219   -.0015907
     groupaf |   .0108035   .0034414     3.14   0.002     .0040584    .0175486
    groupfam |   .0328186   .0106373     3.09   0.002     .0119698    .0536674
     groupex |    .137095   .0112599    12.18   0.000     .1150259     .159164
    groupind |   .0174583   .0061707     2.83   0.005      .005364    .0295526
       Total |   .1973076   .0180079    10.96   0.000     .1620128    .2326024
-------------+----------------------------------------------------------------
Unexplained  |
    groupdem |  -.0978872   .2338861    -0.42   0.676    -.5562956    .3605212
     grouped |   .0454348   .0344576     1.32   0.187    -.0221009    .1129705
     groupaf |   .0026284    .023485     0.11   0.911    -.0434014    .0486582
    groupfam |   .0025869   .0174562     0.15   0.882    -.0316266    .0368005
     groupex |   .0475104   .0616535     0.77   0.441    -.0733281     .168349
    groupind |   .0137992   .0283532     0.49   0.626    -.0417722    .0693705
       _cons |   .0219201   .2117714     0.10   0.918    -.3931443    .4369844
       Total |   .0359927   .0185897     1.94   0.053    -.0004425    .0724279
------------------------------------------------------------------------------
groupdem: age00 msa ctrlcity north_central south00 west hispanic black
grouped: sch_10 diploma_hs ged_hs smcol bachelor_col master_col doctor_col
groupaf: afqtp89
groupfam: famrspb
groupex: wkswk_18 yrsmil78_00 pcntpt_22
groupind: primary eduheal othind
. *** Table 3, Column 3;
. oaxaca lropc00 age00 msa ctrlcity north_central south00 west hispanic black
> sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col
> afqtp89 famrspb wkswk_18 yrsmil78_00 pcntpt_22 manuf eduheal othind,
> by(female) weight(0)
> detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
> groupaf:afqtp89,
> grouped:sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col ,
> groupfam:famrspb,
> groupex:wkswk_18 yrsmil78_00 pcntpt_22 ,
> groupind: manuf eduheal othind) ;
Blinder-Oaxaca decomposition                      Number of obs   =       5309

          1: female = 0
          2: female = 1

------------------------------------------------------------------------------
     lropc00 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Differential |
Prediction_1 |   2.762557   .0106598   259.16   0.000     2.741664     2.78345
Prediction_2 |   2.529257   .0100367   252.00   0.000     2.509585    2.548928
  Difference |   .2333003   .0146413    15.93   0.000     .2046039    .2619967
-------------+----------------------------------------------------------------
Explained    |
    groupdem |   .0094784   .0031157     3.04   0.002     .0033717    .0155851
     grouped |  -.0077608   .0044009    -1.76   0.078    -.0163864    .0008648
     groupaf |   .0106791   .0034073     3.13   0.002      .004001    .0173572
    groupfam |   .0348287    .008156     4.27   0.000     .0188433    .0508141
     groupex |   .0869987   .0098831     8.80   0.000     .0676281    .1063692
    groupind |   .0024117   .0050158     0.48   0.631    -.0074191    .0122424
       Total |   .1366357   .0143218     9.54   0.000     .1085655    .1647059
-------------+----------------------------------------------------------------
Unexplained  |
    groupdem |  -.0977918   .2325136    -0.42   0.674    -.5535101    .3579264
     grouped |  -.0218645   .0204702    -1.07   0.285    -.0619853    .0182563
     groupaf |   .0027562   .0250163     0.11   0.912    -.0462749    .0517873
    groupfam |   .0005559   .0041199     0.13   0.893     -.007519    .0086307
     groupex |   .0974398   .0673318     1.45   0.148    -.0345281    .2294078
    groupind |  -.0770291   .0290617    -2.65   0.008    -.1339889   -.0200692
       _cons |   .1925981   .2109645     0.91   0.361    -.2208846    .6060809
       Total |   .0966646   .0161679     5.98   0.000     .0649761     .128353
------------------------------------------------------------------------------
groupdem: age00 msa ctrlcity north_central south00 west hispanic black
grouped: sch_10 sch10_12 diploma_hs ged_hs bachelor_col master_col doctor_col
         doctor_col
groupaf: afqtp89
groupfam: famrspb
groupex: wkswk_18 yrsmil78_00 pcntpt_22
groupind: manuf eduheal othind
. *** Table 3, Column 5;
. oaxaca lropc00 age00 msa ctrlcity north_central south00 west hispanic black
> sch_10 diploma_hs ged_hs smcol bachelor_col master_col doctor_col afqtp89
> famrspb wkswk_18 yrsmil78_00 pcntpt_22 manuf eduheal othind,
> by(female) pooled
> detail(groupdem:age00 msa ctrlcity north_central south00 west hispanic black,
> groupaf:afqtp89,
> grouped:sch_10 diploma_hs ged_hs smcol bachelor_col master_col doctor_col ,
> groupfam:famrspb,
> groupex:wkswk_18 yrsmil78_00 pcntpt_22 ,
> groupind: manuf eduheal othind) ;
Blinder-Oaxaca decomposition                      Number of obs   =       5309

          1: female = 0
          2: female = 1

------------------------------------------------------------------------------
             |               Robust
     lropc00 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Differential |
Prediction_1 |   2.762557   .0106321   259.83   0.000     2.741718    2.783395
Prediction_2 |   2.529257   .0100104   252.66   0.000     2.509637    2.548877
  Difference |   .2333003   .0146031    15.98   0.000     .2046787    .2619218
-------------+----------------------------------------------------------------
Explained    |
    groupdem |   .0103136   .0029705     3.47   0.001     .0044914    .0161357
     grouped |  -.0095283   .0046437    -2.05   0.040    -.0186298   -.0004268
     groupaf |   .0110502   .0034317     3.22   0.001     .0043242    .0177762
    groupfam |   .0281105    .006622     4.25   0.000     .0151316    .0410895
     groupex |   .0923997   .0070155    13.17   0.000     .0786494    .1061499
    groupind |   .0091617    .003619     2.53   0.011     .0020686    .0162548
       Total |   .1415073   .0121464    11.65   0.000     .1177008    .1653138
-------------+----------------------------------------------------------------
Unexplained  |
    groupdem |  -.0966636   .2401926    -0.40   0.687    -.5674324    .3741051
     grouped |   .0425582    .031011     1.37   0.170    -.0182222    .1033387
     groupaf |   .0023818   .0247265     0.10   0.923    -.0460812    .0508448
    groupfam |    .007295   .0096194     0.76   0.448    -.0115587    .0261487
     groupex |   .0922058   .0650471     1.42   0.156    -.0352842    .2196957
    groupind |   -.083719   .0315878    -2.65   0.008      -.14563   -.0218081
       _cons |   .1277349   .2162801     0.59   0.555    -.2961664    .5516361
       Total |    .091793   .0137999     6.65   0.000     .0647457    .1188402
------------------------------------------------------------------------------
groupdem: age00 msa ctrlcity north_central south00 west hispanic black
grouped: sch_10 diploma_hs ged_hs smcol bachelor_col master_col doctor_col
groupaf: afqtp89
groupfam: famrspb
groupex: wkswk_18 yrsmil78_00 pcntpt_22
groupind: manuf eduheal othind
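A quick consistency check on the three logs: whatever the reference coefficients, the explained and unexplained totals must add up to the same raw gap. The sketch below (Python, arithmetic only) verifies this on the totals printed above. It also averages the male- and female-coefficient results with equal weights; that the "Weighted Sum" column of Table 3 uses a weight of 0.5 is an assumption here, not something the logs state.

```python
# (difference, explained total, unexplained total) copied from the logs above.
runs = {
    "male coef., weight(1)":   (0.2333003, 0.1973076, 0.0359927),
    "female coef., weight(0)": (0.2333003, 0.1366357, 0.0966646),
    "pooled":                  (0.2333003, 0.1415073, 0.0917930),
}

for label, (gap, explained, unexplained) in runs.items():
    # Adding-up property: the two parts reproduce the raw gap.
    assert abs(explained + unexplained - gap) < 1e-6, label
    print(f"{label:<25} explained share = {explained / gap:.1%}")

# ASSUMPTION: equal weights (0.5) on the male- and female-based results.
w = 0.5
explained_w = (w * runs["male coef., weight(1)"][1]
               + (1 - w) * runs["female coef., weight(0)"][1])
unexplained_w = (w * runs["male coef., weight(1)"][2]
                 + (1 - w) * runs["female coef., weight(0)"][2])
print(f"equal-weight explained = {explained_w:.3f}, "
      f"unexplained = {unexplained_w:.3f}")
```

The equal-weight figures round to 0.167 and 0.066, which is what column (4) of Table 3 reports, so the assumption looks reasonable.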
1.5 Choosing the base (omitted) category
• Resist mindless normalization of coefficients
• Favor interpretability of the results and comparability with literature
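Why the base category matters can be seen from Table 2 alone. Re-basing a set of dummies only shifts every coefficient and the constant by the coefficient of the new omitted sector, but it changes how the wage structure effect is split between "industrial sectors" and "constant". The sketch below (Python, arithmetic only; the function names are illustrative) uses the rounded male and female coefficients from columns (2) and (3) of Table 2, with male coefficients as the reference, so figures agree with Table 3 only up to rounding.

```python
SECTORS = ["primary", "manufacturing", "edu_health", "other"]
# Female means and coefficients from Table 2; "primary" is omitted there (coef. 0).
mean_f = {"primary": 0.087, "manufacturing": 0.120, "edu_health": 0.358, "other": 0.436}
beta_m = {"primary": 0.0, "manufacturing": 0.034, "edu_health": -0.059, "other": 0.007}
beta_f = {"primary": 0.0, "manufacturing": 0.140, "edu_health": 0.065, "other": 0.088}
const_m, const_f = 2.993, 2.865

def rebase(beta, const, base):
    """Re-express dummy coefficients and constant relative to a new omitted sector."""
    shifted = {s: beta[s] - beta[base] for s in SECTORS}
    return shifted, const + beta[base]

def wage_structure(base):
    """Unexplained parts due to the sector dummies and to the constant."""
    bm, cm = rebase(beta_m, const_m, base)
    bf, cf = rebase(beta_f, const_f, base)
    sectors = sum(mean_f[s] * (bm[s] - bf[s]) for s in SECTORS)
    return sectors, cm - cf

for base in SECTORS:
    sectors, constant = wage_structure(base)
    print(f"omit {base:<14} sectors {sectors:7.3f}  constant {constant:6.3f}"
          f"  sum {sectors + constant:6.3f}")
```

Omitting the primary sector gives -0.092 and 0.128, as in column (1) of Table 3. Omitting manufacturing gives 0.014 and 0.022, the figures shown in column (2) and consistent with the `primary eduheal othind` variable list in the Stata command labelled Column 2. In every case the sum stays at about 0.036, so only the split between the two rows is arbitrary.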