CDS M Phil Econometrics Vijayamohan
CDS M Phil Econometrics
Vijayamohanan Pillai N
3 March 2014

Dummy Variable Models
• Dummy X-variables
• Dummy Y-variables

Dummy X-variables

Dummy variable: a variable assuming the values 0 and 1 to indicate some attribute.
Used to classify data into mutually exclusive categories.
Also called: indicator variable, binary variable, dichotomous variable, categorical variable, qualitative variable.
$$Y_i = \alpha + \beta D_i + u_i$$

where $Y_i$ = wage rate of an agricultural labourer, and $D_i = 1$ if male worker, $0$ otherwise.

Mean wage of a male agricultural worker: $E(Y_i \mid D_i = 1) = \alpha + \beta$
Mean wage of a female agricultural worker: $E(Y_i \mid D_i = 0) = \alpha$

$H_0$: no sex discrimination $\Rightarrow$ $H_0: \beta = 0$.

Analysis of Variance (ANOVA) model: a mean difference test.
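A minimal sketch of why the single-dummy regression is a mean difference test, assuming simulated (hypothetical) wage data rather than any data from the lecture: the OLS intercept reproduces the female mean wage and the dummy coefficient reproduces the male-minus-female gap.

```python
import numpy as np

# Hypothetical data: the wage levels and the gap of 25 are illustrative assumptions.
rng = np.random.default_rng(0)
n = 200
male = rng.integers(0, 2, size=n)                    # D_i: 1 = male, 0 = otherwise
wage = 100 + 25 * male + rng.normal(0, 10, size=n)   # Y_i

# OLS of Y on a constant and the dummy.
X = np.column_stack([np.ones(n), male])
alpha_hat, beta_hat = np.linalg.lstsq(X, wage, rcond=None)[0]

mean_female = wage[male == 0].mean()
mean_male = wage[male == 1].mean()

print("alpha_hat vs female mean:", alpha_hat, mean_female)
print("beta_hat vs mean gap:    ", beta_hat, mean_male - mean_female)
```

Testing $\beta = 0$ in this regression is therefore the same as testing that the two group means are equal.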
In economic applications, we control for other socio-economic factors: caste, nature of work, experience, ...
With both quantitative and qualitative variables: Analysis of Covariance (ANCOVA) model.

$$Y_i = \alpha_0 + \alpha_1 D_i + \beta X_i + u_i$$

where $D_i = 1$ if male worker, $0$ otherwise.

Mean wage of a female agricultural worker: $E(Y_i \mid D_i = 0) = \alpha_0 + \beta X_i$
Mean wage of a male agricultural worker: $E(Y_i \mid D_i = 1) = (\alpha_0 + \alpha_1) + \beta X_i$
[Figure: agricultural wage rate against X. Two parallel lines with common slope $\beta$: female (f) with intercept $\alpha_0$ and male (m) with intercept $(\alpha_0 + \alpha_1)$.]
Differential intercept: $\alpha_1$
$$Y_i = \alpha_0 + \alpha_1 D_i + \beta_1 X_i + \beta_2 D_i X_i + u_i$$

where $D_i = 1$ if male worker, $0$ otherwise, and $D_i X_i$ is the interaction term.

Mean wage of a female agricultural worker: $E(Y_i \mid D_i = 0) = \alpha_0 + \beta_1 X_i$
Mean wage of a male agricultural worker: $E(Y_i \mid D_i = 1) = (\alpha_0 + \alpha_1) + (\beta_1 + \beta_2) X_i$
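To illustrate the differential intercept and differential slope, here is a small sketch on simulated data (all numbers are hypothetical). It fits the interaction model by OLS and checks that the implied female and male lines coincide with two separate group regressions.

```python
import numpy as np

# Hypothetical data; X could stand for experience, but that is an assumption.
rng = np.random.default_rng(1)
n = 400
D = rng.integers(0, 2, size=n)      # 1 = male, 0 = otherwise
X = rng.uniform(0, 20, size=n)
Y = 50 + 10 * D + 2.0 * X + 1.5 * D * X + rng.normal(0, 5, size=n)

# Pooled regression with differential intercept (a1) and differential slope (b2).
Z = np.column_stack([np.ones(n), D, X, D * X])
a0, a1, b1, b2 = np.linalg.lstsq(Z, Y, rcond=None)[0]

def fit_line(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    A = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

f_int, f_slope = fit_line(X[D == 0], Y[D == 0])   # female line
m_int, m_slope = fit_line(X[D == 1], Y[D == 1])   # male line

print("female:", (a0, b1), "vs", (f_int, f_slope))
print("male:  ", (a0 + a1, b1 + b2), "vs", (m_int, m_slope))
```

The pooled form is convenient because $\alpha_1 = 0$ and $\beta_2 = 0$ can be tested directly with the usual t or F statistics.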
[Figure: agricultural wage rate against X. Female line with intercept $\alpha_0$ and slope $\beta_1$; male line with intercept $(\alpha_0 + \alpha_1)$ and slope $(\beta_1 + \beta_2)$.]
Dummy Y-variables

Discrete Choice Models
In many situations the dependent variable is not a continuous variable but a discrete, or qualitative, one.

In general there are 2 types:
1. dependent variables which take one of two values (binary/dichotomous choice), and
2. dependent variables which can take more than two values but are finite (polychotomous; multiple choice).
Binary Choice Model
[Diagram: individual i either chooses A or does not choose A; for example "to be" or "not to be", "by car" or "not".]

Multinomial Choice Model
[Diagram: individual i chooses among J alternatives: 1 walk, 2 cycle, 3 car, ..., bus, train.]
Examples

• Labour force participation:
– occupational choice (multiple choice)
– employed or unemployed (binary choice)
– to be employed full-time, part-time or unemployed (multiple choice)

• Voting behaviour:
– to vote or not to vote (binary choice)
– to vote Congress, BJP, Communists, or abstain (multiple choice)

• Censored data (limited dependent variable):
– housing expenditure: some may not have purchased, so zero
Two questions:
(i) Can we still use OLS to estimate such outcomes?
(ii) If not, how do we model such outcomes?

Binary choice:
(i) Linear Probability Model (LPM)
(ii) Logit/Probit model

Censored/limited dependent variable regression model:
(i) Tobit model
Linear Probability Model

We focus on single-equation binary outcomes: $y_i \in \{0, 1\}$.
A fundamental difference between a quantitative response model and a qualitative response model: the latter is a probability model.

In general:
Prob(event j occurs) = $P(Y = j)$ = f(relevant variables; parameters) = $f(x_i, \beta)$
where $x_i = [x_{i1}, \ldots, x_{ik}]$ are the variables and $\beta$ is a vector of parameters.

Given $y_i \in \{0, 1\}$, $y_i$ follows a Bernoulli probability distribution:
$$P(y_i = 1 \mid x_i) = P_i = f(x_i, \beta); \qquad P(y_i = 0 \mid x_i) = 1 - P_i = 1 - f(x_i, \beta)$$
How do we specify $f(x_i, \beta)$?

• Linear Probability Model
An obvious choice is the familiar least squares procedure:
$$f(x_i, \beta) = x_i\beta \;\Rightarrow\; y_i = x_i\beta + u_i$$
This leads to the linear probability model (LPM).

Assuming $E(u) = 0$, it follows that
$$E(y_i \mid x_i) = 0 \cdot P(y_i = 0) + 1 \cdot P(y_i = 1)$$
$$E(y_i \mid x_i) = P(y_i = 1 \mid x_i) = x_i\beta$$
Conditional expectation = conditional probability.
The regression equation describes the probability that $y_i = 1$ given information on $x_i$.
The case of a single explanatory variable:
$$E(y_i \mid X_i) = P_i = \beta_1 + \beta_2 X_i$$
The probability of the event occurring, p, is assumed to be a linear function of the variable X.
[Figure: y, P (vertical axis, from 0 to 1) against X; the straight line $\beta_1 + \beta_2 X$ with intercept $\beta_1$.]
Now an example: the Ray Fair data on extramarital affairs.
Source: http://fairmodel.econ.yale.edu/rayfair/worksd.htm
Dependent variable changed into binary: 0 = none; 1 = one or more.
Affairs: 0 → 451; 1 → 150
Sex: F → 315; M → 286
LPM – Ray Fair Model: descriptive statistics

                                                               Skewness            Kurtosis
Variable                    N    Min    Max  Mean    Std. Dev. Statistic Std. Err. Statistic Std. Err.
Have extramarital affairs   601  0      1    0.250   0.433     1.160     0.100     -0.656    0.199
Sex                         601  0      1    0.476   0.500     0.097     0.100     -1.997    0.199
Age group                   601  17.5   57   32.488  9.289     0.889     0.100     0.232     0.199
Married years group         601  0.125  15   8.178   5.571     0.078     0.100     -1.571    0.199
Have children               601  0      1    0.715   0.452     -0.958    0.100     -1.087    0.199
Religiosity group           601  1      5    3.116   1.168     -0.089    0.100     -1.008    0.199
Education group             601  9      20   16.166  2.403     -0.250    0.100     -0.302    0.199
Occupation group            601  1      7    4.195   1.819     -0.741    0.100     -0.776    0.199
Marriage rating group       601  1      5    3.932   1.103     -0.836    0.100     -0.204    0.199

Interpretation?
LPM – Ray Fair Model: regression estimates
Dependent variable: extramarital affairs

                  Unstandardized       Standardized
Variable          B        Std. Error  Beta          t        Sig.
(Constant)        0.736    0.152                     4.859    0.000
Sex               0.045    0.040       0.052         1.129    0.259
Age               -0.007   0.003       -0.159        -2.463   0.014
Married years     0.016    0.005       0.206         2.911    0.004
Have children     0.054    0.047       0.057         1.168    0.243
Religiosity       -0.054   0.015       -0.145        -3.608   0.000
Education         0.003    0.009       0.017         0.360    0.719
Occupation        0.006    0.012       0.025         0.499    0.618
Marriage rating   -0.087   0.016       -0.223        -5.472   0.000

Interpretation?
LPM – Ray Fair Model: predicted probabilities

Sex  Age  Years married  Children  Religiosity  Education  Occupation  Marriage rating  Pred. prob.
1    57   15             1         1            20         7           1                0.614
0    57   15             1         1            20         7           1                0.569
1    57   15             1         1            20         7           5                0.265
0    57   15             1         1            20         7           5                0.219
1    27   4              0         5            9          1           5                -0.027
0    27   4              0         5            9          1           5                -0.072

Negative probability!
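The predicted probabilities are just the fitted linear combination evaluated at each profile. The sketch below plugs the rounded LPM coefficients from the regression table into that formula; the dictionary keys and the helper `lpm_predict` are illustrative names, not part of the lecture. Because the coefficients are rounded to three decimals, the results are close to, but not identical with, the values reported on the slide.

```python
# Rounded LPM coefficients as reported in the regression table.
coef = {
    "const": 0.736, "sex": 0.045, "age": -0.007, "years_married": 0.016,
    "children": 0.054, "religiosity": -0.054, "education": 0.003,
    "occupation": 0.006, "rating": -0.087,
}

def lpm_predict(profile):
    """Fitted value x_i * beta for one profile (dict of regressor values)."""
    return coef["const"] + sum(coef[k] * v for k, v in profile.items())

# First and fifth rows of the predicted-probability table.
older_unhappy = {"sex": 1, "age": 57, "years_married": 15, "children": 1,
                 "religiosity": 1, "education": 20, "occupation": 7, "rating": 1}
young_happy = {"sex": 1, "age": 27, "years_married": 4, "children": 0,
               "religiosity": 5, "education": 9, "occupation": 1, "rating": 5}

p_high = lpm_predict(older_unhappy)
p_low = lpm_predict(young_happy)
print(round(p_high, 3), round(p_low, 3))
print("inside [0, 1]?", 0 <= p_high <= 1, 0 <= p_low <= 1)
```

Nothing in the linear form stops the second fitted value from falling below zero, which is the shortcoming the slide highlights.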
The Linear Probability Model: some serious shortcomings

(i) The distribution of the disturbance is non-normal.
As y can take only one of two values, the error term also has a discrete (non-normal) distribution.
With $u_i = y_i - x_i\beta$, the probability distribution of u is:
$y_i = 1 \Rightarrow u_i = 1 - x_i\beta$, with probability $P_i$
$y_i = 0 \Rightarrow u_i = -x_i\beta$, with probability $(1 - P_i)$
In effect, u follows a two-point (Bernoulli-type) distribution.
[Figures: normal P-P plot and histogram of the regression standardized residuals, LPM – Ray Fair Model.]
(ii) The error term is heteroskedastic:
$$\mathrm{Var}(u_i) = (x_i\beta)(1 - x_i\beta)$$
which clearly varies with the value of $x_i$.
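The variance formula follows in two lines from the two-point distribution of the error, using $P_i = x_i\beta$. A short worked derivation:

```latex
\begin{aligned}
E(u_i \mid x_i) &= (1 - x_i\beta)\,P_i + (-x_i\beta)\,(1 - P_i) = P_i - x_i\beta = 0, \\
\mathrm{Var}(u_i \mid x_i) &= (1 - x_i\beta)^2 P_i + (x_i\beta)^2 (1 - P_i) \\
&= (1 - P_i)^2 P_i + P_i^2 (1 - P_i) \\
&= P_i (1 - P_i)\,\bigl[(1 - P_i) + P_i\bigr] = x_i\beta\,(1 - x_i\beta).
\end{aligned}
```

The variance is largest where $x_i\beta = 0.5$ and shrinks towards zero as the fitted probability approaches 0 or 1.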
These problems are not insurmountable:
• The problem of non-normality can be circumvented provided we have a large sample size (invoke the central limit theorem).
• The problem of heteroskedasticity can be handled by using White's heteroskedasticity-consistent standard errors.
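As a rough illustration of the second remedy, the sketch below computes White's heteroskedasticity-consistent covariance (the basic "HC0" sandwich form) for an LPM fitted to simulated data. The data-generating numbers and the function name `ols_with_white_se` are assumptions made for the example.

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS coefficients with White (HC0) robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich "meat": sum over i of u_i^2 * x_i x_i'
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Hypothetical binary outcome whose true probability is linear in x.
rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(0, 1, size=n)
p = 0.1 + 0.7 * x
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
beta, robust_se = ols_with_white_se(X, y)
print("coefficients:", beta)
print("robust s.e.: ", robust_se)
```

The coefficient estimates are the ordinary OLS ones; only the standard errors, and hence the t statistics, change.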
(iii) The main problem is the non-fulfilment of $0 \le E(Y_i) \le 1$.
There is no guarantee that the predicted values of Y will all lie between 0 and 1.

Sex  Age  Years married  Children  Religiosity  Education  Occupation  Marriage rating  Pred. prob.
1    27   4              0         5            9          1           5                -0.027
0    27   4              0         5            9          1           5                -0.072

Negative probability!

What we require therefore is a way of constraining the LPM so that the predicted probabilities do lie in the [0,1] range.
In general we use alternative estimation models to do this.
The Solution

The usual way of avoiding this problem is to hypothesize that the probability is a sigmoid (S-shaped) function of Z, F(Z), where Z is a function of the explanatory variables.
Several mathematical functions are sigmoid in character.
[Figure: sigmoid curve F(Z), rising from 0 to 1, plotted for Z from -8 to 6.]

Alternatives to the Linear Probability Model
Underlying Probability Distributions for Binary Choice
• The distribution
– Normal: PROBIT, natural for behavior
– Logistic: LOGIT, allows “thicker tails”
– Gompertz: asymmetric, underlies the basic logit model for multiple choice
The Logit Model

Several mathematical functions are sigmoid in character. One is the logistic function:
$$F(Z) = \frac{1}{1 + e^{-Z}} = \frac{e^{Z}}{1 + e^{Z}}, \quad \text{where } Z = x_i\beta$$
[Figure: the logistic curve F(Z) against Z, rising from 0 to 1, for Z from -8 to 6.]

As $Z \to \infty$, $e^{-Z} \to 0$ and $p \to 1$ (but cannot exceed 1).
As $Z \to -\infty$, $e^{-Z} \to \infty$ and $p \to 0$ (but cannot be below 0).
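The two limits can be checked numerically. This is a small sketch of the logistic function; the branch on the sign of z is an implementation choice (not from the slides) that avoids overflow in `math.exp` for large negative z.

```python
import math

def logistic_cdf(z):
    """F(Z) = 1 / (1 + e^{-Z}), written to stay stable for large |Z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)            # small when z is very negative
    return ez / (1.0 + ez)      # the equivalent form e^Z / (1 + e^Z)

# Values across the range shown on the slide's horizontal axis.
for z in (-8, -4, -2, 0, 2, 4, 6):
    print(z, round(logistic_cdf(z), 4))
```

Whatever value Z takes, the output stays within the unit interval, which is exactly the property the LPM lacks.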
Normal distribution vs. logistic distribution
[Figure: the normal and logistic distributions compared.]

Logistic distribution
The logistic distribution has density function:
$$f(z) = \frac{(1/b)\, e^{-(z-a)/b}}{\left(1 + e^{-(z-a)/b}\right)^{2}}$$
where
a is the mean of the distribution,
b is the scale parameter,
e is the base of the natural logarithm, Euler's e (2.71...).
[Figure: logistic densities; here a = 0; b = 1, 2, and 3.]

With a = 0 and b = 1, the logistic distribution has density function:
$$f(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}, \qquad -\infty < z < \infty$$
Integrating the pdf gives the distribution function:
$$F(z) = \frac{1}{1 + e^{-z}}, \qquad -\infty < z < \infty$$
The Logit Model: Odds Ratio

$$P_i = E(y \mid X_i) = P(y = 1 \mid X_i) = F(Z) = \frac{1}{1 + e^{-Z}}; \qquad 1 - P_i = \frac{1}{1 + e^{Z}}$$
Odds ratio:
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z}}{1 + e^{-Z}} = e^{Z}, \quad \text{where } Z = x_i\beta$$
[Figure: the logistic curve F(Z) against Z, for Z from -8 to 6.]

Taking the log of the odds ratio,
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = x_i\beta$$
L is called the logit; hence the logit model.

Now an example….
Logit – Ray Fair Model: regression variables estimates

Variable          B        S.E.    Wald     df   Sig.    Exp(B)
Sex               0.280    0.239   1.374    1    0.241   1.324
Age               -0.044   0.018   5.881    1    0.015   0.957
Married years     0.095    0.032   8.655    1    0.003   1.099
Have children     0.398    0.292   1.861    1    0.173   1.488
Religiosity       -0.325   0.090   13.089   1    0.000   0.723
Education         0.021    0.051   0.174    1    0.677   1.021
Occupation        0.031    0.072   0.186    1    0.667   1.031
Marriage rating   -0.468   0.091   26.555   1    0.000   0.626
Constant          1.377    0.888   2.407    1    0.121   3.964

Wald = $(B/SE)^2 = t^2$
Only 4 variables are significantly different from zero at $\alpha = 0.05$.
Logit – Ray Fair Model: odds ratio estimates

Exp($B_i$) = odds ratio = the factor by which the odds, $P_i/(1 - P_i) = e^{Z}$, change when the i-th independent variable increases by one unit.
If B > 0, OR > 1; if B < 0, OR < 1; if B = 0, the odds are unchanged.
For example, for Sex: $e^{0.28} \approx 1.324$.

e.g., when the number of years married increases by 1 unit, the odds of an affair are multiplied by 1.099, that is, they rise by 9.9% (the log of the odds rises by 0.095), ceteris paribus.
e.g., religiosity significantly reduces the incidence of extramarital affairs!
‘Married years’ significantly contributes to extramarital affairs!
For each additional year of marriage, the odds of having an extramarital affair are about 1.1 times higher, ceteris paribus.
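The Exp(B) and Wald columns can be recomputed from B and S.E. alone. The sketch below does this with the rounded values from the logit table; small discrepancies against the reported columns are expected, since the software worked from unrounded estimates.

```python
import math

# (B, S.E., reported Wald, reported Exp(B)) copied from the logit table.
table = {
    "Sex":             (0.280, 0.239, 1.374, 1.324),
    "Age":             (-0.044, 0.018, 5.881, 0.957),
    "Married years":   (0.095, 0.032, 8.655, 1.099),
    "Have children":   (0.398, 0.292, 1.861, 1.488),
    "Religiosity":     (-0.325, 0.090, 13.089, 0.723),
    "Education":       (0.021, 0.051, 0.174, 1.021),
    "Occupation":      (0.031, 0.072, 0.186, 1.031),
    "Marriage rating": (-0.468, 0.091, 26.555, 0.626),
    "Constant":        (1.377, 0.888, 2.407, 3.964),
}

results = {}
for name, (b, se, wald_rep, expb_rep) in table.items():
    odds_ratio = math.exp(b)               # factor change in the odds per unit
    wald = (b / se) ** 2                   # Wald = (B / SE)^2
    pct_change = 100 * (odds_ratio - 1)    # percentage change in the odds
    results[name] = (odds_ratio, wald, pct_change)
    print(f"{name:16s} exp(B)={odds_ratio:.3f} Wald={wald:.3f} "
          f"odds change={pct_change:+.1f}%")
```

Reading the output this way keeps the interpretation on the odds scale: a coefficient of 0.095 means roughly a 10% rise in the odds per extra year of marriage, not a 10% rise in the probability.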
Logit – Ray Fair Model
Model Discrimination: Goodness of Fit

Compares the observed and predicted group memberships. Cases with a predicted probability above the cut value of 0.5 are classified as having extramarital affairs:
$$y_i^{*} = \begin{cases} 0 & \text{if } \hat{y}_i \le 0.5 \\ 1 & \text{if } \hat{y}_i > 0.5 \end{cases}$$

Classification table:
              Predicted 0   Predicted 1   Total   Correct
Observed 0    435           16            451     435/451
Observed 1    125           25            150     25/150
Total         560           41            601
Of the 560 cases predicted 0, 435 are correct (435/560); of the 41 cases predicted 1, 25 are correct (25/41).
Logit – Ray Fair Model: Goodness of Fit

The likelihood (L): the probability of the observed results, given the parameter estimates.
As L is a small number, < 1, we use –2 times the log of L (–2LL).
A good model is one with a high L of the observed results ⇒ a small value for –2LL.
(If a model fits perfectly, L = 1; –2LL = 0.)
Usually we compare with the –2LL of a model with only the constant.
The model is good if –2LL(with all variables) is sufficiently smaller than –2LL(with only the constant); the difference is the likelihood ratio chi-square statistic.
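For the constant-only benchmark, –2LL can be computed by hand, because the fitted probability is then simply the sample proportion of ones. A brief sketch using the counts given for this data set (150 ones and 451 zeros); the function name is illustrative.

```python
import math

def bernoulli_log_likelihood(n_ones, n_zeros, p):
    """Log-likelihood when every observation has the same P(y = 1) = p."""
    return n_ones * math.log(p) + n_zeros * math.log(1.0 - p)

n_ones, n_zeros = 150, 451
N = n_ones + n_zeros
p_bar = n_ones / N                       # ML estimate in the constant-only model

ll_null = bernoulli_log_likelihood(n_ones, n_zeros, p_bar)
minus2ll_null = -2.0 * ll_null
print("LL(0)   =", round(ll_null, 3))
print("-2LL(0) =", round(minus2ll_null, 3))
```

A fitted model with regressors is then judged by how far its own –2LL falls below this baseline.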
Pseudo $R^2$: similar to $R^2$; used to quantify the proportion of explained ‘variation’ in the logistic regression model.
Cox & Snell $R^2$ and Nagelkerke $R^2$ (in SPSS):

(a) Cox & Snell $R^2$:
$$R^2_{CS} = 1 - \left[\frac{L(0)}{L(\mathrm{Max})}\right]^{2/N}$$
where
L(0) = L(with only the constant): constrained
L(Max) = L(with all variables): unconstrained
N = sample size.
It cannot achieve a maximum value of 1.

(b) Nagelkerke $R^2$:
$$R^2_N = \frac{R^2_{CS}}{R^2_{\max}}, \qquad R^2_{\max} = 1 - [L(0)]^{2/N}$$
where $R^2_{CS}$ = Cox & Snell $R^2$.
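Both measures are easier to compute from log-likelihoods than from the raw likelihoods, since $[L(0)/L(\mathrm{Max})]^{2/N} = \exp\{2(LL_0 - LL_M)/N\}$. In the sketch below the constant-only log-likelihood comes from the 150/451 split of the data, while the full-model value `ll_model` is a hypothetical number chosen only for illustration; it is not a result reported in the slides.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """1 - [L(0)/L(Max)]^(2/N), evaluated on the log scale."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R^2 rescaled by its maximum, 1 - L(0)^(2/N)."""
    r2_max = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / r2_max

N = 601
ll_null = 150 * math.log(150 / N) + 451 * math.log(451 / N)
ll_model = -305.0   # hypothetical full-model log-likelihood (assumption)

r2_cs = cox_snell_r2(ll_null, ll_model, N)
r2_n = nagelkerke_r2(ll_null, ll_model, N)
print("Cox & Snell:", round(r2_cs, 4), "Nagelkerke:", round(r2_n, 4))
```

With a perfect fit (log-likelihood of zero) the Cox & Snell value stops at $R^2_{\max} < 1$, whereas the Nagelkerke value reaches exactly 1, which is the reason for the rescaling.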
Logit – Ray Fair Model: Goodness of Fit

Count $R^2$ = (No. of correct predictions)/N
No. of correct predictions = 435 + 25 = 460
N (No. of observations) = 601
Count $R^2$ = 460/601 = 0.765
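A short sketch of the same calculation in code: a generic count $R^2$ for any vector of predicted probabilities with a 0.5 cut value, plus the figure implied by the reported counts. The function names and the four-observation toy example are illustrative assumptions.

```python
import numpy as np

def classify(p_hat, cut=0.5):
    """Predicted group: 1 if the predicted probability exceeds the cut value."""
    return (np.asarray(p_hat) > cut).astype(int)

def count_r2(y, p_hat, cut=0.5):
    """Share of observations whose predicted group equals the observed one."""
    return float(np.mean(classify(p_hat, cut) == np.asarray(y)))

# Toy example with four hypothetical observations.
toy = count_r2([0, 1, 1, 0], [0.2, 0.7, 0.4, 0.1])

# Count R^2 implied by the counts reported for the Ray Fair logit model.
correct = 435 + 25
N = 601
reported = correct / N
print(toy, round(reported, 3))
```

Note that always predicting 0 would already classify 451/601 of the cases correctly, so the count $R^2$ should be read against that baseline.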
Logit – Ray Fair Model: Predicted Probabilities

$$P(y_i = 1) = \frac{e^{Z_i}}{1 + e^{Z_i}} \quad \text{or} \quad P_i = \frac{1}{1 + e^{-Z_i}}$$

Variable          B
Sex               0.280
Age               -0.044
Married years     0.095
Have children     0.398
Religiosity       -0.325
Education         0.021
Occupation        0.031
Marriage rating   -0.468
Constant          1.377

Sex  Age  Years married  Children  Religiosity  Education  Occupation  Marriage rating  Zi      Pi
1    57   15             1         1            20         7           1                0.799   0.690
0    57   15             1         1            20         7           1                0.518   0.627
1    57   15             1         1            20         7           5                -1.075  0.254
0    57   15             1         1            20         7           5                -1.356  0.205
1    27   4              0         5            9          1           5                -2.904  0.052
0    27   4              0         5            9          1           5                -3.184  0.040
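The table can be approximately reproduced by computing $Z_i = x_i\beta$ from the rounded coefficients and passing it through the logistic function. In this sketch the variable order and helper names are illustrative; because the B values are rounded to three decimals, the results agree with the slide only to about two decimals.

```python
import math

# Rounded logit coefficients, in the column order of the profile table:
# sex, age, years married, children, religiosity, education, occupation, rating.
B = [0.280, -0.044, 0.095, 0.398, -0.325, 0.021, 0.031, -0.468]
CONSTANT = 1.377

def logit_index(x):
    """Z_i = constant + sum of B_k * x_ik."""
    return CONSTANT + sum(b * v for b, v in zip(B, x))

def logit_probability(z):
    """P_i = 1 / (1 + e^{-Z_i})."""
    return 1.0 / (1.0 + math.exp(-z))

profiles = [
    (1, 57, 15, 1, 1, 20, 7, 1),
    (0, 57, 15, 1, 1, 20, 7, 1),
    (1, 57, 15, 1, 1, 20, 7, 5),
    (0, 57, 15, 1, 1, 20, 7, 5),
    (1, 27, 4, 0, 5, 9, 1, 5),
    (0, 27, 4, 0, 5, 9, 1, 5),
]
slide_z = [0.799, 0.518, -1.075, -1.356, -2.904, -3.184]
slide_p = [0.690, 0.627, 0.254, 0.205, 0.052, 0.040]

computed = [(logit_index(x), logit_probability(logit_index(x))) for x in profiles]
for (z, p), zs, ps in zip(computed, slide_z, slide_p):
    print(f"Z={z:7.3f} (slide {zs:7.3f})  P={p:.3f} (slide {ps:.3f})")
```

Unlike the LPM, the last two profiles now receive small positive probabilities rather than negative ones.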
PROBIT MODEL

Another commonly used distribution is the normal, which gives the probit model.
Here the sigmoid function is the cumulative standardized normal distribution:
$$f(Z) = \phi(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^2}$$
$$F(Z) = \Phi(Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^2}\, dt, \qquad -\infty < Z < \infty$$
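The normal integral has no closed form, but it can be evaluated through the error function: $\Phi(Z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(Z/\sqrt{2})\right]$. A minimal sketch using only the Python standard library:

```python
import math

def std_normal_pdf(z):
    """phi(Z) = exp(-Z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Phi(Z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-3, -1.96, 0, 1.96, 3):
    print(z, round(std_normal_pdf(z), 4), round(std_normal_cdf(z), 4))
```

In a probit model the predicted probability for observation i is `std_normal_cdf` evaluated at $Z_i = x_i\beta$.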
COMPARISON OF LOGIT AND PROBIT

How do logit and probit models compare?
• Results are quite similar, although the logistic distribution has slightly fatter tails.
• The variance of the probit is 1 (standard normal distribution). For the logit it is $\pi^2/3$.
• Amemiya (1981): the approximate relationship between probit and logit coefficients:
$\beta_{probit} \approx 0.625\,\beta_{logit}$ and $\beta_{logit} \approx 1.6\,\beta_{probit}$
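One way to see why the 1.6 factor works is to compare the two distribution functions after rescaling: if $\beta_{logit} \approx 1.6\,\beta_{probit}$, then $\Lambda(1.6\,z)$ should track $\Phi(z)$ closely. The sketch below checks the largest gap on a grid; the grid range and spacing are arbitrary choices.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare Phi(z) with the logistic cdf evaluated at 1.6 * z.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(std_normal_cdf(z) - logistic_cdf(1.6 * z)) for z in grid)

logistic_variance = math.pi ** 2 / 3      # variance of the standard logistic
print("largest gap:", round(max_gap, 4))
print("logistic variance:", round(logistic_variance, 4),
      "s.d.:", round(math.sqrt(logistic_variance), 4))
```

The remaining gap sits in the tails, where the logistic distribution places slightly more probability than the normal.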
ESTIMATION OF BINARY PROBIT AND LOGIT MODELS
The logit and probit are non-linear. The parameters enter the regression model in a non-linear fashion.
We can no longer use OLS.
Hence the method of Maximum Likelihood.
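As a rough sketch of what maximum likelihood involves for the logit, the code below maximizes the log-likelihood $\sum_i [y_i \ln P_i + (1 - y_i)\ln(1 - P_i)]$ by Newton-Raphson on simulated data. The data, the true coefficients and the function name `fit_logit` are assumptions for illustration; statistical packages use refinements of the same idea.

```python
import numpy as np

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Logit MLE by Newton-Raphson: beta <- beta - H^{-1} g."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X  # Hessian (negative definite)
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data generated from a known logit model.
rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logit(X, y)
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
score_at_optimum = X.T @ (y - p_hat)
print("estimates:", beta_hat)
```

At the solution the score vector is (numerically) zero, which is the first-order condition that replaces the OLS normal equations.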