Forecasting Choices
Types of Variable
• Quantitative
– Continuous
– Discrete (counting)
• Qualitative
– Ordinal
– Nominal
Nominal or Ordinal Dependent Variable
• Indicates the “choices” of a decision maker (d.m.), say a consumer.
• Response categories:
– Mutually exclusive
– Collectively exhaustive
– Finite number
• Desired regression outputs:
– Probability that the d.m. chooses each category
– Coefficient of each independent variable
Generalized Linear Models (GLM)
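• Regression model for a continuous Y:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e, \quad e \text{ following } N(0, \sigma)$$
• GLM formulation:
1. Model for Y:
$$Y \text{ is } N(\mu, \sigma)$$
2. Link function (model for the predictors):
$$\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$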
Estimation of Parameters of GLM
• Maximum Likelihood Estimation (MLE)
– For normal Y, MLE is the least squares (LS) estimation
• Maximize:
– The sum of the log-likelihoods $L_i$ of the individual observations
MLE for Regression Model
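• Y is $N(\mu, \sigma)$:
$$f(Y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_i - \mu_i)^2}{2\sigma^2}\right)$$
$$\ln f(Y_i) = L_i = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}(Y_i - \mu_i)^2$$
The coefficients are involved through $\mu_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$.
• MLE: Maximize
$$L = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n}\left[-\frac{1}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}(Y_i - \mu_i)^2\right]$$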
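The claim that MLE for normal Y coincides with least squares can be checked numerically. The sketch below is only an illustration: the data, the two candidate coefficient sets, and the function names are hypothetical, and $\sigma$ is held fixed at 1. For fixed $\sigma$, the coefficient set with the smaller sum of squared errors has the larger log-likelihood.

```python
import math

def normal_log_likelihood(y, x1, x2, beta, sigma):
    """Sum of L_i for Y_i ~ N(mu_i, sigma) with mu_i = b0 + b1*X1i + b2*X2i."""
    b0, b1, b2 = beta
    total = 0.0
    for yi, x1i, x2i in zip(y, x1, x2):
        mu_i = b0 + b1 * x1i + b2 * x2i
        total += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                  - (yi - mu_i) ** 2 / (2 * sigma ** 2))
    return total

def sum_of_squared_errors(y, x1, x2, beta):
    """Least squares criterion for the same linear predictor."""
    b0, b1, b2 = beta
    return sum((yi - (b0 + b1 * a + b2 * b)) ** 2
               for yi, a, b in zip(y, x1, x2))

# Hypothetical data, for illustration only.
Y = [1.0, 2.0, 3.0, 5.0]
X1 = [0.0, 1.0, 2.0, 3.0]
X2 = [1.0, 0.0, 1.0, 0.0]
SIGMA = 1.0  # assumed known and fixed

for beta in [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]:
    sse = sum_of_squared_errors(Y, X1, X2, beta)
    log_lik = normal_log_likelihood(Y, X1, X2, beta, SIGMA)
    print(beta, "SSE =", sse, "L =", round(log_lik, 4))
```

Because $L$ equals a constant minus SSE divided by $2\sigma^2$, ranking coefficient sets by $L$ or by SSE gives the same order.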
GLM for Binary Dependent Variable, Y
• Model for response: Y is $B(n, \pi)$
• Model for predictors (link function): $\operatorname{logit}(\pi) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K = g$
• Probability: $\pi = \exp(g) / (1 + \exp(g))$
X : Covariates
• Independent variables are often referred to as “covariates.”
• Examples:
– SPSS binary logistic regression routine
– SPSS multinomial logistic regression routine
A. Logistic Regression For Ungrouped Data (ni=1)
• Model of observation for the i-th observation:
– $Y_i = 1$: Choose category 1 with probability $\pi_i$
– $Y_i = 0$: Choose category 2 with probability $1 - \pi_i$
• Log-likelihood function for the i-th observation:
$$p(Y_i) = \pi_i^{Y_i}(1 - \pi_i)^{1 - Y_i}, \quad Y_i = 1 \text{ or } 0$$
$$\ln p(Y_i) = L_i = Y_i \ln \pi_i + (1 - Y_i)\ln(1 - \pi_i)$$
The coefficients are involved through $\pi_i$.
MLE
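• Maximize:
$$L = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n}\left[Y_i \ln \pi_i + (1 - Y_i)\ln(1 - \pi_i)\right]$$
where
$$\ln\frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} = g_i, \qquad \pi_i = \frac{\exp(g_i)}{1 + \exp(g_i)}$$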
Setting Up a Worksheet for MLE
• Define an array for storing the parameters of the link function. Enter an initial estimate for each parameter. Then, for each observation, compute in turn:
– The link function, $g_i$
– The parameter of the likelihood, $\pi_i$
– The log-likelihood, $L_i$ = ln(likelihood)
• Sum the log-likelihoods $L_i$ and invoke the solver to maximize the sum by changing the parameters.
• Multiply the maximized value by −2 for the test of significance of the regression.
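The worksheet procedure can be mirrored in a few lines of code. The sketch below is an assumption-laden stand-in, not the spreadsheet itself: it uses one covariate, a hypothetical data set, and Newton's method in place of the spreadsheet solver. The names `fit_logistic` and `log_likelihood` are made up for this example.

```python
import math

def log_likelihood(b0, b1, x, y):
    """Sum of L_i = Y_i ln(pi_i) + (1 - Y_i) ln(1 - pi_i)."""
    total = 0.0
    for xi, yi in zip(x, y):
        g = b0 + b1 * xi                      # link function g_i
        pi = math.exp(g) / (1 + math.exp(g))  # probability pi_i
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood with Newton's method (stand-in for the solver)."""
    b0, b1 = 0.0, 0.0  # initial estimates of the parameters
    for _ in range(max_iter):
        s0 = s1 = 0.0            # first derivatives of L
        h00 = h01 = h11 = 0.0    # negative second derivatives of L
        for xi, yi in zip(x, y):
            g = b0 + b1 * xi
            pi = math.exp(g) / (1 + math.exp(g))
            w = pi * (1 - pi)
            s0 += yi - pi
            s1 += xi * (yi - pi)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * s0 - h01 * s1) / det
        d1 = (h00 * s1 - h01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical ungrouped data (n_i = 1), for illustration only.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logistic(X, Y)
max_log_lik = log_likelihood(b0_hat, b1_hat, X, Y)
print("b0 =", b0_hat, "b1 =", b1_hat)
print("-2 * maximized log-likelihood =", -2 * max_log_lik)
```

Any general-purpose optimizer could replace the Newton loop; it is used here only because it needs no library beyond `math`.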
Test of Significance
• Hypotheses:
$H_0$: $\beta_1 = \beta_2 = \dots = \beta_K = 0$
$H_1$: At least one $\beta_j \neq 0$
• Test statistic:
$$G = -2L(H_0) - \left[-2L(H_1)\right]$$
• Distribution under $H_0$: $\chi^2$ (DF = K)
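As a small worked example of the test statistic, the sketch below computes $L(H_0)$ for an intercept-only model, where the fitted probability is the sample proportion of ones, and combines it with a maximized log-likelihood under $H_1$. The data and the $H_1$ value of −4.0 are hypothetical placeholders, and the function names are made up for this example; 3.841 is the 5% critical value of $\chi^2$ with 1 degree of freedom.

```python
import math

def null_log_likelihood(y):
    """L(H0): intercept-only model, so pi_i equals the sample proportion of ones."""
    n = len(y)
    p = sum(y) / n
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi in y)

def g_statistic(log_lik_h0, log_lik_h1):
    """G = -2 L(H0) - [-2 L(H1)]."""
    return -2 * log_lik_h0 - (-2 * log_lik_h1)

# Hypothetical binary responses and a hypothetical maximized L(H1) for K = 1.
Y = [0, 0, 1, 0, 1, 0, 1, 1]
LOG_LIK_H1 = -4.0

CHI2_CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, DF = 1

G = g_statistic(null_log_likelihood(Y), LOG_LIK_H1)
print("G =", round(G, 4))
print("Reject H0 at the 5% level:", G > CHI2_CRITICAL_5PCT_DF1)
```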
Standard Errors of Logistic Regression Coefficients (optional)
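• Estimate of the information matrix, I (K = 2):
$$I = \sum_{i=1}^{n}\begin{bmatrix} n_i p_i(1-p_i) & n_i X_{1i} p_i(1-p_i) & n_i X_{2i} p_i(1-p_i) \\ n_i X_{1i} p_i(1-p_i) & n_i X_{1i}^2 p_i(1-p_i) & n_i X_{1i} X_{2i} p_i(1-p_i) \\ n_i X_{2i} p_i(1-p_i) & n_i X_{1i} X_{2i} p_i(1-p_i) & n_i X_{2i}^2 p_i(1-p_i) \end{bmatrix} = X^T \operatorname{Diag}\left[n_i p_i(1-p_i)\right] X$$
• Estimated covariance matrix of the coefficients b: $I^{-1}$
• $s_{b_k}$ = square root of the k-th diagonal element of $I^{-1}$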
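To make the information-matrix recipe concrete, here is a reduced sketch with a single covariate (K = 1 rather than the K = 2 case on the slide), so that the matrix is 2×2 and can be inverted by hand. The data, the fitted probabilities, and the function name `logistic_standard_errors` are hypothetical.

```python
import math

def logistic_standard_errors(x, p, n):
    """Standard errors of (b0, b1) from I = sum of n_i p_i (1 - p_i) [[1, x_i], [x_i, x_i^2]]."""
    i00 = i01 = i11 = 0.0
    for xi, pi, ni in zip(x, p, n):
        w = ni * pi * (1 - pi)
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    det = i00 * i11 - i01 * i01
    # Diagonal elements of the inverse of a 2x2 matrix.
    var_b0 = i11 / det
    var_b1 = i00 / det
    return math.sqrt(var_b0), math.sqrt(var_b1)

# Hypothetical inputs: eight ungrouped observations (n_i = 1) with
# fitted probabilities of 0.5, chosen only to keep the arithmetic easy.
X = [1, 2, 3, 4, 5, 6, 7, 8]
P = [0.5] * 8
N = [1] * 8

se_b0, se_b1 = logistic_standard_errors(X, P, N)
print("s_b0 =", round(se_b0, 4), "s_b1 =", round(se_b1, 4))
```

With these inputs the information matrix is [[2, 9], [9, 51]], whose determinant is 21, so the standard errors are the square roots of 51/21 and 2/21.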
Deviance Residuals and Deviance for Logistic Regression (Optional)
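• Deviance residual:
$$dev_i = \pm\sqrt{-2\left[Y_i \ln \hat{\pi}_i + (1 - Y_i)\ln(1 - \hat{\pi}_i)\right]}$$
with $dev_i > 0$ if $Y_i = 1$ and $dev_i < 0$ if $Y_i = 0$.
• Deviance (corresponds to SSE):
$$DEV = \sum_{i=1}^{n} dev_i^2 = -2L$$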
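The relationship between the deviance residuals and −2L can be verified directly. The sketch below assumes ungrouped data and a set of hypothetical fitted probabilities; the function names are made up for this example.

```python
import math

def deviance_residual(y, pi_hat):
    """Signed square root of -2 L_i: positive if Y_i = 1, negative if Y_i = 0."""
    log_lik_i = y * math.log(pi_hat) + (1 - y) * math.log(1 - pi_hat)
    magnitude = math.sqrt(-2 * log_lik_i)
    return magnitude if y == 1 else -magnitude

def deviance(ys, pi_hats):
    """DEV = sum of squared deviance residuals."""
    return sum(deviance_residual(y, p) ** 2 for y, p in zip(ys, pi_hats))

# Hypothetical responses and fitted probabilities, for illustration only.
Y = [1, 0, 1, 0]
PI_HAT = [0.8, 0.3, 0.6, 0.5]

log_lik = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for y, p in zip(Y, PI_HAT))
print("residuals:", [round(deviance_residual(y, p), 4) for y, p in zip(Y, PI_HAT)])
print("DEV =", round(deviance(Y, PI_HAT), 4), " -2L =", round(-2 * log_lik, 4))
```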
B. Logistic Regression for Grouped Data Using WLS
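• The observation for the i-th group:
$$R_i \text{ is } B(n_i, \pi_i)$$
->
$$p_i = \frac{R_i}{n_i} \text{ is approx. } N\left(\pi_i, \frac{\pi_i(1 - \pi_i)}{n_i}\right)$$
->
$$\ln\frac{p_i}{1 - p_i} \text{ is approx. } N\left(\ln\frac{\pi_i}{1 - \pi_i}, \frac{1}{n_i \pi_i (1 - \pi_i)}\right)$$
->
$$\ln\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + e_i, \qquad e_i \text{ is } N\left(0, \frac{1}{n_i \pi_i (1 - \pi_i)}\right)$$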
WLS for Logistic Regression
• Regress $\ln\frac{p_i}{1 - p_i}$ on $X_{1i}, \dots, X_{Ki}$ with weights $w_i = n_i p_i (1 - p_i)$
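A minimal sketch of this weighted regression for one covariate is shown below. The grouped counts are hypothetical, the function name `wls_logit` is made up for this example, and every group is assumed to have $0 < R_i < n_i$ so that the log-odds is finite.

```python
import math

def wls_logit(x, r, n):
    """Weighted least squares of ln(p/(1-p)) on x with weights n p (1-p)."""
    p = [ri / ni for ri, ni in zip(r, n)]
    z = [math.log(pi / (1 - pi)) for pi in p]          # observed log-odds
    w = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]   # weight = 1 / variance
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxz = sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxz / sxx
    b0 = z_bar - b1 * x_bar
    return b0, b1

# Hypothetical grouped data: R_i successes out of n_i trials at covariate X_i.
X = [1.0, 2.0, 3.0, 4.0]
R = [4, 8, 12, 16]
N = [20, 20, 20, 20]

b0_hat, b1_hat = wls_logit(X, R, N)
print("b0 =", round(b0_hat, 4), "b1 =", round(b1_hat, 4))
```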
WLS for Unequal Variance Data
[Figure: scatter plot of Y against X; observations 1 and 2 are marked with their variances $\sigma_1^2$ and $\sigma_2^2$.]
Observation 2 is subject to a larger variance than observation 1, so it makes sense to give it a lower weight. In WLS, the weight is proportional to 1/variance.
Modeling of Forecasting Choices - GLM
1. Model for Observation of the Dependent Variable
– A probability distribution
2. Link Function (Model for Independent Variables)
– A mathematical function
Forecasting Choices
# of Choices
• 2: Binomial Distr.
• > 2: Multinomial Distr.
– Unordered
– Ordered
Multinomial Logit Regression
• Multinomial Choice (m=3), Ungrouped Data:
– $Y_1 = 1$: Choose category 1 with probability $\pi_1$
– $Y_1 = 0$: Choose category 2 or 3 with probability $1 - \pi_1$
– $Y_2 = 1$: Choose category 2 with probability $\pi_2$
– $Y_2 = 0$: Choose category 1 or 3 with probability $1 - \pi_2$
– $Y_3 = 1$: Choose category 3 with probability $\pi_3$
– $Y_3 = 0$: Choose category 1 or 2 with probability $1 - \pi_3$
$$P(Y_1, Y_2, Y_3) = \pi_1^{Y_1}\pi_2^{Y_2}\pi_3^{Y_3}$$
with $Y_1 + Y_2 + Y_3 = 1$ and $\pi_1 + \pi_2 + \pi_3 = 1$
Log Likelihood Function
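• Log-likelihood function of the i-th ungrouped observation:
$$p(Y_{1i}, Y_{2i}, Y_{3i}) = \pi_{1i}^{Y_{1i}}\pi_{2i}^{Y_{2i}}\pi_{3i}^{Y_{3i}}$$
$$\ln p(Y_{1i}, Y_{2i}, Y_{3i}) = L_i = Y_{1i}\ln\pi_{1i} + Y_{2i}\ln\pi_{2i} + Y_{3i}\ln\pi_{3i}$$
The coefficients are involved through $\pi_{1i}$, $\pi_{2i}$, and $\pi_{3i}$.
• MLE: Maximize
$$L = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n}\left(Y_{1i}\ln\pi_{1i} + Y_{2i}\ln\pi_{2i} + Y_{3i}\ln\pi_{3i}\right)$$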
$Y_3$ and $\pi_3$ can be omitted
• Multinomial Choice (m=3), Ungrouped Data:
– $Y_1 = 1$: Choose category 1 with probability $\pi_1$
– $Y_1 = 0$: Choose category 2 or 3 with probability $1 - \pi_1$
– $Y_2 = 1$: Choose category 2 with probability $\pi_2$
– $Y_2 = 0$: Choose category 1 or 3 with probability $1 - \pi_2$
$$P(Y_1, Y_2) = \pi_1^{Y_1}\pi_2^{Y_2}(1 - \pi_1 - \pi_2)^{1 - Y_1 - Y_2}$$
Log Likelihood Function
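• Log-likelihood function of the i-th (ungrouped) observation:
$$p(Y_{1i}, Y_{2i}) = \pi_{1i}^{Y_{1i}}\pi_{2i}^{Y_{2i}}(1 - \pi_{1i} - \pi_{2i})^{1 - Y_{1i} - Y_{2i}}$$
$$\ln p(Y_{1i}, Y_{2i}) = L_i = Y_{1i}\ln\pi_{1i} + Y_{2i}\ln\pi_{2i} + (1 - Y_{1i} - Y_{2i})\ln(1 - \pi_{1i} - \pi_{2i})$$
The coefficients are involved through $\pi_{1i}$ and $\pi_{2i}$.
• MLE: Maximize
$$L = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n}\left[Y_{1i}\ln\pi_{1i} + Y_{2i}\ln\pi_{2i} + (1 - Y_{1i} - Y_{2i})\ln(1 - \pi_{1i} - \pi_{2i})\right]$$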
1: Formulating “Link” Functions: Unordered Choice Categories
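• Category 3 as the baseline category:
$$\ln\frac{\pi_{1i}}{\pi_{3i}} = \beta_{01} + \beta_{11} X_{1i} + \beta_{21} X_{2i} + \dots + \beta_{K1} X_{Ki} = g_{1i}$$
$$\ln\frac{\pi_{2i}}{\pi_{3i}} = \beta_{02} + \beta_{12} X_{1i} + \beta_{22} X_{2i} + \dots + \beta_{K2} X_{Ki} = g_{2i}$$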
From Link Functions to Probabilities
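$$\ln\frac{\pi_{1i}}{\pi_{3i}} = g_{1i} \;\Rightarrow\; \pi_{1i} = \pi_{3i}\exp(g_{1i})$$
$$\ln\frac{\pi_{2i}}{\pi_{3i}} = g_{2i} \;\Rightarrow\; \pi_{2i} = \pi_{3i}\exp(g_{2i})$$
$$\pi_{3i}\left[\exp(g_{1i}) + \exp(g_{2i}) + 1\right] = 1 \;\Rightarrow\; \pi_{3i} = \frac{1}{1 + \exp(g_{1i}) + \exp(g_{2i})}$$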
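The step from the two link functions to the three probabilities can be sketched as follows, with category 3 as the baseline. The function name and the link-function values used in the example are hypothetical.

```python
import math

def multinomial_probabilities(g1, g2):
    """Return (pi1, pi2, pi3) from the link functions g1 = ln(pi1/pi3), g2 = ln(pi2/pi3)."""
    pi3 = 1 / (1 + math.exp(g1) + math.exp(g2))  # baseline category
    pi1 = pi3 * math.exp(g1)
    pi2 = pi3 * math.exp(g2)
    return pi1, pi2, pi3

# Hypothetical link-function values for one decision maker.
pi1, pi2, pi3 = multinomial_probabilities(0.7, -0.2)
print("pi1 =", round(pi1, 4), "pi2 =", round(pi2, 4), "pi3 =", round(pi3, 4))
print("sum =", pi1 + pi2 + pi3)
```

When both link functions are zero, the three categories are equally likely, which gives a quick sanity check on any implementation.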
Test of Significance
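• Hypotheses:
$H_0$: $\beta_{11} = \beta_{21} = \dots = \beta_{K1} = \beta_{12} = \beta_{22} = \dots = \beta_{K2} = 0$
$H_1$: At least one $\beta_{jk} \neq 0$
• Test statistic:
$$G = -2L(H_0) - \left[-2L(H_1)\right]$$
• Distribution under $H_0$: $\chi^2$ (DF = 2K)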
Interpreting Coefficients
• Not easy, as a change of probability for one category affects probabilities for other (two) categories.
[Figure: line chart with two series labeled 1 and 2; vertical axis from 0 to 0.45, horizontal axis from 0 to 70.]
2: Formulating Link Functions: Ordered Choice Categories
Underlying Variable Defining Categories
[Figure: the range of the underlying variable split into Category 1, Category 2, and Category 3.]
Choices for Probability Distribution of U
a. Ordered Probit Model for the i-th DM
$U_i$ follows $N(\mu_i, \sigma = 1)$
b. Ordered Logit Model for the i-th DM
$U_i$ follows Logistic Distribution($\mu_i$)
$\mu_i = \beta_1 X_{1i} + \beta_2 X_{2i}$ (no constant)
a. Ordered Probit Model
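With cutoffs $\tau_1 < \tau_2$ on $U$ separating the three categories:
$$\pi_{1i} = \Pr(U_i \le \tau_1) = \Pr(Z \le \tau_1 - \mu_i) = \text{NORMSDIST}(\tau_1 - \mu_i)$$
$$\pi_{2i} = \Pr(\tau_1 < U_i \le \tau_2) = \Pr(U_i \le \tau_2) - \Pr(U_i \le \tau_1)$$
$$\pi_{3i} = \Pr(U_i > \tau_2) = 1 - \Pr(U_i \le \tau_2)$$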
b. Ordered Logit Model
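With cutoffs $\tau_1 < \tau_2$ on $U$ separating the three categories:
$$\pi_{1i} = \Pr(U_i \le \tau_1) = \frac{\exp(\tau_1 - \mu_i)}{1 + \exp(\tau_1 - \mu_i)}$$
$$\pi_{2i} = \Pr(\tau_1 < U_i \le \tau_2) = \Pr(U_i \le \tau_2) - \Pr(U_i \le \tau_1)$$
$$\pi_{3i} = \Pr(U_i > \tau_2) = 1 - \Pr(U_i \le \tau_2)$$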
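The ordered probit and ordered logit models differ only in the cumulative distribution used for $U_i$, which the sketch below makes explicit. The cutoff names `tau1` and `tau2`, the numeric values, and the function names are assumptions for illustration; `normal_cdf` plays the role of the spreadsheet function NORMSDIST.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (the role NORMSDIST plays in a spreadsheet)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logistic_cdf(z):
    """Standard logistic CDF, exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1 + math.exp(z))

def ordered_probabilities(mu, tau1, tau2, cdf):
    """Return (pi1, pi2, pi3) for cutoffs tau1 < tau2 and location mu."""
    p_le_tau1 = cdf(tau1 - mu)   # Pr(U <= tau1)
    p_le_tau2 = cdf(tau2 - mu)   # Pr(U <= tau2)
    return p_le_tau1, p_le_tau2 - p_le_tau1, 1 - p_le_tau2

# Hypothetical location and cutoffs for one decision maker.
MU, TAU1, TAU2 = 0.0, -1.0, 1.0

print("probit:", ordered_probabilities(MU, TAU1, TAU2, normal_cdf))
print("logit: ", ordered_probabilities(MU, TAU1, TAU2, logistic_cdf))
```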
Poisson Regression for Counting
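• Model of observations for Y:
$$P(Y_i) = \frac{\exp(-\lambda_i)\lambda_i^{Y_i}}{Y_i!} \quad \text{for } Y_i = 0, 1, \dots$$
• Link function:
$$\ln\lambda_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki}$$
• Log-likelihood function:
$$L_i = \ln\frac{\exp(-\lambda_i)\lambda_i^{Y_i}}{Y_i!} = -\lambda_i + Y_i\ln\lambda_i - \ln Y_i!$$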
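The Poisson log-likelihood can be evaluated in the same way as the earlier ones. The sketch below assumes one covariate, writes the Poisson mean as `lam`, and uses hypothetical counts; `math.lgamma(y + 1)` gives ln(y!). Maximizing the returned sum over the coefficients would give the MLE.

```python
import math

def poisson_log_likelihood(y, x, b0, b1):
    """Sum of L_i = -lam_i + Y_i ln(lam_i) - ln(Y_i!) with ln(lam_i) = b0 + b1 * X_i."""
    total = 0.0
    for yi, xi in zip(y, x):
        log_lam = b0 + b1 * xi          # link function
        lam = math.exp(log_lam)         # Poisson mean
        total += -lam + yi * log_lam - math.lgamma(yi + 1)
    return total

# Hypothetical counts and covariate values, for illustration only.
Y = [0, 1, 1, 3, 4]
X = [0.0, 0.5, 1.0, 1.5, 2.0]

for b0, b1 in [(0.0, 0.0), (-0.5, 1.0)]:
    print((b0, b1), "L =", round(poisson_log_likelihood(Y, X, b0, b1), 4))
```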