References:
Alan Agresti, Categorical Data Analysis, Wiley Interscience, New Jersey, 2002.
Subhash Sharma, Applied Multivariate Techniques, John Wiley & Sons, 1996.
Logistic Regression
Siana Halim, Indriati N Bisono
Introduction
Interpreting Parameters in Logistic Regression
Inferences for Logistic Regression
Logit Models with Categorical Predictors
Multiple Logistic Regression
Recently, logistic regression has become a popular tool in business applications. Some credit-scoring applications use logistic regression to model the probability that a subject is creditworthy.
A company that relies on catalog sales may determine whether to send a catalog to a potential customer by modeling the probability of a sale as a function of indices of past buying behavior.
Logistic Regression : Model
The logistic regression model is a generalized linear model with the following components.
Random Component : The response variable is binary, Y_i = 1 or 0 (an event occurs or it doesn't). We are interested in the probability that Y_i = 1, i.e., π(x_i). The distribution of Y_i is binomial.
Systematic Component : A linear predictor such as
$$\alpha + \beta_1 x_{1i} + \dots + \beta_j x_{ji}$$
The explanatory or predictor variables may be quantitative (continuous), qualitative (discrete), or both (mixed).
Link Function : The log of the odds that an event occurs, known as the "logit":
$$\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right)$$
Putting it all together, the logistic regression model is
$$\operatorname{logit}\big(\pi(x_i)\big) = \log\left(\frac{\pi(x_i)}{1 - \pi(x_i)}\right) = \alpha + \beta_1 x_{1i} + \dots + \beta_j x_{ji}$$
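To make the three components concrete, here is a minimal Python sketch of the logit link and its inverse. The function names `logit` and `inv_logit` and the values of alpha, beta and x are illustrative assumptions, not part of the slides.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Made-up parameter values, for illustration only.
alpha, beta = -1.0, 0.5
x = 3.0

eta = alpha + beta * x      # systematic component (linear predictor)
pi_x = inv_logit(eta)       # implied P(Y = 1 | x)

print(eta, pi_x, logit(pi_x))
```

Applying `logit` to `pi_x` recovers the linear predictor, which is what the link function expresses.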
Illustration : Data for Most and Least Successful Financial Institutions
Most Successful            Least Successful
Success  Size  FP          Success  Size  FP
1        1     0.58        2        1     2.28
1        1     2.80        2        0     1.06
1        1     2.77        2        0     1.08
1        1     3.50        2        0     0.07
1        1     2.67        2        0     0.16
1        1     2.97        2        0     0.70
1        1     2.18        2        0     0.75
1        1     3.24        2        0     1.61
1        1     1.49        2        0     0.34
1        1     2.19        2        0     1.15
1        0     2.70        2        0     0.44
1        0     2.57        2        0     0.86
Illustration : Contingency table for Type and Size of Financial Institution
Type of Financial Institution (FI)    Size             Total
                                      Large   Small
Most Successful (MS)                  10      2        12
Least Successful (LS)                 1       11       12
Total                                 11      13       24
Probability that any FI will be MS: P(MS) = 12/24 = 0.5
Probability that an FI is MS given that it is large (L): P(MS | L) = 10/11 = 0.909
Probability that an FI is MS given that it is small (S): P(MS | S) = 2/13 = 0.154
Odds of an FI being MS: Odds(MS) = 12/12 = 1
Odds of an FI being MS given that it is large: Odds(MS | L) = 10/1 = 10 (1)
Odds of an FI being MS given that it is small: Odds(MS | S) = 2/11 = 0.182 (2)
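The probabilities and odds above can be recomputed directly from the contingency table. The sketch below is one way to do so; the dictionary layout and the helper name `prob_and_odds` are assumptions made for illustration. Note that 2/11 rounds to 0.182, the value used later when taking logs.

```python
# Counts from the contingency table: (most successful, least successful)
counts = {"large": (10, 1), "small": (2, 11)}

def prob_and_odds(ms, ls):
    """Return P(MS) and odds(MS) for one group of institutions."""
    return ms / (ms + ls), ms / ls

p_large, odds_large = prob_and_odds(*counts["large"])
p_small, odds_small = prob_and_odds(*counts["small"])

# Marginal figures: add the two size groups together.
total_ms = sum(ms for ms, _ in counts.values())
total_ls = sum(ls for _, ls in counts.values())
p_all, odds_all = prob_and_odds(total_ms, total_ls)

print(round(p_large, 3), round(odds_large, 3))
print(round(p_small, 3), round(odds_small, 3))
print(round(p_all, 3), round(odds_all, 3))
```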
Odds and Probability
Odds and probabilities provide the same information, but in different forms. It is easy to convert odds into probabilities and vice versa. For example
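$$\mathrm{Odds}(MS \mid L) = \frac{P(MS \mid L)}{1 - P(MS \mid L)} = \frac{0.909}{1 - 0.909} = 10$$
$$P(MS \mid L) = \frac{\mathrm{Odds}(MS \mid L)}{1 + \mathrm{Odds}(MS \mid L)} = \frac{10}{1 + 10} = 0.909$$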
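The two conversions are inverses of each other, which a short round trip illustrates. The helper names `odds_from_prob` and `prob_from_odds` are hypothetical, chosen for this sketch.

```python
def odds_from_prob(p):
    """Odds = p / (1 - p), for 0 <= p < 1."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """Probability = odds / (1 + odds), for odds >= 0."""
    return odds / (1.0 + odds)

p_ms_large = 10 / 11                         # P(MS | L) from the table
odds_ms_large = odds_from_prob(p_ms_large)   # close to 10
round_trip = prob_from_odds(odds_ms_large)   # back to P(MS | L)

print(odds_ms_large, round_trip)
```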
Illustration : The Logistic Regression Model
Taking the natural log of the odds given by equations (1) and (2), we get
ln [odds (MS | L)] = ln (10) = 2.303
ln [odds (MS | S)] = ln (0.182) = -1.704
These two equations can be combined into the following equation, which gives the log of the odds as a function of the size of the FI:
ln [odds (MS | SIZE)] = -1.704 + 4.007 x SIZE (3)
where SIZE = 1 if the FI is large and SIZE = 0 if the FI is small.
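The coefficients in equation (3) follow from the two log odds: the intercept is the log odds for small institutions, and the slope is the difference between the two log odds. A minimal sketch of that arithmetic (variable names are illustrative):

```python
import math

odds_small = 2 / 11    # odds(MS | S), equation (2)
odds_large = 10 / 1    # odds(MS | L), equation (1)

intercept = math.log(odds_small)            # log odds when SIZE = 0
slope = math.log(odds_large) - intercept    # change in log odds from small to large

def log_odds(size):
    """Equation (3): log odds of MS as a function of SIZE (0 or 1)."""
    return intercept + slope * size

# exp(slope) is the odds ratio of large versus small institutions.
odds_ratio = math.exp(slope)

print(round(intercept, 3), round(slope, 3), round(odds_ratio, 1))
```

The slope of about 4.007 corresponds to an odds ratio of 55: the odds of being most successful are 55 times higher for large institutions than for small ones in this sample.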
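In general, for k independent variables, (3) can be written as
$$\ln\left[\mathrm{odds}(MS \mid X_1, \dots, X_k)\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$
or
$$\ln\frac{p}{1 - p} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$
where
$$\mathrm{odds}(MS \mid X_1, \dots, X_k) = \frac{p}{1 - p}$$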
Interpreting β: Odds, Probabilities, and Linear Approximations
For a binary response variable Y and an explanatory variable X, let π(x) = P(Y=1|X=x) = 1 – P(Y=0|X=x). The logistic regression model is
$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)} \tag{4}$$
Equivalently, the log odds, called the logit, has the linear relationship
$$\operatorname{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x \tag{5}$$
How can we interpret β in (5)?
Its sign determines whether π(x) is increasing or decreasing as x increases.
The rate of climb or descent increases as |β| increases; as β → 0 the curve flattens to a horizontal straight line. When β = 0, Y is independent of X.
Since the logistic density is symmetric, π(x) approaches 1 at the same rate that it approaches 0.
The intercept parameter α is not usually of particular interest. However, by centering the predictor about 0 (that is, replacing x by x − x̄), α becomes the logit at the mean, and thus
$$\pi(\bar{x}) = \frac{e^{\alpha}}{1 + e^{\alpha}}$$
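These properties can be checked numerically. The sketch below evaluates the logistic curve for a positive, a negative and a zero slope, and then checks the symmetry about the point where π(x) = 0.5. All parameter values and the function name `pi_of_x` are made up for illustration.

```python
import math

def pi_of_x(x, alpha, beta):
    """Logistic curve: exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

alpha = 0.4                      # hypothetical intercept
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

rising = [pi_of_x(x, alpha, 1.5) for x in xs]    # beta > 0: increasing in x
falling = [pi_of_x(x, alpha, -1.5) for x in xs]  # beta < 0: decreasing in x
flat = [pi_of_x(x, alpha, 0.0) for x in xs]      # beta = 0: constant in x

# Symmetry: the curve crosses 0.5 at x = -alpha / beta, and the values at
# equal distances on either side of that point sum to 1.
beta = 1.5
midpoint = -alpha / beta
distance = 0.7
symmetry_gap = (pi_of_x(midpoint + distance, alpha, beta)
                + pi_of_x(midpoint - distance, alpha, beta) - 1.0)

# With a predictor centered at 0, the value at x = 0 is e^alpha / (1 + e^alpha).
at_mean = pi_of_x(0.0, alpha, beta)

print(rising, falling, flat, symmetry_gap, at_mean)
```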
Looking at the Data
Before fitting the model and making such interpretations, look at the data to check that the logistic regression model is appropriate.
Since Y takes only values 0 and 1, it is difficult to check this by plotting Y against x.
It can be helpful to plot sample proportions or logits against x.
Let n_i denote the number of observations at setting i of x. Of them, let y_i denote the number of "1" outcomes, with p_i = y_i/n_i.
Sample logit i is
$$\log\left[\frac{p_i}{1 - p_i}\right] = \log\left[\frac{y_i}{n_i - y_i}\right]$$
This is not finite when y_i = 0 or y_i = n_i. The adjusted sample logit
$$\log\frac{y_i + \tfrac{1}{2}}{n_i - y_i + \tfrac{1}{2}}$$
is finite in those cases.
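A small sketch of the plain and adjusted sample logits follows. The grouped counts are hypothetical and the function names are illustrative; the point is only that the adjusted version stays finite when y_i = 0 or y_i = n_i.

```python
import math

def sample_logit(y, n):
    """Plain sample logit log(y / (n - y)); infinite when y = 0 or y = n."""
    if y == 0:
        return -math.inf
    if y == n:
        return math.inf
    return math.log(y / (n - y))

def adjusted_logit(y, n):
    """Adjusted sample logit log((y + 1/2) / (n - y + 1/2)); always finite."""
    return math.log((y + 0.5) / (n - y + 0.5))

# Hypothetical grouped data: (n_i, y_i) at each setting of x.
groups = [(10, 0), (12, 3), (11, 6), (9, 9)]

plain = [sample_logit(y, n) for n, y in groups]
adjusted = [adjusted_logit(y, n) for n, y in groups]

for (n, y), a, b in zip(groups, plain, adjusted):
    print(n, y, a, round(b, 3))
```

Plotting the adjusted values against x is then a quick visual check of whether a linear logit is plausible.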
Types of Inference : Hypothesis Test (Optional)
For the model with a single predictor,
$$\operatorname{logit}[\pi(x)] = \alpha + \beta x$$
significance tests focus on H0 : β = 0, the hypothesis of independence.
Wald test
The Wald test uses the log likelihood at β̂, with test statistic
$$z = \frac{\hat{\beta}}{SE}$$
or its square. Under H0, z² is asymptotically $\chi^2_1$.
The Likelihood-ratio test
The likelihood-ratio test uses twice the difference between the maximized log likelihood at β̂ and at β = 0, and also has an asymptotic $\chi^2_1$ null distribution.
The Score test
The score test uses the log likelihood at β = 0 through the derivative of the log likelihood (i.e., the score function) at that point.
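As a hedged illustration, the Wald and likelihood-ratio statistics can be computed by hand for the SIZE example, because with a single binary predictor the fitted model reproduces the observed proportions. The standard error used below is the usual large-sample formula for a log odds ratio in a 2 x 2 table; it is an assumption brought in for this sketch and does not appear in the slides. The function names are likewise illustrative.

```python
import math

# (number MS, number of FIs) for each SIZE value, from the contingency table.
data = {0: (2, 13), 1: (10, 11)}

def log_lik(probs):
    """Binomial log likelihood (constants dropped) given P(MS) for each SIZE."""
    total = 0.0
    for size, (y, n) in data.items():
        p = probs[size]
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

# ML estimate of beta: the sample log odds ratio.
beta_hat = math.log(10 / 1) - math.log(2 / 11)
# Assumed large-sample SE of a log odds ratio: sqrt of summed reciprocal counts.
se = math.sqrt(1 / 10 + 1 / 1 + 1 / 2 + 1 / 11)

wald_z = beta_hat / se
wald_chi2 = wald_z ** 2

ll_full = log_lik({0: 2 / 13, 1: 10 / 11})    # maximized log likelihood at beta_hat
ll_null = log_lik({0: 12 / 24, 1: 12 / 24})   # maximized log likelihood under beta = 0
lr_chi2 = 2.0 * (ll_full - ll_null)

print(round(wald_chi2, 2), chi2_1_pvalue(wald_chi2))
print(round(lr_chi2, 2), chi2_1_pvalue(lr_chi2))
```

Both statistics reject independence of success and size here, although they differ in magnitude, which is not unusual with only 24 observations.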
Types of Inference : Confidence Interval (Optional)
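An interval for β results from inverting a test of H0 : β = β0. The interval is the set of β0 for which the chi-squared test statistic is no greater than $\chi^2_1(\alpha) = z^2_{\alpha/2}$. The Wald confidence interval is
$$\left[\frac{\hat{\beta} - \beta_0}{SE}\right]^2 \le z^2_{\alpha/2} \quad \text{or} \quad \hat{\beta} \pm z_{\alpha/2}(SE)$$
A 95% confidence interval for logit [π(x0)] is
$$(\hat{\alpha} + \hat{\beta} x_0) \pm 1.96\, SE$$
where SE is given by the estimated square root of
$$\operatorname{var}(\hat{\alpha} + \hat{\beta} x_0) = \operatorname{var}(\hat{\alpha}) + x_0^2 \operatorname{var}(\hat{\beta}) + 2 x_0 \operatorname{cov}(\hat{\alpha}, \hat{\beta})$$
Substituting each endpoint into the inverse transformation
$$\pi(x_0) = \frac{\exp(\text{logit})}{1 + \exp(\text{logit})}$$
gives a corresponding interval for π(x0).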
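The interval for π(x0) can be sketched as follows. The function `ci_for_probability` is a hypothetical helper. The variance and covariance inputs are assumptions: they come from the standard large-sample formulas for a 2 x 2 table applied to the SIZE example, and are not given in the slides.

```python
import math

def inv_logit(eta):
    """Inverse transformation exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def ci_for_probability(alpha_hat, beta_hat, var_a, var_b, cov_ab, x0, z=1.96):
    """Wald interval for logit[pi(x0)], mapped back to the probability scale."""
    estimate = alpha_hat + beta_hat * x0
    se = math.sqrt(var_a + x0 ** 2 * var_b + 2.0 * x0 * cov_ab)
    lower_logit = estimate - z * se
    upper_logit = estimate + z * se
    return inv_logit(lower_logit), inv_logit(estimate), inv_logit(upper_logit)

# Assumed inputs for the SIZE example (SIZE = 0 small, SIZE = 1 large).
alpha_hat = math.log(2 / 11)                  # log odds for small FIs
beta_hat = math.log(10) - alpha_hat           # log odds ratio, large vs small
var_a = 1 / 2 + 1 / 11                        # variance of the small-FI logit
var_b = 1 / 10 + 1 / 1 + 1 / 2 + 1 / 11       # variance of the log odds ratio
cov_ab = -var_a                               # the two size groups are independent

lower, point, upper = ci_for_probability(
    alpha_hat, beta_hat, var_a, var_b, cov_ab, x0=1
)
print(round(lower, 3), round(point, 3), round(upper, 3))
```

The interval is not symmetric around the point estimate on the probability scale, because the inverse transformation is nonlinear.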
ANOVA – Type Representation of Factors
Like ordinary regression, logistic regression extends to include qualitative explanatory variables, often called factors. We use dummy variables to do this.
For simplicity, we first consider a single factor X with I categories. In row i of the I x 2 table, y_i is the number of outcomes in the first column (successes) out of n_i trials.
We treat y_i as binomial with parameter π_i. The logit model with a factor is
$$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i$$
This resembles the model formula for cell means in one-way ANOVA.
With I categories, X has I – 1 nonredundant parameters. One parameter can be set to 0, say β_I = 0. If the values do not satisfy this, we can recode so that it is true. For instance, set
$$\tilde{\beta}_i = \beta_i - \beta_I \quad \text{and} \quad \tilde{\alpha} = \alpha + \beta_I$$
which satisfy $\tilde{\beta}_I = 0$. Then
$$\operatorname{logit}(\pi_i) = \alpha + \beta_i = (\tilde{\alpha} - \beta_I) + (\tilde{\beta}_i + \beta_I) = \tilde{\alpha} + \tilde{\beta}_i$$
where the newly defined parameters satisfy the constraint. When β_I = 0, α equals the logit in row I, and β_i is the difference between the logits in rows i and I. Thus, β_i equals the log odds ratio for that pair of rows.
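A quick numerical check of the recoding, using made-up parameter values for a factor with I = 3 categories: subtracting β_I from every β_i and adding it to α leaves every logit unchanged and makes the last parameter zero.

```python
# Hypothetical parameters for a factor with I = 3 categories.
alpha = 0.3
beta = [1.2, -0.4, 0.7]            # beta_I = 0.7, so the constraint does not hold

beta_I = beta[-1]
alpha_new = alpha + beta_I                  # recoded intercept
beta_new = [b - beta_I for b in beta]       # recoded effects, last one is 0

old_logits = [alpha + b for b in beta]
new_logits = [alpha_new + b for b in beta_new]

# After recoding, each effect is the logit in row i minus the logit in row I.
differences = [logit_i - old_logits[-1] for logit_i in old_logits]

print(beta_new, old_logits, new_logits)
```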
Multiple Logistic Regression
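The model for π(x) = P(Y=1) at values x = (x1, …, xp) of p predictors is
$$\operatorname{logit}[\pi(x)] = \alpha + \beta_1 x_1 + \dots + \beta_p x_p$$
The alternative formula, which specifies π(x) directly, is
$$\pi(x) = \frac{\exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}$$
The parameter β_i refers to the effect of x_i on the log odds that Y = 1, controlling for the other x_j.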
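The interpretation of β_i can be illustrated with a short sketch: raising one predictor by one unit while holding the others fixed multiplies the odds by exp(β_i). The coefficients, predictor values and function names below are hypothetical.

```python
import math

def predict_prob(alpha, betas, x):
    """pi(x) for the multiple logistic regression model."""
    eta = alpha + sum(b * xj for b, xj in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

# Made-up coefficients and predictor values, for illustration only.
alpha = -2.0
betas = [0.8, -0.5, 1.1]
x_base = [1.0, 2.0, 0.5]
x_plus = [2.0, 2.0, 0.5]    # first predictor raised by one unit, others fixed

p_base = predict_prob(alpha, betas, x_base)
p_plus = predict_prob(alpha, betas, x_plus)

# The odds ratio between the two settings should equal exp(beta_1).
odds_ratio = odds(p_plus) / odds(p_base)

print(p_base, p_plus, odds_ratio, math.exp(betas[0]))
```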