7 Logistic Regression - Petra Christian University
faculty.petra.ac.id/halim/index_files/Stat2/7_Logistic...

References :
Alan Agresti, Categorical Data Analysis, Wiley-Interscience, New Jersey, 2002.
Subhash Sharma, Applied Multivariate Techniques, John Wiley & Sons, 1996.

Introduction
Interpreting Parameters in Logistic Regression
Inferences for Logistic Regression
Logit Models with Categorical Predictors
Multiple Logistic Regression

Logistic Regression

Siana Halim, Indriati N Bisono





Logistic Regression


Recently, logistic regression has become a popular tool in business applications. Some credit-scoring applications use logistic regression to model the probability that a subject is creditworthy.

A company that relies on catalog sales may determine whether to send a catalog to a potential customer by modeling the probability of a sale as a function of indices of past buying behavior.




Logistic Regression : Model

The logistic regression model is a generalized linear model with

Random Component : The response variable is binary, Yi = 1 or 0 (an event occurs or it doesn’t). We are interested in the probability that Yi = 1, i.e., π(xi). The distribution of Yi is binomial.

Systematic Component : A linear predictor such as

$$\alpha + \beta_1 x_{i1} + \dots + \beta_j x_{ij}$$

The explanatory or predictor variables may be quantitative (continuous), qualitative (discrete), or both (mixed).




Logistic Regression : Model

Link Function : The log of the odds that an event occurs, known as the “logit” :

$$\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right)$$

Putting it all together, the logistic regression model is

$$\operatorname{logit}(\pi(x_i)) = \log\left(\frac{\pi(x_i)}{1-\pi(x_i)}\right) = \alpha + \beta_1 x_{i1} + \dots + \beta_j x_{ij}$$




Illustration : Data for Most and Least Successful Financial Institutions

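Columns 1 to 3: Most Successful. Columns 4 to 6: Least Successful.

| Success | Size | FP   | Success | Size | FP   |
|---------|------|------|---------|------|------|
| 1       | 0    | 2.57 | 2       | 0    | 0.86 |
| 1       | 0    | 2.70 | 2       | 0    | 0.44 |
| 1       | 1    | 2.19 | 2       | 0    | 1.15 |
| 1       | 1    | 1.49 | 2       | 0    | 0.34 |
| 1       | 1    | 3.24 | 2       | 0    | 1.61 |
| 1       | 1    | 2.18 | 2       | 0    | 0.75 |
| 1       | 1    | 2.97 | 2       | 0    | 0.70 |
| 1       | 1    | 2.67 | 2       | 0    | 0.16 |
| 1       | 1    | 3.50 | 2       | 0    | 0.07 |
| 1       | 1    | 2.77 | 2       | 0    | 1.08 |
| 1       | 1    | 2.80 | 2       | 0    | 1.06 |
| 1       | 1    | 0.58 | 2       | 1    | 2.28 |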




Illustration : Contingency table for Type and Size of Financial Institution

| Type of Financial Institution (FI) | Size: Large | Size: Small | Total |
|------------------------------------|-------------|-------------|-------|
| Most Successful (MS)               | 10          | 2           | 12    |
| Least Successful (LS)              | 1           | 11          | 12    |
| Total                              | 11          | 13          | 24    |

Probability that any FI will be MS: P(MS) = 12/24 = 0.5

Probability that an FI is MS given it is large (L): P(MS | L) = 10/11 = 0.909

Probability that an FI is MS given it is small (S): P(MS | S) = 2/13 = 0.154

Odds of an FI being MS: Odds (MS) = 12/12 = 1

Odds of an FI being MS given it is large: Odds (MS | L) = 10/1 = 10 (1)

Odds of an FI being MS given it is small: Odds (MS | S) = 2/11 = 0.182 (2)




Odds and Probability

Odds and probabilities provide the same information, but in different forms. It is easy to convert odds into probabilities and vice versa. For example,

$$\text{Odds}(MS \mid L) = \frac{P(MS \mid L)}{1 - P(MS \mid L)} = \frac{0.909}{1 - 0.909} \approx 10$$

$$P(MS \mid L) = \frac{\text{odds}(MS \mid L)}{1 + \text{odds}(MS \mid L)} = \frac{10}{1 + 10} = 0.909$$
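The two conversions can be checked numerically. The following is a minimal sketch using the counts from the contingency table; the function names `odds_from_prob` and `prob_from_odds` are illustrative choices, not part of the original slides.

```python
def odds_from_prob(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1.0 - p)


def prob_from_odds(odds: float) -> float:
    """Convert odds (>= 0) back into a probability odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Counts from the contingency table: most successful (MS) FIs by size.
ms_large, total_large = 10, 11
ms_small, total_small = 2, 13

p_ms_large = ms_large / total_large          # P(MS | L)
p_ms_small = ms_small / total_small          # P(MS | S)

odds_ms_large = odds_from_prob(p_ms_large)   # equals 10/1 up to rounding
odds_ms_small = odds_from_prob(p_ms_small)   # equals 2/11 up to rounding

print("large:", round(p_ms_large, 3), round(odds_ms_large, 3))
print("small:", round(p_ms_small, 3), round(odds_ms_small, 3))
```

Converting the odds back with `prob_from_odds` recovers the original probabilities, which is the sense in which the two quantities carry the same information.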




Illustration : The Logistic Regression Model

Taking the natural log of the odds given by eqns. (1) and (2), we get

ln [odds (MS | L)] = ln (10) = 2.303

ln [odds (MS | S)] = ln (0.182) = -1.704

These two equations can be combined into the following equation, which gives the log of the odds as a function of the size of the FI :

ln [odds (MS | SIZE)] = -1.704 + 4.007 × SIZE (3)

where SIZE = 1 if the FI is large and SIZE = 0 if the FI is small.
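As a rough check of equation (3), the intercept and slope can be recomputed from the two odds. This is a sketch under the assumption that the intercept is the log odds for small FIs (SIZE = 0) and the slope is the difference between the two log odds; the variable names are illustrative.

```python
import math

# Odds taken from the contingency table, equations (1) and (2).
odds_large = 10 / 1    # odds(MS | L)
odds_small = 2 / 11    # odds(MS | S)

# With SIZE coded 0 for small, the intercept is the log odds for small FIs,
# and the slope is the change in log odds when moving from small to large.
intercept = math.log(odds_small)
slope = math.log(odds_large) - math.log(odds_small)


def log_odds(size: int) -> float:
    """Log odds of being most successful for SIZE = 0 (small) or 1 (large)."""
    return intercept + slope * size


print("intercept:", round(intercept, 4))
print("slope:", round(slope, 4))
print("odds for large FI:", round(math.exp(log_odds(1)), 4))
```

Because the code uses the unrounded odds 2/11 rather than 0.182, the intercept can differ from -1.704 in the third decimal place.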




Illustration : The Logistic Regression Model

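In general, for k independent variables, (3) can be written as

$$\ln\left[\text{odds}(MS \mid X_1, \dots, X_k)\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

or

$$\ln\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

where

$$\text{odds}(MS \mid X_1, \dots, X_k) = \frac{p}{1-p}$$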




Interpreting β: Odds, Probabilities, and Linear Approximations

For a binary response variable Y and an explanatory variable X, let π(x) = P(Y=1|X=x) = 1 − P(Y=0|X=x). The logistic regression model is

$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)} \qquad (4)$$

Equivalently, the log odds, called the logit, has the linear relationship

$$\operatorname{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x \qquad (5)$$
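Equations (4) and (5) are inverses of each other, which a short numerical sketch can confirm. The values of `alpha` and `beta` below are hypothetical and chosen only for illustration.

```python
import math


def logistic(x: float, alpha: float, beta: float) -> float:
    """Equation (4): pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    eta = alpha + beta * x
    # Two algebraically equal forms, picked to avoid overflow in exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Equation (5), left side: log odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


alpha, beta = -1.0, 0.5  # hypothetical parameter values

for x in (-2.0, 0.0, 2.0, 4.0):
    p = logistic(x, alpha, beta)
    # logit(p) should reproduce the linear predictor alpha + beta*x.
    print(x, round(p, 4), round(logit(p), 4), alpha + beta * x)
```

With these values, π(x) = 0.5 exactly where α + βx = 0, that is at x = 2.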




Interpreting β: Odds, Probabilities, and Linear Approximations

How can we interpret β in (5)?

Its sign determines whether π(x) is increasing or decreasing as x increases.

The rate of climb or descent increases as |β| increases; as β → 0 the curve flattens to a horizontal straight line. When β = 0, Y is independent of X.

Since the logistic density is symmetric, π(x) approaches 1 at the same rate that it approaches 0.

The intercept parameter α is not usually of particular interest. However, by centering the predictor about 0 (that is, replacing x by x − x̄), α becomes the logit at the mean, and thus

$$\frac{e^{\alpha}}{1 + e^{\alpha}} = \pi(\bar{x})$$
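The effect of centering can be illustrated with a small sketch. The parameter values and the predictor values `xs` are hypothetical; the point is only that the intercept of the centered model, α + β x̄, is the logit at the mean.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


alpha, beta = -3.0, 0.8            # hypothetical parameters, uncentered model
xs = [1.0, 2.0, 4.0, 5.0, 8.0]     # hypothetical predictor values
x_bar = sum(xs) / len(xs)

# Rewriting alpha + beta*x as (alpha + beta*x_bar) + beta*(x - x_bar)
# shows that the intercept of the centered model is alpha + beta*x_bar.
alpha_centered = alpha + beta * x_bar

pi_at_mean = expit(alpha + beta * x_bar)
pi_from_centered_intercept = math.exp(alpha_centered) / (1.0 + math.exp(alpha_centered))

print(round(pi_at_mean, 4), round(pi_from_centered_intercept, 4))
```

The slope β is unchanged by centering; only the meaning of the intercept changes.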




Looking at the Data

Before fitting the model and making such interpretations, look at the data to check that the logistic regression model is appropriate.

Since Y takes only values 0 and 1, it is difficult to check this by plotting Y against x.

It can be helpful to plot sample proportions or logits against x.

Let ni denote the number of observations at setting i of x. Of them, let yi denote the number of “1” outcomes, with pi = yi/ni.

Sample logit i is

$$\log\left[\frac{p_i}{1 - p_i}\right] = \log\left[\frac{y_i}{n_i - y_i}\right]$$

This is not finite when yi = 0 or ni. The adjustment

$$\log\left[\frac{y_i + \tfrac{1}{2}}{n_i - y_i + \tfrac{1}{2}}\right]$$

remains finite in those cases.
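A minimal sketch of the sample logit and its adjusted version follows. The grouped data in `groups` are hypothetical and include the two boundary cases yi = 0 and yi = ni.

```python
import math


def sample_logit(y: int, n: int) -> float:
    """log[y / (n - y)]; infinite when y == 0 or y == n."""
    if y == 0:
        return -math.inf
    if y == n:
        return math.inf
    return math.log(y / (n - y))


def adjusted_logit(y: int, n: int) -> float:
    """log[(y + 1/2) / (n - y + 1/2)]; finite for every 0 <= y <= n."""
    return math.log((y + 0.5) / (n - y + 0.5))


# Hypothetical grouped data: (n_i, y_i) at each setting of x.
groups = [(10, 0), (12, 3), (15, 9), (8, 8)]

for n_i, y_i in groups:
    print(n_i, y_i, sample_logit(y_i, n_i), round(adjusted_logit(y_i, n_i), 4))
```

Plotting the adjusted logits against x gives a rough visual check of whether a straight line on the logit scale is plausible.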




Types of Inference : Hypothesis Test (Optional)

For the model with a single predictor,

$$\operatorname{logit}[\pi(x)] = \alpha + \beta x$$

significance tests focus on H0 : β = 0, the hypothesis of independence.

Wald test

The Wald test uses the log likelihood at β̂, with test statistic

$$z = \frac{\hat{\beta}}{SE}$$

or its square. Under H0, z² is asymptotically chi-squared with 1 degree of freedom, $\chi^2_1$.

The Likelihood-ratio test

The likelihood-ratio test uses twice the difference between the maximized log likelihood at β̂ and at β = 0, and also has an asymptotic $\chi^2_1$ null distribution.

The Score test

The score test uses the log likelihood at β = 0 through the derivative of the log likelihood (i.e. the score function) at that point.
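The three statistics can be compared on the financial institution example, with SIZE as the single binary predictor. This is a sketch under stated assumptions: with one binary predictor the maximum likelihood fit reproduces the sample proportions, the large-sample variance of the log odds ratio is the sum of the reciprocal cell counts, and the score statistic coincides with the Pearson chi-squared statistic of the 2 × 2 table. The helper names are illustrative.

```python
import math

# 2x2 table from the illustration: most successful (MS) counts and totals by SIZE.
y = {1: 10, 0: 2}    # MS institutions: large (SIZE = 1), small (SIZE = 0)
n = {1: 11, 0: 13}   # all institutions of each size


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def chi2_1_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def loglik(p_by_size: dict) -> float:
    """Binomial log likelihood, omitting the constant binomial coefficients."""
    return sum(
        y[s] * math.log(p_by_size[s]) + (n[s] - y[s]) * math.log(1.0 - p_by_size[s])
        for s in (0, 1)
    )


# Fitted probabilities under the model (sample proportions) and under H0.
p_hat = {s: y[s] / n[s] for s in (0, 1)}
p_null = sum(y.values()) / sum(n.values())

# Wald: beta_hat is the log odds ratio; SE from reciprocal cell counts.
beta_hat = logit(p_hat[1]) - logit(p_hat[0])
se_beta = math.sqrt(sum(1.0 / c for c in (y[1], n[1] - y[1], y[0], n[0] - y[0])))
wald_chi2 = (beta_hat / se_beta) ** 2

# Likelihood ratio: twice the gain in maximized log likelihood over beta = 0.
lr_chi2 = 2.0 * (loglik(p_hat) - loglik({0: p_null, 1: p_null}))

# Score: computed here as the Pearson chi-squared statistic of the table.
score_chi2 = 0.0
for s in (0, 1):
    for observed, p in ((y[s], p_null), (n[s] - y[s], 1.0 - p_null)):
        expected = n[s] * p
        score_chi2 += (observed - expected) ** 2 / expected

for name, stat in (("Wald", wald_chi2), ("LR", lr_chi2), ("Score", score_chi2)):
    print(name, round(stat, 3), chi2_1_pvalue(stat))
```

The three statistics differ noticeably here because the sample is small (24 institutions), although all three point toward rejecting H0 : β = 0.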




Types of Inference : Confidence Interval (Optional)

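An interval for β results from inverting a test of H0 : β = β0. The interval is the set of β0 for which the chi-squared test statistic is no greater than $\chi^2_1(\alpha) = z^2_{\alpha/2}$. The Wald confidence interval is

$$\left[\frac{\hat{\beta} - \beta_0}{SE}\right]^2 \le z^2_{\alpha/2} \quad \text{or} \quad \hat{\beta} \pm z_{\alpha/2}(SE)$$

For the fitted logit at x = x0, SE is given by the estimated square root of

$$\operatorname{var}(\hat{\alpha} + \hat{\beta} x_0) = \operatorname{var}(\hat{\alpha}) + x_0^2 \operatorname{var}(\hat{\beta}) + 2 x_0 \operatorname{cov}(\hat{\alpha}, \hat{\beta})$$

A 95% confidence interval for logit [π(x0)] is

$$(\hat{\alpha} + \hat{\beta} x_0) \pm 1.96\, SE$$

Substituting each endpoint into the inverse transformation

$$\pi(x_0) = \frac{\exp(\text{logit})}{1 + \exp(\text{logit})}$$

gives a corresponding interval for π(x0).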
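The interval calculations can be sketched on the financial institution example with SIZE as the predictor. The variances below are assumptions based on the standard large-sample result that a sample log odds log[y/(n − y)] has variance 1/y + 1/(n − y), with the two size groups treated as independent; the variable names are illustrative.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Cell counts from the contingency table.
ms_large, ls_large = 10, 1
ms_small, ls_small = 2, 11

# Estimates: alpha_hat is the log odds for small FIs, beta_hat the log odds ratio.
alpha_hat = math.log(ms_small / ls_small)
beta_hat = math.log(ms_large / ls_large) - alpha_hat

# Large-sample variances and covariance (assumed, see the lead-in).
var_alpha = 1 / ms_small + 1 / ls_small
var_beta = var_alpha + 1 / ms_large + 1 / ls_large
cov_alpha_beta = -var_alpha   # beta_hat = (log odds, large) - alpha_hat

z = 1.96  # standard normal quantile for a 95% interval

# Wald interval for beta.
se_beta = math.sqrt(var_beta)
beta_ci = (beta_hat - z * se_beta, beta_hat + z * se_beta)


def logit_ci(x0: float) -> tuple:
    """95% interval for logit[pi(x0)] using the variance formula above."""
    estimate = alpha_hat + beta_hat * x0
    variance = var_alpha + x0 ** 2 * var_beta + 2 * x0 * cov_alpha_beta
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


lower, upper = logit_ci(1)               # large FIs, SIZE = 1
pi_ci = (expit(lower), expit(upper))     # back-transform each endpoint

print("beta CI:", tuple(round(v, 3) for v in beta_ci))
print("pi(1) CI:", tuple(round(v, 3) for v in pi_ci))
```

The interval for π(1) is not symmetric about the point estimate 10/11, because the symmetric interval is built on the logit scale and then transformed.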




ANOVA – Type Representation of Factors

Like ordinary regression, logistic regression extends to include qualitative explanatory variables, often called factors. We use dummy variables to do this.

For simplicity, we first consider a single factor X, with I categories. In row i of the I × 2 table, yi is the number of outcomes in the first column (successes) out of ni trials. We treat yi as binomial with parameter πi. The logit model with a factor is

$$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i$$

which resembles the model formula for cell means in one-way ANOVA.




ANOVA – Type Representation of Factors

With I categories, X has I − 1 nonredundant parameters. One parameter can be set to 0, say βI = 0. If the values do not satisfy this, we can recode so that it is true. For instance, set

$$\tilde{\beta}_i = \beta_i - \beta_I \quad \text{and} \quad \tilde{\alpha} = \alpha + \beta_I$$

which satisfy $\tilde{\beta}_I = 0$. Then

$$\operatorname{logit}(\pi_i) = \alpha + \beta_i = (\tilde{\alpha} - \beta_I) + (\tilde{\beta}_i + \beta_I) = \tilde{\alpha} + \tilde{\beta}_i$$

where the newly defined parameters satisfy the constraint. When βI = 0, α equals the logit in row I, and βi is the difference between the logits in rows i and I. Thus, βi equals the log odds ratio for that pair of rows.
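The recoding can be checked numerically. In this sketch the values of `alpha` and `betas` are hypothetical and deliberately do not satisfy βI = 0; the recoded parameters reproduce the same logits.

```python
# Hypothetical parameters for a factor with I = 3 categories; beta_I = 0.7 != 0.
alpha = 0.4
betas = [1.2, -0.3, 0.7]

beta_last = betas[-1]

# Recode: beta~_i = beta_i - beta_I and alpha~ = alpha + beta_I.
alpha_tilde = alpha + beta_last
betas_tilde = [b - beta_last for b in betas]

# The logit in each row is the same under both parameterizations.
logits_before = [alpha + b for b in betas]
logits_after = [alpha_tilde + b for b in betas_tilde]

for i, (before, after) in enumerate(zip(logits_before, logits_after), start=1):
    print(i, round(before, 6), round(after, 6))

# After recoding, alpha~ is the logit in row I and beta~_i is the
# difference between the logits in rows i and I.
print(round(alpha_tilde, 6), [round(b, 6) for b in betas_tilde])
```

Fixing the last category as the baseline is one convention among several; any single category could serve as the reference in the same way.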





Multiple Logistic Regression

The model for π(x) = P(Y=1) at values x = (x1, …, xp) of p predictors is

$$\operatorname{logit}[\pi(x)] = \alpha + \beta_1 x_1 + \dots + \beta_p x_p$$

The alternative formula is

$$\pi(x) = \frac{\exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}$$

The parameter βi refers to the effect of xi on the log odds that Y = 1, controlling for the other xj.
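The meaning of a single coefficient in the multiple model can be illustrated numerically. The coefficients and predictor values below are hypothetical; the sketch shows that raising one predictor by one unit while holding the others fixed multiplies the odds by exp(β) for that predictor.

```python
import math
from typing import Sequence


def predict_prob(x: Sequence[float], alpha: float, betas: Sequence[float]) -> float:
    """pi(x) = exp(eta) / (1 + exp(eta)) with eta = alpha + sum_j beta_j * x_j."""
    eta = alpha + sum(b * xj for b, xj in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p: float) -> float:
    return p / (1.0 - p)


alpha = -2.0           # hypothetical intercept
betas = [0.9, -0.4]    # hypothetical coefficients for x1 and x2

x = [1.0, 3.0]
x_plus = [2.0, 3.0]    # x1 raised by one unit, x2 held fixed

odds_ratio = odds(predict_prob(x_plus, alpha, betas)) / odds(predict_prob(x, alpha, betas))

print(round(odds_ratio, 6), round(math.exp(betas[0]), 6))
```

The same ratio is obtained at any fixed value of x2, which is what "controlling for the other xj" means on the log odds scale.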
