Stat 504, Lecture 11


Key Concept:

• Logistic Regression for I × J tables


Summary of the chi-square test of independence for I × J tables:

• Chi-square tests are used to test whether two categorical variables measured on a group of subjects are independent.

• Null hypothesis, H0: X is independent of Y (refer to Lecture 8(36) for equivalent statements).

• Construct a contingency table that counts the numbers of subjects for each combination of levels of variable X and variable Y.

• If X and Y are independent, then the probability distribution of X is the same for each level of Y (and vice versa).

• Calculate the expected number of subjects in each cell if X and Y are independent:

Expected value = (row count × column count) / (total count)

• The chi-square statistic is the sum of

(observed − expected)² / expected

over all cells in the table.


• The null distribution of the chi-square statistic (e.g. X² or G²) is the chi-square distribution with (I − 1)(J − 1) df, where I is the number of rows and J is the number of columns in the contingency table.

• If the observed chi-square statistic is more extreme than the chosen critical value from the null distribution, then reject the hypothesis of independence (e.g. for α = 0.05 and 1 df, χ²₁ = 3.84; if X² > 3.84, or equivalently p-value < 0.05, then reject H0).

• SAS: under PROC FREQ, use the option CHISQ, or in SAS Analyst (Statistics/Table Analysis with Statistics: Chi-Square); see the sketch after this list.

• For limitations of the chi-square test and statistics, refer to Lecture 10(24).

• If the variables are dependent, further investigate the direction and magnitude of the associations (e.g. difference in proportions, relative risk, odds ratios, partitioning chi-square, logistic regression, log-linear models, etc.).
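
As a concrete sketch of the PROC FREQ approach, using the parent/student smoking counts from the example below (the dataset and variable names here are assumed, not taken from the course programs):

data smoke2;
input parents $ student $ count;
cards;
both yes 400
both no 1380
one yes 416
one no 1823
neither yes 188
neither no 1168
;
proc freq data=smoke2 order=data;
* CHISQ requests X2 and G2, EXPECTED prints the expected counts ;
tables parents*student / chisq expected;
* each record carries a cell count rather than one subject ;
weight count;
run;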


From the last lecture:

Example 2. The table below classifies 5375 high school students according to the smoking behavior of the student (Z) and the smoking behavior of the student's parents (Y).

                           Student smokes?
How many parents smoke?    Yes (Z = 1)    No (Z = 2)
Both (Y = 1)                   400           1380
One (Y = 2)                    416           1823
Neither (Y = 3)                188           1168

The test for independence yields X² = 37.6 and G² = 38.4 with 2 df (p-values are essentially zero), so Y and Z are related.


It is natural to think of Z as a response and Y as a predictor, so we will discuss the conditional distribution of Z given Y. Let

p1 = P(Z = 1 | Y = 1),
p2 = P(Z = 1 | Y = 2),
p3 = P(Z = 1 | Y = 3).

The estimates of these probabilities are the observed row proportions (e.g. the first row total is 400 + 1380 = 1780):

p̂1 = 400/1780 = .225,
p̂2 = 416/2239 = .186,
p̂3 = 188/1356 = .139.


The effect of Y on Z can be summarized with two differences. For example, we can calculate the increase in the probability of Z = 1 as Y goes from 3 to 2, and as Y goes from 2 to 1:

d̂23 = p̂2 − p̂3 = .047,
d̂12 = p̂1 − p̂2 = .039.

Alternatively, we may treat Y = 3 as a baseline and calculate the increase in probability as we go from Y = 3 to Y = 2 and from Y = 3 to Y = 1:

d̂23 = p̂2 − p̂3 = .047,
d̂13 = p̂1 − p̂3 = .086.


We may also express the effects as odds ratios:

θ̂23 = (416 × 1168) / (188 × 1823) = 1.42,
θ̂13 = (400 × 1168) / (188 × 1380) = 1.80.

Students with one smoking parent are estimated to be 42% more likely (on the odds scale) to smoke than students whose parents do not smoke, and students with two smoking parents are 80% more likely to smoke than students whose parents do not smoke.
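
To see where the cross-product form comes from, note that each θ̂ is a ratio of two estimated odds; for example,

θ̂23 = [p̂2/(1 − p̂2)] / [p̂3/(1 − p̂3)] = (416/1823) / (188/1168) = (416 × 1168) / (188 × 1823) ≈ 1.42.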


Another way to describe the effects is to perform "partitioned tests," that is, to form a sequence of smaller tables by combining rows and/or columns in a meaningful way.

In our example we might be interested in the association between the students' smoking behavior and whether neither parent smokes versus at least one parent smoking. Thus we can combine the first two rows of our 3 × 2 table (816 = 400 + 416 and 3203 = 1380 + 1823) and look at a new 2 × 2 table:

                          Student smokes    Student doesn't
1–2 parents smoke              816               3203
Neither parent smokes          188               1168

This table has X² = 27.7, G² = 29.1, p-value ≈ 0, and θ̂ = 1.58. We estimate that a student is 58% more likely, on the odds scale, to smoke if he or she has at least one smoking parent.


But what if

• we want to model the probabilities of a response variable as a function of some explanatory variables, e.g. the "risk" of a student smoking as a function of the parents' behavior;

• we want to perform descriptive discriminant analyses, such as describing the differences between individuals in separate groups as a function of explanatory variables, e.g. student smokers and nonsmokers as a function of the parents' smoking behavior;

• we want to predict the probabilities that individuals fall into the two categories of a binary response as a function of some explanatory variables, e.g. what is the probability that a student is a smoker given that neither of his/her parents smokes;

• we want to classify individuals into two categories based on explanatory variables, e.g. classify new students into the "smoking" or "nonsmoking" group depending on the parents' smoking behavior;

• we want to develop a social network model, adjust for "bias," analyze choice data, etc.?


Logistic Regression

(ref. Chs. 5–7, Agresti) is another way we can model the probabilities of a response variable as a function of some explanatory variables; for example, what is the probability that the child smokes given that at least one parent smokes?

Logistic regression is a special type of generalized linear model (GLM).

For now we'll focus only on modeling the probabilities of a binary response variable as a function of another discrete variable, in order to see how logistic regression ties in with the test of independence and the measures of association in two-way tables.


Now, suppose we arrange the data like this,

                          yi      ni
1–2 parents smoke        816    4019
Neither parent smokes    188    1356

where yi is the number of children who smoke, ni is the number of children, and πi is the probability that a child in group i smokes, i = 1, 2. Then we suppose that

yi ∼ Bin(ni, πi),

and let X be a dummy variable,

Xi = 0 if neither parent smokes,
Xi = 1 if at least one parent smokes.

Then the logistic regression model is

logit(πi) = log( πi / (1 − πi) ) = β0 + β1Xi,

or

πi = exp(β0 + β1Xi) / (1 + exp(β0 + β1Xi)),

which says that the log-odds of smoking are β0 for "smoking parents" = none, and β0 + β1 for "smoking parents" = at least one.
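
A one-line check of this interpretation, using only the model above: β1 = (β0 + β1) − β0 is the log-odds at Xi = 1 minus the log-odds at Xi = 0, and a difference of log-odds is the log of an odds ratio, so β1 = log θ for the 2 × 2 table; the fitted output below confirms this.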


We can fit the model in SAS like this (ref: lec10ex2.sas):

data smoke;
input s $ y n;
cards;
smoke 816 4019
nosmoke 188 1356
;
proc logistic descending;
class s (ref=first) / param=ref;
model y/n = s / scale=none;
run;

In the data step, the dollar sign $ indicates that S is a character-string variable.

In the logistic step:

• descending ensures that you are modeling the probability of an "event," which takes value 1; otherwise, by default, SAS models the probability of a "nonevent."

• class s (ref=first) / param=ref; says that S should be coded as a dummy variable using the first category as the reference or zero group. (The first category is "nosmoke," because it comes before "smoke" in alphabetical order.)


• model y/n: because we have grouped data (i.e. multiple trials per line of the data set), the model statement uses the "events/trials" syntax, in which y/n appears on the left-hand side of the equal sign. The predictors go on the right-hand side, separated by spaces if there is more than one. An intercept is added automatically by default.
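
For comparison, if the data were ungrouped (one record per student with a 0/1 response), the single-trial syntax would be used instead. A minimal sketch, assuming a dataset students with a 0/1 variable smoke (names not from the course programs):

proc logistic data=students;
class s (ref=first) / param=ref;
* EVENT='1' models P(smoke=1), playing the role of DESCENDING ;
model smoke(event='1') = s;
run;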


Let's look at some output from this program:

• Model Information
• Response Profile
• Class Level Information
• Model Convergence Status
• Goodness-of-Fit Statistics
• Model Fit Statistics
• Testing Global Null Hypothesis: BETA=0
• Analysis of Maximum Likelihood Estimates
• Odds Ratio Estimates


Model Information

Data Set                     WORK.SMOKE
Response Variable (Events)   y
Response Variable (Trials)   n
Number of Observations       2
Model                        binary logit
Optimization Technique       Fisher's scoring

Response Profile

Ordered    Binary     Total
Value      Outcome    Frequency
1          Event      1004
2          Nonevent   4371

Class Level Information

                   Design
                   Variables
Class   Value      1
s       nosmoke    0
        smoke      1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Number of events/trials observations: 2

Model Fit Statistics

                        Intercept
            Intercept      and
Criterion     Only      Covariates
AIC         5178.510    5151.390
SC          5185.100    5164.569
-2 Log L    5176.510    5147.390

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    29.1207        1    <.0001
Score               27.6766        1    <.0001
Wald                27.3361        1    <.0001


Analysis of Maximum Likelihood Estimates

                            Standard       Wald
Parameter   DF   Estimate    Error     Chi-Square   Pr > ChiSq
Intercept    1   -1.8266     0.0786     540.2949      <.0001
s smoke      1    0.4592     0.0878      27.3361      <.0001

Our logistic regression model:

log( πi / (1 − πi) ) = −1.8266 + 0.4592Xi

The estimated coefficient of the dummy variable,

β̂1 = 0.4592,

agrees exactly with the log-odds ratio from the 2 × 2 table: ln(1.58) = ln( (816 × 1168) / (188 × 3203) ) = 0.459. The standard error for β̂1, 0.0878, agrees exactly with the standard error that you can calculate from the 2 × 2 table.

This is not surprising, because in the logistic regression model β1 is the difference in the log-odds of children smoking as we move from "nosmoke" (i.e. neither parent smokes, Xi = 0) to "smoke" (i.e. at least one parent smokes, Xi = 1), and a difference in log-odds is a log-odds ratio.


Also, in this model, β0 is the log-odds of children smoking when neither parent smokes (Xi = 0). Looking at the 2 × 2 table, the estimated log-odds for the no-smoking-parents group is

log(188/1168) = log(0.161) = −1.8266,

which agrees with β̂0 from the logistic model.


Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    29.1207        1    <.0001
Score               27.6766        1    <.0001
Wald                27.3361        1    <.0001

This tests whether a set of coefficients is simultaneously zero, i.e. H0: β1 = β2 = ... = βk = 0, versus the alternative that at least one of them is nonzero. In our example, since we have only a single covariate in this model, this is equivalent to testing β1 = 0, which is the same as testing that Y and X are independent.

Notice that the "Likelihood Ratio" value matches the G² we calculated in the last lecture; the "Score" statistic also has an approximate chi-squared distribution. We'll discuss these in detail a bit later.

The Wald test compares the statistic

z = β̂j / SE(β̂j)

to a standard normal distribution; the p-value is twice the area to the right of |z| under the normal curve.
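
As a quick check with the estimates above: z = 0.4592/0.0878 ≈ 5.23, so z² ≈ 27.3, which matches the reported Wald chi-square of 27.3361 up to rounding of the displayed estimate and standard error.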


The goodness-of-fit statistics X² and G² from this model are both zero, because the model is saturated. However, suppose that we fit the intercept-only model. This is accomplished by removing the predictor from the model statement, like this:

model y/n = / scale=none;

The goodness-of-fit statistics are shown below.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion   DF   Value     Value/DF   Pr > ChiSq
Deviance     1   29.1207   29.1207      <.0001
Pearson      1   27.6766   27.6766      <.0001

The Pearson statistic X² = 27.6766 is precisely equal to the ordinary X² for testing independence in the 2 × 2 table, and the deviance G² = 29.1207 is precisely equal to the G² for testing independence in the 2 × 2 table.

Thus we have shown that analyzing a 2 × 2 table for relatedness is equivalent to logistic regression with a dummy variable.


Our logistic regression model:

log( πi / (1 − πi) ) = −1.8266 + 0.4592Xi

Interpretation of coefficients: for every one-unit increase in the explanatory variable X (e.g. changing from no smoking parents to smoking parents), the odds of "success," πi/(1 − πi), are multiplied by exp(β1), given that all the other variables are held constant.

For our example, exp(0.4592) = 1.5828, which is the odds ratio we already calculated.

Further, the predicted probability of a child being a smoker if at least one parent smokes is

P(Yi = 1 | Xi = 1) = exp(−1.8266 + 0.4592) / (1 + exp(−1.8266 + 0.4592)) = 0.20.

See lec10ex2.sas for the commands on using the OUTPUT statement.
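
As a check of the arithmetic: exp(−1.8266 + 0.4592) = exp(−1.3674) ≈ 0.2548, so the predicted probability is 0.2548/1.2548 ≈ 0.203, which equals the observed proportion 816/4019 ≈ .203, as it must for a saturated model.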


Now let's replicate the analysis of the original 3 × 2 table with logistic regression.

First, we re-express the data in terms of yi = the number of smoking students and ni = the number of students, for the three groups based on the parents' behavior:

How many parents smoke?     yi      ni
Both                       400    1780
One                        416    2239
Neither                    188    1356

Then we decide on a baseline level for the explanatory variable X, and create k − 1 dummy indicators if X is a categorical variable with k levels. For our example, let parent smoking = Neither be the baseline, and define a pair of dummy indicators (the resulting coding is summarized below),

X1 = 1 if parent smoking = One, 0 otherwise,
X2 = 1 if parent smoking = Both, 0 otherwise.
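
The coding of the three groups is therefore:

Parent smoking    X1    X2
Neither            0     0
One                1     0
Both               0     1

This is the same kind of "Class Level Information" (design variables) table that SAS printed for the 2 × 2 model.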


Let π be the probability of a student smoking. Then the model

log( π / (1 − π) ) = β0 + β1X1 + β2X2

says that the log-odds of student smoking are β0 for parents smoking = neither, β0 + β1 for parents smoking = one, and β0 + β2 for parents smoking = both. Therefore,

β1 = (log-odds for one) − (log-odds for neither),
β2 = (log-odds for both) − (log-odds for neither),

and we expect to get β̂1 = ln(1.42) = .351 and β̂2 = ln(1.80) = .588. The estimated intercept should be

β̂0 = log(188/1168) = −1.826.


Here are two versions of a SAS program for fitting this model (ref: lec11ex2v1.sas):

data smoke;
input s $ y n;
cards;
2 400 1780
1 416 2239
0 188 1356
;
proc logistic descending;
class s (ref=first) / param=ref;
model y/n = s / scale=none;
output out=predict pred=prob;
run;
proc print data=predict;
run;

(ref: lec11ex2.sas):

proc logistic descending;
class s (ref='neither') / order=data param=ref;
model y/n = s / scale=none;
output out=predict pred=prob;
run;
proc print data=predict;
run;
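
Note that the second version refers to the level 'neither', so its data step (not shown above) must read S as the category labels themselves. A plausible sketch, with the labels assumed from the surrounding text:

data smoke;
input s $ y n;
cards;
both 400 1780
one 416 2239
neither 188 1356
;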


In the class statement, the option order=data tells SAS to sort the categories of S by the order in which they appear in the dataset rather than alphabetical order. The option param=ref tells SAS to create a set of two dummy variables to distinguish among the three categories. The option ref='neither' makes "neither" the reference group (i.e. the group for which both dummy variables are zero).

Let's look at some relevant portions of the output of lec11ex2v1.lst:

Analysis of Maximum Likelihood Estimates

                            Standard       Wald
Parameter   DF   Estimate    Error     Chi-Square   Pr > ChiSq
Intercept    1   -1.8266     0.0786     540.2949      <.0001
s 1          1    0.3491     0.0955      13.3481      0.0003
s 2          1    0.5882     0.0970      36.8105      <.0001

Odds Ratio Estimates

             Point       95% Wald
Effect      Estimate   Confidence Limits
s 1 vs 0     1.418      1.176    1.710
s 2 vs 0     1.801      1.489    2.178


The saturated model is

logit(π) = −1.8266 + 0.3491X1 + 0.5882X2.

For example, the predicted probability of a student smoking given that only one parent smokes is

P(Yi = 1 | neither = 0, one = 1, both = 0)
  = P(Yi = 1 | X1 = 1, X2 = 0)
  = exp(−1.8266 + 0.3491) / (1 + exp(−1.8266 + 0.3491))
  = 0.186.

In this case, the intercept-only model says that student smoking is unrelated to the parents' smoking behavior, so the test of the global null hypothesis β1 = β2 = 0 is equivalent to the usual test for independence in the 3 × 2 table. The estimated coefficients and SEs are as we predicted, as well as the estimated odds ratios.


If we include the statement

output out=predict pred=phat reschi=pearson resdev=deviance;

in the PROC LOGISTIC call, then SAS creates a new dataset called "predict" that includes all of the variables in the original dataset, the predicted probabilities π̂i, the Pearson residuals, and the deviance residuals. Then we can add some code to calculate and print out the estimated expected number of successes, µ̂i = niπ̂i, and failures, ni − µ̂i = ni(1 − π̂i).


A revised SAS program that does all this is shown below (ref: lec11ex2v2.sas):

data smoke;
input s $ y n;
cards;
2 400 1780
1 416 2239
0 188 1356
;
proc logistic descending;
class s (ref=first) / param=ref;
model y/n = s / scale=none;
output out=predict pred=prob reschi=pearson resdev=deviance;
run;
data diagnostics;
set predict;
* estimated expected successes and failures ;
shat = n*prob;
fhat = n*(1-prob);
run;
proc print data=diagnostics;
var s y n prob shat fhat pearson deviance;
run;


Running this program gives a new output section:

Obs   s    y     n     prob      shat      fhat     pearson        deviance
1     2    400   1780  0.22472   400.000   1380.00  -.000000031    0
2     1    416   2239  0.18580   416.000   1823.00  -3.6617E-15    0
3     0    188   1356  0.13864   188.000   1168.00  -.000001291    -.000001307

All of the "shat" and "fhat" values are greater than 5.0, so the χ² approximation is trustworthy. (The residuals are essentially zero because this model is saturated.)


Testing H0: βj = 0 versus H1: βj ≠ 0.

The Wald chi-square statistics z² = (β̂j/SE(β̂j))² for these tests are displayed along with the estimated coefficients in the "Analysis of Maximum Likelihood Estimates" section. A value of z² bigger than 3.84 indicates that we can reject the null hypothesis βj = 0 at the .05 level.
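
For example, from the output above, z² = (0.3491/0.0955)² ≈ 13.4 for s 1 and z² = (0.5882/0.0970)² ≈ 36.8 for s 2, matching the reported Wald chi-squares 13.3481 and 36.8105 up to rounding; both exceed 3.84, so each coefficient is significantly different from zero at the .05 level.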


Testing the joint significance of all predictors.

In the SAS output: Testing Global Null Hypothesis: BETA=0 / Model Fit Statistics / Overall Goodness-of-Fit Statistics.

In our model

log( π / (1 − π) ) = β0 + β1X1 + β2X2,

this is the test of H0: β1 = β2 = 0 versus the alternative that at least one of the coefficients β1, ..., βp is not zero.

In other words, this is testing the null hypothesis that the intercept-only model is correct,

log( π / (1 − π) ) = β0,

versus the alternative that the current model is correct,

log( π / (1 − π) ) = β0 + β1X1 + β2X2.


In the SAS output, three different chi-square statistics for this test are displayed in the section "Testing Global Null Hypothesis: BETA=0," corresponding to the likelihood ratio, score, and Wald tests. This test has k degrees of freedom, where k is the number of dummy indicators, that is, the number of β-parameters excluding the intercept.

Large chi-square statistics lead to small p-values and provide evidence against the intercept-only model in favor of the current model.

If these three tests agree, that is evidence that the large-sample approximations are working well and the results are trustworthy. If the results from the three tests disagree, most statisticians would tend to trust the likelihood-ratio test more than the other two.


Testing that an arbitrary group of coefficients is zero.

To test the null hypothesis that a group of k coefficients is zero, we need to fit two models:

• the reduced model, which omits the k predictors in question, and
• the current model, which includes them.


The null hypothesis is that the reduced model is true; the alternative is that the current model is true.

To perform the test, we must look at the "Model Fit Statistics" section and examine the value of "−2 Log L" for "Intercept and Covariates." The likelihood-ratio statistic is

ΔG² = (−2 log L from reduced model) − (−2 log L from current model),

and the degrees of freedom is k (the number of coefficients in question). The p-value is P(χ²_k ≥ ΔG²). Larger values of ΔG² lead to small p-values, which provide evidence against the reduced model in favor of the current model.

For our example, ΔG² = 5176.510 − 5138.144 = 38.3658 with df = 3 − 1 = 2. Notice that this matches

Likelihood Ratio    38.3658    2    <.0001

from the "Testing Global Null Hypothesis: BETA=0" section.
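
A sketch of how the two −2 Log L values could be obtained, assuming the smoke dataset from lec11ex2v1.sas above (these fits are not shown in the original listing):

proc logistic data=smoke;
* reduced model, intercept only ;
model y/n = ;
run;
proc logistic data=smoke;
class s (ref=first) / param=ref;
* current model with the two dummy indicators ;
model y/n = s;
run;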


Another way to calculate the test statistic is

ΔG² = (G² from reduced model) − (G² from current model),

where the G²'s are the overall goodness-of-fit statistics, which we will mention in the next lecture.