Biostat 513 Discussion Week 5, 4/28 & 4/29

Source: courses.washington.edu/b513/Spring 2010/Discussion/Discussion5.pdf

Review: Likelihoods and the Likelihood Ratio Test

• Given a regression model and observed data, the “likelihood function” is the probability of the observed data under the model, viewed as a function of the β parameters.

‐ Estimates of the parameters come from maximizing this likelihood function. In other words, they are the values of the β’s under which the observed data are most probable.

‐ Maximizing a likelihood function is the same as maximizing the log of the likelihood function, because the log is an increasing function.


• Likelihood Ratio (LR) Test:

‐Compares two nested models, the “full” and “reduced” model, for example:

full: logit(π(X)) = β0 + β1X1 + β2X2

reduced: logit(π(X)) = β0 + β1X1

‐Given the observed data, the log‐likelihoods are maximized under each model.


The likelihood ratio statistic can be used to test the null hypothesis that the extra parameters in the full model equal 0. It is given by:

LR = -2*(max. log likelihood of reduced model - max. log likelihood of full model)

Under the null hypothesis, this statistic has an approximate (large‐sample) chi2(df) distribution, where df = the difference in the number of parameters between the full and reduced models.
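The LR arithmetic can be checked by hand outside Stata. The sketch below is one way to do it in Python using only the standard library. The function names `chi2_sf` and `lr_test` are illustrative, not part of the course materials. The two log likelihoods are the values printed for Model 1 and Model 2 in the Stata output of this handout.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        a = 1.0
        q = math.exp(-half)             # closed form for df = 2
    # Step df up by 2 at a time: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(loglik_reduced, loglik_full, df):
    """Return the LR statistic and its approximate chi-square p-value."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    return stat, chi2_sf(stat, df)

# Model 1 (intercept only) vs. Model 2 (smoking categories), 3 extra parameters
stat, p = lr_test(-890.62187, -875.84738, 3)
print(round(stat, 2), p)
```

The statistic rounds to the LR chi2(3) value reported by `lrtest`, and the p-value is far below 0.0001, which Stata displays as 0.0000.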

Review: Wald Test

The Wald test is used to test the null hypothesis that an individual regression parameter, or group of parameters, equals 0.

For a single parameter β, the Wald statistic is:

β_hat/SE(β_hat)

In large samples this has approximately a N(0,1) distribution under the null hypothesis that β = 0.
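As a small illustration, the Wald statistic and its two-sided p-value can be computed directly from an estimate and its standard error. This is a sketch in Python, with `wald_test` as an illustrative name. The numbers are the BP coefficient and standard error from the high SBP only model (Model 3) output later in this handout.

```python
import math

def wald_test(beta_hat, se):
    """Return the Wald z statistic and its two-sided N(0,1) p-value."""
    z = beta_hat / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

z, p = wald_test(0.8316911, 0.1368846)
print(round(z, 2), p)
```

The z value rounds to the 6.08 shown in the Stata table, and the p-value is small enough that Stata prints it as 0.000.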

Exercises: Western Collaborative Group Study (WCGS)

• The WCGS study recruited 3154 middle‐aged men (age 39‐59) during 1960‐1961. They were followed over time for up to 9 years for incident coronary heart disease (CHD). High SBP (≥140) is the risk factor of interest.

• Variables of interest:

– Disease: 0 = no CHD; 1 = CHD

– High SBP: 0 = SBP<140; 1 = SBP≥140

– Smoke: 0/1/2/3 = nonsmoker/1‐20 cigs/ 21‐30 cigs/ >30 cigs


We will explore 5 possible logistic regression models for the log odds of CHD as it depends on SBP and smoking. For each of these:

‐ What is the model?

‐ Interpret the parameters in each model

‐ Hypothesis testing via Likelihood Ratio Tests

‐ Plot fitted values for each model

Model 1

• Write out the logistic regression model for the log odds of CHD, where CHD does not depend on SBP or smoking.

logit(π(x)) = β0

• What is the interpretation of the parameter?

β0 : log odds of CHD during follow‐up (9‐year risk) for subjects in our data set.

• Can you do a LR test for this model? A Wald test? If so, what is the null hypothesis?

Model 1: logit(π(x)) = β0

. logit case [freq = count]

Iteration 0:   log likelihood = -890.62187

Logistic regression                    Number of obs = 3154
                                       LR chi2(0)    = 0.00
                                       Prob > chi2   = .
Log likelihood = -890.62187            Pseudo R2     = 0.0000

        case       Coef.   Std. Err.        z   P>|z|   [95% Conf. Interval]
       _cons   -2.422355    .0650864   -37.22   0.000   -2.549922  -2.294788

. predict fitted1, xb

. estimates store model1

H0: β0 = 0
HA: β0 ≠ 0
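To see what the intercept means on the probability scale, the estimate can be back‐transformed with the inverse logit. The Python sketch below does this for the `_cons` value in the Model 1 output; `inv_logit` is an illustrative helper name. Note that H0: β0 = 0 corresponds to odds of 1, that is, a 9‐year risk of 0.5, which is rarely a hypothesis of scientific interest.

```python
import math

def inv_logit(eta):
    """Convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

beta0 = -2.422355          # _cons from the Model 1 output
odds = math.exp(beta0)     # estimated odds of CHD
risk = inv_logit(beta0)    # estimated 9-year probability of CHD
print(round(odds, 4), round(risk, 4))
```

The estimated risk comes out at roughly 8%, far from the 0.5 implied by β0 = 0, which is why the Wald z statistic is so large in magnitude.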


Model 2

• Is 9‐year CHD risk associated with smoking?

• Write out the logistic regression model that allows the log odds of CHD to depend on smoking category only.

logit(π(x)) = β0 + β1smoke1 + β2smoke2 + β3smoke3

where smoke1, smoke2 and smoke3 are 0/1 indicators for the 1‐20, 21‐30 and >30 cigs/day categories.

• Interpret the model parameters.

β0 : log odds of CHD for non‐smokers.
β1 : log odds ratio of CHD comparing people who smoke 1‐20 cigs/day to non‐smokers.
β2 : log odds ratio of CHD comparing people who smoke 21‐30 cigs/day to non‐smokers.
β3 : log odds ratio of CHD comparing people who smoke >30 cigs/day to non‐smokers.


logit(π(x)) = β0 + β1smoke1 + β2smoke2 + β3smoke3

LR test:

Is there an association between smoking and CHD?

‐Null hypothesis:

H0: β1 = β2 = β3 = 0

HA: at least one βi (i = 1, 2, 3) ≠ 0

‐ Reduced model to compare with Model 2:

Logit(π(x))=β0


. xi: logit case i.smoke [freq = count]
i.smoke           _Ismoke_0-3         (naturally coded; _Ismoke_0 omitted)

Iteration 0:   log likelihood = -890.62187
Iteration 1:   log likelihood = -876.52013
Iteration 2:   log likelihood = -875.84853
Iteration 3:   log likelihood = -875.84738

Logistic regression                    Number of obs = 3154
                                       LR chi2(3)    = 29.55
                                       Prob > chi2   = 0.0000

        case       Coef.   Std. Err.        z   P>|z|   [95% Conf. Interval]
   _Ismoke_1    .4122448    .1627693     2.53   0.011    .0932229   .7312667
   _Ismoke_2    .8035253    .1834786     4.38   0.000    .4439138   1.163137
   _Ismoke_3    .8937922    .2010989     4.44   0.000    .4996455   1.287939
       _cons    -2.76362    .1041517   -26.53   0.000   -2.967754  -2.559486

. lrtest model2 model1

Likelihood-ratio test                      LR chi2(3)  = 29.55
(Assumption: model1 nested in model2)      Prob > chi2 = 0.0000

H0: β1 = β2 = β3 = 0
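Because β1, β2 and β3 are log odds ratios, exponentiating them gives odds ratios relative to non‐smokers, and adding each to β0 gives the fitted log odds in that category. The Python sketch below does both with the coefficients printed above; the dictionary names are illustrative only.

```python
import math

beta0 = -2.76362  # _cons: log odds of CHD for non-smokers
coef = {
    "1-20 cigs/day": 0.4122448,   # _Ismoke_1
    "21-30 cigs/day": 0.8035253,  # _Ismoke_2
    ">30 cigs/day": 0.8937922,    # _Ismoke_3
}

# Odds ratio of CHD for each smoking category vs. non-smokers
odds_ratios = {group: math.exp(b) for group, b in coef.items()}

# Fitted log odds of CHD in each smoking category
log_odds = {"0 cigs/day": beta0}
log_odds.update({group: beta0 + b for group, b in coef.items()})

for group, ratio in odds_ratios.items():
    print(group, round(ratio, 2))
```

The estimated odds ratios increase across the three smoking categories, which is the pattern the LR test is picking up.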


Model 3

• Is 9‐year CHD risk associated with high SBP?

• Write out the logistic regression model that allows the log odds of CHD to depend on high SBP only.

logit(π(x))=β0+ β1SBP

• Interpret the model parameters.

β0 : log odds of CHD for men with normal SBP.

β1 : log odds ratio of CHD comparing men with high SBP to men with normal SBP.


logit(π(x)) = β0 + β1SBP

LR test:

Is there an association between high SBP and CHD?

‐ Null hypothesis:

H0: β1 = 0

HA: β1 ≠ 0

‐ Reduced model to compare with Model 3:

logit(π(x)) = β0


. logit case BP [freq=count]





        case       Coef.   Std. Err.        z   P>|z|   [95% Conf. Interval]
          BP    .8316911    .1368846     6.08   0.000    .5634022    1.09998
       _cons    -2.65926    .0815228   -32.62   0.000   -2.819042  -2.499478



.



H0: β1 = 0


Model 4

• Write out the logistic regression model for the log odds of CHD including our predictor of interest, high SBP, adjusted for smoking category.

logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3

• Interpret the model parameters.

β0 : log odds of CHD for non‐smokers with SBP<140
β1 : log odds ratio for CHD comparing men with SBP≥140 to men with SBP<140, adjusted for smoking group
β2 : log odds ratio for CHD comparing men who smoke 1‐20 cigs/day to non‐smokers, adjusted for SBP
β3 : log odds ratio for CHD comparing men who smoke 21‐30 cigs/day to non‐smokers, adjusted for SBP
β4 : log odds ratio for CHD comparing men who smoke >30 cigs/day to non‐smokers, adjusted for SBP


logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3

Model log odds of 9‐year CHD risk by SBP and smoking category:

            0 cigs/day   1‐20 cigs/day   21‐30 cigs/day   >30 cigs/day
SBP<140     β0           β0 + β2         β0 + β3          β0 + β4
SBP≥140     β0 + β1      β0 + β1 + β2    β0 + β1 + β3     β0 + β1 + β4


logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3

LR test:

Is smoking associated with CHD incidence after adjusting for whether a man has high SBP or not?

‐ Null hypothesis:

H0: β2 = β3 = β4 = 0

HA: at least one βi (i = 2, 3, 4) ≠ 0

‐ Reduced model to compare with Model 4:

logit(π(x)) = β0 + β1SBP



LR test :

Is high SBP associated with CHD incidence after adjusting for smoking level?

‐Null hypothesis:

H0: β1=0

HA: β1≠ 0


‐ Reduced model to compare with Model 4:

logit(π(x)) = β0 + β2smoke1 + β3smoke2 + β4smoke3


. xi: logit case BP i.smoke [freq=count]
i.smoke           _Ismoke_0-3         (naturally coded; _Ismoke_0 omitted)

        case       Coef.   Std. Err.        z   P>|z|   [95% Conf. Interval]
          BP    .8318699    .1379075     6.03   0.000    .5615762   1.102164
   _Ismoke_1    .4277964      .16374     2.61   0.009    .1068718    .748721
   _Ismoke_2    .8213089    .1848922     4.44   0.000    .4589269   1.183691
   _Ismoke_3    .8669904    .2028092     4.27   0.000    .4694917   1.264489
       _cons   -3.004302    .1161083   -25.87   0.000    -3.23187  -2.776733
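The smoking‐adjusted odds ratio for high SBP and its 95% confidence interval can be recovered from the BP row of this table by exponentiating the estimate and the Wald interval limits. The Python sketch below shows the calculation; `or_with_ci` is an illustrative name, and 1.959964 is the usual normal critical value for a 95% interval.

```python
import math

def or_with_ci(beta_hat, se, z_crit=1.959964):
    """Odds ratio and Wald confidence limits from a log odds ratio and its SE."""
    lower = beta_hat - z_crit * se
    upper = beta_hat + z_crit * se
    return math.exp(beta_hat), math.exp(lower), math.exp(upper)

# BP row of the Model 4 output: coefficient and standard error
odds_ratio, ci_low, ci_high = or_with_ci(0.8318699, 0.1379075)
print(round(odds_ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

The interval is built on the log odds scale and then exponentiated, so it is not symmetric around the odds ratio.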









Is there a Wald test that corresponds to either of these?

H0: β1 =0

H0: β2 = β3 = β4 = 0


Model 5

• Write out the logistic regression model that allows the odds ratio for SBP≥140 vs. SBP<140 to vary according to smoking category.

logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3 + β5SBP*smoke1 + β6SBP*smoke2 + β7SBP*smoke3

• Interpret the model parameters.

β0 : log odds of CHD for non‐smokers with SBP<140
β1 : log odds ratio for CHD comparing non‐smokers with SBP≥140 to non‐smokers with SBP<140
β2 : log odds ratio for CHD comparing men with SBP<140 who smoke 1‐20 cigs/day to non‐smokers with SBP<140
β3 : log odds ratio for CHD comparing men with SBP<140 who smoke 21‐30 cigs/day to non‐smokers with SBP<140
β4 : log odds ratio for CHD comparing men with SBP<140 who smoke >30 cigs/day to non‐smokers with SBP<140
β5 : the difference in the log odds ratio for CHD (SBP≥140 vs. SBP<140) between men who smoke 1‐20 cigs/day and non‐smokers, so the SBP log odds ratio among men who smoke 1‐20 cigs/day is β1 + β5
β6 : the difference in the log odds ratio for CHD (SBP≥140 vs. SBP<140) between men who smoke 21‐30 cigs/day and non‐smokers, so the SBP log odds ratio in that group is β1 + β6
β7 : the difference in the log odds ratio for CHD (SBP≥140 vs. SBP<140) between men who smoke >30 cigs/day and non‐smokers, so the SBP log odds ratio in that group is β1 + β7



Model log odds of 9‐year CHD risk by SBP and smoking category:

            0 cigs/day   1‐20 cigs/day        21‐30 cigs/day       >30 cigs/day
SBP<140     β0           β0 + β2              β0 + β3              β0 + β4
SBP≥140     β0 + β1      β0 + β1 + β2 + β5    β0 + β1 + β3 + β6    β0 + β1 + β4 + β7



LR test :

Does the odds ratio for SBP≥140 vs. SBP<140 depend on smoking category?

‐ Null hypothesis:

H0: β5 = β6 = β7 = 0

HA: at least one βi (i = 5, 6, 7) ≠ 0

‐ Reduced model to compare with Model 5 (this is Model 4):

logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3


. xi:logit case BP i.smoke i.smoke*BP [freq=count]

Iteration 4: log likelihood = -854.14373




        case       Coef.   Std. Err.        z   P>|z|   [95% Conf. Interval]
          BP    .8262888    .2174909     3.80   0.000    .4000145   1.252563
   _Ismoke_1    .2992306    .2094823     1.43   0.153   -.1113471   .7098083
   _Ismoke_2    1.059936    .2139907     4.95   0.000    .6405219    1.47935
   _Ismoke_3    .7198859    .2689048     2.68   0.007     .192842    1.24693
  _IsmoXBP_1    .3506926    .3384628     1.04   0.300   -.3126822   1.014067
  _IsmoXBP_2   -.9121047    .4347924    -2.10   0.036   -1.764282  -.0599273
  _IsmoXBP_3    .3574813    .4154272     0.86   0.390    -.456741   1.171704
       _cons   -3.002268    .1311784   -22.89   0.000   -3.259373  -2.745163




. lrtest model5 model4

Likelihood-ratio test                      LR chi2(3)  = 9.45
(Assumption: model4 nested in model5)      Prob > chi2 = 0.0239

H0: β5 = β6 = β7 = 0
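Under Model 5 the odds ratio for high SBP within a smoking category is the exponential of β1 plus that category's interaction coefficient. The Python sketch below computes these from the point estimates in the Model 5 table; the variable names are illustrative. Confidence intervals for the three smoking groups are not attempted, since they would also need the covariance between the BP and interaction estimates, which the output above does not show.

```python
import math

beta_sbp = 0.8262888  # BP: log OR for high SBP among non-smokers
interaction = {
    "0 cigs/day": 0.0,             # reference category, no interaction term
    "1-20 cigs/day": 0.3506926,    # _IsmoXBP_1
    "21-30 cigs/day": -0.9121047,  # _IsmoXBP_2
    ">30 cigs/day": 0.3574813,     # _IsmoXBP_3
}

# OR for SBP>=140 vs. SBP<140 within each smoking category
sbp_or = {group: math.exp(beta_sbp + b) for group, b in interaction.items()}

for group, ratio in sbp_or.items():
    print(group, round(ratio, 2))
```

These agree with the stratum‐specific odds ratios quoted in the summary for the first three groups. The >30 cigs/day value computed this way is nearer 3.27 than the 3.25 quoted there, presumably a rounding difference.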

Model 1: logit(π(x))=β0

twoway (scatter fitted1 BP) (lowess fitted1 BP), ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)

Model 2: logit(π(x)) = β0 + β1smoke1 + β2smoke2 + β3smoke3

twoway (scatter fitted2 smoke, mcolor(pink)), ytitle(Estimated Logit(P(CHD))) xtitle(Smoking category)

Model 3: Logit(π(x))=β0+ β1SBP

twoway (scatter fitted3 BP, mcolor(pink)), ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)

Model 4: logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3

twoway (scatter fitted4 BP if smoke==0, mcolor(pink)) ///
       (lowess fitted4 BP if smoke==0, mcolor(pink)) ///
       (scatter fitted4 BP if smoke==1, mcolor(black)) ///
       (lowess fitted4 BP if smoke==1, mcolor(black)) ///
       (scatter fitted4 BP if smoke==2, mcolor(red)) ///
       (lowess fitted4 BP if smoke==2, mcolor(red)) ///
       (scatter fitted4 BP if smoke==3, mcolor(green)) ///
       (lowess fitted4 BP if smoke==3, mcolor(green)), ///
       ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)

Model 5: logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3 + β5SBP*smoke1 + β6SBP*smoke2 + β7SBP*smoke3

twoway (scatter fitted5 BP if smoke==0, mcolor(pink)) ///
       (lowess fitted5 BP if smoke==0, mcolor(pink)) ///
       (scatter fitted5 BP if smoke==1, mcolor(black)) ///
       (lowess fitted5 BP if smoke==1, mcolor(black)) ///
       (scatter fitted5 BP if smoke==2, mcolor(red)) ///
       (lowess fitted5 BP if smoke==2, mcolor(red)) ///
       (scatter fitted5 BP if smoke==3, mcolor(green)) ///
       (lowess fitted5 BP if smoke==3, mcolor(green)), ///
       ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)

Summary

• How would you summarize the association between 9‐year CHD risk and elevated SBP, and why?

Adjusting for smoking level, the estimated OR for CHD comparing men with elevated SBP to men with normal SBP is 2.30 (95% CI (1.75, 3.01)). There was some evidence that the effect of SBP differs by smoking level (LR chi2(3) = 9.45, p = 0.024), but the pattern does not suggest a dose effect, as we might have expected. Within the 21‐30 cigarettes/day group, men with high blood pressure had slightly lower estimated odds of CHD (estimated OR = 0.92). This pattern was not seen in either of the adjacent smoking groups (estimated OR = 3.24 and 3.25 for 1‐20 and >30 cigs/day, respectively). Similarly, among nonsmokers we estimated higher odds of CHD for men with elevated SBP (OR = 2.28, 95% CI (1.49, 3.50)).
