Biostat 513 Discussion, Week 5 (4/28 & 4/29)
Review: Likelihoods and the Likelihood Ratio Test
• Given a regression model and observed data, the "likelihood function" gives the probability of the observed data, viewed as a function of the β parameters.
‐ Maximum likelihood estimates of the parameters are the values that maximize this likelihood function. In other words, they are the β values under which the observed data are most probable.
‐ Maximizing the likelihood function is the same as maximizing the log of the likelihood function, because the log is a strictly increasing function.
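To make the maximization concrete, the sketch below fits the simplest case, an intercept-only logistic model, by Newton-Raphson on the log-likelihood. It assumes 257 CHD cases among the 3154 men; that count is not printed in these notes but is back-calculated from the Model 1 intercept (3154 × e^β0 / (1 + e^β0) ≈ 257). The helper names are illustrative only.

```python
import math

# Assumption: 257 CHD cases among 3154 men, back-calculated from the
# Model 1 intercept (-2.422355); the notes do not print the raw counts.
N_TOTAL = 3154
N_CASES = 257


def expit(eta: float) -> float:
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(beta0: float) -> float:
    """Log-likelihood of logit(pi) = beta0 for the assumed counts."""
    p = expit(beta0)
    return N_CASES * math.log(p) + (N_TOTAL - N_CASES) * math.log(1.0 - p)


def fit_intercept(tol: float = 1e-10, max_iter: int = 50) -> float:
    """Maximize the log-likelihood in beta0 by Newton-Raphson."""
    beta0 = 0.0
    for _ in range(max_iter):
        p = expit(beta0)
        score = N_CASES - N_TOTAL * p          # first derivative of log L
        information = N_TOTAL * p * (1.0 - p)  # minus the second derivative
        step = score / information
        beta0 += step
        if abs(step) < tol:
            break
    return beta0


beta0_hat = fit_intercept()
max_ll = log_likelihood(beta0_hat)
print(f"beta0_hat   = {beta0_hat:.6f}")  # Stata's Model 1 output reports _cons = -2.422355
print(f"max log lik = {max_ll:.5f}")     # Stata's Model 1 output reports -890.62187
# The likelihood itself, exp(max_ll), underflows to 0.0 in double precision,
# one practical reason software works with the log-likelihood instead.
print(math.exp(max_ll))
```

For this one-parameter model the maximizer also has a closed form, β0 = log(257/2897), which the Newton iterations reproduce.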
• Likelihood Ratio (LR) Test:
‐ Compares two nested models, the "full" and "reduced" model, for example:
full: logit(π(X)) = β0 + β1X1 + β2X2
reduced: logit(π(X)) = β0 + β1X1
‐ Given the observed data, the log‐likelihood is maximized under each model.
The likelihood ratio statistic can be used to test the null hypothesis that the extra parameters in the full model equal 0. It is given by:
LR = ‐2 × [(max. log likelihood of reduced model) − (max. log likelihood of full model)]
Under the null hypothesis, this statistic has approximately (in large samples) a chi2(df) distribution, where df = the difference in the number of parameters between the full and reduced models.
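A minimal sketch of the LR test using only the Python standard library. The chi-square upper-tail probability uses the closed forms that exist for integer degrees of freedom; `chi2_sf` and `lr_test` are hypothetical helper names, and the log likelihoods are copied from the Stata output in this handout (Models 1, 2, 4 and 5).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), integer df >= 1, via closed forms."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 0.0
        for j in range(df // 2):
            if j > 0:
                term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus terms exp(-x/2) (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lr_test(ll_reduced: float, ll_full: float, df: int):
    """LR statistic -2 * (ll_reduced - ll_full) and its chi-square p-value."""
    stat = -2.0 * (ll_reduced - ll_full)
    return stat, chi2_sf(stat, df)


# Maximized log likelihoods copied from the Stata output in these notes
stat, p = lr_test(ll_reduced=-890.62187, ll_full=-875.84738, df=3)  # Model 1 vs Model 2
print(f"Model 2 vs 1: LR chi2(3) = {stat:.2f}, p = {p:.1e}")
stat, p = lr_test(ll_reduced=-858.86651, ll_full=-854.14373, df=3)  # Model 4 vs Model 5
print(f"Model 5 vs 4: LR chi2(3) = {stat:.2f}, p = {p:.4f}")
```

The two calls reproduce the `lrtest model2 model1` and `lrtest model5 model4` statistics (29.55 and 9.45, p = 0.0239).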
Review: Wald Test
The Wald test is used to test the null hypothesis that an individual regression parameter, or a group of parameters, equals 0.
For a single parameter β, the Wald statistic is:
z = β_hat / SE(β_hat)
In large samples, this has approximately a N(0,1) distribution under the null hypothesis that β = 0.
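A short sketch of the single-parameter Wald test and its matching 95% confidence interval, applied to the high-SBP coefficient from Model 3 (values copied from the Stata output later in these notes). `wald_single` is a hypothetical helper; the normal quantile comes from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def wald_single(coef: float, se: float, level: float = 0.95):
    """Wald z, two-sided p-value, and CI for one coefficient (normal approximation)."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))      # = 2 * P(Z > |z|)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% CI
    return z, p_value, (coef - z_crit * se, coef + z_crit * se)


# Model 3 coefficient for high SBP (BP) and its standard error
z, p, (lo, hi) = wald_single(0.8316911, 0.1368846)
print(f"z = {z:.2f}, p = {p:.1e}")
print(f"log OR 95% CI: ({lo:.6f}, {hi:.6f})")
print(f"OR = {math.exp(0.8316911):.2f}, 95% CI ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
```

This reproduces Stata's z = 6.08 and interval (.5634022, 1.09998). A Wald test for a group of parameters also needs their estimated covariance matrix, which the printed output does not show.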
Exercises: Western Collaborative Group Study (WCGS)
• The WCGS study recruited 3154 middle‐aged men (aged 39‐59) during 1960‐1961. They were followed for up to 9 years for incident coronary heart disease (CHD). High systolic blood pressure (SBP ≥ 140 mmHg) is the risk factor of interest.
• Variables of interest:
– Disease: 0 = no CHD; 1 = CHD
– High SBP: 0 = SBP<140; 1 = SBP≥140
– Smoke: 0/1/2/3 = nonsmoker / 1‐20 / 21‐30 / >30 cigarettes per day
We will explore 5 possible logistic regression models for the log odds of CHD as it depends on SBP and smoking. For each of these:
‐ What is the model?
‐ Interpret the parameters in each model
‐ Hypothesis testing via likelihood ratio tests
‐ Plot fitted values for each model
Western Collaborative Group Study (WCGS): Model 1
• Write out the logistic regression model for the log odds of CHD, where CHD does not depend on SBP or smoking.
logit(π(x)) = β0
• What is the interpretation of the parameter?
β0 : log odds of 9‐year CHD risk for the men in our data set.
• Can you do an LR test for this model? A Wald test? If so, what is the null hypothesis?
Model 1: logit(π(x)) = β0
. logit case [freq = count]
Iteration 0:   log likelihood = -890.62187
Logistic regression                      Number of obs =   3154
                                         LR chi2(0)    =   0.00
                                         Prob > chi2   =      .
Log likelihood = -890.62187              Pseudo R2     = 0.0000
case        Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
_cons       -2.422355   .0650864    -37.22   0.000   -2.549922   -2.294788
. predict fitted1, xb
. estimates store model1
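Wald test: H0: β0 = 0 vs. HA: β0 ≠ 0 (β0 = 0 corresponds to a 9‐year CHD risk of 0.5). No LR test is possible, because there is no smaller nested model to compare with (hence LR chi2(0) = 0.00 in the output).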
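The intercept-only model is easiest to read on the probability scale. This small sketch (coefficient and standard error copied from the Model 1 output) converts β0 into odds and risk, and shows what the Wald null hypothesis β0 = 0 means: a risk of 0.5, far from the estimated risk.

```python
import math

# Model 1 intercept and standard error, copied from the Stata output
b0, se_b0 = -2.422355, 0.0650864

odds = math.exp(b0)          # estimated odds of CHD over follow-up
risk = odds / (1.0 + odds)   # estimated 9-year CHD risk
z = b0 / se_b0               # Wald statistic for H0: beta0 = 0

print(f"odds = {odds:.4f}, risk = {risk:.4f}, z = {z:.2f}")
# Under H0: beta0 = 0 the odds are exp(0) = 1, i.e. a risk of 0.5
print(f"risk under H0 = {math.exp(0.0) / (1.0 + math.exp(0.0))}")
```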
Western Collaborative Group Study (WCGS): Model 2
• Is 9‐year CHD risk associated with smoking?
• Write out the logistic regression model that allows the log odds of CHD to depend on smoking category only.
logit(π(x)) = β0 + β1smoke1 + β2smoke2 + β3smoke3
(smoke1, smoke2, smoke3 are 0/1 indicators for 1‐20, 21‐30, and >30 cigs/day; non‐smokers are the reference group.)
• Interpret the model parameters.
β0 : log odds of CHD for non‐smokers.
β1 : log odds ratio of CHD comparing men who smoke 1‐20 cigs/day to non‐smokers.
β2 : log odds ratio of CHD comparing men who smoke 21‐30 cigs/day to non‐smokers.
β3 : log odds ratio of CHD comparing men who smoke >30 cigs/day to non‐smokers.
logit(π(x)) = β0 + β1smoke1 + β2smoke2 + β3smoke3
LR test:
Is there an association between smoking and CHD?
‐Null hypothesis:
H0: β1 = β2 = β3 = 0
HA: at least one βi (i = 1, 2, 3) ≠ 0
‐ Reduced model to compare with Model 2:
Logit(π(x))=β0
. xi: logit case i.smoke [freq = count]
i.smoke           _Ismoke_0-3         (naturally coded; _Ismoke_0 omitted)
Iteration 0:   log likelihood = -890.62187
Iteration 1:   log likelihood = -876.52013
Iteration 2:   log likelihood = -875.84853
Iteration 3:   log likelihood = -875.84738
Logistic regression                      Number of obs =   3154
                                         LR chi2(3)    =  29.55
                                         Prob > chi2   = 0.0000
Log likelihood = -875.84738              Pseudo R2     = 0.0166
case        Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
_Ismoke_1   .4122448    .1627693    2.53     0.011   .0932229    .7312667
_Ismoke_2   .8035253    .1834786    4.38     0.000   .4439138    1.163137
_Ismoke_3   .8937922    .2010989    4.44     0.000   .4996455    1.287939
_cons       -2.76362    .1041517    -26.53   0.000   -2.967754   -2.559486
. predict fitted2, xb
. estimates store model2
. lrtest model2 model1
Likelihood-ratio test                    LR chi2(3)  =  29.55
(Assumption: model1 nested in model2)    Prob > chi2 = 0.0000
H0: β1 = β2 = β3 = 0
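The Model 2 coefficients can be turned into odds ratios and fitted risks per smoking category. Because Model 2 has one parameter for each smoking category, its fitted risks equal the observed CHD proportions in those categories. A sketch with the coefficients copied from the output above:

```python
import math

# Model 2 estimates copied from the Stata output (non-smokers are the reference)
b0 = -2.76362
b_smoke = {"1-20": 0.4122448, "21-30": 0.8035253, ">30": 0.8937922}


def expit(eta: float) -> float:
    """Inverse logit: log odds -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


risk = {"0": expit(b0)}
odds_ratio = {}
for group, b in b_smoke.items():
    odds_ratio[group] = math.exp(b)   # OR vs non-smokers
    risk[group] = expit(b0 + b)       # fitted 9-year risk in this group

for group, r in risk.items():
    line = f"{group:>5} cigs/day: fitted risk = {r:.4f}"
    if group in odds_ratio:
        line += f", OR vs non-smokers = {odds_ratio[group]:.2f}"
    print(line)
```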
Western Collaborative Group Study (WCGS): Model 3
• Is 9‐year CHD risk associated with high SBP?
• Write out the logistic regression model that allows the log odds of CHD to depend on high SBP only.
logit(π(x))=β0+ β1SBP
• Interpret the model parameters.
β0 : log odds of CHD for men with normal SBP.
β1 : log odds ratio of CHD comparing men with high SBP to men with normal SBP.
logit(π(x)) = β0 + β1SBP
LR test:
Is there an association between high SBP and CHD?
‐Null hypothesis:
H0: β1 = 0
HA: β1≠ 0
‐ Reduced model to compare with Model 3:
Logit(π(x))=β0
. logit case BP [freq=count]
Iteration 0:   log likelihood = -890.62187
Iteration 1:   log likelihood = -874.67431
Iteration 2:   log likelihood = -873.43429
Iteration 3:   log likelihood = -873.43065
Logistic regression                      Number of obs =   3154
                                         LR chi2(1)    =  34.38
                                         Prob > chi2   = 0.0000
Log likelihood = -873.43065              Pseudo R2     = 0.0193
case        Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
BP          .8316911    .1368846    6.08     0.000   .5634022    1.09998
_cons       -2.65926    .0815228    -32.62   0.000   -2.819042   -2.499478
. predict fitted3, xb
. estimates store model3
. lrtest model3 model1
Likelihood-ratio test                    LR chi2(1)  =  34.38
(Assumption: model1 nested in model3)    Prob > chi2 = 0.0000
H0: β1 = 0
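The Model 3 estimates give fitted 9‐year risks for the two SBP groups. This sketch (coefficients copied from the output above) also shows that exp(β1) is an odds ratio, not a risk ratio; the two differ even though CHD is fairly uncommon here.

```python
import math

# Model 3 estimates copied from the Stata output
b0, b1 = -2.65926, 0.8316911


def expit(eta: float) -> float:
    """Inverse logit: log odds -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


risk_normal = expit(b0)       # SBP < 140
risk_high = expit(b0 + b1)    # SBP >= 140
odds_ratio = math.exp(b1)
risk_ratio = risk_high / risk_normal

print(f"risk, SBP<140:  {risk_normal:.4f}")
print(f"risk, SBP>=140: {risk_high:.4f}")
print(f"OR = {odds_ratio:.2f}, risk ratio = {risk_ratio:.2f}")
```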
Western Collaborative Group Study (WCGS): Model 4
• Write out the logistic regression model for the log odds of CHD including our predictor of interest, high SBP, adjusted for smoking category.
logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3
• Interpret the model parameters.
β0 : log odds of CHD for non‐smokers with SBP<140
β1 : log odds ratio for CHD comparing men with SBP≥140 to men with SBP<140, adjusted for smoking group
β2 : log odds ratio for CHD comparing men who smoke 1‐20 cigs/day to non‐smokers, adjusted for SBP
β3 : log odds ratio for CHD comparing men who smoke 21‐30 cigs/day to non‐smokers, adjusted for SBP
β4 : log odds ratio for CHD comparing men who smoke >30 cigs/day to non‐smokers, adjusted for SBP
logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3
Modeled log odds of 9‐year CHD risk by SBP and smoking category:
Smoking category:   0 cigs/day   1‐20 cigs/day   21‐30 cigs/day   >30 cigs/day
SBP<140             β0           β0 + β2         β0 + β3          β0 + β4
SBP≥140             β0 + β1      β0 + β1 + β2    β0 + β1 + β3     β0 + β1 + β4
Logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3
LR test :
Is smoking associated with CHD incidence after adjusting for whether a man has high SBP or not?
‐Null hypothesis:
H0: β2 = β3 = β4 = 0
HA: at least one βi (i = 2, 3, 4) ≠ 0
‐ Reduced model to compare with Model 4:
Logit(π(x))=β0 + β1SBP
Logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3
LR test :
Is high SBP associated with CHD incidence after adjusting for smoking level?
‐Null hypothesis:
H0: β1=0
HA: β1≠ 0
‐ Reduced model to compare with Model 4:
Logit(π(x))=β0+ β2smoke1+ β3smoke2+ β4smoke3
. xi: logit case BP i.smoke [freq=count]
i.smoke           _Ismoke_0-3         (naturally coded; _Ismoke_0 omitted)
Iteration 0:   log likelihood = -890.62187
Iteration 1:   log likelihood = -861.60811
Iteration 2:   log likelihood = -858.87736
Iteration 3:   log likelihood = -858.86651
Logistic regression                      Number of obs =   3154
                                         LR chi2(4)    =  63.51
                                         Prob > chi2   = 0.0000
Log likelihood = -858.86651              Pseudo R2     = 0.0357
case        Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
BP          .8318699    .1379075    6.03     0.000   .5615762    1.102164
_Ismoke_1   .4277964    .16374      2.61     0.009   .1068718    .748721
_Ismoke_2   .8213089    .1848922    4.44     0.000   .4589269    1.183691
_Ismoke_3   .8669904    .2028092    4.27     0.000   .4694917    1.264489
_cons       -3.004302   .1161083    -25.87   0.000   -3.23187    -2.776733
. predict fitted4, xb
. estimates store model4
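The Model 4 table of β sums can be filled in numerically. This plain-Python sketch uses the coefficients copied from the output above and checks the defining property of this no-interaction model: the SBP log odds ratio, β1, is the same in every smoking column.

```python
# Model 4 estimates copied from the Stata output
b0, b_sbp = -3.004302, 0.8318699
b_smoke = [0.0, 0.4277964, 0.8213089, 0.8669904]  # smoke = 0 (reference), 1, 2, 3

labels = ["0", "1-20", "21-30", ">30"]
log_odds = {
    (sbp, smoke): b0 + b_sbp * sbp + b_smoke[smoke]
    for sbp in (0, 1)
    for smoke in range(4)
}

print("cigs/day   " + "  ".join(f"{lab:>8}" for lab in labels))
for sbp, name in ((0, "SBP<140"), (1, "SBP>=140")):
    print(f"{name:<9}  " + "  ".join(f"{log_odds[(sbp, s)]:8.3f}" for s in range(4)))

# Without interaction terms, the SBP contrast equals b_sbp in every column
sbp_contrast = [log_odds[(1, s)] - log_odds[(0, s)] for s in range(4)]
print([round(c, 7) for c in sbp_contrast])
```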
Logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3
. lrtest model4 model3
Likelihood-ratio test                    LR chi2(3)  =  29.13
(Assumption: model3 nested in model4)    Prob > chi2 = 0.0000
. lrtest model4 model2
Likelihood-ratio test                    LR chi2(1)  =  33.96
(Assumption: model2 nested in model4)    Prob > chi2 = 0.0000
Is there a Wald test that corresponds to either of these?
H0: β1 = 0 (tested by lrtest model4 model2)
H0: β2 = β3 = β4 = 0 (tested by lrtest model4 model3)
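For H0: β1 = 0 in Model 4, the Wald statistic can be put on the chi2(1) scale (z²) and compared with the LR statistic from `lrtest model4 model2`. The two are asymptotically equivalent but not identical in a finite sample. The joint Wald test of β2 = β3 = β4 = 0 needs the coefficient covariance matrix, which is not printed; in Stata it could be obtained after fitting Model 4 with `test _Ismoke_1 _Ismoke_2 _Ismoke_3`. A sketch with values copied from the output:

```python
import math

# Model 4: coefficient and SE for BP (beta1), copied from the Stata output
coef, se = 0.8318699, 0.1379075
# Maximized log likelihoods: Model 2 (reduced, no BP) and Model 4 (full)
ll_model2, ll_model4 = -875.84738, -858.86651

wald_chi2 = (coef / se) ** 2              # Wald statistic on the chi2(1) scale
lr_chi2 = 2.0 * (ll_model4 - ll_model2)   # same as "lrtest model4 model2"

# For 1 df, P(chi2 > x) = erfc(sqrt(x / 2))
p_wald = math.erfc(math.sqrt(wald_chi2 / 2.0))
p_lr = math.erfc(math.sqrt(lr_chi2 / 2.0))

print(f"Wald chi2(1) = {wald_chi2:.2f}, p = {p_wald:.1e}")
print(f"LR   chi2(1) = {lr_chi2:.2f}, p = {p_lr:.1e}")
```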
Western Collaborative Group Study (WCGS): Model 5
Write out the logistic regression model that allows the odds ratio for SBP≥140 vs. SBP<140 to vary according to smoking category.
logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3 +β5SBP*smoke1+ β6 SBP*smoke2+ β7SBP*smoke3
• Interpret the model parameters.
β0 : log odds of CHD for non‐smokers with SBP<140
β1 : log odds ratio for CHD comparing non‐smokers with SBP≥140 to non‐smokers with SBP<140
β2 : log odds ratio for CHD comparing men with SBP<140 who smoke 1‐20 cigs/day to non‐smokers with SBP<140
β3 : log odds ratio for CHD comparing men with SBP<140 who smoke 21‐30 cigs/day to non‐smokers with SBP<140
β4 : log odds ratio for CHD comparing men with SBP<140 who smoke >30 cigs/day to non‐smokers with SBP<140
logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3 +β5SBP*smoke1+ β6 SBP*smoke2+ β7SBP*smoke3
• Interpret the model parameters.
β5 : the log odds ratio for CHD comparing men with SBP≥140 to men with SBP<140 among men who smoke 1‐20 cigs/day, minus the same log odds ratio among non‐smokers
β6 : the log odds ratio for CHD comparing men with SBP≥140 to men with SBP<140 among men who smoke 21‐30 cigs/day, minus the same log odds ratio among non‐smokers
β7 : the log odds ratio for CHD comparing men with SBP≥140 to men with SBP<140 among men who smoke >30 cigs/day, minus the same log odds ratio among non‐smokers
logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3 +β5SBP*smoke1+ β6 SBP*smoke2+ β7SBP*smoke3
Modeled log odds of 9‐year CHD risk by SBP and smoking category:
Smoking category:   0 cigs/day   1‐20 cigs/day        21‐30 cigs/day       >30 cigs/day
SBP<140             β0           β0 + β2              β0 + β3              β0 + β4
SBP≥140             β0 + β1      β0 + β1 + β2 + β5    β0 + β1 + β3 + β6    β0 + β1 + β4 + β7
logit(π(x))=β0 + β1SBP+ β2smoke1+ β3smoke2+ β4smoke3 +β5SBP*smoke1+ β6 SBP*smoke2+ β7SBP*smoke3
LR test :
Does the odds ratio for SBP≥140 vs. SBP<140 depend on smoking category?
‐Null hypothesis:
H0: β5 = β6 = β7 = 0
HA: at least one βi (i = 5, 6, 7) ≠ 0
‐ Reduced model to compare with Model 5 (this is Model 4):
logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3
. xi:logit case BP i.smoke i.smoke*BP [freq=count]
Iteration 4:   log likelihood = -854.14373
Logistic regression                      Number of obs =   3154
                                         LR chi2(7)    =  72.96
                                         Prob > chi2   = 0.0000
Log likelihood = -854.14373              Pseudo R2     = 0.0410
case         Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
BP           .8262888    .2174909    3.80     0.000   .4000145    1.252563
_Ismoke_1    .2992306    .2094823    1.43     0.153   -.1113471   .7098083
_Ismoke_2    1.059936    .2139907    4.95     0.000   .6405219    1.47935
_Ismoke_3    .7198859    .2689048    2.68     0.007   .192842     1.24693
_IsmoXBP_1   .3506926    .3384628    1.04     0.300   -.3126822   1.014067
_IsmoXBP_2   -.9121047   .4347924    -2.10    0.036   -1.764282   -.0599273
_IsmoXBP_3   .3574813    .4154272    0.86     0.390   -.456741    1.171704
_cons        -3.002268   .1311784    -22.89   0.000   -3.259373   -2.745163
. predict fitted5, xb
. estimates store model5
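The SBP odds ratio within each smoking category follows from the Model 5 table: exp(β1) for non‐smokers and exp(β1 + β5), exp(β1 + β6), exp(β1 + β7) for the smoking groups. A sketch with coefficients copied from the output above; only the non‐smoker interval can be computed from the printed output, because the other groups also need Cov(β1, β5..β7).

```python
import math
from statistics import NormalDist

# Model 5 estimates copied from the Stata output
b_sbp, se_b_sbp = 0.8262888, 0.2174909  # beta1: SBP effect in non-smokers
b_inter = {"1-20": 0.3506926, "21-30": -0.9121047, ">30": 0.3574813}  # beta5..beta7

sbp_or = {"0": math.exp(b_sbp)}
for group, b in b_inter.items():
    sbp_or[group] = math.exp(b_sbp + b)  # OR for SBP>=140 vs <140 within this group

# 95% CI for the reference (non-smoker) group only
z = NormalDist().inv_cdf(0.975)
ci_nonsmokers = (math.exp(b_sbp - z * se_b_sbp), math.exp(b_sbp + z * se_b_sbp))

for group, value in sbp_or.items():
    print(f"{group:>5} cigs/day: SBP OR = {value:.2f}")
print("non-smokers 95% CI: ({:.2f}, {:.2f})".format(*ci_nonsmokers))
```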
. lrtest model5 model4
Likelihood-ratio test                    LR chi2(3)  =   9.45
(Assumption: model4 nested in model5)    Prob > chi2 = 0.0239
H0: β5 = β6 = β7 = 0
Model 1: logit(π(x))=β0
twoway (scatter fitted1 BP) (lowess fitted1 BP), ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)
Model 2: logit(π(x)) = β0 + β1smoke1 + β2smoke2 + β3smoke3
twoway (scatter fitted2 smoke, mcolor(pink)), ytitle(Estimated Logit(P(CHD))) xtitle(Smoking category)
Model 3: Logit(π(x))=β0+ β1SBP
twoway (scatter fitted3 BP, mcolor(pink)), ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)
Model 4: logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3
twoway (scatter fitted4 BP if smoke==0, mcolor(pink)) (lowess fitted4 BP if smoke==0, mcolor(pink)) (scatter fitted4 BP if smoke==1, mcolor(black)) (lowess fitted4 BP if smoke==1, mcolor(black)) (scatter fitted4 BP if smoke==2, mcolor(red)) (lowess fitted4 BP if smoke==2, mcolor(red)) (scatter fitted4 BP if smoke==3, mcolor(green)) (lowess fitted4 BP if smoke==3, mcolor(green)), ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)
Model 5: logit(π(x)) = β0 + β1SBP + β2smoke1 + β3smoke2 + β4smoke3 + β5SBP*smoke1 + β6SBP*smoke2 + β7SBP*smoke3
twoway (scatter fitted5 BP if smoke ==0, mcolor(pink)) (lowess fitted5 BP if smoke==0, mcolor(pink)) (scatter fitted5 BP if smoke ==1, mcolor(black)) (lowess fitted5 BP if smoke==1, mcolor(black))(scatter fitted5 BP if smoke ==2, mcolor(red)) (lowess fitted5 BP if smoke==2, mcolor(red))(scatter fitted5 BP if smoke ==3, mcolor(green)) (lowess fitted5 BP if smoke==3, mcolor(green)), ytitle(Estimated Logit(P(CHD))) xtitle(High SBP)
Western Collaborative Group Study (WCGS): Summary
• How would you summarize the association between 9‐year CHD risk and elevated SBP, and why?
Adjusting for smoking level, the estimated OR for CHD comparing men with elevated SBP to men with normal SBP is 2.30 (95% CI: 1.75, 3.01). There was some evidence that the SBP odds ratio differs by smoking level (LR test of the interaction terms, chi2(3) = 9.45, p = 0.024), but the pattern does not show the dose‐response trend we might have expected. Within the 21‐30 cigarettes/day group, men with high SBP had slightly lower estimated odds of CHD than men with normal SBP (estimated OR = 0.92). This pattern was not seen in either adjacent smoking group (estimated OR = 3.24 and 3.27 for 1‐20 and >30 cigs/day, respectively). Among non‐smokers, we also estimated higher odds of CHD for men with elevated SBP (OR = 2.28, 95% CI: 1.49, 3.50).