Upload
mili
View
21
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Lecture 15: Logistic Regression: Inference and link functions. BMTRY 701 Biostatistical Methods II. More on our example. > pros5.reg summary(pros5.reg) Call: glm(formula = cap.inv ~ log(psa) + gleason, family = binomial) - PowerPoint PPT Presentation
Citation preview
Lecture 15:Logistic Regression: Inference and link functions
BMTRY 701Biostatistical Methods II
More on our example
> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)> summary(pros5.reg)
Call:glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -8.1061 0.9916 -8.174 2.97e-16 ***log(psa) 0.4812 0.1448 3.323 0.000892 ***gleason 1.0229 0.1595 6.412 1.43e-10 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 512.29 on 379 degrees of freedomResidual deviance: 403.90 on 377 degrees of freedomAIC: 409.9
Other covariates: Simple logistic models
Covariate Beta exp(Beta) Z
Age -0.0082 0.99 -0.51
Race -0.054 0.95 -0.15
Vol -0.014 0.99 -2.26
Dig Exam (vs. no nodule)
Unilobar left 0.88 2.41 2.81
Unilobar right 1.56 4.76 4.78
Bilobar 2.10 8.17 5.44
Detection in RE 1.71 5.53 4.48
LogPSA 0.87 2.39 6.62
Gleason 1.24 3.46 8.12
What is a good multiple regression model?
Principles of model building are analogous to linear regression
We use the same approach• Look for significant covariates in simple models• consider multicollinearity• look for confounding (i.e. change in betas when a
covariate is removed)
Multiple regression model proposal
Gleason, logPSA, Volume, Digital Exam result, detection in RE
But, what about collinearity? 5 choose 2 pairs.
gleason log.psa. volgleason 1.00 0.46 -0.06log.psa. 0.46 1.00 0.05vol -0.06 0.05 1.00
gleason
-1 0 1 2 3 4 5
02
46
8
-10
12
34
5
log.psa.
0 2 4 6 8 0 20 40 60 80 100
020
4060
8010
0
vol
Categorical pairs
> dpros.dcaps <- epitab(dpros, dcaps)> dpros.dcaps$tab OutcomePredictor 1 p0 2 p1 oddsratio lower upper 1 95 0.2802360 4 0.09756098 1.000000 NA NA 2 123 0.3628319 9 0.21951220 1.737805 0.5193327 5.815089 3 84 0.2477876 12 0.29268293 3.392857 1.0540422 10.921270 4 37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987 OutcomePredictor p.value 1 NA 2 4.050642e-01 3 3.777900e-02 4 1.271225e-05> fisher.test(table(dpros, dcaps))
Fisher's Exact Test for Count Data
data: table(dpros, dcaps) p-value = 2.520e-05alternative hypothesis: two.sided
Categorical vs. continuous
t-tests and anova: means by category> summary(lm(log(psa)~dcaps))Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.2506 0.1877 6.662 9.55e-11 ***dcaps 0.8647 0.1632 5.300 1.97e-07 ***---Residual standard error: 0.9868 on 378 degrees of freedomMultiple R-squared: 0.06917, Adjusted R-squared: 0.06671 F-statistic: 28.09 on 1 and 378 DF, p-value: 1.974e-07
> summary(lm(log(psa)~factor(dpros)))Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.1418087 0.0992064 21.589 < 2e-16 ***factor(dpros)2 -0.1060634 0.1312377 -0.808 0.419 factor(dpros)3 0.0001465 0.1413909 0.001 0.999 factor(dpros)4 0.7431101 0.1680055 4.423 1.28e-05 ***---Residual standard error: 0.9871 on 376 degrees of freedomMultiple R-squared: 0.07348, Adjusted R-squared: 0.06609 F-statistic: 9.94 on 3 and 376 DF, p-value: 2.547e-06
Categorical vs. continuous
> summary(lm(vol~dcaps))Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 22.905 3.477 6.587 1.51e-10 ***dcaps -6.362 3.022 -2.106 0.0359 * ---Residual standard error: 18.27 on 377 degrees of freedom (1 observation deleted due to missingness)Multiple R-squared: 0.01162, Adjusted R-squared: 0.009003 F-statistic: 4.434 on 1 and 377 DF, p-value: 0.03589
> summary(lm(vol~factor(dpros)))Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 17.417 1.858 9.374 <2e-16 ***factor(dpros)2 -1.638 2.453 -0.668 0.505 factor(dpros)3 -1.976 2.641 -0.748 0.455 factor(dpros)4 -3.513 3.136 -1.120 0.263 ---Residual standard error: 18.39 on 375 degrees of freedom (1 observation deleted due to missingness)Multiple R-squared: 0.003598, Adjusted R-squared: -0.004373 F-statistic: 0.4514 on 3 and 375 DF, p-value: 0.7164
Categorical vs. continuous
> summary(lm(gleason~dcaps))Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.2560 0.1991 26.401 < 2e-16 ***dcaps 1.0183 0.1730 5.885 8.78e-09 ***---Residual standard error: 1.047 on 378 degrees of freedomMultiple R-squared: 0.08394, Adjusted R-squared: 0.08151 F-statistic: 34.63 on 1 and 378 DF, p-value: 8.776e-09
> summary(lm(gleason~factor(dpros)))
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.9798 0.1060 56.402 < 2e-16 ***factor(dpros)2 0.4217 0.1403 3.007 0.00282 ** factor(dpros)3 0.4890 0.1511 3.236 0.00132 ** factor(dpros)4 0.9636 0.1795 5.367 1.40e-07 ***---
Residual standard error: 1.055 on 376 degrees of freedomMultiple R-squared: 0.07411, Adjusted R-squared: 0.06672 F-statistic: 10.03 on 3 and 376 DF, p-value: 2.251e-06
Lots of “correlation” between covariates
We should expect that there will be insignificance and confounding.
Still, try the ‘full model’ and see what happens
Full model results
> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros), family=binomial)
> > summary(mreg)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -8.617036 1.102909 -7.813 5.58e-15 ***gleason 0.908424 0.166317 5.462 4.71e-08 ***log(psa) 0.514200 0.156739 3.281 0.00104 ** vol -0.014171 0.007712 -1.838 0.06612 . dcaps 0.464952 0.456868 1.018 0.30882 factor(dpros)2 0.753759 0.355762 2.119 0.03411 * factor(dpros)3 1.517838 0.372366 4.076 4.58e-05 ***factor(dpros)4 1.384887 0.453127 3.056 0.00224 ** ---
Null deviance: 511.26 on 378 degrees of freedomResidual deviance: 376.00 on 371 degrees of freedom (1 observation deleted due to missingness)AIC: 392
What next?
Drop or retain? How to interpret?
Likelihood Ratio Test
Recall testing multiple coefficients in linear regression
Approach: ANOVA We don’t have ANOVA for logistic More general approach: Likelihood Ratio Test Based on the likelihood (or log-likelihood) for
“competing” nested models
Likelihood Ratio Test
Ho: small model Ha: large model Example:
)4()3()2(
log)(logit
654
3210
DIDIDI
volPSAGSpi
0and/or ;0;0:
0:
6541
6540
H
H
Recall the likelihood function
n
iiii
n
i i
yi
xxyxyL
x
xxyL
i
1101010
1 10
1010
))exp(1log()(),;,(log
)exp(1
)exp(),;,(
Estimating the log-likelihood
Recall that we use the log-likelihood because it is simpler (back to linear regression)
MLEs:• Betas are selected to maximize the likelihood• Betas also maximize the log-likelihood• If we plus the estimated betas, we get our ‘maximized’
log-likelihood for that model
We compare the log-likelihoods from competing (nested) models
Likelihood Ratio Test
LR statistic = G2 = -2*(LogL(H0)-LogL(H1))
Under the null: G2 ~ χ2(p-q)
If G2 < χ2(p-q),1-α, conclude H0
If G2 > χ2(p-q),1-α conclude H1
LRT in R
-2LogL = Residual Deviance So, G2 = Dev(0) - Dev(1) Fit two models:
0and/or ;0;0:
0:
6541
6540
H
H
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),+ family=binomial)> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)> mreg1Coefficients: (Intercept) gleason log(psa) vol -8.31383 0.93147 0.53422 -0.01507 factor(dpros)2 factor(dpros)3 factor(dpros)4 0.76840 1.55109 1.44743
Degrees of Freedom: 378 Total (i.e. Null); 372 Residual (1 observation deleted due to missingness)Null Deviance: 511.3 Residual Deviance: 377.1 AIC: 391.1
> mreg0Coefficients:(Intercept) gleason log(psa) vol -7.76759 0.99931 0.50406 -0.01583
Degrees of Freedom: 378 Total (i.e. Null); 375 Residual (1 observation deleted due to missingness)Null Deviance: 511.3 Residual Deviance: 399 AIC: 407
Testing DPROS
Dev(0) – Dev(1) =
p – q =
χ2(p-q),1-α, =
Conclusion?
p-value?
More in R
qchisq(0.95,3)-2*(logLik(mreg0) - logLik(mreg1))1-pchisq(21.96, 3)
> anova(mreg0, mreg1)Analysis of Deviance Table
Model 1: cap.inv ~ gleason + log(psa) + volModel 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros) Resid. Df Resid. Dev Df Deviance1 375 399.02 2 372 377.06 3 21.96>
Notes on LRT
Again, models have to be NESTED For comparing models that are not nested, you
need to use other approaches Examples:
• AIC • BIC• DIC
Next time….
For next time, read the following article
Low Diagnostic Yield of Elective Coronary AngiographyPatel, Peterson, Dai et al.NEJM, 362(10). pp. 2886-95March 11, 2010
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv