Logistic Regression. Example: Birth defects We want to know if the probability of a certain birth...

Logistic Regression

Example: Birth defects We want to know if the probability of a certain

birth defect is higher among women of a certain age Outcome (y) = presence/absence of birth defect Explanatory (x) = maternal age a birth

> bdlog<-glm(bd$casegrp~bd$MAGE,family=binomial("logit"))

Since “logit” is the default, you can actually use:> bdlog<-glm(bd$casegrp~bd$MAGE,binomial)

> summary(bdlog)

Example in RCall:

glm(formula = bd$casegrp ~ bd$MAGE, family = binomial("logit"))

Deviance Residuals:

Min 1Q Median 3Q Max

-0.56672 -0.24047 -0.14728 -0.08994 3.60539

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 0.82555 0.32491 2.541 0.0111 *

bd$MAGE -0.19793 0.01489 -13.290 <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Null deviance: 2364.1 on 11892 degrees of freedom

Residual deviance: 2130.8 on 11891 degrees of freedom

AIC: 2134.8

P1979.08255.0

Plot> plot(MAGE~ fitted(glm(casegrp~ MAGE,binomial)), xlab=“Maternal Age”, ylab=“P(Birth Defect)”, pch=15)

Example: Categorical x Sometimes, it’s easier to interpret logistic

regression output if the x variables are categorical Suppose we categorize maternal age into 3

categories:

> bd$magecat3 <- ifelse(bd$MAGE>25, c(1),c(0))

> bd$magecat2 <- ifelse(bd$MAGE>=20 & bd$MAGE<=25, c(1),c(0))

> bd$magecat1 <- ifelse(bd$MAGE<20, c(1),c(0))

Maternal Age

Birth Defect

< 20 years 20-24 years

> 24 years

Yes 101 105 36

No 1385 3755 6511

Example in R> bdlog2<-glm(casegrp~magecat1+magecat2,binomial)

> summary(bdlog2)

Remember, with a set of dummy variables, you always put in one less variable than category

Example in RCall:

glm(formula = casegrp ~ magecat1 + magecat2, family = binomial)

Deviance Residuals:

Min 1Q Median 3Q Max

-0.3752 -0.2349 -0.1050 -0.1050 3.2259

Coefficients:

(Intercept) -5.1977 0.1671 -31.101 <2e-16 ***

magecat1 2.5794 0.1964 13.137 <2e-16 ***

magecat2 1.6208 0.1942 8.345 <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

AIC: 2154.6

Interpretation: odds ratios

> exp(cbind(OR=coef(bdlog2),confint(bdlog2)))

Waiting for profiling to be done...

OR 2.5 % 97.5 %

(Intercept) 0.005529105 0.003910843 0.007544544

magecat1 13.189149619 9.066531887 19.622868917

magecat2 5.057367954 3.492376718 7.495720840

This tells us that: women <20 years have a 13 times greater odds of

a birth defect than women >24 years women 20-24 years have a 5 times greater odds

of a birth defect than women >24 years

)exp( nOR

What variables can we consider dropping?> anova(bd.log,test="Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: casegrp

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev P(>|Chi|)

NULL 11880 2355.87

magecat1 1 130.58 11879 2225.28 < 2.2e-16 ***

magecat2 1 82.73 11878 2142.56 < 2.2e-16 ***

bthparity2 1 13.37 11877 2129.18 0.0002555 ***

smoke 1 16.10 11876 2113.09 6.022e-05 ***

Small p-values indicate that all variables are needed to explain the variation in y

Goodness of fit statisticsCoefficients:

(Intercept) -4.9100 0.1870 -26.252 < 2e-16 ***

magecat1 2.2534 0.2073 10.872 < 2e-16 ***

magecat2 1.4732 0.1965 7.497 6.52e-14 ***

bthparity2parous -0.5932 0.1497 -3.962 7.45e-05 ***

smokesmoker 0.6515 0.1546 4.213 2.52e-05 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AIC: 2123.1-2 log LAIC

Binned residual plot> x<-predict(bd.log)

> y<-resid(bd.log)

> binnedplot(x,y)

Plots the average residual and the average fitted (predicted) value for each bin, or category

Category is based on the fitted values

95% of all values should fall within the dotted line

Poisson Regression

Using count data

What is a Poisson distribution?

Example: children ever born The dataset has 70 rows representing group-

level data on the number of children ever born to women in Fiji: Number of children ever born Number of women in the group Duration of marriage

1=0-4, 2=5-9, 3=10-14, 4=15-19, 5=20-24, 6=25-29 Residence

1=Suva (capital city), 2=Urban, 3=Rural Education

1=none, 2=lower primary, 3=upper primary, 4=secondary+

Poisson regression in R> ceb1<-glm(y ~ educ + res, offset=log(n), family = "poisson",

data = ceb)

Coefficients:

(Intercept) 1.43029 0.01795 79.691 <2e-16 ***

educnone 0.21462 0.02183 9.831 <2e-16 ***

educsec+ -1.00900 0.05217 -19.342 <2e-16 ***

educupper -0.40485 0.02956 -13.696 <2e-16 ***

resSuva -0.05997 0.02819 -2.127 0.0334 *

resurban 0.06204 0.02442 2.540 0.0111 *

AIC: Inf

Need to account for different population sizes in each area/group unless data are from same-size populations

Assessing model fit1. Examine AIC score – smaller is better

2. Examine the deviance as an approximate goodness of fit test Expect the residual deviance/degrees of freedom to be

approximately 1

> ceb2$deviance/ceb2$df.residual

[1] 41.35172

3. Compare residual deviance to a 2 distribution> pchisq(2646.5, 64, lower=F)

Model fitting: analysis of deviance Similar to logistic regression, we want to compare

the differences in the size of residuals between models

> ceb1<-glm(y~educ, family=“poisson", offset=log(n), data= ceb)

> ceb2<-glm(y~educ+res, family=“poisson", offset=log(n), data= ceb)

> 1-pchisq(deviance(ceb1)-deviance(ceb2), df.residual(ceb1)-df.residual(ceb2))

[1] 0.0007124383

Since the p-value is small, there is evidence that the addition of res explains a significant amount (more) of the deviance

Overdispersion in Poission models A characteristic of the Poisson distribution is

that its mean is equal to its variance

Sometimes the observed variance is greater than the mean Known as overdispersion

Another common problem with Poisson regression is excess zeros Are more zeros than a Poisson regression would

predict

Overdispersion Use family=“quasipoisson” instead of “poisson” to estimate the dispersion parameter

Doesn't change the estimates for the coefficients, but may change their standard errors

> ceb2<-glm(y~educ+res, family="quasipoisson", offset=log(n), data=ceb)

Poisson vs. quasipoisson

Family = “poisson”Family = “quasipoisson”

Coefficients:

(Intercept) 1.43029 0.01795 79.691 <2e-16 ***

educnone 0.21462 0.02183 9.831 <2e-16 ***

educsec+ -1.00900 0.05217 -19.342 <2e-16 ***

educupper -0.40485 0.02956 -13.696 <2e-16 ***

resSuva -0.05997 0.02819 -2.127 0.0334 *

resurban 0.06204 0.02442 2.540 0.0111 *

(Dispersion parameter for poisson family taken to be 1)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.43029 0.10999 13.004 < 2e-16 ***

educnone 0.21462 0.13378 1.604 0.11358

educsec+ -1.00900 0.31968 -3.156 0.00244 **

educupper -0.40485 0.18115 -2.235 0.02892 *

resSuva -0.05997 0.17277 -0.347 0.72965

resurban 0.06204 0.14966 0.415 0.67988

(Dispersion parameter for quasipoisson taken to be 37.55359)

Models for overdispersion When overdispersion is a problem, use a

negative binomial model Will adjust estimates and standard errors

> install.packages(“MASS”)

> library(MASS)

> ceb.nb <- glm.nb(y~educ+res+offset(log(n)), data= ceb)

OR> ceb.nb<-glm.nb(ceb2)

> summary(ceb.nb)

NB model in Rglm.nb(formula = ceb2, x = T, init.theta = 3.38722121141125, link = log)

Coefficients:

(Intercept) 1.490043 0.160589 9.279 < 2e-16 ***

educnone 0.002317 0.183754 0.013 0.98994

educsec+ -0.630343 0.200220 -3.148 0.00164 **

educupper -0.173138 0.184210 -0.940 0.34727

resSuva -0.149784 0.165622 -0.904 0.36580

resurban 0.055610 0.165391 0.336 0.73670

(Dispersion parameter for Negative Binomial(3.3872) family taken to be 1)

AIC: 740.55

Theta: 3.387

Std. Err.: 0.583

2 x log-likelihood: -726.555

> ceb.nb$deviance/ceb.nb$df.residual[1] 1.124297

What if your data looked like…

Zero-inflated Poisson model (ZIP) If you have a large number of 0 counts…

> install.packages(“pscl”)

> library(pscl)

> ceb.zip <- zeroinfl(y~educ+res, offset=log(n), data= ceb)

Logistic Regression. Example: Birth defects We want to know if the probability of a certain birth...

Documents

Mothers’ Labor Supply in Fragile Families: The Role of Child ......after birth, and whether the child had any birth defects. She found that certain high-risk health problems at birth

Binary Logistic Regression - Juan BattleBinary Logistic Regression • The logistic regression model is simply a non-linear transformation of the linear regression. • The logistic

Pakistani Women’s Perceived Spousal · –Predicted Probabilities From interactive Multinomial Logistic Regression Models. Births Intentions by Birth Order and Survey Year 95.5

Binary Logistic Regression Multinomial Logistic Regressionmgormley/courses/10601/slides/lecture10-multi.pdf · Binary Logistic Regression + Multinomial Logistic Regression 1 10-601

Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

MCL-LOGISTIC-02_V1-4 · Web viewMCL-LOGISTIC-02 VERSION 1.4 MCL-LOGISTIC-02 VERSION 1.4 110 MCL-LOGISTIC-02 VERSION 1.4 MCL-LOGISTIC-02 VERSION 1.462 Draft version dated 16 April

PAPERS - BMJbaby, birth weight, mode of delivery, Apgar score, duration oflabour, time offirst antenatal attendance, orattendanceat daycarenursery. A stepwise multiple logistic regression

Design and implementation of a global logistic tracking ......SaaS cloud computing infrastructure refers to encapsulate some certain applications into services by adopting SOA interfaces

Malaysia logistic masterplan _ logistic

Conrollin/teach/BiostatW07/reading/Logistic... · Con ten ts 3 Logistic Regression 2 3.1 The Logistic Curv e. 3 3.2 Fitting a Logistic Curv e. 5 3.3 Multiple Logistic Regression

Logistic Management

Multiple Logistic Regression - GitHub Pages Logistic... · 2.Perform multiple logistic regression in SPSS. ... Univariable analysis ... b.Classification table

Logistic Regression - svivek · Logistic Regression is the discriminative version. This lecture •Logistic regression •Connection to Naïve Bayes •Training a logistic regression

Analysis of the 2010 Birth Defects Registry...and gravida as all Delaware residents that gave birth in 2010. Moreover, certain findings paralleled those found in other Delaware-specific

Autoterminal Śląsk Logistic Company profile. ATS Logistic – Shareholders 50%

Advanced epidemic models - University of Ottawa€¦ · Advanced epidemic models Infection rate Demography Constant birth, linear death Logistic growth Errors Hospital bed capacity

The Birth of Architecture - British Academy · cussing the birth of architecture, Auden says that certain buildings were the work of ‘the same Old Man’, but, he continues, we

From New Birth to New Creation - Present Truth … New Birth to New Creation A restatement of certain truths brought out in the 1800s in connection with dispensational truth

Logistic Regression. Outline Review of simple and multiple regressionReview of simple and multiple regression Simple Logistic RegressionSimple Logistic

Logistic Support Analysis - Qmuzik Technologiesapps.qmuzik.com/Training/qm8pdf/Logistic Support Analysis.pdfLogistic Support Analysis : Course Assignment INTRODUCTION Logistic Support