Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Logistic Regression for Estimation, Adjustment and Basic Prediction
Lecture 7
Section A
Multiple Logistic Regression: Some Examples
3
Learning Objectives
n Interpret the estimates from multiple logistic regression models in a scientific context
n Compare the results from simple and multiple logistic regression models to assess confounding
4
Example 1: Breast Feeding and Sex
n Data from a random sample of 192 Nepali children [12, 36) months old
n Question: what is the relationship between breast feeding and sex of a child?
n Data: - Breast fed: 70% - Sex: 48% female (1= male, 0 = female)
5
Example 1: Breast Feeding and Sex
n The resulting unadjusted association
: the ln(odds ratio) of being breast fed for males to females is 0.02
: the ln(odds) of being breast fed for female children is 1.12
02.01̂ =β
12.1ˆ =oβ
102.012.11
ln xpp
+=⎟⎟⎠
⎞⎜⎜⎝
⎛
−
6
Example 1: Breast Feeding , Sex and Age
n Now for the results of a multiple regression of breast feeding status on sex, and age of the child (months)
with x1 = sex (1=F) and x2 =age (months), p = probability (proportion) of y=1, where y = 1 if breastfed, 0 if not
n Additionally: - 95% CI for - 95% CI for
21 24.027.02.71
ln xxpp
−++=⎟⎟⎠
⎞⎜⎜⎝
⎛
−
48.0),.041 ,50.0(:1 =− pvalueβ
001.0 ,.17) ,31.0(:2 <−− pvalueβ
7
Example 1: Breast Feeding , Sex and Age
n The slope estimate for sex is - Still an estimated ln(odds ratio) of breast feeding for male
children to female children, who are of the same age - This called the age adjusted association between breast feeding
and sex
n The resulting odds ratio estimate is e0.27≈1.30; Male children in the sample have 30% greater odds of being breastfed than female children in the sample of the same age
n The 95% for the age adjusted odds ratio for males compared to females is (e-0.50, e1.04) →(0.61 , 2.82)
27.01̂ =β
8
Ex1: Predictors of Arm Circumference: Height &Weight
n The slope estimate for age is - Still an estimated ln(odds ratio) of breast feeding for children
who differ by one month in age (older to younger), but are of the same sex
- This called the sex adjusted association between breast feeding and age
n The resulting odds ratio estimate is e-0.24≈.79; A one month difference in age is associatedwith a 21% reduction in the odds of being breast fed (older to younger) among children of the same sex
n The 95% for the true sex-adjusted odds ratio for age is (e-0.31, e-0.17) →(0.73 , 0.84)
24.0ˆ2 −=β
Example 1: Presentation of Findings
n In research articles, frequently a single table of unadjusted and adjusted associations will be presented (for non-randomized studies)
9
Table 1: Logistic Regression Results for Predictors of Breast FeedingOdds Ratio (95% CI)
Predictor Unadjusted Adjusted
Sexfemale 1 1male 1.02 (0.55, 1.90) 1.30 (0.61, 2.82)
age (months) 0.79 (0.73 -‐ 0.84) 0.79 (0.73, 0.84)
Baseline Odds 1,333(exponetiated intercept)
Example 1: Additional Predictors
n Some other additional predictors of interest include - Maternal Parity
No Previous Children 17% 1 Previous Child 16% 2 Previous Children 14% 3 Previous Children 15% >= 4 Previous Children 38%
- Maternal Age mean = 27.7 years, range 17-43 years
10
Example 1: Presentation of Findings
n Sometimes the results from several models will be presented
11
Table 1: Logistic Regression Results for Predictors of Breast FeedingOdds Ratio (95% CI)
Predictor Unadjusted Model 2 Model 3 Model 4
Sexfemale 1 1 1 1male 1.02 (0.55, 1.90) 1.30 (0.61, 2.82)male 1.02 (0.55, 1.90) 1.30 (0.61, 2.82) 1.23 (0.54, 2.77) 1.22 (0.54, 1.77)
age (months) 0.79 (0.73 -‐ 0.84) 0.79 (0.73, 0.84) 0.77 (0.72, 0.83) 0.77 (0.71, 0.83)
Maternal Parity p=0.40 p=0.12 p=0.11 No previous children 1 1 1
1 previous child 0.38 (.12, 1.22) 0.23 (0.05, 1.01) 0.24 (0.05, 1.14)2 previous children 0.50 (0.15, 1.69) 0.36 (0.08, 1.54) 0.39 (0.08, 1.83)3 previous children 0.34 (0.11, 1.10) 0.18 (0.04, 1.05) 0.21 (0.04, 1.04)>=4 previous children 0.61 (0.18, 2.08) 0.61 (0.18, 2.08) 0.75 (0.14, 4.2)
Mother's Age (years) 0.99 (0.94, 1.04) 0.98 (0.89, 1.08)
Baseline Odds 1,333 4,932 7,071(exponetiated intercept)
Example 1: Presentation of Findings
n Sometimes the results from several models will be presented
12
Table 1: Logistic Regression Results for Predictors of Breast FeedingOdds Ratio (95% CI)
Predictor Model 4
Sexfemale 1malemale 1.22 (0.54, 1.77)
age (months) 0.77 (0.71, 0.83)
Maternal Parity p=0.11 No previous children 1
1 previous child 0.24 (0.05, 1.14)2 previous children 0.39 (0.08, 1.83)3 previous children 0.21 (0.04, 1.04)>=4 previous children 0.75 (0.14, 4.2)
Mother's Age (years) 0.98 (0.89, 1.08)
Baseline Odds 7,071
13
Example 2: Predictors of Obesity
n Data from 2009-10 NHANES1
n Sample of over 6,400 US residents, 16-80 years old
n HDL levels: mean 52.4 mg/dl, sd = 16, range 11-14, 15% of sample is obese by (BMI)
n Other potential predictors include sex, age (yrs), and marital status
14
Example 2: Predictors of Obesity
n Data from 2009-10 NHANES
Sex Distribution: 49% F, 51% M Age (years): 46.3 years, range 16 to 80 Marital Status:
Married 52% Widowed 9% Divorced 11% Separated 3%
Never Married 18% Living Together 7%
15
Example 2: Predictors of Obesity
n Note about the obesity/age relationship: here are the results from a lowess plot (this shows the unadjusted association)
-3-2
-10
1
Nat
ural
log
(ln) O
dds
of O
besi
ty
20 40 60 80Age (Years)
bandwidth = .8
2009-10 NHANESNatural ln Odds of Obesity as a Function of Age
Example 2: Logistic Regressions
16
Table 1: Logistic Regression Results for Predictors of ObesityOdds Ratio (95% CI)
Predictor Unadjusted Model 2 Model 3
HDL ( mg/dL) 0.967 (0.961, 0.973) 0.956 (0.951, 0.962) 0.958 (0.952, 0.964)
Males 1.75 (1.52,2.01) 2.63 (2.25, 3.07) 2.61 (2.22, 3.08)
Age Category p<0.001 p < 0.001 p< 0.001< 30 years 1 1 130-‐46 years 1.79 (1.46, 2.19) 1.76 (1.42, 2.17) 1.62 (1.25, 2.10)46-‐62 years 1.82 (1.49, 2.24) 1.95 (1.57, 2.43) 1.79 (1.37, 2.36)>= 62 years 1.47 (1.19, 1.81) 1.66 (1.34, 2.07) 1.53 (1.15, 2.05)
Marital Status p=0.69 oMarried 1 1Widowed 1.10 (0.85, 1.41) 1.05 (0.79, 1.41)Divorced 1.13 (0.90, 1.42) 1.14 (0.89, 1.44)Separated 1.18 (0.81, 1.73) 1.16 (0.78, 1.72)
Never Married 0.99 (0.81, 1.20) 1.19 (0.94, 1.50)Living together 0.91 (0.69, 1.20) 0.92 (0.68, 1.24)
Baseline Odds 0.61 0.58(exponentiated intercept)
Example 2: Logistic Regressions
17
Table 1: Logistic Regression Results for Predictors of ObesityOdds Ratio (95% CI)
Predictor Model 2
HDL ( mg/dL) 0.956 (0.951, 0.962)
Males 2.63 (2.25, 3.07)
Age Category p < 0.001< 30 years 130-‐46 years 1.76 (1.42, 2.17)46-‐62 years 1.95 (1.57, 2.43)>= 62 years 1.66 (1.34, 2.07)
Marital StatusMarriedWidowedDivorcedSeparated
Never MarriedLiving together
Baseline Odds 0.61
Summary
n Summary
18
Section B
Multiple Logistic Regression: Basics of Model Selection and Estimating Outcomes
20
Learning Objectives
n Understand the “linearity assumption” as it applies to multiple logistic regression
n Explain different strategies for picking a “final” multiple logistic regression model among candidates
n Use the results of multiple logistic regression models to compare groups who differ by more than one predictor, and estimate proportions/probabilities for groups given their x values
21
n The algorithm to estimate the equation of the multiple logistic regression is called the maximum likelihood estimation (same process used with simple logistic)
n The estimates for the intercept ( ) and the slopes ( ) are the values that make the observed data “most” likely among all choices for and .
n This must be done by computer.
Estimating for Logistic Regression
oβ̂ p21ˆ,....ˆ,ˆ βββ
oβ̂ p21ˆ,....ˆ,ˆ βββ
n With more than 1 predictor, the logistic regression model is no longer estimating a line in two dimensional space
22
Resulting Shape for Multiple Logistic Regression
n The “linearity” assumption assumes that the adjusted relationship being estimated between the ln(odds of y=1), for a binary outcome y, and each xi is linear in nature (this is an issue for continuous predictors, but not for binary or multi-categorical predictors)
23
Linearity Assumption
n When faced with potentially many possible predictors, how does a researcher go about choosing a “best” model?
n Model building and selection is a combination of science, statistics, and the research goal(s)
24
Choosing a “Final Model”
n If goal is to maximize precision of adjusted estimates
- Keep only those predictors that are statistically significant in final model
n If goal is to present results comparable to results of similar analyses presented by other researchers (on similar or different populations)
- Present at least one model that includes the same predictor set as the other research
25
Choosing a “Final Model”
n If goal is to show what happens to magnitude of association with different levels of adjustment - Present the results from several models that include different
subsets or combinations of adjustment variables
n If goal is prediction…..well this is a slightly more complicated “story”, and we will discuss briefly later in the course
26
Choosing a “Final Model”
Example 1: Prediction with Regression Results
n Recall the models for looking at predictors of breast feeding status (y/n) in Nepali children, 12-36 months
27
Table 1: Logistic Regression Results for Predictors of Breast FeedingOdds Ratio (95% CI)
Predictor Unadjusted Adjusted
Sexfemale 1 1male 1.02 (0.55, 1.90) 1.30 (0.61, 2.82)
age (months) 0.79 (0.73 -‐ 0.84) 0.79 (0.73, 0.84)
Baseline Odds 1,333(exponetiated intercept)
Example 1: Prediction with Regression Results
n Adjusted model on regression scale:
where x1 age (months), x2 = sex (1 M, 0 F)
n Estimate the probability (proportion of) that female children 22 months old are breast fed
28
21 27.024.02.7breastfed) being of oddsln( xx +−+=
Example 1: Prediction with Regression Results
n FYI:
29
Predictor Adjusted
Sexfemale 1male 1.30 (0.61, 2.82)
age (months) 0.79 (0.73, 0.84)
Baseline Odds 1,333(exponetiated intercept)
Example 1: Prediction with Regression Results
n Possible way to present the associated estimated probabilities from the logistic regression results
30
.2.4
.6.8
1
Est
imat
ed P
roba
bilit
y (P
ropo
rtion
)
10 15 20 25 30 35Age of Child (months)
Male Female
estimated from multiple logistic regression
Sample of 192 Nepali Children, 12-36 MonthsEstimated Probabiliy of Being Breast Fed by Age and Sex
Example 1: Prediction with Regression Results
n Adjusted model on regression scale:
where x1 age (months), x2 = sex (1 F, 0 M)
n Estimate the odds ratio of being breast fed for female children 22 months compared to male children , 19 months old
31
321 30.027.024.02.7breastfed) being of oddsln( xxx ++−+=
Example 1: Prediction with Regression Results
n Computations
32
92.1)0(27.0)22(24.0 2.7 ln(odds) :months 22 F, =+−+=
91.2)1(27.0)19(24.0 2.7 ln(odds) :months 19 M, =+−+=
Example 2: Predictors of Obesity: NHANES
33
Table 1: Logistic Regression Results for Predictors of ObesityOdds Ratio (95% CI)
Predictor Unadjusted Model 2 Model 3
HDL ( mg/dL) 0.967 (0.961, 0.973) 0.956 (0.951, 0.962) 0.958 (0.952, 0.964)
Males 1.75 (1.52,2.01) 2.63 (2.25, 3.07) 2.61 (2.22, 3.08)
Age Category p<0.001 p < 0.001 p< 0.001< 30 years 1 1 130-‐46 years 1.79 (1.46, 2.19) 1.76 (1.42, 2.17) 1.62 (1.25, 2.10)46-‐62 years 1.82 (1.49, 2.24) 1.95 (1.57, 2.43) 1.79 (1.37, 2.36)>= 62 years 1.47 (1.19, 1.81) 1.66 (1.34, 2.07) 1.53 (1.15, 2.05)
Marital Status p=0.69 oMarried 1 1Widowed 1.10 (0.85, 1.41) 1.05 (0.79, 1.41)Divorced 1.13 (0.90, 1.42) 1.14 (0.89, 1.44)Separated 1.18 (0.81, 1.73) 1.16 (0.78, 1.72)
Never Married 0.99 (0.81, 1.20) 1.19 (0.94, 1.50)Living together 0.91 (0.69, 1.20) 0.92 (0.68, 1.24)
Baseline Odds 0.61 0.58(exponentiated intercept)
Example 2: Logistic Regressions
34
Table 1: Logistic Regression Results for Predictors of ObesityOdds Ratio (95% CI)
Predictor Model 2
HDL ( mg/dL) 0.956 (0.951, 0.962)
Males 2.63 (2.25, 3.07)
Age Category p < 0.001< 30 years 130-‐46 years 1.76 (1.42, 2.17)46-‐62 years 1.95 (1.57, 2.43)>= 62 years 1.66 (1.34, 2.07)
Marital StatusMarriedWidowedDivorcedSeparated
Never MarriedLiving together
Baseline Odds 0.61
5432
1
97.056.067.056.0045.05.0obese) being of oddsln(
xxxxx
+++
+−+−=
Example 2: Logistic Regressions
n Estimate the proportion (probability) of 50 year old males with HDL of 80 mg/dL who are obese
35
46.2)1(97.0)0(56.0)1(67.0)0(56.0)80(045.05.0obese) being of oddsln(
−=+++
+−+−=
Example 2: Logistic Regressions
n Potential Presentation of Results
36
0.2
.4.6
Pre
dict
ed P
roba
bilit
y
0 50 100 150HDL
< 30 years 30-44 years44-62 years >= 62 years
By HDL levels and AgePredicted Probability of Obesity for Males
0.2
.4.6
Pre
dict
ed P
roba
bilit
y
0 50 100 150HDL
< 30 years 30-44 years44-62 years >= 62 years
By HDL levels and AgePredicted Probability of Obesity for Females
Summary
n Multiple logistic regression results can be used to estimate probabilities (proportions) of binary outcomes for a given subset in a population given their predictor values
n Multiple logistic results can be used to estimate odds ratios between groups who differ by more than one characteristic (predictor)
n Confidence intervals can be estimated for each of the above, but a computer is required
37
Section C
Multiple Logistic Regression: Some Examples from the Literature
39
Learning Objectives
n Interpret the results from simple and multiple logistic regression models presented in published journal articles
Example 1: Chemotherapy Utilization1
n From abstract
n 1 Du x. et al., Discrepancy Between Consensus Recommendations and Actual Community Use of Adjuvant Chemotherapy in Women with Breast Cancer. Archives of Internal Medicine, 203, Vol. 138, 90-07
40
Example 1: Chemotherapy Utilization
n From methods section “we used multivariable logistic regression analysis to generate the odds ratio of receiving chemotherapy in women with breast cancer and to determine the effect of age (Table 1) on chemotherapy use. In this model, we adjusted for race (white, black, or others), tumor stage (stage I, stage II, or stage IIIA), node status, hormone receptor status (Table 2), whether the patient had received surgery and radiation therapy (categorized as breast-conserving surgery without radiation, breast-conserving surgery with radiation, or mastectomy), and adjuvant hormone therapy use (yes or no). In addition to odds ratios, we generated the probabilities of receiving chemotherapy from the parameters of the logistic regression for women with different ages by holding other factors constant. “
41
Example 1: Chemotherapy Utilization
n Results table, Table 4
42
Example 2: Obesity and Neighborhood Condition2
n From abstract
n 3 Singh G. et al., Neighborhood Socioeconomic Conditions, Built Environments, And Childhood Obesity. Health Affairs, 2010, Vol. 29: No 3: 503-512
43
Example 2: Obesity and Neighborhood Condition
n From abstract
44
Example 2: Obesity and Neighborhood Condition
n Extensive Footnote
45
Example 2: Obesity and Neighborhood Condition
n Results
46
Example 3: HIV, HBV, HCV in IVDUs3
n Abstract 3 Akselrod H, et al. Seroprevalence of HIV, Hepatitis B Virus, and HCV Among Injection Drug Users in Connecticut: Understanding Infection and Coinfection Risks in a Nonurban Population. American Journal of Public Health. (Published online ahead of print October 17, 2013)
47
Example 3: HIV, HBV, HCV in IVDUs
n From Methods Section
48
Example 3: HIV, HBV, HCV in IVDUs
n Results: small portion of table
49
Example 3: HIV, HBV, HCV in IVDUs
n Results: another portion of table (with header appended)
50
Example 3: HIV, HBV, HCV in IVDUs
n Interesting….
51
Example 4: Alcohol and Mortalilty: Case/Control 4
n Abstract, part 1 4 Leon D, et al. Hazardous alcohol drinking and premature mortality in Russia: a population based case-control study. Lancet. 2007; 369: 2001–09
52
Example 4: Alcohol and Mortalilty: Case/Control 4
n Abstract, part 2 4 Leon D, et al. Hazardous alcohol drinking and premature mortality in Russia: a population based case-control study. Lancet. 2007; 369: 2001–09
53
Example 4: Alcohol and Mortality: Case/Control
n From Methods Section
“Logistic regression was used to estimate the strength of association of factors with mortality, with all analyses done with STATA (version 9.1). In all models, age was included in six 5-year categories. Education, smoking, and marital status were treated as potential confounders, and where appropriate were introduced into models as categorical variables.”
54
Example 4: Alcohol and Mortality: Case/Control
n From Methods Section
“Logistic regression was used to estimate the strength of association of factors with mortality, with all analyses done with STATA (version 9.1). In all models, age was included in six 5-year categories. Education, smoking, and marital status were treated as potential confounders, and where appropriate were introduced into models as categorical variables.”
55
Example 4: Alcohol and Mortality: Case/Control
n Results table
56
Example 5: How Not to Present Results
n From an article that will not be named
57