Logistic Regression and Confounding

  • 8/13/2019 Logistic Regression and Confounding

    1/48

    Advanced Data Analysis:

    Methods to Control for Confounding (Matching and Logistic Regression)

    Goals

    • Understand the issue of confounding in statistical analysis
    • Learn how to use matching and logistic regression to control for confounding

    Confounding

    Example: people in a gastrointestinal outbreak

    • Mostly members of the same dinner club, BUT many club members also went to a city-wide food festival
    • Food handling practices in the dinner club might be blamed for the outbreak when food eaten at the festival was the cause
    • Membership in the dinner club could be a confounder of the relationship between attendance at the food festival and illness
    • Analyzing the data to account for both dinner club membership and food festival attendance could help determine which event was truly associated with the outcome

    Confounding

    Gastrointestinal outbreak (continued)

    • Stratification methods could be used to calculate the risk of illness due to the food festival for those in the dinner club vs. those not in the dinner club
    • If attending the food festival was a significant risk factor for illness in both groups, then the festival would be implicated because illness occurred whether or not people were members of the dinner club

    Confounding

    • What if there are multiple factors that might be confounding the exposure-disease relationship?
        • Using our previous example, what if we had to stratify by membership in the dinner club and by health status? Or stratify by other potential confounders (age, occupation, income, etc.)?
        • Trying to stratify by all of these layers becomes difficult
    • At this point more advanced methods are needed:
        • Logistic regression: controls for many potential confounders at one time
        • Matching: when incorporated correctly into the study design, reduces confounding before analysis begins

    Confounding

    Confounders

    • In field epidemiology, we commonly compare two groups by using measures of association:
        • Risk ratio (RR) in cohort studies
        • Odds ratio (OR) in case-control studies
    • May have multiple exposures significantly associated with disease, or no exposures associated
        • In these cases you need to explore whether a confounder is present, making it appear that exposures are associated with the disease (when they really are not) or making it appear that no association exists (when there really is one)

    Confounders

    • A confounder is a variable that distorts the risk ratio or odds ratio of an exposure leading to an outcome
        • Confounding is a form of bias that can result in a distortion in the measure of association between an exposure and disease
        • Confounding must be eliminated for accurate results (1)
    • Confounding can occur in an observational epidemiologic study whenever two groups are compared to each other
        • Confounding is a mixing of effects when the groups are compared (the exposure-disease relationship can be affected by factors other than the relationship)

    Common Confounders

    • Common confounders include age, socioeconomic status and gender.
    • Example: children born later in the birth order are more likely to have Down syndrome.
        • Does birth order cause Down syndrome?
        • No. The relationship is confounded by mother's age: older women are more likely to have children with Down syndrome
        • Mother's age confounds the association between birth order and Down syndrome: it appears there is an association when there is not (2)

    Common Confounders--Examples

    • Women's use of hormone replacement therapy (HRT) and risk of cardiovascular disease
        • Some studies suggest an association, others do not
        • Women of higher socioeconomic status (SES) are more likely to be able to afford HRT
        • Women of lower SES are at higher risk of cardiovascular disease
        • Differences in SES may thus confound the relationship between HRT and cardiovascular disease
        • Need to control for SES among study participants (3)

    Common Confounders--Examples

    • Hypothetical outbreak of gastroenteritis at a restaurant
        • Study shows women were at much greater risk of the disease than men
        • Association is confounded by eating salad: women were much more likely to order salad than men
        • Salad was contaminated with disease-causing agent
        • Relationship between gender and disease was confounded by salad consumption (which was the true cause of the outbreak)

    Controlling for Confounding

    • To control for confounding you must take the confounding variable out of the picture
    • There are 3 ways to do this:
        • Restrict the analysis: analyze the exposure-disease relationship only among those at one level of the confounding variable
          Example: look at the relationship between HRT and cardiovascular disease ONLY among women of high SES
        • Stratify: analyze the exposure-disease relationship separately for all levels of the confounding variable
          Example: look at the relationship between HRT and cardiovascular disease separately among women of high SES and low SES
        • Conduct logistic regression: regression puts all the variables into a mathematical model
          Makes it easy to account for multiple confounders that need to be controlled

    Controlling for Confounding: Stratification

    • Stratification can be used to separate the effects of exposures and confounders
    • Example: tuberculosis (TB) outbreak among homeless men
        • Homeless shelter and soup kitchen implicated as the place of transmission
        • Men likely to spend time in both places
        • To determine which site is most likely, could examine the association between the homeless shelter and TB among men who did NOT go to the soup kitchen and among men who DID go to the soup kitchen

    Stratification--Example

    After conducting a case-control study, overall data show the following:

    Cookie Exposure

                  Cases   Controls   Total
    Cookies          37         21      58
    No Cookies       13         29      42
    Total            50         50     100

    OR = (37 x 29)/(21 x 13) = 3.93; 95% CI, 1.69-9.15
    p = 0.001*
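The cross-product calculation above can be sketched in a few lines of Python. The confidence interval here uses the Woolf (log-based) method, which is an assumption: the slides do not say how their interval was computed, although this method gives matching values for the cookie table. The function name `odds_ratio_with_ci` is illustrative only.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Cookie exposure table: 37 exposed cases, 21 exposed controls,
# 13 unexposed cases, 29 unexposed controls
or_, lower, upper = odds_ratio_with_ci(37, 21, 13, 29)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
# prints: OR = 3.93, 95% CI 1.69-9.15
```

Other interval methods (exact or test-based) give somewhat different limits, especially when cell counts are small.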

    Stratification--Example

    Data continued:

    Punch Exposure

                  Cases   Controls   Total
    Punch            40         20      60
    No Punch         10         30      40
    Total            50         50     100

    OR = (40 x 30)/(20 x 10) = 6.00; 95% CI, 2.83-12.71
    p = 0.0004*

    Stratification--Example

    • Both cookies and punch have a high odds ratio for illness and a confidence interval that does not include 1
        • OR (cookies) = 3.93; 95% CI, 1.69-9.15, p = 0.001*
        • OR (punch) = 6.00; 95% CI, 2.83-12.71, p = 0.0004*
    • To stratify by punch exposure, we want to know:
        • Among those who did not drink punch, what is the odds ratio for the association between cookies and illness?
        • Among those who did drink punch, what is the odds ratio for the association between cookies and illness?
    • If cookies are the culprit, there should be an association between cookies and illness, regardless of whether anyone drank punch

    Stratification--Example

    Stratification of the cookie association by punch exposure:

    Did have punch

                  Cases   Controls   Total
    Cookies          35         17      52
    No Cookies        5          3       8
    Total            40         20      60

    OR = (35 x 3)/(17 x 5) = 1.24; 95% CI, 0.17-7.22
    p = 1.0*

    Stratification--Example

    Stratification of the cookie association by punch exposure:

    Did not have punch

                  Cases   Controls   Total
    Cookies           2          4       6
    No Cookies        8         26      34
    Total            10         30      40

    OR = (2 x 26)/(4 x 8) = 1.63; 95% CI, 0.12-13.86
    p = 0.63*

    Stratification--Example

    • To stratify by cookie exposure, we want to know:
        • Among those who did not eat cookies, what is the odds ratio for the association between punch and illness?
        • Among those who did eat cookies, what is the odds ratio for the association between punch and illness?
    • If punch is the culprit, there should be an association between punch and illness, regardless of whether anyone ate cookies

    Stratification--Example

    Stratification of the punch association by cookie exposure:

    Did have cookies

                  Cases   Controls   Total
    Punch            35         17      52
    No Punch          2          4       6
    Total            37         21      58

    OR = (35 x 4)/(17 x 2) = 4.12; 95% CI, 0.52-48.47
    p = 0.18*

    Stratification--Example

    Stratification of the punch association by cookie exposure:

    Did not have cookies

                  Cases   Controls   Total
    Punch             5          3       8
    No Punch          8         26      34
    Total            13         29      42

    OR = (5 x 26)/(3 x 8) = 5.42; 95% CI, 0.80-40.95
    p = 0.08*

    Stratification

    • Stratification allows us to examine two risk factors independently of each other
    • In our cookies and punch example we can see that cookies were not really a risk factor independent of punch (stratified ORs close to 1)
    • Punch remained a potential risk factor independent of cookies (large ORs and p-values close to significant)
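As a rough illustration of this comparison, the sketch below recomputes the crude and stratum-specific cross-product odds ratios from the four joint exposure groups. The counts are read off the stratified tables in the example; the names `COUNTS`, `odds_ratio` and `combine` are illustrative, not part of the original material.

```python
# (PUNCH, COOKIES) -> (cases, controls), read off the stratified 2x2 tables
COUNTS = {
    (1, 1): (35, 17),  # punch and cookies
    (1, 0): (5, 3),    # punch only
    (0, 1): (2, 4),    # cookies only
    (0, 0): (8, 26),   # neither
}

def odds_ratio(exposed, unexposed):
    """Cross-product odds ratio from two (cases, controls) pairs."""
    (a, b), (c, d) = exposed, unexposed
    return (a * d) / (b * c)

def combine(x, y):
    """Collapse two (cases, controls) groups into one."""
    return (x[0] + y[0], x[1] + y[1])

# Cookies: crude association, then within each punch stratum
crude_cookies = odds_ratio(combine(COUNTS[1, 1], COUNTS[0, 1]),
                           combine(COUNTS[1, 0], COUNTS[0, 0]))
cookies_punch = odds_ratio(COUNTS[1, 1], COUNTS[1, 0])
cookies_no_punch = odds_ratio(COUNTS[0, 1], COUNTS[0, 0])

# Punch: crude association, then within each cookie stratum
crude_punch = odds_ratio(combine(COUNTS[1, 1], COUNTS[1, 0]),
                         combine(COUNTS[0, 1], COUNTS[0, 0]))
punch_cookies = odds_ratio(COUNTS[1, 1], COUNTS[0, 1])
punch_no_cookies = odds_ratio(COUNTS[1, 0], COUNTS[0, 0])

print("cookies: crude", crude_cookies, "strata", cookies_punch, cookies_no_punch)
print("punch:   crude", crude_punch, "strata", punch_cookies, punch_no_cookies)
```

The cookie odds ratio drops toward 1 once punch is held fixed, while the punch odds ratio stays well above 1 in both cookie strata, which is the pattern the slide describes.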

    More on Stratification

    • Mantel-Haenszel odds ratio
        • Method of controlling for confounding using stratified analysis
        • Takes an association, stratifies it by a potential confounder and then combines the stratum-specific estimates as a weighted average into one estimate that is controlled for the stratifying variable
    • Cookies and punch example:
        • 2 stratum-specific estimates of the association between punch and illness (ORs of 4.1 and 5.4)
        • More convenient to have only one estimate: can average two estimates into a pooled or common odds ratio
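A minimal sketch of the pooled estimate, assuming the standard Mantel-Haenszel formula OR_MH = sum(a*d/n) / sum(b*c/n) over strata. The function name and the tuple layout are illustrative choices; the counts are the two punch-by-cookie strata from the example.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is (a, b, c, d):
    a exposed cases, b exposed controls, c unexposed cases, d unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Punch-illness association within each cookie stratum
PUNCH_BY_COOKIES = [
    (35, 17, 2, 4),   # ate cookies
    (5, 3, 8, 26),    # did not eat cookies
]

pooled = mantel_haenszel_or(PUNCH_BY_COOKIES)
print(f"Mantel-Haenszel OR for punch, adjusted for cookies: {pooled:.2f}")
# prints: Mantel-Haenszel OR for punch, adjusted for cookies: 4.76
```

The pooled value falls between the two stratum-specific odds ratios, as expected of a weighted average.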

    Stratification and Effect Measure Modifiers

    • Effect measure modification
        • One stratum shows no association (OR close to 1) while another stratum does have an association
        • No confounding third variable is present; rather, you need to identify and present estimates separately for each level or stratum
        • Example: if gender is an effect measure modifier, you should give 2 odds or risk ratios, 1 for men and 1 for women
    • You identify effect measure modification by stratification (same technique used to identify confounding) but you are looking for the measure of effect to be different between the 2 or more strata

    Effect Measure Modifiers--Examples

    • Among the elderly, gender is an effect modifier of the association between nutritional intake and osteoporosis
        • Nutritional intake (calcium) is associated with osteoporosis among women
        • Among men this association is not so strong because men's bone mineral content is not affected as much by nutritional intake
    • In developing countries, sanitation is an effect modifier of the association between breastfeeding and infant mortality
        • In unsanitary conditions, breastfeeding has a strong effect in reducing infant mortality
        • In cleaner conditions infant mortality is not very different between breastfed and bottle-fed infants

    Matching

    • Matching can reduce confounding
        • In case-control studies cases are matched to controls on desired characteristics
        • In cohort studies unexposed persons are matched to exposed persons on desired characteristics
    • You must account for matching when analyzing matched data
    • Important that the matched variables not be exposures of interest

    Matching--Example

    • Hypothetical study where students in a high school have reported a strange smell and sudden illness
        • Test the association between smelling an unusual odor and a set of symptoms
        • Match cases and controls on gender, grade and hallway
    • Precedents exist for outbreaks of illness related to unusual odors in buildings, possibly psychogenic (i.e., illness spread by panic rather than true cause)
    • Women are more reactive in this situation, grade level controls for age (different ages may react differently), and matching on hallway controls for actual odor observed (different locations may produce different odors)

    Matching--Example

    With matched case-control pairs, a 2x2 table is set up to examine pairs

    Table 1: Analysis of matched pairs for a case control study

                              Controls
    Cases          Exposed   Not Exposed   Total
    Exposed           e           f        e + f
    Not Exposed       g           h        g + h
    Total           e + g       f + h

    • Cells e and h are concordant cells because the case and the control have the same exposure status
    • Cells f and g are discordant because the case and control have a different exposure status
    • Only the discordant cells give us useful data to contrast the exposure between cases and controls

    Matching--Example

    • A chi-square for matched data (McNemar's chi-square) can be calculated using a statistical computing program
    • Calculation examines discordant pairs and results in a McNemar chi-square value and p-value

    If the p-value

    Matching--Example

    A table of discordant pairs can also be used to calculate a measure of association

    Table 2: Sample data for sudden illness in a high school. Controls matched to cases on gender, grade, and hallway in the school

                         Controls
    Cases       Smell   No Smell   Total
    Smell           6         12      18
    No Smell        4          5       9
    Total          10         17

    Matching--Example

    Calculating the odds ratio:

    OR = (# pairs with exposed cases and unexposed controls) / (# pairs with unexposed cases and exposed controls)
       = f / g = 12/4 = 3.0

    Interpretation: The odds of having a sudden onset of nausea, vomiting, or fainting if students smelled an unusual odor in the school were 3.0 times the odds of having a sudden onset of these symptoms if students did not smell an unusual odor in the school, controlling for gender, grade, and location in the school.
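The matched-pair calculations can be sketched as follows, using only the discordant cells f and g from Table 2. The McNemar statistic is computed as (f - g)^2 / (f + g) with 1 degree of freedom; whether a given statistical package applies a continuity correction varies, so the `continuity` flag is included as an assumption rather than a statement of what the original analysis used. Function names are illustrative.

```python
import math

def matched_pair_or(f, g):
    """Matched odds ratio: ratio of the two discordant pair counts.

    f: pairs with an exposed case and an unexposed control
    g: pairs with an unexposed case and an exposed control
    """
    return f / g

def mcnemar(f, g, continuity=False):
    """McNemar chi-square (1 df) and its p-value from discordant pairs."""
    diff = abs(f - g) - (1 if continuity else 0)
    chi2 = diff ** 2 / (f + g)
    # Upper tail of a chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Table 2: f = 12 (case smelled odor, control did not), g = 4 (the reverse)
print("Matched OR:", matched_pair_or(12, 4))
chi2, p_value = mcnemar(12, 4)
print(f"McNemar chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Without the continuity correction these data give a chi-square of 4.0; with it the statistic is smaller and the p-value correspondingly larger.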

    Matching

    An important note about matching:

    • Once you have matched on a variable, you cannot use that variable as a risk factor in your analysis
    • Cases and controls will have the exact same matched variables so they are useless as risk factors
    • Do not match on any variable you suspect might be a risk factor

    An Introduction to Logistic Regression

    • Logistic regression is a mathematical process that results in an odds ratio
    • Logistic regression can control for numerous confounders
    • The odds ratio produced by logistic regression is known as the adjusted odds ratio because its value has been adjusted for the confounders

    An Introduction to Logistic Regression

    • Logistic regression uses an equation called a logit function to calculate the odds ratio
    • Using our earlier punch and cookies example, we suspect one of these food items is confounding the other
    • Variables would be:
        • SICK (value is 1 if ill, 0 if not ill)
        • PUNCH (1 if drank punch, 0 if did not drink punch)
        • COOKIES (1 if ate cookies, 0 if did not eat cookies)
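For readers who want the logit function written out, a sketch in standard notation follows. The coefficient symbols \beta_0, \beta_1, \beta_2 are notation assumed here and do not appear in the slides; p is the probability that SICK = 1.

```latex
\begin{align*}
\operatorname{logit}(p) &= \ln\!\left(\frac{p}{1-p}\right),
  \qquad p = \Pr(\text{SICK} = 1) \\
\operatorname{logit}(p) &= \beta_0 + \beta_1\,\text{PUNCH} + \beta_2\,\text{COOKIES} \\
\text{adjusted OR for PUNCH} &= e^{\beta_1},
  \qquad \text{adjusted OR for COOKIES} = e^{\beta_2}
\end{align*}
```

Because each coefficient is a difference in log odds with the other variable held fixed, exponentiating it gives the odds ratio for that variable adjusted for the other.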

    Logistic Regression--Example

    • General equation is:
      Logit (OUTCOME) = EXPOSURE + CONFOUNDER1 + CONFOUNDER2 + CONFOUNDER3 + (etc)
    • For our example:
        • Outcome = variable SICK
        • Exposure = variable PUNCH
        • Confounder = variable COOKIES
    • Equation is:
      Logit (SICK) = PUNCH + COOKIES
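To make the model concrete, here is a minimal teaching sketch that fits Logit (SICK) = PUNCH + COOKIES by Newton-Raphson using only the Python standard library. It is not how the packages named in these slides implement the method, and in practice one of those packages would be used. The grouped counts are reconstructed from the stratified cookie and punch tables; `GROUPS`, `solve` and `fit_logit` are hypothetical names introduced for this example.

```python
import math

# (PUNCH, COOKIES, number ill, number not ill), from the stratified tables
GROUPS = [
    (1, 1, 35, 17),
    (1, 0, 5, 3),
    (0, 1, 2, 4),
    (0, 0, 8, 26),
]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logit(groups, iterations=25):
    """Maximum-likelihood fit of logit(p) = b0 + b1*PUNCH + b2*COOKIES."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for punch, cookies, ill, well in groups:
            x = (1.0, punch, cookies)
            n = ill + well
            linear = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-linear))
            for i in range(3):
                grad[i] += x[i] * (ill - n * p)          # score vector
                for j in range(3):
                    info[i][j] += n * p * (1 - p) * x[i] * x[j]  # information matrix
        step = solve(info, grad)                          # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
    return beta

beta = fit_logit(GROUPS)
# In a case-control study the intercept is not interpretable; the slopes are.
print(f"Adjusted OR for PUNCH:   {math.exp(beta[1]):.2f}")
print(f"Adjusted OR for COOKIES: {math.exp(beta[2]):.2f}")
```

With these counts the adjusted odds ratio comes out at roughly 4.8 for punch and roughly 1.4 for cookies, so punch remains strongly associated with illness after adjusting for cookies while the cookie association largely disappears.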

    Logistic Regression: Important Points

    • Each variable on the right side of the equation is controlling for all the other variables on the right side of the equation
        • If you are not sure whether one of several variables is a confounder, you can examine them all at the same time
    • Two important warnings:
        • Do not put too many variables in the equation (a loose rule of thumb is you can add one variable for every 25 observations)
        • You cannot control for confounders you did not measure (Example: if a child's attendance at a particular daycare was a confounder of the SICK-PUNCH relationship, but you do not have data on children's daycare attendance, you cannot control for it.)

    Logistic Regression

    • For many investigations you may not need to use logistic regression
    • Logistic regression is helpful in managing confounding variables, useful with large datasets and in studies designed to establish risk factors for chronic conditions, cancer cluster investigations or other situations with numerous confounding factors
    • Many software packages can simplify data analysis using logistic regression
        • SAS, SPSS, STATA and Epi Info are a few examples

    Logistic Regression: Software Packages

    Common software packages used for data analysis, including logistic regression*

    • SAS, Cary, NC: http://www.sas.com/index.html
    • SPSS, Chicago, IL: http://www.spss.com/
    • STATA, College Station, TX: http://www.stata.com
    • Epi Info, Atlanta, GA: http://www.cdc.gov/EpiInfo/
    • Episheet, Boston, MA: http://members.aol.com/krothman/modepi.htm
      (Episheet cannot do logistic regression but is useful for simpler analyses, e.g., 2x2 tables and stratified analyses.)

    *This is not a comprehensive list, and UNC does not specifically endorse any particular software package.

    Logistic Regression--Examples

    • Wedding Reception, 1997 (4)
        • Guests complained of a diarrheal illness diagnosed as cyclosporiasis
        • Univariate analysis (using 2x2 tables) showed eating raspberries was the exposure most strongly associated with risk for illness
        • Multivariate logistic regression showed same results
        • Investigators determined raspberries had not been washed

    Logistic Regression--Examples

    • Assessing the relationship between obesity and concern about food security (5)
        • Washington State Dept. of Health analyzed data from the 1995-99 Behavioral Risk Factor Surveillance System
        • A variable indicating concern about food security was analyzed using a logistic regression model with income and education as potential confounders
        • Persons who reported being concerned about food security were more likely to be obese than those who did not report such concerns (adjusted OR = 1.29, 95% CI: 1.04-1.83)

    Matching & Conditional Logistic Regression--Examples

    • Foodborne Salmonella Newport outbreak, 2002 (6)
        • Affected 47 people from 5 different states
        • Case-control study carried out, controls matched by age group
        • Logistic regression conducted to control for confounders
        • Cases were more likely than controls to have eaten ground beef (matched odds ratio, MOR = 2.3, 95% CI: 0.9-5.7) and more likely to have eaten raw or undercooked ground beef (MOR = 50.9, 95% CI: 5.3-489.0)
        • No specific contamination event identified but public health alert issued to remind consumers about safe food-handling practices

    Matching & Conditional Logistic Regression--Examples

    • Outbreak of typhoid fever in Tajikistan, 1996-97 (7)
        • 10,000 people affected in outbreak, case-control study conducted
        • Cases were culture positive for the organism (Salmonella serotype Typhi)
        • Using 2x2 tables, illness was associated with:
            • Drinking unboiled water in the 30 days before onset (MOR = 6.5, 95% CI: 3.0-24.0)
            • Using drinking water from a tap outside the home (MOR = 9.1, 95% CI: 1.6-82.0)
            • Eating food from a street vendor (MOR = 2.9, 95% CI: 1.4-7.2)
        • When all variables were included in conditional logistic regression, only drinking unboiled water (MOR = 9.6, 95% CI: 2.7-334.0) and obtaining water from an outside tap (MOR = 16.7, 95% CI: 2.0-138.0) were significantly associated with illness
        • Routinely boiling drinking water was protective (MOR = 0.2, 95% CI: 0.05-0.5)

    Conclusion

    • Controlling for confounding can be done using matched study design and logistic regression
    • While complicated, with practice these methods can be as easy to use as 2x2 tables

    References

    1. Gregg MB. Field Epidemiology. 2nd ed. New York, NY: Oxford University Press; 2002.
    2. Hecht CA, Hook EB. Rates of Down syndrome at livebirth by one-year maternal age intervals in studies with apparent close to complete ascertainment in populations of European origin: a proposed revised rate schedule for use in genetic and prenatal screening. Am J Med Genet. 1996;62:376-385.
    3. Humphrey LL, Nelson HD, Chan BKS, Nygren P, Allan J, Teutsch S. Relationship between hormone replacement therapy, socioeconomic status, and coronary heart disease. JAMA. 2003;289:45.
    4. Centers for Disease Control and Prevention. Update: Outbreaks of Cyclosporiasis -- United States, 1997. MMWR Morb Mort Wkly Rep. 1997;46:461-462. Available at: http://www.cdc.gov/mmwr/PDF/wk/mm4621.pdf. Accessed December 12, 2006.
    5. Centers for Disease Control and Prevention. Self-reported concern about food security associated with obesity --- Washington, 1995-1999. MMWR Morb Mort Wkly Rep. 2003;52:840-842. Available at: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5235a3.htm. Accessed December 12, 2006.