Confounding, mediation, and some general considerations in regression modeling
Multivariable regression and its variations are currently the most frequently used type of statistical
technique in behavioral medicine research. A typical example of a multivariable¹ model in our field
might be a regression model that attempts to evaluate a set of psychosocial predictors of disease,
such as heart failure. The measure indicating disease could be operationalized in a number of
measurement forms: as a continuous variable, such as left ventricular ejection fraction; as two or
more categories, such as the ordinal values of a symptom severity score; or even as the time elapsed
between some defined occasion and the diagnosis of heart failure. The term multivariable indicates
that the model contains a single response (dependent) variable, in this case, the marker of heart
failure, and at least two predictor variables in the model. The ostensible aim is to understand the
‘independent’ association of each predictor variable with the response variable. In the most
commonly used type of model, the regression coefficient or parameter estimate for a given predictor
represents the association between that predictor and the response, adjusting for all other predictors
in the model. We also might have more than one response variable, perhaps two separate indicators
of heart failure, such as LVEF and a symptom severity score. We might simply conduct a separate
regression analysis for each outcome, but also might elect to use a model that contains the two
response variables, referred to as a multivariate model. Regardless of which model we choose, a
number of important decisions must be made in developing the model. Foremost is the selection of
the form of probability model that best suits the response variable(s) under study. Next, and often
the most difficult part of the process, is to decide which predictors should be included in the model.
For the vast majority of work we do in behavioral medicine, an important part of the variable
selection process involves our presumed causal model. If we conduct a regression model to examine the association between, say, tobacco use and heart failure, we are more often than not proposing that tobacco use is a cause of heart failure. Upon proposing this model, we must immediately set about identifying potential confounders, that is, other variables that may threaten our causal conclusion. We also may be interested in variables that carry information about mechanisms that operate between the act of smoking and the outcome of heart failure. Finally, we also may be concerned, or even believe a priori, that the association between a given predictor and the response may differ depending on the level of another variable. For example, tobacco use may be related to heart failure only for persons with a certain genotype. Of course, there are many additional considerations in conducting a multivariable regression analysis, including testing assumptions, proper scaling or standardization of the predictors, and perhaps centering, rescaling, or orthogonalizing predictors, to name a few. In the present chapter, our focus will be relatively narrow. After a few preliminaries, we will discuss 1) considerations in selecting predictor variables for a model; 2) modern approaches to mediation; 3) testing for moderation; and finally 4) the role of sample size in estimating regression models.

¹ The terms "multivariable" and "multivariate" are often confused. "Multivariable" indicates that the model contains a single response (dependent) variable, in this case, the marker of heart failure, and at least two predictor variables. "Multivariate," in contrast, indicates the presence of at least two response variables.
Preliminaries: What is a Model?
What is a model and why use one? The statistical models we use in behavioral medicine typically
take the general form of one of more ‘predictor’ variables and one outcome, or response variable,
such as y = b1x1 + b2x2 + b3x3 + … where y is the response variable, the x's are the predictor variables,
and the b’s are regression weights. In the vast majority of modern modeling algorithms, the
predictor variables can be of any form, including continuous, categorical, and ordinal (and as we
will note again later, there is no normality requirement for variables on the predictor side of an
equation). A few words about nomenclature are appropriate here. Techniques such as Analysis of
Variance (ANOVA) and Analysis of Covariance (ANCOVA), and multivariable (often referred to
as “multiple”) regression have been almost entirely displaced by more general models (e.g., the
general linear model, and the generalized linear model). This transition has created a somewhat
confusing amalgam of terminology from these older techniques. Variables on the x-side of the
equation are referred to interchangeably as independent variables, predictors, covariates,
covariables, or, for variables measured as categories, factors. Variables on the y-side are referred
to as the response, outcome, or dependent variable.
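As a concrete, purely illustrative sketch, the general form y = b1x1 + b2x2 + … can be estimated by ordinary least squares on simulated data; the variable names and effect sizes here are our own inventions, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Predictors of mixed measurement form: one continuous, one binary
x1 = rng.normal(size=n)                        # e.g., a continuous symptom score
x2 = rng.integers(0, 2, size=n).astype(float)  # e.g., a two-category factor

# Simulated response with known weights: y = 2*x1 - 1*x2 + noise
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Design matrix with an intercept column; lstsq returns the regression weights
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [0, 2, -1]
```

Note that, as discussed above, no normality assumption is needed on the predictor side; the predictors here are deliberately of different measurement forms.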
Models per se tend to be preferred over traditional tests (e.g., t-tests, chi-square tests) nowadays for
several reasons. First, models provide not only the same information as more conventional testing
approaches, that is, whether the effect of interest is “statistically significant,” but also yield
information about the size of the effect of interest, along with information about the uncertainty of
the effect estimate, usually in the form of a confidence interval. For example, a clinical trial comparing a new blood pressure-lowering drug to a standard drug could legitimately be evaluated using a simple t-test that compares the treatment groups on mean blood pressure at the end of the trial. However, the t-test would provide no information on how big the difference was. A key
advantage of multivariable models is that we can include so-called adjustment variables in addition
to the primary variable or variable of interest. These adjustment variables can serve a variety of
purposes in a multivariable model, and these purposes are at the heart of the remainder of this
chapter.
In modern practice most of the earlier techniques, such as t-tests, chi-square tests, ANOVA,
ANCOVA, multiple regression, etc., have been subsumed under a few more general algorithms.
The generalized linear model (1), for example, can contain one or more variables of virtually any
measurement form on the predictor side, and the probability distribution of the dependent variable
can take a variety of forms beyond normal. These include the binomial, negative binomial, and
gamma distributions. Hence, multiple regression, logistic regression, Poisson regression, and many other conventional models are often still estimated using dedicated routines, but also can be estimated using the generalized linear model. For time-to-
event data, the Cox regression model (2) is probably the most commonly used approach today,
although parametric techniques also appear with relative frequency. In addition to the generalized linear model, structural equation models (SEM) (3) have now been extended sufficiently such that they
also can perform virtually all of the above functions. SEM also has the advantage of allowing so-
called indirect relations to be estimated and tested, which we will discuss further in the section on
mediation below.
Cause. Except for the relatively rare case where a regression model is used for completely blind
empirical prediction, researchers typically use regression models to help understand something
substantive about the phenomena under study. Regardless of whether we care to admit it or not,
researchers are largely interested in using regression models to understand cause. Why would we
measure and model, for example, risk factors as predictors of cardiac disease if we were not
interested in those risk factors as causes? If understanding causation underlies our models, most
would agree that a useful model will include as many of the causally relevant variables in the
system as possible. What makes a variable “relevant?” This question has been of great interest and
debate for many decades in the statistics literature, and is often cast in terms of the problem of
“variable selection.” We argue that relevance depends on the causal model underlying the analysis.
Confounding. In the context of causal hypotheses, confounders represent a highly relevant type of
variable. For a variety of reasons, we know that we should never be fooled into believing that an
association between two variables is sufficient evidence for causation. It may be the case, for
example, that the putative cause is confounded with another variable. The term confounding
derives from the Latin confundere, to pour together, or to mix (4). At root, confounding is the
mixing of the role of two predictor variables. Imagine you are at the bottom of a steep ravine
looking up at a train trestle. Suddenly a very small boy goes running across the train trestle,
followed shortly by a much larger boy, who is shouting at the small boy. You conclude that the
large boy is chasing the small boy, that is, causing the small boy to run. However, shortly after the
boys cross the trestle, a train comes barreling across the trestle behind them. In fact, the larger boy
was not chasing the small boy at all; the train was causing both of them to run across the trestle
quickly. The causal role of the large boy and the train were mixed up, or confounded. The
presence of the large boy was really just a red herring; he just happened to be running from the
train, too. In conducting research we study one or just a few variables that are of particular interest in order to understand something about the causal relation between those variables and some outcome variable.
A simple example in a research context is presented in a didactic paper by Rubin (5). Rubin presents several large epidemiological studies that all seem to show that smoking tobacco in pipe or cigar form is associated with a higher rate of cancer deaths than smoking tobacco in cigarette form. This result is, of course, contrary to our understanding of the relative dangers of these types of tobacco delivery. So, we need to ask whether tobacco type (pipe/cigar vs. cigarette) might be confounded with some other variable.
More formally, we consider the general criteria for confounding, which are as follows: 1) the confounding variable is presumed to be causally related to the predictor under study; 2) the confounding variable is presumed to be causally related to the outcome; 3) the confounding variable is either a common cause of, or a proxy for a common cause of, the predictor and the outcome, and is not in the causal chain between them. In our tobacco example, tobacco type is the predictor of
interest and cancer death is the outcome. What variable might be associated with the tobacco type
and cancer death but is not in the causal chain between the two? One obvious candidate is age.
Older people are more likely than younger people to smoke cigars or pipes and are also more likely
to die of cancer. Although chronological age is clearly causally related to cancer death, age cannot
be caused by the type of tobacco we smoke. In Rubin’s examples, in each of the samples it was
clear that pipe/cigar smokers were much older on average than cigarette smokers, and that the death
rate also was higher among older individuals. When age was properly accounted for in the analysis,
the death rate among pipe/cigar smokers was no longer higher than among cigarette smokers—in
fact, it became lower. Thus the ‘effect’ of tobacco type was confounded with age.
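Rubin's age-confounding pattern can be reproduced with a toy simulation (entirely hypothetical numbers; here, tobacco type has no true effect on risk by construction, so any unadjusted difference is pure confounding by age):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Age is a common cause: older people are more likely to smoke pipe/cigar
age = rng.uniform(20, 80, size=n)
pipe_cigar = (rng.uniform(size=n) < 0.8 * (age - 20) / 60).astype(float)

# Mortality risk depends on age only; tobacco type has no effect here
risk = 0.01 * age + rng.normal(scale=0.1, size=n)

# Unadjusted comparison: pipe/cigar smokers appear to fare worse
unadjusted = risk[pipe_cigar == 1].mean() - risk[pipe_cigar == 0].mean()

# Including age in the regression removes the spurious 'effect'
X = np.column_stack([np.ones(n), pipe_cigar, age])
b, *_ = np.linalg.lstsq(X, risk, rcond=None)
adjusted = b[1]

print(unadjusted)  # clearly positive (confounded by age)
print(adjusted)    # near zero (age adjusted)
```

The unadjusted group difference is an artifact of pipe/cigar smokers being older on average; conditioning on age reveals that tobacco type carries no independent information here.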
Causal graphs. Causal models can be easier to comprehend if presented in graphic form. Often the
graphs are used as an informal heuristic tool and sometimes they are employed in a more formal
way as (causal) Directed Acyclic Graphs (DAGs) or to represent a Structural Equation Model
(SEM). Providing an introduction to graph theory is beyond the scope of this text, but a non-
technical introduction to causal DAGs can be found in Glymour and Greenland (6) and an overview
of causal analysis in the context of mediation is provided by VanderWeele and Vansteeland (7).
We will say much more about DAGs in the section on mediation below, but for now, we’ll
introduce just one basic element of DAG notation. Causal DAGs use single-headed arrows to
represent the hypothesized causal direction between variables. Double-headed arrows, in contrast,
represent associations with no specified causal direction. In the first DAG below, tobacco type and
cancer mortality are associated, but with no causal direction. In the second DAG, tobacco type is
posited as the cause of cancer mortality.
The arrow in each of the figures is, at this point, a black box, representing a potential host of
processes, not all of which are necessarily causal. Put differently, a raw or zero-order correlation,
or an unadjusted regression coefficient between two variables can be a function of a variety of
different processes, some possibly causal, but many others that have nothing to do with cause. In
this case, the association between tobacco type and cancer was actually generated by the presence
of a “third variable,” age, which was a common cause of both tobacco type (in that age captures the
cultural cohort) and cancer mortality. This confounding is depicted below. When we use
regression models to study the effect of one or a few putative causes of an outcome, we strive to
identify and include other variables in the model that might confound the relations under study. A
critical step in planning a study of virtually any design is considering carefully what variables might
confound the relations under study, and then being sure to measure those variables. This is
particularly important when the design is observational where there is no randomization to control
for confounding. By including confounding variables in the analysis of observational data, we may
be at least a bit closer to being able to understand cause. Considering potential confounders is also
important in randomized experiments. Except in extremely large studies, perfect baseline balance is
rarely achieved across randomized arms. When there is baseline imbalance in a randomized
experiment, the treatment effect under study may be confounded with the variable that is not
balanced. Unless the arms are substantially unbalanced, including potential confounding variables
as adjustment variables in a model will effectively reduce the threat of confounding when
interpreting the treatment effect.
Including variables to increase precision. Variables other than confounders may be relevant to the
regression model. We also want our model to include predictors that are associated with the
outcome, even if they are not associated with other predictors. In a linear model, such as multiple regression, including additional predictors in the model (within the limits of sample size, which we will discuss below) improves the precision of the parameter estimates² and the power of the tests of the regression weights. Intuitively, power is improved because
additional predictors explain variance in the response, and therefore reduce the magnitude of the
error term by which the individual regression estimates are evaluated. For nonlinear models, such
as logistic regression and Cox survival models, the picture is a bit more complicated. Adding
additional variables will increase the standard errors for the parameter estimates, resulting in less
power. However, the estimates themselves will also generally be larger. Simulation studies have shown that the
benefit of the increased magnitude of the estimates outweighs the problem of larger standard errors
(8). Thus, when the sample size is large enough, in the most frequently used models in behavioral
medicine, including additional predictors is generally desirable.
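For linear models, the precision argument is easy to demonstrate by simulation (hypothetical data; z is a covariate unrelated to the focal predictor x but predictive of the outcome):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

x = rng.normal(size=n)   # focal predictor
z = rng.normal(size=n)   # prognostic covariate, independent of x
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

def ols(X, y):
    """Ordinary least squares: coefficients and their standard errors."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, se

_, se_without = ols(np.column_stack([np.ones(n), x]), y)
_, se_with = ols(np.column_stack([np.ones(n), x, z]), y)

# Adding z explains outcome variance, shrinking the error term and hence
# the standard error of the coefficient on x
print(se_without[1], se_with[1])  # the second value is markedly smaller
```

Here z explains a large share of the outcome variance, so the standard error of the coefficient on x drops substantially once z is included.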
Mediation
In addition to addressing confounding and increasing precision, we also might include additional
predictors in a model to study the possibility of mediation. Since the early paper on mediation by
Baron and Kenny (9), analyses of mediation have become increasingly prevalent in the literature. Its
importance has grown so much, in fact, that we have elected to devote a substantial section of this
chapter to it. The notion of mediation is used to describe a scenario where a variable affects another variable through one or more intermediary variables. In the following sections, we will review some conceptual issues involved in mediation and discuss methods that can be used to statistically model mediation.

² When we use the term "parameter estimates," we are referring to the weights or coefficients generated by the regression algorithm for each predictor variable.
Total, direct and indirect effects. We begin with a little orientation to the nomenclature of modern
mediation analysis. Recall our graphic representation of a proposed causal association between two
variables:
In this graph the arrow pointing from X to Y indicates that the variable X affects the variable Y. We
will refer to this as the total effect of X on Y. The total effect of X on Y depicted in Figure 1 may
come about through any number of intermediary variables, but these can be left out when the
objective is to describe the total effect. If there are intermediary variables between X and Y, as we
noted earlier, the arrow from X to Y in the above graph is a black box: We know the input (X) and
the output (Y), but not the mechanisms responsible for creating the association.
In this graph, the variable X affects the variable Y and the variable M. Also, we can see that the
variable M affects the variable Y. As a consequence we can distinguish between two different kinds
of effects of X on Y: A direct effect (X → Y) and an indirect effect through the variable M (X → M
→ Y). The second graph suggests that there is both a direct and an indirect effect of X on Y. In
other words, Figure 2 suggests that the variable M mediates some of the total effect of X on Y, but
it also suggests that there is an effect of X on Y that does not involve M. It is important to note that the direct effect may in fact involve intermediary variables, just not the intermediary variable M; the direct effect might thus more appropriately be termed the non-M-mediated effect, as it can be thought of as the sum of all pathways from X to Y that do not involve the mediator M.
Establishing the relative importance of the direct and indirect effect is often a primary concern in
mediation analysis. Figure 2 also illustrates the difference between confounding and mediation: M
is a mediator between X and Y because it lies on the pathway from X to Y. X is a confounder of the
association between M and Y because X affects M and Y.
Why mediation? Before elaborating further on the technique of mediation, it may prove fruitful to
examine the motivation for looking at mediation in the first place: Why is mediation important to
begin with? A recent paper by Hafeman & Schwartz listed three reasons: To support the evidence of
the main effect hypothesis, to examine the importance of path-specific mechanisms, and to provide
targets for intervention (10).
In 2005, a paper reported that women with a high level of perceived stress had a decreased risk of
breast cancer (11). This finding was quite surprising to many, as high levels of stress had previously been shown to have detrimental effects on various health outcomes. Could it be that the findings
were due to bias and confounding rather than a causal effect of perceived stress on the risk of breast
cancer? In the discussion the authors argue that the effect of stress was due to the fact that stress
hormones suppress estrogen secretion, which lowers the risk of developing breast cancer. This
pathway acts as a mediator between perceived stress and breast cancer. No information on estrogen levels was available in this study, but an analysis of the mediating role of estrogen would have
improved the argument for a causal role of perceived stress in the development of breast cancer
because it would have served to open the black box of how the exposure and outcome were
connected. In fact, another research group had previously used this strategy to show that the
association between BMI and breast cancer was mediated by serum estrogen levels (12).
Another use for mediation is to examine path-specific hypotheses. An association between low
parental socioeconomic position and low offspring birth weight has been observed in many
different populations and across different measures of socioeconomic position. A study by
Mortensen et al. examined the role of two possible mediators of the relationship between maternal
educational attainment and offspring birth weight in a cohort of women followed throughout
pregnancy (13). The two mediators were prepregnant Body Mass Index (BMI) and smoking in the
third trimester. Smoking in pregnancy and high BMI are more prevalent among mothers with short
education, but these two factors have different effects on birth weight: A high BMI increases birth
weight, while smoking decreases it. This means that these two pathways have opposite
contributions to the total effect: if all mothers had the BMI of the highest educated mothers, the
educational differences would be larger because the higher prevalence of obesity among women
with short educations increases their children's birth weights. If all mothers smoked like the highest
educated mothers, mothers with a shorter education would in fact give birth to the heaviest babies
because of the high prevalence of overweight and obesity among this group. The total effect of
education (short education is associated with a lower birth weight) reflects that the birth weight
reducing influence of the smoking-pathway is stronger than the birth weight increasing BMI-
pathway. The example of Mortensen et al. shows that the examination of different pathways can
increase our understanding of the total effects. For example, it suggests that the educational gradient
in birth weight that has been observed in numerous studies might reverse once smoking among
pregnant women is eliminated. It also underscores that mediation might be worth looking at, even in
the absence of a total effect. This is because a lack of association between the exposure and the
outcome might occur when different pathways that pull the total effect in opposite directions
balance each other out. This is sometimes referred to as a suppressor effect. In this case an analysis
of the relevant mediators would help the investigator retrieve the pathway-specific effects of the
exposure on the outcome.
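The suppressor scenario can be sketched with invented effect sizes (loosely patterned on the BMI/smoking example, not the Mortensen et al. data): two pathways of opposite sign cancel in the total effect, while conditioning on one mediator reveals the other pathway.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000

# Hypothetical structure: low education raises BMI (which raises birth
# weight) and raises smoking (which lowers it) by equal amounts
low_edu = rng.integers(0, 2, size=n).astype(float)
bmi = 1.0 * low_edu + rng.normal(size=n)
smoking = 1.0 * low_edu + rng.normal(size=n)
birth_weight = 0.5 * bmi - 0.5 * smoking + rng.normal(size=n)

def slope(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Total effect of education: the two pathways cancel, so it is near zero
total = slope(np.column_stack([np.ones(n), low_edu]), birth_weight)[1]

# Conditioning on smoking blocks that pathway; the BMI pathway emerges
direct_given_smoking = slope(
    np.column_stack([np.ones(n), low_edu, smoking]), birth_weight)[1]

print(total)                 # near zero: suppression
print(direct_given_smoking)  # near 0.5: the BMI-mediated contribution
```

A naive analyst looking only at the total effect would conclude that education is irrelevant to birth weight, when in fact two substantial pathways are hidden behind the null result.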
A third use of mediation is to improve and evaluate interventions. Mediation is in a certain sense an
integrated part of the setup in all randomized controlled trials: The effect of randomization to
treatment on the outcome is mediated by the treatment received.
The intention-to-treat analysis is a measure of the effect of randomization to intervention, regardless
of the intervention actually received. In mediation terms, this corresponds to the total effect of
randomization. The motivation for the intention-to-treat analysis is that the results, because of the
random assignment to intervention or control, are unconfounded by factors that affect the
intervention received and the outcome, e.g. compliance to assigned treatment. However, the effect
of the intervention on the outcome is often the quantity of substantive interest, not the effect of
randomization to intervention. If this is indeed the case the intention to treat analysis can be
supplemented with analyses of mediation (14).
A similar use of mediation can be found in studies that use naturally occurring experiments rather
than experiments under the investigator’s control. Mendelian randomization is a strategy for causal
inference that uses genetic variants as proxies for potentially modifiable factors, obesity for
example (15). In Mendelian randomization the effect of the gene on the outcome is mediated by the
modifiable factors. There are special statistical methods (instrumental variable methods) that can be
used to recover the effect of the modifiable factors in a way that potentially avoids many of the
biases in observational studies.
Another use of the concept of mediation in intervention studies is that of surrogate endpoints in
randomized controlled trials, where the aim typically is to examine if an intervention has an effect
on one or more clinical disease endpoints such as cancer or cardiovascular disease. In order to
detect effects, clinical endpoint trials often require that a large number of participants be followed for a long time. Because of this, surrogate endpoints are often used (16). Surrogate endpoints are
biomarkers for disease progression and are as such mediators between the intervention and the
clinical endpoints. For example, CD4 cell count can be used as a surrogate endpoint in HIV
treatment trials and serum cholesterol levels as a surrogate of coronary heart disease.
Because mediation allows the investigator to peek into the black box it can also provide insight into
why interventions might work or fail and thus guide future interventions. The paper by Mortensen
et al. suggests that interventions that target smoking will likely reduce the educational gradient in birth
weight, particularly if the intervention is successful among mothers with a short education. Such
analyses of randomized trials might also provide clues as to what the ‘active ingredient’ in a given
intervention might be. Analyses of mediation are, however, not a free lunch: they come at the cost
of a number of added assumptions.
Causal knowledge as a prerequisite for mediation. The attentive reader will have noticed that we
used the term ‘affect’ to describe the relationship between variables. This is because, as was the
case for confounding, the notion of mediation makes little sense unless we have a causal model in
mind. In the case of mediation, the variables involved must be known, or at least proposed, to be causally related in a way that is at least partly known to the investigator. For example, Boyle et al.
reported that the association between hostility and mortality was partly mediated by a pattern of
episodic excessive alcohol use (binge drinking) among hostile men (17). If high hostility is the cause of binge drinking (i.e. hostility → binge drinking), then the investigators' conclusion is correct. Let us assume instead that (unknown to the investigators) binge drinking over time increases hostility. If binge drinking is the cause of hostility (i.e. binge drinking → hostility) then alcohol use
is not a mediator between hostility and mortality, but rather a common cause of these two variables.
If this was the case, binge drinking would act as a confounder of the association between hostility
and mortality, not as a mediator.
In order for analyses of mediation to make sense, assumptions about the nature of the relationships
between variables are needed. This may at first seem like a rather strong requirement because it
appears to force the investigator to make conclusions in advance about the relationships that are
under investigation. However, causational direction of relationship cannot be extracted from data
alone (18). Investigators will usually get around this by relying to existing knowledge. In the
example of Boyle et al. the prospective design will ensure that the outcome (mortality) occurs after
the exposures are recorded. But the relationship between hostility and binge drinking is cross-sectional, so there is nothing in the design of the study to help the investigator decide about the
direction of the relationship. Most studies carefully consider whether the exposure in fact causes the
outcome. It is probably fair to say that in general less caution is exercised when it comes to making
assumptions about the causal relationship between exposure and mediator. Nevertheless, the analysis is conducted, and the findings will most often be interpreted as if the mediator is caused by
the exposure. To this end, graphs are a helpful tool because they encode the investigator’s
assumptions about the possible causal relationships between variables.
Bearing this in mind, it may be fruitful to think of mediation in terms of (hypothetical)
interventions: If we could somehow intervene and change the subjects’ hostility levels in a certain
way, would we expect their alcohol use to decline? Would the association between hostility and
mortality change if the investigators had forced everyone not to drink alcohol or forced everyone
to binge drink once a week? Thinking of mediation in terms of possible interventions has the added
advantage of providing a non-technical interpretation of the outcome of the analysis (given that the
analysis is conducted accordingly). Starting off with a vague question (“does alcohol mediate the
association between hostility and mortality”) may make it difficult to interpret the results. Just as
important, it will also serve to make apparent the often highly hypothetical nature of the mediation analysis (19, 20).
How to analyze mediation. There are numerous ways to statistically model mediation (for a review, see (21)). In a much-cited 1986 paper, Baron and Kenny stated that the objective of such an analysis was to "test for mediation" (9). This led them to devise a method that was based on a
significance test. However, it can be argued that the question of interest is not to determine if a
given mediator is a statistically significant mediator, but rather to quantify how important the
mediator is. This follows the general arguments against relying only on tests of statistical significance in medical research (22, 23). In the applied literature, one of two somewhat different
modeling approaches to mediation is often used. One approach is to use a Structural Equation
Model (SEM) and the other is to run a series of regressions to obtain and compare the total and
direct (non-mediated) effect of the exposure on the outcome. This latter approach, which is a
simplified version of the method of Baron and Kenny, involves controlling for the mediator to
estimate the direct effect (24). In some cases these two approaches will yield similar results, in
other cases the results will be different.
The SEM approach has the advantage that the statistical model corresponds to the graphs typically
used to conceptualize mediation, so that every arrow in the graph is estimated as a parameter from
one single model. SEMs are primarily used in the social sciences, whereas in the health sciences
SEMs appear to be the less popular choice. This is perhaps because SEMs are somewhat limited in
the sense that they are an extension of linear regression, which is not always well suited for the
kinds of data encountered in medicine. However, modern SEM theory (and modern SEM software)
is relatively flexible with regard to finding models that fit most problems that involve mediation. A
perhaps more important reason for the lack of popularity of SEMs for mediation analyses in the
medical sciences is that most investigators and scientific journals in the health sciences will be
familiar with multiple regression, but may not have experience with SEMs. In the following we will
concentrate on the mediator adjustment approach. For an example of an applied paper that uses both
approaches, see Batty et al. (25).
The mediator adjustment approach involves estimating the total effect and direct effect in two
separate regressions. To estimate the total effect, we need to take account of confounders of the
exposure-outcome association. In the simple situation where there is no confounding, the total effect
is simply the outcome regressed on the exposure. The direct effect is typically estimated as the
association between exposure and outcome when conditioning on the mediator: once we condition
on the mediator, the exposure-outcome association is the controlled direct effect. It is called a
controlled effect because it corresponds to evaluating the association between exposure and
outcome in a population where the mediator is forced by intervention to a certain level. In order
for this to make sense, some conditions need to be met. We will discuss these conditions using the
example of Boyle et al.
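The two-regression recipe can be sketched in a small simulation. The linear system below is an arbitrary assumption for illustration: 0.5 for the exposure → mediator path, 0.4 for the mediator → outcome path, and 0.3 for the direct path, so the total effect is 0.3 + 0.5 × 0.4 = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical linear system (all coefficients are assumptions):
x = rng.normal(size=n)                       # exposure (e.g. hostility score)
m = 0.5 * x + rng.normal(size=n)             # mediator (e.g. binge drinking)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)   # outcome

def ols(y, *cols):
    """Least-squares coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

total = ols(y, x)[1]      # y regressed on x alone -> total effect
direct = ols(y, x, m)[1]  # conditioning on m -> controlled direct effect

print(f"total  ~ {total:.2f}")   # close to 0.3 + 0.5*0.4 = 0.50
print(f"direct ~ {direct:.2f}")  # close to 0.30
```

With no confounding and a linear system, the two coefficients recover the total and controlled direct effects directly; the conditions discussed next are what make this interpretation valid.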
Adjustment for path-specific confounding. In addition to adjustment for confounders of the
association between exposure and outcome, all confounders of the association between the mediator
and the outcome have to be controlled for. Consider again the example of Boyle et al.
In this graph, Early life socioeconomic position (SEP) confounds the relationship between hostility
and mortality because it affects both. It also confounds the association between binge drinking and
mortality. The graph thus suggests that we should adjust for early life SEP when estimating the
total effect and when estimating the direct effect. Suppose that unemployment affects binge
drinking and mortality (loss of a job → increased binge drinking, mortality), but that this variable is
not affected by hostility (the dotted arrow does not exist). Then unemployment is not a confounder
of the total effect, but acts as a confounder of the association between binge drinking and mortality.
In this situation the investigator needs to adjust for unemployment even though unemployment does
not confound the total effect. If we fail to adjust for unemployment when estimating the direct
effect of hostility on mortality the results will generally be biased (26). This is because we need to
condition on binge drinking to estimate the non-binge drinking mediated effect of hostility on
mortality: Among those who binge drink, unemployment will be more frequent and mortality will
be increased as a consequence. Suppose that highly hostile men tend to fight with colleagues and
management and that they consequently are more likely to become unemployed (indicated by the
dotted arrow). In this case unemployment confounds the association between binge drinking and
mortality. This suggests that we should condition on it when estimating the direct effect. However,
if we control for unemployment we eliminate the contribution of the hostility → unemployment →
mortality pathway to the indirect effect. The problem arises because the dotted arrow contributes
both to the direct effect (non-binge drinking mediated) and the indirect (binge drinking mediated)
effect of hostility on mortality. This problem can be solved by resorting to a SEM or by applying
special methods (27).
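A hypothetical simulation of the first scenario (unemployment affects the mediator and the outcome but is not caused by hostility; all coefficients are arbitrary assumptions) shows the bias that arises when the mediator-outcome confounder is omitted from the direct-effect model:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Assumed linear system: u affects m and y but is independent of x
# (the dotted hostility -> unemployment arrow is absent).
x = rng.normal(size=n)                                 # hostility
u = rng.normal(size=n)                                 # unemployment
m = 0.5 * x + 0.8 * u + rng.normal(size=n)             # binge drinking
y = 0.3 * x + 0.4 * m + 0.6 * u + rng.normal(size=n)   # true direct effect: 0.3

def coef_x(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

total = coef_x(y, x)           # unbiased: u does not confound the x -> y total effect
biased = coef_x(y, x, m)       # conditions on m but omits u -> biased direct effect
adjusted = coef_x(y, x, m, u)  # adjusts for the mediator-outcome confounder

print(f"total {total:.2f}, naive direct {biased:.2f}, adjusted direct {adjusted:.2f}")
```

Conditioning on the mediator opens a path through unemployment, so the naive direct effect is pulled well away from the true value of 0.3 even though the total effect is estimated without bias.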
Measurement error. The mediator has to be measured without error. While mismeasurement is
generally something that should be avoided, studies that aim to examine mediation should pay
particular attention to measurement error. This is because even random error in the measurement of
the mediator will bias both the direct effect and the indirect effect, but in different directions. The
actual direction and strength of this bias depends on the pattern of mismeasurement. For example,
suppose that instead of measuring binge drinking in the study by Boyle et al. the investigators
tossed a coin for each participant to determine whether he was a binge drinker. In this case the
direct effect would most likely be overestimated to the point that it would equal the total effect.
SEM software usually has built-in features for handling measurement error, whereas some work is
needed to take account of this in multiple regression (for solutions, see e.g. (28)).
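The coin-toss example can be mimicked in a small simulation: replacing the true mediator with an independent noise variable makes the estimated "direct" effect drift toward the total effect. The generative model below is an arbitrary assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000

x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)   # true direct effect 0.3, total 0.5

def coef_x(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

m_coin = rng.normal(size=n)              # "mediator" replaced by pure noise
direct_true_m = coef_x(y, x, m)          # close to the true direct effect, 0.30
direct_noisy_m = coef_x(y, x, m_coin)    # drifts up to the total effect, 0.50
print(f"{direct_true_m:.2f} vs {direct_noisy_m:.2f}")
```

Less extreme measurement error produces a less extreme version of the same bias: the direct effect absorbs part of the mediated pathway that the mismeasured mediator fails to capture.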
No interaction between exposure and mediator. The direct effect of the exposure on the outcome
must not depend on at which particular value of the mediator variable it is assessed. In statistical
terms this can be viewed as an assumption of no statistical interaction between the exposure and the
mediator. For example, if the effect of hostility on mortality is stronger among those who binge
drink than among those who do not, we can estimate two different controlled direct effects: one for
binge drinkers and one for non-binge drinkers. Unless there is a strong argument for the value of
the mediator at which the association between exposure and outcome should be evaluated, the controlled
direct effect does not make sense in the presence of statistical interactions. This also relates to the
difference between mediators and moderators. As discussed above, a mediator is a variable that lies
in a causal pathway (e.g. hostility → binge drinking → mortality). Moderation has to do with how
two (or more) variables alone and in combination affect a third variable. In statistical terms, this is
an interaction. It is important to note that statistical interaction depends on the choice of scale: Two
variables that do not interact on a multiplicative scale (e.g. in a logistic regression) will generally
interact on an additive scale (linear regression), and vice versa. Because interaction depends on the choice of effect
measure, statistical interaction is often denoted effect measure modification in epidemiology (29).
The concepts of mediation and moderation are fundamentally different and not mutually exclusive,
so that a given variable can act as a mediator or as a moderator or as both. A discussion of
interaction and moderation is given in Hernan &amp; Robins (19).
Decomposition of total effects and indirect effects. We have now examined how total effects and
the controlled direct effect can be estimated, but what about the indirect (mediated) effect?
In an SEM context, the total effect can readily be decomposed into a direct and indirect effect, but
this is more difficult when using the mediator adjustment approach. Intuitively it seems reasonable
to assume that the total is the sum of the parts, so that the indirect effect can be calculated by
subtracting the direct effect from the total effect. This is the case in some situations, but in many
situations it is not. If linear regression is used to estimate the total and direct effect, this strategy
works well although the standard error of the indirect effect is not directly estimated. But often
various kinds of non-linear regression models are used. Studies that use logistic regression will
often report the percent reduction in the Odds Ratio after adjustment for the mediator(s).
Unfortunately this strategy will generally not work (24, 30, 31). The problem is that this approach
assumes that the change in Odds Ratio from one logistic regression to another has a very specific
interpretation. This trick works in linear models because a mixture of two linear regressions is a
linear regression, but this is not generally the case in logistic regression for example. It is worth
noting that total and direct effects can be estimated from non-linear regression, but the indirect
effect cannot consistently be calculated by contrasting the total and direct effects.
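One way to see why contrasting odds ratios fails is the non-collapsibility of the odds ratio: even when a covariate is independent of the exposure (so it is neither a confounder nor a mediator, and no mediation exists), adjusting for it changes the odds ratio. A small exact calculation with assumed coefficients:

```python
import math

def expit(z):
    return 1 / (1 + math.exp(-z))

b_x, b_m = 1.0, 2.0   # conditional log-odds ratios (assumed values)
p_m = 0.5             # M ~ Bernoulli(0.5), independent of X by construction

def p_y(x):
    """Marginal P(Y=1 | X=x), mixing over the distribution of M."""
    return (1 - p_m) * expit(b_x * x) + p_m * expit(b_x * x + b_m)

def odds(p):
    return p / (1 - p)

marginal_or = odds(p_y(1)) / odds(p_y(0))
print(round(math.exp(b_x), 2))  # conditional OR: 2.72
print(round(marginal_or, 2))    # marginal OR: 2.39, smaller despite no confounding
```

The "percent reduction in the OR" after adjustment therefore mixes genuine mediation with this purely arithmetic collapsibility effect, which is why the subtraction strategy is unsafe in logistic models.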
There are also other (non-technical) reasons for resorting to a linear model. For example, it was
long believed that traditional risk factors did not explain the social gradient in cardiovascular
disease. This finding was predominantly supported by studies that used the mediator adjustment
approach in multiplicative, non-linear models. This was something of a paradox in so far that
research on the traditional risk factors suggested that these explained 90% of the cases. A landmark
paper by Lynch et al. from 2003 showed that this apparent paradox was explained by the choice of
relative measures of association. After adjustment for traditional cardiovascular risk factors (the
mediators) the relative differences decreased by about 25%. The absolute risk differences, however,
were reduced by about 75% (32). This highlights an inherent problem in calculating a relative
change in a relative measure.
Mediation and interaction. As noted above the problem of assessing mediation in the presence of
statistical interactions is exacerbated by the fact that statistical interactions are dependent on the
choice of scale. This means that the choices of statistical model and measure of association will in
part determine whether mediation is a tractable problem, which is both impractical and conceptually
unsatisfying. One solution to this problem is to take a close look at the relationship between the
exposure and the mediator. Recall that the controlled direct effect is estimated by fixing the
mediator at some value, for example eradicating all binge drinking. For many real life problems it is
difficult to imagine scenarios where forcing the mediator to attain a particular value is possible. But
this is quite different from what we would expect the exposure to do to the mediator. If we could
somehow manipulate the exposure by intervention a reasonable expectation would be that the
distribution of the mediator would shift from the distribution it had under no exposure to the
distribution it has when the exposure is present. Consider the pathway perceived stress → estrogen
→ breast cancer from Nielsen et al. If we somehow intervened and eliminated all perceived stress,
we would expect the subjects’ levels of serum estrogen to increase. This would result in a shift in
the distribution of estrogen. So, instead of fixing the mediator at a certain level, we can calculate
the direct effect when the mediator has a certain distribution. If we wanted to estimate the direct
(non-estrogen mediated) effect of perceived stress on breast cancer, we could use this information.
For example, we could evaluate the association between perceived stress and breast cancer under the
distribution that estrogen has in the absence of perceived stress, instead of evaluating it in an analysis where we
fixed everyone’s estrogen levels to attain the exact same value. This leads to the estimation of a
natural direct effect. This is done using simple standardization techniques such as those used to
calculate standardized rates. It is important to note that using natural direct effects will yield results
that are identical to the controlled direct effect unless there is statistical interaction between exposure
and mediator, but even in this case the concept does add value: Not only does the concept of natural
effects provide a definition of direct effects in the presence of interaction, they also lead to a
definition of indirect effects. A natural indirect effect can be defined as the change in outcome
when the exposure is fixed and the distribution of the mediator is changed. The reader is referred
elsewhere for a comprehensive review of natural effects (7, 27, 33).
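Under linearity and no exposure-mediator interaction, this standardization can be sketched directly; the data-generating coefficients below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical set-up: binary exposure, continuous mediator whose distribution
# is shifted by the exposure, linear outcome with no x*m interaction.
x = rng.integers(0, 2, size=n).astype(float)
m = 1.0 - 0.6 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

# Fit the outcome model y ~ x + m by least squares.
X = np.column_stack([np.ones(n), x, m])
b0, bx, bm = np.linalg.lstsq(X, y, rcond=None)[0]

# Standardization: predict under x=1 and x=0, but in BOTH cases give the
# mediator the distribution it has among the unexposed (natural direct effect).
m0 = m[x == 0]
nde = (b0 + bx * 1 + bm * m0).mean() - (b0 + bx * 0 + bm * m0).mean()
print(f"natural direct effect ~ {nde:.2f}")   # about 0.30 here (no interaction)
```

Because there is no interaction in this toy model, the natural and controlled direct effects coincide; with an x*m interaction term in the model, the same standardization machinery still produces a well-defined natural direct effect.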
Interactions and moderation
The assumption of homogeneity. An additional assumption that all regression models make is
that the effect of a given variable is homogeneous across all levels of all other variables,
irrespective of whether those variables are measured and included in the model or not. For
example, if we estimate a model in which depression is a predictor of cardiac disease, we implicitly
make the assumption that the association between depression and disease is the same (within
sampling error) for men and women, the old and the young, across ethnicities, genotypes, etc. Even
if these other variables were measured and included in the model as adjustment covariables, such a
model would not yield any information about this possible heterogeneity. There are several ways
we might tackle this question. One intuitively appealing method would be to divide the sample into
subgroups and evaluate the regression coefficient within each of the groups. For example, we
might divide our sample based on gender, and estimate the relation between depression and cardiac
disease separately for men and women. Subgroup tests, however, are highly controversial and
generally discouraged by statisticians for a number of reasons (34). Among these objections to
subgroup testing, the most important are the inflated error rate, the differential power of the
tests, and the increased imprecision of the parameter estimates due to the smaller sample sizes.
Conducting many tests of any kind inflates the Type I error rate. In the case of subgroup tests, of
course, all the parameters in a given model are re-estimated within each subgroup, creating a whole
host of new opportunities for capitalizing on the idiosyncrasies of the sample, with the added
disadvantage of conducting those tests on fewer data points! Correction for multiple testing in these
cases can be of some help, but unless the study was designed specifically for the subgroup test, the
power can and usually will be quite different for different subgroups. Hence, some subgroup tests
will have more power than others, making it virtually impossible to manage the error rate
coherently. If subgroup tests are of interest, the sampling plan must take them into account before
the study is carried out to ensure adequate and consistent power across them. The inferences from
pre-planned subgroup analyses are, of course, more robust than those arising from post hoc
analyses. If the design did not take these tests into account, subgroup analyses should either not be
conducted at all, or should be interpreted as highly preliminary.
Finally, if we are interested in studying heterogeneity of associations, the preferred approach is to
test the corresponding interaction term rather than to examine subgroups separately (35, 36). (There are
also Bayesian methods, which may overcome some of the problems with conventional subgroup
analyses [see (37, 38)]). For example, if one is interested in whether a treatment is more effective
in one ethnic group than another, the proper test is a treatment group by ethnicity interaction term.
In a multivariable model setting, when more than one interaction term is of interest, the error rate
can be minimized by entering all the interaction terms of interest in the model as a block
simultaneously and testing the change in model fit associated with the block (39, 40). If the test of
the entire block is not significant, then the individual interaction terms are interpreted as
inconclusive or noise. I add a reminder here that in most statistical models nowadays all lower-
order component terms must be included in the model along with higher-order terms such as interactions.
For example, if we are testing a treatment group by ethnicity interaction, we also must include the
treatment group and ethnicity main effects—otherwise the interaction term is not really
interpretable as an interaction in the conventional sense of the concept.
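The block test can be sketched as a change-in-fit F test comparing the main-effects model to the model with the interaction block added (here a single treatment-by-ethnicity term; all data-generating coefficients are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical trial: binary treatment and (for simplicity) binary ethnicity,
# with a true treatment-by-ethnicity interaction of 0.3 (assumed).
treat = rng.integers(0, 2, n).astype(float)
ethn = rng.integers(0, 2, n).astype(float)
y = 0.5 * treat + 0.2 * ethn + 0.3 * treat * ethn + rng.normal(size=n)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

ones = np.ones(n)
X_main = np.column_stack([ones, treat, ethn])                # main effects only
X_full = np.column_stack([ones, treat, ethn, treat * ethn])  # + interaction block

# F statistic for the change in fit when the interaction block is added.
q = X_full.shape[1] - X_main.shape[1]   # number of terms in the block
f = ((rss(X_main, y) - rss(X_full, y)) / q) / (rss(X_full, y) / (n - X_full.shape[1]))
print(f"block F statistic: {f:.1f}")    # large -> the block improves fit
```

With several interaction terms, the same F test covers the whole block at once, so only one test is spent on the entire set; note that both main effects stay in the model throughout.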
Preserve measurement information wherever possible. On a final note, when testing interactions,
one might be tempted to create dichotomies or groups out of continuously measured variables.
Researchers also make artificial categories for other reasons, such as ease of interpretation,
evaluation of nonlinearity, correspondence to clinical cutpoints, or even the belief that the grouping
somehow improves measurement precision. Indeed, creating groups out of continuous variables has
a long history in psychology, medicine, and epidemiology. What many modern researchers fail to
realize, however, is that this tradition arose strictly out of necessity. In the early days of modern
statistical practice, it was apparently well understood that the practice of grouping was less than
ideal, but there was little choice given the lack of computational power. With the availability of
ample computational power, modern authorities in methodology have repeatedly discouraged
researchers from adopting this practice (41-44). Compared to the categorized version of a variable,
using the continuous form yields substantially greater statistical power (43), is less likely to produce
spurious significance (45), and, from a measurement perspective, is a more reliable instantiation of
the variable under study (44). The much preferred alternative to categorizing is to model the
continuous variable as measured. If nonlinearity is a concern, techniques such as splines (40) or
fractional polynomials (46) will allow for a nonlinear association without discarding information or
making arbitrary cutpoints. Despite the overwhelming evidence of the inadequacy of the
categorization approach, a quick glance at many scientific journals suggests that the force of
tradition is apparently quite strong. We once again appeal to readers to avoid this fundamental error
in data analysis.
Some additional considerations on regression models
Sample size in multivariable models. We now turn to a last few concepts that bear directly on the
above material in terms of producing replicable models. Earlier, we alluded to the idea that
although it is a good idea to include potential confounders and additional predictors of the response
in a model, the number we can include in a model and still obtain reproducible results is determined
by the sample size we have to work with. Before the advent of simulation studies, statisticians
often offered rules of thumb based on their experience. One well-known rule of thumb for linear
regression models is that there should be at least 10, preferably 15, cases for every degree of
freedom used in estimating the equation. Typically, each predictor uses one degree of freedom.
For example, if we want to study 10 predictors with no interactions or curvilinear terms, we should
have at least 100 observations in our sample. Perhaps it is not a surprise, but modern simulation
studies have tended to support this rule of thumb, demonstrating empirically that following this
guideline will result in a regression model that is more likely to replicate in new samples. There are
also rules of thumb that have been empirically tested for logistic regression models and also
survival models such as Cox regression. The rules of thumb for logistic and time to event models
are similar to that for linear regression, about 10-15 observations per predictor. However, there is
an important difference in how the number of observations is counted in the logistic and time to
event models. In these models, the number of observations is based on something called the
effective sample size. The effective sample size for a time to event regression model is simply the
number of events. So, if there are 1000 participants in a study, and only 10 of them sustain the
event being studied, the effective sample size is 10. For logistic regression models, in which the
outcome is a binary variable, the effective sample size is the count of events or nonevents,
whichever is the smaller number of the two. For example, if there are 200 individuals in the
sample, and 20 had an event, the effective sample size is 20, not 200, and at best 2 variables can be
studied with reasonable confidence. If there were 180 events rather than 20, the effective sample
size would still be 20. In more technical parlance, the effective sample size in a logistic regression
model with a binary response is min(q, n-q), where min represents “the minimum of the following
quantities”, q is the number of events, and n is the total sample size. Finally, for ordinal logistic
regression models, that is, models with more than two ordered category as the response, the
effective sample size is given by

n − (1/n²) ∑ᵢ₌₁ᵏ nᵢ³

where n is the sample size, k is the number of response categories, and nᵢ is the number of
observations in category i (40).
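These counting rules can be captured in two small helper functions (the function names are ours, not standard):

```python
def ess_binary(n_events, n_total):
    """Effective sample size for a binary (logistic) outcome: min(q, n - q)."""
    return min(n_events, n_total - n_events)

def ess_ordinal(counts):
    """Effective sample size for an ordinal outcome with category counts n_i:
    n - (1/n^2) * sum(n_i^3)."""
    n = sum(counts)
    return n - sum(c ** 3 for c in counts) / n ** 2

print(ess_binary(20, 200))    # 20 -> at best ~2 predictors
print(ess_binary(180, 200))   # also 20: min(180, 20)
print(ess_ordinal([70, 70, 60]))   # close to the full n when categories are balanced
```

Note that the ordinal formula approaches the full sample size when the categories are evenly filled and collapses toward the binary rule when one category dominates.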
What are the consequences of studying more variables than the guidelines suggest? Perhaps the
most serious consequence of trying to squeeze too many variables in a model is overfitting.
Overfitting is a condition in which the idiosyncrasies of the sample lead to an overly optimistic
overall fit of the model. Intuitively, we might say that there is simply not enough information (in
terms of observations) to distinguish noise from true signal. The fewer observations per degree of
freedom in a model, the more likely the model will be overfit. Overfitting is discussed in greater
detail in Babyak (47) and Steyerberg (8). Figure 1 displays the results of a series of simulations
carried out by Babyak (47). The plot shows the distribution of model r-square values for various
levels of predictors/observations for a model with 10 predictors whose values are merely randomly
generated, i.e. are pure noise. Because the predictor values are randomly generated, the 'true' model
should have an r-square value of zero, with any non-zero r-square arising simply due to random
sampling fluctuation. The plot demonstrates that when there are relatively many observations per
predictor, the vast majority of r-square values are zero or very close to zero. However, as the
predictor/observation ratio becomes smaller, the typical r-square values become larger and more
varied, with some even reflecting a fairly large amount of variance explained. In addition to
generating overly optimistic model fit, having too few observations per predictor also results in bias
in the estimates for the individual parameters. Peduzzi et al. (48) showed in a series of simulations
that an inadequate predictors/observations ratio also leads to serious bias in the estimates of the
regression coefficients in logistic regression and time-to-event models. Some have argued that in
the case of models in which we are interested in a single predictor and are merely concerned about
ruling out confounding, fewer events per predictor may be required. Vittinghoff et al. (49) have
argued that in this circumstance perhaps as few as 5 events per predictor may be sufficient, but
the authors also show that under some circumstances even more than 15 per predictor may not be
enough. Perhaps the most prudent advice is that more is always better when it comes to sample size,
and that when there are relatively fewer cases than the guidelines suggest, results should be
interpreted with great caution.
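The overfitting phenomenon is easy to reproduce: regress pure noise on pure noise and watch the apparent R² grow as the observations-per-predictor ratio shrinks. A minimal sketch (sample sizes and replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def noise_r2(n_obs, n_pred, reps=200):
    """Mean apparent R^2 when ALL predictors are pure noise (true R^2 = 0)."""
    r2s = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_pred))])
        y = rng.normal(size=n_obs)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2s.append(1 - resid.var() / y.var())
    return float(np.mean(r2s))

print(f"10 predictors, 150 obs: mean R^2 = {noise_r2(150, 10):.2f}")  # near zero
print(f"10 predictors,  20 obs: mean R^2 = {noise_r2(20, 10):.2f}")   # around 0.5
```

At 15 observations per predictor the noise model explains almost nothing, but at 2 observations per predictor roughly half the outcome variance is "explained" by pure noise, mirroring the pattern in Figure 1.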
Reducing the degrees of freedom in a model. If you are confronted with a situation in which you
wish to study more variables than the sample size allows, what are the alternatives? A popular
approach in the past has been to use automated 'stepwise' methods. There are actually a variety of
these techniques, but they are typically characterized by sequentially entering and removing
variables based on the correlations and partial correlations between the predictors and response
variable until some arbitrary criterion is met. For example, in forward stepwise selection, the
algorithm scans the correlations between the predictors and response variable and selects the
predictor with the largest correlation with the response. In the next step, the correlations between
the remaining candidate predictors and the response are partialled for the effect of the first variable
that was chosen, and the algorithm selects the largest of these partialled correlations. The process
continues until some predetermined measure of fit is achieved. Unfortunately, these algorithms
have been subsequently shown to be significantly flawed in terms of inference. They do generate
models that will fit the sample data well, but when used in the way that most of us have used them,
they are almost certain to not produce a replicable model. That is, when we compare the fit of the
model and the parameter estimates from the stepwise model to a model based on a new sample, not
much will be the same. Intuitively, the overly optimistic fit can be understood as a function of the
fact that we have tested many variables, and that by chance alone (i.e. random sampling
fluctuation), we are bound to find at least a few, and sometimes even many, predictor variables that
display a non-trivial association with the response variable. On the other side of the same coin, the
automated algorithm will also miss potentially important variables, again due to sampling error,
yielding a model with parameter estimates that may not be appropriately adjusted, i.e., a
misspecified model. Further problems arise with automated algorithms when there are
correlations among the candidate predictor variables. In these instances, the choice to select one or
the other by the algorithm can be quite arbitrary. Not surprisingly, in recent years, the use of
automated stepwise methods has been almost uniformly discouraged by statisticians. Several
journals, in fact, will not accept papers that are based on conventional stepwise analyses (50, 51).
A commonly used alternative to stepwise selection is univariate prescreening of variables. In this
approach, the researcher evaluates the univariate relation between each predictor and the response
variable and selects those which are statistically significant for entry in a final regression model.
Unfortunately, this technique suffers from essentially the same shortcomings as those seen in the
automated stepwise algorithms, and at times worse. The fit is again biased toward being too good,
because we are selecting the predictors whose parameter estimates are of the largest magnitude,
without accounting for the possibility that those magnitudes are also influenced by random
sampling error. Steyerberg (8) calls selection based on p-values “testimation
bias.” As a more general principle, using the sample data to determine what to include in a model
will produce fit that may be too good and parameters that are too large. A further difficulty with
univariate prescreening is that variables behave differently in a univariate setting compared to a
multivariable model. It is entirely possible, for example, for a potential predictor to look quite
uninteresting in a univariate setting and then come to life when partialled for other variables.
Arguably the best alternative to automated techniques and prescreening is to specify the model in its
entirety before even collecting the data. A prespecified model is preferable for a number of
reasons. First and foremost, it requires a thoughtful consideration of the phenomenon under study
before collecting the data. Second, it is transparent. There is no doubt as to whether other variables
were considered but just not reported. Finally, the p-values for the fit of the model and for the
parameters will be 'honest.' By contrast, once predictors are tested either during pretesting or
some other selection process and discarded, the tests of the model with the remaining variables, as
well as the test of model fit will be too optimistic (for a simulation study demonstrating this
principle, see (52)).
Sometimes, of course, it is not possible or even desirable to have a single prespecified model. We
simply may not know quite enough about the entire system of variables we are studying, or perhaps
collecting some of the data is expensive and we want to cull as many of the non-important variables
out of the equation. There are a variety of approaches that will either allow us to include more
variables than the rules of thumb suggest, or that will remove extraneous variables with the correct
adjustment. The simplest technique for reducing degrees of freedom is to combine predictors in
some rational way. Combining is useful for variables that act solely as nuisance or adjustment
variables, whose individual regression coefficients are not of particular interest but whose
information we still want included in the model. We can simply
create a composite score from two or more variables, by summing their ranks or converting the
variables to standardized scores and summing them. Alternatively, we can use a clustering
technique such as principal components or common factor analysis to develop a composite that
captures the information in the variables. The resulting composite that we create is then used
instead of the individual variables in the model. More details on these approaches are available in
Harrell (40).
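A minimal sketch of the standardize-and-sum composite; the three nuisance covariates and their scales are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000

# Three hypothetical adjustment covariates measured on very different scales.
age = rng.normal(55, 10, n)
bmi = rng.normal(27, 4, n)
sbp = rng.normal(130, 15, n)

def zscore(v):
    """Center and scale so each variable contributes on equal footing."""
    return (v - v.mean()) / v.std()

# One composite replaces three variables: 1 degree of freedom instead of 3.
composite = zscore(age) + zscore(bmi) + zscore(sbp)
print(composite.mean(), composite.std())   # mean is (numerically) zero
```

The composite then enters the regression as a single adjustment term; a principal-components score would serve the same purpose while weighting the variables by their shared variance.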
More sophisticated methods for automated model selection have been developed recently and are
now becoming more widely available in popular software packages. The techniques include the
lasso and least angle regression approaches developed by Tibshirani (53), Bayesian model
averaging (54), and the use of penalization (55) or random effects (56). The details of these
techniques are far beyond the scope of this chapter, but they do show some promise in terms of
allowing an algorithm to make reasonable selections of variables while accounting for uncertainty.
Because these approaches properly correct for capitalizing on the idiosyncrasies of the sample,
however, many researchers may be quite displeased with the failure to find ‘significant’ results.
Nevertheless, these approaches generate far more realistic appraisals of the extent to which our
results will replicate in a new sample.
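As an illustration of the penalization idea, a bare-bones coordinate-descent lasso (a sketch, not a production implementation; the data and penalty value are arbitrary) shrinks the coefficients of pure-noise predictors to zero while retaining the real ones:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 500, 10
X = rng.normal(size=(n, p))
# Only the first two predictors truly matter (assumed coefficients).
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n)

def lasso_cd(X, y, lam, iters=200):
    """Minimal coordinate-descent lasso via soft thresholding
    (assumes roughly standardized columns and centered y)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]      # partial residual for column j
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / (X[:, j] @ X[:, j] / n)
    return beta

beta = lasso_cd(X, y, lam=0.2)
print(np.round(beta, 2))   # the eight noise coefficients are shrunk to (near) zero
```

The shrinkage that discards the noise predictors also pulls the retained coefficients slightly toward zero; that deliberate bias is the price paid for a model that is far more likely to replicate.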
Summary. This paper has reviewed some of the issues involved in the estimation of regression
models in terms of variable selection and underlying causal models. Specifically, regression
models that attempt to illuminate causal understanding are most useful when we try to account for
potential confounders, include additional variables that enhance precision, and test for mediators.
For mediation, SEMs are currently the best choice for the applied researcher because they are
linear, provide consistent decomposition of the total effect into direct and indirect contributions and
allow the investigator to take measurement error into account. If interactions among two or more
variables are suspected, care must be taken to design the study in such a way that these potential
interactions can be adequately studied. When testing mediation, if there are strong interactions
between the exposure and the mediator, methods beyond simple SEMs are needed. Finally, in order
to increase the likelihood that our models will replicate, and hence be generalizable, attention
should be paid to the number of parameters we seek to estimate in the context of sample size.
Figure Caption
Results of a simulation of automated stepwise regression with 15 candidate predictor variables. In the
true model, predictors were randomly generated and therefore unrelated to the response variable,
meaning that the true r-square was zero. The ratio of predictors to sample size was then
manipulated by altering the sample size. The frequency of falsely high r-squares increases as the
sample size to predictors ratio decreases.
31
References
1. McCullagh P, Nelder J. Generalized Linear Models. London: Chapman and Hall; 1989.
2. Cox DR, Oakes D. Analysis of Survival Data. London: Chapman & Hall; 1984.
3. Muthen LK, Muthen B. Mplus User's Guide. 3rd ed. Los Angeles, CA: Muthen and Muthen; 2004.
4. Glare PGW. Oxford Latin Dictionary. Oxford University Press; 1982.
5. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997;127:757-63.
6. Glymour MM, Greenland S, Rothman KJ, Lash TL. Causal diagrams. In: Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008, p. 183-212.
7. VanderWeele TJ, Vansteelandt S. Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface 2009;2:457-68.
8. Steyerberg EW. Clinical Prediction Models. New York: Springer; 2009.
9. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986;51:1173-82.
10. Hafeman DM, Schwartz S. Opening the Black Box: a motivation for the assessment of mediation. Int J Epidemiol 2009;38:838-45.
11. Nielsen NR, Zhang ZF, Kristensen TS, Netterstrom B, Schnohr P, Gronbaek M. Self reported stress and risk of breast cancer: prospective cohort study. BMJ 2005;331:548.
12. Key TJ, Appleby PN, Reeves GK, Roddam A, Dorgan JF, Longcope C, Stanczyk FZ, Stephenson HE Jr, Falk RT, Miller R, Schatzkin A, Allen DS, Fentiman IS, Wang DY, Dowsett M, Thomas HV, Hankinson SE, Toniolo P, Akhmedkhanov A, Koenig K, Shore RE, Zeleniuch-Jacquotte A, Berrino F, Muti P, Micheli A, Krogh V, Sieri S, Pala V, Venturelli E, Secreto G, Barrett-Connor E, Laughlin GA, Kabuto M, Akiba S, Stevens RG, Neriishi K, Land CE, Cauley JA, Kuller LH, Cummings SR, Helzlsouer KJ, Alberg AJ, Bush TL, Comstock GW, Gordon GB, Miller SR. Body mass index, serum sex hormones, and breast cancer risk in postmenopausal women. J Natl Cancer Inst 2003;95:1218-26.
13. Mortensen LH, Diderichsen F, Smith GD, Andersen AM. The social gradient in birthweight at term: quantification of the mediating role of maternal smoking and body mass index. Hum Reprod 2009;24:2629-35.
14. Kraemer HC, Wilson GT, Fairburn CG, Agras WS. Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry 2002;59:877-83.
15. Smith GD, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 2004;33:30-42.
16. Cohn JN. Introduction to surrogate markers. Circulation 2004;109:IV20-IV1.
17. Boyle SH, Mortensen L, Gronbaek M, Barefoot JC. Hostility, drinking pattern and mortality. Addiction 2008;103:54-9.
18. Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155:176-84.
19. Hernan MA, Robins JM. Causal Inference. Chapman & Hall/CRC.
20. Dawid AP. Causal Inference Without Counterfactuals. J Am Stat Assoc 2000;95:407-24.
21. Mackinnon DP, Lockwood CM, Hoffman JM, West SG, Sheets V. A comparison of methods to test mediation and other intervening variable effects. Psychol Methods 2002;7:83-104.
22. Sterne JA, Davey SG. Sifting the evidence-what's wrong with significance tests? BMJ 2001;322:226-31.
23. Rothman KJ. Significance questing. Ann Intern Med 1986;105:445-7.
24. Kaufman JS, MacLehose RF, Kaufman S. A further critique of the analytic strategy of adjusting for covariates to identify biologic mediation. Epidemiol Perspect Innov 2004;1:4.
25. Batty GD, Gale CR, Mortensen LH, Langenberg C, Shipley MJ, Deary IJ. Pre-morbid intelligence, the metabolic syndrome and mortality: the Vietnam Experience Study. Diabetologia 2008;51:436-43.
26. Cole SR, Hernan MA. Fallibility in estimating direct effects. Int J Epidemiol 2002;31:163-5.
27. Petersen ML, Sinisi SE, van der Laan MJ. Estimation of direct causal effects. Epidemiology 2006;17:276-84.
28. Gustafson P. Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. Boca Raton, FL: CRC Press; 2003.
29. Rothman KJ. Measuring Interactions. In: Epidemiology: An Introduction. 1st ed. New York: Oxford University Press; 2002, p. 168-80.
30. Ditlevsen S, Christensen U, Lynch J, Damsgaard MT, Keiding N. The mediation proportion: a structural equation approach for estimating the proportion of exposure effect on outcome explained by an intermediate variable. Epidemiology 2005;16:114-20.
31. Mackinnon DP, Lockwood CM, Brown CH, Wang W, Hoffman JM. The intermediate endpoint effect in logistic and probit regression. Clin Trials 2007;4:499-513.
32. Lynch J, Davey SG, Harper S, Bainbridge K. Explaining the social gradient in coronary heart disease: comparing relative and absolute risk approaches. J Epidemiol Community Health 2006;60:436-41.
33. Pearl J. Direct and indirect effects: Technical report R-273. Proceedings of the American Statistical Association. Minneapolis, MN; 2005, p. 1572-81.
34. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355:1064-9.
35. Altman DG, Bland JM. Statistics Notes: Interaction revisited: the difference between two estimates. BMJ 2003;326:219.
36. Altman DG, Matthews JNS. Statistics Notes: Interaction 1: heterogeneity of effects. BMJ 1996;313:486.
37. Dixon DO, Simon R. Bayesian subset analysis in a colorectal cancer clinical trial. Stat Med 1992;11:13-22.
38. Simon R. Bayesian subset analysis: application to studying treatment-by-gender interactions. Stat Med 2002;21:2909-16.
39. Cohen J, West SG, Aiken L, Cohen P. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 3rd ed. London: Taylor and Francis; 2002.
40. Harrell FE. Regression Modeling Strategies: With applications to linear modeling, logistic regression, and survival analysis. New York: Springer; 2001.
41. MacCallum RC, Zhang S, Preacher K, Rucker D. On the Practice of Dichotomization of Quantitative Variables. Psychological Methods 2002;7:19-40.
42. Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med 2006;25:127-41.
43. Cohen J. The cost of dichotomization. Appl Psychol Meas 1983;7:249-53.
44. Harrell FE. Problems Caused by Categorizing Continuous Variables. 2008.
45. Maxwell SE, Delaney HD. Bivariate Median Splits and Spurious Statistical Significance. Psychol Bull 1993;113:20.
46. Royston P, Altman DG. Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling. Journal of the Royal Statistical Society Series C (Applied Statistics) 1994;43:429-67.
47. Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med 2004;66:411-21.
48. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996;49:1373-9.
49. Vittinghoff E, McCulloch CE. Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression. Am J Epidemiol 2007;165:710-8.
50. Freedland KE, Babyak MA, McMahon RJ, Jennings JR, Golden RN, Sheps DS. Statistical Guidelines for Psychosomatic Medicine. Psychosom Med 2005;67:167.
51. Thompson B. Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Ed Psychol Meas 1995;55:525-34.
52. Budtz-Jørgensen E, Keiding N, Grandjean P, Weihe P. Confounder Selection in Environmental Epidemiology: Assessment of Health Effects of Prenatal Mercury Exposure. Ann Epidemiol 2007;17:27-35.
53. Tibshirani R. The lasso method for variable selection in the Cox model. Stat Med 1997;16:385-95.
54. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Statistical Science 1999;14:382-417.
55. Moons KGM, Donders ART, Steyerberg EW, Harrell FE. Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clin Epidemiol 2004;57:1262-70.
56. Greenland S. When should epidemiologic regressions use random coefficients? Biometrics 2000;56:915-21.