12

Click here to load reader

Statistical methods for preventive trials in mental health

Embed Size (px)

Citation preview

Page 1: Statistical methods for preventive trials in mental health

STATISTICS IN MEDICINE, VOL. 12,289-300 (1993)

STATISTICAL METHODS FOR PREVENTIVE TRIALS IN MENTAL HEALTH

C. HENDRICKS BROWN Department of Epidemiology and Biostatisiics, University of South Florida, 13201 North 30th Street, Tampa,

FL 33612, U.S.A. and Department of Biostatistics, Johns Hopkins University, Baltimore, MD. U.S.A.

INTRODUCTION

This paper discusses two methods for analysing preventive field trial data. We introduce a preventive trial aimed at improving psychiatric symptoms and reducing delinquency, drug and alcohol use during adolescence by intervening in first and second grades, a time when identifiable risk factors for these behaviours often first appear. We rely on the work of Rubin and Holland on the meaning of treatment effects to suggest a graphical approach to exploring the nature of the intervention effects. The adjusted empirical Q-Q plot is a new exploratory tool which helps identify appropriate models for analyses and helps identify which subgroups might benefit the most from the intervention. We also discuss an application of generalized estimating equations to sequences of unequally spaced binary data. Both of these methods are applied to preventive trial data.

THE BALTIMORE PREVENTION CENTER

The Baltimore Prevention Research Centre (PRC) is based on a co-operative agreement between the Mental Hygiene Department at Johns Hopkins School of Hygiene and Public Health and the Baltimore City School System. It seeks to reduce psychopathology, delinquency and drug use during adolescence by intervening during first and second grade, about ten years before the major mental health outcomes of adolescence. The intervention targets are early childhood risk factors for these adolescent outcomes which have been found repeatedly in prospective, longitudinal studies (Kellam et d l ) . For example, poor performance in grade school is related to depressed and anxious feelings for males. Aggressive behaviour in grade school is related to delinquency and drug use in adolescence. In addition, aggressive children who exhibit shy and withdrawn behaviour are much more likely to be delinquent and to be drug users in adolescence compared to other children, even those who were rated aggressive but not shy behaving (Kellam et u1.l). The expectation underlying the Baltimore preventive trials is that by modifying the risk factors early in a child’s development using inexpensive, universal interventions (that is, applied to everyone, Gordon’), the developmental pathways which ordinarily lead to adolescent mental health outcomes will be modified (Kellam et d3).

Two inexpensive classroom-based trials were run concurrently in 19 Baltimore City public schools. We will present data from one of these trials. For the first cohort, the intervention period was the 1985-86 first and 1986-87 second grades, and the second cohort again had a two- year mtervention period lagging by one year. The 19 schools were chosen so that they could be

0277-67 15/93/030289-12$11 .OO 0 1993 by John Wiley & Sons, Ltd.

Page 2: Statistical methods for preventive trials in mental health

290 C. H. BROWN

grouped into five geographically contiguous areas, each area being fairly homogeneous on sociodemographic and school-level characteristics. Within each geographic area, schools were selected randomly to receive one of two interventions or to serve as an external control school. Within the two types of intervention schools, all first-grade classrooms were balanced on children’s kindergarten grades and designated either an intervention classroom or an internal control classroom. The internal control classrooms did not receive the intervention, but their inclusion in the design allowed us to examine the differences between intervention and non- intervention while holding fixed school-level characteristics. Classrooms in the external control schools served as a complementary comparison to the internal controls; they would be especially important if there were evidence of contamination of control classrooms by the intervention. All the teachers, both intervention and control, received the same amount of training and attention by the prevention staff, and the classrooms maintained their compositions throughout first and second grades (see Dolan et

The Good Behaviour Game (GBG) intervention attempted to reduce aggressive and shy- withdrawn behaviours in the classroom using team-based behaviour management. Children were divided into three teams, and points were accumulated whenever team members exhibited any of several precisely defined aggressive behaviours. Any (or all) teams which scored less than a prespecified number of points received rewards. The goal of the intervention was to encourage students to manage their own behaviour through team self-interest. We anticipated that the successful diminishing of aggressive behaviours would have both short-term and long-term impacts. The major long-term impact would be to lessen delinquent acts and drug and alcohol use, all of which are known to have early aggressive behaviour as a risk fa~tor . ’ .~”. For short- term outcomes, we could infer early signs of intervention success from reduced ratings of aggressive and shy behaviours by both teachers and peers as well as by reduced levels of these behaviours during direct observation of classroom behaviour by trained observers..For the short term, we also anticipated that the classrooms’s level of learning would increase because less time would be needed to be devoted to classroom management. Because of the strong relationship between achievement and depression, we anticipated that the GBG intervention eventually would lessen depressive symptoms as well.3

for more details).

MEANING O F INTERVENTION EFFECT

The primary research question about the intervention effect is often stated in terms of whether there is an overall effect of an intervention. However, even with universal interventions it is important to identify who benefits from or is adversely affected by the intervention. Sometimes there are specific hypotheses about which subgroups should benefit the most, but for the most part there is usually too little etiologic data available to make educated a priori guesses about effectiveness. This question then should lead us to consider exploratory methods for determining intervention effectiveness as well as more confirmatory methods used to validate these ex- ploratory findings on fresh data. We will examine two useful methods for examining intervention effects; the first is an exploratory tool while the second is a more confirmatory tool. The exploratory method is graphical in nature; it makes fewer assumptions and can be used to suggest appropriate classes of models or re-expressions for examination.

The most natural definition of intervention effect comes from the interpretation which has now come to be known as Rubin’s method.*-” In a notation similar to that of Holland’s, we can define for each unit u, two potential outcomes, Yi(u), the response under treatment i (for intervention), and Yc(u), the response under treatment c (for control). We observe for each unit u

Page 3: Statistical methods for preventive trials in mental health

STATISTICAL METHODS FOR PREVENTIVE TRIALS IN MENTAL HEALTH 29 1

which treatment the unit received, S ( u ) - which can be either i or c, and the response under that treatment, Ys(,, (u). Along with which treatment and the associated response measure, we will also measure baseline data which we will designate as X (u).

Rubin's interpretation of intervention effect is based on what the difference in response would be if an individual were exposed to the intervention compared to the exposure without the intervention. The intervention effect on the uth subject is defined to be the difference in potential responses Yiu) and Y,(u). Since only one of these conditions can be observed, we need to infer, from data on subjects who were exposed to the intervention and data from subjects without such exposure, what the difference in response would be, namely.

Yi(U) - Yc(u). If subjects are ignorably assigned to intervention or control,' '3 l 2 we can make inferences about population differences since

E,(Y,(u) - Y&)) = E,(Ys,,,(u)lS(u) = 4 - 44(Ys{,)IS(u) = 4. (1)

It is, however, not necessary to use the difference in these two scores; ratios or other transforms, for example, may be more useful than differences, but only on the difference scale will equation (1) be satisfied. In general, we can consider differences between the two responses after transforming by the same monotone transform f,

f( Yi(u)) - f( Y,(u))* Some commonly used transforms are the identity, the log-transform, and the indicator function which is 1 if the argument is above a cutoff or 0 if below the cutoff value. This last transformation would be used to dichotomize outcomes in order to examine changes in one tail of the distribution.

We would like to identify situations under which this transformed difference has a relatively simple form, such as

where E, is an error term independent of X(u). In equation (2) the intervention effect is constant over all subjects; in equation (3) there is a linear interaction between the covariate X(u) and intervention, and in equation (4) the interaction is described by a potentially non-linear function g.

Adjusted empirical Q-Q plots, which are described below, are useful tools in determining whether our analyses should be based on differences in the untransformed scale, whether a transformation of the data can aid in describing the intervention effect, or whether an interaction between intervention and some subject characteristics exists. The technique we describe below does not rely on any a priori choice for this comparison.

ADJUSTED EMPIRICAL Q-Q PLOTS

Empirical Q-Q plots (EQQ)' are excellent exploratory tools for comparing the distributions of two groups on the same variable. Figure 1, for example, compares the distributions of Baltimore

Page 4: Statistical methods for preventive trials in mental health

292 C. H. BROWN

0.0 0.2 0.4 0.6 0.8

Spring Aggression Level for Students in GBG Classes

Figure 1. EQQ plot of spring aggression ratings by peers for males

PRC peer ratings of aggressive scores for males after one year of intervention who were in the GBG classrooms versus the internal control classrooms (C) within the same schools. A high score indicates that peers rated that particular child as behaving aggressively. There appears to be a modest but systematic difference in the two scores at the low end of the scale; that is, the lowest scoring males in the internal control classes appear to be scoring higher than the lowest scoring males in the GBG classes. This is the simplest two-group EQQ plot to examine the potential effect of the intervention, and as we will see it is misleading.

Figure 2 shows a similar comparison of GBG and internal control classrooms, this time at the beginning of the school year before intervention began. There are clear baseline differences with the highest scoring males in the GBG classes scoring higher than the highest scoring males in the internal control classrooms at the beginning of the year. There are, however, no differences in their use of the lower end of the scale. (Note that this information would not be revealed by an analysis on means.) We cannot determine from these data alone whether this difference at baseline reflects more of a variation in actual behaviour or whether the children use the scales differently. This uncertainty in meaning is a primary reason why this study used three different measures to rate aggressiveness: peer ratings; teacher ratings; and direct observation by trained, observers doing micro-analytic assessments in the classroom. Comparison of intervention effectiveness on all three measures can potentially clarify what the intervention is actually affecting.

The failure to achieve perfect balance on this baseline variable of peer aggression should not necessarily be viewed as a poor assignment to intervention. In fact, the large number of variables examined in this study make it extremely unlikely that excellent balance would be found on all baseline variables, and this imbalance in Fall aggressive ratings was the most extreme found among many baseline variables.

Page 5: Statistical methods for preventive trials in mental health

STATISTICAL METHODS FOR PREVENTIVE TRIALS IN MENTAL HEALTH 293

. . / ,

0.0 0.2 0.4 0.6 0.0

Fall Aggression Level for Students in GBG Classes

Figure 2. EQQ plot of fall (baseline) aggression ratings by peers

Based on these baseline differences on peer aggression, we choose to examine the differences between GBG and internal control males using a procedure to adjust for baseline aggression. The adjusted EQQ plot which we show below and discuss how to produce it and evaluate it in a later section, enables us to make a graphical comparison of the outcome distribution for two groups while adjusting for baseline characteristics. It is designed to be read much like an EQQ plot; points above the 45 degree line indicate higher scores for the group on the ordinate compared to the group on the abscissa. Figure 3 shows the adjusted EQQ plot for aggressive ratings after the first school year's intervention period, adjusted for Fall of first grade scores. The number given in the plotting position represents the ordinal level of the baseline characteristic; small values indicate that the baseline score for that point is a low value. The 45 degree line in the figure is used as a guide to indicate whether the adjusted aggression scores in the Spring for children in control classes are higher (that is, above the line) than those receiving the GBG intervention or whether they are lower (that is, below the line) than those receiving the GBG intervention. Although the adjusted EQQ plot does not retain the monotone increasing characteristic shape of an EQQ plot (see for example, Easton and McC~l loch '~ for a non-monotone QQ plot), the points are in a generally increasing pattern. The adjusted EQQ plot helps us consider whether the intervention has a different impact for varying baseline values and how the shape of the outcome distribution differs between GBG and internal control setting conditions.

A METHOD FOR ADJUSTING FOR BASELINE DIFFERENCES

The adjusted EQQ plot does not assume a linear relationship between baseline (Fall) and outcome (Spring) variables as one would ordinarily assume in analysis of covariance. Instead,

Page 6: Statistical methods for preventive trials in mental health

294 C. H. BROWN

2 - i 5

1

0.0 0.2 0.4 0.6

Adjusted Spring Aggression Scores for GBG Students

Figure 3. Adjusted EQQ plot of spring aggression for males

lowess, a resistant local fitting procedure is used” so that non-linear relationships - as well as non- parallelism - can be handled. The adjusted EQQ plot first applies the lowess smoother to each of the two groups. The combined data are then divided into a modest number of subsets based on sorted baseline scores; this ordinal grouping is used as a plotting character with lower values indicating lower baseline scores. The adjusted EQQ plot makes comparisons of the outcomes of pairs of subjects, one exposed to the GBG intervention, having outcome measure Y,,, and baseline X,,,, and the other in the internal control setting having outcome measure Y, and baseline X,, where XGBG and X , are close. We first estimate what their scores would be had they both been observed at the same baseline score, say, X, . If we shift the residual from the lowess fit at the point XGBG to the same residual from the lowess fit at X, , we have:

shifted value for GBG = Y,,, - (GBG lowess fit at X,,,) + (GBG lowess fit at X,)

and similarly

shifted value for C = Y, - (control lowess fit at X,) + (control lowess fit at X , )

for the shifted values for GBG and internal control. These shifts are shown in Figure 4. The point X , which is used for adjustment varies by level of baseline score. In Figure 3 we have used eight different adjustment points, one for each subset formed from sorted baseline values. EQQ plots of the shifted values for each subset are then formed as in Figure 3. Even more usefully, the plot can be shown ‘frame-by-frame’ using dynamic graphic techniques such as those available in MacSpin.I6

We now return to Figure 3. Since most of the small numbers on the plot (that is, small baseline values) are in the lower left-hand side of the plot, we first conclude that there is a strong

Page 7: Statistical methods for preventive trials in mental health

STATISTICAL METHODS FOR PREVENTIVE TRIALS IN MENTAL HEALTH - Shifted Value for GBG

GBG Lowess Fit at X GBG

295

---- Limited Range of Baseline Values ---.

Figure 4. Adjusting response variable to a common baseline value

relationship between baseline and outcome aggressive ratings. We can also see that most of the points lie above the line, indicating that after adjustment for Fall aggressive scores, GBG exposed males are rated somewhat less aggressive than are the males in internal control classes (the majority of the points fall above the 45 degree line) unlike the unadjusted plot in Figure 1. There appears to be a positive benefit of GBG in reducing aggressiveness even at a low level of baseline aggressiveness; there may be some question about its effectiveness near the middle, but the effects do not seem to diminish for those who have high rates of baseline aggressive behaviours. Thus the intervention is likely to be useful throughout most of the population, potentially affecting those in the middle somewhat less than those on the extremes. Referring back to equations (2 to 4), Figure 3 suggests that on the original scale of aggression outcome measures, the intervention effect appears to vary little by baseline aggression (equation (2)). If variations in impact are borne out in further analyses, non-linear models (similar to equation (4)) may be needed.

Of course, these statements need further verification before we can feel confident about their validity, but they do encourage us to choose analytic methods that will permit us to look for broad effects of the intervention with some effort devoted to examining a possibly small, non- linear interaction involving baseline aggression ratings. We have no evidence, though, that the outcome variable should be dichotomized as we might do for the data in Figure 2 (that is, near the upper quartile) in order to investigate where the intervention effect is most powerful.

The following plot is a second complementary way to examine whether the intervention affects the conditional distribution of the outcome (Spring aggression) given baseline covariate (Fall aggression). In Figure 5 we have compared the two lowess fits of GBG and internal control Spring versus fall aggression scores by plotting for each covariate value the lowess fit of GBG on the horizontal against the lowess fit for controls at the same covariate value on the vertical axis. Co-ordinates are then connected to one another by a smooth line, ordered by increasing values of

Page 8: Statistical methods for preventive trials in mental health

296 C. H. BROWN

x

I I I I I

0.1 0.2 0.3 0.4 0.5

Lowess Fit for Students in GBG Classes

Figure 5. Comparison of two lowess fits for fall to spring aggression

the covariate. This plot can be used to suggest how these two central tendency relationships, expressed as functions of a covariate, differ from one another. The systematic departure from the 45 degree line away from the origin suggests once again that GBG tends to improve aggression for students who had either low or high aggression scores at baseline with little or no improvement for those with moderate baseline aggression. Repeated analysis using a second independent cohort can be used to evaluate the stability of these tentative conclusions.

GENERALIZED ESTIMATING EQUATIONS FOR UNEQUALLY SPACED BINARY DATA

We now turn to a second more confirmatory tool. Generalized estimating equations (GEE) methodology has been applied to many longitudinal s t u d i e ~ . ' ~ ~ ' ~ The major benefit of these techniques is that they permit us to apply generalized linear models to problems where the correlation structure, arising from repeated measures or clustering, is unlikely to be modelled exactly. The non-independence between observations can cause severe underestimation of variances with the effect that all tests may be completely erroneous when we neglect this non- independence.

For our example, we will apply GEE to a problem with multiple levels of clustering. Our examination of the first-year outcome of the GBG using an adjusted EQQ plot (Figure 3) relied on peer measures of aggression. Despite the salience of the peer group's ratings on an individual's development, the reliance on peer measures or even teacher ratings introduces problems in interpretation of results. Both peers and teachers are participants in the intervention, so it is possible that the intervention itself, by increasing the cost of being aggressive in the classroom,

Page 9: Statistical methods for preventive trials in mental health

STATISTICAL METHODS FOR PREVENTIVE TRIALS IN MENTAL HEALTH 297

could change peer and teacher ratings without any change in the frequencies of the behaviours. To determine whether the interventions indeed changed the frequencies of maladapting behavi- our, we had all children in the classroom rated by observers on the presence or absence of specific behaviours during small time intervals. Specifically, at four times during first grade, two trained observers would visit the classroom and collect approximately 40 repeated measures each consisting of 10-seconds of observations. The observers would collect 6 consecutive 10-second observations on a child and then proceed to the next child on the list, continuing until all children in the class had been observed for one minute. Then the observer would return to the first child to collect another 6 observations and would continue until the classroom was dismissed. The data are characterized by highly correlated outcomes for consecutive measures with modest to little correlation between measures collected 15 to 30 minutes apart.

It was apparent early on that the classroom context had a major impact on the prevalence of behaviours, so data were also collected on the type of classroom structure during each 10-second observation, whether the class was working as a whole in a large group, whether the child under observation was working with a small group, whether the child was doing seat work, or whether time was taken up for procedural matters, such as getting on coats to go home. Among the outcomes measured at each 10-second interval was off-task behaviour, a primary measure of the child's concentration level during class time. Children were off-task during a 10-second time period if they did not attend to the assigned task for a full 6-second period during this interval. We will examine whether the GBG improved time off-task.

A series of logistic regression models were considered to determine the effect of the intervention given characteristics of the child and the classroom context at that time period. One model without interactions would be the following:

logit Pr(0ff-task,lBaseline, Context,, Intervention) = Po + P1 Baseline + /?,Context, + /?,Intervention.

The independence assumption ordinarily imposed in logistic regression is not likely to hold here since two measures close in time on the same subject are not likely to be independent. One might consider modelling this dependence, for example, as a Markov chain (Albert and B r o ~ n , ' ~ ) , but it would be difficult because of the interrupted sequences; some observations on one subject are preceded by a measure on the same subject while others are preceded by a measure on a different subject. More problematic, however, would be the difficulty in analysing off-task behaviours that were restricted to one type of activity; such a restriction would create even more irregular gaps in the dataset.

GEE methods can be applied to multivariate data to estimate parameters associated with marginal probabilities of an outcome such as off-task behaviour, even if the associations between the measures are poorly specified. As long as the number of subjects or units which are in fact measured independently is large and the marginal probability model is specified correctly, we can obtain consistent estimates of these marginal probability parameters and asymptotically correct confidence intervals as well. Liang and Zeger'* and Zeger and Liangl' recommend specifying a 'working model' for the correlation between the measures; if this model is close to correct then the estimates will be efficient. We chose to use a working model of independence in order to calculate the logistic regression parameters using a standard logistic regression package. Then adjustments were made in computing the variances following Zeger and Liang."

A further difficulty occurred in these data, however, since there are multiple levels of clustering that we were forced to consider. Table I shows five different levels which could be used for analysis of these data. They range from treating every observation as independent, to all 206

Page 10: Statistical methods for preventive trials in mental health

298 C. H. BROWN

Table I. Standard errors for different levels of analysis for males’ off-task behaviour

~~

Level of analysis (N) Coefficient Standard error ~ _ _ _ ~

10-second interval (7956) - 1.18 0.08 Subjects (206) - 1.18 0.22 Classes (25) - 1.18 0.32 Schools (12) - 1.18 0.36 Geographic areas (5) - 1.18 027

individuals as independent, to classrooms as independent, to school and finally geographic areas as independent. Note that in Table I, the point estimates for b3, the coefficient for GBG’s reduction in off-task behaviour compared to internal control, are identical for all these models; the log odds of being off-task for GBG students was 1.18 smaller than that for control students. The identical values for intervention effects in each of these models results from the assumption of independence as a working correlation model; under such a model only standard errors change with a change in the level of clustering. These standard errors were all computed using the general ‘sandwich-type’ formula,

where I is the observed Fisher information matrix under an independence assumption and S,([) is a vector of scores evaluated at the estimator [, again computed under an independence assumption. The subscript refers to the level of clustering. To obtain the first standard error in Table I, the subscript i refers to the score for each of the 7956 observations. To obtain the second standard error, i refers to the sum of scores for the ith individual; the third row involves a sum of scores for observations within the ith classroom, etc. Each of these levels of analysis were used to compute a standard error of the intervention effect, all using a working model of independence. Note that the GEE standard errors given in Table I are extremely different. Confidence intervals obtained by treating every observation as independent would be only one-third to one-fourth as wide as they would be based on other levels of analysis.

Some simple asymptotic properties about the standard errors are easy to derive. If it is correct to treat, say, individuals and all higher order clusters as independent, then the standard error calculations for that level, as well as any level higher than that level will all be asymptotically unbiased as the number of individuals increases to infinity. However, because of the small number of schools and geographical units, the variance of the standard error will increase dramatically. On the other hand, if we err in making an assumption that higher level of clustering is negligible, then the GEE standard error for our estimator will be biased. Suppose, for example, there is a sizable within-classroom association, but we treat observations on different individuals as (erroneously) independent. Then the standard error based on an assumption of independent individuals (0.22 in Table I) will be on average too small. Thus the choice between which standard error to report comes down to a trade-off of bias against variance.

There are few general statistical methods that can be used to decide which of the standard errors should lead to the most valid inferences. The newest version of GEE (Liang et al.”) can efficiently estimate some associations between variables, but it would be extremely difficult to handle the interrupted sequences in this dataset and large number of hierarchical clusters. From recent work, the most promising method is the bootstrap since this allows one to obtain separate estimates of bias and variance at all levels of clustering.

Page 11: Statistical methods for preventive trials in mental health

STATISTICAL METHODS FOR PREVENTIVE TRIALS IN MENTAL HEALTH 299

In this example (Table I) we can infer that GBG does have a substantial impact on reducing off- task behaviour regardless which level of clustering is used. Even using the largest standard error, our 95% confidence interval is ( - 1.90, - 0.46). Also, since the standard errors based on complete independence and the ones based on clustering at the individual level are so different, the data from each ten-second interval are clearly not independent of nearby ones. Similarly, judging by the sizes of the standard errors, there may be an important intraclass effect. However, based on the sized of the standard errors and the design which was based on classroom level interventions, it is unlikely that school level or higher level collapsing is necessary.

CONCLUSIONS

We have discussed two methods for analysis of prevention data in the mental health field. It is important for intervention studies to address the roles of mediating and modifying variables to understand whether a broad-based universal intervention should be applied to everyone or whether additional interventions should target select subgroups (Brownz’). The statistical methods to answer these questions should include exploratory methods as well as those traditionally used for treatment trials. Adjusted EQQ plots are useful because they permit non- parametric adjustments of covariates and therefore do not require full specification of the first moment structure, and they can play a major role in identifying variation in intervention effectiveness. Auxillary questions raised by the study should be addressed with existing data while validating these tentative results on separate cohorts whenever possible.

The statistical methods described here rely on the importance of defining the effect of an intervention as the difference in response that an individual would have if the intervention is given compared to the response if the intervention is not given. By using exploratory methods to examine the nature of this effect we can discover how best to model the intervention’s impact, either with transformations or categorizations of both baseline variables and response variables, and borrow strength from data across subjects. Non-independence in longitudinal non-normal data and multiple levels of clustering have in the past presented technical difficulties in obtaining valid inferential statements. Generalized estimating equations provides a general method of handling these problems without the necessity of specifying a correct second moment structure and can be applied to intervention studies with little additional effort beyond that necessary for performing generalized linear models.

The author wishes to acknowledge the contributions of Maria Corrada-Bravo for her help in the generalized estimating equation analyses. Also, this research could not have been carried out without the collaborative involvement of the Baltimore City Public School, and we gratefully acknowledge the contributions and support of Alice Pinderhughes, Leonard Wheeler, Leonard Granick, Craig Cutter, and Ray Bird, along with the principals and teachers involved in the study. This paper was supported by NIMH Grant MH40859.

ACKNOWLEDGEMENTS

REFERENCES 1. Kellam, S. G. Brown, C. H., Rubin, B. R. and Ensminger, M. E. ‘Paths leading to teenage psychiatric

symptoms and substance use: Developmental epidemiological studies in Woodlawn’, in Guze, S. B. Earls, F. J. and Barrett, J. E. (eds.) Childhood Psychopathology and Development, Raven Press, New York, 1983.

2. Gordon, R. S. ‘An operational classification of disease prevention’, Public Health Reports, 98, 107-109, (1983).

3. Kellam, S. G., Werthamer-Larsson, L., Dolan, L., Brown, C. H., Mayer, L., Rebok, G., Anthony, J.,

Page 12: Statistical methods for preventive trials in mental health

300 C. H. BROWN

Laudolff, J., Edelsohn, G. and Wheeler, L. ‘Developmental epidemiologically-based preventive trials: Baseline modeling of early target behaviors and depressive symptoms’, American Journal of Community

4. Dolan, L. J., Kellam, S. G., Brown, C. H., Werthamer-Larsson, L., Rebok, G. W., Mayer, L. S., Landolff, J., Turkkan, J., Ford, C. and Wheeler, L. ‘The short-term impact of two classroom-based preventive interventions on aggressive and shy behaviors and poor achievement’, To appear in Journal of Applied Developmental Psychology, (1992).

5. Dolan, L., Kellam, S., Brown, C. H., et al. ‘An early classroom-based preventive intervention aimed at aggressive and shy behaviors: impact at end of first grade’, Technical Report, Department of Mental Hygiene, Johns Hopkins University.

6. Ensminger, M. E., Kellam, S. G. and Rubin, B. R. ‘School and family origins of delinquency: comparisons by sex’, in Van Dusen, K. T. and Mednick, S. A. (eds.), Prospective Studies of Crime and Deliquency, Kluwer-Nijhoff, Boston, 1983.

7. Kellam, S. G., Branch, J. D., Agarwal, K. C. and Ensminger, M. E. Mental Health and Going to School: The Woodlawn Program of Assessment, Early, Intervention, and Evaluation, University of Chicago Press, Chicago, 1975.

8. Rubin, D. B. ‘Estimating causal effects of treatments in randomized and nonrandomized studies’, Journal of Educational Psychology, 66,688-701 (1974).

9. Rubin, D. B. ‘Bayesian inference for causal effects: The role of randomization’, Annals of Statistics, 6,

10. Holland, P. W. ‘Statistics and causal inference’, Journal of the American Statistical Association, 81,

11. Rosenbaum, P. R. and Rubin, D. B. ‘The central role of the propensity score in observational studies for causal effects’, Biometrika, 70, 41-55 (1983).

12. Rosenbaum, P. R. ‘From association to causation in observational studies: The.role of tests of strongly ignorable treatment assignment’, Journal of the American Statistical Association, 79, 41-48 (1984).

13. Wilk, M. B. and Gnanadesikan, R. ‘Probability plotting methods for the analysis of data’, Biometrika, 55,

14. Easton, G. S. and McCulloch, R. E. ‘A multivariate generalization of quantile-quantile plots’, Journal of

15. Cleveland, W. S. ‘Robust locally weighted regression and smoothing scatterplots’, Journal of the

16. MacSpin Reference Manual Dz Software, 1986. 17. Zeger, S. and Liang, K. Y. ‘Longitudinal data analysis for discrete and continuous outcomes’, Biometrics,

18. Liang, K. Y. and Zeger, S. ‘Longitudinal data analysis using generalized linear models’, Biometrika, 73,

19. Albert, P. and Brown, C. H. ‘The design of a panel study under an alternating Poisson process assumption’, Biometrics, 47, 921-932 (1991).

20. Liang, K. Y., Zeger, S. L. and Qaqish, B. ‘Multivariate regression analysis of categorical data’, Journal of the Royal Statistical Society, Series B, 54, 3-40 (1992).

21. Brown, C. H. ‘Principles for designing universal and targeted intervention studies’, Proceedings of the Second National Conference on Prevention Research, National Institute of Mental Health, 1991.

Psychology, 19, 563-584 (1991).

34-58 (1978).

945-960 (1986).

1-17 (1968).

the American Statistical Association, 85, 376-386 (1990).

American Statistical Association, 74, 829-836 (1979).

42, 121-130 (1986).

13-22 (1986).