1522-3434/01/0600-0091$19.50/0 © 2001 Plenum Publishing Corporation


Mental Health Services Research, Vol. 3, No. 2, 2001

Interpreting Results in Mental Health Research

Jeffrey S. Harman,1,5 Willard G. Manning,2 Nicole Lurie,3 and Chuan-Fen Liu4

It is often difficult to interpret the clinical or policy significance of findings from mental health research when results are presented only in terms of statistical significance. Results expressed in terms of p values or as a metric corresponding to a mental health status scale are seldom intuitively meaningful. To help interpret the significance of research results, we demonstrate a social validity approach that relates scores on mental health status scales to four subsequent major life events. A logistic regression model is used to estimate the relation between mental health status scores and the probability of subsequent major life events, using data obtained on Medicaid beneficiaries with schizophrenia from an evaluation of the Utah Prepaid Mental Health Plan. Using this relatively simple approach will demonstrate to policy makers, clinicians, and researchers the social impact of an outcome, thereby aiding in the interpretation of the significance of results.

KEY WORDS: clinical significance; effect size; mental health status; schizophrenia; life events.

1 Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania.

2 Department of Health Studies, University of Chicago, MC2007, Chicago, Illinois.

3 Schools of Medicine and Public Health, University of Minnesota, Minneapolis, Minnesota.

4 Health Services Research and Development (152), Veterans Affairs Puget Sound Health Care System, Seattle, Washington.

5 Correspondence should be directed to Jeffrey S. Harman, Department of Psychiatry, University of Pittsburgh School of Medicine, 3811 O’Hara Street, Suite 430, Pittsburgh, Pennsylvania 15213; e-mail: [email protected].

Policy makers, researchers, and clinicians are increasingly asked to respond to studies in mental health services that examine the effect of treatment or other interventions or policies on mental health status. Studies that find statistically significant differences between treatment and control groups usually conclude that the treatment or intervention has a significant effect on mental health status. Because it is often not clear whether results that are statistically significant are also significant in a clinical or policy sense, many healthcare organizations and policy makers are put in positions where they must use their own idiosyncratic methods of interpreting significance. This is partly because measures of mental health status are not expressed in a metric that is intuitively meaningful. Because of the difficulty in interpreting results from empirical studies, many important research results that could improve medical care practice are never actually translated into practice.

To address this issue, efforts have been made to improve the ability to judge significance of research findings. Specific methodologies for judging clinical significance have been proposed, including the measurement of effect sizes. Effect sizes are often defined as the mean change found in a particular variable divided by the standard deviation of that variable (Cohen, 1977). Kazis, Anderson, and Meenan (1989) argue that this method should be used to assess the clinical significance of changes in health status because the use of general effect size thresholds gives a stronger sense of the meaning of health status change. They suggest, using the thresholds defined by Cohen (1977), an effect size of 0.20 as small, one of 0.50 as moderate, and one of 0.80 or greater as large.
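
As a minimal sketch of the effect size calculation described above, the following Python snippet divides a mean change by a standard deviation and labels the result with the thresholds of 0.20, 0.50, and 0.80. The function names and the example numbers are hypothetical and are not taken from the study.

```python
def effect_size(mean_change, sd):
    """Effect size as the mean change divided by the standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_change / sd


def cohen_label(d):
    """Label an effect size using the thresholds 0.20, 0.50, and 0.80."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"


# Hypothetical example: an 8-point mean change on a scale with SD = 16.
d = effect_size(8.0, 16.0)
print(d, cohen_label(d))  # prints: 0.5 moderate
```

The label depends only on the magnitude of the effect size, so the same function applies to scales on which improvement is an increase and to scales on which it is a decrease.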

Another suggested approach to judging clinically significant change is to use normative comparisons (Jacobson et al., 1999; Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999). Jacobson et al. (1999) suggest calculating the reliable change index (RCI). The RCI is based on using one of three cutoff points to determine if a clinically significant effect occurred. These cutoff points assume that a clinically significant change occurs when (a) the level of functioning falls outside the range of the dysfunctional or the normal population, where range is defined as extending two standard deviations from the mean; or (b) the level of functioning suggests that the client is statistically more likely to be in the normal rather than in the dysfunctional population (Jacobson et al., 1999). Kendall et al. (1999) suggest a similar approach where data on treated individuals are compared with normative individuals, using “clinically equivalent” effect sizes.

A problem with using these methods is that they require well established psychometric properties of a scale. Also, these methods require that there is a set standard score for a normal population or the dysfunctional population in question (or both). Such standard scores are rarely available, with standard scores for a “normal” population especially rare for many mental health status scales. Even if standard scores are available and there is a consensus on what is “clinically equivalent,” changes in client behavior, which would be clinically significant on commonsense grounds, would not be large enough to fall within a normative range and would not be identified as clinically significant by the researcher. As Kazdin (1999, p. 333) points out, “one can be a little better or a lot better without being all better or just like most people.”
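
The cutoff rules attributed above to Jacobson et al. (1999) can be sketched in Python as follows. This is one common way to formalize such cutoffs, not a formula given in the text. It assumes a scale on which higher scores mean better functioning, and the normal-population mean and standard deviation are hypothetical placeholders, since such norms are rarely available.

```python
def cutoff_outside_dysfunctional(mean_dys, sd_dys):
    """Score two SDs beyond the dysfunctional mean, toward better functioning."""
    return mean_dys + 2.0 * sd_dys


def cutoff_within_normal(mean_norm, sd_norm):
    """Score two SDs below the normal mean, the lower edge of the normal range."""
    return mean_norm - 2.0 * sd_norm


def cutoff_closer_to_normal(mean_dys, sd_dys, mean_norm, sd_norm):
    """Score that lies equally many SDs from each population mean."""
    return (sd_norm * mean_dys + sd_dys * mean_norm) / (sd_dys + sd_norm)


# Dysfunctional values use the GAS mean and SD reported for this sample;
# the normal-population values (80, 10) are hypothetical.
a = cutoff_outside_dysfunctional(45.5, 15.8)
b = cutoff_within_normal(80.0, 10.0)
c = cutoff_closer_to_normal(45.5, 15.8, 80.0, 10.0)
print(round(a, 1), round(b, 1), round(c, 1))
```

The sketch makes the practical difficulty concrete: every cutoff except the first requires normative values that must come from outside the study.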

Although these approaches may provide more insight into the interpretation of a study’s findings than one that uses only a p value to test statistical significance, they still present results in terms that many policy makers, researchers, or clinicians may find confusing. This may decrease the possibility that important research results will actually be used by those individuals or groups that create health policy or shape medical care practice.

Another approach to interpreting the significance of research findings that is much more likely to be accessible to policy makers, researchers, and clinicians is to use the construct of social validity. The social validity construct is based on the belief that interventions should be evaluated in terms of desirable social changes: that is, to what extent does the intervention produce outcomes that are important to or have an impact on society (Foster & Mash, 1999; Kazdin, 1977, 1999; Wolf, 1978). Expressing outcomes in terms of social impact can make research findings easier to interpret. Sechrest, McKnight, and McKnight (1996) advocated a move toward social validity, stressing that changes in behavior or functioning are critical for assessing treatment outcome, rather than simply inferring change from a metric of uncertain meaning. Sechrest et al. (1996) also state that there is a critical need for psychological measures to be calibrated because it is important to know how to interpret the score of one measure against another.

We demonstrate an empirical method, based on the suggestions of Sechrest et al. (1996), that relates measures of mental health status from several different scales to life events. Using this method, clinicians, researchers, and policy makers can more easily judge the significance of intervention outcomes. Relating mental health status scores for several different mental health status scales also will calibrate these scales to one another and allow for easier comparison of outcomes measured with different scales.

Although there are many scales available to measure mental health status, there have been few attempts to relate scale scores to life events in order to aid in the interpretation of the social validity or impact of an outcome. An exception is the Mental Health Index, calibrated in this manner as part of the analysis of the RAND Health Insurance Experiment to aid in the interpretation of effect sizes (Brook et al., 1983; Wells, Manning, & Valdez, 1989). In this study, we attempt to calibrate several different mental health status scales and demonstrate a method that can be used in future studies to aid in the interpretation of outcomes. Establishing relationships between mental health status scales and life events will help clinicians and policy makers interpret the significance of study findings that are presented in terms of mental health status scores. It will also allow for easier comparisons of separate study findings that are presented in terms of different mental health status scales.

Past studies have demonstrated a relation between life events and mental health status and schizophrenic symptomatology. Lehman (1983) showed a significant relationship between global well-being, being a victim of a crime, and use of acute psychiatric services by assessing these relationships through Pearson product–moment correlations. The Camberwell Collaborative Psychosis study demonstrated a strong link between stressful life events and onset of acute psychosis by comparing the presence of various life events in people with recent onset of acute psychosis to a control group (Bebbington et al., 1993; Van Os et al., 1994). Malla and Norman (1992) also established a relation between major life events and schizophrenic symptomatology by determining the partial correlations between stress and symptom measures in a subsequent time period while controlling for symptom measures in a baseline period.

In this paper, we relate differences in scores on commonly used mental health status scales to subsequent life events for persons with schizophrenia. These analyses are intended to aid clinicians, researchers, and policy makers in understanding the significance of findings from mental health services research using such scales and to demonstrate a method to be used by other mental health researchers who use mental health status scales to measure the outcome of some intervention. In other words, this analysis will enable the results of such studies to be viewed in terms of the probability of the occurrence of meaningful life events, and thus their social impact. This added information should make results more easily interpretable than an outcome expressed in terms of a mental health status scale score, and should increase the likelihood that policy makers and clinicians will translate research findings into practice.

METHODS

Measures

The mental health status scales examined in this study are the Global Assessment Scale (GAS), two subscales of the Schedule for Affective Disorders and Schizophrenia (SADS), which relate to depression and endogenous features (SADS-A and SADS-C), the Brief Psychiatric Rating Scale (BPRS), and the Schizophrenia Subscale of the BPRS. These scales have been used in mental health services research because of their proven validity and reliability as well as their ease of administration by nonclinicians.

The GAS is a 100-point scale in which raters assess a respondent’s overall functioning. It is divided into ten deciles, with each decile describing a level of functioning and symptoms. The rater chooses the decile representing a patient’s lowest level of functioning during the specified time period and assigns a score somewhere within that decile as appropriate (Endicott, Spitzer, Fleiss, & Cohen, 1976). Higher scores indicate increased functioning and lower levels of symptomatology and psychopathology (Endicott et al., 1976). The GAS has been shown to have very good reliability and validity (Dworkin et al., 1990; Endicott et al., 1976).

The SADS was developed by Endicott and Spitzer (1978) to reduce variance in both the descriptive and diagnostic evaluation of a subject. It comprises eight subscales: depressive mood and ideation, endogenous features, depressive-associated features, suicidal ideation and behavior, anxiety, manic syndrome, delusions–hallucinations, and formal thought disorder (Endicott & Spitzer, 1978). These subscales are designed to measure different types of symptoms by assessing severity of symptoms in the prior week. The SADS can be administered by nonclinicians and relies on patient self-reports. Higher scores indicate lower levels of functioning and higher symptomatology. The SADS possesses both high reliability and validity (Endicott & Spitzer, 1978; Johnson, Magaro, & Stern, 1986).

The BPRS is used for the measurement of psychopathology and can be completed in approximately 15 min by a trained, nonclinician interviewer (Overall & Gorham, 1962). It is designed to be sensitive to changes in patient condition during treatment and consists of 18 items, each corresponding to a different symptom construct: somatic concern, anxiety, emotional withdrawal, conceptual disorganization, guilt feelings, tension, mannerisms and posturing, grandiosity, depressive mood, hostility, suspiciousness, hallucinatory behavior, motor retardation, uncooperativeness, unusual thought content, blunted affect, excitement, and disorientation (Overall & Gorham, 1962). For each of the 18 items, an interviewer assigns a score referring to the degree to which the symptom is present, from 1 (not present) to 7 (very severe). Higher scores indicate increased symptom severity and psychopathology (Overall & Gorham, 1962). The BPRS has proven reliability and validity (Hedlund & Vieweg, 1980; Overall & Gorham, 1962).

The schizophrenia subscale of the BPRS is a 10-item scale abstracted from the BPRS. It includes only items directly related to schizophrenia symptomatology and is considered to be the most discriminating for measuring schizophrenic states (Overall & Gorham, 1979). The 10 items included in this scale are emotional withdrawal, conceptual disorganization, motor retardation, grandiosity, hostility, suspiciousness, hallucinatory behavior, uncooperativeness, unusual thought content, and blunted affect. The BPRS schizophrenia subscale possesses both reliability and validity (Anderson et al., 1989). Higher scores indicate increased symptomatology and psychopathology (Overall, 1979).
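
The item lists above can be turned into a small scoring sketch. The Python below assumes that the BPRS index and its schizophrenia subscale are simple sums of the 1 to 7 item ratings, which is consistent with the possible ranges of 18–126 and 10–70 reported for these scales but is not stated explicitly in the text. The function and variable names are hypothetical.

```python
BPRS_ITEMS = [
    "somatic concern", "anxiety", "emotional withdrawal",
    "conceptual disorganization", "guilt feelings", "tension",
    "mannerisms and posturing", "grandiosity", "depressive mood",
    "hostility", "suspiciousness", "hallucinatory behavior",
    "motor retardation", "uncooperativeness", "unusual thought content",
    "blunted affect", "excitement", "disorientation",
]

SCHIZOPHRENIA_ITEMS = [
    "emotional withdrawal", "conceptual disorganization", "motor retardation",
    "grandiosity", "hostility", "suspiciousness", "hallucinatory behavior",
    "uncooperativeness", "unusual thought content", "blunted affect",
]


def score(ratings, items):
    """Sum the 1-7 ratings for the given items, validating each rating."""
    total = 0
    for item in items:
        rating = ratings[item]
        if not 1 <= rating <= 7:
            raise ValueError(f"rating for {item!r} must be between 1 and 7")
        total += rating
    return total


# Hypothetical interview in which every symptom is rated 2.
ratings = {item: 2 for item in BPRS_ITEMS}
print(score(ratings, BPRS_ITEMS), score(ratings, SCHIZOPHRENIA_ITEMS))
```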

The major life events used in this study are psychiatric hospitalizations, victimizations, arrests, and suicide attempts, all assessed by patient self-report during face-to-face interviews. Victimizations are defined as being robbed, raped, or assaulted. Patients were asked during interviews if any of the events occurred since the previous interview took place (approximately 1 year).

Study Setting

Data used in this report were collected as part of an evaluation of the Utah Prepaid Mental Health Plan (UPMHP), in which the Utah Medicaid program contracted with several community mental health centers (CMHCs) to provide all mental health care to Medicaid beneficiaries in their catchment areas on a capitated basis. One part of the evaluation of the UPMHP involved measuring the mental health status of Medicaid beneficiaries with schizophrenia through in-person interviews. Complete descriptions of the UPMHP and the study setting and design are presented elsewhere (Christianson & Gray, 1994; Christianson, Gray, Kihlstrom, & Speckman, 1995; Christianson, Manning et al., 1995).

Sample

We first identified the 1660 adult Medicaid beneficiaries with schizophrenia from Medicaid claims data covering an 18-month period from 1988 to 1990 using a previously described and validated method (Lurie, Moscovice, Popkin, & Dysken, 1992). We randomly selected 1067 individuals, stratified by catchment area, to achieve a sample size of 400 individuals in prepaid and 400 individuals in fee-for-service sites assuming an 80% response rate. A total of 823 individuals were interviewed, with an actual response rate of 77% for the first wave.
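
The sampling figures above can be checked with a few lines of arithmetic. The Python below uses only the counts reported in the paragraph; the variable names are hypothetical.

```python
selected = 1067      # individuals randomly selected
interviewed = 823    # individuals interviewed in the first wave
target = 400 + 400   # planned completes in prepaid and fee-for-service sites
assumed_rate = 0.80  # response rate assumed at the design stage

# Observed first-wave response rate.
response_rate = interviewed / selected

# Selections needed to reach the target if the assumed rate had held.
needed_at_assumed_rate = target / assumed_rate

print(f"{response_rate:.0%}", needed_at_assumed_rate)
```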

Data Collection and Measurement

We use the first two of five waves of in-person interviews collected as part of the evaluation of the UPMHP; the first wave was conducted from January to April of 1991 (Time 1), whereas the second set was conducted from July to September of 1992 (Time 2). During each wave, information was collected on mental health utilization, major life events, and mental health status using the SADS, BPRS, and GAS. Field procedures and survey instruments were essentially the same for both interviews. Ninety percent of those individuals who completed surveys in Time 1 also completed surveys in Time 2.

Analysis

We used methods similar to those employed in the RAND Health Insurance Experiment to relate mental health status scale scores at Time 1 with the probability of major life events self-reported as having occurred at least once between the Time 1 and Time 2 interviews. Basically, baseline mental health status scale scores of people who experienced a major life event in Time 2 were compared with the baseline scale scores of people who did not experience a major life event in Time 2.

We used logistic regression analysis to relate the probability of a negative life event to mental health status scores. The analysis used weights that combined sampling weights (the inverse of the sampling probability) and response weights (the inverse of the probability of having a complete response for that item) to correct for disproportionate sampling, sample loss, and item nonresponse. The inference statistics were then corrected using a method based on Huber’s estimator for the variance of the estimates in a robust regression (Huber, 1981).
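
The weighting and variance correction described above can be illustrated with a self-contained Python sketch. It is not the authors' code. The Newton-Raphson fit and the sandwich form of the Huber-type variance are standard choices assumed here, and the data, sampling probabilities, and response probabilities are all made up for illustration.

```python
import math


def combined_weight(p_sample, p_response):
    """Inverse sampling probability times inverse response probability."""
    return 1.0 / (p_sample * p_response)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def predict(beta, xi):
    """Logistic probability for one row of predictors."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))


def fit_weighted_logit(X, y, w, iters=25):
    """Weighted logistic regression by Newton-Raphson; rows of X start with 1."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, w):
            p = predict(beta, xi)
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def sandwich_variance(X, y, w, beta):
    """Huber-type sandwich variance: inv(A) B inv(A)."""
    k = len(beta)
    A = [[0.0] * k for _ in range(k)]
    B = [[0.0] * k for _ in range(k)]
    for xi, yi, wi in zip(X, y, w):
        p = predict(beta, xi)
        for a in range(k):
            for c in range(k):
                A[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
                B[a][c] += (wi * (yi - p)) ** 2 * xi[a] * xi[c]
    # Column j of inv(A) is the solution of A x = e_j.
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(k)]) for j in range(k)]
    a_inv = [[cols[j][i] for j in range(k)] for i in range(k)]
    tmp = [[sum(a_inv[i][l] * B[l][j] for l in range(k)) for j in range(k)]
           for i in range(k)]
    return [[sum(tmp[i][l] * a_inv[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]


# Hypothetical data: baseline GAS scores, later hospitalization (1 = yes),
# and made-up sampling and response probabilities.
scores = [20, 30, 35, 40, 45, 50, 55, 60, 70, 80]
y = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
w = [combined_weight(0.6, 0.77)] * 5 + [combined_weight(0.7, 0.77)] * 5
X = [[1.0, (s - 45.5) / 15.8] for s in scores]  # intercept, standardized score

beta = fit_weighted_logit(X, y, w)
var = sandwich_variance(X, y, w, beta)
print(beta, [math.sqrt(var[i][i]) for i in range(2)])
```

Standardizing the score before fitting keeps the Newton-Raphson iterations well conditioned. The age and gender controls used in the study would enter as additional columns of `X`.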

In the logistic regression models, the self-reported life event from Time 2 was the dependent variable and the mental health status score measured at Time 1 was the explanatory variable. We related the probability of each of the four life events to each of the five mental health status scales. The analyses controlled for differences in mental health status and the probability of experiencing a negative life event that were due to age and gender. Odds ratios that correspond to the change in the probability of the negative life event occurring at Time 2, given a difference in the mental health status score equal to a moderate effect size (effect size = 0.50) as defined by Cohen (1977), were then constructed. The p values associated with these estimates were derived from simple two-tailed t tests. These analyses simply relate scores on a given scale to the subsequent probability of a life event, with no causality or direction implied in this relationship.
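
One way to construct such an odds ratio from a fitted model is sketched below in Python. The sketch assumes a logistic coefficient expressed per scale point and uses a normal-approximation 95% interval. The coefficient, standard error, and standard deviation are hypothetical, and the interval method is an assumption rather than the procedure used in the study.

```python
import math


def odds_ratio_for_difference(beta_per_point, se_per_point, difference, z=1.96):
    """Odds ratio and approximate 95% CI for a given scale-score difference."""
    log_or = beta_per_point * difference
    half_width = z * se_per_point * abs(difference)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


# Hypothetical inputs: a per-point log-odds coefficient, its standard error,
# and a scale SD of 16 points, so an effect size of 0.50 is 0.50 * 16 = 8 points.
beta, se, sd = 0.02, 0.005, 16.0
odds_ratio, lower, upper = odds_ratio_for_difference(beta, se, 0.50 * sd)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Because the log odds are linear in the score, the same function gives the odds ratio for any other difference of interest by changing `difference`.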

RESULTS

To help determine if other samples are comparable to the one used in this study, general demographic information and the mean scores for each of the mental health status scales are presented in Table 1.

Table 1. Patient Characteristics

                                            Mean    SD
Age (years)                                 43.2    14.6
Sex (percent male)                          46%     .50
Race (percent nonwhite)                     9%      .29
Scale Scores(a)
  Global Assessment Scale (0–100)           45.5    15.8
  SADS-A Depression Subscale (0–73)         26.3    9.8
  SADS-C Endogenous Subscale (0–36)         11.0    4.2
  BPRS Index (18–126)                       37.7    15.7
  Schizophrenia Subscale of BPRS (10–70)    20.1    9.3

(a) Possible range of scale scores in parentheses.

We present results in terms of improvements in mental health status. Because the GAS is scored positively (higher scores indicate better functioning) whereas the other scales are scored negatively, we present our results in terms of an increase in the GAS or a decrease in the SADS-A, SADS-C, BPRS, and Schizophrenia Subscale of BPRS. Table 2 presents the odds ratios corresponding to a difference in the scale score equal to an effect size of 0.50.

As a guide to Table 2, GAS results are discussed in detail. The first two columns present the effect size as defined by Cohen (1977) and Kazis et al. (1989) and the difference in the GAS that corresponds to an effect size of 0.50. The next column indicates that individuals with differences in their baseline GAS scores equal to an effect size of 0.50 (+7.8 points) have statistically significant (p < .01) differences in the probability of any subsequent hospitalization, victimization, and suicide attempt. Thus, a moderate effect size in the GAS corresponds to a 15% difference in the probability of a psychiatric hospitalization, an 8% difference in the probability of being victimized, and a 24% difference in the probability of a suicide attempt. Differences in the interpretation of effect sizes between scales exist because each of the scales measures different aspects of symptomatology and functioning. Therefore, these results indicate that a treatment that is reported to result in an effect size of 0.50, or a mean change in the GAS of 7.8 points, can instead be thought of as resulting in a 15% decrease in the probability of a psychiatric hospitalization or a 24% reduction in the probability of a suicide attempt.

Table 2. Effects of Differences in Mental Health Status Scales

                                Effect  Scale difference  Psychiatric hospitalization  Arrest               Victimization        Suicide attempt
Scale                           size    (points)          Odds ratio (95% CI)          Odds ratio (95% CI)  Odds ratio (95% CI)  Odds ratio (95% CI)
Global Assessment Scale         0.50    +7.8              1.15 (1.10, 1.20)            1.06 (0.99, 1.14)    1.08 (1.03, 1.14)    1.24 (1.11, 1.39)
SADS-A Depression Subscale      0.50    −4.9              1.19 (1.08, 1.31)            1.04 (0.94, 1.15)    1.18 (1.12, 1.25)    1.36 (1.28, 1.44)
SADS-C Endogenous Subscale      0.50    −2.1              1.15 (1.04, 1.28)            1.07 (1.00, 1.16)    1.11 (1.07, 1.14)    1.35 (1.26, 1.45)
BPRS Index                      0.50    −7.5              1.17 (1.06, 1.29)            1.07 (0.94, 1.21)    1.12 (1.05, 1.19)    1.29 (1.21, 1.38)
Schizophrenia Subscale of BPRS  0.50    −4.4              1.13 (1.02, 1.25)            1.07 (0.98, 1.17)    1.09 (0.98, 1.20)    1.24 (1.16, 1.33)

Note. Results represent differences in scales equal to a moderate effect size (effect size = 0.50) as defined by Cohen (1977). Sample probability of psychiatric hospitalization = .21, arrest = .10, victimization = .12, suicide attempt = .09.

DISCUSSION

To aid the interpretation of research findings, we present a straightforward way to evaluate the social impact of mental health outcomes by relating mental health status scale scores and effect sizes to subsequent major life events. Presenting the result of an intervention in terms of the reduction in the probability of a suicide attempt or some other major life event should improve clinicians’ and policy makers’ ability to interpret the importance of the results. For instance, if a clinical intervention is reported to result in a 7.8 point increase in the GAS, or equivalently, an effect size of 0.50, this probably will not have much meaning to many clinicians or policy makers. However, stating that the result of the intervention is equivalent to a 24% reduction in the probability of a suicide attempt will be more meaningful and easier to interpret for clinicians and other individuals. Similarly, reporting this result as being equivalent to a 15% reduction in the probability of being hospitalized will be more meaningful to policy makers or health insurance executives and will stress the importance of the finding more than a result presented in statistical terms or as a scale score.

For example, Wirshing et al. (1999) studied the effect of risperidone versus haloperidol in treatment refractory schizophrenia, concluding that risperidone demonstrated clinical efficacy superior to that of haloperidol because patients, on average, improved by 24% versus 11% on the BPRS (p = .03). The difference in improvement in that study was 2.3 points on the BPRS (9.8 point improvement versus 7.5 point improvement). Use of a social rather than a statistical standard for assessing the effect size would indicate (based on our study sample) that a 2.3 point improvement on the BPRS, an effect size of 0.15, relates to a 4.9% decrease in the probability of a psychiatric hospitalization. Clinicians and policy makers can more easily decide on their own whether a 4.9% decrease in the probability of hospitalization represents a “clinically superior” result.

Similarly, Glick, Clarkin, Haas, and Spencer (1993) studied a family intervention, concluding that it had a clinically significant effect on patients with chronic schizophrenia because patients improved on the GAS by over two standard deviations (an improvement of 12.5 points). This difference of 12.5 points on the GAS relates to a 25% decrease in the probability of a psychiatric hospitalization, confirming the importance of such a difference in scores.
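
The two worked examples above are consistent with rescaling the Table 2 odds ratios on the log-odds scale. The Python sketch below assumes that this is how the figures were obtained, which the text does not state. The function name is hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio, reference_difference, new_difference):
    """Rescale an odds ratio to a different score difference (linear in log odds)."""
    return math.exp(math.log(odds_ratio) * new_difference / reference_difference)


# BPRS and psychiatric hospitalization: 1.17 per 7.5 points, rescaled to 2.3 points.
bprs = rescale_odds_ratio(1.17, 7.5, 2.3)

# GAS and psychiatric hospitalization: 1.15 per 7.8 points, rescaled to 12.5 points.
gas = rescale_odds_ratio(1.15, 7.8, 12.5)

print(round((bprs - 1) * 100, 1), round((gas - 1) * 100))  # prints: 4.9 25
```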

The results of this study should be viewed with some caution for several reasons. First, the odds ratios presented in this analysis depend upon the baseline risks of the population studied. For example, in our sample, the probability of a psychiatric hospitalization is .21. An odds ratio of 1.15 that is based on a sample probability of .21 has quite a different meaning than an odds ratio of 1.15 that is based on a sample probability of .05. In this case, one corresponds to a 3.2 percentage point difference in the probability whereas the other corresponds to a 0.1 percentage point difference in the probability.
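
The dependence on baseline risk can be illustrated with a short Python sketch that applies the same odds ratio to two baseline probabilities. It uses the exact conversion between odds and probability, so its output need not reproduce the rounded figures quoted above, which may rest on a different approximation or direction of change. The function name is hypothetical.

```python
def shifted_probability(baseline, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    odds = baseline / (1.0 - baseline) * odds_ratio
    return odds / (1.0 + odds)


# The same odds ratio applied at two different baseline risks.
for baseline in (0.21, 0.05):
    change = shifted_probability(baseline, 1.15) - baseline
    print(baseline, round(100 * change, 1))
```

The absolute change is larger at the higher baseline risk, which is the point of the caution: an odds ratio alone does not determine the change in probability.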

Second, the association between the mental health status scale and the functioning of an individual may not be linear. Thus, a 5-point change on the lower levels of a mental health status scale may not have the same clinical meaning as a 5-point change on the upper levels of that same scale. In such cases, our results may understate or overstate the interpretation of the effect sizes. We estimate the difference in probability using a sample of individuals with schizophrenia, who fall in a restricted range (the more impaired end) of the mental health status scales. Thus, the results may not be generalizable to a less-impaired population. However, this general methodology can be used to reproduce such an analysis to assess significance in other studies if the occurrence of life events is also recorded at the time of the study.

Third, we rely on patient self-report for measuring negative life events. There is potential for inaccurate recollection of the occurrence of the meaningful events in question. We doubt this is a major problem because we use a dichotomous variable rather than event counts, which would potentially be less accurate. Also, the time between when the measures of baseline mental health status were recorded and when the life event took place is not known. Therefore, a recorded life event, such as a suicide attempt, may have taken place 1 month or 1 year after baseline mental health status was assessed.

Finally, our study uses a sample of Medicaid beneficiaries in the state of Utah with a primary diagnosis of schizophrenia, which again may limit its generalizability. However, comparing demographic and scale score changes of our study population to others should help researchers determine how findings might be generalized to their population. Even if the results presented here are not generalizable, this method of interpreting mental health status scores can easily be reproduced in other studies as long as information on the occurrence of major life events is collected at the time of the study.

The information in this study helps provide more insight into the significance of research findings than p values, scale scores, or proportions of a standard deviation. However, the use of multiple indicators of significance (social, clinical, and statistical) will provide the best basis for judgment on the significance of research findings. Knowledge of the association between meaningful life events and mental health status enables one to put differences in mental health status scales in perspective, thus aiding in the interpretation of effect sizes. Furthermore, by calibrating each mental health status scale to a similar set of life events, results from different studies using different mental health status scales can be compared more readily.

ACKNOWLEDGMENTS

The views expressed are solely those of the authors and do not necessarily reflect those of the University of Pittsburgh, the University of Chicago, the University of Minnesota, the Department of Health and Human Services, or the Veterans Affairs Puget Sound Health Care Center. Funding for data collection was provided by the State of Utah and the National Institute of Mental Health. Additional support was provided by the National Institute of Mental Health Grant P30-MH30915. We gratefully acknowledge assistance provided by the Health Care Financing Division, Department of Health, State of Utah; the Community Mental Health Centers in Utah; all the interviewers who participated in the collection of data; and the cooperation of the interview respondents. We would also like to thank Jon Christianson, Tamara Stoner, Jennifer Skeem, and Kate Harkness for their helpful comments on various versions of this paper.

REFERENCES

Anderson, J., Larsen, J. K., Schultz, V., Nielsen, B. M., Korner, A., Behnke, K., Munk-Andersen, E., Butler, B., Allerup, P., & Bech, P. (1989). The brief psychiatric rating scale: Dimensions of schizophrenia-reliability and construct validity. Psychopathology, 22, 168–176.

Bebbington, P., Wilkins, S., Jones, P., Foerster, A., Murray, R., Toone, B., & Lewis, S. (1993). Life events and psychosis: Initial results from the Camberwell Collaborative Psychosis Study. British Journal of Psychiatry, 162, 72–79.

Brook, R. H., Ware, J. E., Rogers, W. H., Keeler, E. B., Davies, A. R., Donald, C. A., Goldberg, G. A., Lohr, K. N., Masthay, P. C., & Newhouse, J. P. (1983). Does free care improve adults’ health? The New England Journal of Medicine, 309, 1426–1434.

Christianson, J. B., & Gray, D. Z. (1994). What CMHCs can learn from two states’ efforts to capitate Medicaid benefits. Hospital and Community Psychiatry, 45, 777–781.

Christianson, J. B., Gray, D. Z., Kihlstrom, L. C., & Speckman, Z. K. (1995). Development of the Utah prepaid mental health plan. Advances in Health Economics and Health Research, 15, 117–135.

Christianson, J., Manning, W., Lurie, N., Stoner, T., Gray, D., Popkin, M., & Marriott, S. (1995). Utah’s prepaid mental health plan: The first year. Health Affairs, 14, 160–172.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.

Dworkin, R. J., Friedman, L. C., Telschow, R. L., Grant, K. D., Moffic, H. S., & Sloan, V. J. (1990). The longitudinal use of the Global Assessment Scale in multiple-rater situations. Community Mental Health Journal, 26, 335–344.

Endicott, J., & Spitzer, R. L. (1978). A diagnostic interview: The schedule for affective disorders and schizophrenia. Archives of General Psychiatry, 35, 837–844.

Endicott, J., Spitzer, R. L., Fleiss, J. L., & Cohen, J. (1976). The Global Assessment Scale: A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry, 33, 766–771.

Foster, S. L., & Nash, E. J. (1999). Assessing social validity in clinical treatment research: Issues and procedures. Journal of Consulting and Clinical Psychology, 67, 308–319.

Glick, I. D., Clarkin, J. F., Haas, G. L., & Spencer, J. H. (1993). Clinical significance of inpatient family intervention: Conclusions from a controlled clinical trial. Hospital and Community Psychiatry, 44, 869–873.

Hedlund, J. L., & Vieweg, M. S. (1980). The Brief Psychiatric Rating Scale (BPRS): A comprehensive review. Journal of Operational Psychiatry, 11, 48–65.

Huber, P. J. (1981). Robust statistics. New York: Wiley.

Jacobson, N. S., Roberts, L. J., Berns, S. B., & McGlinchey, J. B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology, 67, 300–307.

Johnson, M. H., Magaro, P. A., & Stern, S. L. (1986). Use of the SADS-C as a diagnostic and symptom severity measure. Journal of Consulting and Clinical Psychology, 54, 546–551.

Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427–452.

Kazdin, A. E. (1999). The meaning and measurement of clinical significance. Journal of Consulting and Clinical Psychology, 67, 332–339.

Kazis, L. E., Anderson, J. J., & Meenan, R. F. (1989). Effect sizes for interpreting changes in health status. Medical Care, 27, s178–s189.

Kendall, P. C., Marrs-Garcia, A., Nath, S. R., & Sheldrick, R. C. (1999). Normative comparisons for the evaluation of clinical significance. Journal of Consulting and Clinical Psychology, 67, 285–299.

Lehman, A. F. (1983). The well-being of chronic mental patients: Assessing their quality of life. Archives of General Psychiatry, 40, 369–373.

Lurie, N., Moscovice, I., Popkin, M., & Dysken, M. (1992). Accuracy of Medicaid claims for psychiatric diagnoses: Experience with the diagnosis of schizophrenia. Hospital and Community Psychiatry, 43, 69–71.

Malla, A. K., & Norman, R. M. (1992). Relationship of major life events and daily stressors to symptomatology in schizophrenia. Journal of Nervous and Mental Disease, 180, 664–667.

Overall, J. E. (1979). Criteria for the selection of subjects for research in biological psychiatry. In H. M. van Praag, M. H. Lader, O. J. Rafaelsen, & E. J. Sachar (Eds.), Handbook of biological psychiatry (pp. 359–391). New York: Dekker.

Overall, J. E., & Gorham, D. R. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799–812.

Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065–1071.

Van Os, J., Fahy, T. A., Bebbington, P., Jones, P., Wilkins, S., Sham, P., Russell, A., Gilvarry, K., Lewis, S., Toone, B., & Murray, R. (1994). The influence of life events on the subsequent course of psychotic illness: A prospective follow-up of the Camberwell Collaborative Psychosis Study. Psychological Medicine, 24, 503–513.

Wells, K. B., Manning, W. G., & Valdez, R. B. (1989). The effect of insurance generosity on the psychological distress and psychological well-being of a general population. Archives of General Psychiatry, 46, 315–320.

Wirshing, D. A., Marshall, B. D., Jr., Green, M. F., Mintz, J., Marder, S. R., & Wirshing, W. C. (1999). Risperidone in treatment-refractory schizophrenia. American Journal of Psychiatry, 156, 1374–1379.

Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214.

Leonard Bickman
Action Editor