

    The CES-D Scale: A Self-Report Depression Scale for Research in the General Population

    Lenore Sawyer Radloff
    Center for Epidemiologic Studies
    National Institute of Mental Health

    The CES-D scale is a short self-report scale designed to measure depressive symptomatology in the general population. The items of the scale are symptoms associated with depression which have been used in previously validated longer scales. The new scale was tested in household interview surveys and in psychiatric settings. It was found to have very high internal consistency and adequate test-retest repeatability. Validity was established by patterns of correlations with other self-report measures, by correlations with clinical ratings of depression, and by relationships with other variables which support its construct validity. Reliability, validity, and factor structure were similar across a wide variety of demographic characteristics in the general population samples tested. The scale should be a useful tool for epidemiologic studies of depression.

    The Center for Epidemiologic Studies Depression Scale (CES-D Scale) was developed for use in studies of the epidemiology of depressive symptomatology in the general population. Its purpose differs from previous depression scales which have been used chiefly for diagnosis at clinical intake and/or evaluation of severity of illness over the course of treatment. The CES-D was designed to measure current level of depressive symptomatology, with emphasis on the affective component, depressed mood. The symptoms are among those on which a diagnosis of clinical depression is based but which may also accompany other diagnoses (including "normal") to some degree.

    This definition of the variable being measured determines the appropriate criteria of validity and reliability (Standards for Educational and Psychological Tests, 1974). Content validity will be based on the clinical relevance of the symptoms which comprise the items of the scale. Criterion-oriented validity will include correlations with other valid self-report depression scales, correlations with clinical ratings of severity of depression, and discrimination between psychiatric patients and general population samples. Construct validity will be based on what is known about the theory and epidemiology of depressive symptoms. Evidence that the scale is reliable but is also sensitive to current levels of symptomatology will be based on predictability of test-retest changes in scores (e.g., scores of patients before and after treatment, or scores of household respondents before and after "Life Events Losses"). Since several comparable samples (essentially replications) were tested, consistency of results across the samples will also be shown as indirect evidence of reliability.

    Downloaded from the Digital Conservancy at the University of Minnesota, http://purl.umn.edu/93227. May be reproduced with no cost by students and faculty for academic use. Non-academic reproduction requires payment of royalties through the Copyright Clearance Center, http://www.copyright.com/

    The CES-D was designed for use in general population surveys, and is therefore a short, structured self-report measure. It is usable by lay interviewers, acceptable to the respondent, and not substantially influenced by the normal range of conditions during a household interview. The scale was designed for use in studies of the relationships between depression and other variables across population subgroups. To compare results from one subgroup to another, the scale must be shown to measure the same thing in both groups. Therefore, it will be shown that properties of the scale (validity, reliability, factor structure) are similar for the various population subgroups to be studied.

    Development of the Scale

    The CES-D items were selected from a pool of items from previously validated depression scales (e.g., Beck, Ward, Mendelson, Mock, & Erbaugh, 1961; Dahlstrom & Welsh, 1960; Gardner, 1968; Raskin, Schulterbrandt, Reatig, & McKeon, 1969; Zung, 1965). The major components of depressive symptomatology were identified from the clinical literature and factor analytic studies. These components included: depressed mood, feelings of guilt and worthlessness, feelings of helplessness and hopelessness, psychomotor retardation, loss of appetite, and sleep disturbance. Only a few items were selected to represent each component. Four items were worded in the positive direction to break tendencies toward response set as well as to assess positive affect (or its absence). To emphasize current state, the directions read: "How often this past week did you ..." Each response was scored from zero to three on a scale of frequency of occurrence of the symptom.

    Pretests on small "samples of convenience" indicated appropriate performance of the scale and guided minor revisions for clarity and acceptability. The 20-item scale used in the studies reported here is shown in Table 1. The possible range of scores is zero to 60, with the higher scores indicating more symptoms, weighted by frequency of occurrence during the past week.
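    The scoring rule described above can be sketched in a few lines. This is an illustration only, not the author's procedure: the function name is hypothetical, and the positions of the four positively worded items (4, 8, 12, and 16) are an assumption taken from the commonly published item order, since Table 1 is not reproduced here. Reverse-keying those items is inferred from the stated 0 to 60 range in which higher scores mean more symptoms.

```python
from typing import Sequence

# Assumption: 1-based positions of the four positively worded items,
# following the commonly published item order. Check against Table 1.
POSITIVE_ITEMS = (4, 8, 12, 16)


def score_cesd(responses: Sequence[int]) -> int:
    """Sum 20 item responses (each 0-3), reverse-keying the positive items."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for position, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {position}: response must be 0, 1, 2, or 3")
        # Frequent positive affect means fewer symptoms, so flip 0..3 to 3..0.
        total += (3 - value) if position in POSITIVE_ITEMS else value
    return total
```

    Under these assumptions, a respondent who answers 3 on every negatively worded item and 0 on every positively worded item reaches the maximum of 60.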

    Field Tests: Methods

    First Questionnaire Survey (Q1 Survey)

    The CES-D scale was included in a structured interview containing over 300 items, including other scales designed to measure depression or depressed mood (Bradburn Negative Affect, 1969; Lubin, 1967), psychological symptoms (Langner, 1962), well-being (Bradburn Positive Affect, 1969; Cantril Ladder, 1965) and Social Desirability (Crowne & Marlowe, 1960). It also included standard sociodemographic items (age, sex, education, occupation, marital status) and measures of life events, alcohol problems, social functioning, physical illness and use of medications. The interview, which took about an hour, was conducted by an experienced lay interviewer in the home of the respondent.

    Probability samples of households designed to be representative of two communities (Kansas City, Missouri, and Washington County, Maryland) were selected. An individual (aged 18 and over) was randomly selected for interview from each household in the sample. Independent samples of households were designated for each week of the study. Strong efforts were made to complete interviews in the assigned week, but up to three weeks (and unlimited numbers of call-backs) were allowed to maximize response rate. Interviewing was done from October 1971 through January 1973 in Kansas City and from December 1971 through July 1973 in Washington County. The response rate in Kansas City was about 75%, with a total of 1173 completed interviews; in Washington County the response rate was about 80%, with 1673 completed interviews. Informed consent was obtained from all respondents. Both sites had a refusal rate of about 17%, plus a small percentage of not-at-home and other reasons for nonresponse.

    Demographic distributions of the samples are reported elsewhere (Comstock & Helsing, in press), as are analyses of characteristics of those who refused to be interviewed (Comstock & Helsing, 1973; Klassen & Roth, 1974). Refusals were significantly more likely to have lower education and come from smaller households than respondents. Analyses have been made of respondents interviewed in the assigned week ("on time") versus the harder to find respondents interviewed in the following three weeks ("late") (Mebane, 1973). Males and working people were slightly overrepresented among the "late" respondents, but the "late" did not differ from the "on time" on the psychological measures in the interview, including the CES-D scale. The samples probably have some underrepresentation of males and the poorly educated. However, they include respondents with a wide range of demographic characteristics, in numbers adequate for analyses of relationships among variables.

    Table 1. CES-D Scale

    Second Questionnaire Survey (Q2 Survey)

    The CES-D scale was also included in a slightly revised (mainly shortened) version of the questionnaire (Q2) used in Washington County only, from March 1973 through July 1974 (for three months Q1 and Q2 were used alternately). Samples were drawn for four-week periods. The revision was not expected to affect the CES-D, since the scale was placed very early in both interviews, with identical preceding sections. The major differences between the Q1 and Q2 surveys were: length of interview (60 vs. 30 minutes); the time-basis of the sampling frame (weekly vs. four-week); and the site (Kansas City and Washington County vs. Washington County only). The response rate for the Q2 survey was about 75%, with 1089 completed interviews, and about 22% refusals. Therefore, the obtained sample for Q2 may be slightly less representative than that of the Washington County Q1 survey.

    Mail-backs

    From May 1973 through March 1974, each respondent to Q2 was asked to fill out and mail back one retest on the CES-D scale either two, four, six, or eight weeks after the original interview. A total of 419 mail-backs was received (about 56% response rate).

    Reinterview Survey (Q3 Survey)

    The CES-D was also included in a reinterview (Q3) of samples of the original respondents to Q1 or Q2. In Kansas City, from July 1973 through December 1973, 343 respondents (78% of those attempted) were reinterviewed about 12 months after the original interview. From August 1973 through April 1974 in Washington County, 1209 respondents (about 79% of those attempted) were reinterviewed once, either three, six, or twelve months after the original interview.

    Psychiatric Patient Samples

    Two clinical validation studies have been done in coordination with the survey program: one in Washington County, Maryland (Craig & Van Natta, in press) and one in New Haven, Connecticut (Weissman, Prusoff, & Newberry, 1975). In the Washington County study, seventy patients residing in a private psychiatric facility were selected on the basis of willingness and ability to participate. Each patient was rated on the Rockliff Depression Rating Scale (Rockliff, 1971) by the nurse-clinician who was most familiar with the patient's current status. Immediately following this, the patient was interviewed by one of the interviewers from the Washington County general population survey, using the original interview form (Q1). In the New Haven study, thirty-five people admitted to outpatient treatment for severe depression and scoring seven or higher on the Raskin Depression Rating Scale (Raskin et al., 1969) participated in the study. They were given the CES-D scale and the SCL-90 (Derogatis, Lipman, & Covi, 1973) as self-reports and rated by clinicians on the Hamilton Rating Scale (Hamilton, 1960) as well as the Raskin. The measures were taken upon admission for treatment, after one week, and after four weeks of treatment, using psychotropic medication and supportive psychotherapy.

    Summary of Field Tests

    In the present paper, results will be reported for the following populations: All Q1 Whites, All Q2 Whites, and All Q3 Whites (these analyses were confined to Whites to make the samples comparable, since the Q2 sample contained less than 3% nonwhites). These results are treated as replications to demonstrate repeatability of the properties of the scale across two samples considered equivalent (Q1 vs. Q2) and across two tests on essentially the same sample (Q1/Q2 vs. Q3). For test-retest reliability, scores of the same people at different times will be compared (Q2 vs. mail-backs and Q1/Q2 vs. Q3). To demonstrate generalizability across different groups and thereby justify the epidemiologic uses of the scale, results will be compared across age, sex, race, and educational subgroups of the combined Q1 and Q2 general populations and with the Washington County patient group.


    Suitability for Use in Household Surveys

    Acceptability

    The CES-D scale proved acceptable to both general and clinical populations. In the Q1 and Q2 survey data, the average nonresponse to single items was less than 0.2%. Two items were answered "don't know" or "not applicable" somewhat more often than average: "I felt I was just as good as other people" (1.6%) and "I felt hopeful about the future" (2.2%). For comparison, the Q1 item with the highest nonresponse was household income (8%). The entire CES-D scale was considered missing if more than four items were missing. This occurred only twice in the Q1 and Q2 data combined.
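    The missing-data rule stated above can be expressed as a small check. This is a sketch with a hypothetical function name. It assumes that item scores have already been keyed (0 to 3) and that answered items are simply summed; the text does not say whether totals with one to four missing items were adjusted in any way.

```python
from typing import Optional, Sequence

MAX_MISSING = 4  # from the text: more than four missing items voids the scale


def total_or_missing(item_scores: Sequence[Optional[int]]) -> Optional[int]:
    """Return the summed score, or None if more than four items are missing.

    None marks a missing item. Assumption: answered items are summed as-is,
    with no proration for the missing ones.
    """
    answered = [score for score in item_scores if score is not None]
    if len(item_scores) - len(answered) > MAX_MISSING:
        return None
    return sum(answered)
```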

    Distributions

    Parameters of the distributions of CES-D scale scores in the general population samples and the Washington County patients are shown in Table 2. The distribution of scores in the patient group was symmetrical, with a large standard deviation, while the general population distributions were very skewed, with a smaller standard deviation and a much larger proportion of low scores. This pattern is consistent with an interpretation of the scale as related to a pathological condition more typical of a patient population than a household sample. However, there was a wide enough range of scores in the general population to allow meaningful identification of relationships between depressive symptomatology and other variables.

    The very skewed distributions and the fact that groups with higher means also tended to have higher variances should be noted. Standard parametric significance tests on these data will not be exact. However, several basic analyses (e.g., analysis of variance of CES-D scores by sex and marital status) have been replicated using normalizing transformations and nonparametric tests. In no case was the decision (accept vs. reject H0) reversed. Nevertheless, probability levels reported should be considered approximate, and borderline levels should be interpreted with caution.
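    To make the point about skew concrete, the sketch below computes a moment-based skewness and shows how a log transformation pulls in a long right tail. The scores are invented for illustration, and the choice of log(1 + score) is an assumption: the text does not name the normalizing transformations that were actually used.

```python
from math import log1p
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness, m3 / m2**1.5 (zero for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical scores, not data from the surveys: many low values, few high.
raw = [0, 1, 1, 2, 3, 4, 5, 7, 9, 12, 16, 22, 31, 45]
# log(1 + score) keeps a score of zero defined (assumed transformation).
logged = [log1p(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}, after: {skewness(logged):.2f}")
```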

    Conditions of Interview

    Controlling for age, race, sex, marital status, income and occupational role, analyses of conditions during the interview revealed no significant differences in the CES-D scores associated with the time of day, day of week, or month of year in which the interview took place (tables available on request). There were some differences in the average CES-D scores obtained by the various interviewers, significant at borderline levels, which warrant further analyses (Choi & Comstock, 1975; Handlin et al., 1974). The Washington County Q2 survey obtained significantly lower CES-D scores than did the Washington County Q1 survey. The Q3 reinterview survey (Kansas City and Washington County combined) also had significantly lower average CES-D scores than the initial (Q1 or Q2) survey of the same respondents. Further analyses will be required to assess the relative contribution of several factors (including possible real time trends, response bias in test-retest effects, interview forms, possible nonresponse bias, and sampling procedures).

    These possible differences (among interviewers and among the three surveys) are small in magnitude and of minor practical importance. If further analyses confirm true differences, then suitable controls could be introduced in the sampling or analytic procedures. It is more important for present purposes that the properties of the scale were consistent across interviewers and questionnaire forms. Analysis of the interviewers (excluding those who completed fewer than 20 interviews) revealed very similar levels of reliability and patterns of relationships to other variables across interviewers (tables available on request). Results from the Q1, Q2, and Q3 surveys are reported separately in subsequent sections of this paper to demonstrate that they are also very similar on these properties.


    Reliability

    Internal Consistency

    The scale contains 20 symptoms, any of which may be experienced occasionally by healthy people; a seriously depressed person would be expected to experience many but not necessarily all of these symptoms. In a healthy population, positive and negative affect are expected to coexist, with a low (negative) correlation. However, it has been suggested (Klein, 1974) that severely depressed patients are characterized by absence of positive as well as presence of negative affect, so that positive and negative affect would be more highly (negatively) correlated. There is also evidence that different kinds of people may manifest different types of symptoms; e.g., lower socioeconomic status people report more physical symptoms while higher socioeconomic status people report more affective symptoms (Crandell & Dohrenwend, 1967). In summary, in a general population sample, we would expect a great deal of heterogeneity, with many people experiencing a few symptoms and a few experiencing many. Therefore, some inter-item correlations may be quite low, but the direction of correlations should be consistent enough to produce reasonably high measures of internal consistency. In a patient group, we would expect higher item means, higher inter-item correlations, and very high internal consistency.

    The results support these expectations (see Table 3). Both inter-item and item-scale correlations were higher in the patient sample than in the general population samples (even when the small N and, therefore, greater sampling error of the patient sample is taken into account). Expectations were also confirmed by measures of internal consistency (coefficient alpha and the Spearman-Brown, split-halves method; Nunnally, 1967). They were high in the general population (about .85) and even higher in the patient sample (about .90).
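    For readers unfamiliar with the two internal-consistency measures named above, the standard textbook formulas can be sketched as follows. The function names and input layout are hypothetical, and this is not a reconstruction of the computations behind Table 3. Coefficient alpha is k/(k-1) times one minus the ratio of summed item variances to total-score variance; the Spearman-Brown formula steps a correlation between two half-scales up to full length.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha; each row holds one respondent's keyed item scores."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(half_correlation: float) -> float:
    """Step the correlation between two half-scales up to full test length."""
    return 2 * half_correlation / (1 + half_correlation)
```

    Both functions need at least two respondents and some variation in total scores; with constant totals the alpha formula divides by zero.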

    This high internal consistency may include some component of response bias, i.e., the tendency of an individual to answer all questions in the same (positive or negative) direction. In the extreme, this would result in a bimodal distribution of scores, which was not observed in the present data. Evidence of discriminant validity and of validity based on clinicians' ratings, independent of self-report, also suggests that response bias is not the major contributor to the reliability of the scale.

    Test-retest Correlations

    Predictions regarding test-retest correlations depend on several factors. The CES-D scale was explicitly designed to measure current ("this week") level of symptomatology, which is expected to vary over time. Changes over time may not be monotonic; they are more likely to be cyclic in at least some individuals, and the phase (length) of cycles may vary across individuals. The CES-D was designed to be sensitive to possible depressive reactions to events in a person's life; the timing of these events is unpredictable but presumably aperiodic. There are also methodological complications in test-retest measures. For example, there may be biases due to nonresponse, biasing effects of repeated testing, and asymmetric regression toward the mean due to the very skewed distribution of CES-D scores. Furthermore, in the present data, the test-retest time interval was confounded with differences in style of data collection: all initial scores were based on interviews; the short-interval (weeks) retests were different (i.e., self-administered mail-backs); the long-interval (months) retests were the same (i.e., interviews).

    In light of these properties of the variable being measured, we would expect only moderate levels of test-retest correlations in the overall samples. Shorter test-retest time intervals should produce somewhat higher correlations than longer intervals. However, if people were selected by the information we have about what happened during the time interval, the correlations should be better differentiated. Specifically, life events are expected to introduce variability (i.e., some individuals may react more than others) and thus lower the test-retest correlations. Tables 4 and 5 show that the results were consistent with these predictions.

    Table 4 shows the test-retest correlations for those who responded to the request to fill out and mail in a retest of the CES-D (mail-backs) and those who were reinterviewed (Q3). All respondents were retested only once; each time interval represents a different group of people. The correlations were in the moderate range (all but one were between .45 and .70) and were, on the average, larger for the shorter time intervals.

    In Table 5, all Q3 respondents (test-retest time interval ranging from three to twelve months) were classified by whether any one of 14 negative life events had occurred in the year prior to the first interview and in the interval between interviews. Those with no life events at either time had the highest test-retest correlation; those with life events at both times had the lowest correlation. Those with events at one time but not the other had intermediate correlations. The correlation for those with no events (r = .54) might be considered the fairest estimate of test-retest reliability, in the sense of repeatability with conditions replicated, for the three- to twelve-month time interval. In the New Haven patient group, the correlation of CES-D scores at admission with scores obtained after four weeks of treatment was .53 (compared with r = .58 for the SCL-90). In this group, "events" had certainly occurred, but the effect of treatment may be assumed to be in the same direction for all or most patients. Therefore, it is reasonable that the correlation was about the same as that in the "no events" group.
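    The classification behind Table 5 amounts to computing a test-retest correlation separately within each of four life-event groups. A minimal sketch follows, assuming each record is a hypothetical tuple of (event before first interview, event during retest interval, first score, retest score). The record layout, function names, and minimum group size are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical layout: (event before test 1, event in interval, score 1, score 2)
Record = Tuple[bool, bool, float, float]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either sequence is constant


def retest_r_by_events(
    records: Iterable[Record], min_n: int = 3
) -> Dict[Tuple[bool, bool], float]:
    """Test-retest r within each (event before, event during) group."""
    first: Dict[Tuple[bool, bool], List[float]] = defaultdict(list)
    second: Dict[Tuple[bool, bool], List[float]] = defaultdict(list)
    for before, during, score_1, score_2 in records:
        first[(before, during)].append(score_1)
        second[(before, during)].append(score_2)
    # Skip groups too small for a meaningful correlation (arbitrary threshold).
    return {
        key: pearson_r(first[key], second[key])
        for key in first
        if len(first[key]) >= min_n
    }
```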

    Table 4. Test-retest Correlations by Time Interval Between Test and Retest

    Table 5. Test-retest Correlations by Life Events Losses Before Each Test

    Validity

    Although not designed for clinical diagnosis, the CES-D scale is based on symptoms of depression as seen in clinical cases. Therefore, it should discriminate strongly between patient and general population groups, be sensitive to levels of severity of depressive symptomatology, and reflect improvements after psychiatric treatment. In addition, it should correlate well with other scales designed to measure depression and less well with scales which measure related but different variables; be related to a felt need for psychiatric services; and be sensitive to possible reactive depression in the face of certain life events.

    Clinical Criteria

    The CES-D scores discriminated well between psychiatric inpatient and general population samples and discriminated moderately among levels of severity within patient groups. Table 2 shows that the average CES-D score for the group of 70 Washington County psychiatric inpatients was substantially and significantly higher than the average for the general population samples. Seventy percent of the patients but only 21% of the general population scored at and above an arbitrary cutoff score of 16. In the patient group, the correlation between the CES-D scale and ratings of severity of depression by the nurse-clinician was .56 (Craig & Van Natta, in press). In the New Haven patient group, the average CES-D score at admission was 39.11, with no score below 16 (note that this group was screened to include only those above 6 on the Raskin scale). The correlations of the CES-D with the Hamilton Clinician's Rating scale and with the Raskin Rating scale were moderate (.44 to .54) at admission. After four weeks of treatment, the correlations were substantially higher (.69 to .75). These correlations were almost as high as those obtained for the 90-item SCL-90 (Weissman et al., 1975).
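    The comparison at the cutoff of 16 is a simple proportion of scores at or above a threshold. The helper below is a hypothetical illustration of that computation; the text itself calls the cutoff arbitrary, and nothing here should be read as a diagnostic rule.

```python
from typing import Sequence

CUTOFF = 16  # the arbitrary cutoff score used in the text


def proportion_at_or_above(scores: Sequence[float], cutoff: float = CUTOFF) -> float:
    """Fraction of total scores that fall at or above the cutoff."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(score >= cutoff for score in scores) / len(scores)
```

    Applied to each sample separately, this yields figures of the kind quoted above for the patient and general population groups.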

    Self-report Criteria

    Table 6 shows correlations of the CES-D scale with other self-report scales in the several samples. (Note that Q2 and Q3 did not include all scales.) In all the samples, the pattern of correlations of the CES-D with other scales gives reasonable evidence of discriminant validity. The highest correlations were with scales designed to measure symptoms of depression (i.e., Lubin, Bradburn Negative Affect and Bradburn Balance) or general psychopathology (Langner) and the Cantril Ladder. The correlation of the CES-D with the Bradburn Positive Affect scale was negative; correlations were low positive with scales designed to measure different variables (medications, disability days, social functioning, aggression). The CES-D correlated moderately with interviewer ratings of depression but low negative to zero with interviewer ratings of cooperation and understanding of the question.

    Table 6 also shows support for the concept of a "syndrome" of depression which is more consistent in the patient sample than in the general population samples. In the patient groups, the correlations with other depression scales were higher positive (in the New Haven patients, correlation with the SCL-90 was .83); with the Bradburn Positive Affect, higher negative; and with other scales, the same low positive.

    The low negative correlations with the Marlowe-Crowne scale of "social desirability" suggest that there may be some general response set involved in the CES-D scores (see also Klassen, Hornstra, & Anderson, 1975). However, the pattern of correlations in Table 6 suggests that this bias is small and does not entirely mask meaningful relationships with other variables.

    Need for Services

    In the Q1 and Q2 surveys, the respondents were asked whether they had had an emotional problem in the past week for which they felt they needed help. The group who answered the "need help" question "yes" or "no, because it's no use to look for help" are considered "at risk" of becoming patients. Table 7 shows parameters of the CES-D scale by answers to this question. The Washington County patient population is included for comparison. The general population "need help" groups were more similar to the patients than were the "not need help" groups. The "need help" groups had high means (significantly higher than the "not need help" groups) and standard deviations, symmetrical distributions (low skew), high percentages of high scores (16 and above), and moderately high correlations with the Bradburn Positive Affect scale. The patterns of correlations with the Bradburn Positive and Negative Affect scales can also be considered in terms of discriminant validity.

    Life Events

    Past research has shown an association of illness, including mental illness, with certain significant life events (Dohrenwend & Dohrenwend, 1974). Table 8 shows the average CES-D scores for those who do and do not report certain events in the year (or during the retest interval for Q3) preceding the interview. The results were as predicted: the more negative the event, the higher the depression score of those who experienced it. Vacations were associated with low CES-D scores (possibly biased by socioeconomic status); marriage was ambiguous; separation was more strongly associated with depression than was divorce.

    Table 9 shows the interview-reinterview (i.e., Q1/Q2 vs. Q3) CES-D scores by Life Events Losses (using the same criterion of Life Events Losses as was used for Table 5). The overall trend was for lower scores on Q3 than on the original interview, except in the group with no events before the first interview and at least one event in the retest interval. The four groups were significantly different in amount of change in CES-D score by several different methods of testing change scores. These relationships of the CES-D scores to life events are considered validation of its sensitivity to current mood state, which is a property desired for the scale.

    Table 9. Test-retest CES-D Average Scores By Life Events Losses Before Each Test

    ^a Overall significance of difference between groups in change scores: p < .01 in one-way analysis of variance and in one-way analysis of covariance, with score at time 1 as covariate.
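
    The one-way analysis of variance on change scores mentioned for Table 9 can be illustrated with a short sketch. It computes the F ratio by hand for change scores (retest minus first interview) in four groups. The function name `one_way_f` and all scores are hypothetical, invented for illustration; they are not the study data, and the analysis of covariance is not shown.

```python
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: group size times the squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical change scores for the four Life Events Losses groups.
change_scores = {
    "no loss before either test": [-2, -1, 0, -3],
    "loss before first test only": [-5, -4, -6, -3],
    "loss before retest only": [2, 3, 1, 4],
    "loss before both tests": [-1, 0, -2, 1],
}
f_ratio = one_way_f(list(change_scores.values()))
```

    A large F ratio relative to its degrees of freedom indicates that the groups differ in average change; a significance level would still have to be looked up in the F distribution.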

    Improvement After Treatment

    Further evidence of response to change is furnished by the New Haven clinical study (Weissman et al., 1975). The average CES-D score, along with the SCL-90, the Hamilton, and the Raskin, decreased significantly from the time of admission to one week and to four weeks of treatment (see Table 10). The mean for each of the 20 items was lower after four weeks of treatment than upon admission (tables available on request). The change was particularly large for patients rated "recovered" (by a Raskin score of less than 7) after four weeks. The average CES-D score went down 20 points in the recovered group and 12 points in the group rated "still ill."

    Factor Analysis

    Principal components factor analysis (with ones in the main diagonal) of the 20-item scale was done for the three general population groups (All Q1 Whites, All Q2 Whites, and All Q3 Whites). For each group, there were four eigenvalues greater than one, which together accounted for a total of 48% of the variance; therefore, the normal varimax rotation to four factors was examined (see Table 11). The pattern of factor loadings is quite consistent across the three groups. Including items with loadings above .40 in all three groups, the four factors are readily interpretable as follows:

    I. Depressed affect (blues, depressed, lonely, cry, sad)

    II. Positive affect (good, hopeful, happy, enjoy)

    III. Somatic and retarded activity (bothered, appetite, effort, sleep, get going)

    IV. Interpersonal (unfriendly, dislike)


    Table 10. Average Scores of New Haven Patients At Admission and After Treatment^a (N=35)

    ^a From Weissman et al. (1975), Tables 3 & 4.

    ^b Matched t-test, p < .001 for all 4 measures, for 3 comparisons: Time 1 vs. Time 2, Time 2 vs. Time 3, Time 1 vs. Time 3.

    Including items with loadings of at least .35 in at least two groups would add the items failure, fearful, happy, and enjoy to the depressed affect factor and the items blues, mind, depressed, and talk to the somatic factor. In all three groups, the depressed affect factor shares the largest proportion of the variance (about 16%) and the interpersonal factor, the smallest proportion (about 8%).
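
    The extraction steps described for the factor analysis (principal components of the correlation matrix with ones in the diagonal, retention of components with eigenvalues greater than one, then varimax rotation) can be sketched as follows. This is an illustration using NumPy on a made-up six-item correlation matrix, not the program used in the study. The function names are hypothetical, and the Kaiser row normalization implied by "normal varimax" is omitted for brevity.

```python
import numpy as np


def principal_component_loadings(corr):
    """Return loadings of components with eigenvalue > 1, plus all eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # eigenvalue-greater-than-one rule
    return eigvecs[:, keep] * np.sqrt(eigvals[keep]), eigvals


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (raw, without Kaiser normalization)."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < previous * (1 + tol):       # criterion stopped improving
            break
        previous = s.sum()
    return loadings @ rotation


# Made-up correlation matrix: two clusters of three items each.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
corr = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])

loadings, eigvals = principal_component_loadings(corr)
rotated = varimax(loadings)
explained = eigvals[eigvals > 1.0].sum() / corr.shape[0]  # share of total variance
```

    Because the rotation is orthogonal, each item's communality (the sum of its squared loadings) is unchanged; only the distribution of loadings across factors changes.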

    Similarity of factor structure of the three samples was estimated by the Factorial Invariance Coefficient, ri (Derogatis, Kallmen, & Davis, 1971; Derogatis, Serio, & Cleary, 1972; Pinneau & Newhouse, 1964). The ri is a measure of the correlation of the loadings of all items on one factor in one group versus the loadings on one factor in another group. If the factor structure of two groups is similar, the ri will be very high when loadings on the same factor in both groups are correlated (the "diagonal" coefficients) and very low when different factors are correlated (the "off-diagonal" coefficients). Comparing Q1 with Q2 and Q1 with Q3, the diagonal coefficients were very high (.87 to .99). The off-diagonal coefficients (i.e., the similarity of different factors) were very low (the largest was .13).
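
    The idea of diagonal and off-diagonal coefficients can be illustrated with a small sketch. It assumes the coefficient is a plain Pearson correlation between columns of loadings; the cited FMATCH program may compute it differently. The function names and the loadings are hypothetical, and the toy patterns were chosen so that different factors are uncorrelated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def invariance_matrix(loadings_a, loadings_b):
    """Correlate every factor in sample A with every factor in sample B.

    Each argument is a list of factors; each factor is the list of item
    loadings on that factor. Entry [i][j] compares factor i of A with
    factor j of B, so the diagonal holds the same-factor comparisons.
    """
    return [[pearson(fa, fb) for fb in loadings_b] for fa in loadings_a]


# Hypothetical loadings of four items on two factors in two samples.
sample_a = [[0.80, 0.80, 0.10, 0.10], [0.80, 0.10, 0.80, 0.10]]
sample_b = [[0.75, 0.82, 0.12, 0.08], [0.78, 0.12, 0.83, 0.07]]
coefficients = invariance_matrix(sample_a, sample_b)
```

    Reading the result, `coefficients[0][0]` and `coefficients[1][1]` are the diagonal (same-factor) values, and the remaining entries are the off-diagonal values.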

    These coefficients are very strong evidence that the CES-D has a similar factor structure in two samples from similar populations (Q1 vs. Q2) and across two tests on essentially the same sample (Q1 vs. Q3).

    The factors found in the general population are consistent with the components of depression built into the scale. However, the high internal consistency of the scale found in all groups argues against undue emphasis on separate factors. The items are all symptoms related to depression. For epidemiologic research, a simple total score is recommended as an estimate of the degree of depressive symptomatology.
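
    A minimal sketch of the simple total score follows. It assumes the response coding of 0 to 3 per item and takes the four positively worded items to be items 4, 8, 12, and 16 of the published form; neither detail is stated in this section. The function names are hypothetical, and the value 16 is used only because the text reports percentages of "high scores (16 and above)."

```python
POSITIVE_ITEMS = {4, 8, 12, 16}  # assumed 1-based positions of good, hopeful, happy, enjoy
HIGH_SCORE = 16                  # "high scores (16 and above)" in the text


def cesd_total(responses):
    """Sum 20 item responses coded 0-3, reversing the positive items."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for position, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {position}: response must be 0, 1, 2, or 3")
        # Positive items are reversed so that higher always means more symptoms.
        total += 3 - value if position in POSITIVE_ITEMS else value
    return total


def percent_high(totals):
    """Percentage of respondents with a total of 16 or above."""
    return 100.0 * sum(t >= HIGH_SCORE for t in totals) / len(totals)
```

    Under these assumptions the total ranges from 0 to 60, and `percent_high` summarizes a group rather than classifying any individual respondent.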

    Generalizability Across Subgroups

    To be useful for epidemiologic studies (e.g., distribution of depression across demographic subgroups), the CES-D scale must have adequate reliability and validity and a similar factor structure within each subgroup of the population. Therefore, the analyses of Tables 3, 4, 6, 7 and 11 were repeated on each of three age groups (under 25, 25-64, over 64), the two sexes, two races (Black and White), three levels of education (less than high school, high school, greater than high school), and the two "need help" groups ("need help," "not need help"). For these analyses, the data from Kansas City and Washington County Q1 and Q2 were combined to maximize numbers in the subgroups.

    With few exceptions, the results for the total population were confirmed in all subgroups (tables available on request). In all subgroups, coefficient alpha was .80 or above. Test-retest correlations were moderate (.40 or above) in all but three groups (Blacks, age under 25, and "need help"). The subgroup patterns of correlations with other scales (as in Table 6) and relationships to "need help" (as in Table 7) were very similar to those in the total population. The subgroups did not differ from each other or from the total population in factor structure. The "need help" group (which had been found to be similar to the Washington County patient group by various criteria above) was not like that patient group in factor structure but was very similar to the total general population.
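
    For readers who want to reproduce the internal consistency check within a subgroup, coefficient alpha can be computed from item-level responses as sketched below. The function name and the five-person, four-item data set are hypothetical; a real analysis would use all 20 items after reversing the positive ones.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha.

    item_scores is a list of respondents, each a list of k item scores.
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))                 # one tuple of scores per item
    totals = [sum(person) for person in item_scores]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five people to four items (already reverse-scored).
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 2, 3, 3],
    [0, 0, 1, 0],
]
alpha = coefficient_alpha(responses)
```

    Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why a value of .80 or above is read as high internal consistency.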

    Cautions and Conclusions

    Some limitations in use of the scale should be noted. It is not intended as a clinical diagnostic tool, and interpretations of individual scores should not be made. Even group averages should be interpreted in terms of level of symptoms which accompany depression, not in terms of rates of illness. Appropriate cutoff scores for clinical screening are yet to be validated. There are some hints that understanding of the items may be a problem; there was a very small but consistent correlation between the CES-D score and the interviewer ratings of understanding of the questions, independent of education of the respondent. Analyses indicate that this was not simply due to respondents who did not notice the reversal of the positive items. Special caution is needed with bilingual respondents (Trieman, 1975). Further study of this issue is needed, with possible revision for simplicity of wording and removal of colloquial expressions. There is still some question as to the effect of the interviewer and the interview form on the mean level of scale scores. Until further study decides this issue, a sampling design balanced by interviewer may be appropriate.

    On the positive side, the results reported here are very favorable for the uses of the CES-D scale for which it was designed. The scale has high internal consistency, acceptable test-retest stability, excellent concurrent validity by clinical and self-report criteria, and substantial evidence of construct validity. These properties hold across the general population subgroups studied. The scale is suitable for use in Black and White English-speaking American populations of both sexes with a wide range of age and socioeconomic status for the epidemiologic study of the symptoms of depression. A group with a high average score may be interpreted to be "at risk" of depression or in need of treatment. The scale is a valuable tool to identify such high-risk groups and to study the relationships between depressive symptoms and many other variables.

    References

    Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. An inventory for measuring depression. Archives of General Psychiatry, 1961, 4, 561-571.

    Bradburn, N. M. The structure of psychological well-being. Chicago: Aldine, 1969.

    Cantril, H. The pattern of human concern. New Brunswick: Rutgers University, 1965.

    Choi, I. C., & Comstock, G. W. Interviewer effect on responses to a questionnaire relating to mood. American Journal of Epidemiology, 1975, 101, 84-92.

    Comstock, G. W., & Helsing, K. Characteristics of respondents and nonrespondents to a questionnaire for estimating community mood. American Journal of Epidemiology, 1973, 97, 233-239.

    Comstock, G. W., & Helsing, K. Symptoms of depression in two communities. Psychological Medicine (in press).

    Craig, T. J., & Van Natta, P. Recognition of depressed affect in hospitalized psychiatric patients: Staff and patient perceptions. Diseases of the Nervous System (in press).

    Crandell, D. L., & Dohrenwend, B. P. Some relations among psychiatric symptoms, organic illness and social class. American Journal of Psychiatry, 1967, 12, 1527-1537.


    Crowne, D. P., & Marlowe, D. A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 1960, 24, 349-354.

    Derogatis, L. R., Kallmen, C. H., & Davis, D. M. FMATCH: A program to evaluate the degree of equivalence of factors derived from analyses of different samples. Behavioral Science, 1971, 16, 271-273.

    Derogatis, L. R., Lipman, R. S., & Covi, L. SCL-90: An outpatient psychiatric scale: Preliminary report. Psychopharmacology Bulletin, 1973, 9, 13-27.

    Derogatis, L. R., Serio, J. C., & Cleary, P. A. An empirical comparison of three indices of factorial similarity. Psychological Reports, 1972, 30, 791-804.

    Dohrenwend, B. S., & Dohrenwend, B. P. (Eds.) Stressful life events: Their nature and effects. New York: Wiley-Interscience, 1974.

    Gardner, E. A. Development of a symptom check list for the measurement of depression in a population. Unpublished, 1968.

    Hamilton, M. A rating scale for depression. Journal of Neurologic Neurosurgical Psychiatry, 1960, 23, 56-62.

    Handlin, V., Klassen, D., Hornstra, R., & Roth, A. Interviewer effects in a community mental health survey. Technical Report, 1974, The Greater Kansas City Mental Health Foundation, Contract PH 43-66-1324, National Institute of Mental Health.

    Klassen, D., Hornstra, R., & Anderson, P. The influence of social desirability on symptom and mood reporting in a community survey. Journal of Consulting & Clinical Psychology, 1975, 43, 448-452.

    Klassen, D., & Roth, A. Characteristics of non-respondents in the Community Mental Health Assessment survey. Technical Report, 1974, The Greater Kansas City Mental Health Foundation, Contract PH 43-66-1324, National Institute of Mental Health.

    Klein, D. F. Endogenomorphic depression. Archives of General Psychiatry, 1974, 31, 447-454.

    Langner, T. S. A twenty-two item screening score of psychiatric symptoms indicating impairment. Journal of Health and Human Behavior, 1962, 3, 269-276.

    Lubin, B. Manual for the depression adjective check lists. San Diego: Educational and Industrial Testing Service, 1967.

    Mebane, I. On time and late respondents. Technical Report, 1973, Center for Epidemiologic Studies, National Institute of Mental Health.

    Nunnally, J. C. Psychometric theory. New York: McGraw-Hill, 1967.

    Pinneau, S. R., & Newhouse, A. Measures of invariance and comparability in factor analysis for fixed variables. Psychometrika, 1964, 29, 271-281.

    Raskin, A., Schulterbrandt, J., Reatig, N., & McKeon, J. Replication of factors of psychopathology in interview, ward behavior, and self-report ratings of hospitalized depressives. Journal of Nervous and Mental Disease, 1969, 148, 87-96.

    Rockliff, B. W. A brief rating scale for anti-depressant drug trials. Comprehensive Psychiatry, 1971, 12, 122-135.

    Standards for educational and psychological tests. Washington, D.C.: American Psychological Association, 1974.

    Trieman, B. Depressive mood among middle class urban ethnic groups. Technical Report, 1975, Contract HSM 42-73-238, National Institute of Mental Health.

    Weissman, M. M., Prusoff, B., & Newberry, P. Comparison of the CES-D with standardized depression rating scales at three points in time. Technical Report, 1975, Yale University, Contract ASH-74-166, National Institute of Mental Health.

    Zung, W. W. K. A self-rating depression scale. Archives of General Psychiatry, 1965, 12, 63-70.

    Acknowledgements

    The CES-D Scale was originally developed by Mr. Ben Z. Locke, Chief, Center for Epidemiologic Studies (CES), National Institute of Mental Health and Dr. Peter Putnam, formerly at CES. The overall program was initiated by Dr. Robert Markush, former Chief, CES. The field studies were carried out by the Epidemiologic Field Station, Greater Kansas City Mental Health Foundation, Kansas City, Missouri (Dr. Robijn Hornstra, Director) and The Training Center for Public Health Research, Johns Hopkins University, Hagerstown, Maryland (Dr. George Comstock) under contract with CES. The clinical validation studies were carried out by Dr. Thomas Craig, formerly of Johns Hopkins and Dr. Myrna Weissman, Yale University, under contract with CES. Important advice on and review of this report was provided by those connected with the study, especially Dr. Thomas Craig and Dr. Evelyn Goldberg, Johns Hopkins; by Dr. Len Derogatis, Johns Hopkins; and by the reviewers and editor of this journal.

    Author's Address

    Lenore S. Radloff, Room 10C-09, Parklawn Building, 5600 Fishers Lane, Rockville, Maryland 20852.
