Reliability and validity of the Hospital Anxiety and Depression Scale and the Beck Depression Inventory (Full and FastScreen scales) in detecting depression in persons with hepatitis C

Journal of Affective Disorders 100 (2007) 265–269
www.elsevier.com/locate/jad

Brief report


Jeannette Golden a, Ronán M. Conroy b,⁎, Anne Marie O'Dwyer a

a Psychological Medicine Service, St James's Hospital, Dublin 8, Ireland
b Epidemiology Department, Royal College of Surgeons in Ireland, 120 St Stephen's Green, Dublin 2, Ireland

Received 16 June 2006; received in revised form 21 October 2006; accepted 23 October 2006
Available online 6 December 2006

Abstract

Background: We examined the performance of the Beck Depression Inventory (BDI) and its short form (BDI-FS) and the Hospital Anxiety and Depression Scale depression (HADS-D) and anxiety (HADS-A) subscales in detecting depression in a group of patients with hepatitis C.

Methods: SCID-CV was used to establish DSM-IV diagnosis. Sensitivity, specificity, positive and negative predictive values were used to assess test performance and Cohen's Kappa to measure agreement with DSM diagnosis.

Results: Twenty-five of 88 participants had a DSM-IV depressive diagnosis. There was considerable non-overlap between ‘caseness’ on the BDI and HADS (Kappa=0.44). The HADS depression subscale had poor sensitivity (52%) and poor agreement with clinical diagnosis (Kappa=0.35). The full BDI had a sensitivity of 88% and a Kappa of 0.54 against a sensitivity of 84% and Kappa of 0.42 for the short form. The HADS anxiety subscale predicted depression as well as the depression subscale (sensitivity 88%, Kappa 0.47).

Conclusions: Neither the BDI nor the HADS agrees well with the clinical diagnosis of depressive disorder, nor do they agree well with one another. The anxiety subscale of the HADS appears to measure depression at least as well as the depressive subscale.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Depression; Screening; Self-completion scales; HADS; BDI

1. Introduction

Depression is still under-recognised and under-treated in medical patients (Gelenberg, 1999), both in primary care (Williams et al., 1999) and in medical inpatients (Bowler et al., 1994; Koenig et al., 1988; Penn et al., 1997). Screening for depression in the medically ill is made more difficult because the somatic symptoms of depression are also common in physical illness (Brown-DeGagne et al., 1998; Ross et al., 2003).

⁎ Corresponding author. E-mail address: [email protected] (R.M. Conroy).

0165-0327/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jad.2006.10.020

The Beck Depression Inventory FastScreen (BDI-FS) was developed as a 7-item subset of the Beck Depression Inventory (BDI), aimed at rapid screening in medical patients (Beck et al., 1997). It is based on the cognitive symptoms of the BDI, and mirrors the diagnostic criteria for Major Depressive Disorder in DSM-IV. The Hospital Anxiety and Depression Scale (Zigmond and Snaith, 1983) comprises two 7-item scales designed to rate depression (HADS-D) and anxiety (HADS-A). It was developed to be brief, non-threatening and to exclude items which might reflect somatic complaints. It has been widely used in research (Bjelland et al., 2002).

In this study, we assess the reliability and validity of the HADS, BDI and BDI-FS in a group of patients with hepatitis C.

2. Methods

2.1. Participants and measures

Participants were recruited as part of a study of mood and wellbeing among outpatients at the hepatitis C services of St James's Hospital, Dublin. Ethical approval for the study was obtained from the hospital ethics committee. After giving written informed consent, participants were interviewed using the SCID-CV, a structured diagnostic interview based on DSM-IV criteria (First et al., 1996). They also completed the HADS and BDI scales. We have described the study in more detail elsewhere (Golden et al., 2005).

The BDI-FS score was calculated from the relevant items on the BDI. A caseness threshold of 7/8 was used for the HADS subscales, and 3/4 for the BDI-FS. A threshold of 18/19 was used for the BDI.
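The threshold notation can be read as follows: a cutoff written as 7/8 treats scores of 7 or below as non-cases and scores of 8 or above as cases. The short Python sketch below assumes that reading. The function and dictionary names are illustrative only and are not part of the original analysis, which was carried out in Stata.

```python
# Illustrative caseness thresholds. A cutoff such as "7/8" is read here as:
# 7 or below = non-case, 8 or above = case.
# All names are hypothetical and do not come from the original study.
CASE_MINIMUM = {
    "HADS-D": 8,   # cutoff 7/8
    "HADS-A": 8,   # cutoff 7/8
    "BDI-FS": 4,   # cutoff 3/4
    "BDI": 19,     # cutoff 18/19
}


def is_case(scale: str, score: int) -> bool:
    """Return True when the score reaches the caseness threshold for the scale."""
    return score >= CASE_MINIMUM[scale]
```

For example, `is_case("HADS-D", 7)` is false and `is_case("HADS-D", 8)` is true under this reading.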

2.2. Statistical analysis

Data were analysed using Stata Release 9. Cronbach's alpha was used to measure reliability. Validity was assessed in two ways. The first was the area under the ROC curve, a generalised measure of the ability of a scale to distinguish between two groups: it measures the probability that a depressed person will score higher than a non-depressed person. An ROC curve area of 1 indicates perfect separation of the two groups, while an area of 0.5 indicates no better separation than expected by chance.
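The probabilistic reading of the ROC curve area can be illustrated directly: compare every depressed participant's score with every non-depressed participant's score and count the proportion of pairs in which the depressed participant scores higher, with ties counted as half. The Python sketch below does this for made-up scores. It is an illustration of the definition, not the Stata routine used in the study, and the data are hypothetical.

```python
def roc_auc(case_scores, noncase_scores):
    """Probability that a random case outscores a random non-case (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical scores, for illustration only (not study data).
depressed = [12, 15, 9, 20]
not_depressed = [3, 9, 5, 11]
print(roc_auc(depressed, not_depressed))
```

Perfectly separated groups give an area of 1, and two groups with identical score distributions give 0.5.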

Validity was also assessed by calculating positive and negative predictive values for each test at its published caseness threshold. Agreement between each scale and DSM-IV diagnosis was measured using the Kappa statistic, with confidence intervals calculated using the method of Donner and Eliasziw (Reichenheim, 2004).
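These statistics all derive from a 2×2 table of screening result against diagnosis. As a worked illustration, the Python sketch below computes them from the HADS-D frequencies reported in Table 1 (13 true positives, 11 false positives, 12 false negatives, 52 true negatives). The function name is hypothetical, and the confidence intervals are not reproduced here.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Screening statistics from a 2x2 table of test result against diagnosis."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement, from the marginal totals of both classifications.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# HADS-D frequencies as reported in Table 1.
hads_d = diagnostic_summary(tp=13, fp=11, fn=12, tn=52)
for name, value in hads_d.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimal places, the results agree with the HADS-D values reported in Table 1 (sensitivity 52%, specificity 83%, Kappa 0.35).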

3. Results

3.1. Diagnoses

Of 97 potential participants, five refused and two could not be interviewed because of security concerns. Of 90 participants, two failed to complete both self-assessment measures, leaving 88 participants to form the study group. Of these, 23 were women (26%). Nearly half of the participants (47%) had been infected through injecting drug use (IDU) and a further 32 had iatrogenic disease, through transfusion (8), treatment for haemophilia (14) or contaminated anti-D products (10). The remaining participants had disease of unknown aetiology.

DSM-IV depression was diagnosed in 25 participants (28%), of whom seven had major depressive disorder, ten adjustment disorder with depressive features and eight dysthymic disorder or depressive disorder not otherwise specified.

3.2. Reliability

All scales had high reliability (Cronbach's alpha): 0.83 for the HADS depression subscale, 0.85 for the HADS anxiety subscale, 0.85 for the BDI-FS and 0.93 for the full BDI scale. Examination of the performance of individual scale items showed that the BDI suicidality item had a very limited range, with 88% of participants scoring 0 and the remainder 1. It also had low correlation with the other items of the BDI-FS (0.4) and could be removed without altering the reliability. We therefore calculated a second BDI-FS score based on the remaining six items, which we shall refer to as the BDI-FS-6, using the same cutpoint (3/4) to determine caseness.
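For readers unfamiliar with the reliability coefficient, Cronbach's alpha compares the sum of the individual item variances with the variance of the total score. The Python sketch below applies the standard formula to a small made-up data set. The function name and the responses are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five people to a three-item scale (not study data).
responses = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 1],
    [3, 2, 3],
    [1, 0, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Recomputing alpha with one item's column dropped shows whether that item can be removed without lowering reliability, which is the kind of check described above for the suicidality item.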

3.2.1. Validity: detection of depression by the depression scales

The area under the ROC curve was 0.87 for the BDI (95% CI 0.80 to 0.95). In comparison, it was 0.85 for the BDI-FS (95% CI 0.77 to 0.93), which was not significantly lower (P=0.227). The BDI-FS-6 had an ROC curve area of 0.84 (95% CI 0.75 to 0.92), which was not significantly lower than that for the BDI-FS (P=0.092). The HADS depression subscale, on the other hand, had an ROC curve area of only 0.78 (95% CI 0.68 to 0.88), significantly lower than the BDI (P=0.025).

Table 1 shows the predictive value of the scales using the caseness thresholds. It should be noted that there is considerable non-overlap between caseness on the HADS and the BDI (Kappa=0.56) or BDI-FS (Kappa=0.44). Agreement with DSM-IV diagnosis was highest for the BDI (Kappa=0.54) and somewhat lower for the BDI-FS (Kappa=0.42) and BDI-FS-6 (Kappa=0.44). (Kappas in the range 0.4 to 0.6 are taken as indicators of poor agreement in clinical medicine.) For the HADS depression scale, the Kappa was worse, at only 0.35.

The BDI had acceptable sensitivity (88%) and negative predictive value (94%) and these values were similar for the BDI-FS and BDI-FS-6. Positive predictive value indices were lower for all three BDI-derived measures. The HADS depression scale had a poor sensitivity (52%), missing roughly half of all depressed participants.

Table 1
Diagnostic and screening performance of the HADS depression (HADS-D) and anxiety (HADS-A) subscales, the BDI-FS in its original (7-item) and modified (6-item) versions and the Beck Depression Inventory (BDI)

| Screening scale | Screening result | Non-cases (SCID-CV), N=63 | Cases (SCID-CV), N=25 | Total, N=88 |
|---|---|---|---|---|
| HADS-D cutoff 7/8 | Non-case | 52 | 12 | 64 |
| | Case | 11 | 13 | 24 |
| HADS-A cutoff 7/8 | Non-case | 43 | 3 | 46 |
| | Case | 20 | 22 | 42 |
| BDI-FS cutoff 3/4 | Non-case | 42 | 4 | 46 |
| | Case | 21 | 21 | 42 |
| BDI-FS-6 cutoff 3/4 | Non-case | 43 | 4 | 47 |
| | Case | 20 | 21 | 41 |
| BDI | Non-case | 47 | 3 | 50 |
| | Case | 16 | 22 | 38 |

| Screening scale | Positive predictive value (95% CI) | Negative predictive value (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Kappa (95% CI) |
|---|---|---|---|---|---|
| HADS-D cutoff 7/8 | 54% (33%–74%) | 81% (70%–90%) | 52% (31%–72%) | 83% (71%–91%) | 0.35 (0.13–0.55) |
| HADS-A cutoff 7/8 | 52% (36%–68%) | 93% (82%–99%) | 88% (69%–97%) | 68% (55%–79%) | 0.47 (0.30–0.64) |
| BDI-FS cutoff 3/4 | 50% (34%–66%) | 91% (34%–66%) | 84% (64%–95%) | 67% (54%–78%) | 0.42 (0.24–0.60) |
| BDI-FS-6 cutoff 3/4 | 51% (35%–67%) | 91% (80%–98%) | 84% (64%–95%) | 68% (55%–79%) | 0.44 (0.26–0.62) |
| BDI | 58% (41%–74%) | 94% (83%–99%) | 88% (69%–97%) | 75% (62%–85%) | 0.54 (0.37–0.71) |

Counts in the Non-case and Case rows are frequencies. Numbers in parentheses are confidence intervals.

3.2.2. Detection of depression by the HADS anxiety subscale

Table 1 also shows the performance of the anxiety subscale of the HADS, which had a higher ROC curve area (0.84, 95% CI 0.74 to 0.93) than the depression subscale. It performed better as a screening tool for depression, with higher negative predictive value and sensitivity, detecting 88% of depressed participants as against 52% for the depression subscale.

3.2.3. Optimisation of cutoff points for depression

We examined each scale to see if changing the caseness threshold would significantly improve its discriminative ability. No significantly better caseness threshold was found for the HADS, but for the BDI-FS, with a threshold of 5/6, Kappa increased to 0.53, specificity to 76% and positive predictive value to 58%, with sensitivity and negative predictive value essentially unaltered. This increase in prediction was statistically significant (P=0.021).
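A threshold search of this kind can be sketched as a sweep over candidate cutoffs, recomputing agreement with the diagnosis at each one. The paper does not describe the search procedure or the significance test, so the Python example below is only one plausible approach: it selects the cutoff with the highest Kappa and performs no significance testing. The function names and the data are hypothetical.

```python
def kappa_at_cutoff(scores, diagnoses, minimum_case_score):
    """Cohen's kappa between 'score >= minimum_case_score' and the diagnosis."""
    n = len(scores)
    predicted = [s >= minimum_case_score for s in scores]
    observed = sum(p == d for p, d in zip(predicted, diagnoses)) / n
    p_pred = sum(predicted) / n
    p_diag = sum(diagnoses) / n
    expected = p_pred * p_diag + (1 - p_pred) * (1 - p_diag)
    if expected == 1:
        return 0.0  # kappa is undefined when both classifications are constant
    return (observed - expected) / (1 - expected)


def best_cutoff(scores, diagnoses):
    """Return (minimum case score, kappa) for the cutoff with the highest kappa."""
    candidates = range(min(scores) + 1, max(scores) + 1)
    return max(
        ((c, kappa_at_cutoff(scores, diagnoses, c)) for c in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical scores and diagnoses (True = depressed), not study data.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
diagnoses = [False, False, False, False, True, False, True, True]
cutoff, kappa = best_cutoff(scores, diagnoses)
print(cutoff, round(kappa, 2))
```

A minimum case score of `c` corresponds to a cutoff written as (c−1)/c in the notation used in this paper. In practice a threshold chosen this way should be validated on independent data, since it is tuned to the sample.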


4. Discussion

There are two areas of concern about current screening tools for depression: the first is their ability to identify those with depressive disorder. There is a perceived need for means of rapid assessment of depression in medical patients (Williams et al., 2002) without the need for formal psychiatric assessment.

The second is that in many research studies these instruments are used in the absence of formal psychiatric assessment. This applies particularly to the HADS, which, because it is simple and non-threatening, is widely used in research (Herrmann, 1997). However, comparatively few studies have validated the HADS against clinical diagnosis, and these have used a variety of caseness thresholds, making it difficult to synthesise the findings (Bjelland et al., 2002).

4.1. Case detection

The most significant finding of our study is the poor agreement between all self-completion measures and DSM-IV diagnosis of depression. This is of particular concern in relation to the HADS, which missed just under 50% of all those with depression. By contrast, though the BDI-FS had a high rate of false positive findings, it had a lower rate of false negatives. Its high negative predictive value suggests that it may be useful in ruling out depression. The superiority of the BDI-FS confirms the findings of Parker et al. (2001) and Beck et al. (1997).

Though most case-finding studies with the HADS have reported sensitivities and specificities of 80% or more (Herrmann, 1997), several other authors have reported performance as poor as that documented here (Hall et al., 1999; Silverstone, 1994). One explanation may be the difference in reading age between the BDI and HADS. The BDI has a Fog readability index that places it in the ‘easy’ category, while the HADS has one of the highest required literacy levels of any of the self-completion instruments assessed by Williams et al. (2002). Our patient group would have included a significant number with low literacy which, taken with Williams et al.'s findings, suggests that the HADS may perform poorly in populations with low literacy levels.

4.2. What does the HADS anxiety scale measure?

The HADS anxiety scale performed marginally better than the depression scale as a screen for depressive disorder. Costantini et al. (1999) have reported a similar finding in a study of women with breast cancer. This strongly suggests that the two subscales of the HADS do not measure distinct clinical entities, and that the large literature based on the premise that they do is signally flawed.

4.3. Can the BDI-FS be shortened?

Our findings suggest that the omission of the suicidality item from the BDI-FS does not compromise its performance, and may increase its acceptability to patients. Further work, however, is needed to validate this finding.

4.4. Conclusion

It is clear that none of the instruments is a substitute for clinical observation. We can only repeat the warning sounded by Beck that “researchers and clinicians need to be aware of the differential sensitivity of depression instruments which, while supposedly measuring the same construct, are focussed on different components of this mood disorder” (Beck and Gable, 2001).

References

Beck, C.T., Gable, R.K., 2001. Comparative analysis of the performance of the Postpartum Depression Screening Scale with two other depression instruments. Nurs. Res. 50 (4), 242–250.

Beck, A.T., Guth, D., Steer, R.A., Ball, R., 1997. Screening for major depression disorders in medical inpatients with the Beck Depression Inventory for Primary Care. Behav. Res. Ther. 35 (8), 785–791.

Bjelland, I., Dahl, A.A., Haug, T.T., Neckelmann, D., 2002. The validity of the Hospital Anxiety and Depression Scale. An updated literature review. J. Psychosom. Res. 52 (2), 69–77.

Bowler, C., Boyle, A., Branford, M., Cooper, S.A., Harper, R., Lindesay, J., 1994. Detection of psychiatric disorders in elderly medical inpatients. Age Ageing 23 (4), 307–311.

Brown-DeGagne, A.M., McGlone, J., Santor, D.A., 1998. Somatic complaints disproportionately contribute to Beck Depression Inventory estimates of depression severity in individuals with multiple chemical sensitivity. J. Occup. Environ. Med. 40 (10), 862–869.

Costantini, M., Musso, M., Viterbori, P., Bonci, F., Del Mastro, L., Garrone, O., et al., 1999. Detecting psychological distress in cancer patients: validity of the Italian version of the Hospital Anxiety and Depression Scale. Support. Care Cancer 7 (3), 121–127.

First, M.B., Gibbon, M., Spitzer, R.L., Williams, J.B.W., 1996. Structured Clinical Interview for DSM-IV Axis I Disorders: Clinician Version (SCID-CV): User's Guide. American Psychiatric Publishing Inc., Arlington, VA.

Gelenberg, A., 1999. Depression is still underrecognized and undertreated. Arch. Intern. Med. 159 (15), 1657–1658.

Golden, J., O'Dwyer, A.M., Conroy, R.M., 2005. Depression and anxiety in patients with hepatitis C: prevalence, detection rates and risk factors. Gen. Hosp. Psych. 27 (6), 431–438.


Hall, A., A'Hern, R., Fallowfield, L., 1999. Are we using appropriate self-report questionnaires for detecting anxiety and depression in women with early breast cancer? Eur. J. Cancer 35 (1), 79–85.

Herrmann, C., 1997. International experiences with the Hospital Anxiety and Depression Scale - a review of validation data and clinical results. J. Psychosom. Res. 42 (1), 17–41.

Koenig, H.G., Meador, K.G., Cohen, H.J., Blazer, D.G., 1988. Detection and treatment of major depression in older medically ill hospitalized patients. Int. J. Psychiatry Med. 18 (1), 17–31.

Parker, G., Hilton, T., Hadzi-Pavlovic, D., Bains, J., 2001. Screening for depression in the medically ill: the suggested utility of a cognitive-based approach. Aust. N. Z. J. Psychiatry 35 (4), 474–480.

Penn, J.V., Boland, R., McCartney, J.R., Kohn, R., Mulvey, T., 1997. Recognition and treatment of depressive disorders by internal medicine attendings and housestaff. Gen. Hosp. Psych. 19 (3), 179–184.

Reichenheim, M.E., 2004. Confidence intervals for the kappa statistic. Stata J. 4 (4), 421–428.

Ross, L.E., Gilbert Evans, S.E., Sellers, E.M., Romach, M.K., 2003. Measurement issues in postpartum depression part 2: assessment of somatic symptoms using the Hamilton Rating Scale for Depression. Arch. Women Ment. Health 6 (1), 59–64.

Silverstone, P.H., 1994. Poor efficacy of the Hospital Anxiety and Depression Scale in the diagnosis of major depressive disorder in both medical and psychiatric patients. J. Psychosom. Res. 38 (5), 441–450.

Williams Jr., J.W., Mulrow, C.D., Kroenke, K., Dhanda, R., Badgett, R.G., Omori, D., et al., 1999. Case-finding for depression in primary care: a randomized trial. Am. J. Med. 106 (1), 36–43.

Williams Jr., J.W., Pignone, M., Ramirez, G., Perez Stellato, C., 2002. Identifying depression in primary care: a literature synthesis of case-finding instruments. Gen. Hosp. Psych. 24 (4), 225–237.

Zigmond, A.S., Snaith, R.P., 1983. The hospital anxiety and depression scale. Acta Psychiatr. Scand. 67 (6), 361–370.
