Journal of Consulting and Clinical Psychology
1986, Vol. 54, No. 3, 381-385
Copyright 1986 by the American Psychological Association, Inc. 0022-006X/86/$00.75
The Cognitive Therapy Scale: Psychometric Properties
T. Michael Vallis and Brian F. Shaw
Clarke Institute of Psychiatry and University of Toronto

Keith S. Dobson
University of British Columbia
The Cognitive Therapy Scale (CTS; Young & Beck, 1980) was developed to evaluate therapist competence in cognitive therapy for depression. Preliminary data on the psychometric properties of the
scale have been encouraging but not without problems. In this article, we present data on the interrater
reliability, internal consistency, factor structure, and discriminant validity of the scale. To overcome
methodological problems with previous work, expert raters were used, all sessions evaluated were
cognitive therapy sessions, and a more adequate statistical design was used. Results indicate that the
CTS can be used with only moderate interrater reliability but that aggregation can be used to increase
reliability. The CTS is highly homogeneous and provides a relatively undifferentiated assessment of
therapist performance, at least when used with therapists following the cognitive therapy protocol.
The scale is, however, sensitive to variations in the quality of therapy.
As psychotherapy research continues, more and more attention
is being given to therapist in-session behavior (Elkin, 1984; Schaf-
fer, 1983). DeRubeis, Hollon, Evans, and Bemis (1982) and Lu-
borsky, Woody, McLellan, O'Brien, and Rosenzweig (1982) have
developed rating scales to assess therapy-specific therapist be-
havior, and these scales have been shown to clearly differentiate
therapy types (e.g., cognitive therapy and interpersonal therapy).
Ensuring that different therapies are unique is an important contribution, but this work does not address how competently a given therapy is implemented (Schaffer). Competency assessment provides a metric that can facilitate detection of therapist drift (Shaw, 1984) and defines a potentially important predictor variable for understanding the psychotherapy change process.

The National Institute of Mental Health (NIMH) Treatment of Depression Collaborative Research Program is a multisite program initiated and sponsored by the Psychosocial Treatment Research Branch, Division of Extramural Research Programs, NIMH, and is funded by cooperative agreements to six participating sites. The principal NIMH collaborators are Irene Elkin, coordinator; John P. Docherty, acting branch chief; and Morris B. Parloff, former branch chief. Tracie Shea, of George Washington University, is associate coordinator. The principal investigators and project coordinators at the three participating research sites are Stuart M. Sotsky and David Glass, George Washington University; Stanley D. Imber and Paul A. Pilkonis, University of Pittsburgh; and John T. Watkins and William Leber, University of Oklahoma. The principal investigators and project coordinators at the three sites responsible for training therapists are Myrna Weissman, Eve Chevron, and Bruce J. Rounsaville, Yale University; Brian F. Shaw and T. Michael Vallis, Clarke Institute of Psychiatry; and Jan A. Fawcett and Phillip Epstein, Rush Presbyterian St. Luke's Medical Centre. Collaborators in the data management and data analysis aspects of the program are C. James Klett, Joseph F. Collins, and Roderic Gillis, of the Perry Point, Maryland, Veterans Administration Cooperative Studies Program.

This work was completed as part of the NIMH Treatment of Depression Collaborative Research Program (NIMH Grant MH3823102 awarded to the second author).

We would like to thank John Rush, Maria Kovacs, Jeff Young, and Gary Emery, who served as expert raters and consultants.

Correspondence concerning this article should be addressed to Brian F. Shaw, Clarke Institute of Psychiatry, 250 College Street, Toronto, Ontario, Canada M5T 1R8.
The Cognitive Therapy Scale (CTS) was developed by Young
and Beck (1980) to evaluate therapist competence in imple-
menting the cognitive therapy protocol of Beck, Rush, Shaw, and
Emery (1979). The CTS is an observer-rated scale that contains
11 items divided into two subscales on a rational basis. The Gen-
eral Skills subscale is composed of items assessing establishment
of an agenda, obtaining feedback, therapist understanding, in-
terpersonal effectiveness, collaboration, and pacing of the session
(efficient use of time). The Specific Cognitive Therapy Skills sub-
scale items assess empiricism, focus on key cognitions and be-
haviors, strategy for change, application of cognitive-behavioral
techniques, and quality of homework assigned. All items are rated on 7-point (0-6) Likert-type scales, yielding a total score range of 0-66. Item scale values
are associated with concrete, behavioral descriptors, and a de-
tailed rating manual is available.
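The scoring arithmetic described above (11 items, each rated 0-6, grouped into two rationally derived subscales) can be sketched as follows. The item keys and the dict-based interface are illustrative assumptions for this sketch, not part of the published scale or its manual.

```python
# Illustrative CTS scoring sketch: 11 items rated 0-6, two rational subscales.
# Item names follow the article's description; the grouping shown here is an
# assumption for illustration only.

GENERAL_SKILLS = ["agenda", "feedback", "understanding",
                  "interpersonal_effectiveness", "collaboration", "pacing"]
CT_SKILLS = ["empiricism", "focus_on_cognitions", "strategy_for_change",
             "implementation_of_strategy", "homework"]

def score_cts(ratings):
    """Return (general_subtotal, ct_subtotal, total) from a dict of 0-6 item ratings."""
    for item, value in ratings.items():
        if not 0 <= value <= 6:
            raise ValueError(f"{item} must be rated 0-6, got {value}")
    general = sum(ratings[i] for i in GENERAL_SKILLS)
    ct = sum(ratings[i] for i in CT_SKILLS)
    return general, ct, general + ct

# A session rated at the scale maximum illustrates the 0-66 total range:
perfect = {i: 6 for i in GENERAL_SKILLS + CT_SKILLS}
print(score_cts(perfect))  # (36, 30, 66)
```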
The CTS requires varying degrees of inference from raters.
Establishing an agenda is a low-inference item. Relevant therapist
behaviors include obtaining feedback from the previous session,
identifying current issues, and collaborating with the patient to
select one or more appropriate targets for the session. These be-
haviors are specific, but which of them are required to justify a
given rating is unclear. At the other extreme, strategy for change
is a high-inference item. The scale descriptor for the maximum
score states, "Therapist followed a consistent strategy for change
that seemed very promising and incorporated the most appro-
priate [italics added] cognitive-behavioral techniques." The rater
must have considerable knowledge to make a judgment about
the most appropriate strategy for a given patient at a given point
in therapy.
Three recent studies have examined the interrater reliability
of the CTS (Dobson, Shaw, & Vallis, 1985; Hollon et al., 1981;
Young, Shaw, Beck, & Budenz, 1981). Reliability coefficients
(intraclass correlations) ranged from .54 to .96 in these studies.
Methodological problems make these data difficult to interpret,
however. All used a problematic statistical design, the Balanced
Incomplete Block Design (Kirk, 1968).¹ Further, Young et al.
selected tapes that had a restricted range, and not all of the ther-
apists in the Dobson et al. or Hollon et al. studies followed the
cognitive therapy protocol. Combining cognitive and noncog-
nitive therapists confounds the ability of the CTS to discriminate
cognitive therapy from noncognitive therapy and the ability of
the CTS to discriminate between the competency with which
different therapists perform cognitive therapy. Only Young et al.
examined cognitive therapists exclusively, and they reported a
moderate reliability coefficient. An additional concern with the
Hollon et al. study is that six of the seven raters were psychology
graduate students who may not have had the requisite experience
and knowledge to make highly inferential judgments. If the CTS
is to be used as an index of the competency of cognitive therapists,
interrater reliability should be established within a homogeneous
group of cognitive therapists using experienced raters and a
complete statistical design.
Fewer data are available on the internal consistency of the
CTS. Data from Dobson et al. (1985) suggest that the scale is
very homogeneous (α = .95). Similarly, Young et al. (1981), who
combined the data from all three of the studies cited above and
performed a factor analysis, found only two overlapping factors.
Specific cognitive therapy technical skill items loaded on Factor
1, whereas interpersonal skill items loaded moderately on Factor
1 and moderately to highly on Factor 2. Together, these data
suggest that the CTS is a highly homogeneous scale. Finally, Hol-
lon et al. (1981) reported preliminary data on the concurrent
and discriminant validity of the CTS. The CTS was shown to
correlate with the Cognitive Therapy subscale of the Minnesota
Therapy Rating Scale, a scale designed to measure adherence
and not competence, but not with the Interpersonal Therapy or
Pharmacotherapy subscales. The present article presents exten-
sive data on the CTS. Expert raters, a pure sample of cog-
nitive therapists, and a more adequate statistical design charac-
terized the study. Data on interrater reliability, internal consis-
tency, factor structure, and discriminant validity are presented.
Method
Overview
The data for this paper were derived from the training phase of the
cognitive therapy component of the National Institute of Mental Health
(NIMH) Treatment of Depression Collaborative Research Program
(TDCRP; Elkin, Parloff, Hadley, & Autry, 1985). Nine psychotherapists
(Ph.D. or M.D.), three from each of three treatment centers in the United
States, were trained in cognitive therapy. Training occurred over an 18-
month period, during which time each therapist treated four or five pa-
tients. All patients were suffering from unipolar depression and met the
research diagnostic criteria for major depressive disorder (Spitzer, Endicott,
& Robins, 1978). Patients were excluded if they were psychotic or suffered
from bipolar affective disorder, alcoholism, or certain medical problems
(see Elkin et al. for complete details).
During the training period, each therapist received weekly individual
and monthly group supervision from training staff (the authors). Training
staff viewed videotapes of every therapy session and completed the CTS.²
In addition, five consultants periodically evaluated samples of videotapes.
All raters based their ratings on complete (50-min) sessions.
Raters
A total of seven raters were involved in these analyses. All raters (6
Ph.D. and 1 M.D.) were experts in cognitive therapy and had considerable
clinical experience as well as experience in training others in cognitive
therapy. Three of the seven raters were cognitive therapy trainers in the
TDCRP. The remaining four raters were consultants. Due to availability
restrictions, five raters were involved in the interrater reliability analysis.
All seven raters were involved in the internal consistency, factor analysis,
and discriminant validity analyses.
Stimulus Videotapes and Rating Procedures
The various analyses involved different numbers of videotapes, selected
by various means; therefore the different samples are described separately.
Interrater reliability. Each of five raters evaluated the same 10 vid-
eotapes, selected randomly from a pool of 94 tapes. This pool represented
all videotapes received by the cognitive therapy component of the TDCRP
over a 6-month period (January through June of 1983). Videotapes were
selected so that each of the nine therapists was represented in the sample.
A second tape was selected from one therapist so that 10 tapes composed
the reliability sample. All raters evaluated the videotapes over a 2-day
period. Raters viewed eight tapes in pairs and two tapes alone. Raters
were paired so that each rater evaluated two tapes with each other rater.
The order of the tapes was counterbalanced across raters.
Internal consistency and factor analysis. On four occasions, TDCRP
consultants met to evaluate selected samples of videotapes. Videotapes
were randomly selected from all tapes available during the period between
consultants' visits so that each therapist who had seen a patient in that
period was represented. At two visits, two consecutive sessions were se-
lected from each therapist. Here the first session was randomly selected.
This was done so that consultants could base their assessment on more
than a single sample of a therapist's behavior.
To increase reliability, all videotapes were rated by two raters, and the
mean was used in the analyses. Raters were paired with each other so
that each rater evaluated approximately the same number of videotapes
with every other rater. For those raters involved in the ongoing monitoring
of therapists (trainers), tapes were periodically selected at random and
independently rated by one other rater. A total of approximately 725
videotapes were available, from which 90 were selected for analyses. To
balance for individual rater characteristics, approximately equal numbers
of ratings were obtained from each rater.³
Discriminant validity. From the same sample used in the internal
consistency analysis, 53 tapes were selected. Difference in the number of
videotapes selected was due to fewer tapes being available for a rater who
joined the project later than the rest. This rater evaluated a subset of
tapes in which acceptable and unacceptable decisions were not made.
In addition to making ratings on the CTS form, each rater also evaluated
the session overall as to whether it was acceptable cognitive therapy or
not. Acceptable sessions were judged as being of sufficient quality to be
¹ With this design, raters do not evaluate all sessions. Instead, raters are paired so that each rater evaluates the same number of sessions as every other rater. Because raters and sessions are not crossed, reliabilities are calculated on the basis of estimated effects.
² Ratings were also made on the following variables: overall therapist quality; acceptability of the therapists' performance as adequate for an outcome study; patient difficulty; patient receptivity; and the probability, based on the observed session, that the therapist would become competent as a cognitive therapist and as a cognitive therapy supervisor.
³ Four raters viewed 28 videotapes, one viewed 27, one viewed 24, and one viewed 17. Two raters viewed fewer videotapes due to unavailability during some phases of the project (one trainer joined the project after it had begun operation, and one consultant did not attend two site visits).
a valid representation of cognitive therapy. Unacceptable sessions were
those whose content was judged to be unrepresentative of cognitive therapy
(regardless of quality) or whose quality was judged to be poor.
Ratings of acceptability and unacceptability were made independently
of ratings on the CTS. This was possible because all tapes were evaluated
by two raters. One rater was randomly selected to provide the acceptable
or unacceptable judgment. The CTS rating from the other rater was ob-
tained and used in this analysis. This procedure avoided the problem of
confounded ratings when the same individual makes both judgments.
Results
Interrater Reliability
The intraclass correlation coefficient (ICC; see Shrout & Fleiss,
1979) was calculated on CTS total scores. A one-way analysis of
variance (ANOVA), with tapes (10) as the independent variable
and raters (five) as the replication factor, generated the appropriate
sums of squares. The ANOVA indicated that the CTS scores for
the 10 videotapes differed significantly, F(9, 40) = 8.26, p <
.001. Comparison within these 10 tapes resulted in four
subgroups. Six tapes clustered in the low range (M = 38.0); one
in the low-medium range (M = 45.0); two in the medium-
high range (M = 54.0); and one, in the high range (M = 60.2).
Therefore, restricted variance did not appear to be a problem,
as it was for the Young et al. (1981) study (see Lahey, Downey,
& Saal, 1983).
The reliability of a single rater for this sample was .59, which
was statistically significant, F(9, 40) = 8.23, p < .01, although
not as high as that reported by Dobson et al. (1985) or Hollon
et al. (1981). Reliability for individual items was low to moderate,
ranging from .27 (pacing) to .59 (empiricism).
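The one-way (Case 1) intraclass correlation used here (Shrout & Fleiss, 1979) can be sketched directly from the ANOVA sums of squares; because MSB/MSW equals the reported F ratio, the single-rater coefficient also follows from ICC(1,1) = (F − 1)/(F + k − 1). The data in the function's docstring convention are fabricated; only the procedure is illustrated.

```python
# Sketch of the one-way (Case 1) intraclass correlation: targets (tapes) are
# the grouping factor and raters are treated as replications within tapes.

def icc_1_1(scores):
    """scores: list of lists, one inner list of k ratings per tape."""
    n = len(scores)     # number of tapes
    k = len(scores[0])  # ratings per tape
    grand = sum(sum(row) for row in scores) / (n * k)
    ss_between = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    ss_within = sum((x - sum(row) / k) ** 2 for row in scores for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    # Single-rater reliability: (MSB - MSW) / (MSB + (k - 1) * MSW)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Equivalently, from the F ratio reported above (F = 8.26, k = 5 raters):
print(round((8.26 - 1) / (8.26 + 5 - 1), 2))  # 0.59, matching the reported value
```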
Unreliability in the ICC can be due to minimum variance,
differential pattern of correlations among rater pairs, or low cor-
relation between raters (Lahey et al., 1983). That there was sig-
nificant variance among the sample videotapes is an argument
against a minimal variance interpretation. Examination of the
pairwise correlation coefficients between all raters on CTS total
scores failed to identify any particular rater as deviant. All raters
demonstrated a similar range of correlations with other raters.⁴

Table 1
Item-Total Correlations for the Cognitive Therapy Scale (CTS)

Subscale and item                General skills   CT skills   CTS total
General skills
  Agenda                              .66            .48         .59
  Feedback                            .83            .74         .81
  Understanding                       .84            .74         .82
  Interpersonal effectiveness         .79            .61         .73
  Collaboration                       .88            .78         .87
  Pacing                              .77            .75         .79
Cognitive therapy skills
  Empiricism                          .77            .86         .84
  Focus on cognitions                 .72            .87         .82
  Strategy for change                 .81            .94         .91
  Implementation of strategy          .80            .93         .90
  Homework                            .59            .72         .68

Table 2
Factor Analysis of the Cognitive Therapy Scale

Item                                Factor 1   Factor 2
 1. Agenda                             .13        .81
 2. Feedback                           .72        .39
 3. Understanding                      .91        .15
 4. Interpersonal effectiveness        .79        .15
 5. Collaboration                      .82        .34
 6. Pacing                             .54        .62
 7. Empiricism                         .79        .35
 8. Focus on cognition                 .79        .32
 9. Strategy for change                .71        .57
10. Implementation of strategy         .72        .54
11. Homework                           .28        .77
Eigenvalue                            7.13       0.98
% variance                           64.80       8.90

Thus, with a group of therapists adhering to the cognitive therapy protocol and rated by experts in cognitive therapy, .59 appeared
to accurately estimate the reliability of a single rater. Increases
in reliability can be achieved by increasing the number of raters
(Epstein, 1979). For instance, by combining the data for two
raters, the interrater reliability coefficient increased to .77. Using
the Spearman-Brown prophecy formula, we found the estimated
reliability of ratings combined from three raters to be .84.⁵,⁶
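The Spearman-Brown prophecy formula used in this step is standard and easy to verify; a sketch follows. Note that applying it to the rounded two-rater value of .77 gives approximately .83, so the reported .84 presumably reflects unrounded inputs.

```python
# Spearman-Brown prophecy formula: projected reliability when the number of
# raters (or items) is multiplied by `factor`.

def spearman_brown(r, factor):
    """Projected reliability of a composite `factor` times as long."""
    return factor * r / (1 + (factor - 1) * r)

# From the single-rater estimate of .59 to two raters:
print(round(spearman_brown(0.59, 2), 2))    # 0.74, close to the observed .77
# Stepping the observed two-rater value up by a factor of 1.5 (2 -> 3 raters):
print(round(spearman_brown(0.77, 1.5), 2))  # 0.83, near the reported .84
```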
Internal Consistency
Item-total correlations were calculated between each item and
the total for (a) the General Skills subscale, (b) the Cognitive
Therapy Skills subscale, and (c) the overall score (Table 1). Ex-
amination of the item-total correlations in Table 1 leads one to
question the division of items into subscales. All items correlated
moderately to highly with both subscales and with the total score.
Although each item correlated highest with its respective subscale,
the discrepancy in the magnitude of the correlations with its own
and with the other subscale was small. Similarly, the two subscales
correlated highly with each other, r(88) = .85, p < .001.
Note (Tables 1 and 2). N = 90.

⁴ The range of correlations with other raters was .59 to .82 for Rater 1, .44 to .78 for Rater 2, .44 to .84 for Rater 3, .63 to .84 for Rater 4, and .53 to .82 for Rater 5.
⁵ This coefficient had to be estimated because six raters would have been required to directly calculate interrater reliability when three raters were combined.
⁶ Although this study was not designed to evaluate intrarater reliability for the five raters, test-retest data were available from the trainers. Each trainer reevaluated a sample of randomly selected videotapes that were originally rated at least 5 months previously. Reliability coefficients for the CTS total score, based on samples of 17, 11, and 10 videotapes, were .81, .96, and .68, respectively. A second retest reliability study was conducted on one trainer 18 months after the initial study. A test-retest correlation of .77 was found for a sample of 10 videotapes.

Factor structure. Through principal-components factor analysis with varimax rotation, we further evaluated the internal structure of the CTS. Two factors resulted from this analysis (Table 2). The first factor (64.8% of the variance) included high positive loadings for all but three items. Items from both the General Skills and Cognitive Therapy Skills subscales loaded on this factor. This factor reflected overall cognitive therapy quality,
composed of both nonspecific factors (such as understanding
and interpersonal effectiveness) and specific cognitive therapy
factors (such as empiricism, focus on central cognitions, and
implementation of cognitive-behavioral interventions). The sec-
ond factor (8.9% of the variance) involved the items assessing
therapists' activities to structure cognitive therapy sessions
(agenda, pacing, and homework). This analysis confirmed the
findings on internal consistency. The scale was highly homoge-
neous, with only two orthogonal factors.
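The analysis above — principal components extracted from the item correlation matrix, followed by orthogonal varimax rotation — can be sketched as follows. The synthetic data (a dominant general factor plus a smaller structuring factor over 11 items and 90 sessions) is an assumption for illustration; neither the data nor the exact extraction settings come from the article.

```python
# Sketch: principal-components extraction from a correlation matrix plus a
# standard varimax rotation, on synthetic data mimicking the design (90
# sessions, 11 items, one dominant general factor).
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (standard algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt
        new_var = np.sum(s)
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

rng = np.random.default_rng(0)
n = 90
general = rng.normal(size=n)      # overall quality factor
structure = rng.normal(size=n)    # session-structuring factor
items = np.column_stack(
    [general + 0.3 * rng.normal(size=n) for _ in range(8)]
    + [0.4 * general + structure + 0.3 * rng.normal(size=n) for _ in range(3)]
)

corr = np.corrcoef(items, rowvar=False)
eigval, eigvec = np.linalg.eigh(corr)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
loadings = eigvec[:, :2] * np.sqrt(eigval[:2])   # keep two components
rotated = varimax(loadings)
print("variance explained (%):", np.round(100 * eigval[:2] / eigval.sum(), 1))
```

Because varimax is an orthogonal rotation, each item's communality (the sum of its squared loadings) is unchanged by the rotation; only the distribution of loading across the two factors shifts.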
Discriminant validity. Differences in CTS item scores between acceptable and unacceptable sessions were analyzed by a multivariate analysis of variance (MANOVA). We used t tests to compare acceptable and unacceptable sessions on the General Skills and Cognitive Therapy Skills subscales as well as on the total scale. Table 3 presents mean scores and the results of univariate analyses. Univariate analyses were computed on item scores because the MANOVA was highly significant (Hotelling's T² = 6.54, p < .01). As reflected in Table 3, CTS scores for acceptable sessions
were almost twice those of unacceptable sessions.
A stepwise discriminant function analysis was also conducted.
Only CTS items that significantly added to the separation of
groups (using the maximum Rao procedure) were included in
the discriminant function. These were, in the order in which
they were added to the function, application of cognitive-behav-
ioral techniques (Rao's V= 66.79, p < .0001), feedback (change
in Rao's V = 7.83, p < .006), empiricism (change in Rao's V =
5.86, p < .02), interpersonal effectiveness (change in Rao's V =
3.76, p < .05), and collaboration (change in Rao's V = 5.61,
p < .02). When the resulting discriminant function was used to
predict group classification (acceptable or unacceptable cognitive
therapy), 84.91% correct classification resulted. Three of the 29
acceptable sessions and 5 of the 24 unacceptable sessions were
misclassified. Because the base rate for a dichotomous classifi-
cation is 50%, the discriminant function increased correct clas-
sification by approximately 35%.
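A two-group discriminant classification of this kind can be sketched with a plain Fisher discriminant. This sketch does not reproduce the stepwise Rao's V item selection reported above; it uses fabricated five-item scores with the same group sizes (29 acceptable, 24 unacceptable) purely to illustrate the classification step.

```python
# Two-group Fisher discriminant on fabricated data: project scores onto the
# direction that best separates the groups, then classify by a midpoint cutoff.
import numpy as np

def fisher_lda(X_a, X_b):
    """Return (weights w, threshold c); classify x as group A if x @ w > c."""
    mean_a, mean_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Pooled within-group covariance matrix
    cov = (np.cov(X_a, rowvar=False) * (len(X_a) - 1)
           + np.cov(X_b, rowvar=False) * (len(X_b) - 1)) / (len(X_a) + len(X_b) - 2)
    w = np.linalg.solve(cov, mean_a - mean_b)
    c = w @ (mean_a + mean_b) / 2
    return w, c

rng = np.random.default_rng(1)
# Fabricated item scores: "acceptable" sessions centered higher than "unacceptable"
acceptable = rng.normal(loc=4.3, scale=1.0, size=(29, 5))
unacceptable = rng.normal(loc=2.5, scale=1.0, size=(24, 5))

w, c = fisher_lda(acceptable, unacceptable)
hits = np.sum(acceptable @ w > c) + np.sum(unacceptable @ w <= c)
print(f"correct classification: {hits / 53:.1%}")
```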
Discussion
The CTS was developed to assess therapist competency in cog-
nitive therapy. To be a useful clinical instrument, the scale must
have adequate psychometric properties, particularly rater reli-
ability. Data from this study suggest that the CTS is used with
only moderate reliability. We estimated that for a single rater,
only 59% of the variance in CTS scores is attributable to differ-
ences in competency across sessions. The remaining variability
(41%) is attributable to error. The major source of error likely
results from different raters relying on different aspects of a ther-
apist's behavior to make their ratings. The abundance of behavior
contained in a complete therapy session no doubt contributes
to this.
Although a reliability coefficient of .59 is acceptable and consistent with other psychotherapy rating scales (Lahey et al., 1983), it has implications for the use of the scale. Unreliability detracts
from the ability to use the scale for supervision or research. For-
tunately, reliability can be bootstrapped by aggregation (Epstein,
1979). Combining two judges' ratings increased reliability to .77
in the present study. Therefore, aggregation is recommended to anyone planning to use the CTS.
The interrater reliability estimate obtained in this study was lower than that of Dobson et al. (1985) or Hollon et al. (1981),
Table 3
Analysis of Acceptable and Unacceptable Therapy Sessions on the Cognitive Therapy Scale (CTS)

                                 Acceptable        Unacceptable
Item                             sessions (n = 29) sessions (n = 24)   F^a
Agenda                                3.14              1.46         18.69**
Feedback                              4.00              1.88         40.53**
Understanding                         4.69              3.08         26.17**
Interpersonal effectiveness           4.79              3.58         11.32*
Collaboration                         4.59              2.71         42.18*
Pacing                                4.28              2.75         21.57*
Empiricism                            3.93              2.58         29.17*
Focus on cognition                    4.76              2.46         45.86*
Strategy for change                   4.31              2.71         18.06*
Implementation of strategy            4.24              1.95         66.79**
Homework                              4.31              2.71         18.06**
General skills subtotal              25.76             15.46          7.25*
CT skills subtotal                   21.55             11.92          7.53*
CTS total                            47.31             27.28          7.90*

a We used t tests to compare the two groups on the General Skills and CT Skills subscales and CTS total scores.
* p < .05. ** p < .001.
who examined a heterogeneous group of therapists, but was con-
sistent with Young et al. (1981). This suggests that the CTS is
highly reliable when used to rate a heterogeneous sample of ther-
apists. Reliability appears to be attenuated, however, when used
in a homogeneous sample of cognitive therapists.
Data on the discriminant validity of the CTS are encouraging.
CTS scores clearly discriminated between performance inde-
pendently judged as acceptable or unacceptable. This confirms
the findings reported in Young et al. (1981) and Hollon et al.
(1981). The ability of the scale to make more fine-grained dif-
ferentiations should be examined next.
The discriminant validity data clearly suggest that CTS total
scores do reflect cognitive therapy competency. The concern then
becomes what more specific information the scale yields. Al-
though the scale has been divided into two subscales on rational
grounds, data indicate that this division is unjustified. The scale
items are highly homogeneous. In fact, item-total correlations
indicated that all items correlated highly with both subscales. The
two subscales shared approximately 75% of their variance. Factor
analysis revealed only one major factor with most of the items
loading highly on it. The small second factor assessed therapists'
structuring activities, such as agenda setting and homework pre-
scriptions. Thus, the CTS is not a highly differentiated scale, at
least as it is currently used. This is not to say that it provides
little information; the total scores appear to be a relatively good
measure of competency in cognitive therapy. There is little jus-
tification, however, for using item scores. Future work on this
scale should evaluate methods of increasing the internal differ-
entiation of the scale. Perhaps, by dividing the session into seg-
ments and obtaining separate ratings for each segment, the ten-
dency to make global ratings can be reduced.
Given the specificity of
somewhat surprising that the CTS is not more differentiated in
its assessment of competence. It may be that the scale items are
not sufficiently specific to allow for independent assessment of
various aspects of cognitive therapy. The level of abstraction re-
quired to make these decisions might be so great as to make
judgments on various items nonindependent. Nonetheless, the
data from the present study indicate that the scale can be used
reliably and that it is sensitive to variability in the quality with
which a cognitive therapy protocol is administered. The scale
should prove useful in future research on cognitive therapy.
References
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive
therapy of depression. New York: Guilford Press.
DeRubeis, R., Hollon, S., Evans, M., & Bemis, K. (1982). Can psychotherapies for depression be discriminated? A systematic investigation of cognitive therapy and interpersonal therapy. Journal of Consulting and Clinical Psychology, 50, 744-756.
Dobson, K. S., Shaw, B. F., & Vallis, T. M. (1985). The reliability of competency ratings on cognitive-behavior therapists. British Journal of Clinical Psychology, 24, 295-300.
Elkin, I. (1984). Specification of the technique variable in the NIMH
Treatment of Depression Collaborative Research Program. In J. Wil-
liams & R. Spitzer (Eds.), Psychotherapy research: Where are we and
where should we go? (pp. 150-159). New York: Guilford Press.
Elkin, I., Parloff, M., Hadley, S., & Autry, J. (1985). The NIMH Treatment
of Depression Collaborative Research Program: Background and re-
search plan. Archives of General Psychiatry, 42, 305-316.
Epstein, S. (1979). The stability of behavior: I. On predicting most of
the people most of the time. Journal of Personality and Social Psy-
chology, 37, 1097-1126.
Hollon, S., Mandell, M., Bemis, K., DeRubeis, R., Emerson, M., Evans,
M., & Kriss, M. (1981). Reliability and validity of the Young Cognitive
Therapy Scale. Unpublished manuscript, University of Minnesota.
Kirk, R. (1968). Experimental design: Procedures for the behavioral sci-
ences. Belmont, CA: Brooks/Cole.
Lahey, M., Downey, R., & Saal, F. (1983). Intraclass correlations: There
is more than meets the eye. Psychological Bulletin, 93, 586-595.
Luborsky, L., Woody, G., McLellan, A., O'Brien, C., & Rosenzweig, J.
(1982). Can independent judges recognize different psychotherapies?
An experience with manual-guided therapies. Journal of Consulting
and Clinical Psychology, 50, 49-62.
Schaffer, N. (1983). Methodological issues of measuring the skillfulness of therapeutic techniques. Psychotherapy: Theory, Research and Practice, 20, 480-493.
Shaw, B. F. (1984). Specification of the training and evaluation of cognitive
therapists for outcome studies. In J. Williams & R. Spitzer (Eds.), Psy-
chotherapy research: Where are we and where should we go? (pp. 173-
189). New York: Guilford Press.
Shrout, P., & Fleiss, J. (1979). Intraclass correlations: Uses in assessing
rater reliability. Psychological Bulletin, 86, 420-428.
Spitzer, R., Endicott, J., & Robins, E. (1978). Research diagnostic criteria:
Rationale and reliability. Archives of General Psychiatry, 35, 773-782.
Young, J., & Beck, A. (1980). Cognitive Therapy Scale: Rating manual.
Unpublished manuscript, Center for Cognitive Therapy, Philadelphia,
PA.
Young, J., Shaw, B. F., Beck, A. T., & Budenz, D. (1981). Assessment of
competence in cognitive therapy. Unpublished manuscript, University
of Pennsylvania.
Received December 17, 1984
Revision received August 15, 1985