Journal of Consulting and Clinical Psychology
1986, Vol. 54, No. 3, 381-385
Copyright 1986 by the American Psychological Association, Inc. 0022-006X/86/$00.75
The Cognitive Therapy Scale: Psychometric Properties
T. Michael Vallis and Brian F. Shaw
Clarke Institute of Psychiatry and University of Toronto

Keith S. Dobson
University of British Columbia
The Cognitive Therapy Scale (CTS; Young & Beck, 1980) was developed to evaluate therapist competence in cognitive therapy for depression. Preliminary data on the psychometric properties of the
scale have been encouraging but not without problems. In this article, we present data on the interrater
reliability, internal consistency, factor structure, and discriminant validity of the scale. To overcome
methodological problems with previous work, expert raters were used, all sessions evaluated were
cognitive therapy sessions, and a more adequate statistical design was used. Results indicate that the
CTS can be used with only moderate interrater reliability but that aggregation can be used to increase
reliability. The CTS is highly homogeneous and provides a relatively undifferentiated assessment of
therapist performance, at least when used with therapists following the cognitive therapy protocol.
The scale is, however, sensitive to variations in the quality of therapy.
As psychotherapy research continues, more and more attention
is being given to therapist in-session behavior (Elkin, 1984; Schaf-
fer, 1983). DeRubeis, Hollon, Evans, and Bemis (1982) and Lu-
borsky, Woody, McLellan, O'Brien, and Rosenzweig (1982) have
developed rating scales to assess therapy-specific therapist be-
havior, and these scales have been shown to clearly differentiate
therapy types (e.g., cognitive therapy and interpersonal therapy).
Ensuring that different therapies are unique is an important contribution, but this work does not address how competently a given therapy is implemented (Schaffer). Competency assessment provides a metric that can facilitate detection of therapist drift (Shaw, 1984) and defines a potentially important predictor variable for understanding the psychotherapy change process.

The National Institute of Mental Health (NIMH) Treatment of Depression Collaborative Research Program is a multisite program initiated and sponsored by the Psychosocial Treatment Research Branch, Division of Extramural Research Programs, NIMH, and is funded by cooperative agreements to six participating sites. The principal NIMH collaborators are Irene Elkin, coordinator; John P. Docherty, acting branch chief; and Morris B. Parloff, former branch chief. Tracie Shea, of George Washington University, is associate coordinator. The principal investigators and project coordinators at the three participating research sites are Stuart M. Sotsky and David Glass, George Washington University; Stanley D. Imber and Paul A. Pilkonis, University of Pittsburgh; and John T. Watkins and William Leber, University of Oklahoma. The principal investigators and project coordinators at the three sites responsible for training therapists are Myrna Weissman, Eve Chevron, and Bruce J. Rounsaville, Yale University; Brian F. Shaw and T. Michael Vallis, Clarke Institute of Psychiatry; and Jan A. Fawcett and Phillip Epstein, Rush Presbyterian St. Luke's Medical Centre. Collaborators in the data management and data analysis aspects of the program are C. James Klett, Joseph F. Collins, and Roderic Gillis, of the Perry Point, Maryland, Veterans Administration Cooperative Studies Program.

This work was completed as part of the NIMH Treatment of Depression Collaborative Research Program (NIMH Grant MH3823102 awarded to the second author).

We would like to thank John Rush, Maria Kovacs, Jeff Young, and Gary Emery, who served as expert raters and consultants.

Correspondence concerning this article should be addressed to Brian F. Shaw, Clarke Institute of Psychiatry, 250 College Street, Toronto, Ontario, Canada M5T 1R8.
The Cognitive Therapy Scale (CTS) was developed by Young
and Beck (1980) to evaluate therapist competence in imple-
menting the cognitive therapy protocol of Beck, Rush, Shaw, and
Emery (1979). The CTS is an observer-rated scale that contains
11 items divided into two subscales on a rational basis. The Gen-
eral Skills subscale is composed of items assessing establishment
of an agenda, obtaining feedback, therapist understanding, in-
terpersonal effectiveness, collaboration, and pacing of the session
(efficient use of time). The Specific Cognitive Therapy Skills sub-
scale items assess empiricism, focus on key cognitions and be-
haviors, strategy for change, application of cognitive-behavioral
techniques, and quality of homework assigned. All items are rated on 7-point (0-6) Likert-type scales, yielding a total score range of 0-66. Item scale values
are associated with concrete, behavioral descriptors, and a de-
tailed rating manual is available.
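The scoring arithmetic described above (11 items, each rated 0-6, grouped into two rationally derived subscales) can be sketched as follows. The item keys and the dict-based interface are illustrative assumptions for this sketch, not part of the published scale or its manual.

```python
# Illustrative CTS scoring sketch: 11 items rated 0-6, two rational subscales.
# Item names follow the article's description; the grouping shown here is an
# assumption for illustration only.

GENERAL_SKILLS = ["agenda", "feedback", "understanding",
                  "interpersonal_effectiveness", "collaboration", "pacing"]
CT_SKILLS = ["empiricism", "focus_on_cognitions", "strategy_for_change",
             "implementation_of_strategy", "homework"]

def score_cts(ratings):
    """Return (general_subtotal, ct_subtotal, total) from a dict of 0-6 item ratings."""
    for item, value in ratings.items():
        if not 0 <= value <= 6:
            raise ValueError(f"{item} must be rated 0-6, got {value}")
    general = sum(ratings[i] for i in GENERAL_SKILLS)
    ct = sum(ratings[i] for i in CT_SKILLS)
    return general, ct, general + ct

# A session rated at the scale maximum illustrates the 0-66 total range:
perfect = {i: 6 for i in GENERAL_SKILLS + CT_SKILLS}
print(score_cts(perfect))  # (36, 30, 66)
```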
The CTS requires varying degrees of inference from raters.
Establishing an agenda is a low-inference item. Relevant therapist
behaviors include obtaining feedback from the previous session,
identifying current issues, and collaborating with the patient to
select one or more appropriate targets for the session. These be-
haviors are specific, but which of them are required to justify a
given rating is unclear. At the other extreme, strategy for change
is a high-inference item. The scale descriptor for the maximum
score states, "Therapist followed a consistent strategy for change
that seemed very promising and incorporated the most appro-
priate [italics added] cognitive-behavioral techniques." The rater
must have considerable knowledge to make a judgment about
the most appropriate strategy for a given patient at a given point
in therapy.
Three recent studies have examined the interrater reliability
of the CTS (Dobson, Shaw, & Vallis, 1985; Hollon et al., 1981;
Young, Shaw, Beck, & Budenz, 1981). Reliability coefficients
(intraclass correlations) ranged from .54 to .96 in these studies.
Methodological problems make these data difficult to interpret,
however. All used a problematic statistical design, the Balanced
Incomplete Block Design (Kirk, 1968).¹ Further, Young et al.
selected tapes that had a restricted range, and not all of the ther-
apists in the Dobson et al. or Hollon et al. studies followed the
cognitive therapy protocol. Combining cognitive and noncog-
nitive therapists confounds the ability of the CTS to discriminate
cognitive therapy from noncognitive therapy and the ability of
the CTS to discriminate between the competency with which
different therapists perform cognitive therapy. Only Young et al.
examined cognitive therapists exclusively, and they reported a
moderate reliability coefficient. An additional concern with the
Hollon et al. study is that six of the seven raters were psychology
graduate students who may not have had the requisite experience
and knowledge to make highly inferential judgments. If the CTS
is to be used as an index of the competency of cognitive therapists,
interrater reliability should be established within a homogeneous
group of cognitive therapists using experienced raters and a
complete statistical design.
Fewer data are available on the internal consistency of the
CTS. Data from Dobson et al. (1985) suggest that the scale is
very homogeneous (α = .95). Similarly, Young et al. (1981), who
combined the data from all three of the studies cited above and
performed a factor analysis, found only two overlapping factors.
Specific cognitive therapy technical skill items loaded on Factor
1, whereas interpersonal skill items loaded moderately on Factor
1 and moderately to highly on Factor 2. Together, these data
suggest that the CTS is a highly homogeneous scale. Finally, Hol-
lon et al. (1981) reported preliminary data on the concurrent
and discriminant validity of the CTS. The CTS was shown to
correlate with the Cognitive Therapy subscale of the Minnesota
Therapy Rating Scale, a scale designed to measure adherence
and not competence, but not with the Interpersonal Therapy or
Pharmacotherapy subscales. The present article presents exten-
sive data on the CTS. Expert raters, a pure sample of cog-
nitive therapists, and a more adequate statistical design charac-
terized the study. Data on interrater reliability, internal consis-
tency, factor structure, and discriminant validity are presented.
Method
Overview
The data for this paper were derived from the training phase of the
cognitive therapy component of the National Institute of Mental Health
(NIMH) Treatment of Depression Collaborative Research Program
(TDCRP; Elkin, Parloff, Hadley, & Autry, 1985). Nine psychotherapists
(Ph.D. or M.D.), three from each of three treatment centers in the United
States, were trained in cognitive therapy. Training occurred over an 18-
month period, during which time each therapist treated four or five pa-
tients. All patients were suffering from unipolar depression and met the
research diagnostic criteria for major depressive disorder (Spitzer, Endicott,
& Robins, 1978). Patients were excluded if they were psychotic or suffered
from bipolar affective disorder, alcoholism, or certain medical problems
(see Elkin et al. for complete details).
During the training period, each therapist received weekly individual
and monthly group supervision from training staff (the authors). Training
staff viewed videotapes of every therapy session and completed the CTS.²
In addition, five consultants periodically evaluated samples of videotapes.
All raters based their ratings on complete (50-min) sessions.
Raters
A total of seven raters were involved in these analyses. All raters (6
Ph.D. and 1 M.D.) were experts in cognitive therapy and had considerable
clinical experience as well as experience in training others in cognitive
therapy. Three of the seven raters were cognitive therapy trainers in the
TDCRP. The remaining four raters were consultants. Due to availability
restrictions, five raters were involved in the interrater reliability analysis.
All seven raters were involved in the internal consistency, factor analysis,
and discriminant validity analyses.
Stimulus Videotapes and Rating Procedures
The various analyses involved different numbers of videotapes, selected
by various means; therefore the different samples are described separately.
Interrater reliability. Each of five raters evaluated the same 10 vid-
eotapes, selected randomly from a pool of 94 tapes. This pool represented
all videotapes received by the cognitive therapy component of the TDCRP
over a 6-month period (January through June of 1983). Videotapes were
selected so that each of the nine therapists was represented in the sample.
A second tape was selected from one therapist so that 10 tapes composed
the reliability sample. All raters evaluated the videotapes over a 2-day
period. Raters viewed eight tapes in pairs and two tapes alone. Raters
were paired so that each rater evaluated two tapes with each other rater.
The order of the tapes was counterbalanced across raters.
Internal consistency and factor analysis. On four occasions, TDCRP
consultants met to evaluate selected samples of videotapes. Videotapes
were randomly selected from all tapes available during the period between
consultants' visits so that each therapist who had seen a patient in that
period was represented. At two visits, two consecutive sessions were se-
lected from each therapist. Here the first session was randomly selected.
This was done so that consultants could base their assessment on more
than a single sample of a therapist's behavior.
To increase reliability, all videotapes were rated by two raters, and the
mean was used in the analyses. Raters were paired with each other so
that each rater evaluated approximately the same number of videotapes
with every other rater. For those raters involved in the ongoing monitoring
of therapists (trainers), tapes were periodically selected at random and
independently rated by one other rater. A total of approximately 725
videotapes were available, from which 90 were selected for analyses. To
balance for individual rater characteristics, approximately equal numbers
of ratings were obtained from each rater.³
Discriminant validity. From the same sample used in the internal
consistency analysis, 53 tapes were selected. Difference in the number of
videotapes selected was due to fewer tapes being available for a rater who
joined the project later than the rest. This rater evaluated a subset of
tapes in which acceptable and unacceptable decisions were not made.
In addition to making ratings on the CTS form, each rater also evaluated
the session overall as to whether it was acceptable cognitive therapy or
not. Acceptable sessions were judged as being of sufficient quality to be
¹ With this design, raters do not evaluate all sessions. Instead, raters are paired so that each rater evaluates the same number of sessions as every other rater. Because raters and sessions are not crossed, reliabilities are calculated on the basis of estimated effects.
² Ratings were also made on the following variables: overall therapist quality; acceptability of the therapists' performance as adequate for an outcome study; patient difficulty; patient receptivity; and the probability, based on the observed session, that the therapist would become competent as a cognitive therapist and as a cognitive therapy supervisor.
³ Four raters viewed 28 videotapes, one viewed 27, one viewed 24, and one viewed 17. Two raters viewed fewer videotapes due to unavailability during some phases of the project (one trainer joined the project after it had begun operation, and one consultant did not attend two site visits).
a valid representation of cognitive therapy. Unacceptable sessions were
those whose content was judged to be unrepresentative of cognitive therapy
(regardless of quality) or whose quality was judged to be poor.
Ratings of acceptability and unacceptability were made independently
of ratings on the CTS. This was possible because all tapes were evaluated
by two raters. One rater was randomly selected to provide the acceptable
or unacceptable judgment. The CTS rating from the other rater was ob-
tained and used in this analysis. This procedure avoided the problem of
confounded ratings when the same individual makes both judgments.
Results
Interrater Reliability
The intraclass correlation coefficient (ICC; see Shrout & Fleiss,
1979) was calculated on CTS total scores. A one-way analysis of
variance (ANOVA), with tapes (10) as the independent variable
and raters (five) as the replication factor, generated the appropriate
sums of squares. The ANOVA indicated that the CTS scores for
the 10 videotapes differed significantly, F(9, 40) = 8.26, p <
.001. Comparison within these 10 tapes resulted in four
subgroups. Six tapes clustered in the low range (M = 38.0); one
in the low-medium range (M = 45.0); two in the medium-
high range (M = 54.0); and one, in the high range (M = 60.2).
Therefore, restricted variance did not appear to be a problem,
as it was for the Young et al. (1981) study (see Lahey, Downey,
& Saal, 1983).
The reliability of a single rater for this sample was .59, which
was statistically significant, F(9, 40) = 8.23, p < .01, although
not as high as that reported by Dobson et al. (1985) or Hollon
et al. (1981). Reliability for individual items was low to moderate,
ranging from .27 (pacing) to .59 (empiricism).
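The one-way (Case 1) intraclass correlation used here (Shrout & Fleiss, 1979) can be sketched directly from the ANOVA sums of squares; because MSB/MSW equals the reported F ratio, the single-rater coefficient also follows from ICC(1,1) = (F − 1)/(F + k − 1). The data in the function's docstring convention are fabricated; only the procedure is illustrated.

```python
# Sketch of the one-way (Case 1) intraclass correlation: targets (tapes) are
# the grouping factor and raters are treated as replications within tapes.

def icc_1_1(scores):
    """scores: list of lists, one inner list of k ratings per tape."""
    n = len(scores)     # number of tapes
    k = len(scores[0])  # ratings per tape
    grand = sum(sum(row) for row in scores) / (n * k)
    ss_between = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    ss_within = sum((x - sum(row) / k) ** 2 for row in scores for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    # Single-rater reliability: (MSB - MSW) / (MSB + (k - 1) * MSW)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Equivalently, from the F ratio reported above (F = 8.26, k = 5 raters):
print(round((8.26 - 1) / (8.26 + 5 - 1), 2))  # 0.59, matching the reported value
```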
Unreliability in the ICC can be due to minimum variance,
differential pattern of correlations among rater pairs, or low cor-
relation between raters (Lahey et al., 1983). That there was sig-
nificant variance among the sample videotapes is an argument
against a minimal variance interpretation. Examination of the
pairwise correlation coefficients between all raters on CTS total
scores failed to identify any particular rater as deviant. All raters
demonstrated a similar range of correlations with other raters.⁴

Table 1
Item-Total Correlations for the Cognitive Therapy Scale (CTS)

Subscale and item                General skills   CT skills   CTS total
General skills
  Agenda                              .66            .48         .59
  Feedback                            .83            .74         .81
  Understanding                       .84            .74         .82
  Interpersonal effectiveness         .79            .61         .73
  Collaboration                       .88            .78         .87
  Pacing                              .77            .75         .79
Cognitive therapy skills
  Empiricism                          .77            .86         .84
  Focus on cognitions                 .72            .87         .82
  Strategy for change                 .81            .94         .91
  Implementation of strategy          .80            .93         .90
  Homework                            .59            .72         .68

Table 2
Factor Analysis of the Cognitive Therapy Scale

Item                                Factor 1   Factor 2
 1. Agenda                             .13        .81
 2. Feedback                           .72        .39
 3. Understanding                      .91        .15
 4. Interpersonal effectiveness        .79        .15
 5. Collaboration                      .82        .34
 6. Pacing                             .54        .62
 7. Empiricism                         .79        .35
 8. Focus on cognition                 .79        .32
 9. Strategy for change                .71        .57
10. Implementation of strategy         .72        .54
11. Homework                           .28        .77
Eigenvalue                            7.13       0.98
% variance                           64.80       8.90

Thus, with a group of therapists adhering to the cognitive therapy protocol and rated by experts in cognitive therapy, .59 appeared
to accurately estimate the reliability of a single rater. Increases
in reliability can be achieved by increasing the number of raters
(Epstein, 1979). For instance, by combining the data for two
raters, the interrater reliability coefficient increased to .77. Using
the Spearman-Brown prophecy formula, we found the estimated
reliability of ratings combined from three raters to be .84.⁵,⁶
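The Spearman-Brown prophecy formula used in this step is standard and easy to verify; a sketch follows. Note that applying it to the rounded two-rater value of .77 gives approximately .83, so the reported .84 presumably reflects unrounded inputs.

```python
# Spearman-Brown prophecy formula: projected reliability when the number of
# raters (or items) is multiplied by `factor`.

def spearman_brown(r, factor):
    """Projected reliability of a composite `factor` times as long."""
    return factor * r / (1 + (factor - 1) * r)

# From the single-rater estimate of .59 to two raters:
print(round(spearman_brown(0.59, 2), 2))    # 0.74, close to the observed .77
# Stepping the observed two-rater value up by a factor of 1.5 (2 -> 3 raters):
print(round(spearman_brown(0.77, 1.5), 2))  # 0.83, near the reported .84
```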
Internal Consistency
Item-total correlations were calculated between each item and
the total for (a) the General Skills subscale, (b) the Cognitive
Therapy Skills subscale, and (c) the overall score (Table 1). Ex-
amination of the item-total correlations in Table 1 leads one to
question the division of items into subscales. All items correlated
moderately to highly with both subscales and with the total score.
Although each item correlated highest with its respective subscale,
the discrepancy in the magnitude of the correlations with its own
and with the other subscale was small. Similarly, the two subscales
correlated highly with each other, r(88) = .85, p < .001.
Note (Tables 1 and 2). N = 90.

⁴ The range of correlations with other raters was .59 to .82 for Rater 1, .44 to .78 for Rater 2, .44 to .84 for Rater 3, .63 to .84 for Rater 4, and .53 to .82 for Rater 5.
⁵ This coefficient had to be estimated because six raters would have been required to directly calculate interrater reliability when three raters were combined.
⁶ Although this study was not designed to evaluate intrarater reliability for the five raters, test-retest data were available from the trainers. Each trainer reevaluated a sample of randomly selected videotapes that were originally rated at least 5 months previously. Reliability coefficients for the CTS total score, based on samples of 17, 11, and 10 videotapes, were .81, .96, and .68, respectively. A second retest reliability study was conducted on one trainer 18 months after the initial study. A test-retest correlation of .77 was found for a sample of 10 videotapes.

Factor structure. Through principal-components factor analysis with varimax rotation, we further evaluated the internal structure of the CTS. Two factors resulted from this analysis (Table 2). The first factor (64.8% of the variance) included high positive loadings for all but three items. Items from both the General Skills and Cognitive Therapy Skills subscales loaded on this factor. This factor reflected overall cognitive therapy quality,
composed of both nonspecific factors (such as understanding
and interpersonal effectiveness) and specific cognitive therapy
factors (such as empiricism, focus on central cognitions, and
implementation of cognitive-behavioral interventions). The sec-
ond factor (8.9% of the variance) involved the items assessing
therapists' activities to structure cognitive therapy sessions
(agenda, pacing, and homework). This analysis confirmed the
findings on internal consistency. The scale was highly homoge-
neous, with only two orthogonal factors.
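The analysis above — principal components extracted from the item correlation matrix, followed by orthogonal varimax rotation — can be sketched as follows. The synthetic data (a dominant general factor plus a smaller structuring factor over 11 items and 90 sessions) is an assumption for illustration; neither the data nor the exact extraction settings come from the article.

```python
# Sketch: principal-components extraction from a correlation matrix plus a
# standard varimax rotation, on synthetic data mimicking the design (90
# sessions, 11 items, one dominant general factor).
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (standard algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt
        new_var = np.sum(s)
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

rng = np.random.default_rng(0)
n = 90
general = rng.normal(size=n)      # overall quality factor
structure = rng.normal(size=n)    # session-structuring factor
items = np.column_stack(
    [general + 0.3 * rng.normal(size=n) for _ in range(8)]
    + [0.4 * general + structure + 0.3 * rng.normal(size=n) for _ in range(3)]
)

corr = np.corrcoef(items, rowvar=False)
eigval, eigvec = np.linalg.eigh(corr)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
loadings = eigvec[:, :2] * np.sqrt(eigval[:2])   # keep two components
rotated = varimax(loadings)
print("variance explained (%):", np.round(100 * eigval[:2] / eigval.sum(), 1))
```

Because varimax is an orthogonal rotation, each item's communality (the sum of its squared loadings) is unchanged by the rotation; only the distribution of loading across the two factors shifts.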
Discriminant validity. Differences in CTS item scores between acceptable and unacceptable sessions were analyzed by a multivariate analysis of variance (MANOVA). We used t tests to compare acceptable and unacceptable sessions on the General Skills and Cognitive Therapy Skills subscales as well as on the total scale. Table 3 presents mean scores and the results of univariate analyses. Univariate analyses were computed on item scores because the MANOVA was highly significant (Hotelling's T² = 6.54, p < .01). As reflected in Table 3, CTS scores for acceptable sessions
were almost twice those of unacceptable sessions.
A stepwise discriminant function analysis was also conducted.
Only CTS items that significantly added to the separation of
groups (using the maximum Rao procedure) were included in
the discriminant function. These were, in the order in which
they were added to the function, application of cognitive-behav-
ioral techniques (Rao's V= 66.79, p < .0001), feedback (change
in Rao's V = 7.83, p < .006), empiricism (change in Rao's V =
5.86, p < .02), interpersonal effectiveness (change in Rao's V =
3.76, p < .05), and collaboration (change in Rao's V = 5.61,
p < .02). When the resulting discriminant function was used to
predict group classification (acceptable or unacceptable cognitive
therapy), 84.91% correct classification resulted. Three of the 29
acceptable sessions and 5 of the 24 unacceptable sessions were
misclassified. Because the base rate for a dichotomous classifi-
cation is 50%, the discriminant function increased correct clas-
sification by approximately 35%.
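A two-group discriminant classification of this kind can be sketched with a plain Fisher discriminant. This sketch does not reproduce the stepwise Rao's V item selection reported above; it uses fabricated five-item scores with the same group sizes (29 acceptable, 24 unacceptable) purely to illustrate the classification step.

```python
# Two-group Fisher discriminant on fabricated data: project scores onto the
# direction that best separates the groups, then classify by a midpoint cutoff.
import numpy as np

def fisher_lda(X_a, X_b):
    """Return (weights w, threshold c); classify x as group A if x @ w > c."""
    mean_a, mean_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Pooled within-group covariance matrix
    cov = (np.cov(X_a, rowvar=False) * (len(X_a) - 1)
           + np.cov(X_b, rowvar=False) * (len(X_b) - 1)) / (len(X_a) + len(X_b) - 2)
    w = np.linalg.solve(cov, mean_a - mean_b)
    c = w @ (mean_a + mean_b) / 2
    return w, c

rng = np.random.default_rng(1)
# Fabricated item scores: "acceptable" sessions centered higher than "unacceptable"
acceptable = rng.normal(loc=4.3, scale=1.0, size=(29, 5))
unacceptable = rng.normal(loc=2.5, scale=1.0, size=(24, 5))

w, c = fisher_lda(acceptable, unacceptable)
hits = np.sum(acceptable @ w > c) + np.sum(unacceptable @ w <= c)
print(f"correct classification: {hits / 53:.1%}")
```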
Discussion
The CTS was developed to assess therapist competency in cog-
nitive therapy. To be a useful clinical instrument, the scale must
have adequate psychometric properties, particularly rater reli-
ability. Data from this study suggest that the CTS is used with
only moderate reliability. We estimated that for a single rater,
only 59% of the variance in CTS scores is attributable to differ-
ences in competency across sessions. The remaining variability
(41%) is attributable to error. The major source of error likely
results from different raters relying on different aspects of a ther-
apist's behavior to make their ratings. The abundance of behavior
contained in a complete therapy session no doubt contributes
to this.
Although a reliability coefficient of .59 is acceptable and consistent with other psychotherapy rating scales (Lahey et al., 1983), it has implications for the use of the scale. Unreliability detracts
from the ability to use the scale for supervision or research. For-
tunately, reliability can be bootstrapped by aggregation (Epstein,
1979). Combining two judges' ratings increased reliability to .77
in the present study. Therefore, aggregation is recommended to anyone planning to use the CTS.
The interrater reliability estimate obtained in this study was lower than that of Dobson et al. (1985) or Hollon et al. (1981),
Table 3
Analysis of Acceptable and Unacceptable Therapy Sessions on the Cognitive Therapy Scale (CTS)

                                 Acceptable        Unacceptable
Item                             sessions (n = 29) sessions (n = 24)   F^a
Agenda                                3.14              1.46         18.69**
Feedback                              4.00              1.88         40.53**
Understanding                         4.69              3.08         26.17**
Interpersonal effectiveness           4.79              3.58         11.32*
Collaboration                         4.59              2.71         42.18*
Pacing                                4.28              2.75         21.57*
Empiricism                            3.93              2.58         29.17*
Focus on cognition                    4.76              2.46         45.86*
Strategy for change                   4.31              2.71         18.06*
Implementation of strategy            4.24              1.95         66.79**
Homework                              4.31              2.71         18.06**
General skills subtotal              25.76             15.46          7.25*
CT skills subtotal                   21.55             11.92          7.53*
CTS total                            47.31             27.28          7.90*

a We used t tests to compare the two groups on the General Skills and CT Skills subscales and CTS total scores.
* p < .05. ** p < .001.
who examined a heterogeneous group of therapists, but was con-
sistent with Young et al. (1981). This suggests that the CTS is
highly reliable when used to rate a heterogeneous sample of ther-
apists. Reliability appears to be attenuated, however, when used
in a homogeneous sample of cognitive therapists.
Data on the discriminant validity of the CTS are encouraging.
CTS scores clearly discriminated between performance inde-
pendently judged as acceptable or unacceptable. This confirms
the findings reported in Young et al. (1981) and Hollon et al.
(1981). The ability of the scale to make more fine-grained dif-
ferentiations should be examined next.
The discriminant validity data clearly suggest that CTS total
scores do reflect cognitive therapy competency. The concern then
becomes what more specific information the scale yields. Al-
though the scale has been divided into two subscales on rational
grounds, data indicate that this division is unjustified. The scale
items are highly homogeneous. In fact, item-total correlations
indicated that all items correlated highly with both subscales. The
two subscales shared approximately 75% of their variance. Factor
analysis revealed only one major factor with most of the items
loading highly on it. The small second factor assessed therapists'
structuring activities, such as agenda setting and homework pre-
scriptions. Thus, the CTS is not a highly differentiated scale, at
least as it is currently used. This is not to say that it provides
little information; the total scores appear to be a relatively good
measure of competency in cognitive therapy. There is little jus-
tification, however, for using item scores. Future work on this
scale should evaluate methods of increasing the internal differ-
entiation of the scale. Perhaps, by dividing the session into seg-
ments and obtaining separate ratings for each segment, the ten-
dency to make global ratings can be reduced.
Given the specificity of
somewhat surprising that the CTS is not more differentiated in
its assessment of competence. It may be that the scale items are
not sufficiently specific to allow for independent assessment of
various aspects of cognitive therapy. The level of abstraction re-
quired to make these decisions might be so great as to make
judgments on various items nonindependent. Nonetheless, the
data from the present study indicate that the scale can be used
reliably and that it is sensitive to variability in the quality with
which a cognitive therapy protocol is administered. The scale
should prove useful in future research on cognitive therapy.
References
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive
therapy of depression. New York: Guilford Press.
DeRubeis, R., Hollon, S., Evans, M., & Bemis, K. (1982). Can psychotherapies for depression be discriminated? A systematic investigation of cognitive therapy and interpersonal therapy. Journal of Consulting and Clinical Psychology, 50, 744-756.
Dobson, K. S., Shaw, B. F., & Vallis, T. M. (1985). The reliability of competency ratings on cognitive-behavior therapists. British Journal of Clinical Psychology, 24, 295-300.
Elkin, I. (1984). Specification of the technique variable in the NIMH
Treatment of Depression Collaborative Research Program. In J. Wil-
liams & R. Spitzer (Eds.), Psychotherapy research: Where are we and
where should we go? (pp. 150-159). New York: Guilford Press.
Elkin, I., Parloff, M., Hadley, S., & Autry, J. (1985). The NIMH Treatment
of Depression Collaborative Research Program: Background and re-
search plan. Archives of General Psychiatry, 42, 305-316.
Epstein, S. (1979). The stability of behavior: I. On predicting most of
the people most of the time. Journal of Personality and Social Psy-
chology, 37, 1097-1126.
Hollon, S., Mandell, M., Bemis, K., DeRubeis, R., Emerson, M., Evans,
M., & Kriss, M. (1981). Reliability and validity of the Young Cognitive
Therapy Scale. Unpublished manuscript, University of Minnesota.
Kirk, R. (1968). Experimental design: Procedures for the behavioral sci-
ences. Belmont, CA: Brooks/Cole.
Lahey, M., Downey, R., & Saal, F. (1983). Intraclass correlations: There
is more than meets the eye. Psychological Bulletin, 93, 586-595.
Luborsky, L., Woody, G., McLellan, A., O'Brien, C., & Rosenzweig, J.
(1982). Can independent judges recognize different psychotherapies?
An experience with manual-guided therapies. Journal of Consulting
and Clinical Psychology, 50, 49-62.
Schaffer, N. (1983). Methodological issues of measuring the skillfulness of therapeutic techniques. Psychotherapy: Theory, Research and Practice, 20, 480-493.
Shaw, B. F. (1984). Specification of the training and evaluation of cognitive
therapists for outcome studies. In J. Williams & R. Spitzer (Eds.), Psy-
chotherapy research: Where are we and where should we go? (pp. 173-
189). New York: Guilford Press.
Shrout, P., & Fleiss, J. (1979). Intraclass correlations: Uses in assessing
rater reliability. Psychological Bulletin, 86, 420-428.
Spitzer, R., Endicott, J., & Robins, E. (1978). Research diagnostic criteria:
Rationale and reliability. Archives of General Psychiatry, 35, 773-782.
Young, J., & Beck, A. (1980). Cognitive Therapy Scale: Rating manual.
Unpublished manuscript, Center for Cognitive Therapy, Philadelphia,
PA.
Young, J., Shaw, B. F., Beck, A. T., & Budenz, D. (1981). Assessment of
competence in cognitive therapy. Unpublished manuscript, University
of Pennsylvania.
Received December 17, 1984
Revision received August 15, 1985