
Clinical Child and Family Psychology Review, Vol. 4, No. 1, 2001

The Child and Adolescent Functional Assessment Scale (CAFAS): Review and Current Status¹

Michael P. Bates²

Measures of impairment in psychological and behavioral functioning have a long history in the field of children’s mental health, and appear particularly useful in eligibility determination, treatment planning, and outcome evaluation of services for children and adolescents with serious emotional disturbance (SED). One recently developed multidimensional measure of functional impairment, the Child and Adolescent Functional Assessment Scale (CAFAS; K. Hodges, 1989, 1997), has enjoyed widespread use nationwide. It has been adopted as a tool for making treatment eligibility decisions and documenting outcomes on a statewide level in more than 20 states and on a local level in dozens of research and demonstration projects. In this paper, the technical merits of the CAFAS are closely examined, with the conclusion that empirical evidence is lacking to support its valid use in making the types of treatment decisions for which it is currently being employed across the nation. Furthermore, there appears to be little concern among mental health researchers, practitioners, administrators, and state legislators about these apparent limitations of the CAFAS. The potential benefits of establishing objective and valid level-of-need criteria using the CAFAS are numerous, and the interest in doing so is clear; however, the psychometric limitations of the scale identified in this review need to be addressed before its full potential can be realized.

KEY WORDS: functional impairment; measurement.

Measures of impairment in psychological and behavioral functioning have a long history in the field of children’s mental health. These level-of-functioning (LOF) scales have many promising features, such as cost and time effectiveness, clinical utility, and understandability to a wide audience. They also appear particularly well suited for use with children and adolescents with serious emotional disturbance (SED).³

¹ This article was adapted from portions of the author’s doctoral dissertation.

² Counseling, Clinical, School Psychology Program, Graduate School of Education, University of California, Santa Barbara, California, 93106-9490.

³ The term serious emotional disturbance (SED) was replaced with the term emotional disturbance (ED) in the 1997 reauthorization of the Individuals with Disabilities Education Act (IDEA). Many scholars have also used the less stigmatizing term emotional and behavioral disorders (EBD) to refer to this population. SED is used in this paper because this is the term used in both the IDEA and Center for Mental Health Services (CMHS) definitions.

Whereas the earliest measures of functional impairment were hailed for providing simple scores along a single dimension of global functioning, they have also been criticized for containing vague descriptors and being susceptible to rater bias. Recently, multidimensional measures of functional impairment have been developed that, presumably, have greater resistance to rater bias. One of these multidimensional measures, the Child and Adolescent Functional Assessment Scale (CAFAS; Hodges, 1989, 1997), has enjoyed widespread use nationwide. For example, it has been adopted on a statewide level in more than 20 states and on a local level in dozens of research and demonstration projects. In fact, several states are using it as the sole determinant of placement and funding decisions for children’s behavioral health services. Given this widespread adoption of the CAFAS, it is timely and necessary to review this scale within the context of LOF assessment. The purpose of this paper is to describe the current status of the CAFAS in the field of children’s mental health services, to critically examine its technical qualities, and to propose future research activities that may enhance its validity.

1096-4037/01/0300-0063$19.50/0 © 2001 Plenum Publishing Corporation

BACKGROUND

Measures of functional impairment have numerous uses in the diagnosis, treatment, and evaluation of children’s mental health problems. For example, the definitions of serious emotional disturbance (SED) issued by the Center for Mental Health Services (CMHS, 1999) and contained within the Individuals with Disabilities Education Act (IDEA, 1990) both cite functional impairment as a critical component of SED. The CMHS, which has funded more than 40 sites nationwide to develop comprehensive systems of care for youths with SED, recently issued the following definition of children with SED:

. . . persons from birth up to age 18 who currently or at any time during the past year have had a diagnosable mental, behavioral, or emotional disorder of sufficient duration to meet diagnostic criteria specified within DSM-III-R (or the most recent edition of DSM) that resulted in functional impairment which substantially interferes with or limits the child’s role or functioning in family, school, or community activities. (Federal Register, 1993, p. 29425)

Similarly, the IDEA definition of SED requires that certain characteristics exist “over a long period of time and to a marked degree that adversely affects a child’s educational performance” [34 CFR 300.5(b)(8)]. Under both of these definitions, the assessment of functional impairment is a required component.

The construct of global functioning has also become an important component of determining eligibility to receive mental health services. For example, CMHS administers a block grant program to allocate funds to community mental health agencies for the provision of services to youths with SED. As part of the application for this process, states must estimate the incidence (number of new cases) and prevalence (total number of cases per year) of SED, using, in part, LOF measures (Federal Register, 1993). Similarly, many states have adopted a managed care perspective and seek Medicaid reimbursement for mental health services for children and adolescents with mental health problems. Under this system, eligibility for Medicaid-funded services is contingent upon demonstration that the youth exhibits some level of functional impairment (Anderson, Berlant, Mauch, & Maloney, 1996; Srebnik, Uehara, & Smukler, 1998). This requirement represents a significant change from traditional reimbursement models, in which a diagnostic classification, such as a diagnosis from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (American Psychiatric Association, 1994), was sufficient to establish eligibility for any available services (Hodges & Gust, 1995; Pokorny, 1991).

In addition to these uses for LOF scales, much has been written about their utility in outcome assessment (e.g., Lambert, 1994; Newman, 1980; Pokorny, 1991). Historically, LOF measures have been widely used in outcome assessment. For example, Lambert and McRoberts (1993) found that LOF indicators comprised 53% of the therapist-completed outcome measures used in psychotherapy treatment studies published in the Journal of Consulting and Clinical Psychology between 1986 and 1991. One reason that LOF measures appear to be especially useful in outcome evaluation is that they provide a standard means of comparing clients across diagnoses, settings, or both (Burlingame, Lambert, Reisinger, Neff, & Mosier, 1995). Burlingame et al. (1995, p. 228) summarized this point:

Risk assessment establishes the pretreatment degree of severity of the patient to the level playing field when comparing outcomes from different providers, clinics, or patient groups. Outcome assessment procedures used in risk assessment should ensure that one is comparing apples with apples when it comes to initial severity of patients’ disorders. If initial patient severity is not accounted for, then one health care institution may erroneously appear to exhibit poorer outcomes due solely to treating more or less symptomatically severe cases. Reliable risk assessment is even more important in mental health outcomes where improvement is measured in shades of gray in contrast to the black-and-white comparisons often possible in other areas of the health care industry.

This aspect of LOF measures appears particularly useful for studies of treatment effectiveness for youths with SED because this classification is a heterogeneous and complex diagnostic category encompassing a variety of emotional and behavioral problems. This definitional complexity creates a problem for researchers who wish to study this population. Especially under the system of care model, which typically encourages broader service eligibility, two individuals with SED may exhibit vastly different symptoms or behaviors and may require different interventions. Thus, LOF assessment may be the best tool to provide a common metric by which to compare these youths.



Sechrest and colleagues (Sechrest, McKnight, & McKnight, 1996) extended this argument. They strongly recommended that treatment outcome scales be calibrated, not only to provide standardized normative scores, but also to develop a standard measure by which to assess meaningful change. They advocated procedures that associate changes in a scale’s scores with actual change in behavior or functioning, arguing that “actual change in behavior or functioning is critical for assessing treatment outcome, rather than simply inferring change from a metric of uncertain meaning” (p. 1065). A decrease of 10 scale units on a depression scale, for example, might represent some decrease in the intensity or severity of specific symptoms. Yet how much change in behavior actually occurred, and what impact might such change have on meaningful indicators such as functional status or quality of life? According to these authors, change in functioning should be established as the meaningful criterion of the effectiveness of psychotherapeutic intervention. From this perspective, LOF measures should play a key role in calibrating other psychological measures and in documenting “real-life” changes in social, emotional, and behavioral status.
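The 10-unit example raises a narrower statistical question that comes before Sechrest et al.’s question of meaning: does the decrease exceed what measurement error alone could produce? One widely used index for this narrower question is Jacobson and Truax’s (1991) reliable change index (RCI). The RCI is not discussed in this paper and is substituted here only for illustration. The sketch applies it to the hypothetical depression-scale example, assuming a reference standard deviation of 10 and a reliability of .80. A reliable change shows only that the change is unlikely to be noise. It does not show that the change is meaningful in Sechrest et al.’s sense.

```python
import math

def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax reliable change index (RCI).

    sd: standard deviation of the scale in a reference sample (assumed).
    reliability: reliability coefficient of the scale (assumed).
    """
    sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2 * sem ** 2)        # standard error of a difference score
    return (post - pre) / s_diff

# Hypothetical: a 10-point decrease on a scale with SD = 10 and reliability = .80.
rci = reliable_change_index(pre=30, post=20, sd=10, reliability=0.80)
print(round(rci, 2))     # -1.58
print(abs(rci) > 1.96)   # False: not reliable at the conventional .05 level
```

Under these assumptions, a 10-unit drop does not clear the conventional threshold. A change of that size could arise from measurement error alone, even before its real-life meaning is considered.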

The uses of LOF measures for assessment, eligibility determination, and outcome evaluation in children’s mental health services are numerous. Furthermore, there is a clear interest in collecting LOF data at both the state and local levels [Georgetown University National Technical Assistance Center (GUNTAC), 2000]. Perhaps the most widely used LOF scale, at least at the statewide level, is the CAFAS. The following section presents a brief survey of current CAFAS usage at both the state and local levels.

EXTENT OF USE OF CAFAS

Statewide Implementation

Many states have adopted policies or passed legislation mandating the use of the CAFAS on a statewide basis, and this trend appears to have grown over the past few years. For example, in a late-1993 survey of state usage of LOF measures, Hodges and Gust (1995) found that four states (Arizona, New Hampshire, North Carolina, and Wisconsin) were consistently using the CAFAS statewide. As of July 2000, 30 states had implemented or were considering implementing the CAFAS statewide (GUNTAC, 2000; Hodges, Wong, & Latessa, 1998). Table I lists these 30 states and describes how they are using the scale.

The primary uses of the CAFAS, at least at the statewide level, appear to be performance outcome assessment and service eligibility determination. For example, in August 1995, the state of Florida began using the CAFAS as part of its state-legislated mandate to collect performance outcome data for all children receiving mental health services (Massey et al., 1998). In Virginia, the CAFAS was selected as one component of a statewide performance and outcome measurement system (POMS) to assess outcomes within the public mental health system (Koch & Brunk, 1998). Beginning in April 1998, the CAFAS is completed in California for every youth receiving mental health services through every county mental health department (G. M. Pettigrew, personal communication, July 21, 1997). As shown in Table I, numerous other states, including Delaware, Georgia, Maine, North Dakota, South Carolina, and Tennessee, are currently using the CAFAS for performance assessment. Additionally, Illinois and Kentucky are currently considering implementing the scale statewide (J. Call, personal communication, January 28, 2000; GUNTAC, 2000). Ohio recently switched from the CAFAS to the Ohio Youth Scales (Ogles, Melendez, Davis, & Lunnen, 1999) for its performance assessment (M. Wood, personal communication, February 9, 2000).

Many states have implemented systematic collection of the CAFAS to determine eligibility for services. For example, the North Carolina Department of Mental Health is currently using the CAFAS to establish service eligibility for youths with mental health needs (Behar & Stelle, 1997; S. Clark, personal communication, November 19, 1997). The CAFAS is also being used statewide in Virginia to determine levels of care to manage services funded by the recent Comprehensive Services Act (Kirkman et al., 1999). Louisiana and Massachusetts are also using the CAFAS to determine level of need for Medicaid-funded services (Hersch, 1998; Lemoine & McDermott, 1998). Michigan is in the process of developing empirical service eligibility guidelines using the CAFAS and other data (i.e., risk factors, clinical condition) to predict type and intensity of services (Hodges, Warren, & Wotring, 1998). North Dakota uses the CAFAS and diagnostic criteria to assign youths to one of four eligibility categories (J. Perry, personal communication, January 31, 2000). In addition, both Georgia and South



Table I. Summary of Statewide Implementation of the CAFAS

State | Purpose of CAFAS use | Approx. date of implementation | Source(s)

AL | Using CAFAS along with a battery of other measures (CBCL, YSR, and Parent Questionnaire) for outcome evaluation on a statewide basis. | At least since 1999 | Georgetown University National Technical Assistance Center (GUNTAC), 2000

AZ | Cutoff total score of 90 on CAFAS qualifies youth for Intensive Case Management Services funded by the Division of Behavioral Health Services of the Arizona Department of Health Services (considering revising criteria to include diagnostic information). | At least since October 1993 | Hodges & Gust, 1995; Schwartz & Perkins, 1997

CA | Component of state-mandated performance outcome assessment for all youths receiving Department of Mental Health services for 2 months or longer. | April 1, 1998 | G. M. Pettigrew, personal communication, July 21, 1997; GUNTAC, 2000

DE | Clinical service management teams using CAFAS for treatment planning and outcome evaluation with all youths receiving Medicaid or state-funded services. | At least since 1999 | R. Ray, personal communication, January 31, 2000

FL | Component of state-legislated collection of performance outcome data for all children receiving services funded by the Department of Children and Families. | August 1995 | Massey, Kershaw, Armstrong, Shepard, & Wu, 1998

GA | All providers will be mandated to collect CAFAS as component of the Performance Measurement & Evaluation System (PERMES). Will become sole criterion for determining eligibility and level-of-need. | March 1, 2000 | GUNTAC, 2000; S. Lindsey, personal communication, January 28, 2000

IL | Piloting the CAFAS as part of a study on the feasibility of implementing MHSIP Consumer-Oriented Report Card. | At least since 1999 | GUNTAC, 2000

IN | Using Miniscale version (with two added subscales: Environment and Reliance) for performance assessment. | At least since 1997 | J. Phillips, personal communication, January 28, 2000

KY | Currently used in some programs. Recommended for use by KY Managed Care Outcomes Committee. May be integrated with statewide evaluation protocol. | July 1999 | GUNTAC, 2000

LA | Sole criterion to establish level-of-need (LON) to receive one of 3 Medicaid-funded service packages (high, medium, and low). | December 1995 | Lemoine, Speier, Ellzey, & Pine, 1997; Lemoine & McDermott, 1998

ME | In process of implementing CAFAS along with other measures (CALOCUS, BERS) for performance assessment, service planning, and outcome evaluation for youths receiving Mental Health case management services. | At least since 1999 | S. Amero, personal communication, February 1, 2000

MD | Piloting CAFAS via phone interviews with a sample of total youths served as evaluation of first year of managed care reform. | At least since 1998 | GUNTAC, 2000

MA | Cut-off score of 80 using six of eight subscales, in conjunction with diagnosable disorder of 1-year duration, to determine eligibility for services funded by Department of Mental Health. | July 1, 1996 | Irvin & Hersch, 1997; Hersch, 1998

MI | Presently developing guidelines to predict type and intensity of services from CAFAS scores and diagnostic/risk information. | No information given | Hodges et al., 1998

MN | Statewide CAFAS use is encouraged but not mandated as component of measuring client and family outcomes. | At least since 1999 | GUNTAC, 2000


MO | Component of preliminary study to assess outcomes for children and adolescents receiving public mental health services funded by the Department of Mental Health. | October 1, 1995 | Daniels & Clements, 1997

NE | Collected at intake, every 6 months, and at discharge while in Professional Partner Program. | At least since 1999 | GUNTAC, 2000

NH | Using Miniscale Version (see IN) and diagnostic information to determine eligibility for services. Planning to implement full version of the scale beginning July 2000. | At least since October 1993 | GUNTAC, 2000; J. Perry, personal communication, January 31, 2000

NJ | Piloting the CAFAS in Southern Region with the long-term goal to use statewide. | Summer 2000 | GUNTAC, 2000

NY | Administered with other battery instruments at intake and every 6 months in the F.R.I.E.N.D.S. program. | At least since 1999 | GUNTAC, 2000

NC | Primary criterion to authorize levels of care related to six levels of intensity of services for children with mental health and/or substance use problems. | January 1994 (statewide by 1997) | Behar & Stelle, 1997; S. Clark, personal communication, November 19, 1997

ND | Expanding use of CAFAS from 3 to all 8 state regions for outcome assessment and treatment planning. | At least since 1999 | K. Moum, personal communication, January 28, 2000

OH | Component of pilot study during 1998–99. Switched to Ohio Youth Scales in 2000. | 1998 | GUNTAC, 2000

OR | Using the CAFAS statewide along with the CGAS for outcome evaluation. | At least since 1999 | GUNTAC, 2000

SC | Currently mandated for use in treatment planning and outcome evaluation in inpatient and outpatient child and adolescent programs. Also in process of developing criterion scores for eligibility determination. | At least since 1999 | D. Mahrer, personal communication, February 1, 2000

SD | CAFAS is principal instrument used across inpatient and outpatient settings statewide. | At least since 1999 | GUNTAC, 2000

TN | Component of Children’s Plan Outcome Review Team (C-PORT) used in evaluation of service system for all children in state custody. | 1994 | Heflinger & Simpkins, 1997; O’Neal & Wade, 1998

VT | Component of evaluation battery designed by University of VT Evaluation Team to create linkages across multiple state grants. | At least since 2000 | GUNTAC, 2000

VA | Component of performance and outcome measurement system (POMS) being piloted statewide to assess outcomes of child/adolescent public mental health services, and used to determine Level of Care for services funded by the Comprehensive Services Act. | Summer 1997 | Koch & Brunk, 1998; Kirkman, Brunk, & Cohen, 1999

WV | Component of assessment battery required for all children receiving Medicaid-reimbursed behavioral health services. | At least since 1999 | GUNTAC, 2000

Note. Information regarding current usage of evaluation instruments in children’s mental health services for each state can be found at the Georgetown University National Technical Assistance Center (GUNTAC) website (http://www.dml.georgetown.edu/depts/pediatrics/gucdc/eval.html).



Carolina are in the process of developing guidelines for using the CAFAS in eligibility determination (S. Lindsey, personal communication, January 28, 2000; K. Moum, personal communication, January 31, 2000).
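Table I shows that several states turn CAFAS scores into eligibility decisions through simple cutoffs: a total score of 90 in Arizona, and in Massachusetts a score of 80 on six of eight subscales together with a diagnosable disorder of 1-year duration. The sketch below illustrates only the mechanics of such rules, and several of its details are assumptions rather than facts from the sources. It assumes that scores at or above the cutoff qualify and that subscale scores are summed. The subscale names are placeholders, as is the choice of which six subscales enter the Massachusetts-style rule, because the table does not say which ones are used.

```python
# Hypothetical subscale labels; the real CAFAS subscale set and the six
# subscales used by Massachusetts are NOT taken from the source.
SUBSCALES = [f"subscale_{i}" for i in range(1, 9)]

def total_score(scores, subscales=None):
    """Sum the chosen subscale scores (all eight if subscales is None)."""
    keys = SUBSCALES if subscales is None else subscales
    return sum(scores[k] for k in keys)

def meets_total_cutoff(scores, cutoff=90):
    """Arizona-style rule (assumed form): total score at or above the cutoff."""
    return total_score(scores) >= cutoff

def meets_subset_cutoff(scores, subset, has_qualifying_diagnosis, cutoff=80):
    """Massachusetts-style rule (assumed form): score on a subset of
    subscales at or above the cutoff AND a qualifying diagnosis."""
    return has_qualifying_diagnosis and total_score(scores, subset) >= cutoff

youth = dict(zip(SUBSCALES, [30, 20, 20, 10, 0, 0, 10, 0]))  # hypothetical ratings
six = SUBSCALES[:6]  # placeholder choice of six subscales
print(total_score(youth))                      # 90
print(meets_total_cutoff(youth))               # True
print(meets_subset_cutoff(youth, six, True))   # True (30+20+20+10+0+0 = 80)
print(meets_subset_cutoff(youth, six, False))  # False: no qualifying diagnosis
```

Because each rule flips at a single point, a one-step rating difference on one subscale near the cutoff can change a youth’s eligibility. This is why interrater reliability matters so much for these uses.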

Several recent changes in the way state mental health departments conduct business appear to have contributed to this rise in CAFAS usage. First, the inclusion of the functional impairment stipulation in the CMHS definition of SED now requires states to operationally define and measure functional impairment to receive federal block grant funding for treatment of youths with SED. Second, with many states adopting a managed care model of service delivery, third-party payers such as Medicaid are requiring documentation of functional impairment to justify treatment decisions (Anderson et al., 1996; Srebnik et al., 1998). Third, growing demand within psychology and mental health for empirically justified treatment methods has created the need to collect objective outcome data using instruments such as the CAFAS (Kazdin & Weisz, 1998; Task Force, 1995).

Demonstration Projects

In addition to statewide implementation, the CAFAS is widely used as an outcome measure on a smaller scale in local mental health settings and evaluation projects across the country. The CAFAS was developed as one of the outcome measures for the Fort Bragg Evaluation Project (FBEP; Bickman, 1996a, 1996b). This project has recently received much public scrutiny, primarily because the evaluators reported no significant differences in outcomes between the experimental and control groups. In a single issue (May 1997), the American Psychologist devoted eight commentary articles to Bickman’s findings. The interest was due primarily to Bickman’s conclusion that the $80 million experimental treatment, a system of care for youths with SED, produced no better outcomes than the traditional mental health system control. Although many have questioned Bickman’s conclusions (e.g., Friedman & Burns, 1996; Pires, 1997), it is clear that the CAFAS was an essential component of Bickman’s arguments.

The CAFAS is also being used in other system of care projects. The Center for Mental Health Services, for example, has funded more than 40 sites nationwide to develop, implement, and evaluate systems of care for youths with SED. The CAFAS was selected as one of the mandatory outcome measures that evaluators at each site must collect. Similarly, the CAFAS is mandated for use in all county system of care projects funded by California State Assembly Bill 3015 (18 counties; A. Rosenblatt, Wyman, Kingdon, & Ichinose, 1997). Table II lists these and some of the additional research projects that have used or are currently using the CAFAS as an outcome measure.

EVALUATION OF LOF MEASURES

Given that global functioning plays an important role in the provision and evaluation of mental health services, and that the CAFAS in particular has been adopted on such a widespread scale, it is prudent to evaluate LOF measures for their technical and practical adequacy in serving these purposes. Several authors have offered criteria for selecting appropriate measures to assess treatment outcomes in studies of service delivery in mental health settings (Green & Newman, 1996; Newman & Ciarlo, 1994; Newman, Hunter, & Irving, 1987; Vermillion & Pfeiffer, 1993). Although there are differences between these sets of criteria, they seem to converge on the following four broad features of desirable outcome measures: (a) strong psychometric properties, (b) validity for use with target populations, (c) ease of use, and (d) utility. In the following sections, these guidelines are applied as a framework to discuss the evaluation of LOF measures in general and the CAFAS in detail.

Studies of the reliability and validity of LOF measures have generally produced mixed results: although most suggest that their psychometric qualities are moderate to good (Bird & Gould, 1995; Hodges & Gust, 1995), others have characterized them as unacceptable (Zimmerman, 1996). Perhaps where LOF measures excel is in their ease of use. Most LOF measures employ a simple methodology, have minimal cost, take little time to complete, and can usually be completed by nonprofessionals, though perhaps with questionable validity (B. Green, Shirk, Hanze, & Wanstrath, 1994; Hodges & Gust, 1995). For most LOF scales, training materials are not available, yet because of their simple methodology they may not be needed. LOF scales also have high utility. They usually generate a single score that is easily applied to clinical treatment and outcome assessment, and they provide a common metric by which to compare clients with different diagnostic features.

Unidimensional scales assessing global functioning have a long history of use in diagnosis, treatment,



Table II. Summary of CAFAS Use in Research and Demonstration Projects

Project title | Description of project & CAFAS use | Source

Fort Bragg (NC) Evaluation Project (FBEP) | Demonstration project comparing youths who received continuum of care mental health services with those who received CHAMPUS-funded services. CAFAS was one of many outcome measures. | Bickman, 1996a, 1996b

California Assembly Bill 3015-Funded County Sites | Eight counties in CA funded to develop, implement, and evaluate systems of care for youths with SED. CAFAS is used as one component of evaluation. | Rosenblatt et al., 1997

CMHS-Funded Sites | More than 40 nationwide sites funded to develop, implement, and evaluate systems of care for youths with SED. CAFAS is mandated component of outcome evaluation. | “Comprehensive community mental health services for children program,” 1999

Mental Health Services Program for Youth (MHSPY) Replication Project (Indianapolis, IN) | Using CAFAS scores to assess client outcomes and track service accountability of this system of care for youths with SED. | Rotto, Sokol, Matthews, & Russell, 1998

Annie E. Casey Foundation’s Mental Health Initiative for Urban Children | Using CAFAS scores to assess client outcomes, service fidelity, and clinical impact of community-based services for children at risk of out-of-home placement in three Boston neighborhoods. | Gutierrez-Mayka, 1998

Wraparound Milwaukee (WI) | Using CAFAS scores to evaluate a pilot study of the effectiveness of “wraparound” services for youths with SED. | Kamradt, Kostan, & Pina, 1998

MENTOR (Boston, MA) | Using CAFAS scores to assess outcomes for youths served by this national provider of community-based child/adolescent mental health programs. | Altaffer & Stelk, 1998

Cleo Wallace Center (Westminster, CO) | Using CAFAS scores at intake and discharge to establish need for service and monitor treatment outcomes in this residential psychiatric facility. | Jacobson & Meyer, 1997

Youth Alliance of Central Georgia | Using CAFAS scores at intake, at 3-month intervals, and at discharge to assess progress of youths served in a variety of mental health treatment facilities. | Feibelman, 1998

School and Community Study (KY & VT) | Using CAFAS to evaluate study of four model school-based programs for inclusion of children with SED in communities with a system of care. | Oliveira, Rivera, Kutash, Duchnowski, & Calvanese, 1998

Illinois State Board of Education Sites | Using CAFAS to evaluate study of community-based supports and services for children with emotional and behavioral disabilities and their families. | Eber & Rolf, 1998

Prime Time Project (King County, WA) | Using CAFAS scores to describe and monitor adolescents enrolled in community-based intervention for youths with SED and involvement in the juvenile justice system. | Selby, Trupin, McCauley, & Vander Stoep, 1998

and evaluation of mental health problems. The first generation of global level of functioning scales was the Health-Sickness Rating Scale (HSRS) developed by Luborsky (1962). This was a 100-point scale with eight descriptor anchor points. Although the HSRS was easy to use, it was criticized because its anchor points included both behavioral descriptions and diagnostic categories and were unevenly distributed within its total range (Friis, 1996). Developed as an improvement over the HSRS, the Global Assessment Scale (GAS; Endicott, Spitzer, Fleiss, & Cohen, 1976) was also a 100-point scale, marked by the inclusion of 10 evenly distributed anchor points, which contained no diagnostic categories (Friis, 1996). Both of these scales were designed for use with adults.

The most widely accepted and utilized unidimensional LOF scale for youths, the Children’s Global Assessment Scale (CGAS; Shaffer et al., 1983), was developed as an adaptation of the GAS for use with a younger population. Similar to the GAS, it contained 10 anchor points evenly distributed between 0 and 100. Much of the wording of the descriptors was significantly altered, however, for use with children and adolescents. The Global Assessment of Functioning (GAF) scale was first introduced in 1987 as Axis V of the multiaxial diagnostic system of the DSM-III-R (American Psychiatric Association, 1987). This scale was conceptually very similar to the GAS and the CGAS, although with a range of only 0–90. Whereas the descriptors of the CGAS were written exclusively for use with children, the descriptors of the GAF were more general and designed for use with both adults and children. With the publication of the DSM-IV (1994), the total range of the GAF was extended to 0–100 by adding definitions for the 91–100 range of functioning. Both the GAF and CGAS anchor points contain a mix of behavioral descriptions and symptoms.

There have been few published studies of the reliability or validity of unidimensional global rating scales (Friis, 1996), and most of those that exist appear to have been conducted using the CGAS. Test-retest reliability coefficients for the CGAS have generally been favorable (.74–.76, Bird, Canino, Rubio-Stipec, & Ribera, 1987; Canino et al., 1987; .69–.95, Shaffer et al., 1983). The evidence for interrater reliability is more mixed. Generally, interrater reliability has been adequate in studies using professional raters when information is gathered through case histories or in-person interviews. In the original study of the CGAS, for example, Shaffer et al. (1983) reported a high coefficient of interrater reliability (.84); raters in this study were five second-year psychiatry fellows responding to case vignettes. In a second study, in which the GAF (DSM-III version) was also completed, two child psychiatrists conducted in-depth diagnostic interviews with both parents of 191 children, and two additional child psychiatrists completed ratings from videotapes of these interviews. The interrater reliability coefficients for overall severity were .72 for the CGAS, and .74 (current) and .73 (past 6 months) for the GAF (Bird et al., 1987).

Whereas these initial findings demonstrated moderate support for the interrater reliability of the CGAS and GAF, Green et al. (1994) argued that none of these studies used raters who were actually involved in the treatment of the child. Addressing this issue, these authors had raters complete the CGAS on 95 child hospital inpatients at admission and discharge, and found somewhat lower interrater reliability for attending psychiatrist raters (.62) and comparable reliability for milieu staff raters (.76). Unsatisfactory reliability coefficients were also reported in an earlier applied field study conducted by Herman (1983, as cited in Hodges & Gust, 1995). A more recent study (Rey, Starling, Wever, Dossetor, & Plapp, 1995) documented low coefficients of interrater reliability among 20 experienced clinicians using the GAF (DSM-III-R version; .54 for outpatients, .66 for inpatients) and the CGAS (.63 for outpatients, .53 for inpatients) to rate children in inpatient and outpatient treatment settings. Thus, the interrater reliabilities of the CGAS and the GAF appear to be adequate only under certain conditions.

Studies of the validity of the CGAS have also yielded mixed results. On the one hand, Bird et al. (1987) found moderate correlations (absolute values .40–.65) between CGAS ratings and scores on the Child Behavior Checklist (CBCL; Achenbach, 1991). Using a CGAS criterion score of 70 to form impaired and nonimpaired groups, these authors also found significant group differences in CBCL total problem scores, clinical status (case–noncase), referral status (referred–nonreferred), and number of clinical diagnoses. On the other hand, Green et al. (1994) found no significant correlations between CGAS and CBCL scores, although CGAS scores did correlate significantly with indices of children's competence. Thus, these results provide some evidence, but not compelling support, for the valid use of the CGAS in making clinical treatment decisions.

Hodges and Gust (1995, p. 407) concluded that the CGAS “has satisfactory reliability and validity when used by professionals and when used in a situation in which there is minimal information variance (i.e., information on which the score is based is consistent across all raters).” These authors and others (Green et al., 1994; Rey et al., 1995) emphasized that more research is needed to assess the adequacy of the CGAS and similar measures in less-controlled applied settings. Hodges and Gust (1995) suggested that the CGAS and other unidimensional global functioning measures are particularly vulnerable to respondent bias when little information about the child is available. Although the threat of bias is present to some extent in all rating scales, one goal of scale development, use, and evaluation should be to minimize the degree to which respondent bias contributes to the resulting score (Hodges & Gust, 1995). Thus, to obtain an estimate of level of functioning that is less prone to respondent bias, these authors advocated multidimensional scales that attempt to measure global functioning across a variety of domains.

Although several multidimensional functioning scales have appeared in the literature, descriptive or psychometric information (or both) about them is virtually nonexistent. The Colorado Client Assessment Record (CCAR; Ellis, Wilson, & Foster, 1984) appears to have been the first multidimensional checklist of client functioning. The CCAR consists of 77 checklist items in the following nine domains: socio-legal, substance use, medical/physical illness/injury, thinking, personal distress, personal behavior, interpersonal behavior, interpersonal relations, role performance (employment, academic training, and management of personal affairs), and meeting basic needs. Item and factor analyses were conducted to arrive at these domain groupings. Although the developers claim that the CCAR has an extensive research background, much of that research is unpublished. Unfortunately, no reliability data on the CCAR are available, and the only validity evidence reported is that scores from a preliminary version of the scale discriminated hospital from clinic clients both at admission and at discharge (Ellis et al., 1984).

The North Carolina Functional Assessment Scale (NCFAS) is another multidimensional functioning scale, adapted from the CCAR. Again, very little information about the NCFAS is available in the literature; to date, only one study using the NCFAS has been published (Walker, Minor-Schork, Bloch, & Esinhart, 1996). According to these authors, the NCFAS is a clinician-administered rating scale designed for use with adults. Level of functioning is rated along six dimensions: role performance, emotional health, ability to care for basic needs, behavior, thinking, and substance use. These dimension scores are combined to yield a global score ranging from 0 to 180, with a score of 40 or above indicating significant functional disability (Walker et al., 1996). No reliability or other validity data have been published.

In sum, studies of the reliability and validity of unidimensional LOF measures have yielded mixed results. In particular, these scales appear to be highly vulnerable to rater bias when used under conditions of high information variance. Although multidimensional LOF measures appear on the surface to resolve this problem, reliability and validity evidence to support this claim is essentially nonexistent. To date, the CAFAS is the only multidimensional LOF measure for which published reliability and validity studies are available.

DESCRIPTION OF THE CAFAS

Both the CCAR and NCFAS were designed for use with adults. To fill the need for a multidimensional global assessment scale for children and adolescents, Hodges (1989, 1997) created the Child and Adolescent Functional Assessment Scale (CAFAS). Adapted from the NCFAS (Bickman, Heflinger, Pion, & Behar, 1992), the CAFAS initially contained five domains (Role Performance, Moods/Emotions, Behavior Toward Others/Self, Thinking, and Substance Use), with possible total scores ranging from 0 to 150. In later versions, the Role Performance scale was divided into School/Work, Home, and Community subscales. The instrument was originally designed as an outcome measure in the Fort Bragg Evaluation Project (FBEP; Bickman et al., 1992) for use with children and adolescents with severe emotional and behavioral disorders (Hodges & Wong, 1996).

Overview

The CAFAS (Hodges, 1989, 1997) is a rating scale designed to measure functional impairment across multiple domains in children and adolescents and their caregivers. Impairment is operationalized as the degree to which the youth's problems interfere with his or her functioning in various life roles (e.g., student, family member, worker, friend, citizen). To complete the scale, a rater reviews a list of 165 behavioral descriptions and selects those statements that describe the child's most severe level of impairment during a given time period (usually the past 1–3 months). The behaviors fall into the following five domains:

1. Role Performance – the youth's effectiveness in fulfilling societal roles, including School/Work, Home, and Community subscales;

2. Behavior Toward Others/Self – appropriateness of the youth's daily behavior;

3. Moods/Self-Harm – modulation of the youth's emotional life and the extent to which the youth demonstrates self-harmful behavior, including Moods/Emotions and Self-Harmful Behavior subscales;

4. Thinking – ability of the youth to use rational thought processes; and

5. Substance Use – the youth's substance use and the extent to which it is inappropriate and disruptive.

A second portion of the scale allows the rater to assess functional impairment in the caregiver. Because these caregiver scales are supplementary (Hodges, 1997) and the first five scales are often used exclusively, this paper focuses only on the first portion of the instrument. Hodges and Wong (1996) suggested that the CAFAS may be useful in (a) linking level of care to level of need, (b) evaluating and planning programs, (c) conducting client-oriented cost outcome studies, and (d) providing consumer “report cards.”

Scale Development

There is currently no information in the published literature explaining how the CAFAS was developed. Both Hodges (1997) and Bickman et al. (1992) stated that the CAFAS was adapted from the NCFAS as part of the FBEP; in fact, 67% of the items on the original version of the CAFAS were duplicate or modified NCFAS items. One source, the Clinical Training Manual of the Children and Youth Performance Outcome Program implemented by the California Department of Mental Health (1997, p. 61), provides some information about the origins of the CAFAS. According to this document, the author of the CAFAS

. . . made extensive modifications to the items and scales of the NCFAS to render them more appropriate for children, and subsequently sought input from 40 experts on three separate occasions after each revision of the developing instrument. Colleagues were selected who could provide input from a variety of perspectives, including child psychopathology, normal development, and the special needs of Hispanic and Afro-American children. Suggestions were also obtained from spokespersons for parent advocate groups.

No further information is available about the specific methods used in the item selection and revision process, or about how the input and suggestions were obtained and used. From the available literature, it cannot be determined whether the CAFAS items and subscales were derived primarily by empirical or by rational methods.

According to the manual (Hodges, 1997), the CAFAS is not based on a particular theory of child psychopathology. Thus, ratings are not intended to reflect any underlying etiology or dynamics of the youth's problems, but to profile the degree of disruption in the youth's current functioning. It may be argued, however, that scale development (selection of items, determination of content areas, etc.) must be influenced by some theoretical assumptions about child development and functioning, whether or not the developer is fully aware of them (Reckase, 1996). By not addressing these assumptions explicitly, the scale developer leaves it to the user to infer them from the scale's construction and scoring scheme.

Scoring

According to the author, each item on the CAFAS is presented in specific behavioral terms and assigned a functional impairment score as follows:

1. “30”: Severe (severe disruption or incapacitation);

2. “20”: Moderate (persistent disruption or major occasional disruption of functioning);

3. “10”: Mild (significant problems or distress); and

4. “0”: Minimal or No Impairment (no disruption of functioning).

Each subscale offers multiple items at each severity level. To generate a score for a scale or subscale, the highest endorsed level of severity is recorded, even if multiple items at that level are endorsed. For example, a rater would assign a score of 20 (Moderate impairment) to a subscale whether one or three Moderate items were endorsed (assuming that no Severe items were endorsed). Under the original scoring scheme for the Role Performance and Moods/Emotions scales, the highest subscale score is recorded as the overall scale score. Thus, a child rated 20 on School/Work, 10 on Home, and 30 on Community would receive a Role Performance score of 30 (the highest of the subscale scores). It is important to emphasize that, according to the scoring directions in the manual, the rater evaluates only the items within the most severe level endorsed on a given subscale. Thus, if one or more Severe items are endorsed, the rater skips the Moderate, Mild, and Minimal items and proceeds to the next subscale.
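The maximum-scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the CAFAS manual: the function names are hypothetical, each endorsed item is represented only by its severity value, and a subscale with no endorsements is assumed to score 0.

```python
# Severity values used by the CAFAS, from most to least severe:
# Severe, Moderate, Mild, Minimal or No Impairment.
SEVERITY_LEVELS = (30, 20, 10, 0)


def subscale_score(endorsed_levels):
    """Return the highest severity level among the endorsed items.

    `endorsed_levels` lists the severity value (30, 20, 10 or 0) of every
    item a rater endorsed on one subscale. Endorsing several items at the
    same level does not raise the score.
    """
    for level in SEVERITY_LEVELS:  # check the most severe level first
        if level in endorsed_levels:
            return level  # less severe levels are never examined
    return 0  # assumption: a subscale with no endorsements scores 0


def scale_score(subscale_scores):
    """Original scheme for a multi-subscale scale (e.g., Role Performance):
    the scale score is the highest of its subscale scores."""
    return max(subscale_scores)


# The Role Performance example from the text.
school_work = subscale_score([20, 20, 20])  # three Moderate items still score 20
home = subscale_score([10])
community = subscale_score([30])
role_performance = scale_score([school_work, home, community])
print(school_work, home, community, role_performance)  # 20 10 30 30
```

Note that the early `return` mirrors the manual's instruction to skip less severe items once a more severe one is endorsed.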

For each of the five CAFAS scales, possible scores range from 0 to 30 (in increments of 10). A total score is generated by summing the five scale scores and can range from 0 to 150. It should be noted that, although this scoring system is suggested in the original manual (Hodges, 1989), the revised version (Hodges, 1997) expands the scoring range to 0–240 by retaining each of the three School/Work, Home, and Community Role Performance scores and both the Moods/Emotions and Moods/Self-Harm scores (see Table III). A concerning trend is that multiple scoring schemes have been employed in published CAFAS studies. For example, Furlong, Casas, and colleagues (Robertson et al., 1998; J. Rosenblatt et al., 1998; Wood et al., 1998) employed the five-scale scoring method following the original manual's guidelines. J. Rosenblatt and A. Rosenblatt (1999) employed the eight-scale scoring method suggested in the revised manual. Lemoine and McDermott (1998) used the five-scale scoring scheme and included the two caregiver scales in the total score, yielding a possible range of 0–210. Hersch (1998) eliminated the Community Role Performance and Substance Use subscales from the eight-scale scoring scheme, yielding a possible total score range of 0–180. The state of Indiana uses a miniscale version of the CAFAS that includes two additional subscales, Environment and Reliance (J. Phillips, personal communication, January 28, 2000). Given such nonconformity in scoring the instrument, it is imperative to specify clearly how total scores were calculated when comparing CAFAS scores across studies or programs.

Table III. Relationship Between Scoring Systems for the 5-Scale and 8-Scale Versions of the CAFAS

8-scale version → 5-scale version
School/Work, Community, Home → Role Performance (maximum of the three scores)
Behavior Toward Others → Behavior Toward Others
Moods/Emotions, Moods/Self-Harm → Moods/Emotions (maximum of the two scores)
Thinking → Thinking
Substance Use → Substance Use
Total score (range 0–240) → Total score (range 0–150)

Note. Each CAFAS scale score ranges from 0 to 30 in increments of 10, so the total score for the 5-scale version has only 16 possible values (25 possible values for the 8-scale version).
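Because the 5-scale and 8-scale totals are computed differently, the same ratings yield different totals, which is one reason scores cannot be compared across studies without knowing the scheme. The sketch below applies both schemes from Table III to one profile and enumerates the reachable totals to check the counts in the table note. The ratings and function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical ratings for one youth on the eight scales of the revised
# manual (names follow Table III; the values are invented for illustration).
eight_scale = {
    "School/Work": 20,
    "Home": 10,
    "Community": 30,
    "Behavior Toward Others": 20,
    "Moods/Emotions": 10,
    "Moods/Self-Harm": 20,
    "Thinking": 0,
    "Substance Use": 10,
}


def total_8_scale(s):
    """Revised-manual total: the sum of all eight scores (range 0-240)."""
    return sum(s.values())


def total_5_scale(s):
    """Original-manual total: collapse Role Performance and Moods/Emotions
    to their maximum subscale score, then sum five scales (range 0-150)."""
    role = max(s["School/Work"], s["Home"], s["Community"])
    moods = max(s["Moods/Emotions"], s["Moods/Self-Harm"])
    return (role + s["Behavior Toward Others"] + moods
            + s["Thinking"] + s["Substance Use"])


def possible_totals(n_scales):
    """Distinct totals reachable when every scale is scored 0, 10, 20 or 30."""
    return {sum(combo) for combo in product((0, 10, 20, 30), repeat=n_scales)}


print(total_5_scale(eight_scale), total_8_scale(eight_scale))  # 80 120
print(len(possible_totals(5)), len(possible_totals(8)))  # 16 25
```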

Target Population

The CAFAS is intended for use with “children and adolescents who have or may have emotional, behavioral, substance use, psychiatric, or psychological problems” (Hodges, 1997, p. 1-1). This includes youths who are referred for these problems or who are at risk of developing them. The author suggests that the CAFAS is particularly useful in assessing outcomes for youths with SED. It is intended for children aged 6–17 years, although a version for younger children aged 4–7 years is also available: the Preschool and Early Childhood Functional Assessment Scale (PECFAS; Hodges, 1997). Hodges and Wong (1996) found no significant differences in CAFAS scores across gender and racial/ethnic groups, suggesting that it may be a useful component of culturally competent assessment. A Spanish-language version is also available.

Raters and Training

The CAFAS was designed to be completed by clinicians or other trained administrators who are working with the youth and family. It is also preferred that raters have graduate training in a mental health field and “be knowledgeable about the spectrum of behavioral and emotional problems which [sic] children may experience” (Hodges, 1997, p. 6-2). Nonclinicians may complete the scale, but it is suggested that they receive full training and use the optional structured interview to collect information about the youth. One particularly strong feature of the CAFAS is the availability of a well-developed training manual with numerous training vignettes. Clinician raters may use various sources of information to complete the CAFAS, including interviews with the child and family, interviews with other professionals familiar with the child's behavior, and record reviews.

EVALUATION OF THE CAFAS

The following section evaluates the CAFAS against the previously outlined criteria for assessing treatment outcome measures. Evidence from the manual (Hodges, 1997) and relevant articles is examined under the categories of psychometric properties, validity for use with the target population, ease of use, and utility.

Psychometric Properties

The following psychometric properties are addressed: (a) internal consistency reliability, (b) interrater reliability, (c) stability of scores, (d) content and structural validity, (e) concurrent validity, (f) criterion-related validity, and (g) predictive validity.

Internal Consistency Reliability

Little information about the internal consistency reliability of the CAFAS and its scales is available in the manual, and none appears in published articles. In the manual, Hodges (1997) stated that internal consistency coefficients (Cronbach's alpha) ranged from 0.63 to 0.68 for the different waves of the FBEP (Breda, 1996), and cited her own psychometric paper (Hodges & Wong, 1996) as the source of these data. Unfortunately, these data do not appear in that paper, so the context in which they were generated is unclear. In the manual, Hodges (1997) stated that these internal consistency values (0.63 to 0.68) “reflect on the homogeneity of the scales of the CAFAS” and “are supportive of the reliability of the CAFAS” (p. 2-1). The author also stated that this reliability evidence is “especially true [sic] given that the separate scales are intended to assess different domains of impairment” (p. 2-1), and that the reliability of the entire scale would decrease with the omission of any of the individual scales.

These arguments can be critically examined on several points. First, coefficient alpha values of 0.63–0.68 are generally considered relatively low and do not provide compelling evidence of internal consistency (Clark & Watson, 1995; Schmitt, 1996). Rather, lower values of coefficient alpha suggest variability in item content, which may still be congruent with the goals of a particular scale, depending on the heterogeneity of the construct under study (Clark & Watson, 1995; Schmitt, 1996). Second, the procedures for completing the scale require selecting items only in the most impaired category on each subscale. Thus, on a given subscale, several items within the same impairment category may be endorsed, but two items of different severities can never both be endorsed. This rules out any positive association between items in different impairment categories: because they are never endorsed together, their correlation is necessarily zero or negative. As a result, estimates of coefficient alpha will be greatly attenuated. It can therefore be concluded that the internal consistency reliability of the CAFAS has not been established. Because internal consistency appears to be an inappropriate criterion for evaluating LOF measures of this kind, however, this does not appear to be a critical weakness of the scale.
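The attenuation argument can be made concrete with the standard formula for coefficient alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below computes alpha for a small, invented endorsement matrix (not CAFAS data) in which, as the scoring rule requires, no respondent endorses items from two severity levels. The toy data are deliberately extreme: alpha here even turns negative, which shows the direction of the effect rather than its typical size.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used throughout; the ratio is unchanged if
    sample variances are used consistently instead.
    """
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical endorsements (1 = endorsed) on one subscale with two Severe
# items (columns 1-2) and two Moderate items (columns 3-4). Because only the
# most severe level is rated, no row endorses items from both levels.
endorsements = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
print(round(cronbach_alpha(endorsements), 2))  # -3.33
```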

Interrater Reliability

In contrast to internal consistency reliability, evidence for the interrater reliability of the CAFAS has been well documented (Hodges, 1997; Hodges & Wong, 1996). Using 20 training vignettes and four discrete samples (N = 54) of undergraduate students, graduate students, and child service agency staff members, Hodges and Wong (1996) assessed interrater reliability in two ways. First, they calculated Pearson product-moment correlations between each rater's scores and a criterion score for each vignette. Criterion scores were generated by consensus of the primary author and a board-certified child psychiatrist. The Pearson coefficients were then transformed to z-scores and averaged across raters. Second, they calculated intraclass correlations (ICCs) based on analysis of variance procedures to estimate the raters' agreement with each other. Aggregated Pearson coefficients for the four samples ranged from .74 to .99; ICCs ranged from .63 to .96.
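The two indices described above can be sketched as follows. Two assumptions are made that the text does not confirm: that the z transformation is Fisher's r-to-z (atanh), and that the ICC is the Shrout and Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single rater), since the specific ICC form is not reported. The example ratings are hypothetical and show why the two indices can diverge: a rater who is consistently 10 points higher correlates perfectly with a colleague but does not agree with that colleague in absolute terms.

```python
import math


def mean_correlation(rs):
    """Average Pearson correlations via Fisher's r-to-z transformation:
    transform each r with atanh, average the z values, back-transform."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


def icc_2_1(ratings):
    """ICC(2,1) (Shrout & Fleiss): two-way random effects, absolute
    agreement, single rater, from a targets-by-raters matrix."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical vignette scores from two raters; rater 2 is always 10 higher.
ratings = [[10, 20], [20, 30], [30, 40]]
print(round(icc_2_1(ratings), 3))  # 0.667
print(round(mean_correlation([0.74, 0.99]), 3))
```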

Most of these values indicate good interrater reliability. However, this method of reliability estimation is suspect because the reliability coefficients were generated from subscale ratings, not individual items. Thus, it provides no information about the degree of agreement between raters on actual behaviors, only on the severity of groups of behaviors. Two raters could disagree about which behaviors a given child exhibits yet appear perfectly reliable if those behaviors were assigned to equivalent severity categories. Furthermore, rater A could endorse five Severe items on a subscale whereas rater B endorses only one, yet by these methods they would demonstrate perfect agreement because of the maximum scoring criterion. It would be of interest to examine the interrater reliability of the CAFAS using individual items as the unit of analysis. It would also be of interest to replicate this reliability study with a larger sample of raters, with raters involved in child treatment, or with “real life” ratings rather than vignettes.
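The masking effect of the maximum scoring criterion can be shown by comparing subscale-level agreement with an item-level index. The sketch below uses the Jaccard index (shared endorsed items divided by items endorsed by either rater) as one plausible item-level measure; this choice, the item IDs, and the function names are illustrative assumptions, not a procedure from the CAFAS literature.

```python
def subscale_score(endorsed):
    """Highest severity (30/20/10/0) among endorsed items, following the
    CAFAS maximum-scoring rule; 0 if nothing is endorsed."""
    return max((severity for _, severity in endorsed), default=0)


def item_agreement(a, b):
    """Jaccard index of two raters' endorsed item sets:
    items endorsed by both / items endorsed by either."""
    a_ids = {item for item, _ in a}
    b_ids = {item for item, _ in b}
    union = a_ids | b_ids
    return len(a_ids & b_ids) / len(union) if union else 1.0


# Hypothetical (item ID, severity) pairs; these are not actual CAFAS items.
rater_a = [("S1", 30), ("S2", 30), ("S3", 30), ("S4", 30), ("S5", 30)]
rater_b = [("S5", 30)]
print(subscale_score(rater_a), subscale_score(rater_b))  # 30 30
print(item_agreement(rater_a, rater_b))  # 0.2
```

Identical subscale scores here coexist with only 20% overlap in the behaviors the two raters reported.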

According to the authors, values for the Thinking subscale were not reported “due to low frequency of formal thought problems or organicity in the vignettes, which were designed to be representative of typical clinical presentations” (Hodges & Wong, 1996, p. 499). This argument raises the following question: why was this subscale included on the CAFAS if it does not contribute to the assessment of “typical” clinical presentations? Is this subscale less important than the others? Even if it were, it would still be necessary to explore its interrater reliability to establish reliability estimates for the entire CAFAS. Thus, it would be desirable to include formal thought problems or organicity in future interrater reliability training vignettes.

Ogles, Davis, and Lunnen (1999) tested the interrater reliability of the CAFAS under two methods of presenting case data: (a) manual vignettes and (b) archival data from actual cases. As would be expected, the interrater correlations for CAFAS total scores obtained with manual vignettes by three groups of raters (undergraduate students, .88; graduate students, .89; and case managers, .94) dropped considerably when actual case data were used (.66, .75, and .55, respectively). A major drawback of this study, however, was its small sample size: there were only four raters in each of the three groups.

Given these questions about the interrater reliability studies of the CAFAS, it cannot be concluded that raters tend to agree with each other on specific items. It may reasonably be concluded, however, that they tend to agree fairly well on the severity of behaviors, at least on four of the five scales, in response to vignettes. Because the severity level (as opposed to the item level) is the suggested level of analysis for the CAFAS, this evidence provides at least moderate support for the interrater reliability of the CAFAS under these conditions. Closer inspection of the CAFAS training vignettes, however, reveals that the case information often contains wording identical to that of individual items on the scale, creating a rating situation that is likely much less complex than actual usage conditions. Furthermore, as previously discussed, estimates of interrater reliability are maximized when information variance is low. Thus, it would still be necessary to demonstrate that the interrater reliability of the CAFAS holds up under actual usage conditions, that is, when client information is varied or limited.

Stability of Scores

Only one study (Hodges, 1995) has examined the test-retest reliability of the CAFAS. In this study, CAFAS ratings were gathered by two different raters at a 1-week interval via telephone interviews with the mothers of 56 youths. Interviews were conducted by trained graduate students. The Pearson product-moment correlation coefficients between the two sets of scores were as follows: Total Score = 0.95; Role Performance = 0.84; Behavior Toward Self and Others = 0.82; Moods/Emotions = 0.91; and Thinking = 0.89. No explanation was provided for the absence of a correlation for the Substance Use scale. Follow-up t-tests indicated no significant differences between Time 1 and Time 2 ratings for any of the scale scores or the total score. In general, these findings provide fairly strong evidence that CAFAS scores are stable over a 1-week period when the interview protocol is used. Again, it would be informative to explore the stability of scores generated by clinician raters under actual usage conditions.

Content and Structural Validity

To date, there is no information in either the published literature or the manual concerning the content validity of the CAFAS items. As previously discussed, only one secondary source (California Department of Mental Health, 1997) has addressed the details of item selection. Given this limited information about the development of the scale, several problems emerge. First, it is unclear how items were selected for inclusion in the scale, what the underlying factor structure of the instrument is, and whether individual items represent the constructs to which they were assigned. As most scale development scholars agree (cf. Reckase, 1996), clear theoretical and empirical reasoning is needed to make meaningful decisions about the inclusion of items and the creation of subscales. To demonstrate this reasoning, one might, for example, summarize ratings from expert judges (theoretical) or conduct factor analyses (empirical). Given that this reasoning is lacking, the content validity of the CAFAS must be regarded as suspect. Second, because the construct of global functioning and its subscale domains are not operationalized, it is unclear whether the items provide sufficient or excessive coverage. It is also unclear whether the CAFAS items represent the most theoretically or technically sound items from a larger pool, or whether they were subjected to any form of item analysis.

Evidence supporting the structural and scaling validity of the CAFAS is also unavailable. In other words, there is no evidence that items in the “Severe” (“Moderate,” etc.) category actually reflect severe (moderate, etc.) functional impairment. On the School/Work Role Performance subscale, for example, the following items are scored 30 for “severe impairment”:

#004 – “Harmed or made serious threat to hurt a teacher/peer/co-worker/supervisor. . .”

#006 – “Chronic truancy resulting in negative consequences (e.g., detention, loss of course credit, failing courses or tests, parents notified. . .)”

#008 – “Disruptive behavior, related to poor attention or high activity level, persists despite the youth having been placed in a special learning environment or receiving a specialized program or treatment. . .”

It may be the case that these items reflect a similar level of functional impairment. Conversely, they may be associated with different levels of impairment. One could argue, for example, that a student who harmed or threatened a teacher is more functionally impaired than a student who received a detention for repeated truancy. What is clearly needed is empirical evidence demonstrating that the CAFAS items reflect a unidimensional continuum of severity and are appropriately scaled. Such evidence would greatly strengthen the case for the construct validity of this instrument.

One potential problem with the scaling of the CAFAS is that the items in the “Minimal or No Impairment” severity level do not contribute to the total score: a child's subscale and total scores are unaffected by whether these items are endorsed. It appears, then, that these items are purely descriptive. On initial inspection, some of them appear to reflect the absence of impairment (e.g., #030 – “Functions satisfactorily even with distractions”), whereas others seem to reflect positive functioning (e.g., #037 – “Graduated from high school or received GED”). It is interesting that both of these behaviors are scored 0, even though one could argue that they represent different levels of functioning. This information would certainly be important for understanding the scope of a youth's functioning, but the CAFAS gives insufficient attention to positive functioning, which does not even affect the score. Because the majority of items are in the “Severe” and “Moderate” severity levels, one might reasonably conclude that the CAFAS is aimed at assessing only impairment in functioning rather than positive functioning. It thus appears that the items in the “Minimal or No Impairment” category are superfluous to the scale's use as an objective outcome measure.

On face inspection, there also appear to be a number of items with overlapping content. For example, it may be argued that the following items represent essentially equivalent content:

#012 – “Non-compliant behavior which results in persistent or repeated disruption of group functioning or becomes known to authority figures other than classroom teacher (e.g., principal) because of severity and/or chronicity”

#013 – “Inappropriate behavior which results in persistent or repeated disruption of group functioning or becomes known to authority figures other than classroom teacher (e.g., principal) because of severity and/or chronicity” [emphases added]

In the interest of parsimony, it would be desirable to reduce item redundancy by eliminating extraneous items or combining items with similar content. Item analysis, a critical step in scale construction and refinement, would be an appropriate way to achieve these aims and to select the “best” items for the scale.

Another problem is that the suggested scoring system for the CAFAS is theoretically confusing because it combines compensatory and maximum strategies. Within each subscale, the maximum severity is scored, and these scores are then summed across subscales to generate a global score, apparently combining two theoretically competing scoring models. Other scoring models could certainly be applied to the CAFAS (and to other multidimensional LOF scales) and might prove equally or more valid. A variety of scoring models might be particularly useful for LOF assessment, including compensatory, average, conjunctive, and disjunctive models. Because the choice of scoring model can have a significant impact on the relative ranking of respondents (see Bates, 1999, for discussion), it is suggested that the original CAFAS scoring system be revisited and examined using empirical methods.
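The claim that the scoring model can change who is ranked as more impaired can be illustrated with two hypothetical five-scale profiles. In this sketch the conjunctive and disjunctive models are operationalized as the minimum and maximum domain score, which is one common reading adopted here as an assumption; Bates (1999) may define them differently.

```python
from statistics import mean

# Each model collapses a profile of domain scores (0-30 by tens) into one
# number. Reading "conjunctive" as min and "disjunctive" as max is an
# assumption made for illustration.
SCORING_MODELS = {
    "compensatory (sum)": sum,
    "average": mean,
    "conjunctive (min)": min,
    "disjunctive (max)": max,
}

# Two hypothetical five-scale profiles.
profiles = {
    "Youth A": [30, 0, 0, 0, 0],      # severe impairment in one domain only
    "Youth B": [10, 10, 10, 10, 10],  # mild impairment in every domain
}


def more_impaired(model):
    """Name of the youth whom the given model scores as more impaired
    (ties would go to the first name in `profiles`)."""
    return max(profiles, key=lambda name: model(profiles[name]))


for label, model in SCORING_MODELS.items():
    print(f"{label:20s} -> {more_impaired(model)}")
```

Only the disjunctive model ranks Youth A as more impaired, even though both youths have the same profiles in every case.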

To address these problems, Bates (1999) recently completed a study investigating the scaling properties of CAFAS items. The study was conducted in three phases. In Phase 1, a group of expert raters was asked to indicate (a) the degree to which each item represents the subscale construct to which it was originally assigned, and (b) how well the item taps that construct. In Phase 2, additional expert raters were asked to indicate their perceptions of severity by assigning each item a severity rating on a 9-point scale. Successive intervals scaling techniques were applied to these ratings to generate weighted rankings for each item, and these rankings were used to calculate item weights, providing a method for investigating the validity of the original scoring system. In Phase 3, CAFAS data were collected on a sample of youths with SED enrolled in a cross-agency system-of-care project. CAFAS scores calculated with the derived item weights and a consistent average scoring model were then compared with CAFAS scores generated by the original method, in terms of the strength of their associations with other outcome measures such as the CBCL, risk factors, and educational indicators.
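A weighted, average-based scoring scheme of the kind described in Phase 3 can be sketched as follows. This is a simplified stand-in, not Bates's procedure: it uses the plain mean of the 9-point expert ratings as each item's weight instead of successive intervals scale values, and it assumes that a subscale is scored as the mean weight of its endorsed items and that the total is the mean across subscales with at least one endorsement. All ratings, item IDs, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical 9-point severity ratings from four expert judges per item.
# Bates (1999) derived item weights with successive intervals scaling; this
# sketch substitutes the simpler mean rating as a stand-in for those weights.
expert_ratings = {
    "item_a": [9, 8, 9, 8],
    "item_b": [7, 6, 7, 8],
    "item_c": [4, 5, 4, 3],
    "item_d": [6, 5, 6, 7],
}
weights = {item: mean(ratings) for item, ratings in expert_ratings.items()}


def average_score(endorsed_by_subscale):
    """Score each subscale as the mean weight of its endorsed items, then
    average across the subscales that have at least one endorsement."""
    subscale_means = [
        mean(weights[item] for item in items)
        for items in endorsed_by_subscale.values()
        if items
    ]
    return mean(subscale_means) if subscale_means else 0.0


youth = {"School/Work": ["item_a", "item_c"], "Home": ["item_b"], "Community": []}
print(average_score(youth))  # 6.625
```

Unlike the original maximum rule, every endorsed item contributes to the score, so endorsing a less severe item alongside a more severe one pulls the subscale score down.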

The results of this study generally failed to support the suggested CAFAS scoring system and instead indicated that empirically guided alterations to the scoring system performed as well as or better than the original on several measures of concurrent validity. Specifically, there were multiple instances of item order reversal, in which the relative severity rankings based on the empirically derived values were reversed relative to the values suggested in the original version (e.g., an item with an original severity of 20 had a higher empirically derived weight than an item with an original severity of 30). More problematic was the finding that empirically derived item weights were not equivalent across subscales; items with original severity values of 30 on the Community subscale, for example, were rated as much more severe than items with original severity values of 30 on the School/Work subscale. These results call into question the structural validity of the CAFAS and should raise concern about the validity of current and past uses of the instrument.
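An item order reversal, as defined above, is a pair of items whose original severity values and empirically derived weights rank them in opposite orders. The sketch below finds such pairs; the items and weights are invented for illustration and are not values from Bates (1999). Pairs with equal original severity are not counted, because the original scoring makes no claim about their relative order.

```python
from itertools import combinations

# Hypothetical items: (original CAFAS severity, empirically derived weight).
items = {
    "item_a": (30, 8.5),
    "item_b": (20, 8.9),
    "item_c": (30, 7.1),
    "item_d": (10, 4.0),
}


def order_reversals(items):
    """Pairs whose original severity order is contradicted by the derived
    weights: one item has the higher severity but the lower weight."""
    reversals = []
    for (x, (sev_x, w_x)), (y, (sev_y, w_y)) in combinations(items.items(), 2):
        if (sev_x - sev_y) * (w_x - w_y) < 0:  # opposite signs = reversal
            reversals.append((x, y))
    return reversals


print(order_reversals(items))  # [('item_a', 'item_b'), ('item_b', 'item_c')]
```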

Concurrent Validity

There have been several investigations of the concurrent validity of the CAFAS total score. In the first study,4 the relationship between CAFAS total scores and scores on the CGAS was investigated in the FBEP sample. Pearson correlations between the CAFAS and the CGAS ranged from −0.72 to −0.91 for three time periods of data collection [Note: correlations are negative because higher CAFAS scores reflect greater impairment, whereas lower CGAS scores reflect greater impairment]. There was also significant agreement between the CAFAS and the CGAS in categorizing youths into one of four levels of impairment: severe, moderate, mild, or slight/none. Although no further information about this study is available, it does provide preliminary, albeit limited, evidence of the construct validity of the CAFAS.

4 This study was reported in the CAFAS manual (Hodges, 1997), but no reference was given.
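The sign convention in the bracketed note can be checked directly: when one scale rises with impairment and the other falls, their Pearson correlation is negative even if the two scales agree closely about which youths are most impaired. A minimal Python sketch, using hypothetical scores rather than FBEP data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical youths ordered from least to most impaired.
cafas_total = [20, 40, 70, 100, 130]  # higher score = more impaired
cgas_score = [75, 65, 55, 45, 35]     # lower score = more impaired

r = pearson_r(cafas_total, cgas_score)
```

Here `r` is close to −1: the magnitude, not the sign, carries the evidence of agreement, so values between −0.72 and −0.91 indicate strong correspondence between the two scales.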

In the second study (Hodges & Wong, 1996), also using the FBEP data, analyses were conducted to demonstrate the construct validity of the CAFAS by investigating its relationships with global measures of psychopathology and problematic behaviors. Evaluation measures collected in the FBEP included (a) the Child Assessment Schedule (CAS; Hodges, 1990) and its parent form (PCAS; Hodges, 1990), which generate global scores indicating general psychopathology; (b) the Burden of Care Questionnaire (BCQ; Bickman, 1996b; Bickman et al., 1992), developed specifically for the FBEP to assess the objective and subjective burden experienced by parents of children with serious emotional or behavioral problems; and (c) the Child Behavior Checklist (CBCL; Achenbach, 1991) for the parent, the Youth Self-Report (YSR; Achenbach & Edelbrock, 1983) for youths aged 11 and older, and the Teacher Report Form (TRF; Edelbrock & Achenbach, 1984) for the teacher, instruments designed to assess perceptions of problematic behaviors from multiple informants. Correlations between the CAFAS and the other global measures of problematic functioning at the four time points were as follows: PCAS (.59, .62, .58, .63); CBCL (.42, .49, .48, .47); CAS (.54, .56, .55, .52); and BCQ (.36, .42, .43, .42). These moderate positive correlations, found for all measures across all time periods, provide evidence of concurrent validity between the CAFAS and a constellation of problematic behaviors.
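Because each measure was correlated with the CAFAS at four time points, a reader may want one summary value per measure. A conventional way to do this is to average correlations through Fisher's z transformation; this is our illustration, not an analysis reported by Hodges and Wong (1996). The Python sketch below uses the correlations listed above and weights the four time points equally, which is an assumption, since differences in sample size across waves are ignored.

```python
from math import atanh, tanh

# Correlations with the CAFAS at four time points, as reported above.
CORRELATIONS = {
    "PCAS": [0.59, 0.62, 0.58, 0.63],
    "CAS": [0.54, 0.56, 0.55, 0.52],
    "CBCL": [0.42, 0.49, 0.48, 0.47],
    "BCQ": [0.36, 0.42, 0.43, 0.42],
}

def fisher_mean(rs):
    """Average correlations via Fisher's z (equal weights), then back-transform to r."""
    z_values = [atanh(r) for r in rs]
    return tanh(sum(z_values) / len(z_values))

summary = {name: round(fisher_mean(rs), 3) for name, rs in CORRELATIONS.items()}
ranking = sorted(summary, key=summary.get, reverse=True)
```

With these values the ordering is PCAS, CAS, CBCL, BCQ, matching a visual reading of the reported correlations; the averaged PCAS value falls near .61.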

Criterion-Related Validity

To measure the association between CAFAS total scores and individual problematic behaviors (data gathered through interviews with parents, and CBCL, TRF, and YSR scores), Hodges and Wong (1996) bifurcated CAFAS total scores into two categories: presence and absence of pathology. For the first wave of data (intake), the authors used a total score of 80 as the cutoff between the categories; a score of 50 was used as the cutoff for the three follow-up periods at 6, 12, and 18 months postintake. Little explanation was given of the rationale for choosing these cutoff scores, other than the observation that approximately 20% of each sample made up the “pathological” group. [The authors stated that they “considered these respondents to be seriously impaired” (p. 455), yet did not explain why they did not also consider respondents who scored between 50 and 70 at intake to be seriously impaired.] A series of logistic regression analyses was then conducted, using CAFAS category as the criterion and the following variable sets as predictors: (a) problems in social relationships (with other children, other students, siblings, parents, and teachers); (b) risk behaviors (physically attacked people, threatened people, talked about killing self); (c) involvement in juvenile justice (arrested, convicted of crime, placed on probation, spent time in correctional facility, saw probation/law enforcement officers, detention center); and (d) school-related behaviors (disliked school, skipped school, disciplined in school, suspended, grades, happiness at school, worked much less hard than others, repeated grade). Results indicated that each of these variables was highly significant in predicting CAFAS category for at least one (and often all four) of the time periods. The authors concluded that these results support the validity of the CAFAS as a measure of impairment across multiple spheres of functioning.
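Writing the bifurcation step out explicitly makes the gap flagged in brackets visible: the same total score of 70 counts as “absent” at intake but “present” at every follow-up. This Python sketch assumes that a score at or above the cutoff counts as pathology (the inclusive comparison is our assumption; the article does not say), and the function names and example scores are hypothetical.

```python
# Cutoffs described for Hodges and Wong (1996): 80 at intake, 50 at follow-ups.
# Treating "at or above the cutoff" as pathology is an assumption.
CUTOFFS = {"intake": 80, "6 months": 50, "12 months": 50, "18 months": 50}

def pathology_category(total_score, wave):
    """Return 1 ('presence of pathology') or 0 ('absence') for a CAFAS total score."""
    return 1 if total_score >= CUTOFFS[wave] else 0

def proportion_pathological(scores, wave):
    """Share of a sample classified as pathological at a given wave."""
    flags = [pathology_category(s, wave) for s in scores]
    return sum(flags) / len(flags)

# A youth scoring 70 is classified differently depending only on the wave.
intake_flag = pathology_category(70, "intake")
followup_flag = pathology_category(70, "6 months")
```

The resulting 0/1 category is what would then serve as the criterion in the logistic regressions described above.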

Predictive Validity

In a third study involving the FBEP, Hodges and Wong (1997) investigated the predictive validity of the CAFAS total score. CAFAS total scores at intake were used to predict restrictiveness of care, cost of services, and number of services at both 6 and 12 months postintake. Restrictiveness of care was operationalized along the following continuum: outpatient care, intensive nonresidential care, residential care (e.g., group home), residential treatment center, and inpatient hospitalization. Cost of services was operationalized as the total cost of all services received, whereas number of services received was operationalized as the number of bed days (for inpatient or residential care) and the total number of days on which any service was delivered. [At 6 months postintake, the number of service days ranged from 1 to 370, which raises questions about how this variable was operationalized, since a 6-month period contains only about 183 days.] Results indicated that, even after controlling for the effects of other instruments (e.g., CBCL, CAS, PCAS, BCQ), CAFAS total scores significantly predicted these indicators of service utilization at both follow-up periods, with the proportion of unique variance explained ranging from .04 to .11. Although these values appear low, the CAFAS total score was the single best predictor of service utilization and cost. Additional analyses indicated that the CAFAS total score in combination with psychiatric diagnostic information [e.g., DSM-IV (1994) diagnosis] best predicted service utilization and cost.
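“Unique variance explained” is typically the increase in R² when a predictor is entered after the others in a hierarchical regression. The Python/NumPy sketch below is a generic illustration of that increment on simulated data; the names `baseline`, `cafas`, and `outcome` are hypothetical, the outcome is treated as continuous (an assumption, since restrictiveness of care is ordinal), and nothing here reproduces the FBEP analysis.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X, with an intercept added."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def unique_variance(baseline, cafas, outcome):
    """Increase in R^2 when the CAFAS total is entered after the baseline predictors."""
    r2_base = r_squared(baseline, outcome)
    r2_full = r_squared(np.column_stack([baseline, cafas]), outcome)
    return r2_full - r2_base

# Simulated data only (not FBEP): two other instruments plus a correlated CAFAS score.
rng = np.random.default_rng(0)
n = 500
baseline = rng.normal(size=(n, 2))  # stand-ins for, e.g., CBCL and BCQ scores
cafas = 0.5 * baseline[:, 0] + rng.normal(size=n)
outcome = baseline @ np.array([0.6, 0.3]) + 0.4 * cafas + rng.normal(size=n)

delta_r2 = unique_variance(baseline, cafas, outcome)
```

Because the CAFAS score is partly correlated with the baseline predictors, only the portion of its variance that they do not already explain shows up in `delta_r2`, which is why unique-variance figures such as .04 to .11 can look small even for the single best predictor.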

Validity for Use With Target Population

The CAFAS was designed for use with youths who have a variety of emotional and behavioral problems, specifically those with SED. The psychometric data presented above were collected through the Fort Bragg Evaluation Project (Bickman, 1996b), a demonstration project comparing a continuum of care with traditional mental health services for youths with SED. These data are therefore clearly relevant to the target population and provide preliminary support for the scale’s valid use with children and adolescents with SED. As previously discussed, SED is a heterogeneous diagnostic category covering a wide array of symptoms and problem behaviors. The CAFAS has face validity in that its items appear to cover the breadth and depth of the emotional and behavioral problems that children and adolescents with SED face. Yet its construct validity would be strengthened by an item analysis to ensure that item coverage is truly representative of SED. At the least, the strategies used to select items should be described in the manual.

In an attempt to demonstrate validity for use with a diverse population, Hodges (1996) also reported that, using strict criteria, no significant differences in CAFAS scores were found across gender, racial/ethnic, or caregiver education level groupings. This speaks to the comparability of CAFAS scores, but it does not provide sufficient evidence to conclude that the scale holds equivalent meaning across groups. A discussion of the conditions necessary to establish equivalence is beyond the scope of this paper, but Reid (1995) provided a well-reasoned model for demonstrating cross-cultural equivalence of rating scales. He highlighted the need to explore four forms of equivalence: (a) linguistic, the degree to which “. . . content and grammar have similar connotative and denotative meaning across cultures” (Marsella & Kameoka, 1989, p. 239, as cited in Reid, 1995); (b) conceptual, the degree to which constructs in assessment hold similar conceptual meaning; (c) scale, the degree to which raters share a common understanding of the uses and metric of a scale; and (d) normative, the degree to which norms developed for one culture are appropriate for another. Thus, further study of the CAFAS items and structure, and of how these are interpreted by various cultural groups (or other meaningful groupings), is needed before the equivalence of the scale can be established.

Ease of Use

Hodges (1997) stated that the CAFAS takes about 10 min to complete if the rater is very familiar with the child’s behavior and functioning. No time guidelines are given for raters who are unfamiliar with the child. In practice, the CAFAS may take longer than 10 min to complete, given its large number of items. Nonclinicians can use the CAFAS, although it is suggested that they receive full training and use a structured interview to gather information. The training materials included with the CAFAS are extensive, consisting of detailed scoring instructions, demonstration vignettes with ratings provided, and 10 vignettes for testing rater reliability. A simple indicator of the ease of use of the CAFAS is also available: states are required to report the evaluation instruments they use in children’s mental health services to the Georgetown University National Technical Assistance Center (GUNTAC) and are asked to rate the burden of these instruments on a 5-point scale (1 = low burden, 5 = high burden). Of the 18 states reporting these data for the CAFAS, more than half gave a rating of 4 or 5 (mean = 3.6; GUNTAC, 2000). Thus, there does appear to be some burden associated with the use of the CAFAS.

Utility

One of the strengths of the CAFAS, as with most LOF measures, is its clinical utility. The CAFAS total score appears to provide a meaningful metric by which to compare youths with a variety of emotional and behavioral difficulties, although questions about its validity remain. It also appears to be easily understood by nonclinicians. One potential problem in tracking clinical change, however, is that no clinical meaning has been attached to the scale’s intervals; it is not known, for example, what a given change in total score signifies for a youth’s everyday functioning. This is essentially an issue of social or clinical validity, and the clinical utility of the CAFAS would benefit greatly from supporting evidence of this type (Sechrest et al., 1996). Further evidence of the CAFAS’s utility comes from the extent to which it has been adopted at both the state and local levels.

RECOMMENDATIONS FOR FUTURE RESEARCH

The CAFAS has been implemented extensively at the state and local levels. Despite this trend, there is surprisingly little evidence supporting the psychometric and clinical validity of this scale. Indeed, it is disconcerting that the CAFAS has been so widely endorsed, especially at the legislative level, without empirical demonstration of its validity for making the types of treatment decisions for which it is currently being employed across the nation. Furthermore, there appears to be little concern (at least as expressed in the available literature) among mental health researchers, practitioners, administrators, and state legislators about these apparent limitations of the CAFAS.

The widespread use of the CAFAS indicates a growing recognition of the need to assess multiple dimensions of functional impairment. With managed care and new regulations (e.g., changes in the federal definition of SED), policy makers, administrators, treatment providers, and program evaluators increasingly face the dilemma of needing objective and valid measures of children’s functioning while having none to choose from. At the time this review was written, the CAFAS was the only multidimensional LOF instrument for children with any published articles on its technical merits. Several other measures of children’s functioning have been introduced and implemented in recent years (e.g., the Ohio Youth Scales [Ogles, Melendez, et al., 1999]; the Child and Adolescent Scale of Temperament and Life Functioning [CASTLE]; and the Child Functional Assessment Rating Scale [CFARS], the latter two mentioned in GUNTAC, 2000, with no references given), yet no information about them exists in the published literature, and it is unclear to what extent they have demonstrated validity for the purposes for which they are being used. Given this context, the CAFAS may represent the best available option, thus accounting for its widespread use.

The potential benefits of establishing objective and valid level-of-need criteria using the CAFAS are numerous, and interest in doing so is clearly high. Presumably, such a development would lead to increased precision in decision making for the delivery of mental health services, which would, in turn, lead to lower costs and better care management. It may also improve the matching of level of care to level of client need. For any of these benefits to be realized, however, the psychometric limitations of the CAFAS identified in this review need to be addressed. Toward this aim, the following section provides suggested directions for further research on the CAFAS.

1. Technical properties – Further research is needed on the technical properties of the CAFAS, particularly its factor structure, interrater reliability, and stability. A more in-depth investigation of whether the subscale domains hold up to empirical analysis would be desirable. As discussed in this review, analyses of interrater reliability at the item level and of test–retest reliability are also needed. These analyses should be performed on all of the CAFAS subscales, even those (e.g., Thinking) on which impairment is rare in the clinical population.

2. Concurrent validity – Further evidence of the CAFAS’s concurrent validity with additional measures is needed. Potential concurrent measures to establish the validity of the CAFAS might include another measure of LOF, more elaborate measures of school functioning, self- and parent-report measures of substance use, juvenile justice indicators, diagnostic indicators such as DSM-IV (1994) diagnoses, and additional behavioral, social, and emotional assessments.

3. Discriminant validity – Studies of the CAFAS’s discriminant validity would be useful in determining whether its scores can reliably differentiate between samples of interest (e.g., clinical vs. nonclinical). Many scholars (e.g., Bird & Gould, 1995; Weissman, Warner, & Fendrich, 1990) have espoused the utility of LOF assessment for improving the precision of nosological diagnoses such as those of the DSM-IV (1994). For example, Bird and Gould (1995) reported that studies using symptomatic criteria alone have overestimated the prevalence rates of most childhood disorders, such that as many as “one-third to one-half of the children in a population have been found to meet criteria for one or more diagnostic categories” (p. 92). When information about symptom severity and dysfunction is included, these rates drop to levels more in line with theoretical and clinical consensus. Demonstrating the discriminant validity of the CAFAS would therefore presumably lead to lower misclassification rates and greater confidence in diagnostic decision making (e.g., type and amount of services rendered; Herman & Mowbray, 1991; Srebnik et al., 1998).

4. Predictive validity – Perhaps the most pressing need is to establish the predictive validity of the CAFAS. Because it is commonly used at intake to make treatment decisions, it is vital to investigate whether CAFAS scores predict important treatment variables, such as number and amount of services, length of treatment, and cost of treatment. Newman and Tejeda (1996) reported the initiation of such a project through the Indiana Division of Mental Health (IDMH) and indicated that the CAFAS was the intended instrument for use with children and adolescents (with a multidimensional adult LOF measure to be developed by the first author). Briefly, the aims of this project are to (a) investigate the psychometric properties of the LOF measures, (b) track LOF and service data from service providers, and (c) identify “cost-homogenous groups” (or clusters of clients with similar LOF and service costs), with the ultimate goal of providing data for the creation of actuarial criteria. These criteria will then be revisited and refined as needed on an ongoing basis. Other states have initiated similar projects using the CAFAS (see Table I; Behar & Stelle, 1997; Heflinger & Simpkins, 1997; Hersch, 1998; Hodges, Warren, & Wotring, 1998; Schwartz & Perkins, 1997) and other scales (Newman & Tejeda, 1996; Srebnik et al., 1998). Such studies might eventually produce valid, empirically based algorithms for matching CAFAS scores with levels of care.

5. Applications – A variety of other applications of the CAFAS appear to hold promise for increasing its clinical utility in the delivery of mental health services to youths. One innovative approach to the classification of psychopathology in both adults and children has involved applying cluster analysis to client characteristics, particularly LOF, to generate client typologies. Herman and Mowbray (1991), for example, assessed 2,447 adults with serious mental illness using 16 scales of daily functioning (e.g., community living, substance abuse, level of health care needs) and subjected these data to cluster analysis to “organize and identify the patterns within the rich array of information provided by multidimensional LOF assessments” (p. 102). The analysis yielded six relatively homogeneous clusters, or client types, which were then used to facilitate analysis of service utilization patterns and population differences across service sites. Several other studies have demonstrated the clinical utility of cluster analysis in child populations (Lahey et al., 1988; McDermott & Weisz, 1995; Rosenblatt et al., 1998; Wood et al., 1998).

Herman and Mowbray (1991, p. 111) concisely summarized the case for cluster analysis as follows:

The cluster analysis . . . has the descriptive advantages expected: it succinctly summarize [sic] a substantive amount of data about client functioning and severity levels overall, as well as about special treatment needs, such as health care and substance abuse. Data from statewide studies in the past have been difficult to use because agencies were presented with dozens of variables and asked to meaningfully interpret why their clients are more functional than the state averages on some variables, less functional on others, etc. With a cluster analysis, agencies are presented with simple statements . . . regarding the proportion of clients they serve from each cluster.

Thus, cluster analysis based on LOF data appears to be a burgeoning and useful technique for summarizing and interpreting an often overwhelming amount of client data.
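To make the clustering idea concrete, the Python sketch below groups youths by their profiles of subscale scores. It uses plain k-means with deterministic farthest-point seeding, a generic technique swapped in for illustration; this section does not specify the clustering algorithm Herman and Mowbray (1991) used, and the three-subscale profiles are hypothetical, not CAFAS data.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two score profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_seeds(profiles, k):
    """Deterministic seeding: start with the first profile, then add the profile
    farthest from all seeds chosen so far."""
    seeds = [list(profiles[0])]
    while len(seeds) < k:
        farthest = max(profiles, key=lambda p: min(squared_distance(p, s) for s in seeds))
        seeds.append(list(farthest))
    return seeds

def k_means(profiles, k, iterations=20):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    centroids = farthest_point_seeds(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assign each profile to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in profiles]
        # Move each centroid to the mean of the profiles assigned to it.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical profiles: (School/Work, Home, Community) scores, 0-30 each.
profiles = [(30, 20, 30), (30, 30, 20), (0, 10, 0), (10, 0, 0), (20, 30, 30), (0, 0, 10)]
labels, centroids = k_means(profiles, k=2)
```

The cluster numbers themselves are arbitrary; what an agency would see is the share of its clients in a “high impairment across domains” cluster versus a “low impairment” cluster, which is the kind of simple statement the quotation describes.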

In sum, the CAFAS holds substantial promise as an important tool for use in the diagnosis, treatment, and evaluation of youths with EBD. However, this review suggests that the technical adequacy of the CAFAS has yet to be clearly established. It is hoped that more defensible, empirically based methods can be used to assess children’s global functioning more efficiently and accurately, with the ultimate goal of enhancing the allocation of resources and service delivery to youths with emotional and behavioral disorders.

REFERENCES

Achenbach, T. (1991). Child behavior checklist manual, 1991. Burlington, VT: University of Vermont.

Achenbach, T. M., & Edelbrock, C. (1983). Manual for the child behavior checklist and revised child behavior profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Altaffer, F., & Stelk, W. (1998, March). Thriving in the frenzy: Outcomes reporting in a national provider of child/adolescent services. Paper presented at the 11th Annual Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., Rev.). Washington, DC: Author.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Anderson, D. F., Berlant, J. L., Mauch, D., & Maloney, W. R. (1996). Managed behavioral health care services. In P. R. Kongstvedt (Ed.), The managed health care handbook (3rd ed., pp. 341–366). Gaithersburg, MD: Aspen.

Bates, M. P. (1999). Global functioning within a system of care for youths with serious emotional disturbance: A closer look at the Child and Adolescent Functional Assessment Scale (CAFAS). Unpublished doctoral dissertation, University of California, Santa Barbara.

Behar, L., & Stelle, L. (1997). Criteria for accessing child mental health and substance abuse services in North Carolina. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 262–264). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Bickman, L. (1996a). A continuum of care: More is not always better. American Psychologist, 51, 689–701.

Bickman, L. (1996b). The evaluation of a children’s mental health managed care demonstration. Journal of Mental Health Administration, 23, 7–15.

Bickman, L., Heflinger, C. A., Pion, G., & Behar, L. (1992). Evaluation planning for an innovative children’s mental health system. Clinical Psychology Review, 12, 853–865.

Bird, H., Canino, G., Rubio-Stipec, M., & Ribera, J. C. (1987). Further measures of the psychometric properties of the Children’s Global Assessment Scale. Archives of General Psychiatry, 44, 821–824.

Bird, H. R., & Gould, M. S. (1995). The use of diagnostic instruments and global measures of functioning in child psychiatry epidemiological studies. In F. C. Verhulst & H. M. Koot (Eds.), The epidemiology of child and adolescent psychopathology (pp. 86–103). Oxford, UK: Oxford University Press.

Breda, C. S. (1996). Methodological issues in evaluating mental health outcomes of a children’s mental health managed care demonstration. Journal of Mental Health Administration, 23, 40–50.

Burlingame, G. M., Lambert, M. J., Reisinger, C. W., Neff, W. M., & Mosier, J. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226–236.

California Department of Mental Health. (1997). The Children and Youth Performance Outcome Program: Clinical training manual. Sacramento, CA: Author.

Canino, G., Bird, H. R., Rubio-Stipec, M., Woodbury, M. A., Ribera, J. C., Huertas, S. E., & Sesman, M. (1987). Reliability of child diagnosis in a Hispanic sample. Journal of the American Academy of Child Psychiatry, 26, 560–565.

Center for Mental Health Services. (1999, December 17). Comprehensive community mental health services for children program. Washington, DC: Author. Retrieved December 17, 1999 from the World Wide Web: http://www.mentalhealth.org/publications//allpubs/CA-0013/ccmhse.htm#TOP



Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319.

Daniels, L. V., & Clements, L. (1997). The utilization of the Child and Adolescent Functional Assessment Scale for assessing program and clinical outcomes, mental health policy, and child outcomes in Missouri. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 420–423). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Eber, L., & Rolf, K. (1998). Education’s role in the system of care: Student/family outcomes. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 175–180). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Edelbrock, C., & Achenbach, T. (1984). The teacher version of the Child Behavior Profile: I. Boys aged 6–11. Journal of Consulting and Clinical Psychology, 52, 207–217.

Ellis, R. H., Wilson, N. Z., & Foster, F. M. (1984). Statewide treatment in outcome assessment in Colorado: The Colorado Client Assessment Record (CCAR). Community Mental Health Journal, 20, 72–89.

Endicott, J., Spitzer, R. L., Fleiss, J. L., & Cohen, J. (1976). The Global Assessment Scale. Archives of General Psychiatry, 33, 766–771.

Federal Register 29422–29425. (1993, May 20).

Feibelman, N. D., III. (1998). A system of care for children’s mental health. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 43–44). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Friedman, R. M., & Burns, B. J. (1996). The evaluation of the Fort Bragg demonstration project: An alternative interpretation of the findings. Journal of Mental Health Administration, 23, 128–136.

Friis, H. L. S. (1996). Routine evaluation of mental health: Reliable information or worthless ‘guesstimates’? Acta Psychiatrica Scandinavica, 93, 125–128.

Georgetown University National Technical Assistance Center. (1999). Evaluation Instruments [Table]. Washington, DC: Author. Retrieved January 24, 2000 from the World Wide Web: http://www.dml.georgetown.edu/depts/pediatrics/gucdc/instruments 1.html

Green, B., Shirk, S., Hanze, D., & Wanstrath, J. (1994). The Children’s Global Assessment Scale in clinical practice: An empirical evaluation. Journal of the American Academy of Child and Adolescent Psychiatry, 33, 1158–1164.

Green, R. S., & Newman, F. L. (1996). Criteria for selecting outcome instruments to assess treatment outcomes. Residential Treatment for Children and Youth, 13, 29–48.

Gutierrez-Mayka, M. (1998, March). Findings from an evaluation of a community-based intervention for children at-risk in Boston, MA. Paper presented at the 11th Annual Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

Heflinger, C. A., & Simpkins, C. G. (1997). CAFAS: Evaluating statewide service. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 415–420). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Herman, S. E., & Mowbray, C. T. (1991). Client typology based on functioning level assessments: Utility for service planning and monitoring. Journal of Mental Health Administration, 18, 101–115.

Hersch, P. (1998). Implementing eligibility determination process for children’s mental health services in Massachusetts, Characteristics of youth: The first six months. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 377–381). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Hodges, K. (1989). Child and Adolescent Functional Assessment Scale. Unpublished manuscript, Eastern Michigan University, Ypsilanti.

Hodges, K. (1990). Child Assessment Schedule. Unpublished manuscript, Eastern Michigan University, Ypsilanti.

Hodges, K. (1995, March). Psychometric study of a telephone interview for the CAFAS using an expanded version of the scale. Paper presented at the 8th annual research conference: A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

Hodges, K. (1996). Summary of psychometric data on the CAFAS. Ann Arbor, MI: Author.

Hodges, K. (1997). CAFAS manual for training coordinators, clinical administrators, and data managers. Ann Arbor, MI: Author.

Hodges, K., & Gust, J. (1995). Measures of impairment for children and adolescents. Journal of Mental Health Administration, 22, 403–413.

Hodges, K., & Wong, M. M. (1996). Psychometric characteristics of a multidimensional measure to assess impairment: The Child and Adolescent Functional Assessment Scale. Journal of Child and Family Studies, 5, 445–467.

Hodges, K., & Wong, M. M. (1997). Use of the Child and Adolescent Functional Assessment Scale to predict service utilization and cost. Journal of Mental Health Administration, 24, 278–290.

Hodges, K., Warren, B., & Wotring, J. (1998, March). The development of a set of criteria for determining levels of care for youth with SED based on empirical data. Paper presented at the 11th Annual Research Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

Hodges, K., Wong, M. M., & Latessa, M. (1998). Use of the Child and Adolescent Functional Assessment Scale (CAFAS) as an outcome measure in clinical settings. Journal of Behavioral Health Services and Research, 25, 325–336.

Individuals with Disabilities Education Act, 20 U.S.C. Sec. 1400 (1990).

Irvin, E., & Hersch, P. (1997). Proposed eligibility criteria and procedures for enrollment in Department of Mental Health continuing care. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 264–267). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Jacobson, C. V., & Meyer, T. (1997). Assessment of patient functioning in a child and adolescent psychiatric facility. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 297–301). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.



Kamradt, B., Kostan, M. J., & Pina, V. (1998). Wraparound Milwau-kee: Two year follow-up on the Twenty-Five Kid Project. In C.Liberton, K. Kutash, & R. Friedman (Eds.), The 10th AnnualResearch Conference Proceedings, A System of Care for Chil-dren’s Mental Health: Expanding the Research Base, February23 to February 26, 1997 (pp. 225–228). Tampa, FL: Universityof South Florida, The Louis de la Parte Florida Mental HealthInstitute, Research and Training Center for Children’s MentalHealth.

Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developingempirically supported child and adolescent treatments. Journalof Consulting and Clinical Psychology, 66, 19–36.

Kirkman, C., Brunk, M., & Cohen, R. (1999). Determining levelsof need for decategorized funding of services for children withemotional and behavior disturbance. In J. Willis, C. Liberton,K. Kutash, & R. Friedman (Eds.), The 11th Annual ResearchConference Proceedings, A System of Care for Children’s Men-tal Health: Expanding the Research Base, March 8 to March 11,1997 (pp. 21–26). Tampa, FL: University of South Florida, TheLouis de la Parte Florida Mental Health Institute, Researchand Training Center for Children’s Mental Health.

Koch, J. R., & Brunk, M. (1998). An outcomes management sys-tem for child/adolescent public mental health services. In C.Liberton, K. Kutash, & R. Friedman (Eds.), The 10th AnnualResearch Conference Proceedings, A System of Care for Chil-dren’s Mental Health: Expanding the Research Base, February23 to February 26, 1997 (pp. 359–363). Tampa, FL: Universityof South Florida, the Louis de la Parte Florida Mental HealthInstitute, Research and Training Center for Children’s MentalHealth.

Lahey, B. B., Pelham, W. E., Schaughency, E. A., Atkins, M. S., Murphy, H. A., Hynd, G., Russo, M., Hartdagen, S., & Lorys-Vernon, A. (1988). Dimensions and types of attention deficit disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 27, 360–365.

Lambert, M. J. (1994). Use of psychological tests for outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 75–97). Hillsdale, NJ: Lawrence Erlbaum.

Lambert, M. J., & McRoberts, C. H. (1993, April). Outcome measurement in JCCP: 1986–1991. Paper presented at the meeting of the Western Psychological Association, Phoenix, AZ.

Lemoine, R. L., & McDermott, B. E. (1998). Assessing levels and profiles of service need using the CAFAS. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 371–375). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Lemoine, R., Speier, T., Ellzey, S., & Pine, J. (1997). Using the Child and Adolescent Functional Assessment Scale (CAFAS) to establish level-of-need for Medicaid managed care services. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 267–270). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Luborsky, L. (1962). Clinicians’ judgments of mental health: A proposed scale. Archives of General Psychiatry, 7, 407–417.

Massey, T., Kershaw, M. A., Armstrong, M., Shepard, J., & Wu, L. (1998). The children’s performance outcome measures: Results after six months. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 353–358). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

McDermott, P. A., & Weiss, R. V. (1995). A normative typology of healthy, subclinical, and clinical behavior styles among American children and adolescents. Psychological Assessment, 7, 162–170.

Newman, F. L. (1980). Global scales: Strengths, uses, and problems of global scales as an evaluation instrument. Evaluation and Program Planning, 3, 257–268.

Newman, F. L., & Ciarlo, J. A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98–110). Hillsdale, NJ: Lawrence Erlbaum Associates.

Newman, F. L., Hunter, R. H., & Irving, D. (1987). Simple measures of progress and outcome in the evaluation of mental health services. Evaluation and Program Planning, 10, 209–218.

Newman, F. L., & Tejeda, M. J. (1996). The need for research that is designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040–1049.

O’Neal, L., & Wade, P. (1998, March). Use of CAFAS and case reviews for outcome evaluation. Paper presented at the 11th Annual Research Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

Ogles, B. M., Davis, D., & Lunnen, K. M. (1999). Inter-rater reliability of four measures of youth functioning. In J. Willis, C. Liberton, K. Kutash, & R. Friedman (Eds.), The 11th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, March 8 to March 11, 1997 (pp. 321–326). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Ogles, B. M., Melendez, G., Davis, D. C., & Lunnen, K. M. (1999). The Ohio Youth Problems, Functioning, and Satisfaction Scales: User’s manual. Unpublished manuscript, Ohio University, Athens.

Oliveira, B., Rivera, V. R., Kutash, K., Duchnowski, A. J., & Calvanese, P. K. (1998). The school and community study: Summary of preliminary baseline data. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 141–146). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Pires, S. A. (1997). Lessons learned from the Fort Bragg Demonstration: An overview. In S. A. Pires (Ed.), Lessons learned from the Fort Bragg Demonstration (pp. 1–21). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Pokorny, L. J. (1991). A summary measure of client level of functioning: Progress and challenges for use within mental health agencies. Journal of Mental Health Administration, 18, 80–87.

Reckase, M. D. (1996). Test construction in the 1990s: Recent approaches every psychologist should know. Psychological Assessment, 8, 354–359.

Reid, R. (1995). Assessment of ADHD with culturally different groups: The use of behavioral ratings scales. School Psychology Review, 24, 537–560.

Rey, J. M., Starling, J., Wever, C., Dossetor, D. R., & Plapp, J. M. (1995). Inter-rater reliability of global assessment of functioning in a clinical setting. Journal of Child Psychology and Psychiatry, 36, 787–792.

Robertson, L. M., Bates, M. P., Wood, M., Rosenblatt, J. A., Furlong, M. J., Casas, J. M., & Schweir, P. (1998). Educational placements of students with emotional and behavioral disorders served by probation, mental health, public health, and social services. Psychology in the Schools, 35, 333–345.

Rosenblatt, A., Wyman, N., Kingdon, D., & Ichinose, C. (1997). Managing what you measure: Creating outcome driven systems of care for youth with serious emotional disturbance. Unpublished manuscript.

Rosenblatt, J., & Rosenblatt, A. (1999). Academic achievement and mental health functioning: An illusory or realistic relationship? In J. Willis, C. Liberton, K. Kutash, & R. Friedman (Eds.), The 11th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, March 8 to March 11, 1997 (pp. 112–117). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Rosenblatt, J., Robertson, L., Bates, M., Wood, M., Furlong, M. J., & Sosna, T. (1998). Troubled or troubling? Characteristics of youths referred to a system of care without system-level referral constraints. Journal of Emotional and Behavioral Disorders, 6, 42–54.

Rotto, K. L., Sokol, P. I. T., Matthews, B., & Russell, L. (1998, March). A practitioners view of outcomes. Paper presented at the 11th Annual Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.

Schwartz, A., & Perkins, S. (1997). Criteria used in determining appropriateness of service utilization in Arizona. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 270–272). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065–1071.

Selby, P. M., Trupin, E. W., McCauley, E., & Vander Stoep, A. (1998). The Prime Time Project: Preliminary review of the first year of a community-based intervention for youth in the juvenile justice system. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 339–344). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Shaffer, D., Gould, M. S., Brasic, J., Ambrosini, P., Fisher, P., Bird, H., & Aluwahlia, S. (1983). A Children’s Global Assessment Scale (CGAS). Archives of General Psychiatry, 40, 1228–1231.

Srebnik, D., Uehara, E., & Smukler, M. (1998). Field test of a tool for level-of-care decisions in community mental health systems. Psychiatric Services, 49, 91–97.

Task Force on Promotion and Dissemination of Psychological Procedures. (1995). Training in and dissemination of empirically-validated psychological procedures: Report and recommendations. Clinical Psychologist, 48, 3–23.

Vermillion, J., & Pfeiffer, S. (1993). Treatment outcomes and continuous quality improvement: Two aspects of program evaluation. Psychiatric Hospital, 24, 9–14.

Walker, R., Minor-Schork, D., Bloch, R., & Eisenhart, J. (1996). High risk factors for rehospitalization within six months. Psychiatric Quarterly, 67, 235–243.

Weissman, M. M., Warner, V., & Fendrich, M. (1990). Applying impairment criteria to children’s psychiatric diagnosis. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 789–795.

Wood, M., Rosenblatt, J. A., Furlong, M. J., Robertson, L. M., Bates, M. P., & Casas, J. M. (1998). Evaluating system of care clinical outcomes by youth risk profiles. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 407–414). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.

Zimmerman, D. P. (1996). A comparison of commonly used treatment measures. Residential Treatment for Children and Youth, 13, 49–69.
