

A STUDY OF THE CONSTRUCT VALIDITY OF THE BEHAVIORAL ASSESSMENT

SYSTEM FOR CHILDREN, SECOND EDITION (BASC-2)

WITHIN THE JUVENILE OFFENDER POPULATION.

by

JON PEIPER

(Under the Direction of Georgia B. Calhoun)

ABSTRACT

The current study sought to evaluate the construct validity of the Behavior Assessment System

for Children, Second Edition (BASC-2) Self-Report of Personality-Adolescent (SRP-A) as a

broad screening measure for use within the juvenile offender population. The BASC-2 SRP-A is

recommended for this purpose but has not been validated for use within this population. Results

from Confirmatory Factor Analysis (n=205) provided evidence of adequate fit of the five-factor

higher-order model (Reynolds & Kamphaus, 2004) with the data from the current study. The

individual scales of the instrument demonstrated good to excellent internal consistency except

for two scales; Sensation Seeking and Self-Reliance. Inter-scale correlations of SRP-A scales

were in expected directions, while specific correlations with MMPI-A scales provided strong

support for convergent validity. Based on these results, the BASC-2 SRP-A is supported for use

within the juvenile offender population as a broad screening instrument.

INDEX WORDS: Juvenile Offenders, Behavior Assessment System for Children, Second

Edition (BASC-2), Factor Analysis, Validity, Reliability

A STUDY OF THE CONSTRUCT VALIDITY OF THE BEHAVIORAL ASSESSMENT

SYSTEM FOR CHILDREN, SECOND EDITION (BASC-2)

WITHIN THE JUVENILE OFFENDER POPULATION.

by

JON PEIPER

B.S., The University of Georgia, 2002

M.Ed., The University of Georgia, 2005

A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirements for the Degree

DOCTOR OF PHILOSOPHY

ATHENS, GEORGIA

2009

© 2009

Jon Peiper

All Rights Reserved

A STUDY OF THE CONSTRUCT VALIDITY OF THE BEHAVIORAL ASSESSMENT

SYSTEM FOR CHILDREN, SECOND EDITION (BASC-2)

WITHIN THE JUVENILE OFFENDER POPULATION.

by

JON PEIPER

Major Professor: Georgia B. Calhoun

Committee: Edward Delgado-Romero

Brian A. Glaser

Pamela O. Paisley

Electronic Version Approved:

Maureen Grasso

Dean of the Graduate School

The University of Georgia

August 2009


DEDICATION

I would like to dedicate this paper to my wife, Katherine. She has been with me through

the thickest and most challenging parts of this journey. She has been supportive, encouraging,

and also willing to kick me in the pants when needed. She has also provided a balance to my life

that would not exist without her.

My family has also been a strong foundation of support. Their belief in me has been

motivating. I would like to thank my mother for teaching me compassion and my father for

teaching me dedication. Together, they have made it possible for me to be the professional I am

today. My siblings have all been individual inspirations to me and their support has been

invaluable.

Specifically, I would like to thank the staff, faculty, and students of the Counseling

Psychology program and the Department of Counseling and Human Development Services.

My fellow students and cohort members have taught me about the true value of seeking and

offering help. The faculty inspired me and became models of what being a psychologist means. I

would like to thank Heather Dukes Murray for being a strong leader for JCAP and for all she did

to help me complete this dissertation. Finally, I would like to specifically thank Drs. Georgia

Calhoun and Brian Glaser. They have nurtured me in my professional development since I began

as a master’s student in the Juvenile Counseling and Assessment Program. You are both models

of who and how I want to be.


ACKNOWLEDGEMENTS

I would like to acknowledge Georgia Calhoun and Brian Glaser. They have personally

and professionally influenced who I am and this dissertation could not have been completed

without them. Throughout my years of work with JCAP, I always felt supported and encouraged.

As a doctoral assistant with the program, I developed professional confidence in myself because

they believed in me and respected my input. I learned and grew as a psychologist during JCAP.

Thank you.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER

1 Introduction
    The Juvenile Offender
    Justification and Significance
    Statement of Problem
    General Hypotheses
    Definitions and Operational Terms

2 Review of Related Research
    Evidence-Based Assessment (EBA)
    Research and Theory
    Reliability and Validity
    The Assessment Process
    EBA for Specific Purposes with Children and Adolescents
    Behavioral Assessment System for Children, Second Edition (BASC-2)

3 Method
    Description of Sample
    Statistical Analysis
    Instruments
    Data Collection
    Limitations
    Assumptions
    Hypotheses

4 Results
    Reliability
    Validity

5 Discussion and Summary
    Summary
    Discussion of Findings
    Reliability
    Validity
    Limits to Internal Validity
    Limits to External Validity
    Implications for Future Research
    Implications for Practice
    Conclusions

REFERENCES

APPENDICES

A Stem and Leaf Plots for BASC-2 Scales

B BASC-2 Scale Cronbach Alphas and Item-Total Correlations

C Results from Confirmatory Factor Analysis

D Results from Exploratory Factor Analysis


LIST OF TABLES

Table 1: Matrix for Evaluating Internal Consistency Alphas

Table 2: BASC-2 and MMPI-A Scale Statistics for Sample

Table 3: Coefficient Alpha Classifications

Table 4: Cronbach Alphas for Current Study and for Normative Sample

Table 5: Interscale Correlations within BASC-2 SRP-A

Table 6: BASC-2 SRP-A Correlations with MMPI-A

Table 7: Fit Indices for Confirmatory Factor Analyses

Table 8: Standardized Parameter Estimates for Five-factor Model

Table 9: Parallel Analysis Results

Table 10: Loadings from 4-factor Solution


LIST OF FIGURES

Figure 1: Composite to Scale Relationships on BASC-2 SRP-A

Figure 2: EFA 4-factor Structure

 

CHAPTER 1

INTRODUCTION

The age cohort comprising childhood and adolescence has been of interest to

psychologists since psychology’s introduction to the United States (Benjamin, 2007). G. Stanley

Hall, ostensibly the founder of American psychology, has been credited with initiating the child

guidance movement. His efforts in founding journals, writing books, organizing associations,

and advocating for children have had a lasting influence on psychology and arguably this country

as well (Benjamin, 2007).

One noteworthy continuation of Hall’s efforts can be found in the work of Lightner

Witmer (Benjamin, 2007). By specifying the need for a psychology for application in clinics, he

opened the way for the development of school psychology, clinical psychology, and counseling

psychology: the applied psychologies. The application of psychology initially met with skepticism

and was tagged with the same negative valence associated with phrenology and other

“applications” of psychology of that day. The early American versions of “therapy” were

negatively seen by many as a mystic type of healing (Benjamin, 2007). Many a nose was turned

up at the application of psychology, but it has slowly become the face and hands of psychology.

A central function within the application of psychology is assessment. Watkins’ (1992)

discussion of the historical influences on assessment practices of counseling psychologists noted

that regardless of work setting, assessment occupied a significant part of a counseling

psychologist’s practice. Groth-Marnat (2003) stated that “assessment is crucial to the definition,

training, and practice of professional psychology” (p. 5). He continued by noting that 91% of

psychologists in practice engage in assessment. Furthermore, Groth-Marnat (2003) noted that

assessment is considered the “very foundation of clinical investigation, applied research, and

 

program evaluation” (p. 6), and described the recent increase in the use of behavior rating scales in

assessments with children, specifically acknowledging the Behavior Assessment System for

Children, Conners’ Parent/Teacher Rating Scales, and the Achenbach Child Behavior Checklist

as exemplars.

Perhaps the earliest version of psychological assessment was completed by the likes of

Freud, Jung, and Adler using clinical interviews (Groth-Marnat, 2003). Since that time, the

practice of assessment has grown to include a plethora of methods and measures. The difficulty

now becomes how to choose an assessment. Groth-Marnat (2003) suggests evaluating the instrument with

regard to its theoretical orientation (Does the measure match its theory?), practical

considerations (Are its length and reading level appropriate?), standardization (Is the current

population similar to the standardization population?), reliability (Are reliability estimates

adequate?), and validity (Will it produce appropriate measurements within the intended use?).

Armed with an understanding of the assets and limitations of assessment, psychologists

are the primary providers of psychological testing for the purposes of diagnosis and treatment

planning (Groth-Marnat, 2003). Assessment can be seen as a psychologist’s unique contribution

to the broad field of mental health. As echoed by Blanton and Jaccard (2006), “measurement is a

cornerstone of psychological research and practice” (p. 27).

The Juvenile Offender

According to the Federal Interagency Forum on Child and Family Statistics (2007) there

were 73.7 million children ages 0–17 in the United States in 2006, or 25 percent of the

population. In that year, 67 percent of children ages 0–17 lived with two married parents and

births to unmarried women constituted 37 percent of all U.S. births, the highest level ever

reported. In 2005, 20 percent of school-age children spoke a language other than English at

 

home. The adolescent birth rate among females ages 15–17 fell to 2.1% in 2005. In 2005, 18

percent of all children ages 0–17 lived in poverty. The percentage of children with at least one

parent working year round was 78.3 percent in 2005. In 2005, 40 percent of households with

children had one or more housing problems, such as cost burden, physically inadequate housing, and

crowded housing. In 2005, 68 percent of Caucasian, 66 percent of Asian-American, 50

percent of African-American, and 45 percent of Hispanic/Latino children ages 3–5 were read to

daily. In 2005, 5 percent of children ages 4–17 were reported by a parent to have serious

emotional or behavioral difficulties. These statistics are not intended to scare, but simply to

represent what our youth are experiencing. That so many children are not having

their essential needs met is testament to the need for services. It was for precisely this reason that

the Juvenile Court System was created (Snyder & Sickmund, 2006).

The juvenile justice movement began in the 19th century with an interest in discontinuing

the practice of treating juvenile offenders as miniature adults (Snyder & Sickmund, 2006). As

early as 1825, the Society for the Prevention of Juvenile Delinquency was advocating for the

separation of juvenile and adult offenders, which led to the creation of privately operated youth

detention centers. These centers came under scrutiny for various charges of abuse and the states

began to take over control of many facilities. Illinois passed the Juvenile Court Act of 1899 on

April 4th, thus creating the nation’s first state juvenile court in Cook County (Chicago is the

current county seat) on July 3, 1899. Under the doctrine of parens patriae (the state as parent)

the state’s main focus was on the welfare of youth. Thirty-one more states followed by

establishing their own juvenile courts over the next 11 years. By 1925, all but two states had

developed juvenile courts (Snyder & Sickmund, 2006).

 

In 2004, law enforcement agencies in the United States made an estimated 2.2 million

arrests of persons under age 18, or 16% of all arrests (Snyder, 2006). In 2004, for the tenth

consecutive year, the rate of juvenile arrests for Violent Crime Index offenses (murder, forcible

rape, robbery, and aggravated assault) declined. Specifically, between 1994 and 2004, the

juvenile arrest rate for Violent Crime Index offenses fell 49%. As a result, the juvenile Violent

Crime Index arrest rate in 2004 was at its lowest level since at least 1980. From its peak in 1993

to 2004, the juvenile arrest rate for murder fell 77%. Between 1980 and 2004, the juvenile arrest

rate for simple assault increased 106% for males and 290% for females. The disparity in violent

crime arrest rates for black juveniles and white juveniles declined from 6-to-1 in 1980 to 4-to-1

in 2004 (Snyder, 2006).

Snyder and Sickmund (2006) noted that of the 2.2 million arrests, 29% were female, 68%

were ages 16–17, 71% were Caucasian, 27% were African-American, 1% were American Indian,

and 2% were Asian-American. Violent and drug arrest rates for young juveniles rose from 1980

to 2003 as their overall arrest rate fell. For youth ages 10–12, the Property Crime Index arrest rate fell 51%

between 1980 and 2003, while the Violent Crime Index arrest rate increased 27%.

Teplin, Abram, McClelland, Mericle, Dulcan, and Washburn (2006) presented data from

the Northwestern Juvenile Project, which measured the prevalence of alcohol, drug, and mental

disorders among youth detained at the Cook County Juvenile Temporary Detention Center in

Illinois. The project used the Diagnostic Interview Schedule for Children (DISC) Version 2.3 to

assess and diagnose a random sample of youth at the detention center between November 20,

1995, and June 14, 1998. The stratified sample (by gender, race-ethnicity, age, and legal status)

of 1,829 youths included 1,172 males (64.1 percent) and 657 females (35.9 percent), 1,005

African-Americans (54.9 percent), 524 Hispanics/Latinos (28.7 percent), 296 Caucasians (16.2

 

percent), and 4 detainees of other racial and ethnic groups (0.2 percent). The mean age for the

youths was 14.9 years. The detention center’s total population was 90 percent male, with

racial classifications of African-American (77.9 percent), Hispanic/Latino (16.0 percent),

Caucasian (5.6 percent), and other racial or ethnic groups (0.5 percent). The prevalence of mental

health disorders in this sample averaged 66.3% for males and 73.8% for females. By

ethnicity, the prevalence of mental health disorders was 64.6% for African-American,

82% for Caucasian, and 70.4% for Hispanic/Latino males, and 70.9% for

African-American, 86.1% for Caucasian, and 75.9% for Hispanic/Latino females.

Justification and Significance of Study

Assessment is integral to the practice of counseling psychologists and assessment

instruments are used for myriad purposes. For instance, Kazdin’s (2005) list of the uses of assessment

included diagnosis, case formulation, screening, case identification,

treatment planning, treatment implementation, treatment progress and outcome evaluation, and

cost/benefit evaluations of the treatment.

Kazdin (2005) recommended that the purposes of each instrument be delineated and the

criteria for validation of the instrument’s use for each purpose be specified. He notes that studies

of an instrument’s psychometrics are essentially never finished. There are an infinite number of

possible studies to complete for an instrument, with no definite point of “completion.” It is

important that the instruments be validated for each use to develop evidence in support of those

uses. Since validity and reliability are not properties of the instrument, but rather are aspects of

the instrument’s use, it becomes quite clear why Kazdin (2005) described the number of studies as

infinite.

 

With the importance of assessment in various applications, the validation of an

instrument becomes necessary for the effective provision of psychological services. The

movement toward evidence-based assessment (EBA) has recently begun appearing in the

literature (Mash & Hunsley, 2005). Achenbach (2005) specified that evidence for the methods

and measures for all assessment purposes is needed. He noted that the evidence-based treatment

(EBT) movement pushed forth without first considering how to effectively identify and measure

the problems that are to be treated and the outcomes following those treatments. Achenbach

stated that EBA and EBT will aid in “understanding, preventing, and ameliorating child

psychopathology” (p. 547).

Testing the “functioning” of instruments across populations and purposes is necessary.

In an official publication by the Office of Juvenile Justice and Delinquency Prevention, Grisso

and Underwood (2004) state that “instruments that provide evidence of reliability and validity with

youth in the juvenile justice system are preferable to those that do not” (p.12). Grisso and

Underwood also listed a number of assessment instruments they recommend for use within the

juvenile justice population. Since no studies, to date, have examined the BASC-2’s validity

within this specific population, it is not surprising that the BASC-2 did not appear on their list.

Although the BASC-2 was not listed, several other instruments with which the BASC-2 has

demonstrated convergent validity (like the Child Behavior Checklist) did appear on the list. The

BASC-2 is a commonly used behavioral rating scale which has been recommended for the

assessment of conduct problems (McMahon & Frick, 2005) and demonstrates promise for

effective use with juvenile offenders, but validity studies for this purpose are lacking.

 

Statement of the Problem

The purpose of the current study was to evaluate the validity of the BASC-2 with the

juvenile offender population. In the context of evidence-based assessment, the conditional

validation of instruments per their intended use is best practice. Although the BASC-2 is

suggested as an appropriate broad screening measure of conduct problems, it had not been

validated for use with juvenile offenders. The current study focused on reliability, discriminant

validity, convergent validity, and the higher-order factor structure of the BASC-2 within a

sample of juvenile offenders. Results of this study have the potential to strengthen the evidence base for

assessment with juvenile offenders. If validated as a broad screener for conduct problems and

related internalizing symptoms, the BASC-2 could aid psychologists and others involved in the

treatment, prevention, and rehabilitation of juvenile offenders.

General Hypotheses

The general hypothesis involved the validity of the BASC-2 in the Juvenile Offender

population. This general hypothesis led to specific questions. Will the BASC-2 scales

demonstrate adequate levels of internal consistency in the current sample? Will the BASC-2

scales correlate in theoretically predicted directions within its own scales and with the scales of

the MMPI-A? Will the higher-order factor structure be confirmed in the current study? Will

alternative higher-order factors emerge that explain the inter-scale correlations of the BASC-2

within a juvenile offender sample?

Null Hypothesis 1: The BASC-2’s scales will not demonstrate adequate levels of internal

consistency in the current sample.

Null Hypothesis 2: The BASC-2 scales will not correlate in theoretically predicted

directions within its own scales.

 

Null Hypothesis 3: The BASC-2 scales will not correlate in theoretically predicted

directions with the scales of the MMPI-A.

Null Hypothesis 4: The higher-order factor structure will not be confirmed in the sample.

Null Hypothesis 5: No alternative higher-order factors will emerge to explain the inter-

scale correlations of the BASC-2 within a juvenile offender sample.

Definition of Terms

Juvenile Offender: In the current study, juvenile offender was defined as any youth, 18 years or

younger, who has been arrested for committing a crime or violating a law. Also, the term juvenile

offender is used synonymously with the term juvenile delinquent.

Construct: A construct was defined in this study as a hypothetical phenomenon which cannot be

directly observed and is therefore latent.

Evidence-Based Treatment (EBT): The predominant definition in the literature for evidence-

based treatment is the recognition of the connection between current research on treatment

efficacy, the criteria for the research evidence, and the use of these treatments in empirically

validated ways.

Evidence-Based Assessment (EBA): Evidence-based assessment is an approach that utilizes

research and theory to select the constructs, methods, and measures for an assessment purpose

and the process of the assessment.

 

CHAPTER 2

REVIEW OF RELATED RESEARCH

Evidence-Based Assessment

The evidence-based treatment (EBT) or evidence-based practice in psychology (EBPP)

movement has its earliest roots with Lightner Witmer’s first clinic in 1896, but the modern state

of the movement has gained speed over the past 20 years (American Psychological Association,

2006). The literature is replete with studies reporting empirical support of specific treatments for

use with clients (see Silverman & Hinshaw, 2008, for a review) and there are even guidelines to

help evaluate EBT guidelines (American Psychological Association, 2002). In this same light,

psychologists are calling for evidence-based assessment (EBA), and Achenbach (2005) notes

that “without EBA, EBT may be like a magnificent house with no foundation” (p. 547).

Hunsley and Mash (2007) describe EBA as a three-pronged approach including research

and theory, methods and measures, and the process of an unfolding assessment. They explain

assessment as being a decision-making process which unfolds by iteratively testing new

hypotheses. The process entails integrating and interpreting data from different instruments and

informants for the explicit purpose of the unfolding assessment. EBA guidelines or standards

would provide a level of assurance that assessment procedures and instruments are used for valid

purposes based on research and theory; however, to date these guidelines are still in

developmental stages, which leaves room for faulty assessment procedures. The clinical

application of several commonly used assessments (specifically the Rorschach and Thematic

Apperception Test) appears to have “outstripped empirical evidence” (Hunsley & Mash, 2007,

p. 31), which means many evaluations are being conducted with tests that do not meet professional

standards.


 

Mash and Hunsley (2005) note that the use of assessments only becomes valid when the

purposes and populations for that assessment have been evaluated and deemed appropriate.

Mash and Hunsley call for “replicated evidence for a measure's concurrent, predictive,

discriminative, and (ideally) incremental validity” (p. 372) to establish it as evidence-based. One

study with a convenience sample is not enough. Hunsley and Mash (2007) discuss three critical

aspects of EBA: (1) research and theory, (2) psychometrics, and (3) the entire assessment process.

Research and Theory

In discussing the first aspect of EBA (research and theory), Hunsley and Mash (2007) explain that theory

and research findings on normal development and psychopathology are necessary guides for the

selection of assessments and the processes to appraise the relevant constructs of interest. As

Smith (2005) explains, evaluating a measure involves evaluating the theory behind that measure.

McMahon and Frick (2005) highlight how critical it is that “assessment strategies used in

practice are also informed by research findings” (p. 477). In their seminal article on construct

validity, Cronbach and Meehl (1955) state the following:

“A rigorous (though perhaps probabilistic) chain of inference is required to

establish a test as a measure of a construct. To validate a claim that a test

measures a construct, a nomological net surrounding the concept must exist.

When a construct is fairly new, there may be few specifiable associations by

which to pin down the concept. As research proceeds, the construct sends out

roots in many directions, which attach it to more and more facts or other

constructs” (p. 291).


 

Cronbach and Meehl’s (1955) discussion of a nomological net speaks to the need for a

coherent theory to exist around the construct of interest. With such a theory in place, hypotheses

of associations can be made for the construct (as measured) with other constructs (again, as

measured). These associations can then be tested to add to the validation of the target measure

and target construct.

Reliability and Validity

In the second aspect of EBA, Hunsley and Mash (2007) discuss selecting

psychometrically sound instruments. They discuss the necessity of using instruments which have

demonstrated adequate levels of reliability, validity, and incremental validity for each purpose

for which they are used. Hunsley and Mash (2007) described the current professional standards

of a psychometrically sound instrument as including standardizations, relevant norms, and

appropriate levels of reliability and validity. They also assert that “blanket recommendations to

use reliable and valid measures when evaluating treatments are tantamount to writing a recipe for

baking hippopotamus cookies that begins with the instruction ‘use one hippopotamus,’ without

directions for securing the main ingredient” (Mash & Hunsley, 2005; p. 364).

Reliability and validity are not properties of the instrument itself; they are properties of

the specific use of that instrument, which makes it difficult to pinpoint exact criteria. The

“hippopotamus” in this sense is the overarching concept of construct validity which has been

called “an umbrella term, describing a process for theory validation that subsumes specific test

validation operations” (Smith, 2005; p. 396). The test-user therefore needs clear guidance on

how to evaluate an instrument for his or her specific purposes and population or sample.

Internal consistency has been described as “a measure of the ‘here-and-now, on-the-spot’

reliability” (p. 291; Charter, 2003), and also as the correlation estimate of the current instrument

score with an alternate form test that was never administered (Ponterotto & Ruckdeschel, 2007).

Reliability levels acceptable for research are lower than the level (.90) required of clinicians and

others who make high-stakes decisions. Ponterotto and Ruckdeschel (2007) note that internal

consistency is affected by both the number of items in the subscale and the mean inter-item

correlation within it. With the inter-item correlation held constant, adding items will increase

alpha. Likewise, with the number of items held constant, higher inter-item correlations increase alpha. Interestingly, greater sample variance

can increase alpha because “when scores are bunched together, a small change in raw score will

lead to marked changes in relative rankings. If variance is greater, it is more likely that a small

change in raw score will not affect the relative rankings” (p.1001; Ponterotto & Ruckdeschel,

2007).
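
The dependence of alpha on scale length and inter-item correlation can be illustrated with a brief sketch. It assumes the standardized form of coefficient alpha, alpha = k*r / (1 + (k - 1)*r), where k is the number of items and r is the mean inter-item correlation. The function name and the example values are illustrative only and are not drawn from Ponterotto and Ruckdeschel (2007).

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized coefficient alpha from item count and mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


# Hold the mean inter-item correlation at .30 and lengthen the scale.
for k in (6, 12, 24):
    print(f"items={k:2d}  alpha={standardized_alpha(k, 0.30):.2f}")

# Hold the scale at 10 items and raise the mean inter-item correlation.
for r in (0.15, 0.30, 0.45):
    print(f"mean r={r:.2f}  alpha={standardized_alpha(10, r):.2f}")
```

Under these assumptions, alpha rises along both loops, which is the pattern described above.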

Ponterotto and Ruckdeschel (2007) provide reliability guidelines for researchers and a

reliability evaluation matrix (Table 1) that is intended to be more broadly applicable than

Cicchetti’s (1994) familiar guidelines, which categorize values below .70 as unacceptable, .70 to .79

as fair, .80 to .89 as good, and .90 or above as excellent. Ponterotto and Ruckdeschel (2007)

recommend that researchers 1) calculate coefficient alpha for every subscale in each study since

reliability is not a function of the test itself, but a function of the scores within a sample; 2) report

the mean inter-item correlations for the subscale (Clark and Watson, 1995, suggest a range of .15 to

.20 for broad constructs and .40 to .50 for narrow constructs); 3) construct confidence intervals

and note whether any subscales cross qualitative ratings on their provided matrix (see Table 1); 4)

report the number of items per subscale and sample sizes; and 5) remember that coefficient alpha

is the standard for internal consistency estimates. The authors caution that even if a subscale

reaches a moderate reliability on the matrix, the error variance should still be considered (for

instance, a 6-item scale with an alpha of .65 and fewer than 100 subjects would have an error variance

of 35% (that is, 1 - .65) despite its moderate reliability rating).

Table 1. Matrix for Evaluating Internal Consistency Alphas

                                Sample Size
Items Per Scale    Rating       N < 100    N = 100-300    N > 300

< 6                Excellent    .75        .80            .85
                   Good         .70        .75            .80
                   Moderate     .65        .70            .75
                   Fair         .60        .65            .70

7-11               Excellent    .80        .85            .90
                   Good         .75        .80            .85
                   Moderate     .70        .75            .80
                   Fair         .65        .70            .75

> 12               Excellent    .85        .90            .90
                   Good         .80        .85            -
                   Moderate     .75        .80            .85
                   Fair         .70        .75            .80

*Adapted from Ponterotto and Ruckdeschel (2007)
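
To make the use of the matrix concrete, the following sketch looks up a qualitative rating for a given alpha, scale length, and sample size. Several choices here are my own assumptions rather than part of the published matrix: the tabled values are treated as minimum cutoffs, scales of exactly 6 items are placed in the first band and scales of exactly 12 items in the last, the cell shown as "-" is skipped, and the label "Below Fair" is a hypothetical name for values under the lowest cutoff.

```python
# Minimum alpha for each rating, transcribed from Table 1.
# Each tuple is ordered (N < 100, N = 100-300, N > 300); None marks the "-" cell.
MATRIX = {
    "short": {"Excellent": (0.75, 0.80, 0.85), "Good": (0.70, 0.75, 0.80),
              "Moderate": (0.65, 0.70, 0.75), "Fair": (0.60, 0.65, 0.70)},
    "medium": {"Excellent": (0.80, 0.85, 0.90), "Good": (0.75, 0.80, 0.85),
               "Moderate": (0.70, 0.75, 0.80), "Fair": (0.65, 0.70, 0.75)},
    "long": {"Excellent": (0.85, 0.90, 0.90), "Good": (0.80, 0.85, None),
             "Moderate": (0.75, 0.80, 0.85), "Fair": (0.70, 0.75, 0.80)},
}


def rate_alpha(alpha: float, n_items: int, sample_size: int) -> str:
    """Return the highest rating whose cutoff the alpha meets (assumed reading of Table 1)."""
    # Assumed band boundaries: up to 6 items, 7 to 11 items, 12 or more items.
    if n_items <= 6:
        band = "short"
    elif n_items <= 11:
        band = "medium"
    else:
        band = "long"

    # Column index follows the sample-size headings of the table.
    if sample_size < 100:
        column = 0
    elif sample_size <= 300:
        column = 1
    else:
        column = 2

    for rating in ("Excellent", "Good", "Moderate", "Fair"):
        cutoff = MATRIX[band][rating][column]
        if cutoff is not None and alpha >= cutoff:
            return rating
    return "Below Fair"  # hypothetical label, not part of the published matrix


# The example from the text: a 6-item scale, alpha of .65, fewer than 100 subjects.
print(rate_alpha(0.65, n_items=6, sample_size=90))
```

The error variance for the same scale is simply 1 minus alpha, so a "Moderate" rating here still leaves 35% of score variance attributable to error.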

Messick (1995) noted that “validity is not a property of the test or assessment as such, but

rather of the meaning of the test scores” (p. 741). Therefore, validity exists in the use of the test

or measure and not necessarily in the test itself. Since an individual measure is “just one of an

extensible set of indicators of the construct” (p. 742; Messick, 1995), the validation of the

various measures for their uses within specific populations adds to the evidence for validity of

that construct.

Smith (2005) provides the perspective that evidence of construct validity is always open

to criticism and reevaluation. He notes that whenever a new investigation of an instrument’s

construct validity is undertaken, the new pieces of evidence add to the burgeoning argument for

or against its validation. Similarly, Messick (1995) characterizes validity as being “broadly

defined as nothing less than an evaluative summary of both the evidence for and the actual—as

well as potential—consequences of score interpretation and use (i.e., construct validity conceived

comprehensively)” (p. 742).

Smith (2005) offers a five-step model of construct validation which includes (1)

specification of theory, (2) development of hypotheses predicted by that theory, (3) specification

of research designs to test the hypotheses, (4) interpretation of the fit between resulting data and

predictions, and (5) revision of the theory and the constructs. Smith describes step 4 as the most

essential part of validation research; it encompasses the typical validity studies (convergent,

discriminant, etc.).

Messick (1995) identifies six aspects of construct validity that function as validity criteria:

content, substantive, structural, generalizability, external, and consequential aspects. The content

aspect involves content relevance, representativeness, and technical quality. Substantive involves

theoretical and empirical evidence. Structural involves the scoring and construct structure.

Generalizability involves the extent to which the score properties and interpretations generalize.

External involves convergent and discriminant validity. Consequential involves the value of the

scores for decision-making and the consequences of test use, or stated differently, the clinical

usefulness of assessments.

Mash and Hunsley (2005) state, “solid evidence to support the usefulness of assessment

for improving treatment outcomes for children who are assessed is lacking” (p. 362). Their

declaration is a call for studies of incremental and clinical utility of instruments. Hunsley (2003)

describes incremental validity as the increase in predictive or discriminative power gained by the

addition of the instrument of focus. Hunsley adds that when the question focuses on the

meaningfulness of the increase, clinical utility is being addressed. Clinical utility involves a

weighing of the costs (time and money), decision-making improvements, and treatment impacts

of the instrument in reference to other available instruments, with the ultimate question being

whether or not to include the instrument in an assessment.

While discussing clinical utility of assessment instruments, Hunsley and Mash (2007)

articulate that “utility, even from an instrument as intensively researched as the MMPI-2, should

not be assumed” (p.33). They caution that little research is conducted on the clinical utility or

incremental validity of assessment instruments. Incremental validity is essentially established

when an instrument adds predictive information beyond what is already available from other

sources, with consideration given to the “cost” of the assessment in both time and money.

Hunsley and Mash (2007) state with regard to clinical utility that “an emphasis on garnering

evidence regarding actual improvements in both decisions made by clinicians and service

outcomes experienced by patients and clients is at the heart of clinical utility” (p.45).
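
One common way to quantify incremental validity, offered here only as an illustrative sketch and not as the procedure used by Hunsley (2003) or Hunsley and Mash (2007), is the gain in explained criterion variance (the change in R-squared) when a new instrument is added to information already in hand. The sketch uses the closed-form R-squared for two predictors computed from their correlations; the function name and all correlation values are hypothetical.

```python
def incremental_r2(r_crit_existing: float, r_crit_new: float, r_existing_new: float) -> float:
    """Gain in R-squared from adding a new predictor to one existing predictor.

    r_crit_existing: correlation of the existing measure with the criterion
    r_crit_new:      correlation of the new measure with the criterion
    r_existing_new:  correlation between the two measures
    """
    r2_existing = r_crit_existing ** 2
    r2_both = (
        r_crit_existing ** 2
        + r_crit_new ** 2
        - 2 * r_crit_existing * r_crit_new * r_existing_new
    ) / (1 - r_existing_new ** 2)
    return r2_both - r2_existing


# A new measure that is unrelated to the existing one contributes its full r-squared.
print(round(incremental_r2(0.50, 0.30, 0.00), 3))

# A new measure that overlaps heavily with the existing one may contribute almost nothing.
print(round(incremental_r2(0.50, 0.30, 0.60), 3))
```

Whether a given gain is large enough to justify the time and expense of the added instrument is the separate question of clinical utility.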

The Assessment Process

The third aspect of EBA, the entire assessment process, is described as having little

supporting evidence to date; the authors argue that the assessment process itself should be empirically validated

(Hunsley & Mash, 2007). While presenting the Wechsler intelligence tests as being “among the

psychometrically strongest psychological instruments available” (p.32), Hunsley and Mash

(2007) warn against the common practice of interpreting inter-subtest score discrepancies. They

note that “nothing is to be gained, and much is to be potentially lost, by considering subtest

profiles” (p.32). This stands as an excellent example of a highly regarded test being used in a

manner not based in evidence.

Kazdin (2005) states, “in principle no finite number of studies can exhaust one type of

validity (e.g., construct validity) or provide normative data from all possible samples (e.g.,

various combinations of ethnic, race, gender, sex, and age groups) at different points in time

(e.g., cohorts)” (p.550). Therefore, the process of validation is continuous without ever “proving”

validity, but rather accumulating evidence in support of it (Smith, 2005). Mash and Hunsley

(2005) specify that most child assessments are conducted for the purposes of diagnosis and case

formulation, screening, prognosis and predictions, treatment design and planning, treatment

monitoring, and treatment evaluation. In light of the need to evaluate the validity of the various

uses of instruments, their discussion of the myriad uses brings to mind the mythological story of

Sisyphus who was condemned to an eternity of rolling a boulder uphill only to watch it roll back

down again. The good news, however, is that our task of building evidence is not as hopeless as

Sisyphus’s task may seem. We are not building sand castles at high tide, but rather mounting

boulders of evidence that will provide the foundation for validating the uses of instruments.

EBA for Specific Purposes with Children and Adolescents

For the current study, the greater population of interest is that of childhood/adolescence.

As Kazdin (2005) notes, a problem with validating measures of childhood dysfunction is the lack

of true gold standards for comparisons. It is difficult to fully evaluate the validity of an

instrument’s use without an established criterion. Since psychology is generally interested in latent

constructs, the measurement of such constructs is difficult to verify. Criterion-related validity

provides one way of validating an instrument’s measurement of a construct. Cronbach and Meehl

(1955) describe criterion-related validity as subsuming predictive validity and concurrent

validity. As discussed previously, instrument validation does not emerge from just one type of

validity or from one study of its psychometrics. Evidence-based assessment guidelines, although

not yet complete, are being established for specific assessment purposes.

Fletcher, Francis, Morris, and Lyon (2005) explain that youth with a learning disorder

(LD) are different from youth with mental retardation, emotional/behavioral disturbances, or

environmental causes of underachievement, although they share similar symptom presentations.

The authors note the inherent difficulty in ruling out other disorders or influencing factors when

presentations have symptom overlap. They evaluated four approaches to the assessment of LD.

The first and most common approach, IQ/achievement discrepancy (two-test model), had

problems with regression to the mean, meaning that on a subsequent or alternative test, an

individual’s extreme score will tend to move toward the mean (rising if it was low, falling if it was high). There were also issues

with the unreliability of scores near discrepancy cut-offs. This approach was also shown to

have limited validity in meta-analytic studies. The second approach evaluated, the low

achievement approach, has problems with measurement error. The third approach, intra-

individual differences, was noted as having validity problems. Response to instruction (RTI), the

fourth approach, has demonstrated reliability and validity, but is not fully adequate for identifying

LD.
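
The regression-to-the-mean problem noted for the discrepancy approach can be sketched numerically. Under classical test theory, the expected score on a parallel retest is the mean plus the reliability times the observed deviation from the mean. The standard-score mean of 100, the reliability of .90, and the function name below are illustrative assumptions of mine, not values reported by Fletcher et al. (2005).

```python
def expected_retest_score(observed: float, reliability: float, mean: float = 100.0) -> float:
    """Expected score on a parallel retest under classical test theory."""
    return mean + reliability * (observed - mean)


# Extreme scores in either direction are expected to drift back toward the mean.
for observed in (70, 85, 115, 130):
    print(observed, "->", round(expected_retest_score(observed, reliability=0.90), 1))
```

Because both the IQ score and the achievement score regress in this way, a discrepancy observed on one occasion may shrink or disappear on the next without any real change in the child.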

In terms of the process of assessment for LD, Fletcher, Francis, Morris, and Lyon (2005)

boldly state, “We find little value in the idea of evaluating a child in a single assessment and

concluding that the child has LD based on an IQ-achievement discrepancy, low achievement, or

profiles on neuropsychological tests, largely because such assessments are not directly related to

treatment and the diagnosis itself is not reliable” (p.519). They state that children should not be

diagnosed with LD until proper treatment has been attempted. They insist upon first

allowing the child the opportunity to learn and therefore endorse a “treat and test” model over a

“test and treat” model. They recommend a hybrid-model combining the RTI and low-

achievement approaches.

In a similar vein, Silverman and Ollendick (2005) attempt to provide an overview of

where the field is in its evidence-based assessment of anxiety-related disorders in children. They

define anxiety as including avoidance, worry, and physiological arousal. Silverman and

Ollendick advocate for a pragmatic approach to assessment that involves selecting the instrument

that will be most useful for the setting, not just the test user’s favorite instrument. The authors

caution against settling into an assessment routine that does not involve a concerted effort to

select the test based on considerations of person and purpose. They explain that clinical

interviews are prone to error and interviewer-based variance; however, they state that

semi-structured or structured interviews “are necessary from an evidence-based perspective” (p.384).

Therefore, the selection of the procedures and protocols for the interview is paramount for

maintaining an evidence-based assessment. They caution that most studies of the psychometric

properties of rating scales for anxiety (including the Revised Manifest Anxiety Scale) have been

conducted only with community samples, thus highlighting the need for cross-validation of the

instruments in other samples. They state the need for verifying the “real world” anxiety-related

symptoms associated with norm-referenced scores on the rating scales. The scores indicate a

position on a distribution, which may or may not reflect the magnitude of anxiety.

Silverman and Ollendick (2005) explain that it is important to assess for comorbidity of

disorders and suggest a sequence of assessing for primary anxiety disorder diagnoses co-

occurring first with other anxiety disorders, then with depression, and finally with externalizing

disorders like ADHD, oppositional defiant disorder, or conduct disorder. The authors note that

youth with comorbid disorders experience more “impairment” and that their symptoms are more

likely to persist. In the article, the authors voice their struggle with the notion of how much

evidence is needed before describing an instrument or method as evidence-based. They describe

grappling with whether or not to even include recommendations. They therefore provide a

tentative set of recommendations: heed the arbitrary metrics of instruments, be aware that the

parent and youth reports are often discordant and consider both without pre-specifying one to be

better than the other, assess for comorbid disorders, and use an interview with a rating scale for

screening purposes.

Youngstrom, Findling, Youngstrom, and Calabrese (2005) reviewed the literature on

pediatric bipolar disorder (PBD). They note the field is calling for earlier onset diagnoses of

bipolar disorder. The authors cite research showing that 95,000 children and adolescents were

being medicated for bipolar disorder in 2001. They acknowledge that it is unknown whether the

youth currently assessed as meeting bipolar disorder criteria will demonstrate the classic adult

presentation when older. Without longitudinal studies with agreed-upon diagnostic criteria for

PBD, the course of the disorder may never be known.

In terms of attitudes toward PBD, Youngstrom, Findling, Youngstrom, and Calabrese

(2005) identify different types of practitioners who either 1) do not endorse the diagnosis in

childhood, 2) believe ADHD medication failure equals BD, or 3) feel unprepared to assess such a

low base-rate disorder. They explain that due to its strong heritability, the genetic predispositions

for BD exist from day one and, as noted previously, longitudinal and genetic studies will be

needed to verify the continuity of what is thought of as PBD with adult BD. They suggest

that family history does not rule youth in or out for a BD diagnosis, but provides useful

information, specifically for treatment considerations since lithium response may demonstrate

heritability. They explain the most common comorbid disorders in youth are ADHD,

oppositional defiant disorder, conduct disorder, and learning disorders. The authors state that

comorbidity may complicate the diagnosis of PBD because clinicians will not see a clean-cut

version of BD and may recognize the co-occurring disorder to the exclusion of BD. They suggest

using personal baselines to differentiate (for instance) between the child’s normal level of high

energy/activity and his or her manic state. They also caution about symptom overlap (bipolar

depressive episode vs. unipolar and ADHD vs. mania).

Youngstrom, Findling, Youngstrom, and Calabrese (2005) recommend utilizing

multi-informant interviews or gathering collateral data (e.g., school or medical records) and specify

information to consider in a diagnostic interview for PBD. They recommend that practitioners

maintain an open stance to encountering PBD (rather than pretending it does not exist), establish base

rates for their particular setting, and gather a detailed family history. Youngstrom et al. suggest a

truncated approach to assessment beginning with screening instruments that lead into more

focused evaluation. They endorse using information from the assessment in an actuarial

approach to estimate the individual’s odds of having the disorder. The authors advise using

multi-source/informant data, evaluating for spontaneous changes of mood, assessing for elevated

mood and grandiosity (which are symptoms that are more specific to BD than other related

symptoms like irritability and explosiveness), and engaging in ongoing assessment to the extent

possible by extending the interview over multiple sessions or throughout treatment. They

recommend continuous evaluation of key constructs during treatment. With reference to the

literature on PBD, they suggest maintaining a critical perspective, because operational definitions of PBD are not uniform

across studies, and staying current, because the literature on PBD changes

quickly.
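
The actuarial approach described above, in which a setting's base rate is combined with assessment results to estimate an individual's odds of having the disorder, can be sketched with Bayes' theorem in its odds form. The base rates, sensitivity, and specificity below are hypothetical values chosen for illustration, and the function name is my own; none of them are taken from Youngstrom et al. (2005).

```python
def posterior_probability(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Probability of the disorder after a positive screen, via the odds form of Bayes' theorem."""
    prior_odds = base_rate / (1 - base_rate)
    positive_likelihood_ratio = sensitivity / (1 - specificity)
    posterior_odds = prior_odds * positive_likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# The same positive screen means very different things in low and high base-rate settings.
for base_rate in (0.01, 0.05, 0.30):
    p = posterior_probability(base_rate, sensitivity=0.80, specificity=0.90)
    print(f"base rate={base_rate:.2f}  probability after positive screen={p:.2f}")
```

This dependence on the local base rate is one reason the authors ask practitioners to establish base rates for their particular setting.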

In their review of the adolescent depression literature, Klein, Dougherty, and Olino (2005)

describe support for the continuity of adolescent depression into adult depression with similar

presentations between adolescents and adults. The authors note that clinicians must determine

whether criteria for major depressive disorder (MDD) or dysthymic disorder (DD) are met, rule out exclusionary diagnoses (medical conditions,

bipolar disorder, etc.), assess symptoms that may affect treatment (e.g., suicidality), explore the previous

course of depression, evaluate comorbidity, and assess social functioning, family environment,

school functioning, stressors, traumas, family history, and previous treatment outcomes. They

recommend using multiple information sources and caution clinicians to be aware of the

attenuation effect (the tendency for symptom ratings to decrease with multiple assessments)

which might mimic clinical improvement during treatment. They state that assessment of

depression in children typically involves interviews and/or rating scales. They report that most

depression rating scales do not discriminate well between depression and anxiety and that there

is limited research on the incremental validity of interviews and rating scales for depression.

They note that validation of treatment utility of the instruments should be a priority, while noting

that there are few guidelines for determining clinical meaningfulness of rating scale scores to

evaluate the ongoing outcomes of treatment. They recommend using a semi-structured interview

like the K-SADS as well as using clinician and self-report rating scales for treatment evaluation.

They also recommend using parent and self-rating scales for screening, but caution against their

use in high or low base rate settings because of limited evidence of specificity and sensitivity.

Pelham, Fabiano, and Massetti (2005) discuss the evidence-based assessment of attention

deficit/hyperactivity disorder (ADHD). They report that effective screenings for ADHD can be

made quickly and economically with parent and teacher assessments. They state that lengthy

DSM-based interviews do not add any incremental validity over brief multi-informant rating

scales like the BASC (they did not review the BASC-2) and that research does not support the

notion that elaborate DSM-based diagnostic interviews increase diagnostic precision. They also

explain that adding information about classroom verbal intrusions, seatwork completion and

accuracy, and evaluations of whether the child has the required supplies at school would increase

diagnostic confidence beyond multi-informant rating scales by including more objective

measurements. The authors state that once the diagnosis is made, assessment and treatment focus

should turn toward the child’s specific impairments and what causes, maintains, or exacerbates

them (the client in context). Ongoing assessment should not focus on the DSM’s diagnostic

criteria beyond the initial diagnosis. Pelham, Fabiano, and Massetti (2005) state that rating scales

“must be combined with a clinical interview or additional paper-and-pencil questions” (p.416) to

rule out other diagnoses. They also suggest that evaluations include ecological

areas of functioning (social relations, family relations, teacher relations, and academic progress).

They call for future research to cross-validate instruments with other demographic groupings or

samples.

McMahon and Frick (2005) summarize the research on conduct problems in adolescents

and the implications for evidence-based assessment. In this review, they recommend the BASC-2

as an assessment for conduct problems (CP) for the following purposes: a broad screener for CP

behaviors, a focused assessment of overt/covert CP, and a broad screener for comorbid

adjustment and peer interaction problems. Furthermore, they cite the BASC-2 among the few

instruments that have been used “extensively in clinical practice and research with children and

adolescents with CP” (p.481).

McMahon and Frick (2005) state, “understanding the common comorbid problems has

proven to be very important for understanding and treating children and adolescents with CP” (p.

485). They describe the primary tasks of diagnosis and screening as (1) identifying the types and

severity of the youth's problems and determining associated impairments; (2)

evaluating for other impairments from other disorders; (3) determining antecedents and factors

exacerbating or contributing to the continuation of these problems; and (4) determining which

developmental pathway is most consistent with the youth's pattern of CP, comorbid conditions,

and risk factors.

Behavior Assessment System for Children, Second Edition (BASC-2)

There is a dearth of research on Reynolds and Kamphaus’s (2004) recently published

BASC-2. The literature contains many publications using the original BASC with possibly the

most recent study being completed by Evans and Oehler-Stinnett (2008). Due to the paucity of

research on the BASC-2, a complete review of this literature is possible.

In one study, Bergeron, Floyd, McCormack, and Farmer (2008) investigated the

dependability of externalizing composites and scales on the BASC-2 Teacher Rating Scales-Child

(TRS-C) and the Achenbach System of Empirically Based Assessment Teacher Report

Form for Ages 6-18 (ASEBA TRF). In their study, they evaluated the variance associated with

students, raters, instruments, and occasions. The researchers had 6 teacher pairs (12 teachers

across 6 classes) rate a random set of 10 students in their classes on the BASC-2 TRS-C and the

ASEBA TRF twice over a period of 1-3 weeks. For the BASC-2 TRS-C, test-retest correlations

were all between .83 and .93, inter-rater correlations were between .72 and .79, and the

correlations with the ASEBA TRF were between .86 and .90 for the externalizing scales and

composite.

In another study, Heng and Wirrell (2006) utilized the BASC-2 Parent Rating Scales,

Child and Adolescent versions (BASC-2 PRS-C and BASC-2 PRS-A, respectively) in a study of

youth with migraines. The researchers investigated the between-group differences (N=69) on the

BASC-2 PRS composites and subscales. The groups were composed of youth with migraines and

their siblings (as a control group) who did not have headaches. The researchers found two

significant differences: the migraine group was higher on the Internalizing Composite and higher

on the Somatization Subscale. For the youth with migraines, the researchers found significant

correlations between total sleep disturbance scores (as measured by the Child Sleep Habits

Questionnaire) and the following BASC-2 scales and composites: Hyperactivity, Depression,

Somatization, Atypicality, Attention Problems, Adaptability, Activities of Daily Living,

Behavioral Symptoms, Externalizing Behavior, Internalizing Behavior, and Adaptive Skills.

In her review, Tan (2007) noted that the BASC-2 can “purportedly be used to assess all

aspects of the federal definition of severe emotional disturbance, to design Individualized

Education Programs (IEPs) for emotionally disturbed children in the manifestation determination

process, and to develop family service plans” (p. 121). In reviewing the reliability estimates for

scales and composites, she concluded that “individual scales should not be used for important

decisions about individual students” and that “caution should thus be exercised in using

individual scales of the PRS and SRP” (p. 122). She concluded “the psychometric properties of

the BASC-2 are adequate, and the composite scales can be used with confidence, but

interpretation of individual scales should be done with caution” (p. 124).

In my own review of the BASC-2 manual (Reynolds & Kamphaus, 2004), I drew

conclusions similar to Tan’s (2007) about scale reliability estimates. Internal consistency

estimates for scales and composites in the general and clinical norm samples were nearly all

adequately high, but the composite reliabilities were consistently higher. Specifically, the scale

internal consistencies ranged from .61 to .89 for the general sample and from .64 to .90 for the clinical

sample. Using Ponterotto and Ruckdeschel’s (2007) guidelines for evaluating internal

consistency alphas with samples larger than 300, the cutoffs would be .90 for excellent, .85 for

good, .80 for moderate, and .75 for fair. The reader can clearly see that the scale alphas ranged

from below the fair cutoff to the excellent rating category.

On the other hand, the composites ranged from .83 to .95 for the general sample and .82

to .96 for the clinical sample (Reynolds & Kamphaus, 2004). These values occupy rating

categories between moderate and excellent. Test-retest reliability was investigated on intervals

between 14 and 51 days for 107 adolescents, with adjusted correlations ranging from .74 to .84 for

composites and from .61 to .84 for scales, nearly identical to internal consistency values for scales, but

slightly lower for the composites. Standard error of measurement (SEM) values are derived directly

from the reliability estimates and so follow the same pattern, but may be easier to conceptualize because they

can be presented in T-score units. These values range (in T-score units) from 2.0 to 4.1 for the

general composites and 2.0 to 4.4 for the clinical composites, while the general scales range from

3.3 to 6.2 and the clinical scales range from 3.2 to 6.2. The large potential variation in T-scores

(6.2) due to unreliability of scales lends credence to Tan’s (2007) statement that clinical

decisions should not be based solely on individual scales and that the composites should be used instead.
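
The link between reliability and the SEM can be shown with the standard formula SEM = SD * sqrt(1 - reliability). The sketch below assumes a T-score metric with a standard deviation of 10, which is my assumption about the metric behind the reported values; the function name is illustrative.

```python
import math


def standard_error_of_measurement(reliability: float, sd: float = 10.0) -> float:
    """SEM in score units; sd=10 assumes a T-score metric (mean 50, SD 10)."""
    return sd * math.sqrt(1.0 - reliability)


# Reliabilities taken from the ranges reported above for composites and scales.
for reliability in (0.96, 0.83, 0.61):
    sem = standard_error_of_measurement(reliability)
    print(f"reliability={reliability:.2f}  SEM={sem:.1f}  95% band=+/-{1.96 * sem:.1f}")
```

Under this assumption, a reliability of .61 corresponds to an SEM of about 6.2 points, so an observed scale score could plausibly sit well over 10 points away from the true score.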

Weis and Smenner (2007) completed a study of the Behavior Assessment System for

Children, Self-Report, Adolescent version (BASC SRP-A) (note that they did not use the BASC-2)

with 970 adolescents (16-18 years), 290 of whom also completed the Minnesota Multiphasic

Personality Inventory, Adolescent version (MMPI-A). Of the 970 adolescents, 75% were male.

Ethnicities included Caucasian (60%), Latino (24%), African American (10%), Asian American

(5%), and Native American (1%). Adolescents were being treated at two residential treatment

programs for youth with disruptive behavior problems. The reasons for referral included chronic

truancy (92%), substance abuse (75%), nonviolent antisocial behavior leading to arrest (50%),

and physical aggression leading to arrest (25%). Sixty-one percent had been previously arrested and

14% had been removed from parents’ homes because of behavior problems.

Weis and Smenner (2007) stated “adequate fit of the proposed model to this referred

sample would support the generalizability of the factor structure and the use of its components

with disruptive youth” (p. 113). They also stated “clinically significant deviations in norm-

referenced scores would support the utility of the SRP as a means to identify at-risk youth” (p.

113). Results of a confirmatory factor analysis (CFA) supported the BASC composites, but

suggested that the Sense of Inadequacy and Locus of Control scales load on the School

Maladjustment composite and the Depression scale loads on the Clinical Maladjustment and

Personal Adjustment composites. They used Steiger’s (1980) method to test the magnitude of

correlations between scales that were theoretically similar and dissimilar; the results were mixed

for convergent and discriminant validity. They noted that the Clinical Maladjustment composite

was correlated with the MMPI-A clinical scales, but was better viewed as an “omnibus measure

of social, emotional, and behavioral dysfunction rather than as a measure of internalizing

symptoms per se” (p. 123) and that Anxiety, Depression, Somatization, and Sense of Inadequacy

scales showed the best convergent and discriminant validity. They noted the Locus of Control

scale should “be viewed as a general indicator of psychosocial distress and impairment rather

than as a measure of locus of control” (p. 124). The Personal Adjustment composite had mixed

results; the Relations with Parents scale was judged to be a “relatively pure indicator of family

conflict and disruptive behavior,” whereas the Interpersonal Relations, Self-Esteem, and

Self-Reliance scales seemed to measure “the absence of depression, anxiety, and social impairment” (p.

124). The researchers found little support for the convergent/discriminant or discriminative

validity of the School Maladjustment composite; however, the Sensation Seeking scale was

judged a good measure of “impulsivity, emotionality, and extroversion” (p. 124). They then

separated the adolescents into groups based on problems at home (BASC PRS) and problems at

school (BASC TRS), with scores greater than or equal to 70 on clinical scales or less than or equal to 30 on

adaptive scales defining the impaired group and scores less than 60 on clinical scales or greater than 40 on adaptive scales defining the normal

group.

In studies completed by Reynolds and Kamphaus (2004) the BASC-2 has demonstrated

appropriate levels of convergent and discriminant validity. Scale and composite validity was

investigated by the authors in several ways. They explored the scale intercorrelations and scale

factor groupings; correlations with other measures; and scale profiles of specific diagnostic

populations. Scale intercorrelations were in predicted directions with clinical scales being

positively related with other clinical scales and negatively related to adaptive scales. The

intercorrelations from the item development sample were submitted to confirmatory factor

analysis (CFA) and exploratory factor analysis (EFA). The authors began with a CFA model

based on the composites from the BASC; modification indexes (MIs) were used to improve model fit,

and a fourth factor emerged for the Inattentive and Hyperactivity scales. Reynolds

and Kamphaus then used EFA to explore 3-factor and 4-factor solutions for alternative scale

groupings for composites. The authors concluded that the 4-factor CFA solution was supported

by the EFA. Correlation studies with other measures provided additional support of scale validity

for the BASC-2 SRP-A. Clinical profiles were created for several diagnostic groups: Attention-

Deficit/Hyperactivity Disorder, Bipolar Disorder, Emotional/Behavioral Disturbance, Hearing

Impairment, Learning Disability, Mental Retardation or Developmental Delay, Motor

Impairment, Pervasive Developmental Disorders, and Speech or Language Disorder. T-score

mean profiles for each group were computed based on the general combined sex norms.

CHAPTER 3

METHOD

Description of Sample

The data for the current study were gathered as part of the standard intake

procedures for counseling and psychological evaluation clients referred by the Department of

Juvenile Justice to the Juvenile Counseling and Assessment Program (JCAP). All youth

consented to completing a battery of intake instruments prior to initiating counseling services

with JCAP or assessment instruments as part of a psychological evaluation. The sample for the

current study included 205 adolescents with an average age of 15.42 years. The sample was

52.2% male and 47.8% female. The data were collected as part of an

individual counseling intake (42.0%), group counseling intake (29.3%), psychological evaluation

(25.4%), or focused data collection at a detention center (3.4%). The grade-level distribution for

the sample was 2.6% in 6th grade, 9.7% in 7th grade, 23.0% in 8th grade, 35.2% in 9th grade,

21.4% in 10th grade, 7.7% in 11th grade, and .5% in 12th grade. The clients’ self-labeled ethnicity

distribution was 63.7% African-American, 20.4% Caucasian, 10.4% Hispanic/Latino, 1.5%

Asian-American/Pacific, 2.5% “Multiracial”, and .5% each for “Native-American-Mexican”,

“White-Mexican”, and “Caucasian-Egyptian”.

Statistical Analysis

The current study utilized a combination of techniques to evaluate the psychometric

properties of the Behavioral Assessment System for Children, Second Edition (BASC-2) and the

higher-order factor structure of its scales. Internal consistency estimates were computed for each

scale of the instrument. The item response values were set to 0=Never, 1=Sometimes, 2=Often,

3=Almost Always and 0=False, 2=True; these values are the values used by Reynolds and

Kamphaus (2004) in development of the BASC-2. Convergent and discriminant validity were

evaluated with correlations between the BASC-2 scales and composites with MMPI-A scales in

theoretically meaningful directions. Confirmatory factor analysis (CFA) was used to evaluate the

fit of the sample’s covariance matrix of scale scores with the proposed higher-order factor

structure of the BASC-2 (Figure 1), as described by Reynolds and Kamphaus (2004). Exploratory

factor analysis (EFA) was then conducted to explore for alternative factor structures. The scale

scores of the BASC-2 represent individual first-level factors; the covariance matrix of these

scores was used for confirmatory analysis and the correlation matrix was used for factor

exploration.

Prior to beginning factor analysis, minimum sample sizes were determined for the CFA

and the EFA based on published recommendations (MacCallum, Widaman et al. 1999; Jackson 2001;

MacCallum, Widaman et al. 2001; Hogarty, Hines et al. 2005; Mundfrom, Shaw et al. 2005),

which gave good confidence that a sample of 200 response sets would be adequate for the CFA models

and to reproduce any “true” factors in the EFA. Next, the data were screened for outliers using an

SPSS macro (normtest) developed by DeCarlo (1997) and Stem-and-Leaf plots. The score

distributions were evaluated for skew and kurtosis, followed by a review of the Kaiser-Meyer-

Olkin Measure of Sampling Adequacy (provided by SPSS) to determine if the covariance and

correlation matrices were suited for factor analysis.

Gorsuch (1983) suggested using principal axis factoring (PAF) when exploring for factor

structure and using principal component analysis when reducing the number of items or scales.

Because the current study involved an exploratory portion, PAF was used to explore alternative

factor structures. The number of factors to rotate was identified with a combination of scree plot

evaluation (Zwick & Velicer, 1986), parallel analysis (Zwick & Velicer, 1986; O’Conner, 2000),

simple structure, and interpretability criteria. A direct oblimin (delta=0) rotation was used and

the cut-off value for factor loadings was set at .30.

Figure 1. Composite-to-Scale Relationships on the BASC-2 SRP-A. Note: the dotted lines denote inverse relationships, rectangles represent scales, and ovals represent composites.

[Figure not reproduced. Scales (rectangles): Attitude to School, Attitude to Teachers, Sensation Seeking, Atypicality, Locus of Control, Social Stress, Anxiety, Depression, Sense of Inadequacy, Somatization, Attention Problems, Hyperactivity, Relations with Parents, Interpersonal Relations, Self-Esteem, Self-Reliance. Composites (ovals): School Problems, Internalizing Problems, Inattention/Hyperactivity, Personal Adjustment, Emotional Symptoms.]

Instruments

The Behavioral Assessment System for Children-Second Edition (BASC-2) is a

multidimensional, multimethod assessment system for evaluating behavior and self-perceptions

of children and young adults (Reynolds and Kamphaus 2004). The BASC-2 consists of three

separate components: the rating scales, the Structured Developmental History (SDH) form, and

the Student Observation System (SOS) used to record classroom observations. The rating scales

consist of three versions: the Parent Rating Scale (PRS), the Teacher Rating Scale (TRS), and the

Self-Report of Personality (SRP). The system was designed to evaluate the student’s behaviors

from three perspectives: the student’s (self), the teacher’s, and the parent’s. The student’s

perspective is gathered through the SRP rating scales for ages 8-25 years (8-11, Child; 12-21,

Adolescent; and 18-25, College). The teacher’s perspective is gathered with the TRS rating

scales and the SOS observation form. The TRS has separate rating scales for preschool (ages 2-5

years), child (6-11 years), and adolescent (12-21 years). The PRS rating scale measures the

parent’s perspective along with the SDH structured background interview. The PRS scales are

separated into age groupings like the TRS: preschool (2-5), child (6-11), and adolescent (12-21).

Within each version of the rating scales (SRP, PRS, and TRS), individual clinical, adaptive, and

composite scales provide normative comparisons of the student with peers of his/her same age.

The test authors suggest not basing diagnoses, placements, or treatments on BASC-2 results

alone. Rather, they state that “when all the BASC-2 components have been collected along with

a clinical interview and a review of school and clinical records and histories, the professional

will have the information needed for a thorough, comprehensive evaluation of behavior,

personality, and context” (p. 7; Reynolds & Kamphaus, 2004).

The BASC-2 was developed to make improvements on the original BASC (Reynolds &

Kamphaus, 1992). The SRP item improvements and item development for the second edition

were based on user feedback and review of the original scale items. Specifically, the original

BASC SRP scales tended to “contain more items, have lower reliabilities, and have more

restricted normative distributions” (p. 94; Reynolds & Kamphaus, 2004). Students also were

reported to have difficulty choosing between true and false (suggesting a need for a finer

response gradation).

The authors conducted a study of the new 4-point response scale (never, sometimes,

often, almost always: N/S/O/A) for the SRP to test its appropriateness relative to the T/F

format. They created two versions of the BASC-SRP that differed only in response

format (T/F or N/S/O/A) and in the wording of some items to accommodate the 4-point response

format (for instance, if the word “often” was in the original question, it was removed). A total of 131 students

participated in the study of the SRP-A and 230 participated in the study of the SRP-C. They found internal

consistency to be highest for scales with a mixed response format (T/F and N/S/O/A) and that the

formats varied by scale for test-retest correlations; the N/S/O/A format had higher correlations

on 13 of 26 scales while the T/F had higher correlations on 12 of 26 scales. The authors

concluded that a mixed response format was the best choice.

Item selection for the BASC-2 SRP-A was based on the standardization sample of 3,180

students and 256 items. To accommodate the mixed response format, the authors weighted the

T/F responses based on their overall standard deviations. They noted that on average, the T/F

standard deviations were half the size of the N/S/O/A. The selected weight resulted in a scoring

of T/F items as 2/0 and the N/S/O/A as 0/1/2/3. They stated the primary goals of the analyses as

scale reliability, distinctiveness, and interpretability. Specifically, scales should contain items

that represent the construct, and correlate with other scales in predicted directions. To

accomplish these goals, the authors performed scale-by-scale analysis and analysis of all scales

simultaneously, primarily using Confirmatory Factor Analysis (CFA) with Amos 5.0.
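The mixed-format item weights described in this section (T/F items scored 2/0 and N/S/O/A items scored 0/1/2/3) can be illustrated with a minimal sketch. The helper names and response codes below are hypothetical, and reverse-keyed items are ignored for simplicity, so this should not be read as the publisher's scoring procedure.

```python
# Hypothetical response codes; the weights follow the values stated in the text.
NSOA_WEIGHTS = {"N": 0, "S": 1, "O": 2, "A": 3}  # never, sometimes, often, almost always
TF_WEIGHTS = {"F": 0, "T": 2}                    # false, true


def score_item(response, item_format):
    """Return the weighted score for one item response."""
    if item_format == "NSOA":
        return NSOA_WEIGHTS[response]
    if item_format == "TF":
        return TF_WEIGHTS[response]
    raise ValueError(f"unknown item format: {item_format}")


def raw_scale_score(responses):
    """Sum weighted item scores; `responses` is a list of (response, format) pairs."""
    return sum(score_item(response, item_format) for response, item_format in responses)


# Example: one true/false item answered "T" and two 4-point items.
example = [("T", "TF"), ("O", "NSOA"), ("N", "NSOA")]
print(raw_scale_score(example))
```

Weighting a "True" response as 2 keeps the spread of the T/F items closer to that of the 4-point items, which is the rationale the text attributes to the authors.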

Scale item analysis utilized CFA and SPSS-based reliability estimates. The authors

based item-retention decisions on item-scale correlations, standardized factor loadings,

and theory. In general, they retained items that had the highest correlations and highest loadings and that

were conceptually good markers of the construct (e.g., illegal drug use for the Conduct

Problems scale). The remaining items were subjected to a full CFA with all scales. Each item

was allowed to load on only one scale and the modification indexes (MIs) were used to gauge the

singular fit of each item with its scale. If the MIs suggested a statistically different fit for an item,

the authors investigated the item and dropped it if it had excessive overlap with another scale or

a low loading with its own scale. The authors reported dropping fewer than 10% of the items on

any level of form. They also examined the readability levels of items (SRP=2nd grade) and the

bias of test items. To explore the bias, the authors used partial correlations between individual

items and the demographic groups (between females and males, and among African-American,

Hispanic, and white children), and they used Differential Item Functioning estimates; “overall

fewer than five items were removed” (p. 109).

The general norm sample for the SRP-A was representative of the US population by

gender, geographic region, ethnicity, mother’s educational level, and special education

classification. Specifically, the SRP-A general sample included 4.5% AD/HD, 3.1% EBD, 1.1%

MR, 0% PDD, 6.5% LD, and 2.2% Speech/Language. The clinical norm sample for the SRP-A

included students in Special-Education classrooms and clinics, treatment centers for youth with

emotional/behavioral issues, or students identified in the general sample as having a

representative issue for a total of 950 youth 12-18 years of age.

T-scores were developed for the normative samples using a linear transformation of raw

scores, T = 50 + 10(X - M)/SD, where X is the raw score and M and SD are the normative mean and standard deviation. This transformation maintained the shape of the raw score

distributions, which was reasoned to be a meaningful representation of the population

distribution shape because measurements of uncommon problems often show theoretically

meaningful skew. The authors chose to use this transformation rather than an area transformation

that would have converted the shape to a normal distribution.
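As a hedged illustration of a linear T-score transformation, the sketch below assumes the conventional definition in which T-scores have a normative mean of 50 and a standard deviation of 10. The function name and the example norm values are hypothetical and are not taken from the BASC-2 manual.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T-score: rescales a raw score without changing the distribution's shape."""
    if norm_sd <= 0:
        raise ValueError("normative standard deviation must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical norms: a scale with normative mean 12 and SD 4.
# A raw score 1.5 SD above the mean maps to T = 65.
print(linear_t_score(18, norm_mean=12, norm_sd=4))
```

Because the transformation is linear, a skewed raw-score distribution stays skewed after conversion, in contrast to an area (normalizing) transformation.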

The Minnesota Multiphasic Personality Inventory, Adolescent version (MMPI-A) is a

478-item objective measure of personality. The items use a 2-point response format of true

or false. It is a widely used assessment of adolescent psychopathology in

clinical and research settings (Butcher, Williams et al. 1992). Subjects for the instrument’s

normative sample were recruited at middle schools and high schools at geographical points

across the United States. The normative sample included data from California, Minnesota, New

York, North Carolina, Ohio, Pennsylvania, Virginia, and Washington state. The normative

sample was adequately stratified by ethnicity/race. The clinical sample for the instrument was

comprised of 420 boys and 293 girls in treatment facilities in the Minneapolis area,

predominately from alcohol and drug treatment facilities. The instrument derives 7 Validity

Scales, 10 Clinical Scales, 31 Clinical Subscales, 15 Content Scales, 31 Content Component

Subscales, and 11 Supplementary Scales.

Data Collection

As noted previously, the data for the current study were gathered from intake batteries

administered to prospective JCAP clients. The youths were referred for counseling services or

psychological evaluations by their probation officers or directly by a judge and reported to either

the Department of Juvenile Justice probation office or Juvenile Court for counseling intake

screenings for JCAP services. Psychological evaluations were conducted at the court or

probation buildings in the youth’s county or at the youth’s placement or temporary detention

facility. Counseling intakes were completed by an intake counselor with the prospective client

and guardian(s) and the psychological evaluations were completed by doctoral level students in

counseling psychology. Each youth and guardian signed consent forms for the clinical data from

the intake batteries to be used in research studies conducted by JCAP. The data are archival in

nature and was not collected for the specific purposes of the current study, but for general

research and evaluative studies within JCAP.

Data screening included evaluating validity scores for the assessment instruments, screening

for outliers, checking distribution normality through skew and kurtosis, and confirming the identification status of the CFA models;

for the EFA, the Kaiser-Meyer-Olkin Measure of Sampling Adequacy was also reviewed. Cut-off

scores for validity on the SRP-A were reported in the manual (Reynolds & Kamphaus, 2004) as

the following table. A combination of V and L values was used to determine whether the scores

from a particular administration were valid. A set of scores on the SRP-A was deemed invalid if

it had both a score of 4 or higher on scale V and a score of 12 or higher on scale L. The cut-off validity scores

for the MMPI-A were 66 and above for scale L, 90 and above for scale F, and 80 and above for

VRIN and TRIN.
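The validity screening rules stated above can be sketched as two small predicates. The function names are hypothetical. The SRP-A rule follows the text directly (both V and L must reach their cut-offs); treating an MMPI-A protocol as invalid when any single scale reaches its cut-off is an assumption, since the text lists the cut-offs without saying how they were combined.

```python
def srp_a_invalid(v_score, l_score):
    """SRP-A: invalid only when V >= 4 AND L >= 12, as stated in the text."""
    return v_score >= 4 and l_score >= 12


def mmpi_a_invalid(l_score, f_score, vrin, trin):
    """MMPI-A: cut-offs from the text; 'any scale at or above cut-off' is an assumption."""
    return l_score >= 66 or f_score >= 90 or vrin >= 80 or trin >= 80


print(srp_a_invalid(v_score=4, l_score=12))   # both cut-offs reached
print(srp_a_invalid(v_score=5, l_score=8))    # only V elevated
print(mmpi_a_invalid(l_score=50, f_score=92, vrin=60, trin=55))
```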

Screening for outliers involved investigating skew, kurtosis, and the distributions of each

variable. A cut-off score of 7 was used for skew and kurtosis. As can be seen in the following

table, most variables demonstrated very low values. Atypicality and Interpersonal Relations were

the only variables in which kurtosis was higher than 2, but no variable demonstrated skew or

kurtosis greater than 7. Reviewing the Stem-and-Leaf plots for each variable showed that most

variables approximated a normal distribution. Atypicality, Depression, and Somatization seemed

to be weighted toward lower scores, while Interpersonal Relations and Self-Esteem appeared

heavily weighted toward higher scores.
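The skew and kurtosis screen described above could be reproduced along the following lines. This is a sketch that assumes the bias-adjusted sample estimators commonly reported by statistical packages (adjusted Fisher-Pearson skewness and excess kurtosis); the text does not state which estimator was used, and the function names are hypothetical. The cut-off of 7 is applied here to absolute values, which is also an assumption.

```python
from math import sqrt


def _mean_and_sd(values):
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # sample SD
    return mean, sd


def sample_skewness(values):
    """Bias-adjusted sample skewness; requires at least 3 observations."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean, sd = _mean_and_sd(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis; requires at least 4 observations."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean, sd = _mean_and_sd(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def exceeds_cutoff(values, cutoff=7.0):
    """Flag a variable whose absolute skew or kurtosis is above the cut-off."""
    return (abs(sample_skewness(values)) > cutoff
            or abs(sample_excess_kurtosis(values)) > cutoff)


print(exceeds_cutoff([1, 2, 3, 4, 5]))
```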

Table 2. BASC-2 and MMPI-A Scale Statistics for Sample

Scale                         N    Mean  Std. Deviation  Skewness  Kurtosis
Attitude to School          205   53.26           11.74      .557     -.599
Attitude to Teachers        205   56.33           11.31      .581     -.115
Sensation Seeking           205   51.79           10.37      .008     -.121
Atypicality                 205   52.60           12.49     1.524     2.259
Locus of Control            205   54.96           11.99      .566     -.310
Social Stress               205   51.84           11.37      .882      .769
Anxiety                     205   50.66           11.13      .517     -.090
Depression                  205   54.11           12.23      .982      .261
Sense of Inadequacy         205   56.68           12.57      .647      .148
Somatization                205   53.06           11.85      .782     -.037
Attention Problems          205   55.58           10.96      .185     -.562
Hyperactivity               205   53.66           12.43      .670     -.107
Relation with Parents       204   44.73           12.82     -.145     -.966
Interpersonal Relations     205   51.00           10.15    -1.574     3.171
Self-Esteem                 205   50.89           10.66    -1.324     1.604
Self-Reliance               205   45.17           10.12      .153     -.625
Hypochondriasis, 1           17   55.35           14.29      .757     -.903
Depression, 2                17   57.94           10.99      .189    -1.085
Hysteria, 3                  17   53.29           13.33      .496     -.102
Psychopathic Deviate, 4      17   63.06           11.92     1.164     1.817
Masculinity/Femininity, 5    17   43.12            8.96      .313     -.766
Paranoia, 6                  17   52.35            9.71      .554     -.336
Psychasthenia, 7             17   52.41           11.97      .114      .002
Schizophrenia, 8             17   51.71           10.12      .518     -.411
Hypomania, 9                 17   54.24           12.08      .160     -.625
Social Introversion, 0       17   52.77            9.73     -.231    -1.118

Identification of the CFA model took place prior to analysis. In CFA, the information

being analyzed is not the number of observations; it is the number of correlations (or

covariances). There are [k * (k-1)]/2 unique correlations in a correlation matrix, and [k * (k +

1)]/2 unique covariances in a covariance matrix, where k is the number of variables. These

correlations or covariances are the information in CFA, and the unknowns are the path values to

be estimated. Since the BASC-2 scale scores are the variables, the 16 scales result in

(16*15)/2=120 unique correlations and (16*17)/2=136 unique covariances for this study. In CFA

the parameters to be estimated are: the factor loadings, measurement error variances, factor

variances, factor correlations or covariances, and measurement error correlations or covariances

(if any). The current study has 22 factor loadings, 16 measurement error variances, 5 factor

variances, and 10 factor correlations for a total of 53 parameters to be estimated, which is much

lower than the number of pieces of information (136) and allows the model to be overidentified.
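The counting argument above can be checked with a few lines of arithmetic. The sketch below uses the parameter counts given in the text; the variable names are hypothetical. The difference between the pieces of information and the free parameters gives the model degrees of freedom, and a positive value is a necessary (though not by itself sufficient) condition for overidentification.

```python
def unique_correlations(k):
    """Off-diagonal elements of a k x k correlation matrix: k(k - 1)/2."""
    return k * (k - 1) // 2


def unique_covariances(k):
    """Variances plus covariances of a k x k covariance matrix: k(k + 1)/2."""
    return k * (k + 1) // 2


N_SCALES = 16  # BASC-2 SRP-A scale scores used as observed variables

# Free parameter counts as reported in the text.
FREE_PARAMETERS = {
    "factor loadings": 22,
    "measurement error variances": 16,
    "factor variances": 5,
    "factor correlations": 10,
}

information = unique_covariances(N_SCALES)
n_parameters = sum(FREE_PARAMETERS.values())
degrees_of_freedom = information - n_parameters

print(unique_correlations(N_SCALES), information, n_parameters, degrees_of_freedom)
```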

Limitations

The sample for the current study was not drawn at random from the greater population of

juvenile offenders. The participants were recruited through intakes for counseling and

psychological services. The sampling procedures used in the current study may limit the

generalizability of the results; however, the sample is exclusively comprised of the subset of

juvenile offenders of interest, namely those youth being screened for psychological services.

Assumptions

Because the youth were all mandated to participate in counseling services or a

psychological evaluation, their instrument scores may be subject to bias, random responding, or

other threats to score integrity. The validity indexes of the BASC-2 are assumed to represent the

youth’s appropriateness in responding. The youth in the current study, although a subset of

juvenile offenders receiving mental health services in the southeast, are assumed to be

representative of the greater population of juvenile offenders who may be screened for mental

health services.

Hypotheses

The general hypothesis involved the validity of the BASC-2 in the Juvenile Offender

population and led to specific questions. Will the BASC-2 scales demonstrate adequate levels of

internal consistency in the current sample? Will the BASC-2 scales correlate in theoretically

predicted directions within its own scales and with the scales of the MMPI-A? Will the higher-

order factor structure be confirmed in the current study? Will alternative higher-order factors

emerge that explain the inter-scale correlations of the BASC-2 within a juvenile offender

sample? If factors emerge, will they be conceptually different from the BASC-2 composites? If

the factors are conceptually different, will they be a better fit for the data than the composites in

a separate sample using CFA?

Null Hypothesis 1: The BASC-2 scales will not demonstrate adequate levels of internal

consistency in the current sample.

Null Hypothesis 2: The BASC-2 scales will not correlate in theoretically predicted

directions within its own scales.

Null Hypothesis 3: The BASC-2 scales will not correlate in theoretically predicted

directions with the scales of the MMPI-A.

Null Hypothesis 4: The higher-order factor structure will not be confirmed in the sample.

Null Hypothesis 5: No alternative higher-order factors will emerge to explain the inter-

scale correlations of the BASC-2 within a juvenile offender sample.

CHAPTER 4

RESULTS

Reliability

Null Hypothesis 1: The BASC-2 scales will not demonstrate adequate levels of internal consistency in the current sample.

Internal consistency is a measure of how closely or similarly each item within a defined

scale varies with the other items within that scale. The reasoning is that if the items are intended

to measure a specific construct, for instance depression, then the ratings or responses for each

item should reflect the degree to which the respondent exhibits the construct. In other words, if a

person has a high level of construct X, then his or her responses will demonstrate that level. If

the scale that measures X is internally consistent, then the values (or levels) of the item responses

should be relatively consistent with one another.

A high level of internal consistency is desired when attempting to measure a construct

that is clearly defined. Issues emerge when using internal consistency values absent of theory.

Consider the differences between attempting to measure a person’s school performance

(a relatively objective construct) and a person’s zest for life (a relatively subjective construct that is difficult to

define). Clinical syndromes and disorders, like depression, can

manifest broadly, and therefore a “tight” scale with an extremely high level of internal

consistency is not necessarily desirable because it may be measuring only one aspect of depression

(e.g., sleep disturbance or hopelessness). Therefore, for the purposes of this study, theory and

benchmark levels of internal consistency were used together to evaluate the appropriateness of

the scale internal consistencies. Specifically, as suggested by Ponterotto and Ruckdeschel (2007),

sample size and number of items within the scales were used to identify the quality of the

coefficient alphas.

Table 3. Coefficient Alpha Classifications

Classification   7-11 Items   >11 Items
Excellent        .80          .85
Good             .75          .80
Moderate         .70          .75
Fair             .65          .70

Coefficient alphas were calculated for each scale in the BASC-2 SRP-A (table below).

The following scales demonstrated internal consistency in the excellent range: Attitude to

School, Attitude to Teachers, Atypicality, Locus of Control, Social Stress, Anxiety, Depression,

Sense of Inadequacy, Hyperactivity, Relations with Parents, and Self-Esteem. The scales which

exhibited good consistency were Somatization and Interpersonal Relations. Attention Problems

demonstrated a moderate level of internal consistency while Sensation Seeking and Self-Reliance

fell in the unsatisfactory category.
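The classification rule implied by the benchmark table can be sketched as a small lookup. This is a hedged illustration: the function name is hypothetical, the tabled values are treated as lower bounds (an assumption consistent with the classifications reported above), and any alpha below the "fair" bound is labeled unsatisfactory.

```python
# Lower bounds per classification, keyed by the item-count column of the benchmark table.
ALPHA_BENCHMARKS = {
    "7-11": [("excellent", 0.80), ("good", 0.75), ("moderate", 0.70), ("fair", 0.65)],
    ">11": [("excellent", 0.85), ("good", 0.80), ("moderate", 0.75), ("fair", 0.70)],
}


def classify_alpha(alpha, n_items):
    """Return the benchmark label for a coefficient alpha given the scale length."""
    if n_items < 7:
        raise ValueError("the benchmark table shown here starts at 7 items")
    column = "7-11" if n_items <= 11 else ">11"
    for label, lower_bound in ALPHA_BENCHMARKS[column]:
        if alpha >= lower_bound:
            return label
    return "unsatisfactory"


# Values reported for the current sample.
print(classify_alpha(0.85, 13))  # Anxiety
print(classify_alpha(0.76, 7))   # Somatization
print(classify_alpha(0.71, 9))   # Attention Problems
print(classify_alpha(0.60, 8))   # Self-Reliance
```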

Each scale was also evaluated based on how the items functioned within the scale. The

weakest items of each scale will be the ones that correlate the least with the other items and also

least with the total score for the scale. The strongest items within each scale will be the ones that

are most correlated with the other items and with the total. Occasionally, deleting the weakest

item of a scale can increase the overall internal consistency of a scale. Also, evaluating the

difference in the interpretation of the weakest and strongest item wordings can give insight into

the scale itself. For instance, typically the strongest item within the scale can be viewed as the

most representative of the actual construct being measured by the scale.
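The item-level statistics discussed above (coefficient alpha, item-total correlation, and alpha if an item is deleted) can be computed as in the following sketch. It assumes the corrected item-total correlation, in which each item is correlated with the sum of the remaining items; the text does not state which variant was reported. The function names and the small data set are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Coefficient alpha; `items` holds one list of scores per item, same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(items):
    """Per item: (corrected item-total r, alpha if item deleted). Needs at least 3 items."""
    results = []
    for index, column in enumerate(items):
        rest = items[:index] + items[index + 1:]
        rest_totals = [sum(scores) for scores in zip(*rest)]
        results.append((pearson_r(column, rest_totals), cronbach_alpha(rest)))
    return results


# Hypothetical responses: 3 items scored 0-3 for 5 respondents.
demo = [[0, 1, 2, 3, 3], [1, 1, 2, 3, 2], [0, 2, 1, 3, 3]]
print(round(cronbach_alpha(demo), 3))
for r, alpha_if_deleted in item_statistics(demo):
    print(round(r, 3), round(alpha_if_deleted, 3))
```

An item whose deletion raises alpha above the full-scale value is a candidate "weak" item in the sense used in this section.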

Table 4. Cronbach’s Alphas for Current Study and for Normative Sample

Scale                    Cronbach’s Alpha (Current)   N of Items (Current)   Cronbach’s Alpha (Normative)
Relation with Parents    .91                          10                     .88
Depression               .88                          12                     .86
Anxiety                  .85                          13                     .86
Self-Esteem              .84                          8                      .82
Atypicality              .84                          9                      .82
Social Stress            .84                          10                     .83
Locus of Control         .83                          9                      .78
Attitude to School       .82                          7                      .82
Hyperactivity            .82                          7                      .74
Attitude to Teachers     .81                          9                      .79
Sense of Inadequacy      .81                          10                     .79
Interp. Relations        .79                          7                      .78
Somatization             .76                          7                      .67
Attention Problems       .71                          9                      .79
Sensation Seeking        .64                          9                      .70
Self-Reliance            .60                          8                      .70

Note. Cronbach’s alphas for the normative sample are as reported by Reynolds and Kamphaus (2004).

The “weakest” item within Attitude to School was item 70 with an item-total correlation

of .381 and if it were deleted from the scale, alpha would rise to .824. Item 70 reads “My school

feels good to me.” The strongest item (172) would drop alpha to .752 if it were deleted. It has an

item-total correlation of .784 and reads “I hate school.”

Sensation Seeking performed unsatisfactorily and had two relatively

weak items (item 27, r = .043, and item 57, r = .044); alpha would rise slightly more if item 57 were dropped (.671)

than if item 27 were dropped (.652). These items read, respectively, “I like loud music” and “I

would rather be a police officer than a teacher.” The latter item immediately appears problematic

for a group of youth who typically have issues with police and educators. If both of these items

were dropped, alpha would rise to .690, not a large increase, but it would raise the scale from an

unsatisfactory level of consistency to a fair level of consistency. The strongest item for this scale,

item 77, reads “I like it when my friends dare me to do something.” If it were deleted, alpha

would drop to .559.

One item in particular in the Atypicality scale, item 149, had a very low item-total

correlation (r = .212) and would raise alpha to .849 if it were dropped. This item reads “Someone

else controls my thoughts.” The strongest item (122) correlated at .733 with the total, reads “I

hear voices in my head that no one else can hear,” and seems to be a more face-valid description

of the clinical definition of Atypicality. If this item were removed, alpha would drop to .803.

Alpha would rise just barely to .878 if item 3 were deleted from the Depression scale.

This item correlated with the total at .393 and reads “Nothing goes my way.” The strongest item (33,

“Nobody ever listens to me.”) would drop alpha to .859 if deleted and had an item-total

correlation of .743. On the Somatization scale, deleting item 4 (r = .349, “My muscles get sore a

lot.”) would increase alpha to .762. The deletion of the strongest item within this scale (99, r =

.590, “I feel dizzy.”) would result in dropping alpha to .710. Deleting item 95 (r = .180, “I listen

when people are talking to me.”) would raise alpha for Attention Problems to .722. Deletion of

the strongest item (143, r = .561, “I have trouble paying attention to what I am doing.”) would

drop alpha to .656. Dropping item 118 (r = .334, “I talk while other people are talking.”) from

the Hyperactivity scale would raise alpha to .822. Deleting the strongest item (124, r = .674, “I have

trouble sitting still.”) would drop alpha to .769. Within the Self-Esteem scale, one item in particular

performed much worse than the other items. Item 104 (r = .062, “I am good at things.”) could

increase alpha to .887 if deleted. The strongest item of this scale (74, r = .741, “I like the way I

look.”) would reduce alpha to .802 if deleted.

Several of the scales demonstrated relatively consistent item performance. Essentially,

these scales did not have any absolute standout items as being weak or strong. This is most

obvious when the coefficient alpha drops with the deletion of any individual item. This was the

case for Attitude to Teachers, Locus of Control, Social Stress, Anxiety, Sense of Inadequacy,

Relations with Parents, Interpersonal Relations, and Self-Reliance.

Item 145 (“My teacher is proud of me.”) was technically the weakest item within the

Attitude to Teachers scale. It correlated with the scale total at .410, but if it were dropped, alpha

would actually drop slightly to .804. In fact, all items within this scale would drop alpha if

removed. Item 85, with an item-total correlation of .564, would drop alpha to .785 if removed. It

reads “My teacher trusts me.”

Within the Locus of Control scale, no single item functioned particularly poorly, and

dropping any of the items would result in a drop in alpha. The weakest item (36, “My

parents have too much control over my life”) had an item-total correlation of .406 and would

drop alpha slightly to .830. The strongest item (66, “My parents blame too many of their

problems on me.”) would drop alpha to .801 and had an item-total correlation of .658.

Dropping any of the items within Social Stress would result in a reduction in

internal consistency. The weakest item (165, “I feel that others do not like the way I do things”)

would drop alpha to .829 and had an item-total correlation of .456, while the strongest item (116,

“I am left out of things.”) correlated with the total at .678 and would drop alpha to .813 if

deleted. The weakest item in Anxiety (20, “I worry about little things.”) would drop alpha to .847

and the strongest (110, “I worry, but I don’t know why.”) would drop it to .828. The item-total

correlations for these items, respectively, were .373 and .647.

All item deletions within Sense of Inadequacy would reduce alpha. The weakest item-

total correlation (r = .347) was for item 30 (“I cover up my work when the teacher walks by.”).

Deleting this item would drop alpha to .810, while dropping item 120 (r = .640, “I want to do

better, but I can’t.”) would result in alpha = .776.

Deleting any of the items from the Relations with Parents scale would reduce alpha. The

smallest drop in alpha (.910) would come from deleting item 132 (r = .542, “My mother and

father like my friends.”) and the largest drop in alpha (.895) would result from deleting item 126

(r = .797, “My parents are easy to talk to.”).

Interpersonal Relations presents no item deletions that could result in an increase in

alpha. The weakest item (13, r = .433) would reduce alpha to .775 and the strongest item (43, r =

.599) would reduce alpha to .751. These items read, respectively, “My classmates don’t like me”

and “Other children don’t like to be with me.”

In terms of internal consistency, Self-Reliance was the worst performing scale, but no

item deletions would result in an increase in alpha. The worst performing item (46) would only

slightly reduce alpha (.595) if it were deleted and the strongest item (123) would reduce alpha to

.500 if deleted. The item-total correlations of these items were .181 and .508, respectively, and

read “I can handle most things on my own” and “I am good at making decisions.”

Validity

Null Hypothesis 2: The BASC-2 scales will not correlate in theoretically predicted directions within its own scales.

The scales within an instrument, like the BASC-2, often measure various dimensions of a

broader construct. In the case of the BASC-2 SRP-A, the scales are multidimensional

representations of the youth’s personality and behavioral functioning. It contains both clinical and

adaptive scales that would be predicted to correlate negatively with each other. Also, it would be

expected that the scales which purport to measure aspects of a particular higher-order construct

(i.e., the composites) would correlate more highly than scales which measure drastically different

higher-order constructs. Inter-scale correlations were computed between all scales of the BASC-

2 (see table below).

The adaptive scales and the clinical scales appear, for the most part, to be negatively

correlated, as predicted. All of the significant correlations between clinical and adaptive scales

were in the negative direction except for one, r(S-R, SS) = .227, p < .001. This correlation between Self-

Reliance and Sensation Seeking involves two scales that do not absolutely dictate adaptive or

clinical polarities. For instance, being high in Sensation Seeking is not as “clinical” as being, for

example, high in Depression. The same holds for Self-Reliance and therefore it is understandable

that these two scales did not show a discernible pattern of correlations across any of the scales.

Self-Reliance did not correlate meaningfully with any other scale, and Sensation Seeking, for

example, had 10 scale correlations under .30, 8 under .20, and 3 under .05. Its highest

correlation, r = .484, p<.001, was with Hyperactivity and is likely the most appropriate scale, of

any, for it to correlate with strongly, even though it is in a composite with Attitude to School and

Attitude to Teachers.

Table 5, Interscale Correlations within BASC-2 SRP-A

Composites: School Problems (AttSch, AttTch, SnSkg); Internalizing Problems (Atyp, LoC, SoStrs, Anx, Dep, SoI, Som); Inattention/Hyperactivity (AttPrb, Hyp); Personal Adjustment (RlPrts, IntRel, S-E, S-R).

Scale    AttSch AttTch  SnSkg   Atyp    LoC SoStrs    Anx    Dep    SoI    Som AttPrb    Hyp RlPrts IntRel    S-E    S-R
AttSch        1   .510   .333   .311   .409   .275   .189   .350   .347   .301   .424   .349  -.185  -.202  -.151   .011
AttTch               1   .266   .447   .398   .454   .303   .445   .471   .321   .412   .360  -.256  -.367  -.323  -.066
SnSkg                       1   .310   .175   .149   .155   .127   .259   .134   .325   .484   .045   .009   .035   .227
Atyp                               1   .493   .637   .614   .604   .566   .511   .537   .531  -.239  -.503  -.507  -.035
LoC                                       1   .625   .630   .732   .579   .532   .422   .369  -.557  -.380  -.505  -.188
SoStrs                                           1   .714   .728   .663   .542   .410   .387  -.429  -.657  -.653  -.101
Anx                                                     1   .714   .651   .609   .453   .404  -.316  -.487  -.557  -.073
Dep                                                            1   .708   .559   .439   .299  -.452  -.515  -.578  -.153
SoI                                                                   1   .486   .573   .395  -.247  -.495  -.542  -.275
Som                                                                          1   .359   .334  -.266  -.445  -.429  -.118
AttPrb                                                                              1   .643  -.224  -.301  -.342  -.182
Hyp                                                                                        1  -.101  -.133  -.176   .050
RlPrts                                                                                            1   .287   .482   .186
IntRel                                                                                                   1   .628   .304
S-E                                                                                                             1   .295
S-R                                                                                                                    1

Attitude to Teachers and Attitude to School correlated with each other as expected, r =

.510, p<.001. They did not correlate as well with Sensation Seeking, as mentioned previously.

The Internalizing Problems Composite demonstrated all positive and significant correlations

between the scales. The interscale correlations ranged from .486 to .714. Attention Problems and

Hyperactivity correlated well, as expected, r = .643, p<.001. The scales within the Personal

Adjustment Composite were mixed. They were all positive and significant, but the magnitude

was not as great as would be hoped, with three of the six correlations below .30 and one

just barely greater than .30.

Null Hypothesis 3: The BASC-2 scales will not correlate in theoretically predicted directions with the scales of the MMPI-A.

The MMPI-A is one of the most widely used personality assessment instruments.

Correlating the scales from the BASC-2 with the scales of the MMPI-A can give insight into

how well the BASC-2 is measuring the constructs which are similar to the constructs measured

by the MMPI-A. Essentially, if two scales that purport to measure the same or similar construct

correlate strongly with each other, it provides confidence that they are valid measurements of

that particular construct. The table below presents the correlations; note that shaded areas are

broadly expected correlations and the highlighted cells are specifically expected correlations.

In general, it was expected that the scales within the School Problems and

Inattentive/Hyperactive Composites would correlate positively with the externalizing scales

involving impulse control and emotional lability (i.e. scales 4 and 9). It was also expected that

the Personal Adjustment Composite scales would negatively correlate with all scales and the

Internalizing Problems scales would correlate positively with the more internalizing scales

(scales 1-3 and 6-8). Specific correlations were expected to be positive between Sensation

Seeking and scale 9; Atypicality and scale 8; Anxiety and scales 3, 6, and 7; Depression and

scale 2; Somatization and scales 1 and 3; and the Personal Adjustment scales and scale 4.

Table 6, BASC-2 SRP-A Correlations with MMPI-A

Scale   Hypo      Dep       Hyst      Psych Dev Masc/Fem  Paran     Psychasth Schiz     Hypoman   Soc Intr
        1         2         3         4         5         6         7         8         9         0
AttSch  .297      -.112     .182      .286      .199      .127      .119      .348      .297      .081
AttTch  .219      -.045     .233      .094      .025      .153      .227      .268      .038      .090
SnSkg   .087      -.410     -.058     -.166     -.340     .222      .139      .307      .574*     -.289
Atyp    .613**    .490*     .567*     .664**    .414      .581*     .513*     .746**    .231      .096
LoC     .411      .391      .475      .702**    .605*     .520*     .386      .613**    .128      .283
SoStrs  .526*     .588*     .632**    .873**    .515*     .623**    .469      .585*     .029      .188
Anx     .463      .638**    .504*     .809**    .661**    .602*     .509*     .595*     .112      .240
Dep     .396      .642**    .504*     .754**    .517*     .770**    .478      .551*     -.074     .220
SoI     .218      .393      .299      .651**    .237      .645**    .379      .539*     .201      -.023
Som     .559*     .449      .663**    .647**    .417      .528*     .524*     .680**    .239      .236
AttPrb  .242      .216      .059      .196      -.283     .407      .576*     .476      .194      .309
Hyp     .389      .035      .061      .097      -.041     .213      .490*     .546*     .283      .235
RlPrts  -.412     -.466     -.490*    -.664**   -.423     -.374     -.361     -.265     .177      -.310
IntRel  -.374     -.666**   -.571*    -.798**   -.484*    -.545*    -.320     -.381     .154      -.261
S-E     -.487*    -.741**   -.688**   -.838**   -.521*    -.640**   -.445     -.444     .184      -.284
S-R     -.314     -.396     -.166     -.442     .052      -.565*    -.721**   -.566*    -.391     -.485*

As the table shows, the “externalizing” scales of the BASC-2 did not correlate well with

the predicted scales (4 and 9) on the MMPI-A. The general expectation that the “internalizing”

scales would correlate with scales 1-3 and 6-8 was confirmed quite well. Finally, the expectation

that the “adaptive” scales would correlate negatively with all MMPI-A scales was very well

supported.

In evaluating the specific correlations expected to occur, it can be noticed that all were in

the predicted direction and of relevant magnitude. Although the sample size was very small

(n=17), all but one correlation (Self-Reliance/Scale 4, r = -.442, p>.05) reached statistical

significance. Sensation Seeking and scale 9 (Hypomania), r = .574; Anxiety and scale

3 (Hysteria), r = .504; Anxiety and scale 6 (Paranoia), r = .602; Anxiety and scale 7

(Psychasthenia), r = .509; and Somatization and scale 1 (Hypochondriasis), r = .559 were all

significant at p< .05. The remaining correlations: Atypicality and scale 8 (Schizophrenia), r =

.746; Depression and scale 2 (Depression), r = .642; Somatization and scale 3 (Hysteria), r =

.663; Relations with Parents and scale 4 (Psychopathic Deviate), r = -.664; Interpersonal Relations

and scale 4 (Psychopathic Deviate), r = -.798; and Self-Esteem and scale 4 (Psychopathic

Deviate), r = -.838 were all significant at p < .01.
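The significance levels reported for these correlations can be checked with a short sketch. It assumes two-tailed tests, the usual conversion of r to a t statistic with n - 2 degrees of freedom, and critical t values for df = 15 taken from standard tables. Reading one asterisk as p < .05 and two as p < .01 is inferred from the text above rather than stated in the table.

```python
from math import sqrt

N = 17  # youth who completed both the BASC-2 SRP-A and the MMPI-A

# Two-tailed critical t values for df = N - 2 = 15 (standard t table).
T_CRITICAL_DF15 = {0.05: 2.131, 0.01: 2.947}


def t_from_r(r, n=N):
    """Convert a Pearson correlation to a t statistic with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def significance_flag(r):
    """Return '**' for p < .01, '*' for p < .05, '' otherwise (n = 17 only)."""
    t = abs(t_from_r(r))
    if t >= T_CRITICAL_DF15[0.01]:
        return "**"
    if t >= T_CRITICAL_DF15[0.05]:
        return "*"
    return ""


# A few of the specifically predicted correlations from the text.
for label, r in [
    ("Sensation Seeking / scale 9", 0.574),
    ("Atypicality / scale 8", 0.746),
    ("Self-Reliance / scale 4", -0.442),
]:
    print(f"{label}: r = {r:+.3f} {significance_flag(r)}")
```

With so few cases, only correlations of roughly .48 or larger in absolute value reach p < .05, which is why the moderate Self-Reliance correlation does not.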

Null Hypothesis 4: The higher-order factor structure will not be confirmed in the sample.

Reynolds and Kamphaus (2004) presented data on the factor structure of the SRP-A.

Confirmatory factor analysis (CFA) supported a four factor model including School Problems,

Internalizing Problems, Inattention/Hyperactivity, and Personal Adjustment. The chi-square and

fit indices were χ2(98) = 4,143, CFI=.848, and RMSEA=.116.

The current study used CFA to examine the fit of various models, including the four-

factor model presented above. Adequate fit of the proposed model with this sample would

support the generalizability of the factor structure and the use of the SRP-A with the juvenile

offender population. The following criteria were used to evaluate fit: CFI ≥ .90, TLI ≥ .85, and

RMSEA ≤ .10 (Hu & Bentler, 1995).

Table 7, Fit Indices for Confirmatory Factor Analyses

Model             χ2         df    CFI    TLI     RMSEA
Null              1942.301   120   .000   -.143   .273
One-factor        543.298    104   .757   .682    .144
Two-factor        498.822    103   .781   .711    .137
Three-factorRK    450.612    101   .806   .739    .130
Three-factorALT   378.334    101   .846   .793    .116
Four-factor       360.828    98    .854   .798    .115
Five-factor       292.669    88    .887   .825    .107

Note: N = 205; df = degrees of freedom; CFI = comparative fit index; TLI = Tucker Lewis index; RMSEA = root mean square error of approximation. RK = Reynolds and Kamphaus (2004) three-factor model with Attention Problems and Hyperactivity as internalizing problems; ALT = alternative three-factor model with Attention Problems and Hyperactivity as externalizing problems.
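As a rough check on Table 7, the RMSEA column can be recomputed from the chi-square values, degrees of freedom, and sample size. The sketch below assumes the common point estimate, the square root of (χ2 - df) / (df × (N - 1)). The source does not state which formula or software produced the reported values, so agreement with the table is an observation, not a confirmation of the author's method.

```python
from math import sqrt

N = 205  # sample size reported in the table note

# model: (chi-square, df, RMSEA as reported in Table 7)
TABLE_7 = {
    "Null": (1942.301, 120, 0.273),
    "One-factor": (543.298, 104, 0.144),
    "Two-factor": (498.822, 103, 0.137),
    "Three-factorRK": (450.612, 101, 0.130),
    "Three-factorALT": (378.334, 101, 0.116),
    "Four-factor": (360.828, 98, 0.115),
    "Five-factor": (292.669, 88, 0.107),
}


def rmsea(chi_square, df, n):
    """RMSEA point estimate; negative numerators are truncated to zero."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


for model, (chi_square, df, reported) in TABLE_7.items():
    computed = rmsea(chi_square, df, N)
    print(f"{model:16s} computed = {computed:.3f}  reported = {reported:.3f}")
```

Under this formula, only a model with RMSEA at or below .10 would meet the stated criterion, and none of the tested models does so, which is consistent with the "borderline" language used for the best-fitting model.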

The null model was a test of the independence of the SRP-A scales, while the one-factor

model reflected all scales being determined by a single higher-order factor (see Table above).

Both models fit the data poorly. The two-factor model consisted of two broad factors: Personal

Maladjustment and Personal Adjustment. This model loaded all of the scales whose elevations

suggest clinical and behavioral problems onto the Personal Maladjustment factor and let the

loadings of the adaptive scales remain as reflections of the Personal Adjustment factor (see Table

??). The factors in this model and all subsequent multi-factor models were allowed to correlate (r

= –.796) but residuals were not. This model also fit the data poorly.

The three-factor models consisted of factors representing adjustment, externalizing, and

internalizing. The three-factorRK model represents the model as tested by Reynolds and

Kamphaus (2004), which in their sample resulted in relatively adequate fit: χ2(101) = 4,887, CFI=.821, and

RMSEA=.124. This model fit less well in the current sample than in the developers’ sample.

Results showed the correlation between Externalizing Problems and Internalizing Problems was

.659. The correlation between Externalizing Problems and Personal Adjustment was -.398 and

the correlation between Internalizing Problems and Personal Adjustment was -.804.

In the other model, three-factorALT, Externalizing Problems was a merger of the scales

representing the Inattention/Hyperactivity and the School Problems composites of the SRP-A;

Internalizing Problems and Personal Adjustment were not changed from the configuration of the

SRP-A. Results showed the correlation between Externalizing Problems and Internalizing

Problems was .683. The correlation between Externalizing Problems and Personal Adjustment

was -.391 and the correlation between Internalizing Problems and Personal Adjustment was

-.824. The three-factorALT model provided a borderline-acceptable fit with the data, but was a

better fit with the data (AIC = 480.334) than the three-factorRK model (AIC = 552.612) and

the two-factor model (AIC = 596.822).

The four-factor model consisted of the same factors that represent the SRP-A composites

except the cross loadings associated with the Emotional Symptoms Index were removed, leaving

only the main composites of School Problems, Internalizing Problems, Inattention/Hyperactivity,

and Personal Adjustment; therefore, this model is a simplified version of the SRP-A composites.

Results showed the correlation between School Problems and Internalizing Problems at .624, the

correlation between School Problems and Personal Adjustment at -.379, and the correlation

between School Problems and Inattention/Hyperactivity at .744. The correlation of Internalizing

Problems with Inattention/Hyperactivity was .653 and with Personal Adjustment was -.823. The

correlation between Personal Adjustment and Inattention/Hyperactivity was -.393. The model

provided moderate fit with the data. Furthermore, the four-factor model provided better fit with

the data (AIC = 468.828) than the three-factorALT model (AIC = 480.334).

The five-factor model was the model as proposed within the BASC-2 SRP-A. This model

included the factors of the four-factor model, but added cross-loadings with some scales and a

fifth factor, the Emotional Symptoms Index. Because this index overlaps with the other

composites, correlation estimates involving it are not reported. The correlations between the other factors

are as follows: School Problems and Internalizing Problems, r = .654; School Problems and

Personal Adjustment, r = -.481; School Problems and Inattention/Hyperactivity, r = .757;

Internalizing Problems and Personal Adjustment, r = -.481; Internalizing Problems and

Inattention/Hyperactivity, r = .673; and Personal Adjustment and Inattention/Hyperactivity, r =

-.448. This model provided borderline-acceptable fit and better fit with the data (AIC = 420.669)

than the simplified four-factor model (AIC = 468.828). The standardized parameter

estimates per factor can be found in Table 8.
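The AIC values used for these model comparisons can be recovered from the chi-square statistics with a small sketch. It assumes AIC = χ2 + 2q, where q is the number of freely estimated parameters, and that q equals 152 minus the model's degrees of freedom (16 scales give 136 variances and covariances plus 16 means). Both the formula and the moment count are assumptions chosen because they reproduce the reported values; the source does not describe how AIC was computed.

```python
N_SCALES = 16
# Assumed number of sample moments: variances/covariances plus means.
SAMPLE_MOMENTS = N_SCALES * (N_SCALES + 1) // 2 + N_SCALES  # 136 + 16 = 152

# model: (chi-square, df, AIC as reported in the text)
REPORTED = {
    "Two-factor": (498.822, 103, 596.822),
    "Three-factorRK": (450.612, 101, 552.612),
    "Three-factorALT": (378.334, 101, 480.334),
    "Four-factor": (360.828, 98, 468.828),
    "Five-factor": (292.669, 88, 420.669),
}


def aic(chi_square, df):
    """AIC as chi-square plus twice the number of free parameters."""
    free_parameters = SAMPLE_MOMENTS - df
    return chi_square + 2 * free_parameters


for model, (chi_square, df, reported) in REPORTED.items():
    print(f"{model:16s} AIC = {aic(chi_square, df):.3f}  reported = {reported:.3f}")
```

Because AIC penalizes each added parameter, the five-factor model's lower value indicates that its improved chi-square outweighs the cost of its extra cross-loadings.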

Table 8, Standardized Parameter Estimates for Five-factor Model

Scale                    Personal       Inattention/   Internalizing  School         Emotional
                         Adjustment     Hyperactivity  Problems       Problems       Symptoms Index
Self-Reliance            -.163          -              -              -              .068
Self-Esteem              .608           -              -              -              .032
Interpersonal Relations  .686           -              -              -              -
Relations with Parents   .515           -              -              -              -
Hyperactivity            -              .774           -              -              -
Attention Problems       -              .831           -              -              -
Somatization             -              -              -              -              -
Sense of Inadequacy      -              -              .881           -              .017
Depression               -              -              .814           -              -.010
Anxiety                  -              -              .749           -              -.014
Social Stress            -              -              .714           -              -.027
Locus of Control         -              -              .773           -              -
Atypicality              -              -              .735           -              -
Sensation Seeking        -              -              -              .514           -
Attitude to Teachers     -              -              -              .673           -
Attitude to School       -              -              -              .679           -

Null Hypothesis 5: No alternative higher-order factors will emerge to explain the inter-scale correlations of the BASC-2 within a juvenile offender sample.

To evaluate the final hypothesis, exploratory factor analysis (EFA) was used. Factors

were extracted with principal axis factoring. This was determined to be the appropriate method

of extraction because the ultimate purpose of the current study was akin to scale development

(Comrey, 1988). The scree plot, parallel analysis, simple structure, and interpretability were used

to identify the number of factors. In reviewing the scree plot, Zwick and Velicer (1986) suggest

identifying the point at which the smaller eigenvalues form a straight line and to retain the

eigenvalues falling above this line. The scree plot for the current sample (figure below)

suggested a 3-factor solution.

In a study conducted by Zwick and Velicer (1986), it was noted that parallel analysis

(PA) was highly accurate at identifying the number of factors to retain, and when in error would

have a tendency toward overestimation. A macro program written for SPSS (O'Connor, 2000)

was used to compare the actual eigenvalues for each factor to randomly generated eigenvalues.

Based on the criterion that a “real” factor must have an eigenvalue greater than the generated

eigenvalue, the analysis suggested a 6-factor solution (Table 9).

Principal axis factoring was conducted for 3-, 4-, 5-, and 6-factor solutions. Since the scree

plot suggested a 3-factor solution and the parallel analysis (PA) suggested a 6-factor solution, the

factor structures for the solutions between these two values (i.e., the 4- and 5-factor solutions) were

also explored. The 3, 4, 5, and 6 factor solutions were rotated as oblique because the theory

behind the scale suggested inter-factor correlations as well as a hierarchical structure of the

domains according to the composites. Each extraction was subjected to a direct oblimin (delta=0)

rotation and a .40 cutoff value was used for identifying salient factor loadings. Dual loadings of

.30 were allowed in an attempt to allow for the same structure to emerge as used in the SRP-A.

Table 9, Parallel Analysis Results

# Factors   Actual      Generated
1           6.650254    .588874
2           1.472813    .473503
3           .559254     .389834
4           .506439     .314790
5           .357622     .243911
6           .241213     .183281
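The retention rule applied to Table 9 can be sketched as follows. The sketch assumes the simple comparison described in the text: a factor is retained while its actual eigenvalue exceeds the randomly generated one. It uses only the six rows listed in the table, so it can show that all six listed factors pass the rule but not where the rule would first fail. The function name is illustrative and is not part of the SPSS macro cited above.

```python
# Eigenvalues from Table 9 (factors 1 through 6).
ACTUAL = [6.650254, 1.472813, 0.559254, 0.506439, 0.357622, 0.241213]
GENERATED = [0.588874, 0.473503, 0.389834, 0.314790, 0.243911, 0.183281]


def factors_to_retain(actual, generated):
    """Count leading factors whose actual eigenvalue exceeds the generated one."""
    retained = 0
    for actual_value, generated_value in zip(actual, generated):
        if actual_value <= generated_value:
            break  # stop at the first factor that fails the comparison
        retained += 1
    return retained


print(factors_to_retain(ACTUAL, GENERATED))
```

The margin narrows quickly after the second factor, which helps explain why the scree plot and the parallel analysis pointed to different numbers of factors.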

The 6-factor solution (78.803% variance explained) produced one factor that was not

well-determined (only one salient loading). The 5-factor solution (74.541% variance explained) produced

the same one-loading (Self-Reliance) factor as the 6-factor solution. The 4-factor (69.331%

variance explained) and 3-factor (63.116%) solutions both produced well-determined factors (generally at

least 3 loadings higher than .40, but as few as 2 were accepted in the current study so that the

composites could emerge as in the SRP-A). Off-factor loadings of .30 or higher were judged to be

potential cross-loadings and the corresponding items were evaluated for theoretical fit.

Factor interpretability was then taken into account for each factor within each solution (3,

4, 5, and 6 factor solutions). Based on the conceptual cohesion within each factor, it was

determined that the 3-factor solution did not provide adequate interpretability. The 5- and 6-factor

solutions provided some theoretically meaningful factors, but did not reveal fully well-defined

factors. The 4-factor solution presented a balance between interpretable factors and simple

structure. It also mostly recreated the scale composites per the SRP-A. Anxiety, Social Stress,

Atypicality, Depression, Somatization, and Sense of Inadequacy loaded onto the same factor.

Attitude to School, Attitude to Teachers, and Sensation Seeking loaded onto another factor, and

Self-Esteem, Interpersonal Relations, and Self-Reliance also loaded onto one factor. Attention

Problems and Hyperactivity loaded together, but with the “school problem” scales, while Locus

of Control and Relations with Parents loaded together.

Figure 2, EFA 4-factor Structure

[Figure: path diagram linking the 16 SRP-A scales (rectangles) to Factors 1 through 4 (ovals).]

*Note, the dotted line denotes an inverse relationship, rectangles represent scales, and ovals represent factors.

Table 10, Loadings from 4-factor Solution

Scale Factor 1 Factor 2 Factor 3 Factor 4

Anxiety .908

Social Stress .681

Atypicality .614 .329

Depression .557 -.402

Somatization .512

Self-Esteem -.488 .433

Sense of Inadequacy .475 .325

Attitude to School .678

Attention Problems .614

Hyperactivity .299 .613

Attitude to Teachers .573

Sensation Seeking .567 .298

Locus of Control .327 -.710

Relation with Parents .537

Interpersonal Relations -.452 .565

Self-Reliance .479

CHAPTER 5

DISCUSSION AND SUMMARY

Summary

Assessment is integral to the practice of counseling psychologists and assessment

instruments are used for myriad purposes. For instance, Kazdin (2005) listed uses of assessment

that include diagnosis, case formulation, screening, case identification,

treatment planning, treatment implementation, treatment progress and outcome evaluation, and

cost/benefit evaluations of the treatment.

Kazdin (2005) recommended that the purposes of each instrument be delineated and the

criteria for validation of the instrument’s use for each purpose be specified. He noted that studies

of an instrument’s psychometrics are essentially never finished. There are an infinite number of

possible studies to complete for an instrument with no definite point of “completion”. It is

important that the instruments be validated for each use to develop evidence in support of those

uses. Since validity and reliability are not properties of the instrument, but rather are aspects of

the instrument’s use, it becomes quite clear why Kazdin (2005) described the limit of studies as

infinite.

With the importance of assessment in various applications, the validation of an

instrument becomes necessary for effective provision of psychological services. The

movement toward evidence-based assessment (EBA) has recently begun appearing in the

literature (Mash & Hunsley, 2005). Achenbach (2005) specified that evidence for the methods

and measures for all assessment purposes is needed. He noted that the evidence-based treatment

(EBT) movement pushed forth without first considering how to effectively identify and measure

the problems that are to be treated and the outcomes following those treatments. Achenbach

(2005) mentioned that “without EBA, EBT may be like a magnificent house with no foundation”

and that EBA and EBT will aid in “understanding, preventing, and ameliorating child

psychopathology” (p.547).

Testing the “functioning” of instruments across populations and purposes is necessary.

In an official publication by the Office of Juvenile Justice and Delinquency Prevention, Grisso

and Underwood (2004) state “instruments that provide evidence of reliability and validity with

youth in the juvenile justice system are preferable to those that do not” (p.12). The BASC-2 is a

commonly used behavioral rating scale which has been recommended for the assessment of

conduct problems (McMahon & Frick, 2005) and demonstrates promise for effective use with

juvenile offenders, but validity studies for this purpose are lacking.

The purpose of the current study was to evaluate the validity of the BASC-2 with the

juvenile offender population. In the context of evidence-based assessment, the conditional

validation of instruments per their intended use is best practice. Although the BASC-2 is

suggested as an appropriate broad screening measure of conduct problems, it had not been

validated for use with juvenile offenders. The current study focused on reliability, discriminant

validity, convergent validity, and the higher-order factor structure of the BASC-2 within a

sample of juvenile offenders. Results of this study have promise to impact the evidence-base of

assessment with juvenile offenders. By validating a broad screener for conduct problems and

related internalizing symptoms, the BASC-2 could aid psychologists and others involved in the

treatment, prevention, and rehabilitation of juvenile offenders.

Discussion of Findings

Groth-Marnat (2003) suggested evaluating an instrument with regard to its theoretical

orientation (Does the measure match its theory?), practical considerations (Are its length and

reading level appropriate?), standardization (Is the current population similar to the

standardization population?), reliability (Are reliability estimates adequate?), and validity (Will it

produce appropriate measurements within the intended use?). The current study focused on

answering the last two questions about reliability and validity. Since reliability and validity are

not properties of the instrument itself but rather properties of the specific use of that instrument, this

study evaluated them in the exact context in which the BASC-2 would be used, namely,

psychological evaluations and screening assessments.

Reliability

In the pursuit of answering the questions about reliability, this study utilized item-level

analysis of the scales within the BASC-2 SRP-A. Overall, the scales performed quite well with

11 (Attitude to School, Attitude to Teachers, Atypicality, Locus of Control, Social Stress,

Anxiety, Depression, Sense of Inadequacy, Hyperactivity, Relations with Parents, and Self-

Esteem) of the 16 scales demonstrating “excellent” levels of internal consistency. Two scales

(Somatization and Interpersonal Relations) demonstrated “good” levels of internal consistency

with one scale (Sensation Seeking) in the “moderate” range and one (Self-Reliance) in the

“unacceptable” range.

Reliability of variables is important when making determinations about meaningful

differences between scores and when making interpretations of score elevations. Based on these

results, it is safe to say that the vast majority of scales held up very well when administered to

youth from a juvenile offender population. The scales also closely reproduced and in some cases

improved upon the coefficient alphas from the normative sample used by the test developers.

When sample sizes increase, alpha coefficients increase too; therefore, it would be expected that

the alphas from the test developers would be higher than those from the current study. For

example, Ponterotto and Ruckdeschel (2007) provide interpretive values for “excellent” levels of

internal consistency in a study with N < 100 at .75 and with N > 300 at .85, a .10 difference

in the cutoff values. Comparing the coefficient values from the normative sample and the current

study shows that 11 of 16 scales demonstrated higher levels of internal consistency in the current

study and only four demonstrated lower alphas, one of which was only lower by .01. It can

therefore be argued that the individual scales of the BASC-2 SRP-A, when scored from

responses of juvenile offenders, performed at least as well as when administered by the test

developers to the normative sample.

Validity

The current study evaluated validity in several ways. First, convergent and discriminant

validity were evaluated with correlations among the BASC-2 scales and with the MMPI-A.

Then CFA and EFA were used to evaluate the structure of the factors being measured by the

BASC-2. The results overall support the construct validity of the BASC-2.

Convergent validity evaluates how well a score positively correlates with a similar score,

while discriminant validity relates to a score either not correlating with a dissimilar score or

negatively correlating with an oppositely polarized score. The best convergent and discriminant

validity evidence was noted with the scales of the Internalizing Problems composite. These

scales correlated very well with each other and were appropriately negatively correlated with the

scales of the Personal Adjustment composite. Each of these scales also had specific scales on the

MMPI-A with which they were expected to correlate and they were all also expected to generally

correlate with the scales representing internalizing problems. The correlations supported the

validity for these scales.

The Inattention/Hyperactivity composite provided reasonable evidence of validity and

the two scales of this composite strongly correlated with each other. The correlations with

MMPI-A scales were mixed. To start, it was difficult to specify how these scales should correlate

as there were no specific MMPI-A scales that measure attention or hyperactivity. It was

hypothesized that they would best correlate with the more externalizing, impulsive scales. Based

on this particular expectation, the BASC-2 scales did not perform adequately. In fact, the highest

correlations were found with the Psychasthenia and Schizophrenia scales.

The School Problems composite did not demonstrate correlations as expected. The two

school scales correlated well with each other, but the third scale (Sensation Seeking) of the

composite did not demonstrate strong predicted correlations within the composite, but did

demonstrate a strong correlation with Hyperactivity (theoretically meaningful, but not predicted).

When correlated with the MMPI-A, it was expected that the School Composite would correlate

with the more externalizing, impulsive, and oppositional scales. These general correlations did

not emerge, but a specifically predicted relationship emerged between Sensation Seeking and

Hypomania, providing strong convergent validity support.

Overall, reasonable evidence emerged to support the convergent and discriminant validity

of the BASC-2. The internalizing scales strongly correlated in expected directions with the

specified scales, while the other scales of the BASC-2 adequately correlated in this study. The

predicted correlations between BASC-2 scales and MMPI-A scales were very strong and all

emerged as predicted.

The structure of the BASC-2 was evaluated in support of its factorial validity. The SRP-

A was supported with adequate fit for the full five-factor model as proposed by Reynolds and

Kamphaus (2004). This result suggests that the scores on the SRP-A from administrations with

youth from a juvenile offender population can be meaningfully interpreted in respect to a five-

factor higher order structure. The four-factor model was supported, although the five-factor model was a

better fit, and the current study, surprisingly, demonstrated a better fit between the model and the

data, χ2(98) = 360.83, CFI=.854, and RMSEA=.115, than the normative sample, χ2(98) = 4,143,

CFI=.848, and RMSEA=.116, as reported by Reynolds and Kamphaus (2004).

Two alternative three-factor models were tested and demonstrated moderate fit. One

model was the three-factor model that Reynolds and Kamphaus (2004) proposed with Attention

Problems and Hyperactivity loading onto the Internalizing Problems composite rather than onto

their own individual composite. The other three-factor model was constructed by this writer and

included Attention Problems and Hyperactivity loading onto the School Problems composite

(renamed Externalizing Problems) because ADHD symptoms were conceptualized as more

“behavioral” than the clinical scales of the Internalizing Composite and a better conceptual fit

with the School Problems composite. Although the alternative three-factor model was not as good a fit for

the data, χ2(101) = 378.33, CFI=.846, and RMSEA=.116, as the four or five-factor models, it

was a much better fit than the three-factor model, χ2(101) = 450.61, CFI=.806, and

RMSEA=.130, as proposed by Reynolds and Kamphaus (2004).

The CFA used theory to specify the models a priori and then tested their fit with the data.

EFA, however, was used to explore the factorial structure suggested by the data. The best

balance between well-defined, interpretable factors and simple structure emerged with a four-

factor model. The data resulted in some recreations of the SRP-A composites, but the structure

did not fully re-emerge. The clinical scales of the Internalizing Problems composite almost all re-

emerged in one factor except for Locus of Control which negatively loaded onto a single factor

with Relations with Parents. The Personal Adjustment composite re-emerged except for

Relations with Parents, as noted previously. The School Problems composite fully re-emerged

with additional scales (Attention Problems and Hyperactivity) as theorized in the previously

mentioned “Externalizing” composite.

Overall, the construct validity of the BASC-2 was supported by the current study.

Factorial validity emerged with adequate fit of the proposed higher-order factor structure and the

factors were mostly confirmed through exploratory analysis as well. The convergent and

discriminant validity results further support the construct validity of the SRP-A being used as a

clinical assessment instrument with the juvenile offender population.

Limitations to Internal Validity

Whenever utilizing self-report data in a study, a threat to internal validity is introduced.

The current study used two self-report measures with a population that is involved with the

justice system. Often, in situations involving legal aspects, youth may be inclined to withhold

some of their responses to appear more favorable on the instruments. The current study protected

against this by screening validity indicators from the instruments prior to analysis, but the threat

still exists.

Convergent and discriminant validity were evaluated with a small sample (n = 17) of

youth who completed both instruments. Nearly all of the specified inter-scale correlations were

significant, even with the small sample size. The results are impressive given the level of

correlations, but confidence in the results cannot be fully reached without a larger sample.

Therefore, the small sample can indicate trends in the correlations, but cannot be assumed

to be representative of the population.

Limitations to External Validity

The selection method for the current study was not randomized and the generalizability

of the results is limited to a degree by this. On the other hand, the data was gathered in the exact

manner in which it would be collected during the actual use and administration of the instrument

with this particular population. Therefore, the generalizability of the results is limited in that the

selection was not randomized from within the juvenile offender population, but generalizability

was also improved because the data was gathered from the actual clinical administration of the

instrument per its intended use.

Implications for Future Research

The sample size for the current study was adequate for the analysis conducted, but a

larger sample size would provide further confidence in the results. Particularly, further research

studies could evaluate the first order factor structure of the BASC-2 SRP-A within the juvenile

offender population. The current study confirmed the scales with an evaluation of internal

consistency, but with a much larger sample (approximately 600-1,000), the first order factors

could also be confirmed. Even though the scales demonstrated excellent internal consistency for

the most part, it would be beneficial to evaluate how well the full structural model proposed by

Reynolds and Kamphaus (2004) could be confirmed within the current population. It could also

be of benefit to analyze how specific items function within this population to determine whether any

appear inappropriate for the population.

The current study sought to confirm and explore the construct validity of the instrument

within the juvenile offender population. Additional research could be conducted to expand on

this study by evaluating the incremental validity. An instrument that is valid and measures what

it intends to measure is vital, but an instrument that is valid and also provides relevant

information beyond what is already available is incrementally valid. For instance, does the

SRP-A have predictive or discriminant validity? The current study did not explore these questions and

future research could extend the current results in this way.

It is also of interest whether the SRP-A maintains construct validity in a more

specified subsample of the juvenile offender population. Juvenile offenders are a heterogeneous

population and although they share some similarities, overall these youth can be quite different

from each other. In this light, a particular subset of the greater juvenile offender population may

be of interest for future validity studies.

Implications for Practice

The current study evaluated the validity of an instrument in clinical use with a specific

population. In this light, it is a study with a focus on providing evidence toward decisions about

using this instrument with this particular population. The results of the current study support the

use of the BASC-2 SRP-A within the juvenile offender population for the clinical purposes of

psychological evaluations and screening assessments.

The current author recommends that the BASC-2 SRP-A be used as a broad screening and

assessment instrument within the juvenile offender population. The current results provide

evidence that the factorial validity of the SRP-A is stable within this population and that the

scales and composites of the instrument are interpretable within the population. In other words,

the evidence suggests that the scales seem to be measuring what they were intended to measure

and that the composites are appropriate combinations of the scales. Caution should be used when

interpreting differences, elevations, or fluctuations in scores of the less reliable scales, namely

Self-Reliance and Sensation Seeking. Although evidence emerged to support the validity of these

scales, the variability within these two scales is large enough to warrant caution when

determining the meaningfulness of a specific score.
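
One way to see why the less reliable scales call for caution is the standard error of measurement, SEM = SD x sqrt(1 - reliability). The sketch below is illustrative and was not part of the study's analysis. It uses the scale alphas listed in Appendix B and assumes that scores are on a T-score metric with a standard deviation of 10; the function name is hypothetical.

```python
import math


def standard_error_of_measurement(reliability, sd=10.0):
    """SEM = SD * sqrt(1 - reliability); sd=10 assumes a T-score metric."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Alpha values as listed in Appendix B of this study.
alphas = {
    "Self-Reliance": 0.596,
    "Sensation Seeking": 0.638,
    "Relations with Parents": 0.911,
}

for scale, alpha in alphas.items():
    sem = standard_error_of_measurement(alpha)
    # An approximate 95% band around an observed score is +/- 1.96 * SEM.
    print(f"{scale}: SEM = {sem:.1f}, 95% band = +/- {1.96 * sem:.1f} T-score points")
```

Under these assumptions, the band around a Self-Reliance or Sensation Seeking score is roughly twice as wide as the band around a Relations with Parents score, so small score differences on the two weaker scales may reflect measurement error alone.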

The scales of the Internalizing Problems composite emerged as the strongest in this instrument.

They demonstrated strong levels of internal consistency, strong inter-scale correlations, evidence

of convergent and discriminant validity, and structural consistency in the CFA and EFA. This

evidence overwhelmingly endorses the use of these scales in making clinical inferences,

diagnoses, or determinations of treatment needs. The internalizing scales are therefore strongly

recommended for clinical use.

Conclusion

The purpose of the current study was to evaluate the construct validity of the BASC-2

SRP-A when used as a broad screening instrument within the juvenile offender population.

Although the BASC-2 is suggested as an appropriate broad screening measure of conduct

problems, it had not previously been validated for use with juvenile offenders. Results of the

study support the construct validity of this instrument for this use within this population.

Almost all of the scales presented strong evidence of internal consistency; the exceptions were

Self-Reliance and Sensation Seeking. These two scales presented marginal levels of consistency that

were unacceptable. In evaluating correlations for convergent and discriminant validity, all of the

specified individual scale-to-scale correlations provided strong evidence of validity. Specifically,

Sensation Seeking and Hypomania; Atypicality and Schizophrenia; Anxiety and Hysteria,

Paranoia, and Psychasthenia; Depression and Depression; Somatization and Hypochondriasis and

Hysteria; and Relations with Parents, Interpersonal Relations, Self-Esteem, and Self-Reliance

with Psychopathic Deviate were expected to correlate and resulted in strong correlations.

Factorial validity was supported during confirmatory and exploratory factor analysis. The

full higher-order structure of the SRP-A was confirmed as an adequate fit for the data.

Alternative models with four and three factors were found to be acceptable fits for the data, but

not as good a fit as the full five-factor model. If the scales were to be divided into just three

factors (externalizing, internalizing, and personal adjustment) it is recommended that the

alternative model, with Attention Problems and Hyperactivity loading onto the externalizing

problems factor, be chosen over the three-factor model proposed by the test developers. During

EFA, the Internalizing Problems composite nearly completely emerged, while the externalizing

factor from the three-factor model emerged in place of the School Problems and

Inattention/Hyperactivity composites.

Overall, the BASC-2 SRP-A performed quite well within the current sample. The data for

this study was gathered during the clinical administration of the instrument and can therefore be

generalized to this use with this population. Based on the results of the current study, the SRP-A

can be recommended for use as a broad screening instrument for the juvenile offender

population.

References

American Psychological Association (2002). "Criteria for evaluating treatment guidelines."

American Psychologist 57(12): 1052-1059.

American Psychological Association (2006). "Evidence-Based Practice in Psychology."

American Psychologist 61(4): 271-285.

Benjamin, L. T., Jr. (2007). A brief history of modern psychology. Malden, MA, US, Blackwell

Publishing.

Bergeron, R., R. G. Floyd, et al. (2008). "The generalizability of externalizing behavior

composites and subscale scores across time, rater, and instrument." School Psychology

Review 37(1): 91-108.

Blanton, H. and J. Jaccard (2006). "Arbitrary Metrics in Psychology." American Psychologist

61(1): 27-41.

Butcher, J. N., C. L. Williams, et al. (1992). Minnesota Multiphasic Personality Inventory—

Adolescent (MMPI-A): Manual for administration, scoring, and interpretation.

Minneapolis, University of Minnesota Press.

Cicchetti, D. V. (1994). "Guidelines, criteria, and rules of thumb for evaluating normed and

standardized assessment instruments in psychology." Psychological Assessment 6(4):

284-290.

Cronbach, L. J. and P. E. Meehl (1955). "Construct validity in psychological tests."

Psychological Bulletin 52(4): 281-302.

DeCarlo, L. T. (1997). "On the meaning and use of kurtosis." Psychological Methods 2(3): 292-

307.

Evans, L. G. and J. Oehler-Stinnett (2008). "Validity of the OSU Post-Traumatic Stress Disorder

Scale and the Behavior Assessment System for Children Self-Report of Personality with

child tornado survivors." Psychology in the Schools 45(2): 121-131.

Federal Interagency Forum on Child and Family Statistics (2007). America’s Children: Key

National Indicators of Well-Being. Washington, DC, U.S. Government Printing Office.

Fletcher, J. M., D. J. Francis, et al. (2005). "Evidence-Based Assessment of Learning Disabilities

in Children and Adolescents." Journal of Clinical Child and Adolescent Psychology

34(3): 506-522.

Gorsuch, R. L. (1983). Factor analysis. Hillsdale, NJ, L. Erlbaum Associates.

Grisso, T. and L. A. Underwood (2004). Screening and Assessing Mental Health and Substance

Use Disorders Among Youth in the Juvenile Justice System. A Resource Guide for

Practitioners. Washington, DC, US, Department of Justice, Office of Justice Programs,

Office of Juvenile Justice and Delinquency Prevention.

Groth-Marnat, G. (2003). Handbook of psychological assessment (4th ed.). Hoboken, NJ, US,

John Wiley & Sons Inc.

Heng, K. and E. Wirrell (2006). "Sleep disturbance in children with migraine." Journal of Child

Neurology 21(9): 761-766.

Hogarty, K. Y., C. V. Hines, et al. (2005). "The Quality of Factor Solutions in Exploratory

Factor Analysis: The Influence of Sample Size, Communality, and Overdetermination."

Educational and Psychological Measurement 65(2): 202-226.

Hunsley, J. (2003). "Introduction to the Special Section on Incremental Validity and Utility in

Clinical Assessment." Psychological Assessment 15(4): 443-445.

Hunsley, J. and E. J. Mash (2007). "Evidence-based assessment." Annual Review of Clinical

Psychology 3: 29-51.

Jackson, D. L. (2001). "Sample Size and Number of Parameter Estimates in Maximum

Likelihood confirmatory factor analysis: A Monte Carlo investigation." Structural

Equation Modeling 8(2): 205-223.

Klein, D. N., L. R. Dougherty, et al. (2005). "Toward Guidelines for Evidence-Based

Assessment of Depression in Children and Adolescents." Journal of Clinical Child and

Adolescent Psychology 34(3): 412-432.

MacCallum, R. C., K. F. Widaman, et al. (2001). "Sample size in factor analysis: The role of

model error." Multivariate Behavioral Research 36(4): 611-637.

MacCallum, R. C., K. F. Widaman, et al. (1999). "Sample size in factor analysis." Psychological

Methods 4(1): 84-99.

Mash, E. J. and J. Hunsley (2005). "Evidence-Based Assessment of Child and Adolescent

Disorders: Issues and Challenges." Journal of Clinical Child and Adolescent Psychology

34(3): 362-379.

McMahon, R. J. and P. J. Frick (2005). "Evidence-Based Assessment of Conduct Problems in

Children and Adolescents." Journal of Clinical Child and Adolescent Psychology 34(3):

477-505.

Mundfrom, D. J., D. G. Shaw, et al. (2005). "Minimum Sample Size Recommendations for

Conducting Factor Analyses." International Journal of Testing 5(2): 159-168.

Pelham, W. E., Jr., G. A. Fabiano, et al. (2005). "Evidence-Based Assessment of Attention

Deficit Hyperactivity Disorder in Children and Adolescents." Journal of Clinical Child

and Adolescent Psychology 34(3): 449-476.

Reynolds, C. R. and R. W. Kamphaus (2004). Behavior assessment system for children.

Circle Pines, MN, American Guidance Service.

Silverman, W. K. and T. H. Ollendick (2005). "Evidence-Based Assessment of Anxiety and Its

Disorders in Children and Adolescents." Journal of Clinical Child and Adolescent

Psychology 34(3): 380-411.

Smith, G. T. (2005). "On Construct Validity: Issues of Method and Measurement."

Psychological Assessment 17(4): 396-408.

Snyder, H. N. (2006). Juvenile Arrests 2004. Washington, DC, US, Department of Justice,

Office of Justice Programs, Office of Juvenile Justice and Delinquency Prevention.

Snyder, H. N. and M. Sickmund (2006). Juvenile Offenders and Victims: 2006 National Report.

Washington, DC, US, Department of Justice, Office of Justice Programs, Office of

Juvenile Justice and Delinquency Prevention.

Tan, C. S. (2007). "Test Review: Behavior assessment system for children (2nd ed.)." Assessment

for Effective Intervention 32(2): 121-124.

Teplin, L. A., K. M. Abram, et al. (2006). Psychiatric Disorders of Youth in Detention.

Washington, DC, US, Department of Justice, Office of Justice Programs, Office of

Juvenile Justice and Delinquency Prevention.

Watkins, C. E. (1992). "Historical influences on the use of assessment methods in counseling

psychology." Counselling Psychology Quarterly 5(2): 177-188.

Weis, R. and L. Smenner (2007). "Construct validity of the Behavior Assessment System for

Children (BASC) Self-Report of Personality: Evidence from adolescents referred to

residential treatment." Journal of Psychoeducational Assessment 25(2): 111-126.

Youngstrom, E. A., R. L. Findling, et al. (2005). "Toward an Evidence-Based Assessment of

Pediatric Bipolar Disorder." Journal of Clinical Child and Adolescent Psychology 34(3):

433-448.

Appendix A, Stem and Leaf Plots for BASC-2 Scales

Attitude to School

Frequency Stem & Leaf

1.00 3 . 2

18.00 3 . 555577777777779999

29.00 4 . 00000000111122222222222333334

36.00 4 . 555555555555557777777778888888888888

40.00 5 . 0000000000000000000000222222222224444444

23.00 5 . 55555666666777777777788

11.00 6 . 00000000133

16.00 6 . 5555555555777888

16.00 7 . 0000011113333333

12.00 7 . 555666788888

2.00 8 . 00

Attitude to Teachers

Frequency Stem & Leaf

8.00 3 . 66899999

21.00 4 . 111111111111113333333

32.00 4 . 55666666667778888888888999999999

47.00 5 . 00000011112222222333333333333333333333444444444

21.00 5 . 555666668888888888888

26.00 6 . 00000000000000222222222224

14.00 6 . 55555555566677

20.00 7 . 00000000001222344444

9.00 7 . 577777999

3.00 8 . 122

Sensation Seeking

Frequency Stem & Leaf

2.00 2 . 44

2.00 2 . 78

6.00 3 . 133344

12.00 3 . 557777779999

31.00 4 . 0012222222233333333333333334444

32.00 4 . 55555557777777778888889999999999

39.00 5 . 000000000111111111111111222333333344444

33.00 5 . 666666666666666666668888888899999

18.00 6 . 000000011113333333

18.00 6 . 555555556677777788

7.00 7 . 0000022

4.00 7 . 6666

Atypicality

Frequency Stem & Leaf

14.00 4 . 00111111111111

44.00 4 . 22222222222222222222222222222222222333333333

28.00 4 . 5555555555555555555555555555

2.00 4 . 66

22.00 4 . 8888888888888888888888

13.00 5 . 0000000000000

15.00 5 . 222222333333333

3.00 5 . 445

6.00 5 . 666677

9.00 5 . 899999999

4.00 6 . 1111

6.00 6 . 222223

10.00 6 . 4455555555

1.00 6 . 6

3.00 6 . 888

8.00 7 . 00000000

2.00 7 . 33

1.00 7 . 5

3.00 7 . 667

1.00 7 . 8

Locus of Control

Frequency Stem & Leaf

17.00 3 . 66666666777889999

24.00 4 . 000111111244444444444444

33.00 4 . 666666666666666666888888888888889

31.00 5 . 0000111111111111222222222333344

36.00 5 . 555555555555555667777777788888888888

23.00 6 . 00000002222222444444444

11.00 6 . 66667789999

10.00 7 . 0111123344

12.00 7 . 666666666888

4.00 8 . 0333

3.00 8 . 555

Social Stress

Frequency Stem & Leaf

2.00 3 . 44

22.00 3 . 5556666788888888999999

31.00 4 . 0000000000000111111333333333333

38.00 4 . 55555555555555557777777779999999999999

38.00 5 . 11111111111111111111113333333333333333

27.00 5 . 555666666666666666678888889

21.00 6 . 000000222222222444444

10.00 6 . 6666666669

5.00 7 . 01122

1.00 7 . 6

2.00 8 . 00

Anxiety

Frequency Stem & Leaf

8.00 3 . 22223444

27.00 3 . 555555566777777788888888899

28.00 4 . 0000011222222222222233444444

35.00 4 . 55555555555556666677777888888888888

41.00 5 . 00000000111111222333333344444444444444444

27.00 5 . 666666667778888889999999999

13.00 6 . 0011222223444

12.00 6 . 555777777778

7.00 7 . 0001233

3.00 7 . 679

2.00 8 . 00

Depression

Frequency Stem & Leaf

46.00 4 . 0000000000000000111111111111113333333333333333

46.00 4 . 5555555555555666666667777777777778888889999999

27.00 5 . 000111111111111122333333444

30.00 5 . 555555555556667778888888899999

17.00 6 . 01111111111222344

8.00 6 . 66668889

12.00 7 . 000000122223

7.00 7 . 5668888

7.00 8 . 0222233

Sense of Inadequacy

Frequency Stem & Leaf

.00 3 .

7.00 3 . 5555788

32.00 4 . 00000000000011222222244444444444

19.00 4 . 6666677777799999999

44.00 5 . 00000000000001111111111223344444444444444444

23.00 5 . 66666666667788888888889

29.00 6 . 00000001111111222233344444444

20.00 6 . 55555555556668888889

10.00 7 . 0000011223

7.00 7 . 5555577

3.00 8 . 134

3.00 8 . 566

Somatization

Frequency Stem & Leaf

3.00 3 . 999

61.00 4 . 0000000000000000000000000000000000000000000011233334444444444

28.00 4 . 6666666777777777777777777999

24.00 5 . 000000001122222333333334

32.00 5 . 66666666666666666666668899999999

26.00 6 . 00000222222333333333333333

9.00 6 . 566888999

7.00 7 . 1113334

8.00 7 . 66666777

3.00 8 . 224

3.00 8 . 677

Attention Problems

Frequency Stem & Leaf

1.00 3 . 4

15.00 3 . 556666688888889

16.00 4 . 1111111333333344

22.00 4 . 5555555556777777999999

52.00 5 . 0000000000000011111111122222222222244444444444444444

24.00 5 . 555666666666788888889999

22.00 6 . 0000000001122333333334

31.00 6 . 5555555556666666667777778888899

6.00 7 . 000222

13.00 7 . 5555567777889

2.00 8 . 02

Hyperactivity

Frequency Stem & Leaf

1.00 3 . 3

26.00 3 . 66666666666666888999999999

22.00 4 . 1111222222222233444444

39.00 4 . 555555555555556668888888888888899999999

40.00 5 . 1111111111111112222222222244444444444444

11.00 5 . 77777777779

20.00 6 . 00000000000022222333

21.00 6 . 555555555666666788899

8.00 7 . 02222233

8.00 7 . 56666788

5.00 8 . 11334

2.00 8 . 77

Relations with Parents

Frequency Stem & Leaf

4.00 1 . 9999

9.00 2 . 033333333

14.00 2 . 55555555566688

19.00 3 . 0000111111333333334

25.00 3 . 5555555666666666668888899

24.00 4 . 000011111111112222333444

28.00 4 . 5555555555566666666777888999

24.00 5 . 000000001111122233333444

24.00 5 . 555555577777777777888888

20.00 6 . 00000000123333333333

13.00 6 . 5555555555557

Interpersonal Relations

Frequency Stem & Leaf

2.00 2 . 69

4.00 3 . 1144

9.00 3 . 566779999

16.00 4 . 2222222222222222

27.00 4 . 555555555555555555555588999

45.00 5 . 011111222222222222222222222222222333333333344

61.00 5 . 5555555555555555555555566666666666667777889999999999999999999

33.00 6 . 122222222222222222222222222222222

Self-Esteem

Frequency Stem & Leaf

2.00 2 . 33

3.00 2 . 778

6.00 3 . 033333

14.00 3 . 56667779999999

19.00 4 . 0112222222223333333

18.00 4 . 555555555557778888

33.00 5 . 000000000000222222222222222222344

51.00 5 . 555555555556666677777777777777777777777777777789999

53.00 6 . 00000000000000000000011111111111111122222222222222222

Self-Reliance

Frequency Stem & Leaf

3.00 2 . 444

4.00 2 . 7777

22.00 3 . 0000000000233333333333

31.00 3 . 5555555555555555556777788888889

45.00 4 . 000000011111111111111111112333333334444444444

20.00 4 . 55557777777778888888

41.00 5 . 00000000000000000000000033333333333333333

18.00 5 . 555556668888888888

15.00 6 . 111111111133344

4.00 6 . 6777

1.00 7 . 1
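
The plots in this appendix split each stem into several rows (two rows per stem for most scales, five for Atypicality). As a reading aid, the sketch below shows one way such rows could be built from a list of integer scores. It is an assumption about the layout, not the statistical software that produced the plots, and the scores and function name are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(scores, parts_per_stem=2):
    """Return stem-and-leaf rows as 'frequency stem . leaves' strings.

    parts_per_stem must divide 10 (for example 1, 2, or 5).
    """
    if 10 % parts_per_stem != 0:
        raise ValueError("parts_per_stem must divide 10")
    width = 10 // parts_per_stem          # number of leaf digits per row
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)    # tens digit(s) and units digit
        rows[(stem, leaf // width)].append(str(leaf))
    return [
        f"{len(leaves):.2f} {stem} . {''.join(leaves)}"
        for (stem, _part), leaves in sorted(rows.items())
    ]


# Hypothetical T-scores used only to show the layout.
for row in stem_and_leaf([32, 35, 35, 37, 40]):
    print(row)
```

Each output row gives the number of cases, the stem (tens digit), and one leaf (units digit) per case, matching how the rows above are read.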

Appendix B, BASC-2 Scale Cronbach Alphas and Item-Total Correlations

Attitude to School

r = .819

Item   Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item010 .525 .804 I don't care about school.

item040 .534 .800 I don't like thinking about school.

item070 .381 .824 My school feels good to me.

item082 .688 .772 School is boring.

item112 .492 .807 I get bored in school.

item142 .552 .796 I feel like I want to quit school.

item172 .784 .752 I hate school.

Sensation Seeking

r = .638

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item027 .043 .652 I like loud music.

item047 .374 .605 I like to take chances.

item057 .044 .671 I would rather be a police officer than a teacher.

item077 .522 .559 I like it when my friends dare me to do something.

item087 .347 .605 I like to play rough sports.

item107 .291 .617 I like to experiment with new things.

item117 .359 .600 I like to ride in a car that is going fast.

item137 .442 .577 I like to be the first one to try new things.

item147 .442 .581 I like to dare others to do things.

Atypicality

r = .840

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item062 .458 .835 Sometimes, when alone, I hear my name.

item092 .698 .805 I feel like people are out to get me.

item100 .671 .810 Someone wants to hurt me.

item119 .495 .830 Even when alone, I feel like someone is watching me.

item122 .733 .803 I hear voices in my head that no one else can hear.

item130 .536 .825 I see weird things.

item149 .212 .849 Someone else controls my thoughts.

item152 .430 .836 I do things over and over and can't stop.

item160 .699 .807 I hear things that others cannot hear.

Depression

r = .877

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item003 .393 .878 Nothing goes my way.

item008 .504 .872 I used to be happier.

item021 .446 .875 Nothing is fun anymore.

item033 .743 .859 Nobody ever listens to me.

item038 .697 .860 I just don't care anymore.

item051 .622 .864 I don't seem to do anything right.

item063 .683 .863 Nothing ever goes right for me.

item068 .671 .864 Nothing about me is right.

item081 .690 .859 I feel like my life is getting worse and worse.

item093 .645 .862 I feel depressed.

item098 .477 .874 No one understands me.

item111 .433 .875 I feel sad.

Somatization

r = .759

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item004 .349 .762 My muscles get sore a lot.

item009 .485 .731 I often have headaches.

item034 .555 .712 Often I feel sick in my stomach.

item039 .452 .735 Sometimes my ears hurt for no reason.

item064 .538 .722 I get sick more than others.

item069 .450 .737 My stomach gets upset more than most people's.

item099 .590 .710 I feel dizzy.

Attention Problems

r = .713

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item005 .402 .685 People tell me I should pay more attention.

item035 .458 .674 I think that I have a short attention span.

item053 .544 .656 I have attention problems.

item065 .268 .709 I give up easily.

item083 .278 .707 I forget things.

item095 .180 .722 I listen when people are talking to me.

item113 .518 .661 I have trouble paying attention to the teacher.

item125 .270 .709 I pay attention when someone is telling me how to do something.

item143 .561 .656 I have trouble paying attention to what I am doing.

Hyperactivity

r = .816

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item088 .647 .775 I have trouble standing still in lines.

item118 .334 .822 I talk while other people are talking.

item124 .674 .769 I have trouble sitting still.

item134 .570 .789 I feel like I have to get up and move around.

item148 .462 .806 I talk without waiting for others to say something.

item154 .512 .799 People tell me to be still.

item164 .669 .770 People tell me that I am too noisy.

Self-Esteem

r = .843

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item001 .603 .830 I like who I am.

item031 .726 .805 I wish I were different.

item044 .624 .820 I wish I were someone else.

item061 .651 .815 I feel good about myself.

item074 .741 .802 I like the way I look.

item091 .675 .811 I get upset about my looks.

item104 .062 .887 I am good at things.

item121 .688 .810 My looks bother me.

Attitude to Teachers

r = .811

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item037 .461 .799 My teacher understands me.

item067 .563 .786 My teacher cares about me.

item085 .564 .785 My teacher trusts me.

item097 .448 .800 Teachers make me feel stupid.

item115 .553 .787 Teachers look for the bad things that you do.

item127 .561 .787 Teachers are unfair.

item145 .410 .804 My teacher is proud of me.

item157 .466 .798 My teachers want too much.

item175 .548 .787 My teacher gets mad at me for no good reason.

Locus of Control

r = .832

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item006 .505 .819 Things go wrong for me, even when I try hard.

item019 .588 .810 What I want never seems to matter.

item036 .406 .830 My parents have too much control over my life.

item049 .417 .828 My parents are always telling me what to do.

item066 .658 .801 My parents blame too many of their problems on me.

item079 .569 .811 I get blamed for things I can't help.

item109 .502 .819 My parents expect too much from me.

item139 .636 .803 I am blamed for things I don't do.

item169 .598 .809 People get mad at me, even when I don't do anything wrong.

Social Stress

r = .838

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item026 .461 .830 My friends have more fun than I do.

item056 .478 .829 Other children are happier than I am.

item075 .532 .823 People say bad things to me.

item086 .559 .820 People act as if they don't hear me.

item105 .564 .820 I am lonely.

item116 .678 .813 I am left out of things.

item135 .569 .819 Other people find things wrong with me.

item146 .544 .822 I feel out of place around people.

item165 .456 .829 I feel that others do not like the way I do things.

item176 .524 .823 Other people are against me.

Anxiety

r = .848

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item011 .373 .846 I can never seem to relax.

item020 .373 .847 I worry about little things.

item041 .596 .831 I worry a lot of the time.

item050 .527 .836 I often worry about something bad happening to me.

item071 .412 .843 I get so nervous I can't breathe.

item080 .619 .829 I worry when I go to bed at night.

item101 .415 .843 I feel guilty about things.

item108 .577 .834 I get nervous.

item110 .647 .828 I worry but I don't know why.

item131 .583 .832 I get nervous when things do not go the right way for me.

item138 .415 .843 Little things bother me.

item140 .589 .831 I worry about what is going to happen.

item170 .431 .842 I am afraid of a lot of things.

Sense of Inadequacy

r = .811

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item024 .476 .796 I never seem to get anything right.

item030 .347 .810 I cover up my work when the teacher walks by.

item054 .468 .797 Most things are harder for me than for others.

item060 .454 .799 I never quite reach my goal.

item084 .583 .785 Even when I try hard, I fail.

item090 .480 .796 I am disappointed with my grades.

item114 .515 .791 When I take tests, I can't think.

item120 .640 .776 I want to do better, but I can't.

item144 .573 .788 I fail at things.

item150 .398 .803 I quit easily.

Relations with Parents

r = .911

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item042 .625 .905 I get along well with my parents.

item072 .696 .901 I am proud of my parents.

item102 .549 .910 I like going places with my parents.

item126 .797 .895 My parents are easy to talk to.

item132 .542 .910 My mother and father like my friends.

item141 .672 .903 My mother and father help me if I ask them to.

item155 .686 .902 My parents listen to what I say.

item156 .714 .900 I like to be close to my parents.

item171 .737 .899 My parents trust me.

item173 .756 .897 My parents are proud of me.

Interpersonal Relations

r = .787

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item013 .433 .775 My classmates don't like me.

item043 .599 .751 Other children don't like to be with me.

item073 .515 .760 Other kids hate to be with me.

item103 .488 .764 I feel that nobody likes me.

item133 .579 .748 People think I am fun to be with.

item151 .559 .751 I am slow to make new friends.

item163 .495 .767 I am liked by others.

Self-Reliance

r = .596

Item   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted   Item Text

item016 .249 .576 If I have a problem, I can usually work it out.

item046 .181 .595 I can handle most things on my own.

item076 .306 .560 I am dependable.

item106 .180 .597 I can solve difficult problems by myself.

item123 .508 .500 I am good at making decisions.

item136 .363 .541 I like to make decisions on my own.

item153 .215 .586 My friends come to me for help.

item166 .381 .534 I am someone you can rely on.
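
For readers who want to see how the three statistics in this appendix relate to one another, the sketch below computes a scale alpha, the corrected item-total correlations, and alpha-if-item-deleted from a small response matrix. It is an illustration under stated assumptions, not the procedure or software used in this study: the responses are hypothetical, the function names are invented for the example, and negatively worded items are assumed to have been reverse-scored beforehand.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of item columns, each holding one score per respondent."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]          # scale total per respondent
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_analysis(items):
    """Return (corrected item-total r, alpha if item deleted) for each item."""
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]                 # scale without item i
        rest_totals = [sum(row) for row in zip(*rest)]
        results.append((pearson(col, rest_totals), cronbach_alpha(rest)))
    return results


# Hypothetical responses: 4 items (rows here) by 6 respondents, scored 0-3.
responses = [
    [0, 1, 2, 3, 2, 1],
    [1, 1, 2, 3, 3, 0],
    [0, 2, 2, 2, 3, 1],
    [3, 0, 1, 2, 0, 2],
]

print(f"alpha = {cronbach_alpha(responses):.3f}")
for index, (r_it, alpha_del) in enumerate(item_analysis(responses), start=1):
    print(f"item {index}: corrected r = {r_it:.3f}, alpha if deleted = {alpha_del:.3f}")
```

An item whose removal raises alpha above the full-scale value, as with several Self-Reliance and Sensation Seeking items above, is contributing little shared variance to its scale.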

Appendix C, Results from Confirmatory Factor Analysis

Standardized Parameter Estimates for the One- and Two-factor Models

Scale   Two-Factor: Personal Adjustment   Two-Factor: Personal Maladjustment   One-Factor: Overall Functioning

Self-Reliance .323 - -.195

Self-Esteem .856 - -.696

Interpersonal Relations .737 - -.644

Relations with Parents .528 - -.473

Hyperactivity - .498 .480

Attention Problems - .598 .588

Somatization - .663 .661

Sense of Inadequacy - .800 .795

Depression - .857 .855

Anxiety - .814 .809

Social Stress - .844 .853

Locus of Control - .767 .766

Atypicality - .745 .740

Sensation Seeking - .248 .228

Attitude to Teachers - .540 .539

Attitude to Teachers - .416 .408

Standardized Parameter Estimates for Three-factor Model

Factors

Scale   Personal Adjustment   Externalizing Problems   Internalizing Problems

Self-Reliance .326 - -

Self-Esteem .864 - -

Interpersonal Relations .731 - -

Relations with Parents .527 - -

Hyperactivity - - .491

Attention Problems - - .593

Somatization - - .663

Sense of Inadequacy - - .799

Depression - - .857

Anxiety - - .816

Social Stress - - .847

Locus of Control - - .767

Atypicality - - .743

Sensation Seeking - .426 -

Attitude to Teachers - .742 -

Attitude to Teachers - .677 -

Standardized Parameter Estimates for Three-factor* Model

Factors

Scale   Personal Adjustment   Externalizing Problems   Internalizing Problems

Self-Reliance .319 - -

Self-Esteem .855 - -

Interpersonal Relations .739 - -

Relations with Parents .529 - -

Hyperactivity - .762 -

Attention Problems - .781 -

Somatization - - .664

Sense of Inadequacy - - .793

Depression - - .860

Anxiety - - .822

Social Stress - - .854

Locus of Control - - .766

Atypicality - - .737

Sensation Seeking - .500 -

Attitude to Teachers - .580 -

Attitude to Teachers - .567 -

Standardized Parameter Estimates for Four-factor Model

Factors

Scale   Personal Adjustment   Inattention/Hyperactivity   Internalizing Problems   School Problems

Self-Reliance .319 - - -

Self-Esteem .856 - - -

Interpersonal Relations .738 - - -

Relations with Parents .529 - - -

Hyperactivity - .773 - -

Attention Problems - .832 - -

Somatization - - .663 -

Sense of Inadequacy - - .794 -

Depression - - .862 -

Anxiety - - .821 -

Social Stress - - .854 -

Locus of Control - - .766 -

Atypicality - - .736 -

Sensation Seeking - - - .479

Attitude to Teachers - - - .700

Attitude to Teachers - - - .683
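
When reading the standardized estimates above, it can help to recall that in a standardized solution where each scale loads on a single factor, the squared loading is the proportion of that scale's variance accounted for by the factor. The sketch below is illustrative only and was not part of the study's analysis; it squares three loadings copied from the four-factor table, and the names used are hypothetical.

```python
# Standardized loadings copied from the four-factor table above.
four_factor_loadings = {
    "Attention Problems": 0.832,
    "Sensation Seeking": 0.479,
    "Self-Reliance": 0.319,
}


def variance_explained(loading):
    """Proportion of a scale's variance accounted for by its factor.

    Assumes a standardized solution with no cross-loadings.
    """
    return loading ** 2


for scale, loading in four_factor_loadings.items():
    share = variance_explained(loading)
    print(f"{scale}: loading {loading:.3f} -> {share:.1%} of variance")
```

Under that assumption, the factor accounts for most of the variance in Attention Problems but only a small share of the variance in Self-Reliance, which is consistent with the weaker reliability reported for that scale.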

Appendix D, Results from Exploratory Factor Analysis

 

Loadings from 5-factor Solution

Factor 1 Factor 2 Factor 3 Factor 4 Factor 5

Interpersonal Relations -.851

Social Stress .695 .253

Self-Esteem -.599

Atypicality .544 .393

Anxiety .482 .321 .381 .296

Sense of Inadequacy .402 .363

Somatization .363 .278

Hyperactivity .797

Attention Problems .727 -.274

Sensation Seeking .469

Self-Reliance .747

Locus of Control .907

Relations with Parents -.578

Depression .390 .530

Attitude to School -.615

Attitude to Teachers .284 -.551

Loadings from 6-factor Solution

Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6

Anxiety .843

Depression .772

Locus of Control .648 .361

Sense of Inadequacy .601 -.251

Somatization .509

Hyperactivity .917

Attention Problems .642 -.256

Sensation Seeking .411

Self-Reliance .774

Relations with Parents -.806

Attitude to School .730

Attitude to Teachers .563

Interpersonal Relations .744

Self-Esteem -.284 .509

Social Stress .413 -.475

Atypicality .264 .375 -.383