The Pennsylvania State University
The Graduate School
College of Education
TESTING THE FACTOR STRUCTURE OF THE BEHAVIOR RATING INVENTORY OF
EXECUTIVE FUNCTION (BRIEF) PARENT FORM USING A MIXED CLINICAL SAMPLE OF
YOUTH
A Dissertation in
School Psychology
by
Maria C. Smith
© 2013 Maria C. Smith
Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
December 2013
The dissertation of Maria C. Smith was reviewed and approved* by the following:
Beverly J. Vandiver
Associate Professor of Education
Dissertation Adviser
Co-Chair of Committee
Barbara A. Schaefer
Associate Professor of Education
Co-Chair of Committee
Lynn S. Liben
Distinguished Professor of Psychology
Hoi K. Suen
Distinguished Professor of Education
Kathleen J. Bieschke
Head, Department of Educational Psychology, Counseling, and Special Education
Professor of Counseling Psychology
*Signatures are on file in the Graduate School
ABSTRACT
Executive functions (EF) are cognitive processes that are controlled and coordinated during
complex tasks (Monsell, 1996). EF has become an increasingly popular construct in clinical evaluation and, more recently, in the school setting. If children struggle to perform basic classroom functions, such as inhibiting responses, regulating behavior, or predicting
outcomes, their academic success is likely to be compromised (Bull & Scerif, 2001; Palfrey et
al., 1985). The Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000) is
a behavior-rating scale designed to assess the behavioral characteristics related to executive-
function deficits of youth in school and home environments. However, there continues to be
debate regarding the current two-factor, eight-scale factor structure of the BRIEF-Parent form
when applied in a mixed clinical (or school) sample of school-age youth. This study examined
the factor structure of scores from the BRIEF-Parent form. Ratings were provided by 371
parents or guardians of children living in Western Pennsylvania whose children had been
referred for psychoeducational evaluation. The original model (i.e., 2-factor, 8-scale) currently
employed in the instrument was examined and compared to six alternative models. Results were
analyzed through confirmatory factor analysis (CFA). Findings indicated that in a mixed clinical sample of youth, four of the seven models showed a good fit to the data (e.g., 2-factor, 8-scale: CFI = .933, SRMR = .049; 3-factor, 9-scale: CFI = .956, SRMR = .041). Although there were only small differences between the models, RMSEA remained above the recommended cutoff (i.e., > .08), indicating some potential misfit in all models. Comparisons of the models indicated
that the 3-factor, 9-scale model fit the scores slightly better. These findings provide support for
the use of the two-factor, eight-scale version, which is the basis for the current BRIEF-Parent
form, but competing models fit the data just as well, if not better. Thus, the findings also raise
questions about the use of the BRIEF-Parent in its present format in the school setting.
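For readers less familiar with the fit indices reported above, the following sketch shows how RMSEA (Steiger, 1990) and CFI (Bentler, 1990) are computed from a model's chi-square statistic; the formulas are standard, but the numeric inputs below are purely illustrative and are not values from this study.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    # Root mean square error of approximation: misfit (chi-square in
    # excess of its degrees of freedom) scaled by df and sample size.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    # Comparative fit index: improvement of the tested model over the
    # baseline (independence) model, bounded between 0 and 1.
    excess_m = max(chi2_m - df_m, 0.0)
    excess_b = max(chi2_b - df_b, excess_m, 0.0)
    return 1.0 if excess_b == 0.0 else 1.0 - excess_m / excess_b

# Hypothetical chi-square values; n = 371 mirrors this study's sample size.
print(f"RMSEA = {rmsea(200.0, 19, 371):.3f}")       # RMSEA = 0.160
print(f"CFI   = {cfi(200.0, 19, 2000.0, 28):.3f}")  # CFI   = 0.908
```

An RMSEA above .08 (as in the hypothetical output) signals potential misfit even when CFI and SRMR look acceptable, which is why the abstract reports the indices jointly rather than relying on any one of them.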
Table of Contents
List of Figures .............................................................................................................................. viii
List of Tables ................................................................................................................................. ix
Acknowledgements ..........................................................................................................................x
INTRODUCTION ...........................................................................................................................1
LITERATURE REVIEW ................................................................................................................7
History of Executive Function .....................................................................................................7
Conceptualization of EF ...............................................................................................................9
Theory of unity .......................................................................................................................10
Theory of non-unity ................................................................................................................11
Underlying commonality ........................................................................................................13
EF as a cultural construct ........................................................................................................14
Executive Function in Children..................................................................................................16
Typical EF Development ........................................................................................................16
Role of EF in the learning environment .................................................................................19
The Behavior Rating Inventory of Executive Function .............................................................19
Parent Version ........................................................................................................................20
Description .........................................................................................................................20
Development ......................................................................................................................23
Normative sample ..............................................................................................................25
Evidence for factor structure ..............................................................................................25
U.S. versions ..................................................................................................................26
Translated versions ........................................................................................................30
Summary .......................................................................................................................34
Reliability Evidence of the BRIEF-Parent Form ....................................................................36
Internal consistency ...........................................................................................................36
Interrater reliability ............................................................................................................37
Test-retest reliability ..........................................................................................................38
Other Evidence for the Construct Validity of the BRIEF-Parent Form .................................38
Predictive validity ..............................................................................................................38
Convergent validity ............................................................................................................40
Independent research of convergent validity .....................................................................44
Convergent validity and specific clinical populations .......................................................47
Discriminant validity .....................................................................................................49
Ecological validity .........................................................................................................52
Social consequences ......................................................................................................54
Summary .................................................................................................................................57
Purpose of the Present Study ..................................................................................................58
METHOD ......................................................................................................................................61
Participants .............................................................................................................................61
Geographical Context .............................................................................................................63
Measures .................................................................................................................................63
Demographic information ..................................................................................................63
BRIEF-Parent form ............................................................................................................63
Procedure ................................................................................................................................65
CFA Guidelines and Models ..................................................................................................66
Models................................................................................................................................66
Fit criteria ...........................................................................................................................68
RESULTS ......................................................................................................................................79
Preliminary Analyses ..............................................................................................................79
Descriptive statistics ...............................................................................................................79
Confirmatory Factor Analyses ...............................................................................................81
Criteria .............................................................................................................................81
Models................................................................................................................................82
Eight-scale models ......................................................................................................82
Nine-scale models .......................................................................................................84
Eight- versus nine-scale models .................................................................................88
Subsamples .................................................................................................................90
OVR subsample .....................................................................................................90
Caucasian subsample .............................................................................................92
Mother subsample ..................................................................................................94
DISCUSSION ................................................................................................................................98
Eight-Scale Models of the BRIEF-Parent ..............................................................................98
Nine-Scale Models of the BRIEF-Parent .............................................................................102
Differences in Findings ........................................................................................................104
Reasons for Misfit ................................................................................................................106
Limitations ............................................................................................................................107
Implications ..........................................................................................................................109
Practice .............................................................................................................................109
Future research .................................................................................................................111
Conclusions ...........................................................................................................................113
REFERENCES ............................................................................................................................115
APPENDIX A ..............................................................................................................................135
Glossary of Acronyms .................................................................................................................135
APPENDIX B ..............................................................................................................................137
Items Comprising Scales on BRIEF-Parent form ........................................................................137
APPENDIX C ..............................................................................................................................138
School District Approval .............................................................................................................138
APPENDIX D ..............................................................................................................................139
Licensed Psychologist Approval .................................................................................................139
APPENDIX E ..............................................................................................................................140
Office for Research Protections Correspondence .......................................................................140
APPENDIX F...............................................................................................................................141
Structure Coefficient, Effect Sizes, and Error Terms for Subsamples ........................................141
Standardized Structure Coefficients for BRIEF-Parent for OVR Sample .......................141
Standardized Structure Coefficients for BRIEF-Parent for Caucasian Sample ..............143
Standardized Structure Coefficients for BRIEF-Parent for Mother Rater Sample .........145
LIST OF FIGURES
Figure 1. Unity-8 Model ....................................................................................................70
Figure 2. 2Original-8 Model ..............................................................................................71
Figure 3. 2Donders-8 Model ..............................................................................................72
Figure 4. Unity-9 Model ....................................................................................................73
Figure 5. 2Monitor-9 Model ..............................................................................................74
Figure 6. 3Monitor-9 Model ..............................................................................................75
Figure 7. 4Monitor-9 Model ..............................................................................................76
Figure 8. Standardized Coefficients of 3Monitor-9 Model ...............................................89
LIST OF TABLES
Table 1. Demographic Characteristics of Sample..............................................................62
Table 2. Composition of Models Organized by Factor and Indicator ..............................69
Table 3. Descriptive Statistics of Raw Scale Scores on the BRIEF-Parent Form ............80
Table 4. Summary of Fit Indices of CFA (ML Extraction) Models on the BRIEF-
Parent Form Scale Scores for a Mixed Disability Sample ........................................83
Table 5. Standardized Structure Coefficients for BRIEF-Parent Scales for Mixed
Disability Sample Arranged by Model (Maximum Likelihood Extraction) .............85
Table 6. Summary of Fit Indices of CFA (ML) Models of the BRIEF-Parent Form
for the OVR Sample ..................................................................................................93
Table 7. Summary of Fit Indices of CFA (ML) Models for the BRIEF-Parent Form
Based on the Caucasian Participants ..........................................................................95
Table 8. Summary of Fit Indices of CFA (ML) Models for BRIEF-Parent Form
Based on the Mothers as Raters .................................................................................97
Table 9. Root Mean Square Error Approximation (RMSEA) Values Arranged by
Model and Study ......................................................................................................106
Table 10. Percentage of Participants Receiving Special Education Services for
School Sample and School District by Category .....................................................109
Acknowledgements
They say it takes a village to raise a child. Since I’m in the process of raising two young
children as well as completing this dissertation, I will venture to say that it also takes a village to
complete a dissertation!
Thank you to my committee for helping me to complete this project. Dr. Beverly
Vandiver, my adviser, thank you for helping me to finish “lil D.” You kept me calm when my
nerves got the best of me, and helped me to realize that I was capable of completing this. Thank
you for encouraging me, but for also challenging me. You’ve helped me to turn this into
something I’m very proud of. Dr. Barbara Schaefer, thank you for becoming my “last minute
co-chair.” You’ve also served as a great role model to me in regards to balancing family and
academia. Dr. Hoi Suen, I enjoyed every class that I took from you. Thank you for including me
in many interesting projects and for helping me to more fully appreciate (and enjoy) the field of
measurement. Dr. Lynn Liben, you have always been extremely encouraging and are such a nice
person. I was honored to have such a renowned scholar on my committee. Thank you for your
contributions.
Thank you to Dr. Douglas Della Toffalo, who dedicated a lot of time and effort into
helping me complete this dissertation. You started off as a great internship supervisor and helped
foster my interest in executive function and the subfield of school neuropsychology. I can’t say
that I will miss all of those files, but I will always remember the help and kindness you extended
to me so that I could complete this goal of finishing this degree. Also thank you to Danielle
Wilson for your help with entering data.
This dissertation is dedicated to my family. First and foremost, thank you to my
wonderful, supportive, and amazing husband, Chad. You never stopped believing in me and
made this journey so much easier and more enjoyable. Thank you for taking over the household
and childcare duties on numerous occasions (and weekend trips) so that I had the opportunity to
work on this dissertation. I love you, and I know I couldn’t have done this without you. To my
kids, “Dr. Mommy” loves you more than I can possibly express. To my son, Drew, who just
turned five years old, thank you for your hugs and kisses and for the patience you have
demonstrated at such a young age. You are a bright, funny, energetic, and sweet little boy; I am
so blessed to have you in my life. To Lia, our little “pumpkin” who is now three years old, you
were always my little cheerleader! You are an inquisitive, expressive, sharp, and hilarious little
girl. You have been a fighter since the day you were born and you taught me to never give up. To
my sister, Caren, thank you for continuing to genuinely care about how things were progressing
for me after so many others had stopped asking. I have always looked up to you; you are an
amazing sister, mom, and friend. To my sister, Cathy, thank you for your support and
encouragement throughout this journey! To my parents, John and Elaine, thank you for serving
as role models as to the importance of pursuing my education. To my best friend, Anne, thank
you for being an amazing listener and for being there for me through so many ups and downs. To
my good friend, Katie, you have always made me laugh throughout our days at Penn State
whether that was honking at people or chasing buses. I’m so glad that the program brought us
together. To my dear friend, Maria (Big), I appreciate you being such a good friend over the
years. Thank you for graciously hosting me the night before my defense and for your help with
delivering my final copy on campus. To my cohort members (Katie, Kasey, Sharise, and Terry)
thank you being such great people and for helping to make the long hours in CEDAR as well as
studying for comps more enjoyable. And to those who I did not mention here, but have helped
me over the years to complete this dissertation, a heartfelt “thank you.”
INTRODUCTION
Since Congress enacted the Education for All Handicapped Children Act (Public Law 94-
142) in 1975, the largest increase (433%) in students identified for special education services has
been in the other health impairment (OHI) category (National Center for Education Statistics
[NCES], 2010). This law, commonly known as the Individuals with Disabilities Education Act (IDEA; 1997; Public Law 105-17), was reauthorized in 2004 as the Individuals with Disabilities Education Improvement Act (IDEIA; Public Law 108-446). Under the IDEA regulations (2006), a child may be provided with special education services under the OHI category if the following criteria are met:
Other health impairment means having limited strength, vitality, or alertness,
including a heightened alertness to environmental stimuli, that results in limited
alertness with respect to the educational environment, that—(i) Is due to chronic
or acute health problems such as asthma, attention deficit disorder or attention
deficit hyperactivity disorder, diabetes, epilepsy, a heart condition, hemophilia,
lead poisoning, leukemia, nephritis, rheumatic fever, sickle cell anemia, and
Tourette syndrome; and (ii) Adversely affects a child’s educational performance.
[§300.8(c)(9)]
One explanation for the growth in the number of students qualifying for services within the OHI
category is the increase in the diagnosis of attention deficit hyperactivity disorder (ADHD;
Akinbami, Liu, Pastor, & Reuben, 2011). Additionally, changes in diagnostic criteria for clinical
diagnoses such as autism spectrum disorders (ASD) in the fifth edition of the Diagnostic and
Statistical Manual of Mental Disorders (DSM-5) may indirectly result in an increase in students
qualifying for special education services under the OHI category. Another contributory factor to
the increased use of the OHI category is the survival rate of significantly premature (i.e., less
than or equal to 32 weeks gestation) infants (Alexander & Slay, 2002; Johansson & Cnattingius,
2010). The prevalence of specific learning disabilities and ADHD in premature infants without
neurological abnormalities is two to three times higher than in the overall population (Aylward,
2004). As of the 2008-2009 school year, an estimated 659,000 children aged 3 to 21 received
special education services under the OHI category, accounting for 10.2% of the special education
population in the United States (NCES, 2010).
Besides the regulations within IDEA, another federal law that mandates protection for
students with disabilities is Section 504 of the Rehabilitation Act of 1973 (commonly shortened
to Section 504; 34 C.F.R., Part 104), later amended to the Americans with Disabilities Act of
1990 (ADA), and further revised as part of ADA Amendments Act of 2008. The law decreed
that any entity, including schools, that receives federal funding must take measures to ensure
access and equal rights to services for those who have a “physical or mental impairment that
substantially limits one or more major life activities” (ADA, 2000; para. J1). The most recent
update in 2008 expands the current definition of a disability to be more inclusive of those with a
history or record of impairment, or of those who are perceived by others as having impairments.
A common problem for school psychologists and others responsible for applying the spirit of the
law in the school setting is determining what criteria of Section 504 qualify students for an
educational plan. Medical conditions can qualify a student for additional academic supports
under Section 504, even if the student does not meet the criteria to receive services under IDEA,
although only a small percentage of students (approximately 1%) fall within this categorization (Holler & Zirkel, 2008). Of this 1%, the most common qualifying impairment (approximately 80% of cases) has been ADHD.
Medical diagnoses, such as ADHD and many other disorders relating to developmental,
neurological, and psychiatric conditions, have been largely associated with deficits in emotional
control, memory, inhibiting responses and regulating behavior (Barkley, 1997; Palfrey, Levine,
Walker, & Sullivan, 1985). Difficulties in these areas have also been documented in those with
autism (Mayes, Calhoun, Mayes, & Molitoris, 2012), specific learning disabilities (Obrzut,
1995), and traumatic brain injury (Loken, Thorton, Otto, & Long, 1995). Problem behaviors such as impulsivity, poor organization, and weak self-monitoring skills pervade several IDEA special education categories; these impairments are not unique to those who qualify for services under the OHI label.
Thus, reliable and valid methods that can stand up to rigorous scrutiny are needed to identify children in need of services. Psychologists use many assessment methods,
including interviews, observations, and behavior rating scales, to gather important information
about students. Best practice involves gathering information across multiple sources and settings
(Hintze, Volpe, & Shapiro, 2007). However, psychologists are somewhat limited when
identifying students displaying behaviors that fall under the OHI category, because most medical
conditions, including ADHD, are more commonly diagnosed by medical professionals (Akinbami
et al., 2011). Such medical conditions can often have a negative impact on a child’s academic
achievement (Johnson & Reid, 2011). As more awareness is gained of the behavioral
components related to specific medical conditions, it is becoming increasingly apparent that a
reciprocal connection exists between these problem behaviors and academic success (Lane,
O’Shaughnessy, Lambros, Gresham, & Beebe-Frankenberger, 2002). As such, psychologists
and other professionals in the school setting are charged with identifying these students in a
standardized manner and providing them with an appropriate educational experience.
School psychologists commonly use behavior rating scales as part of an overall
evaluation to gather information about a student’s eligibility under IDEA or Section 504.
Behavior rating scales are typically composed of statements about a wide variety of behaviors.
Raters are asked to indicate the frequency of each behavior observed in a designated child. To be
useful in gathering large amounts of information across many areas of functioning, the scales
must be psychometrically sound (Blais, 2011; Rojahn et al., 2010), as well as cost- and time-
effective (Chafouleas, Riley-Tillman, & Sugai, 2007).
Parents are most commonly asked to complete behavior rating scales about their
children, but have a tendency to rate their children as having significantly more problems on all
behavior rating scales when compared to teacher ratings (Offord et al., 1996). However, parents
tend to be more accurate reporters of children’s hyperactivity and inattentiveness than children
themselves (Loeber, Green, & Lahey, 1990). Documented behaviors that are consistent across
both the home and school environments are important information for psychologists to gather. It
is equally important for psychologists to be aware of behaviors that differ based on the child’s
environmental setting. Essentially, the job of a school psychologist is to accurately survey
children’s potential mental and physical impairments, and determine how these factors may have
an impact on their lives and educational experience.
Although the OHI category encompasses a vast array of impairments, these health conditions can be examined not only in terms of their impact on the body, but also in terms of their possible influence
on the brain itself. This idea of examining the structure of the brain to fully understand
psychological processes and behaviors is the crux of the field of neuropsychology. The use of
neuropsychological concepts within the context of the school is growing, particularly over the
past decade (Hale & Fiorello, 2004). Psychologists' role is to adequately evaluate children displaying behaviors such as those listed above and to draw appropriate conclusions about their
eligibility for services. The term “executive function” (EF), which encapsulates these behaviors,
is drawn originally from neuropsychological literature and warrants further examination and
understanding.
In the last decade, EF has become a popular topic in applied settings, including the
school. Professionals who work with children are acknowledging that deficits in the area of EF
are important to consider because of their linkage to poor academic performance and problematic behaviors (Bull & Scerif, 2001). This importance is evidenced by the fact that in 2002 an entire issue of the peer-reviewed journal Child Neuropsychology focused on the Behavior Rating Inventory of Executive Function (BRIEF; Gioia, Isquith, Guy, & Kenworthy, 2000), and subsequent issues have included articles about the BRIEF. A search of the Wiley Online Library
revealed that between 2002 and 2012 Psychology in the Schools, a peer-reviewed school
psychology journal, has published 89 articles that focused on executive function.
Despite the attention given to the BRIEF, few studies, beyond those of the developers,
have examined the reliability and validity of its scores. A PsycINFO search reveals that
approximately 265 studies have used the BRIEF as a way of measuring executive function skills
in various populations from 2002 to 2012. In contrast, since the BRIEF’s development 12 years
ago, nine studies have examined the factor structure of the BRIEF-Parent form, two of which
were conducted by the authors of the instrument. Studies on the BRIEF’s factor structure have
been largely exploratory in nature (e.g., Batan et al., 2011; Slick et al., 2006). Only four studies
have conducted confirmatory factor analysis (CFA) on the BRIEF: (a) Gioia et al. (2002), as the
test developers, provided the initial examination, (b) Egeland and Fallmyr (2010) tested a
Norwegian version of the BRIEF, (c) LeJeune et al. (2010), including one of the test developers,
used the normative sample, and (d) Huizinga and Smidts (2011) examined a Dutch version of the
BRIEF. Thus, the purpose of the current study is to re-examine the two-factor, eight-scale model
currently being used in the BRIEF instrument. Additionally, the study will extend the research
on the factor structure of the BRIEF-Parent scores at the scale level by examining several
alternative factor structures. These factor structures will be examined through CFA in a U.S.
sample of youth who are in kindergarten through 12th grade and have mixed clinical diagnoses.
It is important to examine the factor structure of any instrument because accurate test
interpretations depend upon knowing the number of factors underlying the items of a measure.
The study will either strengthen or weaken the case that the present factor structure of the
BRIEF-Parent is sufficient for use and may have implications for the appropriate level of
interpretation for youth who are experiencing problems in executive functions. Because school
psychologists are charged with assessing and meeting the needs of a rapidly growing population
of children not only in OHI but in all special education categories, instruments that yield reliable
and valid scores must be used in assessment. Public schools receive federal funding, so by law they
have a responsibility to ensure access and equal rights to those with disabilities and those who
are perceived as having impairment. Such a disability may fall under OHI or any number of
special education categories when considering limited ability in the areas of self-regulation,
attention, emotional control and memory. Parent and teacher input regarding student behavior is
crucial and can be provided in a standardized manner through the BRIEF. This information may
help students succeed in school and, ultimately, in the future.
LITERATURE REVIEW
In 1996, Monsell called the current understanding of how cognitive processes are
controlled and coordinated during complex tasks (i.e., executive function [EF]) an “embarrassing
zone of almost total ignorance” (p. 93). Since then, a large amount of research has been
conducted about the topic, but a great deal of debate in understanding and using the EF concept
still exists in applied settings. A brief history of EF, including its conceptualizations and cultural
considerations, will be discussed. EF will also be addressed as it specifically relates to children
and the difficulties associated with using adult EF research in the realm of childhood assessment.
A practical way of analyzing executive dysfunction, namely the lack of socially appropriate
levels of executive functioning, is through observer ratings. Problems associated with this
methodology as well as the scope of measuring executive function will be reviewed. Then, the
Behavior Rating Inventory of Executive Function (BRIEF; Gioia, Isquith, Guy, & Kenworthy,
2000), one of the most popular observer rating scales of executive functioning, will be reviewed,
including its description and development. There are two versions of the “original” BRIEF, a
teacher form and a parent form. For the purposes of this study, the BRIEF-Parent will be
reviewed. Evidence for its factor structure will be provided through a review of previous
research and its psychometric properties. Finally, the purpose of the present study will be
provided.
History of Executive Function
The executive functions of inhibition and control have been studied in the field of
neurophysiology since the 1830s, with the focus on EF gradually making its way into the field of
psychology at the onset of the 20th
century (Lewis & Carpendale, 2009; Mead, 1910). Luria
(1961) and Vygotsky (1978), influenced by the initial research, investigated “higher” cognitive
processes, including planning, memory, and inhibition. The term “executive function” first
appeared in the 1970s, and was referred to as the “central executive” of the brain (Baddeley &
Hitch, 1974). Later, Lezak (1983) described executive function as “those capacities that enable a
person to engage in independent, purposive, self-serving behavior successfully” (p. 38) and as
being “necessary for appropriate, socially responsible … adult conduct” (p. 507). It is generally
agreed that many high-level cognitive functions are directly related to the prefrontal cortex
(PFC) of the brain and are labeled as executive functions (Luria, 1973).
In the 1970s and 1980s the information-processing approach was revised to include
“supervisory systems that regulate the flow of information and control behavior” (Lewis &
Carpendale, 2009, p. 2). The focus of this research was also on a working-memory model
(Baddeley & Hitch, 1974), which is now part of the current conceptualization of executive
function. As technology improved in the 1990s and 2000s, the study of executive function
became less about social interaction, and more about individual functioning and
neuropsychological pathways. With the development of magnetic resonance imaging (MRI),
positron emission tomography (PET), and electroencephalogram (EEG) scans, various neural
pathways have been examined, enabling researchers to pinpoint areas of the brain involved in the
executive function processes, particularly in the frontal lobe. Zelazo, Müller, Frye, and
Marcovitch (2003) indicate that research on executive function in the past two decades has
focused on pinpointing the specific component skills of executive function. Miyake et al. (2000)
used structural equation modeling to show that executive function is made up of distinct
entities that nonetheless share an underlying commonality. The vast amount of EF research, including
Miyake et al.’s, has been on an adult population, but there has been a shift toward examining EF
in children, including its development, and the best ways to model it. Furthermore, the focus has
been on how well EF research on adults could be generalized to understanding EF in children.
Researchers have questioned whether or not the adult conceptualization of brain processes could
help in understanding children in the areas of learning, emotional control, and other important
social skills. For example, Lehto, Juujärvi, Kooistra, and Pulkkinen (2003) have found mixed
results in children on the dimensions of executive function. In comparison to the findings of
Miyake et al. (2000) for adults, children showed some processes similar to those of adults as
well as some distinct processes at work. Clinicians working with children with neurological
difficulties also began examining their EF skills (Gioia, Isquith, Guy, & Kenworthy, 2000) as
well as developing interventions to address their deficits. Despite the progression of the research
on EF and its application, the construct is still an area for debate.
Conceptualization of EF
Executive function is considered to be composed of four areas: (a) goal formation
abilities, (b) ability to plan, (c) ability to carry out goal-directed plans, and (d) effective
performance (Lezak, 1983, p. 507). Each area helps explain how humans adapt their behaviors
as well as refrain from exhibiting inappropriate behaviors in order to meet a continually changing
environment. Some scholars designate EF as an umbrella term used to describe the control functions
of the PFC, particularly those of a goal-oriented nature (Best, Miller, & Jones, 2009).
Executive function is a construct that has eluded universal definition in spite of its
frequent appearance in neuropsychological literature (Jurado & Rosselli, 2007). The difficulty in
defining EF stems, in part, from inconsistent behaviors of those with damage in areas of the brain
believed to have a direct impact on executive function (Miyake et al., 2000). The debate
about the EF construct has been divided into three camps of conceptualization, each supported
by its own set of literature: (a) the existence of one underlying ability (the theory of unity), (b)
the existence of several, but distinct, brain processes (the theory of non-unity), and (c) a
combination of both unity and non-unity.
Theory of unity. The premise of the theory of unity is all executive processes combined
constitute an overarching, interconnected supervisory system commonly referred to as the
“central executive” (Baddeley, 1986; Norman & Shallice, 1986). This theory entails the
fundamental question of whether a single, underlying ability is responsible for a variety of
behaviors that have been labeled as executive functions. Due to the complexity of the various
systems involved in executive function, the idea that one system controls all aspects of executive
function is considered an outdated conceptualization. Baddeley (1996), one of the pioneers of
executive function research, states:
It is probably true to say that our initial specification of the ‘central executive’ was so
vague as to serve as little more than a ragbag into which could be stuffed all complex
strategy selection, planning, and retrieval checking that clearly goes on when subjects
perform even the apparently simple digit span task. (p. 6)
The idea of a central executive has often been disregarded in recent research (e.g.,
Packwood, Hodgetts, & Tremblay, 2011) because actions that are supposedly controlled by one
overarching entity of the brain lack specificity. An updated version of this theory is that both
general intelligence (g) and working memory are highly linked to a core factor of EF and the
organization of goal-directed behavior (Duncan, Emslie, Williams, Johnson, & Freer, 1996).
Duncan et al. (1996) contend that Spearman’s (1904) g is a direct reflection of the brain’s frontal
lobes. As such, a connection between consistent deficits in frontal lobe functionality and
measurement of g is not as evident in research because measures used to assess “average
performance on a diverse range of tests” (Duncan et al., 1996, p. 259), such as the Wechsler
Adult Intelligence Scale (WAIS; Wechsler, 1955), are not best suited for testing intelligence in
the clinical population (Teuber, 1972). Instead, Duncan et al. argue, tests that incorporate fluid
intelligence are better suited, although not frequently used, for assessing cognitive ability in the
clinical population. This aspect aligns with a theory in which several broad strata, together with
the overarching mechanism g, are considered responsible for intelligence (Carroll,
1993). Fluid intelligence is typically measured through novel problem solving with spatial or
verbal materials. When patient behavior is viewed through this framework, evidence of specific
and consistent deficits in patients with frontal lobe lesions becomes more apparent. Duncan et
al. focused particularly on demonstrating deficits in the area of “goal neglect,” which is defined
as “disregard of a task requirement even though it has been understood” (p. 265). In spite of
Duncan et al.’s persistence about the problems in conceptualizing and measuring intelligence,
evidence shows (e.g., Eslinger & Damasio, 1985; Shallice & Burgess, 1991a; Stuss &
Alexander, 2000) that patients with major PFC lesions can perform in the superior range of
intelligence tests (e.g., WAIS). Hence, the debate about the conceptualization of EF ensues,
which appears to be shifting in support of the concept of non-unity.
Theory of non-unity. Some scholars (Godefroy, Cabaret, Petit-Chenal, Pruvo, &
Rousseaux, 1999; Shallice & Burgess, 1991a) claim that EF is composed of numerous facets,
rejecting the notion of a core EF factor. Their argument, sometimes referred to as the theory of
non-unity, is based on the responses of patients with PFC lesions when administered cognitive
tests. Many patients with PFC lesions perform inconsistently on task-based executive function
tests (e.g., Tower of Hanoi [TOH]; Shallice & Burgess, 1991a) as well as on some cognitive
tests, such as the WAIS and Raven’s Progressive Matrices (Raven, Court, & Raven, 1988). If an
underlying single factor exists, and the function of the PFC is directly related to EF, then all
tasks purported to measure it should be difficult to perform for a patient with PFC damage (Stuss
& Benson, 1986).
Due to the ambiguity and confusion that have been associated with EF, its role in the
brain is sometimes conceptualized as a “black box.” More modern research has focused on
“decomposing the proposed ‘black box’ into more informative subcomponents” (Packwood,
Hodgetts, & Tremblay, 2011, p. 457). A common finding is that patients with frontal lesions do
not have consistent or predictable deficits in memory, recall, or attention (Goldberg, 2001).
Stuss and Alexander (2000) indicate that it can take researchers years to gather a sufficient
number of patients with well-defined frontal lesions. Even when such a sample is obtained,
individual differences may play a major role in how these patients perform on task-based
measures. In addition, the intercorrelations between EF tasks in many studies are often found to be
lower (i.e., r ≤ .40; Hughes & Graham, 2002; Lehto, 1996) than expected and, in turn, often
not statistically significant. In a small sample of 35 ninth-grade Finnish students, Lehto (1996)
found low intercorrelations (e.g., rs = -.18 to .06) among student performances on three
commonly used neuropsychological task-based measures (Wisconsin Card Sorting Test [WCST],
Heaton, 1981; TOH; & Goal Search Task [GST], Vilkki & Holst, 1989; see Appendix A for a
glossary of acronyms). The WCST and TOH are among the most common instruments used in
research to directly measure executive functioning skills (e.g., Beck, Schaefer, Pang, & Carlson,
2011; Slomine et al., 2002). Lehto (1996) claimed, based on these results, that the
intercorrelation should be higher if a central executive function exists.
A problem that is indirectly a result of the shift toward a theory of non-unity is that
attention has been placed on parsing out individual processes rather than focusing on the
commonality of various factors. Séguin and Zelazo (2005) note that although factor analysis has
been useful in clarifying various constructs of executive function over the past 20 years,
researchers’ views on the underlying performance results tend to vary between studies.
Packwood et al. (2011) provide an example of this phenomenon in that it is difficult to see how
the factor of “visual processing” examined in one study (Floyd, Bergeron, & Hamilton, 2004) is
distinct from “visuospatial storage-and-processing coordination” examined in a separate study
(Fournier-Vicente, Larigauderie, & Gaonac’h, 2008). Packwood et al. have called for
transparency in the labeling of subcomponents in order to expedite the comparison between
studies and to decrease the ambiguity of EF constructs. As with most debates, there also exists a
group of researchers who have attempted to combine ideas from both theories.
Underlying commonality. A camp of researchers contends that the best
conceptualization of EF is the incorporation of both arguments (e.g., Fisk & Sharp, 2004; Lehto
et al., 2003; Miyake et al., 2000). The premise is that EF processes are “clearly distinguishable”
from one another, but each process is still related to some degree and “share some underlying
commonality” (Miyake et al., 2000, p. 72). In an individual difference study of 137
undergraduate students, Miyake et al. (2000) administered a battery of widely used executive
tasks (e.g., WCST and TOH) to examine three commonly postulated executive functions:
shifting, updating, and inhibition. Confirmatory factor analysis (CFA) indicated that the best fit
of the data could be described through a three-factor model reflecting shifting, updating, and
inhibition. Miyake et al. (2000) also reported that the target executive function factors were
moderately correlated (rs = .42 to .63), but separable. A higher-order model was proposed in
which the commonality of the three factors was emphasized. However, the authors deemed the
“reduced” model as “good” since the “fit indices [met] standard criteria and [the] χ² difference
test indicate[d] that the model’s fit [was] not statistically worse than the fit of the full model” (p.
73).
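The χ² difference test referenced above compares two nested models: if the constrained (reduced) model raises χ² by less than the critical value for the change in degrees of freedom, its fit is not statistically worse. A minimal sketch of that decision rule follows; the fit statistics used here are invented placeholders, not Miyake et al.’s (2000) values:

```python
# Critical chi-square values at alpha = .05 (standard table values).
CRIT_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}

def chi_sq_difference(chi_full, df_full, chi_reduced, df_reduced):
    """Return (delta_chi, delta_df, worse) for two nested models.

    worse=True means the reduced model fits significantly worse
    than the full model at alpha = .05.
    """
    delta_chi = chi_reduced - chi_full  # constraints can only raise chi-square
    delta_df = df_reduced - df_full
    return delta_chi, delta_df, delta_chi > CRIT_05[delta_df]

# Invented example: the reduced model adds two constraints.
d_chi, d_df, worse = chi_sq_difference(chi_full=120.4, df_full=84,
                                       chi_reduced=123.5, df_reduced=86)
print(round(d_chi, 1), d_df, worse)  # 3.1 2 False -> not statistically worse
```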
Although Miyake et al.’s (2000) study is frequently cited in the executive function
literature, there are a number of limitations. One is the small number of participants for a CFA
(N = 137). Furthermore, participants were undergraduate students, who may have had
above-average intelligence or socioeconomic status, making generalizations to the adult population
difficult. Finally, the use of CFA does not directly address the nature of EF. Instead, the
findings indicate that the three selected measures are tapping distinct aspects of EF.
EF as a cultural construct. Even though a lack of clarity exists about the definition of
EF, scholars generally agree that EF plays a major role in human behavior and for that reason
must be studied further. These functions enable individuals to organize their thoughts, create a
plan, carry out the plan, and persevere on a task until it is completed. These functions are
considered essential to success in school and work settings as well as in
everyday life (Barkley, 1997). The construct of EF itself, as with any construct, is inherently
based on one’s beliefs and views of the world. It is important to question whether the construct
as measured (in this case EF) is changed or affected in some manner because it is subject to
societal norms or opinions. It may be practically impossible to determine this partiality;
however, it is worth considering whether the term executive function may be biased based on the
language used to label it or the framework from which it stems. An example would be those
behaviors that are considered undesirable in the school and work setting, such as lack of
inhibition and emotional control, and are clustered into the category of “executive dysfunction.”
These behaviors may or may not be considered inappropriate in various populations or cultures
worldwide. Cultural norms vary in terms of the display of emotion, interpretability of behaviors,
and societal norms that lie at the heart of those behaviors labeled as executive functions. The act
of “guiding, directing, and managing cognitive, emotional, and behavioral functions,” as Gioia et
al. (2000, p. 1) note, seems to be necessary in any society.
Theory and research in developmental psychopathology stem mostly from research
conducted in Western cultures; thus, little research had been conducted on developing culturally
sensitive modes of intervention as recently as the late 20th
century (Coll, Akerman, &
Cicchetti, 2000). In the past decade, culturally specific research in the area of self-regulation,
particularly anger control and anger suppression, has begun to emerge. For example, in some cultures
overt expression of anger demonstrated by males may be considered socially acceptable.
However, Martinez, Schneider, Gonzales, and del Pilar Soteras de Toro (2008) demonstrated in a
group of 498 middle-school Cuban students that both males and females who displayed anger
control tended to be more likely to be rated by peers as well-liked, labeled as best friends, and
considered leaders than those students rated by peers as having difficulty in controlling their
anger. Additionally, in a group of 166 Korean American adolescents between the ages of 11 and
15, anger suppression was linked to depressive symptoms whereas weaker anger control and
greater outward anger expression were associated with externalizing problems (Park, Kim,
Cheung, & Kim, 2010).
Research conducted in non-Western cultures consistently demonstrates the reciprocal
relationship between self-regulation and socio-emotional competence and adjustment (e.g.,
Eisenberg, Liew, & Pidada, 2004; Martinez et al., 2008; Park et al., 2010). For instance,
Eisenberg et al. (2004) surveyed a group of 112 Indonesian students in third grade and three
years later in sixth grade. Students were asked to nominate and rank the four classmates they liked
most and the four classmates they liked least. Additionally, three teachers were asked to rate each
student in terms of regulation, social functioning, and negative emotionality. Results indicated
that boys’ results tended to hold across time and across reporters more consistently than girls’,
but ultimately, good self-regulation and low negative emotionality were good predictors of
positive socio-emotional functioning in both sexes.
Because the concept of EF extends across cultures and lifestyles, and may heavily affect
individuals’ interaction with their learning environment, educators have become increasingly
interested in the concept. In a PsycINFO search, Bernstein and Waber (2007) reported that, in
1985, there were only five peer-reviewed articles about EF in education-related journals. Similar
publications almost tripled (14) in 1995. By 2005, over 500 articles were published in
education-related journals about EF. Thus, it appears that educators realize the impact that EF
may have on children’s educational experience (Best & Miller, 2010). Research involving the
development of executive functioning and its role in the learning process began to increase in the
field of education and psychology.
Executive Function in Children
Typical EF development. Extensive focus on EF in the adult population subsequently
led to a call for research to be conducted on children to obtain a better grasp of it
developmentally (Lewis & Carpendale, 2009). Hughes and Graham (2002) claim that the body
of literature on children and EF is still in its early stages for three main reasons. One, until
recently the PFC was incorrectly believed to become functionally mature only once a person reached
adolescence. Two, early examinations of soldiers who endured head injuries in war were
misinterpreted to suggest that the effects of PFC lesions were not apparent, or rather, not realized until adulthood
(Stuss & Benson, 1986). And three, tests used to measure executive function were traditionally
difficult in nature, making it challenging and inappropriate to use them to assess children. The
shift in the research population has resulted in the use of less complex instruments in assessing
similar functions in children. Simplifying instruments can sometimes lead to inappropriate
interpretation from examiners beyond the scope of the instrument as well as greater
manipulability of task component demands (Best, Miller, & Jones, 2009). These alterations to
the original (i.e., adult) instruments raise questions whether such changes may alter what is
actually measured.
Most of the research that initially focused on children in the 1980s and 1990s was on the
atypical development of EF, most commonly in ADHD and autism populations (Hughes &
Graham, 2002). Recently, the research has shifted toward examining normal executive function
development. EF skills can be fostered for all children through verbal scaffolding, playing
games that require sustained attention and planning as well as through giving children legitimate
choices and decision-making power (Dawson & Guare, 2009). Best et al. (2009) note, however,
that a disproportionate number of the test participants in many research studies are preschool age
(ages 2 to 5 years) and speculate that this imbalance has occurred for several reasons. One is that
researchers believe a great deal of understanding can be gained by focusing on this age range
when executive functioning is first observable and the types of behaviors associated with
executive functioning need to be activated in social or educational settings. As the brain
develops, many of the executive functions that are measurable in adults are just beginning to
emerge. Tasks designed to assess EF in children tend to be less complex than adult tasks, making
them simpler and creating less confusion in attempting to single out specific EF abilities. For
example, an exercise in complex response inhibition that children may be asked to complete is
known as “Baby Stroop.” This exercise involves matching small cups and spoons, and large
cups and spoons. The child is then told to play a “topsy-turvy” game and is given instructions to
match small “baby” spoon to big cup, and large “mommy” spoon to small cup. This task differs
from the commonly known Stroop Color-Word Test (Stroop, 1935), often given to the adult
population to test the same construct. In the child’s version a physical object is available for the
participant to touch, and there are only two variables to manipulate. In the Stroop Color-Word
Test for adults, there are no physical objects for the participant to touch and there are more than
two variables. Consequently, testing children requires the examiner to change the tasks from that
required of the adult population. It is legitimate to question whether or not the adult and child
tasks are tapping the same construct (Garon, Bryson, & Smith, 2008).
Studying EF in preschool children is an important line of inquiry. However, it is equally
important to broaden the scope of EF research to include examination of the school-age
population. By expanding the age range to include all youths, a better grasp of development in
executive functioning can be examined. Romine and Reynolds (2005) conducted a meta-
analysis of EF studies, which involved samples ranging from age 5 to adulthood, and concluded that the
greatest increases in EF occurred in verbal fluency, planning, design fluency, and inhibition of
perseveration from ages five to eight years old. Additionally, the “sleeper effect” may exist,
meaning that individual differences as a young child may not show noticeable effects until
middle school (Best & Miller, 2010). An example of the sleeper effect would be the seemingly
minor effects of EF on theory of mind as a preschooler (i.e., having the mental capacity to
interpret and predict one’s own and other people’s behavior). The negligible abnormalities in EF
may appear to be inconsequential at such an early age to the child’s social interaction, but may
balloon into major social deficits as a teenager (Best et al., 2009). It is becoming increasingly
apparent that EF deficits that could lead to social and emotional problems may start in young
children as problems that seem small and unobtrusive, and then develop into major deficits in
adolescence or adulthood. The development of EF is especially important as children enter a
formal learning atmosphere.
Role of EF in the learning environment. After age five, most children are involved in
school as well as more non-family social settings, both of which require increased self-control.
Executive function is important to understand in the learning environment because of the
repercussions from executive dysfunction. If children are not able to adequately perform basic
classroom functions, such as inhibiting responses, regulating behavior, or predicting outcomes,
their academic success is likely to be compromised (Bull & Scerif, 2001; Palfrey et al., 1985).
An important link may exist between early executive functioning and future academic
achievement. Clark, Pritchard, and Woodward (2010) tested preschool-aged children (at age
four) using individual executive function tasks (e.g., TOH) as well as teacher ratings of executive
functioning using the BRIEF-Preschool version (BRIEF-P; Gioia, Espy, & Isquith, 2003). Based
on a teacher-rated measure of mathematics achievement, students who performed well on the
tasks at age four relative to peers were also rated higher than their peers at age six. However,
these researchers also found the converse to be true about early executive function delay.
Children who showed delays in executive functioning development during their preschool years
also tended to have below average mathematics performance two years later. These findings
replicate and extend prior findings in this area. Children who have been identified as having
specific learning difficulties in mathematics have also been found to experience difficulties in the
areas of inhibitory control, set shifting, and working memory (Bull & Scerif, 2001).
The Behavior Rating Inventory of Executive Function
The Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000) is a
behavior rating scale designed to assess the behavioral characteristics related to executive
function deficits of youth in the school and home environments. The BRIEF is probably the
best-known instrument designed to measure EF through a questionnaire format (Thorell &
Nyberg, 2008). Gioia et al. (2000) indicate that the goal from the outset was to “develop a
psychometrically sound measure of executive function in children that would be easy to
administer and score and would yield clinically useful information about commonly agreed upon
domains of executive function” (p. 35).
There are two versions of the original BRIEF (the Parent form and the Teacher form;
Gioia et al., 2000), which are intended for youth ages five through 18. There are several
variations of each version of the BRIEF, which are designed for different age ranges. The
BRIEF-Preschool version (BRIEF-P; Gioia, Espy, & Isquith, 2003) is available for both parents
and teachers to rate children between the ages of 2 and 5. Two self-report versions were created
for individuals to rate their own behavior: one for youth between the ages of 8 and 18 years
(BRIEF- Self-Report [BRIEF-SR]; Guy, Isquith, & Gioia, 2004) and an adult version (BRIEF-
Adult; Roth, Isquith, & Gioia, 2005), suitable for those 18 to 90 years old. Additionally, an
informant version is available as part of the BRIEF-Adult for those persons who are in frequent
contact with the adult being evaluated. For the purposes of this study, the parent version of the
BRIEF for youth ages 5 through 18 is reviewed.
Parent Version
Description. The BRIEF-Parent form is an 86-item questionnaire, in which
parents/guardians are asked to rate problematic behaviors of their child. Responses are
aggregated to form eight clinical scales: (a) Inhibit, (b) Shift, (c) Emotional Control, (d) Initiate,
(e) Working Memory, (f) Plan/Organize, (g) Organization of Materials, (h) Monitor; and two
validity scales: (i) Inconsistency, and (j) Negativity. The Inhibit scale measures the ability to
suppress impulses and to stop one’s own behavior at the proper time. The Shift scale assesses
the ability to move freely from one situation, activity, or aspect of a problem to another without
“getting stuck” on a topic; it also taps behaviors relating to transitioning, tolerating change, and
problem-solving flexibly. The Emotional Control scale relates to the ability to modulate emotions,
such as anger, and to avoid rapid mood changes. The Initiate scale measures the ability to begin
a task or activity, and to independently problem-solve or generate ideas. The Working Memory
scale assesses the capacity to hold information in mind for the purpose of encoding information
and achieving goals. The Plan/Organize scale assesses abilities to develop appropriate steps
ahead of time in order to carry out events in a systematic manner, and to prioritize tasks in a
fashion that is not haphazard. The Organization of Materials scale relates to abilities to maintain
orderliness in everyday situations. The Monitor scale relates to abilities to keep track of one’s
own and others’ efforts through “work-checking” behaviors (Gioia et al., 2000, p. 17).
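Scale raw scores on instruments of this kind are simple sums of item ratings. The aggregation can be sketched as follows, assuming the BRIEF’s three-point Never/Sometimes/Often response format; the item-to-scale mapping shown is invented for illustration, since the actual assignment of the 86 items is part of the copyrighted instrument:

```python
# Sketch of aggregating item ratings into clinical scale raw scores.
# The item-to-scale mapping is hypothetical; the real BRIEF mapping
# is defined in the published scoring materials.
RATING = {"Never": 1, "Sometimes": 2, "Often": 3}

SCALE_ITEMS = {  # hypothetical item numbers per scale
    "Inhibit": [1, 9, 17],
    "Shift": [2, 10, 18],
    "Emotional Control": [3, 11, 19],
    "Working Memory": [4, 12, 20],
}

def scale_raw_scores(responses):
    """responses maps item number -> 'Never' / 'Sometimes' / 'Often'."""
    return {scale: sum(RATING[responses[i]] for i in items)
            for scale, items in SCALE_ITEMS.items()}

responses = {i: "Sometimes" for i in range(1, 21)}
responses[1] = "Often"
print(scale_raw_scores(responses))  # Inhibit = 7, the others = 6
```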
Gioia et al. (2000) attempted to address the area of bias through the Inconsistency scale
and the Negativity scale. The Inconsistency scale is designed to gauge how often a rater answers
similar questions in an inconsistent manner. For example, a rater may answer Never in response
to Item 44 (Gets out of control more than friends), but also answer Often in response to Item 54
(Acts too wild or out of control; Gioia et al., p. 15). If such inconsistency emerges across similar
items throughout the instrument, a high Inconsistency score will be associated with the BRIEF-
Parent scores. Thus, Gioia et al. recommend that clinicians examine the protocols carefully
when the Inconsistency scale is abnormally high (≤ 6 is “acceptable;” 7 to 8 is “questionable;”
and ≥ 9 is “inconsistent”; p. 15). Examiners also need to inquire about the inconsistencies
identified. If the rater’s explanations of the inconsistencies are reasonable, then the scores from
the protocol should still be considered valid. If explanations are not reasonable, the rating scale
should not be used as a source of information.
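The logic of the Inconsistency check can be sketched as follows. The cutoffs come from Gioia et al. (2000) as cited above; the rule of summing absolute rating differences across designated item pairs is an assumption made here for illustration, and only the (44, 54) pair comes from the manual’s example; the remaining pairs, and the small number of pairs, are invented:

```python
RATING = {"Never": 1, "Sometimes": 2, "Often": 3}

# Similar-content item pairs. Pair (44, 54) is the manual's example;
# the other pairs are invented placeholders (the actual form uses a
# larger designated set of pairs).
PAIRS = [(44, 54), (7, 25), (11, 33)]

def inconsistency(responses):
    """Sum absolute rating differences across similar-item pairs.

    The difference-sum rule is an assumption; the cutoffs follow
    Gioia et al. (2000): <= 6 acceptable, 7-8 questionable,
    >= 9 inconsistent.
    """
    score = sum(abs(RATING[responses[a]] - RATING[responses[b]])
                for a, b in PAIRS)
    if score <= 6:
        label = "acceptable"
    elif score <= 8:
        label = "questionable"
    else:
        label = "inconsistent"
    return score, label

responses = {44: "Never", 54: "Often", 7: "Sometimes", 25: "Sometimes",
             11: "Often", 33: "Often"}
print(inconsistency(responses))  # (2, 'acceptable')
```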
The Negativity scale is also used to examine validity of a rater’s responses by measuring
how often the rater answers BRIEF-Parent items in an abnormally negative manner in relation to
the clinical samples. Nine specific items make up the Negativity scale (e.g., Item 8 “Tries same
approach to a problem over and over even when it does not work”). Gioia et al. (2000)
designated that these items represented a distinct scale because all could be answered in an
“unusually negative manner” (p. 16), even though these items are also contained on other
subscales. The higher the raw score obtained, the more likely it is that the rater has a negative
perception of the child. A negative perception may influence the rater’s objectivity when rating
children’s behaviors. Inflated scores resulting from a rater’s perception are not a problem unique
to the BRIEF-Parent, but affect any observer rating scale (Denckla, 2002). The other possibility,
however, is that the child truly may have severe executive dysfunction resulting in higher overall
scores in various areas. A score of 5 or more is considered “elevated” (Gioia et al., 2000, p. 14),
and a score of more than 7 “reflects either an excessively negative perception of the child or that
the child may have substantial executive dysfunction” (Gioia et al., 2000, p. 15). If the
Negativity scale score is high, the examiner is prompted to investigate the reason behind the high
score and should make a decision regarding whether the protocol can be used as a valid source of
information.
Responses to the eight clinical scales are combined into three composite scores: the
Behavioral Regulation Index (BRI), the
Metacognition Index (MI), and the Global Executive Composite (GEC). The BRI is a composite
of the Inhibit, Shift, and Emotional Control scales and “represents the…ability to shift cognitive
set and modulate emotions and behavior via appropriate inhibitory control” (Gioia et al., 2000; p.
20). The remaining scales (Initiate, Working Memory, Plan/Organize, Organization of Material,
and Monitor) are combined to reflect the MI score. Gioia et al. (2000) defined the MI as the
“ability to cognitively self-manage tasks and reflects the child’s ability to monitor his or her
performance” (p. 21). Finally, the BRI and MI scores are combined to form the GEC, which is
defined as a “summary score that incorporates all eight clinical scales of the BRIEF” (Gioia et
al., 2000, p. 21) and reflects an individual’s overall executive functioning based on the given
responses. Because interpretation can occur at various levels (the scales, the index scores, or the
overall GEC), practitioners may find it difficult to decide how to interpret the BRIEF. For a
cursory or screening-level examination of the scores, Gioia et al. (2000) recommend using the
eight scales because they can be charted and visually inspected. Scores for each scale and composite are
expressed through norm-referenced T scores and percentiles based on either a national norm
group or by gender in the norm group.
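The T-score metric described above follows the standard norm-referenced conversion (mean of 50, standard deviation of 10). A minimal sketch is given below; the norm-group mean and standard deviation used are illustrative placeholders, not values from the BRIEF manual, which supplies lookup tables by norm group.

```python
# Hedged sketch: standard T-score conversion (mean = 50, SD = 10).
# The norm-group mean and SD below are hypothetical, not BRIEF values.

def to_t_score(raw_score, norm_mean, norm_sd):
    """Convert a raw scale score to a norm-referenced T score."""
    z = (raw_score - norm_mean) / norm_sd
    return 50 + 10 * z

# A raw score one SD above the (hypothetical) norm mean maps to T = 60.
print(to_t_score(30, norm_mean=24, norm_sd=6))  # 60.0
```

In practice, scores at or above T = 65 are commonly treated as clinically elevated on behavior rating scales of this kind.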
Development. Gioia et al. (2000) created items for the BRIEF based on their clinical
experience as well as a review of neuropsychological literature. A group of general education
teachers, special education teachers, and reading specialists reviewed a pool of 180 parent items
for clarity and ease of reading, resulting in the removal of 51 items. The authors and 12
independent reviewers (i.e., “neuropsychologists in hospital and university-based clinical
practice,” p. 36) evaluated the remaining 129 items. No additional items were removed
following this review. The 129-item version initially comprised nine scales: (a) Inhibit,
(b) Shift, (c) Emotional Control, (d) Working Memory, (e) Sustain, (f) Plan, (g) Organize, (h)
Monitor, and (i) Initiate. To refine the scale, the parent form was then administered to 212
parents, whose children were enrolled at a local school. An iterative item-total correlation
process was used to eliminate items in a stepwise fashion; however, no additional items were
eliminated following this analysis. Principal factor analysis (PFA), with an orthogonal rotation, was run
on the intended items for each scale to identify the factor structure of the items and to further
refine each scale. The nine analyses resulted in one primary factor for each scale (Gioia et al.,
2000). No items were eliminated from any of the scales after the PFA, but all items were re-
examined by the authors to ensure that each item aligned with the authors’ conceptualization of
the EF construct.
The standardization of the BRIEF was conducted using the 129 items, with another
iterative item-total correlation reliability process applied to “several larger clinical samples”
(Gioia et al., 2000, p. 37). The descriptions in the BRIEF manual are vague, and no depth is
provided about the clinical sample. Gioia et al. reported that the results supported the pre-
existing nine scales and that “larger and more reliable datasets allowed for final editing of the
scales” (p. 37). Whatever process the authors used resulted in the selection of 86 items for the
final version of the BRIEF, rather than the 129-item version that was used in the scale’s
standardization. Results indicated that the intercorrelations between some of the BRIEF scales
(Working Memory and Sustain, r = .96; Plan and Organize, r = .94) approached singularity. Thus, Gioia et
al. combined the respective scales and streamlined each set of items to be reflective of one scale.
This process resulted in the scales of (a) Working Memory and (b) Plan/Organize and in a
reduction of nine scales to seven. Nine items were identified that did not fit well in any of the
remaining scales, but Gioia et al. determined the items to be important to children’s everyday
functioning. Thus, the Organization of Materials scale was created from these items, resulting in
86 items across eight scales on the BRIEF. The validity scales were developed based on the
frequency of responses of inconsistency or negativity across all the items. Thus, the
Inconsistency scale is computed by a sum of raw difference scores between specific paired items.
The Negativity scale is computed by summing raw scores of specific items, in which a higher
raw score indicates a greater degree of negativity.
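The arithmetic of the two validity scales just described can be sketched as follows. The item numbers, pairings, and example responses are hypothetical; the actual scoring key belongs to the published manual.

```python
# Hedged sketch of the BRIEF validity-scale arithmetic described above.
# Item pairings and the example responses below are hypothetical.

def inconsistency_score(responses, item_pairs):
    """Sum of absolute raw differences between designated similar item pairs."""
    return sum(abs(responses[a] - responses[b]) for a, b in item_pairs)

def negativity_score(responses, negativity_items):
    """Sum of raw scores on the designated Negativity items."""
    return sum(responses[i] for i in negativity_items)

# Responses coded 1 = Never, 2 = Sometimes, 3 = Often, keyed by item number.
responses = {8: 3, 44: 1, 54: 3}
print(inconsistency_score(responses, [(44, 54)]))  # |1 - 3| = 2
print(negativity_score(responses, [8]))            # 3
```

Answering Never to Item 44 but Often to Item 54, as in the earlier example, contributes a difference of 2 to the Inconsistency total; the Negativity total is simply the sum of the designated items’ raw scores.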
Normative sample. The normative group consisted of 1,419 parent ratings of students
between the ages of 5 and 18 with no history of special education or psychotropic medication
usage. Additionally, no more than 10% of items could be missing in order to be included in the
normative dataset. Attempts were made by Gioia et al. (2000) to mimic the population of the
United States, based on such variables as gender, socioeconomic status (SES), race/ethnicity,
age, and geographical population density. Participants were obtained through samples of both
private and public schools in a variety of settings (urban, suburban, and rural) in the state of
Maryland. Twenty-five schools were sampled: 12 elementary, nine middle, and four high
schools. Additionally, 18 adolescents, who were in a typical control group in a study examining
traumatic brain injury, were recruited to take part in the normative study.
Evidence for factor structure. Nine studies were found that have examined the factor
structure of the BRIEF-Parent version with various clinical populations. Five studies were based
on U.S. samples (Donders, DenBraber, & Vos, 2010; Gioia, Isquith, Retzlaff, & Espy, 2002;
Hulac, 2008; LeJeune et al., 2010; Slick, Lautzenhiser, Sherman, & Eryl, 2006) and four studies
occurred outside of the U.S. (Batan, Öktem-Tanör, & Kalem, 2011; Egeland & Fallmyr, 2010;
Huizinga & Smidts, 2011; Qian & Wang, 2007). The U.S. versions are reviewed first and then
the translated versions. In three studies, validity scales were explicitly acknowledged, but were
only used in two to consider the inclusion of cases in statistical analyses (Donders et al., 2010;
LeJeune et al., 2010; Slick et al., 2006). LeJeune et al. used all the BRIEF scores regardless of
results from the validity scales. In contrast, Donders et al. used the validity scales to establish
which cases would be involved in the primary analyses. Thus, the scores of eight BRIEF forms
were eliminated due to an unusual degree of negativity or inconsistent responding. Slick et al.
reported screening each BRIEF protocol in relation to the validity scales, but no cases were
eliminated.
U.S. versions. Confirmatory factor analyses (CFA) were run on the BRIEF in two (Gioia
et al., 2002; LeJeune et al., 2010) of the five U.S. studies and exploratory factor analysis (EFA)
was used in the other three studies (Donders et al., 2010; Hulac, 2008; Slick et al., 2006). Gioia
et al. (2002) conducted a series of CFAs to establish the factor structure of the BRIEF-Parent
form. Instead of testing the BRIEF’s factor structure based on eight scales, Gioia et al. tested the
factor structure based on nine scales. The Monitor scale was divided into two separate scales
(Task-Monitoring and Self-Monitoring). Each BRIEF scale is considered to reflect an executive
function that is distinct, but related to each other by overarching executive systems reflected
through the BRI, MI, and GEC composites. The scales were treated as indicators and the factors
reflected the creation of composite scales. A minimum of two indicators was used to create a
factor. Based on maximum likelihood extraction, four models (one-, two-, three-, and four-factor)
were tested in a sample of 374 children aged 5-18 years (M = 9.06 years, SD = 2.73) with mixed
clinical diagnoses (e.g., ADHD, learning disabilities, autism spectrum disorders, and affective
disorders). The one-factor model, a general executive function factor, was composed of all nine
scales. The two-factor model consisted of Behavioral Regulation (Inhibit, Shift, Emotional
Control, and Self-Monitor) and Metacognition (Initiate, Working Memory, Plan/Organize,
Organization of Materials, and Task-Monitor). The three-factor model, a reconfiguration of the
nine scales, resulted in testing a Behavior Regulation factor (Inhibit and Self-Monitor scales),
Emotional Regulation factor (Emotional Control and Shift scales), and a Metacognition factor
(Working Memory, Initiate, Plan/Organize, Organization of Materials, and Task-Monitor). The
four-factor model was composed of the prior structure of the Behavior Regulation and Emotional
Regulation factors plus the subdivision of the Metacognitive factor into “Internal” Metacognition
(Initiate, Working Memory, and Plan/Organize) and “External” Metacognition factor
(Organization of Materials and Task-Monitor).
The baseline one-factor model (general executive functioning) had the worst fit relative
to the other proposed models (χ2/df = 17.41; CFI = .77; SRMR = .09; RMSEA = .21) based on
minimum fit criteria (comparative fit index [CFI] > .95; standardized root mean square residual
[SRMR] < .08; root mean squared error of approximation [RMSEA] ≤ .06; a χ2/df ratio < 5).
Gioia et al. (2002) determined that the best fit was the three-factor model (CFI = .95; SRMR =
.04; RMSEA = .11; χ2/df = 5.42); however, the fit of the three-factor model was less than ideal
based on its present form (i.e., RMSEA; χ2/df). Gioia et al. revised the three-factor model post-
hoc by correlating some of the error terms. According to Byrne and Shavelson (1996), this
decision must be based on theory rather than on post-hoc analyses. Gioia et al.’s rationale
for re-specifying the model was based on Barkley’s (1997) work; namely, inhibition is related to
other executive function processes, such as working memory, emotional control, and
organization. By estimating these error covariances, the three-factor model fit was significantly
improved (CFI = .97; SRMR = .03; RMSEA = .08; and χ2/df ratio = 3.4). However, some SEM
experts may consider this adjustment a limitation. The issue is whether estimated correlated
errors are appropriate and reflect an actual fit of the data to the model or whether such a
procedure has inflated the actual fit of the models to the data (Byrne, 2006). Additionally, no
higher-order models were tested, even though such a model is explicitly stated as part of the test
authors’ conceptualization of executive function.
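The minimum fit criteria cited above can be expressed as a simple check. The thresholds below are those listed in the text, applied to the fit statistics reported by Gioia et al. (2002); the function itself is an illustrative sketch, not part of any study.

```python
# Sketch of the minimum fit criteria listed above:
# CFI > .95, SRMR < .08, RMSEA <= .06, and chi-square/df < 5.

def meets_fit_criteria(cfi, srmr, rmsea, chi2_df):
    """Return which of the cited minimum fit criteria a model satisfies."""
    return {
        "CFI > .95": cfi > .95,
        "SRMR < .08": srmr < .08,
        "RMSEA <= .06": rmsea <= .06,
        "chi2/df < 5": chi2_df < 5,
    }

# The one-factor model reported by Gioia et al. (2002) fails all four criteria:
print(meets_fit_criteria(cfi=.77, srmr=.09, rmsea=.21, chi2_df=17.41))
```

Applying the same check to the re-specified three-factor model (CFI = .97; SRMR = .03; RMSEA = .08; χ2/df = 3.4) shows that the RMSEA criterion still fails even after the error covariances were estimated.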
LeJeune et al. (2010) examined a 24-item abbreviated version of the BRIEF in two
samples (i.e., Normative and Confirmed ADHD) and submitted the results to a CFA. Results
indicated a two-factor solution fit the data well (χ2 = 521.03, df = 19, p = .31; goodness of fit
index [GFI] = .92; CFI = .95; RMSEA = .05; 90% CI = .04, .07). The two-factor solution was
also found to be invariant across gender and age groups. A limitation of this study was that a
majority of the cases analyzed (86.7%) were based on data from the original normative sample
collected by the test authors, so it was not independently conducted. Gioia and Isquith are the
two lead authors of the original BRIEF and both contributed to this study. Another limitation
was that, in the confirmed ADHD sample, the Monitor and Initiate scales on the short form had
relatively weak correlations (e.g., r = .56 to .61) with the original BRIEF scales. In the Normative
sample, the Initiate scale also had a relatively low correlation with the original BRIEF scales (.60).
LeJeune et al. (2010) explained these low correlations were due to the “recruitment procedures
for the sample… [that] may have differentially attracted parents with very marked concerns” (p.
190). However, another explanation is that the short form may not be configured to accurately
capture the specific behaviors and their severity, which are considered hallmark symptoms of
ADHD. Thus, the validity of the scores on the short-form of the BRIEF is insufficient to warrant
its use and further research is needed.
The factor structure of the eight-scale BRIEF was examined in a sample of 100 children
(ages 6-16) affected by traumatic brain injury (TBI). Donders, DenBraber, and Vos (2010) used
EFA, with maximum likelihood extraction, to identify two latent constructs. These findings
were similar to those obtained with the standardization sample (Gioia et al., 2000), except for
some variations. Donders et al. found that the Inhibit scale loaded on the MI factor rather than
loading on the BRI factor, as Gioia et al. (2000) reported. This finding suggests that, in children
with TBI, the Inhibit scale may be reflecting a more cognitive, rather than behavioral, aspect of
impulse control.
Donders et al. (2010) acknowledged that one of the limitations of this study was the
population. Recruitment was from rehabilitation referrals, so the severity of the cases of TBI
was greater in this particular population than the general population of children with TBI. Thus,
Donders et al. indicate that their sample may not have been an accurate reflection of the varied
degrees of TBI in the general population, limiting the generalizations of the findings.
Additionally, the study had a small sample size (N = 100) for a factor analysis. The location of
the Inhibit scale on a different factor than had been found before is noteworthy and should be
explored in future research.
Hulac (2008) examined the factor structure of the BRIEF-Parent form via EFA (principal
components analysis) in a sample of 93 adolescent females living in residential treatment
facilities. Hulac reported that the one-factor solution (general executive functioning) best
described the BRIEF structure for the sample. Hulac attributed the identification of a one-factor
solution to the higher rate of underlying psychological conditions among the adolescents (e.g.,
anxiety, depression, or bipolar disorder), an indication that the BRIEF may not
be invariant across psychological conditions. A limitation of Hulac’s study was the small sample
size.
Slick et al. (2006) submitted the original BRIEF (eight scales) to principal factor analysis
using a clinical sample of 80 children diagnosed with intractable epilepsy. Based on the Kaiser-
Guttman rule, a one-factor structure was identified with moderately high communalities (e.g.,
Plan/Organize = .81) for all indicators with the exception of Organization of Materials (.57).
However, both two- and three-factor solutions (oblique rotation) were tested because 71% of the
nonredundant residuals were greater than .05. Slick et al. reported that the two-factor solution
(the Behavioral Regulation and Metacognition Indices), as originally described in the test
manual, was a better solution for the data. The Metacognition Index factor was comprised of
four scales (Plan/Organize, Working Memory, Initiate, & Organization of Materials) and
Behavioral Regulation Index had three scales (Emotional Control, Shift, & Inhibit). The Monitor
scale loaded equally on both factors, which seems to support Gioia et al.’s (2002) view that the
Monitor scales reflect two distinct scales—Self-Monitor and Task-Monitor. Slick et al. provided
little information on the three-factor solution, indicating that it was “explored” but “produced a
factor with no salients” (p. 186) and was therefore disregarded as a viable solution.
A limitation of the study is the small sample size (N = 80) for a factor analysis.
Furthermore, the “eigenvalue rule of 1” (Kaiser-Guttman) is a dated criterion for determining the
number of factors to retain in a factor analysis; more acceptable methods include minimum
average partial and parallel analysis (Thompson & Daniel, 1996). Additionally, salient structure/pattern
coefficients are recommended to be above |.40| (Fabrigar, Wegener, MacCallum, & Strahan,
1999).
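Parallel analysis, one of the retention methods recommended above in place of the Kaiser-Guttman rule, retains only those factors whose observed eigenvalues exceed eigenvalues obtained from random data of the same dimensions. The sketch below is a minimal illustration on simulated data; the sample size, loadings, and item count are arbitrary, not drawn from any BRIEF study.

```python
# Hedged sketch of Horn's parallel analysis on simulated (not BRIEF) data.
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Retain factors whose eigenvalues exceed the 95th percentile of
    eigenvalues from random normal data of the same dimensions."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, 95, axis=0)
    return int(np.sum(observed > threshold))

# Simulate 300 respondents on 8 items driven by a single common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal(300)
data = np.outer(factor, np.full(8, 0.8)) + 0.6 * rng.standard_normal((300, 8))
print(parallel_analysis(data))  # 1 factor retained
```

With a single strong common factor, only the first observed eigenvalue clears the random-data threshold, whereas the Kaiser-Guttman rule can over- or under-extract depending on sample size and item count.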
Translated versions. In four studies, the factor structure of translated and adapted
versions of the BRIEF-Parent form has been tested: (a) a Norwegian version (Egeland &
Fallmyr, 2010); (b) a Dutch version (Huizinga & Smidts, 2011); (c) a Turkish version (Batan,
Öktem-Tanör, & Kalem, 2011); and (d) a Chinese version (Qian & Wang, 2007). CFAs were
used in three of the studies (Egeland & Fallmyr; Huizinga & Smidts; Qian & Wang) and an EFA
was run in one study (Batan et al., 2011). Two studies (Egeland & Fallmyr; Huizinga & Smidts)
are reviewed in-depth and two are briefly summarized, as only the abstracts of the Batan et al.
and Qian and Wang studies are available in English.
Egeland and Fallmyr (2010) examined the factor structure of an 86-item Norwegian
version of the BRIEF parent form. The sample was 158 Norwegian children with no diagnosis
(48 controls) or mixed clinical diagnoses (72 school psychology referrals; 38 mental health
outpatients). Fourth grade children (estimated age 10 years; 23 boys; 25 girls) were used as
controls in the study. The school and mental health referrals composed the clinical sample (86
boys; 26 girls) with an average age of 10.9 years (SD = 2.6). CFA (extraction method
unspecified) was conducted to test five models; the scales were treated as indicators. Three
models tested the BRIEF based on the original eight scales: (a) a one-factor model; (b) a two-
factor model of BRI and MI; and (c) a three-factor model of Emotional Regulation (ERI), BRI,
and MI. Two models tested the BRIEF based on nine scales by dividing the Monitor scale into
the two scales described above: (a) a two-factor model of BRI and MI; and (b) a three-factor model of ERI,
BRI, and MI. Egeland and Fallmyr (2010) found that the best fit was the three-factor,
nine-scale version (CFI = .96; RMSEA = .14; χ2/df = 3.26). The baseline single-factor model
(general executive functioning) had the worst fit relative to the other models (χ2/df = 8.97; CFI =
.86; RMSEA = .23). The findings replicated Gioia et al.’s (2002) results.
A limitation of Egeland and Fallmyr’s (2010) study was the sample size (N = 158), which
is considered somewhat low for the number of parameters in the model (Comrey & Lee, 1992).
Although a Norwegian translation of the BRIEF was used with a Norwegian sample, this cultural
approach was offset by the use of American norms. Finally, the RMSEA was still high in the
three-factor model (RMSEA = .14), which is above the recommended criterion of .06 (Hu &
Bentler, 1999) and is indicative of a misfit of the model to the data.
Huizinga and Smidts (2011) examined the factor structure of a Dutch adaptation of the
BRIEF, which contained 75 items instead of 86. Parents of 847 Dutch school children (431 boys
and 416 girls) were recruited through “regular schools throughout the Netherlands” (p. 54) and
filled out the rating scale. Huizinga and Smidts (2011) conducted both item level and scale
CFAs to test the structure of the eight-scale BRIEF. Discrete factor analysis via Mplus was run
on a 72-item, eight-factor model, which was based on all of the items used in the clinical scales
of the BRIEF. The non-normed fit index (NNFI) was .92 and the RMSEA was .109. Modifications
were made due to three items related to handwriting skills, which resulted in an increase in the
NNFI (.95) and a decrease in RMSEA (.087). Both were considered improvements to the model.
The second set of analyses comprised multigroup CFAs on a two-factor model of the BRIEF
scale scores across four age groups: (a) 5 to 8, (b) 9 to 11, (c) 12 to 14, and (d) 15 to 18. To
ensure that the same factors were invariant across the age groups, the CFAs were first conducted
with no equality constraints imposed across the groups. Then, equality constraints on the
loadings of the observed indicators on the factors were imposed across the age groups, and finally, the model was tested
based on whether the factor intercepts were equivalent across the age groups. The two-factor
model comprised the BRI (Inhibit, Shift, and Emotional Control scales) and the MI (Initiate,
Working Memory, Plan/Organize, Organization of Materials, and Monitor).
Without any constraints, the two-factor model was considered to be a poor fit across the
groups (NNFI = .929; RMSEA = .129). Thus, the no constraint model was modified to allow for
two sets of residuals to correlate (Inhibit and Shift; Inhibit and Monitor), which resulted in a
better fit (NNFI = .97; RMSEA = .083). Using the modified model as the baseline, the
subsequent models ([a] equal factor loadings and [b] equal intercept) were tested. The models
did not degrade the fit across the age groups: Equal loadings, NNFI = .975; RMSEA = .078;
Equal Intercept, NNFI = .965, RMSEA = .092. These findings indicated that the two-factor
Dutch version of the BRIEF was factorially invariant across the age groups and that any mean
differences found between age groups could be interpreted as such. A limitation of this study
was that the eight-scale model was run twice: the first run indicated poor fit, so three parameters
were freely estimated and the model was re-run to improve fit. It is debatable
whether this post-hoc fitting was warranted and if the model (without this modification) simply
did not fit the data.
Qian and Wang (2007) evaluated the reliability and validity of the scores for a Chinese
version of the BRIEF-Parent form in a sample of school-age children: 216 diagnosed with
ADHD, schizophrenia, or autism, and 311 labeled as “normal controls.” Confirmatory factor
analysis was conducted on the eight scales. In the abstract, Qian and Wang noted, “the eight-
scale model of the BRIEF was reasonable” (p. 277). No other information about the CFA was
available in English.
In a sample of Turkish youth (213 girls, 99 boys) between the ages of 5 and 18, Batan et al.
(2011) examined the reliability and validity of both the parent and teacher versions of the BRIEF
scores to establish normative standards. Only the findings of the parent version are reported
here. Batan et al. conducted an EFA on the eight scales, reported a two-factor solution, and
concluded that the solution was consistent with the original factor structure. All other
information about the factor analysis or structure of the BRIEF was in Turkish.
Because the BRIEF was developed in the United States and therefore originated from the
perspective of a Western culture, it is important to carefully examine research using translated
versions of the instrument. Some of the behaviors surveyed (e.g., inhibition) may be both
intrapersonally and contextually altered when adapted to assess students in other countries. For
example, research indicates that family conflict is positively linked to externalizing problems in
Korean American youth and that expression of emotion is often discouraged (Park et al., 2010).
On the other hand, in many Hispanic cultures, it is socially acceptable for males to display
“machismo,” which encompasses aggressiveness, hypermasculinity, and overexpression of anger
(Harris, 1996). Because the results of the BRIEF are based upon frequency of behaviors from
the perspective of a rater, the results may be culturally dependent. Although researchers purport
to be testing the same constructs, some of the studies (e.g., Huizinga & Smidts, 2011) do not
contain the same number of items as the original form of the BRIEF, meaning that direct
translation is not possible. Some of the translated test versions (Batan et al., 2011; Qian &
Wang, 2007) do not explicitly state the number of items in the translated versions of the BRIEF.
Summary. In general, the findings indicate that six (Batan et al., 2011; Donders et al.,
2010; Huizinga & Smidts, 2011; LeJeune et al., 2010; Qian & Wang, 2007; Slick et al., 2006) of
the nine studies support the original two-factor, eight scale version of the BRIEF-Parent. Two
studies (Egeland & Fallmyr, 2010; Gioia et al., 2002) provide support for the three-factor, nine-
scale version of BRIEF-Parent, in which the Monitor scale is split into two separate scales (Self-
Monitor and Task-Monitor) and contains a third factor of Emotional Regulation.
Studies providing support for the two-factor, eight-scale version of the BRIEF-Parent
have some unique characteristics that may limit generalizability of the findings. Some had small
sample sizes and focused on specific clinical diagnoses, such as TBI and intractable epilepsy
(Donders et al., 2010; Slick et al., 2006). Others used the standardization sample (LeJeune et al.,
2010) or one described as “normal school children” (Huizinga & Smidts, 2011). These samples
are not necessarily generalizable to the U.S. special education population. Although some studies
occurred outside the U.S., and the BRIEF had to be translated, the findings were similar to those
from the U.S. studies. For two studies (Batan et al., 2011; Qian & Wang, 2007), it would have
been helpful to have additional information beyond abstracts to evaluate their findings. One
study (Egeland & Fallmyr, 2010) provides independent support of the three-factor, nine-scale
version. Both the Egeland and Fallmyr (2010) and Gioia et al. (2002) studies were based on
mixed clinical diagnoses samples, making the findings generalizable to special education
populations. Although both Gioia et al. (2002) and Huizinga and Smidts (2011) made post-hoc
modifications to improve model fit, Egeland and Fallmyr reported similar findings without such
modifications. Hulac’s (2008) findings were an anomaly in which a one-factor solution of the
eight scales was identified. No other studies identified via EFA or CFA a one-factor
solution/model as the best structure. However, Hulac used a small sample of 93 adolescent
females in residential treatment centers. The small sample size, as well as the unique sample,
could have contributed to the emergence of such findings. Another unique finding was reported
by Donders et al. (2010), in which the Inhibit scale loaded on the MI factor instead of the BRI
factor. Again, the findings were based on a small sample of 100 children diagnosed with TBI;
both aspects could have contributed to the unique findings.
In summary, the BRIEF-Parent form has received support for the two-factor, eight-scale
version, which is the current configuration for test use. The support appears to be based on small
clinical samples (Donders et al., 2010; Slick et al., 2006). The three-factor, nine-scale version
has been based on mixed clinical diagnoses samples, but currently its support is based on two
studies (Egeland & Fallmyr, 2010; Gioia et al., 2002). However, the fit statistics for the three-
factor models did not meet the recommended criteria (e.g., RMSEA = .14; Egeland & Fallmyr,
2010). Based on both sets of studies (eight- and nine-scale versions), it is important to continue
investigating the nature of the factor structure of the BRIEF-Parent in unique clinical samples as
well as mixed clinical diagnoses samples. It is particularly important to examine (and scrutinize)
the current factor structure of the BRIEF instrument in a mixed clinical sample because of the
similarity of this type of sample to a special education population. Given the increased use of EF
constructs and the BRIEF instrument in the school setting to examine children’s academic
difficulties, the current factor structure must be psychometrically sound. Both types of studies
(i.e., clinical samples and mixed clinical samples) are necessary to strengthen the case that the
BRIEF-Parent is a useful diagnostic tool for populations of youth who are experiencing problems
in executive functions. Its usefulness starts with whether the factor structure is the same across
diverse populations of youth. If the factor structure is not the same, then the usefulness of the
BRIEF-Parent is limited, or its scale structure needs to be re-examined.
Reliability Evidence of the BRIEF-Parent Form
Internal consistency. Internal consistency estimates (Cronbach’s alpha; Cronbach, 1951)
for the scores of the BRIEF scales, indexes, and GEC have ranged from .82 (Initiate) to .98 (GEC)
in the clinical sample and from .80 (Initiate) to .97 (GEC) in the normative sample (Gioia et al., 2000, p. 51). A
general rule of thumb is that values above .80 are preferable for psychoeducational or clinical
tasks; values above .90 are considered “excellent” (Sattler, 2001, p. 102). Huizinga and Smidts
(2011) reported reliability estimates of the BRIEF scores that ranged from .78 (Initiate) to .90
(Working Memory) for the scales. Cronbach’s alphas for the composite scores (BRI, MI, and
GEC) were between .93 and .96. Item-total correlations for all scales were above the benchmark
of .30 established by Nunnally and Bernstein (1994). Batan et al. (2011) also reported
sufficiently high reliability estimates of the scores for the Turkish version of the BRIEF-Parent,
which ranged from .60 to .94 (no scales were specified) for the scales. Qian and Wang (2007)
noted that Cronbach’s alphas for the scales ranged from .74 to .96. No scales were linked to
specific coefficients, except for the authors’ reporting a low estimate for a scale labeled “initial”
(.61). It is possible that the authors meant the Initiate scale, which might have been misspelled
during translation. Using normative sample data, LeJeune et al. (2010) reported internal
consistency for the scale scores of the BRIEF short-form ranging from .68 (Initiate) to .81
(Emotional Control) and from .86 (BRI) to .93 (GEC) for the composite scores. Across the
reliability estimates reported, those for the Initiate scores have usually been the lowest.
In three studies (Huizinga & Smidts, 2011; LeJeune et al., 2010; Qian & Wang, 2007), the
reliability estimates for the Initiate scores were below .80. Two of these studies were translated
versions of the BRIEF (Huizinga & Smidts, 2011; Qian & Wang, 2007) and the third examined
the short form of the BRIEF in comparison to the original form (LeJeune et al., 2010). These
studies used forms that were different from the original version of the BRIEF, which may
explain the low reliability estimates. In particular, the Initiate scale contains items relating to
beginning a task or activity as well as independently generating ideas. This scale includes items,
such as “Does not take initiative” or “Needs to be told to begin a task even when willing,” which
may reflect culturally-bound behaviors.
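Cronbach's alpha, the internal-consistency estimate discussed throughout this section, can be computed as alpha = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). A minimal sketch on made-up ratings (not BRIEF data) follows.

```python
# Sketch of Cronbach's alpha on hypothetical ratings (not BRIEF data):
# alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance).
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: rows are respondents, columns are items on one scale."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Perfectly parallel items (each respondent answers identically across
# items) yield alpha = 1.0; noisier items lower the estimate.
ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [2, 2, 2]]
print(cronbach_alpha(ratings))  # 1.0
```

Against this metric, the benchmarks cited above (above .80 preferable, above .90 "excellent") describe how much of the total-score variance reflects shared, rather than item-specific, variation.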
Interrater reliability. The BRIEF manual provides information about interrater
agreement (teacher-parent, parent-parent, and teacher-teacher; Gioia et al., 2000). Correlations
between teachers and parents have ranged from .15 (Organization of Materials and Shift
scales) to .50 (Inhibit scale; Mdn r = .24). No specific information was provided in the manual
about correlations between parents or correlations between teachers. According to Gioia et al.
(2000), interrater reliability estimates between different types of raters (here, teacher-parent
pairs) are expected to be lower than those between the same type of raters due to the difference in
settings in which the child is observed. Thus, Gioia et al. consider these findings to reflect
differences in environmental structure between home and school as well as different expectations
in terms of organization in the school setting (lockers or materials given to students). No
independent study has examined the interrater reliability of the BRIEF scores, although
correlations between parent and teacher ratings are often lower (.30 to .50) than parent-parent or
teacher-teacher interrater reliabilities (Achenbach et al., 1987), yielding different patterns of
agreement (e.g., Jepsen, Gray, & Taffe, 2012).
Test-retest reliability. A group of 54 parents, who served as part of the normative
sample, was given the BRIEF-Parent form to complete twice about their child over a two-week
period. Test-retest correlations ranged from .76 to .85 (Mdn r = .81). For the parent clinical
sample (n = 40), the reliability coefficients of the scores were slightly lower (.72 to .84; Mdn r =
.79) over an average of three weeks (Gioia et al., 2000).
Huizinga and Smidts (2011) also examined test-retest reliability for the Dutch version of
the BRIEF using intraclass correlation coefficients (ICCs), with the following criteria: ICC < .2 = very low,
.2 to .4 = low, .4 to .6 = intermediate, .6 to .8 = high, and .8 to 1.0 = very high (Landis & Koch,
1977). All composite scores (BRI, MI, and GEC) were above .8, and the individual scales ranged
between .73 (Working Memory) and .94 (Inhibit). Qian and Wang (2007) reported that test-retest
reliability estimates of the scores ranged from .68 to .89 (no scales or composites specified) for the
BRIEF in a sample of school-age children in China.
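The Landis and Koch (1977) benchmarks amount to a simple lookup. A sketch (the helper name is hypothetical, for illustration only):

```python
def icc_band(icc):
    """Classify an intraclass correlation coefficient per Landis & Koch (1977)."""
    if icc < 0.2:
        return "very low"
    elif icc < 0.4:
        return "low"
    elif icc < 0.6:
        return "intermediate"
    elif icc < 0.8:
        return "high"
    return "very high"

# Test-retest ICCs reported by Huizinga and Smidts (2011):
print(icc_band(0.73))  # Working Memory → "high"
print(icc_band(0.94))  # Inhibit → "very high"
```

Under these bands, all of the Dutch-version scale and composite ICCs fall in the high or very high ranges.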
Other Evidence for the Construct Validity of the BRIEF-Parent Form
Several types of validity are useful in the interpretation of scores from a scale (Messick,
1995). Five types of validity are addressed in relation to the BRIEF-Parent form: (a) predictive
validity, (b) convergent validity, (c) discriminant validity, (d) ecological validity, and (e) social
consequences.
Predictive validity. Pratt (2000) examined the BRIEF-Parent ratings of 212 children
between the ages of 6 and 11 years. Participants comprised four groups: (a) ADHD,
(b) Reading Disorder (RD), (c) ADHD + RD, and (d) controls. Children with ADHD were found to
have statistically significantly more problems on all BRIEF scales, and children with RD had
statistically significantly elevated BRIEF scores on the Sustain, Working Memory, and Plan scales
in comparison to the other groups. Pratt concluded that, based on the BRIEF scores, those in the
ADHD + RD group could be distinguished from the RD and control groups, but not from the
ADHD group.
A major limitation of Pratt’s (2000) study was that an 80-item version of the BRIEF was
used before the instrument was officially published. This 80-item version had nine scales and
differed from the 86-item, eight-scale version that was officially published. For example, in the
80-item version, Plan and Organize represented two scales, whereas the official BRIEF has a
scale that combines the two constructs—Plan/Organize. As a result of these differences, results
from Pratt’s study cannot be directly compared to those studies that used the official version.
Two other studies (Mahone et al., 2002; McCandless & O’Laughlin, 2007) have
examined the predictive validity of the BRIEF-Parent form. Mahone et al. looked at the ratings
of parents of 76 children (18 ADHD; 21 Tourette’s syndrome [TS]; 17 TS + ADHD; and 20 controls).
The Inhibit and Working Memory scales were elevated in the ADHD group, but not in the other
groups. Also, correlations were not statistically significant between the BRIEF scales and
various task-based or psychoeducational measures. Mahone et al. concluded that it is difficult to
separate ADHD from other clinical groups associated with EF deficits solely by using the
BRIEF-Parent form and recommended that it be used in conjunction with other measures. A
limitation of the study was the small sample size (N = 76).
The predictive validity of the BRIEF-Parent scores has been largely based on
differentiating between ADHD subtypes (ADHD-Inattentive and ADHD-Combined).
McCandless and O’Laughlin (2007) looked at 70 boys and girls between the ages of five and 13
referred to a university-based clinic for assessment of ADHD, hypothesizing that individuals
identified with ADHD would demonstrate higher scores on the BRIEF-Parent
scales than those without ADHD. Specifically, they predicted that the BRI would be elevated in the
ADHD-Combined subtype and that the Working Memory scale would be elevated in both subtypes
in comparison to controls not diagnosed with ADHD. The findings not only supported this
premise, but the MI also was found to be elevated in both subgroups relative to the control
group. Because this sample was clinic-referred and contained a small number of
participants, the results may not generalize to the broader population.
In summary, the predictive validity of the BRIEF-Parent scales is limited. Various
BRIEF scales (BRI, MI, and Working Memory) were elevated in subgroups of the ADHD
sample (McCandless & O’Laughlin, 2007), but the correlations were not statistically significant
between the BRIEF-Parent scales and various task-based or psychoeducational measures
(Mahone et al., 2002). Pratt’s (2000) study demonstrated that elevated BRIEF scores existed in
the ADHD and RD samples, but the findings may have limited generalizability due to the use of
an unofficial version of the measure. Because of the variability in findings across the predictive
validity studies, it is recommended that the BRIEF not be used by itself for diagnostic purposes,
but in combination with various sources of information (Mahone et al., 2002).
Convergent validity. The effect of executive function on behavior should be reflected in
the convergence between established rating scales purported to measure the same (or similar)
behaviors (Messick, 1995). Gioia et al. (2000) tested for convergent validity between the
BRIEF-Parent scales and four measures supposedly tapping several similar constructs. No direct
comparison could be made between the BRIEF and other rating scales of executive function
because none existed at the time of standardization of the BRIEF. The first test of convergent
validity was with an ADHD measure. Parents of 100 clinically referred children completed the
BRIEF-Parent version and the ADHD-Rating Scale-IV (ADHD-IV; DuPaul, Power,
Anastopoulos, & Reid, 1998), which has two composite scales: Inattention and Hyperactivity-
Impulsivity. About half of the BRIEF-Parent scales (Working Memory, Plan/Organize, Initiate,
and Monitor plus the Metacognitive Index) were moderately correlated (r = .54 to .67) with the
ADHD-IV Inattention scale. The remaining BRIEF scales correlated between .39 and .49 with
the Inattention scale. Four BRIEF-Parent scales (Inhibit, Shift and Emotional Control scales and
the Behavioral Regulation index) were moderately correlated (range = .56 to .73) with the
ADHD-IV Hyperactivity-Impulsivity scale (Gioia et al., 2000). The remaining BRIEF-Parent scales
were also significantly correlated with the Hyperactivity-Impulsivity scale, with correlations ranging from
.33 to .45, with the exception of the Organization of Materials scale, which was not statistically
significantly correlated with it (r = .15).
Gioia et al. (2000) also examined the relation between the BRIEF-Parent scales and
Achenbach’s Child Behavior Checklist (CBCL; Achenbach, 1991). The CBCL has eight scales:
Withdrawn, Somatic Complaints, Anxious/Depressed, Social Problems, Thought
Problems, Attention Problems, Delinquent Behavior, and Aggressive Behavior, and two broadband
domains: Internalizing and Externalizing. Based on what the respective measures purport to
measure, Gioia et al. (2000) expected to find similarities between the BRIEF-Parent scale of
Working Memory and the CBCL Attention Problem scale, as well as between the BRIEF-Parent
Inhibit scale and the CBCL Aggression scale.
Parents of 200 clinically-referred children completed both measures. Results indicated a
moderate relation between all BRIEF-Parent scales and the CBCL Attention Problems scale (r’s
= .50 to .72), with the exception of Organization of Materials. Three BRIEF scales (Inhibit,
Emotional Control, and Shift) were moderately correlated with CBCL’s Aggressive Behavior
scale (r’s = .57 to .73).
Gioia et al. (2000) also compared the Behavior Assessment System for Children Parent Rating
Scales (BASC Parent; Reynolds & Kamphaus, 1992) with the BRIEF-Parent form in a sample of
80 parents of children who were clinically referred. The BASC Parent rating scales have nine
scales: Aggression, Conduct Problems, Hyperactivity, Anxiety, Depression, Somatization,
Atypicality, Withdrawal, and Attention Problems. The BRI from the BRIEF correlated with both
the Aggression (r = .76) and Hyperactivity (r = .63) scales. The BRIEF’s Emotional Control
scale also correlated (r’s = .62 to .69) with the Aggression, Anxiety, and Depression scales of the
BASC Parent. Correlations between the Emotional Control scale and the BASC’s Aggression,
Anxiety, and Depression scales make sense because these particular BASC scales involve
emotional responses of a child, which may be reflected through the ability to control emotion.
The BRIEF Inhibit scale also correlated with the Aggression (r = .72) and Hyperactivity (r = .68)
scales of the BASC Parent. Multiple BRIEF scales (Initiate, Working Memory, Plan/Organize,
and Monitor) correlated with the Attention Problems scale from the BASC Parent. Gioia et al.
(2000) hypothesized that the BRIEF Working Memory scale would correlate with the Attention
scale of the BASC. Gioia et al. concluded that the pattern of correlations was “strong” and
“expected” (p. 55).
A limitation of this comparison was the small sample size (N = 80). Additionally, BASC scales such as
Conduct Problems and Somatization had low correlations with the BRIEF scales or composite
scores. Gioia et al. used these findings as evidence of discriminant validity, noting the
“relatively lower executive contribution to these problems” (p. 55). Gioia et al. make it appear
as though the two dysfunctions are completely unrelated; however, this assertion is debatable. It
is likely that those with conduct problems may display compromised executive functions, such as
lack of inhibition. Further, Gioia et al. (2000) examined the relations between the BRIEF and
the CBCL, and found moderate to high correlations between the Inhibit and Emotional Control
scales and the Aggressive Behavior scale of the CBCL; these scales tap behaviors displayed, at
least to some degree, by individuals with conduct disorder.
Gioia et al.’s (2000) final demonstration of convergent validity was between the BRIEF-
Parent form and the Conners’ Rating Scales (CRS; Conners, 1989). The CRS has eight scales:
Anxiety, Learning, Somatic, Obsessive-Compulsive, Antisocial, Restless-Disorganized, Conduct
Disorder, and Hyperactive-Immature. Parents of 25 clinically-referred children completed both
measures. The BRIEF’s BRI and its scales (Inhibit, Shift, and Emotional Control) correlated
with the CRS’s Restless-Disorganized (r = .71), Conduct Disorder (r = .77), and Hyperactive-
Immature (r = .57) scales. Low correlations (r’s = -.28 to .27) were found between all BRIEF
scales and the Obsessive-Compulsive and Antisocial CRS scales, which Gioia et al. viewed as
consistent with what the BRIEF does (and does not) purport to measure.
A limitation of the study was the small sample size (N = 25) of parent ratings from which
to generalize to a population. Additionally, several correlations were
unexpectedly low, such as that between the Learning scale of the CRS and the BRIEF’s Organization
of Materials scale (.06). Even though the Organization of Materials scale had nonsignificant
correlations with every CRS scale, with the exception of Restless-Disorganized (r = .42) and
Hyperactive-Immature (r = .50), the lack of relation between organization and learning is
unexpected, as organizational skills have been linked to academic success (Cameron, Connor,
Morrison, & Jewkes, 2008).
Gioia also contributed as a co-author to research (LeJeune et al., 2010) designed to
develop and evaluate an abbreviated version of the BRIEF-Parent form. The short-form is based
on 24 items selected from the original form. Three samples were used to analyze its psychometric
properties: (a) the BRIEF normative sample (N = 1,419) of children aged 5 to 18; (b) a sample of
133 children (ages 5 to 13 years) diagnosed with ADHD; and (c) a sample suspected of having
ADHD, consisting of 84 children (ages 5 to 16). Correlations between the original BRIEF-
Parent scales and the respective short-form scales generally exceeded .75 in the normative and
confirmed ADHD samples (range = .56 to .97). The Initiate (.61) and Monitor (.56) scales on the
short-form had the weakest correlations to the respective scales on the original form. Composite
index correlations between the BRIEF forms were strong in both the normative and ADHD
samples, ranging from .88 (BRI-normative and ADHD samples) to .97 (GEC-normative and
ADHD samples).
Independent research of convergent validity. Independent research on the BRIEF-
Parent form has provided mixed support for convergent validity. Several studies (e.g., Bishop,
2011; McCandless & O’Laughlin, 2007; Toplack et al., 2009) have examined the relations
between the BRIEF-Parent scales and various task-based measures. McCandless and
O’Laughlin (2007) analyzed parent ratings on the BRIEF-Parent and the BASC Parent of 70
children seen at a university-based ADHD clinic. All correlations between the BRIEF scales and
the Attention and Hyperactivity scales on the BASC Parent were statistically significant. The
correlations had a wide range: .24 (Shift) to .70 (MI) with the BASC Attention scale; and .26
(Shift) to .83 (Inhibit) with the BASC Hyperactivity scale. However, the sample size was small,
and participants were not matched to control for gender, age, or other demographic factors.
Toplack et al. (2009) examined the relations between the BRIEF-Parent scales and
several task-based measures in a sample of 90 children (46 diagnosed with ADHD; 44 controls).
The four task-based measures used were inhibition (Stop Task; Logan & Cowan, 1984), set
shifting (Trail Making Task -Part B; Reitan, 1958), verbal and spatial working memory
(Working Memory composite; the Wechsler Intelligence Scale for Children-Third Edition
[WISC-III]; Kaplan et al., 1999) and planning (Stocking of Cambridge task [SOC]; the
Cambridge Neuropsychological Test Automated Battery [CANTAB]; Robbins et al., 1994).
Toplack et al. found statistically significant relations between all BRIEF-Parent scales and many
of the task-based measures. Nonsignificant relations were found between (a) the BRIEF’s
Inhibit scale and the Stop Task Inhibition (r = .21); (b) the BRIEF’s Shift scale and the Trail Making
Set Shifting (r = .23); and (c) the BRIEF’s Plan/Organize scale and the SOC Planning task (r = -.22).
Toplack et al. stated, “virtually all of the executive function experimental tasks were
significantly associated with the parent and teacher ratings on the BRIEF scales” (p. 62).
However, not all of these correlations were reported; only those involving Inhibit, Shift, Working
Memory, and Plan/Organize were provided for inspection. Toplack et al. found that each task-
based measure was not uniquely related to the similarly named BRIEF scale (e.g., the Inhibit scale did
not correlate only with the Stop Task measure of inhibition). BRIEF
ratings were found to be statistically significant predictors of ADHD status, whereas the task-
based measures were not. A limitation of the study was that task-based measures were selected
by matching task names to similarly named BRIEF scales; a BRIEF scale and a task-based
measure with similar names may not tap the same construct, which would explain the low
correlations. As noted earlier, a problem with a non-unitary,
individual process approach to the study of the executive function is that researchers’ opinions
are involved in naming and conceptualizing factors. This approach may lead to the
underrepresentation of commonality between similar constructs (Séguin & Zelazo, 2005).
Bishop (2011) tested 150 children between the ages of 6 and 18 using the BRIEF-Parent
form (87 diagnosed with ADHD and 63 children diagnosed with internalizing disorders, such as
depression and/or anxiety). Results indicated that children with ADHD had statistically
significantly lower scores on the WISC-IV Working Memory Index, and higher scores
(indicating more impairment) on two of the BRIEF’s scales (Plan/Organize scale and Working
Memory scale) than children with internalizing disorders. The Test of Variables of Attention
Commissions (TOVA; Greenberg & Kindschi, 1996), a measure of inhibition, shared 10% of the
variance with the Inhibit scale on the BRIEF. The WCST Perseverative Responses score shared
8% of the variance with the Shift scale of the BRIEF for those with internalizing disorders.
These results are consistent with similar research (e.g., Toplack et al., 2009) showing that weak to
moderate correlations (e.g., r = .32) exist between task-based measures and the BRIEF.
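The shared-variance figures above follow directly from the correlations: the proportion of variance two measures share is the squared correlation, r². A quick check of the values reported:

```python
# Proportion of shared variance between two measures is the squared correlation.
def shared_variance(r):
    return r ** 2

# A correlation of about .32 corresponds to roughly 10% shared variance,
# matching the TOVA-Inhibit figure reported by Bishop (2011).
print(round(shared_variance(0.32) * 100, 1))  # → 10.2 (percent)
```

Correlations below about .32 thus leave more than 90% of the variance in either measure unexplained by the other.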
A limitation of this study was the removal of seven participants with measured IQ scores
below 80. Bishop’s (2011) rationale for the IQ cutoff was to eliminate comorbid
developmental delay. Altering the sample limits the generalizability of this study
to the special education population. The BRIEF is not recommended for use with students with
mental retardation (IQ < 70), but approximately 10% of the population functions within the IQ
range of 70 to 80.
In summary, convergent validity was demonstrated by Gioia et al. (2000) between the
BRIEF-Parent form and four well-known behavior rating scales (ADHD-IV, BASC-Parent,
CBCL, and CRS). However, the sample sizes were small in all analyses. Independent
researchers have found mixed support for convergent validity of the BRIEF. Other behavior
rating scales show strong convergence with the BRIEF-Parent form, but many task-based
measures do not. Toplack et al. (2009) showed convergence between the BRIEF scales and four
task-based measures, but some nonsignificant relations were apparent between similarly named
scales or tasks (e.g., the BRIEF’s Inhibit and the Stop Task Inhibition; r = .21). McCandless and
O’Laughlin (2007) showed significant relations between the BRIEF scales and the Attention and
Hyperactivity scales of the BASC Parent. However, Toplack et al. (2009) and Bishop (2011)
showed weak relations between the BRIEF scales and analogous task-based executive function
measures.
Convergent validity and specific clinical populations. The BRIEF-Parent form has
been used to study executive functioning of children in a variety of populations, such as brain
disease (Anderson et al., 2002), moderate to severe traumatic brain injury (Vriezen & Pigott,
2002), and autism spectrum disorder (ASD; Gilotty, Kenworthy, Sirian, Black, & Wagner, 2002).
Anderson et al. (2002) found non-significant correlations between task-based activities and the
BRIEF-Parent scales in a sample of 189 children, divided across three clinical groups and a
control group (44 diagnosed with early treated phenylketonuria; 45 diagnosed with early treated
hydrocephalus; 20 diagnosed with frontal focal lesions; and 80 controls). Correlations between the BRIEF scales and
task-based measures “varied from .01 to .48” (Anderson et al., 2002, p. 237). Specific
correlations were not provided, but a table was provided that contained the “proportion of
children in each group that scored in the severe range (> 1 SD above the mean) on the BRIEF
parameters” (Anderson et al., 2002, p. 237). A limitation is that the specificity of the sample
hinders the generalizability of the results.
Vriezen and Pigott (2002) provided support for Anderson et al.’s findings.
Nonsignificant to low correlations were found between the BRIEF-Parent scales and task-based
activities. The sample consisted of 48 children with moderate to severe traumatic brain injury.
None of the BRIEF index scores correlated significantly with task-based measures of the WCST,
Comprehensive Trail Making Test (CTMT; Reitan, 1958), and TOVA. The BRIEF’s
Metacognitive Index was, however, statistically significantly correlated (r = -.30; p < .05) with
WISC-III Verbal IQ. Also, a greater number of children in the sample were identified as
impaired on the BRIEF than on the task-based measures. These findings suggest that the
determination of a child’s level of impairment may depend on which instrument (the BRIEF or a
task-based measure) is administered. A limitation of the study was a small sample
size (N = 48). As a result, only the index and composite scores of the BRIEF (GEC, BRI, MI)
were included in the analyses. Excluding the eight BRIEF scale scores did not allow
examination of the direct relations between the BRIEF scales, the task-based measures, and Verbal IQ
score. It has been suggested that the BRIEF may be tapping behaviors associated with emotional
and social aspects of EF in a different area of the brain than those areas involved in task-based
measures (Stuss & Alexander, 2000). This idea extends beyond the scope of this study, but
warrants further attention in future research when considering why such low correlations exist
between the BRIEF and many task-based measures.
In another study examining the BRIEF’s utility in specific populations, Gilotty et al.
(2002) sampled 35 children with ASD and examined the relation between executive function
skills (BRIEF-Parent) and adaptive behavior (the Vineland Adaptive Behavior Scales [VABS];
Sparrow et al., 1984). There were several statistically significant inverse relations between the
VABS Social scale and BRIEF-Parent scales, specifically the MI (r = -.53), the Initiate (r = -.64),
and Working Memory (r = -.57) scales. As impairment in executive function increased, the
adaptive behavior skills of these children with ASD tended to decrease. Limitations of this study
were a small sample size as well as no control group from which to compare the results.
Although this study showed strong relations between the BRIEF scales and a well-known
adaptive behavior instrument, not all studies have yielded such positive results (e.g., Vriezen &
Pigott, 2002).
Gioia and Isquith (2004) have defended the less than ideal correlations between the
BRIEF-Parent scales and other measures in specific clinical populations. Gioia and Isquith
contend that the accuracy of the BRIEF-Parent scales has been inappropriately compared to
results of the WCST. Gioia and Isquith argue that it is unfair to judge the utility of the BRIEF-
Parent scales by comparing them to this particular task-based measure because WCST scores
have not consistently detected executive function impairment in individuals with ADHD (see Pennington &
Ozonoff, 1996). Assuming their assertion about the WCST is a valid one, it still does not explain
the lack of correlation between the BRIEF and many other task-based measures, such as the
TOH, TOVA, and CTMT. In a review of the BRIEF for the Fifteenth Mental Measurements
Yearbook, Fitzpatrick (2003) noted an absence of established metacognitive measures in testing
the convergent validity of the BRIEF-Parent form. Subsequently, other observer rating scales
measuring executive function, such as the Childhood Executive Functioning Inventory (CHEXI;
Thorell & Nyberg, 2008) have been developed, but no empirically-based studies have examined
their relations with the BRIEF.
Discriminant validity. As evidence of discriminant validity, Gioia et al. (2000) factor
analyzed the BRIEF-Parent scales and composite scores with the CBCL. Correlational data also
provided support. A common factor analysis (principal axis factoring extraction [PAF]; oblique
rotation) of the two scales, based on a sample of 200 parent ratings, indicated a four-factor
solution, which accounted for 73% of the variance. Factor 1 contained the BRIEF’s Shift,
Emotional Control, and Inhibit scales. Factor 2 comprised the remaining BRIEF scales
(Plan/Organize, Working Memory, Initiate, Monitor, and Organization of Materials). Factor 3
was defined by five of the CBCL scales (Withdrawn, Anxious/Depressed, Social Problems,
Thought Problems, Attention Problems), and Factor 4 comprised the CBCL Delinquent
Behavior, CBCL Aggressive Behavior, and the BRIEF Inhibit scale. Thus, the scales of the two
instruments loaded onto separate sets of factors, with the exception of the BRIEF Inhibit scale,
which loaded on Factor 1 at .42 and on Factor 4 at .53.
According to Gioia et al., the Inhibit scale may measure a more physical than mental
manifestation of inhibition. A limitation of these findings is that the sample size (N = 200) was
small for a factor analysis.
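As background on how a figure such as “73% of the variance” is derived, the variance a factor accounts for equals its eigenvalue divided by the number of variables. A toy two-variable example (not the Gioia et al. data; true principal axis factoring additionally iterates on communality estimates):

```python
# Toy correlation matrix for two scales correlated at r = .60.
# For [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r.
r = 0.60
eigenvalues = [1 + r, 1 - r]

# Eigenvalue / number of variables = proportion of total variance
# attributable to that factor (components-style accounting).
proportions = [ev / len(eigenvalues) for ev in eigenvalues]
print(proportions)  # → [0.8, 0.2]: the first factor accounts for 80%
```

Summing the proportions for the retained factors gives the total variance accounted for by the solution.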
Gioia et al. (2000) also reported low correlations (r’s = .11 - .28) between all of the
BRIEF-Parent scales and the CBCL Somatic Complaints scale (an index of physical complaints
in relation to a child’s emotional functioning). Low correlations were found between all of the
BRIEF-Parent scales and the Conduct Problems scale on the BASC (r’s = |.05| to |.14|), but low to
moderate correlations were found between the BRIEF scales and the BASC’s Somatization scale (|.17| to
|.44|). Finally, the BRIEF scales had low correlations with the CRS Obsessive-Compulsive and
Antisocial scales (Gioia et al., 2000).
Independent studies have provided mixed support for the discriminant validity of the
BRIEF. McCandless and O’Laughlin (2007) examined the discriminant validity of both the
BRIEF-Parent and BRIEF–Teacher forms in the classification of 70 boys and girls (ages 5-13),
who had been referred to a university-based clinic for assessment of ADHD. The children made
up three groups: (a) No ADHD; (b) ADHD- Inattentive Type (ADHD-IT); and (c) ADHD-
Combined Type (ADHD-CT). Discriminant function analysis was conducted using the scores
from the MI scale of the BRIEF-Teacher form and the Inhibit scale from the BRIEF-Parent form
to determine classification of ADHD and its subtypes. Agreement between the GEC scores of the
two BRIEF forms was low (r = .13), and only three of the eight scales showed agreement above
chance levels. This lack of agreement may have contributed to the following results:
approximately (a) 15.7% of the participants were correctly classified as ADHD-IT, (b) 48.6% as
ADHD-CT, and (c) 35.7% as not having ADHD. The percentage of cross-validated grouped cases
correctly classified was 62.9%. These classification rates are inadequate because, with three
groups in the analysis, 33.3% of members would be correctly classified by chance alone
(Tabachnick & Fidell, 2001). Additionally, the results indicated that the BRI of the BRIEF-
Parent was statistically significantly elevated in those children identified as having ADHD-CT,
but the BRI was not significantly elevated on the BRIEF-Teacher. Group differences were
apparent in both forms of the BRIEF, specifically the Working Memory and Inhibit scales, when
classifying children as either ADHD or non-ADHD. McCandless and O’Laughlin (2007) noted
that parents were better reporters of behavioral deficits using the BRIEF, but teachers more
accurately reported behaviors associated with cognitive deficits using the BRIEF. A limitation of
this study was its small sample size (N = 70). Also, because the control group was recruited from
a sample of children at a clinic, participants in this group may have been more impaired than
children in a typical community sample, and hence not truly representative of a
control group.
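The chance-level benchmark invoked above can be made concrete: with k equally weighted groups, assigning cases at random classifies about 1/k of them correctly (a simplifying assumption of equal group priors):

```python
# Chance-level correct-classification rate for k equally weighted groups.
def chance_rate(k):
    return 1 / k

# With three diagnostic groups, chance alone yields ~33.3% correct,
# so the 15.7% ADHD-IT rate reported above falls below chance.
print(round(chance_rate(3) * 100, 1))  # → 33.3
```

When group sizes are unequal, the chance benchmark is instead based on the groups’ prior proportions, as Tabachnick and Fidell (2001) describe.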
Reddy, Hale, and Brodzinsky (2011) found a different pattern than that of McCandless
and O’Laughlin (2007). A group of 58 children diagnosed with ADHD were matched (age,
gender, parent’s education, & ethnicity) with 58 children who served as controls. Their parents
were administered the BRIEF. Independent-samples t tests indicated differences that were
statistically significant between the ADHD group and the control group. Three discriminant
function analyses were conducted to examine the classification rate of the ADHD sample in
comparison to the control sample on the GEC, the two index scores (BRI and MI) and the eight
scale scores. Using the GEC, the conditional probability for the children in the control sample
was .77, whereas that for the ADHD sample was .79. Using the BRI and MI, the conditional
probability for the children in the control sample was .86, and that for the ADHD sample was .79. Results were similar
for the scales: .84 for the control group and .81 for the ADHD sample. These findings meet or
exceed the recommended standard (.75) for diagnostic tests proposed by Milich, Widiger, and
Landau (1987) for clinical practice.
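Conditional probabilities of this kind are simply within-group correct-classification rates; a minimal sketch (the counts are hypothetical, chosen for illustration, not Reddy et al.’s data):

```python
# Conditional probability of correct classification within one group:
# correct classifications in that group / group size.
def conditional_probability(correct, group_size):
    return correct / group_size

# E.g., if 45 of 58 control children were classified as controls:
print(round(conditional_probability(45, 58), 2))  # → 0.78
```

Comparing these rates against the .75 benchmark proposed by Milich et al. (1987) gives the criterion applied in the text.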
In examining the discriminant functions, Reddy et al. (2011) reported that the BRI of the
BRIEF had a loading of .77, the highest correlation with the function, in comparison to the
Shift (.35), Emotional Control (.40), and Working Memory (.34) scales. The Inhibit, Initiate, and
Organization of Materials scales had low correlations with the function (.13, .14, and .26,
respectively), and the Plan/Organize and Monitor scales had low inverse correlations with the
function (-.18 and -.10, respectively). Limitations of the study were the small sample and its
homogeneous socioeconomic and racial composition, which consisted primarily of Caucasian
children with college-educated parents. These issues may limit the generalizability of the
findings.
Ecological validity. As in most fields, it is necessary to increase the applicability of
results obtained through controlled experiments to naturally occurring phenomena, an
applicability defined as ecological validity. Two aspects of ecological validity are pertinent to
neuropsychological testing: (a) verisimilitude (degree of similarity between test demands and
real-life demands) and (b) veridicality (degree of accuracy in predicting some environmental
behavior or molar outcome; Franzen & Wilhelm, 1996). Applying these concepts to the context of
school-based evaluations, verisimilitude is similar to face validity in that what the
child does during an individual assessment should translate into what the child is expected to do in a
classroom or learning environment. Because most individual assessment settings are quiet and
controlled, unlike a classroom with several distractions, Franzen and Wilhelm (1996) contend
that assessments often underestimate the degree of difficulty a child experiences in real-world
settings. Veridicality addresses whether a test predicts real-world behavior and can be used to
forecast future behavior, which may contribute to a better understanding of the child in settings
such as the classroom. The point of psychoeducational assessments is to develop effective
interventions based on observations or test performance in individual assessment settings and
ensure these same interventions apply in the classroom (Franzen & Wilhelm, 1996).
In regard to ecological validity, Gioia and Isquith (2004) provided an application of the
BRIEF’s methodology to the assessment of executive dysfunction using a sample of children
with traumatic brain injury (TBI) and advocated for the use of both behavioral rating scales and
task-based measures in assessment. Both methods are necessary to properly assess and develop
appropriate interventions in clinical as well as applied (e.g., school) settings. To this end, the authors
outlined the neuropsychological deficits associated with TBI, the social, emotional,
academic, behavioral, and environmental impacts of such an injury, and the role that the BRIEF-
Parent form plays in measuring these areas.
Gioia and Isquith (2004) contend that the items on the BRIEF-Parent form have strong
ecological validity for several reasons: (1) the items originated from clinical interviews with
parents and teachers as well as input from 12 clinical neuropsychologists; (2) the BRIEF was
designed to capture everyday manifestations of executive dysfunction through items, such as
“When sent to get something, forgets what he or she is supposed to get” (tapping working
memory); and (3) the BRIEF scores correlated with scholastic achievement (e.g., Clark,
Pritchard, & Woodward, 2010; Mahone, Koth, Cutting, Singer, & Denckla, 2001). The BRIEF,
however, is still prone to the limitations of observer rating scales, such as the rater’s level of
linguistic competence and emotional involvement influencing observations (Denckla, 2002).
Despite these limitations, interventions dealing specifically with executive function deficits have
shown promising evidence-based results (see Diamond & Lee, 2011). In a recent issue of
Communiqué, a newsletter circulated to members of the National Association of School
Psychologists (NASP), Cantin, Mann, and Hund (2012) reviewed various measures that
demonstrate strong psychometric characteristics and are recommended for school assessment.
The BRIEF was included in the list of instruments deemed useful for psychoeducational
assessment. Many types of validity, including social consequences, must be considered when
measurement is involved in educational decision-making.
Social consequences. Assessment may have foreseen or unforeseen repercussions for
individuals, groups of individuals, or society as a whole, based on the results yielded from a
measure (Messick, 1989). Thus, social consequences associated with the assessment should
always be considered. There are at least four issues about the BRIEF-Parent form that could
result in negative social consequences: (a) observer bias, (b) misinformation in the media, (c)
overlap with ADHD diagnostic criteria, and (d) malingering.
The BRIEF-Parent form is designed to identify clusters of behaviors, which have been
labeled executive functions, and to put them in a simplified format that requires another
individual (in this case a parent or caregiver) to gauge the severity of a child’s behaviors. This
format of gathering information results in some degree of observer bias. However, the valuable
information gleaned from those in frequent contact with the child likely outweighs the negative
aspects of observer rating scales (Gioia & Isquith, 2004).
Another issue is whether the information gathered through instruments, such as the
BRIEF, ultimately exposes and helps those in society who experience these difficulties. Behaviors
considered undesirable, particularly in a learning environment (e.g., speaking out of turn in a
classroom) need to be addressed through school-based interventions. The question is whether
there will be repercussions to the increased exposure of parents and teachers, through
instruments such as the BRIEF, to terms for such behaviors (typically used in clinical settings),
and whether this exposure will ultimately lead to mistruths about clinical disorders such as ADHD. For example,
Gonon, Bezard, and Boraud (2011) indicate that scientific literature often misrepresents research
about ADHD, resulting in misleading conclusions in the media. This misrepresentation may
result in parents reading or hearing these mistruths and being misinformed. Parents are primarily
responsible for making decisions about treatment options on behalf of their children (e.g., drug
therapy, counseling, and/or education placement); thus, misinformation may result in incorrect
decision-making by parents. Because this risk is possible, it is important to consider word choice
and content of instruments given to parents about their child’s behaviors, such as the BRIEF-
Parent form. Thorell and Nyberg (2008) have been critical of the BRIEF-Parent form because
the language used in many items is similar, if not identical, to the diagnostic criteria for ADHD
in the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev. [DSM-IV-
TR]; American Psychiatric Association [APA], 2000). Due to the semantic overlap between
ADHD symptoms and EF measures, such as the BRIEF, Thorell and Nyberg contend that it
makes sense that these instruments would correlate with ADHD symptoms. This similarity in
wording raises the issue of whether the BRIEF-Parent form is actually tapping the relevant range
of ADHD symptoms or tapping a narrow range of symptoms due to wording. Thus, it is
legitimate to question whether using the BRIEF-Parent form to identify young children with
ADHD will be useful in predicting the existence of the disorder in the future. Also, Thorell and
Nyberg criticize the BRIEF-Parent form for confounding executive function concepts,
specifically, working memory and sustained attention. The item “has a short attention span” is
part of the BRIEF-Parent’s working memory scale, but Thorell and Nyberg claimed that such
items, which actually examine inattention, should not belong under the working memory
categorization.
Another potential issue in using the BRIEF-Parent form is the detection of malingerers.
No research was found that directly addressed the issue of susceptibility of the BRIEF-Parent
form to malingerers. Clinicians must consider this possibility when interpreting the BRIEF
results, as the test is based on another’s observations of a child. It could be argued that
government agencies provide financial motivation that could influence an individual to
exaggerate or outright lie on a test such as the BRIEF to obtain a diagnosis related to executive
dysfunction. Researchers (e.g., Fisher & Watkins, 2008; Sollman, Ranseen, & Berry, 2010) have
demonstrated in samples of college students, who were administered ADHD screeners, that it is
possible to feign clinical levels and achieve false-positive diagnoses, simply by exposing the
students to Internet-derived materials about ADHD before they completed the screeners.
The possibility of false positives is another reason why Gioia et al. (2000) emphasize
that the BRIEF-Parent form should be only one part of an overall clinical or psycho-educational
assessment. Gioia et al. (2000) have attempted to address the area of bias through the
Inconsistency scale and the Negativity scale (see page 21).
Summary
In summary, the BRIEF-Parent form has been shown to demonstrate many facets of
validity (predictive, convergent, discriminant, and ecological), particularly by the authors of the
instrument, but to a limited degree in independent studies. Social consequences were also
considered. Evidence is continuing to build to support the premise that the BRIEF-Parent form
is a psychometrically sound instrument in terms of both reliability and validity of scores, but at
this time, reliability estimates of the scores are moderate and evidence of construct validity is
mixed. Frequency of its use has increased in both clinical and school-based settings. The
increase in popularity may be due to the number of empirical studies using the BRIEF-Parent
form. The factor structure of the BRIEF-Parent is still, however, a topic of debate. The
conceptualization of executive function has evolved over the past two decades to involve
theories of separable but related EF processes. Predictive validity studies have provided
mixed support on the usefulness of the BRIEF-Parent form. Some research (McCandless &
O’Laughlin, 2007; Pratt, 2000) indicates that children with ADHD show more impairment on the
BRIEF-Parent scales than controls, whereas others (Mahone et al., 2002) show that the BRIEF
scales are not accurate in correctly identifying ADHD or Tourette’s syndrome.
Convergent validity studies have varied in their level of support for the BRIEF. Gioia et al. (2000)
have shown convergence between the BRIEF-Parent scales and other established behavior
rating scales, such as the ADHD-IV, CBCL, BASC, and CRS. However, the BRIEF-Parent
scales have weak to moderate correlations with task-based measures, particularly in small
clinical populations such as children with brain disease or traumatic brain injury (Anderson et al.,
2002; Vriezen & Pigott, 2002). Although Gioia et al. (2000) have been able to demonstrate
discriminant validity through factor analysis between the BRIEF-Parent scales and the CBCL,
independent studies have varied in results. McCandless and O’Laughlin (2007) demonstrated
less than ideal discrimination between those with ADHD and the various ADHD subtypes when
using the BRIEF. In comparison, Reddy et al. (2011) showed adequate categorization based on
the BRIEF-Parent, with the BRI scores best separating the diagnostic groups. Ecological validity
studies are somewhat limited to those conducted by the test authors (Gioia & Isquith, 2004), but
more research is emerging that shows a relation between executive function instruments (such as
the BRIEF-Parent form) and school-based executive function interventions, including
computerized training, aerobic exercise, and martial arts and mindfulness practices (Diamond &
Lee, 2011). Social consequences are noteworthy when considering that EF symptoms can be
feigned. The possibility of faking symptoms could arguably desensitize the public to these EF
behaviors and downplay the severity of executive dysfunction. Despite the social consequences
that may arise from using an observer rating scale to assess the presence (or absence) of
executive dysfunction, instruments such as the BRIEF-Parent form are becoming increasingly
popular in the school setting (Hale & Fiorello, 2004). Because of the increased use of the BRIEF
in schools, more studies are warranted to ensure that the measure is psychometrically appropriate
for use.
Purpose of the Present Study
The purpose of this study is to scrutinize the current factor structure of the
BRIEF-Parent form. Continued investigation of the present factor structure is needed, as there
continues to be debate regarding the appropriate number of scales and
index scores that best reflect the structure of the scale. The test authors (Gioia et al., 2002) have
conducted one of the most frequently cited studies in the BRIEF-Parent literature; thus,
independent examination of the BRIEF-Parent form is needed. More research is needed using a
sample of children with mixed clinical diagnoses. To date, only one study (Gioia et al., 2002),
conducted by the developers, has used a sample of children in the U.S. with diverse diagnoses.
Internationally, Egeland and Fallmyr (2010) obtained similar findings with a diverse clinical
sample of Norwegian children, using a Norwegian translation and American
norms. Huizinga and Smidts (2011) also conducted a study using a sample of “normal Dutch
school children” and a Dutch version of the BRIEF-Parent form. Regardless of the limitations
(i.e., small sample size; Norwegian sample using American norms), Egeland and Fallmyr’s
results supported Gioia et al.’s (2002) findings of a three-factor solution of the BRIEF-Parent
form based on nine scales. Huizinga and Smidts (2011) did not.
Using the normative data collected for the instrument, Gioia et al. (2000) determined that
the best structure was a two-factor model based on eight scales. Huizinga and Smidts (2011)
also analyzed data from Dutch parents that indicated the best structure of the BRIEF-Parent form
is the eight-scale version, the original scale format (Gioia et al., 2000). However, both Gioia et
al. (2002) and Egeland and Fallmyr (2010), using mixed clinical samples, found that a three-
factor model based on a nine-scale version also provided good fit to the BRIEF-Parent scores.
Alternative factor structures have been investigated and supported by research and should also be
revisited. Only one study (Egeland & Fallmyr, 2010) has examined both the eight- and nine-scale
versions of the BRIEF-Parent form, and that study was conducted internationally. Hence, the issue about the best
structure of the BRIEF-Parent still exists. More studies are needed that examine the current
structure in comparison to alternative models using a mixed clinical population from the U.S.
The purpose of the present study is to scrutinize the current factor structure of the
BRIEF-Parent form in the context of a mixed clinical sample that would be similar to that of
students receiving special education services in a school population. Two questions will guide
the study:
1. Will the factor structure of BRIEF-Parent scores obtained from a mixed clinical
sample of school-aged youth align with the two-factor, eight-scale structure
originally proposed by the test authors?
2. If the current factor structure does not meet established criteria in this sample,
which, if any, alternative models meet the standards of good fit?
This study will employ confirmatory factor analysis (CFA). Using established criteria for
conducting CFAs, which are described in the Method section, multiple models of the BRIEF-
Parent form will be tested and compared to the two-factor, eight-scale model that is currently
employed in the instrument. Furthermore, reliability estimates of the BRIEF scores will also be reported.
METHOD
Participants
Participants were 371 students in kindergarten through Grade 12, for each of whom a
parent or caregiver rating was obtained. The raters were 267 mothers (72.0%), 73 fathers
(19.7%), and 31 (8.3%) other family members (e.g., grandparents, aunts, or step-parents). The
sample was a compilation of three archival data sets. The first dataset was based on 264 students
who obtained evaluations through the Pennsylvania Office of Vocational Rehabilitation (OVR)
to determine eligibility for special education services in the postsecondary educational setting.
Hereafter, this dataset is designated as the OVR sample. All students referred by OVR for this
evaluation resided in Northwestern Pennsylvania and had previously been identified in their
home school districts as qualifying for special education services.
Second, data from 45 students came from private evaluations and reflected a mixture of clinical
diagnoses. These evaluations were conducted for two main reasons: (a) a parent’s complaint to
the child’s home school district about the special education determination, which resulted in a
third party evaluation or (b) a desire by parents to gain a better understanding of their child’s
educational functioning. Hereafter, this dataset is called the private sample. Third, the
remaining dataset was based on 62 students who had obtained a psycho-educational
evaluation through the school district’s referral procedures. This dataset is referred to as the
school sample.
Diagnoses for the sample were mixed, with children identified for special education
services under the categories of Other Health Impairment, Autism, Specific Learning Disability,
Traumatic Brain Injury, and Emotional Disturbance. The ratio of males to females across the
samples ranged from approximately 3:2 to 3:1: OVR sample—59.8% males and 40.2% females;
private sample—71.1% males and 28.9% females; and school sample—64.5% males and 35.5%
females. Both the private and school samples were comparable in age: private sample, 5 to 16
years old (M = 10.8 years; SD = 3.34); and school sample, 6 to 18 years old (M = 11.2 years; SD =
3.04). However, the students in the OVR sample were older, 16 to 18 years old (M = 17.4
years; SD = 0.64). Students’ race/ethnicity in the samples was predominantly Caucasian: 94.7%
for the OVR sample, 97.8% for the private sample, and 96.8% for the school sample. The
composition of the three samples, arranged by gender, age, and race, is reported in Table 1.
Table 1

Demographic Characteristics of Samples

Characteristic            OVR (N = 264)   Private (N = 45)   School (N = 62)   All (N = 371)
                          n (%)           n (%)              n (%)             n (%)
Gender
  Male                    158 (59.8)      32 (71.1)          40 (64.5)         230 (62.0)
  Female                  106 (40.2)      13 (28.9)          22 (35.5)         141 (38.0)
Age
  5                       --              1 (2.2)            --                1 (0.3)
  6                       --              2 (4.4)            2 (3.2)           4 (1.1)
  7                       --              8 (17.8)           5 (8.1)           13 (3.5)
  8                       --              3 (6.7)            8 (12.9)          11 (3.0)
  9                       --              5 (11.1)           7 (11.3)          12 (3.2)
  10                      --              3 (6.7)            6 (9.7)           9 (2.4)
  11                      --              3 (6.7)            5 (8.1)           8 (2.2)
  12                      --              5 (11.1)           7 (11.3)          12 (3.2)
  13                      --              2 (4.4)            6 (9.7)           8 (2.2)
  14                      --              5 (11.1)           5 (8.1)           10 (2.7)
  15                      --              3 (6.7)            5 (8.1)           8 (2.2)
  16                      22 (8.3)        5 (11.1)           5 (8.1)           32 (8.6)
  17                      118 (44.7)      --                 --                118 (31.8)
  18                      124 (47.0)      --                 1 (1.6)           125 (33.7)
Race
  Caucasian               250 (94.7)      44 (97.8)          60 (96.8)         354 (95.4)
  African American        4 (1.5)         1 (2.2)            --                5 (1.3)
  Hispanic                4 (1.5)         --                 --                4 (1.1)
  More than one race      2 (0.8)         --                 1 (1.6)           3 (0.8)
  Unknown or unspecified  4 (1.5)         --                 1 (1.6)           5 (1.3)

Note. OVR = Office of Vocational Rehabilitation.
Geographical Context
The participating school district was located in Northwestern Pennsylvania, where the
majority of the district’s families were considered to be of low to middle income.
Approximately 40% of the district’s students were classified as low-income in accordance with
the Pennsylvania Department of Education criterion (PA Department of Education, 2012). The
district’s students were predominantly Caucasian, non-Hispanic (over 98%). According to the
2009-2010 data compiled by the Pennsylvania Department of Education, a total of 1,258 students
attended kindergarten through Grade 12 in the district. Special education services were
provided to 230 students (18.3% of the population).
Measures
Demographic information. Demographic information about the youth, such as gender,
grade, age, and birth date, and the rater (name, relationship, and date of completing the form)
was collected.
BRIEF-Parent form. As noted earlier, the BRIEF-Parent form is a questionnaire
designed for parents/guardians to complete to help professionals assess executive function
behaviors of school-age (5-18 years) youth in the home and school environments (Gioia et al., 2000).
The term parent is broadly defined to include any individual “with the most recent and most
extensive contact with the child” (Gioia et al., 2000, p. 5). In designing this measure, a key goal was to
create a measure that would be easy to score and would yield useful information about executive
functioning on which most professionals could commonly agree (Gioia et al., 2000).
As summarized earlier, the scale contains 86 items, which are divided into eight scales:
Inhibit, Shift, Emotional Control, Initiate, Working Memory, Plan/Organize, Organization of
Materials, and Monitor. Items that comprise each scale are displayed in Appendix B.
Three composite measures are created by combining specific scales: the Behavioral
Regulation Index (BRI; Inhibit, Shift, and Emotional Control), the Metacognition Index (MI;
Initiate, Working Memory, Plan/Organize, Organization of Materials, and Monitor), and the
Global Executive Composite (GEC; BRI and MI scores).
Based on the items presented, the rater is asked to describe the child’s behavior over the
past six months. Raters are instructed to read statements concerning specific behaviors and to
rate the frequency of their occurrence. If the behavior has never been observed in the child over
the past six months, the rater is instructed to circle the letter N (Never). Likewise, if the behavior
has sometimes been a problem, the rater is expected to circle the letter S (Sometimes), and if the
behavior has often been a problem, the rater is expected to circle the letter O (Often). No further
explanation or definition of “sometimes” or “often” is provided. Raters are instructed to
complete all items even if a behavior does not apply to the child.
Scoring parallels the rating format (Never = 1, Sometimes = 2, Often = 3). Scores are
then summed for each of the eight scales and a composite score is computed. Additionally, a
qualified administrator enters each score into the software called Behavior Rating Inventory of
Executive Functioning Scoring Portfolio (BRIEF-SP; Isquith & Gioia, 2002), resulting in a
BRIEF score profile of the child, which is a plot composed of a T score for each scale. If the
scoring program is not available, appendix tables are provided in the BRIEF manual, with scores
presented by gender and age group. A T score of 50, the mean of the T score distribution, is
designated as the reference point for what is considered a Normal level of the particular
index or composite score. T scores 1.5 standard deviations or more above the mean of the
T score distribution (T ≥ 65) are classified in the manual as Clinically Significant. Such
elevated scores are considered to warrant special attention. Scores falling in the 51 to 64 range
are considered At-risk. These results are to be used in the context of a complete evaluation.
Thus, it is recommended that decisions about educational placement or intervention should not
be based solely on the BRIEF scores (Gioia et al., 2000). Description and information about the
development of the scale and the psychometric properties (reliability estimates and evidence for
validity) of the BRIEF is summarized above (see pp. 23-57).
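As a rough illustration of this scoring logic only (the actual BRIEF-SP software converts raw scores to T scores via age- and gender-based norm tables that are not reproduced here, so the function names and values below are purely hypothetical), the rating-to-classification flow might be sketched as:

```python
# Illustrative sketch of BRIEF-style scoring; the real instrument applies
# proprietary, normed raw-score-to-T-score conversion tables not shown here.

RATING_POINTS = {"N": 1, "S": 2, "O": 3}  # Never / Sometimes / Often

def scale_raw_score(ratings):
    """Sum the 1-3 point values for all items rated on one scale."""
    return sum(RATING_POINTS[r] for r in ratings)

def classify_t_score(t):
    """Label a T score using the cutoffs described in the manual."""
    if t >= 65:  # 1.5 SD or more above the T-score mean of 50
        return "Clinically Significant"
    if 51 <= t <= 64:
        return "At-risk"
    return "Normal"

print(scale_raw_score(["N", "S", "O", "S"]))  # -> 8
print(classify_t_score(67))                   # -> Clinically Significant
```

The sketch simply makes explicit that the classification bands partition the T-score range at 51 and 65, with the normative conversion step left to the published tables.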
Procedure
The archival data were obtained from two sources: (a) the school district and (b) the
private files of the licensed psychologist, who had conducted the evaluations for the private and
OVR cases. On written request, the school district’s superintendent approved access to the data.
Similarly, the licensed psychologist who conducted the private and OVR evaluations gave access
to these cases (see Appendices C and D). The evaluators, including the licensed psychologist,
were eight certified school psychologists (certified by the Pennsylvania Department of
Education to practice school psychology in a school district setting), two of whom also held a
doctoral degree in school psychology. Years of experience for the
psychologists ranged from 3 to 15 years (M = 8.4; SD = 4.6). All psychologists (two female, six
male) were Caucasian.
During the summer of 2009, the licensed psychologist created a digital database
containing information gathered through OVR and private evaluations, beginning in 2003. The
BRIEF-Parent form was one of several protocols recorded into the database. Employees
(certified school psychologists) recorded T scores from the scoring program profile sheet. Item-
level responses were gleaned from individual parent protocols and recorded in a similar manner.
Because the archived data did not contain identifying information about the participants,
the Pennsylvania State University’s Office of Research Protections (ORP) determined that the
proposed research was not “human participant” research as defined by the Department of Health
and Human Services (DHHS) Federal Regulations. Therefore, the research did not need to be
reviewed by the Institutional Review Board (IRB). Email correspondence of the study’s research
status is contained in Appendix E.
CFA Guidelines and Models
CFAs, using EQS software (v. 6.2), were conducted on the scale scores to examine the
factor structure of the BRIEF-Parent form. Raw scores served as the input data, which were
converted into a covariance matrix of the variables. The method of estimation was maximum
likelihood.
Models. Seven models were tested, three based on eight scales and four based on nine
scales. In the eight-scale models, the Monitor scale was treated as one scale, whereas in the
nine-scale models, it was divided into two separate scales: Self-Monitor and Task-Monitor.
Each scale was treated as an indicator. The models were identified by fixing one indicator
loading per factor to unity. Factor variances and covariances were freely estimated.
Two approaches were taken to test the models. The two-factor, eight-scale model was
directly tested to determine whether it met the guidelines for a good fitting model. Then an
alternative model approach was used, in which each model was compared to other potentially
viable models to ascertain the model with the best fit to the data (Jöreskog, 1993). The
alternative-models approach is one of three types of CFA frameworks; the other two are the
strictly confirmatory approach (i.e., one model is postulated and is either rejected or not
rejected, with no modifications made) and the model-generating approach (i.e., a model shows
poor fit and an exploratory approach is used to identify a better model). The strictly confirmatory approach was not used in
this study because this strategy would not allow for direct comparison of alternative models to
the current factor structure. The model-generating process was also not used because this
approach has been shown, particularly in small samples, to be negatively affected by chance
characteristics of the sample (MacCallum, Roznowski, & Necowitz, 1992). The alternative
approach was used because it is not driven by the data, but is based on the comparison of several
a priori models.
Starting with the most reduced model, Model 1 was a one-factor model, labeled Unity-8,
for the eight-scale version, in which executive function is depicted as one construct. Model 2,
labeled 2Original-8, was the original two-factor, eight-scale model proposed by Gioia et al.
(2000), composed of Behavioral Regulation (Inhibit, Shift, Emotional Control) and
Metacognitive (Initiate, Working Memory, Plan/Organize, Organization of Materials, &
Monitor). Model 3 also represented an eight-scale, two-factor model, which had been proposed
by Donders et al. (2010) and was labeled accordingly (2Donders-8): Behavioral Regulation
(Shift, Emotional Control) and Metacognitive (Inhibit, Initiate, Working Memory,
Plan/Organize, Organization of Materials, & Monitor). The difference between the original and
Donders et al.’s model is in the placement of the Inhibit scale. Donders et al. placed the Inhibit
scale on the Metacognitive factor instead of on the Behavioral Regulation factor.
Models 4 through 7 were based on nine scales, with the Monitor scale divided into two,
and the number of factors varied from one to four. Model 4 was a one-factor model for the
nine-scale version and was designated as Unity-9. Model 5, labeled 2Monitor-9, had a two-factor
structure with the following composition: four scales on Behavioral Regulation (Inhibit, Shift,
Emotional Control, & Self-Monitor) and five scales on Metacognitive (Initiate, Working
Memory, Plan/Organize, Organization of Materials, & Task-Monitor). The original model
(Model 2) had only one Monitor scale, which loaded on the Metacognitive factor, whereas in
Model 5 each factor had a Monitor scale. Model 6 was a three-factor model, named 3Monitor-9,
in which two scales loaded on Behavioral Regulation (Inhibit & Self-Monitor); two scales that
had previously been on Behavioral Regulation loaded on a new factor called Emotional
Regulation (Emotional Control & Shift); and the same five scales loaded on the Metacognitive
factor (Initiate, Working Memory, Plan/Organize, Organization of Materials, & Task-Monitor).
Finally, Model 7, designated as 4Monitor-9, was a four-factor model in which the composition
of the Behavioral Regulation (Inhibit and Self-Monitor) and Emotional Regulation (Emotional
Control and Shift) factors was the same as in Model 6. However, the Metacognitive factor was
divided into two factors: “Internal” Metacognition (Initiate, Working Memory, & Plan/Organize)
and the “External” Metacognition (Organization of Materials & Task-Monitor scales; Gioia et
al., 2002). A summary of the models is reported in Table 2; depictions of these models are also
presented in Figures 1 through 7.
Fit criteria. Several criteria were used to evaluate goodness of fit. The standard
criterion has been the chi-square test (χ2), in which a statistically nonsignificant (p > .05) test
would indicate that the model is a good fit for the data. However, this criterion has been found
to be sensitive to sample size, and as a result, could be statistically significant even when the
model might be a good fit for the data (Bentler, 1988). Thus, χ2, along with its degrees of
freedom and associated p-value, was examined, but it was not considered a sufficient criterion to
assess model fit on its own. Other fit indices were used that reflected three broad categories of
fit: absolute, incremental, and parsimony.
Table 2

Composition of Models Organized by Factor and Indicator

Eight-Scale
  Model 1  Unity-8      1. GEF (Inhibit, Shift, Emotional Control, Initiate, Working Memory,
                           Plan/Organize, Organization of Materials, Monitor)
  Model 2  2Original-8  1. BRI (Inhibit, Shift, Emotional Control)
                        2. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Monitor)
  Model 3  2Donders-8   1. BRI (Shift, Emotional Control)
                        2. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Monitor, Inhibit)
Nine-Scale
  Model 4  Unity-9      1. GEF (Inhibit, Shift, Emotional Control, Initiate, Working Memory,
                           Plan/Organize, Organization of Materials, Self-Monitor, Task-Monitor)
  Model 5  2Monitor-9   1. BRI (Inhibit, Shift, Emotional Control, Self-Monitor)
                        2. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Task-Monitor)
  Model 6  3Monitor-9   1. BRI (Inhibit, Self-Monitor)
                        2. ERI (Shift, Emotional Control)
                        3. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Task-Monitor)
  Model 7  4Monitor-9   1. BRI (Inhibit, Self-Monitor)
                        2. ERI (Shift, Emotional Control)
                        3. Int MI (Initiate, Working Memory, Plan/Organize)
                        4. Ext MI (Organization of Materials, Task-Monitor)

Note. GEF = General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index; ERI = Emotional
Regulation Index; Int MI = Internal Metacognition; Ext MI = External Metacognition.
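For readers who wish to replicate these comparisons in open-source SEM software, the factor-to-indicator composition in Table 2 can be encoded as a simple data structure. The sketch below is illustrative only: the scale names are abbreviated identifiers, and the lavaan-style syntax helper is a hypothetical convenience, not part of this study's EQS analyses.

```python
# Factor-to-indicator composition of the seven BRIEF-Parent CFA models
# (mirrors Table 2; scale names abbreviated for illustration).
MODELS = {
    "Unity-8":     {"GEF": ["Inhibit", "Shift", "EmotionalControl", "Initiate",
                            "WorkingMemory", "PlanOrganize", "OrgMaterials", "Monitor"]},
    "2Original-8": {"BRI": ["Inhibit", "Shift", "EmotionalControl"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "Monitor"]},
    "2Donders-8":  {"BRI": ["Shift", "EmotionalControl"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "Monitor", "Inhibit"]},
    "Unity-9":     {"GEF": ["Inhibit", "Shift", "EmotionalControl", "Initiate",
                            "WorkingMemory", "PlanOrganize", "OrgMaterials",
                            "SelfMonitor", "TaskMonitor"]},
    "2Monitor-9":  {"BRI": ["Inhibit", "Shift", "EmotionalControl", "SelfMonitor"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "TaskMonitor"]},
    "3Monitor-9":  {"BRI": ["Inhibit", "SelfMonitor"],
                    "ERI": ["Shift", "EmotionalControl"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "TaskMonitor"]},
    "4Monitor-9":  {"BRI": ["Inhibit", "SelfMonitor"],
                    "ERI": ["Shift", "EmotionalControl"],
                    "IntMI": ["Initiate", "WorkingMemory", "PlanOrganize"],
                    "ExtMI": ["OrgMaterials", "TaskMonitor"]},
}

def to_syntax(model):
    """Build lavaan-style measurement syntax (hypothetical helper)."""
    return "\n".join(f"{f} =~ {' + '.join(inds)}" for f, inds in model.items())

print(to_syntax(MODELS["2Original-8"]))
```

Encoding the models this way makes the single substantive difference between, for example, 2Original-8 and 2Donders-8 (the placement of Inhibit) explicit and machine-checkable.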
Figure 1. One-factor, eight-scale (Unity-8) model based on theory of unity.
Figure 2. Gioia et al.’s (2000) original two-factor model based on eight scales
(2Original-8). BRI = Behavioral Regulation Index; MI = Metacognition Index.
Figure 3. Donders et al.’s (2010) two-factor, eight-scale model (2Donders-8) depicted
at scale level. BRI = Behavioral Regulation Index; MI = Metacognition Index.
Figure 4. Gioia et al.’s (2002) one-factor, nine-scale model (Unity-9) depicted at scale level.
Figure 5. Gioia et al.’s (2002) two-factor, nine-scale model (2Monitor-9) depicted
at scale level. BRI = Behavioral Regulation Index; MI = Metacognition Index.
Figure 6. Gioia et al.’s (2002) three-factor model (3Monitor-9) depicted at scale
level. BRI = Behavioral Regulation Index; ERI = Emotional Regulation Index;
MI = Metacognition Index.
Figure 7. Gioia et al.’s (2002) four-factor model (4Monitor-9) depicted at scale
level. BRI = Behavioral Regulation Index; ERI = Emotional Regulation Index;
Internal MI = Internal Metacognition Index; External MI = External Metacognition
Index.
77
Absolute indices are ones that "directly assess how well an a priori model produces the sample data" (Hu & Bentler, 1998, p. 426), and their calculation does not rely on a baseline model. Absolute indices used for judgment of model fit were (a) the standardized root-mean-square residual (SRMR), which is the average difference between the sample variances and covariances and the estimated population variances and covariances (Hu & Bentler, 1995), and (b) the root mean square error of approximation (RMSEA), which reflects the error of approximation between the population covariance matrix and the model with optimally chosen parameter values (Steiger & Lind, 1980). Good model fit was determined based on low values for both indices: RMSEA equal to or less than .08 (Browne & Cudeck, 1993); SRMR equal to or less than .08 (Hu & Bentler, 1999). A 90% confidence interval around the RMSEA estimate was reported to increase the precision of the estimate. The lower bound should ideally be less than .05 and as close to zero as possible, and the upper bound should be equal to or less than .08 (Browne & Cudeck, 1993).
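To make the RMSEA criterion concrete, the index can be computed from a model's chi-square, degrees of freedom, and sample size under a common formulation. The sketch below is illustrative, not the EQS implementation, and the function name is ours:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation under a common
    formulation: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Worked example with the Unity-8 values reported in Table 4
# (chi-square = 316.836, df = 20, N = 371):
value = rmsea(316.836, 20, 371)
print(f"{value:.3f}")  # 0.200 -- well above the .08 cutoff
```

Note that when the chi-square is at or below its degrees of freedom, the formula returns zero, reflecting fit as good as can be expected in the population.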
Incremental indices "measure the proportionate improvement in fit by comparing a target model with a more restricted, baseline model" (Hu & Bentler, 1998, p. 426). The null hypothesis for these models was that all variables were uncorrelated. Two incremental fit indices used were the comparative fit index (CFI) and the nonnormed fit index (NNFI); values equal to or greater than .90 were indicative of an acceptable fit (Marsh & Grayson, 1995). The three eight-scale models (Models 1 through 3) were nested, as were all nine-scale models (Models 4 through 7). Chi-square values were used to compare the nested models, with lower values considered ideal. Furthermore, the incremental change in χ2 was also used to examine whether a less restrictive nested model (e.g., a two-factor model) had a better fit than a more restrictive one (e.g., a one-factor model).
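The nested-model comparison can be sketched as a chi-square difference test; the helper below is illustrative (the critical values are standard .05 table values for 1 to 8 degrees of freedom, and the function name is ours):

```python
# Standard .05 critical values of chi-square for df = 1..8;
# this helper and its names are illustrative, not part of the analysis.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
                5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507}

def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Return (delta chi-square, delta df, significant at .05) for a
    pair of nested models (the restricted model has more df)."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, d_chi2 > CHI2_CRIT_05[d_df]

# Unity-8 (316.836, df = 20) vs. 2Original-8 (173.478, df = 19):
print(chi2_difference(316.836, 20, 173.478, 19))  # significant drop
# 3Monitor-9 (131.689, df = 24) vs. 4Monitor-9 (130.049, df = 21):
print(chi2_difference(131.689, 24, 130.049, 21))  # nonsignificant drop
```

A significant drop indicates that freeing the additional parameters (here, additional factors) genuinely improves fit rather than capitalizing on chance.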
To compare the best fitting eight-scale models to the best fitting nine-scale models, the
Expected Cross-Validation Index (ECVI; Browne & Cudeck, 1989) was used to assess
parsimony of the non-nested models. Parsimony fit indices penalize models that are less
straightforward so that simpler theoretical processes are preferred over more complex ones. This
particular value is used to express overall error between the population covariance and the model
fitted to the sample. A lower value is considered ideal when compared to ECVI values of other
models (Diamantopoulos & Siguaw, 2000).
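One common formulation of the ECVI (Browne & Cudeck, 1989) expresses it in terms of the model chi-square, the number of freely estimated parameters q, and sample size N. Software implementations vary, so the sketch below is illustrative only, and the parameter counts in the example are hypothetical:

```python
def ecvi(chi2, q, n):
    """Expected Cross-Validation Index under one common formulation:
    (chi2 + 2q) / (N - 1), where q is the number of freely estimated
    parameters. Software implementations differ in details."""
    return (chi2 + 2 * q) / (n - 1)

# Hypothetical parameter counts, for illustration only; the model
# with the lower ECVI is preferred among non-nested models.
model_a = ecvi(173.478, 17, 371)
model_b = ecvi(131.689, 21, 371)
print(model_b < model_a)  # True
```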
Models were also examined for fit and misfit on several other criteria: the average off-diagonal absolute standardized residual (AODSR), the statistical significance of the parameters in the equations, and the effect sizes (R2) of the parameters. Average off-diagonal absolute standardized residuals indicate the discrepancy between the sample covariance matrix and the model covariance matrix (Browne, 2006). Standardized residuals are comparable to standard scores in a sampling distribution and, as a result, can be interpreted like z scores. Thus, values greater than 2.58 (p < .01) are considered "large" (Byrne, 2006, p. 94). Tests of the statistical significance of the estimated parameters (the unstandardized parameter estimate divided by its standard error) were also conducted. The ratios are interpreted like z scores, such that values greater than ±1.96 are considered statistically significant at the probability level of .05 (Byrne, 2006). Effect sizes (R2) of the parameters were the squared values of the standardized path coefficients; values less than .10 indicated a "small" effect, values around .30 indicated a "medium" effect, and values greater than .50 were considered a "large" effect (Cohen, 1988).
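These two checks can be sketched as follows; the function names are ours, and the .91 loading in the example is the Plan/Organize coefficient reported later for the 2Original-8 Model, while the estimate and standard error are hypothetical:

```python
def z_ratio(estimate, std_error):
    """Unstandardized estimate divided by its standard error;
    |z| > 1.96 indicates significance at p < .05 (Byrne, 2006)."""
    return estimate / std_error

def effect_size_label(loading):
    """Square a standardized path coefficient and apply Cohen's (1988)
    benchmarks: < .10 small, around .30 medium, > .50 large."""
    r2 = loading ** 2
    if r2 > 0.50:
        return r2, "large"
    if r2 >= 0.10:
        return r2, "medium"
    return r2, "small"

# A standardized loading of .91 implies a large effect (R2 of about .83):
print(effect_size_label(0.91)[1])  # large
# A hypothetical estimate of 0.45 with SE = 0.20 is significant:
print(abs(z_ratio(0.45, 0.20)) > 1.96)  # True
```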
RESULTS
Preliminary Analyses
Preliminary analyses were conducted to examine the scores on the BRIEF-Parent for
missing values and outliers. A total of 59 cases were identified as missing one or more responses; the missing data were not imputed because the ratings reflect an actual condition rather than an attitude. Thus, these cases were deleted listwise, reducing the sample
size from 430 to 371. The Mahalanobis distance test (Tabachnick & Fidell, 2001) was
conducted and three extreme multivariate outliers (p < .01) were identified. Removing these
cases and re-running the primary analyses without them did not substantially alter the findings.
As a result, all analyses included these cases. The final sample used for statistical analysis was
371.
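A Mahalanobis-distance screen of the kind described above can be sketched with NumPy. The data and the chi-square critical value in the example (9.210, i.e., p < .01 for two variables rather than for the full set of BRIEF scales) are fabricated for illustration, and the function name is ours:

```python
import numpy as np

def mahalanobis_outliers(data, crit):
    """Flag rows whose squared Mahalanobis distance from the centroid
    exceeds `crit`, the chi-square critical value with df equal to the
    number of variables."""
    x = np.asarray(data, dtype=float)
    diff = x - x.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return [i for i, d in enumerate(d2) if d > crit]

# Fabricated data: 29 unremarkable bivariate points plus one extreme
# case; 9.210 is the p < .01 chi-square critical value for df = 2.
points = [(i % 5, i // 5) for i in range(29)] + [(50, 50)]
print(mahalanobis_outliers(points, 9.210))  # [29]
```

Flagged cases can then be temporarily removed and the analyses re-run, as was done here, to check whether the outliers materially change the findings.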
Descriptive Statistics
Descriptive statistics of the BRIEF scale scores (mean, standard deviation, skewness, and
kurtosis values, reliability estimates, and correlations) are presented in Table 3. Assumptions for
parametric statistics were tested (linearity and normality; Kline, 2006) for the BRIEF scores.
Visual examination indicated that the scores on the Inhibit scale had a slightly positive skew,
whereas the scores on the Organization of Materials scale had a slightly negative skew. Scores
on the Self-Monitor scale were marginally platykurtic, while the scores on the Initiate scale
appeared to be slightly leptokurtic. Despite these minor variations, the scores of the scales
approximated a normal distribution, using the guidelines of less than 2 for skew and less than 7
for kurtosis (Curran, West, & Finch, 1996). Linearity, inspected visually through scatterplots of
the BRIEF scale scores, was determined to be acceptable.
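The skew and kurtosis screening can be sketched with the common moment-based definitions (EQS and SPSS apply small bias corrections, so their values may differ slightly; the function name and data are ours):

```python
def skew_kurtosis(scores):
    """Moment-based skewness (m3 / m2**1.5) and excess kurtosis
    (m4 / m2**2 - 3), the quantities screened against the |skew| < 2
    and |kurtosis| < 7 guidelines of Curran et al. (1996)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# A symmetric, flat distribution: zero skew and negative (platykurtic)
# excess kurtosis, both well inside the guidelines.
skew, kurt = skew_kurtosis([1, 2, 3, 4, 5])
print(round(skew, 3), round(kurt, 3))  # 0.0 -1.3
```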
Table 3
Descriptive Statistics of Raw Scale Scores on the BRIEF-Parent Form

Scale                           1     2     3     4     5     6     7     8     9     10
1. Inhibit                     ---
2. Shift                       .60   ---
3. Emotional Control           .69   .73   ---
4. Initiate                    .57   .64   .54   ---
5. WM                          .60   .63   .50   .72   ---
6. Plan/Organize               .53   .58   .48   .77   .81   ---
7. Organization of Materials   .50   .47   .46   .60   .62   .68   ---
8. Monitor                     .70   .63   .59   .72   .71   .75   .57   ---
9. Self-Monitor                .72   .61   .64   .65   .58   .59   .44   .88   ---
10. Task-Monitor               .47   .45   .36   .59   .64   .71   .55   .83   .47   ---
M                            16.31 14.49 18.43 16.05 20.35 25.25 13.04 16.43  7.91  8.53
SD                            5.56  3.93  5.47  3.73  5.54  6.22  3.74  4.08  2.55  2.21
Range                        10-30  8-24 10-30  8-24 10-30 12-36  6-18  8-24  4-12  4-12
Skew                           .66   .15   .26  -.02  -.17  -.19  -.29  -.11   .01  -.15
Kurtosis                      -.56  -.83  -.93  -.66  -.85  -.86 -1.05  -.83 -1.13  -.81
α                              .93   .86   .92   .80   .92   .91   .90   .85   .86   .78

Note. N = 371. All correlations were statistically significant at .01.
All 45 correlations were statistically significant at .01 and ranged from .36 (Task-Monitor
and Emotional Control) to .88 (Self-Monitor and Monitor; Mdn = .60). Plan/Organize was
highly correlated with several other scales (Initiate =.77; Working Memory = .81; Monitor = .75;
Task-Monitor = .71). Monitor was highly correlated with Inhibit (.70), Initiate (.72) and
Working Memory (.71). Other notably high correlations existed between Emotional Control and
Shift (.73); Working Memory and Initiate (.72); and Self-Monitor and Inhibit (.72). The Monitor
scale was also highly correlated with Self-Monitor (.88) and Task-Monitor (.83), but these patterns were expected because the latter two scales make up the former scale. Reliability
(Cronbach’s alpha) of the scores ranged from .79 (Task-Monitor) to .93 (Inhibit; Mdn = .88) for
the BRIEF scales. The reliability estimate for the scores on the Task-Monitor scale was slightly
under .80, which is considered the minimum level for high-stakes decisions (Sattler, 2001).
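Cronbach's alpha, the reliability estimate reported above, can be computed from item scores as follows (an illustrative sketch; the function name and the toy data are ours):

```python
def cronbach_alpha(items):
    """Cronbach's alpha from aligned item-score lists (one list per
    item): k / (k - 1) * (1 - sum of item variances / variance of
    the total scores), using sample (n - 1) variances."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# Two perfectly parallel items yield an alpha of (approximately) 1.0:
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```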
Confirmatory Factor Analyses
CFAs (maximum likelihood extraction) of the BRIEF-Parent form were conducted using
EQS v. 6.2 software on a covariance matrix computed from raw scale scores. A CFA using
item-level data was not conducted due to the low ratio (4:1) between sample size (N = 371) and
number of items (86; Barrett & Kline, 1981).
Criteria. As summarized on pages 68-77, models were considered a good fit based on
the following criteria: (a) the change (drop) in chi-square was statistically significant in
comparison to the null hypothesis or competing models (p < .01); (b) the fit indices of CFI and
NNFI were equal or greater than .90 (Marsh & Grayson, 1995); (c) RMSEA was equal or less
than .08 (Browne & Cudeck, 1993); (d) SRMR was equal or less than .08 (Hu & Bentler, 1999);
(e) average off-diagonal standardized residuals (AODSR) was less than .05; and (f) the largest
standardized residuals were less than |1.96| (Byrne, 2006). Each model was compared to the others to ascertain the model with the better fit to the data. Because an alternative-models approach was used, post-hoc model re-specifications were not conducted (Jöreskog, 1993), although possible reasons for misfit of the data to a model were examined.
Models. As described in the Method section (see pp. 66-78), seven models were tested.
The 2Original-8 model, the first posed by Gioia et al. (2000), was used as a basis of comparison
in relation to all other tested models. The six alternative models were based on prior research
(Donders et al., 2010; Egeland & Fallmyr, 2010; Gioia et al., 2002; Huizinga & Smidts, 2011;
Hulac, 2008). Three models were based on eight scales and the remaining four models were
based on nine scales, resulting in two sets of nested models. The nested models for the eight-
scale version were (a) the Unity-8 Model, (b) the 2Original-8 Model, and (c) the 2Donders-8
Model. Models for the nine-scale version were based on the subdivision of the Monitor scale
into two separate scales: (a) Self-Monitor and (b) Task-Monitor. As indicated earlier, this subdivision resulted in the reconfiguration of the nine scales into four nested models: (a) the
Unity-9 model, (b) the 2Monitor-9 Model, (c) the 3Monitor-9 Model, and (d) the 4Monitor-9
Model. Eight-scale models were considered non-nested with the nine-scale models.
Eight-scale models. A summary of the goodness-of-fit indices for the eight-scale models
of the BRIEF-Parent scale is displayed in Table 4. Across all fit indices except one, the 2Original-8 Model fitted the data better than the other two eight-scale models. The 2Original-8 Model had a χ2 value of 173.478, which was statistically lower than that of the Unity-8 Model (316.836) or the 2Donders-8 Model (214.787). All eight-scale model χ2 values were statistically significant at .05 relative to the Null. Also, the 2Original-8 Model had consistently better fit values (≥ .90) across the incremental fit indices (CFI and NNFI) in comparison to the variability found with the other two eight-scale models (Unity-8 = .819 to .870; 2Donders-8 = .874 to .915). Furthermore, the
Table 4
Summary of the Fit Indices of CFA (ML Extraction) Models on the BRIEF-Parent Form Scale Scores for a Mixed Disability Sample

Model         χ2         df  χ2diff     NNFI  CFI   SRMR  RMSEA  90% CI        AODSR
Eight-Scale
Null          2320.048*  28  --         --    --    --    --     --            --
Unity-8       316.836*   20  2003.212*  .819  .870  .066  .200   (.181, .220)  .050
2Original-8   173.478*   19  143.358*   .901  .933  .049  .148   (.128, .168)  .042
2Donders-8    214.787*   19  102.049*   .874  .915  .057  .167   (.147, .187)  .041
Nine-Scale
Null          2501.886*  36  --         --    --    --    --     --            --
Unity-9       414.935*   27  2086.951*  .790  .843  .076  .197   (.180, .214)  .064
2Monitor-9    165.279*   26  249.656*   .922  .944  .044  .120   (.103, .138)  .035
3Monitor-9    131.689*   24  33.59*     .934  .956  .041  .110   (.092, .128)  .031
4Monitor-9    130.049*   21  1.640      .924  .956  .039  .118   (.099, .138)  .031

Note. N = 371. ML = maximum likelihood extraction; Unity-8 = one-factor, eight-scale model; 2Original-8 = two-factor, eight-scale Gioia model; 2Donders-8 = two-factor, eight-scale Donders model; Unity-9 = one-factor, nine-scale model; 2Monitor-9 = two-factor, nine-scale model; 3Monitor-9 = three-factor, nine-scale model; 4Monitor-9 = four-factor, nine-scale model; χ2 = chi-square; df = degrees of freedom; χ2diff = chi-square difference; NNFI = Bentler-Bonett nonnormed fit index; CFI = comparative fit index; SRMR = standardized root-mean-square residual; RMSEA = root mean square error of approximation; CI = confidence interval; AODSR = average off-diagonal standardized residual.
*p < .05.
2Original-8 Model had a lower SRMR value (.049) in comparison to the values for the other two
8-scale models (Unity-8 Model = .066; 2Donders-8 Model = .057). The 2Donders-8 Model had
a negligibly lower score for the average off-diagonal standardized residual (AODSR; .041) than
the 2Original-8 Model (.042; .001 difference). All eight-scale models evidenced misfit in that
the RMSEA was greater than .08 (Browne & Cudeck, 1993).
A closer examination of the CFA findings for the 2Original-8 Model indicated that all
equations for the parameter estimates were statistically significant at .05. Structure coefficients
ranged from .72 (Organization of Materials and MI) to .91 (Plan/Organize and MI; Mdn = .85).
Factor intercorrelation between the BRI and MI of the 2Original-8 Model was .794. Effect sizes (R2, as provided by EQS) for the 2Original-8 Model were all "large" (> .50; Cohen, 1988); the respective factor accounted for 52% (Organization of Materials) to 82% (Plan/Organize) of the variance in each scale. Structure coefficients, effect sizes, and error terms are presented in Table 5 for the 2Original-8 Model.
The nested eight-scale models were compared to one another through the incremental change in χ2. The incremental fit of the Unity-8 Model differed from the Null [χ2(8) = 2003.212, p < .001]. The 2Original-8 Model also differed from the Unity-8 Model [χ2(1) = 143.358, p < .001], as did the 2Donders-8 Model from the Unity-8 Model [χ2(1) = 102.049, p < .001]. Of the two-factor models, the 2Original-8 Model had the higher incremental change relative to the more restricted model (i.e., Unity-8).
Nine-scale models. A summary of the goodness-of-fit indices for the nine-scale models is also displayed in Table 4. As expected, the Null Model for the nine-scale data was not supported in that its chi-square was statistically significantly larger than that of any of the nine-scale models. The 4Monitor-9 Model had a χ2 of 130.049, which was lower than the Unity-9 (χ2 = 414.935),
Table 5
Structure Coefficients for BRIEF-Parent Scales for Mixed Disability Sample Arranged by Model (Maximum Likelihood Extraction)

Entries are structure coefficients, with error terms in parentheses and effect sizes in brackets.

Model 1: Unity-8 Model
  Factor 1 - GEF: Inhibit .79 (.62) [.62]; Shift .84 (.55) [.70]; ECO .84 (.54) [.71]; Initiate .85 (.52) [.73]; WM .87 (.49) [.76]; P/O .91 (.42) [.82]; ORG .72 (.69) [.52]; Monitor .84 (.54) [.71]

Model 2: 2Original-8 Model
  Factor 1 - BRI: Inhibit .79 (.62) [.62]; Shift .84 (.55) [.70]; ECO .84 (.54) [.71]
  Factor 2 - MI: Initiate .85 (.52) [.73]; WM .87 (.49) [.76]; P/O .91 (.42) [.82]; ORG .72 (.69) [.52]; Monitor .84 (.54) [.71]

Model 3: 2Donders-8 Model
  Factor 1 - BRI: Shift .91 (.42) [.82]; ECO .80 (.60) [.65]
  Factor 2 - MI: Initiate .85 (.53) [.72]; WM .87 (.50) [.75]; P/O .89 (.46) [.79]; ORG .72 (.70) [.51]; Monitor .85 (.52) [.73]; Inhibit .70 (.71) [.50]

Model 4: Unity-9 Model
  Factor 1 - GEF: Inhibit .72 (.69) [.52]; Shift .74 (.67) [.55]; ECO .67 (.74) [.45]; S-Monitor .74 (.68) [.54]; Initiate .85 (.53) [.72]; WM .86 (.51) [.74]; P/O .88 (.48) [.77]; ORG .71 (.70) [.51]; T-Monitor .71 (.70) [.51]

Model 5: 2Monitor-9 Model
  Factor 1 - BRI: Inhibit .82 (.57) [.68]; Shift .80 (.60) [.64]; ECO .82 (.57) [.68]; S-Monitor .82 (.58) [.67]
  Factor 2 - MI: Initiate .84 (.54) [.71]; WM .88 (.48) [.77]; P/O .92 (.38) [.85]; ORG .73 (.68) [.53]; T-Monitor .74 (.67) [.55]

Model 6: 3Monitor-9 Model
  Factor 1 - BRI: Inhibit .85 (.53) [.72]; S-Monitor .85 (.53) [.72]
  Factor 2 - ERI: Shift .86 (.51) [.74]; ECO .85 (.53) [.72]
  Factor 3 - MI: Initiate .84 (.54) [.71]; WM .88 (.49) [.77]; P/O .92 (.38) [.85]; ORG .73 (.69) [.53]; T-Monitor .74 (.67) [.55]

Model 7: 4Monitor-9 Model
  Factor 1 - BRI: Inhibit .85 (.53) [.72]; S-Monitor .85 (.53) [.72]
  Factor 2 - ERI: Shift .85 (.53) [.72]; ECO .86 (.51) [.74]
  Factor 3 - Int MI: Initiate .84 (.54) [.71]; WM .88 (.49) [.77]; P/O .92 (.38) [.85]
  Factor 4 - Ext MI: ORG .73 (.68) [.53]; T-Monitor .75 (.67) [.56]

Note. N = 371; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG = Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF = General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index; ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor, nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-scale model; 4Monitor-9 = Four-factor, nine-scale model. Effect size = R2.
2Monitor-9 (χ2 = 165.279), but only slightly lower than the 3Monitor-9 Model (χ2 = 131.689).
Three of the four nine-scale models demonstrated a strong fit to the data (i.e., 2Monitor-9,
3Monitor-9, 4Monitor-9), with the fit indices of NNFI and CFI ranging from .922 to .956.
AODSRs (.031 - .035) and SRMRs (.039 - .044) were less than .05. The Unity-9 Model
demonstrated poor fit to the data across all fit indices except for the SRMR (.076), but its value
was the highest of the nine-scale models. All four nine-scale models evidenced misfit in that the RMSEAs were greater than .08. However, the viable nine-scale models were the two-, three-, and four-factor ones.
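Under their standard definitions, the CFI and NNFI values in Table 4 can be reproduced (to rounding) from the reported chi-square values; the functions below are an illustrative sketch with names of our choosing:

```python
def cfi(chi2, df, chi2_null, df_null):
    """Comparative fit index under its standard definition."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 - d_model / d_null

def nnfi(chi2, df, chi2_null, df_null):
    """Nonnormed (Tucker-Lewis) fit index under its standard definition."""
    r_null = chi2_null / df_null
    r_model = chi2 / df
    return (r_null - r_model) / (r_null - 1.0)

# Reproduce the Unity-9 row of Table 4 (null model: 2501.886, df = 36):
print(f"{cfi(414.935, 27, 2501.886, 36):.3f}")   # 0.843
print(f"{nnfi(414.935, 27, 2501.886, 36):.3f}")  # 0.790
```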
An examination of the viable nine-scale models showed that all equations for the parameter
estimates were statistically significant at .05. Structure coefficients for the 2Monitor-9 Model
ranged from .73 (Organization of Materials and MI) to .92 (Plan/Organize and MI; Mdn = .82).
Factor intercorrelation between the BRI and MI for the 2Monitor-9 Model was .771. For the
3Monitor-9 Model, structure coefficients were similar to the 2Monitor-9 and ranged from .73
(Organization of Materials and MI) to .92 (Plan/Organize and MI; Mdn = .85). Intercorrelations
of the factors for the 3Monitor-9 Model were moderate to high: BRI and ERI = .879; BRI and
MI = .763; and ERI and MI = .716. For the 4Monitor-9 Model, again, structure coefficients were
similar to the other two models and ranged from .73 (Organization of Materials and Ext MI) to
.92 (Plan/Organize and Internal MI; Mdn = .85). Factor intercorrelations for the 4Monitor-9
Model ranged from .683 to .997 and were as follows: BRI and ERI = .878; BRI and Internal MI
= .767; BRI and External MI = .750; ERI and Internal MI =.725; ERI and External MI = .683;
and Internal MI and External MI = .997. Singularity was observed between the two
Metacognitive scales, which diminished the viability of the four-factor model.
Both the 3Monitor-9 and 4Monitor-9 Models consistently showed slightly stronger fit to the data than the 2Monitor-9 Model. All four nested nine-scale models were compared to one another based on the change in chi-square from the most to the least restrictive models. The Unity-9 Model fit the data poorly but differed from the Null [χ2(9) = 2086.951, p < .001]. The 2Monitor-9 Model also differed from the Unity-9 Model [χ2(1) = 249.656, p < .001]. The incremental fit of the 3Monitor-9 Model differed from that of the 2Monitor-9 Model [χ2(2) = 33.59, p < .001]; however, the incremental fit of the 4Monitor-9 Model did not differ from that of the 3Monitor-9 Model [χ2(3) = 1.640, p > .05]. This finding indicated that adding another factor to the three-factor model did not improve the fit to the data.
The RMSEA value was outside of the recommended range for all models but was slightly lower for the 3Monitor-9 Model than for the other models, including the 4Monitor-9 Model. Additionally, singularity was found between the Internal MI factor and the External MI factor (.997) in the 4Monitor-9 Model. Given the similar fit indices (i.e., NNFI, CFI, SRMR, RMSEA, and AODSR) of the 3Monitor-9 and 4Monitor-9 Models, prior research (Egeland & Fallmyr, 2010; Gioia et al., 2002), incremental χ2 differences, and the parsimony of the three-factor model over the four, the 3Monitor-9 Model was selected as having
the better fit for the nine-scale version of the BRIEF-Parent scale. The factor structure, including
structure coefficients and error terms, for the 3Monitor-9 Model is displayed in Figure 8.
Eight- versus nine-scale models. To compare the goodness-of-fit values of the non-
nested models (8- versus 9-scales), two approaches were used. First, the viable models were, in
general, compared to each other based on the fit indices. Second, the models were compared on
the value of the Expected Cross-Validation Index (ECVI; Browne & Cudeck, 1989). This value
is used to express overall error between the population covariance and the model fitted to the
sample. A lower value is considered ideal; the index is not informative in itself but only in comparison with other models (Diamantopoulos & Siguaw, 2000). As discussed, the model with the
best fit among the eight-scale models was the 2Original-8 Model. Even though three out of the
four nine-scale models (2Monitor-9, 3Monitor-9, 4Monitor-9) had a stronger fit to the data than the 2Original-8 Model, the 3Monitor-9 Model was considered to fit the data best. Thus, comparisons
were made between the 2Original-8 Model and 3Monitor-9 Model.
Across the fit indices, the 3Monitor-9 Model had higher values on the NNFI and CFI (.023-.033 difference) and, as desired, slightly lower values (.008-.038 difference) on the
Figure 8. Standardized coefficients derived by confirmatory factor analysis (maximum likelihood) for the three-factor, nine-scale (3Monitor-9) Model. Effect sizes are squared standardized structure coefficients. BRI = Behavioral Regulation Index; ERI = Emotional Regulation Index; MI = Metacognition Index. [Path diagram: Inhibit and Self-Monitor load on BRI; Shift and Emotional Control load on ERI; Initiate, Working Memory, Plan/Organize, Organization of Materials, and Task-Monitor load on MI; structure coefficients and error terms match those reported for the 3Monitor-9 Model in Table 5, with factor intercorrelations of .88 (BRI-ERI), .76 (BRI-MI), and .72 (ERI-MI).]
SRMR, RMSEA, and AODSR in comparison to the 2Original-8 Model. Additionally, the
3Monitor-9 Model had a lower ECVI value (.453) in comparison to the 2Original-8 Model (.550), indicating a better fit.
Subsamples. The sample had some unique demographic features in that 74% of participants were between the ages of 16 and 18 and were OVR referrals. Also, 95% of the participants were Caucasian, and 72% of the raters were mothers. Because of the disproportionate sample sizes reflecting these demographic features, it was not possible to run separate CFAs to determine whether the fit of the data to the model depended on these features. However, participants of the minority subsamples (i.e., non-OVR referrals, racial/ethnic minority participants, and non-mother raters) were temporarily removed from analysis, and CFAs were re-run only on the scores of the majority subsamples (i.e., OVR only, Caucasians only, and mothers only).
OVR subsample. The OVR subsample consisted of 264 participants. All eight-scale model χ2 values were statistically significant at .05. The 2Original-8 Model for this subsample had a χ2 value of 124.980, which was statistically lower than that of the Unity-8 Model (223.707) or the 2Donders-8 Model (138.838). Also, the 2Original-8 Model had slightly better fit values (≥ .90) across the incremental fit indices (CFI and NNFI) in comparison to the Unity-8 Model (.819 to .870, respectively) and the 2Donders-8 Model (.888 to .915, respectively). All three eight-scale models for the OVR subsample had the same CFI as the full sample; two of the three (2Original-8 and Unity-8) had the same NNFI as well. The 2Donders-8 Model varied slightly, with a difference in the NNFI of .014 in relation to the full sample. RMSEA values for all three eight-scale models were consistently large (> .14), with differences from the full sample not exceeding .012. Similar to the findings for the full sample, the 2Original-8 Model had the best fit compared to the Null Model and the two other eight-scale models for the OVR subsample. The 2Original-8 Model also had a higher incremental change [χ2(1) = 98.727, p < .001] relative to the more restricted model.
In terms of nine-scale models, all χ2 values were statistically significant relative to the Null Model. Three out of the four models (2Monitor-9, 3Monitor-9, 4Monitor-9) demonstrated a strong fit to the data, with the fit indices of NNFI and CFI ranging from .925 to .956. AODSRs (.032-.037) and SRMRs (.040-.048) for the three nine-scale models were less than .05. Again, the Unity-9 Model had a poor fit across all indices in comparison to the other nine-scale models. The difference between the OVR subsample and the full sample on the fit indices for all four models was small (≤ .011). Similar to the full sample, all nine-scale models evidenced misfit in that the RMSEAs were greater than .08 in the OVR subsample. All equations for the parameter estimates were statistically significant at .05. Effect sizes (R2) were typically large (> .50; Cohen, 1988), with the exception of two values in the Unity-9 Model, which showed medium values (Emotional Control = .40 and Task-Monitor = .48). The effect sizes of the remaining nine-scale models ranged from .51 (Task-Monitor) to .87 (Plan/Organize). Incremental χ2 tests showed statistically significant differences between the more and less restrictive models, with the exception of the 4Monitor-9 to 3Monitor-9 comparison, which showed a nonsignificant change.
For the same reasons indicated for the full sample (good fit indices, incremental χ2 test results, singularity of the Internal and External MI factors, and parsimony), the best nine-scale model was again determined to be the 3Monitor-9 Model for the OVR subsample. The 3Monitor-9 Model consistently had a better fit to the data than the 2Original-8 Model for the OVR subsample in terms of the NNFI, CFI, SRMR, AODSR, and RMSEA values. In addition, the ECVI of the 3Monitor-9 Model (.467) was lower than that of the 2Original-8 Model (.589).
Overall, results mirrored the findings reported above for the full sample. A summary of the fit
indices for all models for the OVR sample is displayed in Table 6. A complete overview of
structure coefficients, effect sizes, and error terms for the OVR subsample is contained in
Appendix F.
Caucasian subsample. The sample was composed primarily of Caucasian participants (n = 354; 95.4%). For the eight-scale models, all χ2 values were statistically significant relative to the Null Model. A comparison of each Caucasian subsample value to its respective full-sample value for the three eight-scale models indicated less than .01 difference on all fit indices. Of the eight-scale models for the Caucasian subsample, the 2Original-8 Model fitted the data best (NNFI = .896; CFI = .929); SRMR was .052, and RMSEA was .152. In terms of incremental fit, again, the 2Original-8 Model differed from the Unity-8 Model (χ2(1) = 143.281, p < .001) more so than did the 2Donders-8 Model (χ2(1) = 100.253, p < .001).
In terms of nine-scale models, all χ2 values were statistically significant relative to the Null Model. Three out of the four models (2Monitor-9, 3Monitor-9, 4Monitor-9) demonstrated a strong fit to the data; the fit indices of NNFI and CFI ranged from .916 to .952. Values for the measures of residual were low and less than .05: AODSRs (.032-.037) and SRMRs (.042-.047). There was little difference on the fit indices among the nine-scale models between the Caucasian subsample and the full sample; no change exceeded .008. However, as with the other samples, RMSEAs were greater than .08. Like the full sample, the 3Monitor-9 Model for the Caucasian subsample had a slightly lower RMSEA value but almost identical values on the other fit indices as the 4Monitor-9 Model. All equations for the parameter estimates for the nine-scale models were statistically significant at .05. Effect sizes (R2) were all considered "large" (> .50; Cohen, 1988). The incremental χ2 values of the nine-scale models were statistically
Table 6
Summary of Fit Indices of CFA (ML) Models of the BRIEF-Parent Form for the OVR Sample

Model         χ2         df  χ2diff     NNFI  CFI   SRMR  RMSEA  90% CI        AODSR
8-scale
Null          1606.296   28  --         --    --    --    --     --            --
Unity-8       223.707*   20  1382.589*  .819  .870  .070  .197   (.173, .220)  .053
2Original-8   124.980*   19  98.727*    .901  .933  .057  .146   (.121, .170)  .046
2Donders-8    138.838*   19  84.869*    .888  .915  .058  .155   (.131, .179)  .045
9-scale
Null          1736.807   36  --         --    --    --    --     --            --
Unity-9       298.093*   27  1438.714*  .787  .841  .081  .195   (.175, .215)  .067
2Monitor-9    118.593*   26  179.500*   .925  .944  .048  .116   (.095, .137)  .037
3Monitor-9    86.858*    24  31.735*    .945  .956  .042  .100   (.077, .122)  .032
4Monitor-9    85.540*    21  1.318      .935  .956  .040  .108   (.084, .132)  .032

Note. N = 264. ML = maximum likelihood extraction; Unity-8 = one-factor, eight-scale model; 2Original-8 = two-factor, eight-scale Gioia model; 2Donders-8 = two-factor, eight-scale Donders model; Unity-9 = one-factor, nine-scale model; 2Monitor-9 = two-factor, nine-scale model; 3Monitor-9 = three-factor, nine-scale model; 4Monitor-9 = four-factor, nine-scale model; χ2 = chi-square; df = degrees of freedom; χ2diff = chi-square difference; NNFI = Bentler-Bonett nonnormed fit index; CFI = comparative fit index; SRMR = standardized root-mean-square residual; RMSEA = root mean square error of approximation; CI = confidence interval; AODSR = average off-diagonal standardized residual.
*p < .05.
significant for each comparison between the more and less restricted models, with the exception of the 4Monitor-9 Model to the 3Monitor-9 Model, χ2(3) = 1.452, p > .05. For the Caucasian subsample, the 3Monitor-9 Model was selected as the best model of the nine-scale models.
Factor intercorrelations for the 3Monitor-9 Model ranged from .700 to .881, which were
similar to the full sample. In comparing the non-nested 2Original-8 and 3Monitor-9 Models, the
ECVI was lower in the 3Monitor-9 Model (.492 vs. .577), indicating a better fit. Re-running the
data without racial/ethnic minority or unknown race participants did not substantially change the
model fit to the data. A summary of the fit indices for all models for the Caucasian subsample is
displayed in Table 7. A complete overview of structure coefficients, effect sizes, and error terms
is contained in Appendix F.
Mother subsample. Mothers made up 71.9% of the sample (n = 267). For the eight-scale models, all χ2 values were statistically significant relative to the Null Model. The 2Original-8 Model had the best fit to the data (NNFI = .892; CFI = .927) in comparison to the other eight-scale models. A comparison of the mother subsample to the full sample indicated that the difference between each pair of corresponding fit-index values for the eight-scale models was equal to or less than .014. RMSEAs exceeded .08, but AODSR and SRMR values were acceptable.
For the nine-scale models, with the exception of the Unity-9 Model, the models had adequate fit values (> .90) across the incremental fit indices (CFI and NNFI); SRMR values were between .041 and .048. RMSEA values were greater than .08, ranging between .111 and .121. Incremental χ2 tests showed statistically significant differences between the more and less restricted models, with the exception of the 4Monitor-9 and 3Monitor-9 Models. The 3Monitor-9 Model was, again, selected as having the best fit of the nine-scale models. Factor
Table 7
Summary of Fit Indices of CFA (ML) Models for the BRIEF-Parent Form Based on the Caucasian Participants
χ2
df χ2
diff NNFI CFI SRMR RMSEA 90% CI AODSR
8-scale
Null 2217.553 28 -- -- -- -- -- --
Unity-8 316.879* 20 1900.674* .810 .864 .069 .205 (.185, .225) .052
2Original-8 173.598* 19 143.281* .896 .929 .052 .152 (.131, .172) .043
2Donders-8 216.626* 19 100.253* .867 .910 .059 .172 (.151, .192) .043
9-scale
Null 2392.215 36 -- -- -- -- -- --
Unity-9 414.422* 27 1977.793* .781 .836 .079 .202 (.184, .219) .066
2Monitor-9 169.606* 26 244.816* .916 .939 .047 .125 (.107, .143) .037
3Monitor-9 137.812* 24 31.794* .928 .952 .043 .116 (.097, .135) .032
4Monitor-9 136.360* 21 1.452 .916 .951 .042 .125 (.105, .145) .032
Note. N = 354. ML = Maximum likelihood extraction; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-scale
Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor, nine-scale model; 2Monitor-9 = Two-
factor, nine-scale model; 3Monitor-9 = Three-factor, nine-scale model; 4Monitor-9 = Four-factor, nine-scale model; χ² = chi-square;
df = degrees of freedom; χ²diff = chi-square difference; NNFI = Bentler-Bonett non-normed fit index; CFI = comparative fit index;
SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; CI = confidence interval;
AODSR = average off-diagonal standardized residual. *p < .05.
intercorrelations were similar to those of the full sample for the 3Monitor-9 and ranged from .681
to .863. In comparing the non-nested 2Original-8 and 3Monitor-9 Models, the ECVI was lower
in the 3Monitor-9 Model (.522 vs. .630), indicating a better fit. Overall, results were similar to
those obtained from the full sample, meaning that the scores obtained from solely mothers as
raters did not substantially change the model fit to the data. A summary of the fit indices for the
mother subsample is reported in Table 8. Structure coefficients, error terms, and effect sizes for
the mother subsample are contained in Appendix F.
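Both kinds of model comparison used here can be reproduced from the tabled values alone: an incremental χ² test for nested models and the ECVI for non-nested ones. A minimal standard-library sketch; the free-parameter counts below are assumed from a conventional CFA parameterization (factor variances fixed at 1), and ECVI formulas differ slightly across software, so hand-computed values may not match the tabled ones exactly:

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df > x), via the recurrence
    sf(df + 2) = sf(df) + (x/2)^(df/2) * exp(-x/2) / gamma(df/2 + 1)."""
    if df % 2 == 0:
        s, d = math.exp(-x / 2), 2
    else:
        s, d = math.erfc(math.sqrt(x / 2)), 1
    while d < df:
        s += (x / 2) ** (d / 2) * math.exp(-x / 2) / math.gamma(d / 2 + 1)
        d += 2
    return s

def ecvi(chi2, n_free_params, n_cases):
    """Single-sample expected cross-validation index (ML form):
    (chi2 + 2q) / (N - 1); lower is better when comparing
    non-nested models."""
    return (chi2 + 2 * n_free_params) / (n_cases - 1)

# Nested comparisons, mother subsample (Table 8):
# 3Monitor-9 vs. 4Monitor-9 -> the extra parameters of the
# four-factor model are not justified (p > .05).
p_34 = chi2_sf(103.022 - 100.889, 24 - 21)
# 2Monitor-9 vs. 3Monitor-9 -> significant improvement (p < .05).
p_23 = chi2_sf(128.026 - 103.022, 26 - 24)

# Non-nested comparison: 3Monitor-9 (assumed q = 21: 9 loadings +
# 9 error variances + 3 factor covariances) vs. 2Original-8
# (assumed q = 17: 8 + 8 + 1).
better = ecvi(103.022, 21, 267) < ecvi(137.630, 17, 267)  # True
```

With these illustrative parameter counts the 3Monitor-9 Model again comes out ahead, matching the direction of the .522 vs. .630 comparison reported above.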
Table 8
Summary of Fit Indices of CFA (ML) Models of BRIEF-Parent Form Based on the Mothers as Raters
χ² df χ²diff NNFI CFI SRMR RMSEA 90% CI AODSR
8-scale
Null 1649.599 28 -- -- -- -- -- --
Unity-8 245.615* 20 1403.984* .805 .861 .074 .206 (.183, .229) .054
2Original-8 137.630* 19 107.985* .892 .927 .055 .153 (.129, .177) .046
2Donders-8 170.325* 19 75.290* .862 .907 .064 .173 (.149, .197) .045
9-scale
Null 1778.992 36 -- -- -- -- -- --
Unity-9 323.773* 27 1455.219* .773 .830 .087 .203 (.183, .223) .068
2Monitor-9 128.026* 26 195.747* .919 .941 .048 .121 (.101, .142) .039
3Monitor-9 103.022* 24 25.004* .932 .955 .044 .111 (.089, .133) .033
4Monitor-9 100.889* 21 2.133 .921 .954 .041 .120 (.096, .143) .032
Note. N = 267. ML = Maximum likelihood extraction; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor, nine-scale model; 2Monitor-9 =
Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-scale model; 4Monitor-9 = Four-factor, nine-scale model; χ² =
chi-square; df = degrees of freedom; χ²diff = chi-square difference; NNFI = Bentler-Bonett non-normed fit index; CFI =
comparative fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation;
CI = confidence interval; AODSR = average off-diagonal standardized residual. *p < .05.
DISCUSSION
The purpose of the study was to examine whether the eight-scale factor structure of
BRIEF-Parent (Gioia et al., 2000) scores could be replicated in a mixed clinical sample of
school-aged children. This study was unique in that it (a) was independently conducted, (b) had
a sample of US children with mixed clinical diagnoses, and (c) contained an adequately large
sample size for running a scale-level CFA relative to other independent published studies (e.g.,
Egeland & Fallmyr, 2010). Besides testing the original eight-scale version, other models of the
eight scales were tested as well as several models of the nine-scale format. Testing alternative
models is recommended to ensure that a preferred model is not accepted without considering
competing models that could also fit the data just as well if not better (Jöreskog, 1993). Looking
solely at Gioia et al.’s original model, the CFA findings appear to support this structure. Also,
when compared to the other two 8-scale models (Unity-8 and 2Donders-8), the 2Original-8
model appears to fit the data best. However, three of the four 9-scale models (2Monitor-9,
3Monitor-9, 4Monitor-9) demonstrated as strong a fit to the data as the 2Original-8 model and
thus are plausible models in understanding the structure of the BRIEF-Parent. The discussion
will examine potential explanations for the findings of the eight-scale models in contrast to the
nine-scale ones. Furthermore, the findings will be examined in relation to the theoretical premise
and current factor structure of the BRIEF-Parent. Limitations of the study will be presented as
well as implications for practice and future research.
Eight-Scale Models of the BRIEF-Parent
The research question of this study was “will the factor structure of BRIEF-Parent scores
obtained from a mixed clinical sample of school-aged children align with the two-factor, eight-
scale structure originally proposed by the test authors?” Based on prior research (i.e., Donders et
al., 2010; Huizinga & Smidts, 2011; Slick et al., 2006), when compared to a one-factor model,
the 2Original-8 model appears to fit the data best. From a theoretical perspective, this finding
makes sense. The one-factor model has a poor fit because there is no differentiation between the
behavioral and cognitive aspects of executive function. The one-factor model is considered to
align with the theoretical perspective of unity (Baddeley, 1986), which was discussed in the
literature review. In short, the premise of the theory of unity is that all executive processes
combine to constitute an overarching, interconnected supervisory system. This view is generally
considered to be outdated (see Packwood, Hodgetts, & Tremblay, 2011).
However, the 2Original-8 Model and the 2Donders-8 Model contain two factors (BRI
and MI), which delineate between the behavioral and cognitive components of EF. Although
both models represent both components, the configuration of the scales is not the same. On the
2Original-8 Model, the Inhibit scale is on the BRI factor, but on the 2Donders-8 Model, it is on
the MI factor. Sampling may explain the difference in model configuration. Donders et al.’s
(2010) model is based on information gleaned from a
group of children with traumatic brain injury (TBI). Thus, the type of sample may have
informed Donders et al.’s view that the Inhibit scale is a component of the cognitive factor (MI)
instead of solely a behavioral factor (BRI) as Gioia et al. (2000) has proposed. In essence, the
Inhibit scale is considered to measure a cognitive instead of behavioral aspect of impulse control.
However, Donders et al. acknowledge that what is measured by the Inhibit scale is not entirely
clear because it fails to correlate with traditional performance-based measures of inhibitory
control (Bodnar, Prahme, Cutting, Denckla, & Mahone, 2007). Furthermore, Gioia et al. (2000),
also found that the Inhibit scale loaded on the MI factor in the normative sample, but loaded on
the BRI factor in the clinical sample. In the current study, the findings support Gioia et al.’s
original model (2Original-8 Model), not Donders et al.’s. This finding is not unexpected, given
that the sample used was a mixed clinical sample, one similar to Gioia et al.’s. This study is
unique in that no other CFA study has examined the 2Original-8 Model in relation to both the
Unity-8 and 2Donders-8 Models, particularly using a mixed clinical sample of youth. However,
would the fit of the model to the data have been reversed if a TBI sample had been used to
compare the two-factor, eight-scale models? Or would Donders et al.’s finding have been
different if CFAs instead of EFAs had been used?
Thus, it is inconclusive what position the Inhibit scale has in relation to the other
executive function constructs. It is unknown whether the difference in Donders et al.’s finding
could be due to sampling, the specificity of a model to a unique sample, or a better model. Is
Donders et al.’s model unique to children with TBI? Why did the scale also load on the MI in
Gioia et al.’s (2000) factor analysis? Further research using various clinical samples is
warranted to understand Donders et al.’s eight-scale model for the BRIEF-Parent scale.
The current study’s support for the two-factor, eight-scale structure of the BRIEF-Parent
version adds to the research (Batan et al., 2011; Huizinga & Smidts, 2011; Qian & Wang, 2007;
Slick et al., 2006), which also supports an eight-scale model of the BRIEF. Two international
studies have provided support for an eight-scale structure; Batan et al. (2011) conducted an EFA
instead of a CFA, and Qian and Wang (2007) concluded that a CFA showed that the eight-scale
model of the BRIEF was “reasonable.” Unfortunately, abstracts for both of these studies were
the only information available in English; thus, it is unknown whether alternative models were
considered and, if so, what criteria were used in making these conclusions. Slick et al. (2006)
supported the 2Original-8 Model with a small clinical sample of U.S. children diagnosed with
intractable epilepsy, using EFA, but the Monitor scale loaded on both factors. A three-factor
model was apparently tested; however, little information was provided on the solution. Slick et
al. noted that the three-factor solution was “explored” (p. 186), but disregarded as a viable
solution. Huizinga and Smidts (2011) supported an adapted version of the eight-scale structure
using CFA on item-level data. The model was run twice: the first run indicated poor fit, so three
parameters were freely estimated and the model was re-run to improve fit. Because
this revised version of the model was within recommended standards, the authors deemed further
investigation (i.e., testing alternative models) unnecessary. However, the authors used a Dutch
version with a different number of items from the original scale, making generalization difficult.
In summary, studies have supported some version of a two-factor, eight-scale model.
These studies have varied in a number of characteristics: (a) small sample size (N = 80-100;
Donders et al., 2010; Slick et al., 2006), (b) sample with specific clinical diagnosis, such as TBI
(Donders et al., 2010) or intractable epilepsy (Slick et al., 2006), or (c) translated versions with a
different number of items than the original BRIEF-Parent (Batan et al., 2011; Huizinga &
Smidts, 2011; Qian & Wang, 2007). Despite differences and limitations, researchers have
provided full or partial support for Gioia et al.’s eight-scale model. However, factor analytic
studies that supported a version of the two-factor, eight-scale structure (i.e., Donders et al., 2010;
Huizinga & Smidts, 2011; Slick et al., 2006) did not test any of the nine-scale versions of the
scale as alternative models. Gioia et al. (2002) compared only nine-scale versions to one another
with no eight-scale versions serving as alternative models. In contrast, Egeland and Fallmyr
(2010) and the current study examined both eight- and nine-scale versions. The current study
examined a four-factor, nine-scale model whereas Egeland and Fallmyr did not.
Nine-Scale Models of the BRIEF-Parent
Despite limited research on the original eight-scale structure of the BRIEF-Parent scale,
research on alternative versions of the instrument has grown. Following the release of the
BRIEF in 2000, Gioia and Isquith (2002) posited that monitoring one’s own problem-solving is
distinct from monitoring one’s social behavior and thus should be examined in the context
of the BRIEF. Gioia and Isquith proposed re-examining the eight-item Monitor scale of the
BRIEF by dividing it into two 4-item scales: (a) monitoring of task-related activities (Task-
Monitor scale), and (b) monitoring of personal behavior activities (Self-Monitor scale). This
structure was considered theoretically viable due to the increasingly prevalent research that has
supported a model of two distinct emotional and attentional components of the brain (Dolcos &
McCarthy, 2006). Furthermore, there has been evidence that the BRIEF-Parent may be
improved by differentiating the Monitor scale. For example, Slick et al. (2006) reported an EFA
that supported a two-factor, eight-scale solution, but also showed that the Monitor scale loaded
on both the BRI and MI. Other clinical studies (Gilotty et al., 2002; McCandless & O’Laughlin,
2007) have shown that the Monitor scale correlated highly with both the BRI and MI factors.
Egeland and Fallmyr (2010) have raised concerns about the viability of the nine-scale
format. They contend that the reliability estimates of the scores for both the four-item Self-
Monitor and four-item Task-Monitor scales may not be adequately supported due to the small
number of items on each scale relative to the original eight-item Monitor scale. However, this
concern was not an issue in the current study. All estimates of reliability for the two Monitor
scales met or were close to the .80 cutoff. Gioia and Isquith (2002) also found that the reliability
estimates of the scores for both scales have been greater than .70, but the issue is whether these
estimates are adequate when the scales are used in making high stakes decisions.
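Egeland and Fallmyr’s concern can be quantified with the Spearman-Brown prophecy formula, which predicts how score reliability changes when a scale is shortened. A minimal sketch; the reliability value below is illustrative, not taken from the BRIEF manual:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale's length is multiplied by
    length_factor (e.g., 0.5 = halving the number of items)."""
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)

# If an eight-item Monitor scale had a reliability of .80, each
# four-item half would be predicted to have a reliability near .67,
# below the .80 cutoff noted above.
print(round(spearman_brown(0.80, 0.5), 2))  # 0.67
```

This is why splitting the eight-item Monitor scale into two four-item scales would be expected, all else equal, to lower the reliability of each resulting score.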
The focus on nine-scale models goes beyond the division of the Monitor scale to the
expansion of factors through re-configuring the placement of pre-existing scales.
Such revisions point to how views on executive function are changing. Both three-factor and
four-factor 9-scale models have been proposed and tested. The three-factor model highlights that
there is a distinction between emotional regulation and inhibitory behavior control, whereas the
four-factor model parses out emotional regulation as well as differentiates between internal and
external metacognition.
In the current study, these nine-scale models, as well as a one-factor model, were tested;
three of the four 9-scale models showed a strong fit to the data (2Monitor-9, 3Monitor-9,
4Monitor-9), meeting the criteria for goodness of fit, with the exception of RMSEA. The two-
factor, nine-scale (2Monitor-9) model seemed to fit the data, but when compared to the three-
factor (3Monitor-9) and four-factor (4Monitor-9) nine-scale models, the 2Monitor-9 model was
not as viable a solution. Along with the strong statistical evidence that supported the three-
and four-factor solutions, the factor structures seemed to align with the theoretical views of EF.
In the three-factor model, the ERI factor (Emotional Control and Shift scales) was parsed out
from the “original” BRI factor and a “new” BRI factor was created (Inhibit and Self-Monitor
scales). This delineation made practical sense because the “new” BRI was then made up of
scales that measured inhibitory behavior whereas the ERI was comprised of scales that measured
internalized emotional control. The MI factor remained intact in this model and was separated
from behavioral and emotional regulation. As Gioia et al. (2002) has pointed out, this three-
factor solution aligned with Barkley’s (1997) theory, which included a three-prong model of
executive function: (a) behavioral (inhibitory) control; (b) emotional regulation; and (c)
metacognition. The four-factor solution also included the “new” BRI and the ERI factors as well
as the division of MI into Internal MI and External MI. Although a four-prong approach to
executive function has been theorized (see Shallice & Burgess, 1991b), there are problems with
this model: the four-factor CFA showed that the Internal MI and External MI factors were
almost perfectly correlated. These factors seemed to be measuring essentially
the same construct, despite the slight improvement in the fit indices. Further, the four-factor
model presented no statistically significant advantage over the three-factor model in that the
incremental χ² test showed that the additional parameters did not significantly improve the fit of
the model to the data. Prior research (Egeland & Fallmyr, 2010; Gioia et al., 2002), in addition to the issue of
parsimony, supported that the three-factor (3Monitor-9) model was a better fit than the four-
factor model. The “best” eight-scale model was determined to be Gioia et al.’s (2000) original
model (2Original-8); however, this model fell short when directly compared to several of the
nine-scale models. There are several possible reasons for finding results that differ from those
reported in the BRIEF-Parent manual.
Differences in Findings
The difference in findings about the eight-scale structure between this study and Gioia et
al.’s (2000) could be due to the factor analytic method used. Gioia et al. (2000) used principal
factor analysis with oblique (direct oblimin) rotation to develop the BRIEF-Parent, whereas CFA
was used in the current study. While both factor analytic methods are theory-driven, CFA is
designed to test specific models about the nature of the factors while EFA is designed to
determine whether a set of variables create a factor that best reflects a theoretical construct
(Byrne, 2006). One of the biggest criticisms of exploratory factor analysis is that interpretation
of results can hinge largely on a researcher’s judgment (Tabachnick & Fidell, 2001). Because of
this vulnerability, several factor retention methods (e.g., parallel analysis and MAP) are
recommended for use beyond the eigenvalue-greater-than-one rule (Henson & Roberts, 2006). There is
no mention in the test manual that these additional procedures were used. However, Gioia et al.
(2000) does explicitly state, “the traditional method of determining the number of factors (i.e.,
eigenvalues > 1.0) was overridden in favor of theoretical considerations” (p. 61). The only stated
selection criterion was that pattern coefficients needed to be greater than |.40|. No models with
more than two factors were considered because solutions with a greater number of factors
produced factors defined by single variables and “did not add to the interpretability of the scales”
(Gioia et al., 2000, p. 62). Overall, the difference in findings regarding the factor structure may
be due to these limitations or the vague “theoretical considerations” employed by Gioia et al. to
establish the factor structure of the BRIEF-Parent scale.
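Parallel analysis, one of the retention methods recommended by Henson and Roberts (2006), compares the eigenvalues observed in the data against those produced by random data of the same dimensions. A minimal sketch; the observed eigenvalues below are hypothetical and used only for illustration:

```python
import numpy as np

def parallel_analysis_baseline(n_cases, n_vars, n_reps=200, seed=0):
    """Mean eigenvalues of correlation matrices computed from random
    normal data with the same dimensions as the observed data.
    Factors whose observed eigenvalues exceed this baseline are
    retained."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_reps, n_vars))
    for r in range(n_reps):
        x = rng.standard_normal((n_cases, n_vars))
        # Eigenvalues of the 9 x 9 correlation matrix, descending.
        eigs[r] = np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]
    return eigs.mean(axis=0)

# Hypothetical eigenvalues for nine scales rated by 371 cases
# (illustrative only; they sum to 9, the trace of a 9 x 9
# correlation matrix).
observed = [5.2, 1.5, 0.8, 0.5, 0.4, 0.2, 0.2, 0.1, 0.1]
baseline = parallel_analysis_baseline(n_cases=371, n_vars=9)
n_factors = sum(o > b for o, b in zip(observed, baseline))  # 2 here
```

Unlike the eigenvalue-greater-than-one rule, the random baseline accounts for the sampling error that inflates the leading eigenvalues of any correlation matrix.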
Therefore, this study is particularly important in that all eight- and nine-scale models of
the BRIEF-Parent were tested in a U.S. sample of school-aged children with mixed clinical
diagnoses. No U.S. study has done this before. Egeland and Fallmyr (2010) was the first to
examine both types of models, but in a Norwegian sample. In both studies, Egeland and Fallmyr
and the current one, Gioia et al.’s original model (2Original-8) was not found to be the best
fitting model. Instead, a nine-scale model, either the three- or four-factor, was found to be more
viable. Thus, the findings of this study are even more important in that it is only the third study
to support a three-factor, nine-scale partition of the BRIEF-Parent. The current study is unique
from the other studies in that it is the first independent study of the BRIEF-Parent form
conducted in the United States using a large, mixed clinical sample of referred children,
comparing both eight- and nine-scale versions of the BRIEF-Parent.
Reasons for Misfit
A common area of misfit across the current study as well as other CFA studies on the
BRIEF-Parent (Gioia et al., 2002; Egeland & Fallmyr, 2010) involved the RMSEA value. The
RMSEA is used to measure the discrepancy between the error of approximation in the population
covariance matrix and optimally chosen parameter values of the model (Steiger & Lind, 1980).
However, a known issue concerning this index is that when sample size is small, RMSEA tends
to over-reject true population models (Hu & Bentler, 1999). In the three aforementioned studies,
RMSEA values fell above the .08 cutoff (Browne & Cudeck, 1993) for all tested models.
RMSEA values arranged by study and model are provided in Table 9.
Table 9
Root Mean Square Error of Approximation (RMSEA) Values Arranged by Model and Study
Model
Study One-factor Two-factor Three-factor Four-factor
Gioia et al. (2002) .21 .12 .11 .12
Egeland & Fallmyr (2010) .23 .12 .14 --
Current study .20 .12 .11 .12
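The RMSEA values above follow from the standard point estimate, the square root of max(χ² − df, 0) divided by df(N − 1). A minimal sketch, checked against the 3Monitor-9 entry from Table 7 (N = 354):

```python
import math

def rmsea(chi2, df, n_cases):
    """Point estimate of the root mean square error of approximation:
    sqrt(max(chi2 - df, 0) / (df * (N - 1))). The max() guard keeps
    the estimate at 0 when the model fits better than expected by
    chance (chi2 < df)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_cases - 1)))

# 3Monitor-9 Model, Caucasian subsample (Table 7)
print(round(rmsea(137.812, 24, 354), 3))  # 0.116, as tabled
```

The formula makes the small-sample behavior noted above concrete: for a fixed χ²/df ratio, shrinking N inflates the estimate, which is one reason RMSEA tends to over-reject true models in small samples.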
Egeland and Fallmyr (2010) made no modifications to the models after running the
CFAs; however, Gioia et al. (2002) did by estimating three error covariances between the Inhibit
scale and the Working Memory, Organization of Materials, and Emotional Control scales. By
estimating these error covariances, Gioia et al. reduced the RMSEA of the three-factor model
from .11 to .08. Post-hoc modifications should be done under a theoretical premise
(Byrne, 2006); thus, Gioia et al. cited Barkley (1997) and Burgess (1997) in defense of such
modifications and claimed that there is a known relationship between inhibition and other
processes, including working memory, organization, and emotional control. As a result, it made
sense to connect the error terms associated with these scales. Under the alternative model
approach of establishing models a priori, no post-hoc modifications were made in this study.
However, Gioia et al.’s findings may be useful in future research.
Limitations
There are several limitations of the study, which may reduce the external and internal
validity of the findings. The sample was geographically limited to Western Pennsylvania; thus,
the findings may differ in other areas of the United States or world-wide. Additionally, the
sample was largely comprised of older youth, aged 16 to 18 years, because
many of the students in the sample were referred from OVR. This point leads to the argument
that students who pursue OVR services may not be representative of typical students in special
education. According to Halpern, Yovanoff, Doren, and Benz (1995), who examined special
education students in their last year of high school, a majority of these students do pursue some
type of postsecondary education within one year after graduation, making it feasible that the
youth in this sample do accurately represent students in special education. Nonetheless, a larger
sample of students from age 5 to 15 years would have been ideal.
Another aspect to consider is that the majority of raters in this sample were mothers.
Most CFA studies on the BRIEF-Parent (e.g., Gioia et al., 2002; Huizinga & Smidts, 2011; Slick
et al., 2006) have not specified which parents or guardians served as raters, so it is difficult to
speculate whether or not this sample contained a disproportionate number of mothers versus
fathers or other raters. However, Batan et al. (2011; Turkish version) reported in their abstract
that 73.8% were mothers, 22.1% were fathers, and 4.1% were other primary caregivers. Batan et
al.’s percentages were similar to those in the current study. A final demographic concern is the
cultural diversity of the sample. A vast majority of the participants were of Caucasian descent,
so it is hard to say if the results would have varied if a higher percentage of racial/ethnic minority
participants had been in the sample.
A variable that could have potentially compromised internal validity was sample size.
The current study did not have a sufficient number of participants to conduct a viable CFA at
item-level (Barrett & Kline, 1981). However, this study had an adequate number of participants
to justify scale-level CFA techniques when using a conservative recommended value of 20:1 for
the case-to-indicator ratio (Fabrigar et al., 1999). The current study had a similar sample size (N
= 371) as Gioia et al.’s (2002) CFA study (N = 374).
Another potential threat to internal validity was whether the use of a sample of mixed
clinical diagnoses was representative of a typical special education population. Data for most
participants were unavailable regarding the diagnoses or educational categories that entitled
students to receive services. The only data available about special education were from the
school sample (n = 62). Table 10 provides the percentage of participants in the school sample
receiving services across various categories compared to the percentage of students receiving
services that was reported by the same school district for the 2012-13 school year (PA
Department of Education, 2012). The comparison indicates that the school sample contained a
larger sample of those designated with a Specific Learning Disability and Emotional
Disturbance, but had a smaller sample of those designated with Autism and Other Health
Impairment. Thus, there is some indication that the mixed diagnoses sample may be
representative of a special education population, but this conclusion is based on less than 20% of
the sample, as this information for the rest of the sample is unknown.
Table 10
Percentage of Participants Receiving Special Education Services for School Sample (N = 62)
and School District (N = 1,180) by Category
Category School Sample School District
Autism 6.5 8.8
Emotional Disturbance 12.9 7.4
Gifted 4.8 4.2
Non-exceptional 16.1 14.6
Other Health Impairment 14.5 21.7
Specific Learning Disability 43.5 30.4
Traumatic Brain Injury 1.6 < 1
Implications
Practice. The CFA findings indicate that competing models of the BRIEF-Parent may fit
the data adequately. Such findings have implications for the use of the scale in school
and vocational settings. The use of the BRIEF-Parent in the school setting warrants several areas
of consideration.
How the Monitor scale is treated has implications for practice. Is the construct unitary or
multidimensional, reflecting two related but separate constructs (Self- and Task-Monitor)?
In its current format, Gioia et al. (2000) describe the Monitor scale as assessing the abilities to
keep track of one’s own and others’ efforts through “work-checking” behaviors. There is a
distinction, however, between Task-Monitor (i.e., monitoring of task-related activities), which
includes items such as “Does not check work for mistakes” and Self-Monitor (i.e., monitoring of
personal behavioral activities), which includes items such as “Is unaware of how his/her behavior
affects or bothers others” (Gioia & Isquith, 2002). Task-monitoring items appear to have high
face validity in that they involve completing tasks. In contrast, a Self-Monitor item involves more
self- and social awareness instead of academic behaviors. Differentiating between these skills on
the Monitor scale may yield potentially useful information when developing interventions for
children with social skills deficits versus those who, rather, require help completing their work
more thoroughly.
Another implication for practice is the usefulness of emotional regulation, measured by
the third factor, ERI. This factor measures a child’s ability to regulate emotions relative to other
children his/her age. In the original model, the scales that would comprise the ERI in the three-
factor model (i.e., Emotional Control and Shift) are combined with the Inhibit scale to form the
BRI. However, there is no distinction in the BRI between emotional regulation and inhibitory
behavior control. In combination with several other sources of student data, gaining more
information about a student’s emotional regulation is a potentially valuable piece of information
for parents and educators. The information gleaned from the ERI score could give educators and
practitioners a global sense of where a child’s emotional regulation stands relative to other
children his or her age. An elevated score in the ERI could provide further insight into a
student’s functioning in the areas of modulating emotions or moving freely from one situation or
activity to another. This information may contribute to the development of more individualized
instruction and provide more data about students with emotional needs. Without a separate score
parsed out from the BRI, the potential exists that the unique emotional component of problematic
student behaviors may be overlooked.
Also at issue in applying the current findings to practice is how to use the Metacognition
Index (MI). Would the scale be more useful as one measure or two? Results indicate that these
two aspects of MI (Internal and External) are highly correlated and may be measuring the same
construct on the BRIEF-Parent. The Internal MI scale involves behaviors that may not be
readily reflected on an observer rating scale because they are cognitive actions that take place
internally. However, whatever information is made available from the BRIEF-Parent could still be
useful to educators. Strategies for instruction rarely occur in isolation and are integrated into
complex cognitive goals that entail higher-order sequences. Pressley and Woloshyn (1995)
discussed the relationship between metacognitive behaviors and reading skills. Good reading
skills may entail activation of prior knowledge and self-questioning about text content. At the
very least, the BRIEF-Parent scores could be useful in alerting the practitioner to at-risk
behaviors in the areas of planning, strategizing, and initiating tasks, which are very different
skills from those in the External MI where interventions may involve organization and work-
checking techniques. Students struggling with Internal MI type skills would need to learn
different types of interventions or strategies, which are designed for the students to ask
themselves questions or check for comprehension.
Future research. Future factor analytic studies conducted on the BRIEF-Parent should
be based on a larger sample size to allow for item analysis. No studies using a mixed clinical
sample, including the current study, have had a large enough sample to run the analyses at item
level. Huizinga and Smidts (2011) used the largest sample size (N = 847) with a 75-item Dutch
version of the original 86-item scale. Information gleaned from item-level research would be
useful in establishing accurate interpretation of the BRIEF-Parent scores. Research has begun to
emerge in other countries using the BRIEF-Parent and examining its factor structure; however,
norms for the country must be established before accurate comparisons can be made with U.S.
findings. Executive function is a culturally bound concept; thus, comparing those from other
countries and cultures may be a challenging but important objective in fully understanding
individual functioning and neuropsychological pathways.
Egeland and Fallmyr (2010) also conducted research on the factor structure of the
BRIEF-Teacher using a group of Norwegian teacher raters. Their findings provided evidence to
support an alternative three-factor, nine-scale model, which differed from the model provided in
the test manual. However, more direct evidence is needed, so conducting a CFA on the Teacher
form of the BRIEF using a large mixed clinical sample of U.S. youth is warranted. American
schools likely differ from Norwegian schools, as do teachers’ expectations of students’ behavior
and academic performance. It is difficult to say how cultural differences would
have an impact on the scores (and therefore factor structure) of the BRIEF-Teacher.
Further, executive dysfunction has been identified as a source of difficulty in different groups of children with academic problems. More specifically, executive dysfunction is evident in clinical samples as well as among students in various categories of special education under IDEA (e.g., specific learning disabilities or unique medical conditions). Thus, research on the BRIEF should continue to be conducted using mixed clinical samples. How executive dysfunction is expressed in various groups of students, as well as how effective academic interventions are in targeting such behaviors, has not been thoroughly investigated. The aptitude-treatment interaction (Cronbach & Snow, 1977) between students’ levels of cognitive ability and BRIEF-Parent results was beyond the scope of this study but should also be explored.
Conclusions
Executive function is still a relatively new concept in psychoeducational assessment in
schools (Hale & Fiorello, 2004). The popularity of the EF framework has drastically increased
in schools over the past decade and the use of the BRIEF-Parent is also likely to continue to
increase in this setting. Therefore, research must continue on instruments such as the BRIEF-Parent to ensure that reliable and valid scores inform decisions about students requiring special education services.
The purpose of this study was to test (a) whether the original eight-scale division of the BRIEF-Parent was replicable in a mixed clinical sample of school-aged children and (b) whether the original model would fit the data best when an alternative-model approach was used. Both eight-scale and nine-scale models have been tested and reported in the literature. However, no comparisons have been made among all of the existing models in an American sample.
The current findings show that three of the four nine-scale models (the two-, three-, and four-factor models) fit the data slightly better than the original two-factor, eight-scale model, which is currently the basis for scoring and interpreting the BRIEF-Parent. Given that the BRIEF-Parent is used in the school setting, the information brought to light in this study should be taken into consideration by the test developers and warrants further examination. Furthermore, more empirical research needs to be conducted on the relationship between the results yielded by executive function instruments (such as the BRIEF-Parent) and academic intervention. The use of the BRIEF-Parent as evidence to warrant educational placement is beyond the scope of the instrument. Because executive function is only one aspect of an individual’s cognitive functioning, it is essential that the BRIEF-Parent not be used in isolation for high-stakes assessment, particularly in settings and populations where the validity of the test scores has not been empirically supported.
REFERENCES
Achenbach, T. (1991). Manual for the Child Behavior Checklist and 1991 profile. Burlington,
VT: University of Vermont, Department of Psychiatry.
Achenbach, T., McConaughy, S., & Howell, C. (1987). Child/adolescent behavioral and
emotional problems: Implications of cross-informant correlations for situational
specificity. Psychological Bulletin, 101, 213-232.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second International Symposium on Information Theory (pp. 267-281). Budapest, Hungary: Akademiai Kiado.
Akinbami, L. J., Liu, X., Pastor, P. N., & Reuben, C. A. (2011). Attention deficit hyperactivity
disorder among children aged 5–17 years in the United States, 1998–2009. NCHS Data
Brief, 70, 1-8.
Alexander, G. R., & Slay, M. (2002). Prematurity at birth: Trends, racial disparities, and
epidemiology. Mental Retardation and Developmental Disabilities Research Reviews, 8,
215-220.
American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
Americans with Disabilities Act of 1990, Pub. L. No. 34 C. F. R. 104 Stat. 33 (2000). Retrieved
from http://www2.ed.gov/policy/rights/reg/ocr/34cfr104.pdf
Anderson, V., Anderson, P., Northam, E., Jacobs, R., & Mikiewicz, O. (2002). Relationships
between cognitive and behavioral measures of executive function in children with brain
disease. Child Neuropsychology, 8, 231-240.
Aylward, G. P. (2004). Neonatology and prematurity. In R. T. Brown (Ed.), Handbook of pediatric psychology in school settings (pp. 489-502). Mahwah, NJ: Lawrence Erlbaum.
Baddeley, A. (1986). Working memory. Oxford, UK: Clarendon Press.
Baddeley, A. (1996). Exploring the central executive. The Quarterly Journal of Experimental
Psychology, 49A, 5-28.
Baddeley, A., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York, NY: Academic.
Baddeley, A., & Wilson, B. (1988). Frontal amnesia and the dysexecutive syndrome. Brain and
Cognition, 7, 212-230.
Barkley, R. A. (1997). ADHD and the nature of self-control. New York, NY: Guilford.
Batan, S. N., Öktem-Tanör, Ö., & Kalem, E. (2011). Reliability and validity studies of
Behavioral Rating Inventory of Executive Function (BRIEF) in a Turkish normative
sample. Elementary Education Online, 10, 894-904.
Beck, D. M., Schaefer, C., Pang, K., & Carlson, S. M. (2011). Executive function in
preschool children: Test-retest reliability. Journal of Cognition & Development, 12, 169-
193.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate
Software.
Bernstein, J. H. & Waber, D. P. (2007). Executive function in education from theory to
practice. New York, NY: Guilford Press.
Best, J., & Miller, P. (2010). A developmental perspective on executive function. Child
Development, 81, 1641-1660.
Best, J., Miller, P., & Jones, L. (2009). Executive functions after age 5: Changes and correlates.
Developmental Review, 29, 180-200.
Bishop, T. (2011). Relationship between performance-based measures of executive function and
the Behavior Rating Inventory of Executive Function (BRIEF), a parent rating measure
(Doctoral dissertation). Illinois Institute of Technology, Chicago, IL.
Blais, M. A. (2011). A guide to applying rating scales in clinical psychiatry. Psychiatric Times,
28, 58-62.
Bodnar, L. E., Prahme, M. C., Cutting, L. E., Denckla, M. B., & Mahone, E. M. (2007). Construct validity of parent ratings of inhibitory control. Child Neuropsychology, 13, 345-362.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445-455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Bull, R., & Scerif, G. (2001). Executive functioning as a predictor of children’s mathematics
ability: Inhibition, switching, and working memory. Developmental Neuropsychology,
33, 205-228.
Burgess, P. (1997). Theory and methodology in executive function research. In P. Rabbitt (Ed.),
Methodology of frontal executive function (pp. 81-116). Hove, East Sussex: Psychology
Press.
Byrne, B. M. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Cameron, C. E., Connor, C. M., Morrison, F. J., & Jewkes, A. M. (2008). Effects of classroom
organization on letter-word reading in first grade. Journal of School Psychology, 46, 173-
192.
Cantin, R. H., Mann, T. D., & Hund, A. M. (2012). Executive functioning predicts school
readiness and success: Implications for assessment and intervention. Communique, 41, 1.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York:
Cambridge University Press.
Chafouleas, S. M., Riley-Tillman, T. C., & Sugai, G. (2007). School-based behavioral assessment: Informing instruction and intervention. New York, NY: Guilford Press.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9,
233-255.
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and
standardized assessment instruments in psychology. Psychological Assessment, 6, 284-
290.
Clark, C. A., Pritchard, V. E., & Woodward, L. J. (2010). Preschool executive functioning
abilities predict early mathematics achievement. Developmental Psychology, 46, 1176-
1191.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Coll, C. G., Akerman, A., & Cicchetti, D. (2000). Cultural influences on developmental
processes and outcomes: Implications for the study of development and psychopathology.
Development and Psychopathology, 12, 333-356.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Conners, C. K. (1989). Manual for the Conners’ Rating Scales. North Tonawanda, NY: Multi-Health Systems.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,
297-334.
Cronbach, L. & Snow, R. (1977). Aptitude and instructional methods: A handbook for research
on interactions. New York, NY: Irvington.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Dawson, P., & Guare, R. (2009). Smart but scattered. New York, NY: Guilford Press.
Denckla, M. B. (2002). The Behavior Rating Inventory of Executive Function: Commentary.
Child Neuropsychology, 8, 304-306.
Diamantopoulos, A., & Siguaw, J. A. (2000). Introducing LISREL: A guide for the uninitiated.
London: Sage.
Diamond, A., & Lee, K. (2011). Interventions shown to aid executive function development in
children 4 to 12 years old. Science, 333, 959-964. doi:10.1126/science.1204529
Dolcos, F., & McCarthy, G. (2006). Brain systems mediating cognitive interference by emotional
distraction. The Journal of Neuroscience, 26, 2072-2079.
Donders, J., DenBraber, D., & Vos, L. (2010). Construct and criterion validity of the Behavior
Rating Inventory of Executive Function (BRIEF) in children referred for
neuropsychological assessment after paediatric traumatic brain injury. Journal of
Neuropsychology, 4, 197-209. doi:10.1348/174866409X478970
Duncan, J., Emslie, H., Williams, P., Johnson, R., & Freer, C. (1996). Intelligence and the
frontal lobes: The organization of goal-directed behavior. Cognitive Psychology, 30, 257-
303.
DuPaul, G. J., Power, T. J., Anastopoulos, A. D., & Reid, R. (1998). ADHD Rating Scale – IV:
Checklist, norms and clinical interpretation. New York, NY: Guilford Press.
Egeland, J., & Fallmyr, Ø. (2010). Confirmatory factor analysis of the Behavior Rating
Inventory of Executive Function (BRIEF): Support for a distinction between emotional
and behavioral regulation. Child Neuropsychology, 16, 326-337. doi:
10.1080/09297041003601462
Eisenberg, N., Liew, J., & Pidada, S. U. (2004). The longitudinal relations of regulation and
emotionality to quality of Indonesian children’s socioemotional functioning.
Developmental Psychology, 40, 790-804.
Eslinger, P. J., & Damasio, A. R. (1985). Severe disturbance of higher cognition after
bilateral frontal lobe ablation: Patient EVR. Neurology, 35, 1731–1741.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Fisher, A. B. & Watkins, M. W. (2008). ADHD rating scales’ susceptibility to faking in a college
student sample. Journal of Postsecondary Education and Disability, 20, 81-92.
Fisk, J. E., & Sharp, C. A. (2004). Age-related impairment in executive functioning: Updating,
inhibition, shifting, and access. Journal of Clinical and Experimental Neuropsychology,
26, 874-890.
Fitzpatrick, C. (2003). [Review of the test Behavior Rating Inventory of Executive Function]. In The fifteenth mental measurements yearbook. Available from http://ovidsp.tx.ovid.com.ezaccess.libraries.psu.edu
Floyd, R. G., Bergeron, R., & Hamilton, G. (2004). Joint exploratory factor analysis of the
Delis-Kaplan Executive Function System and the Woodcock-Johnson III Tests of
Cognitive Abilities. Poster presented at the Annual Meeting of the American
Psychological Association, Honolulu, HI.
Fournier-Vicente, S., Larigauderie, P., & Gaonac’h, D. (2008). More dissociations and
interactions within central executive functioning: A comprehensive latent-variable
analysis. Acta Psychologica, 129, 32-48.
Franzen, M. D., & Wilhelm, K. L. (1996). Conceptual foundations of ecological validity in
neuropsychological assessment. In R. J. Sbordone & C. J. Long (Eds.), Ecological
validity of neuropsychological testing (pp. 91-112). Boca Raton, FL: St. Lucie.
Garon, N., Bryson, S. E., & Smith, I. M. (2008). Executive function in preschoolers: A review
using an integrative framework. Psychological Bulletin, 134, 31-60.
Gilotty, L., Kenworthy, L., Sirian, L., Black, D., & Wagner, A. (2002). Adaptive skills and
executive function in autism spectrum disorders. Child Neuropsychology, 8, 241-248.
Gioia, G. A., Espy, K. A., & Isquith, P. K. (2003). The Behavior Rating Inventory of Executive
Function – Preschool Version professional manual. Odessa, FL: Psychological
Assessment Resources.
Gioia, G. A., & Isquith, P. K. (2002). Two faces of monitor: Thy self and thy task [Abstract].
Journal of the International Neuropsychological Society, 8, 229.
Gioia, G. A., & Isquith, P. K. (2004). Ecological assessment of executive function in traumatic
brain injury. Developmental Neuropsychology, 25, 135-158.
Gioia, G. A., Isquith, P. K., Guy, S. C., & Kenworthy, L. (2000). The Behavior Rating Inventory
of Executive Function professional manual. Odessa, FL: Psychological Assessment
Resources.
Gioia, G. A., Isquith, P. K., Retzlaff, P. D., & Espy, K. A. (2002). Confirmatory factor analysis of the Behavior Rating Inventory of Executive Function (BRIEF) in a clinical sample. Child Neuropsychology, 8, 249-257.
Godefroy, O., Cabaret, M., Petit-Chenal, V., Pruvo, J., & Rousseaux, M. (1999). Control
functions of the frontal lobe: Modularity of the central-supervisory system. Cortex, 35, 1-
20.
Goldberg, E. (2001). The executive brain: Frontal lobes and the civilized mind. New York, NY:
Oxford University Press.
Gonon, F., Bezard, E., & Boraud, T. (2011). Misrepresentation of neuroscience data might give
rise to misleading conclusions in the media: The case of attention deficit hyperactivity
disorder. PLoS ONE, 6, 1-8.
Greenberg, L. M., & Kindschi, C. L. (1996). Test of variables of attention: Clinical guide. Los
Alamitos, CA: Universal Attention Disorders.
Guy, S. C., Isquith, P. K., & Gioia, G. A. (2004). The Behavior Rating Inventory of Executive
Function- Self-Report Version. Lutz, FL: Psychological Assessment Resources.
Hale, J. B., & Fiorello, C. A. (2004). School neuropsychology: A practitioner’s handbook. New
York, NY: Guilford Press.
Halpern, A. S., Yovanoff, P., Doren, B., & Benz, M. R. (1995). Predicting participation in
postsecondary education for school leavers with disabilities. Exceptional Children, 62,
151-164.
Harris, M. B. (1996). Aggressive experiences and aggressiveness: Relationship to ethnicity,
gender, and age. Journal of Applied Social Psychology, 26, 843-870.
Heaton, R. K. (1981). Manual for the Wisconsin Card Sorting Test. Odessa, FL: Psychological
Assessment Resources.
Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research:
Common errors and some comments on improved practice. Educational and
Psychological Measurement, 66, 393-416. doi: 10.1177/0013164405282485.
Hintze, J. M., Volpe, R. J., & Shapiro, E. S. (2007). Best practices in the systematic direct
observation of student behavior. In A. Thomas & J. Grimes (Eds.), Best practices in
school psychology-V (pp. 319-336). Bethesda, MD: National Association of School
Psychologists.
Holler, R., & Zirkel, P. A. (2008). Section 504 and public school students: A national survey
concerning “Section 504-Only” students. NASSP Bulletin, 92, 19-43.
Hu, L., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation
modeling: Concepts, issues, and applications (pp. 76-99). Thousand Oaks, CA: Sage.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to
underparameterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indices in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Hughes, C., & Graham, A. (2002). Measuring executive functions in childhood: Problems and
solutions? Child and Adolescent Mental Health, 7, 131-142.
Huizinga, M., & Smidts, D. P. (2011). Age-related changes in executive function: A normative
study with the Dutch version of the Behavior Rating Inventory of Executive Function
(BRIEF). Child Neuropsychology, 17, 51-66. doi:10.1080/09297049.2010.509715
Hulac, D. M. (2008). Evaluating executive functioning, academic achievement and emotional
control with adolescent females in a residential treatment center (Doctoral dissertation).
University of Northern Colorado, Greeley, CO.
Individuals with Disabilities Education Act (IDEA) (2006). Washington, DC: U.S.
Government Printing Office. Retrieved on March 20, 2012 from
http://idea.ed.gov/download/finalregulations.pdf
Isquith, P., Gioia, G., & PAR staff (2002). Behavior Rating Inventory of Executive
Function Scoring Portfolio. Odessa, FL: Psychological Assessment Resources.
Jepsen, M. I., Gray, K. M., & Taffe, J. R. (2012). Agreement in multi-informant assessment of
behaviour and emotional problems and social functioning in adolescents with Autistic
and Asperger’s Disorder. Research in Autism Spectrum Disorders, 6, 1091-1098.
Johansson, S., & Cnattingius, S. (2010). Epidemiology of preterm birth. In C. Nosarti, R. Murray, & M. Hack (Eds.), Neurodevelopmental outcomes of preterm birth: From childhood to adult life (pp. 1-38). New York, NY: Cambridge University Press.
Johnson, J., & Reid, R. (2011). Overcoming executive functioning deficits with students with
ADHD. Theory Into Practice, 50, 61-67.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long
(Eds.), Testing structural equation models (pp. 294-316). Newbury Park, CA: Sage.
Jurado, M. B., & Rosselli, M. (2007). The elusive nature of executive functions: A review of our current understanding. Neuropsychology Review, 17, 213-233. doi: 10.1007/s11065-007-9040-z
Kaplan, E., Fein, D., Kramer, J., Delis, D., & Morris, R. (1999). WISC-III-PI manual. San
Antonio, TX: Psychological Corporation.
Kline, R. B. (2006). Principles and practice of structural equation modeling. New York, NY:
Guilford Press.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
Lane, K. L., O’Shaughnessy, T. E., Lambros, K. M., Gresham, F. M., & Beebe-Frankenberger, M. E. (2002). The efficacy of phonological awareness training with first-grade students
who have behavior problems and reading difficulties. Journal of Emotional & Behavioral
Disorders, 9, 219-231.
Lehto, J. (1996). Are executive function tests dependent on working memory capacity?
Quarterly Journal of Experimental Psychology, 49, 29-50.
Lehto, J., Juujärvi, P., Kooistra, L., & Pulkkinen, L. (2003). Dimensions of executive
functioning: Evidence from children. British Journal of Developmental Psychology, 21,
59-80.
LeJeune, B., Beebe, D., Noll, J., Kenealy, L., Isquith, P., & Gioia, G. (2010). Psychometric
support for an abbreviated version of the Behavior Rating Inventory of Executive
Function (BRIEF) Parent Form. Child Neuropsychology, 16, 182-201.
doi:10.1080/09297040903352556
Lewis, C., & Carpendale, J. I. (2009). Introduction: Links between social interaction and
executive function. In C. Lewis & J. I. M. Carpendale (Eds.), Social interaction and the
development of executive function. New Directions in Child and Adolescent
Development, 123, 1–15. doi:10.1002/cd.232
Lezak, M. D. (1983). Neuropsychological assessment (2nd ed.) New York, NY: Oxford
University Press.
Loeber, R., Green, S. M., & Lahey, B. B. (1990). Mental health professionals’ perception
of the utility of children, mothers, and teachers as informants on childhood
psychopathology. Journal of Clinical Child Psychology, 19, 136-143.
Logan, G. D., & Cowan, W. B. (1984). On the ability to inhibit thought or action: A theory of an act of control. Psychological Review, 91, 295-327.
Loken, W. J., Thornton, A. E., Otto, R. L., & Long, C. J. (1995). Sustained attention after severe closed head injury. Neuropsychology, 9, 592-598.
Luria, A. (1961). The role of speech in the regulation of normal and abnormal behavior.
Oxford, UK: Pergamon.
Luria, A. (1973). The working brain: An introduction to neuropsychology. New York, NY:
Basic.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance
structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111,
490-504.
Mahone, E. M., Cirino, P. T., Cutting, L. E., Cerrone, P. M., Hagelthorn, K. M., Hiemenz, J. R., … Denckla, M. B. (2002). Validity of the Behavior Rating Inventory of Executive
Function in children with ADHD and/or Tourette syndrome. Archives of Clinical
Neuropsychology, 17, 643-662.
Mahone, E. M., Koth, C. W., Cutting, L., Singer, H. S., & Denckla, M. B. (2001). Executive
function in fluency and recall measures among children with Tourette syndrome or
ADHD. Journal of the International Neuropsychological Society, 7, 102-111.
Marsh, H. W., & Grayson, D. (1995). Latent variable models of multitrait-multimethod
data. In R. Hoyle (Ed.), Structural equation modeling: Concepts, issues and
applications (pp. 177-198). Thousand Oaks, CA: Sage.
Martinez, Y. A., Schneider, B. H., Gonzales, Y. S., & del Pilar Soteras de Toro, M. (2008).
Modalities of anger expression and the psychosocial adjustment of early adolescents in
eastern Cuba. International Journal of Behavioral Development, 32, 207-217.
Mayes, S. D., Calhoun, S. L., Mayes, R. D., & Molitoris, S. (2012). Autism and ADHD:
Overlapping and discriminating symptoms. Research in Autism Spectrum Disorders, 6,
277-285.
McCandless, S., & O’Laughlin, L. (2007). The clinical utility of the Behavior Rating Inventory
of Executive Function (BRIEF) in the diagnosis of ADHD. Journal of Attention
Disorders, 10, 381-389. doi: 10.1177/1087054706292115
Mead, G. H. (1910). What objects must psychology presuppose? Journal of Philosophy,
Psychology, and Scientific Methods, 7, 174-180.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-
103). New York, NY: Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Milich, R., Widiger, T.A., & Landau, S. (1987). Differential diagnosis of attention deficit and
conduct disorders using conditional probabilities. Journal of Consulting & Clinical
Psychology, 55, 762–767.
Miyake, A., Friedman, N., Emerson, M., Witzki, A., & Howerter, A. (2000). The unity and
diversity of executive functions and their contributions to complex “frontal lobe” tasks: A
latent variable analysis. Cognitive Psychology, 41, 49-100. doi: 10.1006/cogp.1999.0734.
Monsell, S. (1996). Control of mental processes. In V. Bruce (Ed.), Unsolved mysteries of
the mind: Tutorial essays in cognition (pp. 93-148). Hove, UK: Erlbaum.
National Center for Education Statistics (2010). Children 3 to 21 years old served under
Individuals with Disabilities Education Act, Part B, by type of disability: Selected years,
1976-77 through 2008-09. Retrieved from
http://nces.ed.gov/programs/digest/d10/tables/dt10_045.asp
Norman, D. A., & Shallice, T. (1986). Attention to action. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-regulation: Advances in research and theory (Vol. 4, pp. 1-18). New York, NY: Plenum.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
Obrzut, J. E. (1995). Dynamic versus structural processing differences characterize
laterality patterns of learning disabled children. Developmental Neuropsychology, 11,
467-484.
Offord, D. R., Boyle, M. H., Racine, Y., Szatmari, P., Fleming, J. E., Sanford, M., &
Lipman, E. L. (1996). Integrated assessment data from multiple informants.
Journal of the American Academy of Child and Adolescent Psychiatry, 35, 1078-1085.
Packwood, S., Hodgetts, H. M., & Tremblay, S. (2011). A multiperspective approach to the
conceptualization of executive functions. Journal of Clinical and Experimental
Neuropsychology, 33, 456-470. doi: 10.1080/13803395.2010.533157.
Palfrey, J. S., Levine, M. D., Walker, D. K., & Sullivan, M. (1985). The emergence of
attention deficits in early childhood: A prospective study. Journal of
Developmental and Behavioral Pediatrics, 3, 339-348.
Park, I. J., Kim, P. Y., Cheung, R. Y., & Kim, M. (2010). The role of culture, family
processes, and anger regulation in Korean American adolescents’ adjustment
problems. American Journal of Orthopsychiatry, 80, 258-266. doi: 10.1111/j.1939-
0023.2010.0129.x
Pennington, B. F., & Ozonoff, S. (1996). Executive functions and developmental
psychopathology. Journal of Child Psychology and Psychiatry, 37, 51-87.
Pennsylvania Department of Education. (2012). Poverty level by school district. Retrieved from
http://www.portal.state.pa.us/portal/server.pt/community/pa_pre_k_counts/8742/frl_by_d
istrict/522213
Pratt, B. M. (2000). The comparative development of executive function in elementary school
children with reading disorder and attention-deficit/hyperactivity disorder (Doctoral
dissertation). The California School of Professional Psychology at Alameda, Alameda,
CA.
Pressley, M., & Woloshyn, V. (1995). Cognitive strategy instruction that really improves children’s academic performance (2nd ed.). Cambridge, MA: Brookline.
Qian, Y. & Wang, Y. (2007). Reliability and validity of Behavior Rating Inventory of Executive
Function for school age children in China. Journal of Peking University Health Sciences,
3, 277-283.
Raven, J. C., Court, J. H., & Raven, J. (1988). Manual for Raven’s Progressive Matrices and
Vocabulary Scales. London, UK: H. K. Lewis.
Reddy, L., Hale, J., & Brodzinsky, L. (2011). Discriminant validity of the Behavior Rating
Inventory of Executive Function Parent form for children with ADHD. School
Psychology Quarterly, 26, 45-55. doi: 10.1037/a0022585
Reitan, R. (1958). Validity of the Trail Making Test as an indicator of organic brain damage.
Perceptual and Motor Skills, 8, 271-276.
Reynolds, C., & Kamphaus, R. (1992). Behavior Assessment System for Children. Circle Pines,
MN: American Guidance Service.
Robbins, T. W., James, M., Owen, A. M., Sahakian, B. J., McInnes, L., & Rabbitt, P. (1994).
Cambridge Neuropsychological Test Automated Battery (CANTAB): A factor analytic
study of a large sample of normal elderly volunteers. Dementia, 5, 266-281.
Rojahn, J., Rowe, E. W., Macken, J., Gray, A., Delitta, D., Booth, A., & Kimbrell, K. (2010).
Psychometric evaluation of the Behavior Problems Inventory-01 and the Nisonger Child
Behavior Rating Form with children and adolescents. Journal of Mental Health and
Research in Intellectual Disabilities, 3, 28-50.
Romine, C., & Reynolds, C. (2005). A model of the development of frontal lobe function:
Findings from a meta-analysis. Applied Neuropsychology, 12, 190-201.
Roth, R. M., Isquith, P. K., & Gioia, G. A. (2005). The Behavior Rating Inventory of Executive
Function – Adult Version. Lutz, FL: Psychological Assessment Resources.
Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). La Mesa,
CA: Author.
Séguin, J. R., & Zelazo, P.D. (2005). Executive function in early physical aggression. In R. E.
Tremblay, W. W. Hartup, & J. Archer (Eds.), Developmental origins of aggression (pp.
307-329). New York, NY: Cambridge University Press.
Shallice, T., & Burgess, P. W. (1991a). Deficits in strategy application following frontal lobe
damage in man. Brain, 114, 727–741.
Shallice, T., & Burgess, P. W. (1991b). Higher-order cognitive impairments and frontal lobe
lesions in man. In H. Levin, H. Eisenberg, & A. Benton (Eds.), Frontal lobe
function and dysfunction (pp. 125-138). New York, NY: Oxford University Press.
Slick, D., Lautzenhiser, A., Sherman, E., & Eryl, K. (2006). Frequency of scale elevations and
factor structure of the Behavior Rating Inventory of Executive Function (BRIEF) in
children and adolescents with intractable epilepsy. Child Neuropsychology, 12, 181-189.
doi: 10.1080/09297040600611320.
Slomine, B. S., Gerring, J. P., Grados, M. A., Vasa, R., Brady, K. D., Christensen, J. R., &
Denckla, M. B. (2002). Performance on measures of ‘executive function’ following
pediatric traumatic brain injury. Brain Injury, 16, 759-772.
Sollman, M. J., Ranseen, J. D., & Berry, D. T. (2010). Detection of feigned ADHD in college
students. Psychological Assessment, 22, 325-335.
Sparrow, S. S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle
Pines, MN: American Guidance Service.
Spearman, C. (1904). “General intelligence” objectively determined and measured. American
Journal of Psychology, 15, 201-293.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common
factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City,
IA.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology, 18, 643-662.
Stuss, D. T., & Alexander, M. P. (2000). Executive functions and the frontal lobes: A conceptual
view. Psychological Research, 63, 289-298.
Stuss, D. T., & Benson, D. F. (1986). The frontal lobes. New York, NY: Raven.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.
Teuber, H. L. (1972). Unity and diversity of frontal lobe functions. Acta Neurobiologiae Experimentalis, 32, 615-656.
Thompson, B., & Daniel, L. G. (1996). Factor analytic evidence for the construct validity of scores: A historical overview and some guidelines. Educational and Psychological Measurement, 56, 197-208.
Thorell, L., & Nyberg, L. (2008). The Childhood Executive Function Inventory (CHEXI): A new rating instrument for parents and teachers. Developmental Neuropsychology, 33, 536-552.
Toplak, M., Bucciarelli, S., Jain, U., & Tannock, R. (2009). Executive functions: Performance-based measures and the Behavior Rating Inventory of Executive Function (BRIEF) in adolescents with attention deficit/hyperactivity disorder (ADHD). Child Neuropsychology, 15, 53-72. doi: 10.1080/09297040802070929
Vilkki, J. & Holst, P. (1989). Deficient programming in spatial learning after frontal lobe
damage. Neuropsychologia, 27, 971-976.
Vriezen, E. R., & Pigott, S. E. (2002). The relationship between parental report on the BRIEF and performance-based measures of executive function in children with moderate to severe traumatic brain injury. Child Neuropsychology, 8, 296-303.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes.
Cambridge, MA: Harvard University Press.
Wechsler, D. (1955). Wechsler Test of Adult Reading: Manual. San Antonio, TX: Psychological
Corporation.
Zelazo, P. D., Müller, U., Frye, D., & Marcovitch, S. (2003). The development of executive
function in early childhood. Monographs of the Society for Research in Child
Development, 68 (serial no. 274).
Appendix A
Glossary of Acronyms
_____________________________________________________________________________
Glossary of Acronyms
_____________________________________________________________________________
ADHD Attention Deficit Hyperactivity Disorder
ADHD-IV ADHD-Rating Scale-Fourth Edition
AODSR average off-diagonal standardized residual
APA American Psychiatric Association
ASD Autism Spectrum Disorders
BASC Behavior Assessment System for Children
BRI Behavioral Regulation Index
BRIEF Behavior Rating Inventory of Executive Function
CBCL Achenbach’s Child Behavior Checklist
CFA confirmatory factor analysis
CFI comparative fit index
CI confidence interval
CRS Conners’ Rating Scales
CTMT Comprehensive Trail Making Test
DSM Diagnostic and Statistical Manual of Mental Disorders
EEG electroencephalogram
EF executive functioning
EFA exploratory factor analysis
GEC Global Executive Composite
GST Goal Search Task
IFI Bollen’s incremental fit index
MI Metacognition Index
MRI magnetic resonance imaging
NNFI Bentler-Bonett non-normed fit index
NTRS normal theory root square
OHI Other Health Impairment
PAF principal axis factoring
PFC prefrontal cortex
PET positron emission tomography
PNFI parsimonious normed fit index
RMSEA root mean square error of approximation
SOC Stocking of Cambridge task
SRMR standardized root mean square residual
TBI traumatic brain injury
TOH Tower of Hanoi task
TOVA Test of Variables of Attention
VABS Vineland Adaptive Behavior Scales
WCST Wisconsin Card Sorting Test
Appendix B
Items Comprising Scales on BRIEF-Parent form
____________________________________________________________________________
Scale Items
____________________________________________________________________________
1. Inhibit 38, 41, 43, 44, 49, 54, 55, 56, 59, 65
2. Shift 5, 6, 8, 12, 13, 23, 30, 39
3. Emotional Control 1, 7, 20, 25, 26, 45, 50, 62, 64, 70
4. Initiate 3, 10, 16, 47, 48, 61, 66, 71
5. Working Memory 2, 9, 17, 19, 24, 27, 32, 33, 37, 57
6. Plan/Organize 11, 15, 18, 22, 28, 35, 36, 40, 46, 51, 53, 58
7. Organization of Materials 4, 29, 67, 68, 69, 72
8. Monitor
   a. Task-Monitor 14, 21, 31, 60
   b. Self-Monitor 34, 42, 52, 63
Appendix C
School District Approval
Appendix D
Licensed Psychologist Approval
Appendix E
Office for Research Protections Correspondence
Appendix F
Structure Coefficients, Effect Sizes, and Error Terms for Subsamples
Table F1
Standardized Structure Coefficients for BRIEF-Parent Scales for OVR Sample Arranged by
Model (Maximum Likelihood Extraction)
Scale: Structure Coefficient (Error Terms) [Effect Size]

Model 1: Unity-8 Model
Factor 1- GEF
Inhibit .71 (.70) [.51]
Shift .73 (.69) [.53]
ECO .63 (.78) [.40]
Initiate .86 (.51) [.74]
WM .86 (.52) [.74]
P/O .88 (.47) [.78]
ORG .70 (.71) [.49]
Monitor .84 (.54) [.71]

Model 2: 2Original-8 Model
Factor 1- BRI
Inhibit .76 (.65) [.57]
Shift .85 (.53) [.72]
ECO .82 (.57) [.68]
Factor 2- MI
Initiate .85 (.52) [.73]
WM .87 (.49) [.77]
P/O .91 (.41) [.84]
ORG .71 (.70) [.51]
Monitor .82 (.57) [.68]

Model 3: 2Donders-8 Model
Factor 1- BRI
Shift .93 (.38) [.86]
ECO .78 (.62) [.61]
Factor 2- MI
Initiate .86 (.52) [.74]
WM .87 (.50) [.75]
P/O .90 (.44) [.81]
ORG .71 (.71) [.50]
Monitor .84 (.55) [.70]
Inhibit .69 (.72) [.48]

Model 4: Unity-9 Model
Factor 1- GEF
Inhibit .71 (.71) [.50]
Shift .73 (.69) [.53]
ECO .63 (.78) [.40]
S-Monitor .73 (.68) [.53]
Initiate .86 (.51) [.74]
WM .86 (.51) [.74]
P/O .89 (.47) [.78]
ORG .70 (.71) [.50]
T-Monitor .69 (.72) [.48]

Model 5: 2Monitor-9 Model
Factor 1- BRI
Inhibit .81 (.59) [.65]
Shift .80 (.60) [.64]
ECO .79 (.61) [.63]
S-Monitor .82 (.57) [.68]
Factor 2- MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .93 (.36) [.87]
ORG .72 (.69) [.52]
T-Monitor .72 (.70) [.51]

Model 6: 3Monitor-9 Model
Factor 1- BRI
Inhibit .83 (.55) [.69]
S-Monitor .86 (.52) [.73]
Factor 2- ERI
Shift .86 (.51) [.74]
ECO .84 (.54) [.71]
Factor 3- MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .93 (.36) [.87]
ORG .72 (.69) [.52]
T-Monitor .72 (.70) [.51]

Model 7: 4Monitor-9 Model
Factor 1- BRI
Inhibit .83 (.55) [.69]
S-Monitor .86 (.51) [.73]
Factor 2- ERI
Shift .86 (.50) [.75]
ECO .84 (.54) [.71]
Factor 3- Int MI
Initiate .84 (.54) [.71]
WM .88 (.48) [.77]
P/O .93 (.37) [.86]
Factor 4- Ext MI
ORG .73 (.68) [.53]
T-Monitor .75 (.69) [.53]
Note. N = 264; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG
= Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF=
General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index;
ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External
Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor,
nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-
scale model; 4Monitor-9 = Four-factor, nine-scale model.
Table F2
Standardized Structure Coefficients for BRIEF-Parent Scales for Caucasian Sample Arranged by
Model (Maximum Likelihood Extraction)
Scale: Structure Coefficient (Error Terms) [Effect Size]

Model 1: Unity-8 Model
Factor 1- GEF
Inhibit .72 (.70) [.52]
Shift .73 (.68) [.54]
ECO .66 (.75) [.44]
Initiate .85 (.53) [.72]
WM .86 (.51) [.74]
P/O .88 (.48) [.77]
ORG .73 (.69) [.53]
Monitor .85 (.52) [.73]

Model 2: 2Original-8 Model
Factor 1- BRI
Inhibit .79 (.62) [.62]
Shift .83 (.56) [.69]
ECO .85 (.53) [.72]
Factor 2- MI
Initiate .85 (.53) [.72]
WM .87 (.49) [.76]
P/O .91 (.42) [.82]
ORG .74 (.67) [.54]
Monitor .84 (.55) [.70]

Model 3: 2Donders-8 Model
Factor 1- BRI
Shift .91 (.42) [.82]
ECO .80 (.60) [.65]
Factor 2- MI
Initiate .85 (.53) [.72]
WM .87 (.50) [.76]
P/O .89 (.45) [.80]
ORG .73 (.68) [.54]
Monitor .85 (.53) [.72]
Inhibit .70 (.72) [.49]

Model 4: Unity-9 Model
Factor 1- GEF
Inhibit .72 (.70) [.51]
Shift .73 (.68) [.54]
ECO .66 (.75) [.44]
S-Monitor .74 (.68) [.54]
Initiate .85 (.53) [.72]
WM .86 (.51) [.75]
P/O .88 (.48) [.77]
ORG .73 (.67) [.54]
T-Monitor .71 (.71) [.50]

Model 5: 2Monitor-9 Model
Factor 1- BRI
Inhibit .82 (.57) [.68]
Shift .79 (.61) [.63]
ECO .82 (.57) [.68]
S-Monitor .82 (.58) [.67]
Factor 2- MI
Initiate .84 (.54) [.70]
WM .88 (.48) [.77]
P/O .93 (.38) [.86]
ORG .75 (.67) [.56]
T-Monitor .73 (.68) [.54]

Model 6: 3Monitor-9 Model
Factor 1- BRI
Inhibit .85 (.53) [.72]
S-Monitor .84 (.54) [.71]
Factor 2- ERI
Shift .84 (.55) [.70]
ECO .87 (.49) [.76]
Factor 3- MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .93 (.38) [.86]
ORG .75 (.67) [.55]
T-Monitor .73 (.68) [.54]
Model 7: 4Monitor-9 Model
Factor 1-BRI
Inhibit .85 (.53) [.72]
S-Monitor .85 (.54) [.71]
Factor 2- ERI
Shift .84 (.54) [.71]
ECO .87 (.50) [.75]
Factor 3- Int MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .92 (.39) [.85]
Factor 4 – Ext MI
ORG .75 (.67) [.56]
T-Monitor .74 (.67) [.55]
Note. N = 354; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG
= Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF=
General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index;
ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External
Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor,
nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-
scale model; 4Monitor-9 = Four-factor, nine-scale model.
Table F3
Standardized Structure Coefficients for BRIEF-Parent Scales for Mother Rater Sample Arranged
by Model (Maximum Likelihood Extraction)
Scale: Structure Coefficient (Error Terms) [Effect Size]

Model 1: Unity-8 Model
Factor 1- GEF
Inhibit .68 (.74) [.46]
Shift .70 (.71) [.49]
ECO .62 (.78) [.39]
Initiate .85 (.53) [.72]
WM .88 (.48) [.77]
P/O .90 (.44) [.80]
ORG .72 (.70) [.51]
Monitor .85 (.53) [.72]

Model 2: 2Original-8 Model
Factor 1- BRI
Inhibit .76 (.65) [.58]
Shift .82 (.58) [.66]
ECO .84 (.55) [.70]
Factor 2- MI
Initiate .84 (.54) [.71]
WM .89 (.47) [.78]
P/O .92 (.39) [.85]
ORG .73 (.69) [.53]
Monitor .83 (.56) [.69]

Model 3: 2Donders-8 Model
Factor 1- BRI
Shift .90 (.43) [.81]
ECO .78 (.62) [.61]
Factor 2- MI
Initiate .85 (.53) [.72]
WM .88 (.47) [.78]
P/O .91 (.41) [.83]
ORG .72 (.69) [.52]
Monitor .84 (.54) [.71]
Inhibit .66 (.75) [.43]

Model 4: Unity-9 Model
Factor 1- GEF
Inhibit .67 (.74) [.45]
Shift .70 (.72) [.49]
ECO .62 (.79) [.38]
S-Monitor .71 (.71) [.50]
Initiate .85 (.53) [.72]
WM .88 (.47) [.78]
P/O .90 (.43) [.81]
ORG .72 (.69) [.52]
T-Monitor .71 (.70) [.51]

Model 5: 2Monitor-9 Model
Factor 1- BRI
Inhibit .81 (.59) [.65]
Shift .78 (.63) [.60]
ECO .80 (.60) [.64]
S-Monitor .82 (.58) [.67]
Factor 2- MI
Initiate .83 (.56) [.69]
WM .89 (.46) [.79]
P/O .94 (.34) [.88]
ORG .73 (.68) [.54]
T-Monitor .74 (.67) [.54]

Model 6: 3Monitor-9 Model
Factor 1- BRI
Inhibit .83 (.55) [.70]
S-Monitor .85 (.52) [.73]
Factor 2- ERI
Shift .83 (.56) [.69]
ECO .85 (.53) [.72]
Factor 3- MI
Initiate .83 (.56) [.69]
WM .89 (.47) [.79]
P/O .94 (.34) [.88]
ORG .73 (.68) [.54]
T-Monitor .74 (.68) [.54]
Model 7: 4Monitor-9 Model
Factor 1-BRI
Inhibit .83 (.56) [.69]
S-Monitor .86 (.52) [.73]
Factor 2- ERI
Shift .83 (.55) [.70]
ECO .84 (.54) [.71]
Factor 3- Int MI
Initiate .83 (.56) [.69]
WM .89 (.46) [.79]
P/O .94 (.35) [.88]
Factor 4 – Ext MI
ORG .74 (.67) [.55]
T-Monitor .75 (.66) [.56]
Note. N = 267; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG
= Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF=
General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index;
ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External
Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor,
nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-
scale model; 4Monitor-9 = Four-factor, nine-scale model.
VITA
Maria Carbone Smith 65 Fawnvue Drive
Robinson Twp., PA 15136
__________________________________________________________________________________________
Education:
2003-2013 The Pennsylvania State University M.S. (May 2006), Ph.D. (December 2013)
University Park, PA School Psychology
1999-2003 Clarion University of Pennsylvania B.A. (May 2003; GPA – 3.76)
Clarion, PA Psychology (major), Spanish (minor)
Publications:
Lei, P., Smith, M., & Suen, H. K. (2007). The use of generalizability theory to estimate data reliability in single-subject
observational research. Psychology in the Schools, 44, 433-439.
Watkins, M. W., Wilson, S. M., Kotz, K. M., Carbone, M. C., & Babula, T. (2006). Factor structure of the Wechsler
Intelligence Scale for Children- Fourth Edition among referred students. Educational and Psychological
Measurement, 66, 975-983.
Research Experience:
Research Assistant, Meaningful Science Consortium, Northwestern University, 2006
Research Assistant, POINT (Parent Observations of Infants and Toddlers) Instrument, 2005
Research Assistant, The Fund for the Improvement of Postsecondary Education (FIPSE)
Grant, Clarion University of Pennsylvania, 2002- 2003
Student Assistant, Statistics Education for Quantitative Literacy Project (SEQuaL) Clarion University of
Pennsylvania, 1999
Clinical Experience:
Doctoral School Psychology Intern, Cranberry Area School District, 2007-2008
CEDAR Clinic Student Supervisor, Penn State CEDAR Clinic, 2005-2006
School Psychology Practicum Intern, Indiana Area School District, 2006
School Psychology Practicum Intern, State College Area School District, 2005
School Psychology Student Clinician, Penn State CEDAR Clinic, 2003-2005
Work Experience:
Evaluator and Report Writer, Diagnostic and Treatment Specialists, Harmony, PA 2008-2011
Therapeutic Support Staff (TSS), Milestones Community Health, Indiana, PA, 2006
Trained volunteer, Stop Abuse For Everyone (SAFE), Community Shelter, Clarion, PA 2000-2003
Awards/Grants:
Conrad Frank, Jr. Graduate Fellowship for achievement, Penn State University, 2005-2006
State APSCUF Scholarship, Clarion University of Pennsylvania, 2001
Foundation Leadership Award, Clarion University of Pennsylvania, 2000
Continuing Professional Development:
Student member of National Association of School Psychologists
Student member of Association of School Psychologists of Pennsylvania
Association of School Psychologists of Pennsylvania (ASPP) Conference, State College, PA. Oct 2012.
Sexting and Online Solicitation: A Discussion for School Psychologists [Webinar]. April 18, 2013. Nathan,
Laurie. National Center for Missing and Exploited Children. Accessed from http://www.nasponline.org.
“I can’t get in trouble for one little e-mail, can I?”- What School Psychologists Need to Know about Law
and Electronic Communication [Webinar]. May 23, 2013. Haase, Karen. H & S School Law. Accessed
from http://www.nasponline.org.
“Normal Bilingual Language Development or Language Disorder?” [Webinar]. June 11, 2013. Casilleja,
Nancy. Pearson Assessments. Accessed from http://www.pearsonassessments.com.
Research Interests: Psychoeducational assessment, scale development, executive function