The Pennsylvania State University
The Graduate School
College of Education
TESTING THE FACTOR STRUCTURE OF THE BEHAVIOR RATING INVENTORY OF
EXECUTIVE FUNCTION (BRIEF) PARENT FORM USING A MIXED CLINICAL SAMPLE OF
YOUTH
A Dissertation in
School Psychology
by
Maria C. Smith
© 2013 Maria C. Smith
Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
December 2013
The dissertation of Maria C. Smith was reviewed and approved* by the following:
Beverly J. Vandiver
Associate Professor of Education
Dissertation Adviser
Co-Chair of Committee
Barbara A. Schaefer
Associate Professor of Education
Co-Chair of Committee
Lynn S. Liben
Distinguished Professor of Psychology
Hoi K. Suen
Distinguished Professor of Education
Kathleen J. Bieschke
Head, Department of Educational Psychology, Counseling, and Special Education
Professor of Counseling Psychology
*Signatures are on file in the Graduate School
ABSTRACT
Executive functions (EF) are cognitive processes that are controlled and coordinated during
complex tasks (Monsell, 1996). EF has become an increasingly popular construct in clinical evaluation and, more recently, in the school setting. If children struggle to perform basic classroom functions, such as inhibiting responses, regulating behavior, or predicting
outcomes, their academic success is likely to be compromised (Bull & Scerif, 2001; Palfrey et
al., 1985). The Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000) is
a behavior-rating scale designed to assess the behavioral characteristics related to executive-
function deficits of youth in school and home environments. However, there continues to be
debate regarding the current two-factor, eight-scale factor structure of the BRIEF-Parent form
when applied in a mixed clinical (or school) sample of school-age youth. This study examined
the factor structure of scores from the BRIEF-Parent form. Ratings were provided by 371
parents or guardians of children living in Western Pennsylvania whose children had been
referred for psychoeducational evaluation. The original model (i.e., 2-factor, 8-scale) currently
employed in the instrument was examined and compared to six alternative models. Results were
analyzed through confirmatory factor analysis (CFA). Findings indicated that in a mixed clinical sample of youth, four of the seven models showed a good fit to the data (e.g., 2-factor, 8-scale: CFI = .933, SRMR = .049; 3-factor, 9-scale: CFI = .956, SRMR = .041). Although there were only small differences between the models, RMSEA remained above the recommended cutoff (i.e., > .08), indicating some potential misfit in all models. Comparisons of the models indicated
that the 3-factor, 9-scale model fit the scores slightly better. These findings provide support for
the use of the two-factor, eight-scale version, which is the basis for the current BRIEF-Parent
form, but competing models fit the data just as well, if not better. Thus, the findings also raise
questions about the use of the BRIEF-Parent in its present format in the school setting.
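For readers less familiar with the fit indices reported above, the following sketch shows how RMSEA (Steiger, 1990) and CFI (Bentler, 1990) are computed from a model's chi-square statistic; the formulas are standard, but the numeric inputs below are purely illustrative and are not values from this study.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    # Root mean square error of approximation: misfit (chi-square in
    # excess of its degrees of freedom) scaled by df and sample size.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    # Comparative fit index: improvement of the tested model over the
    # baseline (independence) model, bounded between 0 and 1.
    excess_m = max(chi2_m - df_m, 0.0)
    excess_b = max(chi2_b - df_b, excess_m, 0.0)
    return 1.0 if excess_b == 0.0 else 1.0 - excess_m / excess_b

# Hypothetical chi-square values; n = 371 mirrors this study's sample size.
print(f"RMSEA = {rmsea(200.0, 19, 371):.3f}")       # RMSEA = 0.160
print(f"CFI   = {cfi(200.0, 19, 2000.0, 28):.3f}")  # CFI   = 0.908
```

An RMSEA above .08 (as in the hypothetical output) signals potential misfit even when CFI and SRMR look acceptable, which is why the abstract reports the indices jointly rather than relying on any one of them.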
Table of Contents
List of Figures .............................................................................................................................. viii
List of Tables ................................................................................................................................. ix
Acknowledgements ..........................................................................................................................x
INTRODUCTION ...........................................................................................................................1
LITERATURE REVIEW ................................................................................................................7
History of Executive Function .....................................................................................................7
Conceptualization of EF ...............................................................................................................9
Theory of unity .......................................................................................................................10
Theory of non-unity ................................................................................................................11
Underlying commonality ........................................................................................................13
EF as a cultural construct ........................................................................................................14
Executive Function in Children..................................................................................................16
Typical EF Development ........................................................................................................16
Role of EF in the learning environment .................................................................................19
The Behavior Rating Inventory of Executive Function .............................................................19
Parent Version ........................................................................................................................20
Description .........................................................................................................................20
Development ......................................................................................................................23
Normative sample ..............................................................................................................25
Evidence for factor structure ..............................................................................................25
U.S. versions ..................................................................................................................26
Translated versions ........................................................................................................30
Summary .......................................................................................................................34
Reliability Evidence of the BRIEF-Parent Form ....................................................................36
Internal consistency ...........................................................................................................36
Interrater reliability ............................................................................................................37
Test-retest reliability ..........................................................................................................38
Other Evidence for the Construct Validity of the BRIEF-Parent Form .................................38
Predictive validity ..............................................................................................................38
Convergent validity ............................................................................................................40
Independent research of convergent validity .....................................................................44
Convergent validity and specific clinical populations .......................................................47
Discriminant validity .....................................................................................................49
Ecological validity .........................................................................................................52
Social consequences ......................................................................................................54
Summary .................................................................................................................................57
Purpose of the Present Study ..................................................................................................58
METHOD ......................................................................................................................................61
Participants .............................................................................................................................61
Geographical Context .............................................................................................................63
Measures .................................................................................................................................63
Demographic information ..................................................................................................63
BRIEF-Parent form ............................................................................................................63
Procedure ................................................................................................................................65
CFA Guidelines and Models ..................................................................................................66
Models................................................................................................................................66
Fit criteria ...........................................................................................................................68
RESULTS ......................................................................................................................................79
Preliminary Analyses ..............................................................................................................79
Descriptive statistics ...............................................................................................................79
Confirmatory Factor Analyses ...............................................................................................81
Criteria .............................................................................................................................81
Models................................................................................................................................82
Eight-scale models ......................................................................................................82
Nine-scale models .......................................................................................................84
Eight- versus nine-scale models .................................................................................88
Subsamples .................................................................................................................90
OVR subsample .....................................................................................................90
Caucasian subsample .............................................................................................92
Mother subsample ..................................................................................................94
DISCUSSION ................................................................................................................................98
Eight-Scale Models of the BRIEF-Parent ..............................................................................98
Nine-Scale Models of the BRIEF-Parent .............................................................................102
Differences in Findings ........................................................................................................104
Reasons for Misfit ................................................................................................................106
Limitations ............................................................................................................................107
Implications ..........................................................................................................................109
Practice .............................................................................................................................109
Future research .................................................................................................................111
Conclusions ...........................................................................................................................113
REFERENCES ............................................................................................................................115
APPENDIX A ..............................................................................................................................135
Glossary of Acronyms .................................................................................................................135
APPENDIX B ..............................................................................................................................137
Items Comprising Scales on BRIEF-Parent form ........................................................................137
APPENDIX C ..............................................................................................................................138
School District Approval .............................................................................................................138
APPENDIX D ..............................................................................................................................139
Licensed Psychologist Approval .................................................................................................139
APPENDIX E ..............................................................................................................................140
Office for Research Protections Correspondence .......................................................................140
APPENDIX F...............................................................................................................................141
Structure Coefficient, Effect Sizes, and Error Terms for Subsamples ........................................141
Standardized Structure Coefficients for BRIEF-Parent for OVR Sample .......................141
Standardized Structure Coefficients for BRIEF-Parent for Caucasian Sample ..............143
Standardized Structure Coefficients for BRIEF-Parent for Mother Rater Sample .........145
LIST OF FIGURES
Figure 1. Unity-8 Model ....................................................................................................70
Figure 2. 2Original-8 Model ..............................................................................................71
Figure 3. 2Donders-8 Model ..............................................................................................72
Figure 4. Unity-9 Model ....................................................................................................73
Figure 5. 2Monitor-9 Model ..............................................................................................74
Figure 6. 3Monitor-9 Model ..............................................................................................75
Figure 7. 4Monitor-9 Model ..............................................................................................76
Figure 8. Standardized Coefficients of 3Monitor-9 Model ...............................................89
LIST OF TABLES
Table 1. Demographic Characteristics of Sample..............................................................62
Table 2. Composition of Models Organized by Factor and Indicator ..............................69
Table 3. Descriptive Statistics of Raw Scale Scores on the BRIEF-Parent Form ............80
Table 4. Summary of Fit Indices of CFA (ML Extraction) Models on the BRIEF-
Parent Form Scale Scores for a Mixed Disability Sample ........................................83
Table 5. Standardized Structure Coefficients for BRIEF-Parent Scales for Mixed
Disability Sample Arranged by Model (Maximum Likelihood Extraction) .............85
Table 6. Summary of Fit Indices of CFA (ML) Models of the BRIEF-Parent Form
for the OVR Sample ..................................................................................................93
Table 7. Summary of Fit Indices of CFA (ML) Models for the BRIEF-Parent Form
Based on the Caucasian Participants ..........................................................................95
Table 8. Summary of Fit Indices of CFA (ML) Models for BRIEF-Parent Form
Based on the Mothers as Raters .................................................................................97
Table 9. Root Mean Square Error Approximation (RMSEA) Values Arranged by
Model and Study ......................................................................................................106
Table 10. Percentage of Participants Receiving Special Education Services for
School Sample and School District by Category .....................................................109
Acknowledgements
They say it takes a village to raise a child. Since I’m in the process of raising two young
children as well as completing this dissertation, I will venture to say that it also takes a village to
complete a dissertation!
Thank you to my committee for helping me to complete this project. Dr. Beverly
Vandiver, my adviser, thank you for helping me to finish “lil D.” You kept me calm when my
nerves got the best of me, and helped me to realize that I was capable of completing this. Thank
you for encouraging me, but for also challenging me. You’ve helped me to turn this into
something I’m very proud of. Dr. Barbara Schaefer, thank you for becoming my “last minute
co-chair.” You’ve also served as a great role model to me in regards to balancing family and
academia. Dr. Hoi Suen, I enjoyed every class that I took from you. Thank you for including me
in many interesting projects and for helping me to more fully appreciate (and enjoy) the field of
measurement. Dr. Lynn Liben, you have always been extremely encouraging and are such a nice
person. I was honored to have such a renowned scholar on my committee. Thank you for your
contributions.
Thank you to Dr. Douglas Della Toffalo, who dedicated a lot of time and effort into
helping me complete this dissertation. You started off as a great internship supervisor and helped
foster my interest in executive function and the subfield of school neuropsychology. I can’t say
that I will miss all of those files, but I will always remember the help and kindness you extended
to me so that I could complete this goal of finishing this degree. Also thank you to Danielle
Wilson for your help with entering data.
This dissertation is dedicated to my family. First and foremost, thank you to my
wonderful, supportive, and amazing husband, Chad. You never stopped believing in me and
made this journey so much easier and more enjoyable. Thank you for taking over the household
and childcare duties on numerous occasions (and weekend trips) so that I had the opportunity to
work on this dissertation. I love you, and I know I couldn’t have done this without you. To my
kids, “Dr. Mommy” loves you more than I can possibly express. To my son, Drew, who just
turned five years old, thank you for your hugs and kisses and for the patience you have
demonstrated at such a young age. You are a bright, funny, energetic, and sweet little boy; I am
so blessed to have you in my life. To Lia, our little “pumpkin” who is now three years old, you
were always my little cheerleader! You are an inquisitive, expressive, sharp, and hilarious little
girl. You have been a fighter since the day you were born and you taught me to never give up. To
my sister, Caren, thank you for continuing to genuinely care about how things were progressing
for me after so many others had stopped asking. I have always looked up to you; you are an
amazing sister, mom, and friend. To my sister, Cathy, thank you for your support and
encouragement throughout this journey! To my parents, John and Elaine, thank you for serving
as role models as to the importance of pursuing my education. To my best friend, Anne, thank
you for being an amazing listener and for being there for me through so many ups and downs. To
my good friend, Katie, you have always made me laugh throughout our days at Penn State
whether that was honking at people or chasing buses. I’m so glad that the program brought us
together. To my dear friend, Maria (Big), I appreciate you being such a good friend over the
years. Thank you for graciously hosting me the night before my defense and for your help with
delivering my final copy on campus. To my cohort members (Katie, Kasey, Sharise, and Terry)
thank you being such great people and for helping to make the long hours in CEDAR as well as
studying for comps more enjoyable. And to those who I did not mention here, but have helped
me over the years to complete this dissertation, a heartfelt “thank you.”
INTRODUCTION
Since Congress enacted the Education for All Handicapped Children Act (Public Law 94-
142) in 1975, the largest increase (433%) in students identified for special education services has
been in the other health impairment (OHI) category (National Center for Education Statistics
[NCES], 2010). This law, commonly known as the Individuals with Disabilities Education Act (IDEA; 1997; Public Law 105-17), was reauthorized in 2004 as the Individuals with Disabilities Education Improvement Act (IDEIA; Public Law 108-446). Under the IDEA regulations (2006), a child may be provided with special education services under the OHI category if the following criteria are met:
Other health impairment means having limited strength, vitality, or alertness,
including a heightened alertness to environmental stimuli, that results in limited
alertness with respect to the educational environment, that—(i) Is due to chronic
or acute health problems such as asthma, attention deficit disorder or attention
deficit hyperactivity disorder, diabetes, epilepsy, a heart condition, hemophilia,
lead poisoning, leukemia, nephritis, rheumatic fever, sickle cell anemia, and
Tourette syndrome; and (ii) Adversely affects a child’s educational performance.
[§300.8(c)(9)]
One explanation for the growth in the number of students qualifying for services within the OHI
category is the increase in the diagnosis of attention deficit hyperactivity disorder (ADHD;
Akinbami, Liu, Pastor, & Reuben, 2011). Additionally, changes in diagnostic criteria for clinical
diagnoses such as autism spectrum disorders (ASD) in the fifth edition of the Diagnostic and
Statistical Manual of Mental Disorders (DSM-5) may indirectly result in an increase in students
qualifying for special education services under the OHI category. Another contributory factor to
the increased use of the OHI category is the survival rate of significantly premature (i.e., less
than or equal to 32 weeks gestation) infants (Alexander & Slay, 2002; Johansson & Cnattingius,
2010). The prevalence of specific learning disabilities and ADHD in premature infants without
neurological abnormalities is two to three times higher than in the overall population (Aylward,
2004). As of the 2008-2009 school year, an estimated 659,000 children aged 3 to 21 received
special education services under the OHI category, accounting for 10.2% of the special education
population in the United States (NCES, 2010).
Besides the regulations within IDEA, another federal law that mandates protection for
students with disabilities is Section 504 of the Rehabilitation Act of 1973 (commonly shortened
to Section 504; 34 C.F.R., Part 104), later amended to the Americans with Disabilities Act of
1990 (ADA), and further revised as part of ADA Amendments Act of 2008. The law decreed
that any entity, including schools, that receives federal funding must take measures to ensure
access and equal rights to services for those who have a “physical or mental impairment that
substantially limits one or more major life activities” (ADA, 2000; para. J1). The most recent
update in 2008 expands the current definition of a disability to be more inclusive of those with a
history or record of impairment, or of those who are perceived by others as having impairments.
A common problem for school psychologists and others responsible for applying the spirit of the
law in the school setting is determining what criteria of Section 504 qualify students for an
educational plan. Medical conditions can qualify a student for additional academic supports
under Section 504, even if the student does not meet the criteria to receive services under IDEA,
although only a small percentage of students (approximately 1%) fall within this categorization (Holler & Zirkel, 2008). Of this 1%, the most common qualifying impairment (approximately 80% of cases) has been ADHD.
Medical diagnoses, such as ADHD and many other disorders relating to developmental,
neurological, and psychiatric conditions, have been largely associated with deficits in emotional
control, memory, inhibiting responses and regulating behavior (Barkley, 1997; Palfrey, Levine,
Walker, & Sullivan, 1985). Difficulties in these areas have also been documented in those with
autism (Mayes, Calhoun, Mayes, & Molitoris, 2012), specific learning disabilities (Obrzut,
1995), and traumatic brain injury (Loken, Thorton, Otto, & Long, 1995). Problem behaviors such as impulsivity, poor organization, and weak self-monitoring skills pervade several IDEA special education categories; these impairments are not unique to those who qualify for services under the OHI label.
Thus, reliable and valid methods that can stand up to rigorous scrutiny are needed to identify children in need of services. Psychologists use many assessment methods,
including interviews, observations, and behavior rating scales, to gather important information
about students. Best practice involves gathering information across multiple sources and settings
(Hintze, Volpe, & Shapiro, 2007). However, psychologists are somewhat limited when
identifying students displaying behaviors that fall under the OHI category, because most medical
conditions, including ADHD, are more commonly diagnosed by medical professionals (Akinbami
et al., 2011). Such medical conditions can often have a negative impact on a child’s academic
achievement (Johnson & Reid, 2011). As more awareness is gained of the behavioral
components related to specific medical conditions, it is becoming increasingly apparent that a
reciprocal connection exists between these problem behaviors and academic success (Lane,
O’Shaughnessy, Lambros, Gresham, & Beebe-Frankenberger, 2002). As such, psychologists
and other professionals in the school setting are charged with identifying these students in a
standardized manner and providing them with an appropriate educational experience.
School psychologists commonly use behavior rating scales as part of an overall
evaluation to gather information about a student’s eligibility under IDEA or Section 504.
Behavior rating scales are typically composed of statements about a wide variety of behaviors.
Raters are asked to indicate the frequency of each behavior observed in a designated child. To be
useful in gathering large amounts of information across many areas of functioning, the scales
must be psychometrically sound (Blais, 2011; Rojahn et al., 2010), as well as cost- and time-
effective (Chafouleas, Riley-Tillman, & Sugai, 2007).
Parents are most commonly asked to complete behavior rating scales about their
children, but have a tendency to rate their children as having significantly more problems on all
behavior rating scales when compared to teacher ratings (Offord et al., 1996). However, parents
tend to be more accurate reporters of children’s hyperactivity and inattentiveness than children
themselves (Loeber, Green, & Lahey, 1990). Documented behaviors that are consistent across
both the home and school environments are important information for psychologists to gather. It
is equally important for psychologists to be aware of behaviors that differ based on the child’s
environmental setting. Essentially, the job of a school psychologist is to accurately survey
children’s potential mental and physical impairments, and determine how these factors may have
an impact on their lives and educational experience.
Although the OHI category encompasses a vast array of impairments, these health conditions can be examined not only in terms of their impact on the body, but also in terms of their possible influence
on the brain itself. This idea of examining the structure of the brain to fully understand
psychological processes and behaviors is the crux of the field of neuropsychology. The use of
neuropsychological concepts within the context of the school is growing, particularly over the
past decade (Hale & Fiorello, 2004). Psychologists' role is to adequately evaluate children displaying behaviors such as those listed above and to draw appropriate conclusions about their
eligibility for services. The term “executive function” (EF), which encapsulates these behaviors,
is drawn originally from neuropsychological literature and warrants further examination and
understanding.
In the last decade, EF has become a popular topic in applied settings, including the
school. Professionals who work with children are acknowledging that deficits in the area of EF
are important to consider because of their linkage to poor academic performance and problematic behaviors (Bull & Scerif, 2001). This importance is evidenced by the fact that in 2002 an entire issue of the peer-reviewed journal Child Neuropsychology focused on the Behavior Rating Inventory of Executive Function (BRIEF; Gioia, Isquith, Guy, & Kenworthy, 2000), and subsequent issues have included articles about the BRIEF. A search of the Wiley Online Library
revealed that between 2002 and 2012 Psychology in the Schools, a peer-reviewed school
psychology journal, has published 89 articles that focused on executive function.
Despite the attention given to the BRIEF, few studies, beyond those of the developers,
have examined the reliability and validity of its scores. A PsycINFO search reveals that
approximately 265 studies have used the BRIEF as a way of measuring executive function skills
in various populations from 2002 to 2012. In contrast, since the BRIEF’s development 12 years
ago, nine studies have examined the factor structure of the BRIEF-Parent form, two of which
were conducted by the authors of the instrument. Studies on the BRIEF’s factor structure have
been largely exploratory in nature (e.g., Batan et al., 2011; Slick et al., 2006). Only four studies
have conducted confirmatory factor analysis (CFA) on the BRIEF: (a) Gioia et al. (2002), as the
test developers, provided the initial examination, (b) Egeland and Fallmyr (2010) tested a
Norwegian version of the BRIEF, (c) LeJeune et al. (2010), including one of the test developers,
used the normative sample, and (d) Huizinga and Smidts (2011) examined a Dutch version of the
BRIEF. Thus, the purpose of the current study is to re-examine the two-factor, eight-scale model
currently being used in the BRIEF instrument. Additionally, the study will extend the research
on the factor structure of the BRIEF-Parent scores at the scale level by examining several
alternative factor structures. These factor structures will be examined through CFA in a U.S.
sample of youth who are in kindergarten through 12th grade and have mixed clinical diagnoses.
It is important to examine the factor structure of any instrument because accurate test
interpretations depend upon knowing the number of factors underlying the items of a measure.
The study will either strengthen or weaken the case that the present factor structure of the
BRIEF-Parent is sufficient for use and may have implications for the appropriate level of
interpretation for youth who are experiencing problems in executive functions. Because school
psychologists are charged with assessing and meeting the needs of a rapidly growing population
of children not only in OHI but in all special education categories, instruments that yield reliable
and valid scores must be used in assessment. Public schools receive federal funding, so by law they
have a responsibility to ensure access and equal rights to those with disabilities and those who
are perceived as having impairment. Such a disability may fall under OHI or any number of
special education categories when considering limited ability in the areas of self-regulation,
attention, emotional control and memory. Parent and teacher input regarding student behavior is
crucial and can be provided in a standardized manner through the BRIEF. This information may
help students succeed in school and, ultimately, in the future.
LITERATURE REVIEW
In 1996, Monsell called the current understanding of how cognitive processes are
controlled and coordinated during complex tasks (i.e., executive function [EF]) an “embarrassing
zone of almost total ignorance” (p. 93). Since then, a large amount of research has been
conducted about the topic, but a great deal of debate in understanding and using the EF concept
still exists in applied settings. A brief history of EF, including its conceptualizations and cultural
considerations, will be discussed. EF will also be addressed as it specifically relates to children
and the difficulties associated with using adult EF research in the realm of childhood assessment.
A practical way of analyzing executive dysfunction, namely the lack of socially appropriate
levels of executive functioning, is through observer ratings. Problems associated with this
methodology as well as the scope of measuring executive function will be reviewed. Then, the
Behavior Rating Inventory of Executive Function (BRIEF; Gioia, Isquith, Guy, & Kenworthy,
2000), one of the most popular observer rating scales of executive functioning, will be reviewed,
including its description and development. There are two versions of the “original” BRIEF, a
teacher form and a parent form. For the purposes of this study, the BRIEF-Parent will be
reviewed. Evidence for its factor structure will be provided through a review of previous
research and its psychometric properties. Finally, the purpose of the present study will be
provided.
History of Executive Function
The executive functions of inhibition and control have been studied in the field of
neurophysiology since the 1830s, with the focus on EF gradually making its way into the field of
psychology at the onset of the 20th
century (Lewis & Carpendale, 2009; Mead, 1910). Luria
(1961) and Vygotsky (1978), influenced by the initial research, investigated “higher” cognitive
processes, including planning, memory, and inhibition. The term “executive function” first
appeared in the 1970s, and was referred to as the “central executive” of the brain (Baddeley &
Hitch, 1974). Later, Lezak (1983) described executive function as “those capacities that enable a
person to engage in independent, purposive, self-serving behavior successfully” (p. 38) and as
being “necessary for appropriate, socially responsible … adult conduct” (p. 507). It is generally
agreed that many high-level cognitive functions are directly related to the prefrontal cortex
(PFC) of the brain and are labeled as executive functions (Luria, 1973).
In the 1970s and 1980s the information-processing approach was revised to include
“supervisory systems that regulate the flow of information and control behavior” (Lewis &
Carpendale, 2009, p. 2). The focus of this research was also on a working-memory model
(Baddeley & Hitch, 1974), which is now part of the current conceptualization of executive
function. As technology improved in the 1990s and 2000s, the study of executive function
became less about social interaction, and more about individual functioning and
neuropsychological pathways. With the development of magnetic resonance imaging (MRI),
positron emission tomography (PET), and electroencephalogram (EEG) scans, various neural
pathways have been examined, enabling researchers to pinpoint areas of the brain involved in the
executive function processes, particularly in the frontal lobe. Zelazo, Müller, Frye, and
Marcovitch (2003) indicate that research on executive function in the past two decades has
focused on pinpointing the specific component skills of executive function. Miyake et al. (2000)
used structural equation modeling to show that executive function is made up of distinct
entities that nonetheless share an underlying commonality. The vast amount of EF research, including
Miyake et al.’s, has been on an adult population, but there has been a shift toward examining EF
in children, including its development, and the best ways to model it. Furthermore, the focus has
been on how well EF research on adults could be generalized to understanding EF in children.
Researchers have questioned whether or not the adult conceptualization of brain processes could
help in understanding children in the areas of learning, emotional control, and other important
social skills. For example, Lehto, Juujärvi, Kooistra, and Pulkkinen (2003) have found mixed
results in children on the dimensions of executive function. In comparison to the findings of
Miyake et al. (2000) for adults, children showed some processes similar to those of adults as
well as some distinct processes at work. Clinicians working with children with neurological
difficulties also began examining their EF skills (Gioia, Isquith, Guy, & Kenworthy, 2000) as
well as developing interventions to address their deficits. Despite the progression of the research
on EF and its application, the construct is still an area for debate.
Conceptualization of EF
Executive function is considered to be composed of four areas: (a) goal formation
abilities, (b) ability to plan, (c) ability to carry out goal-directed plans, and (d) effective
performance (Lezak, 1983, p. 507). Each area helps explain how humans adapt their behaviors
as well as refrain from exhibiting inappropriate behaviors in order to meet a continually changing
environment. Some scholars designate EF as an umbrella term used to describe the control functions
of the PFC, particularly those of a goal-oriented nature (Best, Miller, & Jones, 2009).
Executive function is a construct that has eluded universal definition in spite of its
frequent appearance in neuropsychological literature (Jurado & Rosselli, 2007). The difficulty in
defining EF stems, in part, from inconsistent behaviors of those with damage in areas of the brain
believed to have a direct impact on executive function (Miyake et al., 2000). The debate
about the EF construct has been divided into three camps of conceptualization, each supported
by its own set of literature: (a) the existence of one underlying ability (the theory of unity), (b)
the existence of several, but distinct, brain processes (the theory of non-unity), and (c) a
combination of both unity and non-unity.
Theory of unity. The premise of the theory of unity is all executive processes combined
constitute an overarching, interconnected supervisory system commonly referred to as the
“central executive” (Baddeley, 1986; Norman & Shallice, 1986). This theory entails the
fundamental question of whether a single, underlying ability is responsible for a variety of
behaviors that have been labeled as executive functions. Due to the complexity of the various
systems involved in executive function, the idea that one system controls all aspects of executive
function is considered an outdated conceptualization. Baddeley (1996), one of the pioneers of
executive function research, states:
It is probably true to say that our initial specification of the ‘central executive’ was so
vague as to serve as little more than a ragbag into which could be stuffed all complex
strategy selection, planning, and retrieval checking that clearly goes on when subjects
perform even the apparently simple digit span task. (p. 6)
The idea of a central executive has often been disregarded in recent research (e.g.,
Packwood, Hodgetts, & Tremblay, 2011) because actions that are supposedly controlled by one
overarching entity of the brain lack specificity. An updated version of this theory is that both
general intelligence (g) and working memory are highly linked to a core factor of EF and the
organization of goal-directed behavior (Duncan, Emslie, Williams, Johnson, & Freer, 1996).
Duncan et al. (1996) contend that Spearman’s (1904) g is a direct reflection of the brain’s frontal
lobes. As such, a connection between consistent deficits in frontal lobe functionality and
measurement of g is not as evident in research because measures used to assess “average
performance on a diverse range of tests” (Duncan et al., 1996, p. 259), such as the Wechsler
Adult Intelligence Scale (WAIS; Wechsler, 1955), are not best suited for testing intelligence in
the clinical population (Teuber, 1972). Instead, Duncan et al. argue, tests that incorporate fluid
intelligence are better suited, although not frequently used, for assessing cognitive ability in the
clinical population. This aspect aligns with a theory in which several broad strata, together with
the overarching mechanism g, are considered responsible for intelligence (Carroll,
1993). Fluid intelligence is typically measured through novel problem solving with spatial or
verbal materials. When patient behavior is viewed through this framework, evidence of specific
and consistent deficits in patients with frontal lobe lesions becomes more apparent. Duncan et
al. focused particularly on demonstrating deficits in the area of “goal neglect,” which is defined
as “disregard of a task requirement even though it has been understood” (p. 265). In spite of
Duncan et al.’s persistence about the problems in conceptualizing and measuring intelligence,
evidence shows (e.g., Eslinger & Damasio, 1985; Shallice & Burgess, 1991a; Stuss &
Alexander, 2000) that patients with major PFC lesions can perform in the superior range of
intelligence tests (e.g., WAIS). Hence, the debate about the conceptualization of EF ensues,
which appears to be shifting in support of the concept of non-unity.
Theory of non-unity. Some scholars (Godefroy, Cabaret, Petit-Chenal, Pruvo, &
Rousseaux, 1999; Shallice & Burgess, 1991a) claim that EF is composed of numerous facets,
rejecting the notion of a core EF factor. Their argument, sometimes referred to as the theory of
non-unity, is based on the responses of patients with PFC lesions when administered cognitive
tests. Many patients with PFC lesions perform inconsistently on task-based executive function
tests (e.g., Tower of Hanoi [TOH]; Shallice & Burgess, 1991a) as well as on some cognitive
tests, such as the WAIS and Raven’s Progressive Matrices (Raven, Court, & Raven, 1988). If an
underlying single factor exists, and the function of the PFC is directly related to EF, then all
tasks purported to measure it should be difficult to perform for a patient with PFC damage (Stuss
& Benson, 1986).
Due to the ambiguity and confusion that have been associated with EF, its role in the
brain is sometimes conceptualized as a “black box.” More modern research has focused on
“decomposing the proposed ‘black box’ into more informative subcomponents” (Packwood,
Hodgetts, & Tremblay, 2011, p. 457). A common finding is that patients with frontal lesions do
not have consistent or predictable deficits in memory, recall, or attention (Goldberg, 2001).
Stuss and Alexander (2000) indicate that it can take researchers years to gather a sufficient
number of patients with well-defined frontal lesions. Even when such a sample is obtained,
individual differences may play a major role in how these patients perform on task-based
measures. In addition, the intercorrelations between EF tasks in many studies are often found to be
lower (i.e., r ≤ .40; Hughes & Graham, 2002; Lehto, 1996) than expected and, in turn, often
not statistically significant. In a small sample of 35 ninth-grade Finnish students, Lehto (1996)
found low intercorrelations (e.g., rs = -.18 to .06) among student performances on three
commonly used neuropsychological task-based measures (Wisconsin Card Sorting Test [WCST],
Heaton, 1981; TOH; & Goal Search Task [GST], Vilkki & Holst, 1989; see Appendix A for a
glossary of acronyms). The WCST and TOH are among the most common instruments used in
research to directly measure executive functioning skills (e.g., Beck, Schaefer, Pang, & Carlson,
2011; Slomine et al., 2002). Lehto (1996) claimed, based on these results, that the
intercorrelation should be higher if a central executive function exists.
A problem that is indirectly a result of the shift toward a theory of non-unity is that
attention has been placed on parsing out individual processes rather than focusing on the
commonality of various factors. Séguin and Zelazo (2005) note that although factor analysis has
been useful in clarifying various constructs of executive function over the past 20 years,
researchers’ views on the underlying performance results tend to vary between studies.
Packwood et al. (2011) provide an example of this phenomenon in that it is difficult to see how
the factor of “visual processing” examined in one study (Floyd, Bergeron, & Hamilton, 2004) is
distinct from “visuospatial storage-and-processing coordination” examined in a separate study
(Fournier-Vicente, Larigauderie, & Gaonac’h, 2008). Packwood et al. have called for
transparency in the labeling of subcomponents in order to expedite the comparison between
studies and to decrease the ambiguity of EF constructs. As with most debates, there also exists a
group of researchers who have attempted to combine ideas from both theories.
Underlying commonality. A camp of researchers contends that the best
conceptualization of EF is the incorporation of both arguments (e.g., Fisk & Sharp, 2004; Lehto
et al., 2003; Miyake et al., 2000). The premise is that EF processes are “clearly distinguishable”
from one another, but each process is still related to some degree and “share some underlying
commonality” (Miyake et al., 2000, p. 72). In an individual difference study of 137
undergraduate students, Miyake et al. (2000) administered a battery of widely used executive
tasks (e.g., WCST and TOH) to examine three commonly postulated executive functions:
shifting, updating, and inhibition. Confirmatory factor analysis (CFA) indicated that the best fit
of the data could be described through a three-factor model reflecting shifting, updating, and
inhibition. Miyake et al. (2000) also reported that the target executive function factors were
moderately correlated (rs = .42 to .63), but separable. A higher-order model was proposed in
which the commonality of the three factors was emphasized. However, the authors deemed the
“reduced” model as “good” since the “fit indices [met] standard criteria and [the] χ² difference
test indicate[d] that the model’s fit [was] not statistically worse than the fit of the full model” (p.
73).
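The χ² difference test referenced above compares two nested models: if the constrained (reduced) model raises χ² by less than the critical value for the change in degrees of freedom, its fit is not statistically worse. A minimal sketch of that decision rule follows; the fit statistics used here are invented placeholders, not Miyake et al.’s (2000) values:

```python
# Critical chi-square values at alpha = .05 (standard table values).
CRIT_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}

def chi_sq_difference(chi_full, df_full, chi_reduced, df_reduced):
    """Return (delta_chi, delta_df, worse) for two nested models.

    worse=True means the reduced model fits significantly worse
    than the full model at alpha = .05.
    """
    delta_chi = chi_reduced - chi_full  # constraints can only raise chi-square
    delta_df = df_reduced - df_full
    return delta_chi, delta_df, delta_chi > CRIT_05[delta_df]

# Invented example: the reduced model adds two constraints.
d_chi, d_df, worse = chi_sq_difference(chi_full=120.4, df_full=84,
                                       chi_reduced=123.5, df_reduced=86)
print(round(d_chi, 1), d_df, worse)  # 3.1 2 False -> not statistically worse
```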
Although Miyake et al.’s (2000) study is frequently cited in the executive function
literature, there are a number of limitations. One is the small number of participants for a CFA
(N = 137). Furthermore, participants were undergraduate students, who may have had
above-average intelligence or socioeconomic status, making generalizations to the adult population
difficult. Finally, the use of CFA does not directly address the nature of EF. Instead, the
findings indicate that the three selected measures are tapping distinct aspects of EF.
EF as a cultural construct. Even though a lack of clarity exists about the definition of
EF, scholars generally agree that EF plays a major role in human behavior and for that reason
must be studied further. These functions enable individuals to organize their thoughts, create a
plan, carry out the plan, and persevere on a task until it is completed. These functions are
considered essential to success in school and work settings as well as in
everyday life (Barkley, 1997). The construct of EF itself, as with any construct, is inherently
based on one’s beliefs and views of the world. It is important to question whether the construct
as measured (in this case EF) is changed or affected in some manner because it is subject to
societal norms or opinions. It may be practically impossible to determine this partiality;
however, it is worth considering whether the term executive function may be biased based on the
language used to label it or the framework from which it stems. An example would be those
behaviors that are considered undesirable in the school and work setting, such as lack of
inhibition and emotional control, and are clustered into the category of “executive dysfunction.”
These behaviors may or may not be considered inappropriate in various populations or cultures
worldwide. Cultural norms vary in terms of the display of emotion, interpretability of behaviors,
and societal norms that lie at the heart of those behaviors labeled as executive functions. The act
of “guiding, directing, and managing cognitive, emotional, and behavioral functions,” as Gioia et
al. (2000, p. 1) note, seems to be necessary in any society.
Theory and research in developmental psychopathology stem mostly from research
conducted in Western cultures; thus, little research had been conducted on developing culturally
sensitive modes of intervention as recently as the late 20th
century (Coll, Akerman, &
Cicchetti, 2000). In the past decade, culturally specific research in the area of self-regulation,
particularly anger control and anger suppression, has begun to emerge. For example, in some cultures
overt expression of anger demonstrated by males may be considered socially acceptable.
However, Martinez, Schneider, Gonzales, and del Pilar Soteras de Toro (2008) demonstrated in a
group of 498 middle-school Cuban students that both males and females who displayed anger
control tended to be more likely to be rated by peers as well-liked, labeled as best friends, and
considered leaders than those students rated by peers as having difficulty in controlling their
anger. Additionally, in a group of 166 Korean American adolescents between the ages of 11 and
15, anger suppression was linked to depressive symptoms whereas weaker anger control and
greater outward anger expression were associated with externalizing problems (Park, Kim,
Cheung, & Kim, 2010).
Research conducted in non-Western cultures consistently demonstrates the reciprocal
relationship between self-regulation and socio-emotional competence and adjustment (e.g.,
Eisenberg, Liew, & Pidada, 2004; Martinez et al., 2008; Park et al., 2010). For instance,
Eisenberg et al. (2004) surveyed a group of 112 Indonesian students in third grade and three
years later in sixth grade. Students were asked to nominate and rank the four classmates they liked
most and the four classmates they liked least. Additionally, three teachers were asked to rate each
student in terms of regulation, social functioning, and negative emotionality. Results indicated
that boys’ results tended to hold across time and across reporters more consistently than girls’,
but ultimately, good self-regulation and low negative emotionality were good predictors of
positive socio-emotional functioning in both sexes.
Because the concept of EF extends across cultures and lifestyles, and may heavily affect
individuals’ interaction with their learning environment, educators have become increasingly
interested in the concept. In a PsycINFO search, Bernstein and Waber (2007) reported that, in
1985, there were only five peer-reviewed articles about EF in education-related journals. Similar
publications almost tripled (14) in 1995. By 2005, over 500 articles were published in
education-related journals about EF. Thus, it appears that educators realize the impact that EF
may have on children’s educational experience (Best & Miller, 2010). Research involving the
development of executive functioning and its role in the learning process began to increase in the
field of education and psychology.
Executive Function in Children
Typical EF development. Extensive focus on EF in the adult population subsequently
led to a call for research to be conducted on children to obtain a better grasp of it
developmentally (Lewis & Carpendale, 2009). Hughes and Graham (2002) claim that the body
of literature on children and EF is still in its early stages for three main reasons. One, until
recently the PFC was incorrectly believed to become functionally mature only once a person reached
adolescence. Two, early examinations of soldiers who endured head injuries in war were
misinterpreted to suggest that the effects of PFC lesions were not apparent, or rather, not realized until adulthood
(Stuss & Benson, 1986). And three, tests used to measure executive function were traditionally
difficult in nature, making it challenging and inappropriate to use them to assess children. The
shift in the research population has resulted in the use of less complex instruments in assessing
similar functions in children. Simplifying instruments can sometimes lead to inappropriate
interpretation from examiners beyond the scope of the instrument as well as greater
manipulability of task component demands (Best, Miller, & Jones, 2009). These alterations to
the original (i.e., adult) instruments raise questions whether such changes may alter what is
actually measured.
Most of the research that initially focused on children in the 1980s and 1990s was on the
atypical development of EF, most commonly in ADHD and autism populations (Hughes &
Graham, 2002). Recently, the research has shifted toward examining normal executive function
development. EF skills can be fostered for all children through verbal scaffolding, playing
games that require sustained attention and planning as well as through giving children legitimate
choices and decision-making power (Dawson & Guare, 2009). Best et al. (2009) note, however,
that a disproportionate number of the test participants in many research studies are preschool age
(ages 2 to 5 years) and speculate that this imbalance has occurred for several reasons. One is that
researchers believe a great deal of understanding can be gained by focusing on this age range
when executive functioning is first observable and the types of behaviors associated with
executive functioning need to be activated in social or educational settings. As the brain
develops, many of the executive functions that are measurable in adults are just beginning to
emerge. Tasks designed to assess EF in children tend to be less complex than adult tasks, making
them simpler and creating less confusion in attempting to single out specific EF abilities. For
example, an exercise in complex response inhibition that children may be asked to complete is
known as “Baby Stroop.” This exercise involves matching small cups and spoons, and large
cups and spoons. The child is then told to play a “topsy-turvy” game and is given instructions to
match small “baby” spoon to big cup, and large “mommy” spoon to small cup. This task differs
from the commonly known Stroop Color-Word Test (Stroop, 1935), often given to the adult
population to test the same construct. In the child’s version a physical object is available for the
participant to touch, and there are only two variables to manipulate. In the Stroop Color-Word
Test for adults, there are no physical objects for the participant to touch and there are more than
two variables. Consequently, testing children requires the examiner to change the tasks from that
required of the adult population. It is legitimate to question whether or not the adult and child
tasks are tapping the same construct (Garon, Bryson, & Smith, 2008).
Studying EF in preschool children is an important line of inquiry. However, it is equally
important to broaden the scope of EF research to include examination of the school-age
population. By expanding the age range to include all youths, a better grasp of development in
executive functioning can be examined. Romine and Reynolds (2005) conducted a meta-
analysis of EF studies, which involved samples ranging from age 5 to adulthood, and concluded that the
greatest increases in EF occurred in verbal fluency, planning, design fluency, and inhibition of
perseveration from ages five to eight years old. Additionally, the “sleeper effect” may exist,
meaning that individual differences as a young child may not show noticeable effects until
middle school (Best & Miller, 2010). An example of the sleeper effect would be the seemingly
minor effects of EF on theory of mind as a preschooler (i.e., having the mental capacity to
interpret and predict one’s own and other people’s behavior). The negligible abnormalities in EF
may appear to be inconsequential at such an early age to the child’s social interaction, but may
balloon into major social deficits as a teenager (Best et al., 2009). It is becoming increasingly
apparent that EF deficits that could lead to social and emotional problems may start in young
children as problems that seem small and unobtrusive, and then develop into major deficits in
adolescence or adulthood. The development of EF is especially important as children enter a
formal learning atmosphere.
Role of EF in the learning environment. After age five, most children are involved in
school as well as more non-family social settings, both of which require increased self-control.
Executive function is important to understand in the learning environment because of the
repercussions from executive dysfunction. If children are not able to adequately perform basic
classroom functions, such as inhibiting responses, regulating behavior, or predicting outcomes,
their academic success is likely to be compromised (Bull & Scerif, 2001; Palfrey et al., 1985).
An important link may exist between early executive functioning and future academic
achievement. Clark, Pritchard, and Woodward (2010) tested preschool-aged children (at age
four) using individual executive function tasks (e.g., TOH) as well as teacher ratings of executive
functioning using the BRIEF-Preschool version (BRIEF-P; Gioia, Espy, & Isquith, 2003). Based
on a teacher-rated measure of mathematics achievement, students who performed well on the
tasks at age four relative to peers were also rated higher than their peers at age six. However,
these researchers also found the converse to be true about early executive function delay.
Children who showed delays in executive functioning development during their preschool years
also tended to have below average mathematics performance two years later. These findings
replicate and extend prior findings in this area. Children who have been identified as having
specific learning difficulties in mathematics have also been found to experience difficulties in the
areas of inhibitory control, set shifting, and working memory (Bull & Scerif, 2001).
The Behavior Rating Inventory of Executive Function
The Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000) is a
behavior rating scale designed to assess the behavioral characteristics related to executive
function deficits of youth in the school and home environments. The BRIEF is probably the
best-known instrument designed to measure EF through a questionnaire format (Thorell &
Nyberg, 2008). Gioia et al. (2000) indicate that the goal from the outset was to “develop a
psychometrically sound measure of executive function in children that would be easy to
administer and score and would yield clinically useful information about commonly agreed upon
domains of executive function” (p. 35).
There are two versions of the original BRIEF (the Parent form and the Teacher form;
Gioia et al., 2000), which are intended for youth ages five through 18. There are several
variations of each version of the BRIEF, which are designed for different age ranges. The
BRIEF-Preschool version (BRIEF-P; Gioia, Espy, & Isquith, 2003) is available for both parents
and teachers to rate children between the ages of 2 and 5. Two self-report versions were created
for individuals to rate their own behavior: one for youth between the ages of 8 and 18 years
(BRIEF- Self-Report [BRIEF-SR]; Guy, Isquith, & Gioia, 2004) and an adult version (BRIEF-
Adult; Roth, Isquith, & Gioia, 2005), suitable for those 18 to 90 years old. Additionally, an
informant version is available as part of the BRIEF-Adult for those persons who are in frequent
contact with the adult being evaluated. For the purposes of this study, the parent version of the
BRIEF for youth ages 5 through 18 is reviewed.
Parent Version
Description. The BRIEF-Parent form is an 86-item questionnaire, in which
parents/guardians are asked to rate problematic behaviors of their child. Responses are
aggregated to form eight clinical scales: (a) Inhibit, (b) Shift, (c) Emotional Control, (d) Initiate,
(e) Working Memory, (f) Plan/Organize, (g) Organization of Materials, (h) Monitor; and two
validity scales: (i) Inconsistency, and (j) Negativity. The Inhibit scale measures the ability to
suppress impulses and to stop one’s own behavior at the proper time. The Shift scale assesses
the ability to move freely from one situation, activity, or aspect of a problem to another without
“getting stuck” on a topic; it also taps behaviors relating to transitioning, tolerating change, and
problem-solving flexibly. The Emotional Control scale relates to the ability to modulate emotions,
such as anger, and to avoid rapid mood changes. The Initiate scale measures the ability to begin
a task or activity, and to independently problem-solve or generate ideas. The Working Memory
scale assesses the capacity to hold information in mind for the purpose of encoding information
and achieving goals. The Plan/Organize scale assesses abilities to develop appropriate steps
ahead of time in order to carry out events in a systematic manner, and to prioritize tasks in a
fashion that is not haphazard. The Organization of Materials scale relates to abilities to maintain
orderliness in everyday situations. The Monitor scale relates to abilities to keep track of one’s
own and others’ efforts through “work-checking” behaviors (Gioia et al., 2000, p. 17).
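Scale raw scores on instruments of this kind are simple sums of item ratings. The aggregation can be sketched as follows, assuming the BRIEF’s three-point Never/Sometimes/Often response format; the item-to-scale mapping shown is invented for illustration, since the actual assignment of the 86 items is part of the copyrighted instrument:

```python
# Sketch of aggregating item ratings into clinical scale raw scores.
# The item-to-scale mapping is hypothetical; the real BRIEF mapping
# is defined in the published scoring materials.
RATING = {"Never": 1, "Sometimes": 2, "Often": 3}

SCALE_ITEMS = {  # hypothetical item numbers per scale
    "Inhibit": [1, 9, 17],
    "Shift": [2, 10, 18],
    "Emotional Control": [3, 11, 19],
    "Working Memory": [4, 12, 20],
}

def scale_raw_scores(responses):
    """responses maps item number -> 'Never' / 'Sometimes' / 'Often'."""
    return {scale: sum(RATING[responses[i]] for i in items)
            for scale, items in SCALE_ITEMS.items()}

responses = {i: "Sometimes" for i in range(1, 21)}
responses[1] = "Often"
print(scale_raw_scores(responses))  # Inhibit = 7, the others = 6
```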
Gioia et al. (2000) attempted to address the area of bias through the Inconsistency scale
and the Negativity scale. The Inconsistency scale is designed to gauge how often a rater answers
similar questions in an inconsistent manner. For example, a rater may answer Never in response
to Item 44 (Gets out of control more than friends), but also answer Often in response to Item 54
(Acts too wild or out of control; Gioia et al., p. 15). If such inconsistency emerges across similar
items throughout the instrument, a high Inconsistency score will be associated with the BRIEF-
Parent scores. Thus, Gioia et al. recommend that clinicians examine the protocols carefully
when the Inconsistency scale is abnormally high (≤ 6 is “acceptable;” 7 to 8 is “questionable;”
and ≥ 9 is “inconsistent”; p. 15). Examiners also need to inquire about the inconsistencies
identified. If the rater’s explanations of the inconsistencies are reasonable, then the scores from
the protocol should still be considered valid. If explanations are not reasonable, the rating scale
should not be used as a source of information.
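The logic of the Inconsistency check can be sketched as follows. The cutoffs come from Gioia et al. (2000) as cited above; the rule of summing absolute rating differences across designated item pairs is an assumption made here for illustration, and only the (44, 54) pair comes from the manual’s example; the remaining pairs, and the small number of pairs, are invented:

```python
RATING = {"Never": 1, "Sometimes": 2, "Often": 3}

# Similar-content item pairs. Pair (44, 54) is the manual's example;
# the other pairs are invented placeholders (the actual form uses a
# larger designated set of pairs).
PAIRS = [(44, 54), (7, 25), (11, 33)]

def inconsistency(responses):
    """Sum absolute rating differences across similar-item pairs.

    The difference-sum rule is an assumption; the cutoffs follow
    Gioia et al. (2000): <= 6 acceptable, 7-8 questionable,
    >= 9 inconsistent.
    """
    score = sum(abs(RATING[responses[a]] - RATING[responses[b]])
                for a, b in PAIRS)
    if score <= 6:
        label = "acceptable"
    elif score <= 8:
        label = "questionable"
    else:
        label = "inconsistent"
    return score, label

responses = {44: "Never", 54: "Often", 7: "Sometimes", 25: "Sometimes",
             11: "Often", 33: "Often"}
print(inconsistency(responses))  # (2, 'acceptable')
```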
The Negativity scale is also used to examine validity of a rater’s responses by measuring
how often the rater answers BRIEF-Parent items in an abnormally negative manner in relation to
the clinical samples. Nine specific items make up the Negativity scale (e.g., Item 8 “Tries same
approach to a problem over and over even when it does not work”). Gioia et al. (2000)
designated that these items represented a distinct scale because all could be answered in an
“unusually negative manner” (p. 16), even though these items are also contained on other
subscales. The higher the raw score obtained, the more likely it is that the rater has a negative
perception of the child. A negative perception may influence the rater’s objectivity when rating
children’s behaviors. Inflated scores resulting from a rater’s perception are not a problem unique
to the BRIEF-Parent, but affect any observer rating scale (Denckla, 2002). The other possibility,
however, is that the child truly may have severe executive dysfunction resulting in higher overall
scores in various areas. A score of 5 or more is considered “elevated” (Gioia et al., 2000, p. 14),
and a score of more than 7 “reflects either an excessively negative perception of the child or that
the child may have substantial executive dysfunction” (Gioia et al., 2000, p. 15). If the
Negativity scale score is high, the examiner is prompted to investigate the reason behind the high
score and should make a decision regarding whether the protocol can be used as a valid source of
information.
Responses to the eight clinical scales are combined into three composite scores: the
Behavioral Regulation Index (BRI), the
Metacognition Index (MI), and the Global Executive Composite (GEC). The BRI is a composite
of the Inhibit, Shift, and Emotional Control scales and “represents the…ability to shift cognitive
set and modulate emotions and behavior via appropriate inhibitory control” (Gioia et al., 2000; p.
20). The remaining scales (Initiate, Working Memory, Plan/Organize, Organization of Material,
and Monitor) are combined to reflect the MI score. Gioia et al. (2000) defined the MI as the
“ability to cognitively self-manage tasks and reflects the child’s ability to monitor his or her
performance” (p. 21). Finally, the BRI and MI scores are combined to form the GEC, which is
defined as a “summary score that incorporates all eight clinical scales of the BRIEF” (Gioia et
al., 2000, p. 21) and reflects an individual’s overall executive functioning based on the given
responses. Because interpretation can occur at various levels (the scales, the index scores, or the
overall GEC), practitioners may find it difficult to decide how to interpret the BRIEF. For a
cursory or screening-level examination of the scores, Gioia et al. (2000) recommend using the
eight scales because they can be charted and visually inspected. Scores for each scale and composite are
expressed through norm-referenced T scores and percentiles based on either a national norm
group or by gender in the norm group.
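The T-score metric described above follows the standard norm-referenced conversion (mean of 50, standard deviation of 10). A minimal sketch is given below; the norm-group mean and standard deviation used are illustrative placeholders, not values from the BRIEF manual, which supplies lookup tables by norm group.

```python
# Hedged sketch: standard T-score conversion (mean = 50, SD = 10).
# The norm-group mean and SD below are hypothetical, not BRIEF values.

def to_t_score(raw_score, norm_mean, norm_sd):
    """Convert a raw scale score to a norm-referenced T score."""
    z = (raw_score - norm_mean) / norm_sd
    return 50 + 10 * z

# A raw score one SD above the (hypothetical) norm mean maps to T = 60.
print(to_t_score(30, norm_mean=24, norm_sd=6))  # 60.0
```

In practice, scores at or above T = 65 are commonly treated as clinically elevated on behavior rating scales of this kind.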
Development. Gioia et al. (2000) created items for the BRIEF based on their clinical
experience as well as a review of neuropsychological literature. A group of general education
teachers, special education teachers, and reading specialists reviewed a pool of 180 parent items
for clarity and ease of reading, resulting in the removal of 51 items. The authors and 12
independent reviewers (i.e., “neuropsychologists in hospital and university-based clinical
practice,” p. 36) evaluated the remaining 129 items. No additional items were removed
following this review. The 129-item version initially comprised nine scales: (a) Inhibit,
(b) Shift, (c) Emotional Control, (d) Working Memory, (e) Sustain, (f) Plan, (g) Organize, (h)
Monitor, and (i) Initiate. To refine the scale, the parent form was then administered to 212
parents, whose children were enrolled at a local school. An iterative item-total correlation
process was used to eliminate items in a stepwise fashion; however, no additional items were
eliminated following this analysis. Principal factor analysis (PFA), with an orthogonal rotation, was run
on the intended items for each scale to identify the factor structure of the items and to further
refine each scale. The nine analyses resulted in one primary factor for each scale (Gioia et al.,
2000). No items were eliminated from any of the scales after the PFA, but all items were re-
examined by the authors to ensure that each item aligned with the authors’ conceptualization of
the EF construct.
The standardization of the BRIEF was conducted using the 129 items, with another
iterative item-total correlation reliability process applied to “several larger clinical samples”
(Gioia et al., 2000, p. 37). The descriptions in the BRIEF manual are vague, and no depth is
provided about the clinical sample. Gioia et al. reported that the results supported the pre-
existing nine scales and that “larger and more reliable datasets allowed for final editing of the
scales” (p. 37). Whatever process the authors used resulted in the selection of 86 items for the
final version of the BRIEF, rather than the 129-item version that was used in the scale’s
standardization. Results indicated that the intercorrelations between some of the BRIEF scales
(Working Memory and Sustain, r = .96; Plan and Organize, r = .94) approached singularity. Thus, Gioia et
al. combined the respective scales and streamlined each set of items to be reflective of one scale.
This process resulted in the scales of (a) Working Memory and (b) Plan/Organize and in a
reduction of nine scales to seven. Nine items were identified that did not fit well in any of the
remaining scales, but Gioia et al. determined the items to be important to children’s everyday
functioning. Thus, the Organization of Materials scale was created from these items, resulting in
86 items across eight scales on the BRIEF. The validity scales were developed based on the
frequency of responses of inconsistency or negativity across all the items. Thus, the
Inconsistency scale is computed by a sum of raw difference scores between specific paired items.
The Negativity scale is computed by summing raw scores of specific items, in which a higher
raw score indicates a greater degree of negativity.
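The arithmetic of the two validity scales just described can be sketched as follows. The item numbers, pairings, and example responses are hypothetical; the actual scoring key belongs to the published manual.

```python
# Hedged sketch of the BRIEF validity-scale arithmetic described above.
# Item pairings and the example responses below are hypothetical.

def inconsistency_score(responses, item_pairs):
    """Sum of absolute raw differences between designated similar item pairs."""
    return sum(abs(responses[a] - responses[b]) for a, b in item_pairs)

def negativity_score(responses, negativity_items):
    """Sum of raw scores on the designated Negativity items."""
    return sum(responses[i] for i in negativity_items)

# Responses coded 1 = Never, 2 = Sometimes, 3 = Often, keyed by item number.
responses = {8: 3, 44: 1, 54: 3}
print(inconsistency_score(responses, [(44, 54)]))  # |1 - 3| = 2
print(negativity_score(responses, [8]))            # 3
```

Answering Never to Item 44 but Often to Item 54, as in the earlier example, contributes a difference of 2 to the Inconsistency total; the Negativity total is simply the sum of the designated items’ raw scores.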
Normative sample. The normative group consisted of 1,419 parent ratings of students
between the ages of 5 and 18 with no history of special education or psychotropic medication
usage. Additionally, no more than 10% of items could be missing in order to be included in the
normative dataset. Attempts were made by Gioia et al. (2000) to mimic the population of the
United States, based on such variables as gender, socioeconomic status (SES), race/ethnicity,
age, and geographical population density. Participants were obtained through samples of both
private and public schools in a variety of settings (urban, suburban, and rural) in the state of
Maryland. Twenty-five schools were sampled: 12 elementary, nine middle, and four high
schools. Additionally, 18 adolescents, who were in a typical control group in a study examining
traumatic brain injury, were recruited to take part in the normative study.
Evidence for factor structure. Nine studies were found that have examined the factor
structure of the BRIEF-Parent version with various clinical populations. Five studies were based
on U.S. samples (Donders, DenBraber, & Vos, 2010; Gioia, Isquith, Retzlaff, & Espy, 2002;
Hulac, 2008; LeJeune et al., 2010; Slick, Lautzenhiser, Sherman, & Eryl, 2006) and four studies
occurred outside of the U.S. (Batan, Öktem-Tanör, & Kalem, 2011; Egeland & Fallmyr, 2010;
Huizinga & Smidts, 2011; Qian & Wang, 2007). The U.S. versions are reviewed first and then
the translated versions. In three studies, validity scales were explicitly acknowledged, but were
only used in two to consider the inclusion of cases in statistical analyses (Donders et al., 2010;
LeJeune et al., 2010; Slick et al., 2006). LeJeune et al. used all the BRIEF scores regardless of
results from the validity scales. In contrast, Donders et al. used the validity scales to establish
which cases would be involved in the primary analyses. Thus, the scores of eight BRIEF forms
were eliminated due to an unusual degree of negativity or inconsistent responding. Slick et al.
reported screening each BRIEF protocol in relation to the validity scales, but no cases were
eliminated.
U.S. versions. Confirmatory factor analyses (CFA) were run on the BRIEF in two (Gioia
et al., 2002; LeJeune et al., 2010) of the five U.S. studies and exploratory factor analysis (EFA)
was used in the other three studies (Donders et al., 2010; Hulac, 2008; Slick et al., 2006). Gioia
et al. (2002) conducted a series of CFAs to establish the factor structure of the BRIEF-Parent
form. Instead of testing the BRIEF’s factor structure based on eight scales, Gioia et al. tested the
factor structure based on nine scales. The Monitor scale was divided into two separate scales
(Task-Monitoring and Self-Monitoring). Each BRIEF scale is considered to reflect an executive
function that is distinct, but related to each other by overarching executive systems reflected
through the BRI, MI, and GEC composites. The scales were treated as indicators and the factors
reflected the creation of composite scales. A minimum of two indicators was used to create a
factor. Based on maximum likelihood extraction, four models (one-, two-, three-, and four-factor)
were tested in a sample of 374 children aged 5-18 years (M = 9.06 years, SD = 2.73) with mixed
clinical diagnoses (e.g., ADHD, learning disabilities, autism spectrum disorders, and affective
disorders). The one-factor model, a general executive function factor, was composed of all nine
scales. The two-factor model consisted of Behavioral Regulation (Inhibit, Shift, Emotional
Control, and Self-Monitor) and Metacognition (Initiate, Working Memory, Plan/Organize,
Organization of Materials, and Task-Monitor). The three-factor model, a reconfiguration of the
nine scales, resulted in testing a Behavior Regulation factor (Inhibit and Self-Monitor scales),
Emotional Regulation factor (Emotional Control and Shift scales), and a Metacognition factor
(Working Memory, Initiate, Plan/Organize, Organization of Materials, and Task-Monitor). The
four-factor model was composed of the prior structure of the Behavior Regulation and Emotional
Regulation factors plus the subdivision of the Metacognitive factor into “Internal” Metacognition
(Initiate, Working Memory, and Plan/Organize) and “External” Metacognition factor
(Organization of Materials and Task-Monitor).
The baseline one-factor model (general executive functioning) had the worst fit relative
to the other proposed models (χ2/df = 17.41; CFI = .77; SRMR = .09; RMSEA = .21) based on
minimum fit criteria (comparative fit index [CFI] > .95; standardized root mean square residual
[SRMR] < .08; root mean squared error of approximation [RMSEA] ≤ .06; a χ2/df ratio < 5).
Gioia et al. (2002) determined that the best fit was the three-factor model (CFI = .95; SRMR =
.04; RMSEA = .11; χ2/df = 5.42); however, the fit of the three-factor model was less than ideal
based on its present form (i.e., RMSEA; χ2/df). Gioia et al. revised the three-factor model post-
hoc by correlating some of the error terms. According to Byrne and Shavelson (1996), this
decision must be based on theory rather than on post-hoc analyses. Gioia et al.’s rationale
for re-specifying the model was based on Barkley’s (1997) work; namely, inhibition is related to
other executive function processes, such as working memory, emotional control, and
organization. By estimating these error covariances, the three-factor model fit was significantly
improved (CFI = .97; SRMR = .03; RMSEA = .08; and χ2/df ratio = 3.4). However, some SEM
experts may consider this adjustment a limitation. The issue is whether estimated correlated
errors are appropriate and reflect an actual fit of the data to the model or whether such a
procedure has inflated the actual fit of the models to the data (Byrne, 2006). Additionally, no
higher-order models were tested, even though such a model is explicitly stated as part of the test
authors’ conceptualization of executive function.
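The minimum fit criteria cited above can be expressed as a simple check. The thresholds below are those listed in the text, applied to the fit statistics reported by Gioia et al. (2002); the function itself is an illustrative sketch, not part of any study.

```python
# Sketch of the minimum fit criteria listed above:
# CFI > .95, SRMR < .08, RMSEA <= .06, and chi-square/df < 5.

def meets_fit_criteria(cfi, srmr, rmsea, chi2_df):
    """Return which of the cited minimum fit criteria a model satisfies."""
    return {
        "CFI > .95": cfi > .95,
        "SRMR < .08": srmr < .08,
        "RMSEA <= .06": rmsea <= .06,
        "chi2/df < 5": chi2_df < 5,
    }

# The one-factor model reported by Gioia et al. (2002) fails all four criteria:
print(meets_fit_criteria(cfi=.77, srmr=.09, rmsea=.21, chi2_df=17.41))
```

Applying the same check to the re-specified three-factor model (CFI = .97; SRMR = .03; RMSEA = .08; χ2/df = 3.4) shows that the RMSEA criterion still fails even after the error covariances were estimated.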
LeJeune et al. (2010) examined a 24-item abbreviated version of the BRIEF in two
samples (i.e., Normative and Confirmed ADHD) and submitted the results to a CFA. Results
indicated a two-factor solution fit the data well (χ2 = 521.03, df = 19, p = .31; goodness of fit
index [GFI] = .92; CFI = .95; RMSEA = .05; 90% CI = .04, .07). The two-factor solution was
also found to be invariant across gender and age groups. A limitation of this study was that a
majority of the cases analyzed (86.7%) were based on data from the original normative sample
collected by the test authors, so it was not independently conducted. Gioia and Isquith are the
two lead authors of the original BRIEF and both contributed to this study. Another limitation
was that, in the confirmed ADHD sample, the Monitor and Initiate scales on the short form had
relatively weak correlations (e.g., r = .56 to .61) with the original BRIEF scales. In the Normative
sample, the Initiate scale also had a relatively low correlation with the original BRIEF scales (.60).
LeJeune et al. (2010) explained these low correlations were due to the “recruitment procedures
for the sample… [that] may have differentially attracted parents with very marked concerns” (p.
190). However, another explanation is that the short form may not be configured to accurately
capture the specific behaviors and their severity, which are considered hallmark symptoms of
ADHD. Thus, the validity of the scores on the short-form of the BRIEF is insufficient to warrant
its use and further research is needed.
The factor structure of the eight-scale BRIEF was examined in a sample of 100 children
(ages 6-16) affected by traumatic brain injury (TBI). Donders, DenBraber, and Vos (2010) used
EFA, with maximum likelihood extraction, to identify two latent constructs. These findings
were similar to those obtained with the standardization sample (Gioia et al., 2000), except for
some variations. Donders et al. found that the Inhibit scale loaded on the MI factor rather than
loading on the BRI factor, as Gioia et al. (2000) reported. This finding suggests that, in children
with TBI, the Inhibit scale may be reflecting a more cognitive, rather than behavioral, aspect of
impulse control.
Donders et al. (2010) acknowledged that one of the limitations of this study was the
population. Recruitment was from rehabilitation referrals, so the severity of the cases of TBI
was greater in this particular population than the general population of children with TBI. Thus,
Donders et al. indicate that their sample may not have been an accurate reflection of the varied
degrees of TBI in the general population, limiting the generalizations of the findings.
Additionally, the study had a small sample size (N = 100) for a factor analysis. The location of
the Inhibit scale on a different factor than had been found before is noteworthy and should be
explored in future research.
Hulac (2008) examined the factor structure of the BRIEF-Parent form via EFA (principal
components analysis) in a sample of 93 adolescent females living in residential treatment
facilities. Hulac reported that the one-factor solution (general executive functioning) best
described the BRIEF structure for the sample. Hulac attributed the identification of a one-factor
solution to the higher rate of underlying psychological conditions among the adolescents (e.g.,
anxiety, depression, or bipolar disorder), an indication that the BRIEF may not
be invariant across psychological conditions. A limitation of Hulac’s study was the small sample
size.
Slick et al. (2006) submitted the original BRIEF (eight scales) to principal factor analysis
using a clinical sample of 80 children diagnosed with intractable epilepsy. Based on the Kaiser-
Guttman rule, a one-factor structure was identified with moderately high communalities (e.g.,
Plan/Organize = .81) for all indicators with the exception of Organization of Materials (.57).
However, both two- and three-factor solutions (oblique rotation) were tested because 71% of the
nonredundant residuals were greater than .05. Slick et al. reported that the two-factor solution
(the Behavioral Regulation and Metacognition Indices), as originally described in the test
manual, was a better solution for the data. The Metacognition Index factor was comprised of
four scales (Plan/Organize, Working Memory, Initiate, & Organization of Materials) and
Behavioral Regulation Index had three scales (Emotional Control, Shift, & Inhibit). The Monitor
scale loaded equally on both factors, which seems to support Gioia et al.’s (2002) view that the
Monitor scales reflect two distinct scales—Self-Monitor and Task-Monitor. Slick et al. provided
little information on the three-factor solution, indicating that it was “explored” but “produced a
factor with no salients” (p. 186) and was therefore disregarded as a viable solution.
A limitation of the study is the small sample size (N = 80) for a factor analysis.
Furthermore, the “eigenvalue rule of 1” (Kaiser-Guttman) is a dated criterion for determining the
number of factors to retain in a factor analysis; more acceptable methods include minimum
average partial and parallel analysis (Thompson & Daniel, 1996). Additionally, salient structure/pattern
coefficients are recommended to be above |.40| (Fabrigar, Wegener, MacCallum, & Strahan,
1999).
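Parallel analysis, one of the retention methods recommended above in place of the Kaiser-Guttman rule, retains only those factors whose observed eigenvalues exceed eigenvalues obtained from random data of the same dimensions. The sketch below is a minimal illustration on simulated data; the sample size, loadings, and item count are arbitrary, not drawn from any BRIEF study.

```python
# Hedged sketch of Horn's parallel analysis on simulated (not BRIEF) data.
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Retain factors whose eigenvalues exceed the 95th percentile of
    eigenvalues from random normal data of the same dimensions."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, 95, axis=0)
    return int(np.sum(observed > threshold))

# Simulate 300 respondents on 8 items driven by a single common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal(300)
data = np.outer(factor, np.full(8, 0.8)) + 0.6 * rng.standard_normal((300, 8))
print(parallel_analysis(data))  # 1 factor retained
```

With a single strong common factor, only the first observed eigenvalue clears the random-data threshold, whereas the Kaiser-Guttman rule can over- or under-extract depending on sample size and item count.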
Translated versions. In four studies, the factor structure of translated and adapted
versions of the BRIEF-Parent form has been tested: (a) a Norwegian version (Egeland &
Fallmyr, 2010); (b) a Dutch version (Huizinga & Smidts, 2011); (c) a Turkish version (Batan,
Öktem-Tanör, & Kalem, 2011); and (d) a Chinese version (Qian & Wang, 2007). CFAs were
used in three of the studies (Egeland & Fallmyr; Huizinga & Smidts; Qian & Wang) and an EFA
was run in one study (Batan et al., 2011). Two studies (Egeland & Fallmyr; Huizinga & Smidts)
are reviewed in-depth and two are briefly summarized, as only the abstracts of the Batan et al.
and Qian and Wang studies are available in English.
Egeland and Fallmyr (2010) examined the factor structure of an 86-item Norwegian
version of the BRIEF parent form. The sample was 158 Norwegian children with no diagnosis
(48 controls) or mixed clinical diagnoses (72 school psychology referrals; 38 mental health
outpatients). Fourth grade children (estimated age 10 years; 23 boys; 25 girls) were used as
controls in the study. The school and mental health referrals composed the clinical sample (86
boys; 26 girls) with an average age of 10.9 years (SD = 2.6). CFA (extraction method
unspecified) was conducted to test five models; the scales were treated as indicators. Three
models tested the BRIEF based on the original eight scales: (a) a one-factor model; (b) a two-
factor model of BRI and MI; and (c) a three-factor model of Emotional Regulation (ERI), BRI,
and MI. Two models tested the BRIEF based on nine scales by dividing the Monitor scale into
the two scales described above: (a) a two-factor model of BRI and MI; and (b) a three-factor model of ERI,
BRI, and MI. Egeland and Fallmyr (2010) found that the best fit was the three-factor,
nine-scale version (CFI = .96; RMSEA = .14; χ2/df = 3.26). The baseline single-factor model
(general executive functioning) had the worst fit relative to the other models (χ2/df = 8.97; CFI =
.86; RMSEA = .23). The findings replicated Gioia et al.’s (2002) results.
A limitation of Egeland and Fallmyr’s (2010) study was the sample size (N = 158), which
is considered somewhat low for the number of parameters in the model (Comrey & Lee, 1992).
Although a Norwegian translation of the BRIEF was used with a Norwegian sample, this cultural
approach was offset by the use of American norms. Finally, the RMSEA was still high in the
three-factor model (RMSEA = .14), which is above the recommended criterion of .06 (Hu &
Bentler, 1999) and is indicative of a misfit of the model to the data.
Huizinga and Smidts (2011) examined the factor structure of a Dutch adaptation of the
BRIEF, which contained 75 items instead of 86. Parents of 847 Dutch school children (431 boys
and 416 girls) were recruited through “regular schools throughout the Netherlands” (p. 54) and
filled out the rating scale. Huizinga and Smidts (2011) conducted both item level and scale
CFAs to test the structure of the eight-scale BRIEF. Discrete factor analysis via Mplus was run
on a 72-item, eight-factor model, which was based on all of the items used in the clinical scales
of the BRIEF. The non-normed fit index (NNFI) was .92 and the RMSEA was .109. Modifications
were made due to three items related to handwriting skills, which resulted in an increase in the
NNFI (.95) and a decrease in RMSEA (.087). Both were considered improvements to the model.
The second set of analyses comprised multigroup CFAs on a two-factor model of the BRIEF
scale scores across four age groups: (a) 5 to 8, (b) 9 to 11, (c) 12 to 14, and (d) 15 to 18. To
ensure that the same factors were invariant across the age groups, the CFAs were first conducted
with no equality constraints imposed across the groups. Then, equality constraints on the
loadings of the observed indicators on the factors were imposed across the age groups, and finally, the model was tested
based on whether the factor intercepts were equivalent across the age groups. The two-factor
model comprised the BRI (Inhibit, Shift, and Emotional Control scales) and the MI (Initiate,
Working Memory, Plan/Organize, Organization of Materials, and Monitor).
Without any constraints, the two-factor model was considered to be a poor fit across the
groups (NNFI = .929; RMSEA = .129). Thus, the no constraint model was modified to allow for
two sets of residuals to correlate (Inhibit and Shift; Inhibit and Monitor), which resulted in a
better fit (NNFI = .97; RMSEA = .083). Using the modified model as the baseline, the
subsequent models ([a] equal factor loadings and [b] equal intercept) were tested. The models
did not degrade the fit across the age groups: Equal loadings, NNFI = .975; RMSEA = .078;
Equal Intercept, NNFI = .965, RMSEA = .092. These findings indicated that the two-factor
Dutch version of the BRIEF was factorially invariant across the age groups and that any mean
differences found between age groups could be interpreted as such. A limitation of this study
was that the eight-scale model was run twice: the first run indicated poor fit, so three parameters
were freely estimated and the model was re-run to improve fit. It is debatable
whether this post-hoc fitting was warranted and if the model (without this modification) simply
did not fit the data.
Qian and Wang (2007) evaluated the reliability and validity of the scores for a Chinese
version of the BRIEF-Parent form in a sample of school-age children: 216 diagnosed with
ADHD, schizophrenia, or autism, and 311 labeled as “normal controls.” Confirmatory factor
analysis was conducted on the eight scales. In the abstract, Qian and Wang noted, “the eight-
scale model of the BRIEF was reasonable” (p. 277). No other information about the CFA was
available in English.
In a sample of Turkish youth (213 girls, 99 boys) between the ages of 5 and 18, Batan et al.
(2011) examined the reliability and validity of both the parent and teacher versions of the BRIEF
scores to establish normative standards. Only the findings of the parent version are reported
here. Batan et al. conducted an EFA on the eight scales, reported a two-factor solution, and
concluded that the solution was consistent with the original factor structure. All other
information about the factor analysis or structure of the BRIEF was in Turkish.
Because the BRIEF was developed in the United States and therefore originated from the
perspective of a Western culture, it is important to carefully examine research using translated
versions of the instrument. Some of the behaviors surveyed (e.g., inhibition) may be both
intrapersonally and contextually altered when adapted to assess students in other countries. For
example, research indicates that family conflict is positively linked to externalizing problems in
Korean American youth and that expression of emotion is often discouraged (Park et al., 2010).
On the other hand, in many Hispanic cultures, it is socially acceptable for males to display
“machismo,” which encompasses aggressiveness, hypermasculinity, and overexpression of anger
(Harris, 1996). Because the results of the BRIEF are based upon frequency of behaviors from
the perspective of a rater, the results may be culturally dependent. Although researchers purport
to be testing the same constructs, some of the studies (e.g., Huizinga & Smidts, 2011) do not
contain the same number of items as the original form of the BRIEF, meaning that direct
translation is not possible. Some of the translated test versions (Batan et al., 2011; Qian &
Wang, 2007) do not explicitly state the number of items in the translated versions of the BRIEF.
Summary. In general, the findings indicate that six (Batan et al., 2011; Donders et al.,
2010; Huizinga & Smidts, 2011; LeJeune et al., 2010; Qian & Wang, 2007; Slick et al., 2006) of
the nine studies support the original two-factor, eight scale version of the BRIEF-Parent. Two
studies (Egeland & Fallmyr, 2010; Gioia et al., 2002) provide support for the three-factor, nine-
scale version of BRIEF-Parent, in which the Monitor scale is split into two separate scales (Self-
Monitor and Task-Monitor) and contains a third factor of Emotional Regulation.
Studies providing support for the two-factor, eight-scale version of the BRIEF-Parent
have some unique characteristics that may limit generalizability of the findings. Some had small
sample sizes and focused on specific clinical diagnoses, such as TBI and intractable epilepsy
(Donders et al., 2010; Slick et al., 2006). Others used the standardization sample (LeJeune et al.,
2010) or one described as “normal school children” (Huizinga & Smidts, 2011). These samples
are not necessarily generalizable to the U.S. special education population. Although some studies
occurred outside the U.S., and the BRIEF had to be translated, the findings were similar to those
from the U.S. studies. For two studies (Batan et al., 2011; Qian & Wang, 2007), it would have
been helpful to have additional information beyond abstracts to evaluate their findings. One
study (Egeland & Fallmyr, 2010) provides independent support of the three-factor, nine-scale
version. Both the Egeland and Fallmyr (2010) and Gioia et al. (2002) studies were based on
mixed clinical diagnoses samples, making the findings generalizable to special education
populations. Although both Gioia et al. (2002) and Huizinga and Smidts (2011) made post-hoc
modifications to improve model fit, Egeland and Fallmyr reported similar findings without such
modifications. Hulac’s (2008) findings were an anomaly in which a one-factor solution of the
eight scales was identified. No other studies identified via EFA or CFA a one-factor
solution/model as the best structure. However, Hulac used a small sample of 93 adolescent
females in residential treatment centers. The small sample size, as well as the unique sample,
could have contributed to the emergence of such findings. Another unique finding was reported
by Donders et al. (2010), in which the Inhibit scale loaded on the MI factor instead of the BRI
factor. Again, the findings were based on a small sample of 100 children diagnosed with TBI;
both aspects could have contributed to the unique findings.
In summary, the BRIEF-Parent form has received support for the two-factor, eight-scale
version, which is the current configuration for test use. The support appears to be based on small
clinical samples (Donders et al., 2010; Slick et al., 2006). The three-factor, nine-scale version
has been based on mixed clinical diagnoses samples, but currently its support is based on two
studies (Egeland & Fallmyr, 2010; Gioia et al., 2002). However, the fit statistics for the three-
factor models did not meet the recommended criteria (e.g., RMSEA = .14; Egeland & Fallmyr,
2010). Based on both sets of studies (eight- and nine-scale versions), it is important to continue
investigating the nature of the factor structure of the BRIEF-Parent in unique clinical samples as
well as mixed clinical diagnoses samples. It is particularly important to examine (and scrutinize)
the current factor structure of the BRIEF instrument in a mixed clinical sample because of the
similarity of this type of sample to a special education population. Given the increased use of EF
constructs and the BRIEF instrument in the school setting to examine children’s academic
difficulties, the current factor structure must be psychometrically sound. Both types of studies
(i.e., clinical samples and mixed clinical samples) are necessary to strengthen the case that the
BRIEF-Parent is a useful diagnostic tool for populations of youth who are experiencing problems
in executive functions. Its usefulness starts with whether the factor structure is the same across
diverse populations of youth. If the factor structure is not the same, then the usefulness of the
BRIEF-Parent is limited, or its scale structure needs to be re-examined.
Reliability Evidence of the BRIEF-Parent Form
Internal consistency. Internal consistency estimates (Cronbach’s alpha; Cronbach, 1951)
for the scores of the BRIEF scales, indexes, and GEC have ranged from .82 (Initiate) to .98 (GEC)
in the clinical sample and from .80 (Initiate) to .97 (GEC) in the normative sample (Gioia et al., 2000, p. 51). A
general rule of thumb is that values above .80 are preferable for psychoeducational or clinical
tasks; values above .90 are considered “excellent” (Sattler, 2001, p. 102). Huizinga and Smidts
(2011) reported reliability estimates of the BRIEF scores that ranged from .78 (Initiate) to .90
(Working Memory) for the scales. Cronbach’s alphas for the composite scores (BRI, MI, and
GEC) were between .93 and .96. Item-total correlations for all scales were above the benchmark
of .30 established by Nunnally and Bernstein (1994). Batan et al. (2011) also reported
sufficiently high reliability estimates of the scores for the Turkish version of the BRIEF-Parent,
which ranged from .60 to .94 (no scales were specified) for the scales. Qian and Wang (2007)
noted that Cronbach’s alphas for the scales ranged from .74 to .96. No scales were linked to
specific coefficients, except for the authors’ reporting a low estimate for a scale labeled “initial”
(.61). It is possible that the authors meant the Initiate scale, which might have been misspelled
during translation. Using normative sample data, LeJeune et al. (2010) reported internal
consistency for the scale scores of the BRIEF short-form ranging from .68 (Initiate) to .81
(Emotional Control) and from .86 (BRI) to .93 (GEC) for the composite scores. Across the
reliability estimates reported, those for the Initiate scores have usually been the lowest.
In three studies (Huizinga & Smidts, 2011; LeJeune et al., 2010; Qian & Wang, 2007), the
reliability estimates for the Initiate scores were below .80. Two of these studies were translated
versions of the BRIEF (Huizinga & Smidts, 2011; Qian & Wang, 2007) and the third examined
the short form of the BRIEF in comparison to the original form (LeJeune et al., 2010). These
studies used forms that were different from the original version of the BRIEF, which may
explain the low reliability estimates. In particular, the Initiate scale contains items relating to
beginning a task or activity as well as independently generating ideas. This scale includes items,
such as “Does not take initiative” or “Needs to be told to begin a task even when willing,” which
may reflect culturally-bound behaviors.
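Cronbach's alpha, the internal-consistency estimate discussed throughout this section, can be computed as alpha = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). A minimal sketch on made-up ratings (not BRIEF data) follows.

```python
# Sketch of Cronbach's alpha on hypothetical ratings (not BRIEF data):
# alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance).
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: rows are respondents, columns are items on one scale."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Perfectly parallel items (each respondent answers identically across
# items) yield alpha = 1.0; noisier items lower the estimate.
ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [2, 2, 2]]
print(cronbach_alpha(ratings))  # 1.0
```

Against this metric, the benchmarks cited above (above .80 preferable, above .90 "excellent") describe how much of the total-score variance reflects shared, rather than item-specific, variation.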
Interrater reliability. The BRIEF manual provides information about interrater
agreement (teacher-parent, parent-parent, and teacher-teacher; Gioia et al., 2000). Correlations
between teachers and parents have ranged from .15 (Organization of Materials and Shift
scales) to .50 (Inhibit scale; Mdn r = .24). No specific information was provided in the manual
about correlations between parents or correlations between teachers. According to Gioia et al.
(2000), interrater reliability estimates between different types of raters (here, teacher-parent
pairs) are expected to be lower than those between the same type of raters due to the difference in
settings in which the child is observed. Thus, Gioia et al. consider these findings to reflect
differences in environmental structure between home and school as well as different expectations
in terms of organization in the school setting (lockers or materials given to students). No
independent study has examined the interrater reliability of the BRIEF scores, although
correlations between parent and teacher ratings are often lower (.30 to .50) than parent-parent or
teacher-teacher interrater reliabilities (Achenbach et al., 1987), yielding different patterns of
agreement (e.g., Jepsen, Gray, & Taffe, 2012).
Test-retest reliability. A group of 54 parents, who served as part of the normative
sample, was given the BRIEF-Parent form to complete twice about their child over a two-week
period. Test-retest correlations ranged from .76 to .85 (Mdn r = .81). For the parent clinical
sample (n = 40), the reliability coefficients of the scores were slightly lower (.72 to .84; Mdn r =
.79) over an average of three weeks (Gioia et al., 2000).
Huizinga and Smidts (2011) also examined test-retest reliability for the Dutch version of
the BRIEF using intraclass correlation coefficients (ICCs), with the following criteria: ICC < .2 = very low,
.2 to .4 = low, .4 to .6 = intermediate, .6 to .8 = high, and .8 to 1.0 = very high (Landis & Koch,
1977). All composite scores (BRI, MI, and GEC) were above .8, and the individual scales ranged
between .73 (Working Memory) and .94 (Inhibit). Qian and Wang (2007) reported that test-retest
reliability estimates of the scores ranged from .68 to .89 (no scales or composites specified) for the
BRIEF in a sample of school-age children in China.
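The Landis and Koch (1977) benchmarks amount to a simple lookup. A sketch (the helper name is hypothetical, for illustration only):

```python
def icc_band(icc):
    """Classify an intraclass correlation coefficient per Landis & Koch (1977)."""
    if icc < 0.2:
        return "very low"
    elif icc < 0.4:
        return "low"
    elif icc < 0.6:
        return "intermediate"
    elif icc < 0.8:
        return "high"
    return "very high"

# Test-retest ICCs reported by Huizinga and Smidts (2011):
print(icc_band(0.73))  # Working Memory → "high"
print(icc_band(0.94))  # Inhibit → "very high"
```

Under these bands, all of the Dutch-version scale and composite ICCs fall in the high or very high ranges.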
Other Evidence for the Construct Validity of the BRIEF-Parent Form
Several types of validity are useful in the interpretation of scores from a scale (Messick,
1995). Five types of validity are addressed in relation to the BRIEF-Parent form: (a) predictive
validity, (b) convergent validity, (c) discriminant validity, (d) ecological validity, and (e) social
consequences.
Predictive validity. Pratt (2000) examined the BRIEF-Parent ratings of 212 children
between the ages of 6 and 11 years. Participants comprised four groups: (a) ADHD,
(b) Reading Disorder (RD), (c) ADHD + RD, and (d) controls. Children with ADHD were found to
have statistically significantly more problems on all BRIEF scales, and children with RD had
statistically significantly elevated BRIEF scores on the Sustain, Working Memory, and Plan scales
in comparison to the other groups. Pratt concluded that, based on the BRIEF scores, those in the
ADHD + RD group could be distinguished from the RD and control groups, but not from the
ADHD group.
A major limitation of Pratt’s (2000) study was that an 80-item version of the BRIEF was
used before the instrument was officially published. This 80-item version had nine scales and
differed from the 86-item, eight-scale version that was officially published. For example, in the
80-item version, Plan and Organize represented two scales, whereas the official BRIEF has a
scale that combines the two constructs—Plan/Organize. As a result of these differences, results
from Pratt’s study cannot be directly compared to those studies that used the official version.
Two other studies (Mahone et al., 2002; McCandless & O’Laughlin, 2007) have
examined the predictive validity of the BRIEF-Parent form. Mahone et al. looked at the ratings
of parents of 76 children (18 ADHD; 21 Tourette’s syndrome [TS]; 17 TS + ADHD; and 20 controls).
The Inhibit and Working Memory scales were elevated in the ADHD group, but not in the other
groups. Also, correlations were not statistically significant between the BRIEF scales and
various task-based or psychoeducational measures. Mahone et al. concluded that it is difficult to
separate ADHD from other clinical groups associated with EF deficits solely by using the
BRIEF-Parent form and recommended that it be used in conjunction with other measures. A
limitation of the study was the small sample size (N = 76).
The predictive validity of the BRIEF-Parent scores has been largely based on
differentiating between ADHD subtypes (ADHD-Inattentive and ADHD-Combined).
McCandless and O’Laughlin (2007) looked at 70 boys and girls between the ages of five and 13
referred to a university-based clinic for assessment of ADHD, hypothesizing that individuals
identified with ADHD would demonstrate higher scores on the BRIEF-Parent
scales than those without ADHD. Specifically, they predicted that the BRI would be elevated in the
ADHD-Combined subtype and that the Working Memory scale would be elevated in both subtypes
in comparison to controls not diagnosed with ADHD. The findings not only supported this
premise, but the MI also was found to be elevated in both subgroups relative to the control
group. Because this sample was clinic-referred and contained a small number of
participants, the results may not generalize to the broader population.
In summary, the predictive validity of the BRIEF-Parent scales is limited. Various
BRIEF scales (BRI, MI, and Working Memory) were elevated in subgroups of the ADHD
sample (McCandless & O’Laughlin, 2007), but the correlations were not statistically significant
between the BRIEF-Parent scales and various task-based or psychoeducational measures
(Mahone et al., 2002). Pratt’s (2000) study demonstrated that elevated BRIEF scores existed in
the ADHD and RD samples, but the findings may have limited generalizability due to the use of
an unofficial version of the measure. Because of the variability in findings across the predictive
validity studies, it is recommended that the BRIEF not be used by itself for diagnostic purposes,
but in combination with various sources of information (Mahone et al., 2002).
Convergent validity. The effect of executive function on behavior should be reflected in
the convergence between established rating scales purported to measure the same (or similar)
behaviors (Messick, 1995). Gioia et al. (2000) tested for convergent validity between the
BRIEF-Parent scales and four measures supposedly tapping several similar constructs. No direct
comparison could be made between the BRIEF and other rating scales of executive function
because none existed at the time of standardization of the BRIEF. The first test of convergent
validity was with an ADHD measure. Parents of 100 clinically referred children completed the
BRIEF-Parent version and the ADHD-Rating Scale-IV (ADHD-IV; DuPaul, Power,
Anastopoulos, & Reid, 1998), which has two composite scales: Inattention and Hyperactivity-
Impulsivity. About half of the BRIEF-Parent scales (Working Memory, Plan/Organize, Initiate,
and Monitor plus the Metacognitive Index) were moderately correlated (r = .54 to .67) with the
ADHD-IV Inattention scale. The remaining BRIEF scales correlated between .39 and .49 with
the Inattention scale. Four BRIEF-Parent scales (Inhibit, Shift and Emotional Control scales and
the Behavioral Regulation index) were moderately correlated (range = .56 to .73) with the
ADHD-IV Hyperactivity-Impulsivity scale (Gioia et al., 2000). The remaining BRIEF-Parent scales
were also significantly correlated with the Hyperactivity-Impulsivity scale, with correlations ranging from
.33 to .45, with the exception of the Organization of Materials scale, which was not statistically
significantly correlated with it (r = .15).
Gioia et al. (2000) also examined the relation between the BRIEF-Parent scales and
Achenbach’s Child Behavior Checklist (CBCL; Achenbach, 1991). The CBCL has eight scales:
Withdrawn, Somatic Complaints, Anxious/Depressed, Social Problems, Thought
Problems, Attention Problems, Delinquent Behavior, and Aggressive Behavior, and two broadband
domains: Internalizing and Externalizing. Based on what the respective measures purport to
measure, Gioia et al. (2000) expected to find similarities between the BRIEF-Parent scale of
Working Memory and the CBCL Attention Problem scale, as well as between the BRIEF-Parent
Inhibit scale and the CBCL Aggression scale.
Parents of 200 clinically-referred children completed both measures. Results indicated a
moderate relation between all BRIEF-Parent scales and the CBCL Attention Problems scale (r’s
= .50 to .72), with the exception of Organization of Materials. Three BRIEF scales (Inhibit,
Emotional Control, and Shift) were moderately correlated with CBCL’s Aggressive Behavior
scale (r’s = .57 to .73).
Gioia et al. (2000) also compared the Behavior Assessment System for Children Parent Rating
Scales (BASC Parent; Reynolds & Kamphaus, 1992) with the BRIEF-Parent form in a sample of
80 parents of children who were clinically referred. The BASC Parent rating scales have nine
scales: Aggression, Conduct Problems, Hyperactivity, Anxiety, Depression, Somatization,
Atypicality, Withdrawal, and Attention Problems. The BRI from the BRIEF correlated with both
the Aggression (r = .76) and Hyperactivity (r = .63) scales. The BRIEF’s Emotional Control
scale also correlated (r’s = .62 to .69) with the Aggression, Anxiety, and Depression scales of the
BASC Parent. Correlations between the Emotional Control scale and the BASC’s Aggression,
Anxiety, and Depression scales make sense because these particular BASC scales involve
emotional responses of a child, which may be reflected through the ability to control emotion.
The BRIEF Inhibit scale also correlated with the Aggression (r = .72) and Hyperactivity (r = .68)
scales of the BASC Parent. Multiple BRIEF scales (Initiate, Working Memory, Plan/Organize,
and Monitor) correlated with the Attention Problems scale from the BASC Parent. Gioia et al.
(2000) hypothesized that the BRIEF Working Memory scale would correlate with the Attention
scale of the BASC. Gioia et al. concluded that the pattern of correlations was “strong” and
“expected” (p. 55).
A limitation of this comparison was the small sample size (N = 80). Additionally, BASC scales such as
Conduct Problems and Somatization had low correlations with the BRIEF scales or composite
scores. Gioia et al. used these findings as evidence of discriminant validity, noting the
“relatively lower executive contribution to these problems” (p. 55). Gioia et al. make it appear
as though the two dysfunctions are completely unrelated; however, this assertion is debatable. It
is likely that those with conduct problems may display compromised executive functions, such as
lack of inhibition. Further, Gioia et al. (2000) examined the relations between the BRIEF and
the CBCL, and found moderate to high correlations between the Inhibit and Emotional Control
scales and the Aggressive Behavior scale of the CBCL; these scales tap behaviors displayed, at
least to some degree, by individuals with conduct disorder.
Gioia et al.’s (2000) final demonstration of convergent validity was between the BRIEF-
Parent form and the Conners’ Rating Scales (CRS; Conners, 1989). The CRS has eight scales:
Anxiety, Learning, Somatic, Obsessive-Compulsive, Antisocial, Restless-Disorganized, Conduct
Disorder, and Hyperactive-Immature. Parents of 25 clinically-referred children completed both
measures. The BRIEF’s BRI and its scales (Inhibit, Shift, and Emotional Control) correlated
with the CRS’s Restless-Disorganized (r = .71), Conduct Disorder (r = .77), and Hyperactive-
Immature (r = .57) scales. Low correlations (r’s = -.28 to .27) were found between all BRIEF
scales and the Obsessive-Compulsive and Antisocial CRS scales, which Gioia et al. viewed as
consistent with what the BRIEF does (and does not) purport to measure.
A limitation of the study was the small sample size (N = 25) of parent ratings from which
to generalize to a population. Additionally, several correlations were
unexpectedly low, such as that between the Learning scale of the CRS and the BRIEF’s Organization
of Materials scale (.06). Even though the Organization of Materials scale had nonsignificant
correlations with every CRS scale, with the exception of Restless-Disorganized (r = .42) and
Hyperactive-Immature (r = .50), the lack of relation between organization and learning is
unexpected, as organizational skills have been linked to academic success (Cameron, Connor,
Morrison, & Jewkes, 2008).
Gioia also contributed as a co-author to research (LeJeune et al., 2010) designed to
develop and evaluate an abbreviated version of the BRIEF-Parent form. The short-form is based
on 24 items selected from the original form. Three samples were used to analyze its psychometric
properties: (a) the BRIEF normative sample (N = 1,419) of children aged 5 to 18; (b) a sample of
133 children (ages 5 to 13 years) diagnosed with ADHD; and (c) a sample suspected of having
ADHD, consisting of 84 children (ages 5 to 16). Correlations between the original BRIEF-
Parent scales and the respective short-form scales generally exceeded .75 in the normative and
confirmed ADHD samples (range = .56 to .97). The Initiate (.61) and Monitor (.56) scales on the
short-form had the weakest correlations to the respective scales on the original form. Composite
index correlations between the BRIEF forms were strong in both the normative and ADHD
samples, ranging from .88 (BRI-normative and ADHD samples) to .97 (GEC-normative and
ADHD samples).
Independent research of convergent validity. Independent research on the BRIEF-
Parent form has provided mixed support for convergent validity. Several studies (e.g., Bishop,
2011; McCandless & O’Laughlin, 2007; Toplack et al., 2009) have examined the relations
between the BRIEF-Parent scales and various task-based measures. McCandless and
O’Laughlin (2007) analyzed parent ratings on the BRIEF-Parent and the BASC Parent of 70
children seen at a university-based ADHD clinic. All correlations between the BRIEF scales and
the Attention and Hyperactivity scales on the BASC Parent were statistically significant. The
correlations had a wide range: .24 (Shift) to .70 (MI) with the BASC Attention scale; and .26
(Shift) to .83 (Inhibit) with the BASC Hyperactivity scale. However, the sample size was small,
and participants were not matched to control for gender, age, or other demographic factors.
Toplack et al. (2009) examined the relations between the BRIEF-Parent scales and
several task-based measures in a sample of 90 children (46 diagnosed with ADHD; 44 controls).
The four task-based measures used were inhibition (Stop Task; Logan & Cowan, 1984), set
shifting (Trail Making Task -Part B; Reitan, 1958), verbal and spatial working memory
(Working Memory composite; the Wechsler Intelligence Scale for Children-Third Edition
[WISC-III]; Kaplan et al., 1999) and planning (Stocking of Cambridge task [SOC]; the
Cambridge Neuropsychological Test Automated Battery [CANTAB]; Robbins et al., 1994).
Toplack et al. found statistically significant relations between all BRIEF-Parent scales and many
of the task-based measures. Nonsignificant relations were found between (a) the BRIEF’s
Inhibit scale and the Stop Task Inhibition (r = .21); (b) the BRIEF’s Shift scale and the Trail Making
Set Shifting (r = .23); and (c) the BRIEF’s Plan/Organize scale and the SOC Planning task (r = -.22).
Toplack et al. stated, “virtually all of the executive function experimental tasks were
significantly associated with the parent and teacher ratings on the BRIEF scales” (p. 62).
However, not all of these correlations were reported; only those involving Inhibit, Shift, Working
Memory, and Plan/Organize were provided for inspection. Toplack et al. found that each task-
based measure was not uniquely related to the similarly named BRIEF scale (e.g., the Inhibit scale did
not correlate only with the Stop Task measure of inhibition). BRIEF
ratings were found to be statistically significant predictors of ADHD status, whereas the task-
based measures were not. A limitation of the study was that task-based measures were selected
by matching task names to similarly named BRIEF scales; a BRIEF scale and a task-based
measure with similar names may not tap the same construct, which would explain the low
correlations. As noted earlier, a problem with a non-unitary,
individual process approach to the study of the executive function is that researchers’ opinions
are involved in naming and conceptualizing factors. This approach may lead to the
underrepresentation of commonality between similar constructs (Séguin & Zelazo, 2005).
Bishop (2011) tested 150 children between the ages of 6 and 18 using the BRIEF-Parent
form (87 diagnosed with ADHD and 63 children diagnosed with internalizing disorders, such as
depression and/or anxiety). Results indicated that children with ADHD had statistically
significantly lower scores on the WISC-IV Working Memory Index, and higher scores
(indicating more impairment) on two of the BRIEF’s scales (Plan/Organize scale and Working
Memory scale) than children with internalizing disorders. The Test of Variables of Attention
Commissions (TOVA; Greenberg & Kindschi, 1996), a measure of inhibition, shared 10% of the
variance with the Inhibit scale on the BRIEF. The WCST Perseverative Responses score shared
8% of the variance with the Shift scale of the BRIEF for those with internalizing disorders.
These results are consistent with similar research (e.g., Toplack et al., 2009) showing that weak to
moderate correlations (e.g., r = .32) exist between task-based measures and the BRIEF.
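The shared-variance figures above follow directly from the correlations: the proportion of variance two measures share is the squared correlation, r². A quick check of the values reported:

```python
# Proportion of shared variance between two measures is the squared correlation.
def shared_variance(r):
    return r ** 2

# A correlation of about .32 corresponds to roughly 10% shared variance,
# matching the TOVA-Inhibit figure reported by Bishop (2011).
print(round(shared_variance(0.32) * 100, 1))  # → 10.2 (percent)
```

Correlations below about .32 thus leave more than 90% of the variance in either measure unexplained by the other.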
A limitation of this study was the removal of seven participants with measured IQ scores
below 80. Bishop’s (2011) rationale for the IQ cutoff was to eliminate comorbid
developmental delay. Altering the sample limits the generalizability of this study
to the special education population. The BRIEF is not recommended for use with students with
mental retardation (IQ < 70), but approximately 10% of the population functions within the IQ
range of 70 to 80.
In summary, convergent validity was demonstrated by Gioia et al. (2000) between the
BRIEF-Parent form and four well-known behavior rating scales (ADHD-IV, BASC-Parent,
CBCL, and CRS). However, the sample sizes were small in all analyses. Independent
researchers have found mixed support for convergent validity of the BRIEF. Other behavior
rating scales show strong convergence with the BRIEF-Parent form, but many task-based
measures do not. Toplack et al. (2009) showed convergence between the BRIEF scales and four
task-based measures, but some nonsignificant relations were apparent between similarly named
scales or tasks (e.g., the BRIEF’s Inhibit and the Stop Task Inhibition; r = .21). McCandless and
O’Laughlin (2007) showed significant relations between the BRIEF scales and the Attention and
Hyperactivity scales of the BASC Parent. However, Toplack et al. (2009) and Bishop (2011)
showed weak relations between the BRIEF scales and analogous task-based executive function
measures.
Convergent validity and specific clinical populations. The BRIEF-Parent form has
been used to study executive functioning of children in a variety of populations, such as brain
disease (Anderson et al., 2002), moderate to severe traumatic brain injury (Vriezen & Pigott,
2002), and autism spectrum disorder (ASD; Gilotty, Kenworthy, Sirian, Black, & Wagner, 2002).
Anderson et al. (2002) found non-significant correlations between task-based activities and the
BRIEF-Parent scales in a sample of 189 children, divided across three clinical groups and a
control group (44 diagnosed with early treated phenylketonuria; 45 diagnosed with early treated
hydrocephalus; 20 diagnosed with frontal focal lesions; and 80 controls). Correlations between the BRIEF scales and
task-based measures “varied from .01 to .48” (Anderson et al., 2002, p. 237). Specific
correlations were not provided, but a table was provided that contained the “proportion of
children in each group that scored in the severe range (> 1 SD above the mean) on the BRIEF
parameters” (Anderson et al., 2002, p. 237). A limitation is that the specificity of the sample
hinders the generalizability of the results.
Vriezen and Pigott (2002) provided support for Anderson et al.’s findings.
Nonsignificant to low correlations were found between the BRIEF-Parent scales and task-based
activities. The sample consisted of 48 children with moderate to severe traumatic brain injury.
None of the BRIEF index scores correlated significantly with task-based measures of the WCST,
Comprehensive Trail Making Test (CTMT; Reitan, 1958), and TOVA. The BRIEF’s
Metacognitive Index was, however, statistically significantly correlated (r = -.30; p < .05) with
WISC-III Verbal IQ. Also, a greater number of children in the sample were identified as
impaired on the BRIEF than on the task-based measures. These findings suggest that the
determination of a child’s level of impairment may depend on which instrument (the BRIEF or a
task-based measure) is administered. A limitation of the study was a small sample
size (N = 48). As a result, only the index and composite scores of the BRIEF (GEC, BRI, MI)
were included in the analyses. Excluding the eight BRIEF scale scores did not allow
examination of the direct relations between the BRIEF scales, the task-based measures, and Verbal IQ
score. It has been suggested that the BRIEF may be tapping behaviors associated with emotional
and social aspects of EF in a different area of the brain than those areas involved in task-based
measures (Stuss & Alexander, 2000). This idea extends beyond the scope of this study, but
warrants further attention in future research when considering why such low correlations exist
between the BRIEF and many task-based measures.
In another study examining the BRIEF’s utility in specific populations, Gilotty et al.
(2002) sampled 35 children with ASD and examined the relation between executive function
skills (BRIEF-Parent) and adaptive behavior (the Vineland Adaptive Behavior Scales [VABS];
Sparrow et al., 1984). There were several statistically significant inverse relations between the
VABS Social scale and BRIEF-Parent scales, specifically the MI (r = -.53), the Initiate (r = -.64),
and Working Memory (r = -.57) scales. As impairment in executive function increased, the
adaptive behavior skills of these children with ASD tended to decrease. Limitations of this study
were a small sample size as well as no control group from which to compare the results.
Although this study showed strong relations between the BRIEF scales and a well-known
adaptive behavior instrument, not all studies have yielded such positive results (e.g., Vriezen &
Pigott, 2002).
Gioia and Isquith (2004) have defended the less than ideal correlations between the
BRIEF-Parent scales and other measures in specific clinical populations. Gioia and Isquith
contend that the accuracy of the BRIEF-Parent scales has been inappropriately compared to
results of the WCST. Gioia and Isquith argue that it is unfair to judge the utility of the BRIEF-
Parent scales by comparing them to this particular task-based measure because WCST scores
have not consistently detected executive function impairment in individuals with ADHD (see Pennington &
Ozonoff, 1996). Assuming their assertion about the WCST is a valid one, it still does not explain
the lack of correlation between the BRIEF and many other task-based measures, such as the
TOH, TOVA, and CTMT. In a review of the BRIEF for the Fifteenth Mental Measurements
Yearbook, Fitzpatrick (2003) noted an absence of established metacognitive measures in testing
the convergent validity of the BRIEF-Parent form. Subsequently, other observer rating scales
measuring executive function, such as the Childhood Executive Functioning Inventory (CHEXI;
Thorell & Nyberg, 2008) have been developed, but no empirically-based studies have examined
their relations with the BRIEF.
Discriminant validity. As evidence of discriminant validity, Gioia et al. (2000) factor
analyzed the BRIEF-Parent scales and composite scores with the CBCL. Correlational data also
provided support. A common factor analysis (principal axis factoring extraction [PAF]; oblique
rotation) of the two scales, based on a sample of 200 parent ratings, indicated a four-factor
solution, which accounted for 73% of the variance. Factor 1 contained the BRIEF’s Shift,
Emotional Control, and Inhibit scales. Factor 2 comprised the remaining BRIEF scales
(Plan/Organize, Working Memory, Initiate, Monitor, and Organization of Materials). Factor 3
was defined by five of the CBCL scales (Withdrawn, Anxious/Depressed, Social Problems,
Thought Problems, Attention Problems), and Factor 4 comprised the CBCL Delinquent
Behavior, CBCL Aggressive Behavior, and the BRIEF Inhibit scale. Thus, the scales of the two
instruments loaded onto separate sets of factors, with the exception of the BRIEF Inhibit scale,
which loaded on Factor 1 at .42 and on Factor 4 at .53.
According to Gioia et al., the Inhibit scale may measure a more physical than mental
manifestation of inhibition. A limitation of these findings is that the sample size (N = 200) was
small for a factor analysis.
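As background on how a figure such as “73% of the variance” is derived, the variance a factor accounts for equals its eigenvalue divided by the number of variables. A toy two-variable example (not the Gioia et al. data; true principal axis factoring additionally iterates on communality estimates):

```python
# Toy correlation matrix for two scales correlated at r = .60.
# For [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r.
r = 0.60
eigenvalues = [1 + r, 1 - r]

# Eigenvalue / number of variables = proportion of total variance
# attributable to that factor (components-style accounting).
proportions = [ev / len(eigenvalues) for ev in eigenvalues]
print(proportions)  # → [0.8, 0.2]: the first factor accounts for 80%
```

Summing the proportions for the retained factors gives the total variance accounted for by the solution.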
Gioia et al. (2000) also reported low correlations (r’s = .11 - .28) between all of the
BRIEF-Parent scales and the CBCL Somatic Complaints scale (an index of physical complaints
in relation to a child’s emotional functioning). Low correlations were found between all of the
BRIEF-Parent scales and the Conduct Problems scale on the BASC (r’s = |.05| to |.14|), but low to
moderate correlations were found between the BRIEF scales and the BASC’s Somatization scale (|.17| to
|.44|). Finally, the BRIEF scales had low correlations with the CRS Obsessive-Compulsive and
Antisocial scales (Gioia et al., 2000).
Independent studies have provided mixed support for the discriminant validity of the
BRIEF. McCandless and O’Laughlin (2007) examined the discriminant validity of both the
BRIEF-Parent and BRIEF–Teacher forms in the classification of 70 boys and girls (ages 5-13),
who had been referred to a university-based clinic for assessment of ADHD. The children made
up three groups: (a) No ADHD; (b) ADHD- Inattentive Type (ADHD-IT); and (c) ADHD-
Combined Type (ADHD-CT). Discriminant function analysis was conducted using the scores
from the MI scale of the BRIEF-Teacher form and the Inhibit scale from the BRIEF-Parent form
to determine classification of ADHD and its subtypes. Agreement between the GEC scores of the
two BRIEF forms was low (r = .13), and only three of the eight scales showed agreement above
chance levels. This lack of agreement may have contributed to the following results:
approximately (a) 15.7% of the participants were correctly classified as ADHD-IT, (b) 48.6% as
ADHD-CT, and (c) 35.7% as not having ADHD. The percentage of cross-validated grouped cases
correctly classified was 62.9%. These classification rates are inadequate because, with three
groups in the analysis, 33.3% of members would be correctly classified by chance alone
(Tabachnick & Fidell, 2001). Additionally, the results indicated that the BRI of the BRIEF-
Parent was statistically significantly elevated in those children identified as having ADHD-CT,
but the BRI was not significantly elevated on the BRIEF-Teacher. Group differences were
apparent in both forms of the BRIEF, specifically the Working Memory and Inhibit scales, when
classifying children as either ADHD or non-ADHD. McCandless and O’Laughlin (2007) noted
that parents were better reporters of behavioral deficits using the BRIEF, but teachers more
accurately reported behaviors associated with cognitive deficits using the BRIEF. A limitation of
this study was its small sample size (N = 70). Also, because the control group was recruited from
a sample of children at a clinic, participants in this group may have been more impaired than
children in a typical community sample, and hence not truly representative of a
control group.
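The chance-level benchmark invoked above can be made concrete: with k equally weighted groups, assigning cases at random classifies about 1/k of them correctly (a simplifying assumption of equal group priors):

```python
# Chance-level correct-classification rate for k equally weighted groups.
def chance_rate(k):
    return 1 / k

# With three diagnostic groups, chance alone yields ~33.3% correct,
# so the 15.7% ADHD-IT rate reported above falls below chance.
print(round(chance_rate(3) * 100, 1))  # → 33.3
```

When group sizes are unequal, the chance benchmark is instead based on the groups’ prior proportions, as Tabachnick and Fidell (2001) describe.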
Reddy, Hale, and Brodzinsky (2011) found a different pattern than that of McCandless
and O’Laughlin (2007). A group of 58 children diagnosed with ADHD were matched (age,
gender, parent’s education, & ethnicity) with 58 children who served as controls. Their parents
were administered the BRIEF. Independent-samples t tests indicated differences that were
statistically significant between the ADHD group and the control group. Three discriminant
function analyses were conducted to examine the classification rate of the ADHD sample in
comparison to the control sample on the GEC, the two index scores (BRI and MI) and the eight
scale scores. Using the GEC, the conditional probability for the children in the control sample
was .77, whereas that for the ADHD sample was .79. Using the BRI and MI, the conditional
probability for the children in the control sample was .86, and that for the ADHD sample was .79. Results were similar
for the scales: .84 for the control group and .81 for the ADHD sample. These findings meet or
exceed the recommended standard (.75) for diagnostic tests proposed by Milich, Widiger, and
Landau (1987) for clinical practice.
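Conditional probabilities of this kind are simply within-group correct-classification rates; a minimal sketch (the counts are hypothetical, chosen for illustration, not Reddy et al.’s data):

```python
# Conditional probability of correct classification within one group:
# correct classifications in that group / group size.
def conditional_probability(correct, group_size):
    return correct / group_size

# E.g., if 45 of 58 control children were classified as controls:
print(round(conditional_probability(45, 58), 2))  # → 0.78
```

Comparing these rates against the .75 benchmark proposed by Milich et al. (1987) gives the criterion applied in the text.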
In examining the discriminant functions, Reddy et al. (2011) reported that the BRI of the
BRIEF had a loading of .77, the highest correlation with the function, in comparison to the
Shift (.35), Emotional Control (.40), and Working Memory (.34) scales. The Inhibit, Initiate, and
Organization of Materials scales had low correlations with the function (.13, .14, and .26,
respectively), and the Plan/Organize and Monitor scales had low inverse correlations with the
function (-.18 and -.10, respectively). Limitations of the study were the small sample and its
homogeneous socioeconomic and racial composition, which consisted primarily of Caucasian
children with college-educated parents. These issues may limit the generalizability of the
findings.
Ecological validity. As in most fields, it is necessary to increase the applicability of
results obtained through controlled experiments to naturally occurring phenomena, an
applicability defined as ecological validity. Two aspects of ecological validity are pertinent to
neuropsychological testing: (a) verisimilitude (degree of similarity between test demands and
real-life demands) and (b) veridicality (degree of accuracy in predicting some environmental
behavior or molar outcome; Franzen & Wilhelm, 1996). Applying these concepts to the context of
school-based evaluations, verisimilitude is similar to face validity in that what the
child does during an individual assessment should translate into what the child is expected to do in a
classroom or learning environment. Because most individual assessment settings are quiet and
controlled, unlike a classroom with several distractions, Franzen and Wilhelm (1996) contend
that assessments often underestimate the degree of difficulty a child experiences in real-world
settings. Veridicality addresses whether a test predicts real-world behavior and can be used to
forecast future behavior, which may contribute to a better understanding of the child in settings
such as the classroom. The point of psychoeducational assessments is to develop effective
interventions based on observations or test performance in individual assessment settings and
ensure these same interventions apply in the classroom (Franzen & Wilhelm, 1996).
In regard to ecological validity, Gioia and Isquith (2004) provided an application of the
BRIEF’s methodology to the assessment of executive dysfunction using a sample of children
with traumatic brain injury (TBI) and advocated for the use of both behavioral rating scales and
task-based measures in assessment. Both methods are necessary to properly assess and develop
appropriate interventions in clinical as well as applied (e.g., school) settings. To this end, the authors
outlined the neuropsychological deficits associated with TBI, the social, emotional,
academic, behavioral, and environmental impacts of such an injury, and the role that the BRIEF-
Parent form plays in measuring these areas.
Gioia and Isquith (2004) contend that the items on the BRIEF-Parent form have strong
ecological validity for several reasons: (1) the items originated from clinical interviews with
parents and teachers as well as input from 12 clinical neuropsychologists; (2) the BRIEF was
designed to capture everyday manifestations of executive dysfunction through items, such as
“When sent to get something, forgets what he or she is supposed to get” (tapping working
memory); and (3) the BRIEF scores correlated with scholastic achievement (e.g., Clark,
Pritchard, & Woodward, 2010; Mahone, Koth, Cutting, Singer, & Denckla, 2001). The BRIEF,
however, is still prone to the limitations of observer rating scales, such as the rater’s level of
linguistic competence and emotional involvement influencing observations (Denckla, 2002).
Despite these limitations, interventions dealing specifically with executive function deficits have
shown promising evidence-based results (see Diamond & Lee, 2011). In a recent issue of
Communiqué, a newsletter circulated to members of the National Association of School
Psychologists (NASP), Cantin, Mann, and Hund (2012) reviewed various measures that
demonstrate strong psychometric characteristics and are recommended for school assessment.
The BRIEF was included in the list of instruments deemed useful for psychoeducational
assessment. Many types of validity, including social consequences, must be considered when
measurement is involved in educational decision-making.
Social consequences. Assessment may have foreseen or unforeseen repercussions for
individuals, groups of individuals, or society as a whole, based on the results yielded from a
measure (Messick, 1989). Thus, social consequences associated with the assessment should
always be considered. There are at least four issues about the BRIEF-Parent form that could
result in negative social consequences: (a) observer bias, (b) misinformation in the media, (c)
overlap with ADHD diagnostic criteria, and (d) malingering.
The BRIEF-Parent form is designed to identify clusters of behaviors, which have been
labeled executive functions, and to put them in a simplified format that requires another
individual (in this case a parent or caregiver) to gauge the severity of a child’s behaviors. This
format of gathering information results in some degree of observer bias. However, the valuable
information gleaned from those in frequent contact with the child likely outweighs the negative
aspects of observer rating scales (Gioia & Isquith, 2004).
Another issue is whether the information gathered through instruments, such as the
BRIEF, ultimately exposes and helps those in society who experience these difficulties. Behaviors
considered undesirable, particularly in a learning environment (e.g., speaking out of turn in a
classroom) need to be addressed through school-based interventions. The question is whether
there will be repercussions to the increased exposure of parents and teachers, through
instruments such as the BRIEF, to terms for such behaviors (typically used in clinical settings),
and whether this exposure will ultimately lead to mistruths about clinical disorders such as ADHD. For example,
Gonon, Bezard, and Boraud (2011) indicate that scientific literature often misrepresents research
about ADHD, resulting in misleading conclusions in the media. This misrepresentation may
result in parents reading or hearing these mistruths and being misinformed. Parents are primarily
responsible for making decisions about treatment options on behalf of their children (e.g., drug
therapy, counseling, and/or education placement); thus, misinformation may result in incorrect
decision-making by parents. Because this risk is possible, it is important to consider word choice
and content of instruments given to parents about their child’s behaviors, such as the BRIEF-
Parent form. Thorell and Nyberg (2008) have been critical of the BRIEF-Parent form because
the language used in many items is similar, if not identical, to the diagnostic criteria for ADHD
in the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev. [DSM-IV-
TR]; American Psychiatric Association [APA], 2000). Due to the semantic overlap between
ADHD symptoms and EF measures, such as the BRIEF, Thorell and Nyberg contend that it
makes sense that these instruments would correlate with ADHD symptoms. This similarity in
wording raises the issue of whether the BRIEF-Parent form is actually tapping the relevant range
of ADHD symptoms or tapping a narrow range of symptoms due to wording. Thus, it is
legitimate to question whether using the BRIEF-Parent form to identify young children with
ADHD will be useful in predicting the existence of the disorder in the future. Also, Thorell and
Nyberg criticize the BRIEF-Parent form for confounding executive function concepts,
specifically, working memory and sustained attention. The item “has a short attention span” is
part of the BRIEF-Parent’s working memory scale, but Thorell and Nyberg claimed that such
items, which actually examine inattention, should not belong under the working memory
categorization.
Another potential issue in using the BRIEF-Parent form is the detection of malingerers.
No research was found that directly addressed the issue of susceptibility of the BRIEF-Parent
form to malingerers. Clinicians must consider this possibility when interpreting the BRIEF
results, as the test is based on another’s observations of a child. It could be argued that
government agencies provide financial motivation that could influence an individual to
exaggerate or outright lie on a test such as the BRIEF to obtain a diagnosis related to executive
dysfunction. Researchers (e.g., Fisher & Watkins, 2008; Sollman, Ranseen, & Berry, 2010) have
demonstrated in samples of college students, who were administered ADHD screeners, that it is
possible to feign clinical levels and achieve false-positive diagnoses, simply by exposing the
students to Internet-derived materials about ADHD before they completed the screeners.
The possibility of false positives is another reason why Gioia et al. (2000) emphasize
that the BRIEF-Parent form should be only one part of an overall clinical or psycho-educational
assessment. Gioia et al. (2000) have attempted to address the area of bias through the
Inconsistency scale and the Negativity scale (see page 21).
Summary
In summary, the BRIEF-Parent form has been shown to demonstrate many facets of
validity (predictive, convergent, discriminant, and ecological), particularly by the authors of the
instrument, but to a limited degree in independent studies. Social consequences were also
considered. Evidence is continuing to build to support the premise that the BRIEF-Parent form
is a psychometrically sound instrument in terms of both reliability and validity of scores, but at
this time, reliability estimates of the scores are moderate and evidence of construct validity is
mixed. Frequency of its use has increased in both clinical and school-based settings. The
increase in popularity may be due to the number of empirical studies using the BRIEF-Parent
form. The factor structure of the BRIEF-Parent is still, however, a topic of debate. The
conceptualization of executive function has evolved over the past two decades to involve
theories of separable but related EF processes. Predictive validity studies have provided
mixed support on the usefulness of the BRIEF-Parent form. Some research (McCandless &
O’Laughlin, 2007; Pratt, 2000) indicates that children with ADHD show more impairment on the
BRIEF-Parent scales than controls, whereas others (Mahone et al., 2002) show that the BRIEF
scales are not accurate in correctly identifying ADHD or Tourette’s syndrome.
Convergent validity studies have varied in their level of support for the BRIEF. Gioia et al. (2000)
have shown convergence between the BRIEF-Parent scales and other established behavior
rating scales, such as the ADHD-IV, CBCL, BASC, and CRS. However, the BRIEF-Parent
scales have weak to moderate correlations with task-based measures, particularly in small
clinical populations such as children with brain disease or traumatic brain injury (Anderson et al.,
2002; Vriezen & Pigott, 2002). Although Gioia et al. (2000) have been able to demonstrate
discriminant validity through factor analysis between the BRIEF-Parent scales and the CBCL,
independent studies have varied in results. McCandless and O’Laughlin (2007) demonstrated
less than ideal discrimination between those with ADHD and the various ADHD subtypes when
using the BRIEF. In comparison, Reddy et al. (2011) showed adequate categorization based on
the BRIEF-Parent, with the BRI scores best separating the diagnostic groups. Ecological validity
studies are somewhat limited to those conducted by the test authors (Gioia & Isquith, 2004), but
more research is emerging that shows a relation between executive function instruments (such as
the BRIEF-Parent form) and school-based executive function interventions, including
computerized training, aerobic exercise, and martial arts and mindfulness practices (Diamond &
Lee, 2011). Social consequences are noteworthy when considering that EF symptoms can be
feigned. The possibility of faking symptoms could arguably desensitize the public to these EF
behaviors and downplay the severity of executive dysfunction. Despite the social consequences
that may arise from using an observer rating scale to assess the presence (or absence) of
executive dysfunction, instruments such as the BRIEF-Parent form are becoming increasingly
popular in the school setting (Hale & Fiorello, 2004). Because of the increased use of the BRIEF
in schools, more studies are warranted to ensure that the measure is psychometrically appropriate
for use.
Purpose of the Present Study
The purpose of this study is to scrutinize the current factor structure of the
BRIEF-Parent form. Continued investigation of the present factor structure is needed, as there
continues to be debate regarding the appropriate number of scales and
index scores that best reflect the structure of the scale. The test authors (Gioia et al., 2002) have
conducted one of the most frequently cited studies in the BRIEF-Parent literature; thus,
independent examination of the BRIEF-Parent form is needed. More research is needed using a
sample of children with mixed clinical diagnoses. To date, only one study (Gioia et al., 2002),
conducted by the developers, has used a sample of children in the U.S. with diverse diagnoses.
Internationally, Egeland and Fallmyr (2010) obtained similar findings with a diverse clinical
sample of Norwegian children, using a Norwegian translation and American
norms. Huizinga and Smidts (2011) also conducted a study using a sample of “normal Dutch
school children” and a Dutch version of the BRIEF-Parent form. Regardless of the limitations
(i.e., small sample size; Norwegian sample using American norms), Egeland and Fallmyr’s
results supported Gioia et al.’s (2002) findings of a three-factor solution of the BRIEF-Parent
form based on nine scales. Huizinga and Smidts (2011) did not.
Using the normative data collected for the instrument, Gioia et al. (2000) determined that
the best structure was a two-factor model based on eight scales. Huizinga and Smidts (2011)
also analyzed data from Dutch parents that indicated the best structure of the BRIEF-Parent form
is the eight-scale version, the original scale format (Gioia et al., 2000). However, both Gioia et
al. (2002) and Egeland and Fallmyr (2010), using mixed clinical samples, found that a three-
factor model based on a nine-scale version also provided good fit to the BRIEF-Parent scores.
Alternative factor structures have been investigated and supported by research and should also be
revisited. Only one study (Egeland & Fallmyr, 2010) has examined both the eight- and nine-scale
versions of the BRIEF-Parent form, and that study was conducted internationally. Hence, the issue about the best
structure of the BRIEF-Parent still exists. More studies are needed that examine the current
structure in comparison to alternative models using a mixed clinical population from the U.S.
The purpose of the present study is to scrutinize the current factor structure of the
BRIEF-Parent form in the context of a mixed clinical sample that would be similar to that of
students receiving special education services in a school population. Two questions will guide
the study:
1. Will the factor structure of BRIEF-Parent scores obtained from a mixed clinical
sample of school-aged youth align with the two-factor, eight-scale structure
originally proposed by the test authors?
2. If the current factor structure does not meet established criteria in this sample,
which, if any, alternative models meet the standards of good fit?
This study will employ confirmatory factor analysis (CFA). Using established criteria for
conducting CFAs, which are described in the Method section, multiple models of the BRIEF-
Parent form will be tested and compared to the two-factor, eight-scale model that is currently
employed in the instrument. Furthermore, reliability estimates of the BRIEF scores will also be reported.
METHOD
Participants
Participants were 371 students in kindergarten through Grade 12, for each of whom a
parent or caregiver rating was obtained. The raters were 267 mothers (72.0%), 73 fathers
(19.7%), and 31 (8.3%) other family members (e.g., grandparents, aunts, or step-parents). The
sample was a compilation of three archival data sets. The first dataset was based on 264 students
who obtained evaluations through the Pennsylvania Office of Vocational Rehabilitation (OVR)
to determine eligibility for special education services in the postsecondary educational setting.
Hereafter, this dataset is designated as the OVR sample. All students referred by OVR for this
evaluation resided in Northwestern Pennsylvania and had previously been identified in their
home school districts as qualifying for special education services.
Second, data from 45 students came from private evaluations and reflected a mixture of clinical
diagnoses. These evaluations were conducted for two main reasons: (a) a parent’s complaint to
the child’s home school district about the special education determination, which resulted in a
third party evaluation or (b) a desire by parents to gain a better understanding of their child’s
educational functioning. Hereafter, this dataset is called the private sample. Third, the
remaining dataset was based on 62 students who had obtained a psycho-educational
evaluation through the school district’s referral procedures. This dataset is referred to as the
school sample.
Diagnoses for the sample were mixed, with children identified for special education
services under the categories of Other Health Impairment, Autism, Specific Learning Disability,
Traumatic Brain Injury, and Emotional Disturbance. The ratio of males to females across the
samples ranged from approximately 3:2 to 3:1: OVR sample—59.8% males and 40.2% females;
private sample—71.1% males and 28.9% females; and school sample—64.5% males and 35.5%
females. Both the private and school samples were comparable in age: private sample, 5 to 16
years old (M = 10.8 years; SD = 3.34); and school sample, 6 to 18 years old (M = 11.2 years; SD =
3.04). However, the students in the OVR sample were older, 16 to 18 years old (M = 17.4
years; SD = 0.64). Students’ race/ethnicity in the samples was predominantly Caucasian: 94.7%
for the OVR sample, 97.8% for the private sample, and 96.8% for the school sample. The
composition of the three samples, arranged by gender, age, and race, is reported in Table 1.
Table 1

Demographic Characteristics of Samples

Characteristic            OVR (N = 264)   Private (N = 45)   School (N = 62)   All (N = 371)
                          n (%)           n (%)              n (%)             n (%)
Gender
  Male                    158 (59.8)      32 (71.1)          40 (64.5)         230 (62.0)
  Female                  106 (40.2)      13 (28.9)          22 (35.5)         141 (38.0)
Age
  5                       --              1 (2.2)            --                1 (0.3)
  6                       --              2 (4.4)            2 (3.2)           4 (1.1)
  7                       --              8 (17.8)           5 (8.1)           13 (3.5)
  8                       --              3 (6.7)            8 (12.9)          11 (3.0)
  9                       --              5 (11.1)           7 (11.3)          12 (3.2)
  10                      --              3 (6.7)            6 (9.7)           9 (2.4)
  11                      --              3 (6.7)            5 (8.1)           8 (2.2)
  12                      --              5 (11.1)           7 (11.3)          12 (3.2)
  13                      --              2 (4.4)            6 (9.7)           8 (2.2)
  14                      --              5 (11.1)           5 (8.1)           10 (2.7)
  15                      --              3 (6.7)            5 (8.1)           8 (2.2)
  16                      22 (8.3)        5 (11.1)           5 (8.1)           32 (8.6)
  17                      118 (44.7)      --                 --                118 (31.8)
  18                      124 (47.0)      --                 1 (1.6)           125 (33.7)
Race
  Caucasian               250 (94.7)      44 (97.8)          60 (96.8)         354 (95.4)
  African American        4 (1.5)         1 (2.2)            --                5 (1.3)
  Hispanic                4 (1.5)         --                 --                4 (1.1)
  More than one race      2 (0.8)         --                 1 (1.6)           3 (0.8)
  Unknown or unspecified  4 (1.5)         --                 1 (1.6)           5 (1.3)

Note. OVR = Office of Vocational Rehabilitation.
Geographical Context
The participating school district was located in Northwestern Pennsylvania, where the
majority of the district’s families were considered to be of low to middle income.
Approximately 40% of the district’s students were classified as low-income in accordance with
the Pennsylvania Department of Education criterion (PA Department of Education, 2012). The
district’s students were predominantly Caucasian, non-Hispanic (over 98%). According to the
2009-2010 data compiled by the Pennsylvania Department of Education, a total of 1,258 students
attended kindergarten through Grade 12 in the district. Special education services were
provided to 230 students (18.3% of the population).
Measures
Demographic information. Demographic information about the youth, such as gender,
grade, age, and birth date, and the rater (name, relationship, and date of completing the form)
was collected.
BRIEF-Parent form. As noted earlier, the BRIEF-Parent form is a questionnaire
designed for parents/guardians to complete to help professionals assess executive function
behaviors of school-age (5-18 years) youth in the home and school environments (Gioia et al., 2000).
The term parent is broadly defined to include any individual “with the most recent and most
extensive contact with the child” (Gioia et al., 2000, p. 5). In designing this measure, a key goal was to
create a measure that would be easy to score and would yield useful information about executive
functioning on which most professionals could commonly agree (Gioia et al., 2000).
As summarized earlier, the scale contains 86 items, which are divided into eight scales:
Inhibit, Shift, Emotional Control, Initiate, Working Memory, Plan/Organize, Organization of
Materials, and Monitor. Items that comprise each scale are displayed in Appendix B.
Three composite measures are created by combining specific scales: the Behavioral
Regulation Index (BRI; Inhibit, Shift, and Emotional Control), the Metacognition Index (MI;
Initiate, Working Memory, Plan/Organize, Organization of Materials, and Monitor), and the
Global Executive Composite (GEC; BRI and MI scores).
Based on the items presented, the rater is asked to describe the child’s behavior over the
past six months. Raters are instructed to read statements concerning specific behaviors and to
rate the frequency of their occurrence. If the behavior has never been observed in the child over
the past six months, the rater is instructed to circle the letter N (Never). Likewise, if the behavior
has sometimes been a problem, the rater is expected to circle the letter S (Sometimes), and if the
behavior has often been a problem, the rater is expected to circle the letter O (Often). No further
explanation or definition of “sometimes” or “often” is provided. Raters are instructed to
complete all items even if a behavior does not apply to the child.
Scoring parallels the rating format (Never = 1, Sometimes = 2, Often = 3). Scores are
then summed for each of the eight scales and a composite score is computed. Additionally, a
qualified administrator enters each score into the software called Behavior Rating Inventory of
Executive Functioning Scoring Portfolio (BRIEF-SP; Isquith & Gioia, 2002), resulting in a
BRIEF score profile of the child, which is a plot composed of a T score for each scale. If the
scoring program is not available, appendix tables are provided in the BRIEF manual, with scores
presented by gender and age group. A T score of 50, the mean of the T score distribution, is
designated as the reference point for what is considered a Normal level of the particular
index or composite score. T scores 1.5 standard deviations or more above the mean of the
T score distribution (T ≥ 65) are classified in the manual as Clinically Significant. Such
elevated scores are considered to warrant special attention. Scores falling in the 51 to 64 range
are considered At-risk. These results are to be used in the context of a complete evaluation.
Thus, it is recommended that decisions about educational placement or intervention should not
be based solely on the BRIEF scores (Gioia et al., 2000). Description and information about the
development of the scale and the psychometric properties (reliability estimates and evidence for
validity) of the BRIEF is summarized above (see pp. 23-57).
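As a rough illustration of this scoring logic only (the actual BRIEF-SP software converts raw scores to T scores via age- and gender-based norm tables that are not reproduced here, so the function names and values below are purely hypothetical), the rating-to-classification flow might be sketched as:

```python
# Illustrative sketch of BRIEF-style scoring; the real instrument applies
# proprietary, normed raw-score-to-T-score conversion tables not shown here.

RATING_POINTS = {"N": 1, "S": 2, "O": 3}  # Never / Sometimes / Often

def scale_raw_score(ratings):
    """Sum the 1-3 point values for all items rated on one scale."""
    return sum(RATING_POINTS[r] for r in ratings)

def classify_t_score(t):
    """Label a T score using the cutoffs described in the manual."""
    if t >= 65:  # 1.5 SD or more above the T-score mean of 50
        return "Clinically Significant"
    if 51 <= t <= 64:
        return "At-risk"
    return "Normal"

print(scale_raw_score(["N", "S", "O", "S"]))  # -> 8
print(classify_t_score(67))                   # -> Clinically Significant
```

The sketch simply makes explicit that the classification bands partition the T-score range at 51 and 65, with the normative conversion step left to the published tables.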
Procedure
The archival data were obtained from two sources: (a) the school district and (b) the
private files of the licensed psychologist, who had conducted the evaluations for the private and
OVR cases. On written request, the school district’s superintendent approved access to the data.
Similarly, the licensed psychologist who conducted the private and OVR evaluations gave access
to these cases (see Appendices C and D). The evaluators, including the licensed psychologist,
were eight certified school psychologists (certified by the Pennsylvania Department of
Education to practice school psychology in a school district setting), two of whom also held a
doctoral degree in school psychology. Years of experience for the
psychologists ranged from 3 to 15 years (M = 8.4; SD = 4.6). All psychologists (two female, six
male) were Caucasian.
During the summer of 2009, the licensed psychologist created a digital database
containing information gathered through OVR and private evaluations, beginning in 2003. The
BRIEF-Parent form was one of several protocols recorded into the database. Employees
(certified school psychologists) recorded T scores from the scoring program profile sheet. Item-
level responses were gleaned from individual parent protocols and recorded in a similar manner.
Because the archived data did not contain identifying information about the participants,
the Pennsylvania State University’s Office of Research Protections (ORP) determined that the
proposed research was not “human participant” research as defined by the Department of Health
and Human Services (DHHS) Federal Regulations. Therefore, the research did not need to be
reviewed by the Institutional Review Board (IRB). Email correspondence of the study’s research
status is contained in Appendix E.
CFA Guidelines and Models
CFAs, using EQS software (v. 6.2), were conducted on the scale scores to examine the
factor structure of the BRIEF-Parent form. Raw scores served as the input data, which were
converted into a covariance matrix of the variables. The method of estimation was maximum
likelihood.
Models. Seven models were tested, three based on eight scales and four based on nine
scales. In the eight-scale models, the Monitor scale was treated as one scale, whereas in the
nine-scale models, it was divided into two separate scales: Self-Monitor and Task-Monitor.
Each scale was treated as an indicator. The models were identified by fixing one indicator
loading per factor to unity. Factor variances and covariances were freely estimated.
Two approaches were taken to test the models. The two-factor, eight-scale model was
directly tested to determine whether it met the guidelines for a good fitting model. Then an
alternative model approach was used, in which each model was compared to other potentially
viable models to ascertain the model with the best fit to the data (Jöreskog, 1993). The
alternative-models approach is one of three types of CFA frameworks; the other two are the
strictly confirmatory approach (i.e., one model is postulated and is either rejected or not
rejected, with no modifications made) and the model-generating approach (i.e., a model shows
poor fit and an exploratory approach is used to identify a better model). The strictly confirmatory approach was not used in
this study because this strategy would not allow for direct comparison of alternative models to
the current factor structure. The model-generating process was also not used because this
approach has been shown, particularly in small samples, to be negatively affected by chance
characteristics of the sample (MacCallum, Roznowski, & Necowitz, 1992). The alternative
approach was used because it is not driven by the data, but is based on the comparison of several
a priori models.
Starting with the most reduced model, Model 1 was a one-factor model, labeled Unity-8,
for the eight-scale version, in which executive function is depicted as one construct. Model 2,
labeled 2Original-8, was the original two-factor, eight-scale model proposed by Gioia et al.
(2000), composed of Behavioral Regulation (Inhibit, Shift, Emotional Control) and
Metacognitive (Initiate, Working Memory, Plan/Organize, Organization of Materials, &
Monitor). Model 3 also represented an eight-scale, two-factor model, which had been proposed
by Donders et al. (2010) and was labeled accordingly (2Donders-8): Behavioral Regulation
(Shift, Emotional Control) and Metacognitive (Inhibit, Initiate, Working Memory,
Plan/Organize, Organization of Materials, & Monitor). The difference between the original and
Donders et al.’s model is in the placement of the Inhibit scale. Donders et al. placed the Inhibit
scale on the Metacognitive factor instead of on the Behavioral Regulation factor.
Models 4 through 7 were based on nine scales, with the Monitor scale divided into two,
and the number of factors varied from one to four. Model 4 was a one-factor model for the
nine-scale version and was designated as Unity-9. Model 5, labeled 2Monitor-9, had a two-factor
structure with the following composition: four scales on Behavioral Regulation (Inhibit, Shift,
Emotional Control, & Self-Monitor) and five scales on Metacognitive (Initiate, Working
Memory, Plan/Organize, Organization of Materials, & Task-Monitor). The original model
(Model 2) had only one Monitor scale, which loaded on the Metacognitive factor, whereas in
Model 5 each factor had a Monitor scale. Model 6 was a three-factor model, named 3Monitor-9,
in which two scales loaded on Behavioral Regulation (Inhibit & Self-Monitor); two scales that
had previously been on Behavioral Regulation loaded on a new factor called Emotional
Regulation (Emotional Control & Shift); and the same five scales loaded on the Metacognitive
factor (Initiate, Working Memory, Plan/Organize, Organization of Materials, & Task-Monitor).
Finally, Model 7, designated as 4Monitor-9, was a four-factor model in which the composition
of the Behavioral Regulation (Inhibit and Self-Monitor) and Emotional Regulation (Emotional
Control and Shift) factors was the same as in Model 6. However, the Metacognitive factor was
divided into two factors: “Internal” Metacognition (Initiate, Working Memory, & Plan/Organize)
and the “External” Metacognition (Organization of Materials & Task-Monitor scales; Gioia et
al., 2002). A summary of the models is reported in Table 2; depictions of these models are also
presented in Figures 1 through 7.
Fit criteria. Several criteria were used to evaluate goodness of fit. The standard
criterion has been the chi-square test (χ2), in which a statistically nonsignificant (p > .05) test
would indicate that the model is a good fit for the data. However, this criterion has been found
to be sensitive to sample size, and as a result, could be statistically significant even when the
model might be a good fit for the data (Bentler, 1988). Thus, χ2, along with its degrees of
freedom and associated p-value, was examined, but it was not considered a sufficient criterion to
assess model fit on its own. Other fit indices were used that reflected three broad categories of
fit: absolute, incremental, and parsimony.
Table 2

Composition of Models Organized by Factor and Indicator

Eight-Scale
  Model 1  Unity-8      1. GEF (Inhibit, Shift, Emotional Control, Initiate, Working Memory,
                           Plan/Organize, Organization of Materials, Monitor)
  Model 2  2Original-8  1. BRI (Inhibit, Shift, Emotional Control)
                        2. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Monitor)
  Model 3  2Donders-8   1. BRI (Shift, Emotional Control)
                        2. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Monitor, Inhibit)
Nine-Scale
  Model 4  Unity-9      1. GEF (Inhibit, Shift, Emotional Control, Initiate, Working Memory,
                           Plan/Organize, Organization of Materials, Self-Monitor, Task-Monitor)
  Model 5  2Monitor-9   1. BRI (Inhibit, Shift, Emotional Control, Self-Monitor)
                        2. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Task-Monitor)
  Model 6  3Monitor-9   1. BRI (Inhibit, Self-Monitor)
                        2. ERI (Shift, Emotional Control)
                        3. MI (Initiate, Working Memory, Plan/Organize, Organization of
                           Materials, Task-Monitor)
  Model 7  4Monitor-9   1. BRI (Inhibit, Self-Monitor)
                        2. ERI (Shift, Emotional Control)
                        3. Int MI (Initiate, Working Memory, Plan/Organize)
                        4. Ext MI (Organization of Materials, Task-Monitor)

Note. GEF = General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index; ERI = Emotional
Regulation Index; Int MI = Internal Metacognition; Ext MI = External Metacognition.
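For readers who wish to replicate these comparisons in open-source SEM software, the factor-to-indicator composition in Table 2 can be encoded as a simple data structure. The sketch below is illustrative only: the scale names are abbreviated identifiers, and the lavaan-style syntax helper is a hypothetical convenience, not part of this study's EQS analyses.

```python
# Factor-to-indicator composition of the seven BRIEF-Parent CFA models
# (mirrors Table 2; scale names abbreviated for illustration).
MODELS = {
    "Unity-8":     {"GEF": ["Inhibit", "Shift", "EmotionalControl", "Initiate",
                            "WorkingMemory", "PlanOrganize", "OrgMaterials", "Monitor"]},
    "2Original-8": {"BRI": ["Inhibit", "Shift", "EmotionalControl"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "Monitor"]},
    "2Donders-8":  {"BRI": ["Shift", "EmotionalControl"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "Monitor", "Inhibit"]},
    "Unity-9":     {"GEF": ["Inhibit", "Shift", "EmotionalControl", "Initiate",
                            "WorkingMemory", "PlanOrganize", "OrgMaterials",
                            "SelfMonitor", "TaskMonitor"]},
    "2Monitor-9":  {"BRI": ["Inhibit", "Shift", "EmotionalControl", "SelfMonitor"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "TaskMonitor"]},
    "3Monitor-9":  {"BRI": ["Inhibit", "SelfMonitor"],
                    "ERI": ["Shift", "EmotionalControl"],
                    "MI":  ["Initiate", "WorkingMemory", "PlanOrganize",
                            "OrgMaterials", "TaskMonitor"]},
    "4Monitor-9":  {"BRI": ["Inhibit", "SelfMonitor"],
                    "ERI": ["Shift", "EmotionalControl"],
                    "IntMI": ["Initiate", "WorkingMemory", "PlanOrganize"],
                    "ExtMI": ["OrgMaterials", "TaskMonitor"]},
}

def to_syntax(model):
    """Build lavaan-style measurement syntax (hypothetical helper)."""
    return "\n".join(f"{f} =~ {' + '.join(inds)}" for f, inds in model.items())

print(to_syntax(MODELS["2Original-8"]))
```

Encoding the models this way makes the single substantive difference between, for example, 2Original-8 and 2Donders-8 (the placement of Inhibit) explicit and machine-checkable.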
Figure 1. One-factor, eight-scale (Unity-8) model based on theory of unity.
Figure 2. Gioia et al.’s (2000) original two-factor model based on eight scales
(2Original-8). BRI = Behavioral Regulation Index; MI = Metacognition Index.
Figure 3. Donders et al.’s (2010) two-factor, eight-scale model (2Donders-8) depicted
at scale level. BRI = Behavioral Regulation Index; MI = Metacognition Index.
Figure 4. Gioia et al.’s (2002) one-factor, nine-scale model (Unity-9) depicted at scale level.
Figure 5. Gioia et al.’s (2002) two-factor, nine-scale model (2Monitor-9) depicted
at scale level. BRI = Behavioral Regulation Index; MI = Metacognition Index.
Figure 6. Gioia et al.’s (2002) three-factor model (3Monitor-9) depicted at scale
level. BRI = Behavioral Regulation Index; ERI = Emotional Regulation Index;
MI = Metacognition Index.
Figure 7. Gioia et al.’s (2002) four-factor model (4Monitor-9) depicted at scale
level. BRI = Behavioral Regulation Index; ERI = Emotional Regulation Index;
Internal MI = Internal Metacognition Index; External MI = External Metacognition
Index.
77
Absolute indices are ones that "directly assess how well an a priori model produces the sample data" (Hu & Bentler, 1998, p. 426), and their calculation does not rely on a baseline model. Absolute indices used for judgment of model fit were (a) the standardized root-mean-square residual (SRMR), which is the average difference between the sample variances and covariances and the estimated population variances and covariances (Hu & Bentler, 1995), and (b) the root mean square error of approximation (RMSEA), which reflects the error of approximation between the population covariance matrix and the model with optimally chosen parameter values (Steiger & Lind, 1980). Good model fit was determined based on low values for both indices: RMSEA equal to or less than .08 (Browne & Cudeck, 1993); SRMR equal to or less than .08 (Hu & Bentler, 1999). A 90% confidence interval around the RMSEA estimate was reported to increase the precision of the estimate. The lower bound should ideally be less than .05 and as close to zero as possible, and the upper bound should be equal to or less than .08 (Browne & Cudeck, 1993).
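To make the RMSEA criterion concrete, the index can be computed from a model's chi-square, degrees of freedom, and sample size under a common formulation. The sketch below is illustrative, not the EQS implementation, and the function name is ours:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation under a common
    formulation: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Worked example with the Unity-8 values reported in Table 4
# (chi-square = 316.836, df = 20, N = 371):
value = rmsea(316.836, 20, 371)
print(f"{value:.3f}")  # 0.200 -- well above the .08 cutoff
```

Note that when the chi-square is at or below its degrees of freedom, the formula returns zero, reflecting fit as good as can be expected in the population.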
Incremental indices "measure the proportionate improvement in fit by comparing a target model with a more restricted, baseline model" (Hu & Bentler, 1998, p. 426). The null hypothesis for these models was that all variables were uncorrelated. Two incremental fit indices used were the comparative fit index (CFI) and the nonnormed fit index (NNFI); values equal to or greater than .90 were indicative of an acceptable fit (Marsh & Grayson, 1995). The three eight-scale models (Models 1 through 3) were nested, as were all nine-scale models (Models 4 through 7). Chi-square values were used to compare the nested models, with lower values considered ideal. Furthermore, the incremental change in χ2 was also used to examine whether a less restrictive nested model (e.g., a two-factor model) had a better fit than a more restrictive one (e.g., a one-factor model).
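The nested-model comparison can be sketched as a chi-square difference test; the helper below is illustrative (the critical values are standard .05 table values for 1 to 8 degrees of freedom, and the function name is ours):

```python
# Standard .05 critical values of chi-square for df = 1..8;
# this helper and its names are illustrative, not part of the analysis.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
                5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507}

def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Return (delta chi-square, delta df, significant at .05) for a
    pair of nested models (the restricted model has more df)."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, d_chi2 > CHI2_CRIT_05[d_df]

# Unity-8 (316.836, df = 20) vs. 2Original-8 (173.478, df = 19):
print(chi2_difference(316.836, 20, 173.478, 19))  # significant drop
# 3Monitor-9 (131.689, df = 24) vs. 4Monitor-9 (130.049, df = 21):
print(chi2_difference(131.689, 24, 130.049, 21))  # nonsignificant drop
```

A significant drop indicates that freeing the additional parameters (here, additional factors) genuinely improves fit rather than capitalizing on chance.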
To compare the best fitting eight-scale models to the best fitting nine-scale models, the
Expected Cross-Validation Index (ECVI; Browne & Cudeck, 1989) was used to assess
parsimony of the non-nested models. Parsimony fit indices penalize models that are less
straightforward so that simpler theoretical processes are preferred over more complex ones. This
particular value is used to express overall error between the population covariance and the model
fitted to the sample. A lower value is considered ideal when compared to ECVI values of other
models (Diamantopoulos & Siguaw, 2000).
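One common formulation of the ECVI (Browne & Cudeck, 1989) expresses it in terms of the model chi-square, the number of freely estimated parameters q, and sample size N. Software implementations vary, so the sketch below is illustrative only, and the parameter counts in the example are hypothetical:

```python
def ecvi(chi2, q, n):
    """Expected Cross-Validation Index under one common formulation:
    (chi2 + 2q) / (N - 1), where q is the number of freely estimated
    parameters. Software implementations differ in details."""
    return (chi2 + 2 * q) / (n - 1)

# Hypothetical parameter counts, for illustration only; the model
# with the lower ECVI is preferred among non-nested models.
model_a = ecvi(173.478, 17, 371)
model_b = ecvi(131.689, 21, 371)
print(model_b < model_a)  # True
```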
Models were also examined for fit and misfit on several other criteria: the average off-diagonal absolute standardized residual (AODSR), the statistical significance of the parameters in the equations, and the effect sizes (R2) of the parameters. Average off-diagonal absolute standardized residuals indicate the discrepancy between the sample covariance matrix and the model covariance matrix (Browne, 2006). Standardized residuals are comparable to standard scores in a sampling distribution and, as a result, can be interpreted like z scores. Thus, values greater than 2.58 (p < .01) are considered "large" (Byrne, 2006, p. 94). Tests of the statistical significance of the estimated parameters (the unstandardized parameter estimate divided by its standard error) were also conducted. The ratios are interpreted like z scores, such that values greater than ±1.96 are considered statistically significant at the probability level of .05 (Byrne, 2006). Effect sizes (R2) of the parameters were the squared values of the standardized path coefficients; values less than .10 indicated a "small" effect, values around .30 indicated a "medium" effect, and values greater than .50 were considered a "large" effect (Cohen, 1988).
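These two checks can be sketched as follows; the function names are ours, and the .91 loading in the example is the Plan/Organize coefficient reported later for the 2Original-8 Model, while the estimate and standard error are hypothetical:

```python
def z_ratio(estimate, std_error):
    """Unstandardized estimate divided by its standard error;
    |z| > 1.96 indicates significance at p < .05 (Byrne, 2006)."""
    return estimate / std_error

def effect_size_label(loading):
    """Square a standardized path coefficient and apply Cohen's (1988)
    benchmarks: < .10 small, around .30 medium, > .50 large."""
    r2 = loading ** 2
    if r2 > 0.50:
        return r2, "large"
    if r2 >= 0.10:
        return r2, "medium"
    return r2, "small"

# A standardized loading of .91 implies a large effect (R2 of about .83):
print(effect_size_label(0.91)[1])  # large
# A hypothetical estimate of 0.45 with SE = 0.20 is significant:
print(abs(z_ratio(0.45, 0.20)) > 1.96)  # True
```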
RESULTS
Preliminary Analyses
Preliminary analyses were conducted to examine the scores on the BRIEF-Parent for
missing values and outliers. A total of 59 cases were identified as missing one or more responses; the missing data were not imputed because the ratings reflect an actual condition rather than an attitude. Thus, these cases were deleted listwise, reducing the sample
size from 430 to 371. The Mahalanobis distance test (Tabachnick & Fidell, 2001) was
conducted and three extreme multivariate outliers (p < .01) were identified. Removing these
cases and re-running the primary analyses without them did not substantially alter the findings.
As a result, all analyses included these cases. The final sample used for statistical analysis was
371.
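A Mahalanobis-distance screen of the kind described above can be sketched with NumPy. The data and the chi-square critical value in the example (9.210, i.e., p < .01 for two variables rather than for the full set of BRIEF scales) are fabricated for illustration, and the function name is ours:

```python
import numpy as np

def mahalanobis_outliers(data, crit):
    """Flag rows whose squared Mahalanobis distance from the centroid
    exceeds `crit`, the chi-square critical value with df equal to the
    number of variables."""
    x = np.asarray(data, dtype=float)
    diff = x - x.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return [i for i, d in enumerate(d2) if d > crit]

# Fabricated data: 29 unremarkable bivariate points plus one extreme
# case; 9.210 is the p < .01 chi-square critical value for df = 2.
points = [(i % 5, i // 5) for i in range(29)] + [(50, 50)]
print(mahalanobis_outliers(points, 9.210))  # [29]
```

Flagged cases can then be temporarily removed and the analyses re-run, as was done here, to check whether the outliers materially change the findings.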
Descriptive Statistics
Descriptive statistics of the BRIEF scale scores (mean, standard deviation, skewness, and
kurtosis values, reliability estimates, and correlations) are presented in Table 3. Assumptions for
parametric statistics were tested (linearity and normality; Kline, 2006) for the BRIEF scores.
Visual examination indicated that the scores on the Inhibit scale had a slightly positive skew,
whereas the scores on the Organization of Materials scale had a slightly negative skew. Scores
on the Self-Monitor scale were marginally platykurtic, while the scores on the Initiate scale
appeared to be slightly leptokurtic. Despite these minor variations, the scores of the scales
approximated a normal distribution, using the guidelines of less than 2 for skew and less than 7
for kurtosis (Curran, West, & Finch, 1996). Linearity, inspected visually through scatterplots of
the BRIEF scale scores, was determined to be acceptable.
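The skew and kurtosis screening can be sketched with the common moment-based definitions (EQS and SPSS apply small bias corrections, so their values may differ slightly; the function name and data are ours):

```python
def skew_kurtosis(scores):
    """Moment-based skewness (m3 / m2**1.5) and excess kurtosis
    (m4 / m2**2 - 3), the quantities screened against the |skew| < 2
    and |kurtosis| < 7 guidelines of Curran et al. (1996)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# A symmetric, flat distribution: zero skew and negative (platykurtic)
# excess kurtosis, both well inside the guidelines.
skew, kurt = skew_kurtosis([1, 2, 3, 4, 5])
print(round(skew, 3), round(kurt, 3))  # 0.0 -1.3
```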
Table 3
Descriptive Statistics of Raw Scale Scores on the BRIEF-Parent Form

Scale                           1     2     3     4     5     6     7     8     9     10
1. Inhibit                     ---
2. Shift                       .60   ---
3. Emotional Control           .69   .73   ---
4. Initiate                    .57   .64   .54   ---
5. WM                          .60   .63   .50   .72   ---
6. Plan/Organize               .53   .58   .48   .77   .81   ---
7. Organization of Materials   .50   .47   .46   .60   .62   .68   ---
8. Monitor                     .70   .63   .59   .72   .71   .75   .57   ---
9. Self-Monitor                .72   .61   .64   .65   .58   .59   .44   .88   ---
10. Task-Monitor               .47   .45   .36   .59   .64   .71   .55   .83   .47   ---
M                            16.31 14.49 18.43 16.05 20.35 25.25 13.04 16.43  7.91  8.53
SD                            5.56  3.93  5.47  3.73  5.54  6.22  3.74  4.08  2.55  2.21
Range                        10-30  8-24 10-30  8-24 10-30 12-36  6-18  8-24  4-12  4-12
Skew                           .66   .15   .26  -.02  -.17  -.19  -.29  -.11   .01  -.15
Kurtosis                      -.56  -.83  -.93  -.66  -.85  -.86 -1.05  -.83 -1.13  -.81
α                              .93   .86   .92   .80   .92   .91   .90   .85   .86   .78

Note. N = 371. All correlations were statistically significant at .01.
All 45 correlations were statistically significant at .01 and ranged from .36 (Task-Monitor
and Emotional Control) to .88 (Self-Monitor and Monitor; Mdn = .60). Plan/Organize was
highly correlated with several other scales (Initiate =.77; Working Memory = .81; Monitor = .75;
Task-Monitor = .71). Monitor was highly correlated with Inhibit (.70), Initiate (.72) and
Working Memory (.71). Other notably high correlations existed between Emotional Control and
Shift (.73); Working Memory and Initiate (.72); and Self-Monitor and Inhibit (.72). The Monitor
scale was also highly correlated with Self-Monitor (.88) and Task-Monitor (.83), but these patterns were expected because the latter two scales make up the former scale. Reliability
(Cronbach’s alpha) of the scores ranged from .79 (Task-Monitor) to .93 (Inhibit; Mdn = .88) for
the BRIEF scales. The reliability estimate for the scores on the Task-Monitor scale was slightly
under .80, which is considered the minimum level for high-stakes decisions (Sattler, 2001).
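Cronbach's alpha, the reliability estimate reported above, can be computed from item scores as follows (an illustrative sketch; the function name and the toy data are ours):

```python
def cronbach_alpha(items):
    """Cronbach's alpha from aligned item-score lists (one list per
    item): k / (k - 1) * (1 - sum of item variances / variance of
    the total scores), using sample (n - 1) variances."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# Two perfectly parallel items yield an alpha of (approximately) 1.0:
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```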
Confirmatory Factor Analyses
CFAs (maximum likelihood extraction) of the BRIEF-Parent form were conducted using
EQS v. 6.2 software on a covariance matrix computed from raw scale scores. A CFA using
item-level data was not conducted due to the low ratio (4:1) between sample size (N = 371) and
number of items (86; Barrett & Kline, 1981).
Criteria. As summarized on pages 68-77, models were considered a good fit based on
the following criteria: (a) the change (drop) in chi-square was statistically significant in
comparison to the null hypothesis or competing models (p < .01); (b) the fit indices of CFI and
NNFI were equal or greater than .90 (Marsh & Grayson, 1995); (c) RMSEA was equal or less
than .08 (Browne & Cudeck, 1993); (d) SRMR was equal or less than .08 (Hu & Bentler, 1999);
(e) average off-diagonal standardized residuals (AODSR) was less than .05; and (f) the largest
standardized residuals were less than |1.96| (Byrne, 2006). Each model was compared to the others to ascertain the model with the better fit to the data. Because an alternative-models approach was used, post-hoc model re-specifications were not conducted (Jöreskog, 1993), although possible reasons for misfit of the data to a model were examined.
Models. As described in the Method section (see pp. 66-78), seven models were tested.
The 2Original-8 model, the first posed by Gioia et al. (2000), was used as a basis of comparison
in relation to all other tested models. The six alternative models were based on prior research
(Donders et al., 2010; Egeland & Fallmyr, 2010; Gioia et al., 2002; Huizinga & Smidts, 2011;
Hulac, 2008). Three models were based on eight scales and the remaining four models were
based on nine scales, resulting in two sets of nested models. The nested models for the eight-
scale version were (a) the Unity-8 Model, (b) the 2Original-8 Model, and (c) the 2Donders-8
Model. Models for the nine-scale version were based on the subdivision of the Monitor scale
into two separate scales: (a) Self-Monitor and (b) Task-Monitor. As indicated earlier, this subdivision resulted in the reconfiguration of the nine scales into four nested models: (a) the
Unity-9 model, (b) the 2Monitor-9 Model, (c) the 3Monitor-9 Model, and (d) the 4Monitor-9
Model. Eight-scale models were considered non-nested with the nine-scale models.
Eight-scale models. A summary of the goodness-of-fit indices for the eight-scale models
of the BRIEF-Parent scale is displayed in Table 4. Across all fit indices except one, the 2Original-8 Model fitted the data better than the other two eight-scale models. The 2Original-8 Model had a χ2 value of 173.478, which was statistically lower than that of the Unity-8 Model (316.836) or the 2Donders-8 Model (214.787). All eight-scale model χ2 values were statistically significant at .05 relative to the Null. Also, the 2Original-8 Model had consistently better fit values (≥ .90) across the incremental fit indices (CFI and NNFI) in comparison to the variability found with the other two eight-scale models (Unity-8 = .819 to .870; 2Donders-8 = .874 to .915). Furthermore, the
Table 4
Summary of the Fit Indices of CFA (ML Extraction) Models on the BRIEF-Parent Form Scale Scores for a Mixed Disability Sample

Model         χ2         df  χ2diff     NNFI  CFI   SRMR  RMSEA  90% CI        AODSR
Eight-Scale
Null          2320.048*  28  --         --    --    --    --     --            --
Unity-8       316.836*   20  2003.212*  .819  .870  .066  .200   (.181, .220)  .050
2Original-8   173.478*   19  143.358*   .901  .933  .049  .148   (.128, .168)  .042
2Donders-8    214.787*   19  102.049*   .874  .915  .057  .167   (.147, .187)  .041
Nine-Scale
Null          2501.886*  36  --         --    --    --    --     --            --
Unity-9       414.935*   27  2086.951*  .790  .843  .076  .197   (.180, .214)  .064
2Monitor-9    165.279*   26  249.656*   .922  .944  .044  .120   (.103, .138)  .035
3Monitor-9    131.689*   24  33.59*     .934  .956  .041  .110   (.092, .128)  .031
4Monitor-9    130.049*   21  1.640      .924  .956  .039  .118   (.099, .138)  .031

Note. N = 371. ML = maximum likelihood extraction; Unity-8 = one-factor, eight-scale model; 2Original-8 = two-factor, eight-scale Gioia model; 2Donders-8 = two-factor, eight-scale Donders model; Unity-9 = one-factor, nine-scale model; 2Monitor-9 = two-factor, nine-scale model; 3Monitor-9 = three-factor, nine-scale model; 4Monitor-9 = four-factor, nine-scale model; χ2 = chi-square; df = degrees of freedom; χ2diff = chi-square difference; NNFI = Bentler-Bonett nonnormed fit index; CFI = comparative fit index; SRMR = standardized root-mean-square residual; RMSEA = root mean square error of approximation; CI = confidence interval; AODSR = average off-diagonal standardized residual.
*p < .05.
2Original-8 Model had a lower SRMR value (.049) in comparison to the values for the other two
8-scale models (Unity-8 Model = .066; 2Donders-8 Model = .057). The 2Donders-8 Model had
a negligibly lower score for the average off-diagonal standardized residual (AODSR; .041) than
the 2Original-8 Model (.042; .001 difference). All eight-scale models evidenced misfit in that
the RMSEA was greater than .08 (Browne & Cudeck, 1993).
A closer examination of the CFA findings for the 2Original-8 Model indicated that all
equations for the parameter estimates were statistically significant at .05. Structure coefficients
ranged from .72 (Organization of Materials and MI) to .91 (Plan/Organize and MI; Mdn = .85).
Factor intercorrelation between the BRI and MI of the 2Original-8 Model was .794. Effect sizes (R2, as provided by EQS) for the 2Original-8 Model were all "large" (> .50; Cohen, 1988); the respective factor accounted for 52% (Organization of Materials) to 82% (Plan/Organize) of the variance in each scale. Structure coefficients, effect sizes, and error terms are presented in Table 5 for the 2Original-8 Model.
The nested eight-scale models were compared to one another through the incremental change in χ2. The incremental fit of the Unity-8 Model differed from the Null [χ2(8) = 2003.212, p < .001]. The 2Original-8 Model also differed from the Unity-8 Model [χ2(1) = 143.358, p < .001], as did the 2Donders-8 Model from the Unity-8 Model [χ2(1) = 102.049, p < .001]. Of the two-factor models, the 2Original-8 Model had the higher incremental change relative to the more restricted model (i.e., Unity-8).
Nine-scale models. A summary of the goodness-of-fit indices for the nine-scale models is also displayed in Table 4. As expected, the Null Model for the nine-scale data was not supported in that its chi-square was statistically significantly larger than that of any of the nine-scale models. The 4Monitor-9 Model had a χ2 of 130.049, which was lower than the Unity-9 (χ2 = 414.935),
Table 5
Structure Coefficients for BRIEF-Parent Scales for Mixed Disability Sample Arranged by Model (Maximum Likelihood Extraction)

Entries are structure coefficients, with error terms in parentheses and effect sizes in brackets.

Model 1: Unity-8 Model
  Factor 1 - GEF: Inhibit .79 (.62) [.62]; Shift .84 (.55) [.70]; ECO .84 (.54) [.71]; Initiate .85 (.52) [.73]; WM .87 (.49) [.76]; P/O .91 (.42) [.82]; ORG .72 (.69) [.52]; Monitor .84 (.54) [.71]

Model 2: 2Original-8 Model
  Factor 1 - BRI: Inhibit .79 (.62) [.62]; Shift .84 (.55) [.70]; ECO .84 (.54) [.71]
  Factor 2 - MI: Initiate .85 (.52) [.73]; WM .87 (.49) [.76]; P/O .91 (.42) [.82]; ORG .72 (.69) [.52]; Monitor .84 (.54) [.71]

Model 3: 2Donders-8 Model
  Factor 1 - BRI: Shift .91 (.42) [.82]; ECO .80 (.60) [.65]
  Factor 2 - MI: Initiate .85 (.53) [.72]; WM .87 (.50) [.75]; P/O .89 (.46) [.79]; ORG .72 (.70) [.51]; Monitor .85 (.52) [.73]; Inhibit .70 (.71) [.50]

Model 4: Unity-9 Model
  Factor 1 - GEF: Inhibit .72 (.69) [.52]; Shift .74 (.67) [.55]; ECO .67 (.74) [.45]; S-Monitor .74 (.68) [.54]; Initiate .85 (.53) [.72]; WM .86 (.51) [.74]; P/O .88 (.48) [.77]; ORG .71 (.70) [.51]; T-Monitor .71 (.70) [.51]

Model 5: 2Monitor-9 Model
  Factor 1 - BRI: Inhibit .82 (.57) [.68]; Shift .80 (.60) [.64]; ECO .82 (.57) [.68]; S-Monitor .82 (.58) [.67]
  Factor 2 - MI: Initiate .84 (.54) [.71]; WM .88 (.48) [.77]; P/O .92 (.38) [.85]; ORG .73 (.68) [.53]; T-Monitor .74 (.67) [.55]

Model 6: 3Monitor-9 Model
  Factor 1 - BRI: Inhibit .85 (.53) [.72]; S-Monitor .85 (.53) [.72]
  Factor 2 - ERI: Shift .86 (.51) [.74]; ECO .85 (.53) [.72]
  Factor 3 - MI: Initiate .84 (.54) [.71]; WM .88 (.49) [.77]; P/O .92 (.38) [.85]; ORG .73 (.69) [.53]; T-Monitor .74 (.67) [.55]

Model 7: 4Monitor-9 Model
  Factor 1 - BRI: Inhibit .85 (.53) [.72]; S-Monitor .85 (.53) [.72]
  Factor 2 - ERI: Shift .85 (.53) [.72]; ECO .86 (.51) [.74]
  Factor 3 - Int MI: Initiate .84 (.54) [.71]; WM .88 (.49) [.77]; P/O .92 (.38) [.85]
  Factor 4 - Ext MI: ORG .73 (.68) [.53]; T-Monitor .75 (.67) [.56]

Note. N = 371; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG = Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF = General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index; ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor, nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-scale model; 4Monitor-9 = Four-factor, nine-scale model. Effect size = R2.
2Monitor-9 (χ2 = 165.279), but only slightly lower than the 3Monitor-9 Model (χ2 = 131.689).
Three of the four nine-scale models demonstrated a strong fit to the data (i.e., 2Monitor-9,
3Monitor-9, 4Monitor-9), with the fit indices of NNFI and CFI ranging from .922 to .956.
AODSRs (.031 - .035) and SRMRs (.039 - .044) were less than .05. The Unity-9 Model
demonstrated poor fit to the data across all fit indices except for the SRMR (.076), but its value
was the highest of the nine-scale models. All four nine-scale models evidenced misfit in that the RMSEAs were greater than .08. However, the viable nine-scale models were the two-, three-, and four-factor ones.
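Under their standard definitions, the CFI and NNFI values in Table 4 can be reproduced (to rounding) from the reported chi-square values; the functions below are an illustrative sketch with names of our choosing:

```python
def cfi(chi2, df, chi2_null, df_null):
    """Comparative fit index under its standard definition."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 - d_model / d_null

def nnfi(chi2, df, chi2_null, df_null):
    """Nonnormed (Tucker-Lewis) fit index under its standard definition."""
    r_null = chi2_null / df_null
    r_model = chi2 / df
    return (r_null - r_model) / (r_null - 1.0)

# Reproduce the Unity-9 row of Table 4 (null model: 2501.886, df = 36):
print(f"{cfi(414.935, 27, 2501.886, 36):.3f}")   # 0.843
print(f"{nnfi(414.935, 27, 2501.886, 36):.3f}")  # 0.790
```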
An examination of the viable nine-scale models showed that all equations for the parameter
estimates were statistically significant at .05. Structure coefficients for the 2Monitor-9 Model
ranged from .73 (Organization of Materials and MI) to .92 (Plan/Organize and MI; Mdn = .82).
Factor intercorrelation between the BRI and MI for the 2Monitor-9 Model was .771. For the
3Monitor-9 Model, structure coefficients were similar to the 2Monitor-9 and ranged from .73
(Organization of Materials and MI) to .92 (Plan/Organize and MI; Mdn = .85). Intercorrelations
of the factors for the 3Monitor-9 Model were moderate to high: BRI and ERI = .879; BRI and
MI = .763; and ERI and MI = .716. For the 4Monitor-9 Model, again, structure coefficients were
similar to the other two models and ranged from .73 (Organization of Materials and Ext MI) to
.92 (Plan/Organize and Internal MI; Mdn = .85). Factor intercorrelations for the 4Monitor-9
Model ranged from .683 to .997 and were as follows: BRI and ERI = .878; BRI and Internal MI
= .767; BRI and External MI = .750; ERI and Internal MI =.725; ERI and External MI = .683;
and Internal MI and External MI = .997. Singularity was observed between the two
Metacognitive scales, which diminished the viability of the four-factor model.
Both the 3Monitor-9 and 4Monitor-9 Models consistently showed slightly stronger fit to the data than the 2Monitor-9 Model. All four nested nine-scale models were compared to one another based on the change in chi-square from the most to the least restrictive models. The Unity-9 Model fit the data poorly but differed from the Null [χ2(9) = 2086.951, p < .001]. The 2Monitor-9 Model also differed from the Unity-9 Model [χ2(1) = 249.656, p < .001]. The incremental fit of the 3Monitor-9 Model differed from that of the 2Monitor-9 Model [χ2(2) = 33.59, p < .001]; however, the incremental fit of the 4Monitor-9 Model did not differ from that of the 3Monitor-9 Model [χ2(3) = 1.640, p > .05]. This finding indicated that adding another factor to the three-factor model did not improve the fit to the data.
The RMSEA value was outside of the recommended range for all models but was slightly lower for the 3Monitor-9 Model than for the other models, including the 4Monitor-9 Model. Additionally, singularity was found between the Internal MI factor and the External MI factor (.997) in the 4Monitor-9 Model. Given the similar fit indices (i.e., NNFI, CFI, SRMR, RMSEA, and AODSR) of the 3Monitor-9 and 4Monitor-9 Models, prior research (Egeland & Fallmyr, 2010; Gioia et al., 2002), incremental χ2 differences, and the parsimony of the three-factor model over the four, the 3Monitor-9 Model was selected as having
the better fit for the nine-scale version of the BRIEF-Parent scale. The factor structure, including
structure coefficients and error terms, for the 3Monitor-9 Model is displayed in Figure 8.
Eight- versus nine-scale models. To compare the goodness-of-fit values of the non-
nested models (8- versus 9-scales), two approaches were used. First, the viable models were, in
general, compared to each other based on the fit indices. Second, the models were compared on
the value of the Expected Cross-Validation Index (ECVI; Browne & Cudeck, 1989). This value
is used to express overall error between the population covariance and the model fitted to the
sample. A lower value is considered ideal; the index is not informative in itself but only in comparison with other models (Diamantopoulos & Siguaw, 2000). As discussed, the model with the
best fit among the eight-scale models was the 2Original-8 Model. Even though three out of the
four nine-scale models (2Monitor-9, 3Monitor-9, 4Monitor-9) had a stronger fit to the data than the 2Original-8 Model, the 3Monitor-9 Model was considered to fit the data best. Thus, comparisons
were made between the 2Original-8 Model and 3Monitor-9 Model.
Across the fit indices, the 3Monitor-9 Model had higher values on the NNFI and CFI (.023-.033 difference) and, as desired, slightly lower values (.008-.038 difference) on the
Figure 8. Standardized coefficients derived by confirmatory factor analysis (maximum likelihood) for the three-factor, nine-scale (3Monitor-9) Model. Effect sizes are squared standardized structure coefficients. BRI = Behavioral Regulation Index; ERI = Emotional Regulation Index; MI = Metacognition Index. [Path diagram: Inhibit and Self-Monitor load on BRI; Shift and Emotional Control load on ERI; Initiate, Working Memory, Plan/Organize, Organization of Materials, and Task-Monitor load on MI; structure coefficients and error terms match those reported for the 3Monitor-9 Model in Table 5, with factor intercorrelations of .88 (BRI-ERI), .76 (BRI-MI), and .72 (ERI-MI).]
SRMR, RMSEA, and AODSR in comparison to the 2Original-8 Model. Additionally, the
3Monitor-9 Model had a lower ECVI value (.453) in comparison to the 2Original-8 Model (.550), indicating a better fit.
Subsamples. The sample had some unique demographic features in that 74% of participants were between the ages of 16 and 18 and were OVR referrals. Also, 95% of the participants were Caucasian, and 72% of the raters were mothers. Because of the disproportionate sample sizes reflecting these demographic features, it was not possible to run separate CFAs to determine whether the fit of the data to the model depended on these features. However, participants of the minority subsamples (i.e., non-OVR referrals, racial/ethnic minority participants, and non-mother raters) were temporarily removed from analysis, and CFAs were re-run only on the scores of the majority subsamples (i.e., OVR only, Caucasians only, and mothers only).
OVR subsample. The OVR subsample consisted of 264 participants. All eight-scale model χ2 values were statistically significant at .05. The 2Original-8 Model for this subsample had a χ2 value of 124.980, which was statistically lower than that of the Unity-8 Model (223.707) or the 2Donders-8 Model (138.838). Also, the 2Original-8 Model had slightly better fit values (≥ .90) across the incremental fit indices (CFI and NNFI) in comparison to the Unity-8 Model (.819 to .870, respectively) and the 2Donders-8 Model (.888 to .915, respectively). All three eight-scale models for the OVR subsample had the same CFI as the full sample; two of the three (2Original-8 and Unity-8) had the same NNFI as well. The 2Donders-8 Model varied slightly, with a difference in the NNFI of .014 in relation to the full sample. RMSEA values for all three eight-scale models were consistently large (> .14), with differences from the full sample not exceeding .012. Similar to the findings for the full sample, the 2Original-8 Model had the best fit compared to the Null Model and the two other eight-scale models for the OVR subsample. The 2Original-8 Model also had a higher incremental change [χ2(1) = 98.727, p < .001] relative to the more restricted model.
In terms of nine-scale models, all χ2 values were statistically significant relative to the Null Model. Three out of the four models (2Monitor-9, 3Monitor-9, 4Monitor-9) demonstrated a strong fit to the data, with the fit indices of NNFI and CFI ranging from .925 to .956. AODSRs (.032-.037) and SRMRs (.040-.048) for the three nine-scale models were less than .05. Again, the Unity-9 Model had a poor fit across all indices in comparison to the other nine-scale models. The difference between the OVR subsample and the full sample on the fit indices for all four models was small (≤ .011). Similar to the full sample, all nine-scale models evidenced misfit in that the RMSEAs were greater than .08 in the OVR subsample. All equations for the parameter estimates were statistically significant at .05. Effect sizes (R2) were typically large (> .50; Cohen, 1988), with the exception of two values in the Unity-9 Model, which showed medium values (Emotional Control = .40 and Task-Monitor = .48). The effect sizes of the remaining nine-scale models ranged from .51 (Task-Monitor) to .87 (Plan/Organize). Incremental χ2 tests showed statistically significant differences between the more and less restrictive models, with the exception of the 4Monitor-9 to 3Monitor-9 comparison, which showed a nonsignificant change.
For the same reasons indicated for the full sample (good fit indices, incremental χ2 test results, singularity of the Internal and External MI factors, and parsimony), the best nine-scale model was again determined to be the 3Monitor-9 Model for the OVR subsample. The 3Monitor-9 Model consistently had a better fit to the data than the 2Original-8 Model for the OVR subsample in terms of the NNFI, CFI, SRMR, AODSR, and RMSEA values. In addition, the ECVI of the 3Monitor-9 Model (.467) was lower than that of the 2Original-8 Model (.589).
Overall, results mirrored the findings reported above for the full sample. A summary of the fit
indices for all models for the OVR sample is displayed in Table 6. A complete overview of
structure coefficients, effect sizes, and error terms for the OVR subsample is contained in
Appendix F.
Caucasian subsample. The sample was composed primarily of Caucasian participants (n = 354; 95.4%). For the eight-scale models, all χ2 values were statistically significant relative to the Null Model. A comparison of each Caucasian subsample value to its respective full-sample value for the three eight-scale models indicated less than .01 difference on all fit indices. Of the eight-scale models for the Caucasian subsample, the 2Original-8 Model fitted the data best (NNFI = .896; CFI = .929); SRMR was .052, and RMSEA was .152. In terms of incremental fit, again, the 2Original-8 Model differed from the Unity-8 Model (χ2(1) = 143.281, p < .001) more so than did the 2Donders-8 Model (χ2(1) = 100.253, p < .001).
In terms of nine-scale models, all χ2 values were statistically significant relative to the Null Model. Three out of the four models (2Monitor-9, 3Monitor-9, 4Monitor-9) demonstrated a strong fit to the data; the fit indices of NNFI and CFI ranged from .916 to .952. Values for the measures of residual were low and less than .05: AODSRs (.032-.037) and SRMRs (.042-.047). There was little difference on the fit indices among the nine-scale models between the Caucasian subsample and the full sample; no change exceeded .008. However, as with the other samples, RMSEAs were greater than .08. Like the full sample, the 3Monitor-9 Model for the Caucasian subsample had a slightly lower RMSEA value but almost identical values on the other fit indices as the 4Monitor-9 Model. All equations for the parameter estimates for the nine-scale models were statistically significant at .05. Effect sizes (R2) were all considered "large" (> .50; Cohen, 1988). The incremental χ2 values of the nine-scale models were statistically
Table 6
Summary of Fit Indices of CFA (ML) Models of the BRIEF-Parent Form for the OVR Sample

Model         χ2         df  χ2diff     NNFI  CFI   SRMR  RMSEA  90% CI        AODSR
8-scale
Null          1606.296   28  --         --    --    --    --     --            --
Unity-8       223.707*   20  1382.589*  .819  .870  .070  .197   (.173, .220)  .053
2Original-8   124.980*   19  98.727*    .901  .933  .057  .146   (.121, .170)  .046
2Donders-8    138.838*   19  84.869*    .888  .915  .058  .155   (.131, .179)  .045
9-scale
Null          1736.807   36  --         --    --    --    --     --            --
Unity-9       298.093*   27  1438.714*  .787  .841  .081  .195   (.175, .215)  .067
2Monitor-9    118.593*   26  179.500*   .925  .944  .048  .116   (.095, .137)  .037
3Monitor-9    86.858*    24  31.735*    .945  .956  .042  .100   (.077, .122)  .032
4Monitor-9    85.540*    21  1.318      .935  .956  .040  .108   (.084, .132)  .032

Note. N = 264. ML = maximum likelihood extraction; Unity-8 = one-factor, eight-scale model; 2Original-8 = two-factor, eight-scale Gioia model; 2Donders-8 = two-factor, eight-scale Donders model; Unity-9 = one-factor, nine-scale model; 2Monitor-9 = two-factor, nine-scale model; 3Monitor-9 = three-factor, nine-scale model; 4Monitor-9 = four-factor, nine-scale model; χ2 = chi-square; df = degrees of freedom; χ2diff = chi-square difference; NNFI = Bentler-Bonett nonnormed fit index; CFI = comparative fit index; SRMR = standardized root-mean-square residual; RMSEA = root mean square error of approximation; CI = confidence interval; AODSR = average off-diagonal standardized residual.
*p < .05.
significant for each comparison between the more and less restricted models, with the exception of the 4Monitor-9 Model to the 3Monitor-9 Model, χ2(3) = 1.452, p > .05. For the Caucasian subsample, the 3Monitor-9 Model was selected as the best model of the nine-scale models.
Factor intercorrelations for the 3Monitor-9 Model ranged from .700 to .881, which were
similar to the full sample. In comparing the non-nested 2Original-8 and 3Monitor-9 Models, the
ECVI was lower in the 3Monitor-9 Model (.492 vs. .577), indicating a better fit. Re-running the
data without racial/ethnic minority or unknown race participants did not substantially change the
model fit to the data. A summary of the fit indices for all models for the Caucasian subsample is
displayed in Table 7. A complete overview of structure coefficients, effect sizes, and error terms
is contained in Appendix F.
Mother subsample. Mothers made up 71.9% of the sample (n = 267). For the eight-scale models, all χ2 values were statistically significant relative to the Null Model. The 2Original-8 Model had the best fit to the data (NNFI = .892; CFI = .927) in comparison to the other eight-scale models. A comparison of the mother subsample to the full sample indicated that the difference between each pair of corresponding fit-index values for the eight-scale models was equal to or less than .014. RMSEAs exceeded .08, but AODSR and SRMR values were acceptable.
For the nine-scale models, with the exception of the Unity-9 Model, the models had adequate fit values (> .90) across the incremental fit indices (CFI and NNFI); SRMR values were between .041 and .048. RMSEA values were greater than .08, ranging between .111 and .121. Incremental χ2 tests showed statistically significant differences between the more and less restricted models, with the exception of the 4Monitor-9 and 3Monitor-9 Models. The 3Monitor-9 Model was, again, selected as having the best fit of the nine-scale models. Factor
Table 7
Summary of Fit Indices of CFA (ML) Models for the BRIEF-Parent Form Based on the Caucasian Participants
χ2
df χ2
diff NNFI CFI SRMR RMSEA 90% CI AODSR
8-scale
Null 2217.553 28 -- -- -- -- -- --
Unity-8 316.879* 20 1900.674* .810 .864 .069 .205 (.185, .225) .052
2Original-8 173.598* 19 143.281* .896 .929 .052 .152 (.131, .172) .043
2Donders-8 216.626* 19 100.253* .867 .910 .059 .172 (.151, .192) .043
9-scale
Null 2392.215 36 -- -- -- -- -- --
Unity-9 414.422* 27 1977.793* .781 .836 .079 .202 (.184, .219) .066
2Monitor-9 169.606* 26 244.816* .916 .939 .047 .125 (.107, .143) .037
3Monitor-9 137.812* 24 31.794* .928 .952 .043 .116 (.097, .135) .032
4Monitor-9 136.360* 21 1.452 .916 .951 .042 .125 (.105, .145) .032
Note. N = 354. ML = Maximum likelihood extraction; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-scale
Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor, nine-scale model; 2Monitor-9 = Two-
factor, nine-scale model; 3Monitor-9 = Three-factor, nine-scale model; 4Monitor-9 = Four-factor, nine-scale model; χ² = chi-square;
df = degrees of freedom; χ²diff = chi-square difference; NNFI = Bentler-Bonett non-normed fit index; CFI = comparative fit index;
SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; CI = confidence interval;
AODSR = average off-diagonal standardized residual. *p < .05.
intercorrelations were similar to those of the full sample for the 3Monitor-9 and ranged from .681
to .863. In comparing the non-nested 2Original-8 and 3Monitor-9 Models, the ECVI was lower
in the 3Monitor-9 Model (.522 vs. .630), indicating a better fit. Overall, results were similar to
those obtained from the full sample, meaning that the scores obtained from solely mothers as
raters did not substantially change the model fit to the data. A summary of the fit indices for the
mother subsample is reported in Table 8. Structure coefficients, error terms, and effect sizes for
the mother subsample are contained in Appendix F.
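Both kinds of model comparison used here can be reproduced from the tabled values alone: an incremental χ² test for nested models and the ECVI for non-nested ones. A minimal standard-library sketch; the free-parameter counts below are assumed from a conventional CFA parameterization (factor variances fixed at 1), and ECVI formulas differ slightly across software, so hand-computed values may not match the tabled ones exactly:

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df > x), via the recurrence
    sf(df + 2) = sf(df) + (x/2)^(df/2) * exp(-x/2) / gamma(df/2 + 1)."""
    if df % 2 == 0:
        s, d = math.exp(-x / 2), 2
    else:
        s, d = math.erfc(math.sqrt(x / 2)), 1
    while d < df:
        s += (x / 2) ** (d / 2) * math.exp(-x / 2) / math.gamma(d / 2 + 1)
        d += 2
    return s

def ecvi(chi2, n_free_params, n_cases):
    """Single-sample expected cross-validation index (ML form):
    (chi2 + 2q) / (N - 1); lower is better when comparing
    non-nested models."""
    return (chi2 + 2 * n_free_params) / (n_cases - 1)

# Nested comparisons, mother subsample (Table 8):
# 3Monitor-9 vs. 4Monitor-9 -> the extra parameters of the
# four-factor model are not justified (p > .05).
p_34 = chi2_sf(103.022 - 100.889, 24 - 21)
# 2Monitor-9 vs. 3Monitor-9 -> significant improvement (p < .05).
p_23 = chi2_sf(128.026 - 103.022, 26 - 24)

# Non-nested comparison: 3Monitor-9 (assumed q = 21: 9 loadings +
# 9 error variances + 3 factor covariances) vs. 2Original-8
# (assumed q = 17: 8 + 8 + 1).
better = ecvi(103.022, 21, 267) < ecvi(137.630, 17, 267)  # True
```

With these illustrative parameter counts the 3Monitor-9 Model again comes out ahead, matching the direction of the .522 vs. .630 comparison reported above.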
Table 8
Summary of Fit Indices of CFA (ML) Models of BRIEF-Parent Form Based on the Mothers as Raters
χ² df χ²diff NNFI CFI SRMR RMSEA 90% CI AODSR
8-scale
Null 1649.599 28 -- -- -- -- -- --
Unity-8 245.615* 20 1403.984* .805 .861 .074 .206 (.183, .229) .054
2Original-8 137.630* 19 107.985* .892 .927 .055 .153 (.129, .177) .046
2Donders-8 170.325* 19 75.290* .862 .907 .064 .173 (.149, .197) .045
9-scale
Null 1778.992 36 -- -- -- -- -- --
Unity-9 323.773* 27 1455.219* .773 .830 .087 .203 (.183, .223) .068
2Monitor-9 128.026* 26 195.747* .919 .941 .048 .121 (.101, .142) .039
3Monitor-9 103.022* 24 25.004* .932 .955 .044 .111 (.089, .133) .033
4Monitor-9 100.889* 21 2.133 .921 .954 .041 .120 (.096, .143) .032
Note. N = 267. ML = Maximum likelihood extraction; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor, nine-scale model; 2Monitor-9 =
Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-scale model; 4Monitor-9 = Four-factor, nine-scale model; χ² =
chi-square; df = degrees of freedom; χ²diff = chi-square difference; NNFI = Bentler-Bonett non-normed fit index; CFI =
comparative fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation;
CI = confidence interval; AODSR = average off-diagonal standardized residual. *p < .05.
DISCUSSION
The purpose of the study was to examine whether the eight-scale factor structure of
BRIEF-Parent (Gioia et al., 2000) scores could be replicated in a mixed clinical sample of
school-aged children. This study was unique in that it (a) was independently conducted, (b) had
a sample of US children with mixed clinical diagnoses, and (c) contained an adequately large
sample size for running a scale-level CFA relative to other independent published studies (e.g.,
Egeland & Fallmyr, 2010). Besides testing the original eight-scale version, other models of the
eight scales were tested as well as several models of the nine-scale format. Testing alternative
models is recommended to ensure that a preferred model is not accepted without considering
competing models that could also fit the data just as well if not better (Jöreskog, 1993). Looking
solely at Gioia et al.’s original model, the CFA findings appear to support this structure. Also,
when compared to the other two 8-scale models (Unity-8 and 2Donders-8), the 2Original-8
model appears to fit the data best. However, three of the four 9-scale models (2Monitor-9,
3Monitor-9, 4Monitor-9) demonstrated as strong a fit to the data as the 2Original-8 model and
thus are plausible models in understanding the structure of the BRIEF-Parent. The discussion
will examine potential explanations for the findings of the eight-scale models in contrast to the
nine-scale ones. Furthermore, the findings will be examined in relation to the theoretical premise
and current factor structure of the BRIEF-Parent. Limitations of the study will be presented as
well as implications for practice and future research.
Eight-Scale Models of the BRIEF-Parent
The research question of this study was “will the factor structure of BRIEF-Parent scores
obtained from a mixed clinical sample of school-aged children align with the two-factor, eight-
scale structure originally proposed by the test authors?” Based on prior research (i.e., Donders et
al., 2010; Huizinga & Smidts, 2011; Slick et al., 2006), when compared to a one-factor model,
the 2Original-8 model appears to fit the data best. From a theoretical perspective, this finding
makes sense. The one-factor model has a poor fit because there is no differentiation between the
behavioral and cognitive aspects of executive function. The one-factor model is considered to
align with the theoretical perspective of unity (Baddeley, 1986), which was discussed in the
literature review. In short, the premise of the theory of unity is that all executive processes
combine to constitute an overarching, interconnected supervisory system. This view is generally
considered to be outdated (see Packwood, Hodgetts, & Tremblay, 2011).
However, the 2Original-8 Model and the 2Donders-8 Model contain two factors (BRI
and MI), which delineate between the behavioral and cognitive components of EF. Although
both models represent both components, the configuration of the scales is not the same. On the
2Original-8 Model, the Inhibit scale is on the BRI factor, but on the 2Donders-8 Model, it is on
the MI factor. Sampling may explain the difference in model configuration. Donders et al.’s
(2010) model is based on information gleaned from a
group of children with traumatic brain injury (TBI). Thus, the type of sample may have
informed Donders et al.’s view that the Inhibit scale is a component of the cognitive factor (MI)
instead of solely a behavioral factor (BRI) as Gioia et al. (2000) has proposed. In essence, the
Inhibit scale is considered to measure a cognitive instead of behavioral aspect of impulse control.
However, Donders et al. acknowledge that what is measured by the Inhibit scale is not entirely
clear because it fails to correlate with traditional performance-based measures of inhibitory
control (Bodnar, Prahme, Cutting, Denckla, & Mahone, 2007). Furthermore, Gioia et al. (2000),
also found that the Inhibit scale loaded on the MI factor in the normative sample, but loaded on
the BRI factor in the clinical sample. In the current study, the findings support Gioia et al.’s
original model (2Original-8 Model), not Donders et al.’s. This finding is not unexpected, given
that the sample used was a mixed clinical sample, one similar to Gioia et al.’s. This study is
unique in that no other CFA study has examined the 2Original-8 Model in relation to both the
Unity-8 and 2Donders-8 Models, particularly using a mixed clinical sample of youth. However,
would the fit of the model to the data have been reversed if a TBI sample had been used to
compare the two-factor, eight-scale models? Or would Donders et al.’s finding have been
different if CFAs instead of EFAs had been used?
Thus, it is inconclusive what position the Inhibit scale has in relation to the other
executive function constructs. It is unknown whether the difference in Donders et al.’s finding
could be due to sampling, the specificity of a model to a unique sample, or a better model. Is
Donders et al.’s model unique to children with TBI? Why did the scale also load on the MI in
Gioia et al.’s (2000) factor analysis? Further research using various clinical samples is
warranted to understand Donders et al.’s eight-scale model for the BRIEF-Parent scale.
The current study’s support for the two-factor, eight-scale structure of the BRIEF-Parent
version adds to the research (Batan et al., 2011; Huizinga & Smidts, 2011; Qian & Wang, 2007;
Slick et al., 2006), which also supports an eight-scale model of the BRIEF. Two international
studies have provided support for an eight-scale structure; Batan et al. (2011) conducted an EFA
instead of a CFA, and Qian and Wang (2007) concluded that a CFA showed that the eight-scale
model of the BRIEF was “reasonable.” Unfortunately, abstracts for both of these studies were
the only information available in English; thus, it is unknown whether alternative models were
considered and, if so, what criteria were used in making these conclusions. Slick et al. (2006)
supported the 2Original-8 Model with a small clinical sample of U.S. children diagnosed with
intractable epilepsy, using EFA, but the Monitor scale loaded on both factors. A three-factor
model was apparently tested; however, little information was provided on the solution. Slick et
al. noted that the three-factor solution was “explored” (p. 186), but disregarded as a viable
solution. Huizinga and Smidts (2011) supported an adapted version of the eight-scale structure
using CFA on item-level data. The model was run twice: the first run indicated poor fit, so three
parameters were freely estimated and the model was re-run to improve fit. Because
this revised version of the model was within recommended standards, the authors deemed further
investigation (i.e., testing alternative models) unnecessary. However, the authors used a Dutch
version with a different number of items from the original scale, making generalization difficult.
In summary, studies have supported some version of a two-factor, eight-scale model.
These studies have varied in a number of characteristics: (a) small sample size (N = 80-100;
Donders et al., 2010; Slick et al., 2006), (b) sample with specific clinical diagnosis, such as TBI
(Donders et al., 2010) or intractable epilepsy (Slick et al., 2006), or (c) translated versions with a
different number of items than the original BRIEF-Parent (Batan et al., 2011; Huizinga &
Smidts, 2011; Qian & Wang, 2007). Despite differences and limitations, researchers have
provided full or partial support for Gioia et al.’s eight-scale model. However, factor analytic
studies that supported a version of the two-factor, eight-scale structure (i.e., Donders et al., 2010;
Huizinga & Smidts, 2011; Slick et al., 2006) did not test any of the nine-scale versions of the
scale as alternative models. Gioia et al. (2002) compared only nine-scale versions to one another
with no eight-scale versions serving as alternative models. In contrast, Egeland and Fallmyr
(2010) and the current study examined both eight- and nine-scale versions. The current study
examined a four-factor, nine-scale model whereas Egeland and Fallmyr did not.
Nine-Scale Models of the BRIEF-Parent
Despite limited research on the original eight-scale structure of the BRIEF-Parent scale,
research on alternative versions of the instrument has grown. Following the release of the
BRIEF in 2000, Gioia and Isquith (2002) posited that monitoring one’s own problem-solving is
distinct from monitoring one’s social behavior and thus should be examined in the context
of the BRIEF. Gioia and Isquith proposed re-examining the eight-item Monitor scale of the
BRIEF by dividing it into two 4-item scales: (a) monitoring of task-related activities (Task-
Monitor scale), and (b) monitoring of personal behavior activities (Self-Monitor scale). This
structure was considered theoretically viable due to the increasingly prevalent research that has
supported a model of two distinct emotional and attentional components of the brain (Dolcos &
McCarthy, 2006). Furthermore, there has been evidence that the BRIEF-Parent may be
improved by differentiating the Monitor scale. For example, Slick et al. (2006) reported an EFA
that supported a two-factor, eight-scale solution, but also showed that the Monitor scale loaded
on both the BRI and MI. Other clinical studies (Gilotty et al., 2002; McCandless & O’Laughlin,
2007) have shown that the Monitor scale correlated highly with both the BRI and MI factors.
Egeland and Fallmyr (2010) have raised concerns about the viability of the nine-scale
format. They contend that the reliability estimates of the scores for both the four-item Self-
Monitor and four-item Task-Monitor scales may not be adequately supported due to the small
number of items on each scale relative to the original eight-item Monitor scale. However, this
concern was not an issue in the current study. All estimates of reliability for the two Monitor
scales met or were close to the .80 cutoff. Gioia and Isquith (2002) also found that the reliability
estimates of the scores for both scales have been greater than .70, but the issue is whether these
estimates are adequate when the scales are used in making high stakes decisions.
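Egeland and Fallmyr’s concern can be quantified with the Spearman-Brown prophecy formula, which predicts how score reliability changes when a scale is shortened. A minimal sketch; the reliability value below is illustrative, not taken from the BRIEF manual:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale's length is multiplied by
    length_factor (e.g., 0.5 = halving the number of items)."""
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)

# If an eight-item Monitor scale had a reliability of .80, each
# four-item half would be predicted to have a reliability near .67,
# below the .80 cutoff noted above.
print(round(spearman_brown(0.80, 0.5), 2))  # 0.67
```

This is why splitting the eight-item Monitor scale into two four-item scales would be expected, all else equal, to lower the reliability of each resulting score.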
The focus on nine-scale models goes beyond the division of the Monitor scale to the
expansion of factors through re-configuring the placement of pre-existing scales.
Such revisions point to how views on executive function are changing. Both three-factor and
four-factor 9-scale models have been proposed and tested. The three-factor model highlights that
there is a distinction between emotional regulation and inhibitory behavior control, whereas the
four-factor model parses out emotional regulation as well as differentiates between internal and
external metacognition.
In the current study, these nine-scale models, as well as a one-factor model, were tested;
three of the four 9-scale models showed a strong fit to the data (2Monitor-9, 3Monitor-9,
4Monitor-9), meeting the criteria for goodness of fit, with the exception of RMSEA. The two-
factor, nine-scale (2Monitor-9) model seemed to fit the data, but when compared to the three-
factor (3Monitor-9) and four-factor (4Monitor-9) nine-scale models, the 2Monitor-9 model was
not as viable a solution. Along with the strong statistical evidence that supported the three-
and four-factor solutions, the factor structures seemed to align with the theoretical views of EF.
In the three-factor model, the ERI factor (Emotional Control and Shift scales) was parsed out
from the “original” BRI factor and a “new” BRI factor was created (Inhibit and Self-Monitor
scales). This delineation made practical sense because the “new” BRI was then made up of
scales that measured inhibitory behavior whereas the ERI was comprised of scales that measured
internalized emotional control. The MI factor remained intact in this model and was separated
from behavioral and emotional regulation. As Gioia et al. (2002) has pointed out, this three-
factor solution aligned with Barkley’s (1997) theory, which included a three-prong model of
executive function: (a) behavioral (inhibitory) control; (b) emotional regulation; and (c)
metacognition. The four-factor solution also included the “new” BRI and the ERI factors as well
as the division of MI into Internal MI and External MI. Although a four-prong approach to
executive function has been theorized (see Shallice & Burgess, 1991b), there are problems with
this model: the four-factor CFA showed that the Internal MI and External MI factors were
almost perfectly correlated. These factors seemed to be measuring essentially
the same construct, despite the slight improvement in the fit indices. Further, the four-factor
model presented no statistically significant advantage over the three-factor model in that the
incremental χ² test showed that the additional parameters did not significantly improve the fit of
the model to the data. Prior research (Egeland & Fallmyr, 2010; Gioia et al., 2002), in addition to the issue of
parsimony, supported that the three-factor (3Monitor-9) model was a better fit than the four-
factor model. The “best” eight-scale model was determined to be Gioia et al.’s (2000) original
model (2Original-8); however, this model fell short when directly compared to several of the
nine-scale models. There are several possible reasons for finding results that differ from those
reported in the BRIEF-Parent manual.
Differences in Findings
The difference in findings about the eight-scale structure between this study and Gioia et
al.’s (2000) could be due to the factor analytic method used. Gioia et al. (2000) used principal
factor analysis with oblique (direct oblimin) rotation to develop the BRIEF-Parent, whereas CFA
was used in the current study. While both factor analytic methods are theory-driven, CFA is
designed to test specific models about the nature of the factors while EFA is designed to
determine whether a set of variables create a factor that best reflects a theoretical construct
(Byrne, 2006). One of the biggest criticisms of exploratory factor analysis is that interpretation
of results can hinge largely on a researcher’s judgment (Tabachnick & Fidell, 2001). Because of
this vulnerability, several factor retention methods (e.g., parallel analysis and MAP) are
recommended for use beyond the eigenvalue-greater-than-one rule (Henson & Roberts, 2006). There is
no mention in the test manual that these additional procedures were used. However, Gioia et al.
(2000) does explicitly state, “the traditional method of determining the number of factors (i.e.,
eigenvalues > 1.0) was overridden in favor of theoretical considerations” (p. 61). The only stated
selection criterion was that pattern coefficients needed to be greater than |.40|. No models with
more than two factors were considered because solutions with a greater number of factors
produced factors defined by single variables and “did not add to the interpretability of the scales”
(Gioia et al., 2000, p. 62). Overall, the difference in findings regarding the factor structure may
be due to these limitations or the vague “theoretical considerations” employed by Gioia et al. to
establish the factor structure of the BRIEF-Parent scale.
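Parallel analysis, one of the retention methods recommended by Henson and Roberts (2006), compares the eigenvalues observed in the data against those produced by random data of the same dimensions. A minimal sketch; the observed eigenvalues below are hypothetical and used only for illustration:

```python
import numpy as np

def parallel_analysis_baseline(n_cases, n_vars, n_reps=200, seed=0):
    """Mean eigenvalues of correlation matrices computed from random
    normal data with the same dimensions as the observed data.
    Factors whose observed eigenvalues exceed this baseline are
    retained."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_reps, n_vars))
    for r in range(n_reps):
        x = rng.standard_normal((n_cases, n_vars))
        # Eigenvalues of the 9 x 9 correlation matrix, descending.
        eigs[r] = np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]
    return eigs.mean(axis=0)

# Hypothetical eigenvalues for nine scales rated by 371 cases
# (illustrative only; they sum to 9, the trace of a 9 x 9
# correlation matrix).
observed = [5.2, 1.5, 0.8, 0.5, 0.4, 0.2, 0.2, 0.1, 0.1]
baseline = parallel_analysis_baseline(n_cases=371, n_vars=9)
n_factors = sum(o > b for o, b in zip(observed, baseline))  # 2 here
```

Unlike the eigenvalue-greater-than-one rule, the random baseline accounts for the sampling error that inflates the leading eigenvalues of any correlation matrix.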
Therefore, this study is particularly important in that all eight- and nine-scale models of
the BRIEF-Parent were tested in a U.S. sample of school-aged children with mixed clinical
diagnoses. No U.S. study has done this before. Egeland and Fallmyr (2010) was the first to
examine both types of models, but in a Norwegian sample. In both studies, Egeland and Fallmyr
and the current one, Gioia et al.’s original model (2Original-8) was not found to be the best
fitting model. Instead, a nine-scale model, either the three- or four-factor, was found to be more
viable. Thus, the findings of this study are even more important in that it is only the third study
to support a three-factor, nine-scale partition of the BRIEF-Parent. The current study is unique
from the other studies in that it is the first independent study of the BRIEF-Parent form
conducted in the United States using a large, mixed clinical sample of referred children,
comparing both eight- and nine-scale versions of the BRIEF-Parent.
Reasons for Misfit
A common area of misfit across the current study as well as other CFA studies on the
BRIEF-Parent (Gioia et al., 2002; Egeland & Fallmyr, 2010) involved the RMSEA value. The
RMSEA is used to measure the discrepancy between the error of approximation in the population
covariance matrix and optimally chosen parameter values of the model (Steiger & Lind, 1980).
However, a known issue concerning this index is that when sample size is small, RMSEA tends
to over-reject true population models (Hu & Bentler, 1999). In the three aforementioned studies,
RMSEA values fell above the .08 cutoff (Browne & Cudeck, 1993) for all tested models.
RMSEA values arranged by study and model are provided in Table 9.
Table 9
Root Mean Square Error of Approximation (RMSEA) Values Arranged by Model and Study
Model
Study One-factor Two-factor Three-factor Four-factor
Gioia et al. (2002) .21 .12 .11 .12
Egeland & Fallmyr (2010) .23 .12 .14 --
Current study .20 .12 .11 .12
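The RMSEA values above follow from the standard point estimate, the square root of max(χ² − df, 0) divided by df(N − 1). A minimal sketch, checked against the 3Monitor-9 entry from Table 7 (N = 354):

```python
import math

def rmsea(chi2, df, n_cases):
    """Point estimate of the root mean square error of approximation:
    sqrt(max(chi2 - df, 0) / (df * (N - 1))). The max() guard keeps
    the estimate at 0 when the model fits better than expected by
    chance (chi2 < df)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_cases - 1)))

# 3Monitor-9 Model, Caucasian subsample (Table 7)
print(round(rmsea(137.812, 24, 354), 3))  # 0.116, as tabled
```

The formula makes the small-sample behavior noted above concrete: for a fixed χ²/df ratio, shrinking N inflates the estimate, which is one reason RMSEA tends to over-reject true models in small samples.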
Egeland and Fallmyr (2010) made no modifications to the models after running the
CFAs; however, Gioia et al. (2002) did by estimating three error covariances between the Inhibit
scale and the Working Memory, Organization of Materials, and Emotional Control scales. By
estimating these error covariances, Gioia et al. reduced the RMSEA of the three-factor model
from .11 to .08. Post-hoc modifications should be done under a theoretical premise
(Byrne, 2006); thus, Gioia et al. cited Barkley (1997) and Burgess (1997) in defense of such
modifications and claimed that there is a known relationship between inhibition and other
processes, including working memory, organization, and emotional control. As a result, it made
sense to connect the error terms associated with these scales. Under the alternative model
approach of establishing models a priori, no post-hoc modifications were made in this study.
However, Gioia et al.’s findings may be useful in future research.
Limitations
There are several limitations of the study, which may reduce the external and internal
validity of the findings. The sample was geographically limited to Western Pennsylvania; thus,
the findings may differ in other areas of the United States or world-wide. Additionally, the
sample was largely comprised of older youth, aged 16 to 18 years, because
many of the students in the sample were referred from OVR. This point leads to the argument
that students who pursue OVR services may not be representative of typical students in special
education. According to Halpern, Yovanoff, Doren, and Benz (1995), who examined special
education students in their last year of high school, a majority of these students do pursue some
type of postsecondary education within one year after graduation, making it feasible that the
youth in this sample do accurately represent students in special education. Nonetheless, a larger
sample of students from age 5 to 15 years would have been ideal.
Another aspect to consider is that the majority of raters in this sample were mothers.
Most CFA studies on the BRIEF-Parent (e.g., Gioia et al., 2002; Huizinga & Smidts, 2011; Slick
et al., 2006) have not specified which parents or guardians served as raters, so it is difficult to
speculate whether or not this sample contained a disproportionate number of mothers versus
fathers or other raters. However, Batan et al. (2011; Turkish version) reported in their abstract
that 73.8% were mothers, 22.1% were fathers, and 4.1% were other primary caregivers. Batan et
al.’s percentages were similar to those in the current study. A final demographic concern is the
cultural diversity of the sample. A vast majority of the participants were of Caucasian descent,
so it is hard to say if the results would have varied if a higher percentage of racial/ethnic minority
participants had been in the sample.
A variable that could have potentially compromised internal validity was sample size.
The current study did not have a sufficient number of participants to conduct a viable CFA at
item-level (Barrett & Kline, 1981). However, this study had an adequate number of participants
to justify scale-level CFA techniques when using a conservative recommended value of 20:1 for
the case-to-indicator ratio (Fabrigar et al., 1999). The current study had a similar sample size (N
= 371) as Gioia et al.’s (2002) CFA study (N = 374).
Another potential threat to internal validity was whether the use of a sample of mixed
clinical diagnoses was representative of a typical special education population. Data for most
participants were unavailable regarding the diagnoses or educational categories that entitled
students to receive services. The only data available about special education were from the
school sample (n = 62). Table 10 provides the percentage of participants in the school sample
receiving services across various categories compared to the percentage of students receiving
services that was reported by the same school district for the 2012-13 school year (PA
Department of Education, 2012). The comparison indicates that the school sample contained a
larger sample of those designated with a Specific Learning Disability and Emotional
Disturbance, but had a smaller sample of those designated with Autism and Other Health
Impairment. Thus, there is some indication that the mixed diagnoses sample may be
representative of a special education population, but this conclusion is based on less than 20% of
the sample, as this information for the rest of the sample is unknown.
Table 10
Percentage of Participants Receiving Special Education Services for School Sample (N = 62)
and School District (N = 1,180) by Category
Category School Sample School District
Autism 6.5 8.8
Emotional Disturbance 12.9 7.4
Gifted 4.8 4.2
Non-exceptional 16.1 14.6
Other Health Impairment 14.5 21.7
Specific Learning Disability 43.5 30.4
Traumatic Brain Injury 1.6 < 1
Implications
Practice. The CFA findings indicate that competing models of the BRIEF-Parent may fit
the data adequately. Such findings have implications for the use of the scale in school
and vocational settings. The use of the BRIEF-Parent in the school setting warrants several areas
of consideration.
How the Monitor scale is treated has implications for practice. Is the construct unitary or
multidimensional, reflecting two related but separate constructs (Self- and Task-Monitor)?
In its current format, Gioia et al. (2000) describe the Monitor scale as assessing the abilities to
keep track of one’s own and others’ efforts through “work-checking” behaviors. There is a
distinction, however, between Task-Monitor (i.e., monitoring of task-related activities), which
includes items such as “Does not check work for mistakes” and Self-Monitor (i.e., monitoring of
personal behavioral activities), which includes items such as “Is unaware of how his/her behavior
affects or bothers others” (Gioia & Isquith, 2002). Task-monitoring items appear to have high
face validity in that they involve completing tasks. In contrast, a Self-Monitor item involves more
self- and social awareness instead of academic behaviors. Differentiating between these skills on
the Monitor scale may yield potentially useful information when developing interventions for
children with social skills deficits versus those who, rather, require help completing their work
more thoroughly.
Another implication for practice is the usefulness of emotional regulation, measured by
the third factor, ERI. This factor measures a child’s ability to regulate emotions relative to other
children his/her age. In the original model, the scales that would comprise the ERI in the three-
factor model (i.e., Emotional Control and Shift) are combined with the Inhibit scale to form the
BRI. However, there is no distinction in the BRI between emotional regulation and inhibitory
behavior control. In combination with several other sources of student data, gaining more
information about a student’s emotional regulation is a potentially valuable piece of information
for parents and educators. The information gleaned from the ERI score could give educators and
practitioners a global sense of where a child’s emotional regulation stands relative to other
children his or her age. An elevated score in the ERI could provide further insight into a
student’s functioning in the areas of modulating emotions or moving freely from one situation or
activity to another. This information may contribute to the development of more individualized
instruction and provide more data about students with emotional needs. Without a separate score
parsed out from the BRI, the potential exists that the unique emotional component of problematic
student behaviors may be overlooked.
Also at issue in applying the current findings to practice is how to use the Metacognition
Index (MI). Would the scale be more useful as one measure or two? Results indicate that these
two aspects of MI (Internal and External) are highly correlated and may be measuring the same
construct on the BRIEF-Parent. The Internal MI scale involves behaviors that may not be
readily reflected on an observer rating scale because they are cognitive actions that take place
internally. However, whatever information is made available from the BRIEF-Parent could still be
useful to educators. Strategies for instruction rarely occur in isolation and are integrated into
complex cognitive goals that entail higher-order sequences. Pressley and Woloshyn (1995)
discussed the relationship between metacognitive behaviors and reading skills. Good reading
skills may entail activation of prior knowledge and self-questioning about text content. At the
very least, the BRIEF-Parent scores could be useful in alerting the practitioner to at-risk
behaviors in the areas of planning, strategizing, and initiating tasks, which are very different
skills from those in the External MI where interventions may involve organization and work-
checking techniques. Students struggling with Internal MI type skills would need to learn
different types of interventions or strategies, which are designed for the students to ask
themselves questions or check for comprehension.
Future research. Future factor analytic studies conducted on the BRIEF-Parent should
be based on a larger sample size to allow for item analysis. No studies using a mixed clinical
sample, including the current study, have had a large enough sample to run the analyses at item
level. Huizinga and Smidts (2011) used the largest sample size (N = 847) with a 75-item Dutch
version of the original 86-item scale. Information gleaned from item-level research would be
useful in establishing accurate interpretation of the BRIEF-Parent scores. Research has begun to
emerge in other countries using the BRIEF-Parent and examining its factor structure; however,
norms for the country must be established before accurate comparisons can be made with U.S.
findings. Executive function is a culturally bound concept; thus, comparing those from other
countries and cultures may be a challenging but important objective in fully understanding
individual functioning and neuropsychological pathways.
Egeland and Fallmyr (2010) also conducted research on the factor structure of the
BRIEF-Teacher using a group of Norwegian teacher raters. Their findings provided evidence to
support an alternative three-factor, nine-scale model, which differed from the model provided in
the test manual. However, more direct evidence is needed, so conducting a CFA on the Teacher
form of the BRIEF using a large mixed clinical sample of U.S. youth is warranted. American
schools likely differ from Norwegian schools, as do teachers’ expectations of students’ behavior
and academic performance. It is difficult to say how cultural differences would
have an impact on the scores (and therefore factor structure) of the BRIEF-Teacher.
Further, executive dysfunction has been identified as a source of difficulty in different groups of children with academic problems. More specifically, executive dysfunction is evident in clinical samples as well as among students in various categories of special education under IDEA (e.g., specific learning disabilities or unique medical conditions). Thus, research on the BRIEF should continue to be conducted using mixed clinical samples. How executive dysfunction is expressed in various groups of students, as well as how effective academic interventions are in targeting such behaviors, has not been thoroughly investigated. The aptitude-treatment interaction (Cronbach & Snow, 1977) between students’ levels of cognitive ability and BRIEF-Parent results was beyond the scope of this study but should also be explored.
Conclusions
Executive function is still a relatively new concept in psychoeducational assessment in
schools (Hale & Fiorello, 2004). The popularity of the EF framework has drastically increased
in schools over the past decade and the use of the BRIEF-Parent is also likely to continue to
increase in this setting. Therefore, research must continue on instruments such as the BRIEF-Parent to ensure that reliable and valid scores inform decisions about students requiring special education services.
The purpose of this study was to test (a) whether the original eight-scale division of the BRIEF-Parent was replicable in a mixed clinical sample of school-aged children and (b) whether the original model would fit the data best when an alternative-model approach was used. Both eight-scale and nine-scale models have been tested and reported in the literature. However, no comparisons have been made among all of the existing models in an American sample.
The current findings show that three of the four nine-scale models (the two-, three-, and four-factor models) fit the data slightly better than the original two-factor, eight-scale model, which is currently the basis for scoring and interpreting the BRIEF-Parent. Given that the BRIEF-Parent is used in the school setting, the information brought to light in this study should be taken into consideration by the test developers and warrants further examination. Furthermore, more empirical research needs to be conducted on the relationship between the results yielded by executive function instruments (such as the BRIEF-Parent) and academic intervention. The use of the BRIEF-Parent as evidence to warrant educational placement is beyond the scope of the instrument. Because executive function is only one aspect of an individual’s cognitive functioning, it is essential that the BRIEF-Parent not be used in isolation for high-stakes assessment, particularly in settings and populations where the validity of the test scores has not been empirically supported.
REFERENCES
Achenbach, T. (1991). Manual for the Child Behavior Checklist and 1991 profile. Burlington,
VT: University of Vermont, Department of Psychiatry.
Achenbach, T., McConaughy, S., & Howell, C. (1987). Child/adolescent behavioral and
emotional problems: Implications of cross-informant correlations for situational
specificity. Psychological Bulletin, 101, 213-232.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second International Symposium on Information Theory (pp. 267-281). Budapest, Hungary: Akademiai Kiado.
Akinbami, L. J., Liu, X., Pastor, P. N., & Reuben, C. A. (2011). Attention deficit hyperactivity
disorder among children aged 5–17 years in the United States, 1998–2009. NCHS Data
Brief, 70, 1-8.
Alexander, G. R., & Slay, M. (2002). Prematurity at birth: Trends, racial disparities, and
epidemiology. Mental Retardation and Developmental Disabilities Research Reviews, 8,
215-220.
American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
Americans with Disabilities Act of 1990, Pub. L. No. 34 C. F. R. 104 Stat. 33 (2000). Retrieved
from http://www2.ed.gov/policy/rights/reg/ocr/34cfr104.pdf
Anderson, V., Anderson, P., Northam, E., Jacobs, R., & Mikiewicz, O. (2002). Relationships
between cognitive and behavioral measures of executive function in children with brain
disease. Child Neuropsychology, 8, 231-240.
Aylward, G. P. (2004). Neonatology and prematurity. In R. T. Brown (Ed.), Handbook of pediatric psychology in school settings (pp. 489-502). Mahwah, NJ: Lawrence Erlbaum.
Baddeley, A. (1986). Working memory. Oxford, UK: Clarendon Press.
Baddeley, A. (1996). Exploring the central executive. The Quarterly Journal of Experimental
Psychology, 49A, 5-28.
Baddeley, A., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York, NY: Academic.
Baddeley, A., & Wilson, B. (1988). Frontal amnesia and the dysexecutive syndrome. Brain and
Cognition, 7, 212-230.
Barkley, R. A. (1997). ADHD and the nature of self-control. New York, NY: Guilford.
Batan, S. N., Öktem-Tanör, Ö., & Kalem, E. (2011). Reliability and validity studies of
Behavioral Rating Inventory of Executive Function (BRIEF) in a Turkish normative
sample. Elementary Education Online, 10, 894-904.
Beck, D. M., Schaefer, C., Pang, K., & Carlson, S. M. (2011). Executive function in
preschool children: Test-retest reliability. Journal of Cognition & Development, 12, 169-
193.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate
Software.
Bernstein, J. H. & Waber, D. P. (2007). Executive function in education from theory to
practice. New York, NY: Guilford Press.
Best, J., & Miller, P. (2010). A developmental perspective on executive function. Child
Development, 81, 1641-1660.
Best, J., Miller, P., & Jones, L. (2009). Executive functions after age 5: Changes and correlates.
Developmental Review, 29, 180-200.
Bishop, T. (2011). Relationship between performance-based measures of executive function and
the Behavior Rating Inventory of Executive Function (BRIEF), a parent rating measure
(Doctoral dissertation). Illinois Institute of Technology, Chicago, IL.
Blais, M. A. (2011). A guide to applying rating scales in clinical psychiatry. Psychiatric Times,
28, 58-62.
Bodnar, L. E., Prahme, M. C., Cutting, L. E., Denckla, M. B., & Mahone, E. M. (2007). Construct validity of parent ratings of inhibitory control. Child Neuropsychology, 13, 345-362.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445-455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Bull, R., & Scerif, G. (2001). Executive functioning as a predictor of children’s mathematics
ability: Inhibition, switching, and working memory. Developmental Neuropsychology,
33, 205-228.
Burgess, P. (1997). Theory and methodology in executive function research. In P. Rabbitt (Ed.),
Methodology of frontal executive function (pp. 81-116). Hove, East Sussex: Psychology
Press.
Byrne, B. M. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Cameron, C. E., Connor, C. M., Morrison, F. J., & Jewkes, A. M. (2008). Effects of classroom
organization on letter-word reading in first grade. Journal of School Psychology, 46, 173-
192.
Cantin, R. H., Mann, T. D., & Hund, A. M. (2012). Executive functioning predicts school
readiness and success: Implications for assessment and intervention. Communique, 41, 1.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York:
Cambridge University Press.
Chafouleas, S. M., Riley-Tillman, T. C., & Sugai, G. (2007). School-based behavioral assessment: Informing instruction and intervention. New York, NY: Guilford Press.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9,
233-255.
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and
standardized assessment instruments in psychology. Psychological Assessment, 6, 284-
290.
Clark, C. A., Pritchard, V. E., & Woodward, L. J. (2010). Preschool executive functioning
abilities predict early mathematics achievement. Developmental Psychology, 46, 1176-
1191.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Coll, C. G., Akerman, A., & Cicchetti, D. (2000). Cultural influences on developmental
processes and outcomes: Implications for the study of development and psychopathology.
Development and Psychopathology, 12, 333-356.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Conners, C. K. (1989). Manual for the Conners’ Rating Scales. North Tonawanda, NY: Multi-Health Systems.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,
297-334.
Cronbach, L. & Snow, R. (1977). Aptitude and instructional methods: A handbook for research
on interactions. New York, NY: Irvington.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Dawson, P., & Guare, R. (2009). Smart but scattered. New York, NY: Guilford Press.
Denckla, M. B. (2002). The Behavior Rating Inventory of Executive Function: Commentary.
Child Neuropsychology, 8, 304-306.
Diamantopoulos, A., & Siguaw, J. A. (2000). Introducing LISREL: A guide for the uninitiated.
London: Sage.
Diamond, A., & Lee, K. (2011). Interventions shown to aid executive function development in
children 4 to 12 years old. Science, 333, 959-964. doi:10.1126/science.1204529
Dolcos, F., & McCarthy, G. (2006). Brain systems mediating cognitive interference by emotional
distraction. The Journal of Neuroscience, 26, 2072-2079.
Donders, J., DenBraber, D., & Vos, L. (2010). Construct and criterion validity of the Behavior
Rating Inventory of Executive Function (BRIEF) in children referred for
neuropsychological assessment after paediatric traumatic brain injury. Journal of
Neuropsychology, 4, 197-209. doi:10.1348/174866409X478970
Duncan, J., Emslie, H., Williams, P., Johnson, R., & Freer, C. (1996). Intelligence and the
frontal lobes: The organization of goal-directed behavior. Cognitive Psychology, 30, 257-
303.
DuPaul, G. J., Power, T. J., Anastopoulos, A. D., & Reid, R. (1998). ADHD Rating Scale – IV:
Checklist, norms and clinical interpretation. New York, NY: Guilford Press.
Egeland, J., & Fallmyr, Ø. (2010). Confirmatory factor analysis of the Behavior Rating
Inventory of Executive Function (BRIEF): Support for a distinction between emotional
and behavioral regulation. Child Neuropsychology, 16, 326-337. doi:
10.1080/09297041003601462
Eisenberg, N., Liew, J., & Pidada, S. U. (2004). The longitudinal relations of regulation and
emotionality to quality of Indonesian children’s socioemotional functioning.
Developmental Psychology, 40, 790-804.
Eslinger, P. J., & Damasio, A. R. (1985). Severe disturbance of higher cognition after
bilateral frontal lobe ablation: Patient EVR. Neurology, 35, 1731–1741.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Fisher, A. B. & Watkins, M. W. (2008). ADHD rating scales’ susceptibility to faking in a college
student sample. Journal of Postsecondary Education and Disability, 20, 81-92.
Fisk, J. E., & Sharp, C. A. (2004). Age-related impairment in executive functioning: Updating,
inhibition, shifting, and access. Journal of Clinical and Experimental Neuropsychology,
26, 874-890.
Fitzpatrick, C. (2003). [Review of the test Behavior Rating Inventory of Executive Function]. In The fifteenth mental measurements yearbook. Available from http://ovidsp.tx.ovid.com.ezaccess.libraries.psu.edu
Floyd, R. G., Bergeron, R., & Hamilton, G. (2004). Joint exploratory factor analysis of the
Delis-Kaplan Executive Function System and the Woodcock-Johnson III Tests of
Cognitive Abilities. Poster presented at the Annual Meeting of the American
Psychological Association, Honolulu, HI.
Fournier-Vicente, S., Larigauderie, P., & Gaonac’h, D. (2008). More dissociations and
interactions within central executive functioning: A comprehensive latent-variable
analysis. Acta Psychologica, 129, 32-48.
Franzen, M. D., & Wilhelm, K. L. (1996). Conceptual foundations of ecological validity in
neuropsychological assessment. In R. J. Sbordone & C. J. Long (Eds.), Ecological
validity of neuropsychological testing (pp. 91-112). Boca Raton, FL: St. Lucie.
Garon, N., Bryson, S. E., & Smith, I. M. (2008). Executive function in preschoolers: A review
using an integrative framework. Psychological Bulletin, 134, 31-60.
Gilotty, L., Kenworthy, L., Sirian, L., Black, D., & Wagner, A. (2002). Adaptive skills and
executive function in autism spectrum disorders. Child Neuropsychology, 8, 241-248.
Gioia, G. A., Espy, K. A., & Isquith, P. K. (2003). The Behavior Rating Inventory of Executive
Function – Preschool Version professional manual. Odessa, FL: Psychological
Assessment Resources.
Gioia, G. A., & Isquith, P. K. (2002). Two faces of monitor: Thy self and thy task [Abstract].
Journal of the International Neuropsychological Society, 8, 229.
Gioia, G. A., & Isquith, P. K. (2004). Ecological assessment of executive function in traumatic
brain injury. Developmental Neuropsychology, 25, 135-158.
Gioia, G. A., Isquith, P. K., Guy, S. C., & Kenworthy, L. (2000). The Behavior Rating Inventory
of Executive Function professional manual. Odessa, FL: Psychological Assessment
Resources.
Gioia, G. A., Isquith, P. K., Retzlaff, P. D., & Espy, K. A. (2002). Confirmatory factor analysis of the Behavior Rating Inventory of Executive Function (BRIEF) in a clinical sample. Child Neuropsychology, 8, 249-257.
Godefroy, O., Cabaret, M., Petit-Chenal, V., Pruvo, J., & Rousseaux, M. (1999). Control
functions of the frontal lobe: Modularity of the central-supervisory system. Cortex, 35, 1-
20.
Goldberg, E. (2001). The executive brain: Frontal lobes and the civilized mind. New York, NY:
Oxford University Press.
Gonon, F., Bezard, E., & Boraud, T. (2011). Misrepresentation of neuroscience data might give
rise to misleading conclusions in the media: The case of attention deficit hyperactivity
disorder. PLoS ONE, 6, 1-8.
Greenberg, L. M., & Kindschi, C. L. (1996). Test of variables of attention: Clinical guide. Los
Alamitos, CA: Universal Attention Disorders.
Guy, S. C., Isquith, P. K., & Gioia, G. A. (2004). The Behavior Rating Inventory of Executive
Function- Self-Report Version. Lutz, FL: Psychological Assessment Resources.
Hale, J. B., & Fiorello, C. A. (2004). School neuropsychology: A practitioner’s handbook. New
York, NY: Guilford Press.
Halpern, A. S., Yovanoff, P., Doren, B., & Benz, M. R. (1995). Predicting participation in
postsecondary education for school leavers with disabilities. Exceptional Children, 62,
151-164.
Harris, M. B. (1996). Aggressive experiences and aggressiveness: Relationship to ethnicity,
gender, and age. Journal of Applied Social Psychology, 26, 843-870.
Heaton, R. K. (1981). Manual for the Wisconsin Card Sorting Test. Odessa, FL: Psychological
Assessment Resources.
Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research:
Common errors and some comments on improved practice. Educational and
Psychological Measurement, 66, 393-416. doi: 10.1177/0013164405282485.
Hintze, J. M., Volpe, R. J., & Shapiro, E. S. (2007). Best practices in the systematic direct
observation of student behavior. In A. Thomas & J. Grimes (Eds.), Best practices in
school psychology-V (pp. 319-336). Bethesda, MD: National Association of School
Psychologists.
Holler, R., & Zirkel, P. A. (2008). Section 504 and public school students: A national survey
concerning “Section 504-Only” students. NASSP Bulletin, 92, 19-43.
Hu, L., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation
modeling: Concepts, issues, and applications (pp. 76-99). Thousand Oaks, CA: Sage.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to
underparameterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indices in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Hughes, C., & Graham, A. (2002). Measuring executive functions in childhood: Problems and
solutions? Child and Adolescent Mental Health, 7, 131-142.
Huizinga, M., & Smidts, D. P. (2011). Age-related changes in executive function: A normative
study with the Dutch version of the Behavior Rating Inventory of Executive Function
(BRIEF). Child Neuropsychology, 17, 51-66. doi:10.1080/09297049.2010.509715
Hulac, D. M. (2008). Evaluating executive functioning, academic achievement and emotional
control with adolescent females in a residential treatment center (Doctoral dissertation).
University of Northern Colorado, Greeley, CO.
Individuals with Disabilities Education Act (IDEA) (2006). Washington, DC: U.S.
Government Printing Office. Retrieved on March 20, 2012 from
http://idea.ed.gov/download/finalregulations.pdf
Isquith, P., Gioia, G., & PAR staff (2002). Behavior Rating Inventory of Executive
Function Scoring Portfolio. Odessa, FL: Psychological Assessment Resources.
Jepsen, M. I., Gray, K. M., & Taffe, J. R. (2012). Agreement in multi-informant assessment of
behaviour and emotional problems and social functioning in adolescents with Autistic
and Asperger’s Disorder. Research in Autism Spectrum Disorders, 6, 1091-1098.
Johansson, S., & Cnattingius, S. (2010). Epidemiology of preterm birth. In C. Nosarti, R. Murray, & M. Hack (Eds.), Neurodevelopmental outcomes of preterm birth: From childhood to adult life (pp. 1-38). New York, NY: Cambridge University Press.
Johnson, J., & Reid, R. (2011). Overcoming executive functioning deficits with students with
ADHD. Theory Into Practice, 50, 61-67.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long
(Eds.), Testing structural equation models (pp. 294-316). Newbury Park, CA: Sage.
Jurado, M. B., & Rosselli, M. (2007). The elusive nature of executive functions: A review of our current understanding. Neuropsychology Review, 17, 213-233. doi: 10.1007/s11065-007-9040-z
Kaplan, E., Fein, D., Kramer, J., Delis, D., & Morris, R. (1999). WISC-III-PI manual. San
Antonio, TX: Psychological Corporation.
Kline, R. B. (2006). Principles and practice of structural equation modeling. New York, NY:
Guilford Press.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
Lane, K. L., O’Shaughnessy, T. E., Lambros, K. M., Gresham, F. M., & Beebe-Frankenberger, M. E. (2002). The efficacy of phonological awareness training with first-grade students
who have behavior problems and reading difficulties. Journal of Emotional & Behavioral
Disorders, 9, 219-231.
Lehto, J. (1996). Are executive function tests dependent on working memory capacity?
Quarterly Journal of Experimental Psychology, 49, 29-50.
Lehto, J., Juujärvi, P., Kooistra, L., & Pulkkinen, L. (2003). Dimensions of executive
functioning: Evidence from children. British Journal of Developmental Psychology, 21,
59-80.
LeJeune, B., Beebe, D., Noll, J., Kenealy, L., Isquith, P., & Gioia, G. (2010). Psychometric
support for an abbreviated version of the Behavior Rating Inventory of Executive
Function (BRIEF) Parent Form. Child Neuropsychology, 16, 182-201.
doi:10.1080/09297040903352556
Lewis, C., & Carpendale, J. I. (2009). Introduction: Links between social interaction and
executive function. In C. Lewis & J. I. M. Carpendale (Eds.), Social interaction and the
development of executive function. New Directions in Child and Adolescent
Development, 123, 1–15. doi:10.1002/cd.232
Lezak, M. D. (1983). Neuropsychological assessment (2nd ed.) New York, NY: Oxford
University Press.
Loeber, R., Green, S. M., & Lahey, B. B. (1990). Mental health professionals’ perception
of the utility of children, mothers, and teachers as informants on childhood
psychopathology. Journal of Clinical Child Psychology, 19, 136-143.
Logan, G. D., & Cowan, W. B. (1984). On the ability to inhibit thought or action: A theory of an act of control. Psychological Review, 91, 295-327.
Loken, W. J., Thornton, A. E., Otto, R. L., & Long, C. J. (1995). Sustained attention after severe closed head injury. Neuropsychology, 9, 592-598.
Luria, A. (1961). The role of speech in the regulation of normal and abnormal behavior.
Oxford, UK: Pergamon.
Luria, A. (1973). The working brain: An introduction to neuropsychology. New York, NY:
Basic.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance
structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111,
490-504.
Mahone, E. M., Cirino, P. T., Cutting, L. E., Cerrone, P. M., Hagelthorn, K. M., Hiemenz, J. R., … Denckla, M. B. (2002). Validity of the Behavior Rating Inventory of Executive
Function in children with ADHD and/or Tourette syndrome. Archives of Clinical
Neuropsychology, 17, 643-662.
Mahone, E. M., Koth, C. W., Cutting, L., Singer, H. S., & Denckla, M. B. (2001). Executive
function in fluency and recall measures among children with Tourette syndrome or
ADHD. Journal of the International Neuropsychological Society, 7, 102-111.
Marsh, H. W., & Grayson, D. (1995). Latent variable models of multitrait-multimethod
data. In R. Hoyle (Ed.), Structural equation modeling: Concepts, issues and
applications (pp. 177-198). Thousand Oaks, CA: Sage.
Martinez, Y. A., Schneider, B. H., Gonzales, Y. S., & del Pilar Soteras de Toro, M. (2008).
Modalities of anger expression and the psychosocial adjustment of early adolescents in
eastern Cuba. International Journal of Behavioral Development, 32, 207-217.
Mayes, S. D., Calhoun, S. L., Mayes, R. D., & Molitoris, S. (2012). Autism and ADHD:
Overlapping and discriminating symptoms. Research in Autism Spectrum Disorders, 6,
277-285.
McCandless, S., & O’Laughlin, L. (2007). The clinical utility of the Behavior Rating Inventory
of Executive Function (BRIEF) in the diagnosis of ADHD. Journal of Attention
Disorders, 10, 381-389. doi: 10.1177/1087054706292115
Mead, G. H. (1910). What objects must psychology presuppose? Journal of Philosophy,
Psychology, and Scientific Methods, 7, 174-180.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-
103). New York, NY: Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Milich, R., Widiger, T.A., & Landau, S. (1987). Differential diagnosis of attention deficit and
conduct disorders using conditional probabilities. Journal of Consulting & Clinical
Psychology, 55, 762–767.
Miyake, A., Friedman, N., Emerson, M., Witzki, A., & Howerter, A. (2000). The unity and
diversity of executive functions and their contributions to complex “frontal lobe” tasks: A
latent variable analysis. Cognitive Psychology, 41, 49-100. doi: 10.1006/cogp.1999.0734.
Monsell, S. (1996). Control of mental processes. In V. Bruce (Ed.), Unsolved mysteries of
the mind: Tutorial essays in cognition (pp. 93-148). Hove, UK: Erlbaum.
National Center for Education Statistics (2010). Children 3 to 21 years old served under
Individuals with Disabilities Education Act, Part B, by type of disability: Selected years,
1976-77 through 2008-09. Retrieved from
http://nces.ed.gov/programs/digest/d10/tables/dt10_045.asp
Norman, D. A., & Shallice, T. (1986). Attention to action. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-regulation: Advances in research and theory (Vol. 4, pp. 1-18). New York, NY: Plenum.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
Obrzut, J. E. (1995). Dynamic versus structural processing differences characterize
laterality patterns of learning disabled children. Developmental Neuropsychology, 11,
467-484.
Offord, D. R., Boyle, M. H., Racine, Y., Szatmari, P., Fleming, J. E., Sanford, M., &
Lipman, E. L. (1996). Integrated assessment data from multiple informants.
Journal of the American Academy of Child and Adolescent Psychiatry, 35, 1078-1085.
Packwood, S., Hodgetts, H. M., & Tremblay, S. (2011). A multiperspective approach to the
conceptualization of executive functions. Journal of Clinical and Experimental
Neuropsychology, 33, 456-470. doi: 10.1080/13803395.2010.533157.
Palfrey, J. S., Levine, M. D., Walker, D. K., & Sullivan, M. (1985). The emergence of
attention deficits in early childhood: A prospective study. Journal of
Developmental and Behavioral Pediatrics, 3, 339-348.
Park, I. J., Kim, P. Y., Cheung, R. Y., & Kim, M. (2010). The role of culture, family
processes, and anger regulation in Korean American adolescents’ adjustment
problems. American Journal of Orthopsychiatry, 80, 258-266. doi: 10.1111/j.1939-
0023.2010.0129.x
Pennington, B. F., & Ozonoff, S. (1996). Executive functions and developmental
psychopathology. Journal of Child Psychology and Psychiatry, 37, 51-87.
Pennsylvania Department of Education. (2012). Poverty level by school district. Retrieved from
http://www.portal.state.pa.us/portal/server.pt/community/pa_pre_k_counts/8742/frl_by_d
istrict/522213
Pratt, B. M. (2000). The comparative development of executive function in elementary school
children with reading disorder and attention-deficit/hyperactivity disorder (Doctoral
dissertation). The California School of Professional Psychology at Alameda, Alameda,
CA.
Pressley, M., & Woloshyn, V. (1995). Cognitive strategy instruction that really improves children’s academic performance (2nd ed.). Cambridge, MA: Brookline.
Qian, Y. & Wang, Y. (2007). Reliability and validity of Behavior Rating Inventory of Executive
Function for school age children in China. Journal of Peking University Health Sciences,
3, 277-283.
Raven, J. C., Court, J. H., & Raven, J. (1988). Manual for Raven’s Progressive Matrices and
Vocabulary Scales. London, UK: H. K. Lewis.
Reddy, L., Hale, J., & Brodzinsky, L. (2011). Discriminant validity of the Behavior Rating
Inventory of Executive Function Parent form for children with ADHD. School
Psychology Quarterly, 26, 45-55. doi: 10.1037/a0022585
Reitan, R. (1958). Validity of the Trail Making Test as an indicator of organic brain damage.
Perceptual and Motor Skills, 8, 271-276.
Reynolds, C., & Kamphaus, R. (1992). Behavior Assessment System for Children. Circle Pines,
MN: American Guidance Service.
Robbins, T. W., James, M., Owen, A. M., Sahakian, B. J., McInnes, L., & Rabbitt, P. (1994).
Cambridge Neuropsychological Test Automated Battery (CANTAB): A factor analytic
study of a large sample of normal elderly volunteers. Dementia, 5, 266-281.
Rojahn, J., Rowe, E. W., Macken, J., Gray, A., Delitta, D., Booth, A., & Kimbrell, K. (2010).
Psychometric evaluation of the Behavior Problems Inventory-01 and the Nisonger Child
Behavior Rating Form with children and adolescents. Journal of Mental Health and
Research in Intellectual Disabilities, 3, 28-50.
Romine, C., & Reynolds, C. (2005). A model of the development of frontal lobe function:
Findings from a meta-analysis. Applied Neuropsychology, 12, 190-201.
Roth, R. M., Isquith, P. K., & Gioia, G. A. (2005). The Behavior Rating Inventory of Executive
Function – Adult Version. Lutz, FL: Psychological Assessment Resources.
Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). La Mesa,
CA: Author.
Séguin, J. R., & Zelazo, P.D. (2005). Executive function in early physical aggression. In R. E.
Tremblay, W. W. Hartup, & J. Archer (Eds.), Developmental origins of aggression (pp.
307-329). New York, NY: Cambridge University Press.
Shallice, T., & Burgess, P. W. (1991a). Deficits in strategy application following frontal lobe
damage in man. Brain, 114, 727–741.
Shallice, T., & Burgess, P. W. (1991b). Higher-order cognitive impairments and frontal lobe
lesions in man. In H. Levin, H. Eisenberg, & A. Benton (Eds.), Frontal lobe
function and dysfunction (pp. 125-138). New York, NY: Oxford University Press.
Slick, D., Lautzenhiser, A., Sherman, E., & Eryl, K. (2006). Frequency of scale elevations and
factor structure of the Behavior Rating Inventory of Executive Function (BRIEF) in
children and adolescents with intractable epilepsy. Child Neuropsychology, 12, 181-189.
doi: 10.1080/09297040600611320.
Slomine, B. S., Gerring, J. P., Grados, M. A., Vasa, R., Brady, K. D., Christensen, J. R., &
Denckla, M. B. (2002). Performance on measures of ‘executive function’ following
pediatric traumatic brain injury. Brain Injury, 16, 759-772.
Sollman, M. J., Ranseen, J. D., & Berry, D. T. (2010). Detection of feigned ADHD in college
students. Psychological Assessment, 22, 325-335.
Sparrow, S. S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle
Pines, MN: American Guidance Service.
Spearman, C. (1904). “General intelligence” objectively determined and measured. American
Journal of Psychology, 15, 201-293.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common
factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City,
IA.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology, 18, 643-662.
Stuss, D. T., & Alexander, M. P. (2000). Executive functions and the frontal lobes: A conceptual
view. Psychological Research, 63, 289-298.
Stuss, D. T., & Benson, D. F. (1986). The frontal lobes. New York, NY: Raven.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.
Teuber, H. L. (1972). Unity and diversity of frontal lobe functions. Acta Neurobiologiae Experimentalis, 32, 615-656.
Thompson, B., & Daniel, L. G. (1996). Factor analytic evidence for the construct validity of scores: A historical overview and some guidelines. Educational and Psychological Measurement, 56, 197-208.
Thorell, L., & Nyberg, L. (2008). The Childhood Executive Function Inventory (CHEXI): A new rating instrument for parents and teachers. Developmental Neuropsychology, 33, 536-552.
Toplak, M., Bucciarelli, S., Jain, U., & Tannock, R. (2009). Executive functions: Performance-based measures and the Behavior Rating Inventory of Executive Function (BRIEF) in adolescents with attention deficit/hyperactivity disorder (ADHD). Child Neuropsychology, 15, 53-72. doi: 10.1080/09297040802070929
Vilkki, J. & Holst, P. (1989). Deficient programming in spatial learning after frontal lobe
damage. Neuropsychologia, 27, 971-976.
Vriezen, E. R., & Pigott, S. E. (2002). The relationship between parental report on the BRIEF and performance-based measures of executive function in children with moderate to severe traumatic brain injury. Child Neuropsychology, 8, 296-303.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes.
Cambridge, MA: Harvard University Press.
Wechsler, D. (1955). Wechsler Test of Adult Reading: Manual. San Antonio, TX: Psychological
Corporation.
Zelazo, P. D., Müller, U., Frye, D., & Marcovitch, S. (2003). The development of executive
function in early childhood. Monographs of the Society for Research in Child
Development, 68 (serial no. 274).
Appendix A
Glossary of Acronyms
_____________________________________________________________________________
Glossary of Acronyms
_____________________________________________________________________________
ADHD Attention Deficit Hyperactivity Disorder
ADHD-IV ADHD-Rating Scale-Fourth Edition
AODSR average off-diagonal standardized residual
APA American Psychiatric Association
ASD Autism Spectrum Disorders
BASC Behavior Assessment System for Children
BRI Behavioral Regulation Index
BRIEF Behavior Rating Inventory of Executive Function
CBCL Achenbach’s Child Behavior Checklist
CFA confirmatory factor analysis
CFI comparative fit index
CI confidence interval
CRS Conners’ Rating Scales
CTMT Comprehensive Trail Making Test
DSM Diagnostic and Statistical Manual of Mental Disorders
EEG electroencephalogram
EF executive functioning
EFA exploratory factor analysis
GEC Global Executive Composite
GST Goal Search Task
IFI Bollen’s incremental fit index
MI Metacognition Index
MRI magnetic resonance imaging
NNFI Bentler-Bonett non-normed fit index
NTRS normal theory root square
OHI Other Health Impairment
PAF principal axis factoring
PFC prefrontal cortex
PET positron emission tomography
PNFI parsimonious normed fit index
RMSEA root mean square error of approximation
SOC Stocking of Cambridge task
SRMR standardized root mean square residual
TBI traumatic brain injury
TOH Tower of Hanoi task
TOVA Test of Variables of Attention
VABS Vineland Adaptive Behavior Scales
WCST Wisconsin Card Sorting Test
Appendix B
Items Comprising Scales on BRIEF-Parent form
____________________________________________________________________________
Scale Items
____________________________________________________________________________
1. Inhibit 38, 41, 43, 44, 49, 54, 55, 56, 59, 65
2. Shift 5, 6, 8, 12, 13, 23, 30, 39
3. Emotional Control 1, 7, 20, 25, 26, 45, 50, 62, 64, 70
4. Initiate 3, 10, 16, 47, 48, 61, 66, 71
5. Working Memory 2, 9, 17, 19, 24, 27, 32, 33, 37, 57
6. Plan/Organize 11, 15, 18, 22, 28, 35, 36, 40, 46, 51, 53, 58
7. Organization of Materials 4, 29, 67, 68, 69, 72
8. Monitor
   a. Task-Monitor 14, 21, 31, 60
   b. Self-Monitor 34, 42, 52, 63
Appendix C
School District Approval
Appendix D
Licensed Psychologist Approval
Appendix E
Office for Research Protections Correspondence
Appendix F
Structure Coefficients, Effect Sizes, and Error Terms for Subsamples
Table F1
Standardized Structure Coefficients for BRIEF-Parent Scales for OVR Sample Arranged by
Model (Maximum Likelihood Extraction)
Scale: Structure Coefficient (Error Terms) [Effect Size]

Model 1: Unity-8 Model
Factor 1- GEF
Inhibit .71 (.70) [.51]
Shift .73 (.69) [.53]
ECO .63 (.78) [.40]
Initiate .86 (.51) [.74]
WM .86 (.52) [.74]
P/O .88 (.47) [.78]
ORG .70 (.71) [.49]
Monitor .84 (.54) [.71]

Model 2: 2Original-8 Model
Factor 1- BRI
Inhibit .76 (.65) [.57]
Shift .85 (.53) [.72]
ECO .82 (.57) [.68]
Factor 2- MI
Initiate .85 (.52) [.73]
WM .87 (.49) [.77]
P/O .91 (.41) [.84]
ORG .71 (.70) [.51]
Monitor .82 (.57) [.68]

Model 3: 2Donders-8 Model
Factor 1- BRI
Shift .93 (.38) [.86]
ECO .78 (.62) [.61]
Factor 2- MI
Initiate .86 (.52) [.74]
WM .87 (.50) [.75]
P/O .90 (.44) [.81]
ORG .71 (.71) [.50]
Monitor .84 (.55) [.70]
Inhibit .69 (.72) [.48]

Model 4: Unity-9 Model
Factor 1- GEF
Inhibit .71 (.71) [.50]
Shift .73 (.69) [.53]
ECO .63 (.78) [.40]
S-Monitor .73 (.68) [.53]
Initiate .86 (.51) [.74]
WM .86 (.51) [.74]
P/O .89 (.47) [.78]
ORG .70 (.71) [.50]
T-Monitor .69 (.72) [.48]

Model 5: 2Monitor-9 Model
Factor 1- BRI
Inhibit .81 (.59) [.65]
Shift .80 (.60) [.64]
ECO .79 (.61) [.63]
S-Monitor .82 (.57) [.68]
Factor 2- MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .93 (.36) [.87]
ORG .72 (.69) [.52]
T-Monitor .72 (.70) [.51]

Model 6: 3Monitor-9 Model
Factor 1- BRI
Inhibit .83 (.55) [.69]
S-Monitor .86 (.52) [.73]
Factor 2- ERI
Shift .86 (.51) [.74]
ECO .84 (.54) [.71]
Factor 3- MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .93 (.36) [.87]
ORG .72 (.69) [.52]
T-Monitor .72 (.70) [.51]

Model 7: 4Monitor-9 Model
Factor 1- BRI
Inhibit .83 (.55) [.69]
S-Monitor .86 (.51) [.73]
Factor 2- ERI
Shift .86 (.50) [.75]
ECO .84 (.54) [.71]
Factor 3- Int MI
Initiate .84 (.54) [.71]
WM .88 (.48) [.77]
P/O .93 (.37) [.86]
Factor 4- Ext MI
ORG .73 (.68) [.53]
T-Monitor .75 (.69) [.53]
Note. N = 264; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG
= Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF=
General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index;
ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External
Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor,
nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-
scale model; 4Monitor-9 = Four-factor, nine-scale model.
Table F2
Standardized Structure Coefficients for BRIEF-Parent Scales for Caucasian Sample Arranged by
Model (Maximum Likelihood Extraction)
Scale: Structure Coefficient (Error Terms) [Effect Size]

Model 1: Unity-8 Model
Factor 1- GEF
Inhibit .72 (.70) [.52]
Shift .73 (.68) [.54]
ECO .66 (.75) [.44]
Initiate .85 (.53) [.72]
WM .86 (.51) [.74]
P/O .88 (.48) [.77]
ORG .73 (.69) [.53]
Monitor .85 (.52) [.73]

Model 2: 2Original-8 Model
Factor 1- BRI
Inhibit .79 (.62) [.62]
Shift .83 (.56) [.69]
ECO .85 (.53) [.72]
Factor 2- MI
Initiate .85 (.53) [.72]
WM .87 (.49) [.76]
P/O .91 (.42) [.82]
ORG .74 (.67) [.54]
Monitor .84 (.55) [.70]

Model 3: 2Donders-8 Model
Factor 1- BRI
Shift .91 (.42) [.82]
ECO .80 (.60) [.65]
Factor 2- MI
Initiate .85 (.53) [.72]
WM .87 (.50) [.76]
P/O .89 (.45) [.80]
ORG .73 (.68) [.54]
Monitor .85 (.53) [.72]
Inhibit .70 (.72) [.49]

Model 4: Unity-9 Model
Factor 1- GEF
Inhibit .72 (.70) [.51]
Shift .73 (.68) [.54]
ECO .66 (.75) [.44]
S-Monitor .74 (.68) [.54]
Initiate .85 (.53) [.72]
WM .86 (.51) [.75]
P/O .88 (.48) [.77]
ORG .73 (.67) [.54]
T-Monitor .71 (.71) [.50]

Model 5: 2Monitor-9 Model
Factor 1- BRI
Inhibit .82 (.57) [.68]
Shift .79 (.61) [.63]
ECO .82 (.57) [.68]
S-Monitor .82 (.58) [.67]
Factor 2- MI
Initiate .84 (.54) [.70]
WM .88 (.48) [.77]
P/O .93 (.38) [.86]
ORG .75 (.67) [.56]
T-Monitor .73 (.68) [.54]

Model 6: 3Monitor-9 Model
Factor 1- BRI
Inhibit .85 (.53) [.72]
S-Monitor .84 (.54) [.71]
Factor 2- ERI
Shift .84 (.55) [.70]
ECO .87 (.49) [.76]
Factor 3- MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .93 (.38) [.86]
ORG .75 (.67) [.55]
T-Monitor .73 (.68) [.54]
Model 7: 4Monitor-9 Model
Factor 1-BRI
Inhibit .85 (.53) [.72]
S-Monitor .85 (.54) [.71]
Factor 2- ERI
Shift .84 (.54) [.71]
ECO .87 (.50) [.75]
Factor 3- Int MI
Initiate .84 (.55) [.70]
WM .88 (.48) [.77]
P/O .92 (.39) [.85]
Factor 4 – Ext MI
ORG .75 (.67) [.56]
T-Monitor .74 (.67) [.55]
Note. N = 354; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG
= Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF=
General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index;
ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External
Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor,
nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-
scale model; 4Monitor-9 = Four-factor, nine-scale model.
Table F3
Standardized Structure Coefficients for BRIEF-Parent Scales for Mother Rater Sample Arranged
by Model (Maximum Likelihood Extraction)
Scale: Structure Coefficient (Error Terms) [Effect Size]

Model 1: Unity-8 Model
Factor 1- GEF
Inhibit .68 (.74) [.46]
Shift .70 (.71) [.49]
ECO .62 (.78) [.39]
Initiate .85 (.53) [.72]
WM .88 (.48) [.77]
P/O .90 (.44) [.80]
ORG .72 (.70) [.51]
Monitor .85 (.53) [.72]

Model 2: 2Original-8 Model
Factor 1- BRI
Inhibit .76 (.65) [.58]
Shift .82 (.58) [.66]
ECO .84 (.55) [.70]
Factor 2- MI
Initiate .84 (.54) [.71]
WM .89 (.47) [.78]
P/O .92 (.39) [.85]
ORG .73 (.69) [.53]
Monitor .83 (.56) [.69]

Model 3: 2Donders-8 Model
Factor 1- BRI
Shift .90 (.43) [.81]
ECO .78 (.62) [.61]
Factor 2- MI
Initiate .85 (.53) [.72]
WM .88 (.47) [.78]
P/O .91 (.41) [.83]
ORG .72 (.69) [.52]
Monitor .84 (.54) [.71]
Inhibit .66 (.75) [.43]

Model 4: Unity-9 Model
Factor 1- GEF
Inhibit .67 (.74) [.45]
Shift .70 (.72) [.49]
ECO .62 (.79) [.38]
S-Monitor .71 (.71) [.50]
Initiate .85 (.53) [.72]
WM .88 (.47) [.78]
P/O .90 (.43) [.81]
ORG .72 (.69) [.52]
T-Monitor .71 (.70) [.51]

Model 5: 2Monitor-9 Model
Factor 1- BRI
Inhibit .81 (.59) [.65]
Shift .78 (.63) [.60]
ECO .80 (.60) [.64]
S-Monitor .82 (.58) [.67]
Factor 2- MI
Initiate .83 (.56) [.69]
WM .89 (.46) [.79]
P/O .94 (.34) [.88]
ORG .73 (.68) [.54]
T-Monitor .74 (.67) [.54]

Model 6: 3Monitor-9 Model
Factor 1- BRI
Inhibit .83 (.55) [.70]
S-Monitor .85 (.52) [.73]
Factor 2- ERI
Shift .83 (.56) [.69]
ECO .85 (.53) [.72]
Factor 3- MI
Initiate .83 (.56) [.69]
WM .89 (.47) [.79]
P/O .94 (.34) [.88]
ORG .73 (.68) [.54]
T-Monitor .74 (.68) [.54]
Model 7: 4Monitor-9 Model
Factor 1-BRI
Inhibit .83 (.56) [.69]
S-Monitor .86 (.52) [.73]
Factor 2- ERI
Shift .83 (.55) [.70]
ECO .84 (.54) [.71]
Factor 3- Int MI
Initiate .83 (.56) [.69]
WM .89 (.46) [.79]
P/O .94 (.35) [.88]
Factor 4 – Ext MI
ORG .74 (.67) [.55]
T-Monitor .75 (.66) [.56]
Note. N = 267; ECO = Emotional Control; WM = Working Memory; P/O = Plan/Organize; ORG
= Organization of Materials; S-Monitor = Self-Monitor; T-Monitor = Task-Monitor; GEF=
General Executive Functioning; BRI = Behavioral Regulation Index; MI = Metacognition Index;
ERI = Emotional Regulation Index; Int MI = Internal Metacognition Index; Ext MI = External
Metacognition Index; Unity-8 = One-factor, eight-scale model; 2Original-8 = Two-factor, eight-
scale Gioia model; 2Donders-8 = Two-factor, eight-scale Donders model; Unity-9 = One-factor,
nine-scale model; 2Monitor-9 = Two-factor, nine-scale model; 3Monitor-9 = Three-factor, nine-
scale model; 4Monitor-9 = Four-factor, nine-scale model.
VITA
Maria Carbone Smith 65 Fawnvue Drive
Robinson Twp., PA 15136
__________________________________________________________________________________________
Education:
2003-2013 The Pennsylvania State University M.S. (May 2006), Ph.D. (December 2013)
University Park, PA School Psychology
1999-2003 Clarion University of Pennsylvania B.A. (May 2003; GPA – 3.76)
Clarion, PA Psychology (major), Spanish (minor)
Publications:
Lei, P., Smith, M., & Suen, H. K. (2007). The use of generalizability theory to estimate data reliability in single-subject
observational research. Psychology in the Schools, 44, 433-439.
Watkins, M. W., Wilson, S. M., Kotz, K. M., Carbone, M. C., & Babula, T. (2006). Factor structure of the Wechsler
Intelligence Scale for Children- Fourth Edition among referred students. Educational and Psychological
Measurement, 66, 975-983.
Research Experience:
Research Assistant, Meaningful Science Consortium, Northwestern University, 2006
Research Assistant, POINT (Parent Observations of Infants and Toddlers) Instrument, 2005
Research Assistant, The Fund for the Improvement of Postsecondary Education (FIPSE)
Grant, Clarion University of Pennsylvania, 2002- 2003
Student Assistant, Statistics Education for Quantitative Literacy Project (SEQuaL) Clarion University of
Pennsylvania, 1999
Clinical Experience:
Doctoral School Psychology Intern, Cranberry Area School District, 2007-2008
CEDAR Clinic Student Supervisor, Penn State CEDAR Clinic, 2005-2006
School Psychology Practicum Intern, Indiana Area School District, 2006
School Psychology Practicum Intern, State College Area School District, 2005
School Psychology Student Clinician, Penn State CEDAR Clinic, 2003-2005
Work Experience:
Evaluator and Report Writer, Diagnostic and Treatment Specialists, Harmony, PA 2008-2011
Therapeutic Support Staff (TSS), Milestones Community Health, Indiana, PA, 2006
Trained volunteer, Stop Abuse For Everyone (SAFE), Community Shelter, Clarion, PA 2000-2003
Awards/Grants:
Conrad Frank, Jr. Graduate Fellowship for achievement, Penn State University, 2005-2006
State APSCUF Scholarship, Clarion University of Pennsylvania, 2001
Foundation Leadership Award, Clarion University of Pennsylvania, 2000
Continuing Professional Development:
Student member of National Association of School Psychologists
Student member of Association of School Psychologists of Pennsylvania
Association of School Psychologists of Pennsylvania (ASPP) Conference, State College, PA. Oct 2012.
Sexting and Online Solicitation: A Discussion for School Psychologists [Webinar]. April 18, 2013. Nathan,
Laurie. National Center for Missing and Exploited Children. Accessed from http://www.nasponline.org.
“I can’t get in trouble for one little e-mail, can I?”- What School Psychologists Need to Know about Law
and Electronic Communication [Webinar]. May 23, 2013. Haase, Karen. H & S School Law. Accessed
from http://www.nasponline.org.
“Normal Bilingual Language Development or Language Disorder?” [Webinar]. June 11, 2013. Casilleja,
Nancy. Pearson Assessments. Accessed from http://www.pearsonassessments.com.
Research Interests: Psychoeducational assessment, scale development, executive function