21
Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=ncny20 Child Neuropsychology A Journal on Normal and Abnormal Development in Childhood and Adolescence ISSN: 0929-7049 (Print) 1744-4136 (Online) Journal homepage: http://www.tandfonline.com/loi/ncny20 Measuring executive function skills in young children in Kenya Michael T. Willoughby, Benjamin Piper, Dunston Kwayumba & Megan McCune To cite this article: Michael T. Willoughby, Benjamin Piper, Dunston Kwayumba & Megan McCune (2018): Measuring executive function skills in young children in Kenya, Child Neuropsychology, DOI: 10.1080/09297049.2018.1486395 To link to this article: https://doi.org/10.1080/09297049.2018.1486395 Published online: 17 Jun 2018. Submit your article to this journal View related articles View Crossmark data

Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Full Terms & Conditions of access and use can be found athttp://www.tandfonline.com/action/journalInformation?journalCode=ncny20

Child NeuropsychologyA Journal on Normal and Abnormal Development in Childhood andAdolescence

ISSN: 0929-7049 (Print) 1744-4136 (Online) Journal homepage: http://www.tandfonline.com/loi/ncny20

Measuring executive function skills in youngchildren in Kenya

Michael T. Willoughby, Benjamin Piper, Dunston Kwayumba & MeganMcCune

To cite this article: Michael T. Willoughby, Benjamin Piper, Dunston Kwayumba & Megan McCune(2018): Measuring executive function skills in young children in Kenya, Child Neuropsychology,DOI: 10.1080/09297049.2018.1486395

To link to this article: https://doi.org/10.1080/09297049.2018.1486395

Published online: 17 Jun 2018.

Submit your article to this journal

View related articles

View Crossmark data

Page 2: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

ARTICLE

Measuring executive function skills in young children inKenyaMichael T. Willoughbya, Benjamin Piperb, Dunston Kwayumbab and Megan McCunec

aEducation & Workforce Development, RTI International, Research Triangle Park, North Carolina, USA;bInternational Education, RTI International, Nairobi, Kenya; cInternational Education, RTI International,Washington, DC, USA

ABSTRACTInterest inmeasuring executive function skills in young children in low-and middle-income country contexts has been stymied by the lack ofassessments that are both easy to deploy and scalable. This studyreports on an initial effort to develop a tablet-based battery of execu-tive function tasks, which were designed and extensively studied intheUnited States, for use in Kenya. Participantswere 193 children, aged3–6 years old, who attended early childhood development andeducation centers. The rates of individual task completion were high(65–100%), and 85% of children completed three or more tasks.Assessors indicated that 90% of all task administrations were of accep-table quality. An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor andceiling effects on inhibitory control tasks. Children’s simple reactiontime (β = –0.20, p = .004), attention-related behaviors during testing(β = 0.24, p = .0005), and age (β = –0.24, p = .0009) were all uniquelyrelated to performance on the executive function composite. Resultsare discussed as they inform efforts to develop valid and reliablemeasures of executive function skills among young children in devel-oping country contexts.

ARTICLE HISTORYReceived 8 March 2018Accepted 28 May 2018

KEYWORDSEarly childhood; cognitiveassessment; low- andmiddle-income countries

Executive functions (EFs) are a set of cognitive skills that contribute to planning,problem solving, and goal-directed behavior (Diamond, 2013). In contrast to othermore crystallized or automatized aspects of cognition, EFs are fluid cognitive processesthat individuals draw upon in circumstances when automaticity is not possible. Threefoundational components of EF include working memory (defined as the holding inmind and updating of information while performing some operation on it), inhibitorycontrol (defined as the inhibition of automatized responding when engaged in taskcompletion), and cognitive flexibility (defined as the ability to shift one’s cognitive setamong distinct but related aspects of a given task). Working memory, inhibitorycontrol, and cognitive flexibility are the components of EF that are most frequentlystudied in early childhood (Diamond, 2013; Garon, Bryson, & Smith, 2008).

EF skills reflect individual differences in the integrity and efficiency of neuralnetworks that are mediated by the prefrontal cortex and the anterior cingulate cortex

CONTACT Michael T. Willoughby [email protected] RTI International, 3040 Cornwallis Road, ResearchTriangle Park, NC 27709

CHILD NEUROPSYCHOLOGYhttps://doi.org/10.1080/09297049.2018.1486395

© 2018 Informa UK Limited, trading as Taylor & Francis Group

Page 3: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

(Miller & Cohen, 2001; Petersen & Posner, 2012). Notably, both regions are exten-sively interconnected with subcortical structures that are involved in emotion proces-sing, stress response, and sensorimotor inputs (Bush, Luu, & Posner, 2000; Koziol,Budding, & Chidekel, 2012). This structure provides the neurobiological basis for EFskills to exert “top-down” controls on other aspects of cognition and behavior but alsoto be influenced by “bottom-up” processes, including emotional reactivity and stress(Arnsten, 2015).

EF skills, as well as the neural substrates that support them, evidence life-coursedevelopment (De Luca & Leventer, 2008). Early childhood, which is often defined as3–6 years of age, is the developmental period in which rapid improvements in EFskills are first evident. These rapid improvements represent quantitative differencesin the efficiency of neural processes that support EF skills (e.g., Bell & Wolfe, 2007;Moriguchi & Hiraki, 2011), as well as the qualitative strategies that are used duringEF tasks (Chevalier, Huber, Wiebe, & Espy, 2013). As such, EF skills may beespecially susceptible to environmental influences during early childhood (Fox,Levitt, & Nelson, 2010; Knudsen, 2004). Optimistically, this openness to environ-mental influence implies that efforts to enhance or intervene on EF may be especiallyeffective in early childhood, which aligns with broader economic arguments regard-ing the strong returns on investment that result from early childhood interventions(Heckman & Mosso, 2014). Pessimistically, the openness to environmental influenceimplies that chronic exposure to risk factors across early childhood may havepernicious effects on EF, which is consistent with evidence that early life povertyimpairs EF skills and the neural processes that support them (Hackman, Farah, &Meaney, 2010; Noble et al., 2015).

Most of what is known about EF in early childhood, as well as across the life coursemore generally, is based on studies of participants who reside in high-income countries.However, researchers and policymakers have increasingly emphasized the importanceof developing and adapting neurodevelopmental assessments (of which EF is a subset)for use with children in low- and middle-income countries (LMICs) (Fernald, Kariger,Engle, & Raikes, 2009; Fernandes et al., 2014; Kutlesic, Brewinski Isaacs, Freund, Hazra,& Raiten, 2017; Prado et al., 2010). The reason for interest in neurodevelopmentalassessments for use in LMICs is twofold. First, progress in the treatment of specificpediatric diseases that directly impact brain and behavior development, such as HIVand malaria, has shifted the focus away from concerns of mortality toward concerns ofmorbidity. Efforts to improve the neurodevelopmental outcomes of children withspecific pediatric diseases require improved measurement tools (Sherr, Croome, ParraCastaneda, Bradshaw, & Herrero Romero, 2014). Beyond disease-specific research,researchers and policymakers have emphasized that children who reside in LMICsface numerous health and nutritional risk factors that collectively undermine theirlong-term developmental potential (Grantham-McGregor et al., 2007), and these risksare inequitably distributed, further disadvantaging the already vulnerable and poor.This growing consensus has underscored the importance of developing and adaptingneurodevelopmental assessments that can be used to evaluate intervention-relatedefforts. A special emphasis has been placed on tools that are appropriate for use inearly childhood, given the assumption that early interventions have the greatest poten-tial impacts (Fernald et al., 2009; Suchdev et al., 2017).

2 M. T. WILLOUGHBY ET AL.

Page 4: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Although EF skills represent one facet of neurodevelopmental functioning, most neu-rodevelopmental assessments that have been used in LMICs have not measured EF skills;this is especially true for assessments that are intended for use in early childhood (Fischer,Morris, & Martines, 2014; Semrud-Clikeman et al., 2017). Similarly, although numerousearly learning assessments have been developed for use with young children in LMICcontexts, with two exceptions, they have tended to measure social-emotional competence,emergent literacy, language, and numeracy skills, and fine and gross motor development,but not EF skills (Brinkman et al., 2017; Gladstone et al., 2010; McCoy et al., 2017; Pisaniet al., 2017). The exceptions are the International Development and Early LearningAssessment and the Measuring Early Learning and Quality Outcomes (MELQO) batteries,which included three performance-based tasks of EF in addition to the more commonlymeasured early learning constructs (UNESCO, UNICEF, Brookings Institution, & theWorld Bank, 2017; Wolf et al., 2017). At the time of this writing, we are unaware of anypublished EF data that originated from these batteries. However, a broader search of theresearch literature identified five studies that administered performance-based EF assess-ments to 3- to 6-year-old children in LMIC contexts. These studies took place in Albania(Von Suchodoletz, Uka, & Larsen, 2015), Indonesia (Prado et al., 2012), Costa Rica andCameroon (Chasiotis, Kiessling, Hofer, & Campos, 2006), Zambia (McCoy, Zuilkowski, &Fink, 2015), and Pakistan (Tarullo et al., 2017). These pioneering studies provide thestrongest demonstration to date of the feasibility of using EF tasks that were developed inhigh-income countries with young children in LMICs. The studies by Chasiotis et al. (2006)and Tarullo et al. (2017) were especially noteworthy because they involved multi-taskbatteries that were used to create more reliable composites of EF skills.

An implicit assumption in these studies is that tasks that were used to measure EFskills in Western contexts are generalizable to LMICs. This assumption is consistent withthe characterization of EF skills as a domain-general construct that is appropriate forassessment globally (Boivin & Giordani, 2009). In their systematic review of the use of theKaufman batteries in sub-Saharan Africa, Van Wyhe, Van De Water, Boivin, Cotton, andThomas (2017) noted that previous efforts to improve the “cultural fairness” of theKaufman battery have included a reliance on local norms to define impairment, the useof local translations and data collectors, and avoidance of knowledge and achievementsubtests that rely heavily on crystallized (cf. fluid) intelligence that is highly contextuallydependent. Many commonly used EF assessments now accommodate these issues (i.e.,the results are EF testing are often based on within-sample comparisons and the contentof tasks do not rely on crystalized knowledge). Most of what has been written regardingcultural influences on EF has focused on performance differences (1) that are observedfrom children who are mono- versus multilingual (Ross & Melinger, 2017; Von Bastian,Souza, & Gade, 2016) or (2) that result from comparisons of children in Eastern versusWestern cultures (Imada, Carlson, & Itakura, 2013; Moriguchi, Evans, Hiraki, Itakura, &Lee, 2012). Neither of these issues was directly relevant to this study. However, we didmake concerted efforts to ensure that task stimuli, verbal instructions, and generalassessment approach for EF tasks were culturally appropriate for Kenya. We also notethat previous studies have successfully measured EF skills in school-aged children andadults in Kenya, which increased our expectation that the same could be accomplishedwith young children (Esopo et al., 2018; Kariuki, Abubakar, Newton, & Kihara, 2014;Kitsao-Wekulo, Holding, Taylor, Abubakar, & Connolly, 2013).

CHILD NEUROPSYCHOLOGY 3

Page 5: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

All studies that have measured EF skills among young children in LMICs have beenlimited by their use of paper-and-pencil methods to administer tasks and to recordchild responses. Paper-and-pencil tasks require time-intensive staff training and certi-fication, create opportunities for errors in administration and scoring, often requiretask-specific materials (e.g., cards, blocks), and involve data entry expense. The use oftouch screen-enabled computerized data collection tools has the potential to addressthese limitations and is essential if EF assessments are to be incorporated into large-scale data collection efforts. Bangirana and colleagues made a similar point in theirinvestigation of CogState, which is a rapid computerized neurocognitive assessment thatwas used with 5- to 13-year-old children in Uganda (Bangirana, Sikorskii, Giordani,Nakasujja, & Boivin, 2015).

One of us has been involved in the development of a battery of EF tasks (i.e., EFTouch) that were intended for use in early childhood in the United States (summarizedin Willoughby & Blair, 2016). EF Touch, which includes tasks that measure inhibitorycontrol, working memory, and cognitive flexibility, is appropriate for use in large-scalestudies involving young children. EF Touch is portable, requires no task-specificmaterials, and involves highly scripted instructions to facilitate consistent administra-tion by end users with minimal training and no technical knowledge of EF. The taskswere specifically designed with young children in mind. This included making theinstructional language “child friendly,” and limiting individual tasks to 3–7 min inlength to minimize attentional demands and to create opportunities for breaks asneeded. A common structure is used for each task (item demonstration, followed bytraining, followed by test items). Children complete test items only if they pass trainingitems (“passing” is automatically determined). The content of the tasks in EF Touch issimilar to that of other tasks that are in wide use with young children. What hasdifferentiated EF Touch from other efforts is the extensive psychometric evaluation thathas focused on the measurement properties of individual tasks, the battery-wide score,and the iterative approach to task development and refinement (Willoughby, Blair,Wirth, Greenberg, & The Family Life Project Key Investigators, 2010, 2012;Willoughby, Wirth, & Blair, 2011; Willoughby, Wirth, Blair, & The Family LifeProject Key Investigators, 2012; Willoughby, Pek, & Blair, 2013; Willoughby, Blair, &Family Life Project Key Investigators, 2016).

EF Touchwas initially developed in a paper-and-pencil format, transitioned to aMicrosoftWindows environment, and has recently been expanded to an Android tablet environmentusing Tangerine™ software. The transition into Tangerine™ is notable because this electronicplatform already had been used for data collection in more than 60 LMICs (http://www.tangerinecentral.org). Migrating EF Touch into Tangerine™ has the potential to expedite theglobal measurement of EF skills in young children in LMICs, and the ability to adapt andupdate tools to different LMIC contexts. Here, we report on a proof-of-concept study thatwas designed to inform basic questions about the feasibility and validity of direct measure-ment of EF skills with young children in an LMIC context.

The goals of this study were threefold. First, we pilot tested an approach for task reviewand adaptation, translation, and data collector training. Second, we evaluated the feasibility ofcomputerized (tablet-based) assessments of EF skills in young children living in a developing-country context. Consistent with our previous work (Kuhn,Willoughby, Blair, &McKinnon,2017), feasibility was defined with respect to rates of task completion, the length of

4 M. T. WILLOUGHBY ET AL.

Page 6: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

administration, and assessor impressions of data quality. Evidence indicating high rates oftask completion, developmentally appropriate time expectations for task completion, andhigh assessor confidence in data quality would all support the feasibility of this work. Third,we tested the validity of the EF tasks by correlating children’s performance with other factorsknown to affect measures of EF skills, including child age, processing speed, and on-taskbehavior. If EF Touch is performing as expected, age differences in overall performanceshould be evident, children with shorter (faster) reaction time should perform better onindividual tasks, and children’s task engagement should be related to better performance.Collectively, this evidence would demonstrate the feasibility and initial validity of Tangerine™EF Touch and would provide the basis for larger-scale and more rigorous investigations ofthis battery in LMIC contexts.

Methods

Setting

Nairobi is the capital city of Kenya. It is located on the southern edge of the rich agriculturalarea of central Kenya. The 2009Kenyan census placed the population ofNairobi at 2.7millionresidents (Kenya National Bureau of Statistics, 2010), but projected a 2017 population closerto 3.6 million. As a capital city, Nairobi is diverse ethnically, with representation from acrossthe country. The ethnic groups with the largest populations in Nairobi County include theKikuyu, Luo, Luhya, Kisii, Kamba, and Somali. In addition to Kenyan ethnic groups, there is asizable population of Asians and Europeans. Nairobi also hosts citizens from neighboringcountries disturbed by open conflict or insecurity, including South Sudan, Uganda, theDemocratic Republic of Congo, Ethiopia, and Somalia.

Nairobi’s early childhood development and education (ECDE) provision modalitiesvary. While Nairobi has more ECDE centers than any other jurisdiction in the country,most (90%) are private rather than public ECDE centers supported by governmentfunds (Ministry of Education, Science and Technology, 2014). The private schoolpopulation is also diverse, with a large majority of private schools located in urbansettlements; the Ministry of Education refers to them as Alternative Provision of BasicEducation and Training (APBET) institutions (Zuilkowski, Piper, Ong’ele, & Kiminza,In press). APBET institutions typically are small, with less than 200 students at eachcenter, and are often located in temporary physical structures. The Ministry ofEducation’s language-of-instruction policy requires that the language of instruction inECDE and lower primary education be the “language of the catchment area,” which iseither English or Kiswahili given Nairobi’s ethnic and language diversity (KenyaInstitute of Education, 1992). The language used by most private schools and APBETinstitutions in Nairobi’s informal settlements is English, although many children aremore comfortable with Kiswahili (Piper, Zuilkowski, & Ong’ele, 2016).

Sample and participants

This study was conducted in cooperation with the Kenyan Ministry of Education andwas approved by Kenya’s National Commission for Science, Technology andInnovation and the Kenya Medical Research Institute, the two bodies responsible for

CHILD NEUROPSYCHOLOGY 5

Page 7: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

approving human subjects research in Kenya. At the ECDE institution level, consentwas first sought from the school head teacher and teachers, followed by oral assent fromeach of the children who participated in the study. We sampled 16 schools fromNairobi for the EF Touch pilot assessment. Six public and 10 APBET institutionswere randomly selected using Stata statistical software. From the 16 sampled schools,the final 10 schools (4 public and 6 APBET) were selected because they had at least 10children in the pre-primary 1 and 2 classrooms. Pre-primary 1 serves 4 year olds andpre-primary 2 is for 5 year olds, although generally there are some children older andyounger in both types of classrooms. We used systematic random sampling at theschool level, balancing by gender within each ECDE level.

In total, 200 children were recruited into the study. In the process of establishinginitial rapport with each child, data collectors determined the child’s language prefer-ence (English or Kiswahili) and were instructed to administer all tasks in this language.At the data processing phase, we determined that seven students had tasks administeredin more than one language. We dropped these data from consideration to eliminatelanguage administration as a source of potential error. Among the 193 participatingchildren, most had tasks administered in Kiswahili (74%), with the remaining inEnglish. Participating children were 51% female and 3–6 years old (2% age 3; 35%age 4; 40% age 5; 23% age 6) at the time of the assessment.

Procedures

EF Touch adaptationThe initial step in this work involved expanding the technical functionality of theTangerine™ platform to allow the delivery of audio and visual images as stimuli andto record child responses in a touch screen environment. Once these technical devel-opments were completed, we conducted a four-step task adaptation process to prepareEF Touch tasks for use in Kenya. First, a content review of all tasks (in English) wasconducted by multiple PhD-level colleagues with extensive experience in early child-hood assessment in East Africa. Second, the English version of each task was translatedinto Kiswahili by a professional in-country translator and the Kiswahili version wasindependently compared back to the English version by an individual working with thetranslator. Third, a 2-day adaptation workshop was conducted in Nairobi that involvedthe research staff (including all coauthors) and Kenyan Ministry of Education officials,all of whom had expertise in early childhood development in Kenya and variousassessment batteries for early childhood. Some of them also had experience in adaptingMELQO (UNESCO et al., 2017) to the Kenyan context (RTI International, 2015). Eachtask was reviewed in detail – the intent of the task as it relates to the local culturalcontext, as well as the appropriateness of the translated text to the Kenyan context,taking into consideration child-friendly language and local languages. Workshop parti-cipants collected data on 32 children, and then attended a group debriefing to finalizetask translations and study procedures based on the applicability of the tasks and thelanguage used to Kenya’s context.

Research leads and Ministry of Education officials who participated in the adaptationworkshop subsequently conducted a 3-day training of 20 data collectors. Data collectorswere selected from a pool of 157 individuals who had experience in undertaking

6 M. T. WILLOUGHBY ET AL.

Page 8: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

assessments at the pre-primary level, plus geographical knowledge of the study sites.Like the adaptation workshop, the facilitators made a didactic presentation about EF,then reviewed each EF Touch task in depth. Pilot data from an additional 40 childrenwere collected during the data collection training workshop, to familiarize data collec-tors with the process of using EF Touch and to provide opportunities to generatequestions. Debriefing of data collection experience and continued practice ensured ahigh degree of data collector confidence. On the final day of training, each data collectorconducted a mock administration of the entire battery before one of the research leads.The leads completed a data collector certification checklist that was reviewed with datacollectors before the start of formal data collection.

Data collectionFive teams of four data collectors collected EF Touch data from 200 children over a 2-dayperiod. All data collection occurred in the mornings at participating ECDE centers. Equalnumbers of boys and girls were sampled from each classroom, and individual testingoccurred in spaces provided by school staff (e.g., outdoors on benches in a commonspace; in a separate building that adjoined the school). Children worked one-on-one withdata collectors. Data collectors determined the language of task administration (Englishor Kiswahili) based on initial, rapport-building conversations with children (e.g., askingsimple questions of the child and noticing which language was used in her response).Data were collected using Android tablets (9.4 cm × 15.2 cm). Tablets were placed in anupright position in tablet stands (approximately US$7 each) in landscape orientation, tocreate a more uniform testing experience. Assessment data, devoid of student identifiers,were uploaded from tablets to a Tangerine™ server at the end of each data collection day.

Measures

EF Touch: BubblesThis 30-item task measured simple reaction time. A series of blue bubbles was pre-sented, one at a time, and children were instructed to “pop” each bubble as fast as theycould. Items were presented for up to 5,000 ms, and the time that transpired betweenstimuli onset and child touch was recorded by the tablet. Item responses that were fasterthan 400 ms were considered too fast to be plausible (i.e., likely reflected a responserecorded from the previous screen) and were set to missing. If a child failed to touchany bubble, the item was considered inaccurate and reaction time was not recorded.The mean reaction time across all correctly answered items was used to index simplereaction time.

EF Touch: Silly Sounds StroopThis 17-item Stroop-like task measured inhibitory control. Each item displayed picturesof a dog and a cat (the left–right placement on the screen varied across trials) andpresented the sound of either a dog barking or cat meowing. After the assessor andchild established the sounds that each animal typically made, the child was instructed toplay a silly game that involved touching the picture of the animal that did not make thesound (e.g., touching the cat when hearing a dog bark). Each item was presented for3,000 ms, and the accuracy and reaction time of responses were recorded. Item

CHILD NEUROPSYCHOLOGY 7

Page 9: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

responses that were faster than 400 ms were considered too fast to be plausible andwere set to missing. If an item was omitted, the child was given an accuracy score ofzero, and reaction time was not recorded. Mean accuracy across all items was used torepresent performance.

EF Touch: Animal Go/No-GoThis 40-item go/no-go task measured inhibitory control. Individual pictures of animalswere presented, and children were instructed to touch a centrally located button ontheir screen every time that they saw an animal (the “go” response), except when thatanimal was a pig (the “no-go” response). Each item was presented for 3,000 ms, and theaccuracy and reaction time of responses were recorded. Item responses that were fasterthan 400 ms were considered too fast to be plausible and were set to missing. If an itemwas omitted, children were given an accuracy score of zero, and reaction time was notrecorded. Mean accuracy across the eight no-go items was used to representperformance.

EF Touch: Spatial Conflict ArrowsThis 36-item spatial conflict task measured inhibitory control and cognitive flexibility.Two soft buttons appeared on the left- and right-most sides of the tablet screen.Children were instructed to touch the button to which an arrow was pointing. Threeblocks of 12 arrows were depicted in which arrows appeared above the button to whichthey were pointing (congruent condition), above the opposite button to which theywere pointing (incongruent condition), or in mixed locations. Each item was presentedfor 3,000 ms, and the accuracy and reaction time of responses were recorded. Itemresponses that were faster than 400 ms were considered too fast to be plausible andwere set to missing. If an item was omitted, children were given an accuracy score ofzero, and reaction time was not recorded. Mean accuracy for the number of incon-gruent items (from incongruent and mixed conditions) was used to representperformance.

EF Touch: Pick the PictureThis 32-item self-ordered pointing task measured working memory. Children werepresented with arrays of pictures that varied in length (i.e., two, three, four, or sixpictures per set). For each set, children initially were instructed to touch any picture oftheir choice. On subsequent trials within that set, the pictures were presented indifferent locations, and children were instructed to pick a picture that had not yetbeen touched. The mean accuracy of responses in each picture set (except for the firstpicture, which did require working memory) was used to represent task performance.

EF Touch: Something’s the SameThis 30-item two-part task was intended to measure attention shifting and flexiblethinking. In part 1 (20 items), children were presented with two pictures (e.g., twoanimals, two flowers) that were similar in one dimension (i.e., color, shape, or size)and described as such. A third picture was then presented alongside the original two,and the child was asked to select which of the original pictures was similar to thenew picture along some other dimension (e.g., blue flower and red flower are similar

8 M. T. WILLOUGHBY ET AL.

Page 10: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

because they are flowers; the third new picture of blue chair is similar to the blueflower because they are both blue). In part 2 (10 items), a new set of instructions anddemonstrations was provided that involved presenting the child with all threepictures and asking that the child identify one pair of pictures that were similaralong one dimension (i.e., color, shape, size) and then a second pair among the samethree pictures that were similar in some other dimension. After deployment, wedetermined a programming error in part 2 of the task and thus do not report onthose items here (we mention part 2 because it contributed to our estimate of tasktime). The mean accuracy of responses in part 1, which corresponded to ourprevious work, was used to represent task performance.

EF Touch: Task quality ratingsAt the end of each task, data collectors completed a one-item three-category rating ofthe perceived quality of the collected data (i.e., 1 = I have SERIOUS CONCERNS aboutthese data [e.g., the child did not seem to understand the task, was frequently dis-tracted/interrupted, was unable to pay attention, was not engaged in task, etc.]; 2 = Ifeel OK about these data [e.g., the child seemed to understand the task pretty well, onlyexperienced a few distractions or interruptions, paid attention reasonably well, wassomewhat engaged in task, etc.]; 3 = I feel GOOD about these data [e.g., the childclearly understood task, had very few [if any] distractions/interruptions, paid attentionto task, was engaged, etc.]). These ratings reflected the data collectors’ subjectiveimpressions of children’s comprehension of task instructions, engagement with thetask, and the suitability of the testing environment.

Preschool self-regulation assessment (PSRA)Data collectors rated child behavior during testing using the PSRA assessor report (Smith-Donald, Raver, Hayes, & Richardson, 2007). The PSRA includes items that reflect children’sattention (e.g., pays attention during instructions; sustains attention), behavior (e.g.,remains seated; engages with person administering task), or affect (e.g., shows pleasure inaccomplishment; shows feelings) during the EF Touch battery. Examiners rated each itemon a three-point Likert rating scale, with each scale point consisting of clear, behavioraldescriptors. Four items were omitted, based on feedback from data collectors and stake-holders during the training workshop (those items referred to children grabbing test itemsor showing various forms of aggression, none of which was considered relevant). In linewith prior research (Smith-Donald et al., 2007), two subscales, which reflected attention-relative behaviors (14 items; Cronbach’s α = 0.87) and positive affect (9 items; Cronbach’sα = 0.75) during testing, were derived from factor analysis of these items in this sample. Dueto a technical problem, PSRA reports were available for only 155 students.

Results

Feasibility

Descriptive analyses were used to characterize the feasibility of using Tangerine™ EFTouch with young children who attended center-based care in Nairobi, Kenya. The firstfeasibility metric was the proportion of children who completed a task, where

CHILD NEUROPSYCHOLOGY 9

Page 11: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

completion was defined as passing training items. Between 62% and 100% of childrencompleted each EF task (see Table 1). Children completed an average of four of fivepossible EF tasks (English M = 3.9 tasks; Kiswahili M = 4.3 tasks), with 85% of childrencompleting three or more tasks (84% of English, 96% Kiswahili).

Children who passed the training items for a given task typically completed all of theresulting test items (M = 85–100% andMode = 100% across tasks). Tasks took an average of3–5 min to complete (see Table 1; note that while “Something’s the Same” took an averageof 9 min to complete, it was a two-part task and is better considered two adjoined tasks forpurposes of time estimates). Task times in Table 1 include the time associated with thedescription of the task and the completion of demonstration items by the data collector, aswell as the completion of training and (when appropriate) test items from children. Thesetimes do not include any breaks that were offered in between the tasks, which were given atthe discretion of the data collector. In general, the rates of task and item completion, as wellas task lengths, were similar irrespective of the language of administration.

On average, 90% of all administrations were deemed to be of acceptable (“OK”) or goodquality, with the remaining 10% of administrations raising some concern (see Figure 1). Dataquality ratings were similarly high irrespective of the language in which the task wasadministered.

Task and battery performance

Children’s performance on individual tasks is summarized in Table 2. Children completedbetween 61% and 82% of test items correctly on each task. Floor effects were evident for twotasks. Specifically, 11% of children overall (9% of Kiswahili and 19% of English) who passedthe training items on the Animal Go/No-Go task did not answer a single test item (i.e., no-gotrials) correctly. Moreover, 10% of children overall (11% of Kiswahili and 9% of English) whopassed the training items on the Spatial Conflict Arrows task did not answer a single test item(i.e., incongruent arrows) correctly. Ceiling effects were evident for all three of the inhibitorycontrol tasks and were particularly pronounced for the Animal Go/No-Go (47%) and Silly

Table 1. EF task feasibility metrics.Tasks attempted(% completion)

% task itemscompleted

Task length(min)

Total Kiswahili English Total Kiswahili English Total Kiswahili English

Task N (%) N (%) N (%) M (Mdn) M (Mdn) M (Mdn)M

(Mdn) M (Mdn)M

(Mdn)

Animal Go/No-Go

193 (92) 142 (100) 51 (71) 0.99 (1.0) 0.99 (1.0) 0.99 (1.0) 5.2 (5.0) 5.1 (5.1) 5.2 (4.8)

Silly SoundsStroop

188 (88) 141 (89) 47 (85) 0.85 (0.94) 0.86 (0.94) 0.84 (0.88) 2.5 (2.4) 2.5 (2.4) 2.4 (2.4)

Spatial ConflictArrows

192 (62) 141 (60) 51 (67) 0.87 (0.89) 0.86 (0.89) 0.89 (0.92) 2.3 (2.0) 4.8 (4.6) 4.8 (4.8)

Something’sthe Same

192 (85) 142 (86) 50 (82) 1.0 (1.0) 1.0 (1.0) 1.0 (1.0) 8.8 (8.2) 8.8 (8.2) – (–)

Pick the Picture 191 (100) 141 (100) 50 (100) 1.0 (1.0) 1.0 (1.0) 1.0 (1.0) 5.1 (5.0) 5.0 (5.0) 5.4 (5.1)

Note. N = sample size; M = mean; Mdn = median. For Something’s the Same, time data inadvertently were notcollected in the English version of the tasks. Moreover, the total time reflects parts 1 and 2 of the task, although onlypart 1 scores are used here.

10 M. T. WILLOUGHBY ET AL.

Page 12: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Sounds Stroop (30%) tasks. Descriptively, the mean levels of task performance were compar-able irrespective of the language in which the task was administered.

Bivariate correlations among the EF task scores, as well as the simple reaction timetask (Bubbles), are summarized in Table 3. Three points are noteworthy. First, childrenwith slower simple reaction times (i.e., longer mean responses on the Bubbles task)performed somewhat more poorly on each of the EF tasks (rs = − 0.08 to −0.29).Second, the magnitude of the correlations among EF tasks was modest (rs = − 0.06 to0.25). Third, the Animal Go/No-Go task was unrelated to performance on all othertasks, which was likely a function of limited task variation resulting from floor andceiling effects.

Consistent with our previous work (Willoughby, Blair, & The Family Life ProjectKey Investigators, 2016; Willoughby, Kuhn, Blair, Samek, & List, 2017), we created abattery-wide composite by standardizing each task score (M = 0, SD = 1) and averagingchildren’s responses across tasks. This battery-wide composite reflected a child’s averagestanding on all the tasks that s/he completed. As shown in Figure 2, the EF composite

Figure 1. EF task quality ratings, by language of administration.

Table 2. EF task descriptive statistics.Test items correct Floor effect Ceiling effect

Total Kiswahili English Total Kiswahili English Total Kiswahili English

Task M (SD) M (SD) M (SD) % % % % % %

Animal Go/No-Go 0.75 (0.35) 0.77 (0.33) 0.65 (0.41) 11 9 19 47 49 39Silly Sounds Stroop 0.82 (0.22) 0.82 (0.22) 0.80 (0.22) 0 0 0 30 31 25Spatial Conflict Arrows 0.61 (0.35) 0.61 (0.35) 0.62 (0.33) 10 11 9 18 20 12Something’s the Same(part 1)

0.66 (0.15) 0.66 (0.14) 0.67 (0.18) 0 0 0 2 1 7

Pick the Picture 0.70 (0.10) 0.69 (0.10) 0.72 (0.11) 0 0 0 0 0 0

Note. M = mean; SD = standard deviation.

CHILD NEUROPSYCHOLOGY 11

Page 13: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

was approximately normally distributed and exhibited expected age-related improve-ments (3- and 4-year-old children were combined into a single group because of thesmall number of 3s in the sample). Consistent with individual task scores (see Table 3),the EF composite was negatively correlated with simple reaction time from the Bubblestask (r = – 29, p < .001). The EF composite was positively related to data collectorimpressions of children’s attention-related behaviors (r = 0.36, p < .001) – not theirpositive affect during testing (r = 0.15, p = .07). This finding may be related to culturaldifferences in how all children interact with adults in Kenya.

An ordinary least-squares regression model was estimated to investigate the uniquecontributions of simple reaction time (Bubbles task), demographic factors (child age,gender, language of administration), and data collector ratings of children’s behaviorduring testing (mean replacement was used to accommodate missing values for thePSRA scales). This set of predictors was significantly associated with the EF compositescore, F(7, 185) = 8.0, p < .001, adjusted R2 = 0.20. Individual differences in simplereaction time (β = –0.20, p = .004) and attention-related behaviors during testing(β = 0.24, p = .0005) were both uniquely related to the EF composite. Specifically,

Table 3. Bivariate correlations among EF and simple reaction time tasks.1. 2. 3. 4. 5. 6.

1. Bubbles –2. Animal Go/No-Go −0.08 –3. Silly Sounds Stroop −0.29*** 0.14 –4. Spatial Conflict Arrows −0.16 0.05 0.17 –5. Something’s the Same −0.15* −0.06 0.15 0.19 –6. Pick the Picture −0.17* 0.04 0.14 0.25** 0.22** –

Note. Sample sizes by task = 154–191; *p < .05, **p < .01, ***p < .001. Sample size differences in pairwise elementscontribute to differences in statistical significance.

Figure 2. Age differences in simple reaction time and EF composite scores.

12 M. T. WILLOUGHBY ET AL.

Page 14: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

children who had a faster simple reaction time and who demonstrated greater attentionand engagement during task completion had higher EF composite scores. Relative to 5year olds (reference group), 3- and 4 year olds performed more poorly on the EFcomposite (β = –0.24, p = .0009). None of the other demographic factors (language ofadministration, p = .78; sex, p = .53; 6 year olds, p = .62) were uniquely related to the EFcomposite, nor were data collector ratings of positive affect during testing (p = .26).

Discussion

This study represented an initial test of the feasibility of adapting and deploying abattery of performance-based EF tasks – all of which had been administered extensivelyamong 3- to 5-year-old children in the United States – for use in LMIC contexts, withKenya serving as the initial exemplar. This effort involved technical changes in themethod in which tasks were administered (from a Windows-based, dual-monitor setupto an Android-based, single-screen tablet). It also involved the development of anoverall approach for task review, translation, data collector training, and task evalua-tion. With the adoption of a computerized format, this study extends previous studiesthat have successfully used paper-and-pencil methods to measure EF in young childrenin developing country contexts. Most children completed most tasks, assessment timewas relatively short, and the face validity of tasks was evident. Below, we elaboratespecific findings, identify task limitations that will require future attention, and considerthe broader implications of this work.

We used a multistep process to review and adapt tasks for use in Kenya. This included areview of task content by in-country experts in early childhood assessment, the use of aprofessional in-country translator, didactic presentation of EF skills and tasks to stakeholders(research colleagues andMinistry of Education officials), and stakeholder participation in thepilot testing of tasks. Through this process, we learned that the translator needs to clearlyunderstand the intent of each task – not simply the (in this case, English) words to betranslated. Providing additional instructions to the translator might have streamlined thestakeholders’ debates about whether the Kiswahili and English versions of tasks were com-municating identical task objectives. We also learned the importance of having key technicalstakeholders participate in pilot data collection. This arrangement minimized speculationabout how young children might perceive and respond to tasks and gave the stakeholders ashared set of experiences with which to discuss tasks. Engaging key research officers in thisprocess in advance of their conducting training for data collectors was also beneficial becausethey could directly relate to and address questions that arose when data collectors undertookpilot data collection.

In terms of feasibility, with one exception (Spatial Conflict Arrows, described below),85% or more of children completed each of the EF tasks. The common structure of alltasks (demonstration items, followed by training items, followed by test items for thosechildren that passed training) helped to ensure that the children clearly understood thetask objectives. Moreover, automating the determination of whether training items were“passed” standardized the assessment experience. Specifically, data collectors were notrequired to decide whether the child passed the training items, nor how many rounds oftraining were appropriate (children were given up to two opportunities to completetraining items before the task was automatically discontinued), as the programmed

CHILD NEUROPSYCHOLOGY 13

Page 15: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

tablets responded to the child’s performance. Finally, only 62% of children successfullypassed the training items for the Spatial Conflict Arrows task. In our previous work,this task was not viewed as more difficult, which suggests that task instructions and/orcriteria for whether children could be considered to have passed training items mightrequire revision in LMIC contexts such as Kenya.

Additional support for the feasibility of direct assessments came from the qualityratings that were completed by data collectors at the end of each assessment. Thequality ratings indicated that data collectors had confidence in the data that they werecollecting, and this was evident for all tasks in both languages. The ratings were a briefbut structured method for data collectors to share their subjective impressions of dataquality with data analysts. We found this approach more useful than open-form textcomments, which are often difficult to analyze and interpret systematically.

In addition to questions of feasibility, we considered children’s performance on each EFtask. On average, children completed 61–82% of the test items on each task correctly, withthe full range of variation present on each task. However, the inhibitory control taskssuffered from unacceptably high rates of floor and/or ceiling effects. There are two likelyexplanations for the relatively poor performance of these tasks. First, EF Touch hasprimarily been used with 3- to 5-year-old children in the United States and was designedto ensure that individual differences could be detected even among 3 year olds from low-income households. The current study included older children (63% of children were 5 or6 years old) from relatively high-performing preschools in the relatively wealthier capitalcity of Nairobi. These inhibitory control tasks may not have been sufficiently challengingfor children with high ability. Second, the stimulus presentation times for the inhibitorycontrol tasks in EF Touch (3,000 ms) were longer than in other studies that used similartasks with children of similar ages (1,500–2,000 ms; Simpson & Riggs, 2006; Wiebe,Sheffield, & Andrews Espy, 2012). Shortening the presentation time for individual itemswould likely reduce ceiling effects. An open question is whether the impact of itempresentation time would vary as a function of children’s familiarity with tablets, which isan experience that likely differs both within and across LMIC contexts.

On average, children completed four of the five EF tasks. However, consistent withthe broader literature, children’s performances across tasks were weakly (rs ≈ 0.1–0.3)associated (see Willoughby, Holochwost, Blanton, & Blair, 2014). Given modest corre-lations, we have advocated for creating an overall EF composite score that reflects thesummation of performance across tasks rather than defining EF as reflecting only theshared variation across tasks (Willoughby et al., 2016, 2017). The EF composite score inthis study was approximately normally distributed and was associated with expectedage-graded improvements. Notably, child age, individual differences in simple reactiontime, and child testing behavior were all uniquely associated with the EF compositescore. In contrast, child gender, the language of task administration, and positive affectobserved during task completion were not. These results provide initial face validity forTangerine™ EF Touch tasks in this new format and setting.

Ardila (2005) identified several cultural-dependent values that threatened the use inLMICs of cognitive assessments that were developed in high-income countries; threesuch assessments are germane to this study. The first concerns the cultural appropri-ateness of one-on-one (adult to child) assessment. For better or worse, individualizedtesting of preprimary and primary-aged youth is increasingly common in LMICs in

14 M. T. WILLOUGHBY ET AL.

Page 16: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

general (Gove & Dubeck, 2016). For example, near the time of this study, extensivenationally representative literacy and numeracy assessments of this type were beingconducted in Kenya (Freudenberger & Davis, 2017; Kenya National ExaminationsCouncil, 2016). Although adult–child assessments are not considered atypical inKenya, this is an important consideration in other contexts where investmentsin education and assessments are less common. The second concern is uncertainty inchildren’s perceptions of the importance of testing, including what degree of effortshould be expended. We would never consider EF Touch to be used for anything otherthan low-stakes decisions and have never described the tasks to children as “tests.”Instead, they are presented as “fun games.” Children are not given any indication oftheir performance (including when tasks are discontinued if training items are notcompleted), although they are praised for effort and offered a small reward at thecompletion of testing (e.g., children were given a pencil after task completion in thisstudy). The third potential concern is cultural variation in time use, including timeconstraints associated with EF task completion. As we noted, the inhibitory controltasks that involved a speeded component appeared to be too easy. We expect that thiswas more a characteristic of the participating children than cultural variation in theexperience of time. Fortunately, EF Touch allows manipulating both the stimuluspresentation and inter-stimulus speeds of speeded tasks to ensure that this is not anissue in other settings.

This study had at least five limitations. First, it involved a small sample of children whowere drawn from a convenience sample of preschools in Nairobi. Although preschools wereselected to represent a range of settings (i.e., from publicly funded schools supported by theMinistry of Education, to privately funded, informal schools), they tended to be high-performing schools, given Nairobi’s wealth relative to the rest of the country. The resultstherefore may not represent the typical range of student performance in Kenya. Second, mostchildren in this study ranged from 4 to 6 years of age, which is consistent with the target age ofchildren served in pre-primary classrooms in Kenya. Although a small number of 3 year oldsparticipated, the number of children was too small to make strong inferences about thefeasibility of using EF Touchwith children this young in LMIC contexts, though we have hadgood success administering these tasks to 3 year olds in the United States (Willoughby et al.,2010). Third, elsewhere we have emphasized the value of using modern psychometricmethods to investigate EF task performance and to test for equivalence in item parametersacross subgroups (Willoughby et al., 2011). Although the current study was not sufficientlylarge to ask these questions, an important direction for future research will involve formaltests of task equivalence across English vs. Kiswahili languages of administration. Fourth, datacollectors determined the language of administration based on child responses to rapport-building questions. Given that English is the de facto language of instruction inmany schools,some children may have responded to data collectors in English despite have greaterproficiency in Kiswahali. This decisionmay have impacted their performance. Fifth, althoughwe collected item-level reaction time, all EF task scores were based solely on accuracy data.Due to the speeded nature of inhibitory control tasks, the joint use of item-level reaction timeand accuracy data might enhance task scoring by reducing instances of floor and ceilingeffects (Magnus, Willoughby, Blair, & Kuhn, In press), both of which were a problem forinhibitory control tasks in this study. This is another direction for the next iteration of thiswork that involves a larger sample size.

CHILD NEUROPSYCHOLOGY 15

Page 17: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Recent national assessments in Kenya have demonstrated a wide range of performanceon educational outcomes at grade 2 (Freudenberger & Davis, 2017), grade 3 (KNEC,2016), and grade 6 (Hungi & Thuku, 2010), and particularly in household-based assess-ments covering children from ages 6 to 16 (Uwezo, 2016). These performance variationsare only partially explained by geographic and socioeconomic differences. Large-scaleeducational interventions in Kenya have recently begun, including the Tusome programin literacy (2014–2019) and the Primary Education Development program in numeracy(2015–2018). Initial impact evaluations have shown substantial improvements in learningoutcomes in literacy (Freudenberger & Davis, 2017), yet massive variations in learningoutcomes persist, and in some cases have expanded as a result of these interventions(Oketch, Ngware, & Ezeh, 2010; Piper, Jepkemei, & Kibukho, 2015). These widening gapssuggest that there are missing variables that can explain the differences in performanceand that EF skills might contribute to some of that explanation. Moreover, Kenya hasonly recently begun, through the Tayari program, to support its counties to implementlarge-scale educational interventions in early childhood development (Piper, Merseth, &Ngaruiya, 2017); and before the advent of this intervention, little had been systematicallydone to improve the quality of pre-primary education in Kenya at scale within govern-ment structures. Understanding how EF skills interact with the range of necessary skillsfor pre-primary and primary education is a clear gap in the literature in Kenya and likelyacross other LMICs. Further development of EF Touch and other tools like it are essentialfor this work to proceed.

In summary, this study demonstrated the feasibility and initial face validity of usingTangerine™ EF Touch to measure EF skills in young children in an LMIC context. Thewidespread use of the Tangerine™ platform for data collection in LMICs has thepotential to greatly accelerate our ability to deploy globally scalable measures of EF atyoung ages and in locations where standard assessment tools for these children do notyet exist. The ability to reliably and validly measure EF skills in young children indeveloping country contexts has the potential to make fundamental contributions to thescientific literature, particularly given the dramatic variations in early life experience ofchildren in LMICs relative to high-income countries. Moreover, the availability ofreliable, valid, and culturally relevant performance-based assessment of EF in youngchildren in LMICs has the potential to inform policy-relevant questions related to earlychildhood programming and pedagogy.

Acknowledgments

This work was funded through internally supported research at RTI International and theChildren’s Investment Fund Foundation The authors are grateful for the ongoing collaborationwith the Kenyan Ministry of Education, and the ministry’s county education officers. Theauthors especially appreciate the support of Dr. Hellen Kimathi, the Director of EarlyChildhood Development for the Ministry of Education.

Funding

This work was supported by the RTI International and the Children’s Investment FundFoundation.

16 M. T. WILLOUGHBY ET AL.

Page 18: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

References

Ardila, A. (2005). Cultural values underlying psychometric cognitive testing. NeuropsychologyReview, 15(4), 185–195.

Arnsten, A. F. (2015). Stress weakens prefrontal networks: Molecular insults to higher cognition.Nature Neuroscience, 18(10), 1376–1385.

Bangirana, P., Sikorskii, A., Giordani, B., Nakasujja, N., & Boivin, M. J. (2015). Validation of thecogstate battery for rapid neurocognitive assessment in Ugandan school age children. Childand Adolescent Psychiatry and Mental Health, 9(1), 1–7.

Bell, M. A., & Wolfe, C. D. (2007). Changes in brain functioning from infancy to early childhood:Evidence from EEG power and coherence during working memory tasks. DevelopmentalNeuropsychology, 31(1), 21–38.

Boivin, M. J., & Giordani, B. (2009). Neuropsychological assessment of African children:Evidence for a universal brain/behavior omnibus within a coconstructivist paradigm.Progress in Brain Research, 178, 113–135.

Brinkman, S. A., Kinnell, A., Maika, A., Hasan, A., Jung, H., & Pradhan, M. (2017). Validity andreliability of the early development Instrument in Indonesia. Child Indicators Research, 10(2),331–352.

Bush, G., Luu, P., & Posner, M. I. (2000). Cognitive and emotional influences in anteriorcingulate cortex. Trends in Cognitive Sciences, 4(6), 215–222.

Chasiotis, A., Kiessling, F., Hofer, J., & Campos, D. (2006). Theory of mind and inhibitorycontrol in three cultures: Conflict inhibition predicts false belief understanding in Germany,Costa Rica and Cameroon. International Journal of Behavioral Development, 30(3), 249–260.

Chevalier, N., Huber, K. L., Wiebe, S. A., & Espy, K. A. (2013). Qualitative change in executivecontrol during childhood and adulthood. Cognition, 128(1), 1–12.

De Luca, C. R., & Leventer, R. J. (2008). Developmental trajectories of executive functions acrossthe life span. In V. Anderson, R. Jacobs, & P. J. Anderson (Eds.), Executive functions and thefrontal lobes: A lifespan perspective (pp. 23–56). New York: Taylor & Francis.

Diamond, A. (2013). Executive functions. Annual Review of Psychology, 64(1), 135–168.Esopo, K., Mellow, D., Thomas, C., Uckat, H., Abraham, J., Jain, P., Jang, C., Otis, N., Riis-

Vestergaard, M., Starcev, A., Orkin, K. & Haushofer, J. (2018). Measuring self-efficacy,executive function, and temporal discounting in Kenya. Behaviour Research and Therapy,101, 30–45.

Fernald, L. C., Kariger, P., Engle, P., & Raikes, A. (2009). Examining early child development inlow-income countries: A toolkit for the assessment of children in the first five years of life.Washington, DC: The International Bank for Reconstruction and Development/The WorldBank. doi:10.1596/28107

Fernandes, M., Stein, A., Newton, C. R., Cheikh-Ismail, L., Kihara, M., Wulff, K., . . . for theInternational Fetal and Newborn Growth Consortium for the 21st Century(INTERGROWTH-21st). (2014). The INTERGROWTH-21st Project NeurodevelopmentPackage: A novel method for the multi-dimensional assessment of neurodevelopment inpre-school age children. PLOS ONE, 9(11), e113360.

Fischer, V. J., Morris, J., & Martines, J. (2014). Developmental screening tools: Feasibility of useat primary healthcare level in low- and middle-income settings. Journal of Health, Populationand Nutrition, 32(2), 314–326.

Fox, S. E., Levitt, P., & Nelson, C. A., III. (2010). How the timing and quality of early experiencesinfluence the development of brain architecture. Child Development, 81(1), 28–40.

Freudenberger, E., & Davis, J. (2017). Tusome external evaluation—midline report. Prepared for theministry of education of kenya, USAID/ Kenya,and the UK department for International developmentunder contract no. AID-615-TO-16-00012. Washington, DC: Management Sciences International(MSI), a Tetra Tech company. http://pdf.usaid.gov/pdf_docs/PA00MS6J.pdf

Garon, N., Bryson, S. E., & Smith, I. M. (2008). Executive function in preschoolers: A reviewusing an integrative framework. Psychological Bulletin, 134(1), 31–60.

CHILD NEUROPSYCHOLOGY 17

Page 19: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Gladstone, M., Lancaster, G. A., Umar, E., Nyirenda, M., Kayira, E., van den Broek, N. R., &Smyth, R. L. (2010). The malawi developmental assessment tool (MDAT): The creation,validation, and reliability of a tool to assess child development in rural African settings.PLOS Medicine, 7(5), e1000273.

Gove, A., & Dubeck, M. M. (2016). Assess reading early to inform instruction, improve quality,and realize possibilities. In A. W. Wiseman (Ed.), Annual review of comparative and interna-tional education, 2016: International perspectives on education and society (Vol. 30, pp. 81–88).Bingley, UK: Emerald Group.

Grantham-McGregor, S., Cheung, Y. B., Cueto, S., Glewwe, P., Richter, L., Strupp, B., & theInternational Child Development Steering Group. (2007). Developmental potential in the first5 years for children in developing countries. Lancet, 369(9555), 60–70.

Hackman, D. A., Farah, M. J., & Meaney, M. J. (2010). Socioeconomic status and the brain:Mechanistic insights from human and animal research. Nature Reviews Neuroscience, 11(9),651–659.

Heckman, J. J., & Mosso, S. (2014). The economics of human development and social mobility.Annual Review of Economics, 6(1), 689–733.

Hungi, N., & Thuku, F. W. (2010). Differences in pupil achievement in Kenya: Implications forpolicy and practice. International Journal of Educational Development, 30(1), 33–43.

Imada, T., Carlson, S. M., & Itakura, S. (2013). East-West cultural differences in context-sensitivity are evident in early childhood. Developmental Science, 16(2), 198–208.

International, R. T. I. (2015). Tayari programme: year 1. Tayari draft annual performance review.Prepared for and available from the children’s investment fund foundation. Research TrianglePark, NC: Author.

Kariuki, S. M., Abubakar, A., Newton, C. R., & Kihara, M. (2014). Impairment of executivefunction in Kenyan children exposed to severe falciparum malaria with neurological involve-ment. Malaria Journal, 13, 365.

Kenya Institute of Education (KIE). (1992). Primary education syllabus. Nairobi: Kenya Instituteof Education.

Kenya National Bureau of Statistics (KNBS). (2010). 2009 population and housing statistics.Nairobi: Author.

Kenya National Examinations Council (KNEC). (2016). Monitoring of learner achievement atclass 3 in literacy and numeracy in Kenya. Nairobi: Kenya National Examinations Council.

Kitsao-Wekulo, P. K., Holding, P. A., Taylor, H. G., Abubakar, A., & Connolly, K. (2013).Neuropsychological testing in a rural African school-age population: Evaluating contributionsto variability in test performance. Assessment, 20((6)), 776–784.

Knudsen, E. I. (2004). Sensitive periods in the development of the brain and behavior. Journal ofCognitive Neuroscience, 16(8), 1412–1425.

Koziol, L. F., Budding, D. E., & Chidekel, D. (2012). Frommovement to thought: Executive function,embodied cognition, and the cerebellum. Cerebellum (London, England), 11(2), 505–525.

Kuhn, L. J., Willoughby, M. T., Blair, C. B., & McKinnon, R. (2017). Examining an executivefunction battery for use with preschool children with disabilities. Journal of Autism andDevelopmental Disorders, 47(8), 2586–2594.

Kutlesic, V., Brewinski Isaacs, M., Freund, L. S., Hazra, R., & Raiten, D. J. (2017). Executivesummary: Research gaps at the intersection of pediatric neurodevelopment, nutrition, andinflammation in low-resource settings. Pediatrics, 139(Suppl 1), S1–S11.

Magnus, B. E., Willoughby, M. T., Blair, C. B., & Kuhn, L. J. (In press). Integrating item accuracyand reaction time to improve the measurement of executive function abilities in early childhood.Assessment.

McCoy, D. C., Sudfeld, C. R., Bellinger, D. C., Muhihi, A., Ashery, G., Weary, T. E., Fink, G.(2017). Development and validation of an early childhood development scale for use in low-resourced settings. Population Health Metrics, 15(1), 1–18.

McCoy, D. C., Zuilkowski, S. S., & Fink, G. (2015). Poverty, physical stature, and cognitive skills:Mechanisms underlying children’s school enrollment in Zambia. Developmental Psychology, 51(5), 600–614.

18 M. T. WILLOUGHBY ET AL.

Page 20: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. AnnualReview of Neuroscience, 24(1), 167–202.

Ministry of Education, Science and Technology (MoEST). (2014). 2014 basic education statisticalbooklet. Nairobi: Author. http://www.education.go.ke/index.php/downloads/file/30-basic-education-statistical-booklet-2014

Moriguchi, Y., Evans, A. D., Hiraki, K., Itakura, S., & Lee, K. (2012). Cultural differences in thedevelopment of cognitive shifting: East-West comparison. Journal of Experimental ChildPsychology, 111(2), 156–163.

Moriguchi, Y., & Hiraki, K. (2011). Longitudinal development of prefrontal function during earlychildhood. Developmental Cognitive Neuroscience, 1(2), 153–162.

Noble, K. G., Houston, S. M., Brito, N. H., Bartsch, H., Kan, E., Kuperman, J. M., . . . Sowell, E. R.(2015). Family income, parental education and brain structure in children and adolescents.Nature Neuroscience, 18(5), 773–778.

Oketch, M., Ngware, M., & Ezeh, A. (2010). Why are there proportionately more poor pupils innon-state schools in urban Kenya in spite of FPE policy? International Journal of EducationalDevelopment, 30(1), 23–32.

Petersen, S. E., & Posner, M. I. (2012). The attention system of the human brain: 20 years after.Annual Review of Neuroscience, 35, 73–89.

Piper, B., Jepkemei, E., & Kibukho, K. (2015). Pro-poor PRIMR: Improving early literacy skillsfor children from low-income families in Kenya. Africa Education Review, 12(1), 67–85.

Piper, B., Merseth, K., & Ngaruiya, S. (2017). Scaling up early childhood development andeducation in a devolved setting: Policy making, resource allocations, and impacts of theTayari school readiness program in Kenya. Manuscript submitted for publication.

Piper, B., Zuilkowski, S., & Ong’ele, S. (2016). Implementing mother tongue instruction in thereal world: Results from a medium-scale randomized controlled trial in Kenya. ComparativeEducation Review, 60(4), 776–807.

Pisani, L., Dyenka, K., Sharma, P., Chhetri, N., Dang, S., Gayleg, K., & Wangdi, C. (2017).Bhutan’s national ECCD impact evaluation: Local, national, and global perspectives. EarlyChild Development and Care, 187(10), 1511–1527.

Prado, E. L., Alcock, K. J., Muadz, H., Ullman, M. T., Shankar, A. H., & For the SUMMIT StudyGroup. (2012). Maternal multiple micronutrient supplements and child cognition: A rando-mized trial in Indonesia. Pediatrics, 130(3), e536–e546.

Prado, E. L., Hartini, S., Rahmawati, A., Ismayani, E., Hidayati, A., Hikmah, N., . . . Alcock, K. J.(2010). Test selection, adaptation, and evaluation: A systematic approach to assess nutritionalinfluences on child development in developing countries. The British Journal of EducationalPsychology, 80(1), 31–53.

Ross, J, & Melinger, A. (2017). Bilingual advantage, bidialectal advantage or neither? comparingperformance across three tests of executive function in middle childhood. Dev Sci, 20, 4. doi:10.1111/desc.2017.20.issue-4

Semrud-Clikeman, M., Romero, R. A. A., Prado, E. L., Shapiro, E. G., Bangirana, P., & John, C.C. (2017). Selecting measures for the neurodevelopmental assessment of children in low- andmiddle-income countries. Child Neuropsychology: A Journal on Normal and AbnormalDevelopment in Childhood and Adolescence, 23(7), 761–802.

Sherr, L., Croome, N., Parra Castaneda, K., Bradshaw, K., & Herrero Romero, R. (2014).Developmental challenges in HIV infected children—An updated systematic review.Children and Youth Services Review, 45, 74–89.

Simpson, A., & Riggs, K. J. (2006). Conditions under which children experience inhibitory difficultywith a “button-press” go/no-go task. Journal of Experimental Child Psychology, 94(1), 18–26.

Smith-Donald, R., Raver, C. C., Hayes, T., & Richardson, B. (2007). Preliminary construct andconcurrent validity of the preschool self-regulation assessment (PSRA) for field-basedresearch. Early Childhood Research Quarterly, 22(2), 173–187.

Suchdev, P. S., Boivin, M. J., Forsyth, B. W., Georgieff, M. K., Guerrant, R. L., & Nelson, C. A.,III. (2017). Assessment of neurodevelopment, nutrition, and inflammation from fetal life toadolescence in low-resource settings. Pediatrics, 139(Suppl1), S23–S37.

CHILD NEUROPSYCHOLOGY 19

Page 21: Measuring executive function skills in young …...An executive function composite score was approxi-mately normally distributed, despite higher-than-expected floor and ceiling effects

Tarullo, A. R., Obradović, J., Keehn, B., Rasheed, M. A., Siyal, S., Nelson, C. A., & Yousafzai, A.K. (2017). Gamma power in rural Pakistani children: Links to executive function and verbalability. Developmental Cognitive Neuroscience, 26, 1–8.

UNESCO, UNICEF, Brookings Institution, & the World Bank. (2017). Overview: Measuring earlylearning quality and outcomes (MELQO). Paris: Authors. http://unesdoc.unesco.org/images/0024/002480/248053E.pdf

Uwezo. (2016). Are our children learning (2016)? Uwezo kenya sixth learning assessment report,December 2016. Nairobi: Twaweza East Africa. http://www.twaweza.org/uploads/files/UwezoKenya2015ALAReport-FINAL-EN-web.pdf

vanWyhe, K. S., van deWater, T., Boivin,M. J., Cotton,M. F., &Thomas, K.G. F. (2017). Cross-culturalassessment of HIV-associated cognitive impairment using the kaufman assessment battery forchildren: A systematic review. Journal of the International AIDS Society, 20(1), 1–11.

von Bastian, C. C., Souza, A. S., & Gade, M. (2016). No evidence for bilingual cognitive advantages:A test of four hypotheses. Journal of Experimental Psychology. General, 145(2), 246–258.

von Suchodoletz, A., Uka, F., & Larsen, R. A. A. A. (2015). Self-regulation across different contexts:Findings in young albanian children. Early Education and Development, 26(5–6), 829–846.

Wiebe, S. A., Sheffield, T. D., & Andrews Espy, K. (2012). Separating the fish from the sharks: Alongitudinal study of preschool response inhibition. Child Development, 83(4), 1245–1261.

Willoughby, M., Holochwost, S. J., Blanton, Z. E., & Blair, C. B. (2014). Executive functions:Formative versus reflective measurement. Measurement: Interdisciplinary Research andPerspectives, 12(3), 69–95.

Willoughby, M. T., & Blair, C. B. (2016). Longitudinal measurement of executive function inpreschoolers. In J. Griffin, L. Freund, & P. McCardle (Eds.), Executive function in preschool agechildren: Integrating measurement, neurodevelopment and translational research (pp. 91–113).Washington, DC: American Psychological Association Press. doi:10.1037/14797-005

Willoughby, M. T., Blair, C. B., & The Family Life Project Key Investigators. (2016). Measuringexecutive function in early childhood: A case for formative measurement. PsychologicalAssessment, 28(3), 319–330.

Willoughby,M. T., Blair, C. B.,Wirth, R. J., Greenberg,M., &The Family Life Project Key Investigators.(2010). Themeasurement of executive function at age 3 years: Psychometric properties and criterionvalidity of a new battery of tasks. Psychological Assessment, 22(2), 306–317.

Willoughby, M. T., Blair, C. B., Wirth, R. J., Greenberg, M., & The Family Life Project KeyInvestigators. (2012). The measurement of executive function at age 5: Psychometric proper-ties and relationship to academic achievement. Psychological Assessment, 24(1), 226–239.

Willoughby, M. T., Kuhn, L. J., Blair, C. B., Samek, A., & List, J. A. (2017). The test-retestreliability of the latent construct of executive function depends on whether tasks are repre-sented as formative or reflective indicators. Child Neuropsychology: A Journal on Normal andAbnormal Development in Childhood and Adolescence, 23(7), 822–837.

Willoughby, M. T., Pek, J., & Blair, C. B. (2013). Measuring executive function in early child-hood: A focus on maximal reliability and the derivation of short forms. PsychologicalAssessment, 25(2), 664–670.

Willoughby, M. T., Wirth, R. J., & Blair, C. B. (2011). Contributions of modern measurementtheory to measuring executive function in early childhood: An empirical demonstration.Journal of Experimental Child Psychology, 108(3), 414–435.

Willoughby, M. T., Wirth, R. J., Blair, C. B., & The Family Life Project Key Investigators. (2012).Executive function in early childhood: Longitudinal measurement invariance and develop-mental change. Psychological Assessment, 24(2), 418–431.

Wolf, S., Halpin, P., Yoshikawa, H., Dowd, A. J., Pisani, L., & Borisova, I. (2017). Measuringschool readiness globally: Assessing the construct validity and measurement invariance of theinternational development and early learning assessment (IDELA) in Ethiopia. EarlyChildhood Research Quarterly, 41, 21–36.

Zuilkowski, S., Piper, B., Ong’ele, S., & Kiminza, O. (In press). Parents, quality and school choice:How parents choose between public and low-cost private schools in Kenya’s Free PrimaryEducation era. Oxford Review of Education.

20 M. T. WILLOUGHBY ET AL.