
Assessing Reading Comprehension in Bilinguals
Author(s): Diane August, David J. Francis, Han-Ya Annie Hsu and Catherine E. Snow
Source: The Elementary School Journal, Vol. 107, No. 2 (November 2006), pp. 221-238
Published by: The University of Chicago Press
Stable URL: http://www.jstor.org/stable/10.1086/510656

Accessed: 25/01/2014 23:36


The University of Chicago Press is collaborating with JSTOR to digitize, preserve and extend access to The Elementary School Journal.


This content downloaded from 148.206.53.9 on Sat, 25 Jan 2014 23:36:04 PM. All use subject to JSTOR Terms and Conditions.


The Elementary School Journal, Volume 107, Number 2. © 2006 by The University of Chicago. All rights reserved. 0013-5984/2006/10702-0006$05.00

Assessing Reading Comprehension in Bilinguals

Diane August
Center for Applied Linguistics, Washington, DC

David J. Francis
University of Houston

Han-Ya Annie Hsu
Harvard University

Catherine E. Snow
Harvard University

Abstract

A new measure of reading comprehension, the Diagnostic Assessment of Reading Comprehension (DARC), designed to reflect central comprehension processes while minimizing decoding and language demands, was pilot tested. We conducted three pilot studies to assess the DARC's feasibility, reliability, comparability across Spanish and English, developmental sensitivity, and relation to standardized measures. The first study, carried out with 16 second- through sixth-grade English language learners, showed that the DARC items were at the appropriate reading level. The second pilot study, with 28 native Spanish-speaking fourth graders who had scored poorly on the Woodcock-Johnson Language Proficiency Reading Passages subtest, revealed a range of scores on the DARC, that yes-no answers were valid indicators of respondents' thinking, and that the Spanish and English versions of the DARC were comparable. The third study, carried out with 521 Spanish-speaking students in kindergarten through grade 3, confirmed that different comprehension processes assessed by the DARC (text memory, text inferencing, background knowledge, and knowledge integration) could be measured independently, and that DARC scores were less strongly related to word reading than Woodcock-Johnson comprehension scores. By minimizing the need for high levels of English oral proficiency or decoding ability, the DARC has the potential to reflect the central comprehension processes of second-language readers of English more effectively than other measures.

The purpose of this article is to consider the challenges of assessing comprehension in second-language (L2) readers and to report on three studies conducted to develop and validate a new measure of reading comprehension called the Diagnostic Assessment of Reading Comprehension (DARC). The DARC, based on the assessment first devised by Potts and Peterson (1985) and extended by Hannon and Daneman (2001), was designed to assess students' performance on four central comprehension processes: remembering newly read text, making inferences licensed by the text, accessing relevant background knowledge, and making inferences that require integrating background knowledge with the text. By minimizing the need for high levels of English oral proficiency or decoding ability, the DARC has the potential to reflect the comprehension skills of L2 readers of English.

Multiple Determinants of Success in Reading Comprehension

Successful reading comprehension reflects the presence of many component capabilities. Comprehension relies on decoding skills (reading words accurately and fluently, accessing lexical representations), knowledge in several domains (vocabulary, linguistic structure, and discourse as well as world knowledge), and cognitive processing capacities (memory for text, accessing relevant background knowledge, drawing justified inferences). Because successful comprehension requires inputs from all these domains of knowledge and processing, it can be disrupted by a failure in any of them, even if the reader is competent in the other ones. Comprehension is like a chemical reaction, which can be constrained by too little of any one of the elements necessary in the reaction, even if the others are present in abundant quantities. This limiting-element conceptualization of comprehension helps explain why comprehension is so vulnerable; breakdown of comprehension can be caused by failures of word-reading automaticity, of familiarity with key vocabulary words in the text, of background knowledge presupposed by the text, of knowledge of discourse features used in the text, of interest in the topic, of inferencing, or of formulating or recognizing a purpose for reading the text. A limitation in a single domain may generate poor comprehension of a particular text even among readers with generally strong comprehension skills. Disruption of comprehension by a single limitation in the face of generally good comprehension skills is, unfortunately, invisible in standardized comprehension assessments, which can produce low scores for readers who would score high if one or two characteristics of the text or the situation were changed.

Even more important, there is little basis for deciding the relative importance of these factors in determining poor comprehension outcomes for individual children or groups of children. Knowing how each factor contributes to comprehension could help in designing optimally differentiated comprehension instruction. In other words, if a group of children comprehends poorly because of failure to draw appropriate inferences, then attention to strategies for constructing inferences makes instructional sense. However, if children show normal ability to form inferences but lack relevant vocabulary knowledge, then focusing instruction on forming inferences is a waste of time. The ultimate purpose of the DARC is to provide teachers with a better basis for adapting instruction to individual students' needs by helping to identify subgroups of struggling comprehenders.

Reasons Children Fail at Comprehension

Many children score poorly on comprehension assessments because their word reading is inaccurate (e.g., Adams, 1990; Gough & Tunmer, 1986; Vellutino, 1979, 1987); children growing up in poverty often fall into this group (National Research Council, 1998). Some children who read accurately, though, fail at comprehension because of inadequate reading fluency (e.g., Perfetti, 1985); children with little access to print or those experiencing instruction that does not emphasize regular reading may be overrepresented in this group (e.g., Stanovich, 1991). Children with little interest in reading also are likely to show poor comprehension (e.g., Guthrie, Wigfield, & VonSecker, 2000; Sweet, Guthrie, & Ng, 1998), perhaps in part because of their restricted opportunities for practice. Still other children with accurate and fluent word-reading skills fail at comprehension because of poor vocabulary and/or limited background knowledge (e.g., Bradley & Bryant, 1983; Hulme, Muter, Snowling, & Stevenson, 2004); children growing up in poverty (Hart & Risley, 1995) and second-language speakers (National Research Council, 1997) show heightened risk of falling into this group. Evidence that explicit strategy instruction improves comprehension (National Reading Panel, 2000) suggests that comprehension is difficult for children who lack techniques for self-monitoring or self-correction. The RAND Reading Study Group (2002) suggested that such failures might be related to the absence of conscious or self-initiated purposes for reading.

The particular challenges of reading comprehension for children from low-income families and for English-language learners (ELLs) deserve mention. Such children typically have smaller vocabularies, less background knowledge relevant to the texts they encounter, and less familiarity with mainstream discourse patterns than high-socioeconomic-status or English-only readers, though there is no a priori reason to assume they are more likely to have difficulties with forming inferences. ELLs may find text memory, an important correlate of scholastic achievement (Gathercole & Pickering, 2000), a particular challenge in English. Children for whom initial reading instruction was a lengthy and/or frustrating process are also likely to develop reduced motivation to read and limited interest in school-assigned reading materials (National Research Council, 1998). Assessing the key limiting factors for different children is crucial to designing effective, targeted instruction for all of them.

The Need for Better Comprehension Assessments

Sorting out optimal instruction for every learner requires having information about which aspects of reading are causing any child's comprehension breakdown. Current comprehension assessments provide limited help with this task. First, these assessments are atheoretical in design. Standardized comprehension assessments are generally "portmanteau measures": a single score reflects a large domain. Thus, they do not reflect the many factors that influence outcomes.

Second, existing comprehension assessments identify poor readers but do not isolate the determinants of poor performance. Readers with poor skills across the board cannot be distinguished from readers whose comprehension outcomes are limited only by background knowledge, fluency, or another specific factor. Thus, teachers have little guidance from test results concerning which skills they should focus on with a given child.

Furthermore, the most helpful assessments provide information about children's strengths as well as their weaknesses. For example, readers might be very good at memory for new information presented in text and at drawing inferences, strengths that teachers could build on, but be unable to display those capabilities if too many words in the text are unfamiliar. Current assessments are particularly unhelpful in providing information about the comprehension processing of readers with poor vocabularies, for example, ELLs who may bring strong inferencing skills and good strategy use to reading but who are still in the early stages of English vocabulary learning.

A Model for More Diagnostic Comprehension Assessments

Measures of central comprehension processes minimally influenced by other factors have been previously developed, for use with both ELL and English monolingual populations, by Potts and Peterson (1985) and Hannon and Daneman (2001). Potts and Peterson (1985) developed a test that isolated four processes that occur during successful reading comprehension (Dixon, LeFevre, & Twilley, 1988; Engle, Nations, & Cantor, 1990; Haenggi & Perfetti, 1994; Palmer, MacLeod, Hunt, & Davidson, 1985): (a) recalling from memory new information presented in the text, which we call text memory; (b) making novel inferences based on information provided in the text, text inferencing; (c) accessing relevant prior knowledge from long-term memory, knowledge access; and (d) integrating accessed prior knowledge with new text information, knowledge integration. Scores on the Potts and Peterson (1985) test predicted performance on a general measure of reading comprehension, and scores reflecting the four components related to other, independent tests of those components. In their assessment, reading passages consisted of three sentences that described relations among a set of real and artificial terms (e.g., a jal is larger than a toc, a toc is larger than a pony, and a beaver is larger than a caz). Using the information in the text and world knowledge, participants could construct a five-item linear ordering (jal > toc > pony > beaver > caz). Participants read and studied the paragraph and then responded to true-false statements of four types. Text memory statements (e.g., a jal is larger than a toc) tested information explicitly mentioned in the paragraph. Text inferencing statements (e.g., a jal is larger than a pony) required integrating information across propositions in the text (i.e., a jal is larger than a toc; a toc is larger than a pony); no prior knowledge was required. Knowledge access statements (e.g., a pony is larger than a beaver) could be answered by accessing prior knowledge; no information from the text was required. Knowledge integration statements (e.g., a toc is larger than a beaver) required integrating prior knowledge (ponies are larger than beavers) with a text-based fact (i.e., a toc is larger than a pony).
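The four statement types map onto two small fact sets. The sketch below is our own illustrative reconstruction in Python, not materials from the study (all names are ours); it shows how each statement type is answerable from text facts, prior knowledge, or their union via transitive closure:

```python
# Illustrative reconstruction of the Potts and Peterson item logic.
# Text facts: a jal is larger than a toc, a toc is larger than a pony,
# a beaver is larger than a caz. Prior knowledge: a pony is larger than
# a beaver. Together they yield the ordering jal > toc > pony > beaver > caz.

TEXT_FACTS = [("jal", "toc"), ("toc", "pony"), ("beaver", "caz")]
PRIOR_KNOWLEDGE = [("pony", "beaver")]  # world knowledge, not in the text

def larger_than(facts, a, b):
    """True if 'a is larger than b' follows from the facts by transitivity."""
    frontier = {a}
    while frontier:
        x = frontier.pop()
        for big, small in facts:
            if big == x:
                if small == b:
                    return True
                frontier.add(small)
    return False

assert ("jal", "toc") in TEXT_FACTS                    # text memory: stated directly
assert larger_than(TEXT_FACTS, "jal", "pony")          # text inferencing: text only
assert larger_than(PRIOR_KNOWLEDGE, "pony", "beaver")  # knowledge access: prior only
assert larger_than(TEXT_FACTS + PRIOR_KNOWLEDGE,
                   "toc", "beaver")                    # knowledge integration: both
```

Only the knowledge integration statement needs both fact sets: neither the text alone nor prior knowledge alone links toc to beaver.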

Potts and Peterson found that text memory and text inferencing were highly correlated with each other, but neither was correlated with knowledge access. Knowledge integration was correlated with the two text-based components as well as with the pure prior knowledge component. These correlations suggest that the ability to remember new information and the tendency to use world knowledge are separable. Hannon and Daneman (2001) developed a version of the test with more complex texts, for use with university students. The Hannon and Daneman test also proved to be a valid measure of the four components of reading comprehension processing (Dixon et al., 1988; Engle et al., 1990; Haenggi & Perfetti, 1994; Palmer et al., 1985), as shown by correlations of the four scores with performance on comprehension tests designed to assess components of reading comprehension. Furthermore, the Hannon and Daneman test was brief and easy to administer and accounted for a substantial proportion of the variance in performance on a global, standardized test of reading comprehension (the Nelson-Denny test). Scores on knowledge integration were the best predictor of Nelson-Denny scores. The text and test designs Potts and Peterson and Hannon and Daneman used formed the basis for the pilot work reported here.

Purpose of the Study

Building on previous work, we developed the Diagnostic Assessment of Reading Comprehension (DARC) for use with ELLs in the elementary grades. Because of our focus on ELL readers, we needed texts that were even simpler than those Potts and Peterson (1985) used. The DARC uses simple and highly decodable words and severely restricts the need for background knowledge in texts that nevertheless require sophisticated inferencing and knowledge integration.

In this article we present data from three pilot studies designed to refine and validate the measure. The purpose of the first pilot study, carried out in Washington, DC, was to determine that the DARC items were at the appropriate reading level for elementary students of limited English proficiency. The purpose of the second study, carried out in Chicago and Boston, was to provide an initial validation of the DARC by determining whether Spanish-speaking ELLs who had scored very poorly on a standardized comprehension assessment would show a range of scores on the DARC, as well as to assess the validity of participants' yes-no responses and to compare performance on Spanish and English versions of the DARC. The third pilot study, carried out in Texas, was conducted to estimate developmental sensitivity, reliability, and validity of the DARC subscales using a larger sample.

Method

Participants

The first sample consisted of 16 English-language learners. Two to four students from each of grades 2 through 6 were selected by teachers to represent students with differing levels of English proficiency. All children in this sample had some proficiency in English but were nevertheless English-language learners. All but one of the children spoke Spanish as a first language. The children were participating in a dual-immersion bilingual program in Washington, DC. In the second sample, subjects were 28 native Spanish speakers who were currently in all-English instruction in fourth grade, though all of them had received initial literacy instruction in Spanish, in schools in Chicago (n = 15) and Boston (n = 13). All were part of a larger longitudinal study of transfer of literacy skills from Spanish to English. The students selected for testing with the DARC had scored in the lowest third of the 168 students in the larger Boston and Chicago sample on the Woodcock Language Proficiency Battery reading comprehension subtest. The third sample consisted of 521 Spanish-speaking students in kindergarten through grade 3 living in Houston and Brownsville, Texas.

Measures

We started with the Potts and Peterson materials, which included only one relational feature in their texts. We made additional adjustments for younger children and ELLs: (a) with the Lexile Framework as a guide (http://www.lexile.com), writing texts at the second-grade reading level to ensure that most children could read them, (b) using vocabulary that young children were likely to know, (c) employing very simple syntactic structures, and (d) embedding the relational propositions (A culp is faster than a cat) in a more narrative text (e.g., Mary has four pets) to provide a familiar genre and more context. Thus, our text minimized the effect of differences across children in decoding skills, vocabulary, and linguistic sophistication—abilities that may mask skills at the heart of reading comprehension (memory for text and making text-based and integrative inferences). We thought this task might be especially useful in revealing inferencing skills of ELL children with normal comprehension processes but limited English proficiency.

The resulting assessment consisted of a brief passage of narrative text that described relations among five entities, where three of the entities were unknown to all readers because they were represented by nonce (nonsense) words. In contrast, two of the entities referred to were likely to be known to all children. In addition, the narrative compared or contrasted the entities along a dimension that was likely to be familiar to all children, and the known entities differed strikingly on that dimension. The narrative text and associated questions were divided into three sections presented successively, in an effort to minimize the memory demands of the task for young children. Each new section consisted of roughly two propositions and the associated questions. After reading each section, students were asked a series of yes-no questions about that section. Students could reread the previously presented material followed by the new section before answering the questions associated with the second and third sections. Successive sections also included questions that related propositions across previously presented sections of the text. As in the Potts and Peterson task, the questions were framed to assess students' abilities in each of the four components of reading comprehension described above. We developed two passages: "Nan's Pets" and "Tom and Ren." Both passages were developed first in English and then translated into Spanish using back translation. In study 1 either "Nan's Pets" or "Tom and Ren" was administered in English only. In study 2 "Nan's Pets" was administered in English and "Tom and Ren" was administered in Spanish. In study 3 children received both an English and a Spanish passage, with assignment of passage to language determined randomly but with the provision that no child received the same passage in both English and Spanish. A copy of the English version of "Nan's Pets" can be found in the appendix.
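The study-3 counterbalancing rule can be sketched as follows. The function and its names are our own illustration; only the constraint (random assignment, but never the same passage in both languages) comes from the text:

```python
import random

PASSAGES = ["Nan's Pets", "Tom and Ren"]

def assign_passages(rng):
    """Randomly pick which passage a child reads in English; the other
    passage is then read in Spanish, so no child ever sees the same
    passage in both languages."""
    english = rng.choice(PASSAGES)
    spanish = next(p for p in PASSAGES if p != english)
    return {"English": english, "Spanish": spanish}

# Every assignment satisfies the constraint:
rng = random.Random(0)
for _ in range(10):
    a = assign_passages(rng)
    assert a["English"] != a["Spanish"]
```

With only two passages, randomizing the English passage fully determines the Spanish one, which is the simplest way to honor the no-repeat provision.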

Procedures

Prior to administering the assessment, we gave children a practice story (see App.). This story served three purposes. First, students read the story aloud and were rated for accuracy of word reading; children who scored below 85% on word accuracy were not administered the assessment. Second, it provided all children an opportunity to practice answering the kinds of questions they were to be asked during the assessment and provided a chance for the examiner to explain why students' answers were correct or incorrect. Third, because the text included at least some entities that had been named with nonsense words, the practice text also prepared students to read texts where some words would be unknown to them.

The participants in the first and second pilot studies were administered the assessment individually in a clinical interview mode; children were asked to justify their answers, and the tester probed to be sure she understood the children's reasoning for each answer. The responses were tape-recorded, transcribed, and analyzed qualitatively. The subjects in the second sample were given slightly revised versions of the assessment. Because we had available the participants' justifications for their answers, we could analyze their responses in two ways: simple responses (i.e., one point for correct, and no points for incorrect, answers, totaled for each of the components), and justified responses. We calculated justified responses by assigning 0 for no response or an incoherent or incorrect justification, 1 for a plausible or possibly correct but incomplete explanation, and 2 for a full and correct justification. The simple response score was then multiplied by the justification code to generate a justified score. Subjects in the third study, in addition to being tested on the DARC, were assessed on word reading, word reading efficiency, comprehension, and language proficiency using the Woodcock Language Proficiency Battery (WLPB; Woodcock, 1991).
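The two scoring schemes described above reduce to a simple rule: justified score = simple score x justification code. A minimal sketch (the function names are ours):

```python
def simple_score(correct: bool) -> int:
    """1 point for a correct yes/no answer, 0 otherwise."""
    return 1 if correct else 0

def justified_score(correct: bool, justification_code: int) -> int:
    """justification_code: 0 = no response or an incoherent/incorrect
    justification, 1 = plausible but incomplete, 2 = full and correct.
    The simple score is multiplied by the justification code."""
    if justification_code not in (0, 1, 2):
        raise ValueError("justification code must be 0, 1, or 2")
    return simple_score(correct) * justification_code

# A correct answer backed by a full justification earns 2; a correct answer
# with no coherent justification earns 0, as does any incorrect answer.
assert justified_score(True, 2) == 2
assert justified_score(True, 0) == 0
assert justified_score(False, 2) == 0
```

The multiplication guarantees that a lucky guess (correct answer, incoherent justification) scores the same as an incorrect answer, which is what makes the justified score a check on random yes/no responding.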

Results

The three pilot studies indicated that it is possible to use the DARC with students as young as kindergarten and to develop texts that are easily decoded yet place demands on students' abilities to form inferences and integrate information across propositions and with background knowledge; they also indicated that the yes/no scores accurately reflect students' elaborated responses and that the test can be used to differentiate inferencing from text memory and background knowledge.

Pilot Testing Items

Despite the cognitive demands of the assessment, children at all grade levels (in the Washington, DC, sample) were able to complete the assessment (see Table 1). Furthermore, students who scored quite differently on the Stanford-9 scored similarly on the DARC (e.g., see the second graders), and students who scored identically on the Stanford-9 got very different results on the DARC (see the sixth graders). These preliminary results confirmed that the DARC was tapping comprehension capacities somewhat different from those measured on the standardized reading comprehension assessment.

Table 1. DARC Total and Component Scores and Stanford-9 Reading Scores for Children in Pilot Study 1 (N = 16)

[Columns: Grade/ID; DARC score (Total, Text Memory, Text Inferencing, Background Knowledge, Integration); Stanford-9 Spring 2001 (NCE) Total Reading and Reading Comprehension. Maximum DARC scores: Story 1: Total 30, Text Memory 6, Text Inferencing 5, Background Knowledge 7, Integration 12; Story 2: Total 30, Text Memory 11, Text Inferencing 3, Background Knowledge 7, Integration 9. The individual student rows are not legible in the scanned source.]

Note.—NCE = normal curve equivalent.

Examination of responses to the clinical interview suggested that some sentence constructions used in the statements to be judged true or false confused the participants. In subsequent versions of the test, we modified these constructions. Specifically, the items to test knowledge integration were simplified by eliminating the leading clause "Like XXX." For example, in "Nan's Pets," the item "Like crabs, snerps have shells to protect them" was changed to read "Snerps have shells to protect them" (see App.). The objective of the item was to have students integrate background knowledge (turtles have shells) with text knowledge (snerps are like turtles) and arrive at the correct answer: snerps are like turtles and turtles have shells, so snerps have shells too (six items from "Nan's Pets"). In addition, one background knowledge item (#18) was replaced with a knowledge integration item, and one knowledge access item (#19) was changed to ensure that we tested background knowledge essential to text inferencing; specifically, we deleted "Culps have fur to keep them warm" and replaced it with "A turtle is faster than a dog." With regard to scoring, we changed one item from text memory to text inferencing ("Nan has a dog") because children had to infer that this was incorrect (the story indicated that Nan had a cat). We made similar revisions to the story entitled "Tom and Ren."

Using the DARC with Poor Overall Comprehenders

In the second pilot study (in Chicago and Boston) we tested participants who had displayed very poor performance on the reading comprehension subtest of the Woodcock Language Proficiency Battery (WLPB) administered in English. Thus, we were primarily interested in knowing whether these children in general, or some subset of them, performed well on the DARC. Good performance on the DARC would suggest that they had adequate text memory, background knowledge, and inferencing abilities but were unable to use those abilities when challenged by text that was difficult to decode, grammatically complex, or filled with unfamiliar vocabulary items. Furthermore, we tested these children on the Spanish version of the DARC to collect additional evidence about whether their basic comprehension skills were intact. Finally, we were interested in examining the relative difficulty of the four subscales and exploring whether children's yes/no responses were produced by random guessing or reflected their reasoned analysis and/or memory of the text.

Figure 1 presents the distribution of participants' scores on the DARC plotted against their scores on the WLPB passage comprehension measure; it is clear that some children performed well on the DARC in English despite having low scores on the standardized comprehension measure. These results are not simply due to the DARC being easier than the WLPB passage comprehension, because some students who performed relatively well on the WLPB scored rather poorly (i.e., near chance levels) on the DARC. These relations merit reexamination in the larger sample.

Second, what was the overall performance of these bilingual children on the DARC, and was it better in English or Spanish? For the 25 children who could be tested in both languages (three refused to take the test in Spanish), there was no significant difference between the two languages. Students scored 20.16 on average in English (out of a maximum of 30), and 19.40 in Spanish (see Fig. 2). We are, of course, assuming in presenting this comparison that the items are of equal difficulty across the two languages; although we designed the test with formally equivalent items in English and Spanish, we have not, with this small sample, carried out the psychometric analysis needed to demonstrate that the two versions are fully equivalent. Furthermore, although all these participants were native Spanish speakers who had received initial literacy instruction in Spanish, their Spanish proficiency at the time of this testing was not independently assessed and may well have influenced these relations.

Fig. 1.—Relation between DARC total score and WLPB passage comprehension (English standard score) for children selected because of low performance on the WLPB (n = 28).

Fig. 2.—Correlation between Spanish and English DARC scores for the Boston/Chicago sample (n = 28).

Table 2. Mean Number and Percentage Correct for DARC Subscales, in English and Spanish, Pilot Study 2

                        English ("Nan's Pets")           Spanish ("Tom and Ren")
Subscale                No. of Items  Mean   SD    %     No. of Items  Mean   SD    %
Text memory                  6        5.08   .76   85         11       7.88  1.54   72
Text inferencing             5        2.28  1.46   46          3       1.56   .87   52
Background knowledge         6        4.76  1.16   79          5       4.24   .88   85
Knowledge integration       13        8.04  1.65   62         11       5.80  2.16   53

Six of the 25 bilingual students achieved exactly the same score in English and Spanish, 11 scored higher in English, and eight scored higher in Spanish. All three subgroups showed a wide range of scores; in other words, there were relatively poor readers and relatively good readers in all three groups. The correlation between Spanish and English total scores was .39 (p < .05), significant but in the low-moderate range. That, on average, children scored at comparable levels in Spanish and English constitutes preliminary evidence that the DARC decreased the English-language demands of the English assessment while maintaining the cognitive challenge.

The DARC was intended to provide scores on four components of comprehension: text memory, text inferencing, background knowledge, and knowledge integration. It stands to reason that knowledge integration and text inferencing would be more difficult for young children than text memory and background knowledge. For the most part, that is the pattern of difficulty observed in the participants' performance. In English, students performed best, in order of percentage correct, on text memory, then knowledge access, knowledge integration, and text inferencing. In Spanish, knowledge access was easiest, then text memory, knowledge integration, and text inferencing (see Table 2). The finding that knowledge integration was only slightly more difficult than background knowledge in both English and Spanish may indicate that knowledge integration items relied too heavily on known attributes of the entities in the stories. This relation merits further examination in larger samples and possibly continued work on the development of knowledge integration items. It is also worthy of mention that performance on the text inferencing subscale was not different from chance in either English or Spanish; there were also very few items on this subscale, a feature that clearly needs to be improved in future versions of the DARC.

Importantly, the four components of the DARC were relatively independent of one another, as would be predicted by the theory underlying the construction of the test. Scores were not completely unrelated, but bivariate correlations between subscales (see Table 3) as well as correlations between each of the individual subscales and the other three subscales (presented on the diagonal in Table 3) ranged from small to moderate; only the correlation between text inferencing and background knowledge was significant at p < .05. Based on this preliminary sample of cases, we concluded that the four subscales provided relatively nonoverlapping information regarding reading comprehension, as predicted. Of course, the relation among subscales needs to be reexamined in a larger sample unselected for performance on an external comprehension measure, but the results are at least promising that the different subscales assess different information about comprehension. This conclusion is further supported by the intersubscale correlations presented in Table 3.


Table 3. Correlations among DARC Subscales and Corrected Subscale-Total Correlations (on Diagonal), Pilot Study 2

Subscale                Text Memory  Text Inferencing  Background Knowledge  Knowledge Integration
Text memory                 .17
Text inferencing            .30          .35†
Background knowledge       -.04          .37*               .28
Knowledge integration       .06          .09                .12                   .13

Note.—Diagonal entries are the correlation between each component score and the total for the remaining components (i.e., the total excluding items from the component being correlated).
†p < .10. *p < .05.

An important component of the test development process was the evaluation of the basis for students' responses to the true-false questions. Of course, students can answer questions correctly simply by guessing, because the questions required only yes or no in response. Consequently, we wanted to ascertain the extent to which students had correctly reasoned about the relations among entities in the test, and the extent to which students' dichotomized responses reflected their underlying thinking about the text and the relation among entities. To address this question, we examined students' responses to the follow-up questions and scored these as correct or incorrect. We refer to these scores as justified scores because students had to justify their answer after giving it. These justified scores were then compared to students' original answers, which we refer to below as simple scores.

In general, results supported the utility of the simple scoring procedure. Specifically, in English and in Spanish, the correlation between the simple and the justified scores was .91, which was statistically significant at the p < .01 level. Moreover, correlations between WLPB passage comprehension scores and scores for the four subscales showed that relations with the simple and the justified scores were not appreciably different from one another (see Table 4). Thus, students' simple responses reflected their thinking about the passages and about the true-false items.

Large-Sample Validation Study

In this large-scale Texas validation study of the DARC with Spanish-English bilingual students in kindergarten through grade 3, 521 participants (K = 12; grade 1 = 130; grade 2 = 180; grade 3 = 198) took the DARC in English and Spanish, along with the passage comprehension and other subtests from the Woodcock Language Proficiency Battery (WLPB) in both English and Spanish. This sample included only students who were able to read the passages on the DARC in English using one of the two forms; a few students who took the English form were unable to read the Spanish version. There was an orderly pattern of increase with grade in number of items correct (see Table 5) on both stories used, in both Spanish and English. At the same time, there was no evidence of a ceiling effect even among third graders.

Internal consistency reliability (Cronbach's alpha) for the subscales ranged between .41 and .54 in English and from .21 to .50 in Spanish for each subscale and form, and from .64 to .75 for the total score (see Table 6). It is worth pointing out that reliabilities for each scale were satisfactory (.4 or above) given the current use of only a single text to elicit student responses. However, the same cannot be said for each scale-form-language combination. The most problematic of these combinations were the two inferencing scales on Form 2 in Spanish. The discrepant internal consistencies within Spanish across the two forms and between languages on Form 2 indicated that more work is needed to build forms that are truly comparable, that is, parallel or interchangeable across and within languages. Because individual scale reliabilities were not high for either form or language, we also tried combining the inferencing scales (text inferencing and knowledge integration) into a single scale. The combined inferencing scale yielded alphas of .64 and .62 for Form 1 in English and Spanish, respectively, and .59 and .44 for Form 2 with item 10 deleted in English and Spanish, respectively. Item statistics indicated that item 10 on Form 2 was problematic as an inferencing item. Thus, although scale reliabilities were adequate for Form 1 in both languages and for Form 2 as well in English, the Spanish adaptation of Form 2 requires additional work.

Table 4. Spearman Rank Correlations of DARC with WLPB English Passage Comprehension, Pilot Study 2

DARC Subscale          Justified Score   Simple Score
Total score                  .09              .08
Text memory                 -.12             -.08
Text inferencing            -.06             -.02
Background knowledge         .11              .34a
Knowledge integration       -.06             -.15

a Significant at the p < .10 level. All other correlations are not significantly different from 0.

Table 5. Mean Total Number Correct Responses (of 30 Possible) for Grades K-3 on Stories 1 and 2 in English and Spanish, Pilot Study 3

                         Story 1                  Story 2
Grade/Language      N     Mean     SD        N     Mean     SD
Kindergarten:
  English           7    22.43    2.94       5    19.40    2.51
  Spanish          16    18.56   18.56      19    17.74    3.05
Grade 1:
  English          64    21.25    4.69      67    20.12    4.30
  Spanish          68    19.85    4.06      68    19.96    3.47
Grade 2:
  English          97    21.95    3.96      83    22.35    3.74
  Spanish          71    21.97    3.92      78    21.08    3.35
Grade 3:
  English         105    23.49    4.15      93    23.12    3.73
  Spanish          88    22.73    3.82      88    22.54    3.84

Table 6. Correlations among Subscales, Corrected Subscale-Total Correlations (on Diagonal), and Disattenuated Correlations (above Diagonal), Pilot Study 3

Language/Subscale           Text Memory  Text Inferencing  Background Knowledge  Knowledge Integration
English (n = 521):
  Text memory                   .60           .39                .67                  .24
  Text inferencing              .17           .63                .81                  .75
  Background knowledge          .32           .39                .74                  .76
  Knowledge integration         .11           .34                .38                  .77
  Cronbach's alpha: Form 1      .41           .45                .54                  .47
                    Form 2      .46           .43                .52                  .47
Spanish (n = 496):
  Text memory                   .54           .86                .89                 1.14
  Text inferencing              .32           .58                .78                 1.04
  Background knowledge          .35           .30                .59                 1.13
  Knowledge integration         .43           .38                .44                  .66
  Cronbach's alpha: Form 1      .21           .43                .36                  .46
                    Form 2      .50           .28                .45                  .25

Note.—Alpha for TI and KI combined: Form 1: English = .64, Spanish = .62; Form 2: English = .52 (.59 with item 10 deleted), Spanish = .34 (.44 with item 10 deleted). Total scale alpha: Form 1: English = .75, Spanish = .67; Form 2: English = .70, Spanish = .64. Diagonal entries are the correlation between each component score and the total for the remaining components (i.e., the total excluding items from the component being correlated). Disattenuated correlations above the diagonal are equal to the observed correlation divided by the square root of the product of the reliabilities. To estimate scale reliabilities, we took the square root of the average of the squared reliability for Form 1 and Form 2 in a given language.
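Because the DARC items are scored dichotomously, Cronbach's alpha is equivalent here to KR-20 and can be computed directly from an item-by-examinee score matrix. The sketch below is purely illustrative; the response matrix is invented for the example and is not DARC data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns (rows = examinees):
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(items)          # number of items
    n = len(items[0])       # number of examinees

    def pvar(xs):           # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(pvar(col) for col in items) / pvar(totals))

# Four true-false items (1 = correct) answered by six hypothetical examinees:
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
print(round(cronbach_alpha(items), 2))  # 0.82
```

Pooling items whose scores covary raises the variance of the total faster than the sum of item variances, which is why combining short scales, as with the text inferencing and knowledge integration items above, can yield a higher alpha than either component alone.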

In addition to scale reliabilities in each language for each form and scale, Table 6 presents correlations for each language, collapsing across forms. For the correlations presented below, scores were standardized for each form and then combined to give estimates of English and Spanish reading ability as measured on the DARC. In addition, Table 6 presents corrected correlations between each scale and the total score on the DARC, excluding the scale being correlated. These correlations appear on the diagonal of Table 6.

The subscale correlations with the total DARC score ranged from .60 to .74 in English, and from .54 to .66 in Spanish. In both English and Spanish, intercorrelations among subscales were considerably lower, ranging from .17 to .39 in English and from .32 to .44 in Spanish. Thus, the pattern of relative independence among the subscales found in the Boston/Chicago pilot study was replicated with this much larger sample. Above the diagonal in Table 6, we provide correlations among the subscales, disattenuated for unreliability (Kenny, 1979). For the English version of the test, these disattenuated correlations show that the inferencing scales were only somewhat related to text memory and were more highly related to one another and to background knowledge. The disattenuated correlations in Spanish showed the scales to be less differentiated. That some of the disattenuated correlations exceeded 1.0 indicated that the reliability of the scales was likely underestimated by Cronbach's alpha and shows that the Spanish adaptation requires additional work.
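The disattenuation described in the note to Table 6 is a one-line computation. The sketch below uses hypothetical numbers, not values from the study; it also shows how a corrected correlation can overshoot 1.0 when the reliability estimates are too low, which is exactly the diagnostic applied to the Spanish scales:

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability of the two scales:
    r_true = r_observed / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical example: an observed subscale correlation of .38 between two
# scales with alphas of .45 and .52 implies a much stronger true-score relation.
print(round(disattenuate(0.38, 0.45, 0.52), 2))  # 0.79

# If the reliabilities are underestimated (say, .30 and .35), the "corrected"
# value exceeds 1.0, signaling that alpha was too low, not that r_true > 1.
print(round(disattenuate(0.38, 0.30, 0.35), 2))  # 1.17
```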

Table 7 provides correlations of the DARC with subtests from the Woodcock Language Proficiency Battery (WLPB). The DARC was less highly correlated with measures of decoding (r = .28 and .22 for WLPB letter-word and word attack, respectively) than the WLPB passage comprehension (r = .65 and .62, respectively). In other words, performance on the WLPB passage comprehension test was much more influenced by decoding skills than was performance on the DARC. The DARC showed much stronger correlations with listening comprehension and oral language from the WLPB (.46 and .53, respectively) than it did with decoding, as one would hope and expect. In contrast, the WLPB passage comprehension correlated .64 and .72 with listening comprehension and oral language, that is, about as highly as it correlated with the two decoding measures. (All correlations were significant at p < .0001.) Thus, despite the simplicity of the passages presented in the DARC, performance was strongly related to oral language processing, as reading comprehension should be, and less affected by word-reading skills than the WLPB. More complete information on correlations of the DARC total and subscale scores with reading and language measures from the WLPB is presented in Table 7. In this table, all correlations are significant at p < .0001 with the exception of the correlation between WLPB letter-word and text memory, which is not statistically significant. Thus, data from this robust validation study indicated that our first efforts at developing the DARC were successful in making test performance more dependent on higher-order processing and less dependent on word-level decoding skills.

The correlations in Table 7 show that, of the four DARC subscale scores, background knowledge correlated most highly with the WLPB passage comprehension score. More important, WLPB passage comprehension appeared more highly related to letter-word identification, a measure of decoding skill (.65), than any of the four DARC subscales (.05 to .29). Finally, all the DARC subscale scores were moderately to strongly correlated only with measures of oral language and listening comprehension, whereas the WLPB passage comprehension correlated with decoding as strongly as with these measures.

Table 7. Pearson Correlations for English Measures of Reading and Language with English Scores from the WLPB Comprehension and the DARC, for the Texas Sample (n = 521)

                                                       DARC
                             WLPB Passage              Text         Text     Background   Knowledge
WLPB Scale                   Comprehension    Total    Inferencing  Memory   Knowledge    Integration
Passage comprehension                          .28       .18          .12       .31          .20
Listening comprehension          .64           .46       .31          .22       .48          .29
Oral language                    .72           .53       .38          .24       .55          .34
Letter-word identification       .65           .28       .24          .05       .29          .21
Word-reading efficiency          .32           .34       .19          .16       .38          .25

Table 8. Pearson Correlations between English and Spanish Reading and Language Measures for Children Who Read in Both Languages on the DARC (n = 366), Pilot Study 3

                                      English                              Spanish
                                   WLPB Passage   WLPB                  WLPB Passage   WLPB
Measure                    DARC    Comprehension  Letter-Word   DARC    Comprehension  Letter-Word
English (WLPB):
  Passage comprehension     .32
  Letter-word               .35       .67
  Oral language             .57       .73            .65         .26       -.07           -.23
Spanish:
  DARC                      .28       .14            .20
  WLPB:
    Passage comprehension  -.14       .28            .20         .17
    Letter-word            -.20       .03            .16         .10        .71
    Oral language           .06       .08            .17         .37        .68             .56

Note.—Correlations larger than .13 in absolute value are significant at p < .01.

In Table 8 we present correlations between English and Spanish for the DARC total scores and measures of reading and language proficiency taken from the WLPB. These correlations are included to show the extent of intra- and interlanguage correlation for the DARC and WLPB. The correlations again show that the DARC was less correlated with word reading than was the WLPB passage comprehension subtest in both English and in Spanish. Interlanguage correlations were different for the DARC and WLPB measures. The DARC in Spanish correlated positively with the DARC in English as well as with English measures of word reading, reading comprehension, and oral language proficiency. In contrast, the DARC in English correlated negatively with reading comprehension and word reading in Spanish and negligibly with oral language proficiency in Spanish. Furthermore, reading comprehension and word reading in Spanish measured by the WLPB correlated negatively with oral language proficiency in English and reading comprehension measured by the DARC but not by the WLPB. Interestingly, the interlanguage correlations for comprehension measured on the DARC and the WLPB were virtually


identical (r = .28, p < .0001) despite their differing patterns of correlations with word reading and oral language.

Although these data are only preliminary, they offer promising indications that it may be possible to build a measure of comprehension that is not heavily influenced by decoding skill but that remains sensitive to the language and thinking skills of ELLs.

Discussion

We have presented findings from three pilot studies that demonstrate the potential value, usability, and discriminative capacity of a new diagnostic assessment of reading comprehension. These preliminary analyses show that the DARC is feasible for use with children as young as kindergartners, that simple yes-no responses reflect children's comprehension processing on the DARC adequately, that different aspects of the comprehension process (text memory, text inferencing, background knowledge, and knowledge integration) can be measured independently, and, most important, that this measure reveals children's comprehension capacities that are obscured by measures with greater decoding, syntax, and vocabulary load. Some children who score poorly on the Stanford-9 or the WLPB passage comprehension measure perform well on the DARC, suggesting that their poor performance on the standardized measure reflects difficulties with some part of the comprehension process (e.g., word decoding, vocabulary) other than comprehension processing per se. And the DARC is much less influenced by differences in word-reading skills than is the standardized passage comprehension subtest of the WLPB.

Thus, the DARC design has strong advantages over more traditionally designed, portmanteau comprehension measures and may be particularly useful for assessing comprehension processing among English-language learners and other groups of children with limited vocabulary. Standard instruction for learners scoring low on comprehension assessments is likely to focus on providing them with strategies for improving comprehension; students who score poorly on general comprehension tests but well on the text inferencing and knowledge integration subscales of the DARC probably do not need such instruction. Instead, their control over the language demands of the texts they are reading and their access to relevant background knowledge are more likely explanations for their comprehension problems and suggest a very different focus for intervening with them.

Findings from the third pilot study, the validation study carried out in Texas, indicate that additional work is necessary to improve the reliability of scores on the text inferencing, knowledge integration, text memory, and background knowledge subscales. Subscale reliabilities in English range from .41 to .54, as compared to the full-test reliabilities of between .70 and .74. By creating additional passages, we can increase the number of items measuring each of these components while reducing dependence among items. Increasing subscale reliabilities is clearly a prerequisite to developing the DARC into a useful diagnostic instrument.
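The expected payoff of adding passages can be projected with the Spearman-Brown prophecy formula. This is a standard psychometric projection rather than an analysis reported in the article; the .41 starting value is simply the low end of the English subscale range reported above:

```python
def spearman_brown(rel, factor):
    """Projected reliability when a scale is lengthened by `factor`
    (factor = 2 means doubling the number of comparable items)."""
    return factor * rel / (1 + (factor - 1) * rel)

print(round(spearman_brown(0.41, 2), 2))  # 0.58 with twice as many items
print(round(spearman_brown(0.41, 3), 2))  # 0.68 with three times as many
```

The projection assumes the new items behave like the existing ones, which is one reason the item-writing work would also need to reduce dependence among items rather than simply multiply them.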

Additional pilot work is also needed to evaluate the DARC with larger groups of fourth and fifth graders, to determine whether restricting the passages to a second-grade reading level creates problems in assessing older children, and to consider the effect of including children with somewhat greater word-reading difficulties in the sample.

The preliminary evidence presented here offers considerable promise that the DARC can be further developed, expanded, and extended in range. Although we have focused in this article on its value for Spanish speakers learning English, the test is also of potential value in helping to pinpoint sources of comprehension difficulties for English-only students who score poorly on more general measures. Precisely because comprehension is such a multidetermined process, knowing how to go about instructing and intervening most efficiently offers a great challenge. Traditional comprehension assessments offer classroom teachers little guidance about the needed instructional focus. The DARC, building on the work of Potts and Peterson (1985) and Hannon and Daneman (2001), offers the potential to improve and individualize comprehension instruction by providing teachers with subscale scores that reflect the specific components of reading comprehension with which students, including English-language learners, might be having difficulties.

The DARC also has great potential as a research instrument because it offers the possibility of varying characteristics of the texts being read. For example, although the passages we used in these pilot studies were designed to be simple in vocabulary load and syntactic structure, it would be possible to develop additional, more difficult items in the areas of vocabulary knowledge, syntactic complexity, or presupposed background knowledge. If a learner who scored well on the central comprehension processes in passages like "Nan's Pets" scored poorly on one of these additional passages, that would suggest alternative sources of comprehension difficulties. Development of such text manipulations is not outside the realm of feasibility but has not yet been attempted. Nevertheless, even without these additions, the preliminary work to date indicates that a more fully developed standard set of forms for the DARC would provide educators and researchers with a tool for assessing young students' performance on components of comprehension that are not easily extracted from scores on existing comprehension tests in use in the elementary grades.

Appendix

Diagnostic Assessment of Reading Comprehension (Pilot Version)

PRACTICE TEXT

Maria likes to eat fruit. Most of all she likes to eat orkers. An orker is like an orange. But an orker is bigger than an orange.

PRACTICE ITEMS

1) Maria likes to eat fruit.
   If correct: That's right. The story tells us Maria likes to eat fruit.
   If incorrect: "How do you know that?" or "What does the story tell us?"

2) Most of all Maria likes to eat orkers.
   If correct: That's right. The story tells us Maria likes to eat orkers.
   If incorrect: Look at the story again. [Re-present the practice story stimulus.] What does the story tell us that Maria likes to eat most of all?

3) An orange has a peel.
   If correct: That's right. The story does not tell you that an orange has a peel, but you know this from your everyday life. An orange has a peel or a skin that you take off before you eat the orange.
   If incorrect: The skin of an orange is called a peel. Even though the story does not say that an orange has a peel, you know that an orange has a peel or skin from your everyday life. So, you would answer this question, "Yes. An orange has a peel."

4) You peel an orker to eat it.
   If correct: That's right. An orker is like an orange and you peel an orange to eat an orange. That's how we know that you peel an orker to eat an orker.
   If incorrect: Listen to the story again. [Repeat story.] What does the story tell us about oranges and orkers? The story tells us that an orker is like an orange. Do you peel an orange? Yes, that's right, you peel an orange to eat it. If an orker is exactly like an orange, do you think you peel an orker to eat it? That's right, you peel an orker to eat it.

STORY TEXT

Nan has four pets. One pet is a cat. Nan's cat is fast. Nan has a pet culp. Nan's pet culp is like her cat. But Nan's pet culp is faster than her cat.


Nan has a pet turtle. Nan's turtle is slow. Nan also has a pet tarf. Nan's pet tarf is like her pet turtle. But Nan's pet tarf is slower than her turtle.

One day Nan got a pet snerp. Now Nan has five pets. Nan's snerp is like her tarf. But Nan's snerp is slower than her tarf. All of Nan's pets like to play. The pets like to play in Nan's backyard.

STORY ITEMS (true-false)

1) Nan has four pets.
2) Nan's cat is slow.
3) Cats have fur to keep them warm.
4) Nan's pet cat is faster than her pet culp.
5) Culps have fur to keep them warm.
6) Nan has a turtle.
7) Nan's tarf is like her culp.
8) Nan's turtle is faster than her tarf.
9) A cat is faster than a tarf.
10) A culp is faster than a turtle.
11) A tarf is faster than a dog.
12) Tarfs have fur to keep them warm.
13) Nan got a pet snerp.
14) Nan has a dog.
15) A turtle is slower than a cat.
16) Cats can live in water.
17) Turtles have fur to keep them warm.
18) A culp is faster than a crab.
19) A turtle is faster than a dog.
20) Turtles cannot live in water.
21) Now Nan has five pets.
22) Nan's snerp is like her culp.
23) Nan's pets like to play in her backyard.
24) A culp is faster than a snerp.
25) The turtle is slower than the snerp.
26) A snerp is faster than a cat.
27) Snerps have shells to protect them.
28) Snerps live in trees.
29) Snerps can live in water.
30) Nan's tarf is slower than her snerp.

Note

This research was supported in part by grants PO1 HD039530, "Acquiring literacy in English: Cross-linguistic, intra-linguistic, and developmental factors," and PO1 HD39521, "Oracy/literacy development of Spanish-speaking children," both jointly funded by the National Institute of Child Health and Human Development of the National Institutes of Health and the Institute of Education Sciences of the U.S. Department of Education. The opinions expressed herein are ours and do not necessarily reflect the opinions of, or endorsement by, the funding agencies.

References

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.

Bradley, L., & Bryant, P. E. (1983). Categorizing sounds and learning to read: A causal connection. Nature, 303, 419–421.

Dixon, P., LeFevre, J., & Twilley, L. C. (1988). Word knowledge and working memory as predictors of reading skill. Journal of Educational Psychology, 80, 465–472.

Engle, R. W., Nations, J. K., & Cantor, J. (1990). Is "working memory capacity" just another name for word knowledge? Journal of Educational Psychology, 82, 799–804.

Gathercole, S. E., & Pickering, S. J. (2000). Working memory deficits in children with low achievement in the national curriculum at seven years of age. British Journal of Educational Psychology, 70, 177–194.

Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7(1), 6–10.

Guthrie, J. T., Wigfield, A., & VonSecker, C. (2000). Effects of integrated instruction on motivation and strategy use in reading. Journal of Educational Psychology, 92(2), 331–341.

Haenggi, D., & Perfetti, C. A. (1994). Processing components of college-level reading comprehension. Discourse Processes, 17, 83–104.

Hannon, B., & Daneman, M. (2001). A new tool for understanding individual differences in the component processes of reading comprehension. Journal of Educational Psychology, 93(1), 103–128.

Hart, B., & Risley, T. (1995). Meaningful differences in the everyday lives of young American children. Baltimore: Brookes.

Hulme, C., Muter, V., Snowling, M., & Stevenson, J. (2004). Phonemes, rimes, vocabulary, and grammatical skills as foundations of early reading development: Evidence from a longitudinal study. Developmental Psychology, 40, 665–681.

Kenny, D. (1979). Correlation and causality. New York: Wiley.

National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Washington, DC: National Institute of Child Health and Human Development.

National Research Council. (1997). Improving schooling for language-minority children. Washington, DC: National Academies Press.

National Research Council. (1998). Preventing reading difficulties in young children. Washington, DC: National Academies Press.

Palmer, J., MacLeod, C. M., Hunt, E., & Davidson, J. E. (1985). Information processing correlates of reading. Journal of Memory and Language, 24, 59–88.

Perfetti, C. A. (1985). Reading ability. New York: Oxford.

Potts, G. R., & Peterson, S. B. (1985). Incorporation versus compartmentalization in memory for discourse. Journal of Memory and Language, 24, 107–118.

RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading comprehension. Washington, DC: RAND Education.

Stanovich, K. E. (1991). Word recognition: Changing perspectives. In R. Barr, M. Kamil, P. Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (Vol. 2, pp. 418–452). New York: Longman.

Sweet, A. P., Guthrie, J. T., & Ng, M. M. (1998). Teacher perceptions and student motivation. Journal of Educational Psychology, 90(2), 210–223.

Vellutino, F. R. (1979). Dyslexia: Theory and research. Cambridge, MA: MIT Press.

Vellutino, F. R. (1987). Dyslexia. Scientific American, March, pp. 34–41.

Woodcock, R. W. (1991). Woodcock Language Proficiency Battery-Revised (English Form). Chicago: Riverside.
