
Research on Student Assessment


Core Assessment 4A Section One: Literacy Assessment and Professional Development Report

For many reading specialists, assessment of student reading abilities is one of the most important aspects of the job. Student assessment measures inform instructional choices and practices, and provide a complex picture of why and how struggling readers are encountering challenges in their journey towards becoming better readers. The informed reading specialist must have a deep understanding of current research on literacy assessment if they are to be an effective agent of assessment. The following is a synthesis of research on factors that contribute to reading success; assessments, their uses, and misuses; purposes for assessing the performance of all readers, including tools for screening, diagnosis, progress monitoring, and measuring outcomes; reliability, content, and construct validity; and state assessment frameworks, proficiency standards, and student benchmarks.

There are a great many studies that investigate factors that contribute to reading success in both the home and school settings. Included here is research on how student temperament can impact academic resiliency when learning reading skills; an analysis of statistically significant correlates of the academic success of students from low socio-economic backgrounds who were placed into a program for gifted children; a general examination of factors that contribute to reading success; and one study that focuses specifically on family contributions to reading success.

McTigue, Washburn, and Liew (2009) have done recent research into the impact that children's temperaments and the development of their social-emotional skills have upon their academic resiliency and literacy learning. Their analysis of current research supports the idea that "time spent developing early socioemotional skills boosts students' future success in literacy" (p. 423). When students of all temperaments, whether extroverted or introverted, aggressive or more hesitant, are coached to develop a sense of self-efficacy (which is teachable and not necessarily inherent or fixed), their chances of developing literacy skills and reading ability increase. Teachers of reading should take note that self-efficacy should be taught hand-in-hand with early literacy, and indeed with literacy at any level.

Bailey's recent study of students coming from low socioeconomic backgrounds in a program for gifted children (children identified for and referred to the Questioning, Understanding, Enriching, Seeking and Thinking (QUEST) program) suggests that one key at-home influence on children's later reading success is the frequency with which they are read to. Upon statistical analysis of questionnaires and interviews with students' parents, the study found that of three variables analyzed—regular parental reading (activities that take place at least 3-4 times per week), preschool exposure, and age at which children receive initial pre-reading or reading instruction—"[I]t was determined that the economically at-risk QUEST students privy to regular parental reading were more likely to experience early reading success [than] QUEST students that were not exposed to the variable" (Bailey, 2006, p. 314). In fact, regular parental reading was the only factor that indicated a statistically significant influence on the reading ability of the at-risk QUEST children. Reading specialists who are working with family literacy programs—especially with parents who may be economically at-risk—should be sure to provide information and resources around the importance of frequent parental reading to children.

Leslie and Allen (1999) found three independent variables that exerted statistically significant influence on reading scores: the amount of time spent reading in classroom settings; the level of parental involvement, whether measured by attendance at literacy events or by the return rate of forms sent home; and the amount of time a student spent reading recreationally. All three were strongly correlated with higher reading achievement in students who participated in the study.

Many schools are finally recognizing the power of families as first teachers, and their significance in the development of students' literacy levels, and are implementing family literacy programs that connect the school and the home. John Holloway (2004) summarizes the research available at the time, concluding that "research indicates that family literacy activities contribute to children's success in school and that family literacy programs can provide opportunities for educational success for parents and children. These programs can also serve as models of family involvement, showing how families can become part of an extended classroom and build on the work of the school" (Holloway, 2004, p. 89).

Understanding factors that contribute to reading success is one of the first steps when constructing a comprehensive literacy program designed to raise the reading levels of all students. Additionally, a program of this nature would be incomplete without a well-selected set of assessments that can provide rich data on individual students so instruction can be tailored to meet their specific needs. The reading specialist should be aware of common uses—and misuses—of such assessments, and the direction in which the field of reading assessment is headed. Included here are several articles discussing the need for a greater balance between process and product assessments in an educational era where high-stakes, summative, standardized testing has been privileged above other, more detailed forms of individualized assessment.

Upon my review of the literature, I would assert that one of the greatest points of active discussion in the field of reading assessment right now is "Balancing the assessment of learning and for learning in support of student literacy achievement." In Edwards, Turner, and Mokhtari's 2008 article of the same name, they explore the frustrations that literacy educators face in light of this imbalance (also described by others as assessment of "product" vs. "process") and suggest a handful of ideas to help instructors strike a balance between the two types of assessment. According to Edwards, Turner, and Mokhtari, multiple assessments, culturally appropriate assessments, engaging students in the assessment process, and engaging school personnel in inquiry and action research would be first steps in moving towards greater balance. The reading specialist would do well to heed the research and recommendations in this area. A paradigm shift is needed, and reading specialists will be some of the prime movers and agents of this change.

Winograd, Paris, and Bridge (1991) echo concerns about balance in literacy assessment. They cite research stating that "traditional assessments are based upon an outdated model of literacy," "traditional assessments prohibit the use of learning strategies," "traditional assessments redefine educational goals," and "traditional assessments are easily misinterpreted and misused." Their suggestions for improving assessment are helpful. They suggest clarifying the goals of instruction as well as the purposes of assessment, selecting multiple measures, and interpreting results in ways that enhance instruction. They subsequently propose a model for improving literacy assessment that includes helping students gain ownership of their learning by monitoring their comprehension and fluency and keeping a list of books read and preferred authors. They also suggest helping teachers make instructional decisions and helping parents understand their children's progress through various measures, including conferences or comments added to plain letter grades. They even suggest helping administrators and community members make larger policy decisions that would affect the selection of tests that provide more detailed feedback on student progress.

While assessment for learning is an admirable goal, it can be difficult to use well, and easy to misuse, in a traditional educational setting. Many teachers and reading specialists with good intentions may not know how to go about the data-driven instruction process. Mokhtari, Rosemary, and Edwards (2007) present a structure for data analysis teams to use to help guide efforts at data-driven instruction. Called "The Data Analysis Framework for Instructional Decision Making," it is a list of guiding questions to help teams new to the data-driven analysis procedure. Efforts to use literacy assessments should always be based on research, the guidance of educated professionals, and measures with proven validity and success. McKenna and Walpole (2005) also suggest a model called "Reading First," which assumes that a comprehensive reading program has already been carefully selected district-wide and that various screening assessments are already in use to catch students at risk in specific areas. The framework of interventions varies with the level of risk the student demonstrates on assessments and provides a structure to help guide teachers through a process they may be unfamiliar with.

It should be clear from the previous section that process-focused assessment for learning is in need of bolstering. The research on product-focused assessments is copious and exhaustive. Standardized, product-based tests have been in use for decades, yet it is common knowledge in the educational field that national reading scores have held steady for decades as well. What about research on process-based testing? What tests do we use to assess student performance that provide us with complex knowledge about multiple facets of a student's reading abilities?

Understanding these uses and misuses is only the beginning of the deeper knowledge of assessment that the reading specialist should cultivate. Additionally, reading specialists should be aware of the different types of reading assessments, their intended audiences, and how to use them to monitor progress and measure outcomes.

Nina Nilsson (2008) provides an analysis of eight different informal reading inventories (IRIs) used to assess students' reading ability. The inventories analyzed were Applegate, Quinn, and Applegate's (2002) The critical reading inventory: Assessing students' reading and thinking (2nd ed.), Bader's (2005) Bader reading and language inventory (5th ed.), Burns and Roe's (2007) Informal reading inventory, Cooter, Flynt, and Cooter's (2007) Comprehensive reading inventory: Measuring reading development in regular and special education classrooms, Johns' (2005) Basic reading inventory (9th ed.), Leslie and Caldwell's (2006) Qualitative reading inventory-4, Silvaroli and Wheelock's (2004) Classroom reading inventory, and Woods and Moe's (2007) Analytical reading inventory. All of the inventories included passages to be read aloud and/or silently by the student being evaluated. Each of the IRIs took a slightly different approach to vocabulary, although all but one included word lists of varying levels to gain insight into the student's word recognition and decoding skills. Emphasis on word recognition for the sake of identification versus word identification for the sake of vocabulary knowledge and comprehension varied. Some IRIs provided supplemental sections for phonemic awareness and phonics, but these were not required portions of the main set of recommended evaluations. Additionally, all but one of the IRIs included some measure of fluency. Nilsson provides a handy summary of recommendations for choosing an IRI:

For reading professionals who work with diverse populations and are looking for a diagnostic tool to assess the five critical components of reading instruction, the CRI-CFC, in Spanish and English (Cooter et al., 2007) for regular and special education students, as well as some sections of the BRLI (Bader, 2005), are attractive options. Most likely, those who work with middle and high school students will find the QRI-4 (Leslie & Caldwell, 2006) and ARA (Woods & Moe, 2007) passages and assessment options appealing. The CRI-2 (Applegate et al., 2008) would be a good fit for reading professionals concerned with thoughtful response and higher-level thinking. In addition, the variety of passages and rubrics in BRI (Johns, 2005) and contrasting format options in CRI-SW (Silvaroli & Wheelock, 2004) would provide flexibility for those who work with diverse classrooms that are skills-based and have more of a literacy emphasis. For literature-based literacy programs, the IRI-BR (Burns & Roe, 2007) with its appendix of leveled literature selections is a valuable resources for matching students with appropriate book selections after students' reading levels are determined. (p. 535)

A reading specialist would be wise to follow up on Nilsson's recommendations when seeking the proper IRI to use in the school district they are working in. Additionally, it is important to have a more in-depth understanding of some of the classical components of each IRI. Miscue analysis is experiencing something of a resurgence, and McKenna and Picard provide a brief reassessment of the technique in their 2006 article, Revisiting the role of miscue analysis in effective teaching. After a brief discussion of the history of miscue analysis, they explore one study that put the validity of miscue analysis—insofar as it measures how and why students make miscues based on context and prior knowledge—into question. In other words, the reasons why students make the errors they make are still not completely clear, but the fact that they make errors should be considered. They suggest that miscue analysis can be useful, but that results should be interpreted with caution. They encourage the use of error totals for determining a student's independent and instructional reading levels, but note that tallies of semantically correct miscues are not supported by research and should be avoided. They write, "teachers should view meaningful miscues (like substituting pony for horse) as evidence of inadequate decoding skills, and not as an end result to be fostered. Because beginning readers will attempt to compensate for weak decoding by reliance on context, teachers should instruct them in how to use the graphophonic, semantic, and syntactic cueing systems to support early reading" (McKenna & Picard, 2006). They conclude that teachers and reading specialists should focus on using miscue analysis to monitor whether a student is relying too heavily on context rather than on decoding to figure out unknown words.
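
To make the error-total approach concrete, here is a minimal sketch in Python of how total miscue counts might feed a reading-level decision. It is an illustration only, not any published inventory's scoring protocol; the 98% and 90% accuracy cutoffs and the function names are assumptions chosen for demonstration, since each IRI specifies its own criteria.

# Illustrative sketch only: classify one passage reading into the
# independent / instructional / frustration levels that many IRIs use.
# The 98% and 90% cutoffs are assumed for demonstration; each published
# inventory specifies its own scoring criteria.

def word_recognition_accuracy(total_words: int, total_miscues: int) -> float:
    """Percent of words read correctly, counting every miscue as an error."""
    return 100.0 * (total_words - total_miscues) / total_words

def reading_level(accuracy: float,
                  independent_cutoff: float = 98.0,
                  instructional_cutoff: float = 90.0) -> str:
    """Map an accuracy percentage onto a coarse reading level."""
    if accuracy >= independent_cutoff:
        return "independent"
    if accuracy >= instructional_cutoff:
        return "instructional"
    return "frustration"

# A student reads a 200-word passage and makes 12 miscues of any kind.
accuracy = word_recognition_accuracy(total_words=200, total_miscues=12)
print(f"accuracy = {accuracy:.1f}%, level = {reading_level(accuracy)}")

Counting every miscue, rather than only the semantically acceptable ones, mirrors the caution McKenna and Picard describe.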

Kuhn, Schwanenflugel, and Meisinger provide a closer look at the assessment of reading fluency in their 2010 article, Aligning theory and assessment of reading fluency: Automaticity, prosody, and definitions of fluency. They explore several theoretical perspectives on reading fluency and finally suggest an updated definition of fluency that synthesizes the body of research presented earlier in the article:

Fluency combines accuracy, automaticity, and oral reading prosody, which, taken together, facilitate the reader's construction of meaning. It is demonstrated during oral reading through ease of word recognition, appropriate pacing, phrasing, and intonation. It is a factor in both oral and silent reading that can limit or support comprehension. (p. 240)

The purpose of their analysis and of the definition they produce seems to be to shed new light on the perception that a "fast" reader is a "good" reader. True reading fluency is a combination of speed and prosody: the speed is an indicator that the reading is occurring at a rate fast enough to be comprehended as whole phrases and ideas, and the prosody is an indication of the reader's comprehension of the interpreted meaning. They note three final implications for assessment. First, they suggest that if a words-per-minute assessment is being used to assess student reading fluency, it should be supplemented with a prosodic measure such as the NAEP oral reading fluency scale (Pinnell et al., 1995) or the multidimensional fluency scoring guide (Rasinski et al., 2009; Zutell & Rasinski, 1991). Second, they suggest that fast decoding not be over-emphasized, and that a comprehension evaluation be administered any time fluency is measured, which could be as simple as a few impromptu questions or a brief discussion about what was just read. Finally, they assert that oral reading fluency is only one measure of student reading ability, and that it should be considered alongside testing that evaluates other aspects of reading ability as well, such as comprehension questions, retellings, or miscue analyses.
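
A brief sketch can show what pairing a rate measure with a prosody rating might look like in practice. The Python below is purely illustrative: the class, field names, and sample numbers are hypothetical, and the 1-4 prosody range simply mirrors the shape of rubrics such as the NAEP oral reading fluency scale.

# Minimal sketch, not a published scoring protocol: pair a rate measure
# (words correct per minute) with a prosody rating so that speed alone is
# never reported as "fluency". The class and its fields are hypothetical.

from dataclasses import dataclass

@dataclass
class FluencySample:
    words_read: int        # total words attempted in the timed passage
    errors: int            # uncorrected miscues
    seconds: float         # reading time
    prosody_rating: int    # 1 (word-by-word) to 4 (fluent, expressive)

    def wcpm(self) -> float:
        """Words correct per minute."""
        return (self.words_read - self.errors) * 60.0 / self.seconds

    def summary(self) -> str:
        return (f"{self.wcpm():.0f} WCPM, prosody {self.prosody_rating}/4 "
                "-- follow with a brief comprehension check")

sample = FluencySample(words_read=180, errors=6, seconds=75, prosody_rating=3)
print(sample.summary())

Reporting the two figures together, and following with a quick comprehension check, keeps a fast rate from being mistaken for fluency on its own.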

While awareness of tests designed to assess different aspects of a reader's ability is important, it is also important to consider their reliability and construct validity. We will take a brief look at research exploring the validity of IRIs, validity as perceived by teachers, the end users of many reading assessment tools, the validity of such "qualitative" forms of assessment as student portfolios, and the validity of tests designed to measure ELL reading ability.

In 1985, Klesius and Homan published an article entitled A validity and reliability update on the informal reading inventory with suggestions for improvement. In it they explore several aspects of validity in a broad comparison of a commonly used group of informal reading inventories. They examined content validity, concurrent validity ("a comparison of performance on a new [IRI] test to performance on existing [IRI] tests," p. 72), inter-scorer reliability, and the impact of passage length on the validity of reading scores. In terms of content validity, some concerns were the percentage of comprehension questions that could be answered without having read the passage, and the scoring criteria used to determine the students' instructional reading level, which varied greatly from one test to another. The research on concurrent validity showed that the coefficients from one test to another were generally acceptable, although little research has been done in this area. Based on five separate studies, it was found that there was generally a 70% inter-scorer reliability rate. Research on passage length suggests that passages shorter than 125 words may produce erratic or inaccurate results. The authors make several suggestions in light of their findings, both for teachers and for evaluating IRIs. They conclude that despite issues with validity, IRIs are still valuable reading assessment tools and should be used in concert with some of the suggestions and precautions they have made in consideration of the research on validity.
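
For readers unfamiliar with what a roughly 70% inter-scorer reliability figure describes, the hypothetical Python sketch below computes the simplest version of the statistic, percent agreement between two raters who scored the same set of responses; published studies often use more robust indices such as Cohen's kappa.

# Hypothetical sketch of the simplest inter-scorer reliability statistic:
# percent agreement between two raters scoring the same student responses.
# The scores below are invented for illustration.

def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Share of items on which two raters assigned the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty item set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Ten comprehension responses scored acceptable/unacceptable by two raters.
a = ["acc", "acc", "un", "acc", "un", "acc", "acc", "un", "acc", "acc"]
b = ["acc", "un",  "un", "acc", "un", "acc", "un",  "un", "acc", "acc"]
print(f"{percent_agreement(a, b):.0f}% agreement")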

Kyriakides (2004) explores the possibilities inherent in asking teachers themselves how useful the testing measures are to them and how they use them in their teaching. He suggests that evaluating test "validity" in this way could be useful in developing the test in the future. Teachers responded to a set of questionnaires, and the data were processed to show the mean and standard deviation of each of the responses. I found this article very interesting, and will consider a strategy such as this to evaluate which information is most useful to my teachers as I provide them with data on students' reading abilities. In addition to traditional measures of test validity, a reading specialist should consider the usefulness of the test to the teachers and students as a key component of its true "validity."
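
As a small, assumed illustration of the descriptive summary Kyriakides reports, the Python below computes a mean and standard deviation for each questionnaire item; the item wording and ratings are invented.

# Sketch of a per-item descriptive summary (mean and standard deviation of
# teachers' ratings). The items and ratings here are hypothetical.

from statistics import mean, stdev

responses = {
    "scores help me group students for instruction": [4, 5, 3, 4, 4, 5, 2, 4],
    "score reports arrive in time to be useful":      [2, 3, 2, 1, 3, 2, 2, 3],
}

for item, ratings in responses.items():
    print(f"{item}: mean={mean(ratings):.2f}, sd={stdev(ratings):.2f}")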

For many schools, portfolio assessment is assumed to be outside the scope of possibility, either because it takes too much work to maintain properly, or because such open-ended measures of student progress are considered simply not "valid" and no "high-stakes" portfolio evaluations are handed down from on high. Johnson, Fisher, Willeke, and McDaniel (2003) explore validity measures of a family literacy portfolio in an effort to contribute to research on such open-ended assessments. They examined inter-scorer reliability rates on the several goals assessed in the portfolio, as well as a holistic rubric used to evaluate 42 family portfolios. While the inter-rater reliability estimates for the six goals ranged from a dependability of .47 to .7, the holistic rubric had a much stronger reliability of .79. According to the authors' research, various sources of guidance suggest that "low stakes assessments require a minimal reliability of .70; whereas, in applied settings with high-stakes, tests require a minimal reliability of .9 (Herman et al., 1992; Nunnally, 1978)" (p. 373). Despite the fact that only the holistic rubric would qualify as acceptably reliable under these terms, "feedback from stakeholders indicated that the collaborative decision-making resulted in a credible assessment. Family educators reported that their involvement focused attention on program goals, contributed to their professional development, and increased their understanding of families" (p. 375). The authors conclude that portfolio evaluation has much potential, and that further research on reliability and validity measures would contribute greatly to the field.
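
The guideline figures the authors cite lend themselves to a simple check. The sketch below is a hypothetical Python illustration that applies the .70 low-stakes and .90 high-stakes cutoffs to reliability estimates like those reported for the portfolio goals and the holistic rubric.

# Sketch applying the cited reliability guidelines (.70 for low-stakes uses,
# .90 for high-stakes uses). The labels and helper logic are hypothetical;
# the values echo the ranges reported in the study.

LOW_STAKES_MIN, HIGH_STAKES_MIN = 0.70, 0.90

estimates = {
    "lowest goal estimate": 0.47,
    "highest goal estimate": 0.70,
    "holistic rubric": 0.79,
}

for measure, r in estimates.items():
    if r >= HIGH_STAKES_MIN:
        verdict = "adequate even for high-stakes use"
    elif r >= LOW_STAKES_MIN:
        verdict = "adequate for low-stakes use only"
    else:
        verdict = "below the low-stakes guideline"
    print(f"{measure}: reliability {r:.2f} -> {verdict}")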

One validity construct that is becoming increasingly relevant in contemporary U.S. culture is the validity of ELL test scores. While it is important to measure how ELL students are performing in English—as that is commonly the language of instruction—testing conducted in a language students are not fully fluent in may provide an inaccurate or irrelevant assessment of aspects of their knowledge or understanding. Sireci, Han, and Wells (2008) provide a complex statistical formula that could potentially be used by those seeking to evaluate test validity for ELLs in the future, although the evidence required to fill in the formulas has yet to be collected.

One final factor that is key in literacy assessment is the set of standards we use to provide guidance at the state and district level, both for students and teachers and for reading professionals. McCombes-Tolis and Feinn (2008) cite research affirming "a direct relationship between teachers' knowledge and skills about essential components of literacy instruction and student literacy outcomes" (p. 236). Unfortunately, they also note that there is a lack of consistent certification and content standards for reading professionals, content-area teachers, and classes, which results in an inconsistent quality of literacy education. They discuss one promising effort, the Connecticut Blueprint for Reading Achievement. The authors identify this publication as an exhaustive and comprehensive source of effective, research-backed literacy standards for educators and classrooms. In an effort to evaluate the effectiveness of this document for creating change within the teaching and learning community in Connecticut, the authors administered an extensive questionnaire that measured teachers' knowledge of the blueprint and its perceived effectiveness. Their results were disappointing. Most teachers did not correctly answer questions about the basic content of the blueprint, and were unable to correctly answer questions about basic literacy competencies. They write,

Collectively, these results indicate that simply articulating essential teacher competencies (knowledge/skills) within state reading blueprints is inadequate to promote mastery of these competencies across targeted teacher populations. Findings suggest instead that states that have taken care to articulate essential teacher competencies within their reading blueprints should also ensure that higher education teacher preparation practices systematically prepare teacher candidates to meet these competency standards so they may begin their careers as educators able to effectively serve the literacy needs of diverse student populations (Ehri & Williams, 1995). (p. 263)

Clearly it makes little sense to go through the trouble of producing high-quality literacy standards if educators are unaware of them or lack the training necessary to implement them. Reading specialists everywhere should be sure to familiarize themselves with the reading standards suggested within their states, as research links teachers' knowledge of the competencies those standards describe to student reading success.

There have been those who question the effectiveness of using reading standards at all for various specific purposes. In his 1995 article Can reading standards really help?, Shannon discusses the original efforts of a joint task force between the International Reading Association (IRA) and the National Council of Teachers of English (NCTE) to create a set of national reading standards for reading professionals and educators. The author's primary concern is that the standards created are themselves open to interpretation, especially from a social justice standpoint. He asserts that the standards are open enough that different literacy educators starting from different points and with different audiences could arrive at different ends using the same guidelines, and that they do not help to address student inequalities. He writes, "My point is that standards (or even laws) cannot change biased thinking and behavior" (p. 6). The article is an opinion piece, and it seems the author assumes that a primary, or even secondary, aim of the standards' authors was to directly impact de facto or de jure inequality in student reading levels. However, his suggestion that the IRA and NCTE "put some teeth into their declarations against bias in and out of schools" (p. 7) was a relevant call in 1995, when schools were only just beginning to grapple in earnest with racial and economic inequality.

Most reading educators and researchers agree that standards are only as successful as those who choose to enforce and evaluate them in the classroom and with individual students.


Bibliography

Bailey, L. B. (2006). Examining gifted students who are economically at-risk to determine factors that influence their early reading success. Early Childhood Education Journal, 33(5), 307-315.

Edwards, P. A., Turner, J. D., & Mokhtari, K. (2008). Balancing the assessment of learning and for learning in support of student literacy achievement. The Reading Teacher, 61(8), 682-684.

Holloway, J. H. (2004). Family literacy. Educational Leadership, 61(6), 88-89.

Johnson, R. L., Fisher, S., Willeke, M. J., & McDaniel, F. (2003). Portfolio assessment in a collaborative program evaluation: The reliability and validity of a family literacy portfolio. Evaluation and Program Planning, 26(1), 367-377.

Klesius, J. P., & Homan, S. (1985). A validity and reliability update on the informal reading inventory with suggestions for improvement. Journal of Learning Disabilities, 18(2), 71-76.

Kuhn, M. R., Schwanenflugel, P. J., & Meisinger, E. B. (2010). Aligning theory and assessment of reading fluency: Automaticity, prosody, and definitions of fluency. Reading Research Quarterly, 45(2), 230-251.

Kyriakides, L. (2004). Investigating validity from teachers' perspectives through their engagement in large-scale assessment. Assessment in Education, 11(2), 143-163.

Leslie, L., & Allen, L. (1999). Factors that predict success in an early literacy intervention project. Reading Research Quarterly, 34(4), 404-424.

McCombes-Tolis, J., & Feinn, R. (2008). Comparing teachers' literacy-related knowledge to their state's standards for reading. Reading Psychology, 29(1), 236-265.

McKenna, M. C., & Picard, M. C. (2007). Revisiting the role of miscue analysis in effective teaching. The Reading Teacher, 60(4), 378-380.

McKenna, M. C., & Walpole, S. (2005). How well does assessment inform our reading instruction? The Reading Teacher, 59(1), 84-86.

McTigue, E. M., Washburn, E. K., & Liew, J. (2009). Academic resilience and reading: Building successful readers. The Reading Teacher, 65(2), 422-432.

Nilsson, N. L. (2008). A critical analysis of eight informal reading inventories. The Reading Teacher, 61(7), 526-536.

Shannon, P. (1995). Can reading standards really help? Clearing House, 68(4), 1-7.

Sireci, S. G., Han, K. T., & Wells, C. S. (2008). Methods for evaluating the validity of test scores for English language learners. Educational Assessment, 13(1), 108-131.

Winograd, P., Paris, S., & Bridge, C. (1991). Improving the assessment of literacy. The Reading Teacher, 45(2), 108-116.