Effects of Question Format on
2005 8th Grade Science WASL Scores
Janet Gordon, Ed.D.
A Big Thank-you!
WERA
Pete Bylsma
Andrea Meld
Roy Beven
Yoonsun Lee
Joe Willhoft
North Central ESD
Today’s Presentation
• National trends in assessment
• Washington State trends
• My research on the science WASL
• A look at the literature to try to explain research results
• Take-home messages
National Trends in Science and Mathematics Assessments
Placing more emphasis on:
• Assessing what is valued in the science professional community (inquiry, application)
• Assessing tightly integrated knowledge linked to application
• Involving teachers and professionals in test development
Compared to:
• What is easily measured
• Discrete bits of knowledge
• Off-the-shelf commercial tests
Improvements in the National Assessment of Educational Progress (NAEP)
• Items grouped into thematic blocks with rich context.
• Real-world application.
• Emphasizes integrated knowledge rather than bits of information.
The NAEP Results
• Lower omission rates on thematically grouped items compared to stand-alone m/c items.
• Increased student motivation to try items.
• Increased student engagement (Silver et al., 2000; Kenney & Lindquist, 2000).
Washington’s Science Standards & Strands
Washington’s Science Strands
2 Science WASL Question Types
Scenario type (most items)
• Rich context
• Clear, authentic task
• 5 to 6 multiple-choice, short-, or extended-constructed-response items
Stand-alone type (few items)
• Discrete bits of knowledge
• 1 multiple-choice or short-constructed-response item
3 Item Response Formats
• Extended Constructed Response (ECR) – students write 3-4 sentences
• Short Constructed Response (SCR) – students write 1-2 sentences
• Multiple-choice (M/C)
3 Categories of Factors That Affect Student Achievement Scores
• The Student – model of cognition: culture, gender, ethnicity, individual differences
• The Test Item – observation: item format
• Interpretation – measurement model (IRT, Bayes Nets)
The Test Item - Observation
• Girls scored much lower on m/c compared to boys (Jones et al., 1992)
• Girls scored higher on constructed response compared to boys (Zenisky et al., 2004)
• Underrepresented groups score higher on performance-like formats (Stecher et al., 2000)
• Embedded Context = Increased comprehension (Solano-Flores, 2002; Zumbach & Reimann, 2002)
State’s 2005 Science WASL Scores
Percent proficient and non-proficient on the 2005 8th-grade science WASL, by ethnicity:
• African American – 20% proficient, 80% not proficient
• Hispanic – 21% proficient, 79% not proficient
• American Indian – 28% proficient, 72% not proficient
• White – 45% proficient, 55% not proficient
Statement of Problem
Is the science WASL accurately measuring
what students know?
Hypothesis
• Contextual, real-world scenarios make information accessible to all ethnicities (“cultural validity”).
• Clear, authentic tasks within scenario questions “unpack” prior knowledge for ALL students.
• Gender neutral – extended and short constructed-response formats, not just m/c.
Research Questions
On the 2005 8th grade science WASL:
Is there any significant difference in performance between gender and/or ethnic groups:
1) on stand-alone question types?
2) on scenario question types?
Methods - Instrument
• OSPI provided results from 8th grade 2005 science WASL
• Entire population: N = 81,690
• Invalid records excluded (e.g. cheating)
• Incomplete records excluded (e.g. gender or ethnicity omitted)
• Actual population: N = 77,692
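The record screening described above amounts to a simple filter over the population. A minimal sketch in Python (field names such as `valid`, `gender`, and `ethnicity` are hypothetical, not OSPI’s actual data schema):

```python
# Illustrative sketch of the record screening described above.
# Field names ("valid", "gender", "ethnicity") are hypothetical.

def screen_records(records):
    """Keep only valid records that report both gender and ethnicity."""
    return [
        r for r in records
        if r.get("valid", True)             # drop invalidated records (e.g., cheating)
        and r.get("gender") is not None     # drop records with gender omitted
        and r.get("ethnicity") is not None  # drop records with ethnicity omitted
    ]

raw = [
    {"valid": True, "gender": "F", "ethnicity": "White"},
    {"valid": False, "gender": "M", "ethnicity": "Hispanic"},  # invalidated
    {"valid": True, "gender": None, "ethnicity": "Black"},     # gender omitted
]
kept = screen_records(raw)
print(len(kept))  # 1
```

The same screening logic explains the drop from N = 81,690 to N = 77,692 in the study’s actual population.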
Methods - Analysis
• MANOVA & follow-up ANOVAs
• Dependent variables: scenario score points; stand-alone score points
• Independent variables: gender; ethnicity
Methods - Analysis
• Analysis I – all item response formats
• Analysis II – multiple-choice response formats only
• Effect size (Cohen’s d) – magnitude of differences
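Cohen’s d, used here to gauge magnitude, is the difference between two group means divided by their pooled standard deviation; conventional benchmarks are roughly 0.2 (small), 0.5 (moderate), and 0.8 (large). A minimal sketch with illustrative data (not the WASL results):

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Sample variances (n - 1 denominator)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Illustrative score points for two hypothetical groups.
a = [10, 12, 11, 13, 12]
b = [8, 9, 10, 9, 8]
print(round(cohens_d(a, b), 2))  # 2.8
```

Because d is standardized, it lets the stand-alone and scenario gaps be compared on the same scale even though the two question types carry different numbers of score points.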
Results
Stand-Alone Question Type – Analysis of Variance
Significant differences?
• Gender groups – NO
• Ethnic subgroups – YES
• Ethnicity x gender – YES
Effect size
• Gender – very small
• Ethnicity x gender – very small
• Ethnicity – small to moderate, between the White, Asian, and Multi-Racial groups AND the AI/AN, HPI, Black, and Hispanic groups
Scenario Question Type – Analysis of Variance
Significant differences?
• Gender groups – NO
• Ethnic subgroups – YES
• Ethnicity x gender – YES
Effect size
• Gender – very small
• Ethnicity x gender – very small
• Ethnicity – LARGE, between the White, Asian, and Multi-Racial groups AND the AI/AN, HPI, Black, and Hispanic groups
Result 1
The achievement gap
between ethnic subgroups
is LARGER
on SCENARIO
vs. stand-alone question types.
Result 2
More students
received MORE points
on STAND-ALONE question
types compared to
scenario question types.
Result 3
A new achievement gap
between boys and girls
IS CREATED
when extended
constructed response items
are removed.
Three Prevailing Themes in the Literature to Help Explain Differences in Student Achievement
THEME I - Individual Differences
• Content knowledge
• Strategic processing knowledge
Expert/Novice Theory (Alexander, 2003; Chi, 1988)
• Novice – dependent on working-memory limits.
• Expert – fluent; freed-up working memory to focus on the meaning and execution of the problem.
THEME II - Opportunity To Learn
Quality Teaching & Learning (Darling-Hammond, 2000)
• There are differences between schools in students’ exposure to knowledge, or OTL.
• Deep understanding of science strategic-processing knowledge often requires direct instruction and lots of practice (Garner, 1987).
• OTL is often compromised in high-need schools (lack of PD support, supplies).
Theme III – Attributes of Items
1) Passage length (Davies, 1988)
2) Academic vocabulary (Shaftel et al., 2006)
3) Degree of knowledge transfer (Chi et al., 1987)
4) Ambiguity & complexity in performance-like items (Haydel, 2003)
5) Science strand type (Bruschi & Anderson, 1994)
6) Instructional sensitivity of item (D’Agostino et al., 2007)
Sensitivity of Items to Variations in Classroom Instruction (D’Agostino et al., 2007)
“The Test Gap” vs. “The Learning Gap”
Some item response formats are more sensitive to variations in classroom instruction than others.
Translating This Into Classroom Practice
• Inspired to dig deeper into detailed learning progressions from novice to expert.
• Use these principles in your formative assessment process; they can identify where students need rich feedback.
• Many teachers are creating common classroom-based assessments (CBAs) for quarterly benchmarking.
“To Go” Classroom-Based Assessment (CBA) Creation Checklist
“Because not all items are created equal.”
Did I… / For this reason…
• Use m/c, short, and extended response item types? / To give both boys and girls an equal chance to show evidence of learning.
• Keep passage and sentence length to a minimum? / To uncover gaps in content knowledge separately from reading ability.
• Use the same academic vocabulary that is in the standards? / Items are sensitive to variations in classroom instruction; match instruction to standards.
“Lessons to Go”
• Use all 3 item response types in your classroom-based assessments (CBAs).
• Keep passage length at a minimum to tease apart content knowledge from reading ability and working memory limitations.
• Use the same academic vocabulary in the classroom and on your CBAs that is on the WASL.
• Use embedded context in a way that is similar to how students learned the material.
Suggestions for Future Research
1) Do similar patterns within question types exist between schools? Between classrooms?
2) Deeper examination of performance variance at the item level: what level of strategic-processing knowledge is assumed compared to content knowledge?
3) Students’ perceptions of assessment items (think-aloud protocol).
4) Do the same patterns exist independent of reading proficiency?
References – Page 1
Alexander, P. A. (2003). The development of expertise: The journey from acclimation to proficiency. Educational Researcher, 32(8), 10-14.
Anderson, J. R. (1990). Cognitive Psychology and Its Implications (3rd ed.). New York: W.H. Freeman
Bruschi, B. A., & Anderson, B. T. (1994). Gender and ethnic differences in science achievement of nine-, thirteen-, and seventeen-year-old students. Paper presented at the Eastern Educational Research Association, Sarasota, FL.
Chi, M. T., Glaser, R., & Farr, M. J. (1988). The Nature of Expertise. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, D. K., & Hill, H. C. (2000). Instructional policy and classroom performance: The mathematics reform in California. Teachers College Record, 102(2), 294-343.
D'Agostino, J. V., Welsh, M. E., & Corson, M. E. (2007). Instructional sensitivity of a state's standards-based assessment. Educational Assessment, 12, 1-22.
Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Seattle: Center for the Study of Teaching and Policy, University of Washington.
References – Page 2
de Ribaupierre, A., & Rieben, L. (1995). Individual and situational variability in cognitive development. Educational Psychologist, 30(1), 5-14.
Garner, R. (1990). When children and adults do not use learning strategies: Towards a theory of settings. Review of Educational Research, 60, 517-529.
Haydel, A. M. (2003). Using cognitive analysis to understand motivational and situational influences in science achievement. Paper presented at the AERA, Chicago, Il.
Shaftel, J., Belton-Kocher, E., Glasnapp, D., & Poggio, J. (2006). The impact of language characteristics in mathematics test items on the performance of English language learners and students with disabilities. Educational Assessment, 11(2), 105-126.
Woltz, D. J. (2003). Implicit cognitive processes as aptitudes for learning. Educational Psychologist, 38(2), 95-104.