TerraNova: Evaluation of a Standardized Test

Mini-Project 1

Teresa Frields and Mitzi Hoback


A. General Information

Title: TerraNova

Publisher: CTB/McGraw-Hill

Date of Publication: 1997

A. General Information: Cost

Varies according to the materials purchased

$122 per 30 Complete Battery Plus consumable test booklets

$92.50 per 30 Complete Battery Plus reusable test booklets

A. General Information: Administration Time

Varies by test and level

Typically given over a period of several test sessions or days

Fall, Winter, and Spring testing periods available

B. Brief Description of Purpose and Nature of Test: General Purpose of Test

Constructed as a “comprehensive modular assessment series” of student achievement

Promoted as a device to help diverse audiences understand student academic achievement and progress

Reports provide useful and informative data that allow for national comparison of group and individual achievement

B. Brief Description of Purpose and Nature of Test: Population for Which Test Is Applicable

K-12

Reading/language arts and mathematics available for K-12

Science and social studies tests available for grades 1-12

B. Brief Description of Purpose and Nature of Test: Description of Content

Multiple choice format

Generates precise norm-referenced achievement scores and a full complement of objective mastery scores

Designed to measure concepts, processes, and skills taught throughout the nation

Content areas measured are Reading/Language Arts, Mathematics, Science, and Social Studies

B. Brief Description of Purpose and Nature of Test: Appropriateness of Assessment Method

Selected-response items can provide information on basic knowledge and some patterns of reasoning

Does not provide evidence for performance standards/targets

Other TerraNova formats provide a combination of selected-response and constructed-response

C. Technical Evaluation: Norms/Standards

1. Type – The battery generates precise norm-referenced achievement scores and a full complement of objective mastery scores.

Types of scores provided: Scaled Scores, Grade Equivalents, National Percentiles, National Stanines, Normal Curve Equivalents (see the conversion sketch below)

Reports are provided both for individual students and for groups of students.

C. Technical Evaluation: Norms/Standards

2. Standardization Sample – Size: The norming sample was based on a stratified national sample.

295 schools

Fall and Spring norming studies involved between 860,000 and 1,720,000 students

C. Technical Evaluation: Norms/Standards

2. Standardization Sample – Representativeness:

Separate sampling designs were used for institutions of different types

Public schools stratified by region, community type, size, and Orshansky Percentile (an indicator of socioeconomic status)

C. Technical Evaluation: Norms/Standards

2. Standardization Sample – Procedure followed in obtaining sample:

Spring Standardization – April 1996

Fall Standardization – October 1996

Recommended test administration period is a five-week window centered on the norming periods

C. Technical Evaluation: Norms/Standards

2. Standardization Sample – Availability of subgroup norms:

Questionnaire sent to participating schools

95% responded in the fall

100% responded in the spring

C. Technical Evaluation: Norms/Standards

3. Standard setting procedures employed – qualifications and selection of judges:

Nominations were made of experienced teachers and curriculum specialists with national reputations

Judges had to possess “deep understanding” of one of the five content areas

C. Technical Evaluation: Norms/Standards

3. Standard setting procedures employed – number of judges:

2 committees for each of 5 content areas (Primary/Elementary and Middle/High School)

4-5 teachers per committee, one curriculum expert (external), and one CTB content expert (approximately 70 people total)

C. Technical Evaluation: Reliability

1. Types – Measures of internal consistency:

Kuder-Richardson Formula 20 (KR20)

Item-pattern KR20 (a unique measure that takes into account the additional accuracy associated with IRT item-pattern scoring)

Coefficient alpha

On individual student score reports, a student's score is reported along with a confidence band (see the sketch below).

C. Technical Evaluation: Reliability

2. Results:

Reliability coefficients were consistently in the .80s and .90s

Spelling coefficients were consistently lower

Grades 1 and 2 also had slightly lower coefficients

C. Technical Evaluation: Validity

1. Types – Content-related:

Numerous studies (e.g. classroom pilots, usability, sensitivity) conducted

Advisory panel of teachers, administrators, and content specialists from all parts of country

Based on recommendations of the SCANS (Secretary's Commission on Achieving Necessary Skills) report

C. Technical Evaluation: Validity

1. Types – Content-related (continued):

Developers and scorers worked together as constructed-response items were scored, to ensure consistency and accuracy of the scoring guides and process

Reviewed various informational sources for children to determine topics of interest

C. Technical Evaluation: Validity

1. Types – Criterion-related:

Conducted a variety of research studies, such as correlations with the SAT, ACT, NAEP, and TIMSS

C. Technical Evaluation: Validity

1. Types – Construct-related:

Careful test development process to support content validity and comprehensiveness of test

Construct validity for skills, concepts and processes measured in each subject

C. Technical Evaluation: Validity

2. Results:

Provides achievement scores that are valid for several types of educational decision making

A thorough validity evaluation encompassed content-, criterion-, and construct-related evidence

Bias

Used the following procedures to reduce the amount of bias:

Ensured valid test plan

Followed stringent editorial guidelines

Conducted expert reviews

Analyzed student data for differential item functioning (see the sketch below)

Selected best items

D. Summary of MMY Reviews

Reviewed by Judith A. Monsaas, Associate Professor of Education, North Georgia College and State University, Dahlonega, GA

Tests are “very engaging and user friendly”; materials are well-constructed and attractive

Addition of performance standards is helpful for schools moving toward a standards-based curriculum framework

D. Review, continued

Claims to assist in decision making in many areas, including evaluation of student progress, instructional program planning, curriculum analysis, class grouping, etc. This reviewer believes they can support this claim

Has a particularly useful section for parents on “Using Test Results”

D. Review, continued

“Although these tests are attractive and more engaging than most achievement tests I have inspected, I doubt that students will forget that they are taking a test.”

The section on “Avoiding Misinterpretations” is helpful when using grade equivalents

D. Review, continued

Process used to develop the test and ensure content validity was very thorough and clearly explained

Norming and score reporting methods are well-developed

Reviewer's only problem is with the mastery classifications for the criterion-referenced interpretations; she feels they are arbitrarily defined

D. Review, continued

Reviewed by Anthony J. Nitko, Professor, Department of Educational Psychology, University of Arizona, Tucson, AZ

One change in the new edition is that items within each subtest are organized according to contextual themes, countering the criticism that standardized tests assess strictly decontextualized knowledge and skills

D. Review, continued

Developers carefully analyzed curriculum guides from around the country, as well as national and state standards and textbook series

Several usability studies were run. The results of these were used to improve test items, teachers’ directions, and page designs

D. Review, continued

Earlier editions criticized for problems related to speed. This version corrects those. Typically fewer than 4% of students fail to respond to the last item on each subtest

“One of the better batteries of its type.”

Teachers' materials exceptionally well-done and informative

E. Critique of the Instrument

Our research on the TerraNova helps us to draw the following conclusions:

A complete and comprehensive test

Numerous measures and studies were done to ensure technical requirements were met

TerraNova takes pride in its overall test design, construction, norming, national standardization process, reliability, validity, and the reduction of bias issues

E. Critique of the Instrument

Does a good job of supporting its purpose as a measure of student achievement

Provides three main types of information including norm-referenced information, some criterion information, and standards-based performance information

Serves as a good measure in comparing student achievement with national performances

E. Critique of the Instrument

This is not a test that should be used by itself. It is simply one type of measure and cannot be the only measure used in making critical decisions

When used in conjunction with other test methods and teacher judgment, it is an effective measure for what it purports to do

Caution should be used when applying this assessment to track state standards; although it purports to be accurately correlated with them, there is no substantial proof.

E. Critique of the Instrument

Interesting Tidbits:

Del Harnish has done research on bias issues and has published work on the TerraNova

Testnote Clarity is a computer program available for disaggregating the data, which allows the user to customize results and apply them to the district curriculum

Recommended