Alignment Among Secondary and Post-Secondary Assessments in California

Vi-Nhuan Le, Laura Hamilton, and Abby Robyn
The RAND Corporation

PACE, Chapter 9
Crucial Issues in California Education 2000


Background

As students progress through high school and into institutions of higher education, they take numerous tests that vary in scope, content, and purpose. At the K-12 level, almost all of the states are currently using or developing assessments that are aligned with state standards.1 Some of these assessment programs rely on commercially available, standardized, multiple-choice tests administered to every student, whereas others hire their own staff to develop items in multiple formats (including, for example, essays or portfolios), which are then administered in a matrix sampling scheme (i.e., not every student completes every item). In some states, scores on these tests are used to inform decisions about grade promotion and graduation. Students who plan to attend college also take one or more admissions tests, such as the ACT or the SAT I and II, and may take Advanced Placement (AP) exams, which provide college credit for high school coursework. When they arrive at college, many students are asked to take placement exams, which are used to assign them to appropriate courses. These exams are especially prevalent in large state higher-education systems.

Assessments play a prominent role in the transition from high school to college. In most cases, test scores are among the major criteria used to determine who is accepted into an institution and who is assigned to remedial courses. Although these scores are imperfect, indirect measures of what students have accomplished, they often provide valuable information that may improve the decision-making process. A well-constructed test of achievement in a particular subject area constitutes a sample of performance from a larger domain to which the user wishes to generalize. This domain will vary depending in part on the purpose of the assessment. A statewide achievement test might be designed to sample from a range of topics and to cover material learned across several grades. A college placement exam, in contrast, may have a narrower focus, reflecting the curriculum of a particular course. Consequently, these tests may not resemble one another closely in the constructs that they measure. In other words, they may not be well aligned.

The goal of the present study is to investigate the degree of alignment among these different types of tests in six case-study states, and to explore the potential consequences of any misalignment. We will compare assessments used for college admissions, college placement, and K-12 system monitoring and accountability in each state, classifying items along several dimensions. For each state, we will obtain a summary of the ways in which the assessments are and are not aligned with one another, and discuss possible implications. This report presents the results for California. It is important to note that we do not claim that all tests must be well aligned. The conditions under which alignment is important are discussed later.

This analysis is part of a larger study commissioned by Stanford University. “The Bridge Project: Strengthening K-16 Transition Policies” is a national study funded by the Pew Charitable Trusts and the U.S. Department of Education’s Office of Educational Research and Improvement. It focuses on the need to increase the alignment between higher education admissions-related requirements and K-12 curriculum frameworks, standards, and assessments. The study was prompted in part by a perceived disjuncture between standards for college admission and placement, on the one hand, and high school curriculum and instruction on the other.2 The salience of this problem is underscored by a 1995 National Education Association survey in which 82 percent of House and Senate Education chairs polled viewed the improvement of connections between colleges and schools as among their highest priorities for higher education.3 Admissions policies are a primary way in which colleges influence the education of secondary students, and the tests that are given as part of the admissions and placement processes are a major component of these policies.

Importance of Alignment

There are at least three major ways to think about alignment among different assessments. First, the content and format of test items send messages to students who take them. Particularly when tests have high stakes attached, such as graduation from high school, selection into college, or placement into a remedial program, they can be expected to influence the behaviors of examinees and, in some cases, their instructors. For example, multiple-choice tests are often criticized for encouraging an emphasis on memorization of discrete facts rather than extended problem solving. It is important to determine whether tests are sending a consistent message to students regarding what kinds of knowledge and skills are valued by the institutions they wish to attend. It is also critical that students have ample opportunity to prepare in appropriate ways for high-stakes assessments. If students enter college unaware of what skills they will be expected to demonstrate on a placement exam, they may not perform as well as they would if given the opportunity to prepare. It is important to note here that the signals a test sends are somewhat distinct from the measurement properties of the test. For example, it is possible that a multiple-choice test does indeed measure complex problem-solving skill, but that examinees and instructors perceive the test as being focused on memorization or recall.

The importance of aligning the various aspects of the educational system to support a common set of goals has been recognized by advocates of systemic reform,4 promoters of test-based accountability systems,5 and many others involved in educational reform efforts. Especially important to standards-based reform efforts is the degree to which the standards and the assessments used to measure progress toward them are consistent with one another. A recent study by Webb found varied degrees of alignment between tests and standards in math and science in four states.6 A content analysis of teacher licensing tests conducted by the Education Trust showed that most such tests required little more than high school-level knowledge but that some were more rigorous than others.7 Standards and assessments that are not aligned with one another or that encourage a focus on low-level skills create mixed messages and confusion for students, teachers, and others involved in promoting student learning.

The second aspect of alignment involves the consistency with which students are rank-ordered or classified into categories or programs (e.g., remedial instruction) by different tests. If two tests are designed to measure the same abilities, evidence must be gathered to show that students who do well on one tend to do well on the other. Although most tests of academic achievement tend to correlate highly with one another, even when subject and item format differ, it is nonetheless important to evaluate the magnitude of this correlation and the consistency of any classification that results from test use. Scores on a high school math exam should, for example, correlate highly with scores on a math placement test administered by the higher education system.

Finally, it is essential that the standards used for decision making be comparable across assessments and set in a technically sound and credible manner. The placement process often involves selecting a cut score on an exam and assigning students to programs or courses based on whether or not their scores exceed this cut score. Statewide assessment programs are increasingly reporting student performance in terms of standards similar to the achievement levels used on the National Assessment of Educational Progress (NAEP). These efforts have been criticized in part because the process of mapping performance to descriptors relies heavily on judgments that are often error-prone.8 Even so, assessment results continue to be reported in terms of standards, and it is therefore important to determine whether the standards set on different tests provide reasonably consistent information about students. If a student is labeled “Advanced” or “Proficient” on a state test but is unable to reach the level of performance on a placement test necessary to avoid remedial coursework, there is reason to believe that the standards used on one or both tests are inappropriate.
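The concern about cut-score consistency can be made concrete with a small simulation. This sketch is illustrative only and is not from the original study: the score distributions, noise levels, and cut scores are invented. It shows that even when two exams measure the same underlying ability and correlate highly, students near the cut scores can be classified differently.

```python
import random

random.seed(0)

# Simulate 1,000 students: a shared "true" ability plus test-specific
# noise, so the two exam scores correlate highly but not perfectly.
students = []
for _ in range(1000):
    ability = random.gauss(50, 10)
    state_exam = ability + random.gauss(0, 5)
    placement_exam = ability + random.gauss(0, 5)
    students.append((state_exam, placement_exam))

# Hypothetical cut scores: "proficient" on the state exam, and
# "exempt from remediation" on the placement exam.
STATE_CUT, PLACEMENT_CUT = 45.0, 45.0

# A student is classified consistently if both exams place them on
# the same side of their respective cut scores.
consistent = sum((s >= STATE_CUT) == (p >= PLACEMENT_CUT)
                 for s, p in students)
print(f"{consistent / 1000:.0%} of students classified consistently")
```

Under these assumptions roughly a tenth of students land on opposite sides of the two cut scores, despite the strong correlation between the exams; it is exactly this slippage that makes comparable standard-setting matter.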

The current project is designed to provide information concerning the degree and nature of alignment among tests used for K-12 system monitoring and accountability, college admissions, and college placement in six states. The project is limited in scope and will not be able to address all forms of alignment. We will rely on expert judgments regarding the features that characterize test items, thereby addressing the first aspect of alignment discussed above. Because we will not have access to test score data, we will not be able to examine item characteristics or relationships among scores on different tests and criterion measures (such as first-year grade point average). A comprehensive study of standard-setting across instruments is also beyond the scope of this project.


Importance of Considering Purpose of Assessment

The degree of alignment among different sets of tests will undoubtedly vary substantially. Even when assessments are designed to be parallel, as with alternate forms of the SAT, we would not expect perfect alignment. Because the assessments we are comparing in this study were designed for different purposes, the alignment is likely to be much less than perfect. This is not necessarily a problem, if the differences result from appropriate efforts to tailor the measure to the situation for which it was designed. For example, a low-stakes K-12 system monitoring exam (i.e., one that is used to track achievement but that has no consequences for individual students, teachers, or schools) might be designed to include a broad variety of topics and therefore may not sample adequately from college-level material. There may be no discernible negative effect of this on students’ efforts to prepare for other exams. If, however, scores on this K-12 exam were used to determine which students should graduate or which teachers should get bonuses in their paychecks, there would be a significant risk of “teaching to the test” that might result in teachers and students neglecting material that is not tested. This type of response has been observed in states with test-based accountability systems.9 Thus the purposes of the tests, and how they are viewed by school personnel and students, influence the degree to which misalignment may pose a problem.

The nature of the misalignment is also important. In the example presented above, the issue was primarily one of content sampling. The problem may be more serious when two tests reflect different philosophies concerning what students should know and what kinds of skills they should be able to display. In many cases, the misalignment between K-12 and university-level tests results from reforms that have taken hold at one level of the educational system but not another. This is particularly true in states where new tests have been developed to reflect state standards or frameworks that emphasize inquiry-based teaching and open-ended problem solving. In such cases, the skills and knowledge students are expected to demonstrate on the state exams may differ substantially from what is expected on college admissions and placement exams. This creates a confusing set of signals for students concerning how they should prepare for the admissions and placement process. It is this signaling function of tests that is the primary motivation for this alignment study.

Finally, the examinee population for which the test was designed, and the ways in which scores are used, must be considered. Exams that are intended to make fine distinctions among high-ability students need to include a large number of difficult items and may include topics that are covered in advanced courses. Such items would be less appropriate for a test that is administered to the entire public school population. It would therefore be reasonable to expect some misalignment. All of the results we discuss below should be interpreted with this in mind. Later we provide further discussion of the importance of considering purpose.

California’s Assessment Environment

The current policy environment with respect to standards and assessments in California is described in the chapter of this volume by Venezia. Students in California high schools, particularly those who plan to attend college, take a number of tests that differ in format and purpose. Below we discuss each of the assessments that we examined in this study. We study only mathematics and English/language arts tests, though many of the assessment programs discussed below include tests in other subjects as well.

Several of the tests we examined, including the SAT I, SAT II, ACT, and AP exams, are used nationally to aid in college admissions decisions. The SAT I, a three-hour, mostly multiple-choice exam that measures general mathematical and verbal reasoning, is intended to help predict success in college. Evidence of its validity for this purpose typically focuses on correlations with freshman grade point average. The SAT II is a one-hour multiple-choice test that assesses in-depth knowledge of a particular subject, and is used by admissions officers as an additional measure with which to evaluate student subject-matter competence. The SAT II is used primarily at the more selective institutions and is taken by far fewer students than is the SAT I. For this study, we examined the following SAT II tests: Mathematics IC, Mathematics IIC, Literature, and Writing. The ACT is an approximately three-hour exam consisting entirely of multiple-choice items. Used as an alternative to the SAT I in evaluating applicants’ chances of success in college, it assesses achievement in several academic subjects, including science, reading, writing, and math. The AP tests are used to measure college-level achievement in several subjects, and to award academic credit to students who demonstrate college-level proficiency. We examined two AP exams: Calculus AB and English Language and Composition.

Students are encouraged to take the ACT or SAT I in their junior or senior years, whereas the optimal time to take the SAT II or AP exams is within months of completing a relevant course. Students are typically required to take either the SAT I or ACT, and, at certain schools, several SAT II exams as part of the admissions process. While the AP tests are not a requirement, admissions officers are likely to view students with AP experience as better-prepared and more competitive applicants.

In addition to the college entrance tests, California students encounter several other assessments during their high school years. As part of its Standardized Testing and Reporting (STAR) program, California currently requires public schools to administer the Stanford Achievement Test, Version 9 (Stanford 9), published by Harcourt Educational Measurement, in grades 2 through 11. Scores on this one-hour multiple-choice test are used to monitor student achievement in basic academic skills, and allow comparisons to be made to a national sample of students. In spring of 1999 a set of augmentation items was administered to supplement the Stanford 9. These included 35 language arts items and 35 math items, which were designed to assess progress toward the state-adopted content standards. In grades 8-10, the specific math items administered were determined by the math course in which the student was enrolled. The augmented portion of STAR is still evolving, and we were unable to obtain the actual items administered to students. Therefore these items are not included in our analysis. Results from the 1999 STAR administration indicate that the augmented items were difficult for students. The governor has proposed tying merit-based college aid to performance on these items; this and other proposed high-stakes uses of STAR make it highly likely that both students and teachers will increasingly focus their efforts on this testing program.

Students also have the option of taking the Golden State Exams (GSE), which are voluntary tests allowing high school students to earn special recognition when they graduate. The GSEs are 90-minute tests containing both multiple-choice and open-ended items. They are intended to assess student achievement relative to state-adopted content standards in particular subject areas. We included five of these tests in our study: High School Mathematics, First Year Algebra, Geometry, Reading/Literature, and Written Composition. Some of the GSE assessments are similar to end-of-course exams (e.g., Algebra or Geometry), and are best taken while the students are currently enrolled in the course. Other GSEs are comprehensive tests that cover the content of several courses (e.g., Reading/Literature, Written Composition, and High School Mathematics). Students wishing to take these tests are advised to wait until their junior or senior year of high school.


Table 1. Structural Characteristics of the Tests: Mathematics

Test | Materials Examined | Time Limit | Number of Items | Tools
ACT | Full sample form | 60 minutes | 60 MC | Calculator
AP Calculus AB | Full form, 1997 released exam | Two 90-minute sections | 40 MC, 6 free response | Graphing calculator on last 15 MC items
California State University Entry Level Mathematics Placement Exam | Sample items | 75 minutes | 65 MC | Calculator
Golden State Exam (Algebra) | Sample items | Two separate 45-minute sessions | 30 MC, 2 OE | Calculator, ruler
Golden State Exam (Geometry) | Sample items | Two separate 45-minute sessions | 30 MC, 2 OE | Calculator, ruler


Finally, examinees applying to any of the 31 colleges under the California State University (CSU) and the University of California (UC) systems may be required to take a placement exam in math and/or English. Many of the community colleges also administer placement exams. These tests are used to determine whether admitted students possess entry-level math and English skills. CSU has placement tests for both math and English, whereas UC administers a system-wide test only for English. The CSU system requires that its students obtain a minimum achievement level on the SAT I, SAT II, or ACT in order to be exempted from taking a placement exam. UC requires a minimum achievement level on either the SAT II or AP exam. Students not meeting the minimum standards under the CSU guidelines must take a 75-minute multiple-choice math exam, and/or a 105-minute English test, which contains both multiple-choice and essay items. Examinees not meeting the UC standards for English are required to take a two-hour essay exam. The community colleges administer a range of exams; we include the Santa Barbara City College English exam in this analysis as an example. All students planning to enroll in an English course at the Santa Barbara City College must take the 85-minute College Tests for English Placement before registration. The test, consisting of both multiple-choice and essay items, is used to place students in an appropriate English course.

Tables 1 and 2 list these testing programs and the type of information we were able to obtain for this study. For most tests, we used a single form from a recent administration or a full-length, published sample test. In a few instances where full-length forms were unavailable, we used published sets of sample items. This was the case for the CSU placement tests and the GSEs. As mentioned earlier, we were also unable to obtain the STAR augmentation items, but instead looked at the STAR Test Blueprints provided by the California Department of Education. For the English/language arts (ELA) tests, the table specifies whether the test includes each of three possible types of items: reading, objective (e.g., multiple-choice) writing, and essay writing. When interpreting results, the reader needs to keep in mind that the percentages we report for the CSU and GSE exams are not necessarily the same percentages that would be obtained if we had examined an actual test form. They do, however, provide rough indicators of the emphasis placed on various topics in the materials that are used by students to prepare for the exams.

Table 1 (continued). Purpose, Framework, and Content: Mathematics

Test | Purpose | Framework | Content as Specified in Testing Materials
ACT | Selection of students for higher education | High school mathematics curriculum | Prealgebra (23%), elementary algebra (17%), intermediate algebra (15%), coordinate geometry (15%), plane geometry (23%), and trigonometry (7%)
AP Calculus AB | Provide opportunities for HS students to receive college credit and advanced course placement | AP Calculus Course Description | Calculus
California State University Entry Level Mathematics Placement Exam | Assess whether admitted students possess entry-level math skills | Statement on Competencies in Mathematics Expected of Entering College Students, reviewed by faculty from CA Community Colleges, CSU, and UC systems | Algebra I and II (60%), geometry (20%), data interpretation, counting, probability, and statistics (20%)
Golden State Exam (Algebra) | Monitor student achievement toward state-approved content standards, provide special diploma | Mathematics Content Standards for California Public Schools, Kindergarten Through Grade 12, adopted by the State Board of Education | First-year algebra
Golden State Exam (Geometry) | Monitor student achievement toward state-approved content standards, provide special diploma | Mathematics Content Standards for California Public Schools, Kindergarten Through Grade 12, adopted by the State Board of Education | Geometry

Methodology

The alignment analysis involved two major phases. In phase 1, we developed a framework of specifications for each subject. We examined several existing assessment frameworks, such as those used to develop the National Assessment of Educational Progress (NAEP), and combined them to produce a set of specifications that addressed the range of topics and item types appearing on the tests included in this study. We then applied these frameworks to our set of tests, and made several rounds of modifications in response to difficulties we encountered in conducting the alignment. The process was similar to one that we use for developing scoring rubrics for open-ended assessment items. The resulting frameworks are described later in this report.

Phase 2 consisted of the actual alignment exercise. Two raters who had expertise in both the relevant subject area and in the application of scoring criteria to assessment results conducted the alignment analysis for each subject. The raters worked through several of the assessments together. When raters differed in their interpretations of the framework components, they discussed the difference until agreement was reached. In cases where a disagreement could not be resolved, a third rater determined the final categorization. This process resulted in reasonably high levels of agreement (kappa values of approximately 85 percent to 100 percent) for most categories. Two exceptions were content area in math, where items often assessed skills in more than one area, and passage topic in reading, because passages often could be coded as addressing more than one topic. A final exception was the cognitive process category in math, discussed further below. For these categories, agreement tended to be approximately 70 percent.

Table 1 (continued). Structural Characteristics of the Tests: Mathematics

Test | Materials Examined | Time Limit | Number of Items | Tools
Golden State Exam (High School Mathematics) | Sample items | Two separate 45-minute sessions | 30 MC, 2 OE | Calculator, ruler
SAT I | Full sample form | Two 30-minute sessions, one 15-minute session | 35 MC, 15 QC, 10 GR | Calculator
SAT II-Level IC | Full sample form | 60 minutes | 50 MC | Calculator
SAT II-Level IIC | Full sample form | 60 minutes | 50 MC | Calculator
Stanford 9 | Full form | 60 minutes | 48 MC | Calculator, ruler
Stanford 9 augmentation items | Test blueprints | | | Calculator, ruler

Notes: MC = multiple-choice; OE = open-ended; GR = grid-in; QC = quantitative comparison.

Table 1 (continued). Purpose, Framework, and Content: Mathematics

Test | Purpose | Framework | Content as Specified in Testing Materials
Golden State Exam (High School Mathematics) | Monitor student achievement toward state-approved content standards, provide special diploma | Mathematics Content Standards for California Public Schools, Kindergarten Through Grade 12, adopted by the State Board of Education | Algebra I and II, geometry, probability and statistics
SAT I | Selection of students for higher education | High school mathematics curriculum | Arithmetic (13%), algebra (35%), geometry (26%), and other (26%)
SAT II-Level IC | Selection of students for higher education | Three-year college preparatory mathematics curriculum | Algebra (30%), geometry (38%; specifically plane Euclidean (20%), coordinate (12%), and three-dimensional (6%)), trigonometry (8%), functions (12%), statistics and probability (6%), and miscellaneous (6%)
SAT II-Level IIC | Selection of students for higher education | More than three years of college preparatory mathematics curriculum | Algebra (18%), geometry (20%; specifically coordinate (12%) and three-dimensional (8%)), trigonometry (20%), functions (24%), statistics and probability (6%), and miscellaneous (12%)
Stanford 9 | Monitor student achievement toward CA standards | National Council of Teachers of Mathematics Standards | Two subtests: mathematical problem-solving and mathematical procedures
Stanford 9 augmentation items | Monitor student achievement toward CA standards | CA standards | Algebra I (23%), geometry (31%), algebra II (31%), statistics (14%)
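The agreement statistic used in the rater analysis, Cohen's kappa, corrects observed agreement for the agreement two raters would reach by chance. The following sketch is illustrative only; the item codes are invented, not taken from the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten items on the NAEP cognitive dimension.
a = ["concept", "procedure", "solving", "concept", "procedure",
     "concept", "solving", "procedure", "concept", "solving"]
b = ["concept", "procedure", "solving", "concept", "procedure",
     "concept", "solving", "concept", "concept", "solving"]
print(round(cohens_kappa(a, b), 2))  # prints 0.85
```

Note that the nine-out-of-ten raw agreement here shrinks to a kappa of about 0.85 once chance agreement is removed, which is why kappa is the preferred summary for coding exercises of this kind.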

Results for Mathematics

In this section we describe the results of the alignment exercise for math tests. First we present the framework that was developed. We then describe the major areas of alignment and misalignment, and discuss the implications of these findings for the signals that students receive.

FrameworkThe math frameworkconsisted of three majordimensions: technicalfeatures, content, andcognitive processes. Thisset of dimensions wasused in an earlier studyof the alignment betweenstate tests and NAEP,10

but we modified the defi-nitions of these dimen-sions to some degree toreflect unique character-istics of some of the testswe examined in thisstudy. The technicaldimension covered fea-tures of the test thatcould be describedthrough simple examina-

tion of the test and items—number of items,time limit, format (e.g., multiple-choice,essay), provisions for the use of tools such ascalculators or protractors, the use of diagramsor other graphics, the use of formulas, andwhether each item was embedded in a context(as in a word problem). The use of formulas

186 C r u c i a l I s s u e s i n C a l i f o r n i a E d u c a t i o n 2 0 0 0

Test MaterialsExamined Time Limit Number of Items Purpose

ACT Full sample form 80 minutes (35 minute readingsection, 45 minuteobjective writingsection)

40 MC reading75 MC objectivewriting

Selection of stu-dents for highereducation

AP Language andComposition

Sample questions 60 minute MCsection120 minute essaysection

52 MC3 essays

Provide opportuni-ties for HS stu-dents to receivecollege credit andadvanced courseplacement

California StateUniversity EntryLevel EnglishPlacement Exam

Sample items Two 30-minutesections(one section eachfor reading andobjective writing)45 minute essaysection

45 MC reading 45 MC objectivewriting 1 essay

Assess whetheradmitted studentspossess entrylevel English skills

Golden State Exam(Reading/Literature)

Sample items Two separate 45-minute sessions

30 MC2 essays

Monitor studentachievementtoward state-approved contentstandards, providespecial diploma

Golden State Exam(WrittenComposition)

Sample Items Two separate 45-minute sessions

30 MC2 essays

Monitor studentachievementtoward state-approved contentstandards providespecial diploma

Table 2. Structural Characteristics of the Tests: English/Language Arts


was sometimes difficult to determine because problems can be solved in multiple ways, and in some cases an item could be solved either with or without a formula. Items were coded as requiring a formula only if it was determined that the formula was necessary for solving the problem. Finally, we examined the context surrounding the assessment, particularly the degree to which high stakes are attached to performance. This is important because it affects examinee motivation.

The content dimension included several categories of math topics, from pre-algebra (e.g., numbers and operations) through calculus. Almost all of the tests we examined had specifications that included many or all of these categories. We listed sub-categories as a means of making the distinctions among the main categories clearer, but we coded using only the main categories.

Finally, the cognitive dimension was identical to that used for NAEP, and included three categories: conceptual understanding, procedural knowledge, and problem solving. As is typical of studies like this, the raters found this dimension to be the most difficult to code.11 The cognitive process categories cannot always be separated neatly. According to the NAEP framework, "These abilities are…descriptions of the ways in which information is structured for instruction and the ways in which students manipulate, reason with, or communicate their mathematical ideas. As a consequence, there can be no singular or unanimous agreement among educators about what constitutes a conceptual, a procedural, or a problem-solving item. What can be classified are the actions a student is likely to undertake in processing information and providing a satisfactory response."12
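In practical terms, the coding procedure described above reduces, for each test and each dimension, to counting the category codes assigned to items and converting the counts to percentages of the kind reported in Table 3. A minimal sketch of that tally, using hypothetical item codes rather than the study's actual data:

```python
from collections import Counter

# Hypothetical cognitive-process codes for a 20-item test, one code per item:
# CU = conceptual understanding, PK = procedural knowledge, PS = problem solving.
item_codes = ["PK"] * 11 + ["CU"] * 6 + ["PS"] * 3

# Count items per category, then express each count as a percentage of the test.
counts = Counter(item_codes)
percentages = {cat: round(100 * n / len(item_codes)) for cat, n in counts.items()}

print(percentages)  # {'PK': 55, 'CU': 30, 'PS': 15}
```

The same tally, run once per dimension (format, context, formulas, content, cognitive requirements), yields a row of a table like Table 3 for each test.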

Table 2 continued. Frameworks and Section Coverage: English/Language Arts (rows in the same order as the tests in Table 2)

| Framework | Reading Section? | Objective Writing Section? | Essay Section? |
|---|---|---|---|
| High school mathematics curriculum | Y | Y | N |
| AP English Language and Composition Course Description | Y | N | Y |
| CSU English curriculum | Y | Y | Y |
| English-Language Arts Content Standards for California Public Schools, Kindergarten Through Grade Twelve, adopted by the State Board of Education | Y | N | Y |
| English-Language Arts Content Standards for California Public Schools, Kindergarten Through Grade Twelve, adopted by the State Board of Education | N | Y | Y |


In addition, items can often be solved in multiple ways, sometimes as a function of the examinee's proficiency. What might be a problem-solving item for one examinee might require another to apply extensive procedural knowledge. For instance, consider an item asking students for the sum of the first 101 numbers starting with zero. A procedural knowledge approach might involve a computation-intensive method, such as entering all the numbers into a calculator to obtain the resulting sum. The problem-solving approach, however, would entail a recognition that all the numbers, except the number 50, can be paired with another number to form a sum of 100 (100+0, 99+1, 98+2, etc.). The total sum is then simply computed by multiplying the number of pairs (i.e., 50) by 100 and adding 50. Clearly, depending upon the chosen approach, the same item can elicit varying levels of mathematical sophistication. The cognitive processes required by the items affect the construct that they measure and, as a consequence, examinee scores. However, for the purposes of this study, which focuses on signals sent to examinees, clear distinctions along this dimension are arguably less critical.
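The two solution paths for this item can be made concrete. A brief sketch of ours (not from the original study), showing that the computation-intensive route and the pairing insight reach the same total:

```python
# Procedural approach: add the numbers 0 through 100 one at a time,
# as a student might on a calculator.
procedural = sum(range(101))

# Problem-solving approach: 50 pairs each summing to 100
# (100+0, 99+1, ..., 51+49), plus the unpaired middle number, 50.
problem_solving = 50 * 100 + 50

print(procedural, problem_solving)  # 5050 5050
```

Both approaches yield 5050; what differs is the mathematical sophistication each demands of the examinee.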

Aspects of Alignment and Misalignment in Mathematics

To evaluate alignment, we examined the degree of consistency among the measures with respect to structural and content characteristics.i Table 3 provides more details on the structural and content features of each test. The measures shared some features, particularly those related to format and administrative conditions. Every assessment included multiple-choice items, and all but the GSE were administered in a single testing session that took


Table 2 continued. Structural Characteristics of the Tests: English/Language Arts

| Test | Materials Examined | Time Limit | Number of Items | Purpose |
|---|---|---|---|---|
| SAT II-Literature | Full sample form | 60 minutes | 60 MC | Selection of students for higher education |
| SAT II-Writing | Full sample form | One 40-minute MC session; one 20-minute essay session | 60 MC; 1 essay | Selection of students for higher education |
| Stanford-9 | Full form | 60 minutes | 84 MC (54 reading comprehension items, 30 vocabulary items) | Monitor student achievement toward CA standards |
| University of California Subject A Examination | Sample questions | 2 hours | 1 essay | Assess admitted students' writing skills |
| SAT I | Full sample form | Two 30-minute sessions; one 15-minute session | 78 MC | Selection of students for higher education |
| Santa Barbara College Tests for English Placement | Full sample form | 85 minutes (30-minute reading section, 35-minute objective writing section, 20-minute essay) | 35 MC reading; 70 MC objective writing; 1 essay | Assess whether students possess entry-level English skills |


approximately one hour. Students were allowed the use of a calculator, although most questions did not require extensive computation. Familiarity with basic formulas and mathematical identities was generally assumed as background for the questions, but knowledge of more complex formulas was seldom necessary.

The assessments, however, tended to have many more differences than similarities. There was a great deal of structural variation among the exams, especially with regard to the percentages of items containing formulas and illustrations. Fewer than 10 percent of the SAT I and Stanford 9 items required a memorized formula, in contrast to 25 percent of the GSE Geometry problems. Whereas the GSE Algebra and SAT II Level IIC assessments made little use of figures, the Stanford 9 and GSE Geometry exams included many illustrations, with 42 percent and 75 percent of their items, respectively, containing a diagram. Differences in the degree to which tests require interpretation of spatial or figural information are particularly important, as they can affect gender and other group differences.

Instances of misalignment were also observed with respect to the amount of contextualization provided. In spite of reform ideology that recommends the inclusion of personally relevant items requiring applications of mathematical principles to real-life situations, many of the exams continued to measure student achievement with abstract questions, that is, questions that included only numbers and symbols. No more than 25 percent of the items found on the college admissions and placement assessments were contextualized (i.e., embedded in a story), whereas more than half of the Stanford 9 items were classified as contextualized.

Perhaps more important than the percentage of contextualized items is the nature of the contextualization. In this respect, only the GSE open-ended questions were in line with the reform movement. Although 58 percent of the Stanford 9 items were framed in realistic situations, the presented scenarios were brief and had limited practical applications. The GSE open-ended items, on the other hand, allowed examinees to impose their own meanings and constraints, and bore some relevance to "real-world" skills. The GSE open-ended items will be discussed more fully in a later section.

Table 2 continued. Frameworks and Section Coverage: English/Language Arts (rows in the same order as the tests in Table 2 continued)

| Framework | Reading Section? | Objective Writing Section? | Essay Section? |
|---|---|---|---|
| High school English and American Literature Curriculum | Y | N | N |
| High school Reading and Language Arts Curriculum | N | Y | Y |
| Aligned with NAEP framework | Y | N | N |
| UC English curriculum | N | N | Y |
| High school Reading and Language Arts Curriculum | Y | Y | N |
| High school Reading and Language Arts Curriculum | Y | Y | Y |

Widespread reform efforts have also been directed toward the format in which test items are presented. Despite frequent criticisms that multiple-choice items are limited in the skills they measure, only the SAT I and GSE included items that required students to generate their own responses. The GSE open-ended questions, however, were much more extensive than the SAT I items. Successful solution of a GSE open-ended problem generally required multiple steps, and students were asked to justify or explain their solutions, frequently with diagrams or charts. In contrast, the SAT I open-ended items did not necessarily call for multiple strategies, and could sometimes be solved with algorithmic procedures. Furthermore, the SAT I open-ended items were constrained, as the responses could not take on negative values. Thus, although the two tests both make use of an open-response format, the cognitive demands differ dramatically.

An analogous problem arises with similarly named tests that assess very different sets of skills. Although all the exams are considered measures of mathematics achievement, there is a great deal of variation in the constructs assessed. Approximately 52 percent of the GSE Algebra items and 37 percent of the SAT I questions measured elementary


Table 3. Percent of Items Falling in Each Category: Mathematics
Columns: Format (MC, QC, GR, OE); Context (C); Graphs (S, RO, P); Diagrams (S, RO, P).

| Test | MC | QC | GR | OE | C | Graphs S | Graphs RO | Graphs P | Diagrams S | Diagrams RO | Diagrams P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ACT | 100 | 0 | 0 | 0 | 22 | 5 | 2 | 0 | 13 | 0 | 0 |
| CSU | 100 | 0 | 0 | 0 | 24 | 0 | 0 | 0 | 16 | 0 | 0 |
| GSE (Algebra) | 95 | 0 | 0 | 5 | 15 | 0 | 5 | 0 | 10 | 0 | 0 |
| GSE (Geometry) | 95 | 0 | 0 | 5 | 10 | 0 | 0 | 5 | 75 | 0 | 0 |
| GSE (HS Math) | 92 | 0 | 0 | 8 | 33 | 0 | 5 | 0 | 23 | 0 | 5 |
| SAT I | 58 | 25 | 17 | 0 | 25 | 7 | 0 | 0 | 18 | 0 | 0 |
| SAT II-Level IC | 100 | 0 | 0 | 0 | 18 | 8 | 0 | 0 | 26 | 0 | 0 |
| SAT II-Level IIC | 100 | 0 | 0 | 0 | 12 | 12 | 2 | 0 | 2 | 0 | 0 |
| Stanford 9 | 100 | 0 | 0 | 0 | 58 | 21 | 4 | 0 | 42 | 0 | 0 |

Legend:
Format: MC = multiple-choice items; QC = quantitative comparison items; GR = fill-in-the-grid items; OE = open-ended items.
Context: C = contextualized items.
Graphs/Diagrams: RO = graph/diagram within response options; P = graph/diagram needs to be produced.
Formulas: M = formula needs to be memorized; G = formula is provided.
Content: PA = prealgebra; EA = elementary algebra; IA = intermediate algebra; CG = coordinate geometry; PG = plane geometry; TR = trigonometry; SP = statistics and probability; MISC = miscellaneous topics.


algebra knowledge, whereas 40 percent of the Stanford 9 items focused on statistics. For college admissions exams such as the SAT II Level IIC and ACT, relatively greater emphasis was given to trigonometry, a topic that was absent from both the GSE Algebra and SAT I exams.

The misalignments among the measures go beyond content sampling, and extend to the reasoning requirements elicited by each test. Although none of the assessments focused heavily on problem-solving items, there were some differences with respect to the emphasis given to domain knowledge. Ninety-eight percent of the CSU items entailed straightforward application of declarative and procedural knowledge. In a similar vein, the vast majority of questions on the ACT, Stanford 9, and SAT II Level IC tests were also solvable via heuristics and algorithms. The SAT II Level IIC, which was intended for examinees enrolled in more advanced college preparatory math courses, placed the most emphasis on problem-solving ability (20 percent of its questions).

Perhaps the source of the inconsistencies can be traced to variations in the purposes of the assessments and in the frameworks that guided their development. The GSE and CSU exams were designed to be aligned with state-adopted content standards, which have clearly prescribed guidelines that shape the content of the assessments. The Stanford 9 also employs an external framework, the National Council of Teachers of Mathematics Standards, but this set of guidelines encompasses standards that cut across state lines. Because they do not follow any explicit framework, the college admissions exams that assess knowledge in particular subjects (i.e., SAT II and ACT) have more loosely defined standards, and draw upon core concepts taught within most mathematics courses. The SAT I, on the

Table 3 continued. Percent of Items Falling in Each Category: Mathematics
Columns: Formulas (M, G); Content (PA through MISC); Cognitive Requirements (CU, PK, PS).

| Test | M | G | PA | EA | IA | CG | PG | TR | SP | MISC | CU | PK | PS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACT | 15 | 0 | 17 | 22 | 5 | 15 | 25 | 8 | 3 | 5 | 40 | 53 | 7 |
| CSU | 18 | 0 | 6 | 32 | 8 | 16 | 14 | 2 | 22 | 0 | 28 | 70 | 2 |
| GSE (Algebra) | 10 | 0 | 0 | 52 | 0 | 19 | 14 | 0 | 10 | 5 | 19 | 76 | 5 |
| GSE (Geometry) | 25 | 0 | 0 | 0 | 0 | 5 | 86 | 10 | 0 | 0 | 52 | 38 | 10 |
| GSE (HS Math) | 15 | 0 | 23 | 15 | 0 | 23 | 23 | 0 | 15 | 0 | 62 | 23 | 15 |
| SAT I | 1 | 8 | 13 | 37 | 2 | 6 | 19 | 0 | 13 | 11 | 32 | 53 | 15 |
| SAT II-Level IC | 12 | 0 | 2 | 30 | 10 | 12 | 28 | 4 | 8 | 6 | 34 | 58 | 8 |
| SAT II-Level IIC | 10 | 0 | 2 | 14 | 22 | 12 | 14 | 18 | 6 | 12 | 26 | 54 | 20 |
| Stanford 9 | 6 | 6 | 0 | 13 | 2 | 19 | 19 | 4 | 40 | 4 | 63 | 31 | 6 |

Legend (continued):
Graphs/Diagrams: S = graph/diagram within item stem.
Cognitive Requirements: CU = conceptual understanding; PK = procedural knowledge; PS = problem-solving.


other hand, is independent of any specific curriculum or course, and is intended to assess general mathematical reasoning proficiency developed over years of schooling.

Several of the misalignments discussed earlier should probably not be considered problematic, as some of the differences emerge from appropriate efforts to adapt a test to serve a particular purpose. For instance, although both the SAT II Level IIC and Stanford 9 included topics from a wide variety of courses, the SAT II Level IIC drew upon trigonometry, whereas the Stanford 9 rarely included such material. The broad content sampling found on both of these assessments can be further contrasted with the topics on the GSE Geometry test, which reflected the curriculum of a specific course. In this particular case, the Stanford 9, SAT II Level IIC, and GSE Geometry exam have disparate purposes, which call for differing levels of mathematical sophistication and varying extents of domain sampling. They are also targeted toward somewhat different examinee populations. Because the SAT II Level IIC is typically used to select among higher-achieving students for entrance into universities and colleges, the test needs to include many complex problems with advanced content in order to distinguish among the examinees and rank order them consistently. The Stanford 9, on the other hand, is used to monitor K-12 student achievement, and therefore requires items of more moderate difficulty that can be attempted by students with a wider range of proficiency levels and course-taking histories. In a similar vein, the GSE Geometry test, unlike the SAT II Level IIC or the Stanford 9, is not a measure of general math ability, but a measure of achievement in a particular course. Consequently, it is more appropriate for this assessment to limit its content to a narrow area of math than to sample extensively from the entire mathematics domain. Thus, when making decisions concerning whether misalignments pose a potential problem, it is important to consider the use of the test. For the measures discussed above, the discrepancies most likely arise from variations in their purposes, and are therefore acceptable instances of misalignment.

However, discrepancies among exams with similar purposes are also evident. Consider the SAT II Level IC and GSE High School Math exams.ii Although both are intended to assess the proficiency of students who have taken three years of college preparatory math courses, they differ in their structural and cognitive features. The GSE contained a higher proportion of contextualized items (33 percent compared to 18 percent), whereas the SAT II included more graphs (8 percent compared to none). The GSE High School Math test also placed a greater emphasis on problem-solving items. Finally, there were vast differences in content sampling; the GSE was more likely to draw upon pre-algebra (23 percent compared to 2 percent), whereas the SAT II included more elementary algebra items (30 percent compared to 15 percent). In this particular case, the inconsistencies between the two sets of testing materials may send mixed messages to students regarding the emphases placed on various topics and skills.

Implications of the Misalignments

The misalignments among the exam materials can create a confusing set of signals pertaining to how students should prepare for the assessments. For example, the ACT and SAT I are
