
Answers to Frequently Asked Questions about

COMPASS e-Write™ & ESL e-Write™

NOVEMBER 2012

© 2012 ACT, Inc. All Rights Reserved.


Table of Contents

How does a writing essay test differ from a multiple-choice item test?
What are the COMPASS e-Write 2–8 and COMPASS e-Write 2–12 prompts like?
How do COMPASS e-Write prompts compare to other writing assessment prompts?
What are the ESL e-Write 2–12 prompts like?
Why do ESL e-Write prompts differ from COMPASS e-Write prompts?
How and why does COMPASS e-Write scoring differ from ESL e-Write scoring?
    COMPASS e-Write Scoring
    COMPASS e-Write 2–8 Holistic Score Description
    Subscores for COMPASS e-Write 2–8
    COMPASS e-Write 2–12 Holistic Score Description
    Subscores for COMPASS e-Write 2–12
    ESL e-Write Scoring
    ESL e-Write Analytic Score Scales
    ESL e-Write Overall Score
Why is it inappropriate for native speakers to be tested with ESL tests?
How are COMPASS and ESL e-Write prompts and rubrics developed?
How are COMPASS and ESL e-Write prompts field tested?
How are raters trained to score e-Write responses, and how are field test responses evaluated?
What reliability evidence exists regarding human rater scoring (i.e., inter-rater reliability)?
How is Vantage Learning's IntelliMetric automated scoring engine trained to score COMPASS and ESL e-Write responses?
What evidence is there that automated scoring is as accurate and reliable as human scoring?
    Study Results for COMPASS e-Write 2–8 Score Scale and 1–4 Scoring Rubric
    Study Results for COMPASS e-Write 2–12 Score Scale and 1–6 Scoring Rubric
    Study Results for ESL e-Write 2–12 Score Scale Analytic Scoring Guide
    Scoring Quality Monitoring
Are there responses that cannot be scored by the scoring engine? How are these scored?
Given that most scores for COMPASS e-Write 2–12 fall into the 6, 7, and 8 score categories, how can schools use these scores to make meaningful placement decisions?
    Best Practice: Multiple Measures for Writing
What are other best practices associated with COMPASS direct writing assessments?
    Best Practice: Selecting a Direct Writing Assessment
    Best Practice: Classroom Assessment & Standardized Assessment Differences
    Best Practice: Local Pilot Testing & e-Write Validation
    Best Practice: Other Considerations for Writing Assessment


How does a writing essay test differ from a multiple-choice item test?

The National Council of Teachers of English (NCTE) is devoted to improving the teaching and learning of English and language arts at all levels of education. In 1994, NCTE released Standards for the Assessment of Reading and Writing. Within this set of standards, NCTE specifically addresses the need for performance-based or authentic assessment for writing:

The general issue of the ‘realness’ of what is being measured (its construct-validity) is alluded to by the terms: authentic assessment, performance-based assessment, performance assessment, and demonstrations. Regardless of what the assessments are called, the issue is that tests must measure what they purport to measure: a reading test requires a demonstration of, among other things, constructing meaning from written text; a writing assessment requires a demonstration of producing written text.

IRA/NCTE Joint Task Force on Assessment, 1994

The overarching reason for evaluating writing using a direct writing assessment is to allow examinees to demonstrate facility with language through writing production. In the context of the COMPASS system, this direct writing component can be paired with the COMPASS Writing Skills Placement Test, a complementary multiple-choice examination that assesses an examinee's ability to identify and correct errors in written text. Combining the COMPASS Writing Skills Placement Test with a writing essay test provides multiple measures to differentiate performance for placement decisions.

______________________________________________________________

What are the COMPASS e-Write 2–8 and COMPASS e-Write 2–12 prompts like?

For the COMPASS e-Write 2–8 and 2–12 prompts, examinees are provided with a writing task (prompt) that is framed within a familiar context. This might be a community or school setting where a problem or issue related to that setting is presented. Overall, the structure of the COMPASS e-Write prompts is a single-paragraph format that includes:

Sentence 1: Initial presentation of the setting, an issue or problem, and the person or the group considering how to address the issue or problem.

Sentence 2: Additional information about the setting, the issue or problem, and the outcome the person or group wants to achieve.

Sentences 3 and 4: Two different proposals for addressing the issue or problem.

Sentence 5: Instructions asking the writer to argue for one of the two proposals by providing a rationale for why the alternative they have chosen will be more likely to achieve the desired outcome.

This type of prompt requires that the examinee take a position and offer a solution that is supported with specific examples or evidence regarding the position taken. The sample prompt shown in Figure 1 provides an example of the model described above.

A School Board is concerned that the state's requirements for core courses in mathematics, English, science, and social studies may prevent students from taking important elective courses like music, other languages, and vocational education. The School Board would like to encourage more high school students to take elective courses and is considering two proposals. One proposal is to lengthen the school day to provide students with the opportunity to take elective courses. The other proposal is to offer elective courses in the summer. Write a letter to the School Board in which you argue for lengthening the school day or for offering elective courses during the summer, explaining why you think your choice will encourage more students to take elective courses.

Begin your letter: Dear School Board:

Figure 1 Sample COMPASS e-Write Prompt


The COMPASS e-Write prompts were developed to ensure that the format of the prompt is accessible and familiar to all levels of English-speaking students. The prompt structure is limited to four or five sentences. The sentences within the paragraph are simple sentences, with few independent or dependent clauses. The level of word choice within the COMPASS e-Write prompts is targeted at the high school level, with an overall readability level that is equivalent to high school-level text.

______________________________________________________________

How do COMPASS e-Write prompts compare to other writing assessment prompts?

The COMPASS e-Write prompt format is comparable to writing prompt formats found in virtually all secondary and postsecondary direct writing assessment programs within the United States. This includes secondary achievement tests (used to support accountability and inform grade promotion and graduation), entrance examinations (used to support selection decisions), and postsecondary tests (used for placing students and evaluating general education outcomes). Based on a review of direct writing assessments used by state testing programs, the COMPASS e-Write "take-a-position" prompt format is comparable to prompts used within numerous state-based writing assessments for high school students and to prompts associated with the persuasive writing domain tested as part of grade 12 writing on the National Assessment of Educational Progress (NAEP).

COMPASS e-Write prompts are similar to prompts used on the ACT Writing Test, where the prompt describes a relevant issue and examinees are asked to write about their perspective. COMPASS e-Write prompts and ACT Writing Test prompts are both designed to elicit a persuasive response, and COMPASS e-Write 2–12 and ACT Writing Test responses are both evaluated using a 6-point scoring rubric. Differentiating factors between COMPASS e-Write and the ACT Writing Test include task and development differences. COMPASS e-Write uses a "problem/solution" task in which examinees must defend one of two proposed solutions to a problem; the ACT Writing Test uses an "opinion on an issue" task in which examinees may argue for one of the two perspectives given in the prompt or develop their own perspective on the issue. COMPASS prompts are developed for and field tested with entering college students; ACT Writing Test prompts are developed for and field tested with high school juniors. Development of the ACT Writing Test prompts was based on the ACT National Curriculum Survey®, which comprises a comprehensive review of state educational standards, a survey of educators, and consultation with content area experts across the curriculum, including high school teachers and instructors of entry-level college courses. English and Writing results from the ACT survey can be found at: http://www.act.org/research/curricsurvey.html.

COMPASS e-Write prompts are also comparable in format and complexity to writing assessments affiliated with other postsecondary tests, including ACT-developed examinations and competitor products. Overall, examinees who have been exposed to a typical U.S. secondary education have likely encountered writing prompts in the past that are quite similar to the COMPASS e-Write prompts.

______________________________________________________________


What are the ESL e-Write 2–12 prompts like?

For the ESL e-Write prompts, the examinee is provided with a writing task that is framed within a much more global or universal context. This includes settings related to society in general rather than specific communities. In addition, while the prompt provides contrasting positions, the positions are more general in nature to ensure that the setting is as familiar as possible to the intended audience. Overall, the structure of the ESL e-Write prompts is a single-paragraph format that includes:

Sentence 1: Presentation of one general or societal point of view.

Sentence 2: Presentation of another, contrasting point of view.

Sentence 3: Question posed to the writer to prompt a response (i.e., “What do you think?”).

Sentence 4: Final encouragement to use examples to support the opinion.

While this type of prompt format also requires that the examinee take a position and offer support for that position or opinion with examples, the framing of the prompt is in the context of society and common societal encounters or issues, rather than specific problems set within a traditional native-speaker setting. The sample prompt in Figure 2 provides an example of the model outlined above.

Some people think that daily newspapers are the best source of information. Other people think that radio and television broadcasts are the best sources of information. What do you think? Use examples to support your opinion.

Figure 2 Sample ESL e-Write Prompt

______________________________________________________________

Why do ESL e-Write prompts differ from COMPASS e-Write prompts?

In Essentials of English, a position paper released by NCTE in 1982, the acquisition of English is characterized as including "skills in reading, writing, speaking, listening and observing." This paper further indicates that "the development of these skills is a lifelong process." English as a second language (ESL) is the study of English by nonnative speakers. Nonnative speakers are newly exposed to English and come from widely varying linguistic and cultural backgrounds. ESL examinees may be unfamiliar with numerous English-language concepts, constructions, and conventions due to immersion in their own native culture and language.

Native English speakers are those who have been immersed in the language and the culture; they have had English language acquisition opportunities since birth. There is an expectation that native English speakers have attained a facility with language that is based on a combination of settings, including education in an English-speaking environment. A basic expectation for English-speaking examinees is that they have achieved an understanding of the language based on life-long learning. In contrast, a basic expectation for non-English-speaking examinees is a lack of fluency or familiarity with the language and various cultural components.

A direct writing assessment requires examinees to use the writing process to produce a writing sample within a specified time frame. The relative differences in this process for a native English speaker versus a nonnative speaker are significant: in a writing assessment, the nonnative speaker must process the task, undertake the writing process, and produce writing, constantly navigating in a foreign language within a foreign culture. For this reason, the assessment of writing for ESL examinees differs from that of native English examinees.


In January 2001, the Conference on College Composition and Communication (CCCC) published its “Statement on Second Language Writing and Writers.” This statement was endorsed by the Teachers of English to Speakers of Other Languages (TESOL) Board of Directors at their February 2001 meeting. The following is an excerpt from the CCCC statement:

Writing prompts for placement and exit exams should avoid cultural references that are not readily understood by people who come from various cultural backgrounds. To reduce the risk of evaluating students on the basis of their cultural knowledge rather than their writing proficiency, students should be given several writing prompts to choose from when appropriate. The scoring of second-language texts should take into consideration various aspects of writing (e.g., topic development, organization, grammar, word choice), rather than focus only on one or two of these features that stand out as problematic.

Conference on College Composition and Communication, 2001

The philosophy inherent in the CCCC statement above also drove development of the ESL e-Write prompts. The inherent differences between native English speakers and nonnative speakers were paramount in all phases of development for the ESL e-Write prompts. ESL prompts must be broad enough to allow for cultural differences (e.g., focused on more universal experiences). ESL prompts must also be accessible to all levels of English-language learners, which means that the sentence structure and word choice must accommodate emerging English-language skills. Development for ESL e-Write prompts is a balance of posing a problem or task that is:

Appropriate for examinees in terms of age, education, and experience

Broad enough to minimize cultural differences

Simply worded enough to meet the needs of emerging English speakers

This balance extends to the development of ESL scoring guides or rubrics that focus on scoring responses from individuals who are at varying levels of ability with regard to the English language. One additional difference for ESL e-Write is related to course placement decisions. COMPASS e-Write development was based on requirements for a native-speaking population, which includes placement into standard postsecondary English courses. In the case of native speakers, postsecondary course placement decisions typically require a single decision point for placing students into credit-bearing English courses versus developmental English courses. ESL e-Write prompt development was directly related to the course placement requirements for a nonnative population. ESL course placement decisions typically require at least two decision points for placement within at least three levels of courses: Pre-Level 1 and Level 1 students, Level 2 students, and Level 3 students. An additional decision point may be needed if an ESL program includes Level 4 students. Along with the complexities of ESL course placement decisions, there are specific instructional needs related to these three or four levels of ESL ability. ESL e-Write prompts and rubrics were developed to provide additional levels of differentiation to assist ESL educators in meeting local instructional needs and ESL course placement requirements.

______________________________________________________________


How and why does COMPASS e-Write scoring differ from ESL e-Write scoring?

A significant distinction is to be made between COMPASS e-Write scoring and ESL e-Write scoring. On a general level, there has been misinterpretation of the 2–12 holistic score scale for COMPASS e-Write 2–12 and the 2–12 domain-level and overall score scales for ESL e-Write. Because of this misinterpretation, there are those who think the COMPASS e-Write 2–12 and ESL e-Write prompts can be used interchangeably. However, this is not the case. While the score scales for these different writing placement tests are the same, the meaning of the two scales is substantially different.

COMPASS e-Write Scoring

To fully explain differences in scoring, it is first important to describe different scoring models. When referring to the scoring model used for COMPASS e-Write 2–8 and 2–12 components, we refer to “holistic” scoring. Holistic scoring is based on reviewing features of writing and providing a score based on a general or holistic impression of the response. The primary scores assigned for COMPASS e-Write 2–8 and e-Write 2–12 are holistic scores. The purpose of the COMPASS e-Write scoring system is to assess a student’s performance given a first-draft writing situation. For the COMPASS e-Write 2–8 component, holistic scores are reported on a scale of 2–8. For the COMPASS e-Write 2–12 component, holistic scores are reported on a scale of 2–12. Each score point reflects a student’s ability to perform the skills identified in the respective COMPASS e-Write scoring guides. Responses are evaluated according to how well a student:

Formulates a clear and focused position on the issue defined in the prompt

Supports that position with reasons and evidence appropriate to the position taken and the specified concerns of the audience

Develops the argument in a coherent and logical manner

Expresses ideas using clear, effective language

A student obtains lower scores for not taking a position on the specified issue, not supporting that position with reasons and evidence, not developing the argument, or not expressing those ideas using clear, effective language. A student who does not respond to the prompt is assigned an “unscoreable” code rather than a score. Secondary COMPASS e-Write scores are subscores. Subscores are based on assigning a score associated with a specific feature of writing. Subscores provide users with some additional information regarding direct writing performance; subscores may be used to support instruction. COMPASS e-Write 2–8 and 2–12 subscores align with the following scoring rubric scales: subscores of 1–4 for COMPASS e-Write 2–8 and subscores of 1–6 for COMPASS e-Write 2–12. The subscores for both COMPASS e-Write 2–8 and 2–12 components are:

Focus

Content

Organization

Style

Conventions

Please note that the primary holistic scores and the secondary subscores are assigned independently, based on the respective holistic scoring rubrics and the respective subscore rubrics shown on the pages that follow. That is, while there is a relationship between the holistic scores and the subscores in terms of general features of writing, the subscores for the COMPASS e-Write components are not used to derive the holistic score. Separate scoring occurs for the holistic score versus the subscores.


The original COMPASS e-Write 2–8 direct writing assessment was introduced in 2001. Responses to these prompts are scored using a 4-point holistic scoring rubric but are reported on a 2–8 scale so that the reported scores align with the original scoring model of two rater scores. The five COMPASS e-Write 2–8 subscores are reported on a 1–4 score scale. In 2006, ACT introduced three prompts for COMPASS e-Write 2–12, and in 2012 three more prompts were added to the COMPASS system. Responses to these prompts are scored using a 6-point holistic scoring rubric but are reported on a 2–12 score scale, again so that the reported scores align with the original scoring model of two rater scores. The COMPASS e-Write 2–12 subscores are reported on a 1–6 score scale.

The COMPASS e-Write 2–8 holistic scoring rubric and the COMPASS e-Write 2–12 holistic scoring rubric are shown below, along with the subscore rubrics. Please note that for both COMPASS e-Write 2–8 and COMPASS e-Write 2–12, the holistic rubrics present each rubric score level, its holistic score description, and the corresponding reported holistic score.
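To make the relationship between rubric levels and reported scores concrete, the short Python sketch below treats the reported score as the sum of two rater scores at the rubric level, which is one way to read the doubled scale described above: matching ratings yield the even reported scores shown in Tables 1 and 2 below, and adjacent ratings yield the odd scores in between. The function name and the summing rule are illustrative assumptions only, not ACT's published scoring procedure (operational scoring also involves the automated engine discussed later in this document).

```python
def reported_score(rater1_level: int, rater2_level: int, rubric_max: int = 4) -> int:
    """Illustrative mapping from rubric levels to the reported e-Write scale.

    Assumption (not ACT's published algorithm): the reported score is the sum of
    two rater scores, so a 4-point rubric yields the 2-8 scale and a 6-point
    rubric yields the 2-12 scale; identical ratings give the even scores shown
    in Tables 1 and 2, and adjacent ratings give the odd scores in between.
    """
    for level in (rater1_level, rater2_level):
        if not 1 <= level <= rubric_max:
            raise ValueError(f"rubric levels must fall between 1 and {rubric_max}")
    return rater1_level + rater2_level

# Both raters assign level 3 on the 4-point (2-8) rubric: reported score 6
print(reported_score(3, 3, rubric_max=4))
# Raters split between levels 3 and 4 on the 6-point (2-12) rubric: reported score 7
print(reported_score(3, 4, rubric_max=6))
```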

Table 1 Description of COMPASS e-Write 2–8 Holistic Rubric & Score Scale

Rubric Score Level 1 (e-Write Score Reported: 2)
The response shows an inadequately developed sense of purpose, audience, and situation. Although the writer attempts to address the topic defined in the prompt, the response displays more than one of the following significant problems: Much of the style and language may be inappropriate for the occasion; focus may be unclear or unsustained; support is very minimal; sentences may be poorly constructed; word choice may be imprecise; or there may be many errors in usage and mechanics.

e-Write Score Reported: 3
The response reflects some characteristics of a Level 2 response and some elements of a Level 4 response.

Rubric Score Level 2 (e-Write Score Reported: 4)
The response shows a partially developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and attempts to support that position, but there may be little elaboration or explanation. Focus may be unclear and not entirely sustained. Some effort to organize and sequence ideas is apparent, but organization may lack coherence. A limited control of language is apparent: word choice may be imprecise; sentences may be poorly constructed or confusing; and there may be many errors in usage and mechanics.

e-Write Score Reported: 5
The response reflects some characteristics of a Level 4 response and some elements of a Level 6 response.

Rubric Score Level 3 (e-Write Score Reported: 6)
The response shows a developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt with some elaboration or explanation. Focus is clear and generally maintained. Organization is generally clear. A competency with language is apparent: word choice and sentences are generally clear though there may be some errors in sentence structure, usage, and mechanics.

e-Write Score Reported: 7
The response reflects some characteristics of a Level 6 response and some characteristics of a Level 8 response.

Rubric Score Level 4 (e-Write Score Reported: 8)
The response shows a thoughtful and well-developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt with well-developed elaboration or explanation. Focus is clear and consistently maintained. Organization is unified and coherent. Good command of the language is apparent: word choice is precise; sentences are well structured and varied; and there are few errors in usage and mechanics.


Subscores for COMPASS e-Write 2–8

The purpose of the subscores for COMPASS e-Write is to provide additional information to students and institutions concerning students' strengths and weaknesses. ACT has developed five subscores for COMPASS e-Write 2–8, each using a 4-point scoring model. Descriptions of these score points are presented in tables 1a through 1e. For the human scoring, each response is read by one trained rater who assigns subscores in each of these five areas. Each score point on the 1–4 scale reflects a student's ability to perform the specific skills identified in the scoring guide for that particular writing domain.

Table 1a COMPASS e-Write 2–8 Focus Domain

Subscore Description

1 The writing is not sufficient to maintain a point of view with any clarity or consistency.

2 The main idea(s) and point of view are generally maintained, although the writing may ramble in places and there may be digressions from the main points.

3 The main idea(s) and point of view are maintained throughout the writing.

4 The focus is sharp and is clearly and consistently maintained throughout the writing.

Table 1b COMPASS e-Write 2–8 Content Domain

Subscore Description

1 Support for ideas is minimal; specific details are lacking; the writer does not adequately engage with the topic.

2 Some support is provided for the position taken; a few reasons (one to three) may be given without much elaboration beyond two or three sentences for each reason; a main impression may be one of rather simple and general writing.

3 Support for the position is rather elaborated and detailed (three to four reasons) in quite well-developed paragraphs; specific examples may be given, but they are sometimes not well selected. Due to a lack of clarity in places, the reader sometimes may be confused. Development may be a bit repetitious.

4 Support for the position is elaborated in well-developed paragraphs; specific details and examples, sometimes from personal experience, are used. The writing gives a sense of completeness because the topic is quite thoroughly covered.

Table 1c COMPASS e-Write 2–8 Organization Domain

Subscore Description

1 The writing is so minimal that little or no organization is apparent; there is no introduction, body, or conclusion; few or no transitions are used.

2 A simple organizational structure is apparent; there is usually some introduction, if only a sentence or two; the introduction may not adequately introduce the topic because the main point(s) is not presented; transitions may be lacking, confusing or obvious (e.g., "first," "second"); the overall effect may be one of "listing" with several supporting ideas given but with little or no elaboration.

3 An organizational structure is apparent, with a (usually) well-defined introduction, body, and conclusion. The topic or position taken may not be completely clear in the introduction; transitions are usually used to show relationships between ideas.

4 The organization is unified and coherent with a well-developed introduction, body, and conclusion; sentences within paragraphs flow logically, and transitions are (generally) used consistently.


Table 1d COMPASS e-Write 2–8 Style Domain

Subscore Description

1 Language use is extremely simple; several sentences may be fragmented and confusing; words may be inaccurate or missing.

2 Language use is quite simple; more than one sentence may be fragmented or confusing; a few words may be inaccurate or missing.

3 Language use shows good control; sentences are clear, correct, and somewhat varied; word choice is appropriate and accurate.

4 Language use is interesting and engages the reader; sentences are varied in length and structure; word choice shows variety and is precise.

Table 1e COMPASS e-Write 2–8 Conventions Domain

Subscore Description

1 The writing has errors of many kinds that may interfere with meaning, for example: sentence fragments, sentence splices, subject/verb agreement, plurals, inaccurate or missing words, punctuation, spelling.

2 The writing may have errors that distract the reader, but they do not usually interfere with meaning (e.g., sentence fragments and splices, punctuation, missing words, spelling, etc.).

3 Errors are relatively minor, such as a few words spelled inaccurately, missing apostrophes and commas, etc.

4 Errors are infrequent and minor, such as occasional spelling inaccuracies, missing commas, etc.


Table 2 Description of COMPASS e-Write 2–12 Holistic Rubric & Score Scale

Rubric Score Level 1 (e-Write Score Reported: 2)
The response shows an inadequately developed sense of purpose, audience, and situation. These responses show a failed attempt to engage the issue defined in the prompt, and the response displays more than one of the following significant problems: focus on the stated position may be unclear or unsustained; support is lacking or not relevant; much of the style and language may be inappropriate for the occasion, with a very poor control of language: sentences may be poorly constructed and incomplete, word choice may be imprecise, or there may be so many severe errors in usage and mechanics that the writer's ideas are very difficult to follow.

e-Write Score Reported: 3
The response reflects some characteristics of a Level 2 response and some elements of a Level 4 response.

Rubric Score Level 2 (e-Write Score Reported: 4)
The response shows a poorly developed sense of purpose, audience, and situation. While the writer takes a position on the issue defined in the prompt, the response shows significant problems in one or more of the following areas, making the writer's ideas often difficult to follow: focus on the stated position may be unclear or unsustained; support may be extremely minimal; organization may lack clear movement or connectedness; much of the style and language may be inappropriate for the occasion, with a weak control of language: sentences may be poorly constructed or incomplete, word choice may be imprecise, or there may be a pattern of errors in usage and mechanics that significantly interfere with meaning.

e-Write Score Reported: 5
The response reflects some characteristics of a Level 4 response and some characteristics of a Level 6 response.

Rubric Score Level 3 (e-Write Score Reported: 6)
The response shows a partially developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and attempts to support that position, but with only a little elaboration or explanation. The writer maintains a general focus on the stated position, with minor digressions. Organization is clear enough to follow without difficulty. A limited control of language is apparent: word choice may be imprecise, sentences may be poorly constructed or confusing, and there may be numerous errors in usage and mechanics.

e-Write Score Reported: 7
The response reflects some characteristics of a Level 6 response and some elements of a Level 8 response.

Rubric Score Level 4 (e-Write Score Reported: 8)
The response shows a developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and supports that position with some elaboration or explanation. Focus on the stated position is clear and generally maintained. Organization is generally clear. A competency with language is apparent: word choice and sentence structures are generally clear and appropriate, though there may be some errors in sentence structure, usage, and mechanics.

e-Write Score Reported: 9
The response reflects some characteristics of a Level 8 response and some characteristics of a Level 10 response.

Rubric Score Level 5 (e-Write Score Reported: 10)
The response shows a well-developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and supports that position with moderate elaboration or explanation. Focus on the stated position is clear and consistent. Organization is unified and coherent. A command of language is apparent: word choice and sentence structures are generally varied, precise, and appropriate, though there may be a few errors in sentence structure, usage, and mechanics.

e-Write Score Reported: 11
The response reflects some characteristics of a Level 10 response and some characteristics of a Level 12 response.

Rubric Score Level 6 (e-Write Score Reported: 12)
The response shows a thoughtful and well-developed sense of purpose, audience, and situation. The writer takes a position on the issue defined in the prompt and supports that position with extensive elaboration or explanation. Focus on the stated position is sharp and consistently maintained. Organization is unified and coherent. Outstanding command of language is apparent: word choice is precise, sentences are well structured and varied, and there are few errors in usage and mechanics.


Subscores for COMPASS e-Write 2–12

The purpose of the subscores for COMPASS e-Write is to provide additional information to students and institutions concerning students' strengths and weaknesses. ACT has developed five subscores for COMPASS e-Write 2–12, each using a 6-point scoring model. Descriptions of these score points are presented in tables 2a through 2e. For the human scoring, each response is read by one trained rater who assigns subscores in each of these five areas. Each score point on the 1–6 scale reflects a student's ability to perform the specific skills identified in the scoring guide for that particular writing domain.

Table 2a COMPASS e-Write 2–12 Focus Domain

Subscore Description

1 The writing is not sufficient to maintain a point of view with any clarity or consistency. Focus is unclear due to one or more of the following reasons: the response is too short to provide sufficient evidence of focus; there is a lack of a stated position; digressions; or confusing language.

2 Focus is difficult to judge due to: the response being too short to provide sufficient evidence of focus, digressions that do not lead back to the stated position, or confusing language.

3 The main idea(s) and point of view are generally maintained. The writer maintains a general focus on the stated position; digressions usually lead back to the stated position.

4 The main idea(s) and point of view are maintained. The writer maintains a generally clear focus on the stated position; minor digressions eventually lead back to the stated position.

5 The main idea(s) and point of view are well maintained throughout the writing. The writer maintains a clear focus on the stated position.

6 The focus on the stated position is sharp and is clearly and consistently maintained throughout the writing.

Table 2b COMPASS e-Write 2–12 Content Domain

Subscore Description

1 Support for ideas is extremely minimal or absent; specific details are lacking or not relevant; the writer does not adequately engage with the topic.

2 The writer supports ideas with extremely minimal elaboration; support may consist of unsupported assertions.

3 Only a little support is provided for the position taken; a few reasons may be given without much elaboration beyond one or two sentences for each reason; a main impression may be one of rather simple and general writing.

4 Support for the position is somewhat elaborated and detailed in well-developed paragraphs; specific examples may be given, but they are sometimes not well selected. Development may be a bit repetitious.

5 Support for the position is moderately elaborated in well-developed paragraphs; relevant, specific details and varied examples, sometimes from personal experience, are used. Development is clear, precise, and thorough.

6 Support for the position is extensively elaborated in well-developed and logically precise paragraphs; relevant, specific details and varied examples, sometimes from personal experience, are used. The writing gives a sense of completeness because the topic is quite thoroughly covered.


Table 2c COMPASS e-Write 2–12 Organization Domain

Subscore Description

1 The writing is so minimal that little or no organization is apparent; there is no introduction, body, or conclusion; few or no transitions are used.

2 Organization may lack clear movement or connectedness; paragraphs may not be used; transitional words or phrases are rarely used.

3 Organization is clear enough to follow without difficulty; the introduction and conclusion, if present, may be undeveloped; transitions may be lacking, confusing or predictable (e.g., "first," "second," etc.); the overall effect may be one of "listing" with several supporting ideas given but with little or no elaboration.

4 Organization is generally clear; introduction and conclusion are appropriate; some transitions show relationships among ideas and are usually appropriate.

5 Organization is unified and coherent; introduction and conclusion are developed; ideas show a progression or appropriate transitions show relationships among ideas.

6 The organization is unified and coherent, with a well-developed introduction, body, and conclusion; sentences within paragraphs flow logically, ideas show a clear progression, and effective transitions are used consistently and appropriately, clearly showing relationships among ideas.

Table 2d COMPASS e-Write 2–12 Style Domain

Subscore Description

1 Very poor control of language is apparent: several sentences may be fragmented and confusing; words may be inaccurate or missing.

2 Weak control of language is apparent: sentence structures are often flawed and incomplete, as several sentences may be fragmented and confusing; word choice is simple and may be incorrect, imprecise, or vague.

3 A control of language is apparent: more than one sentence may be fragmented or confusing; a few words may be inaccurate or missing, but word choice is usually appropriate; phrasing may be vague or repetitive.

4 A competency with language is apparent: sentences are clear, correct, and somewhat varied; word choice is appropriate and accurate.

5 A command of language is apparent: sentence structures are usually varied, and word choice is usually varied, specific, and precise.

6 Language use is interesting and engages the reader: an outstanding command of the language is apparent; sentences are varied in length and structure; word choice is varied, specific, and precise.


Table 2e COMPASS e-Write 2–12 Conventions Domain

Subscore Description

1 The writing has severe errors of many kinds that interfere with meaning, for example: sentence fragments, sentence splices, subject/verb agreement, plurals, inaccurate or missing words, punctuation, spelling.

2 The writing has a pattern of errors that may significantly interfere with meaning, for example: sentence fragments, sentence splices, subject/verb agreement, plurals, inaccurate or missing words, punctuation, spelling.

3 The writing may have numerous errors that distract the reader, but they do not usually interfere with meaning (e.g., sentence fragments and splices, punctuation, missing words, spelling, etc.).

4 Some errors in grammar, usage and mechanics may be apparent, such as a few words spelled inaccurately, missing apostrophes and commas, etc., and they may distract and occasionally interfere with meaning.

5 A few errors in grammar, usage and mechanics may be apparent, such as occasional spelling inaccuracies, missing commas, etc., and they rarely distract or interfere with meaning.

6 A few errors in grammar, usage and mechanics may be apparent, such as occasional spelling inaccuracies, missing commas, etc., and they do not distract or interfere with meaning.

While COMPASS e-Write 2–12 prompts have been added to the COMPASS system, the COMPASS e-Write 2–8 prompts originally implemented within COMPASS are also available to all users. Given these two different score scales, COMPASS users can choose to continue using the 2–8 score scale or shift local direct writing assessment to the 2–12 score scale. ACT intends to integrate additional COMPASS e-Write 2–12 prompts into the COMPASS system in the future.

ESL e-Write Scoring

In June 2006, ACT also introduced ESL e-Write 2–12. Responses to these prompts are scored using a 6-point analytic scoring rubric. This direct writing component was developed to address two levels of ESL testing needs—analytic scores to provide ESL instruction information and an overall score to use for ESL course placement. Unlike COMPASS e-Write, analytic scoring is the primary scoring model used for ESL e-Write, and all analytic scores are reported on a 2–12 score scale. Analytic scoring is based on reviewing features of writing and assigning a score that is directly associated with that feature. Analytic scoring provides more specific and, potentially, diagnostic information regarding specific writing strengths and weaknesses. The analytic scores for ESL e-Write are:

Development

Organization

Focus

Language Use

Mechanics


Since the analytic scores are the primary scores assigned for ESL e-Write, these scores are used to compute the overall ESL e-Write placement score. The ESL e-Write placement score is calculated using the assigned analytic scores; it is not a holistic score assigned using a holistic scoring rubric. In addition, based on recommendations from ESL educators, differential weighting is applied to the five ESL writing domains. The purpose of the ESL e-Write scoring system is to assess a student’s performance given a first-draft writing situation. ESL students are asked to provide a single writing sample in response to a specific prompt. ESL e-Write requires students to:

Take a position about a given issue.

Support the position with relevant reasons, examples, and details.

Organize and connect ideas in a clear and logical manner, maintaining a consistent focus on the main ideas throughout.

Express those ideas using correct grammar, usage, and mechanics.

ESL e-Write is designed to elicit written responses that demonstrate an ESL student's ability to perform these skills. The writing task is defined by a single prompt that describes a common aspect of everyday life that is of general interest. Students are asked to state a personal opinion about this aspect of everyday life and to support their opinion by giving reasons, examples, and details. Each prompt is presented in language that is simple and accessible for all ESL students, including beginning English speakers. The prompts do not require students to have background or specialized knowledge in order to successfully respond to the prompt.

ESL e-Write differs from the COMPASS e-Write prompts in that the method for deriving the overall score is based on a weighting and summing of the analytic scores. The following sections describe the analytic scoring model and the overall score calculation for ESL e-Write.

ESL e-Write Analytic Score Scales

The purpose of the ESL e-Write scoring system is to assess a student’s performance of the required skills given a timed, first-draft writing situation. For ESL e-Write, ACT worked with ESL experts to develop a 6-point, analytic rubric, with analytic scores presented on a scale of 2–12. The purpose of the analytic score scales is to provide skill domain information to students and institutions about students’ strengths and weaknesses. ACT developed five ESL writing domains, each using a 6-point scale. Each score point on the 1–6 scale reflects a student’s ability to perform the specific skills identified in the scoring guide for that particular domain. The following provides general descriptions of each of the ESL writing domains.

Development: the reasons, examples, and details that are used to support the stated or implied position

Focus: the clarity and consistency with which the main idea(s) or point of view is maintained

Organization: the clear, logical sequencing of ideas and the use of effective transitional devices to show relationships among ideas

Language Use: variety and correctness of sentence structures and word choices, and control of grammar and usage (e.g., word order, word forms, verb tense, and agreement)

Mechanics: errors in spelling, capitalization, and punctuation in relation to meaning

Detailed descriptions of the score points associated with ESL writing domains are presented in tables 3a through 3e.


Table 3a ESL e-Write Development Domain

Score Description

1 Development is severely limited, and writing may be partially unrelated to the topic.

2 Development is limited and may include excessive repetition of prompt ideas and/or consistently simple ideas.

3 The topic is developed using very few examples, which may be general and somewhat repetitious, but they are usually relevant to the topic.

4 The topic is developed using reasons supported by a few examples and details.

5 The topic is developed using reasons supported by some specific examples and details. Evidence of critical thinking and/or insight may be displayed.

6 The topic is developed using sound reasoning, supported by interesting, specific examples and details in a full, balanced response. Evidence of critical thinking and/or insight may be displayed. Opposing viewpoints may be considered and/or refuted.

Table 3b ESL e-Write Focus Domain

Score Description

1 Focus cannot be judged due to the brevity of the response.

2 Focus may be difficult to judge due to the brevity of the response; any digressions generally do not lead back to the task.

3 Focus is usually maintained on the main idea(s); any digressions usually lead back to the task.

4 Focus is adequately maintained on the main idea(s); any minor digressions lead back to the task.

5 Focus is maintained clearly on the main idea(s).

6 A sharp focus is maintained consistently on the main idea(s).

Table 3c ESL e-Write Organization Domain

Score Description

1 Little or no organizational structure is apparent.

2 The essay shows an understanding of the need for organization. Transitional words are rarely if ever used. There is minimal evidence of a beginning, middle and end to the essay.

3 Some organization may be evident. Transitions, if used, are generally simple and predictable. The introduction and conclusion, if present, may be undeveloped.

4 The essay demonstrates little evidence of the logical sequencing of ideas, but there is an adequate organizational structure and some transitions are used. There is an underdeveloped introduction and there may be no conclusion.

5 The essay demonstrates sequencing of ideas that is mostly logical, and appropriate transitions are used to show relationships among ideas. There is a somewhat developed introduction and there may be a brief conclusion.

6 The essay demonstrates logical sequencing of ideas, and transitions are used effectively to show relationships among ideas. There is a well-developed introduction and the essay may have a brief but clear conclusion.


Table 3d ESL e-Write Language Use Domain

Score Description

1 Sentences demonstrate little understanding of English word order, and word choice is often inaccurate. There are numerous errors in grammar and usage that frequently impede understanding.

2 Sentence structure is simple, with some errors evident in word order. Word choice is usually accurate but simple. Language control is inconsistent or weak, with many errors in grammar and usage, often making understanding difficult.

3 Most sentences are complete although some may not be correct or clear. Word choice is sometimes appropriate. Although a few errors may impede understanding, basic language control is evident and meaning is sometimes clear.

4 Some sentence variety is present, but some sentences may not be entirely correct or clear. Word choice is appropriate and varied. Although errors may be frequent, language control is adequate and meaning is usually clear.

5 A variety of kinds of sentences are present and are usually correct. Word choice is varied and occasionally specific. Overall, language control is good and meaning is clear.

6 A wide variety of kinds of sentences are present and usually correct. Word choice is varied and specific. Although there may be a few minor errors, language control is competent and meaning is clear.

Table 3e ESL e-Write Mechanics Domain

Score Description

1 Errors are frequently severe and obscure meaning, or mechanics cannot be judged due to the brevity of the response.

2 Errors often distract and/or frequently interfere with meaning, or mechanics may be difficult to judge due to the brevity of the response.

3 Errors sometimes distract and they occasionally interfere with meaning.

4 Errors usually do not distract or interfere with meaning.

5 Some errors are evident but they do not distract or interfere with meaning.

6 Only minor errors, if any, are present and they do not distract or interfere with meaning.

ESL e-Write Overall Score

In terms of the original ESL e-Write development effort, ACT determined that ESL instruction requires additional feedback on student-level performance; the analytic scores provide the cornerstone for this prescriptive information. The ESL analytic score scales are intended to provide writing skill domain information to students, instructors, and institutions concerning relative strengths and weaknesses. However, institutions typically require an overall score that can be used for general placement purposes in either standard English or ESL courses. Because the analytic scoring model for ESL e-Write differs from the holistic scoring model used for COMPASS e-Write placement scores, the method for achieving the overall score for ESL e-Write differs from the COMPASS e-Write components. As described previously, the ESL e-Write scoring model is based on analytic scoring, while the COMPASS e-Write scoring model is based on holistic scoring. Given this difference, the ESL e-Write overall score is based on the analytic scores assigned.


In addition, given ACT research in the areas of ESL curriculum, instruction, and assessment, especially as this relates to a direct writing component, some of the ESL writing domains carry more weight than others in terms of ESL instructional needs and students' abilities to demonstrate language proficiency. To develop this weighting schema, ACT Test Development staff worked in close association with ESL practitioners and researchers who defined, ranked, and then weighted the ESL writing domains according to what they felt were the most important indicators of ESL writing skills. The weighting of the analytic scores for ESL e-Write was based on those domains that have the most meaning in terms of describing and assessing an ESL student's writing abilities.

For example, when comparing the Focus and Mechanics domains to the remaining three ESL writing domains, there are differences in the degree to which these domains actively impact meaningful writing performance. In the Mechanics domain, no language skills are being assessed, only capitalization, punctuation, and spelling. The Focus domain only measures whether a student can maintain a point of view. These two domains have much less impact on what a student can produce in an essay or on what a student is capable of demonstrating. ESL writing research points to capability with Language Use, Development, and to a lesser extent, Organization, as having a far greater impact on how well a student will do in college-level classes.

In the ESL e-Write rubric, the descriptions of what is being measured in Language Use and Development are fuller and describe skills that are critical factors in ESL language learning. The Language Use domain measures sentence structure, word choice and word form, verb tenses, and verb agreement: all skills that are integral to writing ability. The Development domain measures all the reasons, explanations, and details that students use in describing their opinion. Development is at the core of good writing; it is critical to assess ability related to Development because this domain requires students to demonstrate their thinking skills, their ability to reason, and their ability to produce meaningful, cohesive writing, all essential and general skills needed to succeed in any college class.

Table 4 lists each of the ESL e-Write analytic domains and outlines the weighting model used to derive the overall score; the overall score formula immediately follows the table.

Table 4

ESL e-Write Overall Score Weighting

Analytic Score Domains and Analytic Score Weights

Development (35 percent): the reasons, examples, and details that are used to support the stated or implied position
Focus (10 percent): the clarity and consistency with which the main idea(s) or point of view is maintained
Organization (15 percent): the clear, logical sequencing of ideas and the use of effective transitional devices to show relationships among ideas
Language Use (35 percent): variety and correctness of sentence structures and word choices, and control of grammar and usage (e.g., word order, word forms, verb tense and agreement)
Mechanics (5 percent): errors in spelling, capitalization, and punctuation in relation to meaning

Overall Score: 100 percent

To achieve the overall score, the ESL e-Write analytic scores are weighted, and the weighted analytic scores are summed. The reported overall score then ranges from 2 to 12 in increments of one point. The general formula used to compute the overall ESL e-Write score is shown in Figure 3.

Overall ESL e-Write Score = (Development × 0.35) + (Focus × 0.10) + (Organization × 0.15) + (Language Use × 0.35) + (Mechanics × 0.05)

Figure 3 Computing Overall ESL e-Write Score
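For illustration only, the computation in Figure 3 can be expressed as a short script. This is a minimal sketch in Python rather than ACT’s implementation; the weights come from Table 4, and the rounding step is an assumption, since the FAQ states only that the reported overall score ranges from 2 to 12 in increments of one point.

# Minimal sketch of the weighted overall-score computation shown in Figure 3.
# The weights come from Table 4; rounding to a whole-number score is an assumption.

WEIGHTS = {
    "Development": 0.35,
    "Focus": 0.10,
    "Organization": 0.15,
    "Language Use": 0.35,
    "Mechanics": 0.05,
}

def overall_esl_score(analytic_scores):
    """analytic_scores maps each domain name to its 2-12 analytic score."""
    weighted_sum = sum(WEIGHTS[domain] * analytic_scores[domain] for domain in WEIGHTS)
    return round(weighted_sum)  # reported overall scores are whole numbers from 2 to 12

# Example: weighting mid-range analytic scores produces an overall score of 7.
print(overall_esl_score({
    "Development": 7, "Focus": 8, "Organization": 6, "Language Use": 7, "Mechanics": 8,
}))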

______________________________________________________________

Why is it inappropriate for native speakers to be tested with ESL tests?

Traditional ESL curricula include Listening, Speaking, Reading, and Writing components. Just as there are ESL-specific courses designed for English language learners to focus on the needs of this portion of the student population, there are ESL-specific tests designed to align with ESL curricula. It would not be appropriate to send native English speakers to ESL courses; neither is it appropriate to test native English speakers with ESL tests. The COMPASS program offers four ESL Placement Tests that align with traditional ESL course offerings: ESL Listening, ESL Reading, ESL Grammar/Usage, and ESL e-Write. The first three are multiple-choice tests; the last is the ESL-specific direct writing assessment.

Overall, ESL tests are not effective as measures of the literacy skills of native English speakers because native speakers’ strengths and weaknesses originate from very different sources in terms of knowledge, experience, and skill. For example, listening comprehension skills of native-speaking, low-level literacy students are generally much higher than their reading and writing skills. They have a native speaker’s experience as a listener, and thus the deficiencies they have may result from a lack of familiarity with the subject or vocabulary or the complexities involved when listening skills are used in combination with other skills (e.g., note-taking). With the exception of a minor focus on vocabulary, the ESL Listening Placement Test does not measure prior knowledge or listening in combination with other skills.

The ACT ESL Grammar/Usage and ESL Reading Placement Tests are also not well suited to native-speaking, low-level literacy students. The majority of low-level literacy students encounter problems of decoding written language, either in terms of phonetics, sight word vocabulary, or reading fluency. This is a problem of the basic translation of written symbols to either sounds or meaning. The ESL assessments presume a basic functional literacy level. In other words, the ESL Grammar/Usage and ESL Reading tests do not measure the ability to decode per se, but rather the familiarity of the student with the conventions and vocabulary of English. In the ESL Grammar/Usage module, knowledge of English word order, basic verb tenses, and prepositions is measured. These elements of English are familiar to native speakers as a result of hearing and speaking English all of their lives and would be useless to measure. In the ESL Reading module, the lower levels contain some items that present a single word (e.g., “cup”) and four pictures, one of which represents the word. Examinees are asked to choose the picture that matches the word. The ESL item measures not only the student’s ability to read the word “cup” but also whether the student knows what a cup is. ESL students may be able to read and pronounce English without being able to comprehend it. Low-level reading tests for native English speakers, on the other hand, measure the ability to decode the word and presume previous knowledge of its meaning. In a test for native-speaking students, therefore, we would not test the comprehension of the same vocabulary nor would we use the same distractors that are used in ESL tests. That is, the COMPASS Writing Skills Placement and COMPASS Reading Placement Tests were developed to address specific expectations for native English speakers.

ACT would not suggest that native speakers’ abilities in reading comprehension or in recognizing specific issues in written conventions could be appropriately assessed using the ESL tests. The same circumstances and examples outlined for the ESL Listening, ESL Grammar/Usage, and ESL Reading Placement Tests apply to ESL e-Write. There are basic expectations for native speakers regarding familiarity with the English language and its conventions that cannot exist for nonnative speakers, especially those just beginning to learn English. ESL e-Write prompts were developed to address widely varying English abilities and elicit writing by providing a task that is readily accessible to examinees based on relatively universal or global experiences. Given the inherent differences in the ESL e-Write prompts, the implementation of these prompts into the COMPASS system allows for routing ESL students who are at the advanced level from ESL e-Write into a COMPASS e-Write component. However, the COMPASS system does not allow for the routing of examinees from the COMPASS e-Write components into ESL e-Write. This was a conscious decision made to forestall the possibility of assessing native speakers with an ESL direct writing assessment.

With the introduction of ESL e-Write in June 2006, sites now have access to placement tests for critical ESL curricular components: ESL Listening, ESL Grammar/Usage, ESL Reading, and ESL e-Write. ACT strongly recommends that COMPASS users review local postsecondary placement practices to ensure that ESL students are being tested with the appropriate ESL instruments. One student-centered benefit of revisiting and reaffirming local practices is that testing ESL students with ESL instruments provides a fairer representation of ESL student ability. One institution-centered benefit is that sites can now more clearly differentiate between curriculum and instruction needs for native English speakers versus nonnative speakers. Sites that have traditionally been assessing all students with COMPASS e-Write now have the opportunity to examine test results and instructional needs specific to the two discrete populations (i.e., native speakers versus nonnative speakers), which allows for a more in-depth, institutionally meaningful, and fiscally responsible level of instructional planning.

______________________________________________________________

How are COMPASS and ESL e-Write prompts and rubrics developed?

For COMPASS e-Write, ACT Test Developers who have significant experience in prompt development at all age and grade levels develop the direct writing tasks. For ESL e-Write, prompts are developed by in-house ESL Test Development experts who work in close association with external ESL practitioners. For both COMPASS and ESL e-Write, ACT staff develops approximately twice the number of prompts that will eventually be used to allow for loss during reviews and field testing. Prompts are developed to elicit a depth and breadth of written responses. The scoring rubrics describe the features of writing based on specific scoring models and specific score categories. Prompts under development undergo multiple internal reviews. They also undergo external reviews with content experts (i.e., English faculty for COMPASS e-Write prompts and ESL faculty for ESL e-Write prompts). COMPASS and ESL e-Write prompts also undergo fairness reviews to ensure that prompts do not inappropriately advantage or disadvantage any groups (e.g., race/ethnicity, gender). The e-Write prompt reviews include reviewers representing specific minority populations (e.g., Asian, African American, Hispanic, Native American). For the ESL e-Write component, fairness reviews also include review by ESL educators to focus on any bias issues related to this population. Both soundness and fairness reviewers are individuals who have established credentials and expertise in the specific area of interest. Fairness reviewers are individuals who are also actively involved in issues related to the specific groups they represent. Once prompts are approved through the external content/soundness reviews and the fairness reviews, they are moved to the next level of development: the field test stage.

______________________________________________________________

How are COMPASS and ESL e-Write prompts field tested?

Prompts that have met all internal and external review requirements are field tested. Field tests are conducted at existing COMPASS sites (i.e., sites currently using COMPASS or ESL components); these sites enlist the aid of local students. ACT ensures that the participating field test sites are representative of the larger COMPASS client base and works to ensure that participating examinees are from diverse racial and ethnic backgrounds and represent various levels of ability. Field tests are conducted using an electronic field test interface that closely mirrors the actual COMPASS system so that the experience resembles the real e-Write testing experience to the greatest extent possible. In soliciting site-level participation, ACT offers incentives to sites; past incentives have included free COMPASS and ASSET units.

In soliciting participation, ACT makes it clear that scores from field tests will not be returned for individual students. The Standards for Educational and Psychological Testing (NCME, AERA, APA, 1999), the professional standards for educational assessment, clearly articulates testing organization responsibilities relative to providing scores to individuals and sites. Standard 5.12 specifically cautions against the return of scores that have not been fully analyzed and validated: “Scores should not be reported for individuals unless the validity, comparability, and reliability of such scores have been established.” Since the purpose of e-Write field tests is to allow analyses of scored responses to support decisions regarding prompts under development, returning scores for prompts that have not yet been approved for operational use would be contrary to established industry standards.

The e-Write field tests conducted in 2004 included the three COMPASS e-Write 2–12 prompts and the three ESL e-Write prompts that were integrated into the COMPASS Internet system in June 2006. In March 2012, ACT added three additional COMPASS e-Write 2–12 prompts for a total of six prompts; field testing of these prompts occurred in 2007 through 2010. Both the COMPASS e-Write 2–12 prompt field tests and the ESL e-Write field tests included two- and four-year institutions, colleges, and universities located in various regions of the United States.

COMPASS e-Write 2–12 Field Test Sites:
Midwest region (Indiana, Michigan, Minnesota, Missouri, Nebraska)
West region (Alaska, California)
Southwest region (Oklahoma)
Southeast region (Alabama, Arkansas, Georgia, North Carolina)

ESL e-Write Field Test Sites:
Midwest region (Illinois, Indiana, Iowa, Missouri, Nebraska, Ohio)
West region (California, Colorado, Oregon, Utah, Washington)
East region (Pennsylvania, Rhode Island, Maryland, New York)
Southwest region (Texas)
Southeast region (Florida, Georgia, Kentucky, North Carolina)

Postsecondary institutions that participated in both the COMPASS e-Write and ESL e-Write field tests ranged in size from small community college settings (e.g., fewer than 1,500 students) to larger state universities (e.g., more than 20,000 students). The total number of students tested for the COMPASS e-Write 2–12 field test was over 1,200; the total for the ESL e-Write field test was over 1,000. The examinees who participated represented various racial/ethnic groups, including African American, Hispanic/Latino/a, and Asian populations. Students who participated in the COMPASS e-Write field test ranged from developmental students to advanced writers, and students participating in the ESL e-Write field test ranged from beginning ESL to advanced ESL students.

______________________________________________________________

How are raters trained to score e-Write responses, and how are field test responses evaluated?

ACT Test Development staff and experienced ACT Scoring Center staff, with the assistance of external experts as needed, first conduct a range-finding session to identify responses for each prompt that exemplify specific score levels. These responses are used to construct sets of anchor papers (i.e., “true score” papers) and training sets. It is during range-finding that the first level of prompt evaluation occurs. While reviewing and discussing responses during range-finding, Test Development and Scoring Center staff may see emerging patterns of responses. It may be almost immediately apparent that while enough responses have been gathered, the responses are not appropriately distributed along the score scale. This would alert ACT staff that locating the requisite number of responses for anchor and training sets may be problematic, and this may indicate a problem with the prompt. This is similar to field test results for a multiple-choice item where statistics show that most students picked the wrong answer, indicating that the item is not working properly. It is during range-finding that staff may determine whether responses to a specific field test prompt will be scored at all.

Once range-finding is completed, the resulting materials are used to train ACT Scoring Center staff. Trained ACT staff score all field test responses for each prompt. The degree of accuracy in this hand-scoring effort is typically measured by the level of reliability readers can achieve with regard to preselected and prescored responses. It is during training that scoring leaders can identify those prompts where inter-rater reliability with prescored responses is high, indicating that readers score accurately in relation to papers that have been designated as “true score” responses. This is also a measure of whether or not a prompt is working properly. The culminating scoring process provides the final inter-rater reliability statistics. ACT Scoring Center staff often conducts debriefings subsequent to scoring to solicit feedback from the trained raters on a prompt’s ability to generate acceptable responses. This feedback is also invaluable in determining whether a prompt works properly. Finally, data from the human scoring effort are analyzed to further identify those prompts that can be scored with the greatest degree of accuracy and reliability. All of this information contributes to a decision to move an e-Write prompt toward operational implementation.

______________________________________________________________

What reliability evidence exists regarding human rater scoring (i.e., inter-rater reliability)?

Reliability is a critical component in analyzing scoring results and determining next steps for new writing prompts. Reliability in inter-rater (human versus human) scoring is typically reported in terms of exact agreement (e.g., two readers assign a score of 4) and adjacent agreement (e.g., one reader assigns a score of 4 and another reader assigns a score of 5). The exact-plus-adjacent rate is traditionally reported as the percent of exact agreement plus the percent of scores within one point of each other. In the case of the 2004 field tests, three COMPASS e-Write 2–12 prompts and three ESL e-Write 2–12 prompts were determined to yield acceptable levels of inter-rater reliability (both exact and adjacent) to warrant moving these prompts to the next stage in the COMPASS prompt implementation process: Vantage IntelliMetric™ scoring engine calibration.

Tables 5, 6, and 7 summarize scoring statistics, including inter-rater reliabilities, for COMPASS e-Write 2–8, COMPASS e-Write 2–12, and ESL e-Write. In these tables, mean score refers to the average of the sums of all pairs of raters for all responses to each prompt, and standard deviation describes the spread of those scores around the mean. Exact agreement refers to exact matches in the score assigned by the two raters, and the exact-plus-adjacent rate refers to scores from the two raters that differ by no more than one point. Please note that COMPASS e-Write 2–8 and COMPASS e-Write 2–12 are scored holistically, while ESL e-Write is scored using analytic scoring rubrics, with an overall score derived from a weighting and summing of the assigned analytic scores. This difference in scoring for the COMPASS e-Write components and ESL e-Write is described in detail elsewhere in this document.
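As a simple illustration of how exact and exact-plus-adjacent agreement rates are computed from two raters’ scores, consider the following Python sketch. The score pairs are invented for illustration and are not ACT field test data.

# Sketch: exact and exact-plus-adjacent agreement between two raters.
# The score lists below are invented for illustration only.

rater_1 = [4, 3, 5, 4, 2, 6, 4, 5]
rater_2 = [4, 4, 5, 3, 2, 6, 5, 5]

pairs = list(zip(rater_1, rater_2))
exact = sum(1 for a, b in pairs if a == b) / len(pairs)
within_one = sum(1 for a, b in pairs if abs(a - b) <= 1) / len(pairs)

print(f"Exact agreement: {exact:.0%}")                  # exact matches only
print(f"Exact + adjacent agreement: {within_one:.0%}")  # matches within one point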

Table 5 illustrates the scoring statistics for the COMPASS e-Write 2–8 prompts. These results show little variation in the mean scores or in the standard deviations for the distributions. The results indicate that the prompts were very similar in terms of overall performance and consistency of scoring. The means and standard deviations indicate that the six prompts are interchangeable and should be treated as equivalent forms of the assessment. The percent of exact agreement was also very consistent across prompts, ranging from 63 percent to 67 percent. These indices indicate that the raters were able to apply the scoring rubric in a very consistent manner.

Table 5 COMPASS e-Write 2–8 Prompt Holistic Scores Assigned by Trained Raters*

Prompt     Mean Score   Standard Deviation   Exact Agreement (%)   Exact + Adjacent Agreement (%)
COMP101    5.75         .79                  66                    100
COMP102    5.76         .78                  65                    100
COMP103    5.74         .75                  66                    100
COMP104    5.73         .80                  67                    100
COMP105    5.72         .74                  65                    100
COMP109    5.69         .74                  63                    100

* Based on the 4-point scoring rubric

Table 6 illustrates the scoring statistics for the COMPASS e-Write 2–12 prompts. There was no significant variation in the mean scores or in the standard deviations for the distributions. The results provided in table 6 once again indicate that the prompts were very similar in terms of overall performance and consistency of scoring. The means and standard deviations indicate that the six prompts are interchangeable and should be treated as equivalent forms of the assessment. The percent of exact agreement was also very consistent across prompts, ranging from 73 percent to 79 percent. These indices indicate that the raters were able to apply the scoring rubric in a very consistent manner.

Table 6

COMPASS e-Write 2–12 Prompt Holistic Scores Assigned by Trained Raters *

Prompt     Mean Score   Standard Deviation   Exact Agreement (%)   Exact + Adjacent Agreement (%)
COMP110    6.76         1.84                 79                    100
COMP114    6.62         1.89                 79                    100
COMP115    6.75         1.69                 79                    100
COMP116    6.56         1.79                 74                    100
COMP118    6.85         1.69                 73                    100
COMP120    6.70         1.77                 76                    100

* Based on the 6-point scoring rubric

Table 7 summarizes the technical results of the analytic scores assigned by trained raters for each domain for the three ESL prompts. Because the primary scoring model for the ESL prompts is based on the analytic scores, these are the categories represented within this table. There was some variation in the standard deviations for the analytic scores for the three prompts that is largely due to the expanded score scale and scores being more dispersed along the score scale. The results provided in table 7 indicate that the domain-level scoring for the ESL prompts was similar in terms of overall performance and consistency of scoring. The percent of exact agreement was consistent across analytic scores, ranging from 63 percent for Mechanics to 78 percent for Development. These indices indicate that the raters were able to apply the analytic scoring rubric in a consistent manner.

Table 7

ESL e-Write Prompt Analytic Scores Assigned by Trained Raters *

Domain          Mean Score   Standard Deviation   Exact Agreement (%)   Exact + Adjacent Agreement (%)
Development     6.64         2.26                 78                    100
Focus           6.96         2.19                 64                    100
Organization    5.75         2.22                 68                    100
Language Use    6.65         2.07                 66                    100
Mechanics       7.14         2.08                 63                    100

* Based on the 6-point, domain-specific scoring rubrics

As indicated previously, the statistics in tables 5 through 7 include the overall mean scores achieved during scoring and the corresponding standard deviations. The standard deviation describes the degree of dispersion of scores from the mean, or the amount of clustering of the scores around the mean score. Smaller standard deviations indicate that more scores occur close to the mean, and larger standard deviations indicate that scores are more dispersed.

The difference in standard deviations for COMPASS e-Write 2–8 prompts versus the COMPASS e-Write 2–12 prompts is related to differences in the overall score scales. That is, the 2–12 score scale simply has more score points and is, therefore, more likely to show greater dispersion in the scores (i.e., less clustering around the mean). The standard deviations for the ESL e-Write analytic scores are likewise larger because these are also based on a 2–12 scale. However, the dispersion of the ESL e-Write analytic scores is also somewhat higher due to differences in testing populations and resulting scores: ESL e-Write scores tend to be more distributed along the score scale continuum, while COMPASS e-Write scores are more clustered around the mean.

Overall, the mean scores, the standard deviations, and the levels of inter-rater reliability (exact and adjacent) achieved for the COMPASS e-Write and ESL e-Write direct writing prompts are consistent with results achieved for other direct writing programs. These results were deemed acceptable, and the writing prompts were allowed to move to the next stage of implementation, which included the calibration of the IntelliMetric scoring engine. Field tested prompts that did not attain satisfactory human rater scoring statistics did not move beyond the field test stage.

______________________________________________________________

How is Vantage Learning’s IntelliMetric automated scoring engine trained to score COMPASS and ESL e-Write responses?

Based on a review of field test data and ACT staff consensus that an e-Write prompt should move to the next stage of implementation, ACT submits the scoring rubric, field test responses, and associated scores for the selected prompts to Vantage Learning. This submission includes multiple representations of response scores along the scoring rubric score scale continuum, with a general target of approximately 300 responses per prompt for COMPASS e-Write and approximately 500 responses per prompt for ESL e-Write. Vantage uses the responses scored by trained raters to train the IntelliMetric scoring engine to recognize levels of responses based on the features of writing articulated within the rubric. This training process allows the scoring engine to calibrate the electronic scoring process (i.e., the application of a rubric) based on an analysis of responses that have been scored by human raters. Results from the Vantage IntelliMetric scoring are analyzed by ACT to verify and validate the electronic scoring. Once the electronic validity and reliability evidence are determined to be acceptable, writing prompts are moved to full operational status and are integrated into COMPASS.

Conceptually, the idea of training the IntelliMetric scoring engine is very similar to the way human raters are trained. For any human rater training, scorers are introduced to the scoring rubric, focusing on the descriptions of specific features of writing associated with the specific score points. Once the rubric has been thoroughly reviewed, anchor papers or exemplars are introduced to scorers. Specific features associated with a paper are highlighted as exemplifying a certain level of response. In addition to the anchor papers, other prescored responses are introduced as part of the training process. These responses are intended to be representative of a particular level of response and an associated score; however, these examples may not be as concrete as the anchor papers, thereby providing finer distinctions or parameters for scoring. Once scorers are led through the training sets, they are asked to score prescored practice and/or qualifying sets of responses to establish individual and collective scoring performance. This performance is typically measured in terms of reliability or agreement rate, which will likely include measures of both exact agreement and adjacent agreement (a score within one score point of the other assigned score). In most human training efforts, scorers must demonstrate a prescribed level of reliability or agreement to be allowed to continue scoring.

The IntelliMetric training model is comparable to this human scoring activity. IntelliMetric results from the training or calibration activity are analyzed against the originally assigned human scores. If IntelliMetric cannot attain a level of agreement that is equal to or better than the human inter-rater reliability, additional training or calibration must occur. If the required level of IntelliMetric reliability cannot be achieved with additional calibration, then ACT more fully explores the original human scoring and closely examines the prompt and responses to determine where the differences lie. If this difference cannot be corrected, the COMPASS or ESL e-Write prompt cannot move to operational status. The following URL provides links to various Vantage Learning reports related to IntelliMetric research and results: http://www.vantagelearning.com/school/research/intellimetric.html.

What evidence is there that automated scoring is as accurate and reliable as human scoring?

The IntelliMetric scoring engine relies on expert human rater scoring input to learn. IntelliMetric is trained to score essays much the same way as expert human raters are trained; the automated scoring engine is trained using a set of essays that have been scored by trained raters. This allows the scoring engine to recognize which elements of an essay written to a specific prompt are desirable. This training process is prompt specific: a unique training set is used, and a resulting unique IntelliMetric scoring model is developed. Before IntelliMetric scoring is approved for use, the scoring model is evaluated to certify its accuracy.

During the scoring engine training process, a portion of each essay set is withheld from the training set used to model scoring. These validation responses are scored by IntelliMetric and compared to the human expert scores. This provides a true comparison of blind scoring by IntelliMetric against scores provided by expert raters. Using this validation set to evaluate the accuracy of automated scoring versus human rater scoring, the means of the human scores and the IntelliMetric scores are compared. If the results do not differ significantly based on a t-test for difference in means, agreement rates are calculated. An exact agreement rate refers to the proportion of essays for which the human rater score and the IntelliMetric score were identical, while adjacent agreement refers to the proportion of scores that were within one point of each other on a 6-point scale. Any essays whose scores are found to be discrepant (more than one point apart) are also noted and reviewed. Finally, the Pearson correlation between the scores is calculated; the higher the positive correlation between the two sets of scores, which can range from 0 to 1, the more closely the scores are associated with each other.

Vantage Learning research indicates that average human inter-rater reliability for 6-point scoring rubrics tends to be about 95 percent exact plus adjacent agreement. These reports further indicate that IntelliMetric versus human rater reliability for 6-point rubrics is in the 97 to 99 percent range. Vantage literature that describes system capabilities and limitations indicates that while IntelliMetric does partially rely on artificial intelligence, it does not think. IntelliMetric is a tool used to apply the thinking of experts. That is, IntelliMetric learns to score in an extremely consistent manner using the responses and scored data that have been used to train the scoring engine. The following Vantage Learning informational materials and reports relate to IntelliMetric research and results (a brief computational sketch of the validation comparison described above follows this list):

“IntelliMetric: Frequently Asked Questions”: http://www.vantagelearning.com/school/products/intellimetric/faq.html

“IntelliMetric™ Scoring Accuracy Across Genres and Grade Levels”: http://www.vantagelearning.com/docs/intellimetric/IM_ReseachSummary_IntelliMetric_Accuracy_Across_Genre_and_Grade_Levels.pdf

“An Evaluation of the IntelliMetric℠ Essay Scoring System”: http://www.vantagelearning.com/docs/articles/Intel_MA_JTLA_200603.pdf
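The validation comparison described above (a held-out set of essays, a check that mean scores do not differ, agreement rates, a review of discrepant scores, and a Pearson correlation) can be sketched in Python as follows. The score lists are invented for illustration, and the use of a paired t-test is an assumption; the FAQ says only that a t-test for difference in means is applied.

# Sketch of the validation comparison: IntelliMetric scores on a held-out validation set
# versus expert human scores. The score lists are invented; the paired t-test is an assumption.

from scipy.stats import pearsonr, ttest_rel

human = [4, 5, 3, 6, 4, 5, 2, 4, 5, 3]
engine = [4, 5, 4, 6, 4, 4, 2, 4, 5, 3]

t_stat, p_value = ttest_rel(human, engine)                      # difference in means
exact = sum(h == m for h, m in zip(human, engine)) / len(human)
adjacent = sum(abs(h - m) <= 1 for h, m in zip(human, engine)) / len(human)
discrepant = [(h, m) for h, m in zip(human, engine) if abs(h - m) > 1]  # flagged for review
r, _ = pearsonr(human, engine)

print(f"t-test p-value: {p_value:.2f}")
print(f"Exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
print(f"Discrepant score pairs: {discrepant}")
print(f"Pearson correlation: {r:.2f}")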

Tables 5 through 7 provided inter-rater reliabilities for COMPASS e-Write components and ESL e-Write. Again, these reliability statistics were based on human versus human scoring. The sections that follow provide details regarding reliabilities achieved for all e-Write components in terms of human versus IntelliMetric scoring. The results listed in the sections that follow and in tables 8 through 10 describe the degree of reliability when IntelliMetric scores were compared to trained rater scores.

Study Results for COMPASS e-Write 2–8 Score Scale and 1–4 Scoring Rubric

The rate of agreement between IntelliMetric scores and scores assigned by expert raters was originally examined for the COMPASS e-Write prompts that are scored on a scale of 1–4. The system-generated scores matched the expert-rater scores within one point 100 percent of the time and exactly matched human raters 66 percent to 88 percent of the time. The Pearson correlation coefficients for scores assigned by the COMPASS e-Write automated scoring when compared with scores assigned by expert raters ranged from .67 to .83. These results are presented in table 8. As a whole, the agreement rates meet or exceed the levels typically obtained by expert raters (see table 5). The correlation for the 4-point prompts (.67 to .83) is within an acceptable range and conforms to expectations for response scoring employing a 4-point scale.

Table 8 COMPASS e-Write 2–8 Prompt Holistic Scores:

Agreement between Human and IntelliMetric Scoring *

Prompt     Exact Agreement (%)   Exact + Adjacent Agreement (%)   Correlation
COMP101    78                    100                              .73
COMP102    66                    100                              .69
COMP103    76                    100                              .67
COMP104    74                    100                              .67
COMP105    88                    100                              .83
COMP109    74                    100                              .75

* Based on the 4-point holistic scoring rubric

Study Results for COMPASS e-Write 2–12 Score Scale and 1–6 Scoring Rubric

The rate of agreement between IntelliMetric scores and scores assigned by expert raters was also examined for the prompts that are scored on a scale of 1–6. The system-assigned scores matched the expert rater scores within one point 100 percent of the time and exactly matched human raters 58 percent to 78 percent of the time. The correlations of scores assigned by IntelliMetric and expert raters ranged from .74 to .85. These results are presented in table 9. The Pearson correlation coefficients for the six 1–6 scale prompts examined (.74 to .85) are within an acceptable range and conform to expectations for response scoring employing a 6-point scale. In fact, the average Pearson correlation for the six 1–6 scale prompts (approximately .80) exceeds the average Pearson correlation for the six 1–4 scale prompts (approximately .72). Both correlations are in an acceptable range for open-ended essay response scoring.

Table 9 COMPASS e-Write 2–12 Prompt Holistic Scores:

Agreement between Human and IntelliMetric Scoring *

Prompt     Exact Agreement (%)   Exact + Adjacent Agreement (%)   Correlation
COMP110    58                    100                              .74
COMP114    70                    100                              .85
COMP115    76                    100                              .75
COMP116    78                    100                              .81
COMP118    74                    100                              .83
COMP120    74                    100                              .81

* Based on the 6-point holistic scoring rubric

Study Results for ESL e-Write 2–12 Score Scale Analytic Scoring Guide

The rate of agreement between scores assigned by the automated scoring engine and those assigned by expert raters was also examined for ESL e-Write. ESL e-Write analytic scores assigned by IntelliMetric agreed with the scores assigned by expert raters within one point nearly 100 percent of the time; exact agreement ranged from 61 percent for the Mechanics scale to 77 percent for Development. The correlation between IntelliMetric and expert rater scores ranged from .79 to .92. These agreement rates and the Pearson correlations for the individual ESL domain scales are within acceptable ranges and conform to expectations for scoring. These results are presented in table 10. The level of analytic score agreement and overall correlation for ESL e-Write based on IntelliMetric versus human scoring comparisons is comparable to the scoring study results for the COMPASS e-Write prompts.

Table 10 ESL e-Write Prompt Analytic Scores:

Agreement between Human and IntelliMetric Scoring

Comparison Between ACT Raters and ESL e-Write Scoring

Domain          Exact Agreement (%)   Exact + Adjacent Agreement (%)   Correlation
Development     77                    100                              .92
Focus           66                    99                               .86
Organization    66                    99                               .86
Language Use    62                    97                               .79
Mechanics       61                    99                               .80

Scoring Quality Monitoring

The results of ACT’s original analyses verified that the IntelliMetric automated scoring engine assigns scores to essay responses at reliability levels that are consistent with assessment industry and ACT standards. With an exact plus adjacent agreement rate at or approaching 100 percent, IntelliMetric scoring achieves levels of agreement comparable to agreement statistics for two expert raters. However, to further ensure high-quality scoring, ACT routinely monitors samples of student essays scored by IntelliMetric to provide ongoing verification of agreement between automated score results and expert rater scores. The targeted monitoring rate is approximately 10 percent of the machine-scored responses. Routine ACT quality monitoring of COMPASS e-Write or ESL e-Write essays has shown no statistical deviation from the original reliability rates achieved with IntelliMetric scoring.
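The routine monitoring described above can be thought of as a simple sample-and-compare loop, sketched below in Python. Only the roughly 10 percent sampling target comes from this FAQ; the function names and the 0.95 flag threshold are hypothetical placeholders, not ACT criteria.

# Sketch of routine scoring quality monitoring: sample about 10 percent of machine-scored
# responses, have experts rescore them, and check agreement. The 0.95 threshold is a placeholder.

import random

def monitor_scoring(machine_scored, expert_rescore, sample_rate=0.10, flag_below=0.95):
    """machine_scored is a list of (response_text, machine_score) pairs;
    expert_rescore is a callable that returns an expert score for a response."""
    sample_size = max(1, int(len(machine_scored) * sample_rate))
    sample = random.sample(machine_scored, sample_size)
    within_one = sum(abs(score - expert_rescore(text)) <= 1 for text, score in sample) / sample_size
    return within_one, within_one < flag_below  # agreement rate and whether to investigate further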

______________________________________________________________

Are there responses that cannot be scored by the scoring engine? How are these scored?

For both COMPASS e-Write components and the ESL e-Write component, the system does not automatically score responses that deviate significantly from the patterns of results observed in the original training essay responses (i.e., the responses used to train the Vantage IntelliMetric scoring engine). Occasionally, a student may submit a response that is off topic, too short to support a score, or for some other reason is not scoreable by computer (e.g., a combination of a discernible topic focus and brevity). In these instances, the student’s response is automatically routed to ACT for scoring. Trained ACT raters score student responses that are not scored by the automated essay scoring system. Responses that are determined to be unscoreable by Vantage IntelliMetric include responses that:

Fail to address the topic defined by the prompt

Are written in a language other than English

Are too short to support a full evaluation

Refer to sensitive issues (e.g., violent or abusive situations)

If the response is routed for handscoring and is found to be off topic, an unscoreable code of 91 is assigned. If the response is unintelligible to human raters, an unscoreable code of 92 is assigned. The scores and the student responses are then incorporated into the COMPASS Internet version, and the institution is notified via email that score reports for those students are available.
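A simplified sketch of the routing logic described above follows. The unscoreable codes 91 and 92 come from this FAQ; the predicate and scoring functions passed in are hypothetical placeholders for the checks that trained ACT raters perform.

# Sketch of hand-scoring for responses the engine routes to ACT. Codes 91 and 92 are from
# this FAQ; is_off_topic, is_unintelligible, and rater_score are hypothetical placeholders.

def assign_handscore(response, is_off_topic, is_unintelligible, rater_score):
    if is_off_topic(response):
        return 91   # unscoreable code: off topic
    if is_unintelligible(response):
        return 92   # unscoreable code: unintelligible to human raters
    return rater_score(response)  # otherwise a trained rater assigns a regular score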

______________________________________________________________

Given that most scores for COMPASS e-Write 2–12 fall into the 6, 7, and 8 score categories, how can schools use these scores to make meaningful placement decisions?

ACT conducted a concordance study in 2004 to provide comparisons of the two COMPASS e-Write score scales. For this study, the same responses to the same prompt were scored by the same group of raters; the only variable in the study was the scoring rubric used. Scoring Center staff scored 1,132 COMPASS e-Write responses using both a 2–8 scoring scale and a 2–12 scoring scale. Table 11 provides data comparisons from this concordance study.

Table 11 COMPASS e-Write Score Scale Comparisons

COMPASS e-Write 2–12                              COMPASS e-Write 2–8
Score   % At or Below   % Within Category         Score   % At or Below   % Within Category
12      100.00          1.15                      8       100.00          1.50
11      98.85           0.97
10      97.88           3.80                      7       98.50           2.74
9       94.08           3.80
8       90.28           28.00                     6       95.76           37.99
7       62.28           10.69
6       51.59           34.54                     5       57.77           34.19
5       17.05           4.77                      4       23.5            16.87
4       12.28           9.10                      3       6.71            3.80
3       3.18            0.71
2       2.47            2.47                      2       2.92            2.92

Data in table 11 show the percentage of responses receiving specific scores for both score scales, providing general comparisons of score-level frequency distributions for the two scales. For example, the frequencies of the scores of 11 and 12 on the COMPASS e-Write 2–12 score scale most closely align with the frequency of the score of 8 on the COMPASS e-Write 2–8 score scale. A score of 6 on the 2–12 score scale most closely aligns with a score of 5 on the 2–8 score scale. There is greater differentiation of scores 3–5 on the COMPASS e-Write 2–12 score scale; these scores tend to correspond to scores of 3 and 4 on the COMPASS e-Write 2–8 scale.
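As an aside on how the two percentage columns in Table 11 are derived, the Python sketch below computes the percent of responses within each score category and the cumulative percent at or below each score. The score list is invented for illustration; it is not the 1,132-response study data.

# Sketch: percent within each score category and cumulative percent at or below,
# as reported in Table 11. The score list is invented for illustration.

from collections import Counter

scores = [5, 6, 6, 7, 6, 5, 8, 6, 7, 5, 6, 4, 6, 7, 6, 5]
counts = Counter(scores)
total = len(scores)

cumulative = 0.0
for score in sorted(counts):
    within = 100 * counts[score] / total
    cumulative += within
    print(f"Score {score}: {within:.2f}% within category, {cumulative:.2f}% at or below")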

Best Practice: Multiple Measures for Writing

This section addresses a common query regarding COMPASS e-Write 2–12, where the majority of the scores fall into holistic score levels of 6, 7, and 8: how can sites use these scores to make effective placement decisions when so many examinees score at similar levels? The same issue exists for COMPASS e-Write 2–8, where the majority of scores fall into holistic score levels of 4, 5, and 6. The question is how to differentiate ability when so many examinees score within a narrow range.

One important consideration is that the “clustering” of direct writing scores within the middle of the COMPASS e-Write score scales as illustrated in Table 11 is comparable to various state and national writing assessment results: Writing essay scores tend to fall in the middle range of performance because this is the ability level of most writers.

Another consideration is that a direct writing assessment is a single-item test: While the holistic score scale provides levels of writing performance differentiation, the written response is still a single response to one performance task.

An additional consideration is that the COMPASS e-Write direct writing assessments were developed as complementary tests to be used in conjunction with the COMPASS Writing Skills multiple-choice test: COMPASS e-Write components were developed because many postsecondary English language arts (ELA) programs consider writing production to be a cornerstone of curriculum and instruction, especially at the credit-bearing level. Therefore, the assessment of writing production can be an important component in differentiating ability for ELA course placement. However, the use of only a direct writing assessment may not yield sufficient information for placement decisions.

An ACT best practice recommendation is to use COMPASS e-Write components as companion tests with the COMPASS Writing Skills Placement Test to assess both an examinee’s ability to identify errors in writing and the ability to produce writing. Using COMPASS e-Write in conjunction with COMPASS Writing Skills allows a greater level of precision for local placement decisions. The complementary nature of the two tests can offset the clustering effects associated with a direct writing assessment and offer additional writing performance differentiation. For COMPASS sites that are attempting to place students into multiple ELA levels that begin with a sequence of developmental courses and move to credit-bearing courses, the use of multiple measures is critical to achieving the type of writing skills information and level of precision necessary. As a best practice for English language arts course placement, ACT recommends starting examinees in COMPASS Writing Skills and routing examinees to COMPASS e-Write (either 2–8 or 2–12) as a companion assessment such that both assessments are administered in one sitting.

The use of multiple measures for writing assessment may be accomplished by using COMPASS Writing Skills and COMPASS e-Write scores as “stand-alone” scores (i.e., two discrete measures), meaning that placement decisions are based on two independent scores. However, it is also possible to use the multiple-measures feature within the COMPASS system to combine the two tests into a single composite score. The steps in setting up the multiple-measures feature are described in the COMPASS Internet Version “Help” interface. If users choose to use the multiple-measures feature available within the COMPASS system, ACT recommends using COMPASS Writing Skills as the primary measure and COMPASS e-Write as the secondary measure. In the COMPASS system multiple-measures setup, the primary measure determines the composite score scale. Using COMPASS Writing Skills as the primary measure and COMPASS e-Write as the secondary measure means the direct writing score is automatically transformed to the multiple-choice score scale so that the final multiple-measures composite score scale extends to 99 points.

This multiple-measures composite score approach for English language arts course placement allows for a greater number of score points to be used in making placement decisions. Please note that if you use the COMPASS multiple-measures feature, the Standard Individual Report (SIR) will include 1) a COMPASS Writing Skills score, 2) a COMPASS e-Write score, and 3) a multiple-measures composite score for both the multiple-choice test and the direct writing assessment. Also note that the multiple-measures feature allows sites to weight measures based on site-specific content needs (e.g., more focus on the ability to identify errors in writing or more focus on writing production). Figure 4 illustrates the multiple-measures setup interface in the COMPASS system. In this example, COMPASS Writing Skills and COMPASS e-Write have been equally weighted.

Figure 4 Writing Multiple Measures Setup

This multiple-measures model can also be translated to ESL testing, where ACT recommends using ESL Grammar/Usage as the primary measure and ESL e-Write as the secondary measure such that the combined multiple-measures score scale is extended to the full range of multiple-choice score points available. The strength of these COMPASS multiple-measures models for writing is in combining different assessment approaches to better evaluate and discern performance; multiple measures provide better discrimination in performance. Combining assessment approaches for writing also allows for more in-depth prescriptive information than would otherwise be available for assessing individual strengths and weaknesses.
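For sites exploring the weighting idea described above, the following Python sketch shows one way a weighted composite could be formed from a multiple-choice score and a direct writing score. It is illustrative only: the linear rescaling of the 2–12 e-Write score onto a 99-point scale is an assumption for this sketch, not the transformation the COMPASS system actually applies, and the equal weights simply mirror the Figure 4 example.

# Sketch: a weighted composite of a COMPASS Writing Skills score and a COMPASS e-Write score.
# The linear rescaling of the 2-12 e-Write score is an assumption for illustration only.

def composite_score(writing_skills, e_write, w_skills=0.5, w_write=0.5):
    e_write_rescaled = (e_write - 2) / (12 - 2) * 99  # assumed linear mapping onto a 99-point scale
    return w_skills * writing_skills + w_write * e_write_rescaled

# Example: equal weighting, as in the Figure 4 setup illustration.
print(round(composite_score(writing_skills=70, e_write=8)))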

______________________________________________________________

What are other best practices associated with COMPASS direct writing assessments?

Best Practice: Selecting a Direct Writing Assessment

In analyzing the selection of a placement test, there are important questions sites should consider. The following describe broad categories of questions and specific questions associated with these categories:

1. What are we currently doing to meet our course placement needs?

Which tests are currently being used for specific courses (e.g., COMPASS tests, other)?

How were these tests chosen? What were the original analyses associated with the selection of these instruments?

How were existing cutoff scores and associated placement messages determined?

Have there been changes in local curriculum that may require adjusting the current course placement model (i.e., tests, cutoff scores, and placement messages used)?

What information do we have regarding the success of the current course placement model (e.g., research/data analysis, faculty feedback/validity evidence)?

2. What do we want to do differently and why?

Why should we investigate an additional test (e.g., e-Write, other)? Is there reason to believe that testing writing production is needed for local course placement?

What information do we have that contributes to concerns about the existing course placement model (e.g., institutional research data, course placement analysis, faculty feedback)?

What information will this additional test provide and to what degree does the content of this test align with local curriculum and instruction?

It is important to review the above when considering any course placement program or test. However, there are also other important factors sites need to consider that are specific to a COMPASS/ESL e-Write direct writing assessment:

As previously outlined within the “multiple measures” best practice section, a direct writing assessment is a single-item test: an essay is a single response to one performance task.

COMPASS/ESL e-Write components were developed as complementary tests to be used in conjunction with multiple-choice tests: using only a direct writing assessment may not yield sufficient information or differentiation for placement decisions, especially when multiple levels of course placement are needed.

COMPASS/ESL e-Write assessment scores align with 1) content described by the scoring rubrics and 2) scoring that was based on national samples: sites must determine whether the score points and proficiency descriptions articulated by the rubrics align with local curriculum.

Tables 5, 6, and 7 within this FAQ describe the established COMPASS/ESL e-Write exact and adjacent agreement parameters for human rater scoring. Tables 8, 9, and 10 within this FAQ describe the established exact and adjacent agreement parameters for human rater versus automated scoring engine scoring: these are the expected levels of agreement for the e-Write components.

A site should have the opportunity to analyze the content (validity) and reliability associated with any course placement test. For the COMPASS/ESL e-Write assessments, the content is established by the rubrics and associated scores and the reliability is established by inter-rater agreement (i.e., human versus human and human versus scoring engine). As a best practice, ACT strongly encourages sites to review COMPASS/ESL e-Write content and technical specifications to ensure that these tests meet local needs and expectations.

Best Practice: Classroom Assessment & Standardized Assessment Differences

The focus of classroom assessment is on measuring individual performance relative to a specific instructional setting within a particular curricular environment (e.g., English 101 taught by Ms. Cruz using ELA curriculum adopted by XYZ College). The use of assessment in a classroom necessarily focuses on what is being taught, and the approach to assessment can (and should) be adjusted to accommodate the needs of instruction. The focus of standardized assessment is on secure tests that are administered to groups of students in a standardized format for the purpose of measuring performance in a standardized manner. The groups of students being tested cross the boundaries of classrooms and institutions. Standardized assessments require that content and performance standards are applied in the same manner for all examinees to ensure that results are accurate (i.e., valid) and repeatable (i.e., reliable). COMPASS e-Write assessments are standardized performance assessments that were developed based on a national student sample using a standardized delivery system, standardized training, and a standardized scoring model. The COMPASS direct writing components adhere to the following:

The COMPASS e-Write system interface presents writing prompts with the same format in exactly the same way each time.

COMPASS e-Write automated scoring is modeled on a holistic scoring rubric and two human rater scores. The automated scoring engine is “trained” (or calibrated) to score e-Write responses exactly the same way each time, using the same rater-modeled scoring routine for all responses. This is achieved by using 300 to 500 e-Write responses and scores from two raters as the basis for training. This scoring engine training is followed by in-depth analysis to establish the degree of reliability (i.e., scoring agreement) between the automated scoring and human scoring.

Human rater scoring at ACT is based on the use of scoring rubrics and multiple sets of training materials. ACT staff, with the assistance of external experts as needed, conduct an initial range-finding session to identify responses for each writing prompt that exemplify specific score levels. These responses are used to construct sets of anchor papers (i.e., “true score” papers) and training sets.

While COMPASS e-Write scoring rubrics describe performance associated with various score points, the training materials provide critical examples to illustrate and underscore this writing performance description. The training provided ACT raters is the crucial component in standardizing how the scoring rubric is applied.

The ACT rater training process results in human raters understanding and accurately applying the scoring rubric with the greatest level of precision and consistency possible. Raters must qualify to be allowed to score and must meet a minimum level of consistency or reliability during scoring.

A critical component in human rater scoring is the degree to which individual preferences or interpretations are minimized. A classroom instructor develops tasks that are appropriate for instruction; an instructor evaluates performance based on local curriculum and a course-specific rubric. However, in standardized scoring for an assessment administered nationally, individual preferences or interpretations must be minimized to ensure fairness and mitigate individual bias.

Human rater scoring at ACT requires ongoing monitoring to avoid “rater drift,” which can be defined as individual movement away from a systematic application of the rubric by interjecting individual interpretations. ACT captures inter-rater reliability (agreement) statistics on an ongoing basis to measure rater drift, and expert readers monitor rater performance.

Individual retraining or group recalibration is conducted as needed. If individual rater scoring is shown to deviate from standardized scoring protocols and a correction cannot be made, that rater is removed from the scoring project.

Overall, an extremely important best practice consideration is the degree of standardization that is built into a COMPASS e-Write scoring model through human rater scoring. This standardization of rater scoring is, in turn, modeled through the training of the automated scoring engine.

In a standardized assessment environment, individual preferences, interpretations, and bias must be controlled to the greatest extent possible in order to promote accurate, reliable, and unbiased scoring. The differences between classroom instruction and assessment, on the one hand, and standardized assessment, on the other, can present challenges in terms of their relative differences in purpose, protocols, and interpretations. However, the ACT perspective is that classroom-based assessment to support instruction and standardized assessment to support broader institutional decisions provide two levels of crucial feedback to fully support an efficient and effective educational delivery system.

Best Practice: Local Pilot Testing & e-Write Validation

As postsecondary institutions explore the viability of using COMPASS/ESL e-Write, there are various needs related to validating direct writing assessments locally. That is, an institution may be interested in using COMPASS e-Write as part of its ELA course placement model (i.e., in conjunction with COMPASS Writing Skills). However, the performance assessment nature of e-Write and the automated scoring of COMPASS/ESL essays can represent a significant departure from local staff experiences and comfort level. Postsecondary sites examining COMPASS/ESL e-Write assessments often want to undertake a local validation effort that includes faculty scoring of e-Write responses. However, this type of validation effort can result in faculty-assigned scores that differ from COMPASS system-assigned scores due to myriad differences in scoring approaches. These differences are related to:

local perspectives on COMPASS or ESL e-Write rubric descriptions (e.g., local content interpretations that differ from the scoring model)

pilot study effects regarding the student sample selected (e.g., sample size, sample characteristics)

how faculty are trained (e.g., training that differs from the COMPASS e-Write training)

how scoring is monitored (e.g., scoring that fails to examine the accuracy and reliability of local scoring)

Overall, local staff is inclined to review and evaluate COMPASS/ESL e-Write essay responses based on local curriculum and instruction; however, COMPASS/ESL e-Write scoring is based on a standardized scoring rubric and a nationally representative sample of scored student responses. When postsecondary staff compares local scores to COMPASS scores, it is roughly akin to comparing local norms with national norms: the two sets of norms can each be valid for their respective purposes, but they should not be compared to each other.

As illustrated by the COMPASS/ESL e-Write inter-rater reliability evidence (i.e., Tables 5, 6, and 7) and by the human rater versus automated scoring engine reliability evidence (i.e., Tables 8, 9, and 10), differences in scores occur even when substantial effort is made to achieve high levels of agreement. Essay scoring by two trained raters can produce differences; ACT staff work to mitigate these differences through ongoing monitoring of inter-rater agreement rates and through retraining. At the next level, scoring by a trained rater and by the automated scoring engine can also differ; ACT quality monitoring and recalibration efforts are instituted to mitigate these differences.

Based on ACT experience, local postsecondary staff may identify "differences" or disagreements with automated scores, but subsequent investigation typically indicates that while staff may be applying a rubric, they are not applying the ACT rubric. Given this experience, ACT does not recommend having faculty score COMPASS/ESL e-Write essays. Because ACT scoring of e-Write essays includes in-depth training on the application of the rubric, in-depth study of numerous essay exemplars, and in-depth monitoring of the scoring process (e.g., capturing inter-rater reliability indices, expert reader back-reading, retraining), sites that attempt to replicate this scoring locally invest a great deal of time and resources in an activity that may yield questionable results. A faculty rating process that focuses on evaluating course placement and the alignment of e-Write with a placement model yields much greater validation returns. In the sections that follow, ACT outlines two recommendations for local COMPASS/ESL e-Write validation.


COMPASS/ESL e-Write Validation Option 1:

In the context of validating the use of COMPASS/ESL e-Write, it is important that sites first determine whether the existing placement model is working appropriately. ACT recommends validating COMPASS/ESL e-Write pilot testing results through a faculty rating analysis, accomplished by asking faculty to rate existing course placement decisions using the following scale:

5. Should definitely be placed in a higher level course.

4. Might have the ability to do well in a higher level course.

3. Is appropriately placed.

2. Might be better placed in a lower level course.

1. Should definitely be placed in a lower level course.

The faculty rating scale described above can be used to validate an existing placement model (an important first step) and to analyze the alignment of faculty placement evaluations with COMPASS/ESL e-Write scores. Using a faculty course placement evaluation in conjunction with COMPASS/ESL e-Write scores would typically yield the following data sets:

pilot testing results for COMPASS/ESL e-Write

faculty evaluations of current placement decisions for specific courses

results for COMPASS multiple-choice components

The data gathered above would allow sites to analyze:

faculty feedback regarding current placement results for specific course levels

faculty agreement with COMPASS/ESL e-Write scores for specific course levels [e.g., agreement between a faculty rating of 3 (appropriately placed) and the e-Write-based placement decision]

agreement between e-Write scores and multiple-choice results by course (i.e., at specific cutoff scores)

Faculty evaluation of the existing placement model allows sites to identify gaps in current placement practices. Piloting of COMPASS/ESL e-Write components can then be conducted to analyze whether e-Write results align with faculty evaluations (i.e., provide additional information that would increase faculty satisfaction with course placement). Conducting an e-Write pilot without first investigating the validity of existing placement models can result in substantial local effort and spurious pilot study results.

COMPASS/ESL e-Write Validation Option 2:

Another method for analyzing and validating course placement models is the ACT Research Services Course Placement Service. The Course Placement Service provides information your institution can use to validate current cutoff scores, select new cutoff scores, or compare the effectiveness of different placement tests (e.g., COMPASS/ESL e-Write). The ACT Course Placement Service provides institution reports with student grades in up to 25 different courses, along with an overall grade point average. The institution can request analyses showing the relationship between course grades and selected placement variables (e.g., specific COMPASS test scores). For each course analysis, ACT provides key statistics that allow sites to determine the impact of setting or adjusting placement cutoff scores or of incorporating an additional measure (e.g., COMPASS or ESL e-Write). The following outlines some typical questions the ACT Course Placement Service can help answer:

Are current cutoff scores too high or too low?

Are first-year students being placed in the appropriate college courses?

If the cutoff score for a particular course were raised, what percentage of entering students would be placed in the lower-level course? (A simple way to estimate this locally is sketched after this list.)

How well are current placement tests functioning?
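
As a rough illustration of the cutoff question above, the following minimal sketch (written in Python for illustration; the scores, scale, and cutoff values are hypothetical, and this is not the ACT Course Placement Service analysis itself) estimates what share of an entering cohort would fall below a raised cutoff.

# Illustrative sketch only: estimate what percentage of entering students
# would be placed in the lower-level course if a cutoff score were raised.
# All scores and cutoff values are hypothetical.

def percent_below_cutoff(scores, cutoff):
    """Share of examinees scoring below the placement cutoff."""
    return sum(score < cutoff for score in scores) / len(scores)

# Hypothetical e-Write scores for an entering cohort (2-12 scale).
cohort_scores = [5, 7, 8, 6, 9, 4, 7, 6, 8, 5, 10, 6, 7, 9, 5]

current_cutoff, proposed_cutoff = 6, 7
print(f"Cutoff {current_cutoff}: "
      f"{percent_below_cutoff(cohort_scores, current_cutoff):.0%} placed in the lower-level course")
print(f"Cutoff {proposed_cutoff}: "
      f"{percent_below_cutoff(cohort_scores, proposed_cutoff):.0%} placed in the lower-level course")

A service such as the ACT Course Placement Service carries out this kind of analysis more rigorously, incorporating actual course grade data alongside the score distributions.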


Specifics regarding the ACT Research Services Course Placement Service can be found at http://www.act.org/research/services/crsplace/. Sites may contact ACT Research Services at 319/339-3089 for additional information. Sites may also contact ACT regional offices for additional information on costs associated with the ACT Course Placement Service: http://www.act.org/contacts/field.html.

COMPASS/ESL e-Write & Local Scoring:

Some sites may prefer to proceed with a local COMPASS/ESL e-Write validation effort that includes pilot testing and faculty evaluation of essays and scores. Based on ACT experience, local scoring efforts tend to result in differences in scores due to local content expectations that depart from COMPASS/ESL e-Write scoring parameters. Should sites proceed with local scoring, queries regarding any resulting differences in scores should be communicated to ACT in the context of the scoring rubrics. That is, sites should be prepared to describe any "disagreement" with COMPASS/ESL e-Write scores as it specifically relates to the scoring rubrics. This may include an analysis of site-specific expectations contrasted with COMPASS/ESL e-Write scoring rubric expectations. The following provides some examples of analyzing differences:

Our faculty tends to “score down” for an essay that does not adhere to a five-paragraph format; however, COMPASS e-Write scoring does not take this into account.

Our ELA program focuses on writing mechanics as being particularly important in differentiating between developmental and credit-bearing courses. The COMPASS e-Write holistic scoring rubric does not appear to accommodate this focus.

Our institution promotes the use of standard English for all writing tasks. The use of “slang” does not seem to be penalized within the COMPASS e-Write scoring model.

Our ESL program emphasizes mechanics as an important feature of evolving writing skills. However, the ESL e-Write scoring model weights writing mechanics as only 5 percent of the overall score.

This type of content-specific feedback focuses on the scoring rubrics and allows ACT staff to better understand differences and provide a meaningful response. Queries regarding general differences in scores that do not reflect an understanding or analysis of the COMPASS/ESL e-Write scoring rubrics and that merely request an ACT "rescore" cannot be accommodated.

Best Practice: Other Considerations for Writing Assessment

In addition to recommendations for using multiple measures, ACT has other best practice recommendations for English language arts course placement. The following outlines specific considerations for English course placement decisions.

Use a direct writing assessment if writing production is an important component in the local English curriculum: Analyze local English courses in terms of specific performance expectations by reviewing course descriptions and other course-specific information (e.g., objectives). For example, an imaginary "English 90" course is an upper-level developmental course that includes the following descriptors: focus on sentence structure, paragraph and essay development, and written expression. In this case, one portion of English 90 targets understanding the grammar and syntax associated with the structure of language, which indicates that the COMPASS Writing Skills Placement Test would provide important feedback on the ability to identify errors in writing. However, English 90 also focuses on essay development and written expression, which indicates that COMPASS e-Write would also provide critical feedback on the ability to produce writing. Effective course placement for English 90 should likely include measures that evaluate both the ability to recognize written errors and the ability to produce written communication. The selection of the appropriate COMPASS tests should be based on a course-level analysis of content and performance expectations. This content analysis requires feedback from local English instructors to validate the use of the selected instruments.


Use the appropriate direct writing assessment for the testing population: As described previously in this FAQ, COMPASS e-Write was designed for native English speakers, while ESL e-Write was designed specifically for nonnative speakers/emerging English-language learners. Because COMPASS e-Write and ESL e-Write were developed for and field tested with different testing populations and use different scoring models, it is critical that the appropriate direct writing assessment be used for each group. Using COMPASS e-Write with examinees who should be tested using ESL e-Write (and vice versa) will contribute to scoring inaccuracies, which will negatively impact placement decisions.

Use expectations associated with local curriculum to set cutoff scores and develop placement messages: When setting cutoff scores and developing associated COMPASS placement messages, it is extremely important that English or ESL instructors be involved in analyzing each English or ESL course in terms of performance expectations. Local instructors are essential in determining how the performance levels of e-Write (i.e., the rubric proficiency descriptions) align with course expectations. An important question to be answered as part of this content analysis is: What are the entry-level performance expectations for this course, and how do these expectations align with the e-Write proficiency descriptions? This alignment analysis allows for the best possible content-driven decisions for course placement.

Use expectations associated with local curriculum to validate the use of the instrument and associated cutoff scores: Once an e-Write examination has been implemented, it is important that institutions validate placement results using faculty feedback or use faculty judgments to adjust cutoff scores. Sites can use faculty ratings to evaluate placement decisions using the following rating scale:

5. Should definitely be placed in a higher level course.

4. Might have the ability to do well in a higher level course.

3. Is appropriately placed.

2. Might be better placed in a lower level course.

1. Should definitely be placed in a lower level course.

Based on faculty feedback on the above, a site can better judge the validity of local placement decisions. Should faculty ratings indicate that a change is needed to improve alignment between course content expectations and e-Write results (e.g., the ratings data reflect a disconnect), an adjustment to cutoff scores should be made; one simple way to summarize such ratings data is sketched below. When there is uncertainty about how to use results to make placement decision adjustments, sites have ACT Course Placement Service resources available to support a data-driven fine-tuning of cutoff scores.
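
As one way to organize such ratings data before adjusting cutoff scores, the following minimal sketch (in Python, for illustration only) cross-tabulates hypothetical faculty placement ratings against assumed e-Write score bands; the values, band boundaries, and analysis are illustrative assumptions rather than an ACT-prescribed procedure.

# Illustrative sketch only: cross-tabulate hypothetical faculty placement
# ratings (the 1-5 scale above) against assumed e-Write score bands to look
# for misalignment between test-based placement and faculty judgment.
from collections import Counter

# Hypothetical (e_write_score, faculty_rating) pairs for placed students.
records = [(4, 2), (5, 3), (6, 3), (7, 3), (8, 4), (5, 2), (6, 3),
           (7, 4), (9, 5), (4, 1), (8, 3), (6, 2), (7, 3), (10, 4)]

def band(score):
    """Map an e-Write score to a coarse band (assumed boundaries)."""
    if score <= 5:
        return "low (2-5)"
    if score <= 8:
        return "mid (6-8)"
    return "high (9-12)"

table = Counter((band(score), rating) for score, rating in records)
for (score_band, rating), count in sorted(table.items()):
    print(f"{score_band:12} rating {rating}: {count}")

A concentration of ratings of 1 or 2 within a particular score band would suggest that students in that band are being placed too high, pointing toward a cutoff adjustment for the affected course.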