Running head: EXPLORING THE WRITING SECTION OF THE TOEFL IBT TEST

Exploring the Writing Section of the TOEFL iBT Test:

Analysis of Tasks and Scoring Process

Augar M. Khoshaba

The Monterey Institute of International Studies

September 30, 2013

Exploring the Writing Section of the TOEFL iBT Test

When I applied for a scholarship to pursue my Master’s degree in the United States, the

sponsoring committee required me to provide proof of eligibility, including a graduation

transcript and a medical report. Additionally, they asked for what seemed to concern them the

most: a report of my English language skills. In response, I prepared my documents and

registered for the only language test available at the time: the paper-based version of the Test of English

as a Foreign Language (TOEFL PBT). My success in this test was my passport to pass different

stages of evaluation and, eventually, fly to the United States.

While I thought that my experience with standardized tests had ended with the TOEFL

PBT, a second round of testing started when I arrived in the U.S. This time, the Admissions Office at

my school requested that I take the newest and most communicative version of the TOEFL

series: TOEFL iBT. According to ETS (2008a), this Internet-based test enables students to “get

into more than 6,000 universities worldwide,” and proves that they can “communicate effectively

in an academic environment.” Although I achieved the target score in the second attempt, my

sub-score in the writing section did not meet my program’s requirements. I had to take the test

two more times until I received a better score. This struggle with the writing section of the

TOEFL iBT made me question the integrity of its scoring system and whether the tasks actually

resembled those in college courses. Hence, I decided to review the writing portion of the TOEFL

iBT test to seek answers to my questions. This paper is divided into four sections: history of the

TOEFL, description of the writing section, scoring system, and analysis.

History of the TOEFL Test

The increasing population of non-native students in the United States in the late 1950s and

early 1960s created an urgent need for a language test that accommodated their academic needs.

As a result, the National Council on the Testing of English as a Foreign Language was

established in 1961, which launched its first TOEFL test in 1964. Nine years later, in 1973, the

Educational Testing Service (ETS) became the main operator of the TOEFL test. This

organization changed the format of the first TOEFL in 1976, from the five multiple-choice

sections into three new parts: reading comprehension and vocabulary; listening comprehension;

and structure and written expression (ETS, 2007).

Despite the fact that the first two components of the early TOEFL offered direct measures

of test takers’ English proficiency, the third section was heavily criticized by English teachers

and score users alike. They argued that using discrete-point tests of English structure and written

expression does not assess examinees’ writing skills since they do not produce any written

responses. Thus, they requested a more appropriate measure that requires test takers to produce

academic essays equivalent to those they encounter in college classes (Chapelle et al., 2009).

Accordingly, the Test of Written English (TWE) was introduced in July 1986. This test included

one writing task, in which examinees wrote an essay to describe a chart, make a comparison, or express

their opinions within 30 minutes. It was scored on a 6-point scale and was first offered separately

from the TOEFL, but was soon integrated into it (Greenberg, 1986).

Despite people’s satisfaction with this stronger measure of writing, many of them

questioned its validity since it used a scale different from that of the TOEFL. These concerns,

along with a growing interest in applying the theory of communicative competence to language

testing, led to the development of a more communicative version of the test. In 1998, ETS

introduced the computer-based version of the test: TOEFL cBT. In addition to adding new

item formats and visual aids to the listening and reading sections, this computerized version also

included a more communicative writing task along with the structure component. Nevertheless, this

test was criticized for not being completely communicative. As a result, a comprehensive and

more integrative version was designed in 2005: the “next generation” TOEFL iBT (ETS, 2005).

This Internet-based TOEFL consists of four sections: reading, listening, speaking, and

writing. The total score of the test is 120 points, with 30 points per section. The most salient feature of the

TOEFL iBT is the addition of the speaking section, which measures examinees’ abilities to

“communicate in English in an academic setting” (Sharpe, 2010). The test lasts for four hours

with a ten-minute break given in the middle, between the listening and speaking sections. It is

offered more than 50 times per year in 110 countries and has so far been taken by 27 million

examinees in the world (ETS, 2008a, 2013b).

TOEFL iBT is a norm-referenced test, meaning that it is used to “spread students out in

percentile terms for proficiency or placement testing purposes” (Brown, 2005, p.76). Recently,

ETS (2013a) has published data from January 2012 – December 2012 representing the means

and standard deviations of examinees’ scores based on their gender or country (Appendix A). In

terms of test registration, test takers need to fill out an online registration form on the ETS website

(Appendix B) and pay a fee of $160 to $250, depending on the country or testing center. The

software used to operate the test is straightforward and uses clear written and audio instructions,

which facilitate its use even by first-time test takers. I remember having total control in the test

even though I was not quite familiar with the program. In the following two sections, I will offer

a general description of the writing portion in the TOEFL iBT and its scoring system.

Description of the Writing Section of the TOEFL iBT

The writing portion of the TOEFL iBT is the last part of the test after the speaking

section. It is a direct measure of examinees’ ability to write integrative and opinion essays

similar to those they produce in college. Unlike TWE or TOEFL cBT, this recent version

includes two essay writing tasks.

The Integrated Essay

In this task, students respond with an essay after they read a passage and listen to a

lecture discussing the same topic (Appendix C). The goal of the task is to measure examinees’

abilities to synthesize or connect ideas from the two sources. The time allocated for this task is

20 minutes, in which test takers are expected to write 150-225 words (Sharpe, 2010).

The Independent Essay

In this task, examinees will read a prompt on the screen asking them to compose an essay

that reflects their opinion about common topics (Appendix D). They are expected to write 300-

350 words; therefore, they are given 30 minutes to finish the essay. In both tasks, a timer appears

on the screen to notify test takers about the time remaining to complete their essays.

Scoring System

ETS repeatedly stresses its use of a wide range of security measures to maintain integrity

in the scoring process. Most notable is the well-protected location where scoring takes place:

tests are not graded at the testing centers; rather, they are scored in centralized networks by

professional raters representing an array of different cultural backgrounds. As soon as the

students finish the test, their essays are sent to the Online Scoring Network (OSN), where

each essay is marked on a scale from 0 to 5 points using a holistic scoring approach. As

Brown (2005) puts it, the holistic scoring model “uses a single general scale to give a single

global rating for each student’s language production” (p.54). Usually, two raters mark each task

according to the writing scoring rubric (Appendix E).

Occasionally, the scores assigned by the two raters might differ by more than 1 point, a situation

that requires a third rater to mark the task to determine the final score. If the three

scores are close to one another, the final score is the mean of all three. However, if

the three scores are still inconsistent, the mean of the two closest scores is the grade

assigned to the task (Sawaki et al., 2008). To calculate the total score for the writing section,

the average of the two task scores (integrated and independent) is converted to a score on a scale

of 30 points (ETS, 2005). Appendix F provides a practical conversion chart.
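
The adjudication rules described above can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the scoring software that ETS uses: the function names, the `tolerance` parameter (how far apart three marks may be and still count as "close"), and the tie-breaking rule are assumptions introduced here. The decision about when a third rater is called in is left to the caller, and the conversion to the 30-point scale is deliberately omitted because it depends on the chart in Appendix F.

```python
from itertools import combinations


def task_score(first, second, third=None, tolerance=1.0):
    """Return one essay's final mark on the 0-5 holistic scale.

    `tolerance` is an assumed cut-off for "close to each other";
    the source does not state a numeric threshold.
    """
    if third is None:
        # No adjudication was needed: average the two ratings.
        return (first + second) / 2

    marks = [first, second, third]
    if max(marks) - min(marks) <= tolerance:
        # All three ratings are close: use the mean of all of them.
        return sum(marks) / 3

    # Still inconsistent: average the two closest ratings.
    # Assumption: if two pairs are equally close, the first pair found wins.
    a, b = min(combinations(marks, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


def writing_average(integrated, independent):
    """Average the two task marks; scaling to 30 points needs Appendix F."""
    return (integrated + independent) / 2
```

For example, `task_score(4, 5)` averages two ratings to 4.5, while `task_score(2, 5, 4)` discards the outlying 2 and averages the two closest ratings, again giving 4.5.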

The holistic scoring model used in the writing measure has its strengths and weaknesses.

Bailey (1998) notes that the holistic approach is “fast” and results in a high level of rater

reliability. Moreover, it focuses on positive qualities of writers’ essays. On the other hand, “A

single score may mask differences across individual composition,” i.e., two papers with the score

of “3” on the scale might exhibit different qualities (p.189). Regarding the TOEFL iBT test, ETS

often reports that the organization offers intensive training sessions for raters. Bailey (1998)

discusses a particular workshop where raters first review “benchmark papers”—samples that best

represent each point on the scale—to familiarize themselves with the scale. Then, by following

the same scale, they read and mark similar papers and discuss their scores with their peers

(p.189). This process is intended to create a unified, more accurate measure.

Fifteen business days after the test date, examinees receive their score reports via email.

The scores are listed in a table divided into four sections: reading, listening, speaking, and

writing (Appendix G). In the writing section, the level of performance is indicated by four

categories: weak (0-1.0), limited (1.0-2.0), fair (2.5-3.5), and good (4.0-5.0). Next to each level,

there is a short description of the examinee’s general performance with bullet points explaining

particular weaknesses (Appendix H). Moreover, the reports include small tables of score

interpretations on the back page. Hard copies of the reports are mailed to the examinees within a

week of receiving the electronic versions.

Analysis

In analyzing the writing section of the TOEFL iBT, I adapted two test analysis

frameworks developed by Wesche (1983) and Swain (1984). Under each analysis, I will include

a summary table of each component or principle, followed by a detailed discussion of how these

principles are reflected in the writing tasks of the Internet-based test.

Wesche’s Framework

Wesche (1983) proposed a practical framework to analyze test structure. It includes four

key components: stimulus material, task posed to the learner, learner’s response, and scoring criteria.

Table 1 shows how the writing tasks in the TOEFL iBT correspond to Wesche’s framework.

Table 1. The writing tasks and Wesche’s test analysis framework

Wesche’s Test Analysis Components | Task 1 (Integrated Essay) | Task 2 (Independent Essay)

Stimulus material | Reading and listening passages. | Prompts.

Task posed to the learner | Understanding the passages; synthesizing. | Relying on experience.

Learner’s response | Constructive: essay writing. | Constructive: essay writing.

Scoring criteria | Scoring approach: holistic scoring on a scale from 0-5 points. Main focus: quality, completeness, and accurate content. | Scoring approach: holistic scoring on a scale from 0-5 points. Main focus: quality and development.

Stimulus material. The stimulus material, as defined by Bailey (1998), is a term that refers

to “whatever linguistic or non-linguistic information presented to the learners to get them to

demonstrate the skills or knowledge we [teachers] want to assess” (p.13). In the integrated essay

task, the stimulus materials are the reading passage and the lecture, whereas in the

independent essay task, it is simply the prompt that appears on the screen.

Task posed to the learner. This point refers to the mental processes that examinees

activate in order to understand the task and produce output (Bailey, 1998). In the integrated

essay, the task posed to the learner is understanding and making connections between the reading

and listening passages. In the independent essay task, examinees need to rely on their creativity

to relate the topic to their personal experiences.

Learner’s response. As the name suggests, this component deals with examinees’

output, which demonstrates their ability to perform the task assigned to them (Bailey, 1998). In both

tasks, test takers respond with essays of varying lengths, depending on the preferences indicated

in the directions or time limit.

Scoring criteria. As discussed earlier, examinees’ writings on the TOEFL iBT are

scored holistically on a 5-point scale, and every point is accompanied by a description of an

examinee’s performance at that particular level, as shown in Appendix H. The focus of grading

in the two tasks is slightly different; while the integrated task looks at connectedness of ideas, the

independent task focuses more on the development of the topic.

Swain’s Framework

Swain (1984) highlights four principles that test developers need to utilize when

designing sound communicative tests: start from somewhere, concentrate on content, bias for

best, and work for washback. Table 2 displays these principles and their application to the

writing tasks of the TOEFL iBT.

Table 2. Swain’s analysis of communicative tests

Swain’s Analysis Principles | Task 1 (Integrated Essay) | Task 2 (Independent Essay)

Start from somewhere | Communicative competence theory | Communicative competence theory

Concentrate on content | Academic content | Interactive content

Bias for best | Visual aids (picture), reading passage, clock, and note-taking | Prompt, clock, and note-taking

Work for washback | Teaching of writing and test preparation courses | Teaching of writing and test preparation courses

Start from somewhere. This principle indicates that tests should be built on a theoretical

foundation. The TOEFL iBT was established from the need to apply the theory of

communicative competence to language testing. The integrated task in the writing measure

requires examinees to integrate three skills (reading, listening, and writing) in order to produce

essays similar to those they write in university courses.

Concentrate on content. This principle “refers to both the content of the material used

as the basis of communicative language activities and the tasks used to elicit communicative

language behavior” (Swain, 1984, p.190). Since the writing section examines test takers’ writing

skills in a school environment, the content of the TOEFL iBT writing tasks is academic. For

instance, a test taker would compose a comprehensive essay after reading and listening to

passages on language acquisition. This example echoes Brown’s (2007) idea of language

contextuality, which is essential in promoting communicative competence.

Swain (1984) categorizes the content of large-scale communicative tests into four types:

motivating, substantive, integrated, and interactive. She defines the last as “the provision of

content that includes opinions or controversial ideas” (p.194). This type of content appears in the

independent task of the test, in which examinees express their attitudes toward general topics.

Bias for best. This principle focuses on test developers’ efforts to maximize examinees’

opportunities for successful performance (Swain, 1984). ETS has invested a great deal of energy

in designing user-friendly software that meets test takers’ visual and auditory preferences. In the

integrated writing task, a picture of a professor appears on the screen when examinees listen to a

lecture in order to simulate a real lecture environment. Additionally, test takers are allowed to

take notes during the task and, above all, the reading passage re-appears along with a timer when

they start writing. On a broader level, ETS has published several editions of TOEFL iBT

preparation textbooks and online test samples which students can use in their daily practice.

Work for washback. The last category in Swain’s framework is the influence of a test

on language teaching, or more precisely, on “the curriculum that is related to” the test (Brown,

2005, p.242). The writing measure of the TOEFL iBT promotes positive washback because of its

communicative nature. Many ESL programs nowadays offer TOEFL iBT preparation courses to

help test takers achieve their target scores. As a former ESL student, I benefited greatly from the

preparatory course, especially from the timed-writing activities. Such a course was particularly helpful

for one of the two students whom I interviewed recently about the writing test. He said that the

course had even helped him improve his typing skills. He described his frustration during his first

test, when he could not complete his essay because he was a “very slow” typist.

Reliability and Validity

In addition to using Swain’s and Wesche’s frameworks, it is equally important to examine

the quality of a test in terms of its level of reliability and validity. Test reliability refers to the

consistency of ratings in a given test (Brown, 2005). In the TOEFL iBT, reliability is

reported as a coefficient between 0 and 1; the closer the value is to 1, the more reliable the test is.

In 2011, ETS published operational data that indicate the reliability levels of different sections

(Appendix I). On a scale of 30 points, the reliability estimate for the writing section was 0.74, with a

standard error of measurement of 2.76. Though this value is the lowest of all the

sections, it is still considered to be high since it is only 0.26 below 1.0.
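
The relationship between the two reported figures can be illustrated with the standard classical test theory formula, SEM = SD × sqrt(1 − reliability). The sketch below is only an illustration under that assumption; whether ETS derived its published value in exactly this way is not stated in the source, and the function names and the example score of 24 are hypothetical.

```python
import math


def standard_error(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def implied_sd(sem, reliability):
    """Invert the SEM formula to recover the score standard deviation."""
    return sem / math.sqrt(1.0 - reliability)


def score_band(observed, sem, low=0.0, high=30.0):
    """Observed score plus or minus one SEM, clipped to the score scale."""
    return max(low, observed - sem), min(high, observed + sem)
```

With a reliability of 0.74 and an SEM of 2.76, `implied_sd(2.76, 0.74)` suggests a standard deviation of roughly 5.4 points on the 30-point scale. For a hypothetical observed writing score of 24, `score_band(24, 2.76)` gives a band of about 21.2 to 26.8, which shows how much an individual score could plausibly vary from one administration to another.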

ETS as well as other sources describe the organization’s efforts to maintain high reliability

by minimizing the impact of interrater issues. On the organization’s Online Scoring Network,

raters are supervised by their leaders through a “toll-free phone arrangement” during the scoring

session. Next, specialists at ETS examine the quality of ratings, and finally, ETS reviews

supervisors’ reports on raters’ performance to further ensure consistency (Chapelle et al., 2009,

p.266).

However, reliability alone does not tell us enough about the quality of a test. A test’s

validity should also be investigated to determine its quality. Validity of the TOEFL iBT could be

measured on the basis of numerous propositions, such as those presented in Appendix J. In this

review, however, I focus only on the first proposition in the list, which concerns the extent to which

the writing tasks measure what they are supposed to measure, that is, how closely the

writing tasks of the TOEFL iBT resemble those in college courses. To address this question,

Cumming et al. (2004) conducted a study on the authenticity of communicative tasks in the

TOEFL iBT. They interviewed seven highly experienced ESL teachers in the U.S. and Canada

about whether the tasks of the then-new writing section actually resembled those in college

classes. In response, the teachers had their students take prototype TOEFL tasks. The results

indicated that 70% of students’ performances were similar to their performances when they write

English in class (Cumming et al., 2004).

As a test taker, I agree that most of the writing that I produce in graduate school requires

me to state my opinion and synthesize ideas from different sources. Similarly, my interviewees

expressed their overall satisfaction with the nature of the tasks. Their main problem, besides one

of them being a slow typist, was writing under time pressure, which they gradually overcame

with the help of test preparation courses.

Final Thoughts

This review of the writing section of the TOEFL iBT test served as a rich resource on

what goes into designing a high-stakes communicative test. In applying Wesche’s

(1983) and Swain’s (1984) analytical frameworks, I discovered that the writing portion of the

TOEFL iBT test exhibits features of a valid test. It measures test takers’ abilities to write essays

similar to those they compose at the university, by providing academic content supported with

auditory and visual aids to maximize the quality of their performances. Additionally, the writing

section appears to be reliable in that it manifests stability in the scoring process. One of the

interesting facts about the ETS scoring system is that more than one rater scores a single task to

ensure consistency. This complicated process made me think that my low score in the writing

section was more likely due to anxiety than to interrater problems.

Another feature that makes the TOEFL iBT a good candidate for an effective

communicative test is its positive influence on language teaching. Many language institutes offer

test preparation classes to equip international students, especially those with limited computer

skills, with the necessary tips and practice to achieve their target scores. Lastly, as a result of this

review, and particularly the long history of the TOEFL test, I came to realize that tests can never

be perfect; they change over time to reflect current pedagogical practices. As a language

teacher, I will always bear this idea in mind because a test designed for a specific group in a

particular context might not work well for another population in different settings.

References

Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions and directions.

Boston, MA: Heinle & Heinle.

Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language

assessment (New ed.). Upper Saddle River, NJ: Prentice Hall Regents.

Brown, H. D. (2007). Teaching by principles: An interactive approach to language pedagogy.

White Plains, NY: Pearson Education.

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2009). Building a validity argument for the

Test of English as a Foreign Language. New York, NY: Routledge.

Cumming, A., Grant, L., & Mulcahy-Ernt, P. (2004). A teacher-verification study of speaking

and writing prototype tasks for a new TOEFL. Language Testing, 21(1), 107-145.

Educational Testing Service. (2005). How to prepare for the next generation TOEFL test and

communicate with confidence. Retrieved September 28, 2013, from

http://www.transint.boun.edu.tr/toefl/belgeler/tips.pdf

Educational Testing Service. (2007). TOEFL computer-based and paper-based tests. Retrieved

September 28, 2013, from http://www.ets.org/Media/Research/pdf/TOEFL-SUM-0506-CBT.pdf

Educational Testing Service. (2008a). TOEFL iBT at a glance. Retrieved September 28, 2013,

from http://www.ets.org/Media/Tests/TOEFL/pdf/TOEFL_at_a_Glance.pdf

Educational Testing Service. (2008b). Validity evidence supporting the interpretation and use of

TOEFL iBT scores. Retrieved September 28, 2013, from

http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf

Educational Testing Service. (2011). Reliability and comparability of TOEFL iBT scores.

Retrieved September 28, 2013, from

http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf

Educational Testing Service. (2013a). Test and score data summary for TOEFL iBT tests and

TOEFL PBT tests. Retrieved September 28, 2013, from

http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf

Educational Testing Service. (2013b). About the TOEFL iBT test. Retrieved September 28,

2013, from http://www.ets.org/toefl/ibt/about

Educational Testing Service. (2013c). TOEFL iBT test scores. Retrieved September 28, 2013, from

http://www.ets.org/toefl/ibt/scores/

Greenberg, K. L. (1986). The development and validation of the TOEFL writing test: A

discussion of TOEFL research reports 15 and 19. TESOL Quarterly.

Sawaki, Y., Stricker, L., & Oranje, A. (2008). Factor structure of the TOEFL internet-based test

(iBT): Exploration in a field trial sample. Educational Testing Service.

Sharpe, P. J. (2010). TOEFL iBT (13th ed.). Hauppauge, NY: Barron's.

Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon

& M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185-201).

Reading, MA: Addison-Wesley.