Proficiency Test Reviews: the TOEFL iBT, IELTS, and PTE-Academic

Anne Sheriff

Colorado State University

As universities in English-speaking countries continue to recruit international and

non-native English speakers, the need to assess the English proficiency of these students continues to

increase. These universities set admission requirements so that they do not directly admit

students who do not have the English proficiency required for academic success. In order to help

universities make decisions about English proficiency admission requirements, testing

companies continue to design new tests. These revisions help to make English language

proficiency tests better reflect authentic language use tasks found in English medium classrooms.

Many of these newer versions of tests also work to make test tasks more integrated in the search

for authenticity.

For these test reviews, I am interested in looking at English proficiency tests used to

make admission decisions at English speaking universities. I focused on the most popular tests

in the United States, the United Kingdom and Australia and the most recent versions of these

tests. For these purposes, I chose to review the TOEFL iBT, the International English Language

Testing System (IELTS), and the Pearson Test of English Academic (PTE Academic). Each of

these tests assesses all four language skills: reading, listening, speaking and writing. They also all

incorporate some level of integration of skills within their tasks. By reviewing these tests, I hope

to see how they differ in terms of format and scoring, as well as how well they assess the

language skills in a way that is authentic to the academic uses of English.

TOEFL iBT

Publisher: Educational Testing Service (ETS)

Publication Date: 1963--TOEFL pBT, 1998--TOEFL cBT, 2005--TOEFL iBT (ETS, 2007)

Target Population: Students whose first language is not English and who wish to

study at an English speaking university or work in an English speaking

country

Cost: Varies by country of administration; see www.ets.org/toefl

In the United States, the test costs $180. In the United Kingdom, the test

costs $185. In Australia, the test costs $240.

Table 1

Extended Review of TOEFL iBT

Test purpose The TOEFL iBT is designed to measure English proficiency

levels in the four language use domains: listening, speaking, reading

and writing, within language use tasks similar to those seen at

universities. As university students must use all four language skills

in all of their classes, the TOEFL iBT measures these skills in

integrated tasks. As ETS (2012) states, “The TOEFL iBT test uses

integrated tasks that require test takers to combine skills just as they

would in a real academic setting.”

Currently over 9,000 institutions in more than 130 countries

recognize the TOEFL iBT (ETS, 2011b). It is primarily used as an

admission criterion for English medium universities and institutions.

However, it is also used as proof of English proficiency by professional

certification programs, scholarship programs and government agencies,

for example in visa applications (Stoynoff & Chapelle, 2005).

Test structure The TOEFL iBT is a computer-based test administered from

a secure internet-testing network. It takes approximately four and a

half hours to complete and is divided into four sections: reading,

listening, speaking and writing.

The reading section has between 36 and 56 items and takes

60 to 80 minutes (ETS, 2012). Test takers read three or four

passages of general interest, meaning that these texts should not

require any specific background knowledge to understand (ETS,

2011b). Each text is approximately 700 words in length, and

students answer 12-14 questions about each reading passage. Based

on these passages, students are asked multiple-choice questions that

assess comprehension of factual information, ability to infer,

vocabulary comprehension, and the relationships between ideas.

The listening portion of the test includes between 34 and 51

items and lasts between an hour and an hour and a half. Test takers

listen to four to six classroom discussions, lectures and/or

conversations that are between three and five minutes long and

answer questions on these listening passages. These questions are

multiple-choice items designed to assess understanding the main

idea, recognizing important details and the organization of

information, and understanding the relationship between ideas (ETS,

2011b).

The speaking section takes 20 minutes and is divided between

six different tasks. In this section, students are asked to speak on

topics that may come from personal experience, campus-based

situations and academic content. Tasks 1 and 2 are called

independent speaking tasks, because test takers are only asked to

speak about their own ideas, opinions and experiences in their

responses (ETS, 2012). They are expected to respond to a “relatively

general question on a familiar topic” (ETS, 2011b). The other four

tasks integrate multiple skills and require students to orally respond

to information they receive from reading or listening. Tasks 3 and 4

usually ask students to read a short text, then listen to comments

made about the text before giving their response, which may be a

summary of the speaker’s opinions or a combination of important

information from the two sources of input (ETS, 2012).

The writing section is fifty minutes long. This section

contains two tasks. Like the speaking section, the first task is

considered independent. In it, students write an essay on a general

subject, using only their own opinions and experiences. They are

expected to express their opinion of a given subject and develop their

ideas in a formal style. The second task integrates multiple language

skills. Students read a text, listen to a lecture that relates to the text

and then write an essay that summarizes the important topics in the

reading and explains how they relate to the opinions given in the

lecture. ETS (2012) says this summary should be in “connected

English prose.”

Test Scoring The overall score for the TOEFL iBT is 120 points as each

section is worth 30 points. The listening and reading sections are

scored by computer. Most of these multiple-choice questions are

scored as correct or incorrect. However, the questions that ask test

takers to identify relationships between ideas may be given partial

credit. Two human raters score the integrated writing task, while

one human rater and a computer score the independent writing task.

The speaking section responses are scored exclusively by several

human raters and are scored based on delivery, language use and

topic development (ETS, 2012). Scores for all of the performance

sections are assigned using holistic rubrics. Multiple raters score

each individual test to minimize rater bias. The raters are all

certified professionals. These raters come from multiple countries,

so that it is not only raters from the test taker’s country of origin who

score their performance (ETS, 2011b).

On the speaking tasks, each of the six tasks is rated from 0

to 4. The sum of the six rated tasks is then converted to the 0 to 30

scale. Likewise for writing, each of the two tasks is rated from 0 to 5

then combined and converted to the 0 to 30 scale (ETS, 2011b).

Each institution that uses TOEFL iBT scores sets its own

cut-off scores.

Statistical distribution of the scores

ETS provides data based on test taker results from the

January 2013 to December 2013 testing period. Statistical

distribution reflects these results.

Section      Mean    St. Deviation
Reading      20.1    6.7
Listening    19.1    6.7
Speaking     20.1    4.6
Writing      20.6    5.0

(ETS, 2014)

Standard error of measurement

The standard error of measurement as reported by ETS (2011a) is

based on data from 2007. The standard error of measurement (SEM)

is expressed in points on the TOEFL scale.

Section      Scale    SEM
Reading      0-30     3.35
Listening    0-30     3.20
Speaking     0-30     1.62
Writing      0-30     2.76
Total        0-120    5.64

Since the SEM for each section is fairly low in proportion to the

maximum possible score (roughly 11% or less), test takers’ true scores are

probably fairly close to the scores they receive.
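The proportion of the scale taken up by each SEM can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the SEM values quoted above from ETS (2011a), and the helper names `sem_share` and `score_band` are hypothetical. The band of one SEM either side of an observed score is the usual classical-test-theory reading (roughly two-thirds of observed scores falling that close to the true score, assuming normally distributed error), which is an assumption not stated in the ETS figures themselves.

```python
# SEM values and scale maxima as quoted in this review (ETS, 2011a).
SEM = {
    "Reading": (3.35, 30),
    "Listening": (3.20, 30),
    "Speaking": (1.62, 30),
    "Writing": (2.76, 30),
    "Total": (5.64, 120),
}


def sem_share(sem, scale_max):
    """Return the SEM as a percentage of the maximum possible score."""
    return 100 * sem / scale_max


def score_band(observed, sem, scale_max, z=1.0):
    """Return (low, high): observed score +/- z SEMs, clipped to the scale.

    z=1.0 is an illustrative choice, not something ETS prescribes.
    """
    low = max(0.0, observed - z * sem)
    high = min(float(scale_max), observed + z * sem)
    return low, high


for section, (sem, scale_max) in SEM.items():
    print(f"{section:<10} {sem_share(sem, scale_max):5.1f}% of scale")

# Band around a hypothetical observed Reading score of 20.
print(score_band(20, *SEM["Reading"]))
```

Read this way, a Reading score of 20 is compatible with a true score anywhere from about 17 to 23, which matters when an institution's cut-off falls inside that band.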

Evidence of reliability

The reliability estimate for selected-response items is calculated using a

method based on item-response theory, while the estimate for the

constructed-response items is based on generalizability theory (ETS,

2011a).

Section      Reliability Estimate
Reading      0.85
Listening    0.85
Speaking     0.88
Writing      0.74
Total        0.94

These reliability estimates are relatively high for each section except

the writing section, which could be due to the subjective nature of

writing tasks and their scoring. Zhang (2008) found that test takers

who took the TOEFL twice within a short amount of time showed a

high degree of consistency in their scores. In addition, the

TOEFL had a high overall reliability of 0.94, supporting the

consistency of scores and items.

Evidence of validity ETS has conducted numerous studies to support the validity

of the TOEFL iBT. In the report Validity evidence supporting the

interpretation and use of TOEFL iBT score (2011b), ETS lays out

and describes each proposition and the evidence used to prove

validity of the test. ETS (2011b) shows that validity increased with

this internet based version of the TOEFL as it includes integrated

skills tasks similar to those that would be encountered in an English

medium university classroom.

In addition, ETS researchers have conducted several studies

to build an argument for the predictive validity of the TOEFL iBT as

it relates to academic success. Studies compiled in Chapelle et al.

(2008) show a correlation between TOEFL iBT scores and

students’ academic success. However, a study by

Vu & Vu (2013) concluded that there is no significant correlation between

TOEFL scores and GPA. Based on their findings, they state that

“The TOEFL test is not an aptitude and ability test…just simply a

language test used as a prerequisite for academic performance” (p.

18).

Sawaki and Nissan (2009) provide evidence of criterion-

related validity in the listening section by comparing the language

use tasks used in this section of the test to native speaker language

use in higher education contexts. They found that students who

score a 14 or above on the listening section of the TOEFL iBT were

usually able to comprehend 50% or more of university lectures.

While this could potentially be a good thing, they do not conclude

that this is enough to succeed in an English medium university

classroom. They also do not provide evidence for how much more

students who score more than 14 can understand.

International English Language Testing System (IELTS)

Publisher: University of Cambridge ESOL Examinations, the British Council and

IDP: IELTS Australia. Subject Manager, University of Cambridge ESOL

Examinations, 1 Hills Road, Cambridge CB1 2EU United Kingdom;

telephone 44-1223-553355; fax 44-1223-460278;

http://www.ielts.org/. Manager, North America, Cambridge Examinations

and IELTS International, 100 East Corson Street, Suite 200, Pasadena, CA

91103 USA; telephone 626-564-2954; fax 626-564-2981;

https://www.ielts.org/

Publication Date: 1989

Target Population: Students for whom English is not a first language and who wish to work or

attend university in an English-speaking country

Cost: Varies greatly by testing center; see http://www.ielts.org/

IELTS has a set fee for its test. The Academic and General Training tests

are the same cost. However, fees vary between testing centers, possibly

due to differing administration fees. In the United States, costs range from

$190-$265, in the United Kingdom, £135-£145, in Australia, AUD$330.

Table 2

Extended Description for IELTS

Test purpose IELTS provides a measure of the English language ability of

non-native English speakers who wish to work or study where English is

the language of communication (O’Sullivan, 2005). When it was first

created, the test was intended for students who wished to study in

Australia, the United Kingdom and New Zealand. However, in the past

10 years, schools in the United States and Canada have started accepting

scores. In addition to schools and universities, government agencies,

professional and industry bodies and multinational companies and

employers accept IELTS scores as proof of English ability. In fact, more

than 9,000 organizations in over 135 countries accept IELTS (IELTS

Partners, 2013a).

IELTS actually consists of two tests, an academic route for test

takers who are planning to apply to universities or for membership in

professional organizations and a general training route for test takers

who wish to attend certain high schools or participate in work or training

activities (O’Sullivan, 2005). Australia, New Zealand and Canada also

use the general training track for immigration purposes (IELTS Partners,

2013a). Every recognizing organization sets its own entry requirements.

Test Structure IELTS covers all four language skills: listening, reading, writing

and speaking. The academic and general training routes have the same

listening and speaking tests and differ only in the reading and writing

sections which are designed to reflect academic reading and writing

purposes and general training reading and writing purposes respectively.

The listening, reading and writing tests must be completed on the same

day. However, the speaking test may be taken up to seven days before

or after the other three tests (IELTS Partners, 2013b). The first three

tests take two hours and thirty minutes to complete. The speaking test

lasts between 11 and 14 minutes.

The listening test lasts approximately 30 minutes with an

additional 10 minutes of transfer time. It is made up of 40 questions of a

variety of types including: multiple choice, matching, form completion,

summary completion and short-answer questions on four different

listening sections. The first is a conversation between two people set in

an everyday social context. Section two is a monologue again set in an

everyday social context. The third section is a conversation between up

to four people, which is set in an educational or training context. The

last section is a monologue on an academic subject. Each section is only

heard once and is performed by a variety of voices and native-speaker

accents (IELTS Partners, 2013b).

The reading test is a 60 minute test with 40 questions. These

questions include a variety of different task types including: identifying

information, identifying writer’s views/claims, matching headings,

sentence completion, table completion and diagram-label completion.

The Academic reading test and the general training reading test each

have three sections, for a total text length of 2150-2750 words (IELTS

Partners, 2013b). In the academic reading test, each section contains one

long authentic text taken from books, journals, magazines or

newspapers. They have been written for a non-specialist audience about

academic topics of general interest. The texts may contain non-verbal

materials such as diagrams or illustrations. When they contain technical

terms, a simple glossary is provided (IELTS Partners, 2013b). The

General Training reading test, like the Academic reading test, contains

authentic texts, taken from notices, advertisements, company handbooks,

official documents, books, magazines and newspapers. The first section

contains two or three short factual texts relevant to everyday life in

English-speaking countries. The second section is two short texts that

focus on work-related issues. The last section is one longer, more

complex text on a topic of general interest (IELTS Partners, 2013b).

The writing test lasts 60 minutes and is made up of two tasks.

Test takers write at least 150 words for Task 1 and at least 250 words for

Task 2. For the Academic writing test, task 1 asks candidates to

summarize and explain the information presented in some sort of chart

or diagram in their own words. In task 2, they are asked to write an

essay in response to a point of view, argument or problem. Both task

responses should be written in a formal style. For the General Training

writing test, in task 1, test takers are given a situation and must write a

letter about this situation. The style of the letter may be personal, semi-

formal or formal. In task 2, test takers must write an essay in response

to a point of view, an argument or a problem.

The speaking test is a three-part face-to-face oral interview with

an examiner that lasts 11 to 14 minutes. This test is recorded for scoring

purposes. The first part is an introduction and interview that lasts four to

five minutes. The examiner introduces themself and then asks the test

taker general questions on familiar topics. In part two, the test taker is

given a task card, which asks them to talk about a particular topic. It

also gives the test taker several points that they can include in their talk.

They are given one minute to prepare and one to two minutes to speak.

Following their short talk, the examiner asks one or two questions on the

same topic. The last part is a two-way discussion. The examiner asks

further questions connected to the topic of part two for four to five

minutes, giving the test taker more opportunity to discuss abstract issues

and ideas (IELTS Partners, 2013b).

Test Scoring IELTS scores are reported on a 9-band scale, with 9 indicating an

expert user and 1 indicating a non-user (IELTS Partners, 2013a). Scores

are reported as whole or half bands.

On the listening and reading tests, each correct answer receives

one point for scores out of forty that are converted to the IELTS 9-band

scale. How these scores are converted for these skills is shown in the

table below.

Listening                Academic Reading         General Training Reading
Raw score   Band score   Raw score   Band score   Raw score   Band score
16          5            15          5            15          4
23          6            23          6            23          5
30          7            30          7            30          6
35          8            35          8            34          7

On the writing test, responses are assessed by certified IELTS

examiners according to the Writing Test Band Descriptors (task

achievement/response, coherence and cohesion, lexical resource and

grammatical range and accuracy). Task 2 contributes twice as much as

Task 1 to the overall writing score. This distribution could be due to the

fact that Task 2 more closely reflects authentic academic situations and

involves higher order thinking skills.

On the speaking test, test takers are assessed on their

performance throughout the test by certified IELTS examiners according

to the Speaking Test Band Descriptors (fluency and coherence, lexical

resource, grammatical range and accuracy, and pronunciation) (IELTS

Partners, 2013b).

Once a band score has been determined for each part of the test,

the four scores are averaged together for the overall score. When a test

taker’s overall score ends in a .25, it is rounded up to the next highest

half band and when it ends in a .75, it is rounded up to the next highest

whole band (IELTS Partners, 2013a).
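The averaging and rounding rule described above can be sketched in a few lines. This is a minimal illustration, not IELTS's own implementation: the function name `overall_band` is hypothetical, and the treatment of averages that end in something other than .25 or .75 (rounding to the nearest half band) is an assumption that goes beyond the two cases stated in the text.

```python
import math


def overall_band(listening, reading, writing, speaking):
    """Average four band scores and round to a half band.

    Averages ending in .25 or .75 round up, as described in the text.
    Other fractions round to the nearest half band (an assumption here).
    """
    mean = (listening + reading + writing + speaking) / 4
    # Doubling turns half bands into whole numbers; adding 0.5 before
    # flooring makes exact ties (.25 and .75) round upward.
    return math.floor(mean * 2 + 0.5) / 2


print(overall_band(6.5, 6.5, 5.0, 7.0))  # mean 6.25 -> 6.5
print(overall_band(6.5, 6.5, 7.0, 7.0))  # mean 6.75 -> 7.0
```

Because band scores are reported in half-band steps, the four-score average is always a multiple of 0.125, so the floating-point arithmetic here is exact.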

Statistical Distribution of the scores

IELTS provides data based on test taker results from 2012. According

to their research, these results are broadly in line with statistics of

previous years.

Module                      Mean   Standard Deviation   SEM
Listening                   6.0    1.3                  0.390
Academic Reading            5.9    1.0                  0.316
General Training Reading    6.1    1.2                  0.339
Academic Writing            5.5    -                    -
General Training Writing    5.9    -                    -
Speaking                    6.0    -                    -

Due to the format of the Writing and Speaking modules, scores cannot

be reported in the same manner and no standard deviation information is

provided for these tests. However, IELTS states that they are rated by

trained and standardized examiners according to detailed descriptive

criteria and rating scales. (IELTS Partners, 2013a)

Standard error of measurement

The overall standard error of measurement for IELTS is 0.22

bands (Pearson, 2012a). Because the SEM is less than one band, test

takers’ scores are a good indicator of their true abilities.

Evidence of reliability

Internal consistency is reported for the reading and listening

modules for 2012. Across the 16 versions of the listening module, the

mean of Cronbach’s alpha is 0.91. Across the 16 versions of the general

training reading module, the mean of Cronbach’s alpha is 0.92. Finally

across the 16 versions of the academic reading module, the mean of

Cronbach’s alpha is 0.90. As these measures of internal consistency are

very close to 1, each section is shown to have high reliability.
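These alpha values can be tied back to the SEM figures reported for the same modules. The sketch below assumes the classical test theory relation SEM = SD × √(1 − reliability); IELTS is not quoted here as using this formula, so it is offered only as a consistency check, and the function name `sem_from_reliability` is hypothetical. The standard deviations and alphas are the 2012 figures cited in this review.

```python
import math

# (standard deviation, mean Cronbach's alpha) per module, as cited above.
MODULES = {
    "Listening": (1.3, 0.91),
    "Academic Reading": (1.0, 0.90),
    "General Training Reading": (1.2, 0.92),
}


def sem_from_reliability(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


for name, (sd, alpha) in MODULES.items():
    print(f"{name:<26} SEM = {sem_from_reliability(sd, alpha):.3f}")
```

Under that assumption the computed values come out at 0.390, 0.316 and 0.339 bands, matching the SEM column reported for the listening and reading modules.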

Because the writing and speaking modules are not item-based,

IELTS does not report reliability for these modules in the same manner

as reading and listening. Reliable scores for these two sections are

assured through face-to-face training and certification of examiners as

well as redoing this process every two years. Because the speaking tests

are recorded, a professional support network can manage and

standardize their examiners’ ratings. Shaw (2004) reported that the

inter-rater correlation for these sections was 0.77 and the g-coefficients were

0.84-0.93 for the single-rater condition.

Overall reliability estimate for the test is reported as 0.96

(Pearson, 2012b). This shows that the test as a whole is also highly

reliable.

Evidence of validity

The University of Cambridge ESOL Examinations claims the use

of expert judgment in operationalizing the construct as well as empirical

evidence provided through statistical analysis of test responses as

evidence of construct-related validity (O’Sullivan, 2005).

IELTS provides a large number of research studies on their

website showing the correlation between test scores and test outcomes as

evidence of predictive validity (IELTS Partners, 2013a). Some of these

studies examine the relationship between IELTS scores and grade point

average (GPA). One such study showed that IELTS scores for both

listening and reading correlated with higher GPAs, while the writing and

speaking scores did not correlate with GPA (Humphreys et al.,

2012). This study suggests that of the four sections, only the listening

and reading sections have some valid predictive value. Another study

funded by IELTS examined case studies of IELTS used for postgraduate

admissions and performance for predictive purposes. This study

concludes that IELTS scores do not seem to indicate strongly one way or

another how a student will perform in an English-speaking academic

domain (Lloyd-Jones, Neame & Medaney, 2011).

However, a different study, by Ingram and Bayliss (2007), found

that most of the students who participated had language behavior that

equaled or exceeded that implied by their overall IELTS rating. This

study also found that an overall proficiency level of 6.5 or higher was

adequate for university study (Ingram & Bayliss, 2007).

Pearson Test of English Academic (PTE Academic)

Publisher: Pearson

Publication Date: 2009

Target Population: Students who do not speak English as their native language who want

to study at English medium universities

Cost: Cost varies by country of testing center; see pearsonpte.com for prices. In

the United States, the test costs $200. In the United Kingdom, the test

costs £155. In Australia, it costs AUD$330.

Table 3

Extended Review of PTE Academic

Test purpose PTE Academic is a proficiency assessment aimed at students

who wish to attend English medium universities. It measures

reading, writing, listening and speaking in integrated tasks within a

computer-based format. Pearson developed the PTE Academic to

meet the need for an English language proficiency test that

measured non-native English speakers’ English communication

skills more accurately in the context of an academic environment as

securely and efficiently as possible (Pearson, 2013).

As this test is relatively new, its goal is to one day compete

with other large-scale, high-stakes English proficiency tests such as

the IELTS and TOEFL iBT. In fact, in much of Pearson’s research

(2009, 2012a, 2012b), statistical comparisons to these two tests are

included as proof of its accuracy and objectivity.

In addition to university admission, the PTE Academic can

be used as proof of English proficiency in order to obtain a visa in

New Zealand, Australia, the United Kingdom and the United States.

Test structure PTE Academic is a computer-based English language test. It

assesses the four skills, listening, reading, speaking, and writing, in

an integrated format. The test is three hours long and split into three

timed parts. The timings of these sections may vary slightly. There

are 20 different types of questions, some of which integrate more

than one language skill.

The test starts with an unscored personal introduction. The

first part of the test is speaking and writing which can last between

77 and 93 minutes. The second part is a 32 to 41 minute reading

section. Students can choose to take a break after this part before

starting on the last section which is a 45 to 57 minute listening

section (Pearson, 2012a).
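The section timings above can be summed to see how far the total can drift from the nominal three hours. This is a small arithmetic sketch using the minute ranges quoted from Pearson (2012a); it leaves out the unscored introduction and the optional break, whose lengths are not given here, and the names `SECTION_MINUTES`, `total_range` and `as_hours` are hypothetical.

```python
# (shortest, longest) duration of each timed part, in minutes.
SECTION_MINUTES = {
    "speaking and writing": (77, 93),
    "reading": (32, 41),
    "listening": (45, 57),
}


def total_range(sections):
    """Return (shortest, longest) total testing time in minutes."""
    shortest = sum(low for low, _ in sections.values())
    longest = sum(high for _, high in sections.values())
    return shortest, longest


def as_hours(minutes):
    """Format a number of minutes as hours and minutes."""
    hours, rest = divmod(minutes, 60)
    return f"{hours} h {rest:02d} min"


shortest, longest = total_range(SECTION_MINUTES)
print(as_hours(shortest), "to", as_hours(longest))  # 2 h 34 min to 3 h 11 min
```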

Within the speaking and writing section of the test, there are

five types of tasks that focus on speaking and two types of tasks that

focus on writing. The speaking tasks include: reading aloud,

repeating sentences, describing images, retelling a lecture, and

answering short questions. The writing tasks include: summarizing

a written text and writing an essay.

Within the reading section of the test, there are five types of

tasks. These are two types of multiple-choice questions (choosing

single answers and choosing multiple answers), reordering

paragraphs, reading with fill in the blanks, and reading and writing

with fill in the blanks.

Within the listening section, test takers are asked to

summarize a spoken text, answer multiple choice questions with

multiple answers and with single answers, fill in the blanks,

highlight a correct summary, select missing words, highlight

incorrect words, and do a dictation (Pearson, 2011).

Test scoring PTE Academic provides an overall score for each test taker

as well as scores for communicative skills (reading, listening,

speaking and writing) and enabling skills (grammar, oral fluency,

pronunciation, spelling, vocabulary and written discourse). As

many items contribute to more than one communicative or enabling

skill, the overall score cannot be directly computed from these two

scores. All of the scores are on a scale from 10 to 90: 10 being the

lowest proficiency and 90 being the highest. For integrated skills

items, the item score contributes to all of the skills that the item

assesses.

The scores for the enabling skills rate performance in the

productive skills, speaking and writing (Pearson, 2012b). No

enabling skills score is awarded for responses to items that are

inappropriate in either content or form.

All items in PTE Academic are machine scored. Some items

have correct-incorrect scoring while others use partial credit

scoring, taking into account correctness, formal aspects and the

quality of the response. The formal aspects include whether or not a

response is under or over the word limit, while the quality of the

response is represented by the enabling skills assessed by an item.

Speaking and writing skill scores are generated by automated

scoring systems.

Performance-based items are first scored on

content. If no response or an irrelevant response is given, the

content is scored as 0 and no more points are awarded for this task.

If a score is given for content, the item will also be scored on form.

If the response is of the appropriate length, a score will be given and

then the response will be rated on the enabling skills present in the

response. The total item score comes from adding the scores for

content, form and enabling skills. The total item score contributes

to the communicative skills score assessed by the item as well as to

the overall score. The individual scores awarded to each of the

enabling skills contribute to the enabling skills scores that are

reported (Pearson, 2012b).
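The scoring sequence for performance-based items can be sketched as a short function. This is only an illustration of the order of checks described above: the function name `score_item`, the point values and the enabling-skill names in the example are hypothetical and are not taken from Pearson's scoring guide. It also assumes that a response of inappropriate length (form scored 0) keeps its content points but earns no enabling-skills points, which is one reading of the description.

```python
def score_item(content, form, enabling):
    """Total score for one performance-based item.

    content  -- points for content (0 = no response or irrelevant)
    form     -- points for form (0 = response is not of appropriate length)
    enabling -- dict mapping enabling-skill names to points
    """
    if content == 0:
        # No or irrelevant response: nothing further is awarded.
        return 0
    if form == 0:
        # Assumption: enabling skills are not rated when form fails.
        return content
    return content + form + sum(enabling.values())


# Hypothetical essay response: 3 content points, 2 form points.
print(score_item(3, 2, {"grammar": 2, "vocabulary": 1}))  # 8
print(score_item(0, 2, {"grammar": 2, "vocabulary": 1}))  # 0
```

The total returned here corresponds to the item score that feeds the communicative skills and overall scores; the separate enabling-skills scores would be accumulated from the individual values in `enabling`.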

Statistical Distribution of scores

Riazi (2014) reports the mean and standard deviation for each skill

based on data from a small group of students.

Skill        Mean     SD
Listening    64.65    14.66
Reading      60.33    14.66
Speaking     66.92    17.10
Writing      60.5     13.72
Total        63.42    14.10

Standard Error of Measurement

In their Accurate Factsheet, Pearson (2012) states the

standard error of measurement for PTE Academic as a whole is 2.32

points on the PTE Academic scoring scale. In addition, in their

Reliability and Validity Report, Pearson (2009) states that the

standard error of measurement for each of the communication skills

is less than 5 points. These measurements suggest that test takers’

scores are likely to be close to their true scores.

Evidence of Reliability

Pearson (2012) presents reliability estimates for each of the

communicative skills as well as an overall reliability. These

estimates are presented below:

Skill        Reliability
Listening    0.91
Reading      0.92
Speaking     0.91
Writing      0.91
Total        0.97

These estimates show a high reliability for the PTE Academic,

demonstrating the overall high consistency and stability of test

scores.

In order to support that the machine scoring (VERSANT)

system used to score this assessment was reliable, Pearson

compared the machine-given scores to human scores. The overall

reliability for both groups was 0.97 and the correlation between the

two types of scoring was 0.96 (Pearson, 2009).

Evidence of Validity From the beginning, evidence was collected to ensure the

validity of PTE Academic. It was important to the developers that

the test be linked to other external frameworks of language

proficiency. To fulfill this goal, PTE Academic was benchmarked

to the Common European Framework (CEF). In order to guarantee

this alignment, test item writers received training in understanding,

interpreting and using the CEF, and student responses were rated on

the CEF scale independent of the test scores by human raters

(Pearson, 2009).

Another way Pearson worked toward construct validity in

PTE Academic was having native English speakers test the pilot

items. Their results constituted one of the item selection criteria (Zheng

& De Jong, 2011).

As proof of concurrent and predictive validity, Riazi (2013)

compared PTE Academic with IELTS. This comparison showed

that there is a significant correlation between test takers’ scores on

PTE Academic and IELTS Academic. In addition, PTE Academic

was able to significantly differentiate between lower and higher

proficiency groups as determined by IELTS Academic scores.

After reviewing these three tests, it was clear that all three tests assess language skills

necessary for success in an English medium classroom. All three work to use authentic materials

as input for the assessment. The TOEFL iBT has students listen to lectures, IELTS has students

read authentic articles, and PTE Academic has students listen to lectures and read texts that cover

academic topics. These tests also integrate the skills to varying degrees. IELTS has a few tasks

that require students to read short texts and write or speak about these texts. The TOEFL iBT

requires students to read or listen before speaking and to read and listen before writing. The PTE

Academic integrates most of its tasks, asking students to read and write or listen and write or

read and speak throughout the test. These tests also differ in how they score these skills.

Because of the types of tasks included on the IELTS, highly trained raters are used to score all

parts of the test. The TOEFL is scored partially by computers, and partially by highly trained

raters as well. The PTE Academic, however, is completely scored by computer with the

VERSANT software. All three tests take more than two and a half hours to administer. IELTS

is the shortest at around two hours and forty-five minutes. PTE Academic is the next longest at

three hours. The TOEFL is by far the longest, lasting four and a half hours.

Students in an academic English context might need to choose between these three tests.

These students want to study at an English speaking university in the United States, the United

Kingdom or Australia, and they need to demonstrate their English proficiency in order to be

admitted into the university of their choice. All of them will have the equivalent of a high school

diploma and some of them might even already have some sort of undergraduate degree in their

home countries.

In choosing which exam to suggest for these students, the instructor must think about the

students’ goals. If a student knows exactly which university they wish to attend, they can look at

which test scores that university accepts and decide which of those tests they are likely to do

best on. They might think the format of one test seems more appealing. Speaking into a

microphone for the speaking section as on the TOEFL iBT or PTE Academic might seem less

stressful than a face-to-face interview as on the IELTS. I, personally, tend to be skeptical of a

test that claims to score speaking and writing tasks by computer only. However, Pearson

claims that its scoring software has been trained on thousands of samples so that it grades

any answer provided to it effectively and objectively.

However, if a student does not know which university they would like to attend, I

would suggest the TOEFL iBT. It is one of the most widely accepted English proficiency tests

and is known by almost everyone who works with international students. Another fact that may be a

deciding factor for students is that the TOEFL iBT is the cheapest of these three tests. While

extensive research supporting the reliability and validity of all three tests is available, ETS has

published the most research on the TOEFL iBT, as well as the most study manuals.

References

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the Test

of English as a Foreign Language. New York, NY: Routledge Taylor & Francis Group.

ETS. (2007). Test and Score Data Summary for TOEFL Computer-Based and Paper-Based

Tests. Retrieved from http://www.ets.org/toefl/

ETS. (2014). Test and score data summary for TOEFL iBT Tests. Retrieved from

http://www.ets.org/toefl/

ETS. (2011). TOEFL iBT Research: Reliability and comparability of TOEFL iBT scores.

Retrieved from http://www.ets.org/toefl/

ETS. (2011). TOEFL iBT Research: TOEFL iBT test framework and test development. Retrieved

from http://www.ets.org/toefl/

ETS. (2011). TOEFL iBT Research: Validity evidence supporting the interpretation and use of

TOEFL iBT Scores. Retrieved from http://www.ets.org/toefl/

ETS. (2012). TOEFL Test Prep Planner. Retrieved from http://www.ets.org/toefl/

IELTS Partners. (2013). Information for Candidates. Retrieved from http://www.ielts.org/

IELTS Partners. (2013). IELTS – International English Language Testing System.

Retrieved from http://www.ielts.org/

Humphreys, P., Haugh, M., Fenton-Smith, B., Lobo, A., Michael, R., & Walkinshaw, I. (2012).

Tracking international students’ English proficiency over the first semester of

undergraduate study. IELTS Research Reports Online Series (1). Retrieved from

http://www.ielts.org/

Ingram, D., & Bayliss, A. (2007). IELTS as a predictor of academic language performance.

Melbourne: IELTS Australia.

Lloyd-Jones, G., Neame, C., & Medaney, S. (2011). A multiple case study of the relationship

between the indicators of students’ English language competence on entry and students’

academic progress at an international postgraduate university. IELTS Research Reports

(11). Retrieved from www.ielts.org/PDF/vol11_report_3_a_multiple_case_study.pdf

O’Sullivan, B. (2005). International English Language Testing System (IELTS). In S.

Stoynoff & C. Chapelle, ESOL Tests and Testing (pp. 73-78). Alexandria, VA: TESOL.

Pearson. (2012). Pearson Academic Accurate Factsheet. Retrieved from

http://pearsonpte.com/

Pearson. (2012). PTE Academic and Me. Retrieved from http://pearsonpte.com/

Pearson. (2012). Pearson Academic Objective Factsheet. Retrieved from


Pearson. (2011). PTE Academic Test Tips. Retrieved from http://pearsonpte.com/

Pearson. (2012). PTE Academic Score Guide. Retrieved from http://pearsonpte.com/

Pearson. (2009). Validity and reliability in PTE Academic. Retrieved from


Riazi, M. (2013). Concurrent and predictive validity of Pearson Test of English Academic

(PTE Academic). Papers in Language Testing and Assessment, 2(2), 1-27.

Sawaki, Y., & Nissan, S. (2009). Criterion-related validity of the TOEFL iBT listening section.

Princeton, NJ: ETS.

Shaw, S. D. (2004). IELTS writing: Revising assessment criteria and scales (phase 3).

Research Notes, 16, 3-7.

Stoynoff, S., & Chapelle, C. (2005). ESOL Tests and Testing. Alexandria, VA: TESOL.

Vu, L. T., & Vu, P. H. (2013). Is the TOEFL score a reliable indicator of international graduate

students’ academic achievement in American higher education? International Journal on

Studies in English Language and Literature (IJSELL), 1(1), 11-19.

Zhang, Y. (2008). Repeater analysis for the TOEFL iBT. ETS Research Report (RM-08-05).

Princeton, NJ: ETS.

Zheng, Y., & De Jong, J. H. A. L. (2011). Research Note: Establishing Construct and

Concurrent Validity of Pearson Test of English Academic. London: Pearson.
