Proficiency Test Reviews: the TOEFL iBT, IELTS, and PTE-Academic
Anne Sheriff
Colorado State University
As universities in English speaking countries continue to recruit international and
non-native English speakers, the need to assess the English proficiency of these students continues to
increase. These universities set admission requirements so that they do not directly admit
students who do not have the English proficiency required for academic success. In order to help
universities make decisions about English proficiency admission requirements, testing
companies continue to design new tests. These revisions help to make English language
proficiency tests better reflect authentic language use tasks found in English medium classrooms.
Many of these newer versions of tests also work to make test tasks more integrated in the search
for authenticity.
For these test reviews, I am interested in looking at English proficiency tests used to
make admission decisions at English speaking universities. I focused on the most popular tests
in the United States, the United Kingdom and Australia and the most recent versions of these
tests. For these purposes, I chose to review the TOEFL iBT, the International English Language
Testing System (IELTS), and the Pearson Test of English Academic (PTE Academic). Each of
these tests assesses all four language skills: reading, listening, speaking and writing. They also all
incorporate some level of integration of skills within their tasks. By reviewing these tests, I hope
to see how they differ in terms of format and scoring, as well as how well they assess the
language skills in a way that is authentic to the academic uses of English.
TOEFL iBT
Publisher: Educational Testing Service (ETS)
Publication Date: 1963--TOEFL pBT, 1998--TOEFL cBT, 2005--TOEFL iBT (ETS, 2007)
Target Population: Students whose first language is not English and who wish to
study at an English speaking university or work in an English speaking
country
Cost: Varies by country of administration; see www.ets.org/toefl
In the United States, the test costs $180. In the United Kingdom, the test
costs $185. In Australia, the test costs $240.
Table 1
Extended Review of TOEFL iBT
Test purpose The TOEFL iBT is designed to measure English proficiency
levels in the four language use domains: listening, speaking, reading
and writing, within language use tasks similar to those seen at
universities. As university students must use all four language skills
in all of their classes, the TOEFL iBT measures these skills in
integrated tasks. As ETS (2012) states, “The TOEFL iBT test uses
integrated tasks that require test takers to combine skills just as they
would in a real academic setting.”
Currently over 9,000 institutions in more than 130 countries
recognize the TOEFL iBT (ETS, 2011b). It is primarily used as an
admission criterion for English medium universities and institutions.
However, it is also used by professional certification programs and
scholarship programs, and for governmental purposes such as obtaining visas, as
proof of English proficiency (Stoynoff & Chapelle, 2005).
Test structure The TOEFL iBT is a computer-based test administered from
a secure internet-testing network. It takes approximately four and a
half hours to complete and is divided into four sections: reading,
listening, speaking and writing.
The reading section has between 30 and 76 items and takes
60 to 80 minutes (ETS, 2012). Test takers read three or four
passages of general interest, meaning that these texts should not
require any specific background knowledge to understand (ETS,
2011b). Each text is approximately 700 words in length, and
students answer 12-14 questions about each reading passage. Based
on these passages, students are asked multiple-choice questions that
assess comprehension of factual information, ability to infer,
vocabulary comprehension, and the relationships between ideas.
The listening portion of the test includes between 34 and 51
items and lasts between an hour and an hour and a half. Test takers
listen to four to six classroom discussions, lectures and/or
conversations that are between three and five minutes long and
answer questions on these listening passages. These questions are
multiple-choice items designed to assess understanding the main
idea, recognizing important details and the organization of
information, and understanding the relationship between ideas (ETS,
2011b).
The speaking section takes 20 minutes and is divided between
six different tasks. In this section, students are asked to speak on
topics that may come from personal experience, campus-based
situations and academic content. Tasks 1 and 2 are called
independent speaking tasks because test takers are only asked to
speak about their own ideas, opinions and experiences in their
responses (ETS, 2012). They are expected to respond to a “relatively
general question on a familiar topic” (ETS, 2011b). The other four
tasks integrate multiple skills and require students to orally respond
to information they receive from reading or listening. Tasks 3 and 4
usually ask students to read a short text, then listen to comments
made about the text before giving their response, which may be a
summary of the speaker’s opinions or a combination of important
information from the two sources of input (ETS, 2012).
The writing section is fifty minutes long. This section
contains two tasks. Like the speaking section, the first task is
considered independent. In it, students write an essay on a general
subject, using only their own opinions and experiences. They are
expected to express their opinion of a given subject and develop their
ideas in a formal style. The second task integrates multiple language
skills. Students read a text, listen to a lecture that relates to the text
and then write an essay that summarizes the important topics in the
reading and explains how they relate to the opinions given in the
lecture. ETS (2012) says this summary should be in “connected
English prose.”
Test Scoring The maximum overall score for the TOEFL iBT is 120 points, as each
of the four sections is worth 30 points. The listening and reading sections are
scored by computer. Most of these multiple-choice questions are
scored as correct or incorrect. However, the questions that ask test
takers to identify relationships between ideas may be given partial
credit. Two human raters score the integrated writing task, while
one human rater and a computer score the independent writing task.
The speaking section responses are scored exclusively by several
human raters and are scored based on delivery, language use and
topic development (ETS, 2012). Scores for all of the performance
sections are assigned using holistic rubrics. Multiple raters score
each individual test to minimize rater bias. The raters are all
certified professionals. These raters come from multiple countries,
so that it is not only raters from the test taker’s country of origin who
score their performance (ETS, 2011b).
On the speaking tasks, each of the six tasks is rated from 0
to 4. The sum of the six rated tasks is then converted to the 0 to 30
scale. Likewise for writing, each of the tasks is rated from 0 to 5,
then combined and converted to the 0 to 30 scale (ETS, 2011b).
Each institution that uses TOEFL iBT scores sets its own
cut-off scores.
Statistical distribution of the scores
ETS provides data based on test taker results from the
January 2013 to December 2013 testing period. Statistical
distribution reflects these results.
            Mean   St. Deviation
Reading     20.1   6.7
Listening   19.1   6.7
Speaking    20.1   4.6
Writing     20.6   5.0
(ETS, 2014)
Standard error of measurement
The standard error of measurement as reported by ETS (2011a) is
based on data from 2007. The standard error of measurement (SEM)
is expressed in points on the TOEFL scale.
Section     Scale   SEM
Reading     0-30    3.35
Listening   0-30    3.20
Speaking    0-30    1.62
Writing     0-30    2.76
Total       0-120   5.64
Since the SEM for each section is fairly low in proportion to the
maximum possible score (about 11% or less), test takers’ true scores are
probably fairly close to the scores they receive.
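The proportion of the scale taken up by each SEM can be checked directly from the table. The following is a minimal Python sketch and not part of the ETS report: the names SEM_BY_SECTION, sem_share and score_band are illustrative, and reading an observed score as "plus or minus one SEM" is a common rule of thumb assumed here, not a procedure ETS prescribes.

```python
# SEM values and scale maxima copied from the table above (ETS, 2011a).
# All names in this sketch are hypothetical and chosen for illustration.
SEM_BY_SECTION = {
    "Reading": (3.35, 30),
    "Listening": (3.20, 30),
    "Speaking": (1.62, 30),
    "Writing": (2.76, 30),
    "Total": (5.64, 120),
}


def sem_share(sem, scale_max):
    """Return the SEM as a percentage of the maximum possible score."""
    return 100 * sem / scale_max


def score_band(observed, sem, scale_max):
    """Return (low, high): the observed score plus or minus one SEM,
    clipped to the limits of the reporting scale (assumed rule of thumb)."""
    low = max(0.0, observed - sem)
    high = min(float(scale_max), observed + sem)
    return low, high


for section, (sem, scale_max) in SEM_BY_SECTION.items():
    print(f"{section:<10} SEM is {sem_share(sem, scale_max):.1f}% of the scale")
```

For example, score_band(20, 3.35, 30) gives the range of reading scores within one SEM of an observed score of 20.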
Evidence of reliability
The reliability estimate for select-response items is calculated using a
method based on item-response theory, while the estimate for the
constructed response items is based on generalizability theory (ETS,
2011a).
Section     Reliability Estimate
Reading     0.85
Listening   0.85
Speaking    0.88
Writing     0.74
Total       0.94
These reliability estimates are relatively high for each section except
the writing section, which could be due to the subjective nature of
writing tasks and their scoring. Zhang (2008) found that test takers
who took the TOEFL twice within a short amount of time showed a
high degree of consistency in their scores. In addition, the
TOEFL had a high overall reliability of 0.94, supporting the
consistency of scores and items.
Evidence of validity ETS has conducted numerous studies to support the validity
of the TOEFL iBT. In the report Validity evidence supporting the
interpretation and use of TOEFL iBT score (2011b), ETS lays out
and describes each proposition and the evidence used to prove
validity of the test. ETS (2011b) shows that validity increased with
this internet-based version of the TOEFL as it includes integrated
skills tasks similar to those that would be encountered in an English
medium university classroom.
In addition, ETS researchers have conducted several studies
to build an argument for the predictive validity of the TOEFL iBT as
it relates to academic success. Studies compiled in Chapelle et al.
(2008) show a correlation between TOEFL iBT scores and
students’ academic success. However, a study by
Vu & Vu (2013) concluded that there is no significant correlation between
TOEFL scores and GPA. Based on their findings, they state that
“The TOEFL test is not an aptitude and ability test…just simply a
language test used as a prerequisite for academic performance” (p.
18).
Sawaki and Nissan (2009) provide evidence of criterion-
related validity in the listening section by comparing the language
use tasks used in this section of the test to native speaker language
use in higher education contexts. They found that students who
scored 14 or above on the listening section of the TOEFL iBT were
usually able to comprehend 50% or more of university lectures.
While this could potentially be a good thing, they do not conclude
that this is enough to succeed in an English medium university
classroom. They also do not provide evidence for how much more
students who score more than 14 can understand.
International English Language Testing System (IELTS)
Publisher: University of Cambridge ESOL Examinations, the British Council and
IDP: IELTS Australia. Subject Manager, University of Cambridge ESOL
Examinations, 1 Hills Road, Cambridge CB1 2EU United Kingdom;
telephone 44-1223-553355; fax 44-1223-460278; [email protected];
http://www.ielts.org/. Manager, North America, Cambridge Examinations
and IELTS International, 100 East Corson Street, Suite 200, Pasadena, CA
91103 USA; telephone 626-564-2954; fax 626-564-2981;
[email protected]; https://www.ielts.org/
Publication Date: 1989
Target Population: Students for whom English is not a first language and who wish to work or
attend university in an English-speaking country
Cost: Varies greatly by testing center; see http://www.ielts.org/
IELTS has a set fee for its test. The Academic and General Training tests
cost the same. However, fees vary between testing centers, possibly
due to differing administration fees. In the United States, costs range from
$190 to $265; in the United Kingdom, from £135 to £145; in Australia, the cost is AUD$330.
Table 2
Extended Description for IELTS
Test purpose IELTS provides a measure of the English language ability of
non-native English speakers who wish to work or study where English is
the language of communication (O’Sullivan, 2005). When it was first
created, the test was intended for students who wished to study in
Australia, the United Kingdom and New Zealand. However, in the past
10 years, schools in the United States and Canada have started accepting
scores. In addition to schools and universities, government agencies,
professional and industry bodies and multinational companies and
employers accept IELTS scores as proof of English ability. In fact, more
than 9,000 organizations in over 135 countries accept IELTS (IELTS
Partners, 2013a).
IELTS actually consists of two tests, an academic route for test
takers who are planning to apply to universities or for membership in
professional organizations and a general training route for test takers
who wish to attend certain high schools or participate in work or training
activities (O’Sullivan, 2005). Australia, New Zealand and Canada also
use the general training track for immigration purposes (IELTS Partners,
2013a). Every recognizing organization sets its own entry requirements.
Test Structure IELTS covers all four language skills: listening, reading, writing
and speaking. The academic and general training routes have the same
listening and speaking tests and differ only in the reading and writing
sections, which are designed to reflect academic reading and writing
purposes and general training reading and writing purposes respectively.
The listening, reading and writing tests must be completed on the same
day. However, the speaking test may be taken up to seven days before
or after the other three tests (IELTS Partners, 2013b). The first three
tests take two hours and thirty minutes to complete. The speaking test
lasts between 11 and 14 minutes.
The listening test lasts approximately 30 minutes with an
additional 10 minutes of transfer time. It is made up of 40 questions of a
variety of types including: multiple choice, matching, form completion,
summary completion and short-answer questions on four different
listening sections. The first is a conversation between two people set in
an everyday social context. Section two is a monologue again set in an
everyday social context. The third section is a conversation between up
to four people, which is set in an educational or training context. The
last section is a monologue on an academic subject. Each section is only
heard once and is performed by a variety of voices and native-speaker
accents (IELTS Partners, 2013b).
The reading test is a 60 minute test with 40 questions. These
questions include a variety of different task types including: identifying
information, identifying writer’s views/claims, matching headings,
sentence completion, table completion and diagram-label completion.
The Academic reading test and the general training reading test each
have three sections, for a total text length of 2150-2750 words (IELTS
Partners, 2013b). In the academic reading test, each section contains one
long authentic text taken from books, journals, magazines or
newspapers. They have been written for a non-specialist audience about
academic topics of general interest. The texts may contain non-verbal
materials such as diagrams or illustrations. When they contain technical
terms, a simple glossary is provided (IELTS Partners, 2013b). The
General Training reading test, like the Academic reading test, contains
authentic texts, taken from notices, advertisements, company handbooks,
official documents, books, magazines and newspapers. The first section
contains two or three short factual texts relevant to everyday life in
English-speaking countries. The second section is two short texts that
focus on work-related issues. The last section is one longer, more
complex text on a topic of general interest (IELTS Partners, 2013b).
The writing test lasts 60 minutes and is made up of two tasks.
Test takers write at least 150 words for Task 1 and at least 250 words for
Task 2. For the Academic writing test, task 1 asks candidates to
summarize and explain the information presented in some sort of chart
or diagram in their own words. In task 2, they are asked to write an
essay in response to a point of view, argument or problem. Both task
responses should be written in a formal style. For the General Training
writing test, in task 1, test takers are given a situation and must write a
letter about this situation. The style of the letter may be personal, semi-
formal or formal. In task 2, test takers must write an essay in response
to a point of view, an argument or a problem.
The speaking test is a three-part face-to-face oral interview with
an examiner that lasts 11 to 14 minutes. This test is recorded for scoring
purposes. The first part is an introduction and interview that lasts four to
five minutes. The examiner introduces themself and then asks the test
taker general questions on familiar topics. In part two, the test taker is
given a task card, which asks them to talk about a particular topic. It
also gives the test taker several points that they can include in their talk.
They are given one minute to prepare and one to two minutes to speak.
Following their short talk, the examiner asks one or two questions on the
same topic. The last part is a two-way discussion. The examiner asks
further questions connected to the topic of part two for four to five
minutes, giving the test taker more opportunity to discuss abstract issues
and ideas (IELTS Partners, 2013b).
Test Scoring IELTS scores are reported on a 9-band scale, with 9 indicating an
expert user and 1 indicating a non-user (IELTS Partners, 2013a). Scores
are reported as whole or half bands.
On the listening and reading tests, each correct answer receives
one point, giving a raw score out of forty that is converted to the IELTS 9-band
scale. How these scores are converted for these skills is shown in the
table below.
Listening                Academic Reading         General Training Reading
Raw score   Band score   Raw score   Band score   Raw score   Band score
16          5            15          5            15          4
23          6            23          6            23          5
30          7            30          7            30          6
35          8            35          8            34          7
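The conversion table can be read as a set of raw-score thresholds. The Python sketch below is illustrative only: the names CONVERSION and highest_listed_band are hypothetical, and because the table lists only four whole-band points per module (no half bands, and nothing below the first row or above the last), the function returns only the highest listed band whose threshold has been reached, which should be read as a lower bound and not as an official IELTS score.

```python
from bisect import bisect_right

# (raw-score threshold, band) pairs copied from the table above.
# Only the points printed in the table are included; half bands and
# bands outside the listed range are not given in the source.
CONVERSION = {
    "listening": [(16, 5), (23, 6), (30, 7), (35, 8)],
    "academic_reading": [(15, 5), (23, 6), (30, 7), (35, 8)],
    "general_training_reading": [(15, 4), (23, 5), (30, 6), (34, 7)],
}


def highest_listed_band(module, raw_score):
    """Return the highest band in the table whose raw-score threshold
    is met, or None if the raw score is below the first listed row."""
    if not 0 <= raw_score <= 40:
        raise ValueError("raw scores are out of forty")
    rows = CONVERSION[module]
    thresholds = [raw for raw, _band in rows]
    # Number of thresholds that are less than or equal to raw_score.
    reached = bisect_right(thresholds, raw_score)
    if reached == 0:
        return None
    return rows[reached - 1][1]
```

The sketch makes the difference between the modules visible: the same raw score of 30 reaches band 7 on the Academic reading test but band 6 on the General Training reading test.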
On the writing test, responses are assessed by certified IELTS
examiners according to the Writing Test Band Descriptors (task
achievement/response, coherence and cohesion, lexical resource and
grammatical range and accuracy). Task 2 contributes twice as much as
Task 1 to the overall writing score. This distribution could be due to the
fact that Task 2 more closely reflects authentic academic situations and
involves higher order thinking skills.
On the speaking test, test takers are assessed on their
performance throughout the test by certified IELTS examiners according
to the Speaking Test Band Descriptors (fluency and coherence, lexical
resource, grammatical range and accuracy, and pronunciation) (IELTS
Partners, 2013b).
Once a band score has been determined for each part of the test,
the four scores are averaged together for the overall score. When a test
taker’s overall score ends in a .25, it is rounded up to the next highest
half band and when it ends in a .75, it is rounded up to the next highest
whole band (IELTS Partners, 2013a).
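The averaging and rounding rule described above can be sketched in Python. This is an illustration, not IELTS's own procedure: the function name overall_band is hypothetical, and the sketch assumes, beyond the two cases stated above, that averages falling between those points are rounded to the nearest half band.

```python
import math


def overall_band(listening, reading, writing, speaking):
    """Average four band scores and round to the nearest half band.

    An average ending in .25 goes up to the next half band and one
    ending in .75 goes up to the next whole band, as described above.
    Other averages go to the nearest half band (an assumption).
    """
    mean = (listening + reading + writing + speaking) / 4
    # Work in half-band units; adding 0.5 before flooring rounds ties upward.
    return math.floor(mean * 2 + 0.5) / 2
```

For instance, section scores of 6.5, 6.5, 5.0 and 7.0 average 6.25, which this sketch reports as an overall band of 6.5.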
Statistical distribution of the scores
IELTS provides data based on test taker results from 2012. According
to their research, these results are broadly in line with statistics of
previous years.
                           Mean   Standard Deviation   SEM
Listening                  6.0    1.3                  0.390
Academic Reading           5.9    1.0                  0.316
General Training Reading   6.1    1.2                  0.339
Academic Writing           5.5    -                    -
General Training Writing   5.9    -                    -
Speaking                   6.0    -                    -
Due to the format of the Writing and Speaking modules, scores cannot
be reported in the same manner and no standard deviation information is
provided for these tests. However, IELTS states that they are rated by
trained and standardized examiners according to detailed descriptive
criteria and rating scales. (IELTS Partners, 2013a)
Standard error of measurement
The overall standard error of measurement for IELTS is 0.22
bands (Pearson, 2012a). Because the SEM is less than one band, test
takers’ scores are a good indicator of their true abilities.
Evidence of reliability
Internal consistency is reported for the reading and listening
modules for 2012. Across the 16 versions of the listening module, the
mean of Cronbach’s alpha is 0.91. Across the 16 versions of the general
training reading module, the mean of Cronbach’s alpha is 0.92. Finally
across the 16 versions of the academic reading module, the mean of
Cronbach’s alpha is 0.90. As these measures of internal consistency are
very close to 1, each section is shown to have high reliability.
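In classical test theory, reliability and the standard error of measurement are linked by SEM = SD × sqrt(1 − reliability). The Python sketch below applies that relation to the standard deviations and Cronbach's alpha values reported for the item-based modules. The source does not say how IELTS computed its SEM figures, so treating them as derived from this formula is an assumption, and the function name sem_from_reliability is illustrative.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# (standard deviation, mean Cronbach's alpha, reported SEM) as given
# in the IELTS tables and text of this review.
MODULES = {
    "Listening": (1.3, 0.91, 0.390),
    "Academic Reading": (1.0, 0.90, 0.316),
    "General Training Reading": (1.2, 0.92, 0.339),
}

for name, (sd, alpha, reported) in MODULES.items():
    estimate = sem_from_reliability(sd, alpha)
    print(f"{name:<26} estimated {estimate:.3f}  reported {reported:.3f}")
```

Under this assumption the estimates agree with the reported SEM values to three decimal places, which suggests the published figures are internally consistent.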
Because the writing and speaking modules are not item-based,
IELTS does not report reliability for these modules in the same manner
as reading and listening. Reliable scores for these two sections are
assured through face-to-face training and certification of examiners, who
repeat this process every two years. Because the speaking tests
are recorded, a professional support network can manage and
standardize examiners’ ratings. Shaw (2004) reported that the
inter-rater correlation for these sections was 0.77 and the g-coefficients were
0.84-0.93 for the single-rater condition.
The overall reliability estimate for the test is reported as 0.96
(Pearson, 2012b). This shows that the test as a whole is also highly
reliable.
Evidence of validity
The University of Cambridge ESOL Examinations claims the use
of expert judgment in operationalizing the construct as well as empirical
evidence provided through statistical analysis of test responses as
evidence of construct-related validity (O’Sullivan, 2005).
IELTS provides a large number of research studies on their
website showing the correlation between test scores and test outcomes as
evidence of predictive validity (IELTS Partners, 2013a). Some of these
studies examine the relationship between IELTS scores and grade point
average (GPA). One such study showed that IELTS scores for both
listening and reading correlated with higher GPAs, while the writing and
speaking scores did not correlate with GPA (Humphreys et al.,
2012). This study suggests that of the four sections, only the listening
and reading sections have some valid predictive value. Another study
funded by IELTS examined case studies of IELTS used for postgraduate
admissions and performance for predictive purposes. This study
concludes that IELTS scores do not seem to indicate strongly one way or
another how a student will perform in an English-speaking academic
domain (Lloyd-Jones, Neame & Medaney, 2011).
However, a different study, by Ingram and Bayliss (2007), found
that most of the students who participated had language behavior that
equaled or exceeded that implied by their overall IELTS rating. This
study also found that an overall proficiency level of 6.5 or higher was
adequate for university study (Ingram & Bayliss, 2007).
Pearson Test of English Academic (PTE Academic)
Publisher: Pearson
Publication Date: 2009
Target Population: Students who do not speak English as their native language who want
to study at English medium universities
Cost: Cost varies by country of testing center; see pearsonpte.com for prices. In
the United States, the test costs $200. In the United Kingdom, the test
costs £155. In Australia, it costs AUD$330.
Table 3
Extended Review of PTE Academic
Test purpose PTE Academic is a proficiency assessment aimed at students
who wish to attend English medium universities. It measures
reading, writing, listening and speaking in integrated tasks within a
computer-based format. Pearson developed the PTE Academic to
meet the need for an English language proficiency test that
measures non-native English speakers’ English communication
skills more accurately in the context of an academic environment, as
securely and efficiently as possible (Pearson, 2013).
As this test is relatively new, its goal is to one day compete
with other large-scale, high-stakes English proficiency tests such as
the IELTS and TOEFL iBT. In fact, in much of Pearson’s research
(2009, 2012a, 2012b), statistical comparisons to these two tests are
included as proof of its accuracy and objectivity.
In addition to university admission, the PTE Academic can
be used as proof of English proficiency in order to obtain a visa in
New Zealand, Australia, the United Kingdom and the United States.
Test structure PTE Academic is a computer-based English language test. It
assesses the four skills, listening, reading, speaking, and writing, in
an integrated format. The test is three hours long and split into three
timed parts. The timings of these sections may vary slightly. There
are 20 different types of questions, some of which integrate more
than one language skill.
The test starts with an unscored personal introduction. The
first part of the test is speaking and writing which can last between
77 and 93 minutes. The second part is a 32 to 41 minute reading
section. Students can choose to take a break after this part before
starting on the last section which is a 45 to 57 minute listening
section (Pearson, 2012a).
Within the speaking and writing section of the test, there are
five types of tasks that focus on speaking and two types of tasks that
focus on writing. The speaking tasks include: reading aloud,
repeating sentences, describing images, retelling a lecture, and
answering short questions. The writing tasks include: summarizing
a written text and writing an essay.
Within the reading section of the test, there are five types of
tasks. These are two types of multiple-choice questions (choosing
single answers and choosing multiple answers), reordering
paragraphs, reading with fill in the blanks, and reading and writing
with fill in the blanks.
Within the listening section, test takers are asked to
summarize a spoken text, answer multiple choice questions with
multiple answers and with single answers, fill in the blanks,
highlight a correct summary, select missing words, highlight
incorrect words, and do a dictation (Pearson, 2011).
Test scoring PTE Academic provides an overall score for each test taker
as well as scores for communicative skills (reading, listening,
speaking and writing) and enabling skills (grammar, oral fluency,
pronunciation, spelling, vocabulary and written discourse). As
many items contribute to more than one communicative or enabling
skill, the overall score cannot be directly computed from these two
scores. All of the scores are on a scale from 10 to 90: 10 being the
lowest proficiency and 90 being the highest. For integrated skills
items, the item score contributes to all of the skills that the item
assesses.
The scores for the enabling skills rate performance in the
productive skills, speaking and writing (Pearson, 2012b). No
enabling skills score is awarded for responses to items that are
inappropriate in either content or form.
All items in PTE Academic are machine scored. Some items
have correct-incorrect scoring while others use partial credit
scoring, taking into account correctness, formal aspects and the
quality of the response. The formal aspects include whether or not a
response is under or over the word limit, while the quality of the
response is represented by the enabling skills assessed by an item.
Speaking and writing skill scores are generated by automated
scoring systems.
Performance-based items are first scored on
content. If no response or an irrelevant response is given, the
content is scored as 0 and no more points are awarded for this task.
If a score is given for content, the item will also be scored on form.
If the response is of the appropriate length, a score will be given and
then the response will be rated on the enabling skills present in the
response. The total item score comes from adding the scores for
content, form and enabling skills. The total item score contributes
to the communicative skills score assessed by the item as well as to
the overall score. The individual scores awarded to each of the
enabling skills contribute to the enabling skills scores that are
reported (Pearson, 2012b).
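The content, form and enabling-skills sequence described above can be sketched as a small Python function. This is not Pearson's scoring engine: the names ItemScore and score_item, and the point values passed in, are hypothetical. The source also does not state what the item total is when a response has acceptable content but the wrong length, so the sketch assumes the content points are kept and only the form and enabling-skills points are lost.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ItemScore:
    """Score components for one performance-based item (hypothetical)."""
    content: int
    form: int
    enabling: int

    @property
    def total(self) -> int:
        # The total item score adds content, form and enabling skills.
        return self.content + self.form + self.enabling


def score_item(content: int, form: int,
               enabling_skills: Mapping[str, int]) -> ItemScore:
    """Apply the gating order: content first, then form, then enabling skills."""
    if content == 0:
        # No response or an irrelevant response: no further points.
        return ItemScore(0, 0, 0)
    if form == 0:
        # Inappropriate length: enabling skills are not rated.
        # Assumption: content points are retained in this case.
        return ItemScore(content, 0, 0)
    return ItemScore(content, form, sum(enabling_skills.values()))
```

A call such as score_item(3, 2, {"grammar": 2, "vocabulary": 1}) returns an ItemScore whose total adds all three components, while a content score of 0 short-circuits everything else.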
Statistical distribution of scores
Riazi (2014) reports the mean and standard deviation for each skill
based on data from a small group of students.
            Mean    SD
Listening   64.65   14.66
Reading     60.33   14.66
Speaking    66.92   17.10
Writing     60.5    13.72
Total       63.42   14.10
Standard error of measurement
In their Accurate Factsheet, Pearson (2012) states the
standard error of measurement for PTE Academic as a whole is 2.32
points on the PTE Academic scoring scale. In addition, in their
Reliability and Validity Report, Pearson (2009) states that the
standard error of measurement for each of the communication skills
is less than 5 points. These measurements show that test takers’
scores are very near their true scores.
Evidence of reliability
Pearson (2012) presents reliability estimates for each of the
communicative skills as well as an overall reliability. These
estimates are presented below:
            Reliability
Listening   0.91
Reading     0.92
Speaking    0.91
Writing     0.91
Total       0.97
These estimates show a high reliability for the PTE Academic,
demonstrating the overall high consistency and stability of test
scores.
To support the reliability of the machine scoring system (VERSANT)
used to score this assessment, Pearson
compared the machine-given scores to human scores. The overall
reliability for both groups was 0.97 and the correlation between the
two types of scoring was 0.96 (Pearson, 2009).
Evidence of Validity From the beginning, evidence was collected to ensure the
validity of PTE Academic. It was important to the developers that
the test be linked to other external frameworks of language
proficiency. To fulfill this goal, PTE Academic was benchmarked
to the Common European Framework (CEF). In order to guarantee
this alignment, test item writers received training in understanding,
interpreting and using the CEF, and student responses were rated on
the CEF scale independent of the test scores by human raters
(Pearson, 2009).
Another way Pearson worked toward construct validity in
PTE Academic was having native English speakers test the pilot
items. Their results constituted one of the item selection criteria (Zheng
& De Jong, 2011).
As proof of concurrent and predictive validity, Riazi (2013)
compared PTE Academic with IELTS. This comparison showed
that there is a significant correlation between test takers’ scores on
PTE Academic and IELTS Academic. In addition, PTE Academic
was able to significantly differentiate between lower and higher
proficiency groups as determined by IELTS Academic scores.
After reviewing these three tests, it was clear that all three tests assess language skills
necessary for success in an English medium classroom. All three work to use authentic materials
as input for the assessment. The TOEFL iBT has students listen to lectures, IELTS has students
read authentic articles, and PTE Academic has students listen to lectures and read texts that cover
academic topics. These tests also integrate the skills to varying degrees. IELTS has a few tasks
that require students to read short texts and write or speak about these texts. The TOEFL iBT
requires students to read or listen before speaking and to read and listen before writing. The PTE
Academic integrates most of their tasks, asking students to read and write or listen and write or
read and speak throughout the test. These tests also differ in how they score these skills.
Because of the types of tasks included on the IELTS, highly trained raters are used to score all
parts of the test. The TOEFL is scored partially by computers, and partially by highly trained
raters as well. The PTE Academic, however, is completely scored by computer with the
VERSANT software. All three tests take more than two and a half hours to administer. IELTS
is the shortest at around two hours and forty-five minutes. PTE Academic is the next longest at
three hours. The TOEFL is by far the longest, lasting four and a half hours.
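The practical differences between the three tests can be lined up in a short Python sketch. The durations are the approximate totals quoted in this paragraph and the prices are the United States figures listed in each test's header. The variable names are illustrative, and using the low end of the IELTS price range ($190 to $265) is a simplifying assumption.

```python
# Approximate total test time in minutes, as quoted in this review.
DURATION_MINUTES = {
    "IELTS": 165,         # around two hours and forty-five minutes
    "PTE Academic": 180,  # three hours
    "TOEFL iBT": 270,     # four and a half hours
}

# United States prices in USD from each test's header.
# Assumption: the low end of the IELTS range ($190 to $265) is used.
US_COST_USD = {
    "TOEFL iBT": 180,
    "IELTS": 190,
    "PTE Academic": 200,
}

# Tests ordered from shortest to longest.
by_duration = sorted(DURATION_MINUTES, key=DURATION_MINUTES.get)

# Test with the lowest listed United States price.
cheapest_in_us = min(US_COST_USD, key=US_COST_USD.get)

for name in by_duration:
    hours, minutes = divmod(DURATION_MINUTES[name], 60)
    print(f"{name:<13} {hours} h {minutes:02d} min  from ${US_COST_USD[name]}")
```

Prices differ by country and testing center, so the ordering by cost holds only for the United States figures used here.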
Students in an academic English context might need to choose between these three tests.
These students want to study at an English-speaking university in the United States, the United
Kingdom or Australia, and they need to demonstrate their English proficiency in order to be
admitted to the university of their choice. All of them will have the equivalent of a high school
diploma, and some might already hold an undergraduate degree from their home countries.
In choosing which exam to suggest for these students, the instructor must think about the
students’ goals. If a student knows exactly which university they wish to attend, they can look at
which test scores it accepts and decide which of those tests they are likely to have the most
success on. They might find the format of one test more appealing. Speaking into a
microphone for the speaking section, as on the TOEFL iBT or PTE Academic, might seem less
stressful than a face-to-face interview, as on the IELTS. I, personally, tend to be skeptical of a
test that claims to score speaking and writing tasks by computer only. However, PTE Academic
claims that its scoring software has been trained with thousands of samples so that it grades
any answer provided to it effectively and objectively.
However, if a student does not know which university they would like to attend, I
would suggest the TOEFL iBT. It is one of the most widely accepted English proficiency tests
and is known by almost everyone who works with international students. Another consideration that
may be decisive for students is that the TOEFL iBT is the cheapest of these three tests. While
extensive research supporting reliability and validity is available for all three tests, the TOEFL
iBT has the most published research as well as the most study manuals.
References
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the Test
of English as a Foreign Language. New York, NY: Routledge Taylor & Francis Group.
ETS. (2007). Test and Score Data Summary for TOEFL Computer-Based and Paper-Based
Tests. Retrieved from http://www.ets.org/toefl/
ETS. (2014). Test and score data summary for TOEFL iBT Tests. Retrieved from
http://www.ets.org/toefl/
ETS. (2011). TOEFL iBT Research: Reliability and comparability of TOEFL iBT scores.
Retrieved from http://www.ets.org/toefl/
ETS. (2011). TOEFL iBT Research: TOEFL iBT test framework and test development. Retrieved
from http://www.ets.org/toefl/
ETS. (2011). TOEFL iBT Research: Validity evidence supporting the interpretation and use of
TOEFL iBT Scores. Retrieved from http://www.ets.org/toefl/
ETS. (2012). TOEFL Test Prep Planner. Retrieved from http://www.ets.org/toefl/
IELTS Partners. (2013). Information for Candidates. Retrieved from http://www.ielts.org/
IELTS Partners. (2013). IELTS – International English Language Testing System.
Retrieved from http://www.ielts.org/
Humphreys, P., Haugh, M., Fenton-Smith, B., Lobo, A., Michael, R., & Walkinshaw, I. (2012).
Tracking international students’ English proficiency over the first semester of
undergraduate study. IELTS Research Reports Online Series (1). Retrieved from
http://www.ielts.org/
Ingram, D. & Bayliss, A. (2007). IELTS as a predictor of academic language performance.
Melbourne: IELTS Australia.
Lloyd-Jones, G., Neame, C., & Medaney, S. (2011). A multiple case study of the relationship
between the indicators of students’ English language competence on entry and students’
academic progress at an international postgraduate university. IELTS Research Reports
(11). Retrieved from www.ielts.org/PDF/vol11_report_3_a_multiple_case_study.pdf
O’Sullivan, B. (2005). International English Language Testing System (IELTS). In S.
Stoynoff & C. Chapelle, ESOL Tests and Testing (pp. 73-78). Alexandria, VA: TESOL.
Pearson. (2012). Pearson Academic Accurate Factsheet. Retrieved from
http://pearsonpte.com/
Pearson. (2012). PTE Academic and Me. Retrieved from http://pearsonpte.com/
Pearson. (2012). Pearson Academic Objective Factsheet. Retrieved from
http://pearsonpte.com/
Pearson. (2011). PTE Academic Test Tips. Retrieved from http://pearsonpte.com/
Pearson. (2012). PTE Academic Score Guide. Retrieved from http://pearsonpte.com/
Pearson. (2009). Validity and reliability in PTE Academic. Retrieved from
http://pearsonpte.com/
Riazi, M. (2013). Concurrent and predictive validity of Pearson Test of English Academic
(PTE Academic). Papers in Language Testing and Assessment, 2(2), 1-27.
Sawaki, Y., & Nissan, S. (2009). Criterion-related validity of the TOEFL iBT listening section.
Princeton, NJ: ETS.
Shaw, S. D. (2004). IELTS writing: Revising assessment criteria and scales (phase 3).
Research Notes, 16, 3-7.
Stoynoff, S. & Chapelle, C. (2005). ESOL Tests and Testing. Alexandria, VA: TESOL.
Vu, L.T. & Vu, P.H. (2013). Is the TOEFL score a reliable indicator of international graduate
students’ academic achievement in American higher education? International Journal on
Studies in English Language and Literature (IJSELL), 1(1), 11-19.
Zhang, Y. (2008). Repeater analysis for the TOEFL iBT. ETS Research Report (RM-08-05).
Princeton, NJ: ETS.
Zheng, Y. & De Jong, J. H. A. L. (2011). Research Note: Establishing Construct and
Concurrent Validity of Pearson Test of English Academic. London: Pearson.