Running head: TEST REVIEW 1
Test Review: TOEFL iBT, IELTS, and TEPS
Jee Eun Park
Colorado State University
Abstract
This paper reviews three English proficiency assessments, TOEFL iBT, IELTS, and TEPS, to
determine which of the three provides the most appropriate criterion for the Korean government
to use in selecting current elementary and secondary school teachers for long-term overseas
study at higher education institutions where English is the language of instruction. Even though
TEPS has been used by most Korean governmental and public institutions in making employment
decisions such as overseas postings, this paper concludes that TOEFL iBT and IELTS would be
more appropriate for the stated use.
Key words: proficiency test, standard deviation, standard error of measurement,
reliability, validity
Test Review: TOEFL iBT, IELTS, and TEPS
This paper compares the Test of English Proficiency developed by Seoul National University
(TEPS) with two other English proficiency tests: the Test of English as a Foreign Language
Internet-Based Test (TOEFL iBT) and the International English Language Testing System
(IELTS). TEPS, developed by Seoul National University in Korea, is an English proficiency test
that has been widely used since 2004 by most Korean governmental and public institutions in
making employment decisions such as initial selection, overseas postings, and promotion. My
purpose in reviewing these three tests is to find out which of them provides the most
appropriate criterion for the Korean government to use in selecting current elementary and
secondary school teachers for long-term overseas study at higher education institutions where
English is the language of instruction. TEPS is currently used to select the teachers to be
dispatched for long-term study in English-speaking countries, so the review asks whether TEPS
is the most appropriate of the three tests for this purpose.
Test of English as a Foreign Language Internet-Based Test (TOEFL iBT)
Publisher: Educational Testing Service, P. O. Box 6151 Princeton, NJ 08541-6151, USA;
telephone 609-771-7100; fax 610-290-8972; [email protected]; http://www.ets.org/toefle/
Publication Date: 2005
Target Population: Mainly nonnative-English-speaking students who wish to study at
institutions of higher learning where the language of instruction is English.
Cost: The cost of the test ranges from US$160 to US$250, depending on the country in which the
test center is located. See http://www.ets.org/Media/Tests/TOEFL/tclists/IBT_a.html for more
information.
Overview
The TOEFL iBT test measures the ability of nonnative English speakers to use and understand
the English language as it is heard, spoken, read and written in the university classroom. The test
is administered via the Internet. The test measures reading, listening, speaking and writing skills.
It is offered 30 to 40 times a year, and is administered online at more than 4,500 testing sites in
165 countries. An extended description of TOEFL iBT is provided (see Table 1).
Table 1
Extended description for TOEFL iBT
Test purpose
The TOEFL iBT is a proficiency test designed to provide evidence of the
English language proficiency of nonnative-English-speaking applicants to
higher education institutions where English is the language of instruction.
Scores on the test are also used for admission to English-language learning
programs, for scholarship and certification decisions, for English-language
learners to track their progress, and for visa applications.
Test structure
The TOEFL iBT test is given in English and administered via the internet.
There are four sections (listening, reading, speaking and writing) which take a
total of about four and a half hours to complete. During the test, you are asked
to perform tasks that combine more than one skill, such as:
Read, listen and then speak in response to a question
Listen and then speak in response to a question
Read, listen and then write in response to a question
The Reading section includes three or four reading passages. There are 12 to
14 questions per passage. You have from 60 to 80 minutes to answer all the
questions in the section. The Listening section includes questions about
academic lectures and long conversations. There are 34 to 51 questions for
the entire Listening section. You have 60 to 90 minutes to answer all the
questions in the section. There are four to six lectures (each three to five
minutes long, six questions per lecture) and two to three conversations (each
three minutes long, five questions per conversation). There is a ten-minute
break after the Listening section. The Speaking section is approximately 20
minutes long and includes six questions. The first two questions are called
"independent Speaking tasks" because they require you to draw entirely on
your own ideas, opinions and experiences when you respond. The other four
questions are called "integrated Speaking tasks" because they require you to
integrate your English-language skills — listening and speaking or listening,
reading and speaking — just as you would in or out of a classroom. The total
time for the Writing section is 50 minutes. You are asked to write responses
to two writing tasks: an integrated Writing task and an independent Writing
task. In the integrated Writing task (20 minutes), you are asked to read a
short passage, listen to a short lecture, and then write in response to what
you read and listened to. In the independent Writing task (30 minutes), you
write an essay in response to a Writing topic.
Scoring of the test
The TOEFL iBT test items are scored on four subscales, which are Reading,
Listening, Speaking, and Writing:
Reading 0–30
Listening 0–30
Speaking 0–30
Writing 0–30
Total Score 0–120
The total score is provided to show the sum of the four skill scores. All
Reading and Listening items are scored as either correct or incorrect, and the
cumulative number of correct items is counted. For Speaking items, the test
taker’s responses are sent to the ETS Scoring Network, where three to six
certified human raters score them holistically on a scale of 0 to 4. The average
score on the six tasks is converted to a scaled score of 0 to 30. Writing
responses are scored holistically on a scale of 0 to 5. The average score on the
two tasks is converted to a scaled score of 0 to 30.
Statistical distribution of scores
Means and standard deviations are provided by gender for each section and
the total scores.
Group     Reading   Listening   Speaking   Writing   Total
Male      20.4      20.0        20.0       20.9      81
(SD)      6.9       6.8         4.6        5.1       21
Female    19.9      20.1        20.8       21.1      82
(SD)      6.6       6.6         4.6        4.9       20
(TOEFL Test and Score Data Summary, 2011)
The standard error of measurement (SEM)
The standard error of measurement (SEM) is 3.34 for the Reading section, 3.20
for the Listening section, 1.62 for the Speaking section, and 2.76 for the
Writing section. The SEM for the total score is 5.64.
Evidence of reliability
In the TOEFL iBT test, the reliability estimation for the Reading and
Listening sections that contain selected response questions is carried out using
a method based on item response theory (IRT). For the Speaking and Writing
sections that contain constructed response tasks, generalizability theory
(G-theory) is used. Reliability estimates for each section are provided below.
Reading .85
Listening .85
Speaking .88
Writing .74
Total .94
The reliability estimates for the Reading, Listening, Speaking, and Total
scores are relatively high, while the reliability of the Writing score is
somewhat lower. However, this is a typical result for writing measures
composed of only two tasks. ETS states that the total score provides the best
information for making high-stakes decisions such as admissions to college or
graduate school.
Evidence of validity
Educational Testing Service provides a large number of research studies on
its homepage in support of the validity of TOEFL. The publisher bases its
claim of content validity on reviews of research and empirical studies of
language use at English-medium institutions of higher education. As
evidence of construct validity, it provides investigations of the discourse
characteristics of written and spoken responses and of the strategies used in
answering reading comprehension questions. The publisher also claims that
TOEFL is strongly related to other criteria of academic language proficiency,
such as self-assessments, academic placements, and performance on simulated
academic listening tasks. For example, the correlation coefficients between
scores on the TOEFL iBT Speaking section and different types of local
assessments range from .44 to .78 (ETS, 2008).
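The SEM values in Table 1 can be read as an error band around an observed score. The following
Python sketch is an illustration only and is not taken from the ETS documentation. It assumes
the conventional reading that an observed score plus or minus one SEM covers roughly 68% of the
scores a test taker would obtain on repeated testing. The function name `score_band` and the
example score of 80 are hypothetical.

```python
def score_band(observed, sem, low, high, n_sem=1.0):
    """Return (lower, upper) bounds of observed +/- n_sem * SEM, clipped to the score scale."""
    lower = max(low, observed - n_sem * sem)
    upper = min(high, observed + n_sem * sem)
    return lower, upper


# SEM of the TOEFL iBT total score as reported in Table 1.
TOTAL_SEM = 5.64

# Hypothetical observed total score on the 0-120 scale.
lower, upper = score_band(80, TOTAL_SEM, low=0, high=120)
print(f"Total score 80: band {lower:.2f} to {upper:.2f}")
```

Under these assumptions, two candidates whose total scores differ by less than the width of
this band are hard to distinguish on the basis of the test alone, which matters when a score is
used as a cut-off for selection.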
International English Language Testing System (IELTS)
Publisher: University of Cambridge ESOL Examinations, the British Council, and IDP: IELTS
Australia. Subject Manager, University of Cambridge ESOL Examinations, 1 Hills Road,
Cambridge CB1 2EU United Kingdom; telephone 44-1223-553355; [email protected];
http://www.ielts.org/. Manager, North America, Cambridge Examinations and IELTS TS
International, 100 East Corson Street, Suite 200, Pasadena, CA 91103 USA; telephone
626-564-2954; [email protected]; http://www.ielts.org/
Publication Date: 1989
Target Population: Students for whom English is not a first language and who wish to work or
attend university in an English-speaking country
Cost: Varies greatly by location of test center. See http://www.ielts.org/ or the IELTS handbook
(University of Cambridge ESOL Examinations, 2003). In general, costs are A$330 in Australia,
£130 in the United Kingdom, and about US$185 in the United States.
Overview
IELTS is an updated version of the ELTS test used throughout the 1980s in Australia, New
Zealand, and the United Kingdom (O’Sullivan, 2005). Since that time, the test has received two
major revisions to meet the increased demand for a modern English language test. Now, IELTS
is administered at over 800 centers across the globe and is recognized worldwide for assessing
listening, reading, writing, and speaking for ESL/EFL adult candidates. An extended description
of IELTS is provided (see Table 2).
Table 2
Extended description for IELTS
Test purpose
IELTS is accepted by more than 7,000 organizations worldwide. These
include universities, immigration departments, government agencies,
professional bodies and multinational companies.
The Academic format is, broadly speaking, for those who want to study or
train in an English-speaking university or institutions of higher and further
education. Admission to undergraduate and postgraduate courses is based on
the results of the Academic test.
The General Training format focuses on basic survival skills in broad social
and workplace contexts. It is typically for those who are going to English-
speaking countries to do secondary education, work experience or training
programs. People migrating to Australia, Canada and New Zealand must sit
the General Training test.
Test structure
IELTS is available in two test formats: Academic and General Training. All
test takers take the same Listening and Speaking modules but different
Reading and Writing modules.
IELTS has four parts – Listening (30 minutes), Reading (60 minutes), Writing
(60 minutes) and Speaking (11–14 minutes). The total test time is 2 hours and
45 minutes. In the Listening part, test takers listen to four recorded texts,
monologues and conversations in an everyday or educational context, and
answer a series of questions. The Academic Reading part includes three
long texts which range from the descriptive and factual to the discursive and
analytical, which are taken from books, journals, magazines, and newspapers.
The reading part consists of 40 questions. The Academic Writing part
includes two tasks. One task is to summarize, describe, or explain a table,
graph, chart, or diagram in at least 150 words. The other task is to write a
short essay of at least 250 words. The Speaking part consists of three tasks.
The first task is to answer general questions on familiar topics. The second
task is to answer one or two questions on a particular topic, and the third
task is to discuss more abstract ideas and issues for four to five minutes.
Scoring of the test
The test takers receive scores on a Band Scale from 1 to 9. A profile score is
reported for each skill. The four individual scores are averaged and rounded
to produce an Overall Band Score. IELTS Listening and Reading sub-tests
contain 40 items and each item is awarded one mark; the maximum raw score
a test taker can achieve on a sub-test is 40. Band Scores ranging from Band 1
to Band 9 are awarded on the basis of their raw scores. When marking the
Writing and Speaking sub-tests, examiners use detailed performance
descriptors which describe written and spoken performance at each of the 9
IELTS Bands.
Statistical distribution of scores
The following figures show the mean overall and individual band scores
achieved by 2011 Academic and General Training (GT) candidates according to
their gender.
Group               Listening   Reading   Writing   Speaking   Overall
Academic (Female)   6.1         6.1       5.6       5.9        6.0
GT (Female)         6.2         5.7       5.9       6.2        6.1
Academic (Male)     5.9         5.8       5.4       5.7        5.8
GT (Male)           6.3         5.7       5.8       6.3        6.1
The standard deviation is reported as 1.3 for Listening, 1.2 for Academic
Reading, and 1.5 for General Training Reading.
Standard error of measurement (SEM)
The standard error of measurement (SEM) is .390 for Listening, .379 for
Academic Reading, and .424 for General Training Reading.
Evidence of reliability
The reliability of Listening and Reading tests is reported using Cronbach's
alpha, a reliability estimate which measures the internal consistency of the 40-
item test. The following Listening and Reading material released in 2011 had
sufficient candidate responses to estimate and report meaningful reliability
values as follows:
Average Alpha across Listening versions .91
Average Alpha across General Training Reading versions .92
Average Alpha across Academic Reading versions .90
The reliability of the Writing and Speaking modules cannot be reported in the
same manner as for Reading and Listening because they are not item-based;
candidates' writing and speaking performances are rated by trained and
standardized examiners according to detailed descriptive criteria and rating
scales. The assessment criteria used for rating Writing and Speaking
performance are described in the IELTS 2006 Handbook.
Evidence of validity
University of Cambridge ESOL Examinations claims evidence of construct-related
validity through the use of expert judgment and of content validity
through the representative nature of the test writers. The publisher also
provides a large number of research studies claiming a high degree of
criterion-related validity (Hill et al., 1999; Ingram, 2007; Lloyd-Jones et al., 2011).
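Table 2 states that the four individual scores are averaged and rounded to produce the Overall
Band Score. As a minimal sketch of that calculation in Python, the code below assumes rounding
to the nearest half band, with averages ending in .25 or .75 rounded upward. This rounding
convention is an assumption that should be checked against the current IELTS documentation.
The function name `overall_band` and the example profile are hypothetical.

```python
import math


def overall_band(listening, reading, writing, speaking):
    """Average four band scores and round to the nearest half band (ties round up)."""
    mean = (listening + reading + writing + speaking) / 4
    # Work in half-band units: adding 0.5 before flooring rounds .25 and .75 upward.
    return math.floor(mean * 2 + 0.5) / 2


# Hypothetical profile with a mean of 6.25.
print(overall_band(6.5, 6.5, 5.0, 7.0))
```

Because of the rounding step, candidates with different skill profiles can receive the same
Overall Band Score, so the four individual scores remain informative for selection decisions.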
Test of English Proficiency developed by Seoul National University (TEPS)
Publisher: The TEPS Council of Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul,
151-742 South Korea; telephone 02-886-3330; fax 02-886-8110; http://www.teps.or.kr/
Publication Date: 1999
Target Population: Mainly nonnative-English-speaking Koreans who wish to study at tertiary
education institutions in Korea, or who wish to get hired or promoted in most companies and
public service institutions in Korea, or who wish to be dispatched abroad for work.
Cost: ₩36,000 (approximately US$30)
Overview
The Test of English Proficiency developed by Seoul National University (TEPS) is an English
proficiency test researched and developed by professors and researchers at Seoul National
University. TEPS has been administered nationwide since January 1999 by the TEPS Council of
Seoul National University. TEPS is designed to test applicants' communicative English skills and
to minimize test-taker reliance on certain strategies such as rote memorization. Although TEPS is
taken by thousands of Koreans each year, and by non-Koreans from seven other countries, its
scores are not recognized by many organizations outside Korea as an indicator of English
proficiency. An extended description of TEPS is provided (see Table 3).
Table 3
Extended description for TEPS
Test purpose
TEPS is a proficiency test designed to provide evidence of the English
language proficiency of nonnative-English-speaking applicants in diverse
contexts. Scores on the test are also used for admission to higher
education institutions in Korea, for employment and promotion, for scholarship
and certification decisions, and for English-language learners to track their
progress.
Test structure
TEPS consists of 200 questions which are divided into four sections:
Listening Comprehension (60 questions), Grammar (50 questions),
Vocabulary (50 questions) and Reading Comprehension (40 questions). It
takes approximately two hours and twenty minutes to administer the test.
Scores are assigned on a scale of 990 points, and incorporate Item Response
Theory.
The Listening Comprehension section includes four parts: Parts 1 to 3 consist
of conversations, while Part 4 consists of short monologues in the form of
lectures, broadcasts, announcements, advertisements, etc. A variety of
situations and topics are used for the questions in the Listening
Comprehension section.
The Grammar section emphasizes a test taker’s ability to apply grammar skills
in real-life situations. It therefore has a time constraint of 30 seconds per
question. The Grammar section includes four parts.
The Vocabulary section asks the test taker to select the most appropriate word,
in order to measure the test taker’s ability to use vocabulary in authentic and
practical contexts. It consists of two parts: Part 1 uses conversations and
Part 2 uses short passages.
The Reading Comprehension section includes three parts in which the test taker
has approximately one minute to read each short, self-contained passage and
answer a single question on it.
Scoring of the test
TEPS uses the IRT scoring method to evaluate language proficiency with
more objective and accurate results. IRT is a probabilistic approach that
estimates proficiency from a test taker’s responses and the difficulty level of
the questions. Therefore, a test taker who correctly answers more difficult
questions will receive a substantially higher score than one who correctly
answers primarily lower-level questions. TEPS scores are categorized into
10 levels. Each level describes a test taker’s communicative competence
(The TEPS Council Brochure, 2012).
Statistical distribution of scores
Means are provided by proficiency level, groups, and gender for each section
and the total scores.
Overall   Listening   Reading   Grammar   Vocabulary
596.5     245.5       237       57        55.5
(TEPS homepage, 2012)
Standard Deviation is reported as 10.1 for Listening Comprehension, 8.7 for
Grammar, 8.8 for Vocabulary, and 7.2 for Reading Comprehension.
(Choi, 1999)
The standard error of measurement (SEM)
The standard error of measurement (SEM) is 3.089 for Listening
Comprehension, 3.052 for Grammar, 2.995 for Vocabulary, and 2.713 for
Reading Comprehension (Choi, 1999).
Evidence of reliability
The reliability of TEPS is reported using Cronbach's alpha.
Listening Comprehension .907
Grammar .878
Vocabulary .885
Reading Comprehension .861 (Choi, 1999)
Evidence of validity
Choi (1999) reports evidence of validity in the form of correlations between
TEPS and the Test of Oral Proficiency (TOP). The overall correlation
coefficient between TEPS and TOP is .5159 across the components of each test,
and the correlation coefficients for reading and vocabulary vary
from .2400 to .4340.
The TOEFL and TOEIC conversion tables, which the TEPS Council of Seoul
National University has announced every four years since 1999, also suggest
that there is a high correlation between the tests.
Conclusion
The test takers considered in this review are Korean adult EFL learners. They are public
elementary or secondary school teachers and public officers employed by governmental
institutions, and they study English in order to be selected for long-term dispatch abroad to
study at higher education institutions where English is the language of instruction. Because
they study English as a foreign language, they do not use English for daily communication
outside the classroom. Most of them studied English as a foreign language for more than 10
years during their public education. If they are selected for training abroad, they will need
to communicate in English both in their studies and in everyday life, because they will be
dispatched to a country where English is the primary language of communication.
A comparison of the three tests is needed to determine which one would be most appropriate
for the given context. (Because IELTS has two formats, Academic and General Training, it
should be noted that only the Academic format of IELTS is reviewed in this paper.) All three
tests are English proficiency tests designed to provide evidence of the English language
proficiency of nonnative-English-speaking test takers. All of them are standardized
criterion-referenced tests.
The test purposes of TOEFL iBT and IELTS are similar: both are intended for test takers who
wish to study or train at an English-speaking university or another institution of higher or
further education. The purpose of TEPS is somewhat different: it is to provide evidence of the
English language proficiency of nonnative-English-speaking applicants in diverse contexts. The
structures of TOEFL iBT and IELTS are also similar, whereas the structure of TEPS is markedly
different. Both TOEFL iBT and IELTS consist of four sub-tests: Listening, Speaking, Reading,
and Writing. The Listening and Reading sections of both tests are objective tests
(multiple-choice tests), and the Speaking and Writing sections of both tests are
performance-based tests. In contrast, TEPS consists of four sub-tests, Listening
Comprehension, Vocabulary, Grammar, and Reading Comprehension, and all of its items are
objective (multiple-choice) items. Accordingly, the scoring methods of TOEFL iBT and IELTS
are similar to each other but different from that of TEPS. The Listening and Reading sections
of TOEFL iBT and IELTS are scored as either correct or incorrect, and the cumulative number of
correct items is counted. The Speaking and Writing sections of TOEFL iBT and IELTS are scored
holistically by human raters on a provided scale. Because all TEPS items are multiple-choice,
each TEPS item is scored as either correct or incorrect, and the reported scores are then
derived through IRT, as described in Table 3.
All three tests provide abundant and convincing evidence of substantially high reliability.
The standard deviations (SD) reported for the section scores are 4.6 to 6.9 for TOEFL iBT,
1.2 to 1.5 for IELTS, and 7.2 to 10.1 for TEPS. Because the three tests use different score
scales, these values are best read relative to each scale. The same holds for the standard
error of measurement (SEM): the SEMs for the Listening section are 3.20 (TOEFL iBT), .390
(IELTS), and 3.089 (TEPS), and the SEMs for the Reading section are 3.34 (TOEFL iBT), .379
(IELTS Academic), and 2.713 (TEPS). The reliability estimates are similar across the tests:
for the Listening section they are .85 (TOEFL iBT), .91 (IELTS), and .907 (TEPS), and for the
Reading section they are .85 (TOEFL iBT), .90 (IELTS Academic), and .861 (TEPS). The IELTS
and TEPS figures are Cronbach’s alphas, whereas the TOEFL iBT figures are IRT-based estimates.
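The SEM values compared above are linked to the standard deviations and reliability estimates
through the classical test theory relation SEM = SD × sqrt(1 − reliability). The Python sketch
below is only an illustrative check and is not a procedure described by any of the publishers.
It applies that relation to the SD and Cronbach's alpha values reported for IELTS and TEPS in
Tables 2 and 3. The function name `sem_from_reliability` is hypothetical.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# (SD, Cronbach's alpha, reported SEM) as given in Tables 2 and 3.
reported = {
    "IELTS Listening": (1.3, 0.91, 0.390),
    "IELTS Academic Reading": (1.2, 0.90, 0.379),
    "IELTS General Training Reading": (1.5, 0.92, 0.424),
    "TEPS Listening Comprehension": (10.1, 0.907, 3.089),
    "TEPS Grammar": (8.7, 0.878, 3.052),
    "TEPS Vocabulary": (8.8, 0.885, 2.995),
    "TEPS Reading Comprehension": (7.2, 0.861, 2.713),
}

for name, (sd, alpha, sem_reported) in reported.items():
    sem = sem_from_reliability(sd, alpha)
    print(f"{name}: computed {sem:.3f}, reported {sem_reported:.3f}")
```

Under this relation the computed values agree with the reported SEMs to within about 0.03, and
the small gaps are plausibly due to rounding in the reported SDs and alphas. Each SEM is
expressed on its own test's score scale, so the values are not directly comparable across tests.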
The publishers of TOEFL iBT and IELTS provide a great deal of evidence for the validity of
their tests through a large number of research studies on their homepages. They provide
convincing evidence for almost every kind of validity: content validity, construct validity,
and criterion-related validity. In contrast, evidence for the validity of TEPS is scarce. The
study by Choi (1999) provides evidence of criterion-related validity. It correlated TEPS with
the Test of Oral Proficiency (TOP) and reported correlation coefficients between the two tests
of .5159 (overall), .2400 (Reading section), and .4340 (Vocabulary section), which indicate an
intermediate or low degree of validity.
Considering the test takers’ context and the results of the review of each test, I
conclude that TOEFL iBT and IELTS would be more appropriate for assessing adult EFL
learners in Korea who wish to be dispatched for long-term study in an English-speaking
country. The main reason is that TOEFL iBT and IELTS are designed to assess the language
abilities of EFL learners who specifically intend to study at higher education institutions
where English is the language of instruction, and the tasks on the two tests reflect the tasks
that the Korean EFL learners will likely encounter. The target population of TEPS, by contrast,
is not limited to those who intend to study at higher education institutions where English is
the language of instruction.
Furthermore, both TOEFL iBT and IELTS provide extensive evidence of a high degree of
reliability and validity, whereas TEPS provides a great deal of evidence mainly for
reliability. I believe that TEPS may be substantially reliable, but the multiple-choice nature
of the test may raise validity concerns. In other words, TEPS does not support any
interpretation of the test takers’ productive English language abilities.
References
Banerjee, J., & Clapham, C. (2005). Test of English as a Foreign Language Computer-Based Test
(TOEFL CBT). In Stoynoff, S., & Chapelle, C., ESOL Tests and Testing: A Resource for
Teachers and Program Administrators (pp. 95-99). Alexandria, VA: TESOL.
Choi, I. (1999). Test fairness and validity of the TEPS. Language Research, 35(4), 511-603.
Educational Testing Service. (2011). Reliability and Comparability of TOEFL iBT® Scores
(TOEFL iBT Research Insight Series 3). Retrieved from
http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
Educational Testing Service. (2011). TOEFL Test and Score Data Summary. Retrieved from
http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf
Educational Testing Service. (2008). Validity Evidence Supporting the Interpretation and Use of
TOEFL iBT™ Scores (TOEFL Research Insight Series 4). Retrieved from
http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf
Hill, K., & Lynch, B. (1999). A comparison of IELTS and TOEFL as predictors of academic
success (IELTS Research Reports 2). Retrieved from University of Cambridge ESOL
Examinations website: http://www.ielts.org/researchers/research/predictive_validity.aspx
Ingram, D. (2007). IELTS as predictor of academic language performance, Part 1 (IELTS
Research Reports 7). Retrieved from University of Cambridge ESOL Examinations
website: http://www.ielts.org/researchers/research/predictive_validity.aspx
Lloyd-Jones, G., Neame, C., & Medaney, S. (2011). A multiple case study of the relationship
between the indicators of students’ English language competence on entry and students’
academic progress at an international postgraduate university (IELTS Research Reports
11). Retrieved from University of Cambridge ESOL Examinations website:
http://www.ielts.org/researchers/research/predictive_validity.aspx
O’Sullivan, B. (2005). International English Language Testing System (IELTS). In Stoynoff, S.,
& Chapelle, C., ESOL Tests and Testing: A Resource for Teachers and Program
Administrators (pp. 73-81). Alexandria, VA: TESOL.
The TEPS Council of Seoul National University. (2012). TEPS Analysis. Retrieved from
http://www.teps.or.kr/
The TEPS Council of Seoul National University. (2012). The TEPS Council Brochure.
Retrieved from http://www.teps.or.kr/
The TEPS Council of Seoul National University. (2012). Test Performance 2011. Retrieved from
http://www.ielts.org/researchers/analysis_of_test_data/test_performance_2011.aspx