Principles in language testing


What is a good test?

What is the purpose of testing?

• The purpose of testing is to obtain information on the learners' language skills.

• Information is very costly. The more specific it is, the more it costs.
– Is language testing targeting specific information?
– Costs here involve human and material resources and TIME.
– Once an institution or teacher has decided that the information is needed, they should be ready to meet the costs.

Types of tests

• Achievement tests (final or progress)

• Proficiency tests

• Pro-achievement tests

• Diagnostic tests

• Placement tests

Test marking

• Assessment scale (also: rating scale)
– criteria by which performances at a given level will be recognized
– levels of performance:
  • 10 (excellent), 9 (very good), 8 (good)
  • bands 0-9 in IELTS
  • 1-100 pts in the national English examination
– level descriptors: verbal descriptions of performances that illustrate each level of competence on the scale

Communicative language competences

• Linguistic competences
– lexical, grammatical, semantic, phonological, orthographic, orthoepic
• Sociolinguistic competences
– markers of social relations, politeness conventions, expressions of folk wisdom, register differences, dialect and accent
• Pragmatic competences
– discourse competence (ability to arrange sentences in proper sequence), functional competence (requests, invitations etc.)

(adapted from CEFR 2001)

Competences vs. skills

• Competences are tested through skills
• The four major skills are subdivided into minor subskills:
– reading comprehension:
  • reading for general orientation
  • reading for information
  • reading for main ideas
  • reading for specific information
  • reading for implications etc.

(CEFR 2001)

What is good testing?

• It is valid
• It is reliable
• It is practical
• It has a positive impact on the teaching process

• VALIDITY
• RELIABILITY
• PRACTICALITY
• WASHBACK EFFECT

Test validity

• It is the appropriateness of the test; OR

• It shows that a test tests what it is supposed to test; OR

• A test is valid if it measures accurately what it is intended to measure.

• To establish that a test is valid, empirical evidence is needed. The evidence comes from different sources…

Types of validity

• Construct validity:
– the extent to which a test measures the underlying psychological construct (“ability, capacity”)

– the extent to which a test reflects the essential aspects of the theory on which that test is based

– an overarching notion of validity reflected in many subordinate forms of validity

In a more complicated way…

• If a test does not have construct validity, test scores will show CONSTRUCT-IRRELEVANT VARIANCE.
– E.g. in an advanced speaking test candidates may be asked to speak on an abstract topic. Personal engagement in the topic, however, may weaken or improve the performance. BUT: having previous knowledge about the abstract topic should not be assessed.

Types of validity

• Content validity:
– the extent to which a test adequately and sufficiently measures the particular skills it sets out to measure (cf. test specifications)
• Response validity:
– … test takers respond in the way expected by the test developers
• Predictive validity:
– … a test accurately predicts future performance
• Concurrent validity:
– … scores on one test relate to scores on another external measure
• Face validity:
– … a test appears to measure whatever it claims to measure

(Hughes 2003: 26-35)
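
Concurrent and predictive validity are commonly reported as a correlation between scores on the test and scores on the external measure. The sketch below is only an illustration of that idea, not a procedure taken from the slides: the function name `pearson_r` and both score lists are made-up assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six candidates' scores on a new test (1-100 pts)
# and their bands on an established external measure.
new_test_scores = [52, 61, 70, 48, 85, 77]
external_scores = [5.0, 6.0, 6.5, 5.0, 8.0, 7.0]

validity_coefficient = pearson_r(new_test_scores, external_scores)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 would suggest that the two measures rank candidates similarly. With so few candidates it is only a rough indication.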

Types of validity

• Nearly 40 different types have been collected on a language testers’ forum…

• The more different types of validity are established in a test, the more valid that test is considered to be.

Test reliability

• Quality of test scores resulting from test administration:
– accuracy of marking and fairness of scores
– consistency of marking:
  • similar scores on different days
  • similar scores from different markers
– inter-rater reliability
– intra-rater reliability
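
One possible way to put a number on agreement between two markers is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not name a statistic, so this choice is an assumption. The function name `cohen_kappa` and the marks are hypothetical. The same calculation could be applied to one marker's scores on two occasions (intra-rater reliability).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who marked the same performances."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must mark the same non-empty set of scripts")
    n = len(rater_a)
    # Proportion of scripts on which the two raters gave the same mark.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's mark frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[mark] * counts_b[mark] for mark in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical mark throughout
    return (observed - expected) / (1 - expected)


# Hypothetical marks (8 = good, 9 = very good, 10 = excellent) for eight scripts.
marker_1 = [8, 9, 10, 8, 9, 8, 10, 9]
marker_2 = [8, 9, 9, 8, 9, 8, 10, 10]

kappa = cohen_kappa(marker_1, marker_2)
print(round(kappa, 3))
```

Here the markers agree on 6 of 8 scripts (0.75), chance agreement is 22/64, and kappa is 13/21, roughly 0.62.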

Factors influencing reliability

1. The performance of test takers
  1. a sufficient number of items
  2. restricted freedom of test behaviour
  3. unambiguous items, clear instructions and rubrics
  4. layout, good copies, familiar format
  5. proper administration

2. The reliability of scorers
  1. objective scoring vs. subjective scoring
  2. restricting freedom of response
  3. a detailed scoring/marking key
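
The link between "a sufficient number of items" and reliability can be illustrated with the Spearman-Brown formula, which estimates the reliability of a test lengthened by a factor k with comparable items: r_k = k·r / (1 + (k − 1)·r). The slides do not mention this formula, so it is brought in here as an assumption. The function name and the figures are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Estimated reliability after changing test length by `length_factor`.

    `reliability` is the current reliability coefficient (0..1);
    `length_factor` is new length / old length (e.g. 2 doubles the test).
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical case: a 20-item test with reliability 0.60 is doubled to 40 items.
predicted = spearman_brown(0.60, 2)
print(round(predicted, 2))
```

Under these assumptions, doubling the test raises the estimated reliability from 0.60 to 0.75. The estimate holds only if the added items are of the same kind and quality as the original ones.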

Test feasibility/practicality

• It is the ease with which the items/tasks can be replicated in terms of the resources needed, e.g. time, materials, people.

Washback effect (sometimes ‘backwash’)

• It is a type of impact of examinations/tests on the classroom situation.

• Washback may be positive or negative.

How to achieve positive washback?

1. Test the abilities/skills whose development you want to encourage.

2. Sample widely and unpredictably.

3. Use direct testing.

4. Make testing criterion-referenced.

5. Base achievement tests on objectives.

6. Make sure that the test is known and understood by students and other teachers.

References and additional reading

1. Alderson, J. C., C. Clapham and D. Wall. 1995. Language Test Construction and Evaluation. Cambridge: CUP.

2. Hughes, A. 2003. Testing for Language Teachers. 2nd ed. Cambridge: CUP.

3. Council of Europe. 2001. Common European Framework of Reference for Languages. Cambridge: CUP.
