5
1 Language testing A test is a sample of an individual’s behaviour/performance on the basis of which inferences are made about the more general underlying competence of that individual. Language tests refer to any kind of measurement/examination technique which aims at describing the test taker’s foreign language proficiency, e.g. oral interview, listening comprehension task, free composition writing. 1. Kinds of tests and testing Proficiency tests Proficiency tests aim to measure students' L2 competence regardless of any training they previously had in the language. In these tests, designers specify what the candidates should be able to do to pass the test. Achievement tests Achievement tests assess whether learners have acquired specific elements of language that they were taught in the language course they took part in. There are two types of achievement tests: final tests at the end of the course and progress tests during the course. Diagnostic tests Diagnostic tests help identify learners' strengths and weaknesses in L2. Their main aim is to help teachers decide what needs to be taught to students. Placement tests With the help of placements tests students can be placed in the learning group that is appropriate for their level of competence. Direct versus indirect testing Direct tests: candidates are required to perform the skill the test intends to measure. Indirect tests want to measure skills that underlie performance in a particular task. Discrete point versus integrative testing In discrete point tests every item focuses on one clear-cut segment of the target language without involving the others Typical test format: written multiple-choice test. In integrative tests candidates need to use a number of language elements at the same time in completing the test tasks. For example: essay writing, dictation, cloze test.

Language Testing

Embed Size (px)

DESCRIPTION

Language Testing

Citation preview

  • 1Language testing

    A test is a sample of an individuals

    behaviour/performance on the basis of

    which inferences are made about the

    more general underlying competence of

    that individual.

    Language tests refer to any kind of

    measurement/examination technique

    which aims at describing the test takers

    foreign language proficiency, e.g. oral

    interview, listening comprehension task,

    free composition writing.

    1. Kinds of tests and testing

    Proficiency tests

    Proficiency tests aim to measure students' L2 competence regardless of any training they previously had in the language. In these tests, designers specify what the candidates should be able to do to pass the test.

    Achievement tests

    Achievement tests assess whether learners have acquired specific elements of language that they were taught in the language course they took part in. There are two types of achievement tests: final tests at the end of the course and progress tests during the course.

    Diagnostic tests

    Diagnostic tests help identify learners'

    strengths and weaknesses in L2. Their

    main aim is to help teachers decide what

    needs to be taught to students.

    Placement tests

    With the help of placements tests

    students can be placed in the learning

    group that is appropriate for their level of

    competence.

    Direct versus indirect testing

    Direct tests: candidates are required to

    perform the skill the test intends to

    measure.

    Indirect tests want to measure skills that

    underlie performance in a particular task.

    Discrete point versus integrative testing

    In discrete point tests every item focuses on one

    clear-cut segment of the target language without

    involving the others Typical test format: written

    multiple-choice test.

    In integrative tests candidates need to use a

    number of language elements at the same time

    in completing the test tasks. For example: essay

    writing, dictation, cloze test.

  • 2Norm referenced tests

    In norm-referenced tests, candidates performance is assessed in comparison with that of the other candidates. For these reasons the cut-off points (line between fail and pass) are determined afterthe test results are obtained from the group of students based on the distribution of the scores.

    TOTAL

    75,070,0

    65,060,0

    55,050,0

    45,040,0

    35,030,0

    25,020,0

    TOTAL

    Fre

    qu

    en

    cy

    50

    40

    30

    20

    10

    0

    Std. Dev = 11,96

    Mean = 53,1

    N = 270,00

    Criterion referenced tests

    Criterion-referenced tests compare all the testees to a predetermined criterion. In such tests everybody whose achievement comes up to the pre-set criterion will receive a pass mark, while those under it will fail. The criteria are often set in terms of tasks that students have to be able to perform (e.g. to interact with an interlocutor with ease; to ask for information and understand instructions).

    Common European Framework of

    References for Languages

    Proficient user

    C2

    Can understand with ease virtually everything heard or read. Can summarise information from different spoken andwritten sources, reconstructing arguments and accounts in a coherent presentation. Can express him/herselfspontaneously, very fluently and precisely, differentiatingfiner shades of meaning even in more complex situations.

    C1 Can understand a wide range of demanding, longer texts, and recognise implicit meaning. Can express him/herselffluently and spontaneously without much obvious searchingfor expressions. Can use language flexibly and effectivelyfor social, academic and professional purposes. Canproduce clear, well-structured, detailed text on complexsubjects, showing controlled use of organisational patterns, connectors and cohesive devices.

    Independent user

    B2Can understand the main ideas of complex text on bothconcrete and abstract topics, including technicaldiscussions in his/her field of specialisation. Can interactwith a degree of fluency and spontaneity that makesregular interaction with native speakers quite possiblewithout strain for either party. Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages anddisadvantages of various options.

    B1 Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likelyto arise whilst travelling in an area where the language is spoken. Can produce simple connected text on topicswhich are familiar or of personal interest. Can describeexperiences and events, dreams, hopes and ambitionsand briefly give reasons and explanations for opinionsand plans.

    Basic User

    A2 Can understand sentences and frequently used expressionsrelated to areas of most immediate relevance (e.g. very basicpersonal and family information, shopping, local geography, employment). Can communicate in simple and routine tasksrequiring a simple and direct exchange of information onfamiliar and routine matters. Can describe in simple termsaspects of his/her background, immediate environment andmatters in areas of immediate need.

    A1 Can understand and use familiar everyday expressions andvery basic phrases aimed at the satisfaction of needs of a concrete type. Can introduce him/herself and others and canask and answer questions about personal details such aswhere he/she lives, people he/she knows and things he/shehas. Can interact in a simple way provided the other persontalks slowly and clearly and is prepared to help.

    Objective testing versus subjective

    testing

    The scoring of a task is objective if the

    rater does not have to make a judgement

    because the scoring is unambiguous. For

    example: multiple choice test.

    In subjective test tasks, raters have to

    make a judgement when assessing

    candidates' performance. For example:

    marking of an essay.

  • 3Consider the tasks in the 199 exam:

    1. C-test

    2. Gap-fill task

    3. Summary writing

    Decide whether these tasks are direct or

    indirect, subjective or objective,

    integrative or discrete point tasks.

    Reliability

    Reliability is the extent to which a test is free of random measurement error and produces consistent results when administered under similar conditions. This means that a reliable test is not affected by circumstances outside the test (e.g. the people who administer and mark the test, the time and place of the test)

    Types of reliability:

    internal consistency: whether the test items are related to each other and measure the same ability

    parallel or alternate form reliability: how well parallel or alternate forms of the same test measure the same ability

    test-retest reliability: whether test-takers perform

    similarly each time they complete the test

    intra-rater reliability: whether the same rater

    assesses the test-takers' performance in the

    same way each time he/she evaluates the test

    inter-rater reliability: whether two raters assess

    the test-takers' performance in the same way

    Validity

    Validity is the extent to which a test measures what it is supposed to measure and nothing else.

    content validity: whether the test measures the ability it intends to measure;

    concurrent validity: whether the test takers' performance in a test correlates with their results in a different type of test;

    predictive validity: whether the test results accurately predict future performance;

    construct validity: whether the test appropriately represents the theory of language competence it is based on;

    face validity: whether the test looks as if it measures what it is supposed to measure.

    About the validity of the C-test

    The item-related strategies used by the participants

    Type of strategy Percentage of total

    Lexical 12.41

    Syntactic 9.97

    Morphological 3.71

    Textual 5.36

    Background knowledge 0.83

    Translation 6.48

    Counting the number of letters 15.31

    No strategy used - automatically filled in 45.87

    Total 100

    3. Types of frequently used objective

    test tasks

    Multiple choice.

    It consists of a stem: 1. He ______________ three letters since 9 o'clock.

    And options, one of which is correct and the others are distractors.

    A writes

    B has written

    C has been written

    D had written

    Cloze test

    It is a continuous text in which every Nth word is mechanically deleted. N is usually between five and ten. The examinees have to fill in these blanks. It aims to test reading comprehension, syntax and vocabulary.

  • 4C-test

    In the C-test the second half of every second word is left out. C-tests can provide a rough measure of learners' global level of proficiency.

    Dictation

    The basis of the procedure is that each individual dictated chunk is long enough (10-25 words) to exceed the learners short-term memory, and so the forgotten items have to be filled in from the context and the learners knowledge of the language.

    Editing

    The editing test is the is reverse of the cloze test.

    For example:

    extra words extra are inserted put placed gone into to a text, and testees are is required to crossing cross these out.

    Matching

    Candidates are given a list of possible answers which they have to match with another list of words.

    For example:

    Match the words on the left with those on the right to make other English words.

    1 head A partner

    2 room B wife

    3 business C master

    4 house D mate

    Ordering

    In ordering tasks, candidates have to put a group of words, sentences or paragraphs in order.

    For example:

    Put the following words in order to complete the sentence:

    went yesterday I cinemafriend to with.

    The oral proficiency interview

    Ideally the oral proficiency interview consists four phases:

    1. Warm-up: usually not marked;

    2. Level-check: getting an approximate idea of the learners proficiency level and the topics he/she feels comfortable in;

    3. Probes: actual rating starts only at this stage, the interviewee is pushed up to or beyond his/her level of competence;

    4.Wind-up: rounding off the interview by turning back to activities within the learners ability so as not to send him/her away with a feeling of failure.

    Analysis of test results

    The three most simple

    analyses of test

    results are the

    following:

    1. Distribution curve

    shows the number

    of students scoring

    within a particular

    range. Score14,0

    12,0

    10,0

    8,0

    6,0

    4,0

    2,0

    0,0

    20

    10

    0

    Std. Dev = 3,25

    Mean = 8,3

    N = 61,00

    2. Facility value expresses the proportion of

    students who responded correctly to an item.

    For example: if 100 students took part in a test,

    and 54 of them got the item right, the facility

    value is 0.54.

    3. Discrimination index expresses how well an

    item can discriminate between good and bad

    students. Ranges from 1 to - 1.

    Statistical features of good tests

    The distribution curve should be bell-shaped.

    Facility values should be between 0.3 and 0.7 (or in more lenient approaches to test design 0.2-0.8).

    Discrimination indices should be above 0.4 (or in more lenient approaches to test design above 0.3).

  • 5Washback

    Washback is the effect tests have on teaching and learning.

    A beneficial washback effect can be if a so far neglected skill (e.g. listening) is put into the focus of teaching as a result of the introduction of a test where scores in this skill are important in determining the candidates' grades.

    A negative washback effect can be if most of the time in lessons in secondary schools is spent on practising multiple choice tests.

    Tests have effect on those who take the test, the teachers who prepare the students for the tests, the teaching materials (e.g. course-books), the society and the educational system.

    1. Explain the difference between

    proficiency and achievement tests;

    b) diagnostic and placement tests;

    c) direct and indirect tests;

    d) subjective and objective tests;

    e) norm-referenced and criterion referenced tests;

    f) integrative and discrete point tests.

    2. What is reliability? List the various types of reliability.

    3. What is validity? List the various types of validity.

    4. What are the most frequently used objective test tasks?

    5. What are the most frequent statistical measures of test performance?

    6. What effects can tests have on teaching and learning?