
Principles of Language Assessment

Questions to check whether or not the test is well designed:

1. How do you know if a test is effective?

2. Can it be given within appropriate administrative constraints?

3. Is it dependable?

4. Does it accurately measure what we want to measure?

PRACTICALITY
RELIABILITY
VALIDITY
AUTHENTICITY
WASHBACK


PRACTICALITY

An effective test is practical:
- It is not excessively expensive.
- It stays within appropriate time constraints.
- It is relatively easy to administer.
- It has a scoring/evaluation procedure that is specific and time-efficient.


RELIABILITY

A reliable test is consistent and dependable. If the test is given to the same student or matched students on two different occasions, it should yield similar results.

Factors that contribute to the unreliability of a test:
- Student-related reliability – temporary illness, fatigue, a "bad day", anxiety, and others.
- Rater reliability – human error, subjectivity, and bias in the scoring process.
- Inter-rater reliability – two or more scorers yield inconsistent scores on the same test, due to lack of attention to scoring criteria, inexperience, inattention, or preconceived biases.
- Intra-rater reliability – a single scorer yields inconsistent scores on the same tests done by different test-takers.

Unreliability may also result from the conditions in which the test is administered, e.g., a faulty tape recorder or background noise during a listening test.
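Rater reliability is, at bottom, a quantitative question: how closely do two scorers agree? The sketch below is illustrative only and not part of the original slides; the essay scores, the function names, and the choice of exact agreement and Pearson correlation as agreement indices are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical scores two raters gave the same ten essays on a 1-5 band scale.
rater_a = [4, 3, 5, 2, 4, 3, 5, 4, 2, 3]
rater_b = [4, 2, 5, 3, 4, 3, 4, 4, 2, 3]

def exact_agreement(x, y):
    """Proportion of test-takers who received identical scores from both raters."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def pearson_r(x, y):
    """Pearson correlation between the two raters' scores (a simple consistency index)."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

print(f"Exact agreement: {exact_agreement(rater_a, rater_b):.2f}")  # 0.70 with these scores
print(f"Pearson r:       {pearson_r(rater_a, rater_b):.2f}")
```

In practice a program might prefer a chance-corrected index such as Cohen's kappa, but the comparison logic is the same: two raters score the same performances, and the scores are checked for consistency.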


Validity

Validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.

A valid test of reading ability actually measures reading ability – not previous knowledge of a subject or some other variable of questionable relevance.

Types of validity in tests:
1. Content validity
2. Criterion validity
3. Construct validity
4. Consequential validity
5. Face validity


Content Validity (1)

If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity.

A test is considered valid if the tester can clearly define the achievement being measured.

A test of speaking ability does not achieve content validity if it asks test-takers to answer paper-and-pencil multiple-choice questions requiring grammatical judgments.


Content Validity (2)

Another way of understanding content validity is to consider the difference between direct and indirect testing.
- Direct testing – the test-taker actually performs the target task.
- Indirect testing – the test-taker does not perform the task itself but rather a task that is related to it in some way.

The most feasible rule of thumb for achieving content validity in classroom assessment is to test performance directly.


Criterion Validity

In the case of teacher-made classroom assessment, criterion-related evidence is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion.

Criterion-related evidence usually falls into one of two categories: concurrent and predictive validity.
- Concurrent validity – the results are supported by other concurrent performance beyond the assessment itself.
- Predictive validity – the assessment is used to predict future performance, as in the case of placement tests, admissions assessment batteries, and the like.
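To make the predictive side of criterion validity concrete, here is a small illustrative sketch (again, not from the slides): it checks how often a hypothetical placement test's pass/fail prediction matched students' actual end-of-course results. The scores, the cut-off of 60, and the data are invented for the example.

```python
# Hypothetical placement-test scores and the students' actual end-of-course results.
placement_scores = [72, 55, 81, 48, 66, 90, 58, 63, 40, 77]
passed_course    = [True, False, True, False, True, True, True, False, False, True]

CUT_OFF = 60  # assumed cut-off: the test "predicts" a pass when the score is 60 or above

def prediction_accuracy(scores, outcomes, cut_off):
    """Proportion of students whose predicted result (score >= cut_off) matched reality."""
    hits = sum((score >= cut_off) == passed for score, passed in zip(scores, outcomes))
    return hits / len(scores)

print(f"Prediction matched the actual outcome for "
      f"{prediction_accuracy(placement_scores, passed_course, CUT_OFF):.0%} of students")
```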


Construct Validity

A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perception.

Constructs may or may not be directly or empirically measured – their verification often requires inferential data.

Construct validity asks: does this test actually tap into the theoretical construct as it has been defined? E.g., "proficiency" and "communicative competence" are linguistic constructs.

Tests are operational definitions of constructs in that they operationalize the entity that is being measured.

Construct validity is a major issue in validating large-scale standardized tests of proficiency.


Consequential Validity

Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the social consequences of a test's interpretation and use.

As high-stakes assessment has gained ground in the last two decades, one aspect of consequential validity has drawn special attention: the effect of test-preparation courses and manuals on performance.


Face Validity (1)

Face validity refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure.

It is based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers.

Face validity asks the question, "Does the test, on the 'face' of it, appear from the learner's perspective to test what it is designed to test?"


Face Validity (2)

Face validity will likely be high if learners encounter:
- A well-constructed, expected format with familiar tasks
- A test that is clearly doable within the allotted time limit
- Items that are clear and uncomplicated
- Directions that are crystal clear
- Tasks that relate to their course work (content validity)
- A difficulty level that presents a reasonable challenge

Face validity is purely a factor of the "eye of the beholder" – how the test-takers, or possibly the test givers, intuitively perceive the instrument.