
Group Final Assignment
Presented to fulfill the final assignment of English (Language) Learning Assessment

Compiled By:

Afrilia Martyarizkita (100221400409) Angela Merici Shella (1002214040969)

STATE UNIVERSITY OF MALANG
ENGLISH DEPARTMENT

MAJOR OF ENGLISH EDUCATION

December 2012


TEST TYPES

1. Types of test based on purpose

1.1. Language Aptitude Test

Purpose: It is designed to measure the capacity or general ability to learn a foreign language (to identify a learner's talent for learning a second language).
Material: Language as a general ability.
Example: MLAT (Modern Language Aptitude Test), PLAB (Pimsleur Language Aptitude Battery).

1.2. Selection/Admission Test (Entrance Test)
Purpose: It is designed to determine which applicants fulfill the requirements. There are two categories in this test: pass or fail. Those who pass may proceed to the next stage.
Example: SNMPTN, a test for entering SMA (senior high school).

1.3. Placement Test
Purpose: It is designed to place students into the particular level or section appropriate to their ability; with this test we can determine their level of ability.
Material: Reflecting the materials in the instructional program.
Example: a test conducted to place students at the elementary, intermediate, or advanced level.

1.4. Achievement Test
Purpose: It is designed to find out what students have achieved in the learning process. It refers to the syllabus or curriculum.
Material: Reflecting the instructional materials in the syllabus/curriculum.
Example: final test, midterm test.

1.5. Diagnostic Test
Purpose: It is designed to identify the strengths and weaknesses of students in the learning process.
Material: Covering the instructional materials of a learning program.
After the diagnosis, we should follow up with a remedial test or a reinforcement test.

1.6. Proficiency Test
Purpose: It is designed to measure students' current ability. It tests overall ability without being limited to a particular course, curriculum, or single language skill.
Material: Language as a general ability.
Example: TOEFL, IELTS, TOEIC.

1.7. Test for Research
Purpose: It is designed for conducting research and is not related only to classroom general language ability. This is the most conceptual type of test because its purpose is abstract: to verify a theory.
Material: Conceptualized general language ability.
Example: a test used in research to determine the relation between vocabulary size and speaking skill.


1.8. Test for Program Evaluation
Purpose: It is designed to evaluate a program.
Material: Conceptualized general language ability.
Example: questionnaire.

2. Types of test based on the teacher's scoring

2.1. Subjective Test
It is a test in which the learner's ability or performance is judged by the examiner's opinion and judgment. It requires the examinees to create their own responses. No single wording (or set of actions) can be regarded as the only correct response, and a response may earn full or partial credit. Responses must be scored subjectively by content experts.
Example: essay writing and short answers.

2.2. Objective Test
It is a test in which the learner's ability or performance is measured using a specific set of answers. It consists of factual questions requiring extremely short answers that can be quickly and unambiguously scored by anyone with an answer key, thus minimizing subjective judgments by both the person taking the test and the person scoring it. Such tests tend to focus more on specific facts than on general ideas and concepts.
Example: multiple-choice test, true-or-false test, matching, and problem-based questions.

3. Types of test based on the way the ability is elicited

3.1. Direct Testing
It is a test that elicits the ability by having the test taker directly perform language skill activities (listening, speaking, reading, or writing).

3.2. Indirect Testing
It is a test that does not elicit the ability through directly performed language skill activities.

                       Direct    Indirect
Competence/system        I          II
Performance             III         IV

I. A direct competence/system test is oriented to measure knowledge of language components, where the elicitation is done through listening, speaking, reading, or writing.
Example: a writing or speaking test used to measure vocabulary items.

II. An indirect competence/system test is oriented to measure knowledge of language components, where the elicitation is not done through listening, speaking, reading, or writing.
Example: a test of vocabulary items in the form of multiple choice.

III. A direct performance test is oriented to measure the language skills, where the elicitation is done by directly performing listening, speaking, reading, or writing.


Example: a writing test in the form of composing a letter.
IV. An indirect performance test is oriented to measure the language skills, where the elicitation is not done by directly performing listening, speaking, reading, or writing; or, if it is, not in the form of natural communication.
Example: a listening test in the form of arranging pieces of pictures according to what the students have heard from the recording.

4. Types of test based on score interpretation

4.1. Norm-Referenced Testing
Norm-referenced testing compares an examinee's performance to that of other examinees (comparing one ability to another). The goal is to rank the set of examinees so that decisions can be made about their opportunity for success.
Example: college entrance test.

4.2. Criterion-Referenced Testing
Criterion-referenced testing differs in that each examinee's performance is compared to a pre-defined set of criteria or a standard. The individual performance is compared to the standard to determine whether it exactly reaches the standard, falls below it, or exceeds it.
Example: a classroom mastery test with a fixed passing score.
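The contrast can be made concrete with a small Python sketch. The class scores, the cut score of 75, and the score of 74 below are all invented for illustration; the point is only that the same raw score is read against the group under norm-referenced interpretation and against a fixed standard under criterion-referenced interpretation.

# Hypothetical illustration: interpreting one score in two ways.
scores = [55, 60, 62, 68, 70, 74, 75, 80, 85, 90]   # assumed class scores
student_score = 74
standard = 75                                        # assumed criterion (cut score)

# Norm-referenced: percentile rank relative to the other examinees.
below = sum(1 for s in scores if s < student_score)
percentile_rank = 100.0 * below / len(scores)
print(f"Norm-referenced: higher than {percentile_rank:.0f}% of the group")

# Criterion-referenced: compare to the pre-defined standard.
if student_score >= standard:
    print("Criterion-referenced: at or above the standard")
else:
    print("Criterion-referenced: below the standard")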

5. Types of test based on beliefs about language (theoretical)

5.1. Discrete-Point Testing
Language consists of two subsystems: meaning and form. Being considered abstract, the subsystem of meaning is not feasible for linguistic analysis. The subsystem of form, on the other hand, can be further disentangled into smaller sublayers such as the systems of phones, morphs, and syntactic structures.

Another distinction made in this view is the notion that language comprises a component-skill relation. The former consists of vocabulary, grammar, and pronunciation, while the latter is made up of listening, speaking, reading, and writing. Where relevant, these language skills are characterized by the language components.

This view implies that mastery of the language consequently needs to be achieved through mastery of its layers. In testing practice, measurement of language mastery is likewise accomplished layer by layer.

Apart from its advantages, the approach also comes with several potential challenges. In the first place, constructing discrete-point test items is potentially energy- and time-consuming. Next, the test tends not to include the social contexts in which verbal communication normally takes place. Most importantly, due to its atomistic nature, success on the test is not readily inferable to the test taker's ability to communicate in real-life circumstances.

5.2. Integrative Testing

Quite the opposite of the views held by the followers of the discrete-point approach, those advocating the integrative-pragmatic approach maintain that language processing cannot be fragmented into elements. The systems of form and of meaning work together to produce language. Theoretically, the approach is informed by Gestalt theory: language is comprised of several parts working in an integrative manner. Examples of tests utilizing these principles include oral interviews, writing composition, dictation, and cloze procedures.


APPROACHES TO LANGUAGE TESTING

Based on the period in which the approaches are in operation, Baker (1989) distinguishes two major eras:

1. Pre-scientific

1.1. The Classical and the Grammar-Translation Approaches
"All languages should be projected to be Greek-like or Latin-like."
This approach emphasizes:
- Grammatical rules
- Memorizing vocabulary
- Translating classical texts

This approach is applied in the GTM (Grammar-Translation Method) for learning English, concentrating on how to teach grammar and on drilling vocabulary. The underlying concept is that if someone can translate English into their native language, they have mastered English well. Activities used in this approach typically start with the introduction of grammar and end with translation. The approach is characterized in particular by an unclear distinction between teaching and testing.
Example: grammar tests, vocabulary tests, and translating texts.

2. Scientific

2.1. The Discrete-Point Approach
"English can be broken into several components."
This approach emphasizes mastering the components of English separately.

Language
  - Form: phones, morphs, and other syntactic structures
  - Meaning

This approach is applied in the Audio-Lingual Method through its five pillars, as follows (Moulton, cited in Newton, 1979:18):

- Language is speech, not writing (it represents the spoken form).
- Language is a set of habits.
- Language is what its native speakers say, not what someone thinks they ought to say.
- Teach the language, not about the language.
- Languages are different.

Examples of this approach are dialogue completion, pair dialogue performance, and atomistic testing (the beginning of standardized testing).

2.2. The Integrative-Pragmatic Approach


“Language is unitary; it is not divisible.” Another name for the approach is “the Unitary Competence Hypothesis”. Language cannot be separated from its social contexts of use (which are limited). Thus, testing a language requires not just language elements and skills but also contents to be conveyed, which necessarily imply messages. The meaning derived from reading involves not just letter recognition, word recognition, and knowledge of syntactic and lexical meaning, but also knowledge of other social aspects contained in the text, all of which work together. Examples of tests in this approach are:
- Interview
- Writing composition
- Dictation
- Cloze procedure

The scoring methods used in this approach are general impression scoring and analytical scoring.

2.3. The Communication-Based Movement Approach
The belief of this approach is that "language is a means of communication" (meaning negotiation). To communicate through language means to use knowledge and skills (listening, speaking, reading, and writing) by way of authentic assessment (materials and teaching aids). Teaching practices in the classroom are geared toward making the students communicatively competent in using the language learned according to the social context of use. This means that real life is brought into classroom contexts, along with the purposeful accomplishment of language functions such as declining an invitation, agreeing, refusing, persuading, inviting, complimenting, etc. The teaching techniques commonly employed are games, role play, and other communicative teaching activities. This in part puts aside the concept of the native speaker as a yardstick of measurement, as effective communication of meaning is valued more.

2.4. The Performance-Based Movement Approach (Authentic Performance)
"Language is a vehicle of context" (students do something with the language). This approach is usually applied in CTL (Contextual Teaching and Learning). The Communicative Approach recognizes the importance of linking classroom practices to real-life conduct beyond classroom contexts. However, rather than bringing real life into the classroom (the communicative view), joyful learning considers the classroom and beyond as real life (the contextual view). In this approach grammar is not a central concern (as in the communicative approach). Contextual Teaching and Learning has the following components (Johnson, 2000:24), listed below:

- Making meaningful connections
- Doing significant work
- Self-regulated learning
- Collaborating
- Critical and creative thinking
- Nurturing the individual
- Reaching high standards
- Using authentic assessment

These components are translated into the elements or pillars of Contextual Teaching and Learning as follows: Inquiry-Based Learning, Constructivism, Questioning, Learning Communities, Authentic Assessment, Problem-Based Learning, Work-Based Learning, and Service Learning. Examples of authentic assessment used in this approach are portfolios, projects, experiments, extended responses, and others.

REQUIREMENTS OF A GOOD ENGLISH TEST


An English test is a tool or instrument designed to reveal English abilities that are not directly observable. Teachers rely on tests, and a test needs to be good for its purpose. A good test must be able to elicit latent English abilities so that they become observable. In contrast, a bad test is misleading for subsequent processes such as measurement and evaluation. A good test has to meet requirements such as reliability, validity, practicality, and economy.

1. Reliability
A good test needs to be reliable. Reliable means 'stable' or 'consistent', so a reliable test is one that produces stable or consistent scores. Test scores should demonstrate consistency or stability no matter who administers the test or when and where it is administered.
In mathematical terms:

Obtained (Observed) Score = True Score + Measurement Error

A good measurement is one in which no error is committed during measurement; in other words, the error is equal to 0 (zero). There are three ways to estimate reliability:

1.1. Test-Retest

The test-retest estimation involves administering the same test to a number of test takers on different testing occasions. It is used to assess the consistency of a measure from one time to another.
Weaknesses:
- It is not easy to create similar conditions on different testing occasions.
- It is not known exactly what the best time interval for the second administration is; it may be too long or too close.
Strength:
- Only one set of test items has to be constructed, so less time and energy are needed.

1.2. Parallel Forms

The parallel-forms technique requires the construction of two or more sets of tests which are made equal in every aspect.
Weaknesses:
- Making tests that are equally similar in all aspects is not an easy task; it needs more energy and is time-consuming.
- It is not easy to keep the test takers' mental condition the same when they respond to two sets of tests administered almost at the same time.
Strength:
- The two forms can be used independently.
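Both the test-retest and the parallel-forms estimates come down to correlating two sets of scores from the same examinees. A minimal Python sketch with invented scores (statistics.correlation requires Python 3.10 or later):

from statistics import correlation  # Pearson correlation, Python 3.10+

# Hypothetical scores of the same six examinees on two occasions (test-retest)
# or on two parallel forms of the test (parallel-forms technique).
first_set  = [70, 65, 80, 90, 55, 75]
second_set = [72, 60, 78, 88, 58, 74]
print("reliability estimate:", round(correlation(first_set, second_set), 2))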

1.3. Internal Consistency
The internal consistency approach is based on the logic that if the items in the test are highly correlated, the test is said to be reliable. Internal consistency can be estimated using different approaches, commonly called:

1.3.1. Split-Half

In split-half estimation we randomly divide all items that purport to measure the same construct into two sets and correlate the scores on the two halves.
Weakness:
- It does not fully reflect the true reliability of the whole test.
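A minimal sketch of split-half estimation on an invented 0/1 response matrix. The odd/even split and the Spearman-Brown step, which projects the half-test correlation up to full test length and partly compensates for the weakness noted above, are common choices rather than something prescribed by this handout.

from statistics import correlation  # Pearson correlation, Python 3.10+

# Hypothetical responses: rows = examinees, columns = items (1 correct, 0 incorrect).
responses = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
]
half_a = [sum(row[0::2]) for row in responses]  # scores on the odd-numbered items
half_b = [sum(row[1::2]) for row in responses]  # scores on the even-numbered items
r_half = correlation(half_a, half_b)            # reliability of a half-length test
r_full = 2 * r_half / (1 + r_half)              # Spearman-Brown correction
print("split-half reliability:", round(r_full, 2))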

1.3.2. Inter-Item
Inter-item estimation uses all of the items on the instrument that are designed to measure the same construct. In addition to these ways of estimating the reliability of scores, an estimate of score consistency can also be applied to scores derived from different raters; various names are used for this, and it assesses the degree to which different raters/observers give consistent estimates of the same phenomenon.
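The handout does not name a particular inter-item statistic; the most widely used one is Cronbach's alpha, sketched here on the same kind of invented 0/1 response matrix.

from statistics import pvariance  # population variance; any variance works if used consistently

# Hypothetical responses: rows = examinees, columns = items.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
]
k = len(responses[0])                                   # number of items
item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
total_var = pvariance([sum(row) for row in responses])  # variance of the total scores
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print("Cronbach's alpha:", round(alpha, 2))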

Factors that affect the reliability of scores:
- Test taker: perhaps the subject is having a bad day.
- Test itself: the questions or directions on the instrument may be unclear.
- Testing conditions: there may be distractions during the testing that affect the subject (too hot, too comfortable, etc.).
- Test scoring: scorers may be applying different standards when evaluating the subjects' responses; the evaluator might be ill, tired, broken-hearted, or otherwise distracted.

2. Validity
According to Kline (1993:15), a test is said to be valid if it measures what it claims to measure. Simply put, validity is the precision of the test in measuring what is intended to be measured. It has several dimensions or aspects:

2.1. Face Validity

'The concept of face validity relates more to what a test appears to measure than to what the test actually measures' (Cohen et al., 1988:125). Face validity of a test is thus linked to what the test looks like: if a test appears, 'on the face of it', to measure what it is intended to measure, the test can be said to be face valid. In brief, face validity refers to the extent to which the physical appearance of the test corresponds to what it is claimed to measure.

2.2. Content Validity
According to Wiersma and Jurs (1990:184), content validity is the extent to which the test is representative of a defined body of content consisting of topics and processes. For instance, a grammar test contains the grammatical points to be tested, such as infinitives, gerunds, modals, tenses, etc.

2.3. Empirical Validity
Empirical validity describes how closely scores on a test correspond (correlate) with behavior as measured in other contexts. In other words, an instrument has empirical validity when it has been tested against an external measure. This kind of validity can be differentiated into two types based on when the data for the external measure are collected:

2.3.1. Concurrent Validity
The results are supported by other concurrent performance beyond the assessment itself; in other words, the score on the test is related to another score obtained at about the same time.

2.3.2. Predictive Validity
Predictive validity refers to how well the test predicts test takers' prospects in their future life.
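In practice, both concurrent and predictive validity are usually reported as a correlation (a validity coefficient) between the test scores and an external criterion. A sketch with invented numbers, where the criterion is imagined as later course grades:

from statistics import correlation  # Python 3.10+

test_scores = [520, 480, 600, 550, 470, 610]
criterion   = [3.1, 2.8, 3.7, 3.3, 2.6, 3.8]  # assumed later grades of the same examinees
print("validity coefficient:", round(correlation(test_scores, criterion), 2))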

2.4. Construct Validity
According to Gronlund (1985:72), construct validity is '. . . the extent to which the test performance can be interpreted in terms of one or more psychological constructs.'

According to Sukardi (2009:38), some factors can make an evaluation test invalid: internal factors, external factors, and factors from the students themselves.

Internal factors (from the test):
- The instructions are not clear, which can decrease the test's validity.
- The words used in the structure of the evaluation instrument are too difficult.
- The construction of the test items is not good.
- The difficulty level of the test items is not appropriate.
- The time allocated is not appropriate.
- The test items do not represent the content of the materials.
- The answers to the questions can be predicted by the students.

External factors:
- The time allocated is not enough for the students.
- The assessment is not consistent.
- Another person from outside helps the students do the test.

Factors from the students themselves:
- Wrong interpretation by the students.
- The students cannot concentrate well.

ITEM ANALYSIS


Item analysis is a process that examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole. The purposes of doing item analysis are:

1. To improve items that will be used again in later tests.
2. To eliminate ambiguous items in a single test administration.
3. To increase instructors' skill in test construction.
4. To identify specific areas of course content that need greater emphasis or clarity.

The methods used in item analysis are:

1. Item Difficulty
Item difficulty is determined by the number of examinees who answer a particular test item correctly, expressed as a proportion (P).
Methods of computing item difficulty:

1.1. Method for Dichotomously Scored Items

P = R / N

where P is the difficulty of a certain item, R is the number of examinees who get that item correct, and N is the total number of examinees.

1.2. Method for Polytomously Scored Items

P = X̄ / Xmax

where X̄ is the mean of the examinees' scores on the item and Xmax is the perfect (maximum) score of that item.

1.3. Grouping Method
Upper (U) and lower (L) criterion groups are selected from the extremes of the distribution of test scores or job ratings, and

P = (PU + PL) / 2

where PU and PL are the proportions of the upper and lower groups who get the item correct.
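A minimal sketch of the two difficulty formulas on invented responses:

# Dichotomously scored item: P = R / N
answers = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]     # 1 = correct, 0 = incorrect (hypothetical)
P_dichotomous = sum(answers) / len(answers)  # R correct answers out of N examinees

# Polytomously scored item: P = X̄ / Xmax
scores = [3, 4, 2, 5, 4, 3]                  # e.g., an essay item scored 0-5 (hypothetical)
x_max = 5
P_polytomous = (sum(scores) / len(scores)) / x_max

print(round(P_dichotomous, 2), round(P_polytomous, 2))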

2. Item Discrimination
Item discrimination refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. It determines whether those who did well on the entire test also did well on a particular item.
Method of item discrimination: the index of discrimination

D = pH - pL

We need to set one or two cutting scores to divide the examinees into an upper-scoring group and a lower-scoring group. pH is the proportion of the upper group who answer the item correctly and pL is the proportion of the lower group who answer the item correctly. Values of D may range from -1.00 to 1.00.
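A sketch of the index of discrimination on invented data; the median split into upper and lower groups is an assumption chosen only for illustration.

# Each examinee: (total test score, response to the item under analysis: 1/0).
examinees = [
    (92, 1), (88, 1), (85, 1), (80, 0), (76, 1),
    (60, 1), (55, 0), (50, 0), (47, 0), (40, 0),
]
examinees.sort(key=lambda e: e[0], reverse=True)  # rank by total score
half = len(examinees) // 2
upper, lower = examinees[:half], examinees[half:]
pH = sum(item for _, item in upper) / len(upper)  # proportion correct in the upper group
pL = sum(item for _, item in lower) / len(lower)  # proportion correct in the lower group
print("D =", round(pH - pL, 2))                   # ranges from -1.00 to 1.00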

3. Item Validity (Point-Biserial Method)
Another way to determine the discriminability of an item is to compute the correlation coefficient between performance on the item and performance on the whole test, that is, the tendency of students who select the correct answer to have high overall scores. This is called the point-biserial method.
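Because the point-biserial coefficient is simply the Pearson correlation between a 0/1 item and the total scores, it can be sketched directly; the data below are invented.

from statistics import correlation  # Python 3.10+

item_responses = [1, 1, 0, 1, 0, 1, 0, 0]          # hypothetical responses to one item
total_scores   = [88, 80, 55, 75, 60, 90, 48, 52]  # the same examinees' total scores
print("point-biserial r =", round(correlation(item_responses, total_scores), 2))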

4. Effectiveness of Distractors
Distractor analysis examines how often each incorrect option of a multiple-choice item is chosen; an effective distractor attracts some lower-scoring examinees, while an option chosen by no one (or mainly by high scorers) should be revised.
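A rough sketch of distractor analysis: count how often each option is chosen in the upper and lower groups. The key "B", the options, and all choices below are hypothetical.

from collections import Counter

key = "B"
upper_choices = ["B", "B", "A", "B", "B", "C", "B", "B"]  # choices of high scorers
lower_choices = ["A", "C", "B", "D", "A", "C", "B", "D"]  # choices of low scorers
print("key:", key)
print("upper group:", Counter(upper_choices))
print("lower group:", Counter(lower_choices))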

Scoring, Grading, Test-Score Interpretation


SCORING
Scoring is the process of using numbers to represent the responses made by the test taker. The score is initially raw (a raw score); for the score to be meaningful, further analyses are required.

Types of scoring (classified based on how the test taker's response is viewed and treated):

Dichotomous vs. Continuous Scoring
Dichotomous scoring entails viewing and treating the response as one of two distinct, exclusive categories. Example: scoring in multiple-choice, true-false, and correct-incorrect items (1 is assigned to a correct answer; 0 to an incorrect one).
Continuous scoring views and treats the test taker's response as being graded in nature. Example: speaking and writing tests (e.g., fluency in a speaking test may be scored 1, 2, 3, 4, or 5).

Holistic, Primary Trait, and Analytic Scoring
Holistic scoring considers the test taker's response as a whole rather than as consisting of fragmented parts. Example: a speaking test scored using the Test of Spoken English (TSE) scoring guide.
Primary trait scoring focuses on one specific type of feature or trait that the test takers need to demonstrate. For example, in writing, the teacher scores only the content of the writing product.
Analytic scoring emphasizes individual points or components of the test takers' response; linguistic and non-linguistic features are both important to score.

GRADING

Grades reflect a standard realized through a weighting system of quality. In grading, you can include both achievement and non-achievement aspects. Relative grading is usually accomplished by ranking students in order of performance (percentile ranks) and assigning cut-off points for grades. It allows your own interpretation and adjustment for the unpredicted ease or difficulty of a test.

If you pre-specify standards of performance on a numerical point system, you are using an absolute system of grading. For example, the points for a midterm test, the points for a final exam, and the points accumulated for the semester are established in advance by the institution.
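A small sketch contrasting the two systems on invented scores; the letter boundaries in both functions are assumed values chosen only for illustration.

scores = [95, 88, 76, 67, 83, 59, 91, 72, 80, 64]

# Absolute grading: pre-specified point thresholds.
def absolute_grade(score):
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"
    if score >= 60: return "D"
    return "E"

# Relative grading: rank the class and set cut-off points on percentile rank.
def relative_grade(score, all_scores):
    rank = sum(1 for s in all_scores if s <= score) / len(all_scores)
    if rank >= 0.90: return "A"
    if rank >= 0.70: return "B"
    if rank >= 0.40: return "C"
    if rank >= 0.20: return "D"
    return "E"

for s in sorted(scores, reverse=True):
    print(s, absolute_grade(s), relative_grade(s, scores))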

Test Interpretation:


Steps:

1. Frequency Distribution

Showing the number of students who obtained each mark awarded. For example, out of 30 students, 6 students obtained a score of 100.
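A minimal sketch of a frequency distribution built from that example; the marks other than the six scores of 100 are invented to fill out the 30 students.

from collections import Counter

marks = [100] * 6 + [90] * 8 + [80] * 7 + [70] * 5 + [60] * 4  # 30 hypothetical marks
frequency = Counter(marks)
for mark in sorted(frequency, reverse=True):
    print(mark, frequency[mark])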

2. Measures of Central Tendency

Indicators of how well most students perform in the group; in brief, indicators of group performance.
- Mean
- Median
- Mode

3. Measures of Variability

Indicators of the homogeneity or heterogeneity of a group.
- Range
- Standard deviation
- Variance
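A quick sketch of steps 2 and 3 with Python's statistics module, on invented marks:

from statistics import mean, median, mode, pstdev, pvariance

marks = [100, 90, 90, 80, 80, 80, 70, 70, 60, 50]  # hypothetical marks

# Measures of central tendency
print("mean:", mean(marks), "median:", median(marks), "mode:", mode(marks))

# Measures of variability
print("range:", max(marks) - min(marks))
print("SD:", round(pstdev(marks), 2), "variance:", round(pvariance(marks), 2))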

4. Item Analysis

A process which examines students’ responses to individual test items (questions) in order to assess the quality of those items and of the tests as a whole.

Stages of Test Construction


There is a general set of procedures for test construction.

Set the purpose
In constructing a test, the test maker has to be sure about what he/she wants to know and for what purpose. The following questions have to be answered:
- What kind of test is it to be (achievement/proficiency/diagnostic/placement)?
- What is its precise purpose?
- What abilities are to be tested?
- How detailed must the results be?
- How accurate must the results be?
- How important is backwash?
- What constraints are set by the unavailability of expertise, facilities, and time (for construction, administration, and scoring)?

Make a blueprint
The first form that the solution takes is a set of specifications for the test. This will include information on content, format and timing, criterial levels of performance, and scoring procedures.

Expert review and revision of the blueprint
Item writing based on the blueprint

It is most unlikely that everything found under the heading of “Content” in the specifications can be included in any one version of the test. Choices have to be made. For content validity and for beneficial backwash, the important thing is to choose widely from the whole area of content. One should not concentrate on those elements known to be easy to test. Succeeding versions of the test should also sample widely and unpredictably.

Expert review of the item writing and revision

Empirical validation (empirical try-out)
- To determine the time allocation
- To carry out item analysis
- To determine the quality of the test

Pretesting is needed even though careful moderation has been carried out. There are likely to be some problems with every test, and it is obviously better if these problems can be identified before the test is administered to the group for which it is intended. The aim should be to administer it first to a group as similar as possible to the intended one. Problems in administration and scoring are noted, the reliability coefficients of the whole test and of its components are calculated, and individual items are analyzed.

Validation of a particular test usually refers to criterion-related validity: we look for empirical evidence that the test performs well against some criterion. For example, an achievement test might be validated against the ratings of students by their current language teachers and by their future subject teachers soon after the beginning of their academic courses.


Analyze reliability, validity, practicality, and economy
Analyze the test items (difficulty proportion, discrimination, point-biserial) and revise

INTRODUCTION TO AUTHENTIC ASSESSMENT


Authentic assessment is based on student activities that reflect real-world performance as closely as possible. The aim of authentic assessment is to monitor the progress of students' learning of communicative competence (to listen, to speak, to read, and to write) in order to address what the students need (strengths and weaknesses). The final target is to create individuals who are independent, responsible, creative, and innovative.

The characteristics of authentic assessment:
1. An authentic assessment usually includes a task for students to perform and an (analytical) rubric by which their performance on the task will be evaluated.
Principles of designing a task:
1.1. What to do
1.2. How to do it
1.3. How long students should do it
1.4. What outcome students should submit

2. Constructed responses
3. Higher-order thinking
Students do not only remember and understand but also arrange, analyze, and create something.
4. Integrated (skill and content)
5. Process and product
6. Depth of information

The benefits of authentic assessment for students:
1. Students will know how well they have mastered the learning materials.
2. Students will recognize and strengthen their mastery of particular skills.
3. Students will connect the learning with their experiences, their world, and society in a wider scope.
4. Students will sharpen their higher-order thinking skills.
5. Students will have responsibilities and choices.
6. Students will cooperate with other students.
7. Students will learn to measure the level of their own performance.

Varieties of authentic assessment:
1. Interview
In the interview activity, the teacher asks the students questions and the students answer them. The questions can be about the students themselves or about things outside the students.

2. Retelling
Students are asked to retell a story they have read or listened to. The other students can ask many things about the story being retold. Three things can be evaluated from the retelling activity:
2.1. The student's speaking skill
2.2. The organization and order of the text
2.3. The student's responses to the text

3. Composing
Students are asked to write essays by themselves, for example descriptions, narrations, expositions, etc. The assessment can be done with two approaches:
3.1. Global approach
It focuses on the essay as a whole.
3.2. Component approach
It focuses on certain aspects of the essay, such as organization, language use, etc.

4. Project/Exhibition
In this form of authentic assessment, the teacher asks students to do something in the form of a project whose result will be exhibited.

5. Experiment
Students can also be assigned to carry out a certain experiment or to explain and demonstrate the procedure of something.

6. Written and/or spoken extended response
Students are asked to read texts for review. Then the students are asked to give written responses to questions based on the texts.

7. Observation
The teacher observes the activities done by the students. It can be done in two ways:
7.1. Spontaneous observation
Observing whatever happens without any prior plan or preparation.
7.2. Structured observation
Observing with plans or preparations made beforehand.
8. Portfolio

This form of authentic assessment is done by collecting the students' work in order to track the development of the students' ability over time toward the established learning aims. The key words of this form of authentic assessment are: work collections, self-assessment, development of the study, and the purpose of the study.