
    ADMINISTRATION, SCORING AND REPORTING

Introduction

Administering a test is usually one of the simplest phases of the testing process. There are, however, some common problems associated with test administration that may affect test scores. Careful planning can help the teacher avoid or minimize such difficulties. When giving tests it is important that everything possible be done to obtain valid results. Cheating, poor testing conditions, and test anxiety, as well as errors in test scoring procedures, all contribute to invalid test results. Many of these factors can be controlled by practicing good test administration procedures. Practicing these procedures will prove less time consuming and less troublesome than dealing with the problems that result from poor procedures.

After administering a test, the teacher's responsibility is to score it or arrange to have it scored. The teacher then interprets the results and uses these interpretations to make grading, selection, placement or other decisions. To interpret test scores accurately, however, the teacher needs to analyze the performance of the test as a whole and of the individual test items, and to use these data to draw valid inferences about student performance. This information also helps faculty prepare for post-test discussions with students about the exam.


Administering a Test

Test administration plays a vital role in enhancing the reliability of test scores. A test should be administered in a congenial environment, strictly according to the planned instructions, and under uniform conditions for all the people tested.

Suggestions for administering the test:
- Long announcements before or during the test should not be made.
- Instructions should be given in writing.
- The test administrator should not respond to the individual problems of the examinees.

The steps to be followed in the administration of group tests are:
a) Motivate the students to do their best.
b) Follow the directions closely.
c) Keep time accurately.
d) Record any significant events that might influence test scores.
e) Collect the test materials promptly.

The guiding principle in administering an achievement test is that all students must be given a fair chance to demonstrate their achievement of the learning outcomes being measured. This means a physical and psychological environment conducive to their best efforts and the control of factors that might interfere with valid measurement. Students will not perform at their best if they are tense and anxious during testing. They should also be reassured that the time limits are adequate to allow them to complete the test. This, of course, assumes that the test will be used to improve learning and that the time limits are in fact adequate. The things to avoid while administering a test are:
- Do not talk unnecessarily before the test.
- Keep interruptions to a minimum during the test.
- Avoid giving hints to pupils who ask.


Administering Exams

How an exam is administered can affect student performance as much as how the exam was written. Below is a list of general principles to consider when designing and administering examinations.

1. Give complete instructions on how to take the examination, such as how much each section counts and how much time to spend on each section. This helps students to allocate their efforts wisely.
2. State specifically what aids (e.g. calculators, notebooks) students are allowed to use in the examination room.
3. Use assignments and homework to provide preparation for taking the exams. For example, if the assignments consist entirely of essay questions, it would be inappropriate for the examination to consist of 200 multiple-choice questions.
4. Practice taking the completed test yourself. You should expect the students to take about four times the amount of time it takes you to complete the test.
5. For final examinations, structure the test to cover the scope of the entire course. The examination should be comprehensive enough to test adequately the students' learning of the course material. Use a variety of different types of questions on the examination (e.g. multiple-choice, essay, etc.) because some topics are covered more effectively with certain types of questions. Group questions of the same type together when possible.
6. Tell the students what types of questions will be on the test (i.e. essay, multiple-choice, etc.) prior to the examination. Allow students to see past (retired) exams where possible. For essay exams, help students understand how they will be evaluated (if appropriate).
7. Provide students with a list of review questions or topics covered on the exam along with an indication of the relative emphasis on each topic.
8. Give detailed study suggestions.
9. Indicate how much the examination will count toward determining the final grade.


Importance of Test Administration

Consistency
- Standardized tests are designed to be administered under consistent procedures so that the test-taking experience is as similar as possible across examinees.
- This similar experience increases the fairness of the test as well as making examinees' scores more directly comparable.
- Typical guidelines related to test administration locations state that all sites should be comfortable and should have good lighting, ventilation and handicap accessibility.
- Interruptions and distractions, such as excessive noise, should be prevented.
- The time limits that have been established should be adhered to for all test administrations.

Test security
- Test security consists of methods designed to prevent cheating, as well as to protect the test items and content from being exposed to future test-takers.
- Test administration procedures related to test security may begin as early as the registration procedure. Many exam programs restrict examinees from registering for a test unless they meet certain eligibility criteria.
- When examinees arrive at the test site, additional provisions for test security include verifying each examinee's identification and restricting the materials (such as photographic or communication devices) that an examinee is allowed to bring into the test administration.
- If the exam program uses multiple parallel test forms, these may be distributed in a spiral fashion in order to prevent one examinee from being able to copy from another (form A is distributed to the first examinee, form B to the second examinee, form A to the third examinee, etc.).
- The test proctors should also remain attentive throughout the test administration to prevent cheating and other security breaches. When testing is complete, all test-related materials should be carefully collected from the examinees before they depart.


Summary

The use of orderly, standardized test administration procedures is beneficial to examinees. In particular, administration procedures designed to promote consistent conditions for all examinees increase the exam program's fairness. Test administration procedures related to security protect the integrity of the test items. In both cases, the standardization of test administration procedures prevents some examinees from being unfairly advantaged over other examinees.

How many questions should I give?

It is important to allow your students enough time to complete the exam comfortably and reasonably. Inevitably this will mean you must make some choices about which questions you will ask. As a rough guide, allow:

o One minute per objective-type question
o Two minutes for a short answer requiring one sentence
o Five to ten minutes for a longer short answer
o Ten minutes for a problem that would take you two minutes to answer
o Fifteen minutes for a short, focused essay
o Thirty minutes for an essay of more than one to two pages

You should add ten minutes or so to allow for the distribution and collection of the exam. A rough calculation along these lines is sketched below.
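The following Python sketch turns the per-item guidelines above into an overall time estimate. The question counts used here are hypothetical examples, not part of the original guidance.

```python
# Rough exam-length estimate using the per-item times suggested above.
# The planned question counts below are hypothetical; substitute your own.

MINUTES_PER_ITEM = {
    "objective": 1,            # one minute per objective-type question
    "one_sentence_answer": 2,  # two minutes for a one-sentence short answer
    "longer_short_answer": 10, # five to ten minutes (upper bound used here)
    "problem": 10,             # ten minutes per problem
    "short_essay": 15,         # fifteen minutes for a short, focused essay
    "long_essay": 30,          # thirty minutes for an essay of 1-2+ pages
}

planned_items = {"objective": 20, "one_sentence_answer": 5, "short_essay": 2}

working_time = sum(MINUTES_PER_ITEM[kind] * n for kind, n in planned_items.items())
total_time = working_time + 10  # ~10 minutes for distribution and collection

print(f"Estimated exam time: {total_time} minutes")  # 20 + 10 + 30 + 10 = 70
```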

Administering tests

There are several things you should keep in mind to make the experience run as smoothly as possible:
- Have extra copies of the test on hand, in case you have miscounted or in the event of some other problem.


- Minimize interruptions during the exam by reading the directions briefly at the start and refraining from commenting during the exam unless you discover a problem.
- Periodically write the time remaining on the board.
- Be alert for cheating, but do not hover over the students and cause a distraction.

There are also some steps that you can take to reduce the anxiety that students will inevitably feel leading up to and during an exam. Consider the following:
- Have old exams on file in the department office for students to review.
- Give students practice exams prior to the real test.
- Explain, in advance of the test day, the exam format and rules, and explain how this fits with your philosophy of testing.
- Give students tips on how to study for and take the exam; this is not a test of their test-taking ability, but rather of their knowledge, so help them learn to take tests.
- Have extra office hours and a review session before the test.
- Arrive at the exam site early, and be there yourself (rather than sending a proxy) to communicate the importance of the event.

Recommendations for Improving Test Scores:

1) When a test is announced well in advance, do not wait until the day before to begin studying; spaced practice is more effective than massed practice.
2) Ask the instructor for old copies of the examination to practice with.
3) Ask other students what kinds of tests the instructor usually gives.
4) Don't turn study sessions into social occasions; isolated studying is usually more effective.


5) Don't be too comfortable when studying; lying down is a physical cue for your body to sleep.
6) Study for the type of test which was announced.
7) If you do not know the type (style) of test, study for a free-recall exam.
8) Ask yourself questions about the subject material, read for detail, and recite the material just prior to the test.
9) Try to form the material you are studying into test questions.
10) Read the test directions carefully before beginning the exam. Ask the administrator if anything is unclear or some details are not included.
11) If it is an essay test, think about the question and mentally formulate an answer before you begin writing.
12) Pace yourself while taking the test. Do not try to be the first person finished. Allow enough time to review answers at the end of the session.
13) If you can rule out one wrong answer choice, guess, even if there is a penalty for wrong answers.
14) Skip more difficult items and return to them later, particularly if there are a lot of questions.
15) When time permits, review your answers. Don't be overly eager to hand in your test paper before all the available time has elapsed.

    Scoring The Test


The principles of evaluation should be followed in scoring the test; doing so enhances the objectivity and reliability of the test.

Reliability: the degree of accuracy and consistency with which an exam or test measures what it seeks to measure; the degree of consistency among test scores. A test score is called reliable when we have reason to believe it is stable and trustworthy.

Objectivity: a test is objective when the scorer's personal judgment does not affect the scoring. Objectivity eliminates the fixed opinions or judgment of the person who scores the test. It is the extent to which independent and competent examiners agree on what constitutes a good answer for each of the elements of a measuring instrument.

Selection-type items
- Prepare stencils when useful.
- When using a stencil with holes, make sure that students marked only one alternative.
- When a response is wrong, put a red mark through the correct answer.
- Apply a formula for guessing only when a test is speeded.
- Weight all items the same (doing otherwise seldom makes a difference and only confuses scoring).

Supply-type items
- Use your carefully developed rubrics.

Utilizing Rubrics as Assessment Tools

What is a rubric?

A rubric is a scoring and instructional tool used to assess student performance using a task-specific range or set of criteria. It measures student performance against this pre-determined set of criteria for the task and against levels of performance (i.e. from poor to excellent) for each criterion. Most rubrics are designed as a one- or two-page document formatted as a table or grid that outlines the learning criteria for a specific lesson, assignment or project.


Rubrics can be created in a variety of forms and levels of complexity, but they all:
- Focus on measuring a stated objective (performance, behavior, or quality)
- Use a range to rate performance
- Contain specific performance characteristics arranged in levels indicating the degree to which a standard has been met.

Two major types of rubrics:

A holistic rubric involves one global, holistic rating with a single score for an entire product or performance based on an overall impression. These are useful for summative assessment where an overall performance rating is needed, for example, portfolios. A holistic rubric requires the teacher to score the overall process or product as a whole, without judging the component parts separately.

An analytical rubric divides a product or performance into essential traits that are judged separately. Analytical rubrics are usually more useful for day-to-day classroom use since they provide more detailed and precise feedback to the student. With an analytical rubric, the teacher scores the separate, individual parts of the product or performance first, then sums the individual scores to obtain a total score.
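As a simple illustration of analytic scoring, the sketch below sums weighted ratings across criteria. The criteria, weights, rating scale and ratings are hypothetical examples, not taken from any particular rubric.

```python
# Analytic rubric scoring: rate each criterion separately, then sum the
# (optionally weighted) ratings to obtain a total score.
# The criteria, weights and ratings below are hypothetical examples.

rubric_weights = {"organization": 2, "content accuracy": 3, "mechanics": 1}

# Ratings on a 1-4 scale (1 = poor ... 4 = excellent) for one student.
ratings = {"organization": 3, "content accuracy": 4, "mechanics": 2}

total = sum(rubric_weights[c] * ratings[c] for c in rubric_weights)
maximum = sum(w * 4 for w in rubric_weights.values())

print(f"Analytic rubric total: {total}/{maximum}")  # 20 out of 24
```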


Assessing student learning

Rubrics provide instructors with an effective means of learning-centered feedback and evaluation of student work. As instructional tools, rubrics enable students to gauge the strengths and weaknesses of their work and learning. As assessment tools, rubrics enable faculty to provide detailed and informative evaluations of students' work.

Advantages of using rubrics:
- They allow assessment to be more objective and consistent.
- They clarify the instructor's criteria in specific terms.
- They clearly show students how their work will be evaluated and what is expected.
- They promote awareness of the criteria to use when students assess peer performance.
- They provide benchmarks against which to measure progress.
- They reduce the amount of time teachers spend evaluating student work by allowing them to simply circle an item in the rubric.


- They increase students' sense of responsibility for their own work.

Steps of creating rubrics:

1. Define your assignment or project.
   This is the task you are asking your students to perform.
2. Decide on a scale of performance.
   These can be a level for each grade (A-F) or three levels (outstanding, acceptable, not acceptable; Great job, Okay, What happened?). These are listed at the top of the grid.
3. Identify the criteria of the task.
   These are the observable and measurable characteristics of the task. They are listed in the left-hand column. They can be weighted to convey the relative importance of each.
4. Describe the performance for each criterion.
   These descriptors indicate what performance looks like at each level. They offer specific feedback. Use samples of student work to help you determine what quality work looks like.

Suggestions for use:
- Hand out the rubric with the assignment. Return the rubric with the performance descriptors circled.
- Have students develop their own rubrics for a project.
- Have students use the rubric for self-assessment or peer assessment.


Methods of Scoring in Standardized Tests

Different tests use different methods of scoring based on different needs. The following table summarizes the three main categories of test scores:
1. Raw scores
2. Criterion-referenced scores
3. Norm-referenced scores (how most standardized tests are scored)

Raw score
- How the score is determined: by counting the number (or calculating a percentage) of correct responses or points earned.
- Uses: often used in teacher-developed assessment instruments.
- Potential drawbacks: scores may be difficult to interpret without knowledge of how performance relates to either a specific criterion or a norm group.

Criterion-referenced score
- How the score is determined: by comparing performance to one or more criteria or standards for success.
- Uses: useful when determining whether specific instructional objectives have been achieved; also useful when determining whether basic skills that are prerequisites for other tasks have been learned.
- Potential drawbacks: criteria for assessing mastery of complex skills may be difficult to identify.

Age or grade equivalent (norm-referenced)
- How the score is determined: by equating a student's performance to the average performance of students at a particular age or grade level.
- Uses: useful when explaining norm-referenced test performance to people unfamiliar with standard scores.
- Potential drawbacks: scores are frequently misinterpreted, especially by parents; scores may be inappropriately used as a standard that all students must meet; scores are often inapplicable when achievement at the secondary level or higher is being assessed; scores do not give a typical range of performance for students at that age or grade.

Percentile rank (norm-referenced)
- How the score is determined: by determining the percentage of students at the same age or grade level who obtained lower scores.
- Uses: useful when explaining norm-referenced test performance to people unfamiliar with standard scores.
- Potential drawbacks: scores overestimate differences near the mean and underestimate differences at the extremes.

Standard score (norm-referenced)
- How the score is determined: by determining how far the performance is from the mean (for the age or grade level) in standard deviation units.
- Uses: useful when describing a student's standing within the norm group.
- Potential drawbacks: scores are not easily understood by people without some knowledge of statistics.


Standard Score

Definition
- Standard score: how far above or below average a student scored.
- Distance is calculated in standard deviation (SD) units (a standard deviation is a measure of spread or variability).
- The mean and standard deviation are those of a particular norm group.

Standard scores are by far the most complicated of the five types of scores, so they deserve a more in-depth look. When looking at the normal distribution, a line is drawn from the highest point on the curve to the x-axis. This point is the mean score. A standard deviation's worth is counted out on each side of the mean and those points are marked. Another standard deviation is counted out and two more points are marked. When the normal distribution is divided up this way, you will always get the same percentage of students scoring in each part. About 68% will score within one standard deviation of the mean (34% in each direction). As you move further from the mean, fewer and fewer students will perform at these scores. A standard score simply tells us where a student scores in relation to this normal distribution, in standard deviation units.

Advantages

Standard scores are based on the normal curve, which means that:
1. Scores are distributed symmetrically around the mean (average).
2. Each SD represents a fixed (but different) percentage of cases.
3. Almost everyone is included between -3.0 and +3.0 SDs of the mean.
4. The SD allows conversion of very different kinds of raw scores to a common scale that has (a) equal units and (b) can be readily interpreted in terms of the normal curve.
5. When we can assume that scores follow a normal curve (classroom tests usually don't, but standardized tests do), we can translate standard scores into percentiles, which is very useful.
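To illustrate point 5, the sketch below converts a standard (z) score into a percentile rank using the standard normal cumulative distribution function (computed here with Python's math.erf); the example z-values are arbitrary.

```python
import math

def z_to_percentile(z: float) -> float:
    """Percentile rank of a z-score under the normal curve,
    using the standard normal cumulative distribution function."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A student one SD above the mean is at roughly the 84th percentile.
print(round(z_to_percentile(1.0)))   # ~84
print(round(z_to_percentile(0.0)))   # 50
print(round(z_to_percentile(-2.0)))  # ~2
```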


Types of Standard Score

All standard scores:
- Share a common logic
- Can be translated into each other

Z-score
- The simplest standard score, and the one on which all others are based.
- Formula: z = (X - M)/SD, where X is the person's score, M is the group's average, and SD is the group's spread (the standard deviation of the scores).
- Z is negative for scores that are below average, so z-scores are usually converted into some other system that has all positive numbers.

T-score
- First a z-score is computed. Then a T-score with a mean of 50 and a standard deviation of 10 is applied. T-scores are whole numbers and are never negative.
- Normally distributed standard scores with M = 50, SD = 10.
- Can be obtained from z-scores: T = 50 + 10(z).
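A minimal sketch of the two formulas above, z = (X - M)/SD and T = 50 + 10z; the raw scores used here are a hypothetical class set.

```python
# z- and T-scores from the formulas above: z = (X - M) / SD and T = 50 + 10z.
# The raw scores below are a hypothetical class set.

raw_scores = [62, 70, 75, 81, 92]

mean = sum(raw_scores) / len(raw_scores)
sd = (sum((x - mean) ** 2 for x in raw_scores) / len(raw_scores)) ** 0.5

for x in raw_scores:
    z = (x - mean) / sd   # negative for below-average scores
    t = 50 + 10 * z       # mean 50, SD 10; usually reported as a whole number
    print(f"raw={x:3d}  z={z:+.2f}  T={round(t)}")
```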

Normalized Standard Scores
- Start with scores that you want to make conform to the normal curve.
- Get percentile ranks for each score.
- Transform the percentiles into z-scores using a conversion table (I handed one out in class).
- Then transform into any other standard score you want (e.g., T-score, IQ equivalents).
- Hope that your assumption was right, namely, that the scores really do naturally follow a normal curve. If they don't, your interpretations (say, of equal units) may be somewhat mistaken.


Stanines
- A very simple type of normalized standard score.
- Ranges from 1-9 (the "standard nines").
- Each stanine from 2-8 covers half an SD.
- Stanine 5 = percentiles 40-59 (the middle 20 percent).
- A difference of 2 stanines usually signals a real difference.

Strengths
1. Easily explained to students and parents
2. Normalized, so different tests can be compared
3. Stanines can be added to get a composite score
4. Easily recorded (only one column)

Limitations
1. Like all standard scores, cannot record growth
2. Crude, but this prevents over-interpretation
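A small sketch of converting a percentile rank to a stanine. The section above only places stanine 5 at roughly the middle 20 percent of the distribution; the full set of cut-points used below follows the conventional stanine boundaries and should be treated as illustrative.

```python
# Percentile-to-stanine conversion. The boundaries below are the conventional
# stanine cut-points (assumed here; the text above only specifies stanine 5
# as roughly percentiles 40-59, the middle of the distribution).

STANINE_UPPER_PERCENTILES = [4, 11, 23, 40, 60, 77, 89, 96, 100]

def percentile_to_stanine(percentile: float) -> int:
    for stanine, upper in enumerate(STANINE_UPPER_PERCENTILES, start=1):
        if percentile <= upper:
            return stanine
    return 9

print(percentile_to_stanine(50))  # 5 (middle of the distribution)
print(percentile_to_stanine(95))  # 8
print(percentile_to_stanine(3))   # 1
```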

IQ Scores

Tests that measure intelligence have a mean of 100 and (for the most part) a standard deviation of 15. Most people will score between 85 and 115. Someone who scores below 70 is typically considered to have an intellectual disability.

Normal-Curve Equivalents (NCE)
- Normally distributed standard scores.
- M = 50, SD = 21.06.
- Results in scores that go from 1-99.
- Like percentiles, except that they have equal units (this means that they make fewer distinctions in the middle of the curve and more at the extremes).


Standard Age Scores (SAS)
- Normally distributed standard scores.
- Put into an IQ metric, where M = 100 and SD = 15 (Wechsler IQ Test) or SD = 16 (Stanford-Binet IQ Test).

Converting among Standard Scores

Easy convertibility:
- All are different ways of saying the same thing.
- All represent equal units at different ranges of scores.
- All can be averaged (among themselves).
- One can easily be converted into another. Figure 19.2 on p. 494 shows how they line up with each other.
- But they are interpretable only when scores are actually normally distributed (standardized tests usually are).
- Downside: not as easily understood by students and parents as percentiles are.

Using Standard Scores to Examine Profiles

Uses:
- You can compare a student's scores on different tests and subtests when you convert all the scores to the same type of standard score, but all the tests must use the same norm group.
- Plotting profiles can show a student's relative strengths and weaknesses.
- Profiles should be plotted as confidence bands to illustrate the margin of error; interpret scores as different only when their bands do not overlap.
- Profiles are sometimes plotted separately by male and female (say, on vocational interest tests), but this is a controversial practice.


- Tests sometimes come with tabular or narrative reports of profiles.

Using Standard Scores to Examine Mastery of Skill Types
- Some standardized tests try to provide criterion-referenced information by providing scores on specific sets of skills (see Figure 19.4 on p. 498).
- Be very cautious with these; use them as clues only, because each skill area typically has very few items.

Cautions in Interpreting Standardized Test Scores

Scores should be interpreted:
1. With clear knowledge about what the test measures. Don't rely on titles; examine the content (breadth, etc.).
2. In light of other factors (aptitudes, educational experiences, cultural background, health, motivation, etc.) that may have affected test performance.
3. According to the type of decision being made (high or low for what?).
4. As a band of scores rather than a specific value. Always subtract and add 1 SEM from the score to get a range, to avoid over-interpretation (a small worked sketch follows this list).
5. In light of all your evidence. Look for corroborating or conflicting evidence.
6. Never rely on a single score to make a big decision.
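A tiny sketch of the score-band rule in point 4; the observed score and SEM values are hypothetical.

```python
# Caution 4 above: report a band of scores rather than a single value by
# subtracting and adding one standard error of measurement (SEM).
# The observed score and SEM below are hypothetical.

observed_score = 85
sem = 3

low, high = observed_score - sem, observed_score + sem
print(f"Interpret the score as roughly {low}-{high}, not exactly {observed_score}.")
# -> Interpret the score as roughly 82-88, not exactly 85.
```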


Marking Versus Grading

Giving marks and grades in response to students' work is part of a teacher's routine work. Marking refers to assigning marks or points to students' performance against a marking scheme set for a test or an assignment. More often than not, marking and scoring are regarded as part of the normal practice of grading.

Brookhart (2004) defines grading as scoring or rating individual assignments. Grading attaches meaning to the score, telling us whether the expectations have been exceeded, met or not met.

In relation to marking and grading of assessments, the University of Greenwich makes the following helpful points:
1. Assessment is a matter of judgment, not simply computation.
2. Marks and grades are not absolute values, but symbols used by examiners to communicate their judgment of a student's work.
3. Marks and grades provide data for decisions about students' fulfillment of learning outcomes.

Marking and Grading Criteria:

Higher education institutions normally use an institution-wide grading scale for undergraduate programmes, whereas postgraduate programmes tend to be graded on a pass/fail basis or a pass/fail/distinction basis. Grading scales tend to incorporate both percentage grades and letter grades, the latter meaning letters such as A, B, C, etc. The grading scale used in the University of Greenwich is shown below:

Mark on 0-100 scale   Comments
70+                   Work of exceptional quality
60-69                 Work of very good quality
50-59                 Work of good quality
40-49                 Work of satisfactory standard
30-39                 Compensatable fail
0-29                  Fail
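A minimal sketch of mapping a 0-100 mark onto the descriptive bands in the table above; the band boundaries come straight from the table, while the sample marks are hypothetical.

```python
# Map a 0-100 mark to the descriptive bands in the table above.

BANDS = [
    (70, "Work of exceptional quality"),
    (60, "Work of very good quality"),
    (50, "Work of good quality"),
    (40, "Work of satisfactory standard"),
    (30, "Compensatable fail"),
    (0,  "Fail"),
]

def band_for_mark(mark: int) -> str:
    for lower_bound, comment in BANDS:
        if mark >= lower_bound:
            return comment
    return "Fail"

print(band_for_mark(65))  # Work of very good quality
print(band_for_mark(28))  # Fail
```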

Undergraduate grading scales are likely to be similar in other higher education institutions. It is interesting to compare this scale with the percentage equivalents for the classes of honours degree:

0-30%         Fail
35-39%        Pass degree
40-49%        Third class honours
50-59%        Lower second class honours
60-69%        Upper second class honours
70% or more   First class honours

Marks or grades are assigned to students' essays to indicate the degree of achievement they have attained, and there are two systems for assigning grades.

Absolute grading gives the student marks for her essay answer depending on how well the essay has met the assessment criteria, and is usually expressed as a percentage or letter, e.g. 60% or B.

Relative grading tells the student how his essay answer rated in relation to other students doing the same test, by indicating whether he was average, above average, or below average. Relative grading usually uses a literal scale such as A, B, C, D and F. Some teachers would argue that two grades are the best way of marking, so that students are given either a pass or fail grade.


This gets over the problem of deciding what constitutes an A or a C grade, but it does reduce the information conveyed by a particular grade, since no discrimination is made between students who pass with a very high level of achievement and those who barely pass at all.

Common Methods of Grading

a. Letter grades: there is great flexibility in the number of grades that can be adopted, i.e. from 3 to 11. However, 3-point scales may not differentiate well between students of different abilities, while 11-point scales make too fine distinctions and can introduce arbitrariness. The most common scales are 7-point and 5-point.

Example of a 7-point grading scale:
O - Outstanding
A - Very good
B - Good
C - Average
D - Below average
E - Poor
F - Very poor

Example of a 5-point grading scale:
A+ - Excellent
A - Good
B - Average
C - Satisfactory
D - Fail

Strengths


- Easy to use.
- Easy to interpret theoretically.
- Provide a concise summary.

Limitations
- Meaning of grades may vary widely.
- Do not describe the strengths and weaknesses of students.

b. Number/percentage grades: (5, 3, 2, 1, 0) or (98%, 80%, 60%, etc.)
These are the same as letter grades; the only difference is that numbers or percentages are used instead of letters.

Strengths
- Easy to use.
- Easy to interpret theoretically.
- Provide a concise summary.
- May be combined with letter grades.
- More continuous than letter grades.

Limitations
- Meaning of grades may vary widely.
- Do not describe the strengths and weaknesses of students.
- Meaning may need to be explained or interpreted.

c. Two-category grades (pass/fail): good for courses that require mastery of learning.


Limitations
- Less reliable.
- Does not contain enough information about students' achievement.
- Provides no indication of the level of learning.

d. Checklists and rating scales: these are more detailed, and because they are so detailed they are cumbersome for teachers to prepare.

Strengths
- Present detailed lists of students' achievements.
- Can be combined with letter grades.
- Good for clinical evaluation.

Limitations
- May become too detailed to comprehend easily.
- Difficult for record keeping.

Uses of grading

1. Describe unambiguously the worth, merit or value of the work accomplished. Grades are intended to communicate the achievement of students.
2. Grades motivate students to learn.
3. Provide information to students for self-evaluation and for analysis of strengths and weaknesses.
4. Grades communicate performance levels to others.
5. Grades help in selecting people for rewards.


6. Grades communicate the teacher's judgment of the student's progress.

Analytical method of marking (marking scheme)

When using absolute grading against specific criteria, it is useful to use the analytic method of marking. In this method, a marking scheme is prepared in advance and marks are allocated to the specific points of content in the marking specification. It is often difficult to decide how many marks should be given to a particular aspect, but the relative importance of each should be reflected in the allocation. This method has the advantage that it can be more reliable, provided the marker is conscientious, and it will bring to light any errors in the writing of the question before the test is administered.

Global method of marking (structured impressionistic marking)

The global method is also termed structured impressionistic marking, and is best used with relative grading. This method still requires a marking specification, but in this case it serves only as a standard of comparison. The grades used are not usually percentages but a scale, such as excellent/good/average/below average/unsatisfactory; scales can be devised according to preference, but it is important to select examples of answers that serve as standards for each of the points on the scale. The teacher then reads each answer through very quickly and puts it in the appropriate pile, depending on whether it gives the impression of excellent, good, etc. The process is then repeated, and it is much more effective if a colleague is asked to do the second reading. This method is much faster than the analytical one and can be quite effective for large numbers of questions.

Uses of marking


Marking has two distinct stakeholders, the students and the tutor. Both should use marking as a means of raising achievement and attainment.

From the tutor's perspective, marking should:
- Check student understanding.
- Direct future lesson planning and teaching.
- Monitor progress through the collection of marks.
- Help to assess student progress and attainment.
- Set work at appropriate levels.
- Provide clear objectives about what and how you teach.
- Inform students and parents formatively and summatively.

From the students' perspective, marking should and could help them:
- Identify carelessness.
- Proof-read, i.e. by making them check their work for spelling, punctuation, etc.
- Draft work: students can become actively involved in improving their own work.
- Identify areas of weakness and strength.
- Identify areas that lack understanding and knowledge.
- Become more motivated and attach value to their work.


Scoring Essay Questions

- Prepare an outline of the expected answer in advance.
- Use the scoring method which is most appropriate.
  o Point method: each answer is compared to the ideal answer in the scoring key and a given number of points is assigned according to the adequacy of the answer.
  o Rating method: where the rating method is used, it is desirable to make separate ratings for each characteristic evaluated. That is, answers should be rated separately for organization, comprehensiveness, relevance of ideas, and the like.
- Decide on provision for handling factors which are irrelevant to the learning outcomes being measured.
  o Legibility of handwriting, spelling, sentence structure, punctuation and neatness: special efforts should be made to keep such factors from influencing our judgment.
- Evaluate all answers to one question before going on to the next question.
  o The halo effect is less likely to form when the answers for a given pupil are not evaluated in continuous sequence.
- Evaluate the answers without looking at pupils' names.
- If especially important decisions are to be based on the results, obtain two or more independent ratings.

Methods in scoring essay tests
- It is critical that the teacher prepare, in advance, a detailed ideal answer.
- Student papers should be scored anonymously, and all answers to a given item should be scored one at a time, rather than grading each paper as a whole separately.

Distractors in scoring essay tests
- Handwriting style
- Grammar
- Knowledge of the students


- Neatness

Two ways of scoring an essay test
1. Holistic scoring: in this type, a total score is assigned to each essay item based on the teacher's general impression or overall assessment.
2. Analytic scoring: in this type, the essay is scored in terms of each component.

Disadvantages in scoring essay tests

Carryover effect:
The carryover effect occurs when the teacher develops an impression of the quality of the answer from one item and carries it over to the next response. If the student answers one item well, the teacher may be influenced to score subsequent responses at a similarly high level; the same situation may occur with a poor response.

Halo effect:
There may be a tendency when evaluating essay items to be influenced by a general impression of the student or feelings about the student, either positive or negative, that creates a halo effect when judging the quality of the answers. For instance, the teacher may hold favorable opinions about the student from class or clinical practice and believe that this learner has made significant improvement in the course, which in turn might influence the scoring of the responses.

Scoring Guidelines

These are the descriptions of the scoring criteria that the trained readers will follow to determine the score (1-6) for your essay. Papers at each level exhibit all or most of the characteristics described at each score point.

Score = 6


Essays within this score range demonstrate effective skill in responding to the task.

The essay shows a clear understanding of the task. The essay takes a position on the issue and may offer a critical context for discussion. The essay addresses complexity by examining different perspectives on the issue, or by evaluating the implications and/or complications of the issue, or by fully responding to counterarguments to the writer's position. Development of ideas is ample, specific, and logical. Most ideas are fully elaborated. A clear focus on the specific issue in the prompt is maintained. The organization of the essay is clear: the organization may be somewhat predictable or it may grow from the writer's purpose. Ideas are logically sequenced. Most transitions reflect the writer's logic and are usually integrated into the essay. The introduction and conclusion are effective, clear, and well developed. The essay shows a good command of language. Sentences are varied and word choice is varied and precise. There are few, if any, errors to distract the reader.

    Score = 5

Essays within this score range demonstrate competent skill in responding to the task.

The essay shows a clear understanding of the task. The essay takes a position on the issue and may offer a broad context for discussion. The essay shows recognition of complexity by partially evaluating the implications and/or complications of the issue, or by responding to counterarguments to the writer's position. Development of ideas is specific and logical. Most ideas are elaborated, with clear movement between general statements and specific reasons, examples, and details. Focus on the specific issue in the prompt is maintained. The organization of the essay is clear, although it may be predictable. Ideas are logically sequenced, although simple and obvious transitions may be used. The introduction and conclusion are clear and generally well developed. Language is competent. Sentences are somewhat varied and word choice is sometimes varied and precise. There may be a few errors, but they are rarely distracting.

    Score = 4

Essays within this score range demonstrate adequate skill in responding to the task.

The essay shows an understanding of the task. The essay takes a position on the issue and may offer some context for discussion. The essay may show some recognition of complexity by providing some response to counterarguments to the writer's position. Development of ideas is adequate, with some movement between general statements and specific reasons, examples, and details. Focus on the specific issue in the prompt is maintained throughout most of the essay. The organization of the essay is apparent but predictable. Some evidence of logical sequencing of ideas is apparent, although most transitions are simple and obvious. The introduction and conclusion are clear and somewhat developed. Language is adequate, with some sentence variety and appropriate word choice. There may be some distracting errors, but they do not impede understanding.

    Score = 3

Essays within this score range demonstrate some developing skill in responding to the task.

The essay shows some understanding of the task. The essay takes a position on the issue but does not offer a context for discussion. The essay may acknowledge a counterargument to the writer's position, but its development is brief or unclear. Development of ideas is limited and may be repetitious, with little, if any, movement between general statements and specific reasons, examples, and details. Focus on the general topic is maintained, but focus on the specific issue in the prompt may not be maintained. The organization of the essay is simple. Ideas are logically grouped within parts of the essay, but there is little or no evidence of logical sequencing of ideas. Transitions, if used, are simple and obvious. An introduction and conclusion are clearly discernible but underdeveloped. Language shows a basic control. Sentences show a little variety and word choice is appropriate. Errors may be distracting and may occasionally impede understanding.

    Score = 2

Essays within this score range demonstrate inconsistent or weak skill in responding to the task.

The essay shows a weak understanding of the task. The essay may not take a position on the issue, or the essay may take a position but fail to convey reasons to support that position, or the essay may take a position but fail to maintain a stance. There is little or no recognition of a counterargument to the writer's position. The essay is thinly developed. If examples are given, they are general and may not be clearly relevant. The essay may include extensive repetition of the writer's ideas or of ideas in the prompt. Focus on the general topic is maintained, but focus on the specific issue in the prompt may not be maintained. There is some indication of an organizational structure, and some logical grouping of ideas within parts of the essay is apparent. Transitions, if used, are simple and obvious, and they may be inappropriate or misleading. An introduction and conclusion are discernible but minimal. Sentence structure and word choice are usually simple. Errors may be frequently distracting and may sometimes impede understanding.

    Score = 1

Essays within this score range show little or no skill in responding to the task.

The essay shows little or no understanding of the task. If the essay takes a position, it fails to convey reasons to support that position. The essay is minimally developed. The essay may include excessive repetition of the writer's ideas or of ideas in the prompt. Focus on the general topic is usually maintained, but focus on the specific issue in the prompt may not be maintained. There is little or no evidence of an organizational structure or of the logical grouping of ideas. Transitions are rarely used. If present, an introduction and conclusion are minimal. Sentence structure and word choice are simple. Errors may be frequently distracting and may significantly impede understanding.

    No Score

    Blank, Off-Topic, Illegible, Not in English, or Void.

Guidelines for scoring an essay test so as to avoid subjectivity
- Decide what factors constitute a good answer before administering an essay question.
- Explain these factors in the item itself.
- Read all the answers to a single essay question before reading other questions.
- Reread essay answers a second time after the initial scoring.


Scoring Objective Items

The following are methods of scoring objective items:
- Scoring key
- Strip key
- Scoring stencil

Scoring key: if the pupils' answers are recorded on the test paper itself, a scoring key is usually obtained by marking the correct answers on a blank copy of the test. The scoring procedure is then simply a matter of comparing the columns of answers on this master copy with the columns of answers on each pupil's paper.

Strip key: a strip key, which consists merely of a strip of paper on which the columns of answers are recorded, may also be used.

Scoring stencil: where separate answer sheets are used, a scoring stencil is most convenient. This is a blank answer sheet with holes punched where the correct answers should appear.

One of the most important advantages of objective-type tests is ease and accuracy of scoring. The best way to score objective tests is with a test scanner. This technology can speed up scoring and minimize scoring errors. When using a test scanner, a scoring key is prepared on a machine-scorable answer sheet and is read by the scanner first. After the scanner reads the scoring key, the student responses are read and stored on the hard disk of an attached computer. A separate program is used to score the student responses by comparing each response to the correct answer on the answer key. When this process is complete, each student's score, along with item analysis information, is printed.


Item Analysis

The procedure used to judge the quality of an item is called item analysis. Item analysis is a post-administration examination of a test. The quality of a test depends upon the individual items of the test, so it is usually desirable to evaluate the effectiveness of the items. Item analysis provides information concerning how well each item in the test functions and tells us about the quality of an item. One primary goal of item analysis is to help improve the test by revising or discarding ineffective items. Another important function is to ascertain what test takers do and do not know.

Item analysis describes the statistical analyses which allow measurement of the effectiveness of individual test items. An understanding of the factors which govern effectiveness (and a means of measuring them) can enable us to create more effective test questions and also regulate and standardize existing tests. Item analysis helps to find out how difficult a test item is. Similarly, it also helps to show how well an item discriminates between high and low scorers on the test. Item analysis further helps to detect specific technical flaws and thus provides further information for improving test items.

To ascertain whether the questions/items do their job effectively, a detailed test and item analysis has to be done before a meaningful and scientific inference about the test can be made in terms of its validity, reliability, objectivity and usability.

A systematic analysis aims at finding the performance of a group:


- The central tendency of the marks obtained by the group, e.g. normal/average, with positive or negative skewness, high or low values.
- The variability, characterized by the standard deviation (SD), indicates the nature of the spread of marks; the greater the spread, the greater the value of the standard deviation.
- The coefficient of reliability for the test indicates the degree of consistency with which the test has measured the students' abilities. A high value means that the test is reliable and produces virtually repeatable scores for the students.
- Item analysis is useful in making meaningful interpretations and value judgments about students' performance.
- A teacher or paper setter comes to know whether the items had the right level of difficulty and whether there was discrimination between more able and less able students.
- Item analysis defines and maintains standards of performance and ensures comparability of standards. It also helps us:
  o to understand the behavior of items,
  o to become better item writers and more scientific, professional and competent teachers.

Item analysis is a process of examining class-wide performance on individual test items. There are three common types of item analysis, which provide teachers with three different types of information:

Difficulty index - Teachers produce a difficulty index for a test item by calculating the proportion of students in the class who got the item correct. (The name of this index is counter-intuitive, as one actually gets a measure of how easy the item is, not of its difficulty.) The larger the proportion, the more students have learned the content measured by the item.

Discrimination index - The discrimination index is a basic measure of the validity of an item. It is a measure of an item's ability to discriminate between those who scored high on the total test and those who scored low. Though there are several steps in its calculation, once computed, this index can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an item. Perhaps the most crucial validity standard for a test item is that whether a student got the item correct or not is due to their level of knowledge or ability and not due to something else such as chance or test bias.

Analysis of response options - In addition to examining the performance of an entire test item, teachers are often interested in examining the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many students. To eliminate blind guessing, which results in a correct answer purely by chance (and hurts the validity of a test item), teachers want as many plausible distractors as is feasible. Analyses of response options allow teachers to fine-tune and improve items they may wish to use again with future classes.
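A minimal sketch of an analysis of response options for one multiple-choice item: count how often each option was chosen and flag the keyed answer. The responses and the correct option are hypothetical.

```python
# Analysis of response options for a single multiple-choice item: the
# proportion of the class choosing each option shows which distractors
# are "working". The responses below are hypothetical.

from collections import Counter

responses = ["A", "C", "C", "B", "C", "D", "C", "A", "C", "B"]  # 10 students
correct_option = "C"

counts = Counter(responses)
for option in ["A", "B", "C", "D"]:
    proportion = counts.get(option, 0) / len(responses)
    marker = " (correct)" if option == correct_option else ""
    print(f"Option {option}: {proportion:.0%}{marker}")
# A distractor chosen by almost no one may need to be revised or replaced.
```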


Steps involved in Item Analysis

- For each item, count the number of students in each group who answered the item correctly. For alternate-response items, count the number of students in each group who chose each alternative.
- Award a score to each student. A practical, simple and rapid method is to perforate on your answer sheet the boxes corresponding to the correct answers (A, B, C, D); placing the perforated sheet over the student's answer sheet, the raw score can be found almost automatically.
- Rank the papers in order of merit and identify the high and low groups: arrange the answer sheets from the highest score to the lowest score, then make two groups, i.e., the highest scores in one group and the lowest scores in the other group (or top and bottom halves).

Calculation of the difficulty index of a question

For each item, the percentage of students who get the item correct is called the item difficulty index.

1. D = R/N * 100
   R: number of pupils who answered the item correctly.
   N: total number of pupils who attempted the item.

The higher the difficulty index, the easier the item. The difficulty level (facility level) of a test is an index of how easy or difficult the test is as a whole; it is the ratio of the average score of a sample of subjects on the test to the maximum possible score on the test, and is usually expressed as a percentage.

2. Difficulty level = (average score on the test / maximum possible score) * 100

3. Difficulty index = (H + L)/N * 100
   H: number of correct answers in the high group.
   L: number of correct answers in the low group.
   N: total number of students in both groups.

4. Find out the facility value of objective tests first.

5. Facility value = (number of students answering the question correctly / number of students who have taken the test) * 100

If the facility value is 70 or above, those are easy questions; if it is below 70, the questions are difficult ones.
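A small sketch of formula 1 above (D = R/N * 100) together with the 70-percent rule of thumb; the item results are hypothetical.

```python
# Difficulty (facility) index as in formula 1 above: D = R / N * 100.
# The results below are hypothetical: True means the student got the item right.

item_results = [True, True, False, True, True, False, True, True, True, False]

R = sum(item_results)   # pupils who answered the item correctly
N = len(item_results)   # pupils who attempted the item
difficulty_index = R / N * 100

print(f"Difficulty (facility) value: {difficulty_index:.0f}")  # 70
# By the rule of thumb above, a value of 70 or higher marks an easy item.
```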

    Estimating Discrimination Index(DI)

    The discriminating power (validity index) of an item refers to the degree to which

    a given item discriminates among students who differ sharply in the functions

    measured by the test as a whole.

    Formula-1

    DI= RU-RL/1/2 N

    RU= Number of correct responses from the upper group.

    RL= Number of correct responses from lower group.

  • 7/30/2019 ADMINISTRATION,Reporting and Scoring 1

    41/55

    41

    N= Total number of pupils who tried them.

    High discriminate value questions are needed for selection purposes.

    Formula 2:

    DI = (No. of HAQ - No. of LAQ) / No. of HAG

    No. of HAQ: number of students in the high-ability group answering the question correctly.
    No. of LAQ: number of students in the low-ability group answering the question correctly.
    No. of HAG: number of students in the high-ability group.

    Positive discrimination: If an item is answered correctly by the superior (upper) group but not by the inferior (lower) group, the item possesses positive discrimination.

    Negative discrimination: If an item is answered correctly by the inferior (lower) group but not by the superior (upper) group, the item possesses negative discrimination.

    Zero discrimination: If an item is answered correctly by the same number of superior and inferior examinees, it cannot discriminate between superior and inferior examinees; the discrimination power of the item is zero.
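    A small sketch of Formula 1, again with invented group counts, shows how the sign of the result maps onto positive, negative and zero discrimination:

```python
def discrimination_index(ru, rl, n):
    """DI = (RU - RL) / (N / 2), N being the total number of pupils in both groups."""
    return (ru - rl) / (n / 2)

# Hypothetical item: 20 students in the upper group, 20 in the lower group.
di = discrimination_index(ru=16, rl=6, n=40)
print(f"DI = {di:.2f}")   # 0.50
# DI > 0: positive discrimination; DI < 0: negative; DI == 0: zero discrimination.
```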


    Item analysis is a general term that refers to the specific methods used in

    education to evaluate test items, typically for the purpose of test

    construction and revision.

    Regarded as one of the most important aspects of test construction and

    increasingly receiving attention, it is an approach incorporated into item

    response theory (IRT), which serves as an alternative to classical

    measurement theory (CMT) or classical test theory (CTT). Classical

    measurement theory considers a score to be the direct result of a person's

    true score plus error.

    It is this error that is of interest as previous measurement theories have

    been unable to specify its source. However, item response theory uses item

    analysis to differentiate between types of error in order to gain a clearer

    understanding of any existing deficiencies.

    Particular attention is given to individual test items, item characteristics,

    probability of answering items correctly, overall ability of the test taker,

    and degrees or levels of knowledge being assessed.

    The Purpose of Item Analysis

    There must be a match between what is taught and what is assessed.

    However, there must also be an effort to test for more complex levels of

    understanding, with care taken to avoid over-sampling items that assess

    only basic levels of knowledge. Tests that are too difficult (and have an insufficient floor) tend to cause

    frustration and deflated scores, whereas tests that are too easy (and

    have an insufficient ceiling) encourage a decline in motivation and lead to

    inflated scores.


    Tests can be improved by maintaining and developing a pool of valid items

    from which future tests can be drawn and that cover a reasonable span of

    difficulty levels.

    Item analysis helps improve test items and identify unfair or biased items.

    Results should be used to refine test item wording. In addition, closer

    examination of items will also reveal which questions were most difficult,

    perhaps indicating a concept that needs to be taught more thoroughly.

    If a particular distracter (that is, an incorrect answer choice) is the most

    often chosen answer, and especially if that distracter positively correlates

    with a high total score, the item must be examined more closely for

    correctness. This situation also provides an opportunity to identify and

    examine common misconceptions among students about a particular

    concept.

    In general, once test items have been created, the value of these items can

    be systematically assessed using several methods representative of item

    analysis:

    a) a test item's level of difficulty,

    b) an item's capacity to discriminate, and c) the item characteristic curve.

    Difficulty is assessed by examining the number of persons correctly

    endorsing the answer. Discrimination can be examined by comparing the

    number of persons getting a particular item correct with the total test score.

    Finally, the item characteristic curve can be used to plot the likelihood of

    answering an item correctly against the level of success on the test as a whole.
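    The item characteristic curve idea can be approximated empirically by computing the proportion of correct answers within bands of total test score. The sketch below assumes simple 0/1 item scoring and uses invented data; a full IRT analysis would fit a model rather than tabulate bands.

```python
def empirical_icc(item_correct, total_scores, n_bands=3):
    """Proportion answering one item correctly within bands of total test score.

    item_correct : 0/1 flags for the item, one per student
    total_scores : total test scores, in the same student order
    """
    paired = sorted(zip(total_scores, item_correct))
    band_size = max(1, len(paired) // n_bands)
    curve = []
    for start in range(0, len(paired), band_size):
        band = paired[start:start + band_size]
        p_correct = sum(flag for _, flag in band) / len(band)
        curve.append((band[0][0], band[-1][0], p_correct))
    return curve  # (lowest score, highest score, proportion correct) per band

# Hypothetical data for 12 students.
item_correct = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
total_scores = [35, 40, 42, 45, 50, 55, 58, 60, 65, 70, 75, 80]
for low, high, p in empirical_icc(item_correct, total_scores):
    print(f"Total score {low}-{high}: {p:.0%} answered the item correctly")
```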

    Using Item Analysis Results

    It helps judge the worth or quality of a test.


    Aids in subsequent test revisions.

    Leads to increased skill in test construction.

    Provides diagnostic value and helps in planning future learning activities.

    Provides a basis for discussing test results.

    Helps in making decisions about the promotion of students to the next higher grade.

    Helps bring about improvement in teaching methods and techniques.

    Item Difficulty

    Perhaps item difficulty should have been named item easiness; it expresses

    the proportion or percentage of students who answered the item correctly. Item

    difficulty can range from 0.0 (none of the students answered the item correctly)

    to 1.0 (all of the students answered the item correctly). Experts recommend that

    the average level of difficulty for a four-option multiple choice test should be

    between 60% and 80%; an average level of difficulty within this range can, of

    course, be obtained even when the difficulty of individual items falls outside of this

    range. If an item has a low difficulty value, say, less than .25, there are several

    possible causes: the item may have been miskeyed; the item may be too

    challenging relative to the overall level of ability of the class; the item may be

    ambiguous or not written clearly; there may be more than one correct answer.

    Further insight into the cause of a low difficulty value can often be gained by

    examining the percentage of students who chose each response option. For

    example, when a high percentage of students chose a single option other than the


    one that is keyed as correct, it is advisable to check whether a mistake was made

    on the answer key.

    Item Statistics

    Item statistics are used to assess the performance of individual test items on the

    assumption that the overall quality of a test derives from the quality of its items.

    Item Number.

    This is the question number taken from the student answer sheet. Up to 150 items

    can be scored on the Standard Answer Sheet (purple).

    Mean and S.D.

    The mean is the "average" student response to an item. It is computed by adding

    up the number of points earned by all students for the item, and dividing that total

    by the number of students.

    The standard deviation, or S.D., is a measure of the dispersion of student scores

    on that item; that is, it indicates how "spread out" the responses were. The item

    standard deviation is most meaningful when comparing items which have more

    than one correct alternative and when scale scoring is used. For this reason it is

    not typically used to evaluate classroom tests.

    Item Difficulty.

    For items with one correct alternative worth a single point, the item difficulty is

    simply the percentage of students who answer an item correctly. In this case, it is

    also equal to the item mean. The item difficulty index ranges from 0 to 100; the


    higher the value, the easier the question. When an alternative is worth other than

    a single point, or when there is more than one correct alternative per question, the

    item difficulty is the average score on that item divided by the highest number of

    points for any one alternative.

    Item difficulty is relevant for determining whether students have learned the

    concept being tested. It also plays an important role in the ability of an item to

    discriminate between students who know the material being tested and those who

    do not. The item will have low discrimination

    if it is so difficult that almost everyone gets it wrong or guesses, or so easy that

    almost everyone gets it right.

    To maximize item discrimination, desirable difficulty levels are slightly higher

    than midway

    between chance and perfect scores for the item. (The chance score for five-option

    questions, for example, is .20 because one-fifth of the students responding to the

    question could be expected to choose the correct option by guessing.) Ideal

    difficulty levels for multiple-choice items in terms of discrimination potential are:

    Format Ideal Difficulty

    Five-response multiple-choice 70

    Four-response multiple-choice 74

    Three-response multiple-choice 77

    True-false (two-response multiple choice) 85

    ScorePak classifies item difficulty as "easy" if the index is 85% or above; "moderate" if it

    is between 51% and 84%; and "hard" if it is 50% or below.
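    That classification rule is straightforward to express in code. A small helper, assuming the difficulty index is already expressed as a percentage:

```python
def classify_difficulty(difficulty_pct):
    """Classify an item difficulty index (0-100) as easy, moderate or hard."""
    if difficulty_pct >= 85:
        return "easy"
    if difficulty_pct >= 51:
        return "moderate"
    return "hard"

for d in (92, 74, 38):                 # hypothetical item difficulty values
    print(f"Difficulty {d}: {classify_difficulty(d)}")
```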


    Item Discrimination

    Item discrimination refers to the ability of an item to differentiate among students

    on the basis of how well they know the material being tested. Various hand

    calculation procedures have traditionally been used to compare item responses to

    total test scores using high and low scoring groups of students. Computerized

    analyses provide more accurate assessment of the discrimination power of items

    because they take into account responses of all students rather than just high and

    low scoring groups.

    The item discrimination index is the correlation between student responses to a particular item and

    total scores on all other items on the test.

    This index is the equivalent of a point-biserial coefficient in this application. It

    provides an

    estimate of the degree to which an individual item is measuring the same thing as

    the rest of the items.

    Because the discrimination index reflects the degree to which an item and the test

    as a whole are measuring a unitary ability or attribute, values of the coefficient

    will tend to be lower for tests measuring a wide range of content areas than for

    more homogeneous tests.

    Item discrimination indices must always be interpreted in the context of the type

    of test which is being analyzed.

    Items with low discrimination indices are often ambiguously worded and should

    be examined.

    Items with negative indices should be examined to determine why a negative

    value was obtained.

    For example, a negative value may indicate that the item was miskeyed, so that

    students who


    knew the material tended to choose an unkeyed, but correct, response option.

    Tests with high internal consistency consist of items with mostly positive

    relationships with total

    test score. In practice, values of the discrimination index will seldom exceed .50

    because of the

    differing shapes of item and total score distributions. ScorePak classifies item discrimination as

    "good" if the index is above .30; "fair" if it is between .10 and .30; and "poor" if

    it is below .10.
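    A point-biserial index of the kind described above can be sketched as the correlation between 0/1 item scores and students' total scores on the remaining items. The data below are invented, and production software would normally use a statistics library rather than this hand-rolled version.

```python
import math

def point_biserial(item_scores, total_scores):
    """Correlation between 0/1 item scores and total score on the remaining items."""
    rest = [t - i for i, t in zip(item_scores, total_scores)]  # remove the item itself
    n = len(item_scores)
    mean_i = sum(item_scores) / n
    mean_r = sum(rest) / n
    cov = sum((i - mean_i) * (r - mean_r) for i, r in zip(item_scores, rest)) / n
    sd_i = math.sqrt(sum((i - mean_i) ** 2 for i in item_scores) / n)
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rest) / n)
    return cov / (sd_i * sd_r)

# Hypothetical data for 8 students: 0/1 scores on one item and total test scores.
item_scores  = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [18, 17, 15, 14, 13, 10, 9, 7]
print(f"Discrimination index: {point_biserial(item_scores, total_scores):.2f}")
```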

    Alternate Weight.

    This column shows the number of points given for each response alternative.

    For most tests, there will be one correct answer which will be given one point,

    but ScorePak

    allows multiple correct alternatives, each of which may be assigned a different

    weight.

    Means.

    The mean total test score (minus that item) is shown for students who selected

    each of

    the possible response alternatives. This information should be looked at in

    conjunction with the

    discrimination index; higher total test scores should be obtained by students

    choosing the correct,

    or most highly weighted alternative. Incorrect alternatives with relatively high

    means should be

    examined to determine why "better" students chose that particular alternative.


    Frequencies and Distribution.

    The number and percentage of students who choose each

    alternative are reported. The bar graph on the right shows the percentage

    choosing each

    response. Frequently chosen wrong alternatives may indicate common

    misconceptions among

    the students.

    Difficulty and Discrimination Distributions

    At the end of the Item Analysis report, test items are listed according to their

    degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor).

    These distributions provide a quick overview of the test, and can be used to

    identify items which are not performing well and which can perhaps be improved

    or discarded.

    Test Statistics

    Two statistics are provided to evaluate the performance of the test as a whole.

    Reliability Coefficient.

    The reliability of a test refers to the extent to which the test is likely to

    produce consistent scores. The particular reliability coefficient reflects three

    characteristics of the test:

    1. The intercorrelations among the items -- the greater the relative number of

    positive relationships, and the stronger those relationships are, the greater the

    reliability. Item discrimination indices and the test's reliability coefficient are

    related in this regard.


    2. The length of the test -- a test with more items will have a higher reliability, all

    other things

    being equal.

    3. The content of the test -- generally, the more diverse the subject matter tested

    and the testing

    techniques used, the lower the reliability.

    Reliability coefficients theoretically range in value from zero (no reliability) to

    1.00 (perfect

    reliability). In practice, their approximate range is from .50 to .90 for about 95%

    of classroom tests.

    High reliability means that the questions of a test tended to "pull together."

    Students who

    answered a given question correctly were more likely to answer other questions

    correctly. If a

    parallel test were developed by using similar items, the relative scores of students

    would show

    little change.

    Low reliability means that the questions tended to be unrelated to each other in

    terms of who

    answered them correctly. The resulting test scores reflect peculiarities of the

    items or the testing situation more than students' knowledge of the subject matter.

    As with many statistics, it is dangerous to interpret the magnitude of a reliability

    coefficient out of context. High reliability should be demanded in situations in


    which a single test score is used to make major decisions, such as professional

    licensure examinations. Because classroom

    examinations are typically combined with other scores to determine grades, the

    standards for a

    single test need not be as stringent. The following general guidelines can be used

    to interpret

    reliability coefficients for classroom exams:

    Reliability      Interpretation

    .90 and above    Excellent reliability; at the level of the best standardized tests.

    .80 - .90        Very good for a classroom test.

    .70 - .80        Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.

    .60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.

    .50 - .60        Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.

    .50 or below     Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.

    The measure of reliability used is coefficient alpha. This is the general form of the more commonly

    reported KR-20 and can be applied to tests composed of items with different

    numbers of points given for different response alternatives. When coefficient

    alpha is applied to tests in which each item has only one correct answer and all


    correct answers are worth the same number of points, the resulting coefficient is

    identical to KR-20.
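    As a sketch of this calculation, coefficient alpha can be computed from a students-by-items score matrix; with dichotomous (0/1) items scored one point each, the result coincides with KR-20. The score matrix below is invented for illustration.

```python
def coefficient_alpha(item_matrix):
    """Cronbach's alpha for a students-by-items matrix of item scores."""
    n_items = len(item_matrix[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    item_variances = [variance([row[j] for row in item_matrix]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in item_matrix])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 0/1 scores for 6 students on 5 items.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(f"Coefficient alpha: {coefficient_alpha(scores):.2f}")   # about .83 here
```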

    Standard Error of Measurement.

    The standard error of measurement is directly related to the reliability of the test.

    It is an index of the amount of variability in an individual student's performance

    due to random measurement error. If it were possible to administer an infinite

    number of parallel tests, a student's score would be expected to change from one

    administration to the next due to a number of factors. For each student, the scores

    would form a "normal" (bell-shaped) distribution. The mean of the distribution is

    assumed to be the student's "true score," and reflects what he or she "really"

    knows about the subject. The standard deviation of the distribution is called the

    standard error of measurement and reflects the amount of change in the student's

    score which could be expected from one test administration to another.

    Whereas the reliability of a test always varies between 0.00 and 1.00, the

    standard error of

    measurement is expressed in the same scale as the test scores. For example,

    multiplying all test

    scores by a constant will multiply the standard error of measurement by that same

    constant, but

    will leave the reliability coefficient unchanged.

    A general rule of thumb to predict the amount of change which can be expected

    in individual test

    scores is to multiply the standard error of measurement by 1.5. Only rarely would

    one expect a

    student's score to increase or decrease by more than that amount between two

    such similar


    tests. The smaller the standard error of measurement, the more accurate the

    measurement

    provided by the test.
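    The relationship between reliability and the standard error of measurement can be sketched with the standard formula SEM = SD * sqrt(1 - reliability), where SD is the standard deviation of the observed test scores; the numbers below are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as the test scores."""
    return sd * math.sqrt(1 - reliability)

sd, reliability = 8.0, 0.84            # hypothetical test statistics
sem = standard_error_of_measurement(sd, reliability)
print(f"SEM = {sem:.1f} points")
print(f"Rule-of-thumb band of change: about +/- {1.5 * sem:.1f} points")
```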

    A CAUTION in Interpreting Item Analysis Results

    Each of the various item statistics provides information which can be used to

    improve individual test items and to increase the quality of the test as a whole.

    Such statistics must always be interpreted in the context of the type of test given

    and the individuals being tested; item analysis data are not synonymous with item validity.

    1. An external criterion is required to accurately judge the validity of test items.

    By using the internal criterion of total test score, item analyses reflect internal

    consistency of items rather than validity.

    2. The discrimination index is not always a measure of item quality. There is a

    variety of reasons an item may have low discriminating power:

    (a) extremely difficult or easy items will have low ability to discriminate but

    such items are often needed to adequately sample course content and objectives;

    (b) An item may show low discrimination if the test measures many different

    content areas and

    cognitive skills. For example, if the majority of the test measures "knowledge of

    facts," then an item assessing "ability to apply principles" may have a low

    correlation with total test score, yet both types of items are needed to measure

    attainment of course objectives.

    3. Item analysis data are tentative. Such data are influenced by the type and

    number of students being tested, instructional procedures employed, and chance

    errors. If repeated use of items is possible, statistics should be recorded for each

    administration of each item.


    Summary

    In the light of the above discussion, we have covered the administration of a test

    and various suggestions for administering it, the importance of test administration,

    and recommendations for improving test scores. We learnt about scoring methods,

    various standard scores, and marking and grading criteria and their types. We

    discussed scoring essay tests and objective tests, and took a detailed look at

    item analysis, item difficulty and their uses.

    Conclusion

    From the above discussion, it can be concluded that proper knowledge of good

    test administration practice and of the various methods of scoring a test helps to

    improve students' performance and teachers' evaluation skills.


    Bibliography

    B. Sankaranarayan (2008), LEARNING AND TEACHING NURSING, 2nd edition, Brainfill Publishers. Pg no. 232-233.

    K. P. Neeraja (2003), TEXTBOOK OF NURSING EDUCATION, 1st edition, Gopson Paper Ltd, Noida. Pg no. 413-425.

    Francis M. Quinn (2000), PRINCIPLE AND PRACTICE OF NURSE EDUCATION, 4th edition, Nelson Thornes Ltd. Pg no. 210-214.

    Marilyn H. Oermann (2009), EVALUATION AND TESTING IN NURSING EDUCATION, 3rd edition, Springer Publisher. Pg no. 122-126.