
    ADMINISTRATION, SCORING AND REPORTING

Introduction

Administering a test is usually one of the simplest phases of the testing process. There are, however, some common problems associated with test administration that may affect test scores. Careful planning can help the teacher avoid or minimize such difficulties. When giving tests it is important that everything possible be done to obtain valid results. Cheating, poor testing conditions, and test anxiety, as well as errors in test scoring procedures, all contribute to invalid test results. Many of these factors can be controlled by practicing good test administration procedures. Practicing these procedures will prove less time consuming and less troublesome than dealing with the problems that result from poor procedures.

After administering a test, the teacher's responsibility is to score it or arrange to have it scored. The teacher then interprets the results and uses these interpretations to make grading, selection, placement or other decisions. To interpret test scores accurately, however, the teacher needs to analyze the performance of the test as a whole and of the individual test items, and to use these data to draw valid inferences about student performance. This information also helps faculty prepare for post-test discussions with students about the exam.


Administering a Test

Test administration plays a vital role in enhancing the reliability of test scores. A test should be administered in a congenial environment, strictly according to the planned instructions, and under uniform conditions for all the people tested.

Suggestions for administering the test:
- Long announcements before or during the test should not be made.
- Instructions should be given in writing.
- The test administrator should not respond to the individual problems of the examinees.

The steps to be followed in the administration of group tests are:
a) Motivate the students to do their best.
b) Follow the directions closely.
c) Keep time accurately.
d) Record any significant events that might influence test scores.
e) Collect the test materials promptly.

The guiding principle in administering an achievement test is that all students must be given a fair chance to demonstrate their achievement of the learning outcomes being measured. This means a physical and psychological environment conducive to their best efforts and the control of factors that might interfere with valid measurement. Students will not perform at their best if they are tense and anxious during testing. They should also be reassured that the time limits are adequate to allow them to complete the test. This, of course, assumes that the test will be used to improve learning and that the time limits are in fact adequate. The things to avoid while administering a test are:
- Do not talk unnecessarily before the test.
- Keep interruptions to a minimum during the test.
- Avoid giving hints to pupils who ask.


Administering Exams

How an exam is administered can affect student performance as much as how the exam was written. Below is a list of general principles to consider when designing and administering examinations.

1. Give complete instructions on how to take the examination, such as how much each section counts and how much time to spend on each section. This helps students to allocate their efforts wisely.
2. State specifically what aids (e.g. calculators, notebooks) students are allowed to use in the examination room.
3. Use assignments and homework to provide preparation for taking the exams. For example, if the assignments consist entirely of essay questions, it would be inappropriate for the examination to consist of 200 multiple-choice questions.
4. Practice taking the completed test yourself. You should expect the students to take about four times the amount of time it takes you to complete the test.
5. For final examinations, structure the test to cover the scope of the entire course. The examination should be comprehensive enough to test adequately the students' learning of the course material. Use a variety of different types of questions on the examination (e.g. multiple-choice, essay, etc.) because some topics are covered more effectively with certain types of questions. Group questions of the same type together when possible.
6. Tell the students what types of questions will be on the test (i.e. essay, multiple-choice, etc.) prior to the examination. Allow students to see past (retired) exams where possible. For essay exams, help students understand how they will be evaluated (if appropriate).
7. Provide students with a list of review questions or topics covered on the exam along with an indication of the relative emphasis on each topic.
8. Give detailed study suggestions.
9. Indicate how much the examination will count toward determining the final grade.


Importance of Test Administration

Consistency
- Standardized tests are designed to be administered under consistent procedures so that the test-taking experience is as similar as possible across examinees.
- This similar experience increases the fairness of the test as well as making examinees' scores more directly comparable.
- Typical guidelines related to test administration locations state that all sites should be comfortable and should have good lighting, ventilation and handicap accessibility.
- Interruptions and distractions, such as excessive noise, should be prevented.
- The time limits that have been established should be adhered to for all test administrations.

Test security
- Test security consists of methods designed to prevent cheating, as well as to protect the test items and content from being exposed to future test-takers.
- Test administration procedures related to test security may begin as early as the registration procedure. Many exam programs restrict examinees from registering for a test unless they meet certain eligibility criteria.
- When examinees arrive at the test site, additional provisions for test security include verifying each examinee's identification and restricting the materials (such as photographic or communication devices) that an examinee is allowed to bring into the test administration.
- If the exam program uses multiple parallel test forms, these may be distributed in a spiral fashion in order to prevent one examinee from being able to copy from another (form A is distributed to the first examinee, form B to the second examinee, form A to the third examinee, etc.).
- The test proctors should also remain attentive throughout the test administration to prevent cheating and other security breaches. When testing is complete, all test-related materials should be carefully collected from the examinees before they depart.


Summary

The use of orderly, standardized test administration procedures is beneficial to examinees. In particular, administration procedures designed to promote consistent conditions for all examinees increase the exam program's fairness. Test administration procedures related to security protect the integrity of the test items. In both cases, the standardization of test administration procedures prevents some examinees from being unfairly advantaged over other examinees.

How many questions should I give?

It is important to allow your students enough time to complete the exam comfortably and reasonably. Inevitably this will mean you must make some choices about which questions you will ask. As a rough guide, allow:

o One minute per objective-type question
o Two minutes for a short answer requiring one sentence
o Five to ten minutes for a longer short answer
o Ten minutes for a problem that would take you two minutes to answer
o Fifteen minutes for a short, focused essay
o Thirty minutes for an essay of more than one to two pages

You should add ten minutes or so to allow for the distribution and collection of the exam. A rough calculation along these lines is sketched below.
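The following Python sketch turns the per-item guidelines above into an overall time estimate. The question counts used here are hypothetical examples, not part of the original guidance.

```python
# Rough exam-length estimate using the per-item times suggested above.
# The planned question counts below are hypothetical; substitute your own.

MINUTES_PER_ITEM = {
    "objective": 1,            # one minute per objective-type question
    "one_sentence_answer": 2,  # two minutes for a one-sentence short answer
    "longer_short_answer": 10, # five to ten minutes (upper bound used here)
    "problem": 10,             # ten minutes per problem
    "short_essay": 15,         # fifteen minutes for a short, focused essay
    "long_essay": 30,          # thirty minutes for an essay of 1-2+ pages
}

planned_items = {"objective": 20, "one_sentence_answer": 5, "short_essay": 2}

working_time = sum(MINUTES_PER_ITEM[kind] * n for kind, n in planned_items.items())
total_time = working_time + 10  # ~10 minutes for distribution and collection

print(f"Estimated exam time: {total_time} minutes")  # 20 + 10 + 30 + 10 = 70
```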

Administering tests

There are several things you should keep in mind to make the experience run as smoothly as possible:
- Have extra copies of the test on hand, in case you have miscounted or in the event of some other problem.


- Minimize interruptions during the exam by reading the directions briefly at the start and refraining from commenting during the exam unless you discover a problem.
- Periodically write the time remaining on the board.
- Be alert for cheating, but do not hover over the students and cause a distraction.

There are also some steps that you can take to reduce the anxiety that students will inevitably feel leading up to and during an exam. Consider the following:
- Have old exams on file in the department office for students to review.
- Give students practice exams prior to the real test.
- Explain, in advance of the test day, the exam format and rules, and explain how this fits with your philosophy of testing.
- Give students tips on how to study for and take the exam; this is not a test of their test-taking ability, but rather of their knowledge, so help them learn to take tests.
- Have extra office hours and a review session before the test.
- Arrive at the exam site early, and be there yourself (rather than sending a proxy) to communicate the importance of the event.

Recommendations for Improving Test Scores:

1) When a test is announced well in advance, do not wait until the day before to begin studying; spaced practice is more effective than massed practice.
2) Ask the instructor for old copies of the examination to practice with.
3) Ask other students what kinds of tests the instructor usually gives.
4) Don't turn study sessions into social occasions; isolated studying is usually more effective.


5) Don't be too comfortable when studying; lying down is a physical cue for your body to sleep.
6) Study for the type of test which was announced.
7) If you do not know the type (style) of test, study for a free-recall exam.
8) Ask yourself questions about the subject material, read for detail, and recite the material just prior to the test.
9) Try to form the material you are studying into test questions.
10) Read the test directions carefully before beginning the exam. Ask the administrator if anything is unclear or some details are not included.
11) If it is an essay test, think about the question and mentally formulate an answer before you begin writing.
12) Pace yourself while taking the test. Do not try to be the first person finished. Allow enough time to review answers at the end of the session.
13) If you can rule out one wrong answer choice, guess, even if there is a penalty for wrong answers.
14) Skip more difficult items and return to them later, particularly if there are a lot of questions.
15) When time permits, review your answers. Don't be overly eager to hand in your test paper before all the available time has elapsed.

    Scoring The Test


The principles of evaluation should be followed in scoring the test; doing so enhances the objectivity and reliability of the test.

Reliability: the degree of accuracy and consistency with which an exam or test measures what it seeks to measure; the degree of consistency among test scores. A test score is called reliable when we have reason to believe it is stable and trustworthy.

Objectivity: a test is objective when the scorer's personal judgment does not affect the scoring. Objectivity eliminates the fixed opinions or judgment of the person who scores the test. It is the extent to which independent and competent examiners agree on what constitutes a good answer for each of the elements of a measuring instrument.

Selection-type items
- Prepare stencils when useful.
- When using a stencil with holes, make sure that students marked only one alternative.
- When a response is wrong, put a red mark through the correct answer.
- Apply a formula for guessing only when a test is speeded.
- Weight all items the same (doing otherwise seldom makes a difference and only confuses scoring).

Supply-type items
- Use your carefully developed rubrics.

Utilizing Rubrics as Assessment Tools

What is a rubric?

A rubric is a scoring and instructional tool used to assess student performance using a task-specific range or set of criteria. It measures student performance against this pre-determined set of criteria for the task and against levels of performance (i.e. from poor to excellent) for each criterion. Most rubrics are designed as a one- or two-page document formatted as a table or grid that outlines the learning criteria for a specific lesson, assignment or project.


Rubrics can be created in a variety of forms and levels of complexity, but they all:
- Focus on measuring a stated objective (performance, behavior, or quality)
- Use a range to rate performance
- Contain specific performance characteristics arranged in levels indicating the degree to which a standard has been met.

Two major types of rubrics:

A holistic rubric involves one global, holistic rating with a single score for an entire product or performance based on an overall impression. These are useful for summative assessment where an overall performance rating is needed, for example, portfolios. A holistic rubric requires the teacher to score the overall process or product as a whole, without judging the component parts separately.

An analytical rubric divides a product or performance into essential traits that are judged separately. Analytical rubrics are usually more useful for day-to-day classroom use since they provide more detailed and precise feedback to the student. With an analytical rubric, the teacher scores the separate, individual parts of the product or performance first, then sums the individual scores to obtain a total score.
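As a simple illustration of analytic scoring, the sketch below sums weighted ratings across criteria. The criteria, weights, rating scale and ratings are hypothetical examples, not taken from any particular rubric.

```python
# Analytic rubric scoring: rate each criterion separately, then sum the
# (optionally weighted) ratings to obtain a total score.
# The criteria, weights and ratings below are hypothetical examples.

rubric_weights = {"organization": 2, "content accuracy": 3, "mechanics": 1}

# Ratings on a 1-4 scale (1 = poor ... 4 = excellent) for one student.
ratings = {"organization": 3, "content accuracy": 4, "mechanics": 2}

total = sum(rubric_weights[c] * ratings[c] for c in rubric_weights)
maximum = sum(w * 4 for w in rubric_weights.values())

print(f"Analytic rubric total: {total}/{maximum}")  # 20 out of 24
```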


Assessing student learning

Rubrics provide instructors with an effective means of learning-centered feedback and evaluation of student work. As instructional tools, rubrics enable students to gauge the strengths and weaknesses of their work and learning. As assessment tools, rubrics enable faculty to provide detailed and informative evaluations of students' work.

Advantages of using rubrics:
- They allow assessment to be more objective and consistent.
- They clarify the instructor's criteria in specific terms.
- They clearly show students how their work will be evaluated and what is expected.
- They promote awareness of the criteria to use when students assess peer performance.
- They provide benchmarks against which to measure progress.
- They reduce the amount of time teachers spend evaluating student work by allowing them to simply circle an item in the rubric.


- They increase students' sense of responsibility for their own work.

Steps of creating rubrics:

1. Define your assignment or project.
   This is the task you are asking your students to perform.
2. Decide on a scale of performance.
   These can be a level for each grade (A-F) or three levels (outstanding, acceptable, not acceptable; Great job, Okay, What happened?). These are listed at the top of the grid.
3. Identify the criteria of the task.
   These are the observable and measurable characteristics of the task. They are listed in the left-hand column. They can be weighted to convey the relative importance of each.
4. Describe the performance for each criterion.
   These descriptors indicate what performance looks like at each level. They offer specific feedback. Use samples of student work to help you determine what quality work looks like.

Suggestions for use:
- Hand out the rubric with the assignment. Return the rubric with the performance descriptors circled.
- Have students develop their own rubrics for a project.
- Have students use the rubric for self-assessment or peer assessment.


Methods of Scoring in Standardized Tests

Different tests use different methods of scoring based on different needs. The following table summarizes the three main categories of test scores:
1. Raw scores
2. Criterion-referenced scores
3. Norm-referenced scores (how most standardized tests are scored)

Raw score
- How the score is determined: by counting the number (or calculating a percentage) of correct responses or points earned.
- Uses: often used in teacher-developed assessment instruments.
- Potential drawbacks: scores may be difficult to interpret without knowledge of how performance relates to either a specific criterion or a norm group.

Criterion-referenced score
- How the score is determined: by comparing performance to one or more criteria or standards for success.
- Uses: useful when determining whether specific instructional objectives have been achieved; also useful when determining whether basic skills that are prerequisites for other tasks have been learned.
- Potential drawbacks: criteria for assessing mastery of complex skills may be difficult to identify.

Age or grade equivalent (norm-referenced)
- How the score is determined: by equating a student's performance to the average performance of students at a particular age or grade level.
- Uses: useful when explaining norm-referenced test performance to people unfamiliar with standard scores.
- Potential drawbacks: scores are frequently misinterpreted, especially by parents; scores may be inappropriately used as a standard that all students must meet; scores are often inapplicable when achievement at the secondary level or higher is being assessed; scores do not give a typical range of performance for students at that age or grade.

Percentile rank (norm-referenced)
- How the score is determined: by determining the percentage of students at the same age or grade level who obtained lower scores.
- Uses: useful when explaining norm-referenced test performance to people unfamiliar with standard scores.
- Potential drawbacks: scores overestimate differences near the mean and underestimate differences at the extremes.

Standard score (norm-referenced)
- How the score is determined: by determining how far the performance is from the mean (for the age or grade level) in standard deviation units.
- Uses: useful when describing a student's standing within the norm group.
- Potential drawbacks: scores are not easily understood by people without some knowledge of statistics.


Standard Score

Definition
- Standard score: how far above or below average a student scored.
- Distance is calculated in standard deviation (SD) units (a standard deviation is a measure of spread or variability).
- The mean and standard deviation are those of a particular norm group.

Standard scores are by far the most complicated of the five types of scores, so they deserve a more in-depth look. When looking at the normal distribution, a line is drawn from the highest point on the curve to the x-axis. This point is the mean score. A standard deviation's worth is counted out on each side of the mean and those points are marked. Another standard deviation is counted out and two more points are marked. When the normal distribution is divided up this way, you will always get the same percentage of students scoring in each part. About 68% will score within one standard deviation of the mean (34% in each direction). As you move further from the mean, fewer and fewer students will perform at these scores. A standard score simply tells us where a student scores in relation to this normal distribution, in standard deviation units.

Advantages

Standard scores are based on the normal curve, which means that:
1. Scores are distributed symmetrically around the mean (average).
2. Each SD represents a fixed (but different) percentage of cases.
3. Almost everyone is included between -3.0 and +3.0 SDs of the mean.
4. The SD allows conversion of very different kinds of raw scores to a common scale that has (a) equal units and (b) can be readily interpreted in terms of the normal curve.
5. When we can assume that scores follow a normal curve (classroom tests usually don't, but standardized tests do), we can translate standard scores into percentiles, which is very useful.
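To illustrate point 5, the sketch below converts a standard (z) score into a percentile rank using the standard normal cumulative distribution function (computed here with Python's math.erf); the example z-values are arbitrary.

```python
import math

def z_to_percentile(z: float) -> float:
    """Percentile rank of a z-score under the normal curve,
    using the standard normal cumulative distribution function."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A student one SD above the mean is at roughly the 84th percentile.
print(round(z_to_percentile(1.0)))   # ~84
print(round(z_to_percentile(0.0)))   # 50
print(round(z_to_percentile(-2.0)))  # ~2
```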


Types of Standard Score

All standard scores:
- Share a common logic
- Can be translated into each other

Z-score
- The simplest standard score, and the one on which all others are based.
- Formula: z = (X - M)/SD, where X is the person's score, M is the group's average, and SD is the group's spread (the standard deviation of the scores).
- Z is negative for scores that are below average, so z-scores are usually converted into some other system that has all positive numbers.

T-score
- First a z-score is computed. Then a T-score with a mean of 50 and a standard deviation of 10 is applied. T-scores are whole numbers and are never negative.
- Normally distributed standard scores with M = 50, SD = 10.
- Can be obtained from z-scores: T = 50 + 10(z).
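A minimal sketch of the two formulas above, z = (X - M)/SD and T = 50 + 10z; the raw scores used here are a hypothetical class set.

```python
# z- and T-scores from the formulas above: z = (X - M) / SD and T = 50 + 10z.
# The raw scores below are a hypothetical class set.

raw_scores = [62, 70, 75, 81, 92]

mean = sum(raw_scores) / len(raw_scores)
sd = (sum((x - mean) ** 2 for x in raw_scores) / len(raw_scores)) ** 0.5

for x in raw_scores:
    z = (x - mean) / sd   # negative for below-average scores
    t = 50 + 10 * z       # mean 50, SD 10; usually reported as a whole number
    print(f"raw={x:3d}  z={z:+.2f}  T={round(t)}")
```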

Normalized Standard Scores
- Start with scores that you want to make conform to the normal curve.
- Get percentile ranks for each score.
- Transform the percentiles into z-scores using a conversion table (I handed one out in class).
- Then transform into any other standard score you want (e.g., T-score, IQ equivalents).
- Hope that your assumption was right, namely, that the scores really do naturally follow a normal curve. If they don't, your interpretations (say, of equal units) may be somewhat mistaken.


Stanines
- A very simple type of normalized standard score.
- Ranges from 1-9 (the "standard nines").
- Each stanine from 2-8 covers half an SD.
- Stanine 5 = percentiles 40-59 (the middle 20 percent).
- A difference of 2 stanines usually signals a real difference.

Strengths
1. Easily explained to students and parents
2. Normalized, so different tests can be compared
3. Stanines can be added to get a composite score
4. Easily recorded (only one column)

Limitations
1. Like all standard scores, cannot record growth
2. Crude, but this prevents over-interpretation
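A small sketch of converting a percentile rank to a stanine. The section above only places stanine 5 at roughly the middle 20 percent of the distribution; the full set of cut-points used below follows the conventional stanine boundaries and should be treated as illustrative.

```python
# Percentile-to-stanine conversion. The boundaries below are the conventional
# stanine cut-points (assumed here; the text above only specifies stanine 5
# as roughly percentiles 40-59, the middle of the distribution).

STANINE_UPPER_PERCENTILES = [4, 11, 23, 40, 60, 77, 89, 96, 100]

def percentile_to_stanine(percentile: float) -> int:
    for stanine, upper in enumerate(STANINE_UPPER_PERCENTILES, start=1):
        if percentile <= upper:
            return stanine
    return 9

print(percentile_to_stanine(50))  # 5 (middle of the distribution)
print(percentile_to_stanine(95))  # 8
print(percentile_to_stanine(3))   # 1
```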

IQ Scores

Tests that measure intelligence have a mean of 100 and (for the most part) a standard deviation of 15. Most people will score between 85 and 115. Someone who scores below 70 is typically considered to have an intellectual disability.

Normal-Curve Equivalents (NCE)
- Normally distributed standard scores.
- M = 50, SD = 21.06.
- Results in scores that go from 1-99.
- Like percentiles, except that they have equal units (this means that they make fewer distinctions in the middle of the curve and more at the extremes).


Standard Age Scores (SAS)
- Normally distributed standard scores.
- Put into an IQ metric, where M = 100 and SD = 15 (Wechsler IQ Test) or SD = 16 (Stanford-Binet IQ Test).

Converting among Standard Scores

Easy convertibility:
- All are different ways of saying the same thing.
- All represent equal units at different ranges of scores.
- All can be averaged (among themselves).
- One can easily be converted into another. Figure 19.2 on p. 494 shows how they line up with each other.
- But they are interpretable only when scores are actually normally distributed (standardized tests usually are).
- Downside: not as easily understood by students and parents as percentiles are.

Using Standard Scores to Examine Profiles

Uses:
- You can compare a student's scores on different tests and subtests when you convert all the scores to the same type of standard score, but all the tests must use the same norm group.
- Plotting profiles can show a student's relative strengths and weaknesses.
- Profiles should be plotted as confidence bands to illustrate the margin of error; interpret scores as different only when their bands do not overlap.
- Profiles are sometimes plotted separately by male and female (say, on vocational interest tests), but this is a controversial practice.


- Tests sometimes come with tabular or narrative reports of profiles.

Using Standard Scores to Examine Mastery of Skill Types
- Some standardized tests try to provide criterion-referenced information by providing scores on specific sets of skills (see Figure 19.4 on p. 498).
- Be very cautious with these; use them as clues only, because each skill area typically has very few items.

Cautions in Interpreting Standardized Test Scores

Scores should be interpreted:
1. With clear knowledge about what the test measures. Don't rely on titles; examine the content (breadth, etc.).
2. In light of other factors (aptitudes, educational experiences, cultural background, health, motivation, etc.) that may have affected test performance.
3. According to the type of decision being made (high or low for what?).
4. As a band of scores rather than a specific value. Always subtract and add 1 SEM from the score to get a range, to avoid over-interpretation (a small worked sketch follows this list).
5. In light of all your evidence. Look for corroborating or conflicting evidence.
6. Never rely on a single score to make a big decision.
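A tiny sketch of the score-band rule in point 4; the observed score and SEM values are hypothetical.

```python
# Caution 4 above: report a band of scores rather than a single value by
# subtracting and adding one standard error of measurement (SEM).
# The observed score and SEM below are hypothetical.

observed_score = 85
sem = 3

low, high = observed_score - sem, observed_score + sem
print(f"Interpret the score as roughly {low}-{high}, not exactly {observed_score}.")
# -> Interpret the score as roughly 82-88, not exactly 85.
```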


Marking Versus Grading

Giving marks and grades in response to students' work is part of a teacher's routine work. Marking refers to assigning marks or points to students' performance against a marking scheme set for a test or an assignment. More often than not, marking and scoring are regarded as part of the normal practice of grading.

Brookhart (2004) defines grading as scoring or rating individual assignments. Grading attaches meaning to the score, telling us whether the expectations have been exceeded, met or not met.

In relation to marking and grading of assessments, the University of Greenwich makes the following helpful points:
1. Assessment is a matter of judgment, not simply computation.
2. Marks and grades are not absolute values, but symbols used by examiners to communicate their judgment of a student's work.
3. Marks and grades provide data for decisions about students' fulfillment of learning outcomes.

Marking and Grading Criteria:

Higher education institutions normally use an institution-wide grading scale for undergraduate programmes, whereas postgraduate programmes tend to be graded on a pass/fail basis or a pass/fail/distinction basis. Grading scales tend to incorporate both percentage grades and letter grades, the latter meaning letters such as A, B, C, etc. The grading scale used in the University of Greenwich is shown below:

Mark on 0-100 scale   Comments
70+                   Work of exceptional quality
60-69                 Work of very good quality
50-59                 Work of good quality
40-49                 Work of satisfactory standard
30-39                 Compensatable fail
0-29                  Fail
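A minimal sketch of mapping a 0-100 mark onto the descriptive bands in the table above; the band boundaries come straight from the table, while the sample marks are hypothetical.

```python
# Map a 0-100 mark to the descriptive bands in the table above.

BANDS = [
    (70, "Work of exceptional quality"),
    (60, "Work of very good quality"),
    (50, "Work of good quality"),
    (40, "Work of satisfactory standard"),
    (30, "Compensatable fail"),
    (0,  "Fail"),
]

def band_for_mark(mark: int) -> str:
    for lower_bound, comment in BANDS:
        if mark >= lower_bound:
            return comment
    return "Fail"

print(band_for_mark(65))  # Work of very good quality
print(band_for_mark(28))  # Fail
```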

Undergraduate grading scales are likely to be similar in other higher education institutions. It is interesting to compare this scale with the percentage equivalents for the classes of honours degree:

0-30%         Fail
35-39%        Pass degree
40-49%        Third class honours
50-59%        Lower second class honours
60-69%        Upper second class honours
70% or more   First class honours

Marks or grades are assigned to students' essays to indicate the degree of achievement they have attained, and there are two systems for assigning grades.

Absolute grading gives the student marks for her essay answer depending on how well the essay has met the assessment criteria, and is usually expressed as a percentage or letter, e.g. 60% or B.

Relative grading tells the student how his essay answer rated in relation to other students doing the same test, by indicating whether he was average, above average, or below average. Relative grading usually uses a literal scale such as A, B, C, D and F. Some teachers would argue that two grades are the best way of marking, so that students are given either a pass or fail grade.


This gets over the problem of deciding what constitutes an A or a C grade, but it does reduce the information conveyed by a particular grade, since no discrimination is made between students who pass with a very high level of achievement and those who barely pass at all.

Common Methods of Grading

a. Letter grades: there is great flexibility in the number of grades that can be adopted, i.e. from 3 to 11. However, 3-point scales may not differentiate well between students of different abilities, while 11-point scales make too fine distinctions and can introduce arbitrariness. The most common scales are 7-point and 5-point.

Example of a 7-point grading scale:
O - Outstanding
A - Very good
B - Good
C - Average
D - Below average
E - Poor
F - Very poor

Example of a 5-point grading scale:
A+ - Excellent
A - Good
B - Average
C - Satisfactory
D - Fail

Strengths


- Easy to use.
- Easy to interpret theoretically.
- Provide a concise summary.

Limitations
- Meaning of grades may vary widely.
- Do not describe the strengths and weaknesses of students.

b. Number/percentage grades: (5, 3, 2, 1, 0) or (98%, 80%, 60%, etc.)
These are the same as letter grades; the only difference is that numbers or percentages are used instead of letters.

Strengths
- Easy to use.
- Easy to interpret theoretically.
- Provide a concise summary.
- May be combined with letter grades.
- More continuous than letter grades.

Limitations
- Meaning of grades may vary widely.
- Do not describe the strengths and weaknesses of students.
- Meaning may need to be explained or interpreted.

c. Two-category grades (pass/fail): good for courses that require mastery of learning.


Limitations
- Less reliable.
- Does not contain enough information about students' achievement.
- Provides no indication of the level of learning.

d. Checklists and rating scales: these are more detailed, and because they are so detailed they are cumbersome for teachers to prepare.

Strengths
- Present detailed lists of students' achievements.
- Can be combined with letter grades.
- Good for clinical evaluation.

Limitations
- May become too detailed to comprehend easily.
- Difficult for record keeping.

Uses of grading

1. Describe unambiguously the worth, merit or value of the work accomplished. Grades are intended to communicate the achievement of students.
2. Grades motivate students to learn.
3. Provide information to students for self-evaluation and for analysis of strengths and weaknesses.
4. Grades communicate performance levels to others.
5. Grades help in selecting people for rewards.


6. Grades communicate the teacher's judgment of the student's progress.

Analytical method of marking (marking scheme)

When using absolute grading against specific criteria, it is useful to use the analytic method of marking. In this method, a marking scheme is prepared in advance and marks are allocated to the specific points of content in the marking specification. It is often difficult to decide how many marks should be given to a particular aspect, but the relative importance of each should be reflected in the allocation. This method has the advantage that it can be more reliable, provided the marker is conscientious, and it will bring to light any errors in the writing of the question before the test is administered.

Global method of marking (structured impressionistic marking)

The global method is also termed structured impressionistic marking, and is best used with relative grading. This method still requires a marking specification, but in this case it serves only as a standard of comparison. The grades used are not usually percentages but a scale, such as excellent/good/average/below average/unsatisfactory; scales can be devised according to preference, but it is important to select examples of answers that serve as standards for each of the points on the scale. The teacher then reads each answer through very quickly and puts it in the appropriate pile, depending on whether it gives the impression of excellent, good, etc. The process is then repeated, and it is much more effective if a colleague is asked to do the second reading. This method is much faster than the analytical one and can be quite effective for large numbers of questions.

Uses of marking


Marking has two distinct stakeholders, the students and the tutor. Both should use marking as a means of raising achievement and attainment.

From the tutor's perspective, marking should:
- Check student understanding.
- Direct future lesson planning and teaching.
- Monitor progress through the collection of marks.
- Help to assess student progress and attainment.
- Set work at appropriate levels.
- Provide clear objectives about what and how you teach.
- Inform students and parents formatively and summatively.

From the students' perspective, marking should and could help them:
- Identify carelessness.
- Proof-read, i.e. by making them check their work for spelling, punctuation, etc.
- Draft work: students can become actively involved in improving their own work.
- Identify areas of weakness and strength.
- Identify areas that lack understanding and knowledge.
- Become more motivated and attach value to their work.


Scoring Essay Questions

- Prepare an outline of the expected answer in advance.
- Use the scoring method which is most appropriate.
  o Point method: each answer is compared to the ideal answer in the scoring key and a given number of points is assigned according to the adequacy of the answer.
  o Rating method: where the rating method is used, it is desirable to make separate ratings for each characteristic evaluated. That is, answers should be rated separately for organization, comprehensiveness, relevance of ideas, and the like.
- Decide on provision for handling factors which are irrelevant to the learning outcomes being measured.
  o Legibility of handwriting, spelling, sentence structure, punctuation and neatness: special efforts should be made to keep such factors from influencing our judgment.
- Evaluate all answers to one question before going on to the next question.
  o The halo effect is less likely to form when the answers for a given pupil are not evaluated in continuous sequence.
- Evaluate the answers without looking at pupils' names.
- If especially important decisions are to be based on the results, obtain two or more independent ratings.

Methods in scoring essay tests
- It is critical that the teacher prepare, in advance, a detailed ideal answer.
- Student papers should be scored anonymously, and all answers to a given item should be scored one at a time, rather than grading each paper as a whole separately.

Distractors in scoring essay tests
- Handwriting style
- Grammar
- Knowledge of the students


- Neatness

Two ways of scoring an essay test
1. Holistic scoring: in this type, a total score is assigned to each essay item based on the teacher's general impression or overall assessment.
2. Analytic scoring: in this type, the essay is scored in terms of each component.

Disadvantages in scoring essay tests

Carryover effect:
The carryover effect occurs when the teacher develops an impression of the quality of the answer from one item and carries it over to the next response. If the student answers one item well, the teacher may be influenced to score subsequent responses at a similarly high level; the same situation may occur with a poor response.

Halo effect:
There may be a tendency when evaluating essay items to be influenced by a general impression of the student or feelings about the student, either positive or negative, that creates a halo effect when judging the quality of the answers. For instance, the teacher may hold favorable opinions about the student from class or clinical practice and believe that this learner has made significant improvement in the course, which in turn might influence the scoring of the responses.

Scoring Guidelines

These are the descriptions of the scoring criteria that the trained readers will follow to determine the score (1-6) for your essay. Papers at each level exhibit all or most of the characteristics described at each score point.

Score = 6


Essays within this score range demonstrate effective skill in responding to the task.

The essay shows a clear understanding of the task. The essay takes a position on the issue and may offer a critical context for discussion. The essay addresses complexity by examining different perspectives on the issue, or by evaluating the implications and/or complications of the issue, or by fully responding to counterarguments to the writer's position. Development of ideas is ample, specific, and logical. Most ideas are fully elaborated. A clear focus on the specific issue in the prompt is maintained. The organization of the essay is clear: the organization may be somewhat predictable or it may grow from the writer's purpose. Ideas are logically sequenced. Most transitions reflect the writer's logic and are usually integrated into the essay. The introduction and conclusion are effective, clear, and well developed. The essay shows a good command of language. Sentences are varied and word choice is varied and precise. There are few, if any, errors to distract the reader.

    Score = 5

Essays within this score range demonstrate competent skill in responding to the task.

The essay shows a clear understanding of the task. The essay takes a position on the issue and may offer a broad context for discussion. The essay shows recognition of complexity by partially evaluating the implications and/or complications of the issue, or by responding to counterarguments to the writer's position. Development of ideas is specific and logical. Most ideas are elaborated, with clear movement between general statements and specific reasons, examples, and details. Focus on the specific issue in the prompt is maintained. The organization of the essay is clear, although it may be predictable. Ideas are logically sequenced, although simple and obvious transitions may be used. The introduction and conclusion are clear and generally well developed. Language is competent. Sentences are somewhat varied and word choice is sometimes varied and precise. There may be a few errors, but they are rarely distracting.

    Score = 4

Essays within this score range demonstrate adequate skill in responding to the task.

The essay shows an understanding of the task. The essay takes a position on the issue and may offer some context for discussion. The essay may show some recognition of complexity by providing some response to counterarguments to the writer's position. Development of ideas is adequate, with some movement between general statements and specific reasons, examples, and details. Focus on the specific issue in the prompt is maintained throughout most of the essay. The organization of the essay is apparent but predictable. Some evidence of logical sequencing of ideas is apparent, although most transitions are simple and obvious. The introduction and conclusion are clear and somewhat developed. Language is adequate, with some sentence variety and appropriate word choice. There may be some distracting errors, but they do not impede understanding.

    Score = 3

Essays within this score range demonstrate some developing skill in responding to the task.

The essay shows some understanding of the task. The essay takes a position on the issue but does not offer a context for discussion. The essay may acknowledge a counterargument to the writer's position, but its development is brief or unclear. Development of ideas is limited and may be repetitious, with little, if any, movement between general statements and specific reasons, examples, and details. Focus on the general topic is maintained, but focus on the specific issue in the prompt may not be maintained. The organization of the essay is simple. Ideas are logically grouped within parts of the essay, but there is little or no evidence of logical sequencing of ideas. Transitions, if used, are simple and obvious. An introduction and conclusion are clearly discernible but underdeveloped. Language shows a basic control. Sentences show a little variety and word choice is appropriate. Errors may be distracting and may occasionally impede understanding.

    Score = 2

Essays within this score range demonstrate inconsistent or weak skill in responding to the task.

The essay shows a weak understanding of the task. The essay may not take a position on the issue, or the essay may take a position but fail to convey reasons to support that position, or the essay may take a position but fail to maintain a stance. There is little or no recognition of a counterargument to the writer's position. The essay is thinly developed. If examples are given, they are general and may not be clearly relevant. The essay may include extensive repetition of the writer's ideas or of ideas in the prompt. Focus on the general topic is maintained, but focus on the specific issue in the prompt may not be maintained. There is some indication of an organizational structure, and some logical grouping of ideas within parts of the essay is apparent. Transitions, if used, are simple and obvious, and they may be inappropriate or misleading. An introduction and conclusion are discernible but minimal. Sentence structure and word choice are usually simple. Errors may be frequently distracting and may sometimes impede understanding.

    Score = 1

Essays within this score range show little or no skill in responding to the task.

The essay shows little or no understanding of the task. If the essay takes a position, it fails to convey reasons to support that position. The essay is minimally developed. The essay may include excessive repetition of the writer's ideas or of ideas in the prompt. Focus on the general topic is usually maintained, but focus on the specific issue in the prompt may not be maintained. There is little or no evidence of an organizational structure or of the logical grouping of ideas. Transitions are rarely used. If present, an introduction and conclusion are minimal. Sentence structure and word choice are simple. Errors may be frequently distracting and may significantly impede understanding.

    No Score

    Blank, Off-Topic, Illegible, Not in English, or Void.

Guidelines for scoring an essay test so as to avoid subjectivity
- Decide what factors constitute a good answer before administering an essay question.
- Explain these factors in the item itself.
- Read all the answers to a single essay question before reading other questions.
- Reread essay answers a second time after the initial scoring.


Scoring Objective Items

The following are methods of scoring objective items:
- Scoring key
- Strip key
- Scoring stencil

Scoring key: if the pupils' answers are recorded on the test paper itself, a scoring key is usually obtained by marking the correct answers on a blank copy of the test. The scoring procedure is then simply a matter of comparing the columns of answers on this master copy with the columns of answers on each pupil's paper.

Strip key: a strip key, which consists merely of a strip of paper on which the columns of answers are recorded, may also be used.

Scoring stencil: where separate answer sheets are used, a scoring stencil is most convenient. This is a blank answer sheet with holes punched where the correct answers should appear.

One of the most important advantages of objective-type tests is ease and accuracy of scoring. The best way to score objective tests is with a test scanner. This technology can speed up scoring and minimize scoring errors. When using a test scanner, a scoring key is prepared on a machine-scorable answer sheet and is read by the scanner first. After the scanner reads the scoring key, the student responses are read and stored on the hard disk of an attached computer. A separate program is used to score the student responses by comparing each response to the correct answer on the answer key. When this process is complete, each student's score, along with item analysis information, is printed.


Item Analysis

The procedure used to judge the quality of an item is called item analysis. Item analysis is a post-administration examination of a test. The quality of a test depends upon the individual items of the test, so it is usually desirable to evaluate the effectiveness of the items. Item analysis provides information concerning how well each item in the test functions and tells us about the quality of an item. One primary goal of item analysis is to help improve the test by revising or discarding ineffective items. Another important function is to ascertain what test takers do and do not know.

Item analysis describes the statistical analyses which allow measurement of the effectiveness of individual test items. An understanding of the factors which govern effectiveness (and a means of measuring them) can enable us to create more effective test questions and also regulate and standardize existing tests. Item analysis helps to find out how difficult a test item is. Similarly, it also helps to show how well an item discriminates between high and low scorers on the test. Item analysis further helps to detect specific technical flaws and thus provides further information for improving test items.

To ascertain whether the questions/items do their job effectively, a detailed test and item analysis has to be done before a meaningful and scientific inference about the test can be made in terms of its validity, reliability, objectivity and usability.

A systematic analysis aims at finding the performance of a group:


- The central tendency of the marks obtained by the group, e.g. normal/average, with positive or negative skewness, high or low values.
- The variability, characterized by the standard deviation (SD), indicates the nature of the spread of marks; the greater the spread, the greater the value of the standard deviation.
- The coefficient of reliability for the test indicates the degree of consistency with which the test has measured the students' abilities. A high value means that the test is reliable and produces virtually repeatable scores for the students.
- Item analysis is useful in making meaningful interpretations and value judgments about students' performance.
- A teacher or paper setter comes to know whether the items had the right level of difficulty and whether there was discrimination between more able and less able students.
- Item analysis defines and maintains standards of performance and ensures comparability of standards. It also helps us:
  o to understand the behavior of items,
  o to become better item writers and more scientific, professional and competent teachers.

Item analysis is a process of examining class-wide performance on individual test items. There are three common types of item analysis, which provide teachers with three different types of information:

Difficulty index - Teachers produce a difficulty index for a test item by calculating the proportion of students in the class who got the item correct. (The name of this index is counter-intuitive, as one actually gets a measure of how easy the item is, not of its difficulty.) The larger the proportion, the more students have learned the content measured by the item.

Discrimination index - The discrimination index is a basic measure of the validity of an item. It is a measure of an item's ability to discriminate between those who scored high on the total test and those who scored low. Though there are several steps in its calculation, once computed, this index can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an item. Perhaps the most crucial validity standard for a test item is that whether a student got the item correct or not is due to their level of knowledge or ability and not due to something else such as chance or test bias.

Analysis of response options - In addition to examining the performance of an entire test item, teachers are often interested in examining the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many students. To eliminate blind guessing, which results in a correct answer purely by chance (and hurts the validity of a test item), teachers want as many plausible distractors as is feasible. Analyses of response options allow teachers to fine-tune and improve items they may wish to use again with future classes.
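A minimal sketch of an analysis of response options for one multiple-choice item: count how often each option was chosen and flag the keyed answer. The responses and the correct option are hypothetical.

```python
# Analysis of response options for a single multiple-choice item: the
# proportion of the class choosing each option shows which distractors
# are "working". The responses below are hypothetical.

from collections import Counter

responses = ["A", "C", "C", "B", "C", "D", "C", "A", "C", "B"]  # 10 students
correct_option = "C"

counts = Counter(responses)
for option in ["A", "B", "C", "D"]:
    proportion = counts.get(option, 0) / len(responses)
    marker = " (correct)" if option == correct_option else ""
    print(f"Option {option}: {proportion:.0%}{marker}")
# A distractor chosen by almost no one may need to be revised or replaced.
```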


Steps involved in Item Analysis

- For each item, count the number of students in each group who answered the item correctly. For alternate-response items, count the number of students in each group who chose each alternative.
- Award a score to each student. A practical, simple and rapid method is to perforate on your answer sheet the boxes corresponding to the correct answers (A, B, C, D); placing the perforated sheet over the student's answer sheet, the raw score can be found almost automatically.
- Rank the papers in order of merit and identify the high and low groups: arrange the answer sheets from the highest score to the lowest score, then make two groups, i.e., the highest scores in one group and the lowest scores in the other group (or top and bottom halves).

Calculation of the difficulty index of a question

For each item, the percentage of students who get the item correct is called the item difficulty index.

1. D = R/N * 100
   R: number of pupils who answered the item correctly.
   N: total number of pupils who attempted the item.

The higher the difficulty index, the easier the item. The difficulty level (facility level) of a test is an index of how easy or difficult the test is as a whole; it is the ratio of the average score of a sample of subjects on the test to the maximum possible score on the test, and is usually expressed as a percentage.

2. Difficulty level = (average score on the test / maximum possible score) * 100

3. Difficulty index = (H + L)/N * 100
   H: number of correct answers in the high group.
   L: number of correct answers in the low group.
   N: total number of students in both groups.

4. Find out the facility value of objective tests first.

5. Facility value = (number of students answering the question correctly / number of students who have taken the test) * 100

If the facility value is 70 or above, those are easy questions; if it is below 70, the questions are difficult ones.
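A small sketch of formula 1 above (D = R/N * 100) together with the 70-percent rule of thumb; the item results are hypothetical.

```python
# Difficulty (facility) index as in formula 1 above: D = R / N * 100.
# The results below are hypothetical: True means the student got the item right.

item_results = [True, True, False, True, True, False, True, True, True, False]

R = sum(item_results)   # pupils who answered the item correctly
N = len(item_results)   # pupils who attempted the item
difficulty_index = R / N * 100

print(f"Difficulty (facility) value: {difficulty_index:.0f}")  # 70
# By the rule of thumb above, a value of 70 or higher marks an easy item.
```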

    Estimating Discrimination Index(DI)

    The discriminating power (validity index) of an item refers to the degree to which

    a given item discriminates among students who differ sharply in the functions

    measured by the test as a whole.

    Formula-1

    DI= RU-RL/1/2 N

    RU= Number of correct responses from the upper group.

    RL= Number of correct responses from lower group.

  • 7/30/2019 ADMINISTRATION,Reporting and Scoring 1

    41/55

    41

    N= Total number of pupils who tried them.

    High discriminate value questions are needed for selection purposes.

    Formula 2:

    DI = (No. of HAQ - No. of LAQ) / No. of HAG

    No. of HAQ: number of students in the high-ability group answering the question correctly.
    No. of LAQ: number of students in the low-ability group answering the question correctly.
    No. of HAG: number of students in the high-ability group.

    Positive discrimination: If an item is answered correctly by the superior (upper) group but not by the inferior (lower) group, the item possesses positive discrimination.

    Negative discrimination: If an item is answered correctly by the inferior (lower) group but not by the superior (upper) group, the item possesses negative discrimination.

    Zero discrimination: If an item is answered correctly by the same number of superior and inferior examinees, it cannot discriminate between superior and inferior examinees; the discrimination power of the item is zero.
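    A small sketch of Formula 1, again with invented group counts, shows how the sign of the result maps onto positive, negative and zero discrimination:

```python
def discrimination_index(ru, rl, n):
    """DI = (RU - RL) / (N / 2), N being the total number of pupils in both groups."""
    return (ru - rl) / (n / 2)

# Hypothetical item: 20 students in the upper group, 20 in the lower group.
di = discrimination_index(ru=16, rl=6, n=40)
print(f"DI = {di:.2f}")   # 0.50
# DI > 0: positive discrimination; DI < 0: negative; DI == 0: zero discrimination.
```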


    Item analysis is a general term that refers to the specific methods used in

    education to evaluate test items, typically for the purpose of test

    construction and revision.

    Regarded as one of the most important aspects of test construction and

    increasingly receiving attention, it is an approach incorporated into item

    response theory (IRT), which serves as an alternative to classical

    measurement theory (CMT) or classical test theory (CTT). Classical

    measurement theory considers a score to be the direct result of a person's

    true score plus error.

    It is this error that is of interest as previous measurement theories have

    been unable to specify its source. However, item response theory uses item

    analysis to differentiate between types of error in order to gain a clearer

    understanding of any existing deficiencies.

    Particular attention is given to individual test items, item characteristics,

    probability of answering items correctly, overall ability of the test taker,

    and degrees or levels of knowledge being assessed.

    The Purpose of Item Analysis

    There must be a match between what is taught and what is assessed.

    However, there must also be an effort to test for more complex levels of

    understanding, with care taken to avoid over-sampling items that assess

    only basic levels of knowledge. Tests that are too difficult (and have an insufficient floor) tend to cause

    frustration and deflated scores, whereas tests that are too easy (and

    have an insufficient ceiling) encourage a decline in motivation and lead to

    inflated scores.


    Tests can be improved by maintaining and developing a pool of valid items

    from which future tests can be drawn and that cover a reasonable span of

    difficulty levels.

    Item analysis helps improve test items and identify unfair or biased items.

    Results should be used to refine test item wording. In addition, closer

    examination of items will also reveal which questions were most difficult,

    perhaps indicating a concept that needs to be taught more thoroughly.

    If a particular distracter (that is, an incorrect answer choice) is the most

    often chosen answer, and especially if that distracter positively correlates

    with a high total score, the item must be examined more closely for

    correctness. This situation also provides an opportunity to identify and

    examine common misconceptions among students about a particular

    concept.

    In general, once test items have been created, the value of these items can

    be systematically assessed using several methods representative of item

    analysis:

    a) a test item's level of difficulty,

    b) an item's capacity to discriminate, and c) the item characteristic curve.

    Difficulty is assessed by examining the number of persons correctly

    endorsing the answer. Discrimination can be examined by comparing the

    number of persons getting a particular item correct with the total test score.

    Finally, the item characteristic curve can be used to plot the likelihood of

    answering an item correctly against the level of success on the test as a whole.
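    The item characteristic curve idea can be approximated empirically by computing the proportion of correct answers within bands of total test score. The sketch below assumes simple 0/1 item scoring and uses invented data; a full IRT analysis would fit a model rather than tabulate bands.

```python
def empirical_icc(item_correct, total_scores, n_bands=3):
    """Proportion answering one item correctly within bands of total test score.

    item_correct : 0/1 flags for the item, one per student
    total_scores : total test scores, in the same student order
    """
    paired = sorted(zip(total_scores, item_correct))
    band_size = max(1, len(paired) // n_bands)
    curve = []
    for start in range(0, len(paired), band_size):
        band = paired[start:start + band_size]
        p_correct = sum(flag for _, flag in band) / len(band)
        curve.append((band[0][0], band[-1][0], p_correct))
    return curve  # (lowest score, highest score, proportion correct) per band

# Hypothetical data for 12 students.
item_correct = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
total_scores = [35, 40, 42, 45, 50, 55, 58, 60, 65, 70, 75, 80]
for low, high, p in empirical_icc(item_correct, total_scores):
    print(f"Total score {low}-{high}: {p:.0%} answered the item correctly")
```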

    Using Item Analysis Results

    It helps judge the worth or quality of a test.


    Aids in subsequent test revisions.

    Leads to increased skill in test construction.

    Provides diagnostic value and helps in planning future learning activities.

    Provides a basis for discussing test results.

    Helps in making decisions about the promotion of students to the next higher grade.

    Helps bring about improvement in teaching methods and techniques.

    Item Difficulty

    Perhaps item difficulty should have been named item easiness; it expresses

    the proportion or percentage of students who answered the item correctly. Item

    difficulty can range from 0.0 (none of the students answered the item correctly)

    to 1.0 (all of the students answered the item correctly). Experts recommend that

    the average level of difficulty for a four-option multiple choice test should be

    between 60% and 80%; an average level of difficulty within this range can, of

    course, be obtained even when the difficulty of individual items falls outside of this

    range. If an item has a low difficulty value, say, less than .25, there are several

    possible causes: the item may have been miskeyed; the item may be too

    challenging relative to the overall level of ability of the class; the item may be

    ambiguous or not written clearly; there may be more than one correct answer.

    Further insight into the cause of a low difficulty value can often be gained by

    examining the percentage of students who chose each response option. For

    example, when a high percentage of students chose a single option other than the


    one that is keyed as correct, it is advisable to check whether a mistake was made

    on the answer key.

    Item Statistics

    Item statistics are used to assess the performance of individual test items on the

    assumption that the overall quality of a test derives from the quality of its items.

    Item Number.

    This is the question number taken from the student answer sheet. Up to 150 items

    can be scored on the Standard Answer Sheet (purple).

    Mean and S.D.

    The mean is the "average" student response to an item. It is computed by adding

    up the number of points earned by all students for the item, and dividing that total

    by the number of students.

    The standard deviation, or S.D., is a measure of the dispersion of student scores

    on that item; that is, it indicates how "spread out" the responses were. The item

    standard deviation is most meaningful when comparing items which have more

    than one correct alternative and when scale scoring is used. For this reason it is

    not typically used to evaluate classroom tests.

    Item Difficulty.

    For items with one correct alternative worth a single point, the item difficulty is

    simply the percentage of students who answer an item correctly. In this case, it is

    also equal to the item mean. The item difficulty index ranges from 0 to 100; the


    higher the value, the easier the question. When an alternative is worth other than

    a single point, or when there is more than one correct alternative per question, the

    item difficulty is the average score on that item divided by the highest number of

    points for any one alternative.

    Item difficulty is relevant for determining whether students have learned the

    concept being tested. It also plays an important role in the ability of an item to

    discriminate between students who know the material being tested and those who

    do not. The item will have low discrimination

    if it is so difficult that almost everyone gets it wrong or guesses, or so easy that

    almost everyone gets it right.

    To maximize item discrimination, desirable difficulty levels are slightly higher

    than midway

    between chance and perfect scores for the item. (The chance score for five-option

    questions, for example, is .20 because one-fifth of the students responding to the

    question could be expected to choose the correct option by guessing.) Ideal

    difficulty levels for multiple-choice items in terms of discrimination potential are:

    Format Ideal Difficulty

    Five-response multiple-choice 70

    Four-response multiple-choice 74

    Three-response multiple-choice 77

    True-false (two-response multiple choice) 85

    ScorePak classifies item difficulty as "easy" if the index is 85% or above; "moderate" if it

    is between 51% and 84%; and "hard" if it is 50% or below.
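    That classification rule is straightforward to express in code. A small helper, assuming the difficulty index is already expressed as a percentage:

```python
def classify_difficulty(difficulty_pct):
    """Classify an item difficulty index (0-100) as easy, moderate or hard."""
    if difficulty_pct >= 85:
        return "easy"
    if difficulty_pct >= 51:
        return "moderate"
    return "hard"

for d in (92, 74, 38):                 # hypothetical item difficulty values
    print(f"Difficulty {d}: {classify_difficulty(d)}")
```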


    Item Discrimination

    Item discrimination refers to the ability of an item to differentiate among students

    on the basis of how well they know the material being tested. Various hand

    calculation procedures have traditionally been used to compare item responses to

    total test scores using high and low scoring groups of students. Computerized

    analyses provide more accurate assessment of the discrimination power of items

    because they take into account responses of all students rather than just high and

    low scoring groups.

    The item discrimination index is the correlation between student responses to a particular item and

    total scores on all other items on the test.

    This index is the equivalent of a point-biserial coefficient in this application. It

    provides an

    estimate of the degree to which an individual item is measuring the same thing as

    the rest of the items.

    Because the discrimination index reflects the degree to which an item and the test

    as a whole are measuring a unitary ability or attribute, values of the coefficient

    will tend to be lower for tests measuring a wide range of content areas than for

    more homogeneous tests.

    Item discrimination indices must always be interpreted in the context of the type

    of test which is being analyzed.

    Items with low discrimination indices are often ambiguously worded and should

    be examined.

    Items with negative indices should be examined to determine why a negative

    value was obtained.

    For example, a negative value may indicate that the item was miskeyed, so that

    students who


    knew the material tended to choose an unkeyed, but correct, response option.

    Tests with high internal consistency consist of items with mostly positive

    relationships with total

    test score. In practice, values of the discrimination index will seldom exceed .50

    because of the

    differing shapes of item and total score distributions. ScorePak classifies item discrimination as

    "good" if the index is above .30; "fair" if it is between .10 and .30; and "poor" if

    it is below .10.
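    A point-biserial index of the kind described above can be sketched as the correlation between 0/1 item scores and students' total scores on the remaining items. The data below are invented, and production software would normally use a statistics library rather than this hand-rolled version.

```python
import math

def point_biserial(item_scores, total_scores):
    """Correlation between 0/1 item scores and total score on the remaining items."""
    rest = [t - i for i, t in zip(item_scores, total_scores)]  # remove the item itself
    n = len(item_scores)
    mean_i = sum(item_scores) / n
    mean_r = sum(rest) / n
    cov = sum((i - mean_i) * (r - mean_r) for i, r in zip(item_scores, rest)) / n
    sd_i = math.sqrt(sum((i - mean_i) ** 2 for i in item_scores) / n)
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rest) / n)
    return cov / (sd_i * sd_r)

# Hypothetical data for 8 students: 0/1 scores on one item and total test scores.
item_scores  = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [18, 17, 15, 14, 13, 10, 9, 7]
print(f"Discrimination index: {point_biserial(item_scores, total_scores):.2f}")
```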

    Alternate Weight.

    This column shows the number of points given for each response alternative.

    For most tests, there will be one correct answer which will be given one point,

    but ScorePak

    allows multiple correct alternatives, each of which may be assigned a different

    weight.

    Means.

    The mean total test score (minus that item) is shown for students who selected

    each of

    the possible response alternatives. This information should be looked at in

    conjunction with the

    discrimination index; higher total test scores should be obtained by students

    choosing the correct,

    or most highly weighted alternative. Incorrect alternatives with relatively high

    means should be

    examined to determine why "better" students chose that particular alternative.


    Frequencies and Distribution.

    The number and percentage of students who choose each

    alternative are reported. The bar graph on the right shows the percentage

    choosing each

    response. Frequently chosen wrong alternatives may indicate common

    misconceptions among

    the students.

    Difficulty and Discrimination Distributions

    At the end of the Item Analysis report, test items are listed according to their

    degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor).

    These distributions provide a quick overview of the test, and can be used to

    identify items which are not performing well and which can perhaps be improved

    or discarded.

    Test Statistics

    Two statistics are provided to evaluate the performance of the test as a whole.

    Reliability Coefficient.

    The reliability of a test refers to the extent to which the test is likely to

    produce consistent scores. The particular reliability coefficient reflects three

    characteristics of the test:

    1. The intercorrelations among the items -- the greater the relative number of

    positive relationships, and the stronger those relationships are, the greater the

    reliability. Item discrimination indices and the test's reliability coefficient are

    related in this regard.


    2. The length of the test -- a test with more items will have a higher reliability, all

    other things

    being equal.

    3. The content of the test -- generally, the more diverse the subject matter tested

    and the testing

    techniques used, the lower the reliability.

    Reliability coefficients theoretically range in value from zero (no reliability) to

    1.00 (perfect

    reliability). In practice, their approximate range is from .50 to .90 for about 95%

    of classroom tests.

    High reliability means that the questions of a test tended to "pull together."

    Students who

    answered a given question correctly were more likely to answer other questions

    correctly. If a

    parallel test were developed by using similar items, the relative scores of students

    would show

    little change.

    Low reliability means that the questions tended to be unrelated to each other in

    terms of who

    answered them correctly. The resulting test scores reflect peculiarities of the

    items or the testing situation more than students' knowledge of the subject matter.

    As with many statistics, it is dangerous to interpret the magnitude of a reliability

    coefficient out of context. High reliability should be demanded in situations in


    which a single test score is used to make major decisions, such as professional

    licensure examinations. Because classroom

    examinations are typically combined with other scores to determine grades, the

    standards for a

    single test need not be as stringent. The following general guidelines can be used

    to interpret

    reliability coefficients for classroom exams:

    Reliability      Interpretation

    .90 and above    Excellent reliability; at the level of the best standardized tests.

    .80 - .90        Very good for a classroom test.

    .70 - .80        Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.

    .60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.

    .50 - .60        Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.

    .50 or below     Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.

    The measure of reliability used is coefficient alpha. This is the general form of the more commonly

    reported KR-20 and can be applied to tests composed of items with different

    numbers of points given for different response alternatives. When coefficient

    alpha is applied to tests in which each item has only one correct answer and all


    correct answers are worth the same number of points, the resulting coefficient is

    identical to KR-20.
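    As a sketch of this calculation, coefficient alpha can be computed from a students-by-items score matrix; with dichotomous (0/1) items scored one point each, the result coincides with KR-20. The score matrix below is invented for illustration.

```python
def coefficient_alpha(item_matrix):
    """Cronbach's alpha for a students-by-items matrix of item scores."""
    n_items = len(item_matrix[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    item_variances = [variance([row[j] for row in item_matrix]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in item_matrix])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 0/1 scores for 6 students on 5 items.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(f"Coefficient alpha: {coefficient_alpha(scores):.2f}")   # about .83 here
```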

    Standard Error of Measurement.

    The standard error of measurement is directly related to the reliability of the test.

    It is an index of the amount of variability in an individual student's performance

    due to random measurement error. If it were possible to administer an infinite

    number of parallel tests, a student's score would be expected to change from one

    administration to the next due to a number of factors. For each student, the scores

    would form a "normal" (bell-shaped) distribution. The mean of the distribution is

    assumed to be the student's "true score," and reflects what he or she "really"

    knows about the subject. The standard deviation of the distribution is called the

    standard error of measurement and reflects the amount of change in the student's

    score which could be expected from one test administration to another.

    Whereas the reliability of a test always varies between 0.00 and 1.00, the

    standard error of

    measurement is expressed in the same scale as the test scores. For example,

    multiplying all test

    scores by a constant will multiply the standard error of measurement by that same

    constant, but

    will leave the reliability coefficient unchanged.

    A general rule of thumb to predict the amount of change which can be expected

    in individual test

    scores is to multiply the standard error of measurement by 1.5. Only rarely would

    one expect a

    student's score to increase or decrease by more than that amount between two

    such similar


    tests. The smaller the standard error of measurement, the more accurate the

    measurement

    provided by the test.
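    The relationship between reliability and the standard error of measurement can be sketched with the standard formula SEM = SD * sqrt(1 - reliability), where SD is the standard deviation of the observed test scores; the numbers below are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as the test scores."""
    return sd * math.sqrt(1 - reliability)

sd, reliability = 8.0, 0.84            # hypothetical test statistics
sem = standard_error_of_measurement(sd, reliability)
print(f"SEM = {sem:.1f} points")
print(f"Rule-of-thumb band of change: about +/- {1.5 * sem:.1f} points")
```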

    A CAUTION in Interpreting Item Analysis Results

    Each of the various item statistics provides information which can be used to

    improve individual test items and to increase the quality of the test as a whole.

    Such statistics must always be interpreted in the context of the type of test given

    and the individuals being tested; item analysis data are not synonymous with item validity.

    1. An external criterion is required to accurately judge the validity of test items.

    By using the internal criterion of total test score, item analyses reflect internal

    consistency of items rather than validity.

    2. The discrimination index is not always a measure of item quality. There is a

    variety of reasons an item may have low discriminating power:

    (a) extremely difficult or easy items will have low ability to discriminate but

    such items are often needed to adequately sample course content and objectives;

    (b) An item may show low discrimination if the test measures many different

    content areas and

    cognitive skills. For example, if the majority of the test measures "knowledge of

    facts," then an item assessing "ability to apply principles" may have a low

    correlation with total test score, yet both types of items are needed to measure

    attainment of course objectives.

    3. Item analysis data are tentative. Such data are influenced by the type and

    number of students being tested, instructional procedures employed, and chance

    errors. If repeated use of items is possible, statistics should be recorded for each

    administration of each item.


    Summary

    In the light of the above discussion, we have covered the administration of a test

    and various suggestions for administering it, the importance of test administration,

    and recommendations for improving test scores. We learnt about scoring methods,

    various standard scores, and marking and grading criteria and their types. We

    discussed scoring essay tests and objective tests, and took a detailed look at

    item analysis, item difficulty and their uses.

    Conclusion

    From the above discussion, it can be concluded that proper knowledge of good

    test administration practice and of the various methods of scoring a test helps to

    improve students' performance and teachers' evaluation skills.


    Bibliography

    B. Sankaranarayan (2008), LEARNING AND TEACHING NURSING, 2nd edition, Brainfill Publishers. Pg no. 232-233.

    K. P. Neeraja (2003), TEXTBOOK OF NURSING EDUCATION, 1st edition, Gopson Paper Ltd, Noida. Pg no. 413-425.

    Francis M. Quinn (2000), PRINCIPLE AND PRACTICE OF NURSE EDUCATION, 4th edition, Nelson Thornes Ltd. Pg no. 210-214.

    Marilyn H. Oermann (2009), EVALUATION AND TESTING IN NURSING EDUCATION, 3rd edition, Springer Publisher. Pg no. 122-126.