    discuss in detail shortly), ESTIMATES the RANGE of scores within which an individual's true score or true level of ability lies.

    If Student A gets a 75 on a test, we can only hope that A's TRUE SCORE (her actual level of ability) is somewhere around 75.

    The closer the reliability of the test is to perfect (r = 1.00), the more likely it is that the true score is very close to 75.

    If your obtained scores do not always reflect your true ability (if they underestimate or overestimate it), then they are associated with some error.

    In other words, your OBTAINED SCORE has a TRUE SCORE component (actual level of ability, skill, knowledge) and an ERROR component (which acts to raise or lower the obtained score).

    Obtained score = True score +/- error score

    The Standard Error of Measurement (Sm or SEM)

    The standard error of measurement is the STANDARD DEVIATION of the ERROR scores of a test.
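To make the definition concrete, here is a small Python simulation, assuming the classical model (obtained score = true score + error score) with normally distributed errors. One hypothetical person with a true score of 75 takes 100,000 equivalent tests whose error scores have an SD of 5; the values 75, 5, and the random seed are illustrative assumptions, not figures from these notes. Because the simulation generates the error scores itself, it can compute their SD directly, which is never possible with a real test.

```python
import random
import statistics

# Hypothetical values chosen for illustration only.
TRUE_SCORE = 75   # the person's true score (unknowable in real testing)
ERROR_SD = 5      # SD of the error scores, i.e. the SEM we expect to recover

rng = random.Random(42)  # fixed seed so the run is reproducible

# Obtained score = true score + error score, with errors centred on 0.
errors = [rng.gauss(0.0, ERROR_SD) for _ in range(100_000)]
obtained = [TRUE_SCORE + e for e in errors]

# Across many equivalent tests, obtained scores centre on the true score ...
print(f"mean obtained score: {statistics.fmean(obtained):.2f}")
# ... and the SD of the error scores is the standard error of measurement.
print(f"SD of error scores (SEM): {statistics.pstdev(errors):.2f}")
```

Both printed values land very close to 75 and 5, respectively.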

    Although we can never know the error scores, we can ESTIMATE the standard error of measurement by using the following formula, where r is the reliability of the test and SD is the standard deviation of the test's obtained scores:

    $S_m = SD\sqrt{1 - r}$
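The formula translates directly into code. A minimal Python sketch; the function name `standard_error_of_measurement` is a hypothetical helper, not part of any library, and the range checks are an added safeguard rather than something the notes require.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Estimate the SEM as SD * sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if sd < 0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1.0 - reliability)

# SD = 15 and r = .97, the figures used in the WAIS example.
print(round(standard_error_of_measurement(15, 0.97), 2))  # 2.6
```

With the WAIS figures (SD = 15, r = .97) the formula gives about 2.6, slightly larger than the SEM of 2.5 quoted in the WAIS example.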

    Using the Standard Error of Measurement

    The distribution of error scores approximates the normal distribution.
    We can extend this information to construct a band around any obtained score to identify the range of scores that, at a certain level of confidence, will capture or span an individual's true score.

    The SEM can be used to provide the following:

    To estimate the value of a person's true score. In other words, we can use it to predict what would happen if a person took additional equivalent tests.

    68% of the scores would fall within ±1 SEM of the true score.

    95% of the scores would fall within ±2 SEM of the true score.

    99.7% of the scores would fall within ±3 SEM of the true score.

    Thus, if a person achieved a score of 80 on a math test, and the SEM for that test was 5, then we could state the following:

    68% of the scores would fall between ____ and ____

    95% of the scores would fall between ____ and ____

    99.7% of the scores would fall between ____ and ____
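Filling in these blanks is simple arithmetic: add and subtract 1, 2, or 3 SEMs from the obtained score. A minimal Python sketch; `score_bands` is a hypothetical helper, not part of any library.

```python
def score_bands(obtained: float, sem: float, widths=(1, 2, 3)) -> dict:
    """Return {k: (low, high)} for bands of +/- k SEM around an obtained score."""
    return {k: (obtained - k * sem, obtained + k * sem) for k in widths}

# Obtained score of 80 with an SEM of 5, as in the math-test example.
for k, (low, high) in score_bands(80, 5).items():
    print(f"+/-{k} SEM: {low:g} to {high:g}")
# +/-1 SEM: 75 to 85
# +/-2 SEM: 70 to 90
# +/-3 SEM: 65 to 95
```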

    When applied to the prediction of future test performance, these ranges are known as CONFIDENCE INTERVALS.

    That is, we can be:

    68% confident that the true score lies within ±1 SEM of the obtained score.

    95% confident that the true score lies within ±2 SEM of the obtained score.

    99.7% confident that the true score lies within ±3 SEM of the obtained score.

    Confidence Intervals

    Finally, the SEM can be used to determine whether a score is significantly different from a particular criterion, such as a cutoff score.

    If a person received a score of 105 on the WAIS, which has an SD of 15, a reliability of .97, and an SEM of 2.5, how confident can we be that repeated testing would not place this person in the gifted range (130 or above)?

    68% confident that the true score lies between ____ and ____

    95% confident that the true score lies between ____ and ____

    99.7% confident that the true score lies between ____ and ____
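The WAIS question can be worked the same way: build the three bands around 105, then measure how far the gifted cutoff sits from the obtained score in SEM units. A minimal Python sketch using the example's figures as given (SEM = 2.5); `confidence_bands` is a hypothetical helper name.

```python
# Figures from the WAIS example.
OBTAINED = 105
SEM = 2.5
GIFTED_CUTOFF = 130

def confidence_bands(obtained: float, sem: float) -> dict:
    """Map each conventional confidence level to a (low, high) band of +/- k SEM."""
    levels = ((1, "68%"), (2, "95%"), (3, "99.7%"))
    return {level: (obtained - k * sem, obtained + k * sem) for k, level in levels}

bands = confidence_bands(OBTAINED, SEM)
for level, (low, high) in bands.items():
    print(f"{level} confident the true score lies between {low:g} and {high:g}")

# How far is the cutoff from the obtained score, measured in SEMs?
sems_to_cutoff = (GIFTED_CUTOFF - OBTAINED) / SEM
print(f"{GIFTED_CUTOFF} is {sems_to_cutoff:g} SEMs above {OBTAINED}")
print("Cutoff lies above the 99.7% band:", GIFTED_CUTOFF > bands["99.7%"][1])
```

Because 130 lies 10 SEMs above the obtained score, well outside even the ±3 SEM band, we can be more than 99.7% confident that this person's true score is below the gifted cutoff.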

    In conclusion, the SEM is a statistic that estimates for us just how fallible, or error-prone, tests are.


    In education, we have long had a tendency to OVERINTERPRET small differences in test scores because we too often consider obtained scores to be completely accurate.

    Reliability and the SEM

    If a test is perfectly reliable (r = 1.00), then a student will always get exactly the same score; there will be no error, and the SEM will be 0.

    If the test is not reliable (r close to 0), the SEM will be almost as big as the SD.

    The SD is the variability of raw scores; the SEM is the variability of error scores.
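Both extremes follow from the SEM formula, Sm = SD × √(1 − r): at r = 1 the square root is 0, and at r = 0 the SEM equals the SD. A short Python sketch tabulating the SEM for a hypothetical SD of 15 across several reliabilities (`sem_by_reliability` is a hypothetical helper name):

```python
import math

def sem_by_reliability(sd: float, reliabilities) -> dict:
    """SEM = SD * sqrt(1 - r) for each reliability value r."""
    return {r: sd * math.sqrt(1.0 - r) for r in reliabilities}

# Hypothetical SD of 15 (the same SD as the WAIS example).
for r, sem in sem_by_reliability(15, (0.0, 0.5, 0.8, 0.9, 0.97, 1.0)).items():
    print(f"r = {r:.2f}: SEM = {sem:.2f}")
```

The SEM falls from 15 (equal to the SD) at r = 0 to 0 at r = 1, shrinking steadily as reliability rises.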

    Sources of Error

    Error Within Test-Takers (Intra-Individual Error)

    These include any within-student factors that would result in obtained scores being lower or higher than true scores.

    Error Within the Test

    This error arises within the test itself and can include: trick questions; a reading level that is too high; ambiguous questions; grammatical cues in the items; items that are too easy or too difficult; and poorly written items.

    Error in Test Administration

    This error includes the following:

    Physical comfort

    Instructions and explanations - Different test administrators provide different amounts of instruction and explanation to test takers.

    Test administrator attitudes - Administrators differ in the notions they convey about the importance of the test, the extent to which they are emotionally supportive of students, and the way in which they monitor the test.


    Error in Scoring

    Computer scoring has decreased this source of error.

    However, teachers and administrators can still make mistakes on answer keys, students may not use #2 pencils or may make stray marks, and hand scoring can introduce error.

    Sources of Error Influencing Various Reliability Coefficients
    Test-Retest Reliability

    If test-retest coefficients are determined over a short time, the effects of within-student error should be small.

    What about sources of:

    within-test error?

    error in administration?

    error in scoring?


    Alternate-forms reliability

    Since this form of reliability is determined by administering two different forms of the test to the same group close together in time, the effects of within-student error should be small.


    Internal consistency

    With this type of reliability, neither within-student nor within-test sources of error will exert an influence, since only one test is given one time. The same goes for administration and scoring error.

    Band Interpretation

    John's scores on an end-of-year achievement test

    Sub-test          Score
    Reading            103
    Listening          104
    Writing            105
    Social Studies      98
    Science            100
    Math                91


    How large a difference do we need between test scores to conclude that the differences represent real and not chance differences?

    We can use the SEM to answer this question, using a technique called band interpretation.

    1. First, determine the SEM for each sub-test.

    2. Add the SEM to, and subtract it from, each sub-test score.

  • 8/14/2019 Error Organizer


    3. Graph each scale: shade in the bands to represent the range of scores that has a 68% (or 95%) chance of capturing John's true score.

    4. Interpret the bands: interpret the profile of bands by visually inspecting the bars to see which bands overlap and which do not.
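The steps above can be sketched in Python, with an explicit overlap test standing in for the visual inspection of step 4 (graphing, step 3, is skipped). The notes do not give the sub-test SEMs here; this sketch ASSUMES a hypothetical SEM of 3 for every sub-test, a value that reproduces the conclusions stated for John at both the 68% and 95% levels. It also treats bands that merely touch (for example, 97 to 109 and 85 to 97) as overlapping, a design choice the notes do not specify. The helper names are hypothetical.

```python
from itertools import combinations

# John's sub-test scores from the table above.
SCORES = {
    "Reading": 103, "Listening": 104, "Writing": 105,
    "Social Studies": 98, "Science": 100, "Math": 91,
}
# ASSUMPTION: hypothetical SEM of 3 for every sub-test (not given in the notes).
SEM = 3

def bands(scores: dict, sem: float, k: int) -> dict:
    """Steps 1-2: the band of +/- k SEM around each sub-test score."""
    return {name: (s - k * sem, s + k * sem) for name, s in scores.items()}

def overlaps(a: tuple, b: tuple) -> bool:
    """True if two (low, high) bands share at least one point (touching counts)."""
    return a[0] <= b[1] and b[0] <= a[1]

def real_differences(scores: dict, sem: float, k: int) -> set:
    """Step 4: pairs of sub-tests whose bands do not overlap."""
    b = bands(scores, sem, k)
    return {
        tuple(sorted((x, y)))
        for x, y in combinations(scores, 2)
        if not overlaps(b[x], b[y])
    }

for k, level in ((1, "68%"), (2, "95%")):
    print(level, sorted(real_differences(SCORES, SEM, k)))
```

At the 68% level this flags Math against every other sub-test plus Social Studies against Writing; at the 95% level only Math against Listening and Writing remain.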


    Using the 68% bands, those bands that overlap probably represent differences that occurred by chance.

    In John's case, the differences between math and each of the other sub-tests, and between Social Studies and Writing, represent real differences (because their bands do not overlap).


    What happens if we take a more conservative approach by using the 95% band?

    Because the bands are wider with the 95% approach, the only real differences we find at the 95% level are between John's math achievement and his achievement in listening and writing.


    All the other bands overlap, suggesting that at the 95% level the differences in obtained scores are due to chance.

    If we employ the more conservative approach, we would conclude that even though the difference between John's obtained reading and math scores is 12 points (103 - 91 = 12), the difference is due to chance, not to a real difference in achievement.
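Why a 12-point gap can still be chance: two bands of ±k Sm around obtained scores X1 and X2 (with the same Sm for both) overlap exactly when the scores differ by no more than 2kSm. The notes do not state the sub-test SEM; the sketch below ASSUMES a hypothetical Sm = 3 for every sub-test and counts touching bands as overlapping.

```latex
% Bands of \pm k S_m around X_1 and X_2 (same S_m for both)
[X_1 - kS_m,\; X_1 + kS_m] \cap [X_2 - kS_m,\; X_2 + kS_m] \neq \varnothing
  \iff |X_1 - X_2| \le 2kS_m

% 95% bands (k = 2), hypothetical S_m = 3:
|X_1 - X_2| \le 2 \cdot 2 \cdot 3 = 12

% Reading - Math: |103 - 91| = 12 \le 12, so the bands touch and count as overlapping.
```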


    To keep it simple, let differences at the 68% level be a signal to you, and let differences at the 95% level be a signal to the school and to parents.

    Also, use the 95% approach when determining real differences between a student's potential for achievement (aptitude) and actual achievement.