Microsoft Word - Report on Descriptive Statistics and Item Analysis

Report on Descriptive Statistics and Item Analysis of

Objective Test Items


Report on Descriptive Statistics and Item Analysis of Objective Test Items on

Data Extracted From the Grade 12 Final English Second Language Exam

2008.

by Stephan Freysen

Prof. T Kuhn CIA 722

7 April 2008


Acknowledgements

I would like to express my sincere gratitude to the Gauteng Department of Education for their professional and cooperative assistance. The data-gathering for this report would have been far more gruelling had it not been for the selfless assistance that Mr. Y Zafir and Ms. L Bongani provided.

I would also like to thank Prof. Knoetze for tabulating the test data, and Prof. Kuhn for setting up a template with formulas, which has been a great help.


Descriptive Abstract

This report is written so that judgement can be passed on the reliability of the multiple-choice test in the Grade 12 English second language final exam.


Table of contents

Acknowledgements
Descriptive abstract
List of Tables
List of Figures
Terminology list
1. Introduction and purpose
2. Test analysis
2.1 Descriptive Test Analysis
2.2 Graphic Representation
2.3 Reliability Coefficient
3. Item analysis
3.1 Difficulty Index
3.2 Discrimination Index
4. Conclusion
Bibliography
Appendix A: Test Data


List of tables

Table 2.1: Tabulated Test Scores

Table 2.2: Measure of Central Tendency

Table 2.3: Frequency Distribution

Table 2.4: Test scores with pq values

Table 3.1: Item Difficulty Indices

Table 3.2: Item Discrimination Indices


List of Figures

Figure 2.1: Histogram of Frequency

Figure 2.2: Polygon of Frequency

Figure 2.3: Ogive of Frequency

Figure 4.1: Percentage of acceptability


Terminology List

Descriptive Statistics: The term used to refer to the mode, median and mean.

Difficulty Index: “Proportion of students who answered the item correctly.” Borich & Kubiszyn (2007: 205)

Discrimination Index: “Measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not do well on the overall test.” Borich & Kubiszyn (2007: 205)

Mean: The average of a set of numbers.

Median: The score that splits the distribution in half.

Mode: The score that appears most frequently in a set of scores.

Quantitative Item Analysis: “A numerical method for analyzing test items employing student response alternatives or options.”

Reliability: Refers to the internal consistency of a test.

Standard Deviation: “The estimate of variability that accompanies the mean in describing a distribution.”



1. Introduction

Objective test items are a very popular tool for testing knowledge, and one of the most popular objective item types is the multiple-choice format. According to Borich & Kubiszyn (2007: 116), multiple-choice items are unique in that they allow knowledge to be measured at higher levels of Bloom’s taxonomy than other objective test items. This presents a problem, as assessors often do not follow any academic guidelines when setting these questions. As a result, the items differ vastly from one another in difficulty and often present unrealistic discrimination indices.

The purpose of this report is to analyse the multiple-choice test item data that was extracted from the final English second language grammar exam of 2008. This will be achieved through analysis of the measures of central tendency and variability of the data. The first part of the analysis considers the question (test) as a whole. The second part consists of individual item analysis.

The data comprises the answers given by twenty-five learners to twenty questions. This is a small sample group, but it should provide enough information about the multiple-choice section of the exam to offer a detailed overview of the test’s reliability. The findings in the report will be used to determine whether the multiple-choice test items in the exam were of adequate and fair difficulty.


2. Test Analysis.

2.1 Descriptive Test Analysis.

In quantitative analysis, the first step is to tabulate the raw test scores. According to Borich & Kubiszyn (2007: 204), this type of analysis is ideal for multiple-choice tests.

Table 2.1 lists the test scores sorted in ascending order.

Table 2.1: Tabulated Test Scores

Learner Percentage of items correct

L19 15

L1 30

L17 35

L24 40

L7 45

L15 50

L22 50

L6 55

L21 55

L10 60

L18 65

L4 65

L8 65

L9 65

L23 70

L5 70

L12 75

L13 85


L14 85

L20 85

L25 85

L2 90

L3 90

L11 100

L16 100

Table 2.1 shows the lower, middle and higher scores. Given the 40% pass mark, only three students failed this test, while eight students obtained a distinction.

The measures of central tendency for the test scores in table 2.1 are shown in table 2.2.

Table 2.2: Measure of Central Tendency.

Mean Median Mode Standard Deviation

65.2 65 65, 85 21.7

The scores 65% and 85% occur equally often (four times each), so the distribution is bimodal. Eleven of the 25 scores lie above the mean. The next step is to group the scores in table 2.1 into intervals in order to determine a simple frequency distribution. Table 2.3 shows the intervals, their lower and upper limits, the mid values, the frequency and the cumulative frequency.
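The measures of central tendency in table 2.2 can be checked with a short script. The following is only a minimal sketch: it assumes the scores are the 25 percentages listed in table 2.1 and that the standard deviation reported there is the population form (dividing by the number of scores), which is an assumption and not something the report states.

```python
from statistics import mean, median, multimode, pstdev

# Percentage scores from table 2.1, in ascending order.
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60, 65, 65, 65, 65,
          70, 70, 75, 85, 85, 85, 85, 90, 90, 100, 100]

summary = {
    "mean": mean(scores),        # arithmetic average
    "median": median(scores),    # middle (13th) score
    "modes": multimode(scores),  # every score tied for most frequent
    "sd": pstdev(scores),        # population standard deviation (assumed)
}
print(summary)
```

Using `multimode` rather than `mode` matters here because more than one score can share the highest frequency.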


Table 2.3: Frequency Distribution

Interval   Lower limit   Upper limit   Mid value   Frequency   Cumulative frequency
15-22      15            22            18.5        1           1
23-30      23            30            26.5        1           2
31-38      31            38            34.5        1           3
39-46      39            46            42.5        2           5
47-54      47            54            50.5        2           7
55-62      55            62            58.5        3           10
63-70      63            70            66.5        6           16
71-78      71            78            74.5        1           17
79-86      79            86            82.5        4           21
87-94      87            94            90.5        2           23
95-102     95            102           98.5        2           25

The learner scores grouped here are those listed in table 2.1.
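The grouping behind table 2.3 can be sketched as follows. This is an illustrative reconstruction, not the author's spreadsheet template: it assumes intervals eight percentage points wide starting at 15, as the table's limits suggest, and the names `WIDTH`, `START` and `rows` are invented for this example.

```python
# Percentage scores from table 2.1.
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60, 65, 65, 65, 65,
          70, 70, 75, 85, 85, 85, 85, 90, 90, 100, 100]

WIDTH = 8   # assumed interval width (15-22, 23-30, ...)
START = 15  # assumed lower limit of the first interval

# Count how many scores fall into each interval.
n_bins = (max(scores) - START) // WIDTH + 1
freq = [0] * n_bins
for s in scores:
    freq[(s - START) // WIDTH] += 1

# Build rows of (lower, upper, mid value, frequency, cumulative frequency).
rows = []
running = 0
for i, f in enumerate(freq):
    lower = START + i * WIDTH
    upper = lower + WIDTH - 1
    running += f
    rows.append((lower, upper, (lower + upper) / 2, f, running))

for row in rows:
    print(row)
```

The last cumulative frequency should equal the number of learners, which is a quick check that no score fell outside the intervals.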

2.2 Graphic Representation

In Figure 2.1, we can see that one learner scored between 20% and 26%. Three learners scored between 34% and 47%. As the cut-off for passing is 40%, this graph shows that between one and three of these students passed. Two more learners scored between 41% and 47% and another two learners scored between 48% and 54%. Another three learners scored between 55% and 62%. Six learners scored between 63% and 70%. One learner scored between 71% and 78%. Four learners scored between 79% and 86% and four learners scored above that. Between four and eight of the learners achieved distinctions. If we consider table 2.3 once again, we can see that although the graph is accurate, the detail of the distribution is still unclear, due to the large gap in scores implied by the intervals.

Figure 2.1: Histogram of Frequency
[Frequency histogram. Horizontal axis: intervals 20-26, 27-33, 34-40, 41-47, 48-54, 55-62, 63-70, 71-78, 79-86, 87-94 and 95-102. Vertical axis: frequency, 0 to 7.]

In figure 2.2, the average of the interval is depicted on the horizontal axis. We can see that the graph correlates with figure 2.1 and can thus trust that the data analysis done in figure 2.1 is reliable.

Figure 2.2: Polygon of Frequency
[Frequency polygon with a linear trend line. Horizontal axis: middle values, 0 to 120. Vertical axis: frequency (f), 0 to 7.]

Figure 2.3 concentrates on the upper values of the intervals. This curve also correlates with figures 2.2 and 2.1.

Figure 2.3: Ogive of Frequency
[Frequency ogive. Horizontal axis: upper values, 0 to 120. Vertical axis: frequency (f), 0 to 7. Data labels (upper value, frequency): 22, 1; 30, 1; 38, 1; 46, 2; 54, 2; 62, 3; 70, 6; 78, 1; 86, 4; 94, 2; 102, 2.]

All three graphs are leptokurtic and negatively skewed. This implies that the sample group did well in the multiple-choice test. According to Borich & Kubiszyn (2007: 257), there can be multiple reasons for this: for example, the sample group might have been of high intelligence, the test may have been too easy, or the time constraints for the test may have been too lenient.
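The direction of the skew can also be checked numerically instead of by eye. The sketch below uses the moment coefficient of skewness, which is an assumption on my part: the report judges skew from the graphs and does not name a formula. A negative result indicates a tail towards the lower scores.

```python
from statistics import mean, pstdev

# Percentage scores from table 2.1.
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60, 65, 65, 65, 65,
          70, 70, 75, 85, 85, 85, 85, 90, 90, 100, 100]


def skewness(data):
    """Moment coefficient of skewness: mean of the cubed z-scores."""
    m = mean(data)
    sd = pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)


skew = skewness(scores)
print("negatively skewed" if skew < 0 else "not negatively skewed")
```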

2.3 Reliability Coefficient.

“Another way of estimating the internal consistency of a test is through one of the

Kuder-Richardson methods.” Borich & Kubiszyn (2007: 321)

For the purpose of this analysis, we will use the KR20 method, as it is the more accurate way of determining the reliability of a test (Borich & Kubiszyn, 2007: 322).

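The formula for this test is:

\[ KR_{20} = \frac{n}{n-1}\left(1 - \frac{\sum pq}{\sigma^{2}}\right) \]

where n is the number of items, p is the proportion of learners who answered an item correctly, q = 1 - p, and \sigma^{2} is the variance of the test scores.

From the data found in table 2.4, we can determine the reliability coefficient.

\[ KR_{20} = \frac{20}{20-1}\left(\frac{-2.830336}{240668}\right) = (1.05)(-0.0000118) = -0.00001239 \]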

The answer is a negative value, which can be interpreted to mean that the test is not reliable. Since the KR20 is only a very small negative amount, it is safe to assume that the reliability is not far off, but the test is still too easy.

Because the KR20 value is so small in magnitude, the KR21 method was also used to verify the reliability of the test.

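\[ KR_{21} = \frac{n}{n-1}\left(1 - \frac{M(n-M)}{n\sigma^{2}}\right) \]

where M is the mean test score.

\[ KR_{21} = \frac{20}{20-1}\left(1 - \frac{65.2(20-65.2)}{434^{2}}\right) \]

\[ KR_{21} = \frac{20}{19}\left(1 - \frac{65.2(-45.2)}{188356}\right) \]

\[ KR_{21} = 1.05\left(\frac{(-64.2)(-45.2)}{188356}\right) \]

\[ KR_{21} = 0.015 \]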

Since the outcome of this formula is a positive value, it complicates the decision of whether the test is acceptable or not. The reason for this contradiction may be that the KR20 is more accurate than the KR21 (Borich & Kubiszyn, 2007: 322). Since both formulas give values close to zero, it is probably safe to assume that the test lies on the border of reliability. We therefore need to analyse the difficulty and discrimination indices of each item individually.
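For readers who want to compute the Kuder-Richardson coefficients directly from a matrix of right/wrong responses, the following is a minimal sketch of the textbook forms of KR20 and KR21. Several things here are assumptions: the function names, the small `toy` data set (which is hypothetical and not taken from table 2.4), and the use of the population variance. The sketch works on raw totals (number of items correct per learner), so the variance and mean are on that scale and not on the percentage scale.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR20 from a list of learners, each a list of 0/1 item scores."""
    k = len(responses[0])                     # number of items
    n = len(responses)                        # number of learners
    totals = [sum(row) for row in responses]  # raw score per learner
    variance = pvariance(totals)              # population variance (assumed)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / variance)


def kr21(responses):
    """KR21 needs only the number of items, the mean and the variance."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    variance = pvariance(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * variance))


# Hypothetical data: five learners, four items of increasing difficulty.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 3), round(kr21(toy), 3))
```

KR21 treats all items as equally difficult, so it does not exceed KR20 on the same data; the two values coincide only when every item has the same p.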


Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Total % Level

L1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 6 30 L

L2 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 18 90 U

L3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 18 90 U

L4 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 1 14 70 U

L5 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 15 75 U

L6 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 1 11 55 U

L7 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 10 50 U

L8 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 13 65 U

L9 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 13 65 U

L10 1 1 0 0 1 1 1 0 0 0 1 0 0 1 0 1 1 1 1 1 12 60 U

L11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 20 100 U

L12 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 0 1 0 14 70 U

L13 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 17 85 U

L14 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 17 85 U

L15 1 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 9 45 L

L16 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 20 100 U

L17 0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 0 8 40 L

L18 1 1 0 1 1 0 1 0 0 0 1 1 0 1 1 1 1 0 1 1 13 65 U

L19 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 3 15 L

L20 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 17 85 U

L21 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 1 11 55 U

L22 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 10 50 L

L23 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 13 65 U

L24 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 7 35 L

L25 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 17 85 U

p 0.84 0.88 0.68 0.48 0.84 0.68 0.44 0.521739 0.52 0.333333 0.92 0.76 0.6 0.84 0.8 0.916667 0.625 0.333333 0.52 0.64

q 0.16 0.12 0.32 0.52 0.16 0.32 0.56 0.478261 0.48 0.666667 0.08 0.24 0.4 0.16 0.2 0.083333 0.375 0.666667 0.48 0.36

pq 0.1344 0.1056 0.2176 0.2496 0.1344 0.2176 0.2464 0.249527 0.2496 0.222222 0.0736 0.1824 0.24 0.1344 0.16 0.076389 0.234375 0.222222 0.2496 0.2304 3.830336

Var 490.5833

Part 1 1.052632

Part 2 0.993963

Table 2.4: Test scores with pq values


3. Item Analysis

3.1 Difficulty Index

Table 3.1 shows the difficulty indices. Seven of the twenty questions were unacceptable because they were too easy: Questions 1, 2, 5, 11, 14, 15 and 16. Questions 6 and 12 were somewhat easy, and the rest of the questions were of acceptable difficulty.

Table 3.1: Item Difficulty Indices

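Question Difficulty Rating

Q1 .84 Unacceptable (too easy)

Q2 .88 Unacceptable (too easy)

Q3 .68 Acceptable

Q4 .48 Acceptable

Q5 .84 Unacceptable (too easy)

Q6 .68 Easy

Q7 .44 Acceptable

Q8 .52 Acceptable

Q9 .52 Acceptable

Q10 .33 Acceptable

Q11 .92 Unacceptable (too easy)

Q12 .76 Easy

Q13 .60 Acceptable

Q14 .84 Unacceptable (too easy)

Q15 .80 Unacceptable (too easy)

Q16 .92 Unacceptable (too easy)

Q17 .62 Acceptable

Q18 .33 Acceptable

Q19 .52 Acceptable

Q20 .64 Acceptable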
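The difficulty index of an item is simply the proportion of learners who answered it correctly, so it can be computed from one column of table 2.4. The sketch below uses the Q1 column as transcribed from that table; the function name `difficulty_index` is illustrative only, and no rating thresholds are encoded because the report does not state the cut-offs it used.

```python
def difficulty_index(item_responses):
    """Proportion of learners who answered the item correctly (1 = correct)."""
    return sum(item_responses) / len(item_responses)


# Q1 column of table 2.4, learners L1 to L25.
q1 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

p = difficulty_index(q1)  # proportion correct
q = 1 - p                 # proportion incorrect
print(round(p, 2), round(q, 2), round(p * q, 4))
```

The same p, q and pq values feed the sum of pq used in the reliability calculation.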


3.2 Discrimination Index

In table 3.2, we can see that there are six items with a low discrimination index.

These items will have to be revised. It is also rather interesting to note the

correlation between the unacceptable difficulty indices and the unacceptable

discrimination indices as well as the correlation between the acceptable difficulty

indices and the acceptable discrimination indices.

Table 3.2: Item Discrimination Indices

Question Discrimination Rating

Q1 0.16 Negative

Q2 0.12 Negative

Q3 0.32 Positive

Q4 0.52 Positive

Q5 0.16 Negative

Q6 0.32 Positive

Q7 0.56 Positive

Q8 0.48 Positive

Q9 0.48 Positive

Q10 0.67 Positive

Q11 0.08 Negative

Q12 0.24 Positive

Q13 0.40 Positive

Q14 0.16 Negative

Q15 0.20 Negative

Q16 0.08 Positive

Q17 0.38 Positive

Q18 0.67 Positive

Q19 0.48 Positive

Q20 0.36 Positive
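One common way to obtain a discrimination index is the upper-lower group method: rank the learners by total score, split them into an upper and a lower half, and take the difference in the number of correct answers divided by the size of one group. The sketch below illustrates that method under stated assumptions; the report does not show how the values in table 3.2 were calculated, and both the function name and the sample data are hypothetical.

```python
def discrimination_index(totals, item):
    """D = (U - L) / n for one item.

    totals: total test score per learner
    item:   0/1 response of each learner to the item, in the same order
    """
    # Rank learners from highest to lowest total score.
    ranked = sorted(zip(totals, item), key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2  # the middle learner is dropped if the count is odd
    upper_correct = sum(resp for _, resp in ranked[:half])
    lower_correct = sum(resp for _, resp in ranked[-half:])
    return (upper_correct - lower_correct) / half


# Hypothetical data: eight learners, one item.
totals = [18, 16, 14, 12, 9, 7, 5, 3]
item = [1, 1, 1, 0, 1, 0, 0, 0]
d = discrimination_index(totals, item)
print(d)
```

A positive value means that stronger learners answered the item correctly more often than weaker learners did, and a negative value means the opposite.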

4. Conclusion


In this report on the 2008 Grade 12 English second language exam, the assumption can be made that the multiple-choice test was rather easy. The thorough analysis of the frequency, standard deviation, discrimination indices, difficulty indices and the reliability coefficient clearly proved this assumption.

Items 1, 2, 5, 11, 14 and 15 will need revision so that this test may be graded as reliable. Consider that 76% of the test, as seen in figure 4.1, is reliable and the other 24% of the test is too easy. The questions mentioned were all rather easy and therefore not really applicable for a final exam.

Figure 4.1: Percentage of acceptability
[Pie chart titled “Reliability”: Acceptable 76%; remainder 24%.]


Bibliography

Borich, G. & Kubiszyn, T. (2007). Educational Testing and Measurement: Classroom Application and Practice. NJ: John Wiley & Sons, Inc.

Knoetze, J. (2007). Test Data. Retrieved April 1, 2008, from http://www.jknoetze.co.za/CIA_722/testdata.xls

Appendix A: Test Data

Key C B D D B C D A C B A C B D A A C D B C

St No Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20

1 C B B A C D A D D A D A A A A C B D B

2 C B D D B D A A C B A C B D A A C D B C

3 C B D D B C D A C B A C B D A A C B D C

4 C B D B B C B A C B A C A D C A C B C C

5 C B D C B C B A C D A C B D A A A B B C

6 C A D D C C A D C D A C A D A A A B D C

7 B B A B B C B B D D A C B D C A A D D C

8 C B D B B C B D B C A C B D A A C A B A

9 C B D A B C D D B D A C B D A A C B D A

10 C B B A B C D C D C A B A D D A C D B C

11 C B D D B C D A C B A C B D A A C D B C

12 C B D D B C D D D A A C A D A A C B B D

13 C B D A B C D A C B A C B D A A A B B C

14 C B D A B C D A C B A C B D A A A B C

15 C B D D B B A A B D A C D A A C B B D D

16 C B D D B C D A C B A C B D A A C D B C

17 B B C C B A D D C A D B D A C A D

18 C B B D B A D D D D A C A D A A C B B C

19 D C A D B A B A D C C D A A D B B B A B

20 C B D D B C D A C A C D B D A A C D B C

21 C A D D C C A D C D A C A D A A A B D C

22 B B A B B C B B D D A C B D C A A D D C

23 C B D B B C B D B C A C B D A A C A B A

24 C B B A C D A D D A D A A A A C B D B

25 C B D D B D A A C B A C B D A A C D B C
