
Testing and Diagnostic Testing in Mathematics


Author: M. Cresswell. Source: Mathematics in School, Vol. 6, No. 4 (Sep. 1977), pp. 25-27. Published by The Mathematical Association. Stable URL: http://www.jstor.org/stable/30212458


by M. Cresswell of the N.F.E.R.'s Guidance and Assessment Service.

Modern approaches to teaching emphasise the individuality of pupils and the necessity of matching curricula and methods to their abilities and attainments. For many purposes assessment by subjective judgment and intuition is satisfactory, but often a more objective and less parochial measurement is required. The tests published by bodies such as the N.F.E.R. are designed to satisfy this need.

There has recently been a fashion amongst test designers and test users for "diagnosis", and a distinction has been drawn between standardised and "diagnostic" tests. While it is true that most standardised tests are concerned with generalities and that the more specific type of diagnostic test is usually unstandardised, there seems to be no technical or educational reason why this should be so: diagnostic information can be obtained from general standardised tests, and the addition of normative information would make diagnostic tests more flexible. The more useful distinction is between those tests where a sample of the curriculum is examined in order to provide a general index of attainment, and those which cover a wide and structured curriculum in order to describe attainment in detail.

The General Standardised Test

This is still the most common type of test; the aim is to describe each candidate's general attainment in terms of that of his peers. Tests of this type examine a range of concepts and skills chosen to be as representative as possible of the curriculum, at a level appropriate for the children who are to be assessed.

As an aid in the selection of the sample of the curriculum to be covered, a blueprint is drawn up in two-dimensional form.

Fig. 1. Test blueprint: a two-dimensional grid with curriculum content ("calculations", "problems", "sets", etc.) along one dimension and cognitive level along the other. Any topic (e.g. "percentages", "reflections") may appear at varying cognitive levels.

One of the most useful ways of specifying the psychological complexity of the questions is derived from the cognitive taxonomy of Bloom et al. (1), and a good example of this approach is to be found in the manual to N.F.E.R. Mathematics Attainment Test EF. In this instance the curriculum has been divided into 39 topics (e.g. "ratio", "approximations", "sets") and the cognitive levels identified are Recall of Knowledge, Use of Learnt Techniques and Skills, Comprehension, and Applications to Unfamiliar Situations. (Comprehension is further subdivided into translation from one symbolic form to another, interpretation and extrapolation.)

Clearly it is not possible to fill all the cells in such a blueprint, and in any case it would be misleading to do so; topics are tested at the most appropriate level, bearing in mind the likely mathematical experience of the children who are to be assessed.
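A blueprint of this kind is easy to hold as a simple two-dimensional structure. The sketch below is not taken from any N.F.E.R. manual; the topics, level names and question counts are invented purely for illustration:

```python
# A minimal sketch of a test blueprint: each cell maps a
# (topic, cognitive level) pair to the number of questions planned.
# Not every cell need be filled, as the text explains.
BLUEPRINT = {
    ("ratio", "use of learnt techniques"): 2,
    ("approximations", "recall of knowledge"): 1,
    ("sets", "comprehension"): 2,
    ("percentages", "applications"): 1,
}

def items_per_topic(blueprint):
    """Total planned questions for each topic, summed over levels."""
    totals = {}
    for (topic, _level), n in blueprint.items():
        totals[topic] = totals.get(topic, 0) + n
    return totals

print(items_per_topic(BLUEPRINT))
# {'ratio': 2, 'approximations': 1, 'sets': 2, 'percentages': 1}
```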

The remainder of the construction of a general standardised attainment test is fairly straightforward, at least in theory! Once the questions have been written according to the blueprint, problems such as ambiguous phrasing are dealt with during preliminary trials, and the test is standardised on as representative a sample of the population as possible.

The descriptive or diagnostic test

In contrast to the general standardised test, the blueprints for tests of this type (often called Criterion Referenced Tests) are usually one-dimensional. The most popular choice of dimension in mathematics testing is that of content, but the analysis is far more detailed than for a general test. For example, where a single topic of "addition" may appear on the blueprint for a general test, the descriptive test will explicitly cover subtopics such as "addition of two 2-digit numbers where no carrying is involved". Often more than one question is asked about each detailed objective (subtopic), and "Yardsticks" (pub. Thos. Nelson) is a good example of this approach.

Although the intention of the constructors of this sort of test is usually to make a comprehensive coverage of the curriculum, there is a tendency to concentrate on testing skills and techniques rather than the higher cognitive levels which are sampled in tests of general attainment. This is perhaps understandable, given the problems involved in analysing, say, the primary mathematics curriculum into literally hundreds of small, uniquely defined objectives, but criterion referenced tests should not, in theory, suffer in this way.
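To make the contrast with a general test concrete, here is a minimal sketch of criterion referenced marking. The objectives, question identifiers and mastery criterion are all assumptions for illustration, not the scheme used by "Yardsticks":

```python
# Each detailed objective has several questions; a pupil "meets" an
# objective only by answering a set proportion of them correctly.
OBJECTIVES = {
    "addition of two 2-digit numbers, no carrying": ["q1", "q2"],
    "addition of two 2-digit numbers, with carrying": ["q3", "q4"],
}
CRITERION = 1.0  # assumed mastery criterion: all items correct

def objectives_met(answers_correct, objectives, criterion=CRITERION):
    """answers_correct maps question id -> True/False.
    Returns the objectives on which the pupil meets the criterion."""
    met = []
    for objective, items in objectives.items():
        proportion = sum(answers_correct[q] for q in items) / len(items)
        if proportion >= criterion:
            met.append(objective)
    return met

print(objectives_met({"q1": True, "q2": True, "q3": True, "q4": False},
                     OBJECTIVES))
# ['addition of two 2-digit numbers, no carrying']
```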


Some unavoidable statistics

As a general rule the interpretation of criterion referenced test results does not require any knowledge of statistics: results are stated in terms of the child's ability to meet a given criterion of success on a given part of the curriculum. With standardised tests, however, each child is compared with the standardisation sample and the result quoted as a score on a numerical scale. A knowledge of basic statistics is necessary if the best use is to be made of these tests.

Clearly if any test is given to a group of individuals they will be distributed across the possible scores according to their attainment:

Fig. 2. Score distribution for a ten-item test: number of pupils plotted against score (0 to 10).

Provided the distribution has only a single "hump" and is fairly symmetrical about the mean value, it is an easy task to fit the data to the statistical model known as the normal or Gaussian distribution. This distribution can be described completely by two parameters: the mean and the standard deviation.

Fig. 3. Normal distribution: about 34% of cases lie in each band between the mean and ±1 SD, about 13.5% in each band between 1 SD and 2 SD, and about 2.5% beyond ±2 SD on each side.

For a normal distribution about 68 per cent of the cases fall within ±1 SD of the mean value and about 95 per cent fall within ±2 SD. If a method of converting raw test scores (the number of correct answers) to scores on a scale related to a normal distribution of known mean and standard deviation is provided, the teacher can tell, in terms of proportions, how exceptional (or otherwise) any particular child's score may be.

For most standardised mathematics tests, standardisation samples are selected by age, the mean converted score is set at 100 and the standard deviation at 15. A pupil with a score of 115 therefore has a better mark than 84% of his age group; a score above 130 places a pupil in the top 2.5% of his peers. It is perhaps worth mentioning at this point that the normal curve is only a convenient model of test score distribution: there is no theoretical reason to assume that attainment or ability in mathematics is truly normally distributed.
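The percentages quoted above can be checked directly from the normal model. A minimal sketch using Python's standard library, on the scale of mean 100 and SD 15 described in the text (the exact tail beyond 130 is nearer 2.3%, which the text rounds to 2.5%):

```python
from statistics import NormalDist

# Standardised-score scale described in the text: mean 100, SD 15.
scale = NormalDist(mu=100, sigma=15)

# Proportion of the age group scoring below a given standardised score.
for score in (100, 115, 130):
    print(score, f"{scale.cdf(score):.1%}")
# 100 50.0%  -> the average child
# 115 84.1%  -> "a better mark than 84% of his age group"
# 130 97.7%  -> roughly the top 2.5% quoted in the text
```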

Measurements derived from standardised tests are not, of course, exact, but are subject to various sources of error, and most commercially produced tests have a known, published standard error. The standard error is essentially the standard deviation of the distribution of errors surrounding each measurement. It follows that for any particular measured score we can be 68% confident that the "true" score is within ±1 SE, and 95% confident (this can be interpreted as being correct for 95 children out of 100) that the "true" score is within ±2 SE of the obtained score (see Fig. 3).
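A sketch of the corresponding confidence bands: given an obtained score and a test's published standard error (the SE value below is assumed purely for illustration), the 68% and 95% bands for the "true" score are:

```python
def confidence_bands(obtained, se):
    """68% and 95% bands for the 'true' score around an obtained score."""
    return {"68%": (obtained - se, obtained + se),
            "95%": (obtained - 2 * se, obtained + 2 * se)}  # 1.96 rounded to 2

print(confidence_bands(obtained=115, se=3.0))
# {'68%': (112.0, 118.0), '95%': (109.0, 121.0)}
```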

Handling test results

With one or two exceptions, most published mathematics tests will fit into one of the two categories discussed above. The general attainment test is still the most commonly used, and most teachers will have either administered tests of this type or have been faced with long lists of scores obtained by their use. It is surprising how frequently such scores are filed and forgotten, when the use of a few simple techniques can release a lot of hidden information.


The following example is taken from a real set of results recently sent to the N.F.E.R. Guidance and Assessment Service for comment. Two N.F.E.R. standardised mathematics tests had been given to the same children in two successive years (when they were 3rd and 4th year juniors). There were 37 children in the group and the scores appeared to show an improvement in the fourth year; for example:

1974: 109 101  88 108 113 112 106  89 126 124
1975: 125 101  90 110 112 115 107  81 122 131

1974:  96  98 120  79 101 119  96 112 135
1975:  92 106 123  96 102 125  93 131 108

From an assessment of these data by eye it is clear that in both years the average score was above the tests' nationally standardised means of 100. In fact, the mean scores for the whole group were 103 in 1974 and 108 in 1975, showing a considerable apparent improvement. (In this case it was necessary to evaluate the means, but it is very often sufficient simply to check that about half the children score over 100.) In order to illustrate the pattern of the improvement in scores, histograms showing the score distributions were drawn for each year; these are sketched in Fig. 4.
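Before turning to the histograms, the mean check is easy to reproduce. The sketch below uses only the 19 pairs of scores printed above (the full group had 37 children, so the subset's means differ slightly from the 103 and 108 quoted):

```python
from statistics import mean

# The 19 pairs of scores printed above (a subset of the full 37).
scores_1974 = [109, 101, 88, 108, 113, 112, 106, 89, 126, 124,
               96, 98, 120, 79, 101, 119, 96, 112, 135]
scores_1975 = [125, 101, 90, 110, 112, 115, 107, 81, 122, 131,
               92, 106, 123, 96, 102, 125, 93, 131, 108]

print(round(mean(scores_1974), 1))  # 106.9 for the printed subset
print(round(mean(scores_1975), 1))  # 108.9 for the printed subset
```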

Fig. 4. Score distributions for the 1974 and 1975 results: histograms of the number of pupils against score, with a reference line at 100 in each year.

Here a curious pattern emerged, and one possible explanation was that the above-average children had improved while the below-average ones had not. To check this conjecture a scattergram was drawn:

Fig. 5. Scattergram of 1974 results v. 1975 results (scores in 1975 plotted against scores in 1974; both axes run from 70 to 140).

The scattergram gave impressive confirmation. When these results were returned to the primary school concerned, it emerged that the school was in a selective Authority, and this fact alone provides a possible explanation for the results obtained.

The purpose of this rather long example has been to illustrate how much information can be obtained from an apparently sterile class list of standardised scores. The procedure follows five stages:

1. Assess data by eye.


2. (a) Check that the mean score is about 100 (for most standardised tests). (b) If it is not, it may be useful to evaluate it.
3. Draw histograms of the score distributions.
4. Draw scattergrams if results from more than one test are being compared.
5. Carry out other statistical tests if necessary.

It is rarely necessary to go further than stage four. In the example we have been discussing, it would be possible to evaluate the coefficient of correlation between the two tests and the significance of the difference in mean scores, but little extra practical information would be obtained by doing so. The scattergram shows that, as usual, a fairly high positive correlation exists, and a difference of 5 points between mean standardised scores will almost always be highly significant for a group of more than about 25 children.
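Stages two to four lend themselves to a few lines of code. A minimal sketch, assuming matplotlib is available and reusing the scores_1974 and scores_1975 lists from the earlier sketch:

```python
import matplotlib.pyplot as plt
from statistics import mean

def quick_look(year1, year2, labels=("1974", "1975")):
    """Stages 2-4: check the means, draw a histogram for each year,
    and draw a scattergram of one year's scores against the other's."""
    for lab, scores in zip(labels, (year1, year2)):
        print(lab, "mean:", round(mean(scores), 1))

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, scores, lab in zip(axes, (year1, year2), labels):
        ax.hist(scores, bins=range(70, 145, 5))
        ax.axvline(100, linestyle="--")    # nationally standardised mean
        ax.set_xlabel(f"score in {lab}")
    axes[2].scatter(year1, year2)
    axes[2].plot([70, 140], [70, 140])     # line of "no change"
    axes[2].set_xlabel(f"scores in {labels[0]}")
    axes[2].set_ylabel(f"scores in {labels[1]}")
    plt.show()

quick_look(scores_1974, scores_1975)
```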

Turning now to the consideration of individual results, it is clear that some children in the example have made considerable improvements in score but that others have fallen back slightly. To be confident that a difference between two test scores is genuine it is necessary to compare it with a standard error. The standard errors of measurement were 2.8 for the test taken in 1974 (SEm1) and 3.4 for the test taken in 1975 (SEm2). To compare scores obtained on two tests the standard error of the difference between the scores is required. This is given by:

S.E. of difference = SEd = √(SEm1² + SEm2²)

which in this case has a value of 4.4. There are four children in the example for whom the difference in score between the tests is greater than twice (more exactly 1.96 times) the standard error of the difference. For these children we can say with 95% confidence that their standing relative to their peers has changed in the intervening year.
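The calculation is easily checked. A minimal sketch using the standard errors quoted in the text and the 19 pairs of scores printed earlier:

```python
from math import sqrt

# Standard errors of measurement quoted in the text.
se_1974, se_1975 = 2.8, 3.4
se_diff = sqrt(se_1974**2 + se_1975**2)
print(round(se_diff, 1))  # 4.4, as in the article

# The 19 pairs of scores printed earlier.
pairs = [(109, 125), (101, 101), (88, 90), (108, 110), (113, 112),
         (112, 115), (106, 107), (89, 81), (126, 122), (124, 131),
         (96, 92), (98, 106), (120, 123), (79, 96), (101, 102),
         (119, 125), (96, 93), (112, 131), (135, 108)]

# Changes larger than 1.96 standard errors of the difference are
# significant at the 95% level; this flags exactly four children.
for before, after in pairs:
    if abs(after - before) > 1.96 * se_diff:
        print(before, "->", after)
# 109 -> 125, 79 -> 96, 112 -> 131, 135 -> 108
```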

But was the test valid?

The accuracy of the conclusions reached in the foregoing example is, of course, entirely dependent upon the tests being fair measuring instruments for the children concerned. There are a number of different factors which contribute to the suitability of either general standardised tests or descriptive criterion referenced tests, and before a test of any sort is used the following points should be checked.

(1) Is the age-range of the test right? This seems an obvious point to watch, but the N.F.E.R. receives occasional letters from worried test users who have tested children with tests intended for an entirely different age group! With standardised tests it is often necessary to check that the conversion tables extend over the whole age range. For example, a test may be aimed at 7-9 year olds but only standardised for children aged between 7 years 6 months and 8 years 10 months.

(2) Are there any questions in the test which the children have not covered? The pace of curriculum change in recent years and the width of mathematics curricula have made both this question and the following one increasingly important. If there are only one or two specific items which are inappropriate, the test might still be used, since it is likely that the rank ordering of the children will be largely unaltered; but in these circumstances the "national mean of 100" will have little meaning as a reference point.

(3) Are there any topics that the children have covered which are not in the test? The omission of some topics might alter the rank ordering of the pupils considerably, and it is also possible that children who follow a very wide curriculum will not do themselves justice on the sample of mathematics included in a particular test. This does not imply that there is anything wrong with testing a specific set of skills or concepts for a particular purpose. The error (committed more often than one might expect) would be to draw too general a conclusion from a very restricted sample of a child's attainment. The classic example of this is the use of arithmetic tests to assign pupils to mathematics sets: some secondary schools screen their intake in this way and find that children from more traditional primary schools tend to be assigned to the higher sets or streams; hardly a surprising result.

(4) Is the approach in the test similar to the approach of the teaching which the children have received? If points 2 and 3 above apply particularly to general standardised attainment tests, this one is particularly relevant to descriptive criterion referenced testing. Where we are drawing an inference about attainment in a particular mathematical area, it is clearly important that the mathematical ideas tested by the questions should be as accessible as possible to the candidates.

Content analysis

The foregoing discussion points clearly to the need for test content to be closely scrutinised before the results can be properly interpreted. If this scrutiny is done in a systematic way, an analysis can be made in terms of the mathematical ideas involved in each question; effectively a reversal of the blueprinting procedure. If a pupil whom we suspect has difficulty with negative numbers obtains a low score, reference to his script and the content analysis of the test will give a useful indication of whether or not this is because of his difficulty in this area. The test will thus provide confirmation (or otherwise) of the pupil's specific difficulty, and the teacher will be alerted to the influence that this has had on the pupil's overall score. It is often the case that where a test score is at odds with the teacher's expectation, this is caused by an unexpected failure, or success, on a few particular questions rather than by disagreement on the level of the child's general attainment. On a class basis, if we find that many of the children fail to answer, say, questions on line graphs correctly, we might deduce that further work on this topic is necessary.
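A content analysis of this kind amounts to a mapping from questions back to the ideas they test. The sketch below is an assumed structure purely for illustration; the question identifiers, topics and pupils' scripts are invented:

```python
# Map each question back to the mathematical idea it tests
# (the reversal of the blueprinting procedure described above).
QUESTION_TOPICS = {
    "q07": "negative numbers",
    "q12": "negative numbers",
    "q15": "line graphs",
    "q16": "line graphs",
}

def failures_by_topic(class_scripts, question_topics):
    """class_scripts: {pupil: {question id: correct? (bool)}}.
    Returns {topic: number of wrong answers across the class}."""
    failures = {}
    for answers in class_scripts.values():
        for q, correct in answers.items():
            if not correct and q in question_topics:
                topic = question_topics[q]
                failures[topic] = failures.get(topic, 0) + 1
    return failures

scripts = {"Ann": {"q07": False, "q15": False, "q16": False},
           "Ben": {"q07": True, "q15": False, "q16": False}}
print(failures_by_topic(scripts, QUESTION_TOPICS))
# {'negative numbers': 1, 'line graphs': 4}
```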

It should be understood that this procedure is not "teaching to the test". The presence in a test of questions on a particular topic does not imply in any way that the topic should be taught. However, if a teacher considers the topic important, the test will give an indication of her pupils' attainment, and she can decide whether further teaching time would be well spent on that particular area of the curriculum. Few teachers would mark exercise books without giving thought to the way in which the pupils obtained their answers, but all too often tests are marked in a purely mechanical fashion when a more active approach would release considerably more useful information.

I have tried in this short article to outline some of the ways in which published mathematics tests, particularly the general standardised type, can be used in a constructive way by teachers to learn more about their pupils. Tests of the standardised type are often regarded with suspicion as "labelling devices", and sometimes there is a feeling that they invite odious comparisons between pupils. To take this view is really to miss the point: standardised tests are statistical tools which can be misused, but used sensitively they can reveal a great deal about the strengths and weaknesses of both individual pupils and class or year groups. Those addicted to short snappy labels such as "bright", "dim" or even "practically E.S.N." use them in the absence of any objective information, and are often rudely awakened by standardised test scores. Tests provide teachers with a method of learning more about their pupils; they have advantages and disadvantages: precision, but in a limited field.

References
1. Bloom, B. et al. (1956) "Taxonomy of Educational Objectives. Book 1, Cognitive Domain". Longmans.
2. Fraser, W. C. & Gillam, G. N. (1972) "Principles of Objective Testing in Mathematics". Heinemann.
3. Lewis, D. G. (1974) "Assessment in Education". Hodder & Stoughton.
4. Lewis, D. G. (1967) "Statistical Methods in Education". Hodder & Stoughton.
