The Development of Educational Evaluation 1-Libre

The Development of Educational Evaluation

Extract from my book:

Assessment of the Sudan School Certificate English Examinations (http://www.lulu.com/content/hardcover-book/assessment-of-the-sudan-school- certificate-English examinations/7583003)

We can begin this chapter with a quotation from Sax (1980:5), who tells us that our ancestors used principles of measurement to build shelters and tools, select a mate, kill prey, and fashion clothing long before the advent of educational and psychological tests. The Ancient Egyptians devised sophisticated, complex measurements to construct the pyramids. By 2700 B.C. they had mastered geometrical concepts and were able to use measurements in three dimensions. The Bible also attests to the early use of measurement: Noah was commanded to build an ark 300 cubits long, 50 cubits wide, and 30 cubits high (Genesis 6:15). We can trace the historical perspective of educational assessment and evaluation with another quotation, from Horace Mann, the famous American lawyer who lived in the 19th century and who abandoned law for education: "We venture to predict that the mode of examination, by printed questions and written answers, will constitute a new era in the history of our schools" (Ebel, 1972:5). So Mann can be considered the first educator who called for test authentication.

Tests and measurement of one kind or another have played a bigger role in human history than is generally recognized. In fact, among the earliest records of the use of various testing devices are those found in the Bible, although they generally have no direct reference to education. One illustration will suffice. Ross (1963:27) narrated the biblical story of the "Gileadites [who] took the passages of Jordan before the Ephraimites: and it was so, that when those Ephraimites which were escaped said, Let me go over; that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay; then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could not frame to pronounce it right. Then they took him, and slew him at the passages of Jordan: and there fell at that time of the Ephraimites forty and two thousand" (Judges 12:5-6).

McNamara (2000:68) and Hughes (1995:37) quoted the same story, and the latter commented that, in general, the more important the decision based on a test, the longer the test should be. Jephthah used the pronunciation of the word 'shibboleth' as a test to distinguish his own men from the Ephraimites, who could not pronounce the (sh). Those who failed the test were executed. Any of Jephthah's own men killed in error might have wished for a longer, more reliable test.

The development of measurement can be traced from 257 B.C. in China, where an extensive system of written examinations of educational achievement formed the basis for admission and promotion in the civil service of ancient China. The system persisted until this century and was at least partly responsible for maintaining the internal stability of that society, and its relatively high level of culture, for over two millennia. It provided an alternative to the more usual organization of ancient society, in which power and privilege were hereditary prerogatives (Ebel, 1972:3). When universities were established in Europe in the Renaissance, examinations were largely oral and frequently took the form of public disputation on controversial questions. The Society of Jesus, founded in 1540, placed a high value on education and scholarship. Departing from the popular practice of the times, the Jesuits insisted on the use of written examinations. In 1599 the Society issued a comprehensive statement of the theory and practice of instruction. The statement included a detailed set of rules for the conduct of written school examinations which, apart from the fact that it is in Latin, could be used in an examination room today.

Then in 1836, as a result of competition and friction between universities, the University of London was chartered to serve primarily as an examining and degree-certifying authority (Ross, 1963:29). In 1845 Horace Mann, a towering figure in the development of public education in the United States who took his responsibilities as Secretary of the Massachusetts Board of Education very seriously, became involved in a controversy with the Boston schoolmasters over the effectiveness of some of their methods. Mann felt a special need for more adequate, more objective evidence of pupils' achievement than oral examinations could give, and held that written examinations would have a number of advantages. He shared this idea with another American who realized both the value and the limitations of examinations, Emerson E. White, who wrote in 1886: 'It may be stated as a general fact that school instruction and study are never much wider or better than the test by which they are measured.' He enumerated several special advantages of the written test (Ross, 1963:29).

Intelligence testing was started in America by W. Stanley Jevons in 1874, and in 1879 Wilhelm Wundt in Germany began to do experimental psychological studies. In 1912 another important idea, suggested by Stern, was that of representing intelligence as the ratio of mental age to chronological age. This concept, for which Stern suggested the term "mental quotient", was later adopted by Terman as the familiar Intelligence Quotient (IQ).
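
The ratio just described can be illustrated with a small sketch. It assumes the conventional form of the ratio IQ, in which the quotient of mental age to chronological age is multiplied by 100; the function names and the sample ages are hypothetical and are used only for illustration.

```python
def mental_quotient(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio: mental age divided by chronological age."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: the mental quotient scaled by 100 (conventional form)."""
    return 100 * mental_quotient(mental_age, chronological_age)


# Hypothetical child: mental age 10, chronological age 8.
print(ratio_iq(10, 8))
```

On this scale a child whose mental age equals the chronological age scores 100, a mental age ahead of the chronological age gives a score above 100, and one behind it gives a score below 100.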

The distinctive contribution of the English to the measurement of intelligence has been that of statistical methods as a tool for the analysis of test results. Sir Francis Galton in 1883 outlined a method of studying free association by quantitative methods. But his most notable contribution was in statistical analysis, where he suggested, among other things, a graphical method of representing correlations, as cited from Ryan in Ross (1963:31). These ideas were developed by his pupils, such as Karl Pearson and Charles E. Spearman. Spearman developed his well-known two-factor theory of intelligence on the basis of statistical analysis. Cyril Burt, who had been a leader in introducing and adopting Binet's work in Great Britain, was in 1913 officially appointed school psychologist, possibly the first person to occupy that position.

The French have long been leaders in abnormal psychology. This brings us to the most important name in the history of intelligence testing, Alfred Binet. He contributed a technique of scale construction and another consisting of test situations selected according to predetermined criteria and standardized. The date 1905 was important, therefore, because it marked the appearance of the first scale for the measurement of intelligence, which, crude as it was, has served as the pattern for subsequent tests and scales the world over. Earlier, in 1890, the concept of mental tests had emerged, based on measuring precisely certain sensory, motor and basic mental faculties such as visual and auditory sensitivity. This was first done by James McKeen Cattell, who thought there should be a direct relation between a person's elemental processes and his ability to use higher mental processes such as reasoning, critical thinking, and creative imagination.

Achievement Tests: What is an achievement test? It is an ability test designed to appraise what the individual has learned to do as a result of planned previous experience or training, often that provided in school (Thorndike, 1969:643). Another definition is that an achievement test is an objective examination that measures educationally relevant skills or knowledge about such subjects as reading, spelling, or mathematics (www.eric_digest/index:2003).

The first textbook in educational measurement appeared in 1903, by Dr. Edward L. Thorndike. "Learning", as Thorndike believed, does not consist in the cultivation of faculties such as memory, willpower, reasoning or imagination. It consists in the formation of vast numbers of specific connections, the strength of which is governed by the laws of readiness, exercise and effect (Ebel, 1972:11). Thorndike was the father of educational measurement, as it was put by Ayres in Ross (1963:39). Thorndike was the one who made the distinction between the 'inventor' and the 'father' of the movement. Although Thorndike's publications on statistical methods were influential in education, he was not responsible for the early standard tests. The first test was the Stone Arithmetic Test, which was published in 1908, and the first scale was the Thorndike Handwriting Scale, which was announced in 1909 and published in 1910.

The idea of the standardized test made educators discover for the first time just how bad the existing measurements were (Ross, 1963:39). Beginning in 1910, several studies of the unreliability of examinations were carried out. A distinction should be made between the limitations of school marks and the limitations of school examinations. The need for reform in the marking of college examinations was forcibly brought to public attention by the research of Max Meyer, as cited in Ross (1963:39), who reported on marks collected from forty instructors over a period of five years at the University of Missouri and found astonishing variations. Franklin W. Johnson (1911) found a similar condition in the University of Chicago High School, in 'A Study of High-School Grades'.

In 1918 Thorndike published what has proved to be probably the most influential paper that has ever appeared on educational measurement. It began with the well-known dictum "Whatever exists at all exists in some amount." And as he looked to the future, Thorndike saw it conditioned by a series of 'ifs': "if those who object to quantitative thinking in education will set themselves to work to understand it; if those who criticize its presuppositions and methods will do actual experimental work to improve its general logic and detailed procedures; if those who are now at work in devising and in using means of measurement will continue their work, the next decades will bring sure gain in both theory and practice". This statement by Thorndike was followed by many developments in the field of measurement.

Objective Test: The decades between the two wars were years of rapid development in the techniques and uses of educational measurement. There were many figures, such as A. McCall, who seems to have been the first to suggest the objective test (Ross, 1963:47). There were also Edward K. Strong and Carl Brigham, who developed for the College Entrance Examination Board an objective test of general verbal and quantitative skills, called the Scholastic Aptitude Test, to supplement the essay tests of subject-matter achievement. The year 1912 witnessed the first attempts to measure character, by a test designed by G. G. Fernald, followed by Voelker in 1921, who devised some actual test situations for measuring character.

By far the most ambitious attempt so far made is that of the Character Education Inquiry under the direction of Hartshorne and May, which extended over five years from 1924 to 1929. Their method was to select repetitive and varied life situations which would afford a valid index of the totality of the character of the individual (Ebel, 1972:15). In the mid-1930s a small group of directors of state testing programs began meeting each fall in New York City following the annual conference of the Educational Records Bureau and the Cooperative Test Service. These meetings evolved into an annual invitational conference on testing problems. Measurement specialists present papers on topics of current interest, which are then published in a book of proceedings (ibid.).

Achievement Tests: In 1904 appeared E. L. Thorndike's 'An Introduction to the Theory of Mental and Social Measurements', which made available for the first time to American students the statistical techniques necessary for educational research and measurement. In 1914, Truman Kelley's 'Educational Guidance' introduced education workers to the alluring possibilities of partial and multiple correlation. In 1918 appeared 'The Seventeenth Yearbook of the National Society for the Study of Education', in which an essay by Thorndike contained the famous quotation "whatever exists at all exists in some amount."

The year 1927 witnessed the publication of Thorndike's 'The Measurement of Intelligence' and C. E. Spearman's 'The Abilities of Man', which represented a distinct point of view. In 1930 a collection of books appeared, of which the most important was the 'Bibliography of Mental Tests and Rating Scales' by Oscar Buros; this and the 'Personality Tests' of 1933, 1934, 1935, 1949, and 1950 were milestones in the history of educational measurement, as Ross put it. In 1947 appeared Lee J. Cronbach's 'Essentials of Psychological Testing' and Robert L. Thorndike's 'Personnel Selection: Test and Measurement Techniques'. In 1949 appeared Frank S. Freeman's 'Theory and Practice of Psychological Testing', followed by E. F. Lindquist's 'Educational Measurement' in 1950. In 1956 appeared the 'Taxonomy of Educational Objectives', which marked the beginning of a new era in educational measurement.

The Taxonomy of Educational Objectives: In 1956 Benjamin S. Bloom and others initiated and developed the concept of the taxonomy of educational objectives in their book 'Taxonomy of Educational Objectives', first published by David McKay Company in New York. Three types of objectives were identified: cognitive, affective and psychomotor. Because the cognitive taxonomy has become especially well known and has had considerable impact in stimulating the development of tests that measure more than knowledge, the researcher will summarize its two important books, "Taxonomy of Educational Objectives", Book 1: Cognitive Domain and Book 2: Affective Domain, by Benjamin S. Bloom as editor (et al.). The first book was issued in 1956; the edition at hand is that of 1979. The second book was issued in 1964, and we are dealing with the 1971 edition published by David McKay Company, New York.

Educational Objectives in Pupils' Evaluation: Efforts to translate the needs of young people into educational objectives are typically directed towards producing precise statements that identify the observable changes in pupils' behaviour that should take place if the learning experiences are successful. Unquestionably such objectives are excellent starting points for the development of the curriculum, the planning of teaching strategies and the construction of tests. Eisner, in Ahmann (1975), identifies two types of objectives, namely instructional and expressive. The former specify unambiguously the particular pupil behaviour to be acquired as a result of learning. The latter do not.

The search for a precise definition of educational objectives and goals has been, and still is, a major field of research. The efforts of researchers have yielded very good attempts, such as 'The Taxonomy of Educational Objectives' by Benjamin S. Bloom and others. This attempt has contributed widely to the development of education since it was first published in 1956. It has contributed to the development of curriculum design and teaching methods, as well as student measurement and evaluation. On the advice of the supervisor, and because of the importance of this 'Taxonomy', the researcher will give a brief summary of it, as this would help many other researchers to make use of this attempt in their future studies.

The Three Domains of the Taxonomy:

Cognitive Objectives: emphasize remembering or reproducing something which has presumably been learned, as well as objectives which involve the solving of some intellective task for which the individual has to determine the essential problem and then reorder given material or combine it with ideas, methods, or procedures previously learned.

Affective Objectives: emphasize a feeling tone, an emotion or a degree of acceptance or rejection. Affective objectives vary from simple attention to selected phenomena to complex but internally consistent qualities of character and conscience. A large number of such objectives are found in the literature, expressed as interests, attitudes, appreciations, values and emotional sets or biases.

Psychomotor Objectives: emphasize some muscular or motor skill, some manipulation of material and objects, or some act which requires neuromuscular co-ordination. When found, they were mostly related to handwriting, speech and to physical education, trade, and technical courses.

The Taxonomy as a Classifying Device: The major task in setting up any kind of taxonomy is that of selecting appropriate symbols, giving them precise and usable definitions, and securing the consensus of the group which is to use them. Similarly, developing a classification of educational objectives requires the selection of an appropriate list of symbols to represent all the major types of educational outcomes. Next, there is the task of defining these symbols with sufficient precision to permit and facilitate communication about these phenomena among teachers, administrators, curriculum workers, testers, educational research workers and others who are likely to use the taxonomy. Finally, there is the task of trying out the classification and securing the consensus of the educational workers who wish to use the taxonomy.

The Cognitive Domain

It is classified into six classes: knowledge, and five classes of intellectual abilities and skills.

Knowledge: It involves the recall of specifics and universals, the recall of methods and processes, or the recall of a pattern, structure or setting.

Knowledge of specifics: The recall of specific and isolable bits of information. The emphasis is on symbols with concrete referents. This material, which is at a very low level of abstraction, may be thought of as the elements from which more complex and abstract forms of knowledge are built.

Knowledge of ways and means of dealing with specifics: Knowledge of the ways of organizing, studying, judging, and criticizing. This includes the methods of inquiry, the chronological sequences, and the standards of judgment within a field, as well as the patterns of organization through which the areas of the fields themselves are determined.

Knowledge of the universals and abstractions in a field: Knowledge of the major schemes and patterns by which phenomena and ideas are organized. These are the large structures, theories and generalizations which dominate a subject field or which are quite generally used in studying phenomena or solving problems. These are at the highest levels of abstraction and complexity.

Intellectual Abilities and Skills: Abilities and skills refer to organized modes of operation and generalized techniques for dealing with materials and problems. The materials and problems may be of such a nature that little or no specialized and technical information is needed. Such information as is required can be assumed to be part of the individual's general fund of knowledge. Other problems may require specialized and technical information at a rather high level, such that specific knowledge and skill in dealing with the problem and the materials are required. The abilities and skills objectives emphasize the mental processes of organizing and reorganizing material to achieve a particular purpose. The materials may be given or remembered. The abilities are classified under five classes, as follows:

1. Comprehension: This represents the lowest level of understanding. It refers to a type of understanding or apprehension such that the individual knows what is being communicated and can make use of the material or idea being communicated without necessarily relating it to other material or seeing its fullest implications. Comprehension consists of three sub-classes: translation, interpretation and extrapolation.

2. Application: The use of abstractions in particular and concrete situations. The

abstractions may be in the form of general ideas, rules of procedures, or generalized

methods. The abstractions may also be technical principles, ideas, and theories which

must be remembered and applied.

3. Analysis: The breakdown of a communication into its constituent elements or parts such that the relative hierarchy of ideas is made clear and/or the relations between the ideas expressed are made explicit. Such analyses are intended to clarify the communication, to indicate how the communication is organized, and the way in which it manages to convey its effects, as well as its basis and arrangement. It includes three sub-classes: analysis of elements, analysis of relationships, and analysis of organizational principles.

4. Synthesis: The putting together of elements and parts so as to form a whole. This involves the process of working with pieces, parts, elements, etc., and arranging and combining them in such a way as to constitute a pattern or structure not clearly there before. It has three sub-classes: production of a unique communication, production of a plan or proposed set of operations, and derivation of a set of abstract relations.

5. Evaluation: Judgments about the value of material and methods for given purposes. These are quantitative and qualitative judgments about the extent to which material and methods satisfy criteria, and involve the use of a standard of appraisal. The criteria may be those determined by the student or those which are given to him. Evaluation includes these classes: judgments in terms of internal evidence, and judgments in terms of external criteria, which means evaluation of material with reference to selected or remembered criteria.

Classes of the Affective Domain:

1. Receiving (attending): Here we are concerned that the learner be sensitized to the existence of certain phenomena and stimuli and that he be willing to receive or attend to them. It has been categorized into three sub-classes: awareness, willingness to receive, and controlled or selected attention.

2. Responding: The term is used to indicate the desire that a child become sufficiently involved in or committed to a subject, phenomenon, or activity that he will seek it out and gain satisfaction from working with it or engaging in it. It has three sub-categories: acquiescence in responding, willingness to respond and satisfaction in response.

3. Valuing: At this level we are not concerned with the relationships among values but rather with the internalization of a set of specified, ideal values. This category will be found appropriate for many objectives that use the term "attitude" as well as "value". It has three sub-categories: acceptance of a value, preference for a value and commitment.

4. Organization: This category is intended as the proper classification for objectives which describe the beginnings of the building of a value system. It is subdivided into two levels: conceptualization of a value and organization of a value system.

5. Characterization by a value or value complex: At this level of internalization the values already have a place in the individual's value hierarchy, are organized into some kind of internally consistent system, and have controlled the behaviour of the individual for a sufficient time that he has adapted to behaving this way; an evocation of the behaviour no longer arouses emotion or affect except when the individual is threatened or challenged. It has two sub-categories: generalized set and characterization. For further details see "The Taxonomy of Educational Objectives" by Benjamin S. Bloom (196:176-193).

The Value of the Taxonomy in Education: The impact of the taxonomies on educational thinking is indeed powerful. The taxonomies do assist appreciably in the task of helping teachers and other educational specialists discuss their curricular and evaluation problems with greater precision. The three taxonomies represent the total framework of educational objectives for all types of educational institutions (Ahmann, 1975:4). As stated by Metzger in Orlich (1998:88), taxonomies have been used in general curriculum design by Pratt (1994), to provide stimulating experiences for preschoolers through technology by Morgan (1996), and in test construction by McLaughlin and Philips (1991), as well as by Marks, Vitek and Allen, who used the taxonomies to relate data obtained from a satellite remote sensing exercise.

Perhaps the taxonomy's greatest contribution has been in the development of a professional language. Teachers and administrators who describe and analyze instruction know that terms such as knowledge level and higher levels of learning will be understood by educators everywhere. This universal vocabulary, reflecting a specialized body of knowledge, was an essential step in the professionalizing of teaching (ibid., 1998:89).

Psychological Evidence Supporting Bloom's Taxonomy: Does Bloom's Taxonomy make sense psychologically? Evidence from a number of sources supports the idea that an increased level of processing means better student learning. Perhaps the most fundamental piece of evidence is that teacher actions influence academic tasks, which ultimately influence learning (Doyle, 1983; Nickerson, 1985; Wakefield, 1996).

The Outcomes of Taxonomies on American Education: The taxonomy has played a major role in shaping American education. It has helped in designing the curriculum, developing teaching methodology, and assessing students' performance, and it has helped research workers to do their job in the best professional way. As stated in Orlich (1998:73-77), one recent important attempt to set goals for education was the National Education Goals for the Year 2000, the product of an education summit called by President George Bush in October 1989. All of the nation's governors attended that historic meeting (including Bill Clinton, who was then governor of Arkansas).

New Models for the Cognitive Taxonomy: Bloom's taxonomy has provided a number of useful insights about teaching and learning in the classroom since 1956. Raymond Nickerson (1985), as mentioned in Orlich (1998:89), wrote a paper, "Understanding Understanding", that triggered a serious re-examination of the nature of understanding and its role in Bloom's taxonomy. A close look at the traditional analogues of the cognitive taxonomy will explain the interaction of all elements of the cognitive domain. We can see the analogues of the taxonomy in these shapes.

(Fig.1) Taxonomy as a staircase (levels shown: evaluation, synthesis, analysis, application, comprehension, knowledge)

(Fig.2) Taxonomy as a ladder (levels shown: Evaluation, Synthesis, Analysis, Application, Comprehension, Knowledge)

However, the researcher believes that the taxonomy has made it, and still makes it, easier for most people in educational fields to classify, identify and apply educational goals clearly and objectively. But only a few have dealt to some extent with the taxonomy in one phase or another. We can mention here some Sudanese scholars who have treated this part of the taxonomy, e.g., Mohammed Abdul Al-Fatah Shaheen (1983), Abdel-Rahiem Ahmed Salim (1985), Mohammed Hassan Sinadah (1986) and Farouq Mohammed Ahmed A/Asalam, who advised and provided the researcher with valuable authorities and references in the field of evaluation and measurement, and who took on the burden of supervising this research.

It is worth mentioning that Bloom did not stand alone in the field of taxonomies. There are other contributors as well. David R. Krathwohl collaborated with Bloom and others in "Taxonomy of Educational Objectives: The Classification of Educational Goals, Handbook 2: The Affective Domain" (1964). There is also Raymond Nickerson (1985), as mentioned in Orlich (1998:89). Arnold B. Arons (1988) examined concepts similar to Nickerson's, and his studies raised further speculation about the role comprehension plays in learning. Many other researchers contributed pieces to this dilemma, such as Wittrock (1986), Jones (1985), Ennis (1985), Beyer (1988), Whimbey (1984), and Haller, Child and Walberg (1988); see Orlich (1998:88). Gilbert Sax (1980:) also pointed to a taxonomy prepared by Anita Harrow (1972), who wrote about the psychomotor domain.

Criticisms of the Taxonomies: Despite its widespread acceptance and use, Bloom's taxonomy has raised some persistent questions. One question is its comprehensiveness: some critics, such as Furst (1994) in Orlich (1991:89), see the taxonomy as too narrow, since it does not include all the important outcomes taught in our schools. Others have raised questions about the sequence of the levels and whether the levels in the hierarchy are discrete or overlapping. For some purposes, such as general curriculum design, the taxonomies are not sufficiently precise (Ahmann, 1975:34).
