
Higher Education 7 (1978) 221-245
© Elsevier Scientific Publishing Company, Amsterdam - Printed in the Netherlands

THE EVALUATION AND IMPROVEMENT OF TEACHING IN HIGHER EDUCATION*

MARCEL L. GOLDSCHMID
Department of Psychology and Higher Education, Swiss Federal Institute of Technology, Lausanne, Switzerland

ABSTRACT

Four procedures to evaluate teaching (by students, peers, video-recordings, and direct measurements of student learning) and three uses of the evaluation results (improving teaching, personnel decisions, course handbooks) are reviewed in the light of empirical evidence. Special emphasis is placed on the timing and validity of student ratings and the instruments used. Since none of the procedures appear sufficient in and by itself, a multiple indicator approach, especially for personnel decisions, would seem to be the most defensible one.

While it is essential to take evidence of teaching effectiveness into account in considerations for tenure and promotion, faculty must also be given opportunities to become professionals as teachers. Higher education units, designed primarily for this purpose, appear to be effective as judged by their clients (the faculty they have served), but have failed to make an impact on the faculty as a whole. What is required now is an institutional commitment to quality instruction, i.e. a departmental policy on the evaluation of teaching and faculty development.

I Introduction

A number of factors have contributed to the rapid expansion of attempts at evaluating teaching. Accountability is finally about to make its way into higher education (cf. Mortimer, 1972; Rotem and Glasman, 1977). On the one hand, budget constraints (cf. Seldin, 1976), but still large public investments, as well as increasing difficulties for college graduates to find employment (cf. Scully, 1976) are generating questions about the quality and efficiency of our teaching and learning systems. On the other, educational technology and research have produced tools and strategies to evaluate and enhance teaching effectiveness, as well as insights into the underlying processes and mechanisms (cf. Goldschmid, 1976).

* Invited keynote address presented at the Third International Conference on Improving University Teaching, Newcastle-upon-Tyne, June 8-11, 1977. An annotated bibliography "Evaluation des systèmes d'éducation supérieure" produced by L. de Marchi in preparation for this address is available from the author.


The purpose of this paper is to analyze current efforts to assess and improve teaching. Since institutional commitment and policy is emerging as a pivotal factor in the application of the available technology and appropriate use of the results, it will constitute a separate section.

II The Evaluation of Teaching

In discussing the issues involved in the assessment of teaching it is important to distinguish between the procedures and instruments applied to obtain the data and the subsequent use of the evaluation results.

1. PROCEDURES AND INSTRUMENTS

Depending on who does the evaluation we can differentiate among three procedures: student-, peer- and self-evaluations.

1.1. Evaluation by students

Although it is by far the most common practice today to ask students to provide a number of ratings on their instructors and courses, it has raised numerous queries and issues. I shall in the following analyze some of these in light of my own experience and the available research literature.

The instruments. Most faculty use a standard questionnaire developed locally. Several of the forms, such as the Kansas State University's IDEA, the Purdue Rating Scale for Instructors, SIR designed by ETS and the University of Massachusetts' TABS, however, have found wider applications. Some universities provide a catalogue of computer-scored items among which faculty choose those they find most relevant for the evaluation of their courses (Kulik, 1976). The advantages of a standard form include the possibility of computer scoring and providing norms for a given type of course (by context, level and class size, for example), enabling the instructor to compare the evaluation of his class with average ratings of similar courses.

Furthermore, a standard form usually undergoes considerable experimentation and revision before it is made available for general use. There is then some assurance that the content of the questionnaire is representative and relevant and the scaling procedure and statistical analysis adequate. Unfortunately, many questionnaires still in use are poorly designed and hard to interpret; there can be no doubt that to develop a good questionnaire requires a great deal of competence and effort (cf. Doyle, 1975; Frey, 1976; Harari and Zedeck, 1973). The individual instructor may have neither the expertise nor the time necessary to develop an appropriate instrument.
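To make the norm comparison mentioned above concrete, the following minimal Python sketch contrasts one class's mean item ratings with norms for similar courses. All item names, ratings and norm values are invented for illustration only; they are not taken from IDEA, SIR, TABS or any other instrument cited here.

# Illustrative sketch only: items, ratings and norms are invented, not drawn
# from any of the standard questionnaires discussed in the text.
from statistics import mean

# Hypothetical ratings collected in one class on a 1 (poor) to 5 (excellent) scale.
course_ratings = {
    "clarity of presentation":    [4, 5, 3, 4, 4, 5, 3, 4],
    "stimulation of interest":    [3, 3, 4, 2, 3, 4, 3, 3],
    "availability outside class": [5, 4, 5, 5, 4, 4, 5, 5],
}

# Hypothetical norms: average ratings for courses of the same type
# (e.g. same level, subject area and class size).
norms_for_similar_courses = {
    "clarity of presentation":    3.9,
    "stimulation of interest":    3.6,
    "availability outside class": 4.1,
}

for item, ratings in course_ratings.items():
    course_mean = mean(ratings)
    norm = norms_for_similar_courses[item]
    print(f"{item:28s} course mean {course_mean:.2f}  "
          f"norm {norm:.2f}  difference {course_mean - norm:+.2f}")

Such a print-out would simply tell an instructor where his class stands relative to comparable courses; it carries no information about why an item is above or below the norm.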


On the other hand, there are also advantages in designing a special instrument for a particular course. It allows the instructor to obtain feedback on specific and perhaps unique features of his course (cf. Dufresne, 1975). The standard questionnaires often provide general results which do not indicate precisely what aspects of the course design or presentation should be modified (cf. Pohlman, 1975; Sherman and Winstead, 1975).

At our institute we have therefore adopted the following procedure (designed only for the purpose of improving instruction): the faculty member can use our 32-item computer-scored questionnaire, delete any questions he thinks are inapplicable and add up to seven (also computer-scored) of his own. In addition, the students are asked to provide written comments on all questions and on the course in general. The computer print-out is returned to the instructor within a few days. Although he is the only one to see the results, we strongly recommend that the teacher discuss the evaluation with his students (Champagne, 1976). Upon an individual instructor's request and with his collaboration we also design specific evaluation instruments where the standard form is felt to be inadequate.

Several authors have proposed alternatives to the questionnaire. Rumery et al. (1975), for example, have argued in favor of regular student reports based on observations in class. Other instructors use open-ended questions or simply ask their students to write an essay on or description of the course (Silberman and Allender, 1974). While the instructor may thus glean useful information, it may be difficult to get an overall impression on any one aspect of the course, nor is there any assurance that all important features of the course receive the necessary attention.

Timing of the evaluation. Unfortunately many instructors carry out course evaluations only at the end of the term. It is obvious that the students in those classes are then not going to benefit from any modifications the instructor may introduce as a result of the feedback obtained. In line with an emphasis on formative rather than summative evaluation (cf. Sherman and Winstead, 1975; Falk, 1977), several authors (Centra, 1972; Gage, 1974; McKeachie and Lin, 1975; Pambookian, 1976) have provided evidence that course evaluations in mid-semester can bring about changes in teaching practices, especially when the instructor rates himself more favorably than his students.

It may also be useful to ask for an evaluation from former students a few years after the course was given. After a more extended educational and perhaps some professional experience the perspective may be different. Those instructors, though, who may, in the light of unfavorable student ratings immediately after the course, shrug off the results by saying that students will better appreciate them later on, should be advised that Sheffield's (1974) findings do not support such a claim. Teachers evaluated unfavorably by graduates several years later were in general rated similarly by their present students. Similar findings were presented by Drucker and Remmers (1951) and Aleamoni (1974).

Reliability and validity. The evidence available suggests that students' ratings are reliable (cf. Aleamoni, 1974b; Costin et al., 1974; Doyle, 1975; Miller, 1972). With respect to validity, Gage (1974, p. 76), after reviewing several correlational studies, came to the conclusion that these studies "offer some support for the validity of students' ratings as indicators of how much students have learned, as objectively measured and adjusted for student aptitude." Subsequent to this review, Centra and Rose (1976) and Frey et al. (1975) also found correlations between student achievement and course ratings.

Further evidence of the validity of student ratings was provided by several investigators (e.g. Aleamoni and Yimer, 1973; Greenwood et al., 1973; Hildebrand and Wilson, 1970) who found a great deal of agreement between students and faculty on what constitutes effective teaching.

Several experimental studies addressed themselves to the question of the students' competence to judge their instructors' teaching. Zelby (1974) asked students to rate two courses he taught each by two different methods. Since he judged one approach to be educationally more valuable, while the students preferred the other, he concluded that "teaching for a good Student-Faculty-Evaluation may be inconsistent with the best educational practice" (p. 1269). As Gage (1974) points out, Zelby provided no evidence, however, on how well he performed in the two classes.

The authors of the famous "Dr. Fox experiment" (Naftulin et al., 1973, p. 630) concluded that "student satisfaction with learning may represent little more than the illusion of having learned." Given the fact that the instructional arrangement they used was very atypical of the usual classroom situation, it is questionable whether their findings can be generalized to the normal student-evaluation procedures.

In subsequent experimental studies of the "Dr. Fox effect," which Williams and Ware (1976, p. 48) defined as "lack of correspondence between ratings and substance of instruction under high-expressive conditions," they obtained the following results:

(a) presenting a lecture in an enthusiastic manner produces more student learning when initial motivation is low;

(b) differences in the amount of information provided in the lecture correspond to differences in actual student learning;

(c) student ratings are accurate with respect to the amount of informa- tion presented and expected student learning when the lecture is not given in an enthusiastic manner; and

(d) the latter does not hold when the lecture is given with enthusiasm even after a second exposure.


They concluded that "until student rating scales are constructed so as to be valid with respect to differences in faculty information giving, we suggest that the best (if not the only) way to evaluate such differences (emphasis added!) in faculty is direct observation by trained evaluators and the best (if not the only) way to evaluate student achievement is an achievement test in conjunction with proper controls" (Ware and Williams, 1975, 1976; Williams and Ware, 1976, 1977).

Most reviewers (e.g., Aleamoni, 1974b; Centra, 1973; Costin et al., 1971; Doyle, 1975; Falk and Lee Dow, 1971; Flood Page, 1974; Gage, 1974; Grush and Costin, 1975; Menges, 1974; Miller, 1972, 1975; Murray, 1973; Seldin, 1976; Scott, 1975; Subkoviac and Levin, 1974) of the rich literature on teaching evaluation, however, have come to the conclusion that students are competent to rate instruction at least in some areas (see 2.1.) and that their judgment, while it should not be determinant, should be taken into account in administrative decisions (see 2.2.). Following are some samples of those conclusions:

"If teaching performance is to be evaluated, either for purposes of pay and promo- tion or for individual improvement, a systematic measure of student attitudes, opinions, and observations can hardly be ignored. The data which have been review- ed strongly suggest that the use of formal student ratings provides a reasonable way of measuring student reaction" (Costin et al., 1971). "'Feedback' provided by student assessment is valuable as information for lec- turers. Where it is analyzed and acted upon it can contribute to increased efficiency in many of the means of presenting material to be learned. It is clear that the objectives of courses will be furthered if notes are legible, the spoken work audible, work planned in logical sequence and appropriate in amount to the time allowed to study it, if books referred to are available and examinations and tests are well constructed and equitably marked. If students find these conditions are not ful- filled it is useful to know. It is also important that lecturers and students should share an understanding of the objectives of their courses. Student assessment may reveal that an assumption that this common understanding exists is mistaken. Student assessment can be used as a reliable statement of students' perceptions of the teaching they are receiving even if they cannot be proved to be, by them- selves, valid judgements of the quality of teaching upon which individual teachers can be appointed and promoted" (Falk and Lee Dow, 1971, p. 25). "Student evaluation is the most valid, reliable and defensible tool for faculty appraisal" (Miller, 1975). "Provided that the data are gathered carefully, reported appropriately, and inter- preted judiciously, student evaluation appears able to make a useful contribution to personnel decisions, course improvement, and possibly, to student advising" (Doyle, 1975, p. 86).

Besides the empirical evidence, one can also invoke a logical justification for student evaluations (Gage, 1974). As Seldin (1976, p. 76) put it, "the opinions of those who eat the dinner should be considered, if we want to know how it tastes!" Students experience their teacher's behavior more personally and directly and are more affected by it than any other potential group of observers. Nevertheless, I would argue, as do Gage (1974), Kulik and McKeachie (1974), Scott (1975), Thomas (1976), Williams and Ware (1976) and others, that evaluation by students, while necessary, is not sufficient. Other means, in addition to student evaluation, must be used, if one wants to arrive at a more complete assessment of teaching effectiveness. There are several areas which are difficult for students to judge, for example, the level, amount and accuracy of the information presented, the relevance of the content with respect to the subsequent curriculum or the teacher's competency in the subject matter. As Gage (1974, p. 72) rather poignantly put it, if they had these competencies, "one would question whether they should be taking the respective courses."

In other words, evaluation by students should focus on behaviors and elements they can observe, for example, the clarity and variety of oral, written and audio-visual presentations, whether or not they were actively involved in the learning process and stimulated to think about the subject matter, their prior motivation for the course and the interest it generated, the degree of difficulty they experienced in understanding the course con- tent and their reaction to the teacher's assessment of their work.

1.2. Evaluation by peers

Peer evaluation has been the most widely used practice to judge the quality of a great variety of professional practitioners (doctors, psychologists, lawyers, etc.). Only very recently has there been a trend towards consumer involvement and evaluation in these fields. Instances of peer assessment of university teaching, however, have been very rare. The classroom remains a private domain and in the name of academic freedom, public scrutiny, even in the form of peer assessment, is taboo.

Yet, peers could contribute substantially to course evaluations (cf. AAUP, 1974; Beaird, 1975; Gage, 1974; Edwards, 1974). Their subject-matter expertise, their familiarity with the curriculum, their professional experience, and their own teaching practice are all characteristics which could and should be brought to bear on teaching assessment.

It should be emphasized though that here, too, careful procedures which guarantee fairness, continuity, and reciprocity are indispensable. A one-hour visit, for example, by a senior colleague or administrator hardly provides for an acceptable climate and appropriate sampling (AAUP, 1974). What I have in mind are mutual and repeated observations in the classroom, evaluations of course materials and organization, examination and grading procedures, and student learning by a group of peers. Peers, then, could complement the students' evaluation, particularly in those areas where the students' competence to judge can be questioned.


1.3. Self-evaluation

One procedure which, besides student ratings, has gained some acceptance as a means to assess teaching effectiveness is self-evaluation, mostly by means of video-recordings (cf. Perlberg, 1976). At our institute, for example, we have recently addressed an open invitation to all faculty to have their classes (lectures, seminars, labs, etc.) recorded on video-tape whenever they like. Subsequently, they may view the recordings privately, with colleagues, students, and/or instructional consultants. We also provide them with a checklist which is designed to help them look for specific elements and assess their course. Occasionally, we organize seminars where (with the instructors' consent) we review and discuss a montage of revealing excerpts of the participants' classroom performance. Although, as of yet, only a relatively small number of instructors have made use of this possibility, they have expressed their satisfaction and desire to repeat the recording in order to review the progress they have made. This evaluation procedure is clearly designed to provide direct and concrete feedback to the instructor which he can use to improve his performance, rather than to assess his teaching qualifications in promotion or tenure decisions.

Using at least two cameras and a split-screen technique, furthermore, the students' reactions and behaviors could also be taken into account. It is a procedure which could easily be combined with student and peer evaluation, if one asked those groups to view and comment on the playback. Again, I do not believe that this evaluation procedure is sufficient in and by itself. At best it represents a valuable complement to the other two I have described.

1.4. Evidence of student learning

Teaching does not represent an end in itself; it is rather a means to facilitate learning and generate interest in a subject. By analogy, the evaluation of teaching without reference to student learning and interest is incomplete. In current practice, it is assumed that teaching effectiveness as judged by students, peers and the instructor himself, is related to student achievement and attitudes. Certainly with respect to attitudes, student ratings provide a more direct measure, but even here, we need to remember that students' ratings may be influenced by prior attitudes and expectations about the course content and teaching methods (Crowe and Feldhusen, 1976; Centra and Linn, 1976), needs (Tetenbaum, 1975) and situational variables (Abrami et al., 1976). As we have seen, there is also evidence that students' ratings correlate with student achievement.

A more direct assessment of student learning, however, would be preferable, but is not easy to come by. By simply referring to the grade distribution at the end of the course, for example, we could hardly talk about a direct measure of what the student has learned in that course. We would need to know what the student's aptitude and competence in the subject matter were at the beginning of the course, whether his grade was a function of those of his peers (norm referenced) and at what level the objectives were set.

In order to overcome these difficulties, as well as remedy the deficiencies in traditional evaluation procedures, such as student ratings, some authors (e.g. Rose, 1976) have advocated a criterion-referenced measurement (cf. also Feldhusen et al., 1974; Goldschmid, 1976; Meredith, 1975; Whitely and Doyle, 1976). It requires explicitly stated objectives and evaluation criteria prior to instruction. What is measured, then, is the result of instruction in terms of behavioral change toward these objectives according to absolute standards. Even though we cannot easily compare one instructor's teaching effectiveness with that of another since he may set different standards and/or objectives even for the same content, at least we know specifically what the students have learned in a particular course. Rose (1976) recognizes that such an approach would require expertise and time-consuming educational programs. Faculty and administrators, for example, would need to be trained in the formulation of explicit objectives and the construction of criterion-referenced tests. Yet she argues that "The advantages of the system . . . outweigh its disadvantages. The process may initially be difficult, but the payoff is well worth the effort. By defining specific objectives before instruction, teachers will more likely use relevant instructional materials to help students attain the objectives. The process of defining objectives requires instructors to think through exactly what they want to teach, and they are then more likely to aim for truly worthwhile goals. Also, students' progress can be continuously monitored so that instruction can be improved while the course is in progress, not after the fact."
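As a purely illustrative sketch of the criterion-referenced logic just described, the short Python fragment below reports, for each explicitly stated objective, how many students reached an absolute mastery standard. The objectives, thresholds and scores are hypothetical and are not drawn from Rose (1976) or any actual course.

# Illustrative sketch only: objectives, standards and scores are invented.

# Each objective is stated before instruction together with an absolute
# standard (the proportion-correct score treated as "mastery").
objectives = {
    "solves first-order differential equations": 0.80,
    "interprets phase diagrams":                 0.70,
    "sets up a numerical integration":           0.75,
}

# Proportion-correct scores per student on the items testing each objective.
scores = {
    "solves first-order differential equations": [0.90, 0.85, 0.60, 0.95, 0.70],
    "interprets phase diagrams":                 [0.50, 0.80, 0.75, 0.90, 0.65],
    "sets up a numerical integration":           [0.80, 0.80, 0.70, 0.90, 0.85],
}

# Criterion-referenced report: how many students reached each absolute
# standard, independently of how their classmates performed (no curving).
for objective, standard in objectives.items():
    attained = sum(score >= standard for score in scores[objective])
    total = len(scores[objective])
    print(f"{objective:44s} standard {standard:.2f}  "
          f"attained by {attained}/{total} students")

The point of such a report is that it describes what was learned against the pre-stated objectives, not where a student stands relative to his peers' grade distribution.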

It is appropriate to conclude this section with Rose's final paragraph:

The evaluation of teaching is a far too serious and complicated process to be based solely on the personal assessment of administrators, the judgment of visiting peers, an examination of course syllabi or teaching methods, or student opinions. Each of those is useful, but none is sufficient by itself. We must recognize the deficiencies in these approaches, continue to explore and refine direct measures of learner growth, and, until the perfect evaluation system is found, if ever, use multiple indicators for assessing faculty performance.

2. THE USE OF EVALUATION RESULTS

While originally instructors administered questionnaires to their students to obtain feedback on their courses primarily to improve them, there has recently been a marked shift toward evaluations in the sense of accountability and considerations for promotion and tenure (Bejar, 1975; Seldin and Wakin, 1975). Occasionally students have used evaluation data for a third purpose, i.e. to arrive at a course handbook, sometimes referred to as a "counter- or anti-calendar". We shall consider each of these uses in turn.

2.1. Evaluation to Improve Teaching

There can be little doubt that especially in the long run the use of evaluation data for the purpose of improving teaching, i.e. improving student learning (Goldschmid, 1976), is the most valuable (Hutchison, 1974; Rotem and Glasman, 1977). As has already been pointed out, the quality of the procedure and instruments determines to a large extent the specific and direct use a faculty member can make of the results when he seeks to upgrade his instruction. Above all, the feedback needs to be specific: it must indicate precisely what instructional elements require modification (cf. Frey, 1976; Menges, 1974; Pohlman, 1975; Sherman and Winstead, 1975).

There are two other conditions, however, which have also to be met, if teaching evaluations are to have an effect. First, the instructor must care about the feedback and be motivated to improve teaching (Rotem and Glasman, 1977). Second, he must be able to bring about the desired changes (cf. Goldschmid, 1976).

Some universities have required course evaluation for a long time without having shown that this obligation in and by itself has generally led to teaching improvements. Little evidence (aside from that presented earlier on the effects of using a questionnaire at midterm on the final ratings [cf. 1.1]) is in fact available on the relationship between student evaluations and instructional improvements (Goldschmid, 1976; Kulik and Kulik, 1974; McKeachie and Lin, 1975). If there are no further consequences, an instructor or institution can easily fall into a routine of collecting student ratings without bothering to analyze and use them as guides to ameliorate teaching. While students have generally been very cooperative in completing course ratings, they may be less inclined to do so in the future, if no instructional changes occur (cf. Flood Page, 1974).

But even if a faculty member is willing to act upon the feedback obtained from his students, he may not know how to go about it. Let us not forget that university teachers typically are not trained in instruction either before or after they are appointed (Goldschmid, 1976; Huberman, 1974). They therefore may need the advice of colleagues and instructional consultants on how to modify their approach (AAUP, 1974; Centra, 1973; Sherman, 1976). Aleamoni (1974a) indeed found that results from computer-scored questionnaires together with individual consultation sessions did bring about important positive changes in class performance as measured by subsequent student ratings.

In other words, evaluation for improvement is no simple matter. Unless there exists a climate of concern for effective teaching and the necessary resources to accomplish it, the best evaluation procedures are of little use. We shall come back to this point when we consider the institutional commitment to quality instruction.

2.2. Evaluation to rate instructors

One aspect of accountability in higher education involves the rating of instructors for administrative purposes, for example, in tenure and promotion decisions. This is a delicate matter indeed. On the one hand, one can argue that unless evaluation data are taken into account in such decisions, faculty may be reluctant to seek such data in the first place and, if the data are required but not considered, may not get involved in their analysis and use for improvement. On the other hand, evaluations for administrative purposes may be seen as undue pressure and a menace to academic freedom. Given such an outlook, there is substantial risk that faculty will become hostile and defensive (Kerlinger, 1971) and seek ways to undermine the process. Thus, course evaluations could easily become counterproductive.

Several authors (e.g. Sherman, 1976; Tetenbaum, 1975; Zelby, 1974) fear that faculty, particularly given the tight job market for academics, may teach to get good student evaluations, if such data are required and taken into account in personnel decisions. As Sherman (1976, p. 38) put it "students can be tricked into giving high ratings through highly visible changes of questionable importance." Rich (1976) in an empirical study in California, on the other hand, found that a large majority of faculty favor the use of student evaluations in tenure and promotion decisions. There were some differences, however, among his four samples (college, small private college, state college, and research university). Interestingly enough faculty at primarily teaching institutions were the least favorable, as were older teachers and those with more publications.

Given this background, several considerations might be borne in mind in deciding on whether and how to use evaluation data for administrative purposes. First, as has already been pointed out, the validity of any one procedure is probably not sufficient to arrive at an acceptable judgment. A combination of procedures or "multiple indicator" approach would therefore be highly desirable. Second, one must carefully consider the context of instruction. Several reviewers (cf. Aleamoni, 1974b; Centra and Creech, 1976; Costin et al., 1971; Frey, 1976; Frey et al., 1975; Gage, 1972; Kulik and Kulik, 1974) have found that certain course characteristics beyond the instructor's control, such as the level and subject area of the course, the size of the class, and whether the course was elective or required, as well as teaching experience, and expected course grade, can influence the students' ratings. The faculty's research productivity and publications, on the other hand, appear not to be correlated with student ratings (cf. Aleamoni, 1974; Gage, 1974). Ratz (1975) found that "service courses" for students in other departments tend to receive lower ratings.

On the basis of such findings, Gage (1974) suggests that we abstain from making comparisons among instructors and only report the mean ratings on a given scale for a particular course. Centra and Linn (1976) argue that teachers must also look at the distribution of students' responses on each item, not only at the average. Gage (1974, p. 81) adds the following recommendations:
- "Administrators should use the ratings as bases for decisions only in cases where they are extremely unfavorable or extremely favorable.
- The consistency of ratings over years and courses should be taken into account", and finally:
- "Only general, overall evaluations are needed for administrative purposes."

Referring to Smock and Crooks (1973, p. 81), Gage would restrict the evaluation to no more than five overall ratings which "would be meaningful for all courses, whatever the subject matter, teaching method, or class size." Centra and Rose (1976) have also argued that global ratings are more defensible than ratings of specific practices. Kane et al. statistically analyzed the generalizability of student ratings. They came to the conclusion that "the variability of student responses is much larger than the variability within students. In order to obtain dependable estimates of student opinion of instructional effectiveness, it is more important to have a large sample of students than it is to have a large sample of items" (1976, p. 182).
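The sampling point made by Kane et al. can be illustrated with a rough calculation. In the Python sketch below the two variance components are invented values chosen only to mimic the reported pattern that variation between students far exceeds variation within students; under that assumption the standard error of the class-mean rating shrinks much faster when more students are sampled than when more items are added.

# Illustrative sketch only: the variance components are invented, not taken
# from Kane et al. (1976); they merely reproduce the qualitative pattern.
from math import sqrt

var_between_students = 0.60   # assumed spread of overall ratings across students
var_between_items    = 0.05   # assumed spread across items for a given student

def se_of_class_mean(n_students, n_items):
    """Rough standard error of the class-mean rating when averaging
    n_items ratings from each of n_students students."""
    return sqrt(var_between_students / n_students
                + var_between_items / (n_students * n_items))

for n_students in (5, 15, 50):
    for n_items in (3, 10, 30):
        print(f"{n_students:3d} students x {n_items:2d} items: "
              f"SE of class mean = {se_of_class_mean(n_students, n_items):.3f}")
# Under these assumptions, adding students reduces the error far more
# than adding items does.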

Little evidence is available on how administrators actually use evaluation data. McKeachie and Lin (1975) found that information from student ratings of teaching was in fact not used in decisions affecting promotion and salary. Similar results were obtained by Thorne et al. (1975). Thus, one issue which definitely needs further investigation is the decision-making process in staffing decisions in academic institutions.

2.3. Student course handbook

From time to time the student association on any given campus may decide "to do something" to improve the quality of instruction. It is not uncommon that they then attempt to evaluate all the courses on their own and publish the results in a course handbook or "anti-calendar" (in some cases such accounts have been accompanied by the teacher's comments; for examples, see Flood Page, 1974). Although ostensibly such information could help students select their courses, these reports often serve to publicly denounce the inadequacies of some teachers.

All that has been said so far points to the complexity and difficulties or "pitfalls and pratfalls" (Popham, 1974) involved in the evaluation of teaching. Students usually do not have the necessary time and expertise to devise appropriate instruments and procedures, nor do they have the resources to adequately analyze and report their data. The result may be unfair comparisons and labeling of teachers which in turn create hostility and bitterness among the faculty. In this case too, the whole venture may become counterproductive. Rather than being incited to improve his instruction - the primary purpose of the evaluation - the teacher may become defensive and insecure. Furthermore, it should not be overlooked that, while the students who did the evaluation and reporting will have left the institution in a short time, the instructor may suffer for years from the publicized report of his teaching. In an experimental study, Perry et al. (1974) demonstrated that prior evaluations of a lecturer which are communicated to students may affect their ratings regardless of actual performance in a subsequent course.

While the students' reactions, together with other data, should be considered both for the improvement of teaching and the evaluation of the instructor, I do not see what constructive purpose would be served by publishing the results obtained by the students (or for that matter by the faculty or administrators), even if the evaluation procedure used were adequate. Doyle (1975) has proposed that such data might be made available to students for the purpose of orientation and guidance through an advisory agency, which would have the responsibility to properly interpret and regularly update and revise them.

On the other hand, if students decide to evaluate their courses and publish the results, it most probably is a signal that they are unhappy about the instruction they receive. They may perceive such an evaluation as a means to pressure the faculty to take their opinions seriously and to make efforts to improve their instruction. Student pressure as well as public demands for accountability serve to move institutions and faculty to recon- sider their policy and practice with regard to teaching.

Before moving on to the next section, it should be noted that despite the widespread use of student evaluation and the abundant literature supporting it, there are still many faculty who oppose it (see for example Madox, 1975; Rhodes, 1976; Rodin, 1975) regardless of whether it is intended for the improvement of teaching or administrative decisions. A number of authors have discussed their objections in the light of available evidence (see, for example, Aleamoni, 1974b; Centra, 1973; Hildebrand, 1972; Huberman, 1974; Seldin, 1976).

Finally, Johnson et al. (1975) have argued that the absence of a theoretical framework for the analysis of teaching makes it impossible to use current evaluation methods. They insist on a normative conception of teaching and propose a model for its assessment which stipulates instructional intent as a key factor. Feldhusen et al. (1974) also maintain that a model of instruction must serve as the basis for course and instructor evaluation. Sheehan (1976) proposed the use of several types of ratings within a theoretical framework for improving university teaching.


III Efforts to Improve Teaching

Taking evidence of teaching effectiveness into account in considerations for recruitment, tenure and promotion constitutes a step in the right direction, but it represents only one condition which must be met if we want to arrive at substantial improvement. Faculty must also be given the opportunity to learn more about the instructional process and become professionals as teachers (Goldschmid, 1976). With this in mind, a number of universities have established staff development programs often organized by special units, which have become known under a variety of names, such as higher education advisory and research centers, pedagogical service units and staff or faculty development institutes. By offering a documentation service, courses, workshops, seminars and consultation on teaching, as well as engaging in research on instruction, these units are supposed to provide the impetus for teaching improvement efforts.

Since various accounts of these units and staff development programs exist (cf. Alexander and Yelon, 1972; Berquist and Phillips, 1975; Centra, 1972; Falk, 1970; Gagnon, 1976; Goldschmid, 1976; Group for Human Development in Higher Education, 1974; Hore, 1976; Miller, 1977; Wergin et al., 1976) we can forego a detailed description of their activities here. In the context of this paper, we are more concerned with an evaluation of their function. Unfortunately, no reports of a systematic assessment of their effectiveness are available. I shall therefore in the following rely upon my personal experience in staff development and on contacts with many of these units. No doubt my recent exploration of the higher education scene in Australia and New Zealand has influenced my outlook.

At the outset, I would like to distinguish between effectiveness and impact. Most, if not all, of these units were designed to make an impact on their respective institutions in the sense that they were supposed to bring about massive improvement in the teaching practices of the universities they serve. There is no doubt that such expectations were unreasonable given, on the one hand, faculty resistance and lack of training, minimal institutional support and commitment, and lack of recognition for quality instruction and, on the other, the severe constraints placed upon the units' activities by the limited personnel and resources available to them. It is not uncommon to find only a handful of academics in these units designed to serve potentially several hundreds or in some cases thousands of faculty members.

Even considering some of the "older" units which have been in existence in Australia for almost twenty years, one does not get the impression that they have had an impact on their institutions. The number of faculty who use any of the services of these centers has been variously reported to me to represent approximately 1% to 10% of the total number of instructors. In addition, the observation is often made that many of those who seek advice and participate in seminars are among the best and most concerned teachers and possibly need help less than the others. Nor is there evidence that teaching practices and concern for quality instruction in general have significantly altered on campuses where there is such a unit. Finally, teaching innovations remain rather isolated exceptions.

This is not to say, however, that these units have not been effective. On the contrary, the ratings of services provided by the centers as judged by their "clients" (i.e. the faculty they have served) appear to be generally very positive. Given the highly unfavorable odds to start with, one might say that a number of these centers have done remarkably well. A recent set of recommendations by the Federation of Australian University Staff Association (FAUSA, 1977) bears witness to this effect:

(i) FAUSA supports moves to improve the resources and processes of teaching, learning and research in the Australian universities.
(ii) FAUSA accepts that Higher Education Research Units have a proper contribution to make to the improvement of teaching and learning in universities.
(iii) FAUSA supports in particular those Higher Education Research Units activities which create awareness of new methods and resources and their effective application in teaching programmes.
(iv) The academic staff of Higher Education Research Units should not be precluded from engaging in individual research or in teaching programmes in other university departments.
(v) FAUSA believes that academic staff in Higher Education Research Units should be employed under the same conditions as other academic staff.
(vi) FAUSA opposes any move to use Higher Education Research Units in the evaluation of the teaching performance of any member of the academic staff without the consent of the member concerned and then such evaluation shall be confidential to the said member.*

* It is clear from this recommendation that FAUSA is concerned lest the staff of these units become "inspectors" of teaching practices. Such a role is hardly compatible with that of a facilitator or consultant. Clearly the higher education centers were designed to help improve teaching rather than to evaluate it for the purpose of personnel decisions.

Visiting a number of these centers one is aware of a feeling of despondency, even resignation. What started out as a hopeful and promising venture to bring about rapid improvements is now perceived in a broader context. The formidable obstacles to significant progress, especially in the institutional sphere, are becoming more and more apparent. Originally, setting up these units appeared to be in and by itself the strategy to improve teaching, but now they are seen more realistically as one element or instrument in the long and arduous process of bringing about profound change. In any case, a systematic evaluation of the higher education units will prove difficult. While we must take the satisfaction of their clients into account, we should probably also consider their indirect effects on student learning. In this case, even if it were possible to demonstrate a direct influence of a consultation or a seminar on a teacher's behavior, it would remain to be shown what change the instructional modification produced with respect to student learning. Occasionally, it is possible to follow from the initial problem an instructor presents to a unit right through to the measurement of student learning as a result of a new instructional approach inspired by the unit-faculty exchange. In the framework of our research on individualized instruction, for example, we have been able to analyze this process and demonstrate a positive change in student learning and attitudes (cf. Brun and Goldschmid, 1976, 1977a, b; Brun et al., 1977; Duchastel et al., 1977; and Fivaz and Goldschmid, 1977). Such a consultation-evaluation approach, furthermore, provides a means to both evaluate teaching effectiveness with respect to student learning (cf. 1.4) and to guide improvement efforts.

Besides requiring course evaluations, setting up higher education units and some tentative moves towards faculty development, one can point to few institutional strategies which have been proposed to improve teaching. A few universities have set aside special budgets to make small grants available to faculty and sometimes have even given "time-off" for the improvement of teaching. On the whole, it seems that, while these policies have helped some instructors, the vast majority remains untouched and unmoved. If this preliminary and rather subjective analysis of our current situation in higher education with respect to teaching has any validity, it would seem obvious that other strategies - complementary ones - must be found to improve teaching.

IV Institutional Commitment and Evaluation

What is required now, I believe, is an institutional commitment to and a policy of making quality instruction a priority and long-term concern (cf. Goldschmid and Goldschmid, 1976). Public demands for accountability, social pressures, and students' reactions in whatever form (drop-outs, lower enrollment, anti-calendars, etc.) are unlikely to subside, until and unless profound changes are made in our institutions of higher learning.

In the following, I shall briefly outline a plan designed to bring about such an institutional commitment to quality instruction. The smallest effective unit where the most important decisions are made in the university is the academic department. Recruitment, tenure and promotion of staff, and the curriculum are heavily influenced if not completely decided by the department. (In some institutions the pattern may vary somewhat; for example, the main decisions may be made at the level of a Faculty or Division). It is at this level that teaching must be discussed and improvement strategies considered (cf. AAUP, 1974).


The definition of effective teaching, how it can and will be measured and how the evaluation data will be used are matters on which an institutional consensus must first be reached (see also Beauchesne, 1972; Clark et al., 1977; Feldhusen et al., 1974; Menges, 1974; Norr and Crittenden, 1975). Just as important, the department must then develop a program for improving its instruction and make provisions for its application. Needless to say, both matters need to be discussed at length and in depth. Administrators must insist that departments engage in this activity and help identify and secure means and resources to carry out departmental policy. Students too must be included in these discussions as valuable partners who are vitally concerned with these issues. The departmental decisions would then be made known and subjected to public scrutiny. At regular intervals the policy would be reviewed and its effects evaluated and discussed, and if necessary, revised. It is likely that the policies arrived at will differ (much as is the case in other areas) from department to department within a university and across universities, but such a divergence would only reflect the fact that there is no one way to teach (cf. Sheffield, 1974) nor one way to improve teaching.

The basic idea is to personally involve each faculty member in the policy elaboration and its application. The evaluation and improvement of teaching, then, would no longer be the sole concern of a few committed teachers, the higher education unit, or some committee, but would become a vital issue for every faculty member since his status in the department will be at stake.

If such a policy existed, higher education units could fully play their role in helping to develop and carry out evaluation and improvement programs. Other means may well evolve as well, for example, the creation of a department- or Faculty-based consultancy and research service, a regular inservice- and, hopefully, a preservice training-program as well, to prepare future teachers in higher education (cf. Group for Human Development in Higher Education, 1974; Vattano, 1975).

Much more needs to be said about such a fundamental policy change and how it can be brought about. At best the thoughts I have expressed here will encourage others to search for constructive strategies which will bring about quality instruction in higher education.

Despite the very large investments in higher education, there is as of yet little literature available on institutional evaluation and most of what there is is theoretical. Davis (1976), for example, points to the difficulty of even agreeing on acceptable dimensions to be measured, such as financial status, educational policy or impact on students. Dressel and Lorimer (1961) and Genova et al. (1976) have presented comprehensive plans on how to go about an institutional self-evaluation which include suggestions as to the people involved and the procedures to be used. In Genova's proposal all four major groups, the faculty, students, administrators and governmental authorities, would participate in the evaluation both as evaluators and as persons to be evaluated. Other authors (e.g. Heaton, 1975) have recommended the application of MBO (Management by Objectives) to higher education. It calls for an active participation by all constituents and evaluation of performance based on criteria which have been set by the individuals concerned at each respective level. Finally, several authors (e.g., Lumsden, 1974; Roberson, 1971; Stake, 1976) have discussed possible ways of evaluating and improving the efficiency of educational institutions by taking into account economic and political considerations.

V Summary and Conclusions

1. THE EVALUATION OF TEACHING

The evaluation of teaching in higher education has become a major issue. It is important to separate problems referring to the procedure used and those involved in the use of the evaluation results.

1.1. Instruments and procedure

Depending on who does the evaluation we can distinguish between student, peer, and self-evaluations. Each has advantages and drawbacks; none is sufficient in and by itself if we want to arrive at a complete assessment of teaching effectiveness.

1.1.1. Evaluation by students. Empirical evidence and logical considerations support the use of student evaluations. Student ratings are correlated with student achievement and faculty ratings of teaching effectiveness. Especially experimental studies, but theoretical reflections as well, point to the limits of the students' competence to judge instruction. Their feedback is particularly valuable if it focuses on observations of teaching practices and their attitudes and behaviors prior to and in response to instruction, while their competence to judge the level and accuracy of the course content is questionable.

1.1.2. Peer evaluation. Although rarely used, peer evaluations could make a substantial contribution particularly in those areas where the student ratings are less valid and reliable.

1.1.3. Self-evaluation. By means of video- and audio-recordings an instructor can observe his own performance and pinpoint deficiencies he can subsequently try to remedy.

1.1.4. Student learning. Although teaching evaluations are assumed to be related to measures of achievement, a more direct assessment of student learning would be desirable. The criterion-referenced measurement approach represents a promising alternative to arrive at such an evaluation.

In any case, whatever procedure is followed, it must be well designed. In particular, it requires appropriate instruments, representative samples and an adequate analysis and presentation of the results.

1.2. The use of evaluation results

Most often, the evaluation of teaching has been designed to provide feedback to the instructor to help him improve his instruction. Increasingly, student evaluations are used in recruitment, tenure and promotion decisions. Occasionally, students evaluate their courses to publish a course handbook.

1.2.1. Evaluation for Improvement. In order to be helpful to the instructor, such evaluations must indicate precisely what elements of a course require modification. Mid-term evaluations, as well as consultations with peers or instructional specialists, may help the instructor to improve an on-going course. Even a well-designed evaluation procedure is of little use, however, if the instructor is unwilling or unable to act upon the feedback he receives. On the whole, little evidence is available that would demonstrate that student evaluations have resulted in teaching improvements.

1.2.2. Evaluations for personnel decisions. A "multiple-indicator" approach, which includes student ratings, i.e. the combined use of several sources of data, appears to be the most defensible one for the purpose of administrative decisions. In contrast to the evaluation for improvement, a few general ratings are more appropriate. In addition, only results derived over a period of time, and from representative samples, should be considered. Careful attention must be paid to course characteristics which may influence the student ratings, but which may be beyond the instructor's control.

1.2.3. Course handbooks. Making the results of teaching evaluations generally available would seem to serve little purpose. Such reports may unfairly label instructors and create a climate of hostility and defensiveness.

2. THE IMPROVEMENT OF TEACHING

While consideration of teaching effectiveness in personnel decisions enhances the motivation to improve instruction, other conditions must be met as well, if we want to arrive at quality instruction. Faculty must be given the opportunity to become professionals as teachers and learn more about the instructional process, particularly in light of the fact that they typically do not undergo training either before or after they are appointed. A number of institutions of higher education have therefore established staff development programs usually organized by higher education units. While these units appear to be effective, in the sense of providing satisfactory service to their clients - the faculty who participate in their seminars and seek their advice, etc. - they have not achieved impact - in the sense of bringing about massive improvement efforts on their respective campus.

3. INSTITUTIONAL COMMITMENT

An institutional commitment to make quality instruction a priority and long-term goal is now required. It is proposed that academic departments be required to seek a consensus on what constitutes effective teaching, on the criteria and evaluation procedures and on how the evaluation results are to be used in personnel decisions. The departments must also establish a program for the improvement of teaching and with the help of the administration allocate the necessary personnel and resources to carry it out. Such an explicit policy on teaching effectiveness and improvement which would include provisions for its enforcement seems necessary to make more substantial progress to improve instruction in higher education. In other words, we must move from an individual to an institutional commitment to quality teaching.

References

Abrami, P. C., et al. (1976). "Course evaluation: How?," Journal of Educational Psychology, 68: 300-304.

Aleamoni, L.A. (1974a). "The usefulness of student evaluations in improving college teaching." Office of Instructional Research and Development, University of Arizona.

Aleamoni, L. A. (1974b). "Typical faculty concerns about student evaluation of instruction." Paper presented at the Symposium on Methods of Improving University Teaching, Technion Institute of Technology, Haifa, Israel.

Aleamoni, L. A. and Yimer, M. (1973). "An investigation of the relationship between colleague rating, student rating, research productivity and academic rank in rating instructional effectiveness," Journal of Educational Psychology, 64: 274-277.

Alexander, L.T. and Yelon, S. L., eds. (1972). Instructional development agencies in higher education. East Lansing: Michigan State University Press.

American Association of University Professors; Committee C on Teaching, Research and Publication (1974). "Statement on teaching evaluation," AAUP Bulletin, 60: 166-170.

Beaird, J. H. (1975). "Colleague appraisal of faculty performance," in: Scott, C. S. and Thorne, G. L., eds., Professorial assessment in higher education. Monmouth, Oregon: Teaching Research Division, Oregon State System of Higher Education.

Beauchesne, J. M. (1972). Inventaire des activités quant à l'évaluation de l'enseignement. Ottawa: Université d'Ottawa, Bureau de recherche institutionnelle et de planification.


Bejar, I. I. (1975). "A survey of selected administrative practices supporting student evaluation of instructional programs," Research in Higher Education, 3: 77-86.

Berquist, W. H. and Phillips, S. R. (1975). "Components of an effective faculty development program," Journal of Higher Education, 46: 177-209.

Brun, J. and Goldschmid, M. L. (1976). "Les processus d'apprentissage dans l'enseignement individualisé." Conférence présentée à la 3ème rencontre suisse des chercheurs en psychologie. Société Suisse de Psychologie, Fribourg, Suisse, 16-17 janvier.

Brun, J. and Goldschmid, M. L. (1977a). "Pratique de la recherche en éducation et recherche sur la pratique éducative: L'introduction de l'enseignement individualisé dans l'enseignement supérieur," in: Société Suisse pour la Recherche en Education, ed., Rapport du premier congrès 1976, pp. 72-90. Lausanne: SSRE.

Brun, J. and Goldschmid, M. L. (1977b). "Etude des processus d'apprentissage à travers trois cas d'enseignement individualisé," in: Bonboir, A., ed., Actes du congrès de l'Association Européenne pour la Recherche Pédagogique et le Développement de l'Enseignement Supérieur: Pédagogie de l'enseignement supérieur. Innovations dans le programme et le processus d'enseignement, 30 août-3 septembre 1976, vol. 1, 89-104. Louvain-la-Neuve, Belgique: Université Catholique de Louvain.

Brun, J. et al. (1977). "La participation des enseignants à l'introduction de différentes formes d'enseignement individualisé au niveau universitaire." Conférence présentée au 4ème Congrès de l'Association Internationale de Pédagogie Expérimentale de Langue Française (AIPELF), sur La Recherche au service de l'innovation en éducation, 16-19 mai. Genève, Suisse.

Centra, J. A. (1972). Strategies for improving college teaching. Washington: ERIC.

Centra, J. A. (1973). "The student as godfather? The impact of student ratings on academia," Educational Researcher, 2: 4-8.

Centra, J. A. and Creech, R. F. (1976). The relationship between student, teacher and course characteristics and student ratings of teacher effectiveness. Princeton, N.J.: Educational Testing Service.

Centra, J. A. and Linn, R. L. (1976). "Student point of view in ratings of college instruction," Educational and Psychological Measurement, 36: 693-703.

Centra, J. A. and Rose, B. (1976). "Student ratings of instruction and their relationship to student learning," Research Bulletin, February. Princeton, N.J.: Educational Testing Service.

Champagne, M. (1976). "Questionnaire d'évaluation de cours. Manuel d'utilisation, questionnaire et annexes." Chaire de Pédagogie et Didactique, Ecole Polytechnique Fédérale de Lausanne, Suisse.

Clark, M. J. et al. (1977). "A strategy for evaluating university courses." Unpublished paper. Wellington, N. Z.: Victoria University of Wellington.

Costin, F. et al. (1971). "Student ratings of college teaching: reliability, validity and usefulness," Review of Educational Research, 41: 511-535.

Crowe, M. H. and Feldhusen, J. F. (1976). "Student perceptions of organizational features in relationship to course ratings," Contemporary Educational Psychology, 1: 376-383.

Davis, J. A. (1976). "Institutional evaluation," in Anderson, Scarvia B., et al., Encyclopedia of educational evaluation, pp. 202-205. San Francisco: Jossey-Bass.

Doyle, K.O. (1975). Student evaluation of instruction. Lexington, Mass.: Lexington Books.

Dressel, P. L. and Lorimer, M. F. (1961). "Institutional self-evaluation," in Dressel, P. L. et al., ed., Evaluation in higher education. Boston: Houghton Mifflin.

Drucker, A. J. and Remmers, H. A. (1951). "Do alumni and students differ in their attitude towards instructors?" Journal of Educational Psychology, 42: 129-143.


Duchastel, P. et al. (1977). "Evaluation de différentes formes d'enseignement individualisé au niveau universitaire." Conférence présentée au 4ème Congrès de l'Association Internationale de Pédagogie Expérimentale de Langue Française (AIPELF), sur la recherche au service de l'innovation en éducation, 16-19 mai, Genève, Suisse.

Dufresne, R. (1975). Guide d'élaboration d'un questionnaire pour une évaluation sommaire de cours. Québec: Université Laval, Service de Pédagogie Universitaire.

Edwards, S. (1974). "A modest proposal for the evaluation of teaching," Liberal Education, 60: 316-326.

Falk, B. (1970). "The Melbourne approach to teacher training for university staff," The Australian University, 8: 57-66.

Falk, B. (1977). "Evaluation of teaching: Decision making about teachers and courses," The South Pacific Journal of Teacher Education, 5: 41-47.

Falk, B. and Lee Dow, K. C. (1971). The assessment of university teaching. London: Society for Research into Higher Education.

FAUSA (Federation of Australian Universities Staff Association) (1977). "Statement of FAUSA policy," FAUSA Newsletter, (March 2): 3-4.

Feldhusen, J. F. et al. (1974). "A model of instruction as the base for course and instructor evaluation." Lafayette, Ind.: Purdue University.

Fivaz, R. and Goldschmid, M. L. (1977). "Conception et développement d'un système d'enseignement de la physique par la découverte guidée," in Bonboir, A., ed., Actes du congrès de l'Association Européenne pour la Recherche et le Développement de l'Enseignement Supérieur: Pédagogie de l'enseignement supérieur. Innovations dans le programme et le processus d'enseignement, 30 août-3 septembre 1976, vol. 2, 543-556. Louvain-la-Neuve, Belgique: Université Catholique de Louvain.

Flood Page, C. (1974). Student evaluation of teaching: the American experience. London: Society for Research into Higher Education.

Frey, P.W. (1974). "The ongoing debate: student evaluation of teaching," Change Magazine, 64: 47-48; 64.

Frey, P. W. (1976). "Validity of student instructional ratings," Journal of Higher Education, 47: 327-336.

Frey, P. W. et al. (1975). "Student ratings of instruction: validation research," American Educational Research Journal, 12: 435-447.

Gage, N. L. (1972). Teacher effectiveness and teacher education. Palo Alto, Calif.: Pacific Books.

Gage, N. L. (1974). "Students' rating of college teaching: their justification and proper use," in Glasman, N. S. and Killait, B. R., eds., Second UCSB Conference on Effective Teaching, pp. 72-86. University of California at Santa Barbara, California.

Gagnon, M. (1976). "La pédagogie à l'université?" Pédagogiques, 1: 3-5. Service Pédagogique de l'Université de Montréal.

Genova, W. J., Madoff, M. K., Chin, R., et al. (1976). Mutual benefit evaluation of faculty and administrators in higher education. Cambridge, Mass.: Ballinger Publ.

Goldschmid, M. L. (1976). "Teaching and learning in higher education: recent trends," Higher Education, 5: 437-456.

Goldschmid, M. L. and Goldschmid, B. (1976). "The role of institutional management in improving instruction in higher education." Paper presented at the Third General Conference on Institutional Management in Higher Education, 13-16 September. Paris: OECD, CERI.

* Gooler, D. D. (1977). "Criteria for evaluating the success of non-traditional post-secondary education programs," Journal of Higher Education, 48: 78-95.

Greenwood, G. E. et al. (1973). "Student evaluation of college teaching instrument: a factor analysis," Journal of Higher Education, 44: 596-604.


Group for Human Development in Higher Education (1974). Faculty development in a time of retrenchment. New Rochelle: Change Book Department.

Grush, J. E. and Costin, F. (1975). "The student as consumer of the teaching process," American Educational Research Journal, 12: 55-66.

Harari, O. and Zedeck, S. (1973). "Development of behaviorally anchored scales for the evaluation of faculty teaching," Journal of Applied Psychology, 58:261-265.

Heaton, C. P., ed. (1975). Management by objectives in higher education: theory, cases and implementation. Durham: National Library for Higher Education.

Hildebrand, M. (1972). "How to recommend promotion for a mediocre teacher without actually lying," Journal of Higher Education, 43: 44-62.

Hildebrand, M. and Wilson, R. C. (1970). Effective university teaching and its evaluation. Berkeley: Center for Research and Development in Higher Education.

* Hodgkinson, H. L. et al. (1975). Improving and assessing performance: evaluation in higher education. Berkeley: Center for Research and Development in Higher Education.

Hore, T. (1976). Teaching research units in Australasia. London: Commonwealth Secretariat.

Huberman, M. (1974). "La formation et l'évaluation de l'enseignement universitaire," Techniques d'instruction (GRETI), 1: 8-11.

Hutchison, J. (1974). "Faculty and student attitudes toward evaluation of teaching," ERM Magazine, 7: 9-11.

Johnson, H. C., jr., et al. (1975). "The assessment of teaching in higher education. A critical retrospect and a proposal," two parts, Higher Education, 4: 173-200; 273-303.

Kane, M. T. et al. (1975). "Student evaluations of teaching: the generalizability of class means," Journal of Educational Measurement, 13: 171-183.

* Kegan, D. L. (1977). "Using Bloom's taxonomy for curriculum planning and evaluation in nontraditional educational settings," The Journal of Higher Education, 48: 63-77.

Kerlinger, F. (1971). "Student evaluation of university professors," School and Society, 99: 353-356.

Kulik, J. A. (1976). "Student reactions to instruction," Memo to the Faculty, 58: 1-5.

Kulik, J. A. and Kulik, C.-L. (1974). "Student ratings of instruction," Teaching of Psychology, 1: 51-57.

Kulik, J. A. and McKeachie, W. J. (1974). "The evaluation of teachers in higher education," in Kerlinger, F. N., ed., Review of research in education. Itasca, Ill.: F. E. Peacock Publishers.

* Leinhardt, G. (1976). "Observation as a tool for evaluation of implementation," Instructional Science, 5: 343-364.

Lumsden, K. G., ed. (1974). Efficiency in universities: The La Paz papers. Amsterdam: Elsevier.

Madox, H. (1975). "The assessment of teaching by ratings: a critique," The Australian University, 13: 139-147.

* McIntosh, N. (1974). "Some problems involved in the evaluation of multi-media education systems," British Journal of Educational Technology, 5: 43-95.

McKeachie, W. J. and Lin, Y. G. (1975). Use of student ratings in evaluation of college teaching. Final report. Ann Arbor, Michigan: The University of Michigan, National Institute of Education.

Menges, R. O. (1974). "The new reporters: students rate instruction," in Pace, R. C., ed., New directions in higher education: evaluation of learning and teaching, San Francisco: Jossey-Bass.

Meredith, G. M. (1975). "Toward a system approach to student-based ratings of instruction," The Journal of Psychology, 91: 235-246.


Miller, A.H., ed. (1977). "Symposium: Teacher education for tertiary teachers," The South Pacific Journal of Teacher Education, 5: (1): 5-63.

Miller, R. I. (1972). Evaluating faculty performance. San Francisco: Jossey-Bass.

Miller, R. I. (1975). "Faculty evaluation and development." Paper presented at the International Conference on Improving University Teaching, Heidelberg, 9-11 May.

Mortimer, K. P. (1972). Accountability in higher education. Washington, D.C.: American Association for Higher Education.

Murray, H. G. (1973). A guide to teaching evaluation. Toronto: Ontario Confederation of University Faculty Associations.

Naftulin, D. H. et al. (1973). "The Doctor Fox lecture: a paradigm of educational seduction," Journal of Medical Education, 48: 630-635.

Norr, J. L. and Crittenden, K. S. (1975). "Evaluating college teaching as leadership," Higher Education, 4: 335-350.

Pambookian, H. S. (1976). "Discrepancy between instructor and student evaluation of instruction: effect on instructor," Instructional Science, 5: 63-75.

* Parlett, M. (1972). "Evaluating innovations in teaching," in Butcher, H. J. and Rudd, E., eds., Contemporary problems in higher education, pp. 144-154. London: McGraw-Hill.

Perlberg, A. (1976). "The use of laboratory systems in improving university teaching," Higher Education, 5: 135-151.

Perry, R. P. et al. (1974). "Effect of prior teaching evaluations and lecture presentation on ratings of teaching performance," Journal of Educational Psychology, 66: 851-856.

Pohlman, J. T. (1975). "A description of teaching effectiveness as measured by student ratings," Journal of Educational Measurement, 12: 49-54.

Popham, J. W. (1974). "Pitfalls and pratfalls of teacher evaluation," Educational Leadership, (Nov.): 141-146.

Ramsden, P. (1975). "Polytechnic students' expectations of their teachers and the use of a student feedback questionnaire: a preliminary report," Higher Education Bulletin, 3: 73-85.

Ratz, H. C. (1975). "Factors in the evaluation of instructors by students," IEEE Transactions on Education, E-18: 122-127.

Rhodes, D. M. (1976). "Achieving teaching excellence: some misconceptions and a proposal," Higher Education Bulletin, 4: 105-121.

Rich, H. E. (1976). "Attitudes of college and university faculty toward the use of student evaluation," Educational Research Quarterly, 3: 17-28.

* Rippey, R. M., ed. (1973). Studies in transactional evaluation. Berkeley: McCutchan Publ. Corp.

Roberson, E. W., ed. (1977). Educational accountability through evaluation. Englewood Cliffs, N.J.: Educational Technology Publ.

Rodin, M. (1975). "Rating the teachers," The Center Magazine, 3: No. 5.

Rose, C. (1976). "Stalking the perfect teacher," The Chronicle of Higher Education, Sept. 27.

Rotem, A. and Glasman, N. (1977). "Evaluation of university instructors in the United States: the context," Higher Education, 6: 75-92.

Rumery, R. et al. (1975). "The role of student reports in the evaluation of teaching," Higher Education Bulletin, 3: 93-99.

Scott, C. (1975). "Correlates of student ratings of professorial performance: instructor-defined extenuating circumstances, class size, and faculty member's professional experience and willingness to publish results," in Scott, C. S. and Thorne, G. L., eds., Professorial assessment in higher education, pp. 155-171. Monmouth, Oregon: Teaching Research Division, Oregon State System of Higher Education.


Scully, M. (1976). "Worldwide job crisis faces university graduates," The Chronicle of Higher Education, Sept. 27.

Seldin, P. (1975). "Rating the teachers," The Center Magazine, 3: 75-76.

Seldin, P. (1976). "New rating games for professors," The Peabody Journal of Education, 53: 254-259.

Seldin, P. and Wakin, E. (1975). "Students now get to help decide the worth of their professors," The New York Times, June 8.

Sheehan, D. (1976). "The localization, diagnostic, and monitoring functions of student ratings in a model for improving university teaching," Instructional Science, 5: 77-92.

Sheffield, E. F., ed. (1974). Teaching in the universities: No one way. Montreal: McGill-Queen's University Press.

Sherman, T. M. (1976). "Trick or trait: A look at student evaluation of instruction," Educational Technology, 16: 38-40.

Sherman, T. M. and Winstead, J. C. (1975). "A formative approach to student evaluation of instruction," Educational Technology, 15: 34-39.

Silberman, M. L. and Allender, J. S. (1974). "The course description: a semiprojective technique for assessing students' reactions to college classes," Journal of Higher Education, 45: 450-457.

Smock, H. R. and Crooks, T. J. (1973). "A plan for a comprehensive evaluation of college teaching," Journal of Higher Education, 44: 577-586.

Stake, R. E. (1976). L'évaluation des programmes d'enseignement: nécessité et réactions. Paris: OCDE/CERI.

Subkoviak, M. J. and Levin, J. R. (1974). "Determining the characteristics of the ideal professor: an alternative approach," Journal of Educational Measurement, 11: 269-276.

Tetenbaum, T. J. (1975). "The role of student needs and teacher orientations in student ratings of teachers," American Educational Research Journal, 12:417-433.

Thomas, I.D. (1976). "Considerations in a course evaluation in higher education," Educational Technology, 16: 32-38.

Thorne, G.L. et al. (1975). "Factors influencing promotion and tenure: preliminary system finding," in Scott, C. S. and Thorne, G. L., eds., Professorial assessment in higher education. Monmouth, Oregon: Teaching Research Division, Oregon State Systems of Higher Education.

* Trow, M. (1970). "Methodological problems in the evaluation of innovation," in Wittrock, M. C. and Wiley, D. E., eds., The evaluation of instruction, pp. 289-331. New York: Holt, Rinehart and Winston.

Vattano, F. J. (1975). "A program to prepare graduate students for college teaching." Paper presented at the International Conference on Improving University Teaching, Heidelberg, 8-11 May.

Ware, J. E. and Williams, R. G. (1975). "The Doctor Fox Effect: a study of lecturer effectiveness and ratings of instruction," Journal of Medical Education, 50: 149-156.

Ware, J. E. and Williams, R. G. (1976). "Studies of student-faculty rating scoring methods: seeing through the Doctor Fox Effect." Mimeographed paper, Santa Monica, California: The Rand Corporation.

Wergin, J. F. et al. (1976). "The practice of faculty development," Journal of Higher Education, 47: 289-308.

Whitely, S. E. and Doyle, K. O. (1976). "Implicit theories in student ratings," American Educational Research Journal, 13: 241-253.

Williams, R. G. and Ware, J. E. (1976). "Validity of student ratings of instruction under different incentive conditions: a further study of the Doctor Fox Effect," Journal of Educational Psychology, 68: 48-56.


Williams, R. G. and Ware, J. E. (1977). "An extended visit with Dr. Fox: validity of student ratings of instruction after repeated exposures to a lecturer," American Educational Research Journal, 14: 449-457.

Wittrock, M. C. and Wiley, D. E., eds. (1970). The evaluation of instruction, issues and problems. New York: Holt, Rinehart and Winston.

Wolansky, W. (1976). "A multiple approach to faculty education," Education Journal, 97: 81-96 .

Zelby, L. W. (1974). "Student-faculty evaluation," Science, 183: 1267-1270.

* These references on the evaluation of non-traditional institutions were added to the bibliography but are not cited in the text.