
Assessment Centers for Course Evaluations: A Demonstration

Bronston T. Mayes, Christopher A. Belloli, Ronald E. Riggio, and Monica Aguirre

Student Outcome Assessment Center, California State University, Fullerton, CA 92634

A quasi-experimental design, employing assessment center (AC) pre- and post-measures of skills and knowledge, as well as traditional pencil-and-paper measures, was used to assess the amount of learning in two fundamentally different management classes. One class, the control group, focused on the theories and models of organizational behavior that emphasized top-down learning associated with committing material to memory and recalling memorized concepts. In the second class, the experimental group, students learned a variety of hands-on management skills. Then they used bottom-up cognitive processes to identify critical cues in the task environment in order to diagnose problems requiring the application of specific skills. As expected, the theory class performed better on traditional tests requiring the recall of data from memory. The skills class performed better on assessment center exercises requiring the recognition of situational cues and the application of appropriate managerial action. The most significant gain in managerial skills occurred in the areas of oral communication and self-presentation in a mock employment interview. Issues related to AC use in higher education are student motivation, AC cost control, reliability of performance ratings, and statistical significance/power.

Academicians have long recognized the importance of evaluating the effectiveness of curriculum offerings. They have noted that the best criterion of good teaching is student learning. However, finding useful measures of learning has been a difficult task (McKeachie, Lin, & Mann, 1971). Because of the difficulty in finding acceptable measures of learning, course evaluations have tended to employ measures of instructional quality, which is believed to contribute to learning. One of the most frequently used assessment tools to evaluate the quality of instruction has been the student opinion survey.

Riggio, R.E., & Mayes, B.T. (Eds.). (1997). Assessment centers: Research and applications [Special issue]. Journal of Social Behavior and Personality, Vol. 12, No. 5, 303-320. ©1997 Select Press, Corte Madera, CA.


A large body of research has been conducted on student opinion measures to determine what should be measured (e.g., Deshpande, Webb, & Marks, 1970), whether they are related to what a student learns (e.g., McKeachie et al., 1971), and whether they are subject to biases believed to operate when students evaluate faculty members (e.g., Hudelson, 1951; Rayder, 1968; Weaver, 1960). In spite of mixed results from research on student opinion surveys, they remain the most widely used method of evaluating teaching quality.

For the past ten years, however, a shift has been taking place in the paradigm governing higher education, away from a focus on delivering instruction to a focus on producing learning (Barr & Tagg, 1995). The evaluation of institutional effectiveness under this paradigm rests on outcome assessments of students at multiple times in their educational experiences. Learning effectiveness is inferred from the knowledge and skill gains that take place as students are involved in the educational process.

At a minimum, students should be assessed in terms of their knowledge and skills when they enter and when they exit the learning program. Important to this assessment concept is that the assessment process should be independent of the learning process. That is, average grades awarded by faculty members are an unreliable measure of what students have learned, as is evidenced by widespread concern for creeping grade inflation. A more reliable indicator of learning is the measurement of what students know and what they can do with the knowledge they have acquired.

Assessment technologies based on skills and task performance have been available for more than fifty years. These methods are usually housed in an Assessment Center (AC) and have been used to predict success in numerous areas such as managerial jobs and government service positions (Howard & Bray, 1988; Riggio, Aguirre, Mayes, Belloli, & Kubiak, 1997). We are proposing that AC methods can be modified to measure the learning students gain from specific course instruction. Such assessment meets the desired characteristic of instructional independence proposed by Barr and Tagg (1995). That is, assessment center measures focus on skills that are evaluated in a setting removed from the classroom, and the evaluators are not the course instructor or others associated with the delivery of instruction.

Porter and McKibbin (1988) recently made a distinction between two kinds of knowledge business students might acquire. Discipline content knowledge is the information that is commonly held among members of a professional group. This is the knowledge that students may retain in memory for later use when a problem is recognized that might require the use of that knowledge. For example, accounting students might learn about the various kinds of categories for classifying financial transactions and how these categories interact with each other in the preparation of financial statements. This information may be stored in the student's memory until s/he is required to set up an accounting system, and then these categories would be used to structure new accounts. These students may also acquire technical knowledge in how to use the "tools" of the trade, such as computers, information systems, and the various hardware and software configurations they might find at work. According to Porter and McKibbin (1988), American business managers believe the current crop of business graduates have acquired an acceptable amount of this knowledge.

An additional body of know-how focuses on human relations skills, which may be independent of the technical discipline-based knowledge. Human relations skills include sensitivity to others and awareness of one's own reactions to others, conflict management techniques, creative problem solving, ability to fit into and adapt to a team structure, interacting with subordinates/superiors, running meetings, and being effective in oral and written communication. While technical skills may be sufficient at lower organizational levels, the human relations skill set becomes more important as a student moves upward in the organization during normal career development (Katz, 1974). According to the Porter and McKibbin (1988) study, business students are typically deficient in human relations skills.

In a parallel line of study, cognitive psychologists have identified two fundamental kinds of thought processes, top-down and bottom-up (Crowder & Wagner, 1992; Galambos, Abelson, & Black, 1986). Top-down, deductive processes are concerned with the use of knowledge stored in memory and applying this knowledge to familiar or new situations. Top-down processing is typically associated with discipline content knowledge. Bottom-up processing, on the other hand, involves the recognition of cues and connections among these cues in a situation to arrive at an understanding of what is happening and to determine whether these events pose a problem that must be solved. Bottom-up processing is inductive in nature and relies heavily on cue and pattern recognition.

We would argue that bottom-up cognitive processing is essential in exercising human relations skills. An individual must be able to recognize cues and patterns of cues received from others that might signal a potential problem. Based on these observations and information stored in memory, the individual will interact with others to attempt a problem solution. New cues are likely to emerge as a result of this interaction and the bottom-up, top-down process continues. One problem that business school students might have as a result of their traditional exposure to lecture and testing is a relative underdevelopment of bottom-up processing skills, especially in the area of human relations.

Thus, an assessment of the effectiveness of business (and other) curricula should include measures of what students have learned in the way of content knowledge (top-down), as well as what they can do with that knowledge in organizational settings by employing human relations skills (bottom-up). The traditional method of measuring learning, the standardized test, is a reliable way to measure the knowledge students have acquired.

Although not widely used in higher education, assessment centers have been developed in industrial and governmental settings. They are used to measure potential managers' ability to recognize the cues in a situation that indicate a problem state, and then apply an appropriate technique or decision process to solve the problem. Assessment centers include measurement of bottom-up cognitive skills as well as top-down skills.

The present study tested the effectiveness of assessment center methods in evaluating the student learning associated with a course in management skills (Whetten & Cameron, 1991). This course focuses heavily on bottom-up cognitive processing and the acquisition of human relations skills. A quasi-experimental design was used in which the students enrolled in the management skills course were compared with a nonequivalent control group of students taking an introductory course in organizational behavior. Students in the organizational behavior (OB) course received instruction in the social psychological theories that explain work behavior and attitudes. They were also taught the administrative techniques that are based on these theories. This kind of course typically focuses on top-down skills. Thus, the students in the management skills course were instructed in techniques of bottom-up management, while those in the OB class learned theory and top-down administrative procedures. Our concern in this investigation was whether AC measures are sensitive enough to detect changes in a student's bottom-up management skills that occur through involvement in a single course. The hypotheses guiding this investigation were as follows:

H1. Assessment center methods will be sensitive enough to detect learning gains in a management skills course. Students in the management skills course (the experimental group) will show greater learning gains in assessment center exercises measuring management skills than will the students in the organizational behavior theory course (the control group).


H2. Because of the course focus on cognitive, rather than skill, learning, students in the organizational behavior theory course will show greater learning gains on cognitive tests than will the students in the management skills course.

METHOD

Because our purpose was to explore the potential uses of assessment center methods to conduct university course evaluations, this study was performed within normal administrative structures. Therefore, it was not feasible to perform a true experiment with participants randomly assigned to experimental and control groups. Instead, we used a pretest-posttest quasi-experimental design with a nonequivalent comparison group (Cook & Campbell, 1979). In organizational settings where random assignment of participants is not feasible and natural groups must be used, this design approximates a true experiment. However, initial differences between the two groups must be statistically controlled.

Participants

Participants were selected from two undergraduate management courses in a large west coast business school. Both management classes were part of the required course list in the completion of a Business Administration degree with an emphasis in management. The experimental group comprised participants enrolled in an upper division management skills course where the emphasis was placed on developing management skills and effectively applying them in organizational settings. The control group consisted of participants enrolled in an upper division, theory-based introductory organizational behavior course focused on theories and models explaining work behavior.

The management skills course consisted of 30 students; 48% were male. They had an average age of 27.3 years and had completed an average of 118 semester credit units. The organizational behavior course comprised 32 students, 50% male, with an average age of 24.9 years and an average of 105 semester credit units completed.

In the management skills course the students were given instruction and practice in developing self-knowledge, interviewing, counseling/coaching employees, coordinating work with others, delegating responsibilities to subordinates, diagnosing subordinate performance problems, creative problem solving, negotiation, power building, effective self-presentation, conflict management, conducting meetings, and making oral presentations. These course components and their correspondence with the assessment center measures are presented in Table 1.


[Table 1: Management skills course components and their corresponding assessment center exercise dimensions (oral presentation, leaderless group discussion, mock employment interview, and in-basket exercise); the body of the table is not legible in the source.]


The objective of the course was to help the students learn the hands-on management skills that they might use to elicit the cooperation of others in accomplishing organizational objectives. The mode of instruction emphasized experiential activities during which the students actually practiced the skills they were expected to learn.

The organizational behavior class used traditional lecture, films, and case discussions to provide instruction in the social-psychological theories explaining work behavior, perceptions, and attitudes. Topics covered in this class were motivation, perception, attitude formation, cognitive decision processes, learning theory, group processes, leadership, communication, power and politics, and organization theory/development. Administrative processes used to apply these theories were also surveyed. The administrative tools presented in the class were techniques of performance appraisal, characteristics of reward systems, and processes of communication and organizational change. This course emphasized cognitive learning rather than skill practice.

Measures

The measures used in this study were intended to capture a wide range of student skills, knowledge, and abilities (SKAs). Work history questionnaires and a battery of psychological instruments were administered to participants during assessment center testing sessions. In this study, these measures were used to test for differences between the two groups in areas such as need for achievement (Jackson, 1974), need for power/dominance (Steers & Braunstein, 1976), self-esteem (Rosenberg, 1965), tolerance for ambiguity (Budner, 1962), and work experience, such as responsibility for persons, things, or data. These factors could affect student performance in these classes regardless of the course content and mode of instruction. A variety of demographic characteristics were collected from each of the participants as well.

The participants completed the Core Curriculum Assessment Program (CCAP), a 70-item exam designed by the American Assembly of Collegiate Schools of Business (AACSB). This test measures student knowledge in the core business areas and is intended to be used as an outcome assessment measure in the accreditation process of business schools. The CCAP OB/HR component was used as an indication of cognitive learning. Paper-and-pencil tests of course content knowledge were also administered in each class and were used to assign a portion of the course grades.

In addition to the paper-and-pencil measures, participants took part in four typical assessment center exercises: a leaderless group discussion, an oral presentation, a mock interview, and an in-basket exercise. These exercises were scored by trained psychology students and the scores were used as indicators of skill acquisition.

Leaderless group discussion (LGD). Participants were randomly assigned to groups consisting of three to five members and were randomly assigned discussion topics dealing with local or national social issues. The discussion topics were developed by the research team using a class of entering business students. The class was asked to generate a list of problems facing the nation or local society. Then, the students in the class individually wrote down the topics about which they had some knowledge. The topics chosen for use in the LGD were those that 80% or more of the class members listed as being in their knowledge set. During the LGD the student groups were instructed to discuss solutions and attempt to arrive at a consensus regarding their solutions to their randomly assigned problem. The LGD exercise lasted 20 minutes and was videotaped to be rated at a later date. The primary dimension of interest in the LGD exercise, managerial potential, is rated on a 9-point Likert scale.

Oral presentation. Participants were given 20 minutes to prepare a one-minute speech on a controversial issue that was randomly assigned to them. The controversial topic pool was generated in introductory business classes in a manner similar to the development of the LGD topics. Along with the assigned topic, the students were given some prepared arguments both for and against the issue that could be used as a starting point in developing their preferred position. Participants were instructed to choose one side of the issue and try to be as persuasive as possible in their argument. Their one-minute speech was videotaped for rating at a later date. Of the oral presentation dimensions selected for this study, persuasiveness and logical thinking are dichotomous items (0 = No, 1 = Yes), while the overall rating of the presentation is based on a 9-point scale.

Mock employment interview. The participants were asked to play the role of an interviewee applying for an entry level position in their chosen field. Each participant was asked ten standard interview questions, and their responses were videotaped to be rated at a later date. Dimensions selected from the mock interview exercise for this study include communication skill, presentation style, and employability, which were all rated on a 9-point Likert scale.

In-basket exercise. The participants were randomly assigned one of four in-basket exercise sets. Each set described a fictitious company and the manager's role the participant was to assume. An organization chart was provided showing the focal manager's position in the company and the names and titles of other key members of the organization. The shape of the organization and the focal manager's location were identical for each simulated company. The participants were told to play the role of a manager who had just arrived in his/her office on a Sunday morning when no one else was present. The participant was given a set of six in-basket items and was told to handle them appropriately. The in-basket items included correspondence from persons external to the organization, memos from fellow employees, and requests from superior managers. The participants were required to prioritize the six items on a worksheet before responding to the items using a word processing software package of their choice. Participants saved their responses on a computer disk so that raters could evaluate them at a later date. The participants had one hour to complete the exercise. From the in-basket exercise, planning, delegation, coordination, and ability to synthesize information were chosen for this study and were scored on a 3-point rating scale.

PROCEDURE

In the first week of the semester, the students from both classes were taken to a large computer lab where they completed the demographic questionnaire, followed by the timed in-basket exercise. After the in-basket exercise, the students scheduled themselves for a testing session the following week. These testing sessions were conducted in the Student Assessment Center on campus. During these testing sessions, the participants were given the psychological measures, work history questionnaire, and CCAP business exam, as well as the remaining three assessment center exercises: the leaderless group discussion, oral presentation, and mock interview. This same testing procedure was followed during the last two weeks of the semester, with special care given to ensure that a different in-basket set and a different form of the CCAP exam were issued to each student.

After all of the data had been collected, raters were chosen from a pool of undergraduate psychology student volunteers and given extensive training pertaining to each exercise and its respective rating form. Pilot studies with these student raters indicated that they could achieve consistently reliable ratings of assessment center exercises (Riggio et al., 1997). For the present study, two raters were chosen to score the leaderless group discussion, oral presentation, and mock interview, while two different raters scored the in-basket tests. The pairs of raters worked independently and evaluated both the pre-course and post-course performances of the student participants. Whether the pre- or post-performances were scored first was determined randomly for each rater. Reliabilities for these measures reflect interrater correlations for the measures.
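As an illustration of how such an interrater correlation can be computed, the sketch below (not the authors' code; the ratings shown are hypothetical) correlates two independent raters' scores on a single AC dimension:

```python
# Illustrative sketch: interrater reliability for one AC dimension, computed as
# the Pearson correlation between two raters who independently scored the same
# participants. The arrays below are hypothetical 9-point ratings.
import numpy as np
from scipy.stats import pearsonr

rater_a = np.array([7, 5, 8, 6, 4, 9, 5, 6])  # rater A's ratings
rater_b = np.array([6, 5, 7, 6, 5, 8, 4, 6])  # rater B's ratings, same students

r, p = pearsonr(rater_a, rater_b)
print(f"interrater correlation r = {r:.2f} (p = {p:.3f})")
```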

The class membership of the students was classified with a dummy variable (0 = theory-based course/control group, 1 = skills-based course/experimental group). The performance raters did not know to which group the student participants belonged.

ANALYSIS

Multiple regression analysis was the method we used for this study, based on the recommendations of Cohen and Cohen (1983). First, we computed a discriminant function analysis to identify whether the two groups showed initial differences on the personal characteristics that might affect course performance (e.g., need for achievement). Since we measured a number of individual differences of this nature, it was necessary to use a multivariate test of group differences, rather than a series of t tests, which would inflate alpha error. If the discriminant function analysis showed significant group differences, it would be necessary to control for these differences in the test of hypotheses.
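The multivariate pretest check could be carried out in several ways; the sketch below is one hedged possibility (hypothetical column names, and a statsmodels MANOVA rather than the authors' unspecified discriminant-analysis software), relying on the fact that with only two groups a one-way MANOVA on the covariates tests the same hypothesis as the discriminant function:

```python
# Sketch of the multivariate pretest check. With two groups, a one-way MANOVA
# on the pretest covariates is equivalent to testing the discriminant function,
# and it avoids the alpha inflation of running many separate t tests.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("pretest_measures.csv")  # hypothetical file of pretest scores
maov = MANOVA.from_formula(
    "nach + npow + self_esteem + ambiguity_tol + work_exp ~ group", data=df
)
print(maov.mv_test())  # Wilks' lambda, etc.; a nonsignificant result means the
                       # later regressions need no covariate controls
```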

Hierarchical regression was used to test hypotheses. This test evaluates posttest criterion differences between the groups after controlling for pretest scores on the criterion. This procedure also reduces the potential for reliability problems associated with using difference scores between pre- and post-criterion scores as indicators of change. In the regression analysis, posttest criterion scores were used as dependent variables. The criterion pretest score is entered into the equation first, followed by the dummy-coded group membership variable. The pretest score as an independent variable controls for group differences on the criteria before training takes place. An experimental effect, that is, a learning gain difference between the two groups, is indicated if a significant positive beta weight is obtained for the group dummy variable.
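A minimal sketch of this two-step procedure, assuming the scores sit in a data frame with hypothetical column names (pretest, posttest, and the dummy skills_class), might look like this:

```python
# Sketch of the hierarchical test described above (column names hypothetical).
# Step 1 enters the pretest score; step 2 adds the dummy-coded class variable
# (0 = OB theory, 1 = management skills). A significant positive coefficient
# for skills_class, together with the R-squared gain from step 1 to step 2,
# indicates a learning-gain difference favoring the skills course.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ac_scores.csv")  # hypothetical: pretest, posttest, skills_class

step1 = smf.ols("posttest ~ pretest", data=df).fit()
step2 = smf.ols("posttest ~ pretest + skills_class", data=df).fit()

delta_r2 = step2.rsquared - step1.rsquared  # incremental variance explained
print(step2.params["skills_class"], step2.pvalues["skills_class"], delta_r2)
```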

RESULTS

The purpose of this study was to determine if assessment center measures are sensitive enough to differentiate skill learning gains between the two types of management courses: theory based and skill based. In order to rule out any pre-existing differences between the two groups that might explain variance in the learning gains, a discriminant function analysis was performed on the following measures: need for achievement, need for power/dominance, self-esteem, tolerance for ambiguity, self-efficacy, and work experience. This test revealed no significant differences between these groups.

The lack of pre-existing group differences means that the analysis of learning gains between the two classes could proceed without these covariate controls. Means, standard deviations, reliabilities, and intercorrelations of selected dimensions within the assessment center exercises, AACSB business exam, and course dimensions are presented in Table 2.


[Table 2: Means, standard deviations, reliabilities, and intercorrelations of selected assessment center exercise dimensions, AACSB exam scores, and course measures at pretest and posttest; the body of the table is not legible in the source.]


The reliabilities of some performance measures are below the acceptable level of .70 (Nunnally, 1978). These low reliabilities indicate that there was disagreement between the two raters for these dimensions. The effect of these low reliabilities is to understate the bivariate correlations between these measures and others.
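The size of this understatement follows from the classical attenuation relationship; the figures below are hypothetical and chosen only to illustrate the effect:

```python
# Illustration of why low rater agreement understates observed correlations:
# under classical test theory, an observed correlation equals the "true"
# correlation shrunk by the square root of the two measures' reliabilities.
# All values here are hypothetical.
import math

true_r = 0.50               # hypothetical true relationship between two dimensions
rel_x, rel_y = 0.55, 0.70   # hypothetical reliabilities, like the lower values in Table 2

observed_r = true_r * math.sqrt(rel_x * rel_y)
print(f"expected observed r = {observed_r:.2f}")  # about .31, well below .50
```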

The bold entries represent the correlations between pretest and posttest measures. Because pre- and posttest measures are typically highly correlated, it is not surprising that many of these coefficients are statistically significant. Those that are not seem to be associated with measures that have low reliabilities. The significant intercorrelations among variables using the same method of measurement, for example videotaped performances, indicate the presence of a method bias. The existence of such a method bias is common with assessment center methods such as those employed in this study (Bycio, Alvares, & Hahn, 1987).

As shown in Table 2, the correlation between the OB/HR subscale of the AACSB exam and the course exam percentage is .39, indicating modest convergent validity for the AACSB exam. The correlation between the participants' class membership and course exam percentage was -.29 (p < .05), which shows that the theory-based class outperformed the skill-based class on course content examinations. This could indicate learning gains attributable to an emphasis on top-down cognitive processing in the theory-based course, as opposed to a bottom-up approach in the skills-based course. Hypothesis 2 seems to be supported, although the effect size for class membership is small. Approximately 9% of the examination score variance is associated with class membership.

To determine whether these assessment methods can effectively evaluate relevant learning in a skill-based management course, a hierarchical regression analysis was performed for each criterion. The procedure was described above. Because the students' age and amount of college experience might affect performance in the assessment center, these demographic variables were statistically controlled in hypothesis testing by entering them into the regression at step one, along with the pretest scores. Then, the class membership dummy variable was entered at step two. The results of this regression analysis are presented in Table 3.

TABLE 3
Hierarchical Regression of Posttest Performance Ratings on Pretest Performance Ratings and Class Membership

                                          Betas
AC Measure                      Pretest Performance    Class Membership
Oral Presentation
  Persuasiveness                       .26**                 .16
  Logical Thinking                     .06                   .08
  Overall Rating                       .38***                .25**
Mock Interview
  Communication Skill                  .60***                .22***
  Presentation Style                   .59***                .25***
  Employability                        .65***                .23***
In-basket Exercise
  Coordination                         .01                   .07
  Planning                            -.04                   .14
  Delegation                           .22*                  .30***
  Synthesizing Information             .49***                .22**
Leaderless Group Discussion
  Managerial Potential                 .26**                 .27**
AACSB Business Exam
  OB/HR Subscale                       .34**                 .16

*p < .20; **p < .10; ***p < .05.

As shown in Table 3, a student's class membership is significant (p < .05) in explaining the learning gains in all of the mock interview components and in the delegation score derived from the in-basket exercise. However, the effect size for class membership was small, ranging between 4% and 9% of the explained variance for these criteria. Although not significant, comparable effect sizes were obtained for the overall oral presentation rating (6.25%, p < .10) and synthesizing information (4%, p < .10). Perhaps as a result of low reliability, the other assessment center measures were not significantly related to class membership, although their beta weights were positive, as hypothesized. Thus, based on the significant beta weights for class membership, Hypothesis 1 was partially supported.

DISCUSSION

Assessment Centers for Course Evaluation

Concerning the question of whether assessment center methods can be used to measure student learning in a university management skills course, our data show that only the mock interview criteria and the delegation rating from the in-basket showed a significant difference between the skills and theory classes, as hypothesized. The failure to find significant learning differences for the other AC measures may be due to the low or inconsistent reliabilities obtained for these measures. Low reliability would tend to attenuate the correlations between these variables and the group membership variable, thus reducing the statistical power to find relationships that may exist.

In the leaderless group discussion, group sizes varied from three to six participants due to factors such as scheduling problems and absenteeism. Research in group dynamics suggests that the size of a group may be a factor in determining leader emergence or managerial effectiveness (Thomas & Fink, 1963). Analysis based on LGD performance, therefore, needs to address the potential group size effect. Variance due to group size was partialled out of pre- and posttest ratings of managerial potential by way of residual scores, which were subsequently used in the same hierarchical regression procedure employed earlier. This control for group size in the leaderless group discussion, however, failed to increase the significance of the class membership variable.
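One way to carry out that residualizing step, sketched here with hypothetical column names rather than the authors' actual files, is:

```python
# Sketch of the group-size control described above (hypothetical column names).
# The managerial-potential ratings are regressed on LGD group size, and the
# residuals replace the raw ratings in the same hierarchical regression.
import pandas as pd
import statsmodels.formula.api as smf

lgd = pd.read_csv("lgd_ratings.csv")  # hypothetical: rating_pre, rating_post,
                                      # group_size, skills_class

lgd["pre_resid"] = smf.ols("rating_pre ~ group_size", data=lgd).fit().resid
lgd["post_resid"] = smf.ols("rating_post ~ group_size", data=lgd).fit().resid

model = smf.ols("post_resid ~ pre_resid + skills_class", data=lgd).fit()
print(model.params["skills_class"], model.pvalues["skills_class"])
```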

One reason that the management skills class performance on AC measures was not as high as expected might be the students' motivation to perform well on the exercises. The students did not have an extrinsic incentive to perform well on any of the assessments. While they were required to participate in the assessment center as one of their course requirements, their course grade did not depend on AC performance.

Supplemental analyses were performed to control for the students' intrinsic incentive for performing well. For these analyses the students' Need for Achievement score was entered into the regression before the class membership variable. Controlling for this covariate in the regression analyses had no effect on the significance of the class membership variable for any of the assessment measures.

Since need for achievement (nAch) is characterized by a desire for feedback, as well as a preference for moderately difficult goals, perhaps our research design removed one of these critical components for high nAch students. We told the students that their performance on the AC activities would not be reported to them because of the research design. Feedback on pretest measures might sensitize the students to their deficiencies, and they might work harder to overcome these deficiencies. While this would be desirable from a developmental perspective, it would create a confound that would make it difficult to isolate the effects of the course content and method. Future research should address the issue of whether student skill learning is enhanced by pretest feedback and whether this effect is more pronounced for those with high nAch.

Top-down and Bottom-up Cognitive Processing

There is modest support for the notion that top-down and bottom-up skills are related to student performance in management courses. Top-down learning processes seem to be more evident in the theory class that focused on acquiring knowledge and recalling the knowledge that is acquired. This conclusion is based on the significant correlation (r = -.31, p < .05) between class membership and course exam performance. The theory class performed better on knowledge tests, a top-down process. On the other hand, the management skills class tended to perform better on assessment center exercises requiring bottom-up processes. Of importance to student outcome assessment is the need to match the method of assessment to the kind of learning expected to take place in a given class. Our data would suggest that paper-and-pencil tests are appropriate indicators of learning in classes requiring top-down processing. On the other hand, skill-based performance measures may be more valid indicators of learning in courses requiring the use of bottom-up cognitive processing.

The OB/HR subcomponent of the AACSB standardized exam demonstrated a significant correlation with course examination performance (r = .39, p < .01). Since both classes contained topic material in the OB/HR area, this correlation demonstrates modest convergent validity for the standardized test. However, a paired t-test comparing time one and time two scores for the test showed no significant differences between the pre- and posttest sessions. The posttest scores were higher than the pretest scores, as would be expected if learning had taken place, but the difference was not significant. Since both classes had completed a full semester of study in this topic area, we must conclude that the AACSB OB/HR subscale is not a good indicator of the amount of learning that occurred in these students over one semester. The test may be more sensitive, however, to learning gains in courses such as personnel management or specialized HR topics. The content of these courses may more closely match the content of the AACSB exam.
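The pre/post comparison described here is a standard dependent-samples test; a small sketch with hypothetical scores illustrates it:

```python
# Sketch of the paired comparison mentioned above (hypothetical arrays): the
# same students' CCAP OB/HR scores at the start and end of the semester are
# compared with a dependent-samples t test.
import numpy as np
from scipy.stats import ttest_rel

ccap_pre = np.array([38, 42, 35, 47, 40, 44])   # hypothetical pretest scores
ccap_post = np.array([40, 43, 36, 48, 41, 46])  # same students at posttest

t, p = ttest_rel(ccap_post, ccap_pre)
print(f"t = {t:.2f}, p = {p:.3f}")  # a nonsignificant p matches the finding above
```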

Methodological Considerations: Reliability, Cost, and Significance Levels

We used trained undergraduate psychology students to rate the business students' AC performance. Using the criterion of .70 (Nunnally, 1978) as a desired reliability for research purposes, our method of having students rate the AC exercises independently produced acceptable reliabilities for only a few of the exercises. Our methods were different from traditional ACs, in which raters discuss the performance of a participant and then register their individual ratings. Such ratings should show higher reliabilities.

The cost of the assessment process is an important consideration in determining the utility of assessment center methods in evaluating personnel (Cascio & Silbey, 1979). Cost is particularly important in university settings where budgets have been reduced significantly during the past decade. Our use of students as independent raters was intended to reduce the costs associated with assessment center ratings. First, student raters are less costly than faculty raters, practicing managers, or professional assessment center staff. The cost of rating can be further reduced if raters can operate independently, rather than in teams. Our low reliabilities for some measures suggest that using two raters working independently is not a desirable model. Two modifications to our rating method might improve measurement reliability. First, student raters could work in pairs to rate AC performances; the scores of the two pairs of raters could then be used to compute interrater reliabilities. A second approach also involves using more raters, but combines their scores into a composite rating whose reliability could be estimated with coefficient alpha. Future research should address which of these methods produces the best reliabilities. The use of student raters appears to be the key to making AC ratings cost effective for universities, but care must be taken to assure that their ratings are reliable.
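The composite-rating approach amounts to treating raters as "items" in a coefficient alpha calculation; the sketch below (with hypothetical ratings) shows the computation for a pool of four raters:

```python
# Sketch of the composite-rating approach: several raters score the same
# participants independently, and coefficient alpha treats the raters as items
# to estimate the reliability of their averaged rating. Data are hypothetical.
import numpy as np

# rows = participants, columns = independent raters (hypothetical 9-point ratings)
ratings = np.array([
    [7, 6, 7, 8],
    [4, 5, 4, 5],
    [8, 7, 9, 8],
    [5, 5, 6, 4],
    [6, 7, 6, 7],
])

k = ratings.shape[1]
item_var = ratings.var(axis=0, ddof=1).sum()        # sum of per-rater variances
total_var = ratings.sum(axis=1).var(ddof=1)         # variance of composite scores
alpha = (k / (k - 1)) * (1 - item_var / total_var)  # Cronbach's alpha
print(f"coefficient alpha for {k} raters = {alpha:.2f}")
```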

A question that arises when performing research in applied settings concerns the desirable level of significance needed to rule out the null hypothesis. The customary significance level used in psychological measurement research is p < .05. Results at this level of significance would be obtained only 5% of the time by chance alone. This is highly desirable from a theory testing perspective, although there is a developing controversy in psychology about whether the significance of a finding should be based on p-values or effect sizes. We reported both, with the understanding that moderate effect sizes, such as some of those we found, may not be statistically significant due to sample size restriction.

In evaluating individual classes, sample sizes may typically be small because of the conventional wisdom that smaller classes are better learning environments than large classes. In an educational system with small classes, it may be necessary to relax the standards of statistical significance to increase the power of a study to find effect sizes that may matter. The acceptable p-level for evaluation research should, of course, be jointly determined by appropriate administrators and researchers.

From a purely administrative point of view, we feel it is reasonable to treat results at the p < .20 level as significant. Thus, a decision-maker evaluating a program, or course of instruction, would falsely reject the null hypothesis that the program/course had no effect on the criteria of interest only 20% of the time. An ineffective program/course would be correctly eliminated or changed 80% of the time. Being correct 80% of the time should delight administrative decision makers!
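A rough power calculation illustrates the trade-off described in these two paragraphs; the numbers below are hypothetical and use a generic two-group comparison rather than the study's exact regression model:

```python
# Back-of-the-envelope power illustration (all numbers hypothetical): with
# roughly 30 students per class, relaxing alpha from .05 to .20 substantially
# raises the chance of detecting a modest between-class difference.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
for alpha in (0.05, 0.20):
    p = power.power(effect_size=0.5, nobs1=30, ratio=1.0, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> power = {p:.2f}")
```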

Additional Research Needed

This study is one of the first to use AC methods in an experimental design to evaluate student learning in a university management course. Our findings suggest several avenues of new research that might be useful. First, AC exercises should be developed that capture the skills students learn from their course-work. Our exercises were based on traditional AC methods and were designed to measure a broader range of skills than those covered in the management skills course. More focused exercises would be desirable and should be based on the specific skills students are expected to acquire in their course-work. Alternatively, ACs might be used to measure skill acquisition progress over periods greater than a single semester. Perhaps an annual assessment of student learning is more appropriate, but that is an empirical issue.

A second research issue is to explore ways to make AC ratings more reliable. In addition to focusing on the methods raters use to reach agreement, it may be simply easier to use more raters. Researchers can compute alpha reliabilities from raters' independent judgments; the greater the number of raters, the higher the alpha. Some of our pilot studies in the AC have shown that increasing the number of raters to four will raise skill dimension reliabilities into the .80 to .90 range—a very respectable level of reliability (Riggio et al., 1997). If trained student raters are used, rather than professional psychologists, the costs of assessment can be greatly reduced.
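The relationship between the number of raters and the reliability of their pooled judgment is commonly projected with the Spearman-Brown formula; the sketch below assumes an illustrative single-rater reliability of .55, not a value reported in this study:

```python
# Sketch of the rater-pooling logic: the Spearman-Brown formula projects the
# reliability of the mean of k parallel raters from one rater's reliability,
# showing why adding raters helps. The baseline of .55 is hypothetical.
def spearman_brown(single_rater_rel: float, k: int) -> float:
    """Reliability of the mean of k parallel raters."""
    return k * single_rater_rel / (1 + (k - 1) * single_rater_rel)

for k in (1, 2, 4):
    print(k, round(spearman_brown(0.55, k), 2))  # rises from .55 toward the .80s
```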

A highly desirable third avenue of research is to explore the developmental uses of ACs in academic settings. This use of ACs is already widely practiced in the business community. In education, ACs can be used to develop students, faculty, and administrators (Riggio et al., 1997). In our AC, 66% of the student participants rated their experiences as interesting and valuable from the standpoint of giving them insight into managerial work. The psychology students who ran the exercises and rated the AC performances reported that their experiences have increased their career-relevant skills. Typically our assessors are Industrial/Organizational psychology students who have career goals to enter human resource staff departments. Some who have obtained such placement have reported that their AC experience contributed to their being selected for their positions. Studies focusing on non-business disciplines are needed to demonstrate whether similar career enhancement outcomes can be expected through the use of AC methods.

We believe that AC methods hold considerable promise in educational assessment. If the mission of the school is to impart learning, rather than simply deliver instruction, assessing the skill and knowledge gains of the student is perhaps the only way to evaluate mission attainment. While the initial expense of setting up an assessment center may appear daunting to some administrators, the expense is well warranted in terms of curriculum design, curriculum evolution, and judging curriculum effectiveness.


REFERENCES
Barr, R.B., & Tagg, J. (1995). From teaching to learning—A new paradigm for undergraduate education. Change, 27(6), 12-25.
Budner, S. (1962). Intolerance of ambiguity as a personality variable. Journal of Personality, 30, 29-50.
Bycio, P., Alvares, K.M., & Hahn, J. (1987). Situational specificity in assessment center ratings: A confirmatory factor analysis. Journal of Applied Psychology, 72, 463-474.
Cascio, W.F., & Silbey, V. (1979). Utility of the assessment center as a selection device. Journal of Applied Psychology, 64, 107-118.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.
Crowder, R.G., & Wagner, R.K. (1992). The psychology of reading: An introduction. New York: Oxford University Press.
Deshpande, A.S., Webb, S.C., & Marks, E. (1970). Student perceptions of engineering instructor behaviors and their relationships to the evaluation of instructors and courses. American Educational Research Journal, 7, 289-305.
Galambos, J.A., Abelson, R.P., & Black, J.B. (1986). Knowledge structures. Hillsdale, NJ: Erlbaum.
Howard, A., & Bray, D.W. (1988). Managerial lives in transition: Advancing age and changing times. New York: Guilford Press.
Hudelson, E. (1951). The validity of student ratings of instructors. School and Society, 73, 265-266.
Jackson, D.N. (1974). Personality Research Form manual. Port Huron, MI: Research Psychologists Press.
Katz, R.L. (1974). Skills of an effective administrator. Harvard Business Review, 51, 90-102.
McKeachie, W.J., Lin, Y., & Mann, W. (1971). Student ratings of teacher effectiveness: Validity studies. American Educational Research Journal, 8, 435-445.
Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Porter, L.W., & McKibbin, L.E. (1988). Management education and development: Drift or thrust into the 21st century? New York: McGraw-Hill.
Rayder, N.F. (1968). College student ratings of instructors. Journal of Experimental Education, 47, 76-81.
Riggio, R.E., Aguirre, M., Mayes, B.T., Belloli, C.A., & Kubiak, C.R. (1997). The use of assessment center methods for student outcome assessment. In R.E. Riggio & B.T. Mayes (Eds.), Perspectives on assessment centers [Special issue]. Journal of Social Behavior and Personality, 12(5), 273-288.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.
Steers, R.M., & Braunstein, D.N. (1976). A behaviorally-based measure of manifest needs in work settings. Journal of Vocational Behavior, 9, 251-266.
Thomas, E.J., & Fink, C.F. (1963). Effects of group size. Psychological Bulletin, 60, 371-384.
Weaver, C.H. (1960). Instructor rating by college students. Journal of Educational Psychology, 51, 21-25.
Whetten, D.A., & Cameron, K.S. (1991). Developing management skills (2nd ed.). New York: HarperCollins.
