
Western Michigan University

ScholarWorks at WMU

Master's Theses Graduate College

8-1984

Psychometric Characteristics of the Behavioral Observation Scale

Gregg Allen Bolt

Follow this and additional works at: https://scholarworks.wmich.edu/masters_theses

Part of the Industrial and Organizational Psychology Commons

Recommended Citation

Bolt, Gregg Allen, "Psychometric Characteristics of the Behavioral Observation Scale" (1984). Master's Theses. 1482. https://scholarworks.wmich.edu/masters_theses/1482

This Masters Thesis-Open Access is brought to you for free and open access by the Graduate College at ScholarWorks at WMU. It has been accepted for inclusion in Master's Theses by an authorized administrator of ScholarWorks at WMU.



PSYCHOMETRIC CHARACTERISTICS OF THE BEHAVIORAL OBSERVATION SCALE

by

Gregg Allen Bolt

A Thesis Submitted to the Faculty of The Graduate College in partial fulfillment of the requirements for the Degree of Master of Arts, Department of Psychology

Western Michigan University, Kalamazoo, Michigan

August 1984


PSYCHOMETRIC CHARACTERISTICS OF THE BEHAVIORAL OBSERVATION SCALE

Gregg Allen Bolt, M.A.

Western Michigan University, 1984

Self-, peer-, and supervisor ratings were obtained on 52 psychiatric aides using a Behavioral Observation Scale (BOS). Self-ratings showed less leniency error than peer and supervisor ratings. Halo error could not be assessed due to a negative correlation between means and variances. A multitrait-multimethod (MTMM) analysis supported the presence of strong rater bias and significant convergent validity but not discriminant validity. The results of the analyses demonstrated that the ratings obtained from a BOS were not psychometrically superior to other appraisal formats. Questions were raised as to the adequacy of a five-point scale, data transformation, and rating scales.


ACKNOWLEDGEMENTS

I would like to express sincere gratitude to the many people involved in the writing of this thesis.

Thanks go to Gerald DeWeerd and the nursing supervisors who were willing to undertake this research project. Norman Peterson deserves special thanks for his undying willingness to advise me to and from Grand Rapids, and for his words of encouragement. A special thanks also goes to Peninnah Miller and Bradley Huitema, who provided statistical assistance and consultation. I would also like to acknowledge Dale Brethower and Jack Asher, who served as committee members. And to Sheryl, my wife, a special thanks for providing encouragement and support when the obstacles seemed insurmountable.

Gregg Allen Bolt





TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

Chapter

I. INTRODUCTION

Halo Error

Leniency Error

Convergent and Discriminant Validity

II. METHOD

Subjects

BOS Development

Procedure

Analyses

III. RESULTS

IV. DISCUSSION

Data Distribution

Halo Effects

Leniency Effect

MTMM Interpretation

APPENDICES

A. BEHAVIORAL OBSERVATION SCALE FOR PSYCHIATRIC AIDE

B. INSTRUCTIONS FOR BOS

C. ESTIMATES FOR VARIANCE COMPONENTS

REFERENCE NOTES

BIBLIOGRAPHY


LIST OF TABLES

Table

1. Example of BES

2. Example of BOS

3. Example of Graphic Rating Scale

4. Percentage of Raters in Each Category for Items on BOS

5. Means, Variances, One-Way ANOVA, Levene's Test for Equal Variances for Performance Ratings by Rating Source for Each Unit

6. Dunn-Bonferroni Comparison Tests for Performance Ratings

7. Weighted Means for Source Ratings

8. Three-Way Analysis of Variance Summary Table

9. Number and Percentage of Supervisors in Each Category: Latham et al. (1979) Data

LIST OF FIGURES

Figure

1. Mean Source Ratings by Psychiatric Unit


CHAPTER I

INTRODUCTION

The assessment of how well people perform on their jobs has been the focus of considerable research and debate over the past 60 years. The academician and the practitioner have generated literally hundreds of articles suggesting new appraisal systems, revising old ones, and arguing over how to control for rater bias and the like. Why such a furor over appraisal systems? The demand for an effective and efficient method for assessing performance arises out of one or more of the following four purposes: (a) appraisals are the basis of promotion and placement decisions; (b) appraisals are frequently used to determine merit allocation; (c) appraisals are the criterion against which selection devices and training programs are validated; and (d) appraisals are one of the primary sources of performance feedback (Kane & Lawler, 1979). If organizations are to successfully utilize performance appraisals as the data base for personnel decisions, then they must be concerned about how accurately a given performance appraisal reflects actual performance. In light of this concern, the purpose of this study was to explore some of the psychometric properties of a relatively new appraisal system, the Behavioral Observation Scale (BOS), developed by Latham and Wexley (1977). However, before the specifics of the study are described, the concerns for psychometrically sound performance appraisals warrant further comment.

Despite the continuous flow of research and new state-of-the-art appraisals, many of the currently used appraisal systems fall short of expectations in terms of discriminant and convergent validity, reliability, and freedom from rater bias (Kane & Lawler, 1979). For those who must rely on performance appraisals for personnel decisions, awareness of the limitations of appraisals compounds an already difficult decision-making process. Furthermore, legal requirements for appraisals are enforced by the Equal Employment Opportunity Commission (EEOC), the Office of Federal Contract Compliance Programs, and the courts, which demand validity studies for appraisals contributing to adverse impact. Since the EEOC wrote "Guidelines for Employment Selection Procedures" (1970), which in effect placed legal requirements on "tests and other selection procedures which are used as a basis for any employment decision" (p. 65543), numerous court cases have been lost by organizations because employers implemented performance appraisals contributing to adverse impact. (For reviews see Cascio & Bernardin, 1981; Schneier, 1978.) As a result of these legal pressures, the demand for psychometrically sound appraisal systems has increased in the working community.

In response to the need for effective appraisal systems, a number of new models and technical advances in performance appraisal have appeared in the literature over the past 20 years (e.g., Allen & Rosenberg, 1978; Latham & Wexley, 1977; Rosinger, Myers, & Levy, 1982; Smith & Kendall, 1963). One of the new models is the BOS. Within the literature, the BOS is at times referred to as an extension of the Behaviorally Anchored Rating Scale (BARS) or the Behavioral Expectation Scale (BES) developed by Smith and Kendall (1963) (e.g., Feldman, 1980; Landy & Farr, 1980). The BES differs from the BARS in that BES behavioral statements are written as expectations rather than as neutral behaviors, as with BARS; the two terms will nevertheless be used interchangeably here. Both the BES and the BOS are developed using the Critical Incident Technique (Flanagan, 1959). However, the developers of a BES generate behavioral anchors from the incidents, allocate the behavioral anchors to specific dimensions, and use seven of the anchors to form a Thurstone-type rating scale. A BES example is shown in Table 1.

Table 1

Example of BES

Motivation: Willingness to work hard

7  Employee could be expected to lend help to other employees when own work is finished
6
5  Employee could be expected to organize time to insure completion of tasks
4
3  Employee could be expected to need frequent reminders about tasks at hand
2
1  Employee could be expected to frequently forget to complete work and report unfinished tasks

On the other hand, the developers of the BOS derive behavioral descriptions from the incidents, allocate the descriptions to specific dimensions, and attach a Likert-type scale to each description within each dimension. An example is shown in Table 2. The specific procedures for developing a BOS are found in Latham and Wexley (1981). According to Latham and Wexley (1981), the BOS was developed primarily to overcome the problems plaguing graphic rating scales, appraisals based on cost-related outcomes, and the BES.

Table 2

Example of BOS

Motivation

1. Lends help to other staff when needed
   Almost Never   1   2   3   4   5   Almost Always

2. Organizes time so all tasks are completed
   Almost Never   1   2   3   4   5   Almost Always

3. Forgets to report unfinished tasks
   Almost Always  1   2   3   4   5   Almost Never
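The Likert-type format in Table 2 lends itself to simple additive scoring. The sketch below is a minimal illustration under stated assumptions, not Latham and Wexley's published scoring procedure: it assumes each item is rated from 1 to 5, that a dimension score is the sum of its item scores, and that a negatively worded item rated on an unreversed frequency scale (1 = Almost Never, 5 = Almost Always) must be reverse-keyed as 6 minus the rating before summing. The `BosItem` class and its `reverse_keyed` flag are hypothetical. Item 3 in Table 2 sidesteps this arithmetic by reversing its anchors on the form itself.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert-type frequency scale


@dataclass
class BosItem:
    text: str
    reverse_keyed: bool = False  # hypothetical flag: True for negatively worded items


def item_score(item: BosItem, frequency: int) -> int:
    """Convert a raw frequency rating (1 = Almost Never, 5 = Almost Always)
    into a favorability score, reversing negatively worded items."""
    if not SCALE_MIN <= frequency <= SCALE_MAX:
        raise ValueError(f"rating {frequency} outside {SCALE_MIN}-{SCALE_MAX}")
    if item.reverse_keyed:
        return SCALE_MIN + SCALE_MAX - frequency  # 1 <-> 5, 2 <-> 4, 3 stays 3
    return frequency


def dimension_score(items: list[BosItem], frequencies: list[int]) -> int:
    """Sum item favorability scores within one behavioral dimension."""
    if len(items) != len(frequencies):
        raise ValueError("one rating is required per item")
    return sum(item_score(i, f) for i, f in zip(items, frequencies))


motivation = [
    BosItem("Lends help to other staff when needed"),
    BosItem("Organizes time so all tasks are completed"),
    BosItem("Forgets to report unfinished tasks", reverse_keyed=True),
]

# A ratee who almost always helps (5), usually organizes time (4),
# and seldom forgets to report unfinished tasks (raw frequency 2).
print(dimension_score(motivation, [5, 4, 2]))  # 5 + 4 + (6 - 2) = 13
```

Keeping the reversal in the scoring step, rather than on the form, lets every item share the same anchor wording; Table 2 takes the opposite approach and reverses the anchors so that a higher number is always more favorable.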

Graphic rating scales, or trait scales, have been increasingly criticized by researchers and looked upon unfavorably by the courts (Borman & Dunnette, 1975; Holley & Field, 1975; Kleiman & Durham, 1981; Kleiman & Faley, 1978; Latham & Wexley, 1977; Schneier, 1978). An example is shown in Table 3. Briefly, traits tend to be ambiguous and cause confusion and misinterpretation by rater and ratee alike. In addition, at the time of evaluation, unless the evaluator knows specifically what behaviors the traits denote, feedback from trait scales can frequently be meaningless or misleading, and consequently have little impact on future performance. Cascio and Bernardin (1981) advised, "Performance dimensions should be behaviorally based. Avoid abstract trait names in graphic rating scales."

Table 3

Example of Graphic Rating Scale

                    Unfavorable              Favorable
                     1     2     3     4     5
Quick Tempered      ___   ___   ___   ___   ___
Stubborn            ___   ___   ___   ___   ___
Intelligent         ___   ___   ___   ___   ___
Firm                ___   ___   ___   ___   ___
Appreciates Me      ___   ___   ___   ___   ___

Despite the lack of support for graphic rating scales, it should be noted that little evidence supports the psychometric superiority of behaviorally based appraisals over graphic rating scales. In a study designed to assess the utility of three rating instruments, including a trait scale and a BARS, Decotiis (1977) reported that the three instruments were approximately equal in their resistance to errors of leniency and central tendency. Landy and Farr (1980), in their review of rating tools, concluded, "After more than 30 years of serious research, it seems that little progress has been made in developing an efficient and psychometrically sound alternative to the traditional graphic rating scale" (p. 89). Though one may conclude, on the basis of common sense or intuition, that behaviorally based appraisal tools are superior to trait scales, data supporting their psychometric superiority have yet to be documented.

Appraisals based on cost-related outcomes have some value, but when used in isolation from other performance data, they can often be misleading and omit relevant performance information. Such measures typically reflect economic outcomes of the organization (e.g., profits, costs, return on investments). Latham and Wexley (1981) cited the following problems associated with cost-related formats: (a) cost-related measures frequently omit relevant factors for which the ratee should be held accountable; (b) cost-related measures are often difficult to obtain for every employee; (c) cost-related measures can, for some employees, involve factors beyond their control; (d) cost-related measures can fail miserably in providing the specific performance feedback necessary for increasing or maintaining productivity; and (e) cost-related measures can foster a "results-at-all-costs mentality" which can run counter to organizational values and goals (pp. 41-44). It seems that if cost-related measures are to be used, they should be carefully scrutinized so that they reflect only the factors under the control of the ratee, and they should be used as data complementary to behaviorally based data. Latham, Fay, and Saari (1979) supported the need for behaviorally based data, for without it, "it may be easy to determine whether an employee is or is not meeting a set of objectives, but the answer(s) to the question(s) of how and why can remain elusive" (p. 300).

The BARS has received a considerable amount of attention in the literature since its development by Smith and Kendall (1963). (For reviews see Landy & Farr, 1980; Schwab, Heneman, & Decotiis, 1975.) It was the intention of Smith and Kendall to develop a behaviorally based rating scale derived from a complete job analysis; hence, the scale would take into account all critical behaviors of a job and be specific enough to avoid the confusion of ambiguous trait names. In addition, the scale could encompass cost-related measures. The BARS was expected to provide psychometrically sound and specific performance feedback. Unfortunately, the BARS has fallen short of original expectations. Studies that have set out to support the psychometric properties of the BARS have reported equivocal results (Bernardin, 1977; Bernardin, Alvares, & Cranny, 1976; Borman & Dunnette, 1975; Campbell, Dunnette, Arvey, & Hellervik, 1973; Kingstrom & Bass, 1981; Landy & Farr, 1980; Schwab et al., 1975; Shapira & Shirom, 1980). Borman and Vallon (1974) reported that after a BARS was developed in one setting and used in another, the effectiveness of the appraisal decreased. Subjectivity in categorizing the anchors on a BES is cited as a problem by Latham and Wexley (1981), since nonindependent categories may result in redundancy. Finally, Borman (1979) and Landy and Farr (1980) argued that a rater using BARS frequently has problems discerning the similarity between the anchors on the scale and actual performance, which may result in significant rating errors and poor validity. The final question raised by Landy and Farr (1980) concerns whether or not the benefits of developing a BARS outweigh the costs. This appears to be a legitimate concern in light of the limitations of BARS.


In order to overcome the limitations of BARS, and yet retain the practical and legal advantages of a behaviorally based rating scale, Latham and Wexley (1977) developed the BOS. In a discussion that cited four disadvantages of BES usage, Latham and Wexley (1981) argued that such limitations do not occur with BOS usage. First, "endorsement of an incident above the neutral point on the BES implies endorsement of all other incidents between the incident checked and the neutral point" (p. 63). The rater using the BOS is allowed to evaluate the ratee on all relevant behaviors within a behavioral dimension, whereas the rater using the BES is forced to evaluate the ratee on an entire behavioral dimension with a single endorsement. Problems occur when the rater cannot endorse the items between the neutral point and the behavioral item endorsed. This problem does not occur with BOS usage.

Second, "the subjective definition of 'critical' is minimized in the generation of the behavioral items for BOS" (Latham & Wexley, 1981, p. 63). In the process of BES development, only those items judged to be "critical" are retained as anchors on the rating scale, thereby increasing the risk that content validity will be substantially reduced. Because all behavioral items that are not redundant are retained for the BOS, content validity is not jeopardized in BOS development.

Third, "in using BES, standard or normal behaviors may not be remembered in the same way as unusual or unique behaviors" (Latham & Wexley, 1981, p. 64). In order to overcome this problem, the BES user must systematically record performance on normal, routine behavior. The recording procedure could easily become a time-consuming task if the relevant behaviors are unknown. The BOS eliminates this problem because it serves as a checklist for both the rater and the ratee; irrelevant behaviors are ignored.

Finally, Latham and Wexley suggested that the range of behaviors on a single dimension may be biased by the judges who develop it. Atkin and Conlon (1978) conducted research on the Thurstone scale and found that when judges believe one dimension is considerably more important than others, they will describe few acceptable behaviors, many unacceptable behaviors, and almost no neutral behaviors. Problems of this sort are avoided if one uses a BOS. On the BOS, a rater is simply required to rate the frequency of the behavior observed; all relevant behaviors are found on the scale. Although Latham and Wexley (1981) argued for the superiority of the BOS, most of their arguments were based on logic rather than research data. In assessing the psychometric characteristics of the BOS, Latham and Wexley did not argue that the BOS was superior to the BES, but they did suggest that the scale satisfied EEOC requirements and standards.
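The first of Latham and Wexley's objections can be made concrete with a short sketch. It assumes, for illustration only, that point 4 is the neutral point of the seven-point BES in Table 1; the function name is hypothetical, and extending the quoted rule to points below neutral is a labeled assumption, since the quotation covers only points above it. The contrast is in what each format records: one number per dimension on the BES, one frequency rating per behavior on the BOS.

```python
# Hypothetical sketch of the quoted BES objection, assuming point 4 of the
# seven-point scale in Table 1 is the neutral point.
NEUTRAL_POINT = 4


def implied_endorsements(checked: int, neutral: int = NEUTRAL_POINT) -> list[int]:
    """Return every BES scale point a single checkmark is taken to endorse.

    Above neutral, checking a point implies endorsing each point from just
    above neutral up to the one checked. Applying the mirror-image rule below
    neutral is an assumption; the quoted passage covers only the upper half.
    """
    if not 1 <= checked <= 7:
        raise ValueError("BES ratings run from 1 to 7")
    if checked > neutral:
        return list(range(neutral + 1, checked + 1))
    if checked < neutral:
        return list(range(checked, neutral))
    return [neutral]


# One BES checkmark stands in for the whole dimension ...
print(implied_endorsements(7))  # [5, 6, 7]

# ... whereas a BOS keeps a separate frequency rating for each behavior
# (anchors as in Table 2), so mixed performance within one dimension
# is still recorded rather than collapsed onto a single point.
bos_motivation = {
    "Lends help to other staff when needed": 5,
    "Organizes time so all tasks are completed": 3,
    "Forgets to report unfinished tasks": 2,
}
print(len(bos_motivation), "independent ratings for one dimension")
```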

The studies that supported Latham and Wexley's contention that the BOS was satisfactory for assessing performance, in terms of both reliability and validity, were, surprisingly, based on the same data set. Latham and Wexley (1981) wrote: "In previous studies (Latham & Wexley, 1977; Latham, Wexley & Rand, 1975; and Ronan & Latham, 1974) the test-retest and interobserver reliability, as well as the validity of the BOS in indicating employee attendance and productivity, were demonstrated" (p. 63). Although a complete analysis of each study is beyond the scope of the present study, it should be pointed out that the three studies cited in the above quote report hypotheses supported by the same data set, taken from the performance of loggers in the southeastern United States. From this, one might conclude that at best the reliability and validity of the BOS appear promising, but further research is necessary before more conclusive statements can be made.

In addition, Latham and Wexley (1981) contended that the BOS typically satisfies EEOC standards in terms of content validity and interjudge agreement of categorization. These standards will most likely be met if the procedure that Latham and Wexley (1977, 1981) described for BOS development is followed.
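Interjudge agreement of categorization can be summarized as the proportion of items that an independent judge sorts back into the dimension the scale developers assigned. The sketch below is a minimal illustration under that assumption, not the procedure Latham and Wexley specify. The two "Patient care" items and the judge's re-sort are hypothetical, and no particular cutoff for acceptable agreement is implied.

```python
from collections import Counter

# Item-to-dimension assignments made by the scale developers. The first three
# items come from Table 2; the "Patient care" items are hypothetical.
developer_sort = {
    "Lends help to other staff when needed": "Motivation",
    "Organizes time so all tasks are completed": "Motivation",
    "Forgets to report unfinished tasks": "Motivation",
    "Documents patient behavior during the shift": "Patient care",
    "Responds calmly to agitated patients": "Patient care",
}

# Hypothetical re-sort by an independent judge: one item is moved.
judge_sort = dict(developer_sort)
judge_sort["Organizes time so all tasks are completed"] = "Patient care"


def categorization_agreement(original: dict[str, str], judge: dict[str, str]) -> dict[str, float]:
    """Proportion of each dimension's items that the judge placed in the same dimension."""
    totals = Counter(original.values())
    hits = Counter(dim for item, dim in original.items() if judge.get(item) == dim)
    return {dim: hits[dim] / n for dim, n in totals.items()}


def overall_agreement(original: dict[str, str], judge: dict[str, str]) -> float:
    """Proportion of all items sorted into the same dimension by developers and judge."""
    matches = sum(judge.get(item) == dim for item, dim in original.items())
    return matches / len(original)


print(categorization_agreement(developer_sort, judge_sort))
print(overall_agreement(developer_sort, judge_sort))  # 4 of 5 items agree -> 0.8
```

Reporting agreement per dimension, and not only overall, shows which dimension's items are being confused with one another, which is the redundancy problem raised earlier for BES anchors.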

With respect to rater bias, Latham and Wexley (1981) suggested that bias was minimized "because observers do not have to extrapolate from what they have observed to the placement of a checkmark beside an example on the scale that may or may not be appropriate" (p. 63). Empirical support for this final contention has yet to be documented.

The need for further research on the psychometric properties of the BOS is evident. From a psychometric standpoint, the BOS has received only indirect criticism: criticism of behaviorally based performance scales has frequently been directed at the BARS or BES and only indirectly at the BOS (e.g., Landy & Farr, 1980). The contention that the BOS is an extension of the BES does not necessarily allow one to argue that the psychometric properties of the BES are equivalent to those of the BOS, because the psychometric properties of the Thurstone scale are not equivalent to those of the Likert scale. Independent research on the psychometric characteristics of the BOS is therefore necessary so that users of the BOS can be assured of its effectiveness. The research already completed on halo error, leniency error, and convergent and discriminant validity will be reviewed.

Halo Error

Halo error has been defined by Holzbach (1978) as a bias in ratings that occurs when a rater evaluates an individual on various items and dimensions without differentiating among them, but instead evaluates the ratee according to a single global or overall judgment. A second usage of halo error was offered by Cooper (1981); he wrote, "Salient features affect the ratings of categories that the rater believes are related to the salient features" (p. 218). Both definitions, though conceptually different, are operationalized similarly. In both cases, the raters are depicted as tarnishing the ratings by evaluating the ratee in light of a global evaluation or of some salient feature(s).

In a review entitled "Ubiquitous Halo," Cooper (1981) identified two forms of halo error that occur in all rating scales: illusory halo and true halo. Illusory halo is what one generally thinks of as halo error; it is the bias in ratings that most appraisal users wish to avoid. True halo is operationalized as the true correlations that exist between dimensions. Any correlation between two dimensions on an appraisal will consist of some illusory halo and some true halo; that is, the observed correlation coefficient is the sum of the true correlation (true halo) and illusory halo. Both types of halo warrant review.


12

Five sources of illusory halo have been identified by Cooper (1981): undersampling, engulfing, insufficient concreteness, insufficient rater motivation and knowledge, and cognitive distortions. Undersampling as a source of halo error occurs when the rater has insufficient information on the ratee's behavior; therefore, the rater is forced to rely on a global impression or a few salient features to make each rating decision.

Engulfing traces halo error to the rater's belief that categories covary with global impressions or salient features.

Halo error that is attributable to insufficient concreteness occurs when raters base their ratings on salient features because they are unable to differentiate among the rating dimensions. An implication of this theory is that if rating dimensions and rating items are highly descriptive as opposed to abstract, halo error will be reduced. Empirical evidence for this has been equivocal. Cooper (1981) found evidence to support the hypothesis that halo error is reduced with highly descriptive and concrete rating items, whereas Finley, Osburn, Dubin, and Jeanneret (1977) reached no firm conclusions on the effects of general and specific anchors on their rating scale.

A fourth source of illusory halo attributes biased ratings to one's inability or unwillingness to sensitize oneself to committing halo errors. To remove this source of halo error, researchers have attempted to train raters to reduce illusory halo. Some training methods have been more successful than others; however, mixed results have been more prevalent (e.g., Borman, 1979; Fay & Latham, 1982; Thorton & Zorich, 1980; Warmke & Billings, 1979; Zedeck & Cascio, 1982).

A final source of illusory halo is cognitive distortion. Cooper (1981) argued that stored observations become distorted over time as one adds and deletes information in the cognitive process. Essentially, detail is lost and beliefs about dimension covariance are added. In the typical organization with an annual review, for example, raters must recall an individual's performance over the past year; actual behaviors cannot be recalled, but impressions resulting from cross-dimension correlations are recalled. If the cross-dimension correlations overstate true covariance, illusory halo results. Cooper wrote, "This fifth source has been unappreciated in the halo-reduction literature" (p. 221).

For those who must rely on rating scales for performance reviews, the presence of true halo in addition to illusory halo makes ratings difficult to interpret. The premise that true correlations exist between dimensions has been supported in the literature (Cooper, 1983; Fay & Latham, 1982; Murphy, 1982). Cooper (1981) argued that the abilities needed to perform a specific job are frequently more homogeneous than heterogeneous. Cooper concluded that although a job may involve various duties, the skills and abilities needed to perform those duties are often dependent and correlated, resulting in true halo on performance appraisals. For the researcher as well as the employer who uses rating scales, this implies that the actual correlations between dimensions must be computed in order to separate true halo from illusory halo. Murphy (1982) agreed and wrote, "Unless the researcher has some independent estimate of true correlations among performance dimensions and of the correlations between performance appraisal items and overall evaluations, it is simply not possible to cite the observed correlations as evidence of rating error" (p. 162).

The evidence supporting the existence of true and illusory halo was convincing and was taken into consideration when assessing halo error in the present study. No attempt was made to suggest a higher or lower magnitude of halo error for the BOS as compared to other appraisal systems. Rather, the present study was designed to assess differences in the magnitude of halo error across raters (self, peer, and supervisor) using the BOS. Because all three rater sources rated the same ratees, it was concluded that distinguishing illusory halo from true halo would make no difference to the results of the present study.

As suggested above, the present study was concerned with halo error as it occurred across raters. A substantial amount of research has addressed this issue, although none has been found that uses the BOS (see Landy & Farr, 1980, for a review). Studies that examined differences in halo error across rater roles have reported equivocal results. Thorton (1980) reviewed the literature on the psychometric properties of self-appraisals and found 12 studies reporting a higher incidence of halo for self-ratings than for peer- and supervisor-ratings; however, he also found 10 studies in which self-appraisals manifested less halo than the comparison groups. Differences in halo error between peer-ratings and supervisor-ratings have also been studied. Klimoski and London (1974) reported a greater degree of halo error in peer-ratings, whereas Holzbach (1978) found similar degrees of halo in peer-ratings and supervisor-ratings.

Researchers have suggested various hypotheses to explain differences in halo error among raters. One hypothesis states that raters who occupy different roles in an organization view any one particular job from different vantage points (Borman, 1974; Holzbach, 1978; Schneier & Beatty, 1978; Zedeck, Imporato, Krausz, & Oleno, 1974). Each rater in this case may have different expectations for an individual's performance based on the rater's own job and experiences. Schneier and Beatty (1978) hypothesized that "differences in job duties and proximity, causing differing frequencies and/or duration of observation of ratee performance, could account for divergent ratings given by, for example, superiors and peers" (p. 130).

If raters occupying different vantage points interpret appraisal dimensions and items differently, then one might expect that appraisal items that are very specific and behavioral in nature would be interpreted in the same way by raters from every vantage point, thus reducing differences in halo error across raters. An examination of the studies that assessed halo error across raters showed that many researchers used graphic rating scales or BARS (Borman, 1974; Heneman, 1974; Holzbach, 1978; Klimoski & London, 1974; Lawler, 1967; Lee, Malone, & Greco, 1981; Parker, Taylor, Barret, & Martens, 1959; Schneier & Beatty, 1978; Zammuto et al., 1982). In contrast, Cooper (1983) reported that very specific, behavioral rating items reduced halo error. If insufficient concreteness promotes halo error and, in particular, causes raters in different roles to commit different degrees of halo error, then it can be hypothesized that with a behavioral rating scale such as the BOS, differences in halo error across raters will be virtually nil. The point is that rating scales consisting of clearly delineated, specific, and measurable items should not be subject to misinterpretation from any vantage point. Therefore, no differences in halo error should occur across raters from different roles in the organization. In the present study, it was hypothesized that no differences in the degree of halo error would occur across supervisor-, peer-, and self-ratings.

Leniency Error

According to Holzbach (1978), "leniency errors, attributable to specific rating sources, occur when ratings from different rating sources on the same ratee group are significantly different" (p. 579). Latham and Wexley (1981) suggested that positive and negative leniency errors are committed by raters who rate too easily or too harshly, respectively. Undue leniency errors create two problems: a measurement problem and a practical problem.

The measurement problem occurs when leniency errors cause undue restriction of range in the performance ratings, which limits the magnitude of the possible relationship between the ratings and other variables of interest (Holzbach, 1978). The practical problem occurs when the ratee interprets the performance ratings. With positive leniency errors, the performer will incorrectly assume adequate performance and continue to perform poorly. An appraisal contaminated with negative leniency errors will incorrectly reflect poorer performance than actually occurs, and the performer may be deprived of deserved rewards or promotions. In either situation, undue leniency errors may result in poor performance.
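The measurement problem can be illustrated with simulated data. The sketch below is a hypothetical example, not data from the present study: it draws two correlated variables (a rating and some other variable of interest), then keeps only the cases rated above the mean, roughly mimicking ratings crowded toward the lenient end of the scale, and compares the correlation before and after the restriction. The sample size, correlation, and seed are arbitrary assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, rho=0.6, seed=1):
    """Draw n (rating, criterion) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    ratings, criterion = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        ratings.append(z1)
        criterion.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return ratings, criterion


ratings, criterion = simulate()
r_full = pearson(ratings, criterion)

# Hypothetical restriction of range: keep only cases rated above the mean.
kept = [(r, c) for r, c in zip(ratings, criterion) if r > 0]
r_restricted = pearson([r for r, _ in kept], [c for _, c in kept])

print(f"full range r = {r_full:.2f}; restricted range r = {r_restricted:.2f}")
```

The restricted correlation comes out noticeably smaller than the full-range correlation even though the underlying relationship is unchanged, which is the attenuation Holzbach (1978) describes.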

With respect to leniency error across raters, the evidence suggests that self-appraisals are more lenient than either peer- or supervisor-ratings (Holzbach, 1978; Klimoski & London, 1974; Meyer, 1980; Parker et al., 1959; Schneier, 1978; Thorton, 1980), although one study (Heneman, 1974) reported less leniency error for self-appraisals than for supervisor-appraisals. With respect to supervisor- and peer-appraisals, two studies (Schneier, 1978; Zedeck et al., 1974) reported that supervisor-appraisals demonstrated less leniency error, and one study (Holzbach, 1978) reported no significant differences.

Very little research has been done to explain why leniency errors occur. Zammuto et al. (1982) reported organizational differences in leniency error for six items on their performance appraisal, though no conclusions were reached as to why this occurred. Many of the rater characteristics discussed earlier probably affect leniency errors. In addition, leniency errors may be greater simply because individuals want to make themselves, or a fellow employee, appear competent. The consequences of the appraisal, then, could play an important role; Zedeck and Cascio (1982) supported this assumption. Consistent with the research reviewed, it was hypothesized that self-appraisals would demonstrate more leniency error than either peer- or supervisor-ratings.

Before the literature on convergent and discriminant validity is reviewed, a discussion of the operational definitions of halo error and leniency error is warranted, since some have found that different operational definitions yield different values of error (e.g., Saal, Downey, & Lahey, 1980). Saal et al. described four methods of halo error assessment taken from the literature. Briefly, the methods are the following: (a) examination of the intercorrelations among dimension ratings, where higher intercorrelations suggest greater halo error; (b) factor analysis of the dimension intercorrelation matrix, where the emergence of fewer factors or principal components indicates greater halo error; (c) analysis of the variance or standard deviation of a rater's ratings of an individual across performance dimensions, where less variance (restricted standard deviations) suggests a greater incidence of halo error; and (d) a rater x ratee x dimension ANOVA, where a significant rater x ratee interaction, especially one that accounts for a sizeable proportion of the total variance, is interpreted as halo error. The operational definition used in the present study to assess the incidence of halo error among raters is the third, examination of rater variance. Although Saal et al. (1980) criticized all four definitions as poor indicators of absolute halo, this definition will suffice for the present purpose of comparing halo error among raters.
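A minimal sketch of definitions (a) and (c), assuming one rater's ratings are stored as a list of ratees, each a list of dimension scores on the 1 to 5 scale. The function names and sample ratings are hypothetical. Definition (c), the one adopted in the present study, is the per-ratee standard deviation across dimensions, where smaller values suggest more halo; definition (a) is the average correlation between pairs of dimensions computed across ratees, where larger values suggest more halo.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per_ratee_sd(ratings):
    """Definition (c): SD of each ratee's scores across dimensions.
    Smaller values suggest more halo."""
    return [statistics.pstdev(row) for row in ratings]


def mean_intercorrelation(ratings):
    """Definition (a): mean correlation between every pair of dimensions,
    computed across ratees. Larger values suggest more halo."""
    columns = list(zip(*ratings))  # one tuple of scores per dimension
    rs = [pearson(columns[j], columns[k])
          for j in range(len(columns)) for k in range(j + 1, len(columns))]
    return sum(rs) / len(rs)


# Hypothetical ratings by two raters: rows are ratees, columns are dimensions.
rater_a = [[5, 5, 5, 5], [3, 3, 3, 3], [4, 4, 4, 4]]  # same score on every dimension
rater_b = [[5, 2, 4, 1], [3, 5, 1, 4], [4, 1, 5, 2]]  # differentiated scores
```

Rater A gives each ratee an identical score on every dimension, so both indices point to maximal halo; rater B differentiates among dimensions, so the indices point to less halo.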

According to Saal et al. (1980), three operational definitions exist for assessing leniency error. The first and most popular is to compare the mean dimension ratings with the midpoint of the scale: mean ratings that significantly exceed the midpoint reflect leniency, whereas mean ratings below the midpoint reflect severity. The second uses a rater x ratee x dimension ANOVA, in which a rater main effect, especially one that accounts for a large proportion of the total variance, is said to reflect leniency. Finally, Saal et al. noted that a few studies examined the skewness of dimension ratings for evidence of leniency: a significant negative skewness is said to reflect leniency, whereas a significant positive skewness is said to reflect severity. A problem with assessing leniency is that, without actual performance data, no absolute degree of leniency can be determined. The present study operationally defined leniency as present when supervisor, peer, and self mean ratings differed significantly across behavioral items. The interest was in examining the incidence of leniency, not in assessing absolute leniency.
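The first and third leniency definitions can be sketched as follows, assuming a list of ratings on the 1 to 5 BOS scale with midpoint 3. The function names and sample values are hypothetical. The t statistic compares the mean rating with the midpoint; the skewness statistic is the adjusted Fisher-Pearson coefficient, where negative values indicate ratings piled up at the top of the scale. Converting either statistic to a significance level would require a statistics library and is omitted.

```python
import math
import statistics

MIDPOINT = 3  # midpoint of the 1-5 BOS rating scale


def midpoint_t(ratings, midpoint=MIDPOINT):
    """Definition 1: one-sample t statistic comparing the mean rating with
    the scale midpoint (positive = lenient, negative = severe)."""
    n = len(ratings)
    se = statistics.stdev(ratings) / math.sqrt(n)
    return (statistics.mean(ratings) - midpoint) / se


def skewness(ratings):
    """Definition 3: adjusted Fisher-Pearson sample skewness
    (negative = lenient, positive = severe)."""
    n = len(ratings)
    m = statistics.mean(ratings)
    m2 = sum((x - m) ** 2 for x in ratings) / n
    m3 = sum((x - m) ** 3 for x in ratings) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Hypothetical ratings piled up at the top of the scale.
sample = [5, 5, 5, 5, 5, 5, 4, 4, 4, 3]
print(f"t vs midpoint = {midpoint_t(sample):.2f}, skewness = {skewness(sample):.2f}")
```

Both statistics flag this sample as lenient: the mean lies above the midpoint and the distribution is negatively skewed. Neither says anything about absolute leniency, for the reason given above.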

Convergent and Discriminant Validity

Since Campbell and Fiske (1959) first introduced multitrait-multimethod (MTMM) analysis as a means to assess convergent and discriminant validity, a substantial amount of research has assessed these validities for appraisal scales (for reviews, see Holzbach, 1978; Lee, Malone, & Greco, 1981; Kavanagh, MacKinney, & Wolins, 1971). Included in that research were investigations reporting on the discriminant and convergent validity of ratings obtained with BARS. A few studies (Dickenson & Tice, 1973; Zedeck & Baker, 1972) have reported little evidence of either validity. In contrast, Friedman and Cornelius (1976) found evidence of convergent validity and less halo when participants were active in BARS construction. In addition, Lee, Malone, and Greco (1981), using MTMM, found good convergent and discriminant validity for a summated rating scale. No research was found assessing the discriminant and convergent validity of a BOS.

Before the present hypothesis is proposed, it is worth reviewing the meaning and value of both convergent and discriminant validity. Holzbach (1978) defined convergent validity as "the extent of agreement between two or more measures of the same trait using different methods" (p. 580) and discriminant validity as "the extent of independence between measures of different traits" (p. 580). Although the definitions of convergent and discriminant validity are fairly straightforward, the value of assessing them is more obscure. Lawler (1967) wrote, "the primary gain from a research point of view is that this approach [MTMM] allows the researcher to develop a much more sophisticated understanding of his criteria than is possible where it is not employed" (p. 372). Part of this understanding, as explained by Lawler, comes about through determining convergent and discriminant validity. Unfortunately, it seems that numerous researchers, like Lawler, have assumed that more information is better than none, since many of the studies reviewed by the present researcher never mentioned why discriminant and convergent validity were being assessed.

It is important to establish why convergent and discriminant validity are valuable so that the results of any study assessing them can be put into proper perspective. First, Campbell and Fiske (1959) wrote, "validation is typically convergent, a confirmation by independent measurement procedures. Independence of methods is a common denominator among the major types of validity (excepting content validity) insofar as they are to be distinguished from reliability" (p. 81). Although convergent validity cannot and does not evaluate validity in the absolute sense, evidence of convergent validity (correlations between the same items or dimensions, as rated by various raters, that are significantly larger than zero) indicates that the raters are rating the same construct or behavior. With a BOS, therefore, significant convergent validity would suggest that the raters (peer, self, and supervisor) are rating the same behavior.

Second, Campbell and Fiske (1959) wrote, "for justification of novel trait measures, for the validation of test interpretation or for the establishment of construct validity, discriminant validation as well as convergent validation is required. Tests can be invalidated by too high correlations with other tests from which they were intended to differ" (p. 81). In the same way, behavioral items on a BOS can be invalidated by too high correlations with other items from which they were intended to differ. If the BOS used in the present study were found to possess a significant degree of discriminant validity, then it could be said to differentiate among behavioral items. Whether that differentiation was accurate and meaningful cannot be argued without additional "true" performance data. All that could be argued is that the behavioral items discriminate performance among ratees and/or within one ratee in an orderly fashion.
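A minimal sketch of the Campbell and Fiske comparison, assuming each rater source's ratings are stored per behavioral item as a list over the same ratees in the same order. The item names, sources, and scores are hypothetical. Convergent validity appears as correlations between sources on the same item (monotrait-heteromethod) that are well above zero; discriminant validity requires those correlations to exceed the correlations between sources on different items (heterotrait-heteromethod).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mtmm_summary(data):
    """data: {source: {item: [score for each ratee, same ratee order]}}.
    Returns (mean monotrait-heteromethod r, mean heterotrait-heteromethod r)."""
    sources = sorted(data)
    items = sorted(data[sources[0]])
    same_item, different_item = [], []
    for s1, s2 in combinations(sources, 2):  # every pair of rater sources
        for i in items:
            for j in items:
                r = pearson(data[s1][i], data[s2][j])
                (same_item if i == j else different_item).append(r)
    return (sum(same_item) / len(same_item),
            sum(different_item) / len(different_item))


# Hypothetical ratings of four ratees on two items by two rater sources.
data = {
    "self": {"item A": [5, 4, 3, 2], "item B": [2, 5, 3, 4]},
    "supervisor": {"item A": [5, 4, 3, 1], "item B": [1, 5, 3, 4]},
}
convergent, heterotrait = mtmm_summary(data)
```

In this toy example the same-item correlations are high and exceed the different-item correlations, the pattern the present hypothesis anticipates. With 48 items and three sources, inspecting every such correlation by hand quickly becomes impractical.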

It should be emphasized again that even if a BOS were found to possess significant degrees of convergent and discriminant validity, direct inferences about the accuracy or criterion-related validity of the items could not be made. It is possible that a BOS could possess both types of validity for assessing performance, yet be invalid as a measure of "true" performance. True performance data are needed to make true validity inferences. This observation about convergent and discriminant validity was further delineated and supported by Lawler (1967).

Similar to the present study, other investigations that have used the MTMM approach to assess convergent and discriminant validity for combinations of supervisor-, self-, and peer-ratings have generally reported support for convergent validity and little or no support for discriminant validity (Heneman, 1974; Kavanagh et al., 1971; Klimoski & London, 1974; Lawler, 1967; Lee, Malone, & Greco, 1981). The lack of discriminant validity has been attributed primarily to large halo effects. One observation that can be made regarding these studies is that the appraisal scales used were graphic rating scales or BARS. As pointed out earlier, a summated rating scale with specific items produced good convergent and discriminant validity (Lee et al., 1981).

In the present study, the discriminant and convergent validity of the BOS as a measure of performance were explored using MTMM. The traits were the behavioral items on the BOS, and the methods were the rater sources: peers, self, and supervisors. It was hypothesized that, since the items were very concrete and specific, significant convergent and discriminant validity would be found.

In review, the other two hypotheses previously proposed are that (a) no differences in degree of halo error would occur across rater sources, and (b) self-ratings would demonstrate more leniency error than peer-ratings or supervisor-ratings.


CHAPTER II

METHOD

Subjects

Subjects for BOS development were 30 psychiatric aides (29 male and 1 female; hereafter, aides) randomly selected from a population of 95 aides at a psychiatric facility in Western Michigan.

Subjects who provided data for hypothesis testing included 49 male and 2 female aides, six nursing supervisors, and an undetermined number of peers, both aides and nurses (RNs), who provided 156 peer ratings. (The number of peers could not be determined, since some peers rated more than one aide and raters' names were kept anonymous.)

BOS Development

The procedure followed for BOS development closely resembled the procedure outlined by Latham and Wexley (1981). From a computer-generated list of 95 aides, 30 aides were randomly selected using a table of random numbers. From each of the six psychiatric subunits, the number of aides selected was proportional to that unit's share of the total number of aides. The critical incident technique developed by Flanagan (1959) was used to collect ten critical incidents from each aide: five incidents that described effective behavior and five incidents that described ineffective behavior. (Further clarification of effective and ineffective incidents can be found in Latham and Wexley, 1981.)
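Selecting aides from each subunit in proportion to its size amounts to stratified random sampling with proportional allocation. The sketch below is illustrative only: the unit names and sizes are hypothetical (chosen to sum to 95), largest-remainder rounding is an assumed way of making the allocations sum exactly to 30, and a seeded pseudorandom generator stands in for the table of random numbers actually used.

```python
import math
import random


def proportional_allocation(unit_sizes, sample_size):
    """Allocate sample_size across units in proportion to unit size,
    using largest-remainder rounding so the total is exact."""
    total = sum(unit_sizes.values())
    quotas = {u: sample_size * n / total for u, n in unit_sizes.items()}
    alloc = {u: math.floor(q) for u, q in quotas.items()}
    shortfall = sample_size - sum(alloc.values())
    # Give the remaining slots to the units with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for u in by_remainder[:shortfall]:
        alloc[u] += 1
    return alloc


def stratified_sample(roster, sample_size, seed=0):
    """roster: {unit: [aide ids]}; returns {unit: randomly sampled ids}."""
    rng = random.Random(seed)
    alloc = proportional_allocation({u: len(ids) for u, ids in roster.items()}, sample_size)
    return {u: rng.sample(ids, alloc[u]) for u, ids in roster.items()}


# Hypothetical roster: six units, 95 aides in total.
sizes = {"unit1": 20, "unit2": 18, "unit3": 16, "unit4": 15, "unit5": 14, "unit6": 12}
roster = {u: [f"{u}-aide{i}" for i in range(n)] for u, n in sizes.items()}
sample = stratified_sample(roster, 30)
```

With these assumed sizes, each unit's exact quota is 30 x (unit size / 95); rounding down gives 27 aides, and the three units with the largest fractional remainders each receive one more.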

A total of 300 incidents was collected. After initial screening for redundancy and ambiguity, a list of 167 incidents was obtained.

Because each of the six units treated different types and ages of patients, aide job duties varied, necessitating the development of five slightly modified BOS's. The relevance of the 167 incidents to each unit was determined by the respective nursing supervisor. Each supervisor edited the list for relevant items and appropriate medical jargon. A total of 92 incidents remained.

The next step was to determine overall job categories, or behavioral criteria, under which the incidents would be grouped (e.g., work habits, staff interactions, communication). Nine of the ten criteria selected were chosen from the existing aide job model. Two aides, one nurse, and the researcher collectively assigned the incidents to the broader behavioral categories. Eighteen items did not seem to fit under the given nine criteria, so Work Habits was selected as a tenth criterion.

Content validity was assessed in two ways. First, the incidents were checked to ensure that each accomplishment listed in the job description was represented by an incident; no items were added or deleted. Second, a completed BOS was sent to each supervisor, who was asked to add, delete, and/or edit items to make the appraisal job-relevant for the respective unit. Supervisors deleted 4 to 26 items.

From the corrections, five Behavioral Observation Scales were constructed, ranging from 66 to 88 items. All five BOS's retained the 10 general behavioral criteria. For items representing effective behavior, (1) almost never and (5) almost always served as anchors on the rating scale. For items representing ineffective behavior, (1) almost always and (5) almost never served as rating anchors. Therefore, a 5 always represented superior performance. The percentages defining the values of 1 to 5, as well as the instructions for completing the BOS, can be found in Appendix B.
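On the BOS as described, the anchors themselves are reversed for ineffective-behavior items, so the number circled is already a performance score. If responses were instead recorded as frequencies on a single scale running from 1 (almost never) to 5 (almost always), which is a hypothetical alternative and not the procedure used here, the equivalent scoring rule would be the reverse-keying sketched below. The function name and the `effective` flag are illustrative assumptions.

```python
def performance_score(frequency, effective):
    """Convert a frequency response (1 = almost never ... 5 = almost always)
    into a performance score on which 5 always denotes superior performance.
    Effective-behavior items keep their value; ineffective-behavior items
    are reverse-keyed (6 - frequency)."""
    if frequency not in (1, 2, 3, 4, 5):
        raise ValueError("frequency must be an integer from 1 to 5")
    return frequency if effective else 6 - frequency
```

For example, an aide who "almost never" shows an ineffective behavior (frequency 1) receives a score of 5, matching the anchor rule above.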

Procedure

Over a five-month period, evaluations of aide performance were completed by three sources: peers, self, and supervisors. With respect to the peer evaluations, 25 aides were permitted to choose nurses and/or aides to complete their evaluations; the other 27 aides had peers assigned to rate them by their supervisors. The number of peers reporting data on a single aide varied from two to nine. For the analyses, peer ratings were averaged into a single rating. All ratings were recorded on computer scoring cards.
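Averaging the two to nine peer ratings into a single rating per aide and item can be sketched as follows. The sketch assumes each peer rating is stored as a hypothetical (aide, item, score) record; the identifiers and scores are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def average_peer_ratings(records):
    """records: iterable of (aide_id, item_id, score) from any number of peers.
    Returns {(aide_id, item_id): mean peer score}."""
    grouped = defaultdict(list)
    for aide, item, score in records:
        grouped[(aide, item)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Hypothetical records: three peers rated aide A01 and two peers rated aide A02 on item 1.
records = [("A01", 1, 5), ("A01", 1, 4), ("A01", 1, 4), ("A02", 1, 3), ("A02", 1, 5)]
averaged = average_peer_ratings(records)
```

Because peers were anonymous, records are grouped only by aide and item; no peer identifier is needed.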

Analyses

The analyses were computed using the data on the behavioral items common to all five BOS's (see Appendix A for the 48-item BOS).

Before the analyses of halo error, leniency error, and convergent and discriminant validity were run, histograms were plotted for each of the five psychiatric units across the three rater sources in order to assess the normality of the data distribution. A normal distribution of the data is an assumption of the ANOVA model (Hopkins & Glass, 1978). A total of 15 histograms was plotted.
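Each histogram amounts to a count of how often each scale value occurs for one unit and rater source. A minimal sketch of that tabulation, assuming whole-number ratings from 1 to 5 (the sample ratings are hypothetical):

```python
from collections import Counter


def category_percentages(ratings, categories=range(1, 6)):
    """Percentage of ratings falling in each scale category (1-5)."""
    counts = Counter(ratings)
    n = len(ratings)
    return {c: 100 * counts[c] / n for c in categories}


def text_histogram(ratings, width=40):
    """Crude text histogram: one bar per category, scaled to `width` characters."""
    rows = []
    for c, pct in category_percentages(ratings).items():
        rows.append(f"{c} | {'#' * round(width * pct / 100)} {pct:.1f}%")
    return "\n".join(rows)


# Hypothetical ratings from one rater source on one unit.
print(text_histogram([5, 5, 5, 4, 4, 5, 3, 5, 4, 5]))
```

Repeating this for five units and three rater sources gives the 15 histograms; averaged peer scores would first have to be rounded to whole numbers.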

To assess convergent and discriminant validity and rater bias for the five units taken collectively, the ANOVA technique described by Kavanagh et al. (1972) and Stanley (1961) was used. Using a three-way factorial design, Kavanagh et al. found that very large MTMM matrices could be analyzed with considerably less effort. For example, had the present study assessed convergent and discriminant validity by comparing intercorrelations, the technique described by Campbell and Fiske (1959), 13,000 intercorrelations would have had to be examined and compared. The alternative design allowed convergent validity to be assessed by testing for a significant main effect across aides, and discriminant validity by testing for a significant interaction between aides and behavioral items, in a 3 x 48 x 52 factorial design with 3 rater sources, 48 behavioral items, and 52 aides.
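A minimal sketch of the sums-of-squares partition behind a sources x items x aides design, assuming one (averaged) score per cell stored as a nested list `y[s][i][a]`. With one score per cell, the three-way interaction is the only available error term. Testing every effect against it with plain F ratios is a simplifying assumption made here; the expected-mean-square choices in Kavanagh et al. may differ. In this scheme the aide main effect bears on convergent validity, the item x aide interaction on discriminant validity, and the source x aide interaction on rater bias.

```python
def three_way_ss(y):
    """Sums of squares and df for a sources x items x aides design with one
    score per cell; y[s][i][a] is the score from source s on item i for aide a."""
    S, I, A = len(y), len(y[0]), len(y[0][0])
    cells = [(s, i, a) for s in range(S) for i in range(I) for a in range(A)]
    grand = sum(y[s][i][a] for s, i, a in cells) / len(cells)

    # One-way and two-way marginal means.
    m_s = [sum(y[s][i][a] for i in range(I) for a in range(A)) / (I * A) for s in range(S)]
    m_i = [sum(y[s][i][a] for s in range(S) for a in range(A)) / (S * A) for i in range(I)]
    m_a = [sum(y[s][i][a] for s in range(S) for i in range(I)) / (S * I) for a in range(A)]
    m_si = [[sum(y[s][i]) / A for i in range(I)] for s in range(S)]
    m_sa = [[sum(y[s][i][a] for i in range(I)) / I for a in range(A)] for s in range(S)]
    m_ia = [[sum(y[s][i][a] for s in range(S)) / S for a in range(A)] for i in range(I)]

    ss = {
        "source": I * A * sum((m - grand) ** 2 for m in m_s),
        "item": S * A * sum((m - grand) ** 2 for m in m_i),
        "aide": S * I * sum((m - grand) ** 2 for m in m_a),  # convergent validity
        "source x item": A * sum((m_si[s][i] - m_s[s] - m_i[i] + grand) ** 2
                                 for s in range(S) for i in range(I)),
        "source x aide": I * sum((m_sa[s][a] - m_s[s] - m_a[a] + grand) ** 2
                                 for s in range(S) for a in range(A)),  # rater bias
        "item x aide": S * sum((m_ia[i][a] - m_i[i] - m_a[a] + grand) ** 2
                               for i in range(I) for a in range(A)),  # discriminant validity
    }
    total = sum((y[s][i][a] - grand) ** 2 for s, i, a in cells)
    # With one score per cell, what is left over is the three-way interaction.
    ss["residual"] = total - sum(ss.values())
    df = {
        "source": S - 1, "item": I - 1, "aide": A - 1,
        "source x item": (S - 1) * (I - 1),
        "source x aide": (S - 1) * (A - 1),
        "item x aide": (I - 1) * (A - 1),
        "residual": (S - 1) * (I - 1) * (A - 1),
    }
    return {k: (ss[k], df[k]) for k in ss}


def f_ratios(table):
    """F for each effect against the residual mean square (a simplifying assumption)."""
    ms_error = table["residual"][0] / table["residual"][1]
    return {k: (ss / df) / ms_error for k, (ss, df) in table.items() if k != "residual"}
```

For the design described above, `y` would have shape 3 x 48 x 52; p-values would require an F-distribution routine from a statistics library.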

The ANOVA described by Kavanagh et al. also included a procedure for assessing rater bias, operationalized as a significant interaction between aides and rater sources. Some researchers have incorrectly referred to rater bias as halo error (e.g., Kavanagh et al., 1971; Holzbach, 1978; Lee et al., 1981). It should be pointed out that rater bias may be due to halo error, but a significant interaction between aides and rater sources (rater bias) may also be due to leniency error or some other systematic rater bias. Therefore, rater bias was calculated for purposes of comparison with other literature, though it was not considered a measure of halo error.

Differences in halo error among rater sources for each of the five units were compared using Levene's test for equal variances. Less variance for any one rater source across the 48 behavioral items was taken as indicative of halo error.
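Levene's test is a one-way ANOVA on the absolute deviations of each score from its group mean. A minimal sketch, assuming each rater source's ratings for one unit are given as a list (the sample values are hypothetical). It returns the F statistic only; a p-value would need an F-distribution routine from a statistics library.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    group_means = [mean(g) for g in groups]
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_f(groups):
    """Levene's test for equal variances (mean-centred version):
    a one-way ANOVA on each score's absolute deviation from its group mean."""
    deviations = []
    for g in groups:
        m = mean(g)
        deviations.append([abs(x - m) for x in g])
    return one_way_f(deviations)
```

For example, three sources whose ratings differ only in location, such as `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`, yield F = 0, because their spreads are identical.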

Using a one-way ANOVA, leniency error was operationalized as a significant difference among rater means. Leniency error was assessed for each unit. Given a significant F ratio, the Dunn-Bonferroni procedure (Huitema, 1980) was used to determine which rater source was most severe and which was most lenient.
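The leniency analysis can be sketched as a one-way ANOVA across the three rater sources followed by Dunn-Bonferroni pairwise comparisons, in which each pairwise t uses the pooled within-group mean square and is judged at alpha divided by the number of comparisons. The function name and sample ratings are hypothetical, and converting t to a p-value or critical value would require a t-distribution routine, which is omitted to keep the sketch dependency-free.

```python
import math
from itertools import combinations
from statistics import mean


def leniency_tests(groups, alpha=0.05):
    """groups: {rater source: [ratings]}.
    Returns (F, per-comparison alpha, {(a, b): t}) where each t is a
    Dunn-Bonferroni pairwise statistic using the pooled within-group mean square."""
    means = {g: mean(v) for g, v in groups.items()}
    all_scores = [x for v in groups.values() for x in v]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    ms_within = ss_within / (n - k)
    f = (ss_between / (k - 1)) / ms_within
    pairs = list(combinations(groups, 2))
    t_stats = {
        (a, b): (means[a] - means[b])
        / math.sqrt(ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        for a, b in pairs
    }
    # Bonferroni: each of the c comparisons is tested at alpha / c.
    return f, alpha / len(pairs), t_stats


# Hypothetical ratings of one item by the three rater sources.
ratings = {"self": [5, 5, 4, 5], "supervisor": [4, 4, 3, 4], "peer": [4, 5, 4, 4]}
f, per_comparison_alpha, t_stats = leniency_tests(ratings)
```

A positive t for a pair means the first source rated more leniently than the second; the largest positive contrast identifies the most lenient source.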


CHAPTER III

RESULTS

The histograms suggested that the data violate the ANOVA assumption of normality. Ratings from all three rater sources clustered toward the upper end of the scale, producing strongly negatively skewed distributions. Table 4, a summary of the 15 histograms, shows the percentage of raters in each rating category.

Table 4

Percentage of Raters in Each Category for Items on BOS

Category   Self    Peer*   Supervisor
1          1.2%    0.5%    0.42%
2          1.7%    0.8%    0.68%
3          7.1%    4.7%    7.3%
4          31.2%   22.6%   28.3%
5          59.2%   71.3%   63.4%

* Peer scores were rounded to the nearest whole number.

Table 5 presents the means and variances for the five units and the results of the one-way ANOVA and Levene's Test for equal variances.

Halo error in performance ratings was operationally defined as present when the variances associated with supervisor-, peer-, and self-ratings were significantly different, or heterogeneous. Significant variance differences by rating source were found for each of the five units.

Table 5

Means, Variances, One-Way ANOVA, and Levene's Test for Equal Variances for Performance Ratings by Rating Source for Each Unit

        Self                 Supervisor           Peer                 ANOVA             Levene's
Unit    M      V     N       M      V     N       M      V     N       MS      F         F
1       4.373  .669  402     4.284  .945  401     4.548  .350  411     7.32    11.23*    41.74*
2       4.262  .898  899     4.489  .378  872     4.490  --    854     14.87   26.54*    87.51*
3       4.576  .513  323     4.224  .643  317     4.522  .465  324     11.47   21.28*    13.39*
4       4.491  --    501     4.804  .253  522     4.694  .249  509     12.92   35.86*    85.68*
5       4.491  --    281     4.785  .228  275     4.599  .551  284     6.16    13.48*    24.17*

* p < .0001
-- Value illegible in the source scan.

After further examination of the results, it was discovered that, because of the strongly negatively skewed distributions, a functional relationship existed between source mean ratings and their respective variances. Specifically, a significant negative correlation existed between the rater source means and their respective variances (r = -.85, p < .01), rendering the variance differences uninterpretable as halo error.
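The mean-variance relationship can be checked against the legible cells of Table 5. This is a minimal sketch: three variances are illegible in the scan, so only 12 of the 15 source-by-unit pairs are used, and the result should not be expected to reproduce the reported r = -.85 exactly.

```python
from math import sqrt

# (mean, variance) pairs from Table 5 whose variances are legible in the scan
pairs = [
    (4.373, 0.669), (4.284, 0.945), (4.548, 0.350),  # Unit 1: self, supervisor, peer
    (4.262, 0.898), (4.489, 0.378),                  # Unit 2: self, supervisor
    (4.576, 0.513), (4.224, 0.643), (4.522, 0.465),  # Unit 3: self, supervisor, peer
    (4.804, 0.253), (4.694, 0.249),                  # Unit 4: supervisor, peer
    (4.785, 0.228), (4.599, 0.551),                  # Unit 5: supervisor, peer
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


means, variances = zip(*pairs)
print(round(pearson_r(means, variances), 2))
```

One plausible reason for the dependence is the ceiling of the scale: for any ratings bounded between 1 and 5 with mean μ, the variance cannot exceed (5 - μ)(μ - 1), so sources whose means approach 5 are pushed toward small variances whether or not halo is present.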

Leniency error in performance ratings was operationally defined as present when mean ratings associated with supervisors, peers, and self-ratings were significantly different. Table 5 presents the results of the one-way ANOVA, and Table 6 presents the results of the Dunn-Bonferroni pair comparisons. The results of the one-way ANOVA demonstrated significant differences among rater sources for each psychiatric unit. Dunn-Bonferroni tests demonstrated that for seven of the eight significant comparisons found between self-ratings and the other two sources, self-ratings were lower, or less lenient, than both peer- and supervisor-ratings.
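Because Table 5 reports each source's mean, variance, and N, the one-way ANOVA for a unit can be recomputed from those summaries alone. This is a minimal sketch for Unit 1, assuming the reported variances are sample variances with an n - 1 denominator; small discrepancies from the tabled MS = 7.32 and F = 11.23 are expected because the printed means are rounded to three decimals.

```python
# Unit 1 summary statistics from Table 5: (mean, variance, n) per rater source
unit1 = [
    (4.373, 0.669, 402),  # self
    (4.284, 0.945, 401),  # supervisor
    (4.548, 0.350, 411),  # peer
]


def oneway_from_summary(groups):
    """Return MS_between, MS_within, and F for a one-way ANOVA from summaries."""
    k = len(groups)
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    # Assumes each variance is a sample variance (n - 1 denominator)
    ss_within = sum((n - 1) * v for _, v, n in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


msb, msw, f = oneway_from_summary(unit1)
print(f"MS_between = {msb:.2f}, MS_within = {msw:.3f}, F = {f:.2f}")
```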

The comparison tests also demonstrated that, of the four significant differences found between peer-ratings and supervisor-ratings, peer-ratings from two units were higher, or more lenient, than their respective mean supervisor-ratings, and peer-ratings from two other units were less lenient than their respective mean supervisor-ratings. No difference between peer-ratings and supervisor-ratings was found for the fifth unit.
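The Dunn-Bonferroni comparisons can likewise be sketched from summary statistics: each pair of sources is compared with a t statistic built on the pooled within-groups mean square, and the .05 error rate is divided among the three comparisons. This is a minimal sketch, not the thesis's computation. It assumes a normal approximation to the t critical value (reasonable only because the error df exceed 1,200 here), uses only Unit 1 of Table 5, and is not claimed to reproduce Table 6.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# Unit 1 summary statistics from Table 5: (mean, variance, n)
unit1 = {
    "self": (4.373, 0.669, 402),
    "supervisor": (4.284, 0.945, 401),
    "peer": (4.548, 0.350, 411),
}


def dunn_bonferroni(groups, alpha=0.05):
    """Pairwise tests on the pooled MS_within with a Bonferroni-divided alpha."""
    k = len(groups)
    n_total = sum(n for _, _, n in groups.values())
    # Pooled within-groups variance, i.e. the one-way ANOVA MS_within
    ms_within = sum((n - 1) * v for _, v, n in groups.values()) / (n_total - k)
    pairs = list(combinations(groups, 2))
    # Two-tailed critical value at alpha / (number of comparisons), normal approx.
    crit = NormalDist().inv_cdf(1 - alpha / (2 * len(pairs)))
    results = {}
    for a, b in pairs:
        (ma, _, na), (mb, _, nb) = groups[a], groups[b]
        t = (ma - mb) / sqrt(ms_within * (1 / na + 1 / nb))
        results[(a, b)] = (round(t, 2), abs(t) > crit)
    return results


for pair, (t, significant) in dunn_bonferroni(unit1).items():
    print(pair, t, "significant" if significant else "n.s.")
```

For Unit 1 the sketch flags the self versus peer and supervisor versus peer differences, but not self versus supervisor.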

Table 6

[Results of the Dunn-Bonferroni pair comparisons among rater sources for each unit; the table is illegible in the source scan.]

From the results of the data analyses on leniency error, it appeared that a relationship, or interaction, existed between mean source ratings and psychiatric units. Figure 1 demonstrates that an interaction did indeed exist. The figure shows that both self-ratings and peer-ratings tend to be more stable across units than supervisor-ratings.

[Figure 1: line graph of mean self-ratings (*), supervisor-ratings (o), and peer-ratings (+) for each psychiatric unit (Units 1 to 5), on a vertical axis running from 4.0 to 5.0.]

Figure 1. Mean Source Ratings by Psychiatric Unit
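The instability visible in Figure 1 can be summarized by the spread of each source's five unit means. This is a descriptive sketch only, not a test of the source-by-unit interaction; the self-rating means for Units 4 and 5 both read 4.491 in the scan and are used as printed.

```python
from statistics import stdev

# Mean rating for Units 1-5, by rater source (Table 5)
unit_means = {
    "self": [4.373, 4.262, 4.576, 4.491, 4.491],
    "supervisor": [4.284, 4.489, 4.224, 4.804, 4.785],
    "peer": [4.548, 4.490, 4.522, 4.694, 4.599],
}

spread = {}
for source, means in unit_means.items():
    spread[source] = max(means) - min(means)  # range of the five unit means
    print(f"{source:<10} range = {spread[source]:.3f}  sd = {stdev(means):.3f}")
```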


In addition to the above analyses, weighted means were calculated for the three rater sources, taking the five units collectively. The results are presented in Table 7. The mean self-rating was lower than both the mean peer-rating and the mean supervisor-rating, whereas the mean peer-rating was comparable to the mean supervisor-rating.

Table 7

Weighted Means for Source Ratings

Self-rating    Supervisor-rating    Peer-rating
4.398          4.523                4.561
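The weighted means in Table 7 can be reproduced from the unit means and Ns in Table 5 by weighting each unit's mean by its number of ratings. This is a minimal sketch; because the tabled means are rounded, the results agree with Table 7 only to within about .001.

```python
# (mean, n) for Units 1-5, by rater source (Table 5)
table5 = {
    "self": [(4.373, 402), (4.262, 899), (4.576, 323), (4.491, 501), (4.491, 281)],
    "supervisor": [(4.284, 401), (4.489, 872), (4.224, 317), (4.804, 522), (4.785, 275)],
    "peer": [(4.548, 411), (4.490, 854), (4.522, 324), (4.694, 509), (4.599, 284)],
}


def weighted_mean(cells):
    """Mean of unit means, each weighted by that unit's number of ratings."""
    return sum(m * n for m, n in cells) / sum(n for _, n in cells)


for source, cells in table5.items():
    print(f"{source:<10} {weighted_mean(cells):.3f}")
```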

The results of the ANOVA technique used to test for convergent and discriminant validity and rater bias are presented in Table 8. The analysis provided no support for discriminant validity, strong support for convergent validity, and strong support for substantial rater bias. Variance components were also computed. Kavanagh et al. (1971) suggested formulas for estimating variance components so that one might compare the amount of variance due to each source in Table 8. Formulas for computing variance components can be found in Appendix C.



Table 8

Three-Way Analysis of Variance Summary Table

Source              df      MS       F       P*      Variance Component
Aide (A)            51      13.964   15.50   .0001   .091
A x Behavior (B)    2397    1.490    1.65    .16     .196
A x Source (S)      102     6.585    7.31    .008    .118
Error (A,B,S)       4794    .901

*P = Probability of a Type I error.
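The variance components in Table 8 follow from the Appendix C formulas. This is a minimal sketch that takes n = 48 behaviors and m = 3 rater sources, values consistent with the degrees of freedom in Table 8 (2397 = 51 x 47 and 102 = 51 x 2).

```python
# Mean squares from Table 8
ms = {"A": 13.964, "AB": 1.490, "AS": 6.585, "ABS": 0.901}
n_behaviors, m_sources = 48, 3  # n and m as defined in Appendix C

# Appendix C: subtract the error mean square, then divide by the
# number of observations contributing to each effect
components = {
    "Aide (A)": (ms["A"] - ms["ABS"]) / (n_behaviors * m_sources),
    "A x Behavior (B)": (ms["AB"] - ms["ABS"]) / m_sources,
    "A x Source (S)": (ms["AS"] - ms["ABS"]) / n_behaviors,
    "Error": ms["ABS"],
}
for source, value in components.items():
    print(f"{source:<18} {value:.3f}")
```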


CHAPTER IV

DISCUSSION

Data Distribution

Preliminary analysis of the data raised some questions as to the appropriateness of the ANOVA models used to assess leniency error and convergent and discriminant validity. Assumptions underlying the ANOVA model include (a) normality of the data distribution and (b) homogeneity of variances (Hopkins & Glass, 1978). Both of these assumptions were shown to be violated by the data set. Specifically, histograms showed the data set to be negatively skewed, and Levene's Test indicated that the variances associated with rating sources were heterogeneous. Violation of these assumptions has been shown to increase the probability of a Type I error.

Despite the violations of assumptions, both ANOVA techniques were used on the data set. In defense of using the ANOVA techniques on the skewed data, research has demonstrated that the ANOVA is robust with respect to violations of the normality assumption given a large sample size (n > 30) (Glass, Peckham, & Sanders, 1972; Hopkins & Glass, 1978). The degrees of freedom for the present study, which are directly related to sample size, ranged from 839 to 4794. With respect to heterogeneous variances, the ANOVA model has also been shown to be robust given equal sample sizes of the groups being compared (Glass, Peckham, & Sanders, 1972; Hopkins & Glass, 1978). The sample sizes were approximately equal in the present data set.

Although it was possible to demonstrate that the ANOVA was robust when the assumptions that underlie the model were violated, other researchers have overcome similar problems by using rating scales that range from seven points (e.g., Friedman & Cornelius, 1976) to 110 (e.g., Lee et al., 1981) and by transforming the data set (e.g., Latham et al., 1979). The possibility of increasing the range of potential responses and transforming the data raised two questions for the present study.

The first question addressed whether or not a five-point scale was adequate for appraising performance. Latham and Wexley (1981) contended that a five-point Likert-type scale was adequate for rating scales, based on the research of Lissitz and Green (1975) and Jenkins and Taber (1977). A reexamination of this literature proved enlightening.

The research completed by Lissitz and Green (1975) and Jenkins and Taber (1977) involved two Monte Carlo studies that examined the optimal number of rating points for assessing reliability. Both groups of researchers agreed that there was little utility in using more than five rating points, given that the data are drawn from a normally distributed population. The data used by both groups were generated by a computer such that the means of errors, the correlations between errors, and the correlations between errors and true scores were all equal to zero. Jenkins and Taber made this observation:

Although our study was limited to the case in which responses are distributed uniformly across all categories, such distributions are not common in actual research. Future simulations should explore the generalizability of the current findings for other distributions, especially for the skewed ones often found in applied research. (p. 395)
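The distributional point raised by Jenkins and Taber can be illustrated with a small simulation in the spirit of the Monte Carlo studies described above: true scores plus independent, zero-mean errors are cut into five categories, once with equally likely categories and once with the category proportions of the self-ratings in Table 4. This is a hedged sketch, not a reproduction of either study; the underlying reliability of .80, the sample size, and the random seed are arbitrary assumptions. Both runs use the same seed, so the simulated ratees are identical and only the category boundaries differ.

```python
import random
from bisect import bisect
from math import sqrt
from statistics import NormalDist


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cut_points(proportions):
    """Standard-normal thresholds that reproduce the given category proportions."""
    cuts, cumulative = [], 0.0
    for p in proportions[:-1]:
        cumulative += p
        cuts.append(NormalDist().inv_cdf(cumulative))
    return cuts


def simulated_reliability(proportions, true_var=0.8, n=20000, seed=1):
    """Correlation between two parallel 1-5 ratings of the same simulated ratees.

    Each continuous rating is a true score plus an independent, zero-mean error,
    scaled so its variance is 1 and its reliability equals true_var.
    """
    rng = random.Random(seed)
    cuts = cut_points(proportions)
    true_sd, error_sd = sqrt(true_var), sqrt(1 - true_var)
    first, second = [], []
    for _ in range(n):
        t = rng.gauss(0, true_sd)
        first.append(bisect(cuts, t + rng.gauss(0, error_sd)) + 1)
        second.append(bisect(cuts, t + rng.gauss(0, error_sd)) + 1)
    return pearson_r(first, second)


uniform = [0.2] * 5
# Self-rating proportions from Table 4, rescaled to sum to 1
skewed = [p / 100.4 for p in (1.2, 1.7, 7.1, 31.2, 59.2)]
r_uniform = simulated_reliability(uniform)
r_skewed = simulated_reliability(skewed)
print(f"equal categories: r = {r_uniform:.2f}; Table 4 self categories: r = {r_skewed:.2f}")
```

With these settings the equal-category version yields a somewhat higher correlation between parallel ratings than the skewed version, consistent with the concern that a five-point scale whose ratings pile into the top category loses reliability.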

It seems that Latham and Wexley (1981) failed to realize that performance rating scales often generate skewed distributions, thereby making the five-point scale undesirable. The data generated by Latham et al. (1979) demonstrated a high degree of skewness, as evidenced by the need to eliminate 32 of 90 items because the items did not discriminate among performers. Furthermore, even after calculating the total range of scores for each supervisor and dividing by five, the distribution still appeared negatively skewed, as shown in Table 9.

Table 9

Number and Percentage of Supervisors in Each Category: Latham et al. (1979) Data

1. Below Adequate    0 (0%)
2. Adequate          0 (0%)
3. Full              15 (17%)
4. Excellent         59 (65%)
5. Superior          16 (18%)

The question was then raised as to exactly how Ronan and Latham (1974) and Latham, Wexley, and Rand (1975) were able to conduct reliability and validity studies with the skewed distributions they obtained. (It should be noted that all validity and reliability studies cited by Latham and Wexley [1981] were generated from the same data set.) Ronan and Latham reported that interobserver reliability on the raw scores was .50 or less for 63 of the 78 behavioral items. Intraobserver reliability was reported as .64 to .84; however, intraobserver reliability was computed using eight composite scores into which the 78 items were subjectively clustered. In the assessment of concurrent validity, Ronan and Latham stated, "Correlations [between items and each criterion] were obtained after normalizing the response to each item" (p. 60). How the authors normalized the data was not reported. Validity coefficients after normalization ranged from .16 to .31, and 16 of the 17 validity coefficients were significant at the .001 level.

Latham et al. (1975) reported on the intrarater reliability, interrater reliability, and relevance of clustered items taken from a BOS. To run the analyses, the 78 behavioral items were grouped into eight criterion scores by taking the algebraic sum of the effective behaviors minus the ineffective behaviors. In other words, the data were transformed in order to increase the reliability and validity coefficients. The point is that both studies, Ronan and Latham (1974) and Latham et al. (1975), demonstrated that without some transformation of the data, the resulting correlation coefficients were lower than desired, owing either to poor reliability and validity of the BOS for assessing performance or to skewness of the data being analyzed. (Skewness has been shown to be detrimental to reliability and validity coefficients by Lemke and Wiersma [1976].) If the low correlations generated from the raw scores were the result of the skewed distribution, then the five-point scale would not be the optimal choice for a performance appraisal. The process of transforming the data raised the second question.

The second question concerned whether or not the data generated by the BOS should have been transformed so that the data would be normally distributed. Although three transformations were considered, they were rejected because of the issues they raised.

The first issue dealt with whether or not the original research question, regarding the convergent and discriminant validity of the BOS items, would be answered with transformed data. One transformation considered but rejected was clustering the items together and using the algebraic sum of the items (e.g., Ronan & Latham, 1974). This transformation was rejected on the grounds that the researcher would no longer be assessing the convergent and discriminant validity of the items, but of groups of items.
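The rejected clustering transformation can be sketched to show why it changes the research question. The item grouping and ratings below are hypothetical, not the scoring key used by Ronan and Latham or Latham et al.; the composite is the algebraic sum of the effective items minus the ineffective items. Two ratees with different item-level profiles receive identical composite scores, so item-level validity can no longer be examined.

```python
# Hypothetical cluster: which items describe effective or ineffective behaviors
cluster_items = {"effective": [1, 2, 4], "ineffective": [3, 5]}


def cluster_score(ratings, cluster):
    """Algebraic sum: effective-item ratings minus ineffective-item ratings."""
    return (sum(ratings[i] for i in cluster["effective"])
            - sum(ratings[i] for i in cluster["ineffective"]))


# Two hypothetical ratees with different item-level profiles
ratee_a = {1: 5, 2: 4, 3: 2, 4: 5, 5: 1}
ratee_b = {1: 4, 2: 5, 3: 1, 4: 5, 5: 2}
print(cluster_score(ratee_a, cluster_items), cluster_score(ratee_b, cluster_items))  # 11 11
```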

The second issue dealt with the appropriateness of normalizing the data, given no evidence that the true population was normally distributed. The two transformations considered were the following: first, using a logarithmic function to transform each data point (recommended by Miller, Note 1); and second, correcting each data point for leniency error (recommended by Brethower, Note 2). The logarithmic function was rejected on the grounds that the resulting statistical analyses of the transformed data would be difficult to interpret (Huitema, Note 3) and, again, that no data suggested the true population was normally distributed. Correcting for leniency was rejected because no proof of absolute leniency error existed. It was possible that most aides had excellent performance and that the true population distribution was negatively skewed. Landy and Farr (1980) reached a relevant conclusion on the research involving transformations of data generated from rating scales: "in general, this particular area of research raises many more questions than it answers" (p. 92). Future research that focuses on data transformations should take into account not only how the data are clustered, but also how to interpret the clustered data.

Halo Effects

A secondary problem with the skewed distribution was that it introduced a negative correlation between rater mean ratings and their respective variances: as mean ratings increased, variances decreased. This correlation rendered Levene's Test for equal variances uninterpretable, which in turn made the assessment of halo error impossible. Although Levene's Tests were significant for all units across raters, it did not make sense to interpret any one group as demonstrating greater halo error than another, because the variances were negatively correlated with their respective means. The implication for future research is that the comparison-of-variance technique should not be used to assess halo error if skewed distributions are expected or obtained.

Leniency Effect

Contrary to most previous research, self-ratings tended to be more severe than either peer-ratings or supervisor-ratings. In addition to this study, Heneman (1974) was one of the few researchers to find self-ratings less lenient than other rating sources. Heneman hypothesized that self-ratings tended to be more severe because no consequences were made contingent on low ratings. Self-raters in the Heneman study knew that Heneman was collecting data for research purposes only and that no consequences were attached to their ratings. Few if any consequences were attached to low ratings for the BOS used in the present study. (The researcher is not aware of any formal consequences.)

A second observation from the analysis of leniency was that an interaction existed between raters and psychiatric units. The most pronounced interaction involved the supervisor-ratings. Both peer- and self-ratings tended to be stable across units, whereas supervisor-ratings fluctuated more between units, suggesting greater inconsistency among supervisors.

Several explanations could account for the fluctuation in supervisor-ratings and the lack of fluctuation in peer- and self-ratings. First, because only one supervisor represented each unit, while a number of individuals contributed to each self mean rating and peer mean rating, any bias in supervisor-ratings would not have been washed out in the averaging process, as is possible with the other groups. As a result, the supervisor-ratings would appear to be more inconsistent than those of the other groups. Second, according to Borman (1974), different raters tend to view a single job differently and therefore tend to rate inconsistently. It was hypothesized that this effect would not occur, since the BOS consisted of behavioral items. However, it is possible that the supervisors still interpreted the behavioral items differently; unfortunately, there was no systematic way to assess whether this was the case. Third, supervisors may have ignored, or been unable to judge, the percentage of times an aide engaged in any one behavior. The effect would be that each supervisor used his/her own interpretation of the numerical anchors attached to each item on the BOS. The likelihood of supervisors using their own rating interpretations is fairly high, since estimating the percentage difference associated with a 4 or a 5 appears to be a difficult task, given the many opportunities to behave in a given manner as dictated by the BOS. Unfortunately, it is not possible to assess which effect or combination of effects caused the rating fluctuations. Future research might examine each cause in a more controlled setting.

MTMM Interpretation

According to Kavanagh et al. (1971), the results obtained from the three-way ANOVA technique used to assess convergent and discriminant validity and rater bias should be interpreted in the following manner. Differentiation existed among aides attributable to the BOS used, that is, person variance or convergent validity. However, the equally large aide x source effect indicated a substantial method bias confounding the first result. In other words, the ratings for various aides were not consistent across supervisors, or the ratings an aide received depended upon which supervisor the aide had as a rater, which would tend to decrease the aide main effect. The lack of an aide x behavior interaction indicated that aides were not ordered differently on different behaviors, i.e., there was no discriminant validity. The lack of discriminant validity has generally been attributed to substantial rater bias in performance ratings (Heneman, 1974; Holzbach, 1978; Lawler, 1967; Lee et al., 1982).

The important conclusion to be drawn from these results is that ratings obtained from an instrument like the BOS, which has specific behavioral items, are not immune to the problems that plague BARS, BES, and graphic rating scales. Such problems include, but are not limited to, halo error and leniency error, which adversely affect discriminant validity and generate skewed distributions, making analyses difficult to complete and interpret.

These observations regarding the psychometric characteristics of a BOS raised a final question: Should performance rating scales even be used? Landy and Farr's (1980) conclusion is reiterated: "After more than 30 years of serious research, it seems that little progress has been made in developing an efficient and psychometrically sound alternative to the traditional graphic rating scale" (p. 89). Even the BOS, which appears to be the very best attempt to overcome the shortcomings of ambiguous performance rating scales, was found to be no better than the graphic rating scale. Murphy, Martin, and Garcia (1982) concluded, based on their research with the BOS, "the BOS as typically used, measure traitlike judgements rather than behavioral observation" (p. 562).

After completing the literature review on performance appraisals and the data analysis, it became increasingly evident that true halo error could not be assessed without true performance measures, that leniency error could not be assessed without true performance measures, that no criterion-related validity study could be completed without true performance measures, and that the decision to normalize the data could not be made without true performance data that would support a truly normal distribution. Since true performance measures are necessary to compute many of the psychometric properties of rating scales, it seems logical to use them, rather than subjective rating scales, in evaluating performance. Perhaps if we had spent 30 years of serious research on techniques for obtaining true performance measures rather than attempting to improve rating scales, performance appraisals would be superior to the appraisals now available.

Future research should investigate measures of productivity that include outcome measures and process measures of productivity that are not cost related. Latham and Wexley (1981) argued that cost-related measures often omit important information, are difficult to obtain, include factors beyond the performer's control, lead to a results-at-all-costs mentality, and provide inadequate feedback necessary to correct performance. Future investigations should be directed at overcoming these barriers and engineering new methods to measure productivity. Overcoming the hurdles to obtaining true performance measures may be a more productive avenue for research than attempting to overcome the hurdles of subjective rating scales.


APPENDIX A

BEHAVIORAL OBSERVATION SCALE FOR PSYCHIATRIC AIDE

Interpersonal Relationships with Staff

1. Offers assistance to nurses
   almost never 1 2 3 4 5 almost always

2. Attends staff meetings when possible
   almost never 1 2 3 4 5 almost always

Interpersonal Relationships with Patients

3. Takes initiative to introduce himself to new patients
   almost never 1 2 3 4 5 almost always

4. Participates in daily activity with patients
   almost never 1 2 3 4 5 almost always

5. Spends too much time in the office avoiding interaction with patients
   almost always 1 2 3 4 5 almost never

6. Discusses patient issues confidentially when possible
   almost never 1 2 3 4 5 almost always

7. Is willing to discuss patients' complaints
   almost never 1 2 3 4 5 almost always

8. Praises patients for accomplishments
   almost never 1 2 3 4 5 almost always

Communication Process

9. Charts and records 1:1's
   almost never 1 2 3 4 5 almost always

10. Completes his share of the daily charting
   almost never 1 2 3 4 5 almost always

11. Charts in a concise manner
   almost never 1 2 3 4 5 almost always

12. Charts legibly and with good grammar and spelling
   almost never 1 2 3 4 5 almost always

13. Charts with appropriate charting symbols
   almost never 1 2 3 4 5 almost always

14. Makes assessments relevant to treatment plan when charting
   almost never 1 2 3 4 5 almost always

15. Conveys accurate information to team
   almost never 1 2 3 4 5 almost always

16. Listens to patients and staff with attention
   almost never 1 2 3 4 5 almost always

17. Informs at least one other staff member before leaving the unit
   almost never 1 2 3 4 5 almost always

18. Calls ahead if he expects to be late for work
   almost never 1 2 3 4 5 almost always

Spiritual Values

19. Shares spiritual needs and spiritual issues appropriately
   almost never 1 2 3 4 5 almost always

Leadership Abilities/Role Modeling

20. Carries out all delegated tasks
   almost never 1 2 3 4 5 almost always

21. Makes individual decisions when necessary
   almost never 1 2 3 4 5 almost always

22. Is able to continue to function appropriately in stressful situations
   almost never 1 2 3 4 5 almost always

23. Demonstrates good empathic skills
   almost never 1 2 3 4 5 almost always

24. Knows unit rules
   almost never 1 2 3 4 5 almost always

25. Fails to confront when unit rule is broken
   almost always 1 2 3 4 5 almost never

26. Uses humor appropriately
   almost never 1 2 3 4 5 almost always

27. Complains about work, patients, and/or staff
   almost always 1 2 3 4 5 almost never

Educational

28. Attends assigned inservices
   almost never 1 2 3 4 5 almost always

29. Takes opportunity to involve himself in nonmandatory inservices
   almost never 1 2 3 4 5 almost always

Teaching

30. Is able to teach conflict resolution and does so when necessary
   almost never 1 2 3 4 5 almost always

31. Assists in orientating new nursing personnel
   almost never 1 2 3 4 5 almost always

Implementation of Treatment Programs

32. Writes treatment programs with specific objectives
   almost never 1 2 3 4 5 almost always

33. Follows treatment plan issues in 1:1's
   almost never 1 2 3 4 5 almost always

34. Does his share of the close observations
   almost never 1 2 3 4 5 almost always

35. Completes close observations on time
   almost never 1 2 3 4 5 almost always

36. Attends team meetings when possible
   almost never 1 2 3 4 5 almost always

37. Provides suggestions for treatment plan in team meetings
   almost never 1 2 3 4 5 almost always

38. Follows team decisions
   almost never 1 2 3 4 5 almost always

Ethical and Legal Issues

39. Knows location of safety equipment
   almost never 1 2 3 4 5 almost always

40. Knows fire and tornado procedures
   almost never 1 2 3 4 5 almost always

41. Discusses confidential information with patients, families, friends, or relatives of patient
   almost never 1 2 3 4 5 almost always

Public Relations

42. Realizes that their communication about their work influences the community's concept of psychiatric care
   almost never 1 2 3 4 5 almost always

43. Takes an active role as host or hostess to new personnel, patients, visitors, students, and interns
   almost never 1 2 3 4 5 almost always

44. Can direct others to appropriate persons when additional information is requested
   almost never 1 2 3 4 5 almost always

Work Habits

45. Generally displays a positive attitude
   almost never 1 2 3 4 5 almost always

46. Evidences in mood and attitude that personal problems are not interfering with job performance
   almost never 1 2 3 4 5 almost always

47. Demonstrates a willingness to accomplish various tasks and special projects
   almost never 1 2 3 4 5 almost always

48. Comes to work on time
   almost never 1 2 3 4 5 almost always



APPENDIX B

INSTRUCTIONS FOR BOS

Rate the individual whom you are evaluating on the 5-point scale by darkening the corresponding number on the computer sheet. Rate the employee as best you can in the following manner:

Employees receive a 1 if you suspect they engage in this behavior 0-50 percent of the time, 2 for 50-65 percent of the time, 3 for 65-80 percent of the time, 4 for 80-90 percent of the time, and 5 for 90-100 percent of the time. If you are unable to make a fair rating, leave it blank.

NOTE: The words almost always and almost never are reversed for those items that are worded as inappropriate behavior. Hence, the individual will always be rated a 5 when he is exhibiting excellent behavior.

REMINDERS: Do not write on this booklet.

Always use a #2 pencil; do not make any stray marks on the computer sheet.

When you erase, erase completely.

Be sure you begin with number 1 on the computer sheet.

Do not fill in any names.
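The percentage bands in these instructions amount to a simple lookup. This is a minimal sketch under two stated assumptions: each band includes its lower bound (the instructions list 50, 65, 80, and 90 percent in two adjacent bands), and a reversed item is scored as 6 minus the forward score, which is one reading of the NOTE. The function name `bos_score` is hypothetical.

```python
from bisect import bisect_right

# Lower bounds (percent of the time) at which scores 2, 3, 4, and 5 begin
BAND_STARTS = [50, 65, 80, 90]


def bos_score(percent_of_time, reversed_item=False):
    """Map the estimated percent of time a behavior occurs to a 1-5 BOS rating."""
    if not 0 <= percent_of_time <= 100:
        raise ValueError("percent_of_time must be between 0 and 100")
    score = bisect_right(BAND_STARTS, percent_of_time) + 1
    # Reversed (inappropriately worded) items: frequent behavior earns a low score
    return 6 - score if reversed_item else score


print(bos_score(85), bos_score(85, reversed_item=True))  # 4 2
```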



APPENDIX C

ESTIMATES FOR VARIANCE COMPONENTS

Source              Variance Formula

Aide (A)            $\frac{MS_{A} - MS_{A \times B \times S}}{nm}$

A x Behavior (B)    $\frac{MS_{A \times B} - MS_{A \times B \times S}}{m}$

A x Source (S)      $\frac{MS_{A \times S} - MS_{A \times B \times S}}{n}$

Error               $MS_{A \times B \times S}$

Note: n = number of behaviors, m = number of sources.
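As an illustration only (not part of the original analysis), the variance component formulas in this appendix can be sketched in Python with NumPy. The sketch assumes the ratings are arranged in a hypothetical array `y[a, b, s]` (aides x behaviors x sources, one rating per cell), computes the mean squares of the unreplicated three-way analysis of variance, and then applies the formulas. The function name and dictionary keys are hypothetical. Estimates can come out negative when a mean square falls below $MS_{A \times B \times S}$; the sketch reports them as computed rather than truncating them at zero.

```python
import numpy as np


def variance_components(y) -> dict:
    """Estimate the Appendix C variance components from an aides x behaviors x sources array.

    Hypothetical layout: y[a, b, s] is the rating of aide a on behavior b from
    source s, with one observation per cell (unreplicated three-way design).
    """
    y = np.asarray(y, dtype=float)
    if y.ndim != 3 or min(y.shape) < 2:
        raise ValueError("expected a 3-D array with at least 2 levels per factor")
    p, n, m = y.shape  # p aides, n behaviors, m sources (n and m as in Appendix C)

    # Grand mean, one-way marginal means, and two-way marginal means.
    g = y.mean()
    a = y.mean(axis=(1, 2))
    b = y.mean(axis=(0, 2))
    s = y.mean(axis=(0, 1))
    ab = y.mean(axis=2)
    a_s = y.mean(axis=1)
    bs = y.mean(axis=0)

    # Sums of squares for the terms that enter the Appendix C formulas.
    ss_a = n * m * np.sum((a - g) ** 2)
    ss_ab = m * np.sum((ab - a[:, None] - b[None, :] + g) ** 2)
    ss_as = n * np.sum((a_s - a[:, None] - s[None, :] + g) ** 2)
    resid = (y - ab[:, :, None] - a_s[:, None, :] - bs[None, :, :]
             + a[:, None, None] + b[None, :, None] + s[None, None, :] - g)
    ss_abs = np.sum(resid ** 2)  # three-way interaction, used as error

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_a = ss_a / (p - 1)
    ms_ab = ss_ab / ((p - 1) * (n - 1))
    ms_as = ss_as / ((p - 1) * (m - 1))
    ms_abs = ss_abs / ((p - 1) * (n - 1) * (m - 1))

    # Variance component formulas from the table above.
    return {
        "aide": (ms_a - ms_abs) / (n * m),
        "aide_x_behavior": (ms_ab - ms_abs) / m,
        "aide_x_source": (ms_as - ms_abs) / n,
        "error": ms_abs,
    }
```

For example, `variance_components(np.random.default_rng(0).integers(1, 6, size=(10, 48, 3)))` runs the sketch on simulated 1-5 ratings for 10 aides, 48 behaviors, and 3 sources; these dimensions are arbitrary and the output does not reproduce any result of this study.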



REFERENCE NOTES

1. Miller, Penennah S. Personal communication, March 1984.

2. Brethower, Dale. Personal communication, March 1984.

3. Huitema, Bradley. Personal communication, April 1984.



BIBLIOGRAPHY

Allen, P., & Rosenberg, S. (1978). The development of a task-oriented approach to performance evaluation in the City of New York. Public Personnel Management, 7, 26-32.

Atkin, R. S., & Conlon, E. J. (1978). Behaviorally anchored rating scales: Some theoretical issues. Academy of Management Review, 3, 119-128.

Bernardin, H. J. (1977). Behavioral expectation scales versus summated rating scales: A fairer comparison. Journal of Applied Psychology, 62(4), 422-427.

Bernardin, H. J., Alvares, K. M., & Cranny, C. J. (1976). A recomparison of behavioral expectation scales to summated scales. Journal of Applied Psychology, 61, 564-570.

Borman, W. C. (1974). The rating of individuals in organizations: An alternative approach. Organizational Behavior and Human Performance, 12, 105-124.

Borman, W. C. (1979). Format and training effects on rating accuracy and rater errors. Journal of Applied Psychology, 64, 410-421.

Borman, W. C., & Dunnette, M. D. (1975). Behavior-based versus trait-oriented performance ratings: An empirical study. Journal of Applied Psychology, 60(5), 561-565.

Borman, W. C., & Vallon, R. W. (1974). A view of what can happen when behavioral expectation scales are developed in one setting and used in another. Journal of Applied Psychology, 59(2), 197-201.

Boruch, R., Larkin, J., Wolins, L., & MacKinney, A. (1970). Alternative methods of analysis: Multitrait-multimethod data. Educational and Psychological Measurement, 30, 833-853.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Campbell, J. P., Dunnette, M. D., Arvey, R. D., & Hellervik, L. V. (1973). The development and evaluation of behaviorally based rating scales. Journal of Applied Psychology, 57, 15-22.

Cascio, W. F., & Bernardin, J. H. (1981). Implications of performance appraisal for personnel decisions. Personnel Psychology, 34, 211-226.




Cooper, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90, 218-244.

Cooper, W. H. (1983). Internal homogeneity, descriptiveness, and halo: Resurrecting some answers and questions about the structure of job performance rating categories. Personnel Psychology, 36, 489-501.

DeCotiis, T. A. (1977). An analysis of the external validity and applied relevance of three rating forms. Organizational Behavior and Human Performance, 19, 247-266.

Dickenson, T. L., & Tice, T. E. (1973). A multitrait-multimethod analysis of scales developed by retranslation. Organizational Behavior and Human Performance, 9, 421-438.

Fay, C. H., & Latham, G. P. (1982). Effects of training and rating scales on rating errors. Personnel Psychology, 35, 35-46.

Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66(2), 127-148.

Finley, D. M., Osborn, H. G., Dubin, J. A., & Jeanneret, P. R. (1977). Behaviorally based rating scales: Effects of specific anchors and disguised scale continua. Personnel Psychology, 30, 659-669.

Flanagan, J. C. (1959). The critical incident technique. Psychological Bulletin, 51, 327-358.

Friedman, B. A., & Cornelius, E. T. (1976). Effect of rater participation in scale construction on the psychometric characteristics of two rating scale formats. Journal of Applied Psychology, 61(2), 210-216.

Glass, G. V., Peckman, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance. Review of Educational Research, 42, 237-288.

Heneman, H. G. (1974). Comparisons of self- and superior ratings of managerial performance. Journal of Applied Psychology, 59(5), 638-642.

Holley, W. H., & Field, H. S. (1975). Performance appraisal and the law. Labor Law Journal, 423-430.

Holzbach, R. L. (1978). Rater bias in performance ratings: Superior, self-, and peer ratings. Journal of Applied Psychology, 63(5), 579-588.


Hopkins, K., & Glass, G. (1978). Basic statistics for the behavioral sciences. Englewood Cliffs, New Jersey: Prentice-Hall.

Huitema, B. E. (1980). The analysis of covariance and alternatives. New York: John Wiley & Sons.

Jenkins, D., & Taber, T. (1975). A Monte Carlo study of factors affecting three indices of composite scale reliability. Journal of Applied Psychology, 60(1), 10-13.

Kane, J. S., & Lawler, E. E. (1979). Performance appraisal effectiveness: Its assessment and determinants. In B. M. Staw (Ed.), Research in organizational behavior. Greenwich, CT: JAI Press.

Kavanagh, M., MacKinney, A., & Wolins, L. (1971). Issues in managerial performance: Multitrait-multimethod analysis of ratings. Psychological Bulletin, 75(1), 34-49.

Kingstrom, P. O., & Bass, A. R. (1981). A critical analysis of studies comparing behaviorally anchored rating scales (BARS) and other rating formats. Personnel Psychology, 34, 263-289.

Kleiman, L. S., & Durman, R. L. (1981). Performance appraisal, promotion, and the courts: A critical review. Personnel Psychology, 34, 103-121.

Kleiman, L. S., & Faley, R. (1978). Assessing content validity: Standards set by the courts. Personnel Psychology, 31, 701-713.

Klimoski, R. J., & London, M. (1974). Role of the rater in performance appraisal. Journal of Applied Psychology, 59(4), 445-451.

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 82-107.

Latham, G. P., Fay, C., & Saari, L. (1979). The development of behavioral observation scales for appraising the performance of foremen. Personnel Psychology, 32, 299-311.

Latham, G. P., & Wexley, K. N. (1977). Behavioral observation scales for performance appraisal purposes. Personnel Psychology, 30, 255-268.

Latham, G. P., & Wexley, K. N. (1981). Increasing productivity through performance appraisal. Reading, MA: Addison-Wesley.

Latham, G. P., Wexley, K. N., & Rand, T. M. (1975). The relevance of behavioral criteria developed from the critical incident technique. Canadian Journal of Behavioral Science, 7, 349-358.


Lawler, E. E. (1967). The multitrait-multimethod approach to measuring managerial job performance. Journal of Applied Psychology, 51(5), 369-381.

Lee, R., Maline, M., & Greco, S. (1981). Multitrait-multimethod analysis of performance ratings for law enforcement personnel. Journal of Applied Psychology, 66(5), 625-632.

Lemke, E., & Wiersma, W. (1976). Principles of psychological measurement. Chicago: Rand McNally.

Lissitz, R., & Green, S. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60(1), 10-13.

Meyer, H. H. (1980). Self-appraisal of job performance. Personnel Psychology, 33, 291-295.

Minium, E. W. (1978). Statistical reasoning in psychology and education (2nd ed.). New York: John Wiley & Sons.

Murphy, K. (1982). Difficulties in the statistical control of halo. Journal of Applied Psychology, 67(2), 161-164.

Murphy, K., Martin, C., & Garcia, M. (1982). Do behavioral observation scales measure observation? Journal of Applied Psychology, 67(5), 562-567.

Myers, J. L., DiCecco, J. V., White, J. B., & Borden, V. M. (1982). Repeated measurements on dichotomous variables: Q and F tests. Psychological Bulletin, 92(2), 517-525.

Parker, J., Taylor, E., Barret, R., & Martens, L. (1959). Rating scale content: III. Relationship between supervisory- and self-ratings. Personnel Psychology, 12, 49-63.

Ronan, W. W., & Latham, G. P. (1974). The reliability and validity of the critical incident technique: A closer look. Studies in Personnel Psychology, 6(1), 53-64.

Rosinger, G., Myers, L. B., & Leoy, G. W. (1982). Development of a behaviorally based performance appraisal system. Personnel Psychology, 35, 75-88.

Saal, F., Downey, R., & Lahey, M. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413-428.

Schneier, D. B. (1978). The impact of EEOC legislation on performance appraisals. Personnel, July-August, 24-34.


Schneier, C. E., & Beatty, R. W. (1978). The influence of role prescriptions on the performance appraisal process. Academy of Management Journal, 21(1), 129-135.

Schneier, C. E., & Beatty, R. W. (1979). Integrating behaviorally-based and effectiveness-based methods. Personnel Administrator, 24(7), 65-76.

Schwab, D. P., Heneman, H. G., & DeCotiis, T. A. (1975). Behaviorally anchored rating scales: A review of the literature. Personnel Psychology, 28, 549-562.

Shipira, A., & Shiron, A. (1980). New issues in the use of behaviorally anchored rating scales: Level of analysis, the effects of incident frequency, and external validation. Journal of Applied Psychology, 65(5), 517-523.

Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149-155.

Stanley, J. C. (1961). Analysis of unreplicated three-way classifications, with applications to rater bias and trait independence. Psychometrika, 26, 205-219.

Thorton, G. C. (1980). Psychometric properties of self-appraisals of job performance. Personnel Psychology, 33, 263-271.

Thorton, G. C., & Zorich, S. (1980). Training to improve observer accuracy. Journal of Applied Psychology, 65(3), 351-354.

United States Civil Service Commission, EEOC, Department of Justice, and Department of Labor. (1977). Uniform guidelines on employee selection procedures. Federal Register, 42, 65542-65552 (Appendix B).

Warmke, D. L., & Billings, R. S. (1979). Comparison of training methods for improving the psychometric quality of experimental and administrative performance ratings. Journal of Applied Psychology, 64(2), 124-131.

Zammuto, R. F., London, M., & Rowland, K. (1982). Organization and rater differences in performance appraisals. Personnel Psychology, 35, 643-658.

Zedeck, S., & Baker, H. T. (1972). Nursing performance as measured by behavioral expectation scales: A multitrait-multimethod analysis. Organizational Behavior and Human Performance, 7, 457-466.

Zedeck, S., & Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purposes of the appraisal. Journal of Applied Psychology, 67(6), 752-758.


Zedeck, S., Imparato, N., Krausz, M., & Oleno, T. (1974). Development of BARS as a function of organizational level. Journal of Applied Psychology, 59(2), 249-252.

