Individual Differences in the Ability to Judge Others Accurately
David A. Kenny
University of Connecticut
http://davidakenny.net/kenny.htm
Overview
• Review of previous literature
– Reliability
• Internal consistency
• Cross-target correlations
• Parallel forms
• New model: SCARIB
Accuracy About What?
• the target's personality: Is Dave friendly?
• the target's opinions or attitudes: How does Dave feel about Lucy?
• what the target is currently thinking or feeling: What is Dave thinking about now?
• the target's mood: Is Dave excited or bored?
Judgmental Accuracy, or JA
What Is Accuracy?
Correspondence between a judgement and a criterion measure
A Renewed Interest in Individual Differences
• Interest in Emotional Intelligence (EQ)
• Models that Provide a Framework for Understanding Judge Moderators
• Neurological Deficits Creating Lower JA
Types of Measures
• Standardized scales (fixed targets): PONS, IPT, CARAT, Sternberg measures
• Agreement across targets: empathic accuracy (EA), slide viewing
Standardized Scales
• Develop a pool of items
• Pick the "good" items
• Establish reliability as measured by internal consistency
Low Reliability of Scales
Scale      Alpha   IIC
CARAT      .56     .038
IPT-30     .52     .035
IPT-15     .38     .039
PONS       .86     .027
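The alpha and IIC columns are linked by the Spearman-Brown (standardized alpha) formula. A short Python sketch, with item counts that are my own approximations (about 220 items for the PONS, 32 for the CARAT, 30 and 15 for the IPT forms), shows how a long test can have a respectable alpha despite a tiny IIC:

# Standardized alpha implied by an average inter-item correlation (IIC).
# Item counts are approximations used for illustration, not taken from the slides.
def alpha_from_iic(n_items, iic):
    return (n_items * iic) / (1 + (n_items - 1) * iic)

for name, n, iic in [("PONS", 220, 0.027), ("CARAT", 32, 0.038),
                     ("IPT-30", 30, 0.035), ("IPT-15", 15, 0.039)]:
    print(f"{name:7s} IIC={iic:.3f} -> alpha={alpha_from_iic(n, iic):.2f}")

With these item counts the computed alphas come out close to the values in the table, which is the point: alpha depends heavily on test length.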
Maybe an IIC of .03 Is Not All that Bad?
• Peabody Picture Vocabulary Test: .08
• Beck Depression: .30
• Bem M/F Scale: .19
• Rosenberg Self-Esteem: .34
I guess it is bad.
Agreement Across Targets
Same procedure, but different targets (e.g., slide viewing).
Treat each target as an "item" to assess reliability.
Statistical Analysis of Multiple Target Data
Social Relations Model
• Two-way data structure: Judge by Target
• Three sources of variance: Judge, Target, and Error/Relationship
• Judge/(Judge + Error) is like an IIC (see the sketch below).
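As a rough illustration of this partitioning, here is a minimal Python sketch (not the full Social Relations Model analysis) that decomposes a judge x target accuracy matrix into judge, target, and relationship/error variance with a simple two-way random-effects calculation; the generating variances are chosen to echo the typical values reported later in the talk (about .10, .30, and .60):

import numpy as np

def srm_components(scores):
    # scores: judges (rows) x targets (columns), one score per cell,
    # so relationship and error are confounded.
    n_j, n_t = scores.shape
    grand = scores.mean()
    jm, tm = scores.mean(axis=1), scores.mean(axis=0)
    ms_judge = n_t * np.sum((jm - grand) ** 2) / (n_j - 1)
    ms_target = n_j * np.sum((tm - grand) ** 2) / (n_t - 1)
    resid = scores - jm[:, None] - tm[None, :] + grand
    ms_resid = np.sum(resid ** 2) / ((n_j - 1) * (n_t - 1))
    return (max((ms_judge - ms_resid) / n_t, 0),   # judge variance
            max((ms_target - ms_resid) / n_j, 0),  # target variance
            ms_resid)                              # relationship + error

rng = np.random.default_rng(1)
data = (rng.normal(0, .10 ** .5, (100, 1)) +       # judge effects
        rng.normal(0, .30 ** .5, (1, 20)) +        # target effects
        rng.normal(0, .60 ** .5, (100, 20)))       # relationship/error
vj, vt, ve = srm_components(data)
print(vj / (vj + ve))                              # the "IIC-like" ratio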
Social Relations Model Variance Partitioning: Emotion Recognition
[Figure: bar chart of judge, target, and relationship/error variance proportions (y-axis 0 to 0.7) for the Goldenthal, Sabatelli, and Elfenbein studies.]
Social Relations Model Variance Partitioning: Empathic Accuracy
[Figure: bar chart of judge, target, and relationship/error variance proportions (y-axis 0 to 0.5) for Ickes et al. and Thomas & Fletcher.]
Questions About EA Results
Ickes et al.
• Many of the studies show very small amounts of judge variance.
• 2 of the 3 studies that show the greatest level have only 3 targets, 2 of which are very similar.
Thomas & Fletcher
• Ad hoc analysis
• Possible nonindependence
Perhaps individual differences emerge with emotionally charged stimuli?
What Do We Learn?
• Small judge variance ≈ .10
• Large target variance ≈ .30
• Large error/relationship variance ≈ .60
Convergent Validity?
Do different tests of judgemental ability correlate?
Convergent Validity?
          PONS    CARAT   IPT-30  IPT-15  STRN1
CARAT      .16
IPT-30     .12     .10
IPT-15    -.02     ---     ---
STERN 1    .14     ---     ---     ---
STERN 2    .16     ---     ---     ---     .02
Summary of Convergent Validity
Average correlation of about .10.
Perhaps there are many skills? The different skills do not correlate highly.
Validity of JA?
Recent meta-analysis by Hall, Andrzejewski, and Yopchick (2008):
• gender differences (Hall: r ≈ .20)
• positive personality (r ≈ .08)
• negative personality (r ≈ -.07)
• social competence: self-rated (r ≈ .10), other-rated (r ≈ .07)
Are There Individual Differences?
Maybe not:
• low internal consistency
– standardized scales
– cross-target studies (mostly)
• poor convergent validity
Maybe yes:
• intuition
• validity data hint at some validity
• "Is JA the only skill or competence without any individual differences?"
• That is, if people are scoring above chance, would we not expect individual differences?
An Item Response Theory Model
• presume each question refers to a different item
• parameters:
– r is ability (a normally distributed variable) minus difficulty
– g is guessing (assuming two alternatives)
Model
• probability that the judge is correct:
e^r/(1 + e^r)   (e ≈ 2.718)
• allowing for guessing:
e^r/(1 + e^r) + g[1 − e^r/(1 + e^r)]
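A minimal Python sketch of this response probability (the function name and the example values of r are mine):

import math

def p_correct(r, g=0.5):
    # e^r/(1+e^r) plus a guessing correction g*(1 - e^r/(1+e^r));
    # with two alternatives and unbiased guessing, g = .5
    p = math.exp(r) / (1 + math.exp(r))
    return p + g * (1 - p)

for r in (-2.0, 0.0, 2.0):          # r = ability minus difficulty
    print(r, round(p_correct(r), 3))

Note that the probability never falls below g: a judge with very low ability still scores at chance, never below it.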
Average Item Difficulty
• probability that judges are correct, averaged across all items
• allow for guessing
• What is the ideal average item difficulty? 75%?
• results from a simulation that varies average item difficulty…
[Figure: coefficient alpha (0 to 1.0) plotted against proportion correct (.5 to 1.0), one curve for ability SD = 1.0 and one for SD = 0.5.]
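A rough reconstruction of a simulation of this kind (the exact design is not given on the slides; here I assume 24 two-choice items, normally distributed ability with SD = 1.0, a common difficulty shifted to sweep the mean proportion correct, and unbiased guessing), computing coefficient alpha at each difficulty level:

import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def simulate(difficulty, n_judges=2000, n_items=24, sd=1.0, g=0.5, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.normal(0, sd, n_judges)[:, None] - difficulty   # ability minus difficulty
    p = np.exp(r) / (1 + np.exp(r))
    p = p + g * (1 - p)                                      # guessing correction
    correct = (rng.random((n_judges, n_items)) < p).astype(float)
    return correct.mean(), cronbach_alpha(correct)

for diff in (2.0, 1.0, 0.0, -1.0, -2.0):                     # harder -> easier items
    mean_p, alpha = simulate(diff)
    print(f"proportion correct {mean_p:.2f}  alpha {alpha:.2f}")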
Interpretation
• Curves peak in the high 80s
• Predicted by IRT (high .80s)
• Better to design "easy" tests
• Why?
• Performance of low-ability judges is almost entirely due to chance. If you want to discriminate among low-ability judges, you need an easy test.
[Figure: alpha plotted against proportion correct (.5 to 1.0) for ability SD = 1.0 and SD = 0.5, with existing tests located on the curves: Deception, DANVA, PONS, IPT-30, IPT-15, CARAT.]
Limits of the Standard IRT Model
• Guessing assumed to be random
• Cannot score below chance
• Unidimensional
SCARIB Model
• Skewed
• Channels
• Attunement
• Reversal
• Information
• Biased Guessing
Channels
• Different sources of information
– Face
– Body
– Voice
• Different variables
– Negative emotion
– Positive emotion
Attunement
• Judgement is quite difficult: there are many channels of information that must be monitored.
• A given judge generally allocates her or his attention in the same way.
• Metaphor of a radio: "tuned into" some channels more than others.
• Different judges are more attuned to different channels.
Skewed
• Total attunement represents the total resources that a judge can allocate to the task.
• The distribution of total resources is negatively skewed:
– Most judges have many resources.
– A few judges have very few resources.
• Total resources represents the "true score."
Information
• For each channel of each item, there is information available.
• For a given test, there may be more information in some channels than in others.
Reversal
• Very often the information is counter-diagnostic.
• For example: someone who is smiling may be unhappy.
Biased Guessing
• Assume two response alternatives (e.g., happy and sad).
• Some judges are biased in favor of one alternative and some in favor of the other.
Formal Model for Judge i, Item j, and Channel k
• Resources: s_i, negatively skewed, ranging from 0 to 10
• Attunement: r_ik = (1 − a)s_i/c + a·d_ik·s_i, the allocation for judge i to channel k (d_ik = 1)
• Information: x_jk = |z_jk| (a value for each item-channel combination)
• Reversal: some information is given a negative sign: x_jk → −x_jk
• Biased guessing: g_ij = w·h_ij + (1 − w)/a, where w is the amount of biased guessing and h_ij is the direction (either 1 or 0)
IRT Equations for the Probability of Being Correct
• Diagnostic information: v_ijk = (r_ik·x_jk) − 1.5(c + 1)
P(correct) = e^v/(1 + e^v) + g[1 − e^v/(1 + e^v)]
• Counter-diagnostic information: v_ijk = −(r_ik·x_jk) − 1.5(c + 1)
P(correct) = g[1 − e^v/(1 + e^v)]
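A minimal Python sketch of these two equations (the function name and example values are mine; I read the counter-diagnostic case as applying when an item's information has been reversed, i.e., x_jk is negative):

import math

def p_correct_scarib(r_ik, x_jk, c, g, diagnostic=True):
    # v_ijk = +/-(r_ik * x_jk) - 1.5(c + 1); the difficulty term grows with the number of channels
    sign = 1.0 if diagnostic else -1.0
    v = sign * (r_ik * x_jk) - 1.5 * (c + 1)
    logistic = math.exp(v) / (1 + math.exp(v))
    if diagnostic:
        return logistic + g * (1 - logistic)
    return g * (1 - logistic)   # correct only by guessing

# Illustrative values: 7 channels, unbiased guessing (g = .5)
print(p_correct_scarib(6.0, 2.5, c=7, g=0.5, diagnostic=True))    # well above chance
print(p_correct_scarib(6.0, -2.5, c=7, g=0.5, diagnostic=False))  # below chance

With strongly attended, reversed information the judge tends to believe the wrong answer and is correct only by guessing, so the probability falls below chance, which the standard IRT model cannot produce.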
Simulation
• 24 items
• 7 channels
• attunement
• reversal
• item biases
• biased guessing
Results
• SCARIB appears to be able to reproduce the basic results from JA studies.
• Results also agree with IRT and prior studies in that the mean and alpha are positively correlated (r = .817).
Why Low Internal Consistency?
• Multiple channels
• Information that varies by item or by item × channel
• Biased guessing
• However, attunement in conjunction with information varying by channel increases internal consistency.
Validity and Cross-Target Correlation
• Lowered by attunement in conjunction with information varying by channel.
• Slightly increased by biased guessing.
• The cross-target correlation mirrors validity (r = .929) much better than does internal consistency (r = .770).
Why Target Variance?
• More information for some targets.
• "Better" information (i.e., fewer reversals) for some targets.
• Stereotype accuracy: some targets conform more to item biases.
• Target differences are largely due to information differences, not to "readability."
Why Below Chance Responding?
• Reversal
• Item biases
• Reliability and validity can be improved by reversing some items when below-chance responding is due to reversal: being wrong for the right reason. Reversing items is counterproductive when the below-chance responding is due to item biases.
One Major Limitation
• Ignores policy differences: you could be attuned to diagnostic information but use it the wrong way.
• Note, though, that without allowing for policy differences, SCARIB does a good job of reproducing JA results.
Implications
• JA tests should be “easy.”
– Establish individual differences for deception.
• The cross-target correlation is a better way of validating a test than internal consistency.
• It may, at times, be beneficial to use "consensual" criteria.
Final Point
• Needed are experiments and statistical analyses to better estimate the SCARIB parameters.
Kia ora!
http://davidakenny.net/doc/scarib.ppt
Relationship to Funder's RAM Model
• Relevance: Is the information correlated with the correct answer (few reversals)?
• Availability: Does that information vary (|z_jk|)?
• Detection: Is the judge attuned to that information (r_ik)?
• Utilization: Does the judge know how to weight the information (o_ijk)?
Information (x_jk)
x_jk = |z_jk|: the larger this value, the more "available" the information.
A certain percentage of the information is reversed:
• If less than 50%, the information is diagnostic.
• If equal to 50%, the information is irrelevant.
• If greater than 50%, the information is counter-diagnostic.
Details: Attunement (r_ik)
• c: the number of channels to which the judge can allocate processing resources
• s_i: the total resources that judge i can allocate to the task
• r_ik = (1 − a)s_i/c + a·d_ik·s_i: the allocation for judge i to channel k (d_ik = 1)
• The variance of s is a measure of individual differences.
• 7 channels (a working assumption)
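A small Python sketch of this allocation rule (the attunement weight a = .5 and the choice of preferred channel are illustrative assumptions):

def attunement(s_i, preferred, c=7, a=0.5):
    # (1 - a)*s_i/c spread evenly over the c channels,
    # plus a*s_i on the preferred channel (d_ik = 1 there, 0 elsewhere)
    return [(1 - a) * s_i / c + (a * s_i if k == preferred else 0.0)
            for k in range(c)]

alloc = attunement(s_i=8.0, preferred=2)
print([round(x, 2) for x in alloc], "total:", round(sum(alloc), 2))

Note that the allocations sum to s_i: attunement redistributes a judge's resources across channels but does not change the total.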
Policy (o_ijk)
Some judges have things backwards. For example, I might believe that "being nice" is a sign of hostility. Today we ignore this factor: set o_ijk = 1.
Judgement (v_ijk)
v_ijk = r_ik·x_jk·o_ijk. This product might be negative, and so the judge may be inclined to believe in the wrong answer (one reason why there can be below-chance responding).
Parameter f is a difficulty term (assumed to increase with more channels).
Biased Guessing: Model
• g_ij = w·h_ij + (1 − w)/a, where w is the amount of biased guessing and h_ij is the direction (either 1 or 0)
• In standardized tests, items are "paired," and so g_i1 = 1 − g_i2.
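A tiny Python sketch of the guessing rule (w = .3 and the direction values are illustrative):

def guess_rate(w, h, a=2):
    # g_ij = w*h_ij + (1 - w)/a: bias of weight w toward the favored
    # alternative (h = 1) plus an unbiased share over a alternatives
    return w * h + (1 - w) / a

g1 = guess_rate(0.3, h=1)   # item where the favored alternative is correct
g2 = guess_rate(0.3, h=0)   # its "paired" item
print(g1, g2, g1 + g2)      # g1 = 1 - g2, as on the slide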
Item Selection Issues
• Items that correlate may not measure a skill but rather a consistent bias.
• When correlations are small and sample sizes not large, there is the danger of capitalization on chance.
Follow-up Alphas?
Scale      Initial   Follow-up
CARAT      .56       .46
IPT-30     .52       .29
IPT-15     .38       .18
Reaction within the Area
• abandonment by some of the "psychometric approach"
• other forms of reliability (test-retest and split-half)
• no attempt to explain the low inter-item correlations (≈ .03)
“It is possible that most of the variance … is due to differences in the judgeability of targets as opposed to the sensitivity of the perceivers.”
Malone & DePaulo (2001)
Channels: How Do They Work?
• Each item "depends" differentially on the different channels.
• Tests "depend" more on some channels than do others.
[Figure: histogram of simulated total resources (0 to 10); Mean = 8.759, Std. Dev. = 1.465, N = 9,999; negatively skewed.]
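One way to generate a resource distribution like the one in the histogram (the generating distribution is not stated on the slides; a Beta(3.5, 0.5) scaled to 0-10 is an assumption that roughly matches the plotted mean ≈ 8.76 and SD ≈ 1.46):

import numpy as np

rng = np.random.default_rng(42)
s = 10 * rng.beta(3.5, 0.5, size=9999)   # negatively skewed on [0, 10]
print(round(s.mean(), 2), round(s.std(ddof=1), 2))
# Most judges have many resources; a few have very few.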
Still True Now?
"Our position is not that individual differences are nonexistent in interpersonal accuracy. Rather, we believe that the variability of such differences is rather limited."
Kenny & Albright (1987)