Individual Differences in the Ability to Judge Others Accurately
David A. Kenny
University of Connecticut
http://davidakenny.net/kenny.htm
Overview
• Review of previous literature
– Reliability
• Internal consistency
• Cross-target correlations
• Parallel forms
• New model: SCARIB
Accuracy About What?
• the target's personality: Is Dave friendly?
• the target's opinions or attitudes: How does Dave feel about Lucy?
• what the target is currently thinking or feeling: What is Dave thinking about now?
• the target's mood: Is Dave excited or bored?
Judgmental Accuracy, or JA
What Is Accuracy?
Correspondence between a judgement and a criterion measure
A Renewed Interest in Individual Differences
• Interest in Emotional Intelligence (EQ)
• Models that Provide a Framework for Understanding Judge Moderators
• Neurological Deficits Creating Lower JA
Types of Measures
• Standardized scales (fixed targets): PONS, IPT, CARAT, Sternberg measures
• Agreement across targets: empathic accuracy (EA), slide viewing
Standardized Scales
• Develop a pool of items
• Pick the "good" items
• Establish reliability as measured by internal consistency
Low Reliability of Scales
Scale      Alpha   IIC
CARAT      .56     .038
IPT-30     .52     .035
IPT-15     .38     .039
PONS       .86     .027
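The alpha and IIC columns are linked by the Spearman-Brown (standardized alpha) formula. A short Python sketch, with item counts that are my own approximations (about 220 items for the PONS, 32 for the CARAT, 30 and 15 for the IPT forms), shows how a long test can have a respectable alpha despite a tiny IIC:

# Standardized alpha implied by an average inter-item correlation (IIC).
# Item counts are approximations used for illustration, not taken from the slides.
def alpha_from_iic(n_items, iic):
    return (n_items * iic) / (1 + (n_items - 1) * iic)

for name, n, iic in [("PONS", 220, 0.027), ("CARAT", 32, 0.038),
                     ("IPT-30", 30, 0.035), ("IPT-15", 15, 0.039)]:
    print(f"{name:7s} IIC={iic:.3f} -> alpha={alpha_from_iic(n, iic):.2f}")

With these item counts the computed alphas come out close to the values in the table, which is the point: alpha depends heavily on test length.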
Maybe an IIC of .03 Is Not All that Bad?
• Peabody Picture Vocabulary Test: .08
• Beck Depression: .30
• Bem M/F Scale: .19
• Rosenberg Self-Esteem: .34
I guess it is bad.
Agreement Across Targets
Same procedure, but different targets (e.g., slide viewing).
Treat each target as an "item" to assess reliability.
Statistical Analysis of Multiple Target Data
Social Relations Model
• Two-way data structure: Judge by Target
• Three sources of variance: Judge, Target, and Error/Relationship
• Judge/(Judge + Error) is like an IIC (see the sketch below).
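As a rough illustration of this partitioning, here is a minimal Python sketch (not the full Social Relations Model analysis) that decomposes a judge x target accuracy matrix into judge, target, and relationship/error variance with a simple two-way random-effects calculation; the generating variances are chosen to echo the typical values reported later in the talk (about .10, .30, and .60):

import numpy as np

def srm_components(scores):
    # scores: judges (rows) x targets (columns), one score per cell,
    # so relationship and error are confounded.
    n_j, n_t = scores.shape
    grand = scores.mean()
    jm, tm = scores.mean(axis=1), scores.mean(axis=0)
    ms_judge = n_t * np.sum((jm - grand) ** 2) / (n_j - 1)
    ms_target = n_j * np.sum((tm - grand) ** 2) / (n_t - 1)
    resid = scores - jm[:, None] - tm[None, :] + grand
    ms_resid = np.sum(resid ** 2) / ((n_j - 1) * (n_t - 1))
    return (max((ms_judge - ms_resid) / n_t, 0),   # judge variance
            max((ms_target - ms_resid) / n_j, 0),  # target variance
            ms_resid)                              # relationship + error

rng = np.random.default_rng(1)
data = (rng.normal(0, .10 ** .5, (100, 1)) +       # judge effects
        rng.normal(0, .30 ** .5, (1, 20)) +        # target effects
        rng.normal(0, .60 ** .5, (100, 20)))       # relationship/error
vj, vt, ve = srm_components(data)
print(vj / (vj + ve))                              # the "IIC-like" ratio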
Social Relations Model Variance Partitioning: Emotion Recognition
[Figure: bar chart of judge, target, and relationship/error variance proportions (y-axis 0 to 0.7) for the Goldenthal, Sabatelli, and Elfenbein studies.]
Social Relations Model Variance Partitioning: Empathic Accuracy
[Figure: bar chart of judge, target, and relationship/error variance proportions (y-axis 0 to 0.5) for Ickes et al. and Thomas & Fletcher.]
Questions About EA Results
Ickes et al.
• Many of the studies show very small amounts of judge variance.
• 2 of the 3 studies that show the greatest level have only 3 targets, 2 of which are very similar.
Thomas & Fletcher
• Ad hoc analysis
• Possible nonindependence
Perhaps individual differences emerge with emotionally charged stimuli?
What Do We Learn?
• Small judge variance ≈ .10
• Large target variance ≈ .30
• Large error/relationship variance ≈ .60
Convergent Validity?
Do different tests of judgemental ability correlate?
Convergent Validity?
          PONS    CARAT   IPT-30  IPT-15  STRN1
CARAT      .16
IPT-30     .12     .10
IPT-15    -.02     ---     ---
STERN 1    .14     ---     ---     ---
STERN 2    .16     ---     ---     ---     .02
Summary of Convergent Validity
Average correlation of about .10.
Perhaps there are many skills? The different skills do not correlate highly.
Validity of JA?
Recent meta-analysis by Hall, Andrzejewski, and Yopchick (2008):
• gender differences (Hall: r ≈ .20)
• positive personality (r ≈ .08)
• negative personality (r ≈ -.07)
• social competence: self-rated (r ≈ .10), other-rated (r ≈ .07)
Are There Individual Differences?
Maybe not:
• low internal consistency
– standardized scales
– cross-target studies (mostly)
• poor convergent validity
Maybe yes:
• intuition
• validity data hint at some validity
• "Is JA the only skill or competence without any individual differences?"
• That is, if people are scoring above chance, would we not expect individual differences?
An Item Response Theory Model
• presume each question refers to a different item
• parameters:
– r is ability (a normally distributed variable) minus difficulty
– g is guessing (assuming two alternatives)
Model
• probability that the judge is correct:
e^r/(1 + e^r)   (e ≈ 2.718)
• allowing for guessing:
e^r/(1 + e^r) + g[1 − e^r/(1 + e^r)]
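A minimal Python sketch of this response probability (the function name and the example values of r are mine):

import math

def p_correct(r, g=0.5):
    # e^r/(1+e^r) plus a guessing correction g*(1 - e^r/(1+e^r));
    # with two alternatives and unbiased guessing, g = .5
    p = math.exp(r) / (1 + math.exp(r))
    return p + g * (1 - p)

for r in (-2.0, 0.0, 2.0):          # r = ability minus difficulty
    print(r, round(p_correct(r), 3))

Note that the probability never falls below g: a judge with very low ability still scores at chance, never below it.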
Average Item Difficulty
• probability that judges are correct, averaged across all items
• allow for guessing
• What is the ideal average item difficulty? 75%?
• results from a simulation that varies average item difficulty…
[Figure: coefficient alpha (0 to 1.0) plotted against proportion correct (.5 to 1.0), one curve for ability SD = 1.0 and one for SD = 0.5.]
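A rough reconstruction of a simulation of this kind (the exact design is not given on the slides; here I assume 24 two-choice items, normally distributed ability with SD = 1.0, a common difficulty shifted to sweep the mean proportion correct, and unbiased guessing), computing coefficient alpha at each difficulty level:

import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def simulate(difficulty, n_judges=2000, n_items=24, sd=1.0, g=0.5, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.normal(0, sd, n_judges)[:, None] - difficulty   # ability minus difficulty
    p = np.exp(r) / (1 + np.exp(r))
    p = p + g * (1 - p)                                      # guessing correction
    correct = (rng.random((n_judges, n_items)) < p).astype(float)
    return correct.mean(), cronbach_alpha(correct)

for diff in (2.0, 1.0, 0.0, -1.0, -2.0):                     # harder -> easier items
    mean_p, alpha = simulate(diff)
    print(f"proportion correct {mean_p:.2f}  alpha {alpha:.2f}")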
Interpretation
• Curves peak in the high 80s
• Predicted by IRT (high .80s)
• Better to design "easy" tests
• Why?
• Performance of low-ability judges is almost entirely due to chance. If you want to discriminate among low-ability judges, you need an easy test.
[Figure: alpha plotted against proportion correct (.5 to 1.0) for ability SD = 1.0 and SD = 0.5, with existing tests located on the curves: Deception, DANVA, PONS, IPT-30, IPT-15, CARAT.]
Limits of the Standard IRT Model
• Guessing assumed to be random
• Cannot score below chance
• Unidimensional
SCARIB Model
• Skewed
• Channels
• Attunement
• Reversal
• Information
• Biased Guessing
Channels
• Different sources of information
– Face
– Body
– Voice
• Different variables
– Negative emotion
– Positive emotion
Attunement
• Judgement is quite difficult: there are many channels of information that must be monitored.
• A given judge generally allocates her or his attention in the same way.
• Metaphor of a radio: "tuned into" some channels more than others.
• Different judges are more attuned to different channels.
Skewed
• Total attunement represents the total resources that a judge can allocate to the task.
• The distribution of total resources is negatively skewed:
– Most judges have many resources.
– A few judges have very few resources.
• Total resources represents the "true score."
Information
• For each channel of each item, there is information available.
• For a given test, there may be more information in some channels than in others.
Reversal
• Very often the information is counter-diagnostic.
• For example: someone who is smiling may be unhappy.
Biased Guessing
• Assume two response alternatives (e.g., happy and sad).
• Some judges are biased in favor of one alternative and some in favor of the other.
Formal Model for Judge i, Item j, and Channel k
• Resources: s_i, negatively skewed, ranging from 0 to 10
• Attunement: r_ik = (1 − a)s_i/c + a·d_ik·s_i, the allocation for judge i to channel k (d_ik = 1)
• Information: x_jk = |z_jk| (a value for each item-channel combination)
• Reversal: some information is given a negative sign: x_jk → −x_jk
• Biased guessing: g_ij = w·h_ij + (1 − w)/a, where w is the amount of biased guessing and h_ij is the direction (either 1 or 0)
IRT Equations for the Probability of Being Correct
• Diagnostic information: v_ijk = (r_ik·x_jk) − 1.5(c + 1)
P(correct) = e^v/(1 + e^v) + g[1 − e^v/(1 + e^v)]
• Counter-diagnostic information: v_ijk = −(r_ik·x_jk) − 1.5(c + 1)
P(correct) = g[1 − e^v/(1 + e^v)]
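A minimal Python sketch of these two equations (the function name and example values are mine; I read the counter-diagnostic case as applying when an item's information has been reversed, i.e., x_jk is negative):

import math

def p_correct_scarib(r_ik, x_jk, c, g, diagnostic=True):
    # v_ijk = +/-(r_ik * x_jk) - 1.5(c + 1); the difficulty term grows with the number of channels
    sign = 1.0 if diagnostic else -1.0
    v = sign * (r_ik * x_jk) - 1.5 * (c + 1)
    logistic = math.exp(v) / (1 + math.exp(v))
    if diagnostic:
        return logistic + g * (1 - logistic)
    return g * (1 - logistic)   # correct only by guessing

# Illustrative values: 7 channels, unbiased guessing (g = .5)
print(p_correct_scarib(6.0, 2.5, c=7, g=0.5, diagnostic=True))    # well above chance
print(p_correct_scarib(6.0, -2.5, c=7, g=0.5, diagnostic=False))  # below chance

With strongly attended, reversed information the judge tends to believe the wrong answer and is correct only by guessing, so the probability falls below chance, which the standard IRT model cannot produce.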
Simulation
• 24 items
• 7 channels
• attunement
• reversal
• item biases
• biased guessing
Results
• SCARIB appears to be able to reproduce the basic results from JA studies.
• Results also agree with IRT and prior studies in that the mean and alpha are positively correlated (r = .817).
Why Low Internal Consistency?
• Multiple channels
• Information that varies by item or by item × channel
• Biased guessing
• However, attunement in conjunction with information varying by channel increases internal consistency.
Validity and Cross-Target Correlation
• Lowered by attunement in conjunction with information varying by channel.
• Slightly increased by biased guessing.
• The cross-target correlation mirrors validity (r = .929) much better than does internal consistency (r = .770).
Why Target Variance?
• More information for some targets.
• "Better" information (i.e., fewer reversals) for some targets.
• Stereotype accuracy: some targets conform more to item biases.
• Target differences are largely due to information differences, not to "readability."
Why Below Chance Responding?
• Reversal
• Item biases
• Reliability and validity can be improved by reversing some items when below-chance responding is due to reversal: being wrong for the right reason. Reversing items is counterproductive when the below-chance responding is due to item biases.
One Major Limitation
• Ignores policy differences: you could be attuned to diagnostic information but use it the wrong way.
• Note, though, that without allowing for policy differences, SCARIB does a good job of reproducing JA results.
Implications
• JA tests should be “easy.”
– Establish individual differences for deception.
• The cross-target correlation is a better way of validating a test than internal consistency.
• It may, at times, be beneficial to use "consensual" criteria.
Final Point
• Needed are experiments and statistical analyses to better estimate the SCARIB parameters.
Kia ora!
http://davidakenny.net/doc/scarib.ppt
Relationship to Funder's RAM Model
• Relevance: Is the information correlated with the correct answer (few reversals)?
• Availability: Does that information vary (|z_jk|)?
• Detection: Is the judge attuned to that information (r_ik)?
• Utilization: Does the judge know how to weight the information (o_ijk)?
Information (x_jk)
x_jk = |z_jk|: the larger this value, the more "available" the information.
A certain percentage of the information is reversed:
• If less than 50%, the information is diagnostic.
• If equal to 50%, the information is irrelevant.
• If greater than 50%, the information is counter-diagnostic.
Details: Attunement (r_ik)
• c: the number of channels to which the judge can allocate processing resources
• s_i: the total resources that judge i can allocate to the task
• r_ik = (1 − a)s_i/c + a·d_ik·s_i: the allocation for judge i to channel k (d_ik = 1)
• The variance of s is a measure of individual differences.
• 7 channels (a working assumption)
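A small Python sketch of this allocation rule (the attunement weight a = .5 and the choice of preferred channel are illustrative assumptions):

def attunement(s_i, preferred, c=7, a=0.5):
    # (1 - a)*s_i/c spread evenly over the c channels,
    # plus a*s_i on the preferred channel (d_ik = 1 there, 0 elsewhere)
    return [(1 - a) * s_i / c + (a * s_i if k == preferred else 0.0)
            for k in range(c)]

alloc = attunement(s_i=8.0, preferred=2)
print([round(x, 2) for x in alloc], "total:", round(sum(alloc), 2))

Note that the allocations sum to s_i: attunement redistributes a judge's resources across channels but does not change the total.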
Policy (o_ijk)
Some judges have things backwards. For example, I might believe that "being nice" is a sign of hostility. Today we ignore this factor: set o_ijk = 1.
Judgement (v_ijk)
v_ijk = r_ik·x_jk·o_ijk. This product might be negative, and so the judge may be inclined to believe in the wrong answer (one reason why there can be below-chance responding).
Parameter f is a difficulty term (assumed to increase with more channels).
Biased Guessing: Model
• g_ij = w·h_ij + (1 − w)/a, where w is the amount of biased guessing and h_ij is the direction (either 1 or 0)
• In standardized tests, items are "paired," and so g_i1 = 1 − g_i2.
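A tiny Python sketch of the guessing rule (w = .3 and the direction values are illustrative):

def guess_rate(w, h, a=2):
    # g_ij = w*h_ij + (1 - w)/a: bias of weight w toward the favored
    # alternative (h = 1) plus an unbiased share over a alternatives
    return w * h + (1 - w) / a

g1 = guess_rate(0.3, h=1)   # item where the favored alternative is correct
g2 = guess_rate(0.3, h=0)   # its "paired" item
print(g1, g2, g1 + g2)      # g1 = 1 - g2, as on the slide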
Item Selection Issues
• Items that correlate may not measure a skill but rather a consistent bias.
• When correlations are small and sample sizes not large, there is the danger of capitalization on chance.
Follow-up Alphas?
Scale      Initial   Follow-up
CARAT      .56       .46
IPT-30     .52       .29
IPT-15     .38       .18
Reaction within the Area
• abandonment by some of the "psychometric approach"
• other forms of reliability (test-retest and split-half)
• no attempt to explain the low inter-item correlations (≈ .03)
“It is possible that most of the variance … is due to differences in the judgeability of targets as opposed to the sensitivity of the perceivers.”
Malone & DePaulo (2001)
Channels: How Do They Work?
• Each item "depends" differentially on the different channels.
• Tests "depend" more on some channels than do others.
[Figure: histogram of simulated total resources (0 to 10); Mean = 8.759, Std. Dev. = 1.465, N = 9,999; negatively skewed.]
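One way to generate a resource distribution like the one in the histogram (the generating distribution is not stated on the slides; a Beta(3.5, 0.5) scaled to 0-10 is an assumption that roughly matches the plotted mean ≈ 8.76 and SD ≈ 1.46):

import numpy as np

rng = np.random.default_rng(42)
s = 10 * rng.beta(3.5, 0.5, size=9999)   # negatively skewed on [0, 10]
print(round(s.mean(), 2), round(s.std(ddof=1), 2))
# Most judges have many resources; a few have very few.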
Still True Now?
"Our position is not that individual differences are nonexistent in interpersonal accuracy. Rather, we believe that the variability of such differences is rather limited."
Kenny & Albright (1987)