APPLIED COGNITIVE PSYCHOLOGY, Appl. Cognit. Psychol. 16: 973–980 (2002)
Published online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/acp.920
Verbal Overshadowing in Voice Recognition
TIMOTHY J. PERFECT*, LAURA J. HUNT and CHRISTOPHER M. HARRIS
University of Plymouth, UK
SUMMARY
An experiment examined the influence of three factors on the accuracy and confidence in voice identifications from a voice-lineup. In a factorial design, participants either encoded the original voice deliberately or were exposed incidentally, either heard a normal voice, or a voice recorded through a telephone, and either described a target voice prior to the lineup or they did not. The method of encoding had no impact on performance, whilst hearing a telephone voice reduced confidence without impairing accuracy. Providing a verbal description impaired subsequent identification accuracy (a verbal overshadowing effect), without reducing confidence. Thus, these data demonstrate that verbal overshadowing can occur in voice recognition, and also provide another dissociation between confidence and performance. Copyright © 2002 John Wiley & Sons, Ltd.
The psychological literature on eyewitness identification has, understandably, largely
focused on witnesses’ ability to visually identify a perpetrator from a lineup, whether in
live, video or photographic form (see Sporer et al., 1996, for a review). However, there has
also been a smaller literature on the ability of witnesses to identify a perpetrator’s voice
from a lineup of voices (see Ormerod, 2001, for a review). The aim of the present work is
to extend one factor known to impair visual recognition, verbal overshadowing, to
the recognition of voices.
The verbal overshadowing effect was first introduced by Schooler and Engstler-
Schooler (1990). In a series of experiments, participants first witnessed a 30-second video
clip of a bank-robbery, in which the perpetrator’s face was clearly visible for the majority
of the film. Participants then either provided a verbal description of the perpetrator, or
acted as a no-description control group. Identification was then tested using an eight-
person, target-present, simultaneous photographic lineup. The results demonstrated the
negative impact of verbally describing a face compared to the no-description control, an
effect that has been called verbal overshadowing. Since that paper, there have been
numerous demonstrations of the verbal overshadowing effect on memory for a range of
materials, including configural relations, the taste of wine and musical form (see Schooler
et al., 1997, for a review).
*Correspondence to: Professor T. J. Perfect, Department of Psychology, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, UK. E-mail: [email protected]
It is not the aim of the present paper to review the previous theoretical debates on the
underlying mechanisms behind verbal overshadowing. No doubt other papers in this
special issue will provide the reader with the necessary historical perspective. Rather, the
primary aim of the present study is an attempt to extend the verbal overshadowing effect to
the issue of voice identification. However, some context for this work is required. The
most recent theorising about the genesis of the verbal overshadowing effect has focused on
the notion of transfer-inappropriate-retrieval (Schooler et al., 1997). In particular, with
regard to the verbal overshadowing effect for faces, the explanation of the observed
impairment is that providing a verbal description of the face shifts people from a holistic
processing style, to a featural one, and such a shift is likely to have a negative impact on
face recognition because faces are best processed holistically (Tanaka and Farah, 1993;
Macrae and Lewis, 2002). The question of interest here is whether verbal description of a
voice will also have a negative effect on voice recognition. Our expectation was that it
would, because voices represent a class of stimuli that are hard to verbally describe (see
Ormerod, 2001, for a review of the work on voice descriptions), and so they meet Schooler
et al.’s (1997) modality mismatch criterion. That is, the particular details of a voice—
beyond a classification of the speaker’s gender, age, and perhaps accent—are hard to
articulate, and so voices are likely to be susceptible to verbal overshadowing effects.
To date, there are no published studies demonstrating a verbal overshadowing effect in
the identification of previously unfamiliar voices. The only study of this topic known to the
authors is a conference paper by Schooler, Fiore, Melcher and Ambadar (1996) cited in
Schooler et al. (1997). In this study, participants first heard a tape-recorded voice, then
either verbally described the voice or acted as a control. At recognition participants were
required to select the voice saying the same statement as previously heard, from three
similar distractor voices. Additionally, participants were asked to indicate whether they
could provide a specific reason for their selection, or whether it was based on a gut
feeling—a ‘just know’ decision. The results indicated that a verbal overshadowing effect
did not influence the nature of the decision process, nor the accuracy of judgements for
which participants had a reason, but did impair the accuracy of recognition decisions based
upon gut feeling. Thus, the results of this unpublished study are suggestive of an overall
verbal overshadowing effect for voice recognition.
As well as measuring the accuracy of voice identifications following a verbal description,
the present study also investigated the impact of verbal overshadowing on post-identification
confidence. Confidence has been measured in this literature, yet it remains
surprisingly neglected. By this we mean that although some of the previous studies on
verbal overshadowing have included measures of confidence in the design, the issue has
attracted almost no theoretical consideration. Those studies that have reported confidence
measures have been consistent in showing no effect of verbal overshadowing on
confidence, despite demonstrating impairments on performance. For example, Schooler
and Engstler-Schooler’s (1990) first two experiments included measures of confidence.
Both studies demonstrated a verbal overshadowing effect on accuracy, but no effect on
confidence. The same pattern was shown in Westerman and Larsen (1997), Experiment 2.
Our aim was to determine whether the same dissociation between confidence and accuracy
would be observed for the effects of verbalization on voice identification.
In addition to studying the verbal overshadowing effect, the present study also examined
two other factors that are pertinent to the issue of voice identification. The first is the issue
of whether witnesses encode the voice deliberately, or through incidental exposure. This
topic has been studied relatively little, but the research conducted suggests that deliberate
encoding is generally beneficial to later voice identification (Hammersley and Read,
1996). The second question is purely applied: what is the effect of identifying a voice
heard over the telephone, rather than a normal voice? Clearly voices heard over the
telephone have forensic relevance; harassing telephone calls are a common problem that
the police have to deal with. Prior research indicates that, as long as the speech format is
held constant between encoding and test, performance is relatively robust (Rathborn et al.,
1981). However, it is an interesting issue to explore whether verbal overshadowing effects
would be observed for voices heard over the telephone. To the extent that naturalistic
stimuli enable holistic processing, and distorted stimuli hinder it (see, for example, the
work on face-recognition composites, such as Young et al., 1987), then one might expect
the verbal overshadowing effect to be larger for natural voices than for those heard over the
telephone.
METHOD
Participants
The participants were 56 undergraduate students from the Department of Psychology,
University of Plymouth. They were either volunteers or individuals who participated to
fulfil partial course credit.
Design
A 2 × 2 × 2 independent group factorial design was employed for the study. The three
factors were verbal overshadowing (description versus no description), distortion (normal
voices versus voices recorded over the telephone), and intentionality (incidental versus
deliberate encoding).
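The crossing of the three two-level factors can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary keys are hypothetical labels, and the equal allocation of the 56 participants across cells is an assumption that the Method does not state explicitly.

```python
from itertools import product

# Factor names and levels as described in the Design section.
# The key names are hypothetical labels chosen for illustration.
FACTORS = {
    "verbal_overshadowing": ("description", "no description"),
    "distortion": ("normal", "telephone"),
    "intentionality": ("incidental", "deliberate"),
}

N_PARTICIPANTS = 56  # sample size reported under Participants


def design_cells(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = design_cells(FACTORS)

# Assumption: participants were spread evenly over the cells.
per_cell, remainder = divmod(N_PARTICIPANTS, len(cells))

if __name__ == "__main__":
    for cell in cells:
        print(cell, "n =", per_cell)
```

Under that assumption each of the eight cells holds seven participants, and each level of any single factor holds 28.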
Materials
At encoding, a single voice recorded on audiocassette was presented to the participants.
Participants heard either a normal voice, or a voice recorded through the telephone. Each
voice said the same sentence as used by Schooler and Engstler-Schooler (1990), namely
‘Just follow the instructions, don’t press the alarm, and no one will get hurt’. Within each
condition selection of the voice sample was randomized.
At test, participants heard a recording of a lineup of six voices, each saying the
same sentence as had been spoken by the ‘perpetrator’. Each lineup was target-present,
with the position of the perpetrator randomized within the list. It is important to point out
that the perpetrator was rerecorded saying the sentence, and so the task was voice-
identification rather than identification of the specific details of the previous recording.
Procedure
Participants were tested in small groups of up to seven people in a quiet test laboratory in
the Department of Psychology. Half the participants heard the target voice segment under
incidental instructions, and half under deliberate encoding instructions. Participants in the
incidental exposure condition were told that the purpose of the study was to investigate
the possible relationship between voice recognition and basic mathematical ability, and so
the instructions highlighted the mathematical tasks they were to undertake. Specifically,
before listening to the randomly selected target voice participants were given the
following instruction: ‘First listen to the short recorded statement before you begin the
problems below. You will be given 10 minutes to answer as many of the questions as
possible. The experimenter will tell you when to start and finish. Please do not turn over
this page until you have reached the bottom or are instructed to do so.’
Participants in the deliberate encoding condition were given instructions that highlighted the
importance of remembering the voice. Before hearing the first voice, participants were told
‘You will now hear a short recorded statement. Please listen to the voice as carefully as
possible as you will be asked to recognise it later in the study.’ Following exposure to the
voice, participants were then introduced to the mathematical filler task and given their
instructions.
All participants completed a 10-minute filler task, consisting of arithmetic problems,
ranging in difficulty from the very easy (e.g. 11 + 13 = ?) to the moderately difficult (e.g.
443,482 − 4545 = ?). Participants continued solving such problems for 10 minutes, writing
their answers in the booklets provided.
Those in the verbal description condition were then instructed to provide a description
of the voice they had heard, in as much detail as possible. They were given 5 minutes to
provide as much information as they could. Those in the control condition had an
equivalent unfilled delay.
Participants then heard an audio-taped voice-lineup, consisting of six voices. The target
was always present in the lineup. Those in the normal voice condition heard each of the six
voices read out the same sentence as had been presented earlier. Those in the telephone
voice condition heard the same voices recorded through a telephone, again saying the
same sentence. Participants were required to listen to the full lineup, before selecting one
of the voices, and rating their confidence in the choice. The 10-point confidence scale
ranged from 1 = not at all confident, to 10 = very confident.
RESULTS AND DISCUSSION
The initial analysis examined the predictors of lineup accuracy. As there was only a single
lineup, the outcome measure was binary in nature (incorrect, correct) and so the
independent predictors of intentionality (deliberate versus incidental), distortion (normal
versus telephone voice) and verbal overshadowing (verbal description versus control) were
entered into a binary logistic regression to see which predicted lineup success. There was a
reliable verbal overshadowing effect, W(1) = 4.99, p < 0.03, such that verbally describing
the voice led to poorer performance (21.4% correct) than control (50.0% correct). Neither
intentionality, W(1) = 2.99, p < 0.09 (deliberate study = 46.4% correct, incidental
study = 25% correct), nor distortion, W(1) = 0.36, p < 0.55 (normal voice = 39.3% correct,
telephone voice = 32.1% correct), entered as significant in the model.
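To make these percentages concrete, the following sketch converts them back into counts of correct identifications and compares the description group with chance. It assumes 28 participants at each level of each factor (an even split of the 56 participants, which the paper does not state outright); the variable names are hypothetical. The chance rate follows from the six-voice, target-present lineup.

```python
N_PER_LEVEL = 28  # assumption: 56 participants split evenly per factor level
LINEUP_SIZE = 6   # six voices, target always present

# Percent correct as reported in the text.
reported_pct_correct = {
    ("verbal_overshadowing", "description"): 21.4,
    ("verbal_overshadowing", "control"): 50.0,
    ("intentionality", "deliberate"): 46.4,
    ("intentionality", "incidental"): 25.0,
    ("distortion", "normal"): 39.3,
    ("distortion", "telephone"): 32.1,
}


def implied_count(pct, n=N_PER_LEVEL):
    """Nearest whole number of correct responses implied by a percentage."""
    return round(pct / 100 * n)


def odds(n_correct, n=N_PER_LEVEL):
    """Odds of a correct identification: correct / incorrect."""
    return n_correct / (n - n_correct)


counts = {key: implied_count(pct) for key, pct in reported_pct_correct.items()}

# Odds ratio for describing the voice relative to the control group.
odds_ratio_description = (
    odds(counts[("verbal_overshadowing", "description")])
    / odds(counts[("verbal_overshadowing", "control")])
)

# Accuracy expected from guessing among the lineup members.
chance_rate = 1 / LINEUP_SIZE

if __name__ == "__main__":
    for key, n_correct in counts.items():
        print(key, f"{n_correct}/{N_PER_LEVEL}")
    print("odds ratio (description vs control):", round(odds_ratio_description, 3))
    print("chance rate:", round(chance_rate, 3))
```

On these assumptions the implied counts are 6 versus 14 correct for description versus control, and each factor's two levels sum to the same 20 correct identifications, which suggests the reconstruction is internally consistent. The description group's 21.4% sits close to the 16.7% expected from guessing.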
A regression model was also run to determine the mean confidence associated with a
lineup choice, independent of whether that choice was correct or not. Here, the only
significant predictor of confidence was the distortion effect, t = 2.26, p < 0.03, such that
confidence was higher when normal voices were judged (mean rating 6.20) than when
telephone voices were judged (mean rating 5.10). Neither verbal overshadowing, t = 0.00
(both verbal description and control mean rating = 5.64), nor intentionality, t = 1.70,
p < 0.10 (deliberate study mean rating = 6.10, incidental study mean rating = 5.21),
entered as significant in the model.
These two analyses indicate that confidence and accuracy are influenced independently
by different factors in this study. A consequence is that confidence
and accuracy in this study should be only weakly related. This is indeed what was found.
Overall, across all conditions, the confidence-accuracy correlation was r = 0.06,
indicating no relation at all between the accuracy of a witness’s choice of voice and their confidence in
that choice.
In addition to the main analyses reported above, two further regression analyses were
run, to explore further the factors of intentionality and distortion. In each case, the
intention was to explore whether the verbal overshadowing effect interacted with these
other processing manipulations. First we ran a binary logistic regression for lineup
accuracy which included the predictors of verbal description, intentionality and their
interaction term. The results indicated a reliable description effect as before, W(1) = 4.98,
p < 0.03, no effect of intentionality as before, W(1) = 2.80, p < 0.10, and no interaction,
W(1) = 0. Thus it is apparent that the verbal overshadowing effect on accuracy of voice
identification is independent of the encoding manipulation used in the present study.
However, the same was not true for the analysis involving the factor of distortion.
Including an interaction term in the analysis removed a significant verbal overshadowing
effect, W(1) = 0.15, p < 0.70. There was no main effect of distortion as before,
W(1) = 0.057, p < 0.45. However, there was now a marginal interaction, W(1) = 3.31,
p< 0.07. The direction of this interaction was unexpected. For normal voices, 42.9% of
the control group were accurate, whereas 35.7% of the description group were accurate.
For the telephone voices, 57.1% of the control group were accurate, but only 7.1% of the
description group were successful. Thus the verbal overshadowing effect was apparently
larger for the telephone voices, although strong conclusions should perhaps not be drawn
on the basis of this analysis, because the interaction term was not quite statistically
significant, and the numbers of participants in each cell were relatively low.
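The description-by-distortion analysis can be approximated from the four cell percentages alone. The sketch below is a reconstruction under stated assumptions, not the authors' analysis script: it assumes 14 participants per cell, dummy coding with the normal-voice and control conditions as reference categories, and the standard large-sample variance of a log odds (1/correct + 1/incorrect). In a saturated 2 × 2 logistic model, each Wald statistic is the squared log-odds contrast divided by the sum of the relevant reciprocal counts.

```python
import math

N_PER_CELL = 14  # assumption: 56 participants spread evenly over 4 cells

# Percent correct per (voice type, description condition), as reported.
pct_correct = {
    ("normal", "control"): 42.9,
    ("normal", "description"): 35.7,
    ("telephone", "control"): 57.1,
    ("telephone", "description"): 7.1,
}

# Whole-number counts of correct identifications implied by the percentages.
correct = {cell: round(p / 100 * N_PER_CELL) for cell, p in pct_correct.items()}


def log_odds(cell):
    """Log odds of a correct identification in one cell."""
    k = correct[cell]
    return math.log(k / (N_PER_CELL - k))


def var_log_odds(cell):
    """Large-sample variance of the log odds in one cell."""
    k = correct[cell]
    return 1 / k + 1 / (N_PER_CELL - k)


def wald(estimate, variance):
    """Wald chi-square statistic with 1 degree of freedom."""
    return estimate ** 2 / variance


NC, ND = ("normal", "control"), ("normal", "description")
TC, TD = ("telephone", "control"), ("telephone", "description")

# Description effect at the reference voice type (normal voices).
wald_description = wald(log_odds(ND) - log_odds(NC),
                        var_log_odds(ND) + var_log_odds(NC))

# Distortion effect at the reference description level (control group).
wald_distortion = wald(log_odds(TC) - log_odds(NC),
                       var_log_odds(TC) + var_log_odds(NC))

# Interaction: difference between the two description effects.
wald_interaction = wald(
    (log_odds(TD) - log_odds(TC)) - (log_odds(ND) - log_odds(NC)),
    sum(var_log_odds(cell) for cell in (NC, ND, TC, TD)),
)

if __name__ == "__main__":
    print("description:", round(wald_description, 2))
    print("distortion: ", round(wald_distortion, 2))
    print("interaction:", round(wald_interaction, 2))
```

Under these assumptions the sketch gives approximately 0.15 for the description term and 3.31 for the interaction, in line with the reported values. For distortion it gives approximately 0.57, which sits more comfortably with the reported p < 0.45 than the printed W(1) = 0.057 does. The single correct response in the telephone description cell dominates the interaction's variance, which illustrates why the small cell sizes limit what can be concluded.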
Thus, the results of the present study are clear-cut in a number of ways. First, a reliable
verbal overshadowing effect was observed. This is the first such demonstration of verbal
overshadowing for voice recognition, and is thus an extension of the domains of
performance known to be influenced by verbal overshadowing. The second effect was
that the confidence in the identification decision was not influenced by verbal overshadowing.
From an applied perspective, this is the worst possible result: verbal overshadowing
decreases earwitnesses’ ability, without their awareness.
The effects of verbal overshadowing in this study therefore show a dissociation between
accuracy and confidence, a pattern that has been reported (though not discussed) elsewhere
(Schooler and Engstler-Schooler, 1990; Westerman and Larsen, 1997). This is worthy of
further consideration, since it relates to the mechanisms underlying the verbal overshadowing
effect. Current theorizing about the verbal overshadowing effect centres on a
processing shift induced by the verbal description, from a holistic processing style to one
based on features (Macrae and Lewis, 2002). Given that face recognition is best achieved
through holistic processing (Tanaka and Farah, 1993), such a processing shift may account
for the drop in performance. But why should this shift not result in a drop in confidence
also?
Our methodology required that participants make a choice from the lineup, and so all
the errors were errors of commission, rather than omission. Whilst this procedure is not
ideal for applying the research to the real world, where witnesses are explicitly informed
that the perpetrator may or may not be in the lineup, it does clarify the interpretation of the
confidence data. Given that all participants were required to make a response, the lack of
an effect of verbal overshadowing on confidence becomes particularly surprising. If the
effect of verbal description was to reduce the availability of the original memory trace, one
might have expected our methodology to result in more guessing following verbal
description. However, this was not apparently the case, since there was no confidence
effect. This is not because the confidence measure was insensitive, however, because there
was an effect of distortion on confidence. Thus, these data imply that following verbal
description witnesses choose another voice with as much confidence as they would have
chosen the target if they had not given a description.
Re-examination of the previous confidence data in the verbal overshadowing literature
is inconclusive with regard to the confidence in false positive responses. Both Schooler
and Engstler-Schooler (1990) and Westerman and Larsen (1997) allowed participants to
indicate that the perpetrator was not in the lineup (though in fact the perpetrator was
present). However, neither study reports confidence ratings separately for false positives
and misses, and so we do not know whether those who had provided a verbal description in
those experiments selected false positives with the same degree of confidence as those in
the control group. An alternative possibility is that they selected a foil with less
confidence, but rejected the lineup with more confidence. Since both would be errors,
the overall lack of an effect for confidence might be masking underlying differences in
how people judge their responses.
The drop in accuracy observed in the verbal overshadowing effect is consistent with the
data reported by Dunning and Stern (1994), who asked eyewitnesses to explain how they
identified a perpetrator from a lineup. Those who reported using a just-know or pop-out
method were more accurate than those who reported using strategies based on elimination
or comparison of lineup members. Thus, if one equates pop-out recognition with non-
verbalizable, holistic, processing, and comparison with verbalizable, featural processing,
then the pattern reported by Dunning and Stern (1994) echoes that seen in the verbal
overshadowing literature (see Schooler et al., 1997, for a similar argument). However, the
difficulty with this line of argument is that Dunning and Stern (1994) also found that
the processing difference was associated with a reduction in confidence. Thus, whilst the
effect of verbal overshadowing resembles the pattern reported by Dunning and Stern
(1994) with regard to accuracy, it does not do so for confidence.
Thus, there is something of a theoretical puzzle here. The pattern of findings from a
series of verbal overshadowing studies, including this one, is a reliable effect on accuracy,
but no effect on confidence. In the present study, this lack of a confidence effect occurred
even though we required participants to guess if necessary. This drop in performance has
been explained as a change in processing strategy that follows a verbal description
(Schooler et al., 1997; Macrae and Lewis, 2002). However, the work directly on
processing strategy by Dunning and Stern (1994) and Smith et al. (2000) suggests that
processing style is associated with changes in both accuracy and confidence (and response
latency also, but this has yet to be studied in relation to verbal overshadowing). Thus the
previous focus on accuracy alone has suggested that the work on processing strategy and
verbal overshadowing is highly similar. However, this has ignored the different patterns
seen in the confidence data. The question arises as to why apparently the same shift in
processing results in changes in confidence in one paradigm (measures of self-reported
strategy), but not the other (verbal overshadowing). The confidence data seem to suggest
that either the processing shift is not the same in both paradigms, or some other factor
accounts for the difference. Whilst it is beyond the scope of the present paper to answer
this question, it is hoped that future studies of verbal overshadowing will pay closer
attention to the issue of confidence. In particular it may be helpful to examine confidence
in false positives and misses separately.
The second factor to influence the results of the present study was the use of the
telephone voice. Here the pattern was the opposite of the verbal overshadowing effect,
namely no influence on performance, but a change in confidence. Thus, witnesses
identified voices heard over the telephone as accurately as normal voices, but
they apparently did not believe so. This effect is useful in the respect that it demonstrates
that the lack of a confidence effect with verbal overshadowing is not due to low sensitivity
of the confidence measure. Although the statistical power of the analysis was restricted,
the marginal interaction between verbal overshadowing and the nature of the voice sample
was intriguing, and worthy of further research. The data suggested that the main effect of
verbal overshadowing was carried mainly in the telephone voices. This was contrary to our
initial expectation that naturalistic voices might be more readily processed in a holistic
manner, and so might be more susceptible to verbal overshadowing effects. However,
because of the statistical limitations, it is perhaps best not to over-interpret this finding
until it is replicated in a larger sample.
The effect of intentionality, although in the direction anticipated (better performance
and higher confidence with deliberate encoding) did not enter as significant for either
analysis. Further, the interaction between intentionality and verbal overshadowing was not
significant, indicating that a verbal overshadowing effect was present whether or not
participants tried to attend to the initial voice. This suggests that the encoding processes are
irrelevant in producing a verbal overshadowing effect. However, given that participants in
both conditions knew that a memory test was forthcoming, perhaps our manipulation of
intentionality was insufficiently strong.
In summary, this small-scale study has shown that a reliable verbal overshadowing
effect is obtained for voice identification. Further, it replicates previous research in
showing that verbally describing a to-be-recognized stimulus leads to a decrement in
identification accuracy but does not reduce confidence. It is suggested that the dissociation
between performance and confidence offers scope to test different theoretical accounts of
the verbal overshadowing effect, and is an issue that to date has been neglected.
REFERENCES
Dunning D, Stern LB. 1994. Distinguishing accurate from inaccurate eyewitness identifications via enquiries about decision processes. Journal of Personality and Social Psychology 49: 878–893.
Hammersley R, Read JD. 1996. Voice identification by humans and computers. In Psychological Issues in Eyewitness Identification, Sporer SL, Malpass RS, Koehnken G (eds). Erlbaum: Mahwah, NJ; 117–152.
Macrae CN, Lewis HL. 2002. Do I know you? Processing orientation and face recognition. Psychological Science 12: 194–196.
Melcher JM, Schooler JW. 1996. The misremembrance of wines past: verbal and perceptual expertise differentially mediate verbal overshadowing of taste memory. Journal of Memory and Language 35: 231–245.
Ormerod D. 2001. Sounds familiar? Voice identification evidence. Criminal Law Review August: 595–622.
Rathborn H, Bull R, Clifford BR. 1981. Voice recognition over the telephone. Journal of Police Science and Administration 9: 280–284.
Schooler JW, Fiore SM, Brandiamonte MA. 1997. At a loss from words: verbal overshadowing of perceptual memories. The Psychology of Learning and Motivation 37: 291–340.
Schooler JW, Engstler-Schooler TY. 1990. Verbal overshadowing of visual memories: some things are better left unsaid. Cognitive Psychology 22: 36–71.
Smith SM, Lindsay RCL, Pryke S. 2000. Postdictors of eyewitness errors: can false identifications be diagnosed? Journal of Applied Psychology 85: 542–550.
Sporer SL, Malpass RS, Koehnken G (eds). 1996. Psychological Issues in Eyewitness Identification. Erlbaum: Mahwah, NJ.
Tanaka JW, Farah MJ. 1993. Parts and wholes in face recognition. Quarterly Journal of Experimental Psychology: Human Experimental Psychology 42: 225–245.
Westerman DL, Larsen JD. 1997. Verbal-overshadowing effect: evidence for a general shift in processing. American Journal of Psychology 110: 417–428.
Young AW, Hellawell D, Hay DC. 1987. Configural information in face perception. Perception 16: 747–759.