
Auditory-Visual Speech Perception and Auditory-Visual Enhancement in Normal-Hearing Younger and Older Adults

Mitchell S. Sommers, Nancy Tye-Murray, and Brent Spehar

Department of Psychology (MSS), Washington University, St. Louis, Missouri; and Central Institute for the Deaf (N.T. and B.S.), St. Louis, Missouri

Objective: The purpose of the present study was to examine the effects of age on the ability to benefit from combining auditory and visual speech information, relative to listening or speechreading alone. In addition, the study was designed to compare visual enhancement (VE) and auditory enhancement (AE) for consonants, words, and sentences in older and younger adults.

Design: Forty-four older adults and 38 younger adults with clinically normal thresholds for frequencies of 4 kHz and below were asked to identify vowel-consonant-vowels (VCVs), words in a carrier phrase, and semantically meaningful sentences in auditory-only (A), visual-only (V), and auditory-visual (AV) conditions. All stimuli were presented in a background of 20-talker babble, and signal-to-babble ratios were set individually for each participant and each stimulus type to produce approximately 50% correct in the A condition.

Results: For all three types of stimuli, older and younger adults obtained similar scores in the A condition, indicating that the procedure for individually adjusting signal-to-babble ratios was successful at equating A scores for the two age groups. Older adults, however, had significantly poorer performance than younger adults in the AV and V modalities. Analyses of both AE and VE indicated no age differences in the ability to benefit from combining auditory and visual speech signals after controlling for age differences in the V condition. Correlations between scores for the three types of stimuli (consonants, words, and sentences) indicated moderate correlations in the V condition but small correlations for AV, AE, and VE.

Conclusions: Overall, the findings suggest that the poorer performance of older adults in the AV condition was a result of reduced speechreading abilities rather than a consequence of impaired integration capacities. The pattern of correlations across the three stimulus types indicates some overlap in the mechanisms mediating AV perception of words and sentences and that these mechanisms are largely independent from those used for AV perception of consonants.

(Ear & Hearing 2005;26:263–275)

Considerable evidence is now available to suggest that speech intelligibility improves when listeners can both see and hear a talker, compared with listening alone (Grant, Walden, & Seitz, 1998; Sumby & Pollack, 1954). Moreover, the benefits of combining auditory and visual speech information increase with the difficulty of auditory-only (A) perception (Sumby & Pollack, 1954). This increased performance for auditory-visual (AV) compared with A or visual-only (V) presentations is at least partially a result of complementary information available in the auditory and visual speech signals (Grant & Seitz, 1998; Grant et al., 1998; Summerfield, 1987). Thus, for example, if a listener is unable to perceive the acoustic cues for place of articulation, accurate intelligibility can be maintained if the speaker is visible because speechreading provides an additional opportunity to extract place information.

Grant et al. (1998) proposed a conceptual framework for understanding the improved performance for AV presentations, compared with either unimodal format (A or V), in which both peripheral and central mechanisms contribute to an individual's ability to benefit from combining auditory and visual speech information. In the initial step of the model, peripheral sensory systems (audition and vision) are responsible for extracting signal-related segmental and suprasegmental phonetic cues independently from the auditory and visual speech signals. These cues are then integrated and serve as input to more central mechanisms that incorporate semantic and syntactic information to arrive at phonetic and lexical decisions. This model is particularly useful for comparing the benefits of combining auditory and visual speech information across different populations because it highlights that differential benefits can result from changes in peripheral, central, or a combination of peripheral and central abilities.

One of the difficulties in comparing the benefits obtained from combining auditory and visual speech information across different populations, however, is that performance for unimodal presentations is often different across the populations of interest. In the present study, for example, we wanted to investigate the effects of age on the ability to combine auditory and visual speech information. However, this effort is complicated by well-documented declines in both A and V speech perception as a function of age (CHABA, 1988; Dancer, Krain, Thompson, Davis, & Glenn, 1994; Honneil, Dancer, & Gentry, 1991; Lyxell & Ronnberg, 1991; Middelweerd & Plomp, 1987; Shoop & Binnie, 1979). Thus, evidence for age differences in the ability to benefit from combining auditory and visual speech signals could be attributed to age-related differences in auditory sensitivity, age-related differences in speechreading, age differences in integrating auditory and visual speech information, or some combination of these factors.

Although a few studies comparing AV scores in older and younger adults have been designed to minimize the influence of unimodal performance differences (Cienkowski & Carney, 2002; Walden et al., 1993), other methodological concerns make their findings difficult to interpret. In one investigation, for example, Walden et al. (1993) compared differences between AV, V, and A performance for consonant-vowels (CVs) and sentences in middle-aged (35 to 50 years of age) and older (65 to 80 years of age) adults. For the sentences (but not for the CVs), testing was conducted in the presence of speech-shaped noise, and noise levels were adjusted to obtain approximately 40 to 50% correct in the A condition. Visual enhancement for sentences, defined as the difference between the AV and A conditions, did not differ between the two groups. V scores for sentences were significantly lower for the older (16.7%) than for the middle-aged adults (34.4%), preventing an analysis of age differences under conditions of equivalent unimodal performance. Moreover, scores in the AV condition for sentences were near ceiling for both groups (93.8% and 92% for the middle-aged and older adults, respectively), making it difficult to interpret the null effects of age.

Cienkowski & Carney (2002) minimized the contribution of presbycusis in an AV perception task by comparing normal-hearing younger and older adults on susceptibility to the McGurk effect (McGurk & MacDonald, 1976). In the McGurk effect, participants are presented with discrepant auditory and visual information and often perceive a fused response that differs from both inputs. For example, participants might be presented with an auditory VCV containing a medial bilabial stop (e.g., /aba/) while simultaneously viewing a face articulating a VCV with a medial velar stop (e.g., /aga/). On a certain percentage of trials, listeners will report hearing a VCV with a medial alveolar place of articulation (e.g., /ada/ or /aða/) that represents a fusion of the auditory and visual inputs. Cienkowski & Carney found that the percentage of fused responses did not differ between normal-hearing older and younger adults and suggested that this finding argued against age-related declines in auditory-visual integration. The results of the Cienkowski & Carney study provide indirect evidence for similar enhancement from combining auditory and visual speech information as a function of age, but because their study was not designed to assess enhancement specifically, they did not include measures of unimodal performance. Consequently, it is not clear whether the comparable susceptibility to the McGurk effect in the two age groups was associated with equivalent A and V abilities.

One purpose of the present study, therefore, was to examine the effects of age on the ability to benefit from combining auditory and visual speech information after minimizing differences in unimodal performance. Toward this end, we tested normal-hearing younger and older adults under conditions that produced similar levels of A performance and that also avoided ceiling effects in the AV condition. In addition, we computed both visual enhancement (VE, the benefit obtained from adding a visual signal to an auditory stimulus) and auditory enhancement (AE, the benefit obtained from adding an auditory signal to a visual-only stimulus) as a means of examining age differences in the ability to combine auditory and visual speech information after normalizing for any differences in unimodal performance (see the Methods section for additional details on computing these two measures).

A second goal of the study was to investigate age differences in AE and VE for consonants, words, and sentences. In general, only small to moderate correlations have been observed between VE for stimuli differing in semantic context (e.g., nonsense syllables, isolated words, and meaningful sentences), suggesting that the mechanisms mediating enhancement may differ as a function of the amount of semantic or lexical information available in the stimulus (Grant & Seitz, 1998; Grant et al., 1998). Furthermore, to our knowledge, there have been no systematic investigations of the relationship between AE or VE for speech stimuli differing in semantic content, although a small number of investigations (Dekle, Fowler, & Funnell, 1992; Sams, Manninen, Surakka, Helin, & Kättö, 1998) have compared AV integration for different stimulus types using the McGurk effect. In the study most similar to the current investigation, Sams et al. (1998) found considerable variability in participants' susceptibility to the McGurk effect in Finnish as a function of semantic context (consonants, isolated words, words in sentences). Thus, similar to results with A identification (Pichora-Fuller, Schneider, & Daneman, 1995; Sommers & Danielson, 1999) and measures of auditory-visual integration with the McGurk effect (Sams et al., 1998), the effect of age on VE and AE may differ as a function of stimulus type. The current study was therefore designed to examine whether any observed age effects on VE and AE would be modulated by the amount of lexical and semantic information available.

    METHODS

    Participants

Thirty-eight younger adults (mean age = 20.1 years, SD = 2.1) and 44 older adults (mean age = 70.2 years, SD = 6.8) served as participants. Younger adults were all students at Washington University and were recruited through posted advertisements. Older adults were all community-dwelling residents and were recruited through a database maintained by the Aging and Development Program at Washington University. Testing required three 2.5-hour sessions that were conducted on separate days. All participants reported that English was their first language and that they had never had any lipreading training. Participants were paid $10/hour for taking part in the experiments. All participants were screened for CNS dysfunction using an extensive medical history questionnaire that asked about significant CNS events, including stroke, open or closed head injury, concussions, any event in which the participant was rendered unconscious, and dizziness, as well as current medications. In addition, participants were asked about conditions for which they were currently being treated. Any participant with a history of CNS disorders, who was currently being treated for a CNS condition, or who was currently taking drugs that affect CNS activity was excluded.

Verbal abilities were assessed using the vocabulary subtest of the Wechsler Adult Intelligence Scale, with a maximum score of 70. Mean scores for older and younger adults were 55.3 (SD = 8.7) and 46.2 (SD = 4.2), respectively. An independent-samples t-test indicated that vocabulary scores were significantly higher for older than for younger adults, t(81) = 5.5, p < 0.001. Older participants were also screened for dementia using the Mini-Mental State Examination (Folstein, Folstein, & McHugh, 1975). Participants who scored below 24 (of 30) were excluded from further testing.

Participants were also screened for vision and hearing before testing. Participants whose normal or corrected visual acuity, as assessed with a Snellen eye chart, was poorer than 20/40 were excluded from participating, to minimize the influence of reduced visual acuity on the ability to encode visual speech information. Visual contrast sensitivity was measured using the Pelli-Robson contrast sensitivity chart (Pelli, Robson, & Wilkins, 1988), and participants whose scores fell below 1.8 were also excluded from further participation. Pure-tone air-conduction thresholds were obtained for all participants at octave frequencies from 250 to 4000 Hz, using a portable audiometer (Beltone 110) and headphones (TDH 39). Any participant whose threshold exceeded 20 dB HL (American National Standards Institute, 1989) was excluded from participating. Participants with asymmetric hearing losses, operationalized as greater than a 10-dB threshold difference between the two ears at any of the test frequencies, were also excluded. Pure-tone averages (500, 1000, and 2000 Hz) for younger adults were 0.61 (SD = 4.3) and 0.67 (SD = 4.2) dB HL for the left and right ears, respectively. Corresponding values for older adults were 14.1 (SD = 6.6) and 13.2 (SD = 6.3) dB HL. A comparison of pure-tone averages revealed that thresholds were significantly greater for older than for younger adults (left ear: t(81) = 10.4, p < 0.001; right ear: t(81) = 10.1, p < 0.001). It should also be noted that although we did not obtain threshold measures at frequencies higher than 4000 Hz, based on previous studies of age-related hearing loss (Morrell, Gordon-Salant, Pearson, Brant, & Fozard, 1996), the older adults almost certainly had clinically significant hearing losses (i.e., greater than 20 dB HL) at frequencies above 4000 Hz. Thus, despite meeting the criteria for normal hearing for frequencies through 4 kHz, older adults had significantly poorer hearing than younger adults.

    Stimuli and Procedures

Participants were presented with consonants, words (in a carrier phrase), and sentences in A, V, and AV conditions. All participants were first tested on consonants, followed by words and then sentences. Within each stimulus type, however, testing in the A, V, and AV conditions was counterbalanced such that approximately equal numbers of participants from each age group were tested in each possible order of testing modality. All testing was conducted in a double-walled sound-attenuating chamber (IAC 4106). Stimuli were presented via a PC (Dell 420) equipped with a Sound Blaster Live audio card and a Matrox (Millennium G400 Flex) 3D video card. Auditory stimuli for the A and AV conditions were presented binaurally in a multitalker background babble (see below for details on babble level) over headphones (Sennheiser HD 265). Signal level remained constant at 60 dB SPL for the A and AV conditions, as measured in a 6-cc supra-aural flat-plate coupler using the A-weighting scale (Bruel & Kjaer 2231). Testing levels were evaluated against calibrated levels before every test using an in-line RMS meter. Visual stimuli for the V and AV conditions were presented on a 17-inch touch-screen monitor with participants sitting approximately 0.5 m from the display. The video image completely filled the monitor. Participants viewed the head and neck of each talker as they articulated the stimuli.

    Setting the Background Babble Level

As noted, one goal of the present study was to investigate age differences in VE and AE under conditions in which A performance was similar across groups and that did not produce ceiling-level scores in the AV condition. To accomplish this goal, all testing was conducted in a multitalker background babble, and signal-to-babble levels were set individually for each participant and stimulus condition (consonants, words, and sentences) so as to produce approximately 50% correct in the A condition. For the words and sentences, the stimuli used to establish the background babble level for a given condition were of the same type as those used in the corresponding test condition (e.g., when words were used as test items, the background babble level was established using words). None of the stimuli used for setting the background babble levels for words and sentences were repeated in the test phase (see below for details on the stimuli used for setting the background babble level with consonants). The multitalker babble was captured from the Iowa Audiovisual Speech Perception Laserdisc (Tyler, Preece, & Tye-Murray, 1986) using 16-bit digitization and a sampling rate of 48 kHz.

Babble level was set independently for each participant and each stimulus type using a modified version of the procedure for establishing speech reception thresholds (American Speech and Hearing Association, 1988). Briefly, in the first phase of the procedure, a starting babble level was established by initially setting the signal-to-babble ratio to 20 dB. Appropriate stimuli (consonants, words, or sentences) were then presented as babble level was increased in 10-dB steps until the participant's first incorrect response. The starting babble level for the remainder of the tracking was then set at a level 10 dB less than the level of the first incorrect response. In the next phase, babble level was incremented from the starting level (i.e., 10 dB below the level of the first incorrect response) in 2-dB steps (with two stimuli at each level) until the participant responded incorrectly on five of six trials. Babble level for testing was then established by subtracting the total number of correct responses from the starting level and adding a correction factor of 1.
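To make the two-phase tracking rule concrete, the following is a minimal Python sketch of the procedure as we read it from the description above. It is an illustration under stated assumptions, not the laboratory software: respond() is a hypothetical callback that presents one stimulus at a given babble level and returns whether the response was correct, and the simulated listener in the usage example is purely synthetic.

```python
import random
from typing import Callable

def track_babble_level(
    respond: Callable[[float], bool],  # respond(babble_db) -> response correct?
    signal_db: float = 60.0,           # fixed signal level, per the paper
    correction_db: float = 1.0,        # correction factor added at the end
) -> float:
    """Return the babble level (dB) to use during testing."""
    # Phase 1: start at a +20 dB signal-to-babble ratio and raise the babble
    # in 10-dB steps until the first incorrect response.
    babble_db = signal_db - 20.0
    while respond(babble_db):
        babble_db += 10.0
    # Tracking starts 10 dB below the babble level of the first error.
    start_db = babble_db - 10.0
    # Phase 2: raise the babble in 2-dB steps, two stimuli per level, until
    # the participant is incorrect on five of the last six trials.
    results, level = [], start_db
    while True:
        for _ in range(2):
            results.append(respond(level))
            if len(results) >= 6 and sum(not r for r in results[-6:]) >= 5:
                # Test level = starting level - number correct + correction.
                return start_db - sum(results) + correction_db
        level += 2.0

if __name__ == "__main__":
    # Simulated listener whose accuracy falls as the babble rises (synthetic).
    srt = 52.0
    listener = lambda b: random.random() < 1.0 / (1.0 + 10 ** ((b - srt) / 4.0))
    print(f"Testing babble level: {track_babble_level(listener):.1f} dB")
```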

Mean signal-to-babble ratios for the three stimulus types were as follows: consonants (younger, M = -9.7, SD = 2.1; older, M = -4.8, SD = 3.7); words (younger, M = -8.6, SD = 1.2; older, M = -6.5, SD = 1.5); sentences (younger, M = -7.5, SD = 0.9; older, M = -5.6, SD = 1.5). Independent-measures t-tests indicated that older adults were tested at significantly higher signal-to-babble ratios than younger adults for all three types of stimuli (p < 0.001 for all comparisons).

    Consonants

Participants received 42 repetitions of 13 consonants in an /iCi/ context. The consonants tested were /m/, /n/, /p/, /b/, //, /t/, /d/, /g/, /f/, /v/, /z/, /s/, and /k/ (e.g., participants were presented with /imi/, /ini/, etc.). The same male talker produced all of the stimuli for testing consonants. The stimuli were digitized from existing Laserdisc recordings of the Iowa Consonant Test (Tyler et al., 1986) by connecting the output of the Laserdisc player (Laservision LD-V8000) to a commercially available PCI interface card for digitization (Matrox RT2000). Acquisition was controlled by software (Adobe Premiere). Video capture was 24-bit, 720 × 480, in the NTSC-standard 4:3 aspect ratio at 29.97 frames per second, to best match the original analog media. Audio was captured at 16 bits and a sampling rate of 48 kHz.

Consonant identification in the A, V, and AV conditions was measured using a 13-item closed-set test. Testing order was counterbalanced such that each modality was presented first, second, and third equally often to approximately the same number of participants in each age group. To familiarize participants with the test stimuli, the 13 test consonants were first presented (in the /iCi/ context) in the A condition without any background babble. Participants were instructed to respond by pressing the appropriate response area on a touch-screen monitor (ELO ETC-170C). Once participants were able to identify all 13 consonants presented in quiet correctly, the background babble was added for all conditions (including V) at the level established during pretesting (i.e., at the level yielding signal-to-babble ratios designed to produce approximately 50% correct A consonant performance). Participants then received 42 presentations of each consonant, with presentation order determined pseudorandomly for each participant. No feedback was provided during testing.

    Words

Stimulus materials for testing words were digitized from analog recordings of the Children's Auditory Visual Enhancement Test (CAVET; Tye-Murray & Geers, 2001), using the same equipment and procedures as for the consonants. The CAVET consists of three lists of 20 words plus a practice list; each item is the carrier phrase "say the word" followed by a one- to three-syllable target word. Consequently, scores for each of the individual conditions (A, V, and AV) were based on a relatively small number of presentations (20 in each condition) compared with either the consonants or sentences. Despite this potential limitation, we elected to use the CAVET for several reasons. First, the CAVET was designed specifically to avoid floor and ceiling level performance in the V condition. Second, each list is considered equally difficult to speechread and uses highly familiar words. The same female talker produced all stimuli. Participants were told the carrier phrase and were instructed to identify the word after the phrase. Lists were counterbalanced across the three presentation modalities (A, V, and AV) such that approximately equal numbers of older and younger adults received each list in a given condition.

Before testing, participants were informed that they would see, hear, or both see and hear a talker articulating the carrier phrase "say the word" followed by a target word. Participants were told to say the target word aloud, and scoring was based on exact phonetic matches (i.e., adding, deleting, or substituting a single phoneme counted as an incorrect response). If participants were unsure of the target word, they were encouraged to guess. Participants received three practice trials in each condition (A, V, and AV) before testing, and none of the practice words appeared during the actual testing. All testing was performed using the signal-to-babble levels established for words during pretesting. Participants received a total of 60 trials (20 each in the A, V, and AV conditions), with modality test order counterbalanced across participants. Within a given test condition (A, V, or AV), presentation order of the individual stimuli was determined pseudorandomly.

    Sentences

Sentences were digitized from Laserdisc recordings of the Iowa Sentence Test (Tyler et al., 1986), using the same equipment and procedure as was used for the consonants and words. One hundred sentences (five lists of 20 sentences each) were digitized. Two of the lists were used for practice stimuli and the remaining three were used for testing. Each sentence in a list was produced by a different talker (half men, half women); thus, within a list, participants saw and/or heard a new talker on every trial. Lists were counterbalanced across participants such that each of the three lists was presented an equal number of times in the A, V, and AV conditions. All testing was conducted at the signal-to-babble ratios established for sentences during pretesting. After each sentence was presented, participants were instructed to repeat as much of the sentence as possible and were again encouraged to guess if they were unsure of one or more words in the sentence. Scoring was based on the five to seven key words in each sentence.

    Calculating VE and AE

In the present study, VE was calculated relative to an individual's A performance (expressed as proportion correct) according to the following equation:

VE = (AV - A) / (1 - A)

This measure of VE has been used in several investigations of AV performance (Grant & Seitz, 1998; Grant et al., 1998; Rabinowitz, Eddington, Delhorne, & Cuneo, 1992) because it provides a method for comparing VE across a wide range of A and V scores. For example, an individual scoring 80% in A presentations and 90% with AV presentations would have the same enhancement score as an individual scoring 20% in A and 60% in AV, despite large absolute differences in overall A and AV performance. Furthermore, this equation avoids the bias inherent in the simple difference score AV - A, in which higher values of A necessarily lead to lower values of enhancement.

Although less common, we also calculated AE according to the following equation:

AE = (AV - V) / (1 - V)

In the present study, AE is less likely to be affected by differences in unimodal performance than the more traditional measure of VE because it normalizes for any age differences in V performance (recall that different signal-to-babble ratios were used to minimize age differences in A performance).
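Both enhancement measures are simple enough to express directly in code. The snippet below is a minimal sketch (not part of the original article) that implements the two equations and checks them against the worked example in the text: 80% to 90% and 20% to 60% both yield VE = 0.5.

```python
def visual_enhancement(av: float, a: float) -> float:
    """VE = (AV - A) / (1 - A), with scores as proportions correct."""
    return (av - a) / (1.0 - a)

def auditory_enhancement(av: float, v: float) -> float:
    """AE = (AV - V) / (1 - V), with scores as proportions correct."""
    return (av - v) / (1.0 - v)

# Worked example from the text: very different absolute scores,
# identical enhancement.
assert abs(visual_enhancement(0.90, 0.80) - 0.5) < 1e-12
assert abs(visual_enhancement(0.60, 0.20) - 0.5) < 1e-12
```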

To further minimize differences in unimodal performance in comparing both VE and AE as a function of age, we used analysis of covariance (ANCOVA) to control for age differences in A or V scores. That is, when comparing AE as a function of age, we used A scores as a covariate to control for any differences in A performance that were not eliminated by using different signal-to-babble ratios. Similarly, when comparing VE as a function of age, we used V scores as a covariate to control for any age differences in V performance.
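As an illustration of this analysis strategy, the sketch below runs one such ANCOVA in Python with statsmodels. The data frame is synthetic (the per-participant scores are not published with the article) and the column names are hypothetical; the point is only the model form, with enhancement regressed on age group plus the relevant unimodal score as covariate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 82  # 38 younger + 44 older, matching the sample sizes reported above

# Synthetic stand-in data with hypothetical column names.
df = pd.DataFrame({
    "age_group": np.repeat(["younger", "older"], [38, 44]),
    "v_score": rng.uniform(0.05, 0.45, n),  # V proportion correct
})
df["ve"] = 0.35 + 0.4 * df["v_score"] + rng.normal(0.0, 0.06, n)

# ANCOVA on visual enhancement: age-group effect with V as the covariate.
model = smf.ols("ve ~ C(age_group) + v_score", data=df).fit()
print(model.summary().tables[1])
```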

RESULTS AND DISCUSSION

    Performance in A, V, and AV Conditions

Figure 1 displays mean percent correct for older and younger adults as a function of both stimulus type and presentation modality. A three-way, mixed-design analysis of variance was used to examine main effects and interactions for age, stimulus type, and presentation modality. Age (younger versus older) was treated as an independent-measures variable, and stimulus type (consonants, words, sentences) and presentation modality (auditory, visual, and auditory-visual) were treated as repeated-measures variables. Overall, performance differed significantly across stimulus type [F(2, 164) = 10.8, p < 0.001]. Tukey honestly significant difference post hoc pairwise comparisons with a Bonferroni correction for multiple comparisons indicated significant differences between identification scores for all three types of stimuli, such that consonants (56.1%) were identified significantly better than words (51.1%) and words were identified significantly better than sentences (40%) (p < 0.001 for all comparisons). Significant differences were also observed for presentation modality [F(2, 164) = 12.3, p < 0.001], with scores for AV presentations (78.1%) significantly higher than for A (47.8%) and scores for A significantly higher than for V (21.9%) (p < 0.001 for all comparisons). Recall, however, that A scores were established using signal-to-babble ratios designed to produce approximately 50% correct in this condition. Therefore, although the data indicate that the procedure for establishing appropriate signal-to-babble levels in A was successful (overall A performance was very close to 50%), the measures do not reflect true group differences in A performance. Finally, older adults scored significantly poorer overall (46.7%) than younger adults [51.6%; F(1, 82) = 16.2, p < 0.001].

Of particular interest to the present investigation is that the three-way interaction of age × stimulus type × presentation modality was significant [F(4, 328) = 2.5, p < 0.05]. A series of Tukey honestly significant difference post hoc comparisons with a Bonferroni correction for multiple comparisons indicated no significant age differences in A performance for the three types of stimuli (p > 0.9 for all comparisons). This finding provides additional evidence that the procedure for equating A performance levels was effective across age groups and stimulus types. In the AV condition, however, older adults exhibited significantly poorer identification scores than younger participants for consonants and words (p < 0.05 for both comparisons) but not for sentences. Older adults also exhibited significantly poorer V scores for both consonants (p < 0.001) and words (p < 0.05). The difference between V performance for sentences was not significant (p > 0.5), but this finding should be interpreted cautiously because absolute performance levels were relatively low for both older and younger adults. Post hoc comparisons of V performance as a function of stimulus type for older and younger adults indicated that younger adults had similar V scores for consonants and words (p > 0.3) but significantly lower scores for sentences (p < 0.001 for the differences between consonants and sentences and between words and sentences). For older adults, V performance was significantly better for words than for consonants and significantly better for consonants than for sentences (p < 0.001 for all comparisons).

Fig. 1. Percent correct identification of consonants (left), words (middle), and sentences (right) for younger (filled bars) and older (open bars) adults. Presentation modality refers to visual only (V), auditory only (A), and auditory-visual (AV). Error bars indicate standard deviation.

    Comparison of VE and AE

As noted, evidence for age differences in the ability to extract visual speech information is a critical consideration when evaluating age-related changes in both VE and AE. In the current study, VE was examined using three separate ANCOVAs (one for each stimulus type). In all analyses, age (younger, older) served as an independent-measures variable and the appropriate V condition served as the covariate (i.e., performance in the consonant V condition served as a covariate when comparing consonant VE for older and younger adults). The left panel of Figure 2 displays the adjusted means for VE from the ANCOVAs as a function of age and stimulus type. None of the differences between older and younger adults for VE reached statistical significance [consonants, F(1, 81) = 2.1, p = 0.15; words, F(1, 81) = 1.4, p = 0.23; sentences, F(1, 81) < 1, p = 0.93].

Analyses paralleling those for VE were also conducted to examine age differences in AE. Age again served as an independent-measures variable, and the corresponding A scores (rather than V scores) served as the covariates. The right panel of Figure 2 displays the adjusted means for AE as a function of stimulus type and age. Separate ANCOVAs were conducted to examine age differences in AE for the three stimulus types, and none of the comparisons reached statistical significance (all F-values less than 1).

Fig. 2. Left: adjusted means for visual enhancement for younger (filled bars) and older (open bars) adults as a function of stimulus type. Right: same as in the left panel, except data are for auditory enhancement. Error bars indicate standard deviation.

    Correlations Between Measures

Correlations Between V Performance for Consonants, Words, and Sentences

In addition to examining age-related changes in AE and VE, the present study was also designed to investigate the relationship between performance in the V and AV modalities as a function of stimulus type. Figure 3 displays the relationships between V performance for consonants, words, and sentences as a function of both age and stimulus type. Overall, correlations were higher for younger than for older adults and were stronger when correlating V for words and sentences than for either consonants and words or consonants and sentences. All correlations were significant at the 0.01 level except for the correlation between words and sentences for younger adults (p < 0.001). The correlation between consonants and words for older adults (p = 0.09) was not significant. These findings suggest that V performance for the three types of stimuli is mediated, at least in part, by a set of common mechanisms that are a critical component of speechreading and that operate independently of lexical or semantic constraints.

Fig. 3. Scatterplots and Pearson product-moment correlations between visual-only scores for consonants and words (left), consonants and sentences (middle), and words and sentences (right). Solid circles and clear triangles represent data for younger and older adults, respectively. The darker solid line shows the best-fitting regression line for the younger adults, and the lighter line shows the best-fitting regression line for the older adults. Single (*), double (**), and triple (***) asterisks indicate significance levels of 0.05, 0.01, and 0.001, respectively.

Correlations Between Measures of AV Performance

Figure 4 displays scatterplots and correlation coefficients for AV performance as a function of stimulus type and age group. In contrast to the findings with V scores, only one correlation, that between AV performance for words and sentences in younger participants, was significant (p < 0.01). However, the magnitude of the correlation (0.46) was relatively modest in that AV performance for words accounted for just slightly greater than 20% of the variance in AV performance with sentences. For older adults, there was a positive relationship between AV performance with words and sentences, but this did not reach statistical significance. Correlations between consonants and words and between consonants and sentences did not reach significance (and were actually negative for younger participants). One implication of these findings is that AV performance for consonants is mediated by mechanisms that are distinct from those used to identify words and sentences. For younger adults, the moderate correlation between words and sentences indicates some overlap in the mechanisms used to identify these two types of stimuli when both auditory and visual speech information is available. However, the relatively small correlations obtained for older adults across all three stimulus types suggest that they may rely on different mechanisms for AV perception of consonants, words, and sentences.

Fig. 4. Same as Figure 3, except the data are for auditory-visual presentations.

Relationship Between Measures of VE and AE

Figures 5 and 6 display the scatterplots and correlation coefficients for VE and AE between consonants, words, and sentences as a function of age group. Overall, the findings are similar to those obtained with AV presentations in that the strongest positive correlations obtained were between words and sentences. For VE, the correlation between consonants and sentences was significant for older adults (p < 0.05) but not for younger adults, who, similar to the results for AV presentations, exhibited a nonsignificant (p = 0.27) negative correlation between VE for consonants and sentences. For AE, the correlation between consonants and sentences was not significant for older adults but was significant and negative for younger adults. Note that, similar to the significant correlations observed in the AV condition, the relative magnitudes of the correlations for both VE and AE were small to moderate, accounting for 8 to 14% of the variance. Also similar to the AV results, older adults demonstrated a small, but nonsignificant, positive correlation for both VE and AE between consonants and words, whereas younger adults exhibited a small negative relationship between these two stimulus types.

Taken together, these findings suggest that the mechanisms mediating VE and AE for consonants are largely distinct from those mediating VE and AE for words and sentences, and this is true for both older and younger adults. The small but significant correlations between the two enhancement measures (AE and VE) for words and sentences suggest some overlap in the mechanisms underlying enhancement for these two types of stimuli but also indicate significant independence in the operations mediating the benefits of combined auditory and visual speech information for words and sentences. Finally, the similar correlations for both VE and AE between words and sentences in younger and older adults are consistent with the use of similar abilities in the two groups to improve word and sentence perception when both auditory and visual speech signals are available.

Fig. 5. Scatterplots and Pearson product-moment correlations between visual enhancement for consonants and words (left), consonants and sentences (middle), and words and sentences (right). Solid circles and clear triangles represent data for younger and older adults, respectively. The darker solid line shows the best-fitting regression line for the younger adults, and the lighter line shows the best-fitting regression line for the older adults. Asterisks indicate significance at the 0.05 level.

Fig. 6. Same as in Figure 5, except data are for auditory enhancement.
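The variance-accounted-for figures quoted above are simply squared correlation coefficients: r = 0.46 gives r^2 of about 0.21, or roughly 21%. The sketch below, which uses synthetic stand-in scores because the per-participant data are not published here, shows the computation with SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-in scores for 38 participants (hypothetical values).
av_words = rng.uniform(0.5, 0.95, 38)
av_sents = 0.5 * av_words + rng.normal(0.0, 0.08, 38)

r, p = stats.pearsonr(av_words, av_sents)
# r**2 is the proportion of variance in one measure accounted for by the
# other; e.g., r = 0.46 implies r**2 of about 0.21, or roughly 21%.
print(f"r = {r:.2f}, p = {p:.4f}, shared variance = {r**2:.0%}")
```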

    General Discussion

The present study was designed to investigate the effects of age on the ability to benefit from combining auditory and visual speech information. The findings indicate that older and younger adults exhibit comparable benefits, as indexed by measures of both VE and AE, for all three types of stimuli tested (consonants, words, and sentences). Moreover, older adults were able to achieve similar enhancement relative to either unimodal condition, despite significant reductions in V performance. Correlations between performance with the different stimulus types indicated significant positive correlations between V performance for consonants, words, and sentences, with younger adults generally showing stronger correlations than older adults. Correlations across stimulus types for AV, VE, and AE, however, were generally significant only for the relationship between words and sentences, with small (and sometimes negative) correlations between consonants and words and between consonants and sentences.

    Age and Speechreading Performance

The results for the V condition are in good agreement with previous studies demonstrating age-related declines in speechreading (Dancer et al., 1994; Honneil et al., 1991; Lyxell & Ronnberg, 1991; Middelweerd & Plomp, 1987; Shoop & Binnie, 1979). Shoop & Binnie (1979), for example, reported differences of approximately 13% between younger (40 to 50 years of age) and older (over age 71) adults on a CV speechreading test. This age difference is similar to the difference of approximately 17% between younger and older adults that was observed for the V condition with VCV stimuli in the present study. The V sentence data from the present study are also in good agreement with previous investigations of age-related changes in speechreading. Dancer et al. (1994) and Honneil et al. (1991), for instance, both reported declines of 8 to 10% in V sentence intelligibility for older, compared with younger, adults. These values compare favorably with the 6% difference in V sentence performance that was observed between older and younger adults in the present study. Thus, the findings from both the present study and previous investigations suggest that older adults are less able than younger participants to encode visual speech information.

To our knowledge, the current study also represents the first demonstration of age-related declines in V performance using older participants with clinically normal hearing (at least through 4 kHz). Viewed in conjunction with previous findings of declines in V performance for hearing-impaired older adults (Dancer et al., 1994; Honneil et al., 1991), our results suggest a dissociation between hearing abilities and speechreading, at least in older adults. Thus, it appears that aging is associated with declines in one or more capacities that are critical for successful encoding of V speech information and that these impairments are independent of hearing status. One implication of these findings is that if speechreading training is to be successful with older adults, future research must be directed at identifying and correcting the mechanisms responsible for the poorer V performance in this population.

Another interesting aspect of the present V data is the reduced performance for sentences, compared with either consonants or words. This pattern of identification scores is exactly opposite what is observed for A presentations, where identification of words in sentences is generally higher than for isolated words or CVs (Bernstein, Demorest, & Tucker, 2000; Hutchinson, 1989; Nittrouer & Boothroyd, 1990; Sommers & Danielson, 1999). One explanation for the differential pattern of findings with A and V presentations is that the advantage for sentences with A presentations is dependent on the listener's ability to extract and use semantic and syntactic information to increase the predictability of target items (Miller, Heise, & Lichten, 1951; Sommers & Danielson, 1999). In the case of V presentations, however, extraction of semantic and syntactic cues is more difficult because identification of individual words is often not possible. The reduced availability of semantic and syntactic information, combined with the increased processing demands of understanding sentences (i.e., sentences present more information than either words or consonants), probably is the main reason that V performance for sentences was the poorest of the three stimulus types for both older and younger adults.

Despite the relatively poor performance with V presentations, significant correlations were obtained between V performance with consonants, words, and sentences, and the magnitude of these correlations is similar to what has been reported previously (Bernstein et al., 2000; Demorest, Bernstein, & DeHaven, 1996; Grant & Seitz, 1998). Grant & Seitz (1998), for example, reported a correlation coefficient of 0.49 between V scores for consonants and sentences. This value is similar to the correlations of 0.42 and 0.39 obtained for younger and older adults, respectively, in the present study. Bernstein et al. (2000) measured V performance for phonemes, words, and sentences in normal-hearing adults (18 to 45 years of age) and found significant correlations between scores for phonemes and words (0.39) and phonemes and sentences (0.43). Similarly, Demorest et al. (1996) reported that V performance for younger adult participants on nonsense syllables correlated approximately 0.5 with V performance on both words and sentences, with an even stronger correlation (approximately 0.8) between V performance with words and sentences. Again, the magnitudes of these correlations are comparable to those observed for younger adults in the V condition of the present study. Thus, the picture that is beginning to emerge from investigations of speechreading using different types of stimulus materials is that there is some overlap in the mechanisms used to extract V speech information from consonants, words, and sentences and that this set of core abilities is similar for both older and younger adults.

    Auditory and Visual Enhancement

The finding that older and younger adults exhibit similar enhancement from combining auditory and visual speech information replicates and extends results from previous investigations examining the effects of age on AV speech perception. To our knowledge, the present investigation is the first to examine age effects on auditory-visual enhancement across three different types of stimuli after controlling for age-related changes in unimodal encoding. Consistent with the absence of age differences in enhancement in the present study, Helfer (1997, 1998) reported that visual benefit for words presented in nonsense sentences was similar for younger normal-hearing adults and older participants with mild to moderate hearing impairments. These findings are particularly relevant to the current investigation because Helfer used one of the same relative measures of enhancement, VE, as in the present study, and older adults were tested at a slightly higher signal-to-noise ratio than younger adults. Taken together, the findings from Helfer (1998) and the present investigation suggest that older and younger adults exhibit similar enhancement from combining auditory and visual speech signals over a relatively large range of A performance. An important direction for future research will be to examine whether the age equivalence in enhancement observed in the present study is maintained under more adverse listening conditions, where A scores are reduced and the need for integrating auditory and visual speech signals is even greater.

The finding that older and younger adults exhibited comparable enhancement from combining auditory and visual speech information suggests that age differences in AV performance reflect age-related impairments in the ability to encode V speech signals (recall that A performance was manipulated to be approximately equivalent in the two groups) rather than a reduced ability to combine or integrate information across the two modalities. Within the model of AV perception proposed by Grant et al. (1998), this hypothesis argues that when older and younger adults are able to obtain similar amounts of visual and auditory speech information, both groups exhibit similar abilities to integrate the unimodal percepts into a unified AV percept. That is, the present findings suggest that any age differences in AV perception do not result from age differences in central integration capacities. Investigations are currently underway in our laboratory to obtain more direct measures of integration per se (i.e., independent of unimodal encoding), with the working hypothesis that older and younger adults will exhibit similar integration efficiency.

    Pattern of Correlations for VE and AE

In contrast to the significant correlations between V performance for consonants, words, and sentences, both VE and AE for these three types of stimuli generally exhibited small, and in some cases even negative, relationships. In interpreting these results, it is important to note that a number of methodological differences between the conditions may have contributed to the relatively low correlations. Specifically, consonants were tested using a closed-set format, whereas words and sentences were tested using an open-set format. In addition, a single male talker produced the items for the consonant test, a single female talker produced the items for the word test, and both male and female talkers were used to produce items for the sentence test.

Despite these methodological differences, however, the findings are generally consistent with previous measures of visual enhancement as a function of stimulus type. Grant & Seitz (1998) also reported a near-zero (0.06) correlation between visual enhancement for consonants and sentences in a group of hearing-impaired adults ranging in age from 41 to 76. This pattern of results is readily explained within the model of AV perception proposed by Grant et al. (1998). Specifically, the model proposes that AV performance for consonants is determined primarily by bottom-up capacities that serve to extract appropriate linguistic cues from auditory and visual speech signals. For words and sentences, however, AV performance is determined both by bottom-up extraction of auditory and visual cues and by top-down lexical, syntactic, and semantic processing. Thus, the small correlations between enhancement (both AE and VE) for consonants and enhancement for other types of stimuli may reflect the increased importance of top-down processing abilities for the more linguistically complex words and sentences. Consistent with this proposal, the strongest correlations for both AE and VE were between words and sentences, suggesting some overlap in the mechanisms mediating enhancement for these two types of stimuli.

Finally, it is useful to consider the pattern of correlations for both AE and VE as a function of stimulus type in relation to the analogous correlations for V and A. As noted, both the current study and previous investigations have found moderate to strong correlations between V performance for consonants, words, and sentences. Similarly, Humes et al. (1994) found that correlations between A performance with the same three types of stimuli (consonants, words, and sentences) ranged between 0.35 and 0.85. These findings suggest that the absence of strong correlations for AE and VE across consonants, words, and sentences is a consequence of the mechanisms mediating integration rather than of within-modality differences in the mechanisms underlying identification of consonants, words, and sentences. An important goal for future research, therefore, will be to identify the unique demands imposed by integration and to specify why those demands reduce or eliminate correlations between enhancement for consonants, words, and sentences.

    ACKNOWLEDGMENTS

Portions of these data were presented at the 144th meeting of the Acoustical Society of America, Cancun, Mexico. This research was supported by grant R01 AG 18029-4 from the National Institute on Aging. The authors thank Arnold Heidbreder for technical assistance.


Address for correspondence: Mitchell S. Sommers, Department of Psychology, Washington University, Campus Box 1125, St. Louis, MO 63130.

    Received April 20, 2004; accepted December 12, 2004

REFERENCES

American National Standards Institute. (1989). Specification for audiometers (ANSI S3.6-1989). New York: ANSI.

American Speech and Hearing Association. (1988). Guidelines for determining threshold levels for speech. American Speech and Hearing Association, 85–89.

Bernstein, L. E., Demorest, M. E., & Tucker, P. E. (2000). Speech perception without hearing. Perception & Psychophysics, 62, 233–252.

CHABA, Committee on Hearing and Bioacoustics, Working Group on Speech Understanding and Aging. (1988). Speech understanding and aging. Journal of the Acoustical Society of America, 83, 859–895.

Cienkowski, K. M., & Carney, A. E. (2002). Auditory-visual speech perception and aging. Ear and Hearing, 23, 439–449.

Dancer, J., Krain, M., Thompson, C., Davis, P., & Glenn, J. (1994). A cross-sectional investigation of speechreading in adults: Effects of age, gender, practice and education. Volta Review, 96, 31–40.

Dekle, D. J., Fowler, C. A., & Funnell, M. G. (1992). Audiovisual integration in perception of real words. Perception & Psychophysics, 51, 355–362.

Demorest, M. E., Bernstein, L. E., & DeHaven, G. P. (1996). Generalizability of speechreading performance on nonsense syllables, words, and sentences: Subjects with normal hearing. Journal of Speech and Hearing Research, 39, 697–713.

Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.

Grant, K. W., & Seitz, P. F. (1998). Measures of auditory-visual integration in nonsense syllables and sentences. Journal of the Acoustical Society of America, 104, 2438–2450.

Grant, K. W., Walden, B. E., & Seitz, P. F. (1998). Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration. Journal of the Acoustical Society of America, 103, 2677–2690.

Helfer, K. S. (1997). Auditory and auditory-visual perception of clear and conversational speech. Journal of Speech, Language, and Hearing Research, 40, 432–443.

Helfer, K. S. (1998). Auditory and auditory-visual recognition of clear and conversational speech by older adults. Journal of the American Academy of Audiology, 9, 234–242.

Honneil, S., Dancer, J., & Gentry, B. (1991). Age and speechreading performance in relation to percent correct, eyeblinks, and written responses. Volta Review, May, 207–212.

Humes, L. E., Watson, B. U., Christensen, L. A., Cokely, C. G., Halling, D. C., & Lee, L. (1994). Factors associated with individual differences in clinical measures of speech recognition among the elderly. Journal of Speech and Hearing Research, 37, 465–474.

Hutchinson, K. M. (1989). Influence of sentence context on speech perception in younger and older adults. Journal of Gerontology, 44, 36–44.

Lyxell, B., & Ronnberg, J. (1991). Word discrimination and chronological age related to sentence-based speech-reading skill. British Journal of Audiology, 25, 3–10.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

Middelweerd, M. J., & Plomp, R. (1987). The effect of speechreading on the speech-reception threshold of sentences in noise. Journal of the Acoustical Society of America, 82, 2145–2147.

Miller, G. A., Heise, G. A., & Lichten, W. (1951). The intelligibility of speech as a function of the context of the test material. Journal of Experimental Psychology, 41, 329–335.

Morrell, C. H., Gordon-Salant, S., Pearson, J. D., Brant, L. J., & Fozard, J. L. (1996). Age- and gender-specific reference ranges for hearing level and longitudinal changes in hearing level. Journal of the Acoustical Society of America, 100, 1949–1967.

Nittrouer, S., & Boothroyd, A. (1990). Context effects in phoneme and word recognition by young children and older adults. Journal of the Acoustical Society of America, 87, 2705–2715.

Pelli, D. G., Robson, J. G., & Wilkins, A. J. (1988). The design of a new letter chart for measuring contrast sensitivity. Clinical Vision Sciences, 2, 187–199.

Pichora-Fuller, M. K., Schneider, B. A., & Daneman, M. (1995). How young and old adults listen to and remember speech in noise. Journal of the Acoustical Society of America, 97, 593–608.

Rabinowitz, W. M., Eddington, D. K., Delhorne, L. A., & Cuneo, P. A. (1992). Relations among different measures of speech reception in subjects using a cochlear implant. Journal of the Acoustical Society of America, 92, 1869–1881.

Sams, M., Manninen, P., Surakka, V., Helin, P., & Kättö, R. (1998). McGurk effect in Finnish syllables, isolated words and words in sentences: Effects of word meaning and sentence context. Speech Communication, 26, 75–87.

Shoop, C., & Binnie, C. A. (1979). The effects of age upon the visual perception of speech. Scandinavian Audiology, 8, 3–8.

Sommers, M. S., & Danielson, S. M. (1999). Inhibitory processes and spoken word recognition in younger and older adults: The interaction of lexical competition and semantic context. Psychology and Aging, 14, 458–472.

Sumby, W. H., & Pollack, I. (1954). Visual contributions to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.

Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-reading (pp. 3–51). Hillsdale, NJ: Lawrence Erlbaum Associates.

Tye-Murray, N., & Geers, A. (2001). Children's Audiovisual Enhancement Test (CAVET). St. Louis, MO: Central Institute for the Deaf.

Tyler, R. D., Preece, J., & Tye-Murray, N. (1986). The Iowa laser videodisk tests. Iowa City, IA: University of Iowa Hospitals.

Walden, B. E., Busacco, D. A., & Montgomery, A. A. (1993). Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research, 36, 431–436.
