Georgia State University
ScholarWorks @ Georgia State University

Psychology Dissertations — Department of Psychology
Summer 8-1-2012

Investigating Speech Perception in Evolutionary Perspective: Comparisons of Chimpanzee (Pan troglodytes) and Human Capabilities
Lisa A. Heimbauer, Georgia State University

Follow this and additional works at: https://scholarworks.gsu.edu/psych_diss

This Dissertation is brought to you for free and open access by the Department of Psychology at ScholarWorks @ Georgia State University. It has been accepted for inclusion in Psychology Dissertations by an authorized administrator of ScholarWorks @ Georgia State University. For more information, please contact [email protected].

Recommended Citation
Heimbauer, Lisa A., "Investigating Speech Perception in Evolutionary Perspective: Comparisons of Chimpanzee (Pan troglodytes) and Human Capabilities." Dissertation, Georgia State University, 2012.
https://scholarworks.gsu.edu/psych_diss/106
INVESTIGATING SPEECH PERCEPTION IN EVOLUTIONARY PERSPECTIVE:
COMPARISONS OF CHIMPANZEE (PAN TROGLODYTES) AND HUMAN CAPABILITIES

by

LISA A. HEIMBAUER

Under the Direction of Michael J. Owren

ABSTRACT
There has been much discussion regarding whether the capability to perceive speech is uniquely human. The "Speech is Special" (SiS) view proposes that humans possess a specialized cognitive module for speech perception (Mann & Liberman, 1983). In contrast, the "Auditory Hypothesis" (Kuhl, 1988) suggests spoken-language evolution took advantage of existing auditory-system capabilities. In support of the Auditory Hypothesis, there is evidence that Panzee, a language-trained chimpanzee (Pan troglodytes), perceives speech in synthetic "sine-wave" and "noise-vocoded" forms (Heimbauer, Beran, & Owren, 2011). Human comprehension of these altered forms of speech has been cited as evidence for specialized cognitive capabilities (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005).
In light of Panzee's demonstrated abilities, three experiments extended these investigations of the cognitive processes underlying her speech perception. The first experiment investigated the acoustic cues that Panzee and humans use when identifying sine-wave and noise-vocoded speech. The second experiment examined Panzee's ability to perceive "time-reversed" speech, in which individual segments of the waveform are reversed in time. Humans are able to perceive such speech if these segments do not much exceed average phoneme length. Finally, the third experiment tested Panzee's ability to generalize across both familiar and novel talkers, a perceptually challenging task that humans seem to perform effortlessly.
Panzee's performance was similar to that of humans in all experiments. In Experiment 1, results demonstrated that Panzee likely attends to the same "spectro-temporal" cues in sine-wave and noise-vocoded speech that humans are sensitive to. In Experiment 2, Panzee showed a similar intelligibility pattern as a function of reversal-window length as found in human listeners. In Experiment 3, Panzee readily recognized words not only from a variety of familiar adult males and females, but also from unfamiliar adults and children of both sexes. Overall, results suggest that a combination of general auditory processing and sufficient exposure to meaningful spoken language can account for speech-perception evidence previously proposed to require specialized, uniquely human mechanisms. These findings in turn suggest that speech-perception capabilities were already present in latent form in the common evolutionary ancestors of modern chimpanzees and humans.
INDEX WORDS: Speech perception, Evolution, Language-trained chimpanzee, Synthetic speech
INVESTIGATING SPEECH PERCEPTION IN EVOLUTIONARY PERSPECTIVE:
COMPARISONS OF CHIMPANZEE (PAN TROGLODYTES) AND HUMAN CAPABILITIES

by

LISA A. HEIMBAUER

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
in the College of Arts and Sciences
Georgia State University
2012

Copyright by Lisa A. Heimbauer
2012
INVESTIGATING SPEECH PERCEPTION IN EVOLUTIONARY PERSPECTIVE:
COMPARISONS OF CHIMPANZEE (PAN TROGLODYTES) AND HUMAN CAPABILITIES

by

LISA A. HEIMBAUER

Committee Chair: Michael J. Owren

Committee: Rose A. Sevcik
Gwen Frishkoff
Michael J. Beran

Electronic Version Approved:

Office of Graduate Studies
College of Arts and Sciences
Georgia State University
August 2012
DEDICATION

I dedicate this dissertation to my family, especially my loving husband Gary, who has supported me unconditionally and unselfishly since I decided to begin my education. I am also grateful to my parents, Albert and Marie, for instilling in me a love of education, and to my brother Bill and daughter-in-law Tara, who were always there when I needed someone to listen. To my children, Randy, Stephanie, Melanie, and Gary, and my grandchildren, Holly, Samantha, Thomas, John, and Leah, I especially hope my journey inspires you to always follow your dreams, wherever they lead you.
ACKNOWLEDGMENTS

I am truly indebted and grateful to my advisor, Michael Owren, for the support and guidance he provided throughout the writing of my dissertation. Additionally, his rigorous scholarship and theoretical beliefs on the evolution of communication have been inspiring and have helped to shape my program of research during my graduate school education. These pages also reflect the relationships with other generous and inspiring individuals who have influenced me, served on my committees, and provided support and friendship during my time at Georgia State University and the Language Research Center, especially Mike Beran.

In addition, I am obliged to others who have supported me by contributing to my education, especially dissertation committee members Rose Sevcik and Gwen Frishkoff, and faculty members Sarah Brosnan and Seyda Özçaliskan. I would also like to thank the members of the Research on the Challenges of Acquiring Language and Literacy initiative and the Duane M. Rumbaugh Fellowship committee for the generous fellowship support. Finally, I am grateful to the graduate students who have provided intellectual and moral support, and more importantly, friendship.
TABLE OF CONTENTS

ACKNOWLEDGMENTS .......................................................... v
LIST OF TABLES ............................................................ x
LIST OF FIGURES .......................................................... xi
1. INTRODUCTION ........................................................... 1
   1.1 Human speech ........................................................ 3
      1.1.1 Natural speech-perception phenomena .......................... 6
      1.1.2 Synthetic speech perception .................................. 7
   1.2 The SiS argument ................................................... 11
   1.3 The Auditory Hypothesis ........................................... 13
      1.3.1 Mammalian hearing ............................................ 14
      1.3.2 Speech perception in non-primates ........................... 17
      1.3.3 Speech perception in nonhuman primates ...................... 19
      1.3.4 Evidence of top-down processing when perceiving speech ..... 22
   1.4 Speech perception in a language-trained chimpanzee .............. 24
      1.4.1 Previous experiments with Panzee ............................ 24
2. CURRENT EXPERIMENTS ................................................... 28
   2.1 General Methods .................................................... 30
      2.1.1 Subject ....................................................... 30
      2.1.2 Participants .................................................. 30
      2.1.3 Apparatus ..................................................... 31
      2.1.4 Stimuli ....................................................... 31
      2.1.5 Chimpanzee procedure ......................................... 32
      2.1.6 Human procedures ............................................. 33
      2.1.7 Data analysis ................................................. 34
3. EXPERIMENT 1 ........................................................... 34
   3.1 Experiment 1a ...................................................... 35
      3.1.1 Subject ....................................................... 36
      3.1.2 Participants .................................................. 36
      3.1.3 Stimuli ....................................................... 36
      3.1.4 Chimpanzee procedure ......................................... 38
      3.1.5 Human procedure .............................................. 39
      3.1.6 Data analysis ................................................. 40
      3.1.7 Results ....................................................... 41
      3.1.8 Discussion .................................................... 43
   3.2 Experiment 1b ...................................................... 45
      3.2.1 Subject ....................................................... 46
      3.2.2 Participants .................................................. 46
      3.2.3 Stimuli ....................................................... 46
      3.2.4 Chimpanzee procedure ......................................... 48
      3.2.5 Human procedure .............................................. 49
      3.2.6 Data analysis ................................................. 49
      3.2.7 Results ....................................................... 49
      3.2.8 Discussion .................................................... 51
   3.3 General Discussion ................................................. 52
4. EXPERIMENT 2 ........................................................... 53
   4.1 Subject ............................................................. 54
   4.2 Participants ........................................................ 54
   4.3 Stimuli ............................................................. 55
   4.4 Chimpanzee procedure .............................................. 55
   4.5 Human procedure .................................................... 56
   4.6 Data analysis ....................................................... 56
   4.7 Results ............................................................. 58
   4.8 Discussion .......................................................... 60
5. EXPERIMENT 3 ........................................................... 62
   5.1 Subject ............................................................. 65
   5.2 Participants ........................................................ 65
   5.3 Audio-recording and stimuli ....................................... 65
   5.4 Chimpanzee procedure .............................................. 67
   5.5 Data analysis ....................................................... 67
   5.6 Results ............................................................. 68
   5.7 Discussion .......................................................... 69
6. GENERAL DISCUSSION .................................................... 70
   6.1 Current results ..................................................... 71
   6.2 Implications of experimental results ............................. 72
      6.2.1 The SiS view versus the Auditory Hypothesis ................ 72
      6.2.2 Top-down processing and speech-perception experience ...... 74
   6.3 Cognitive processing and language ................................ 75
   6.4 Future directions ................................................... 78
REFERENCES ................................................................ 80
LIST OF TABLES

Table 1. Orientation and test word groups .............................. 37
Table 2. Lower-to-upper noise-band cutoff frequencies ................. 47
LIST OF FIGURES

Figure 1. Spectrographic word examples .................................. 5
Figure 2. Panzee's synthetic-speech word-recognition performance ..... 27
Figure 3. Samples of photographs used in Panzee's spoken-word identification ... 32
Figure 4. Panzee working on a computer task ............................ 33
Figure 5. Intelligibility of sine-wave speech to humans ............... 36
Figure 6. Experiment 1a chimpanzee and human word recognition ........ 42
Figure 7. Intelligibility of noise-vocoded speech to humans .......... 46
Figure 8. Experiment 1b chimpanzee and human word recognition ........ 50
Figure 9. Intelligibility of time-reversed speech to humans .......... 54
Figure 10. Experiment 2 chimpanzee and human word recognition ........ 59
Figure 11. Samples of lexigrams used in Panzee's spoken-word identification task ... 67
Figure 12. Experiment 3 chimpanzee word-recognition performance across talkers ... 68
1. INTRODUCTION

Speech perception is the ability to hear and recognize the acoustics of spoken language. It involves many levels of processing—from the auditory input to the comprehension of lexical meaning. Lenneberg (1967) claimed that both speech production and speech perception are uniquely human adaptations, a view later termed "Speech is Special" (SiS) by Liberman (1982). In contrast, studies with nonhumans have revealed that some animals are able to discriminate and categorize phonemes—the smallest units of speech sound—much as humans do (Kluender, Diehl, & Killeen, 1987; Kuhl & Miller, 1975; Kuhl & Padden, 1982, 1983). Therefore, it may be that auditory processing in humans and nonhumans is fundamentally similar, as proposed by Kuhl's (1988) "Auditory Hypothesis." In this view, a common evolutionary ancestor of humans and other mammals possessed latent speech-processing capabilities that predated speech itself. Evolution of human speech-production capabilities would then have taken advantage of existing auditory processing.
Numerous experiments have investigated both human and nonhuman speech perception to evaluate these opposing views. For example, evidence proposed to support the SiS approach includes the finding that humans are able to recognize meaningful speech in a number of fundamentally altered, synthetic forms (Remez, 2005; Trout, 2001). These abilities have been difficult to show in nonhumans, as animals typically do not understand word meaning and are instead tested with the very brief, meaningless phoneme components of speech. More recently, however, experiments with a language-trained chimpanzee (Pan troglodytes) named Panzee have demonstrated the ability to recognize synthetic speech in some of the highly reduced forms humans have been tested with (Heimbauer, Beran, & Owren, 2011). These outcomes support the Auditory Hypothesis, suggesting that comprehension of meaningful speech does not require perceptual specializations. Instead, listeners can apply general auditory processes shared with pre-hominin ancestors.
After discussing general aspects of human speech perception and production, key findings from experiments investigating SiS and the Auditory Hypothesis will be reviewed. Three new experiments conducted with Panzee, which further investigated the speech-perception capabilities of this chimpanzee, will then be presented. The first of these experiments extended an earlier finding which demonstrated that Panzee was able to recognize spoken words presented in "sine-wave" and "noise-vocoded" forms (Heimbauer et al., 2011). Both are synthetic versions of speech that reduce normal speech acoustics to small sets of sine waves and noise bands, respectively. It was hypothesized, based on Panzee's previous results, that she would rely on the same acoustic cues as humans when hearing these speech forms (Remez, Rubin, Pisoni, & Carrell, 1981; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). Alternatively, Panzee could rely on a more holistic approach, for instance matching general characteristics of synthetic words to corresponding natural versions. To test this hypothesis, both Panzee and human participants were tested with varying versions of sine-wave and noise-vocoded speech, using stimuli similar to those used in previous human testing (Remez et al., 1981; Shannon et al., 1995).
In the second experiment, Panzee and humans heard words presented in "time-reversed" form. This manipulation involves separating the speech signal into fixed time-length segments and then reversing these "windows" throughout the signal. Humans find speech in this form highly intelligible for reversal windows up to approximately 100 milliseconds in length (Saberi & Perrott, 1999), with intelligibility decreasing as window length increases. Here, the relationship between phoneme length and window length is proposed to be the critical feature. Specifically, if the reversal window is smaller than the average duration of a phoneme, the manipulated speech remains comprehensible. However, if window length exceeds phoneme duration, then the speech becomes critically distorted. Based on Panzee's robust speech-perception abilities, it was again hypothesized that Panzee would perform similarly to human listeners, thereby providing evidence of attending to the phoneme content of speech. Alternatively, if Panzee's speech recognition relies on a different strategy than humans use—for instance, a more holistic approach—she would not show the same window-length-dependent intelligibility function.
The third experiment investigated Panzee's ability to understand speech from a variety of talkers. This capability is important because the acoustic characteristics of speech can vary significantly among both individual talkers and classes of talkers (Pisoni, 1995; Remez, 2005). As Panzee routinely interacts with, and responds to, different individuals, it was hypothesized that she would readily perceive familiar words regardless of the talker. Alternatively, she may simply have become accustomed to the speech of particular, known individuals without being able to normalize to novel talkers. To investigate this hypothesis, Panzee was tested for recognition abilities with the speech of a variety of male and female familiar adults, unfamiliar adults, and children.
1.1 Human speech
When perceiving natural speech, listeners are faced with many perceptual challenges. The most important is to combine and categorize the diverse acoustic elements making up the speech stream (Remez, 2005), thereby mapping signal acoustics onto their linguistic correlates (Pisoni, 1995). Words are produced differently each time they are uttered, and minimal differences between words often matter. Researchers, therefore, have focused on how listeners can recognize specific speech sounds in the face of potentially large acoustic differences and then use this information to understand the phonetic content of spoken language (Appelbaum, 1996).
The acoustic characteristics of speech are illustrated in spectrographic form in Figure 1a. A spectrogram is a visual depiction of the time-varying properties of sound, with time displayed on the horizontal axis, frequency on the vertical axis, and amplitude shown as the darkness of shading at any particular point (Olive, Greenwood, & Coleman, 1993). One important characteristic is the fundamental frequency (F0), which corresponds to the basic rate of vibration of the vocal folds in the larynx. F0 is typically the lowest prominent frequency visible in a speech spectrogram. Regular vibration also produces energy at integer multiples of F0, referred to as "harmonics" (H2 and H3 are labeled in the figure). Energy from the larynx subsequently passes through the pharynx, oral cavity, and nasal cavity, whose resonances act to filter this energy. These resonances, termed "formants," strengthen the energy in some frequency regions while weakening it in others. These effects are visible in a spectrogram as larger, dark bands of energy. The lowest three bands, often also called formants, are also labeled in the figure (F1, F2, and F3).
Figure 1. Spectrographic word examples. The spectrograms were created using a sampling rate of 44100 Hz and a 0.03-s Gaussian analysis window. a) The natural word "tickle," showing its fundamental frequency (F0), next higher harmonics (H2 and H3), and lowest three formants (F1, F2, and F3). b) The word "tickle" in sine-wave form, with individual sine waves (SW) marked. c) The word "tickle" in noise-vocoded form, made with five noise bands (NB). d) The word "tickle" in time-reversed form, with a 50-ms time-reversal (TR) window noted.
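The spectrogram analysis described above can be sketched in code. The following is a minimal illustration, not the actual analysis software used for Figure 1: a Gaussian-windowed short-time Fourier transform at a 44100-Hz sampling rate with a 0.03-s window, applied here to an invented test signal (a 100-Hz "fundamental" plus one harmonic) rather than to recorded speech. All function and variable names are illustrative.

```python
import numpy as np

def gaussian_spectrogram(signal, fs=44100, win_dur=0.03, hop_dur=0.005):
    """Return (times, freqs, magnitudes) for a Gaussian-windowed STFT."""
    n_win = int(win_dur * fs)          # samples per analysis window
    n_hop = int(hop_dur * fs)          # samples between window starts
    # Gaussian taper over the analysis window
    window = np.exp(-0.5 * ((np.arange(n_win) - n_win / 2) / (n_win / 6)) ** 2)
    starts = range(0, len(signal) - n_win, n_hop)
    frames = np.array([signal[s:s + n_win] * window for s in starts])
    mags = np.abs(np.fft.rfft(frames, axis=1))   # magnitude per frame
    times = np.array([s / fs for s in starts])
    freqs = np.fft.rfftfreq(n_win, d=1 / fs)
    return times, freqs, mags

# Toy source: 100-Hz fundamental (F0) plus a weaker harmonic at 200 Hz,
# which appear as two horizontal bands, as F0 and H2 do in Figure 1a.
fs = 44100
t = np.arange(0, 0.5, 1 / fs)
source = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
times, freqs, mags = gaussian_spectrogram(source, fs)
```

Plotting `mags` (darkness) against `times` and `freqs` reproduces the display convention described in the text: time horizontal, frequency vertical, amplitude as shading.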
The perceptual challenge of recognizing acoustically variable speech is formally known as the "lack of invariance" problem. Speech acoustics can be highly variable, with individuals showing systematic differences in speech rate, F0s, and formant values. The latter are even more different among classes of talkers, for instance males versus females, and adults versus children. In addition to varying due to physical and physiological properties, speech acoustics can also differ due to factors such as talker emotional state or regional dialect (Pisoni, 1995). The result is that there is no simple mapping between acoustic structure and phonetic units, meaning that listeners have to categorize a talker's phonemes in the absence of invariant cuing. Humans nonetheless routinely recognize speech from both familiar and unfamiliar talkers, an ability referred to as "talker normalization."
1.1.1 Natural speech-perception phenomena. Listeners accomplish a broad array of auditory perceptual tasks both when learning and after having mastered language, many of which seem effortless. The majority of language skills are learned implicitly, without instruction, relatively passively, and with minimal conscious attention. Infants as young as eight months old can segment and organize phonemes and words from a continuous acoustic speech stream (Marcus, Vijayan, Rao, & Vishton, 1999; Saffran, Aslin, & Newport, 1996). Categorization of individual phonemes has been demonstrated at even younger ages (Werker & Desjardins, 1995), which is likely important in learning how different language elements combine. Experiments by Werker and Desjardins (1995) revealed that at 6 to 8 months of age, infants discriminated phonemes across a variety of languages, but by 10 to 12 months of age had become tuned to the phonemes of the language spoken by their caregivers.
In addition to implicit perception of natural speech, there are automatic perceptual phenomena that occur when humans hear speech in altered or distorted forms. One notable example is the ability to identify words containing a phoneme or a syllable that has been replaced by white noise. This capability was termed "phonemic restoration" by Warren (1970) and is so deeply rooted cognitively that participants can not only identify the words, but also report hearing them in their entirety without perceptible gaps (Warren & Obusek, 1971). One interpretation of this phenomenon is that the missing segment is re-created in the brain, even in the absence of that sound (Kashino, 2006). This ability to "hear" the missing segment presumably enables listeners to fill in the gaps that routinely occur when speech is heard in noisy, everyday situations. Another phenomenon of interest has been termed "duplex perception" (Rand, 1974). Duplex perception occurs when a short sine wave, sounding like a chirp, is presented to one ear and is perceptually integrated into an otherwise incomplete phoneme being played to the other ear (Whalen & Liberman, 1987). Davis and Johnsrude (2007) suggested that duplex perception illustrates that human listeners actively attempt to organize sound into perceptible speech whenever possible. In fact, this tendency to organize likely extends to any form of distorted speech.
1.1.2 Synthetic speech perception. Studies using fundamentally altered speech forms have been invaluable for understanding how the human cognitive system organizes acoustic elements of speech for meaningful language comprehension. Here, three forms of altered speech are of particular interest. Two of these, sine-wave and noise-vocoded speech, lack many of the acoustic features traditionally considered crucial to speech perception, including F0 and formants (see Figures 1b and 1c). The third form, time-reversed speech, alters moment-to-moment temporal patterning in the signal (see Figure 1d). In each case, human listeners rely on their extensive experience with spoken language to make sense of the input. Being able to do so is considered a form of "top-down processing," whereby a listener takes advantage of previously learned acoustic and phonetic information (Davis & Johnsrude, 2007; Mann & Liberman, 1983; Newman, 2006; Whalen & Liberman, 1987). Top-down processing is likely critical in normative speech perception as well as in difficult listening situations, with processing of synthetic words and sentences becoming useful in understanding how acoustic input can contribute to recognition of speech at various levels of organization (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005; Hillenbrand, Clark, & Baer, 2011; Remez et al., 2009; Saberi & Perrott, 1999).
Since 1981, sine-wave speech has been investigated for the purpose of understanding spoken-language processing (Lewis & Carrell, 2007; Remez et al., 2009; Remez et al., 1981; Rosner et al., 2003). In this synthesis form, words or sentences are produced from three sine waves that track the first three formants of the natural speech signal (see Figure 1b). Sine-wave speech is extremely unnatural-sounding and is considered to preserve key phonetic properties only in an abstract form (Remez et al., 2009). In their experiments, Remez and colleagues (1981) presented a sine-wave sentence to human listeners. When the participants were not told that these sounds could be understood as speech, they described them as "science-fiction sounds" or "whistles." When they were told that they would be hearing sentences produced by a computer, however, the listeners were typically able to identify a substantial number of the syllables and words in the sentence. The researchers concluded that perception of sine-wave speech was evidence for a "speech mode of perception," and that listeners expecting to hear a language-like stimulus tuned into this mode. Even in the absence of traditional acoustic cues, listeners were able to perceive phonetic content in the sine-wave signal.
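The synthesis scheme described above — three tone generators following the first three formant tracks — can be sketched as follows. This is an illustrative outline, not the procedure used to make the experimental stimuli; the formant trajectories below are invented placeholders rather than measured values, and all names are hypothetical.

```python
import numpy as np

def sine_wave_speech(formant_tracks, amp_tracks, fs=44100):
    """Sum one frequency-modulated sinusoid per formant track.

    formant_tracks, amp_tracks: arrays of shape (3, n_samples) giving each
    replica's instantaneous frequency (Hz) and amplitude.
    """
    out = np.zeros(formant_tracks.shape[1])
    for freq, amp in zip(formant_tracks, amp_tracks):
        # integrate instantaneous frequency to get phase, so tones sweep smoothly
        phase = 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.sin(phase)
    return out

fs = 44100
n = int(0.3 * fs)                     # 300 ms of signal
u = np.linspace(0, 1, n)              # normalized time, 0..1
# Crude vowel-glide-like trajectories for F1-F3 (placeholders only)
tracks = np.vstack([700 - 400 * u, 1200 + 1000 * u, 2600 + 400 * u])
amps = np.vstack([np.ones(n), 0.6 * np.ones(n), 0.3 * np.ones(n)])
swa = sine_wave_speech(tracks, amps, fs)
```

The result has no F0, harmonics, or noise excitation — only three gliding whistles — which is why naive listeners do not spontaneously hear it as speech.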
Another altered form of interest is "noise-vocoded" speech, which is synthesized from noise bands (see Figure 1c). To create noise-vocoded speech, the natural signal is divided into a number of frequency bands using individual band-pass filters. The intensity pattern, or amplitude envelope, of each band is extracted over the length of the signal. Resulting envelopes are then used to modulate corresponding, frequency-limited bands of white noise. The result is a series of amplitude-modulated noise waveforms that, when summed, potentially become recognizable as harsh but comprehensible speech (Davis et al., 2005; Shannon et al., 1995).
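The vocoding steps just described — band-pass filtering, envelope extraction, and noise modulation — can be outlined in a minimal numpy sketch. This is not the actual stimulus-generation code: FFT-domain zeroing stands in for the band-pass filters, rectification plus a moving average stands in for envelope extraction, and the band edges are illustrative rather than the cutoffs listed in Table 2.

```python
import numpy as np

def bandpass(x, lo, hi, fs):
    """Keep only energy between lo and hi Hz (simple FFT-domain filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, smooth_hz=30):
    """Rectify, then smooth with a moving average, to get the amplitude envelope."""
    n_avg = max(1, int(fs / smooth_hz))
    return np.convolve(np.abs(x), np.ones(n_avg) / n_avg, mode="same")

def noise_vocode(signal, fs, edges=(50, 500, 1500, 2500, 4000)):
    """Replace each band's fine structure with envelope-modulated noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(signal, lo, hi, fs), fs)
        noise = bandpass(rng.standard_normal(len(signal)), lo, hi, fs)
        out += env * noise            # amplitude-modulated noise band
    return out

# Toy "speech-like" input: a 200-Hz tone with slow amplitude modulation
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
speechlike = np.sin(2 * np.pi * 200 * t) * (1 + np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(speechlike, fs)
```

Note how the design mirrors the text: spectral detail within each band is discarded, while the band-by-band amplitude-over-time pattern is preserved.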
Perception of noise-vocoded speech is of particular interest because it is a simulation of the input produced by a cochlear implant—a surgically implanted, electronic device for the hearing-impaired (Dorman, Loizou, Spahr, & Maloff, 2002). However, noise-vocoded speech is also useful in investigating speech perception in normally hearing individuals, as it preserves the amplitude and temporal information of the original utterance while omitting most spectral detail (Shannon et al., 1995). Even in the absence of F0 and formants, noise-vocoded speech can carry a surprising amount of information regarding phonemes (Dorman et al., 2002; Sawusch, 2005). One critical factor is the number of noise bands used in the synthesis process. Listeners cannot reliably recognize noise-vocoded speech created with only two noise bands. However, recognition becomes much more consistent if three or four bands are present (Shannon et al., 1995). When ten or more noise bands are used, noise-vocoded speech is readily intelligible even to naive listeners (Davis et al., 2005). Individuals hearing speech in this synthesis form typically show improvement with practice. For example, Davis et al. (2005) reported that identification of noise-vocoded words in sentences increased from less than 10% to 70% correct within just a few minutes.
The most recently developed synthesis form of interest is time-reversed speech (Barkat, Meunier, & Magrin-Chagnolleau, 2002; Purnell & Raimy, 2008; Saberi & Perrott, 1999). As shown in Figure 1d, such sounds contain segments of equal length that have been reversed in time, typically on a millisecond (ms) scale. This manipulation preserves the amplitude of each frequency component at every point in time, but reverses the pattern of energy changes within each window. The resulting disruption of the amplitude envelope could make the signal unintelligible, but listeners are reliably able to recognize speech content at window lengths up to 100 ms (Saberi & Perrott, 1999). Window lengths exceeding 100 ms produce partial intelligibility, with the 50% "threshold" point for intelligibility occurring at 130 ms or more. The interpretation is that individual phonetic segments ("phones") in speech range from approximately 50 to 100 ms (Crystal & House, 1988). In other words, reversal windows up to 100 ms long leave many individual phonemes undisturbed. However, longer windows break up more and more individual phonemes, thereby making the speech unintelligible. Barkat et al. (2002) confirmed this view of phoneme-length perception in finding that French-speaking participants hearing French sentences showed a different 50% intelligibility threshold for time-reversed speech. Although these listeners also demonstrated decreased speech recognition as window lengths increased, intelligibility fell more slowly. The 50% threshold was reached at a window approximately 20 ms longer than for English-speaking listeners hearing English sentences, likely reflecting a longer mean phoneme duration in French compared to English.
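The time-reversal manipulation itself is simple enough to state directly in code: cut the waveform into fixed-length windows and reverse each one in place, with window length as the experimental variable. The sketch below is illustrative (a toy decaying tone stands in for a recorded word), not the stimulus-preparation code used in the experiments.

```python
import numpy as np

def time_reverse(signal, fs, window_ms=50):
    """Reverse successive fixed-length segments of a waveform in place."""
    n_win = int(fs * window_ms / 1000)
    out = signal.copy()
    for start in range(0, len(signal), n_win):
        # slicing clamps at the end, so a final partial window is handled too
        out[start:start + n_win] = out[start:start + n_win][::-1]
    return out

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
x = np.sin(2 * np.pi * 300 * t) * np.exp(-5 * t)   # toy decaying tone
y = time_reverse(x, fs, window_ms=50)
```

Every sample value survives the manipulation; only the within-window temporal order changes, which is why the long-term spectrum is largely preserved while local dynamics are disrupted.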
Overall, results of synthetic-speech perception experiments provide evidence that sine-wave, noise-vocoded, and time-reversed versions can be intelligible. It is not clear, however, what exactly these three synthesis forms have in common to make each recognizable to listeners. For synthetic speech, Remez et al. (1994) characterize the critical cues for sine-wave and noise-vocoded speech as "spectro-temporal" patterning, while also noting that the two versions do not show obvious commonalities. This interpretation is arguably supported by findings from time-reversed speech experiments, which show that preserving spectro-temporal information within phonemes is a critical factor (see also Drullman, 2006; Remez, 2005).
1.2 The SiS argument

In 1967, Liberman and colleagues proposed that in order to perceive speech, humans must draw on their implicit knowledge of how phonemes are articulated. They hypothesized that spoken words are perceived by identifying associated vocal-tract gestures, rather than by identifying the sound patterns of speech itself. This hypothesis was termed the "Motor Theory of Speech Perception," and implies that only producers of speech can also perceive speech. Therefore, both speech production and speech perception would be "special" to humans. Later, Fodor (1983) revived the historical concept of mental modularity from Gall's phrenology, in which individual mental faculties are associated with domain-specific areas of the brain. Fodor suggested that these specialized cognitive modules operate individually on domain-specific input. This theory "upped the ante" for uniquely human speech perception, with Liberman and colleagues (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987) then claiming that humans have a specialized cognitive module responsible for speech perception.
In support of this SiS argument, a variety of human speech-perception phenomena have been proposed as evidence for a speech mode, a Fodorian phonetic module, or both. For example, both phonemic restoration and duplex perception have been cited as evidence for a uniquely human speech module (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987). Findings pertaining to neural mechanisms of speech perception have also been interpreted in this way. For instance, most humans show a "right-ear advantage" (REA) when processing the phonetic elements of speech, meaning that input presented to the right ear is recognized more quickly and accurately than that presented to the left ear. The REA is attributed to phonetically based language processing being primarily lateralized to the left hemisphere in approximately 95% of the population (Hugdahl, 2004; Kimura, 1961; Studdert-Kennedy & Shankweiler, 1970).
Historically, dichotic listening experiments have been used to investigate language-related lateralization effects (for a review, see Hugdahl & Davidson, 2004). In these experiments, different phonetic stimuli are presented simultaneously to the two ears, and participants are instructed to report what is heard in one ear or the other. Studdert-Kennedy and Shankweiler (1970) used dichotic listening to show that both hemispheres are involved in processing purely auditory parameters of a speech signal. However, they found that left lateralization occurs when specifically linguistic features are attended to. They concluded that these findings provide evidence of a specialized linguistic device located in the left hemisphere—thereby supporting the SiS view. Cutting (1974) investigated the REA associated with syllable and phoneme perception, using stylized, synthetic vowel and consonant-vowel syllables and sine-wave facsimiles that were not perceived as speech sounds. He concluded that there might be two processing mechanisms in the left hemisphere—one dedicated to speech per se, and the other responding to rapid frequency changes in any auditory input, which processes formant movement in speech as well as frequency modulation in other sounds.
It should be noted that some animals, including nonhuman primates, have also demonstrated asymmetries for perception of vocalizations (for reviews, see Corballis, 2009; Taglialatela, 2007). Japanese macaques (Macaca fuscata) have shown an REA and left-hemisphere advantage for species-specific vocalizations (Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978; Petersen et al., 1984), as have mice (Ehret, 1987). Magnetic resonance imaging studies of gorilla (Gorilla gorilla gorilla), orangutan (Pongo pygmaeus), chimpanzee (Pan troglodytes), and bonobo (Pan paniscus) brains have revealed a larger planum temporale in the left hemisphere, paralleling an asymmetry in the corresponding area of the human brain (Gannon, Holloway, Broadfield, & Braun, 1998; Hopkins, Marino, Rilling, & MacGregor, 1998). In humans, the planum temporale is known to be involved in both production and perception of spoken language (discussed in more detail below).
1.3 The Auditory Hypothesis

Despite a variety of evidence in support of the SiS hypothesis, the claim that only speech producers can also perceive speech is problematic, because only humans have the vocal-tract morphology capable of producing speech (de Boer, 2006; Fitch, 2000). The descended larynx, in conjunction with the hyoid bone and tongue root, enables production of diverse formant patterns by humans (Lieberman, 1968). Therefore, the uniqueness of human anatomy makes it difficult to refute SiS claims. In addition, humans who are mute can still routinely perceive speech (Studdert-Kennedy, 1980), and some animals have demonstrated at least the ability to discriminate and categorize speech sounds much as humans do (Kluender, Diehl, & Killeen, 1987; Kuhl & Miller, 1975; Kuhl & Padden, 1982, 1983). Trout (2001) has proposed that refuting the SiS view requires demonstrating a common cognitive or biological substrate for speech perception in humans and nonhuman animals. In his view, such a demonstration is needed to support the argument that although the human ability to produce speech is unique among mammals, the same may not be true for speech perception. However, to date, neither a common mechanism nor a common functional organization for speech perception has been shown across species. It may be, instead, that human speech-production capabilities are more specifically adapted to language than the auditory system is.
Not surprisingly, numerous auditory-perception studies with animals have been conducted to test claims of human specializations for speech. At least some evidence suggests that perception of speech is rooted in the same general mechanisms of audition and perception that evolved to handle other classes of environmental sounds (Diehl, Lotto, & Holt, 2004). The next section, therefore, reviews some basics of mammalian hearing, followed by speech-perception evidence from research with animals, including both non-primates and nonhuman primates.
1.3.1 Mammalian hearing. The auditory-perception capabilities of humans and other mammals may very well have evolved similarly, as the general morphology and functions of the auditory system have been similar in all mammals since the middle ear was transformed from its reptilian form (Allin, 1975; Fleagle, 1999; Stebbins, 1983). In mammals, the basic function of the ear is amplification and transmission of sound, in addition to sensory transduction. The outer ear consists of the pinna, used for gathering and localizing sound. Sound waves are then funneled along the ear canal to the tympanic membrane (eardrum). The external ear filters sound somewhat, passing energy in the 2 to 5 kHz range best—a frequency range that is important for speech perception (Breedlove, Watson, & Rosenzweig, 2007). Sound-pressure waves cause the tympanic membrane to vibrate, which in turn moves the three small bones (ossicles) of the middle ear. The last bone in the chain—the stapes—transmits this energy to the oval window, a small membrane at the base of the cochlea of the inner ear. Here, energy from the original cyclical air-pressure changes is transformed into waves in fluid (Kalat, 2009; Pickles, 1988). Finally, transduction occurs when these cochlear fluid waves cause neural firing in the eighth cranial nerve, which is transmitted to the auditory cortex (for further details, see Hackney, 2006).
The auditory system is similar in all mammals, functioning as a sound localizer and showing sensitivity across a range of sound frequencies. However, that range varies with head size (and associated inter-aural distance), as sound localization requires species with smaller heads to be sensitive to higher-frequency energy (Gans, 1992; Heffner, 2004; Masterton, Heffner, & Ravizza, 1969). Smaller mammals thus typically hear well above the nominal 20-kHz limit of human hearing. The earliest mammals were likely similar (Rosowski, 1992), as are most nonhuman primates (Beecher, 1974; Jackson, Heffner, & Heffner, 1999; Masterton et al., 1969; Owren, Hopp, Sinnott, & Petersen, 1988; Stebbins, 1973; Stebbins & Moody, 1994). However, frequency sensitivity has also been subject to change over the course of primate evolution, for instance due to habitat differences as well as body-size effects (Owren et al., 1988).
One such change was that larger-bodied monkeys and apes acquired better low-frequency hearing. This trend has resulted in both chimpanzees and humans showing well-developed hearing both at low (below approximately 1 kHz) and mid-range frequencies (1 to 8 kHz). Stebbins (1973) has proposed that these hearing changes may have been related to evolutionary pressures for more intricate, intra-specific vocal communication systems. This view is consistent with human speech perception potentially being grounded in the more general auditory processing capabilities of larger mammals rather than being species-specific. Stebbins (1983) has also shown that hearing in chimpanzees is more similar to that of humans than to that of Old and New World monkeys—who can hear frequencies as high as 40 to 45 kHz. His review included several auditory parameters—high- and low-frequency sensitivity, lowest frequency threshold, best frequency, and area of the audible field—for five mammalian species, including a non-primate (tree shrew), a prosimian (bushbaby), and three anthropoid primates (macaque, chimpanzee, and human). Because low- and mid-range frequency hearing is common to chimpanzees and humans, overall human auditory sensitivity may facilitate communication without being a specialization for speech.
In contrast to the chimpanzee hearing capabilities reviewed by Stebbins (1983), Kojima (1990) reported that two chimpanzees were notably less sensitive than humans to frequencies between 2 and 4 kHz. Although Kojima found external-ear resonances to be approximately the same for both species, the chimpanzee sensitivity pattern showed a pronounced decrease in mid-range sensitivity—more similar to patterns found in Old and New World monkeys (Beecher, 1974; Stebbins, 1973) than in humans. However, 2 to 4 kHz is also the range of highest energy in chimpanzee screams (see Figure 1b in Riede, Arcadi, & Owren, 2007)—which are frequently very loud. It may, instead, be that the decreased mid-range sensitivity of Kojima's two captive subjects reflected hearing loss due to repeated exposure to high-amplitude screaming by conspecifics in the animals' confined housing spaces. If so, chimpanzees without any hearing impairment should be able to process critical frequencies of speech as well as humans.
1.3.2 Speech perception in non-primates. Research attempting to find human-nonhuman animal commonalities in speech perception has mainly focused on perception of rudimentary elements of spoken language. In support of the Auditory Hypothesis, experiments with chinchillas by Kuhl and Miller (1975) and Loebach and Wickesberg (2006) have revealed that these animals perceive and discriminate at least some individual phonemic features of speech. In their seminal study, Kuhl and Miller (1975) demonstrated that chinchillas discriminated between consonant-vowel syllables differing in voice-onset time. This feature is the length of time between initial consonant articulation movements and onset of vocal-fold vibration. Specifically, the chinchillas were trained to respond differently to a variety of initial-/t/ and -/d/ consonant-vowel syllables produced by eight talkers. The animals also correctly classified novel instances of initial /t/ and /d/ syllables, including syllables produced by four new talkers, syllables produced in new vowel contexts, and computer-synthesized /ta/ and /da/ syllables.
In a neurological study, Loebach and Wickesberg (2006) demonstrated that there might be a common physiological substrate in the peripheral auditory system of chinchillas and humans involved in recognition of speech cues. The animals showed auditory-nerve responses when hearing syllables in natural and noise-vocoded form that resembled those shown by humans hearing the same sounds. Loebach and Wickesberg presented four syllables produced by male talkers in both natural form and resynthesized form using one, two, three, or four noise bands. Despite the different spectral profiles of natural and noise-vocoded speech, chinchillas responded similarly to humans to the cues to consonant identity. In humans, common spectro-temporal features of natural and noise-vocoded speech provide these cues for speech recognition, and perception is enhanced as the number of noise bands used in noise-vocoding synthesis increases (Shannon et al., 1995). Finding parallel performance in chinchillas suggests that these animals respond to the same spectro-temporal cues in noise-vocoded and natural speech that humans respond to.
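The noise-vocoding procedure referenced here (Shannon et al., 1995) can be sketched roughly as follows: the signal is split into frequency bands, the amplitude envelope of each band is extracted, and each envelope is used to modulate band-limited noise. A minimal illustration in Python, where the band edges and filter settings are illustrative assumptions rather than the exact parameters used in the studies cited:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(signal, sr, band_edges):
    """Crude noise-vocoder: per band, envelope-modulated band-limited noise."""
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, signal)
        envelope = np.abs(hilbert(band))          # amplitude envelope of the band
        noise = rng.standard_normal(len(signal))
        noise_band = sosfilt(sos, noise)          # noise limited to the same band
        out += envelope * noise_band
    return out

# Example: vocode a 440-Hz tone into four noise bands
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
vocoded = noise_vocode(tone, sr, [100, 800, 1500, 2500, 4000])
```

Increasing the number of band edges preserves more of the spectro-temporal detail of the original, which is the manipulation behind the intelligibility gains described above.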
Birds are not mammals, and therefore are not close evolutionary relatives of humans, but avian communication abilities should also be mentioned. Although the class Aves originated approximately 50 million years ago (Sibley & Ahlquist, 1990), research with birds also provides evidence that humans may be utilizing general mechanisms in speech processing. Vocal communication in birds shows some parallels to human abilities. For example, many songbird species use complex vocalizations, and similar underlying developmental and mechanistic processes may be involved (for reviews, see Beckers, 2011, and Doupe & Kuhl, 1999). Psychoacoustic studies have shown that at least some birds demonstrate some of the perceptual phenomena proposed to be special to humans hearing speech—specifically, phoneme discrimination and categorization, compensation for coarticulation, and an ability to solve the lack-of-invariance problem. As one example, both songbirds and non-songbirds have demonstrated the ability to discriminate phoneme variations of the vowel /a/ (Hienz, Sachs, & Sinnott, 1981). Japanese quail (Coturnix coturnix) can also discriminate and categorize initial-consonant syllables (Kluender et al., 1987), while budgerigars (Melopsittacus undulatus) discriminate vowel categories and are more sensitive to phonemic vowel distinctions than to talker-related vowel variation (Dooling & Brown, 1990). More recently, Ohms, Gill, Van Heijningen, Beckers, and ten Cate (2010) found that zebra finches (Taeniopygia guttata) could discriminate and categorize monosyllabic words that differ in vowels, and can generalize this ability to unfamiliar male and female talkers. Results such as these suggest that some birds have an ability to normalize specific components of speech across talkers. Budgerigars and zebra finches have even demonstrated the ability to discriminate both full-formant and sine-wave versions of /ra/-/la/, revealing some similarities to human discrimination of speech and speech-like sounds (Best et al., 1989; Dooling et al., 1995). Clearly, birds share at least some rudimentary speech-perception capabilities with humans.
1.3.3 Speech perception in nonhuman primates. Nonhuman primates, closer evolutionary relatives of humans than other mammals and birds are, also demonstrate auditory processing capabilities that support the Auditory Hypothesis. In two studies, Kuhl and Padden (1982, 1983) tested rhesus macaques (M. mulatta), an Old World monkey, and human infants to compare discrimination of voiced and voiceless phonemes. The difference between these two is that voiced sounds include vocal-fold vibration and voiceless sounds do not. Monkeys were first trained to categorize these phoneme types in a "same-different" procedure, and afterward were tested with unfamiliar syllable pairs on a stimulus continuum ranging from voiced to voiceless (e.g., /ba/-/pa/). In the second experiment, the subjects heard syllable pairs differing in "place of articulation" (e.g., /b/-/d/). This feature refers to tongue and lip placement during speech production. For example, when /b/ is produced, the lips come together and the tongue is held away from the teeth. However, when /d/ is produced, the lips are separated and the tongue touches the ridge above the top of the teeth. Both experiments revealed that the macaques divided the stimulus continuum at the same physical points as humans, meaning they showed similar boundary points in categorization based on both voicing and place.
More recently, it has been demonstrated that Japanese macaques can perceive the articulation events of speech, although their performance better resembles that of human infants than of adults. Sinnott and Gilmore (2004) investigated perception of place-of-articulation information in natural speech by monkeys and adult humans, presenting consonant-vowel tokens consisting of /b/ or /d/ combined with /i/, /e/, /a/, or /u/. Tongue placement is different across these sounds, with /a/ and /u/ described as "back" vowels, and /i/ and /e/ as "front" vowels. In front vowels the tongue is positioned as far forward as possible without creating a constriction that would produce a consonant sound. In back vowels the tongue is positioned as far back as possible without creating a constriction.

Sinnott and Gilmore (2004) used a two-choice identification task, whereby the monkeys and human participants had to actively classify /b/ versus /d/ consonant-vowel stimuli by moving a lever. Humans performed well with all stimuli, while the monkeys performed better with tokens based on the back vowels /a/ and /u/ than with the front vowels /i/ and /e/. An earlier study had shown that three- to four-month-old human infants also classify back vowels more easily (Eimas, 1999), evidently learning how to reliably differentiate front vowels over time. Sinnott and Gilmore therefore concluded that the monkeys' performance reflected basic auditory-system processing, as is also found in preverbal infants before critical speech-related learning occurs.
There is also evidence that cotton-top tamarins (Saguinus oedipus), a New World monkey, perform similarly to one-month-old infants when discriminating Dutch versus Japanese sentences (Ramus, Hauser, Miller, Morris, & Mehler, 2000). Although testing methods were quite different for the two species, the researchers found that both were able to distinguish these languages. In addition, both species did so in the face of at least modest talker-related variability, as stimuli from a total of four speakers of each language were used. Neither monkeys nor infants were able to discriminate the languages when sentences were played backwards. The researchers concluded that the infants were likely using innate, generalized auditory processing shared across nonhuman primates, if not all mammals.
As mentioned earlier, both behavioral and neuroanatomical studies have revealed that primates other than humans show hemispheric lateralization effects (Taglialatela, 2007). Specifically, several macaque species exhibit a right-ear advantage (REA) for communicatively relevant vocalizations. In chimpanzees, brain-imaging studies have revealed parallels to Wernicke's and Broca's areas in humans—both of which are considered critical in language functions. Wernicke's area is located in the planum temporale of the temporal lobe and is involved in human language perception and comprehension. This region can be as much as five times larger in the left hemisphere than in the right. In chimpanzees, this brain structure was significantly larger in the left hemisphere in 94% of MRI scans examined by Gannon and colleagues (1998). Broca's area is located in the inferior frontal gyrus, and is critical in speech production. Similarly, this area is activated in chimpanzees when these animals vocalize (Taglialatela, Russell, Schaeffer, & Hopkins, 2008).
Only a limited number of studies have tested perceptual discrimination of speech sounds in apes. In one such experiment, Kojima et al. (1989) presented two chimpanzees with consonant-vowel syllables, using synthetic /ga/-/ka/ and /ba/-/da/ continua to examine perception of voicing and place-of-articulation contrasts, respectively. Using response times as an index of perceived similarity, chimpanzees demonstrated better discrimination of syllables when phonetic contrasts were based on the same features of voicing and articulation that humans attend to. As in the earlier-mentioned Sinnott and Gilmore (2004) experiment with macaques, chimpanzees did not perform as well as humans, likely reflecting the importance of early experience in human speech-perceptual development. As noted earlier, one unique aspect of human language acquisition is the importance of learning and environmental input.
1.3.4 Evidence of top-down processing when perceiving speech. Speech-perception studies with nonhuman primates appear to demonstrate the operation of general auditory processing—in support of the Auditory Hypothesis. As in studies with non-primates, however, this work has not provided evidence of the higher-level, top-down-processing capabilities claimed to be evidence of a uniquely human, cognitive module (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987). Such processing is potentially difficult to demonstrate with the rudimentary types of stimuli used in these studies, as top-down effects emerge more clearly in humans at the level of words and sentences. In other words, distinguishing SiS theory from the Auditory Hypothesis also requires examining higher-order processing of meaningful speech in nonhumans, rather than of meaningless speech segments.
In fact, acoustic cues to phonemes and syllables may not be processed in the same way as lexical components of language. For example, the former could be processed as non-speech sounds are, using "bottom-up" perception that is not strongly guided by higher-level knowledge of meaningful language. Because chimpanzees and humans are close evolutionary relatives—divergence from a common ancestor occurred 5 to 8 million years ago (Wood, 1996)—investigating word recognition by these apes could provide compelling evidence for the SiS versus Auditory Hypothesis debate. As Stebbins (1983) noted, chimpanzees may be the species to shed light on whether the ability to perceive meaningful speech was present in latent form in hominins before the evolution of mechanisms to produce speech.
Revising the Motor Theory of Speech Perception, Galantucci, Fowler, and Turvey (2006) recently suggested that the only compelling evidence for neural hardware specialized for speech would be discovering a dedicated circuit active "if and only if" speech is perceived or produced. Galantucci et al. further argue that speech cannot be understood in isolation, that production and perception components necessarily work together, and that spoken language is critically embedded in a communicative context. These are restrictive arguments that again tend to inherently rule out the possibility of animal experiments. As mentioned earlier, given that nonhuman primates do not speak, it is impossible to test them for a dedicated circuit involved in both speech production and perception.
Another point is that very few nonhumans are raised in a speech-rich environment, meaning that animals typically have no opportunity to experience the input that is crucial to human language development. A very few apes, however, have been raised in a manner similar to humans. Through a combination of language exposure and enculturation by humans, these animals acquired both meaningful communicative and word-recognition abilities (Beran, Savage-Rumbaugh, Brakke, Kelley, & Rumbaugh, 1998; Brakke & Savage-Rumbaugh, 1995; Rumbaugh & Savage-Rumbaugh, 1996; Savage-Rumbaugh, Murphy, Sevcik, & Brakke, 1993). While only approaching the arguably impossible criteria proposed by Galantucci et al. (2006; see also Trout, 2001), language-capable apes do have the potential to be tested with meaningful speech.
1.4 Speech perception in a language-trained chimpanzee

Language-trained apes arguably present a convincing, and possibly unique, opportunity for investigating speech perception in nonhumans and settling the SiS versus Auditory Hypothesis debate. Specifically, if speech perception utilizes a specialized, human cognitive module, then a language-trained ape should not be able to identify or understand speech presented in the altered, synthetic forms proposed to require this specialization. Furthermore, such an animal should arguably not exhibit talker normalization for meaningful speech, given the associated lack-of-invariance problem. However, previous research with one particular language-trained ape suggests otherwise, in support of the Auditory Hypothesis.
1.4.1 Previous experiments with Panzee. To examine whether apes and humans can show fundamental similarities in speech processing, a recent series of experiments assessed the perceptual capabilities of a chimpanzee named Panzee. This animal is an adult female, housed at the Language Research Center (LRC) at Georgia State University (GSU). She was raised routinely hearing speech from the age of eight days, showing reliable recognition of approximately 130 spoken English words. Panzee was also taught corresponding visuo-graphic symbols, called lexigrams, and can use both these symbols and associated photographs to communicate in everyday and experimental situations. When Panzee hears a familiar English word, she is reliably able to choose the correct, corresponding lexigram or photograph from among multiple alternatives (Beran et al., 1998). To date, Panzee has been tested with natural, whispered, and several forms of synthetic speech (Heimbauer et al., 2011; Heimbauer, unpublished data).
To test Panzee, 48 two- to five-syllable words were chosen from among her familiar English words. In all the experiments, Panzee first heard a word in natural or synthetic form. Her task was then to use a joystick to select the one lexigram or photograph corresponding to that word from among four such items appearing on a computer screen. Panzee received no feedback for correct or incorrect choices during test sessions, ruling out the possibility of learning how to respond to altered versions of the words. However, she did receive a reward every three or four trials on a randomized, noncontingent basis. Sixteen different natural and eight different test words were presented in each experimental session, with sessions including four blocks of these words in randomized order, for a total of 96 trials. Typically, a session lasted 20 to 30 minutes.
Annual testing over a 10-year period ending in 2008 demonstrated that Panzee's session performance for natural words is consistently between 75% and 85% correct (M. J. Beran, personal communication, January 2010). When tested with words in whispered form, she performed similarly, at or above 75%. Subsequent testing with words in four different synthetic forms produced the same results (Heimbauer, unpublished data). One form simply reproduced the original utterance as closely as possible, but another form was less complete. This less complete, "voiced-only" form included only tonal elements of the original, meaning that any noise-based components were removed. Perception of voiced-only speech thus involves top-down processing, as unvoiced, noisy acoustics are important contributors to many English phonemes. Despite the missing sounds, Panzee showed no difference in performance with voiced-only versus natural words, even when considering only "first trials"—the 48 instances in which she heard a given word in synthetic form for the first time (Heimbauer, unpublished data).
The two remaining synthesis forms were selected to be directly relevant to the SiS versus Auditory Hypothesis debate, with outcomes shown in Figure 2. One was noise-vocoded speech, synthesized using seven noise bands. This form can be challenging to humans, but is relatively comprehensible to most listeners (Davis & Johnsrude, 2007; Davis et al., 2005; Hervais-Adelman et al., 2008; Shannon et al., 1995). Despite this potential challenge, Panzee's performance was statistically well above chance level both on first trials and overall. Her performance with these versions was statistically below the outcomes for natural words presented in the same session, but she showed no evidence of having learned how to respond to them (Heimbauer et al., 2011). The last synthetic form tested was sine-wave speech, described as "science-fiction sounds" and "whistles" by naïve human listeners (Remez et al., 1981) and "impossibly unspeechlike" by prominent speech researchers (Remez et al., 1994). Again, Panzee's performance was above chance on first trials and overall. While she was less accurate identifying sine-wave words than natural words, human performance was similar to this in a transcription task—even with the humans receiving orientation to sine-wave speech and hearing all the natural-word stimuli ahead of time (Heimbauer et al., 2011).
The results of these noise-vocoded and sine-wave speech experiments provided the first evidence of human-like performance by a nonhuman responding to meaningful, but incomplete and perceptually difficult, synthetic speech. Top-down processing abilities are necessary to identify speech in both forms, with both Panzee and human listeners needing to access previous knowledge about natural speech in order to identify altered versions of the words. Given that some researchers have linked perception of both noise-vocoded and sine-wave speech to a proposed phonetic module (e.g., Trout, 2001; Whalen & Liberman, 1987), these results directly contradict SiS arguments while supporting the Auditory Hypothesis. However, similarities in recognition performance do not necessarily imply similarities in underlying processing, and it remains unclear if Panzee was using the same perceptual strategies as humans to perform this task.
Figure 2. Panzee's synthetic-speech, word-recognition performance. The figure is reproduced from Heimbauer, Beran, & Owren, 2011, and includes means and standard errors of percentage-correct performance for 48 words heard in natural, noise-vocoded, and sine-wave forms. First trials represent the 48 first instances of the chimpanzee hearing a word in a given synthetic form. The first set of sine-wave results shows performance with noncontingent, intermittent reward delivery and no response feedback. The second set shows performance with contingent reward received on natural trials but with no reward or response feedback on sine-wave trials. The dashed line indicates the chance-performance rate of 25% correct. All comparisons to chance performance were statistically significant at p < 0.008 and are marked by a pair of asterisks.
2. CURRENT EXPERIMENTS

In the debate about the existence of, or necessity for, human specializations for speech perception, Panzee's natural- and synthetic-speech word-recognition abilities provide evidence for the Auditory Hypothesis view. Her performance with noise-vocoded and sine-wave words in particular suggests that human speech perception is grounded in generalized auditory capabilities and extensive experience with speech rather than specialized processing mechanisms. These studies do not, however, allow an unequivocal conclusion that Panzee's speech processing is fundamentally similar to human perception. For example, it is possible that Panzee is able to recognize her relatively small number of familiar words through more holistic judgments, based either on duration or on overall aural impressions (see Heimbauer et al., 2011). Testing the SiS view against the Auditory Hypothesis more definitively requires specific evidence about detailed aspects of Panzee's processing strategies.
Therefore, the current experiments were designed to extend previous work on Panzee's speech perception, specifically investigating whether she relies on auditory perceptual mechanisms similar to those of humans. Experiment 1 examined the acoustic cues Panzee may be attending to when hearing speech in sine-wave (Experiment 1a) and noise-vocoded (Experiment 1b) forms. For humans, sine-wave speech becomes more difficult to recognize when either the first (SW1) or second (SW2) of the three sine waves is removed (Remez et al., 1981). Hypothesizing that Panzee would also show evidence of relying disproportionately on these cues, Experiment 1a compared her performance to that of human participants when hearing four critically different versions of sine-wave words. In Experiment 1b, Panzee and human participants were tested with noise-vocoded words produced from varying numbers of noise bands. Previous research has shown that humans find it easier to recognize sentences produced with four or more noise bands (Shannon et al., 1995), attributed to the fact that increased numbers of noise bands enhance the amplitude- and frequency-modulation information represented (Davis et al., 2005; Shannon et al., 1995). Based on her previous performance with words in noise-vocoded form, it was again hypothesized that Panzee would show performance similar to that of humans as a function of the number of noise bands used in synthesis.
Experiment 2 focused on time-reversed speech. Here, previous research with humans has revealed that phonemes are cued over segments of roughly 50 to 100 ms (Crystal & House, 1988), with shorter windows in time-reversed form leaving intelligibility largely unaffected, but longer windows having a significant detrimental impact (Saberi & Perrott, 1999). The second experiment, therefore, was designed to investigate whether Panzee's perception also relies on cuing over this time frame. In this study, she heard test words with reversal windows that ranged from 25 ms to 200 ms in eight 25-ms increments, and human participants were tested for comparison purposes. The prediction tested was that, like humans, Panzee would perform best with window lengths less than the average duration of English phonemes, with intelligibility decreasing to the 50% threshold level for a window length of approximately 130 ms—as found for human performance by Saberi and Perrott (1999).
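Local time-reversal of the kind described above can be sketched simply: the waveform is divided into fixed-length windows and the samples within each window are reversed in place, leaving the order of the windows themselves intact. A minimal Python illustration (the handling of a final partial window is an assumption; the published procedures may differ in such details):

```python
import numpy as np

def locally_reverse(signal, sr, window_ms):
    """Reverse the samples within successive fixed-length windows."""
    win = int(sr * window_ms / 1000)            # window length in samples
    out = signal.copy()
    for start in range(0, len(signal), win):
        out[start:start + win] = signal[start:start + win][::-1]
    return out

sr = 44100
t = np.arange(sr) / sr
word = np.sin(2 * np.pi * 220 * t)              # stand-in for a recorded word
reversed_50ms = locally_reverse(word, sr, 50)   # 50-ms reversal windows
```

With short windows (well under a phoneme's duration) the local scrambling leaves the phonemic cues largely intact, which is why intelligibility only collapses as the window grows.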
Experiment 3 investigated the lack-of-invariance problem. While humans seem to effortlessly accommodate the acoustic variability found in speech from different talkers, evidence of corresponding talker-normalization capabilities in nonhumans is suggestive, but limited. As Panzee has heard and responded to speech from a variety of talkers through her lifetime, this last experiment was designed to test her talker-normalization capabilities more systematically. Stimuli included speech from a diverse set of talkers, including variation in biological sex, age, and dialect background. In addition, some of the individual talkers were familiar to Panzee and others were unfamiliar.
2.1 General Methods

2.1.1 Subject. The subject was the female chimpanzee Panzee, who was 25 years old when the current experiments began. This animal is socially housed with three conspecifics at the LRC at GSU. Panzee has daily access to indoor and outdoor areas, unlimited access to water, and is fed fruits and vegetables three times a day. She participates in testing on a voluntary basis and may choose not to participate or to stop responding during a session. Panzee uses a language-like, lexigram-based communication system to request items throughout the day and often during experimental situations. In addition to language-comprehension testing using lexigrams and photographs, this animal also has experience with numerous computer-based protocols (Rumbaugh & Washburn, 2003). In the three experiments, she participated in three to four 20- to 30-minute sessions per week, and worked for favored food items. She was tested in an indoor area of her daily living space, which was adjacent to other chimpanzee areas. During test sessions, other chimpanzees could be either indoors or outdoors, with the option of moving between those areas at will.
2.1.2 Participants. Human participants were undergraduates, aged 18 to 55 years, recruited via the GSU online experiment-participation system. Each participant was tested in only one of the applicable experiments—Experiment 1a, 1b, or 2. Only participants without reported hearing problems and who were native English speakers were included in analyses.
2.1.3 Apparatus. Computer programs used to test the chimpanzee were written in Visual Basic Version 6.0 (Microsoft Corp., Redmond, WA) and run on a Dell Dimension 2400 personal computer (Dell USA, Round Rock, TX). A Samsung Model 930B LCD monitor (Samsung Electronics, Seoul, South Korea), a Realistic SA-150 stereo amplifier (Tandy Corp., Fort Worth, TX), and two ADS L200 speakers (Analog & Digital Systems, Wilmington, MA) were connected to the computer. The chimpanzee registered her choices using a customized Gravis 42111 GamePad Pro video-gaming joystick (Kensington Technology Group, San Francisco, CA). Human participants heard experimental stimuli through Sennheiser HD 650 headphones in a sound-deadened room. The experiments were controlled via a computer from an adjacent room, and sounds were presented via TDT System II modules (Tucker-Davis Technologies, Alachua, FL). Audio recording was conducted with a Shure PG14/PG30-K7 head-worn wireless microphone system (Shure Inc., Niles, IL), and either a Realistic 32-12008 stereo mixing console (Tandy Corp., Ft. Worth, TX) and Marantz PMD671 Professional Solid-State Recorder (Mahwah, New Jersey), or a MacBook Pro laptop computer (Apple Inc., Cupertino, CA). Acoustic processing was conducted using a MacBook Pro laptop, Praat Version 5.1.11 acoustics software (Boersma, 2008), and custom-written scripts (Owren, 2010).
2.1.4 Stimuli. Spoken stimuli were chosen from a list of approximately 130 words that Panzee has consistently identified in a decade of annual word-comprehension testing. Natural word stimuli were recorded at 44100 Hz with 16-bit word width and filtered to remove any 60-Hz AC contamination and DC offset. Individual words were isolated by cropping corresponding segments at zero crossings, with 100 ms of silence then added to the beginning and end of each file. Finally, each waveform was rescaled so its maximum amplitude value coincided with the maximum representable value.
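The preprocessing chain just described (filtering out hum and DC offset, padding with silence, peak normalization) can be sketched as follows. The filter order and cutoff here are illustrative assumptions, not the settings actually used; the original processing was done in Praat:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def prepare_word(signal, sr=44100, pad_ms=100):
    """Remove DC offset and 60-Hz hum, pad with silence, peak-normalize."""
    # A high-pass above 70 Hz removes both the DC offset and 60-Hz AC hum
    sos = butter(4, 70, btype="highpass", fs=sr, output="sos")
    filtered = sosfilt(sos, signal.astype(float))
    pad = np.zeros(int(sr * pad_ms / 1000))     # 100 ms of silence
    padded = np.concatenate([pad, filtered, pad])
    return padded / np.max(np.abs(padded))      # scale peak to full range

sr = 44100
noisy = np.sin(2 * np.pi * 300 * np.arange(sr) / sr) + 0.5  # tone plus DC offset
clean = prepare_word(noisy, sr)
```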
2.1.5 Chimpanzee procedure. Panzee was tested using the general procedure employed for annual word-comprehension testing. She initiated a trial by using the joystick to move a cursor from the bottom of the LCD screen into a centered "start" box, triggering one presentation of the stimulus. The cursor then reset to the bottom of the screen, the start box reappeared, and a second cursor movement produced another stimulus presentation. After a 1-sec delay, four different photographs (Experiments 1 and 2; see Figure 3) or lexigrams (Experiment 3; see Figure 11) appeared on the screen. One of these items was the correct match to the audio stimulus, and the others were foils chosen randomly by the controlling computer program. As illustrated in Figure 4, visual items were positioned randomly in four of six possible locations—three on the left side of the screen and three on the right. Photograph foils were those of words used in the same session, thereby reducing the chance that Panzee could rule out items corresponding to words she was not hearing (Beran & Washburn, 2002).
Figure 3. Samples of the photographs used in Panzee's spoken-word recognition task.
Panzee’staskwastousethejoysticktomovethecursorfromthemiddleofthescreen
tothephotographcorrespondingtothestimulusword(seeFigure4).InExperiments1aand1b
bothnaturalandalteredwordswerepresentedinrandomizedorderwithineachtrialblock.In
Experiment2,onlyalteredwordswerepresented;andinExperiment3,onlynatural‐word
stimuliwereused.Panzeewasrewardedwithhighlyvaluedfood,includingpiecesofcherries,
grapes,blueberries,peaches,raspberries,strawberries,mixedfruit,orChexMix®.Thereward
regimenwasspecifictoeachexperiment,andisnotedintheindividualproceduresections.
Figure4.Panzeeworkingonacomputertask.Shewashearingwordsandchoosingcorrespond‐
ingphotos.
2.1.6 Human procedures. Human testing varied somewhat by experiment and is described in the individual procedure sections. Common elements included that stimuli were presented in randomized blocks, that the stimulus was heard twice on each trial, 1200 ms apart, and that listeners had eight seconds in which to transcribe that word.
2.1.7 Data analysis. Statistical testing varied by experiment and is discussed separately in each case.
3. EXPERIMENT 1

Although Panzee has demonstrated the ability to identify sine-wave and noise-vocoded speech (Heimbauer et al., 2011), the cognitive mechanisms that she is employing are unknown. Thus, two studies investigated the acoustic cues she may be attending to when identifying these altered forms of speech. It was hypothesized that Panzee would show evidence of using the same information humans are proposed to use, namely the spectro-temporal cues produced by amplitude and frequency modulations over time (Remez et al., 1981; Remez et al., 1994; Shannon et al., 1995). Alternatively, she might utilize more holistic cues, such as word length or general sound impressions. Analyses conducted by Heimbauer et al. (2011) argued against this possibility. In these previous experiments, it was expected that when Panzee made errors she would have been choosing foils that corresponded to words whose overall duration or syllable count were similar to those of the target word. However, she did not demonstrate either of these strategies when errors on sine-wave and noise-vocoded words were analyzed.

Here, Panzee was tested with sine-wave words that included varying combinations of individual tones (Experiment 1a), and noise-vocoded speech produced from varying numbers of noise bands (Experiment 1b). As a result, these different synthetic words included differing degrees of time-varying amplitude and frequency cuing of a kind previously shown to systematically affect human performance. The rationale was that if performance by both Panzee and humans was similarly compromised or facilitated across the various synthetic stimuli, the two species could be inferred to be attending to similar elements of the sounds. Although difficult to specify precisely, this information has been characterized as critical spectro-temporal patterning in each natural word that is preserved in synthetic versions.
3.1 Experiment 1a

A seminal study by Remez and colleagues (1981) explored human perception and identification of sine-wave speech with the objective of investigating the role of time-varying properties in speech perception. They found that sine-wave speech, despite lacking traditional acoustic information such as F0 and formants, could still be intelligible. In addition, as shown in Figure 5, their listeners were more successful in identifying sentence components when SW1 and SW2 were both present (SW123 and SW12 forms) than when either one was absent. SW1 and SW2 model the amplitude and frequency modulation patterns of the natural-speech formants F1 and F2, respectively. That outcome was expected, as it is the lowest two formants that typically most clearly cue vowel identity, in addition to providing articulation information for adjacent consonants (Drullman, 2006; Ladefoged, 2001). Thus, Remez et al.'s results demonstrate that the tone analogs to F1 and F2 contribute disproportionately to sine-wave speech identification as well.
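The tone-analog construction can be sketched as a sum of sinusoids whose frequencies and amplitudes follow the formant tracks. This is a minimal illustration only, not the synthesis program Remez and colleagues used, and the constant formant values below are hypothetical:

```python
import numpy as np

def sine_wave_speech(tracks, sr=22050):
    """Sum one time-varying sinusoid per formant track.

    tracks: list of (freq_hz, amp) pairs of arrays sampled at the audio
    rate; each pair is one tone analog (SW1, SW2, ...).
    """
    out = np.zeros(len(tracks[0][0]))
    for freq, amp in tracks:
        # integrate instantaneous frequency to obtain the running phase
        phase = 2 * np.pi * np.cumsum(freq) / sr
        out += amp * np.sin(phase)
    return out

# Toy example: steady tone analogs to F1 and F2 (hypothetical values).
n = 22050  # one second of samples
f1 = np.full(n, 500.0)
f2 = np.full(n, 1500.0)
sw12 = sine_wave_speech([(f1, np.full(n, 0.5)), (f2, np.full(n, 0.3))])
```

Dropping one (frequency, amplitude) pair from the list yields an incomplete form analogous to the SW12, SW13, or SW23 stimuli tested here.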
To ascertain whether Panzee also differentially uses these components of sine-wave speech, both she and human participants were presented with 24 words synthesized in four of the same forms used by Remez and colleagues. All three sine waves were present in one version (SW123), while one of the three was removed in each of the others. It was hypothesized that if Panzee identifies sine-wave speech using acoustic cues similar to those humans use, her performance when particular sine waves were missing would resemble that of humans. Specifically, she was predicted to perform best with words that included both SW1 and SW2 (SW123, SW12) and less well when either of these components was missing (SW13, SW23).

Figure 5. Intelligibility of sine-wave speech to humans. The figure is reproduced from Remez et al., 1981, and shows syllable-transcription results of sine-wave sentences in seven different forms.
3.1.1 Subject. The subject was the chimpanzee Panzee.

3.1.2 Participants. There were 12 human participants (eight females).

3.1.3 Stimuli. For Experiment 1a, natural word stimuli were recorded spoken by an adult male researcher (MJB), who is very familiar to Panzee and who conducted her annual, word-comprehension testing over a 10-year period. Stimuli consisted of natural versions and sine-wave versions of 24 spoken words that Panzee had previously successfully identified in sine-wave form (Heimbauer et al., 2011). The 24-word set contained 9 two-syllable words, 13 three-syllable words, 1 four-syllable word, and 1 five-syllable word. An additional 12 words from the larger word list that Panzee can routinely identify were used during an initial "orientation" phase that included both natural and SW123 versions (see Table 1 for a complete list of orientation and experimental words). To produce the sine-wave stimuli in the three incomplete forms, either SW1, SW2, or SW3 was removed from the previously constructed SW123 versions. Individual sine waves were removed using Hanning-window, band-pass filtering.

Table 1.
Orientation and test word groups. Test words used in Experiments 1a (A and B), 1b (C and D), 2 (*), and 3 (E and F), as well as orientation words used in Experiments 1a and 1b (O).

Apple: O, O, F
Apricot: A, D, *, E
Balloon: O, O, F
Banana: A, *, E
Blueberries: B, D, *, E
Bubbles: A, C, F
Carrot: O, E
Celery: D, *, F
Cereal: O, O, *, F
Clover: F
Coffee: O, E
Colony Room: B, D, E
Gorilla: B, C, *
Honeysuckle: E
Hotdog: O, F
Jello: O
Kiwi: O
Koolaid: O
Lemonade: B, D, *
Lettuce: C
Lookout: A, O
M&M: B, *
Melon: A, E
Mushroom Trail: B, O, *, F
Noodles: C, E
Observation Room: B, D
Orange: A, C
Orange Drink: A, O, *, F
Orange Juice: B, D, *, E
Peaches: O, C, F
Pineapple: *, E
Pineneedle: B, O, *, F
Plastic Bag: B, D, *
Popsicle: O, D, *
Potato: B, D, *, E
Raisin: O, C
Sparkler: A, C
Strawberries: O, *, F
Sugarcane: B, D, *
Surprise: A, C
Sweet Potato: O, F
Tickle: A, C, E
Tomato: D, *, F
Toothpaste: C, E
TV: C, F
Vitamins: O, *, E
Water: A, O, F
Yogurt: A, O, E
3.1.4 Chimpanzee procedure. In all sessions, words were presented in four randomized blocks, for a total of 96 trials. On each trial, Panzee chose from among four photographs that all corresponded to words being used in that particular session. When Panzee heard a word in natural form and made a correct choice, she heard an ascending ("correct") tone and received a food reward. When she made an incorrect choice on a natural word trial, she heard a buzzer-like ("incorrect") sound and did not receive a reward. In sessions when Panzee heard both natural and synthetic words, this feedback helped keep her motivated. Neither feedback sounds nor food rewards were provided on trials with synthetic stimuli.

Initial sessions assessed Panzee's performance when hearing the 24 test words in natural form. To progress, Panzee was required to perform at or above 70% correct with natural test words for three consecutive sessions (chance was 25%). Panzee then completed both natural and sine-wave sessions with orientation words. Here, she heard eight blocks of 12 natural orientation words for two sessions, averaging 72% correct. Then she heard these words in both natural and sine-wave form for two sessions. In the first, she heard six orientation words in natural form and the other six in SW123 form for eight blocks. In the second session, the six words Panzee previously heard as natural words were presented in SW123 form, and vice versa. After the sine-wave orientation phase, Panzee participated in one additional session with natural test words to refresh her on the test word set, and performed at 80% correct.

In the testing phase, Panzee completed one session with the 12 Group A words in natural form and the remaining 12 Group B words in SW123, SW12, SW13, and SW23 forms. In a second session on a different day, she heard the Group B words in natural form and Group A words in the four sine-wave forms. Trials were randomized within blocks in these sessions, with Panzee hearing natural words four times each and sine-wave words once in each form. She participated in these two types of sessions three times each, in an alternating order, resulting in a total of 12 trials for each word in natural form and 3 trials for each word in every sine-wave form.
3.1.5 Human procedure. Pilot experimentation demonstrated that humans were at ceiling performance for all word forms when stimuli were presented using the word-recognition method employed with Panzee. Therefore, instead of having participants choose from four photographs to identify the stimuli, a word-recall method was used whereby participants had to transcribe the sounds they heard. First, however, participants were familiarized with the word set by exposure to the 24 test words presented in a PowerPoint presentation. Photographs of test-word objects were shown, one at a time, while the corresponding, naturally recorded word was heard. Then, the participant was asked to write down the name of each photograph as it was presented without word labels or sounds. Participants were also familiarized with sine-wave speech by listening to a recording of the words "one" through "ten" and then "ten" through "one" in SW123 form. They were instructed to inform the experimenter as soon as they were able to identify these sounds as speech.

In the test session, participants heard the stimuli in two different randomized blocks, with block order counterbalanced across individuals. One block consisted of Group A words in natural form and Group B words in the four sine-wave forms, and the other block included Group B words in natural form and Group A words in the four sine-wave forms. Within a block, natural words were presented for two trials, and sine-wave words were presented for one trial each in every sine-wave form. These sessions included a total of 72 trials.

3.1.6 Data analysis. Panzee's mean percentage-correct performance in orientation versus test sessions with natural words was compared using an unpaired t-test for a possible learning effect. Mean percentage-correct performance for each sine-wave word form within and across the six test sessions was compared to chance-rate performance of 25% using binomial tests. Pearson's chi-squared tests with a Bonferroni correction were conducted to compare Panzee's performance across the various sine-wave versions. Human percentage-correct performance was computed for each word form, with mean performance for the 12 participants then examined separately for natural and sine-wave versions. ANOVA was used to test for an overall effect of sine-wave word forms, and Tukey post-hoc comparisons were used for subsequent pair-wise comparisons among them. Finally, an independent t-test was conducted to test for possible effects of block-presentation order.
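The comparison to chance can be sketched as an exact one-sided binomial test against the 25% guessing rate of a four-alternative task. This is a minimal illustration; the trial counts below are hypothetical, and the analyses reported here may have been run in dedicated statistics software:

```python
from math import comb

def binomial_p_above_chance(successes, trials, p_chance=0.25):
    """Exact one-sided binomial test: P(X >= successes) under chance guessing."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# e.g., 21 of 36 four-alternative trials correct (hypothetical counts)
p_value = binomial_p_above_chance(21, 36)
```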
3.1.7 Results. Panzee's mean performance over the three natural-word orientation sessions was 73.3% (SD = 1.58), which was statistically above chance level (p < 0.001). Correct natural-word trials in the six test sessions ranged from 81.3% to 93.8%, averaging 87.2% (SD = 4.06) overall, which also was significantly above chance (p < 0.001). An unpaired, 2-tailed t-test revealed that Panzee's performance with natural words was significantly higher in test sessions than in orientation sessions, t(7) = 3.95, p < 0.01, as shown in Figure 6a. Overall, correct performance for all sine-wave words was statistically above chance level (SW123 and SW12 forms, p < 0.001; SW23 and SW13, p < 0.05). As illustrated in Figure 6b, Panzee was 36% correct for SW23 and SW13 words, and 59% correct for SW123 and SW12 words. A chi-squared test with a Bonferroni-corrected alpha value of 0.025 revealed that correct performance for SW123 and SW12 words was significantly greater than for SW23 (p = 0.006) and SW13 versions (p = 0.004).

Figure 6. Experiment 1a chimpanzee and human word recognition. a) Mean performance with natural words by Panzee and the human participants, with applicable standard deviations. b) Panzee's sine-wave word performance, with chance-level accuracy shown by the dashed line. c) Mean human sine-wave word performance, with standard error bars.
Mean transcription performance of natural words by humans was 99.8% correct (SEM = 0.06), as shown in Figure 6a. Mean percentage-correct values for SW123, SW12, SW23, and SW13 word forms were 43%, 35%, 31%, and 27%, respectively. A Kolmogorov-Smirnov test for normality validated use of ANOVA, and results revealed a statistically significant difference among the various outcomes, F(3, 44) = 6.00, p = 0.002. Furthermore, Tukey post-hoc comparisons showed that outcomes were significantly higher for SW123 stimuli than for SW13 versions, p = 0.001, as illustrated in Figure 6c. No other differences were found. Independent, 2-tailed t-test results revealed an effect of presentation order. Five participants transcribed Group B words first, and performed significantly better on Group A sine-wave words than did participants hearing Group A words first, t(46) = 2.28, p = 0.027. Similarly, the seven participants transcribing Group A words first performed significantly better on Group B sine-wave words than those hearing Group B words first, t(29.6) = 9.33, p < 0.001. Examining individual performance, six participants performed similarly to Panzee, either overall or in one of the blocks. For these participants, performance was the same for SW123 and SW12, or for SW23 and SW13 word forms. These six participants also performed notably better on SW123 and SW12 words than on SW23 and SW13 forms. Finally, neither the humans nor Panzee ever recognized the words "banana," "bubbles," "orange drink," and "pineneedle" in several of the SW forms.
3.1.8 Discussion. Panzee demonstrated consistent natural word-recognition performance, showing outcomes in both orientation and test sessions similar to those in earlier annual testing and synthetic-speech experiments (Heimbauer et al., 2011). Her recognition of SW123 words was also similar to performance in previous sine-wave testing (Heimbauer et al., 2011). Panzee identified more words in SW123 and SW12 form than in SW13 or SW23 form. Humans performed similarly, although only the difference between SW123 (with both SW1 and SW2 present) and SW13 (missing SW2) performance was statistically significant, while the SW123 and SW23 performance difference was not. However, 6 of the 12 participants did perform similarly to Panzee, either overall or in one of the two test-word blocks. In other words, these participants performed exactly the same with SW123 and SW12 forms, or SW23 and SW13 forms, and were better at identifying SW123 and SW12 words than SW23 and SW13 words in those instances.

Unexpectedly, Panzee's performance on SW123 words was 58% correct, which was higher than the mean human outcome of 43% correct. Panzee's higher accuracy may be due to the fact that although sine-wave words can be quite challenging even to humans (Heimbauer et al., 2011), she was very familiar with her word set and had heard it in SW123 form in earlier experiments. Although human participants were exposed to and tested with the natural words before hearing the sine-wave forms, they were less familiar with them than was the chimpanzee.

The more important result is that both species showed a statistically significant performance difference between complete sine-wave words (SW123) and the same words when missing the tone analog to F2 (SW2). This result is consistent with the hypothesis that Panzee responds to the same cues in sine-wave speech that humans respond to, with the further implication that she is attending to the same features as humans in natural speech as well. This conclusion is based on the findings that Panzee was most successful in identifying sine-wave speech that included information concerning both F1 and F2, the most important formants in human perception of natural speech (Drullman, 2006; Remez & Rubin, 1990). Both Panzee and humans demonstrated an ability to interpret sine waves as cues to phonetic content, also suggesting that both were drawing on implicit knowledge of speech acoustics and corresponding phonetics (Davis & Johnsrude, 2007; Mann & Liberman, 1983; Newman, 2006; Whalen & Liberman, 1987). Taken together, these outcomes are indicative of cognitive top-down processing.
3.2 Experiment 1b

Panzee's ability to identify words in noise-vocoded form also provided an opportunity to examine the cues she is sensitive to in synthetic speech, with corresponding implications for natural speech processing. Although her previous performance has suggested that general auditory processing capabilities may be sufficient for human-like speech perception (Heimbauer et al., 2011), more detailed testing with noise-vocoded words could further strengthen this conclusion. Hence, the purpose of the next experiment was to compare Panzee's performance with noise-vocoded words with varying degrees of spectro-temporal information to that of humans.

In 1995, Shannon and colleagues found that as the number of bands used to synthesize noise-vocoded phonemes and sentences increased, participants showed corresponding improvements in identification accuracy (see Figure 7). With trained listeners, four noise bands are often sufficient for speech recognition, while at least ten noise bands are necessary with untrained participants (Davis et al., 1995; Shannon et al., 1995). Previously, Panzee demonstrated recognition of familiar words in noise-vocoded form synthesized with seven noise bands (Heimbauer et al., 2011). Therefore, this experiment assessed her word-recognition ability as a function of the number of noise bands used to produce the stimuli. It was hypothesized that if Panzee uses the same available cues as humans, she would show a similar pattern of performance across those forms. Specifically, her performance was predicted to increase linearly with increasing numbers of noise bands.

Figure 7. Intelligibility of noise-vocoded speech to humans. Percentage-correct performance for eight human listeners identifying consonants, vowels, and sentences as a function of noise-band number in noise-vocoded speech tested by Shannon et al. (1995). The dashed line denotes chance-level accuracy.
3.2.1 Subject. Panzee again was the subject.

3.2.2 Participants. There were 12 human participants (eight females).

3.2.3 Stimuli. Stimuli consisted of 24 previously recorded and processed natural words (see Table 1), which were those that Panzee had best identified in noise-vocoded form in earlier testing (Heimbauer et al., 2011). Noise-vocoded versions varied from two to five noise bands, and were synthesized using lower- and upper-cutoff frequencies (see Table 2) calculated using the "Greenwood function" (Souza & Rosen, 2009). This function calculates frequency ranges corresponding to equal distances along the basilar membrane of the cochlea, and can be applied to both humans and other mammals, including nonhuman primates (Greenwood, 1961; Greenwood, 1990). The approach was used to ensure orderly selection of frequency-cutoff values as they relate to hearing.
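The cutoff selection can be sketched as follows. The constants are the commonly cited human parameters for the Greenwood function, an assumption here since the exact values used for the stimuli are not listed; the resulting edges approximate, but do not exactly match, those in Table 2:

```python
import math

# Commonly cited human Greenwood constants (assumed):
# f(x) = A * (10**(a * x) - k), with x the proportional distance
# along the basilar membrane.
A, a, k = 165.4, 2.1, 0.88

def greenwood(x):
    return A * (10 ** (a * x) - k)

def position(f):
    # inverse Greenwood: membrane position for frequency f (Hz)
    return math.log10(f / A + k) / a

def band_edges(n_bands, lo=100.0, hi=5000.0):
    """Cutoffs dividing lo..hi into n_bands equal basilar-membrane spans."""
    x0, x1 = position(lo), position(hi)
    return [greenwood(x0 + (x1 - x0) * i / n_bands) for i in range(n_bands + 1)]

edges = band_edges(2)  # roughly [100, ~1000, 5000]; compare Table 2
```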
Test stimuli consisted of 11 two-syllable words, 11 three-syllable words, 1 four-syllable word, and 1 five-syllable word. All words were chosen from a list of those that Panzee previously successfully identified in noise-vocoded form. Fifteen of these words were also used in Experiment 1a. Twelve additional words, in natural form and in a previously synthesized form using seven noise bands (NB7), were used during an orientation phase (see Table 1). To produce the various noise-band test stimuli (NB2, NB3, NB4, and NB5), the natural speech signal was divided into 2, 3, 4, or 5 frequency bands using a band-pass filter. The amplitude envelope of each band was then extracted and used to modulate a corresponding white-noise band. The resulting amplitude-modulated noise waveforms were then summed.
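That synthesis pipeline, filter into bands, extract each band's envelope, and use it to modulate band-limited noise, can be sketched as follows. This is a minimal numpy illustration using a crude FFT brick-wall filter and a moving-average envelope, not the actual signal-processing chain used to build the stimuli:

```python
import numpy as np

def bandpass_fft(x, lo, hi, sr):
    """Crude brick-wall band-pass via FFT masking (stands in for a filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, sr, cutoff=30.0):
    """Amplitude envelope: rectify, then smooth with a moving average."""
    win = max(1, int(sr / cutoff))
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

def noise_vocode(x, edges, sr, seed=0):
    """Replace each analysis band with envelope-modulated white noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(x))
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass_fft(x, lo, hi, sr)
        carrier = bandpass_fft(noise, lo, hi, sr)
        out += envelope(band, sr) * carrier
    return out

# Toy "speech-like" signal: a slowly amplitude-modulated 300-Hz tone.
sr = 10000
t = np.arange(sr) / sr
speechlike = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 3 * t))
nb2 = noise_vocode(speechlike, [100, 1005, 5000], sr)  # two-band version
```

Passing more band edges (e.g., the five-band cutoffs from Table 2) yields the higher-resolution NB forms.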
Table 2.
Lower-to-upper cutoff frequencies for noise-band stimuli in Experiment 1b.

Bands  Frequency (Hz)
2      100-1005, 1005-5000
3      100-548, 548-1755, 1755-5000
4      100-392, 392-1005, 1005-2294, 2294-5000
5      100-315, 315-705, 705-1410, 1410-2687, 2687-5000
3.2.4 Chimpanzee procedure. The testing procedure and reward regimen were the same as those used in Experiment 1a, and all sessions consisted of 96 trials. Panzee first completed three natural-word sessions to prepare her for the test sessions, and to ensure normative performance with natural words. Criterion performance to progress to the orientation phases was set at 75% in three consecutive sessions. During orientation, Panzee completed one session with 12 non-test words in natural form, a second session with six of these in natural and six in NB7 form, and a third session with these words in the converse forms. In the final orientation phase, Panzee completed one more session with the 24 natural test words and then three sessions of these words in natural and NB7 forms. In each of these latter sessions, a different eight words were in NB7 form and the remaining 16 were natural versions.

An additional programming contingency was added in test sessions. Here, words of the same type, meaning natural or noise-vocoded, were not presented more than three times in a row. This adjustment was made to avoid frustration that could possibly result from Panzee hearing a series of challenging noise-vocoded words consecutively. In the first test session, Panzee heard 12 words (Group C) in natural form and the remaining 12 words (Group D) in NB2, NB3, NB4, and NB5 versions. In a second test session, on a different day, she heard Group D words in natural form and Group C words in the four NB versions (see Table 1). Within a session, there were four trials with each natural word and one trial with each of the words in every NB form. Panzee participated in these two session types three times each in alternating order, resulting in a total of 12 trials for each of the 24 natural words and 3 trials for each of the NB word forms.
3.2.5 Human procedure. Several human listeners were first tested in pilot sessions using the same orientation and test procedures as in Experiment 1a. However, as these participants demonstrated high accuracy with test words in all NB forms, the orientation procedure was changed. Experimental participants were instead familiarized with noise-vocoded speech only by listening to a recording of the words "one" through "ten" and then "ten" through "one" in NB7 form. Following this simple orientation, they heard and transcribed one block of the 24 natural words in randomized order. Lastly, they heard and transcribed a randomized test block of the same words in each of the four noise-band forms, for a total of 96 trials.

3.2.6 Data analysis. Data for both Panzee and the human participants were analyzed as in Experiment 1a.
3.2.7 Results. Panzee's natural word-recognition performance in orientation sessions ranged from 77.2% to 83.3%, with an overall mean of 80.6% (SD = 3.11), which was statistically above chance level, p < 0.001 (see Figure 8a). Percentage correct on natural words in the six test sessions ranged from 77.1% to 87.5%, with an overall mean of 82.8% (SD = 3.8) correct, which was also significantly above chance level (p < 0.001). An unpaired, 2-tailed t-test revealed that Panzee's natural-word performance was not statistically different between these two session types, t(7) = 0.70, ns.

Panzee's percentage correct for NB5, NB4, and NB3 word forms ranged from 50% to 61% (see Figure 8b), and overall was significantly above chance (p < 0.001). Her NB2 word performance was lower at 38% correct, and not significantly different from chance. A one-tailed, chi-squared test, with a Bonferroni-adjusted alpha value of 0.017, showed that Panzee's recognition of NB5 words was significantly higher than NB2 versions (p = 0.002), but not higher than either NB4 or NB3 forms.

Figure 8. Experiment 1b chimpanzee and human word recognition. a) Mean performance with natural words by Panzee and the human participants. b) Panzee's noise-vocoded word performance, with chance-level accuracy shown by the dashed line. c) Mean human performance for noise-vocoded words, with standard error bars.
Human transcription of natural words was 100% correct (see Figure 8a). Mean percentage-correct values for NB5, NB4, NB3, and NB2 forms were 80%, 78%, 68%, and 38%, respectively (see Figure 8c). After a Kolmogorov-Smirnov test showed the data to be normally distributed, ANOVA revealed an overall effect across these noise-vocoded word forms, F(3, 44) = 24.0, p < 0.001. Tukey post-hoc comparisons revealed a significant difference between performance with NB5 and NB2 forms (p < 0.001), but no other condition effects. Examining the performances of individual participants revealed that four performed much as Panzee did. In other words, they showed the best performance with NB5 words, worst for NB2 forms, and virtually identical outcomes for NB4 and NB3 words. Panzee never recognized the words "celery," "noodles," and "raisin" in NB2 form, and 11 of 12 human participants completely failed with these items as well.
3.2.8 Discussion. As in earlier testing, Panzee again demonstrated the ability to reliably identify words in noise-vocoded form. However, her performance was significantly better for words in NB5 form than in corresponding NB2 versions. Humans performed similarly, both in the current work and in comparable earlier studies (Shannon et al., 1995). As expected, increasing numbers of noise bands were associated with higher word-identification performance for both Panzee and the humans, with both species performing as well with NB4 and NB5 forms as in earlier testing with NB7 stimuli (Heimbauer et al., 2011). The results again confirm that noise-vocoded speech based on as few as four noise bands is reliably comprehensible (Souza & Rosen, 2009; Shannon et al., 1995), in this case for a chimpanzee as well. While humans recognized more NB5, NB4, and NB3 words than Panzee in the current experiment, they performed no better than she did with NB2 stimuli. Only one of twelve humans was able to identify the three words that were unintelligible to Panzee in this form.

As hypothesized, Panzee's performance with noise-vocoded words showed evidence of sensitivity to the same cues as human listeners. In both cases, perception was successful in spite of the absence of basic speech features, such as F0 and formant information. Whatever spectro-temporal cues remained, this language-trained chimpanzee was able to take advantage of them. As with sine-wave speech, the outcomes are inconsistent with the SiS perspective and instead support the Auditory Hypothesis. Results are also again indicative of top-down processing, with both species evidently making use of previous knowledge of speech acoustics and phonetic categories in interpreting these fundamentally altered, synthetic versions.
3.3 General Discussion

Experiment 1 was designed to investigate the acoustic cues that Panzee may be utilizing when listening to sine-wave and noise-vocoded speech, comparing her outcomes to analogous human performance. As hypothesized, Panzee showed evidence of sensitivity to the same cues in both synthetic forms as humans, namely stimulus patterning argued to reflect spectro-temporal properties of the original, natural speech signal (Remez et al., 1994). For both sine-wave and noise-vocoded speech recognition, Panzee and the current human participants performed better when the synthetic word forms included the attributes shown to facilitate human performance in comparable previous work. Additionally, Panzee's performance in both Experiments 1a and 1b demonstrated that she can reliably understand speech that has been characterized as missing traditional acoustic cues to phonetic content (Remez et al., 1994). This supports the view that human specializations for perception are not necessary for speech perception (Kuhl, 1988), and that Panzee is demonstrating top-down interpretation of this impoverished speech input in the same way as humans (Davis et al., 2005; Davis & Johnsrude, 2007; Hillenbrand et al., 2011).

In both experiments, a strategy of matching synthetic versions to holistic properties of known words would make identification difficult or impossible. For example, performance based on matching overall duration would not likely differ dramatically across the various forms, as gross temporal properties were preserved across individual sine-wave components and noise bands. In addition, both sine-wave and noise-vocoded versions are dramatically altered relative to natural speech, and their overall "auditory impressions" differ markedly from each other. The reasonable conclusion is that both Panzee and human listeners were able to perceive the synthetic versions as speech in spite of their unusual acoustics.
4. EXPERIMENT 2

The manipulations used to investigate acoustic cues to phonetic content in Experiment 1 are, of course, not the only way to approach basic speech-perception problems. One alternative is to examine phonemes as individual segments within the speech signal, which is the rationale behind recent work with time-reversed speech. As discussed earlier, creating this speech form involves reversing short, fixed portions of the signal. Although temporal properties of the waveform are markedly changed by this time-reversal, the manipulation has little effect on speech intelligibility as long as window length is less than 100 ms (Barkat et al., 2002; Saberi & Perrott, 1999), meaning no longer than the approximate duration of phonetic segments (Crystal & House, 1988). In fact, Saberi and Perrott found that intelligibility of time-reversed speech for English listeners decreased to the 50% threshold level when window lengths were 130 ms or more (see Figure 9). The goal of Experiment 2, therefore, was to assess Panzee's ability to recognize time-reversed speech, testing whether she showed similar processing over phoneme-length windows. It was hypothesized that Panzee does perceive speech based on phonemic segments, and would demonstrate human-like performance when tested with time-reversed speech. If so, the result would constitute evidence that speech perception is based on generalized mechanisms, and would provide support for the Auditory Hypothesis.

Figure 9. Intelligibility of time-reversed speech to humans. The figure is reproduced from Saberi and Perrott (1999), and shows subjective intelligibility ratings of time-reversed sentences by seven participants. Note that 50% intelligibility occurs at a window length of approximately 130 ms.
4.1 Subject

The subject was the chimpanzee Panzee.

4.2 Participants

There were 12 human participants (8 females).

4.3 Stimuli

Twenty of Panzee's previously recorded and processed, three-syllable, natural words were chosen (see Table 1). Three-syllable words were used exclusively to maximize both word length and uniformity among the stimuli, as well as to control for syllable count as a possible cue to word identification. Each of these words was manipulated by applying time-reversal windows of varying lengths, starting at the beginning of the file and continuing through contiguous sections to the end. Words were reproduced in eight forms, using window lengths of 25 ms (TR25), 50 ms (TR50), 75 ms (TR75), 100 ms (TR100), 125 ms (TR125), 150 ms (TR150), 175 ms (TR175), and 200 ms (TR200). The final time-reversed segment in each sound file was almost always shorter than the nominal window length, a factor that will be discussed later in the Results section.
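The windowed-reversal manipulation itself can be sketched in a few lines. This is a minimal illustration of the general technique on a toy signal, not the actual processing script used for the stimuli:

```python
import numpy as np

def time_reverse_windows(x, window_ms, sr):
    """Reverse each successive fixed-length window of the signal.

    The windows stay in their original order; only the samples inside
    each window are flipped. The final segment is usually shorter than
    the nominal window, as noted in the text.
    """
    n = int(round(sr * window_ms / 1000.0))
    return np.concatenate([x[i:i + n][::-1] for i in range(0, len(x), n)])

sr = 1000
x = np.arange(250, dtype=float)           # toy "waveform" of 250 samples
tr100 = time_reverse_windows(x, 100, sr)  # TR100: 100-ms (100-sample) windows
```

With the toy signal above, the third window holds only the remaining 50 samples, mirroring the shorter final segments in the real stimuli.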
4.4 Chimpanzee procedure

The computer testing procedure was the same as in Experiments 1a and 1b; however, test sessions now consisted of 80 trials. Panzee received no auditory feedback on individual trials and was rewarded after every three to four trials, independent of performance. This non-contingent reward regimen was used to avoid the possibility of learning effects across the various time-reversed forms. Panzee did not receive any orientation for time-reversed stimuli. Instead, she simply participated in sessions hearing the 20 words in only natural form before testing began. Each of these sessions consisted of four randomized word blocks, and she was required to perform at a level of at least 70% correct over three consecutive sessions. She reached this criterion after six sessions.
During the first test session, Panzee heard the 20 words one time each in four time-reversed forms: TR50, TR100, TR150, and TR200. In the second test session, on a different day, she heard the same 20 words one time each in the other four forms: TR25, TR75, TR125, and TR175 (see Table 1). Stimuli were randomized within blocks, and Panzee completed both types of sessions four times each, in alternating order. Overall, testing included 4 trials for each of the 20 words in every TR form.

4.5 Human procedure

Two sets of 12 participants were tested. "Group 1" was tested with the same 20 test words used with Panzee in forms TR25, TR75, TR125, and TR175, for a total of 80 randomized trials heard in a single session. "Group 2" was tested similarly, but with the 20 words in forms TR50, TR100, TR150, and TR200.
4.6 Data analysis

Panzee's percentage-correct performance was computed for each of the eight word forms, both within and across six test sessions, which included three trials for each word in each of the eight time-reversed forms. Two sessions were excluded from data analysis because Panzee was consistently distracted and disinterested. During one of these sessions, she constantly moved away from the test screen to look out a window at cars coming and going. During another session, she repeatedly moved away from the test screen to ask for different food rewards. Although she eventually completed these two sessions, her performance was less than 60% correct with even the easiest (most natural-sounding) time-reversed words (e.g., TR25 and TR50).
Binomial tests were conducted to compare performance to a chance rate of 25% for each version separately. A Kruskal-Wallis test was used to test for an overall effect among word forms, with Mann-Whitney U tests applied in post-hoc, pair-wise comparisons. The relationship between performance and time-reversal window length was modeled using linear regression.

In addition to this statistical testing, two threshold values were computed for word intelligibility as a function of window length. These values were based on the rationale that the threshold represents the halfway point between no perception and perfect perception. While the low boundary could be taken as chance-rate performance at 25% correct, the high boundary was not as clear-cut. A "high" threshold was therefore set as the midpoint between 25% and 100% (62.5%), and a "low" threshold was set as the midpoint between 25% and 80% (52.5%); both values were above chance-rate performance (p < 0.001). The 80% boundary represents Panzee's overall, long-term performance level, based on a historic range of 75% to 85% in natural word recognition (M. J. Beran, personal communication, January 2010), as well as her performance in Experiment 1. Panzee's 50% threshold performance using these two threshold values was then interpolated based on intelligibility rates at the closest tested reversal windows.
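The interpolation step can be sketched as linear interpolation between the two tested window lengths whose accuracies bracket the threshold. This is a minimal illustration with hypothetical accuracy values, not Panzee's actual data:

```python
def interpolate_threshold(windows_ms, pct_correct, target):
    """Window length at which accuracy crosses `target`, assuming accuracy
    declines with window length; linear interpolation between the two
    closest tested windows."""
    pairs = list(zip(windows_ms, pct_correct))
    for (w0, p0), (w1, p1) in zip(pairs, pairs[1:]):
        if p0 >= target >= p1 and p0 != p1:
            return w0 + (w1 - w0) * (p0 - target) / (p0 - p1)
    return None  # target not bracketed by the tested windows

# hypothetical accuracies at four tested reversal windows
windows = [75, 100, 125, 150]
accuracy = [70.0, 60.0, 45.0, 40.0]
threshold_ms = interpolate_threshold(windows, accuracy, 52.5)  # -> 112.5
```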
Human mean percentage-correct performance for each word form was computed for Group 1 and Group 2 separately. ANOVA was used to test for an overall effect, with individual word-form results compared using Tukey post-hoc tests. Linear-regression analysis was also applied; for regression purposes, Group 1 and Group 2 participants were treated as a single sample.
4.7 Results

As shown in Figure 10, Panzee's correct word recognition for the six sessions ranged from 49% to 63%, with a mean of 56% over all sessions, and was statistically above chance level, p < 0.001. Performance was also statistically above chance level for all individual word forms. Regression analysis revealed that window length predicted percentage correct, β = -0.17, p < 0.01, and accounted for a significant amount of the variance, R² = .80, F(1, 6) = 23.7, p < 0.01. A Kolmogorov-Smirnov test showed that the data were not normally distributed. A Kruskal-Wallis test was therefore conducted, and revealed an overall performance difference among the TR word forms, χ²(7) = 16.7, p = 0.019. Because Panzee's performance was less than 60% correct for TR125, TR150, TR175, and TR200 words, a Mann-Whitney U test was conducted between TR125 and TR175 to test for a potential difference in intelligibility performance, similar to that reported for humans by Saberi and Perrott (1999). Results did, in fact, reveal a significant difference, p < 0.05. Panzee's high and low 50%-intelligibility threshold points were 92.5 ms and 141.0 ms, respectively.
Figure 10. Experiment 2 chimpanzee and human word recognition. The top figure shows Panzee's time-reversed word performance, corresponding regression line, and chance-level accuracy. Both "high" and "low" thresholds are also noted (see text for further details). The bottom figure shows mean time-reversed word performance for humans, with standard error bars and the regression line. The white line on the TR200 bars represents percentage-correct values for Panzee and human participants when words with final window lengths of 130 ms or less were excluded from analysis. Asterisks denote statistical significance: * = p < 0.05, ** = p < 0.001.
Mean percentage correct was calculated for humans for each TR value, and ranged from almost 100% correct for TR25 to 23% for TR200 (see Figure 10). Regression analysis revealed that reversal window length was a strong statistical predictor of percentage correct, β = -0.45, p < 0.001, and accounted for almost all of the associated variance, R² = .97, F(1, 6) = 200.3, p < 0.001 (see Figure 10). After a Kolmogorov-Smirnov test confirmed normality, ANOVA of Group 1 data showed an overall effect among TR25, TR75, TR125, and TR175, F(3, 44) = 62.3, p < 0.001. Tukey post-hoc comparison tests revealed statistically significant differences between TR25 and TR75, and between TR75 and TR125 (both p < 0.001). However, there was no performance difference between TR125 and TR175 forms. After again confirming normality, an ANOVA showed an overall effect among TR50, TR100, TR150, and TR200 in Group 2 performance, F(3, 44) = 61.2, p < 0.001. Tukey post-hoc comparisons revealed statistical differences between TR50 and TR100, and between TR100 and TR150 (both p < 0.001), but not between TR150 and TR200. The intelligibility threshold value calculated for humans was 121.6 ms.
4.8Discussion
Inthisexperiment,Panzeeagaindemonstratedproficiencywithwordsinnaturalform
duringorientationsessions.Moreimportantly,herperformancerevealedherabilitytoidentify
wordsintime‐reversedform.Time‐reversedspeechhasbeendescribedas“themostdrastic
formoftimescaledistortion”(Licklider&Miller,1960).Despitethedramaticchangesinvolved,
Panzee,likehumanparticipants,recognizedspeechinmanyoftheseforms.Inaddition,the
declineinrecognitionperformance,expectedasthereversalmanipulationcomestoaffect
adjacentphonemes,occurredsimilarlyinPanzeeandthehumans.
Both Panzee's and the humans' performance revealed that time-reversal window length significantly predicted percentage-correct outcomes. As in Saberi and Perrott's (1999) work with sentences, word intelligibility was below 50% at reversal-window lengths of approximately 130 ms for both Panzee and human participants. Using a maximum possible performance of 100%, her 50% intelligibility threshold was just above 90 ms. Setting that level at 80%, her threshold was at 141 ms. Both values are similar to the current mean threshold value of 122 ms for humans, as well as Saberi and Perrott's (1999) outcome of approximately 130 ms. Although Group 1 participants did not demonstrate a significant decrease in performance between TR125 and TR175, they did show a difference between TR75 and TR125 (81% and 48%, respectively). Group 2 participants showed a significant decrease between TR100 and TR150 (63.3% and 33.8%, respectively).
Although Panzee's pattern of decreasing performance from TR25 to TR200 was somewhat different than mean human performance, one participant showed exactly the same pattern as Panzee for TR25, TR75, TR125, and TR175 word forms. This participant performed at 100% on TR25 and TR75 words, and performance declined from TR125 (50%) to TR175 (15%). Furthermore, both Panzee and humans found it difficult to identify "tomato" and "potato" for TR lengths of 125 or more, with Panzee only doing so in one instance ("potato," in TR200 form). Humans showed a total of only 29% correct trials for these two words in the four longer-window forms.
Unexpectedly, Panzee demonstrated better performance on TR200 words than with either TR150 or TR175 forms. While earlier work has only used sentence stimuli (Saberi & Perrott, 1999), this work used word stimuli, which are shorter in duration. It may be that the difference between using sentence- and word-length stimuli had a minor effect on the current results. For 12 of the 20 words, the final reversal window encompassed only 130 ms or less when the words were in TR200 form. This final segment could, therefore, have provided a stronger indication of word identity than intended for both Panzee and the human listeners. In all, 18 of Panzee's 28 (64.3%) correct responses to TR200 words were for these 12 items. In humans, the 12 words accounted for 38.9% of the correct TR200 trials. Excluding these words from analysis produced 13.9% and 13.8% correct for Panzee and the humans, respectively. These outcomes are lower than for any other window length (see Figure 10).
If the hypothesis is correct that time-reversed speech "works" because reversal within a phoneme-length window does not affect processing, then Panzee is evidently listening to phoneme-length segments. If phonemes are typically considered to be 50- to 100-ms long (Crystal & House, 1988), then Panzee did well at those window lengths. It was not clear exactly how to set Panzee's 50% intelligibility threshold. However, the high and low points that were chosen produced values similar to the human threshold. Although this manipulation follows a very different rationale than the one used for producing sine-wave and noise-vocoded speech, Panzee's performance suggests that she segments speech based on the same phoneme-based organization as humans, and may also hear words as a sequence of phonemes. The results indicate that detailed auditory analysis of the short-term acoustic spectrum is not essential. Rather, the amplitude envelope is more likely important, and short time-reversal windows do not appear to significantly impair speech perception of the phoneme information within the window. However, performance declines when windows are long enough to cause overlap of phoneme-length information. This decrease reveals that for both species, extensive alteration of the spectro-temporal cuing can critically affect performance.
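The windowed time-reversal manipulation examined here can be sketched as follows. The sampling rate and the sine tone standing in for a recorded word are assumptions for illustration only. Note that the final window simply contains whatever samples remain, which is how final segments shorter than the nominal window length arise for word-length stimuli.

```python
import numpy as np

def time_reverse(signal, window_samples):
    """Reverse the waveform within successive fixed-length windows."""
    out = np.empty_like(signal)
    for start in range(0, len(signal), window_samples):
        seg = signal[start:start + window_samples]   # last window may be short
        out[start:start + len(seg)] = seg[::-1]      # flip this window
    return out

fs = 44100                                 # assumed sampling rate
window_ms = 50                             # e.g., the TR50 condition
window_samples = int(fs * window_ms / 1000)
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1-s stand-in for a word
reversed_speech = time_reverse(tone, window_samples)
```

Applying the same function with window lengths from 25 to 200 ms would generate the full TR25 to TR200 stimulus series from a single recording.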
5. EXPERIMENT 3
In addition to the questions regarding how specific acoustic elements contribute to lexical information, there are more general perceptual problems that arise. The lack of invariance problem, as described earlier, reflects the high variability in speech acoustics that results from the different physical and physiological properties of individual talkers (Pisoni, 1997). Because of these differences in talker acoustics, speech must be normalized in order for listeners to perceive the common lexical identities of individual words. Listeners routinely normalize speech from both familiar and unfamiliar talkers, despite differences in age, sex, and language background. Acoustically, this variation affects a variety of features, such as F0 range, formant frequencies, speaking rate, and acoustical patterning for a given phoneme (for a review see Benzeghiba et al., 2007).
Not only is there a difference between male and female voices, in that male voices have a lower F0, but children's speech is also very different from adult speech. Children's speech is typically characterized by higher pitch and formant frequencies, especially for vowels (Gerosa, Giuliani, & Brugnara, 2007; Lee, Potamianos, & Narayanan, 1999). In addition, children under the age of seven typically have longer phone duration and larger spectral and temporal variability in consonants and consonant-vowel transitions than older children and adults (Gerosa, Lee, Giuliani, & Narayanan, 2006). It is these characteristics that can often make child speech more difficult to understand than adult speech.
Talker normalization emerges early in human ontogeny, as shown by the finding that infants are more sensitive to talker variation at seven and a half months of age than at ten and a half months (Houston & Jusczyk, 2000). However, the process by which talker normalization occurs is not well understood, and different models have been proposed (Creel & Tumlin, 2009). Some researchers believe that the process is one whereby the listener strips away individual talker information to extract phoneme content in abstract form. Others instead propose that generalizing across talkers is based on learning and implicitly remembering a large number of instances of speech sounds from many different individual talkers (Creel & Tumlin, 2009; Sumner, 2011). The latter view is supported by the fact that learning talker-specific characteristics can improve linguistic processing (Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994; Pisoni, 1995).
Although there are different views regarding how talker normalization occurs, this phenomenon has not been strongly linked to either SiS or Auditory Hypothesis positions. On the one hand, the difficulty of talker normalization could be interpreted as evidence for the SiS argument. On the other hand, some evidence favors the Auditory Hypothesis. For example, some non-primates, such as chinchillas and birds, have demonstrated the ability to categorize speech sounds and monosyllabic words and then generalize performance to at least a small number of unfamiliar talkers (Kuhl & Miller, 1975; Ohms et al., 2010). Budgerigars have demonstrated that they are less sensitive to vowels from different talkers than to vowels between categories (Dooling & Brown, 1990). There is also indirect evidence that apes normalize speech across talkers. For example, Panzee and several language-trained bonobos interact with at least a dozen humans on an everyday basis, and react appropriately to speech commands and requests (M. J. Beran, personal communication, January 2010).
However, the empirical nonhuman normalization studies have typically involved only a few talkers and short, often rudimentary sounds. Thus, Experiment 3 was designed to address talker normalization more systematically in Panzee, providing an opportunity both to test her with speech from a large number of individuals, and to present more complex, lexically meaningful stimuli. As before, the hypothesis was that she would be similar to humans in demonstrating normalization, because she has heard speech from many different talkers through her life and currently responds to English spoken by various individuals. However, both earlier work (Heimbauer et al., 2011) and Experiments 1 and 2 involved the speech of only one talker, MJB. Specific information regarding her potential talker-normalization abilities, therefore, is not available. Experiment 3 included both familiar and unfamiliar adult talkers, with a variety of dialect backgrounds. Words from young children were also presented. Panzee has been exposed to children's voices much less frequently, especially later in her life, either in experiments or in informal interactions. Performance with these talkers was compared to recognition of speech from talker MJB, who is arguably one of the talkers Panzee is most familiar with.
5.1 Subject
The subject was the chimpanzee Panzee.
5.2 Participants
No human listeners were tested. However, audio-recording included a total of 31 different native-English speakers, including 21 adults and 10 children. These talkers included the familiar talker MJB, 5 familiar males, 5 unfamiliar males, 5 familiar females, 5 unfamiliar females, and 5 boys and 5 girls (all unfamiliar). The age range of adult talkers was 20 to 72 years old, and the age range for children was 4 to 7 years old.
5.3 Audio recording and stimuli
Test stimuli consisted of 15 two-syllable, 14 three-syllable, and 3 four-syllable words (see Table 1). All talkers were recorded speaking 48 words, but only 32 were used in the experiment. Some of the words were difficult for the children to pronounce, and they did not always speak clearly. The 32 words were chosen on the basis of finding the best recordings from all 31 talkers. MJB and 20 additional native English-speaking adults were recorded reading the individual words from index cards. Ten native English-speaking children were recorded as they named photographs appearing individually in a Microsoft PowerPoint presentation. If a child could not name a photograph, they were told the word explicitly. The 30 new talkers were grouped as familiar adult males (FAM), familiar adult females (FAF), unfamiliar adult males (UAM), unfamiliar adult females (UAF), unfamiliar male children (UCM), and unfamiliar female children (UCF). MJB was re-recorded for this experiment using the same equipment that was used to record the other 30 talkers.
Talkers were from a wide range of areas within the United States, with a variety of regional dialect backgrounds. MJB was born in Ohio, and had also lived in Alabama and Georgia. The other 10 familiar talkers were born in six different states and in Germany, and had lived in a total of 14 other states and Washington, DC. The northern-most of these states was Michigan, the southern-most was Florida, the eastern-most was New York, and the western-most were California and Oregon. Some talkers had also lived in Germany, Japan, Nepal, Switzerland, and Taiwan. The 10 unfamiliar talkers were born in six different states and in Puerto Rico, and had lived in a total of nine other states. The northern-most of these was New York, the southern-most was Louisiana, the eastern-most was New Jersey, and the western-most were California and Hawaii. One of these talkers had also lived in Germany. All of the children had been born and raised exclusively in either Georgia or New York.
5.4 Chimpanzee procedure
Panzee was tested for a total of 14 sessions, each of which included 80 trials. In the first session, she heard 16 test words (Group E) spoken by MJB. In the next six sessions she heard Group E words spoken by the five talkers within each of the specific talker-type groups. The session order of talker types for testing was: FAM, UAM, FAF, UAF, UCM, UCF. In the eighth session, Panzee heard the remaining 16 test words (Group F) spoken by MJB; this was followed by six sessions with Group F words, one session for each of the six talker-type groups. Testing order differed relative to the earlier sessions and was: UAM, FAF, UCF, FAM, UCM, UAF. In this experiment, Panzee chose from four lexigrams (see Figure 11), instead of four photographs. This change was made to be able to eventually compare resulting data to an earlier, unpublished experiment that also used lexigrams. Panzee received auditory feedback on every trial, and was rewarded for all correct responses. This reward schedule kept her highly motivated, and could be used because each trial was unique.
Figure 11. Samples of lexigrams used in Panzee's spoken-word recognition task.
5.5 Data analysis
Panzee's data were analyzed as in Experiment 2.
5.6 Results
As shown in Figure 12, Panzee's mean correct-trial performance, calculated for each talker and averaged across the two sessions for each talker type, ranged from 75.6% (MJB, UCF) to 81.3% (FAM). Word recognition was significantly above chance level for all talker types, p < 0.001. A Kolmogorov-Smirnov test showed that the data were not normally distributed, and a Kruskal-Wallis test was conducted using combinations of talker types. Because performance for all talker types was similar to annual performance levels (see Figure 12), some talker types were combined to analyze performance comparisons. These analysis categories were: "All Familiar Adults" (FAM and FAF), "All Unfamiliar Adults" (UAM and UAF), "All Adult Males" (FAM and UAM), "All Adult Females" (FAF and UAF), and "All Children" (UCM and UCF). The rationale for combining data from boys and girls was that prior to puberty, the vocal tracts and vocal folds of girls and boys are very similar (Simpson, 2009). Results revealed no overall difference in performance among the collapsed categories, χ2(4) = 0.58, ns.
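With four lexigrams per trial, chance performance is 25%. The specific above-chance test used is not detailed in this section, but the comparison can be illustrated with an exact one-tailed binomial test against that chance rate; the trial and correct-response counts below are hypothetical.

```python
from math import comb

def binom_p_above_chance(correct, trials, chance=0.25):
    """Exact one-tailed binomial p-value: P(X >= correct) under chance."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

# Illustrative numbers: ~80% correct over an 80-trial session vs. 25% chance
p = binom_p_above_chance(64, 80)
print(p < 0.001)  # True
```

The same function shows why performance near the chance rate (e.g., 20 of 80 correct) yields a large p-value and would not be judged above chance.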
Figure 12. Experiment 3 chimpanzee word-recognition performance across talkers. Talker-type groups were as follows: MJB is the familiar male researcher, FAM is other familiar adult males, FAF is familiar adult females, UAM is unfamiliar adult males, UAF is unfamiliar adult females, UCM is unfamiliar boys, and UCF is unfamiliar girls. The dashed line signifies chance-rate performance.
5.7 Discussion
Because speech acoustics are highly variable over a variety of characteristics, listeners have to "solve" the lack of invariance problem almost every time they hear spoken language. Although the process by which this occurs remains unclear (Creel & Tumlin, 2009), speech experience likely plays a major role. Infants demonstrate at least some talker normalization ability by the age of ten and a half months (Houston & Jusczyk, 2000), and adults, having a vast amount of speech experience, routinely normalize across a wide range of talkers (Benzeghiba et al., 2007). Although some nonhumans have shown an ability to discriminate and categorize speech sounds (Dooling & Brown, 1990; Kuhl & Miller, 1975; Loebach & Wickesberg, 2006; Ohms et al., 2010), talker normalization has not previously been systematically investigated in nonhumans.
In the current experiment, Panzee was tested more deliberately, demonstrating the ability to normalize across a range of talkers producing her familiar words. Not surprisingly, Panzee recognized these words when spoken by the familiar researcher MJB at a rate similar to that found in many previous tests with his voice. However, she also showed essentially the same performance when hearing the voices of familiar males and females, unfamiliar males and females, and unfamiliar boys and girls. In addition to age- and sex-related variation, these talkers came from a variety of regional dialect backgrounds. The adults had lived in a total of 26 states, Washington, DC, Puerto Rico, and five other countries. Although the children were only from the northern state of New York and the southern state of Georgia, their voices were the least familiar to Panzee, as well as being very different from the adult voices (Gerosa et al., 2007; Lee et al., 1999).
One possible interpretation of Panzee's performance with familiar talkers is that she had previously learned the features of each person's voice individually, thereby knowing from experience what each of these words sounded like when spoken by these particular individuals. However, that explanation cannot account for her ability to recognize the unfamiliar adults or children. A more likely explanation is that, as hypothesized, Panzee was showing human-like, talker-normalization abilities. Her performance provides evidence in support of the Auditory Hypothesis, rather than the SiS view.
6. GENERAL DISCUSSION
Insights into the generality of the auditory and cognitive processes involved in speech perception are fundamental to resolving the SiS versus Auditory Hypothesis debate. Although arguments by Galantucci et al. (2006) and others almost rule out the possibility of meaningful animal experiments, comparisons between humans and nonhumans are a necessity. Mammalian and close primate relatives are of greatest interest, and the current work demonstrates that comparisons with these animals can be very informative. While many animal studies have investigated perception of rudimentary speech sounds, none has previously presented meaningful words or synthetic versions of those items, or tested a wide range of different talkers. The current work was possible specifically because of Panzee's language-comprehension abilities (Beran et al., 1998; Brakke & Savage-Rumbaugh, 1995; Heimbauer et al., 2011; Rumbaugh & Savage-Rumbaugh, 1996).
6.1 Current results
Experiment 1 investigated the possibility that Panzee uses the same information as humans to identify synthetic speech in sine-wave and noise-vocoded forms (Heimbauer et al., 2011). In sine-wave speech perception, humans perform significantly better when both SW1 and SW2, the components modeled on formants F1 and F2 in the natural speech signal, are present (Remez et al., 1981). Panzee's human-like performance with sine-wave speech indicates that she also attends more to these particular tones in the synthetic speech, with implications for sensitivity to the corresponding formants in natural speech. Similar results occurred with noise-vocoded speech, showing that both Panzee and humans performed best with stimuli that included four or five noise bands, less well with three, and poorly with two. These outcomes match earlier findings with noise-vocoded speech, showing differences in the relative intelligibility of these synthesis forms by humans.
As discussed earlier, it is difficult to specify exactly which acoustic cues are critical in these types of synthesized speech, or what both have in common (Remez et al., 1994). However, it is clear that amplitude and frequency modulation over time is important, both in sine-wave and noise-vocoded speech (Remez et al., 1981; Shannon et al., 1995). Experiment 2 tested temporal cuing differently, using a synthetic speech form Panzee had no experience with. The time-reversed speech used poses a unique perceptual problem in that it disrupts acoustic patterning over time. Intelligibility of time-reversed speech depends on the length of reversed segments relative to typical phoneme length. Differential performance has been attributed to the fact that phonetic content will be relatively undisturbed if the length of the reversal window is within the time frame of typical phoneme length, which is estimated to be 50 to 100 ms (Crystal & House, 1988). Data from both Panzee and the humans reveal a strong relationship between window length and percentage-correct identification, with similar intelligibility thresholds in each case. It was concluded that Panzee was sensitive to phoneme-related cues in time-reversed, as well as in natural, speech.
Finally, Experiment 3 tested Panzee's ability to solve the lack of invariance problem created in identifying speech across a variety of talkers, including both familiar and unfamiliar individuals, adults and children, and a variety of dialects. Despite the high acoustic variability of human voices (Evans & Iverson, 2004; Hillenbrand et al., 1995; Pisoni, 1995; Remez, 2005), Panzee was able to recognize words from all talker types equally well, including the novel condition of children's voices. Panzee's performance was similar to her historic levels in every case, with no significant performance difference between any of the talker groups.
6.2 Implications of experimental results
It is evident from Panzee's performance in the three experiments that a language-trained chimpanzee can provide a unique animal model for investigating speech perception. Overall, these experimental results contradict the SiS perspective and support the Auditory Hypothesis. In addition, results reveal the likely speech perception capabilities of an ape-human common ancestor.
6.2.1 The SiS view versus the Auditory Hypothesis. The SiS view proposes that humans possess evolutionary specializations for speech perception in the form of a speech module or at least a speech mode of perception (Mann & Liberman, 1983; Trout, 2001). In contrast, the Auditory Hypothesis claims that general auditory capabilities are sufficient to process speech in the absence of uniquely human specializations (Kuhl, 1988), at least given the necessary experience with speech input. Many previous animal studies have investigated speech-perception issues based on rudimentary speech elements, and overall results have favored the Auditory Hypothesis. However, the current work and earlier data from experiments testing Panzee (Heimbauer et al., 2011) argue much more strongly in the same direction. In addition to showing evidence of using the available spectro-temporal cues in sine-wave and noise-vocoded speech as humans do, Panzee has also now demonstrated apparent attention to phoneme organization, and a level of talker normalization that arguably solves the lack of invariance problem, at least for individual words.
These outcomes make the SiS view (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987) appear highly unlikely, at least in any strict interpretation. For example, the SiS approach to modularity argues that this specialization is innate, and evolutionarily unique to humans. In this perspective, Panzee cannot have such a module, but she still shows evidence of processing speech as if she does. If a speech module does exist in humans, it is likely to have emerged based on latent speech-perception capabilities that were already present in the common ancestor of humans and chimpanzees, rather than from scratch. The claim of a speech mode of perception is less extreme. A speech mode does not mean that speech-perception capabilities are innate or uniquely human, only that the listener must be able to learn important properties of speech from experience. Such experience is routine in human development, and impossible in exactly the same form in any other animal. However, Panzee did receive consistent exposure to human speech almost from birth, and her demonstrated speech-perception capabilities highlight the critical importance of such experience in the context of the SiS versus Auditory Hypothesis debate.
6.2.2 Top-down processing and speech perception experience. Panzee's abilities also provide evidence pertaining to top-down processing in speech perception. Each of the four tasks presented to her requires top-down processing, or at least is considered to in the context of human speech perception. An innate speech-perception module would be an extreme form of top-down processing, although this approach then downplays the role of experience with speech. Panzee's abilities argue against such a module and in favor of a strong role of experience. Based on her performance, it is more likely that the critical factor in human top-down processing is the vast amount of passive experience that human infants have hearing speech from birth on, rather than a speech module. For instance, experience hearing speech allows infants to learn what speech sounds are being used, how differences among sounds may or may not be significant to categorizing them, and the meanings that sound combinations convey (Marcus et al., 1999; Saffran et al., 1996; Werker & Desjardins, 1995).
Perceiving in a speech mode may be more a matter of experience than innateness. Each of the tasks Panzee was able to perform based on top-down processing may also represent a form of speech-mode perception. While some classic demonstrations of the hypothesized speech mode are not applicable to her, Panzee's ability to perceive fundamentally altered synthetic stimuli as having lexical properties, to compensate or correct for distortions introduced by time-reversal windows, and to map between variable speech acoustics and phonetic features are speech-mode, and top-down, functions.
Panzee’sspeech‐perceptionabilitiesmayalsorepresentanexampleof“emergents”
(Rumbaugh,2002;Rumbaugh,King,Beran,Washburn,&Gould,2007)—definedforboth
humansandanimalsasimportantcomponentsoflearningandcognitionasnewbehaviorswith
75
antecedentsinpreviouslygainedknowledgeorexperience(Rumbaugh&Washburn,2003).
Theydifferfrombehaviorslearnedthroughoperantorclassicalconditioning,areconsidered
commoninnonhumans,andarearguedtoprovidethepotentialbasisfornewandinnovative
actions.Emergentsmaybenecessaryforadaptiveandbehaviorallyflexiblespeciestomeet
newchallengesincomplexenvironmentsandcanbeexpressedindifferentsituationsatthe
firstopportunity.Thespeech‐processingabilitiesthatPanzeehasnowareemergentsinthis
sense,withherextensiveexperiencewithnaturalspeechprovidingthebasisforsolvinga
varietyofperceptualproblems—includingnewones.
These speech-perception abilities are also a testament to Panzee's language-rich and enculturated rearing history (Rumbaugh & Savage-Rumbaugh, 1996). A few other apes have been similarly raised, including the now-adult bonobos Kanzi and Panbanisha. These animals have shown strong evidence of understanding spoken English words (Beran et al., 1998; Brakke & Savage-Rumbaugh, 1995; Rumbaugh & Savage-Rumbaugh, 1996), including when ordered syntactically in meaningful sentences (Savage-Rumbaugh et al., 1993). In contrast, apes raised using the same lexigrams and photographs to communicate, but without early or extensive functional speech input, do not show any notable ability to identify spoken words when tested annually (M. J. Beran, personal communication, January 2010).
6.3 Cognitive processing and language
Although Panzee can recognize familiar words in natural and synthetic versions, she is unlikely to be processing speech exactly as humans do. For instance, she is clearly less efficient than humans with speech in any of the forms tested. Her best performance never approached routine levels of speech recognition in humans, as demonstrated in Experiment 3, in which she never performed above 84% correct even with exclusively natural words. Relatively unimportant factors likely played a role; for example, Panzee may have experienced frustration or boredom from repeatedly being tested with the same limited set of words, with potentially difficult versions presented unpredictably. Although Panzee recognizes the spoken words corresponding to lexigrams and photographs, her inability to produce speech herself sets her apart from all human talkers. In humans, critical aspects of speech knowledge continue to develop throughout childhood, with an individual's own speech production and perceptual knowledge playing a role. Growing awareness of the phonology of language similarly increases both recognition and manipulation of speech sounds (Goswami, 2006, 2008). While Panzee's performance demonstrates that speech production is not necessary for speech perception, production may nonetheless be a critical component of human speech processing.
AlthoughPanzee’sperformancesupportstheAuditoryHypothesisandageneralaudi‐
torymodelofspeechperception,thespecificcognitiveprocessesinvolvedarelargely
unknown.Humansareproposedtohavea“phonologicalloop”inworkingmemorythatstores,
rehearses,andmanipulatesauditoryinputinspeechform(Baddeley&Hitch,1974).No
informationisavailableconcerningPanzee’sworking‐memorycapabilities,andshemaybe
processingspeechinphonologicalratherthanpurelyauditoryforminshort‐termstorage.
Baddeley,Gathercole,andPapagno(1998)havesuggestedthatthephonologicalloopin
humansmightserveasalanguage‐learningdevicewithanintegralroleforbothspokenand
writtenlanguageacquisition.Itispossiblethatdifferencesbetweenthespeechperception
abilitiesofPanzeeandthoseofhumansmaybeduetodifferencesinthedevelopmentof
workingmemoryandpossiblyevenothercognitiveprocesses.Similarly,itisnotknownhow
77
Panzeemapsthewordsshehearsontocorrespondingmeaningsinlong‐termmemory.
Differencesbetweenchimpanzeeandhumanlong‐termmemorymayberesponsibleforthe
decreaseinherefficiencywhenrespondingtowordrecognitiontasks,withoutreflectingimpor‐
tantdiscrepanciesinprocessingandcategorizingthespeechsoundsthemselves.
Panzee may even be at a disadvantage simply due to being unable to take language use to the level of reading. Brain-imaging studies investigating brain development in literate versus illiterate adults provide evidence that reading influences brain structure, specifically being correlated with increased white matter and connectivity in the left hemisphere. The specific areas involved, the corpus callosum, inferior parietal regions, and parieto-temporal regions, are all involved in reading and verbal working memory. Illiterate adults are often more right-lateralized and do not show corresponding white matter and connectivity effects. These results indicate that learning to read is linked to brain plasticity and aids in the development of left-lateralization (Carreiras et al., 2009; Petersson, Silva, Castro-Caldas, Ingvar, & Reis, 2007). Musical training also leads to stronger left lateralization of the perisylvian brain areas associated with language (i.e., Broca's and Wernicke's regions; Limb, Kemeny, Ortigoza, Rouhani, & Braun, 2006), and immature neural responses to rhythmic cues in dyslexic children potentially impede speech development (Goswami, 2006). Increases in overall left lateralization during development may be important for speech-perception efficiency as well.
Other experiments by Petersson and colleagues (2007) demonstrate that literate versus illiterate listeners may engage in different types of cognitive processing when listening to and repeating speech. They proposed that while literate listeners relied solely on language processing, illiterate participants in their study also engaged in visual-spatial processing. Speculatively, a language-trained ape showing less hemispheric lateralization may also be utilizing more visually based processing and "seeing" lexigrams when hearing English words, instead of focusing on their acoustic properties. Panzee may be processing speech differently because of a lack of the experience that humans obtain by learning to read, which then allows them to take advantage of brain plasticity for enhancement of perception (Carreiras et al., 2009; Petersson et al., 2007). In this case, it may be that development of one specific cognitive ability affects the development of another. All the humans tested for comparison to Panzee may have had an advantage due to a combination of genetic and environmental factors contributing to functional hemispheric specialization (Petersson et al., 2007), which is likely the case for many human language abilities.
6.4 Future directions
The current work has only scratched the surface with respect to potential language-related experimental work with Panzee. For example, additional experiments conducted with both Panzee and other language-trained chimpanzees could provide information about speech perception and the underlying mechanisms involved, and could contribute to the discussion of the evolution of associated cognitive processes. As a follow-up to Experiment 1, experiments with Panzee and humans could be conducted to investigate more precisely the mapping of spectro-temporal acoustic cues onto phonetic features and lexical identity. Specifically, the effects of small changes at points in the stimuli that are critical for perception by humans could be compared to comparable changes made at non-critical points.
Results of Experiment 2 also provide opportunities for investigating more detailed aspects of Panzee's speech perception. As demonstrated, time-reversed speech became more distorted and unintelligible as reversal-window length increased. It appears that Panzee, like humans, has had adequate early developmental experience with language to be able to use phoneme-length information to perceive speech. However, Saberi and Perrott (1999) also reported that perception of time-reversed speech was robust to temporal shifts made to entire segments. This manipulation created apparent reverberation in the speech sounds, but did not importantly disturb phonetic perception. Future experiments could investigate Panzee's ability to recognize time-reversed speech using this "delay method" to map the robustness of the top-down processing mechanisms she utilizes.
Panzee’stalker‐normalizationabilitiescanalsobefollowedupon.Resultsofthis
experimentdidnotshedmuchlightonhownormalizationoccurs,althoughthetopicisstill
beingdebatedinthespeechresearchcommunity.Here,Panzeemaybestoringtalker‐specific,
speech‐soundvariants,ashumando(Bradlow,Nygaard,&Pisoni,1999;Creel&Tumlin,2009;
Sumner,2011),andthenusingtheserepresentationstogeneralizetonovel,unfamiliar
instancesofthesesounds.ExperimentswithPanzeecouldintroducehertonewwordsspoken
eitherbymultipletalkersorasingleindividual.Afterwordlearning,herabilitytoperceive
thoseitemsfromnoveltalkers,assyntheticreplicas,orinalteredandreducedversionswould
revealmoredetailsabouthowsheremembersandrepresentsphoneticinformation.
Finally,Panzee’spossibleuseofvisualcuestospeechhasneverbeeninvestigated.
Speechperceptionistypicallyamulti‐modalevent,withcuesavailablebothfromtheacoustic
signalandcorrespondingtalkerarticulationmovements.Attendingtothesevisualcuesis
knowntoaidhumanspeechperceptioninnoisyenvironments(Rosenblum,2005),and
facilitatespre‐linguisticdeafchildreninacquiringsomeaspectsofphonologicalawareness
80
throughlipreading(Dodd&Hermelin,1977).Visualcuesalsohelpsightedchildreninlearning
todistinguishthefunctionalunitsofspokenlanguage,whiletheabsenceofthisinformationisa
detrimenttolanguagelearninginblindchildren(Mills,1987).
Patterson and Werker (2003) have argued that the ability of even young infants to integrate auditory and visual phonetic information is evidence of uniquely human speech mechanisms, a proposal that is well suited to testing with Panzee. Izumi and Kojima (2007) have provided suggestive evidence by testing a chimpanzee with auditory and visual information from vocalizing conspecifics. Their study was limited in the number of vocalizations and visual stimuli that could be presented, limitations that are not applicable in Panzee's case. In fact, she could be tested quite extensively for evidence of the integration of auditory and visual information using a variety of speech sounds, articulatory movements, and talkers.
These are just a few of the questions relating to speech perception capabilities that future research with a language-trained chimpanzee such as Panzee could address. Future studies could investigate a wide range of pertinent issues regarding the details of both perceptual and other cognitive aspects of speech processing, shedding light not only on her particular abilities, but also on those of humans and ancestral apes. The most compelling motivation for pursuing such work, however, may be that Panzee is one of a very small number of animals with these unique speech perception abilities and, therefore, should be involved in such research to the greatest extent possible.
REFERENCES
Allin,E.F.(1975).Evolutionofthemammalianmiddleear.JournalofMorphology,147,403‐
437.
81
Appelbaum, I. (1996). The lack of invariance problem and the goal of speech perception. Fourth International Conference on Spoken Language Processing, Philadelphia, 3, 1541-1544.
Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158-173.
Baddeley, A.D., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), Recent advances in learning and motivation (pp. 47-89). New York: Academic.
Barkat, M., Meunier, F., & Magrin-Chagnolleau, I. (2002). Intelligibility of time-reversed speech in French. ISCA Tutorial and Research Workshop on Temporal Integration in the Perception of Speech, Aix-en-Provence, France, P1-3.
Beckers, G.J.L. (2011). Bird speech perception and vocal production: A comparison with humans. Human Biology, 83, 192-212.
Beecher, M.D. (1974). Hearing in the owl monkey (Aotus trivirgatus) I: Auditory sensitivity. Journal of Comparative and Physiological Psychology, 86, 898-901.
Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., & Wellekens, C. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49, 763-786.
Beran, M.J., Savage-Rumbaugh, E.S., Brakke, K.E., Kelley, J.W., & Rumbaugh, D.M. (1998). Symbol comprehension and learning: A "vocabulary" test of three chimpanzees. Evolution of Communication, 2, 171-188.
Beran, M.J., & Washburn, D.A. (2002). Chimpanzee responding during matching to sample: Control by exclusion. Journal of the Experimental Analysis of Behavior, 78, 497-508.
Best, C.T., Studdert-Kennedy, M., Manuel, S., & Rubin-Spitz, J. (1989). Discovering phonetic coherence in acoustic patterns. Perception & Psychophysics, 45, 237-250.
Boersma, P., & Weenink, D. (2008). Praat: Doing phonetics by computer [Computer program]. Version 5.1.11, retrieved 1 September 2008 from http://www.praat.org/.
Bradlow, A.R., Nygaard, L.C., & Pisoni, D.B. (1999). Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Perception & Psychophysics, 61, 206-219.
Brakke, K.E., & Savage-Rumbaugh, E.S. (1995). The development of language skills in bonobo and chimpanzee: I. Comprehension. Language & Communication, 15, 121-148.
Breedlove, S.M., Watson, N.V., & Rosenzweig, M.R. (2007). Biological psychology (5th ed.). Sunderland, MA: Sinauer.
Carreiras, M., Seghier, M.L., Baquero, S., Estevez, A., Lozano, A., Devlin, J.T., & Price, C.J. (2009). An anatomical signature for literacy. Nature, 461, 983-986.
Carruthers, P. (2002). The cognitive functions of language. Behavioral and Brain Sciences, 25, 657-726.
Corballis, M.C. (2009). The evolution and genetics of cerebral asymmetry. Philosophical Transactions of the Royal Society B: Biological Sciences, 364, 867-879.
Creel, S.C., & Tumlin, M.A. (2009). Talker information is not normalized in fluent speech: Evidence from on-line processing of spoken words. 31st Annual Meeting of the Cognitive Science Society, Amsterdam, 845-850.
Crystal, T.H., & House, A.S. (1988). Segmental durations in connected-speech signals: Current results. Journal of the Acoustical Society of America, 83, 1553-1573.
Cutting, J.E. (1974). Two left-hemisphere mechanisms in speech perception. Perception & Psychophysics, 16, 601-612.
Davis, M.H., & Johnsrude, I.S. (2007). Hearing speech sounds: Top-down influences on the interface between audition and speech perception. Hearing Research, 229, 132-147.
Davis, M.H., Johnsrude, I.S., Hervais-Adelman, A., Taylor, K., & McGettigan, C. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General, 134, 222-241.
de Boer, B. (2006). The evolution of speech. In K. Brown (Ed.), Encyclopedia of language and linguistics (2nd ed.) (Vol. 4, pp. 335-338). Amsterdam: Elsevier.
Diehl, R.L., Lotto, A.J., & Holt, L.L. (2004). Speech perception. Annual Review of Psychology, 55, 149-179.
Dodd, B. (1987). Lip-reading, phonological coding and deafness. In B. Dodd, & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 177-190). London: Erlbaum.
Dodd, B., & Hermelin, B. (1977). Phonological coding by the prelinguistically deaf. Perception & Psychophysics, 21, 413-417.
Dooling, R.J., Best, C.T., & Brown, S.D. (1995). Discrimination of synthetic full-formant and sinewave /ra-la/ continua by budgerigars (Melopsittacus undulatus) and zebra finches (Taeniopygia guttata). Journal of the Acoustical Society of America, 97, 1839-1846.
Dooling, R.J., & Brown, S.D. (1990). Speech perception by budgerigars (Melopsittacus undulatus): Spoken vowels. Perception & Psychophysics, 47, 568-574.
Dorman, M.F., Loizou, P.C., Spahr, A.J., & Maloff, E. (2002). A comparison of the speech understanding provided by acoustic models of fixed-channel and channel-picking signal processors for cochlear implants. Journal of Speech, Language, and Hearing Research, 45, 783-788.
Doupe, A., & Kuhl, P.K. (1999). Birdsong and speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567-631.
Drullman, R. (2006). The significance of temporal modulation frequencies for speech intelligibility. In S. Greenberg, & W.A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 39-47). Mahwah, NJ: Erlbaum.
Ehret, G. (1987). Left hemisphere advantage in the mouse brain for recognizing ultrasonic communication calls. Nature, 325, 249-251.
Eimas, P. (1999). Segmental and syllable representations in the perception of speech by young infants. Journal of the Acoustical Society of America, 105, 1901-1911.
Evans, B.G., & Iverson, P. (2004). Vowel normalization for accent: An investigation of best exemplar locations in northern and southern British English sentences. Journal of the Acoustical Society of America, 115, 352-361.
Fitch, W.T. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4, 258-267.
Fleagle, J.G. (1999). Primate adaptation and evolution (2nd ed.). San Diego: Academic.
Fodor, J.A. (1983). Modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT.
Galantucci, B., Fowler, C.A., & Turvey, M.T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin and Review, 13, 361-377.
Gannon, P.J., Holloway, R.L., Broadfield, D.C., & Braun, A.R. (1998). Asymmetry of chimpanzee planum temporale: Humanlike pattern of Wernicke's language area homolog. Science, 279, 220-222.
Gans, C. (1992). An overview of the evolutionary biology of hearing. In D.B. Webster, R.R. Fay, & A.N. Popper (Eds.), The evolutionary biology of hearing (pp. 3-13). New York: Springer.
Gerosa, M., Giuliani, D., & Brugnara, F. (2007). Acoustic variability and automatic recognition of children's speech. Speech Communication, 49, 847-860.
Gerosa, M., Lee, S., Giuliani, D., & Narayanan, S. (2006). Analyzing children's speech: An acoustic study of consonants and consonant-vowel transition. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 06), France, 1, 393-396.
Goswami, U. (2006). Reading and its development: Insights from brain science. Literacy Today, 46, 28-29.
Goswami, U. (2008). Reading, complexity and the brain. Literacy, 42, 67-74.
Greenwood, D.D. (1961). Critical bandwidth and the frequency coordinates of the basilar membrane. Journal of the Acoustical Society of America, 33, 1344-1356.
Greenwood, D.D. (1990). A cochlear frequency-position function for several species--29 years later. Journal of the Acoustical Society of America, 87, 2592-2605.
Hackney, C.M. (2006). From cochlea to cortex: A simple anatomical description. In S. Greenberg, & W.A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 65-77). Mahwah, NJ: Erlbaum.
Heffner, R.S. (2004). Primate hearing from a mammalian perspective. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology, 281A, 1111-1122.
Heffner, R.S., & Heffner, H.E. (1992). Visual factors in sound localization in mammals. The Journal of Comparative Neurology, 317, 219-232.
Heimbauer, L.A., Beran, M.J., & Owren, M.J. (2011). A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content. Current Biology, 21, 1210-1214.
Hervais-Adelman, A., Davis, M.H., Johnsrude, I.S., & Carlyon, R.P. (2008). Perceptual learning of noise vocoded words: Effects of feedback and lexicality. Journal of Experimental Psychology: Human Perception and Performance, 34, 460-474.
Hienz, R.D., Sachs, M.B., & Sinnott, J.M. (1981). Discrimination of steady-state vowels by blackbirds and pigeons. Journal of the Acoustical Society of America, 70, 699-706.
Hillenbrand, J.M., Clark, M.J., & Baer, C.A. (2011). Perception of sinewave vowels. Journal of the Acoustical Society of America, 129, 3991-4000.
Hillenbrand, J.M., Getty, L.A., Clark, M.J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099-3111.
Hopkins, W.D., Marino, L., Rilling, J.K., & MacGregor, L.A. (1998). Planum temporale asymmetries in great apes as revealed by magnetic resonance imaging (MRI). Neuroreport, 9, 2913-2918.
Houston, D.M., & Jusczyk, P.W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26, 1570-1582.
Hugdahl, K. (2004). Dichotic listening in the study of auditory laterality. In K. Hugdahl, & R.J. Davidson (Eds.), The asymmetrical brain (pp. 441-476). Cambridge, MA: MIT.
Hugdahl, K., & Davidson, R.J. (2004). The asymmetrical brain. Cambridge, MA: MIT.
Izumi, A., & Kojima, S. (2004). Matching vocalizations to vocalizing faces in a chimpanzee (Pan troglodytes). Animal Cognition, 7, 179-184.
Jackson, L.L., Heffner, R.S., & Heffner, H.E. (1999). Free-field audiogram of the Japanese macaque (Macaca fuscata). Journal of the Acoustical Society of America, 106, 3017-3023.
Kalat, J.W. (2009). Biological psychology (5th ed.). Belmont, CA: Wadsworth.
Kashino, M. (2006). Phonemic restoration: The brain creates missing speech sounds. Acoustical Science & Technology, 27, 318-321.
Kimura, D. (1961). Cerebral dominance and the perception of verbal stimuli. Canadian Journal of Psychology, 15, 166-171.
Kluender, K.R., Diehl, R.L., & Killeen, P.R. (1987). Japanese quail can learn phonetic categories. Science, 237, 1195-1197.
Kluender, K.R., Lotto, A.J., & Holt, L.L. (2006). Contributions of nonhuman animal models to understanding human speech perception. In S. Greenberg, & W.A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 203-220). Mahwah, NJ: Erlbaum.
Kojima, S. (1990). Comparison of auditory functions in the chimpanzee and human. Folia Primatologica, 55, 62-72.
Kojima, S., & Kiritani, S. (1989). Vocal-auditory functions in the chimpanzee: Vowel perception. International Journal of Primatology, 10, 199-213.
Kojima, S., Tatsumi, I.F., & Hirose, H. (1989). Vocal-auditory functions of the chimpanzee: Consonant perception. Human Evolution, 4, 403-416.
Kuhl, P.K. (1988). Auditory perception and the evolution of speech. Human Evolution, 3, 19-43.
Kuhl, P.K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831-843.
Kuhl, P.K., & Meltzoff, A.N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138-1141.
Kuhl, P.K., & Miller, J.D. (1975). Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosive consonants. Science, 190, 69-72.
Kuhl, P.K., & Padden, D.M. (1982). Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques. Perception & Psychophysics, 35, 542-550.
Kuhl, P.K., & Padden, D.M. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73, 1003-1010.
Ladefoged, P. (2001). Vowels and consonants: An introduction to the sounds of language. Malden, MA: Blackwell.
Laver, J. (1994). Principles of phonetics. Cambridge, UK: Cambridge University.
Lee, S., Potamianos, A., & Narayanan, S. (1999). Acoustics of children's speech: Developmental changes of temporal and spectral parameters. Journal of the Acoustical Society of America, 105, 1455-1468.
Lenneberg, E.H. (1967). Biological foundations of language. New York: Wiley.
Lewis, D.E., & Carrell, T.D. (2007). The effect of amplitude modulation on intelligibility of time-varying sinusoidal speech in children and adults. Perception & Psychophysics, 69, 1140-1151.
Liberman, A.M. (1982). On finding that speech is special. American Psychologist, 37, 148-167.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, A.M., & Mattingly, I.G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
Liberman, A.M., & Mattingly, I.G. (1989). A specialization for speech perception. Science, 243, 489-494.
Licklider, J.C.R., & Miller, G.A. (1960). The perception of speech. In S.S. Stevens (Ed.), Handbook of experimental psychology (pp. 1040-1074). New York: Wiley.
Lieberman, P. (1968). Primate vocalization and human linguistic ability. Journal of the Acoustical Society of America, 44, 1574-1584.
Limb, C.J., Kemeny, S., Ortigoza, E.B., Rouhani, S., & Braun, A.R. (2006). Left hemispheric lateralization of brain activity during passive rhythm perception in musicians. The Anatomical Record Part A, 288, 382-389.
Lindblom, B.E.F., & Studdert-Kennedy, M. (1967). On the role of formant transitions in vowel recognition. Journal of the Acoustical Society of America, 42, 830-843.
Loebach, J.L., & Wickesberg, R.E. (2006). The representation of noise vocoded speech in the auditory nerve of the chinchilla: Physiological correlates of the perception of spectrally reduced speech. Hearing Research, 213, 130-144.
Mann, V.A., & Liberman, A.M. (1983). Some differences between phonetic and auditory modes of perception. Cognition, 14, 211-235.
Marcus, G., Vijayan, S., Rao, S., & Vishton, P.M. (1999). Rule learning by seven-month-old infants. Science, 283, 77-80.
Masterton, B., Heffner, H., & Ravizza, R. (1969). The evolution of human hearing. Journal of the Acoustical Society of America, 45, 966-985.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
Mills, A. (1987). The development of phonology in the blind child. In B. Dodd, & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 145-162). London: Erlbaum.
Newman, R.S. (2006). Perceptual restoration in toddlers. Perception & Psychophysics, 68, 625-642.
Nygaard, L.C., & Pisoni, D.B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376.
Nygaard, L.C., Sommers, M.S., & Pisoni, D.B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42-46.
Ohms, V.R., Gill, A., Van Heijningen, C.A.A., Beckers, G.J.L., & ten Cate, C. (2010). Zebra finches exhibit speaker-independent phonetic perception of human speech. Proceedings of the Royal Society B: Biological Sciences, 277, 1003-1009.
Olive, J.P., Greenwood, A., & Coleman, J. (1993). Acoustics of American English speech: A dynamic approach. New York: Springer.
Owren, M.J. (2010). GSU Praat Tools: Scripts for modifying and analyzing sounds using Praat acoustics software. Behavior Research Methods, 40, 822-829.
Owren, M.J., & Cardillo, G.C. (2006). The relative roles of vowels and consonants in discriminating talker identity versus word meaning. Journal of the Acoustical Society of America, 119, 1727-1739.
Owren, M.J., Hopp, S.L., Sinnott, J.M., & Petersen, M.R. (1988). Absolute auditory thresholds in three Old World monkey species (Cercopithecus aethiops, C. neglectus, Macaca fuscata) and humans (Homo sapiens). Journal of Comparative Psychology, 102, 99-107.
Patterson, M.L., & Werker, J.F. (2003). Two-month-old infants match phonetic information in lips and voice. Developmental Science, 6, 191-196.
Perfetti, C.A., & Sandak, R. (2000). Reading optimally builds on spoken language: Implications for deaf readers. Journal of Deaf Studies and Deaf Education, 5, 32-50.
Petersen, M.R., Beecher, M.D., Zoloth, S.R., Green, S., Marler, P., Moody, D.B., & Stebbins, W.C. (1984). Neural lateralization of vocalizations by Japanese macaques: Communicative significance is more important than acoustic structure. Behavioral Neuroscience, 98, 779-790.
Petersen, M.R., Beecher, M.D., Zoloth, S.R., Moody, D.B., & Stebbins, W.C. (1978). Neural lateralization of species-specific vocalizations by Japanese macaques (Macaca fuscata). Science, 202, 324-327.
Petersson, K.M., Silva, C., Castro-Caldas, A., Ingvar, M., & Reis, A. (2007). Literacy: A cultural influence on functional left-right differences in the inferior parietal cortex. European Journal of Neuroscience, 26, 791-799.
Pickles, J.O. (1988). An introduction to the physiology of hearing. San Diego: Academic.
Pisoni, D.B. (1995). Some thoughts on "normalization" in speech perception. Research on Spoken Language Processing, Progress Report No. 20, Indiana University, 3-29.
Purnell, T.C., & Raimy, E. (2008). Short time-reversal windows used to investigate the processing of intonation. Unpublished manuscript, University of Wisconsin-Madison.
Ramus, F., Hauser, M.D., Miller, C., Morris, D., & Mehler, J. (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science, 288, 349-351.
Rand, T.C. (1974). Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55, 678-680.
Remez, R.E. (2005). Perceptual organization of speech. In D.B. Pisoni, & R.E. Remez (Eds.), The handbook of speech perception (pp. 28-50). Oxford: Blackwell.
Remez, R.E., Dubowski, K.R., Broder, R.S., Davids, M.L., Grossman, Y.S., Moskalenko, M., Pardo, J.S., & Hasbun, S.M. (2011). Auditory-phonetic projection and lexical structure in the recognition of sine-wave words. Journal of Experimental Psychology: Human Perception and Performance, 37, 968-977.
Remez, R.E., & Rubin, P.E. (1990). On the perception of speech from time-varying acoustic information: Contributions of amplitude variation. Perception & Psychophysics, 48, 313-325.
Remez, R.E., Rubin, P.E., Berns, S.M., Pardo, J.S., & Lang, J.M. (1994). On the perceptual organization of speech. Psychological Review, 101, 129-156.
Remez, R.E., Rubin, P.E., Pisoni, D.B., & Carrell, T.D. (1981). Speech perception without traditional speech cues. Science, 212, 947-949.
Romski, M.A., & Sevcik, R.A. (1996). Breaking the speech barrier: Language development through augmented means. Baltimore: Brookes.
Rosenblum, L.D. (2005). Primacy of multimodal speech perception. In D.B. Pisoni, & R.E. Remez (Eds.), The handbook of speech perception (pp. 51-78). Oxford: Blackwell.
Rosner, B.S., Talcott, J.B., Witton, C., Hogg, J.D., Richardson, A.J., Hansen, P.C., & Stein, J.F. (2003). The perception of "sine-wave speech" by adults with developmental dyslexia. Journal of Speech, Language, and Hearing Research, 46, 68-79.
Rosowski, J.J. (1992). Hearing in transitional mammals: Predictions from the middle-ear anatomy and hearing capabilities of extant mammals. In D.B. Webster, R.R. Fay, & A.N. Popper (Eds.), The evolutionary biology of hearing (pp. 615-632). New York: Springer.
Rumbaugh, D.M. (2002). Emergents and rational behavior. Eye on Psi Chi, 6(2), 8-14.
Rumbaugh, D.M., King, J.E., Beran, M.J., Washburn, D.A., & Gould, K.L. (2007). A salience theory of learning and behavior: With perspectives on neurobiology and cognition. International Journal of Primatology, 28, 973-996.
Rumbaugh, D.M., & Savage-Rumbaugh, E.S. (1996). Biobehavioral roots of language: Words, apes, and a child. In B.M. Velichkovsky, & D.M. Rumbaugh (Eds.), Communicating meaning: The evolution and development of language (pp. 257-274). Mahwah, NJ: Erlbaum.
Rumbaugh, D.M., & Washburn, D.A. (2003). Intelligence of apes and other rational beings. New Haven, CT: Yale University.
Saberi, K., & Perrott, D.R. (1999). Cognitive restoration of reversed speech. Nature, 398, 760.
Saffran, J.R., Aslin, R.N., & Newport, E.L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
Saldana, H.M., Pisoni, D.B., Fellowes, J.M., & Remez, R.E. (1996). Audio-visual speech perception without speech cues. Fourth International Conference on Spoken Language Processing, Philadelphia, 4, 2187-2190.
Savage-Rumbaugh, E.S., Murphy, J., Sevcik, R.A., & Brakke, K.E. (1993). Language comprehension in ape and child. Monographs of the Society for Research in Child Development, 58, 1-222.
Sawusch, J.R. (2005). Acoustic analysis and synthesis of speech. In D.B. Pisoni, & R.E. Remez (Eds.), The handbook of speech perception (pp. 7-27). Oxford: Blackwell.
Shannon, R.V., Fu, Q.J., & Galvin, J. (2004). The number of spectral channels required for speech recognition depends on the difficulty of the listening situation. Acta Oto-Laryngologica, 552, 50-54.
Shannon, R.V., Zeng, F., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270, 303-304.
Sibley, C.G., & Ahlquist, J.E. (1990). Phylogeny and classification of birds: A study in molecular evolution. New Haven, CT: Yale University.
Simpson, A.P. (2009). Phonetic differences between male and female speech. Language and Linguistics Compass, 3, 621-640.
Sinnott, J.M., & Gilmore, C.S. (2004). Perception of place-of-articulation information in natural speech by monkeys versus humans. Perception and Psychophysics, 66, 1341-1350.
Skipper, J.I., van Wassenhove, V., Nusbaum, H.C., & Small, S.L. (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17, 2387-2399.
Souza, P., & Rosen, S. (2009). Effects of envelope bandwidth on the intelligibility of sine- and noise-vocoded speech. Journal of the Acoustical Society of America, 126, 792-805.
Stebbins, W.C. (1973). Hearing of Old World monkeys (Cercopithecinae). American Journal of Physical Anthropology, 38, 357-364.
Stebbins, W.C. (1983). The acoustic sense of animals. Cambridge, MA: Harvard University.
Stebbins, W.C., & Moody, D.B. (1994). How monkeys hear the world: Auditory perception in nonhuman primates. In R.R. Fay, & A.N. Popper (Eds.), Comparative hearing: Mammals (pp. 97-133). New York: Springer.
Studdert-Kennedy, M. (1980). Speech perception. Language and Speech, 23, 45-66.
Studdert-Kennedy, M., & Shankweiler, D. (1970). Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 48, 579-594.
Sumby, W.H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212-215.
Sumner, M. (2011). The role of variation in the perception of accented speech. Cognition, 119, 131-136.
Taglialatela, J.P. (2007). Functional and structural asymmetries for auditory perception and vocal production in nonhuman primates. In W.D. Hopkins (Ed.), The evolution of hemispheric specialization in primates (pp. 120-145). Amsterdam: Elsevier.
Taglialatela, J.P., Russell, J.L., Schaeffer, J.A., & Hopkins, W.D. (2008). Communicative signaling activates "Broca's" homolog in chimpanzees. Current Biology, 18, 343-348.
Tartter, V.C. (1989). What's in a whisper? Journal of the Acoustical Society of America, 86, 1678-1683.
Trout, J.D. (2001). The biological basis of speech: What to infer from talking to the animals. Psychological Review, 108, 523-549.
Trout, J.D. (2003). Biological specializations for speech: What can the animals tell us? Current Directions in Psychological Science, 12, 155-159.
Tuomainen, J., Andersen, T.S., Tiippana, K., & Sams, M. (2005). Audio-visual speech perception is special. Cognition, 96, B13-B22.
Walden, B.E., Busacco, D.A., & Montgomery, A.A. (1993). Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research, 36, 431-436.
Warren, R.M. (1970). Perceptual restoration of missing speech sounds. Science, 167, 392-393.
Warren, R.M., & Obusek, C.J. (1971). Speech perception and phonemic restorations. Perception & Psychophysics, 9, 358-362.
Werker, J.F., & Desjardins, R.N. (1995). Listening to speech in the first year of life: Experiential influences on phoneme perception. Current Directions in Psychological Science, 4, 76-81.
Whalen, D.H., & Liberman, A.M. (1987). Speech perception takes precedence over nonspeech perception. Science, 237, 169-171.
Wood, B. (1996). Human evolution. Bioessays, 18, 945-954.