
Georgia State University
ScholarWorks @ Georgia State University

Psychology Dissertations, Department of Psychology

Summer 8-1-2012

Investigating Speech Perception in Evolutionary Perspective: Comparisons of Chimpanzee (Pan troglodytes) and Human Capabilities
Lisa A. Heimbauer, Georgia State University

Follow this and additional works at: https://scholarworks.gsu.edu/psych_diss

This Dissertation is brought to you for free and open access by the Department of Psychology at ScholarWorks @ Georgia State University. It has been accepted for inclusion in Psychology Dissertations by an authorized administrator of ScholarWorks @ Georgia State University. For more information, please contact [email protected].

Recommended Citation
Heimbauer, Lisa A., "Investigating Speech Perception in Evolutionary Perspective: Comparisons of Chimpanzee (Pan troglodytes) and Human Capabilities." Dissertation, Georgia State University, 2012.
https://scholarworks.gsu.edu/psych_diss/106


INVESTIGATING SPEECH PERCEPTION IN EVOLUTIONARY PERSPECTIVE:

COMPARISONS OF CHIMPANZEE (PAN TROGLODYTES) AND HUMAN CAPABILITIES

by

LISA A. HEIMBAUER

Under the Direction of Michael J. Owren

ABSTRACT

There has been much discussion regarding whether the capability to perceive speech is uniquely human. The "Speech is Special" (SiS) view proposes that humans possess a specialized cognitive module for speech perception (Mann & Liberman, 1983). In contrast, the "Auditory Hypothesis" (Kuhl, 1988) suggests spoken-language evolution took advantage of existing auditory-system capabilities. In support of the Auditory Hypothesis, there is evidence that Panzee, a language-trained chimpanzee (Pan troglodytes), perceives speech in synthetic "sine-wave" and "noise-vocoded" forms (Heimbauer, Beran, & Owren, 2011). Human comprehension of these altered forms of speech has been cited as evidence for specialized cognitive capabilities (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005).


In light of Panzee's demonstrated abilities, three experiments extended these investigations of the cognitive processes underlying her speech perception. The first experiment investigated the acoustic cues that Panzee and humans use when identifying sine-wave and noise-vocoded speech. The second experiment examined Panzee's ability to perceive "time-reversed" speech, in which individual segments of the waveform are reversed in time. Humans are able to perceive such speech if these segments do not much exceed average phoneme length. Finally, the third experiment tested Panzee's ability to generalize across both familiar and novel talkers, a perceptually challenging task that humans seem to perform effortlessly.

Panzee's performance was similar to that of humans in all experiments. In Experiment 1, results demonstrated that Panzee likely attends to the same "spectro-temporal" cues in sine-wave and noise-vocoded speech that humans are sensitive to. In Experiment 2, Panzee showed an intelligibility pattern as a function of reversal-window length similar to that found in human listeners. In Experiment 3, Panzee readily recognized words not only from a variety of familiar adult males and females, but also from unfamiliar adults and children of both sexes. Overall, results suggest that a combination of general auditory processing and sufficient exposure to meaningful spoken language can account for speech-perception evidence previously proposed to require specialized, uniquely human mechanisms. These findings in turn suggest that speech-perception capabilities were already present in latent form in the common evolutionary ancestors of modern chimpanzees and humans.

INDEX WORDS: Speech perception, Evolution, Language-trained chimpanzee, Synthetic speech


INVESTIGATING SPEECH PERCEPTION IN EVOLUTIONARY PERSPECTIVE:

COMPARISONS OF CHIMPANZEE (PAN TROGLODYTES) AND HUMAN CAPABILITIES

by

LISA A. HEIMBAUER

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

in the College of Arts and Sciences

Georgia State University

2012


Copyright by
Lisa A. Heimbauer
2012


INVESTIGATING SPEECH PERCEPTION IN EVOLUTIONARY PERSPECTIVE:

COMPARISONS OF CHIMPANZEE (PAN TROGLODYTES) AND HUMAN CAPABILITIES

by

LISA A. HEIMBAUER

Committee Chair: Michael J. Owren

Committee: Rose A. Sevcik

Gwen Frishkoff

Michael J. Beran

Electronic Version Approved:

Office of Graduate Studies

College of Arts and Sciences

Georgia State University

August 2012


DEDICATION

I dedicate this dissertation to my family, especially my loving husband Gary, who has supported me unconditionally and unselfishly since I decided to begin my education. I am also grateful to my parents, Albert and Marie, for instilling in me a love of education, and my brother Bill and daughter-in-law Tara, who were always there when I needed someone to listen. To my children, Randy, Stephanie, Melanie, and Gary, and my grandchildren, Holly, Samantha, Thomas, John, and Leah: I especially hope my journey inspires you to always follow your dreams, wherever they lead you.


ACKNOWLEDGMENTS

I am truly indebted and grateful to my advisor, Michael Owren, for the support and guidance he provided throughout the writing of my dissertation. Additionally, his rigorous scholarship and theoretical beliefs on the evolution of communication have been inspiring and have helped to shape my program of research during my graduate school education. These pages also reflect the relationships with other generous and inspiring individuals who have influenced me, served on my committees, and provided support and friendship during my time at Georgia State University and the Language Research Center, especially Mike Beran.

In addition, I am obliged to others who have supported me by contributing to my education, especially dissertation committee members Rose Sevcik and Gwen Frishkoff, and faculty members Sarah Brosnan and Seyda Özçaliskan. I would also like to thank the members of the Research on the Challenges of Acquiring Language and Literacy initiative and the Duane M. Rumbaugh Fellowship committee for the generous fellowship support. Finally, I am grateful to the graduate students who have provided intellectual and moral support, and more importantly, friendship.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
   1.1 Human speech
      1.1.1 Natural speech-perception phenomena
      1.1.2 Synthetic speech perception
   1.2 The SiS argument
   1.3 The Auditory Hypothesis
      1.3.1 Mammalian hearing
      1.3.2 Speech perception in non-primates
      1.3.3 Speech perception in nonhuman primates
      1.3.4 Evidence of top-down processing when perceiving speech
   1.4 Speech perception in a language-trained chimpanzee
      1.4.1 Previous experiments with Panzee
2. CURRENT EXPERIMENTS
   2.1 General Methods
      2.1.1 Subject
      2.1.2 Participants
      2.1.3 Apparatus
      2.1.4 Stimuli
      2.1.5 Chimpanzee procedure
      2.1.6 Human procedures
      2.1.7 Data analysis
3. EXPERIMENT 1
   3.1 Experiment 1a
      3.1.1 Subject
      3.1.2 Participants
      3.1.3 Stimuli
      3.1.4 Chimpanzee procedure
      3.1.5 Human procedure
      3.1.6 Data analysis
      3.1.7 Results
      3.1.8 Discussion
   3.2 Experiment 1b
      3.2.1 Subject
      3.2.2 Participants
      3.2.3 Stimuli
      3.2.4 Chimpanzee procedure
      3.2.5 Human procedure
      3.2.6 Data analysis
      3.2.7 Results
      3.2.8 Discussion
   3.3 General Discussion
4. EXPERIMENT 2
   4.1 Subject
   4.2 Participants
   4.3 Stimuli
   4.4 Chimpanzee procedure
   4.5 Human procedure
   4.6 Data analysis
   4.7 Results
   4.8 Discussion
5. EXPERIMENT 3
   5.1 Subject
   5.2 Participants
   5.3 Audio-recording and stimuli
   5.4 Chimpanzee procedure
   5.5 Data analysis
   5.6 Results
   5.7 Discussion
6. GENERAL DISCUSSION
   6.1 Current results
   6.2 Implications of experimental results
      6.2.1 The SiS view versus the Auditory Hypothesis
      6.2.2 Top-down processing and speech perception experience
   6.3 Cognitive processing and language
   6.4 Future directions
REFERENCES


LIST OF TABLES

Table 1. Orientation and test word groups

Table 2. Lower-to-upper noise-band cutoff frequencies


LIST OF FIGURES

Figure 1. Spectrographic word examples
Figure 2. Panzee's synthetic-speech word-recognition performance
Figure 3. Samples of photographs used in Panzee's spoken-word identification
Figure 4. Panzee working on a computer task
Figure 5. Intelligibility of sine-wave speech to humans
Figure 6. Experiment 1a chimpanzee and human word recognition
Figure 7. Intelligibility of noise-vocoded speech to humans
Figure 8. Experiment 1b chimpanzee and human word recognition
Figure 9. Intelligibility of time-reversed speech to humans
Figure 10. Experiment 2 chimpanzee and human word recognition
Figure 11. Samples of lexigrams used in Panzee's spoken-word identification task
Figure 12. Experiment 3 chimpanzee word-recognition performance across talkers


1. INTRODUCTION

Speech perception is the ability to hear and recognize the acoustics of spoken language. It involves many levels of processing, from the auditory input to the comprehension of lexical meaning. Lenneberg (1967) claimed that both speech production and speech perception are uniquely human adaptations, a view later termed "Speech is Special" (SiS) by Liberman (1982). In contrast, studies with nonhumans have revealed that some animals are able to discriminate and categorize phonemes, the smallest units of speech sounds, much as humans do (Kluender, Diehl, & Killeen, 1987; Kuhl & Miller, 1975; Kuhl & Padden, 1982, 1983). It may therefore be that auditory processing in humans and nonhumans is fundamentally similar, as proposed by Kuhl's (1988) "Auditory Hypothesis." In this view, a common evolutionary ancestor of humans and other mammals possessed latent speech-processing capabilities that predated speech itself. Evolution of human speech-production capabilities would then have taken advantage of existing auditory processing.

Numerous experiments have investigated both human and nonhuman speech perception to evaluate these opposing views. For example, evidence proposed to support the SiS approach includes the finding that humans are able to recognize meaningful speech in a number of fundamentally altered, synthetic forms (Remez, 2005; Trout, 2001). These abilities have been difficult to show in nonhumans. Animals typically do not understand word meaning and are tested with the very brief, meaningless phoneme components of speech instead. More recently, however, experiments with a language-trained chimpanzee (Pan troglodytes) named Panzee have demonstrated the ability to recognize synthetic speech in some of the highly reduced forms humans have been tested with (Heimbauer, Beran, & Owren, 2011). These outcomes support the Auditory Hypothesis, suggesting that comprehension of meaningful speech does not require perceptual specializations. Instead, listeners can apply general auditory processes shared with pre-hominin ancestors.

After discussing general aspects of human speech perception and production, key findings from experiments investigating SiS and the Auditory Hypothesis will be reviewed. Three new experiments conducted with Panzee, which further investigated the speech-perception capabilities of this chimpanzee, will then be presented. The first of these experiments extended an earlier finding demonstrating that Panzee was able to recognize spoken words presented in "sine-wave" and "noise-vocoded" forms (Heimbauer et al., 2011). Both are synthetic versions of speech that reduce normal speech acoustics to small sets of sine waves and noise bands, respectively. Based on Panzee's previous results, it was hypothesized that she would rely on the same acoustic cues as humans when hearing these speech forms (Remez, Rubin, Pisoni, & Carrell, 1981; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). Alternatively, Panzee could rely on a more holistic approach, for instance matching general characteristics of synthetic words to corresponding natural versions. To test this hypothesis, both Panzee and human participants were tested with varying versions of sine-wave and noise-vocoded speech, using stimuli similar to those used in previous human testing (Remez et al., 1981; Shannon et al., 1995).

In the second experiment, Panzee and humans heard words presented in "time-reversed" form. This manipulation involves separating the speech signal into fixed time-length segments and then reversing these "windows" throughout the signal. Humans find speech in this form highly intelligible for reversal windows up to approximately 100 milliseconds in length (Saberi & Perrott, 1999), with intelligibility decreasing as window length increases. Here, the relationship between phoneme length and window length is proposed to be the critical feature. Specifically, if the reversal window is smaller than the average duration of a phoneme, the manipulated speech remains comprehensible. However, if window length exceeds phoneme duration, then the speech becomes critically distorted. Based on Panzee's robust speech-perception abilities, it was again hypothesized that Panzee would perform similarly to human listeners, thereby providing evidence of attending to the phoneme content of speech. Alternatively, if Panzee's speech recognition relies on a different strategy than humans use, for instance a more holistic approach, she would not show the same window-length-dependent intelligibility function.

The third experiment investigated Panzee's ability to understand speech from a variety of talkers. This capability is important because the acoustic characteristics of speech can vary significantly among both individual talkers and classes of talkers (Pisoni, 1995; Remez, 2005). As Panzee routinely interacts with, and responds to, different individuals, it was hypothesized that she would readily perceive familiar words regardless of the talker. Alternatively, she may simply have become accustomed to the speech of particular, known individuals without being able to normalize to novel talkers. To investigate this hypothesis, Panzee was tested for recognition abilities with the speech of a variety of male and female familiar adults, unfamiliar adults, and children.

1.1 Human speech

When perceiving natural speech, listeners are faced with many perceptual challenges. The most important is to combine and categorize the diverse acoustic elements making up the speech stream (Remez, 2005), thereby mapping signal acoustics onto their linguistic correlates (Pisoni, 1995). Words are produced differently each time they are uttered, and minimal differences between words often matter. Researchers, therefore, have focused on how listeners can recognize specific speech sounds in the face of potentially large acoustic differences and then use this information to understand the phonetic content of spoken language (Appelbaum, 1996).

The acoustic characteristics of speech are illustrated in spectrographic form in Figure 1a. A spectrogram is a visual depiction of the time-varying properties of sound, with time displayed on the horizontal axis, frequency on the vertical axis, and amplitude shown as the darkness of shading at any particular point (Olive, Greenwood, & Coleman, 1993). One important characteristic is the fundamental frequency (F0), which corresponds to the basic rate of vibration of the vocal folds in the larynx. F0 is typically the lowest prominent frequency visible in a speech spectrogram. Regular vibration also produces energy at integer multiples of F0, referred to as "harmonics" (H2 and H3 are labeled in the figure). Energy from the larynx subsequently passes through the pharynx, oral cavity, and nasal cavity, whose resonances act to filter this energy. These resonances, termed "formants," strengthen the energy in some frequency regions while weakening it in others. These effects are visible in a spectrogram as larger, dark bands of energy. The lowest three bands, often also called formants, are also labeled in the figure (F1, F2, and F3).


Figure 1. Spectrographic word examples. The spectrograms were created using a sampling rate of 44100 Hz and a 0.03-s Gaussian analysis window. a) The natural word "tickle," showing its fundamental frequency (F0), next higher harmonics (H2 and H3), and lowest three formants (F1, F2, and F3). b) The word "tickle" in sine-wave form, with individual sine waves (SW) marked. c) The word "tickle" in noise-vocoded form, made with five noise bands (NB). d) The word "tickle" in time-reversed form, with a 50-ms time-reversal (TR) window noted.
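For readers who wish to produce a display comparable to Figure 1a, the following is a minimal sketch, not the procedure used for the dissertation figures. It assumes a mono recording in a WAV file (the name tickle.wav is a placeholder) and uses SciPy with parameter values taken from the caption (44100-Hz sampling rate, 0.03-s Gaussian analysis window).

    # Minimal spectrogram sketch (illustrative only; "tickle.wav" is a hypothetical file name).
    # Parameters follow the Figure 1 caption: 44100-Hz sampling, 0.03-s Gaussian analysis window.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram, windows

    fs, x = wavfile.read("tickle.wav")          # assumes a mono recording at 44100 Hz
    x = x.astype(float)

    win_len = int(0.03 * fs)                    # 0.03-s analysis window
    win = windows.gaussian(win_len, std=win_len / 6)

    f, t, Sxx = spectrogram(x, fs=fs, window=win, noverlap=win_len // 2)

    plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto", cmap="gray_r")
    plt.ylim(0, 5000)                           # region containing F0, H2, H3, and F1-F3
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.title('Spectrogram of the word "tickle"')
    plt.show()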


The perceptual challenge of recognizing acoustically variable speech is formally known as the "lack of invariance" problem. Speech acoustics can be highly variable, with individuals showing systematic differences in speech rate, F0s, and formant values. The latter are even more different among classes of talkers, for instance males versus females, and adults versus children. In addition to varying due to physical and physiological properties, speech acoustics can also differ due to factors such as talker emotional state or regional dialect (Pisoni, 1995). The result is that there is no simple mapping between acoustic structure and phonetic units, meaning that listeners have to categorize a talker's phonemes in the absence of invariant cuing. Humans nonetheless routinely recognize speech from both familiar and unfamiliar talkers, an ability referred to as "talker normalization."

1.1.1 Natural speech-perception phenomena. Listeners accomplish a broad array of auditory perceptual tasks both when learning and after having mastered language, many of which seem effortless. The majority of language skills are learned implicitly, without instruction, relatively passively, and with minimal conscious attention. Infants as young as eight months old can segment and organize phonemes and words from a continuous, acoustic speech stream (Marcus, Vijayan, Rao, & Vishton, 1999; Saffran, Aslin, & Newport, 1996). Categorization of individual phonemes has been demonstrated at even younger ages (Werker & Desjardins, 1995), which is likely important in learning how different language elements combine. Experiments by Werker and Desjardins (1995) revealed that at 6 to 8 months of age, infants discriminated phonemes across a variety of languages, but by 10 to 12 months of age had become tuned to the phonemes of the language spoken by their caregivers.


In addition to implicit perception of natural speech, there are automatic perceptual phenomena that occur when humans hear speech in altered or distorted forms. One notable example is the ability to identify words containing a phoneme or a syllable that has been replaced by white noise. This capability has been termed "phonemic restoration" by Warren (1970) and is so deeply rooted cognitively that participants can not only identify the words, but also report hearing them in their entirety without perceptible gaps (Warren & Obusek, 1971). One interpretation of this phenomenon is that the missing segment is re-created in the brain, even in the absence of that sound (Kashino, 2006). This ability to "hear" the missing segment presumably enables listeners to fill in the gaps that routinely occur when speech is heard in noisy, everyday situations. Another phenomenon of interest has been termed "duplex perception" (Rand, 1974). Duplex perception occurs when a short sine wave, sounding like a chirp, is presented to one ear and is perceptually integrated into an otherwise incomplete phoneme being played to the other ear (Whalen & Liberman, 1987). Davis and Johnsrude (2007) suggested that duplex perception illustrates that human listeners actively attempt to organize sound into perceptible speech whenever possible. In fact, this tendency to organize likely extends to any form of distorted speech.

1.1.2 Synthetic speech perception. Studies using fundamentally altered speech forms have been invaluable for understanding how the human cognitive system organizes acoustic elements of speech for meaningful language comprehension. Here, three forms of altered speech are of particular interest. Two of these, sine-wave and noise-vocoded speech, lack many of the acoustic features traditionally considered crucial to speech perception, including F0 and formants (see Figures 1b and 1c). The third form, time-reversed speech, alters moment-to-moment temporal patterning in the signal (see Figure 1d). In each case, human listeners rely on their extensive experience with spoken language to make sense of the input. Being able to do so is considered a form of "top-down processing," whereby a listener takes advantage of previously learned acoustic and phonetic information (Davis & Johnsrude, 2007; Mann & Liberman, 1983; Newman, 2006; Whalen & Liberman, 1987). Top-down processing is likely critical in normative speech perception as well as in difficult listening situations, and processing of synthetic words and sentences is useful for understanding how acoustic input can contribute to recognition of speech at various levels of organization (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005; Hillenbrand, Clark, & Baer, 2011; Remez et al., 2009; Saberi & Perrott, 1999).

Since 1981, sine-wave speech has been investigated for the purpose of understanding spoken-language processing (Lewis & Carrell, 2007; Remez et al., 2009; Remez et al., 1981; Rosner et al., 2003). In this synthesis form, words or sentences are produced from three sine waves that track the first three formants of the natural speech signal (see Figure 1b). Sine-wave speech is extremely unnatural-sounding and is considered to preserve key phonetic properties only in an abstract form (Remez et al., 2009). In their experiments, Remez and colleagues (1981) presented a sine-wave sentence to human listeners. When the participants were not told that these sounds could be understood as speech, they described them as "science-fiction sounds" or "whistles." When they were told that they would be hearing sentences produced by a computer, however, the listeners were typically able to identify a substantial number of the syllables and words in the sentence. The researchers concluded that perception of sine-wave speech was evidence for a "speech mode of perception," and that listeners expecting to hear a language-like stimulus tuned into this mode. Even in the absence of traditional acoustic cues, listeners were able to perceive phonetic content in the sine-wave signal.
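As an illustration of the synthesis step described above (three sine waves tracking the first three formants), the following is a minimal sketch under stated assumptions. It presumes that formant frequency and amplitude tracks have already been measured at a fixed frame rate (for example, with a phonetics tool such as Praat); the tracks f1, f2, and f3 and their amplitudes below are placeholders, not values from the dissertation stimuli.

    # Minimal sine-wave speech sketch: sum three sine waves whose frequencies follow
    # measured formant tracks. The formant tracks below are placeholders; in practice
    # they would come from acoustic analysis of a natural utterance.
    import numpy as np

    fs = 44100                      # output sampling rate (Hz)
    frame_rate = 100                # formant measurements per second (assumed)

    # Placeholder formant tracks (Hz) and amplitudes for a short utterance.
    n_frames = 50
    f1 = np.linspace(300, 700, n_frames)
    f2 = np.linspace(1200, 1800, n_frames)
    f3 = np.linspace(2500, 2600, n_frames)
    amps = [np.ones(n_frames) * a for a in (1.0, 0.6, 0.3)]

    samples_per_frame = fs // frame_rate

    def tone_from_track(freq_track, amp_track):
        """Synthesize one time-varying sine wave from per-frame frequency/amplitude values."""
        freq = np.repeat(freq_track, samples_per_frame)        # sample-by-sample frequency
        amp = np.repeat(amp_track, samples_per_frame)
        phase = 2 * np.pi * np.cumsum(freq) / fs               # integrate frequency to get phase
        return amp * np.sin(phase)

    # Sum the three "formant" sine waves to obtain the sine-wave replica.
    sw_speech = sum(tone_from_track(f, a) for f, a in zip((f1, f2, f3), amps))
    sw_speech /= np.max(np.abs(sw_speech))                     # normalize to avoid clipping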

Another altered form of interest is "noise-vocoded" speech, which is synthesized from noise bands (see Figure 1c). To create noise-vocoded speech, the natural signal is divided into a number of frequency bands using individual band-pass filters. The intensity pattern, or amplitude envelope, of each band is extracted over the length of that signal. The resulting envelopes are then used to modulate corresponding, frequency-limited bands of white noise. The result is a series of amplitude-modulated noise waveforms that, when summed, potentially becomes recognizable as harsh but comprehensible speech (Davis et al., 2005; Shannon et al., 1995).
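The following is a minimal sketch of that procedure under stated assumptions: a mono signal x sampled at fs Hz, Butterworth band-pass filters, and envelope extraction by rectification and low-pass filtering. The band edges, filter orders, and envelope cutoff are illustrative choices, not the settings used for the dissertation stimuli.

    # Minimal noise-vocoding sketch: band-pass filter the signal, extract each band's
    # amplitude envelope, use it to modulate a matching band of white noise, and sum.
    # Band edges and filter settings are illustrative, not the dissertation's values.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def bandpass(sig, lo, hi, fs, order=4):
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, sig)

    def envelope(sig, fs, cutoff=30.0, order=2):
        """Amplitude envelope via rectification and low-pass filtering."""
        b, a = butter(order, cutoff / (fs / 2), btype="low")
        return filtfilt(b, a, np.abs(sig))

    def noise_vocode(x, fs, band_edges):
        rng = np.random.default_rng(0)
        out = np.zeros_like(x, dtype=float)
        for lo, hi in zip(band_edges[:-1], band_edges[1:]):
            env = envelope(bandpass(x, lo, hi, fs), fs)                 # speech-band envelope
            noise = bandpass(rng.standard_normal(len(x)), lo, hi, fs)   # matching noise band
            out += env * noise                                          # amplitude-modulate the noise
        return out / np.max(np.abs(out))

    # Example: a five-band vocoder (compare Figure 1c), with hypothetical band edges in Hz.
    # fs, x = scipy.io.wavfile.read("tickle.wav")  # assumed mono input
    # vocoded = noise_vocode(x.astype(float), fs, band_edges=[100, 400, 900, 1800, 3500, 7000])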

Perception of noise-vocoded speech is of particular interest because it is a simulation of the input produced by a cochlear implant, a surgically implanted electronic device for the hearing-impaired (Dorman, Loizou, Spahr, & Maloff, 2002). However, noise-vocoded speech is also useful in investigating speech perception in normally hearing individuals, as it preserves the amplitude and temporal information of the original utterance while omitting most spectral detail (Shannon et al., 1995). Even in the absence of F0 and formants, noise-vocoded speech can carry a surprising amount of information regarding phonemes (Dorman et al., 2002; Sawusch, 2005). One critical factor is the number of noise bands used in the synthesis process. Listeners cannot reliably recognize noise-vocoded speech created with only two noise bands. However, recognition becomes much more consistent if three or four bands are present (Shannon et al., 1995). When ten or more noise bands are used, noise-vocoded speech is readily intelligible even to naive listeners (Davis et al., 2005). Individuals hearing speech in this synthesis form typically show improvement with practice. For example, Davis et al. (2005) reported that identification of noise-vocoded words in sentences increased from less than 10% to 70% correct within just a few minutes.

The most recently developed synthesis form of interest is time-reversed speech (Barkat, Meunier, & Magrin-Chagnolleau, 2002; Purnell & Raimy, 2008; Saberi & Perrott, 1999). As shown in Figure 1d, such sounds contain segments of equal length that have been reversed in time, typically on a millisecond (ms) scale. This manipulation preserves the amplitude of each frequency component at every point in time, but reverses the pattern of energy changes within each window. The resulting disruption of the amplitude envelope could make the signal unintelligible, but listeners are reliably able to recognize speech content at window lengths up to 100 ms (Saberi & Perrott, 1999). Window lengths exceeding 100 ms produce partial intelligibility, with the 50% "threshold" point for intelligibility occurring at 130 ms or more. The interpretation is that individual phonetic segments ("phones") in speech range from approximately 50 to 100 ms (Crystal & House, 1988). In other words, reversal windows up to 100 ms long leave many individual phonemes undisturbed. However, longer windows break up more and more individual phonemes, thereby making the speech unintelligible. Barkat et al. (2002) confirmed this view of phoneme-length perception in finding that French-speaking participants hearing French sentences showed a different 50% intelligibility threshold for time-reversed speech. Although these listeners also demonstrated decreased speech recognition as window lengths increased, intelligibility fell more slowly. The 50% threshold was reached at a window approximately 20 ms longer than for English-speaking listeners hearing English sentences, likely reflecting a longer mean phoneme duration in French compared to English.
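The manipulation itself is simple to express in code. Below is a minimal sketch, assuming a mono signal x sampled at fs Hz; it chops the waveform into fixed-length windows (for example, the 50-ms window noted in Figure 1d) and reverses each window in place. The helper name time_reverse is an illustrative choice, not the dissertation's implementation.

    # Minimal time-reversal sketch: split the waveform into fixed-length windows and
    # reverse each window in place (compare the 50-ms TR window noted in Figure 1d).
    import numpy as np

    def time_reverse(x, fs, window_ms):
        """Return a copy of x with every window_ms-long segment reversed in time."""
        win = int(round(fs * window_ms / 1000.0))   # window length in samples
        out = x.copy()
        for start in range(0, len(x), win):
            out[start:start + win] = x[start:start + win][::-1]
        return out

    # Example: versions with 25- to 200-ms reversal windows in 25-ms steps, the range
    # later used with Panzee and human listeners in Experiment 2.
    # fs, x = scipy.io.wavfile.read("tickle.wav")  # assumed mono input
    # reversed_versions = {w: time_reverse(x, fs, w) for w in range(25, 225, 25)}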


Overall, results of synthetic-speech perception experiments provide evidence that sine-wave, noise-vocoded, and time-reversed versions can be intelligible. It is not clear, however, what exactly these three synthesis forms have in common to make each recognizable to listeners. For sine-wave and noise-vocoded speech, Remez et al. (1994) characterize the critical cues as "spectro-temporal" patterning, while also noting that the two versions do not show obvious commonalities. This interpretation is arguably supported by findings from time-reversed speech experiments, which show that preserving spectro-temporal information within phonemes is a critical factor (see also Drullman, 2006; Remez, 2005).

1.2 The SiS argument

In 1967, Liberman and colleagues proposed that in order to perceive speech, humans must draw on their implicit knowledge of how phonemes are articulated. They hypothesized that spoken words are perceived by identifying associated vocal tract gestures, rather than by identifying sound patterns of speech itself. This hypothesis was termed the "Motor Theory of Speech Perception," and implies that only producers of speech can also perceive speech. Therefore, both speech production and speech perception would be "special" to humans. Later, Fodor (1983) revived the historical concept of mental modularity from Gall's phrenology, in which individual mental faculties are associated with domain-specific areas of the brain. Fodor suggested that these specialized cognitive modules operate individually on domain-specific input. This theory "upped the ante" for uniquely human speech perception, with Liberman and colleagues (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987) then claiming that humans have a specialized cognitive module responsible for speech perception.


In support of this SiS argument, a variety of human speech-perception phenomena have been proposed as evidence for a speech mode, a Fodorian phonetic module, or both. For example, both phonemic restoration and duplex perception have been cited as evidence for a uniquely human speech module (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987). Findings pertaining to neural mechanisms of speech perception have also been interpreted in this way. For instance, most humans show a "right-ear advantage" (REA) when processing the phonetic elements of speech, meaning that input presented to the right ear is recognized more quickly and accurately than that presented to the left ear. The REA is attributed to phonetically based language processing being primarily lateralized to the left hemisphere in approximately 95% of the population (Hugdahl, 2004; Kimura, 1961; Studdert-Kennedy & Shankweiler, 1970).

Historically, dichotic listening experiments have been used to investigate language-related lateralization effects (for a review, see Hugdahl & Davidson, 2004). In these experiments, different phonetic stimuli are presented simultaneously to the two ears, and participants are instructed to report what is heard in one ear or the other. Studdert-Kennedy and Shankweiler (1970) used dichotic listening to show that both hemispheres are involved in processing purely auditory parameters of a speech signal. However, they found that left lateralization occurs when specifically linguistic features are attended to. They concluded that these findings provide evidence of a specialized linguistic device located in the left hemisphere, thereby supporting the SiS view. Cutting (1974) investigated the REA associated with syllable and phoneme perception, using stylized, synthetic vowel and consonant-vowel syllables and sine-wave facsimiles that were not perceived as speech sounds. He concluded that there might be two processing mechanisms in the left hemisphere: one dedicated to speech per se, and the other responding to rapid frequency changes in any auditory input, which processes formant movement in speech as well as frequency modulation in other sounds.

It should be noted that some animals, including nonhuman primates, have also demonstrated asymmetries for perception of vocalizations (for reviews see Corballis, 2009; Taglialatela, 2007). Japanese macaques (Macaca fuscata) have shown an REA and left-hemisphere advantage for species-specific vocalizations (Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978; Petersen et al., 1984), as have mice (Ehret, 1987). Magnetic resonance imaging studies of gorilla (Gorilla gorilla gorilla), orangutan (Pongo pygmaeus), chimpanzee (Pan troglodytes), and bonobo (Pan paniscus) brains have revealed a larger planum temporale in the left hemisphere, paralleling an asymmetry in the corresponding area of the human brain (Gannon, Holloway, Broadfield, & Braun, 1998; Hopkins, Marino, Rilling, & MacGregor, 1998). In humans, the planum temporale is known to be involved in both production and perception of spoken language (discussed in more detail below).

1.3 The Auditory Hypothesis

Despite a variety of evidence in support of the SiS hypothesis, the claim that only speech producers can also perceive speech is problematic, because only humans have the vocal-tract morphology capable of producing speech (de Boer, 2006; Fitch, 2000). The descended larynx, in conjunction with the hyoid bone and tongue root, enables production of diverse formant patterns by humans (Lieberman, 1968). Therefore, the uniqueness of human anatomy makes it difficult to refute SiS claims. In addition, humans who are mute can still routinely perceive speech (Studdert-Kennedy, 1980), and some animals have demonstrated at least the ability to discriminate and categorize speech sounds much as humans do (Kluender, Diehl, & Killeen, 1987; Kuhl & Miller, 1975; Kuhl & Padden, 1982, 1983). Trout (2001) has proposed that refuting the SiS view requires demonstrating a common cognitive or biological substrate for speech perception in humans and nonhuman animals. In his view, such a demonstration is needed to support the argument that although the human ability to produce speech is unique among mammals, the same may not be true for speech perception. However, to date, neither a common mechanism nor a common functional organization for speech perception has been shown across species. It may be, instead, that human speech-production capabilities are more specifically adapted to language than the auditory system is.

Not surprisingly, numerous auditory-perception studies with animals have been conducted to test claims of human specializations for speech. At least some evidence suggests that perception of speech is rooted in the same general mechanisms of audition and perception that evolved to handle other classes of environmental sounds (Diehl, Lotto, & Holt, 2004). The next section, therefore, reviews some basics of mammalian hearing, followed by speech-perception evidence from research with animals, including both non-primates and nonhuman primates.

1.3.1 Mammalian hearing. The auditory-perception capabilities of humans and other mammals may very well have evolved similarly, as the general morphology and functions of the auditory system have been similar in all mammals since the middle ear was transformed from its reptilian form (Allin, 1975; Fleagle, 1999; Stebbins, 1983). In mammals, the basic function of the ear is amplification and transmission of sound, in addition to sensory transduction. The outer ear consists of the pinna, used for gathering and localizing sound. Sound waves are then funneled along the ear canal to the tympanic membrane (eardrum). The external ear filters sound somewhat, passing energy in the 2 to 5 kHz frequency range best, a range that is important for speech perception (Breedlove, Watson, & Rosenzweig, 2007). Sound pressure waves cause the tympanic membrane to vibrate, which in turn moves the three small bones (ossicles) of the middle ear. The last bone in the chain, the stapes, transmits this energy to the oval window, a small membrane at the base of the cochlea of the inner ear. Here, energy from the original cyclical air pressure changes to waves in fluid (Kalat, 2009; Pickles, 1988). Finally, transduction occurs when these cochlear fluid waves cause neural firing in the eighth cranial nerve, which is transmitted to the auditory cortex (for further details, see Hackney, 2006).

The auditory system is similar in all mammals, functioning as a sound localizer and showing sensitivity across a range of sound frequencies. However, that range varies with head size (and associated inter-aural distance), as sound localization requires species with smaller heads to be sensitive to higher-frequency energy (Gans, 1992; Heffner, 2004; Masterton, Heffner, & Ravizza, 1969). Smaller mammals thus typically hear well above the nominal 20-kHz limit of human hearing. The earliest mammals were likely similar (Rosowski, 1992), as are most nonhuman primates (Beecher, 1974; Jackson, Heffner, & Heffner, 1999; Masterton et al., 1969; Owren, Hopp, Sinnott, & Petersen, 1988; Stebbins, 1973; Stebbins & Moody, 1994). However, frequency sensitivity has also been subject to change over the course of primate evolution, for instance due to habitat differences as well as body-size effects (Owren et al., 1988).

One such change was that larger-bodied monkeys and apes acquired better low-frequency hearing. This trend has resulted in both chimpanzees and humans showing well-developed hearing at both low (below approximately 1 kHz) and mid-range frequencies (1 to 8 kHz). Stebbins (1973) has proposed that these hearing changes may have been related to evolutionary pressures for more intricate, intra-specific vocal communication systems. This view is consistent with human speech perception potentially being grounded in the more general auditory processing capabilities of larger mammals rather than being species-specific. Stebbins (1983) has also shown that hearing in chimpanzees is more similar to that of humans than to that of Old and New World monkeys, which can hear frequencies as high as 40 to 45 kHz. His review included several auditory parameters (high- and low-frequency sensitivity, lowest frequency threshold, best frequency, and area of the audible field) for five mammalian species, including a non-primate (tree shrew), a prosimian (bushbaby), and three anthropoid primates (macaque, chimpanzee, and human). Because low- and mid-range frequency hearing is common to chimpanzees and humans, overall human auditory sensitivity may facilitate communication without being a specialization for speech.

In contrast to the chimpanzee hearing capabilities reviewed by Stebbins (1983), Kojima (1990) reported that two chimpanzees were notably less sensitive than humans to frequencies between 2 and 4 kHz. Although Kojima found external-ear resonances to be approximately the same for both species, the chimpanzee sensitivity pattern showed a pronounced decrease in mid-range sensitivity, more similar to patterns found in Old and New World monkeys (Beecher, 1974; Stebbins, 1973) than in humans. However, 2 to 4 kHz is also the range of highest energy in chimpanzee screams (see Figure 1b in Riede, Arcadi, & Owren, 2007), which are frequently very loud. It may instead be that the decreased mid-range sensitivity of Kojima's two captive subjects reflected hearing loss due to repeated exposure to high-amplitude screaming by conspecifics in the animals' confined housing spaces. If so, chimpanzees without any hearing impairment should be able to process critical frequencies of speech as well as humans.

1.3.2 Speech perception in non-primates. Research attempting to find commonalities between human and nonhuman-animal speech perception has mainly focused on perception of rudimentary elements of spoken language. In support of the Auditory Hypothesis, experiments with chinchillas by Kuhl and Miller (1975) and Loebach and Wickesberg (2006) have revealed that these animals perceive and discriminate at least some individual phonemic features of speech. In their seminal study, Kuhl and Miller (1975) demonstrated that chinchillas discriminated between consonant-vowel syllables differing in voice-onset time. This feature is the length of time between initial consonant articulation movements and the onset of vocal-fold vibration. Specifically, the chinchillas were trained to respond differently to a variety of initial /t/ and /d/ consonant-vowel syllables produced by eight talkers. The animals also correctly classified novel instances of initial /t/ and /d/ syllables, including syllables produced by four new talkers, syllables produced in new vowel contexts, and computer-synthesized /ta/ and /da/ syllables.

In a neurological study, Loebach and Wickesberg (2006) demonstrated that there might be a common physiological substrate in the peripheral auditory system of chinchillas and humans involved in recognition of speech cues. The animals showed auditory-nerve responses when hearing syllables in natural and noise-vocoded form that resembled those shown by humans hearing the same sounds. Loebach and Wickesberg presented four syllables produced by male talkers in both natural form and resynthesized form using one, two, three, or four noise bands. Despite the different spectral profiles of natural and noise-vocoded speech, the chinchillas responded similarly to humans to the cues to consonant identity. In humans, common spectro-temporal features of natural and noise-vocoded speech provide these cues for speech recognition, and perception is enhanced as the number of noise bands used in noise-vocoding synthesis increases (Shannon et al., 1995). Finding parallel performance in chinchillas suggests that these animals respond to the same spectro-temporal cues in noise-vocoded and natural speech that humans respond to.

Birds are not mammals, and therefore are not close evolutionary relatives of humans, but avian communication abilities should also be mentioned. Although the class Aves originated approximately 50 million years ago (Sibley & Ahlquist, 1990), research with birds also provides evidence that humans may be utilizing general mechanisms in speech processing. Vocal communication in birds shows some parallels to human abilities. For example, many songbird species use complex vocalizations, and similar underlying developmental and mechanistic processes may be involved (for a review, see Beckers, 2011, and Doupe & Kuhl, 1999). Psychoacoustic studies have shown that at least some birds demonstrate some of the perceptual phenomena proposed to be special to humans hearing speech, specifically phoneme discrimination and categorization, compensation for coarticulation, and an ability to solve the lack-of-invariance problem. As one example, both songbirds and non-songbirds have demonstrated the ability to discriminate phoneme variations of the vowel /a/ (Hienz, Sachs, & Sinnott, 1981). Japanese quail (Coturnix coturnix) can also discriminate and categorize initial-consonant syllables (Kluender et al., 1987), while budgerigars (Melopsittacus undulatus) discriminate vowel categories and are more sensitive to phonemic vowel distinctions than to talker-related vowel variation (Dooling & Brown, 1990). More recently, Ohms, Gill, Van Heijningen, Beckers, and ten Cate (2010) found that zebra finches (Taeniopygia guttata) could discriminate and categorize monosyllabic words that differ in vowels, and can generalize this ability to unfamiliar male and female talkers. Results such as these suggest that some birds have an ability to normalize specific components of speech across talkers. Budgerigars and zebra finches have even demonstrated the ability to discriminate both full-formant and sine-wave versions of /ra/-/la/, revealing some similarities to human discrimination of speech and speech-like sounds (Best et al., 1989; Dooling et al., 1995). Clearly, birds share at least some rudimentary speech-perception capabilities with humans.

1.3.3 Speech perception in nonhuman primates. Nonhuman primates, closer evolutionary relatives of humans than other mammals and birds are, also demonstrate auditory processing capabilities that support the Auditory Hypothesis. In two studies, Kuhl and Padden (1982, 1983) tested rhesus macaques (M. mulatta), an Old World monkey, and human infants to compare discrimination of voiced and voiceless phonemes. The difference between these two is that voiced sounds include vocal-fold vibration and voiceless sounds do not. Monkeys were first trained to categorize these phoneme types in a "same-different" procedure, and afterward were tested with unfamiliar syllable pairs on a stimulus continuum ranging from voiced to voiceless (e.g., /ba/-/pa/). In the second experiment, the subjects heard syllable pairs differing in "place of articulation" (e.g., /b/-/d/). This feature can refer to tongue placement during speech production. For example, when /b/ is produced, the lips come together and the tongue is held away from the teeth. However, when /d/ is produced, the lips are separated and the tongue touches the ridge above the top of the teeth. Both experiments revealed that the macaques divided the stimulus continuum at the same physical points as humans, meaning they showed similar boundary points in categorization based on both voicing and place.


More recently, it has been demonstrated that Japanese macaques can perceive the articulation events of speech, although their performance better resembles that of human infants than of adults. Sinnott and Gilmore (2004) investigated perception of place-of-articulation information in natural speech by monkeys and adult humans, presenting consonant-vowel tokens consisting of /b/ or /d/ combined with /i/, /e/, /a/, or /u/. Tongue placement is different across these sounds, with /a/ and /u/ described as "back" vowels, and /i/ and /e/ as "front" vowels. In front vowels the tongue is positioned as far forward as possible without creating a constriction that would produce a consonant sound. In back vowels the tongue is positioned as far back as possible without creating a constriction.

Sinnott and Gilmore (2004) used a two-choice identification task, whereby the monkeys and human participants had to actively classify /b/ versus /d/ consonant-vowel stimuli by moving a lever. Humans performed well with all stimuli, while the monkeys performed better with tokens based on the back vowels /a/ and /u/ than with the front vowels /i/ and /e/. An earlier study had shown that three- to four-month-old human infants also classify back vowels more easily (Eimas, 1999), evidently learning how to reliably differentiate front vowels over time. Sinnott and Gilmore therefore concluded that the monkeys' performance reflected basic auditory-system processing, as is also found in preverbal infants before critical speech-related learning occurs.

There is also evidence that cotton-top tamarins (Saguinus oedipus), a New World monkey, perform similarly to one-month-old infants when discriminating Dutch versus Japanese sentences (Ramus, Hauser, Miller, Morris, & Mehler, 2000). Although testing methods were quite different for the two species, both were able to distinguish these languages. In addition, both species did so in the face of at least modest talker-related variability, as stimuli from a total of four speakers of each language were used. Neither monkeys nor infants were able to discriminate the languages when sentences were played backwards. The researchers concluded that the infants were likely using innate, generalized auditory processing shared across nonhuman primates, if not all mammals.

As mentioned earlier, both behavioral and neuroanatomical studies have revealed that primates other than humans show hemispheric lateralization effects (Taglialatela, 2007). Specifically, several macaque species exhibit an REA for communicatively relevant vocalizations. In chimpanzees, brain-imaging studies have revealed parallels to Wernicke's and Broca's areas in humans, both of which are considered critical in language functions. Wernicke's area is located in the planum temporale of the temporal lobe and is involved in human language perception and comprehension. This region can be as much as five times larger in the left hemisphere than in the right. In chimpanzees, this brain structure was significantly larger in the left hemisphere in 94% of the MRI scans examined by Gannon and colleagues (1998). Broca's area is located in the inferior frontal gyrus and is critical in speech production. Similarly, this area is activated in chimpanzees when these animals vocalize (Taglialatela, Russell, Schaeffer, & Hopkins, 2008).

Only a limited number of studies have tested perceptual discrimination of speech sounds in apes. In one such experiment, Kojima et al. (1989) presented two chimpanzees with consonant-vowel syllables, using synthetic /ga/-/ka/ and /ba/-/da/ continua to examine perception of voicing and place-of-articulation contrasts, respectively. Using response times as an index of perceived similarity, the chimpanzees demonstrated better discrimination of syllables when phonetic contrasts were based on the same features of voicing and articulation that humans attend to. As in the earlier-mentioned Sinnott and Gilmore (2004) experiment with macaques, the chimpanzees did not perform as well as humans, likely reflecting the importance of early experience in human speech-perceptual development. As noted earlier, one unique aspect of human language acquisition is the importance of learning and environmental input.

1.3.4 Evidence of top-down processing when perceiving speech. Speech-perception studies with nonhuman primates appear to demonstrate the operation of general auditory processing, in support of the Auditory Hypothesis. As in studies with non-primates, however, this work has not provided evidence of the higher-level, top-down processing capabilities claimed to be evidence of a uniquely human cognitive module (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987). Such processing is potentially difficult to demonstrate with the rudimentary types of stimuli used in these studies, with top-down effects emerging more clearly in humans at the level of words and sentences. In other words, distinguishing SiS theory from the Auditory Hypothesis also requires examining higher-order processing of meaningful speech in nonhumans, rather than of meaningless speech segments.

In fact, acoustic cues to phonemes and syllables may not be processed in the same way as lexical components of language. For example, the former could be processed as non-speech sounds are, using "bottom-up" perception that is not strongly guided by higher-level knowledge of meaningful language. Because chimpanzees and humans are close evolutionary relatives, with divergence from a common ancestor occurring 5 to 8 million years ago (Wood, 1996), investigating word recognition by these apes could provide compelling evidence for the SiS versus Auditory Hypothesis debate. As Stebbins (1983) noted, chimpanzees may be the species to shed light on whether the ability to perceive meaningful speech was present in latent form in hominins before the evolution of mechanisms to produce speech.

Revising the Motor Theory of Speech Perception, Galantucci, Fowler, and Turvey (2006) recently suggested that the only compelling evidence for neural hardware specialized for speech would be discovering a dedicated circuit active "if and only if" speech is perceived or produced. Galantucci et al. further argue that speech cannot be understood in isolation, that production and perception components necessarily work together, and that spoken language is critically embedded in a communicative context. These are restrictive arguments that again tend to inherently rule out the possibility of animal experiments. As mentioned earlier, given that nonhuman primates do not speak, it is impossible to test them for a dedicated circuit involved in both speech production and perception.

Another point is that very few nonhumans are raised in a speech-rich environment, meaning that animals typically have no opportunity to experience the input that is crucial to human language development. A very few apes, however, have been raised in a manner similar to humans. Through a combination of language exposure and enculturation by humans, these animals acquired both meaningful communicative and word-recognition abilities (Beran, Savage-Rumbaugh, Brakke, Kelley, & Rumbaugh, 1998; Brakke & Savage-Rumbaugh, 1995; Rumbaugh & Savage-Rumbaugh, 1996; Savage-Rumbaugh, Murphy, Sevcik, & Brakke, 1993). While approaching the otherwise impossible criteria proposed by Galantucci et al. (2006; see also Trout, 2001), language-capable apes also have the potential to be tested with meaningful speech.


1.4 Speech perception in a language-trained chimpanzee

Language-trained apes arguably present a convincing, and possibly unique, opportunity for investigating speech perception in nonhumans and settling the SiS versus Auditory Hypothesis debate. Specifically, if speech perception utilizes a specialized human cognitive module, then a language-trained ape should not be able to identify or understand speech presented in the altered, synthetic forms proposed to require this specialization. Furthermore, such an animal should arguably not exhibit talker normalization for meaningful speech, given the associated lack-of-invariance problem. However, previous research with one particular language-trained ape suggests otherwise, in support of the Auditory Hypothesis.

1.4.1 Previous experiments with Panzee. To examine whether apes and humans can show fundamental similarities in speech processing, a recent series of experiments assessed the perceptual capabilities of a chimpanzee named Panzee. This animal is an adult female, housed at the Language Research Center (LRC) at Georgia State University (GSU). She was raised routinely hearing speech from the age of eight days, and shows reliable recognition of approximately 130 spoken English words. Panzee was also taught corresponding visuo-graphic symbols, called lexigrams, and can use both these symbols and associated photographs to communicate in everyday and experimental situations. When Panzee hears a familiar English word, she is reliably able to choose the correct, corresponding lexigram or photograph from among multiple alternatives (Beran et al., 1998). To date, Panzee has been tested with natural, whispered, and several forms of synthetic speech (Heimbauer et al., 2011; Heimbauer, unpublished data).


To test Panzee, 48 two- to five-syllable words were chosen from among her familiar English words. In all the experiments, Panzee first heard a word in natural or synthetic form. Her task was then to use a joystick to select the one lexigram or photograph corresponding to that word from among four such items appearing on a computer screen. Panzee received no feedback for correct or incorrect choices during test sessions, ruling out the possibility of learning how to respond to altered versions of the words. However, she did receive a reward every three or four trials on a randomized, noncontingent basis. Sixteen different natural and eight different test words were presented in each experimental session, with sessions including four blocks of these words in randomized order, for a total of 96 trials. Typically, a session lasted 20 to 30 minutes.
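As an illustration of how a 96-trial session of this kind could be assembled, the following is a minimal sketch under stated assumptions; the word labels are placeholders rather than the actual stimuli, and the reward schedule simply draws a gap of three or four trials at random, as described above.

    # Minimal sketch of one session: 4 blocks of (16 natural + 8 test) words in randomized
    # order, with a noncontingent reward every 3 or 4 trials. Word lists are placeholders.
    import random

    natural_words = [f"natural_{i}" for i in range(16)]   # placeholder labels
    test_words = [f"test_{i}" for i in range(8)]          # placeholder labels

    trials = []
    for _ in range(4):                                    # four blocks per session
        block = natural_words + test_words
        random.shuffle(block)                             # randomized order within each block
        trials.extend(block)
    assert len(trials) == 96

    # Noncontingent reward schedule: a reward after every 3rd or 4th trial, chosen at random.
    reward_trials = []
    t = 0
    while t < len(trials):
        t += random.choice([3, 4])
        if t <= len(trials):
            reward_trials.append(t)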

Annual testing over a 10-year period ending in 2008 has demonstrated that Panzee's session performance for natural words is consistently between 75% and 85% correct (M. J. Beran, personal communication, January 2010). When tested with words in whispered form, she performed similarly, at or above 75%. Subsequent testing with words in four different synthetic forms produced the same results (Heimbauer, unpublished data). One form simply reproduced the original utterance as closely as possible, but another form was less complete. This less complete, "voiced-only" form included only tonal elements of the original, meaning that any noise-based components were removed. Perception of voiced-only speech thus involves top-down processing, as unvoiced, noisy acoustics are important contributors to many English phonemes. Despite the missing sounds, Panzee showed no difference in performance with voiced-only versus natural words, even when considering only "first trials," the 48 instances in which she heard a given word in synthetic form for the first time (Heimbauer, unpublished data).

The two remaining synthesis forms were selected to be directly relevant to the SiS versus Auditory Hypothesis debate, with outcomes shown in Figure 2. One was noise-vocoded speech, synthesized using seven noise bands. This form can be challenging to humans, but is relatively comprehensible to most listeners (Davis & Johnsrude, 2007; Davis et al., 2005; Hervais-Adelman et al., 2008; Shannon et al., 1995). Despite this potential challenge, Panzee's performance was statistically well above chance level both on first trials and overall. Her performance with these versions was statistically below the outcomes for natural words presented in the same session, but she showed no evidence of having learned how to respond to them (Heimbauer et al., 2011). The last synthetic form tested was sine-wave speech, described as "science-fiction sounds" and "whistles" by naive human listeners (Remez et al., 1981) and "impossibly unspeechlike" by prominent speech researchers (Remez et al., 1994). Again, Panzee's performance was above chance on first trials and overall. While she was less accurate identifying sine-wave words than natural words, human performance was similar in a transcription task, even with humans receiving orientation to sine-wave speech and hearing all the natural-word stimuli ahead of time (Heimbauer et al., 2011).

The results of these noise-vocoded and sine-wave speech experiments provided the first evidence of human-like performance by a nonhuman responding to meaningful, but incomplete and perceptually difficult, synthetic speech. Top-down processing abilities are necessary to identify speech in both forms, with both Panzee and human listeners needing to access previous knowledge about natural speech in order to identify altered versions of the words. Given that some researchers have linked perception of both noise-vocoded and sine-wave speech to a proposed phonetic module (e.g., Trout, 2001; Whalen & Liberman, 1987), these results directly contradict SiS arguments while supporting the Auditory Hypothesis. However, similarities in recognition performance do not necessarily imply similarities in underlying processing, and it remains unclear if Panzee was using the same perceptual strategies as humans to perform this task.

Figure 2. Panzee's synthetic-speech word-recognition performance. The figure is reproduced from Heimbauer, Beran, and Owren (2011) and includes means and standard errors of percentage-correct performance for 48 words heard in natural, noise-vocoded, and sine-wave forms. First trials represent the 48 first instances of the chimpanzee hearing a word in a given synthetic form. The first set of sine-wave results shows performance with noncontingent, intermittent reward delivery and no response feedback. The second set shows performance with contingent reward received on natural trials but with no reward or response feedback on sine-wave trials. The dashed line indicates the chance-performance rate of 25% correct. All comparisons to chance performance were statistically significant at p < 0.008 and are marked by a pair of asterisks.

Page 42: Investigating Speech Perception in Evolutionary

28

2. CURRENT EXPERIMENTS

In the debate about the existence of, or necessity for, human specializations for speech perception, Panzee's natural and synthetic speech word-recognition abilities provide evidence for the Auditory Hypothesis view. Her performance with noise-vocoded and sine-wave words in particular suggests that human speech perception is grounded in generalized auditory capabilities and extensive experience with speech rather than specialized processing mechanisms. These studies do not, however, allow an unequivocal conclusion that Panzee's speech processing is fundamentally similar to human perception. For example, it is possible that Panzee is able to recognize her relatively small number of familiar words through more holistic judgments, based either on duration or on overall aural impressions (see Heimbauer et al., 2011). Testing the SiS view against the Auditory Hypothesis more definitively would require specific evidence about detailed aspects of Panzee's processing strategies.

Therefore, the current experiments were designed to extend previous work on Panzee's speech perception, specifically investigating whether she relies on auditory perceptual mechanisms similar to those of humans. Experiment 1 examined the acoustic cues Panzee may be attending to when hearing speech in sine-wave (Experiment 1a) and noise-vocoded (Experiment 1b) forms. For humans, sine-wave speech becomes more difficult to recognize when either the first (SW1) or second sine wave (SW2) of the three sine waves is removed (Remez et al., 1981). Hypothesizing that Panzee would also show evidence of relying disproportionately on these cues, Experiment 1a compared her performance to that of human participants when hearing four critically different versions of sine-wave words. In Experiment 1b, Panzee and human participants were tested with noise-vocoded words produced from varying numbers of noise bands. Previous research has shown that humans find it easier to recognize sentences produced with four or more noise bands (Shannon et al., 1995), attributed to the fact that increased numbers of noise bands enhance the amplitude- and frequency-modulation information represented (Davis et al., 2005; Shannon et al., 1995). Based on her previous performance with words in noise-vocoded form, it was again hypothesized that Panzee would show performance similar to that of humans as a function of the number of noise bands used in synthesis.

Experiment 2 focused on time-reversed speech. Here, previous research with humans has revealed that phonemes are cued over segments of roughly 50 to 100 ms (Crystal & House, 1988), with shorter windows in time-reversed form leaving intelligibility largely unaffected, but longer windows having a significant detrimental impact (Saberi & Perrott, 1999). The second experiment, therefore, was designed to investigate whether Panzee's perception also relies on cuing over this time frame. In this study, she heard test words with reversal windows that ranged from 25 ms to 200 ms in eight 25-ms increments, and human participants were tested for comparison purposes. The prediction tested was that, like humans, Panzee would perform best with window lengths less than the average duration of English phonemes, with intelligibility decreasing to the 50% threshold level for a window length of approximately 130 ms, as found for human performance by Saberi and Perrott (1999).

Experiment 3 investigated the lack of invariance problem. While humans seem to effortlessly accommodate the acoustic variability found in speech from different talkers, evidence of corresponding talker-normalization capabilities in nonhumans is suggestive, but limited. As Panzee has heard and responded to speech from a variety of talkers through her lifetime, this last experiment was designed to test her talker-normalization capabilities more systematically. Stimuli included speech from a diverse set of talkers, including variation in biological sex, age, and dialect background. In addition, some of the individual talkers were familiar to Panzee and others were unfamiliar.

2.1 General Methods

2.1.1 Subject. The subject was the female chimpanzee Panzee, who was 25 years old when the current experiments began. This animal is socially housed with three conspecifics at the LRC at GSU. Panzee has daily access to indoor and outdoor areas, unlimited access to water, and is fed fruits and vegetables three times a day. She participates in testing on a voluntary basis and may choose not to participate or to stop responding during a session. Panzee uses a language-like, lexigram-based communication system to request items throughout the day and often during experimental situations. In addition to language-comprehension testing using lexigrams and photographs, this animal also has experience with numerous computer-based protocols (Rumbaugh & Washburn, 2003). In the three experiments, she participated in three to four 20- to 30-minute sessions per week, and worked for favored food items. She was tested in an indoor area of her daily living space, which was adjacent to other chimpanzee areas. During test sessions, other chimpanzees could be either indoors or outdoors, with the option of moving between those areas at will.

2.1.2 Participants. Human participants were undergraduates, aged 18 to 55 years old, and were recruited via the GSU on-line experiment-participation system. Each participant was tested in only one of the applicable experiments -- either Experiment 1a, 1b, or 2. Only participants without reported hearing problems and who were native English speakers were included in analyses.

2.1.3 Apparatus. Computer programs used to test the chimpanzee were written in Visual Basic Version 6.0 (Microsoft Corp., Redmond, WA) and run on a Dell Dimension 2400 personal computer (Dell USA, Round Rock, TX). A Samsung Model 930B LCD monitor (Samsung Electronics, Seoul, South Korea), a Realistic SA-150 stereo amplifier (Tandy Corp., Fort Worth, TX), and two ADS L200 speakers (Analog & Digital Systems, Wilmington, MA) were connected to the computer. The chimpanzee registered her choices using a customized Gravis 42111 Gamepad Pro video-gaming joystick (Kensington Technology Group, San Francisco, CA). Human participants heard experimental stimuli through Sennheiser HD650 headphones in a sound-deadened room. The experiments were controlled via a computer from an adjacent room, and sounds were presented via TDT System II modules (Tucker-Davis Technologies, Alachua, FL). Audio recording was conducted with a Shure PG14/PG30-K7 head-worn wireless microphone system (Shure Inc., Niles, IL), and either a Realistic 32-12008 stereo mixing console (Tandy Corp., Ft. Worth, TX) and Marantz PMD671 Professional Solid-State Recorder (Mahwah, New Jersey), or a MacBook Pro laptop computer (Apple Inc., Cupertino, CA). Acoustic processing was conducted using a MacBook Pro laptop, Praat Version 5.1.11 acoustics software (Boersma, 2008), and custom-written scripts (Owren, 2010).

2.1.4 Stimuli. Spoken stimuli were chosen from a list of approximately 130 words that Panzee has consistently identified in a decade of annual word-comprehension testing. Natural word stimuli were recorded at 44100 Hz with 16-bit word width and filtered to remove any 60-Hz AC contamination and DC offset. Individual words were isolated by cropping corresponding segments at zero crossings, with 100 ms of silence then added to the beginning and end of each file. Finally, each waveform was rescaled so its maximum amplitude value coincided with the maximum representable value.
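For illustration, the sketch below outlines this preprocessing in Python; the actual stimuli were prepared with Praat and custom scripts, so the filter order, cutoff, and file handling here are assumptions rather than the settings actually used, and cropping at zero crossings is omitted for brevity.

```python
# A minimal sketch of the stimulus preparation described above (removal of DC
# offset and 60-Hz contamination, 100-ms silence padding, peak rescaling),
# assuming a mono 44100-Hz, 16-bit WAV file. Illustrative only.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

def prepare_word(path_in, path_out):
    fs, x = wavfile.read(path_in)
    x = x.astype(np.float64) / 32768.0            # 16-bit integers to [-1, 1)
    # Gentle high-pass removes DC offset and 60-Hz AC contamination.
    b, a = butter(2, 70.0 / (fs / 2.0), btype="highpass")
    x = filtfilt(b, a, x)
    # Add 100 ms of silence to the beginning and end of the cropped word.
    pad = np.zeros(int(0.1 * fs))
    x = np.concatenate([pad, x, pad])
    # Rescale so the maximum amplitude coincides with the representable maximum.
    x = x / np.max(np.abs(x))
    wavfile.write(path_out, fs, (x * 32767).astype(np.int16))
```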

2.1.5 Chimpanzee procedure. Panzee was tested using the general procedure employed for annual word-comprehension testing. She initiated a trial by using the joystick to move a cursor from the bottom of the LCD screen into a centered "start" box, triggering one presentation of the stimulus. The cursor then reset to the bottom of the screen, the start box reappeared, and a second cursor movement produced another stimulus presentation. After a 1-sec delay, four different photographs (Experiments 1 and 2; see Figure 3) or lexigrams (Experiment 3; see Figure 11) appeared on the screen. One of these items was the correct match to the audio stimulus, and the others were foils chosen randomly by the controlling computer program. As illustrated in Figure 4, visual items were positioned randomly in four of six possible locations -- three on the left side of the screen and three on the right. Photograph foils were those of words used in the same session, thereby reducing the chance that Panzee could rule out items corresponding to words she was not hearing (Beran & Washburn, 2002).

Figure 3. Samples of the photographs used in Panzee's spoken-word recognition task.

Panzee’staskwastousethejoysticktomovethecursorfromthemiddleofthescreen

tothephotographcorrespondingtothestimulusword(seeFigure4).InExperiments1aand1b

bothnaturalandalteredwordswerepresentedinrandomizedorderwithineachtrialblock.In

Experiment2,onlyalteredwordswerepresented;andinExperiment3,onlynatural‐word

stimuliwereused.Panzeewasrewardedwithhighlyvaluedfood,includingpiecesofcherries,

grapes,blueberries,peaches,raspberries,strawberries,mixedfruit,orChexMix®.Thereward

regimenwasspecifictoeachexperiment,andisnotedintheindividualproceduresections.

Figure4.Panzeeworkingonacomputertask.Shewashearingwordsandchoosingcorrespond‐

ingphotos.

2.1.6Humanprocedures.Humantestingvariedsomewhatbyexperimentandis

describedintheindividualproceduresections.Commonelementsincludedthatstimuliwere

presentedinrandomizedblocks,thatthestimuluswasheardtwiceoneachtrial1200‐msapart,

andthatlistenershadeightsecondsinwhichtotranscribethatword.

Page 48: Investigating Speech Perception in Evolutionary

34

2.1.7 Data analysis. Statistical testing varied by experiment and is discussed separately in each case.

3. EXPERIMENT 1

Although Panzee has demonstrated the ability to identify sine-wave and noise-vocoded speech (Heimbauer et al., 2011), the cognitive mechanisms that she is employing are unknown. Thus, two studies investigated the acoustic cues she may be attending to when identifying these altered forms of speech. It was hypothesized that Panzee would show evidence of using the same information humans are proposed to use, namely the spectro-temporal cues produced by amplitude and frequency modulations over time (Remez et al., 1981; Remez et al., 1994; Shannon et al., 1995). Alternatively, she might utilize more holistic cues, such as word length or general sound impressions. Analyses conducted by Heimbauer et al. (2011) argued against this possibility. In these previous experiments, it was expected that when Panzee made errors she would have been choosing foils that corresponded to words whose overall duration or syllable count were similar to those of the target word. However, she did not demonstrate either of these strategies when errors on sine-wave and noise-vocoded words were analyzed.

Here, Panzee was tested with sine-wave words that included varying combinations of individual tones (Experiment 1a), and noise-vocoded speech produced from varying numbers of noise bands (Experiment 1b). As a result, these different synthetic words included differing degrees of time-varying amplitude and frequency cuing of a kind previously shown to systematically affect human performance. The rationale was that if performance by both Panzee and humans was similarly compromised or facilitated across the various synthetic stimuli, the two species could be inferred to be attending to similar elements of the sounds. Although difficult to specify precisely, this information has been characterized as critical spectro-temporal patterning in each natural word that is preserved in synthetic versions.

3.1 Experiment 1a

A seminal study by Remez and colleagues (1981) explored human perception and identification of sine-wave speech with the objective of investigating the role of time-varying properties in speech perception. They found that sine-wave speech, despite lacking traditional acoustic information -- such as F0 and formants -- could be intelligible. In addition, as shown in Figure 5, their listeners were more successful in identifying sentence components when SW1 and SW2 were both present (SW123 and SW12 forms) than when either one was absent. SW1 and SW2 model the corresponding amplitude- and frequency-modulation patterns of natural-speech formants F1 and F2, respectively. That outcome was expected, as it is the lowest two formants that typically most clearly cue vowel identity, in addition to providing articulation information for adjacent consonants (Drullman, 2006; Ladefoged, 2001). Thus, Remez et al.'s results demonstrate that the corresponding tone analogs to F1 and F2 contribute disproportionately to sine-wave speech identification as well.

To ascertain if Panzee also differentially uses these components of sine-wave speech, both she and human participants were presented with 24 words synthesized in four of the same forms used by Remez and colleagues. All three sine waves were present in one version (SW123), while one of the three was removed in the others. It was hypothesized that if Panzee identifies sine-wave speech using acoustic cues similar to those used by humans, her performance when particular sine waves were missing would be similar to that of humans. Specifically, she was predicted to perform best with words that included both SW1 and SW2 (SW123, SW12) and less well when either of these components was missing (SW13, SW23).

Figure 5. Intelligibility of sine-wave speech to humans. The figure is reproduced from Remez et al., 1981, and shows syllable-transcription results for sine-wave sentences in seven different forms.

3.1.1 Subject. The subject was the chimpanzee Panzee.

3.1.2 Participants. There were 12 human participants (eight females).

3.1.3 Stimuli. For Experiment 1a, natural word stimuli were recorded spoken by an adult male researcher (MJB), who is very familiar to Panzee and who conducted her annual word-comprehension testing over a 10-year period. Stimuli consisted of natural versions and sine-wave versions of 24 spoken words that Panzee had previously successfully identified in sine-wave form (Heimbauer et al., 2011). The 24-word set contained 9 two-syllable words, 13 three-syllable words, 1 four-syllable word, and 1 five-syllable word. An additional 12 words from the larger word list that Panzee can routinely identify were used during an initial "orientation" phase that included both natural and SW123 versions (see Table 1 for a complete list of orientation and experimental words). To produce the sine-wave stimuli in the three incomplete forms, either SW1, SW2, or SW3 was removed from the previously constructed and processed SW123 versions. Individual sine waves were removed using Hanning-window band-pass filtering.
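For illustration, the sketch below shows one generic way sine-wave word forms can be assembled from measured formant tracks, with a reduced form produced simply by omitting a tone analog; the dissertation's stimuli were built and filtered in Praat, so the array names, frame rate, and synthesis-by-summation approach here are assumptions rather than the actual implementation.

```python
# A generic sketch of sine-wave speech assembly: each tone analog follows the
# frequency and amplitude track of one formant, and reduced forms (e.g., SW13)
# omit one component. Illustrative only; inputs are assumed to be available.
import numpy as np

def sine_wave_speech(freq_tracks, amp_tracks, frame_rate, fs=44100, keep=(0, 1, 2)):
    """freq_tracks/amp_tracks: per-frame F1-F3 frequency (Hz) and amplitude
    arrays; keep selects which tone analogs are summed, so keep=(0, 2)
    would yield an SW13 version (SW2 removed)."""
    n_frames = len(freq_tracks[0])
    n_samples = int(n_frames * fs / frame_rate)
    t_frames = np.arange(n_frames) / frame_rate
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k in keep:
        f = np.interp(t, t_frames, freq_tracks[k])   # frequency track at sample rate
        a = np.interp(t, t_frames, amp_tracks[k])    # amplitude track at sample rate
        phase = 2 * np.pi * np.cumsum(f) / fs        # integrate frequency to phase
        out += a * np.sin(phase)
    return out / np.max(np.abs(out))                 # peak-normalize
```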

Table 1.

Orientation and test word groups. Test words used in Experiments 1a (A and B), 1b (C and D), 2 (*), and 3 (E and F), as well as orientation words used in Experiments 1a and 1b (O).

Apple: O, O, F
Apricot: A, D, *, E
Balloon: O, O, F
Banana: A, *, E
Blueberries: B, D, *, E
Bubbles: A, C, F
Carrot: O, E
Celery: D, *, F
Cereal: O, O, *, F
Clover: F
Coffee: O, E
Colony Room: B, D, E
Gorilla: B, C, *
Honeysuckle: E
Hotdog: O, F
Jello: O
Kiwi: O
Koolaid: O
Lemonade: B, D, *
Lettuce: C
Lookout: A, O
M&M: B, *
Melon: A, E
Mushroom Trail: B, O, *, F
Noodles: C, E
Observation Room: B, D
Orange: A, C
Orange Drink: A, O, *, F
Orange Juice: B, D, *, E
Peaches: O, C, F
Pineapple: *, E
Pineneedle: B, O, *, F
Plastic Bag: B, D, *
Popsicle: O, D, *
Potato: B, D, *, E
Raisin: O, C
Sparkler: A, C
Strawberries: O, *, F
Sugarcane: B, D, *
Surprise: A, C
Sweet Potato: O, F
Tickle: A, C, E
Tomato: D, *, F
Toothpaste: C, E
TV: C, F
Vitamins: O, *, E
Water: A, O, F
Yogurt: A, O, E

3.1.4 Chimpanzee procedure. In all sessions, words were presented in four randomized blocks, for a total of 96 trials. On each trial, Panzee chose from among four photographs that all corresponded to words being used in that particular session. When Panzee heard a word in natural form and made a correct choice, she heard an ascending ("correct") tone and received a food reward. When she made an incorrect choice on a natural-word trial, she heard a buzzer-like ("incorrect") sound and did not receive a reward. In sessions when Panzee heard both natural and synthetic words, this feedback helped keep her motivated. Neither feedback sounds nor food rewards were provided on trials with synthetic stimuli.

Initial sessions assessed Panzee's performance when hearing the 24 test words in natural form. To progress, Panzee was required to perform at or above 70% correct with natural test words for three consecutive sessions (chance was 25%). Panzee then completed both natural and sine-wave sessions with orientation words. Here, she heard eight blocks of 12 natural orientation words for two sessions, averaging 72% correct. Then she heard these words in both natural and sine-wave form for two sessions. In the first, she heard six orientation words in natural form and the other six in SW123 form for eight blocks. In the second session, the six words Panzee previously heard as natural words were presented in SW123 form, and vice versa. After the sine-wave orientation phase, Panzee participated in one additional session with natural test words to refresh her on the test-word set, and performed at 80% correct.

In the testing phase, Panzee completed one session with the 12 Group A words in natural form and the remaining 12 Group B words in SW123, SW12, SW13, and SW23 forms. In a second session, on a different day, she heard the Group B words in natural form and Group A words in the four sine-wave forms. Trials were randomized within blocks in these sessions, with Panzee hearing natural words four times each and sine-wave words once in each form. She participated in these two types of sessions three times each, in an alternating order, resulting in a total of 12 trials for each word in natural form and 3 trials for each word in every sine-wave form.

3.1.5 Human procedure. Pilot experimentation demonstrated that humans were at ceiling performance for all word forms when stimuli were presented using the word-recognition method employed with Panzee. Therefore, instead of having participants choose from four photographs to identify the stimuli, a word-recall method was used whereby participants had to transcribe the sounds they heard. First, however, participants were familiarized with the word set by exposure to the 24 test words presented in a PowerPoint presentation. Photographs of test-word objects were shown, one at a time, while the corresponding, naturally recorded word was heard. Then, the participant was asked to write down the name of each photograph as it was presented without word labels or sounds. Participants were also familiarized with sine-wave speech by listening to a recording of the words "one" through "ten" and then "ten" through "one" in SW123 form. They were instructed to inform the experimenter as soon as they were able to identify these sounds as speech.

In the test session, participants heard the stimuli in two different randomized blocks, with block order counterbalanced across individuals. One block consisted of Group A words in natural form and Group B words in the four sine-wave forms, and the other block included Group B words in natural form and Group A words in the four sine-wave forms. Within a block, natural words were presented for two trials, and sine-wave words were presented for one trial each in every sine-wave form. These sessions included a total of 72 trials.

3.1.6 Data analysis. Panzee's mean percentage-correct performance in orientation versus test sessions with natural words was compared using an unpaired t-test for a possible learning effect. Mean percentage-correct performance for each sine-wave word form within and across the six test sessions was compared to chance-rate performance of 25% using binomial tests. Pearson's chi-squared tests with a Bonferroni correction were conducted to compare Panzee's performance across the various sine-wave versions. Human percentage-correct performance was computed for each word form, with mean performance for the 12 participants then examined separately for natural and sine-wave versions. ANOVA was used to test for an overall effect of sine-wave word form, and Tukey post-hoc comparisons were used for subsequent pair-wise comparisons among them. Finally, an independent t-test was conducted to test for possible effects of block-presentation order.
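For readers who wish to see the shape of these comparisons, a brief sketch follows; the counts in it are placeholders rather than Panzee's actual data, and the scipy calls simply illustrate the binomial comparison to 25% chance and a chi-squared comparison between two sine-wave forms.

```python
# Illustrative only: chance-level and between-form tests of the kind described
# above, using scipy (>= 1.7); trial counts are placeholders, not real data.
from scipy.stats import binomtest, chi2_contingency

n_trials, n_correct = 36, 21                     # e.g., 3 trials x 12 words in one form
chance = binomtest(n_correct, n_trials, p=0.25, alternative="greater")
print("p vs. 25% chance:", chance.pvalue)

# Correct/incorrect counts for two sine-wave forms; a Bonferroni-adjusted alpha
# would then be applied to the resulting p-values.
table = [[21, 15],   # form 1: correct, incorrect
         [13, 23]]   # form 2: correct, incorrect
chi2, p, dof, expected = chi2_contingency(table)
print("chi-squared p:", p)
```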

3.1.7 Results. Panzee's mean performance over the three natural-word orientation sessions was 73.3% (SD = 1.58), which was statistically above chance level (p < 0.001). Correct natural-word trials in the six test sessions ranged from 81.3% to 93.8%, averaging 87.2% (SD = 4.06) overall, which also was significantly above chance (p < 0.001). An unpaired, 2-tailed t-test revealed that Panzee's performance with natural words was significantly higher in test sessions than in orientation sessions, t(7) = 3.95, p < 0.01, as shown in Figure 6a. Overall, correct performance for all sine-wave words was statistically above chance level (SW123 and SW12 forms, p < 0.001; SW23 and SW13, p < 0.05). As illustrated in Figure 6b, Panzee was 36% correct for SW23 and SW13 words, and 59% correct for SW123 and SW12 words. A chi-squared test with a Bonferroni-corrected alpha value of 0.025 revealed that correct performance for SW123 and SW12 words was significantly greater than for SW23 (p = 0.006) and SW13 versions (p = 0.004).

Figure 6. Experiment 1a chimpanzee and human word recognition. a) Mean performance with natural words by Panzee and the human participants, with applicable standard deviations. b) Panzee's sine-wave word performance, with chance-level accuracy shown by the dashed line. c) Mean human sine-wave word performance, with standard error bars.

Mean transcription performance for natural words by humans was 99.8% correct (SEM = 0.06), as shown in Figure 6a. Mean percentage-correct values for SW123, SW12, SW23, and SW13 word forms were 43%, 35%, 31%, and 27%, respectively. A Kolmogorov-Smirnov test for normality validated use of ANOVA, and results revealed a statistically significant difference among the various outcomes, F(3, 44) = 6.00, p = 0.002. Furthermore, Tukey post-hoc comparisons showed that outcomes were significantly higher for SW123 stimuli than for SW13 versions, p = 0.001, as illustrated in Figure 6c. No other differences were found. Independent, 2-tailed t-test results revealed an effect of presentation order. Five participants transcribed Group B words first, and performed significantly better on Group A sine-wave words than did participants hearing Group A words first, t(46) = 2.28, p = 0.027. Similarly, the seven participants transcribing Group A words first performed significantly better on Group B sine-wave words than those hearing Group B words first, t(29.6) = 9.33, p < 0.001. Examining individual performance, six participants performed similarly to Panzee, either overall or in one of the blocks. For these participants, performance was the same for SW123 and SW12, or for SW23 and SW13 word forms. These six participants also performed notably better on SW123 and SW12 words than on SW23 and SW13 forms. Finally, neither the humans nor Panzee ever recognized the words "banana," "bubbles," "orange drink," and "pineneedle" in several of the SW forms.

3.1.8 Discussion. Panzee demonstrated consistent natural word-recognition performance, showing outcomes in both orientation and test sessions similar to those in earlier annual testing and synthetic-speech experiments (Heimbauer et al., 2011). Her recognition of SW123 words was also similar to performance in previous sine-wave testing (Heimbauer et al., 2011). Panzee identified more words in SW123 and SW12 form than in SW13 or SW23 form. Humans performed similarly, although only the difference between SW123 (with both SW1 and SW2 present) and SW13 (missing SW2) performance was statistically significant, while the SW123 and SW23 performance difference was not. However, 6 of the 12 participants did perform similarly to Panzee, either overall or in one of the two test-word blocks. In other words, these participants performed exactly the same with SW123 and SW12 forms, or SW23 and SW13 forms, and were better at identifying SW123 and SW12 words than SW23 and SW13 words in those instances.

Unexpectedly, Panzee's performance on SW123 words was 58% correct, which was higher than the mean human outcome of 43% correct. Panzee's higher accuracy may be due to the fact that, although sine-wave words can be quite challenging even to humans (Heimbauer et al., 2011), she was very familiar with her word set and had heard the words in SW123 form in earlier experiments. Although human participants were exposed to and tested with the natural words before hearing the sine-wave forms, they were less familiar with them than was the chimpanzee.

The more important result is that both species showed a statistically significant performance difference between complete sine-wave words (SW123) and the same words when missing the tone analog to F2 (SW2). This result is consistent with the hypothesis that Panzee responds to the same cues in sine-wave speech that humans respond to, with the further implication that she is attending to the same features as humans in natural speech as well. This conclusion is based on the findings that Panzee was most successful in identifying sine-wave speech that included information concerning both F1 and F2, the most important formants in human perception of natural speech (Drullman, 2006; Remez & Rubin, 1990). Both Panzee and humans demonstrated an ability to interpret sine waves as cues to phonetic content, also suggesting that both were drawing on implicit knowledge of speech acoustics and corresponding phonetics (Davis & Johnsrude, 2007; Mann & Liberman, 1983; Newman, 2006; Whalen & Liberman, 1987). Taken together, these outcomes are indicative of cognitive top-down processing.

3.2 Experiment 1b

Panzee's ability to identify words in noise-vocoded form also provided an opportunity to examine the cues she is sensitive to in synthetic speech, with corresponding implications for natural speech processing. Although her previous performance has suggested that general auditory processing capabilities may be sufficient for human-like speech perception (Heimbauer et al., 2011), more detailed testing with noise-vocoded words could further strengthen this conclusion. Hence, the purpose of the next experiment was to compare Panzee's performance with noise-vocoded words containing varying degrees of spectro-temporal information to that of humans.

In 1995, Shannon and colleagues found that as the number of bands used to synthesize noise-vocoded phonemes and sentences increased, participants showed corresponding improvements in identification accuracy (see Figure 7). With trained listeners, four noise bands are often sufficient for speech recognition, while at least ten noise bands are necessary with untrained participants (Davis et al., 2005; Shannon et al., 1995). Previously, Panzee demonstrated recognition of familiar words in noise-vocoded form synthesized with seven noise bands (Heimbauer et al., 2011). Therefore, this experiment assessed her word-recognition ability as a function of the number of noise bands used to produce the stimuli. It was hypothesized that if Panzee uses the same available cues as humans, she would show a similar pattern of performance across those forms. Specifically, her performance was predicted to increase linearly with increasing numbers of noise bands.

Figure 7. Intelligibility of noise-vocoded speech to humans. Percentage-correct performance for eight human listeners identifying consonants, vowels, and sentences as a function of noise-band number in noise-vocoded speech, as tested by Shannon et al. (1995). The dashed line denotes chance-level accuracy.

3.2.1 Subject. Panzee again was the subject.

3.2.2 Participants. There were 12 human participants (eight females).

3.2.3 Stimuli. Stimuli consisted of 24 previously recorded and processed natural words (see Table 1), which were those that Panzee had best identified in noise-vocoded form in earlier testing (Heimbauer et al., 2011). Noise-vocoded versions varied from two to five noise bands, and were synthesized using lower- and upper-cutoff frequencies (see Table 2) calculated using the "Greenwood function" (Souza & Rosen, 2009). This function calculates frequency ranges corresponding to equal distances along the basilar membrane of the cochlea, and can be applied to both humans and other mammals, including nonhuman primates (Greenwood, 1961; Greenwood, 1990). The approach was used to ensure orderly selection of frequency-cutoff values as they relate to hearing.
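For illustration, the sketch below derives band cutoffs with the Greenwood function. The exact constants used for the stimuli are not stated here, so the commonly cited human parameter values (A = 165.4, a = 2.1, k = 1.0) are an assumption; applied to the 100-5000 Hz range, however, they reproduce the Table 2 cutoffs to within rounding.

```python
# A sketch of Greenwood-function band-edge selection: frequencies are mapped to
# proportional cochlear positions, the positions are divided equally, and the
# dividing points are mapped back to frequencies. Parameter values are the
# commonly used human constants and are an assumption here.
import numpy as np

A, a, k = 165.4, 2.1, 1.0

def hz_to_place(f):              # frequency (Hz) -> proportional cochlear position
    return np.log10(f / A + k) / a

def place_to_hz(x):              # proportional position -> frequency (Hz)
    return A * (10 ** (a * x) - k)

def band_edges(f_lo, f_hi, n_bands):
    x = np.linspace(hz_to_place(f_lo), hz_to_place(f_hi), n_bands + 1)
    return place_to_hz(x)

print(np.round(band_edges(100, 5000, 4)))   # approx. [100, 392, 1005, 2294, 5000]
```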

Test stimuli consisted of 11 two-syllable words, 11 three-syllable words, 1 four-syllable word, and 1 five-syllable word. All words were chosen from a list of those that Panzee had previously successfully identified in noise-vocoded form. Fifteen of these words were also used in Experiment 1a. Twelve additional words, in natural form and in a previously synthesized form using seven noise bands (NB7), were used during an orientation phase (see Table 1). To produce the various noise-band test stimuli (NB2, NB3, NB4, and NB5), the natural speech signal was divided into 2, 3, 4, or 5 frequency bands using a band-pass filter. The amplitude envelope of each band was then extracted and used to modulate a corresponding white-noise band. The resulting amplitude-modulated noise waveforms were then summed.
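A minimal sketch of this vocoding pipeline appears below. The original stimuli were produced with Praat scripts, so the filter order and the Hilbert-envelope method used here are assumptions rather than the actual settings; the band-division, envelope-extraction, noise-modulation, and summation steps follow the description above.

```python
# A minimal sketch of noise-vocoding, assuming a mono waveform x at sampling
# rate fs and a list of band cutoffs in Hz (e.g., [100, 1005, 5000] for NB2).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges, order=4):
    rng = np.random.default_rng(0)
    out = np.zeros(len(x), dtype=np.float64)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)                             # speech energy in this band
        env = np.abs(hilbert(band))                            # amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))  # matched white-noise band
        out += env * noise                                     # envelope-modulated noise
    return out / np.max(np.abs(out))                           # sum and peak-normalize
```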

Table 2.

Lower-to-upper cutoff frequencies for noise-band stimuli in Experiment 1b.

Bands   Frequency (Hz)
2       100-1005, 1005-5000
3       100-548, 548-1755, 1755-5000
4       100-392, 392-1005, 1005-2294, 2294-5000
5       100-315, 315-705, 705-1410, 1410-2687, 2687-5000

3.2.4 Chimpanzee procedure. The testing procedure and reward regimen were the same as those used in Experiment 1a, and all sessions consisted of 96 trials. Panzee first completed three natural-word sessions to prepare her for the test sessions, and to ensure normative performance with natural words. Criterion performance to progress to the orientation phases was set at 75% in three consecutive sessions. During orientation, Panzee completed one session with 12 non-test words in natural form, a second session with six of these in natural and six in NB7 form, and a third session with these words in the converse forms. In the final orientation phase, Panzee completed one more session with the 24 natural test words and then three sessions of these words in natural and NB7 forms. In each of these latter sessions, a different eight words were in NB7 form and the remaining 16 were natural versions.

An additional programming contingency was added in test sessions. Here, words of the same type, meaning natural or noise-vocoded, were not presented more than three times in a row. This adjustment was made to avoid frustration that could possibly result from Panzee hearing a series of challenging noise-vocoded words consecutively. In the first test session, Panzee heard 12 words (Group C) in natural form and the remaining 12 words (Group D) in NB2, NB3, NB4, and NB5 versions. In a second test session, on a different day, she heard Group D words in natural form and Group C words in the four NB versions (see Table 1). Within a session, there were four trials with each natural word and one trial with each of the words in every NB form. Panzee participated in these two session types three times each in alternating order, resulting in a total of 12 trials for each of the 24 natural words and 3 trials for each of the NB word forms.

3.2.5 Human procedure. Several human listeners were first tested in pilot sessions using the same orientation and test procedures as in Experiment 1a. However, as these participants demonstrated high accuracy with test words in all NB forms, the orientation procedure was changed. Experimental participants were instead familiarized with noise-vocoded speech only by listening to a recording of the words "one" through "ten" and then "ten" through "one" in NB7 form. Following this simple orientation, they heard and transcribed one block of the 24 natural words in randomized order. Lastly, they heard and transcribed a randomized test block of the same words in each of the four noise-band forms, for a total of 96 trials.

3.2.6 Data analysis. Data for both Panzee and the human participants were analyzed as in Experiment 1a.

3.2.7 Results. Panzee's natural word-recognition performance in orientation sessions ranged from 77.2% to 83.3%, with an overall mean of 80.6% (SD = 3.11), which was statistically above chance level, p < 0.001 (see Figure 8a). Percentage correct on natural words in the six test sessions ranged from 77.1% to 87.5%, with an overall mean of 82.8% (SD = 3.8) correct, which was also significantly above chance level (p < 0.001). An unpaired, 2-tailed t-test revealed that Panzee's natural-word performance was not statistically different between these two session types, t(7) = 0.70, ns.

Panzee's percentage correct for NB5, NB4, and NB3 word forms ranged from 61% to 50% (see Figure 8b), and overall was significantly above chance (p < 0.001). Her NB2 word performance was lower at 38% correct, and not significantly different from chance. A one-tailed chi-squared test, with a Bonferroni-adjusted alpha value of 0.017, showed that Panzee's recognition of NB5 words was significantly higher than for NB2 versions (p = 0.002), but not higher than for either NB4 or NB3 forms.

Figure 8. Experiment 1b chimpanzee and human word recognition. a) Mean performance with natural words by Panzee and the human participants. b) Panzee's noise-vocoded word performance, with chance-level accuracy shown by the dashed line. c) Mean human performance for noise-vocoded words, with standard error bars.

Human transcription of natural words was 100% correct (see Figure 8a). Mean percentage-correct values for NB5, NB4, NB3, and NB2 forms were 80%, 78%, 68%, and 38%, respectively (see Figure 8c). After a Kolmogorov-Smirnov test showed the data to be normally distributed, ANOVA revealed an overall effect across these noise-vocoded word forms, F(3, 44) = 24.0, p < 0.001. Tukey post-hoc comparisons revealed a significant difference between performance with NB5 and NB2 forms (p < 0.001), but no other condition effects. Examining the performances of individual participants revealed that four performed much as Panzee did. In other words, they showed the best performance with NB5 words, the worst for NB2 forms, and virtually identical outcomes for NB4 and NB3 words. Panzee never recognized the words "celery," "noodles," and "raisin" in NB2 form, and 11 of 12 human participants completely failed with these items as well.

3.2.8 Discussion. As in earlier testing, Panzee again demonstrated the ability to reliably identify words in noise-vocoded form. However, her performance was significantly better for words in NB5 form than in corresponding NB2 versions. Humans performed similarly, both in the current work and in comparable earlier studies (Shannon et al., 1995). As expected, increasing numbers of noise bands were associated with higher word-identification performance for both Panzee and the humans, with both species performing as well with NB4 and NB5 forms as in earlier testing with NB7 stimuli (Heimbauer et al., 2011). The results again confirm that noise-vocoded speech based on as few as four noise bands is reliably comprehensible (Souza & Rosen, 2009; Shannon et al., 1995) -- in this case for a chimpanzee as well. While humans recognized more NB5, NB4, and NB3 words than Panzee in the current experiment, they performed no better than she did with NB2 stimuli. Only one of twelve humans was able to identify the three words that were unintelligible to Panzee in this form.

As hypothesized, Panzee's performance with noise-vocoded words showed evidence of sensitivity to the same cues as human listeners. In both cases, perception was successful in spite of the absence of basic speech features, such as F0 and formant information. Whatever the spectro-temporal cues that remained, this language-trained chimpanzee was able to take advantage of them. As with sine-wave speech, the outcomes are inconsistent with the SiS perspective and instead support the Auditory Hypothesis. Results are also again indicative of top-down processing, with both species evidently making use of previous knowledge of speech acoustics and phonetic categories in interpreting these fundamentally altered, synthetic versions.

3.3 General Discussion

Experiment 1 was designed to investigate the acoustic cues that Panzee may be utilizing when listening to sine-wave and noise-vocoded speech, comparing her outcomes to analogous human performance. As hypothesized, Panzee showed evidence of sensitivity to the same cues in both synthetic forms as humans -- stimulus patterning argued to reflect spectro-temporal properties of the original, natural speech signal (Remez et al., 1994). For both sine-wave and noise-vocoded speech recognition, Panzee and the current human participants performed better when the synthetic word forms included the attributes shown to facilitate human performance in comparable previous work. Additionally, Panzee's performance in both Experiments 1a and 1b demonstrated that she can reliably understand speech that has been characterized as missing traditional acoustic cues to phonetic content (Remez et al., 1994). This supports the view that human specializations for perception are not necessary for speech perception (Kuhl, 1988), and that Panzee is demonstrating top-down interpretation of this impoverished speech input in the same way as humans (Davis et al., 2005; Davis & Johnsrude, 2007; Hillenbrand et al., 2011).

In both experiments, a strategy of matching synthetic versions to holistic properties of known words would make identification difficult or impossible. For example, performance based on matching overall duration would not likely be dramatically affected across the various forms, as gross temporal properties were captured across individual sine-wave components and noise bands. In addition, both sine-wave and noise-vocoded versions are dramatically altered relative to natural speech, and overall "auditory impressions" are markedly different between the two. The reasonable conclusion is that both Panzee and human listeners were able to perceive the synthetic versions as speech in spite of their unusual acoustics.

4. EXPERIMENT 2

The manipulations used to investigate acoustic cues to phonetic content in Experiment 1 are, of course, not the only way to approach basic speech-perception problems. One alternative is to examine phonemes as individual segments within the speech signal, which is the rationale behind recent work with time-reversed speech. As discussed earlier, creating this speech form involves reversing short, fixed portions of the signal. Although temporal properties of the waveform are markedly changed by this time reversal, the changes have little effect on speech intelligibility as long as window length is less than 100 ms (Barkat et al., 2002; Saberi & Perrott, 1999) -- meaning no longer than the approximate duration of phonetic segments (Crystal & House, 1988). In fact, Saberi and Perrott found that intelligibility of time-reversed speech for English listeners decreased to the 50% threshold level when window lengths were 130 ms or more (see Figure 9). The goal of Experiment 2, therefore, was to assess Panzee's ability to recognize time-reversed speech, testing whether she showed similar processing over phoneme-length windows. It was hypothesized that Panzee does perceive speech based on phonemic segments, and would demonstrate human-like performance when tested with time-reversed speech. If so, the result would constitute evidence that speech-perception mechanisms are based on generalized mechanisms and would provide support for the Auditory Hypothesis.

Figure 9. Intelligibility of time-reversed speech to humans. The figure is reproduced from Saberi and Perrott (1999), and shows subjective intelligibility ratings of time-reversed sentences by seven participants. Note that 50% intelligibility occurs at a window length of approximately 130 ms.

4.1 Subject

The subject was the chimpanzee Panzee.

4.2 Participants

There were 12 human participants (8 females).

4.3 Stimuli

Twenty of Panzee's previously recorded and processed three-syllable natural words were chosen (see Table 1). Three-syllable words were used exclusively to maximize both word length and uniformity among the stimuli, as well as to control for syllable count as a possible cue to word identification. Each of these words was manipulated by applying time-reversal windows of varying lengths, starting at the beginning of the file and continuing through contiguous sections to the end. Words were reproduced in eight forms, using window lengths of 25 ms (TR25), 50 ms (TR50), 75 ms (TR75), 100 ms (TR100), 125 ms (TR125), 150 ms (TR150), 175 ms (TR175), and 200 ms (TR200). The final time-reversed segment in each sound file was almost always shorter than the nominal window length, a factor that is discussed later in the Results section.
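A minimal sketch of this manipulation is shown below: the waveform is cut into contiguous windows of fixed length, each window is reversed in place, and the final, shorter remainder is reversed as well. The dissertation's stimuli were generated with Praat scripts, so this Python version is illustrative rather than the actual implementation.

```python
# Windowed time reversal, assuming a mono waveform x at sampling rate fs and a
# nominal window length in milliseconds.
import numpy as np

def time_reverse(x, fs, window_ms):
    win = int(round(window_ms * fs / 1000.0))     # window length in samples
    out = np.empty_like(x)
    for start in range(0, len(x), win):
        seg = x[start:start + win]                # last segment may be shorter
        out[start:start + len(seg)] = seg[::-1]   # reverse this segment in place
    return out

# e.g., the eight forms used here, TR25 through TR200:
# forms = {f"TR{w}": time_reverse(word, 44100, w) for w in range(25, 225, 25)}
```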

4.4 Chimpanzee procedure

The computer testing procedure was the same as in Experiments 1a and 1b; however, test sessions now consisted of 80 trials. Panzee received no auditory feedback on individual trials and was rewarded after every three to four trials, independent of performance. This non-contingent reward regimen was used to avoid the possibility of learning effects across the various time-reversed forms. Panzee did not receive any orientation for time-reversed stimuli. Instead, she simply participated in sessions hearing the 20 words in only natural form before testing began. Each of these sessions consisted of four randomized word blocks, and she was required to perform at a level of at least 70% correct over three consecutive sessions. She reached this criterion after six sessions.

During the first test session, Panzee heard the 20 words one time each in four time-reversed forms: TR50, TR100, TR150, and TR200. In the second test session, on a different day, she heard the same 20 words one time each in the other four forms: TR25, TR75, TR125, and TR175 (see Table 1). Stimuli were randomized within blocks, and Panzee completed both types of sessions four times each, in alternating order. Overall, testing included 4 trials for each of the 20 words in every TR form.

4.5 Human procedure

Two sets of 12 participants were tested. "Group 1" was tested with the same 20 test words used with Panzee, in forms TR25, TR75, TR125, and TR175, for a total of 80 randomized trials heard in a single session. "Group 2" was tested similarly, but with the 20 words in forms TR50, TR100, TR150, and TR200.

4.6 Data analysis

Panzee's percentage-correct performance was computed for each of the eight word forms, both within and across six test sessions, which included three trials for each word in each of the eight time-reversed forms. Two sessions were excluded from data analysis because Panzee was consistently distracted and disinterested. During one of these sessions, she constantly moved away from the test screen to look out a window at cars coming and going. During another session, she repeatedly moved away from the test screen to ask for different food rewards. Although she eventually completed these two sessions, her performance was less than 60% correct with even the easiest (most natural-sounding) time-reversed words (e.g., TR25 and TR50).

Binomial tests were conducted to compare performance to a chance rate of 25% for each version separately. A Kruskal-Wallis test was used to test for an overall effect among word forms, with Mann-Whitney U tests applied in post-hoc, pair-wise comparisons. The relationship between performance and reversal-window length was modeled using linear regression.

In addition to this statistical testing, two threshold values were computed for word intelligibility as a function of window length. These values were based on the rationale that the threshold represents the halfway point between no perception and perfect perception. While the low boundary could be taken as chance-rate performance at 25% correct, the high boundary was not as clear-cut. A "high" threshold was therefore set as the midpoint between 25% and 100% (62.5%), and a "low" threshold was set as the midpoint between 25% and 80% (52.5%); both values were above chance-rate performance (p < 0.001). The 80% value represents Panzee's overall, long-term performance level, based on a historic range of 75% to 85% correct in natural word recognition (M. J. Beran, personal communication, January 2010), as well as her performance in Experiment 1. Panzee's 50% threshold performance using these two threshold values was then interpolated based on intelligibility rates at the closest tested reversal windows.
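A worked sketch of this threshold logic follows. The scores in it are placeholders rather than Panzee's data, and because the exact interpolation method is not specified above, straight-line interpolation between the two bracketing reversal windows is assumed.

```python
# Threshold criterion halfway between chance (25%) and a ceiling (100% or 80%),
# with the crossing window length found by linear interpolation between the two
# closest tested reversal windows. Scores are placeholders; interpolation method
# is an assumption.
def criterion(ceiling, chance=25.0):
    return (chance + ceiling) / 2.0               # 62.5 for 100%, 52.5 for 80%

def threshold_ms(windows, scores, ceiling):
    crit = criterion(ceiling)
    pairs = list(zip(windows, scores))
    for (w0, s0), (w1, s1) in zip(pairs, pairs[1:]):
        if s0 >= crit > s1:                       # performance crosses the criterion here
            return w0 + (s0 - crit) * (w1 - w0) / (s0 - s1)
    return None

windows = [25, 50, 75, 100, 125, 150, 175, 200]
scores = [72, 70, 65, 60, 52, 45, 40, 47]         # placeholder percentages correct
print(threshold_ms(windows, scores, ceiling=100))  # "high" threshold estimate
print(threshold_ms(windows, scores, ceiling=80))   # "low" threshold estimate
```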

Human mean percentage-correct performance for each word form was computed for Group 1 and Group 2 separately. ANOVA was used to test for an overall effect, with individual word-form results compared using Tukey post-hoc tests. Linear-regression analysis was also applied; for regression purposes, Group 1 and Group 2 participants were treated as a single sample.

4.7 Results

As shown in Figure 10, Panzee's correct word recognition for the six sessions ranged from 49% to 63%, with a mean of 56% over all sessions, and was statistically above chance level, p < 0.001. Performance was also statistically above chance level for all individual word forms. Regression analysis revealed that window length predicted percentage correct, β = -0.17, p < 0.01, and accounted for a significant amount of the variance, R2 = .80, F(1, 6) = 23.7, p < 0.01. A Kolmogorov-Smirnov test showed that the data were not normally distributed. A Kruskal-Wallis test was therefore conducted, and revealed an overall performance difference among the TR word forms, X2(7) = 16.7, p = 0.019. Because Panzee's performance was less than 60% correct for TR125, TR150, TR175, and TR200 words, a Mann-Whitney U test was conducted between TR125 and TR175 to test for a potential difference in intelligibility performance similar to that reported for humans by Saberi and Perrott (1999). Results did, in fact, reveal a significant difference, p < 0.05. Panzee's high and low 50%-intelligibility threshold points were 92.5 ms and 141.0 ms, respectively.

Figure 10. Experiment 2 chimpanzee and human word recognition. The top figure shows Panzee's time-reversed word performance, the corresponding regression line, and chance-level accuracy. Both "high" and "low" thresholds are also noted (see text for further details). The bottom figure shows mean time-reversed word performance for humans, with standard error bars and the regression line. The white line on the TR200 bars represents percentage-correct values for Panzee and human participants when words with final window lengths of 130 ms or less were excluded from analysis. Asterisks denote statistical significance: * = p < 0.05, ** = p < 0.001.

Mean percentage correct was calculated for humans for each TR value, and ranged from almost 100% correct for TR25 to 23% for TR200 (see Figure 10). Regression analysis revealed that reversal-window length was a strong statistical predictor of percentage correct, β = -0.45, p < 0.001, and accounted for almost all of the associated variance, R2 = .97, F(1, 6) = 200.3, p < 0.001 (see Figure 10). After a Kolmogorov-Smirnov test confirmed normality, ANOVA of Group 1 data showed an overall effect among TR25, TR75, TR125, and TR175, F(3, 44) = 62.3, p < 0.001. Tukey post-hoc comparison tests revealed statistically significant differences between TR25 and TR75, and between TR75 and TR125 (both p < 0.001). However, there was no performance difference between TR125 and TR175 forms. After again confirming normality, an ANOVA showed an overall effect among TR50, TR100, TR150, and TR200 for Group 2 performance, F(3, 44) = 61.2, p < 0.001. Tukey post-hoc comparisons revealed statistical differences between TR50 and TR100, and between TR100 and TR150 (both p < 0.001), but not between TR150 and TR200. The intelligibility threshold value calculated for humans was 121.6 ms.

4.8 Discussion

In this experiment, Panzee again demonstrated proficiency with words in natural form during orientation sessions. More importantly, her performance revealed her ability to identify words in time-reversed form. Time-reversed speech has been described as "the most drastic form of time-scale distortion" (Licklider & Miller, 1960). Despite the dramatic changes involved, Panzee, like the human participants, recognized speech in many of these forms. In addition, the decline in recognition performance, expected as the reversal manipulation comes to affect adjacent phonemes, occurred similarly in Panzee and the humans.

Both Panzee's and the humans' performance revealed that time-reversal window length significantly predicted percentage-correct outcomes. As in Saberi and Perrott's (1999) work with sentences, word intelligibility was below 50% at reversal-window lengths of approximately 130 ms for both Panzee and the human participants. Using a maximum possible performance of 100%, her 50% intelligibility threshold was just above 90 ms. Setting that level at 80%, her threshold was at 141 ms. Both values are similar to the current mean threshold value of 122 ms for humans, as well as to Saberi and Perrott's (1999) outcome of approximately 130 ms. Although Group 1 participants did not demonstrate a significant decrease in performance between TR125 and TR175, they did show a difference between TR75 and TR125 (81% and 48%, respectively). Group 2 participants showed a significant decrease between TR100 and TR150 (63.3% and 33.8%, respectively).

Although Panzee's pattern of decreasing performance from TR25 to TR200 was somewhat different from mean human performance, one participant showed exactly the same pattern as Panzee for the TR25, TR75, TR125, and TR175 word forms. This participant performed at 100% on TR25 and TR75 words, with performance then declining to 50% for TR125 and 15% for TR175. Furthermore, both Panzee and humans found it difficult to identify "tomato" and "potato" at TR lengths of 125 ms or more, with Panzee doing so in only one instance ("potato," in TR200 form). Humans showed a total of only 29% correct trials for these two words in the four longer-window forms.

Unexpectedly, Panzee demonstrated better performance on TR200 words than with either TR150 or TR175 forms. While earlier work used only sentence stimuli (Saberi & Perrott, 1999), this work used word stimuli, which are shorter in duration. It may be that the difference between using sentence- and word-length stimuli had a minor effect on the current results. For 12 of the 20 words, the final reversal window encompassed only 130 ms or less when the words were in TR200 form. This final segment could, therefore, have provided a stronger indication of word identity than intended for both Panzee and the human listeners. In all, 18 of Panzee's 28 (64.3%) correct responses to TR200 words were for these 12 items. In humans, the 12 words accounted for 38.9% of the correct TR200 trials. Excluding these words from analysis produced 13.9% and 13.8% correct for Panzee and the humans, respectively. These outcomes are lower than for any other window length (see Figure 10).

If the hypothesis is correct that time-reversed speech "works" because reversal within a phoneme-length window does not affect processing, then Panzee is evidently listening to phoneme-length segments. Given that phonemes are typically considered to be 50 to 100 ms long (Crystal & House, 1988), Panzee did well at those window lengths. It was not clear exactly how to set Panzee's 50% intelligibility threshold. However, the high and low points that were chosen produced values similar to the human threshold. Although the rationale was very different from that used for producing sine-wave and noise-vocoded speech, Panzee's performance suggests that she segments speech based on the same phoneme-based organization as humans, and may also hear words as a sequence of phonemes. The results indicate that detailed auditory analysis of the short-term acoustic spectrum is not essential. Rather, the amplitude envelope is more likely important, and short time-reversal windows do not appear to significantly impair perception of the phoneme information within the window. However, performance declines when windows are long enough to cause overlap of phoneme-length information. This decrease reveals that for both species, extensive alteration of the spectro-temporal cuing can critically affect performance.

5. EXPERIMENT 3

In addition to the questions regarding how specific acoustic elements contribute to lexical information, there are more general perceptual problems that arise. The lack of invariance problem, as described earlier, reflects the high variability in speech acoustics that results from the different physical and physiological properties of individual talkers (Pisoni, 1997). Because of these differences in talker acoustics, speech must be normalized in order for listeners to perceive the common lexical identities of individual words. Listeners routinely normalize speech from both familiar and unfamiliar talkers, despite differences such as age and sex classes, and language backgrounds. Acoustically, this variation affects a variety of features, such as F0 range, formant frequencies, speaking rate, and acoustical patterning for a given phoneme (for a review see Benzeghiba et al., 2007).

Not only is there a difference between male and female voices, in that male voices have a lower F0, but children's speech is also very different from adult speech. Children's speech is typically characterized by higher pitch and formant frequencies, especially for vowels (Gerosa, Giuliani, & Brugnara, 2007; Lee, Potamianos, & Narayanan, 1999). In addition, children under the age of seven typically have longer phone durations and larger spectral and temporal variability in consonants and consonant-vowel transitions than older children and adults (Gerosa, Lee, Giuliani, & Narayanan, 2006). It is these characteristics that can often make child speech more difficult to understand than adult speech.

Talker normalization emerges early in human ontogeny, as shown by the finding that infants are more sensitive to talker variation at seven and a half months of age than at ten and a half months (Houston & Jusczyk, 2000). However, the process by which talker normalization occurs is not well understood, and different models have been proposed (Creel & Tumlin, 2009). Some researchers believe that the process is one whereby the listener strips away individual talker information to extract phoneme content in abstract form. Others instead propose that generalizing across talkers is based on learning and implicitly remembering a large number of instances of speech sounds from many different individual talkers (Creel & Tumlin, 2009; Sumner, 2011). The latter view is supported by the fact that learning talker-specific characteristics can improve linguistic processing (Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994; Pisoni, 1995).

Although there are different views regarding how talker normalization occurs, this phenomenon has not been strongly linked to either the SiS or the Auditory Hypothesis position. On the one hand, the difficulty of talker normalization could be interpreted as evidence for the SiS argument. On the other hand, some evidence favors the Auditory Hypothesis. For example, some non-primates, such as chinchillas and birds, have demonstrated the ability to categorize speech sounds and monosyllabic words and then generalize performance to at least a small number of unfamiliar talkers (Kuhl & Miller, 1975; Ohms et al., 2010). Budgerigars have demonstrated that they are less sensitive to variation in the same vowel across different talkers than to differences between vowel categories (Dooling & Brown, 1990). There is also indirect evidence that apes normalize speech across talkers. For example, Panzee and several language-trained bonobos interact with at least a dozen humans on an everyday basis, and react appropriately to speech commands and requests (M. J. Beran, personal communication, January 2010).

However, the empirical nonhuman normalization studies have typically involved only a few talkers and short, often rudimentary sounds. Thus, Experiment 3 was designed to address talker normalization more systematically in Panzee, providing an opportunity both to test her with speech from a large number of individuals, and to present more complex, lexically meaningful stimuli. As before, the hypothesis was that she would be similar to humans in demonstrating normalization, because she has heard speech from many different talkers throughout her life and currently responds to English spoken by various individuals. However, both earlier work (Heimbauer et al., 2011) and Experiments 1 and 2 involved the speech of only one talker, MJB. Specific information regarding her potential talker-normalization abilities, therefore, is not available. Experiment 3 included both familiar and unfamiliar adult talkers, with a variety of dialect backgrounds. Words from young children were also presented. Panzee has been exposed to children's voices much less frequently, especially later in her life, either in experiments or in informal interactions. Performance with these talkers was compared to recognition of speech from talker MJB, who is arguably one of the talkers Panzee is most familiar with.

5.1 Subject

The subject was the chimpanzee Panzee.

5.2 Participants

No human listeners were tested. However, audio recording included a total of 31 different native English speakers, including 21 adults and 10 children. These talkers included the familiar talker MJB, 5 familiar males, 5 unfamiliar males, 5 familiar females, 5 unfamiliar females, and 5 boys and 5 girls (all unfamiliar). The age range of adult talkers was 20 to 72 years old, and the age range for children was 4 to 7 years old.

5.3 Audio recording and stimuli

Test stimuli consisted of 15 two-syllable, 14 three-syllable, and 3 four-syllable words (see Table 1). All talkers were recorded speaking 48 words, but only 32 were used in the experiment. Some of the words were difficult for the children to pronounce, and they did not always speak clearly. The 32 words were chosen on the basis of finding the best recordings from all 31 talkers. MJB and 20 additional native English-speaking adults were recorded reading the individual words from index cards. Ten native English-speaking children were recorded as they named photographs appearing individually in a Microsoft PowerPoint presentation. If a child could not name a photograph, they were told the word explicitly. The 30 new talkers were grouped as familiar adult males (FAM), familiar adult females (FAF), unfamiliar adult males (UAM), unfamiliar adult females (UAF), unfamiliar male children (UCM), and unfamiliar female children (UCF). MJB was re-recorded for this experiment using the same equipment that was used to record the other 30 talkers.

Talkers were from a wide range of areas within the United States, with a variety of regional dialect backgrounds. MJB was born in Ohio, and had also lived in Alabama and Georgia. The other 10 familiar talkers were born in six different states and in Germany, and had lived in a total of 14 other states and Washington, DC. The northern-most of these states was Michigan, the southern-most was Florida, the eastern-most was New York, and the western-most were California and Oregon. Some talkers had also lived in Germany, Japan, Nepal, Switzerland, and Taiwan. The 10 unfamiliar talkers were born in six different states and in Puerto Rico, and had lived in a total of nine other states. The northern-most of these was New York, the southern-most was Louisiana, the eastern-most was New Jersey, and the western-most were California and Hawaii. One of these talkers had also lived in Germany. All of the children had been born and raised exclusively in either Georgia or New York.

5.4 Chimpanzee procedure

Panzee was tested for a total of 14 sessions, each of which included 80 trials. In the first session, she heard 16 test words (Group E) spoken by MJB. In the next six sessions she heard Group E words spoken by the five talkers within each of the specific talker-type groups. The session order of talker types for testing was: FAM, UAM, FAF, UAF, UCM, UCF. In the eighth session, Panzee heard the remaining 16 test words (Group F) spoken by MJB, which was followed by six sessions with Group F words -- one session for each of the six talker-type groups. Testing order differed relative to the earlier sessions and was: UAM, FAF, UCF, FAM, UCM, UAF. In this experiment, Panzee chose from four lexigrams (see Figure 11), instead of four photographs. This change was made to allow eventual comparison of the resulting data to an earlier, unpublished experiment that also used lexigrams. Panzee received auditory feedback on every trial, and was rewarded for all correct responses. This reward schedule kept her highly motivated, and could be used because each trial was unique.

Figure 11. Samples of lexigrams used in Panzee's spoken-word recognition task.

5.5 Data analysis

Panzee's data were analyzed as in Experiment 2.

5.6Results

As shown in Figure 12, Panzee's mean correct-trial performance, calculated for each talker and averaged across the two sessions for each talker type, ranged from 75.6% (MJB, UCF) to 81.3% (FAM). Word recognition was significantly above chance level for all talker types, p < 0.001. A Kolmogorov-Smirnov test showed that the data were not normally distributed, and a Kruskal-Wallis test was therefore conducted using combinations of talker types. Because performance for all talker types was similar to her annual performance levels (see Figure 12), some talker types were combined for performance comparisons. These analysis categories were: "All Familiar Adults" (FAM and FAF), "All Unfamiliar Adults" (UAM and UAF), "All Adult Males" (FAM and UAM), "All Adult Females" (FAF and UAF), and "All Children" (UCM and UCF). The rationale for combining data from boys and girls was that prior to puberty, the vocal tracts and vocal folds of girls and boys are very similar (Simpson, 2009). Results revealed no overall difference in performance among the collapsed categories, χ²(4) = 0.58, ns.

Figure 12. Experiment 3 chimpanzee word-recognition performance across talkers. Talker-type groups were as follows: MJB is the familiar male researcher, FAM is other familiar adult males, FAF is familiar adult females, UAM is unfamiliar adult males, UAF is unfamiliar adult females, UCM is unfamiliar boys, and UCF is unfamiliar girls. The dashed line signifies chance-rate performance.
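
To make the analysis steps concrete, the sketch below (in Python, using SciPy) illustrates the general logic described above on hypothetical placeholder data rather than Panzee's actual scores: a binomial test against the 25% chance rate of the four-alternative lexigram task, a Kolmogorov-Smirnov check of normality, and a Kruskal-Wallis comparison across the collapsed talker categories. The accuracy values, trial counts, and normality parameters shown are illustrative assumptions only.

# Hypothetical sketch of the Experiment 3 analysis; accuracy values are
# placeholders, not Panzee's actual data.
from scipy import stats

CHANCE = 0.25  # four-alternative lexigram task
N_TRIALS = 80  # trials per session (illustrative pooling)

# Placeholder proportions correct, one value per talker, grouped into the
# collapsed analysis categories described in the text.
collapsed = {
    "All Familiar Adults":   [0.81, 0.78, 0.80, 0.79, 0.82],
    "All Unfamiliar Adults": [0.77, 0.80, 0.78, 0.81, 0.76],
    "All Adult Males":       [0.80, 0.82, 0.79, 0.78, 0.81],
    "All Adult Females":     [0.78, 0.77, 0.80, 0.79, 0.76],
    "All Children":          [0.75, 0.77, 0.76, 0.78, 0.74],
}

# Above-chance test for one talker type (binomial test on pooled trials).
k_correct = round(0.756 * N_TRIALS)          # e.g., 75.6% correct
binom = stats.binomtest(k_correct, N_TRIALS, CHANCE, alternative="greater")
print(f"binomial p vs. chance: {binom.pvalue:.3g}")

# Normality check, then a nonparametric comparison across categories.
all_values = [v for vals in collapsed.values() for v in vals]
ks = stats.kstest(all_values, "norm", args=(0.78, 0.02))  # placeholder mean/SD
h, p = stats.kruskal(*collapsed.values())
print(f"KS p = {ks.pvalue:.3g}; Kruskal-Wallis H = {h:.2f}, p = {p:.3g}")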

5.7 Discussion

Because speech acoustics are highly variable over a variety of characteristics, listeners have to "solve" the lack of invariance problem almost every time they hear spoken language. Although the process by which this occurs remains unclear (Creel & Tumlin, 2009), speech experience likely plays a major role. Infants demonstrate at least some talker-normalization ability by the age of ten and a half months (Houston & Jusczyk, 2000), and adults, having a vast amount of speech experience, routinely normalize across a wide range of talkers (Benzeghiba et al., 2007). Although some nonhumans have shown an ability to discriminate and categorize speech sounds (Dooling & Brown, 1990; Kuhl & Miller, 1975; Loebach & Wickesberg, 2006; Ohms et al., 2010), talker normalization has not previously been systematically investigated in nonhumans.

In the current experiment, Panzee was tested on talker normalization directly, demonstrating the ability to normalize across a range of talkers producing her familiar words. Not surprisingly, Panzee recognized these words when spoken by the familiar researcher MJB at a rate similar to that found in many previous tests with his voice. However, she also showed essentially the same performance when hearing the voices of familiar males and females, unfamiliar males and females, and unfamiliar boys and girls. In addition to age- and sex-related variation, these talkers came from a variety of regional dialect backgrounds. The adults had lived in a total of 26 states, Washington, DC, Puerto Rico, and five other countries. Although the children were only from the northern state of New York and the southern state of Georgia, their voices were the least familiar to Panzee, as well as being very different from the adult voices (Gerosa et al., 2007; Lee et al., 1999).

One possible interpretation of Panzee's performance with familiar talkers is that she had previously learned the features of each person's voice individually, thereby knowing from experience what each of these words sounded like when spoken by these particular individuals. However, that explanation cannot account for her ability to recognize the unfamiliar adults or children. A more likely explanation is that, as hypothesized, Panzee was showing human-like, talker-normalization abilities. Her performance provides evidence in support of the Auditory Hypothesis, rather than the SiS view.

6. GENERAL DISCUSSION

Insights into the generality of the auditory and cognitive processes involved in speech perception are fundamental to resolving the SiS versus Auditory Hypothesis debate. Although arguments by Galantucci et al. (2006) and others almost rule out the possibility of meaningful animal experiments, comparisons between humans and nonhumans are a necessity. Mammals, and especially close primate relatives, are of greatest interest, and the current work demonstrates that comparisons with these animals can be very informative. While many animal studies have investigated perception of rudimentary speech sounds, none has previously presented meaningful words or synthetic versions of those items, or tested a wide range of different talkers. The current work was possible specifically because of Panzee's language-comprehension abilities (Beran et al., 1998; Brakke & Savage-Rumbaugh, 1995; Heimbauer et al., 2011; Rumbaugh & Savage-Rumbaugh, 1996).


6.1 Current results

Experiment 1 investigated the possibility that Panzee uses the same information as humans to identify synthetic speech in sine-wave and noise-vocoded forms (Heimbauer et al., 2011). In sine-wave speech perception, humans perform significantly better when both SW1 and SW2 are present (Remez et al., 1981); these are the components that are modeled on formants F1 and F2 in the natural speech signal. Panzee's human-like performance with sine-wave speech indicates that she also attends more to these particular tones in the synthetic speech, with implications for sensitivity to the corresponding formants in natural speech. Similar results occurred with noise-vocoded speech, showing that both Panzee and humans performed best with stimuli that included four or five noise bands, less well with three, and poorly with two. These outcomes match earlier findings with noise-vocoded speech, showing differences in relative intelligibility of these synthesis forms by humans.
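
For readers unfamiliar with the manipulation, the sketch below illustrates one standard way to construct noise-vocoded speech: filter the waveform into a small number of frequency bands, extract each band's amplitude envelope, and use those envelopes to modulate band-limited noise. The band edges, filter settings, and envelope method shown are assumptions for illustration, not the parameters used to create the dissertation's stimuli.

# Minimal noise-vocoding sketch (illustrative parameters, not the
# dissertation's actual stimulus settings).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=4, f_lo=100.0, f_hi=5000.0):
    """Replace each band's fine structure with envelope-modulated noise."""
    # Log-spaced band edges between f_lo and f_hi (an assumption).
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))          # amplitude envelope
        noise = rng.standard_normal(len(signal))
        carrier = sosfiltfilt(sos, noise)         # band-limited noise carrier
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-12)    # simple peak normalization

# Example: vocode a 1-s tone complex standing in for a recorded word.
fs = 22050
t = np.arange(fs) / fs
demo = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
vocoded = noise_vocode(demo, fs, n_bands=4)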

As discussed earlier, it is difficult to specify exactly which acoustic cues are critical in these types of synthesized speech, or what both have in common (Remez et al., 1994). However, it is clear that amplitude and frequency modulation over time is important in both sine-wave and noise-vocoded speech (Remez et al., 1981; Shannon et al., 1995). Experiment 2 tested temporal cuing differently, using a synthetic speech form with which Panzee had no experience. The time-reversed speech used poses a unique perceptual problem in that it disrupts acoustic patterning over time. Intelligibility of time-reversed speech depends on the length of the reversed segments relative to typical phoneme length. Differential performance has been attributed to the fact that phonetic content will be relatively undisturbed if the length of the reversal window is within the time frame of typical phoneme length, which is estimated to be 50 to 100 ms (Crystal & House, 1988). Data from both Panzee and the humans revealed a strong relationship between window length and percentage correct identification, with similar intelligibility thresholds in each case. It was concluded that Panzee was sensitive to phoneme-related cues in time-reversed, as well as in natural, speech.
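
The local time-reversal manipulation itself can be expressed compactly in code. The sketch below simply reverses successive fixed-length windows of a waveform; the sampling rate, window lengths, and dummy signal are illustrative assumptions, but the operation matches the general description above, in which intelligibility declines as the reversal window grows well beyond the 50-100 ms range of typical phoneme durations.

# Sketch of local time-reversal: reverse successive fixed-length windows.
# Sampling rate, window lengths, and the dummy waveform are illustrative.
import numpy as np

def time_reverse_windows(signal, fs, window_ms):
    """Reverse each consecutive window of `window_ms` milliseconds."""
    win = max(1, int(round(fs * window_ms / 1000.0)))
    out = np.empty_like(signal)
    for start in range(0, len(signal), win):
        out[start:start + win] = signal[start:start + win][::-1]
    return out

# Example: apply 50-ms and 200-ms reversal windows to a dummy waveform.
fs = 22050
dummy = np.random.default_rng(0).standard_normal(fs)      # 1 s stand-in signal
mild = time_reverse_windows(dummy, fs, window_ms=50)       # near phoneme length
severe = time_reverse_windows(dummy, fs, window_ms=200)    # well beyond it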

Finally, Experiment 3 tested Panzee's ability to solve the lack of invariance problem created in identifying speech across a variety of talkers, including both familiar and unfamiliar individuals, adults and children, and a variety of dialects. Despite the high acoustic variability of human voices (Evans & Iverson, 2004; Hillenbrand et al., 1995; Pisoni, 1995; Remez, 2005), Panzee was able to recognize words from all talker types equally well, including the novel condition of children's voices. Her performance was similar to her historic levels in every case, with no significant performance difference between any of the talker groups.

6.2 Implications of experimental results

It is evident from Panzee's performance in the three experiments that a language-trained chimpanzee can provide a unique animal model for investigating speech perception. Overall, these experimental results contradict the SiS perspective and support the Auditory Hypothesis. In addition, the results reveal the likely speech-perception capabilities of an ape-human common ancestor.

6.2.1 The SiS view versus the Auditory Hypothesis. The SiS view proposes that humans possess evolutionary specializations for speech perception in the form of a speech module or at least a speech mode of perception (Mann & Liberman, 1983; Trout, 2001). In contrast, the Auditory Hypothesis claims that general auditory capabilities are sufficient to process speech in the absence of uniquely human specializations (Kuhl, 1988), at least with the necessary experience with speech input. Many previous animal studies have investigated speech-perception issues based on rudimentary speech elements, and overall results have favored the Auditory Hypothesis. However, the current work and earlier data from experiments testing Panzee (Heimbauer et al., 2011) argue much more strongly in the same direction. In addition to showing evidence of using the available spectro-temporal cues in sine-wave and noise-vocoded speech as humans do, Panzee has also now demonstrated apparent attention to phoneme organization, and a level of talker normalization that arguably solves the lack of invariance problem, at least for individual words.

These outcomes make the SiS view (Liberman & Mattingly, 1989; Mann & Liberman, 1983; Whalen & Liberman, 1987) appear highly unlikely, at least in any strict interpretation. For example, the SiS approach to modularity argues that this specialization is innate and evolutionarily unique to humans. In this perspective, Panzee cannot have such a module, but she still shows evidence of processing speech as if she does. If a speech module does exist in humans, it is likely to have emerged based on latent speech-perception capabilities that were already present in the common ancestor of humans and chimpanzees, rather than from scratch. The claim of a speech mode of perception is less extreme. A speech mode does not mean that speech-perception capabilities are innate or uniquely human, only that the listener is able to learn important properties of speech from experience. Such experience is routine in human development, and impossible in exactly the same form in any other animal. However, Panzee did receive consistent exposure to human speech almost from birth, and her demonstrated speech-perception capabilities highlight the critical importance of such experience in the context of the SiS versus Auditory Hypothesis debate.


6.2.2 Top-down processing and speech-perception experience. Panzee's abilities also provide evidence pertaining to top-down processing in speech perception. Each of the four tasks presented to her requires top-down processing, or at least is considered to in the context of human speech perception. An innate speech-perception module would be an extreme form of top-down processing, although this approach then downplays the role of experience with speech. Panzee's abilities argue against such a module and in favor of a strong role of experience. Based on her performance, it is more likely that the critical factor in human top-down processing is the vast amount of passive experience that human infants have hearing speech from birth on, rather than a speech module. For instance, experience hearing speech allows infants to learn which speech sounds are being used, how differences among sounds may or may not be significant for categorizing them, and the meanings that sound combinations convey (Marcus et al., 1999; Saffran et al., 1996; Werker & Desjardins, 1995).

Perceiving in a speech mode may be more a matter of experience than innateness. Each of the tasks Panzee was able to perform based on top-down processing may also represent a form of speech-mode perception. While some classic demonstrations of the hypothesized speech mode are not applicable to her, Panzee's abilities to perceive fundamentally altered synthetic stimuli as having lexical properties, to compensate or correct for distortions introduced by time-reversal windows, and to map between variable speech acoustics and phonetic features are speech-mode, and top-down, functions.

Panzee's speech-perception abilities may also represent an example of "emergents" (Rumbaugh, 2002; Rumbaugh, King, Beran, Washburn, & Gould, 2007), defined for both humans and animals as important components of learning and cognition: new behaviors with antecedents in previously gained knowledge or experience (Rumbaugh & Washburn, 2003). They differ from behaviors learned through operant or classical conditioning, are considered common in nonhumans, and are argued to provide the potential basis for new and innovative actions. Emergents may be necessary for adaptive and behaviorally flexible species to meet new challenges in complex environments, and can be expressed in different situations at the first opportunity. The speech-processing abilities that Panzee has now are emergents in this sense, with her extensive experience with natural speech providing the basis for solving a variety of perceptual problems, including new ones.

These speech-perception abilities are also a testament to Panzee's language-rich and enculturated rearing history (Rumbaugh & Savage-Rumbaugh, 1996). A few other apes have been similarly raised, including the now-adult bonobos Kanzi and Panbanisha. These animals have shown strong evidence of understanding spoken English words (Beran et al., 1998; Brakke & Savage-Rumbaugh, 1995; Rumbaugh & Savage-Rumbaugh, 1996), including when ordered syntactically in meaningful sentences (Savage-Rumbaugh et al., 1993). In contrast, apes raised using the same lexigrams and photographs to communicate, but without early or extensive functional speech input, do not show any notable ability to identify spoken words when tested annually (M. J. Beran, personal communication, January 2010).

6.3 Cognitive processing and language

Although Panzee can recognize familiar words in natural and synthetic versions, she is unlikely to be processing speech exactly as humans do. For instance, she is clearly less efficient than humans with speech in any of the forms tested. Her best performance never approached routine levels of speech recognition in humans, as demonstrated in Experiment 3, in which she never performed above 84% correct even with exclusively natural words. Relatively unimportant factors likely contributed; for example, Panzee may have experienced frustration or boredom from repeatedly being tested with the same limited set of words, with potentially difficult versions presented unpredictably. Although Panzee recognizes the spoken words corresponding to lexigrams and photographs, her inability to produce speech herself sets her apart from all human talkers. In humans, critical aspects of speech knowledge continue to develop throughout childhood, with an individual's own speech production and perceptual knowledge playing a role. Growing awareness of the phonology of language similarly increases both recognition and manipulation of speech sounds (Goswami, 2006, 2008). While Panzee's performance demonstrates that speech production is not necessary for speech perception, production may nonetheless be a critical component of human speech processing.

Although Panzee's performance supports the Auditory Hypothesis and a general auditory model of speech perception, the specific cognitive processes involved are largely unknown. Humans are proposed to have a "phonological loop" in working memory that stores, rehearses, and manipulates auditory input in speech form (Baddeley & Hitch, 1974). No information is available concerning Panzee's working-memory capabilities, and she may be processing speech in phonological rather than purely auditory form in short-term storage. Baddeley, Gathercole, and Papagno (1998) have suggested that the phonological loop in humans might serve as a language-learning device with an integral role in both spoken- and written-language acquisition. It is possible that differences between the speech-perception abilities of Panzee and those of humans are due to differences in the development of working memory, and possibly even of other cognitive processes. Similarly, it is not known how Panzee maps the words she hears onto corresponding meanings in long-term memory. Differences between chimpanzee and human long-term memory may be responsible for the decrease in her efficiency when responding in word-recognition tasks, without reflecting important discrepancies in processing and categorizing the speech sounds themselves.

Panzee may even be at a disadvantage simply due to being unable to take language use to the level of reading. Brain-imaging studies investigating brain development in literate versus illiterate adults provide evidence that reading influences brain structure, specifically being correlated with increased white matter and connectivity in the left hemisphere. The specific areas involved, the corpus callosum, inferior parietal regions, and parieto-temporal regions, are all involved in reading and verbal working memory. Illiterate adults are often more right-lateralized and do not show corresponding white-matter and connectivity effects. These results indicate that learning to read is linked to brain plasticity and aids in the development of left-lateralization (Carreiras et al., 2009; Petersson, Silva, Castro-Caldas, Ingvar, & Reis, 2007).

Musical training also leads to stronger left lateralization of the perisylvian brain areas associated with language (i.e., Broca's and Wernicke's regions; Limb, Kemeny, Ortigoza, Rouhani, & Braun, 2006), and immature neural responses to rhythmic cues in dyslexic children potentially impede speech development (Goswami, 2006). Increases in overall left lateralization during development may be important for speech-perception efficiency as well.

Other experiments by Petersson and colleagues (2007) demonstrate that literate versus illiterate listeners may engage in different types of cognitive processing when listening to and repeating speech. They proposed that while literate listeners relied solely on language processing, illiterate participants in their study also engaged in visual-spatial processing. Speculatively, a language-trained ape showing less hemispheric lateralization may also be utilizing more visually based processing and "seeing" lexigrams when hearing English words, instead of focusing on their acoustic properties. Panzee may be processing speech differently because of a lack of the experience that humans obtain by learning to read, which then allows them to take advantage of brain plasticity for enhancement of perception (Carreiras et al., 2009; Petersson et al., 2007). In this case, it may be that development of a specific cognitive ability affects the development of another. All the humans tested for comparison to Panzee may have had an advantage due to a combination of genetic and environmental factors contributing to functional hemispheric specialization (Petersson et al., 2007), which is likely the case for many human language abilities.

6.4 Future directions

The current work has only scratched the surface with respect to potential language-related experimental work with Panzee. For example, additional experiments conducted with both Panzee and other language-trained chimpanzees could provide information about speech perception and the underlying mechanisms involved, and could contribute to the discussion of the evolution of associated cognitive processes. As a follow-up to Experiment 1, experiments with Panzee and humans could be conducted to investigate more precisely how spectro-temporal acoustic cues map onto phonetic features and lexical identity. Specifically, the effects of small changes at points in the stimuli that are critical for perception by humans could be compared to comparable changes made at non-critical points.

Results of Experiment 2 also provide opportunities for investigating more detailed aspects of Panzee's speech perception. As demonstrated, time-reversed speech became more distorted and unintelligible as reversal-window length increased. It appears that Panzee, like humans, has had adequate early developmental experience with language to be able to use phoneme-length information to perceive speech. However, Saberi and Perrott (1999) also reported that perception of time-reversed speech was robust to temporal shifts made to entire segments. This manipulation created apparent reverberation in the speech sounds, but did not importantly disturb phonetic perception. Future experiments could investigate Panzee's ability to recognize time-reversed speech using this "delay method" to map the robustness of the top-down processing mechanisms she utilizes.

Panzee's talker-normalization abilities can also be followed up on. Results of this experiment did not shed much light on how normalization occurs, although the topic is still being debated in the speech-research community. Here, Panzee may be storing talker-specific, speech-sound variants, as humans do (Bradlow, Nygaard, & Pisoni, 1999; Creel & Tumlin, 2009; Sumner, 2011), and then using these representations to generalize to novel, unfamiliar instances of these sounds. Experiments with Panzee could introduce her to new words spoken either by multiple talkers or by a single individual. After word learning, her ability to perceive those items from novel talkers, as synthetic replicas, or in altered and reduced versions would reveal more details about how she remembers and represents phonetic information.
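
One way to make the exemplar idea above concrete is a toy nearest-exemplar classifier: talker-specific renditions of each word are stored as feature vectors, and a novel token is labeled by its closest stored exemplar. Everything in the sketch (the feature dimensionality, the placeholder words, and the random vectors standing in for acoustic measurements) is an assumption for illustration; it is not a model of how Panzee, or human listeners, actually store speech.

# Toy exemplar-style sketch: store talker-specific acoustic exemplars per
# word and classify a novel token by its nearest stored exemplar. Feature
# vectors here are random placeholders, not real acoustic measurements.
import numpy as np

rng = np.random.default_rng(1)
DIM = 12  # small cepstral-like feature vector (assumption)

# Hypothetical stored exemplars: several talkers' renditions of each word.
exemplars = {
    "word_a": [rng.standard_normal(DIM) for _ in range(5)],
    "word_b": [rng.standard_normal(DIM) for _ in range(5)],
}

def classify(token):
    """Return the word whose nearest exemplar is closest to the token."""
    best_word, best_dist = None, np.inf
    for word, vectors in exemplars.items():
        dist = min(np.linalg.norm(token - v) for v in vectors)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word

# A "novel talker" token: a perturbed version of one stored rendition.
novel = exemplars["word_a"][0] + 0.1 * rng.standard_normal(DIM)
print(classify(novel))  # expected: "word_a"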

Finally, Panzee's possible use of visual cues to speech has never been investigated. Speech perception is typically a multi-modal event, with cues available both from the acoustic signal and from corresponding talker articulation movements. Attending to these visual cues is known to aid human speech perception in noisy environments (Rosenblum, 2005) and to help pre-linguistically deaf children acquire some aspects of phonological awareness through lipreading (Dodd & Hermelin, 1977). Visual cues also help sighted children in learning to distinguish the functional units of spoken language, while the absence of this information is a detriment to language learning in blind children (Mills, 1987).

Patterson and Werker (2003) have argued that the ability of even young infants to integrate auditory and visual phonetic information is evidence of uniquely human speech mechanisms, a proposal that is well suited to testing with Panzee. Izumi and Kojima (2004) have provided suggestive evidence by testing a chimpanzee with auditory and visual information from vocalizing conspecifics. Their study was limited in the number of vocalizations and visual stimuli that could be presented, limitations that are not applicable in Panzee's case. In fact, she could be tested quite extensively for evidence of the integration of auditory and visual information using a variety of speech sounds, articulatory movements, and talkers.

These are just a few of the questions relating to speech-perception capabilities that future research with a language-trained chimpanzee such as Panzee could address. Future studies could investigate a wide range of pertinent issues regarding the details of both perceptual and other cognitive aspects of speech processing, shedding light not only on her particular abilities, but also on those of humans and ancestral apes. The most compelling motivation for pursuing such work, however, may be that Panzee is one of a very small number of animals with these unique speech-perception abilities and, therefore, should be involved in such research to the greatest extent possible.

REFERENCES

Allin, E. F. (1975). Evolution of the mammalian middle ear. Journal of Morphology, 147, 403-437.
Appelbaum, I. (1996). The lack of invariance problem and the goal of speech perception. Fourth International Conference on Spoken Language Processing, Philadelphia, 3, 1541-1544.
Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158-173.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. A. Bower (Ed.), Recent advances in learning and motivation (pp. 47-89). New York: Academic.
Barkat, M., Meunier, F., & Magrin-Chagnolleau, I. (2002). Intelligibility of time-reversed speech in French. ISCA Tutorial and Research Workshop on Temporal Integration in the Perception of Speech, Aix-en-Provence, France, P1-3.
Beckers, G. J. L. (2011). Bird speech perception and vocal production: A comparison with humans. Human Biology, 83, 192-212.
Beecher, M. D. (1974). Hearing in the owl monkey (Aotus trivirgatus) I: Auditory sensitivity. Journal of Comparative Physiology and Psychology, 86, 898-901.
Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., & Wellekens, C. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49, 763-786.
Beran, M. J., Savage-Rumbaugh, E. S., Brakke, K. E., Kelley, J. W., & Rumbaugh, D. M. (1998). Symbol comprehension and learning: A "vocabulary" test of three chimpanzees. Evolution of Communication, 2, 171-188.
Beran, M. J., & Washburn, D. A. (2002). Chimpanzee responding during matching to sample: Control by exclusion. Journal of the Experimental Analysis of Behavior, 78, 497-508.
Best, C. T., Studdert-Kennedy, M., Manuel, S., & Rubin-Spitz, J. (1989). Discovering phonetic coherence in acoustic patterns. Perception & Psychophysics, 45, 237-250.
Boersma, P., & Weenink, D. (2008). Praat: doing phonetics by computer [Computer program]. Version 5.1.11, retrieved 1 September 2008 from http://www.praat.org/.
Bradlow, A. R., Nygaard, L. C., & Pisoni, D. B. (1999). Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Perception & Psychophysics, 61, 206-219.
Brakke, K. E., & Savage-Rumbaugh, E. S. (1995). The development of language skills in bonobo and chimpanzee: I. Comprehension. Language & Communication, 15, 121-148.
Breedlove, S. M., Watson, N. V., & Rosenzweig, M. R. (2007). Biological psychology (5th ed.). Sunderland, CT: Sinauer.
Carreiras, M., Seghier, M. L., Baquero, S., Estevez, A., Lozano, A., Devlin, J. T., & Price, C. J. (2009). An anatomical signature for literacy. Nature, 461, 983-986.
Carruthers, P. (2002). The cognitive functions of language. Behavioral and Brain Sciences, 25, 657-726.
Corballis, M. C. (2009). The evolution and genetics of cerebral asymmetry. Philosophical Transactions of The Royal Society B: Biological Sciences, 364, 867-879.
Creel, S. C., & Tumlin, M. A. (2009). Talker information is not normalized in fluent speech: Evidence from on-line processing of spoken words. 31st Annual Meeting of the Cognitive Science Society, Amsterdam, 845-850.
Crystal, T. H., & House, A. S. (1988). Segmental durations in connected-speech signals: Current results. Journal of the Acoustical Society of America, 83, 1553-1573.
Cutting, J. E. (1974). Two left-hemisphere mechanisms in speech perception. Perception & Psychophysics, 16, 601-612.
Davis, M. H., & Johnsrude, I. S. (2007). Hearing speech sounds: Top-down influences on the interface between audition and speech perception. Hearing Research, 229, 132-147.
Davis, M. H., Johnsrude, I. S., Hervais-Adelman, A., Taylor, K., & McGettigan, C. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General, 134, 222-241.
de Boer, B. (2006). The evolution of speech. In K. Brown (Ed.), Encyclopedia of language and linguistics (2nd ed.) (Vol. 4, pp. 335-338). Amsterdam: Elsevier.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149-179.
Dodd, B. (1987). Lip-reading, phonological coding and deafness. In B. Dodd, & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 177-190). London: Erlbaum.
Dodd, B., & Hermelin, B. (1977). Phonological coding by the prelinguistically deaf. Perception & Psychophysics, 21, 413-417.
Dooling, R. J., Best, C. T., & Brown, S. D. (1995). Discrimination of synthetic full-formant and sinewave /ra-la/ continua by budgerigars (Melopsittacus undulatus) and zebra finches (Taeniopygia guttata). Journal of the Acoustical Society of America, 97, 1839-1846.
Dooling, R. J., & Brown, S. D. (1990). Speech perception by budgerigars (Melopsittacus undulatus): Spoken vowels. Perception & Psychophysics, 47, 568-574.
Dorman, M. F., Loizou, P. C., Spahr, A. J., & Maloff, E. (2002). A comparison of the speech understanding provided by acoustic models of fixed-channel and channel-picking signal processors for cochlear implants. Journal of Speech, Language, and Hearing Research, 45, 783-788.
Doupe, A., & Kuhl, P. K. (1999). Birdsong and speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567-631.
Drullman, R. (2006). The significance of temporal modulation frequencies for speech intelligibility. In S. Greenberg, & W. A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 39-47). Mahwah, NJ: Erlbaum.
Ehret, G. (1987). Left hemisphere advantage in the mouse brain for recognizing ultrasonic communication calls. Nature, 325, 249-251.
Eimas, P. (1999). Segmental and syllable representations in the perception of speech by young infants. Journal of the Acoustical Society of America, 105, 1901-1911.
Evans, B. G., & Iverson, P. (2004). Vowel normalization for accent: An investigation of best exemplar locations in northern and southern British English sentences. Journal of the Acoustical Society of America, 115, 352-361.
Fitch, W. T. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4, 258-267.
Fleagle, J. G. (1999). Primate adaptation and evolution (2nd ed.). San Diego: Academic.
Fodor, J. A. (1983). Modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT.
Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13, 361-377.
Gannon, P. J., Holloway, R. L., Broadfield, D. C., & Braun, A. R. (1998). Asymmetry of chimpanzee planum temporale: Humanlike pattern of Wernicke's language area homolog. Science, 279, 220-222.
Gans, C. (1992). An overview of the evolutionary biology of hearing. In D. B. Webster, R. R. Fay, & A. N. Popper (Eds.), The evolutionary biology of hearing (pp. 3-13). New York: Springer.
Gerosa, M., Giuliani, D., & Brugnara, F. (2007). Acoustic variability and automatic recognition of children's speech. Speech Communication, 49, 847-860.
Gerosa, M., Lee, S., Giuliani, D., & Narayanan, S. (2006). Analyzing children's speech: An acoustic study of consonants and consonant-vowel transition. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 06), France, 1, 393-396.
Goswami, U. (2006). Reading and its development: Insights from brain science. Literacy Today, 46, 28-29.
Goswami, U. (2008). Reading, complexity and the brain. Literacy, 42, 67-74.
Greenwood, D. D. (1961). Critical bandwidth and the frequency coordinates of the basilar membrane. Journal of the Acoustical Society of America, 33, 1344-1356.
Greenwood, D. D. (1990). A cochlear frequency-position function for several species: 29 years later. Journal of the Acoustical Society of America, 87, 2592-2605.
Hackney, C. M. (2006). From cochlea to cortex: A simple anatomical description. In S. Greenberg, & W. A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 65-77). Mahwah, NJ: Erlbaum.
Heffner, R. S. (2004). Primate hearing from a mammalian perspective. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology, 281A, 1111-1122.
Heffner, R. S., & Heffner, H. E. (1992). Visual factors in sound localization in mammals. The Journal of Comparative Neurology, 317, 219-232.
Heimbauer, L. A., Beran, M. J., & Owren, M. J. (2011). A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content. Current Biology, 21, 1210-1214.
Hervais-Adelman, A., Davis, M. H., Johnsrude, I. S., & Carlyon, R. P. (2008). Perceptual learning of noise vocoded words: Effects of feedback and lexicality. Journal of Experimental Psychology: Human Perception and Performance, 34, 460-474.
Hienz, R. D., Sachs, M. B., & Sinnott, J. M. (1981). Discrimination of steady-state vowels by blackbirds and pigeons. Journal of the Acoustical Society of America, 70, 699-706.
Hillenbrand, J. M., Clark, M. J., & Baer, C. A. (2011). Perception of sinewave vowels. Journal of the Acoustical Society of America, 129, 3991-4000.
Hillenbrand, J. M., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099-3111.
Hopkins, W. D., Marino, L., Rilling, J. K., & MacGregor, L. A. (1998). Planum temporale asymmetries in great apes as revealed by magnetic resonance imaging (MRI). Neuroreport, 9, 2913-2918.
Houston, D. M., & Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26, 1570-1582.
Hugdahl, K. (2004). Dichotic listening in the study of auditory laterality. In K. Hugdahl, & R. J. Davidson (Eds.), The asymmetrical brain (pp. 441-476). Cambridge, MA: MIT.
Hugdahl, K., & Davidson, R. J. (2004). The asymmetrical brain. Cambridge, MA: MIT.
Izumi, A., & Kojima, S. (2004). Matching vocalizations to vocalizing faces in a chimpanzee (Pan troglodytes). Animal Cognition, 7, 179-184.
Jackson, I. I., Heffner, R. S., & Heffner, H. E. (1999). Free-field audiogram of the Japanese macaque (Macaca fuscata). Journal of the Acoustical Society of America, 106, 3017-3023.
Kalat, J. W. (2009). Biological psychology (5th ed.). Belmont, CA: Wadsworth.
Kashino, M. (2006). Phonemic restoration: The brain creates missing speech sounds. Acoustical Science & Technology, 27, 318-321.
Kimura, D. (1961). Cerebral dominance and the perception of verbal stimuli. Canadian Journal of Psychology, 15, 166-171.
Kluender, K. R., Diehl, R. L., & Killeen, P. R. (1987). Japanese quail can learn phonetic categories. Science, 237, 1195-1197.
Kluender, K. R., Lotto, A. J., & Holt, L. L. (2006). Contributions of nonhuman animal models to understanding human speech perception. In S. Greenberg, & W. A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 203-220). Mahwah, NJ: Erlbaum.
Kojima, S. (1990). Comparison of auditory functions in the chimpanzee and human. Folia Primatologica, 55, 62-72.
Kojima, S., & Kiritani, S. (1989). Vocal-auditory functions in the chimpanzee: Vowel perception. International Journal of Primatology, 10, 199-213.
Kojima, S., Tatsumi, I. F., & Hirose, H. (1989). Vocal-auditory functions of the chimpanzee: Consonant perception. Human Evolution, 4, 403-416.
Kuhl, P. K. (1988). Auditory perception and the evolution of speech. Human Evolution, 3, 19-43.
Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831-843.
Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138-1141.
Kuhl, P. K., & Miller, J. D. (1975). Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosive consonants. Science, 190, 69-72.
Kuhl, P. K., & Padden, D. M. (1982). Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques. Perception & Psychophysics, 35, 542-550.
Kuhl, P. K., & Padden, D. M. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73, 1003-1010.
Ladefoged, P. (2001). Vowels and consonants: An introduction to the sounds of language. Malden, MA: Blackwell.
Laver, J. (1994). Principles of phonetics. Cambridge, UK: Cambridge University.
Lee, S., Potamianos, A., & Narayanan, S. (1999). Acoustics of children's speech: Developmental changes of temporal and spectral parameters. Journal of the Acoustical Society of America, 105, 1455-1468.
Lenneberg, E. H. (1967). Biological foundations of language. New York: Wiley.
Lewis, D. E., & Carrell, T. D. (2007). The effect of amplitude modulation on intelligibility of time-varying sinusoidal speech in children and adults. Perception & Psychophysics, 69, 1140-1151.
Liberman, A. M. (1982). On finding that speech is special. American Psychologist, 37, 148-167.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revisited. Cognition, 21, 1-36.
Liberman, A. M., & Mattingly, I. G. (1989). A specialization for speech perception. Science, 243, 489-494.
Licklider, J. C. R., & Miller, G. A. (1960). The perception of speech. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1040-1074). New York: Wiley.
Lieberman, P. (1968). Primate vocalization and human linguistic ability. Journal of the Acoustical Society of America, 44, 1574-1584.
Limb, C. J., Kemeny, S., Ortigoza, E. B., Rouhani, S., & Braun, A. R. (2006). Left hemispheric lateralization of brain activity during passive rhythm perception in musicians. The Anatomical Record Part A, 288, 382-389.
Lindblom, B. E. F., & Studdert-Kennedy, M. (1967). On the role of formant transitions in vowel recognition. Journal of the Acoustical Society of America, 42, 830-843.
Loebach, J. L., & Wickesberg, R. E. (2006). The representation of noise vocoded speech in the auditory nerve of the chinchilla: Physiological correlates of the perception of spectrally reduced speech. Hearing Research, 213, 130-144.
Mann, V. A., & Liberman, A. M. (1983). Some differences between phonetic and auditory modes of perception. Cognition, 14, 211-235.
Marcus, G., Vijayan, S., Rao, S., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77-80.
Masterton, B., Heffner, H., & Ravizza, R. (1969). The evolution of human hearing. Journal of the Acoustical Society of America, 45, 966-985.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
Mills, A. (1987). The development of phonology in the blind child. In B. Dodd, & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 145-162). London: Erlbaum.
Newman, R. S. (2006). Perceptual restoration in toddlers. Perception & Psychophysics, 68, 625-642.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42-46.
Ohms, V. R., Gill, A., Van Heijningen, C. A. A., Beckers, G. J. L., & ten Cate, C. (2010). Zebra finches exhibit speaker-independent phonetic perception of human speech. Philosophical Transactions of The Royal Society B: Biological Sciences, 277, 1003-1009.
Olive, J. P., Greenwood, A., & Coleman, J. (1993). Acoustics of American English speech: A dynamic approach. New York: Springer.
Owren, M. J. (2010). GSU Praat Tools: Scripts for modifying and analyzing sounds using Praat acoustics software. Behavior Research Methods, 40, 822-829.
Owren, M. J., & Cardillo, G. C. (2006). The relative roles of vowels and consonants in discriminating talker identity versus word meaning. Journal of the Acoustical Society of America, 119, 1727-1739.
Owren, M. J., Hopp, S. L., Sinnott, J. M., & Petersen, M. R. (1988). Absolute auditory thresholds in three Old World monkey species (Cercopithecus aethiops, C. neglectus, Macaca fuscata) and humans (Homo sapiens). Journal of Comparative Psychology, 102, 99-107.
Patterson, M. L., & Werker, J. F. (2003). Two-month-old infants match phonetic information in lips and voice. Developmental Science, 6, 191-196.
Perfetti, C. A., & Sandak, R. (2000). Reading optimally builds on spoken language: Implications for deaf readers. Journal of Deaf Studies and Deaf Education, 5, 32-50.
Petersen, M. R., Beecher, M. D., Zoloth, S. R., Green, S., Marler, P., Moody, D. B., & Stebbins, W. C. (1984). Neural lateralization of vocalizations by Japanese macaques: Communicative significance is more important than acoustic structure. Behavioral Neuroscience, 98, 779-790.
Petersen, M. R., Beecher, M. D., Zoloth, S. R., Moody, D. B., & Stebbins, W. C. (1978). Neural lateralization of species-specific vocalizations by Japanese macaques (Macaca fuscata). Science, 202, 324-327.
Petersson, K. M., Silva, C., Castro-Caldas, A., Ingvar, M., & Reis, A. (2007). Literacy: A cultural influence on functional left-right differences in the inferior parietal cortex. European Journal of Neuroscience, 26, 791-799.
Pickles, J. O. (1988). An introduction to the physiology of hearing. San Diego: Academic.
Pisoni, D. B. (1995). Some thoughts on "normalization" in speech perception. Research on Spoken Language Processing, Progress Report No. 20, Indiana University, 3-29.
Purnell, T. C., & Raimy, E. (2008). Short time-reversal windows used to investigate the procession of intonation. Unpublished manuscript, University of Wisconsin-Madison.
Ramus, F., Hauser, M. D., Miller, C., Morris, D., & Mehler, J. (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science, 288, 349-351.
Rand, T. C. (1974). Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55, 678-680.
Remez, R. E. (2005). Perceptual organization of speech. In D. B. Pisoni, & R. E. Remez (Eds.), The handbook of speech perception (pp. 28-50). Oxford: Blackwell.
Remez, R. E., Dubowski, K. R., Broder, R. S., Davids, M. L., Grossman, Y. S., Moskalenko, M., Pardo, J. S., & Hasbun, S. M. (2011). Auditory-phonetic projection and lexical structure in the recognition of sine-wave words. Journal of Experimental Psychology: Human Perception and Performance, 37, 968-977.
Remez, R. E., & Rubin, P. E. (1990). On the perception of speech from time-varying acoustic information: Contributions of amplitude variation. Perception & Psychophysics, 48, 313-325.
Remez, R. E., Rubin, P. E., Berns, S. M., Pardo, J. S., & Lang, J. M. (1994). On the perceptual organization of speech. Psychological Review, 101, 129-156.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212, 947-949.
Romski, M. A., & Sevcik, R. A. (1996). Breaking the speech barrier: Language development through augmented means. Baltimore: Brookes.
Rosenblum, L. D. (2005). Primacy of multimodal speech perception. In D. B. Pisoni, & R. E. Remez (Eds.), The handbook of speech perception (pp. 51-78). Oxford: Blackwell.
Rosner, B. S., Talcott, J. B., Witton, C., Hogg, J. D., Richardson, A. J., Hansen, P. C., & Stein, J. F. (2003). The perception of "sine-wave speech" by adults with developmental disabilities. Journal of Speech, Language, and Hearing Research, 46, 68-79.
Rosowski, J. J. (1992). Hearing in transitional mammals: Predictions from the middle-ear anatomy and hearing capabilities of extant mammals. In D. B. Webster, R. R. Fay, & A. N. Popper (Eds.), The evolutionary biology of hearing (pp. 615-632). New York: Springer.
Rumbaugh, D. M. (2002). Emergents and rational behavior. Eye on Psi Chi, 6(2), 8-14.
Rumbaugh, D. M., King, J. E., Beran, M. J., Washburn, D. A., & Gould, K. L. (2007). A salience theory of learning and behavior: With perspectives on neurobiology and cognition. International Journal of Primatology, 28, 973-996.
Rumbaugh, D. M., & Savage-Rumbaugh, E. S. (1996). Biobehavioral roots of language: Words, apes, and a child. In B. M. Velichkovsky, & D. M. Rumbaugh (Eds.), Communicating meaning: The evolution and development of language (pp. 257-274). Mahwah, NJ: Erlbaum.
Rumbaugh, D. M., & Washburn, D. A. (2003). Intelligence of apes and other rational beings. New Haven, CT: Yale University.
Saberi, K., & Perrott, D. R. (1999). Cognitive restoration of reversed speech. Nature, 398, 760.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
Saldana, H. M., Pisoni, D. B., Fellowes, J. M., & Remez, R. E. (1996). Audio-visual speech perception without speech cues. Fourth International Conference on Spoken Language, Philadelphia, 4, 2187-2190.
Savage-Rumbaugh, E. S., Murphy, J., Sevcik, R. A., & Brakke, K. E. (1993). Language comprehension in ape and child. Monographs of the Society for Research in Child Development, 58, 1-222.
Sawusch, J. R. (2005). Acoustic analysis and synthesis of speech. In D. B. Pisoni, & R. E. Remez (Eds.), The handbook of speech perception (pp. 7-27). Oxford: Blackwell.
Shannon, R. V., Fu, Q. J., & Galvin, J. (2004). The number of spectral channels required for speech recognition depends on the difficulty of the listening situation. Acta Oto-Laryngologica, 552, 50-54.
Shannon, R. V., Zeng, F., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270, 303-304.
Sibley, C. G., & Ahlquist, J. E. (1990). Phylogeny and classification of birds: A study in molecular evolution. New Haven, CT: Yale University.
Simpson, A. P. (2009). Phonetic differences between male and female speech. Language and Linguistics Compass, 3, 621-640.
Sinnott, J. M., & Gilmore, C. S. (2004). Perception of place-of-articulation information in natural speech by monkeys versus humans. Perception & Psychophysics, 66, 1341-1350.
Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. I. (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17, 2387-2399.
Souza, P., & Rosen, S. (2009). Effects of envelope bandwidth on the intelligibility of sine- and noise-vocoded speech. Journal of the Acoustical Society of America, 126, 792-805.
Stebbins, W. C. (1973). Hearing of Old World monkeys (Cercopithecinae). American Journal of Physical Anthropology, 38, 357-364.
Stebbins, W. C. (1983). The acoustic sense of animals. Cambridge, MA: Harvard University.
Stebbins, W. C., & Moody, D. B. (1994). How monkeys hear the world: Auditory perception in nonhuman primates. In R. R. Fay, & A. N. Popper (Eds.), Comparative hearing: Mammals (pp. 97-133). New York: Springer.
Studdert-Kennedy, M. (1980). Speech perception. Language and Speech, 23, 45-66.
Studdert-Kennedy, M., & Shankweiler, D. (1970). Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 48, 579-594.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212-215.
Sumner, M. (2011). The role of variation in the perception of accented speech. Cognition, 119, 131-136.
Taglialatela, J. P. (2007). Functional and structural asymmetries for auditory perception and vocal production in nonhuman primates. In W. D. Hopkins (Ed.), The evolution of hemispheric specialization in primates (pp. 120-145). Amsterdam: Elsevier.
Taglialatela, J. P., Russell, J. L., Schaeffer, J. A., & Hopkins, W. D. (2008). Communicative signaling activates "Broca's" homolog in chimpanzees. Current Biology, 18, 343-348.
Tartter, V. C. (1989). What's in a whisper? Journal of the Acoustical Society of America, 86, 1678-1683.
Trout, J. D. (2001). The biological basis of speech: What to infer from talking to the animals. Psychological Review, 108, 523-549.
Trout, J. D. (2003). Biological specializations for speech: What can the animals tell us? Current Directions in Psychological Science, 12, 155-159.
Tuomainen, J., Andersen, T. S., Tiippana, K., & Sams, M. (2005). Audio-visual speech perception is special. Cognition, 96, B13-B22.
Walden, B. E., Busacco, D. A., & Montgomery, A. A. (1993). Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research, 36, 431-436.
Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science, 167, 392-393.
Warren, R. M., & Obusek, C. J. (1971). Speech perception and phonemic restorations. Perception & Psychophysics, 9, 358-362.
Werker, J. F., & Desjardins, R. N. (1995). Listening to speech in the first year of life: Experiential influences on phoneme perception. Current Directions in Psychological Science, 4, 76-81.
Whalen, D. H., & Liberman, A. M. (1987). Speech perception takes precedence over nonspeech perception. Science, 237, 169-171.
Wood, B. (1996). Human evolution. Bioessays, 18, 945-954.