
Journal of Experimental Psychology: Human Perception and Performance, 1980, Vol. 6, No. 1, 110-125

Phonetic Categorization in Auditory Word Perception

William F. Ganong III
Brown University

To investigate the interaction in speech perception of auditory information and lexical knowledge (in particular, knowledge of which phonetic sequences are words), acoustic continua varying in voice onset time were constructed so that for each acoustic continuum, one of the two possible phonetic categorizations made a word and the other did not. For example, one continuum ranged between the word dash and the nonword tash; another used the nonword dask and the word task. In two experiments, subjects showed a significant lexical effect—that is, a tendency to make phonetic categorizations that make words. This lexical effect was greater at the phoneme boundary (where auditory information is ambiguous) than at the ends of the continua. Hence the lexical effect must arise at a stage of processing sensitive to both lexical knowledge and auditory information.

Linguistic context has long been known to aid and bias the identification of speech. For example, the identification of sequences of words in noise is substantially aided if the words form sentences (Miller, Heise, & Lichten, 1951). The phoneme restoration effect (Warren, 1970) shows that context can control the perception of individual segments: When a single phone of an utterance is replaced by a noise burst or a cough, a listener is not aware which phone was replaced. Both previous and subsequent context influence this effect (Warren & Sherman, 1974). Perception is influenced not only by immediate linguistic context (e.g., the role a word plays in a sentence) but also by the frequency of a word's use in the language. For example, there is a large word-frequency effect in the identification of words presented in noise (Broadbent, 1967).

This work was supported by Grant HD 05331 from the National Institute of Child Health and Human Development to Peter D. Eimas, by National Science Foundation (NSF) Grant BMS 75 08439 to Richard B. Millward, by National Institutes of Health Grant NO1 HD 12420 to Haskins Laboratories, and by an NSF postdoctoral fellowship to the author (SMI 77-12336).

Alvin Liberman made the facilities of Haskins Laboratories available to me. Terry Halwes's and Patricia Keating's aid in generating the stimuli was essential, and discussion with them, Susan Carey, and Peter D. Eimas helped to clarify the issues discussed here. Several reviewers, including Neal Macmillan, Bruno Repp, and Michael Posner, made comments that improved the presentation of this material. I thank them all. This work was reported at the 95th meeting of the Acoustical Society of America at Providence, Rhode Island, May 1978.

Requests for reprints should be sent to William F. Ganong III, who is now at the Department of Psychology, University of Pennsylvania, 3815 Walnut Street, Philadelphia, Pennsylvania 19104.

This article examines the influence on phonetic categorization of a rather simple linguistic variable: the lexical status of a phonetic sequence (i.e., whether the phonetic sequence is a word). Lexical status is already known to influence phonetic processing. Reaction time for phoneme detection is faster if the target appears in a word than if it appears in a nonword (Rubin, Turvey, & Van Gelder, 1976). Presumably this word advantage reflects a perceptual bias in favor of words similar to the perceptual bias that favors high-frequency words over low-frequency words in noise. The purpose of this article is to determine the stage in the perceptual process at which the biasing effect of lexical status appears. Does it follow phonetic categorization, or, alternatively, can it influence the interpretation of acoustic cues, which underlies phonetic categorization?

Asking this question presupposes that there is a stage of processing in normal speech perception that carries out phonetic categorization. This is a substantive assumption—it is possible, for example, to construct models for word perception that do not include a stage of phonetic categorization (Klatt, 1979). But the assumption must certainly be correct for experiments in which subjects are required to make phonetic categorizations. Evidence for this assumption in other experimental contexts derives primarily from work on categorical perception. The first psychologists to investigate the perception of synthetically constructed stop consonants discovered that listeners were able to discriminate with ease only those stimuli that they perceived as belonging to different phonetic categories (Liberman, Harris, Hoffman, & Griffith, 1957). Discrimination of stimuli from the same phonetic category was only slightly above chance. This phenomenon was named categorical perception because subjects acted as if the only information available was the phonetic category of each stimulus.

The concept of categorical perception has since undergone modification. It is now clear that subjects can use some auditory information as well as information about phonetic categories in discrimination (Carney, Widin, & Viemeister, 1977; Fujisaki & Kawashima, 1970), but under many conditions, the perception of speech, especially the stop consonants, is very nearly categorical. Is this auditory information of any use in the normal course of speech perception (the purpose of which is clearly the identification of words and sentences, not their discrimination)? Liberman, Mattingly, and Turvey (1972) proposed, instead, that the role of phonetic categorization is the substitution of a phonetic label for the auditory information representing a stop consonant in order to facilitate higher level linguistic processing.1

The phonetic categorization of an acoustic continuum between different stop consonants is typically characterized by two unambiguous regions (which are consistently given one or another phonetic categorization) separated by a narrow boundary region containing phonetically ambiguous stimuli. An acoustic variable frequently used to construct such acoustic continua is voice onset time (VOT). VOT is an acoustic cue for voicing in syllable-initial stops in many languages (Lisker & Abramson, 1964). Perception of such a continuum is often described by a single number, the locus of the phoneme boundary, which is the (interpolated) point on the acoustic continuum that would receive each phonetic categorization on half of the trials. For an English-speaking subject, the phoneme boundary between d and t responses for a [da-ta] continuum is at about 35-msec VOT. Stimuli with VOT of less than 30 msec are consistently labeled d, and stimuli with VOT greater than 40 msec are consistently labeled t.
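Computing the boundary from identification data is a short exercise in linear interpolation. A minimal Python sketch, with hypothetical response proportions; the function name and the data are illustrative, not from the paper:

```python
# Estimate the phoneme boundary of a VOT continuum by linear
# interpolation: the VOT at which P(voiced response) crosses 0.5.
# Identification data below are hypothetical for a [da]-[ta] continuum.

def phoneme_boundary(vots, p_voiced, criterion=0.5):
    """Interpolate the VOT at which p_voiced crosses the criterion."""
    for (v0, v1), (p0, p1) in zip(zip(vots, vots[1:]),
                                  zip(p_voiced, p_voiced[1:])):
        # Look for the step where the function crosses the criterion.
        if (p0 - criterion) * (p1 - criterion) <= 0 and p0 != p1:
            # Linear interpolation between the two straddling points.
            return v0 + (criterion - p0) * (v1 - v0) / (p1 - p0)
    return None  # no crossing: categorization was never ambiguous

vots = [15, 25, 30, 35, 40, 45, 55]                  # msec, as in Experiment 1
p_voiced = [1.0, 0.97, 0.85, 0.55, 0.20, 0.05, 0.0]  # hypothetical
print(phoneme_boundary(vots, p_voiced))              # ~35.7 msec
```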

A model in which lexical status affects phonetic processing only after phonetic categorization has occurred can be called a categorical model. In an interactive model, lexical status would be allowed to direct and bias the processing of the auditory information specifying phonetic categories. One example of such a model is the criterion-shift model, in which lexical status would change the criterion by which the auditory information is judged. The categorical model can be tested by determining whether the effect of lexical status is concentrated at the phoneme boundary.

1 The situation is clearly different for noncategorical phonetic distinctions. Many phonetic distinctions (such as differences between vowels) are conveyed by large acoustic differences that can be easily discriminated regardless of phonetic category. Context can bias the interpretation of these differences in words. In an experiment on the perception of disyllabic words in noise, Pollack (1959) showed that some additional auditory information beyond a simple phonetic transcription is available to subjects for a few seconds after a word is presented. He had subjects identify words from a response set given at various delays after the stimulus. For delays under 5 sec, performance depended on the delay between presentation of the stimulus and the response set and was better than performance based on subjects' immediate identification of the stimulus presented. Thus, some form of auditory memory must have been involved. However, Pollack's stimuli differed in both consonants and vowels. The auditory memory used may have been primarily memory for vowels. The present experiment extends Pollack's results to the class of speech sounds that are least likely to allow such auditory memory: the stop consonants.

Figure 1. The lexical effect, as predicted by the categorical model. (The lower portion of the graph shows the proportion of stimuli given a voiced phonetic categorization as a function of voice onset time (VOT) for three different types of continua. The solid line shows idealized data from a neutral continuum, whose perception is not biased by lexical effects. The line with short dashes shows the categorical model's prediction of a lexical effect on a continuum whose voiced end is a word and whose voiceless end is a nonword. Similarly, the line with long dashes shows the result of a postcategorical tendency to make phonetic categorizations voiceless, on a continuum in which voiceless responses make words. The top part of the figure shows, as a function of VOT, a measure of the lexical effect.)

In the categorical model, a bias toward phonetic categorizations that make words would operate as a correction process. Candidate phonetic categorizations that did not happen to make words would be changed to phonetic categorizations that did make words. Thus, subjects presented with speech stimuli for phonetic classification as beginning with [d] or [t] might, upon hearing the nonword tash, correct the [t] categorization to [d] so as to make the word dash. Since a strict categorical model assumes no acoustic information is available when lexical status has its effect (i.e., after phonetic categorization), for a particular acoustic continuum between words and nonwords, the probability of a categorization being corrected depends only on the categorization and not on the acoustic information that specified it. A formal description of this model is given in Appendix A.

In the present experiment, the phonetic categorization of a lexically biased continuum is not compared with the categorization of an unbiased continuum, but with a continuum biased in the opposite direction. Figure 1 shows the result of such biases, according to the categorical model.

The figure shows a hypothetical phonetic categorization function (the solid line) based on a cumulative normal function, assuming a difference of 5-msec VOT corresponds to a z score difference of 1.2. The categorical model's predictions for the shape of the identification functions, assuming a probability of correcting a nonword response of .25, are also shown. In the top panel of Figure 1 is the resulting lexical-effect function, which is simply the difference between the two identification functions. The shape of this lexical-effect function, as predicted by the categorical model, is derived in Appendix A. In Figure 1 it is assumed that the two lexical biases are equal, so the lexical-effect function does not depend on VOT. If the biases are not equal, the resulting lexical-effect function will be a monotonic function of VOT, and its value at the phoneme boundary will be the average of the values at the ends of the continuum.
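The predictions in Figure 1 can be reproduced directly from the parameters just stated. A minimal Python sketch, assuming the cumulative normal identification function described above (5 msec of VOT corresponding to a z difference of 1.2) and a correction probability of .25; the function and variable names are mine:

```python
# Categorical-model predictions of Figure 1: a post-categorization
# correction process applied to an unbiased identification function.
from math import erf, sqrt

def p0_voiced(vot_from_boundary):
    """Unbiased P(voiced) as a cumulative normal of VOT; 5 msec = 1.2 z."""
    z = -vot_from_boundary * 1.2 / 5.0   # voiced responses fall as VOT grows
    return 0.5 * (1 + erf(z / sqrt(2)))

P_CHANGE = 0.25  # probability that a nonword categorization is corrected

for v in (-7.5, -2.5, 0.0, 2.5, 7.5):           # msec relative to the boundary
    p0 = p0_voiced(v)
    p_word_nonword = p0 + P_CHANGE * (1 - p0)   # voiced end is the word
    p_nonword_word = (1 - P_CHANGE) * p0        # voiceless end is the word
    lexical_effect = p_word_nonword - p_nonword_word
    print(f"{v:+5.1f} msec: effect = {lexical_effect:.3f}")  # constant .25
```

With equal correction probabilities the printed lexical effect is the same at every VOT, which is exactly the flat lexical-effect function in the top panel of Figure 1.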

In a criterion-shift model, on the other hand, a bias toward hearing words could affect the interpretation of acoustic information at a stage of processing before phonetic categorization. The criterion for making a phonetic categorization would depend on lexical status. This change in criterion would produce a shift in the location of the phoneme boundary. For example, a subject whose phoneme boundary for a [da-ta] continuum was at 35-msec VOT might require 40-msec VOT to hear a [t] in the environment _ash (because dash is a word and tash is not). Such a shift in the location of the phoneme boundary would not produce a uniform effect throughout the continuum but would produce an effect concentrated near the phoneme boundary, just as a change in threshold affects the probability of detection of near-threshold stimuli far more than subthreshold or suprathreshold stimuli. The categorization of stimuli far from the phoneme boundary would not be influenced much because the shift in criterion would not affect the interpretation of unambiguous acoustic evidence. However, for the acoustically ambiguous stimuli near the boundary, a change in criterion would produce large effects on categorization. Such a change could shift a stimulus from the ambiguous region to the word region or from the part of the nonword phonetic category near the boundary into the ambiguous region. This model is also described quantitatively in Appendix A.

Figure 2 shows the predictions of the criterion-shift model. This figure assumes equal but opposite direction lexical biases for the two continua, as did Figure 1. Assuming that the phonetic categorization function is sigmoid, with the steepest slope near the phoneme boundary, the lexical-effect function will generally reach a maximum in the neighborhood of the phoneme boundary and thus be greater near the phoneme boundary than at either end of the continuum. This is unlike the shape of the lexical-effect function according to the categorical model, for which the value at the boundary must be between the values at the ends of the continua. Thus, to test the categorical model, it is only necessary to determine whether the effect of lexical bias is spread equally throughout the continuum (as the categorical model predicts) or concentrated at the phoneme boundary (as the criterion-shift model predicts).
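For comparison, a minimal Python sketch of the criterion-shift prediction in Figure 2, assuming the same cumulative normal identification function; the 2.5-msec shift is an illustrative choice of mine, not a value fitted by the paper:

```python
# Criterion-shift predictions of Figure 2: lexical bias moves the
# criterion (the phoneme boundary) instead of correcting labels.
from math import erf, sqrt

def p0_voiced(v):
    """Unbiased P(voiced); cumulative normal, 5 msec of VOT = 1.2 z."""
    return 0.5 * (1 + erf(-v * 1.2 / 5.0 / sqrt(2)))

SHIFT = 2.5  # msec of criterion shift in each direction (illustrative)

for v in (-7.5, -2.5, 0.0, 2.5, 7.5):      # msec relative to the boundary
    p_word_nonword = p0_voiced(v - SHIFT)  # voiced end a word: b = -SHIFT
    p_nonword_word = p0_voiced(v + SHIFT)  # voiceless end a word: b = +SHIFT
    print(f"{v:+5.1f}: {p_word_nonword - p_nonword_word:.3f}")  # max at 0
```

Unlike the categorical sketch, the printed difference peaks at the boundary (about .45 here) and shrinks toward the endpoints, which is the signature the experiments test for.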

The most direct way to determine whether lexical effects are stronger near the phoneme boundary would be to compare phonetic categorizations of lexically biased acoustic continua with the phonetic categorizations of neutral continua, which are not biased by lexical factors. Alexander (Note 1) has done this with limited success.2 The present study, instead, compared categorizations of matched pairs of continua chosen so that lexical biases on the two continua operate in opposite directions. Pairs of VOT continua between monosyllabic words and nonwords were synthesized so that the voiced end of one continuum of each pair was a word and the voiceless end was not, and vice versa for the other continuum. For example, one continuum ranged from the word dash to the nonword tash, and its matched pair ranged from dask (nonword) to task (word). Thus, it is only possible to measure the lexical effect for the two continua combined. The continua of each pair were carefully matched to have the same vowel and to be as similar as possible in postvocalic consonants.

2 The problem with this approach is that biases in the perception of the neutral continuum against which the lexically biased series are judged can confuse and hide lexical effects. This seems to have happened in Alexander's (Note 1) study. It showed an overall bias toward phonetic categorizations that make words, but this bias was not reliable across continua.

Figure 2. The lexical effect, as predicted by the criterion-shift model. (This figure is analogous to Figure 1, except that it shows predictions of the criterion-shift model.)

Experiment 1

Method

Stimuli. Seven pairs of continua were synthesized using the Haskins Laboratories speech synthesis by rule program, FOVE (Ingemann, Note 2). Four alveolar continuum pairs were synthesized. One pair was based on the words dash and task (that is, one continuum ranged between the word dash and the nonword tash; the other ranged between the nonword dask and the word task). Other continuum pairs were based on dust and tuft, dirt and turf, and dose and toast. Three velar continuum pairs, based on gift and kiss, geese and keep, and gush and cusp, were also synthesized. Each continuum had seven members, with VOTs of 15, 25, 30, 35, 40, 45, and 55 msec. The stimuli were recorded on audiotape in the format that subjects later heard. The alveolar and velar stimuli were presented in different blocks. Thus, there was a block containing a randomization of all the alveolar stimuli (presented with a 3-sec interstimulus interval), which was followed by a block containing all the velar stimuli. This pattern was repeated six times. Another tape was constructed that contained all the endpoint stimuli, with a 5-sec interstimulus interval.

Procedure. Seventeen subjects3 (paid volunteers from the psychology department's subject pool) participated in the experiment. First, they were presented with a block of trials containing each stimulus from the alveolar continua. The subjects were instructed to write, for each stimulus, their first impression as to whether the syllable began with d or t. This was followed by a block of velar trials. Six blocks of alveolar and six blocks of velar trials were presented to each subject. The subjects were not told which words and nonwords would be presented.

3 An 18th subject, who refused to categorize any of the velar stimuli as g or k, was immediately dropped.

Figure 3. Results of Experiment 1. (Phonetic categorizations pooled as described in the text.)

Unfortunately, FOVE does not always produce perfectly intelligible speech. To be able to show an advantage for phonetic categorizations that make words, a subject must hear the words as words. Hence, in a second condition (which always followed the first), the endpoints of the continua were presented and subjects spelled out the words and nonwords they heard. The subjects were instructed to spell words correctly and to make a rough guess for the spelling of nonwords. The data from the second condition were used to determine which continua each subject heard correctly. For each subject, the phonetic categorization data were analyzed for only those continuum pairs for which the lexical status (e.g., word or nonword) of three of the four endpoint stimuli was correctly identified.

Results

Application of the criterion to subjects' spelling of the words and nonwords resulted in elimination of from 0 to 3 of the 7 continuum pairs for each subject, for a total of 22 of the 119 continuum pairs (18%). Different continuum pairs passed the criteria to quite different degrees. The most successful pair, dash-task, was spelled acceptably by all of the 17 subjects, whereas the geese-keep pair was spelled correctly by only 6 of the subjects.

The results of phonetic categorization, pooled across subjects and continua, are shown in Figure 3. Phoneme boundaries differed across subjects and across continua. Thus, each subject's data from a given continuum pair was pooled, and the phoneme boundary for that subject on that continuum pair was determined. The position of each phoneme boundary was estimated by finding the VOT that received the proportion of voiced responses closest to one half. Henceforth, the data are considered relative to the position of these phoneme boundaries. The line with short dashes in the figure shows the proportion of voiced responses to continua whose voiced end is a word, and the line with long dashes shows responses to continua whose voiceless end is a word. The leftmost and rightmost points of each line show responses to the endpoints of the continua: stimuli with VOTs of 15 and 55 msec. The four middle points of each line show responses to stimuli with various differences in VOT from the phoneme boundary. For example, the third point from the left in each line was determined by pooling data from the first stimulus to the left of each subject's phoneme boundary for each continuum pair. There was a small but consistent lexical effect in all seven continuum pairs (p < .01, sign test). This effect was also consistent across subjects (pooling data across different continua); 16 of the 17 subjects showed a lexical effect (p < .001, sign test).
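The boundary-relative pooling just described can be sketched briefly; the closest-to-one-half rule is the one stated above, and the response proportions below are hypothetical:

```python
# Pool responses relative to each estimated phoneme boundary, as in the
# analysis of Experiment 1. Boundary = the VOT step whose proportion of
# voiced responses is closest to one half; data here are hypothetical.
def boundary_step(vots, p_voiced):
    """Index of the VOT step with P(voiced) closest to .5."""
    return min(range(len(vots)), key=lambda i: abs(p_voiced[i] - 0.5))

vots = [15, 25, 30, 35, 40, 45, 55]
p_voiced = [0.98, 0.90, 0.75, 0.48, 0.15, 0.05, 0.02]  # one continuum pair
b = boundary_step(vots, p_voiced)
# Responses one step to either side of the boundary, pooled across
# subjects and continuum pairs, give the boundary-relative points.
print(vots[b], p_voiced[b - 1], p_voiced[b + 1])  # 35 0.75 0.15
```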

In Figure 3, the lexical effect seems to be stronger at the voiceless end of the continua than at the voiced end. This pattern was reliable across subjects (11 subjects had stronger effects at the voiceless end, 2 at the voiced end, and 4 had equal-size effects; p < .05, sign test) but not across continua (five continua showed the effect, one did not, and one showed equal-size effects). Most of the effect was caused by the gift-kiss continuum pair. Most subjects categorized almost all of the stimuli from the gift-kift continuum as beginning with g.

The data were next examined to determine whether the lexical effect was spread throughout the continuum, as predicted by the categorical model, or concentrated at the phoneme boundary, as predicted by the interactive model. For each subject, the size of the lexical effect exhibited at the phoneme boundary was compared with the sum of the two lexical effects measured at the ends of the continua. Sixteen of the subjects showed more lexical effect at the phoneme boundary than at either endpoint, and 1 showed the opposite tendency (p < .005, sign test). Five of the seven continuum pairs showed more of an effect at the boundaries than at the endpoints. One showed equal-size effects (the dose-toast pair), and one (the problematical geese-keep pair) showed the opposite effect. Only 6 of the subjects passed the (quite weak) screening criteria for this continuum, and none of them heard all four of the endpoint stimuli from these continua correctly. The concentration of the lexical effect at the phoneme boundary is, then, not significant across continua (p > .1). However, this failure to obtain significance is probably due to the small number of different continuum pairs tested and to the poor quality of the geese-keep pair.

Experiment 2

Experiment 2 was designed to provide a replication of Experiment 1, with more continuum pairs, to determine whether the lexical effect arises the first time a subject hears stimuli from a particular continuum,4 and to determine whether the effect is robust enough to appear even if subjects know the stimulus set.

Method

Stimuli. The stimuli for Experiment 2 were produced by digitally cross splicing tokens of natural speech. Any stimulus with a positive VOT contains two acoustic segments: a voiceless, noisy segment (consisting of a burst and aspiration) and a voiced segment, which begins, by definition, at the voice onset time. In VOT continua produced with a speech synthesizer (as used in most previous experiments on the perception of VOT), the voiceless segment is produced by passing aspiration noise through the formant filters. At the onset of voicing, glottal pulsing replaces aspiration noise as the source for the formant filters, and a voiced segment results. For the stimuli for Experiment 2, these two segments were excised from natural speech rather than produced synthetically (Lisker, 1976; Spencer & Halwes, Note 3). Thus the stimuli were considerably more intelligible than those of Experiment 1.

For each continuum pair, two voiced stimuli with exactly the same acoustic waveform for the first 100 msec were constructed by digital splicing. Two voiceless endpoints were similarly constructed. These tokens were used to supply the voiced and aspirated segments to construct the stimuli with different VOTs.

4 This condition was included to assure that the lexical effect is truly perceptual, that is, that it arises before subjects have learned the set of continua used in the experiment. Otherwise, the lexical effect might be due to processes peculiar to a situation in which a closed set of words and nonwords is presented repeatedly.


For each continuum, six stimuli were produced by cross splicing. One had the shortest VOT possible using the particular natural tokens on which the stimuli were based. The next four stimuli for each continuum were chosen to span the phoneme boundary (as measured in a pilot experiment) in approximately 5-msec steps. The sixth stimulus had the longest VOT possible given the tokens used. The VOTs of the first, second, and sixth stimulus of each continuum are given in Table 1. The VOTs of the third, fourth, and fifth stimuli were approximately 5, 10, or 15 msec more than the VOT of the second stimulus. The details of the construction and selection of these stimuli are given in Appendix B.

Eighteen new subjects (again, paid volunteers from the psychology department's subject pool) participated in Experiment 2. The experiment was conducted using a PDP-8 minicomputer, which generated the stimuli on-line by digital splicing, presented the stimuli to subjects (at a 10-kHz sampling rate), and collected responses. Subjects were run in groups of 3 or fewer. They heard stimuli presented over headphones and typed their responses. Each subject was assigned to one of six groups, each group containing 3 subjects. All subjects in each group heard the stimuli in the same order.

Each subject participated in one session consisting of seven blocks of trials. In the first block of trials, subjects were presented with a randomization of all 48 labial stimuli and pushed the b or p keys to indicate their first impression of the first segment of each stimulus. The order in which stimuli were presented was constrained so that for each stimulus of each continuum, one of the six groups of subjects heard that stimulus before they heard the other stimuli from the same continuum. The second and third blocks presented the alveolar and velar stimuli in an analogous fashion for phonetic categorization.

The fourth block was a whole-syllable identification condition. Subjects were presented with endpoint stimuli from all of the continua in a random order and responded by indicating whether the stimulus was a word or nonword (by pushing the w or n keys) and spelling out the stimulus.

In the fifth through seventh blocks, subjects again phonetically categorized the labial, alveolar, and velar stimuli. However, this time the stimuli were presented in groups of trials containing only stimuli from a particular continuum pair, and the first four stimuli of each such group of trials were the four endpoints of the two continua. Within each block, the different groups of trials were not explicitly separated, but the way in which the stimuli were grouped was explained to the subjects. Throughout the experiment, each stimulus was presented 3 sec after the last subject finished responding to the previous stimulus.

Results

The same criteria used in Experiment 1 were applied to subjects' identification of the lexical status of the endpoints (collected in the fourth block).

Table 1
Stimuli for Experiment 2

                               Voice onset time (msec)
Voiced word   Voiceless word   Stimulus 1   Stimulus 2   Stimulus 6

Labials
bash          past                   10.0         19.0         61.7
boat          pope                    8.6         18.0         42.8
babe          page                    7.2         16.2         59.4
beef          peace                   8.0         16.7         56.1

Alveolars
dark          tarp                   18.6         36.6         84.4
deep          teach                  22.1         36.0         89.0
depth         text                   19.3         46.9         82.6
dirt          turf                   19.5         52.2         86.0

Velars
garb          cars                   18.2         40.6         64.3
gorge         corpse                 35.4         53.2         86.1
gulp          cult                   35.6         39.4         69.8
gout          couch                  14.9         37.9         66.7
gift          kiss                   23.1         40.6         85.1
geese         keep                   21.9         39.0         88.8

Note. Stimuli 1 and 6 had the shortest and longest voice onset times (VOTs) possible, given the particular tokens used. Stimuli 2, 3, 4, and 5 were chosen to span the phoneme boundary in 5-msec steps. Thus the VOTs of Stimuli 3, 4, and 5 can be obtained by adding 5, 10, or 15 msec to the VOT for Stimulus 2.

The stimuli for Experiment 2 were more successful than those of Experiment 1—only 24 of the 252 continuum pairs were eliminated (10%). Figure 4 shows the phonetic categorization data, pooled in the same way as the data of Figure 3 (i.e., with respect to phoneme boundary locations for each continuum pair), for the first three blocks of trials, in which the stimuli were presented in random order for phonetic categorization. Again, there is a lexical effect that is significant across subjects (17 of 18, p < .001, sign test) and across continua (14 of 14, p < .001). This time the effect is significantly stronger at the boundary than at the end of each continuum, showing a larger effect both for subjects (15 of the 18 subjects show the effect, p < .005) and for continua (14 of 14, p < .001). Thus, in Experiment 2 as in Experiment 1, there is a lexical effect concentrated at the phoneme boundary.

Figure 4. Results of Experiment 2. (Phonetic categorizations of each subject during the first three blocks of the experiment, when stimuli from different continua were presented together in random order.)

The data were next examined to see if the lexical effect was present the first time that subjects heard a stimulus from a particular continuum. Figure 5 shows the resulting data, pooled with respect to phoneme boundary location for the first three blocks of trials. Again, there is a significant lexical effect across continua (nine continua show the lexical effect, one shows the opposite effect, and four show no effect, p < .05, by a sign test), and, again, it appears stronger at the phoneme boundary than at the ends of the continua.

Figure 5. Phonetic categorization when stimuli are unfamiliar. (These data represent subjects' responses to the first stimulus presented to them from each continuum.)

Finally, data gathered in the last three blocks of trials were examined. Responses pooled in the usual way are shown in Figure 6. Again, there is a significant lexical effect that is stronger at the boundary than at the endpoints. This effect is not apparently different from the effects during the first blocks.

Figure 6. Phonetic categorization with grouped presentation of stimuli. (The results of the last three blocks of Experiment 2, when stimuli from each continuum pair were presented in the same group of trials.)

The failure of blocking to reduce or eliminate the word advantage was surprising, since most previous studies of the word advantage in vision or word frequency effects in auditory word perception have found that the effects are eliminated when the message set is known. However, the word advantage in visual perception can be maintained in the face of perfect knowledge of the stimulus set (Smith & Haviland, 1972) if the visual angle of the words is small enough (Purcell, Stanovich, & Spector, 1978).

In Figure 4 (as in Figure 3), there seems to be a slightly larger lexical effect for the voiceless stimuli than for the voiced stimuli. On closer examination, however, this tendency is not reliable across subjects (of the 14 subjects showing different amounts of lexical bias at the ends of the continua, 7 showed more effect at the voiced end) nor continua (five of the nine unequally affected continuum pairs showed stronger effects at the voiced end). The tendency for the lexical effect to be greater at the voiceless end of the continuum in Experiment 1 thus seems to have been a product of the particular words or speech-synthesis strategy used there.

The data were also examined to determine whether the size of the lexical effect is correlated with the frequency of occurrence of the words defining the continua. The sum of the logarithm of the frequency of occurrence (as measured using the Kucera & Francis, 1967, norms) of the words defining each continuum pair was correlated with the size of the lexical effect measured at the phoneme boundary for each pair. The correlation coefficient (−.04) was not reliably different from zero. Thus, no word-frequency effect is apparent in these data. However, the experiment was not designed to test for word-frequency effects, so little importance should be attached to this result.
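For concreteness, this analysis amounts to a Pearson correlation between summed log frequencies and boundary effect sizes. A sketch with placeholder numbers; the frequencies and effect sizes below are invented for illustration, and only the reported r of −.04 comes from the paper:

```python
# Correlate the summed log word frequency of each continuum pair with
# the size of its lexical effect at the boundary. All counts and effect
# sizes here are hypothetical placeholders, not the paper's data.
from math import log10

pairs = {  # pair -> (frequency of voiced word, frequency of voiceless word)
    "dash-task": (14, 22), "gift-kiss": (33, 17), "beef-peace": (10, 198),
}
effects = {"dash-task": 0.31, "gift-kiss": 0.45, "beef-peace": 0.22}

xs = [log10(f1) + log10(f2) for f1, f2 in pairs.values()]
ys = [effects[k] for k in pairs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = sum((x - mx) ** 2 for x in xs) ** 0.5
sy = sum((y - my) ** 2 for y in ys) ** 0.5
print(cov / (sx * sy))  # Pearson r; the paper reports r = -.04
```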

Discussion

It is clear from the results of Experiments 1 and 2 that lexical status affects phonetic categorizations much more for acoustically ambiguous (i.e., boundary) stimuli than for acoustically unambiguous (endpoint) stimuli. Hence, lexical status has an effect before acoustic information is replaced by a phonetic categorization. This demonstration that lexical effects are stronger at the boundary than at the endpoints does not rule out the possibility that some lexical effects occur after phonetic categorization. However, the concentration of the lexical effect at the phoneme boundary certainly does show that some acoustic information is available when lexical knowledge comes into play. So the categorical model is incorrect, although there may be postcategorical effects as well as precategorical ones.5

Lexical status is a fairly simple form of higher level linguistic knowledge that might be expected to affect the interpretation of acoustic evidence. Other studies have shown effects comparable to the results reported here, using semantic (rather than lexical) information to bias phonetic categorizations. Garnes and Bond (1977) showed that sentence context can bias phonetic categorization, and Spencer and Halwes (Note 3) have shown that the size of these effects depends, to some extent at least, on subjects' expectations about and knowledge of the experimental situation. Similarly, Marslen-Wilson and Welsh (1978) and Cole and Jakimik (1978) have shown that the detection and shadowing of mispronunciations depends on the predictability (on syntactic and semantic grounds) of the mispronounced word. Thus, it seems that the additional auditory information tapped by the lexical effect can interact with much higher level linguistic constraints.

There are at least three different levels at which this information could be coded. The information could be coded as extra candidate phonetic categorizations; it could be kept in a raw, uninterpreted form; or it could be coded as confidence ratings for various phonetic categorizations.

5 It could be argued that the results of the present experiments are "merely" due to response bias. But this claim is irrelevant to the question under consideration here: The categorical model prohibits response bias (or any other tendency to perceive speech sounds as words) from being concentrated at the phoneme boundary.

In a simple version of sophisticated guessing theory, the extra information would be represented as a set of possible phonetic categorizations (Catlin, 1969). In the case of stimuli from an acoustic continuum, this is equivalent to adding a third categorization, don't know, to the two possible categorizations of the continuum. To account for the lexical effect's concentration at the phoneme boundary, this model must assume that only items labeled as ambiguous are susceptible to lexical effects and that the probability of an item receiving an ambiguous categorization increases near the phoneme boundary. Of the various possible representations of the additional auditory information, this is the most linguistic.

At the opposite pole is the possibility that the perceptual system stores the additional auditory information in a raw, unprocessed, echoic form. In this model, when the output of the phonetic recognizer is not a word, the acoustic evidence specifying that phonetic sequence would be reexamined to determine whether the evidence was consistent with any phonetic strings that were words. This model would explain the concentration of the lexical effect near the phoneme boundary by postulating that stimuli far from the phoneme boundary would be consistent with only one phonetic categorization. Stimuli near the boundary, on the other hand, would allow two categorizations, and the one that made a word would be favored.

It is necessary to posit the existence of an echoic store not only to explain the evidence for the stimulus suffix effect (Crowder & Morton, 1969) but also to explain the trading relations shown in the integration of different acoustic cues into a phonetic percept (Repp, Liberman, Eccardt, & Pesetsky, 1978). There are two problems with the assumption that the additional auditory information involved in the lexical effect is stored in echoic memory. One is the apparent coarseness of the code in echoic memory. Distinctions among stop consonants cannot be retrieved from the precategorical acoustic store. Thus, a stimulus suffix interferes with memory for acoustically dissimilar vowels but not for acoustically similar vowels (Darwin & Baddeley, 1974) or stop consonants (Crowder, 1971). Another potential problem is the short duration over which information stored in echoic memory is available. Estimates vary, but the duration of echoic memory for consonants is probably under one second.

A third form of coding, intermediate in processing depth between the candidate phonetic categorizations of sophisticated guessing theory and the raw sensory information of the echoic store, would use goodness of fit ratings of different possible phonetic categorizations. In this model, a bias toward categorizations that make words would show up as a lower rating threshold for words than for nonwords. For stimuli near the phoneme boundary, this difference in thresholds would affect phonetic categorization. However, the phonetic categorization of stimuli far from the boundary would not be influenced by these differences in threshold because the ratings of the correct phonetic categorization would be quite high. Information in this form is neither strictly phonetic nor auditory but, rather, expresses the relation between the auditory data and the phonetic possibilities in a succinct way. It is interesting that the most successful speech understanding system to date, HARPY, works in this way (Klatt, 1977; Lowerre & Reddy, in press).

Also, implicit confidence rating responses of this sort have been used in many models of linguistic processing. Morton's (1969) logogens measure the acoustic/phonetic fit of each word in the language to the stimulus.6 Marslen-Wilson and Welsh's (1978) modifications of the logogen view preserve this feature. And Massaro's (Note 4) model of the integration of information from different knowledge sources also provides estimates of the goodness of fit of various options. Massaro's work is of particular interest because he has carried out experiments on reading that are analogous to the experiments described here. He has examined the visual perception of letterlike forms drawn from a (visual) continuum from the letter c to the letter e (Naus & Shillman, 1976). The perception of such stimuli is biased in favor of orthographically regular strings, just as in the present experiment phonetic categorization is biased toward words.

6 The logogen model, as stated by Morton (1969), simply counts phonetic features. To account for the data presented here, the model must be slightly modified to sum the goodness of fit ratings of each segment of the word. This modification seems in the spirit of the original logogen model.

Deciding between these different representations of the additional auditory information will be a difficult task because they are all modifications of the simple categorical model with the same goal. They can perhaps best be distinguished by examining the way in which the processing of phonetically ambiguous items is influenced by later arriving biasing information. All of the work described above on syntactic and semantic context in phonetic decisions provides the biasing context before the phonetic ambiguity. Providing the biasing information at various delays after the ambiguous item should provide information about the nature of the coding of this additional auditory information. If the information is stored in some sort of echoic store, phonetic categorization of the ambiguous acoustic information should only be susceptible to bias over a short period of time, regardless of the linguistic structure of the utterance. On the other hand, if the information is stored as confidence ratings or as sets of possible phonetic categorizations, the linguistic structure of the utterance (particularly, whether the ambiguous information and biasing information are in the same clause) would be of great importance.

Reference Notes

1. Alexander, D. The effect of semantic value on perceived b-p boundary. Unpublished manuscript, Massachusetts Institute of Technology, 1972.

2. Ingemann, F. Speech synthesis by rule using the FOVE program. Paper presented at the International Congress of Phonetic Sciences, Miami, 1977.

3. Spencer, N. J., & Halwes, T. Relating categorical speech perception to ordinary language: Boundary shifts on a "t" to "d" continuum in nonsense, word, and sentence context. Manuscript in preparation, 1978.

4. Massaro, D. W. Reading and listening (Tech. Rep. 423). Madison: University of Wisconsin, Wisconsin Research and Development Center for Cognitive Learning, December 1977.

References

Broadbent, D. E. Word-frequency effect and response bias. Psychological Review, 1967, 74, 1-15.

Carney, A. E., Widin, G. P., & Viemeister, N. F. Noncategorical perception of stop consonants differing in VOT. Journal of the Acoustical Society of America, 1977, 62, 961-970.

Catlin, J. On the word-frequency effect. Psychological Review, 1969, 76, 504-506.

Cole, R. A., & Jakimik, J. Understanding speech: How words are heard. In G. Underwood (Ed.), Strategies of information processing. London: Academic Press, 1978.

Crowder, R. G. The sound of vowels and consonants in immediate memory. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 587-596.

Crowder, R. G., & Morton, J. Precategorical acoustic storage (PAS). Perception & Psychophysics, 1969, 5, 365-373.

Darwin, C. J., & Baddeley, A. D. Acoustic memory and the perception of speech. Cognitive Psychology, 1974, 6, 41-60.

Fujisaki, H., & Kawashima, T. Some experiments on speech perception and a model for the perceptual mechanism. Annual Report of the Engineering Research Institute (University of Tokyo), 1970, 29, 207-214.

Garnes, S., & Bond, Z. S. The relationship between semantic expectation and acoustic information. In W. Dressler, O. Pfeiffer, & T. Herok (Eds.), Phonologica 1976: Akten der dritten Internationalen Phonologie-Tagung, Wien, 1.-4. September 1976. Innsbruck: Institut für Sprachwissenschaft der Universität Innsbruck, 1977.

Klatt, D. H. Review of the ARPA speech understanding project. Journal of the Acoustical Society of America, 1977, 62, 1345-1366.

Klatt, D. H. Speech perception: A model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 1979, 7, 279-312.

Kucera, H., & Francis, W. N. Computational analysis of present-day American English. Providence, R.I.: Brown University Press, 1967.

Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 1957, 54, 358-368.

Liberman, A. M., Mattingly, I. G., & Turvey, M. T. Language codes and memory codes. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. Washington, D.C.: V. H. Winston, 1972.

Lisker, L. Stop voicing production: Natural outputs and synthesized inputs. In Haskins Laboratories: Status report on speech research, 1976, SR-47. (ERIC Document Reproduction Service No. ED-128-870; NTIS No. AD A031789)

Lisker, L., & Abramson, A. Cross-language study of voicing in initial stops. Word, 1964, 20, 384-422.

Lowerre, B., & Reddy, D. R. The HARPY speech understanding system. In W. Lea (Ed.), Trends in speech recognition. Englewood Cliffs, N.J.: Prentice-Hall, in press.

Marslen-Wilson, W. D., & Welsh, A. Processing interaction and lexical access during word recognition in continuous speech. Cognitive Psychology, 1978, 10, 29-63.

Miller, G. A., Heise, G., & Lichten, W. The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology, 1951, 41, 329-335.

Morton, J. Interaction of information in word recognition. Psychological Review, 1969, 76, 165-178.

Myerow, S., & Millward, R. SPLIT—A sound editor for a PDP-8 computer. Behavior Research Methods and Instrumentation, 1978, 10, 281-284.

Naus, M. J., & Shillman, R. J. Why a Y is not a V: A new look at the distinctive features of letters. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 394-400.

Pollack, I. Message uncertainty and message reception. Journal of the Acoustical Society of America, 1959, 31, 1500-1508.

Purcell, D. G., Stanovich, K. E., & Spector, A. Visual angle and the word superiority effect. Memory & Cognition, 1978, 6, 3-8.

Repp, B. H., Liberman, A. M., Eccardt, T., & Pesetsky, D. Perceptual integration of cues for stop, fricative, and affricate manner. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 621-637.

Rubin, P., Turvey, M. T., & Van Gelder, P. Initial phonemes are detected faster in spoken words than in spoken nonwords. Perception & Psychophysics, 1976, 19, 394-398.

Smith, E. E., & Haviland, S. E. Why words are perceived more accurately than nonwords. Journal of Experimental Psychology, 1972, 92, 59-64.

Stevens, K. N., & Klatt, D. H. Role of formant transitions in the voiced-voiceless distinction for stops. Journal of the Acoustical Society of America, 1974, 55, 653-659.

Warren, R. M. Perceptual restoration of missing speech sounds. Science, 1970, 167, 392-393.

Warren, R. M., & Sherman, G. L. Phonemic restoration based on subsequent context. Perception & Psychophysics, 1974, 16, 150-156.

Appendix A

This appendix describes quantitatively the predictions of the categorical and criterion-shift models.

The categorical model can be described by two equations. For a continuum C1, for which a response of d makes a word and t does not (such as the dash-tash series), the categorical model predicts

$$P(d \mid v) = P_0(d \mid v) + P_{\mathrm{change:C1}}\,[1 - P_0(d \mid v)],$$

where $P(d \mid v)$ is the probability of responding d to a stimulus with voice onset time (VOT) $v$, $P_0(d \mid v)$ is the probability of a response of d to a stimulus with VOT = $v$ in an unbiased situation (which is also the probability of an internal response of d before the correction process), and $P_{\mathrm{change:C1}}$ is the probability of correcting a t response to d for Continuum C1. Similarly, the predictions for a continuum in which a t response makes a word (such as the dask-task continuum) are given by

$$P(d \mid v) = (1 - P_{\mathrm{change:C2}})\,P_0(d \mid v).$$

The lexical effect function $L(v)$ is simply the difference in response probabilities for the two continua of a matched pair. Thus,

$$L(v) = P_0(d \mid v) + P_{\mathrm{change:C1}}\,[1 - P_0(d \mid v)] - (1 - P_{\mathrm{change:C2}})\,P_0(d \mid v) = P_{\mathrm{change:C1}} + (P_{\mathrm{change:C2}} - P_{\mathrm{change:C1}})\,P_0(d \mid v).$$

In Figure 1, it is assumed that $P_{\mathrm{change:C2}} = P_{\mathrm{change:C1}}$, so $L(v)$ is constant (i.e., does not depend on $v$). This is not an essential feature of the model; there is no reason to think the biases are equal. However, for any value of $P_{\mathrm{change:C2}} - P_{\mathrm{change:C1}}$, the value of $L(v)$ at the boundary (where $P_0(d \mid v) = .5$) will be equal to the mean of the values of $L(v)$ at the ends of the continuum and certainly less than their sum.

The quantitative predictions of the criterion-shift model are described by

$$P(d \mid v) = P_0(d \mid v + b_{C1}),$$

where $P_0(d \mid x)$ is the probability of responding d to a stimulus with VOT = $x$ in a situation in which there is no lexical bias, and $b_{C1}$ is the criterion shift for this continuum. $b_{C1}$ will be negative when the voiced end of Continuum C1 is a word (resulting in more voiced responses) and positive when the voiceless end is a word.

The shape of $L(v)$, according to the criterion-shift model, is simply

$$L(v) = P_0(d \mid v + b_{C1}) - P_0(d \mid v + b_{C2}).$$

Figure 2 shows the predictions of the criterion-shift model, assuming that $P_0(d \mid v)$ is a cumulative normal function. Figure 2 assumes equal but opposite direction lexical biases for the two continua, as did Figure 1. Assuming that $P_0(d \mid v)$ is sigmoid, with steepest slope near the phoneme boundary, $L(v)$ will generally reach a maximum in the neighborhood of the phoneme boundary and, thus, be greater near the phoneme boundary than at either end of the continuum. This is unlike the shape of $L(v)$ according to the categorical model, for which the value at the boundary must be between the values at the ends of the continua.

Appendix B

I am aware of only two previous studies (Lisker, 1976; Spencer & Halwes, Note 3) that used voice onset time (VOT) continua produced by cross splicing natural speech. Since these studies are not readily available, it seems important to describe the technique in some detail.

The splicing method uses segments cut from voiced and voiceless natural tokens to supply the aspirated and voiced segments of each stimulus. Good tokens of each stimulus are recorded and digitized, and pitch periods are marked in the voiced stimulus. Stimuli from the VOT continuum are produced by replacing the segment before a particular pitch period in the voiced token with an equal duration segment from the beginning of the voiceless token.

(For these purposes, the beginning of a pitch period is taken to be the last upward-going zero crossing before a pitch pulse. This choice minimizes clicks due to splicing and assures that a pitch pulse occurs at each nominal VOT. Stimuli produced by this method can only have VOTs at times that are the beginnings of pitch periods in the voiced stimulus.)
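A minimal Python sketch of this splicing operation, assuming digitized tokens as sample arrays and precomputed pitch-period start indices; all names and numbers are illustrative, not from the paper:

```python
# Cross-splice a VOT continuum member from natural tokens, as described
# above: the segment of the voiced token before a chosen pitch period is
# replaced by an equal-duration segment from the voiceless token.
# Tokens are arrays of samples at 10 kHz; pitch_starts holds the indices
# of the last upward-going zero crossings before each pitch pulse.
import numpy as np

def splice(voiced, voiceless, pitch_starts, k):
    """Stimulus whose VOT is the onset of the k-th marked pitch period."""
    cut = pitch_starts[k]            # in samples; VOT = cut / 10 msec at 10 kHz
    out = voiced.copy()
    out[:cut] = voiceless[:cut]      # burst + aspiration from the voiceless token
    return out

# Hypothetical usage: tokens and pitch marks would come from recordings.
voiced = np.zeros(5000); voiceless = np.zeros(5000)
stimulus = splice(voiced, voiceless, pitch_starts=[180, 230, 280, 330], k=1)
print(len(stimulus), "samples; VOT =", 230 / 10, "msec")
```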

For the present experiment, tokens of the voiced and voiceless syllables used to construct the continua were spoken by a female phonetician, in the sentence environment "Now say . . . ," and recorded on audiotape. Her fundamental frequency was approximately 200 Hz at the beginning of each syllable. The tokens were digitized on a PDP-8 minicomputer, using a sampling rate of 10 kHz and a 4.5-kHz low-pass filter. The waveform editing program SPLIT (Myerow & Millward, 1978) was used to edit tokens out of the carrier phrase, and the beginning of each token's burst was carefully determined. Zero crossings before pitch periods were located by examining an oscillographic display. The digitized tokens and the locations of the pitch periods were used to generate the spliced tokens that subjects heard.

Tokens were digitized for 22 continuum pairs. The endpoint stimuli from those continua were presented to 13 subjects for identification of the whole syllable (not just the initial segment). Those continuum pairs for which more than five of the endpoints were incorrectly identified were eliminated. The errors in identification rarely involved the voicing of the initial segment. Often errors were due to misperceptions of the vowel or final consonants. The remaining 16 continuum pairs were used in a pilot experiment. Two of these pairs contributed many more errors in identification of the endpoints than did the other pairs, so these 2 pairs were also eliminated. The remaining continuum pairs (whose word endpoints and VOTs are given in Table 1) were used in Experiment 2. It is important to note that the selection of stimuli was on grounds independent of whether they produced the lexical effect. The excluded continua were not eliminated because they failed to show a word bias, but simply because the endpoint stimuli were not intelligible enough. Thus, there is no reason to think that the remaining continuum pairs are not a representative sample from the set of intelligible continuum pairs for English.

In the pilot experiment, there was considerable variation in the position of phoneme boundaries within the pairs of continua. One source of this variation could be random fluctuations in the particular aspiration and voicing waveforms used. Random variations in the amplitude of segments of aspiration noise or glottal pulses could influence the position of the phoneme boundary. Therefore, it was decided to use exactly the same aspiration and voicing waveforms to construct the stimuli of both continua of each pair. The first pitch pulse more than 100 msec after stimulus onset was found in the voiced token of one of the continua. In both voiced and voiceless tokens, the initial segment of this duration was replaced with a comparable segment from the corresponding member of the other continuum. The tokens produced sounded perfectly natural but produced pairs of continua that used exactly the same aspiration and voicing sequences for the first 100 msec of the stimuli. For half of the continuum pairs, the initial segments were taken from the continuum whose voiced end was a word, and vice versa for the other half.

One objection to the use of stimuli produced in this manner can be raised. Although VOT is clearly controlled by this method, other acoustic cues (e.g., formant transition duration; Stevens & Klatt, 1974) that are known to affect the voicing decision for stops are not controlled systematically. This objection is correct but irrelevant to the purposes of the present study. If the goal for the study were to investigate just the acoustic cue VOT (and not other acoustic cues), this would be a serious problem. But in the present study, it is only important to relate the perception of stimuli varying along some acoustic variable to other factors; it is not important that this variable be any particular acoustic cue. As long as subjects' perceptions of the stimuli depend on the acoustic variable (as they evidently did in this experiment), the stimuli will be sufficient for examining the relation between acoustic information and higher order knowledge.

Received January 12, 1979