Predicting physical aspects of English stress

Dept. for Speech, Music and Hearing

Quarterly Progress andStatus Report

Predicting physical aspectsof English stress

McAllister, R.

journal: STL-QPSRvolume: 12number: 1year: 1971pages: 020-029

http://www.speech.kth.se/qpsr

http://www.speech.kth.se

http://www.speech.kth.se/qpsr

STL-QPSR 1/1971 20.

C. PREDICTING PHYSICAL ASPECTS O F ENGLISH STRESS*

R. M cAlli s t e r

1. Problem

This paper i s a discussion of an attempt to discover correlation between

syntactic s t ructure and s t r e s s contours in American English.

The problem was made more tractable within this framework by using

the recent and well known work of Chomsky and Halle in "The Sound Pat te rn

of English" (1968) a s a point of reference. Right branching s t ructure and

the rules of s t r e s s pertaining to such s t ructures a r e the object of inquiry in

this study. Specifically, the study i s concerned with the operation of the

Nuclear S t r e s s Rule on right-branching s t ructures (pp. 17- 18 in Chomsky

and Halle, 1968) and the consequences of this operation in the speech wave.

Correlation has been sought between the output of the Nuclear S t r e s s Rule

when applied to a right-branching s t ructure and the fundamental frequency,

intensity, and duration pa ramete r s in the speech wave. In other words; what

do the s t r e s s contours that a r e the outputs of this rule mean in t e r m s of the ~ i

phonetic facts ? The correlation between the output of these ru les and the

phonetic facts may have important implications in a l a rge r discussion of the ~ role of the phonetic representation in the phonological component of the gram-

m a r . This linguistic application of the resul ts of this study will be men-

tioned l a t e r in this paper , a t which time the view of the role of the phonetic

representation presented in Chomsky and Halle (1968) will be discussed.

Also included in this paper i s an attempt to discover regularit ies in the

aforementioned pa ramete r s that might lead to the formulation of ru les t h a t

m o r e adequately express the relevant factors in the correlation between

surface s t ructure and the speech wave.

2. Experimental procedures

2. 1 Method -------

Special attention was given to the construction of the tes t phrases . The syntactic s t ructure has been s t r ic t ly defined and methodically expanded.

Starting with a simple two-constituent t ree ' big pants' , a right branching

t r e e was built up by expanding the construction to the left.

-:t Thesis work associated with the Dept. of Phonetics, Stockholm University

STL-QPSR 1/1971 2 1.

With this procedure, a range of s t ress levels generated by the NSR (Nuc-

lear Stress Rule) was obtained for each word in the syntax. In other words,

the syntax was expanded in the same way a s the transformational cycle oper-

ates on a right-branching structure through the NSR,

Fig. I-C-1 i s a summary of the material used in this study presented in

terms of its derivation using the procedure described above. Sentence A

represents the syntax in its longest form and below sentence A i s the deriva-

tion of its s t ress contour through the cyclical application of the NSR. The

phrases 1-7 below the s t ress level numerals a r e the phrases used a s test ut-

terances in this study. They a r e positioned so that the numerals above these

sentences represent both a step in the cyclical derivation of the s t ress contour

of sentence A and the final s t ress contours of the test utterances.

2.2 Measurements ----------- Mingograms were made of all repetitions of utterances and isolated words.

Fundamental frequency, intensity, and duratioh were measured for all five

readings.

2 . 2 . i Fundamental frequency: Fundamental frequency was measured at the

temporal midpoint of the vowel of the words in the test utterances. Care was

taken to produce an artifact free representation of Fo. In the preliminary

work for this investigation, two sets of data had to be partially rejected be-

cause it proved impossible to obtain reliable F information. The voices of 0

the informants were simply acoustically incompatible with a satisfactory

mingograph analysis. The informants also had many mispronunciations a s -

sociated with emphasis and enumeration intonation. That par t of the data

that was not rejected showed qualitative similarity to the data represented

below.

2.2.2 Intensity: Intensity was measured a t the temporal midpoint of the

vowel. An integration time of 25 msec was selected. The delay due to filter

inertia required a temporal compensation of approximately 20 msec which

corresponded to 1 mm in the sampling of the intensity curve.

2 .2 .3 Duration: Duration was measured for each stressed word from the

release of i ts initial consonant to the end of its voiced segment.

SPEECH MATERIALS SENTENCE A He kicked the cop who thought that Dick sharkl haw boucj'i the bett for his big pants

CYCLE 1 I 1 1 1 1 1 1 2 1

PHRASE 1

CYCLE 2

CYCLE 6 1 1 2 3 4 5 6 7 1

big pants

1 1 1 1 1 2 3 1

PHRASE 2

CYCLE 3

PHRASE 3

C'KLE 4

PHRASE 4

CYCLE 5

PHRASE 61 The cop thcught that Dick should have bought the belt for his big pants

the belt for his big pants

1 1 1 1 2 3 4 1

He bcught the belt for his big pants

1 1 1 2 3 4 5 1

Dick should have bought the belt for his big pants

1 1 2 3 4 5 6 1

CYCLE 7 1 2

PHRASE 5 He thoughtthatDick shouldhavebought thebelt forhis big pants

PHRASE 7=SENTENCE d He W the cop who thowht that Dick should have bw&t the belt for his bg pants

Fig. I -C-I . Speech ma te r i a l used in this study presented in t e r m s of i ts derivation. Sentence A and the phrases 1-7 a r e positioned s o that the s t r e s s level numerals represent both a step in the cyclical derivation of the s t r e s s contour of sentence A and the final s t r e s s contours of phrases 1-7 which were used a s tes t utterances.

STL-QPSR 1/1971

3 . Results

3. 1 Fundamental f r e e e n c y ---- ------- -

The measured Fo values were plotted a s a function of the words in the

t e s t phrases . This i s meant to throw light on the Fo contour in each sen-

tence and to facilitate discovery of any meaningful regularity in these con-

tours. Fig. I-C-2 is a summary of these Fo contours for a l l the tes t ph rases

considering only the s t r e s sed words. As each phrase i s read f rom left to

right, Fo decreases in value between s t r e s s level 2 a t the beginning and 1 a t

the end. Of part icular in te res t he re is that 1 has consistently the lowest

value. This phenomenon has often been observed and discussed in the

l i terature (cf. Lieberman 1966). It appears that a s each expansion of the

syntax was executed, the value of Fo a t the beginning of the phrase was I

raised. Therefore, the point of termination being around 100 Hz for a l l the

phrases , the initial value and range of Fo increases with increasing length

(number of s t r e s s e s ) in the phrase. There i s a higher ra te of reduction in

the range of s t r e s s levels, starting f rom 2 down to about 4. In other words,

the ra te of reduction i s higher towards the beginning of every utterance than

towards the end. The unstressed words in the phrases show an Fo contour

which i s somewhat s imilar to that of the s t r e s sed words. The significant

difference being that if the level of an initial unstressed word (which i s always

very low) i s disregarded, the range of F values is often somewhat smaller . 0

F o r a fuli account of this data, sce McAllister, forthcoming.

3.2 Intensity -------- As fo r F the measured intensity values were plotted a s a function of the

0

words in the t e s t phrases . This i s intended to elucidate the intensity con-

tours for each phrase and to make meaningful regular i t ies in these contours

m o r e obvious. Fig. I-C-3 shows a summary of these intensity contours con-

sidering only the s t r e s sed words. It i s obvious that these curves a r e s im-

ilar in some important ways to the F data. The descent of intensity value 0

between s t r e s s level 2 a t the beginning of each phrase and 1 a t the end i s not

unlike the Fo pat terns although the rate of reduction i s quite different. Note

here again that s t r e s s 1 corresponds to the lowest intensity level. There

seems to be he re also, an approximate colnmon terminal intensity level for

nearly a l l the phrases . Unlike the pattern f o r Fo where the s t r e s sed and un-

s t r e s sed curves gradually came together towards the end of the sentence,

the two intensity curves seem to have no common values.

1 I I I I I I I I I I I I I I I 1

150 - - 2

140 - -

130 - -

2 110 -

100 -

90 - L

J I I I 1 I I I I I I 1 I 1 I I I I

HE THE WHO THAT SHOULD HAVE THE FOR HIS KICKED COP THOUGHT DICK BOUGHT BELT BIG PANTS

Fig, I-C-2. Summary of the Fo contours for all the test phrases with Fo plotted as a function of the stressed words in the phrase.

25 HE THE WHO THAT SHOULD HAVE THE FOR HIS

KICKED COP THOUGHT DICK BOUGHT BELT BIG PANTS

Fig. I-C-3. Summary of the intensity contours for a l l the tes t ph rases with the inten- ,

sity values plotted as a function of the s t r e s sed words in all the phrases.

-

STRESS LEVEL OF PRECEDING SYLLABLE

I I I I I I I I

PHRASE FINAL SYLLABLE

WITH MAIN STRESS

-

Fig. I-C-5. Duration in milliseconds from consonant release to the end of the voiced segment for the word 'pants' plotted a s a function of the s tress level of the preceding syllable.

4

C

i

- -

I I I I I I 1 I

STL-QPSR 1/197 1 24.

higher level of s tress . It i s clear that when we consider the NSR, promi-

nence does not bear a 1- I relation with Fo. The point of the NSR i s that i t

unfailingly gives s t ress priority to the right-most constituent in the phrase.

Therefore in these strictly right-branching structures, where the NSR i s

cyclicr?lly applicable, the right-most constituent should always have the

highest of whatever the s t ress levels represent. The resul ts of the Fo

measurements show that s t r e s s level 1 consistently represents the lowest

Fo. This discrepancy between the s t ress level contours which a r e outputs

of the NSR and the Fo parameter would lead us to believe that there must

be a more effective way to describe F patterns in right-branching struc- 0

tures.

The results presented in Fig. I-C-2 show evident regularities in the Fo

patterns. According to Chomsky and Halle, the central cr i ter ia for the

operation of the NSR a re bracketing of the surface structure and the labeling

of the syntactic categories within the brackets. vlith this information, the

NSR can then generate the proper s t r e s s contours. The results show, how-

ever , that the output of the NSR has no direct correlation in the Fo patterns

for the s tressed words shown in Fig. I-C-2. The F contours for the un- 0

s tressed words also show obvious regularities and these regularities seem

to have a definite relation to the contours for the s t ressed words. The

description of the F values for the unstressed words in these constructions 0

then, can be made in t e rms of the relationship between the stressed and un-

s t ressed curves with the s t ressed functioning a s a reference.

4.2 Intensity - - - - - - - - Some of the discussion concerning the resul ts of the intensity measure-

ments will be similar to that of the Fo results. As was the case f o r the Fo

curves in Fig. I-C-2, it appears difficult to discover meaningful correla-

tion between the intensity patterns and the s t r e s s level contours which a r e

the outputs of the NSR. The same, somewhat misleading tendency for agree-

ment between decending s t re s s levels and lowered intensity value s i s apparent

in Fig. I- C-3. The validity of the s t r e s s level numerals a s indicators of

degree of intensity i s , however, a s for Fo, weakened by the fact that the

level 1 has consistently a relativcly low value in Fig. I-C-3. We must then

establish that if the higher s t r ess level numerals a r e to indicate a higher

degree of any characteristic, then this characteristic can hardly be inten-

sity. It i s obvious that the intensity envelopes show some similarities fo r

STL-QPSR 1/1971 25.

the Fo patterns in Fig. I-C-2. The general tendency for gradual reduction

of value from high, left-most s t r e s s level 2 to a low approximate common

value at the right-most level 1 i s evident. Except for the shor ter utterances,

the intensity curves a r e quite s imilar in t e r m s of r a t e of reduction (ca

2.5 d ~ / s t r e s s ) . This may lead us to believe that there i s a general ten-

dency for the intensity levels to be grouped along the same axis, and a s each

expansion of the syntax was executed, the intensity values for the added

s t ressed words were simply added to the other a t the appropriate place on

the axis, i. e. higher and to the left of the others. The slope of this axis

may be m o r e clearly stated if we f i r s t take into account the observation

that vowel quality has an effect in intensity levels. F r o m Lehiste (1970),

pp. 120 - 123, we know that close vowels have somewhat lower inherent in-

tensity than the open vowels. An adjustment of the intensity levels of the

words in the utterances according to these facts would ra i se the close vow-

els ' and lower the open vowels' intensity slightly.

As was the case for the unstressed word' s Fo, the unstressed word' s

intensity levels appear to have a direct relationship to those of the s t ressed

words. The contours fo r the unstressed words, with the possible exception

of the relative pronoun ' that' could then be defined in t e r m s of the contours

for the s t ressed words. The data for the pronoun ' that' i s probably too

limited to lead to any suspicion that this tendency for a lower intensity level

has any generality for &is o r any other relative pronoun.

4. 3 Duration -------- In the resul t s of the duration measurements , we see perhaps the most

pronounced discrepancy in t h e correlation between the phonetic facts and

the s t r e s s level contours which a r e outputs of the NSR. One could expect

that if duration were the character is t ic described with the s t r e s s level nu-

mera l s , a lowering of s t r e s s level would entail a durational reduction. Fig.

I-C-4 shows that this i s the case for the s t r e s s level change f rom 1 to 2 but

not for s t r e s s levels lower than 2. There appears to be two durational cate-

gories for the syllable nuclei in these tes t words. There i s a long category

which occurs when the words a r e pronounced in isolated position and a short

category which occurs when the words a r e pronounced in context. This is

the case for al l the words in the utterances except ' pants' . The syllable

nucleus in this word has a more constant duration value than those of any

other words. As mentioned in the presentation of the resul ts , the isolated

STL-QPSR 1/1971 26.

' pants' is slightly longer than the occurrences in context. The difference,

however, i s small compared to the other words. It i s important to keep in

mind that the word 'pants ' which shows this relative invariability in dura-

tion of measured segments, always has the s t r e s s level 1. These utterances,

then, could be said to show two durational levels instead of the maximum of

eight a s the NSR would lead u s to expect. F i r s t the durational level 1 and

then a durational level l e s s than 1, le t us say 2. The level 1 occurs at the

right-most constituent and in isolated position. The level 2 elsewhere.

An important question here concerns the meaning of the fact that the right-

mos t element in these s tructures has a durational value equal to that of i t s

isolated value. This could mean that s t r e s s pr ior i ty given to right hand

constituents in the NSR holds for the duration parameter . One can hardly

help feeling a certain hesitancy to accept this correlation between the NSR

and the duration parameter a t face value. The well-known tendency for

utterance-final syllable to undergo lengthening should be borne in mind.

5. Summary of the discussion

Consideration of the facts presented thus f a r would seem t o indicate that

the output of the Nuclear S t ress Rule does not show a simple correlation with

the prosodic pa ramete r s of the speech wave. This would be adequate motiva-

tion for u s a t this point, to systematically evaluate this rule a s a means of

predicting acoustic parameters in the speech wave.

5. 1 The NSR-based model ---------------

In this evaluation, an NSR-based model for the prediction of the F and 0

intensity parameters in this study will be compared to an alternative model

based on the number of s t r e s s e s in a phrase and position of words in the

phrase. The duration parameter will not be included in this comparison.

In an NSR-based model we must make use pr imar i ly of outputs of this

rule a s a basis fo r predictions. The s t r e s s contours which a r e outputs of

the NSR a r e expressed in t e r m s of numerals, the numeral 1 representing

the greatest degree of some s t r e s s characteristic. A reasonable demand

that may be placed on this rule i s that it, through i t s s t r e s s contour outputs,

show a correlation between the syntax (information the rule makes use of

in i t s operation) and phonetic reality by predicting the parameters wc have

discussed in this paper. The capacity of the NSR to do so i s i l lustrated in

Figs. I-C-6 and I-C-7. It is immediately apparent in these figures t h a t a t

Fig. I -C-6 .

STRESS LEVEL

A summary of observed Fo values for each s tress level including data for all the stressed words isolated and in context. The line connects the median values for each s tress level.

STRESS LEVEL Fig. I-C-7. A summary of observed intensity values for each s t r e s s level

including data fo r a l l the s t r e s s e d words isolated and in context. The line connects the median values for each s t r e s s level.

STL-QPSR i / i 97 1 27.

l ea s t two problems a r e associated with the NSR-based model. F i r s t , there

i s no l inear correlation between descending Fo o r intensity values and de-

scending s t r e s s levels. The second problem apparent in Figs. I-C-6 and

I-C -7 i s the spread of the data. The NSR-based model not only does not

account for the spread, but perhaps more important, does not account for

the obvious regularit ies in this spread. Use of ,an NSR-based model for the

Fo 2nd intensity pat terns in t h i s study would require considerable adjust-

ments of the NSR outputs. These adjustments would be based on some of

the same factors which an alternative model would make use of fo r predic-

tion of F and intensity such a s the number of s t r e s s e s and their position 0

in the utterance. Thus the low Fo level of terminal s t r e s s I would have to

be included a s a basic r d e . a

5 .2 Alternative model based on_phrase length and2osi t ion ----------------- ----- --- ----

Examination of the figure mater ia l presented in this study leads to the

conclusion that it is possible to formulate ru les that have quite different

character is t ics than the NSR. Instead of the syntactic information used by

the NSR, i t is possible to derive the Fo and intensity values f o r the words

in the t e s t ph rases by considering ( I) the length of the phrase and (2) the

position of the individual word in the phrase. In the formulas presented

below, position i s numbered f rom left to right. The left-most position in

the phrase has the position 1, the next position to the right 2, and so on.

The derivation of the values fo r the unstressed words is a l so included in

this quantitative t reatment of the data.

The following formulas a r e suggestions for derivation of F and intensity 0

values for each word in the t e s t phrases used in this study.

n = number of s t r e s s e s in phrase

p = position

k = . 2 for a succeeding unstressed word

k = 0 for s t ressed words

W roo > fx

PREDICTED FUNDAMENTAL FREQUENCY

Fig. I-C-8. Observed values for Fo plotted as a function of values predicted with the formula ( I ) section 5. 2.

STL-QPSR 1/1971 29.

(Strangely enough, Chomsky and Halle argue that the phonetic representation

need not have any relation to a "physical o r acoustic reality", p. 25. ) If

ru les provide outputs in t e r m s of a phonetic representation with little cor -

relation to the speech wave produced, w e must conclude that the demands

of the phonetic r ep re sentation must be better defined, and that ru les which

give r i s e to this phonetic representation bc accordingly formulated.

References:

Chomsky, N. and Halle, M. (1968): The Sound Pa t t e rn of English, Harper and Row, New York.

Fant , G. (1968): "Analysis and Synthesis of Speech Processes" , in Manual of Phonetics, ed. B. Malmberg, North-Holland Publ. Co., Amsterdam, pp. 173-277.

Lehiste, I. (1970): Suprasegmentals, The MIT P r e s s , Cambridge, Mass.

Lieberman, P. ( 1967): Intonation, Perception and Language, The MIT P r e s s , Cambridge, mass.

Lindblom, B. (1970): "Temporal Organization of Syllabic Processes" , invited paper presented a t the ASA-meeting a t Atlantic City, April 1970.

McAllister, R. (1971): "The Nuclear S t r e s s Rule and the Description of English Stress" , forthcoming in the Proceedings of the VIIth International Congress of Phonetic Sciences, Montreal.

Postal , P. ( 1968): Aspects of Phonological Theory, Harper and Row, New York.

Documents

Predicting physical aspects of English stress