Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
Predicting physical aspects of English stress
McAllister, R.
journal: STL-QPSR, volume: 12, number: 1, year: 1971, pages: 020-029
http://www.speech.kth.se/qpsr
STL-QPSR 1/1971 20.
C. PREDICTING PHYSICAL ASPECTS OF ENGLISH STRESS*
R. McAllister
1. Problem
This paper discusses an attempt to discover a correlation between syntactic structure and stress contours in American English.
The problem was made more tractable within this framework by using the recent and well-known work of Chomsky and Halle, "The Sound Pattern of English" (1968), as a point of reference. Right-branching structures and the stress rules pertaining to them are the object of inquiry in this study. Specifically, the study is concerned with the operation of the Nuclear Stress Rule on right-branching structures (pp. 17-18 in Chomsky and Halle, 1968) and the consequences of this operation in the speech wave. A correlation has been sought between the output of the Nuclear Stress Rule when applied to a right-branching structure and the fundamental frequency, intensity, and duration parameters in the speech wave. In other words: what do the stress contours that are the outputs of this rule mean in terms of the phonetic facts? The correlation between the output of these rules and the phonetic facts may have important implications in a larger discussion of the role of the phonetic representation in the phonological component of the grammar. This linguistic application of the results of this study will be mentioned later in this paper, at which time the view of the role of the phonetic representation presented in Chomsky and Halle (1968) will be discussed.
Also included in this paper is an attempt to discover regularities in the aforementioned parameters that might lead to the formulation of rules that more adequately express the relevant factors in the correlation between surface structure and the speech wave.
2. Experimental procedures
2.1 Method
Special attention was given to the construction of the test phrases. The syntactic structure was strictly defined and methodically expanded. Starting with a simple two-constituent tree, 'big pants', a right-branching tree was built up by expanding the construction to the left.
* Thesis work associated with the Dept. of Phonetics, Stockholm University
With this procedure, a range of stress levels generated by the NSR (Nuclear Stress Rule) was obtained for each word in the syntax. In other words, the syntax was expanded in the same way as the transformational cycle operates on a right-branching structure through the NSR.
Fig. I-C-1 is a summary of the material used in this study, presented in terms of its derivation using the procedure described above. Sentence A represents the syntax in its longest form, and below sentence A is the derivation of its stress contour through the cyclical application of the NSR. The phrases 1-7 below the stress level numerals are the phrases used as test utterances in this study. They are positioned so that the numerals above these sentences represent both a step in the cyclical derivation of the stress contour of sentence A and the final stress contours of the test utterances.
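The cyclic derivation summarized in Fig. I-C-1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the rule as formalized by Chomsky and Halle: the structure is taken to be strictly right-branching, only the stressed words are represented, and the function names `apply_nsr` and `derive_contours` are hypothetical. On each cycle the right-most primary stress in the current domain is retained and every other stress in that domain is weakened by one level.

```python
# Stressed words of sentence A, left to right.
STRESSED_WORDS = ["kicked", "cop", "thought", "Dick",
                  "bought", "belt", "big", "pants"]


def apply_nsr(levels, start):
    """Apply one NSR cycle to the domain levels[start:] (illustrative sketch)."""
    domain = range(start, len(levels))
    # The right-most word that carries primary stress (1) keeps it.
    keep = max(i for i in domain if levels[i] == 1)
    # Every other stressed word inside the domain is weakened by one level.
    return [lvl + 1 if i >= start and i != keep else lvl
            for i, lvl in enumerate(levels)]


def derive_contours(n_words):
    """Return the stress contour after each cycle, innermost domain first."""
    levels = [1] * n_words          # every stressed word starts at level 1
    contours = []
    # The domain grows one stressed word to the left on each cycle.
    for start in range(n_words - 2, -1, -1):
        levels = apply_nsr(levels, start)
        contours.append(levels)
    return contours


if __name__ == "__main__":
    contours = derive_contours(len(STRESSED_WORDS))
    for cycle, contour in enumerate(contours, start=1):
        print(f"CYCLE {cycle}:", " ".join(str(level) for level in contour))
```

Under these assumptions the first cycle affects only 'big pants' and the last cycle covers all eight stressed words of sentence A, so the number of distinct stress levels grows by one with each expansion to the left.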
2.2 Measurements
Mingograms were made of all repetitions of utterances and isolated words. Fundamental frequency, intensity, and duration were measured for all five readings.
2.2.1 Fundamental frequency: Fundamental frequency was measured at the temporal midpoint of the vowel of the words in the test utterances. Care was taken to produce an artifact-free representation of F0. In the preliminary work for this investigation, two sets of data had to be partially rejected because it proved impossible to obtain reliable F0 information. The voices of the informants were simply acoustically incompatible with a satisfactory mingograph analysis. The informants also produced many mispronunciations associated with emphasis and enumeration intonation. The part of the data that was not rejected showed qualitative similarity to the data presented below.
2.2.2 Intensity: Intensity was measured at the temporal midpoint of the vowel. An integration time of 25 msec was selected. The delay due to filter inertia required a temporal compensation of approximately 20 msec, which corresponded to 1 mm in the sampling of the intensity curve.
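As a small worked illustration of the compensation described above, the sketch below computes where the intensity curve would be read for a given vowel. It assumes that the filter delay shifts the intensity trace later in time, so that the reading point is the vowel midpoint plus 20 msec, and it takes the chart scale of 20 msec per mm from the correspondence stated in the text. The function names and the direction of the shift are assumptions made for illustration.

```python
FILTER_DELAY_MS = 20.0  # approximate delay due to filter inertia (from the text)
MS_PER_MM = 20.0        # chart scale implied by "20 msec corresponded to 1 mm"


def vowel_midpoint_ms(vowel_start_ms, vowel_end_ms):
    """Temporal midpoint of the vowel, in milliseconds."""
    return (vowel_start_ms + vowel_end_ms) / 2.0


def intensity_reading_point_ms(vowel_start_ms, vowel_end_ms,
                               delay_ms=FILTER_DELAY_MS):
    """Point on the intensity trace that corresponds to the vowel midpoint.

    Assumption: the trace lags the signal, so the reading is taken later.
    """
    return vowel_midpoint_ms(vowel_start_ms, vowel_end_ms) + delay_ms


def ms_to_mm(duration_ms, ms_per_mm=MS_PER_MM):
    """Convert a time interval to a distance along the chart."""
    return duration_ms / ms_per_mm
```

For a vowel running from 100 to 200 msec, for example, the F0 reading would be taken at 150 msec and, under the assumed direction of the shift, the intensity reading at 170 msec, that is 1 mm further along the chart.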
2.2.3 Duration: Duration was measured for each stressed word from the release of its initial consonant to the end of its voiced segment.
SPEECH MATERIALS
SENTENCE A: He kicked the cop who thought that Dick should have bought the belt for his big pants

Stress level numerals of the stressed words after each cycle, with the test phrase corresponding to that cycle:

          kicked  cop  thought  Dick  bought  belt  big  pants
CYCLE 1      1     1      1      1      1      1     2     1    PHRASE 1: big pants
CYCLE 2      1     1      1      1      1      2     3     1    PHRASE 2: the belt for his big pants
CYCLE 3      1     1      1      1      2      3     4     1    PHRASE 3: He bought the belt for his big pants
CYCLE 4      1     1      1      2      3      4     5     1    PHRASE 4: Dick should have bought the belt for his big pants
CYCLE 5      1     1      2      3      4      5     6     1    PHRASE 5: He thought that Dick should have bought the belt for his big pants
CYCLE 6      1     2      3      4      5      6     7     1    PHRASE 6: The cop thought that Dick should have bought the belt for his big pants
CYCLE 7      2     3      4      5      6      7     8     1    PHRASE 7 = SENTENCE A: He kicked the cop who thought that Dick should have bought the belt for his big pants

Fig. I-C-1. Speech material used in this study presented in terms of its derivation. Sentence A and the phrases 1-7 are positioned so that the stress level numerals represent both a step in the cyclical derivation of the stress contour of sentence A and the final stress contours of phrases 1-7, which were used as test utterances.
3. Results
3.1 Fundamental frequency
The measured F0 values were plotted as a function of the words in the test phrases. This is meant to throw light on the F0 contour in each sentence and to facilitate discovery of any meaningful regularity in these contours. Fig. I-C-2 is a summary of these F0 contours for all the test phrases, considering only the stressed words. As each phrase is read from left to right, F0 decreases in value between stress level 2 at the beginning and 1 at the end. Of particular interest here is that stress level 1 consistently has the lowest value. This phenomenon has often been observed and discussed in the literature (cf. Lieberman 1967). It appears that as each expansion of the syntax was executed, the value of F0 at the beginning of the phrase was raised. Therefore, since the point of termination is around 100 Hz for all the phrases, the initial value and range of F0 increase with increasing length (number of stresses) of the phrase. The rate of reduction of F0 is higher over the stress levels from 2 down to about 4; in other words, the rate of reduction is higher towards the beginning of every utterance than towards the end. The unstressed words in the phrases show an F0 contour which is somewhat similar to that of the stressed words. The significant difference is that, if the level of an initial unstressed word (which is always very low) is disregarded, the range of F0 values is often somewhat smaller. For a full account of these data, see McAllister, forthcoming.
3.2 Intensity
As for F0, the measured intensity values were plotted as a function of the words in the test phrases. This is intended to elucidate the intensity contours for each phrase and to make meaningful regularities in these contours more obvious. Fig. I-C-3 shows a summary of these intensity contours, considering only the stressed words. These curves are similar in some important ways to the F0 data. The descent in intensity between stress level 2 at the beginning of each phrase and 1 at the end is not unlike the F0 patterns, although the rate of reduction is quite different. Note here again that stress 1 corresponds to the lowest intensity level. Here also there seems to be an approximate common terminal intensity level for nearly all the phrases. Unlike the pattern for F0, where the stressed and unstressed curves gradually came together towards the end of the sentence, the two intensity curves seem to have no common values.
Fig. I-C-2. Summary of the F0 contours for all the test phrases with F0 plotted as a function of the stressed words in the phrase. [Plot not reproduced. Horizontal axis: the words of sentence A; vertical axis: F0, scale marked from 90 to 150.]
Fig. I-C-3. Summary of the intensity contours for all the test phrases with the intensity values plotted as a function of the stressed words in all the phrases. [Plot not reproduced. Horizontal axis: the words of sentence A.]
Fig. I-C-5. Duration in milliseconds from consonant release to the end of the voiced segment for the word 'pants' plotted as a function of the stress level of the preceding syllable. [Plot not reproduced. Horizontal axis: stress level of preceding syllable; the plotted word is the phrase-final syllable with main stress.]
higher level of stress. It is clear that when we consider the NSR, prominence does not bear a 1-1 relation with F0. The point of the NSR is that it unfailingly gives stress priority to the right-most constituent in the phrase. Therefore in these strictly right-branching structures, where the NSR is cyclically applicable, the right-most constituent should always have the highest degree of whatever the stress levels represent. The results of the F0 measurements show that stress level 1 consistently represents the lowest F0. This discrepancy between the stress level contours which are outputs of the NSR and the F0 parameter would lead us to believe that there must be a more effective way to describe F0 patterns in right-branching structures.
The results presented in Fig. I-C-2 show evident regularities in the F0 patterns. According to Chomsky and Halle, the central criteria for the operation of the NSR are the bracketing of the surface structure and the labeling of the syntactic categories within the brackets. With this information, the NSR can then generate the proper stress contours. The results show, however, that the output of the NSR has no direct correlate in the F0 patterns for the stressed words shown in Fig. I-C-2. The F0 contours for the unstressed words also show obvious regularities, and these regularities seem to have a definite relation to the contours for the stressed words. The description of the F0 values for the unstressed words in these constructions can then be made in terms of the relationship between the stressed and unstressed curves, with the stressed curve functioning as a reference.
4.2 Intensity
Some of the discussion concerning the results of the intensity measurements will be similar to that of the F0 results. As was the case for the F0 curves in Fig. I-C-2, it appears difficult to discover a meaningful correlation between the intensity patterns and the stress level contours which are the outputs of the NSR. The same somewhat misleading tendency for agreement between descending stress levels and lowered intensity values is apparent in Fig. I-C-3. The validity of the stress level numerals as indicators of degree of intensity is, however, as for F0, weakened by the fact that level 1 consistently has a relatively low value in Fig. I-C-3. We must then conclude that if the higher stress level numerals are to indicate a higher degree of any characteristic, then this characteristic can hardly be intensity. It is obvious that the intensity envelopes show some similarities to
the F0 patterns in Fig. I-C-2. The general tendency for gradual reduction of value from the high, left-most stress level 2 to a low, approximately common value at the right-most level 1 is evident. Except for the shorter utterances, the intensity curves are quite similar in terms of rate of reduction (ca. 2.5 dB/stress). This may lead us to believe that there is a general tendency for the intensity levels to be grouped along the same axis and that, as each expansion of the syntax was executed, the intensity values for the added stressed words were simply added to the others at the appropriate place on the axis, i.e. higher and to the left of the others. The slope of this axis may be more clearly stated if we first take into account the observation that vowel quality has an effect on intensity levels. From Lehiste (1970), pp. 120-123, we know that close vowels have somewhat lower inherent intensity than open vowels. An adjustment of the intensity levels of the words in the utterances according to these facts would raise the close vowels' and lower the open vowels' intensity slightly.
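The grouping of the intensity levels along a common axis can be illustrated with a minimal sketch. It assumes a strictly linear axis with the approximate slope of 2.5 dB per stress reported above and expresses each level relative to the phrase-final stressed word. The linearity, the choice of the final word as zero reference, and the function name are assumptions made for illustration; the shorter utterances are noted above as exceptions, and no vowel-quality adjustment is applied.

```python
SLOPE_DB_PER_STRESS = 2.5  # approximate rate of reduction reported in the text


def relative_intensity_db(n_stresses, slope=SLOPE_DB_PER_STRESS):
    """Intensity of each stressed word relative to the phrase-final one.

    Positions are counted from left to right; the final stressed word
    is the zero reference (an assumption of this sketch).
    """
    return [slope * (n_stresses - position)
            for position in range(1, n_stresses + 1)]
```

With this sketch, expanding a phrase by one stressed word to the left leaves the values of the existing words unchanged and places the new word 2.5 dB above the previous left-most word, which is the sense of "higher and to the left" in the paragraph above.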
As was the case for the unstressed words' F0, the unstressed words' intensity levels appear to have a direct relationship to those of the stressed words. The contours for the unstressed words, with the possible exception of the relative pronoun 'that', could then be defined in terms of the contours for the stressed words. The data for the pronoun 'that' are probably too limited to support any conclusion that this tendency for a lower intensity level has any generality for this or any other relative pronoun.
4.3 Duration
In the results of the duration measurements, we see perhaps the most pronounced discrepancy in the correlation between the phonetic facts and the stress level contours which are outputs of the NSR. One could expect that if duration were the characteristic described by the stress level numerals, a lowering of stress level would entail a durational reduction. Fig. I-C-4 shows that this is the case for the stress level change from 1 to 2 but not for stress levels lower than 2. There appear to be two durational categories for the syllable nuclei in these test words: a long category, which occurs when the words are pronounced in isolation, and a short category, which occurs when the words are pronounced in context. This is the case for all the words in the utterances except 'pants'. The syllable nucleus in this word has a more constant duration value than those of any other words. As mentioned in the presentation of the results, the isolated
'pants' is slightly longer than the occurrences in context. The difference, however, is small compared to the other words. It is important to keep in mind that the word 'pants', which shows this relative invariability in duration of the measured segments, always has stress level 1. These utterances, then, could be said to show two durational levels instead of the maximum of eight that the NSR would lead us to expect: first, durational level 1, and then a durational level less than 1, let us say 2. Level 1 occurs at the right-most constituent and in isolated position; level 2 occurs elsewhere.
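The two-level description of duration can be written out as a minimal sketch. The function names are hypothetical, and positions are assumed to be counted from left to right over the stressed words only.

```python
def durational_level(position, n_stresses, isolated=False):
    """Durational level of a stressed word under the two-level description.

    Level 1: word spoken in isolation, or right-most stressed word.
    Level 2: every other stressed word in context.
    """
    if isolated or position == n_stresses:
        return 1
    return 2


def durational_contour(n_stresses):
    """Durational levels for all stressed words of a phrase, left to right."""
    return [durational_level(position, n_stresses)
            for position in range(1, n_stresses + 1)]
```

For the eight stressed words of sentence A this gives level 2 for every word except the final 'pants', which gets level 1, in contrast to the eight distinct stress levels that the NSR assigns to the same words.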
An important question here concerns the meaning of the fact that the right-most element in these structures has a durational value equal to that of its isolated value. This could mean that the stress priority given to right-hand constituents in the NSR holds for the duration parameter. One can hardly help feeling a certain hesitancy to accept this correlation between the NSR and the duration parameter at face value. The well-known tendency for utterance-final syllables to undergo lengthening should be borne in mind.
5. Summary of the discussion
Consideration of the facts presented thus far would seem to indicate that the output of the Nuclear Stress Rule does not show a simple correlation with the prosodic parameters of the speech wave. This is adequate motivation for us, at this point, to systematically evaluate this rule as a means of predicting acoustic parameters in the speech wave.
5.1 The NSR-based model
In this evaluation, an NSR-based model for the prediction of the F0 and intensity parameters in this study will be compared to an alternative model based on the number of stresses in a phrase and the position of words in the phrase. The duration parameter will not be included in this comparison.
In an NSR-based model we must make use primarily of the outputs of this rule as a basis for predictions. The stress contours which are outputs of the NSR are expressed in terms of numerals, the numeral 1 representing the greatest degree of some stress characteristic. A reasonable demand that may be placed on this rule is that it, through its stress contour outputs, show a correlation between the syntax (the information the rule makes use of in its operation) and phonetic reality by predicting the parameters we have discussed in this paper. The capacity of the NSR to do so is illustrated in Figs. I-C-6 and I-C-7. It is immediately apparent in these figures that at least two problems are associated with the NSR-based model. First, there is no linear correlation between descending F0 or intensity values and descending stress levels. The second problem apparent in Figs. I-C-6 and I-C-7 is the spread of the data. The NSR-based model not only fails to account for the spread but, perhaps more important, fails to account for the obvious regularities in this spread. Use of an NSR-based model for the F0 and intensity patterns in this study would require considerable adjustments of the NSR outputs. These adjustments would be based on some of the same factors which an alternative model would make use of for prediction of F0 and intensity, such as the number of stresses and their position in the utterance. Thus the low F0 level of terminal stress 1 would have to be included as a basic rule.
Fig. I-C-6. A summary of observed F0 values for each stress level, including data for all the stressed words isolated and in context. The line connects the median values for each stress level. [Plot not reproduced. Horizontal axis: stress level.]
Fig. I-C-7. A summary of observed intensity values for each stress level, including data for all the stressed words isolated and in context. The line connects the median values for each stress level. [Plot not reproduced. Horizontal axis: stress level.]
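The kind of summary shown in Figs. I-C-6 and I-C-7, a median per stress level together with the spread around it, could be computed along the following lines. This is a sketch only: the input format, a list of (stress level, measured value) pairs, and the function names are assumptions, and no measured values from the study are reproduced here.

```python
from collections import defaultdict
from statistics import median


def group_by_stress_level(observations):
    """Collect measured values under their NSR stress level.

    `observations` is an iterable of (stress_level, value) pairs.
    """
    grouped = defaultdict(list)
    for level, value in observations:
        grouped[level].append(value)
    return grouped


def median_by_stress_level(observations):
    """Median value per stress level (the line drawn in the figures)."""
    grouped = group_by_stress_level(observations)
    return {level: median(values) for level, values in sorted(grouped.items())}


def spread_by_stress_level(observations):
    """Lowest and highest value per stress level (the spread of the data)."""
    grouped = group_by_stress_level(observations)
    return {level: (min(values), max(values))
            for level, values in sorted(grouped.items())}
```

A model based only on NSR outputs predicts a single value per stress level, so the spread returned by the second function is exactly the part of the data it leaves unexplained.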
5.2 Alternative model based on phrase length and position
Examination of the figure material presented in this study leads to the conclusion that it is possible to formulate rules that have quite different characteristics from the NSR. Instead of the syntactic information used by the NSR, it is possible to derive the F0 and intensity values for the words in the test phrases by considering (1) the length of the phrase and (2) the position of the individual word in the phrase. In the formulas presented below, position is numbered from left to right. The left-most position in the phrase has the position 1, the next position to the right 2, and so on. The derivation of the values for the unstressed words is also included in this quantitative treatment of the data.
The following formulas are suggestions for derivation of F0 and intensity values for each word in the test phrases used in this study.
n = number of stresses in phrase
p = position
k = 0.2 for a succeeding unstressed word
k = 0 for stressed words
Fig. I-C-8. Observed values for F0 plotted as a function of values predicted with formula (1), section 5.2. [Plot not reproduced. Horizontal axis: predicted fundamental frequency.]
(Strangely enough, Chomsky and Halle argue that the phonetic representation need not have any relation to a "physical or acoustic reality", p. 25.) If rules provide outputs in terms of a phonetic representation with little correlation to the speech wave produced, we must conclude that the demands on the phonetic representation must be better defined, and that the rules which give rise to this phonetic representation must be formulated accordingly.
References:
Chomsky, N. and Halle, M. (1968): The Sound Pattern of English, Harper and Row, New York.
Fant, G. (1968): "Analysis and Synthesis of Speech Processes", in Manual of Phonetics, ed. B. Malmberg, North-Holland Publ. Co., Amsterdam, pp. 173-277.
Lehiste, I. (1970): Suprasegmentals, The MIT Press, Cambridge, Mass.
Lieberman, P. (1967): Intonation, Perception and Language, The MIT Press, Cambridge, Mass.
Lindblom, B. (1970): "Temporal Organization of Syllabic Processes", invited paper presented at the ASA meeting at Atlantic City, April 1970.
McAllister, R. (1971): "The Nuclear Stress Rule and the Description of English Stress", forthcoming in the Proceedings of the VIIth International Congress of Phonetic Sciences, Montreal.
Postal, P. (1968): Aspects of Phonological Theory, Harper and Row, New York.