Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Dept. for Speech, Music and Hearing
Quarterly Progress andStatus Report
A quantitative theory ofcardinal vowels and the
teaching of pronunciationLindblom, B. and Sundberg, J.
journal: STL-QPSRvolume: 10number: 2-3year: 1969pages: 019-025
http://www.speech.kth.se/qpsr
STL-QPSR 2-3/1969
B, A QUANTITATIVE THEORY OF CARDINAL VOWELS AND THE TEACHING O F PRONUPJCIATION*
B I Lindblom and J. Sundberg
Abstract
The problem of devising a reference sys tem for specifying the phonetic value of vowels is discussed. The c lass ica l theory of Cardinal Vowels is examined a s well a s previous quantitative frameworks for describing vowel pronunciation. An attempt i s made to construct a model of vowel production that combines the idea of a reference sys tem with an objective numerical method of specification. A se t of vowels i s generated +hat represent the most extreme vowels in t e r m s of the total acoti:.tic vowel space character is t ic of the model. These sounds a r e selected s o a s to be a l so approximately equidistant acoustically. These model- based "cardinal vowels" a r e compared with a se t of t rue cardinal vowels pronounced by Daniel Jones. The two se t s display many qualitative s imilar i t ies both acoustically and auditorily. The im- plications of a vowel reference sys tem that could indeed be pro- duced "from written desc riptions ", is touched upon in par t icular with regard to the teaching of pronunciation to hard of hearing as well a s normal students. An analysis -by- synthesis application i s suggested in which the model i s used to supplement v ioua l spec t ra l displays of vowel sounds with articulatory interpreta- tions. Some of the problems that would have to be overcome in such a method of "automatic articulation instruction" a r e men- tioned, fo r instance, those of normalization and compensatory articulation.
The Cardinal Vowel Svstem
In the teaching of pronunciation one principle i s t o descr ibe the un-
known sounds to be learned in relation to sounds that a r e known to the
student. Phoneticians have pointed out that the sounds of a given lan-
guage cannot be used f o r this purpose since there a r e l a rge variations
among speakers of the same language owing t:, dialectal, socialogical,
and other factors ( I) . Instead of teaching, fo r instance, vowel pronun-
ciation i n t e r m s of the vowel sounds of a par t icular l a n g u a ~ e a reference
sys tem that i s independent of any given language has been devised. A
famous example of such a sys tem is the Cardinal Vowels. These sounds
a r e said to be "a se t of fixed vowel-sounds having known acoustic quali-
t ies and known tongue and l ip positions"(2). There a r e a pr imary se t
and a secondary se t of cardinal vowels each se t comprising eight vowels.
* This paper was presented a t the Second International Congress of Applied Linguistics, Cambridge, England, 8-1 2, Sept. 1969.
,
STL-QPSR 2-3/1969 20.
It is customary to r e fe r to a given cardinal vowel by i t s number and to
descr ibe i ts articulation in t e r m s of th ree dimensions : tongue height,
front-back position of the tongue and degree of rounding. In Table I-B-1
below we show the articulatory specifications and symbols used to de-
sc r ibe the pr imary and secondary sets.
TABLE I-B-1
close i u
half-close e o
half - open E 3
open a a
PRIMARY CARDINAL
VOWELS
close Y L u half-close b Y SECONDARY CARDINAL
half - open ce A VOWELS
open CIE D
(3). The following features a r e said to character ize these sounds . (1) They a r e independent of the vowels of any language.
(2) They a r e fixed reference points of "exactly determined and invariable quality".
(3) They a r e peripheral vowels, Thus in principle i t should be possible to descr ibe an a rb i t r a ry vowel quality of any language by interpolating between the reference points.
(4) They a r e auditorily equidistant.
(5) Moreover, "the values of cardinal vowels cannot be learnt f rom written descriptions; they should be lea t by o r a l A instruction f rom a teacher who knows them" .
The l a s t property indicates that the sys tem i s not an objective and
quantitative one but rel ies heavily on the motor skil ls and perceptual
acuity of the student. It is passed on by o r a l tradition.
Quantitative Frameworks of Vowel S~eci f ica t ion
F r o m the ear ly fifties onwards acoustic phonetics has made rapid
progress . Among the achievements in this field a r e the schemes devised
by Fant (4) and Stevens and House (5) to study the relation between vowel
articulations and their acoustic results. We a r e referr ing to the three-
pa ramete r models of these investigators. The following th ree parameters
a r e controlled in these models:
(1) length and opening a r e a of l ip section,
(2 ) position of maximal tongue constriction,
(3) the magnitude of this constriction.
On the basis of these th ree num'bers the cross-sect ional a r e a s along the
vocal t r ac t a r e derived with the aid of rules that differ somewhat between
the two versions of the three-parameter model. Given the distribution of
cross-sectional a r e a s along the t rac t , o r the a r e a function, - the acoustic
determinants of vowel quality, the formant frequencies, a r e ccmputed.
According to this type of model possible vowel articulation i s defined a s
any permissible combination of parameter values the parameters being
dimensions of the a r e a function.
Although these frameworks of vowel specification a r e objective and
quantitative which the cardinal vowel sys tem is not they sometimes spec-
ify "possible vowel articulation" in too generous a fashion and in a man-
ne r which i s not always easy to interpret in intuitively meaningful a r t ic -
ulatory t e r m s such as open-clos e, front-back, etc.
A Model of Vowel Production
In the present paper we shall report on an attempt to construct a model
that combines the idea of an articulatory and perceptual reference sys t em
inherent in the cardinal vowel t h e ~ ~ r y . Our a i m has been to build into the
model a l l that we know a t present about the natural degrees of freedom
of the vocal t rac t . In s o doing we hope that we might a r r i v e a t a n im-
proved definition of the notion of "possible vowel articulation".
A. Articulatory Proper t ies -------- ------ Our model is controlled by means of the following independent com-
ponent s :
the mandible whose movements we res t r ic t to a single fixed path.
the tongue whose shape can be varied continuously by l inear interpola- tion between three basic configurations of tongue c o n t ~ u r s corresponding to [i], [ a ] and [u], respectively. A cer ta in mixture of palatalization, velarization and pharyngealiza- tion ("[i) -nessf l , "Cu] -nessn andl[a] -ness1', respectively) corresponds to cer ta in numerical values of these para- meters .
STL-QPSR 2-3/1969 22.
This choice can be justified partly on the basis of data obtained f rom la te ra l X-ray profiles of Swedish vowels. It turns out that t h ree main farnilizs of tongue contours a r e obtained provided that these con- tours a r e plotted with the lower jaw a s reference (Fig. I-B-1). An explanation of this ra ther restr ic ted se t of contours i s readily apparent when we think of the arrangement of the major extrinsic muscles of the tongue. We find the genioglossus, styloglos sus and the hyoglossus muscles which s e e m mechanically cap- able of participating in the contraction patterns under- lying the production of [i], [ and [ a ] , respectively(7).
labio-muscular which is independent of jaw position. activitv (round-
I \
ing- spreading)
larynx height
All of these parameters lend themselves naturally to a n interpretation in t e r m s of "muscle lengths". To com- pute a sound wave f rom such specifications the procedure is the following:
1 . Choose parametr ic values (jaw opening, tongue shape, rounding - spreading, larynx height).
2. Compute the associated ar t iculatory profile, that is, the contours of the vocal t r ac t and i ts length in a l a t e ra l projection.
3. Translate the resul t of 2 into an a r e a function (the variation of cross-sect ional a r e a along the tract) .
4. Compute the formant frequencies corresponding to this a r e a function.
In Fig. I-B-2 the three basic tongue shapes a r e shown. A coordinate
sys tem anchored on the mandible i s a l so depicted. This sys tem is used
to compute interpolated tongue shapes. At the top left we see the para-
me te r s of width and height of l ip separation with the aid of which the opening
a r e a i s derived. Below a la te ra l profile tracing i s shown with another co-
ordinate system. This sys tem i s used to compute the a r e a function.
B. Acoustic Proper t ies ----------- -- Naw assume that like the child learning to talk, we combine different
mandible positions, tongue shapes, l ip s ta tes and larynx heights in a l l
I possible ways and l is ten to the acoustic resul t in each individual case.
1 Whatever we do with our articulatory components it i s c l ea r that the I I human speech organs a r e constrained in such a way s o a s to permit only i
certain vowel qualities, o r combinations of formant frequency values.
Fig. I-B-1. Midsagittal tongue contours for Swedish vowels in relation to outline of mandible. Top left: [ i , e , r , .I. Top right: [u]. Below: [ a , o, ?I.
Fig. I-B-2. I n t h e u p p e r l e f t - h a n d p a r t t h e p a r a m e t e r s d e t e r m i n i n g the mouth opening a r e a A a r e shown: h = vert ical sepa- ration between lips; w = distance between mouth corners ; p i s a number that specifies the curvature of the lip con- tours . These contours when projected on a frontal plane a r e assumed to be given by
In the upper right par t the basic tongue shapes of the model a r e shown. A polar coordinate sys tem defined in relation to the mandible i s a lso indicated. With the aid of this coordinate sys tem interpolated tongue shapes associated with [ i , u] and [ a ] were computed.
In the lower par t of the figure a la te ra l X-ray tracing can be seen. Superimposed on the profile i s a coord- inate sys tem defined in relation to fixed s t ruc tures such a s the maxilla. This sys tem was used in the de te r - mination of a r e a functions.
Certain other F3-F2dFI combinations charac ter ize vowels that we could
produce only with the aid of a terminal analogue synthesizer - not with our
mouths. F r o m the point of view of the human speech mechanism such
combinations would be impossible. The space that character izes the a -
coustic possibilities of our model i s shown in Fig. I-B-31 In this figure
we have separated the rounded and spread subspaces. The lower fields
r e f e r to a l l possible combinations of F and F values. The top a r e a s 2 1 pertain to the corresponding F values. When we explore the contours 3 of these spaces auditorily we find the qualities indicated by the vowel
symbols. These points have been selected a t approximately equidistant
F1 steps.
True and Model-Based Cardinal Vowels
Clear ly the f i r s t four features mentioned on p. 20 a s characterizing
cardinal vowels apply a l so to the vowels generated by the model. They
a r e independent of any language, they a r e per ipheral and fixed reference
points and they a r e acoustically ( i f not auditorily) equidistant.
Consequently i t would be of some interest t o compare a se t of model-
based vowels with a se t of t rue cardinal vowels. Fig. I-B-4 demonstrates
the resul ts of an acoustic comparison of this type. The f i r s t th ree for- I
mant frequencies were measured in a se t of cardinal vowels a s spoken
by Daniel Jones (I) . The left plot in this figure shows the extreme vowels
generated with the model (also shown in Fig. I-B-3). It is seen that the
t r u e cardinal vowels span a somewhat l a r g e r range. This difference is
probably due to the fact that,among other factors , Daniel Jones has a
shor t e r overall t r ac t length than that of our model and that he a l so de-
l iberately shortened his t r ac t by elevating his larynx to an extreme posi-
tion when producing [il and probably [a). Qualitative s imilar i t ies do
exist between the se ts , however. Table I-B-2 contains a specification of
the articulatory parameters that underlie the model generated sounds.
Table I-B-2. The articulatory dimensions of per ipheral vowel types
Tongue shape
Palatal Palato-pharyngeal Pharyngeal Velar
Close i u
e o
J A W a Open E a
ROUNDED VOWEL SPACE S P R E A D VOWEL SPACE
-
-
- +-.-.-.4.
- -.-.-.- Roundrd and larynx bprassad
-
-
-
-
[Ul I 1 L I I .
.1 .2 4 .4 .S .6 .7 .8 kHz .I .2 .3 .4 .5 .6 .7 .8 kHz
FIRST FORMANT FREQUENCY F IRST FORMANT FREQUENCY
Fig. I -B-3 . The maximal rounded and spread vowel spaces that the model i s capable of generating.
Implications of a Cuantitative Theory --- of Cardinal Vowels f o r the Teaching of Pronunciation
In theory, a physiological model of vowel production should be a useful
tool in the teaching of pronunciation t 3 second language l ea rne r s and hard
of hearing children. In practice, such an application would require solutions
to somc technical problems which, by rro means, however, appear insur-
mountable. Imagine that the formant pattern of a vowel in a given language
o r dialect could be measured automatically and represented a s a dot on an
oscilloscope screen. F2 and F o r some measure combining these two 3 ' frequencies, could be plotted along the ordinate and F along the abscissa.
1 Technologically this i s not wishful thinking. Attempts have already been
made along these l ines (6). Suppose that w e f i r s t plot a ta rge t vowel a s in-
dicated in Fig. I-B-5. Next we a s k our subject who might for instance be
a deaf child to produce a vov~el a s c lose to the target a s possible. In a l l
probability our pupil will mis s the target perhaps in the manner indicated
in Fig, I-B-5. might a t this stage have the child t r y to improve his
pronunciation by a t r ia l -and-er ror procedure. An even be t te r method
would be to supplement the visuzi display with an indication of how the
child should change his articulation in o rde r to reach the target. Such
information could be given in t e r m s of t ra jector ies depicting the formant
frequency shift associated with "isolingual" - and "isomandibular" condi-
tions. A se t of such curves i s given in Fig. I-B-5. This instrument
would se rve a s a so r t of "automatic articulation instructor".
This goal might a l so sound utopian but i t should be possible given a
computer and an acceptable theory of vowel production.
The present model of vowel production is obviously unsatisfactory a s
such a theory. There a r e a number of modifications that a r e clear ly
needed to improve it?. There is for instance the problem of normalizing
F-pat tern data for ta lkers whose vocal t r ac t lengths differ often non-
uniformly in s ize There i s a lso the problem of compensatory ar t icu-
lations that our model a t present often incorrectly allows in the c a s e of
non-peripheral vowels. Although it might be possible to produce an [jd]
3: e. g. , independent control of tongue blade in retroflexion and coronal consonant articulation; independent control of pharynx width in connec- tion with the tense-lax distinction on which the vowel harmony of the Akan languages seems to be based; hyperpalatalization, velarization and pharyngealization to produce cotnnensatory articalations and (ifferent types of dorsa l constrictions l a r g e r than those for vowels; control of the form of the erozs-sectional a r e a ( la terals , fricatives etr . ).
MOVE TONGUE FORWARD !
P U P I ~ S VOWEL
FIRST FORMANT FREQUENCY
Fig. I-B-5. Stylized visual display of s p e c t r a l p roper t i e s of vowels intended t o be used in improving the pronunciation of vowels by hard of hearing ,subjects . The t a r g e t vowel and the pup i l ' s vowel a r e represen ted by dots . The t r a j e c - t o r i e s indicate the formant frequency shifts that would resu l t if the pupil changed his ar t icula t ion a s indicated, that i s , by lowering his jaw and by moving his tongue forward. These t r a j e c t o r i e s a r e assumed t o be based on an analys is-by-synthesis of the pupil' s vowel using a quantitative model of vowel production of the type desc r ibed i n the paper .
STL-QPSR 2-3/1969 25.
with spread lips by compensating elsewhere along the t r ac t i t is extreme-
ly r a r e to find a Swedish native ta lker who consistently spreads his [ b ] sounds. Among other things there seems to be a principle of disambigua-
tion that i s yet to be discovered in future research.
References :
(1) Cardinal Vowels (spoken by D. ones), Linguaphone Institute Y w % 6 ) .
(2) D. Jones: An Outline of English Phonetics (Cambridge 1956), 8th edition.
(3) D. Aberc rombie: Elements of General Pholletics ( ~ d i n b u r ~ h 1967).
(4) G. Fant: Acoustic Theory of Speech Production ( ' s -Gravenhage 1960).
(5) K, NA Stevens and A. S, House: "Development of a quantitative descr ip- tion of vowel articuIation", J. Acoust. Soc. Am. - 17 (1 955), pp. 484-493r
(6) J. M. Pidkett and A, Constam: "A visual speech t r a ine r with simplified indication of vowel spec!truml', Am, Ann. of the deaf - 1 13 (1 968), pp. 253-258.
I. B. Thomas and R. C. Snell: "Articulation training through the use of a real-t ime visual display of speech parameters" , m s sub- mitted for publication in Am.Ann. of the Deaf.
A. J. Goldberg: "Visual perception of speech stimuli", Quarterly P r o g r e s s Report No. 92 (MIT, RLE, Cambridge, Mass. ), Jan. 15, 1969, pp. 335-338.
(7) P. Ladefoged: "Physiological characterization of speech", Working Paper s in Phonetics (UCLA), June 1964, pp. 2-9.
(8) G. Fant: "A note on vocal t r ac t s ize factors and non-uniform F- pattern scalings", STL-QPSR 4/1966 (KTH, Stockholm), pp. 22-30.
Acknowledgments
This work was supported by the National Institutes of Health
Research Grant No. NB 04003-07 and the Tri-Centennial Fund of
the Bank of Sweden Contract No. 67/48.