A quantitative theory of cardinal vowels and the teaching

Dept. for Speech, Music and Hearing

Quarterly Progress andStatus Report

A quantitative theory ofcardinal vowels and the

teaching of pronunciationLindblom, B. and Sundberg, J.

journal: STL-QPSRvolume: 10number: 2-3year: 1969pages: 019-025

http://www.speech.kth.se/qpsr

http://www.speech.kth.se

http://www.speech.kth.se/qpsr

STL-QPSR 2-3/1969

B, A QUANTITATIVE THEORY OF CARDINAL VOWELS AND THE TEACHING O F PRONUPJCIATION*

B I Lindblom and J. Sundberg

Abstract

The problem of devising a reference sys tem for specifying the phonetic value of vowels is discussed. The c lass ica l theory of Cardinal Vowels is examined a s well a s previous quantitative frameworks for describing vowel pronunciation. An attempt i s made to construct a model of vowel production that combines the idea of a reference sys tem with an objective numerical method of specification. A se t of vowels i s generated +hat represent the most extreme vowels in t e r m s of the total acoti:.tic vowel space character is t ic of the model. These sounds a r e selected s o a s to be a l so approximately equidistant acoustically. These model- based "cardinal vowels" a r e compared with a se t of t rue cardinal vowels pronounced by Daniel Jones. The two se t s display many qualitative s imilar i t ies both acoustically and auditorily. The implications of a vowel reference sys tem that could indeed be pro- duced "from written desc riptions ", is touched upon in par t icular with regard to the teaching of pronunciation to hard of hearing as well a s normal students. An analysis -by- synthesis application i s suggested in which the model i s used to supplement v ioua l spec t ra l displays of vowel sounds with articulatory interpreta- tions. Some of the problems that would have to be overcome in such a method of "automatic articulation instruction" a r e mentioned, fo r instance, those of normalization and compensatory articulation.

The Cardinal Vowel Svstem

In the teaching of pronunciation one principle i s t o descr ibe the un-

known sounds to be learned in relation to sounds that a r e known to the

student. Phoneticians have pointed out that the sounds of a given lan-

guage cannot be used f o r this purpose since there a r e l a rge variations

among speakers of the same language owing t:, dialectal, socialogical,

and other factors ( I) . Instead of teaching, fo r instance, vowel pronun-

ciation i n t e r m s of the vowel sounds of a par t icular l a n g u a ~ e a reference

sys tem that i s independent of any given language has been devised. A

famous example of such a sys tem is the Cardinal Vowels. These sounds

a r e said to be "a se t of fixed vowel-sounds having known acoustic quali-

t ies and known tongue and l ip positions"(2). There a r e a pr imary se t

and a secondary se t of cardinal vowels each se t comprising eight vowels.

* This paper was presented a t the Second International Congress of Applied Linguistics, Cambridge, England, 8-1 2, Sept. 1969.

,

STL-QPSR 2-3/1969 20.

It is customary to r e fe r to a given cardinal vowel by i t s number and to

descr ibe i ts articulation in t e r m s of th ree dimensions : tongue height,

front-back position of the tongue and degree of rounding. In Table I-B-1

below we show the articulatory specifications and symbols used to de-

sc r ibe the pr imary and secondary sets.

TABLE I-B-1

close i u

half-close e o

half - open E 3

open a a

PRIMARY CARDINAL

VOWELS

close Y L u half-close b Y SECONDARY CARDINAL

half - open ce A VOWELS

open CIE D

(3). The following features a r e said to character ize these sounds . (1) They a r e independent of the vowels of any language.

(2) They a r e fixed reference points of "exactly determined and invariable quality".

(3) They a r e peripheral vowels, Thus in principle i t should be possible to descr ibe an a rb i t r a ry vowel quality of any language by interpolating between the reference points.

(4) They a r e auditorily equidistant.

(5) Moreover, "the values of cardinal vowels cannot be learnt f rom written descriptions; they should be lea t by o r a l A instruction f rom a teacher who knows them" .

The l a s t property indicates that the sys tem i s not an objective and

quantitative one but rel ies heavily on the motor skil ls and perceptual

acuity of the student. It is passed on by o r a l tradition.

Quantitative Frameworks of Vowel S~eci f ica t ion

F r o m the ear ly fifties onwards acoustic phonetics has made rapid

progress . Among the achievements in this field a r e the schemes devised

by Fant (4) and Stevens and House (5) to study the relation between vowel

articulations and their acoustic results. We a r e referr ing to the three-

pa ramete r models of these investigators. The following th ree parameters

a r e controlled in these models:

(1) length and opening a r e a of l ip section,

(2 ) position of maximal tongue constriction,

(3) the magnitude of this constriction.

On the basis of these th ree num'bers the cross-sect ional a r e a s along the

vocal t r ac t a r e derived with the aid of rules that differ somewhat between

the two versions of the three-parameter model. Given the distribution of

cross-sectional a r e a s along the t rac t , o r the a r e a function, - the acoustic

determinants of vowel quality, the formant frequencies, a r e ccmputed.

According to this type of model possible vowel articulation i s defined a s

any permissible combination of parameter values the parameters being

dimensions of the a r e a function.

Although these frameworks of vowel specification a r e objective and

quantitative which the cardinal vowel sys tem is not they sometimes spec-

ify "possible vowel articulation" in too generous a fashion and in a man-

ne r which i s not always easy to interpret in intuitively meaningful a r t ic -

ulatory t e r m s such as open-clos e, front-back, etc.

A Model of Vowel Production

In the present paper we shall report on an attempt to construct a model

that combines the idea of an articulatory and perceptual reference sys t em

inherent in the cardinal vowel t h e ~ ~ r y . Our a i m has been to build into the

model a l l that we know a t present about the natural degrees of freedom

of the vocal t rac t . In s o doing we hope that we might a r r i v e a t a n im-

proved definition of the notion of "possible vowel articulation".

A. Articulatory Proper t ies -------- ------ Our model is controlled by means of the following independent com-

ponent s :

the mandible whose movements we res t r ic t to a single fixed path.

the tongue whose shape can be varied continuously by l inear interpola- tion between three basic configurations of tongue c o n t ~ u r s corresponding to [i], [ a ] and [u], respectively. A cer ta in mixture of palatalization, velarization and pharyngealization ("[i) -nessf l , "Cu] -nessn andl[a] -ness1', respectively) corresponds to cer ta in numerical values of these parameters .

STL-QPSR 2-3/1969 22.

This choice can be justified partly on the basis of data obtained f rom la te ra l X-ray profiles of Swedish vowels. It turns out that t h ree main farnilizs of tongue contours a r e obtained provided that these contours a r e plotted with the lower jaw a s reference (Fig. I-B-1). An explanation of this ra ther restr ic ted se t of contours i s readily apparent when we think of the arrangement of the major extrinsic muscles of the tongue. We find the genioglossus, styloglos sus and the hyoglossus muscles which s e e m mechanically capable of participating in the contraction patterns under- lying the production of [i], [ and [ a ] , respectively(7).

labio-muscular which is independent of jaw position. activitv (round-

I \

ing- spreading)

larynx height

All of these parameters lend themselves naturally to a n interpretation in t e r m s of "muscle lengths". To compute a sound wave f rom such specifications the procedure is the following:

1 . Choose parametr ic values (jaw opening, tongue shape, rounding - spreading, larynx height).

2. Compute the associated ar t iculatory profile, that is, the contours of the vocal t r ac t and i ts length in a l a t e ra l projection.

3. Translate the resul t of 2 into an a r e a function (the variation of cross-sect ional a r e a along the tract) .

4. Compute the formant frequencies corresponding to this a r e a function.

In Fig. I-B-2 the three basic tongue shapes a r e shown. A coordinate

sys tem anchored on the mandible i s a l so depicted. This sys tem is used

to compute interpolated tongue shapes. At the top left we see the para-

me te r s of width and height of l ip separation with the aid of which the opening

a r e a i s derived. Below a la te ra l profile tracing i s shown with another co-

ordinate system. This sys tem i s used to compute the a r e a function.

B. Acoustic Proper t ies ----------- -- Naw assume that like the child learning to talk, we combine different

mandible positions, tongue shapes, l ip s ta tes and larynx heights in a l l

I possible ways and l is ten to the acoustic resul t in each individual case.

1 Whatever we do with our articulatory components it i s c l ea r that the I I human speech organs a r e constrained in such a way s o a s to permit only i

certain vowel qualities, o r combinations of formant frequency values.

Fig. I-B-1. Midsagittal tongue contours for Swedish vowels in relation to outline of mandible. Top left: [ i , e , r , .I. Top right: [u]. Below: [ a , o, ?I.

Fig. I-B-2. I n t h e u p p e r l e f t - h a n d p a r t t h e p a r a m e t e r s d e t e r m i n i n g the mouth opening a r e a A a r e shown: h = vert ical separation between lips; w = distance between mouth corners ; p i s a number that specifies the curvature of the lip contours . These contours when projected on a frontal plane a r e assumed to be given by

In the upper right par t the basic tongue shapes of the model a r e shown. A polar coordinate sys tem defined in relation to the mandible i s a lso indicated. With the aid of this coordinate sys tem interpolated tongue shapes associated with [ i , u] and [ a ] were computed.

In the lower par t of the figure a la te ra l X-ray tracing can be seen. Superimposed on the profile i s a coordinate sys tem defined in relation to fixed s t ruc tures such a s the maxilla. This sys tem was used in the de te r - mination of a r e a functions.

Certain other F3-F2dFI combinations charac ter ize vowels that we could

produce only with the aid of a terminal analogue synthesizer - not with our

mouths. F r o m the point of view of the human speech mechanism such

combinations would be impossible. The space that character izes the a -

coustic possibilities of our model i s shown in Fig. I-B-31 In this figure

we have separated the rounded and spread subspaces. The lower fields

r e f e r to a l l possible combinations of F and F values. The top a r e a s 2 1 pertain to the corresponding F values. When we explore the contours 3 of these spaces auditorily we find the qualities indicated by the vowel

symbols. These points have been selected a t approximately equidistant

F1 steps.

True and Model-Based Cardinal Vowels

Clear ly the f i r s t four features mentioned on p. 20 a s characterizing

cardinal vowels apply a l so to the vowels generated by the model. They

a r e independent of any language, they a r e per ipheral and fixed reference

points and they a r e acoustically ( i f not auditorily) equidistant.

Consequently i t would be of some interest t o compare a se t of model-

based vowels with a se t of t rue cardinal vowels. Fig. I-B-4 demonstrates

the resul ts of an acoustic comparison of this type. The f i r s t th ree for- I

mant frequencies were measured in a se t of cardinal vowels a s spoken

by Daniel Jones (I) . The left plot in this figure shows the extreme vowels

generated with the model (also shown in Fig. I-B-3). It is seen that the

t r u e cardinal vowels span a somewhat l a r g e r range. This difference is

probably due to the fact that,among other factors , Daniel Jones has a

shor t e r overall t r ac t length than that of our model and that he a l so de-

l iberately shortened his t r ac t by elevating his larynx to an extreme posi-

tion when producing [il and probably [a). Qualitative s imilar i t ies do

exist between the se ts , however. Table I-B-2 contains a specification of

the articulatory parameters that underlie the model generated sounds.

Table I-B-2. The articulatory dimensions of per ipheral vowel types

Tongue shape

Palatal Palato-pharyngeal Pharyngeal Velar

Close i u

e o

J A W a Open E a

ROUNDED VOWEL SPACE S P R E A D VOWEL SPACE

-

-

- +-.-.-.4.

- -.-.-.- Roundrd and larynx bprassad

-

-

-

-

[Ul I 1 L I I .

.1 .2 4 .4 .S .6 .7 .8 kHz .I .2 .3 .4 .5 .6 .7 .8 kHz

FIRST FORMANT FREQUENCY F IRST FORMANT FREQUENCY

Fig. I -B-3 . The maximal rounded and spread vowel spaces that the model i s capable of generating.

Implications of a Cuantitative Theory --- of Cardinal Vowels f o r the Teaching of Pronunciation

In theory, a physiological model of vowel production should be a useful

tool in the teaching of pronunciation t 3 second language l ea rne r s and hard

of hearing children. In practice, such an application would require solutions

to somc technical problems which, by rro means, however, appear insur-

mountable. Imagine that the formant pattern of a vowel in a given language

o r dialect could be measured automatically and represented a s a dot on an

oscilloscope screen. F2 and F o r some measure combining these two 3 ' frequencies, could be plotted along the ordinate and F along the abscissa.

1 Technologically this i s not wishful thinking. Attempts have already been

made along these l ines (6). Suppose that w e f i r s t plot a ta rge t vowel a s in-

dicated in Fig. I-B-5. Next we a s k our subject who might for instance be

a deaf child to produce a vov~el a s c lose to the target a s possible. In a l l

probability our pupil will mis s the target perhaps in the manner indicated

in Fig, I-B-5. might a t this stage have the child t r y to improve his

pronunciation by a t r ia l -and-er ror procedure. An even be t te r method

would be to supplement the visuzi display with an indication of how the

child should change his articulation in o rde r to reach the target. Such

information could be given in t e r m s of t ra jector ies depicting the formant

frequency shift associated with "isolingual" - and "isomandibular" condi-

tions. A se t of such curves i s given in Fig. I-B-5. This instrument

would se rve a s a so r t of "automatic articulation instructor".

This goal might a l so sound utopian but i t should be possible given a

computer and an acceptable theory of vowel production.

The present model of vowel production is obviously unsatisfactory a s

such a theory. There a r e a number of modifications that a r e clear ly

needed to improve it?. There is for instance the problem of normalizing

F-pat tern data for ta lkers whose vocal t r ac t lengths differ often non-

uniformly in s ize There i s a lso the problem of compensatory ar t icu-

lations that our model a t present often incorrectly allows in the c a s e of

non-peripheral vowels. Although it might be possible to produce an [jd]

3: e. g. , independent control of tongue blade in retroflexion and coronal consonant articulation; independent control of pharynx width in connec- tion with the tense-lax distinction on which the vowel harmony of the Akan languages seems to be based; hyperpalatalization, velarization and pharyngealization to produce cotnnensatory articalations and (ifferent types of dorsa l constrictions l a r g e r than those for vowels; control of the form of the erozs-sectional a r e a ( la terals , fricatives etr . ).

MOVE TONGUE FORWARD !

P U P I ~ S VOWEL

FIRST FORMANT FREQUENCY

Fig. I-B-5. Stylized visual display of s p e c t r a l p roper t i e s of vowels intended t o be used in improving the pronunciation of vowels by hard of hearing ,subjects . The t a r g e t vowel and the pup i l ' s vowel a r e represen ted by dots . The t r a j e c - t o r i e s indicate the formant frequency shifts that would resu l t if the pupil changed his ar t icula t ion a s indicated, that i s , by lowering his jaw and by moving his tongue forward. These t r a j e c t o r i e s a r e assumed t o be based on an analys is-by-synthesis of the pupil' s vowel using a quantitative model of vowel production of the type desc r ibed i n the paper .

STL-QPSR 2-3/1969 25.

with spread lips by compensating elsewhere along the t r ac t i t is extreme-

ly r a r e to find a Swedish native ta lker who consistently spreads his [ b ] sounds. Among other things there seems to be a principle of disambigua-

tion that i s yet to be discovered in future research.

References :

(1) Cardinal Vowels (spoken by D. ones), Linguaphone Institute Y w % 6 ) .

(2) D. Jones: An Outline of English Phonetics (Cambridge 1956), 8th edition.

(3) D. Aberc rombie: Elements of General Pholletics ( ~ d i n b u r ~ h 1967).

(4) G. Fant: Acoustic Theory of Speech Production ( ' s -Gravenhage 1960).

(5) K, NA Stevens and A. S, House: "Development of a quantitative descr ip- tion of vowel articuIation", J. Acoust. Soc. Am. - 17 (1 955), pp. 484-493r

(6) J. M. Pidkett and A, Constam: "A visual speech t r a ine r with simplified indication of vowel spec!truml', Am, Ann. of the deaf - 1 13 (1 968), pp. 253-258.

I. B. Thomas and R. C. Snell: "Articulation training through the use of a real-t ime visual display of speech parameters" , m s sub- mitted for publication in Am.Ann. of the Deaf.

A. J. Goldberg: "Visual perception of speech stimuli", Quarterly P r o g r e s s Report No. 92 (MIT, RLE, Cambridge, Mass. ), Jan. 15, 1969, pp. 335-338.

(7) P. Ladefoged: "Physiological characterization of speech", Working Paper s in Phonetics (UCLA), June 1964, pp. 2-9.

(8) G. Fant: "A note on vocal t r ac t s ize factors and non-uniform F- pattern scalings", STL-QPSR 4/1966 (KTH, Stockholm), pp. 22-30.

Acknowledgments

This work was supported by the National Institutes of Health

Research Grant No. NB 04003-07 and the Tri-Centennial Fund of

the Bank of Sweden Contract No. 67/48.

Documents

A quantitative theory of cardinal vowels and the teaching