Effect of Repetition in Pronunciation Practice on Retrieval

Cognitive Studies, 18(2), 320-328. (June 2011)

● Short Notes●

Effect of Repetition in Pronunciation Practice on Retrieval of Nonsense Words

Katsumi Nagai　

A nonsense word retrieval test was conducted to examine the effectiveness of two

common practices in language classrooms: repetition “with” a teacher and “after” a

teacher. Participants were asked to memorize two lists of bisyllabic test words by

reading the words aloud with and after the model presentation (repetition phase). Par-

ticipants heard test words and were expected to reject the “new” test words as “words

that were not repeated in the test phase”. The number of correct responses and par-

ticipants’ levels of confidence were analysed, and both ANOVA and ROC (Receiver

Operating Characteristic) curve showed that repetition “with a teacher” significantly

surpassed “after a teacher”. These results demonstrate the advantage of pronunciation

practice simultaneously with a teacher.

Keywords: pronunciation practice, repetition, retrieval of words, Receiver Operating Characteristics (ROC) curve

1. Introduction

Pronunciation practice has special importance

for language learning because speech sounds

form the basis of all languages. Foreign language learners have long been encouraged to improve their ‘naturalness’ measured against ‘native’ speakers of the target language. Conse-

quently, pronunciation practice occupies an im-

portant position in language classes when learn-

ers’ mispronunciation is expected to create un-

avoidable difficulties in communication (see re-

views in Stern, 1992; Howatt and Widdowson,

2004). Teaching plans typically include repeti-

tion of minimally-paired target sounds, words,

and phrases following the teacher’s explanation

and model pronunciation (Celce-Murcia, Brin-

ton, and Goodwin, 1996, p.310; Doff, 1988;

Kelly, 2000). However, a principal aim in pronunciation practice can also be an improvement

Foreign Language Centre (Daikyo-centre), Kagawa University.

in ‘intelligibility’, that is, the degree to which the

learner’s speech can be understood by others,

both receptively and productively. This view-

point is free from disagreements over the defini-

tions of ‘good’ and ‘correct’ accent when ‘get-

ting meaning across to the listeners’ is set as

the goal of pronunciation practice (Kenworthy,

1987; Dalton and Seidlhofer, 1994). Despite

persistent efforts of teachers and researchers, it

seems there is a long way to go before standard-

ized scales of ‘naturalness’ and ‘intelligibility’ are

available to language learners (see examples in

Miwa, Sasaki, and Tanno, 2000; Aoyama, Flege,

Guion, Akahane-Yamada, and Yamada, 2004).

Even so, another goal of oral practice can be

set as a substitute for written grammar exercises

because vocalization requires less time and en-

ergy than paper-based tasks. It is true that little

previous research has been published about this

goal, and that some teachers might say that such

oral practice cannot be categorized as a pronunciation drill in cases where no weight is given to

Vol. 18 No. 2 Repetition in Pronunciation Practice on Retrieval of Nonsense Words 321

‘naturalness’ or ‘intelligibility’. Nevertheless, it

is still reasonable for researchers to maximize the

efficiency of pronunciation drills as long as audio-

lingual-type oral practice produces a certain and

inevitable effect on learners. In practical terms,

a more important matter for teachers is to clar-

ify how this effect can be maximized. This is the

purpose of the present experiment.

When the first aim of pronunciation drills is

to increase learners’ vocabulary, the effect can

be quantitatively measured by a word recogni-

tion test, in which the scores of nonsense-word

retrieval are measured after pronunciation drills.

In the present experiment, nonsense test words

were synthesized to minimize the deviation of

various acoustic characteristics found in the hu-

man voice. Effectiveness was measured by the

scores of word retrieval tests with a factor of six-

point confidence rating (Massaro, 1975, pp.85–

141; Wickens, 2002, pp.83–92). The pronunci-

ation drills in the present paper are limited to

the following basic twofold types: repetition ‘af-

ter’ the teacher and repetition ‘with’ the teacher;

variations of oral practice can be considered as

derived forms of these two types of practices.

Repetition ‘after’ the teacher is considered the

default and unmarked practice, what most teach-

ers and learners do in their classrooms (see Nagai

2007 for a description of the varieties of pronun-

ciation drills).

Nagai (2007) conducted four experiments to

assess naturalness of learners’ repetition in two

ways — ‘after’ the teacher and ‘with’ the teacher.

The results show that repetition ‘after’ the

teacher surpassed ‘with’ the teacher in natural-

ness of the sentences spoken by the learners prob-

ably because repeating ‘with’ a model is a task

which hinders learners’ precise auditory feed-

back by its random and nonstationary interrup-

tion including the participants’ own voices. An-

other experiment in Nagai (2009) firstly asked

the participants to repeat test sentences orally

‘after’ and ‘with’ the model presentation, and

then asked them to judge whether the sentences

were grammatical or not by pressing keys on a re-

sponse pad. The results indicated that repeating

‘with’ the model tends to yield better grammat-

ical judgment scores than repeating ‘after’ the

model. Do tests on vocabulary show the same

tendency as seen in Nagai (2009)? Does pronun-

ciation practice ‘after’ the teacher also improve

the learners’ scores on a nonsense word memo-

rization test? The present study addresses these

issues through an experiment with word retrieval

tests administered after vocalizing the test words

after and with the model presentation.

2. Experiment

2.1 Participants

Six male and four female college students

aged between 19 and 21 participated in the ex-

periment. They all were born and raised in

Okayama. They received honoraria for their par-

ticipation. None of them had hearing disorders

or experience living abroad.

2.2 Test words

Test words, shown in Table 1, were synthe-

sized using a speech synthesizer (Fujitsu, 1998)

at 16bit/22kHz sampling to minimize durational

variety and to maximize the naturalness of coar-

ticulation within the words. The list con-

tained 120 bisyllabic (two consonant-vowel com-

binations: C1V1.C2V2) nonsense test words of

which the association (i.e. familiarity) values

for Japanese speakers are below 20 (on a scale of

1 to 100) in Hayashi (1976). Some candidate

words, either entry words in a popular Japanese

dictionary (Shogakukan, 1988) or Japanese ono-

matopoeia were also removed from the list. The

list of nonsense test words in Hayashi (1976) in-

cludes words of foreign origin. Note that limit-

ing test words to phonetically and phonologically

English-type words would decrease the validity of

the result because the aim of the present paper is to

examine vocabulary building through oral prac-

tice. The mean duration of all test words was

476ms (S.D.=20.8). Table 2 indicates variations

322 Cognitive Studies June 2011

Table 1 List of test words

heha nehe rano ruyu

hehu nehi raro sehe

heka neke rayo seho

heme neme rehe sonu

hene newa reke suse

henu nime reme suyo

heyo nino rera tehi

hihe nona rewa teyu

hinu nonu reya tohe

hise noyo reyu tonu

hohi noyu rinu tuhu

honi nuha rite tuko

honu nuhe riwa tunu

huho nuho rohe tusa

kehe nuko romo tuse

keku numi roni tuso

kenu numo ronu wami

keyo numu rowa wamo

mehe nuna royo wane

mehu nune royu waso

memi nuni ruhe wayu

mena nuse ruke yohe

mihu nuso rume yuha

moma nutu runi yuhe

monu nuwa runu yuma

muhe nuya rura yumu

mumo nuyo ruro yunu

munu nuyu rute yuro

muwa raho ruya yuti

nate rani ruyo yuyo

Table 2 Mean durations and standard deviations of C1V1 and C2V2 units in test words.

The duration of whole CV units ranges between 395 ms (shortest ‘ku’) and

528 ms (longest ‘se’).

head consonants   mean CV duration (ms)   S.D.
h                 252.6                   18
k                 175.5                   18
m                 240                     16
n                 238.6                   17
r                 244.8                   15
s                 280.5                   15
t                 201                     12
y                 243.7                   7

vowels            mean CV duration (ms)   S.D.
a                 238.3                   30
e                 247.7                   37
i                 235.2                   19
o                 240.4                   38
u                 218.6                   31

of syllabic duration sorted by their consonants

and vowels in the CV units.

2.3 Procedure and apparatus

After a brief explanation of experimental

procedures in a soundproofed room (Yamaha

ANUKC3508), participants put on headphones

(Sony MDR-CD2000) connected to a computer

extension card (Creative Audigy 2ZS). They lis-

tened to stimuli presented at 60 dB (SPL) and

repeated the test words aloud. The first part of

the experiment consisted of six repetition phases

with ‘repeat after me’-type practice and then

six more test phases. One repetition phase in-

cluded ten nonsense test words with pauses of

five seconds between each word, and participants

repeated the test words ten times in one rep-

etition phase. Participants were asked to vo-

calize the test words clearly, and their voices

were recorded digitally to ensure a precise re-

production of the test words (with a Roland R-1

recorder and Sony ECM-55B microphone). Then

the test phase, termed ‘signal plus noise trials’ in

Signal Detection Theory (Green & Swets, 1966;


a. Repetition phase (10 words ‘after’ the teacher, 50 seconds)

b. Test phase (10 repeated words + 10 new words, self-paced)

c. Repetition phase (10 words ‘with’ the teacher, 50 seconds)

d. Test phase (10 repeated words + 10 new words, self-paced)

Figure 1 Procedures of the experiment. One repetition phase (a) and the following test phase

(b) composed one “repetition ‘after’ the teacher” set. The six sets (six a-b, a-b, ...

a-b sets) composed the first part of the experiment. The second half also consisted

of six “repetition ‘with’ the teacher” phases and six test phases (six c-d, c-d, ... c-d

sets). Half of the participants started the experiment with the ‘with the teacher’ sets, and

all test words were presented randomly.

Lindsay & Norman, 1977; Wickens, 2002), fol-

lowed the repetition phase after a short cue of

white noise for 100 ms. One test phase included

twenty test words presented in randomized or-

der. The twenty words in the test phase con-

sisted of ten ‘old’ test words from the preced-

ing repetition phase (words the participants vo-

calized), and ten more nonsense words ‘new’ to

the participants (words the participants did not

vocalize). The participants heard twenty test

words through their headphones and self-checked

their memory by expressing their confidence lev-

els. The participants were expected to reject the

‘new’ test words as ‘not the word I repeated.’

The participants’ levels of confidence were de-

fined as certainty factors (CFs hereafter), and

the levels were measured on a scale from one (‘no with most certainty’) to six (‘yes with most certainty’) as shown in Table 3. Participants pushed

one of the six keys on a six-button switchbox

(Cedrus Response Pad RB-620). The test phase

was self-paced and no feedback was given to the

participants. After a short cue of five white-noise bursts and pauses of 100 ms each, the experiment proceeded to the next repetition phase. Participants were allowed a 30-minute break after six repetition-and-test phases in the experiment.

The second half of the experiment consisted

of six more repetition phases and six more test

phases. The difference was that participants re-

peated the test words ‘with the teacher’ during

the repetition phases. Participants were pre-

sented the test words twice and were required

to repeat test words chorally and simultaneously

with the second audio presentation. This type

of repetition is similar to practices of ‘echoing,’

‘mirroring,’ or ‘shadowing’ as classified in Nagai

(2007).

Half of the participants started the experiment with repetition ‘after’ the teacher, and the other half started with repetition ‘with’ the teacher, to counterbalance the order of the conditions. The overall

procedures of the experiment are illustrated in

Figure 1. The total number of trials was 2 (‘repetition after me’ and ‘repetition with me’ at the repetition phases) × 20 (trials at the test phases) × 6 (phases in each of the first and second halves) = 240.
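The trial count and the counterbalanced order described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and alternating the starting condition by participant index is an assumption, since the text states only that half of the participants started with each condition.

```python
CONDITIONS = ("after", "with")  # repetition 'after' / 'with' the teacher


def condition_order(participant_index):
    """Return the order of the two halves for one participant.

    Assumption: even-indexed participants start with 'after' and
    odd-indexed participants start with 'with'.
    """
    if participant_index % 2 == 0:
        return CONDITIONS
    return CONDITIONS[::-1]


def total_trials(n_conditions=2, trials_per_test_phase=20, sets_per_condition=6):
    """Test-phase trials per participant: 2 x 20 x 6."""
    return n_conditions * trials_per_test_phase * sets_per_condition


if __name__ == "__main__":
    for index in range(4):
        print(index, condition_order(index))
    print("trials per participant:", total_trials())
```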

If repetition ‘after’ the teacher is divided into

several cognitive stages (i.e. (1) perception of the

teacher’s voice, (2) holding the teacher’s model

pronunciation in temporary storage, (3) plan-

ning pronunciation by reference to the teacher’s


Table 3 Levels of confidence (certainty

factor, CF). Six buttons on the

switchbox were laid out in a

straight vertical line. The box

was placed with the key no.6

(yes with most certainty) in

front of the participant.

1 no with most certainty

2 no with certainty

3 no without certainty

4 yes without certainty

5 yes with certainty

6 yes with most certainty

model, and (4) articulation by the learner), rep-

etition ‘with’ the teacher has an additional pro-

cess ((5) adjustment of articulation following the

second model pronunciation by their teacher).

The adjustment of articulation (stage (5) when

repeating ‘with’ the teacher) included an over-

lapping process of ‘the second audio presenta-

tion’ and ‘student’s voice being fed back audi-

torily and mentally’. Note here that the stu-

dents were hearing their own voices (auditory

and mental feedback) not only when they repeated ‘with’ the teacher, but also when repeating ‘after’ the teacher. In other words, the present experiment compared repetition accompanied by the student’s own voice alone (repetition ‘after’ the teacher) with repetition accompanied simultaneously by the teacher’s voice and the student’s voice (repetition ‘with’ the teacher). This is unavoidable for two reasons. First, synchronous repetition of nonsense words presented only once (i.e. presenting test words a single time when repeating ‘with’ the teacher, without visual presentation) is too difficult for participants. Second, presenting test words twice when repeating ‘after’ the teacher would amount to allowing the participants to listen to the test words three times (the third time being their own voices).

3. Results and discussion

Figure 2 Number of correct responses counted by dichotomous (yes/no) criteria

If the CFs in Table 3 are simplified into dichotomous yes/no responses, the result can be summarized as shown in Figure 2. The numbers

of correct responses in Figure 2 correspond to

the scores of eleven participants (A-K) and their

scores are summed up in Table 4. Under the as-

sumption that the number of correct responses

equals the scores of each participant with nor-

mal distributions, analyses of variance were cal-

culated to examine the effects of the two repeti-

tion conditions (‘after’ versus ‘with’ the teacher)

and the two other levels of type of trials (‘hit’

and ‘correct rejection,’ detailed below) in Figure 2. One variable, ‘repetition after/with the teacher,’ had a significant main effect on the number of correct responses (the dependent variable), F(1, 40) = 10.59, p < .05, whereas the other variable, ‘hit/correct rejection,’ did not have a significant effect, F(1, 40) = 1.04, p = .31, n.s. The

post hoc test showed that the score for repeti-

tion ‘with’ the teacher was higher than that of

‘after’ the teacher. It also indicated that ‘hit’

responses outnumbered ‘correct rejections’ when

the participants repeated the test words ‘after’

the teacher, p < .05. These results demonstrate

the importance of repetition ‘with’ the teacher.

They also imply that, in the test phase follow-

ing ‘repetition after the teacher,’ the participants

were better at recalling the test words which they

repeated in the first repetition phase than they

were at rejecting new words added in the test

phase.

Signal detection theory provides a framework

to classify the correct responses into two cate-


Table 4 Mean number of correct responses (hit and correct rejection)

out of 60 trials each (240 in total)

condition                  response            mean    S.D.
repeat after the teacher   hit                 37.91   5.4
                           correct rejection   46.18   8.0
                           subtotal            40.00   5.1
repeat with the teacher    hit                 42.09   3.9
                           correct rejection   45.73   6.1
                           subtotal            43.91   5.4
total                                          42.98   6.7

gories: ‘hit’ and ‘correct rejection’. ‘Hit’ indi-

cates the correct response elicited by participants

who judged that ‘the word presented in the test

phase’ was included among ‘the test words in

the previous repetition phase’ (i.e. ‘old’ words).

‘Correct rejection’ of newly added test words in

the test phase, on the other hand, represents an-

other type of correct response that is evoked by

participants who judged that ‘the word in the

test phase’ was not ‘the word s/he had repeated

in the preceding repetition phase’ (i.e. a ‘new’

word).
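The classification of responses can be sketched in code. The sketch below assumes that CFs 4 to 6 in Table 3 count as ‘yes’ and CFs 1 to 3 as ‘no’; the function names are hypothetical, and ‘miss’ and ‘false alarm’ are the standard signal detection labels for the two kinds of incorrect response.

```python
def to_yes_no(cf):
    """Collapse a certainty factor (1-6, Table 3) into a yes/no response.

    Assumption: 4, 5 and 6 are 'yes' responses; 1, 2 and 3 are 'no'.
    """
    if cf not in range(1, 7):
        raise ValueError("certainty factor must be an integer from 1 to 6")
    return cf >= 4


def classify(is_old, cf):
    """Label one test trial with its signal detection outcome."""
    said_yes = to_yes_no(cf)
    if is_old:
        return "hit" if said_yes else "miss"
    return "false alarm" if said_yes else "correct rejection"


def count_outcomes(trials):
    """Count outcomes over (is_old, cf) pairs from the test phases."""
    counts = {"hit": 0, "miss": 0, "false alarm": 0, "correct rejection": 0}
    for is_old, cf in trials:
        counts[classify(is_old, cf)] += 1
    return counts


if __name__ == "__main__":
    # Made-up trials for illustration only, not data from the experiment.
    example = [(True, 6), (True, 2), (False, 1), (False, 5)]
    print(count_outcomes(example))
```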

Figure 3 indicates the two idealized contin-

uum distributions of responses. The broken line

can be considered to indicate the number of re-

sponses to the ‘noise’ (test words newly added

in the test phase). The overlapping right curve

corresponds to the number of responses to the

‘noise and signal’ (test words with and without

newly added words). The area of overlap shows where the participants had difficulty in distinguishing ‘noise and signal’ from ‘noise’. Because the present experiment was of a forced-choice design, the participants needed to set a decision criterion, represented by a vertical line, which determines the cutoff point between the two distributions. Accordingly, it is reasonable to assume that a larger distance between the two distributions (d′) yields a more precise distinction between the ‘noise’ and ‘noise and signal’ distributions. The present study

tried to differentiate between two styles of repe-

tition (‘after’ and ‘with’ the teacher in the rep-

Figure 3 Idealized distributions of noise (test

words newly added in the second

test phases) and noise+signal (all

test words in the test phase includ-

ing words repeated in the first rep-

etition phases)

etition phases) by calculating the distances (d′)

of the two distributions.

The cutoff points of the vertical line indicate

trade-offs of the two correct responses (‘hit’ and

‘correct rejections’) and the distance (d′) varies

together. Therefore, the results of the present ex-

periment can be illustrated by plotting the ‘hit’

rate against the ‘false alarm (complement set of

‘correct rejections’)’ rate as shown in Figure 4.

Panels (a-e) in Figure 4 are called Receiver Oper-

ating Characteristic (ROC) curves, and they are

indices of the sensitivity of participants to the

signals. It is also known that as the distance (d′)

increases, the area of the bottom-right portion of

the graphs will become wider (see Wickens, 2002,


Figure 4 Receiver Operating Characteristics (ROC) curves (repetition ‘after’ and ‘with’ the

model pronunciation are abbreviated to a-repeat and w-repeat)

pp. 39–58 for the theoretical background).
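One common way to obtain ROC points from six-point confidence ratings is to treat each boundary between adjacent rating levels as a separate cutoff and to accumulate the responses at or above it. The sketch below follows that standard rating procedure; it is an assumption that the curves in Figure 4 were built this way, and the function name and the example ratings are hypothetical.

```python
def roc_points(old_ratings, new_ratings, levels=6):
    """Return (false-alarm rate, hit rate) pairs, strictest cutoff first.

    old_ratings: CFs given to words repeated in the repetition phase.
    new_ratings: CFs given to words added in the test phase.
    A response counts as 'yes' at a cutoff when its CF is at or above it.
    """
    points = []
    for cutoff in range(levels, 1, -1):  # 6, 5, 4, 3, 2
        hit_rate = sum(r >= cutoff for r in old_ratings) / len(old_ratings)
        fa_rate = sum(r >= cutoff for r in new_ratings) / len(new_ratings)
        points.append((fa_rate, hit_rate))
    return points


if __name__ == "__main__":
    # Made-up ratings for illustration only, not data from the experiment.
    old = [6, 6, 5, 4, 2]
    new = [1, 2, 3, 4, 6]
    for fa, hit in roc_points(old, new):
        print(f"false alarm rate {fa:.2f}, hit rate {hit:.2f}")
```

Plotting the hit rate against the false-alarm rate for these five cutoffs gives one curve per repetition condition.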

The lower right-hand area under the ROC curves can be maximized by moving the ROC curves toward the left-hand border (the ordinate) and the top border of the graph. Then normalized

ROC curves were computed in Figure 5 to com-

pare the area under the ROC curves in Figure 4.

It is more desirable to have a larger distance (d′)

because a large distance (d′) of the two distribu-

tions indicates more precise detection (reception)

of the signal as explained above, and because

participants’ more precise detection (perception)

can be regarded as indicating better retrieval of

the test words. The values of d′ can be obtained

by finding the crossing points of the linear approximations of the normalized ROC curves and the

abscissa, p(H) = 0. The d′ value of repetition

‘after’ the teacher was 1.35, while that of rep-

etition ‘with’ the teacher phase was 1.57. This

difference shows a tendency for repetition ‘with’

the model presentation (at the repetition phases

of the experiment) to yield better retrieval of test

words than repetition ‘after’ the model presenta-

tion.
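A sketch of how d′ might be read off a normalized ROC curve is given below. It assumes that ‘normalized’ means z-transformed hit and false-alarm rates, fits a least-squares line to those points, and takes the distance from the origin to the point where the line crosses the horizontal axis. The function names are hypothetical, and the procedure actually used to obtain 1.35 and 1.57 may differ in detail.

```python
from statistics import NormalDist


def z_score(p):
    """Inverse of the standard normal cumulative distribution."""
    return NormalDist().inv_cdf(p)


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct false-alarm rates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def d_prime_from_roc(points):
    """Estimate d' from (false-alarm rate, hit rate) pairs.

    Rates of exactly 0 or 1 have no finite z-score and are skipped.
    The fitted line z(H) = intercept + slope * z(F) crosses z(H) = 0 at
    z(F) = -intercept / slope; the distance of that crossing from the
    origin is returned (equal to the intercept when the slope is 1).
    """
    usable = [(f, h) for f, h in points if 0 < f < 1 and 0 < h < 1]
    if len(usable) < 2:
        raise ValueError("need at least two usable ROC points")
    zf = [z_score(f) for f, _ in usable]
    zh = [z_score(h) for _, h in usable]
    slope, intercept = fit_line(zf, zh)
    return intercept / slope
```

Under the equal-variance model the fitted slope is 1, so this estimate coincides with the familiar single-point formula d′ = z(H) − z(F).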

4. Conclusion

Memorization of new phonological lexicon is

an essential step in foreign language learning, regardless of learners’ achievement levels. Vocalizing words is a useful tactic for memorizing the

target words because oral repetition requires no

written text at hand. This repetition can be

classified into two basic types from the view-

point of teacher’s and learner’s vocal overlap-

ping, i.e. repetition ‘with’ or ‘after’ the teacher.

Using ANOVA and the framework of signal de-

tection theory, the present paper tested which

of the two types of vocalization was more effective. ANOVA results show that scores for repetition ‘with’ the model presentation are significantly higher than those for repetition ‘after’ the teacher’s voice. The ROC curve analysis also indicates that


Figure 5 Normalized Receiver Operating Characteristics (ROC) curves calculated from the

data of Figure 4.

repetition ‘with’ the model presentation height-

ens the value of the discrimination index (d′).

These results suggest that repetition ‘with’ the

teacher is a better activity than repetition ‘after’

the teacher when teaching new words. Vocaliza-

tion ‘with’ the teacher can be regarded as better

practice than repetition ‘after’ the teacher in lan-

guage classrooms, because students will memo-

rize target words (i.e. phonological lexicon) more

easily when they repeat ‘with’ their teacher.

The data were obtained from an experiment

in the retrieval of ‘new’ and ‘old’ words after

an audio presentation and its vocal repetition.

The difference of discrimination index (d′) cal-

culated from the ROC curves is regarded as the

difference of confidence in participants’ ability to

distinguish ‘old’ words (that the participants re-

peated) from ‘new’ words (that they did not re-

peat). The sensitivity of the participants is also

regarded as being indicated by the accuracy of

their responses in the present study. It must be

admitted that the number of audio presentations differs between repetitions ‘after’ and ‘with’

a model. However, the framework of this exper-

iment may establish a new research method for

the study of the role of repeating in language

learning.

References

Aoyama, K., Flege, J. E., Guion, S. G., Akahane-

Yamada, R., & Yamada, T. (2004). “Per-

ceived phonetic dissimilarity and L2 speech

learning: The case of Japanese /r/ and En-

glish /l/ and /r/.” Journal of Phonetics, 32,

233–250.

Celce-Murcia, M., Brinton, D. M., & Goodwin,

J. M. (1996). Teaching Pronunciation. Cam-

bridge: Cambridge University Press.

Dalton, C., & Seidlhofer, B. (1994). Pronuncia-

tion. Oxford: Oxford University Press.

Doff, A. (1988). Teach English. Cambridge:

Cambridge University Press.

Fujitsu Ltd. (1998). Speech API for Microsoft

Windows. Tokyo: Fujitsu Ltd.

Green, D. M., & Swets, J. W. (1966). Signal de-

tection theory and psychophysics. New York:

Wiley.

Hayashi, S. (1976). New nonsense syllable list.

Nagoya: Tokai University Press.

Howatt, A. P. R., & Widdowson, H. G. (2004). A History of English Language Teaching (2nd ed.). Oxford: Oxford University Press.

Kelly, G. (2000). How to Teach Pronunciation. Harlow: Pearson Education.

Kenworthy, J. (1987). Teaching English Pronunciation. London: Longman.

Lindsay, P. H., & Norman, D. A. (1977). Human

information processing. New York: Academic

Press.

Massaro, D. W. (1975). Experimental psychology

and information processing. Chicago: Rand

McNally College Publishing.

Miwa, J., Sasaki, H., & Tanno, K. (2000).

“Japanese Spoken Language Learning Sys-


tem Using Java Information Technology.”

Sixth International Conference on Spoken

Language Processing (ICSLP 2000), Vol.III,

578–581.

Nagai, K. (2007). “Differences of pronunciation practices: A study of ‘Repeat with me’ and ‘Repeat after me’.” Journal of the Phonetic Society of Japan, 11, 79–93.

Nagai, K. (2009). “Effect of pronunciation practices on the acquisition of artificial languages.” Studies in phonetics and speech communication. Kinki Society for Phonetics. 6, 225–233.

Shogakukan (1988). Kokugo Dai Jiten Dictio-

nary (Revised edition).

Stern, H. H. (1992). Issues and Options in Lan-

guage Teaching. Oxford: Oxford University

Press.

Wickens, T. D. (2002). Elementary Signal De-

tection Theory. Oxford: Oxford University

Press.

(Received 9 Sep. 2010)

(Accepted 12 April 2011)

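Katsumi Nagai (regular member). 1996: M.Sc. in Applied Linguistics (University of Edinburgh). 1999: Ph.D. in Language and Culture (Osaka University). After graduating from the Faculty of Humanities, Yamaguchi University, in 1987, he worked as a teacher at Osaka Prefectural Korigaoka High School and Osaka Prefectural Nogei High School and as an associate professor at Tsuyama National College of Technology, and is currently an associate professor at the Center for University Education Development, Kagawa University. He aims to propose more effective methods of pronunciation practice by comparing the speech of learners of Japanese, English, and Gaelic with that of native speakers. He is a member of the Phonetic Society of Japan, the Acoustical Society of Japan, the Japanese Psychological Association, the Acoustical Society of America, and other societies.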
