Acoustic Analysis of Voice: A Tutorial - Western Michigan …homepages.wmich.edu/~hillenbr/Papers/Hillenbrand 201… · · 2013-07-12Acoustic Analysis of Voice: A Tutorial ... with

2

Acoustic Analysis of Voice: A Tutorial

James M. Hillenbrand

Department of Speech Pathology and Audiology

Western Michigan University

Kalamazoo, MI 49008

3

Abstract

This tutorial reviews acoustic methods that have been used to characterize vocal function.

The most persuasive argument for the use of acoustic measures is that all of the information used

by listeners to make judgments about speech is to be found in the acoustic signal. Acoustic

methods have been used clinically to differentiate normal from abnormal voices, to aid in

differential diagnosis, to evaluate the relative effectiveness of different treatment approaches, and

to track progress in voice therapy. The measures discussed here focus on quantifying the degree of

periodicity, the shape of the spectrum, and the range of vocal intensity.

Introduction

Fundamental to the logic underlying many acoustic measures of voice is the simple

observation that the voice is produced to please the human ear. Listeners make both linguistic and

aesthetic judgments about voice signals, and essentially all of the information that controls these

judgments is embedded within the acoustic signal. It is also the case that the disturbances in glottal

vibratory patterns that are associated with laryngeal pathologies will be reflected in changes in the

acoustic signal that is radiated from the lips. Also, as has been noted by many investigators, the

voice signal is perhaps the most noninvasive and easily obtained measure of vocal performance.

As straightforward as this logic may seem, uncovering the fundamental relationships between

acoustic properties and perceived vocal qualities, or the relationships between the characteristics

of the acoustic signal and the underlying laryngeal vibratory patterns that shape the sound wave,

has not proven to be an especially easy problem.

Goals of Acoustic Analysis

The major clinical applications of acoustic analysis tend to fall into three broad categories:

(1) screening, (2) diagnostic support, (3) evaluation of the relative effectiveness of different

treatment approaches, and (4) assessment of progress throughout the course of treatment (see

Laver Hiller & Beck (1992) for discussion and a slightly expanded list of applications). Screening

applications are often directed at rapid and inexpensive early detection of serious conditions such

4

as laryngeal carcinoma, although there have also many attempts to develop screening methods to

detect the presence versus absence of laryngeal pathologies as a general category. Many

techniques have been described that are capable of discriminating normal from abnormal voices

with reasonably good accuracy (e.g. Hadjitodorov & Mitev, 2002), although the usefulness of

automating this kind of decision making has been questioned by some investigators (e.g., Hirano et

al., 1988). The closely related goal of diagnostic support involves the use acoustic measurements

of voice to differentiate among different pathological conditions. The clinical utility of

discriminating among underlying pathological conditions is quite clear; however, this is a

considerably more difficult problem than separating normal from abnormal voices, and success in

this area has been limited. The use of acoustic measurements to provide objective indices of

progress throughout the course of a treatment program, or to compare the relative effectiveness of

alternative treatment strategies, has received a good deal of attention in the literature. This is

perhaps the single most promising application of acoustic measurements of vocal function.

Signals for Analysis

Airborne Microphone Signal

The simplest and most commonly used input signal is an unprocessed sound wave transduced by a

conventional airborne microphone. The chief advantage of this approach is that this signal is, for

all practical purposes, equivalent to the one that reaches the ear. The principal drawback of this

simple signal is that the acoustic properties of the waveform reflect contributions not only of the

laryngeal source function that is of greatest interest but also the resonance characteristics of the

vocal tract.

Contact Microphone Signal

A commonly used alternative is to record the voice signal using a contact microphone placed on

the throat. The resulting signal is generally claimed to be less contaminated by vocal tract events.

A primary effect of this recording technique is to strongly emphasize the lower frequency

components of the voice signal at the expense of upper harmonics in formant regions. This often

5

simplifies technical problems associated with measuring features such as instantaneous

fundamental frequency (Schoentgen, 1989; Hirano, 1981).

Glottal Inverse Filtering

A more direct approach to removing the effects of vocal tract filtering involves the use of a

pneumotachograph mask and inverse filter to recover an approximation to the glottal volume

velocity function (Rothenberg, 1973). The basic idea behind this technique is that if the formant

frequencies and bandwidths are known, an inverse filter can be designed to significantly reduce the

effects of vocal tract filtering. An approximation to the glottal flow signal is derived by passing the

oral flow signal through the inverse filter. There is, however, a certain amount of error involved in

estimating formant parameters, and it is not always clear how precisely the glottal flow signal is

approximated. The middle panel of Figure 1 shows an example of a glottal flow signal. (See

Sondhi (1975) and Javkin, Antonanzas-Barroso & Maddieson (1987) for other approaches to the

problem of recovering the glottal flow signal.)

LPC Residue Signal

A very different approach to inverse filtering is based on a signal analysis method called linear

predictive coding (LPC). If certain simplifying assumptions are made, LPC can be used to separate

the source and filter components of a speech signal. When applied to voice signals, LPC is based

on the assumption that voiced speech can be modeled as the output of a time-varying filter that is

excited by quasi-periodic pulses. The filter represents the combined effects of vocal-tract filtering,

lip radiation, and shape of the glottal pulse. LPC uses time-series analysis to produce a sequence of

estimates of the frequency response curve of this time-varying filter. The filter shapes that are

recovered can then be inverted to produce an estimate of the pulse train that is implicitly assumed

to excite the filter. The top and bottom panels of Figure 1 show a speech signal along with the

"LPC residue signal," the source waveform derived through the application of an LPC inverse

filter. Because of the simplifying assumptions that are made when LPC is applied to speech

modeling, the residue signal does not resemble a naturally occurring glottal source waveform.

Whether this is an advantage or a drawback depends on the goals of the analysis technique. For

6

example, since the recovered source signal tends to show relatively sharp pulses at the onsets of

individual pitch periods, the LPC residue can simplify the measurement of acoustic features that

depend on detecting the onsets of individual pitch pulses (see Davis, 1976; 1981).

Speech Material

The most commonly used utterance for acoustic voice analysis is a monotone sustained vowel. The

use of sustained vowels makes it much easier to obtain acoustically based estimates of vocal

performance that are comparable across a population of talkers who may display a broad range of

speaking styles in material such as read text or conversational speech. It is also a much simpler

matter to separate glottal source parameters from those associated with the vocal tract when a

relatively fixed vocal tract posture is maintained. The most important limitation of sustained

vowels is that these simple utterances are sometimes not sufficiently revealing of the underlying

pathological condition. As Schoentgen (1989) noted, speakers are often able to compensate for

pathological conditions during constant-pitch phonation but may be unable to do so during more

natural utterances involving dynamic adjustments of both laryngeal and vocal tract structures.

Although the issue has received little systematic study, several investigators have noted that there

are many pathological conditions which tend to produce vocal quality disturbances that are

conditioned by the type of speech task (e.g., Hirano, 1981; Sapienza, Walton & Murry, 2000). For

example, in a comparison of isolated vowels and connected speech, Klingholtz (1990) reported

two findings that are of considerable interest. First, measurements of spectral noise from

dysphonic speakers' isolated vowels accounted for less than half of the variance in spectral noise

measures from connected speech material spoken by the same talkers. This finding suggests that

vocal performance measured from isolated vowels is not an accurate gauge of a speaker's

performance in connected speech. Second, the separation of normal from pathological speakers

based on the noise measures was four times more accurate when the measures were made from

connected speech than isolated vowels (see also Hillenbrand, Cleveland & Erickson, 1994;

Hillenbrand & Houde, 1996).

7

Acoustic Measures of Vocal Function

An enormous variety of acoustic methods for assessing vocal function have been proposed.

In this brief review it will be possible to provide a limited discussion of some of the more

commonly used procedures. For the purposes of organizing the discussion, the techniques will be

grouped somewhat arbitrarily into: (1) periodicity measures, (2) measures of spectral shape, and

(3) intensity measures.

Periodicity Measures

As von Leden, Moore and Timke (1960) noted, "... the commonest observation in

pathologic conditions is a strong tendency for frequent and rapid changes in the regularity of the

vibratory pattern" (p. 34). It is therefore not surprising that the bulk of work on acoustic analysis

has focused on the measurement of signal periodicity. While these departures from perfect

periodicity may be observed directly with high-speed film or videostroboscopy, a wide range of

instrumentally simple, noninvasive methods are available to measure the acoustic consequences of

these disturbances in periodicity.

Perturbation Analysis

Pitch Perturbation

There is no single area of acoustic voice analysis that has received more attention or

generated more debate than the measurement of vocal perturbation. The point of departure for this

work is Lieberman's observation – first with high-speed film (Lieberman, 1961) and later with

acoustic measurements (Lieberman, 1963) – that pathological voices tend to show unusually large

cycle-to-cycle fluctuations in the fundamental period. The phenomenon of cycle-to-cycle

fluctuations in the fundamental period is referred to variously as pitch perturbation, fundamental

frequency perturbation, or vocal jitter.

Despite the simplicity of the underlying concept, a dizzying array of methods have been

used to quantify pitch perturbation. The simplest of these is mean jitter, defined as the average

absolute difference in fundamental period between adjacent pitch pulses. Since there is a strong

correlation between mean jitter and the average fundamental period (e.g., Lieberman, 1961;

8

Koike, 1973), it is common to express jitter as a percentage of the average fundamental period.

Percent jitter is defined as mean jitter divided by the mean period, multiplied by 100. However,

there are some rather clear indications that this simple method does not, in fact, adequately

normalize for differences in average fundamental frequency (see Horii (1979) and Orlikoff and

Baken (1990) for a discussion). Another approach that is in common use involves calculating a

sequence of differences between the instantaneous fundamental period and a running average of

adjacent periods (e.g., Koike, 1973). The purpose of the running average is to minimize the effects

of slowly varying changes in fundamental frequency and emphasize the quasi-random, rapid

fluctuations initially observed by Lieberman. Numerous other methods have been used to calculate

pitch perturbation (see Baken, 1987; Laver et al., 1992; Pinto and Titze, 1990, for reviews), and

this wide variation in computational methods is at least partly responsible for the limited

agreement across laboratories on a host of issues related to perturbation measures.

Amplitude Perturbation

Amplitude perturbation, or vocal shimmer, is defined as cycle-to-cycle fluctuation in the

amplitudes of adjacent pitch pulses. As with jitter, a wide variety of calculation methods have been

used. The simplest is mean shimmer, which is simply the average absolute difference in amplitude

between adjacent pitch pulses. Methods based running averages are also in widespread use (e.g.,

Takahashi & Koike, 1975).

Clinical Relevance of Perturbation Measures

There is no clear consensus on the clinical relevance of perturbation measures. Many

studies have confirmed Lieberman's report that perturbation measures distinguish normal from

pathological voices with reasonable accuracy, although even on this relatively simple point there is

disagreement (e.g., Hecker &Kruel, 1971; Ludlow, Basshich, Connor, Coulter & Lee, 1987;

Davis, 1981; Zyski, Bull, McDonald & Johns, 1984). Although some investigators have reported

positive findings, there is no consistent evidence across studies that perturbation measures – alone

or in combination with other acoustic measures – can sort patients by diagnostic category with the

kind of precision that would be needed in clinical settings. In terms of perceptual correlates, there

9

is good evidence from several synthesis studies showing that increased jitter is associated with a

sensation of roughness; the evidence regarding the perceptual correlates of shimmer is mixed

(Wendahl, 1966; Heiberger & Horii, 1982; Hillenbrand, 1988). Due, in part, to a variety of

technical problems in measuring perturbation from natural speech signals (discussed below), there

is no consistent evidence for a straightforward relationship between perturbation values measured

from natural speech and perceptual dimensions such as roughness, hoarseness, or overall severity

of dysphonia prompting some investigators to question the utility of perturbation measures for

characterizing voice quality (Kreiman & Gerratt (2005).

Methodological Concerns

Issues related to the measurement of jitter and shimmer have been examined in minute

detail in a large number of methodological studies. The majority of the technical limitations that

have been reported in these studies have their origin in two fundamental facts about perturbation

measures: (1) the phenomena being measured are quite small (e.g., percent jitter is generally below

about 1% in normal voices and seldom exceeds more than a few percent in disordered voices), and

(2) the technique requires very precise determination of the onsets and offsets of individual pitch

pulses. The net result is that even relatively small errors in fundamental frequency measurement

can have a large effect on perturbation measures, and seemingly minor differences in technique

from one measurement system to another can result in poor reliability. Further, uncertainty

regarding the precise locations of pitch pulses is especially great for the marginally periodic voices

that are of greatest interest in most clinical settings (Rabinov, Kreiman, Gerratt & Bielamowicz,

1995). These technical problems, in combination with a considerable diversity in methods across

systems, have resulted in the disappointing levels of reliability and validity that have been reported

in several studies (e.g., Rabinov et al., 1995; Bielamowicz, Kreiman, Gerratt, Dauer & Berke,

1996; Karnell, Scherer & Fisher, 1991; Karnell, Hall & Landahl, 1995; Hillenbrand, 1987). For

example, Bielamowicz et al. made perturbation measures from voice samples recorded from both

normal and dysphonic speakers. The measures were obtained with three different commercial

systems and a semiautomatic system in which errors in locating pitch-pulse boundaries could be

10

corrected by hand. For jitter measures, some pairs of measurement systems agreed reasonably

well, with correlations of approximately 0.80. However, other pairs of systems agreed very poorly

with correlations as low as 0.23. Reliability for shimmer measures was better, with most

correlations 0.80-0.90 range.

Other Periodicity Measures

Many other methods have been used to measure the degree of periodicity in a voice

waveform. Described below is a sampling of the some of the techniques that have developed.

Harmonics-to-Noise Ratio (HNR)

A measure called harmonics-to-noise ratio (HNR; Yumoto, Gould, & Baer, 1982) begins

with a sustained vowel that has been delineated into individual pitch pulses. The numerator in the

HNR calculation is simply the RMS energy in the average of all individual pitch pulses. (For a

perfectly periodic signal, the RMS energy in the average pitch pulse will be equal to the total RMS

energy in the original waveform since the waveform would consist of the exact repetition of the

average pitch pulse.) The noise component is estimated by subtracting this average pitch pulse

from each individual pitch pulse in the original signal. The RMS energy in this difference

waveform serves as the numerator in the HNR calculation. Studies using this technique have

reported large differences in HNR between normal and disordered talkers, as well as strong

correlations with subjective ratings of hoarseness severity (Yumoto et al., 1982; Yumoto, Sasaki &

Okamura 1984). Since this technique, like the perturbation measures discussed above, depends on

identifying the precise locations of the boundaries of individual pitch pulses, HNR suffers from

many of the reliability and validity problems described in the previous section (e.g., Hillenbrand,

1987).

Spectral Methods

Many techniques have been developed that measure signal periodicity in the frequency

domain. In a perfectly periodic signal whose spectrum is measured with an ideal analyzer, all of

the spectral energy will be found at harmonically related frequencies. With increasing departures

11

from perfect periodicity, the ratio of harmonic energy to energy at non-harmonic frequencies will

decrease. Panel a of Figure 2 shows the spectrum of a normally phonated vowel with a well

defined harmonic structure. The spectrum of the moderately breathy voice in panel b shows less

prominent harmonics and greater energy in the regions between harmonics. Several periodicity

metrics have been developed around the straightforward principle of measuring the extent to

which signal energy is concentrated in the harmonic regions of the spectrum. For example,

Kasuya, Ogawa, Mashima, and Ebihara (1986) developed a spectral periodicity measure called

Normalized Noise Energy (NNE). NNE measures correctly identified all T2-T4 glottic cancers as

abnormal. However, the technique missed roughly a fourth of T1 cancers, and there was no

evidence that NNE measures could separate the glottic cancer group from other laryngeal

pathologies (see also Hirano et al., 1988; Kojima, Gould, Lambaise, & Isshiki, 1980; Emanuel &

Sansone, 1969; Deal & Emanuel, 1978; Lively & Emanuel, 1970; Cox, Ito & Morrison, 1989;

Klingholtz, 1990).

Cepstral Methods

Another way to capture the degree of harmonic organization in a spectrum is to compute a

spectrum of the log power spectrum, known as the cepstrum. For a periodic signal with a well

defined harmonic structure, the second spectrum will show a strong component corresponding to

the regularity of the harmonic peaks. With increasing departures from perfect periodicity, the

prominence this cepstral peak will be reduced. Hillenbrand et al. (1994) developed a metric called

Cepstral Peak Prominence (CPP), which was simply an amplitude-normalized measure of the

amplitude of the cepstral peak corresponding to the harmonic regularity (see panels c and d of

Figure 2). CPP was shown to correlate strongly with listener ratings of breathiness for both

sustained vowels and continuous speech (see also Hillenbrand et al., 1996; de Krom, 1993). In

addition, multi-factor acoustic models in which CPP measures figure prominently have been found

to accurately predict listener ratings of severity of dysphonia (Awan, Roy & Dromey, 2009),

differentiate perceptually distinct pathologic voice types (Awan & Roy, 2005), and classify voices

by diagnostic category (Callan, Kent, Roy & Tasko, 1999).

12

Autocorrelation

Autocorrelation analysis computes the correlation between a signal and a delayed copy of

the same signal at delays corresponding to the minimum and maximum expected fundamental

periods. The basic idea behind the technique is that the correlation between the signal and the

delayed copy will reach a maximum when the copy is offset by exactly one fundamental period.

The delay at which the peak occurs serves as an estimate of the fundamental period, and the

prominence of the peak in the autocorrelation function can serve as a measure of the degree of

signal periodicity; i.e., highly periodic signals should show more prominent peaks than marginally

periodic signals (see Figure 3). Davis (1976) developed a measure called Pitch Amplitude (PA)

that was based on autocorrelation analysis of the LPC residue signal. Significant differences in PA

measures were found between normal and pathological speakers, although the overlap between the

groups was sufficiently great that a screening tool based on this measure does not seem feasible. A

closely related measure was applied to ordinary voice waveforms by Hillenbrand et al. (1994;

1996) and was found to account for a large percentage of the variance in breathiness ratings from

sustained vowels and connected speech samples produced by normal and pathological speakers.

Measures of Spectral Shape

Spectral shape measures have been directed primarily at quantifying the level of aspiration

noise that is associated with insufficient glottal closure. Since aspiration noise is relatively strong

in the higher frequencies, a voice signal produced with incomplete glottal closure might be

expected to show a greater ratio of high- to low-frequency energy than normally phonated voice

signals. Figure 4 shows long-term average spectra (Fourier spectra calculated every 5 ms and

averaged over the full utterance) computed from short sentences read by five normal speakers and

five speakers with voice disorders that are characterized by moderate to severe breathiness. Large

differences can be seen in the levels of high- versus low-frequency energy. Frokjaer-Jensen and

Prytz (1976) developed a straightforward measure of spectral shape that consisted of ratio of

energy above 1 kHz to energy below 1 kHz from long-term average spectra of 45-second samples

13

of read text. They found significant reductions in high frequency energy following treatment.

Conceptually similar measures described by Klich (1982), Fukazawa El-Assuooty and Honjo

(1988), Shrivastav and colleagues (Shrivastav & Sapienza, 2003; Shrivastav & Camacho, 2010)

and Hillenbrand et al. (1996) were found to correlate well with listener ratings of breathiness (but

see Klatt & Klatt, 1990, and Hillenbrand, 1988, for negative findings). Two significant features of

this family of spectral shape measures are worth noting. First, the methods are technically very

straightforward and, although it has not been studied, it is difficult to envision reliability problems

of the kind that have been reported for measures such as pitch and amplitude perturbation. Second,

the measures are quite suitable for read texts as well as sustained vowels. In fact, recent results

indicate that spectral tilt measures are more closely associated with perceived voice quality for

connected speech than sustained vowels (Hillenbrand et al., 1996).

Intensity Measures

Measures of vocal intensity have received rather limited attention in comparison with

measures of periodicity and spectral shape. Although reduced intensity is associated with many

laryngeal pathologies, the lack of standardized procedures for measuring intensity and the

difficulty of controlling vocal effort complicate the interpretation of these measurements. A

reduction in the dynamic range of phonation intensity is frequently associated with laryngeal

pathology. Coleman, Mabis and Hinson (1977) reported an intensity range of approximately 50 dB

in the middle of the pitch range for a group of young normal subjects, although much smaller

normative values for intensity range were reported by Gramming (1988) and Colton, Casper and

Leonard (2011).

Vocal intensity range is known to depend on fundamental frequency (e.g., Wolfe, Stanley

& Sette, 1935). Consequently, a complete picture of a speaker's dynamic range for intensity

requires sampling the minimum and maximum vocal intensity for a broad range of fundamental

frequencies. A graphic presentation of the minimum and maximum phonation intensity for

different fundamental frequencies is known variously as a fundamental frequency-intensity

14

profile, a voice range profile (VRP), a phonetogram, or a phonogram. Several techniques have

been used to elicit VRPs (see Coleman, 1993). In the method described by Coleman et al. (1977),

the minimum and maximum phonation intensities are measured every 10% of the total

fundamental frequency range. An averaged VRP for a group of ten normal men from Coleman et

al. (1977) is shown in Figure 5. The compression of the dynamic range at the low and high ends of

the fundamental frequency range, and the upward trend in minimum phonation intensities at the

high end of the fundamental frequency range, have been observed in other studies (e.g.,

Gramming, 1988).

Conclusions

A great deal of effort has gone into the development of acoustic measures that can be used

to quantify various aspects of vocal function. These efforts have been more successful in some

areas than others. In particular, efforts to infer underlying pathological conditions based on the

acoustic signal have been limited in part by our incomplete understanding of some fundamental

relationships between physiology and acoustics, and in part by the inherently complex,

one-to-many relationships that exist between acoustic features and underlying pathological

conditions. For example, there is no single pathological condition that produces the turbulence

noise that is captured by many of the measures that have developed, nor is there a single laryngeal

pathology that can produce aperiodicities in the laryngeal vibratory pattern that other acoustic

measures have attempted to quantify. This should not lead us to conclude that measures of the type

described here are necessarily of limited value in the assessment of vocal function. Medicine is

replete with examples of diagnostic measures whose outcomes are not uniquely associated with

any individual pathological condition, but which can be of considerable significance in

combination with other diagnostic information. Commonplace examples include leukocyte

counts, blood pressure, and body temperature. It is doubtful that even the most enthusiastic

proponent of acoustic measures of voice would place any of the acoustic methods alongside these

everyday diagnostic workhorses, but the best of the acoustic measures can contribute significantly

15

to the assessment of vocal function.

16

References

Awan, S., and Roy, N. (2005). Acoustic prediction of voice type in women with functional

dysphonia. Journal of Voice, 19, 268-282.

Awan, S., Roy, N., & Dromey, C. (2009). Estimating dysphonia severity in continuous speech:

Application of a multi-parameter spectral/cepstral model. Clinical Linguistics and Phonetics, 23,

825-841.

Baken, R.J. (1987). Clinical Measurement of Speech and Voice. Boston: College Hill.

Bielamowicz, S., Kreiman, J., Gerratt, B.R., Dauer, M.S., and Berke, G.S. (1996). Comparison of

voice analysis systems for perturbation measurement. Journal of Speech and Hearing Research,

39, 126-134.

Callan, D.E., Kent, R.D., Roy, N., & Tasko, S.M. (1999). Self-organizing map for the

classification of normal and disordered voices. Journal of Speech, Language and Hearing

Research, 42, 355-366.

Coleman, R.F. (1993). Sources of variation in phonetograms. Journal of Voice. 7, 1-14.

Coleman, R.F., Mabis, J.H., and Hinson, J.K. (1977). Fundamental frequency-sound pressure level

profiles of adult male and female voices. Journal of Speech and Hearing Research, 20, 197-204.

Colton, R., Casper, J.K., & Leonard, R. (2011). Understanding Voice Problems: A Physiological

Perspective for Diagnosis and Management. 4th

Ed. Baltimore: Lippincott Williams & Wilkins.

Cox, N.B., Ito, M., and Morrison, M. (1989). Technical considerations in computation of spectral

harmonics-to-noise ratios for sustained vowels. Journal of Speech and Hearing Research, 32,

203-218.

Davis, S.B. (1976). Computer evaluation of laryngeal pathology based on inverse filtering of

speech. SCRL Monograph 13, Speech Communication Research Laboratory, Santa Barbara, CA.

Davis, S.B. (1981). Acoustic characteristics of normal and pathological voices. ASHA Reports, 11,

97-115.

de Krom, G. (1993). A cepstrum-based technique for determining a harmonic-to-noise ratio in

speech signals. Journal of Speech and Hearing Research, 36, 254-266.

Deal, R., and Emanuel, F.W. (1978). Some waveform and spectral features of vowel roughness.

Journal of Speech and Hearing Research, 21, 250-264.

Emanuel, F.W., and Sansone, F. (1969). Some spectral features of "normal" and simulated "rough"

vowels. Folia Phoniatrica, 21, 401-415.

17

Frokjaer-Jensen, B. and Prytz, S. (1976). Registration of voice quality. Bruel and Kjaer Technical

Review, 3, 3-17.

Fukazawa, T., El-Assuooty, A., and Honjo, I. (1988). A new index for evaluation of the turbulent

noise in pathological voice. Journal of the Acoustical Society of America, 83, 1189-1193.

Gramming, P. (1988). The phonetogram: An experimental and clinical study. Doctoral

dissertation, University of Lund, Malmo, Sweden.

Hadjitodorov, S. & Mitev, P. (2002). A computer system for acoustic analysis of pathologic voices

and laryngeal diseases screening. Medical Engineering and Physics, 24, 419-429.

Hecker, M.H.L., and Kreul, E.J. (1971). Descriptions of the speech of patients with cancer of the

vocal folds. Part I: Measures of fundamental frequency. Journal of the Acoustical Society of

America, 49, 1275-1282.

Heiberger, V.L., and Horii, Y. (1982). Jitter and shimmer in sustained phonation. In N.J. Lass

(Ed.), Speech and Language: Advances in Basic Research and Practice, Vol. 7. (pp. 299-332).

New York: Academic Press.

Hillenbrand, J.M. (1987). A methodological study of perturbation and additive noise in

synthetically generated voice signals. Journal of Speech and Hearing Research, 30, 448-461.

Hillenbrand, J.M. (1988). Perception of aperiodicities in synthetically generated voices. Journal of

the Acoustical Society of America, 83, 2361-2371.

Hillenbrand, J.M., Cleveland, R.A., and Erickson, R.L. (1994). Acoustic correlates of breathy

vocal quality. Journal of Speech and Hearing Research, 37, 769-778.

Hillenbrand, J.M., and Houde, R.A. (1996). Acoustic correlates of breathy vocal quality:

Dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39,

311-321.

Hirano, M. (1981). Clinical Examination of Voice. New York: Springer-Verlag.

Hirano, M., Hibi, S., Yoshida, T., Hirade, Y., Kasuya, H., and Kikuchi, Y. (1988). Acoustic

analysis of pathological voice. Acta Otolaryngologica, 105, 432-438.

Horii, Y. (1979). Fundamental frequency perturbation observed in sustained phonation. Journal of

Speech and Hearing Research, 22, 5-19.

Javkin, H., Antonanzas-Barroso, and Maddieson, I. (1987). Digital inverse filtering for linguistic

research. Journal of Speech and Hearing Research, 30, 122-129.

Karnell, M.P., Hall, K.D., and Landahl, K.L. (1995). Comparison of fundamental frequency and

perturbation measurements among three analysis systems. Journal of Voice, 9, 383-393.

18

Karnell, M.P., Scherer, R.C., and Fischer, L.B. (1991). Comparison of acoustic voice perturbation

measures among three independent voice laboratories. Journal of Speech and Hearing Research,

34, 781-795.

Kasuya, H., Ogawa, S., Mashima, K., & Ebihara, S.(1986). Normalized noise energy as an

acoustic measure to evaluate pathologic voice. Journal of the Acoustical Society of America, 80,

1329-1334.

Klatt, D.H., and Klatt L.C. (1990). Analysis, synthesis, and perception of voice quality variations

among female and male talkers. Journal of the Acoustical Society of America, 87, 820-857.

Klich, R.J. (1982). Relationships of vowel characteristics to listener ratings of breathiness. Journal

of Speech and Hearing Research, 25, 574-580.

Klingholtz, F. (1990). Acoustic recognition of voice disorders: A comparative study of running

speech versus sustained vowels. Journal of the Acoustical Society of America, 87, 2218-2224.

Koike, Y. (1973). Application of some acoustic measures for the evaluation of laryngeal

dysfunction. Studia Phonologica, 7, 17-23.

Kojima, H., Gould, W.J., Lambaise, A., and Isshiki, N. (1980). Computer analysis of hoarseness.

Acta Otolaryngologica, 89, 547-554.

Kreiman, J., and Gerratt, B. (2005). Perception of aperiodicity in pathological voice. Journal of the

Acoustical Society of America, 117, 2201–2211.

Laver, J., Hiller, S., and Beck, J.M. (1992). Acoustic waveform perturbations and voice disorders.

Journal of Voice, 6, 115-126.

Lieberman, P. (1961). Perturbations in vocal pitch. Journal of the Acoustical Society of America,

35, 344-353.

Lieberman, P. (1963). Some acoustic measures of the fundamental periodicity of normal and

pathologic larynges. Journal of the Acoustical Society of America, 35, 344-353.

Lively, M.A., and Emanuel, F.W. (1970). Spectral noise levels and roughness severity ratings for

normal and simulated rough vowels produced by adult females, Journal of Speech and Hearing

Research, 13, 503-517.

Ludlow, C.L., Bassich, C., Connor, N.P., Coulter, D.C., and Lee, Y.J. (1987). The validity of using

phonatory jitter and shimmer to detect laryngeal pathology. In T. Baer, C. Sasaki, and K. Harris

(Eds.), Laryngeal Function in Phonation and Respiration. (pp. 492-508). Boston: College Hill.

Orlikoff, R.F., and Baken, R.J. (1990). Consideration of the relationship between fundamental

frequency of phonation and vocal jitter. Folia Phoniatrica, 42, 31-40.

19

Pinto, N.B., and Titze, I.R. (1990). Unification of perturbation measures in speech signals. Journal

of the Acoustical Society of America, 87, 1278-1289.

Rabinov, C.R., Kreiman, J., Gerratt, B.R., and Bielamowicz, S. (1995). Comparing reliability of

perceptual ratings of roughness and acoustic measures of jitter. Journal of Speech and Hearing

Research, 38, 26-32.

Rothenberg, M. (1973). A new inverse-filtering technique for deriving the glottal airflow

waverorm during voicing. Journal of the Acoustical Society of America, 53, 1632-1645.

Sapienza, C.M., Walton, S., & Murry, T. (2000). Adductor spasmodic dysphonia and muscular

tension dysphonia: Acoustic analysis of sustained phonation and reading. Journal of Voice, 14,

502-520.

Shrivastav, R., & Camacho, A. (2010). A computational model to predict changes in breathiness

resulting from variations in aspiration noise level. Journal of Voice, 24, 395-405.

Shrivastav, R., & Sapienza, C. (2003). Objective measures of breathy voice quality obtained using

an auditory model. Journal of the Acoustical Society of America, 114, 2217–2224.

Schoentgen, J. (1989). Jitter in sustained vowels and isolated sentences produced by dysphonic

speakers. Speech Communication, 8, 61-79.

Smith, B., Weinberg, B., Feth, L., and Horii, Y. (1978). Vocal roughness and jitter characteristics

of vowels produced by esophageal speakers. Journal of Speech and Hearing Research, 21,

240-249.

Sondhi, M.M. (1975). Measurement of the glottal waveform. Journal of the Acoustical Society of

America, 57, 228-232.

Takahashi, H., and Koike, Y. (1975). Some perceptual dimensions and acoustical correlates of

pathologic voices. Acta Otolaryngologica, Suppl. 338, 1-24.

Titze, I.R. (1994). Toward standards in acoustic analysis of voice. Journal of Voice, 8, 1-7.

von Leden, H., Moore, P., and Timke, R. (1960). Laryngeal vibrations: Measurements of the

glottal wave. Part III, The pathologic larynx. Archives of Otolaryngology, 71, 16-35.

Wendahl, R.W. (1966). Laryngeal analog synthesis of jitter and shimmer auditory parameters of

harshness. Folia Phoniatrica, 18, 98-108.

Wolfe, S., Stanley, W., and Sette, W. (1935). Quantitative studies on the singing voice. Journal of

the Acoustical Society of America, 6, 255-266.

Yumoto, E., Gould, W.J., and Baer, T. (1982). Harmonics-to-noise ratio as an index of the degree

20

of hoarseness. Journal of the Acoustical Society of America, 71, 1544-1550.

Yumoto, E., Sasaki, Y., and Okamura, H. (1984). Harmonics-to- noise ratio and psychophysical

measurement of the degree of hoarseness. Journal of Speech and Hearing Research, 27, 2-6.

Zyski, B., Bull, G., McDonald, W., and Johns, M. (1984). Perturbation analysis of normal and

pathologic larynges. Folia Phoniatrica, 36, 190-198.

21 �Figure Captions

Figure 1. An acoustic signal, a glottal airflow signal estimated by glottal inverse filtering, and an

LPC residue signal derived by applying an LPC inverse filter.

Figure 2. Top panels: Fourier spectra for a nonbreathy (panel a) and a moderately breathy (panel

b) voice signal. Bottom panels: cepstra of a nonbreathy (panel c) and a moderately

breathy (panel d) voice signal.

Figure 3. Autocorrelation functions for normally phonated and breathy voice signals.

Figure 4. Long-term average spectra for five utterances spoken by nonpathological talkers and

by five talkers with voice disorders characterized by breathiness. Signals from

Hillenbrand et al. (1996).

Figure 5. Voice range profile based on an average of ten normal men (from Coleman et al.,

1977).

22

Acknowledgments

I am grateful to Michael Clark and Robert Erickson for comments on a previous draft.

Preparation of this chapter was supported by NIH grant R01-DC01661.

23

Figure 1

24

Figure 2

25

Figure 3

26

Figure 4

27

Documents

Acoustic Analysis of Voice: A Tutorial - Western Michigan …homepages.wmich.edu/~hillenbr/Papers/Hillenbrand 201… · · 2013-07-12Acoustic Analysis of Voice: A Tutorial ... with