
High-speed image analysis reveals chaotic vibratory behaviors of pathological vocal folds


Chaos, Solitons & Fractals 44 (2011) 169–177

Contents lists available at ScienceDirect

Chaos, Solitons & Fractals: Nonlinear Science, and Nonequilibrium and Complex Phenomena

journal homepage: www.elsevier.com/locate/chaos


Yu Zhang a,⇑, Jun Shao b, Christopher R. Krausert c, Sai Zhang a, Jack J. Jiang b,c

a Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, Xiamen University, Xiamen Fujian 361005, China
b Shanghai EENT Hospital of Fudan University, Shanghai, China
c Department of Surgery, Division of Otolaryngology – Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI 53792-7375, USA

Article info

Article history: Received 1 November 2010; Accepted 23 January 2011; Available online 17 February 2011

0960-0779/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.chaos.2011.01.007

⇑ Corresponding author. E-mail address: [email protected] (Y. Zhang)

Abstract

Laryngeal pathology is usually associated with irregular dynamics of laryngeal activity. High-speed imaging facilitates direct observation and measurement of vocal fold vibrations. However, the chaotic dynamic characteristics of aperiodic high-speed image data have not been investigated in previous studies. In this paper, we apply nonlinear dynamic analysis and traditional perturbation methods to quantify high-speed image data from normal subjects and patients with various laryngeal pathologies, including vocal fold nodules, polyps, bleeding, and polypoid degeneration. The results reveal the low-dimensional dynamic characteristics of human glottal area data. In comparison to periodic glottal area series from a normal subject, aperiodic glottal area series from pathological subjects show complex reconstructed phase space, fractal dimension, and positive Lyapunov exponents. The estimated positive Lyapunov exponents provide direct evidence of chaos in pathological human vocal folds from high-speed digital imaging. Furthermore, significant differences between the normal and pathological groups are investigated for nonlinear dynamic and perturbation analyses. Jitter in the pathological group is significantly higher than in the normal group, but shimmer does not show such a difference. This finding suggests that traditional perturbation analysis should be cautiously applied to high-speed image signals. However, the correlation dimension and the maximal Lyapunov exponent reveal a statistically significant difference between normal and pathological groups. Nonlinear dynamic analysis is capable of quantitatively describing aperiodic vocal fold vibrations and may be helpful for understanding disordered behaviors in pathological laryngeal systems.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, nonlinear dynamic analysis and chaos theory have been applied to investigate abnormal or disordered behaviors of biomedical systems. Laryngeal pathology is usually associated with irregular dynamics of the vocal folds. The application of nonlinear dynamic analysis to the human voice has received recent interest [1–10]. Nonlinear dynamic analyses of pathological voices from patients with laryngeal pathologies have suggested that voice irregularities or aperiodicity are associated with chaos [1–3]. However, sound waves must travel from the vocal folds through the pharynx and the oral cavity before they can be recorded. As a result, acoustic voice measurement cannot directly observe vocal fold vibrations and is susceptible to vocal tract filtering and aerodynamic turbulent noise [4–11].

Medical imaging has played an important role in measuring vocal fold vibrations and providing visual information about laryngeal systems [12–16]. Commonly used laryngeal stroboscopy provides images of the glottal surface at a low frame rate. It is applicable only to periodic vocal fold vibrations and cannot capture aperiodic vibratory patterns. In comparison with laryngeal stroboscopy, high-speed digital imaging (HSDI) allows us to directly measure vocal fold vibrations at a much higher frame rate. Thus, aperiodic vibratory patterns can be recorded using HSDI. The analysis of the glottal area function from HSDI has been used for the assessment of vocal disorders from laryngeal pathologies. Yan et al. [15] applied automatic tracing of vocal fold movement to extract the glottal area of normal subjects and patients, and they performed amplitude and frequency perturbation analyses of the glottal area series. From high-speed image data, Dollinger et al. [12] extracted asymmetric tension parameters of a vocal fold model simulating laryngeal paralysis. However, previous HSDI studies have focused on developing methods to track and segment the glottis in image sequences, but have not yet quantified the irregular or aperiodic dynamics of vocal fold vibrations of patients with laryngeal pathologies.

Table 1. Subject information.

Subject No.  Sex  Age  Diagnosis              Subject No.  Sex  Age  Diagnosis
1            M    70   Bleeding               16           M    44   Polyp
2            M    52   Polyp                  17           M    52   Polyp
3            M    38   Polyp                  18           F    26   Nodules
4            F    30   Polyp                  19           F    58   Normal
5            F    33   Nodules                20           M    35   Normal
6            F    59   Polyp                  21           M    34   Normal
7            F    41   Nodules                22           F    54   Normal
8            F    25   Nodules                23           F    61   Normal
9            F    26   Polyp                  24           F    51   Normal
10           M    56   Polypoid degeneration  25           F    50   Normal
11           F    43   Polyp                  26           M    29   Normal
12           F    44   Polyp                  27           F    36   Normal
13           F    47   Polyp                  28           F    51   Normal
14           M    48   Polyp                  29           M    47   Normal
15           F    46   Nodules                30           F    60   Normal

Fig. 1. (a) High speed digital imaging (HSDI) setup. (b) Glottal area threshold of vocal folds with a polyp where the glottis is marked by red pixels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. High speed image sequences of a normal subject (a) and a patient with vocal nodules (b), where the time interval of successive frames is 2Δt.

The hypothesis of this study is that nonlinear dynamic methods might effectively analyze aperiodic high-speed images from patients and might provide valuable information on the vibratory mechanisms of vocal disorders from laryngeal pathologies. To our knowledge, the present study may be one of the earliest efforts to apply nonlinear dynamic analysis to quantify the irregular dynamics of high-speed image data from human subjects with laryngeal pathologies, including vocal nodules, polyps, bleeding, and polypoid degeneration. To test the hypothesis, high-speed image data from both normal and pathological subjects are analyzed using traditional perturbation analysis, including jitter and shimmer, and using nonlinear dynamic analyses, including phase space reconstruction, correlation dimension, and Lyapunov exponents. Furthermore, significant differences between the normal and pathological groups are investigated for nonlinear dynamic and traditional perturbation analyses. The results suggest that nonlinear dynamic analysis and HSDI can be used to quantitatively describe the irregular dynamics of the vocal folds in subjects with laryngeal pathologies.

2. Materials and methods

2.1. Subjects and high-speed imaging recording information

The Institutional Review Board of Fudan University EENT Hospital approved the protocol and consent procedure. Eighteen subjects with laryngeal diseases and twelve normal subjects participated in this study. The laryngeal pathologies include vocal fold bleeding, polyps, nodules, and polypoid degeneration. Diagnoses were made by an attending physician and were based on the subject's medical history and an endoscopic examination of the vocal folds. The normal subjects were healthy volunteers with no current or past evidence of voice or speech disorders who had normal larynges as determined by a clinical examination performed by an otolaryngologist. Subject information is shown in Table 1. Subjects were asked to sustain the vowel /a/ at a comfortable pitch and intensity, as steadily and for as long as possible, in order to reduce nonstationary variations in respiration and articulation. To avoid the transient effects of voice onset and offset, stationary segments from the middle of the high-speed glottal area series were chosen for analysis.

Fig. 1(a) shows the schematic diagram of the clinical HSDI setup. Vocal fold vibrations of human subjects were recorded with a high-speed digital camera (Fastcam-ultima APX). The camera system acquired images at a sampling rate, fs, of 4000 frames per second with a resolution of 256 × 512 pixels. The vocal fold edges and glottal area were extracted using the image edge detection and thresholding methods reported in our previous studies [16]. Fig. 1(b) displays the glottal area threshold in a patient with a right vocal fold polyp. The glottal area, whose grey-scale level is less than a certain threshold, was separated from the surrounding vocal fold tissue, and the glottal area could then be recorded as a time series s(t_n), where t_n = nΔt (n = 0, 1, ..., N − 1), N = 4000, and Δt = 1/4000 s is the discrete time interval. Fig. 2(a) and (b) show nine successive high-speed frames of a normal subject and a subject with vocal nodules, respectively, where the time interval of successive frames is 2Δt. In comparison with the regular vibratory pattern of the normal vocal folds, vocal nodules disrupt glottal closure and cause an irregular vocal fold vibration. HSDI visually displays the vibratory patterns of the vocal folds. However, in order to provide further quantitative information about vocal fold vibrations, nonlinear dynamic analyses must be performed on the glottal area series s(t_n).
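The thresholding step can be illustrated with a minimal Python sketch. It only counts pixels darker than a threshold in each frame; the edge detection of [16] is not reproduced here, and the function names, the tiny frames, and the threshold value are all hypothetical illustrations rather than the study's actual processing.

```python
# Minimal sketch: glottal area per frame as a count of dark pixels.
# Frame contents and the threshold value are made-up illustrations.

def glottal_area(frame, threshold):
    """Count pixels whose grey level is below the threshold."""
    return sum(1 for row in frame for pixel in row if pixel < threshold)

def glottal_area_series(frames, threshold):
    """Return s(t_n), one area value (in pixels) per frame."""
    return [glottal_area(frame, threshold) for frame in frames]

fs = 4000          # frames per second, as in the recording setup
dt = 1.0 / fs      # discrete time interval between frames (s)

# Three tiny 3x4 "frames": bright tissue (200) around a dark glottis (10).
frames = [
    [[200, 200, 200, 200], [200, 10, 200, 200], [200, 200, 200, 200]],
    [[200, 10, 10, 200], [200, 10, 10, 200], [200, 200, 200, 200]],
    [[200, 200, 200, 200], [200, 200, 200, 200], [200, 200, 200, 200]],
]
s = glottal_area_series(frames, threshold=50)
t = [n * dt for n in range(len(s))]   # t_n = n * dt
print(list(zip(t, s)))
```

With real recordings, each frame would be a 256 × 512 grey-scale image and the series would contain N = 4000 values per second of phonation.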

2.2. Perturbation analysis

The traditional perturbation methods [1], including percent jitter and percent shimmer, are applied to analyze the glottal area signals s(t_n) from HSDI. Percent jitter is a cycle-to-cycle frequency perturbation measure, and percent shimmer is a cycle-to-cycle amplitude perturbation measure. The software CSpeech 4.0 was used to extract the measures of jitter and shimmer [17]. Previous studies have shown that perturbation analysis of an aperiodic signal is unreliable and that only a nearly periodic signal can be reliably analyzed with jitter and shimmer [18,19]. As a practical guideline, Titze [18] suggested that perturbation measures less than about 5% might be reliable for nearly periodic voices. In this study, we applied the perturbation analysis to nearly periodic signals based on Titze's suggestion. Nonlinear dynamic methods, including phase space reconstruction, correlation dimension, and the Lyapunov exponent, do not share this limitation and are used for the analyses of both nearly periodic and aperiodic glottal area signals.
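As a rough illustration of what these two measures capture, the sketch below uses one common textbook definition: the mean absolute difference between consecutive cycles, divided by the mean value, in percent. This is an assumption for illustration only. The algorithm actually used by CSpeech is not described in the text, cycle extraction is taken as already done, and all numbers are invented.

```python
# Sketch of percent jitter / percent shimmer under a common definition:
# mean absolute cycle-to-cycle difference relative to the mean value.
# The exact CSpeech algorithm may differ; the sample values are hypothetical.

def percent_perturbation(values):
    """Mean |v[i+1] - v[i]| divided by mean(v), expressed in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

# Hypothetical cycle periods (ms) and peak glottal areas (pixels).
periods = [5.00, 5.02, 4.99, 5.01, 5.00]
peak_areas = [4000, 4050, 3980, 4020, 4010]

jitter = percent_perturbation(periods)       # frequency perturbation
shimmer = percent_perturbation(peak_areas)   # amplitude perturbation

# Titze's practical guideline: values below about 5% suggest a nearly periodic signal.
nearly_periodic = jitter < 5.0 and shimmer < 5.0
print(jitter, shimmer, nearly_periodic)
```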

2.3. Nonlinear dynamic analysis

For the glottal area series s(t_n) of all subjects, nonlinear dynamic methods, including phase space reconstruction, correlation dimension D2, and the Lyapunov exponent λ1, are applied. The reconstructed phase space illustrates the dynamic behavior of a signal: a periodic signal produces a closed trajectory, while the trajectory of an aperiodic signal looks irregular and chaotic. A phase space reconstruction can be created by plotting a signal against itself at some time delay [20]. We have the following m-dimensional time delay vector:

$$X(t) = \{\, s(t),\ s(t+\tau),\ \ldots,\ s(t+[m-1]\tau) \,\}, \qquad (1)$$

where m is the embedding dimension and τ is the time delay. Takens [21] demonstrated that when m > 2D + 1, where D is the dimension of the attractor, the reconstructed phase space with the lagged coordinates is topologically equivalent to the original phase space with physical coordinates. For a finite-length time series, τ can be estimated using the mutual information method proposed by Fraser & Swinney [22]. In the curve of the mutual information versus the time delay τ, the first minimum of the mutual information function provides an effective criterion for choosing the proper time delay τ.
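The reconstruction in Eq. (1) and the first-minimum criterion can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the study: the mutual information is estimated with a plain fixed-width histogram (Fraser & Swinney use an adaptive partition), the function names and the bin count are assumptions, and the sine wave merely stands in for a glottal area series.

```python
import math

def delay_embed(s, m, tau):
    """Build X(t) = (s(t), s(t+tau), ..., s(t+(m-1)tau)); tau is in samples."""
    n_vectors = len(s) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this m and tau")
    return [tuple(s[i + k * tau] for k in range(m)) for i in range(n_vectors)]

def mutual_information(s, tau, bins=16):
    """Histogram estimate of the mutual information between s(t) and s(t+tau)."""
    lo, hi = min(s), max(s)
    width = (hi - lo) / bins or 1.0          # avoid zero width for a constant series
    idx = [min(int((v - lo) / width), bins - 1) for v in s]
    n = len(s) - tau
    joint, px, py = {}, {}, {}
    for a, b in zip(idx[:n], idx[tau:]):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    # sum of p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def first_minimum_delay(s, max_tau=50, bins=16):
    """Return the first tau at which the mutual information has a local minimum."""
    mi = [mutual_information(s, tau, bins) for tau in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1                     # mi[k] corresponds to tau = k + 1
    return None                              # no local minimum found

# Stand-in signal: a sine wave with a period of 40 samples.
s = [math.sin(2.0 * math.pi * n / 40.0) for n in range(4000)]
tau = first_minimum_delay(s)
vectors = delay_embed(s, m=2, tau=3)
print(tau, len(vectors), vectors[0])
```

The delay returned by a histogram estimate depends on the bin count, so in practice the mutual information curve should be inspected rather than trusted blindly.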

The correlation dimension D2, proposed by Grassberger and Procaccia [23], is a geometric measure of the attractor that describes how strongly two points on the attractor are correlated in phase space. The correlation dimension quantifies the complexity or irregularity of the reconstructed phase space. A more complex system with a higher correlation dimension may require more variables to describe its behavior. The correlation dimension can be calculated from a time series of length N as

$$D_2 = \lim_{r \to 0} \frac{\log C(r)}{\log r}, \qquad (2)$$

where r is the radius around X_i. The correlation integral C(r) is the probability that two different vectors on the attractor are separated by a distance smaller than a radius r. For finite datasets, temporally correlated points in a reconstructed phase space would produce an "anomalous shoulder" in the curve of log C(r) versus log r, which might result in inaccurate estimates of D2. To avoid this problem, C(r) can be obtained using the formula by Theiler [24]:

$$C(W,N,r) = \frac{2}{(N+1-W)(N-W)} \sum_{n=W}^{N-1} \sum_{i=0}^{N-1-n} \theta\left(r - \lVert X_i - X_{i+n} \rVert\right), \qquad (3)$$

where θ(x) is the Heaviside step function defined by

$$\theta(x) = \begin{cases} 1, & x > 0, \\ 0, & x \le 0. \end{cases}$$

This formula excludes all pairs of vectors that lie within W time steps of each other, and in this study, W was set to the proper time delay τ. When W = 1, Eq. (3) reduces to the standard Grassberger-Procaccia formula [23]. In the curve of log C(W,N,r) vs. log r, if the radius r were too small, random noise would be dominant, and the estimate of D2 would continue to increase with m. On the other hand, if r were too large, the distances between all pairs of points in the reconstructed phase space would be smaller than r, and the estimate of D2 would approach zero. Between these two cases there is a finite region of r, called the scaling region [23], in which the slopes of the log C(W,N,r) vs. log r curves increase with m at first but eventually converge when m is sufficiently large. The saturated slope gives the estimate of the correlation dimension of the signal.
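Eq. (3) and the slope estimate in the scaling region can be written out directly, as in the following sketch. It is a brute-force O(N²) illustration with assumed function names, and the choice of radii is arbitrary. The test data are points evenly spaced on a line, for which the slope should be close to 1; they are not glottal area data.

```python
import math

def correlation_integral(vectors, r, W=1):
    """Eq. (3): fraction of vector pairs at least W steps apart in time
    whose distance is smaller than r."""
    N = len(vectors)
    count = 0
    for n in range(W, N):                  # temporal separation of the pair
        for i in range(N - n):             # i = 0, ..., N-1-n
            if math.dist(vectors[i], vectors[i + n]) < r:   # theta(r - d) = 1
                count += 1
    return 2.0 * count / ((N + 1 - W) * (N - W))

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Illustrative data: 500 points evenly spaced on a line (a one-dimensional set).
line = [(float(i),) for i in range(500)]
radii = [5.5, 10.5, 25.5, 50.5]            # assumed scaling region for this toy set
log_r = [math.log(r) for r in radii]
log_c = [math.log(correlation_integral(line, r, W=1)) for r in radii]
d2_estimate = slope(log_r, log_c)
print(d2_estimate)
```

For a real series, the same calculation would be repeated for embedding dimensions m = 1, 2, ..., with W set to the time delay, and the slope would be read off only after it saturates with m.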

Lyapunov exponents are among the most effective dynamic descriptors of chaos [25,26]. Lyapunov exponents describe the average divergence or convergence rates of neighboring orbits in phase space. Generally, a system with at least one positive Lyapunov exponent is chaotic, while a system with no positive exponents is regular. Based on the reconstructed phase space, we can approximate the reconstructed dynamics as

$$X_{n+1} = F(X_n), \qquad (4)$$

where F is a differentiable vector function. For a sufficiently small perturbation δX_n to the state X_n, we have the following perturbation from Eq. (4),

$$\delta X_{n+1} \approx DF_n\, \delta X_n, \qquad (5)$$

Fig. 3. Regular dynamics of the glottal area series from a normal subject. (a) Periodic time series (upper; glottal area in pixels vs. time in s) and frequency spectrum (lower; amplitude vs. frequency in Hz); (b) The reconstructed phase space, x(t+3Δt) vs. x(t); (c) The correlation integral ln C(r) vs. ln (r); (d) The estimated maximal Lyapunov exponent vs. time (s).

where DF_n is the Jacobian matrix of F with respect to X_n, which can be obtained by applying the least-squares error method to local tangent vectors [26]. Using Eq. (5), we can obtain the relationship between δX_{n+1} and δX_1 as

$$\delta X_{n+1} \approx DF_n\, DF_{n-1}\, \delta X_{n-1} \approx \cdots \approx \prod_{j=1}^{n} DF_j\, \delta X_1. \qquad (6)$$

Considering the convergent or divergent tendency of a set of bases {e_i}, i = 1, 2, ..., m, in their tangent space, we then have the definition of the Lyapunov exponent

$$\lambda_i = \lim_{n \to \infty} \frac{1}{n} \log \left\lVert \prod_{j=1}^{n} DF_j\, e_i \right\rVert. \qquad (7)$$

In numerical calculations, all vectors tend to fall along the direction of most rapid growth, corresponding to the maximal Lyapunov exponent λ1 [25], that is,

$$\lVert \delta X_{n+1} \rVert \approx \lVert \delta X_1 \rVert \exp(\lambda_1 n). \qquad (8)$$

In order to obtain the other Lyapunov exponents, the Gram–Schmidt reorthonormalization procedure should be used [25]. For the glottal area time series s(t_n), we used the method of Holzfuss and Lauterborn [26,27] to calculate the maximal Lyapunov exponent λ1, although the whole Lyapunov exponent spectrum can be obtained using this method. To validate the numerical procedure, we calculated the Lyapunov exponents of the Lorenz equations, ẋ = 16(y − x), ẏ = x(45.92 − z) − y, ż = xy − 4z, from a time series of x(t) as λ1 ≈ 1.5, λ2 ≈ 0, and λ3 ≈ −22.5. The estimated maximal Lyapunov exponent of the Lorenz chaotic attractor agrees with the value known from the literature [25].
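To make Eq. (7) concrete, the sketch below evaluates it for a one-dimensional map, where the Jacobian DF_j reduces to the scalar derivative and the product of Jacobians becomes a sum of logarithms. This is not the Holzfuss and Lauterborn method used in the study, which estimates Jacobians from a reconstructed time series. The logistic map is chosen here only because its exponent is known analytically (ln 2 ≈ 0.693 at r = 4); the function name and parameter values are assumptions.

```python
import math

def logistic_lyapunov(r=4.0, x0=0.3, n_steps=100000, n_transient=1000):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) as the orbit
    average of log |F'(x_j)|, the one-dimensional form of Eq. (7)."""
    x = x0
    for _ in range(n_transient):           # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        derivative = abs(r * (1.0 - 2.0 * x))      # |DF_j| at the current state
        total += math.log(max(derivative, 1e-300)) # floor guards against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps

chaotic = logistic_lyapunov(r=4.0)    # chaotic regime: positive exponent expected
regular = logistic_lyapunov(r=3.2)    # period-2 regime: negative exponent expected
print(chaotic, regular)
```

The sign of the result mirrors the distinction drawn in the text: a positive exponent indicates exponential divergence of neighboring orbits, as in Eq. (8), while a non-positive exponent indicates regular dynamics.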

2.4. Statistical analysis

The correlation dimension D2 and Lyapunov exponent λ1 of the HSDI samples from twelve normal subjects and eighteen pathological subjects were calculated. Jitter and shimmer of nearly periodic signals from twelve normal subjects and ten pathological subjects were calculated using CSpeech. Because it could not be assumed in advance that the tested groups came from normally distributed populations, the Mann–Whitney rank sum test was applied using jitter, shimmer, D2, and λ1 as dependent variables and the subject group (normal or pathological) as the independent variable. The SigmaPlot 4.0 (Jandel Scientific) and SigmaStat 3.0 (Jandel Scientific) software packages were used for statistical analysis and graphing. The statistical significance level was set at 0.05.
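For readers unfamiliar with the rank sum test, the following sketch computes the Mann–Whitney U statistic with average ranks for ties and a two-sided p-value from the normal approximation, without tie or continuity correction. SigmaStat's exact computation is not described in the text and may differ, so this is an illustration of the principle only. The two samples are invented numbers, not the study's measurements.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail probability
    return u, p

# Hypothetical values of a nonlinear parameter in two groups (not study data).
group_a = [1.1, 1.3, 1.2, 1.4, 1.0, 1.5]
group_b = [1.9, 2.4, 1.6, 2.1, 2.8, 1.7]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, p_value, p_value < 0.05)
```

For samples as small as these, an exact test is generally preferable to the normal approximation.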

3. Results

The glottal area of a normal subject typically shows a periodic time series and a frequency spectrum with clear harmonic structure. Fig. 3 displays the typical results of analyzing the glottal area time series s(t_n) of a normal subject. The time series and frequency spectrum are shown in Fig. 3(a) as the upper and lower graphs, respectively. The percent jitter and shimmer were estimated as 0.37% and 1.33%, respectively. The normal glottal area produces a regular reconstructed phase space, as shown in Fig. 3(b), where the time delay was determined to be 3Δt using the mutual information method. Fig. 3(c) shows the dependence of log C(r) on log r, where the curves from top to bottom correspond to the embedding dimensions m = 1, 2, ..., 12, respectively. As m increases, the correlation dimension D2 = 1.25 ± 0.02 can be obtained using a curve fit to these curves in the scaling region (e^6.5 < r < e^7.8). In Fig. 3(d), the estimated Lyapunov exponent λ1 of the normal subject asymptotically approaches a constant of 0.03 as time evolves, suggesting the regular dynamics of the normal glottal area data.

Fig. 4. Irregular dynamics of the glottal area series from a patient with vocal polyps. (a) Aperiodic time series (upper; glottal area in pixels vs. time in s) and frequency spectrum (lower; amplitude vs. frequency in Hz); (b) The reconstructed phase space, x(t+2Δt) vs. x(t); (c) The correlation integral ln C(r) vs. ln (r); (d) The estimated positive Lyapunov exponent λ1 vs. time (s).

However, the glottal area of pathological subjects may show an aperiodic time series and a frequency spectrum with strong noise components. Fig. 4 shows the typical results for the glottal area time series of a patient with vocal nodules. The upper and lower graphs in Fig. 4(a) illustrate the time series and frequency spectrum, respectively. Jitter and shimmer cannot be reliably obtained because the glottal area series shows strong aperiodicity. The glottal area from the patient with vocal nodules displays an irregular reconstructed phase space with the proper time delay of 2Δt, as shown in Fig. 4(b). Fig. 4(c) shows the curves of log C(r) vs. log r, where the curves from top to bottom correspond to the embedding dimensions m = 1, 2, ..., 12, respectively. As m increases, the correlation dimension D2 and its standard deviation can be obtained as 3.30 and 0.05 in the scaling region (e^6.8 < r < e^7.6). Fig. 4(d) shows that the estimated Lyapunov exponent λ1 approaches 0.24 as time evolves. The positive Lyapunov exponent indicates that the glottal area from the patient with vocal nodules is chaotic.

Table 2 summarizes the results of the statistical analysis of the perturbation and nonlinear dynamic parameters. For nearly periodic signals from twelve normal and ten pathological subjects, Fig. 5 shows the results of the traditional perturbation analyses. The mean jitter value of the normal group is 0.34%, which is significantly lower than the value of 0.74% in the pathological group (p = 0.018). However, shimmer did not show a significant difference between the normal and patient groups at the 0.05 significance level. For both periodic and aperiodic signals from twelve normal and eighteen pathological subjects, Fig. 6 shows the distributions of the nonlinear dynamic parameters D2 and λ1. The mean correlation dimensions of the normal and pathological groups (and their standard deviations) were 1.29 (0.33) and 1.98 (0.56), respectively. The mean Lyapunov exponent of the normal group (and its standard deviation) was 0.031 (0.014), while that of the pathological group was 0.12 (0.09). Results of the Mann–Whitney rank sum test performed on these two nonlinear dynamic parameters revealed that both the correlation dimensions and Lyapunov exponents of the pathological group were significantly higher than those of the normal group (p < 0.001).

Table 2. Comparisons of normal and pathological groups for differences in perturbation parameters (jitter and shimmer) and nonlinear dynamic parameters (correlation dimension D2 and the Lyapunov exponent λ1).

              Normal           Pathological     Mann–Whitney Rank Sum Test
Parameter     Mean     SD      Mean     SD      (p-value)
Jitter (%)    0.34     0.15    0.74     0.62    0.018 (a)
Shimmer (%)   1.18     0.40    1.60     1.25    0.903
D2            1.29     0.33    1.98     0.56    <0.001 (a)
λ1 (LCE)      0.031    0.014   0.12     0.09    <0.001 (a)

(a) Significant.

Fig. 5. The distributions of traditional perturbation parameters (jitter and shimmer, in %), where N and P correspond to normal and pathological groups, respectively (p = 0.018 for jitter; p = 0.903 for shimmer). The line inside the box marks the median, whiskers mark the 10th and 90th percentiles, and the dots are the outlying points.

Fig. 6. The distribution of nonlinear dynamic parameters (correlation dimension D2 and the maximal Lyapunov exponent λ1), where N and P correspond to normal and pathological groups, respectively (p < 0.001 for both parameters). The line inside the box marks the median, whiskers mark the 10th and 90th percentiles, and the dots are the outlying points.

4. Discussion

Nonlinear dynamical features have been applied to analyze voices in speaker identification [6,7,28] and music classification [29]. However, the recorded voice signal integrates vocal fold vibrations with other external factors such as vocal tract filtering and aerodynamic turbulent noise [1–11]. Since nonlinearity also arises from turbulent noise, voice signals cannot provide direct information about the nonlinearity of vocal fold vibration. Our recent in vitro studies [14,27] have found that excised larynges can display nonlinear dynamic behaviors when subglottal pressure is extremely high. However, those studies did not demonstrate a direct clinical application for detecting laryngeal disease. Clinically, vocal disorders may be observed in patients with laryngeal diseases without excessive vocal effort. This study represents a direct clinical application of nonlinear dynamic analysis to high-speed imaging by quantifying the irregular dynamics of pathological vocal folds from human subjects. As far as we know, this is the first time that nonlinear dynamic analysis has been applied to reveal evidence of chaos in HSDI of human subjects. Applying nonlinear dynamic analysis to HSDI may also be valuable in describing the chaotic dynamics of pathological vocal fold vibrations.

Visualization analysis methods such as the analytic phase plot based on the Hilbert transform [15] have been applied to plot the vibrations of pathological vocal folds in HSDI. The phase space reconstruction applied in this study shows a qualitative similarity to the analytic phase plot in illustrating the behaviors of high-speed image data. As shown in Figs. 3 and 4, the nearly periodic glottal movement from a normal subject shows a regular reconstructed phase space, and the aperiodic glottal movement from a pathological subject shows an irregular pattern of reconstructed phase space. However, neither of these two methods can provide quantitative information about the aperiodic dynamics of high-speed images, and they may not quantitatively detect subtle changes in vocal function. In recent years, quantitative analysis methods have not been applied to aperiodic high-speed image data from human subjects [12,13,15]. Thus, the reliable and objective measures presented in this study for the quantitative assessment of disordered glottal movement in HSDI could be useful.

Jitter and shimmer describe cycle-to-cycle variation in frequency and amplitude, respectively, and they require accurate pitch determination. Both parameters have previously been applied to objectively evaluate vocal fold vibratory quality from HSDI and to document clinical treatment effects [15,30]. Yan et al. [15] found that the jitter and shimmer of the glottal area time series of patients with laryngeal pathologies are much higher than those of normal subjects. Schwarz et al. [30] found that after voice therapy, jitter values of the glottal movements decreased while shimmer values did not consistently follow this pattern. In this study, we performed the perturbation analysis for the nearly periodic glottal area series from HSDI. The results in Fig. 5 show that there was a significant difference between pathological and normal groups in jitter measurements. This did not, however, hold true for shimmer. In addition, jitter and shimmer have been found to be sensitive to variations in recording systems, analysis systems, and extraction algorithms [31–33]. The low sampling rate and image resolution of HSDI will cause notable errors in pitch extraction and perturbation parameter estimation that may be comparable to the magnitude of the difference in perturbation between normal and pathological vocal folds. Furthermore, these perturbation parameters cannot reliably analyze strongly aperiodic signals [17–19]. Titze [18] suggested that signals should be categorized as Type 1, 2, or 3 to determine whether a particular signal is appropriate for perturbation analysis. Type 1 signals are nearly periodic and suitable for perturbation analysis. Type 2 signals, which contain strong modulations or subharmonics, and Type 3 signals, which are aperiodic or chaotic, may not be appropriate for perturbation analysis. As shown in Fig. 4(a), the aperiodic waveform and broadband spectrum of the glottal area series from the subject with vocal polyps have the typical characteristics of a Type 3 signal. The ill-defined signal pitch prevents stable estimation of jitter and shimmer. The large variance and poor reliability make jitter and shimmer unsuitable for analyzing such aperiodic glottal area data from HSDI. These findings suggest that perturbation analysis should be cautiously applied in the assessment of laryngeal pathologies using HSDI. Nonlinear dynamic methods, which allow for the analysis of aperiodic signals, may prove to be important tools.

In this study, we have applied nonlinear dynamic methods to quantitatively describe the irregular dynamics of the vocal folds. The aperiodic glottal area from the pathological subject shows a fractal dimension (D2 ≈ 3.3) and a positive Lyapunov exponent (λ1 ≈ 0.24), as shown in Fig. 4. The correlation dimensions of the pathological group are significantly higher than those of the normal group (p < 0.001), indicating the greater complexity of pathological vocal folds. As shown in Fig. 6, the glottal area data from HSDI show finite dimension values (D2 < 4). This suggests that a finite number of state variables might be sufficient to describe the dynamics of vocal fold vibrations in pathological subjects. In addition, the Lyapunov exponents of the pathological group are significantly higher than those of the normal group (p < 0.001). The positive Lyapunov exponents show direct evidence of chaos in HSDI. These findings agree with our recent model study [34], which showed that more state variables are needed to simulate the vibratory dynamics of pathological vocal folds and that irregular vocal fold vibrations are attributed to chaos. In addition, unlike jitter and shimmer, nonlinear dynamic analysis does not require accurate pitch extraction. A recent study has shown that nonlinear dynamic analysis is more effective than perturbation analysis in describing noisy time series [31]. Thus, nonlinear dynamic analysis can successfully quantify the irregular dynamics of vocal fold vibrations despite the low sampling rate and image resolution of HSDI.
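As an illustration of why the correlation dimension needs no pitch extraction, the following is a minimal sketch of a Grassberger–Procaccia style estimate [23] from a delay-embedded series. It is a simplified stand-in, not the procedure used in this study: the function names, the embedding parameters, the two radii, and the synthetic sine signal are all assumptions made for the example. The slope is taken between only two radii rather than fitted over a scaling region, and temporally close pairs are not excluded.

```python
import math
from itertools import combinations


def delay_embed(series, dim, delay):
    """Return delay vectors (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    count = len(series) - (dim - 1) * delay
    if count < 2:
        raise ValueError("series too short for this embedding")
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(count)]


def correlation_sum(vectors, radius):
    """Fraction of distinct vector pairs closer than `radius` (Euclidean distance)."""
    pairs = 0
    close = 0
    for a, b in combinations(vectors, 2):
        pairs += 1
        if math.dist(a, b) < radius:
            close += 1
    return close / pairs


def correlation_dimension(vectors, r_small, r_large):
    """Two-point slope of log C(r) versus log r between two radii."""
    c_small = correlation_sum(vectors, r_small)
    c_large = correlation_sum(vectors, r_large)
    if c_small == 0:
        raise ValueError("r_small too small: no pairs fall inside it")
    return (math.log(c_large) - math.log(c_small)) / (math.log(r_large) - math.log(r_small))


# Synthetic periodic signal standing in for a nearly periodic glottal area series
# (hypothetical data; embedding parameters and radii are chosen by hand).
series = [math.sin(0.2 * i) for i in range(600)]
vectors = delay_embed(series, dim=2, delay=8)
d2_estimate = correlation_dimension(vectors, r_small=0.1, r_large=0.4)
print(f"estimated D2 = {d2_estimate:.2f}")
```

The estimate uses only distances between reconstructed state vectors, so no cycle boundaries are required. For a periodic signal the embedded trajectory is a closed curve, and the estimate would be expected to come out close to 1, in contrast to the larger values reported here for pathological subjects.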

5. Conclusion

In this paper, we have applied perturbation and nonlinear dynamic analyses to the glottal area signals extracted from HSDI of normal and pathological subjects. The normal subject typically shows nearly periodic time series, discrete spectra, and regular dynamics. Conversely, the pathological subjects typically show aperiodic glottal area time series, broadband spectra, and irregular dynamics. Finite correlation dimensions and positive Lyapunov exponents demonstrated low-dimensional chaotic characteristics of irregular laryngeal vibrations from patients with laryngeal pathologies. Furthermore, nonlinear dynamic and traditional perturbation analyses of the normal and pathological groups have been compared. Shimmer did not show a significant difference between the normal and pathological groups. Correlation dimension and the Lyapunov exponent were significantly more effective at differentiating between normal and pathological groups than jitter and shimmer. These results showed that traditional perturbation analysis should be cautiously applied. However, we found that nonlinear dynamic methods could be used to describe the irregular dynamics of vocal fold vibrations from HSDI. Applying nonlinear dynamic analysis in HSDI provided direct evidence of chaos in real pathological laryngeal systems. Such quantitative studies in HSDI are important because established quantitative results might be helpful in determining the dynamic mechanism of vocal disorders and examining clinical voice analysis methods.

Acknowledgments

This study was supported by National Institutes of Health grant 1-RO1DC05522 and National Institutes of Health grant 1-RO1DC006019 from the National Institute of Deafness and Other Communication Disorders. This work is supported by the National Sciences Foundation of China (11071283), the Sciences Foundation of Shanxi (2009011005-3) and the Major Subject Foundation of Shanxi (20091028).

References

[1] Titze IR, Baken R, Herzel H. Evidence of chaos in vocal fold vibration. In: Titze IR, editor. Vocal fold physiology: new frontier in basic science. San Diego: Singular; 1993. p. 143–88.

[2] Herzel H, Berry D, Titze IR, Steinecke I. Nonlinear dynamics of the voice: signal analysis and biomechanical modeling. Chaos 1995;5:30–4.

[3] Tao C, Liu X, Du GH. Detection of dynamical structure from short and noisy chaotic series. Phys Rev E 2010;81(4). 046209-1-6.

[4] Vaziri G, Almasganj F, Behroozmand R. Pathological assessment of patients’ speech signals using nonlinear dynamical analysis. Comput Biol Med 2010;40:54–63.

[5] Mu J, Tao C, Du GH. Extracting periodic driving signal from chaotic noise. Progr Nat Sci 2003;13:666–71.

[6] Sabanal S, Nakagawa M. The fractal properties of vocal sounds and their application in the speech recognition model. Chaos Soliton Fract 1996;7:1825–43.

[7] Petry A, Barone D. Speaker identification using nonlinear dynamical features. Chaos Soliton Fract 2002;13:221–31.

[8] Behroozmand R, Almasganj F, Moradi MH. Pathological assessment of vocal fold nodules and polyp using acoustic perturbation and phase space features. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, vol. II. 2006. p. 1056A.

[9] Jiang JJ, Zhang Y. Nonlinear dynamic analysis of speech from pathologic subjects. Electron Lett 2002;38:294–5.


[10] Zhang Y, McGilligan C, Zhou L, Vig M, Jiang JJ. Nonlinear dynamic analysis of voices before and after surgical excision of vocal polyps. J Acoust Soc Am 2004;115:2270–7.

[11] Baken RJ. Clinical measurement of speech and voice. California: Singular Publishing Group; 2000.

[12] Dollinger M, Hoppe U, Hettlich F, Lohscheller J, Schuberth S, Eysholdt U. Vibration parameter extraction from endoscopic image series of the vocal folds. IEEE Trans Biomed Eng 2002;49:773–81.

[13] Olthoff A, Woywod C, Kruse E. Stroboscopy versus high-speed glottography: a comparative study. Laryngoscope 2007;117:1123–6.

[14] Zhang Y, Jiang JJ. Spatiotemporal chaos in excised larynx vibrations. Phys Rev E 2005;72:35201–4.

[15] Yan Y, Ahmad K, Kunduk M, Bless D. Analysis of vocal fold vibrations from high-speed laryngeal images using Hilbert transform-based methodology. J Voice 2005;19:161–75.

[16] Zhang Y, Bieging E, Tsui H, Jiang JJ. Efficient and effective extraction of vocal fold vibratory patterns from high-speed digital imaging. J Voice 2010;24:21–9.

[17] Milenkovic P, Read C. Cspeech version 4 user’s manual. Madison, WI: Author; 1992.

[18] Titze IR. Summary statement: workshop on acoustic voice analysis. Denver, Colorado: National Center for Voice and Speech; 1995.

[19] Giovanni A, Ouaknine M, Triglia J-M. Determination of largest Lyapunov exponents of vocal signal: application to unilateral laryngeal paralysis. J Voice 1998;13:341.

[20] Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Phys Rev Lett 1980;45:712.

[21] Takens F. Detecting strange attractors in turbulence. In: Rand DA, Young B-S, editors. Lecture notes in mathematics, vol. 898. Berlin: Springer-Verlag; 1981. p. 366–81.

[22] Fraser AM, Swinney HL. Independent coordinates for strange attractors from mutual information. Phys Rev A 1986;33:1134–40.

[23] Grassberger P, Procaccia I. Measuring the strangeness of strange attractors. Phys D 1983;9:189–208.

[24] Theiler J. Spurious dimension from correlation algorithms applied to limited time series data. Phys Rev A 1986;34:2427–32.

[25] Wolf A, Swift JB, Swinney HL, Vastano JA. Determining Lyapunov exponents from a time series. Phys D 1985;16:285–317.

[26] Holzfuss J, Lauterborn W. Liapunov exponents from a time series of acoustic chaos. Phys Rev A 1989;39:2146–52.

[27] Jiang JJ, Zhang Y, Ford CN. Nonlinear dynamic of phonations in excised larynx experiments. J Acoust Soc Am 2003;114:2198–205.

[28] Seo JP, Kim MS, Baek IC, Kwon YH, Lee KS. Similar speaker recognition using nonlinear analysis. Chaos Soliton Fract 2004;21:159–64.

[29] Bigerelle M, Iost A. Fractal dimension and classification of music. Chaos Soliton Fract 2000;11:2179–92.

[30] Schwarz R, Hoppe U, Schuster M, Wurzbacher T, Eysholdt U, Lohscheller J. Classification of unilateral vocal fold paralysis by endoscopic digital high-speed recordings and inversion of a biomechanical model. IEEE Trans Biomed Eng 2006;53:1099–108.

[31] Zhang Y, Wallace SM, Jiang JJ, Zhou L. Comparison of nonlinear dynamic methods and perturbation methods for voice analysis. J Acoust Soc Am 2005;118:2551–60.

[32] Titze IR, Liang H. Comparison of F0 extraction method for high-precision voice perturbation measurements. J Speech Hear Res 1993;36:1120–33.

[33] Karnell MP, Chang A, Smith A, Hoffman H. Impact of signal type of validity of voice perturbation measures. NCVS Status Progr Rep 1997;11:91–4.

[34] Zhang Y, Jiang JJ. Chaotic vibratory behaviors of a vocal-fold model with a unilateral polyp. J Acoust Soc Am 2004;115:1266–9.