
Automatic Detection of Snore Events from Full Night Audio Recordings

F. Gritti 1, L. Bocchi 1, I. Romagnoli 2, F. Gigliotti 2 and C. Manfredi 1

1 Department of Electronics and Telecommunications, Università degli Studi di Firenze, Firenze, Italy

2 Fondazione Don C. Gnocchi, Impruneta, Firenze, Italy

Abstract— One of the most common applications of snore sound analysis is to find out its relationship with obstructive sleep apnoea (OSA). The present work is aimed at developing a method for processing full-night snore recordings obtained from a simple recording system that can be used both for clinical and home applications. Automatic extraction of snore sounds could in fact save much of the time usually required for manual analysis.

The analysis system consists of three steps: pre-processing, automatic segmentation, extraction of features and classification.

The automatic segmentation is based on short-term energy measures: Otsu thresholding is applied to the histogram of the audio signal energy to detect the starting and ending points of sound events in the whole recording.

Once all the sound events have been obtained from the signal, they have to be classified as snore, breath or “other”. For a reliable analysis of OSA, only snore events must be detected. To this end, we present a new classification system based on two Artificial Neural Networks applied to four features of each extracted event: length, energy, standard deviation and maximum amplitude.

Audio data from 24 patients were used to test the method; on the dataset, a sensitivity of 86.2% and a specificity of 86.3% were obtained. Future work will be devoted to enhancing the procedure and to defining a reliable method for the identification of post-apnoeic events from the detected snore sounds.

Keywords— Snore, obstructive sleep apnoea, neural network, automatic segmentation.

I. INTRODUCTION

Narrowing or obstruction in the upper airways gives rise to air turbulence and vibration of the soft palate, thus producing snore sounds. Occasional snoring, usually due to fatigue or a bad neck position, is quite common and is not a health concern. Loud and regular snoring, instead, may be associated with Obstructive Sleep Apnoea (OSA) [1].

OSA is characterized by instability of the upper airways during sleep, which results in markedly reduced (hypopnoea) or absent (apnoea) airflow at the nose/mouth.

Episodes are typically accompanied by oxy-hemoglobin desaturation and terminated by brief micro-arousals that result in sleep fragmentation and a diminished amount of slow-wave and REM sleep. The disease is associated with significant clinical consequences including neuro-cognitive dysfunction, cardiovascular disease (hypertension, stroke, myocardial infarction, heart failure), metabolic dysfunction, respiratory failure, and cor pulmonale [2].

Despite the high prevalence in the population of loud snoring, which is characteristic of OSA, the disorder frequently goes unrecognised and undiagnosed.

Polysomnography (PSG) is the “gold standard” test for the diagnosis of OSA. The test consists of an overnight study during which multiple physiologic signals of the sleeping patient are monitored. PSG is expensive and unsuited for community screening, and the high prevalence of OSA requires considering other, simplified approaches to diagnosis. For this reason the analysis of snore sounds has recently received more attention, thanks to its diagnostic potential for detecting sleep apnoea non-invasively.

Commonly, tracheal respiratory sounds are recorded using a microphone placed over the patient’s neck or hung above the patient’s head during the night, leading to long audio signals (6–8 hours). Hence an automatic and accurate identification of snore episodes from full-night recordings is desirable, both to obtain objective data and to save time.

Several methods for the automatic classification of snore and non-snore sounds have been proposed, using different characteristics of the respiratory sounds: a neural network classifier based on temporal and spectral features of the sound signal [3], the pitch of the sound signal [4], and hidden Markov models (HMM) based on Mel-frequency cepstral coefficients (MFCC) of the sounds [5]. However, these studies most often do not include an automatic segmentation step, the snore events being detected manually or with semi-automatic methods.

In this work we propose a method for the automatic detection of snore events from audio recordings, using a short-term energy measure and an automatic classification based on four basic features: length, energy, standard deviation and maximum amplitude of each extracted sound event.


II. MATERIALS AND METHODS

Clinical audio signals were recorded at Fondazione Don Gnocchi, Pozzolatico, Firenze, while home recordings were obtained from volunteers at their private homes. The length of each signal is about 7–8 hours. Patients slept in a single bedroom, separated from partners, pets, television and other predictable sources of noise.

The audio signal was digitized at 16 bits with a sampling frequency Fs = 44.1 kHz, using a Tascam US-144 sound card and a Shure SM58 unidirectional microphone positioned at about 30 cm from the mouth of the patient.

The aim of the analysis system is the detection of snoring events from the audio signal. This has been achieved by means of the following three steps:

A. Pre-processing: loading of the audio signal, band-pass filtering and down-sampling;
B. Automatic segmentation: detection of the sound parts of the signal;
C. Extraction of features and classification: identification of snoring events.

The proposed method was developed in Matlab 7.11.0. A flow chart is shown in Figure 1. The next sections describe each step in detail.

A. Pre-processing

Through a user-friendly dialog window the user loads the audio signal and sets the following parameters for subsequent processing:

- Sampling frequency (44.1 kHz by default);
- Down-sampling frequency (11.025 kHz by default);
- Starting and ending samples, to select the part of the signal to be processed;
- Size of the analysis window (40 ms by default).

The first step in the analysis of respiratory sounds is to remove low- and high-frequency noise components. Since tracheal sounds have most of their components in the frequency range 100–1000 Hz [6], [7], in this study the recorded sounds were band-pass filtered with a fifth-order Butterworth filter with cut-off frequencies of 100 Hz and 1000 Hz, to reduce the effects of heart sounds and high-frequency noise [1]. The main frequency components of breathing and snoring sounds are in fact included in this range.

After the filtering step, the signal was down-sampled to 11.025 kHz, to reduce the size of the data and hence speed up signal processing.
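The paper implements these steps in Matlab; as an illustration only, a minimal Python/SciPy sketch of the same pre-processing chain is given below. The function name and constants are ours; the filter order, pass band and sampling rates follow the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

FS = 44100       # recording sampling rate (Hz)
FS_DOWN = 11025  # sampling rate after down-sampling (Hz)

def preprocess(x, fs=FS, fs_down=FS_DOWN):
    """Band-pass filter (100-1000 Hz, 5th-order Butterworth), then down-sample."""
    b, a = butter(5, [100.0, 1000.0], btype="bandpass", fs=fs)
    # Zero-phase filtering is used here; the paper does not state whether
    # forward-only or zero-phase filtering was applied.
    y = filtfilt(b, a, x)
    # 44100 / 11025 = 4, so plain integer decimation applies.
    return decimate(y, int(round(fs / fs_down)))
```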

Fig. 1 Flow chart of the analysis system: Recorded Signal → Loading Signal → Bandpass Filtering → Downsampling → Short-Term Energy (STE) → STE Histogram → Otsu threshold → Starting/Ending points of sound events → Extraction of features → Classification.

B. Automatic segmentation

When recording nocturnal tracheal sounds, the audio signal is composed of two types of events. The first type is made up of “silence” events, i.e. those that do not contain any sound; the second is made up of “sound” events, which include breathing episodes, snoring episodes and “other” sounds such as oral noise, ambient sounds, the patient’s cough, speech and blanket movements.

The aim is the extraction of the “sound” parts of the audio signal only, while “silence” segments must be removed.

One of the most widely used techniques is based on short-term energy (STE) measures, which increase during “sound” events and decrease during “silence” episodes [8], [9]. Hence, the signal energy was used here with suitable thresholds to separate “sound” and “silence” events.



The pre-processed signal was divided into windows of 40 ms in length, with 50% overlap between adjacent windows. In each window the Short-Term Energy was computed as:

$$\mathrm{STE} = \log\left(\sum_{i=1}^{n} s(i)^{2} + k\right) \qquad (1)$$

where n is the number of samples in the window, s is the signal and k is a small constant to avoid log(0).

Then, the histogram of the signal energy was computed and the Otsu method was iteratively applied to obtain two thresholds: the upper one tu and the lower one tl [10], [11]. The whole signal was then processed, and the starting and ending points of each single “sound” event were found as follows: when the STE curve rises above the upper threshold, the first point below the lower threshold to its left is taken as the starting point; when the STE curve falls below tl, the ending point of the event is found (Figure 2). At the end of this step, all the “sound” events in the audio signal have been extracted and stored for further processing.

Fig. 2. The starting and ending points of a “sound” event.
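A Python sketch of this segmentation step is given below. The paper does not spell out how Otsu’s method is “iteratively applied” to obtain two thresholds; one plausible reading, assumed here, is that Otsu is run once on the full STE histogram for the upper threshold and once more on the values below it for the lower threshold.

```python
import numpy as np
from skimage.filters import threshold_otsu

def short_term_energy(x, fs, win_ms=40, k=1e-10):
    """Log short-term energy of Eq. (1) over 40 ms windows, 50% overlap."""
    n = int(fs * win_ms / 1000)
    hop = n // 2
    return np.array([np.log(np.sum(x[i:i + n] ** 2) + k)
                     for i in range(0, len(x) - n + 1, hop)])

def find_sound_events(ste):
    """Return (start, end) frame indices of "sound" events."""
    tu = threshold_otsu(ste)            # upper threshold
    tl = threshold_otsu(ste[ste < tu])  # lower threshold (assumed scheme)
    events, inside = [], False
    for i, e in enumerate(ste):
        if not inside and e > tu:
            # walk left to the first point below the lower threshold
            j = i
            while j > 0 and ste[j] >= tl:
                j -= 1
            start, inside = j, True
        elif inside and e < tl:
            events.append((start, i))   # ending point of the event
            inside = False
    return events
```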

C. Extraction of the features and classification

In previous studies, the analysis of the snoring signal was performed in the time or frequency domain and several kinds of features were taken into consideration [2]. In this work, a basic set of features was extracted from each “sound” event in order to classify it. Four parameters were selected to describe the sound event in the time domain. The first one is the length of each sound event, calculated as the distance between the starting and the ending points of the event. The second parameter is the standard deviation (STD) of the sound event. The third one is its maximum amplitude, calculated as the difference between the maximum and minimum amplitude of the signal. The last parameter is the mean value of the Short-Term Energy (STE) computed according to Eq. (1).
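As an illustration, the four features can be computed as in the following sketch (the function and field names are ours):

```python
import numpy as np

def event_features(event, ste):
    """Four time-domain features of one "sound" event.

    event -- samples of the extracted event
    ste   -- STE values (Eq. 1) of the windows inside the event
    """
    return {
        "length": len(event),                        # event duration (samples)
        "std": np.std(event),                        # standard deviation
        "max_amplitude": event.max() - event.min(),  # max minus min amplitude
        "mean_ste": np.mean(ste),                    # mean short-term energy
    }
```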

From the analysis of the characteristics of the “sound” events, the following observations can be made:

- Snoring and breathing sounds usually have a longer duration than “other” sounds; a typical situation is shown in Figure 3;
- In snoring sounds the maximum amplitude, the mean value of the STE and the STD are usually higher than in breathing and “other” sounds.

Fig. 3. Length of "sound” events.

Hence, a classification system was designed, made up of two artificial neural networks, aimed at identifying the three kinds of sounds, which have different characteristics: snoring, breathing and “other”. In particular, the first network was used to identify the “other” sounds, while the second one was used to discriminate between snoring and breathing sounds. A training set was built (after listening to the recordings and manually classifying the events), made up of 1643 snoring, breathing and “other” sounds in total.

The first network was trained with all the events of the training set, using only the length of the event as input. The outcome of the listening step was used as the teaching input, where snoring and breathing sounds are labelled with 1 and “other” sounds with 0. After the training step, the network output was tested and compared with the desired output. The “other” sounds correctly recognized as “other” (true negatives) were removed from the training set used for the second network.

The second network has three inputs, corresponding to the mean value of the STE, the STD and the maximum amplitude, respectively. In the teaching input, snoring sounds are labelled with 1, while breathing sounds and the “other” sounds not correctly recognised (false positives) by the first network are labelled with 0.

The settings for both neural networks were defined as follows: the number of hidden units was experimentally set to 10; a sigmoidal activation function was adopted for the hidden units, while the output unit has a hyperbolic tangent activation function; the training method was the error back-propagation rule over 300 iterations, using an adaptive learning rate with momentum.
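A minimal sketch of the two-network cascade follows, using scikit-learn as a stand-in for the original Matlab networks. The mapping is approximate: MLPClassifier supports the 10 logistic hidden units, SGD back-propagation, momentum and adaptive learning rate described above, but it fixes its own (logistic) output activation, so the hyperbolic-tangent output unit cannot be reproduced exactly; the momentum value is also not given in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_net():
    return MLPClassifier(hidden_layer_sizes=(10,),  # 10 hidden units
                         activation="logistic",     # sigmoidal hidden units
                         solver="sgd",              # error back-propagation
                         learning_rate="adaptive",  # adaptive learning rate
                         momentum=0.9,              # value assumed, not given
                         max_iter=300)              # 300 iterations

def train_cascade(length, mean_ste, std, max_amp, is_snore, is_other):
    """Train net 1 (length -> other vs rest) and net 2 (3 features -> snore)."""
    net1 = make_net().fit(length.reshape(-1, 1), ~is_other)
    # Drop the true negatives: "other" events correctly rejected by net 1.
    keep = net1.predict(length.reshape(-1, 1)).astype(bool)
    X2 = np.column_stack([mean_ste, std, max_amp])[keep]
    net2 = make_net().fit(X2, is_snore[keep])
    return net1, net2
```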


III. RESULTS

Audio recordings from 24 patients of different age and sex were considered. Six of the 24 cases were not used to test the proposed method, because of too low a signal amplitude and too many artefacts corrupting the signal. The other 18 recordings were analysed. Thirty minutes were extracted from each recording, selected from the central part of the signal, when the patient was certainly sleeping and little environmental noise was detected.

A preliminary evaluation was carried out to assess the performance of the pre-processing step: band-pass filtering reduced the low- and high-frequency noise components; thus the thresholds computed in the segmentation phase were consistent with the snore sounds.

The accuracy of the automatic segmentation, evaluated as the number of sounds detected over the total number of sounds, is about 97%. Silence events were not extracted, in accordance with the aims of this analysis step.

In order to evaluate the performance of the classifier, the two trained networks were used in a testing phase. Results are summarized in Table 1.

Table 1 Results obtained with the two trained networks.

Parameter      1st Network [%]   2nd Network [%]
Accuracy            83.8              86.2
Sensitivity         83.6              86.2
Specificity         85.4              86.3

The first network was tested on 787 “sound” events, distinct from the original training set. From the analysis of the ROC curve, obtained by varying the threshold applied to the output of the neural network, a “best” threshold was chosen that allows 85.4% of the “other” sounds to be correctly identified. These sounds were stored in the list of non-snore events and removed from the test set.

The second network was tested on the remaining sounds. As for the first network, the ROC curve was computed and the best threshold was selected. This threshold was used to classify the remaining sounds as snore or non-snore, stored in the list of snore events and in the list of non-snore events (together with the sounds identified by the first network), respectively.

The accuracy (number of correct classifications) of the second network was found to be 86.2%. This result corresponds to a sensitivity (true positive (TP) ratio) of 86.2% and a specificity (true negative (TN) ratio) of 86.3%.
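The paper does not state the criterion used to select the “best” threshold on the ROC curve; a common choice, assumed in this sketch, is Youden’s J (sensitivity + specificity − 1):

```python
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold_metrics(y_true, scores):
    """Pick an ROC operating point (Youden's J, assumed) and report metrics."""
    fpr, tpr, thr = roc_curve(y_true, scores)
    t = thr[np.argmax(tpr - fpr)]     # maximise sensitivity + specificity - 1
    pred = scores >= t
    pos, neg = (y_true == 1), (y_true == 0)
    sensitivity = np.mean(pred[pos])  # true positive ratio
    specificity = np.mean(~pred[neg])  # true negative ratio
    accuracy = np.mean(pred == pos)   # fraction of correct classifications
    return t, sensitivity, specificity, accuracy
```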

The large variety of snore types did not allow for perfect recognition; however, these results seem quite good when compared with the existing literature [1], [11].

IV. CONCLUSIONS

A fully automatic, highly sensitive system for snore identification during sleep has been proposed. The algorithm fails to correctly identify snores in the case of low-intensity snores, as such events have low energy and low maximum amplitude. However, as post-apnoeic snore events are more intense than non-post-apnoeic ones, this limitation could be acceptable.

Future work will be devoted to enhancing the procedure and to defining a reliable method for the identification of post-apnoeic events from the automatically detected snore sounds.

ACKNOWLEDGMENTS

The authors gratefully acknowledge Ente Cassa di Risparmio di Firenze, LIAB Project 2009, for its contribution to this work.

REFERENCES

1. Yadollahi A., Moussavi Z. (2010) Automatic breath and snore sounds classification from tracheal and ambient sounds recordings. Med Eng Phys 32:985–990

2. Karunajeewa A.S., Abeyratne U.R., Hukins C. (2008) Silence–breathing–snore classification from snore-related sounds. Physiol Meas 29:227–243

3. Jané R., Solà-Soler J. et al. (2000) Automatic detection of snoring signals: validation with simple snorers and OSAS patients. Proc IEEE-EMBS, pp. 3129–3131

4. Abeyratne U., Wakwella A. et al. (2005) Pitch jump probability measures for the analysis of snoring sounds in apnea. Physiol Meas 26:779–798

5. Duckitt W., Tuomi S., Niesler T. (2006) Automatic detection, segmentation and assessment of snoring from ambient acoustic data. Physiol Meas 27:1047–1056

6. Fiz J., Abad J. et al. (1996) Acoustic analysis of snoring in patients with simple snoring and obstructive sleep apnea. Eur Respir J 9:146–159

7. Beck R., Odeh M. et al. (1995) The acoustic properties of snores. Eur Respir J 8(12):2120–2128

8. Deller J.R., Proakis J.G., Hansen J.H.L. (1993) Discrete-Time Processing of Speech Signals. Macmillan Publishing Company

9. Kulkas A., Huupponen E. et al. (2009) Intelligent methods for identifying respiratory cycle phases from tracheal sound signal during sleep. Comput Biol Med 39:1000–1005

10. Otsu N. (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66

11. Calisti M., Bocchi L. et al. (2009) Automatic detection of snore episodes from full night sound recordings: home and clinical application. Proc AVFA09, Madrid, Spain
