P. Foggia, A. Saggese, N. Strisciuglio, M. Vento, University of Salerno - Italy, Machine Intelligence lab for Video, Image and Audio processing. "Cascade classifiers trained on gammatonegrams for reliably detecting audio events," Advanced Video and Signal Based Surveillance (AVSS), 2014 11th IEEE International Conference on, vol., no., pp. 50-55, 26-29 Aug. 2014, doi: 10.1109/AVSS.2014.6918643

Cascade classifiers trained on gammatonegrams for reliably detecting audio events

DESCRIPTION

In this paper we propose a novel method for the detection of events of interest through audio analysis. The proposed system represents the audio stream as a Gammatone image, which describes the time-frequency distribution of the energy of the signal; this representation is inspired by the functioning of the human auditory system. A pool of AdaBoost cascade classifiers, one for each class of events of interest, performs the event detection stage. The performance of the proposed system has been evaluated on a large data set of audio events for surveillance applications, and the achieved results, compared with two state-of-the-art approaches, confirm its effectiveness. Download the paper at: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6918643

Citation preview

Page 1: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

P. Foggia, A. Saggese, N. Strisciuglio, M. Vento

University of Salerno - Italy

Machine Intelligence lab for Video, Image and Audio processing

"Cascade classifiers trained on gammatonegrams for reliably detecting audio events," Advanced Video and Signal Based Surveillance (AVSS), 2014 11th IEEE International Conference on, vol., no., pp. 50-55, 26-29 Aug. 2014, doi: 10.1109/AVSS.2014.6918643

Page 2: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

State of the art

Single-layer representation or classification

Vacher et al. (2004), Clavel et al. (2005): GMM classifier

Valenzise et al. (2007): GMM for background modeling

Rabaoui et al. (2008): OC-SVM with a novel dissimilarity measure

Complex classification architecture or representation

Rouas et al. (2006): GMM + SVM

Ntalampiras et al. (2009): two-stage GMM classifier

Conte et al. (2012): two classifiers with different time resolutions

Chin and Burred (2012): sub-sequence matching through the Genetic Motif Discovery technique

Page 3: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Proposed Architecture

Pipeline: Image Representation, Features Extraction (Haar), Cascade Classifiers

Page 4: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Audio representation

Biologically inspired representation of audio streams as the response of the cochlear membrane in the human auditory system (Gammatone filter bank)

Figure panels: Scream, Gun shot, Glass breaking
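The slide names the Gammatone filter bank but gives no parameters. The following is a minimal sketch of how such a filter bank is commonly set up, assuming ERB-spaced center frequencies (Glasberg and Moore approximation) and a fourth-order gammatone impulse response. The frequency range, channel count, filter order, and all function names are illustrative assumptions, not values taken from the paper.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def inverse_erb_rate(e):
    """Map an ERB-rate value back to a frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(f_min, f_max, n_channels):
    """Center frequencies equally spaced on the ERB-rate scale (n_channels >= 2)."""
    e_lo, e_hi = erb_rate(f_min), erb_rate(f_max)
    step = (e_hi - e_lo) / (n_channels - 1)
    return [inverse_erb_rate(e_lo + i * step) for i in range(n_channels)]

def gammatone_impulse_response(fc_hz, sample_rate, duration_s=0.025, order=4):
    """Sampled g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), unnormalized."""
    b = 1.019 * erb_bandwidth(fc_hz)  # bandwidth parameter, assumed convention
    n_samples = int(round(duration_s * sample_rate))
    response = []
    for k in range(n_samples):
        t = k / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        response.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    return response

# Hypothetical configuration: 8 channels between 50 Hz and 4 kHz.
channels = center_frequencies(50.0, 4000.0, 8)
```

Filtering the signal with each channel and accumulating the output energy over short frames would give one row of the gammatonegram per channel; that step is omitted here.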

Page 5: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Proposed Architecture

Pipeline: Image Representation, Features Extraction (Haar), Cascade Classifiers

Page 6: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Haar features

Haar wavelets describe local variations of energy in the Gammatonegram images

For instance, abrupt variations of the energy distribution along time are effectively described by a vertical Haar basis function

Efficiently computed from the Integral Image of the Gammatonegram
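To illustrate why the integral image makes these features cheap, here is a minimal sketch, assuming the gammatonegram is stored as a list of rows (frequency channels) by columns (time frames). The function names and the toy image are illustrative assumptions. Any rectangle sum costs four table lookups, so a two-rectangle feature costs a constant number of operations regardless of its size.

```python
def integral_image(img):
    """Summed-area table of size (rows+1) x (cols+1) with a zero top row and left column."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height][left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def vertical_edge_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: energy in the right half minus the left half.

    The boundary between the two rectangles is vertical, so the feature
    responds to a change of energy along the time (column) axis.
    """
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum

# Toy gammatonegram: 3 channels x 4 frames, with an energy onset at frame 2.
toy = [[0, 0, 5, 5],
       [0, 0, 5, 5],
       [0, 0, 5, 5]]
ii = integral_image(toy)
onset_response = vertical_edge_feature(ii, 0, 0, 3, 4)  # 30.0
```

On a window with constant energy the same feature evaluates to zero, which is what makes it a useful onset descriptor.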

Page 7: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Proposed Architecture

Pipeline: Image Representation, Features Extraction (Haar), Cascade Classifiers

Page 8: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Cascade Classifiers

Events of interest can occur at every position in time

Classification through an n x m sliding window

Multi-stage cascade classifier learned with the AdaBoost algorithm (inspired by the Viola-Jones face detector)

Smaller and simpler classifiers in the first stages of the cascade

Speed-up from the early rejection of negative windows

Diagram: Input Image, windows either rejected (no events) or event detected
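The early-rejection idea can be sketched as follows, assuming Viola-Jones style stages in which each weak classifier thresholds a single feature. The stage layout, thresholds, weights, and the toy one-dimensional feature extractor are hypothetical and only stand in for Haar features computed on an n x m gammatonegram window.

```python
def cascade_accepts(features, stages):
    """Return True only if the window passes every stage.

    stages: list of (weak_classifiers, stage_threshold)
    weak classifier: (feature_index, threshold, polarity, alpha); it votes
    alpha when polarity * feature < polarity * threshold.
    """
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature_index, threshold, polarity, alpha in weak_classifiers:
            if polarity * features[feature_index] < polarity * threshold:
                score += alpha
        if score < stage_threshold:
            return False  # early rejection: later stages are never evaluated
    return True

def detect_events(frames, window_len, step, extract_features, stages):
    """Slide a window along time and return the start index of each accepted window."""
    detections = []
    for start in range(0, len(frames) - window_len + 1, step):
        window = frames[start:start + window_len]
        if cascade_accepts(extract_features(window), stages):
            detections.append(start)
    return detections

# Toy setup: per-frame energies and two hand-written stages (not learned).
frames = [0, 0, 0, 9, 9, 0, 0, 0]
extract = lambda w: [sum(w), max(w) - min(w)]   # [total energy, energy range]
stages = [
    ([(0, 5, -1, 1.0)], 0.5),                    # cheap stage: total energy > 5
    ([(1, 5, -1, 0.6), (0, 15, -1, 0.6)], 1.0),  # stricter stage: both must vote
]
hits = detect_events(frames, 4, 1, extract, stages)  # [1, 2, 3]
```

In a trained cascade the weak classifiers and thresholds would come from AdaBoost, and most background windows would be discarded by the first, cheapest stages.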

Page 9: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Data Set (http://mivia.unisa.it)

4 classes of sounds

Glass breaking (GB), Gun shot (GS), Screams (S), Background sound (BG)

2500 events for each class

1000 for training and 1500 for testing

The events are created by superimposing abnormal sounds on several background sounds

Originally 173 background sounds + 278 sounds from the classes of interest
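One common way to superimpose an event on a background is to scale the event to a target signal-to-noise ratio before adding it. The slides do not say how the mixing was done, so the SNR parameter, the offset, and the function names below are assumptions made purely for illustration.

```python
import math

def mean_power(samples):
    """Average squared amplitude of a non-empty sequence of samples."""
    return sum(v * v for v in samples) / len(samples)

def superimpose(background, event, offset, snr_db):
    """Add `event` to `background` starting at `offset`, scaled to `snr_db`.

    The SNR is measured against the background segment that the event
    overlaps. Both signals are assumed to have non-zero power there.
    Returns the mixed signal and the gain applied to the event.
    """
    segment = background[offset:offset + len(event)]
    if len(segment) < len(event):
        raise ValueError("event does not fit inside the background")
    gain = math.sqrt(mean_power(segment) * 10.0 ** (snr_db / 10.0) / mean_power(event))
    mixed = list(background)
    for i, v in enumerate(event):
        mixed[offset + i] += gain * v
    return mixed, gain

# Toy example: unit-power background, event with power 4, mixed at 0 dB.
background = [1.0, -1.0] * 4
event = [2.0, -2.0]
mixed, gain = superimpose(background, event, 2, 0.0)  # gain is 0.5
```

Repeating this with different backgrounds, offsets, and levels is one way a few hundred source recordings could yield thousands of distinct events.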

Page 10: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Experimental Evaluation

Recognition Rate

Correct detection/classification of events of interest

False Positive Rate (False alarms)

Detection of events of interest when only background sound is present

Comparison with 2 other methods from the literature, based on an LVQ classifier [1] and a Bag of Aural Words (BoAW) classifier [2]

[1] Conte et al. - An ensemble of rejecting classifiers for anomaly detection of audio events, AVSS 2012

[2] Carletti et al. - Audio surveillance using a bag of aural words classifier, AVSS 2013
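The two measures can be computed from per-sample labels as in the following sketch. It assumes one true label and one predicted label per test sample, with "BG" marking background; the label strings and function names are illustrative, and the paper's exact counting protocol may differ.

```python
def recognition_rate(true_labels, predicted_labels, background="BG"):
    """Fraction of events of interest that are detected with the correct class."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t != background]
    return sum(1 for t, p in pairs if t == p) / len(pairs)

def false_positive_rate(true_labels, predicted_labels, background="BG"):
    """Fraction of background-only samples reported as an event of interest."""
    preds = [p for t, p in zip(true_labels, predicted_labels) if t == background]
    return sum(1 for p in preds if p != background) / len(preds)

# Toy labels: one scream is missed and one background sample raises a false alarm.
truth = ["GB", "GS", "S", "S", "BG", "BG", "BG", "BG"]
preds = ["GB", "GS", "BG", "S", "BG", "S", "BG", "BG"]
rr = recognition_rate(truth, preds)      # 0.75
fpr = false_positive_rate(truth, preds)  # 0.25
```

Keeping the two rates separate matters in surveillance settings, where background sound dominates and even a small false positive rate produces many alarms.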

Page 11: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Experimental Evaluation (2)

Recognition Rate (figure)

Average recognition rates shown: 95.89%, 79.87%, 95.67%; panel labels: [1], [2]

Page 12: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Experimental Evaluation (3)

False Positive Rate (figure; panels labeled [1] and [2])

Page 13: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Qualitative analysis

Many false scream detections occur on background sounds that contain loud cheering crowds or whistles

Figure panels: Scream, Whistle, Cheering baby

Page 14: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

Conclusions

Innovative approach to audio analysis and event detection based on Computer Vision techniques

High detection capabilities

Low processing time: complex features are computed only for windows that are more likely to contain an event of interest

Detection of sounds of interest with low energy

Page 15: Cascade classifiers trained on gammatonegrams for reliably detecting audio events

References

P. Foggia, A. Saggese, N. Strisciuglio, M. Vento

"Cascade classifiers trained on gammatonegrams for reliably detecting audio events," Advanced Video and Signal Based Surveillance (AVSS), 2014 11th IEEE International Conference on, vol., no., pp. 50-55, 26-29 Aug. 2014, doi: 10.1109/AVSS.2014.6918643

Web: http://mivia.unisa.it

Email: nstrisciuglio[at]unisa.it