In this paper we propose a novel method for the detection of events of interest through audio analysis. The system that we propose is based on the representation of the audio streams through a Gammatone image, which describes the time-frequency distribution of the energy of the signal; this representation is inspired by the functioning of the human auditory system. A pool of AdaBoost cascade classifiers, one for each class of events of interest, is involved in the event detection stage. The performance of the proposed system has been evaluated on a large data set of audio events for surveillance applications and the achieved results, compared with two state-of-the-art approaches, confirm its effectiveness. Download the paper at: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6918643
P. Foggia, A. Saggese, N. Strisciuglio, M. Vento
University of Salerno - Italy
Machine Intelligence lab for Video, Image and Audio processing
"Cascade classifiers trained on gammatonegrams for reliably detecting audio events," 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 50-55, 26-29 Aug. 2014. doi: 10.1109/AVSS.2014.6918643
State of the art
Single-layer representation or classification
  Vacher et al. (2004), Clavel et al. (2005): GMM classifier
  Valenzise et al. (2007): GMM for background modeling
  Rabaoui et al. (2008): OC-SVM with a novel dissimilarity measure
Complex classification architecture or representation
  Rouas et al. (2006): GMM + SVM
  Ntalampiras et al. (2009): two-stage GMM classifier
  Conte et al. (2012): two classifiers with different time resolutions
  Chin and Burred (2012): sub-sequence matching through a Genetic Motif Discovery technique
Proposed Architecture
Image Representation -> Features Extraction (Haar) -> Cascade Classifiers
Audio representation
Biologically inspired representation of audio streams as the response of the membrane of the cochlea in the human auditory system (Gammatone filter bank)
(Figure: Gammatonegram images of a scream, a gun shot and glass breaking)
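The slide does not give the filter-bank parameters, so the following is only a minimal sketch of how a Gammatone image could be computed. The number of bands, the frequency range, the frame length, the ERB-rate spacing of the centre frequencies and the names `erb_centres` and `gammatonegram` are all assumptions made for illustration, not the authors' settings.

```python
import numpy as np

def erb_centres(f_low, f_high, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale (assumed spacing)."""
    e = np.linspace(21.4 * np.log10(4.37e-3 * f_low + 1.0),
                    21.4 * np.log10(4.37e-3 * f_high + 1.0), n_bands)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatonegram(signal, sr, n_bands=32, f_low=50.0, win=0.025, hop=0.010):
    """Log-energy image: one row per Gammatone band, one column per time frame."""
    centres = erb_centres(f_low, 0.45 * sr, n_bands)
    t = np.arange(int(0.064 * sr)) / sr           # support of the impulse response
    win_n, hop_n = int(win * sr), int(hop * sr)
    n_frames = 1 + (len(signal) - win_n) // hop_n
    image = np.empty((n_bands, n_frames))
    for i, fc in enumerate(centres):
        b = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)   # bandwidth tied to the ERB at fc
        # 4th-order gammatone impulse response: t^3 * exp(-2*pi*b*t) * cos(2*pi*fc*t)
        ir = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        ir /= np.sqrt(np.sum(ir ** 2))            # normalise to unit energy
        band = np.convolve(signal, ir)[: len(signal)]
        for j in range(n_frames):
            seg = band[j * hop_n : j * hop_n + win_n]
            image[i, j] = np.mean(seg ** 2)       # mean energy of the frame
    return 10 * np.log10(image + 1e-12), centres
```

Rows of the returned array correspond to frequency bands and columns to time, so an abrupt sound such as a gun shot appears as a sharp vertical structure in the image.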
Haar features
Haar wavelets describe local variations of energy in the Gammatonegram images
  For instance, an abrupt variation of the energy distribution along time is effectively described by a vertical Haar basis function
Efficiently computed from the integral image of the Gammatonegram
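As an illustration of the two points above, here is a minimal sketch of an integral image and of a two-rectangle Haar-like feature that responds to an energy change along time. It assumes the Gammatonegram is a 2-D array with time along the columns; the function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Integral image padded with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four look-ups in the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_time_edge(ii, r0, c0, height, width):
    """Two-rectangle feature: energy in the right half minus the left half."""
    half = width // 2
    left = box_sum(ii, r0, c0, r0 + height, c0 + half)
    right = box_sum(ii, r0, c0 + half, r0 + height, c0 + 2 * half)
    return right - left
```

Each rectangle sum costs four array look-ups regardless of its size, which is what makes evaluating many features per window affordable.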
Cascade Classifiers
Events of interest can occur at any position in time
Classification through an n x m sliding window
Multi-stage cascade classifier learned with the AdaBoost algorithm (inspired by the Viola-Jones face detector)
  Smaller and simpler classifiers in the first stages of the cascade
  Speed-up from the early rejection of negative windows
(Figure: cascade diagram; labels: Input Image, rejected (no event), event detected)
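The early-rejection idea can be sketched as follows. This is a toy illustration, not the trained detector: each stage is assumed to be a list of decision stumps `(feature_fn, threshold, polarity, alpha)` with a stage threshold, and the window is assumed to span all frequency rows and slide along time only.

```python
import numpy as np

def stage_score(window, stumps):
    """Weighted vote of the stumps that fire on this window."""
    return sum(alpha for feat, thr, pol, alpha in stumps
               if pol * feat(window) < pol * thr)

def cascade_accepts(window, stages):
    """A window is accepted only if it passes every stage in order."""
    for stumps, stage_thr in stages:
        if stage_score(window, stumps) < stage_thr:
            return False  # early rejection: later stages are never evaluated
    return True

def detect(image, stages, width, step=1):
    """Slide a window along the time axis; return accepted start columns."""
    hits = []
    for c in range(0, image.shape[1] - width + 1, step):
        if cascade_accepts(image[:, c:c + width], stages):
            hits.append(c)
    return hits

# Toy cascade: a cheap energy check first, then an onset (time-edge) check.
toy_stages = [
    ([(lambda w: w.mean(), 0.1, -1, 1.0)], 1.0),
    ([(lambda w: w[:, 2:].mean() - w[:, :2].mean(), 0.5, -1, 1.0)], 1.0),
]
```

Because most windows contain only background, they fail the cheap first stage and the more expensive later stages run on few windows.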
Data Set (http://mivia.unisa.it)
4 classes of sounds
Glass breaking (GB), Gun shot (GS), Screams (S), Background sound (BG)
2500 events for each class
1000 for training and 1500 for testing
The events are created by superimposing abnormal sounds on several background sounds
Originally 173 background sounds + 278 sounds from the classes of interest
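The slides do not describe how the sounds are mixed. A minimal sketch of superimposing one event on a background clip, assuming both are 1-D sample arrays at the same sampling rate and using a hypothetical `gain` parameter to set the relative level of the event:

```python
import numpy as np

def superimpose(background, event, start, gain=1.0):
    """Return a copy of the background with the scaled event added at `start`."""
    mix = np.array(background, dtype=float)   # copy, the input is not modified
    end = min(start + len(event), len(mix))   # truncate events that overrun
    mix[start:end] += gain * np.asarray(event, dtype=float)[: end - start]
    return mix
```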
Experimental Evaluation
Recognition Rate
Correct detection/classification of events of interest
False Positive Rate (False alarms)
Detection of events of interest when only background sound is present
Comparison with 2 other methods from the literature, based on an LVQ classifier [1] and a Bag of Aural Words (BoAW) classifier [2]
[1] Conte et al. - An ensemble of rejecting classifiers for anomaly detection of audio events, AVSS 2012
[2] Carletti et al. - Audio surveillance using a bag of aural words classifier, AVSS 2013
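The two measures above could be computed from per-sample labels as in the following sketch. It assumes one true and one predicted label per test sample, with "BG" marking background; the exact protocol used in the paper (for example per-class averaging) may differ.

```python
def recognition_rate(y_true, y_pred, background="BG"):
    """Fraction of events of interest that are detected with the correct class."""
    events = [(t, p) for t, p in zip(y_true, y_pred) if t != background]
    return sum(t == p for t, p in events) / len(events)

def false_positive_rate(y_true, y_pred, background="BG"):
    """Fraction of background-only samples reported as an event of interest."""
    preds = [p for t, p in zip(y_true, y_pred) if t == background]
    return sum(p != background for p in preds) / len(preds)
```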
Experimental Evaluation (2)
Recognition Rate
(Figure: recognition results; values shown: Avg. Rec. Rate = 95.89%, Avg. Rec. Rate = 79.87%, Avg. Rec. Rate = 95.67%; panel labels: [1], [2])
Experimental Evaluation (3)
False Positive Rate
(Figure: false positive rates; panel labels: [1], [2])
Qualitative analysis
Many false scream detections occur on background sounds that contain loud cheering crowds or whistles
(Figure: Gammatonegram panels labelled Scream, Whistle, Cheering baby)
Conclusions
Innovative approach for audio analysis and event detection based on Computer Vision techniques
High detection capabilities
Low processing time: complex features are computed only for windows that are more likely to contain an event of interest
Detection of sounds of interest with low energy
References
P. Foggia, A. Saggese, N. Strisciuglio, M. Vento
"Cascade classifiers trained on gammatonegrams for reliably detecting audio events," 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 50-55, 26-29 Aug. 2014. doi: 10.1109/AVSS.2014.6918643
Web: http://mivia.unisa.it
Email: nstrisciuglio[at]unisa.it