Detection and Segmentation of Bird Song in Noisy Environments
Lawrence Neal, UHC Honors Thesis
Slide 2
Bioacoustics Project
- Bird sounds are identifiable by species
- Presence/absence and activity data are useful
- Bird activity may shift in response to climate change and other ecological factors
Slide 3
Bioacoustics Project
Slide 4
Automated Recording
- Song Meter automated recorders
- Recordings collected May through August, beginning in 2009
Slide 5
Audio Data Analysis
Involves several steps:
1. Extracting bird sound from audio
2. Identifying bird species
3. Mapping species data back to sites
Slide 6
Audio Data Analysis
Involves several steps:
1. Extracting bird sound from audio (Segmentation)
2. Identifying bird species
3. Mapping species data back to sites
Slide 7
Segmentation
Time-Domain Segmentation
- Separates audio into multiple clips
- Energy thresholding, onset/offset detection
- Has been applied to bird song (Härmä 2003, Fagerlund 2004, Lee 2008)
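A minimal sketch of energy-threshold segmentation in Python with NumPy, assuming a mono waveform `x` sampled at `sr` Hz. The function name, frame length, hop, and the -30 dB threshold (relative to the loudest frame) are illustrative assumptions, not parameters from the cited papers. Each run of frames above the threshold becomes one clip, reported as onset and offset times in seconds.

```python
import numpy as np


def energy_segments(x, sr, frame_len=512, hop=256, threshold_db=-30.0):
    """Split a waveform into clips where frame energy exceeds a threshold.

    frame_len, hop and threshold_db are assumed defaults; the threshold is
    measured relative to the loudest frame. Returns (onset_s, offset_s) pairs.
    """
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    energy = np.array(
        [np.sum(x[i * hop : i * hop + frame_len] ** 2) for i in range(n_frames)]
    )
    energy_db = 10.0 * np.log10(energy + 1e-12)  # small floor avoids log(0)
    active = energy_db > energy_db.max() + threshold_db

    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i  # onset: energy rises above the threshold
        elif not on and start is not None:
            # offset: the clip ends with the last active frame (i - 1)
            segments.append((start * hop / sr, ((i - 1) * hop + frame_len) / sr))
            start = None
    if start is not None:  # clip still open at the end of the recording
        segments.append((start * hop / sr, ((n_frames - 1) * hop + frame_len) / sr))
    return segments
```

Because each decision covers the whole frequency band, two calls that overlap in time end up in the same clip.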
Slide 8
Segmentation
Time-Domain Segmentation
Slide 9
Segmentation
- Time-domain segmentation cannot separate sounds that overlap in time
Slide 10
Segmentation
Time-Frequency Segmentation
- Segments regions of the 2D spectrogram
Slide 11
Segmentation
Spectrogram Segmentation
- Similar to image segmentation
Slide 12
Spectrograms
- Two-dimensional representation of sound
- Audio amplitude at each (time, frequency) point
- Generated by the short-time Fourier transform (STFT)
Figures: male voice saying "nineteenth century"; violin playing (note the harmonics)
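A minimal NumPy sketch of how a magnitude spectrogram comes out of the STFT: slide a window along the signal, take the FFT of each windowed frame, and keep the magnitudes. The Hann window, `n_fft`, `hop`, and the function name are assumptions for illustration, not the settings used in the thesis.

```python
import numpy as np


def spectrogram(x, n_fft=512, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames.

    n_fft and hop are assumed values; a Hann window is assumed as well.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft of each windowed frame yields n_fft // 2 + 1 non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

For a 16 kHz signal and `n_fft=512`, bin k sits at k × 16000 / 512 Hz, so a 2 kHz tone peaks in row 64.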
Slide 13
Spectrograms
Tradeoffs in parameters
- Larger STFT size: higher frequency resolution
Slide 14
Spectrograms
Tradeoffs in parameters
- Shorter step size: higher time resolution
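The tradeoff can be made concrete with a little arithmetic. For sample rate sr, an STFT of size n_fft spaces frequency bins sr / n_fft Hz apart and spans n_fft / sr seconds per frame, while a hop of `hop` samples advances each frame by hop / sr seconds. The 16 kHz rate and the parameter pairs below are hypothetical examples.

```python
def stft_resolution(sr, n_fft, hop):
    """Return (frequency bin spacing in Hz, time step in s, window length in s)."""
    return sr / n_fft, hop / sr, n_fft / sr


# Hypothetical settings for a 16 kHz recording
for n_fft, hop in [(256, 128), (1024, 128), (1024, 32)]:
    df, dt, win = stft_resolution(16000, n_fft, hop)
    print(f"n_fft={n_fft:5d} hop={hop:4d}: {df:7.3f} Hz/bin, "
          f"{dt * 1000:5.1f} ms/step, {win * 1000:5.1f} ms window")
```

At 16 kHz, `n_fft = 256` gives 62.5 Hz bins over a 16 ms window, while `n_fft = 1024` gives 15.625 Hz bins but a 64 ms window. A shorter hop samples time more densely, but each frame still averages over the full window length.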
Slide 15
Spectrogram Segments
- Each segment is a contiguous region
- Defined by a binary mask over the spectrogram
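One way to represent a segment, sketched with NumPy and SciPy under the assumption that spectrogram rows are frequency bins and columns are time frames: a boolean array the same shape as the spectrogram. `scipy.ndimage.label` (4-connectivity by default in 2D) can then confirm that the mask is a single contiguous region. Both helper names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def is_contiguous(mask):
    """True if the True pixels of a boolean mask form exactly one connected region."""
    _, n_regions = ndimage.label(mask)  # default structure: 4-connectivity
    return n_regions == 1


def segment_extent(mask):
    """Return (first_frame, last_frame, lowest_bin, highest_bin) covered by a segment."""
    rows, cols = np.nonzero(mask)  # rows = frequency bins, cols = time frames (assumed)
    return cols.min(), cols.max(), rows.min(), rows.max()
```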
Slide 16
Spectrogram Segments
- Can be converted back to audio with the inverse STFT, or left as 2D segments
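A sketch of converting a segment back to audio, assuming SciPy's `scipy.signal.stft` / `scipy.signal.istft` pair with a Hann window and 75% overlap (parameters chosen for illustration). STFT cells outside the mask are zeroed before inversion, so the mask must have the same shape as the STFT matrix. The function name is hypothetical.

```python
import numpy as np
from scipy import signal


def segment_to_audio(x, sr, mask, nperseg=512, noverlap=384):
    """Keep only the STFT cells inside a boolean mask and invert to a waveform.

    nperseg/noverlap are assumed values (Hann window, 75% overlap).
    """
    _, _, Z = signal.stft(x, fs=sr, nperseg=nperseg, noverlap=noverlap)
    _, y = signal.istft(Z * mask, fs=sr, nperseg=nperseg, noverlap=noverlap)
    return y[: len(x)]  # drop any padding added by the forward transform
```

For example, a mask that keeps only bins below 2.5 kHz recovers a 1 kHz tone from a mixture of 1 kHz and 4 kHz tones.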
Slide 17
Segmentation Methods
Per-Pixel Random Forest
- Trains on one feature vector per pixel
- Outputs a probability per pixel
Superpixel Merger Method
- First splits the spectrogram into superpixels
- Trains on one feature vector per superpixel
- A second classifier trains on pairs of superpixels
- Outputs connected sets of superpixels
Slide 18
Random Forest
- Supervised classifier
  - Trains on human-provided data with labels
  - Each example is a feature vector of values with a yes/no label
  - Learns to mimic the humans' labels
- Based on decision trees:
  - The tree is traversed with feature vector X
  - Each interior node is a decision of the form: if X_d < θ, go left; else go right
  - Each leaf node contains a class label
- In this case, two classes: Bird Sound and Negative
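The traversal rule can be sketched in a few lines of Python. The `Node` class and the two-feature example tree (pixel intensity and frequency in Hz) are hypothetical, chosen only to show how a feature vector X is routed from the root to a leaf label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: int = 0                 # index d into the feature vector X
    threshold: float = 0.0           # split value θ
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None      # set only on leaf nodes


def classify(node, x):
    """Route feature vector x from the root to a leaf and return its label."""
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label


# Hypothetical tree over X = (pixel intensity, frequency in Hz)
example_tree = Node(
    feature=0, threshold=0.5,
    left=Node(label="Negative"),
    right=Node(feature=1, threshold=2000.0,
               left=Node(label="Negative"),
               right=Node(label="Bird Sound")),
)
```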
Slide 19
Random Forest
- Constructed by a recursive procedure:
  1. Check whether all remaining examples have the same class; if so, finish with a leaf node
  2. Select a random subset of features
  3. For each one, find the optimal split (the largest decrease in Gini impurity)
  4. Choose the (feature, split) pair with the largest decrease and create a new interior node
  5. Split the examples and recursively create two child nodes
- Classification is a vote among all trees
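A compact, unoptimized sketch of this procedure in NumPy. Two details are assumptions drawn from common random-forest practice rather than from the slide: each tree trains on a bootstrap sample, and the random feature subset has about sqrt(number of features) members. Splits are scored by the decrease in Gini impurity 1 - Σ p_k², and a node that no candidate split improves becomes a majority-vote leaf. Labels are integers (for example 1 = Bird Sound, 0 = Negative); all function names are hypothetical.

```python
import numpy as np


def gini(y):
    """Gini impurity 1 - sum_k p_k^2 of an integer label array."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)


def best_split(X, y, features):
    """Return (feature, threshold, gain) with the largest Gini impurity decrease."""
    best, parent = (None, None, 0.0), gini(y)
    for d in features:
        for theta in np.unique(X[:, d])[1:]:  # both sides stay non-empty
            left = X[:, d] < theta
            n_l = left.sum()
            child = (n_l * gini(y[left]) + (len(y) - n_l) * gini(y[~left])) / len(y)
            if parent - child > best[2]:
                best = (d, theta, parent - child)
    return best


def build_tree(X, y, n_sub, rng):
    """Leaves are int labels; interior nodes are (d, theta, left, right) tuples."""
    if np.all(y == y[0]):  # all remaining examples share a class
        return int(y[0])
    features = rng.choice(X.shape[1], size=n_sub, replace=False)
    d, theta, _ = best_split(X, y, features)
    if d is None:  # no candidate split reduces impurity: majority-vote leaf
        return int(np.bincount(y).argmax())
    left = X[:, d] < theta
    return (d, theta,
            build_tree(X[left], y[left], n_sub, rng),
            build_tree(X[~left], y[~left], n_sub, rng))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        d, theta, left, right = tree
        tree = left if x[d] < theta else right
    return tree


def build_forest(X, y, n_trees=25, n_sub=None, seed=0):
    rng = np.random.default_rng(seed)
    n_sub = n_sub or max(1, int(np.sqrt(X.shape[1])))  # assumed subset size
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample (assumed)
        forest.append(build_tree(X[idx], y[idx], n_sub, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote among all trees."""
    return int(np.bincount([predict_tree(t, x) for t in forest]).argmax())
```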
Slide 20
Per-Pixel Training
- Hand-drawn mask over the spectrogram
- Pixels are randomly sampled
Slide 21
Per-Pixel Training
Feature vector includes:
- Pixel frequency
- Window variance
- All pixel values in the window
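A sketch of building the per-pixel training set, assuming the spectrogram `S` is a 2D array with frequency along rows and `mask` is the hand-drawn boolean mask of the same shape. The 7 × 7 window (`half=3`), reflect padding at the borders, the sample count, and the function names are hypothetical; the frequency feature is simply the row index.

```python
import numpy as np


def pixel_features(padded, r, c, half):
    """[frequency row, window variance, all window values] for pixel (r, c).

    padded is S padded by `half` on every side, so the window is centered on (r, c).
    """
    window = padded[r : r + 2 * half + 1, c : c + 2 * half + 1]
    return np.concatenate(([r, window.var()], window.ravel()))


def sample_training_set(S, mask, n_samples=1000, half=3, seed=0):
    """Randomly sample pixels; each label is the hand-drawn mask value there."""
    rng = np.random.default_rng(seed)
    padded = np.pad(S, half, mode="reflect")  # assumed border handling
    rows = rng.integers(0, S.shape[0], n_samples)
    cols = rng.integers(0, S.shape[1], n_samples)
    X = np.stack([pixel_features(padded, r, c, half) for r, c in zip(rows, cols)])
    y = mask[rows, cols].astype(int)
    return X, y
```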
Slide 22
Per-Pixel Output
- Probability mask over the spectrogram
- A threshold is applied to extract segments
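Extracting segments from the probability mask can be sketched as thresholding followed by connected-component labeling with `scipy.ndimage.label`. The 0.5 default threshold, the `min_pixels` filter for isolated specks, and the function name are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage


def extract_segments(prob, threshold=0.5, min_pixels=10):
    """Threshold a per-pixel probability map; return one boolean mask per segment.

    min_pixels is an assumed post-filter that drops specks too small to be a call.
    """
    labels, n = ndimage.label(prob >= threshold)
    masks = [labels == k for k in range(1, n + 1)]
    return [m for m in masks if m.sum() >= min_pixels]
```

Raising the threshold shrinks or splits regions (toward oversegmentation); lowering it lets neighboring sounds merge (toward undersegmentation).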
Slide 23
Per-Pixel Output
Slide 24
Slide 25
Slide 26
Per-Pixel Limitations
- Scope is limited to the window size
- A high threshold causes oversegmentation
- A low threshold causes undersegmentation
- Slow: must run the classifier on every pixel
Slide 27
Superpixel Method
- Begins with an initial pre-segmentation
- A modification of the Simple Linear Iterative Clustering (SLIC) image segmentation method
- Uses computed features that describe regions of the spectrogram
- Segments are sets of superpixels
Slide 28
Superpixel Clustering
Based on the SLIC method:
- Each pixel is assigned a 5-valued vector (x, y, L, a, b) for position and color
- Locally constrained k-means clustering
- Each centroid searches only a 2S × 2S neighborhood around itself, where S = sqrt(N/K) for N pixels and K superpixels
- Creates a set of regularly sized regions
- Some region boundaries follow the edges of larger objects in the image
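A minimal single-channel version of SLIC's locally constrained k-means, sketched in NumPy. It uses (x, y, intensity) in place of (x, y, L, a, b), seeds centers on a grid with spacing S = sqrt(N/K), and lets each center compete only for pixels in its 2S × 2S window, using SLIC's combined distance sqrt(d_c² + (d_s/S)² m²). The compactness weight `m`, the iteration count, and the function name are assumptions, and full SLIC's connectivity clean-up step is omitted.

```python
import numpy as np


def slic_gray(img, k=36, m=0.3, n_iter=5):
    """Grid-seeded, locally constrained k-means superpixels on a 2D array.

    Assumes intensities roughly in [0, 1]; m (assumed) trades compactness
    against intensity similarity. Returns an integer label per pixel.
    """
    h, w = img.shape
    S = max(1, int(np.sqrt(h * w / k)))              # grid interval S = sqrt(N / K)
    ys, xs = np.mgrid[S // 2 : h : S, S // 2 : w : S]
    centers = np.stack([ys.ravel(), xs.ravel(), img[ys, xs].ravel()], axis=1).astype(float)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = -np.ones((h, w), dtype=int)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        for i, (cy, cx, cv) in enumerate(centers):
            # each center only searches a 2S x 2S window around itself
            y0, y1 = max(int(cy) - S, 0), min(int(cy) + S, h)
            x0, x1 = max(int(cx) - S, 0), min(int(cx) + S, w)
            dc = img[y0:y1, x0:x1] - cv
            ds = np.hypot(yy[y0:y1, x0:x1] - cy, xx[y0:y1, x0:x1] - cx)
            d = np.sqrt(dc ** 2 + (ds / S * m) ** 2)
            closer = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][closer] = d[closer]
            labels[y0:y1, x0:x1][closer] = i
        for i in range(len(centers)):  # move each center to the mean of its pixels
            sel = labels == i
            if sel.any():
                centers[i] = [yy[sel].mean(), xx[sel].mean(), img[sel].mean()]
    return labels
```

On an image with a sharp vertical edge, no superpixel crosses the edge, which is the edge-following behaviour SLIC relies on.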
Slide 29
Superpixel Clustering
- Oversegments an image
- Cluster boundaries lie along image edges
- But this does not work well for spectrograms
Slide 30
Superpixel Clustering
- Spectrograms lack well-defined edges
- They also have only one intensity channel rather than three color channels
- Instead of (x, y, L, a, b), we use a new vector: (x, y, B, V, G_x, G_y, P_x, P_y)
Slide 31
Superpixel Clustering
- x, y: time and frequency values in the spectrogram
- B, V: pixel value after Gaussian blur; variance of pixel values
- G_x, G_y: horizontal/vertical Sobel gradient values
- P_x, P_y: time and frequency values of the nearest peak (weighted by a Gaussian kernel)
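A sketch of computing the eight per-pixel values with `scipy.ndimage`, assuming columns are time and rows are frequency. The blur width, variance window, and peak neighborhood are hypothetical parameters. Peaks are assumed to be local maxima of the blurred spectrogram above its mean, and the nearest peak is found with a Euclidean distance transform; the Gaussian weighting of the peak position described on the slide is not reproduced here.

```python
import numpy as np
from scipy import ndimage


def superpixel_pixel_features(S, sigma=2.0, var_size=5, peak_size=9):
    """Per-pixel vectors (x, y, B, V, G_x, G_y, P_x, P_y); output shape (h, w, 8).

    sigma, var_size and peak_size are assumed values.
    """
    h, w = S.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)        # y = frequency row, x = time column
    B = ndimage.gaussian_filter(S, sigma)          # blurred pixel values
    mean = ndimage.uniform_filter(S, var_size)
    V = np.maximum(ndimage.uniform_filter(S ** 2, var_size) - mean ** 2, 0.0)  # local variance
    Gx = ndimage.sobel(B, axis=1)                  # horizontal (time) gradient
    Gy = ndimage.sobel(B, axis=0)                  # vertical (frequency) gradient
    # Assumed peak definition: local maxima of B that are above its mean
    peaks = (B == ndimage.maximum_filter(B, peak_size)) & (B > B.mean())
    # Indices of the nearest peak pixel (zeros of ~peaks) for every pixel
    _, (py, px) = ndimage.distance_transform_edt(~peaks, return_indices=True)
    return np.stack([x, y, B, V, Gx, Gy, px.astype(float), py.astype(float)], axis=-1)
```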
Slide 32
Superpixel Clustering
Slide 33
Foreground/Background Classifier
- Random forest trained using the same manual spectrogram labels as the per-pixel method
- Each superpixel is labeled positive (foreground) if more than 10% of its area overlaps a positive-labeled region
- Feature vector describes the superpixel:
  - Mean and variance of pixel values, blurred pixel values, and peak frequencies
  - Histogram of Oriented Gradients (HOG)
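A sketch of the 10% labeling rule and two of the listed features, assuming `labels` is an integer superpixel map (ids 0..K-1) with the same shape as the spectrogram and `manual_mask` is the hand-drawn boolean mask. Blurred values, peak frequencies, and HOG are omitted to keep it short; the function names are hypothetical.

```python
import numpy as np


def label_superpixels(labels, manual_mask, min_overlap=0.10):
    """1 (foreground) if more than min_overlap of a superpixel's area is labeled positive."""
    n = labels.max() + 1
    area = np.bincount(labels.ravel(), minlength=n)
    positive = np.bincount(labels.ravel(), weights=manual_mask.ravel().astype(float), minlength=n)
    return (positive / np.maximum(area, 1) > min_overlap).astype(int)


def superpixel_stats(S, labels):
    """Per-superpixel [mean, variance] of pixel values (part of the feature vector)."""
    n = labels.max() + 1
    area = np.maximum(np.bincount(labels.ravel(), minlength=n), 1)
    s1 = np.bincount(labels.ravel(), weights=S.ravel(), minlength=n)
    s2 = np.bincount(labels.ravel(), weights=S.ravel() ** 2, minlength=n)
    mean = s1 / area
    return np.stack([mean, s2 / area - mean ** 2], axis=1)
```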
Slide 34
Foreground/Background Classifier
Slide 35
Superpixel Merger Classifier
- Random forest trained to classify pairs of adjacent superpixels
  - Positive classification: merge together
  - Negative classification: split apart
- After background superpixels are discarded, all remaining edges between superpixels are classified
- All edges scoring above a threshold are merged
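The merge step can be sketched with a union-find structure. Here `merge_prob` stands in for the pair classifier's output (a hypothetical dict from superpixel pairs to probabilities), `foreground` is a boolean array from the foreground/background classifier, and the 0.5 threshold is an assumption. Two superpixels count as adjacent when they share a horizontal or vertical pixel border.

```python
import numpy as np


def adjacent_pairs(labels):
    """Set of (a, b) superpixel pairs, a < b, that share a pixel border."""
    pairs = set()
    # compare each pixel with its right neighbour, then with the one below
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        differ = a != b
        for u, v in zip(a[differ], b[differ]):
            pairs.add((int(min(u, v)), int(max(u, v))))
    return pairs


def merge_superpixels(labels, foreground, merge_prob, threshold=0.5):
    """Union foreground superpixels whose pair probability exceeds threshold.

    Returns a dict mapping each foreground superpixel id to its segment id.
    """
    parent = {int(i): int(i) for i in np.flatnonzero(foreground)}  # background dropped

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in adjacent_pairs(labels):
        if a in parent and b in parent and merge_prob.get((a, b), 0.0) > threshold:
            parent[find(a)] = find(b)
    return {i: find(i) for i in parent}
```

Each resulting root id names one output segment: a connected set of foreground superpixels.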
Slide 36
Superpixel Merger Classifier
Slide 37
Superpixel Method Output
Slide 38
Slide 39
Slide 40
Slide 41
Slide 42
Slide 43
Evaluation Datasets
HJ Andrews dataset: 625 recordings
- Each 15 seconds long
- Drawn 2 each from 24 hours
Set A dataset: 166 recordings
- All from early and mid morning
- Paired by year, 2009/2010
Slide 44
Differences in Training Data
Slide 45
Results
Slide 46
Results
Slide 47
Results
Slide 48
Results
Slide 49
Future Work
- Superpixel method is promising
  - Faster than per-pixel classification
- Could use a more sophisticated merger technique
Slide 50
Bibliography
A. Härmä, "Automatic identification of bird species based on sinusoidal modeling of syllables," in IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2003, pp. 545-548.
C.-H. Lee, C.-C. Han, and C.-C. Chuang, "Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1541-1550, 2008.
L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
S. Fagerlund, "Automatic Recognition of Bird Species by Their Sounds," Master's thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Nov. 8, 2004.