Detection and Segmentation of Bird Song in Noisy Environments Lawrence Neal, UHC Honors Thesis


Page 1: Detection and Segmentation of Bird Song in Noisy Environments

Detection and Segmentation of Bird Song in Noisy Environments

Lawrence Neal, UHC Honors Thesis

Page 2: Detection and Segmentation of Bird Song in Noisy Environments

Bioacoustics Project

Bird Species
◦ Identifiable by species
◦ Presence/Absence, activity data is useful

Bird activity may shift in response to climate change, ecological factors

Page 3: Detection and Segmentation of Bird Song in Noisy Environments

Bioacoustics Project

Page 4: Detection and Segmentation of Bird Song in Noisy Environments

Automated Recording

Song Meter automated recorders

Collected May-August beginning 2009

Page 5: Detection and Segmentation of Bird Song in Noisy Environments

Audio Data Analysis

Involves several steps:
◦ Extracting Bird Sound from Audio
◦ Identifying Bird Species
◦ Mapping species data back to sites

Page 6: Detection and Segmentation of Bird Song in Noisy Environments

Audio Data Analysis

Involves several steps:
◦ Extracting Bird Sound from Audio (Segmentation)
◦ Identifying Bird Species
◦ Mapping species data back to sites

Page 7: Detection and Segmentation of Bird Song in Noisy Environments

Segmentation

Time-Domain Segmentation
◦ Separates audio into multiple clips
◦ Energy Thresholding, Onset/Offset Detection
◦ Has been applied to bird song (Harma 2003, Fagerlund 2004, Lee 2008)
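A minimal sketch of energy thresholding with onset/offset detection, written in plain Python for illustration. The function name `energy_segments`, the frame length, and the threshold value are assumptions for the example, not parameters taken from the thesis or from the cited papers.

```python
def energy_segments(samples, frame_len, threshold):
    """Return (start, end) sample ranges whose mean frame energy exceeds threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len                     # onset: energy rises above threshold
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))   # offset: energy falls back below
            start = None
    if start is not None:                             # clip still active at end of audio
        segments.append((start, n_frames * frame_len))
    return segments


# Toy signal: silence, a burst of sound, silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
clips = energy_segments(signal, frame_len=4, threshold=0.1)
print(clips)  # [(8, 16)]
```

Because each clip is only a time range, two sounds that occur at the same time but at different frequencies end up in the same clip.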

Page 8: Detection and Segmentation of Bird Song in Noisy Environments

Segmentation

Time-Domain Segmentation

Page 9: Detection and Segmentation of Bird Song in Noisy Environments

Segmentation

Time-Domain Segmentation
◦ Cannot separate overlapping sounds

Page 10: Detection and Segmentation of Bird Song in Noisy Environments

Segmentation

Time-Frequency Segmentation
◦ Segment regions of the 2D spectrogram

Page 11: Detection and Segmentation of Bird Song in Noisy Environments

Segmentation

Spectrogram Segmentation
◦ Similar to image segmentation

Page 12: Detection and Segmentation of Bird Song in Noisy Environments

Spectrograms

Two-dimensional representation of sound
◦ Audio amplitude at each (time, frequency)
◦ Generated by short-time Fourier Transform

Figure: Male voice saying 'nineteenth century'.

Figure: Violin playing (note harmonics)
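To make the short-time Fourier transform concrete, here is a small pure-Python sketch that computes a magnitude spectrogram with a direct DFT on each windowed frame. The function name `stft_magnitude`, the Hann window, and the toy sizes are assumptions for illustration; a practical implementation would use an FFT library.

```python
import cmath
import math


def stft_magnitude(samples, n_fft, hop):
    """Return spec[frame][bin]: magnitude of each frequency bin in each frame."""
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        # Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            samples[start + n] * (0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft))
            for n in range(n_fft)
        ]
        row = []
        for k in range(n_fft // 2 + 1):   # non-negative frequencies only
            acc = sum(
                windowed[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# A pure tone that completes 2 cycles per 8-sample frame lands in bin 2.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft_magnitude(tone, n_fft=8, hop=4)
peak_bins = [row.index(max(row)) for row in spec]
```

Each row of `spec` is one time step and each column one frequency bin, which is the 2D grid the later segmentation slides treat like an image.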

Page 13: Detection and Segmentation of Bird Song in Noisy Environments

Spectrograms

Tradeoffs in parameters
◦ Larger STFT size
◦ Higher freq. resolution

Page 14: Detection and Segmentation of Bird Song in Noisy Environments

Spectrograms

Tradeoffs in parameters
◦ Shorter step size
◦ Higher time resolution
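The two tradeoffs can be put in numbers: frequency bins are spaced `sample_rate / n_fft` apart, and successive frames are `hop / sample_rate` seconds apart. The sketch below assumes a 16 kHz sample rate and specific sizes purely for illustration; the slides do not state the values used.

```python
def stft_resolution(sample_rate, n_fft, hop):
    """Return (frequency bin spacing in Hz, time between frames in seconds)."""
    freq_res_hz = sample_rate / n_fft
    time_step_s = hop / sample_rate
    return freq_res_hz, time_step_s


# Hypothetical settings, chosen only to show the direction of each tradeoff.
coarse = stft_resolution(16000, n_fft=256, hop=128)
finer_freq = stft_resolution(16000, n_fft=512, hop=128)   # larger STFT size
finer_time = stft_resolution(16000, n_fft=256, hop=64)    # shorter step size
```

Doubling the STFT size halves the bin spacing, and halving the step size halves the time between spectrogram columns.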

Page 15: Detection and Segmentation of Bird Song in Noisy Environments

Spectrogram Segments

Each segment is a continuous region
◦ Defined by a binary mask over the spectrogram

Page 16: Detection and Segmentation of Bird Song in Noisy Environments

Spectrogram Segments

Can be converted back to audio with inverse STFT, or left as 2D segments

Page 17: Detection and Segmentation of Bird Song in Noisy Environments

Segmentation Methods

Per-Pixel Random Forest
◦ Trains on one feature vector per pixel
◦ Outputs probability per-pixel

Superpixel Merger Method
◦ First splits spectrogram into ‘superpixels’
◦ Trains on one feature vector per superpixel
◦ Second classifier trains per superpixel pair
◦ Outputs connected sets of superpixels

Page 18: Detection and Segmentation of Bird Song in Noisy Environments

Random Forest

Supervised Classifier
◦ Trains on human-provided data with labels: a “Feature Vector” of values, each with a yes/no label
◦ Learns to mimic the human’s labels

Based on decision trees:
◦ Tree is traversed with feature vector X
◦ Each interior node is a decision of the type: If (Xd < θ) go left; else go right
◦ Each leaf node contains a class label (in this case, two classes: ‘Bird Sound’ and ‘Negative’)
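The traversal rule above can be sketched as follows. The `Node` class, its field names, and the example tree are hypothetical and exist only to illustrate the "if X[d] < θ go left, else go right" decision.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    label: Optional[str] = None      # set only on leaf nodes
    feature: int = 0                 # index d of the feature tested at this node
    theta: float = 0.0               # split threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf and return that leaf's class label."""
    while node.label is None:
        node = node.left if x[node.feature] < node.theta else node.right
    return node.label


# Hand-built two-level tree (illustrative thresholds, not learned values).
tree = Node(
    feature=0, theta=0.5,
    left=Node(label="Negative"),
    right=Node(
        feature=1, theta=2.0,
        left=Node(label="Bird Sound"),
        right=Node(label="Negative"),
    ),
)
```

Here `classify(tree, [0.9, 1.0])` follows the right branch at the root and the left branch below it, ending at the 'Bird Sound' leaf.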

Page 19: Detection and Segmentation of Bird Song in Noisy Environments

Random Forest

Constructed by recursive procedure
◦ Check if all remaining examples have the same label; if so, finish with a leaf node
◦ Select a random subset of features; for each one, find the optimal split (largest decrease in Gini impurity)
◦ Choose the (feature, split) pair with the largest decrease in Gini impurity and create a new interior node
◦ Split the examples and recursively create two child nodes

Classification is a vote among all trees
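A sketch of the split search for one feature, reading the slide's Gini criterion as the decrease in Gini impurity from parent to children (the usual random forest formulation; that reading is an assumption here). The function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (theta, gain) maximizing the impurity decrease for one feature."""
    n = len(labels)
    parent = gini(labels)
    best_theta, best_gain = None, 0.0
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        theta = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v < theta]
        right = [y for v, y in zip(values, labels) if v >= theta]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_theta, best_gain = theta, gain
    return best_theta, best_gain


# Four examples of one feature; labels 1 = 'Bird Sound', 0 = 'Negative'.
theta, gain = best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

In the full procedure this search would be repeated for each feature in the random subset, and the (feature, split) pair with the largest gain would become the new interior node.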

Page 20: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Training

Hand-Drawn mask over spectrogram
◦ Pixels are randomly sampled

Page 21: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Training

Feature vector includes:
◦ Pixel Frequency
◦ Window Variance
◦ All window pixel values
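One way to assemble the three listed features for a single pixel is sketched below. The window size, the clamping of the window at the spectrogram border, and the ordering of the vector are assumptions; the slides do not specify them.

```python
def pixel_features(spec, f, t, half):
    """Feature vector for pixel spec[f][t] using a (2*half+1)^2 window.

    Layout (assumed): [pixel frequency index, window variance, window values...].
    """
    n_freq, n_time = len(spec), len(spec[0])
    window = []
    for df in range(-half, half + 1):
        for dt in range(-half, half + 1):
            # Clamp coordinates so border pixels still get a full window.
            ff = min(max(f + df, 0), n_freq - 1)
            tt = min(max(t + dt, 0), n_time - 1)
            window.append(spec[ff][tt])
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    return [float(f), variance] + window


# 3x3 toy spectrogram indexed as spec[frequency][time].
toy_spec = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
features = pixel_features(toy_spec, f=1, t=1, half=1)
```

With a 3x3 window the vector has 11 entries: the frequency index, the variance, and the nine window values.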

Page 22: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Output

Probability Mask over the spectrogram

Threshold is applied to extract segments
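A sketch of the thresholding step: pixels at or above the threshold are kept, and 4-connected groups of kept pixels become segments. The connectivity choice, the function name, and the toy probabilities are assumptions for illustration.

```python
from collections import deque


def extract_segments(prob, threshold):
    """Label 4-connected regions where prob >= threshold.

    Returns (labels, n_segments); labels[r][c] is 0 for background.
    """
    rows, cols = len(prob), len(prob[0])
    labels = [[0] * cols for _ in range(rows)]
    n_segments = 0
    for r in range(rows):
        for c in range(cols):
            if prob[r][c] < threshold or labels[r][c]:
                continue
            n_segments += 1
            labels[r][c] = n_segments
            queue = deque([(r, c)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and prob[ny][nx] >= threshold):
                        labels[ny][nx] = n_segments
                        queue.append((ny, nx))
    return labels, n_segments


# Toy probability mask: one sound with a weaker middle.
prob = [
    [0.9, 0.6, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
_, n_low = extract_segments(prob, threshold=0.5)
_, n_high = extract_segments(prob, threshold=0.8)
```

On this toy mask the lower threshold keeps the region whole, while the higher one splits it in two, so the segment count depends directly on the threshold.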

Page 23: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Output

Page 24: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Output

Page 25: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Output

Page 26: Detection and Segmentation of Bird Song in Noisy Environments

Per-Pixel Limitations

Scope is limited to window size

High threshold causes oversegmentation

Low threshold causes undersegmentation

Slow: must classify every pixel

Page 27: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method

Begins with an initial pre-segmentation
◦ Modification of Simple Linear Iterative Clustering (SLIC) image segmentation
◦ Uses computed features that describe regions of the spectrogram

Segments are sets of superpixels

Page 28: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Clustering

Based on SLIC method:
◦ Each pixel is assigned a 5-valued vector (X, Y, L, a, b) for position and color

Locally-constrained K-Means Clustering
◦ Each centroid searches only a radius of 2S, where S = sqrt(N/K) for N pixels and K superpixels

Creates a set of regularly-sized regions
◦ Some regions’ boundaries follow the edges of larger objects in the image
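The grid spacing S = sqrt(N/K) and the local search constraint can be sketched as below. Placing the initial centroids on a regular grid follows the standard SLIC description; the function names are hypothetical, and the 2S limit is applied per axis here as one reading of the slide.

```python
import math


def slic_grid(height, width, k):
    """Return (S, centers): grid spacing and initial centroid positions."""
    s = math.sqrt(height * width / k)          # S = sqrt(N / K)
    step = max(1, int(round(s)))
    centers = [
        (y, x)
        for y in range(step // 2, height, step)
        for x in range(step // 2, width, step)
    ]
    return s, centers


def in_search_window(center, pixel, s):
    """True if pixel lies within 2S of the centroid along both axes."""
    return (abs(pixel[0] - center[0]) <= 2 * s
            and abs(pixel[1] - center[1]) <= 2 * s)


# 100 x 100 image split into about 25 superpixels.
s, centers = slic_grid(100, 100, k=25)
```

Restricting each centroid to nearby pixels is what keeps the resulting regions roughly regular in size, and it avoids comparing every pixel against every centroid.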

Page 29: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Clustering

Over-segments an image
◦ Edges of clusters are along image edges

But, doesn’t work for spectrograms

Page 30: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Clustering

Spectrograms lack edges
◦ Also, only one channel of color

Instead of (x, y, L, a, b), we use a new vector:
◦ (x, y, B, V, Gx, Gy, Px, Py)

Page 31: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Clustering

X, Y values
◦ Time and frequency values in the spectrogram

B, V
◦ Pixel values after Gaussian blur, variance of pixel values

Gx, Gy
◦ Horizontal/Vertical Sobel Gradient values

Px, Py
◦ Time and Frequency values of nearest peak (weighted by Gaussian kernel)
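As an illustration of the Gx and Gy components, the sketch below evaluates the standard 3x3 Sobel kernels at one pixel. The border clamping and the function name are assumptions; the blur, variance, and nearest-peak features are not shown.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to horizontal change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to vertical change


def sobel_at(img, y, x):
    """Return (Gx, Gy) at img[y][x], clamping coordinates at the border."""
    gx = gy = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), len(img) - 1)
            xx = min(max(x + dx, 0), len(img[0]) - 1)
            gx += SOBEL_X[dy + 1][dx + 1] * img[yy][xx]
            gy += SOBEL_Y[dy + 1][dx + 1] * img[yy][xx]
    return gx, gy


# Toy spectrogram patch where intensity steps up along the time (x) axis.
patch = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
gx, gy = sobel_at(patch, y=1, x=1)
```

A sound onset shows up as a large Gx with Gy near zero, which gives the clustering a gradient cue that raw intensity alone lacks.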

Page 32: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Clustering

Page 33: Detection and Segmentation of Bird Song in Noisy Environments

Foreground/Background Classifier

Random Forest trained using the same manual spectrogram labels as per-pixel
◦ Each superpixel is labeled positive (foreground) if more than 10% of its area overlaps with a positive-labeled region

Feature vector describes superpixel:
◦ Mean and variance of pixel values, blurred pixel values, peak frequencies
◦ Histogram of Oriented Gradients
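The 10% overlap rule for deriving superpixel training labels can be sketched as follows. The data layout (a list of pixel coordinates per superpixel and a boolean mask) and the function name are assumptions for illustration.

```python
def superpixel_is_foreground(pixels, positive_mask, min_overlap=0.10):
    """True if more than min_overlap of the superpixel lies in the positive mask.

    pixels: list of (y, x) coordinates belonging to one superpixel.
    positive_mask: positive_mask[y][x] is True where the hand-drawn label is positive.
    """
    overlap = sum(1 for (y, x) in pixels if positive_mask[y][x])
    return overlap / len(pixels) > min_overlap


# A 20-pixel superpixel on one row of a toy mask.
superpixel = [(0, x) for x in range(20)]
mask_three = [[x < 3 for x in range(20)]]   # 3 of 20 pixels positive (15%)
mask_two = [[x < 2 for x in range(20)]]     # 2 of 20 pixels positive (exactly 10%)
```

With the strict "more than 10%" comparison, the 15% case is labeled foreground and the exact 10% case is not.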

Page 34: Detection and Segmentation of Bird Song in Noisy Environments

Foreground/Background Classifier

Page 35: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Merger Classifier

Random Forest trained to classify pairs of adjacent superpixels
◦ Positive classification: Merge together
◦ Negative classification: Split apart

After background pixels are discarded, all remaining edges between superpixels are classified
◦ All edges above a threshold are merged
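One simple way to realize the merge step is a union-find pass over the classified edges, sketched below. The union-find choice, the function name, and the edge scores are assumptions; the slides only state that edges above a threshold are merged.

```python
def merge_superpixels(foreground_ids, edge_scores, threshold):
    """Group foreground superpixels connected by edges scoring above threshold.

    edge_scores: dict mapping (a, b) pairs of adjacent superpixel ids to the
    merger classifier's score for that edge.
    """
    parent = {i: i for i in foreground_ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for (a, b), score in edge_scores.items():
        # Edges touching a discarded background superpixel are ignored.
        if a in parent and b in parent and score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in foreground_ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values())


# Four foreground superpixels in a chain; the middle edge scores low.
scores = {(1, 2): 0.9, (2, 3): 0.2, (3, 4): 0.7}
segments = merge_superpixels([1, 2, 3, 4], scores, threshold=0.5)
```

Each returned group is one output segment, that is, a connected set of superpixels.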

Page 36: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Merger Classifier

Page 37: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method Output

Page 38: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method Output

Page 39: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method Output

Page 40: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method Output

Page 41: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method Output

Page 42: Detection and Segmentation of Bird Song in Noisy Environments

Superpixel Method Output

Page 43: Detection and Segmentation of Bird Song in Noisy Environments

Evaluation Datasets

HJ Andrews dataset, 625 recordings
◦ Each 15 seconds long
◦ Drawn 2 each from 24 hours

“Set A” dataset, 166 recordings
◦ All from early and mid morning
◦ Paired by year, 2009/2010

Page 44: Detection and Segmentation of Bird Song in Noisy Environments

Differences in Training Data

Page 45: Detection and Segmentation of Bird Song in Noisy Environments

Results

Page 46: Detection and Segmentation of Bird Song in Noisy Environments

Results

Page 47: Detection and Segmentation of Bird Song in Noisy Environments

Results

Page 48: Detection and Segmentation of Bird Song in Noisy Environments

Results

Page 49: Detection and Segmentation of Bird Song in Noisy Environments

Future Work

Superpixel Method is promising
◦ Faster than per-pixel classification
◦ Could use more sophisticated merger technique

Page 50: Detection and Segmentation of Bird Song in Noisy Environments

Bibliography

A. Harma, “Automatic identification of bird species based on sinusoidal modeling of syllables,” in IEEE International Conference on Acoustics Speech and Signal Processing, April 2003, pp. 545–548.

Chang-Hsing Lee, Chin-Chuan Han, and Ching-Chien Chuang, “Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1541–1550, 2008.

Leo Breiman, “Random forests,” Machine Learning, pp. 5–32, January 2001.

Seppo Fagerlund, “Automatic Recognition of Bird Species by Their Sounds,” Master’s thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Nov. 8, 2004.