Detection and Segmentation of Bird Song in Noisy Environments
Lawrence Neal, UHC Honors Thesis

  • Slide 2
  • Bioacoustics Project: Bird sounds are identifiable by species. Presence/absence and activity data are useful: bird activity may shift in response to climate change and other ecological factors.
  • Slide 3
  • Bioacoustics Project
  • Slide 4
  • Automated Recording: Song Meter automated recorders; recordings collected May–August, beginning in 2009.
  • Slide 5
  • Audio Data Analysis involves several steps: (1) extracting bird sound from audio, (2) identifying bird species, and (3) mapping species data back to sites.
  • Slide 6
  • Audio Data Analysis involves several steps: (1) extracting bird sound from audio (segmentation), (2) identifying bird species, and (3) mapping species data back to sites.
  • Slide 7
  • Segmentation: Time-Domain Segmentation. Separates audio into multiple clips, using methods such as energy thresholding and onset/offset detection. Has been applied to bird song (Härmä 2003; Fagerlund 2004; Lee 2008).
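A minimal sketch of time-domain energy thresholding, assuming a mono signal given as a list of floats. The frame length, hop, and the -30 dB threshold (measured relative to the loudest frame) are illustrative choices, not values from the thesis, and `frame_energies` / `energy_segments` are hypothetical helpers. A frame that rises above the threshold marks an onset; the first frame that falls back below it marks the offset.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Mean-square energy of each analysis frame."""
    return [sum(s * s for s in samples[start:start + frame_len]) / frame_len
            for start in range(0, len(samples) - frame_len + 1, hop)]

def energy_segments(samples, frame_len=256, hop=128, threshold_db=-30.0):
    """Return (start, end) sample ranges whose frame energy lies within
    threshold_db of the loudest frame (illustrative parameters)."""
    energies = frame_energies(samples, frame_len, hop)
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []
    clips, onset = [], None
    for i, e in enumerate(energies):
        active = e > 0 and 10 * math.log10(e / peak) > threshold_db
        if active and onset is None:
            onset = i                                          # onset: energy rises above threshold
        elif not active and onset is not None:
            clips.append((onset * hop, (i - 1) * hop + frame_len))  # offset
            onset = None
    if onset is not None:                                      # sound runs to the end of the signal
        clips.append((onset * hop, (len(energies) - 1) * hop + frame_len))
    return clips
```

Measuring energy relative to the loudest frame makes the threshold independent of recording gain. Because each clip spans all frequencies, two sounds that overlap in time end up in one clip.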
  • Slide 8
  • Segmentation: Time-Domain Segmentation
  • Slide 9
  • Segmentation: Time-domain segmentation cannot separate overlapping sounds.
  • Slide 10
  • Segmentation: Time-Frequency Segmentation. Segments regions of the 2D spectrogram.
  • Slide 11
  • Segmentation: Spectrogram Segmentation. Similar to image segmentation.
  • Slide 12
  • Spectrograms: A two-dimensional representation of sound, giving the audio amplitude at each (time, frequency) point, generated by the short-time Fourier transform (STFT). Examples: a male voice saying 'nineteenth century'; a violin playing (note the harmonics).
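The sketch below computes a magnitude spectrogram with a short-time Fourier transform. It assumes a periodic Hann window and uses a naive DFT so that it runs with only the standard library; a practical pipeline would use an FFT such as `numpy.fft.rfft`. The names `hann` and `stft_magnitude` and the default `n_fft`/`hop` values are illustrative, not taken from the thesis.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft_magnitude(samples, n_fft=256, hop=128):
    """Return frames[t][f]: STFT magnitude at frame t and frequency bin f.

    Bins run from DC (f = 0) to Nyquist (f = n_fft // 2). A naive DFT is used
    for clarity; it costs O(n_fft^2) per frame."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(n_fft)]   # windowed frame
        frames.append([abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                               for i in range(n_fft)))
                       for k in range(n_fft // 2 + 1)])
    return frames
```

For a 1 kHz tone sampled at 8 kHz with `n_fft=64`, bins are 125 Hz apart, so every frame peaks in bin 8.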
  • Slide 13
  • Spectrograms: Tradeoffs in parameters. A larger STFT size gives higher frequency resolution, at the cost of time resolution.
  • Slide 14
  • Spectrograms: Tradeoffs in parameters. A shorter step size gives higher time resolution, at the cost of more frames to compute.
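The two tradeoffs follow from simple arithmetic. With sample rate f_s, STFT size N, and step (hop) H, adjacent frequency bins are f_s / N Hz apart, successive frames are H / f_s seconds apart, and each frame summarizes N / f_s seconds of audio. The 16 kHz sample rate and the (N, H) pairs below are assumptions for illustration only.

```python
def stft_resolution(sample_rate, n_fft, hop):
    """Return (bin spacing in Hz, frame step in s, window length in s)."""
    return sample_rate / n_fft, hop / sample_rate, n_fft / sample_rate

for n_fft, hop in [(256, 128), (512, 128), (512, 64)]:
    df, dt, win = stft_resolution(16000, n_fft, hop)
    print(f"N={n_fft:4d} H={hop:4d}: {df:6.2f} Hz/bin, "
          f"{dt * 1000:4.1f} ms/frame, window {win * 1000:4.1f} ms")
```

Doubling N from 256 to 512 halves the bin spacing (62.5 Hz to 31.25 Hz) but doubles the window from 16 ms to 32 ms; halving H from 128 to 64 halves the frame step from 8 ms to 4 ms and doubles the number of frames.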
  • Slide 15
  • Spectrogram Segments: Each segment is a contiguous region, defined by a binary mask over the spectrogram.
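One way to turn a binary mask into separate segments is connected-component labeling: each maximal group of touching foreground cells becomes one segment. This sketch assumes 4-connectivity (up, down, left, right neighbors); the thesis slides do not say whether 4- or 8-connectivity is used, and `label_segments` is a hypothetical helper.

```python
from collections import deque

def label_segments(mask):
    """Label 4-connected regions of truthy cells in a 2D mask.

    Returns (labels, count): labels[r][c] is 0 for background and
    1..count for the segment the cell belongs to."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1                       # start a new segment
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                     # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```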
  • Slide 16
  • Spectrogram Segments: Can be converted back to audio with the inverse STFT, or left as 2D segments.
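A sketch of converting a masked spectrogram segment back to audio. It assumes a periodic Hann analysis window, a mask over the non-negative-frequency bins that is mirrored onto the negative frequencies so the output stays real, and overlap-add resynthesis normalized by the summed window. Naive DFTs keep it dependency-free; `stft` and `masked_istft` are hypothetical names, not the thesis's code.

```python
import cmath
import math

def stft(samples, n_fft, hop):
    """Complex STFT with a periodic Hann window; frames[t][k] for k in 0..n_fft-1."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(n_fft)]
        frames.append([sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft))
                       for k in range(n_fft)])
    return frames, window

def masked_istft(frames, window, mask, hop, length):
    """Keep STFT cells where mask[t][k] is True, then resynthesize by overlap-add.

    mask[t] covers the n_fft // 2 + 1 non-negative-frequency bins; bin k > n_fft // 2
    reuses the mask of its mirror bin n_fft - k so the result stays real-valued."""
    n_fft = len(window)
    out = [0.0] * length
    norm = [0.0] * length
    for t, spectrum in enumerate(frames):
        kept = [spectrum[k] if mask[t][min(k, n_fft - k)] else 0j for k in range(n_fft)]
        start = t * hop
        for i in range(n_fft):
            # Unmasked, this inverse-DFT sample equals window[i] * x[start + i].
            value = sum(kept[k] * cmath.exp(2j * math.pi * k * i / n_fft) for k in range(n_fft)) / n_fft
            out[start + i] += value.real
            norm[start + i] += window[i]
    # Divide by the summed analysis windows to undo the windowing.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]
```

Keeping every cell reproduces the input (except the first sample, where the Hann window is zero); keeping only low-frequency bins isolates a low tone from a simultaneous high one, which a time-domain clip cannot do.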
  • Slide 17
  • Segmentation Methods: (1) Per-Pixel Random Forest: trains on one feature vector per pixel and outputs a probability per pixel. (2) Superpixel Merger Method: first splits the spectrogram into superpixels and trains on one feature vector per superpixel; a second classifier trains on superpixel pairs. Outputs connected sets of superpixels.
  • Slide 18
  • Random Forest: A supervised classifier. It trains on human-provided labeled data (feature vectors of values, each with a yes/no label) and learns to mimic the human labels. It is based on decision trees: each tree is traversed with a feature vector X, and each interior node is a decision of the form "if X_d < θ, go left; else go right", where d is a feature index and θ a learned threshold. Each leaf node contains a class label; in this case there are two classes, Bird Sound and Negative.
  • Slide 19
  • Random Forest: Each tree is constructed by a recursive procedure. Check whether all remaining examples have the same label; if so, finish with a leaf node. Otherwise, select a random subset of features and, for each one, find the optimal split (the largest decrease in Gini impurity). Choose the (feature, split) pair with the largest decrease and create a new interior node. Split the examples and recursively create two child nodes. Classification is a vote among all trees.
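The construction and traversal described on the last two slides can be sketched in pure Python. This is an illustrative toy, not the thesis's implementation: it assumes Breiman-style bootstrap sampling for each tree, tries every midpoint between sorted feature values as a candidate threshold θ, and stores trees as nested `(feature, threshold, left, right)` tuples with string class labels at the leaves. If none of the randomly chosen features can split a node, the node becomes a majority-label leaf (an assumption). All function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 (0.0 for a pure or empty set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y, features):
    """Return (gini_decrease, feature, threshold) for the best split among
    `features`, or None if no candidate feature takes two distinct values."""
    parent = gini(y)
    best = None
    for d in features:
        values = sorted({row[d] for row in X})
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2                       # midpoint between observed values
            left = [lab for row, lab in zip(X, y) if row[d] < theta]
            right = [lab for row, lab in zip(X, y) if row[d] >= theta]
            decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or decrease > best[0]:
                best = (decrease, d, theta)
    return best

def build_tree(X, y, n_sub, rng):
    """Leaf = class label string; interior node = (feature, threshold, left, right)."""
    if len(set(y)) == 1:                                # all examples share one label
        return y[0]
    features = rng.sample(range(len(X[0])), n_sub)      # random feature subset
    split = best_split(X, y, features)
    if split is None:                                   # assumption: majority-label leaf
        return Counter(y).most_common(1)[0][0]
    _, d, theta = split
    left_idx = [i for i, row in enumerate(X) if row[d] < theta]
    right_idx = [i for i, row in enumerate(X) if row[d] >= theta]
    return (d, theta,
            build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], n_sub, rng),
            build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], n_sub, rng))

def predict(tree, x):
    """Traverse: go left if x[d] < theta, else right, until a leaf label."""
    while isinstance(tree, tuple):
        d, theta, left, right = tree
        tree = left if x[d] < theta else right
    return tree

def build_forest(X, y, n_trees=15, n_sub=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]        # bootstrap sample (with replacement)
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def forest_predict(forest, x):
    """Majority vote among all trees."""
    return Counter(predict(tree, x) for tree in forest).most_common(1)[0][0]
```

Gini impurity is minimized, so the chosen split is the one that lowers it the most. A per-pixel probability can be taken as the fraction of trees voting "Bird Sound", a common convention that the thesis may or may not follow.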
  • Slide 20
  • Per-Pixel Training: A hand-drawn mask over the spectrogram provides the labels; pixels are randomly sampled for training.
  • Slide 21
  • Per-Pixel Training: The feature vector includes the pixel's frequency, the variance of the window around the pixel, and all pixel values in that window.
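A sketch of per-pixel training-data extraction, assuming the spectrogram and hand-drawn mask are lists of rows indexed `[t][f]`, a square window of side 2·half+1 zero-padded at the borders, and uniform random sampling of pixel coordinates. The window size and padding used in the thesis are not given here; `pixel_features` and `sample_training_pixels` are hypothetical helpers whose output could feed a random forest.

```python
import random

def pixel_features(spec, t, f, half=2):
    """Feature vector for pixel (t, f): [frequency, window variance, *window values].

    The window is (2*half+1) x (2*half+1); cells outside the spectrogram count as 0.0."""
    n_t, n_f = len(spec), len(spec[0])
    window = [spec[i][j] if 0 <= i < n_t and 0 <= j < n_f else 0.0
              for i in range(t - half, t + half + 1)
              for j in range(f - half, f + half + 1)]
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    return [float(f), variance] + window

def sample_training_pixels(spec, mask, n, seed=0):
    """Randomly sample n pixels and label each with the hand-drawn mask value."""
    rng = random.Random(seed)
    coords = [(t, f) for t in range(len(spec)) for f in range(len(spec[0]))]
    chosen = rng.sample(coords, n)
    X = [pixel_features(spec, t, f) for t, f in chosen]
    y = [bool(mask[t][f]) for t, f in chosen]
    return X, y
```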
  • Slide 22
  • Per-Pixel Output: A probability mask over the spectrogram. A threshold is applied to extract segments.
  • Slide 23
  • Per-Pixel Output
  • Slide 26
  • Per-Pixel Limitations: Scope is limited to the window size. A high threshold causes oversegmentation; a low threshold causes undersegmentation. Slow: every pixel must be classified.
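The threshold tradeoff can be seen on a toy probability map. The values below are made up for illustration: the left five cells of the top row stand for one call whose middle cell scored lower, and the right column stands for a second sound. `count_segments` is a hypothetical helper that thresholds the map and counts 4-connected regions.

```python
PROB = [
    [0.9, 0.9, 0.5, 0.9, 0.9, 0.3, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9],
]

def count_segments(prob, threshold):
    """Threshold a probability map and count 4-connected foreground regions."""
    rows, cols = len(prob), len(prob[0])
    mask = [[p >= threshold for p in row] for row in prob]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:                       # depth-first flood fill of one region
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

for threshold in (0.8, 0.4, 0.25):
    print(threshold, count_segments(PROB, threshold))
```

At 0.8 the call breaks into two pieces (3 segments, oversegmentation); at 0.4 the two sounds are recovered (2 segments); at 0.25 the weak cell between them is included and everything merges into a single segment (undersegmentation).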
  • Slide 27
  • Superpixel Method: Begins with an initial pre-segmentation, a modification of Simple Linear Iterative Clustering (SLIC) image segmentation. Uses computed features that describe regions of the spectrogram. Segments are sets of superpixels.
  • Slide 28
  • Superpixel Clustering, based on the SLIC method: Each pixel is assigned a 5-valued vector (x, y, L, a, b) for position and color. Locally constrained k-means clustering: each centroid searches only a 2S × 2S region around itself, where S = sqrt(N/K) for N pixels and K superpixels. This creates a set of regularly sized regions, some of whose boundaries follow the edges of larger objects in the image.
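A compact sketch of SLIC-style locally constrained k-means, written for any per-pixel feature vector so it applies equally to (L, a, b) color or to a spectrogram feature vector. It uses the SLIC distance that combines feature distance with spatial distance scaled by S and a compactness weight m. The value m = 10, the fixed iteration count, and the omission of SLIC's seed perturbation and connectivity-enforcement steps are simplifications for illustration; `slic` is a hypothetical name.

```python
import math

def slic(features, k, m=10.0, iterations=5):
    """Locally constrained k-means in the style of SLIC.

    features[y][x] is a list of per-pixel values (e.g. [L, a, b]).
    Distance: sqrt(d_feat^2 + (d_xy / S)^2 * m^2), with S = sqrt(N / K).
    Each center examines only pixels within S of it in each direction
    (a 2S x 2S window). Returns labels[y][x] = index of the assigned center."""
    h, w = len(features), len(features[0])
    s = max(1, int(math.sqrt(h * w / k)))
    # Seed centers on a regular grid with spacing S.
    centers = [(float(y), float(x), list(features[y][x]))
               for y in range(s // 2, h, s) for x in range(s // 2, w, s)]
    labels = [[-1] * w for _ in range(h)]
    for _ in range(iterations):
        best = [[math.inf] * w for _ in range(h)]
        # Assignment step: each center searches only its local window.
        # Pixels outside every window keep their previous label.
        for idx, (cy, cx, cf) in enumerate(centers):
            y0, x0 = int(round(cy)), int(round(cx))
            for y in range(max(0, y0 - s), min(h, y0 + s + 1)):
                for x in range(max(0, x0 - s), min(w, x0 + s + 1)):
                    d_feat = sum((a - b) ** 2 for a, b in zip(features[y][x], cf))
                    d_xy = (y - cy) ** 2 + (x - cx) ** 2
                    d = math.sqrt(d_feat + d_xy / s ** 2 * m ** 2)
                    if d < best[y][x]:
                        best[y][x] = d
                        labels[y][x] = idx
        # Update step: move each center to the mean position and features of its pixels.
        sums = [[0.0, 0.0, [0.0] * len(cf), 0] for (_, _, cf) in centers]
        for y in range(h):
            for x in range(w):
                if labels[y][x] < 0:
                    continue
                acc = sums[labels[y][x]]
                acc[0] += y
                acc[1] += x
                acc[2] = [a + b for a, b in zip(acc[2], features[y][x])]
                acc[3] += 1
        centers = [(sy / n, sx / n, [v / n for v in sf]) if n else c
                   for (sy, sx, sf, n), c in zip(sums, centers)]
    return labels
```

On an image whose left and right halves differ sharply, superpixel boundaries land on the edge between the halves, which is the behavior the slide describes for images with strong edges.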
  • Slide 29
  • Superpixel Clustering: Over-segments an image, with cluster edges along image edges. But it doesn't work for spectrograms.
  • Slide 30
  • Superpixel Clustering: Spectrograms lack edges and have only one color channel. Instead of (x, y, L, a, b), we use a new vector: (x, y, B, V, G_x, G_y, P_x, P_y).
  • Slide 31
  • Superpixel Clustering: x, y: time and frequency values in the spectrogram. B, V: pixel values after Gaussian blur, and variance of pixel values. G_x, G_y: horizontal/vertical Sobel gradient values. P_x, P_y: time and frequency values of the nearest peak (weighted by a Gaussian kernel).
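A sketch of computing the B, V, G_x, and G_y components for every pixel of a single-channel image indexed `img[y][x]` (frequency row y, time column x). It assumes a 3×3 binomial kernel as the Gaussian blur, a 3×3 neighborhood for the variance, and edge replication at the borders; the thesis's kernel sizes are not given here. The nearest-peak features P_x and P_y are left out because the slide does not define the peak detector precisely. All names are hypothetical.

```python
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]        # 3x3 binomial approximation of a Gaussian (sum 16)
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal (time) gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical (frequency) gradient

def neighborhood(img, y, x):
    """3x3 patch around (y, x), replicating edge pixels at the borders."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for dx in (-1, 0, 1)]
            for dy in (-1, 0, 1)]

def weighted_sum(kernel, patch):
    """Apply a 3x3 kernel to a 3x3 patch."""
    return sum(kernel[i][j] * patch[i][j] for i in range(3) for j in range(3))

def pixel_vectors(img):
    """Return vec[y][x] = (x, y, B, V, Gx, Gy) for a single-channel image img[y][x]."""
    vec = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            patch = neighborhood(img, y, x)
            values = [v for r in patch for v in r]
            mean = sum(values) / 9
            row.append((x, y,
                        weighted_sum(GAUSS, patch) / 16,           # B: blurred value
                        sum((v - mean) ** 2 for v in values) / 9,   # V: local variance
                        weighted_sum(SOBEL_X, patch),              # Gx: horizontal gradient
                        weighted_sum(SOBEL_Y, patch)))             # Gy: vertical gradient
        vec.append(row)
    return vec
```

On a flat region all gradients and the variance are zero; on a ramp that increases along time, only G_x responds.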
  • Slide 32
  • Superpixel Clustering
  • Slide 33
  • Foreground/Background Classifier: A random forest trained using the same manual spectrogram labels as the per-pixel method. Each superpixel is labeled positive (foreground) if more than 10% of its area overlaps with a positive-labeled region. The feature vector describing each superpixel includes the mean and variance of pixel values, blurred pixel values, and peak frequencies, plus a Histogram of Oriented Gradients.
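A sketch of the labeling rule and two of the listed features, assuming `superpixels[y][x]` holds each pixel's superpixel id and `hand_mask[y][x]` holds the manual label. A superpixel is positive when more than 10% of its pixels fall inside the hand-drawn region. Only the mean and variance of pixel values are computed; the blurred values, peak frequencies, and Histogram of Oriented Gradients are omitted, and the function name is hypothetical.

```python
from collections import defaultdict

def superpixel_labels_and_features(spec, superpixels, hand_mask, min_overlap=0.10):
    """Return {superpixel id: (is_foreground, [mean, variance])}."""
    area = defaultdict(int)
    overlap = defaultdict(int)
    values = defaultdict(list)
    for y, row in enumerate(superpixels):
        for x, sp in enumerate(row):
            area[sp] += 1
            if hand_mask[y][x]:
                overlap[sp] += 1                 # pixel lies inside the hand-drawn region
            values[sp].append(spec[y][x])
    result = {}
    for sp, n in area.items():
        vals = values[sp]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        result[sp] = (overlap[sp] / n > min_overlap, [mean, var])
    return result
```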
  • Slide 34
  • Foreground/Background Classifier
  • Slide 35
  • Superpixel Merger Classifier: A random forest trained to classify pairs of adjacent superpixels. Positive classification: merge together; negative classification: keep apart. After background superpixels are discarded, all remaining edges between superpixels are classified, and all edges scoring above a threshold are merged.
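Merging every edge above a threshold amounts to taking connected components of the graph of accepted edges, which a union-find structure computes in near-linear time. This sketch assumes the merger classifier's output is available as a score per adjacent pair and uses 0.5 as an illustrative threshold; `merge_superpixels` and its inputs are hypothetical.

```python
def merge_superpixels(foreground, edge_scores, threshold=0.5):
    """Union adjacent foreground superpixels whose merge score exceeds threshold.

    foreground: ids kept by the foreground/background classifier.
    edge_scores: {(a, b): score that a and b belong to the same sound}.
    Returns a sorted list of segments, each a sorted list of superpixel ids."""
    parent = {sp: sp for sp in foreground}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]        # path halving
            a = parent[a]
        return a

    for (a, b), score in edge_scores.items():
        if a in parent and b in parent and score > threshold:   # skip background superpixels
            parent[find(a)] = find(b)
    groups = {}
    for sp in parent:
        groups.setdefault(find(sp), []).append(sp)
    return sorted(sorted(g) for g in groups.values())
```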
  • Slide 36
  • Superpixel Merger Classifier
  • Slide 37
  • Superpixel Method Output
  • Slide 43
  • Evaluation Datasets: HJ Andrews dataset: 625 recordings, each 15 seconds long, drawn two from each of 24 hours. Set A dataset: 166 recordings, all from early and mid morning, paired by year (2009/2010).
  • Slide 44
  • Differences in Training Data
  • Slide 45
  • Results
  • Slide 49
  • Future Work: The superpixel method is promising: it is faster than per-pixel classification, and could use a more sophisticated merger technique.
  • Slide 50
  • Bibliography
    • A. Härmä, "Automatic identification of bird species based on sinusoidal modeling of syllables," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2003, pp. 545–548.
    • Chang-Hsing Lee, Chin-Chuan Han, and Ching-Chien Chuang, "Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1541–1550, 2008.
    • Leo Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
    • Seppo Fagerlund, "Automatic Recognition of Bird Species by Their Sounds," Master's thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Nov. 8, 2004.