ZFTFTB docs Documentation

Release 0.3.0

Jeffrey Markowitz

July 25, 2016

Contents

1 Installation
    1.1 Requirements
    1.2 Using the install script
    1.3 Manual installation

2 Usage
    2.1 Spectrograms
    2.2 Sound clustering
    2.3 Extracting songs from mat/wav files
    2.4 Song detection
    2.5 Spectral density images
    2.6 Similarity scores

3 API

Bibliography


The Zebra Finch Time Frequency Toolbox (ZFTFTB) contains a set of MATLAB functions for handling songbird data. See the table of contents for installation and usage instructions.


CHAPTER 1

Installation

Note that this assumes a passing familiarity with the command line and git. If you are unfamiliar with either, search for “command line tutorial” and “git tutorial” (hard-linking to specific tutorials would be a fool’s errand at this point).

Simply clone the ZFTFTB GitHub repository and add the directory and its subdirectories to your MATLAB path (including any dependencies, see below). An installation script is included that automates the entire process.

1.1 Requirements

This has been tested using MATLAB R2010a and later on Windows and Mac (Linux should be fine). You must have the markolab and robofinch toolboxes in your MATLAB path. The only MATLAB Toolbox required is the Signal Processing Toolbox, which is included in most standard installations. It is highly recommended that you have git installed for ease of installation and managing dependencies.

1.2 Using the install script

First clone the repository somewhere reasonable using the terminal (Linux/OS X):

$git clone https://github.com/jmarkow/zftftb.git

If you are running MATLAB on Linux or OS X, a script is available to automatically add ZFTFTB and any necessary dependencies to your MATLAB path. First, in MATLAB, navigate to the cloned repository (or to the unzipped directory of files from the GitHub repository), then run the script:

>>cd some_directory/zftftb/
>>zftftb_install

You should be prompted to select a base directory for dependencies (e.g. ~/Documents/MATLAB). Then, assuming git is installed and your pathdef.m is writable, the rest should be taken care of for you.

1.3 Manual installation

If you are somewhat comfortable with the command line and MATLAB, manual installation shouldn’t be too onerous. If you are working with OS X or Linux, pop open a terminal and clone the ZFTFTB, markolab and robofinch repositories:

$git clone https://github.com/jmarkow/zftftb.git
$git clone https://github.com/jmarkow/markolab.git
$git clone https://github.com/jmarkow/robofinch.git



Then, make sure the repositories and their sub-directories are added to the MATLAB path.
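One way to do this from the MATLAB prompt is sketched below. The base directory is an assumption (it matches the example location mentioned for the install script); substitute wherever you cloned the repositories. Note that getenv('HOME') is typically empty on Windows, so set base_dir by hand there. Only the standard MATLAB functions addpath, genpath and savepath are used.

```matlab
% Assumed location of the cloned repositories; change this to match your setup.
base_dir=fullfile(getenv('HOME'),'Documents','MATLAB');
repos={'zftftb','markolab','robofinch'};

for i=1:length(repos)
    repo_dir=fullfile(base_dir,repos{i});
    if exist(repo_dir,'dir')
        % genpath returns the directory plus all of its sub-directories
        addpath(genpath(repo_dir));
    else
        warning('Directory not found: %s',repo_dir);
    end
end

% Keep the new path for future sessions (requires a writable pathdef.m)
savepath;
```

If savepath fails because pathdef.m is not writable, the addpath calls can instead be placed in a startup.m file.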


CHAPTER 2

Usage

Hint: To pass parameters to any function in the toolbox use parameter/value pairs, see examples below.

Warning: Many functions in this toolbox are parallel-enabled (i.e. they include parfor loops). Check the Parallel Preferences in your MATLAB installation to avoid unwanted behavior.

2.1 Spectrograms

To generate a spectrogram, use the function zftftb_pretty_sonogram, which computes a simple multi-taper spectrogram using the Gaussian and Gaussian derivative windows. If audio_data contains the mic trace and fs the sampling rate:

>>[s,f,t]=zftftb_pretty_sonogram(audio_data,fs,'len',80,'overlap',79.5,'clipping',[-2 2]);
>>figure();
>>imagesc(t,f,s);
>>axis xy;

The len and overlap parameters set the length and overlap of the STFT to 80 and 79.5 milliseconds, respectively. The clipping parameter sets the lower and upper clip to -2 and 2 (in logn units, which is the default for legacy compatibility; set the option ‘units’ to ‘dB’ to work in decibels). All options given after the first two, the audio data and the sampling rate, are treated as parameter/value pairs.
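Because len and overlap are given in milliseconds, it can help to see what they mean in samples. The arithmetic below is only an illustration: the sampling rate of 48 kHz is an assumption, and the toolbox may round differently internally.

```matlab
fs=48e3;        % sampling rate (Hz), assumed for illustration
len=80;         % STFT window length (ms)
overlap=79.5;   % STFT overlap (ms)

len_samples=round(len/1e3*fs);            % 3840 samples per window
overlap_samples=round(overlap/1e3*fs);    % 3816 samples shared by adjacent windows
hop_samples=len_samples-overlap_samples;  % 24 samples between window starts
hop_ms=hop_samples/fs*1e3;                % 0.5 ms between spectrogram columns

fprintf('window: %d samples, hop: %d samples (%g ms)\n',len_samples,hop_samples,hop_ms);
```

With these settings each spectrogram column advances by only 0.5 ms, which gives a smooth image at the cost of computing many overlapping windows.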

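==========  ========================================================  ========  ===============  =======
Parameter   Description                                               Format    Options          Default
==========  ========================================================  ========  ===============  =======
overlap     STFT overlap (ms)                                         integer   N/A              67
len         STFT window length (ms)                                   integer   N/A              70
nfft        FFT size (samples)                                        integer   [] auto          auto
zeropad     Zero-pad (samples)                                        integer   [] none, 0 auto  []
filtering   High-pass filter corner Fs (5-pole Elliptic)              float     [] none          []
clipping    Spectrogram clipping (logn units)                         2 floats  N/A              [-2 2]
units       Set spectrogram units                                     string    ln,db,lin        ln
postproc    Prettify spectrogram (non-linear)                         string    y,n              y
saturation  Image saturation (brightness of image, postproc on only)  float     [0-1]            .8
==========  ========================================================  ========  ===============  =======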


2.2 Sound clustering

Sound clustering is performed with zftftb_song_clust, which computes the L1 distance between features computed for a user-defined template and for a set of audio files. The first argument is the directory to process; if no arguments are given, it processes the current directory (pwd) with all parameters set to their defaults. All arguments after the first are passed as parameter/value pairs (see examples below). The basic workflow is as follows: (1) spectral features are computed for all files in a directory, (2) the L1 distance between the template and the files is computed, (3) the user selects hits based on the distance measure. Results for a particular template are stored in a sub-directory of your choice. You can go back to this directory and re-run any stage of the process without having to recompute the other stages (examples are given below). It works with audio files or with data saved in .mat files (the latter requires a function that points to the location of the data and the sampling rate).

Warning: You may want to use your selection for automatic clustering later. If so, set the train_classifier parameter to 1 or true.

1. To cluster a set of .wav files use the following command.

>>zftftb_song_clust;

2. You should see the following outputs.

>>zftftb_song_clust;
>>Auto detecting file type
>>File filter: *.wav
>>Would you like to go to a (p)revious run or (c)reate a new one?

3. The file filter will use the first extension it finds in the directory. For example, if the first file in the directory is a .wav file, the script assumes all files to process are .wav files. This can be overridden through the script options detailed below. If you choose to (c)reate a new run, you will be asked to name the sub-directory to store results in.

4. After this, you will need to select an audio file (anywhere on the computer) that contains the template (find a good sample in the gif directory). Once the file is selected, you will be presented with a GUI to tell the program exactly where the template is in time.

5. Finally, you will perform a manual cluster cut on the L1 distances between the template and the data. Note that the distances have been inverted, so higher numbers indicate a closer match. The clustering window should look like this,

Typically, you will find features on the X and Y axes that effectively separate the points in the upper right hand corner, and draw a border around them. To do this, try different features for X and Y until you see something that looks like the above figure. Then, click on Draw cluster (X and Y only). The window should now look like this,

Now draw a polygon around the cluster in the upper right hand corner. Left-click at each vertex; when you’re done drawing, press ENTER.


Click on DONE to indicate that you’re finished drawing. As in the rightmost figure, you’ll see the points change colors to reflect your selection. Now, set Cluster selection to the cluster that you want. Close the window and the script will extract your selection.

Parameters for zftftb_song_clust are given below.

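================  ==============================================================  ==================  ================  ========
Parameter         Description                                                     Format              Options           Default
================  ==============================================================  ==================  ================  ========
colors            colormap to use for spectrograms                                string              MATLAB colormaps  hot
len               STFT window length for spectrograms (ms)                        integer             N/A               34
overlap           STFT overlap (ms)                                               integer             N/A               33
disp_band         STFT frequency range                                            2 ints              N/A               [1 10e3]
audio_load        Anonymous function used for loading audio data from .mat files  anonymous function  N/A
data_load         Anonymous function used for loading data to align               anon                N/A
file_filt         File extension filter                                           string              auto,wav,mat      auto
extract           Extract .gif, .wav, and .mat files post-alignment               logical             N/A               true
clust_lim         Limit on number of points to show for cluster cutting           integer             N/A               1e4
train_classifier  Train a classifier to recognize the cluster cut                 logical             N/A               1
================  ==============================================================  ==================  ================  ========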

2.2.1 Loading audio data using anonymous functions

To load audio data from a MATLAB file, zftftb_song_clust must know which variables contain the audio data and the sampling rate. For example, this simple function assumes the audio data is in the field data of the structure audio, and that the field fs contains the sampling rate:

function [DATA,FS]=my_audioload(FILE)
% load the structure audio from FILE and return the audio data and sampling rate

load(FILE,'audio');
DATA=audio.data;
FS=audio.fs;

Save it as my_audioload.m somewhere in your MATLAB path (e.g. ~/Documents/MATLAB). Then, assign the function to an anonymous function:

>>loading_function=@(FILE) my_audioload(FILE);

Then pass the anonymous function to the audio_load parameter:

>>zftftb_song_clust(pwd,'audio_load',loading_function);

2.2.2 Features used for clustering

The features are detailed in [Pooleetal2012]. In brief, the reassigned spectrogram is computed by first taking the Gabor transform, i.e. the short-time Fourier transform (STFT) with a Gaussian window,

$$X(\tau,\omega)=\int e^{-(t-\tau)^2/2\sigma^2}\,e^{i\omega(t-\tau)}\,\chi(\tau)\,d\tau$$

and then the STFT with the derivative of the Gaussian window,

$$\eta(\tau,\omega)=\frac{2}{\sigma}\int(\tau-t)\,e^{-(t-\tau)^2/2\sigma^2}\,e^{i\omega(t-\tau)}\,\chi(\tau)\,d\tau$$

The ratio between the two is then used as the basis for the features,

$$\eta/X=|S|e^{i\varphi}$$

The complex phase $\varphi$ of the ratio $\eta/X$ defines the direction of maximum spectral derivative. From these terms we calculate the following features: local power in the sonogram $|X|$, $\cos(\varphi)$, and a measure of how quickly the spectral derivative changes in time, $\partial(\cos(\varphi))/\partial t$, and in frequency, $\partial(\cos(\varphi))/\partial\omega$. The points presented to the user for manual cluster cutting are local minima in the L1 distance in these features between the template and the sound data to be clustered. The features in the clustering GUI are labeled as follows:

1. cos -> $\cos(\varphi)$

2. dx -> $\partial(\cos(\varphi))/\partial t$

3. dy -> $\partial(\cos(\varphi))/\partial\omega$

4. amp -> $|X|$

5. product -> product of all features

6. curvature -> curvature of product

Here is an example of what the features look like on a sample of zebra finch song. In practice, amp is simply a smoothed spectrogram, and only frequencies between 3 and 9 kHz are used.
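The idea of sliding a template over the data and looking for local minima in the L1 distance can be sketched in a few lines. This is not the toolbox's implementation: the feature matrices here are toy values rather than the spectral features described above, and all variable names are hypothetical.

```matlab
% Toy feature matrices (features x time), for illustration only
n_features=4;
template_len=20;
template=sin((1:n_features)'*(1:template_len)/3);  % n_features x template_len

data=zeros(n_features,200);
data(:,51:70)=template;        % exact copy of the template
data(:,131:150)=template+0.1;  % slightly perturbed copy

% L1 distance between the template and every template-length segment
n_shifts=size(data,2)-template_len+1;
l1_dist=zeros(1,n_shifts);
for i=1:n_shifts
    segment=data(:,i:i+template_len-1);
    l1_dist(i)=sum(abs(segment(:)-template(:)));
end

% Candidate matches are local minima of the distance trace
is_min=[false l1_dist(2:end-1)<l1_dist(1:end-2) & l1_dist(2:end-1)<=l1_dist(3:end) false];
candidates=find(is_min);

% The closest match is the exact copy, which starts at sample 51
[best_dist,best]=min(l1_dist);
fprintf('best match starts at sample %d (distance %g)\n',best,best_dist);
```

In the toolbox the distance is computed per feature and then inverted, so the user separates good candidates from poor ones by cluster cutting rather than by taking a single minimum.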

2.3 Extracting songs from mat/wav files

To extract songs from wav files in the current directory that may contain long segments of silence, use zftftb_song_chop:

>>zftftb_song_chop;

This will find stretches of singing and extract them into the sub-directory chop_data. As with zftftb_song_clust, all options passed after the first, the directory to process, are parameter/value pairs, e.g.:

>>zftftb_song_chop(pwd,'audio_pad',3);

This will process the current directory and pad the extractions with 3 seconds before and after the vocalization period.

==================  ============================================================  ==================  ================  =========
Parameter           Description                                                   Format              Options           Default
==================  ============================================================  ==================  ================  =========
song_len            window length for computing power band crossing (s)           float               N/A               .005
song_overlap        window overlap for computing power band crossing (s)          float               N/A               0
song_band           frequency band that contains singing (Hz)                     2 ints              N/A               [3e3 7e3]
song_ratio          ratio of power in the song_band and outside of the song_band  float               N/A               2
song_duration       smoothing kernel for song_ratio (s)                           float               N/A               .8
song_pow            threshold on power in singing band                            float               N/A               -inf
song_thresh         threshold on smoothed song ratio for song detection           float               N/A               .1
custom_load         anonymous function used for loading data from MATLAB files    anonymous function  N/A
                    (see audio_load from above section)
file_filt           filter for files to check                                     string              N/A               '*.wav'
audio_pad           pad to include before and after detected song (s)             float               N/A               1
colors              spectrogram colormap                                          string              MATLAB colormaps  hot
disp_band           frequency band to use for spectrograms                        2 ints              N/A               [1 9e3]
clipping            spectrogram clipping (logn units)                             2 floats            N/A               [-2 2]
export_wav          export .wav files?                                            logical             N/A               TRUE
export_spectrogram  export spectrograms as .gifs?                                 logical             N/A               TRUE
==================  ============================================================  ==================  ================  =========

2.4 Song detection

If you have loaded a microphone signal into MATLAB, you can check for time points with singing using zftftb_song_det. The function returns two outputs: the first is a vector of logicals indicating the presence (TRUE) or absence (FALSE) of song, and the second is a vector of timestamps. The function has two obligatory arguments, the mic data and the sampling rate; all additional options should be parameter/value pairs:

>>[y,fs]=audioread('mydata.wav'); % use wavread on MATLAB releases that predate audioread
>>[idx,t]=zftftb_song_det(y,fs);

The following parameters can be passed as parameter/value pairs.

=============  =======================================  ========  =======  =========
Parameter      Description                              Format    Options  Default
=============  =======================================  ========  =======  =========
len            Window length (s) for computing power    float     N/A      .005
song_band      Frequency range (Hz) for detecting song  2 floats  N/A      [2e3 6e3]
overlap        STFT overlap for computing power (s)     float     N/A      0
song_duration  smoothing for power calculation (s)      float     N/A      .8
ratio_thresh   ratio of song to nonsong in power        float     N/A      2
pow_thresh     Threshold for song power                 float     N/A      -inf
song_thresh    Threshold for song ratio                 float     N/A      .2
=============  =======================================  ========  =======  =========


For example, to use a lower threshold on the ratio of power for song to nonsong (where nonsong is all frequencies outside of the song_band):

>>[idx,t]=zftftb_song_det(y,fs,'song_thresh',.1)
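The logical vector is often easier to work with once it has been converted into bout onset and offset times. The sketch below uses toy values standing in for the outputs of zftftb_song_det, and assumes idx and t have the same length; the variable names other than idx and t are hypothetical.

```matlab
% Toy detection output standing in for [idx,t]=zftftb_song_det(y,fs)
t=0:0.1:1.9;          % timestamps (s)
idx=false(size(t));
idx(4:8)=true;        % first bout of song
idx(14:17)=true;      % second bout of song

% Pad with zeros so that bouts touching either end are still found
edges=diff([0 double(idx(:)') 0]);
onsets=find(edges==1);       % first TRUE sample of each bout
offsets=find(edges==-1)-1;   % last TRUE sample of each bout

% One row per bout: [start stop] in seconds
bout_times=[t(onsets)' t(offsets)'];
disp(bout_times);
```

Here the two bouts run from roughly 0.3 to 0.7 s and from 1.3 to 1.6 s.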

2.5 Spectral density images

To compute a spectral density image, use zftftb_sdi, which uses the technique employed in [Markowitzetal2013]. If you would like to use consensus contours, as described in [Limetal2013], they are returned in the contours output. The spectral density image takes a group of sounds and forms a probability density in time and frequency. The inputs are a samples x trials matrix of doubles and the sampling rate. All options passed after the first two are considered parameter/value pairs:

>>[sdi f t contours]=zftftb_sdi(mic_matrix,fs);
>>figure();
>>imagesc(t,f,sdi.im);
>>axis xy;

This will compute the spectral density image and display the one built from the imaginary contours (sdi.re contains the one built from the real component).

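===========  ===================================================================  =======  ===========  =======
Parameter    Description                                                          Format   Options      Default
===========  ===================================================================  =======  ===========  =======
tscale       time-scale for Gaussian window (ms)                                  float    N/A          1.5
len          length of Gaussian window (ms)                                       float    N/A          34
nfft         fft length (ms)                                                      float    [] for auto  []
overlap      STFT overlap (ms)                                                    float    N/A          33
filtering    Corner Fs (Hz) for high-pass filter for mic trace (4-pole elliptic)  float    [] for none  500
mask_only    Exclude power weighting in spectral density image                    logical  N/A          false
spec_thresh  Threshold on power-weighted contour image                            float    N/A          .78
norm_amp     Normalize mic traces by their abs(max) value                         logical  N/A          true
weighting    Power weighting                                                      string   log,lin      log
===========  ===================================================================  =======  ===========  =======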

2.6 Similarity scores

Similarity scores quantify the similarity between two groups of sounds. You will need the contours variable returned from zftftb_sdi (see Spectral density images). To compute the scores between the imaginary contours for groups 1 and 2:

>>[sdi_group1 f t contours_group1]=zftftb_sdi(mic_matrix_group1,fs);
>>[sdi_group2 f t contours_group2]=zftftb_sdi(mic_matrix_group2,fs);
>>scores=zftftb_sdi_simscore(contours_group1.im,contours_group2.im,f,t);

The variable scores contains a 2 x 2 cell array, where the first dimension indicates the reference spectral density image and the second the contour group. For example, scores{1,2} contains the similarity scores between contour group 2 and spectral density image 1 (the likelihood of group 2 given the probability density of group 1). Mathematically, the score for sound i in contour group 2 relative to spectral density image 1 is:

$$\mathrm{SIM}^{i}_{1,2}=\frac{\sum \mathrm{SDI}_1\cdot\mathrm{CONTOUR}^{i}_{2}}{\sqrt{\sum(\mathrm{SDI}_1)^2\cdot(\mathrm{CONTOUR}^{i}_{2})^2}}$$


CHAPTER 3

API

[s,f,t]=zftftb_pretty_sonogram(signal,fs,varargin)

Parameters

• signal – microphone signal

• fs – sampling rate (default=48e3)

• varargin – parameter/value pairs (see table)

Returns s STFT magnitude

Returns f vector of frequency bins

Returns t vector of time bins

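==========  ========================================================  ========  ===============  =======
Parameter   Description                                               Format    Options          Default
==========  ========================================================  ========  ===============  =======
overlap     STFT overlap (ms)                                         integer   N/A              67
len         STFT window length (ms)                                   integer   N/A              70
nfft        FFT size (samples)                                        integer   [] auto          auto
zeropad     Zero-pad (samples)                                        integer   [] none, 0 auto  []
filtering   High-pass filter corner Fs (5-pole Elliptic)              float     [] none          []
clipping    Spectrogram clipping (logn units)                         2 floats  N/A              [-2 2]
units       Set spectrogram units                                     string    ln,db,lin        ln
postproc    Prettify spectrogram (non-linear)                         string    y,n              y
saturation  Image saturation (brightness of image, postproc on only)  float     [0-1]            .8
==========  ========================================================  ========  ===============  =======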

[sdi,f,t,contours]=zftftb_sdi(signals,fs,varargin)

Parameters

• signals – samples x trials matrix of microphone signals

• fs – sampling rate

• varargin – parameter/value pairs

Returns sdi spectral density image, imaginary and real components

Returns f vector of frequency bins


Returns contours time-frequency contours, imaginary and real components

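===========  ===================================================================  =======  ===========  =======
Parameter    Description                                                          Format   Options      Default
===========  ===================================================================  =======  ===========  =======
tscale       time-scale for Gaussian window (ms)                                  float    N/A          1.5
len          length of Gaussian window (ms)                                       float    N/A          34
nfft         fft length (ms)                                                      float    [] for auto  []
overlap      STFT overlap (ms)                                                    float    N/A          33
filtering    Corner Fs (Hz) for high-pass filter for mic trace (4-pole elliptic)  float    [] for none  500
mask_only    Exclude power weighting in spectral density image                    logical  N/A          false
spec_thresh  Threshold on power-weighted contour image                            float    N/A          .78
norm_amp     Normalize mic traces by their abs(max) value                         logical  N/A          true
weighting    Power weighting                                                      string   log,lin      log
===========  ===================================================================  =======  ===========  =======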

scores=zftftb_sdi_simscores(contour_group1,contour_group2,f,t,varargin)

Parameters

• contour_group1 – time-frequency contours group 1

• contour_group2 – time-frequency contours group 2

• f – vector of frequency bins

• t – vector of time bins


Returns scores cell array of similarity scores

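==========  ========================================  ========  ====================================  ========
Parameter   Description                               Format    Options                               Default
==========  ========================================  ========  ====================================  ========
time_range  Min and max time points to consider       2 floats  [] to use first and last time points  []
freq_band   Min and max frequency points to consider  2 floats  N/A                                   [1 10e3]
==========  ========================================  ========  ====================================  ========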

zftftb_song_chop(dir, varargin)

Parameters

• dir – directory to process


==================  ============================================================  ==================  ================  =========
Parameter           Description                                                   Format              Options           Default
==================  ============================================================  ==================  ================  =========
song_len            window length for computing power band crossing (s)           float               N/A               .005
song_overlap        window overlap for computing power band crossing (s)          float               N/A               0
song_band           frequency band that contains singing (Hz)                     2 ints              N/A               [3e3 7e3]
song_ratio          ratio of power in the song_band and outside of the song_band  float               N/A               2
song_duration       smoothing kernel for song_ratio (s)                           float               N/A               .8
song_pow            threshold on power in singing band                            float               N/A               -inf
song_thresh         threshold on smoothed song ratio for song detection           float               N/A               .1
custom_load         anonymous function used for loading data from MATLAB files    anonymous function  N/A
                    (see audio_load from above section)
file_filt           filter for files to check                                     string              N/A               '*.wav'
audio_pad           pad to include before and after detected song (s)             float               N/A               1
colors              spectrogram colormap                                          string              MATLAB colormaps  hot
disp_band           frequency band to use for spectrograms                        2 ints              N/A               [1 9e3]
clipping            spectrogram clipping (logn units)                             2 floats            N/A               [-2 2]
export_wav          export .wav files?                                            logical             N/A               TRUE
export_spectrogram  export spectrograms as .gifs?                                 logical             N/A               TRUE
==================  ============================================================  ==================  ================  =========

zftftb_song_clust(dir, varargin)

Parameters

• dir – directory to process


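==========  ==============================================================  ==================  ================  ========
Parameter   Description                                                     Format              Options           Default
==========  ==============================================================  ==================  ================  ========
colors      colormap to use for spectrograms                                string              MATLAB colormaps  hot
len         STFT window length for spectrograms (ms)                        integer             N/A               34
overlap     STFT overlap (ms)                                               integer             N/A               33
disp_band   STFT frequency range                                            2 ints              N/A               [1 10e3]
audio_load  Anonymous function used for loading audio data from .mat files  anonymous function  N/A
data_load   Anonymous function used for loading data to align               anon                N/A
file_filt   File extension filter                                           string              auto,wav,mat      auto
extract     Extract .gif, .wav, and .mat files post-alignment               logical             N/A               true
clust_lim   Limit on number of points to show for cluster cutting           integer             N/A               1e4
==========  ==============================================================  ==================  ================  ========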

[song_idx,t]=zftftb_song_det(signal,fs,varargin)

Parameters

• signal – microphone signal



• fs – sampling rate (default=48e3)


Returns song_idx vector of logicals indicating presence or absence of song


=============  =======================================  ========  =======  =========
Parameter      Description                              Format    Options  Default
=============  =======================================  ========  =======  =========
len            Window length (s) for computing power    float     N/A      .005
song_band      Frequency range (Hz) for detecting song  2 floats  N/A      [2e3 6e3]
overlap        STFT overlap for computing power (s)     float     N/A      0
song_duration  smoothing for power calculation (s)      float     N/A      .8
ratio_thresh   ratio of song to nonsong in power        float     N/A      2
pow_thresh     Threshold for song power                 float     N/A      -inf
song_thresh    Threshold for song ratio                 float     N/A      .2
=============  =======================================  ========  =======  =========

Bibliography

[Pooleetal2012] The Song Must Go On: Resilience of the Songbird Vocal Motor Pathway, https://dx.doi.org/10.1371/journal.pone.0038173

[Markowitzetal2013] Long-range order in canary song, PLoS Comp Bio, 2013, https://dx.doi.org/10.1371/journal.pcbi.1003052

[Limetal2013] Stable time-frequency contours for sparse signal representation, IEEE EUSIPCO, 2013, http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6811462


20 Bibliography

Index

Zzftftb_song_chop() (built-in function), 16zftftb_song_clust() (built-in function), 17

21