
Noise Robust Pitch Tracking by Subband Autocorrelation Classification (SAcC)

Byung Suk Lee & Dan Ellis • Columbia University / ICSI • {bsl,dpwe}@ee.columbia.edu

Summary: A neural net classifier is trained to identify the pitch of a frame of subband autocorrelation principal components. Accuracy is greatly improved for noisy, bandlimited speech matched to the training data.

Columbia University • Laboratory for the Recognition and Organization of Speech and Audio (LabROSA)

for Interspeech 2012 • 2012-09-11 • dpwe@ee.columbia.edu

MATLAB code available: http://labrosa.ee.columbia.edu/projects/SAcC/

Background

• “Classic” approaches to pitch tracking reveal time-domain waveform periodicity via autocorrelation or related approaches.

• An example is YIN [de Cheveigné & Kawahara 2002].

Material

• We are specifically interested in pitch tracking for low-quality radio reception data (e.g., hand-held narrow-FM) from the RATS project.

• This data has both high noise/distortion and narrow bandwidth.

Filterbank

• 48 4-pole, 2-zero ERB-scale cochlea filter approximations form auditory-like subbands.
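For illustration, ERB-rate spacing of the 48 center frequencies might look like the following sketch (the 100 Hz–8 kHz span is an assumption, and this computes only center frequencies, not the 4-pole, 2-zero filters themselves):

```python
import numpy as np

def erb_rate(f_hz):
    """Glasberg & Moore ERB-rate scale: frequency in Hz -> ERB-rate units."""
    return 21.4 * np.log10(0.00437 * f_hz + 1.0)

def erb_rate_inv(e):
    """Inverse of erb_rate: ERB-rate units back to Hz."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_center_freqs(n_bands=48, f_lo=100.0, f_hi=8000.0):
    """n_bands center frequencies spaced uniformly on the ERB-rate scale."""
    e = np.linspace(erb_rate(f_lo), erb_rate(f_hi), n_bands)
    return erb_rate_inv(e)

cfs = erb_center_freqs()  # 48 auditory-like band centers
```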

Classifier

• The core of our system is a trained multi-layer perceptron (MLP) classifier.

• It takes k (10) PCA coefficients for each of s (48) subband autocorrelations and estimates posteriors over p (67) quantized pitch values.

• The MLP is trained discriminatively on noisy data (with ground truth pitches).

• Discriminative training virtually eliminates octave/suboctave errors seen in peak-based pitch tracking.

• Added interference hurts the autocorrelation, but filtering into subbands can reveal certain bands with better local SNR. Also, the contribution of each subband can be adjusted in later fusion to select sources.

• An example is [Wu, Wang & Brown 2003] (“the Wu algorithm”):

• This work began as an investigation into the peak selection and integration stages of the Wu algorithm. We found that replacing both stages with a trained classifier offered large performance improvements, as well as the chance to train domain-specific pitch trackers.
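The classifier's input/output geometry described above (s·k = 480 features in, p = 67 posteriors out) can be sketched as a single-hidden-layer forward pass; the hidden size and random weights here are placeholders, not the trained MLP:

```python
import numpy as np

rng = np.random.default_rng(0)

S, K, P, H = 48, 10, 67, 100  # subbands, PCA coeffs, pitch classes (66 + "no pitch"), hidden units (H assumed)

# One frame of features: K PCA coefficients per subband, flattened.
x = rng.standard_normal(S * K)

# Stand-in weights; the real network is trained discriminatively on noisy speech.
W1 = rng.standard_normal((H, S * K)) * 0.01
b1 = np.zeros(H)
W2 = rng.standard_normal((P, H)) * 0.01
b2 = np.zeros(P)

h = np.tanh(W1 @ x + b1)                        # hidden layer
z = W2 @ h + b2
post = np.exp(z - z.max()); post /= post.sum()  # softmax posteriors over 67 classes
```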

Training Data

• The pitch classifier needs training data with ground truth. Performance improves when training data is more similar to test data.

• We used pitch-annotated data (Keele), then artificially filtered it and added noise to resemble the target domain.

• We also generated pseudo-ground-truth for target domain data by using only frames where three independent pitch trackers agreed.
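A minimal sketch of this agreement rule (the median-consensus criterion and 5% relative tolerance are illustrative assumptions; the poster does not specify the exact test):

```python
import numpy as np

def pseudo_ground_truth(tracks, tol=0.05):
    """Label a frame with the median pitch when every tracker reports a voiced
    pitch within tol (relative) of that median; reject all other frames.
    tracks: (n_trackers, n_frames) pitch in Hz, 0 = unvoiced."""
    tracks = np.asarray(tracks, dtype=float)
    med = np.median(tracks, axis=0)
    voiced = np.all(tracks > 0, axis=0)
    close = np.all(np.abs(tracks - med) <= tol * np.maximum(med, 1e-9), axis=0)
    agree = voiced & close
    labels = np.where(agree, med, np.nan)  # NaN marks rejected frames
    return labels, agree
```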

[Figure: YIN processing chain]
Sound → Autocorrelation-based difference function → Cumulative normalization and threshold → Local peak selection & interpolation → Pitch tracking using HMM* → Pitch
(* not in standard YIN)

[Figure: Wu algorithm processing chain]
Sound → Auditory Filterbank → Normalized Autocorrelation → Channel/Peak Selection → Cross-channel Integration → Pitch Tracking using HMM → Pitch

GPE = Evv / Nvv
VE  = (Evv + Evu) / Nv
UE  = Euv / Nu
PTE = (VE + UE) / 2

(E = error counts, N = frame counts; subscripts: first letter = ground-truth voicing, second = system voicing; v = voiced, u = unvoiced)

[Figure: PTE, GPE, VE, and UE (%) vs. SNR (−10 to 30 dB) on FDA with radio-band filtering (RBF) and pink noise, comparing YIN, Wu, get_f0, and SAcC]

[Figure: Pseudo-ground-truth labeling — pitch tracks (Freq / Hz, 0–1000) from YIN, Wu, and SWIPE′ and their agreement; frames without agreement are rejected (Pitch Index vs. Frame, 600–1600)]

[Figure: Subband autocorrelogram of clean speech — subband (1–48) vs. lag (0–400 samples)]

Autocorrelation

• Normalized autocorrelation out to 25 ms (400 samples @ 16 kHz) reveals periodic structure in each subband.

• The autocorrelation in each subband is highly constrained by the bandlimited input.

• Pitch information is distributed throughout each autocorrelation row; simple peak-picking ignores most of this.

Principal Component Analysis

• The autocorrelations in a given subband always reflect the center frequency, i.e., they occupy a small part of the 400-D space.

• We use per-subband PCA to reduce each band to 10 coefficients, which preserves almost all the variance.

• PCA bases remain stable when learned from different signal conditions, so are fixed.
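The autocorrelation-plus-PCA front end above can be sketched as follows (the lag range and k = 10 follow the text; the SVD-based PCA is a generic stand-in for the fixed learned bases):

```python
import numpy as np

def normalized_autocorr(x, max_lag=400):
    """Normalized autocorrelation of one subband frame out to max_lag samples
    (25 ms at 16 kHz)."""
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0, 1, 2, ...
    return full[:max_lag] / (full[0] + 1e-12)

def pca_bases(rows, k=10):
    """Per-subband PCA: learn k bases from many 400-D autocorrelation rows."""
    mu = rows.mean(axis=0)
    _, _, vt = np.linalg.svd(rows - mu, full_matrices=False)
    return mu, vt[:k]             # mean and top-k principal directions

def project(row, mu, bases):
    return bases @ (row - mu)     # 400-D row -> k coefficients
```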

Hidden Markov Model Smoothing

• The MLP generates posterior probabilities across 66 pitch candidates + “no pitch” for every 10 ms time frame.

• These become a single pitch track via Viterbi decoding through an HMM with pitch states.

• Transition probabilities are set parametrically (pitch-invariant) and tuned empirically.
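A sketch of the decoding step, assuming a simple Gaussian-on-pitch-jump parametric transition model (the spread and voicing-transition probabilities are illustrative placeholders, not the empirically tuned values):

```python
import numpy as np

def pitch_transition_matrix(n_pitch=66, spread=2.0, p_self_unvoiced=0.9):
    """Pitch-invariant transitions: a Gaussian on pitch-index jumps, plus one
    'no pitch' state (index n_pitch). Parameter values are illustrative."""
    n = n_pitch + 1
    d = np.arange(n_pitch)
    gauss = np.exp(-0.5 * (d[:, None] - d[None, :]) ** 2 / spread ** 2)
    A = np.zeros((n, n))
    A[:n_pitch, :n_pitch] = gauss
    A[:n_pitch, n_pitch] = 0.05                       # voiced -> unvoiced
    A[n_pitch, :n_pitch] = (1 - p_self_unvoiced) / n_pitch
    A[n_pitch, n_pitch] = p_self_unvoiced
    return A / A.sum(axis=1, keepdims=True)           # rows sum to 1

def viterbi(post, trans):
    """Best state path through frame-wise posteriors (n_frames, n_states)
    under transition matrix trans (n_states, n_states)."""
    T, N = post.shape
    logp, logt = np.log(post + 1e-12), np.log(trans + 1e-12)
    delta = logp[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logt                # (from, to)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logp[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```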

Evaluation

• We tested on the pitch-annotated FDA data with radio-band filtering and pink noise added at a range of SNRs.

• SAcC trained on Keele data (with similar corruption) substantially outperformed other pitch trackers.

• Later experiments show that SAcC can generalize to mismatched train/test scenarios.

• GPE, the most common pitch tracking metric, only considers frames where both ground truth and system report a pitch value. This rewards “punting” on difficult frames.

• We define VE as the error rate over all true-voiced frames, and UE over all true-unvoiced frames.

• We evaluate by PTE, the mean of VE and UE.
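Under these definitions, the metrics can be computed as follows (the 20% gross-error threshold is the common convention, assumed here):

```python
import numpy as np

def pitch_metrics(ref_f0, est_f0, tol=0.2):
    """GPE, VE, UE, PTE from reference and estimated f0 tracks (Hz, 0 = unvoiced);
    a gross error is a pitch deviation beyond tol (20%) on a both-voiced frame."""
    ref, est = np.asarray(ref_f0, float), np.asarray(est_f0, float)
    v_ref, v_est = ref > 0, est > 0
    Nv, Nu = v_ref.sum(), (~v_ref).sum()     # true-voiced / true-unvoiced frames
    both = v_ref & v_est
    Nvv = both.sum()                         # frames voiced in both
    Evv = (both & (np.abs(est - ref) > tol * ref)).sum()  # gross pitch errors
    Evu = (v_ref & ~v_est).sum()             # voiced frames called unvoiced
    Euv = (~v_ref & v_est).sum()             # unvoiced frames called voiced
    gpe = Evv / max(Nvv, 1)
    ve = (Evv + Evu) / max(Nv, 1)
    ue = Euv / max(Nu, 1)
    return dict(GPE=gpe, VE=ve, UE=ue, PTE=(ve + ue) / 2)
```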

[Figure: Radio-band filter (RBF) response (Gain / dB) and filtered-speech spectrogram (Freq / kHz vs. Time / s)]

[Figure: Pitch “scores” from YIN, Wu, and SAcC (Pitch Index vs. time / s)]

References

A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Am., 111(4):1917–1930, April 2002.

M. Wu, D.L. Wang, and G.J. Brown, “A multipitch tracking algorithm for noisy speech,” IEEE Tr. Speech and Audio Proc., 11(3):229–241, May 2003.

[Score-figure panels: 1 − YIN similarity, Wu likelihood, and SAcC log MLP output]
