A Robust Algorithm for Pitch Tracking David Talkin Hsiao-Tsung Hung

Page 1: A Robust Algorithm for Pitch Tracking

A Robust Algorithm for Pitch Tracking

David Talkin

Hsiao-Tsung Hung

Page 2: A Robust Algorithm for Pitch Tracking

Outline

• NCCF
• Algorithm outline
• Post-processing

Page 3: A Robust Algorithm for Pitch Tracking

Normalized cross-correlation function

• Period candidate generation function
  – The NCCF at lag k and analysis frame i is
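The formula itself is missing from this transcript. A common statement of the NCCF, and the one assumed here, is Φ(i,k) = Σ_{j=m}^{m+n−1} s_j s_{j+k} / sqrt(e_m e_{m+k}), with e_j = Σ_{l=j}^{j+n−1} s_l², where m is the first sample of frame i and n is the window length. The sketch below computes it in plain Python; the function name `nccf`, the parameter names, and the small `eps` guard are illustrative assumptions, not taken from the slides.

```python
import math

def nccf(s, start, win, max_lag, eps=1e-12):
    """Normalized cross-correlation of s[start:start+win] with lagged copies.

    Returns a list phi where phi[k] is the NCCF at lag k (0 <= k < max_lag).
    eps is an illustrative guard against division by zero on silent input.
    Requires len(s) >= start + win + max_lag - 1.
    """
    def energy(j):
        return sum(x * x for x in s[j:j + win])

    e0 = energy(start)
    phi = []
    for k in range(max_lag):
        num = sum(s[start + j] * s[start + j + k] for j in range(win))
        phi.append(num / math.sqrt(e0 * energy(start + k) + eps))
    return phi

# A 100 Hz sine sampled at 2 kHz has a period of 20 samples.
fs, f0 = 2000, 100
signal = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]
phi = nccf(signal, start=0, win=60, max_lag=35)
# Skip the trivial peak at lag 0 by searching from lag 5 upward.
best = max(range(5, 35), key=lambda k: phi[k])
print(best, fs / best)  # lag of the strongest peak and the implied F0 in Hz
```

Because each lag is normalized by the energy of its own shifted window, the peak height stays near 1.0 for periodic input even when the signal amplitude changes across the window.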

Page 4: A Robust Algorithm for Pitch Tracking

[Figure omitted; axis label: k]

Page 5: A Robust Algorithm for Pitch Tracking

NCCF

• Problem

Between the times of 198.625 and 198.7, the correlation peak at twice the correct period is stronger and more consistent than the “true” peak.

Page 6: A Robust Algorithm for Pitch Tracking

NCCF

1. The local maximum in Φ corresponding to the “true” F0 for voiced speech is usually the largest, and it is close to 1.0.

2. When multiple maxima in Φ exist and have values close to 1.0, the maximum corresponding to the shortest period is usually the correct choice.

3. The true Φ maxima in temporally adjacent analysis frames are usually located at comparable lags, since F0 is a slowly varying function of time.

4. The “true” F0 occasionally changes abruptly, by doubling or halving.

5. Voicing tends to change state infrequently.

6. The largest non-zero-lag maximum in Φ for unvoiced speech is usually considerably less than 1.0.

7. The short-time spectra of voiced and unvoiced speech frames are usually quite different.

8. Amplitude tends to increase at the onset of voicing and to decrease at offset.
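Observations 1 and 2 can be illustrated with a simple per-frame heuristic: collect the local maxima of Φ, then prefer the shortest lag among the peaks that are nearly as strong as the best one. This is only a sketch of the intuition, not the selection method of the algorithm, which uses dynamic programming across frames. The function names, the threshold, and the tolerance are hypothetical.

```python
def local_maxima(phi, threshold):
    """Return (lag, value) pairs for interior local maxima of phi at or above threshold."""
    return [(k, phi[k]) for k in range(1, len(phi) - 1)
            if phi[k] >= threshold and phi[k - 1] < phi[k] >= phi[k + 1]]

def shortest_strong_peak(peaks, tolerance=0.05):
    """Among peaks within `tolerance` of the strongest, return the one with the shortest lag."""
    strongest = max(v for _, v in peaks)
    return min((p for p in peaks if p[1] >= strongest - tolerance),
               key=lambda p: p[0])

# Toy NCCF: the peak at lag 8 (twice the period) is slightly stronger
# than the peak at lag 4 (the true period).
phi = [1.0, 0.2, -0.5, 0.3, 0.95, 0.1, -0.4, 0.2, 0.97, 0.3]
peaks = local_maxima(phi, threshold=0.3)
choice = shortest_strong_peak(peaks)
print(peaks, choice)
```

Picking the global maximum would select lag 8 here, which is the period-doubling error described on the previous slide; the shortest-lag preference recovers lag 4.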

Page 7: A Robust Algorithm for Pitch Tracking

Algorithm outline

1. Provide two versions of the sampled speech data, one at 16 kHz and one at 2 kHz.

2. Compute the NCCF of the low-sample-rate signal for all lags in the F0 range of interest (50 to 500 Hz). Record the locations of the local maxima in this first-pass NCCF.

3. Compute the NCCF of the high-sample-rate signal only in the vicinity of promising peaks found in the first pass. Search again for the local maxima in this refined NCCF to obtain improved peak location and amplitude estimates.

4. Each peak retained from the high-resolution NCCF generates a candidate F0 for that frame. At each frame, the hypothesis that the frame is unvoiced is also advanced.

5. Dynamic programming (DP) is used to select the set of NCCF peaks or unvoiced hypotheses across all frames that best matches the characteristics listed above.
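Step 3 can be sketched as follows: each lag found at the low rate is scaled by the rate ratio (8 for 16 kHz over 2 kHz), and the high-rate NCCF is evaluated only within a small radius of that scaled lag. The helper names, the search radius, and the synthetic test signal are assumptions made for illustration; downsampling and the first-pass peak search are left out.

```python
import math

def nccf_at(s, start, win, k):
    """NCCF of s at a single lag k for the window beginning at `start`."""
    a = s[start:start + win]
    b = s[start + k:start + k + win]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def refine_peaks(s_hi, start, win, coarse_lags, ratio, radius):
    """For each coarse lag, search the high-rate NCCF only near ratio * lag."""
    refined = []
    for k_lo in coarse_lags:
        center = k_lo * ratio
        lags = range(max(1, center - radius), center + radius + 1)
        k_best = max(lags, key=lambda k: nccf_at(s_hi, start, win, k))
        refined.append((k_best, nccf_at(s_hi, start, win, k_best)))
    return refined

# 16 kHz signal with a 163-sample period. A 2 kHz first pass can only
# resolve this to lag 20, which corresponds to 160 high-rate samples.
fs_hi, period = 16000, 163
s_hi = [math.sin(2 * math.pi * t / period) for t in range(2000)]
candidates = refine_peaks(s_hi, start=0, win=489, coarse_lags=[20],
                          ratio=8, radius=8)
lag, value = candidates[0]
print(lag, fs_hi / lag)  # refined lag and the implied F0 in Hz
```

The saving comes from evaluating 17 high-rate lags per candidate instead of every lag in the 50 to 500 Hz range.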

Page 8: A Robust Algorithm for Pitch Tracking

Post-processing

• Dynamic programming is used to select the best F0 and voicing-state candidate at each frame, based on a combination of local and contextual evidence.

• The objective function uses a local cost, defined for two cases:
  – Frame i is voiced with period
  – Unvoiced at frame i

The quantities involved are the lag L, the number of states proposed at frame i, and the value of the jth local maximum in Φ at frame i. This local cost function favors peak values close to 1 and shorter lags for voiced frames, and peak values close to 0 for unvoiced frames.
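The cost formulas are missing from this transcript. The sketch below assumes a voiced cost of the form 1 − C(1 − βL), where C is the peak value and L its lag, and an unvoiced cost equal to a bias plus the largest peak value in the frame. That form matches the behavior described above, but the function name, parameter names, and numeric defaults are illustrative assumptions.

```python
def local_costs(peaks, lag_weight=0.3, max_lag=320, voicing_bias=0.0):
    """Local costs for one frame.

    peaks: list of (lag, value) NCCF peaks for the frame.
    Returns one cost per voiced candidate, followed by the unvoiced cost.
    """
    beta = lag_weight / max_lag  # penalty per sample of lag (assumed form)
    voiced = [1.0 - c * (1.0 - beta * lag) for lag, c in peaks]
    unvoiced = voicing_bias + max((c for _, c in peaks), default=0.0)
    return voiced + [unvoiced]

# Strongly periodic frame: the shorter lag wins despite a slightly weaker peak.
costs = local_costs([(80, 0.95), (160, 0.97)])
print(costs)

# Weakly periodic frame: the unvoiced hypothesis (last entry) is cheapest.
weak = local_costs([(80, 0.1)])
print(weak)
```

The lag-dependent factor is what implements the preference for shorter periods when several peaks are close to 1.0.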

Page 9: A Robust Algorithm for Pitch Tracking

Post-processing

• The inter-frame F0 transition cost δ at frame i, when hypotheses j and k at the current and previous frames are both voiced, is defined as:

  * F0 is a slowly varying function of time.

• Unvoiced to unvoiced:
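The transition formulas are missing from this transcript. A minimal sketch of the voiced-to-voiced case follows, assuming a cost proportional to the log ratio of the two lags, with a cheaper alternative branch for an exact doubling or halving (observation 4), and assuming that staying unvoiced costs nothing. The function name, the weights, and the zero unvoiced-to-unvoiced cost are assumptions.

```python
import math

def voiced_transition_cost(lag_now, lag_prev, freq_weight=0.02, doubling_cost=0.35):
    """Cost of moving between two voiced candidates in adjacent frames."""
    xi = abs(math.log(lag_now / lag_prev))
    # Either pay for the raw log-lag change, or pay a fixed penalty and
    # measure the change relative to an octave jump, whichever is cheaper.
    return freq_weight * min(xi, doubling_cost + abs(xi - math.log(2.0)))

UNVOICED_TO_UNVOICED_COST = 0.0  # assumed: no penalty for staying unvoiced

same = voiced_transition_cost(100, 100)
small = voiced_transition_cost(102, 100)
jump = voiced_transition_cost(150, 100)
octave = voiced_transition_cost(200, 100)
print(same, small, jump, octave)
```

With these illustrative weights an exact octave jump is cheaper than a jump of one and a half times the lag, so the tracker tolerates occasional doubling while still favoring smooth F0 contours.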

Page 10: A Robust Algorithm for Pitch Tracking

Post-processing

• Voiced to unvoiced:

• Unvoiced to voiced:

If the speech amplitude is increasing, rr > 1; if it is decreasing, 0 < rr < 1.
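The two voicing-transition formulas are missing from this transcript. The sketch below shows one way the amplitude ratio rr could enter them, assuming rr is the RMS just after a frame boundary divided by the RMS just before it, that the voiced-to-unvoiced cost grows with rr, and that the unvoiced-to-voiced cost grows with 1/rr. Any spectral-change term is omitted. All names and constants are hypothetical.

```python
import math

def rms(x):
    """Root-mean-square value of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def rms_ratio(s, center, half_win):
    """rr: RMS just after `center` divided by RMS just before it."""
    ahead = rms(s[center:center + half_win])
    behind = rms(s[center - half_win:center])
    return ahead / behind if behind > 0 else float("inf")

def voicing_transition_costs(rr, base=0.005, amp_weight=0.5):
    """Return (voiced_to_unvoiced, unvoiced_to_voiced) costs for amplitude ratio rr."""
    rr = max(rr, 1e-9)  # illustrative guard against division by zero
    return base + amp_weight * rr, base + amp_weight / rr

# Amplitude rises sharply at sample 100, as at the onset of voicing.
s = [0.01] * 100 + [1.0] * 100
rr = rms_ratio(s, center=100, half_win=50)
v_to_u, u_to_v = voicing_transition_costs(rr)
print(rr, v_to_u, u_to_v)
```

Under these assumptions a rising amplitude makes the unvoiced-to-voiced transition cheap and the reverse transition expensive, which is the behavior observation 8 calls for.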

Page 11: A Robust Algorithm for Pitch Tracking

Post-processing

• Optimal objective function for frame i:
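The formula is missing from this transcript. A standard dynamic-programming recursion of this kind accumulates, for each state j at frame i, the local cost plus the cheapest combination of a previous state's accumulated cost and the transition cost into j. The sketch below assumes that form, initializes the first frame with its local costs, and backtracks from the cheapest final state. The function name and the toy costs are illustrative.

```python
def best_path(local, transition):
    """Minimum-cost state sequence through a lattice.

    local[i][j]: local cost of state j at frame i.
    transition(i, j, k): cost of moving from state k at frame i-1
    to state j at frame i.
    Returns (total_cost, path) with one state index per frame.
    """
    total = [list(local[0])]
    back = []
    for i in range(1, len(local)):
        row, ptr = [], []
        for j, d in enumerate(local[i]):
            k_best = min(range(len(total[-1])),
                         key=lambda k: total[-1][k] + transition(i, j, k))
            row.append(d + total[-1][k_best] + transition(i, j, k_best))
            ptr.append(k_best)
        total.append(row)
        back.append(ptr)
    # Backtrack from the cheapest state in the final frame.
    j = min(range(len(total[-1])), key=lambda j: total[-1][j])
    cost, path = total[-1][j], [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return cost, path[::-1]

# Two states per frame: 0 = voiced candidate, 1 = unvoiced.
# Frame 1 slightly prefers "unvoiced" locally, but switching state costs 0.3.
local = [[0.1, 0.9], [0.6, 0.5], [0.1, 0.9]]
cost, path = best_path(local, lambda i, j, k: 0.0 if j == k else 0.3)
print(cost, path)
```

The toy example shows contextual evidence overriding local evidence: the middle frame stays voiced because two state changes would cost more than the small local advantage of the unvoiced hypothesis.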