A System for Hybridizing Vocal Performance By Kim Hang Lau

Page 1: A System for Hybridizing Vocal Performance By Kim Hang Lau

A System for Hybridizing Vocal Performance

By Kim Hang Lau

Parameters of the singing voice

Parameters of the singing voice can be loosely classified as:
– Timbre
– Pitch contour
– Time contour (rhythm)
– Amplitude envelope (projections)

Vocal Modification

Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre

Commercially available units include:
– Intonation corrector
– Pitch/formant processor
– Harmonizer
– Vocoder

Objectives

Prototype a system for vocal modification

Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample

Simulate a transfer of singing techniques from a target vocalist to a source vocalist, thus hybridizing the vocal performance

Order of Presentation

System overview
Individual components
System evaluation
System limitations
Conclusions and recommendations

System Overview

Three components:
– Pitch-marking
– Time-alignment
– Time/pitch/amplitude modification engine

Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances

Targeted System Specifications

Vocal performance: commercial singing
Vocal pitch range: 60-1200 Hz
Detection accuracy/resolution: 10 cents
Detection dynamic range: 40 dB
Sampling rate: 44.1 kHz and 48 kHz
Time-scale modification: ±20%
Pitch-scale modification: ±600 cents
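The specifications in cents can be related to frequency ratios with the standard definition of 1200 cents per octave. The short sketch below is only a worked conversion. The function names are illustrative and do not come from the slides.

```python
import math

def cents_to_ratio(cents):
    """Frequency ratio for an interval in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio):
    """Interval in cents for a frequency ratio."""
    return 1200.0 * math.log2(ratio)

# Pitch-scale limits of the specification: +/-600 cents is half an octave,
# i.e. a ratio of sqrt(2) upwards and 1/sqrt(2) downwards.
up = cents_to_ratio(600)
down = cents_to_ratio(-600)

# Width in Hz of the 10-cent resolution at the bottom of the pitch range.
resolution_hz_at_60 = 60.0 * (cents_to_ratio(10) - 1.0)
```

Because the cent is a logarithmic unit, the same 10-cent resolution corresponds to a much smaller width in Hz at 60 Hz than at 1200 Hz.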

Component No. 1: Pitch-marking

Pitch-marking and Glottal Closure Instants (GCIs)

Information generated from pitch-marking:
– Pitch period
– Amplitude envelope
– Voiced/unvoiced segment boundaries

[Figure: pitch-marks, with labels P and P’ and 5 ms time markers]
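As a rough illustration of the first two items listed above, the sketch below derives the pitch period and a per-period amplitude envelope from a set of pitch-mark positions. It assumes the pitch-marks are already available as increasing sample indices. The function name and the choice of peak absolute value as the envelope measure are assumptions, not taken from the slides.

```python
import numpy as np

def pitch_info(signal, marks, fs):
    """Pitch period (s), F0 (Hz) and peak envelope for each pair of
    consecutive pitch-marks. `marks` are increasing sample indices."""
    signal = np.asarray(signal, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks) / fs            # seconds between marks
    f0 = 1.0 / periods                       # fundamental frequency
    # Assumed envelope measure: peak magnitude inside each period.
    envelope = np.array([np.max(np.abs(signal[a:b]))
                         for a, b in zip(marks[:-1], marks[1:])])
    return periods, f0, envelope

# Synthetic check: a 100 Hz sine at 8 kHz has a period of 80 samples.
fs = 8000
n = np.arange(fs)
x = 0.5 * np.sin(2 * np.pi * 100 * n / fs)
marks = np.arange(0, fs, 80)
periods, f0, envelope = pitch_info(x, marks, fs)
```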

Pitch-marking applying Dyadic Wavelet Transform (DyWT)

Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals

The adaptation assumes that GCIs in a speech signal correspond to edges in an image signal

DyWT computation for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking

If a peak detected in the DyWT occurs at the same time instant at two consecutive scales, starting from the lower scale, that time instant is taken as a GCI
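The cross-scale matching rule can be sketched as follows, assuming the DyWT outputs at two consecutive scales are already available as arrays. The relative threshold, the tolerance in samples and all function names are hypothetical choices for illustration. The slides do not give the values used.

```python
import numpy as np

def local_maxima(w, threshold):
    """Indices of local maxima of `w` that exceed `threshold`."""
    w = np.asarray(w, dtype=float)
    mid = w[1:-1]
    is_peak = (mid > w[:-2]) & (mid >= w[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1

def match_gcis(w_low, w_high, rel_threshold=0.5, tolerance=2):
    """Keep peaks of the lower scale that are confirmed by a peak of the
    next higher scale within `tolerance` samples (assumed parameters)."""
    peaks_low = local_maxima(w_low, rel_threshold * np.max(w_low))
    peaks_high = local_maxima(w_high, rel_threshold * np.max(w_high))
    if peaks_high.size == 0:
        return []
    return [int(p) for p in peaks_low
            if np.min(np.abs(peaks_high - p)) <= tolerance]

# Toy example: the peak at sample 20 has no counterpart at the higher scale.
low = np.zeros(50)
low[[10, 20, 30]] = [1.0, 0.9, 1.0]
high = np.zeros(50)
high[[11, 30]] = [1.0, 0.8]
gcis = match_gcis(low, high)
```

The tolerance allows for the small shift in peak position that can occur between scales.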

[Figure: Mallat and Kadambe decompositions; panels: original signal, scales 2^1 to 2^5, base-band]

The proposed pitch-marking scheme

Detection principle:
– Detection of the scale that contains the fundamental period
– Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered

Features:
– 4X decimation to support high sampling rates
– Frame-based processing and error correction for possible quasi-real-time detection
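A minimal sketch of the detection principle above, assuming the frame power at each dyadic scale has already been computed. The function name, the dictionary layout and the `jump_ratio` threshold are hypothetical. The slides do not state what counts as a "considerable jump".

```python
def find_fundamental_scale(frame_power, jump_ratio=4.0, floor=1e-12):
    """Scan from the highest scale (lowest frequency) downwards and return
    the first scale exponent whose frame power jumps by more than
    `jump_ratio` relative to the previous, higher scale.

    frame_power: dict mapping scale exponent j (scale 2^j) to frame power.
    Returns None if no jump is found, e.g. for an unvoiced frame.
    """
    exponents = sorted(frame_power, reverse=True)
    previous = frame_power[exponents[0]]
    for j in exponents[1:]:
        current = frame_power[j]
        if current > jump_ratio * max(previous, floor):
            return j
        previous = current
    return None

# Toy frame: little power at scales 2^5 and 2^4, a jump at 2^3.
voiced = find_fundamental_scale({5: 0.01, 4: 0.02, 3: 1.0, 2: 1.2})
flat = find_fundamental_scale({3: 1.0, 2: 1.1, 1: 1.2})
```

This sketch cannot detect a fundamental that already lies in the highest scale examined, since there is no higher scale to compare against.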

The proposed pitch-marking system

Comparisons of results with Auto-Tune

[Figure panels: proposed system; Auto-Tune]

Component No. 2: The Modification Engine

(n): time-modification factor
(n): pitch-modification factor
(n): amplitude-modification factor
D(n): time-warping function

[Figure: time/pitch/amplitude modification engine, with inputs (n), (n), (n) and D(n)]

TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)

Time-domain splicing overlap-add method

Used in prosodic modification of speech
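To make the overlap-add idea concrete, here is a heavily simplified sketch of pitch-synchronous overlap-add. It is not the engine described in these slides. It assumes constant pitch and time factors rather than time-varying ones, strictly increasing pitch-marks given as sample indices, two-period Hann-windowed grains, and nearest-mark grain selection. The function and parameter names are illustrative.

```python
import numpy as np

def td_psola(x, marks, pitch_factor=1.0, time_factor=1.0):
    """Simplified TD-PSOLA with constant factors (assumption).

    x: input samples; marks: strictly increasing pitch-mark indices.
    pitch_factor > 1 raises the pitch; time_factor > 1 lengthens the signal.
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    out_len = int(round(len(x) * time_factor))
    y = np.zeros(out_len)
    t_syn = marks[0] * time_factor           # first synthesis mark
    while t_syn < out_len:
        # Map the synthesis instant back to the input time axis and
        # pick the nearest analysis pitch-mark.
        k = int(np.argmin(np.abs(marks - t_syn / time_factor)))
        period = int(periods[min(k, len(periods) - 1)])
        lo, hi = marks[k] - period, marks[k] + period
        start = int(round(t_syn)) - period
        fits_in = lo >= 0 and hi < len(x)
        fits_out = start >= 0 and start + 2 * period + 1 <= out_len
        if fits_in and fits_out:
            # Two-period grain, Hann-windowed, added at the synthesis mark.
            grain = x[lo:hi + 1] * np.hanning(2 * period + 1)
            y[start:start + 2 * period + 1] += grain
        # Synthesis marks are spaced by the modified pitch period.
        t_syn += period / pitch_factor
    return y

# Sanity check: with both factors at 1, the interior of a periodic
# signal is reconstructed because the Hann windows sum to one.
fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 100 * n / fs)
marks = np.arange(0, fs, 80)
y_same = td_psola(x, marks)
y_long = td_psola(x, marks, time_factor=1.2)
```

Grains that would extend past either end of the input or output are skipped, so the first and last periods of the output are attenuated in this sketch.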

Evaluation of the modification engine

Original

TD-PSOLA

Auto-Tune

Component No. 3: Time-alignment

Time-alignment

Based on Verhelst’s prototype system that applies Dynamic Time Warping (DTW)

Verhelst claimed that the basic local constraint produces the most accurate time-warping path

Steep increase in computation (quadratic for standard DTW) as the length of comparison increases

Accuracy deteriorates as the length of comparison increases
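For reference, a minimal DTW with the basic local constraint (each step moves one frame forward in the source, the target, or both) can be sketched as below. The scalar absolute-difference cost and the function name are assumptions for illustration. The features and distance measure used in the actual system are not given in the slides.

```python
import numpy as np

def dtw_basic(a, b):
    """DTW between 1-D sequences with the basic local constraint.
    Returns (total cost, warping path as a list of (i, j) index pairs)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    # Fill the full n-by-m grid: this is the quadratic part.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],   # diagonal
                                 cost[i - 1, j],       # advance a only
                                 cost[i, j - 1])       # advance b only
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1, j - 1], i - 1, j - 1),
                      (cost[i - 1, j], i - 1, j),
                      (cost[i, j - 1], i, j - 1)]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n, m], path

# The target repeats one value, so the path holds source frame 1 twice.
total, path = dtw_basic([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

Nothing in this constraint limits how many consecutive horizontal or vertical steps the path may take.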

Adaptations from Verhelst’s method

Time-alignment is performed on a voiced/unvoiced segmental basis:
– DTW for voiced segments
– Linear Time Warping (LTW) for unvoiced segments

Global constraints are introduced to further reduce computation

Synchronization of voiced/unvoiced segments is required; in the current implementation this is done by manual editing
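Linear time warping for an unvoiced segment amounts to a uniform stretch or compression of the frame index. The sketch below shows one way to build such a mapping. The function name and the rounding rule are assumptions, not details from the slides.

```python
def linear_time_warp(src_len, tgt_len):
    """For each of `tgt_len` output frames, return the index of the source
    frame (out of `src_len`) it is taken from, spaced uniformly."""
    if src_len < 1 or tgt_len < 1:
        raise ValueError("segment lengths must be positive")
    if tgt_len == 1:
        return [0]
    scale = (src_len - 1) / (tgt_len - 1)
    # Round half up so the mapping is the same on every platform.
    return [int(i * scale + 0.5) for i in range(tgt_len)]

stretched = linear_time_warp(3, 5)    # source frames repeated
compressed = linear_time_warp(5, 3)   # source frames skipped
```

Unlike DTW, this mapping needs no cost matrix, which fits unvoiced segments where there is no pitch structure to align.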

Manipulation of modification parameters

The modification factors (n), (n) are smoothed with linear-phase FIR low-pass filters before being fed to the modification engine
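One possible form of this smoothing is sketched below, assuming a symmetric Hamming-shaped FIR kernel (symmetric taps give linear phase) and edge padding so the output stays aligned with the input. The filter length, the window shape and the function name are assumptions. The slides do not specify the filter design.

```python
import numpy as np

def smooth_contour(values, numtaps=9):
    """Low-pass a parameter contour with a symmetric (linear-phase) FIR
    filter. The filter delay is compensated by centring the kernel."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd so the kernel can be centred")
    taps = np.hamming(numtaps)
    taps /= taps.sum()                       # unity gain at DC
    half = numtaps // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(np.asarray(values, dtype=float), half, mode="edge")
    return np.convolve(padded, taps, mode="valid")

# A constant contour is unchanged; an isolated spike is spread out.
flat = smooth_contour(np.full(20, 1.2))
spike = np.zeros(21)
spike[10] = 1.0
smoothed_spike = smooth_contour(spike)
```

Centring the kernel matters here because an uncompensated filter delay would shift the factors relative to the pitch-marks they apply to.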

The Prototype System

System Evaluation: case 1

System Evaluation: case 2

System Limitations

Segmentation:
– Lack of a reliable technique for voiced/unvoiced segmentation
– Segmentation and classification of different vocal sounds is the key to devising rules for modification

Modification engine:
– Lacks the capability to handle pitch transitions; depends entirely on the pitch-marking stage

System Limitations

Pitch-marking:
– Proposed system lacks robustness
– Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently

Time-alignment:
– The DTW basic local constraint allows unbounded time expansion and compression
– This often causes distortions in the synthesized vocal sample

Conclusions and Recommendations

The current system works well for slow and continuous singing

Further improvements on the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles

Questions & Answers

Wavelet filter bank

Dyadic Spline Wavelet

Wide-band analysis

DTW local constraints

Calculation of pitch-marks

DyWT