A System for Hybridizing Vocal Performance
By Kim Hang Lau
Parameters of the singing voice
Parameters of the singing voice can be loosely classified as:
– Timbre
– Pitch contour
– Time contour (rhythm)
– Amplitude envelope (projections)
Vocal Modification
Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre
Commercially available units include
– Intonation corrector
– Pitch/formant processor
– Harmonizer
– Vocoder
Objectives
Prototype a system for vocal modification
Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample
Simulate a transfer of singing techniques from a target vocalist to a source vocalist, thus hybridizing the vocal performance
Order of Presentation
– System overview
– Individual components
– System evaluation
– System limitations
– Conclusions and recommendations
System Overview
Three components
– Pitch-marking
– Time-alignment
– Time/pitch/amplitude modification engine
Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances
Targeted System Specifications
Vocal performance: commercial singing
Vocal pitch range: 60-1200 Hz
Detection accuracy/resolution: 10 cents
Detection dynamic range: 40 dB
Sampling rate: 44.1 kHz and 48 kHz
Time-scale modification: ±20%
Pitch-scale modification: ±600 cents
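The cent-based figures above map to frequency ratios through ratio = 2^(cents/1200), so ±600 cents is roughly a factor of 0.707 to 1.414 and 10 cents is a frequency step of about 0.58%. A minimal Python sketch of the conversion; the function names are illustrative and not part of the presented system:

```python
import math

def cents_to_ratio(cents):
    """Frequency ratio for an interval in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio):
    """Interval in cents for a given frequency ratio."""
    return 1200.0 * math.log2(ratio)

# Pitch-scale range of +/-600 cents (half an octave each way)
low, high = cents_to_ratio(-600), cents_to_ratio(600)

# 10-cent resolution expressed as a relative frequency step
step = cents_to_ratio(10) - 1.0
```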
Component No. 1: Pitch-marking
Pitch-marking and Glottal Closure Instants (GCIs)
Information generated from pitch-marking
– Pitch period
– Amplitude envelope
– Voiced/unvoiced segment boundaries
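As a rough illustration of how pitch-marks yield this information, the Python sketch below derives per-cycle periods, F0 values and voiced segments from a list of pitch-mark sample indices. The function name and the rule that a gap longer than one period of the lowest expected pitch (60 Hz) is unvoiced are assumptions made for the example, not the presented system's method.

```python
def pitch_info(marks, fs, f_min=60.0):
    """Derive periods (s), F0 (Hz) and voiced segments from pitch-mark indices.

    Assumption: a gap between consecutive marks longer than 1/f_min seconds
    is treated as unvoiced and splits the voiced segments.
    """
    max_period = 1.0 / f_min
    periods, f0, segments = [], [], []
    start = marks[0]
    for a, b in zip(marks, marks[1:]):
        t = (b - a) / fs
        if t > max_period:
            segments.append((start, a))  # close the current voiced segment
            start = b                    # the next voiced segment starts here
        else:
            periods.append(t)
            f0.append(1.0 / t)
    segments.append((start, marks[-1]))
    return periods, f0, segments
```

For example, `pitch_info([0, 441, 882, 5000, 5441], 44100)` treats the gap between samples 882 and 5000 as unvoiced and reports two voiced segments.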
Pitch-marks
[Figure: waveform annotated with pitch-marks P and P’ and two 5 ms spans]
Pitch-marking applying Dyadic Wavelet Transform (DyWT)
Kadambe adapted Mallat’s algorithm for edge detection in images to the detection of GCIs in speech signals
Kadambe assumed a correspondence between edges in an image signal and GCIs in a speech signal
DyWT computation for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking
If a peak detected in the DyWT coincides across two consecutive scales, starting from the lower scale, that time instant is taken as a GCI
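The cross-scale matching rule can be sketched in Python as below. This assumes the DyWT outputs for each scale are already available as lists of samples; the helper names, the peak threshold and the matching tolerance `tol` are hypothetical choices for illustration only.

```python
def local_maxima(x, threshold):
    """Indices of local maxima of x that exceed the threshold."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > threshold and x[i] >= x[i - 1] and x[i] > x[i + 1]]

def match_peaks(peaks_low, peaks_high, tol=2):
    """Keep peaks of the lower scale that reappear, within tol samples,
    at the next (higher) scale; these are the GCI candidates."""
    return [p for p in peaks_low
            if any(abs(p - q) <= tol for q in peaks_high)]
```

With `peaks_low = [10, 50, 100, 130]` and `peaks_high = [11, 99, 200]`, only the peaks at 10 and 100 are confirmed by the next scale.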
[Figure: Mallat and Kadambe; panels show the original signal, DyWT scales 2^1 to 2^5, and the base-band]
The proposed pitch-marking scheme
Detection principle
– Detection of the scale that contains the fundamental period
– Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered
Features
– 4X decimation to support high sampling rates
– Frame-based processing and error correction for possible quasi-real-time detection
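The detection principle can be sketched in Python as a scan over per-scale frame powers. The `jump_ratio` threshold (a factor of 4, about 6 dB) and the function names are assumptions for the example; the slides do not state how large a jump counts as considerable.

```python
def frame_power(frame):
    """Mean power of one frame of samples."""
    return sum(v * v for v in frame) / len(frame)

def fundamental_scale(powers_by_scale, jump_ratio=4.0):
    """Scan from the highest scale (lowest frequency) downward.

    powers_by_scale maps the scale exponent j (scale 2^j) to the frame power
    at that scale. Returns the first scale whose power exceeds jump_ratio
    times that of the previous (higher) scale, or None if no jump is found.
    """
    scales = sorted(powers_by_scale, reverse=True)
    for prev, cur in zip(scales, scales[1:]):
        if powers_by_scale[cur] > jump_ratio * max(powers_by_scale[prev], 1e-12):
            return cur
    return None
```

For instance, `fundamental_scale({5: 0.01, 4: 0.012, 3: 0.5, 2: 0.6})` returns 3, the scale at which the power first jumps.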
The proposed pitch-marking system
Comparisons of results with Auto-Tune
[Figure panels: proposed system; Auto-Tune]
Component No. 2: The Modification Engine
Inputs to the modification engine, each a function of n:
– Time-modification factor
– Pitch-modification factor
– Amplitude-modification factor
– Time-warping function D(n)
Time/pitch/amplitude modification engine
TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)
– Time-domain splicing overlap-add method
– Used in prosodic modification of speech
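To make the overlap-add idea concrete, here is a minimal Python sketch of TD-PSOLA with constant modification factors. It is not the presented engine: the names `alpha` (time-scale factor) and `beta` (pitch-scale factor), the nearest-mark grain selection and the Hann window are assumptions chosen for the illustration, and amplitude modification is omitted.

```python
import math

def td_psola(x, marks, alpha=1.0, beta=1.0):
    """Minimal TD-PSOLA sketch.

    x: list of samples; marks: increasing pitch-mark indices (at least 3).
    alpha > 1 lengthens the signal; beta > 1 raises the pitch (assumed names).
    """
    out_len = int(len(x) * alpha)
    y = [0.0] * out_len
    t_syn = float(marks[0]) * alpha
    while t_syn < out_len:
        # Analysis mark closest to the instant this synthesis mark maps back to
        t_ana = t_syn / alpha
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_ana))
        k = min(max(k, 1), len(marks) - 2)
        period = (marks[k + 1] - marks[k - 1]) // 2  # local pitch period
        centre = int(round(t_syn))
        # Overlap-add a two-period Hann-windowed grain at the synthesis mark
        for n in range(-period, period + 1):
            src, dst = marks[k] + n, centre + n
            if 0 <= src < len(x) and 0 <= dst < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * n / period))
                y[dst] += w * x[src]
        t_syn += period / beta  # spacing of synthesis marks sets the new pitch
    return y
```

With `alpha = beta = 1` and evenly spaced marks, the Hann windows sum to one, so the interior of the output reproduces the input; this is a convenient sanity check before trying other factors.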
Evaluation of the modification engine
Original
TD-PSOLA
Auto-Tune
Component No. 3: Time-alignment
Time-alignment
Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)
Verhelst claimed that the basic local constraint produces the most accurate time-warping path
Computation grows rapidly as the length of comparison increases (quadratically for unconstrained DTW)
Accuracy deteriorates as the length of comparison increases
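A minimal Python sketch of DTW under the basic local constraint, in which each cell is reached from its left, lower or diagonal neighbour. It assumes one scalar feature per frame and an absolute-difference distance, which are simplifications for the example; the features used in the presented system are not stated here.

```python
def dtw_path(a, b):
    """DTW with the basic local constraint: steps from (i-1, j), (i, j-1)
    or (i-1, j-1). Returns (total cost, warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])  # local distance (assumed scalar features)
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = inf
            if i > 0:
                best = min(best, cost[i - 1][j])
            if j > 0:
                best = min(best, cost[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = d + best
    # Backtrack from the end cell to recover the warping path
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        cands = []
        if i > 0 and j > 0:
            cands.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            cands.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            cands.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(cands)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

The full n-by-m cost table is what makes long comparisons expensive, and because horizontal and vertical steps may repeat without limit, the path can stretch or compress time arbitrarily.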
Adaptations from Verhelst’s method
Time-alignment is performed on a voiced/unvoiced segmental basis
– DTW for voiced segments
– Linear Time Warping (LTW) for unvoiced segments
Global constraints are introduced to further reduce computation
Synchronization of voiced/unvoiced segments is required; it is edited manually in the current implementation
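The two adaptations can be illustrated in Python as follows. The slides do not say which global constraint is used, so the diagonal band below is only one common choice, assumed for the example; the function names and the `width` parameter are likewise hypothetical.

```python
def linear_time_warp(src_len, tgt_len):
    """LTW: map each target frame index to a source frame by uniform scaling."""
    if tgt_len == 1:
        return [0]
    scale = (src_len - 1) / (tgt_len - 1)
    return [int(j * scale + 0.5) for j in range(tgt_len)]

def in_band(i, j, n, m, width):
    """Assumed global constraint: accept cell (i, j) of an n-by-m DTW grid
    only if it lies within `width` frames of the grid diagonal."""
    diag_i = j * (n - 1) / max(m - 1, 1)
    return abs(i - diag_i) <= width

# Cells evaluated with and without the band for a 50-by-50 comparison
n = m = 50
banded = sum(in_band(i, j, n, m, 5) for i in range(n) for j in range(m))
full = n * m
```

Here the band keeps 520 of the 2500 cells, which indicates the kind of saving a global constraint can give; a band also caps how far the path can drift from linear alignment.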
Manipulation of modification parameters
Simple smoothing of the modification parameters with linear-phase FIR low-pass filters is performed before they are fed to the modification engine
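One way to realize such smoothing is a symmetric FIR kernel, which is linear-phase by construction. The Python sketch below uses a normalized Hamming-shaped kernel with edge padding so that the smoothed track stays aligned with the input; the kernel shape, the length of 9 taps and the function names are assumptions, since the slides do not give the filter design.

```python
import math

def hamming_lowpass(num_taps):
    """Symmetric (hence linear-phase) smoothing kernel with unit DC gain."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
         for k in range(num_taps)]
    total = sum(w)
    return [v / total for v in w]

def smooth(values, num_taps=9):
    """Low-pass filter a parameter track (num_taps must be odd).

    The ends are padded with the edge values so the output has the same
    length as the input and the filter delay is compensated.
    """
    h = hamming_lowpass(num_taps)
    half = num_taps // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [sum(h[k] * padded[i + k] for k in range(num_taps))
            for i in range(len(values))]
```

A constant track passes through unchanged, while an isolated spike in a modification factor is spread out and reduced in height.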
The Prototype System
System Evaluation: case 1
System Evaluation: case 2
System Limitations
Segmentation
– Lack of a reliable technique for voiced/unvoiced segmentation
– Segmentation and classification of different vocal sounds is the key to devising rules for modification
Modification engine
– Lacks the capability to handle pitch transitions
– Depends totally on the pitch-marking stage
System Limitations
Pitch-marking
– Proposed system lacks robustness
– Despite the desirable time response of the wavelet filter bank, its frequency response is not capable of isolating harmonics effectively and efficiently
Time-alignment
– The DTW basic local constraint allows infinite time expansion and compression
– This often causes distortions in the synthesized vocal sample
Conclusions and Recommendations
The current system works well for slow, continuous singing
Further improvements to the individual components are recommended to handle greater dynamic changes in the vocal signal, thereby extending the current good results to a wider range of singing styles
Questions & Answers
Wavelet filter bank
Dyadic Spline Wavelet
Wide-band analysis
DTW local constraints
Calculation of pitch-marks
DyWT