Upload
britanni-guerrero
View
14
Download
0
Embed Size (px)
DESCRIPTION
IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin, Roch Lefebvre University of Sherbrooke Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding. Outline. Problem statement Proposed solution System performance Discussion. - PowerPoint PPT Presentation
Citation preview
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1
IEEE Speech Coding WorkshopSept 17–20, 2000Lake Lawn Resort
Delavan, WI
Jean-Marc Valin, Roch LefebvreUniversity of Sherbrooke
Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 2
Outline
• Problem statement
• Proposed solution
• System performance
• Discussion
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 3
Problem Statement
• Telephone Band: 300 - 3400 Hz
• AM Band: 50 - 7000 Hz
• How to make sound like with 500 bits/sec?
• We need to recover information from both low and high-frequency bands
(G.729)
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 4
Proposed Solution
0 1000 2000 3000 4000 5000 6000 7000 800010
20
30
40
50
60
70
80
90
100
110
Frequency (Hz)
Am
plit
ude
(d
B)
• 1) Do our best to recover the wideband information from narrowband speech
• 2) Use coding for the information that cannot be recovered
– Recovered information :• Low-frequency band
• High-frequency excitation
– Coded information :• High-frequency spectral
envelope
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 5
System Overview
• Inverse IRM filter is optional– produces a flat response from 200-3500 Hz
InverseIRMFilter
Low-frequencyregeneration
2
High-frequencyregeneration
8 kHznarrowband
16 kHzwideband
50-300 Hzband
300-3400 Hzband
3400-8000 HzbandSide information
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 6
Low-Frequency Regeneration (1/2)
• Assumptions :– Only pitch harmonics need to be recovered
• In general, no more than two pitch harmonics below 200 Hz
– Absolute phase is not perceptually relevant
• Frequency of harmonics determined from pitch analysis
• Amplitudes determined from feed-forward multi-layer perceptron (output in log domain)
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 7
Low-Frequency Regeneration (2/2)
Low-frequencyharmonic synthesis
Scaleand sum
21st harmonic
2nd harmonic
Pitchanalysis
MFCCcalculation
Multi-layerPerceptron
Pitch delay
Cepstral coefficients
Scalefactors
Narrowbandspeech
Lowfrequencies
Pitch gain
LPfilter
(1)
(1)
(16)
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 8
High-Frequency Extension
• Excitation-filter model (16 ms frames)
• Problem is separated in two parts– Excitation extension (no side information)– Spectral envelope coding (side information)
A(z)Narrowband
speech
LPCanalysis
Excitationextension B(z)
1
Spectral envelopeExtension
Side information(High-frequency spectral envelope)
High-frequency
band
Highpass
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 9
Excitation Extension
Narrowbandexcitation
Absolutevalue
Whiteningfilter
widebandexcitation
2 Highpass
0 5 10 15 20-0.5
0
0.5
1
0 2 4 6 80
2
4
6
8
10
0 5 10 15 200
0.2
0.4
0.6
0.8
1
0 2 4 6 80
5
10
15
0 5 10 15 20-0.2
0
0.2
0.4
0.6
0.8
0 2 4 6 80
1
2
3
4
5
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 10
Spectral Envelope Coding
• Spectral envelope calculated from the wideband LPC coefficients
• Quantization of the 3000-8000 Hz range (40 points)– Log domain– 8-bit Vector Quantization (500 bits/s side information,
using 16 ms frames)
• Concatenation with envelope obtained from LPC analysis on narrowband speech
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 11
Objective results
• Low-frequency band– 3 dB RMS error on harmonic amplitude
• High-frequency band– 3.6 dB RMS error on envelope– No objective measure for excitation extension
(perceptually very close to original)
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 12
Subjective Results
female male
Original wideband
Recovered fromoriginal IRM-filtered speech
Recovered fromG.729 coded speech
Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 13
Discussion
• Highlights– Expand IRM-filtered telephone-band speech to AM band– Very low side information rate (500 bits/s)
• Areas of improvement– Use high-band spectral estimation before coding– Use residual low-frequency information (below 300 Hz)– Noise robustness– Post-filtering