Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

Jean-Marc Valin, Roch Lefebvre, University of Sherbrooke

IEEE Speech Coding Workshop, Sept 17–20, 2000, Lake Lawn Resort, Delavan, WI


Outline

• Problem statement

• Proposed solution

• System performance

• Discussion


Problem Statement

• Telephone Band: 300 - 3400 Hz

• AM Band: 50 - 7000 Hz

• How can telephone-band speech (e.g. G.729-coded) be made to sound like AM-band speech with 500 bits/s?

• We need to recover information from both the low- and high-frequency bands


Proposed Solution

[Figure: speech spectrum, amplitude (dB) versus frequency (Hz), 0 to 8000 Hz]

• 1) Do our best to recover the wideband information from narrowband speech

• 2) Use coding for the information that cannot be recovered

– Recovered information:

  • Low-frequency band

  • High-frequency excitation

– Coded information:

  • High-frequency spectral envelope


System Overview

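• Inverse IRM filter is optional

– Produces a flat response from 200-3500 Hz

[Block diagram. Blocks: inverse IRM filter, upsampling by 2, low-frequency regeneration, high-frequency regeneration. Input: 8 kHz narrowband speech, plus side information for the high band. Output: 16 kHz wideband speech, formed from the 50-300 Hz, 300-3400 Hz, and 3400-8000 Hz bands.]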


Low-Frequency Regeneration (1/2)

• Assumptions:

– Only pitch harmonics need to be recovered

  • In general, no more than two pitch harmonics below 200 Hz

– Absolute phase is not perceptually relevant

• Frequency of harmonics determined from pitch analysis

• Amplitudes determined from feed-forward multi-layer perceptron (output in log domain)
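
As a rough illustration of the synthesis step, the sketch below sums sinusoids at multiples of the pitch frequency and keeps only the harmonics that fall below the low-band edge. It is a minimal sketch, not the authors' implementation: the function name, the 300 Hz cutoff, the 16 kHz rate, the 256-sample frame, and the reading of the log-domain amplitudes as dB are all assumptions, and in the real system the scale factors would come from the perceptron rather than being supplied by hand. Phase starts at zero in every frame, in line with the assumption that absolute phase is not perceptually relevant; phase continuity between frames is ignored here.

```python
import math


def synthesize_low_band(f0_hz, log_amps_db, fs_hz=16000, n_samples=256,
                        cutoff_hz=300.0):
    """Sum the pitch harmonics k * f0_hz that lie below cutoff_hz.

    log_amps_db[k - 1] is the (assumed dB) amplitude of harmonic k.
    All parameter names and defaults are illustrative assumptions.
    """
    out = [0.0] * n_samples
    for k, amp_db in enumerate(log_amps_db, start=1):
        freq = k * f0_hz
        if freq >= cutoff_hz:
            break  # higher harmonics are already present in the telephone band
        amp = 10.0 ** (amp_db / 20.0)  # log domain -> linear amplitude
        for n in range(n_samples):
            out[n] += amp * math.sin(2.0 * math.pi * freq * n / fs_hz)
    return out


# Example: 120 Hz pitch, first harmonic at 0 dB, second at -6 dB.
low_band_frame = synthesize_low_band(120.0, [0.0, -6.0])
```

With a 120 Hz pitch both harmonics (120 Hz and 240 Hz) are synthesized; with a 200 Hz pitch only the first one falls below the assumed cutoff.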


Low-Frequency Regeneration (2/2)

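[Block diagram. Blocks: pitch analysis, MFCC calculation, multi-layer perceptron, low-frequency harmonic synthesis (1st harmonic, 2nd harmonic), scale and sum, LP filter, and a block labelled 2 (presumably upsampling by 2). Signals: narrowband speech (input), pitch delay, pitch gain, cepstral coefficients, scale factors, low frequencies (output). Dimension labels on the signals: (1), (1), (16).]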


High-Frequency Extension

• Excitation-filter model (16 ms frames)

• Problem is separated into two parts

– Excitation extension (no side information)

– Spectral envelope coding (side information)

[Block diagram. Blocks: LPC analysis, A(z) (analysis filter), excitation extension, 1/B(z) (synthesis filter), spectral envelope extension, highpass. Input: narrowband speech, plus side information (high-frequency spectral envelope). Output: high-frequency band.]
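
To make the excitation-filter model concrete, the sketch below estimates A(z) from one frame with the autocorrelation method (Levinson-Durbin recursion) and filters the frame through A(z) to obtain the excitation. This is a generic textbook sketch under stated assumptions, not the authors' code: the LPC order of 10, the 128-sample frame (16 ms at 8 kHz), the absence of windowing, and the synthetic test signal are all choices made for illustration, since the slides do not give these details.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a[0..order], a[0] = 1, so that A(z) = sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def lpc_excitation(x, a):
    """Filter x through A(z): e[n] = sum_k a[k] * x[n - k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


# Synthetic correlated frame standing in for 16 ms of 8 kHz speech.
rng = random.Random(0)
frame = []
prev = 0.0
for _ in range(128):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    frame.append(prev)

lpc = levinson_durbin(autocorr(frame, 10), 10)
residual = lpc_excitation(frame, lpc)
```

The residual carries less energy than the frame itself, which is what lets the spectral envelope and the excitation be treated as two separate problems.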


Excitation Extension

[Block diagram. Blocks: upsampling by 2, absolute value, whitening filter, highpass. Input: narrowband excitation. Output: wideband excitation.]

[Plots: three pairs of panels illustrating the signal at successive stages, with horizontal axes running from 0 to 20 and from 0 to 8.]
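
The role of the absolute-value block can be checked numerically: rectifying a signal creates components at multiples of twice its frequencies, so narrowband content spills into the empty high band. The sketch below shows this for a single tone. It illustrates that one nonlinearity only; the whitening filter and the highpass are left out, the signal is generated directly at 16 kHz to stand in for the upsampled excitation, and the 3000 Hz tone, the 320-sample length, and the helper name `tone_amplitude` are assumptions chosen for the example.

```python
import math

FS = 16000   # wideband sampling rate in Hz
N = 320      # 20 ms; a whole number of periods of the test tone
F0 = 3000    # hypothetical component of the narrowband excitation, in Hz


def tone_amplitude(x, freq_hz, fs_hz=FS):
    """Amplitude of the component of x at freq_hz (single-bin DFT)."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / fs_hz)
             for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
             for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)


narrow = [math.sin(2.0 * math.pi * F0 * n / FS) for n in range(N)]

# Absolute value (full-wave rectification), then remove the DC term it adds.
rectified = [abs(v) for v in narrow]
dc = sum(rectified) / N
rectified = [v - dc for v in rectified]

before = tone_amplitude(narrow, 2 * F0)     # energy at 6000 Hz before
after = tone_amplitude(rectified, 2 * F0)   # energy at 6000 Hz after
```

Before rectification the tone has essentially no energy at 6000 Hz; afterwards a clear component appears there, inside the 3400-8000 Hz band that has to be regenerated.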


Spectral Envelope Coding

• Spectral envelope calculated from the wideband LPC coefficients

• Quantization of the 3000-8000 Hz range (40 points)

– Log domain

– 8-bit vector quantization (500 bits/s side information, using 16 ms frames)

• Concatenation with envelope obtained from LPC analysis on narrowband speech
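
The side-information rate follows directly from the figures above: one 8-bit index per 16 ms frame gives 8 / 0.016 = 500 bits/s, and 8 bits address a codebook of 256 envelopes. The sketch below checks that arithmetic and shows a plain nearest-neighbour codebook search. It is only a sketch under assumptions: the codebook here is filled with random values for illustration (a real one would be trained on speech), the squared-error distance is a guess at the distortion measure, and the function name is hypothetical.

```python
import random

BITS_PER_FRAME = 8     # 8-bit vector quantizer
FRAME_MS = 16          # frame length in milliseconds
POINTS = 40            # envelope points covering 3000-8000 Hz

# One index per frame: 8 bits every 16 ms.
side_info_bps = BITS_PER_FRAME * 1000 // FRAME_MS


def nearest_codeword(envelope_db, codebook):
    """Index of the codebook entry closest to envelope_db (squared error)."""
    best_index = 0
    best_dist = float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((e - c) ** 2 for e, c in zip(envelope_db, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Toy codebook with random log-domain envelopes (illustration only).
rng = random.Random(0)
codebook = [[rng.uniform(-40.0, 0.0) for _ in range(POINTS)]
            for _ in range(2 ** BITS_PER_FRAME)]

# An envelope close to entry 37 is coded as index 37.
noisy = [v + 0.01 for v in codebook[37]]
index = nearest_codeword(noisy, codebook)
```

The decoder would look up the same index in an identical codebook and concatenate the result with the narrowband envelope.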


Objective results

• Low-frequency band

– 3 dB RMS error on harmonic amplitude

• High-frequency band

– 3.6 dB RMS error on envelope

– No objective measure for excitation extension (perceptually very close to original)
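
For reference, an RMS error in dB is commonly computed as the root of the mean squared difference between the reference and estimated values, both expressed in dB. The slides do not spell out the exact definition used, so the sketch below is an assumption about the measure, with a hypothetical function name and made-up example values.

```python
import math


def rms_error_db(reference_db, estimate_db):
    """Root-mean-square difference between two sequences given in dB."""
    squared = [(r - e) ** 2 for r, e in zip(reference_db, estimate_db)]
    return math.sqrt(sum(squared) / len(squared))


# Two points, each off by 3 dB in opposite directions.
example_error = rms_error_db([0.0, 0.0], [3.0, -3.0])
```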


Subjective Results

Conditions compared (female and male speakers):

• Original wideband

• Recovered from original IRM-filtered speech

• Recovered from G.729 coded speech


Discussion

• Highlights

– Expand IRM-filtered telephone-band speech to AM band

– Very low side information rate (500 bits/s)

• Areas of improvement

– Use high-band spectral estimation before coding

– Use residual low-frequency information (below 300 Hz)

– Noise robustness

– Post-filtering