Seattle, Washington Apr 25-28th/00 3GPP2-C11-20000425-011

A Collaboration between QUALCOMM, Motorola, and Lucent Technologies

3GPP2 Presentation: Selectable Mode Vocoder


Agenda

Encoder/Decoder Structure

Front-End Processing + Mode Decision

Concepts of Rate Patterns

PPPWI Model + Quantizers + FER Processing

PPPWI Interoperability

Strengths of PPPWI

Conclusion


Overview of the Encoder


Overview of the Decoder


Front-End Processing and Mode Decision

High-Pass Filter similar to IS-127

Noise-Suppression similar to IS-127

LPC Analysis and Voice Activity Decision similar to IS-127

Speech-type decision related parameters computation, pitch lag computation, speech-type decision multi-level logic

Rate decision based on the speech-type decision and ADR mode

LPC quantizers are different for different rates and coding schemes

RCELP residual modification similar to IS-127


Reusable IS-127 Modules

High-Pass filter

Noise suppressor

VAD + LPC analysis + pitch lag estimation

LPC quantization routines

ACB search and gain computations

Basic FCB routines

Filtering routines


Coding Schemes

FCELP (Full-rate CELP): Transient, bump-up and Mode 1 voiced frames

HCELP (½-Rate CELP): Ends of words

FPPPWI/QPPPWI (Full/Quarter-rate Prototype Pitch Period Waveform Interpolation): Voiced frames

NELP (¼-Rate Noise Excited LP): Unvoiced frames

1/8 Rate: Silence frames


Concepts of Rate Patterns

Basic idea: FQQFQQ… == HHHHHH… in terms of ADR

Why is FQQ better than HHH?

More full rates => naturally higher quality

Full rates are less predictive than half rates => higher performance in FER can be achieved in FQQ

Some segments like highly periodic voiced speech can be encoded in quarter rates. Why use a half-rate when the quarter rate is sufficient?
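
The ADR comparison between FQQ and HHH can be checked with simple arithmetic. The sketch below is a back-of-the-envelope check rather than part of the proposal: the 170 and 40 bits/frame figures come from the PPPWI bit-allocation slides in this deck, while 80 bits/frame for half rate (the IS-127 Rate 1/2 packet size) and 20 ms frames are assumptions. `BITS_PER_FRAME` and `average_data_rate` are illustrative names.

```python
# Bits per 20 ms frame. F and Q come from the bit-allocation slides in this
# deck; H = 80 is an assumption (IS-127 Rate 1/2 packet size).
BITS_PER_FRAME = {"F": 170, "H": 80, "Q": 40}
FRAMES_PER_SECOND = 50  # assumes 20 ms frames (160 samples at 8 kHz)


def average_data_rate(pattern: str) -> float:
    """Average source bit rate in bit/s for a repeating rate pattern."""
    total_bits = sum(BITS_PER_FRAME[symbol] for symbol in pattern)
    return total_bits / len(pattern) * FRAMES_PER_SECOND


if __name__ == "__main__":
    for pattern in ("FQQ", "HHH"):
        print(pattern, round(average_data_rate(pattern), 1), "bit/s")
```

Under these assumptions FQQ averages about 4167 bit/s against 4000 bit/s for HHH, so the two patterns are close in ADR rather than exactly equal.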


Concepts of Rate Patterns (cont.)

In our SMV, different rate patterns are applied to voiced speech in different modes:

Mode 0: FcFcFcFcFcFcFcFcFc … (no pattern)

Mode 1: FcQFcFcQFcFcQFc… (“hiding” the quarter rate)

Mode 2: QQFpQQFpQQFp… (memoryless full rate)

Fc: Full-rate CELP; Fp: Full-rate PPPWI; Q: 1/4-rate PPPWI

A closed-loop measure is used to evaluate the coding efficiency of the Q frames; if it is not satisfactory, the frame is bumped up to full rate and the rate pattern is reset
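
The pattern logic above can be sketched as a small scheduler. This is an illustration under stated assumptions, not the coder's actual rate-decision code: the repeating units are read off the three mode strings on this slide, "reset" is assumed to mean restarting the pattern from its first element, and `q_is_acceptable` is a hypothetical stand-in for the closed-loop measure.

```python
from typing import Callable, List

# Repeating voiced-speech patterns per mode, as listed on the slide.
# Fc: full-rate CELP, Fp: full-rate PPPWI, Q: quarter-rate PPPWI.
PATTERNS = {
    0: ["Fc"],
    1: ["Fc", "Q", "Fc"],
    2: ["Q", "Q", "Fp"],
}


def schedule_voiced_frames(mode: int, n_frames: int,
                           q_is_acceptable: Callable[[int], bool],
                           bump_up_to: str = "Fc") -> List[str]:
    """Assign a coding scheme to each of n_frames consecutive voiced frames.

    q_is_acceptable(frame_index) stands in for the closed-loop measure; it
    returns False when a quarter-rate frame is judged unsatisfactory.
    """
    pattern = PATTERNS[mode]
    position = 0  # index into the repeating pattern
    schedule = []
    for frame in range(n_frames):
        scheme = pattern[position]
        if scheme == "Q" and not q_is_acceptable(frame):
            schedule.append(bump_up_to)  # bump the frame up to full rate
            position = 0                 # assumed meaning of "pattern reset"
            continue
        schedule.append(scheme)
        position = (position + 1) % len(pattern)
    return schedule
```

For example, `schedule_voiced_frames(2, 6, lambda i: True)` walks through Q, Q, Fp twice, while a rejected Q frame is replaced by a full-rate frame and the pattern starts over on the next frame.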


Concepts of Rate Patterns (cont.)

Results:

Very good quality in Mode 1, where only one quarter-rate PPPWI frame is inserted between two full-rate frames on either side (gives 15% more full-rate frames than when using half-rate CELP for voiced speech)

Very good quality in Mode 2, where every third frame is a high-quality full-rate PPPWI (gives double the number of full-rate frames compared with using half-rate CELP for voiced speech)

Enhanced quality under FER conditions due to the high percentage of full-rate frames in both Mode 1 and Mode 2, as well as the use of memoryless full-rate PPPWI as every third high-quality full-rate frame


Comparison of Rate Statistics

Mode 1

Rate            Half-rate CELP for voiced speech    Our SMV coder
Full-rate       36.11%                              41.11%
Half-rate       17.96%                              2.97%
Quarter-rate    18.32%                              28.31%
Eighth-rate     27.61%                              27.61%

Mode 2

Rate            Half-rate CELP for voiced speech    Our SMV coder
Full-rate       14.33%                              27.08%
Half-rate       39.74%                              3.18%
Quarter-rate    18.32%                              42.13%
Eighth-rate     27.61%                              27.61%
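
The rate statistics above can be turned into an average data rate per column. The sketch below is only a rough check: 8500 and 2000 bit/s are the full-rate and quarter-rate totals from the bit-allocation slides in this deck, while 4000 bit/s (half rate) and 800 bit/s (eighth rate) are assumptions based on 80-bit and 16-bit packets per 20 ms frame. `adr_from_usage` is an illustrative name.

```python
# Source bit rates in bit/s. "half" and "eighth" are assumed values.
RATE_BPS = {"full": 8500, "half": 4000, "quarter": 2000, "eighth": 800}

# Frame usage in percent, copied from the Mode 1 and Mode 2 statistics.
USAGE_PERCENT = {
    "Mode 1, half-rate CELP": {"full": 36.11, "half": 17.96, "quarter": 18.32, "eighth": 27.61},
    "Mode 1, SMV": {"full": 41.11, "half": 2.97, "quarter": 28.31, "eighth": 27.61},
    "Mode 2, half-rate CELP": {"full": 14.33, "half": 39.74, "quarter": 18.32, "eighth": 27.61},
    "Mode 2, SMV": {"full": 27.08, "half": 3.18, "quarter": 42.13, "eighth": 27.61},
}


def adr_from_usage(usage_percent):
    """Usage-weighted average bit rate in bit/s."""
    return sum(RATE_BPS[rate] * share for rate, share in usage_percent.items()) / 100.0


if __name__ == "__main__":
    for label, usage in USAGE_PERCENT.items():
        print(f"{label}: {adr_from_usage(usage):.0f} bit/s")
```

With these assumed packet sizes, the two columns of each mode come out within about 3% of each other in ADR, even though the SMV column carries more full-rate frames.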


PPPWI Basic Concepts

[Figure: intro.fig]

Punch line: Quantize L samples only (not 160)


Structural Diagram of PPPWI

Available at full-rate or quarter-rate

Quantization performed in frequency-domain

Produces time-synchronous output


2D Surface Construction

[Figure: FSR.fig]


Phase Track Computation

To convert from the 2D surface back to 1D, we need to choose a sample point from each instantaneous PW

The location of this sample point is indicated by a phase value

Need to take the alignment shift into account when designing such a phase track so as to achieve time-synchrony

Four boundary conditions result: initial and final pitch values, initial and final phase offsets => cubic phase track
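
Four boundary conditions determine a cubic uniquely, which is why the phase track is cubic. The sketch below fits such a cubic under two assumptions that are not stated on the slide: the pitch values set the phase slope as 2*pi/pitch radians per sample, and the final phase passed in already includes the whole cycles travelled and the alignment shift. The function names are illustrative.

```python
import math


def cubic_phase_coefficients(pitch_start, pitch_end, phase_start, phase_end, n_samples):
    """Coefficients (a, b, c, d) of phase(t) = a + b*t + c*t^2 + d*t^3.

    Boundary conditions: phase(0) = phase_start, phase(N) = phase_end,
    phase'(0) = 2*pi/pitch_start, phase'(N) = 2*pi/pitch_end (assumed).
    """
    w0 = 2.0 * math.pi / pitch_start
    w1 = 2.0 * math.pi / pitch_end
    n = float(n_samples)
    delta = phase_end - phase_start
    a = phase_start
    b = w0
    c = 3.0 * delta / n ** 2 - (2.0 * w0 + w1) / n
    d = -2.0 * delta / n ** 3 + (w0 + w1) / n ** 2
    return a, b, c, d


def phase_track(coefficients, n_samples):
    """Evaluate the cubic at sample indices 0..n_samples."""
    a, b, c, d = coefficients
    return [a + b * t + c * t ** 2 + d * t ** 3 for t in range(n_samples + 1)]
```

Each value of the track, taken modulo 2*pi, would then pick the sample point on the corresponding instantaneous PW.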


Amplitude Quantization


Phase Quantization

Problem: How to quantize the phase spectrum of the PW?

Solution: Adopt a multi-band alignment approach

Divide the spectra into N subbands => N band-pass signals

Perform an alignment search on each subband such that each band-pass PW is maximally aligned with the corresponding target
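
The multi-band alignment idea can be sketched as follows. This is a minimal illustration and not the deck's quantizer: it assumes that "maximally aligned" means maximum circular cross-correlation, that subbands are groups of DFT harmonics, and that shifts are whole samples. All function names and the band edges are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_pass(x, k_lo, k_hi):
    """Keep harmonics k_lo..k_hi (and their mirror bins); return the real signal."""
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if (k_lo <= k <= k_hi or k_lo <= n - k <= k_hi) else 0.0
            for k in range(n)]
    return [sum(kept[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def best_circular_shift(band, target):
    """Circular delay of `band` that maximizes its correlation with `target`."""
    n = len(band)

    def correlation(shift):
        return sum(band[(t - shift) % n] * target[t] for t in range(n))

    return max(range(n), key=correlation)


def multiband_alignment(pw, target, band_edges):
    """One alignment shift per subband, each found independently."""
    return [best_circular_shift(band_pass(pw, lo, hi), band_pass(target, lo, hi))
            for lo, hi in band_edges]
```

In this reading, the per-band shifts are what would be transmitted in place of the individual harmonic phases.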


Closed-loop Bump-Up Scheme

Captures the quantization outliers, especially in the quarter-rate PPPWI

Can bump up to either full-rate CELP or full-rate PPPWI

Efficient bump-up schemes have been devised to measure the amplitude and phase quantization efficiencies

Dramatically increases the performance of quarter-rate PPPWI


Bit Allocation Scheme

PPPWI (quarter-rate)

Parameters               Bits/Frame    Bits/s
Mode                     1             50
Pitch                    4             200
LSFs                     16            800
Amplitude Spectrum       18            900
Voicing Cut-Off Freq.    1             50
Overall Bit Rate         40            2000

Transmission Rate: Quarter-Rate PPPWI


Bit Allocation Scheme

PPPWI (Full-Rate)

Parameters            Bits/Frame    Bits/s
Mode                  1             50
Pitch                 7             350
LSFs                  28            1400
Phase Spectrum        107           5350
Amplitude Spectrum    27            1350
Overall Bit Rate      170           8500

Transmission Rate: Full-Rate PPPWI


PPPWI Interoperability with CELP

In contrast to conventional WI, PPPWI maintains time-synchrony with the original => fully compatible with any waveform-matching coder, including CELP

PPPWI employs pitch estimation, LSF and residual quantizers => similar framework to most CELP coders

Easy and simple integration into CELP coders

CELP -> PPPWI: take the last L samples from the ACB memory to form the previous prototype

PPPWI -> CELP: populate the ACB memory with the PPPWI reconstructed time-synchronous residual signal
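
The two hand-offs above amount to simple buffer operations. The sketch below assumes the ACB memory is a fixed-length list of past excitation samples with the newest sample last; the function names are illustrative and not taken from any codec source.

```python
def prototype_from_acb(acb_memory, pitch_lag):
    """CELP -> PPPWI: the last L samples of the ACB memory form the previous prototype."""
    if not 0 < pitch_lag <= len(acb_memory):
        raise ValueError("pitch lag must lie between 1 and the ACB memory length")
    return list(acb_memory[-pitch_lag:])


def acb_from_pppwi(acb_memory, reconstructed_residual):
    """PPPWI -> CELP: shift the reconstructed residual into the ACB memory.

    The memory length stays fixed; the oldest samples are discarded.
    """
    size = len(acb_memory)
    combined = list(acb_memory) + list(reconstructed_residual)
    return combined[-size:]
```

Because the PPPWI residual is time-synchronous with the original, the refreshed memory can be used directly by the next CELP frame's adaptive codebook search.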


Illustration of PPPWI Interoperability


PPPWI FER Processing

Prototype repeat processing

Full-rate PPPWI: memoryless => provides instant FER recovery

Quarter-rate PPPWI: the transmission of delta lag allows the decoder to reconstruct the correct delay contour for more than one frame back

Because of its powerful smoothing capability, the PPPWI synthesis procedure is employed to eliminate discontinuities created by typical CELP FER routines (for instance, IS-127)

The aggressive usage of quarter rates allows a higher percentage of full rates, which in turn improves FER performance


PPPWI Smoothing Capability in FER

[Figure: residual waveforms around an erased frame: original residual; quantized residual; quantized residual with FER (IS-127), showing a click sound; quantized residual with FER + post PPPWI smoothing after pitch contour fixing, with continuous pitch warping across frame boundaries.]


Strengths of PPPWI

High Fidelity:

ability to code an entire voiced speech frame with very few bits

quarter-rate PPPWI allows an increase in the number of full-rate frames available for voiced speech, which in turn translates into high quality

the above advantages taken together are superior to a method where a bit-starved half-rate CELP coder has to quantize an entire frame

Compatible:

can be operated in conjunction with any waveform coding algorithm, including CELP, owing to its time-synchronous nature


Strengths of PPPWI (cont.)

Scalable:

Full-Rate: provides an immediate and full recovery in FER

Quarter-Rate: can be hidden in periodic segments to lower ADR while keeping a high percentage of full rates

Reliable:

robust in noise and tandem conditions

the interpolation scheme smooths discontinuities (clicks) and artifacts during frame erasures


“Normalized Delta Quantization” of LSP Parameters

Speech Processing Research Lab


NDQ (cont.)

LSPs form a strictly ascending set of real values that span [0.0, 0.5].

Subsequent training of scalar or split vector quantizers generates overlap, which “prunes” the available parameter space, e.g.:

Quantized value of LSP(1): 0.05

Codebook values of LSP(2):

index    value
0        0.02
1        0.04
2        0.06
3        0.08

Problem: Indices 0 and 1 are invalid due to LSP(2) < LSP(1)

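• By re-mapping the parameter space according to the following, all values in the codebook are valid:

$$\Delta_i = \frac{\omega_i - \hat{\omega}_{i-1}}{\omega_{max} - \hat{\omega}_{i-1}}, \qquad \hat{\omega}_i = \hat{\omega}_{i-1} + \hat{\Delta}_i \left(\omega_{max} - \hat{\omega}_{i-1}\right), \qquad \hat{\omega}_0 = 0.0, \quad \omega_{max} = 0.5$$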
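
One reading of the re-mapping is that each LSP is coded as a fraction of the interval between the previously quantized LSP and the upper limit 0.5, so any codebook entry in (0, 1) decodes to a valid ascending value. The sketch below implements that reading with scalar codebooks; the exact normalization, the codebook contents and the function names are assumptions, not the presenters' quantizer.

```python
OMEGA_MAX = 0.5  # upper end of the LSP range given on the slide


def ndq_encode(lsp, codebooks):
    """Quantize ascending LSPs one by one.

    codebooks[i] holds candidate normalized deltas in (0, 1) for LSP i.
    Returns (indices, quantized LSPs).
    """
    indices, quantized = [], []
    previous = 0.0
    for value, codebook in zip(lsp, codebooks):
        span = OMEGA_MAX - previous
        target = (value - previous) / span  # normalized delta
        index = min(range(len(codebook)), key=lambda j: abs(codebook[j] - target))
        previous = previous + codebook[index] * span
        indices.append(index)
        quantized.append(previous)
    return indices, quantized


def ndq_decode(indices, codebooks):
    """Rebuild the quantized LSPs from the transmitted indices."""
    quantized = []
    previous = 0.0
    for index, codebook in zip(indices, codebooks):
        previous = previous + codebook[index] * (OMEGA_MAX - previous)
        quantized.append(previous)
    return quantized
```

Since every decoded value is the previous one plus a positive fraction of the remaining range, no index can produce LSP(i) below LSP(i-1), which is the pruning problem described above.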



NDQ (cont.)

Performance

– Using EVRC (IS-127) as benchmark, NDQ yields the following improvement:

Quantizer      Avg SD (dB)    Std Dev    > 2 dB (%)    > 4 dB (%)
IS-127 (22)    1.58           0.41       13.3          0.024
NDQ (22)       1.49           0.39       9.3           0.024
IS-127 (28)    1.22           0.36       3.6           0.019
NDQ (28)       1.06           0.29       1.2           0.005



Factorial Pulse Coding

ACELP “tracks” are generally used to code pulse position information:

Track    Positions
T0       0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50
T1       1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51
T2       2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52
T3       3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53
T4       4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54

EVRC ACELP Configuration: 8 pulses on 5 tracks. Each pulse pair is coded using 7 bits (11 x 11 = 121 < 2^7)

BUT, for mid to high rate coding arrangements:

– Tracks over-constrain position flexibility

» More tracks force pulses to non-optimum positions, producing noise.

– Traditional position coding methods are problematic for more than two pulses per track

» Pulse “indistinction” and “degeneration” cause redundancy in codewords
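
The 7-bit figure for a pulse pair in the EVRC ACELP configuration follows from the 11 positions on each track. The sketch below generates the track positions from the table and packs two per-track position indices into one code. The base-11 packing is just one way to realize 121 codes in 7 bits; the actual IS-127 bit packing may differ, and the function names are illustrative.

```python
N_TRACKS = 5
POSITIONS_PER_TRACK = 11


def track_positions(track):
    """Positions on track T<track>: track, track + 5, ..., track + 50."""
    return [track + N_TRACKS * j for j in range(POSITIONS_PER_TRACK)]


def encode_pair(j0, j1):
    """Pack two position indices (0..10 each) into one code (0..120)."""
    return j0 * POSITIONS_PER_TRACK + j1


def decode_pair(code):
    """Inverse of encode_pair: returns (j0, j1)."""
    return divmod(code, POSITIONS_PER_TRACK)
```

The largest code is 120, which fits in 7 bits and leaves 7 of the 128 codewords unused.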



Factorial Coding Method (cont.)

Factorial Packing Method allows m pulses to be coded on n positions using M bits in accordance with the theoretical minimum expression:

$$N = \sum_{i=1}^{m} F(n,i)\, D(m,i)\, 2^{i} \le 2^{M}$$

– Eliminates “track” concept

» Removes pulse position constraints, reduces noise

– Maximally efficient

» Based on combinatorial mathematics

» Pulse “indistinction” and “degeneration” redundancies eliminated

» Codes non-zero positions, then magnitude information in novel iterative algorithm



Factorial Pulse Coding (cont.)

Codec Example Bit Allocation

LPC                  28
Open loop pitch      7
Closed loop pitch    6
ACB gain             9
FCB shape            105
FCB gain             15
Total                170
Rate (bps)           8500

n = 54, m = 7

i    F(n,i) = n!/(i!(n-i)!)    D(m,i) = F(m-1,i-1)    2^i    F*D*2^i        % of total
7    177100560                 1                      128    22668871680    66.2664%
6    25827165                  6                      64     9917631360     28.9915%
5    3162510                   15                     32     1518004800     4.4375%
4    316251                    20                     16     101200320      0.2958%
3    24804                     15                     8      2976480        0.0087%
2    1431                      6                      4      34344          0.0001%
1    54                        1                      2      108            0.0000%

Sampling rate: 8000      Combs.: 34208719092
Overlap: 0               Bits/track: 35
Framesize: 160           Efficiency: 99.6%
Subframes: 3             Bits/sframe: 35
# Tracks: 1              Bits/frame: 105
Pulses/track: 7          Data rate: 5250
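
The combination count and bit requirement in this example can be reproduced directly from the definitions F(n,i) = n!/(i!(n-i)!) and D(m,i) = F(m-1,i-1) given in the table. The sketch below only counts codewords; it does not implement the iterative encoding algorithm, and the function names are illustrative.

```python
import math


def fpc_combinations(n, m):
    """Number of codewords for m unit pulses on n positions.

    Sums, over the number i of non-zero positions, the position choices
    F(n, i), the magnitude splits D(m, i) = F(m-1, i-1), and the 2^i signs.
    """
    return sum(math.comb(n, i) * math.comb(m - 1, i - 1) * 2 ** i
               for i in range(1, m + 1))


def fpc_bits(n, m):
    """Smallest M with fpc_combinations(n, m) <= 2^M."""
    return (fpc_combinations(n, m) - 1).bit_length()


if __name__ == "__main__":
    total = fpc_combinations(54, 7)
    bits = fpc_bits(54, 7)
    print(total, bits, f"{100.0 * total / 2 ** bits:.1f}%")
```

For n = 54 and m = 7 this gives 34208719092 combinations and 35 bits per subframe, matching the figures in the table.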



EVRC HR Codebook Configuration

Table 4.5.7.4-1. Positions of Individual Pulses in the Rate 1/2 Algebraic Codebook

Pulse                            Positions
T0                               0      7      14     21     28     35     42     49
T1                               2      9      16     23     30     37     44     51
T2                               4      11     18     25     32     39     46     53
Codewords q_Ti, 0 <= i <= 2      000    001    010    011    100    101    110    111



Variable Configuration ACELP

[Block diagram, traditional ACELP. Blocks: Fixed Codebook (FCB) addressed by FCB index k, output ck; Zero State Pitch Filter P(z), output ck'; gain multiplier; Zero State Weighted Synthesis Filter H(z); subtraction from the target xw(n), giving the error ek; Mean Squared Error; Error Minimization Process.]

[Block diagram, variable configuration ACELP. Blocks: Fixed Codebook (FCB) m addressed by FCB index k, output ck; Dispersion Matrix m, output ck[m], in place of P(z); gain multiplier; Zero State Weighted Synthesis Filter H(z); subtraction from the target xw(n), giving the error ek; Mean Squared Error; Error Minimization Process; Configuration Control deriving m from Aq(z).]

Traditional ACELP: Pitch filter reinforces harmonic structure of strongly voiced speech. Has no effect on long pitch periods or unvoiced speech. Results in undermodeled excitation vector ck.

Variable Configuration: Provides mechanism for varying speech model based on transmitted codec parameters.



Variable Configuration ACELP (cont)

[Flowchart: Configuration Control. Aq(z) is converted to rc's (reflection coefficients). A chain of yes/no threshold tests (<= 64; < 95 and <= th; < L and b > th; >= 110; >= 95; rc1 > rth and < th) selects one of the FCB configurations m = 1 to m = 6.]



Joint Interleaved ACELP (JCELP):

Low bit rate ACELP tracks cannot represent all positions, e.g., a significant pulse at position 4 cannot be adequately coded.

Table 2: Example of Pulse Position Definitions using the Prior Art

Pulse Positions

p0 0 7 14 21 28 35 42 49

p1 2 9 16 23 30 37 44 51

p2 3 10 17 24 31 38 45 52

p4 5 12 19 26 33 40 47 54



JCELP (cont.)

Allows ALL individual positions to be coded, but prohibits certain COMBINATIONS of pulses. This method ALWAYS allows the most significant pulse to be coded.

From the previous example, position 4 can always be coded with pulse p0, but only when p1 is in [1, 9, 13, 21, 25, 33, 37, 45, 53]

[Figure 3: Joint Interleaved Pulse Permutation Matrix (Pulses 0 and 1, L = 54). Horizontal axis: pulse 0 decimated position 0 to 13, actual subframe position p0 = 0, 4, 8, ..., 52. Vertical axis: pulse 1 decimated position 0 to 13, actual subframe position p1 = 1, 5, 9, ..., 53.]



Current HR Codebook: Combined Variable Configuration & 3D JCELP

Basic configuration: 3 pulses, 11 bits/subframe

– 2 bits for signs (+-+, -+-, ++-, --+)

– exhaustive search (2 x 2^9 = 1024 iterations)

3-D JCELP allows all positions to be coded

– 2-D “checker board” extended to a 3-D cube

– Sorry, no 3-D graphics

Allowable pulse positions computed algebraically



Lucent Technologies CDMA Selectable Mode Vocoder

Main Approach

– CELP/RCELP/PPP based coder

– Mode - 0

» Based on IS-127, RCELP

– Modes 1 & 2

» Uses Full Rate RCELP & Quarter Rate PPP

» Does not use full rate PPP

» RDA tuned to use full-rate RCELP/Quarter rate PPP

» Lower complexity alternative to Full-Rate PPP


Lucent Full-Rate

Based on EVRC IS-127

– LSP Quantization

» Two-stage MSVQ, non-predictive, 28 bits

• Stage 1: Full dimension, 10 bits

• Stage 2: Split of 3 4 3 with 6 7 5 bits

• Performance vs (EVRC full-rate LSP quantizer)

– MIRS Clean – SD: 1.15 dB (1.34 dB), 2 dB Outliers: 3.56% (8.83%)

– CAR15 – SD: 1.20 dB (1.42 dB), 2 dB Outliers: 5.72% (13.15%)


Lucent Full-Rate Continued

– FCB Search

» Steal delta-delay and LPC flag bits (6 bits)

» 9 pulses in 54 sample subframe with 37 bits

• Each subframe is divided into 7 tracks of 8 locations

• 3 bits for track select

• 20 bits for five one-pulse tracks

• 14 bits for two two-pulse tracks

– RCELP, ACB and FCB gain quantizations

» Same as EVRC


Lucent Half-Rate

– RCELP Based

– LSP Quantizer

» New 2-stage MSVQ

» First stage is same as full rate LSP quantizer

» Second stage is a split vector quantizer

• 5-5 dimension splits

• 6-6 bit allocations

– FCB Search

» Same as IS-127 (10 bits/subframe)


Improved Postfilter

– Improves Tandem Performance

» Weighting filter is generated from LSF interpolations

– Ref: 1997 Speech Coding Workshop, Tasaki et al., pp. 57

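$$k_i = \begin{cases} 0.3 + 0.04\,i, & i = 0, \ldots, 4 \\ 0.5, & i = 5, \ldots, 9 \end{cases}$$

$$f_3(i) = k_i\, f_{ref}(i) + (1 - k_i)\, f_c(i)$$

$$f_4(i) = 0.3\, f_{ref}(i) + 0.7\, f_c(i)$$

$$W(z) = \frac{A_3(z)}{A_4(z)}$$

$f_{ref}$: LSF set with first PARCOR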