Seattle, Washington, Apr 25-28th/00 3GPP2-C11-20000425-011
A Collaboration between QUALCOMM, Motorola, and Lucent Technologies
3GPP2 Presentation: Selectable Mode Vocoder
Agenda
Encoder/Decoder Structure
Front-End Processing + Mode Decision
Concepts of Rate Patterns
PPPWI Model + Quantizers + FER Processing
PPPWI Interoperability
Strengths of PPPWI
Conclusion
Overview of the Encoder
Overview of the Decoder
Front-End Processing and Mode Decision
High-Pass Filter similar to IS-127
Noise-Suppression similar to IS-127
LPC Analysis and Voice Activity Decision similar to IS-127
Computation of speech-type decision parameters and pitch lag;
multi-level speech-type decision logic
Rate decision based on the speech-type decision and ADR mode
LPC quantizers are different for different rates and coding
schemes
RCELP residual modification similar to IS-127
Reusable IS-127 Modules
High-Pass filter
Noise suppressor
VAD + LPC analysis + pitch lag estimation
LPC quantization routines
ACB search and gain computations
Basic FCB routines
Filtering routines
Coding Schemes
FCELP (Full-rate CELP): Transient, bump-up, and Mode 1 voiced frames
HCELP (½-Rate CELP): Ends of words
FPPPWI/QPPPWI (Full/Quarter-rate Prototype Pitch Period Waveform Interpolation): Voiced frames
NELP (¼-Rate Noise Excited LP): Unvoiced frames
1/8 Rate: Silence frames
Concepts of Rate Patterns
Basic idea: FQQFQQ… == HHHHHH… in terms of ADR (average data rate)
Why is FQQ better than HHH?
More full rates => naturally higher quality
Full rates are less predictive than half rates => higher
performance in FER can be achieved in FQQ
Some segments like highly periodic voiced speech can be
encoded in quarter rates. Why use a half-rate when the
quarter rate is sufficient?
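The ADR equivalence can be checked with a short sketch. It assumes nominal rates of 1, 1/2 and 1/4 of full rate for F, H and Q frames; the dictionary and function names are illustrative, not part of the proposal.

```python
from fractions import Fraction

# Nominal rate of each frame type as a fraction of full rate (assumed values).
NOMINAL_RATE = {
    "F": Fraction(1),     # full rate
    "H": Fraction(1, 2),  # half rate
    "Q": Fraction(1, 4),  # quarter rate
}


def average_rate(pattern: str) -> Fraction:
    """Average rate of a repeating frame pattern, in units of full rate."""
    return sum(NOMINAL_RATE[frame] for frame in pattern) / len(pattern)


fqq = average_rate("FQQ")
hhh = average_rate("HHH")
print(fqq, hhh, fqq == hhh)
```

Under these nominal rates both patterns average half of full rate, while FQQ carries a full-rate frame in every group of three.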
Concepts of Rate Patterns (cont.)
In our SMV, different rate patterns are applied to voiced
speech in different modes:
Mode 0: FcFcFcFcFcFcFcFcFc … (no pattern)
Mode 1: FcQFcFcQFcFcQFc… (“hiding” the quarter rate)
Mode 2: QQFpQQFpQQFp… (memoryless full rate)
Fc: Full-rate CELP; Fp: Full-rate PPPWI; Q: 1/4-rate PPPWI
A closed-loop measure is used to evaluate the coding
efficiency of the Q frames; if not satisfactory, bump it up to
full rate and the rate pattern gets reset
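A minimal sketch of the pattern logic follows. The per-mode patterns are taken from the list above; the label "F*" for a bumped-up frame, the callback `q_frame_ok`, and the choice to restart the pattern at its first element on the frame after a bump-up are assumptions made for illustration.

```python
from typing import Callable, List

# Voiced-speech rate patterns per mode, as listed above.
PATTERNS = {
    0: ["Fc"],             # no pattern
    1: ["Fc", "Q", "Fc"],  # "hiding" the quarter rate
    2: ["Q", "Q", "Fp"],   # memoryless full rate every third frame
}


def assign_rates(mode: int, num_frames: int,
                 q_frame_ok: Callable[[int], bool]) -> List[str]:
    """Assign a frame type to each voiced frame.

    q_frame_ok(n) stands in for the closed-loop measure: it returns False
    when quarter-rate coding of frame n is judged unsatisfactory.
    """
    pattern = PATTERNS[mode]
    position = 0
    frame_types = []
    for n in range(num_frames):
        frame_type = pattern[position % len(pattern)]
        if frame_type == "Q" and not q_frame_ok(n):
            frame_types.append("F*")  # bumped up to full rate
            position = 0              # assumed reset: pattern starts over
        else:
            frame_types.append(frame_type)
            position += 1
    return frame_types


print(assign_rates(2, 6, lambda n: True))
print(assign_rates(2, 6, lambda n: n != 1))
```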
Concepts of Rate Patterns (cont.)
Results:
Very good quality in Mode 1, where a single quarter-rate PPPWI
frame is inserted between two full-rate frames (gives 15% more
full-rate frames than when using half-rate CELP for voiced speech)
Very good quality in Mode 2, where every third frame is a
high-quality full-rate PPPWI frame (gives double the number of
full-rate frames compared with half-rate CELP for voiced speech)
Enhanced quality under FER conditions due to the high percentage
of full-rate frames in both Mode 1 and Mode 2, as well as the use of
memoryless full-rate PPPWI for every third (high-quality full-rate)
frame
Comparison of Rate Statistics
Mode 1
               Half-rate CELP for voiced speech   Our SMV coder
Full-rate      36.11%                             41.11%
Half-rate      17.96%                             2.97%
Quarter-rate   18.32%                             28.31%
Eighth-rate    27.61%                             27.61%

Mode 2
               Half-rate CELP for voiced speech   Our SMV coder
Full-rate      14.33%                             27.08%
Half-rate      39.74%                             3.18%
Quarter-rate   18.32%                             42.13%
Eighth-rate    27.61%                             27.61%
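The rate statistics can be turned into a relative average data rate with a small sketch. The percentages are the ones reported above; the nominal rate fractions (1, 1/2, 1/4, 1/8 of full rate) and all names are assumptions for illustration.

```python
# Nominal rate per frame type as a fraction of full rate (assumed values).
NOMINAL_RATE = {"full": 1.0, "half": 0.5, "quarter": 0.25, "eighth": 0.125}

# Frame-type shares in percent, as reported in the comparison above.
RATE_STATS = {
    ("Mode 1", "half-rate CELP for voiced speech"):
        {"full": 36.11, "half": 17.96, "quarter": 18.32, "eighth": 27.61},
    ("Mode 1", "SMV coder"):
        {"full": 41.11, "half": 2.97, "quarter": 28.31, "eighth": 27.61},
    ("Mode 2", "half-rate CELP for voiced speech"):
        {"full": 14.33, "half": 39.74, "quarter": 18.32, "eighth": 27.61},
    ("Mode 2", "SMV coder"):
        {"full": 27.08, "half": 3.18, "quarter": 42.13, "eighth": 27.61},
}


def relative_adr(shares: dict) -> float:
    """Average data rate in units of full rate, from shares in percent."""
    return sum(shares[rate] * NOMINAL_RATE[rate] for rate in shares) / 100.0


for (mode, coder), shares in RATE_STATS.items():
    print(f"{mode}, {coder}: {relative_adr(shares):.4f}")
```

With these assumptions the two coders land at nearly the same relative ADR within each mode (roughly 0.53 in Mode 1 and 0.42 to 0.43 in Mode 2), while the SMV coder spends a larger share of frames at full rate.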
PPPWI Basic Concepts
[Figure: intro.fig]
Punch line: Quantize L samples only (not 160)
Structural Diagram of PPPWI
Available at full-rate or quarter-rate
Quantization performed in frequency-domain
Produces time-synchronous output
2D Surface Construction
[Figure: FSR.fig]
Phase Track Computation
To convert from the 2D surface back to 1D, we need to choose a
sample point from each instantaneous PW
The location of this sample point is indicated by a phase value
Need to take the alignment shift into account when designing
such a phase track so as to achieve time-synchrony
Four boundary conditions result: initial and final pitch
values, initial and final phase offsets => cubic phase track
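One way four boundary conditions determine a cubic is sketched below. This is not the proposal's exact formulation: it assumes the phase slope at each frame edge is 2*pi divided by the pitch value there, that the final phase is only fixed modulo 2*pi, and that the number of whole cycles is chosen to stay close to the average-frequency prediction. All names are hypothetical.

```python
import math


def cubic_phase_coeffs(phi_start, phi_end, pitch_start, pitch_end, n=160):
    """Coefficients (a0, a1, a2, a3) of phi(t) = a0 + a1*t + a2*t^2 + a3*t^3.

    Boundary conditions: phi(0) = phi_start, phi'(0) = 2*pi/pitch_start,
    phi(n) = phi_end + 2*pi*k, phi'(n) = 2*pi/pitch_end.
    """
    w0 = 2.0 * math.pi / pitch_start
    w1 = 2.0 * math.pi / pitch_end
    # Pick the whole-cycle count k nearest to the average-frequency estimate.
    predicted = phi_start + 0.5 * (w0 + w1) * n
    k = round((predicted - phi_end) / (2.0 * math.pi))
    target = phi_end + 2.0 * math.pi * k
    d = target - phi_start - w0 * n  # phase left over after the linear term
    e = w1 - w0                      # required change in slope
    a2 = 3.0 * d / n**2 - e / n
    a3 = -2.0 * d / n**3 + e / n**2
    return phi_start, w0, a2, a3


def phase_at(coeffs, t):
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * t + a2 * t * t + a3 * t**3


coeffs = cubic_phase_coeffs(0.3, 1.1, 40.0, 50.0)
track = [phase_at(coeffs, t) for t in range(161)]
print(track[0], track[-1])
```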
Amplitude Quantization
Phase Quantization
Problem: How to quantize the phase spectrum of the PW?
Solution: Adopt a multi-band alignment approach
Divide the spectrum into N subbands => N band-pass signals
Perform an alignment search on each subband such that each
band-pass PW is maximally aligned with the corresponding
target
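The per-subband alignment search can be illustrated with a brute-force sketch. It assumes the band-pass prototypes are already available as equal-length sample lists and that "maximally aligned" means the circular shift with the largest cross-correlation; the subband split itself is not shown and the function names are hypothetical.

```python
from typing import List, Sequence


def best_circular_shift(target: Sequence[float],
                        candidate: Sequence[float]) -> int:
    """Circular shift of candidate that maximizes correlation with target."""
    n = len(target)
    best_shift, best_corr = 0, float("-inf")
    for shift in range(n):
        # Rotate candidate to the right by `shift` samples and correlate.
        corr = sum(target[i] * candidate[(i - shift) % n] for i in range(n))
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift


def align_subbands(target_bands: List[Sequence[float]],
                   candidate_bands: List[Sequence[float]]) -> List[int]:
    """One alignment shift per subband."""
    return [best_circular_shift(t, c)
            for t, c in zip(target_bands, candidate_bands)]


band = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
rotated = [band[(i - 3) % len(band)] for i in range(len(band))]
print(align_subbands([rotated], [band]))
```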
Closed-loop Bump-Up Scheme
Captures quantization outliers, especially in quarter-rate PPPWI
Can bump up to either full-rate CELP or full-rate PPPWI
Efficient bump-up schemes have been devised to measure
the amplitude and phase quantization efficiencies
Dramatically increases the performance of quarter-rate PPPWI
Bit Allocation Scheme
Quarter-Rate PPPWI: Transmission Rate
Parameters               Bits/Frame   Bits/s
Mode                     1            50
Pitch                    4            200
LSFs                     16           800
Amplitude Spectrum       18           900
Voicing Cut-Off Freq.    1            50
Overall Bit Rate         40           2000
Bit Allocation Scheme
Full-Rate PPPWI: Transmission Rate
Parameters               Bits/Frame   Bits/s
Mode                     1            50
Pitch                    7            350
LSFs                     28           1400
Phase Spectrum           107          5350
Amplitude Spectrum       27           1350
Overall Bit Rate         170          8500
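The totals in both PPPWI bit allocations can be cross-checked with a few lines. The per-parameter bit counts are those listed above; the figure of 50 frames per second is inferred from the Bits/Frame and Bits/s columns (1 bit per frame corresponds to 50 bits/s), and the names are illustrative.

```python
FRAMES_PER_SECOND = 50  # inferred: 1 bit/frame is listed as 50 bits/s

QUARTER_RATE_PPPWI = {
    "Mode": 1,
    "Pitch": 4,
    "LSFs": 16,
    "Amplitude Spectrum": 18,
    "Voicing Cut-Off Freq.": 1,
}

FULL_RATE_PPPWI = {
    "Mode": 1,
    "Pitch": 7,
    "LSFs": 28,
    "Phase Spectrum": 107,
    "Amplitude Spectrum": 27,
}


def totals(allocation: dict) -> tuple:
    """Return (bits per frame, bits per second) for a bit allocation."""
    bits_per_frame = sum(allocation.values())
    return bits_per_frame, bits_per_frame * FRAMES_PER_SECOND


print(totals(QUARTER_RATE_PPPWI))
print(totals(FULL_RATE_PPPWI))
```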
PPPWI Interoperability with CELP
In contrast to conventional WI, PPPWI maintains time-synchrony
with the original => fully compatible with any
waveform-matching coder, including CELP
PPPWI employs pitch estimation, LSF and residual
quantizers => similar framework to most CELP coders
Easy and simple integration into CELP coders
CELP -> PPPWI: take the last L samples from the ACB
memory to form the previous prototype
PPPWI -> CELP: populate the ACB memory with the PPPWI
reconstructed time-synchronous residual signal
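The two hand-off directions can be sketched on a plain list standing in for the ACB memory. The buffer layout (oldest sample first, fixed length) and the function names are assumptions for illustration only.

```python
from typing import List, Sequence


def previous_prototype_from_acb(acb_memory: Sequence[float],
                                pitch_lag: int) -> List[float]:
    """CELP -> PPPWI: the last L samples of ACB memory form the prototype."""
    if not 0 < pitch_lag <= len(acb_memory):
        raise ValueError("pitch lag must be between 1 and the ACB length")
    return list(acb_memory[-pitch_lag:])


def update_acb_from_pppwi(acb_memory: Sequence[float],
                          residual: Sequence[float]) -> List[float]:
    """PPPWI -> CELP: shift the reconstructed residual into ACB memory."""
    size = len(acb_memory)
    return (list(acb_memory) + list(residual))[-size:]


acb = [float(i) for i in range(10)]
print(previous_prototype_from_acb(acb, 4))
print(update_acb_from_pppwi(acb, [10.0, 11.0, 12.0]))
```

Because the PPPWI output is time-synchronous, the residual can be written into the buffer directly, with no realignment step.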
Illustration of PPPWI Interoperability
PPPWI FER Processing
Prototype repeat processing
Full-rate PPPWI: memoryless => provides instant FER recovery
Quarter-rate PPPWI: the transmission of delta lag allows the
decoder to reconstruct the correct delay contour for more than one
frame back
Because of its powerful smoothing capability, the PPPWI synthesis
procedure is employed to eliminate discontinuities created by
typical CELP FER routines (for instance, IS-127)
The aggressive usage of quarter rates allows a higher percentage of
full-rates which in turn improves FER performance
PPPWI Smoothing Capability in FER
[Figure: residual waveforms with frame boundaries marked. Panels: original residual; quantized residual; quantized residual with FER (IS-127), showing a click sound at the erased frame; quantized residual with FER + post PPPWI smoothing (continuous pitch warping after pitch contour fixing).]
Strengths of PPPWI
High Fidelity:
ability to code an entire voiced speech frame with very few bits
quarter-rate PPPWI allows for increase in quantity of full-rate
frames available for voiced speech frames, which in turn translates
into high quality
the above advantages taken together are superior to a method where
a bit-starved half-rate CELP coder has to quantize an entire frame
Compatible:
can be operated in conjunction with any waveform coding
algorithm, including CELP, due to its time-synchronous nature
Strengths of PPPWI (cont.)
Scalable:
Full-Rate: provides an immediate and full recovery in FER
Quarter-Rate: can be hidden in periodic segments to lower ADR
while keeping high percentage of full rates
Reliable:
robust in noise and tandem conditions
the interpolation scheme smooths discontinuities (clicks) and
artifacts during frame erasures
“Normalized Delta Quantization” of LSP Parameters
Speech Processing Research Lab
NDQ (cont.)
LSPs form a strictly ascending set of real values that span [0.0, 0.5]. Subsequent training of scalar or split vector quantizers generates
overlap, which “prunes” the available parameter space, e.g.:
Quantized value of LSP(1): 0.05
Codebook values of LSP(2):
index   value
0       0.02
1       0.04
2       0.06
3       0.08
Problem: Indices 0 and 1 are invalid due to LSP(2) < LSP(1)
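• By re-mapping parameter space according to the following, all values in codebook are valid:
$$\Delta_i = \frac{\omega_i - \hat{\omega}_{i-1}}{\omega_{\max} - \hat{\omega}_{i-1}}, \qquad \hat{\omega}_i = \hat{\omega}_{i-1} + \hat{\Delta}_i \left(\omega_{\max} - \hat{\omega}_{i-1}\right), \qquad \omega_{\max} = 0.5, \quad \hat{\omega}_0 = 0.0$$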
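The effect of the re-mapping can be illustrated with a sketch. It assumes the normalization divides each LSP's distance from the previously quantized LSP by the remaining range up to 0.5; the normalized-delta codebook values are made up for illustration, while the direct codebook and the quantized LSP(1) value are the ones from the example above.

```python
LSP_MAX = 0.5  # upper end of the LSP range


def normalized_delta(lsp: float, prev_quantized: float) -> float:
    """Map an LSP to [0, 1] relative to the previous quantized LSP."""
    return (lsp - prev_quantized) / (LSP_MAX - prev_quantized)


def reconstruct(delta: float, prev_quantized: float) -> float:
    """Inverse mapping: normalized delta back to an LSP value."""
    return prev_quantized + delta * (LSP_MAX - prev_quantized)


prev_quantized = 0.05  # quantized value of LSP(1)

# Direct codebook for LSP(2): entries below LSP(1) are unusable.
direct_codebook = [0.02, 0.04, 0.06, 0.08]
invalid = [i for i, v in enumerate(direct_codebook) if v < prev_quantized]

# Hypothetical normalized-delta codebook: every entry decodes above LSP(1).
delta_codebook = [0.05, 0.10, 0.15, 0.20]
decoded = [reconstruct(d, prev_quantized) for d in delta_codebook]

print(invalid)
print(decoded)
```

Every positive normalized delta decodes to a value strictly between the previous quantized LSP and 0.5, so no codebook index is wasted on an ordering violation.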
NDQ (cont.)
Performance
– Using EVRC (IS-127) as benchmark, NDQ yields the following improvement:
Quantizer     Avg SD (dB)   Std Dev   > 2 dB (%)   > 4 dB (%)
IS-127 (22)   1.58          0.41      13.3         0.024
NDQ (22)      1.49          0.39      9.3          0.024
IS-127 (28)   1.22          0.36      3.6          0.019
NDQ (28)      1.06          0.29      1.2          0.005
Factorial Pulse Coding
ACELP “tracks” are generally used to code pulse position information:
Track   Positions
T0      0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50
T1      1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51
T2      2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52
T3      3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53
T4      4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54
EVRC ACELP configuration: 8 pulses on 5 tracks. Each pulse pair is coded using 7 bits (11 x 11 = 121 < 2^7)
BUT, for mid to high rate coding arrangements:
– Tracks over-constrain position flexibility
» More tracks force pulses to non-optimum positions, producing noise.
– Traditional position coding methods are problematic for more than two pulses per track
» Pulse “indistinction” and “degeneration” cause redundancy in codewords
Factorial Coding Method (cont.)
Factorial Packing Method allows m pulses to be coded on n positions
using M bits in accordance with the theoretical minimum expression:
$$N = \sum_{i=1}^{m} F(n,i)\, D(m,i)\, 2^{i} \le 2^{M}$$
– Eliminates “track” concept
» Removes pulse position constraints, reduces noise
– Maximally efficient
» Based on combinatorial mathematics
» Pulse “indistinction” and “degeneration” redundancies eliminated
» Codes non-zero positions, then magnitude information in novel iterative algorithm
Factorial Pulse Coding (cont.)
Codec Example Bit Allocation
LPC                  28
Open loop pitch      7
Closed loop pitch    6
ACB gain             9
FCB shape            105
FCB gain             15
Total                170
Rate (bps)           8500

n    m   i   F(n,i) = n!/(i!(n-i)!)   D(m,i) = F(m-1,i-1)   2^i   F*D*2^i       % of total
54   7   7   177100560                1                     128   22668871680   66.2664%
         6   25827165                 6                     64    9917631360    28.9915%
         5   3162510                  15                    32    1518004800    4.4375%
         4   316251                   20                    16    101200320     0.2958%
         3   24804                    15                    8     2976480       0.0087%
         2   1431                     6                     4     34344         0.0001%
         1   54                       1                     2     108           0.0000%

Sampling rate: 8000      Combs.: 34208719092
Overlap: 0               Bits/track: 35
Framesize: 160           Efficiency: 99.6%
Subframes: 3             Bits/sframe: 35
# Tracks: 1              Bits/frame: 105
Pulses/track: 7          Data rate: 5250
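The combination count in the table can be reproduced directly from the column definitions F(n,i) = n!/(i!(n-i)!) and D(m,i) = F(m-1,i-1). The sketch below only counts codewords and derives the bit requirement; it does not implement the packing algorithm itself, and the function names are illustrative.

```python
import math


def fpc_combinations(n: int, m: int) -> int:
    """Number of ways to place m signed unit pulses on n positions.

    For i occupied positions: F(n, i) position choices, D(m, i) ways to
    split m pulses among them, and 2^i sign choices.
    """
    total = 0
    for i in range(1, m + 1):
        f = math.comb(n, i)          # F(n, i)
        d = math.comb(m - 1, i - 1)  # D(m, i) = F(m-1, i-1)
        total += f * d * 2**i
    return total


def fpc_bits(n: int, m: int) -> int:
    """Smallest M with fpc_combinations(n, m) <= 2^M."""
    return (fpc_combinations(n, m) - 1).bit_length()


combos = fpc_combinations(54, 7)
bits = fpc_bits(54, 7)
print(combos, bits, f"{combos / 2**bits:.1%}")
```

For n = 54 and m = 7 this gives the 34208719092 combinations and 35 bits per subframe shown in the table, so three subframes need 105 bits per frame.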
EVRC HR Codebook Configuration
Table 4.5.7.4-1. Positions of Individual Pulses in the Rate 1/2 Algebraic Codebook
Pulse   Positions
T0      0    7   14   21   28   35   42   49
T1      2    9   16   23   30   37   44   51
T2      4   11   18   25   32   39   46   53
Codewords q_Ti, 0 ≤ i ≤ 2:
        000  001  010  011  100  101  110  111
Variable Configuration ACELP
[Figure: two FCB search block diagrams. Traditional ACELP: FCB Index k, Fixed Codebook (FCB), ck, Zero State Pitch Filter P(z), ck', multiplier, Zero State Weighted Synthesis Filter H(z), subtraction from the target xw(n), error ek, Mean Squared Error, Error Minimization Process. Variable Configuration: the same loop with Fixed Codebook (FCB)m and a Dispersion Matrix m producing ck[m] in place of the pitch filter, plus a Configuration Control block that derives m from Aq(z).]
Traditional ACELP: Pitch filter reinforces harmonic structure of strongly voiced speech. Has no effect on long pitch periods or unvoiced speech. Results in undermodeled excitation vector ck.
Variable Configuration: Provides mechanism for varying speech model based on transmitted codec parameters.
Variable Configuration ACELP (cont.)
[Figure: Configuration Control flowchart. Aq(z) is converted to rc's (reflection coefficients); a sequence of Y/N threshold tests (<= 64; < 95 and <= th; < L and b > th; >= 110; >= 95; rc1 > rth and < th) selects one of the FCB Configurations m = 1 to m = 6.]
Joint Interleaved ACELP (JCELP):
Low bit rate ACELP tracks cannot represent all positions, e.g., a significant pulse at position 4 cannot be adequately coded.
Table 2: Example of Pulse Position Definitions using the Prior Art
Pulse Positions
p0 0 7 14 21 28 35 42 49
p1 2 9 16 23 30 37 44 51
p2 3 10 17 24 31 38 45 52
p4 5 12 19 26 33 40 47 54
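The coverage gap can be listed with a short sketch. The track contents are those of Table 2, with labels copied as they appear there; treating the subframe as positions 0 to 54 is an assumption based on the largest position in the table.

```python
# Pulse position tracks from Table 2 (each track: 8 positions, step 7).
TRACKS = {
    "p0": list(range(0, 50, 7)),
    "p1": list(range(2, 52, 7)),
    "p2": list(range(3, 53, 7)),
    "p4": list(range(5, 55, 7)),
}

SUBFRAME_POSITIONS = range(0, 55)  # assumed extent of the subframe

covered = {pos for track in TRACKS.values() for pos in track}
uncovered = [pos for pos in SUBFRAME_POSITIONS if pos not in covered]

print(uncovered)
```

Position 4 is among the uncovered positions, so a significant pulse there has to be moved to a neighbouring position.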
JCELP (cont.)
Allows ALL individual positions to be coded, but prohibits certain COMBINATIONS of pulses. This method ALWAYS allows the most significant pulse to be coded.
From the previous example, position 4 can always be coded with pulse p0, but only when p1 is [1, 9, 13, 21, 25, 33, 37, 45, 53]
[Figure: permutation matrix. Horizontal axis: Pulse 0 decimated position, 0 to 13; Pulse 0 actual subframe position (p0), 0, 4, 8, ..., 52. Vertical axis: Pulse 1 decimated position, 0 to 13; Pulse 1 actual subframe position (p1), 1, 5, 9, ..., 53.]
Figure 3: Joint Interleaved Pulse Permutation Matrix (Pulses 0 and 1, L = 54)
Current HR Codebook: Combined Variable Configuration & 3D JCELP
Basic configuration: 3 pulses, 11 bits/subframe
– 2 bits for signs (+-+, -+-, ++-, --+)
– exhaustive search (2 x 2^9 = 1024 iterations)
3-D JCELP allows all positions to be coded
– 2-D “checker board” extended to a 3-D cube
– Sorry, no 3-D graphics
Allowable pulse positions computed algebraically
Lucent Technologies CDMA Selectable Mode Vocoder
Main Approach
– CELP/RCELP/PPP based coder
– Mode - 0
» Based on IS-127, RCELP
– Modes 1 & 2
» Uses Full Rate RCELP & Quarter Rate PPP
» Does not use full rate PPP
» RDA tuned to use full-rate RCELP/Quarter rate PPP
» Lower complexity alternative to Full-Rate PPP
Lucent Full-Rate
Based on EVRC IS-127
– LSP Quantization
» Two stage MSVQ , non-predictive, 28 bits
• Stage 1: Full dimension 10 bits,
• Stage 2: Split of 3 4 3 with 6 7 5 bits
• Performance vs. (EVRC full-rate LSP quantizer)
– MIRS Clean
– SD: 1.15 dB (1.34 dB), 2 dB Outliers: 3.56% (8.83%)
– CAR15
– SD: 1.20 dB (1.42 dB), 2 dB Outliers: 5.72% (13.15%)
Lucent Full-Rate Continued
– FCB Search
» Steal delta-delay and LPC flag bits (6 bits)
» 9 pulses in 54 sample subframe with 37 bits
• Each subframe is divided into 7 tracks of 8 locations
• 3 bits for track select
• 20 bits for five one-pulse tracks
• 14 bits for two two-pulse tracks
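The FCB bit budget above can be tallied with a short sketch. The numbers are the ones listed; the dictionary keys and variable names are illustrative only.

```python
SUBFRAME_LENGTH = 54
NUM_TRACKS = 7
LOCATIONS_PER_TRACK = 8

# Bits per field, as listed above.
BIT_FIELDS = {
    "track select": 3,
    "five one-pulse tracks": 20,
    "two two-pulse tracks": 14,
}

pulses = 5 * 1 + 2 * 2                  # pulses per subframe
total_bits = sum(BIT_FIELDS.values())   # FCB bits per subframe
grid_size = NUM_TRACKS * LOCATIONS_PER_TRACK

print(pulses, total_bits, grid_size >= SUBFRAME_LENGTH)
```

The fields add up to the stated 9 pulses and 37 bits, and 7 tracks of 8 locations give 56 slots, enough to span the 54-sample subframe.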
– RCELP, ACB and FCB gain quantizations
» Same as EVRC
Lucent Half-Rate
– RCELP Based
– LSP Quantizer
» New 2-stage MSVQ
» First stage is same as full rate LSP quantizer
» Second stage is a split vector quantizer
• 5-5 dimension splits
• 6-6 bit allocations
– FCB Search
» Same as IS-127 (10 bits/subframe)
Improved Postfilter
– Improves Tandem Performance
» Weighting filter is generated from LSF interpolations
– Ref: 1997 Speech Coding Workshop, Tasaki et al., p. 57
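$$k_i = \begin{cases} 0.3 + 0.04\,i, & i = 0,\ldots,4 \\ 0.5, & i = 5,\ldots,9 \end{cases}$$
$$f_3(i) = k_i\, f_{ref}(i) + (1 - k_i)\, f_c(i)$$
$$f_4(i) = 0.3\, f_{ref}(i) + 0.7\, f_c(i)$$
$$W(z) = \frac{A_3(z)}{A_4(z)}$$
f_ref: LSF set with first parcor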