[IEEE 2010 7th International Multi-Conference on Systems, Signals and Devices (SSD) - Amman, Jordan (2010.06.27-2010.06.30)] 2010 7th International Multi- Conference on Systems, Signals

2010 7th International Multi-Conference on Systems, Signals and Devices

Switched Split Vector Quantizer Applied for Encoding the LPC Parameters of the 2.4 Kbits/s MELP Speech Coder

Merouane Bouzid, Salah Eddine Cheraitia, and Moussa Hireche

Speech Communication and Signal Processing Laboratory

Electronics Faculty, USTHB University, Algiers, ALGERIA

[email protected]

Abstract- In this paper, we present an optimized switched split vector quantization (SSVQ) scheme developed for low bit-rate encoding of the LPC parameters represented by the line spectral frequencies (LSF). We show that the SSVQ provides better performance in terms of bit-rate, spectral distortion and computational complexity than the traditional split vector quantizer. We then apply our SSVQ encoding system, called the LSF-SSVQ encoder, to quantize the LSF parameters of the standardized 2.4 Kbits/s MELP speech coder operating over an ideal noiseless channel.

I. INTRODUCTION

In speech coding systems, the short-term spectral information of the speech signal is often modelled by the frequency response of an all-pole filter whose transfer function is denoted by $H(z) = 1/A(z)$, in which $A(z) = 1 + a_1 z^{-1} + \dots + a_p z^{-p}$ [1]. In telephone band speech coding (300-3400 Hz, $f_e = 8$ kHz), the parameters of this filter are derived from the input signal through linear prediction analysis of order $p = 10$. The 10 parameters $\{a_i\}_{i=1,2,\dots,10}$, known as the Linear Predictive Coding (LPC) coefficients [1], [2], play a major role in the overall bandwidth and in preserving the quality of the encoded speech. Therefore, the challenge in the quantization of the LPC parameters is to achieve transparent quantization quality [3] with the minimum bit-rate while keeping the memory and computational complexity low.
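As an illustration of how the coefficients of $A(z)$ can be obtained from one speech frame, the following is a minimal sketch of a 10th-order LPC analysis by the autocorrelation method with a Hamming window. The function names, the synthetic test frame and the use of the Levinson-Durbin recursion are assumptions made for this example; it is not the analysis routine used by the authors.

```python
import math
import random


def hamming(n_samples):
    """Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def autocorrelation(x, p):
    """Autocorrelation lags r[0..p] of the (windowed) frame x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the normal equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Returns ([a_1, ..., a_p], residual prediction error energy).
    """
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of order i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical 30 ms frame at 8 kHz (240 samples): noise shaped by a 2nd-order AR filter.
rng = random.Random(0)
frame = []
for n in range(240):
    y1 = frame[-1] if n >= 1 else 0.0
    y2 = frame[-2] if n >= 2 else 0.0
    frame.append(0.9 * y1 - 0.5 * y2 + rng.gauss(0.0, 1.0))

windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
lpc, residual = levinson_durbin(autocorrelation(windowed, 10), 10)
```

Here `lpc` holds the ten coefficients $a_1, \dots, a_{10}$ of the frame and `residual` the remaining prediction error energy.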

In practice, the LPC coefficients are not quantized directly because they have poor quantization properties. Other equivalent parametric representations have therefore been formulated, which convert them into parameters that are much more suitable for quantization. One of the most efficient representations of the LPC coefficients is the Line Spectral Frequency (LSF) representation [4]. The LSF parameters (LSFs), which are related to the zeros of polynomials derived from $A(z)$ [2], exhibit a number of interesting properties [3], [5] that make them a very attractive set of transmission parameters for the LPC coefficients.

Exploiting these properties, various coding schemes based on scalar and vector quantization have been developed for the efficient quantization of the spectral LSF parameters. Several works showed that vector quantizer (VQ) schemes, such as multistage VQ [6] and split VQ [3], can achieve transparent quantization of the LSF parameters at lower bit-rates than schemes based on scalar quantizers (SQ) [7].

In this paper, we present an optimized encoding system based on the switched split vector quantization (SSVQ) method. The aim of this system, called the "LSF-SSVQ encoder", is to achieve an efficient low bit-rate quantization of the spectral LSF parameters. To further improve the performance of our LSF-SSVQ encoders, we used an appropriate weighted distance measure in the design and operation of the encoding system. We then applied the LSF-SSVQ encoding system to encode the LSF parameters of the standardized 2.4 Kbits/s MELP speech coder.

II. SWITCHED SPLIT VECTOR QUANTIZATION

The switched split vector quantizer (SSVQ) is a hybrid scheme based on a switch vector quantizer combined with several split vector quantizers (SVQs) [8], [9].

In the SSVQ scheme, vectors are classified using an exhaustive-search switch VQ and are then quantized by individual SVQs. The advantage of classifying before splitting is that global dependencies between vector components are exploited in the first stage. In addition, the sub-optimality of splitting is then limited to local regions rather than the entire vector space. Hence, as shown in the following sections, the SSVQ provides a better trade-off than the traditional SVQ in terms of bit rate and distortion performance.

Before presenting the SSVQ design process, let us briefly review the basics of the SVQ. An N-part k-dimensional SVQ (denoted N-SVQ) is composed of N classical VQs of smaller sizes and dimensions [3]. Its basic principle consists in partitioning the set of training vectors x of dimension k into N sub-sets of sub-vectors of dimension $k_i$ (with $\sum_{i=1}^{N} k_i = k$). Then, for each part, the corresponding VQ codebook is designed using the well-known LBG-VQ algorithm [10]. Compared with a conventional unstructured k-dimensional VQ of rate R bits/sample (bps) and size $L = 2^{Rk}$, an N-SVQ is thus made up of N codebooks of sizes $L_i = 2^{R_i k_i}$ (where $L = \prod_{i=1}^{N} L_i$ and $R_i$ is the partial rate given in bps).

Quantizing an input source vector with an SVQ encoding system thus consists in splitting this vector into N sub-vectors of smaller dimension, which are then quantized separately using their respective codebooks.
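As a worked example of these size relations, consider the 24 bits/frame 2-SVQ with the (4 - 6) division and the (12 + 12) bit allocation that is evaluated later in the paper. The storage figures below, counted in scalar values, are simple arithmetic added for illustration and are not results reported by the authors.

$$
L_1 = 2^{R_1 k_1} = 2^{12} = 4096, \qquad L_2 = 2^{R_2 k_2} = 2^{12} = 4096, \qquad L = L_1 L_2 = 2^{24} = 16\,777\,216
$$

$$
\underbrace{4096 \times 4 + 4096 \times 6}_{\text{2-SVQ storage}} = 40\,960 \ \text{values}, \qquad \underbrace{2^{24} \times 10}_{\text{unstructured VQ storage}} = 167\,772\,160 \ \text{values}
$$

The split structure thus represents $2^{24}$ possible output vectors while storing and searching only two codebooks of 4096 entries each.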


A. Design principle of the SSVQ

The basic principle of the SSVQ consists in dividing the space of training vectors into several parts, where each part is represented by a corresponding local SVQ [8], [9].

Fig. 1 shows a schematic of the SSVQ codebook construction. The first step consists in applying the LBG-VQ algorithm to all the training vectors in order to produce m centroids (code-vectors). The set of these code-vectors (denoted $Y_m$) is called the switch VQ codebook, where m is the number of switch directions. This codebook is then used to divide the training database into m parts (clusters) according to the nearest-neighbour criterion: each training vector is compared with the m code-vectors and assigned to the cluster of the nearest code-vector. At the end of this step, there are thus m parts corresponding to the m switching directions.

In the second step, each part i (i = 1, ..., m) is represented by a local N-SVQ, denoted N-SVQ$_i$. The classified vectors of each part are divided into N sets of sub-vectors, and the LBG-VQ algorithm is applied to each set of sub-vectors in order to produce the N corresponding codebooks of the local N-SVQ$_i$. At the end of the SSVQ design, one obtains (Nm + 1) codebooks, where the first is the switch VQ codebook and the other Nm codebooks are those of the m local N-SVQs.

Fig. 1 SSVQ codebook construction
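The two design steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: plain k-means (Lloyd iterations) stands in for the LBG-VQ algorithm, the squared Euclidean distance is used instead of a weighted distance, the training vectors are random numbers rather than LSFs, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(codebook, v):
    """Index of the code-vector nearest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def train_vq(vectors, size, iters=20, seed=0):
    """Plain k-means (Lloyd iterations), used here as a stand-in for LBG-VQ."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the previous centroid if a cell is empty
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def train_ssvq(vectors, m, dims, sizes):
    """Return (switch_codebook, local_svqs); local_svqs[i] holds the N codebooks of part i."""
    # Step 1: switch VQ codebook, then nearest-neighbour classification into m clusters.
    switch = train_vq(vectors, m)
    clusters = [[] for _ in range(m)]
    for v in vectors:
        clusters[nearest(switch, v)].append(v)
    # Step 2: one local N-part SVQ per cluster.
    local_svqs = []
    for cluster in clusters:
        books, start = [], 0
        for k, size in zip(dims, sizes):
            subs = [v[start:start + k] for v in cluster]
            books.append(train_vq(subs, min(size, len(subs))))
            start += k
        local_svqs.append(books)
    return switch, local_svqs


# Toy run: 400 random 10-dimensional vectors, m = 4 directions, (4 - 6) division.
rng = random.Random(1)
training = [[rng.random() for _ in range(10)] for _ in range(400)]
switch, local_svqs = train_ssvq(training, m=4, dims=[4, 6], sizes=[8, 8])
n_codebooks = 1 + sum(len(books) for books in local_svqs)
```

With N = 2 parts and m = 4 directions, `n_codebooks` equals Nm + 1 = 9, in line with the codebook count given in the text.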

B. Coding/decoding SSVQ

SSVQ encoding of an input source vector x proceeds in two main steps. First, the vector x is switched to one of the m possible directions according to the nearest-neighbour criterion. The vector is then quantized using the corresponding N-SVQ selected by the switch vector quantizer $Y_m$. Thus, the SSVQ encoder transmits to the decoder an index i composed of N + 1 concatenated indices. The first index, $i_s$, indicates the switch direction. It is the index of the code-vector nearest to x among the m code-vectors of the codebook $Y_m$.

The remaining N indices $i_n$ (n = 1, ..., N) are provided by the local N-SVQ corresponding to the direction $i_s$. These are the indices of the N sub-vectors of x, coded separately by the N-SVQ of the selected part $i_s$. In this work, transmission is assumed to take place over an ideal noiseless channel. For an SSVQ operating at b bits/vector with m switch directions, $b_s = \log_2 m$ bits are required to uniquely identify all possible switching directions. The remaining $(b - b_s)$ bits are shared among N partial rates to encode the N sub-vectors of x with the corresponding N-SVQ. These partial rates are denoted $b_j$ (j = 1, ..., N).

The SSVQ decoder, which has the same codebooks as the encoder, receives the transmitted index $i = (i_s, i_n)$. It uses the first index $i_s$ to select the switch direction. Then, it produces the decoded version of x by concatenating the sub code-vectors of indices $i_n$ (n = 1, ..., N) taken from the N-SVQ of part $i_s$.
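The encoding and decoding steps described above can be sketched as follows. The codebooks are small hand-made toy values, the distance is the plain squared Euclidean distance, and the function names are hypothetical; the sketch only illustrates the order of operations, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(codebook, v):
    """Index of the code-vector nearest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def ssvq_encode(x, switch, local_svqs, dims):
    """Return (i_s, [i_1, ..., i_N]): the switch index followed by the N split indices."""
    i_s = nearest(switch, x)                 # step 1: choose the switch direction
    indices, start = [], 0
    for k, book in zip(dims, local_svqs[i_s]):
        indices.append(nearest(book, x[start:start + k]))  # step 2: local SVQ
        start += k
    return i_s, indices


def ssvq_decode(i_s, indices, local_svqs):
    """Concatenate the sub code-vectors selected in the local SVQ of direction i_s."""
    x_hat = []
    for book, i in zip(local_svqs[i_s], indices):
        x_hat.extend(book[i])
    return x_hat


# Toy example: m = 2 switch directions, N = 2 parts of a 4-dimensional vector.
switch = [[0.2, 0.2, 0.2, 0.2], [0.8, 0.8, 0.8, 0.8]]
local_svqs = [
    [[[0.1, 0.1], [0.3, 0.3]], [[0.1, 0.3], [0.3, 0.1]]],  # local SVQ of direction 0
    [[[0.7, 0.7], [0.9, 0.9]], [[0.7, 0.9], [0.9, 0.7]]],  # local SVQ of direction 1
]
x = [0.88, 0.93, 0.72, 0.86]
i_s, indices = ssvq_encode(x, switch, local_svqs, dims=[2, 2])
x_hat = ssvq_decode(i_s, indices, local_svqs)
```

In this toy case the encoder selects direction 1 and the sub-vector indices [1, 0], so the decoder outputs [0.9, 0.9, 0.7, 0.9]. Each index here costs 1 bit; in a 24 bits/frame encoder with m = 128, the same three indices would occupy 7 + 8 + 9 bits.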

III. EFFICIENT ENCODING OF LSF PARAMETERS USING THE SSVQ TECHNIQUE

Using the SSVQ technique described above, an encoding scheme, called the "LSF-SSVQ encoder", was developed to achieve transparent quantization of the spectral LSF parameters. The objective of the encoder is to efficiently quantize the LSF parameters of one frame using only the dependencies among the parameters of that frame. Recall that transparent quantization means that quantization introduces negligible audible distortion, so that the coded speech is indistinguishable from the original through listening.

In the design of the local N-SVQ of our various LSF-SSVQ encoders, the LSF vector of dimension 10 is divided either into two parts with a (4 - 6) division or into three parts with a (3 - 3 - 4) division. In general, the bits are allocated uniformly to the individual sub-parts of the same division wherever possible [8], [9].

To further improve the performance of the LSF-SSVQ encoder and to obtain transparent quantization quality at a lower bit rate, we selected an appropriate distance measure: the weighted Euclidean distance, computed in the LSF frequency domain since the LSFs correspond very closely to the spectral envelope. Thus, to emphasize a particular portion of the spectrum, such as formant regions and low frequencies, the LSFs of that part can be given more weight than the others. The weighted Euclidean distance measure used in this work is given by [11]:

(1)

where $f_i$ and $\hat{f}_i$ are respectively the ith coefficients of the original $f$ and quantized $\hat{f}$ LSF vectors; $c_i$ and $w_i$ represent respectively the constant and variable weights assigned to the ith LSF coefficient.

Many weighting functions have been defined to calculate the variable weight vector $w = [w_1, \dots, w_{10}]$. In our LSF-SSVQ design, we used a computationally efficient weighting function, given by [12]:

(2)

with $f_0 = 0$ and $f_{11} = 0.5$.

The additional constant weight vector $c = [c_1, \dots, c_{10}]$ is experimentally determined [3]:

$$
c_i = \begin{cases} 1.0, & \text{for } 1 \le i \le 8 \\ 0.8, & \text{for } i = 9 \\ 0.4, & \text{for } i = 10 \end{cases} \qquad (3)
$$

The speech database used in the experiments consists of approximately 85 minutes of speech taken from the TIMIT speech database (sampling rate $f_e = 16$ kHz) [13]. The speech signals are first low-pass filtered to 3.4 kHz, and then down-sampled to 8 kHz. To construct the database of LSF vectors, we used the same LPC analysis function as the FS1016 speech coder [14], where a 10th-order LPC analysis, based on the autocorrelation method, is performed on every analysis frame of 30 ms using a Hamming window. One part of the LSF database, consisting of 144984 LSF vectors, is used for training and the other part, of 26560 LSF vectors (different from the training set), is used for testing.
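The weighted distance referred to in Eqs. (1) to (3) can be sketched as follows. The bodies of Eqs. (1) and (2) are not reproduced in this text, so the forms used here are assumptions: the distance is taken as $\sum_i [c_i w_i (f_i - \hat{f}_i)]^2$, the form commonly associated with [3], and the variable weights as $w_i = 1/(f_i - f_{i-1}) + 1/(f_{i+1} - f_i)$, the weighting commonly associated with [12]. Only the constant weights and the boundary values $f_0 = 0$ and $f_{11} = 0.5$ are taken directly from the text; the LSF values and function names are hypothetical.

```python
def variable_weights(f):
    """Assumed form of Eq. (2): w_i = 1/(f_i - f_{i-1}) + 1/(f_{i+1} - f_i),
    with f_0 = 0 and f_{p+1} = 0.5 (normalized frequency)."""
    ext = [0.0] + list(f) + [0.5]
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]


# Constant weights of Eq. (3): 1.0 for i = 1..8, 0.8 for i = 9, 0.4 for i = 10.
C = [1.0] * 8 + [0.8, 0.4]


def weighted_distance(f, f_hat):
    """Assumed form of Eq. (1): sum_i [c_i * w_i * (f_i - f_hat_i)]^2,
    with the variable weights computed from the original vector f."""
    w = variable_weights(f)
    return sum((c * wi * (a - b)) ** 2 for c, wi, a, b in zip(C, w, f, f_hat))


# Hypothetical LSF vector in normalized frequency, strictly increasing in (0, 0.5).
f = [0.03, 0.05, 0.09, 0.13, 0.18, 0.22, 0.27, 0.32, 0.37, 0.42]
f_hat = [v + 0.002 for v in f]
w = variable_weights(f)
d = weighted_distance(f, f_hat)
```

Under this assumed weighting, closely spaced LSFs, which typically mark formant regions, receive larger weights: here `w[1]` is about 75 while `w[8]` is about 40.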

We present below the quantization performances of the LSF-SSVQ encoders operating at different bit-rates and numbers of switch directions. The performances are evaluated by the average spectral distortion (SD), which is often used as an objective measure of LSF encoding performance and correlates well with human perception of distortion. When calculated discretely over a limited bandwidth, the spectral distortion for frame i is given, in decibels, by [6], [11]:

(4)

For a speech signal sampled at 8 kHz with a 3 kHz bandwidth, an N = 256 point FFT is used to compute the original $S(e^{j2\pi n/N})$ and quantized $\hat{S}(e^{j2\pi n/N})$ power spectra of the LPC synthesis filter associated with the ith frame of speech. The spectral distortion is thus computed discretely with a resolution of 31.25 Hz per sample over 96 uniformly spaced points from 125 Hz to 3.125 kHz. The constants $n_0$ and $n_1$ in Eq. (4) correspond to 1 and 96 respectively. The average spectral distortion is evaluated over all the frames ($N_f$) in the speech database:

$$
SD = \frac{1}{N_f} \sum_{i=1}^{N_f} SD_i \qquad (5)
$$
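The spectral distortion computation can be sketched as follows. The body of Eq. (4) is not reproduced in this text, so the per-frame formula used here, the root-mean-square difference between the two log power spectra in dB, is an assumption based on the usual definition. The bin range 4 to 99 is also an assumption, chosen so that 96 bins of 31.25 Hz start at 125 Hz. For brevity, the sketch evaluates $A(z)$ directly at the N-point FFT frequencies instead of calling an FFT, and the second-order coefficients are hypothetical.

```python
import cmath
import math


def lpc_power_spectrum(a, n, n_fft=256):
    """Power spectrum 1/|A(e^{j 2 pi n / N})|^2 of the synthesis filter 1/A(z).

    a holds [a_1, ..., a_p] with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    w = 2.0 * math.pi * n / n_fft
    A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return 1.0 / abs(A) ** 2


def spectral_distortion(a, a_hat, n0=4, n1=100, n_fft=256):
    """RMS log-spectral difference in dB over bins n0 .. n1 - 1 (assumed form of Eq. (4))."""
    total = 0.0
    for n in range(n0, n1):
        diff = 10.0 * math.log10(lpc_power_spectrum(a, n, n_fft)
                                 / lpc_power_spectrum(a_hat, n, n_fft))
        total += diff ** 2
    return math.sqrt(total / (n1 - n0))


def average_sd(sd_values):
    """Eq. (5): mean of the per-frame spectral distortions."""
    return sum(sd_values) / len(sd_values)


# Hypothetical 2nd-order example: original and slightly perturbed coefficients.
a = [-0.9, 0.4]
a_hat = [-0.88, 0.41]
sd = spectral_distortion(a, a_hat)
```

In an evaluation, `spectral_distortion` would be called once per test frame with the LPC coefficients recovered from the original and quantized LSF vectors, and `average_sd` would then be applied to the resulting list.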

In the past, an average SD of about 1 dB has been suggested for transparent quantization quality and used as a goal in designing many LPC quantization schemes. However, Paliwal and Atal [3] established that the average SD alone may not be sufficient to measure perceived quality. They introduced the notion of spectral outlier frames. Consequently, transparent quality is obtained if the following three conditions are maintained:

- 1) The average SD is approximately 1 dB,

- 2) The percentage of outlier frames having SD between 2 and 4 dB is less than 2%,

- 3) There is no outlier frame having SD greater than 4 dB.
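The three conditions above can be checked from a list of per-frame SD values as in the following sketch. Because "approximately 1 dB" is not a sharp threshold, the limit on the average SD is left as an explicit argument; treating the 2 dB and 4 dB boundaries as inclusive is also an assumption, and the SD values are hypothetical.

```python
def sd_statistics(sd_values):
    """Return (average SD in dB, % of frames in 2-4 dB, % of frames above 4 dB)."""
    n = len(sd_values)
    average = sum(sd_values) / n
    pct_2_4 = 100.0 * sum(1 for sd in sd_values if 2.0 <= sd <= 4.0) / n
    pct_above_4 = 100.0 * sum(1 for sd in sd_values if sd > 4.0) / n
    return average, pct_2_4, pct_above_4


def is_transparent(sd_values, avg_limit):
    """Check the three conditions; avg_limit (dB) is chosen by the caller."""
    average, pct_2_4, pct_above_4 = sd_statistics(sd_values)
    return average <= avg_limit and pct_2_4 < 2.0 and pct_above_4 == 0.0


# Hypothetical per-frame SD values (dB) for 100 frames.
sd_values = [0.8] * 97 + [1.0, 1.0, 2.5]
average, pct_2_4, pct_above_4 = sd_statistics(sd_values)
```

For this hypothetical list, the average is about 0.82 dB with 1% of the frames in the 2-4 dB range and none above 4 dB, so `is_transparent(sd_values, 1.0)` returns `True`.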

For different bit-rates b and numbers of switch directions $m = 2^{b_s}$, the performances of two examples of LSF-SSVQ encoders using 2 parts (4 - 6) and 3 parts (3 - 3 - 4) are shown in Tables I and II respectively.

TABLE I PERFORMANCES OF THE 2-PARTS LSF-SSVQ WITH THE DIVISION (4 - 6)

| m | Bits/frame b ($b_s + b_1 + b_2$) | Average SD (dB) | Outliers 2-4 dB (%) | Outliers >4 dB (%) |
|---|---|---|---|---|
| 8 | 26 (3 + 11 + 12) | 0.98 | 0.46 | 0.000 |
| 8 | 25 (3 + 11 + 11) | 1.06 | 0.97 | 0.000 |
| 8 | 24 (3 + 10 + 11) | 1.10 | 1.20 | 0.000 |
| 8 | 23 (3 + 10 + 10) | 1.18 | 2.49 | 0.000 |
| 8 | 22 (3 + 9 + 10) | 1.22 | 3.06 | 0.000 |
| 16 | 26 (4 + 11 + 11) | 0.98 | 0.65 | 0.000 |
| 16 | 25 (4 + 10 + 11) | 1.03 | 0.80 | 0.000 |
| 16 | 24 (4 + 10 + 10) | 1.11 | 1.58 | 0.007 |
| 16 | 23 (4 + 9 + 10) | 1.15 | 2.00 | 0.003 |
| 16 | 22 (4 + 9 + 9) | 1.23 | 3.79 | 0.003 |
| 32 | 26 (5 + 10 + 11) | 0.95 | 0.44 | 0.000 |
| 32 | 25 (5 + 10 + 10) | 1.03 | 0.81 | 0.000 |
| 32 | 24 (5 + 9 + 10) | 1.08 | 1.01 | 0.000 |
| 32 | 23 (5 + 9 + 9) | 1.16 | 1.97 | 0.000 |
| 32 | 22 (5 + 8 + 9) | 1.21 | 2.62 | 0.000 |

TABLE II PERFORMANCES OF THE 3-PARTS LSF-SSVQ WITH THE DIVISION (3 - 3 - 4)

| m | Bits/frame b ($b_s + b_1 + b_2 + b_3$) | Average SD (dB) | Outliers 2-4 dB (%) | Outliers >4 dB (%) |
|---|---|---|---|---|
| 8 | 26 (3 + 8 + 8 + 7) | 0.98 | 0.80 | 0.00 |
| 8 | 25 (3 + 8 + 7 + 7) | 1.05 | 1.25 | 0.00 |
| 8 | 24 (3 + 7 + 7 + 7) | 1.08 | 1.49 | 0.00 |
| 8 | 23 (3 + 7 + 7 + 6) | 1.14 | 2.80 | 0.00 |
| 8 | 22 (3 + 7 + 6 + 6) | 1.26 | 4.28 | 0.00 |
| 16 | 26 (4 + 8 + 7 + 7) | 0.98 | 0.71 | 0.00 |
| 16 | 25 (4 + 7 + 7 + 7) | 1.02 | 0.85 | 0.00 |
| 16 | 24 (4 + 7 + 7 + 6) | 1.09 | 1.70 | 0.00 |
| 16 | 23 (4 + 7 + 6 + 6) | 1.16 | 2.58 | 0.00 |
| 16 | 22 (4 + 6 + 6 + 6) | 1.21 | 3.33 | 0.00 |
| 32 | 26 (5 + 7 + 7 + 7) | 0.96 | 0.42 | 0.00 |
| 32 | 25 (5 + 7 + 7 + 6) | 1.02 | 0.95 | 0.00 |
| 32 | 24 (5 + 7 + 6 + 6) | 1.08 | 1.47 | 0.00 |
| 32 | 23 (5 + 6 + 6 + 6) | 1.13 | 1.89 | 0.00 |
| 32 | 22 (5 + 6 + 6 + 5) | 1.20 | 3.41 | 0.00 |


These simulation results show that the LSF-SSVQ performances, in terms of average SD and outlier frames, can generally be improved by increasing the number of switching directions. This is due to the increased ability of the unconstrained switch VQ to exploit dependencies as its codebook becomes larger.

In addition, the two-part LSF-SSVQ encoder gives better results than the three-part encoder, especially in terms of outliers. The two-part LSF-SSVQ encoder can achieve transparent quantization at a rate of 23 bits/frame (bpf), while the three-part LSF-SSVQ encoder needs 24 bpf.

Table III presents a comparative performance evaluation between a two-part LSF-SSVQ encoder with m = 32 directions and a conventional two-part SVQ.

TABLE III PERFORMANCE COMPARISONS BETWEEN THE 2-SVQ AND THE 2-SSVQ (m = 32) AS A FUNCTION OF BIT-RATE

| Rate (bits/frame) | Average SD (dB), 2-SVQ | Average SD (dB), 2-SSVQ (m = 32) | Outliers 2-4 dB (%), 2-SVQ | Outliers 2-4 dB (%), 2-SSVQ (m = 32) |
|---|---|---|---|---|
| 26 | 1.05 | 0.95 | 0.82 | 0.44 |
| 25 | 1.03 | 1.03 | 0.95 | 0.81 |
| 24 | 1.17 | 1.08 | 2.11 | 1.01 |
| 23 | 1.21 | 1.16 | 2.63 | 1.97 |
| 22 | 1.29 | 1.21 | 5.05 | 2.62 |

These results show that the SSVQ yields a significant improvement in LSF encoding performance compared to the SVQ, especially in terms of SD outliers. The performance of the LSF-SSVQ encoders could be further improved by increasing the number of switching directions m. For example, the 24 bpf LSF-SSVQ encoder with m = 128 results in an average SD of 1.069 dB and 1.009% outliers in the range 2-4 dB.

This gain in SD performance results from the exploitation of correlation across all dimensions by the unconstrained initial switch VQ, which uses the full vector dimension, unlike the SVQ encoder, which splits the LSF vectors from the outset.

In addition, it has been shown that the computational complexity of the SSVQ is lower than that of the SVQ [8]. Let us illustrate this advantage with a simple computation of the search complexity, i.e. the number of distances calculated to encode one vector. Consider a 22 bits/frame 2-part SSVQ with m = 16 switch directions. Four bits are thus necessary to encode the switch index. The remaining 18 bits are shared equally between the two parts of each local SVQ (9 bits per part).

The total number of searches required by the 2-SSVQ to quantize a vector is thus equal to $2^4 + 2^9 + 2^9 = 1040$ searches. On the other hand, for the same rate of 22 bpf, a conventional 2-SVQ requires $2^{11} + 2^{11} = 4096$ searches to quantize a vector. Thus, for a given bit-rate, the SSVQ has a smaller search complexity than the split VQ.
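The search-complexity count above can be reproduced with a few lines of code. The function names are hypothetical; the counts simply add the sizes of the codebooks that are searched exhaustively, as in the worked example.

```python
def svq_searches(part_bits):
    """Distance computations for a split VQ: one full search per sub-codebook."""
    return sum(2 ** b for b in part_bits)


def ssvq_searches(switch_bits, part_bits):
    """Distance computations for an SSVQ: switch codebook search plus one local SVQ."""
    return 2 ** switch_bits + svq_searches(part_bits)


# 22 bits/frame example from the text: m = 16 (4 bits) and 9 + 9 bits, versus 11 + 11 bits.
ssvq_22 = ssvq_searches(4, [9, 9])
svq_22 = svq_searches([11, 11])

# 24 bits/frame configurations used for the MELP evaluation: (7 + 8 + 9) versus (12 + 12).
ssvq_24 = ssvq_searches(7, [8, 9])
svq_24 = svq_searches([12, 12])
```

The first pair gives 1040 and 4096 searches, matching the figures above. For the 24 bits/frame configurations used with the MELP coder, the same count gives 896 searches for the SSVQ against 8192 for the 2-SVQ.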

IV. LSF-SSVQ SCHEME FOR ENCODING THE MELP LSF PARAMETERS

In this section, we use the LSF-SSVQ encoder (with weighted distance) to quantize the LSF parameters of the standardized MELP speech coder operating over an ideal noiseless channel. The Federal standard MELP (Mixed Excitation Linear Prediction) is a 2.4 Kbits/s speech coder developed by the US DoD [15]. According to the MELP standard, the LSF parameters are originally encoded by a multistage VQ (MSVQ) of 25 bits/frame.

The database used in the following evaluations is composed of 113 s of speech sequences extracted randomly from the TIMIT test database. To evaluate the objective quality of the coded speech signals synthesized by the MELP coder, we used ITU-T Recommendation P.862, known under the abbreviation PESQ (Perceptual Evaluation of Speech Quality) [16]. The PESQ measure can evaluate the listening quality under many degradation conditions and correlates very closely with subjective evaluations.

In Table IV, we present a comparative evaluation of the overall MELP performance when its LSF parameters are coded by each of the following encoders: the original 25 bpf MSVQ of the MELP, the LSF-SSVQ of 24 (7 + 8 + 9) bpf with division (4 - 6) and m = 128, and finally the 2-SVQ of 24 (12 + 12) bpf with division (4 - 6).

TABLE IV PERFORMANCE-PESQ OF THE GLOBAL MELP CODER

| LSF encoder | Average SD (dB) | Outliers 2-4 dB (%) | Outliers >4 dB (%) | MELP performance (PESQ) |
|---|---|---|---|---|
| Original MSVQ (25 bpf) | 1.018 | 1.230 | 0.000 | 3.160 |
| LSF-SSVQ (24 bpf) | 1.078 | 1.032 | 0.000 | 3.157 |
| 2-SVQ (24 bpf) | 1.193 | 2.759 | 0.000 | 3.148 |

These comparative results show that the 24 bits/frame LSF-SSVQ encoder, incorporated in the MELP, presents performance comparable to that of the original 25 bits/frame MSVQ. Indeed, the LSF-SSVQ achieved transparent quantization of the MELP LSF parameters with, in addition, a saving of 1 bit/frame. One also observes the superiority of the SSVQ technique over the SVQ. Moreover, the MELP coder performance in terms of PESQ is very acceptable (PESQ higher than 3), thus ensuring good-quality communication with high levels of intelligibility.

V. CONCLUSION

In this work, an encoding system based on the SSVQ technique has been successfully applied to the efficient encoding of the LSF parameters of the 2.4 Kbits/s MELP speech coder. Compared to the conventional SVQ LSF encoder, the LSF-SSVQ encoder can save about 1-2 bits/frame while maintaining comparable performance. The performance of the LSF-SSVQ encoder in the presence of channel errors remains to be studied. A more careful study of the encoder robustness with respect to changes in recording conditions is also necessary.

REFERENCES

[1] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, NJ, 1978.

[2] W. B. Kleijn and K. K. Paliwal, Speech Coding and Synthesis, Elsevier Science B.V., 1995.

[3] K. K. Paliwal and B. S. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 3-14, 1993.

[4] F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals," Journal of Acoust. Society America, vol. 57, p. 535, 1975.

[5] J. Pan and T. R. Fischer, "Vector quantization of speech line spectrum pair parameters and reflection coefficients," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 106-115, March 1998.

[6] W. F. Leblanc, B. Bhattacharya, S. A. Mahmoud and V. Cuperman, "Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding," IEEE Trans. Speech and Audio Processing, vol. 1, no. 4, pp. 373-385, Oct. 1993.

[7] F. K. Soong and B. H. Juang, "Optimal quantization of LSP parameters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, ICASSP'88, New York, April 1988, pp. 394-397.

[8] S. Stephen and K. K. Paliwal, "Efficient vector quantization of line spectral frequencies using the switched split vector quantiser," in Proc. Int. Conf. Spoken Language Processing, Jeju, Korea, 2004.

[9] S. Stephen and K. K. Paliwal, "A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding," Digital Signal Processing journal, Elsevier, vol. 17, pp. 114-137, 2007.

[10] Y. Linde, A. Buzo and R. M. Gray, "An Algorithm for Vector Quantization Design," IEEE Transactions on Communications, vol. COM-28, pp. 84-95, 1980.

[11] M. Bouzid, A. Djeradi and B. Boudraa, "Optimized Trellis Coded Vector Quantization of LSF Parameters: Application to the 4.8 Kbps FS1016 Speech Coder," Signal Processing journal, Elsevier, vol. 85, no. 9, pp. 1675-1694, September 2005.

[12] R. Laroia, N. Phamdo and N. Farvardin, "Robust and efficient quantization of speech LSP parameters using structured vector quantizers," in Proc. IEEE Int. Conference Acoustic Speech and Signal Processing ICASSP'91, May 1991, pp. 641-644.

[13] J. S. Garofolo et al., "DARPA TIMIT Acoustic-phonetic Continuous Speech Database," National Institute of Standards and Technology (NIST), Gaithersburg, October 1988.

[14] J. P. Campbell, T. E. Tremain and V. C. Welch, "The Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech Technology Magazine, pp. 58-64, April/May 1990.

[15] A. McCree, K. Truong, E. B. George, T. P. Barnwell and V. Viswanathan, "A 2.4 kbits/s MELP Coder Candidate for the New U.S. Federal Standard," in Proc. of IEEE ICASSP-96, 1996, pp. 200-203.

[16] ITU-T, Recommendation P.862, "Perceptual evaluation of speech quality assessment of narrowband telephone networks and speech codecs," February 2001.
