
Proceedings of 2006 IEEE Information Theory Workshop (ITW'06)

DESIGN AND DESCRIPTION OF A 600 BPS SPEECH CODER BASED ON MELPE

Feng Zou, Ying Guo, Xinfu Chen, Yan Liu
Telecommunication Engineering College, Air Force Engineering University, Xi'an, Shaanxi 710077, China

Email: feng-zou dl63.com

Abstract-This paper describes a 600 bps speech coder based on the enhanced mixed excitation linear prediction (MELPe) model [1]. The coder retains the features of MELPe, which yields high-quality synthesized speech and is robust in difficult background noise environments. To reduce the bit rate, we have developed a modified multi-frame joint vector quantization that takes advantage of inherent inter-frame redundancy. A predicted multi-stage vector quantizer (PMSVQ) is designed to quantize the line spectrum frequency (LSF) parameters. Simulation results indicate that efficient, high-quality coding is achieved at 600 bps, and that the proposed coder outperforms the existing 2400 bps LPC-10e standard [2].

I. INTRODUCTION

The Mixed Excitation Linear Predictive (MELP) vocoder [3] was selected as the 2400 bps Federal Standard vocoder in 1996, after a multi-year extensive testing program conducted by the United States Department of Defense Digital Voice Processing Consortium (DDVPC). MELP was selected as the best of seven candidates and even outperformed the FS1016 4800 bps vocoder. MELPe adds a 1200 bps option and speech enhancement. It has been adopted by the North Atlantic Treaty Organization (NATO) as the STANAG 4591 2400/1200 bps vocoder. With the optional noise pre-processor, MELPe is robust in difficult background noise environments such as those frequently encountered in commercial and military communication systems.

In this paper, we describe the important aspects of the algorithm used in the proposed 600 bps speech coder. The core analysis algorithm is shared with the 2400 bps MELPe standard, and the transmitted parameters are a subset of those of the 2400 bps MELPe coder. The parameters of three consecutive frames are grouped together into a superframe. The proposed coder uses the modified multi-frame joint vector quantization to quantize the parameters of each superframe. The predicted multi-stage vector quantizer (PMSVQ) is designed to exploit the LSF correlation from frame to frame. A moving-average (MA) prediction [4] is used to quantize the LSF parameters. MA prediction is robust against channel errors because the propagation of decoding errors is limited by the order of the prediction. The PMSVQ achieves "transparent quality" at low computational complexity, with limited error propagation.

II. CODER OVERVIEW

The proposed coder is designed to operate on an appropriately band-limited signal sampled at 8000 Hz. The input and output samples are represented using 16-bit linear PCM. The coder operates on frames of 25 ms, and three consecutive frames are grouped together into a superframe. This results in an overall algorithmic delay of 75 ms.

The analysis and synthesis algorithm of the proposed coder is shared with the 2400 bps MELPe. Six kinds of parameters need to be transmitted in the MELPe vocoder. We select band-pass voicing, energy, pitch, and spectrum to be quantized and transmitted. No bits are spent on the other two, the Fourier magnitudes and the aperiodic flag. The Fourier magnitude vector is quantized to one of two vectors: a flat vector is selected for unvoiced frames, and a single fixed vector is used for voiced frames, so the choice follows from the voiced/unvoiced decision. The aperiodic flag can also be derived from the voiced/unvoiced decisions of the superframe, because aperiodic pulses are used most often in transition regions between voiced and unvoiced segments of the speech signal. The selected parameters are quantized jointly. The noise pre-processor, adaptive spectral enhancement, and pulse dispersion are used to obtain high-quality synthesized speech.
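The framing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_superframes` and the choice to drop trailing samples that do not fill a whole superframe are assumptions made for the example.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 25
FRAMES_PER_SUPERFRAME = 3

FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000        # samples per 25 ms frame
SUPERFRAME_LEN = FRAME_LEN * FRAMES_PER_SUPERFRAME   # samples per 75 ms superframe


def split_superframes(samples):
    """Split a list of PCM samples into superframes of three 25 ms frames.

    Trailing samples that do not fill a whole superframe are dropped
    (an assumption of this sketch, not something the paper specifies).
    """
    superframes = []
    for start in range(0, len(samples) - SUPERFRAME_LEN + 1, SUPERFRAME_LEN):
        block = samples[start:start + SUPERFRAME_LEN]
        frames = [block[k * FRAME_LEN:(k + 1) * FRAME_LEN]
                  for k in range(FRAMES_PER_SUPERFRAME)]
        superframes.append(frames)
    return superframes


# 1300 samples hold two full superframes plus 100 leftover samples.
pcm = [0] * 1300
blocks = split_superframes(pcm)
```

At 8000 Hz, each 25 ms frame holds 200 samples, so a superframe spans 600 samples, which matches the 75 ms delay stated above.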

III. MODIFIED MULTI-FRAME JOINT VECTOR QUANTIZATION

A. Multi-frame LSF joint quantization based on PMSVQ

The linear prediction coefficients are converted into line spectrum frequencies (LSFs), and the LSF parameters of three consecutive frames are grouped together into a matrix. We can exploit the redundancy arising from the correlation between consecutive matrices, so a predicted multi-stage vector quantizer is designed to quantize the LSF parameters. The LSF residue of the prior superframe is used to predict the LSF coefficients of the current superframe. Input LSF parameters are predicted using a second-order MA prediction, and the residue of the predicted LSF parameters is quantized by a four-stage VQ [5]. Fig. 1 shows the PMSVQ scheme.

1-4244-0067-8/06/$20.00 ©2006 IEEE. 356

Figure 1. The scheme of PMSVQ

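The input LSF parameters are $\boldsymbol{\omega}_{i,j}$. The quantized LSF parameters $\hat{\boldsymbol{\omega}}_{i,j}$ are generated using:

$$\hat{\boldsymbol{\omega}}_{i,j} = P_{0,j}\,\mathbf{r}_{i,j} + P_{1,j}\,\mathbf{r}_{i-1,3} + P_{2,j}\,\mathbf{r}_{i-1,2} \tag{1}$$

$$P_{k,j} = \operatorname{diag}\{p_{k,j}^{1}, p_{k,j}^{2}, \ldots, p_{k,j}^{10}\} \tag{2}$$

$$\sum_{k=0}^{M} P_{k,j} = I; \quad j = 1, 2, 3; \quad k = 0, \ldots, M; \quad M = 2 \tag{3}$$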

where $i$ indexes the superframe, $j$ indexes the frame within the superframe, $M$ is the MA prediction order, $\mathbf{r}_{i,j}$ is the output vector of the four-stage VQ for the $j$-th frame of the $i$-th superframe, $I$ is the identity matrix, and $P_{k,j}$ is a diagonal prediction matrix.

The generalized Lloyd algorithm [6] is used to train the MA predictive coefficients. First, for each frame, the algorithm selects the code vector from the LSF codebook that minimizes the distortion for the input LSF parameters. Second, it determines the MA predictive coefficients that minimize the distortion between the input parameters and the reconstructed parameters over all frames. The two steps are performed alternately. While the MA predictive coefficients are being trained, the codebooks are kept fixed. Spectral distortion is selected as the distortion measure.

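$$SD_{i,j} = \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\left[10\log_{10} S_{i,j}(\omega) - 10\log_{10}\hat{S}_{i,j}(\omega)\right]^{2} d\omega} \tag{4}$$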

where $i$ indexes the superframe, $j$ indexes the frame within the superframe, and $S_{i,j}(\omega)$ and $\hat{S}_{i,j}(\omega)$ are the power spectra of the unquantized and quantized signals, respectively.

The MSVQ codebook consists of four stages of 128, 128, 64, and 64 levels, respectively, for a total of 7 + 7 + 6 + 6 = 26 bits. The search procedure is an M-best [7] approximation to a full search, in which the M = 8 best code vectors from each stage are retained for use with the next stage (this M is unrelated to the MA prediction order). Spectral distortion is again used as the distortion measure.
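The M-best search can be illustrated with a short sketch. It is a simplification under stated assumptions: squared Euclidean error stands in for the spectral distortion used in the paper, and the function name `m_best_search` and the tiny one-dimensional codebooks are hypothetical, chosen only to show why keeping several survivors can beat a greedy stage-by-stage search.

```python
import math


def m_best_search(target, stages, m_best=8):
    """Return (indices, reconstruction) found by an M-best multi-stage search.

    stages is a list of codebooks; the reconstruction is the sum of one code
    vector per stage. Squared error is used here as a stand-in distortion.
    """
    dim = len(target)
    # Each survivor is (distortion, chosen indices, partial reconstruction).
    survivors = [(0.0, [], [0.0] * dim)]
    for codebook in stages:
        candidates = []
        for _, indices, partial in survivors:
            for idx, code in enumerate(codebook):
                recon = [a + b for a, b in zip(partial, code)]
                dist = sum((t - x) ** 2 for t, x in zip(target, recon))
                candidates.append((dist, indices + [idx], recon))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:m_best]   # keep only the M best paths
    _, indices, recon = survivors[0]
    return indices, recon


# Stage sizes from the paper: the index costs log2(size) bits per stage.
STAGE_SIZES = [128, 128, 64, 64]
TOTAL_BITS = sum(int(math.log2(n)) for n in STAGE_SIZES)

# Toy two-stage example in one dimension (hypothetical codebooks).
toy_stages = [[[0.0], [0.9]], [[0.5], [0.0]]]
greedy = m_best_search([0.5], toy_stages, m_best=1)
two_best = m_best_search([0.5], toy_stages, m_best=2)
```

In the toy example the greedy search (M = 1) commits to the first-stage vector 0.9 and ends at a reconstruction of 0.9, whereas M = 2 keeps the first-stage vector 0.0 alive and reaches the target 0.5 exactly.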

TABLE I. LSF QUANTIZER PERFORMANCE BASED ON PMSVQ

Average SD (dB)    2 dB < SD < 4 dB    SD > 4 dB
1.24               7.87‰               0.1‰
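As a rough illustration of how statistics of this kind can be computed, the sketch below evaluates a log-spectral distortion between two all-pole (LPC) spectra and summarizes a set of per-frame values. It assumes the common log-spectral definition of spectral distortion, unit-gain LPC spectra, and a uniform frequency grid on [0, π); the function names and the grid size are hypothetical, and none of this is taken from the authors' evaluation code.

```python
import cmath
import math


def lpc_power_spectrum_db(a, omega):
    """10*log10 of 1/|A(e^{j*omega})|^2 for A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    acc = 1.0 + sum(c * cmath.exp(-1j * omega * (k + 1)) for k, c in enumerate(a))
    return -20.0 * math.log10(abs(acc))


def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """Approximate the RMS log-spectral difference on a uniform grid over [0, pi)."""
    total = 0.0
    for n in range(n_points):
        omega = math.pi * (n + 0.5) / n_points
        diff = (lpc_power_spectrum_db(a_ref, omega)
                - lpc_power_spectrum_db(a_quant, omega))
        total += diff * diff
    return math.sqrt(total / n_points)


def summarize(sd_values):
    """Return (mean SD, fraction with 2 < SD <= 4 dB, fraction with SD > 4 dB).

    A value of exactly 4 dB is counted in the middle bin (a choice of this sketch).
    """
    n = len(sd_values)
    mean = sum(sd_values) / n
    mid = sum(1 for v in sd_values if 2.0 < v <= 4.0) / n
    high = sum(1 for v in sd_values if v > 4.0) / n
    return mean, mid, high


same = spectral_distortion_db([-0.9], [-0.9])
different = spectral_distortion_db([-0.9], [-0.8])
stats = summarize([1.0, 1.0, 3.0, 5.0])
```

Because the power spectrum of a real signal is symmetric, averaging over [0, π) gives the same value as averaging over the full period.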

Table 1 shows the performance of the PMSVQ. It approximately achieves "transparent quality" while using only a 26-bit codebook. The concept of "transparent quantization" is described in reference [8]. Any degradation caused by channel errors affects only a few subsequent superframes, the number of which is determined by the order of the MA prediction.
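The limited error propagation can be seen in a toy decoder for the MA-predictive reconstruction, in which each frame is a weighted sum of its own VQ output and the VQ outputs of the last two frames of the previous superframe. The sketch below is illustrative only: the one-dimensional vectors, the predictor values, the all-zero history before the first superframe, and the function names are assumptions, not values from the paper.

```python
def reconstruct_lsf(p, r_cur, r_prev3, r_prev2):
    """Reconstruct one frame from VQ outputs with diagonal MA predictors.

    p        -- [p0, p1, p2], each a list holding the diagonal of one predictor
    r_cur    -- VQ output vector of the current frame
    r_prev3  -- VQ output of frame 3 of the previous superframe
    r_prev2  -- VQ output of frame 2 of the previous superframe
    """
    p0, p1, p2 = p
    # The predictor diagonals must sum to one in every dimension.
    for a, b, c in zip(p0, p1, p2):
        if abs(a + b + c - 1.0) > 1e-9:
            raise ValueError("predictor diagonals must sum to 1")
    return [a * x + b * y + c * z
            for a, b, c, x, y, z in zip(p0, p1, p2, r_cur, r_prev3, r_prev2)]


def decode_superframes(residuals, predictors):
    """residuals[i][j] is the VQ output of frame j+1 in superframe i."""
    dim = len(residuals[0][0])
    prev = [[0.0] * dim for _ in range(3)]   # assumed all-zero history
    decoded = []
    for frames in residuals:
        decoded.append([reconstruct_lsf(predictors[j], frames[j], prev[2], prev[1])
                        for j in range(3)])
        prev = frames
    return decoded


# Hypothetical predictors, identical for the three frame positions.
predictors = [[[0.5], [0.3], [0.2]]] * 3
clean = [[[1.0], [1.0], [1.0]] for _ in range(4)]
corrupt = [[list(v) for v in frames] for frames in clean]
corrupt[1][2] = [2.0]                        # channel error in superframe 1, frame 3

out_clean = decode_superframes(clean, predictors)
out_corrupt = decode_superframes(corrupt, predictors)
affected = [i for i in range(4) if out_clean[i] != out_corrupt[i]]
```

In this example the error in superframe 1 disturbs superframes 1 and 2 only; superframe 3 decodes identically in both runs, because the prediction never looks further back than the previous superframe.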

B. Multi-frame pitch quantization

The pitch information of a superframe is quantized using a 7-bit codebook. The quantization scheme depends on the voiced/unvoiced decisions of the frames in the superframe. When all frames in a superframe are unvoiced, no pitch information is quantized. For a superframe that contains only one voiced frame, the pitch value of the voiced frame is quantized on a logarithmic scale with a 99-level uniform quantizer, the same as in the 2400 bps MELP standard. The unused bits are used for error protection. For a superframe with two or three voiced frames, the pitch parameters are vector quantized. This VQ uses a special distortion measure, which is described in more detail in reference [9]. The distortion measure is as follows:

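$$d = \sum_{i=1}^{3} w_i \left| p_i - \hat{p}_i \right|^{2} + \beta \sum_{i=1}^{3} \left| \Delta p_i - \Delta\hat{p}_i \right|^{2} \tag{5}$$

$$w_i = \begin{cases} 1, & \text{voiced frame} \\ 0.1, & \text{unvoiced frame} \end{cases} \tag{6}$$

$$\Delta p_i = \begin{cases} p_i - p_{i-1}, & \text{voiced frames} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $p_i$ and $\hat{p}_i$ are the unquantized and quantized log pitch values, respectively, $p_0$ is the last log pitch value of the previous superframe, $w_i$ is the weighting coefficient, and $\beta$ is a parameter that controls the contribution of the pitch differentials, which is set to 1 in the proposed coder. The index of the codebook entry that minimizes the distortion is selected.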
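A codebook search with a distortion of this kind (a weighted squared error on the log pitch values plus a penalty on the mismatch of the pitch differentials) might be sketched as follows. Several details are assumptions of this sketch rather than statements from the paper: a differential is taken only when both the current and the previous frame are voiced, the quantized history value is taken equal to the unquantized one, and the function names and the three-entry codebook are hypothetical.

```python
def pitch_distortion(p, p_hat, voiced, p0, voiced0, beta=1.0):
    """Distortion between log-pitch vectors p and p_hat for one superframe.

    voiced   -- three booleans, one per frame of the superframe
    p0       -- last log pitch value of the previous superframe
    voiced0  -- voicing of that previous frame (assumed to be available)
    """
    prev, prev_hat, prev_voiced = p0, p0, voiced0
    total = 0.0
    for pi, qi, vi in zip(p, p_hat, voiced):
        w = 1.0 if vi else 0.1                 # frame weight
        both = vi and prev_voiced
        dp = pi - prev if both else 0.0        # unquantized differential
        dq = qi - prev_hat if both else 0.0    # quantized differential
        total += w * (pi - qi) ** 2 + beta * (dp - dq) ** 2
        prev, prev_hat, prev_voiced = pi, qi, vi
    return total


def search_pitch_codebook(p, voiced, p0, voiced0, codebook):
    """Return the index of the code vector with minimum distortion."""
    return min(range(len(codebook)),
               key=lambda k: pitch_distortion(p, codebook[k], voiced, p0, voiced0))


# Hypothetical three-entry codebook of log10 pitch vectors.
codebook = [[1.70, 1.70, 1.70], [1.70, 1.72, 1.75], [1.80, 1.80, 1.80]]
best = search_pitch_codebook([1.70, 1.72, 1.74], [True, True, True], 1.70, True, codebook)
```

The real coder uses a 7-bit codebook, so the same search would run over 128 entries.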

C. Multi-frame band-pass voicing quantization

The proposed coder makes five band-pass voiced/unvoiced decisions per frame and quantizes them with a 3-bit codebook per superframe, taking advantage of the inter-frame redundancy of the voicing decisions. The band-pass voiced/unvoiced decisions of three consecutive frames are grouped together into a vector.


The VQ algorithm uses a weighted Euclidean distance as the distortion measure:

$$d = \sum_{i=1}^{3}\sum_{j=1}^{5} w_j \left( b_{i,j} - \hat{b}_{i,j} \right)^{2} \tag{8}$$

where $i$ indexes the frame within the current superframe, $j$ indexes the band-pass band within the frame, $b_{i,j} = 1$ if the $j$-th band-pass voiced/unvoiced decision is voiced and $b_{i,j} = 0$ otherwise, $\hat{b}_{i,j}$ is the quantized band-pass voiced/unvoiced decision, and $w_j$ is the weighting factor, which is determined by training.
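A minimal sketch of this search follows. The band weights and the eight-entry codebook are hypothetical: the paper obtains the weights by training and does not list its codebook, so the example simply uses patterns in which each frame is either fully voiced or fully unvoiced.

```python
def voicing_distortion(b, b_hat, weights):
    """Weighted squared distance between two 3x5 voicing patterns."""
    return sum(w * (x - y) ** 2
               for row, row_hat in zip(b, b_hat)
               for w, x, y in zip(weights, row, row_hat))


def quantize_voicing(b, codebook, weights):
    """Return the index of the closest pattern in the codebook."""
    return min(range(len(codebook)),
               key=lambda k: voicing_distortion(b, codebook[k], weights))


# Hypothetical 3-bit codebook: bit i of the index says whether frame i
# is fully voiced (all five bands 1) or fully unvoiced (all five bands 0).
codebook = [[[(k >> i) & 1] * 5 for i in range(3)] for k in range(8)]

# Hypothetical weights that emphasize the lower bands.
weights = [1.0, 0.8, 0.6, 0.4, 0.2]

decisions = [[1, 1, 1, 0, 0],
             [1, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
index = quantize_voicing(decisions, codebook, weights)
error = voicing_distortion(decisions, codebook[index], weights)
```

With these weights, the mostly voiced first two frames map to fully voiced patterns, because mismatches in the high bands cost less than mismatches in the low bands.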

D. Multi-frame gain quantization

Two gain parameters are calculated per frame, and the logarithmic energy values from three successive frames are grouped to form 6-dimensional vectors. An 8-bit codebook is used to quantize these vectors. The gain codebook was generated using the K-means vector quantization algorithm, with the Euclidean distance as the distortion measure.
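The training and search steps can be sketched with a plain K-means loop. The training vectors, the two-entry initial codebook, and the function names below are hypothetical and only keep the example small; the coder described here would train 256 code vectors on real log-energy data.

```python
def nearest(vec, codebook):
    """Index of the code vector closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])))


def kmeans(train, codebook, iterations=10):
    """Refine a codebook by alternating assignment and centroid update."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in train:
            cells[nearest(v, codebook)].append(v)
        for k, cell in enumerate(cells):
            if cell:   # keep the old code vector if its cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


# Each training vector stacks two log gains from each of three frames (6 values).
train = [[10.0] * 6, [12.0] * 6, [40.0] * 6, [42.0] * 6]
gain_codebook = kmeans(train, [[0.0] * 6, [50.0] * 6])
index = nearest([39.0] * 6, gain_codebook)
```

An 8-bit index corresponds to 256 code vectors, so the encoder-side search compares each 6-dimensional gain vector against 256 candidates.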

E. Bit allocation

The proposed coder operates on frames of 25 ms and buffers blocks of three consecutive frames, for a block duration of 75 ms. The bit allocation of the proposed coder is shown in Table 2. A total of 45 bits is used per superframe, which corresponds to 45 bits / 75 ms = 600 bps.

TABLE II. THE PROPOSED CODER BIT ALLOCATION

Parameter             Bits
Pitch                 7
Gain                  8
Fourier Magnitudes    0
Band-pass Voicing     3
Aperiodic Flag        0
LSF                   26
Synchronization       1
Total                 45
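The allocation can be checked with a few lines of arithmetic. The dictionary below restates the table; the variable names are chosen for this example only.

```python
BIT_ALLOCATION = {
    "Pitch": 7,
    "Gain": 8,
    "Fourier Magnitudes": 0,
    "Band-pass Voicing": 3,
    "Aperiodic Flag": 0,
    "LSF": 26,
    "Synchronization": 1,
}

SUPERFRAME_MS = 3 * 25                                  # three 25 ms frames

total_bits = sum(BIT_ALLOCATION.values())               # bits per superframe
bit_rate_bps = total_bits * 1000 // SUPERFRAME_MS       # bits per second
```

The seven entries sum to 45 bits per 75 ms superframe, which gives the 600 bps rate of the coder.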

IV. TEST RESULTS

The Diagnostic Rhyme Test (DRT) and the Diagnostic Acceptability Measure (DAM) are used in these informal tests. The DRT measures speech intelligibility, and the DAM measures speech quality. For comparison purposes, the 2.4 kbps MELPe standard coder [1] was used. The coders were tested on speech with a quiet background, over a channel with 1% random bit errors, and with a high mobility multipurpose wheeled vehicle (HMMWV) background. All of the coders scored higher for male talkers than for female talkers, and the averaged male and female scores are shown in Tables 3 to 5. In these informal tests, the subjective quality of the proposed coder is found to be better than that of LPC-10e [2] and close to that of the 2.4 kbps MELP standard [3].

TABLE III. INFORMAL TEST RESULTS IN QUIET BACKGROUND

Test item                     DRT     DAM
Test Speech Signal (Quiet)    96.1    86.0
2400 bps MELPe                94.2    70.1
600 bps proposed coder        91.3    56.7

TABLE IV. INFORMAL TEST RESULTS IN 1% BER

Test item                     DRT     DAM
Test Speech Signal (Quiet)    96.1    86.0
2400 bps MELPe                92.2    59.5


TABLE V. INFORMAL TEST RESULTS IN HMMWV BACKGROUND

Test item                     DRT     DAM
Test Speech Signal (HMMWV)    92.0    50.3
2400 bps MELPe                76.2    54.6


V. CONCLUSION

In this paper, a new 600 bps speech coder based on MELPe is proposed, and the important aspects of the algorithm are described. The proposed coder uses new techniques to improve performance. The PMSVQ is designed to quantize the LSF parameters; it approximately achieves "transparent quality" and is robust against channel errors. By using the modified multi-frame joint vector quantization to quantize the parameters of each superframe, we reduce the bit rate while obtaining high-quality synthesized speech. Informal subjective quality tests show that the speech quality of the proposed coder is better than that of LPC-10e and close to that of the 2.4 kbps MELP standard.

ACKNOWLEDGMENTS

The research is supported by the Shaanxi Natural Science Foundation of China (No. 2006F40).


REFERENCES

[1] J. S. Collura and D. F. Brandt, "The 1.2 kbps/2.4 kbps MELP speech coding suite with integrated noise pre-processing," in Proc. IEEE Mil. Comm., Atlantic City, NJ, vol. 2, pp. 1449-1453, Oct.-Nov. 1999.
[2] T. E. Tremain, "The government standard linear predictive coding algorithm: LPC-10," Speech Technology, vol. 2, no. 1, pp. 40-49, Apr. 1982.
[3] A. McCree and K. Truong, "A 2.4 kbit/s MELP coder candidate for the new U.S. federal standard," Proceedings of IEEE ICASSP 1996, Piscataway, New Jersey, IEEE Press, pp. 200-203, 1996.
[4] R. Salami and C. Laflamme, "Design and description of CS-ACELP: A toll quality 8 kb/s speech coder," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 116-130, Mar. 1998.
[5] W. Y. Chan and S. Gupta, "Enhanced multistage vector quantization by joint codebook design," IEEE Transactions on Communications, vol. 40, no. 11, pp. 1693-1697, 1992.
[6] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inform. Theory, vol. 28, no. 2, pp. 129-137, 1982.
[7] W. P. LeBlanc and B. Bhattacharya, "Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 373-385, 1993.
[8] K. K. Paliwal and B. S. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 3-14, Jan. 1993.
[9] T. Wang and K. Koishida, "A 1200 bps speech coder based on MELP," IEEE ICASSP 2000, Piscataway, New Jersey, IEEE Press, pp. 1375-1378, 2000.
