Design and simulation of an efficient adaptive delta modulation embedded coder

S.D. Zhang G.B. Lockhart

Indexing terms: Delta modulation, Embedded HCDM coder

Abstract: An embedded coding version of hybrid companding delta modulation (HCDM) is described that operates from 16 to 48 kb/s in 8 kb/s steps. The embedded HCDM coder employs the explicit noise coding technique to transmit an adaptive PCM (APCM) coded version of the HCDM reconstruction error signal as a supplementary bit stream that may be partly or wholly deleted in transmission. SNR performance with speech input depends critically on the design of the supplemental APCM code and two new coding algorithms are investigated. In algorithm 1, the basic cue for step size adaptation is obtained from the RMS slope energy of the HCDM output whereas in algorithm 2, the HCDM reconstruction error is logarithmically compressed before quantisation and the basic step size is derived from peak input magnitudes. Instantaneous adaptation for both algorithms is achieved by using step size multipliers which are optimised for operation at single fixed bit rates and also for decoding with an unknown number of input bit deletions. Simulation results show that SNR performance is significantly enhanced using either algorithm and a graceful reduction of reconstructed speech quality with progressive bit deletion is achieved over the range from 48 kb/s to 16 kb/s. On the whole, the SNR performance of the embedded HCDM system is superior in comparison with conventional HCDM.

1 Introduction

Embedded speech coding allows progressive bit rate reductions to be imposed on the transmitted bit stream without reference to encoder or decoder and with a graceful degradation in received speech quality. This property can be advantageous in digital communication systems when the transmitted bit rate exceeds available channel or network capacity. Rather than terminate communication, it may be preferable to discard bits, provided communication can be maintained at lower bit rates with acceptable loss in the quality of received speech.

Conventional PCM is the simplest example of an embedded coder. Using the same decoder, progressive deletion of least significant bits from codewords produces bit rate reductions with an SNR penalty of about 6 dB per discarded bit. More efficient differential coders such as DPCM and delta modulation (DM) are not inherently embedded since output bits cannot be discarded without incurring severe noise problems due to encoder/decoder mistracking. Embedded versions of both DPCM and DM have therefore been proposed which exhibit embedded coding properties [1-5]. In this case, explicit coding of the basic coder’s quantisation noise offers an effective approach [2, 3]. Such coders operate at ($R_{bas}$ + $R_{sup}$) bits/sample with an output consisting of two bit streams. The basic coder output of $R_{bas}$ bits/sample must be received correctly to ensure proper decoder operation but the remaining $R_{sup}$ bits/sample, generated by encoding the quantisation noise, provide a supplemental output which may be progressively deleted to reduce bit rates.

© IEE, 1995. Paper 1942K (ES), first received 13th June 1994 and in final revised form 10th March 1995. The authors are with the Department of Electronic and Electrical Engineering, University of Leeds, Leeds LS2 9JT, United Kingdom

IEE Proc.-Vis. Image Signal Process., Vol. 142, No. 3, June 1995

At lower bit rates, vector quantisation can be used to provide an embedded output stream [6]. Some embedded systems based on analysis by synthesis predictive coders have also been proposed such as embedded CELP [7] and embedded RPE [8]. Both are based on multistage coding and exhibit a graceful degradation in speech quality with progressive bit deletion with only a slight reduction in SNR performance.

An efficient embedded adaptive DM (ADM) system based on the explicit noise coding technique [3] will be described. Compared with adaptive PCM (APCM) and adaptive DPCM (ADPCM), ADM coders have the advantages of extremely low coding complexity and high resistance to transmission error. Among three representative ADM coders, continuously variable slope delta modulation (CVSD) [9, 10], constant factor delta modulation (CFDM) [11-13] and hybrid companding delta modulation (HCDM) [14], HCDM possesses the best performance in SNR gain and dynamic range regardless of transmission environment [15]. The proposed embedded system is based on HCDM coding and, like HCDM, employs efficient but simple hybrid companding mechanisms for supplementary APCM coding. For these reasons, embedded HCDM (EHCDM) is attractive for applications such as mobile communications requiring robust operation at medium bit rates.

$R_{bas}$ = 1 bit/sample is transmitted by the basic HCDM encoder and the difference between the HCDM output and original speech is encoded using APCM to produce $R_{sup}$ supplemental bits per sample. The HCDM reconstruction error consists of a mix of high amplitude segments due to slope overload with longer segments of granular noise of considerably lower amplitudes. The performance of the EHCDM system depends critically on matching the supplemental APCM encoder to this signal. Two candidate algorithms of low complexity were investigated.


2 Embedded hybrid companding delta modulation (EHCDM)

The dashed box in Fig. 1 represents a basic conventional HCDM encoder and decoder incorporating both syllabic and instantaneous adaptation schemes [14]. Syllabic adaptation, controlled by the RMS slope energy, $E$, of the decoded speech, determines the basic step size of the quantiser and is updated every frame of $N$ samples. Instantaneous adaptation changes the step size at each sampling instant according to the states of the current and two preceding output bits. The quantiser basic step size $\Delta$ is given by

$$\Delta = \alpha E \qquad (1a)$$

where $\alpha$ is a scaling constant and

$$E = \ldots \qquad (1b)$$

where $\hat{x}(n)$ is the decoded output. The instantaneous adaptation logic is given by

$$\gamma(n) = M(n)\,\gamma(n-1) \qquad (2a)$$

and

$$M(n) = \begin{cases} 1.5 & \text{if } c(n) = c(n-1) = c(n-2) \\ 0.66 & \text{if } c(n) \neq c(n-1) \\ 1 & \text{otherwise} \end{cases} \qquad (2b)$$

where $\gamma(n)$ provides instantaneous adaptation for the DM quantiser step size $\delta(n)$, $M(n)$ is the step size multiplier and $c(n)$ is the output bit. The overall adaptation rule is then

$$\ldots \qquad (3)$$

Fig. 1 Block diagram of embedded hybrid companding delta modulation (EHCDM)
a Encoder
b Decoder

The basic HCDM coder in Fig. 1 is supplemented by the APCM coder, also with a hybrid companding mechanism for changing step size. The HCDM reconstruction error is lowpass filtered to the bandwidth of the original speech to remove out-of-band quantisation noise. Decimation is then possible since DM coders are normally oversampled. (This approach has been named ‘reduced rate embedded DM’ [2].) The APCM coder encodes the filtered, decimated HCDM reconstruction error and bit rate reductions are achieved by dropping bits from the APCM output codewords. Exactly the same lowpass filter and decimator must be used in the encoder and decoder to ensure correct combination of the HCDM and APCM decoded outputs.
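The instantaneous adaptation logic of eqns. 2a and 2b can be illustrated with a minimal Python sketch. The function names, the start-up value of $\gamma$ and the choice to apply no adaptation until two past bits are available are assumptions made for illustration and are not taken from the HCDM description; the combination with the syllabic step size (eqn. 3) is deliberately left out.

```python
def step_multiplier(c_n, c_n1, c_n2):
    """Eqn. 2b: multiplier from the current bit and the two preceding bits."""
    if c_n == c_n1 == c_n2:
        return 1.5      # three equal bits: slope overload, grow the step
    if c_n != c_n1:
        return 0.66     # bit reversal: granular region, shrink the step
    return 1.0


def instantaneous_gains(bits, gamma0=1.0):
    """Eqn. 2a: gamma(n) = M(n) * gamma(n - 1) along a DM bit sequence."""
    gammas = []
    gamma = gamma0              # assumed start-up value
    prev1, prev2 = None, None   # c(n-1) and c(n-2)
    for c in bits:
        if prev1 is None or prev2 is None:
            m = 1.0             # assumption: no adaptation without two past bits
        else:
            m = step_multiplier(c, prev1, prev2)
        gamma *= m
        gammas.append(gamma)
        prev1, prev2 = c, prev1
    return gammas
```

For the bit sequence 1, 1, 1, 1, 0, 0 the gain grows by 1.5 on the third and fourth bits, drops by 0.66 at the reversal and is then held, which is the behaviour eqn. 2b describes.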

3 Algorithm design

3.1 Algorithm 1

It is reasonable to assume that high or low magnitude HCDM error samples tend to occur when the average slope energy, $E$, is respectively high or low. Syllabic adaptation can therefore be achieved by deriving the basic step size, $\Delta_{PCM}(k)$, for the $k$th frame directly from the RMS slope energy of the HCDM decoded output which is already available at the decoder, that is,

$$\Delta_{PCM}(k) = \alpha E(k-1)/2^{B} \qquad (4a)$$

where $\alpha$ is a scaling factor and $B$ is the number of bits available for APCM coding. The instantaneous PCM step size, $\delta_{PCM}(n)$, is then obtained using the following adaptation procedure

$$\delta_{PCM}(kN+n) = M(|c_{PCM}(kN+n-1)|)\,\Delta_{PCM}(k) \quad \text{for } n = 0, 1, \ldots, N-1 \text{ and } k = 0, 1, \ldots, K-1 \qquad (4b)$$

where $N$ is the frame length, $K$ is the total number of frames, $\delta_{PCM}(kN+n)$ is the step size of the PCM quantiser at the $n$th HCDM error sample in the $k$th frame, and $M(\cdot)$ is the step size multiplier defined as a time-invariant function of the latest quantisation level, $|c_{PCM}(n-1)|$, of the PCM coder output [16].
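A minimal sketch of how eqns. 4a and 4b might be applied to one frame is given below. It assumes a midrise quantiser whose magnitude levels are numbered 1 to $L/2$ with $L = 2^B$, a start-up level of 1, and a small floor on the basic step size for silent frames; these choices, and the function name `apcm_encode_frame`, are illustrative and not specified in the text.

```python
def apcm_encode_frame(err, E_prev, alpha, B, multipliers, prev_level=1):
    """Quantise one frame of HCDM error samples using eqns. 4a and 4b.

    err         : error samples of frame k
    E_prev      : RMS slope energy E(k - 1), taken from the HCDM decoder
    multipliers : M_1 .. M_{L/2}, indexed by magnitude level (L = 2**B)
    prev_level  : magnitude level of the preceding codeword (assumed 1 at start)
    Returns (codes, decoded, last_level); codes are (sign, level) pairs.
    """
    half = 2 ** (B - 1)                               # L/2 magnitude levels
    assert len(multipliers) == half
    # Eqn. 4a, with an assumed floor so a silent frame cannot give a zero step
    delta_k = max(alpha * E_prev / 2 ** B, 1e-12)
    codes, decoded = [], []
    for x in err:
        step = multipliers[prev_level - 1] * delta_k  # eqn. 4b
        level = min(int(abs(x) / step) + 1, half)     # midrise level, 1 .. L/2
        sign = 1 if x >= 0 else -1
        codes.append((sign, level))
        decoded.append(sign * (level - 0.5) * step)   # midrise reconstruction
        prev_level = level
    return codes, decoded, prev_level
```

Because the multiplier depends only on the previous codeword and $\Delta_{PCM}(k)$ depends only on the HCDM output, a decoder can recompute the same step size without side information.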

3.2 Algorithm 2

The filtered, decimated HCDM reconstruction error consists of granular noise with occasional impulsive components of relatively large amplitude. Although noise-like, its amplitude distribution approximates that of speech and the dynamic range is also large. The use of the $\mu$-law function, originally designed for efficient speech compression, was therefore investigated for compression of the HCDM reconstruction error by the supplemental coder. The HCDM error is first $\mu$-law compressed and then quantised by an adaptive quantiser using the instantaneously changing step size

$$\delta_{PCM}(n) = M(|c_{PCM}(n-1)|)\,\Delta_{PCM} \qquad (5a)$$

where $\Delta_{PCM}$ is a constant step size given by

$$\Delta_{PCM} = \beta\,|q|_{max}/2^{B-1} \qquad (5b)$$

$|q|_{max}$ is the magnitude of the largest possible HCDM error input to the compressor in the $\mu$-law function, and $\beta$ is a scale constant.
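The compression step and eqn. 5b can be sketched as follows. The text does not write out the $\mu$-law function, so the standard form $F(q) = |q|_{max}\,\ln(1 + \mu|q|/|q|_{max})/\ln(1 + \mu)$ with the sign of $q$ is assumed here, together with clipping of inputs beyond $|q|_{max}$; the function names are illustrative.

```python
import math


def mu_compress(q, q_max, mu=100.0):
    """Standard mu-law compressor (assumed form); inputs beyond q_max are clipped."""
    mag = min(abs(q), q_max) / q_max
    return math.copysign(q_max * math.log1p(mu * mag) / math.log1p(mu), q)


def mu_expand(y, q_max, mu=100.0):
    """Inverse of mu_compress, applied at the decoder after dequantisation."""
    mag = abs(y) / q_max
    return math.copysign(q_max * math.expm1(mag * math.log1p(mu)) / mu, y)


def basic_step(beta, q_max, B):
    """Eqn. 5b: constant basic step size of the quantiser after the compressor."""
    return beta * q_max / 2 ** (B - 1)
```

With $\beta$ = 0.7 and $B$ = 4 the basic step is $0.7\,|q|_{max}/8$, so the eight positive levels of the unscaled quantiser span 70% of the compressor output range.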

4 Simulation

The EHCDM system was simulated using 3.1 s and 4.5 s of female and male speech, respectively, bandlimited to 3.4 kHz and sampled at 16 kHz. The utterances were ‘the bank of England has started making a new five-pound note’ and ‘I rode for a long distance in one of the public coaches on the day preceding Christmas’ by a female and male speaker, respectively. The estimation of HCDM basic step size was made once every 5 ms (80 samples) in a backward mode. Decimation, by a factor of 2 with 4 bit APCM coding of the HCDM reconstruction error, gives $R_{bas}$ = 1 at a sampling rate of 16 kHz with $R_{sup}$ = $B$ = 4 (referred to the 8 kHz decimated rate) allowing operation from 16 to 48 kb/s in 8 kb/s steps.
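The rate arithmetic behind these figures can be checked with a short sketch: the HCDM stream contributes 16 kHz × 1 bit and each retained APCM bit contributes 8 kHz × 1 bit. The function and parameter names are illustrative only.

```python
def ehcdm_bit_rate_kbps(bits_kept, fs_dm_khz=16, decimation=2, r_bas=1):
    """Total bit rate when bits_kept of the APCM bits per sample survive."""
    basic = fs_dm_khz * r_bas                            # HCDM stream, kb/s
    supplemental = (fs_dm_khz // decimation) * bits_kept  # APCM stream, kb/s
    return basic + supplemental


rates = [ehcdm_bit_rate_kbps(b) for b in range(5)]  # B = 0 .. 4 bits retained
```

Retaining 0 to 4 supplemental bits gives the five operating points from 16 to 48 kb/s quoted above.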

4.1 Algorithm 1

Fig. 2 shows SNR performance of the EHCDM system using eqn. 4a as a function of the scaling factor $\alpha$. The optimum values for $\alpha$ over the PCM operating rates from 1 to 4 bits/sample are very different, ranging from $\alpha$ = 2 at 24 kb/s to $\alpha$ = 12 at 48 kb/s, and a tradeoff in the selection of a suitable value for $\alpha$ exists between operation at high and low bit rates. Comparison of HCDM reconstruction error with the corresponding average slope energy confirmed that high and low amplitude HCDM error samples do occur mainly during periods of high and low slope energy, respectively. Thus linear PCM, using the adaptation scheme of eqn. 4a, leads to a higher SNR performance than PCM without an adaptation scheme, particularly at higher bit rates, as illustrated by curves a and b in Fig. 3. In this case, the value of $\alpha$ was 10, favourable to system performance at higher rates, and the step size used for the uniform quantiser was optimised only for $B$ = 4.

Fig. 2 SNR performance as a function of scaling factor $\alpha$ for the EHCDM system using eqn. 4a (horizontal axis: scaling factor)
a $R_{sup}$ = $B$ = 4
b $R_{sup}$ = $B$ = 3
c $R_{sup}$ = $B$ = 2
d $R_{sup}$ = $B$ = 1

Fig. 3 SNR performance of the EHCDM system (horizontal axis: bit rate, kb/s)
a Uniform quantiser
b Algorithm 1 with scaling factor $\alpha$ = 10.0 and step size multipliers $M$ = 1
c Algorithm 1 with values of $\alpha$ and $M$ (second column in Table 1) optimised for each value of $B$
d Algorithm 1 with values of $\alpha$ and $M$ (third column in Table 1) for unknown bit deletions

System performance was further improved by optimising step size multipliers in eqn. 4b. Consider first the case of $B$ = 4. SNR was determined as $M_1$ varied from 1.0 to 0.4 with $\alpha$ = 10 and $M_2 = M_3 = \ldots = M_8 = 1.0$. The maximum SNR resulted when $M_1$ = 0.5, which was chosen as the initial optimum value. Similarly, the initial optimum value for $M_2$ was obtained with $\alpha$ = 10, $M_1$ = 0.5 and $M_3 = M_4 = \ldots = M_8 = 1.0$. Note that the step size multipliers follow the usual criterion [16]

$$M_1 \le M_2 \le \ldots \le M_{L/2} \quad \text{for } M_1 < 1 \text{ and } M_{L/2} > 1 \qquad (6)$$

where $L$ is the number of quantiser output levels. After the 8 initial optimum values were obtained for the 8 multipliers, the above optimisation procedure was repeated until the final optimum values were obtained. The entire procedure was also repeated for $B$ = 3, 2, and 1 respectively. Optimum values of scaling factor, $\alpha$, and step size multipliers $M$ are listed in Table 1 (second column).
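The one-multiplier-at-a-time procedure described above resembles a coordinate-wise search, which can be sketched as follows. The callback `snr_fn` is a hypothetical stand-in for a full EHCDM simulation run, and the candidate grids and the stopping rule are assumptions; the ordering criterion of eqn. 6 is not enforced in this sketch.

```python
def optimise_multipliers(snr_fn, n_mult, grids, max_passes=10):
    """Tune one multiplier at a time, holding the others fixed, and repeat.

    snr_fn : callable mapping a list of multipliers to an SNR value
             (hypothetical; stands in for a complete coder simulation)
    grids  : one list of candidate values per multiplier
    """
    m = [1.0] * n_mult                    # start from M_i = 1.0 for all i
    best = snr_fn(m)
    for _ in range(max_passes):
        improved = False
        for i in range(n_mult):
            for cand in grids[i]:
                trial = m[:i] + [cand] + m[i + 1:]
                snr = snr_fn(trial)
                if snr > best:            # keep the candidate only if SNR rises
                    best, m, improved = snr, trial, True
        if not improved:
            break                         # a full pass changed nothing
    return m, best
```

The first pass corresponds to finding the initial optimum values; later passes correspond to repeating the procedure until the values stop changing.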

Table 1: Scaling factor $\alpha$ and step size multipliers $M$ for APCM supplemental coding using algorithm 1

            Optimised for each value of B      Optimised for unknown bit deletion
B           1       2       3       4          4
$\alpha$    2.0     3.5     6.5     10.0       10.0
$M_1$       1.0     0.70    0.60    0.50       0.54
$M_2$       -       2.30    0.70    0.56       0.54
$M_3$       -       -       1.20    0.65       0.54
$M_4$       -       -       2.00    0.82       0.54
$M_5$       -       -       -       1.02       1.45
$M_6$       -       -       -       1.40       1.45
$M_7$       -       -       -       1.45       1.45
$M_8$       -       -       -       1.70       1.45

The SNR values using eqn. 4b with the optimum parameter values in the second column of Table 1 are illustrated by curve c in Fig. 3. Performance is significantly enhanced by using these factors and multiplier values for instantaneous step size adaptation. Although such performance can be obtained by operating at single fixed bit rates, this cannot give proper embedded coding operation if the number of bit deletions is unknown at the decoder and therefore quantiser characteristics cannot be optimised to account for different received bit rates. Nevertheless, it is still possible to design an adaptive quantiser which leads to higher SNR values than using adaptation based on the HCDM output alone (eqn. 4a). It follows from the inequalities (eqn. 6) that step size multipliers for such a quantiser are subject to the following constraints

$$M_1 = M_2 = \ldots = M_{L/4} < 1 \qquad (7a)$$

$$M_{L/4+1} = M_{L/4+2} = \ldots = M_{L/2} > 1 \qquad (7b)$$

The third column in Table 1 gives such a set of multipliers (with $\alpha$ = 10.0). Midrise quantiser levels and associated step size multipliers are as shown in Fig. 4a when the APCM decoder operates at 4 bits/sample ($B$ = 4). When one bit is discarded, the characteristics become equivalent to that in Fig. 4b and so on for further bit deletions. Curve d in Fig. 3 shows that the performance of the EHCDM system using this set of multipliers is significantly enhanced in comparison with using eqn. 4a alone (equivalent to setting $M$ = 1 and $\alpha$ = 10.0 for each value of $B$ in eqn. 4b) although slightly inferior to using the parameters in the second column of Table 1 (curve c).
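One consequence of constraints 7a and 7b is that the multiplier depends only on whether the magnitude level lies in the lower or upper half of the range. The sketch below assumes sign-magnitude codewords with the magnitude in natural binary, most significant bit first, which is not stated in the text; under that assumption the multiplier can be read from the top magnitude bit and is unchanged when least significant bits are deleted. The case of a sign bit alone is outside the sketch.

```python
def multiplier_for(mag_bits, bits_received, m_low=0.54, m_high=1.45):
    """Step size multiplier under constraints 7a/7b.

    mag_bits      : magnitude part of the received codeword as an integer,
                    after any least significant bits have been deleted
    bits_received : bits per codeword that survived, sign bit included
    m_low, m_high : values from the third column of Table 1
    """
    if bits_received < 2:
        raise ValueError("the magnitude MSB is needed to select a multiplier")
    msb = (mag_bits >> (bits_received - 2)) & 1   # top magnitude bit
    return m_high if msb else m_low


def delete_lsbs(mag_bits, n_deleted):
    """Drop the n_deleted least significant magnitude bits."""
    return mag_bits >> n_deleted
```

For example, the four-bit magnitude pattern 101 and its truncations 10 and 1 all select the multiplier 1.45, so a decoder receiving 4, 3 or 2 bits per codeword applies the same multiplier as the encoder.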

4.2 Algorithm 2

The amplitude distributions for the same input speech and for the corresponding HCDM error were determined experimentally. Unlike the input speech, the error signal is noise-like but exhibits a similar amplitude distribution and therefore $\mu$-law compression can be applied to the HCDM error to achieve efficient compression. Following $\mu$-law compression, the error signal has a relatively flat amplitude distribution over a much wider range of amplitudes, making $\mu$-law quantisation considerably more efficient than uniform quantisation. This is confirmed by the SNR performance comparison shown by curves d and e in Fig. 8.

Fig. 4 Quantiser levels and step size multipliers
a Step size multipliers used in four-bit APCM quantiser
b Equivalent quantisation characteristics with one bit deletion

Parameters which should be appropriately optimised in the $\mu$-law function are $\mu$ and the possible maximum magnitude, $|q|_{max}$, of the compressor input. In general, the error amplitude distribution becomes flatter with increasing values of $\mu$ but satisfactory compression is achieved with $\mu$ = 100, and SNR performance effectively saturates at this point when the system operates at higher rates. Adjacent sample correlation $\rho(1)$ of the compressed error decreases with increasing $\mu$ as illustrated in Fig. 5 where $\rho(1)$ is defined as

$$\rho(1) = \sum_{n=0}^{N-2} q(n)\,q(n+1) \Big/ \sum_{n=0}^{N-1} q^{2}(n) \qquad (8)$$

where $q(n)$ is the $n$th error sample and $N$ is the total number of the error samples. Excessively high values of $\mu$ significantly reduce adjacent sample correlation, making the use of further instantaneous adaptation (eqn. 5a) less efficient. A compromise value of $\mu$ = 100 was therefore chosen for the APCM coder. The value of $|q|_{max}$ must be chosen to be large enough to allow the quantiser following the compressor to cover the maximum possible dynamic range of the compressed HCDM error samples for a variety of input speech material. We set $|q|_{max} = 2|q|_m$, where $|q|_m$ denotes the maximum magnitude of the HCDM error samples for the input speech segments used in simulation. Simulation indicates that for a given $\mu$, compression becomes inefficient with increasing $|q|_{max}$. However, increasing $|q|_{max}$ leads to increasing adjacent sample correlation $\rho(1)$ in the compressed error, as indicated by curves a and b in Fig. 5, making the use of eqn. 5a more efficient.
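The adjacent sample correlation of eqn. 8 is straightforward to compute; a minimal sketch is given below. It assumes the denominator sums over all $N$ samples and returns 0 for an all-zero input, neither of which is stated explicitly in the text.

```python
def adjacent_correlation(q):
    """Eqn. 8: lag-1 correlation of the (compressed) error samples q."""
    num = sum(q[n] * q[n + 1] for n in range(len(q) - 1))  # n = 0 .. N-2
    den = sum(v * v for v in q)                            # n = 0 .. N-1 (assumed)
    return num / den if den else 0.0
```

A slowly varying sequence gives a value near 1, while a sequence that alternates in sign gives a negative value; a value near 0 means the previous codeword carries little information about the next sample, which is why very large $\mu$ makes the adaptation of eqn. 5a less effective.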

Fig. 5 Adjacent sample correlation in the compressed HCDM error as a function of $\mu$
a $|q|_{max} = |q|_m$, the maximum magnitude of HCDM error samples
b $|q|_{max} = 2|q|_m$

The remaining parameters affecting EHCDM performance are the scaling factor $\beta$ and the step size multipliers $M$ in eqn. 5. SNR values as a function of $\beta$ with $B$ = 4, 3, 2, 1 respectively and $M$ = 1 are plotted in Fig. 6. As in Fig. 2, a larger value for $\beta$ is advantageous at higher rates and vice versa. However, the optimum values for $\beta$ over the operating rates from $B$ = 1 to 4 in Fig. 6 have a relatively smaller range than those in Fig. 2, and in this range the SNR variation with $\beta$ in Fig. 6 is considerably smaller. For this reason and in contrast to the selection of $\alpha$, which is independently optimised for each value of $B$ (Table 1), we chose $\beta$ = 0.7 for operation at all rates, somewhat favourable to performance at higher rates. The SNR penalty at lower rates can be compensated for in the instantaneous adaptation design by choosing appropriate step size multipliers.

Fig. 6 SNR values as a function of scale factor $\beta$ for the EHCDM system using eqn. 5 with $M$ = 1 (horizontal axis: scaling factor)
a $R_{sup}$ = $B$ = 4
b $R_{sup}$ = $B$ = 3
c $R_{sup}$ = $B$ = 2
d $R_{sup}$ = $B$ = 1

The system performance was then substantially improved by carefully choosing values for $M$ in eqn. 5a as described in Section 4.1 for Table 1. SNR performance is illustrated in Fig. 7 using algorithm 2 with three different sets of multipliers and a scaling constant $\beta$ = 0.7. Curve a ($M$ = 1) corresponds to simple $\mu$-law quantisation whereas curves b and c correspond to the two sets of multipliers listed in Table 2 (the second column and the left set in the third column), respectively. Only the latter, which carries a slight SNR penalty, can be used if the number of bit deletions is unknown.

Fig. 7 SNR performance of EHCDM using algorithm 2 (horizontal axis: bit rate, kb/s)
a $M$ = 1
b Values of $M$ (second column in Table 2) optimised for each value of $B$
c Values of $M$ (first set of the third column in Table 2) for unknown bit deletions

Table 2: Scaling factor $\beta$ and step size multipliers $M$ for APCM supplemental coding using algorithm 2

            Optimised for each value of B      Optimised for unknown bit deletion
B           1       2       3       4          4       4
$\beta$     0.70    0.70    0.70    0.70       0.70    0.60
$M_1$       0.50    0.55    0.70    0.70       0.74    0.70
$M_2$       -       1.05    0.70    0.70       0.74    0.70
$M_3$       -       -       1.00    0.75       0.74    0.70
$M_4$       -       -       1.15    0.90       0.74    0.70
$M_5$       -       -       -       1.00       1.15    1.34
$M_6$       -       -       -       1.00       1.15    1.34
$M_7$       -       -       -       1.15       1.15    1.34
$M_8$       -       -       -       1.20       1.15    1.34

It can be seen from Fig. 7 that the SNR increase with transmission rate from 16 kb/s to 24 kb/s, using the first multiplier set in the third column of Table 2, is small because the values selected for $\beta$ and $M$ favour performance at higher bit rates. By choosing different scale factors and sets of multipliers, performance tradeoffs can be obtained between high and low bit rates and two such parameter sets are given in the third column of Table 2 (with $\mu$ = 100). Using the second parameter set, the SNR value at 24 kb/s increases by more than 1 dB at the expense of an SNR reduction at 48 kb/s of approximately the same amount. Note that $\mu$ = 100 also favours performance at higher rates. If a larger value, say, 225, is used with corresponding appropriate values for $\beta$ and $M$, more distinctive SNR tradeoffs can be obtained. This property could be exploited in controlling the effects of channel or network congestion to improve mean speech quality. The second set of the parameters should only be selected when frequent congestion is detected and the probability of low transmission rates is high, and vice versa.

5 Performance comparison and conclusions

Fig. 8 compares SNR performance of the EHCDM system using the two algorithms with optimum scaling constants and step size multipliers for unknown numbers of bit deletions. The performance of the system with supplemental PCM coding using only uniform or logarithmic quantisation is also illustrated by curves d and e. Algorithm 1 is slightly inferior at 24 and 48 kb/s, but represents a significant improvement over PCM using uniform or logarithmic quantisation only. Both algorithms are low in complexity and incorporate instantaneous adjustment of the basic step size. The EHCDM system using algorithm 2 yields very good quality output at 48 kb/s and differences from the original speech are barely perceptible. There is a graceful reduction in quality from 48 to 16 kb/s in 8 kb/s steps. Algorithm 1 produces a slightly inferior SNR performance but, as the basic step size $\Delta_{PCM}(k)$ in algorithm 1 is derived directly from the RMS slope energy $E$ of the HCDM output, it has a particular advantage of low complexity.

Fig. 8 SNR performance of original HCDM and EHCDM with optimum scaling constant and step size multipliers for unknown bit deletions (horizontal axis: bit rate, kb/s)
a HCDM (original)
b EHCDM, using algorithm 1 (curve d in Fig. 3)
c EHCDM, using algorithm 2 (curve c in Fig. 7)
d EHCDM, using uniform quantiser
e EHCDM, using logarithmic quantiser

Conventional HCDM and EHCDM were also compared. The solid curve (curve a) in Fig. 8 illustrates SNR performance of the original HCDM coder using the same input speech and operating at fixed bit rates from 16 to 48 kb/s, respectively. Although HCDM SNR performance is higher than EHCDM at 24 and 32 kb/s, the difference at 32 kb/s is very small. However, the performance of EHCDM using both algorithms is superior at 40 kb/s and particularly at 48 kb/s. Informal listening tests also confirmed the SNR performance comparison results. A fairer comparison with conventional HCDM can be made since SNR tradeoffs by approximately equal amounts between high and low rates are available to EHCDM. Results show that, on the whole, SNR performance of the EHCDM system using algorithms 1 and 2 is superior to that of the original HCDM coder.

6 References

1 GOODMAN, D.J.: ‘Embedded DPCM for variable bit rate transmission’, IEEE Trans., 1980, COM-28, pp. 1040-1046

2 WASSELL, I.J., GOODMAN, D.J., and STEELE, R.: ‘Embedded delta modulation’, IEEE Trans. ASSP, 1988, 36, pp. 1236-1243

3 JAYANT, N.S.: ‘Variable rate ADPCM based on explicit noise coding’, Bell Syst. Tech. J., 1983, 62, pp. 657-677

4 ZHANG, S., and LOCKHART, G.B.: ‘Design and performance of robust embedded ADPCM coder’, Electron. Lett., 1991, 27, pp. 1786-1788

5 ZHANG, S.D., and LOCKHART, G.B.: ‘An efficient embedded ADPCM coder’. Proceedings of 5th IEE international conference on Telecommunications, Brighton, UK, March 1995, pp. 210-214

6 HAOUI, A., and MESSERSCHMITT, D.G.: ‘Embedded coding of speech: a vector quantization approach’. Proceedings of the IEEE international conference on Acoustics, speech and signal processing, March 1985, pp. 1703-1706

7 IACOVO, R.D.D., and SERENO, D.: ‘Embedded CELP coding for variable bit-rate between 6.4 and 9.6 kbit/s’. Proceedings of the IEEE international conference on Acoustics, speech and signal processing, 1991, pp. 681-684

8 ZHANG, S.D., and LOCKHART, G.B.: ‘An embedded scheme for regular pulse excited (RPE) linear predictive coding’. Proceedings of the IEEE international conference on Acoustics, speech and signal processing, Detroit, USA, May 1995, pp. 37-40

9 GREEFKES, J.A.: ‘A digitally controlled delta codec for speech transmission’, Conf. Rec., IEEE Int. Conf. Commun., 1970, 1, pp. 7-33-7-48

10 STEELE, R.: ‘Delta modulation systems’ (Pentech Press, London, 1975)

11 WINKLER, M.R.: ‘High information modulation’, IEEE Int. Conv. Rec., 1963, (8), pp. 260-265

12 JAYANT, N.S.: ‘Adaptive delta modulation with a one-bit memory’, Bell Syst. Tech. J., 1970, 49, pp. 321-342

13 KYAW, A.K., and STEELE, R.: ‘Constant-factor delta modulation’, Electron. Lett., 1973, 9, pp. 96-97

14 UN, C.K., LEE, H.S., and SONG, J.S.: ‘Hybrid companding delta modulation’, IEEE Trans., 1981, COM-29, pp. 1337-1344

15 UN, C.K., and LEE, H.S.: ‘A study of the comparative performance of adaptive delta modulation systems’, IEEE Trans., 1980, COM-28, pp. 96-101

16 JAYANT, N.S.: ‘Adaptive quantization with a one word memory’, Bell Syst. Tech. J., 1973, 52, pp. 1119-1144
