

where the $S_i$ are samples of a zero-mean random signal component with amplitude $\theta$, the $W_i$ are the additive i.i.d. noise samples, and the $N_i$ are also i.i.d. noise samples but they are generally correlated with the $W_i$. Here we assume that $E[N_i \mid W_i] \neq 0$. The locally optimum nonlinearity is, from [8], given as

$$T^{(2)}(x) = -\frac{\sigma_s^2 f_W''(x) + 2a\,u'(x)}{f_W(x)} = \sigma_s^2 h_1(x) + 2a\,g_2(x), \qquad (30)$$

where $u(x)$ is defined in (24) and $\sigma_s^2 = E[S_i^2]$. In this case, again, if neither term in the numerator of (30) is zero, it is not possible to factor the locally optimum nonlinearity into time-dependent and time-independent parts and we must use the more general results of (18) and (14). The time-dependent levels are, from (14), given by

$$\ell_k^{(2)} = -\frac{\sigma_s^2\left[f_W'(t_k) - f_W'(t_{k-1})\right] + 2a\left[u(t_k) - u(t_{k-1})\right]}{F_W(t_k) - F_W(t_{k-1})} = \sigma_s^2\, l_{1,k} + 2a\, l_{2,k}. \qquad (31)$$

The breakpoints are obtained from (18) as the solution to an equation that is exactly of the same form as (28) but with $a$ replaced by $2a$, $s_i$ replaced by $\sigma_s^2$, and $g_1$ replaced by $h_1$.

Another interesting observation model for a random signal in signal-dependent noise from [8] is

$$X_i = \theta S_i + a\theta S_i N_i + W_i, \qquad (32)$$

where the $S_i$ are samples of a zero-mean random signal component with amplitude $\theta$, the $W_i$ are the additive i.i.d. noise samples, and the $N_i$ are also i.i.d. noise samples but they are generally correlated with the $W_i$. Here we assume that $E[N_i \mid W_i] = 0$. The locally optimum nonlinearity is, from [8], given as

$$T^{(3)}(x) = \sigma_s^2 h_1(x) + a^2 h_3(x), \qquad (33)$$

where $v(x)$ is defined by (34).

In this case, also, if neither term in the numerator of (33) is zero, it is not possible to factor the locally optimum nonlinearity into time-dependent and time-independent parts and we must use the more general results of (14) and (18). The time-dependent levels are, from (14), given by

$$\ell_k^{(3)} = -\frac{\sigma_s^2\left[f_W'(t_k) - f_W'(t_{k-1})\right] + a^2\left[v(t_k) - v(t_{k-1})\right]}{F_W(t_k) - F_W(t_{k-1})}. \qquad (35)$$

The breakpoints are obtained from (18) as the solution to an equation that is exactly of the same form as (28) but with $a$ replaced by $a^2$, $s_i$ replaced by $\sigma_s^2$, $g_1$ replaced by $h_1$, and $g_2$ replaced by $h_3$.

V. CONCLUSION

The nonlinear equations that determine the design of the locally optimum detector based on quantized data for the important practical case of time-invariant breakpoints have been derived in this correspondence. These equations are different from the case where the breakpoints are not constrained to be time invariant for many observation models. Some common observation models, such as additive noise models for both known and random signals and some multiplicative noise models, can be handled easily by relating them to time-invariant cases through a factoring of their locally optimum nonlinearity; but for some signal-dependent noise models, the results cannot be obtained through this factoring technique and one must resort to solving our more general equations. These equations were solved for some illustrative examples. It was further demonstrated that the solution to the locally optimum quantizer design under the time-invariant breakpoint constraint is the same as the quantizer design that minimizes the time-average mean-square difference between the quantizer and the locally optimum time-varying nonlinearity.

REFERENCES
[1] S. A. Kassam, Signal Detection in Non-Gaussian Noise. New York: Springer-Verlag, 1988.
[2] C. W. Helstrom, "Improved multilevel quantization for detection of narrowband signals," IEEE Trans. Aerospace Electron. Syst., vol. 24, no. 2, pp. 141-147, Mar. 1988.
[3] B. Picinbono and P. Duvaut, "Optimum quantization for detection," IEEE Trans. Commun., vol. 36, no. 11, pp. 1254-1258, Nov. 1988.
[4] G. R. Benitz and J. A. Bucklew, "Asymptotically optimum quantizers for detection of i.i.d. data," IEEE Trans. Inform. Theory, vol. 35, no. 2, pp. 316-325, Mar. 1989.
[5] H. V. Poor and J. B. Thomas, "Optimum quantization for local decisions based on independent samples," J. Franklin Inst., vol. 303, no. 1, pp. 549-561, Jan. 1977.
[6] H. V. Poor and D. Alexandrou, "A general relationship between two quantizer design criteria," IEEE Trans. Inform. Theory, vol. IT-26, no. 2, pp. 210-212, Mar. 1980.
[7] I. Song and S. A. Kassam, "Locally optimum detection of signals in a generalized observation model: The known signal case," IEEE Trans. Inform. Theory, vol. 36, pp. 502-515, May 1990.
[8] I. Song and S. A. Kassam, "Locally optimum detection of signals in a generalized observation model: The random signal case," IEEE Trans. Inform. Theory, vol. 36, pp. 516-530, May 1990.
[9] B. Aazhang and H. V. Poor, "On optimum and nearly optimum data quantization for signal detection," IEEE Trans. Commun., vol. COM-32, pp. 745-751, July 1984.
[10] D. Alexandrou and H. V. Poor, "The analysis and design of data quantization schemes for stochastic-signal detection systems," IEEE Trans. Commun., vol. COM-28, pp. 983-991, July 1980.
[11] E. Abaya and G. L. Wise, "Some notes on optimal quantization," in Proc. IEEE Int. Conf. Commun., June 1981, pp. 30.7.1-30.7.5.

On Entropy of Pyramid Structures

R. Padmanabha Rao and William A. Pearlman

Abstract -- Entropy for a stationary Gaussian source is expressed in terms of its spectrum and used to analyze pyramid structures. Defining the entropy in this manner helps to introduce a quantity called the spectral roughness measure. The main result is that if a Gaussian process is optimally decomposed into a pyramid, then the combined first-order entropy of the pyramid is closer to the entropy rate of the source than the first-order entropy of the full-band process. This means lossless compression of the source sequence can be obtained by merely representing it on a pyramid structure. Examples using 1-D and 2-D sources are presented to verify the above claims.

Manuscript received June 18, 1990; revised July 19, 1990. This work is based on work supported by National Science Foundation Grant MIP-8610029. This work was presented in part at the IEEE International Symposium on Information Theory, San Diego, CA, January 14-19, 1990.
The authors are with the Electrical, Computer, and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, NY 12180.
IEEE Log Number 9041020.


Fig. 1. Tree diagram for pyramid generation ($\downarrow 2$ implies downsample by 2).


Index Terms -- Source coding, image coding, entropy, rate-distortion theory, pyramid, subband.

I. INTRODUCTION

Data compression using a pyramid representation has witnessed much activity in recent times, especially in the area of image coding. Another closely related scheme that has found popularity is subband coding. The basic idea of both these schemes is to divide the frequency band of the original source into a number of subbands using a bank of band-pass filters. Taking advantage of the reduced bandwidth of the subbands, one can then subsample them in proportion to their resolution, thus giving rise to a multiresolution representation. If the frequency band is split uniformly so as to give equal-width subbands, we get the usual "subband" structure. On the other hand, if the band-splitting operation is done recursively on the low-pass subband only, as depicted in Fig. 1, we get a multiresolution "pyramid" structure. Note that in the pyramid structure the fullband is split up into subbands whose bandwidth reduces in octave steps.
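As a concrete illustration of this recursive band-splitting, the following minimal Python sketch (our construction, not from the correspondence) builds the octave-band pyramid of Fig. 1 using ideal brick-wall FFT filters and decimation by 2; the ideal filtering stands in for the QMF banks used later in the examples.

```python
import numpy as np

def ideal_halfband_split(x):
    """Split x into its lower and upper half-bands with ideal
    (brick-wall) FFT filters, then decimate each branch by 2."""
    X = np.fft.fft(x)
    k = np.fft.fftfreq(len(x))            # frequency in cycles/sample
    lo, hi = X.copy(), X.copy()
    lo[np.abs(k) > 0.25] = 0.0            # keep |w| <= pi/2
    hi[np.abs(k) <= 0.25] = 0.0           # keep pi/2 < |w| <= pi
    low = np.real(np.fft.ifft(lo))[::2]   # downsample by 2 (Fig. 1)
    high = np.real(np.fft.ifft(hi))[::2]
    return low, high

def pyramid(x, levels):
    """Recursively split only the lowpass branch, as in Fig. 1,
    yielding octave-band components plus the final lowpass residue."""
    bands = []
    for _ in range(levels):
        x, high = ideal_halfband_split(x)
        bands.append(high)
    bands.append(x)
    return bands
```

Each additional level halves the bandwidth of the remaining lowpass branch, so an $m$-level pyramid produces subbands decimated by $2, 4, \cdots, 2^m$, matching the $M_j$ of Section III.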

The advantages of a multiresolution decomposition are manifold, especially for representation and coding of images. Because of its structure, an image pyramid simultaneously makes available versions of the original image at different resolutions. A coarse image can be used for performing quick point and line operations, recognition, segmentation, and other pattern-matching algorithms. Another important application of image pyramids is in progressive transmission. Instead of sending the whole image at once, a smaller image with lower resolution is sent first. This image generally has enough information for performing early recognition. If desired, a higher resolution image can be obtained by sequentially sending the other subbands. Pyramids have also found extensive applications in the area of image coding. Excellent results using subbands and pyramids for image coding can be found in [1]-[3].

Now, in spite of an abundance of literature on subband and pyramid coding, we have been unable to find a satisfactory answer to the question as to why such a scheme, wherein the frequency spectrum is divided into a number of disjoint intervals, succeeds in the first place. It is well known that in these schemes the coding error in a particular subband is confined to that subband, and that to some extent noise spectrum shaping can be done. These points, however, do not answer the question adequately. In this correspondence, therefore, we seek a more satisfactory answer by analyzing multiresolution structures from an information-theoretic point of view. Since the theory of data compression is directly linked to the concept of entropy, we shall carry out our analysis in terms of source entropy. We shall see shortly that such an approach does indeed provide us with an interesting insight regarding the concepts of pyramid and subband representation of sources.

This correspondence is organized as follows. In the next section we introduce the concept of spectral roughness measure for a Gaussian process and look at some of its properties. The spectral roughness measure is then used to prove the main result of this paper in Section III. A few examples are also presented therein to corroborate our results. Finally, in Section IV we generalize our results to image sources, which are two-dimensional (2-D) processes.

II. SPECTRAL ROUGHNESS MEASURE

Let $X_n$ be a discrete-time, stationary process with power spectral density (psd) $S_X(\omega)$ and variance $\sigma_X^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_X(\omega)\,d\omega$. Corresponding to any stationary source $Z$ having memory, we can define a memoryless source $Z^*$ having probability density function (pdf) equal to the first-order pdf of $Z$. Thus, corresponding to $X$ we have the i.i.d. source $X^*$ such that

$$p_{X^*}^{(n)}(x_1, x_2, \cdots, x_n) = \prod_{k=1}^{n} p_X^{(1)}(x_k), \qquad (1)$$

where $p^{(n)}$ denotes the $n$th-order pdf. The spectrum of $X^*$ is given by

$$S_{X^*}(\omega) = \sigma_X^2. \qquad (2)$$

That is, the psd of $X^*$ is flat for the entire range of frequencies. Letting $R^X(D)$ and $R^{X^*}(D)$ be the rate-distortion functions of $X$ and $X^*$, respectively, we can write (see [6]),

$$R^{X^*}(D) - \Delta_\infty^X \le R^X(D) \le R^{X^*}(D). \qquad (3)$$

In (3),

$$\Delta_\infty^X = I\{X_1;\, X_0, X_{-1}, \cdots\}, \qquad (4)$$

where $I$ represents mutual information. Thus, $\Delta_\infty^X$ is the mutual information between the present output of $X$ and its infinite past.

Clearly, $\Delta_\infty^X$ is in some manner a measure of the memory of $X$. Wyner and Ziv have derived some interesting properties regarding this quantity in [6]. However, most of their analysis is in the time domain. We now reinterpret this quantity in terms of the spectrum of the source and show that it has a nice intuitive interpretation in the frequency domain too. We shall subsequently use this frequency-domain interpretation to derive our main result regarding pyramid entropies.

A. Spectral Entropy

The transformation to the frequency domain is achieved by defining the following quantity,

$$H_\infty^X = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_X(\omega)\,d\omega, \qquad (5)$$

which we call the spectral entropy of $X$.

For the particular instance when $X$ is Gaussian it is possible to relate $H_\infty^X$ to the more commonly used differential entropy (or entropy rate) $h_\infty$. Indeed, by invoking the definition of $h_\infty$,

$$h_\infty = \frac{1}{2}\log 2\pi e Q_X, \qquad (6)$$

where

$$Q_X = \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_X(\omega)\,d\omega\right) \qquad (7)$$


is the source entropy power, we can write

$$h_\infty = \frac{1}{2}H_\infty^X + \frac{1}{2}\log(2\pi e). \qquad (8)$$

Thus, except for linear scaling and shifting, the differential entropy rate and the spectral entropy of a Gaussian source are the same. In other words, we can use $H_\infty^X$ for performing the same calculations and making the same interpretations as we would with $h_\infty$.

At this point we would like to comment on some of the notation to be used in this correspondence. To enhance clarity we shall continue using superscripts to denote which source is being referenced. The asterisk will be used to differentiate a source from its corresponding memoryless source. We denote differential entropy by $h$ and spectral entropy by $H_\infty$. The order of entropy (i.e., first-order, $N$th-order, etc.) appears as a subscript. In conformity with the more standard usage we shall refer to the infinite-order differential entropy as the entropy rate. Thus the entropy rate of $X$ is written as $h_\infty^X$ and its spectral entropy as $H_\infty^X$. Now, if a source is memoryless, its entropy $h_N$ is independent of $N$. Therefore, for the particular memoryless source $X^*$, we can unambiguously denote its entropy rate by $h^{X^*}$ and its spectral entropy by $H_\infty^{X^*}$.

Now, it is shown in [6] that $\Delta_\infty^X$ is the difference between the first-order entropy and the entropy rate of the source, i.e.,

$$\Delta_\infty^X = h^{X^*} - h_\infty^X. \qquad (9)$$

In (9) the only restriction on $X$ is that it be stationary. However, if we impose the additional constraint that $X$ should also be Gaussian, then by exploiting the linear relationship between its differential entropy $h_\infty$ and spectral entropy $H_\infty^X$ (see (8)), we can write,

$$\Delta_\infty^X = \frac{1}{2}\left(H_\infty^{X^*} - H_\infty^X\right). \qquad (10)$$

Using the definition of the spectral flatness measure [5],

$$\gamma_X^2 = \frac{\exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_X(\omega)\,d\omega\right)}{\frac{1}{2\pi}\int_{-\pi}^{\pi} S_X(\omega)\,d\omega} \qquad (11)$$

$$= \frac{Q_X}{\sigma_X^2}, \qquad (12)$$

we then obtain

$$\Delta_\infty^X = \frac{1}{2}\log\frac{1}{\gamma_X^2}. \qquad (13)$$

Remark 1: The reader is reminded that the source $X$ has now been further specialized to a Gaussian process. Looking at (13) we see that as $\gamma_X^2$ increases, $\Delta_\infty^X$ decreases, and vice versa. In other words, $\Delta_\infty^X$ behaves very much like a spectral roughness measure.

Remark 2: For i.i.d. sources, which have a smooth spectrum, $\gamma_X^2$ equals 1 and hence $\Delta_\infty^X$ equals 0. As source memory increases, the spectrum becomes more rough and the spectral flatness measure (sfm) $\gamma_X^2$ approaches zero. Correspondingly, from (13) we see that $\Delta_\infty^X$ approaches $\infty$. Furthermore, since $0 \le \gamma_X^2 \le 1$, $\Delta_\infty^X \ge 0$; i.e., the spectral roughness measure is always nonnegative.

Remark 3: The entropy power of any non-Gaussian source with memory is smaller than that of the Gaussian source with the same power spectral density and having the same amount of memory as measured by its $\Delta_\infty$. Furthermore, a decrease in memory (with variance remaining the same) corresponds to an increase in source entropy. Thus, the non-Gaussian source will have the same entropy as the Gaussian source only if its memory is decreased, that is, if the value of its $\Delta_\infty$ is made less than that of the Gaussian source. This simply means that the spectral roughness measure $\Delta_\infty$ of a Gaussian source overbounds the value of $\Delta_\infty$ of all sources having the same entropy power.
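To make these quantities concrete, here is a short numerical check (ours, not the authors'; natural logarithms assumed) of (5), (7), and (11)-(13) for the AR(1) psd $S_X(\omega) = 1/(1 - 2\rho\cos\omega + \rho^2)$ that reappears in the examples of Section III.

```python
import numpy as np

rho = 0.9
w = np.linspace(-np.pi, np.pi, 200001)
S = 1.0 / (1.0 - 2.0 * rho * np.cos(w) + rho**2)   # AR(1) psd, sigma_Z^2 = 1

H_inf = np.trapz(np.log(S), w) / (2 * np.pi)       # spectral entropy, (5)
var = np.trapz(S, w) / (2 * np.pi)                 # sigma_X^2
Q = np.exp(H_inf)                                  # entropy power, (7)
sfm = Q / var                                      # gamma_X^2, (11)-(12)
delta = 0.5 * np.log(1.0 / sfm)                    # roughness measure, (13)

print(sfm, 1 - rho**2)   # both ~0.19: gamma_X^2 = 1 - rho^2 for AR(1)
print(delta)             # ~0.83, the fullband entry of Table I below
```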

III. THE MAIN RESULT

We now show that if a Gaussian sequence is optimally filtered into a pyramid, then the sum of the $\Delta_\infty$'s of the pyramid components is closer to 0 than the $\Delta_\infty$ of the original sequence. To prove this, assume that our original Gaussian sequence $X$ is ideally filtered into $m$ pyramid components $Y_1, Y_2, \cdots, Y_m$, as shown in Fig. 1. Note that since the input is a Gaussian process the pyramid components are all also Gaussian. Furthermore, because of the nonoverlapping nature of the filters, they are independent of one another. After filtering, the pyramid components are resampled in the usual fashion; i.e., if the spectral width of the $j$th component is $\pi/M_j$ then it is decimated by a factor of $M_j$. The frequency range of the $j$th subband $Y_j$ is denoted by $I_j = [\omega_j, \omega_{j+1}) \cup (-\omega_{j+1}, -\omega_j]$, and its bandwidth by $\Delta\omega_j = |\omega_{j+1} - \omega_j| = \pi/M_j$. Also, since we require that

$$\bigcup_{j=1}^{m} I_j = [-\pi, \pi] \quad \text{and} \quad I_k \cap I_j = \phi, \quad k \ne j, \qquad (14)$$

we get

$$\sum_{j=1}^{m} \frac{1}{M_j} = 1. \qquad (15)$$

The ideal filtering of the $j$th subband filter produces a Gaussian sequence whose spectrum before subsampling is

$$S_{Y_j}(\omega) = \begin{cases} S_X(\omega), & \text{if } \omega \in I_j, \\ 0, & \text{otherwise.} \end{cases} \qquad (16)$$

Resampling by a factor of $M_j$ modifies the spectrum to

$$S_{Y_j}(\omega) = \frac{1}{M_j}\, S_X\!\left(\omega_j + \frac{|\omega|}{M_j}\right) \qquad (17)$$

in the frequency interval $-\pi < \omega \le \pi$ referenced to the new sampling frequency. The $1/M_j$ factor is needed to maintain equal variance before and after resampling.
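The stretching and the $1/M_j$ scaling are easy to verify empirically. The following sketch (our construction, using ideal FFT filtering; the decimated band picks up a spectral inversion, which affects neither variance nor entropy) compares the periodogram of a decimated octave band against the stretched, scaled fullband psd.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2**16
x = np.zeros(n)
for i in range(1, n):                           # AR(1) source, rho = 0.9
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

# Ideal filtering to the highpass octave I_j = (pi/2, pi], so M_j = 2.
X = np.fft.fft(x)
w = 2 * np.pi * np.fft.fftfreq(n)               # frequencies in radians
X[np.abs(w) <= np.pi / 2] = 0.0
y = np.real(np.fft.ifft(X))[::2]                # filter, then decimate by 2

# Periodogram of the decimated band vs. the prediction of (17); the band
# comes out spectrally inverted, so S_X is evaluated at pi - |w'|/2.
per = np.abs(np.fft.rfft(y))**2 / len(y)
wp = np.linspace(0, np.pi, len(per))
pred = 0.5 / (1 - 2 * 0.9 * np.cos(np.pi - wp / 2) + 0.81)
print(np.mean(per) / np.mean(pred))             # ~1: scale and variance agree
```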

Recall now that the spectral entropy $H_\infty^X$ and variance $\sigma_X^2$ of the fullband sequence $X$ are given by

$$H_\infty^X = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_X(\omega)\,d\omega \qquad (18)$$

and

$$\sigma_X^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_X(\omega)\,d\omega. \qquad (19)$$

By definition, the first-order entropy $h_1^{X^*}$ of $X^*$ equals the first-order entropy $h_1^X$ of $X$ and, since $X^*$ is i.i.d., $h_1^{X^*}$ also equals its entropy rate $h^{X^*}$; i.e., $h_1^X = h_1^{X^*} = h^{X^*}$. Also, the spectral entropy of $X^*$ is

$$H_\infty^{X^*} = \log \sigma_X^2. \qquad (20)$$

As for the $j$th component $Y_j$ of the pyramid, we can express its spectral entropy as

$$H_\infty^{Y_j} = \frac{M_j}{2\pi}\left[\int_{I_j}\log S_X(\omega)\,d\omega - \int_{I_j}\log M_j\,d\omega\right], \qquad (21)$$


and its variance as

$$\sigma_{Y_j}^2 = \frac{1}{2\pi}\int_{I_j} S_X(\omega)\,d\omega. \qquad (22)$$

For the corresponding memoryless sequence $Y_j^*$, we have

$$H_\infty^{Y_j^*} = \log \sigma_{Y_j}^2. \qquad (23)$$

As in the fullband case, we once again have $h_1^{Y_j} = h_1^{Y_j^*} = h^{Y_j^*}$.

Since the pyramid decomposition results in a set of independent processes, the total entropy per sample of the pyramid is obtained by adding the weighted entropies of all the components; the weighting for the $j$th component is $1/M_j$, the reciprocal of the factor by which it is decimated. This implies that the weighted sum of the $\Delta_\infty$'s of the pyramid components gives us the $\Delta_\infty$ for the complete pyramid. Taking the difference between the $\Delta_\infty$ of the fullband and the $\Delta_\infty$ of the pyramid, we get

$$\Delta_\infty^{\mathrm{Fullband}} - \Delta_\infty^{\mathrm{Pyramid}} = \Delta_\infty^X - \sum_{j=1}^{m}\frac{1}{M_j}\,\Delta_\infty^{Y_j}. \qquad (24)$$

Making the appropriate substitutions from (19)-(23), followed by some mathematical manipulation, leaves us with

$$\Delta_\infty^{\mathrm{Fullband}} - \Delta_\infty^{\mathrm{Pyramid}} = \frac{1}{2}\left[\log\sigma_X^2 - \sum_{j=1}^{m}\frac{1}{M_j}\log\left(M_j\,\sigma_{Y_j}^2\right)\right]. \qquad (25)$$

Noting now that $\sum_{j=1}^{m} 1/M_j = 1$, and that the logarithm is a convex $\cap$ function, we can apply Jensen's inequality to obtain

$$\sum_{j=1}^{m}\frac{1}{M_j}\log M_j\sigma_{Y_j}^2 \le \log\left[\sum_{j=1}^{m}\frac{1}{M_j}\, M_j\sigma_{Y_j}^2\right] \qquad (26)$$

$$= \log \sigma_X^2. \qquad (27)$$

The last step follows because the filtering process has been assumed ideal, so that $\sum_{j=1}^{m}\sigma_{Y_j}^2 = \sigma_X^2$. Substituting back in (25) leads us to the conclusion that

$$\Delta_\infty^{\mathrm{Fullband}} - \Delta_\infty^{\mathrm{Pyramid}} \ge 0. \qquad (28)$$

Thus we see that the $\Delta_\infty$ of a Gaussian sequence does not increase when it is split into a pyramid representation. In reality, there is a vast reduction in the value of $\Delta_\infty$ when the process is decomposed into a pyramid, as our examples in the next section demonstrate.

This result is easier to observe in the case when all subbands are of equal width. Indeed, then the argument of the logarithm in (25) simply reduces to the ratio of the arithmetic mean of the subband variances to their geometric mean. Since this ratio is never less than unity, we get the same result as in (28).
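For instance, a quick numerical check (ours) of (25)-(28) in this equal-width case computes the subband variances of the AR(1) psd by direct integration and confirms that the gap equals one half the log of the arithmetic-to-geometric mean ratio of the scaled subband variances:

```python
import numpy as np

rho, m = 0.9, 4                            # AR(1) psd split into m equal bands
w = np.linspace(0, np.pi, 400001)
S = 1.0 / (1 - 2 * rho * np.cos(w) + rho**2)

sig2 = np.trapz(S, w) / np.pi              # sigma_X^2 (spectrum is symmetric)
edges = np.linspace(0, np.pi, m + 1)
sub = np.array([np.trapz(S[(w >= a) & (w <= b)], w[(w >= a) & (w <= b)]) / np.pi
                for a, b in zip(edges[:-1], edges[1:])])  # sigma_{Y_j}^2

gap = 0.5 * (np.log(sig2) - np.mean(np.log(m * sub)))     # (25) with M_j = m
am = np.mean(m * sub)                      # arithmetic mean of M_j sigma_j^2
gm = np.exp(np.mean(np.log(m * sub)))      # geometric mean of the same
print(gap, 0.5 * np.log(am / gm))          # identical, and nonnegative: (28)
```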

It is also easy to show that the spectral roughness measure is a nonincreasing function of the number of levels in the pyramid. In order to do this, consider the pyramid decomposition shown in Fig. 1, where at each level the lowpass signal is split into two equal-width subbands. Assume now that the number of levels in an $(m-1)$-level pyramid (having $m$ subbands) is increased by 1. Using the notation of Fig. 1, we see that the new pyramid is obtained by splitting component $Y_m$ of the old pyramid into two equal-width components $Y_m'$ and $Y_{m+1}'$. It is not difficult to see that any change in the value of $\Delta_\infty$ is produced only by the splitting of component $Y_m$. By associating $Y_m$ with "fullband" and the $Y'$'s with "subband" in (25), we then see that the splitting operation does not increase the value of $\Delta_\infty$. In other words, the $\Delta_\infty$ of the new $m$-level pyramid is no greater than that of its parent $(m-1)$-level pyramid.

What does this mean? Recall that $\Delta_\infty$ is the difference of the first-order entropy and the entropy rate of the source. Although we are talking of differential entropies, which have no meaning per se, one must keep in mind that the difference of differential entropies does indicate a difference in information. This means that if the resolution of measurement is held fixed, differential entropies can be used as a measure of information. For example, we can compare the first-order differential entropies of two sources by scalar quantizing them using the same quantizer and then computing their first-order discrete entropies. Thus, the difference of $\Delta_\infty$'s previously calculated tells us that we need fewer bits to scalar quantize (to whatever resolution we desire) the pyramid as compared with the original source. In other words, near-lossless compression can be achieved by merely representing the original source on a pyramid structure. We say near-lossless because some error is introduced when the original source is reconstructed from a quantized version of the pyramid. However, if the quantization is sufficiently fine, this error is negligible. That this is indeed true will be verified by our simulation results in the next section. It is also clear that higher compression can be achieved by simply increasing the number of levels in the pyramid (since we decrease the spectral roughness measure as we go from any level to the next higher level).
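The quantize-and-compare procedure described above is easy to simulate. The sketch below (ours; it reuses ideal_halfband_split from the Fig. 1 sketch, with an arbitrary step size) applies the same uniform scalar quantizer to an AR(1) sequence and to its one-level pyramid, weighting each subband's discrete entropy by the fraction of samples it contributes.

```python
import numpy as np

def discrete_entropy(q):
    """First-order entropy in bits/sample of a quantized sequence."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def quantize(x, step):
    return np.round(x / step)             # uniform scalar quantizer

rng = np.random.default_rng(1)
n = 2**16
x = np.zeros(n)
for i in range(1, n):                     # AR(1) source, rho = 0.9
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

low, high = ideal_halfband_split(x)       # one-level pyramid, M_j = 2
step = 0.1                                # same resolution everywhere
h_full = discrete_entropy(quantize(x, step))
h_pyr = 0.5 * discrete_entropy(quantize(low, step)) \
      + 0.5 * discrete_entropy(quantize(high, step))
print(h_full, h_pyr)                      # pyramid needs fewer bits/sample
```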

We can also interpret the above result in the frequency domain. The band-pass filtering and subsequent subsampling operation on the subbands results in a stretching of the spectrum in each subinterval. As a result, the spectral roughness measure of the subbands is less than that of the original spectrum. Through the definition of the spectral roughness measure, this means that the inter-sample correlation within the subbands is less than that in the original process, and thus simple scalar quantization of the subbands would need fewer bits.

Equality occurs in (26) if and only if $M_j\sigma_{Y_j}^2$ is constant, independent of $j$. This condition is satisfied when the input process is i.i.d. Hence, we corroborate rigorously the well-known fact that subband and pyramid coding of i.i.d. sources produce no benefits over direct coding. The equality condition is rarely met for non-i.i.d., real-life sources, and thus for such sources subband and pyramid coding offer a clear-cut advantage. We summarize this discussion in the form of a theorem and its two corollaries.

Theorem 1: The spectral roughness measure $\Delta_\infty$ for an ideally filtered multiresolution pyramid representation of a Gaussian source is no greater than the $\Delta_\infty$ for the fullband. Furthermore, $\Delta_\infty$ is a nonincreasing function of the number of levels in the pyramid.

Corollary 1: The difference between the first-order entropy $h_1$ and the entropy rate $h_\infty$ for a Gaussian source reduces when it is represented on a pyramid structure. This follows from the definition of $\Delta_\infty$.

Corollary 2: For a Gaussian source and squared-error criterion, and for rates higher than the critical rate (i.e., for $D \le \min S_X(\omega)$), $R_1(D)$ is closer to $R(D)$ for the pyramid representation than for the fullband.

Proof: Using the results of [6] we can show that

$$R_1(D) - R(D) = \Delta_\infty, \qquad D \le \min S_X(\omega). \qquad (29)$$

But it is known that the rate-distortion function, $R(D)$, does not change when a Gaussian sequence is split into subbands (see [4]). The proof then follows by invoking Theorem 1. $\Box$


TABLE I
VARIATION OF $\Delta_\infty^X$ WITH PYRAMID SIZE FOR AN AR(1) SOURCE

    Level          Lowpass   Highpass   Total
    0 (Fullband)   0.83      --         0.83
    1              0.714     0.005      0.360
    2              0.493     0.071      0.144
    3              0.212     0.018      0.057

A. Examples

1) First-Order Markov Process: Consider a first-order Markov source, AR(1), whose present output $X(n)$ is given by

$$X(n) = \rho X(n-1) + Z(n), \qquad (30)$$

where $Z$ is a zero-mean, white Gaussian noise process with $\sigma_Z^2 = 1$ and $\rho = 0.9$. The sfm for this source is given by $\gamma_X^2 = 1 - \rho^2$ [5], which implies that

$$\Delta_\infty^X = \frac{1}{2}\log\frac{1}{1-\rho^2}. \qquad (31)$$

This source is decomposed into a pyramid structure by using 1-D QMF's. It is well known that QMF's cause cancellation of inter-band aliasing and also closely approximate the action of ideal filters upon the signal. Thus we can expect that the theory derived in the last section for optimal pyramids would hold for these pyramids too. Table I shows the variation of $\Delta_\infty^X$ as a function of pyramid level. The $\Delta_\infty^X$ for each filtered AR(1) sequence is obtained by first calculating the sfm of the best-fitting AR(1) source for the corresponding sequence and then using (31). We see from the table that as the number of levels in the pyramid increases, the value of $\Delta_\infty^X$ decreases. Since $\Delta_\infty$ is a spectral roughness measure, the spectral roughness of any given pyramid component is less than that of the fullband. From (31) this means that $\rho$ has a smaller value in any pyramid component as compared to the original sequence. In other words, inter-sample correlation is reduced in the pyramid representation. Also shown in Table I are the values of the spectral roughness measure for the highpass and lowpass components at each level of the pyramid. The highpass filtered sequences have a much lower value of $\Delta_\infty$ than the lowpass sequences. This is to be expected since the AR(1) source with $\rho = 0.9$ has most of its energy concentrated at the lower end of the frequency spectrum.
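A rough reproduction of this procedure (our sketch; ideal FFT filtering stands in for the QMF bank, so the numbers only approximate Table I) fits an AR(1) model to each subband through the lag-1 sample autocorrelation and converts it to $\Delta_\infty^X$ via (31).

```python
import numpy as np

def delta_ar1(seq):
    """Roughness measure (31) of the best-fitting AR(1) model,
    with rho estimated by the lag-1 sample autocorrelation."""
    s = seq - seq.mean()
    rho = np.dot(s[1:], s[:-1]) / np.dot(s, s)
    return 0.5 * np.log(1.0 / (1.0 - rho**2))

rng = np.random.default_rng(2)
n = 2**18
x = np.zeros(n)
for i in range(1, n):                      # AR(1) source, rho = 0.9
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

print(delta_ar1(x))                        # ~0.83, the fullband row of Table I
low, high = ideal_halfband_split(x)        # level-1 split (Fig. 1 sketch)
print(delta_ar1(low), delta_ar1(high))     # lowpass >> highpass, as in Table I
```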

2) Second-Order Markov Source: The zero-mean AR(2) process is given by

$$X(n) = b_1 X(n-1) + b_2 X(n-2) + Z(n), \qquad (32)$$

where once again $Z$ is a white noise process with $\sigma_Z^2 = 1$. The spectral flatness measure for this source is given by [5]

$$\gamma_X^2 = \frac{(1+b_2)(1-b_1-b_2)(1+b_1-b_2)}{(1-b_2)}. \qquad (33)$$

Hence for this source,

$$\Delta_\infty^X = \frac{1}{2}\log\frac{(1-b_2)}{(1+b_2)(1-b_1-b_2)(1+b_1-b_2)}. \qquad (34)$$

Thus, if we know the coefficients $b_1$ and $b_2$, we can obtain $\gamma_X^2$ and $\Delta_\infty^X$. Here we consider a typical AR(2) process with $b_1 = 1.0$ and $b_2 = -0.5$. The values of the sfm and $\Delta_\infty^X$ for this source are 0.417 and 0.438, respectively. As in the previous example, we decompose this source into a pyramid structure by using 1-D QMF's. The values of $\Delta_\infty^X$ for pyramids of different heights are plotted in Fig. 2. It can be seen once again that the spectral roughness measure is a monotonic nonincreasing function of the number of pyramid levels. Also, the greatest gains are obtained in the lower levels of the pyramid; i.e., the decrease in $\Delta_\infty$ as a new level is added to the pyramid is maximum initially and diminishes as the number of levels in the pyramid increases. From the point of view of implementation, therefore, it may not be worth the computational complexity to go beyond a few levels. Experiments in pyramid coding of images have shown that on average three to four levels are sufficient for satisfactory performance.
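The quoted values 0.417 and 0.438 follow directly from (33) and (34); a short check (ours, natural logarithms assumed):

```python
import numpy as np

def ar2_sfm(b1, b2):
    """Spectral flatness measure (33) of an AR(2) process."""
    return (1 + b2) * (1 - b1 - b2) * (1 + b1 - b2) / (1 - b2)

sfm = ar2_sfm(1.0, -0.5)
print(sfm, 0.5 * np.log(1.0 / sfm))   # ~0.417 and ~0.438, as in the text
```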

3) Speech Source: For real-world sources, such as speech and images, it is not possible to obtain the values of the sfm and spectral roughness measure exactly. Therefore, we model the speech waveform by an AR(2) process and proceed as in Example 2. Of course, the accuracy of such a calculation is strongly dependent on how closely the AR(2) source approximates the speech source. In reality one might use a higher-order Markov process than the AR(2) used here. However, it is still possible to obtain a meaningful insight into the pyramid-forming process by using a simple process like the AR(2) process. In Table II we show the values of $b_1$ and $b_2$ obtained for different pyramid components, along with the values of $\Delta_\infty$. The variation of $\Delta_\infty$ is also plotted in Fig. 2. Once again it can be seen that in terms of entropy and inter-sample correlation the pyramid representation is superior to the fullband sequence.

IV. EXTENSION TO MULTIPLE DIMENSIONS

The results of the last section can be easily extended to multiple dimensions, in particular to images, which are 2-D sources. For such sources the psd is a function of two variables and the expression for spectral entropy simply becomes

$$H_\infty^X = \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\log S_X(\omega_1, \omega_2)\,d\omega_1\,d\omega_2. \qquad (35)$$

Using (35) and other analogous expressions, we can proceed in an identical fashion as before and arrive at a similar result as in the 1-D case.

It has been explained in the previous section that the first-order entropy of the pyramid is less than that of the fullband Gaussian process. This fact is also derivable directly from (28). Indeed, substituting the definition of $\Delta_\infty$ from (9) and making the appropriate manipulations reveals (for details see the Appendix)

$$h_1^{\mathrm{Pyramid}} - h_\infty^{\mathrm{Pyramid}} \le h_1^{\mathrm{Fullband}} - h_\infty^{\mathrm{Fullband}}, \qquad h_1^{\mathrm{Pyramid}} \le h_1^{\mathrm{Fullband}}. \qquad (36)$$

We now show experimentally that this result holds for images, which are non-Gaussian, as well. As an example, consider the Lena pyramid depicted in Fig. 3. This pyramid has been generated by using 2-D separable QMF's. Note that in this case the signal is split into four equal-width subbands at each stage (as opposed to two subbands in the 1-D case). The first-order entropies of the fullband and pyramid representations are computed by scalar quantizing the fullband and subband pixels to the same resolution. In other words, quantizers having the same number of output levels are used to quantize the fullband and pyramid components. In this example we have chosen 256-level quantizers. The entropies thus calculated are shown in Table III. The reconstructed image using the quantized subbands from a 3-level pyramid is depicted in Fig. 4. The reconstruction error is negligible. From the table we see that the number of bits needed to represent the image decreases as the number of levels in the pyramid is increased. For example, the 3-level Lena pyramid needs only 3.82 bits per pixel. In contrast, the fullband requires almost 8 bits per pixel.


Fig. 2. Variation of $\Delta_\infty^X$ with number of pyramid levels (left panel: AR(2) source; right panel: speech source).

TABLE II
VARIATION OF $\Delta_\infty^X$ WITH PYRAMID SIZE FOR A SPEECH SOURCE

                        Lowpass                    Highpass
    Level           b1      b2      Δ∞         b1      b2      Δ∞      Δ∞ for Pyramid
    0 (Fullband)    1.440  -0.488   1.497      --      --      --      1.497
    1               1.707  -0.868   1.609     -0.709  -0.693   0.423   1.016
    2               1.260  -0.840   0.929      0.196  -0.605   0.681   0.614
    3               0.062  -0.724   0.372     -0.308   0.045   0.056   0.435

Fig. 3. 2-level pyramid for Lena.

V. CONCLUSION

We have presented an information-theoretic analysis of multiresolution pyramid structures. The analysis was carried out using the concept of spectral entropy which, for Gaussian sources, is linearly related to the differential entropy. The spectral entropy was used to define the spectral roughness measure, which in turn is an indicator of the amount of memory in a source. The more the memory in a source, the greater the value of its spectral roughness measure. The spectral roughness measure also plays an important role in lower bounding the rate-distortion function. The main result of this correspondence is that the spectral roughness measure of a pyramid structure is less than that of the original fullband process.

Fig. 4. Lena reconstructed from quantized pyramid.

TABLE III
FIRST-ORDER ENTROPIES FOR THE LENA PYRAMID

    Pyramid Level   Entropy (bpp)   Normalized Distortion
    0 (Fullband)    7.57            0
    1               4.62            1.8 x 10^-5
    2               3.96            3.0 x 10^-5
    3               3.82            4.2 x 10^-5


This means that a substantial reduction in bits can be obtained by merely representing a source as a pyramid. Simulations using 1-D and 2-D sources verifying these claims have been presented.


APPENDIX

Proof for (36): Merely substituting (9) into (28) gives us the first inequality. To prove the second inequality, we use (8) to obtain

$$h_1^{\mathrm{Fullband}} - h_1^{\mathrm{Pyramid}} = \frac{1}{2}\left(H_\infty^{X^*} - H_\infty^{\mathrm{Pyramid}^*}\right). \qquad (37)$$

Noting now that $H_\infty^{\mathrm{Pyramid}}$ is the weighted sum of the spectral entropies of the pyramid components and making the appropriate substitutions from (19)-(23), we obtain

$$h_1^{\mathrm{Fullband}} - h_1^{\mathrm{Pyramid}} = \left(\Delta_\infty^{\mathrm{Fullband}} - \Delta_\infty^{\mathrm{Pyramid}}\right) + \frac{1}{2}\sum_{j=1}^{m}\frac{1}{M_j}\log M_j. \qquad (38)$$

But we know that $M_j \ge 1$ for all $j$ (recall that $M_j$ is the factor by which subband $j$ is resampled). Thus, the right-hand side of equation (38) is always nonnegative and the proof is complete. $\Box$

REFERENCES
[1] E. H. Adelson, E. Simoncelli, and R. Hingorani, "Orthogonal pyramid transforms for image coding," Proc. SPIE, Oct. 1987.
[2] J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust. Speech Signal Processing, vol. ASSP-34, pp. 1278-1288, Oct. 1986.
[3] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-11, pp. 674-693, July 1989.
[4] S. Nanda and W. A. Pearlman, "Tree coding of image subbands," Proc. ICASSP, Apr. 1988; also in IEEE Trans. Acoust. Speech Signal Processing (in revision).
[5] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[6] A. D. Wyner and J. Ziv, "Bounds on the rate-distortion function for stationary sources with memory," IEEE Trans. Inform. Theory, vol. IT-17, no. 5, pp. 508-513, Sept. 1971.

Interleaving and the Arbitrarily Varying Channel

Brian L. Hughes, Member, IEEE

Abstract -- The arbitrarily varying channel (AVC) models a channel with unknown parameters that change with time in an arbitrary way from one symbol transmission to the next. In this note, we investigate the relationship between the error probability suffered on the AVC by a deterministic code with random block interleaving and the error probability predicted by an unknown, fixed channel model (i.e., a compound channel). Our main results are that codes of this form can achieve the same error exponents as "fully" random codes; further, optimal codes (those achieving the exponent) can be designed by choosing codes appropriate for the associated compound channel.

Index Terms -- Arbitrarily varying channel, interleaving, random codes, error bounds.

Manuscript received July 19, 1989; revised August 28, 1990. This work was supported by the National Science Foundation under Grant NCR-8804257 and by the U.S. Army Research Office under Grant DAAL03-89-K-0130. This work was presented in part at the Conference on Information Sciences and Systems, Johns Hopkins University, Baltimore, MD, March 27-29, 1990.

The author is with the Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218.

IEEE Log Number 9041017.

I. INTRODUCTION

Real communication channels are often corrupted by noise with a partially unknown statistical description. Further, the statistics of this noise may change with time in an arbitrary way that makes accurate estimation of the channel impossible or impractical. Examples of such channels arise in situations involving uncoordinated multiple access or hostile jamming.

Two different approaches to understanding basic communication principles for such channels have emerged. The first approach, pursued almost exclusively by information theorists, centers on the arbitrarily varying channel (AVC) [1]. (For a partial survey of the AVC literature, see [2], Chapter 2, Section 6.) Roughly speaking, the AVC models a channel in which the interfering signal can change from one symbol transmission to the next in an arbitrary way within some fixed set of possible values. Consequently, the interfering signal may have arbitrary, time-varying statistics.

A second approach, exemplified by [3], [4], is based on the common practice of interleaving, which is usually used to combat burst noise. An interleaver permutes the time-ordering of successive encoded symbols in a deterministic way prior to transmission; a deinterleaver at the receiver acts synchronously to restore the original ordering prior to decoding. This approach asserts that when coded symbols are interleaved prior to transmission, the coding channel (consisting of interleaver, physical channel, and deinterleaver) is well-modeled as a fixed, memoryless channel with uncertain statistics. In information theory, a fixed channel with partially unknown statistics is called a compound channel (CC) [2], [5]. The assumption that the coding channel is fixed (but unknown) greatly simplifies both the analysis and design of robust codes. Many of the techniques that are useful in the study of known, memoryless channels can be carried over, with minor modifications, to memoryless channels with unknown statistics [4].

A key difficulty with the latter approach is that the role of interleaving is qualitative and implicit; it serves only to intuitively justify a compound channel model. Bounds on error probability based on this approximation are thus somewhat misleading, since they are, at best, guaranteed to hold only in the limit as the interleaver becomes infinitely complex.

The primary aim of this correspondence is to investigate the relationship between the error probability of a code used on the AVC with interleaving and the error probability predicted by the CC approximation. We remark here that applying a fixed permutation to encoded symbols transmitted over an AVC will clearly not alter the worst-case error probability. A "jammer" could, at least in principle, apply the same permutation to his transmissions. In practice, the transmitter may use a pseudo-random interleaver so that the time-ordering of encoded symbols is difficult to predict or computationally difficult to identify by a jammer [6]. Here we investigate random interleaving as a probabilistic model of pseudo-random interleaving.
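To fix ideas, here is a minimal sketch (ours; the names and block length are illustrative, not from the correspondence) of a random block interleaver and its synchronous deinterleaver, where the shared permutation plays the role of the pseudo-random seed assumed unknown to the jammer.

```python
import numpy as np

def interleave(symbols, rng):
    """Randomly permute the time order of encoded symbols."""
    perm = rng.permutation(len(symbols))
    return symbols[perm], perm

def deinterleave(received, perm):
    """Invert the permutation synchronously at the receiver."""
    out = np.empty_like(received)
    out[perm] = received
    return out

rng = np.random.default_rng(0)        # stands in for a shared seed
codeword = np.arange(12)              # placeholder encoded block
tx, perm = interleave(codeword, rng)
assert np.array_equal(deinterleave(tx, perm), codeword)
```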

A second aim of this correspondence is to determine the information-theoretic limitations of the AVC when randomly interleaved codes are used. It is well known that random codes are more powerful than deterministic codes on the AVC in the sense that a higher capacity can be achieved. In fact, in many cases of practical interest, a positive information rate can be achieved only if a random code is used. A deterministic code
