
Theory and Applications of the Shift-Invariant, Time-Varying and Undecimated Wavelet Transforms



RICE UNIVERSITY

Theory and Applications of the Shift-Invariant, Time-Varying and Undecimated Wavelet Transforms

by
Haitao Guo

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree
Master of Science

Approved, Thesis Committee:

C. Sidney Burrus, Chairman
Professor of Electrical and Computer Engineering

Richard G. Baraniuk
Assistant Professor of Electrical and Computer Engineering

Ronny O. Wells Jr.
Professor of Mathematics

Houston, Texas
May, 1995


Theory and Applications of the Shift-Invariant, Time-Varying and Undecimated Wavelet Transforms

Haitao Guo

Abstract

In this thesis, we generalize the classical discrete wavelet transform and construct wavelet transforms that are shift-invariant, time-varying, undecimated, and signal-dependent. The result is a set of powerful and efficient algorithms suitable for a wide variety of signal processing tasks, e.g., data compression, signal analysis, noise reduction, statistical estimation, and detection. These algorithms are comparable and often superior to traditional methods. In this sense, we put wavelets in action.


Acknowledgments

I want to thank my thesis advisor, Dr. Sidney Burrus, for introducing me to the theory of wavelets, and for his encouragement and guidance. His perspective and insight had a profound influence on this thesis. I also would like to thank the members of my thesis committee, Dr. Richard Baraniuk and Dr. Ronny Wells. They all provided substantial input throughout the period during which this research was being done. I am also indebted to Ramesh Gopinath for his help and encouragement. Thanks to the members of the DSP group and the Computational Mathematics Laboratory of Rice University for many fruitful discussions and collaborations. Special thanks to Ivan Selesnick, Markus Lang, and Jan Odegard of the DSP group for reading earlier drafts of this thesis.

The generous financial support of ARPA and the Texas ATP grant that made this research possible is also gratefully acknowledged.

Also, I would like to thank all those authors who made their technical reports and publications readily available on the Internet and the World Wide Web.

On the personal side, I would like to thank my parents for making this all possible through their constant support and understanding over the years. I really appreciate the love and support of my companion, Lin Yue, who has been exploring life with me and shares my interest in academic endeavors.


To Mom and Dad


Contents

Abstract
Acknowledgments
List of Illustrations
List of Tables

1 Introduction
  1.1 Historical Background of Wavelets
  1.2 The Scope of the Thesis
    1.2.1 From Fixed Transform to Adaptive Transform
    1.2.2 From Deterministic Signal Analysis to Statistical Signal Processing
    1.2.3 From Theory to Application
  1.3 Wavelet Applications – A General Framework
  1.4 Overview of the Thesis
  1.5 Acronyms
  1.6 Notation

2 Multiresolution Analysis
  2.1 Introduction
  2.2 Continuous Multiresolution Analysis
    2.2.1 Basic Definitions in Multiresolution Analysis
    2.2.2 Basic Operations in Multiresolution Analysis
    2.2.3 Multiresolution for L2(IR)


    2.2.4 Nesting Spaces for the Continuous Multiresolution Analysis
    2.2.5 Orthogonal Multiresolution Analysis
  2.3 Discrete Wavelet Transform
    2.3.1 Fast Algorithm
    2.3.2 Computational Complexity of the DWT
  2.4 Multiresolution for Discrete Wavelet Transform
    2.4.1 Basic Definitions in Multirate Digital Signal Processing
    2.4.2 Discrete Wavelet Transform Filters
    2.4.3 The Connection between Discrete and Continuous MRAs
  2.5 Summary

3 Best Basis Algorithms
  3.1 Introduction
  3.2 Best Wavelet Packet Transform
    3.2.1 Basic Idea
    3.2.2 Fast Searching Algorithm
    3.2.3 Power and Complexity
  3.3 Best Shift Wavelet Transform
    3.3.1 Basic Idea
    3.3.2 Fast Searching Algorithm
    3.3.3 Power and Complexity
  3.4 Best Shift Wavelet Packet Transform
    3.4.1 Basic Idea
    3.4.2 Fast Searching Algorithm
    3.4.3 Power and Complexity
  3.5 Time-Varying Best Wavelet Packet Transform
    3.5.1 Introduction
    3.5.2 Fast Searching Algorithm


    3.5.3 Power and Complexity
    3.5.4 Discussions
  3.6 Time-Varying Best Shift Wavelet Packet Transform
  3.7 Discussion and Future Work

4 Applications of Orthonormal Wavelet Transform
  4.1 Introduction
  4.2 Signal Analysis
  4.3 Data Compression
    4.3.1 Data Compression in Wavelet Domain
    4.3.2 Rate-Distortion as an Additive Measure
  4.4 Denoising
    4.4.1 Introduction
    4.4.2 Denoising of a Single Observation
    4.4.3 Denoising of a Sequence of Observations
    4.4.4 Denoising in an ON Basis
    4.4.5 Best Basis Denoising
    4.4.6 Speckle Reduction
  4.5 Joint Denoising and Data Compression
    4.5.1 Problem Setup
    4.5.2 Design of an Asymptotically Optimal Quantizer
    4.5.3 Example
  4.6 Detection
  4.7 Future Work

5 Undecimated Wavelet Transforms
  5.1 Introduction
  5.2 Undecimated Discrete Wavelet Transform


    5.2.1 A Review
    5.2.2 The Computational Complexity of the UDWT
    5.2.3 Multiresolution for the UDWT
  5.3 Undecimated Discrete Wavelet Packet Transform
  5.4 Summary

6 Applications of Undecimated Wavelet Transforms
  6.1 Introduction
  6.2 Convolution using the UDWT
    6.2.1 Introduction
    6.2.2 The Scheme
    6.2.3 Limitations
    6.2.4 Computational Complexity of Convolution Algorithms
    6.2.5 Size Property of the Autocorrelation Sequences of DWT Basis
    6.2.6 Example
    6.2.7 Summary
  6.3 Compression using the UDWT
    6.3.1 Quantization of the Coefficients of the UDWT
  6.4 Denoising in Redundant Basis
    6.4.1 The Method
    6.4.2 The Ideal Risk
  6.5 Compression and Denoising using the UDWT
  6.6 Conclusion and Future Work

7 Conclusions
  7.1 Summary of the Work
  7.2 What Are the Problems that Wavelets Are Good For?
  7.3 Future Directions


Bibliography


Illustrations

1.1 Wavelet Applications – A General Framework
1.2 Overview of the Thesis
2.1 The nesting spaces for the continuous multiresolution analysis.
2.2 Functions that span the nesting spaces.
2.3 The time-frequency/scale plot for functions that span the nesting spaces.
2.4 Building block for the discrete wavelet transform.
2.5 Diagram for the three-level discrete wavelet transform.
2.6 Building block for the inverse discrete wavelet transform.
2.7 Diagram for the three-level inverse discrete wavelet transform.
3.1 Diagram for the three-level discrete wavelet packet transform.
3.2 The building block of the cost tree for the best wavelet packet algorithm.
3.3 The complete cost tree for the three-level best wavelet packet algorithm.
3.4 Examples of pruned wavelet packet trees and corresponding time-frequency plots.
3.5 The splitting tree that corresponds to the three-level classical discrete wavelet transform.
3.6 The power of the best wavelet packet algorithm: log2(P_BWPA(N, log2 N, M)).


3.7 Two equivalent representations of the building block for the undecimated discrete wavelet transform.
3.8 Diagram for the two-level undecimated discrete wavelet transform.
3.9 The underlying functions for the undecimated wavelet transform.
3.10 The building block of the cost tree for the best shift wavelet transform algorithm.
3.11 The cost tree for the three-level best shift wavelet transform algorithm.
3.12 Examples of paths for some three-level best shift wavelet transforms.
3.13 Examples of pruned cost trees for the best shift wavelet algorithm.
3.14 Diagram for the two-level undecimated wavelet packet transform.
3.15 The building block of the cost tree for the best shift wavelet packet transform algorithm.
3.16 The complete cost tree for the two-level best shift wavelet packet transform algorithm.
3.17 Examples of the trees corresponding to some two-level best shift wavelet packet transforms.
3.18 Examples of pruned shift wavelet packet trees and corresponding time-frequency plots.
3.19 The power of the best shift wavelet packet algorithm: log2(P_BSWPA(N, log2 N, M)).
3.20 Example of a cost table for all the segments of a length-8 input.
3.21 Example of a time-varying best cost table for all the segments that start from 1. The input length is 8.
3.22 The forest of cost trees for the time-varying wavelet packet algorithm.
3.23 The power of the time-varying best wavelet packet algorithm: log2(P_TVBWPA(N, log2 N, M)).
4.1 The diagram of the wavelet-based signal analysis.


4.2 The diagram of the wavelet-based data compression.
4.3 Functionals of the Gaussian PDF.
4.4 $x^2[Q(-\tau - x) - Q(\tau - x)]$, $Q^2(\tau - x) + Q^2(\tau + x)$, and $R^{\mathrm{hard}}_{x,\tau}$.
4.5 $Q_b^2(\tau - x, \tau) + Q_b^2(\tau + x, \tau)$ and $R^{\mathrm{soft}}_{x,\tau}$.
4.6 $R^{\mathrm{soft}}_{x,\tau} / [(\tau^2 + 1)(e^{-\tau^2/2} + R^{\mathrm{ideal}}_x)]$.
4.7 $R^{\mathrm{hard}}_{x,\tau} / [(\tau^2 + 1)(e^{-\tau^2/2} + R^{\mathrm{ideal}}_x)]$.
4.8 The diagram of the best basis denoising.
4.9 The diagram of speckle reduction via wavelet shrinkage.
4.10 The original SAR image of a farm area, HV polarization.
4.11 Processed image, using Daubechies' length-4 wavelets (HV), soft thresholding.
4.12 Wavelet-based multi-polarization speckle reduction (method 1). Step 1: Perform PWF. Step 2: Perform wavelet denoising.
4.13 Wavelet-based multi-polarization speckle reduction (method 2). Step 1: Denoise individual polarimetric images HH, HV and VV. Step 2: Combine with PWF.
4.14 Wavelet-based multi-polarization speckle reduction (method 3). Step 1: Decorrelate with PWF matrix. Step 2: Denoise with wavelet thresholding. Step 3: Add resulting three images in magnitude.
4.15 Original PWF image of a farm area.
4.16 Resulting image of wavelet-based multi-polarization speckle reduction (method 1). Daubechies' length-4 wavelet and the soft thresholding scheme are used.
4.17 The diagram of the joint denoising and compression using the DWT.
4.18 Various quantizers for data compression.
4.19 Examples of joint denoising and compression.
4.20 The rate-risk curve of joint denoising and compression using the ON DWT.


4.21 The diagram of the wavelet-based detection.
5.1 Two equivalent representations for the building block for the undecimated inverse discrete wavelet transform.
5.2 Diagram for the two-level undecimated inverse discrete wavelet transform.
5.3 Diagram for the two-level undecimated inverse discrete wavelet packet transform.
6.1 The diagram of the convolution using the UDWT.
6.2 Example of a lowpass filter that can be implemented using the UDWT.
6.3 The diagram of the denoising using the UDWT.
6.4 Example of joint denoising and compression using the UDWT.
6.5 The rate-risk curve of joint denoising and compression using the undecimated wavelet transform.


Tables

1.1 Elements in the general framework for different signal processing tasks.
3.1 The power of the best wavelet packet algorithm: log2(P_BWPT(N, log2 N, M)).
3.2 The power of the best shift wavelet packet algorithm: log2(P_BSWPT(N, log2 N, M)).
3.3 The power of the time-varying best wavelet packet algorithm: log2(P_TVBWPA(N, log2 N, M)).
4.1 Speckle reduction results for single-polarization SAR image: s/m for clutter data.
4.2 Speckle reduction results for single-polarization SAR image: log-std for clutter data.
4.3 Speckle reduction results for single-polarization SAR image: target-to-clutter ratio (t/c) and deflection ratio for clutter data.
4.4 Speckle reduction results for multi-polarization SAR image: s/m for clutter data.
4.5 Speckle reduction results for multi-polarization SAR image: log-std for clutter data.
4.6 Speckle reduction results for multi-polarization SAR image: target-to-clutter ratio (t/c) and deflection ratio for different methods.


6.1 The QMF coefficients for the designed lowpass filter.
6.2 The weighting coefficients for the designed autocorrelation sequences.


Chapter 1

Introduction

1.1 Historical Background of Wavelets

Wavelets are mathematical tools with powerful structure and enormous freedom. The multiresolution structure of wavelets allows one to zoom in on local signal behavior to analyze signal details, or zoom out to get a global (Fourier-like) view of the signal. Although the idea of multiresolution analysis goes back to early years, it was formally developed in the 1980s by Morlet, Grossmann [37], Meyer [56, 58], Mallat [54], and others. The construction of compactly supported wavelets by Daubechies [12] further captured the attention of the larger scientific community and triggered a huge amount of research activity, especially in the areas of signal processing, applied mathematics, numerical analysis, and statistics.

In one of the key results of wavelet theory, wavelets are shown to be unconditional bases for a very wide class of function spaces [56]. As a result, the coefficients of the wavelet transform reveal properties (e.g., time-frequency content, singularities, smoothness, etc.) of the signals effectively and faithfully. Moreover, it has recently been shown that the wavelet coefficients for signals belonging to these function spaces are very sparse, and the decay of the magnitude of the wavelet coefficients is the fastest among all orthonormal bases. Based on these results, Donoho and Johnstone have shown that wavelet bases are optimal bases for data compression, noise reduction, and statistical estimation [19]. In fact, simple thresholding of the wavelet coefficients works essentially better for recovery and estimation than any other method [19, 26].
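To make the thresholding idea concrete, here is a minimal sketch (our illustration, not text from the thesis; the helper names are ours) of the hard and soft thresholding rules of Donoho and Johnstone, applied elementwise to a vector of wavelet coefficients, together with their universal threshold sigma * sqrt(2 log N):

```python
import math

def hard_threshold(coeffs, tau):
    # Hard rule: keep a coefficient unchanged if |c| > tau, otherwise zero it.
    return [c if abs(c) > tau else 0.0 for c in coeffs]

def soft_threshold(coeffs, tau):
    # Soft rule: shrink every coefficient toward zero by tau (small ones become 0).
    return [math.copysign(max(abs(c) - tau, 0.0), c) for c in coeffs]

def universal_threshold(n, sigma=1.0):
    # Donoho-Johnstone universal threshold sigma * sqrt(2 log n)
    # for n coefficients with noise level sigma.
    return sigma * math.sqrt(2.0 * math.log(n))

coeffs = [5.0, -0.3, 0.1, -4.2, 0.8, 0.05, 2.5, -0.6]
tau = universal_threshold(len(coeffs))  # about 2.04 for n = 8, sigma = 1
denoised = soft_threshold(coeffs, tau)  # only the three large coefficients survive
```

Hard thresholding keeps the surviving coefficients unchanged, while soft thresholding additionally shrinks them by tau; the thesis compares the risks of the two rules in Section 4.4.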


The multiresolution structure of wavelets permits fast implementations (on the order of the length of the signal) to calculate the wavelet coefficients, so this powerful tool is also very practical.

1.2 The Scope of the Thesis

Currently, there are several trends in the research area of wavelets. These directions are theoretically challenging and practically important. We have devoted much effort to these directions, and our results are summarized in this thesis. Of course, our background and interests in digital signal processing certainly influence our viewpoint.

1.2.1 From Fixed Transform to Adaptive Transform

The classical wavelet transforms (either discrete or continuous) are signal-independent, fixed transforms. It is clearly desirable to have transforms that adapt to the input signal for the given signal processing task. The pioneering work was done by Coifman with the introduction of the wavelet packet transform [9]. Roughly speaking, the wavelet packet transform provides a set of basis functions with different time-frequency/scale characteristics. Due to the properties of wavelets, the wavelet packet transform can be implemented with a computational complexity of O(N log N), where N is the length of the input signal. For many applications, one can construct the best orthonormal transform from the wavelet packet basis, using a fast binary tree searching algorithm.

One well-known disadvantage of the discrete wavelet transform (DWT) is its lack of shift invariance. The reason is that there are many legitimate DWTs for different shifted versions of the same signal. Looking at the problem from the positive side, we [38], independently of other researchers [52, 14], proposed an algorithm to find the best shifted version of the DWT for a given input signal. The algorithm has a structure similar to the best wavelet packet algorithm, and its computational complexity remains O(N log N). Going one step further, we have proposed an algorithm to find the best wavelet packet and the best shift jointly. Similar algorithms were independently proposed in [14, 8], and were shown to be advantageous as an unknown-transient detector.

Another well-known disadvantage of the discrete wavelet transform is that the transform does not change with time. A sudden change in the signal causes many large wavelet coefficients across several scales. Although those large coefficients may help us to detect the change, they cause much trouble for noise reduction (denoising) and compression. In [15, 20], the best time segmentations were searched, and a separate wavelet transform was used for each segment. Although the wavelet packet transform can adapt to the frequency/scale characteristics of the signal, it does not vary over time. Thus the wavelet packet transform cannot adapt to non-stationary signals. Since a majority of real-world signals are non-stationary, it is naturally desirable to have an algorithm that segments the signal in time and finds the best frequency/scale representation for each segment, as the normal wavelet packet transform does. However, this is clearly a challenge, since the number of choices is huge. The first attack on the problem is the so-called double tree algorithm [46], where only binary-tree-type time segmentations are allowed. Very recently, progressive approaches have been proposed [79, 72], where the input signal is first cut into a sequence of basic blocks, and then dynamic programming is used to find the best time segmentation and frequency adaptation.

We are also interested in the problem of finding the best time-varying wavelet packet transform, and we realized that the problem is approachable only using dynamic programming [39]. In this thesis, we describe a dynamic programming algorithm that solves the problem completely, removing all the unnecessary restrictions, such as the binary-tree-type time segmentations in [46] or the blocking in [79, 72]. We also take full advantage of the structure of the problem and the structure of the wavelet transform to minimize the complexity of our algorithm. Further combining our shift-invariant wavelet packet algorithm with our time-varying wavelet packet algorithm, we have also developed a fast algorithm that simultaneously finds the best time segmentation, the best wavelet packet, and the best shift. Our time-varying wavelet transforms are very efficient and powerful. We have used them for signal analysis, data compression, and denoising, and have shown that significant improvements can be achieved by the time-varying transforms.

1.2.2 From Deterministic Signal Analysis to Statistical Signal Processing

The wavelet transform was first introduced for the analysis of deterministic signals in function spaces. The applications in a stochastic setting, however, have only recently been considered [27, 48, 21], and are shown to be very promising.

One common task in statistical signal processing is signal detection, where one has to decide whether the signal of interest is present in the noisy observation or not. When the signal of interest is known, the optimal detector is the matched filter. However, when the signal is not known, e.g., an unknown transient, the detection problem is still open. In this thesis, we propose a novel wavelet-based transient detector, and show that it performs better than the traditional square-law detector. Our early work was reported in [41].

Another important task is called denoising or statistical estimation, where the observation is contaminated by noise, and we want to remove the noise and restore the signal. It has been shown that the wavelet thresholding method in [21] is asymptotically minimax near-optimal for signals from a wide variety of smooth function spaces, and that no other method has a similar kind of optimality [26]. We significantly improve the classical denoising method by using an undecimated, thus redundant and shift-invariant, wavelet transform [49]. A detailed analysis of our novel noise reduction method is presented in this thesis. Although similar observations have been made [25, 61, 10], no theoretical results have been previously reported.
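A toy sketch can illustrate why redundancy helps (this is our illustration, not the thesis algorithm; it uses only a one-level Haar transform and soft thresholding). Thresholding the coefficients of every circular shift of the signal, undoing each shift, and averaging the results removes the shift dependence of the decimated DWT; this averaging is closely related to thresholding in the undecimated transform domain and to the "cycle spinning" idea of Coifman and Donoho:

```python
import math

def haar_level1(x):
    # One level of the Haar DWT (signal length must be even):
    # pairwise scaled sums (approximation) and differences (detail).
    s = [(x[2*i] + x[2*i+1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i+1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
    return s, d

def haar_level1_inv(s, d):
    # Inverse of haar_level1: perfect reconstruction.
    x = []
    for a, b in zip(s, d):
        x.extend([(a + b) / math.sqrt(2.0), (a - b) / math.sqrt(2.0)])
    return x

def soft(coeffs, tau):
    # Soft thresholding: shrink each coefficient toward zero by tau.
    return [math.copysign(max(abs(c) - tau, 0.0), c) for c in coeffs]

def denoise_once(x, tau):
    # Classical (decimated) denoising: threshold only the detail coefficients.
    s, d = haar_level1(x)
    return haar_level1_inv(s, soft(d, tau))

def denoise_shift_avg(x, tau):
    # Shift-invariant denoising: denoise every circular shift of the
    # input, undo each shift, and average the n reconstructions.
    n = len(x)
    acc = [0.0] * n
    for k in range(n):
        shifted = x[k:] + x[:k]          # circular shift by k
        y = denoise_once(shifted, tau)
        for i in range(n):               # undo the shift and accumulate
            acc[(i + k) % n] += y[i] / n
    return acc

noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]
clean_est = denoise_shift_avg(noisy, tau=0.3)
```

By construction, the output of denoise_shift_avg commutes with circular shifts of the input, which denoise_once does not; the price is n transforms instead of one, which the undecimated transform of Chapter 5 reduces to O(N log N).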


51.2.3 From Theory to ApplicationAs the wavelet theory reaches certain maturity, the research activities increase in theapplication areas. The huge volumes of proceedings of the conferences devoted towavelet applications re ect this trend.One practical application that we have considered is the speckle reduction ofsynthetic-aperture radar (SAR) [44, 45]. Independent of other researcher [59], wehave applied the denoising method of Donoho and Johnstone, and our improvedmethod to the SAR problem. The resulting wavelet-based speckle reduction methodis shown to be superior to the traditional methods.1.3 Wavelet Applications { A General FrameworkWaveletTransform Coe�cientsProcessing InverseWaveletTransform- -Figure 1.1 Wavelet Applications { A General FrameworkMany wavelet based application algorithms share a structure similar to Figure 1.1.The �rst stage is some type of wavelet transform, and the second stage is the (of-ten nonlinear) processing of wavelet coe�cients, and if necessary, the inverse wavelettransform follows. For tasks such as data compression, signal analysis, noise reduc-tion, statistical estimation, and detection, the processing methods of the waveletcoe�cients are certainly di�erent. Also, the desired properties of the wavelet trans-forms are di�erent. For example, for the purpose of signal analysis, we might want awavelet transform that has better time-frequency localization, while for noise reduc-tion, we want the signal and noise are maximally separated in the wavelet domain. In

Page 21: Theory and Applications of the Shift-Invariant, Time-Varying and Undecimated Wavelet Transforms

6Table 1.1, we summarize the desirable properties of the transforms and the coe�cientprocessing techniques for many signal processing tasks.We emphasize that the desired property of wavelets can often be measured point-wise on the wavelet transform coe�cients, so the measures are often additive. Thismotivates us to construct abstract algorithms that can minimize any additive measure,then use an appropriate measure for the problem at hand. In this sense, we constructa general framework for wavelet based applications. Simple adjustments result ina powerful algorithm for any particular task. The construction and analysis of thegeneral framework is the ultimate goal of this thesis.Task Desirable transform Coe�cients processingsignal analysis time-frequency localization display and comparisondata compression lower rate-distortion curve quantization and codingnoise reduction signal-noise separation thresholdingstatistical estimationdetection signal concentration detection statisticsTable 1.1 Elements in the general frameworkfor di�erent signal processing tasks.1.4 Overview of the ThesisChapter 2 is the foundation of the whole thesis. To facilitate further construction, wepresent the multiresolution analysis from the continuous point of view. The connec-tion with the discrete wavelet basis is discussed afterward. The logical structure ofthe thesis is plotted in Figure 1.2. As we can see, there are two separated branchesafter Chapter 2.The �rst branch leads to Chapter 3, where we restrict ourselves to orthonormal(ON) wavelet transforms. In this chapter, we follow the best basis paradigm, andconstruct a sequence of algorithms with increasing power and complexity. These al-

Page 22: Theory and Applications of the Shift-Invariant, Time-Varying and Undecimated Wavelet Transforms

7Multiresolution AnalysisChap. 2?Best Basis AlgorithmsChap. 3??AnalysisSec. 4.2? ?DetectionSec. 4.6??DenoisingSec. 4.4??CompressionSec. 4.3? ?Denoising & CompressionSec. 4.5? ?ConclusionChap. 7?Undecimated DWTChap. 5?ConvolutionSec. 6.2? ?CompressionSec. 6.3? ?DenoisingSec. 6.4??Denoising & CompressionSec. 6.5??Figure 1.2 Overview of the Thesisgorithms minimize any additive measure over increasing sets of possible choices, whileretaining the orthonormal property of the resulting transform. To our knowledge, thelast two algorithms, where we seek the best time-varying wavelet packets transform,have not appeared in the literature.In Chapter 4, we study several applications such as signal analysis (Section 4.2),data compression (Section 4.3), denoising (Section 4.4), detection (Section 4.6), andjoint denoising and compression (Section 4.5). Some of our early results have beenreported in [41, 45, 44, 43].The second branch leads to Chapter 5, where we study the property of the un-decimated discrete wavelet transform (UDWT). Although it has been independentlydiscovered several times in the literature, the UDWT is not widely understood.Chapter 6 presents several novel applications of the the undecimated discretewavelet transform. In Section 6.2, we show, to our knowledge for the �rst time, thatwe can implement certain linear convolutions using the UDWT. We further show

that in some cases the UDWT-based implementation has unique advantages. Section 6.3 shows that the UDWT is robust against the quantization noise introduced in data compression. One of the main contributions of the thesis is in Section 6.4, where we use the UDWT to drastically improve the noise reduction performance, compared with the classical denoising method of Donoho and Johnstone [21]. The special combination of joint denoising and compression using the UDWT is discussed in Section 6.5. Some of our early work on the applications of the UDWT was reported in [42, 49, 43, 40].

Chapter 7, the last chapter of the thesis, contains the conclusion and discussions about possible future work.

Since this thesis is about wavelets, the style of the thesis follows the multiresolution structure of the wavelets. The thesis begins with an introduction and ends with a conclusion and future work. Each main chapter also begins with an introduction and ends with a conclusion and future work. The sections in Chapter 4 and Chapter 6 also follow the same structure. At one point, the maximum scale of the thesis reaches five.

1.5 Acronyms

In this thesis, we generalize the classical discrete wavelet transform, and construct many useful variations. For conciseness, we denote them as:

    DWT        Discrete Wavelet Transform
    DWPT       Discrete Wavelet Packets Transform
    BSWT       Best Shift Wavelet Transform
    BWPT       Best Wavelet Packet Transform
    BSWPT      Best Shift Wavelet Packet Transform
    TVBWPT     Time-Varying Best Wavelet Packet Transform
    TVBSWPT    Time-Varying Best Shift Wavelet Packet Transform

    UDWT       Undecimated Discrete Wavelet Transform
    UDWPT      Undecimated Discrete Wavelet Packet Transform

The fast searching algorithms that find the best transforms for the input signal are called:

    BSWA       Best Shift Wavelet Algorithm
    BWPA       Best Wavelet Packet Algorithm
    BSWPA      Best Shift Wavelet Packet Algorithm
    TVBWPA     Time-Varying Best Wavelet Packet Algorithm
    TVBSWPA    Time-Varying Best Shift Wavelet Packet Algorithm

Other acronyms that we use in this thesis are summarized below. Most of them are standard in the wavelet and signal processing communities.

    ACF    Autocorrelation Function (Sequence)
    FIR    Finite Impulse Response
    FFT    Fast Fourier Transform
    MLE    Maximum Likelihood Estimator
    MRA    Multiresolution Analysis
    ON     Orthonormal
    PDF    Probability Density Function
    PWF    Polarimetric Whitening Filter
    QMF    Quadrature Mirror Filter
    SAR    Synthetic Aperture Radar
    SNR    Signal-to-Noise Ratio

1.6 Notation

In this thesis, we encounter many mathematical entities, e.g., vectors, matrices, and various operators. We also need many performance measures to compare various

algorithms. It is our effort to keep the notation consistent. Here we summarize the notation that is used globally in this thesis.

    A, B, ..., Z  (bold uppercase)   Matrices
    a, b, ..., z  (bold lowercase)   Vectors
    a, b, ..., z  (lowercase)        Scalar variables
    A, B, ..., Z  (uppercase)        Scalar constants
    *                    Convolution
    ⊗                    Kronecker tensor product
    ⊕                    Direct sum of spaces
    φ(x)                 Continuous scaling function
    ψ(x)                 Continuous wavelet function
    T_τ                  Continuous translation operator
    R_τ                  Continuous repetition operator
    D_j                  Continuous dilation operator
    s_j                  Discrete scaling sequence at the jth level
    w_j                  Discrete wavelet sequence at the jth level
    a_j                  Autocorrelation sequence of w_j
    D_L                  Discrete down-sampling operator
    U_L                  Discrete up-sampling operator
    S_i                  Discrete shift operator
    R_estimate           Risk of the estimate
    L(h)                 Length of the vector h
    C(x)                 The cost measure on vector x
    B(x)                 The best cost measure on vector x
    M_operation(...)     The number of multiplications of the numerical operation
    A_operation(...)     The number of additions of the numerical operation
    P_algorithm(...)     The power of the algorithm
    Eval_algorithm(...)  The number of cost evaluations of the algorithm

    Mem_algorithm(...)   The computer memory requirement of the algorithm
    Comp_algorithm(...)  The number of comparisons needed by the algorithm

Chapter 2

Multiresolution Analysis

2.1 Introduction

The idea of multiresolution analysis is not new; it has already been used in many areas. In computer vision, images are successively approximated starting from a coarse version and going to a fine resolution. The images at different resolutions form a pyramid. Many computer vision problems, e.g., target detection, motion estimation, and object recognition, can be efficiently solved on the pyramid. The pyramid coding scheme of Burt and Adelson [6] has the same flavor. In computer graphics, the successive refinement method that generates finer and finer approximations of surfaces and curves is also a multiresolution approach. The multigrid methods for the solution of partial differential equations are also closely related.

The multiresolution framework for wavelet expansions was formulated by Mallat [54] and Meyer [57]. It unified many previous ideas, and soon became a powerful concept for analyzing, constructing, and interpreting wavelet transforms.

In this chapter, we first present the original multiresolution analysis of Mallat and Meyer. The wavelets are building blocks for the multiresolution framework. Since the nesting spaces of the original multiresolution analysis are spanned by continuous wavelets, we call it the continuous multiresolution analysis. A fast algorithm for the wavelet transform is introduced and its computational complexity is analyzed.

The discrete wavelet transform also generates a discrete multiresolution analysis, where the building blocks are discrete sequences instead of continuous functions. In

Section 2.4, we study the discrete multiresolution analysis and its relation with the continuous counterpart.

This chapter serves as a foundation for the thesis, so many basic definitions and notations in wavelet theory and digital signal processing are introduced.

2.2 Continuous Multiresolution Analysis

2.2.1 Basic Definitions in Multiresolution Analysis

Set of Functions

We first define a set of functions on the real line as

    Ψ_I = { ψ_i(t) | i ∈ I }    (2.1)

where I is the index set, and t ∈ R. Usually, I is finite or countable.

Space Spanned by a Set of Functions

The space spanned by a set of functions Ψ_I is denoted as

    V_I = Span{Ψ_I}    (2.2)

and

    f(t) ∈ V_I  ⟹  f(t) = Σ_{i ∈ I} a_i ψ_i(t)    (2.3)

where the a_i ∈ R are called the expansion coefficients. Since the functions in Ψ_I are not necessarily orthogonal to each other, the a_i's are not necessarily unique, so we can have different sets of coefficients for one function. In practice, the procedure, method, or algorithm used to calculate the coefficients will fix the a_i's. It is also possible to choose a set of a_i's based on some optimality criterion defined on the space in which the a_i's reside. The problems then become 1) choosing which criterion to use, and 2) finding an efficient algorithm to search for the optimal coefficients.

2.2.2 Basic Operations in Multiresolution Analysis

Translation

Let ψ(t) be a function on the real line. The translation of ψ(t) by the amount τ is defined as

    T_τ ψ = ψ(t − τ)    (2.4)

The translation of a set of functions is defined as the set of translations of all the functions in the set:

    T_τ Ψ_I = { ψ_i(t − τ) | i ∈ I }    (2.5)

Repetition

We then proceed to define a family of countably many functions constructed by repeated translations of a single function as follows:

    R_τ ψ = { ψ_k | ψ_k = T_{kτ} ψ = ψ(t − kτ), k ∈ Z }    (2.6)

where τ is the step size between the translations. The index k can also range over other sets, e.g., a finite set of integers.

We can similarly define the repetition of a set of functions:

    R_τ Ψ_I = { Ψ_{k,I} | Ψ_{k,I} = T_{kτ} Ψ_I, k ∈ Z }
            = { ψ_{k,i} | ψ_{k,i} = T_{kτ} ψ_i = ψ_i(t − kτ), k ∈ Z, i ∈ I }    (2.7)

Dilation

The third operation is dilation:

    D_j ψ(t) = 2^{−j/2} ψ(2^{−j} t),  j ∈ Z    (2.8)

This operation is often called the scaling operation. It is also possible to define the scaling operation for powers other than 2, and for non-integer scales.

The dilation operator can also be applied to a set of functions:

    D_j Ψ_I = { D_j ψ_i | i ∈ I }
            = { 2^{−j/2} ψ_i(2^{−j} t) | i ∈ I }    (2.9)

and across a set of scales:

    D_J Ψ_I = { D_j ψ_i | i ∈ I, j ∈ J }
            = { 2^{−j/2} ψ_i(2^{−j} t) | i ∈ I, j ∈ J }    (2.10)

Relations among the operators

It can easily be checked that the following relations hold:

    R_τ T_σ = T_σ R_τ    (2.11)
    T_τ D_j = D_j T_{τ/2^j}    (2.12)
    R_τ D_j = D_j R_{τ/2^j}    (2.13)

2.2.3 Multiresolution for L²(R)

Using the above definitions, Mallat [54] and Meyer [57] defined a multiresolution analysis of L²(R) as a sequence of closed subspaces V_j of L²(R), j ∈ Z, with the following properties:

1. V_j ⊂ V_{j+1},
2. v(x) ∈ V_j ⟺ D_1 v(x) ∈ V_{j+1},
3. v(x) ∈ V_0 ⟺ T_1 v(x) ∈ V_0,
4. the union of the V_j over all j ∈ Z is dense in L²(R), and the intersection of the V_j over all j ∈ Z is {0},
5. a scaling function φ ∈ V_0, with a non-vanishing integral, exists such that R_1 φ is a Riesz basis of V_0.

Since V_{−1} ⊂ V_0, there exists a sequence h = (h_k) ∈ ℓ², such that the scaling function satisfies

    φ(x) = 2 Σ_k h_k φ(2x − k)    (2.14)

It is also clearly true that D_j R_1 φ is a Riesz basis of V_j. So we have

    ⋯ ⊂ V_{−2} ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ ⋯    (2.15)

or

    ⋯ ⊂ D_{−2}R_1φ ⊂ D_{−1}R_1φ ⊂ R_1φ ⊂ D_1R_1φ ⊂ D_2R_1φ ⊂ ⋯    (2.16)

2.2.4 Nesting Spaces for the Continuous Multiresolution Analysis

The orthogonal complement of V_j in V_{j−1} is called W_j, i.e.,

    V_{j−1} = V_j ⊕ W_j    (2.17)

So

    ⊕_j W_j = L²(R).    (2.18)

If there exists a function ψ such that R_1 ψ is a Riesz basis of W_0, then ψ is a wavelet. Then D_j R_1 ψ is a Riesz basis of W_j, and

    D_{j−1} R_1 φ = D_j R_1 φ ⊕ D_j R_1 ψ.    (2.19)

So

    ⊕_j D_j R_1 ψ = L²(R).    (2.20)

Since ψ is an element of V_{−1}, there exists a sequence g = (g_k) ∈ ℓ², such that

    ψ(x) = 2 Σ_k g_k φ(2x − k)    (2.21)

The nesting spaces spanned by the wavelets and scaling functions are shown in Figure 2.1. The functions that span the spaces are shown in Figure 2.2. If we represent

the time-frequency content of these functions by rectangular boxes in the time-frequency plane, we have Figure 2.3. Clearly, for high frequencies the time resolution is higher but the frequency resolution is poor, while for low frequencies the frequency resolution is higher but the time resolution is poor. The time-frequency product is a constant, as indicated by the uncertainty principle. Also, the width of each frequency band is proportional to the center frequency of the band. Such a system is called a constant-Q system, and has been studied earlier in circuit theory.

Figure 2.1  The nesting spaces for the continuous multiresolution analysis.

Figure 2.2  Functions that span the nesting spaces.
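The two-scale relations (2.14) and (2.21) can be iterated numerically to approximate the scaling function on a dyadic grid. Below is a minimal sketch of the cascade iteration, assuming NumPy and the Daubechies length-4 coefficients as an illustrative choice of h (not a filter used anywhere in this thesis specifically); the recursion repeatedly up-samples the accumulated sequence and convolves with h:

```python
import numpy as np

# Daubechies-4 scaling coefficients (assumed example), normalized so sum(h) = sqrt(2)
h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

def upsample(x, L):
    """U_L: insert L-1 zeros between consecutive samples."""
    z = np.zeros(L * (len(x) - 1) + 1)
    z[::L] = x
    return z

def cascade(h, levels):
    """Approximate phi(k / 2^levels) by iterating the two-scale relation."""
    s = h.copy()
    for _ in range(levels - 1):
        s = np.convolve(upsample(s, 2), h)   # product filter H(z) H(z^2) H(z^4) ...
    return 2 ** (levels / 2) * s             # rescale filter taps to function values

phi = cascade(h, 8)
print(len(phi))           # 766 samples, covering the support of phi
print(phi.sum() / 2**8)   # Riemann sum of phi, close to 1
```

The rescaling by 2^{levels/2} converts the unit-energy filter taps into function samples whose Riemann sum approximates the unit integral of φ.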

Figure 2.3  The time-frequency/scale plot for functions that span the nesting spaces.

2.2.5 Orthogonal Multiresolution Analysis

If we further require that all the functions that span the nesting spaces of the multiresolution analysis are orthogonal to each other, then the resulting system is called an orthogonal multiresolution analysis. As we will see later, the orthogonal multiresolution analysis has many advantages, so we mainly consider the orthogonal multiresolution analysis in this thesis.

First, if φ(x) and ψ(x) are orthogonal, we can show that

    g_k = (−1)^k h_{1−k},    (2.22)

so h and g are related by time reversal and flipping the signs of every other element.

Second, φ(x) is orthogonal to integer translations of itself. Using Eqn. (2.14), we can show

    Σ_k h_k h_{k−2n} = δ(n).    (2.23)

If we interpret h as the coefficients of a finite impulse response (FIR) filter, then h is the so-called quadrature mirror filter (QMF) [71].
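The double-shift orthonormality condition (2.23) is easy to verify numerically for any candidate h. A small sketch, assuming NumPy and using the Daubechies length-4 coefficients as an assumed example:

```python
import numpy as np

# Daubechies-4 scaling filter (assumed example)
h4 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

def double_shift_orthonormal(h):
    """Check Eqn. (2.23): sum_k h_k h_{k-2n} = delta(n)."""
    a = np.correlate(h, h, mode="full")   # autocorrelation a_i = sum_k h_k h_{k-i}
    center = len(h) - 1                   # index of a_0
    ok = np.isclose(a[center], 1.0)       # n = 0 term equals one
    ok &= np.allclose(a[center + 2::2], 0.0)  # all other even shifts vanish
    return bool(ok)

print(double_shift_orthonormal(h4))   # True
```

By the symmetry of the autocorrelation sequence, checking the even lags on one side suffices.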

2.3 Discrete Wavelet Transform

2.3.1 Fast Algorithm

The multiresolution analysis is a systematic approach to analyzing signals at different resolutions. In order to analyze signals, we need to calculate the expansion coefficients of the signal, which are defined as

    c_j(k) = ⟨f(t), D_j T_k φ⟩,    (2.24)
    d_j(k) = ⟨f(t), D_j T_k ψ⟩.    (2.25)

They are called the scaling coefficients and the wavelet coefficients, respectively.

Instead of calculating them directly, we exploit the dependency of the coefficients across scales. It can be shown that

    c_j(k) = Σ_m h_{m−2k} c_{j−1}(m),    (2.26)
    d_j(k) = Σ_m g_{m−2k} c_{j−1}(m).    (2.27)

The equivalent filtering scheme is shown in Figure 2.4. Therefore, if we know the scaling coefficients at the fine scale, we can get all the scaling and wavelet coefficients at coarser scales by iterating the building block of Figure 2.4, which gives the structure in Figure 2.5. This is called the discrete wavelet transform (DWT).

Figure 2.4  Building block for the discrete wavelet transform: the input c_{j−1} is filtered by the highpass H and the lowpass L, each followed by downsampling by 2, producing d_j and c_j.
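Equations (2.26)-(2.27) amount to correlating with the filters and keeping every other output. A minimal sketch of one analysis step, assuming NumPy, periodic extension at the boundaries, and the Daubechies length-4 pair as an assumed example (the highpass used here is g_k = (−1)^k h_{M−1−k}, an even shift of Eqn. (2.22), which preserves orthogonality):

```python
import numpy as np

h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = np.array([(-1) ** k for k in range(4)]) * h[::-1]   # mirror highpass

def dwt_step(c, h, g):
    """One level of Eqns. (2.26)-(2.27) with periodic extension."""
    N = len(c)
    low = np.zeros(N // 2)
    high = np.zeros(N // 2)
    for k in range(N // 2):
        for m in range(len(h)):
            # c_j(k) = sum_m h_{m-2k} c_{j-1}(m), substituting m -> m + 2k
            low[k] += h[m] * c[(m + 2 * k) % N]
            high[k] += g[m] * c[(m + 2 * k) % N]
    return low, high

x = np.ones(8)                       # constant (pure DC) input
low, high = dwt_step(x, h, g)
print(np.allclose(low, np.sqrt(2)))  # True: lowpass captures the DC term
print(np.allclose(high, 0.0))        # True: highpass rejects it
```

Because the pair is orthonormal, the step also preserves energy: the squared norms of `low` and `high` sum to that of the input.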

Figure 2.5  Diagram for the three-level discrete wavelet transform.

On the other hand, if we know the scaling and wavelet coefficients at the coarser scale, we can get the finer-scale scaling coefficients by

    c_{j−1}(k) = Σ_m h_{k−2m} c_j(m) + Σ_m g_{k−2m} d_j(m).    (2.28)

The equivalent filtering scheme is shown in Figure 2.6. So, if we know the wavelet coefficients at all coarser scales, and the scaling coefficients at the coarsest scale, we can get the scaling coefficients at the finest scale by iterating the building block of Figure 2.6, which gives the structure in Figure 2.7. This is called the inverse discrete wavelet transform (IDWT).

Figure 2.6  Building block for the inverse discrete wavelet transform: d_j and c_j are upsampled by 2, filtered by H and L respectively, and summed to produce c_{j−1}.
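The synthesis step (2.28) is the transpose of the analysis step: upsample, filter, and sum. A self-contained sketch of the round trip, again assuming NumPy, periodic extension, and the Daubechies length-4 pair as an assumed example; perfect reconstruction follows from the orthonormality conditions:

```python
import numpy as np

h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = np.array([(-1) ** k for k in range(4)]) * h[::-1]   # mirror highpass

def analyze(c, h, g):
    """Eqns. (2.26)-(2.27): correlate and downsample, periodic extension."""
    N = len(c)
    idx = (np.arange(len(h)) + 2 * np.arange(N // 2)[:, None]) % N
    return c[idx] @ h, c[idx] @ g

def synthesize(low, high, h, g):
    """Eqn. (2.28): upsample by 2, filter, and sum, periodic extension."""
    N = 2 * len(low)
    c = np.zeros(N)
    for m in range(len(low)):
        for k in range(len(h)):
            # each coarse sample m contributes h_{k-2m} and g_{k-2m} at position k
            c[(k + 2 * m) % N] += h[k] * low[m] + g[k] * high[m]
    return c

x = np.sin(0.3 * np.arange(16))
lo, hi = analyze(x, h, g)
print(np.allclose(synthesize(lo, hi, h, g), x))   # True: perfect reconstruction
```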

Figure 2.7  Diagram for the three-level inverse discrete wavelet transform.

2.3.2 Computational Complexity of the DWT

Let us assume the length of the input signal is N, the length of the quadrature mirror filter (QMF) is M, and the number of levels of decomposition is L. For a finite-length signal, we need to treat the boundaries correctly. There are many methods [7, 32, 46]; the simplest is to use periodic extension of the signal over the boundaries.

The basic step of the DWT is the convolution of the input signal with the QMF, and the efficient implementation has the lattice structure [76]. The numbers of multiplications and additions needed to convolve the signal with both the highpass and the lowpass QMF are¹

    M_QMF(N, M) = MN,    (2.29)
    A_QMF(N, M) = MN.    (2.30)

Throughout this thesis, we use M to denote the number of real multiplications and A for the number of real additions; various subscripts are used for different algorithms. Due to the lattice structure, the complexity in (2.29) and (2.30) is nearly half of what is normally required for straightforward convolutions. If downsampling is performed after the convolution, the above complexities are further cut in half.

¹First-order approximation.

For the orthonormal (ON) DWT,

    M_DWT(N, M, L) = MN (1 − 1/2^L),    (2.31)
    A_DWT(N, M, L) = MN (1 − 1/2^L).    (2.32)

When L is sufficiently large, they converge to MN, which is independent of the number of levels of the decomposition. This is due to the successive downsamplings that are carried out at each level.

Since the discrete wavelet transform is nothing but successive filtering, the computer memory requirement is ML. If all the input data are in memory, the DWT can be implemented in place, resulting in a memory requirement of

    Mem_DWT(N, M, L) = N.    (2.33)

Throughout this thesis, we only consider QMFs of short length, so we use the lattice structure to implement them. For other types of wavelets, especially longer ones, fast Fourier transform based algorithms are more efficient [78].

2.4 Multiresolution for the Discrete Wavelet Transform

2.4.1 Basic Definitions in Multirate Digital Signal Processing

In the area of digital signal processing (DSP), we deal only with sequences of real numbers. The fundamental operation in DSP is filtering, or convolution. For an input sequence x = {x_0, x_1, x_2, ..., x_N} and a filter h = {h_0, h_1, h_2, ..., h_M}, the output of the filtering operation is

    y = x * h = {y_0, y_1, y_2, ..., y_L},    (2.34)

where

    y_i = Σ_{j=0}^{M} h_j x_{i−j}.    (2.35)

The autocorrelation sequence of the filter h, denoted by a = Ah, is defined as the convolution of h with the time-reversed version of h,

    a_i = (Ah)_i = Σ_{j=0}^{M} h_j h_{j−i},  i = −M+1, ..., 0, ..., M−1.    (2.36)

For multirate DSP, the down-sampling operator D_L and the up-sampling operator U_L are defined as

    y = D_L x  ⟺  y_i = (D_L x)_i = x_{Li}    (2.37)

and

    z = U_L x  ⟺  z_i = (U_L x)_i = { x_{i/L},  if i is an integer multiple of L
                                      0,        otherwise    (2.38)

where L is the rate-changing factor. So D_L keeps every Lth point of the input sequence, while U_L inserts L−1 zeros between consecutive points of the input sequence.

The frequency response of a digital filter h is defined as the discrete Fourier transform of the filter sequence,

    H(ω) = Fh = Σ_{i=0}^{M} h_i e^{−jωi},    (2.39)

where j = √−1. Clearly, H(ω) is a 2π-periodic complex-valued function.

It can easily be shown that the frequency response of the autocorrelation function a of the digital filter h is

    A(ω) = Fa = Σ_{i=−M+1}^{M−1} a_i e^{jωi} = |H(ω)|².    (2.40)
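The operators D_L and U_L in (2.37)-(2.38) map directly onto array slicing. A small sketch, assuming NumPy:

```python
import numpy as np

def downsample(x, L):
    """D_L: keep every Lth sample, y_i = x_{Li}."""
    return x[::L]

def upsample(x, L):
    """U_L: insert L-1 zeros between consecutive samples."""
    z = np.zeros(L * len(x))
    z[::L] = x
    return z

x = np.arange(1, 7)                       # [1 2 3 4 5 6]
print(downsample(x, 2))                   # [1 3 5]
print(upsample(np.array([1.0, 2.0]), 3))  # [1. 0. 0. 2. 0. 0.]
```

Note that D_L U_L is the identity, while U_L D_L is not; this asymmetry is precisely what makes the decimated DWT shift-variant.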

2.4.2 Discrete Wavelet Transform Filters

From the discrete point of view, we now summarize some of the previous results. In this section, many results are more precise, and are directly related to the actual implementations.

Inside the discrete wavelet transform (DWT), there are two length-M (M even) quadrature mirror filters (QMFs) h and g, and they are related by time reversal and flipping the signs of every other point, i.e.,

    g_i = (−1)^i h_{M−i+1},  i = 1, 2, ..., M.    (2.41)

In order to be a wavelet filter, the autocorrelation function a of h must satisfy the following conditions:

    a_0 = 1,    (2.42)
    a_{2i} = 0,  i ≠ 0.    (2.43)

Also, the sum of the h_i's should be √2, i.e.,

    Σ_i h_i = √2.    (2.44)

These conditions require h to be a lowpass filter. Furthermore, considering Eqn. (2.41), we can see that g is a highpass filter. This justifies the notation in Figures 2.4, 2.5, 2.6, and 2.7, where we used H for g and L for h.

So all the even-indexed points of a other than a_0 must be zero. Eqns. (2.42) and (2.43) are a set of M/2 quadratic equations, and Eqn. (2.44) is one linear equation. Luckily, the solutions to (2.42), (2.43), and (2.44) can be parameterized by M/2 − 1 free parameters θ = {θ_1, θ_2, ..., θ_{M/2−1}} [76]. However, h depends on θ in a very nonlinear fashion.

The ith scale/level scaling filter s_i and wavelet filter w_i are recursively defined by

    s_i = (U_{2^i} h) * s_{i−1},    (2.45)
    w_i = (U_{2^i} g) * s_{i−1},    (2.46)

i.e., up-sampling and convolution. We start from

    s_0 = h,    (2.47)
    w_0 = g.    (2.48)
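The recursion (2.45)-(2.48) is a few lines of code. A sketch, assuming NumPy and the Daubechies length-4 pair as an assumed example; the upsampling factor doubles at each level so that the equivalent-filter products telescope:

```python
import numpy as np

h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = np.array([(-1) ** k for k in range(4)]) * h[::-1]   # mirror highpass

def upsample(x, L):
    """U_L: insert L-1 zeros between consecutive samples."""
    z = np.zeros(L * (len(x) - 1) + 1)
    z[::L] = x
    return z

def scale_filters(h, g, levels):
    """s_i = (U_{2^i} h) * s_{i-1},  w_i = (U_{2^i} g) * s_{i-1}."""
    s = h
    filters = [(h, g)]                    # (s_0, w_0)
    for i in range(1, levels):
        w = np.convolve(upsample(g, 2 ** i), s)
        s = np.convolve(upsample(h, 2 ** i), s)
        filters.append((s, w))
    return filters

filters = scale_filters(h, g, 3)
print([len(s) for s, _ in filters])   # [4, 10, 22]: lengths grow roughly like 2^i M
```

Each equivalent scaling filter s_i keeps unit energy, since it is a column of an orthonormal multi-level transform.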

Parallel to the developments in Section 2.2.3, we can define a discrete multiresolution analysis (DMRA) for R^N, where the basis functions are discrete sequences. Instead of dilations of continuous functions, the basis sequences are related by up-sampling and convolution.

2.4.3 The Connection between Discrete and Continuous MRAs

The connections between the discrete and continuous MRAs are:

- If the starting sequences are the scaling coefficients for the continuous multiresolution analysis, then the discrete multiresolution analysis generates the same coefficients as the continuous multiresolution analysis does on the dyadic rationals.

- When the number of levels is large, the basis sequences of the discrete multiresolution analysis converge to the basis functions of the continuous multiresolution analysis.

2.5 Summary

In this chapter, we have introduced the multiresolution formulation of wavelets. The multiresolution analysis framework allows us to simultaneously have global and detailed knowledge of the signal. A fast algorithm that performs the multiresolution analysis was also introduced, and shown to be computationally very efficient.

Chapter 3

Best Basis Algorithms

3.1 Introduction

Wavelet transforms not only provide a powerful multiresolution analysis of signals; they also have built-in structures that allow us to construct a huge number of bases with different time/space and frequency/scale characteristics. Due to the inherent hierarchical structure of the wavelet transforms, fast algorithms that expand signals onto those bases, and efficient methods that find the best basis within the huge set, exist and are practical.

The introduction of the wavelet packets transform [9] and the best wavelet packet algorithm by Coifman opened a new area in the theory and applications of wavelets. The so-called best basis paradigm has proven to be a very powerful tool for many practical problems. We briefly present the original algorithm of Coifman in Section 3.2 and highlight the fundamental idea behind the algorithm.

One of the main ingredients of the wavelet transform is the downsampling at each scale. Although the downsampling reduces the output data rate and results in a compact representation, it also introduces one artifact: shift-variance. The wavelet transform of a signal and the wavelet transform of a shifted version of the same signal are drastically different. The lack of shift invariance is one well-known disadvantage of the discrete wavelet transform. We study this problem in Section 3.3, and show that we can follow the best basis paradigm and find the best shifted version of the DWT for the input signal. Going one step further, we present an algorithm in Section 3.4 that jointly finds the best wavelet packet and the best shift.

Since a majority of real-world signals are non-stationary, we would like to have a wavelet transform that varies with time. Although many time-varying wavelet transforms have been constructed [46, 32], the problem of jointly finding the best time segmentation and the best wavelet packet transform for each segment remained unsolved. Several attempts have been made [46, 79, 72], and all of them impose somewhat unrealistic restrictions. Section 3.5 shows that although the size of this problem grows exponentially, we can construct a solution with only polynomial complexity. All the unnecessary restrictions are removed, and the structure of the problem and the structure of the wavelet transform are fully exploited to minimize the complexity of our algorithm. Section 3.6 further extends our approach to jointly finding the best time segmentation, the best wavelet packet transform, and the best shift for each segment.

In the whole chapter, we follow the best basis paradigm, and introduce a set of algorithms with increasing power. For each algorithm, its power, computational complexity, and storage requirements are analyzed in detail.

Throughout this chapter, we consider only the additive cost measure, denoted as C(x), where x is a vector. The cost measure is additive if the cost of a concatenated vector equals the sum of the costs of the individual vectors, i.e.,

    C([x, y]) = C(x) + C(y).    (3.1)

For convenience, we denote the best cost of a vector as B(x). Also, throughout this chapter, N stands for the length of the input, L stands for the number of levels, and M is the length of the QMF.

3.2 Best Wavelet Packet Transform

3.2.1 Basic Idea

Instead of repeating the DWT filtering only for the lowpass band, as in Section 2.3, we can also split the highpass band. The resulting algorithm is called the (full) discrete

wavelet packets transform (DWPT), and it was first introduced by Coifman [9]. The filter bank structure of a three-level DWPT is shown in Figure 3.1.

Figure 3.1  Diagram for the three-level discrete wavelet packets transform.

Since the full discrete wavelet packets transform generates more output points (counting all the outputs of the intermediate filter banks) than inputs, it gives an over-complete representation. We can choose a set of basis vectors that forms an orthonormal (ON) basis, such that some cost measure on the transformed coefficients is minimized. Moreover, when the cost is additive, the best (ON) wavelet packet transform (BWPT) can be found in O(N log N) time.

3.2.2 Fast Searching Algorithm

The building block of the cost tree for the best wavelet packet algorithm is shown in Figure 3.2. The root represents a vector, the left leaf represents the output vector of the lowpass filtering and downsampling, and the right leaf represents the output of

highpass filtering and downsampling. The corresponding one-level wavelet transform is shown in Figure 2.4.

Figure 3.2  The building block of the cost tree for the best wavelet packet algorithm: a root node with children Low and High.

We now have two different representations: one is given by the vector on the root node, the other by the union of the vectors on the leaves. Since the wavelet transform is invertible, these two representations contain the same information, so we can choose the one that minimizes the cost. Under the assumption that the cost is additive, the optimal rule is simple:

    if C(root) < C(low) + C(high)
        choose root
        B(root) = C(root)
    else
        choose leaves
        B(root) = C(low) + C(high)
    end

The complete three-level cost tree for the best wavelet packet selection is shown in Figure 3.3. In order to find the best cost and the best tree shape of an input vector, we use the dynamic programming method [2]. Dynamic programming requires that any subpath of the optimal solution be optimal for the corresponding subproblem. In our problem, we want to find the optimal tree starting from the root. Suppose we have found the optimal tree from the root; then the dynamic programming theory tells us that for any node on the optimal tree, the segment of the original optimal tree which starts from this node is optimal for the problem of finding the optimal

tree starting from this node. So, in order to find the optimal tree that starts from the root, we start from the leaves, work down to the root, and solve all the subproblems of finding the optimal tree starting from each node in between. The pseudo code of the best wavelet packet algorithm (BWPA) is as follows:

    Step 0:
        take the complete wavelet packets transform as in Figure 3.1
    Step 1:
        for all the nodes in the tree
            calculate C(node)
        end
        for all the end nodes (leaves)
            B(leaf) = C(leaf)
        end
    Step 2:
        for the current level from the one next to the leaves down to the root
            for all the nodes on the current level
                if C(node) < B(low) + B(high)
                    choose not to split from this node
                    B(node) = C(node)
                else
                    choose to further split from this node
                    B(node) = B(low) + B(high)
                end
            end
        end
    Step 3:
        starting from the root, walk out the optimal tree.

In the pseudo code, low and high are the lowpass child and highpass child of the current node, respectively.

Some examples of the resulting best tree shapes are shown in Figure 3.4(a,c), and the corresponding time-frequency plots are shown in Figure 3.4(b,d). These plots demonstrate the frequency-adaptation power of the wavelet packets transform. The tree shape that corresponds to the DWT is shown in Figure 3.5.
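The bottom-up search can be written compactly as a recursion on the packet tree: each subtree's best cost is computed once, which is equivalent to the leaves-to-root pass above. A minimal sketch, assuming NumPy, the orthonormal analysis step of Section 2.3 with the Daubechies length-4 pair as an assumed example, periodic extension, and the l1 norm as one possible additive cost (any additive measure works):

```python
import numpy as np

h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = np.array([(-1) ** k for k in range(4)]) * h[::-1]   # mirror highpass

def split(c):
    """One orthonormal analysis step (periodic extension)."""
    N = len(c)
    idx = (np.arange(len(h)) + 2 * np.arange(N // 2)[:, None]) % N
    return c[idx] @ h, c[idx] @ g

def cost(x):
    return np.abs(x).sum()        # an assumed additive measure (l1 norm)

def best_basis(c, levels):
    """Return (best cost B, pruned tree) by dynamic programming on the cost tree."""
    if levels == 0 or len(c) < len(h):
        return cost(c), "leaf"
    low, high = split(c)
    b_low, t_low = best_basis(low, levels - 1)
    b_high, t_high = best_basis(high, levels - 1)
    if cost(c) < b_low + b_high:          # keeping the node is cheaper
        return cost(c), "leaf"
    return b_low + b_high, ("split", t_low, t_high)

x = np.sin(2 * np.pi * 7 * np.arange(64) / 64)   # narrowband test signal
B, tree = best_basis(x, 3)
print(B <= cost(x))   # True: the best cost never exceeds the root cost
```

The returned nested tuples record the pruned tree shape, from which the chosen orthonormal basis can be walked out exactly as in Step 3.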

Figure 3.3  The complete cost tree for the three-level best wavelet packet algorithm.

Figure 3.4  Examples of pruned wavelet packet trees (a, c) and the corresponding time-frequency plots (b, d).

Figure 3.5  The splitting tree that corresponds to the three-level classical discrete wavelet transform.

3.2.3 Power and Complexity

We measure the power of an algorithm by the number of possible combinations it searches. Due to the structure,

    P_BWPA(N, L, M) = { 1,                            if N is odd, or N < M
                        P_BWPA(N/2, L, M),            if N > 2^L M
                        (P_BWPA(N/2, L−1, M))² + 1,   otherwise    (3.2)

Although we cannot find an analytical formula for P_BWPA(N, L, M), we can calculate it numerically. Some of the numerical results are shown in Table 3.1 and in Figure 3.6; both indicate that the power grows exponentially with the length of the input.

Assuming the input data are in memory, and we want to keep all the complete wavelet packet coefficients in memory, the memory requirement for the DWPT is

    Mem_DWPT(N, L, M) = (L + 1)N.    (3.3)

And the numbers of multiplications and additions required for the DWPT are

    M_DWPT(N, L, M) = MLN,    (3.4)
    A_DWPT(N, L, M) = MLN.    (3.5)

Table 3.1  The power of the best wavelet packet algorithm, log2(P_BWPT(N, log2 N, M)).

    M\N     1    2    4    8   16   32   64  128  256  512  1024
     2      0    1    2    5    9   19   38   75  150  301   602
     4      0    0    1    2    5    9   19   38   75  150   301
     6      0    0    0    1    2    5    9   19   38   75   150
     8      0    0    0    1    2    5    9   19   38   75   150
    10      0    0    0    0    1    2    5    9   19   38    75

Figure 3.6  The power of the best wavelet packet algorithm, log2(P_BWPA(N, log2 N, M)), plotted against N and M.

However, if we know the best tree shape, the best wavelet packet transform is an ON transform, and can be implemented in O(N) time:

    M_BWPT(N, L, M) = MN,    (3.6)
    A_BWPT(N, L, M) = MN.    (3.7)

For the best wavelet packet algorithm, the complexity is related to the number of nodes in the tree, which depends on the number of levels we take. We can easily show

    Eval_BWPA(N, L, M) = (L + 1)N,    (3.8)
    Mem_BWPA(N, L, M) = 2^L − 1,    (3.9)
    A_BWPA(N, L, M) = (L + 1)N + 2^L − 1,    (3.10)
    Comp_BWPA(N, L, M) = 2^L − 1.    (3.11)

3.3 Best Shift Wavelet Transform

3.3.1 Basic Idea

Recall from Section 2.3 that the basic DWT building block has two downsampling blocks. We can take either the even- or the odd-indexed downsamples, and still be able to reconstruct the original signal. If we keep both the even and the odd parts, the result is the one-level undecimated discrete wavelet transform (UDWT). The filter bank structure of the one-level UDWT is shown in Figure 3.7.

If we iterate the building block on the lowpass sides of the filter bank, we get the UDWT. The two-level UDWT is shown in Figure 3.8.

The underlying functions of the undecimated wavelet transform are shown in Figure 3.9. Compared with Figure 2.2, the shapes of the functions are the same, but the distances between nearby functions remain unchanged across all the scales.

In the rest of the section, we concentrate on finding the orthonormal best shift wavelet transform (BSWT), and leave the detailed discussion of the UDWT to Chapter 5.

Figure 3.7  Two equivalent representations of the building block for the undecimated discrete wavelet transform: (a) filtering by H and L, each followed by a direct and a delayed (z⁻¹) downsampling by 2; (b) filtering by H and L, each followed by taking the even and the odd samples.

Figure 3.8  Diagram for the two-level undecimated discrete wavelet transform.


Figure 3.9 The underlying functions (W1, W2, V2) for the undecimated wavelet transform.

3.3.2 Fast Searching Algorithm

We are again in a situation where we have a choice of how to decompose the signal. The building block of the cost tree for the algorithm that finds the best shift wavelet transform (BSWT) is shown in Figure 3.10. The root represents the even and odd highpass outputs, the left leaf represents the outputs of lowpass filtering and even downsampling, and the right leaf represents the outputs of lowpass filtering and odd downsampling. The corresponding one-level undecimated wavelet transform is shown in Figure 3.7. The optimal rule for finding the best decomposition is as simple as

if C(even low) + C(even high) < C(odd low) + C(odd high)
    choose even downsamples
    B(root) = C(even low) + C(even high)
else
    choose odd downsamples
    B(root) = C(odd low) + C(odd high)
end

The complete three-level cost tree for the best shift wavelet transform selection is shown in Figure 3.11. Each node in the tree represents the even and odd highpass outputs, the left child of a node represents the outputs of lowpass filtering and even downsampling, and the right child represents the outputs of lowpass filtering and odd downsampling. The end nodes (leaves) are special in that they represent the downsampled lowpass outputs from the previous level.


Figure 3.10 The building block of the cost tree for the best shift wavelet transform algorithm.

In order to find the best cost and the best tree shape for an input vector, the dynamic programming method [2] is used again. The pseudocode of the best shift wavelet algorithm (BSWA) is

Step 0:
    take the undecimated wavelet transform as in Figure 3.8
Step 1:
    for all the nodes in the tree
        calculate C(even high) and C(odd high)
    end
    for all the end nodes (leaves)
        B(leaf) = C(leaf)
    end
Step 2:
    for the current level from the one next to the leaves to the root
        for all the nodes on the current level
            if B(even low) + C(even high) < B(odd low) + C(odd high)
                choose even downsamples
                B(node) = B(even low) + C(even high)
            else
                choose odd downsamples
                B(node) = B(odd low) + C(odd high)
            end
        end
    end
Step 3:
    starting from the root, walk out the optimal path.

Some examples of the trees that correspond to best shift wavelet transforms are shown in Figure 3.12. Clearly they are different from the trees for the best wavelet


Figure 3.11 The cost tree for the three-level best shift wavelet transform algorithm.

packet transform. The trees for the BSWT are various paths from the root to a leaf. Another example of the tree and the corresponding time-frequency plot is shown in Figure 3.13. Compared with Figure 2.3, the time-frequency plot has the same octave structure, but the relative locations of the boxes are changed.
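The Step 2 recursion of the BSWA can be sketched as a small dynamic program. The nested-dict tree representation below (highpass costs C_even/C_odd at each node, lowpass children under 'even'/'odd', and a bare number for a leaf cost) is our own illustration, not code from the thesis.

```python
def best_shift(node):
    """Return (best cost, list of even/odd choices) for a BSWA cost tree.

    A node is either a dict {'C_even', 'C_odd', 'even', 'odd'} — where
    'C_even'/'C_odd' are the highpass costs and 'even'/'odd' the lowpass
    children — or a bare number, the cost of a leaf (downsampled lowpass)."""
    if not isinstance(node, dict):        # leaf: downsampled lowpass cost
        return node, []
    b_even, path_even = best_shift(node['even'])
    b_odd, path_odd = best_shift(node['odd'])
    if b_even + node['C_even'] < b_odd + node['C_odd']:
        return b_even + node['C_even'], ['even'] + path_even
    return b_odd + node['C_odd'], ['odd'] + path_odd
```

The returned list of choices is exactly the "optimal path" walked out in Step 3.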

Figure 3.12 Examples of paths for some three-level best shift wavelet transforms ((a) and (b)).


Figure 3.13 Examples of pruned cost trees for the best shift wavelet algorithm ((a) and (b)).

3.3.3 Power and Complexity

The power of the best shift wavelet transform is easy to find, since the number of possible choices is the same as the number of end nodes (leaves) in the cost tree. So

P_BSWA(N, L, M) = 2^L, (3.12)

which is far less than the power of the best wavelet packet algorithm. However, as we will show in Chapter 4, there are situations where the best shift wavelet transform is the right tool.

Assuming the input data are in memory, and we want to keep all the complete undecimated wavelet coefficients in memory, the memory requirement for the UDWT is

Mem_UDWT(N, L, M) = (L + 1)N. (3.13)

The numbers of multiplications and additions required for the UDWT are

M_UDWT(N, L, M) = MLN, (3.14)
A_UDWT(N, L, M) = MLN. (3.15)

They are the same as for the complete wavelet packet transform.
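Equation (3.12) simply counts root-to-leaf paths: each of the L levels independently keeps either the even or the odd downsamples. A two-line illustration (the helper name is ours):

```python
from itertools import product

def shift_choices(L):
    """All candidate L-level best shift wavelet transforms: one even/odd
    choice per level, hence 2^L possibilities (Eq. 3.12)."""
    return list(product(('even', 'odd'), repeat=L))
```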


However, if we know the best path, the best shift wavelet transform is an ON transform, and can be implemented in O(N) time:

M_BSWT(N, L, M) = MN, (3.16)
A_BSWT(N, L, M) = MN. (3.17)

For the best shift wavelet algorithm, the complexity is related to the number of nodes in the tree, which depends on the number of levels we take. We can easily show

Eval_BSWA(N, L, M) = (L + 1)N, (3.18)
Mem_BSWA(N, L, M) = 2^L − 1, (3.19)
A_BSWA(N, L, M) = (L + 1)N + 2^L − 1, (3.20)
Comp_BSWA(N, L, M) = 2^L − 1. (3.21)

These are the same as for the best wavelet packet algorithm.

The best shift wavelet transform is shift-invariant in the sense that if we shift the signal, the minimum cost will remain the same. An extension to 2D is described in [52].

3.4 Best Shift Wavelet Packet Transform

3.4.1 Basic Idea

We can combine the ideas of the best wavelet packet transform and the best shift wavelet transform, and jointly find the best shift and the best wavelet packet. Similar ideas were independently proposed in [14, 8]; however, the algorithm in [14] is suboptimal compared with the algorithm we describe here.

The basic idea is to further split the highpass band and keep both the even and odd downsamples, i.e., iterate the building block in Figure 3.7 on all the output branches. The resulting two-level complete undecimated wavelet packet transform


is shown in Figure 3.14. Again, we concentrate on finding the orthonormal best shift and wavelet packet transform (BSWPT) in the rest of the section, and leave the detailed discussion of the undecimated wavelet packet transform to Section 5.3.

Figure 3.14 Diagram for the two-level undecimated wavelet packet transform.


3.4.2 Fast Searching Algorithm

The building block of the cost tree for the algorithm that finds the best shift wavelet packet transform (BSWPT) is shown in Figure 3.15. The root represents the input vector; the four leaves represent the even lowpass, even highpass, odd lowpass, and odd highpass outputs, respectively. The corresponding one-level undecimated wavelet packet transform is shown in Figure 3.7. The optimal rule for finding the best decomposition is

C(even) = C(even low) + C(even high)
C(odd) = C(odd low) + C(odd high)
if C(root) = min{C(even), C(odd), C(root)}
    choose not to split further
    B(root) = C(root)
elseif C(odd) = min{C(even), C(odd), C(root)}
    choose odd split
    B(root) = C(odd)
else
    choose even split
    B(root) = C(even)
end
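The decision rule above can be written as a small helper. The tie-breaking order (prefer no split, then the odd split) mirrors the order of the if/elseif tests; the function itself is a sketch of ours, not thesis code.

```python
def best_split(C_root, C_even_low, C_even_high, C_odd_low, C_odd_high):
    """Three-way decision at one BSWPT cost-tree node: keep the band as is,
    split with odd downsampling, or split with even downsampling."""
    C_even = C_even_low + C_even_high
    C_odd = C_odd_low + C_odd_high
    best = min(C_even, C_odd, C_root)
    if best == C_root:          # prefer not splitting on ties
        return 'no split', C_root
    if best == C_odd:           # then prefer the odd split
        return 'odd', C_odd
    return 'even', C_even
```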

Figure 3.15 The building block of the cost tree for the best shift wavelet packet transform algorithm.


The complete two-level cost tree for the best shift wavelet packet transform selection is shown in Figure 3.16. The algorithm that finds the BSWPT is

Step 0:
    take the undecimated wavelet packet transform as in Figure 3.14
Step 1:
    for all dark nodes in the tree
        calculate C(node)
    end
    for all the end nodes (leaves)
        B(leaf) = C(leaf)
    end
Step 2:
    for the current level from the one next to the leaves to the root
        for all the nodes on the current level
            if the node is not dark
                B(node) = B(low) + B(high)
            elseif C(node) = min{C(even), C(odd), C(node)}
                choose not to split further
                B(node) = C(node)
            elseif C(odd) = min{C(even), C(odd), C(node)}
                choose odd split
                B(node) = C(odd)
            else
                choose even split
                B(node) = C(even)
            end
        end
    end
Step 3:
    starting from the root, walk out the optimal path.

Some examples of the trees that correspond to best shift wavelet packet transforms are shown in Figure 3.17. More examples are shown in Figure 3.18, along with the corresponding time-frequency plots. We can see that the best shift wavelet packet transforms combine the frequency adaptation power of the


Figure 3.16 The complete cost tree for the two-level best shift wavelet packet transform algorithm.

wavelet packet transform and the shift-invariance property of the best shift wavelet transform.

Figure 3.17 Examples of trees corresponding to some two-level best shift wavelet packet transforms ((a) and (b)).


Figure 3.18 Examples of pruned shift wavelet packet trees and the corresponding time-frequency plots ((a)–(d)).


3.4.3 Power and Complexity

We can show

P_BSWPA(N, L, M) =
    1                                  if N is odd, or N < M;
    P_BSWPA(N/2, L, M)                 if N > 2^L M;
    (2 P_BSWPA(N/2, L − 1, M))^2 + 1   otherwise.  (3.22)

Although we cannot find an analytical formula for P_BSWPA(N, L, M), we can calculate it numerically. Some of the numerical results are shown in Table 3.2 and in Figure 3.19; both indicate that the power grows exponentially with the length of the input. Compared with Table 3.1, the exponents are nearly doubled.

Table 3.2 The power of the best shift wavelet packet algorithm, log2(P_BSWPT(N, log2 N, M)):

M \ N |  1   2   4   8   16   32   64   128   256   512   1024
    2 |  0   2   4   9   20   41   83   167   335   671
    4 |  0   0   2   4    9   20   41    83   167   335    671
    6 |  0   0   0   2    4    9   20    41    83   167    335
    8 |  0   0   0   2    4    9   20    41    83   167    335
   10 |  0   0   0   0    2    4    9    20    41    83    167

Assuming the input data are in memory, and we want to keep all the complete undecimated wavelet packet coefficients in memory, the memory requirement for the UDWPT is

Mem_UDWPT(N, L, M) = (2^{L+1} − 1)N. (3.23)

The numbers of multiplications and additions required for the UDWPT are

M_UDWPT(N, L, M) = M(2^{L+1} − 1)N, (3.24)
A_UDWPT(N, L, M) = M(2^{L+1} − 1)N. (3.25)
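The recursion in Eq. (3.22) can be evaluated with memoization. The sketch below is a direct transcription of the formula as printed, so the exact case boundaries (and hence any comparison with the table entries) should be checked against the original before relying on the resulting counts.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def power_bswpa(N, L, M):
    """Direct transcription of Eq. (3.22): the number of candidate
    transforms the BSWPA can select for a length-N input."""
    if N % 2 == 1 or N < M:
        return 1                          # no further splitting is possible
    if N > 2 ** L * M:
        return power_bswpa(N // 2, L, M)  # depth limited by L, not by N
    return (2 * power_bswpa(N // 2, L - 1, M)) ** 2 + 1
```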


Figure 3.19 The power of the best shift wavelet packet algorithm, log2(P_BSWPA(N, log2 N, M)).

However, if we know the best tree, the best shift wavelet packet transform is an ON transform, and can be implemented in O(N) time:

M_BSWPT(N, L, M) = MN, (3.26)
A_BSWPT(N, L, M) = MN. (3.27)

For the best shift wavelet packet algorithm, the complexity is related to the number of nodes in the tree, which depends on the number of levels we take. We can easily show

Eval_BSWPA(N, L, M) = (2^{L+1} − 1)N, (3.28)
Mem_BSWPA(N, L, M) = (1/3)(4^{L+1} − 1), (3.29)

since only dark nodes need memory. The numbers of additions and comparisons are

A_BSWPA(N, L, M) = (2^{L+1} − 1)N + (2/3)(4^L − 1), (3.30)
Comp_BSWPA(N, L, M) = (2/3)(4^L − 1). (3.31)


3.5 Time-Varying Best Wavelet Packet Transform

3.5.1 Introduction

Although the algorithms introduced in the previous sections are quite powerful, they have two shortcomings. First, the filter bank structures do not change with time. For the best wavelet packet transform, we can easily imagine signals whose frequency contents change with time, i.e., nonstationary signals. For the best shift wavelet transform, we might be in the situation that one shift is good for one part of the signal and another shift is good for another part. The searching algorithms have to compromise in these cases. The second drawback is rooted in the discrete wavelet transform: in order to take an L-level transform, the length of the input signal N must be divisible by 2^L. Therefore, if N is a prime number, we cannot use the DWT,² and the algorithms in the previous sections are not feasible either. This is evident from the spiky shapes of Figures 3.6 and 3.19.

To avoid these problems, we need to introduce time-varying wavelet systems. The idea is quite simple: we cut the input signal into several non-overlapping segments, and use different wavelet packet transforms on different segments. However, the task of finding the best time segmentation and the best wavelet packet transform on each segment is very hard, since the number of possible choices is huge. For example, for a length-N signal, the number of different ways of segmenting the signal is 2^{N−1}. Also, recall from Table 3.1 that for each segment, the number of possible wavelet packet transforms increases exponentially with the length of the segment.

Fortunately, if the cost function is additive, we can exploit the structure of the wavelet transform and construct a fast algorithm that finds the time-varying best wavelet packet transform (TVBWPT) efficiently. The dynamic programming idea is again heavily used in the searching algorithm, which we shall call the time-varying best wavelet packet algorithm (TVBWPA).

² We are aware of some tricks to fix this problem, but they require rather complicated bookkeeping.


3.5.2 Fast Searching Algorithm

The Main Idea

Let x be a length-N input signal whose ith element is x_i. We use x_{i:j} to denote the segment of x that starts at the ith element and ends at the jth element, i.e., x_{i:j} = [x_i, x_{i+1}, ..., x_{j−1}, x_j]. The cost of the best wavelet packet transform of x_{i:j} is denoted C(i, j), and can be found by the best wavelet packet algorithm in Section 3.2. For all the segments of x, we can form a triangular table of the C(i, j)'s. An example of this table for a length-8 signal is shown in Figure 3.20.

C(1,1)
C(1,2) C(2,2)
C(1,3) C(2,3) C(3,3)
C(1,4) C(2,4) C(3,4) C(4,4)
C(1,5) C(2,5) C(3,5) C(4,5) C(5,5)
C(1,6) C(2,6) C(3,6) C(4,6) C(5,6) C(6,6)
C(1,7) C(2,7) C(3,7) C(4,7) C(5,7) C(6,7) C(7,7)
C(1,8) C(2,8) C(3,8) C(4,8) C(5,8) C(6,8) C(7,8) C(8,8)

Figure 3.20 Example of a cost table for all the segments of a length-8 input.

Let c_1, c_2, ..., c_n be a set of segmentation points. The problem of finding the time-varying best wavelet packet transform can be formulated as

min_{n; 1 < c_1 < c_2 < ... < c_n < N} C(1, c_1) + C(c_1 + 1, c_2) + ... + C(c_n + 1, N). (3.32)
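The minimization (3.32) is solved in this section by dynamic programming over the position of the last segment. A sketch of that search, with the cost table C supplied as a plain dict (an assumption for illustration):

```python
def best_segmentation(N, C):
    """Fill the best-cost table B(1, i) left to right and backtrack.

    C[(i, j)] is the (1-indexed, inclusive) best wavelet packet cost of the
    segment x_{i:j}.  Returns the optimal cost and the list of segments."""
    B = [0.0] * (N + 1)       # B[i] plays the role of B(1, i); B[0] = 0
    last = [0] * (N + 1)      # end of the prefix before the last segment
    for i in range(1, N + 1):
        B[i], last[i] = min((B[j] + C[(j + 1, i)], j) for j in range(i))
    # Backtrack from B(1, N) to recover the segmentation points.
    segments, i = [], N
    while i > 0:
        segments.append((last[i] + 1, i))
        i = last[i]
    return B[N], segments[::-1]
```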


Notice that the number of optimal segments, n, is also unknown. Let B(i, j) denote the cost of the best time-varying wavelet packet transform of x_{i:j}, i.e.,

B(i, j) = min_{n; i < c_1 < c_2 < ... < c_n < j} C(i, c_1) + C(c_1 + 1, c_2) + ... + C(c_n + 1, j). (3.33)

Suppose we have found the best segmentation of x, with optimal cost B(1, N), and assume the last segment is x_{j:N}. Then the dynamic programming principle tells us that

B(1, N) = B(1, j − 1) + C(j, N); (3.34)

in other words, the segmentation of x_{1:j−1} in the optimal solution we have found for x is optimal for the subproblem of finding the minimal TVBWPT cost of x_{1:j−1}. This fact provides a constructive algorithm to find B(1, N):

B(1, 0) = 0
for i = 1 to N
    B(1, i) = min_{0 ≤ j < i} (B(1, j) + C(j + 1, i))
end

Therefore, we simply fill up a table as in Figure 3.21 from left to right. We also keep track of the optimal j for each i, i.e., the location of the last segmentation point of x_{1:i}. Then, when we find B(1, N), we can backtrack and find all the optimal segmentation points.

B(1,1) B(1,2) B(1,3) B(1,4) B(1,5) B(1,6) B(1,7) B(1,8)

Figure 3.21 Example of a time-varying best cost table for all the segments that start from 1. The input length is 8.

Some Key Observations to Speed Up the Algorithm

Naive ways of finding the best wavelet packet transforms for all the segments of x are prohibitive, since we know from Section 3.2 that we need O(N log N) operations to


find C(1, N) alone. The cost trees for all the segments form a forest, as in Figure 3.22. In order to find the C(i, j)'s, we need to gather the costs on all the nodes of all the trees, and prune all the trees in the forest.

Figure 3.22 The forest of cost trees for the time-varying wavelet packet algorithm.

After careful inspection, we realize that all the costs on the nodes can be calculated from previous results, and they can also be reused in future calculations. For example, C(1, 5) = Σ_{i=1}^{5} g(x_i), where g is the cost function, so C(2, 6) = C(1, 5) + g(x_6) − g(x_1). Thus, instead of 5 operations, we only need two operations. When the vector is long, this can mean great savings. By careful arrangement, we can find the costs on all the nodes of all the trees in the forest at an expense of two additions per node.

Another problem of the time-varying wavelet packet transform is that the transform coefficients of the segmented input are not the segmented transform coefficients of the unsegmented input. However, computing the wavelet packet transform of all the segments is computationally prohibitive. By inspecting the structure of the wavelet


transform, we realize that in order to get the wavelet coefficients of the segmented signal from the wavelet coefficients of the unsegmented signal, we only need to update at most M − 2 coefficients on each level. Since for practical applications the length of the wavelet filter M is a small number, the cost of updating the wavelet coefficients can be treated as a constant for each node in the forest.

The Algorithm

Step 0:
    take the complete undecimated wavelet packet transform as in Figure 3.14
Step 1:
    for l from 1 to N
        for all the length-l segmentations
            update the wavelet coefficients for the segmented signal
            update the cost tree for this segment
            prune the cost tree and find the optimal BWPT cost C(i, j)
            record the optimal tree shape in the table of C's
        end
    end
Step 2:
    B(1, 0) = 0
    for i = 1 to N
        B(1, i) = min_{0 ≤ j < i} (B(1, j) + C(j + 1, i))
    end
Step 3:
    starting from B(1, N), backtrack to find all the segmentation points.
    Look up the table of C(i, j) to find the optimal tree shape for each
    optimal segment.

3.5.3 Power and Complexity

The number of possible outcomes of the TVBWPA, i.e., the power of the TVBWPA, can be recursively computed as³

P_TVBWPA(N, L, M) = P_BWPA(N, L, M) + Σ_{i=1}^{N} P_TVBWPA(i, L, M) · P_BWPA(N − i, L, M). (3.35)

³ Except for the Haar wavelet, which does not require boundary treatment.


Interestingly enough, the algorithm to compute P_TVBWPA(N, L, M) is itself a dynamic programming algorithm. Although we cannot find an analytical formula for P_TVBWPA(N, L, M), we can calculate it numerically. Some of the numerical results are shown in Table 3.3 and in Figure 3.23; both indicate that the power grows exponentially with the length of the input. Compared with Tables 3.1 and 3.2, the exponents are again nearly doubled. Unlike Figures 3.6 and 3.19, Figure 3.23 is monotonically increasing and not sensitive to M.

Table 3.3 The power of the time-varying best wavelet packet algorithm, log2(P_TVBWPA(N, log2 N, M)):

M \ N |  1   2   4   8   16   32   64   128   256   512   1024
    4 |  0   1   3   7   16   33   67   135   270   541
    6 |  0   1   3   7   15   31   64   129   259   519
    8 |  0   1   3   7   15   31   63   127   256   513
   10 |  0   1   3   7   15   31   63   127   255   511   1024

In Step 0 of the algorithm, we need to take the undecimated wavelet packet transform, and the numbers of multiplications and additions required for the UDWPT are

M_UDWPT(N, L, M) = M(2^{L+1} − 1)N, (3.36)
A_UDWPT(N, L, M) = M(2^{L+1} − 1)N. (3.37)

As shown in the previous section, the computational complexity of the time-varying wavelet packet algorithm is directly related to the number of nodes in the cost forest. A careful study shows that the number of nodes is bounded by N^2 log2 N for a length-N signal. So the total complexity, including the undecimated wavelet packet transform, the forming of the cost forest, the pruning of all the trees in the cost forest, and the searching for the best segmentation, is on the order of N^2 log2 N.
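The recursion (3.35) is easy to evaluate bottom-up. In the sketch below, p_bwpa is a caller-supplied stand-in for P_BWPA, and the i = N term (which would involve a zero-length segment) is dropped, an assumption about the intended index range. With p_bwpa ≡ 1 the result reduces to the 2^(N−1) segmentation count quoted in Section 3.5.1, a useful sanity check.

```python
def power_tvbwpa(N, p_bwpa):
    """Bottom-up evaluation of Eq. (3.35).  P[n] counts the possible
    outcomes for a length-n input: either the whole input is one segment,
    or a last segment of length n - i follows an optimally segmented
    prefix of length i."""
    P = [0] * (N + 1)
    for n in range(1, N + 1):
        P[n] = p_bwpa(n) + sum(P[i] * p_bwpa(n - i) for i in range(1, n))
    return P[N]
```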


Figure 3.23 The power of the time-varying best wavelet packet algorithm, log2(P_TVBWPA(N, log2 N, M)).

3.5.4 Discussions

Shift Invariant?

The algorithm is not exactly shift-invariant, since we use the plain wavelet packet transform on each time segment. However, since the underlying functions are all shifted versions of the wavelet packet waveforms, the algorithm is not very sensitive to shifts.

Boundary Treatment

We need to calculate the wavelet transform of all the possible time segments of the signal. There are many ways to treat the boundaries [7, 32, 46]. Our idea of updating the coefficients and our result of constant operations per level hold in general. In practice, we can choose any method that is suitable to the problem and easy to compute.


Comparison with Previous Work

Many people have recognized the importance of finding the best time-domain adaptation, and progress has been made in the past few years. For denoising and compression, Donoho [20] and Deng et al. [15] proposed methods to find the segmentations. However, they only used the standard wavelet transform on each segment. In contrast, our algorithm jointly finds the best time segmentation and the best wavelet packet transform on each segment of the signal. In [46], a double-tree algorithm was proposed to attack the problem, but the method there is quite restrictive, since only binary-tree-type segmentations were allowed. Recently, block-based algorithms were proposed [79, 72], where the length of the segments can be any multiple of the block length. We do not have this restriction either. The last step of their algorithms, where the best time segmentations are found, is similar to the dynamic programming approach we use here. However, most of the computation is in the first part, where the complete wavelet packet transforms are computed for all the segments and the best transforms are searched for all the segments. No computationally efficient algorithm was proposed in [79, 72] for this first part. In our algorithm, we take advantage of the structure of the wavelet transform and the structure of the cost forest, and progressively update the wavelet transforms of the segments and the cost trees in the forest. Our algorithm is quite efficient, and the total complexity, including the undecimated wavelet packet transform, the forming of the cost forest, the pruning of all the trees in the cost forest, and the searching for the best segmentation, is on the order of N^2 log2 N.

Further Improvements

The complexity of our algorithm can be further improved. If we have some knowledge of the longest elements in the signal, we can reduce the level of decomposition, i.e., reduce the maximum height of the forest. Also, we can restrict the maximum length of the segments.
Instead of having a triangular forest, we can have a band-triangular


forest. As pointed out in [72], the dynamic programming algorithm here is related to the dynamic programming algorithm for Viterbi decoding [28] in communications. Many well-developed techniques that improve the performance of Viterbi decoding can be modified and used here.

3.6 Time-Varying Best Shift Wavelet Packet Transform

In the previous section, we introduced the time-varying best wavelet packet transform. One possible drawback is that the algorithm is not necessarily shift-invariant. We can fix this problem by using the best shift wavelet packet transform algorithm on all the possible segments. All the other steps of the time-varying best wavelet packet algorithm carry over. Instead of building wavelet packet cost trees in the forest, we build the best shift wavelet packet cost trees (e.g., Figure 3.16) in the forest. We can also show that, using a similar updating strategy, the amount of work needed is constant for each node. However, the number of nodes in the tree is O(N^3) for a length-N signal. So we gain the shift invariance at the expense of more computation.

3.7 Discussion and Future Work

In this chapter, we have developed a set of powerful and efficient algorithms. The fundamental ideas are the same: first, we expand the signal onto a large library of functions; then we find a set of functions from the library and use them as the basis of an orthonormal transform. Any additive cost measure on the transformed coefficients can be minimized. The multiresolution structure allows us to have fast algorithms for both the calculation of the expansion coefficients and the search for the best basis.

Of course, there are several limitations of the best basis paradigm. The additive cost function is a necessary requirement for dynamic programming. As we will see in the next chapter, many problems result in additive cost measures, or can be solved


using a sequence of additive costs. Although the best basis approach has been used for non-additive cost measures [74], optimality cannot be assured.

Like any dynamic programming scheme, all the best basis algorithms give us one solution that has the minimum cost. They will not generate all the solutions that share that minimum cost. If we force the algorithms to find all the solutions, the complexities of the algorithms might grow exponentially.

Higher dimensions are clearly challenging, since there are many ways to segment high-dimensional objects. Progress needs to be made to extend these algorithms to higher dimensions.

In this chapter, we only consider the structure of the wavelet systems. It is also meaningful and possible to design the wavelets themselves in order to further improve the performance [36, 75, 50]. There are many types of wavelet systems, e.g., M-band wavelets [73], cosine-modulated wavelets [35], and others [31]. Generalizing the best basis idea to other types of wavelets is also possible.


Chapter 4

Applications of Orthonormal Wavelet Transform

4.1 Introduction

In the previous chapter, we introduced a set of algorithms that can find the best orthonormal wavelet transform for any given signal and any additive cost function. In this chapter, we study several applications of these orthonormal wavelet transforms, and develop suitable cost measures for each application.

4.2 Signal Analysis

Figure 4.1 The diagram of wavelet-based signal analysis (y → W → Y).

In the ground-breaking paper of Coifman and Wickerhauser [9], the best wavelet packet algorithm was proposed for signal analysis. They find the best wavelet packet transform by minimizing the entropy of the transformed coefficients. The characteristics of the signal can then be deduced from the structure of the transform and the coefficients of the transformed signal.

Let y be the signal and W the wavelet transformation matrix. The wavelet transform of y is Y = Wy. Now scale Y to a unit vector u in the l2 norm, with u_i = Y_i / ||Y||_2, so that the transformed vector has unit energy, ||u||_2 = 1. Since the


transform is orthogonal, normalizing the input vector y has the same effect. The entropy used in [9] is defined by

H(u) = −Σ_{i: u_i ≠ 0} u_i² ln u_i². (4.1)

Other often-used measures are the log energy,

E(u) = Σ_{i: u_i ≠ 0} ln u_i², (4.2)

and the lp norm,

l_p(u) = (Σ_i |u_i|^p)^{1/p}. (4.3)

We can use these cost measures with the time-varying wavelet transforms that we developed in the previous chapter. The results are powerful tools for nonstationary signal analysis.

4.3 Data Compression

4.3.1 Data Compression in the Wavelet Domain

Figure 4.2 The diagram of wavelet-based data compression (y → W → Y → Q → Ŷ → W^{−1} → ŷ).

To minimize storage space or transmission bandwidth, we need to represent signals with as few bits as possible. This is the data compression problem, and it has been studied for several decades. Many methods exist in the literature, e.g., [47, 29]. The key point is how to trade bits for accuracy. It is rather surprising that wavelets bring new answers to this old problem, especially for image compression [5].
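The cost measures of Eqs. (4.1)–(4.3) are straightforward to compute. A sketch, assuming the 0 · log 0 = 0 convention and a sign convention under which concentrated coefficient vectors get a low entropy cost:

```python
import math

def entropy_cost(u):
    """Eq. (4.1): -sum of u_i^2 ln u_i^2 over nonzero u_i (0*log 0 = 0).
    `u` is assumed already normalized to unit l2 norm."""
    return -sum(x * x * math.log(x * x) for x in u if x != 0.0)

def log_energy(u):
    """Eq. (4.2): sum of ln u_i^2 over nonzero u_i."""
    return sum(math.log(x * x) for x in u if x != 0.0)

def lp_norm(u, p):
    """Eq. (4.3): the lp norm of the coefficient vector."""
    return sum(abs(x) ** p for x in u) ** (1.0 / p)
```

A spike vector costs less than a spread-out one under the entropy measure, which is exactly why minimizing it favors concentrated representations.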


The straightforward and effective method of wavelet-based data compression has three stages: 1) take the wavelet transform of the data; 2) quantize the coefficients using a uniform scalar quantizer, as shown in Figure 4.18(a); 3) entropy encode the output symbols of the quantizer. To reconstruct the signal: 1) entropy decode; 2) convert the symbols to the corresponding values as indicated by the quantizer; 3) inverse wavelet transform. This is the wavelet-based data compression model we consider in this section. There are many variations in practice, but all of these wavelet-based compression schemes have the same structure.

In short, the scheme we consider here is

ŷ = W^{−1}Ŷ = W^{−1}Q(Y) = W^{−1}Q(Wy), (4.4)

where W is the wavelet transformation matrix and Q is a pointwise quantizer. The quantization stage is the only stage that introduces error, which is called the quantization error. The usual measure of the distortion is the l2 error

E = ||y − ŷ||₂². (4.5)

The performance of a data compression scheme is measured by a rate-distortion function, where the rate is measured by the number of bits used. Clearly, compression with a lower rate-distortion curve is better.

4.3.2 Rate-Distortion as an Additive Measure

Let us assume that we have a coder, so that for any quantized symbol Ŷ_i, we know the number of bits used to code it, denoted B(Ŷ_i). The total number of bits used for Ŷ is

B = Σ_i B(Ŷ_i). (4.6)

Since the transform is orthogonal,

E = Σ_i (y_i − ŷ_i)² = Σ_i (Y_i − Ŷ_i)². (4.7)
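The three-stage scheme (with the lossless entropy-coding stage omitted, since it does not affect the error) and the transform-domain distortion identity of Eq. (4.7) can be sketched as follows; the Haar matrix in the test is our own choice of W, not one fixed by the thesis.

```python
import numpy as np

def compress_decompress(y, W, step):
    """Transform -> uniform scalar quantization -> inverse transform (Eq. 4.4).
    Entropy coding is skipped because it is lossless."""
    Y = W @ y                           # wavelet transform
    symbols = np.round(Y / step)        # uniform scalar quantizer
    Y_hat = symbols * step              # dequantize
    y_hat = W.T @ Y_hat                 # inverse transform (W orthogonal: W^-1 = W^T)
    distortion = float(np.sum((y - y_hat) ** 2))   # l2 error, Eq. (4.5)
    return y_hat, distortion
```

Because W is orthogonal, the same distortion can be measured directly on the transform coefficients, which is what makes the rate-distortion measure additive per coefficient.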


For a given rate budget B, we want to minimize the distortion E; or, for a given distortion E, we want to minimize the total rate B. This constrained problem can be solved using a Lagrange multiplier λ [29]. So we minimize a weighted sum of rate and distortion,

L = E + λB = Σ_i ((Y_i − Ŷ_i)² + λB(Ŷ_i)), (4.8)

where λ is the Lagrange multiplier, and the optimal λ can be found iteratively. Now

(Y_i − Ŷ_i)² + λB(Ŷ_i) (4.9)

is an additive cost measure on the transformed coefficients. In [67], the above weighted rate-distortion measure is used to select the best wavelet packet transform for data and image compression. We generalize the idea to the shift-invariant and time-varying wavelet transforms, and are able to further improve the data compression performance.

4.4 Denoising

4.4.1 Introduction

In this section, we present another important application of wavelets: denoising. It has been shown [21, 24, 23, 26, 18, 19] that the wavelet-based method has certain kinds of optimality that are not achievable using other methods.

In Sections 4.4.2 to 4.4.4, we present the original method of Donoho and Johnstone from a slightly different point of view. In Section 4.4.5, we try to improve the denoising performance by using our results on the shift-invariant and time-varying wavelet transforms. Finally, we present an application of wavelet denoising to speckle reduction for synthetic aperture radar (SAR).


4.4.2 Denoising of a Single Observation

The Problem and Four Estimators/Denoisers

Let us consider a noisy observation of a single deterministic value:

y = x + n, (4.10)

where x is the value we want to know, and n is Gaussian noise with unit variance. We have no a priori knowledge of the probability distribution of x. This is a classic estimation problem. We measure the quality of an estimator x̂ by

R_x̂(y) = E(|x − x̂|²). (4.11)

It is well known that the maximum likelihood estimate (MLE) of x is y, i.e., the noisy observation itself,

x̂_MLE(y) = y, (4.12)

and

R_x̂MLE = 1. (4.13)

Let us examine the following two estimators, namely the hard-thresholding estimator,

x̂_τ^hard(y) = y if |y| ≥ τ,
              0 otherwise, (4.14)

and the soft-thresholding estimator,

x̂_τ^soft(y) = y − τ if y ≥ τ,
              0 if |y| < τ,
              y + τ if y ≤ −τ. (4.15)

In both Eqns. 4.14 and 4.15, τ is a non-negative value that represents the threshold. In order to evaluate the quality of both schemes, we need to define some functionals of the Gaussian density. Let

Q(a) = ∫_a^∞ (1/√(2π)) e^{−x²/2} dx, (4.16)


and

$Q_{2b}(a, b) = \int_a^\infty (x - b)^2 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$;   (4.17)

in particular,

$Q_2(a) = Q_{2b}(a, 0) = \int_a^\infty x^2 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$.   (4.18)

There are no closed-form expressions for these functions, but they are well defined and tabulated in various mathematical handbooks.

[Figure 4.3: Functionals of the Gaussian PDF. (a) $Q(x)$ and $Q_2(x)$; (b) $Q_{2b}(x, b)$ for $b = 0, 1, 2, 3$.]

From Figure 4.3, we can see that $0 \le Q(x) \le 1$ and $0 \le Q_2(x) \le 1$, but $Q(x)$ converges to 1 or 0 faster. Also $0 \le Q_{2b}(x, b) \le b^2 + 1$, and it converges relatively fast. One can then show

$R_{\hat{x}^{hard}_\tau} = x^2\left[Q(-\tau - x) - Q(\tau - x)\right] + Q_2(\tau - x) + Q_2(\tau + x)$,   (4.19)

and

$R_{\hat{x}^{soft}_\tau} = x^2\left[Q(-\tau - x) - Q(\tau - x)\right] + Q_{2b}(\tau - x, \tau) + Q_{2b}(\tau + x, \tau)$.   (4.20)
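Although Q itself has no elementary closed form, it is the Gaussian tail $\mathrm{erfc}(a/\sqrt{2})/2$, and integration by parts reduces $Q_2$ and $Q_{2b}$ to combinations of the Gaussian density $\varphi$ and Q. The reductions below are our own derivation (easily checked by differentiation), and they make the risk expressions of Eqn. 4.19 and 4.20 computable in a few lines:

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Q(a):
    # Gaussian upper-tail probability, Eqn. 4.16
    return 0.5 * math.erfc(a / math.sqrt(2))

def Q2(a):
    # Eqn. 4.18; integration by parts gives a*phi(a) + Q(a)
    return a * phi(a) + Q(a)

def Q2b(a, b):
    # Eqn. 4.17; expanding (x - b)^2 gives (a - 2b)*phi(a) + (1 + b^2)*Q(a)
    return (a - 2 * b) * phi(a) + (1 + b * b) * Q(a)

def risk_hard(x, tau):
    # risk of the hard-thresholding estimator, Eqn. 4.19
    return x * x * (Q(-tau - x) - Q(tau - x)) + Q2(tau - x) + Q2(tau + x)

def risk_soft(x, tau):
    # risk of the soft-thresholding estimator, Eqn. 4.20
    return (x * x * (Q(-tau - x) - Q(tau - x))
            + Q2b(tau - x, tau) + Q2b(tau + x, tau))
```

Two sanity checks: at $\tau = 0$ both risks equal the MLE risk of 1, and for a huge threshold the hard-thresholding risk tends to $x^2$, the cost of killing the coefficient outright.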


[Figure 4.4: $x^2[Q(-\tau - x) - Q(\tau - x)]$, $Q_2(\tau - x) + Q_2(\tau + x)$ and $R_{\hat{x}^{hard}_\tau}$, plotted against x for several thresholds $\tau$. (a) $x^2(Q(-\tau - x) - Q(\tau - x))$ for $\tau = 1, \ldots, 5$; (b) $Q_2(\tau - x) + Q_2(\tau + x)$ for $\tau = 1, 1.55, 2, 3, 4, 5$; (c) $R_{\hat{x}^{hard}_\tau}$ for $\tau = 1, \ldots, 5$.]


In both $R_{\hat{x}^{hard}_\tau}$ and $R_{\hat{x}^{soft}_\tau}$, the first term represents the cost of setting the estimate to zero. It is clear from Figure 4.4(a) that 1) as the threshold gets bigger, this portion of the cost gets bigger too; 2) as the magnitude of x gets bigger, this portion of the cost tends to zero very fast, since it is part of the tail probability of the Gaussian distribution. The situation becomes more complicated when we consider the second term of the cost in $R_{\hat{x}^{hard}_\tau}$: 1) the cost decreases when the threshold increases; 2) the cost is bounded above by 1; 3) the cost decreases faster as the threshold goes from 1 to 4; 4) when the threshold is bigger than 5, the curve merely shifts. When we combine these costs, we get a set of curves of $R_{\hat{x}^{hard}_\tau}$ for different thresholds, as in Figure 4.4(c). Those figures show that if x is small, it is best to use a big threshold, i.e. to kill as much noise as possible, while for large x a small threshold is desired.

[Figure 4.5: $Q_{2b}(\tau - x, \tau) + Q_{2b}(\tau + x, \tau)$ and $R_{\hat{x}^{soft}_\tau}$, for $\tau = 1, \ldots, 5$. (a) $Q_{2b}(\tau - x, \tau) + Q_{2b}(\tau + x, \tau)$; (b) $R_{\hat{x}^{soft}_\tau}$.]

Similarly, we can analyze the performance of the soft-thresholding estimator. Figure 4.5(a) shows the cost of shrinking the observation y when x is bigger than the threshold. We can see that 1) it is bounded above by $\tau^2 + 1$; 2) a big threshold


works great if x is small, but miserably if x is big. The transition points occur at $x = \tau$. If we consider both parts of the error, we can deduce from Figure 4.5(b) that 1) when x is big, the error is dominated by the second part, from shrinking y; 2) when x is small, the error is dominated by the first part.

As a benchmark, we also consider a somewhat ideal estimator:

$\hat{x}_{ideal}(y, x) = \begin{cases} y & \text{if } |x| \ge 1 \\ 0 & \text{otherwise.} \end{cases}$   (4.21)

Since $\hat{x}_{ideal}$ requires the knowledge of x, it is not realizable, but we can compare the realizable estimators with it. It can easily be seen that the quality of the ideal estimator is

$R_{\hat{x}_{ideal}} = \min\{1, x^2\}$,   (4.22)

which is also called the ideal risk.

Relative Qualities

The four estimators of Section 4.4.2 enjoy many interesting properties. Some of these properties are critical, since they allow us to extend our method to sequence space and to qualify our results.

Fact 4.4.1 For $\hat{x}_{MLE}$, $\hat{x}^{hard}_\tau$ and $\hat{x}^{soft}_\tau$ defined in Eqn. 4.12, 4.14 and 4.15,

$\hat{x}_{MLE} = \hat{x}^{hard}_0 = \hat{x}^{soft}_0$.   (4.23)

This says that the MLE is a special case of the hard-thresholding estimator and the soft-thresholding estimator when the threshold value is 0.

Fact 4.4.2 For $\hat{x}_{MLE}$ and $\hat{x}_{ideal}$ defined in Eqn. 4.12 and 4.21,

$R_{\hat{x}_{ideal}} \begin{cases} = R_{\hat{x}_{MLE}} & \text{if } |x| \ge 1 \\ < R_{\hat{x}_{MLE}} & \text{otherwise.} \end{cases}$   (4.24)


This says that the MLE achieves the ideal risk when $|x| \ge 1$, but fails otherwise.

Proposition 4.4.1 For $\hat{x}^{hard}_\tau$ and $\hat{x}_{ideal}$ defined in Eqn. 4.14 and 4.21,

$R_{\hat{x}_{ideal}} \le R_{\hat{x}^{hard}_\tau} \quad \forall \tau \ge 0$,   (4.25)

and for any x and $\epsilon > 0$, there exists $\tau(\epsilon, x)$ such that

$R_{\hat{x}^{hard}_{\tau(\epsilon, x)}} < R_{\hat{x}_{ideal}} + \epsilon$.   (4.26)

Prop. 4.4.1 has an interesting practical implication. It shows that by optimally choosing the threshold, the performance of the hard-thresholding estimator comes arbitrarily close to the ideal risk. We also need to point out that there is no similar result for the soft-thresholding estimator. For example, when x is small, it is possible to get a lower risk than the ideal risk if the threshold is picked optimally (e.g. when $0.7 < |x| < 1$ and $\tau = 1$).

Theorem 4.4.1 (Donoho [24]) For all $\tau \ge \sqrt{2 \log 2}$,

$R_{\hat{x}^{soft}_\tau} < (\tau^2 + 1)\left(e^{-\tau^2/2} + R_{\hat{x}_{ideal}}\right)$.   (4.27)

This bound is tight when both x and $\tau$ are big. It is interesting to note that the ratio between the l.h.s. and the r.h.s. of Eqn. 4.27 monotonically decreases as $|x|$ decreases; see Figure 4.6.

Theorem 4.4.2 (Donoho [24]) For all $\tau \ge \sqrt{2 \log 2}$,

$R_{\hat{x}^{hard}_\tau} < (\tau^2 + 1)\left(e^{-\tau^2/2} + R_{\hat{x}_{ideal}}\right)$.   (4.28)

From Figure 4.7(a), we can see that this bound is not tight when $\tau$ is small. It is tight only when $\tau$ is very large and at $x = \tau$, as seen in Figure 4.7(b).
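The oracle inequality of Thm. 4.4.1 can be spot-checked numerically. The sketch below compares a Monte Carlo estimate of the soft-thresholding risk against the bound; the sample size, seed, and test points are arbitrary choices of ours:

```python
import math
import random

def soft(y, tau):
    # soft-thresholding rule of Eqn. 4.15
    return math.copysign(max(abs(y) - tau, 0.0), y)

def mc_soft_risk(x, tau, trials=100000, seed=7):
    # Monte Carlo estimate of R = E|soft(x + n) - x|^2, with n ~ N(0, 1)
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        e = soft(x + rng.gauss(0.0, 1.0), tau) - x
        acc += e * e
    return acc / trials

def oracle_bound(x, tau):
    # right-hand side of Thm. 4.4.1
    return (tau * tau + 1.0) * (math.exp(-tau * tau / 2.0) + min(1.0, x * x))
```

At the smallest allowed threshold $\tau = \sqrt{2 \log 2}$ the estimated risk sits well below the bound for small, moderate, and large x, consistent with the theorem.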


[Figure 4.6: $R_{\hat{x}^{soft}_\tau} / [(\tau^2 + 1)(e^{-\tau^2/2} + R_{\hat{x}_{ideal}})]$ for $\tau = 1, \ldots, 5$.]

[Figure 4.7: $R_{\hat{x}^{hard}_\tau} / [(\tau^2 + 1)(e^{-\tau^2/2} + R_{\hat{x}_{ideal}})]$. (a) small $\tau$ ($\tau = 1, 1.55, 2, 3, 4, 5$); (b) big $\tau$ ($\tau = 10, 20, 40, 60, 80$).]


4.4.3 Denoising of a Sequence of Observations

The Problem

Let us assume we have a sequence of observations,

$y_i = x_i + n_i, \quad i = 1, 2, \ldots, N$,   (4.29)

where the $n_i$ are i.i.d. Gaussian noise with unit variance. In the case of correlated noise, we need to decorrelate it first, while for unknown variance, we need to estimate the variance first. Let us denote $y = (y_1, y_2, \ldots, y_N)^T$, $x = (x_1, x_2, \ldots, x_N)^T$, and $n = (n_1, n_2, \ldots, n_N)^T$. Since the noise samples are independent, we can treat the elements of y independently and obtain the so-called diagonal estimators,

$\hat{x}_{MLE}(y) = y$,   (4.30)

$\hat{x}^{hard}_\tau(y) = (\hat{x}^{hard}_{1,\tau}(y_1), \hat{x}^{hard}_{2,\tau}(y_2), \ldots, \hat{x}^{hard}_{N,\tau}(y_N))^T$,   (4.31)

$\hat{x}^{soft}_\tau(y) = (\hat{x}^{soft}_{1,\tau}(y_1), \hat{x}^{soft}_{2,\tau}(y_2), \ldots, \hat{x}^{soft}_{N,\tau}(y_N))^T$,   (4.32)

and

$\hat{x}_{ideal}(y) = (\hat{x}_{1,ideal}(y_1), \hat{x}_{2,ideal}(y_2), \ldots, \hat{x}_{N,ideal}(y_N))^T$.   (4.33)

Clearly,

$R_{\hat{x}_{MLE}} = N$,   (4.34)

$R_{\hat{x}^{hard}_\tau} = \sum_{i=1}^N R_{\hat{x}^{hard}_{i,\tau}}$,   (4.35)

$R_{\hat{x}^{soft}_\tau} = \sum_{i=1}^N R_{\hat{x}^{soft}_{i,\tau}}$,   (4.36)

and

$R_{\hat{x}_{ideal}} = \sum_{i=1}^N R_{\hat{x}_{i,ideal}} = \sum_{i=1}^N \min\{1, x_i^2\}$.   (4.37)
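Elementwise thresholding of such a finite sequence, using the universal threshold $\tau^{**} = \sqrt{2 \log N}$ that appears in the theorems of the next subsection, can be sketched as follows (unit noise variance assumed, as in the model above):

```python
import math

def universal_threshold(n):
    # tau** = sqrt(2 log N)
    return math.sqrt(2 * math.log(n))

def denoise_sequence(y, rule="soft"):
    # Apply the same threshold to every element of the noisy sequence y,
    # as in the diagonal estimators of Eqns. 4.31 and 4.32.
    tau = universal_threshold(len(y))
    if rule == "soft":
        return [math.copysign(max(abs(v) - tau, 0.0), v) for v in y]
    return [v if abs(v) >= tau else 0.0 for v in y]
```

Small entries are killed outright; large entries survive, shrunk by $\tau^{**}$ under the soft rule and untouched under the hard rule.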


Relative Qualities

Theorem 4.4.3 (Donoho [24])

$R_{\hat{x}^{soft}_{\tau^{**}}} < (2 \log N + 1)\left(1 + R_{\hat{x}_{ideal}}\right)$,   (4.38)

where $\tau^{**} = \sqrt{2 \log N}$.

Proof: Let $\tau = \sqrt{2 \log N}$ in Thm. 4.4.1, and use Eqn. 4.36. $\Box$

Theorem 4.4.4 (Donoho [24])

$R_{\hat{x}^{hard}_{\tau^{**}}} < (2 \log N + 1)\left(1 + R_{\hat{x}_{ideal}}\right)$,   (4.39)

where $\tau^{**} = \sqrt{2 \log N}$.

Proof: Let $\tau = \sqrt{2 \log N}$ in Thm. 4.4.2, and use Eqn. 4.35. $\Box$

Optimal Threshold Selection

Although the threshold is fixed asymptotically, we are interested in finding the optimal threshold for finite-length data, which is very important in practice.

Known x

Of course, if x is known, we do not need to estimate it. Here we make this unrealistic assumption only to show the difficulty of the problem.

Fact 4.4.3 For fixed x, both $R_{\hat{x}^{hard}_\tau}$ and $R_{\hat{x}^{soft}_\tau}$ are infinitely differentiable functions w.r.t. $\tau$.

For bounded x, we can see that both $R_{\hat{x}^{hard}_\tau}$ and $R_{\hat{x}^{soft}_\tau}$ are bounded and infinitely differentiable on a bounded compact set, so the optimal threshold exists and can be


found by an optimization algorithm. Here we use one threshold for all the elements of the sequence; otherwise the problem is unrealistic. It is also possible to use the first-order necessary condition and solve for the set of points that are candidates for the optimum; however, the equations are nonlinear, and we have to rely on optimization routines. Once the optimal thresholds are found, we obtain lower bounds on the risks.

Unknown x

In the real world, x is unknown, and many methods have been proposed: for example, the cross-validation scheme [60], minimum description length (MDL) [69], and SUREShrink [22].

4.4.4 Denoising in an ON Basis

The Method

Let

$y_i = x_i + n_i, \quad i = 1, \ldots, N$,   (4.40)

where $x_i$ is the original signal, $n_i$ is i.i.d. white Gaussian noise with unit variance, and $y_i$ is the observation. The goal is to get a good estimate of x from the noisy observation y. Let W denote the wavelet transformation matrix; we have

$Wy = Wx + Wn$,   (4.41)

or

$Y_i = X_i + N_i, \quad i = 1, \ldots, N$,   (4.42)

where $N_i$ is white Gaussian noise with unit variance. We consider a diagonal projection estimator of the form $\hat{X}_i = \delta_i Y_i$, $\delta_i \in \{0, 1\}$, i.e.

$\hat{x} = W^T \hat{X} = W^T \Delta Y = W^T \Delta W y$,   (4.43)

where $\Delta = \mathrm{diag}(\delta_i)$.
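The diagonal projection estimator $\hat{x} = W^T \Delta W y$ of Eqn. 4.43 can be illustrated with a one-level orthonormal Haar transform standing in for W (our choice for the sketch; any orthonormal W works the same way), together with the ideal risk $\sum_i \min\{1, X_i^2\}$ of Eqn. 4.37:

```python
import math

def haar_pair_transform(y):
    # One level of the orthonormal Haar transform W applied to pairs
    # (y is assumed to have even length): scaled sums (low-pass)
    # followed by scaled differences (high-pass).
    s = 1 / math.sqrt(2)
    lo = [s * (y[i] + y[i + 1]) for i in range(0, len(y), 2)]
    hi = [s * (y[i] - y[i + 1]) for i in range(0, len(y), 2)]
    return lo + hi

def haar_pair_inverse(c):
    # Inverse transform W^T (W is orthonormal).
    s = 1 / math.sqrt(2)
    h = len(c) // 2
    out = []
    for a, d in zip(c[:h], c[h:]):
        out.extend([s * (a + d), s * (a - d)])
    return out

def diagonal_projection(y, tau=1.0):
    # x_hat = W^T Delta W y with the keep-or-kill rule delta_i = 1{|Y_i| >= tau}.
    Y = haar_pair_transform(y)
    X = [v if abs(v) >= tau else 0.0 for v in Y]
    return haar_pair_inverse(X)

def ideal_risk(X):
    # Eqn. 4.37: sum_i min(1, X_i^2)
    return sum(min(1.0, v * v) for v in X)
```

With $\tau = 0$ the projection keeps everything and the estimator reduces to the MLE; a constant signal passes through untouched because its energy lives entirely in the kept low-pass coefficients.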


The Ideal Risk

Under the above assumptions, the ideal diagonal projection estimator is obtained by setting

$\delta_i = \begin{cases} 1 & |X_i| \ge 1 \\ 0 & |X_i| < 1. \end{cases}$   (4.44)

The ideal risk is

$R_{DWT}(X) = E\,\|x - \hat{x}\|_2^2 = E\,\|W^T \hat{X} - W^T X\|_2^2 = E\,\|W^T(\hat{X} - X)\|_2^2 = E\,\|\hat{X} - X\|_2^2 = \sum_i \min(X_i^2, 1)$.   (4.45)

The ideal risk is unattainable, since it requires the knowledge of X to decide whether to keep or kill each coefficient. However, it can be used as a benchmark: we can fix the function to be estimated and change the wavelet, or replace the wavelet transform with any other orthogonal transform, and then compare the ideal risks. This gives some idea of which basis is better suited to the signal. It is clear that if the coefficients decay fast in the basis, then the estimator has a lower risk asymptotically [26]. The advantage of the wavelet transform comes from the fact that for a wide class of functions, the wavelet coefficients have the fastest decay rate among all orthogonal transforms.

For the orthogonal transform model of Eqn. 4.43, it can be shown [21] that without the knowledge of X, i.e. relying on the data alone, the risk of the following estimator:

$\delta_i = \begin{cases} 1 & |Y_i| \ge 1 \\ 0 & |Y_i| < 1 \end{cases}$   (4.46)

is within a log factor of the ideal risk. Donoho and Johnstone further showed [21] that thresholding the coefficients is asymptotically minimax optimal for functions from a wide range of function classes. So thresholding in the wavelet domain is the optimal method to


denoise functions from any of these function spaces [21]. No other method has similar optimality [21, 26].

4.4.5 Best Basis Denoising

[Figure 4.8: The diagram of best basis denoising: $y \to W \to Y \to T_\tau \to \hat{X} \to W^T \to \hat{x}$.]

Although wavelet thresholding is asymptotically optimal, we still hope to improve the performance for finite-length signals. Also, the classic wavelet transform is fixed, not shift-invariant, and not signal adaptive. The shift-invariant and time-varying wavelet transforms that we developed in Chapter 3 have the potential to further improve the denoising performance.

Clearly, the question is which cost measure to use. Donoho [23] proposed a cost function that is minimax optimal for the basis selection. Here we take a different approach.

If the signal is known and the threshold is fixed, we know exactly the actual cost of thresholding each individual coefficient: it is given in Eqn. 4.20 for soft thresholding and in Eqn. 4.19 for hard thresholding. Since the transform is orthonormal, the actual costs are additive, so we can use them to find the best basis. Of course, the signal is unknown to us, but we can find the best basis for some previous estimate of the signal, then use that best basis to improve the denoising. We can iterate the process until we converge to a fixed basis. This iteration process is highly nonlinear, so it is hard to prove a convergence theorem. In our experience, when the signal-to-noise ratio is high (> 0 dB), the algorithm usually converges. Intuitively, wavelet transforms are good at separating signal and noise, so


it is easier to adapt to the signal in the wavelet domain than in any other orthonormal transform domain.

4.4.6 Speckle Reduction

Introduction

Speckle results from the need to create an image with coherent radiation. Speckle phenomena can be found in SAR, acoustic imagery, and laser range data. A fully developed speckle pattern appears chaotic and unordered. Thus when image detail is important, speckle can be considered as noise that degrades the image, and speckle reduction is important in several applications of coherent imaging.

In this section, we study the minimization of speckle effects when we already have a digitized speckled image. Dewaele et al. [17] compared several speckle reduction techniques, including Lee's statistical filter, the sigma filter, and Crimmins' geometric filter. These methods achieve moderate speckle reduction, but smooth out sharp features in the image. Novak [63] derived a polarimetric whitening filter (PWF) for fully polarimetric SAR data. However, this method does not utilize spatial correlation; only the correlation across polarizations is used.

We propose a novel speckle reduction method based on thresholding the wavelet coefficients of the logarithmically transformed image. This method can provide significant speckle reduction and target-to-clutter improvement while preserving the resolution of the original SAR imagery. Thus it can be used as a pre-processing step to improve the performance of automatic target detection and recognition algorithms based on SAR images.

The statistical properties of speckle noise were studied by Goodman [30]. He shows that when the imaging system has a resolution cell that is small compared with the spatial detail in the object, and the speckle-degraded image has been sampled coarsely enough that the degradation at any pixel can be assumed to be independent of the degradation at all other pixels, coherent speckle noise can be modeled as


multiplicative noise. Also, the real and imaginary parts of the complex speckle noise are independent, zero-mean, and identically distributed Gaussian random variables. Arsenault [1] shows that when the image intensity is integrated over a finite aperture and logarithmically transformed, the speckle noise is approximately additive Gaussian noise, and it tends to a normal probability distribution much faster than the intensity distribution. Thus we have

$\tilde{y}(m, n) = \tilde{x}(m, n) + \tilde{e}(m, n)$,   (4.47)

where $\tilde{y} = \ln(|y|)$, and y is the observed complex SAR imagery. Here x is the desired texture information, contaminated by the speckle noise e. If an integrating aperture is used, and if we assume that the size of the aperture is small enough to retain texture detail, then $\tilde{e}$ is close to Gaussian distributed. The goal of speckle reduction is thus equivalent to finding the best estimate of x.

The Details of the Method

Based on the above discussion, we propose the following method for speckle reduction:

[Figure 4.9: The diagram of speckle reduction via wavelet shrinkage: $\log|\cdot| \to$ Wavelet Transform $\to$ Shrinkage $\to$ Inverse Transform.]

Although this is a straightforward application of Donoho's wavelet denoising scheme, a number of important factors have to be decided carefully.

Choice of wavelet: Under the name of wavelet analysis, there are many choices, such as Daubechies' family of wavelets, coiflets, M-band wavelets [33], wavelet packets [9], and space-varying wavelets [7, 34]. Longer wavelets with higher regularity tend to give slightly better results in terms of speckle reduction. However, if the wavelet


filter is too long, details of the image might be over-smoothed. Also, the computational complexity is nearly proportional to the length of the wavelet. So we choose Daubechies' length-4 wavelet, which achieves a balance between speckle reduction and the improvement in target-to-clutter contrast, and is at the same time computationally very efficient.

Levels of wavelet transform: In order to separate the background texture from the local granular speckle phenomena, a number of levels of the wavelet transform are needed. Clearly, the number of levels is also related to the length of the wavelet filter. For our choice of the D4 wavelet, we usually take at least 5 levels of the transform.

Size of the wavelet transform: Wavelet transforms are taken block by block. In order to minimize the boundary effects, we use at least 128 x 128 blocks, often 512 x 512 blocks.

Thresholding scheme: soft thresholding or hard thresholding. Although soft thresholding is optimal in theory, the following hard-thresholding scheme has been shown to give better results for certain applications [69]:

$\hat{w} = \begin{cases} w & \text{if } |w| > t \\ 0 & \text{otherwise.} \end{cases}$

So we test both of them and compare the results.

The threshold: This is the most important factor in the algorithm. Since the noise variance is not known in practice, it must be estimated from the data. A number of approaches exist [21]. We found the following method simple and very effective: take the high/high part of the first level of the wavelet decomposition, and take the estimated noise standard deviation $\sigma$ to be the standard deviation of this high/high part. For i.i.d. Gaussian noise, we found that $t = 1.5\sigma \ldots 3\sigma$ yields excellent results; using this range of thresholds, 86.6% ... 99.7% of the noise values are suppressed. Our thresholding scheme is different from the one in Donoho's work, which over-smoothes the image. Also, we do not threshold the low/low part of the final level of


the wavelet decomposition. This guarantees that the mean of the processed image is the same as the mean of the original image.

Results of Speckle Reduction

This section presents numerical results obtained by applying the wavelet-thresholding-based speckle reduction method to actual SAR imagery. The data we are using were collected near Stockbridge, NY by the Lincoln Laboratory MMW SAR. We chose four types of clutter regions in the images: trees, scrub, grass, and shadows. Discrete objects, like cars and power-line towers, are considered targets. We applied the wavelet-based speckle reduction algorithm and computed the following four statistics to evaluate the performance:

Standard-deviation-to-mean ratio (s/m): The quantity s/m (both in power) is a measure of image speckle in a homogeneous region [30, 1, 17, 51]. We computed the s/m ratio for each type of clutter region to quantify the speckle reduction capacity of our algorithm.

Log standard deviation [63]: The standard deviation of the clutter data (in dB). This is an important quantity that directly affects the target detection performance of a standard two-parameter constant false alarm rate (CFAR) detection algorithm.

Target-to-clutter ratio (t/c): The difference between the target and clutter means (in dB). It measures how the target stands out from the surrounding clutter.

Deflection ratio: This is the two-parameter CFAR detection statistic,

$M = \frac{y - \hat{\mu}_y}{\hat{\sigma}_y}$,   (4.48)

where y is the scalar pixel value of the cell, $\hat{\mu}_y$ is the estimated mean of y, and $\hat{\sigma}_y$ is the estimated standard deviation of y. After speckle reduction, M should be higher at known reflector points and lower elsewhere.
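The four evaluation statistics can be computed as below. This is a minimal sketch; the population standard deviation is our assumption, since the thesis does not specify the estimator:

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    # population standard deviation (assumed; the text does not specify)
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def s_over_m(power):
    # standard-deviation-to-mean ratio of a homogeneous region (power values)
    return std(power) / mean(power)

def log_std(power):
    # standard deviation of the clutter data in dB
    return std([10 * math.log10(p) for p in power])

def target_to_clutter(target_power, clutter_power):
    # difference between target and clutter means, in dB
    return 10 * math.log10(mean(target_power)) - 10 * math.log10(mean(clutter_power))

def deflection_ratio(y, clutter):
    # two-parameter CFAR statistic of Eqn. 4.48, with mean and deviation
    # estimated here from a surrounding clutter sample
    return (y - mean(clutter)) / std(clutter)
```

In a CFAR detector, `deflection_ratio` would be evaluated per cell and compared against a constant chosen for the desired false alarm rate.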


Tables 4.1, 4.2 and 4.3 show these four statistics for the original and processed images over four typical regions. The large reductions in s/m and log standard deviation indicate that a significant amount of speckle has been removed. The soft-thresholding scheme performs much better in terms of s/m and log-std than the hard-thresholding scheme; both perform equally well in terms of deflection ratio. However, for target-to-clutter ratio (t/c), hard thresholding gives better results than soft thresholding. This is not surprising, since the peak value is reduced by soft thresholding of the wavelet coefficients.

To visualize the result, we show the original and the wavelet-processed HV-polarization images in Figures 4.10 and 4.11. We can see from the images that speckle is greatly reduced while sharp features are maintained. The computational complexity of the wavelet shrinkage method is O(N), where N is the size of the data; thus our proposed method is efficient.

Table 4.1  Speckle reduction results for single-polarization SAR image: s/m for clutter data.

                  Trees    Scrub    Grass    Shadow
  Original HH     1.8207   1.3366   1.0590   1.2152
  Soft Threshold  1.0602   0.5740   0.4251   0.5137
  Hard Threshold  1.7783   1.2454   0.9283   1.1272

Table 4.2  Speckle reduction results for single-polarization SAR image: log-std for clutter data.

                  Trees    Scrub    Grass    Shadow
  Original HH     7.2431   6.0598   5.4190   5.6231
  Soft Threshold  4.3230   2.4263   1.8283   1.9988
  Hard Threshold  5.5608   3.8532   2.9457   3.1140
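The despeckling pipeline of Figure 4.9, with the threshold estimated from the first-level high/high subband as described above, can be sketched as follows. A single-level 2-D Haar transform stands in here for the D4 wavelet and the five or more levels used in the thesis, and `k` plays the role of the 1.5 to 3 multiplier; these simplifications are ours.

```python
import math

def haar2d_level(img):
    # One level of the orthonormal 2-D Haar transform on an even-sized
    # image: returns the four quarter subbands (LL, LH, HL, HH).
    s = 0.5
    h, w = len(img) // 2, len(img[0]) // 2
    LL = [[0.0] * w for _ in range(h)]; LH = [[0.0] * w for _ in range(h)]
    HL = [[0.0] * w for _ in range(h)]; HH = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            LL[i][j] = s * (a + b + c + d)
            LH[i][j] = s * (a - b + c - d)
            HL[i][j] = s * (a + b - c - d)
            HH[i][j] = s * (a - b - c + d)
    return LL, LH, HL, HH

def inv_haar2d_level(LL, LH, HL, HH):
    # Inverse of haar2d_level (the transform is orthonormal).
    s = 0.5
    h, w = len(LL), len(LL[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            img[2 * i][2 * j] = s * (ll + lh + hl + hh)
            img[2 * i][2 * j + 1] = s * (ll - lh + hl - hh)
            img[2 * i + 1][2 * j] = s * (ll + lh - hl - hh)
            img[2 * i + 1][2 * j + 1] = s * (ll - lh - hl + hh)
    return img

def despeckle(intensity, k=2.0):
    # Log-transform, one-level Haar, hard-threshold the detail subbands at
    # t = k * sigma_hat, where sigma_hat is the standard deviation of the
    # high/high subband.  The low/low subband is left untouched, so the
    # image mean is preserved.  Finally invert and exponentiate.
    log_img = [[math.log(p) for p in row] for row in intensity]
    LL, LH, HL, HH = haar2d_level(log_img)
    flat = [v for row in HH for v in row]
    m = sum(flat) / len(flat)
    sigma = math.sqrt(sum((v - m) ** 2 for v in flat) / len(flat))
    t = k * sigma

    def hard(band):
        return [[v if abs(v) > t else 0.0 for v in row] for row in band]

    rec = inv_haar2d_level(LL, hard(LH), hard(HL), hard(HH))
    return [[math.exp(v) for v in row] for row in rec]
```

With `k = 0` the pipeline reduces to transform and inverse, so it passes an image through essentially unchanged; larger `k` suppresses more of the detail subbands.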


[Figure 4.10: The original SAR image of a farm area, HV polarization.]

[Figure 4.11: Processed image, using Daubechies' length-4 wavelets (HV), soft thresholding.]


Speckle Reduction in Multipolarization SAR Imagery

The availability of fully polarimetric SAR data makes it possible to reduce speckle by utilizing the correlations between the co-polarized (HH, VV) and cross-polarized (HV, VH) images. Novak [63] derived a polarimetric whitening filter (PWF) which in theory is optimal if the correlations between HH, HV, and VV are known for every pixel. However, a study [51] shows that this is rarely the case. Lee [51] proposes an adaptive method which estimates the correlations using a moving window; however, the optimal window size is hard to find.

We propose three methods for fully polarimetric SAR data. As shown in Figures 4.12, 4.13 and 4.14, they are different combinations of the PWF and the wavelet speckle reduction method. Due to the nonlinear nature of the wavelet shrinkage method, they are not equivalent.

[Figure 4.12: Wavelet based multi-polarization speckle reduction (method 1): HH, HV, VV -> PWF -> ln|.| -> DWT -> Thresholding -> IDWT -> e^(.). Step 1: Perform PWF. Step 2: Perform wavelet denoising.]

Using the fully polarimetric SAR data, we tested the three methods. The statistics are shown in Tables 4.4, 4.5 and 4.6. The further reductions in s/m and log standard deviation indicate that a significant amount of speckle has been removed. To demonstrate these results visually, we show the PWF-processed image of the same farm scene in Figure 4.15. Figure 4.16 shows the wavelet-processed image. Since


[Figure 4.13: Wavelet based multi-polarization speckle reduction (method 2). Step 1: Denoise the individual polarimetric images HH, HV and VV (each via ln|.| -> DWT -> Thresholding -> IDWT -> e^(.)). Step 2: Combine with PWF.]

[Figure 4.14: Wavelet based multi-polarization speckle reduction (method 3). Step 1: Decorrelate HH, HV and VV with the PWF change of basis. Step 2: Denoise each channel with wavelet thresholding. Step 3: Add the resulting three images in magnitude (square and sum).]


the three methods produce visually similar results, only the result of method 1 with soft thresholding is shown.

Comparing the results for soft and hard thresholding across all three methods, we see that soft thresholding gives consistently better results in terms of s/m, log-std and deflection ratio, while hard thresholding is better in terms of t/c. Method 1 gives the best overall performance, and method 1 with hard thresholding is the only combination that improves performance in terms of all four statistics.

After speckle reduction, the deflection ratio should be higher at known reflector points and lower elsewhere. In ATD/R systems [63], the deflection ratio is calculated for each cell and compared to a constant that defines the false alarm rate. As shown in Table 4.6, the deflection ratio is much higher in the wavelet-processed images than in the PWF-processed image. We also tested both the PWF and the wavelet shrinkage methods on a SAR image that contains several standard reflectors. At these points, the deflection ratio is 30% to 50% higher in the wavelet-processed image than in the PWF-processed image; elsewhere in the image, the deflection ratio values are roughly the same for both methods. This strongly indicates the advantage of our method, and suggests an improvement in detection performance. Cleaner images also suggest potential improvements for classification and recognition.

A similar method was independently proposed in [59]; however, only the hard-thresholding scheme was used there. Compared with [59], we also study speckle reduction for multi-polarization SAR, and the effect of speckle reduction on the performance of automatic target detection and recognition systems.


[Figure 4.15: Original PWF image of a farm area.]

[Figure 4.16: Resulting image of wavelet based multi-polarization speckle reduction (method 1). Daubechies' length-4 wavelet and the soft thresholding scheme are used.]


Table 4.3  Speckle reduction results for single-polarization SAR image: target-to-clutter ratio (t/c) and deflection ratio.

                  t/c       Deflection ratio
  Original HH     31.1813   5.1456
  Soft Threshold  18.4149   7.5897
  Hard Threshold  28.6837   7.2191

Table 4.4  Speckle reduction results for multi-polarization SAR image: s/m for clutter data.

                  Trees    Scrub    Grass    Shadow
  PWF             1.3033   0.8240   0.6549   0.7007
  Method 1, Soft  0.9233   0.4464   0.3034   0.3578
  Method 2, Soft  0.8712   0.3757   0.2889   0.3327
  Method 3, Soft  0.8272   0.3719   0.2754   0.3141
  Method 1, Hard  1.1950   0.6971   0.4979   0.5527
  Method 2, Hard  1.4507   0.8947   0.7249   0.7665
  Method 3, Hard  1.3579   0.8650   0.6868   0.7285

Table 4.5  Speckle reduction results for multi-polarization SAR image: log-std for clutter data.

                  Trees    Scrub    Grass    Shadow
  PWF             4.9404   3.4292   2.9528   2.8999
  Method 1, Soft  3.8321   1.8809   1.3264   1.3068
  Method 2, Soft  3.5897   1.5279   1.1681   1.2310
  Method 3, Soft  3.4680   1.5062   1.1371   1.1659
  Method 1, Hard  4.4482   2.6858   2.0449   1.9409
  Method 2, Hard  4.8833   3.1973   2.6807   2.6915
  Method 3, Hard  4.7115   3.1115   2.6087   2.5732


Table 4.6  Speckle reduction results for multi-polarization SAR image: target-to-clutter ratio (t/c) and deflection ratio for different methods.

                  t/c       Deflection ratio
  PWF             34.0269   11.1842
  Method 1, Soft  29.5359   18.0767
  Method 2, Soft  24.3769   16.5064
  Method 3, Soft  23.7384   16.4851
  Method 1, Hard  35.3883   15.5491
  Method 2, Hard  37.2382   13.0932
  Method 3, Hard  32.2843   11.6748


4.5 Joint Denoising and Data Compression

4.5.1 Problem Setup

[Figure 4.17: The diagram of joint denoising and compression using the DWT: $y \to W \to Y \to Q_\Delta \to \hat{X} \to W^T \to \hat{x}$.]

There is a striking similarity between data compression and denoising: the transform is the same, the error measure is similar, and both the quantizer Q and the thresholding T are pointwise operators. Donoho showed [19] that data compression and denoising are theoretically closely related. The fact that wavelets form an unconditional basis makes them essentially better for both compression and denoising than any other orthogonal basis, in an asymptotic minimax sense. Another important property of wavelets is that they decorrelate many correlated sources nearly optimally, which is another reason why they work so well in practice for data compression.

In this section, we consider the joint denoising and compression problem. We have observations of samples of a deterministic function contaminated by additive white Gaussian noise,

$y = x + n$,   (4.49)

where x is the unknown signal vector, n is i.i.d. Gaussian noise, and y is the observation vector. For simplicity, we normalize (4.49) such that the noise has unit variance. The goal of joint denoising and compression is to find a good estimate $\hat{x}$ while simultaneously minimizing the number of bits needed to represent $\hat{x}$. The quality of $\hat{x}$ is measured by the risk

$R_{\hat{x}} = E\left[\|x - \hat{x}\|_2^2\right]$,   (4.50)


where the error is caused by the presence of the noise and by the representation of the continuous amplitude of $\hat{x}$ by a finite number of values, i.e. the quantization process. It is clear that the individual problems of denoising and compression are special cases of the problem we consider here. A trivial approach to the joint problem is to first estimate/denoise according to some optimality criterion and then compress using the usual minimum mean-square-error criterion. However, the outcome of this sequential approach is not optimal.

4.5.2 Design of an Asymptotically Optimal Quantizer

As shown in the previous sections, the optimal method for data compression or denoising is pointwise processing in the wavelet domain. For the problem of joint data compression and denoising, we need a pointwise operator which both shrinks and quantizes the wavelet coefficients. If we simply concatenate the denoiser and the compressor, we obtain a nonlinear operator as in Figure 4.18(c), assuming soft thresholding. We can observe that, unlike the shrinking operator, each coefficient is treated differently depending on its value: the amount of shrinking is anywhere from $\tau - \Delta/2$ to $\tau + \Delta/2$, where $\tau$ is the shrinking threshold and $\Delta$ is the step size of the uniform quantizer. This clearly violates the necessary condition for asymptotically optimal denoising, which requires that the shrinkage be at least $\sqrt{2 \log N}$. A simple modification is shown in Figure 4.18(b), which guarantees that the minimum amount of shrinking is $\sqrt{2 \log N}$, while the maximum amount of shrinking is $\sqrt{2 \log N} + \Delta$. We can show that for the new quantizer,

$R_{\hat{x}^{soft}_{\tau,\Delta}} < \left(\left(\sqrt{2 \log N} + \Delta\right)^2 + 1\right)\left(1 + R_{\hat{x}_{ideal}}\right)$.   (4.51)

This indicates that the new quantizer is optimal in the sense that it achieves the same asymptotic rate as plain denoising; the differences are in lower-order terms. It also guarantees (with high probability) that the reconstructed signal is as

Page 103: Theory and Applications of the Shift-Invariant, Time-Varying and Undecimated Wavelet Transforms

88smooth as the original signal, or we can say it is essentially noise free, which is nottrue for the simple quantizer in Figure 4.18(c).For hard thresholding, we can similarly construct an asymptotically optimal quan-tizer as in Figure 4.18(d), and proveRxhard�;� < ((q2 logN + �)2 + 1)(1 +Rxideal): (4:52)−5 −4 −3 −2 −1 0 1 2 3 4 5
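The modified soft quantizer can be sketched in a few lines. The sketch below is illustrative (the function name is ours, not from the thesis), with `lam` playing the role of the threshold $\lambda$, e.g. $\sqrt{2\log N}\,\sigma$, and `delta` the step size $\Delta$. Rounding the thresholded magnitude *down* to the quantizer grid is what guarantees that every surviving coefficient is shrunk by an amount in $[\lambda, \lambda + \Delta)$:

```python
import math

def optimal_soft_quantizer(w, lam, delta):
    """Soft-threshold a coefficient by lam, then quantize the remaining
    magnitude by rounding DOWN to a multiple of delta.  Every nonzero
    output is therefore shrunk toward zero by at least lam and by less
    than lam + delta."""
    m = abs(w) - lam            # soft thresholding of the magnitude
    if m <= 0:
        return 0.0              # small coefficients are killed outright
    q = math.floor(m / delta) * delta
    return math.copysign(q, w)
```

For example, with `lam = 1` and `delta = 0.5`, an input of −2.7 is mapped to −1.5, a shrinkage of 1.2, which indeed lies in [1, 1.5).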

[Figure 4.18: Quantizers, each plotting quantized output against input. (a) Uniform scalar quantizer. (b) Optimal soft quantizer. (c) Soft thresholding + uniform scalar quantizer. (d) Optimal hard quantizer.]

4.5.3 Example

Theoretically, the optimality of wavelet based denoising and compression holds in great generality [19]. The following example shows that it is also very appealing in practice. The signal `HeaviSine' is generated by Donoho's MATLAB routine MakeSignal from his software packet TeachWave (available by anonymous ftp from playfair.stanford.edu). The length of the signal is 2048, and the signal-to-noise ratio is 15.8095 dB (the noise variance is 1). Daubechies' length-8 filter is used (the most symmetric version). The signal is shown in Figure 4.19(a), and one realization of the noisy signal (generated by MATLAB randn) is shown in Figure 4.19(b). The result of Donoho's soft threshold denoising is shown in Figure 4.19(c), where the threshold is $\sqrt{2\log N}$ and an 8-level periodic DWT is used. We simultaneously compress

and denoise the signal using the derived optimal quantizer; the outcome is shown in Figure 4.19(d). The entropy of the quantized coefficients is 0.0624, and the risk per point is 0.2426.

[Figure 4.19: Examples of joint denoising and compression. (a) HeaviSine. (b) Noisy HeaviSine. (c) Soft threshold denoising. (d) Compressed and denoised in the ON DWT basis.]

In order to evaluate the overall performance of our algorithm, we plot the rate-risk curve in Figure 4.20, where the rate is the average number of bits used per point, and the risk is the averaged $\ell_2$ error per point between the original and the simultaneously denoised and compressed signal. The result is an average over 100 realizations. For comparison, we also include the rate-risk curve for compression without denoising, which is lower bounded by 1 (the noise variance). Figure 4.20 clearly shows the

advantage of our algorithm: it uses far fewer bits, and the resulting signal has much less risk.

[Figure 4.20: The rate-risk curve of joint denoising and compression: rate (bits per point) versus risk (expected mean square error). Compression of the noisy signal (dashed curve); ON wavelet denoising and compression (solid curve).]

4.6 Detection

[Figure 4.21: The diagram of the wavelet based detection: $y \to W \to Y \to T_\lambda \to \hat{Y} \to \|\hat{Y}\|^2$.]

One common task of statistical signal processing is signal detection, where one has to decide whether the signal of interest is present in a noisy observation or not. When the signal of interest is known, the optimal detector is the matched filter. However, when the signal is not known, e.g., an unknown transient, the detection problem is still open.


As in Figure 4.21, the proposed detector has three stages. The first stage is the wavelet transform, followed by soft thresholding. In the last stage, we calculate the $\ell_2$ norm of the thresholded coefficients; if the $\ell_2$ norm is bigger than a threshold, we decide that a signal is present, otherwise we decide that there is pure noise.

This detector can be interpreted as an optimal estimator followed by a square law detector. It is a generalization of the generalized likelihood ratio test (GLRT). Instead of estimating an unknown parameter as in the GLRT, we estimate the signal itself. The performance of this detector is directly related to its ability to concentrate the signal energy into few wavelet coefficients. Since the wavelet is asymptotically minimax optimal for a wide range of function spaces [26], this estimator is optimal in the same sense.

In practice, especially when we only have finite-length data, we can use the shift-invariant and time-varying wavelet transforms to maximize the energy concentration of the transformed coefficients, and further improve the detection performance.

4.7 Future Work

Many generalizations can be made. One of the most important ideas is that instead of finding the best basis for one signal, we can find the best basis for a set of signals. Since many problems, e.g., classification and recognition, are posed for sets of signals, finding the best basis for sets of signals is very promising. Some work [68] has been done, but more work is needed to develop a complete theory.
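The three-stage detector of Section 4.6 can be sketched in a few lines. This is an illustrative sketch only (all names are ours): the wavelet transform itself is assumed to be computed elsewhere, and the input is simply the vector of wavelet coefficients of the observation.

```python
import math

def soft(w, lam):
    """Soft thresholding of a single coefficient."""
    return math.copysign(max(abs(w) - lam, 0.0), w)

def wavelet_detector(coeffs, lam, gamma):
    """Three-stage detector: soft threshold the wavelet coefficients,
    then compare the squared l2 norm of the result against gamma."""
    energy = sum(soft(w, lam) ** 2 for w in coeffs)
    return energy > gamma       # True: signal present; False: pure noise
```

The choice of `gamma` sets the trade-off between false alarms and missed detections, exactly as in a square law detector.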


Chapter 5
Undecimated Wavelet Transforms

5.1 Introduction

In Chapter 3, we introduced several shift-invariant transforms. However, those transforms are nonlinear, i.e., the transform of the sum of two signals is not the sum of the transforms of the individual signals. The wavelet transform that is both linear and shift-invariant is the undecimated wavelet transform.

5.2 Undecimated Discrete Wavelet Transform

5.2.1 A Review

The undecimated discrete wavelet transform has been independently discovered several times, for different purposes and under different names [4, 70, 55, 49, 65], e.g., the shift/translation-invariant wavelet transform, the stationary wavelet transform, or the redundant wavelet transform. The key point is that it is redundant and shift invariant, and it gives a denser approximation to the continuous wavelet transform than the approximation provided by the orthonormal (ON) discrete wavelet transform (DWT). A discussion of the algorithm and its history can be found in [49].

From the filter bank point of view, we keep both the even and the odd downsamples, and further split the lowpass bands. The structure of a two-level undecimated wavelet transform is shown in Figure 3.8. For the inverse, we invert both the even part and the odd part, then average the results, as in Figure 5.1. The two-level undecimated inverse wavelet transform is shown in Figure 5.2.


[Figure 5.1: Two equivalent representations for the building block of the undecimated inverse discrete wavelet transform.]

We can also view the UDWT from the matrix point of view. The undecimated discrete wavelet transform can be written as a matrix multiplication

$$Y = Wy, \qquad (5.1)$$

where $y$ is an $N \times 1$ input vector, $W$ is an $(L+1)N \times N$ matrix, with $L$ the number of levels of decomposition, and $Y$ is the $(L+1)N \times 1$ output vector. Here $W = [W_1, W_2, \ldots, W_L, W_{L+1}]^T$, where each $W_i$ is an $N \times N$ matrix whose columns are circularly shifted versions of a single vector $w_i$, which is the usual discrete wavelet transform (DWT) basis at the $i$th scale ($i = 1$ for the finest scale), and $w_{L+1}$ is the scaling function at the coarsest scale.

There are many inverse transforms, and a balanced one is given by

$$M = \left[\tfrac{1}{2}W_1, \tfrac{1}{2^2}W_2, \ldots, \tfrac{1}{2^L}W_L, \tfrac{1}{2^L}W_{L+1}\right], \qquad (5.2)$$

where the factors $(\tfrac{1}{2}, \tfrac{1}{4}, \ldots, \tfrac{1}{2^L}, \tfrac{1}{2^L})$ offset the increasing redundancy of the UDWT as the scale becomes coarser. Direct multiplication in (5.1) requires $O((L+1)N^2)$ operations. Fortunately, a fast algorithm exists, so that the total number of operations is $O(LN)$, which is at most $O(N\log N)$.
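The fast algorithm has the familiar "à trous" structure: at each level the filters are upsampled by inserting zeros, and no downsampling takes place, so every band keeps the full length N. A minimal sketch in pure Python, assuming periodic (circular) extension; the function names are ours:

```python
def circ_conv(x, h):
    """Circular convolution of a length-N signal with a (shorter) filter."""
    N = len(x)
    return [sum(h[k] * x[(n - k) % N] for k in range(len(h)))
            for n in range(N)]

def upsample(h, step):
    """Insert step-1 zeros between the taps of h (the 'holes')."""
    out = []
    for c in h[:-1]:
        out.append(c)
        out.extend([0.0] * (step - 1))
    out.append(h[-1])
    return out

def udwt(x, h, g, levels):
    """Undecimated DWT with lowpass h and highpass g: returns `levels`
    detail bands plus the final approximation band, each of length N."""
    bands, a = [], list(x)
    for i in range(levels):
        bands.append(circ_conv(a, upsample(g, 2 ** i)))   # detail
        a = circ_conv(a, upsample(h, 2 ** i))             # approximation
    bands.append(a)
    return bands
```

Because every stage is a circular convolution, the transform is shift-invariant: circularly shifting the input circularly shifts every output band by the same amount.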


[Figure 5.2: Diagram for the two-level undecimated inverse discrete wavelet transform.]

5.2.2 The Computational Complexity of the UDWT

For an $L$-level UDWT, since we do not downsample, the total complexity increases linearly with $L$:

$$M_{\mathrm{UDWT}}(N, M, L) = MNL, \qquad (5.3)$$
$$A_{\mathrm{UDWT}}(N, M, L) = MNL. \qquad (5.4)$$

The number of levels $L$ is bounded above by $\log_2 N$, so

$$M_{\mathrm{UDWT}}(N, M, L) \le MN\log_2 N, \qquad (5.5)$$
$$A_{\mathrm{UDWT}}(N, M, L) \le MN\log_2 N. \qquad (5.6)$$


The complexity of the inverse transform is the same.

We need to point out that the DWT and the UDWT are themselves sequences of convolutions. Depending on the practical situation, we might implement those convolutions using other fast algorithms such as the FFT, e.g., for the cosine modulated orthogonal wavelet transform [35].

5.2.3 Multiresolution for the UDWT

One key observation on (2.14) is

$$\phi(x - \tfrac{1}{2}) = 2\sum_k h_k\, \phi(2(x - \tfrac{1}{2}) - k) = 2\sum_k h_k\, \phi(2x - k - 1). \qquad (5.7)$$

So

$$D_0 R_1 \subset D_1 R_1 T_{\frac{1}{2}}. \qquad (5.8)$$

Combined with (2.20), we have

$$D_0 R_1 \subset R_1 D_1. \qquad (5.9)$$

In general,

$$D_i R_1 \subset D_j R_1 T_{\frac{k}{2^{j-i}}}, \qquad i < j,\; k = 0, \ldots, 2^{j-i} - 1. \qquad (5.10)$$

Instead of having one fixed way to split a space as in (2.19), we have two ways:

$$D_{j-1}R_1 = D_j R_1 \oplus D_j R_1 = D_j R_1 T_{\frac{1}{2}} \oplus D_j R_1 T_{\frac{1}{2}}. \qquad (5.11)$$

Repeatedly using this, we decompose the space into a set of very redundant subspaces.

5.3 Undecimated Discrete Wavelet Packet Transform

We can further repeat the filtering on the highpass band, keeping both the even and the odd downsamples at the same time. The resulting transform is the complete undecimated


[Figure 5.3: Diagram for the two-level undecimated inverse discrete wavelet packet transform.]


discrete wavelet packet transform. The diagram for the forward transform is in Figure 3.14, and the diagram for the inverse transform is in Figure 5.3.

The computational complexity of the undecimated discrete wavelet packet transform is

$$M_{\mathrm{UDWPT}}(N, L, M) = M(2^{L+1} - 1)N, \qquad (5.12)$$
$$A_{\mathrm{UDWPT}}(N, L, M) = M(2^{L+1} - 1)N. \qquad (5.13)$$

The inverse transform has the same complexity.

5.4 Summary

In this chapter, we introduced the undecimated wavelet transforms. They are linear, shift-invariant, redundant, and undecimated. The drawbacks are the lack of orthogonality, the increased computational complexity, and the increased size of the outputs.
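The two operation counts are easy to compare numerically. A small illustration of (5.3) and (5.12), with illustrative parameter values of our choosing:

```python
def udwt_mults(N, M, L):
    """Multiplications for an L-level UDWT with length-M filters, (5.3)."""
    return M * N * L

def udwpt_mults(N, M, L):
    """Multiplications for the complete undecimated wavelet packet
    transform, (5.12): every band is split, giving 2**(L+1) - 1 filterings."""
    return M * (2 ** (L + 1) - 1) * N
```

For N = 1024, M = 8 and L = 5, the UDWT needs 40960 multiplications while the complete packet transform needs 516096, the price of splitting every band rather than only the lowpass branch.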


Chapter 6
Applications of Undecimated Wavelet Transforms

6.1 Introduction

In the previous chapter, we introduced the undecimated wavelet transforms. They are linear, shift-invariant, redundant, and undecimated. In this chapter, we discuss several applications of the undecimated wavelet transform.

6.2 Convolution using the UDWT

6.2.1 Introduction

Convolution is the fundamental operation of linear system theory, and discrete convolution is one of the most widely used digital signal processing operations. Finite impulse response (FIR) digital filters are designed to be convolved with input signals to achieve certain effects, and the fast Fourier transform (FFT) is commonly used to implement convolution. Therefore any scheme that can speed up the convolution process is theoretically interesting and practically important.

The Fourier transform, the Laplace transform and the Z-transform all have similar convolution theorems, which relate a signal domain convolution to a transform domain product. The wavelet transform is a powerful new mathematical tool, and it would be desirable to have a similar convolution theorem for it. It has been shown, however, that the continuous wavelet transform cannot admit a Fourier-type convolution theorem [53], although we can convolve two signals by directly convolving the subband signals and combining the results [77, 66]. Neither of the above answers


is quite satisfactory. The lack of shift invariance is one of the reasons for the nonexistence of a wavelet convolution theorem of the Fourier type.

The undecimated discrete wavelet transform (UDWT) is linear and shift invariant, so it can be used to implement convolution. The computational complexity of the UDWT is $O(N\log N)$, which is of the same order as that of the FFT. In this section, we propose a scheme to implement convolution using the UDWT, and study its advantages and limitations.

6.2.2 The Scheme

[Figure 6.1: The diagram of the convolution using the UDWT: $y \to W \to Wy \to D \to DWy \to M \to MDWy$.]

Let $d$ be a $1 \times (L+1)$ vector, and let $o$ be a $1 \times N$ vector of ones; then the Kronecker tensor product $k = d \otimes o$ is a vector of length $(L+1)N$. Let $D$ be a diagonal matrix with $k$ on the diagonal; then $MDW$ is a linear shift-invariant operator^5, i.e.,

$$MDW(ax + by) = aMDWx + bMDWy, \qquad (6.1)$$

and

$$MDW\,S(x, I) = S(MDWx, I), \qquad (6.2)$$

where $S(x, I)$ denotes the circular shift of $x$ by $I$ units. So $MDW$ performs a circular convolution,

$$x * h = MDWx, \qquad (6.3)$$

^5 We only consider circular shifts in this section.


where $h$ is determined by $d$ and the QMF. Thus we have a scheme that implements convolution using the UDWT. We can further merge the scaling factors $(\tfrac{1}{2}, \tfrac{1}{4}, \ldots, \tfrac{1}{2^L}, \tfrac{1}{2^L})$ of $M$ into the diagonal matrix $D$, and simplify our method to $W^T\tilde{D}W$.

Our UDWT based convolution method has a form similar to the convolution theorem of the Fourier transform, i.e., a transform domain product. However, that is only true for one of the input signals ($x$); the other input $h$ is buried in the QMF and $d$. Equivalently, we can say that $h$ is transformed to $d$ through some other transformation involving the QMF. This is not surprising, since a wavelet transform with an exact Fourier-type convolution theorem does not exist [53]. Another drawback is that, as we will discuss in Section 6.2.3, only certain types of convolutions can be implemented using the UDWT.

Note that $W^TDW$ can be interpreted as a scale weighting operation.

6.2.3 Limitations

It is natural to ask whether an arbitrary circular convolution can be implemented this way. The degree of freedom of our scheme is $L + 1 + (M/2 - 1) = L + M/2$, since $d$ is free and we have $M/2 - 1$ degrees of freedom in choosing the QMF. Since $M \le N$^6 and $L \le \log_2 N$, the total degree of freedom of the scheme is upper bounded by $\log_2 N + N/2$, which is less than $N$. Clearly, not all circular convolutions can be implemented using the UDWT.

To characterize those convolutions that can be implemented using the UDWT, we need to study our scheme in greater detail. Let $h$ denote the convolution kernel for $W^TDW$, i.e., $x * h = W^TDWx$. We can show that

$$h = \sum_{i=1}^{L+1} d_i a_i, \qquad (6.4)$$

where $d_i$ is the effective weighting factor for the $i$th scale, and $a_i$ is the autocorrelation sequence of $w_i$. So the filter that we actually use is a weighted combination of the

^6 We assume the filter is shorter than the signal.


autocorrelation sequences of the wavelets on different scales. Since the autocorrelation sequences are symmetric and of odd length^7, $h$ must also be symmetric and of odd length, i.e., a type-I filter in [64]. So the degree of freedom of our scheme is upper bounded by $N/2 + 1$. This bound is tighter than the bound we previously had.

It is thus important to ask whether all odd-length symmetric filters can be implemented using the UDWT, i.e., whether the bound $N/2 + 1$ is tight. For a given $h$, we need to solve a set of nonlinear equations to get the QMF and $d$. These equations are not straightforward to solve, so one might instead design filters that have the structure of (6.4). Since all QMFs can be parameterized [76], the design task is an unconstrained nonlinear optimization problem, for which general tools exist [16].

6.2.4 Computational Complexity of Convolution Algorithms

Because of the structure, the multiplication by $D$ can be merged into the UDWT, and does not require any additional computations. So the total numbers of multiplications and additions needed to perform our circular convolution scheme are

$$M_{\mathrm{UDWTConv}}(N, M, L) = 2MNL, \qquad (6.5)$$

and

$$A_{\mathrm{UDWTConv}}(N, M, L) = 2MNL. \qquad (6.6)$$

Straightforward convolution requires

$$M_{\mathrm{Conv}}(N, L(h)) = NL(h), \qquad (6.7)$$

and

$$A_{\mathrm{Conv}}(N, L(h)) = NL(h) \qquad (6.8)$$

operations, where $L(h)$ is the length of $h$.

^7 Ignoring the circular nature of our autocorrelation at this moment.
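The structural claim of (6.4), namely that any realizable kernel is a weighted sum of scale autocorrelations and hence symmetric, is easy to check numerically. A small sketch with arbitrary illustrative vectors standing in for the wavelet sequences $w_i$ (the vectors and weights below are ours, chosen only for demonstration):

```python
def circ_autocorr(w, N):
    """Periodized autocorrelation a[k] = sum_n w[n] * w[(n + k) mod N]."""
    x = (list(w) + [0.0] * N)[:N]       # zero-pad w to length N
    return [sum(x[n] * x[(n + k) % N] for n in range(N)) for k in range(N)]

def combined_kernel(wavelets, d, N):
    """Equation (6.4): the realizable kernel is a weighted sum of the
    autocorrelation sequences of the wavelets at each scale."""
    h = [0.0] * N
    for di, wi in zip(d, wavelets):
        a = circ_autocorr(wi, N)
        for k in range(N):
            h[k] += di * a[k]
    return h
```

Whatever the weights, the resulting kernel always satisfies h[k] = h[(N − k) mod N], the circular analogue of the symmetric, odd-length (type-I) constraint.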


For straightforward FFT based convolution,

$$M_{\mathrm{FFTConv}}(N) = N\log_2 N + N, \qquad (6.9)$$

and

$$A_{\mathrm{FFTConv}}(N) = \tfrac{7}{2}N\log_2 N. \qquad (6.10)$$

In order for our algorithm to be computationally the most efficient, we need^8

$$L(h) > 2ML, \qquad (6.11)$$

and

$$N > 2^{\frac{8}{9}ML}. \qquad (6.12)$$

For sufficiently long $x$, (6.12) is easily satisfied. We will show in Section 6.2.5 that we can generate a rather long filter from a short QMF and a few levels of decomposition, such that (6.11) holds. For these cases, we indeed have a fast algorithm to implement the convolution.

6.2.5 Size Property of the Autocorrelation Sequences of the DWT Basis

Let $L(s)$ denote the length/support of the vector $s$. Since the DWT basis sequences at different levels are related by up-sampling and convolution, it can be shown that^9

$$L(w_i) = 2L(w_{i-1}) + M - 2, \qquad L(w_1) = M. \qquad (6.13)$$

The length of the autocorrelation function and the length of the DWT basis are related by

$$L(a_i) = 2L(w_i) - 1. \qquad (6.14)$$

Solving the difference equation (6.13), we have

$$L(w_L) = 2^L(M - 1) - M + 2, \qquad (6.15)$$

^8 We compare the total number of multiplications and additions.
^9 Ignoring the warping caused by the circular convolution.


and

$$L(a_L) = 2^{L+1}(M - 1) - 2M + 3. \qquad (6.16)$$

So $L(a_L)$ grows exponentially with $L$^10, and thus $L(h)$ also grows exponentially with $L$, with a growth rate independent of $M$. We can see that (6.11), which only requires a linear growth rate with $L$, is easily satisfied. If we measure the efficiency of our algorithm by the ratio between the computational complexity of other algorithms and the complexity of our UDWT based algorithm, we can conclude that the efficiency grows logarithmically with $N$ and exponentially with $L$. We need to emphasize that the filters we can efficiently implement must have the structure of (6.4). There might exist other efficient algorithms to implement such filters; however, we are not aware of any. Also, more sophisticated FFT based algorithms might be competitive. Thus, further investigation is needed.

6.2.6 Example

As an example, we would like to design a lowpass FIR filter with a cutoff frequency at $0.125\pi$. For simplicity, we only minimize the $\ell_2$ difference between the ideal filter and the designed filter. We use a three-level UDWT implementation, and length-6 QMFs on each level. In order to gain more freedom, we use different QMFs on different levels; this does not change the total complexity of the implementation. We need to find three lattice parameters on each level, and four weighting coefficients. The unconstrained optimization routine in [16] was used to find the 14 unknowns that minimize the $\ell_2$ difference between the ideal filter and the designed filter; the resulting^11 QMF coefficients^12 and weighting vector are shown in Tables 6.1 and 6.2, respectively. The frequency response of the final filter is shown in Figure 6.2(a).

^10 We assume $N$ is very large, since $L(a_L) < N$.
^11 One local minimum.
^12 The constraint on the sum of squares of the QMF coefficients is relaxed, and it does not affect the problem.
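The closed forms (6.15) and (6.16) follow from the recurrence (6.13); a quick numerical cross-check (function names are ours):

```python
def basis_length(M, L):
    """Recurrence (6.13): L(w_i) = 2*L(w_{i-1}) + M - 2, with L(w_1) = M."""
    Lw = M
    for _ in range(L - 1):
        Lw = 2 * Lw + M - 2
    return Lw

def closed_form_w(M, L):
    """Closed form (6.15) for the basis length at the coarsest scale."""
    return 2 ** L * (M - 1) - M + 2

def closed_form_a(M, L):
    """Closed form (6.16), via L(a_L) = 2*L(w_L) - 1 from (6.14)."""
    return 2 ** (L + 1) * (M - 1) - 2 * M + 3
```

The exponential growth in L (and the merely linear requirement of (6.11)) is visible directly: for M = 6, the basis length goes 6, 16, 36, 76, ... as L increases.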


The resulting filter has the same $\ell_2$ error as a length-53 optimal FIR filter, whose frequency response is shown in Figure 6.2(b). Comparing Figures 6.2(a) and 6.2(b), we can see that the new filter has a slightly higher overshoot, but its frequency response in the stopband is much better. The computational complexity of the new filter is the same as that of a length-36 FIR filter. For comparison, we also plot the frequency response of a length-36 optimal FIR lowpass filter in Figure 6.2(c). Clearly, for this example, we can efficiently implement a better filter using the UDWT.

Table 6.1 The QMF coefficients for the designed lowpass filter.

level 1: 0.618744  1.023193  -0.310055   0.135727  -0.135082   0.081686
level 2: 0.527011  0.760529   0.341187  -0.193336  -0.068969   0.047792
level 3: 0.012580  0.537700   0.803249   0.213485  -0.156461   0.003661

Table 6.2 The weighting coefficients for the designed autocorrelation sequences.

i:    1          2          3          4
d_i:  0.201496  -0.121795  -0.001087  -0.000692

The frequency responses of the four autocorrelation sequences on different levels are shown in Figure 6.2(d). Since the Fourier transform is linear, the frequency response in Figure 6.2(a) is the weighted combination of the frequency responses of the autocorrelation sequences, with the weights of Table 6.2.

During our limited experiments, we observed that the UDWT implementation is very efficient when the lowpass bandwidth is $\frac{\pi}{2^N}$, and the efficiency grows as the passband gets narrower. This may be due to the tree structure of the wavelet transform, where we successively split the lowpass band. For other cutoff frequencies or


[Figure 6.2: Frequency responses of lowpass filters with cutoff frequency at $0.125\pi$; each panel shows amplitude and magnitude in dB versus $\omega/\pi$. (a) The designed filter that can be efficiently implemented using the UDWT. (b) The length-53 optimal filter that has the same $\ell_2$ error as the filter in (a). (c) The length-36 optimal filter that has the same computational complexity as the filter in (a). (d) The frequency responses of the four autocorrelation sequences.]


other filter types, the undecimated wavelet packet transform or the M-band undecimated wavelet transforms might be helpful. Multirate implementations of narrowband lowpass filters have been studied in [11]; however, the implementations there are not strictly shift invariant. We also need to compare with the interpolated FIR filter [62], which is an efficient technique for the design and implementation of narrowband lowpass filters.

6.2.7 Summary

We have proposed a novel method to implement convolution using the undecimated wavelet transform. Similar to the convolution theorem of the Fourier transform, the UDWT based convolution has the form of a transform domain product. The filters that can be implemented using the UDWT are completely characterized. We also show that in certain cases our method is more efficient than traditional convolution algorithms.

6.3 Compression using the UDWT

6.3.1 Quantization of the Coefficients of the UDWT

The rows of the undecimated wavelet transformation matrix constitute a frame of $R^N$. It is well known [13, 3] that a frame representation is robust against additive white noise in the transform domain, e.g., quantization noise. Here we evaluate the robustness of the undecimated wavelet transform against quantization noise. Note that in this section we only consider the error introduced by quantization.

When the step size of the uniform quantizer is small, i.e., fine quantization, the quantization noise can be modeled by additive white and uniformly distributed random variables [47], with zero mean and variance $\Delta^2/12$, where $\Delta$ is the step size of the quantizer. We measure the error by $E(W, x) = E\left[\|MQ(Wx) - x\|_2^2\right]$. Under the


above assumptions, we can show that

$$E(W_{\mathrm{UDWT}}, x) \approx \tfrac{1}{3}E(W_{\mathrm{DWT}}, x), \qquad (6.17)$$

which indicates that, for the same amount of total quantization error, the step size for the undecimated wavelet transform can be $\sqrt{3}$ times the step size of the orthogonal transform. Thus the number of bits needed per coefficient is reduced. However, the total number of coefficients increases by a factor of $L + 1$. In practice, we need to find the proper amount of redundancy, such that the transform is redundant enough to be robust against noise, but not so redundant that it increases the total rate at a specified distortion. We also point out that the coefficients of the UDWT are correlated, so we can exploit the correlation to code the quantized coefficients with fewer bits.

6.4 Denoising in Redundant Basis

6.4.1 The Method

[Figure 6.3: The diagram of the denoising using the UDWT: $y \to W \to Y \to T_\lambda \to \hat{X} \to M \to \hat{x}$.]

Assume we have a signal in noise,

$$y_i = x_i + n_i, \qquad i = 1, \ldots, N, \qquad (6.18)$$

where $x_i$ is the original signal, $n_i$ is i.i.d. white Gaussian noise with unit variance, and $y_i$ is the observation. Let $W$ denote the redundant wavelet transformation matrix; we have

$$Wy = Wx + Wn, \qquad (6.19)$$


or

$$Y_i = X_i + N_i, \qquad i = 1, \ldots, N. \qquad (6.20)$$

The inverse redundant wavelet transformation matrix $M$ satisfies

$$MW = I. \qquad (6.21)$$

$N_i$ is colored Gaussian noise with correlation matrix $WM$. We consider a diagonal projection estimator of the form $\hat{X}_i = \delta_i Y_i$, with $\delta_i = 0$ or $1$. This is called the keep-or-kill, or hard thresholding, rule. We have

$$\hat{x} = M\hat{X} = M\Delta Y = M\Delta Wy, \qquad (6.22)$$

where $\Delta = \mathrm{diag}(\delta_i)$.

6.4.2 The Ideal Risk

Under the above assumptions, the ideal diagonal projection estimator is obtained by setting

$$\delta_i = \begin{cases} 1, & |X_i| \ge 1 \\ 0, & |X_i| < 1. \end{cases} \qquad (6.23)$$

For convenience, we define the sets $A = \{i : |X_i| \ge 1\}$ and $B = \{i : |X_i| < 1\}$. Define

$$S_i \stackrel{\mathrm{def}}{=} \hat{X}_i - X_i = \delta_i Y_i - X_i = \begin{cases} N_i, & i \in A \\ -X_i, & i \in B. \end{cases} \qquad (6.24)$$

The ideal risk is

$$R_{\mathrm{RDWT}}(X) = E\|\hat{x} - x\|_2^2 = E\|M\hat{X} - MX\|_2^2 = E\|M(\hat{X} - X)\|_2^2 = \sum_{i,j \in A}\langle W_i, W_j\rangle\langle M_i, M_j\rangle + \sum_{i,j \in B}X_iX_j\langle M_i, M_j\rangle, \qquad (6.25)$$

where $W_i$ and $M_i$ are the $i$th columns of $W$ and $M$, respectively. We have also used the fact that $E[N_i] = 0$.
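The keep-or-kill rule (6.23) is a one-liner. The sketch below implements only the oracle part (it needs the true coefficients X, so in practice the decision would instead be made by thresholding Y itself); the function name is ours:

```python
def ideal_diagonal_projection(X, Y):
    """Oracle keep-or-kill (6.23), for unit noise variance: keep the
    observed coefficient when the true coefficient is above the noise
    level, kill it otherwise."""
    return [y if abs(x) >= 1.0 else 0.0 for x, y in zip(X, Y)]
```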


For a fixed function, we can compare the ideal risk of the wavelet estimator as in Eqn. (4.43) with that of the redundant wavelet estimator as in Eqn. (6.22). For several standard examples, the ideal risks of the redundant wavelet estimator are $\tfrac{1}{2}$ to $\tfrac{2}{3}$ of the ideal risks of the wavelet estimator. This indicates the advantage of using the redundant wavelet transform.

Our extensive simulations have shown that UDWT based denoising significantly improves the denoising performance [49], compared with the classical wavelet shrinkage method of Donoho and Johnstone. Especially when the signal-to-noise ratio is low, our method improves the performance by an order of magnitude [49].

6.5 Compression and Denoising using the UDWT

We have shown in Sections 6.3 and 6.4 that the undecimated wavelet transform is very powerful for either denoising or compression. Based on these results, we propose a robust joint denoising and compression method using the undecimated wavelet transform. The procedure is the same as in Section 4.5, but the orthogonal wavelet transform is replaced by the undecimated wavelet transform.

Following the same example as in Section 4.5, the UDWT denoised signal is shown in Figure 6.4(a), where the quantizer is as in Figure 4.18(d). The compressed signal is shown in Figure 6.4(b); the entropy of the quantized UDWT coefficients is 7.6079 bits per input point, and the risk per point is 0.0608. The rate-risk curve is shown in Figure 6.5, which shows that the lowest risk is obtained by joint denoising and compression of the UDWT coefficients.


[Figure 6.4: Example of joint denoising and compression using the undecimated wavelet transform; the original signals are in Figure 4.19. (a) Denoising using the UDWT. (b) Compression and denoising using the UDWT.]

[Figure 6.5: The rate-risk curve of joint denoising and compression using the undecimated wavelet transform: rate (bits per point) versus risk (expected mean square error). Compression of the noisy signal (dashed curve); undecimated wavelet denoising and compression (solid curve).]


6.6 Conclusion and Future Work

The undecimated wavelet transforms are very powerful, and deserve much more attention. One open problem is how to control the degree of redundancy: we want enough redundancy to be robust, but not too much, in order to save computation. Can we extend the best basis paradigm to the best redundant frame? Much research remains to be done to answer this question.


Chapter 7
Conclusions

7.1 Summary of the Work

In this thesis, we have developed several shift-invariant and time-varying orthogonal wavelet transforms. These algorithms are shown to be powerful and efficient for applications such as signal analysis, data compression, denoising, and detection.

We also studied the undecimated wavelet transform, and proposed some novel applications in convolution, compression, and denoising.

7.2 What Are the Problems that Wavelets Are Good For?

Over the years, people have been asking the question, "What are the problems that wavelets are good for?" In this thesis, we have studied several such problems. We now summarize some properties of the problems for which wavelets are the right tool:

- The problem must have a sparse representation in the wavelet domain. The optimality of wavelet based denoising, compression and detection relies on this property.

- There must be multiscale activity, as we have shown in several places in the thesis. The multiresolution property plays a major role in many of the applications.


- Only the amplitudes of the wavelet coefficients are important for the problem. The major difference of wavelet analysis, compared to Fourier analysis, is that it represents information in amplitude, not in phase or frequency.

7.3 Future Directions

Although modern wavelet theory has a history of less than a decade, its great potential has already been widely recognized. However, many questions are still open:

- Can we develop a complete multiscale system theory that can also embody nonlinear processing in the wavelet domain?

- Can we use knowledge of the structure or the model of the problem to further improve the performance, especially in low signal-to-noise-ratio situations?

- Can we develop higher level grammatical representations to describe the multiscale dynamic behaviors that often occur in real world situations?

It may take another decade to answer these questions.

