
Random projection and orthonormality for lossy image compression


Image and Vision Computing 25 (2007) 754–766


José J. Amador

National Aeronautics and Space Administration, John F. Kennedy Space Center, KSC, FL 32899, USA

Received 14 January 2005; received in revised form 14 February 2006; accepted 31 May 2006

doi:10.1016/j.imavis.2006.05.018
E-mail address: [email protected]

Abstract

There exist many lossy image compression techniques, some of which are based on dimensionality reduction. In this paper, a method for lossy image compression is introduced which utilizes the dimensionality reduction technique known as Random Projection. Random Projection has proven itself an effective technique for reducing the dimensionality of data, particularly when the dimensionality d is moderately high (e.g., d < 1500). Image columns or rows are treated as vectors in feature space, which are thereby reduced in size to a user-specified dimension k, where k ≪ d. The condition of orthonormality is utilized, thereby establishing a technique applicable to image compression. Although the compression is lossy, experiments indicate that the recovered image is effectively restored. Visual data is shown in the form of comparisons between original and recovered images. Quantitative data includes the compression ratio achieved, the peak signal-to-noise ratio, and the root mean square error.

Published by Elsevier B.V.

Keywords: Image compression; Random projection; Transform image coding

1. Introduction

Digital image processing generally creates a significant number of very large files containing digital image data. These must be archived or exchanged between numerous users and systems. Consequently, efficient methods are needed for the storage and transfer of digital image data files [1]. Since digital images are very data intensive, reducing their size produces files that are far more manageable. By eliminating redundant or unnecessary information, image compression is the activity that addresses this aim.

Image files contain a substantial amount of information that is redundant, and more still that is irrelevant, thus making them prime candidates for contemporary compression techniques. Compression techniques exploit redundancy and irrelevancy by transforming the data file into a smaller one, from which the original image file can later be recovered. The ratio between the original and compressed data files, called the compression ratio, identifies the degree of compaction attained.

Some data compression algorithms are known as lossless, while others are lossy. A lossless algorithm only eliminates redundant information, so the data can be recovered exactly upon decompression of the file. Alternatively, lossy algorithms eliminate both redundancy and irrelevancy, hence allowing only an approximate reconstruction of the original data. As intuitively expected, lossy algorithms achieve higher compaction ratios. For images, a slight loss of fidelity is often acceptable in exchange for a higher degree of compaction.

File compression and decompression time requirements are not insignificant. Intuitively, the algorithms achieving the best compaction are usually not the fastest; accordingly, choices must be made for each circumstance. Some compression programs offer users the choice of lossless or lossy operation, weighing speed against compression ratio. Ultimately, lossy algorithms are usually the method of choice for compressing image data [1].

This paper is related to the work of Bingham and Mannila [2], which considers using sinusoidal kernels for Random Projection (RP). However, our research differs in two distinct aspects. First, Bingham and Mannila [2] use RP to show that projecting high-dimensional data onto a lower-dimensional subspace does not distort the data significantly (i.e., the distance between two dimensionality-reduced data vectors is comparable to their Euclidean distance in the original high-dimensional space). This was shown useful for problems of image registration and noise reduction. Our work focuses on image compression, which was not extensively covered in [2]. Second, Bingham and Mannila [2] state that RP is applicable where similarity between data vectors should be preserved under dimensionality reduction, but where the data is not intended for visualization by the human eye. Alternatively, we provide a paradigm where sinusoidal kernels and RP are successful in providing compression and recovery of images.

1.1. Prior art

Recent work in the field has focused on dimensionality reduction techniques to provide a high degree of compaction [3–11]. These are lossy algorithms that are quantitatively shown to discard redundant and irrelevant information while still providing a subjectively acceptable recovered image. Their unifying theme: methods originally developed for the task of dimensionality reduction. Dimensionality reduction techniques have the capability of reducing high-dimensional data into a lower-dimensional model such that its properties (i.e., distances between samples) are preserved. Methods such as Principal Component Analysis (PCA) [3–5], Singular Value Decomposition (SVD) [6–8], and the Discrete Cosine Transform (DCT) [9–11] have shown themselves applicable to the image compression problem.

In [3–5] PCA was utilized for developing image compression schemes. For compressing video streams, PCA is part of a new coding technique based on estimating orthonormal images onto which video images are projected [3]. Hyperspectral image data is compacted for subsequent classification by implementing a Gauss–Markov Random Field as a model of the observed data [4]; maximum a posteriori (MAP) estimation then compresses and reconstructs the data with PCA. To overcome the loss of discriminant information when compressing hyperspectral images, a new method applies linear feature extraction followed by PCA, thereby compressing an image whose discriminant features are enhanced [5]. Experiments indicate slightly better classification accuracies when compared to the typical PCA method.

Variants of SVD are considered for image compression utilizing the Karhunen–Loève Transform (KLT) [6], semi-discrete decomposition [7,12], and an adaptive SVD [8]. The hybrid of KLT and SVD exploits variations in local statistics within the image; a switching system selects between KLT and SVD using a global rate-distortion criterion [6]. A variant of SVD known as semi-discrete decomposition (SDD) [12] restricts elements of the outer product vectors to 0, 1, or −1. This allows storing approximations of high rank for the same amount of space [7]. Another variation adaptively selects appropriate singular values by specifying a percentage sum of the singular values instead of a fixed number [8]. This scheme attempts to achieve higher compression ratios for "complex" images, whereas lower compression ratios are obtained for "simple" images.

The DCT has also been used in image compression for multimedia, medical imagery, and remote sensing applications [9–11]. A global DCT uses data truncation combined with run-length and Huffman coding [9]: after globally transforming the image, data truncation discards low-energy coefficients, followed by entropy encoding with run-length and Huffman techniques. The encoding preserves the spatial correlation of DCT coefficients as well as eliminating statistical redundancies in the run-lengths. Sonogram images are compressed by a DCT designed as a bandpass filter [10]. The bandpass filter decomposes sub-blocks of the image into equal-sized bands, and the bands are gathered using a high-similarity property that provides the compaction. Remote sensing images are compressed by both a DCT and a discrete wavelet transform (DWT) methodology [11]. For both the DCT and DWT techniques, the quantization process normalizes the JPEG quantization matrix, and the quantized coefficients are then run-length coded. Empirically, neither technique was shown to be better than the other when considering all types of remote sensing images.

1.2. Proposed method and outline

An intention of this paper is to establish a method utilizing RP, previously deemed incapable of successfully solving the image compression problem [2]. Random Projection, devised here under conditions of orthonormality, is a dimensionality reduction technique that is deceptively simple, yet very powerful. Like PCA and SVD, RP is based on matrix computations; however, the calculations do not require expensive eigenvalue decomposition of covariance matrices. Instead, a random matrix R is formed which projects the d × N data matrix X into k dimensions; the entire operation is O(dkN). Thus, the image is defined as the data matrix X, and compression is the random mapping of the original d × N image to a k-dimensional, or k × N, image size. For decompression, the random mapping is inverted using the pseudo-inverse of R or its transpose.

The remainder of the paper is organized as follows: Section 2 gives a detailed discussion of RP and how it is applied to the image compression problem; Section 3 covers the approach taken by this research; an extensive empirical study and review of results is given in Section 4; Section 5 offers a summary and conclusions.

2. Random projection model

In many applications, the high dimensionality of data restricts the choice of data processing techniques. Such application areas include genetic profile matching, text documents, and image data. In each of these cases the dimensionality is considerably large. A statistically optimal way of reducing dimensionality is to project the data onto a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible. One of the best and most widely used methods is SVD; unfortunately, it is computationally intensive to calculate for high-dimensional data. Highly desirable would be a computationally simple method of dimensionality reduction that does not introduce significant distortion in the data set.

In RP, the original high-dimensional data is projected onto a lower-dimensional subspace using a random matrix whose columns have unit length. RP has been found to be a computationally efficient and sufficiently accurate method for dimensionality reduction of highly dimensional data sets [13–19].

2.1. Fundamentals

Foremost, RP is a linear technique whose cost scales linearly with dimension, simply because the choice of projection does not depend upon the data at all. This is a noteworthy property: although the projection to a lower-dimensional subspace is picked at random, it can be shown that, with high probability, the projected data in that subspace will retain approximately the level of separation of their high-dimensional counterparts. The use of randomness may seem suboptimal; it should be possible to achieve better results by considering the data. Nonetheless, no deterministic procedures have the performance guarantees that will be presented. Indeed, randomness is now a standard tool in algorithm design; randomized algorithms for primality testing, for example, were in wide use long before a deterministic polynomial-time test was found [20].

The concept of RP is straightforward: given a data matrix X, the dimensionality of the data can be reduced by projecting it through the origin onto a lower-dimensional subspace formed by a set of random vectors [14],

$$A_{k \times N} = R_{k \times d} \cdot X_{d \times N}, \qquad (1)$$

where N is the total number of points, d is the original dimension, and k is the desired lower dimension. The central idea of RP is based on the Johnson–Lindenstrauss lemma [21]:

Theorem 1 (Johnson–Lindenstrauss lemma). For any 0 < ε < 1 and any integer n, let k be a positive integer such that

$$k \ge 4\left[(\varepsilon^2/2) - (\varepsilon^3/3)\right]^{-1} \ln n. \qquad (2)$$

Then for any set W of n points in R^d, there is a map f: R^d → R^k such that for all u, v ∈ W,

$$(1-\varepsilon)\,\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\varepsilon)\,\|u-v\|^2. \qquad (3)$$

In words, the lemma states that if points in a vector space are projected onto a randomly selected lower-dimensional subspace, then the distances between points are preserved (i.e., not distorted by more than a factor of (1 ± ε)), for any 0 < ε < 1. For complete proofs of the lemma refer to [22,23]. Furthermore, this map can be found in randomized polynomial time [14].
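To make the lemma concrete, here is a minimal numpy sketch (illustrative only; the values of n, d, and ε are arbitrary choices, not values from the paper) that computes the bound of Eq. (2) and applies a random map as in Eq. (1):

```python
import numpy as np

def jl_dimension(n: int, eps: float) -> int:
    """Smallest k satisfying the bound of Eq. (2)."""
    return int(np.ceil(4.0 / (eps**2 / 2.0 - eps**3 / 3.0) * np.log(n)))

rng = np.random.default_rng(0)
n, d, eps = 1000, 1500, 0.5
k = jl_dimension(n, eps)                      # 332 for these values

X = rng.standard_normal((d, n))               # d x N data matrix, one point per column
R = rng.standard_normal((k, d)) / np.sqrt(k)  # zero-mean entries, scaled to preserve norms
A = R @ X                                     # Eq. (1): the k x N projected data

# Squared distances should be preserved to within (1 +/- eps), per Eq. (3).
u, v = X[:, 0], X[:, 1]
ratio = np.sum((A[:, 0] - A[:, 1]) ** 2) / np.sum((u - v) ** 2)
print(f"k = {k}, squared-distance ratio = {ratio:.3f}")  # close to 1
```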

2.2. Selecting the random matrix

The choice of the random matrix R is one of the crucial points of interest. The elements r_ij of R = {r_ij | 0 ≤ i ≤ k, 0 ≤ j ≤ d}, mapping from original dimension d to desired dimension k, are often Gaussian distributed [16]. In fact, practically any zero-mean, unit-variance distribution of r_ij would give a mapping that satisfies the Johnson–Lindenstrauss lemma. However, given these conditions, Eq. (1) is not a projection, since R is not an orthogonal matrix. For R to be orthogonal, the random matrix must consist of orthogonal column vectors. In applications of text and image data retrieval, the requirement of orthogonality for R is relaxed by the results in [14,24]: in high-dimensional space there exists a much larger number of almost-orthogonal than orthogonal directions. Indeed, vectors having random directions may be sufficiently close to orthogonal, and equivalently R^T R would approximate an identity matrix. Otherwise stated, the random vectors may be close enough to orthogonal to offer a reasonable approximation of the original vectors.

Unfortunately, when the elements of R are Gaussian distributed, the result is poor image compression using RP. The transformation utilized for image compression must result in an image that looks as similar as possible to the original. Recovering an image compressed with RP requires inverting the random mapping. Since R is almost orthogonal, its transpose R^T is deemed a good approximation of the inverse. Consider the sample image shown in Fig. 1. It is compressed at a 1:1 ratio and recovered with RP using an R whose elements are independently chosen from the standard Normal distribution N(0, 1). Notice how the recovered image of Fig. 1(b) is visually poor.

Fig. 1. Baboon image compressed and recovered with RP using R from N(0,1): (a) original image; (b) RP distributed by N(0,1).

To overcome this problem we could find the inverse of R; however, this is computationally intensive, averaging Θ(n³) operations [25]. Furthermore, for non-square matrices the Θ(n³) operations may have to be computed twice, once more to find the left- or right-inverse if the other does not exist. Indeed, there is no guarantee that we will even find R⁻¹ [26]. Instead of expensive computations to find R⁻¹, consider treating R as an orthonormal matrix.

Recall from linear algebra that an orthonormal matrix is a not-necessarily-square matrix, with real or complex entries, whose columns, treated as vectors in R^m or C^m, respectively, are orthonormal with respect to the standard inner product [26]. This means that an m × n matrix G is orthonormal if

$$G^{*T}_{m \times n} \cdot G_{m \times n} = I_{n \times n}, \qquad (4)$$

where G^{*T} denotes the conjugate transpose of G (i.e., complex conjugation of each element followed by the transpose operation), and I is the identity matrix. If the m × n matrix G is orthonormal, then n ≤ m. Thus, real m × n orthonormal matrices are precisely the matrices that result from deleting m − n columns from an orthogonal matrix; complex m × n orthonormal matrices are precisely the matrices that result from deleting m − n columns from a unitary matrix [26,27]. An orthonormal matrix G contains a set of orthonormal basis functions satisfying two properties: first, the property of orthonormality, which minimizes the sum-of-squares error for any truncation of the matrix; second, the property of completeness, which makes the sum-of-squares error vanish when no truncation is performed.
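The practical payoff is that, for an orthonormal R (here with orthonormal rows, since R is k × d with k < d), the transpose can stand in for the pseudo-inverse, so no Θ(n³) inversion is needed at recovery time. A small numpy sanity check of this fact (the sizes are our arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 64, 256
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # d x k with orthonormal columns
R = Q.T                                           # k x d with orthonormal rows

assert np.allclose(R @ R.T, np.eye(k))            # Eq. (4), written in row form
assert np.allclose(np.linalg.pinv(R), R.T)        # transpose equals pseudo-inverse
```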

Therefore, unitary and orthogonal matrices are themselves orthonormal; however, an orthonormal matrix with real elements is orthogonal, whereas an orthonormal matrix with complex elements is not. Since orthogonality is vital to RP, real-orthonormal matrices are desired. These can be obtained from the kernel matrices of certain sinusoidal transforms.

2.2.1. Signal transforms – Kernel matrices

To utilize RP for image compression requires "orthonormalizing" the random matrix R. Specifically, a real-orthonormal matrix is desired; an orthogonal matrix alone, although containing real elements, does not suffice because it must be square, and in practice R is not. To obtain a matrix with real-orthonormal entries, kernel matrices of classical transforms from signal theory are considered. The Discrete Fourier Transform (DFT) is an obvious starting point; consider its kernel matrix W [27] (where the indices correspond to the rows and columns of an image):

$$W = \begin{bmatrix} w_{0,0} & \cdots & w_{0,N-1} \\ \vdots & & \vdots \\ w_{N-1,0} & \cdots & w_{N-1,N-1} \end{bmatrix}, \qquad (5)$$

where

$$w_{i,k} = \frac{1}{\sqrt{N}} \, e^{-j 2 \pi i k / N}. \qquad (6)$$

Because the imaginary exponential is periodic, W is not orthogonal. The complex nature of the elements, even in the high-dimensional space of images, does not lend the DFT kernel matrix to RP. Fig. 2 shows a sample image compressed at a 1:1 ratio and recovered with RP using an R whose elements are distributed by Eq. (6).

Fig. 2. Baboon image compressed and recovered with RP using R from DFT kernel: (a) original image; (b) RP distributed by DFT.

Fortunately, there are several relatives of the DFT that use sinusoidal basis functions and are real-valued in nature, such as the Discrete Cosine, Discrete Sine, and Hartley Transforms.

The Discrete Cosine Transform (DCT) is a real and orthonormal transform; that is, C = C* ⇒ C^{-1} = C^T. In two dimensions it is defined as

$$F_C(m, n) = \alpha(m)\,\alpha(n) \sum_i \sum_j f(i, j) \cos\!\left[\frac{\pi (2i+1) m}{2N}\right] \cos\!\left[\frac{\pi (2j+1) n}{2N}\right], \qquad (7)$$

and the inverse is taken by

$$f(i, j) = \sum_m \sum_n \alpha(m)\,\alpha(n)\, F_C(m, n) \cos\!\left[\frac{\pi (2i+1) m}{2N}\right] \cos\!\left[\frac{\pi (2j+1) n}{2N}\right], \qquad (8)$$

such that the coefficients are

$$\alpha(0) = \sqrt{\frac{1}{N}} \quad \text{and} \quad \alpha(m) = \sqrt{\frac{2}{N}} \ \ \text{for } 1 \le m \le N. \qquad (9)$$

Notice that the forward and inverse transforms of Eqs. (7) and (8), respectively, have the same kernel matrix form, that is,

$$C_{i,m} = \alpha(m) \cos\!\left[\frac{\pi (2i+1) m}{2N}\right]. \qquad (10)$$

The DCT is not the real part of the DFT; it is nearly optimal for signals with high positive correlation between adjacent samples, and it can be computed via the DFT using a Fast Fourier Transform (FFT) algorithm [28,29]. Furthermore, when compared to the DFT, DCT coding produces fewer edge-effect problems, making it very useful for image processing. In Fig. 3 the sample Baboon image is both compressed and recovered at a 2:1 ratio with RP using an R whose elements are distributed from the DCT kernel matrix C.

Fig. 3. Baboon image compressed and recovered with RP using R from DCT kernel matrix C: (a) original image; (b) RP distributed by DCT.
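As an illustration, a minimal numpy sketch of the kernel matrix of Eq. (10), with a check of its orthonormality (the size N = 256 matches the test images used later; the function name is ours):

```python
import numpy as np

def dct_kernel(N: int) -> np.ndarray:
    """Kernel of Eq. (10): C[i, m] = a(m) * cos(pi * (2i + 1) * m / (2N))."""
    i = np.arange(N)[:, None]            # spatial index (rows)
    m = np.arange(N)[None, :]            # frequency index (columns)
    a = np.full(N, np.sqrt(2.0 / N))     # coefficients a(m) of Eq. (9)
    a[0] = np.sqrt(1.0 / N)
    return a * np.cos(np.pi * (2 * i + 1) * m / (2.0 * N))

C = dct_kernel(256)
# Real and orthonormal, so C^{-1} = C^T as stated above.
assert np.allclose(C.T @ C, np.eye(256), atol=1e-10)
```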

The Discrete Sine Transform (DST) is a linear, invertible function F: R → R. The two-dimensional case is defined as

$$F_S(m, n) = \frac{2}{N+1} \sum_i \sum_j f(i, j) \sin\!\left[\frac{\pi (i+1)(m+1)}{N+1}\right] \sin\!\left[\frac{\pi (j+1)(n+1)}{N+1}\right], \qquad (11)$$

and the inverse is

$$f(i, j) = \frac{2}{N+1} \sum_m \sum_n F_S(m, n) \sin\!\left[\frac{\pi (i+1)(m+1)}{N+1}\right] \sin\!\left[\frac{\pi (j+1)(n+1)}{N+1}\right]. \qquad (12)$$

The DST kernel matrix thus has the following form:

$$S_{i,m} = \sqrt{\frac{2}{N+1}} \sin\!\left[\frac{\pi (i+1)(m+1)}{N+1}\right]. \qquad (13)$$

The DST is not the imaginary part of the DFT; it has excellent energy compaction properties for images [28,30]. The DST is a fast transform; there exist versions that are considerably faster than the FFT and fast DCT algorithms [30]. Fig. 4 shows the Baboon image recovered at a 2:1 ratio using RP with an R distributed by the DST kernel matrix S.

Fig. 4. Baboon image compressed and recovered with RP using R from DST kernel matrix S: (a) original image; (b) RP distributed by DST.
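The analogous sketch for Eq. (13); note that this kernel is symmetric, so it is its own inverse (again, the function name and size are our illustrative choices):

```python
import numpy as np

def dst_kernel(N: int) -> np.ndarray:
    """Kernel of Eq. (13): S[i, m] = sqrt(2/(N+1)) * sin(pi (i+1)(m+1) / (N+1))."""
    i = np.arange(N)[:, None]
    m = np.arange(N)[None, :]
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * (i + 1) * (m + 1) / (N + 1))

S = dst_kernel(256)
assert np.allclose(S @ S, np.eye(256), atol=1e-10)  # symmetric and orthonormal
```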

The Discrete Hartley Transform (DHT) is a periodic transform analogous to its continuous integral version [31,32]. It is a linear, invertible function defined in two dimensions as

$$F_T(m, n) = \frac{1}{N} \sum_i \sum_j f(i, j) \operatorname{cas}\!\left[\frac{2\pi}{N}(im + jn)\right], \qquad (14)$$

with inverse

$$f(i, j) = \frac{1}{N} \sum_m \sum_n F_T(m, n) \operatorname{cas}\!\left[\frac{2\pi}{N}(im + jn)\right], \qquad (15)$$

where the basis function cas(θ) is

$$\operatorname{cas}(\theta) = \cos(\theta) + \sin(\theta) = \sqrt{2}\,\cos(\theta - \pi/4), \qquad (16)$$

which is simply a cosine shifted 45° to the right. The kernel matrix is then represented as

$$T_{i,m} = \sqrt{\frac{1}{N}} \operatorname{cas}\!\left(\frac{2\pi i m}{N}\right). \qquad (17)$$

As given, the DHT is closely related to the DFT; the DHT is simply the real part minus the imaginary part of the corresponding Fourier transform [33]. Fig. 5 illustrates the sample image compressed at 2:1 and recovered with RP using an R distributed by T. Although the recovered image is better than that obtained using the DFT, it is still visually poor. While the DHT is a computational alternative to the DFT, it works particularly well only when the kernel matrix is symmetric (i.e., square). Consequently, the result of Fig. 5 indicates that an R generated via the DHT is not entirely orthonormal.

Fig. 5. Baboon image compressed and recovered with RP using R from Hartley kernel matrix T: (a) original image; (b) RP distributed by Hartley.

There are other signal theory transforms that use variations of square waves rather than sinusoids as basis functions; these are known as Rectangular Transforms. Some of the Rectangular Transforms of interest include the Hadamard, Slant, and Haar Transforms [27,34–36]. The Hadamard Transform contains only ±1 as elements in its kernel matrix; the Slant Transform contains basis functions which match the linearly sloping background present in most images; the Haar Transform has a kernel matrix that addresses lines and edges specifically, since its basis functions resemble these features. For all three, the kernel matrices exist for N = 2^n where n is an integer [27]. Consequently, the kernel matrices exist as symmetric only in some cases, and as square in all. This implies that results obtained from these matrices would be similar to those obtained from the DHT. The requirement that N = 2^n makes these transforms suboptimal for forming the random matrix.

Accordingly, the transforms of most interest are the DCT and DST. These lend themselves very well to the orthonormality condition on R, and fast versions of both exist. In Section 4, quantitative results will compare images compressed and recovered with both the DCT and DST. The better one will be chosen for subsequent comparison against other accepted methods of lossy image compression.

3. Image compression and recovery approach

In this section, the forward transformation of the image (i.e., compression via RP) and the recovery (decompression) are reviewed. Additional discussion of the degree of compaction, for storage and transmission of the compressed image, is covered below.


3.1. Compression procedure

For compression of digital image data, let image I of size Nr × Nc, where Nr is the number of rows and Nc is the number of columns, be represented by the data matrix X, such that Nr = d and Nc = N. Now select R to be distributed as Eq. (10) or (13), such that R_{k×d} ⇒ R_{k×Nr}. Hence, Eq. (1) now becomes

$$A_{k \times N_c} = R_{k \times N_r} \cdot X_{N_r \times N_c}. \qquad (18)$$

Fig. 6 depicts the entire image compression and recovery system; the compression procedure is diagrammatically shown in the leftmost block, labeled 'Compression.'

The projection, or compression, occurs as follows. Recalling Theorem 1, let u = [u_1, ..., u_d]^T be a vector, taken from a column of image I, in d-dimensional Euclidean space. Let k be the desired dimension, where k ≪ d. Project (compress) u to a k-dimensional Euclidean space by first selecting the k-dimensional subspace represented by a k × d matrix R. Thus, to project, first choose a random orthonormal matrix R, then multiply R and u, and subsequently scale the resulting vector. This results in the projected vector v, which is simply

$$v = \sqrt{\frac{d}{k}}\, R u. \qquad (19)$$

The scaling term √(d/k) takes into account the decrease in the dimensionality of the data, in accordance with the Johnson–Lindenstrauss lemma.

3.2. Decompression procedure

To visualize an image compressed by RP, the random mapping must be reversed. The pseudo-inverse of R must be obtained in this case; unfortunately, it is computationally intensive to compute. However, since R is orthonormal, the transpose of R, R^T, is a good approximation of the pseudo-inverse. The image can be recovered as

$$X_{N_r \times N_c} = R^T_{N_r \times k} \cdot A_{k \times N_c}, \qquad (20)$$

where A_{k×Nc} is the result of the random projection of the original image I.
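Decompression is then a single multiplication by the transpose; a sketch paired with the compress function assumed above:

```python
import numpy as np

def decompress(A: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Approximate recovery per Eq. (20), using R^T as the pseudo-inverse."""
    return R.T @ A                               # Nr x Nc reconstruction

# Example round trip on a random 256 x 256 "image":
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(256, 256))
A, R = compress(X, k=64, rng=rng)                # 4:1 storage reduction
X_hat = decompress(A, R)
```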

Fig. 6. RP image compression and decompression system.

Fig. 6 illustrates the decompression procedure in the rightmost block, labeled 'Decompression.'

3.3. Storage and transmission discussion

The amount of storage saved is based on the size of the compressed image, represented by matrix A. Recall that A is of size k × Nc; if the original image size is Nr × Nc, then the percentage of X represented by the size of A is

$$\left(\frac{k \cdot N_c}{N_r \cdot N_c}\right) 100, \qquad (21)$$

which simplifies to

$$\left(\frac{k}{N_r}\right) 100. \qquad (22)$$

From Eq. (22), the compression ratio (CR) for storage is easily obtained as

$$\frac{N_r}{k}. \qquad (23)$$

Clearly the storage saving is a function of dimensionality, original and desired.

For transmission and recovery, the savings must consider both the compressed image A and the random matrix R. Recall that R is of size k × Nr; using the same reasoning, with an image size of Nr × Nc, the percentage of X represented by the sum of A and R is

$$\left(\frac{(k \cdot N_c) + (k \cdot N_r)}{N_r \cdot N_c}\right) 100, \qquad (24)$$

which simplifies to

$$\left(\frac{k}{N_r} + \frac{k}{N_c}\right) 100. \qquad (25)$$

Indeed, transmission of the compressed image and its subsequent recovery requires storing and sending both A and R.
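These bookkeeping formulas are trivial to compute; a sketch with an illustrative 256 × 256 image and k = 64:

```python
def storage_cr(Nr: int, k: int) -> float:
    """Storage compression ratio of Eq. (23)."""
    return Nr / k

def transmission_pct(Nr: int, Nc: int, k: int) -> float:
    """Percentage of the original size when sending A and R, Eq. (25)."""
    return (k / Nr + k / Nc) * 100.0

print(storage_cr(256, 64))             # 4.0  -> a 4:1 storage ratio
print(transmission_pct(256, 256, 64))  # 50.0 -> A and R are half the original size
```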

Given the compression and decompression procedurespresented, the following observations are made:

1. For storage:
   • If Nr > Nc, then CR is better than when Nr = Nc.
   • If Nr < Nc, then CR is less than or equal to when Nr = Nc.

2. For transmission:
   • If Nr > Nc or Nr < Nc, then the transmission savings is worse than when Nr = Nc. (Because the size of I for Nr > Nc or Nr < Nc would be larger than for Nr = Nc.)

3. For both storage and transmission:
   • Given a fixed k, as the image size Nr × Nc increases, so does CR.
   • Given a fixed Nr × Nc, as the desired dimensionality gets smaller (k → 1), CR increases.

The apparent problem with transmission can be easily solved. Although R is required for image recovery, it is generated independently of X. Thus, instead of transmitting A and R, only A need be transmitted, with the method by which R is created (e.g., DCT, DST, etc.) repeated at the receiving end.

Eqs. (21) through (25) indicate that the compression ratio of the image is governed by k. As k → 1 the compression percentage approaches 100%, and in some cases exceeds it. Therefore, this technique's trade-off is the quality of the recovered image versus the lowest dimension k achieved.

4. Performance evaluation

An extensive series of experiments was conducted with the proposed approach. Performance was first compared between the DCT kernel matrix and its DST counterpart as the distribution for R; the intent is to show which distribution for R is better suited to image compression via RP. After choosing the suitable kernel matrix, performance was then compared against two widely accepted image compression techniques, which are also known as dimensionality reduction methods: the Discrete Cosine Transform (DCT) as defined in two dimensions [27], and Singular Value Decomposition (SVD). Three images were utilized for all testing: Apollo17CSM, Baboon, and Lena. In both cases, the original image is compared against the recovered image. Quantitative data was obtained in the form of the compression ratio, the compression and recovery time, and the peak signal-to-noise ratio (PSNR) and root mean-square error (RMSE) obtained [37]. It is important to note that a higher PSNR, or lower RMSE, does not necessarily indicate or correlate well with perceived image quality; however, they do provide an indication of relative quality. For this reason, the actual recovered images are included in the comparisons.
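For reference, the standard definitions of these two measures for 8-bit gray-level images, as a sketch (consistent with [37], though the paper does not list its exact formulas):

```python
import numpy as np

def rmse(orig: np.ndarray, rec: np.ndarray) -> float:
    """Root mean-square error between the original and recovered images."""
    diff = orig.astype(float) - rec.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(orig: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, with peak = 255 for 8-bit images."""
    return float(20.0 * np.log10(peak / rmse(orig, rec)))
```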

4.1. Kernel matrix comparison

When evaluating which kernel matrix is more suitable for R, a comparison of the original image to the recovered image, as well as quantitative measures, is provided. Each image is 256 × 256 pixels of gray-level type.

Consider Fig. 7, which compares the performance of the DCT and DST kernels using Apollo17CSM. Fig. 7(a) contains the original image, Fig. 7(b) shows the recovered image using RP with a DCT kernel, and Fig. 7(c) contains the result using the DST. The compression ratio shown is 4:1. At first glance all three images look alike; upon closer examination, differences can be noted. The cross-hairs of Figs. 7(b) and (c) are slightly blurred, and Fig. 7(c) additionally contains discernible vertical lines. The compression and recovery time for Fig. 7(b) is marginally better than for 7(c), as are the PSNR of 34.582 dB and RMSE of 4.758.

Fig. 7. Visual comparison of Apollo17CSM image for kernel matrices at compression ratio of 4:1: (a) original image; (b) RP using DCT kernel matrix; and (c) RP using DST kernel matrix.

Fig. 8 compares the kernels against the Baboon image used in earlier discussions. Visually speaking, all three images look similar, except for short vertical lines in the lower left-hand corner of the DST result, Fig. 8(c). The PSNR and RMSE achieved by the DCT kernel were once more better than those obtained by the DST, at 24.929 dB and 14.457.

Fig. 8. Visual comparison of Baboon image for kernel matrices at compression ratio of 4:1: (a) original image; (b) RP using DCT kernel matrix; and (c) RP using DST kernel matrix.

In Fig. 9 the comparison is made using the classic Lena image. Figs. 9(b) and (c) show some fuzziness around the model's face; the DST result once again contains vertical line artifacts, this time at both the left and right ends of the image. These are especially visible across the small portion of the mirror. DST coefficients become coarsely quantized in still images [38]; this coarse quantization, along with independent quantization of neighboring pixels, causes the artifacts. The PSNR achieved by the DCT kernel was better at 33.436 dB, with a lower RMSE of 5.430.

Fig. 9. Visual comparison of Lena image for kernel matrices at compression ratio of 4:1: (a) original image; (b) RP using DCT kernel matrix; and (c) RP using DST kernel matrix.

Both visually and quantitatively, the DCT kernel used for R indicates better performance. Although the quantitative measures seemed only slightly better, the perceived image quality is distinct. For all three images, RP utilizing the DCT outperformed its DST counterpart. Furthermore, in terms of processing time for compression and decompression, the DCT executed in less time; all quantitative results are shown in Table 1. Given this evidence, the kernel of choice will be the DCT; the remaining experiments show the RP version utilizing the DCT kernel matrix for R.

Table 1
Comparison results between the DCT and DST kernel matrices distributing the elements in R

Image – Kernel     | Compression ratio | Compression time (s) | Decompression time (s) | PSNR (dB) | RMSE
Apollo17CSM – DCT  | 4:1               | 1.452                | 1.443                  | 34.582    | 4.758
Baboon – DCT       | 4:1               | 1.482                | 1.402                  | 24.929    | 14.457
Lena – DCT         | 4:1               | 1.472                | 1.412                  | 33.436    | 5.430
Apollo17CSM – DST  | 4:1               | 1.492                | 1.392                  | 33.463    | 5.413
Baboon – DST       | 4:1               | 1.503                | 1.422                  | 24.768    | 14.727
Lena – DST         | 4:1               | 1.512                | 1.392                  | 32.361    | 6.145

4.2. Technique comparison

The RP compression technique is compared against two well-known and accepted image compression methods, Discrete Cosine Transform (DCT) image coding and Singular Value Decomposition (SVD). Both of these are also known as dimensionality reduction schemes, making them relevant for comparison to RP. Since RP is performed against the entire image, the DCT version used is similar to the global approach taken in [9]: the DCT is applied to the whole image instead of the traditional coding of 8 × 8 blocks. The use of quantization (i.e., degree of compression) remains; in all experiments a quantization level of 50 was used from the range [1, 100]. SVD takes any m × n matrix A, where m ≥ n, as the product of an m × n matrix U, an n × n diagonal matrix W with positive or zero elements (i.e., the singular values), and the transpose of an n × n matrix V, such that

$$A = U \cdot W \cdot V^T. \qquad (26)$$

The implementation of SVD is similar to the algorithm presented in [39]: matrix A represents the image and is replaced by U on output. The singular values of W are also calculated, as well as the matrix V. Note that when A is symmetric (as with an image size of 256 × 256), U = V.

A quantization of 50 for the DCT was used as the benchmark for all experiments. The CR achieved by the DCT was then matched as closely as possible by SVD and RP (through experimentation) without exceeding it, for comparison purposes. In fact, the CR for RP was kept less than or equal to that achieved by either DCT or SVD, to indicate improved or similar performance. The results of these comparisons are discussed in detail below and are listed in Table 2.

Table 2
Experimental results between the DCT, SVD, and RP image compression schemes

Compression type – Image | Compression ratio | Compression time (s) | Decompression time (s) | PSNR (dB) | RMSE
DCT – Apollo17CSM        | 2.829:1           | 4330.507             | 6542.498               | 0.804     | 232.455
SVD – Apollo17CSM        | 2.813:1           | 11.416               | 2.293                  | 42.493    | 1.914
RP – Apollo17CSM         | 2.798:1           | 2.113                | 2.163                  | 44.352    | 1.545
DCT – Baboon             | 1.881:1           | 4324.217             | 6555.326               | 3.547     | 169.501
SVD – Baboon             | 1.869:1           | 12.177               | 2.263                  | 32.762    | 5.868
RP – Baboon              | 1.868:1           | 1.572                | 1.562                  | 25.367    | 13.747
DCT – Lena               | 2.722:1           | 4329.768             | 6547.415               | 3.522     | 170.003
SVD – Lena               | 2.695:1           | 11.306               | 2.254                  | 39.108    | 2.826
RP – Lena                | 2.695:1           | 2.193                | 2.173                  | 39.922    | 2.573
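For comparison with RP, a minimal numpy sketch of rank-k compression by SVD in the spirit of Eq. (26); this stands in for the routine of [39], and the function names are ours:

```python
import numpy as np

def svd_compress(A: np.ndarray, k: int):
    """Keep the k largest singular triplets of A = U W V^T, Eq. (26)."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], w[:k], Vt[:k, :]

def svd_decompress(Uk: np.ndarray, wk: np.ndarray, Vtk: np.ndarray) -> np.ndarray:
    """Rank-k reconstruction; note that U, W, and V must all be transmitted."""
    return (Uk * wk) @ Vtk          # equivalent to Uk @ diag(wk) @ Vtk
```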


Let us first consider Fig. 10, which contains the first visual comparison between the three methods. Fig. 10(a) is the original Apollo17CSM image; Fig. 10(b) shows the recovered image with DCT coding, Fig. 10(c) contains the SVD outcome, and Fig. 10(d) shows the RP result. As listed in Table 2, the CR compared for all three methods is approximately 3:1 (recall, a quantization level of 50 for the DCT set this value as the benchmark). Clearly, the DCT result of Fig. 10(b) is visually the worst, with little or no difference between the SVD and RP results of Figs. 10(c) and (d), respectively. Quantitatively, the highest PSNR and lowest RMSE were obtained by the RP approach, as shown in Table 2. Furthermore, RP recovery was significantly faster than DCT and about 10 s faster than SVD.

Fig. 10. Apollo17CSM image comparison at a compression ratio of approximately 3:1: (a) original image; (b) DCT recovery; (c) SVD recovery; and (d) RP recovery.

In Fig. 11, the comparison is made using the Baboon image. Fig. 11(a) shows the original image; notice the poor result of DCT given in Fig. 11(b). In contrast, both SVD and RP recovered very well, as shown in Figs. 11(c) and (d), respectively. However, the data in Table 2 indicate SVD achieving better PSNR and RMSE. In Fig. 11(d) there exist slightly noticeable vertical lines on the right side of the baboon's nose; otherwise, the image looks similar to the original. Notice how RP continues to execute the fastest, at 3.134 seconds, whereas SVD was almost 5 times slower.

Fig. 11. Baboon image comparison at a compression ratio of approximately 2:1: (a) original image; (b) DCT recovery; (c) SVD recovery; and (d) RP recovery.

Fig. 12 contains the comparison for Lena. Once more Fig. 12(b) indicates a poor result for DCT, while SVD and RP recovered without any discernible artifacts, Figs. 12(c) and (d), respectively. RP achieved the best PSNR and RMSE, once more executing in the fastest time; its PSNR of 39.922 dB was almost a full decibel better than SVD's, while running about 10 s faster.

Fig. 12. Lena image comparison at a compression ratio of approximately 3:1: (a) original image; (b) DCT recovery; (c) SVD recovery; and (d) RP recovery.

Overall, RP was shown quantitatively better in only two cases, Figs. 10 and 12. Nonetheless, the perceived image quality of the RP recovery was subjectively good in all three test cases. Although there were artifacts in the recovered image of Fig. 11(d), they are mostly indiscernible. Furthermore, RP executed fastest, indicating a robust procedure even though protracted matrix operations are involved.

5. Conclusions

In this paper, attention was focused on introducing a lossy image compression approach using RP. Random Projection is known as a simple yet powerful tool for dimensionality reduction, and it was applied to image compression by overcoming the problem with its orthogonality constraint. Using orthonormal kernel matrices from signal theory, it was shown how RP solves the image compression problem. This is a clear improvement over previous research [2] that deemed RP inapplicable as an image compression technique. Experiments were conducted offering good visual and quantitative results as to its performance and efficiency. Perceived quality was as good as the original image in all test cases. Its processing was the fastest, indicating an effective algorithm even though no attempts at optimization were made.

An obvious deficiency of the RP approach is the implied requirement of transmitting both the compressed data A and the random matrix R for subsequent decompression. Recall from Section 3.2 and Fig. 6 that both A and R are needed for decompression. However, if the method for generating R is also performed at the receiving end, only A need be transmitted (since R is created independently of the data matrix X). This is an improvement over SVD: although the entire image is compressed into the diagonal matrix W, the matrices U and V are unique to the given image, and these are required, along with the image, before decompression can happen at the receiving end.

For future research, another possible method for creating R is utilizing wavelet theory, in particular Discrete Wavelet Transforms (DWTs). There exists a family of orthonormal wavelets that apply to the one- and two-dimensional cases of the DWT [40]. Although decompression is definitely lossy, a very good reconstruction is expected from this highly compact coding. An investigation of orthonormal DWTs as the random matrix is currently under consideration.

Acknowledgements

The author would like to thank the anonymous reviewers for their insightful and useful comments on this research. Their remarks went a long way toward improving the presentation and identifying the uniqueness of this paper. The author would also like to thank his employer, the National Aeronautics and Space Administration (NASA) at the John F. Kennedy Space Center (KSC), Florida, for providing the resources to facilitate this effort.

References

[1] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, second ed., Brooks/Cole, Pacific Grove, 1999.

[2] E. Bingham, H. Mannila, Random projection in dimensionality reduction: applications to image and text data, in: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001, pp. 245–250.

[3] J.L. Crowley, K. Schwerdt, Robust tracking and compression for video communication, in: Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 1999, pp. 2–9.

[4] H. Chen, C.H. Chen, Hyperspectral image data unsupervised classification using Gauss–Markov random fields and PCA principle, in: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2002, pp. 1431–1433.

[5] E. Choi, H. Choi, C. Lee, Principal component analysis with pre-emphasis for compression of hyperspectral imagery, in: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2005, pp. 704–706.

[6] P. Waldemar, T.A. Ramstad, Hybrid KLT-SVD image compression, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997, pp. 2713–2716.

[7] S. Zyto, A. Grama, W. Szpankowski, Semi-discrete matrix transforms (SDD) for image and video compression, in: Proceedings of the Data Compression Conference, 2002, p. 484.

[8] M. Tian, S.-W. Luo, L.-Z. Liao, An investigation into using singular value decomposition as a method of image compression, in: Proceedings of the 4th International Conference on Machine Learning and Cybernetics, 2005, pp. 5200–5204.

[9] F.-Z.N. Nacer, A. Zergaïnoh, A. Merigot, Global discrete cosine transform for image compression, in: Proceedings of the International Symposium on Signal Processing and its Applications, 2001, pp. 545–548.

[10] Y.-G. Wu, S.-C. Tai, Medical image compression by discrete cosine transform spectral similarity strategy, IEEE Trans. Inf. Technol. Biomed. 5 (3) (2001) 236–243.

[11] I. Hacihaliloglu, M. Kartal, DCT and DWT based image compression in remote sensing images, in: Proceedings of the International Symposium on Antennas and Propagation, 2004, pp. 3856–3858.

[12] T.G. Kolda, D.P. O'Leary, A semi-discrete decomposition for latent semantic indexing in information retrieval, ACM Trans. Inf. Syst. 16 (4) (1998) 322–346.

[13] P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in: Proceedings of the 30th Symposium on Theory of Computing, 1998, pp. 604–613.

[14] S. Kaski, Dimensionality reduction by random mapping: fast similarity computation for clustering, in: Proceedings of the International Joint Conference on Neural Networks, 1998, pp. 413–418.

[15] R.I. Arriaga, S. Vempala, An algorithmic theory of learning: robust concepts and random projection, in: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, 1999, pp. 616–623.

[16] S. Dasgupta, Experiments with random projection, in: Proceedings of Uncertainty in Artificial Intelligence, 2000.

[17] D. Achlioptas, Database-friendly random projections, in: Proceedings of the ACM Symposium on the Principles of Database Systems, 2001, pp. 274–281.

[18] A. Garg, S. Har-Peled, D. Roth, On generalization bounds, projection profile, and margin distribution, in: Proceedings of the 19th International Conference on Machine Learning, 2002.

[19] S.S. Vempala, Random projection, Tech. Rep., Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA, 2002.

[20] M. Agrawal, N. Kayal, N. Saxena, PRIMES is in P, Tech. Rep., Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India, 2002.

[21] W.B. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in: Proceedings of the Conference in Modern Analysis and Probability, Contemporary Mathematics, 1984, pp. 189–206.

[22] P. Frankl, H. Maehara, The Johnson–Lindenstrauss lemma and the sphericity of some graphs, J. Comb. Theory Ser. B 44 (1988) 355–362.

[23] S. Dasgupta, A. Gupta, An elementary proof of the Johnson–Lindenstrauss lemma, Tech. Rep. TR-99-006, International Computer Science Institute, Berkeley, CA, USA, 1999.

[24] R. Hecht-Nielsen, Context vectors: general purpose approximate meaning representations self-organized from raw data, in: J.M. Zurada, R.J. Marks II, C.J. Robinson (Eds.), Computational Intelligence: Imitating Life, IEEE Press, 1994, pp. 43–56.

[25] T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, second ed., McGraw-Hill, New York, 1997.

[26] B. Noble, J. Daniel, Applied Linear Algebra, third ed., Prentice-Hall, Englewood Cliffs, 1988.

[27] K. Castleman, Digital Image Processing, first ed., Prentice-Hall, Englewood Cliffs, 1996.

[28] S.A. Martucci, Symmetric convolution and the discrete sine and cosine transforms, IEEE Trans. Signal Process. 42 (1994) 1038–1051.

[29] M.J. Narasimha, A.M. Peterson, On the computation of the discrete cosine transform, IEEE Trans. Commun. 26 (6) (1978) 934–936.

[30] R. Yip, K.R. Rao, A fast computational algorithm for the discrete sine transform, IEEE Trans. Commun. 28 (2) (1980) 304–307.

[31] R.V.L. Hartley, A more symmetrical Fourier analysis applied to transmission problems, Proc. IRE 30 (1942) 144–150.

[32] R.N. Bracewell, The discrete Hartley transform, J. Opt. Soc. Am. 73 (12) (1983) 1832–1835.

[33] H.V. Sorensen, D.L. Jones, C.S. Burrus, M.T. Heideman, On computing the discrete Hartley transform, IEEE Trans. Acoust. Speech Signal Process. 33 (4) (1985) 1231–1238.

[34] B.J. Falkowski, Transformation kernels and algorithms for the generation of forward and inverse Hadamard ordered multi-polarity generalized Walsh transforms, in: Proceedings of the 37th Midwest Symposium on Circuits and Systems, 1994, pp. 825–828.

[35] M.M. Anguh, R.R. Martin, A truncation method for computing slant transforms with applications to image processing, IEEE Trans. Commun. 43 (6) (1995) 2103–2110.

[36] G. Kaiser, The fast Haar transform, IEEE Potentials 17 (2) (1998) 34–37.

[37] M. Rabbani, P.W. Jones, Digital Image Compression Techniques, SPIE – International Society for Optical Engineering, 1991.

[38] N. Saito, J.-F. Remy, The polyharmonic local sine transform: a new tool for local image analysis and synthesis without edge effect, Tech. Rep., Department of Mathematics, University of California, Davis, CA, USA, 2004.

[39] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, second ed., Cambridge University Press, New York, 1999.

[40] I. Daubechies, Orthonormal bases of compactly supported wavelets, Commun. Pure Appl. Math. 41 (1988) 909–996.