
Randomized methods in lossless compression of hyperspectral data

Qiang Zhang,a V. Paúl Pauca,b and Robert Plemmonsc

aWake Forest School of Medicine, Department of Biostatistical Sciences, Winston-Salem, North Carolina, [email protected]
bWake Forest University, Department of Computer Science, Winston-Salem, North Carolina 27109
cWake Forest University, Departments of Mathematics and Computer Science, Winston-Salem, North Carolina 27109

Abstract. We evaluate recently developed randomized matrix decomposition methods for fast lossless compression and reconstruction of hyperspectral imaging (HSI) data. Simple random projection methods have been shown to be effective for lossy compression without severely affecting the performance of object identification and classification. We build upon these methods to develop a new double-random projection method that may enable security in data transmission of compressed data. For HSI data, the distribution of elements in the resulting residual matrix, i.e., the original data minus its low-rank representation, exhibits low entropy relative to the original data, which favors a high compression ratio. We show both theoretically and empirically that randomized methods combined with residual-coding algorithms can lead to effective lossless compression of HSI data. We conduct numerical tests on real large-scale HSI data, with promising results. In addition, we show that randomized techniques are applicable for encoding on resource-constrained on-board sensor systems, where the core matrix-vector multiplications can be easily implemented on computing platforms such as graphic processing units or field-programmable gate arrays. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.7.074598]

Keywords: random projections; hyperspectral imaging; dimensionality reduction; lossless compression; singular value decomposition.

Paper 12486SS received Jan. 3, 2013; revised manuscript received Apr. 18, 2013; accepted for publication Jun. 14, 2013; published online Jul. 30, 2013.

1 Introduction

Hyperspectral image (HSI) data are measurements of the electromagnetic radiation reflected from an object or a scene (i.e., materials in the image) at many narrow wavelength bands. Spectral information is important in many fields such as environmental remote sensing, monitoring chemical/oil spills, and military target discrimination. For comprehensive discussions, see Refs. 1-3. HSI data are being gathered by sensors of increasing spatial, spectral, and radiometric resolution, leading to the collection of truly massive datasets. The transmission, storage, and processing of these large datasets present significant difficulties in practical situations as new-generation sensors are used. For example, for aircraft or for the increasingly popular unmanned aerial vehicles carrying hyperspectral scanning imagers, the imaging time is limited by the data capacity and computational capability of the on-board equipment, since within 5 to 10 s, hundreds to thousands of pixels of hyperspectral data are collected and often preprocessed.1

For real-time on-board processing, it would be desirable to design algorithms capable of compressing such amounts of data within 5 to 10 s, before the next section of the scene is scanned. This requirement makes it difficult to apply algorithms such as JPEG2000,4 three-dimensional (3-D)-SPIHT,5 or 3-D-SPECK,6 unless they are deployed on acceleration platforms such as digital signal processors,7 graphic processing units (GPUs), or field-programmable gate arrays (FPGAs).


For example, Christophe and Pearlman8 reported over 2 min of processing time using 3-D-SPIHT with random access for a 512 × 512 × 224 HSI dataset, including 30 s for the discrete wavelet transformation.

Dimensionality reduction methods provide a means to deal with the computational difficulties of hyperspectral data. These methods often use projections to compress a high-dimensional data space represented by a matrix A into a lower-dimensional space represented by a matrix B, which is then factorized. For HSI processing, hundreds of bands of images can be grouped into a 3-D data array, also called a tensor or a datacube, which can be unfolded into a matrix A from which B is obtained and then factorized. Such factorizations are referred to as low-rank matrix factorizations, resulting in a low-rank matrix approximation to the original HSI data matrix A.2,9-11

However, dimensionality reduction techniques provide lossy compression, as the original data are not exactly represented or reconstructed from the lower-dimensional space. Recent efforts to provide lossless compression exploit the correlation structure within HSI data, encoding the residuals (original data minus approximation) after stripping off the correlated parts.12,13 Given the large number of pixels, such correlations are often restricted to spatially or spectrally local areas, whereas dimensionality reduction techniques essentially explore the global correlation structure. In this paper, we propose the use of randomized dimensionality reduction techniques for efficiently capturing global correlation structures, combined with residual encoding, as in Ref. 13, to provide lossless compression. The success of this approach requires low entropy of the distribution of the residual data relative to the original, and as shall be observed in the experimental section, this appears to be the case with HSI data.

The most popular methods for low-rank factorization employ the singular value decomposition (SVD), e.g., Ref. 14, and lead to popular data analysis methods such as principal component analysis (PCA).15 Compared with algorithms that employ fixed basis functions, such as the 3-D wavelets in JPEG2000, 3-D-SPIHT, and 3-D-SPECK, the bases given by the SVD or PCA are data driven and provide a more compact representation of the original data. Moreover, by the optimality of the truncated SVD (TSVD) low-rank approximation,14 the Frobenius norm of the residual matrix is also optimal, and a low entropy in its distribution may be expected. Both the SVD and PCA can be used to represent an n-band hyperspectral dataset with a data size equivalent to only k bands, where k ≪ n. For applications of the SVD and PCA in HSI, see Refs. 16-19. The main disadvantage of using the SVD is its computation time: $O(mn^2)$ floating-point operations (flops) for an m × n matrix (m ≥ n) (Ref. 20). With recent technology, HSI datasets can easily reach the megapixel or even gigapixel level, rendering the use of a full SVD impractical in real scenarios.

The recent development of probabilistic methods for approximating singular vectors and singular values has provided a way to circumvent the computational complexity of the SVD, though at the cost of optimality in the approximation.21 These methods begin by randomly projecting the original matrix to obtain a lower-dimensional matrix, while keeping the range of the original matrix asymptotically intact. The much smaller projected matrix is then factorized using a full-matrix decomposition such as the SVD, and the resulting singular vectors are back-projected to the original space. Compared with deterministic methods, probabilistic methods often offer lower computational cost while still achieving high-accuracy approximations (see Ref. 21 and the references therein).

Chen et al.22 have recently provided an extensive study of the effects of linear projections on the performance of target detection and classification in HSI. In their tests, they found that the dimensionality of hyperspectral data can typically be reduced to 1/5 to 1/3 of the original without severely affecting the performance of classical target detection and classification algorithms. Compressive sensing approaches for HSI also take advantage of redundancy along the spectral dimension11,17,23-25 and involve random projection of the data onto a lower-dimensional space. For example, Fowler17 proposed an approach that exploits the use of compressive projections in sensors that integrate dimensionality reduction and signal acquisition to effectively shift the computational burden of PCA from the encoder platform to the decoder site. This technique, termed compressive-projection PCA (CPPCA), couples random projections at the encoder with a Rayleigh–Ritz process for approximating eigenvectors at the decoder. In its use of random projections, this technique possesses a certain duality with the newer randomized SVD (rSVD) approaches recently proposed.19


However, CPPCA recovers coefficients of a known sparsity pattern in an unknown basis. Accordingly, CPPCA requires the additional step of eigenvector recovery.

In this paper, we present several randomized algorithms designed for on-board lossy and lossless compression of HSI. Our goals include fast processing of hundreds of pixels of hyperspectral data within a time frame of 5 s, and a lossless compression ratio (CR) close to 3. The remainder of the paper is structured as follows. In Sec. 2, we present several fast randomized methods for lossless compression and reconstruction, suitable for on-board and off-board (receiving station) processing. In Sec. 3, we apply the methods to a large HSI dataset to demonstrate their efficiency and effectiveness. We conclude with some observations in Sec. 4.

2 Methodology

Randomized algorithms have recently drawn a large amount of interest,21 and here we exploit this approach specifically for efficient on-board lossless compression and data transfer, and off-board reconstruction, of HSI data. For lossless compression, the process is as follows:

1. Compute a low-rank approximation of the original data using randomized algorithms.
2. Encode the residual (original data minus its approximation) using standard integer or floating-point coding algorithms.

We present several randomized algorithms for efficient low-rank approximation. They can be written in fewer than 10 lines of pseudo-code, can be easily implemented on PC platforms, and may be ported to platforms such as GPUs or FPGAs. As readers will see, all of the large-scale computations involve only matrix-vector multiplications, and the more computationally intensive SVD computations involve only small matrices.

In the encoding and decoding algorithms that follow, it is assumed that HSI data are collected in blocks of size $n_x \times n_y \times n$, where $n_x$ and $n_y$ are the numbers of pixels along the spatial dimensions and $n$ is the number of spectral bands. During compression, each block is first unfolded into a two-dimensional array of size $m \times n$, where $m = n_x n_y$, by stacking each slice of size $n_x \times n_y$ into a one-dimensional array of size $m \times 1$. The compact representation of each block can then be stored on board. See Sec. 3 for a more extensive discussion of compression of HSI data in blocks as the entire dataset is being gathered.
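As a concrete illustration of this unfolding, here is a minimal sketch in Python/NumPy (the paper's own implementation is in Matlab; the function names here are hypothetical):

import numpy as np

def unfold(block):
    # Unfold an (nx, ny, n) HSI block into an (m, n) matrix with m = nx*ny:
    # each column is one vectorized nx-by-ny band image.
    nx, ny, n = block.shape
    return block.reshape(nx * ny, n)

def fold(A, nx, ny):
    # Invert the unfolding to recover the (nx, ny, n) datacube.
    return A.reshape(nx, ny, A.shape[1])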

We start by defining terms and notation. The SVD of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $A = U\Sigma V^T$, where $U$ and $V$ are orthonormal, with columns denoted $u_i$ and $v_i$, respectively, and $\Sigma$ is a diagonal matrix with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, where $p = \min(m, n)$. For some $k \le p$, the TSVD rank-$k$ approximation of $A$ is the matrix $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ contain the first $k$ columns of $U$ and $V$, respectively. The residual matrix obtained from the approximation of $A$ with $A_k$ is $R = A - A_k$. By the Eckart–Young theorem,14 $A_k$ is the optimal rank-$k$ approximation of $A$, minimizing the Frobenius norm of $R$.
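For small matrices, this TSVD baseline can be computed directly. A minimal NumPy sketch (illustrative only, since a full SVD is exactly what the randomized methods below avoid for large data):

import numpy as np

def tsvd_residual(A, k):
    # A_k = U_k Sigma_k V_k^T, the optimal rank-k approximation (Eckart-Young),
    # and the residual R = A - A_k whose entropy drives the compression ratio.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return Ak, A - Ak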

2.1 Single-Random Projection Method

Computing low-rank approximations of a large matrix using the SVD is prohibitive in most real-world applications. Randomized projections onto lower-dimensional spaces provide a feasible way around this problem. Let $P = (p_{ij}) \in \mathbb{R}^{m \times k_1}$ be a matrix with random independent and identically distributed (i.i.d.) entries drawn from $\mathcal{N}(0, 1)$. We define the random projection of the row space of $A$ onto a lower $k_1$-dimensional subspace as

$$ B = P^T A. \tag{1} $$

If $P$ is instead of size $n \times k_1$, then $B = AP$ is the analogous random projection of the column space of $A$. Given a target rank $k$, Vempala26 uses such $P$ matrices in an efficient algorithm for computing a rank-$k$ approximation of $A$. The algorithm consists of the following three simple steps:


1. Compute the random projection $B = \frac{1}{\sqrt{k_1}} P_1^T A$ for some $k_1 \ge c \log n / \epsilon^2$.
2. Compute the SVD $B = \sum_i \lambda_i u_i v_i^T$.
3. Return $A_k \leftarrow A \left( \sum_{i=1}^{k} v_i v_i^T \right) = A V_k V_k^T$.

It is also shown in Ref. 26 that, with high probability, the error between this $A_k$ and $A$ is bounded by

$$ \|A - A_k\|_F^2 \le \|A - A_k^\ast\|_F^2 + 2\epsilon \|A_k^\ast\|_F^2, \tag{2} $$

where $A_k^\ast$ is the optimal rank-$k$ approximation provided by the TSVD. This bound shows that the approximation $A_k$ is near optimal for small $\epsilon$.
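A minimal sketch of the three steps in NumPy (an illustration, not the paper's Matlab code; the scaling and the choice of $k_1$ follow the text above):

import numpy as np

def rp_rank_k(A, k, k1, rng=None):
    rng = rng or np.random.default_rng()
    m, n = A.shape
    P1 = rng.standard_normal((m, k1))
    B = (P1.T @ A) / np.sqrt(k1)                   # step 1: small k1-by-n projection
    Vt = np.linalg.svd(B, full_matrices=False)[2]  # step 2: SVD of the small matrix
    Vk = Vt[:k, :].T                               # top-k right singular vectors of B
    return (A @ Vk) @ Vk.T                         # step 3: A_k = A V_k V_k^T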

During HSI remote sensing data acquisition, Vempala's algorithm may enable lossy compression by efficiently computing and storing $AV_k$ and $V_k$ on board as the data are being gathered. The storage requirement of $AV_k$ and $V_k$ is proportional to $(m+n)k$, compared with $mn$ for the original data. For lossless compression, the residual $R = A - A_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board.

Encoding and decoding procedures using Vempala's algorithm are presented in Algorithms 1 and 2, respectively. For lossy compression, $R$ may be ignored. Clearly, there is a tradeoff between the target rank, which determines the size of $AV_k$ and $V_k$, and the compressibility of the residual $R$, which also depends on the type of data being compressed. Figure 1 illustrates this tradeoff, assuming that the entropy of the residual decreases as a scaled power law of the form $(k/\alpha)^{-s}$ for $s = 0.1, 0.2, \ldots, 2$ and constant $\alpha$.

Matrix $P_1$ plays an important role in the efficient low-rank approximation of $A$. $P_1$ can be fairly large, depending on the prespecified value of $\epsilon$: for example, for $\epsilon = 0.15$, $c = 5$, and $n = 220$, $P_1$ requires $k_1 \ge 1199$ columns. However, $P_1$ is needed only once in the compression process and may be generated in blocks (see Sec. 3). In addition, the distribution of the random entries in $P_1$ is symmetric, being drawn from a normal distribution; Zhang et al.27 relax this requirement to allow any distribution with a finite variance.

Algorithm 1 On-Board Random Projections Encoder.

Input: HSI data block of size n_x × n_y × n, unfolded into an m × n array A; target rank k; approximation tolerance ε.
Output: V_k, W, R̄.

1. Compute B = (1/√k_1) P_1^T A, for some k_1 ≥ c log n / ε².
2. Compute the SVD of B: B = Σ_i λ_i u_i v_i^T.
3. Construct the rank-k approximation: A_k = W V_k^T, with W = A V_k.
4. Compute the residual: R = A − A_k.
5. Encode the residual as R̄ with a parallel coding algorithm.
6. Store V_k, W, and R̄.

Algorithm 2 Off-Board Random Projections Decoder.

Input: V_k, W, R̄.
Output: The original matrix A.

1. Decode R from R̄ with a parallel decoding algorithm.
2. Compute the rank-k approximation: A_k = W V_k^T.
3. Reconstruct the original: A = A_k + R.
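The following sketch pairs Algorithms 1 and 2 end to end. The residual coder is a stand-in: the paper assumes a parallel integer or floating-point coder, while this sketch uses zlib on the raw double-precision bytes, which keeps the round trip exactly lossless:

import numpy as np, zlib

def rp_encode(A, k, k1, rng):
    P1 = rng.standard_normal((A.shape[0], k1))
    B = (P1.T @ A) / np.sqrt(k1)
    Vk = np.linalg.svd(B, full_matrices=False)[2][:k].T  # n x k
    W = A @ Vk                                           # m x k
    R = A - W @ Vk.T                                     # residual
    Rbar = zlib.compress(R.tobytes())                    # stand-in residual coder
    return Vk, W, Rbar

def rp_decode(Vk, W, Rbar):
    R = np.frombuffer(zlib.decompress(Rbar)).reshape(W.shape[0], Vk.shape[0])
    return W @ Vk.T + R                                  # A = A_k + R

Because W and V_k are stored and A_k = W V_k^T is recomputed identically at the decoder, the decoded matrix matches the original up to floating-point representation.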


For faster implementation, a circulant random matrix could also be effective,27,28 requiring storage of only one random vector.
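As a sketch of the circulant idea (one common realization, assumed here; see Refs. 27 and 28 for the actual constructions), the product of a circulant matrix with the columns of A is a circular convolution, which the FFT applies without ever forming the matrix:

import numpy as np

def circulant_project(A, k1, rng):
    # Apply the first k1 rows of an implicit m x m circulant matrix, defined by
    # one random length-m vector c, to A: each column of C @ A is the circular
    # convolution of c with the corresponding column of A.
    m = A.shape[0]
    c = rng.standard_normal(m)
    CA = np.fft.irfft(np.fft.rfft(c)[:, None] * np.fft.rfft(A, axis=0), n=m, axis=0)
    return CA[:k1, :] / np.sqrt(k1)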

2.2 Double-Random Projections Method

A variant of the above low-rank approximation approach may be derived by introducing a second random projection, this time of the column space:

$$ B_2 = A P_2, \tag{3} $$

where $P_2 \in \mathbb{R}^{n \times k_2}$ has i.i.d. entries drawn from $\mathcal{N}(0, 1)$ and $B_2 \in \mathbb{R}^{m \times k_2}$. Substituting the rank-$k$ approximation $A V_k V_k^T$ for $A$ in Eq. (3) results in

$$ B_2 \approx A V_k V_k^T P_2. \tag{4} $$

Notice that $V_k^T P_2$ has full row rank; hence its Moore–Penrose pseudo-inverse satisfies

$$ (V_k^T P_2)(V_k^T P_2)^\dagger = I_k. \tag{5} $$

Multiplying Eq. (4) on both sides by $(V_k^T P_2)^\dagger$ gives

$$ B_2 (V_k^T P_2)^\dagger \approx A V_k. \tag{6} $$

A new rank-$k$ approximation of $A$ can then be obtained as

$$ A_k = B_2 (V_k^T P_2)^\dagger V_k^T \approx A V_k V_k^T \approx A. \tag{7} $$

As in Vempala's algorithm, the quality of this approximation depends on choosing a sufficiently large value of $k_2 \ge 2k + 1$ (see Ref. 27 for a more detailed discussion). We refer to this method as the double-random projection (DRP) approach to low-rank approximation.
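A small numerical check of Eq. (7) (illustrative: $V_k$ is taken from a direct SVD of a synthetic rank-k matrix, whereas the DRP encoder would obtain it from the projected matrix $B_1$):

import numpy as np

rng = np.random.default_rng(1)
m, n, k = 500, 80, 6
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # exactly rank k

Vk = np.linalg.svd(A, full_matrices=False)[2][:k, :].T  # n x k right singular vectors
P2 = rng.standard_normal((n, 2 * k + 1))                # k2 = 2k + 1 columns
B2 = A @ P2
Ak = B2 @ np.linalg.pinv(Vk.T @ P2) @ Vk.T              # Eq. (7)
print(np.linalg.norm(A - Ak) / np.linalg.norm(A))       # near machine precision
                                                        # when rank(A) <= k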

During HSI remote sensing data acquisition, the DRP approach may enable lossy compression by efficiently computing and storing $B_2$, $V_k$, and $P_2$ on board as the data are being gathered. The storage requirement for these factors is proportional to $(m+n)k_2 + nk$. For lossless compression, the residual $R = A - A_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board. Encoding and decoding procedures based on DRP are presented in Algorithms 3 and 4, respectively. For lossy compression, $R$ may be ignored, as in the single-random projection case.

At a slight loss of precision and an increased storage requirement, the DRP encoding and decoding algorithms offer the additional advantage of secure data transfer if $P_2$ is used as a shared key between the remote-sensing aircraft and the ground.

Fig. 1 Theoretical compressibility curves when the entropy of the residual decreases as $(k/\alpha)^{-s}$ for $k = 2, \ldots, 300$, $s = 0.1, \ldots, 2$, and a constant $\alpha = 2$. The dashed line indicates a compression ratio of 1 (original data).


It remains to be seen whether this cipher can be easily broken; for now, we regard it as lightweight security. In this scheme, $P_2$ need be generated and transmitted securely only once between the ground and the aircraft; subsequent communication would not require transmission of $P_2$. Unlike in the single-random projection approach, interception of the factors $B_2$, $V_k$, and $R$ would not easily lead to a reconstruction of the original data without $P_2$.

2.3 Randomized Singular Value Decomposition

The rSVD algorithm described by Halko et al.21 computes approximate matrix factorizations by random projections, separating the process into two stages. In the first stage, $A$ is projected into an $l$-dimensional space by computing

$$ Y = A\Omega, \tag{8} $$

where $\Omega$ is a matrix of size $n \times l$ with random entries drawn from $\mathcal{N}(0, 1)$. Then, for a given $\epsilon > 0$, a matrix $Q \in \mathbb{R}^{m \times l}$ whose columns form an orthonormal basis for the range of $Y$ is obtained such that

$$ \|A - QQ^T A\|_2^2 \le \epsilon. \tag{9} $$

See Algorithms 4.1 and 4.2 in Ref. 21 for how $Q$ and $l$ may be computed adaptively. In the second stage, the SVD of the reduced matrix $Q^T A \in \mathbb{R}^{l \times n}$ is computed as $\tilde{U} \Sigma V^T$. Since $l \ll n$, it is generally computationally feasible to compute the SVD of the reduced matrix. Matrix $A$ can then be approximated as

$$ A \approx (Q\tilde{U}) \Sigma V^T = U \Sigma V^T, \tag{10} $$

Algorithm 3 On-Board Double-Random Projections Encoder.

Input: HSI data block of size n_x × n_y × n, unfolded into an m × n array A; target rank k; approximation tolerance ε.
Output: B_2, V_k, R̄.

1. Compute B_1 = (1/√k_1) P_1^T A and B_2 = A P_2, for some k_1 ≥ c log n / ε² and k_2 ≥ 2k + 1.
2. Compute the SVD of B_1: B_1 = Σ_i λ_i u_i v_i^T.
3. Compute the rank-k approximation: A_k = B_2 (V_k^T P_2)^† V_k^T.
4. Compute the residual: R = A − A_k.
5. Encode the residual as R̄ with a parallel coding algorithm.
6. Store B_2, V_k, and R̄.

Algorithm 4 Off-Board Double-Random Projections Decoder.

Input: B_2, V_k, P_2, R̄.
Output: The original matrix A.

1. Decode R from R̄ with a parallel decoding algorithm.
2. Compute the low-rank approximation: A_k = B_2 (V_k^T P_2)^† V_k^T.
3. Reconstruct the original: A = A_k + R.


where $U = Q\tilde{U}$ and $V$ are orthonormal matrices. As such, Eq. (10) is an approximate SVD of $A$, and the range of $U$ is an approximation to the range of $A$. See Ref. 21 for details on the choice of $l$, along with extensive numerical experiments using rSVD methods and a detailed error analysis of the two-stage method described above.

The rSVD approach may also be used to specify HSI encoding and decoding compression algorithms, as shown in Algorithms 5 and 6. For lossy compression, $Q$ and $B$ need to be computed and stored on board; the storage requirement for these factors is proportional to $(m+n)l$. As in the previous cases, for lossless compression the residual may be calculated and compressed using an integer or floating-point coding algorithm.

Compared with the single- and double-random projection approaches above, rSVD requires the computation of $Q$ but is also able to push the SVD calculation to the decoder. Since $l$ is typically much smaller than $k_1$ and $k_2$ in practice, the encoder can store $Q$ and $B$ directly without any loss in approximation accuracy. Perhaps the key benefit of rSVD is that the low-rank approximation factors $U$, $\Sigma$, and $V$ can be used directly for subsequent analysis such as PCA, clustering, etc.
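A basic two-stage rSVD sketch with a fixed $l$ (no power iterations or adaptive range finding; Algorithms 4.1 and 4.2 of Ref. 21 supply those refinements):

import numpy as np

def rsvd(A, l, rng=None):
    rng = rng or np.random.default_rng()
    Y = A @ rng.standard_normal((A.shape[1], l))             # stage 1: Y = A @ Omega
    Q = np.linalg.qr(Y)[0]                                   # orthonormal basis for range(Y)
    Ut, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)  # stage 2: SVD of l x n matrix
    return Q @ Ut, s, Vt                                     # U = Q @ Ut, A ~ U diag(s) Vt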

2.4 Randomized Singular Value Decomposition by DRP

The DRP approach can also be applied in the rSVD calculation by introducing

$$ B_1 = P_1^T A, \tag{11} $$

where $P_1$ is of size $m \times k_1$ with entries drawn from $\mathcal{N}(0, 1)$.

Algorithm 5 Randomized SVD Encoder.

Input: HSI data block of size n_x × n_y × n, unfolded into an m × n array A; approximation tolerance ε.
Output: Q, B, R̄.

1. Compute Y = AΩ, for some l > k.
2. Apply Algorithm 4.2 in Ref. 21 to obtain Q from Y.
3. Compute B = Q^T A.
4. Compute the residual: R = A − QB.
5. Encode R as R̄ with a parallel coding algorithm.
6. Store Q, B, and R̄.

Algorithm 6 Randomized SVD Decoder.

Input: Q, B, R̄.
Output: The original matrix A and its rank-l approximate SVD U, Σ, V.

1. Decode R from R̄ with a parallel decoding algorithm.
2. Compute the SVD: B = Ũ Σ V^T.
3. Compute U = QŨ.
4. Compute the low-rank approximation: A_l = U Σ V^T.
5. Reconstruct the original: A = A_l + R.


Replacing $A$ in Eq. (11) with the rSVD approximation $QQ^T A$ leads to

$$ B_1 \approx P_1^T Q Q^T A. \tag{12} $$

Multiplying both sides by the pseudo-inverse of $P_1^T Q$, we have

$$ (P_1^T Q)^\dagger B_1 \approx Q^T A. \tag{13} $$

With this slight modification, the rSVD calculation in the encoder can proceed using $(P_1^T Q)^\dagger B_1$ in place of $Q^T A$. The corresponding encoding algorithm is given as Algorithm 7; the decoder remains the same as in the rSVD case.
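A sketch of this modified encoder step, using the unscaled form $B_1 = P_1^T A$ of Eq. (11); the helper name is illustrative:

import numpy as np

def rsvd_drp_encode(A, k1, l, rng):
    m, n = A.shape
    P1 = rng.standard_normal((m, k1))
    B1 = P1.T @ A                                         # Eq. (11): row-space sketch
    Q = np.linalg.qr(A @ rng.standard_normal((n, l)))[0]  # range basis as in rSVD
    W = np.linalg.pinv(P1.T @ Q) @ B1                     # Eq. (13): W ~ Q^T A
    R = A - Q @ W                                         # residual for lossless coding
    return Q, W, R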

3 Numerical Experiments

We have tested the encoding algorithms presented in Sec. 2 on a large, publicly available HSI dataset, namely Indian Pines, collected by AVIRIS over a 25 × 6 mi² portion of Northwest Tippecanoe County, Indiana, on June 12, 1992. The sensor has a spectral range of 0.45 to 2.5 μm over 220 bands, and the full dataset consists of a 2,678 × 614 × 220 image cube stored as unsigned 16-bit integers. Figure 2 shows the 100th band in grayscale.

A remote-sensing aircraft carrying hyperspectral scanning imagers can collect such a datacube in blocks of hundreds to thousands of pixels, each gathered within a few seconds.1 The size of each data block is determined by factors such as the ground sample distance and the flight speed.

To simulate this process, we unfolded the Indian Pines datacube into a large matrix $T$ of size 1,644,292 × 220 and divided $T$ into nine blocks $A_i$, each of size $m \times n$ with $m = 182{,}699$ and $n = 220$. For simplicity, the last pixel in the original dataset was ignored. Each block $A_i$ was then compressed sequentially using the encoding algorithms of Sec. 2. In all cases, $A_i$ is converted from unsigned 16-bit integers to double precision before compression, and the compressed representation is converted back to unsigned 16-bit integers for storage.

All algorithms were implemented in Matlab, and the tests were performed on a PC platform with eight 3.2 GHz Intel Xeon cores and 12 GB of memory. In the implementation of Algorithm 1, the random matrix $P_1 \in \mathbb{R}^{m \times k_1}$ can be large, since $m = 182{,}699$ and the oversampling requirement $k_1 \ge c \log n / \epsilon^2$ can lead to a relatively large $k_1$, e.g., $k_1 = 1199$ when $c = 5$ and $\epsilon = 0.15$. To reduce the memory requirement, we implicitly represent $P_1$ in column blocks as $P_1 = [P_1^{(1)}\ P_1^{(2)}\ \cdots\ P_1^{(\nu)}]$ and implement the matrix multiplication $P_1^T A$ as a series of products $(P_1^{(j)})^T A$, generating and storing only one block of $P_1$ at a time.
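A sketch of this block-wise product (a hypothetical helper; it assumes for simplicity that $k_1$ is a multiple of the number of blocks ν):

import numpy as np

def project_in_blocks(A, k1, nu, rng):
    # Form B = P1^T A / sqrt(k1) without ever holding the full m x k1 matrix P1:
    # generate P1 one column block at a time and stack the partial products.
    blocks = []
    for _ in range(nu):
        P1j = rng.standard_normal((A.shape[0], k1 // nu))  # one column block of P1
        blocks.append(P1j.T @ A)
    return np.vstack(blocks) / np.sqrt(k1)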

3.1 Compressibility of HSI Data

As alluded to by the compressibility curves in Fig. 1, the effectiveness of low-rank approximation with residual encoding depends on (1) the compressibility of the data and (2) the effectiveness of dimensionality reduction in reducing the entropy of the residual as a function of the desired rank k. The first point can be demonstrated by computing high-accuracy approximate singular vectors and singular values of the entire Indian Pines dataset using the rSVD algorithm.

Algorithm 7 Randomized SVD by DRP Encoder.

Input: HSI data block of size n_x × n_y × n, unfolded into an m × n array A; approximation tolerance ε.
Output: Q, W, R̄.

1. Compute B_1 = (1/√k_1) P_1^T A and Y = AΩ, for some k_1 ≥ c log n / ε² and l > k.
2. Apply Algorithm 4.2 in Ref. 21 to obtain Q from Y.
3. Compute the residual: R = A − QW, with W = (P_1^T Q)^† B_1.
4. Encode R as R̄ with a parallel coding algorithm.
5. Store Q, W, and R̄.


Figure 3 shows the first eight singular vectors folded as images of size 2,678 × 614, and Fig. 4 shows the corresponding singular values up to the 20th value. As can be observed, a great deal of the information is encoded in the first six singular vectors and singular values, with the seventh singular vector appearing more like noise.

To address the second point, we compare the histogram of the original dataset with that of the residual produced by the rSVD encoder of Algorithm 5 with target rank k = 6.

Fig. 2 The grayscale image of the 100th band.

Fig. 3 The first eight singular vectors, u_i, shown as images.


Figure 5(a) shows values in the original dataset to lie in the range [0, 0.4]. After rSVD encoding, the residual values roughly follow a Laplacian distribution on the range [−0.1, 0.1], as seen in Fig. 5(b). Moreover, 95.42% of the residual values lie within the range [−0.0015, 0.0015] (notice the log scale on the y-axis). This suggests that the entropy of the residual is significantly smaller than the entropy of the original dataset and, as a consequence, that the residual may be effectively encoded for lossless compression. Figure 5(c) shows the probability of observing a residual value r greater than a given value x, i.e., p(r > x), again indicating that the residuals are densely concentrated around zero.

3.2 Lossless Compression Through Randomized Dimensionality Reduction

We use the entropy of the residuals produced by each encoding algorithm as the information-theoretic lower bound, i.e., the minimum number of bits required to code the residuals, to estimate the amount of space needed to store a compressed residual. The entropy of the distribution of residual values is defined as

$$ h(R) = -\int p(x) \log\left(p(x)\right) dx, \tag{14} $$

Fig. 4 The singular value spectrum of the full Indian Pines dataset, shown up to the 20th value.

Fig. 5 (a) The distribution of the original Indian Pines hyperspectral imaging (HSI) data values. (b) The distribution of residuals after subtracting the truncated SVD (TSVD) approximation from the original data. (c) The cumulative distribution of residuals after subtracting the TSVD approximation from the original data.


where p(x) is the probability density function of the residual values. We estimate h(R) by computing and scaling histograms of residual values [as in Fig. 5(b)].
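In code, this estimate reduces to a normalized histogram. A sketch, where the bin count is an assumption (2^16 bins, matching the 16-bit integer storage of the residuals):

import numpy as np

def entropy_bits(R, bins=2**16):
    # Discrete entropy (bits per element) of the binned residual values,
    # approximating h(R) of Eq. (14) as the bits an ideal coder would need.
    counts = np.histogram(R, bins=bins)[0]
    p = counts[counts > 0] / R.size
    return float(-(p * np.log2(p)).sum())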

We assume that, like the original data, the low-rank representation and the corresponding residual are stored in signed 16-bit integer format. The CR is then calculated by dividing the amount of storage needed for the original data by the amount of storage needed for the compressed data. As an example, for Algorithm 1, the outputs $V_k$ and $W = AV_k$ require space proportional to $(m+n)k$. If the entropy of the residual is $h(R)$ bits, then the lossless CR obtained using Algorithm 1 is calculated as

$$ \mathrm{CR} = \frac{16\,mn}{16\,nk + 16\,mk + h(R)\,mn}. \tag{15} $$
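Eq. (15) in code, with illustrative numbers; the residual entropy h(R) ≈ 5 bits is an assumed value, chosen only to show the scale of the resulting CR:

def compression_ratio(m, n, k, hR):
    # 16-bit original vs. 16-bit factors V_k (n x k) and W = A V_k (m x k),
    # plus h(R) bits per residual element, as in Eq. (15).
    return 16 * m * n / (16 * n * k + 16 * m * k + hR * m * n)

print(compression_ratio(182699, 220, 6, 5.0))  # ~2.9 for an assumed h(R) of 5 bits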

Figure 6 shows lossless CRs obtained using all four encoding algorithms of Sec. 2 as a function of the data block $A_i$. The target rank is k = 6 in all cases, and the numbers of columns in $P_1$ and $P_2$ are $k_1 = 1{,}000$ and $k_2 = 2k + 1 = 13$, respectively. Notice that the CRs are above 2.5 and close to or around 3, while Wang et al.13 indicated 3 as a good CR for HSI data. Readers should be aware that Fig. 6 shows only the theoretical upper bounds of the lossless CRs, whereas those in Ref. 13 are actual ratios. The CRs produced by the DRP variants are slightly lower than their counterparts; this is expected, as the advantage of DRP (Algorithm 3) lies in its easily implemented lightweight data security. Finally, high CRs above 4.5 may be achieved, as shown in Fig. 6, for the last data block. This block corresponds to segments of homogeneous vegetation, seen on the right side of Fig. 2, which has been extensively tested by classification algorithms.29

Besides the theoretical upper bounds of the CRs presented in Fig. 6, we also combine the randomized methods with some popular lossless compression algorithms for coding the residuals. The chosen residual coding methods include the Lempel-Ziv-Welch (LZW) algorithm,30 Huffman coding,31 arithmetic coding,32 and JPEG2000.33 Table 1 presents the mean lossless CRs over the nine blocks of HSI data, where columns correspond to the randomized methods and rows correspond to the coding algorithms. The highest CR of 2.430 is achieved by the combination of the rSVD method and the JPEG2000 algorithm. Given the rapid development of coding algorithms, and the relatively limited and rudimentary algorithms tested here, the CR can be further improved by incorporating more advanced algorithms in future work.

Fig. 6 The lossless compression ratios (CRs) obtained using Algorithms 1, 3, 5, and 7 (RP, DRP, rSVD, rSVD-DRP).


3.3 Optimal Compressibility

Optimal CRs using the randomized dimensionality reduction methods of Sec. 2 depend on the appropriate selection of parameters such as the target rank and the approximation error tolerances. Such results for the Indian Pines dataset are beyond the scope of this paper. However, some optimality information can be gleaned by observing how the CR changes as a function of the target rank k (with other parameters fixed). Notice from Eq. (15) that the amount of storage needed for the low-rank representation increases with k, while the entropy of the residual decreases. The two terms in the denominator thus result in an optimal k, which is often data dependent. Figure 7 shows this result for the Indian Pines dataset. The different curves correspond to different data blocks of the original dataset, and the solid red curve is the mean across all blocks. Our choice of k = 6 is seen to be near optimal.

We can learn several things from the curves in Fig. 7. First, HSI data are compressible, but their compressibility depends on the right choice of k in the presented algorithms. Some hints for choosing the right k can be gained from the singular values and singular vectors: for example, in Fig. 3, the singular vectors after the sixth look more and more like noise, which tells us that most of the information is contained in the first six singular vectors. Second, we have empirically demonstrated that the entropy of the residuals approximately follows the power law $(k/\alpha)^{-s}$ illustrated in Fig. 1; hence, the optimal k lies in a sharply peaked region and is relatively easy to identify, compared with flatter curves.

Table 1 Lossless compression ratios (CRs) of hyperspectral imaging (HSI) data for combinations of the randomized methods with coding algorithms.

                    Algorithm 1   Algorithm 3   Algorithm 5   Algorithm 7
LZW                 1.438         1.338         1.569         1.563
Huffman coding      2.353         2.022         2.328         2.316
Arithmetic coding   2.362         2.017         2.326         2.313
JPEG2000            2.414         2.189         2.430         2.419

Fig. 7 CRs of the Indian Pines HSI dataset as a function of target rank k, for Algorithm 1 (RP), Algorithm 3 (DRP), Algorithm 5 (rSVD), and Algorithm 7 (rSVD-DRP); curves show Blocks 1 through 9 and their mean.


Further tests are needed to develop robust methods for obtaining near-optimal CRs; the adaptive selection of the rank parameter in the rSVD calculation21 can serve as an important first step in this direction. Third, since the rSVD algorithm is near optimal in terms of the Frobenius norm, i.e., in the mean-squared-error sense, the similar curves produced by the other randomized algorithms demonstrate that they all share this near optimality.

To further examine this finding, we explore the suboptimality of the four randomized algorithms by comparing the Frobenius norms and entropies of their residuals with those of the exact TSVD. Figure 8(a) shows, for the nine blocks of HSI data, the ratio of the Frobenius norm of the residuals produced by the exact TSVD to that produced by each algorithm, while Fig. 8(b) shows the corresponding ratios of residual entropies. A ratio of 1 represents exact optimality, and higher ratios are closer to optimal than lower ones. In terms of the Frobenius norm, three of the four algorithms are fairly close to optimal, while Algorithm 3, the DRP algorithm, is farther from optimal. In terms of entropy, all four algorithms are fairly close to optimal, which explains why the CRs of the four algorithms are all fairly close to one another. Interestingly, in Fig. 8(b), we observe ratios even higher than 1, which means that in some cases the entropies of the residuals produced by these algorithms can be even lower than those of the exact TSVD.

3.4 Time Performance of Randomized Dimensionality Reduction

If lossy compression of HSI data is acceptable, randomized dimensionality reduction methods can perform in near real time. Figure 9 shows the time (in seconds) that each encoder of Sec. 2 takes to process each data block $A_i$, ignoring the residuals. Notice that all encoders take less than 5 s per data block. The computation times of the RP encoder (Algorithm 1) and the DRP encoder (Algorithm 3) do not differ significantly; both take less than 2.4 s per data block, averaging about 2.3 s over all nine blocks. This translates to a mean throughput of 182,699 × 220 × 8 / 2.3 ≈ 140 MB/s (note that the original unsigned 16-bit integers are converted to double precision before processing). The green curve, corresponding to the rSVD encoder (Algorithm 5), shows the best performance, while the black curve, corresponding to the rSVD-DRP encoder (Algorithm 7), is the slowest, but still takes less than 5 s per block; the extra time is spent in step 3 computing the pseudo-inverse of $P_1^T Q$. Efficient non-Matlab implementations of the encoding algorithms presented in this paper, on platforms such as GPUs, would be expected to perform in real time. For lossless compression, our tests show that the low-entropy residuals may be effectively compressed with conventional tools, such as gzip, in less than 4 s per data block, or with better-performing tools, such as JPEG2000, which can compress each block within 4.5 s.

Fig. 8 (a) The ratios of the Frobenius norm of the residuals obtained by the exact TSVD to those obtained by each algorithm (Algorithms 1, 3, 5, and 7), for the nine blocks of HSI data. (b) The corresponding ratios of the residual entropies.


For Huffman coding and arithmetic coding, computation would take significantly longer without the assistance of special acceleration platforms such as GPUs or FPGAs.

For comparison, we also ran 3-D-SPECK and 3-D-SPIHT on a 512 × 512 × 128 subset, and both algorithms needed over 2 min to provide lossless compression. Christophe and Pearlman also reported over 2 min of processing time using 3-D-SPIHT with random access for a similar-size dataset.8

4 Conclusions and Discussions

As HSI datasets grow in size, compression and dimensionality reduction for analytical purposes become increasingly critical for storage, data transmission, and subsequent postprocessing. This paper shows the potential of randomized algorithms for efficient and effective compression and reconstruction of massive HSI datasets. Building upon the random projection and rSVD algorithms, we have further developed a DRP method that can serve as a standalone encoding algorithm or be combined with the rSVD algorithm. The DRP algorithm slightly sacrifices CR while adding lightweight encryption security.

We have demonstrated that for a large HSI dataset, such as the Indian Pines dataset, theoretical CRs close to 3 are possible, while empirical CRs as high as 2.43 were obtained with a limited number of coding algorithms. We have also used the rSVD algorithm to estimate near-optimal target ranks simply from the approximate singular vectors. Choosing optimal parameters for dimensionality reduction using randomized methods is a topic of future research; the adaptive rank selection method described in Ref. 21 offers an initial step in this direction. Regarding the suboptimality of the randomized algorithms, we have compared them with the exact TSVD with respect to both the Frobenius norm and the entropy of the residuals, and both appear empirically near optimal.

The presented randomized algorithms can be regarded as lossy compression algorithms, which must be combined with residual-coding algorithms to achieve lossless compression. We have shown empirically that the entropy of the residual (original data minus low-rank approximation) decreases significantly for HSI data. Conventional entropy-based methods for integer coding are expected to perform well on these low-entropy residuals. Integrating advanced residual-coding algorithms with the randomized algorithms is an important topic for future study.

One concern for residual coding is speed. In this regard, recent developments in floating-point coding34 have shown throughputs reaching as high as 75 Gb/s on a GPU, and on an eight-Xeon-core computer we have observed throughputs near 20 Gb/s. Both of these throughputs should be sufficient for coding the required HSI residual data. Saving the residuals back as 16-bit integers can further reduce the computation time.

Fig. 9 The computation time for lossy compression by Algorithms 1, 3, 5, and 7 (RP, DRP, rSVD, rSVD-DRP).



Acknowledgments

Research by R. Plemmons and Q. Zhang was supported by the U.S. Air Force Office of Scientific Research (AFOSR) under Grant FA9550-11-1-0194.

References

1. M. T. Eismann, Hyperspectral Remote Sensing, SPIE Press, Bellingham, WA (2012).
2. H. F. Grahn and P. Geladi, Techniques and Applications of Hyperspectral Image Analysis, John Wiley & Sons Ltd., West Sussex, England (2007).
3. J. Bioucas-Dias et al., "Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches," IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 5(2), 354–379 (2012).
4. J. Zhang et al., "Evaluation of JP3D for lossy and lossless compression of hyperspectral imagery," in 2009 IEEE Int. Geoscience and Remote Sensing Symposium, IGARSS 2009, Vol. 4, pp. IV-474, IEEE, Cape Town, South Africa (2009).
5. B. Kim, Z. Xiong, and W. Pearlman, "Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT)," IEEE Trans. Circuits Syst. Video Technol. 10(8), 1374–1387 (2000), http://dx.doi.org/10.1109/76.889025.
6. X. Tang, S. Cho, and W. Pearlman, "Comparison of 3D set partitioning methods in hyperspectral image compression featuring an improved 3D-SPIHT," in Proc. Data Compression Conf., 2003, DCC 2003, p. 449, IEEE, Snowbird, UT (2003).
7. Y. Langevin and O. Forni, "Image and spectral image compression for four experiments on the ROSETTA and Mars Express missions of ESA," in Int. Symp. Optical Science and Technology, pp. 364–373, SPIE Press, Bellingham, WA (2000).
8. E. Christophe and W. Pearlman, "Three-dimensional SPIHT coding of volume images with random access and resolution scalability," J. Image Video Process. 13, Article 2 (2008), http://dx.doi.org/10.1155/2008/248905.
9. J. C. Harsanyi and C. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Trans. Geosci. Rem. Sens. 32(4), 779–785 (1994), http://dx.doi.org/10.1109/36.298007.
10. A. Castrodad et al., "Learning discriminative sparse models for source separation and mapping of hyperspectral imagery," IEEE Trans. Geosci. Rem. Sens. 49(11), 4263–4281 (2011), http://dx.doi.org/10.1109/TGRS.2011.2163822.
11. C. Li et al., "A compressive sensing and unmixing scheme for hyperspectral data processing," IEEE Trans. Image Process. 21(3), 1200–1210 (2012).
12. X. Tang and W. Pearlman, "Three-dimensional wavelet-based compression of hyperspectral images," in Hyperspectral Data Compression, pp. 273–308, Springer, New York (2006).
13. H. Wang, S. Babacan, and K. Sayood, "Lossless hyperspectral-image compression using context-based conditional average," IEEE Trans. Geosci. Rem. Sens. 45(12), 4187–4193 (2007), http://dx.doi.org/10.1109/TGRS.2007.906085.
14. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, Maryland (1996).
15. I. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York (2002).
16. Q. Du and J. Fowler, "Hyperspectral image compression using JPEG2000 and principal component analysis," IEEE Geosci. Rem. Sens. Lett. 4(2), 201–205 (2007), http://dx.doi.org/10.1109/LGRS.2006.888109.
17. J. Fowler, "Compressive-projection principal component analysis," IEEE Trans. Image Process. 18(10), 2230–2242 (2009), http://dx.doi.org/10.1109/TIP.2009.2025089.
18. P. Drineas and M. W. Mahoney, "A randomized algorithm for a tensor-based generalization of the SVD," Linear Algebra Appl. 420(2–3), 553–571 (2007), http://dx.doi.org/10.1016/j.laa.2006.08.023.
19. J. Zhang et al., "Randomized SVD methods in hyperspectral imaging," J. Elect. Comput. Eng., Article 3, in press (2012).
20. L. Trefethen and D. Bau, Numerical Linear Algebra, Lecture 31, SIAM, Philadelphia, PA (1997).
21. N. Halko, P. G. Martinsson, and J. A. Tropp, "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions," SIAM Rev. 53(2), 217–288 (2011), http://dx.doi.org/10.1137/090771806.
22. Y. Chen, N. Nasrabadi, and T. Tran, "Effects of linear projections on the performance of target detection and classification in hyperspectral imagery," J. Appl. Rem. Sens. 5(1), 053563 (2011), http://dx.doi.org/10.1117/1.3659894.
23. Q. Zhang et al., "Joint segmentation and reconstruction of hyperspectral data with compressed measurements," Appl. Opt. 50(22), 4417–4435 (2011), http://dx.doi.org/10.1364/AO.50.004417.
24. M. Gehm et al., "Single-shot compressive spectral imaging with a dual-disperser architecture," Opt. Express 15(21), 14013–14027 (2007), http://dx.doi.org/10.1364/OE.15.014013.
25. A. Wagadarikar et al., "Single disperser design for coded aperture snapshot spectral imaging," Appl. Opt. 47(10), B44–B51 (2008), http://dx.doi.org/10.1364/AO.47.000B44.
26. S. Vempala, The Random Projection Method, Vol. 65, American Mathematical Society, Providence, Rhode Island (2004).
27. Q. Zhang, V. P. Pauca, and R. Plemmons, "Image reconstruction from double random projections," (2013), to be submitted.
28. W. Bajwa et al., "Toeplitz-structured compressed sensing matrices," in IEEE/SP 14th Workshop on Statistical Signal Processing, 2007, SSP'07, pp. 294–298, IEEE, Madison, WI (2007).
29. R. Archibald and G. Fann, "Feature selection and classification of hyperspectral images with support vector machines," IEEE Geosci. Rem. Sens. Lett. 4(4), 674–677 (2007), http://dx.doi.org/10.1109/LGRS.2007.905116.
30. J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theor. 24(5), 530–536 (1978), http://dx.doi.org/10.1109/TIT.1978.1055934.
31. K. Skretting, J. H. Husøy, and S. O. Aase, "Improved Huffman coding using recursive splitting," in Proc. Norwegian Signal Processing, NORSIG, IEEE, Norway (1999).
32. M. Nelson and J.-L. Gailly, The Data Compression Book, 2nd ed., M & T Books, New York, NY (1995).
33. T. Acharya and P.-S. Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, Wiley & Sons Ltd., Hoboken, NJ (2005).
34. M. O'Neil and M. Burtscher, "Floating-point data compression at 75 Gb/s on a GPU," in Proc. Fourth Workshop on General Purpose Processing on Graphics Processing Units, p. 7, ACM, New York, NY (2011).

Biographies and photographs of the authors are not available.
