Discrete Cosine Transform and JPEG Files
Amber Habib
Mathematical Sciences Foundation
Delhi
www.mathscifound.org
Abstract
A digital image can be viewed as an array of numbers, each number representing the colour value of the corresponding pixel. In the JPEG format, these numbers are stored indirectly, via their discrete cosine transform. This enables easy compression, resizing, etc. For further savings, the array produced by the discrete cosine transform is stored using Huffman encoding.

The calculations and plotting were carried out using Mathematica. These notes were prepared for MSF’s Programme in Mathematical Simulation and IT. They provided the base for student projects in image manipulation using Matlab. The projects used Fourier analysis as well as wavelets.
Contents
1 Discrete Fourier Transform
2 Discrete Cosine Transform
3 The Two Dimensional Discrete Cosine Transform
4 Huffman Encoding
1 Discrete Fourier Transform
Consider the data depicted in the following graph:
[Figure: ten data points plotted over the interval from 0 to 2π.]
To represent this data in a way that can be easily manipulated for different purposes, we wish to construct a function that passes through all the data points. More specifically, we construct a function of the form

$$f(x) = \frac{A_0}{2} + \sum_{k=1}^{5} A_k \cos(kx) + \sum_{k=1}^{4} B_k \sin(kx), \tag{1}$$

where the Ak’s and Bk’s are suitably chosen constants. This function is called the discrete Fourier transform of the data. Note that we have 10 data points¹ and 10 unknown constants. Further, each data point, on substitution in (1), creates a linear equation for the unknowns. So we can hope to solve this linear system and obtain a unique set of values for Ak and Bk.
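The coefficients can also be computed in code. The following Python sketch (the original notes used Mathematica) avoids solving the 10×10 linear system directly by using the orthogonality of cos(kx) and sin(kx) on the grid x_j = 2πj/10; the ten sample values are made up for illustration, since the data plotted in the graph is not listed in the text.

```python
import math

N = 10
# Hypothetical sample values (the plotted data is not reproduced in the text).
y = [3.0, 4.5, 5.0, 3.5, 2.0, 1.0, 1.5, 2.5, 3.0, 2.0]
x = [2 * math.pi * j / N for j in range(N)]

# Read the coefficients off via orthogonality on the grid; the k = 5
# (Nyquist) cosine term needs the factor 1/N instead of 2/N.
A = [(2 / N) * sum(y[j] * math.cos(k * x[j]) for j in range(N))
     for k in range(5)]
A.append((1 / N) * sum(y[j] * math.cos(5 * x[j]) for j in range(N)))
B = [(2 / N) * sum(y[j] * math.sin(k * x[j]) for j in range(N))
     for k in range(1, 5)]

def f(t):
    # The interpolant of equation (1).
    return (A[0] / 2
            + sum(A[k] * math.cos(k * t) for k in range(1, 6))
            + sum(B[k - 1] * math.sin(k * t) for k in range(1, 5)))

# The interpolant passes exactly through every data point.
assert all(abs(f(x[j]) - y[j]) < 1e-9 for j in range(N))
```

Solving the linear system with a general-purpose solver would give the same coefficients; the orthogonality formulas are simply cheaper and more transparent.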
For our data, the Ak’s turn out to be given by the vector
A = (5.65486, −0.628319, −0.628319, −0.628319, −0.628319, −0.314159),
and the Bk’s by the vector
B = (−1.93377, −0.864806, −0.4565, −0.204153, 0).
¹Actually 11, but the last is just a repeat of the first.
Note: We have not said anything about how to find the coefficients Ak and Bk. Our immediate interest is in observing that this knowledge is useful, and then later we will see how to obtain it.
The discrete Fourier transform f(x) passes exactly through the data points:
[Figure: the interpolant f(x) plotted through all ten data points.]
Now we investigate the contribution of the different coefficients Ak and Bk. Suppose we set A4 and B4 to zero. Then the function becomes:
[Figure: the interpolant with the A4 and B4 terms removed.]
This function doesn’t represent the data exactly, but it does roughly follow the general trend.
Now let us instead drop the A2 and B2 terms:
[Figure: the interpolant with the A2 and B2 terms removed.]
The loss in quality is much greater. This shows that the “higher order” terms contribute less than the “lower order” terms. Therefore, we need not store them to the same order of accuracy.
Suppose then, that we round off the last couple of coefficients of A and B:
A = (2.82743, −0.628319, −0.628319, −0.628319, −0.6, −0.3),
B = (−1.93377, −0.864806, −0.4565, −0.2, 0).
This makes no noticeable difference to the accuracy of the interpolation:
[Figure: the interpolant with the rounded coefficients, still passing close to all the data points.]
2 Discrete Cosine Transform
It is possible to manipulate the discrete Fourier transform of a set of data so that it consists of only cosine terms (all the Bk’s are zero). The benefits from this are simpler computational procedures, especially when we deal
with data which comes as arrays instead of lists. Thus, consider a string of data, such as

Data = (123, 157, 142, 127, 131, 102, 99, 235).

Instead of distributing these values at evenly spaced points 2πk/8, k = 0, 1, . . . , 7, we associate them to the points π(2k + 1)/16. Further, we symmetrically assign the same values to the points −π(2k + 1)/16. Thus, we get a collection of data points that is symmetric with respect to the y-axis:
[Figure: the data values plotted symmetrically about the y-axis.]
If we calculate the discrete Fourier transform for such data, we find that the sine terms vanish (because sine is odd) and only the cosine terms remain (because cosine is even, like the data). This special form is called the discrete cosine transform of the data.
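This cancellation is easy to verify numerically. A small Python check (the original computations were done in Mathematica), using the 8-point data string above placed symmetrically at ±π(2k + 1)/16:

```python
import math

# The data string from the text, placed at x = pi*(2k+1)/16 and
# mirrored to -x, so that the combined data set is even.
data = [123, 157, 142, 127, 131, 102, 99, 235]
points = []
for k, v in enumerate(data):
    x = math.pi * (2 * k + 1) / 16
    points += [(x, v), (-x, v)]

# Every sine coefficient vanishes: sine is odd, so the contributions
# of each mirrored pair cancel exactly.
for j in range(1, 8):
    B_j = sum(v * math.sin(j * x) for x, v in points)
    assert abs(B_j) < 1e-9
```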
For data such as we have given (8 points), the discrete cosine transform is
$$f(x) = \frac{A_0}{2} + \sum_{k=1}^{7} A_k \cos(kx),$$

where the Fourier coefficients Ak are given by

$$A_k = \frac{1}{4} \sum_{n=0}^{7} \mathrm{Data}(n) \cos\!\left(\frac{(2n+1)\pi k}{16}\right).$$
Note that we have numbered the data points as 0, 1, . . . , 7.
For the example we have given, this formula produces the following values for the Fourier coefficients:

A0 = 139.5    A1 = −10.04    A2 = 24.25    A3 = −35.36
A4 = 20.51    A5 = −28.66    A6 = 6.79     A7 = −4.22
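These values can be reproduced directly from the formula. A Python sketch (the original used Mathematica); note one small wrinkle: the stated formula gives A0 = 279, and the 139.5 reported for A0 appears to be A0/2, the constant term of f(x), which is the mean of the data.

```python
import math

data = [123, 157, 142, 127, 131, 102, 99, 235]

# A_k = (1/4) * sum_{n=0}^{7} Data(n) * cos((2n+1)*pi*k/16)
A = [0.25 * sum(data[n] * math.cos((2 * n + 1) * math.pi * k / 16)
                for n in range(8))
     for k in range(8)]

# For k >= 1 these agree with the table, e.g. A_1 = -10.04, A_2 = 24.25.
assert abs(A[1] - (-10.04)) < 0.02
assert abs(A[2] - 24.25) < 0.02
# The constant term A_0/2 of f(x) is the mean of the data, 139.5.
assert abs(A[0] / 2 - 139.5) < 1e-9
```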
The corresponding cosine transform f(x) passes exactly through the data points:
[Figure: the cosine interpolant f(x) passing through all eight data points.]
3 The Two Dimensional Discrete Cosine Transform
A digital image consists of a rectangular array of closely packed pixels, each of which is assigned a colour value. These colour values are given by numbers, and various formats exist for mapping colours to corresponding numbers. For instance, in one format, colours are broken up into their red, green and blue (RGB) components, and a particular colour is chosen by assigning a corresponding intensity (via a number) to each of the RGB components. For example, the background colour of the following box is obtained by setting R = G = B = 0.8:
A Shaded Box
Thus, a digital image, for the mathematician, is just an array of numbers. To manipulate this array, we use a two dimensional version of the discrete cosine transform.
Consider the data depicted in the following table:
123 157 142 127 131 102 99 235
134 135 157 112 109 106 108 136
135 144 159 108 112 118 109 126
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 146 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
We number the rows and columns as 0, 1, 2, . . . , 7. Thus the (0, 0) entry is 123, the (3, 7) entry is 133, etc.
To this data, we apply the two dimensional discrete cosine transform, defined by:
$$\mathrm{DCT}(u, v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} \mathrm{Data}(x, y) \cos\!\left(\frac{\pi(2x+1)u}{16}\right) \cos\!\left(\frac{\pi(2y+1)v}{16}\right).$$
Here Data(x, y) refers to the (x, y) entry in the data table given above. The coefficients C(u) and C(v) are defined by

$$C(h) = \frac{1}{\sqrt{2}} \text{ if } h = 0, \qquad C(h) = 1 \text{ if } h \neq 0.$$
The discrete cosine transform produces the following table, after rounding:²
1149 39 -43 -10 25 -84 11 41
-81 -3 114 -74 -6 -2 21 -6
14 -11 0 -43 25 -3 17 -39
1 -61 -14 -12 36 -24 -18 4
44 13 36 -5 9 -22 6 -8
36 -12 -9 -5 20 -29 -21 13
-19 -8 21 -6 3 2 11 -22
-5 -14 -11 -18 -5 -1 7 -5
²We always round off the results of our calculations to integers, because (1) integers take less space than reals, (2) integer operations are faster, and (3) colour values are usually integers. It is one of the important strengths of the discrete cosine transform that the errors introduced by the rounding off are inconsequential.
The first thing is to establish that we can recover the data from its discrete cosine transform. For this purpose we define the inverse discrete cosine transform by
$$\mathrm{IDCT}(x, y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, \mathrm{DCT}(u, v) \cos\!\left(\frac{\pi(2x+1)u}{16}\right) \cos\!\left(\frac{\pi(2y+1)v}{16}\right).$$
If we apply the IDCT to the DCT table, we get (after rounding):
123 157 142 127 131 102 99 235
134 135 157 112 109 106 108 135
135 144 159 108 112 118 109 127
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 145 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
Can you spot any difference between this and the original data?
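Both formulas can be checked numerically. The following Python sketch (the original computations were done in Mathematica) computes the DCT of the data table, compares two entries against the table above, and confirms that, even after rounding the DCT to integers, the IDCT recovers every entry to within ±1:

```python
import math

def C(h):
    # The normalisation factor from the DCT definition.
    return 1 / math.sqrt(2) if h == 0 else 1.0

data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]

def dct2(a):
    # Two dimensional DCT, rounded to integers as in the text.
    return [[round(0.25 * C(u) * C(v) * sum(
                a[x][y] * math.cos(math.pi * (2 * x + 1) * u / 16)
                        * math.cos(math.pi * (2 * y + 1) * v / 16)
                for x in range(8) for y in range(8)))
             for v in range(8)] for u in range(8)]

def idct2(d):
    # Inverse transform, also rounded to integers.
    return [[round(0.25 * sum(
                C(u) * C(v) * d[u][v]
                * math.cos(math.pi * (2 * x + 1) * u / 16)
                * math.cos(math.pi * (2 * y + 1) * v / 16)
                for u in range(8) for v in range(8)))
             for y in range(8)] for x in range(8)]

dct = dct2(data)
assert dct[0][0] == 1149 and dct[0][1] == 39  # matches the table above

# Rounding the DCT to integers changes the recovered values by at most 1.
recovered = idct2(dct)
assert max(abs(recovered[x][y] - data[x][y])
           for x in range(8) for y in range(8)) <= 1
```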
Suppose we store the data via its DCT. We ask if we can afford to lose some of the details of the DCT without significantly affecting the quality of the data. One way to reduce the amount of space required by the DCT is to divide every entry by, say, 8 (thus saving 3 bits per entry, since the numbers are stored in binary).
Then the DCT becomes
144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
Clearly, this “Compressed DCT” occupies much less space.
To recover the original data, we just uncompress by multiplying by 8, andthen apply IDCT.
This time there is some loss:
122 161 145 130 128 106 101 233
142 132 156 116 109 107 108 132
138 146 155 105 109 118 113 127
175 184 163 110 190 127 132 134
137 148 155 128 182 149 130 133
119 132 126 147 204 149 133 129
115 149 156 177 248 159 137 125
174 173 171 180 186 178 114 123
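The compress–uncompress round trip can be checked in code. A Python sketch (the original used Mathematica), starting from the rounded DCT table given earlier in this section:

```python
import math

def C(h):
    return 1 / math.sqrt(2) if h == 0 else 1.0

data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]
dct = [
    [1149,  39, -43, -10,  25, -84,  11,  41],
    [ -81,  -3, 114, -74,  -6,  -2,  21,  -6],
    [  14, -11,   0, -43,  25,  -3,  17, -39],
    [   1, -61, -14, -12,  36, -24, -18,   4],
    [  44,  13,  36,  -5,   9, -22,   6,  -8],
    [  36, -12,  -9,  -5,  20, -29, -21,  13],
    [ -19,  -8,  21,  -6,   3,   2,  11, -22],
    [  -5, -14, -11, -18,  -5,  -1,   7,  -5],
]

# Compress: divide every entry by 8 and round.
quant = [[round(v / 8) for v in row] for row in dct]
assert quant[0][0] == 144 and quant[1][2] == 14  # matches the compressed table

def idct2(d):
    return [[round(0.25 * sum(
                C(u) * C(v) * d[u][v]
                * math.cos(math.pi * (2 * x + 1) * u / 16)
                * math.cos(math.pi * (2 * y + 1) * v / 16)
                for u in range(8) for v in range(8)))
             for y in range(8)] for x in range(8)]

# Uncompress (multiply by 8) and apply the IDCT.
recovered = idct2([[8 * v for v in row] for row in quant])

# The loss is visible but small: no entry is off by more than 8 here.
err = max(abs(recovered[x][y] - data[x][y])
          for x in range(8) for y in range(8))
assert err <= 8
```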
Another approach is to compress the entries on the top left less (as these are more significant). For example, we divide the entries in the top left 4×4 submatrix of the DCT by 2, and all the other entries by 8:
574 20 -22 -5 3 -10 1 5
-40 -2 57 -37 -1 0 3 -1
7 -6 0 -22 3 0 2 -5
0 -30 -7 -6 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
We uncompress the last table by multiplying by 2 and 8 in the appropriate places. Then we apply the IDCT, and we get:
120 159 144 129 127 105 100 234
138 130 156 116 109 106 108 133
135 145 156 106 109 117 113 127
174 183 164 111 191 127 131 132
139 149 154 127 183 149 128 129
121 133 125 146 204 149 131 125
114 148 156 177 248 158 136 123
170 172 172 182 186 177 113 123
This hybrid approach offers almost as much compression as the previous one, with lower loss of quality.
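The hybrid scheme can be checked the same way. A Python sketch (the original used Mathematica), again starting from the rounded DCT table of this section; the maximum error drops compared with uniform division by 8:

```python
import math

def C(h):
    return 1 / math.sqrt(2) if h == 0 else 1.0

data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]
dct = [
    [1149,  39, -43, -10,  25, -84,  11,  41],
    [ -81,  -3, 114, -74,  -6,  -2,  21,  -6],
    [  14, -11,   0, -43,  25,  -3,  17, -39],
    [   1, -61, -14, -12,  36, -24, -18,   4],
    [  44,  13,  36,  -5,   9, -22,   6,  -8],
    [  36, -12,  -9,  -5,  20, -29, -21,  13],
    [ -19,  -8,  21,  -6,   3,   2,  11, -22],
    [  -5, -14, -11, -18,  -5,  -1,   7,  -5],
]

# Divisor 2 on the top left 4x4 block, 8 elsewhere.
div = [[2 if u < 4 and v < 4 else 8 for v in range(8)] for u in range(8)]
quant = [[round(dct[u][v] / div[u][v]) for v in range(8)] for u in range(8)]
assert quant[0][0] == 574 and quant[1][2] == 57  # matches the hybrid table

def idct2(d):
    return [[round(0.25 * sum(
                C(u) * C(v) * d[u][v]
                * math.cos(math.pi * (2 * x + 1) * u / 16)
                * math.cos(math.pi * (2 * y + 1) * v / 16)
                for u in range(8) for v in range(8)))
             for y in range(8)] for x in range(8)]

# Uncompress entrywise and invert.
recovered = idct2([[quant[u][v] * div[u][v] for v in range(8)]
                   for u in range(8)])

# Loss is smaller than with uniform division by 8: at most 5 here.
err = max(abs(recovered[x][y] - data[x][y])
          for x in range(8) for y in range(8))
assert err <= 5
```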
4 Huffman Encoding
The discrete cosine transform produces the numbers used to store and transmit an image. However, these numbers are not stored according to their values, but through a code that further reduces the required space. This code names numbers according to their frequency. More frequent numbers are given shorter codes.
Consider the compressed DCT we had obtained in the last section:
144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
We will construct a binary tree out of the numbers in this grid.
Step 1. List all the numbers occurring in the table, along with their frequencies:
Data -10 -9 -8 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 14 144
Freq. 2 1 1 3 1 4 7 14 9 5 5 4 3 2 1 1 1
Step 2. Arrange the numbers in increasing order of frequency:
Data -9 -8 -4 6 14 144 -10 5 -5 4 -3 3 1 2 -2 0 -1
Freq. 1 1 1 1 1 1 2 2 3 3 4 4 5 5 7 9 14
Each number will become a ‘leaf’ of the binary tree. This leaf will be labelled by the number and its frequency. For instance, since 5 has frequency 2, the corresponding leaf will be drawn as 5:2.
Step 3. The two leaves with the lowest frequencies are combined into one node. This node is labelled by the sum of their frequencies. Thus, we get
[Diagram: a node labelled 2, with the leaves -9:1 and -8:1 as its children.]
We repeat this step, with the following modification: leaves and nodes already collected below a node are ignored while comparing frequencies. Only the top nodes and remaining leaves are taken into account.
Step 4. The starting situation is:
[Diagram: the leaves -4:1, 6:1, 14:1 and 144:1, the node 2 (over -9:1 and -8:1), and the leaves -10:2, 5:2, -5:3, . . . ]
On collecting the lowest frequency leaves under a node, we get:
[Diagram: the leaves 14:1 and 144:1, a new node 2 (over -4:1 and 6:1), the earlier node 2 (over -9:1 and -8:1), and the leaves -10:2, 5:2, -5:3, . . . ]
Step 5.
[Diagram: three nodes labelled 2, over the pairs 14:1 and 144:1, -4:1 and 6:1, and -9:1 and -8:1, followed by the leaves -10:2, 5:2, . . . ]
Step 6.
[Diagram: the two nodes labelled 2 (over 14:1 and 144:1, and over -4:1 and 6:1) are combined into a node labelled 4; the node 2 over -9:1 and -8:1, and the leaves -10:2, 5:2, -5:3, 4:3, -3:4, . . . remain.]
[Diagram: the remaining merges produce, in turn, nodes labelled 4, 5, 7, 8, 9, 10, 14, 17, 19, 28 and 36, and finally the root node 64; in the finished tree each left branch is labelled 0 and each right branch 1.]
Figure 1: The binary tree for the Huffman code.
By now, the general scheme should be clear. It is evident that we have made certain choices in each step: namely, the order in which we write nodes/leaves having the same frequency. This does affect the final binary tree we obtain. However, once we have described the method of coding, it will be obvious that these choices do not affect the efficiency of the encoding.
Figure 1 shows the final binary tree for our data. We have also labelled each branch of the tree: by 0 if it is a left branch and by 1 if it is a right branch. The encoding proceeds as follows. To obtain the code for a value, start from the root (the node labelled 64) and move down to the value, noting down each 0 or 1 label for a branch as you cross it. Thus, in moving to the leaf for the value -10, we obtain the sequence 00011. This is the code for that value.

[Figure: the top left 4×4 corner of the compressed DCT table, with arrows indicating the zigzag path, which starts 144, 5, -10, . . . ]

Figure 2: The sequence in which values are encoded.
Note that the most frequent value (-1) has the shortest code (01), and theless frequent ones have progressively longer codes. A value such as 144, withfrequency 1, has the longest code: 100001.
The table is coded by going through the values one by one in the zigzag manner shown in Figure 2 and writing their codes – without any separators! For instance, the starting sequence 144, 5, -10, . . . becomes 1000011011000011. . . (144 → 100001, 5 → 10110, −10 → 00011). To decode this string, one need only refer to the tree. We start at the root and follow the left or right branches according to whether we see a 0 or a 1. When we reach a leaf, we note the corresponding value and start again at the root.
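The whole procedure – building the tree, encoding, decoding – can be sketched in Python (the original used Mathematica). Tie-breaking in `heapq` may differ from the choices behind Figure 1, so individual code words can come out differently, but as noted above this does not affect the efficiency; for simplicity this sketch also reads the table in row-major order rather than the zigzag order of Figure 2.

```python
import heapq
from collections import Counter

# The 64 compressed-DCT values from the table above, in row-major order.
values = [
    144,   5,  -5,  -1,   3, -10,   1,   5,
    -10,   0,  14,  -9,  -1,   0,   3,  -1,
      2,  -1,   0,  -5,   3,   0,   2,  -5,
      0,  -8,  -2,  -2,   4,  -3,  -2,   0,
      6,   2,   4,  -1,   1,  -3,   1,  -1,
      4,  -2,  -1,  -1,   2,  -4,  -3,   2,
     -2,  -1,   3,  -1,   0,   0,   1,  -3,
     -1,  -2,  -1,  -2,  -1,   0,   1,  -1,
]
freq = Counter(values)

# Build the Huffman tree by repeatedly merging the two least frequent
# nodes. A leaf is a bare value; an internal node is a pair (left, right).
heap = [(f, i, v) for i, (v, f) in enumerate(sorted(freq.items()))]
heapq.heapify(heap)
tiebreak = len(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
    tiebreak += 1
tree = heap[0][2]

# Read off the codes: '0' for a left branch, '1' for a right branch.
codes = {}
def walk(node, prefix):
    if isinstance(node, tuple):
        walk(node[0], prefix + "0")
        walk(node[1], prefix + "1")
    else:
        codes[node] = prefix
walk(tree, "")

# The most frequent value (-1) gets a 2-bit code, and the whole table
# takes 231 bits, as in the exercise below the original text.
assert len(codes[-1]) == 2
bits = "".join(codes[v] for v in values)
assert len(bits) == 231

# Decoding: follow branches from the root; emit a value at each leaf.
decoded, node = [], tree
for b in bits:
    node = node[int(b)]
    if not isinstance(node, tuple):
        decoded.append(node)
        node = tree
assert decoded == values
```

The 231-bit total is the same for every valid Huffman tree on these frequencies, which is why the tie-breaking choices do not matter.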
Exercise. Show that our table of values can be described by 231 binary digits if we use Huffman encoding. If, on the other hand, we had worked with codes of fixed length, we would have needed 320 binary digits.