Discrete Cosine Transform and JPEG Files
Amber Habib
Mathematical Sciences Foundation
Delhi
www.mathscifound.org
Abstract
A digital image can be viewed as an array of numbers, each number representing the colour value of the corresponding pixel. In the JPEG format, these numbers are stored indirectly, via their discrete cosine transform. This enables easy compression, resizing, etc. For further savings, the array produced by the discrete cosine transform is stored using Huffman encoding.

The calculations and plotting were carried out using Mathematica. These notes were prepared for MSF’s Programme in Mathematical Simulation and IT. They provided the base for student projects in image manipulation using Matlab. The projects used Fourier analysis as well as wavelets.
Contents
1 Discrete Fourier Transform
2 Discrete Cosine Transform
3 The Two Dimensional Discrete Cosine Transform
4 Huffman Encoding
1 Discrete Fourier Transform
Consider the data depicted in the following graph:
[Figure: ten data points plotted over the interval from 0 to 2π.]
To represent this data in a way that can be easily manipulated for different purposes, we wish to construct a function that passes through all the data points. More specifically, we construct a function of the form

$$f(x) = \frac{A_0}{2} + \sum_{k=1}^{5} A_k \cos(kx) + \sum_{k=1}^{4} B_k \sin(kx), \tag{1}$$

where the Ak’s and Bk’s are suitably chosen constants. This function is called the discrete Fourier transform of the data. Note that we have 10 data points¹ and 10 unknown constants. Further, each data point, on substitution in (1), creates a linear equation for the unknowns. So we can hope to solve this linear system and obtain a unique set of values for Ak and Bk.
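The coefficients can also be computed in code. The following Python sketch (the original notes used Mathematica) avoids solving the 10×10 linear system directly by using the orthogonality of cos(kx) and sin(kx) on the grid x_j = 2πj/10; the ten sample values are made up for illustration, since the data plotted in the graph is not listed in the text.

```python
import math

N = 10
# Hypothetical sample values (the plotted data is not reproduced in the text).
y = [3.0, 4.5, 5.0, 3.5, 2.0, 1.0, 1.5, 2.5, 3.0, 2.0]
x = [2 * math.pi * j / N for j in range(N)]

# Read the coefficients off via orthogonality on the grid; the k = 5
# (Nyquist) cosine term needs the factor 1/N instead of 2/N.
A = [(2 / N) * sum(y[j] * math.cos(k * x[j]) for j in range(N))
     for k in range(5)]
A.append((1 / N) * sum(y[j] * math.cos(5 * x[j]) for j in range(N)))
B = [(2 / N) * sum(y[j] * math.sin(k * x[j]) for j in range(N))
     for k in range(1, 5)]

def f(t):
    # The interpolant of equation (1).
    return (A[0] / 2
            + sum(A[k] * math.cos(k * t) for k in range(1, 6))
            + sum(B[k - 1] * math.sin(k * t) for k in range(1, 5)))

# The interpolant passes exactly through every data point.
assert all(abs(f(x[j]) - y[j]) < 1e-9 for j in range(N))
```

Solving the linear system with a general-purpose solver would give the same coefficients; the orthogonality formulas are simply cheaper and more transparent.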
For our data, the Ak’s turn out to be given by the vector
A = (5.65486, −0.628319, −0.628319, −0.628319, −0.628319, −0.314159),
and the Bk’s by the vector
B = (−1.93377, −0.864806, −0.4565, −0.204153, 0).
¹Actually 11, but the last is just a repeat of the first.
Note: We have not said anything about how to find the coefficients Ak and Bk. Our immediate interest is in observing that this knowledge is useful, and then later we will see how to obtain it.
The discrete Fourier transform f(x) passes exactly through the data points:
[Figure: the interpolant f(x) plotted through all ten data points.]
Now we investigate the contribution of the different coefficients Ak and Bk. Suppose we set A4 and B4 to zero. Then the function becomes:
[Figure: the interpolant with the A4 and B4 terms removed.]
This function doesn’t represent the data exactly, but it does roughly follow the general trend.
Now let us instead drop the A2 and B2 terms:
[Figure: the interpolant with the A2 and B2 terms removed.]
The loss in quality is much greater. This shows that the “higher order” terms contribute less than the “lower order” terms. Therefore, we need not store them to the same order of accuracy.
Suppose then, that we round off the last couple of coefficients of A and B:
A = (2.82743, −0.628319, −0.628319, −0.628319, −0.6, −0.3),
B = (−1.93377, −0.864806, −0.4565, −0.2, 0).
This makes no noticeable difference to the accuracy of the interpolation:
[Figure: the interpolant with the rounded coefficients, still passing close to all the data points.]
2 Discrete Cosine Transform
It is possible to manipulate the discrete Fourier transform of a set of data so that it consists of only cosine terms (all the Bk’s are zero). The benefits from this are simpler computational procedures, especially when we deal
with data which comes as arrays instead of lists. Thus, consider a string of data, such as

Data = (123, 157, 142, 127, 131, 102, 99, 235).

Instead of distributing these values at evenly spaced points 2πk/8, k = 0, 1, . . . , 7, we associate them to the points π(2k + 1)/16. Further, we symmetrically assign the same values to the points −π(2k + 1)/16. Thus, we get a collection of data points that is symmetric with respect to the y-axis:
[Figure: the data values plotted symmetrically about the y-axis.]
If we calculate the discrete Fourier transform for such data, we find that the sine terms vanish (because sine is odd) and only the cosine terms remain (because cosine is even, like the data). This special form is called the discrete cosine transform of the data.
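This cancellation is easy to verify numerically. A small Python check (the original computations were done in Mathematica), using the 8-point data string above placed symmetrically at ±π(2k + 1)/16:

```python
import math

# The data string from the text, placed at x = pi*(2k+1)/16 and
# mirrored to -x, so that the combined data set is even.
data = [123, 157, 142, 127, 131, 102, 99, 235]
points = []
for k, v in enumerate(data):
    x = math.pi * (2 * k + 1) / 16
    points += [(x, v), (-x, v)]

# Every sine coefficient vanishes: sine is odd, so the contributions
# of each mirrored pair cancel exactly.
for j in range(1, 8):
    B_j = sum(v * math.sin(j * x) for x, v in points)
    assert abs(B_j) < 1e-9
```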
For data such as we have given (8 points), the discrete cosine transform is
$$f(x) = \frac{A_0}{2} + \sum_{k=1}^{7} A_k \cos(kx),$$

where the Fourier coefficients Ak are given by

$$A_k = \frac{1}{4} \sum_{n=0}^{7} \mathrm{Data}(n) \cos\!\left(\frac{(2n+1)\pi k}{16}\right).$$
Note that we have numbered the data points as 0, 1, . . . , 7.
For the example we have given, this formula produces the following values for the Fourier coefficients:

A0 = 139.5    A1 = −10.04    A2 = 24.25    A3 = −35.36
A4 = 20.51    A5 = −28.66    A6 = 6.79     A7 = −4.22
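These values can be reproduced directly from the formula. A Python sketch (the original used Mathematica); note one small wrinkle: the stated formula gives A0 = 279, and the 139.5 reported for A0 appears to be A0/2, the constant term of f(x), which is the mean of the data.

```python
import math

data = [123, 157, 142, 127, 131, 102, 99, 235]

# A_k = (1/4) * sum_{n=0}^{7} Data(n) * cos((2n+1)*pi*k/16)
A = [0.25 * sum(data[n] * math.cos((2 * n + 1) * math.pi * k / 16)
                for n in range(8))
     for k in range(8)]

# For k >= 1 these agree with the table, e.g. A_1 = -10.04, A_2 = 24.25.
assert abs(A[1] - (-10.04)) < 0.02
assert abs(A[2] - 24.25) < 0.02
# The constant term A_0/2 of f(x) is the mean of the data, 139.5.
assert abs(A[0] / 2 - 139.5) < 1e-9
```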
The corresponding cosine transform f(x) passes exactly through the data points:
[Figure: the cosine interpolant f(x) passing through all eight data points.]
3 The Two Dimensional Discrete Cosine Transform
A digital image consists of a rectangular array of closely packed pixels, each of which is assigned a colour value. These colour values are given by numbers, and various formats exist for mapping colours to corresponding numbers. For instance, in one format, colours are broken up into their red, green and blue (RGB) components, and a particular colour is chosen by assigning a corresponding intensity (via a number) to each of the RGB components. For example, the background colour of the following box is obtained by setting R = G = B = 0.8:
A Shaded Box
Thus, a digital image, for the mathematician, is just an array of numbers. To manipulate this array, we use a two dimensional version of the discrete cosine transform.
Consider the data depicted in the following table:
123 157 142 127 131 102 99 235
134 135 157 112 109 106 108 136
135 144 159 108 112 118 109 126
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 146 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
We number the rows and columns as 0, 1, 2, . . . , 7. Thus the (0, 0) entry is 123, the (3, 7) entry is 133, etc.
To this data, we apply the two dimensional discrete cosine transform, defined by:
$$\mathrm{DCT}(u, v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} \mathrm{Data}(x, y) \cos\!\left(\frac{\pi(2x+1)u}{16}\right) \cos\!\left(\frac{\pi(2y+1)v}{16}\right).$$
Here Data(x, y) refers to the (x, y) entry in the data table given above. The coefficients C(u) and C(v) are defined by

$$C(h) = \frac{1}{\sqrt{2}} \text{ if } h = 0, \qquad C(h) = 1 \text{ if } h \neq 0.$$
The discrete cosine transform produces the following table, after rounding:²
1149 39 -43 -10 25 -84 11 41
-81 -3 114 -74 -6 -2 21 -6
14 -11 0 -43 25 -3 17 -39
1 -61 -14 -12 36 -24 -18 4
44 13 36 -5 9 -22 6 -8
36 -12 -9 -5 20 -29 -21 13
-19 -8 21 -6 3 2 11 -22
-5 -14 -11 -18 -5 -1 7 -5
²We always round off the results of our calculations to integers, because (1) integers take less space than reals, (2) integer operations are faster, and (3) colour values are usually integers. It is one of the important strengths of the discrete cosine transform that the errors introduced by the rounding off are inconsequential.
The first thing is to establish that we can recover the data from its discrete cosine transform. For this purpose we define the inverse discrete cosine transform by
$$\mathrm{IDCT}(x, y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, \mathrm{DCT}(u, v) \cos\!\left(\frac{\pi(2x+1)u}{16}\right) \cos\!\left(\frac{\pi(2y+1)v}{16}\right).$$
If we apply the IDCT to the DCT table, we get (after rounding):
123 157 142 127 131 102 99 235
134 135 157 112 109 106 108 135
135 144 159 108 112 118 109 127
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 145 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124
Can you spot any difference between this and the original data?
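Both formulas can be checked numerically. The following Python sketch (the original computations were done in Mathematica) computes the DCT of the data table, compares two entries against the table above, and confirms that, even after rounding the DCT to integers, the IDCT recovers every entry to within ±1:

```python
import math

def C(h):
    # The normalisation factor from the DCT definition.
    return 1 / math.sqrt(2) if h == 0 else 1.0

data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]

def dct2(a):
    # Two dimensional DCT, rounded to integers as in the text.
    return [[round(0.25 * C(u) * C(v) * sum(
                a[x][y] * math.cos(math.pi * (2 * x + 1) * u / 16)
                        * math.cos(math.pi * (2 * y + 1) * v / 16)
                for x in range(8) for y in range(8)))
             for v in range(8)] for u in range(8)]

def idct2(d):
    # Inverse transform, also rounded to integers.
    return [[round(0.25 * sum(
                C(u) * C(v) * d[u][v]
                * math.cos(math.pi * (2 * x + 1) * u / 16)
                * math.cos(math.pi * (2 * y + 1) * v / 16)
                for u in range(8) for v in range(8)))
             for y in range(8)] for x in range(8)]

dct = dct2(data)
assert dct[0][0] == 1149 and dct[0][1] == 39  # matches the table above

# Rounding the DCT to integers changes the recovered values by at most 1.
recovered = idct2(dct)
assert max(abs(recovered[x][y] - data[x][y])
           for x in range(8) for y in range(8)) <= 1
```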
Suppose we store the data via its DCT. We ask if we can afford to lose some of the details of the DCT without significantly affecting the quality of the data. One way to reduce the amount of space required by the DCT is to divide every entry by, say, 8 (thus saving 3 bits per entry, since the numbers are stored in binary).
Then the DCT becomes
144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
Clearly, this “Compressed DCT” occupies much less space.
To recover the original data, we just uncompress by multiplying by 8, andthen apply IDCT.
This time there is some loss:
122 161 145 130 128 106 101 233
142 132 156 116 109 107 108 132
138 146 155 105 109 118 113 127
175 184 163 110 190 127 132 134
137 148 155 128 182 149 130 133
119 132 126 147 204 149 133 129
115 149 156 177 248 159 137 125
174 173 171 180 186 178 114 123
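The compress–uncompress round trip can be checked in code. A Python sketch (the original used Mathematica), starting from the rounded DCT table given earlier in this section:

```python
import math

def C(h):
    return 1 / math.sqrt(2) if h == 0 else 1.0

data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]
dct = [
    [1149,  39, -43, -10,  25, -84,  11,  41],
    [ -81,  -3, 114, -74,  -6,  -2,  21,  -6],
    [  14, -11,   0, -43,  25,  -3,  17, -39],
    [   1, -61, -14, -12,  36, -24, -18,   4],
    [  44,  13,  36,  -5,   9, -22,   6,  -8],
    [  36, -12,  -9,  -5,  20, -29, -21,  13],
    [ -19,  -8,  21,  -6,   3,   2,  11, -22],
    [  -5, -14, -11, -18,  -5,  -1,   7,  -5],
]

# Compress: divide every entry by 8 and round.
quant = [[round(v / 8) for v in row] for row in dct]
assert quant[0][0] == 144 and quant[1][2] == 14  # matches the compressed table

def idct2(d):
    return [[round(0.25 * sum(
                C(u) * C(v) * d[u][v]
                * math.cos(math.pi * (2 * x + 1) * u / 16)
                * math.cos(math.pi * (2 * y + 1) * v / 16)
                for u in range(8) for v in range(8)))
             for y in range(8)] for x in range(8)]

# Uncompress (multiply by 8) and apply the IDCT.
recovered = idct2([[8 * v for v in row] for row in quant])

# The loss is visible but small: no entry is off by more than 8 here.
err = max(abs(recovered[x][y] - data[x][y])
          for x in range(8) for y in range(8))
assert err <= 8
```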
Another approach is to compress the entries on the top left less (as these are more significant). For example, we divide the entries in the top left 4×4 submatrix of the DCT by 2, and all the other entries by 8:
574 20 -22 -5 3 -10 1 5
-40 -2 57 -37 -1 0 3 -1
7 -6 0 -22 3 0 2 -5
0 -30 -7 -6 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
We uncompress the last table by multiplying by 2 and 8 in the appropriate places. Then we apply the IDCT, and we get:
120 159 144 129 127 105 100 234
138 130 156 116 109 106 108 133
135 145 156 106 109 117 113 127
174 183 164 111 191 127 131 132
139 149 154 127 183 149 128 129
121 133 125 146 204 149 131 125
114 148 156 177 248 158 136 123
170 172 172 182 186 177 113 123
This hybrid approach offers almost as much compression as the previous one, with lower loss of quality.
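The hybrid scheme can be checked the same way. A Python sketch (the original used Mathematica), again starting from the rounded DCT table of this section; the maximum error drops compared with uniform division by 8:

```python
import math

def C(h):
    return 1 / math.sqrt(2) if h == 0 else 1.0

data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]
dct = [
    [1149,  39, -43, -10,  25, -84,  11,  41],
    [ -81,  -3, 114, -74,  -6,  -2,  21,  -6],
    [  14, -11,   0, -43,  25,  -3,  17, -39],
    [   1, -61, -14, -12,  36, -24, -18,   4],
    [  44,  13,  36,  -5,   9, -22,   6,  -8],
    [  36, -12,  -9,  -5,  20, -29, -21,  13],
    [ -19,  -8,  21,  -6,   3,   2,  11, -22],
    [  -5, -14, -11, -18,  -5,  -1,   7,  -5],
]

# Divisor 2 on the top left 4x4 block, 8 elsewhere.
div = [[2 if u < 4 and v < 4 else 8 for v in range(8)] for u in range(8)]
quant = [[round(dct[u][v] / div[u][v]) for v in range(8)] for u in range(8)]
assert quant[0][0] == 574 and quant[1][2] == 57  # matches the hybrid table

def idct2(d):
    return [[round(0.25 * sum(
                C(u) * C(v) * d[u][v]
                * math.cos(math.pi * (2 * x + 1) * u / 16)
                * math.cos(math.pi * (2 * y + 1) * v / 16)
                for u in range(8) for v in range(8)))
             for y in range(8)] for x in range(8)]

# Uncompress entrywise and invert.
recovered = idct2([[quant[u][v] * div[u][v] for v in range(8)]
                   for u in range(8)])

# Loss is smaller than with uniform division by 8: at most 5 here.
err = max(abs(recovered[x][y] - data[x][y])
          for x in range(8) for y in range(8))
assert err <= 5
```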
4 Huffman Encoding
The discrete cosine transform produces the numbers used to store and transmit an image. However, these numbers are not stored according to their values, but through a code that further reduces the required space. This code names numbers according to their frequency. More frequent numbers are given shorter codes.
Consider the compressed DCT we had obtained in the last section:
144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
We will construct a binary tree out of the numbers in this grid.
Step 1. List all the numbers occurring in the table, along with their frequencies:
Data -10 -9 -8 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 14 144
Freq. 2 1 1 3 1 4 7 14 9 5 5 4 3 2 1 1 1
Step 2. Arrange the numbers in increasing order of frequency:
Data -9 -8 -4 6 14 144 -10 5 -5 4 -3 3 1 2 -2 0 -1
Freq. 1 1 1 1 1 1 2 2 3 3 4 4 5 5 7 9 14
Each number will become a ‘leaf’ of the binary tree. This leaf will be labelled by the number and its frequency. For instance, since 5 has frequency 2, the corresponding leaf will be drawn as 5:2.
Step 3. The two leaves with the lowest frequencies are combined into one node. This node is labelled by the sum of their frequencies. Thus, we get
[Diagram: a node labelled 2, with the leaves -9:1 and -8:1 as its children.]
We repeat this step, with the following modification: leaves and nodes already collected below a node are ignored while comparing frequencies. Only the top nodes and remaining leaves are taken into account.
Step 4. The starting situation is:
[Diagram: the leaves -4:1, 6:1, 14:1 and 144:1, the node 2 (over -9:1 and -8:1), and the leaves -10:2, 5:2, -5:3, . . . ]
On collecting the lowest frequency leaves under a node, we get:
[Diagram: the leaves 14:1 and 144:1, a new node 2 (over -4:1 and 6:1), the earlier node 2 (over -9:1 and -8:1), and the leaves -10:2, 5:2, -5:3, . . . ]
Step 5.
[Diagram: three nodes labelled 2, over the pairs 14:1 and 144:1, -4:1 and 6:1, and -9:1 and -8:1, followed by the leaves -10:2, 5:2, . . . ]
Step 6.
[Diagram: the two nodes labelled 2 (over 14:1 and 144:1, and over -4:1 and 6:1) are combined into a node labelled 4; the node 2 over -9:1 and -8:1, and the leaves -10:2, 5:2, -5:3, 4:3, -3:4, . . . remain.]
[Diagram: the remaining merges produce, in turn, nodes labelled 4, 5, 7, 8, 9, 10, 14, 17, 19, 28 and 36, and finally the root node 64; in the finished tree each left branch is labelled 0 and each right branch 1.]
Figure 1: The binary tree for the Huffman code.
By now, the general scheme should be clear. It is evident that we have made certain choices in each step: namely, the order in which we write nodes/leaves having the same frequency. This does affect the final binary tree we obtain. However, once we have described the method of coding, it will be obvious that these choices do not affect the efficiency of the encoding.
Figure 1 shows the final binary tree for our data. We have also labelled each branch of the tree: by 0 if it is a left branch and by 1 if it is a right branch. The encoding proceeds as follows. To obtain the code for a value, start from the root (the node labelled 64) and move down to the value, noting down each 0 or 1 label for a branch as you cross it. Thus, in moving to the leaf for the value -10, we obtain the sequence 00011. This is the code for that value.

[Figure: the top left 4×4 corner of the compressed DCT table, with arrows indicating the zigzag path, which starts 144, 5, -10, . . . ]

Figure 2: The sequence in which values are encoded.
Note that the most frequent value (-1) has the shortest code (01), and theless frequent ones have progressively longer codes. A value such as 144, withfrequency 1, has the longest code: 100001.
The table is coded by going through the values one by one in the zigzag manner shown in Figure 2 and writing their codes – without any separators! For instance, the starting sequence 144, 5, -10, . . . becomes 1000011011000011. . . (144 → 100001, 5 → 10110, −10 → 00011). To decode this string, one need only refer to the tree. We start at the root and follow the left or right branches according to whether we see a 0 or a 1. When we reach a leaf, we note the corresponding value and start again at the root.
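The whole procedure – building the tree, encoding, decoding – can be sketched in Python (the original used Mathematica). Tie-breaking in `heapq` may differ from the choices behind Figure 1, so individual code words can come out differently, but as noted above this does not affect the efficiency; for simplicity this sketch also reads the table in row-major order rather than the zigzag order of Figure 2.

```python
import heapq
from collections import Counter

# The 64 compressed-DCT values from the table above, in row-major order.
values = [
    144,   5,  -5,  -1,   3, -10,   1,   5,
    -10,   0,  14,  -9,  -1,   0,   3,  -1,
      2,  -1,   0,  -5,   3,   0,   2,  -5,
      0,  -8,  -2,  -2,   4,  -3,  -2,   0,
      6,   2,   4,  -1,   1,  -3,   1,  -1,
      4,  -2,  -1,  -1,   2,  -4,  -3,   2,
     -2,  -1,   3,  -1,   0,   0,   1,  -3,
     -1,  -2,  -1,  -2,  -1,   0,   1,  -1,
]
freq = Counter(values)

# Build the Huffman tree by repeatedly merging the two least frequent
# nodes. A leaf is a bare value; an internal node is a pair (left, right).
heap = [(f, i, v) for i, (v, f) in enumerate(sorted(freq.items()))]
heapq.heapify(heap)
tiebreak = len(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
    tiebreak += 1
tree = heap[0][2]

# Read off the codes: '0' for a left branch, '1' for a right branch.
codes = {}
def walk(node, prefix):
    if isinstance(node, tuple):
        walk(node[0], prefix + "0")
        walk(node[1], prefix + "1")
    else:
        codes[node] = prefix
walk(tree, "")

# The most frequent value (-1) gets a 2-bit code, and the whole table
# takes 231 bits, as in the exercise below the original text.
assert len(codes[-1]) == 2
bits = "".join(codes[v] for v in values)
assert len(bits) == 231

# Decoding: follow branches from the root; emit a value at each leaf.
decoded, node = [], tree
for b in bits:
    node = node[int(b)]
    if not isinstance(node, tuple):
        decoded.append(node)
        node = tree
assert decoded == values
```

The 231-bit total is the same for every valid Huffman tree on these frequencies, which is why the tie-breaking choices do not matter.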
Exercise. Show that our table of values can be described by 231 binary digits if we use Huffman encoding. If, on the other hand, we had worked with codes of fixed length, we would have needed 320 binary digits.