Data Compression 2
Terminology
Physical versus logical
– Physical
    Performed on data regardless of what information it contains
    Translates a series of bits to another series of bits
– Logical
    Knowledge-based
    Change United Kingdom to UK
Data Compression 3
Terminology
Symmetric
– Compression and decompression use roughly the same techniques and take about the same time
– Data transmission that requires compression and decompression on the fly needs this type of algorithm
Data Compression 4
Terminology
Asymmetric
– Most common case: compression takes much more time than decompression
    In an image database, each image is compressed once and decompressed many times
– Less common case: decompression takes much more time than compression
    Creating many backup files that will hardly ever be read
Data Compression 5
Terminology
Non-adaptive
– Uses a static dictionary of predefined substrings that are known to occur with high frequency
Adaptive
– Dictionary is built from scratch from the data being compressed
Data Compression 6
Terminology
Semi-adaptive
– In pass 1, an optimal dictionary is constructed
– In pass 2, the actual compression occurs
Data Compression 7
Terminology
Lossless
– decompress(compress(data)) = data
Lossy
– decompress(compress(data)) ≠ data
– A small change in pixel values may be invisible, however
Data Compression 9
Run-Length Encoding
A repeating string of characters, called a run, is coded into two bytes
– First byte contains the run count, one less than the number of repetitions
– Second byte contains the run value, the character being repeated
Data Compression 10
Run-Length Encoding
‘77777zzzyyyyyyV’ becomes ‘472z5y0V’
– 15-byte string becomes 8 bytes long
– Compression ratio of almost 2 to 1
Some strings become twice as long
– ‘7fu5JLY9jhYIujG’
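The two-byte scheme above can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names are hypothetical, runs are capped at 256 repetitions so the count fits in one byte (an assumption the slides do not state), and `as_text` is only a display helper that writes each count as a decimal digit the way the slide does.

```python
def rle_encode(text):
    """Return (run_count, run_value) pairs; run_count is repetitions minus one."""
    pairs = []
    i = 0
    while i < len(text):
        j = i
        # Extend the run while the character repeats (count capped at 255).
        while j + 1 < len(text) and text[j + 1] == text[i] and j - i < 255:
            j += 1
        pairs.append((j - i, text[i]))
        i = j + 1
    return pairs


def rle_decode(pairs):
    """Expand each (run_count, run_value) pair back into run_count + 1 characters."""
    return "".join(value * (count + 1) for count, value in pairs)


def as_text(pairs):
    """Display helper (hypothetical): only readable when every count is 0-9."""
    return "".join(f"{count}{value}" for count, value in pairs)


encoded = rle_encode("77777zzzyyyyyyV")
print(as_text(encoded))                       # 472z5y0V
print(2 * len(rle_encode("7fu5JLY9jhYIujG")))  # 30 bytes for a 15-byte input
```

The second call shows the worst case: a string with no repeated neighbours produces one pair per character, so the output is twice the input size.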
Data Compression 12
Lempel-Ziv-Welch (LZW)
Lossless
Used in GIF, TIFF, the V.42bis modem compression standard, and PostScript Level 2
Substitutional or dictionary-based
– Algorithm builds a data dictionary
– A code is emitted if a pattern is found in the dictionary; a pattern not already in the dictionary is added
– The dictionary does not have to be transmitted for decompression; the decoder rebuilds it from the code stream
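A minimal sketch of the dictionary-building idea, assuming an initial dictionary of the 256 single-character strings and unbounded integer codes (real formats such as GIF add variable code widths and clear codes, which are omitted here). The function names are illustrative. Note that the decoder receives only the list of codes and reconstructs the same dictionary as it goes.

```python
def lzw_compress(text):
    """Encode a string of characters with code points below 256 as a list of integer codes."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            codes.append(dictionary[current])      # emit code for longest match
            dictionary[candidate] = next_code      # add the new pattern
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the text, and the dictionary, from the codes alone."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)


sample = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), len(codes))
print(lzw_decompress(codes) == sample)
```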
Data Compression 13
Lempel-Ziv-Welch (LZW)
History
– 1977
    Abraham Lempel and Jacob Ziv published a paper on a universal data compression algorithm
      – Called LZ77
– 1978
    Lempel and Ziv formulated an improved, dictionary-based data compression algorithm
      – Called LZ78
Data Compression 14
Lempel-Ziv-Welch (LZW)
History
– 1981
    While working for Sperry, Lempel and Ziv, with some other researchers, filed for a patent for LZ78
      – Granted in 1984
– 1984
    While working for Sperry, Terry Welch modified LZ78
      – Result was the LZW algorithm
      – Published in IEEE Computer
Data Compression 15
Lempel-Ziv-Welch (LZW)
History
– 1985
    Sperry was granted a patent for Welch’s modification and for the implementation of LZW
– 1986
    Sperry and Burroughs merged to form Unisys
      – Ownership of the Sperry patent transferred to Unisys
Data Compression 16
Lempel-Ziv-Welch (LZW)
History
– 1987
    CompuServe created the GIF file format
      – Required use of the LZW algorithm
      – Didn’t check patents for LZW
      – Unisys also didn’t realize GIF used LZW
– 1988
    Aldus released Revision 5.0 of the TIFF file format
      – Used the LZW algorithm
– 1990
    Unisys licensed Adobe for use of the LZW patent for PostScript
Data Compression 17
Lempel-Ziv-Welch (LZW)
History
– 1991
    Unisys licensed Aldus for use of the LZW patent in TIFF
– 1993
    Unisys became aware that the GIF file format used LZW
    Negotiations began with CompuServe
Data Compression 18
Lempel-Ziv-Welch (LZW)
History
– 1994
    Unisys and CompuServe came to an understanding that CompuServe’s use of the LZW algorithm would be licensed for the application of the GIF file format in software used primarily to access the CompuServe Information Service
– 1995
    America Online and Prodigy also entered into license agreements with Unisys for LZW
Data Compression 19
Lempel-Ziv-Welch (LZW)
GIF is not in the public domain
Some people were suspicious regarding CompuServe’s announcement that it was getting a license from Unisys
– In the programming community it had been known for many years prior to this that GIF used LZW and that LZW was patented by Unisys
Data Compression 20
Lempel-Ziv-Welch (LZW)
Some people were suspicious regarding CompuServe’s announcement that it was getting a license from Unisys
– Unisys claimed that CompuServe only found out rather late that this was the case
– GIF was becoming an integral part of the WWW for exchanging low-resolution graphics
Data Compression 21
Lempel-Ziv-Welch (LZW)
Eventually, Unisys’ LZW patent and licensing agreements held
– Unisys reduced license fees after 1995
– Unisys wouldn’t charge anything for inadvertent infringement by GIF software products delivered prior to 1995
    License fees still required for updates delivered after 1995
Data Compression 22
Lempel-Ziv-Welch (LZW)
Not illegal to own, transmit, or receive GIF files, just to compress or decompress them without a license
Data Compression 23
Lempel-Ziv-Welch (LZW)
Sliding-window (LZ77) encoding example, step 1
3 1 2 5 1 3 1 4 1 2 5 1 5 5 1 5 5 1 4
Search buffer: 3 1 2 5 1 3 1    Lookahead buffer: 4 1 2 5 1 5
offset = 0
length = 0
Output is (0, 0, code(4))
Data Compression 24
Lempel-Ziv-Welch (LZW)
Sliding-window (LZ77) encoding example, step 2
3 1 2 5 1 3 1 4 1 2 5 1 5 5 1 5 5 1 4
Search buffer: 1 2 5 1 3 1 4    Lookahead buffer: 1 2 5 1 5 5
offset = 7
length = 4
Output is (7, 4, code(5))
Data Compression 25
Lempel-Ziv-Welch (LZW)
Sliding-window (LZ77) encoding example, step 3
3 1 2 5 1 3 1 4 1 2 5 1 5 5 1 5 5 1 4
Search buffer: 1 4 1 2 5 1 5    Lookahead buffer: 5 1 5 5 1 4
offset = 3
length = 5
Output is (3, 5, code(4))
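The three steps above emit (offset, length, next-symbol) triples from a search buffer and a lookahead buffer, which is the LZ77 sliding-window scheme rather than LZW's growing dictionary. The sketch below reproduces them under two assumptions inferred from the offsets and lengths shown: a 7-symbol search buffer and a 6-symbol lookahead buffer. The function names and the `start` parameter (which lets encoding begin mid-stream, as the slides do) are hypothetical.

```python
def lz77_encode(symbols, search_size=7, lookahead_size=6, start=0):
    """Emit (offset, length, next_symbol) triples for symbols[start:]."""
    triples = []
    pos = start
    while pos < len(symbols):
        best_offset, best_length = 0, 0
        # Reserve one lookahead position for the literal that ends the triple.
        max_length = min(lookahead_size - 1, len(symbols) - pos - 1)
        for candidate in range(max(0, pos - search_size), pos):
            length = 0
            # A match may run past pos, into the lookahead buffer itself.
            while (length < max_length
                   and symbols[candidate + length] == symbols[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        triples.append((best_offset, best_length, symbols[pos + best_length]))
        pos += best_length + 1
    return triples


def lz77_decode(triples, history=()):
    """Replay triples on top of already-decoded history."""
    out = list(history)
    for offset, length, symbol in triples:
        for _ in range(length):
            out.append(out[-offset])   # copy one symbol at a time so overlaps work
        out.append(symbol)
    return out


data = [3, 1, 2, 5, 1, 3, 1, 4, 1, 2, 5, 1, 5, 5, 1, 5, 5, 1, 4]
triples = lz77_encode(data, start=7)
print(triples)   # [(0, 0, 4), (7, 4, 5), (3, 5, 4)]
print(lz77_decode(triples, history=data[:7]) == data)
```

The last triple shows why the copy is done one symbol at a time: an offset of 3 with a length of 5 reads symbols that the same triple has just written.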
Data Compression 26
JPEG
Joint Photographic Experts Group
1982
– ISO (International Organization for Standardization) formed the Photographic Experts Group (PEG)
    Develop methods of transmitting video, images, and text over ISDN (Integrated Services Digital Network) lines
Data Compression 27
JPEG
1986
– A subgroup of CCITT (International Telegraph and Telephone Consultative Committee) began to look at methods of compressing color and gray-scale data for fax transmission
– Methods for this were similar to those being considered by PEG
Data Compression 28
JPEG
1987
– Two groups combined into JPEG
Most previous compression methods did a poor job of compressing continuous-tone image data
Data Compression 29
JPEG
Very few file formats can support 24-bit raster images
– GIF only works for 256 colors
– LZW doesn’t work well on scanned image data
– TIFF and BMP didn’t compress this type of image data very well
Data Compression 30
JPEG
JPEG compresses continuous-tone image data with a pixel depth of 6-24 bits with good efficiency
JPEG itself doesn’t define a standard file format
Data Compression 31
JPEG
Toolkit of methods with a quality-compression trade-off
Lossy
– Discards information that the human eye cannot easily see
    Slight changes in color are not perceived well
    Slight changes in intensity are well perceived
Data Compression 32
JPEG
Works well with color or gray-scale continuous-tone images: photographs, video stills, complex graphics which resemble natural objects
Doesn’t work well for animations, ray tracing, line art, black-and-white documents, and typical vector graphics
Data Compression 33
JPEG
End user can tune the quality of a JPEG encoder through the Q-factor, which ranges from 1 to 100
– Q-factor = 1 produces the smallest, worst-quality images
– Q-factor = 100 produces the largest, best-quality images
Optimal value of the Q-factor is image dependent
Data Compression 34
JPEG
JPEG introduces artifacts in images containing large areas of a single color
JPEG is slow if implemented in software
Baseline JPEG
– Minimal subset of JPEG which all JPEG-aware applications are required to support
Data Compression 36
JPEG
Color transform
– Encodes each component in a color model separately
– Is independent of any color space model
Data Compression 37
JPEG
Color transform
– Best compression ratios result if a luminance (gray scale)/chrominance (color) color space, such as YUV, is used
    Human eyes are more sensitive to luminance information (Y) than to chrominance information (U, V)
    The other models spread the information humans are sensitive to across each of their 3 components
Data Compression 38
JPEG
Down-sampling
– Average groups of pixels together
– To exploit humans’ lesser sensitivity to chrominance information, we use fewer pixels for the chrominance channels
    In an image of 1000 × 1000 pixels, we might use 1000 × 1000 luminance pixels, but only 500 × 500 chrominance pixels
      – Each chrominance pixel covers the same area as a 2 × 2 block of luminance pixels
Data Compression 39
JPEG
Down-sampling
– For each 2 × 2 block, we can store 6 pixel values
    4 luminance values and 2 chrominance values [1 for each of 2 channels]
  instead of 12
    4 pixel values for each of 3 channels
    This 50% reduction in data has almost no perceivable effect
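The averaging and the 6-versus-12 count can be checked with a short sketch. It assumes a channel is a list of equal-length rows with even dimensions; both function names are illustrative, not part of any JPEG library.

```python
def downsample_2x2(channel):
    """Replace each 2 x 2 block of a channel with the average of its four values."""
    return [
        [
            (channel[r][c] + channel[r][c + 1]
             + channel[r + 1][c] + channel[r + 1][c + 1]) / 4
            for c in range(0, len(channel[0]), 2)
        ]
        for r in range(0, len(channel), 2)
    ]


def stored_values(width, height, subsample_chroma):
    """Count stored samples: one luminance channel plus two chrominance channels."""
    luminance = width * height
    if subsample_chroma:
        chroma_per_channel = (width // 2) * (height // 2)
    else:
        chroma_per_channel = width * height
    return luminance + 2 * chroma_per_channel


print(downsample_2x2([[1, 3], [5, 7]]))      # [[4.0]]
print(stored_values(2, 2, True))             # 6
print(stored_values(2, 2, False))            # 12
print(stored_values(1000, 1000, True))       # 1500000
```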
Data Compression 40
JPEG
Discrete cosine transform
– For each color channel, the image data is divided into 8 × 8 blocks
– DCT applied to each block
    Low-order, or DC, term represents the average value in the block
    Successive higher-order, or AC, terms represent the strength of more rapid changes across the block
Data Compression 41
JPEG
Discrete cosine transform
– Can discard high-frequency data
– DCT is lossless except for roundoff errors
– DCT is the most costly step in JPEG
Data Compression 43
JPEG
An 8 × 8 block from an 8-bit image
124 125 122 120 122 119 117 118
121 121 120 119 119 120 120 118
126 124 123 122 121 121 120 120
124 124 125 125 126 125 124 124
127 127 128 129 130 128 127 125
143 142 143 142 140 139 139 139
150 148 152 152 152 152 150 151
156 159 158 155 158 158 157 156
Data Compression 44
JPEG
The DCT coefficients corresponding to the previous 8 × 8 block
  39.88   6.56  -2.24   1.22  -0.37  -1.08   0.79   1.13
-102.43   4.56   2.26   1.12   0.35  -0.63  -1.05  -0.48
  37.77   1.31   1.77   0.25  -1.50  -2.21  -0.10   0.23
  -5.67   2.24  -1.32  -0.81   1.41   0.22  -0.13   0.17
  -3.37  -0.74  -1.75   0.77  -0.62  -2.65  -1.30   0.76
   5.98  -0.13  -0.45  -0.77   1.99  -0.26   1.46   0.00
   3.97   5.52   2.39  -0.55  -0.051 -0.84  -0.52  -0.13
  -3.43   0.51  -1.07   0.87   0.96   0.09   0.33   0.01
The top-left entry (39.88) is the DC coefficient; all other entries are AC coefficients
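A direct, unoptimized sketch of the 8 × 8 transform. It assumes the orthonormal DCT-II and a level shift of 128 applied to each 8-bit sample before the transform; the slides do not state these conventions, but they are consistent with the DC value shown (39.88 is eight times the difference between the block mean and 128). Real encoders use fast factorizations rather than this four-level loop.

```python
import math

# The 8 x 8 block of 8-bit samples from the slide.
BLOCK = [
    [124, 125, 122, 120, 122, 119, 117, 118],
    [121, 121, 120, 119, 119, 120, 120, 118],
    [126, 124, 123, 122, 121, 121, 120, 120],
    [124, 124, 125, 125, 126, 125, 124, 124],
    [127, 127, 128, 129, 130, 128, 127, 125],
    [143, 142, 143, 142, 140, 139, 139, 139],
    [150, 148, 152, 152, 152, 152, 150, 151],
    [156, 159, 158, 155, 158, 158, 157, 156],
]


def dct_8x8(block, level_shift=128):
    """Orthonormal 2-D DCT-II of an 8 x 8 block (level_shift is an assumption)."""
    n = 8

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency (row index of the output)
        for v in range(n):      # horizontal frequency (column index of the output)
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += ((block[x][y] - level_shift)
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


coeffs = dct_8x8(BLOCK)
print(round(coeffs[0][0], 2))   # DC term
print(round(coeffs[1][0], 2))   # lowest vertical-frequency AC term
```

The large magnitude of the first-column terms reflects the block's content: values change slowly along each row but rise steadily from the top rows to the bottom rows.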
Data Compression 45
JPEG
Quantization
– Divide DCT output by a quantization coefficient and round the result to an integer
    The larger the coefficient, the more data is lost
    Each of the 64 positions of the DCT output block has its own coefficient
      – Higher-order terms have a larger coefficient
    Different coefficients for luminance and chrominance channels
Data Compression 46
JPEG
Quantization
– This is the step controlled by the quality factor
– Selecting quantization coefficients is an art
Data Compression 47
JPEG
Sample quantization table
– Coefficients based on human perception
16  11  10  16  24   40   51   61
12  12  14  19  26   58   60   55
14  13  16  24  40   57   69   56
14  17  22  29  51   87   80   62
18  22  37  56  68  109  103   77
24  35  55  64  81  104  113   92
49  64  78  87 103  121  120  101
72  92  95  98 112  100  103   99
Data Compression 48
JPEG
Labels
– Label $lab_{ij}$ corresponding to the quantized value of the transform coefficient $c_{ij}$ is
    $lab_{ij} = \left\lfloor \frac{c_{ij}}{Q_{ij}} + 0.5 \right\rfloor$
  where $Q_{ij}$ is the $(i,j)$th element of the quantization table
Data Compression 49
JPEG
Quantizer labels corresponding to the previous 8 × 8 block
 2  1  0  0  0  0  0  0
-9  0  0  0  0  0  0  0
 3  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
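The labels above follow from dividing each DCT coefficient by the matching quantization-table entry, adding 0.5, and taking the floor. The sketch below applies that rule to the coefficient and table values copied from the slides; `quantize` and `dequantize` are illustrative names, and `dequantize` (multiplying each label back by its table entry) is the usual reconstruction step, included here as an assumption since the slides do not show it.

```python
import math

# DCT coefficients and sample quantization table, copied from the slides.
DCT_COEFFS = [
    [39.88, 6.56, -2.24, 1.22, -0.37, -1.08, 0.79, 1.13],
    [-102.43, 4.56, 2.26, 1.12, 0.35, -0.63, -1.05, -0.48],
    [37.77, 1.31, 1.77, 0.25, -1.50, -2.21, -0.10, 0.23],
    [-5.67, 2.24, -1.32, -0.81, 1.41, 0.22, -0.13, 0.17],
    [-3.37, -0.74, -1.75, 0.77, -0.62, -2.65, -1.30, 0.76],
    [5.98, -0.13, -0.45, -0.77, 1.99, -0.26, 1.46, 0.00],
    [3.97, 5.52, 2.39, -0.55, -0.051, -0.84, -0.52, -0.13],
    [-3.43, 0.51, -1.07, 0.87, 0.96, 0.09, 0.33, 0.01],
]
Q_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coeffs, table):
    """Label = floor(coefficient / table entry + 0.5), position by position."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(labels, table):
    """Approximate reconstruction: label times table entry (assumed decoder step)."""
    return [[label * q for label, q in zip(l_row, q_row)]
            for l_row, q_row in zip(labels, table)]


labels = quantize(DCT_COEFFS, Q_TABLE)
for row in labels:
    print(row)
print(dequantize(labels, Q_TABLE)[1][0])   # -108, versus the original -102.43
```

Only three of the 64 labels are nonzero, which is what makes the entropy coding stage that follows so effective.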
Data Compression 50
Encoding
Huffman-compress the resulting coefficients
– Can use arithmetic coding as well
Data Compression 51
Huffman Coding
Lossless
Symbol  Probability      Symbol  Probability
a       .10              a       .10
b       .20              b       .20
c       .04              (ce)    .11
d       .10              d       .10
e       .07              f       .20
f       .20              g       .29
g       .29
Data Compression 52
Huffman Coding
Symbol        Probability      Symbol           Probability
(ad)          .20              ((ad)(ce))       .31
b             .20              b                .20
(ce)          .11              f                .20
f             .20              g                .29
g             .29

Symbol        Probability      Symbol           Probability
((ad)(ce))    .31              (((ad)(ce))g)    .60
(bf)          .40              (bf)             .40
g             .29
Data Compression 53
Huffman Coding
[Figure: Huffman code tree with leaves a, d, c, e, g, b, f from left to right; each left branch is labeled 0 and each right branch is labeled 1]
Symbol  Code      Symbol  Code
a       0000      e       0011
b       10        f       11
c       0010      g       01
d       0001
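The merging procedure in the preceding tables (repeatedly combine the two least probable entries) can be sketched with a priority queue. To avoid floating-point ties, the probabilities are expressed as integer percentages, which is a convenience of this sketch rather than something the slides do. Because several entries tie at .20, the tie-breaking here may merge different pairs than the slides did, so individual codewords can differ; the average code length is the same for any valid Huffman code.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Build a prefix code from a {symbol: weight} mapping; symbols must be strings."""
    tiebreak = count()   # unique sequence numbers keep heap comparisons well defined
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # two least probable entries
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):           # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                 # leaf: a symbol
            codes[node] = prefix or "0"

    assign(heap[0][2], "")
    return codes


# Probabilities from the slide, as percentages.
WEIGHTS = {"a": 10, "b": 20, "c": 4, "d": 10, "e": 7, "f": 20, "g": 29}
codes = huffman_codes(WEIGHTS)
average_bits = sum(WEIGHTS[s] * len(codes[s]) for s in WEIGHTS) / 100
print(codes)
print(average_bits)   # 2.62 bits per symbol, matching the slide's code
```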
Data Compression 54
Arithmetic Coding
Lossless
Symbol  Probability
a       .3
b       .2
c       .1
d       .4
String = cadd
Data Compression 55
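Arithmetic Coding
[Figure: the current interval is subdivided in the order a, b, c, d as each symbol of cadd is read; * marks the subinterval selected at each step]
Interval            Subdivision points        Symbol   Selected subinterval
[0.0000, 1.0000)    0.3000  0.5000  0.6000    c        [0.5000, 0.6000)
[0.5000, 0.6000)    0.5300  0.5500  0.5600    a        [0.5000, 0.5300)
[0.5000, 0.5300)    0.5090  0.5150  0.5180    d        [0.5180, 0.5300)
[0.5180, 0.5300)    0.5216  0.5240  0.5252    d        [0.5252, 0.5300)
• Tag for string cadd is any number in [0.5252, 0.5300)
• Such a number is .10000111 in binary (0.52734375 in decimal)
• Thus, the code of cadd is 10000111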
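The interval narrowing for the four-symbol string cadd can be reproduced exactly with rational arithmetic. This sketch only computes the final tag interval; it does not implement the incremental bit output or finite-precision rescaling that a practical arithmetic coder needs. The function name is illustrative, and the symbols are assumed to occupy the unit interval in the order a, b, c, d, as in the figure.

```python
from fractions import Fraction

# Symbol probabilities from the slide, in the order they subdivide the interval.
PROBABILITIES = {
    "a": Fraction(3, 10),
    "b": Fraction(2, 10),
    "c": Fraction(1, 10),
    "d": Fraction(4, 10),
}


def tag_interval(message, probabilities):
    """Return the half-open interval [low, high) that identifies the message."""
    cumulative = {}
    running = Fraction(0)
    for symbol, p in probabilities.items():
        cumulative[symbol] = running          # lower bound of the symbol's slot
        running += p

    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        high = low + width * (cumulative[symbol] + probabilities[symbol])
        low = low + width * cumulative[symbol]
    return low, high


low, high = tag_interval("cadd", PROBABILITIES)
print(float(low), float(high))                # 0.5252 0.53
tag = Fraction(0b10000111, 2 ** 8)            # the binary fraction .10000111
print(float(tag), low <= tag < high)          # 0.52734375 True
```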
Data Compression 56
JPEG Extensions
Progressive
– For applications that need to receive JPEG data streams and display them on the fly
– A baseline JPEG image can be displayed only after all of the image data has been received
Data Compression 57
JPEG Extensions
Progressive
– Instead of interlacing, where a majority of the image must be sent to be able to tell what it is, we send successively better resolution images
Lossless JPEG
Data Compression 58
Fractal Compression
Suppose we have a linear, non-identity function of one variable, g, having $x_f$ as a fixed point
– $g(x_f) = x_f$
We can compute the fixed point by the sequence of approximations x*, g(x*), g(g(x*)), g(g(g(x*))), …, where x* is any initial approximation
Data Compression 59
Fractal Compression
Example
– f(x) = ax + b
– Fixed point is the solution to $x_f = a x_f + b$, or
    $x_f = \frac{b}{1 - a}$
– For a = 0.5, b = 1, we have that $x_f = 2$
Data Compression 60
Fractal Compression
Example
– To calculate the fixed point by the previous approximation, use the initial guess 1 and calculate g(1), g(g(1)), g(g(g(1))), …, where g(x) = x/2 + 1
    The approximations are 1.5, 1.75, 1.875, 1.9375, …, which converge to 2, the fixed point
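The iteration is short enough to run directly. The helper name `iterate` is illustrative; `g` is the function from the example. Convergence here relies on the slope 0.5 having magnitude below 1, so each step halves the distance to the fixed point.

```python
def iterate(g, x0, steps):
    """Return the successive values g(x0), g(g(x0)), ... for the given number of steps."""
    values = []
    x = x0
    for _ in range(steps):
        x = g(x)
        values.append(x)
    return values


def g(x):
    return x / 2 + 1          # a = 0.5, b = 1


print(iterate(g, 1.0, 4))     # [1.5, 1.75, 1.875, 1.9375]
print(1 / (1 - 0.5))          # closed form b / (1 - a) = 2.0
print(iterate(g, 100.0, 60)[-1])   # a different starting guess approaches the same value
```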
Data Compression 61
Fractal Compression
Given an image I, treated as an array of integers, suppose we have a non-identity function g such that g(I) = I
If it were cheaper to encode g than to encode I, we could communicate g and reconstruct I by the sequence of approximations $I_0$, $g(I_0)$, $g(g(I_0))$, $g(g(g(I_0)))$, …, where $I_0$ is the all-zero image
Data Compression 62
Fractal Compression
Partition the image into equal-size range blocks
For each range block, $R_k$, find a domain block, $D_k$, twice the size of a range block, and a function $g_k$ such that
    $g_k(D_k) \approx R_k$
Data Compression 63
Fractal Compression
Consider the function
    $g = \bigcup_k g_k$
– This function has a fixed point $I_f = g(I_f)$, where
    $I \approx I_f$
Data Compression 64
Fractal Compression
$g_k$ is a composition of a geometric transformation followed by a massic transformation
– Geometric transformation
    Moves the domain block
    Changes the size of the domain block
– Massic transformation
    Adjusts intensity and orientation of pixels