
Context-based image retrieval system: mechanisms differences from

traditional retrieval systems and related compression and histogram

techniques

(Yan Wei, Yang Li, Ni Xudong)

Abstract

It is necessary to index and retrieve relevant information on the Internet and to keep pace with the latest data. For these purposes, retrieval systems have been researched and developed for indexing and retrieving. However, there is still much room for retrieval systems to develop. The increasing development of advanced multimedia applications requires new technologies for organizing and retrieving, by content, databases of still digital images or digital video sequences: the context-based image retrieval system. This paper provides a survey of the mechanisms of traditional retrieval systems and content-based image retrieval systems and introduces two promising research directions, histograms and compression. Based on hands-on projects, the paper first compares traditional retrieval systems with content-based image retrieval systems. Second, the paper focuses on the color feature, a main cue by which humans recognize objects. Since Swain and Ballard, many effective methods have been proposed to retrieve images on the basis of the color feature, such as Histogram Intersection and the Color Histogram. As requirements grow and new problems emerge in this field, further research and techniques have been proposed to increase the quality and speed of image retrieval; this section surveys the recursive HSV-space segmentation technique, the description and representation of multimedia information, multiple distributions, the color cooccurrence histogram and the compressed color histogram, as well as the intelligent retrieval system. In the third section, the paper surveys the compression techniques used in still image compression and in video compression. For still images, the paper focuses on lossless compression by Huffman coding and arithmetic coding, and on lossy image compression by DCT-based transform coding and subband coding. For video compression, the paper focuses on the most popular compression technique, MPEG, and introduces four kinds of MPEG formats. At the end, the paper provides an overview of JPEG 2000.

Section 1. (by Yan Wei)

1. Introduction

It is necessary to index and retrieve relevant information on the Internet and to keep pace with the latest data. For these purposes, corporations have researched and developed many tools for indexing and retrieving, which are called Information Retrieval Systems. However, because these systems are still at an immature stage of construction, there is much room left for Information Retrieval Systems to develop. The increasing development of advanced multimedia applications requires new technologies for organizing and retrieving, by content, databases of still digital images or digital video sequences. To this end, the paper expounds a new approach: image retrieval.

2. Traditional retrieval systems

2.1. How retrieval systems work

A traditional retrieval system has three parts: the user retrieval interface, the indexing spider and the database engine.

2.1.1 Indexing Spider

As described in [5], indexing spiders, just like browsers, request and retrieve documents from web servers. Unlike browsers, however, they do so not for viewing by humans but for automatic indexing and inclusion in their database, and they do it around the clock. Each new document encountered by the spider is scanned for links, and these links are either traversed immediately or scheduled for later retrieval. Theoretically, by following all links starting from a representative initial set of documents, a spider will end up having indexed the whole Web.

Almost all systems allow users to add their URLs to the database for spidering. Some of them retrieve submitted documents immediately; others schedule them for future scanning. Spiders update their databases by revisiting sites they have already indexed. Update periods vary from one week to several months.
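The link-following behaviour described above can be illustrated with a minimal sketch. It is not the spider of any particular system: the in-memory `TOY_WEB` dictionary, the page names and the `crawl` function are hypothetical stand-ins for real HTTP requests and a real database.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.html": ("home page", ["b.html", "c.html"]),
    "b.html": ("about page", ["a.html", "d.html"]),
    "c.html": ("news page", []),
    "d.html": ("contact page", ["c.html"]),
    "e.html": ("orphan page", []),  # not reachable from the seed set
}

def crawl(seeds, fetch):
    """Breadth-first crawl: index each new page and schedule its links."""
    index = {}            # URL -> text, standing in for the database
    queue = deque(seeds)  # links scheduled for later retrieval
    while queue:
        url = queue.popleft()
        if url in index:
            continue      # already indexed
        page = fetch(url)
        if page is None:
            continue      # dead link
        text, links = page
        index[url] = text
        queue.extend(link for link in links if link not in index)
    return index

indexed = crawl(["a.html"], TOY_WEB.get)
print(sorted(indexed))
```

In this toy example the page `e.html` is never indexed because no link leads to it, which is why the initial set of documents has to be representative and why systems let users submit URLs.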

2.1.2. How to design for retrieval systems

All searches on the Web are done via keywords, so probably the most important requirement is to make sure that your documents contain all the keywords that are likely to be used to find them. Two distinct strategies can be outlined in this respect [6].

The first idea is simple: the more keywords a page hits, the better. Therefore, it is necessary to think about all possible synonyms, variants, generic inclusive terms, subterms, and related concepts for words. In addition, keywords can be entered in a different grammatical form, such as plural instead of singular for nouns. The document should therefore contain the most common collocations of the main keyword with closely related nouns, adjectives, verbs, and so on.

The second idea is that one of the factors in results ranking, as implemented by major systems, is frequency, which is computed as the number of keyword occurrences divided by the document size. One consequence of this calculation is that if two documents contain the same keyword (located at the same distance from the top of the document), the one that is smaller in size will get a higher ranking.

These two keyword strategies correspond to the two types of search queries, specific and general searches. Some retrieval system users are looking for very specific information: they use rare keywords, phrase searches, and various advanced features such as Boolean operators.

2.1.3. User interface

The user interface is the part of the system visible to users, through which they generate search queries.

All major retrieval systems have, besides the simplest form of query with one or several keywords, some additional search options. However, the scope of these features varies significantly, and no standard syntax for invoking them is yet established. Among the most common search options are:

Boolean operators: AND (find all), OR (find any), AND NOT (exclude) to combine keywords in queries.

Phrase search: looking for the keywords only if they are positioned in the document next to each other, in this particular order.

Proximity: looking for the keywords only if they are close enough to each other (the notion of "close enough" ranges from 2 in-between words for WebCrawler to 25 words for Lycos).

Media search: looking for pages containing Java applets, Shockwave objects, and so on.

Special searches: looking for keywords or URLs within links, image names, document titles.

Various search constraints: limiting the search to a time span of document creation, specifying a document language (Alta Vista), and so on.
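The first three options in the list above can be sketched in a few lines. Since no standard syntax exists, the function names and the simple word tokenizer below are assumptions made for illustration and do not reproduce the behaviour of any named search engine.

```python
import re

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_boolean(doc, all_of=(), any_of=(), none_of=()):
    """AND (all_of), OR (any_of) and AND NOT (none_of) over whole words."""
    present = set(tokens(doc))
    return (all(w.lower() in present for w in all_of)
            and (not any_of or any(w.lower() in present for w in any_of))
            and not any(w.lower() in present for w in none_of))

def match_phrase(doc, phrase):
    """True if the phrase words occur next to each other, in order."""
    toks, want = tokens(doc), tokens(phrase)
    n = len(want)
    return n > 0 and any(toks[i:i + n] == want for i in range(len(toks) - n + 1))

def match_proximity(doc, first, second, max_between):
    """True if at most max_between words separate the two keywords."""
    toks = tokens(doc)
    pos_a = [i for i, t in enumerate(toks) if t == first.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == second.lower()]
    return any(0 < abs(i - j) <= max_between + 1 for i in pos_a for j in pos_b)

doc = "Color histograms support fast image retrieval from large image databases"
bool_hit = match_boolean(doc, all_of=["image", "color"], none_of=["video"])
phrase_hit = match_phrase(doc, "image retrieval")
reversed_hit = match_phrase(doc, "retrieval image")
near_2 = match_proximity(doc, "color", "retrieval", 2)
near_25 = match_proximity(doc, "color", "retrieval", 25)
print(bool_hit, phrase_hit, reversed_hit, near_2, near_25)
```

In the sample sentence four words separate "color" from "retrieval", so the pair matches under a 25-word limit but not under a 2-word limit.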

In the future, retrieval systems may offer more sophisticated options, although for now, their search interfaces seem to be developing in another direction, described in the following subsection.

2.2. Ranking and results

All retrieval systems rank their results so that more relevant documents are at the top of the list [5]. This sorting is based on the frequency of keywords within a document, and the distance of keyword occurrences from the beginning of the document. In other words, if one document contains two matches for a keyword and another is identical but contains only one, the first document will be closer to the top of the list. If two documents are identical except that one has a keyword positioned closer to the top, it will come first. Usually, lists of search results contain document titles, URLs, summaries, and sometimes the dates of document creation and document sizes. For compiling document summaries, several approaches have been developed.

Many retrieval systems use META descriptions provided by page authors, but when META data is unavailable, they usually take the first 100 or 200 characters of page text. Excite stands apart by ignoring META tags altogether and employing a sophisticated algorithm that extracts sentences and presents them as the page's summary.
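The ranking rule described in this subsection can be sketched as follows. The text does not say how real systems weigh frequency against position, so sorting first by frequency (occurrences divided by document length in words) and then by the position of the first occurrence is an assumption, as are the function and document names.

```python
import re

def rank(documents, keyword):
    """Order document names by keyword frequency, then by earliest hit."""
    kw = keyword.lower()
    scored = []
    for name, text in documents.items():
        toks = re.findall(r"[a-z0-9]+", text.lower())
        positions = [i for i, t in enumerate(toks) if t == kw]
        if not positions:
            continue  # documents without the keyword are not returned
        frequency = len(positions) / len(toks)
        # Negative frequency so that a plain ascending sort puts the
        # most frequent first; ties fall back to the first position.
        scored.append((-frequency, positions[0], name))
    return [name for _, _, name in sorted(scored)]

documents = {
    "one_hit": "image search finds picture files",
    "late_hit": "search finds picture files image",
    "two_hits": "image search finds image files",
    "no_hit": "text only",
}
ranking = rank(documents, "image")
print(ranking)
```

Here `two_hits` comes first because it has the higher frequency, and `one_hit` precedes `late_hit` because its keyword is closer to the top of the document.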

2.3. A typical traditional retrieval system developed by us

We will expound the framework, methods and flow charts of a traditional retrieval system, based on hands-on experience with AIL, the Multi-Coding Intelligent Retrieval System, which was developed by Wei Yan [4]. Figure 1 shows the flow control of the project.

Fig. 1 Flow control of the program

AIL adds some features on the basis of the traditional retrieval system: multi-code support, flexible searching methods, a varied search grammar, subject orientation and personalization for users.

[Fig. 1 flowchart labels: Build up the search object; Search from the original URL; Find the free slot and generate the search threads; No free slot AND URL list is empty; WWW address queue is empty; Parse pages and segmentation; Store data in the database (Oracle); User stops search; Stop search.]

User retrieval interface

The user interface of “AIL” can be divided into a general search retrieval interface and a user-template-based interface. The general search retrieval interface is composed of simple and advanced search modules.

Simple interface

Fig. 2 AIL simple search interface

advanced search interface

Fig.3 AIL advanced search interface

Fig. 4 the search results of AIL

user template search interface

Fig. 5 the user template search interface

In the advanced search interface (shown in Figure 3), users can input keywords and sentences. Users can also choose search parameters, for example the author of the document, subjects, the file size, and the modification date of files. Based on these parameters, the system generates a retrieval statement dynamically. Further, the system arranges the data in the database by professional field. If users choose a professional template in the interface, the system retrieves data from the corresponding professional databases, which speeds up retrieval and realizes the idea of subject orientation.

The results include the document number, title, size, creation date, abstract and the whole article. In addition, AIL ranks the results by number of hits.

2.4. the shortcomings of traditional retrieval system

There are some shortcomings of traditional retrieval systems:

Traditional retrieval methods cannot provide retrieval on the semantic level.

Some spiders cannot understand frames. Such spiders often fail when they parse the frames in HTML pages.

Retrieval systems cannot yet make heads or tails of any images, audio or video clips, so these bits of information are wasted. What remains is pure HTML source, of which spiders additionally strip off all markup and tags to get to the bare-bones plain text.

3. Image retrieval system

With the recent growth of the World Wide Web, many image and video storage and retrieval systems are being ported to the Web to give the public easy access. A new kind of retrieval system is currently being developed: the image and video retrieval system. An image and video storage and retrieval system is an application that manages and stores images and videos and provides tools for retrieving them.

3.1. Content-Based Query

The basic idea of content-based query is that, when the user can describe some of the prominent visual features of an image or video, the computer can search the archive and return the images and videos that best match the description. Typically, research on content-based query has focused on the visual features of color, texture and shape. For example, in [2], the IBM Query By Image Content (QBIC) project proposes and utilizes feature sets that capture the color, texture and shape of image objects that have been segmented manually. Texture and color features that describe the global properties of images are also used.

The components of image retrieval system include:

a graphical user interface

a server application for receiving and processing queries

an image retrieval server

an image archive

index files that index the images in the archive by visual features

In [2], the system provides the user with a graphical user-interface. The user-interface will collect the user's query and communicate the query to the query server. The Query Server receives the query message from the system interface and translates the query to find the images that match the query. After finding the best image matches to the query, thumbnail presentations of the image matches are to be transmitted back and presented to the user. The user should be able to select from the thumbnails to download the image files.

The query formulated by the user is translated into a query string which is passed to the query server. The query string should be visible, decodable and alterable by the user. The visibility of the query string may help the user to understand how the query information is recorded, which may improve the user's ability to use the system. The decodability condition requires that the query string be easily interpreted. The user should also be able to both reuse and alter queries by entering query strings directly if desired. After the query is executed by the server, the output of the query should include:

indication to the user that the parameters were received correctly

the image thumbnails

indications of the match scores for each image match.

3.2. VisualSEEk

[2] introduces a specific example: VisualSEEk. VisualSEEk is a visual feature retrieval system developed at Columbia University. It provides a tool with which a person may search for and retrieve images and videos over the Web. The person formulates queries by using the VisualSEEk interface tools to illustrate salient features of the desired images and videos. The query is sent to the server, which finds the images and videos that best match the visual description in the query and returns them to the user. The VisualSEEk interface also provides tools to assign other visual properties to query elements. These include texture, shape, motion and embedded text.

3.2.1. System design

Fig. 6 system design

The overall system architecture will consist of three parts [3]:

user-interface

network handling

server programs

The user-interface will consist of a Web browser. The browser will have the ability to connect to URLs on the Web, display HTML pages, and display and save JPEG images. VisualSEEk collects the query from the user and sends the query string to the server as parameters of a CGI-BIN URL.

The communication across the network from user to server will be handled entirely by the HTTP protocol. The Common Gateway Interface (CGI) used with HTTP will be used to execute the server program on the server machine. The query string will be passed within the URL. The user interface will collect the query from the user and form a query string. The query program will execute by reading the query string file, the database indexes and the database, and the query will then be performed. The HTML output of the query shall be structured such that image thumbnails appear for each matched image. When the user selects a thumbnail, the corresponding image is downloaded from the server.
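Passing a query within a CGI URL can be sketched with the Python standard library. The endpoint `http://example.org/cgi-bin/query` and the parameter names `color`, `x`, `y` and `r` are hypothetical and are not VisualSEEk's actual query format; the sketch only shows that such a string stays readable and can be decoded and edited again.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical CGI endpoint, used only for illustration.
BASE_URL = "http://example.org/cgi-bin/query"

def build_query_url(color, x, y, radius):
    """Encode one colored query region as URL parameters."""
    params = {"color": color, "x": x, "y": y, "r": radius}
    return BASE_URL + "?" + urlencode(params)

def decode_query_url(url):
    """Recover the parameters (as strings) from a query URL."""
    fields = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in fields.items()}

url = build_query_url("yellow", 50, 50, 20)
decoded = decode_query_url(url)
print(url)
print(decoded)
```

Because the parameters are plain text in the URL, a user can change, for example, `color=yellow` to `color=orange` by hand and resubmit the query.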

The objective for this kind of retrieval system is to create an image and video search function that is easy to use and provides power and flexibility in expression of visual queries. These systems provide an interface through which a person may search for and retrieve images and videos over the Web. The person formulates queries by using the interface to create a query. The query is sent to the server which finds and retrieves to the user the images and videos that best match the visual description in the query. The usable features include color contents and the spatial layout of color regions.

For example, the features are used as follows: to retrieve images of “sunsets” from the archive, one possible query might be constructed by sketching a yellow circle near the center of the image (for the sun) and filling the upper part of the image with orange (for the sky). The images and videos that best match this query will contain colored regions that closely match the specified color regions in terms of color matches and spatial regions, which should include “sunsets”. Figure 7 shows the interface.

Fig. 7 sunsets retrieval interface

3.2.2. Challenge

However, there are two technical challenges for systems that aim to provide easy access to images and videos. First, visual searching tools are primitive. Typically, information describing each image is recorded in text, or using keywords. This requires great human effort in creating the meta-data that enables visual queries. The text descriptions also do not completely or consistently characterize the content of the images and videos. Second, the large data sizes of images and videos relative to the communication channel bandwidth prohibit the user from browsing or perusing all but a small portion of the archive at a time. Therefore, the ability to find desired images and videos depends primarily on the capabilities of the query tools provided by the system.

4. The differences between the two systems

The context-based image retrieval system is better than the traditional retrieval system in some respects:

The indexing part of the context-based image retrieval system is more powerful, because image retrieval places more demands on indexing.

The context-based image retrieval system can handle incomplete and fuzzy queries.

The context-based image retrieval system can refine the results.

However, there are still many open research issues to be solved before such retrieval systems can be put into practice. In our view, the context-based image retrieval system should address the following points:

Adopt the template concept in image retrieval

Just as in the AIL system, templates let users retrieve images according to their interests. At present, the image results returned by the systems come from all fields and are not well classified. By adopting templates, users can generate more exact queries to find what they really need.

Combine with the traditional retrieval system

Traditional retrieval techniques are more mature than those of the context-based image retrieval system. If we add the functions of the traditional retrieval system to the image retrieval system, the combined systems will become more powerful. They will perform both keyword and image searching tasks, thus taking greater advantage of the resources on the Internet.

More resources

The context-based image retrieval system is still under development. The image resources on the WWW need to be enlarged. However, more resources mean that the database and other related modules in the systems need to be upgraded, which increases the cost. Therefore, more efficient indexing and storage algorithms should be developed.

Section 2. Histogram—Representation of Color Feature in Image Processing (by Yang Li)

In recent years, numerous methods for efficient image indexing and retrieval from

image databases have been proposed for digital library and other applications. Low-level

visual features such as color, texture, shape and so on are often employed to search

relevant images based on the query image. Among these features, color constitutes a

powerful visual cue and is perhaps the most salient and commonly used feature in color

image retrieval systems. Color distribution information can be represented in a number

of ways: mean RGB co-variant RGB [7], color clusters [8,9,10], and color names [11,12].

However, in image retrieval Color Histogram is the most commonly used color feature

representation. Statistically, it denotes the joint probability of the intensities of the three

color channels. It is the most common in the sense that color statistics such as the mean or color clusters are usually calculated from the color histogram. Swain and Ballard proposed Histogram Intersection, an L1 metric, as the similarity measure for the Color Histogram [13]. To take into account the similarities between similar but not identical colors, Ioka [14] and Niblack et al. [15] introduced an L2-related metric for comparing the histograms.
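A minimal sketch of a joint RGB color histogram and of Histogram Intersection is given below. The choice of 4 bins per channel, the representation of an image as a list of (r, g, b) tuples and the function names are assumptions for illustration; the match value is the sum of bin-wise minima normalised by the model histogram.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Joint RGB histogram of (r, g, b) pixels with values in 0..255."""
    n = bins_per_channel
    hist = [0] * n ** 3
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1
    return hist

def histogram_intersection(image_hist, model_hist):
    """Fraction of the model histogram that is matched by the image."""
    overlap = sum(min(i, m) for i, m in zip(image_hist, model_hist))
    return overlap / sum(model_hist)

red_image = [(255, 0, 0)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

red_hist = color_histogram(red_image)
mixed_hist = color_histogram(mixed_image)
same = histogram_intersection(red_hist, red_hist)
partial = histogram_intersection(mixed_hist, red_hist)
print(same, partial)
```

Only half of the red model pixels find a counterpart in the mixed image, so the second match value is half of the first.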

Searching and locating multimedia data in response to queries needs a good description and representation of multimedia information. Mahmood and Tanveer proposed two ways of capturing color content in images, called Color Histogram and Region Color [16]. These descriptors are suitable for a wide variety of applications requiring image-to-image matching and object-to-image matching. For the color histogram descriptor, they propose a matching method based on the perceptual distance between colors. They propose a color similarity matrix that gives a quantitative comparison value of a color cell with every other color cell. A weighted Euclidean distance of colors is then taken to match color histograms [14]. When the goal is to find a query embedded in an image based on the color of one or more of its regions, a region color descriptor is introduced, which is illumination- and pose-invariant to account for the different appearances of the query object in the images of the database. Such a descriptor allows the localization of the regions containing the query through suitable segmentation. The region color can also support cross-modal queries, such as using a skin color query to retrieve videos depicting a person talking, and it allows direct data manipulation, since the matching of a color region retrieves the relevant image automatically. The utility of these descriptors can be demonstrated through experiments on the MPEG-7 dataset.

Due to its simplicity, the color histogram [17,18] remains the most commonly used method for this task. However, the lack of good perceptual histogram similarity measures, the global color content of histograms, and the erroneous retrieval results due to gamma nonlinearity call for improved methods [19]. Androutsos, Plataniotis and Venetsanopoulos present a new scheme which implements a recursive HSV-space segmentation technique to identify perceptually prominent color areas. The average color vectors of these extracted areas are then used to build the image indices, requiring very little storage. Retrieval is performed with a combination distance measure based on the vector angle between two vectors. Their system provides accurate retrieval results and a high retrieval rate. It allows for queries based on single or multiple colors and, in addition, it allows certain colors to be excluded from the query. This flexibility is due to their distance measure and the multidimensional query space in which the retrieval ranking of the database images is determined. Furthermore, their scheme proves to be very resistant to gamma nonlinearity, providing robust retrieval results for a wide range of gamma nonlinearity values, which is of great importance since, in general, the image acquisition source is unknown.

Furthermore, considering that most Color Histograms are very sparse and thus

sensitive to noise, a widespread representation of image color content uses a color

clustering technique based on a single color histogram giving the frequency of occurrence

of every color quantizing the color space. Stricker and Orengo propose to use the

cumulated Color Histogram. Their research results demonstrated the advantages of the

proposed approach over the conventional Color Histogram approach [18].
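The effect of cumulating a histogram can be illustrated on a single ordered color channel. This is a sketch under simplifying assumptions (four ordered bins, L1 comparison, hypothetical function names), not the experimental setup of [18].

```python
from itertools import accumulate

def cumulative(hist):
    """Cumulated histogram: entry j counts all pixels in bins 0..j."""
    return list(accumulate(hist))

def l1(a, b):
    """Sum of absolute bin-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Three images whose four pixels all fall in one bin of an ordered channel.
dark = [4, 0, 0, 0]
slightly_lighter = [0, 4, 0, 0]
bright = [0, 0, 0, 4]

plain_near = l1(dark, slightly_lighter)
plain_far = l1(dark, bright)
cumulated_near = l1(cumulative(dark), cumulative(slightly_lighter))
cumulated_far = l1(cumulative(dark), cumulative(bright))
print(plain_near, plain_far, cumulated_near, cumulated_far)
```

With the plain histograms, a shift to the neighbouring bin costs as much as a shift across the whole range; with the cumulated histograms the small shift gives the smaller distance, which is one reason the cumulated form is less sensitive to sparse bins and quantization noise.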

Colombo and Genovesi introduce a method for extending the use of image

histograms to characterize the local color properties of an image and better preserve its

intrinsic geometric information [20]. The method uses a set of color histograms to

represent an image through a variable number of regions, each with a well-defined and

homogeneous color distribution. The extended representation is extracted by embedding

the histogram intersection operator in standard segmentation techniques as a measure of

color distribution homogeneity between two image regions. As such, the novel

representation can capture specific objects contained in the image, whose distribution

differs significantly from the global image color distribution. Besides, it is not required

that each region have a dominant color: multimode distributions are also successfully

classified as homogeneous regions. Segmenting the image using color distributions

makes it also possible to obtain additional features, related to the geometric

characteristics of each region and the spatial relationships between pairs of regions. All

these descriptions are said to be “induced” by the color distribution, which remains the

main feature of the representation. Once a query image directly reflecting the user’s

current retrieval task is produced through a graphic interface, it is processed as above to

obtain an internal representation suitable for image search inside a pictorial database.

The similarity assessment metric yields a global similarity score. Aiming at improving

and exploiting the user’s knowledge of system behavior, they introduce “internal query

manipulation”, which is based on accessing and manipulating through graphics directly

the internal query, thus complementing traditional query composition and refinement

modes such as query by sketch, query by example and relevance feedback.

Object recognition in images is always based on a model of the object at some

level of abstraction. And rigidity is one interesting dimension of abstraction. Near one

end of this dimension are the several object recognition algorithms that abstract objects

into a rigid or semi-rigid geometric juxtaposition of image features. These include

Hausdorff distance [21], geometric hashing [22], active blobs [23], and eigenimages

[24,25]. In contrast, histogram-based approaches abstract away (nearly) all geometric

relationships between pixels. In pure histogram matching, e.g. Swain & Ballard [13],

there is no preservation of geometry, just an accounting of the number of pixels of given

colors. The technique of Funt & Finlayson [26] uses a histogram of the ratios of

neighboring pixels, which introduces a slight amount of geometry into the representation.

Peng Chang and John Krumm introduce some geometric representation into the color

histogram by using a histogram of the cooccurrences of color pixels[27]. The color

cooccurrence histogram (CH) keeps track of the number of pairs of certain colored pixels

that occur at certain separation distances in image space. By adjusting the distances of

cooccurrences, they can adjust the sensitivity of the algorithm to geometric changes in the

object’s appearance such as caused by viewpoint change or object flexing. The CH is

also robust to partial occlusions, because they do not require that the image account for

all the cooccurrences of the model. A significant part of the paper is devoted to understanding the algorithm’s false alarm probability, which shows a principled way of choosing the algorithm’s adjustable parameters. The paper presents the first theoretical false alarm analysis of histograms (both cooccurrence and regular) for recognizing objects. The approach discussed in the paper is similar to other histogram-based approaches, most of which are used to find images in a database rather than to find an object in an image. Those approaches share an attempt to add spatial information to a regular color histogram. Huang et al. [28] use the “color correlogram” to search a database for similar images. The correlogram is essentially a normalized version of Chang and Krumm’s CH. Pass and Zabih [29] use “color coherence vectors” that represent which image colors are part of relatively large regions of similar color.
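The idea of counting pairs of colored pixels at given separations can be sketched as follows. The sketch assumes that the image has already been quantized into a grid of color labels and that distances are rounded Euclidean distances; both choices, and the function name, are illustrative assumptions and not the exact construction of [27].

```python
from collections import Counter
from math import hypot

def cooccurrence_histogram(labels, max_dist=2):
    """Count unordered pixel pairs by (color a, color b, distance)."""
    rows, cols = len(labels), len(labels[0])
    hist = Counter()
    for y in range(rows):
        for x in range(cols):
            for dy in range(0, max_dist + 1):
                for dx in range(-max_dist, max_dist + 1):
                    if dy == 0 and dx <= 0:
                        continue  # visit each unordered pair only once
                    d = round(hypot(dx, dy))
                    y2, x2 = y + dy, x + dx
                    if d > max_dist or not (0 <= y2 < rows and 0 <= x2 < cols):
                        continue
                    a, b = sorted((labels[y][x], labels[y2][x2]))
                    hist[(a, b, d)] += 1
    return hist

# A 2x2 checkerboard of red ("r") and green ("g") labels.
board = [["r", "g"],
         ["g", "r"]]
ch = cooccurrence_histogram(board, max_dist=1)
print(dict(ch))
```

Enlarging `max_dist` adds more distance bins; restricting it to small values keeps only local structure, which corresponds to the adjustable sensitivity to geometric change mentioned above.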

Histogram modification, and in particular histogram equalization, is one of the basic and most useful operations in image processing, especially for contrast enhancement. Contrast enhancement techniques are basically divided into global and local histogram methods. An early attempt to introduce shape criteria in contrast enhancement was made in [30]. The Mathematical Morphology School [31] argues that the basic operations on images should be invariant with respect to contrast changes, such as homomorphic transformations. As a consequence, the basic information of an image is contained in the family of its binary shadows or level-sets, that is, in the family of sets

$X_\lambda u := \{ x : u(x) > \lambda \},$

for all values of $\lambda$ in the range of $u$. Under fairly general conditions, an image can be reconstructed from its level-sets by the formula $u(x) = \sup\{ \lambda : x \in X_\lambda u \}$. If $h$ is a strictly increasing function, the transformation $v = h(u)$ does not modify the family of level-sets of $u$; it only changes its index in the sense that

$X_{h(\lambda)} v = X_\lambda u$ for all $\lambda$.

The formalization of multiscale analyses given in [32] leads to a formulation of

recursive/causal/local morphological and geometric invariant filters in terms of solutions

of certain partial differential equations of geometric type, which provides a new view on

many of the basic mathematical morphology operations. One of their basic assumptions

was the locality assumption which aimed to translate into a mathematical language. The

basic operations which are taken into consideration are a kind of local average around

each pixel, that is, only a few pixels around a given sample influence the output value of

the operations. But this excludes algorithms such as histogram modification, and

the operations like those in [33] are not modeled by these equations. G. Sapiro and V.

Caselles show in their paper that the histogram can be modified to achieve any given

distribution. The modification can be performed while simultaneously reducing noise,

thus avoiding the noise sharpening effect in classical algorithms. Their approaches are extended to local contrast enhancement as well. One of the advantages of using PDEs for image processing is the possibility of combining algorithms, which is done successfully, for example, in [34], where the smoothing operator in [35] and the deblurring one in [36] are combined. Another advantage of this methodology is the accuracy achieved when efficient numerical implementations are used. They present a novel PDE for histogram modification and show how to obtain any grey-level distribution, and then combine it with the smoothing operator proposed in [37], obtaining contrast normalization and denoising at the same time. V. Caselles, J.-L. Lisani, J.-M. Morel and G. Sapiro propose a novel approach for shape preserving contrast enhancement, which is a particular case of homomorphic transformation [38]. They realize the contrast enhancement by means of a local histogram equalization algorithm, because global histogram modification does not always produce good contrast: small regions in particular are hardly visible after such a global operation. The scheme they introduce is based on the grey-values and spatial relations between pixels in the image which, following mathematical morphology, constitute the basic objects in the scene. Examples with both grey-value and color images are presented. Their approach attains both the shape-preservation property of global techniques and the contrast improvement quality of local ones.
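For reference, classical global histogram equalization of a grey-level image can be sketched as below. This is the textbook global operation, not the PDE-based or shape preserving local methods surveyed above; the image representation (a list of rows of integers) and the function name are assumptions.

```python
def equalize(image, levels=256):
    """Global histogram equalization of a grey-level image."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of grey levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        return [row[:] for row in image]  # constant image: nothing to stretch
    # Non-decreasing lookup table that spreads the occupied levels.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]

low_contrast = [[100, 100],
                [101, 102]]
stretched = equalize(low_contrast)
print(stretched)
```

The lookup table is non-decreasing, so the ordering of grey levels, and with it the family of level-sets discussed above, is preserved while the occupied levels are spread over the full range.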

One disadvantage of color indexing based on color distribution histograms is that the speed of the system is directly related to the size of the histograms to be indexed, and matching or comparing color histograms is relatively expensive. J. Berens, G. D. Finlayson and G. Qiu [39] show that color histograms contain highly correlated information and can therefore be represented by a few numbers. They show how color histograms can be compressed effectively and how the compressed histograms can be compared for indexing, so that color histogram comparison is no slower than any other color-based indexing method. They make two important contributions. First, they show that an opponent color histogram can be compressed more readily than a histogram in a conventional color space. Second, they use standard transform coding methods (the Karhunen-Loeve transform, the discrete cosine transform, the Hadamard transform and hybrid transforms) to compress color histograms. Experiments show that compression ratios of up to 250:1 are possible without affecting indexing performance. This means that a database 250 times larger can be searched in the same time as with conventional indexing.
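
The idea of comparing histograms in a compressed transform domain can be illustrated with a minimal sketch. It is not the method of [39]: it assumes a one-dimensional 256-bin histogram rather than an opponent color histogram, uses only the discrete cosine transform, and keeps the first few coefficients. The helper names (`dct_matrix`, `compress_histogram`, `compressed_distance`) and the Gaussian-shaped test histograms are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def compress_histogram(hist, keep):
    """Normalize a histogram and keep only its first `keep` DCT coefficients."""
    h = np.asarray(hist, dtype=float)
    h = h / h.sum()
    return (dct_matrix(h.size) @ h)[:keep]

def compressed_distance(a, b):
    """Euclidean distance between two compressed histograms."""
    return float(np.linalg.norm(a - b))

# Hypothetical smooth histograms: h1 and h2 are similar, h3 is very different.
bins = np.arange(256)
h1 = np.exp(-0.5 * ((bins - 80) / 20.0) ** 2)
h2 = np.exp(-0.5 * ((bins - 90) / 20.0) ** 2)
h3 = np.exp(-0.5 * ((bins - 200) / 20.0) ** 2)
c1, c2, c3 = (compress_histogram(h, keep=16) for h in (h1, h2, h3))  # 16:1
print(compressed_distance(c1, c2) < compressed_distance(c1, c3))
```

Since the transform is orthonormal, the distance between truncated coefficient vectors never exceeds the distance between the full histograms, so the cost of each comparison drops with the number of coefficients kept while similar histograms stay close.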

Section 3. Image Compression (by Xudong Ni)

Recent years have seen significant progress in the compression of still and moving images. Image compression technology has dramatically widened the application of still digital images. This section briefly introduces the goals, principles and methods of still image compression, including transform coding and subband coding, focusing on the basic ideas, principles and processing steps of some methods. We then introduce recent work on JPEG2000. Finally, this section briefly introduces moving image compression with MPEG.

3.1 Objective Quality Measure

Digital images have broad application to areas such as Internet browsing, TV

transmission, video conferencing, transmission of remotely sensed images and printing,

but the vast amount of data required to represent a digital image restricts these

applications. Application of digital images often is not viable due to high storage or

transmission costs. Image compression technology offers a possible solution. The basic

goal of image compression is to reduce the bit rate of an image to minimize the

communication channel capacity or digital storage memory requirements while

maintaining necessary fidelity in the image, or, equivalently, to obtain the best possible

fidelity for a given bit rate. The bit rate is measured in bits per sample or bits per pixel

(bpp). The raw (uncompressed) bit rate is typically 8 bits per pixel for a gray-level image and 24

bits per pixel for a color image with three 8-bit components. Fidelity can be judged by

quantitative criteria such as mean square error (MSE) or peak signal-to-noise ratio

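(PSNR) [40] between the original image and the reconstructed image. MSE is given by

$$\mathrm{MSE} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( x_{ij} - \hat{x}_{ij} \right)^2$$

and PSNR, for 8-bit samples with peak value 255, is given by

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{dB}$$

where the $x_{ij}$ are the elements of the original image and the $\hat{x}_{ij}$ are the elements of the

reconstructed image. N and M are the dimensions of the image. Generally, the smaller the

MSE, the higher the PSNR and the better the image quality.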
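
The two measures can be computed directly from their standard definitions. The sketch below assumes 8-bit images (peak value 255, exposed as the `peak` parameter); the function names `mse` and `psnr` are chosen here for illustration.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between two images of equal size."""
    x = np.asarray(original, dtype=np.float64)
    y = np.asarray(reconstructed, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, reconstructed)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

# Hypothetical example: every pixel of the reconstruction is off by 5 levels.
a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 5, dtype=np.uint8)
print(mse(a, b), psnr(a, b))
```

Converting to floating point before subtracting matters in practice: subtracting unsigned 8-bit arrays directly would wrap around instead of producing negative differences.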

The fidelity of the reconstructed image can also be judged by subjective criteria

such as the statistically based acceptability tests for specific applications or viewer

quality ratings. For example, five-point scales of quality (bad, poor, fair, good, excellent)

are sometimes used. However, quantitative criteria are often used when evaluating image

compression methods because of the testing cost inherent to subjective tests.

3.2 The principles of image compression

Almost all methods of image compression are based on two fundamental principles.

The first principle is to exploit the properties of the signal sources, e.g., the

statistical property, and to remove redundancy from the signal. This approach is called

redundancy reduction. Almost all sampled signals in coding are redundant because

Nyquist sampling typically tends to preserve some degree of intersample correlation. This

redundancy is reflected in the form of a nonflat power spectrum. Greater degrees of

nonflatness lead to greater gains from redundancy removal. These gains are also referred

to as prediction gains or transform coding gains, depending on whether the redundancy is

processed in the time domain or frequency (or transform) domain.
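
The prediction gain can be made concrete with a small numerical sketch. It assumes a synthetic first-order autoregressive signal with correlation coefficient `rho` and a predictor that already knows `rho`; both are assumptions for illustration, since a real coder would have to estimate the predictor from the data.

```python
import numpy as np

def prediction_gain(rho, n, seed=0):
    """Ratio of signal variance to prediction-residual variance for an AR(1) signal."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = rho * x[i - 1] + noise[i]     # strongly correlated neighbouring samples
    residual = x[1:] - rho * x[:-1]          # what a DPCM-style coder would transmit
    return float(x.var() / residual.var())

gain = prediction_gain(rho=0.95, n=100_000)
print(gain, 1.0 / (1.0 - 0.95 ** 2))         # measured gain vs. theoretical value
```

For this model the theoretical gain is 1 / (1 - rho^2), so the closer the intersample correlation is to one (the less flat the spectrum), the more is gained by removing the redundancy before quantization.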

The second principle is to exploit the properties of the signal receiver (usually the

human visual system) and to remove parts or details of the signal that will not be noticed

by the receiver. This approach is called irrelevancy reduction. The idea is to quantize the

sample or transform coefficients just finely enough to leave an imperceptibly distorted

result, even though the quantized quantity is not mathematically zero. If the available bit

rate is not sufficient to realize this kind of perceptual transparency, the intent is to

minimize the perceptibility of the distortion.
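
The quantization step at the heart of irrelevancy reduction can be sketched with a uniform scalar quantizer. The sketch does not model the human visual system; the step size is simply a free parameter here, and the function names and sample coefficients are hypothetical.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform scalar quantizer: index of the nearest multiple of `step`."""
    return np.round(np.asarray(coeffs, dtype=float) / step).astype(int)

def dequantize(indices, step):
    """Reconstruct approximate values from quantizer indices."""
    return indices * float(step)

coeffs = np.array([0.2, -3.7, 12.4, 100.0])   # hypothetical transform coefficients
step = 4.0
indices = quantize(coeffs, step)
restored = dequantize(indices, step)
print(indices, restored, np.max(np.abs(coeffs - restored)))
```

A larger step leaves fewer distinct indices to encode, and the reconstruction error is bounded by half the step. Choosing the step so that this error stays below what the viewer can notice is the perceptual part of the design.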

Some image compression methods are based on redundancy reduction or irrelevancy

reduction alone, but most methods exploit both. The parts of a coder that

process redundancy and irrelevancy are separate in some methods, while in other

methods they cannot be easily separated.

3.3 Classification of Image Compression Methods

Image compression methods can be classified by different

characteristics. A widely accepted classification divides them into lossless

(information-preserving) and lossy techniques.

3.3.1 Lossless compression techniques

Lossless compression is also called noiseless coding, entropy coding or data compaction. The

most popular lossless compression techniques are

Huffman coding[41]

Arithmetic coding[42]

The latter has produced 5-10% better compression than Huffman coding but is

also generally more complex. Lossless compression methods are often based on

redundancy reduction. With lossless compression, the compressed data can be exactly

restored so as to be identical to the original.
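
A compact sketch of Huffman coding shows both the redundancy reduction and the exact recovery of the input. It assumes a non-empty input sequence and a simple table-based decoder; the function names are chosen for illustration and the code is not tied to any particular image format.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Return a dict mapping each symbol in `data` to its Huffman bit string."""
    freq = Counter(data)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(data, code):
    return "".join(code[s] for s in data)

def decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:                # prefix-free: first match is a symbol
            out.append(inverse[current])
            current = ""
    return out

data = list("aaaabbc")
code = huffman_code(data)
bits = encode(data, code)
print(code, bits, decode(bits, code) == data)
```

Frequent symbols receive short codewords and rare symbols long ones, and because no codeword is a prefix of another, the decoder recovers the original sequence exactly.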

3.3.2 Lossy compression techniques

With lossy compression, the original image cannot be exactly recovered from the compressed data; the reconstruction is only an acceptable approximation to the original. Lossy image compression methods are often based on irrelevancy reduction, though practical image compression methods usually exploit both redundancy reduction and irrelevancy reduction. The lossy image compression methods include the following subclasses:

Scalar quantization, including Pulse Code Modulation (PCM) and Differential PCM (DPCM)

Transform coding, including the Discrete Cosine Transform (DCT), etc.

Subband/wavelet coding

3.4 DCT-Based Transform Coding

The basic motivation behind transform coding is to transform a set of pixels or

samples from the spatial domain into another set of less correlated (or more independent)

coefficients in the frequency domain, so that the frequency domain coefficients can be

encoded more efficiently. Since DCT and DPCM have been briefly introduced in the

Advanced Information Processing class, we focus here on the intuitive ideas of the

most recent compression technique: subband/wavelet coding.
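
Although the DCT is not developed further here, the decorrelating effect that motivates transform coding is easy to see numerically. The sketch below applies an orthonormal 2-D DCT to a smooth 8x8 block and measures how much of the energy lands in the four lowest-frequency coefficients. The block contents and helper names are assumptions for illustration, and no quantization or entropy coding is included.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2(block):
    """Separable 2-D DCT of a square block."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

# Hypothetical smooth block: a gentle ramp, typical of slowly varying image areas.
i, j = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
block = 100.0 + 5.0 * i + 3.0 * j
coeffs = dct2(block)
total = float(np.sum(coeffs ** 2))
low = float(np.sum(coeffs[:2, :2] ** 2))      # the four lowest-frequency coefficients
print(low / total)
```

The 64 strongly correlated pixel values are replaced by a handful of significant coefficients and many near-zero ones, which is what allows the coefficients to be encoded more efficiently than the pixels.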

3.5 Subband/Wavelet Coding (SBC)

3.5.1 Basic Idea of Subband Coding

Within a typical image, although areas of significant spatial activity are usually

apparent, there also exist extensive regions where detail is slowly varying or even

substantially uniform. Since, by standard analytical methods, rapidity of spatial

variation can be expressed in terms of spatial frequency components, we are led to

the conclusion that image data has a strongly low-pass spectrum. Thus, it is wasteful to

expend as much effort coding insignificant segments of the spectrum as on processing

those spectral regions in which the data energy is concentrated. Therefore, an intuitive

approach to coding might be to split the image into different frequency bands and apply

efficient techniques to the individual sub-bands. This is the motivation of subband image

coding. Subband coding was first introduced in speech coding by Crochiere et al. [43] in

1976. The basic idea is to divide the frequency band of the signal and then to code each

subband with either PCM or DPCM using a coder and bit rate accurately matched to the

statistics of that band. The idea of subband coding was extended to image coding by

Woods and O'Neil in 1986 [44]. The basic idea of subband image coding is to split the

image signal into frequency bands of equal or unequal bandwidth, and then encode these

subbands independently according to the signal energy contained in each band.

3.5.2 Subband Filtering and Decomposing

The essential step of subband image coding is the decomposition of the signal into

the various frequency bands by means of a subband filter bank. Then, different methods

can be applied to encode the subbands.

[Figure: Filter bank with morphological filter, with branches M(x) and I-M(x) applied to the input x, yielding perfect reconstruction when M(x) is a generalized half-band filter.]

For perfect reconstruction, the subband filter bank must obey certain design rules. The Quadrature Mirror Filter (QMF) is one such filter design; it was introduced to subband coding by Esteban and Galand [45].
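
A two-band analysis/synthesis pair with perfect reconstruction can be sketched with the two-tap Haar filters, the simplest pair in which the high-pass filter mirrors the low-pass one. This choice is an assumption made for brevity; it is not the filter set used in [45], and practical coders use longer filters.

```python
import numpy as np

def analyze(x):
    """Split an even-length signal into low and high subbands, each decimated by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)     # low-pass branch (local averages)
    high = (even - odd) / np.sqrt(2.0)    # high-pass branch (local differences)
    return low, high

def synthesize(low, high):
    """Recombine the two subbands into the original signal."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    x = np.empty(2 * low.size)
    x[0::2], x[1::2] = even, odd
    return x

rng = np.random.default_rng(1)
signal = rng.standard_normal(16)
low, high = analyze(signal)
print(np.allclose(synthesize(low, high), signal))
```

The two subbands together contain exactly as many samples as the input, and for a slowly varying signal nearly all of the energy ends up in the low band, so the high band can be coded with few bits.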

Morphological Subband Decomposition (MSD) is another subband filtering approach, introduced by Egger et al. [46]. The following figures show results obtained with the MSD filter.

[Figure: MSD decomposed image, "Lena", two subbands]

[Figure: MSD decomposed image, "Lena", seven subbands]

3.5.3 Advantages of Subband Coding

(1) It supplies a scalable image representation method and facilitates progressive

transmission: In the context of image coding, scalability means that the

transmitted bit-stream can be decoded hierarchically. That is, a low-resolution

version of the transmitted image can be decoded with few operations, and the full

resolution image will only be decoded if necessary or desired. This facilitates

progressive transmission, where low resolution data is sent first and further detail

appears gradually. The SBC technique makes high-definition and low definition

systems compatible, in that the low definition receivers can simply ignore the

high resolution subbands.

(2) It has good subjective error properties: Since quantization is performed separately

for each subband, SBC allows a more flexible design of the coding scheme which

gives good subjective error properties. Different subbands can be allocated

different bit rates. Therefore, through an appropriate bit allocation strategy among

subbands, subjectively superior performance can be obtained when compared to a

fullband coding scheme.

(3) It has good SNR performance: Compared to adaptive DCT, subband coding has

better SNR performance at all bit rates in the range 0.67-2.0 bpp [50]. The

image reconstructed using SBC is also free of the blocking effects which may appear

when an image is coded using block transform coding, such as DCT coding.

We will compare performance of DCT and SBC in the later part of this survey.
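
The scalability described in point (1) can be illustrated with a one-level 2-D split into four subbands. The sketch assumes simple averaging and differencing (unnormalized Haar) filters and an image with even side lengths; the subband names LL, LH, HL and HH and the function names are chosen for illustration.

```python
import numpy as np

def haar2d(img):
    """One-level 2-D split into LL, LH, HL, HH subbands (image sides must be even)."""
    a = np.asarray(img, dtype=float)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0     # horizontal averages
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0     # horizontal differences
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

def inverse_haar2d(ll, lh, hl, hh):
    """Rebuild the full-resolution image from all four subbands."""
    h, w = ll.shape
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    lo[0::2, :], lo[1::2, :] = ll + lh, ll - lh
    hi[0::2, :], hi[1::2, :] = hl + hh, hl - hh
    out = np.empty((2 * h, 2 * w))
    out[:, 0::2], out[:, 1::2] = lo + hi, lo - hi
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 test image
ll, lh, hl, hh = haar2d(image)
preview = ll                                       # stage 1: half-resolution image
full = inverse_haar2d(ll, lh, hl, hh)              # stage 2: exact full resolution
print(preview)
print(np.allclose(full, image))
```

A low-definition receiver can stop after the LL band, which is already a usable half-resolution image, while a high-definition receiver adds the remaining subbands to recover every pixel.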

3.6 Overview of JPEG 2000

The International Organization for Standardization's JPEG2000 committee [47] has finalized specs for a new algorithm that is reported to compress images up to 200 times with no appreciable degradation in quality. The JPEG2000 spec will become ISO 15444 when it is officially approved in 2001. The major change from the current JPEG is that wavelets will replace the DCT as the means of transform coding.

Among many things it will address:

o Low bit-rate compression performance,

o Lossless and lossy compression in a single codestream,

o Transmission in noisy environment where bit-error is high,

o Application to both gray/color images and bi-level (text) imagery, natural

imagery and computer generated imagery,

o Interface with MPEG-4,

o Content-based description.

[Figure: JPEG2000 image (middle) shows almost no quality loss from current JPEG, even at 158:1 compression. [48]]

3.7 Overview of MPEG

The Moving Picture Experts Group (MPEG) [49] is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video. Established in 1988, the group has produced MPEG-1, the standard on which such products as Video CD and MP3 are based; MPEG-2, the standard on which such products as digital television set-top boxes and DVD are based; and MPEG-4, the standard for multimedia for the web and mobility. The current thrust is MPEG-7, "Multimedia Content Description Interface", whose completion is scheduled for July 2001. Work on the new standard MPEG-21, "Multimedia Framework", started in June 2000 and has already produced a Draft Technical Report. Several Calls for Proposals have already been issued.

3.8 Summary

We have introduced some elementary concepts of image compression and some basic image compression methods. DCT-based transform coding is widely used due to its simplicity and its standardization in JPEG. Subband coding is very efficient at bit rates of 0.5-2.0 bpp and is well suited to progressive transmission. This section cannot provide complete descriptions or exhaustive studies of image compression methods. It is hoped, however, that it provides readers with an introduction to image compression and to recent and current work in the area.




References:

[1]Y. Rui, T. Huang, S. Mehrotra, and M. Ortega. A relevance feedback architecture for

content-based multimedia information systems. In Workshop on Content Based Access

of Image and Video Libraries, Puerto Rico, June 1997

[2]John R. Smith and Shih-Fu Chang. VisualSEEk: a Content-Based Image/Video

Retrieval System, System Report and User's Manual, version 1.0 beta

[3] Edoardo Ardizzone, Marco La Cascia. Automatic Video Database

Indexing and Retrieval. 1997 Kluwer Academic Publishers

[4]Wei Yan. The mechanisms and implement of Intelligent Information Retrieval

System, The master thesis of Tianjin University. May 2000

[5] http://www.peachpit.com/books/catalog/69642.html, Search Engines for the World

Wide Web (2nd Edition), Peachpit Press, ISBN 0-201-69642-8

[6] http://websearch.miningco.com/library/weekly/aa010899.htm, Super Searchers'

Search Secrets, Mining Co. Web Search Guide, Jan. 1, 1999

[7] Finlayson, G.D., Chatterjee, S.S., and Funt, B.V.: “Color angular indexing,”

Proceedings of the European conference on Computer vision, April 1996, pp. 16-27.

[8] Uchiyama, T., and Arbib, M.A.: “Color image segmentation using competitive

learning,” IEEE Trans. Pattern Anal. Mach. Intell., 1994, 16, (12), pp. 1197-1206

[9] Rubner, Y.: “Perceptual metrics for image database navigation,” PhD thesis, Dept. of

Computer Science, Stanford University, 1999

[10] Selim, S.Z., and Ismail, M.A.: “K-means-type algorithms: A generalized

convergence theorem and characterization of local optimality,” IEEE Trans. Pattern Anal.

Mach. Intell., 1984, PAMI-6, (1), pp. 81-87

[11] Syeda-mahmood, T.F.: “Data and model-driven selection using color regions,” Int.

J.Computer Vis., 1997, 21, (1/2), pp. 9-36

[12] Mehtre, B.M., Kankanhalli, M.S., Narasimhalu, A.D., and Man, G.C.: “Color

matching for image retrieval,” Pattern Recognit. Lett., 1995, 16, pp. 325-331

[13] Swain M., Ballard D. “Color Indexing,” International Journal of Computer Vision,

vol. 7, no. 1, pp. 11-32, 1991.


[14] M. Ioka, A method of defining the similarity of images on the basis of color

information, Technical report, Tech.Report RT-0030, IBM Tokyo Research Lab., 1989.

[15] W. Niblack, R. Barber, and et al. “The QBIC project: Querying images by content

using color, texture and shape,” In Proc. SPIE Storage and Retrieval for Image and

Video Databases, Feb 1994.

[16] Tanveer Syeda-Mahmood; Dragutin Pekovic; “On describing color and shape

information in images” Signal Processing: Image Communication, vol. 16, no. 1, pp. 15-

31, Sep 2000.

[17] X. Wan and C.-C. Jay Kuo, “Color distribution analysis and quantization for image

retrieval,” in Storage and Retrieval for Image and Video Databases IV, SPIE-2670, pp.

8-16. 1995.

[18] M. Stricker and M. Orengo, “Similarity of color images,” in Storage and Retrieval for

Image and Video Databases III, SPIE-2420, 1995, pp. 381-392.

[19] Androutsos, D; Plataniotis, KN; Venetsanopoulos, AN; “A Novel Vector-Based

Approach to Color Image Retrieval Using a Vector Angular-Based Distance Measure,”

Computer Vision and Image Understanding, vol. 75, no. 1, pp. 46-58, 1999.

[20] C Colombo, I Genovesi, “Image Querying and Retrieval by Multiple Color

Distributions” Proceedings of Image and Video content based retrieval, pp.19-26, 1998.

[21] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, “Comparing Images

Using the Hausdorff Distance,” IEEE Transactions on Pattern Analysis and Machine

Intelligence, vol. 15, pp. 850-863, 1993.

[22] Y.Lamdan and H.J. Wolfson, “Geometric Hashing: A General and Efficient Model-

Based Recognition Scheme,” presented at Second International Conference on Computer

Vision, Tampa, Florida, 1988.

[23] S. Sclaroff and J. Isidoro, “Active Blobs,” presented at Sixth International

Conference on Computer Vision, Bombay, India, 1998.

[24] M.Turk and A.Pentland, “Eigenfaces for Recognition,” Journal of Cognitive

Neuroscience, vol. 3, pp. 71-86, 1991.

[25] H.Murase and S.K.Nayar, “Visual Learning and Recognition of 3-D Objects from

Appearance,” International Journal of Computer Vision, vol. 14, pp. 5-24, 1995.

[26] B.V. Funt and G. D. Finlayson, “Color Constant Color Indexing,” IEEE

Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 522-529, 1995.

[27] P. Chang and J. Krumm, “Object Recognition with Color Cooccurrence Histograms,”

IEEE conference on Computer Vision and Pattern Recognition, Fort Collins, CO, June

23-25, 1999.

[28]J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu, and R. Zabih, “Image Indexing Using

Color Correlograms,” presented at IEEE conference on Computer Vision and Pattern

Recognition, San Juan, Puerto Rico, 1997.

[29] G. Pass and R. Zabih, “Histogram Refinement for Content-Based Image Retrieval,”

presented at IEEE Workshop on Applications of Computer Vision, Sarasota, Florida,

1996.

[30] R. Cromartie and S. M. Pizer, “Edge-affected context for adaptive contrast

enhancement,” Proc. Information Processing in Medical Imaging, Lecture Notes in comp.

Science 511 pp. 474-485 July, 1991.

[31] J. Serra, “Image Analysis and Mathematical Morphology,” Academic Press, New

York, 1983.

[32] L. Alvarez, P. L. Lions, “Axioms and fundamental equations of image processing,”

Arch. Rational Mechanics and Anal. 16: IX pp. 200-257, 1993.

[33] G. Sapiro and V. Caselles, “Histogram modification via differential equations,”

Journal of Differential Equations 135:2 pp. 238-268 1997.

[34] L. Alvarez and L. Mazorra, “Signal and Image restoration by using shock filters and

anisotropic diffusion,” SIAM J. Numer. Anal., 1994

[35] L. Alvarez, P. L. Lions, and J.M. Morel, “Image selective smoothing and edge

detection by nonlinear diffusion,” SIAM J. Numer. Anal. 29, pp. 845-866, 1992.

[36] S. Osher and L. I. Rudin, “Feature-oriented image enhancement using shock filters,”

SIAM J. Numer. Anal. 27, pp. 919-940, 1990.

[37] G. Sapiro and A.Tannenbaum, “Edge preserving geometric enhancement of MRI

data,” EE-TR, University of Minnesota, April 1994.

[38] Caselles, Vicent; Lisani, Jose-Luis; Morel, Jean-Michel; Sapiro, Guillermo, “Shape

Preserving Local Histogram Modification,” IEEE Transactions on Image Processing, vol.

8, no. 2, pp. 220-230, Feb 1999.

[39] J. Berens, G. D. Finlayson, G. Qiu, “Image indexing using compressed color

histogram,” IEE Proceedings - Vision, Image and Signal Processing, vol. 147, no. 4,

2000, pp. 349-355.

[40] Yang.C, Shong.G, and Zhang.C “A subband Coding Aiming at Real-Time Image

Transmission”, Journal of Image and Graphics, China, 2000 Vol.5 No.3 P.191-195

[41] D.A. Huffman, "A method for the construction of minimum redundancy codes," In

Proc. IRE, vol. 40, pp. 1098-1101, 1952.

[42] W. B. Pennebaker, et al., "Arithmetic coding articles," IBM J. Res. Dev., vol. 32, no.

6, pp.717-774, Nov. 1988.

[43] R. E. Crochiere, S. A. Webber, and J. L. Flanagan, "Digital coding of speech in

subbands,"Bell. Syst. Tech. J., vol. 55, pp. 1069-1085, Oct. 1976.

[44] J. W. Woods, and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust.,

Speech,Signal Processing, vol. ASSP-34, no. 5, pp. 1278-1288, Oct. 1986.

[45] D. Esteban, and C. Galand, "Application of quadrature mirror filters to split band

voice coding schemes," in Proc. ICASSP, pp. 191-195, May 1977.

[46] O. Egger, W. Li, and M. Kunt, "High compression image coding using an adaptive

morphological subband decomposition," Proceedings of the IEEE, vol. 83, pp. 272-287, 1995.

[47] “An overview of JPEG 2000” http://citeseer.nj.nec.com/264144.html

[48] “JPEG2000 wavelet compression spec approved” http://www.eetimes.com/story/OEG19991228S0028

[49] Moving Picture Experts Group; http://www.cselt.it/mpeg/

[50] J. W. Woods, and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust.,

Speech,Signal Processing, vol. ASSP-34, no. 5, pp. 1278-1288, Oct. 1986.

