
Bag Of Features


Image Categorization 

Origin and Motivation

Origin 1: Texture recognition 

• Texture is characterized by the repetition of basic elements or textons.

• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters.


Origin 2: Bag-of-words models 

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)


“Bag-of-Features” Approach:

The task of image categorization is to label a query image with a certain scene type, e.g., “building,” “street,” “mountains,” or “forest.” The main difference compared to recognition tasks for distinct objects is a much wider range of intra-class variation. Two instances of the type “building,” for example, can look very different in spite of having certain common features. Therefore a more or less rigid model of the object geometry is no longer applicable.

Main Idea:

A similar problem is faced in document analysis when attempting to automatically assign a piece of text to a certain topic, e.g., “mathematics,” “news,” or “sports.” There, the problem is solved by defining a so-called codebook. A codebook consists of lists of words or phrases which are typical for a certain topic.

It is built in a training phase. As a result, each topic is characterized by a “bag of words” (a set of codebook entries), regardless of the positions at which they actually appear in the text. During classification of an unknown text, the codebook entries can be used for gathering evidence that the text belongs to a specific topic.

This solution can be applied to the image categorization task as well: here, the “visual codebook” consists of characteristic region descriptors (which correspond to the “words”), and the “bag of words” is often called a “bag of features” in the literature.

The visual codebook is built in a training phase where descriptors are extracted from sample images of different scene types and clustered in feature space. The cluster centers can be interpreted as the visual words.

In the recognition phase, the feature distribution of a query image is derived from the codebook data (e.g., by assigning each descriptor to the most similar codebook entry), and classification is done by comparing it to the distributions of the scene types learnt in the training phase, e.g., by calculating some kind of similarity between the histograms of the query image and the known scene types in the model database.
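A minimal sketch of this comparison step, assuming the query histogram and the per-scene-type reference histograms have already been computed; the helper names and the use of histogram intersection are illustrative choices, not prescribed by the text:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """One possible similarity measure between two normalized visual-word histograms."""
    return float(np.minimum(h1, h2).sum())

def nearest_scene_type(query_hist, scene_histograms):
    """Return the scene type whose learnt reference histogram is most similar
    to the bag-of-features histogram of the query image.
    `scene_histograms` maps scene type -> reference histogram (1-D array)."""
    return max(scene_histograms,
               key=lambda label: histogram_intersection(query_hist,
                                                        scene_histograms[label]))
```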


Bag of features: outline

1. Extract features.

2. Learn “visual vocabulary”.

3. Quantize features using visual vocabulary.

4. Represent images by frequencies of “visual words”.

1. Extract features:

The identification of image patches can be achieved by one of the following strategies:

• Random Sampling: Sampling the image at random is an alternative to detector-based strategies (a sketch of random and regular-grid sampling follows this list). Empirical studies give evidence that such a simple random sampling strategy yields equal or even better recognition results, because it is possible to sample image patches densely, whereas the number of patches is limited for keypoint detectors, as they focus on characteristic points. Dense sampling has the advantage of containing more information.

• Regular Grid

• Interest point detector

• Segmentation-based patches
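A minimal sketch of the two simplest sampling strategies, random sampling and a regular grid; patch size, stride, and patch count are arbitrary illustrative values and the function names are not from the original text:

```python
import numpy as np

def random_patches(image, num_patches=500, size=16, rng=None):
    """Sample square patches at uniformly random positions."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    ys = rng.integers(0, h - size, num_patches)
    xs = rng.integers(0, w - size, num_patches)
    return [image[y:y + size, x:x + size] for y, x in zip(ys, xs)]

def grid_patches(image, size=16, step=8):
    """Sample square patches on a regular grid with a fixed stride."""
    h, w = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, step)
            for x in range(0, w - size + 1, step)]
```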


2. Learning the visual vocabulary

After feature detection, each image is abstracted by several local patches. Feature representation methods deal with how to represent the patches as numerical vectors. These methods are called feature descriptors. A good descriptor should have the ability to handle intensity, rotation, scale and affine variations to some extent. One of the most famous descriptors is the Scale-Invariant Feature Transform (SIFT). SIFT converts each patch to a 128-dimensional vector. After this step, each image is a collection of vectors of the same dimension (128 for SIFT), where the order of the different vectors is of no importance.

A codeword can be considered as a representative of several similar patches. One simple method is performing k-means clustering over all the vectors. Codewords are then defined as the centers of the learned clusters.

Thus, each patch in an image is mapped to a certain codeword through the clustering process, and the image can be represented by the histogram of the codewords.
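A minimal sketch of this step, assuming OpenCV's SIFT implementation and scikit-learn's k-means; the vocabulary size k = 200 is an arbitrary illustrative choice:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift_descriptors(images):
    """Compute 128-dimensional SIFT descriptors for each training image
    and pool them into one array."""
    sift = cv2.SIFT_create()
    pooled = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
        _, desc = sift.detectAndCompute(gray, None)
        if desc is not None:
            pooled.append(desc)
    return np.vstack(pooled)

def learn_vocabulary(descriptors, k=200):
    """Cluster the pooled descriptors with k-means; the cluster centers
    are the codewords (visual words)."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
    return kmeans.cluster_centers_  # codebook of shape (k, 128)
```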


As noted above, the k-means algorithm aims to identify densely populated regions in feature space (i.e., regions where many descriptors are located close to each other). Each cluster center produced by k-means becomes a code vector.

The advantage of k-means clustering is that the codebook fits the actual distribution of the data well. On the other hand, at least in its original form, k-means only performs local optimization, and the number of clusters k has to be known in advance. Other approaches therefore use agglomerative clustering.

Agglomerative Clustering:

Agglomerative clustering automatically determines the number of clusters by successively merging features until a cut-off threshold t on the cluster compactness is reached. However, both the runtime and the memory requirements are often significantly higher for agglomerative methods.
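A minimal sketch using scikit-learn's AgglomerativeClustering, where the distance threshold plays the role of the cut-off t above; the threshold value and the use of average linkage are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def learn_vocabulary_agglomerative(descriptors, cutoff=0.5):
    """Merge descriptors bottom-up until the linkage distance exceeds the
    cut-off; the number of visual words is determined automatically."""
    clustering = AgglomerativeClustering(n_clusters=None,
                                         distance_threshold=cutoff,
                                         linkage="average").fit(descriptors)
    labels = clustering.labels_
    # one codeword per cluster: the mean of its member descriptors
    return np.array([descriptors[labels == c].mean(axis=0)
                     for c in range(clustering.n_clusters_)])
```

Note that this fits all descriptors at once, which reflects the higher memory cost mentioned above.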

3. Quantize features using visual vocabulary: The codebook is used for quantizing features.

A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook.

Codebook = visual vocabulary

Code vector = visual word
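A minimal sketch of such a vector quantizer (nearest codevector by Euclidean distance; the function name is illustrative):

```python
import numpy as np

def quantize(descriptors, codebook):
    """Map each feature vector to the index of its nearest codevector."""
    # pairwise squared Euclidean distances, shape (num_descriptors, num_codevectors)
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```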


Visual Vocabulary example

Image patch examples of visual words


4. Represent images by frequencies of “visual words”:

Each image is represented by the histogram of its visual-word frequencies over the codebook.

Image Classification

Given the bag-of-features representations of images from different classes, a query image is classified by comparing its visual-word histogram to those learnt for the different scene types, e.g., using a histogram similarity measure as described above.
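A minimal sketch of the representation step and a simple classification rule, assuming the visual-word indices of each image have already been obtained with a quantizer as above; the nearest-neighbour rule and histogram intersection are illustrative choices:

```python
import numpy as np

def word_histogram(word_indices, vocabulary_size):
    """Represent an image by the normalized frequencies of its visual words."""
    hist = np.bincount(word_indices, minlength=vocabulary_size).astype(float)
    return hist / max(hist.sum(), 1.0)

def classify_nearest_neighbour(query_hist, train_hists, train_labels):
    """Assign the query image the class of the most similar training histogram,
    using histogram intersection as the similarity measure."""
    sims = np.minimum(train_hists, query_hist).sum(axis=1)
    return train_labels[int(np.argmax(sims))]
```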

Disadvantages:

One notorious disadvantage of BoW is that it ignores the spatial relationships among the patches, which are very important in image representation. Researchers have proposed several methods to incorporate this spatial information.