CBIR by deep learning

© Vigen Sahakyan 2016

Content Based Image Retrieval by Deep Learning


Agenda

● Goals

● What is CBIR?

● What is Deep Learning?

● AutoEncoder

● Tool description


Goals

● We want to build an image search system, based on machine learning techniques, that searches by image content. Such a system has many applications in public safety, the military, medical diagnosis, etc.

● The modern web holds millions, even billions, of unlabeled images but only a few thousand labeled ones. The problem is how to use the power of this unlabeled data in our system.

● In this presentation we describe our CBIR system, which extracts meaningful information from unlabeled data using a widely used Deep Learning technique called the AutoEncoder.


What is CBIR?

● Content Based Image Retrieval (CBIR) is the process by which one searches for similar images.

● "Content-based" means that the search analyzes the contents of the image rather than the metadata such as keywords, tags, or descriptions associated with the image.

● One of the open problems in Computer Vision.

● It has many applications in fields such as public safety, the military, medical diagnosis, robotics, etc.


What is Deep Learning?

1. Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers.

2. It is used in Machine Learning to discover high-level features automatically.

3. With Deep Learning we can extract high-level features such as shape, texture, and contrast from image datasets (the images do not need to be labeled).

4. There are many Deep Learning algorithms, such as Convolutional and Recursive Neural Networks, Deep Belief Networks, and Restricted Boltzmann Machines. In this work we used the AutoEncoder.

5. It has many applications in fields such as Computer Vision, Search Engines, Speech Recognition, Artificial Intelligence, etc.


AutoEncoder

● The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.

● Recently, the autoencoder concept has become more widely used for learning generative models of data.

● The AutoEncoder is itself a Neural Network. The difference is that it uses unsupervised learning: the network is trained to reproduce its own input vector at the output. The difference between the output vector and the input vector is treated as the error for backpropagation. In doing so, it learns a code for the data in its hidden layer (the encoded value).

● Input = Decode(Encode(Input))
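The training idea above (the input doubles as the target, and the reconstruction error drives backpropagation) can be sketched in a few lines of plain Python. This is only a minimal illustration under assumptions that are not in the slides: a single hidden layer, sigmoid units, a squared-error loss, a toy dataset of four one-hot vectors, and the hypothetical names `TinyAutoEncoder`, `train`, and `mean_error`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyAutoEncoder:
    """One hidden layer; sizes and initial weight range are illustrative assumptions."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_inputs)]
        self.b_dec = [0.0] * n_inputs

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.w_dec, self.b_dec)]

    def train_step(self, x, lr=0.5):
        """One stochastic gradient descent step; the target is the input itself."""
        h = self.encode(x)
        y = self.decode(h)
        # Error signal at the output for the loss 0.5 * sum((y - x)^2).
        d_out = [(yi - xi) * yi * (1.0 - yi) for yi, xi in zip(y, x)]
        # Backpropagate the error through the decoder weights to the hidden layer.
        d_hid = [sum(d_out[j] * self.w_dec[j][k] for j in range(len(x)))
                 * h[k] * (1.0 - h[k]) for k in range(len(h))]
        for j, dj in enumerate(d_out):
            for k, hk in enumerate(h):
                self.w_dec[j][k] -= lr * dj * hk
            self.b_dec[j] -= lr * dj
        for k, dk in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w_enc[k][i] -= lr * dk * xi
            self.b_enc[k] -= lr * dk


def mean_error(model, data):
    """Average reconstruction error of Decode(Encode(x)) against x."""
    total = 0.0
    for x in data:
        y = model.decode(model.encode(x))
        total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def train(model, data, epochs=2000, lr=0.5, seed=1):
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # stochastic: visit samples in random order
        for x in samples:
            model.train_step(x, lr)


# Toy data: four one-hot vectors compressed into a 2-value code.
data = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
model = TinyAutoEncoder(n_inputs=4, n_hidden=2)
before = mean_error(model, data)
train(model, data)
after = mean_error(model, data)
```

No labels appear anywhere in the sketch; comparing `before` with `after` shows how far training has reduced the reconstruction error on the toy data.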


Tool description

1. First, the web service receives a raw image (.jpg, .png, etc.) and passes it to the preprocessing step.

2. Preprocess the raw image:

a. Resize the image to the appropriate size (our model's input size).

b. Generate a grayscale representation of the resized image.

3. Generate a row vector from the pixels of the preprocessed image.

4. Call the Normalization module.
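The preprocessing steps above might look roughly like the sketch below. It assumes the image has already been decoded into rows of (r, g, b) tuples, uses nearest-neighbour resizing and the ITU-R BT.601 luma weights for grayscale, and defaults to a 28 x 28 model size; none of these choices is stated in the slides, and the function names are hypothetical.

```python
def resize_nearest(image, new_height, new_width):
    """Nearest-neighbour resize of a 2-D list of pixels (an assumed method)."""
    old_height, old_width = len(image), len(image[0])
    return [[image[y * old_height // new_height][x * old_width // new_width]
             for x in range(new_width)]
            for y in range(new_height)]


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    # ITU-R BT.601 luma weights; the slides do not name a conversion formula.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def to_row_vector(image):
    """Flatten a 2-D image into a single row, one image row after another."""
    return [pixel for row in image for pixel in row]


def preprocess(rgb_image, size=(28, 28)):
    """Resize, convert to grayscale, then flatten, in the order the steps list."""
    resized = resize_nearest(rgb_image, size[0], size[1])
    gray = to_grayscale(resized)
    return to_row_vector(gray)


# A 2 x 2 checkerboard scaled up to 4 x 4 gives a 16-element row vector.
checkerboard = [[(255, 255, 255), (0, 0, 0)],
                [(0, 0, 0), (255, 255, 255)]]
row_vector = preprocess(checkerboard, size=(4, 4))
```

In a real service the decoding and resizing would more likely come from an imaging library; the pure-Python version is used here only to keep the example self-contained.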


Tool description

We apply the sigmoid function to the value of every neuron, so it is useful to normalize the inputs: this helps training converge faster and improves the error rate.

1. We apply Min-Max normalization to the input values using the following formula: z_i = (x_i − min(x)) / (max(x) − min(x))

2. In our case z_i = x_i / 255, because 8-bit grayscale pixel values range from 0 to 255.

3. Call the Encoding module.
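As a small worked example of the normalization step, the sketch below implements the general Min-Max formula and the fixed-range shortcut for 8-bit pixels. The function names are hypothetical, and returning zeros for a constant input is an assumption made to avoid division by zero.

```python
def min_max_normalize(values):
    """General Min-Max normalization: z_i = (x_i - min(x)) / (max(x) - min(x))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: the formula is undefined, so return zeros (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_pixels(pixels, max_level=255):
    """Shortcut for 8-bit grayscale, where min is 0 and max is 255."""
    return [p / max_level for p in pixels]


pixels = [0, 51, 255]
general = min_max_normalize(pixels)
shortcut = normalize_pixels(pixels)
```

The two functions agree whenever the vector actually contains both 0 and 255. The fixed divisor has the advantage that every image is scaled the same way, regardless of its own darkest and brightest pixel.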


Tool description

We have already pretrained our AutoEncoder model via stochastic gradient descent. As the dataset we used 60,000 unlabeled images of handwritten digits. During training the AutoEncoder learned many high-level features of those images.

1. We feed the normalized row vector to the AutoEncoder and get back a more compact feature vector (each element represents the probability that a particular high-level feature is present in the image).

2. We pass the new compact vector to the Classifier module. (There is no need to normalize this vector, since the sigmoid function has already mapped its values into the range 0 to 1.)
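To illustrate why the encoded vector needs no further normalization, here is a minimal sketch of the encoding pass alone. The weights and biases are made-up placeholder values, not the trained model, and `encode` is a hypothetical name; a real model would have one weight row per hidden feature and one column per input pixel.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(row_vector, weights, biases):
    """Hidden-layer activations: one sigmoid output per learned feature."""
    return [sigmoid(sum(w * x for w, x in zip(w_row, row_vector)) + b)
            for w_row, b in zip(weights, biases)]


# Placeholder parameters: 4 normalized inputs compressed to 2 features.
weights = [[0.5, -0.25, 0.0, 1.0],
           [-1.0, 0.75, 0.5, 0.0]]
biases = [0.0, 0.1]

features = encode([0.0, 0.2, 1.0, 0.4], weights, biases)
```

Because every element of `features` is the output of a sigmoid, it always lies strictly between 0 and 1, whatever the weights are.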


Tool description

We pretrained our Neural Network classifier on several thousand labeled examples that had been passed through the AutoEncoder.

1. We feed in the row vector encoded by the AutoEncoder, then call the Result retrieval module to determine the result class from the output layer.


Tool description

Each node in the output layer holds the probability that its class is the correct output.

1. If the probability of one of the output classes is greater than the threshold (0.5), that class is taken as the result class.
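The threshold rule could be sketched as follows. The slides do not say what happens when no class, or more than one class, exceeds the threshold; picking the most probable class and returning `None` when nothing passes are assumptions of this example, and `retrieve_result` is a hypothetical name.

```python
def retrieve_result(probabilities, threshold=0.5):
    """Return the index of the result class, or None if no class passes."""
    # Assumption: when several outputs exceed the threshold, take the largest.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] > threshold:
        return best
    return None


confident = retrieve_result([0.1, 0.7, 0.2])   # class 1 passes the threshold
uncertain = retrieve_result([0.3, 0.4, 0.2])   # nothing exceeds 0.5
```

For a digit classifier the returned index would correspond directly to the digit, given one output node per digit.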


Result

We tested our algorithm on the MNIST dataset of handwritten digit images and compared it with the results of a couple of well-known papers.

Algorithm                    MNIST
Our algorithm                95%
Yann LeCun algorithm         95.3%
Aurelio Ranzato algorithm    99%

http://yann.lecun.com/exdb/mnist/

http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

http://yann.lecun.com/exdb/publis/pdf/ranzato-cvpr-07.pdf
