CIS 660 Image Searching System using CNN-LSTMeecs.csuohio.edu/~sschung/CIS660/QASystemImageTextSagarMyur.pdf · Image Processing - CNN Image Searching System ReLu Replace negative

CIS 660

Image Searching System using CNN-LSTM

Presented by

Mayur Rumalwala

Sagar Dahiwala

AGENDA

• Problem in Image Searching?

• Proposed Solution

• Tools, Libraries and Datasets Used

• Architecture of Proposed System

• Implementation of Algorithm

• CNN, LSTM

Problem in Image Searching?

• Current systems search images:

• Based on title

• Based on description

• Based on metadata

• Many large image sets have no description

• e.g. Instagram

Proposed Solution

• We consider two different approaches:

1. Search images using a similar image

2. Search images based on a sentence (user query)

Tools, Libraries and Datasets Used

• Image Dataset – Caltech-256

• 256 object categories + a clutter category

• At least 80 images per category

• 30,608 images

• http://www.vision.caltech.edu/Image_Datasets/Caltech256/

• Text Dataset – Cornell Movie-Dialogs Corpus

• 220,579 conversational exchanges between 10,292 pairs of movie characters

• Involves 9,035 characters from 617 movies

• 304,713 utterances in total

• http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html


Architecture of Proposed System

Image Processing - CNN

Image representation based on RGB


# Convolutional Layer 1.
filter_size1 = 5
num_filters1 = 32

# Convolutional Layer 2.
filter_size2 = 5
num_filters2 = 64

How a filter is used
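The figure showing how a filter is used did not survive extraction. As a stand-in, here is a minimal pure-Python sketch of sliding one filter over a single-channel image with stride 1 and zero 'SAME' padding, the settings used in the `tf.nn.conv2d` call in these slides. The name `conv2d_same`, the toy image and the edge filter are illustrative assumptions; a real convolutional layer also sums over input channels and adds a bias.

```python
def conv2d_same(image, kernel):
    """Slide `kernel` over `image` with stride 1 and zero 'SAME' padding.

    Like tf.nn.conv2d this is a cross-correlation (the kernel is not flipped).
    Single channel only; assumes an odd kernel size so padding is symmetric.
    """
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2              # zero-padding on each side
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]    # 'SAME': output size == input size
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:   # outside the image counts as 0
                        acc += image[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out


# A 3x3 vertical-edge filter on a 4x4 image with a dark-to-bright edge.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_filter = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_same(image, edge_filter)
```

The strongest responses appear in the columns where the image changes from 0 to 1; the negative values at the right border come from the zero padding.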



ReLU: replace negative values with zero

# Rectified Linear Unit (ReLU).
# It calculates max(x, 0) for each input pixel x.
# This adds some non-linearity to the formula.
layer = tf.nn.relu(layer)
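The same max(x, 0) rule applied to a small feature map, in plain Python rather than TensorFlow. `relu` and the sample values are illustrative only.

```python
def relu(feature_map):
    """Apply max(x, 0) to every value of a 2D feature map (a list of rows)."""
    return [[max(x, 0) for x in row] for row in feature_map]


# Negative filter responses are replaced with zero; positive ones pass through.
activated = relu([[0, 2, 2, -2],
                  [0, 3, 3, -3]])
```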


# This is 2x2 max-pooling, which means that we
# consider 2x2 windows and select the largest value
# in each window. Then we move 2 pixels to the next window.
layer = tf.nn.max_pool(value=layer,
                       ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1],
                       padding='SAME')

Max Pooling
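A plain-Python sketch of the 2x2, stride-2 max-pooling described in the code comments above. `max_pool_2x2` is a hypothetical name. On odd sizes it clips the last window at the border, which we take to be equivalent to TensorFlow ignoring the padded cells under 'SAME' padding.

```python
import math


def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 and 'SAME'-style output size.

    The output has ceil(h/2) x ceil(w/2) cells; each cell is the largest
    value in its 2x2 window (windows are clipped at the border).
    """
    h, w = len(feature_map), len(feature_map[0])
    out_h, out_w = math.ceil(h / 2), math.ceil(w / 2)
    return [
        [
            max(feature_map[r][c]
                for r in range(2 * i, min(2 * i + 2, h))
                for c in range(2 * j, min(2 * j + 2, w)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


pooled = max_pool_2x2([[1, 3, 2, 0],
                       [4, 2, 1, 5],
                       [0, 1, 7, 2],
                       [3, 2, 1, 6]])
```

A 4x4 map shrinks to 2x2, keeping only the strongest response in each window.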


import tensorflow as tf  # TensorFlow 1.x API; under TensorFlow 2 use tf.compat.v1

def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of each filter.
                   num_filters,        # Number of filters.
                   use_pooling=True):  # Use 2x2 max-pooling.

    # Shape of the filter-weights for the convolution.
    # This format is determined by the TensorFlow API.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    # (new_weights and new_biases are helper functions not shown on the slide.)
    weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    biases = new_biases(length=num_filters)

    # Create the TensorFlow operation for convolution.
    # Note the strides are set to 1 in all dimensions.
    # The first and last stride must always be 1,
    # because the first is for the image-number and
    # the last is for the input-channel.
    # But e.g. strides=[1, 2, 2, 1] would mean that the filter
    # is moved 2 pixels across the x- and y-axis of the image.
    # The padding is set to 'SAME' which means the input image
    # is padded with zeroes so the size of the output is the same.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')

    # The slide stops here. Assumed completion: add the biases created above,
    # then apply the max-pooling and ReLU steps from the previous slides.
    layer += biases
    if use_pooling:
        layer = tf.nn.max_pool(value=layer,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME')
    layer = tf.nn.relu(layer)
    return layer
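A small worked calculation of what the two layer settings (`filter_size1 = 5`, `num_filters1 = 32`, `filter_size2 = 5`, `num_filters2 = 64`) imply for weight shapes, parameter counts and output sizes. The 128x128 RGB input and the helper `conv_layer_stats` are assumptions for illustration; the slides do not state the input size.

```python
import math


def conv_layer_stats(height, width, num_input_channels, filter_size,
                     num_filters, use_pooling=True):
    """Return (weight_shape, num_params, output_shape) for one conv layer.

    Stride 1 with 'SAME' padding keeps the spatial size; optional 2x2,
    stride-2 'SAME' max-pooling halves it (rounding up).
    """
    weight_shape = [filter_size, filter_size, num_input_channels, num_filters]
    num_params = math.prod(weight_shape) + num_filters  # weights + one bias per filter
    if use_pooling:
        height, width = math.ceil(height / 2), math.ceil(width / 2)
    return weight_shape, num_params, (height, width, num_filters)


# Hypothetical 128x128 RGB input (3 channels).
img_size, num_channels = 128, 3
filter_size1, num_filters1 = 5, 32
filter_size2, num_filters2 = 5, 64

shape1, params1, out1 = conv_layer_stats(img_size, img_size, num_channels,
                                         filter_size1, num_filters1)
# Layer 2 takes layer 1's output, so its input channels = num_filters1.
shape2, params2, out2 = conv_layer_stats(out1[0], out1[1], out1[2],
                                         filter_size2, num_filters2)
```

Under these assumptions layer 1 has 5 × 5 × 3 × 32 + 32 = 2,432 parameters and outputs 64×64×32; layer 2 has 5 × 5 × 32 × 64 + 64 = 51,264 parameters and outputs 32×32×64.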


User Query Processing - RNN

RNN – Recurrent Neural Network







Vector Representation
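The RNN figure and its vector-representation diagram did not survive extraction. A minimal sketch of the general idea, assuming one-hot word vectors and a plain (Elman) recurrent step h_t = tanh(W_x x_t + W_h h_{t-1} + b). The vocabulary, weights and function names are hypothetical, untrained values chosen only to show how the hidden state carries context from word to word.

```python
import math

VOCAB = ["cat", "with", "my", "car"]  # hypothetical toy vocabulary


def one_hot(word):
    """Vector representation of a word: 1 at its vocabulary index, else 0."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def rnn_step(x, h_prev, W_x, W_h, b):
    """One Elman-RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(sum(W_x[i][k] * x[k] for k in range(len(x)))
                  + sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
                  + b[i])
        for i in range(len(b))
    ]


def run_rnn(words, W_x, W_h, b):
    """Feed the words one at a time; return the hidden state after each word."""
    h = [0.0] * len(b)
    states = []
    for word in words:
        h = rnn_step(one_hot(word), h, W_x, W_h, b)
        states.append(h)
    return states


# Illustrative (untrained) weights: 2 hidden units, 4 vocabulary entries.
W_x = [[0.5, -0.2, 0.1, 0.4],
       [-0.3, 0.8, 0.2, -0.1]]
W_h = [[0.1, 0.2],
       [-0.2, 0.3]]
b = [0.0, 0.1]
states = run_rnn("cat with my car".split(), W_x, W_h, b)
```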



User Query Processing - LSTM

• Element-by-element addition (+)

• Element-by-element multiplication (X)

• Memory (M)

• Squashing function (f)
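The four diagram symbols map onto the usual LSTM update. A minimal pure-Python sketch of one cell step, assuming the common formulation with forget, input and output gates (the slides do not give the equations); `lstm_step`, the parameter layout and the toy weights are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Matrix-vector product plus bias: W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    `params` maps each gate name to (W, b), where W acts on [h_prev, x].
    Element-wise multiplication (X) gates information, element-wise
    addition (+) updates the memory c (M), and sigmoid / tanh are the
    squashing functions (f).
    """
    v = h_prev + x                                              # concatenate [h_prev, x]
    f = [sigmoid(z) for z in affine(*params["forget"], v)]      # how much memory to keep
    i = [sigmoid(z) for z in affine(*params["input"], v)]       # how much new content to write
    o = [sigmoid(z) for z in affine(*params["output"], v)]      # how much memory to expose
    g = [math.tanh(z) for z in affine(*params["candidate"], v)] # candidate new content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy cell: 1 hidden unit, 1 input feature, identical weights for every gate.
params = {name: ([[0.5, 0.5]], [0.0])
          for name in ("forget", "input", "output", "candidate")}
h1, c1 = lstm_step([1.0], [0.0], [0.0], params)
h, c = h1, c1
for x in ([0.5], [-1.0]):
    h, c = lstm_step(x, h, c, params)
```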

User Query Processing - NLTK

NLTK – Natural Language Tool-Kit

Sentence: “cat with my car”

<RB.?>* = "0 or more adverbs of any form (RB, RBR, RBS)," followed by:

<VB.?>* = "0 or more verbs of any tense or form (VB, VBD, VBG, VBN, VBP, VBZ)," followed by:

<NNP>+ = "1 or more proper nouns," followed by:

<NN>? = "0 or 1 singular noun."

+ = match 1 or more repetitions

? = match 0 or 1 repetitions

* = match 0 or more repetitions

. = any character except a newline
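To see how these quantifiers combine in the chunk rule `<RB.?>*<VB.?>*<NNP>+<NN>?`, here is a simplified stand-in, not NLTK's internal code: the tags are written as one string such as `<NN><IN>` and the rule becomes an ordinary Python regular expression. `[^<>]` plays the role of `.` so a wildcard cannot run past a tag boundary. The helper names and the second example's hand-assigned tags are illustrative.

```python
import re

# Simplified equivalent of the chunk rule <RB.?>*<VB.?>*<NNP>+<NN>?
CHUNK_RE = re.compile(r"(?:<RB[^<>]?>)*(?:<VB[^<>]?>)*(?:<NNP>)+(?:<NN>)?")


def tag_string(tagged):
    """[('cat', 'NN'), ('with', 'IN')] -> '<NN><IN>'"""
    return "".join(f"<{tag}>" for _, tag in tagged)


def find_chunks(tagged):
    """Return the tag spans that satisfy the chunk rule."""
    return [m.group(0) for m in CHUNK_RE.finditer(tag_string(tagged))]


# The query has no proper noun (NNP), so the rule never matches.
query = [("cat", "NN"), ("with", "IN"), ("my", "PRP$"), ("car", "NN")]
# Hand-tagged example that exercises every part of the rule.
example = [("quickly", "RB"), ("visited", "VBD"),
           ("New", "NNP"), ("York", "NNP"), ("city", "NN")]
query_chunks = find_chunks(query)
example_chunks = find_chunks(example)
```

This is consistent with the chunking output for “cat with my car”, which contains no Chunk subtree.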

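import nltk  # needs NLTK's tokenizer and POS-tagger data (install via nltk.download)

tagged = nltk.pos_tag(nltk.word_tokenize("cat with my car"))  # POS-tag the user query
chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
chunkParser = nltk.RegexpParser(chunkGram)
chunked = chunkParser.parse(tagged)
print(chunked)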

######### NLTK Chunking #########

(S cat/NN with/IN my/PRP$ car/NN)

{'my': 0, 'with': 0, 'car': 1, 'cat': 1}

/NN – Singular Noun
/IN – Preposition
/PRP$ – Possessive Pronoun
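The dictionary `{'my': 0, 'with': 0, 'car': 1, 'cat': 1}` marks the nouns of the query with 1. A minimal sketch of that step, assuming the rule is "the tag starts with NN"; `term_dictionary` is a hypothetical helper, and the tagged input is copied from the chunking output rather than produced by NLTK, so the example runs without NLTK data.

```python
def term_dictionary(tagged):
    """Map each query word to 1 if it is tagged as a noun, else 0.

    Assumption: any Penn Treebank noun tag (NN, NNS, NNP, NNPS) counts;
    the slides only show the NN case.
    """
    return {word.lower(): int(tag.startswith("NN")) for word, tag in tagged}


# POS tags as shown in the chunking output for "cat with my car".
tagged = [("cat", "NN"), ("with", "IN"), ("my", "PRP$"), ("car", "NN")]
terms = term_dictionary(tagged)
```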

User Query Processing – NLTK

Predicting Images – Cosine Similarity

Term     Value
Cat      1
With     0
My       0
Car      1
Kitty    1
Dog      0

NLTK Output Vector

Predicted Class   Initial Probability   Add Synonyms (Probability)   Normalize
Car               0.2                   0.2                          0.2/1.7 = 0.12
Cat               0.7                   0.7                          0.7/1.7 = 0.41
Kitty             -                     0.7                          0.7/1.7 = 0.41
Dog               0.1                   0.1                          0.1/1.7 = 0.06
With              -                     0                            0
My                -                     0                            0
Total             1.0                   1.7                          1.0

CNN Output Vector

User Query: “cat with my car”
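The "Add Synonyms" and "Normalize" columns can be reproduced in a few lines. This sketch assumes a synonym inherits the probability of its class (as Kitty inherits 0.7 from Cat) and that normalizing means dividing by the new total, 1.7; `add_synonyms` and `normalize` are hypothetical names.

```python
def add_synonyms(class_probs, synonyms):
    """Give each synonym the same score as the class it stands for.

    `synonyms` maps a class name to extra terms, e.g. {"cat": ["kitty"]}.
    """
    expanded = dict(class_probs)
    for cls, extra_terms in synonyms.items():
        for term in extra_terms:
            expanded[term] = class_probs.get(cls, 0.0)
    return expanded


def normalize(scores):
    """Divide every score by the total so the values sum to 1."""
    total = sum(scores.values())
    return {term: value / total for term, value in scores.items()}


cnn_output = {"car": 0.2, "cat": 0.7, "dog": 0.1}  # initial CNN probabilities
expanded = add_synonyms(cnn_output, {"cat": ["kitty"]})
normalized = normalize(expanded)
```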

Final Overview

[Diagram: “Cat with my car” → NLTK → Term Dictionary; image → CNN → Term Dictionary; both feed Cosine Similarity]
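A minimal sketch of the final comparison, computing cos(a, b) = (a · b) / (|a| |b|) between the NLTK term vector and the normalized CNN vector from the "Predicting Images" tables. `cosine_similarity` is a hypothetical helper, and ranking images by this score is our assumption about how results would be ordered; the slides do not say.

```python
import math


def cosine_similarity(query_terms, image_terms):
    """cos(a, b) = (a . b) / (|a| |b|) over the union of both term dictionaries.

    Terms missing from one dictionary count as 0.
    """
    vocab = sorted(set(query_terms) | set(image_terms))
    a = [query_terms.get(t, 0.0) for t in vocab]
    b = [image_terms.get(t, 0.0) for t in vocab]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Vectors from the tables for the query "cat with my car".
nltk_vector = {"cat": 1, "with": 0, "my": 0, "car": 1, "kitty": 1, "dog": 0}
cnn_vector = {"car": 0.12, "cat": 0.41, "kitty": 0.41, "dog": 0.06,
              "with": 0, "my": 0}
score = cosine_similarity(nltk_vector, cnn_vector)
```

For this query the score is roughly 0.91. Because cosine similarity ignores vector length, the normalization step does not change the score apart from rounding.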

How Is It Going to Work?

Any Questions?

Thank You
