CIS 660
Image Searching System using CNN-LSTM
Presented by
Mayur Rumalwala
Sagar Dahiwala
AGENDA – Image Searching System
• Problem in Image Searching?
• Proposed Solution
• Tools, Library and Dataset used
• Architecture of Proposed System
• Implementation of Algorithm
• CNN, LSTM
Problem in Image Searching?
• Current systems search images
  • Based on title
  • Based on description
  • Based on metadata
• Large image sets without descriptions cannot be searched this way
Proposed Solution
• Two approaches are considered:
1. Search image using similar image
2. Search image based on sentence (User query)
Tools, Library and Dataset Used
• Image Dataset – Caltech 256
  • 256 object categories + clutter
  • At least 80 images per category
  • 30,607 images
  • http://www.vision.caltech.edu/Image_Datasets/Caltech256/
• Text Dataset – Cornell Movie-Dialogs Corpus
  • 220,579 conversational exchanges between 10,292 pairs of movie characters
  • Involves 9,035 characters from 617 movies
  • 304,713 utterances in total
  • http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
Architecture of Proposed System
Image Processing - CNN
Image representation based on RGB
# Convolutional Layer 1.
filter_size1 = 5
num_filters1 = 32
# Convolutional Layer 2.
filter_size2 = 5
num_filters2 = 64
How a filter is used
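The original slide illustrated this with a figure. As a rough stand-in, the sketch below slides a small filter over a toy image and sums the element-wise products at each position. The 5x5 image, the 3x3 filter, and the function name `convolve2d_valid` are illustrative assumptions (the layers above use 5x5 filters), and no padding is applied here. Like most deep-learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Sum of element-wise products between the filter and the image patch.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy single-channel image and filter (hypothetical values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Each of the 32 or 64 filters in a layer would produce one such feature map.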
ReLU: replaces negative values with zero
# Rectified Linear Unit (ReLU).
# It calculates max(x, 0) for each input pixel x.
# This adds some non-linearity to the formula
layer = tf.nn.relu(layer)
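To make the max(x, 0) rule concrete, here is a minimal plain-Python sketch of what ReLU does to a list of values. The function name `relu` and the sample values are illustrative assumptions, not part of the presented code.

```python
def relu(values):
    """Apply max(x, 0) to every element: negatives become zero, the rest pass through."""
    return [max(x, 0.0) for x in values]


print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0.0, 0.0, 0.0, 1.5, 3.0]
```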
# This is 2x2 max-pooling, which means that we
# consider 2x2 windows and select the largest value
# in each window. Then we move 2 pixels to the next window.
layer = tf.nn.max_pool(value=layer,
                       ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1],
                       padding='SAME')
Max Pooling
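The 2x2 max-pooling step can be sketched in plain Python as follows. This is a simplified illustration, assuming a single-channel feature map with even height and width so that no padding is needed; the function name `max_pool_2x2` and the sample values are hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    pooled = []
    for row in range(0, len(feature_map), 2):
        pooled_row = []
        for col in range(0, len(feature_map[0]), 2):
            # Collect the four values in the current 2x2 window.
            window = [
                feature_map[row][col], feature_map[row][col + 1],
                feature_map[row + 1][col], feature_map[row + 1][col + 1],
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[6, 8], [3, 4]]
```

Pooling halves the width and height, so each later layer works on a smaller, more condensed feature map.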
def new_conv_layer(input,               # The previous layer.
                   num_input_channels,  # Num. channels in prev. layer.
                   filter_size,         # Width and height of each filter.
                   num_filters,         # Number of filters.
                   use_pooling=True):   # Use 2x2 max-pooling.

    # Shape of the filter-weights for the convolution.
    # This format is determined by the TensorFlow API.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    biases = new_biases(length=num_filters)

    # Create the TensorFlow operation for convolution.
    # Note the strides are set to 1 in all dimensions.
    # The first and last stride must always be 1,
    # because the first is for the image-number and
    # the last is for the input-channel.
    # But e.g. strides=[1, 2, 2, 1] would mean that the filter
    # is moved 2 pixels across the x- and y-axis of the image.
    # The padding is set to 'SAME' which means the input image
    # is padded with zeroes so the size of the output is the same.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')
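The effect of the stride and 'SAME' padding settings on tensor sizes can be checked with a little arithmetic. The sketch below assumes TensorFlow's rule that, with 'SAME' padding, the spatial output size is ceil(input_size / stride). The 64x64 input size is a hypothetical value (the slides do not state the image size), the 3 input channels follow from the RGB representation, and the helper names are illustrative.

```python
import math


def same_output_size(input_size, stride):
    """Spatial output size for padding='SAME': ceil(input_size / stride)."""
    return math.ceil(input_size / stride)


def filter_shape(filter_size, num_input_channels, num_filters):
    """Weight shape in the order used by the conv layer above."""
    return [filter_size, filter_size, num_input_channels, num_filters]


image_size = 64  # hypothetical input width/height, not taken from the slides

size = image_size
size = same_output_size(size, 1)  # conv layer 1, stride 1: size unchanged
size = same_output_size(size, 2)  # 2x2 max-pooling, stride 2: size halved
size = same_output_size(size, 1)  # conv layer 2, stride 1: size unchanged
size = same_output_size(size, 2)  # 2x2 max-pooling, stride 2: size halved

print(filter_shape(5, 3, 32))   # layer 1 weights: [5, 5, 3, 32]
print(filter_shape(5, 32, 64))  # layer 2 weights: [5, 5, 32, 64]
print(size)                     # 16
```

Under these assumptions, the second layer outputs 64 feature maps of 16x16 values each.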
User Query processing - RNN
RNN – Recurrent Neural Network
Vector Representation
User Query processing - LSTM
• Element by Element addition (+)
• Element by Element Multiplication (X)
• Memory (M)
• Squashing function (f)
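The four building blocks listed above can be seen together in a single LSTM step. The sketch below follows the standard textbook LSTM formulation, which may differ in detail from the diagram on the original slide. The parameter names (`Wf`, `Wi`, `Wo`, `Wc` and the matching biases) and the tiny all-zero example are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Squashing function that maps any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute weights @ vector + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state h and memory c."""
    concat = h_prev + x  # list concatenation of previous output and current input
    forget = [sigmoid(z) for z in affine(params["Wf"], params["bf"], concat)]
    inp = [sigmoid(z) for z in affine(params["Wi"], params["bi"], concat)]
    out = [sigmoid(z) for z in affine(params["Wo"], params["bo"], concat)]
    cand = [math.tanh(z) for z in affine(params["Wc"], params["bc"], concat)]
    # Memory update: element-wise multiplication (X) and addition (+).
    c = [f * cp + i * g for f, cp, i, g in zip(forget, c_prev, inp, cand)]
    # Output: squashed memory, gated element by element.
    h = [o * math.tanh(ci) for o, ci in zip(out, c)]
    return h, c


# Hypothetical sizes: 3 input features, 2 hidden units, all-zero parameters,
# so every gate equals sigmoid(0) = 0.5 and the result is easy to check by hand.
zero_weights = [[0.0] * 5 for _ in range(2)]
params = {name: zero_weights for name in ("Wf", "Wi", "Wo", "Wc")}
params.update({name: [0.0, 0.0] for name in ("bf", "bi", "bo", "bc")})

h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [2.0, -4.0], params)
print(c)  # [1.0, -2.0]: the forget gate kept half of the old memory
```

In a trained network the weights are learned, so the gates decide per element how much memory to keep, add, and expose.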
User Query processing - NLTK
NLTK – Natural Language Toolkit
Sentence: “cat with my car”
<RB.?>* = "0 or more of any form of adverb," followed by:
<VB.?>* = "0 or more of any tense of verb," followed by:
<NNP>+ = "1 or more proper nouns," followed by:
<NN>? = "0 or 1 singular noun."
+ = match 1 or more repetitions
? = match 0 or 1 repetitions
* = match 0 or more repetitions
. = any character except a newline
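import nltk
# Tokenize and POS-tag the query (requires the NLTK tokenizer and tagger data).
tagged = nltk.pos_tag(nltk.word_tokenize("cat with my car"))
chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
chunkParser = nltk.RegexpParser(chunkGram)
chunked = chunkParser.parse(tagged)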
######### NLTK Chunking #########
(S cat/NN with/IN my/PRP$ car/NN)
{'my': 0, 'with': 0, 'car': 1, 'cat': 1}
NN – singular noun
IN – preposition
PRP$ – possessive pronoun
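The slides do not state how the tagged sentence becomes the 0/1 dictionary. One plausible reading, assumed here, is that every noun gets a 1 and every other word gets a 0. The sketch below applies that rule to the tagged tokens shown above; the function name `noun_term_vector` is hypothetical.

```python
def noun_term_vector(tagged_tokens):
    """Map each word to 1 if its POS tag is a noun tag (NN, NNS, NNP, NNPS), else 0.

    Assumption: the slides only show the result, so the noun rule is a guess
    that happens to reproduce it.
    """
    return {word: int(tag.startswith("NN")) for word, tag in tagged_tokens}


# Tagged tokens copied from the chunking output above.
tagged = [("cat", "NN"), ("with", "IN"), ("my", "PRP$"), ("car", "NN")]

term_vector = noun_term_vector(tagged)
print(term_vector)  # {'cat': 1, 'with': 0, 'my': 0, 'car': 1}
```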
Predicting Images – Cosine Similarity
NLTK Output Vector
Term     Value
Cat      1
With     0
my       0
Car      1
Kitty    1
Dog      0
CNN Output Vector
Predicted Class   Initial probability   Add synonyms (probability)   Normalize
Car               0.2                   0.2                          0.2/1.7 = 0.12
Cat               0.7                   0.7                          0.7/1.7 = 0.41
Kitty             -                     0.7                          0.7/1.7 = 0.41
Dog               0.1                   0.1                          0.1/1.7 = 0.06
With              -                     0                            0
my                -                     0                            0
Total             1.0                   1.7                          1.0
User query: “cat with my car”
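The normalization and the comparison of the two vectors can be reproduced in a few lines. The sketch below uses the values from the two tables; the function names and the choice to pass the term dictionary as an ordered list are illustrative assumptions.

```python
import math


def normalize(scores):
    """Divide every score by the total so the values sum to 1."""
    total = sum(scores.values())
    return {term: value / total for term, value in scores.items()}


def cosine_similarity(a, b, terms):
    """Cosine of the angle between two term vectors over a shared term list."""
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in terms)
    norm_a = math.sqrt(sum(a.get(t, 0.0) ** 2 for t in terms))
    norm_b = math.sqrt(sum(b.get(t, 0.0) ** 2 for t in terms))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


terms = ["cat", "with", "my", "car", "kitty", "dog"]

# NLTK output vector for the query "cat with my car" (with the synonym "kitty").
query_vector = {"cat": 1, "with": 0, "my": 0, "car": 1, "kitty": 1, "dog": 0}

# CNN class probabilities after adding synonyms (column total 1.7).
cnn_scores = {"car": 0.2, "cat": 0.7, "kitty": 0.7, "dog": 0.1, "with": 0.0, "my": 0.0}
cnn_vector = normalize(cnn_scores)

similarity = cosine_similarity(query_vector, cnn_vector, terms)
print({t: round(v, 2) for t, v in cnn_vector.items()})
print(round(similarity, 2))  # 0.91
```

A score close to 1 means the image's predicted classes point in nearly the same direction as the query terms, so images can be ranked by this value.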
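Final Overview
[Diagram: the user query “Cat with my car” is processed by NLTK and the image by the CNN; each output is mapped onto the term dictionary, and the two resulting vectors are compared using cosine similarity.]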
How is it going to work?
Any Questions?
Thank You