Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han [1]

Presented by: Badri Patro

Computer Vision Reading Group, Indian Institute of Technology Kanpur

June 24, 2016

Badri Patro (IIT Kanpur) Image Question Answering June 24, 2016 1 / 45

Plagiarism

All the figures, equations and text are taken from the above paper and its reference papers.

This presentation is meant for educational purpose only.

Table of Contents

1 Introduction

2 Related Work

3 Algorithm Overview
    Problem Formulation

4 Network Architecture
    Classification Network
    Parameter Prediction Network
    Parameter Hashing

5 Training Algorithm
    Training by Error Back-Propagation

6 Experiment
    Datasets
    Results

7 Conclusion

8 References

Introduction

Problem statement: given an image, the machine automatically answers questions posed by humans as natural-language queries.

Training uses a set of (image, question, answer) triplets.

Answers can be a single word or multiple words, depending on the dataset.

Introduction

Types of Questions

Fine-grained recognition (e.g., “What kind of cheese is on the pizza?”).

Object detection (e.g., “How many bikes are there?”).

Activity recognition (e.g., “Is this man crying?”).

Knowledge base reasoning (e.g., “Why did Katappa kill Bahubali?”).

Commonsense reasoning (e.g., “Does this person have 20/20 vision?”, “Is this person expecting company?”).

Result of multiple questions for a single image

Figure : Result of the proposed algorithm on multiple questions for a single image

Introduction

The critical challenge of this problem is that different questions require different types and levels of understanding of an image to find correct answers.

For example, to answer a question like “how is the weather?” we need to classify among multiple weather-related choices, while for a question like “is this picture taken during the day?” we should decide between yes and no.

For this reason, not only the performance on a single recognition task but also the capability to select a proper task is important for solving the ImageQA problem.

Introduction

Simple deep learning based approaches that perform classification on a combination of features extracted from the image and the question currently demonstrate state-of-the-art accuracy.

Existing VQA approaches extract image features using a convolutional neural network (CNN), and use a CNN or bag-of-words to obtain feature descriptors from the question.

These methods can be interpreted as follows: the answer is given by the co-occurrence of a particular combination of features extracted from an image and a question.

Introduction

Contrary to the existing approaches, the authors define a different recognition task depending on the question.

To realize this idea, the authors propose a deep CNN with a dynamic parameter layer whose weights are determined adaptively based on the question.

The paper claims that a single deep CNN architecture can handle various tasks by allowing adaptive weight assignment in the dynamic parameter layer.

Introduction

Main contributions in this work are summarized below:

The authors adopt a deep CNN with a dynamic parameter layer for ImageQA; this is a fully-connected layer whose parameters are determined dynamically based on a given question.

To predict the large number of weights in the dynamic parameter layer, they apply the hashing trick, which reduces the number of parameters significantly with little impact on network capacity.

They fine-tune a GRU pre-trained on a large-scale text corpus [14 (original ref)] to improve the generalization performance of the network.

According to the paper, this is the first work to report results on all benchmark datasets available at the time: DAQUAR, COCO-QA and VQA.

The algorithm achieves state-of-the-art performance on all three datasets.

A multi-world approach to QA [Malinowski et al., 2014] [6]

It employs semantic image segmentation and symbolic question reasoning to solve the ImageQA problem.

However, this method depends on a pre-defined set of predicates, which makes it difficult to represent the complex models required to understand input images.

Deep learning based approaches demonstrate competitive performance in ImageQA.

Figure : Multi-world approach [Malinowski et al., 2014] [6]

Neural Image-QA [Malinowski et al., 2015] [5]

Most deep learning based approaches use CNNs to extract features from the image, while they use different strategies to handle question sentences.

Some algorithms employ an embedding of joint features based on the image and the question.

In the Neural Image-QA model, the image representation from the CNN is fed to each hidden layer of the LSTM.

In this model the answers are short, such as a single word, e.g. an object category, color or number.

Figure : Neural Image-QA (Malinowski et al., 2015) [5]

VSE (VIS+LSTM) Model [Ren et al., May 2015] [4]

In the VSE (visual semantic embedding) model, the ImageQA task is formulated as a classification problem.

This model also contains a single LSTM and a CNN.

The LSTM is employed to jointly model the image and the question by treating the image as an independent word and appending it to the question at the beginning or end.

Figure : VSE model, VIS+LSTM (Ren et al., May 2015) [4]

mQA Model

The structure of the mQA model is inspired by the m-RNN model [7] for image captioning and image-sentence retrieval tasks. A block diagram of m-RNN is shown in the figure.

mQA adopts a deep CNN for computer vision and an RNN for language.

The mQA model is extended to handle question and image pairs as input and to generate answers.

Figure : m-RNN model (Mao et al., 2015) [7]

mQA Model [Gao et al., Nov 2015] [3]

It contains four components: a Long Short-Term Memory (LSTM) to extract the question representation, a Convolutional Neural Network (CNN) to extract the visual representation, an LSTM for storing the linguistic context in an answer, and a fusing component to combine the information from the first three components and generate the answer.

Unlike [4] and [5], the image representation is not fed into the LSTM. The model uses two separate LSTMs for questions and answers, in consideration of their different properties (e.g. grammar).

Figure : mQA model (Gao et al., Nov 2015) [3]

ConvQA Model [Ma et al., Nov 2015] [2]

This approach uses three CNNs: one to extract the sentence representation, one for the image representation, and a third multimodal layer to fuse the two.

Figure : ConvQA (Ma et al., Nov 2015) [2]

Introduction

Figure : Overall architecture of the proposed Dynamic Parameter Prediction network (DPPnet), which is composed of the classification network and the parameter prediction network.

The weights in the dynamic parameter layer are mapped by a hashing trick from the candidate weights obtained from the parameter prediction network.

Problem Formulation

ImageQA systems predict the best answer $\hat{a}$ given an image $I$ and a question $q$.

Conventional approaches

They construct a joint feature vector from the two inputs $I$ and $q$ and solve a classification problem for ImageQA, given by

$$\hat{a} = \operatorname*{argmax}_{a \in \Omega} \; p(a \mid I, q; \theta)$$

where $\Omega$ is the set of all possible answers and $\theta$ is the vector of parameters in the network.

Problem Formulation

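Proposed approach

Here, the authors use the question to predict the weights in the classifier and solve the problem

$$\hat{a} = \operatorname*{argmax}_{a \in \Omega} \; p(a \mid I; \theta_s, \theta_d(q))$$

where $\theta_s$ and $\theta_d(q)$ denote static and dynamic parameters, respectively. Note that the values of $\theta_d(q)$ are determined by the question $q$.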

Classification Network

The classification network is built on the VGG 16-layer net, pre-trained on ImageNet.

The last layer of the VGG net is removed and three fully-connected layers are attached.

The second-to-last fully-connected layer is the dynamic parameter layer, whose weights are determined by the parameter prediction network.

The last fully-connected layer is the classification layer, whose output dimensionality equals the number of possible answers.

The probability of each answer is computed by applying a softmax function to the output vector of the final layer.
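The softmax step on the final layer can be sketched in a few lines of NumPy. This is only an illustration: the answer vocabulary and the score values below are made up, and the real classification layer has one output per answer in the dataset.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical output of the classification layer for four candidate answers.
answers = ["yes", "no", "red", "two"]
scores = np.array([2.0, 0.5, -1.0, 0.1])

probs = softmax(scores)                       # one probability per answer
best_answer = answers[int(np.argmax(probs))]  # answer with the highest probability
```

The predicted answer is simply the entry with the largest probability, which matches the argmax over the answer set used in the problem formulation.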

Classification Network

The dynamic parameter layer is placed at the second-to-last fully-connected layer rather than the classification layer because it involves the smallest number of parameters.

The number of parameters in the classification layer grows in proportion to the number of possible answers, so predicting the weights of that layer may not be a good option for general ImageQA problems.

Dynamic Parameter Layer

The classification network has a dynamic parameter layer. For an input vector of the dynamic parameter layer $f^i = [f^i_1, \dots, f^i_N]^T$, its output vector $f^o = [f^o_1, \dots, f^o_M]^T$ is given by

$$f^o = W_d(q)\, f^i + b$$

where $b$ denotes a bias and $W_d(q) \in \mathbb{R}^{M \times N}$ denotes the weight matrix constructed dynamically by the parameter prediction network given the input question $q$.
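As a rough illustration of what "dynamic" means here, the sketch below applies the same affine layer to one input vector with two different weight matrices, standing in for two different questions. The sizes, the random matrices and the names `W_weather` and `W_daytime` are hypothetical; in DPPnet the matrix would come from the parameter prediction network.

```python
import numpy as np

def dynamic_parameter_layer(f_in, W_d, b):
    """Affine layer f_out = W_d f_in + b, where W_d depends on the question."""
    return W_d @ f_in + b

rng = np.random.default_rng(0)
N, M = 6, 4                      # hypothetical input and output sizes
f_in = rng.standard_normal(N)    # stand-in for the image feature entering the layer
b = np.zeros(M)

# Two hypothetical questions yield two different M x N weight matrices.
W_weather = rng.standard_normal((M, N))
W_daytime = rng.standard_normal((M, N))

out_weather = dynamic_parameter_layer(f_in, W_weather, b)
out_daytime = dynamic_parameter_layer(f_in, W_daytime, b)
```

The image feature is identical in both calls; only the question-dependent weights change, so the layer computes a different function of the image for each question.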

Parameter Prediction Network(PPN)

The parameter prediction network is composed of GRU cells [Chung et al.] followed by a fully-connected layer.

The fc layer of the PPN produces the candidate weights used to construct the weight matrix of the dynamic parameter layer within the classification network.

Parameter Prediction Network(PPN)

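Let $w_1, \dots, w_T$ be the words in a question $q$, where $T$ is the number of words in the question.

At each time step $t$, given the embedded vector $x_t$ for word $w_t$, the GRU encoder updates its hidden state $h_t$ by

$$r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$z_t = \sigma(W_z x_t + U_z h_{t-1})$$
$$\bar{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \bar{h}_t$$

where $r_t$ and $z_t$ respectively denote the reset and update gates at time $t$, $\bar{h}_t$ is the candidate activation at time $t$, $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication.

Note that the coefficient matrices of the GRU, $W_r, W_z, W_h, U_r, U_z$ and $U_h$, are learned by the training algorithm.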
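A minimal NumPy sketch of one GRU update may help make the gates concrete. It assumes the standard bias-free GRU form (reset gate, update gate, candidate activation, then a convex combination of the old state and the candidate); the sizes and the random parameter values are hypothetical, and a trained network would learn the six matrices instead.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update; p holds the matrices W_r, W_z, W_h, U_r, U_z, U_h."""
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)             # reset gate
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)             # update gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev))  # candidate activation
    return (1.0 - z_t) * h_prev + z_t * h_cand                    # new hidden state

def encode_question(word_vectors, p, hidden_size):
    """Run the GRU over all embedded words and return the last hidden state."""
    h = np.zeros(hidden_size)
    for x_t in word_vectors:
        h = gru_step(x_t, h, p)
    return h

rng = np.random.default_rng(0)
embed_size, hidden_size, T = 5, 3, 4   # hypothetical sizes; T words in the question

params = {}
for gate in ("r", "z", "h"):
    params["W_" + gate] = 0.1 * rng.standard_normal((hidden_size, embed_size))
    params["U_" + gate] = 0.1 * rng.standard_normal((hidden_size, hidden_size))

question = rng.standard_normal((T, embed_size))  # stand-in for embedded words x_1..x_T
h_T = encode_question(question, params, hidden_size)
```

In the parameter prediction network, the final state `h_T` would then be passed to the fully-connected layer that emits the candidate weights.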

Parameter Hashing

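The final hash function is given by

$$w^d_{mn} = w^p_{\psi(m,n)} \cdot \xi(m,n)$$

where $w^d_{mn}$ is the element at position $(m, n)$ of $W_d(q)$, $w^p_k$ is the $k$-th candidate weight produced by the parameter prediction network, $\psi(m,n)$ is a hash function mapping a position to a candidate index, and $\xi(m,n) \in \{+1, -1\}$ is a second, independent hash function.

The sign function $\xi$ is useful to remove the bias of the hashed inner product [3 (original ref)].

For the hash function, the authors adopt an open-source implementation of xxHash.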
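The sketch below shows one way the hashed weight sharing could look in code, assuming each entry of the dynamic weight matrix is a candidate weight picked by an index hash and multiplied by a sign hash. CRC32 from the Python standard library is used purely as a stand-in for xxHash, and the helper names (`psi`, `xi`, `build_dynamic_weights`) and sizes are illustrative, not taken from the paper's code.

```python
import zlib
import numpy as np

def _hash(m, n, seed):
    """Deterministic integer hash of (m, n); CRC32 stands in for xxHash here."""
    return zlib.crc32(f"{seed}:{m}:{n}".encode("utf-8"))

def psi(m, n, K):
    """Index hash: maps position (m, n) to one of K candidate weights."""
    return _hash(m, n, "psi") % K

def xi(m, n):
    """Sign hash: maps position (m, n) to +1 or -1."""
    return 1 if _hash(m, n, "xi") % 2 == 0 else -1

def build_dynamic_weights(w_p, M, N):
    """Fill an M x N matrix from K candidate weights via the two hashes."""
    K = len(w_p)
    W_d = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            W_d[m, n] = w_p[psi(m, n, K)] * xi(m, n)
    return W_d

rng = np.random.default_rng(0)
M, N, K = 4, 6, 8                 # hypothetical sizes: 24 weights from 8 candidates
w_p = rng.standard_normal(K)      # stand-in for the PPN output
W_d = build_dynamic_weights(w_p, M, N)
```

Because K is smaller than M times N, several positions reuse the same candidate weight, which is what keeps the output of the parameter prediction network small.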

Training by Error Back-Propagation

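The proposed network is trained end-to-end to minimize the error between the ground truths and the estimated answers.

The error is back-propagated to both the classification network and the parameter prediction network, which are trained jointly by a first-order optimization method.

Let $L$ denote the loss function. The partial derivatives of $L$ with respect to the $k$-th element in the input and output of the dynamic parameter layer are given respectively by

$$\delta^i_k \equiv \frac{\partial L}{\partial f^i_k}, \qquad \delta^o_k \equiv \frac{\partial L}{\partial f^o_k}$$

The two derivatives have the following relation:

$$\delta^i_n = \sum_{m=1}^{M} w^d_{mn}\, \delta^o_m$$

where $w^d_{mn}$ is the $(m, n)$ element of the dynamic weight matrix $W_d(q)$.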

Training by Error Back-Propagation

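Likewise, the derivative with respect to the assigned weights in the dynamic parameter layer is given by

$$\frac{\partial L}{\partial w^d_{mn}} = f^i_n\, \delta^o_m$$

A single output value of the PPN is shared by multiple connections in the dynamic parameter layer (DPL).

The derivative with respect to an element $w^p_k$ in the output of the parameter prediction network (PPN) is therefore computed as

$$\frac{\partial L}{\partial w^p_k} = \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{\partial L}{\partial w^d_{mn}}\, \xi(m,n)\, \mathbb{I}[\psi(m,n) = k]$$

where $\psi(m,n)$ and $\xi(m,n)$ are the index and sign hash functions of the parameter hashing step, $\mathbb{I}[\cdot]$ is the indicator function, and $\delta^o_m = \partial L / \partial f^o_m$.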
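The gradient flow through the shared weights can be checked numerically. The sketch below assumes the hashed construction in which each dynamic weight is a candidate weight times a sign, accumulates the gradient of every position into the candidate it was drawn from, and compares the result with finite differences. The squared-norm loss, the CRC32 hashes (a stand-in for xxHash) and all sizes are hypothetical choices made only for the check.

```python
import zlib
import numpy as np

def psi(m, n, K):
    """Index hash (CRC32 as a stand-in for xxHash)."""
    return zlib.crc32(f"psi:{m}:{n}".encode("utf-8")) % K

def xi(m, n):
    """Sign hash returning +1 or -1."""
    return 1 if zlib.crc32(f"xi:{m}:{n}".encode("utf-8")) % 2 == 0 else -1

def forward(w_p, f_in, b, M):
    """Build W_d from the candidate weights and apply the dynamic layer."""
    N, K = len(f_in), len(w_p)
    W_d = np.array([[w_p[psi(m, n, K)] * xi(m, n) for n in range(N)]
                    for m in range(M)])
    return W_d, W_d @ f_in + b

def backward(w_p, f_in, W_d, delta_out):
    """Back-propagate delta_out = dL/df_out to the layer input and to w_p."""
    M, N = W_d.shape
    K = len(w_p)
    delta_in = W_d.T @ delta_out          # dL/df_in[n] = sum_m W_d[m, n] * delta_out[m]
    grad_W_d = np.outer(delta_out, f_in)  # dL/dW_d[m, n] = f_in[n] * delta_out[m]
    grad_w_p = np.zeros(K)
    for m in range(M):
        for n in range(N):
            # Every position sharing candidate k contributes to its gradient.
            grad_w_p[psi(m, n, K)] += grad_W_d[m, n] * xi(m, n)
    return delta_in, grad_w_p

rng = np.random.default_rng(0)
M, N, K = 3, 5, 4
w_p = rng.standard_normal(K)
f_in = rng.standard_normal(N)
b = rng.standard_normal(M)

# Toy loss L = 0.5 * ||f_out||^2, so dL/df_out = f_out.
W_d, f_out = forward(w_p, f_in, b, M)
delta_in, grad_w_p = backward(w_p, f_in, W_d, f_out)

def loss(w):
    return 0.5 * np.sum(forward(w, f_in, b, M)[1] ** 2)

# Central finite differences over each candidate weight.
eps = 1e-6
numeric = np.array([
    (loss(w_p + eps * np.eye(K)[k]) - loss(w_p - eps * np.eye(K)[k])) / (2 * eps)
    for k in range(K)
])
```

If the accumulation is right, `grad_w_p` and `numeric` agree to within rounding error, which is a quick sanity check before wiring such a layer into a larger network.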

VQA

COCO-QA (built on MS-COCO images)

DAQUAR-all and DAQUAR-reduced

Evaluation results on VQA

Evaluation results on COCO-QA

Evaluation results on DAQUAR

Sentences before and after fine-tuning GRU

Figure : Retrieved sentences before and after fine-tuning GRU

Result of multiple questions for a single image

Figure : Result of the proposed algorithm on multiple questions for a single image

Results of single common question for multiple images

Figure : Results of the proposed algorithm on a single common question for multiple images

Conclusion

The effectiveness of the proposed architecture is supported by experimental results showing state-of-the-art performance on three different datasets.

Note that the proposed method achieved outstanding performance even without more complex recognition processes such as referencing objects.

References

[1] H. Noh, P. H. Seo, and B. Han. “Image question answering using convolutional neural network with dynamic parameter prediction.” arXiv preprint arXiv:1511.05756, 2015.

[2] L. Ma, Z. Lu, and H. Li. “Learning to answer questions from image using convolutional neural network.” CoRR abs/1506.00333, Nov, 2015.

[3] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. “Are you talking to a machine? Dataset and methods for multilingual image question answering.” arXiv 1505.05612v3, Nov, 2015.

[4] M. Ren, R. Kiros, and R. S. Zemel. “Exploring models and data for image question answering.” arXiv 1505.02074, 2015.

[5] M. Malinowski, M. Rohrbach, and M. Fritz. “Ask your neurons: A neural-based approach to answering questions about images.” arXiv 1505.01121, Nov, 2015.

[6] M. Malinowski and M. Fritz. “Towards a visual Turing challenge.” In Learning Semantics (NIPS workshop), 2014.

[7] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. “Deep captioning with multimodal recurrent neural networks (m-RNN).” In ICLR, 2015.

[8] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. “VQA: Visual question answering.” arXiv preprint arXiv:1505.00468, 2015.

[9] S. Hochreiter and J. Schmidhuber. “Long short-term memory.” Neural Computation, 9(8):1735-1780, 1997.

[10] K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition.” In ICLR, 2015.

[11] I. Sutskever, O. Vinyals, and Q. V. Le. “Sequence to sequence learning with neural networks.” In NIPS, pages 3104-3112, 2014.