Dynamic Memory Networks for Visual and Textual Question Answering

Nils Hjortnaes

Seminar on Bio-Medical Image Analysis

Table of Contents

1. Problem - What is being accomplished?

2. Dynamic Memory Networks - What is this based on?

3. Improved Dynamic Memory Networks - What is new?

4. Results - Do they improve things?

5. Conclusion

Demo

http://visualqa.csail.mit.edu/

Problem?

Question Answering Example 1

I: Jane went to the hallway.

I: Mary walked to the bathroom.

I: Sandra went to the garden.

I: Daniel went back to the garden.

I: Sandra took the milk there.

Q: Where is the milk?

A: garden

(Kumar et al.)

Question Answering Example 2

Dynamic Memory Networks

Dynamic Memory Networks (DMN)

Tangent: Gated Recurrent Units

Gated Recurrent Units (GRU)

● Appear repeatedly

● Type of RNN

● Similar to LSTM

○ Less expensive

○ Comparable performance on many tasks

(Kumar et al.)
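
As a rough illustration of what a single GRU step computes, here is a minimal sketch in plain Python. The parameter names (`Wu`, `Uu`, `bu`, and so on), the helper functions, and the toy sizes are assumptions made for this example and do not come from the slides. Gate conventions also differ slightly between papers, so treat this as one common formulation rather than the exact one used in the talk.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate(W, U, b, x, h, act):
    """Compute act(W x + U h + b) element-wise."""
    return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

def gru_step(x, h_prev, p):
    """One GRU update; p is a dict of (hypothetical) parameter names."""
    u = gate(p["Wu"], p["Uu"], p["bu"], x, h_prev, sigmoid)  # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h_prev, sigmoid)  # reset gate
    # Candidate state: the reset gate scales the recurrent contribution.
    Wx, Uh = matvec(p["Wh"], x), matvec(p["Uh"], h_prev)
    h_tilde = [math.tanh(a + ri * c + d)
               for a, ri, c, d in zip(Wx, r, Uh, p["bh"])]
    # Interpolate between the candidate and the previous state.
    return [ui * ht + (1.0 - ui) * hp
            for ui, ht, hp in zip(u, h_tilde, h_prev)]

def init_params(n_in, n_hid, rng):
    """Small random weights and zero biases, for demonstration only."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for name in ("u", "r", "h"):
        p["W" + name] = mat(n_hid, n_in)
        p["U" + name] = mat(n_hid, n_hid)
        p["b" + name] = [0.0] * n_hid
    return p

rng = random.Random(0)
params = init_params(n_in=3, n_hid=4, rng=rng)
h = [0.0] * 4
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h = gru_step(x, h, params)  # h carries information across time steps
```

A GRU has two gates and no separate cell state, which is why it needs fewer parameters per step than an LSTM.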

Dynamic Memory Networks (DMN)

(Kumar et al.)

DMN Input Module

(Kumar et al.)

Dynamic Memory Networks (DMN)

(Kumar et al.)

DMN Question Module

(Kumar et al.)

Dynamic Memory Networks (DMN)

(Kumar et al.)

DMN Episodic Memory Module

(Kumar et al.)

DMN Attention Mechanism

(Kumar et al.)

Dynamic Memory Networks (DMN)

(Kumar et al.)

DMN Answer Module

(Kumar et al.)

(Figure labels: m = memory, q = question)

Improved Dynamic Memory Networks (DMN+)

● Improvement on the Dynamic Memory Network

● Two major changes proposed

○ Input Module split in two

○ New memory GRU

● Visual input added

● Supporting facts not marked in training

DMN+ Input Module

Remember: the DMN input module encodes sentences directly into facts, and its GRU reads in one direction only

(Kumar et al.)

DMN+ Input Module

● Regions processed with CNN

● Output: 14 x 14 grid

● Vector size = 512

● Linear layer

● Input fusion layer: bi-directional GRU
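
The pipeline in the bullets above can be sketched in plain Python to show how the shapes flow. Several things here are assumptions for illustration: the toy sizes (8 channels on a 4 x 4 grid instead of 512 channels on a 14 x 14 grid), all function names, and the random weights. The snake-like traversal and the summing of forward and backward states follow the description in Xiong et al. To keep the sketch short, a plain tanh recurrent cell stands in for the GRU.

```python
import math
import random

def snake_order(grid):
    """Flatten a rows x cols grid, reversing every other row."""
    out = []
    for r, row in enumerate(grid):
        out.extend(row if r % 2 == 0 else row[::-1])
    return out

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def project(v, W, b):
    """Linear layer followed by tanh: CNN feature -> textual feature space."""
    return [math.tanh(a + c) for a, c in zip(matvec(W, v), b)]

def rnn_step(x, h, W, U):
    """Stand-in recurrent cell (plain tanh RNN); DMN+ uses a GRU here."""
    return [math.tanh(a + c) for a, c in zip(matvec(W, x), matvec(U, h))]

def fuse(seq, fwd, bwd, hidden):
    """Run one pass in each direction and sum the two states per position."""
    h, forward = [0.0] * hidden, []
    for x in seq:
        h = rnn_step(x, h, *fwd)
        forward.append(h)
    h, backward = [0.0] * hidden, []
    for x in reversed(seq):
        h = rnn_step(x, h, *bwd)
        backward.append(h)
    backward.reverse()  # align backward states with forward positions
    return [[a + b for a, b in zip(f, g)] for f, g in zip(forward, backward)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
CHANNELS, GRID, HIDDEN = 8, 4, 6  # toy sizes; the slides give 512 and 14 x 14
feature_map = [[[rng.random() for _ in range(CHANNELS)]
                for _ in range(GRID)] for _ in range(GRID)]
W_proj, b_proj = rand_matrix(HIDDEN, CHANNELS, rng), [0.0] * HIDDEN
regions = [project(v, W_proj, b_proj) for v in snake_order(feature_map)]
fwd = (rand_matrix(HIDDEN, HIDDEN, rng), rand_matrix(HIDDEN, HIDDEN, rng))
bwd = (rand_matrix(HIDDEN, HIDDEN, rng), rand_matrix(HIDDEN, HIDDEN, rng))
facts = fuse(regions, fwd, bwd, HIDDEN)  # one fused fact per image region
```

With the real sizes, the 14 x 14 grid yields 196 region vectors, each of which becomes one fact that has seen context from both directions.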

DMN+ Visual Input Model

DMN+ Episodic Memory Module

DMN+ Attention Mechanism
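
A minimal sketch of how the DMN+ attention gates can be computed, based on the formulation in Xiong et al.: each fact is compared with the question and the previous memory, passed through a small two-layer network, and the resulting scores are normalised with a softmax. The function names, weight names, and toy vectors below are assumptions for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def interaction(f, q, m):
    """Build z = [f*q; f*m; |f-q|; |f-m|] with element-wise operations."""
    return ([a * b for a, b in zip(f, q)]
            + [a * b for a, b in zip(f, m)]
            + [abs(a - b) for a, b in zip(f, q)]
            + [abs(a - b) for a, b in zip(f, m)])

def attention_gates(facts, q, m, W1, b1, w2, b2):
    """Return one gate per fact; the gates sum to 1."""
    scores = []
    for f in facts:
        z = interaction(f, q, m)
        hidden = [math.tanh(sum(w * x for w, x in zip(row, z)) + b)
                  for row, b in zip(W1, b1)]
        scores.append(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return softmax(scores)

# Toy example: three 2-dimensional facts, so z has 4 * 2 = 8 entries.
facts = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
q, m = [1.0, 0.0], [1.0, 0.0]
W1 = [[0.5] * 8, [-0.5] * 8, [0.1] * 8]  # hypothetical weights
b1, w2, b2 = [0.0] * 3, [1.0, -1.0, 0.5], 0.0
gates = attention_gates(facts, q, m, W1, b1, w2, b2)
```

Because the gates come from a softmax over all facts, they compete with one another, which is what makes the attention overlays on images interpretable as a distribution over regions.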

DMN+ Episodic Memory Module
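
One way to picture how the episodic memory module turns gated facts into a single context vector is the attention-based GRU described by Xiong et al., where the scalar attention gate takes the place of the usual update gate. The sketch below assumes that formulation; the parameter names and toy values are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attn_gru_step(f, h_prev, g, p):
    """GRU step where the scalar attention gate g replaces the update gate."""
    r = [sigmoid(a + c + d) for a, c, d in
         zip(matvec(p["Wr"], f), matvec(p["Ur"], h_prev), p["br"])]
    h_tilde = [math.tanh(a + ri * c + d) for a, ri, c, d in
               zip(matvec(p["Wh"], f), r, matvec(p["Uh"], h_prev), p["bh"])]
    # g close to 1: take the new candidate; g close to 0: keep the old state.
    return [g * ht + (1.0 - g) * hp for ht, hp in zip(h_tilde, h_prev)]

def episode_context(facts, gates, p, hidden):
    """Scan the facts in order; the final hidden state is the context vector."""
    h = [0.0] * hidden
    for f, g in zip(facts, gates):
        h = attn_gru_step(f, h, g, p)
    return h

# Hypothetical 2-dimensional parameters for a quick demonstration.
params = {
    "Wr": [[0.1, 0.2], [0.3, 0.4]], "Ur": [[0.0, 0.1], [0.1, 0.0]], "br": [0.0, 0.0],
    "Wh": [[0.5, -0.5], [0.2, 0.7]], "Uh": [[0.3, 0.0], [0.0, 0.3]], "bh": [0.0, 0.0],
}
facts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = episode_context(facts, [0.7, 0.0, 0.3], params, hidden=2)
```

A fact with gate 0 leaves the hidden state untouched, so irrelevant facts are skipped while the order of the relevant ones is still respected.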

DMN+ Memory Update

Tied Memory Update

Untied Memory Update
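
The difference between tied and untied updates can be sketched as follows. The untied variant follows the ReLU update from Xiong et al., where each pass has its own weights. For brevity, the tied variant here simply reuses one weight set across passes with the same ReLU layer; the original DMN uses a GRU for that step instead. All names and sizes are assumptions for this example.

```python
import random

def relu_memory_update(m_prev, c, q, W, b):
    """m_t = ReLU(W [m_prev; c; q] + b), with [;] meaning concatenation."""
    x = m_prev + c + q  # list concatenation
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def run_passes(contexts, q, weights):
    """Apply one memory update per pass, starting from m_0 = q."""
    m = list(q)
    for c, (W, b) in zip(contexts, weights):
        m = relu_memory_update(m, c, q, W, b)
    return m

def rand_layer(dim, rng):
    """Random (W, b) pair mapping a 3*dim input to a dim output."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(3 * dim)] for _ in range(dim)]
    return W, [0.1] * dim

rng = random.Random(0)
DIM, PASSES = 3, 2
q = [0.2, -0.1, 0.4]
contexts = [[0.5, 0.1, 0.0], [0.0, 0.3, 0.9]]  # one context vector per pass

untied = [rand_layer(DIM, rng) for _ in range(PASSES)]  # new weights per pass
shared = rand_layer(DIM, rng)
tied = [shared] * PASSES                                # same weights each pass

m_untied = run_passes(contexts, q, untied)
m_tied = run_passes(contexts, q, tied)
```

Untying the weights lets each pass specialise, at the cost of more parameters as the number of passes grows.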

Results

DMN Version Comparison

DMN+ and others

Visual Question Answering Results

Attention Overlays

Conclusion

● Good improvements

○ Input Module

○ Memory Gate and GRU

● Visual Input added

● New state of the art

● But - room for improvement

References

Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259, 2014.

Kumar, A., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., Zhong, V., Paulus, R., and Socher, R. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. arXiv:1506.07285, 2016.

Xiong, C., Merity, S., and Socher, R. Dynamic Memory Networks for Visual and Textual Question Answering. arXiv:1603.01417, 2016.

Sukhbaatar, S., Szlam, A., Weston, J., and Fergus, R. End-to-end memory networks. In NIPS, 2015.

Zhou, B., Tian, Y., Sukhbaatar, S., Szlam, A., and Fergus, R. Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.