Dynamic Memory Networks for Visual and Textual Question Answering
Nils Hjortnaes
Seminar on Bio-Medical Image Analysis
Table of Contents
1. Problem - What is being accomplished?
2. Dynamic Memory Networks - What is this based on?
3. Improved Dynamic Memory Networks - What is new?
4. Results - Do they improve things?
5. Conclusion
Question Answering Example 1
I: Jane went to the hallway.
I: Mary walked to the bathroom.
I: Sandra went to the garden.
I: Daniel went back to the garden.
I: Sandra took the milk there.
Q: Where is the milk?
A: garden
(Kumar et al.)
Gated Recurrent Units (GRU)
● Appear repeatedly throughout the network's modules
● Type of RNN
● Similar to LSTM
○ Less expensive
○ Comparable performance
(Kumar et al.)
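To make the GRU bullet points concrete, here is a minimal sketch of a single GRU update in plain Python. It follows one common formulation (update gate z, reset gate r, candidate state), with the reset gate applied to the recurrent term; conventions differ slightly between papers. The names `init_params`, `gru_step`, and the parameter keys are illustrative assumptions, not code from the cited work, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights for the three GRU components (hypothetical layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = mat(hidden_dim, input_dim)   # input weights
        p["U" + gate] = mat(hidden_dim, hidden_dim)  # recurrent weights
        p["b" + gate] = [0.0] * hidden_dim           # bias
    return p


def gru_step(x, h, p):
    """One GRU update: returns the new hidden state."""

    def affine(gate):
        return matvec(p["W" + gate], x), matvec(p["U" + gate], h), p["b" + gate]

    # Update gate z decides how much of the candidate replaces the old state.
    z = [sigmoid(a + b + c) for a, b, c in zip(*affine("z"))]
    # Reset gate r scales the contribution of the previous state.
    r = [sigmoid(a + b + c) for a, b, c in zip(*affine("r"))]
    # Candidate state: tanh(W x + r * (U h) + b).
    cand = [math.tanh(a + ri * b + c) for ri, (a, b, c) in zip(r, zip(*affine("h")))]
    # Interpolate between the candidate and the previous state.
    return [zi * ci + (1.0 - zi) * hi for zi, ci, hi in zip(z, cand, h)]


params = init_params(input_dim=3, hidden_dim=4)
h = [0.0] * 4
for x in [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]:
    h = gru_step(x, h, params)
print(len(h))  # 4
```

The cell has two gates and no separate memory cell, which is why it needs fewer parameters per hidden unit than an LSTM.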
Improved Dynamic Memory Networks (DMN+)
● Improvement on the Dynamic Memory Network
● Two major changes proposed
○ Input Module split in two
○ New memory GRU
● Visual input added
● Supporting facts not marked in training
DMN+ Visual Input Model
● Regions processed with CNN
● Output 14 x 14 grid
● Vector size = 512
● Linear Layer
● Input Fusion Layer - Bi-directional GRU
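The visual input pipeline on this slide can be sketched end to end in plain Python: a 14 x 14 grid of 512-dimensional CNN region vectors is flattened, projected by a linear layer with tanh, and passed through a bi-directional GRU. This is only an illustration under stated assumptions: the CNN features are random stand-ins, the weights are untrained, the output size `DIM = 8` is a hypothetical choice, and all function names are invented here. The snake-like traversal and the summing of forward and backward states follow the description in Xiong et al.; biases are omitted for brevity.

```python
import math
import random

GRID, CNN_DIM = 14, 512  # grid size and vector size from the slide
DIM = 8                  # hypothetical size of the projected feature space


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def snake_order(grid):
    """Flatten a 2-D grid row by row, reversing every other row."""
    out = []
    for i, row in enumerate(grid):
        out.extend(row if i % 2 == 0 else row[::-1])
    return out


def gru_params(dim, rng):
    return {k: rand_matrix(dim, dim, rng) for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}


def gru_step(x, h, p):
    """Bias-free GRU update (update gate z, reset gate r, candidate state)."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    pre = lambda W, U: [a + b for a, b in zip(matvec(p[W], x), matvec(p[U], h))]
    z = [sig(a) for a in pre("Wz", "Uz")]
    r = [sig(a) for a in pre("Wr", "Ur")]
    cand = [math.tanh(a + ri * b)
            for a, ri, b in zip(matvec(p["Wh"], x), r, matvec(p["Uh"], h))]
    return [zi * c + (1.0 - zi) * hi for zi, c, hi in zip(z, cand, h)]


def run_gru(seq, p, dim):
    """Run a GRU over a sequence and return every hidden state."""
    h, states = [0.0] * dim, []
    for x in seq:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def visual_input_module(features, W_proj, fwd, bwd, dim):
    # 1. Flatten the 14 x 14 grid into 196 region vectors.
    regions = snake_order(features)
    # 2. Linear layer with tanh maps each 512-d vector to `dim` dimensions.
    projected = [[math.tanh(a) for a in matvec(W_proj, v)] for v in regions]
    # 3. Input fusion: run a GRU in both directions and sum the states.
    f = run_gru(projected, fwd, dim)
    b = run_gru(projected[::-1], bwd, dim)[::-1]
    return [[a + c for a, c in zip(fs, bs)] for fs, bs in zip(f, b)]


rng = random.Random(0)
# Random stand-in for the CNN output: GRID x GRID vectors of size CNN_DIM.
features = [[[rng.random() for _ in range(CNN_DIM)] for _ in range(GRID)]
            for _ in range(GRID)]
facts = visual_input_module(features, rand_matrix(DIM, CNN_DIM, rng),
                            gru_params(DIM, rng), gru_params(DIM, rng), DIM)
print(len(facts), len(facts[0]))  # 196 8
```

Each of the 196 resulting vectors plays the role that a sentence-level fact plays for textual input, so the same memory and answer modules can consume either modality.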
Conclusion
● Good improvements
○ Input Module
○ Memory Gate and GRU
● Visual Input added
● New state of the art
● But - room for improvement
References
Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259, 2014.
Kumar, A., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., Zhong, V., Paulus, R., and Socher, R. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. arXiv:1506.07285, 2016.
Xiong, C., Merity, S., and Socher, R. Dynamic Memory Networks for Visual and Textual Question Answering. arXiv:1603.01417, 2016.
Sukhbaatar, S., Szlam, A., Weston, J., and Fergus, R. End-to-end memory networks. In NIPS, 2015.
Zhou, B., Tian, Y., Sukhbaatar, S., Szlam, A., and Fergus, R. Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.