56
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra IEEE ICCV 2017 Presented By: Nalin Chhibber [email protected] CS 885: Reinforcement Learning Pascal Poupart

Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra

IEEE ICCV 2017

Presented By:Nalin Chhibber

[email protected]

CS 885: Reinforcement LearningPascal Poupart

Page 2: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Outline● Introduction● Paper overview● Contribution and key takeaways● Critique● Class discussion

Page 3: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Introduction

Problem Space: Intersection of Vision and Language

● Image Captioning○ Predict one sentence description of an image.

● Visual Question Answering○ Predict a natural language answer given an image and a question.

● Visual Dialog○ Predict a free-form NL answer given an image, a dialog history, and

a follow-up question.

Page 4: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Paper Overview

Focused on creating a visually-grounded conversational artificial intelligence (AI)

Develop AI agents that can● See (understand contents of an image)● Communicate (understand and hold a dialog in natural language)

Applications:● Help visually impaired users understand their surroundings● Enable analysts to sift through large quantities of surveillance data

Page 5: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Paper Overview

Most of the previous work treat this as a static supervised learning problem

Problem-1Model cannot steer conversation and doesn’t get to see the future consequences of its utterances during training.

Problem-2Evaluations are infeasible for utterances outside the dataset.

Page 6: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Guess WhichAn image guessing game between Q-Bot and A-Bot

Page 7: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Guess WhichAn image guessing game between Q-Bot and A-Bot

Q-Bot

Questioning AgentBlind-folded

Page 8: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

A-Bot

Answering AgentAccess to secret image

Guess WhichAn image guessing game between Q-Bot and A-Bot

Page 9: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Training types

Conducted two types of demonstration with

● Completely ungrounded synthetic world (RL from scratch)○ Agents communicate via symbols with no pre-specified meanings.

● Large-scale experiment on real images using VisDial dataset○ Pretrain on dialog data with SL, followed by fine-tuning with RL.

Page 10: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Reinforcement Learning Framework

Environment

Action

State

Other agent + set of all images Other agent + secret image

Set of all possible questions Set of valid answers

st=[I, c, q1a1, q2a2..qt-1at-1, qt]st=[c, q1a1, q2a2..qt-1at-1]

RewardChange in distance to the true representation before/after a round of dialog

Page 11: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Training Details1. Pretrained with supervised learning on Visual Dialog dataset (VisDial)2. Fine-tuned with REINFORCE

Page 12: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Training Details1. Pretrained with supervised learning on Visual Dialog dataset (VisDial)2. Fine-tuned with REINFORCE

Curriculum LearningProblem: Discrete change in learning landscapeSolution: Gently hand over control to reinforcement learning

Page 13: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Training Details1. Pretrained with supervised learning on Visual Dialog dataset (VisDial)2. Fine-tuned with REINFORCE

Curriculum LearningProblem: Discrete change in learning landscapeSolution: Gently hand over control to reinforcement learning

Reward ShapingProblem: Delayed rewardSolution: Improvement-based intermediate rewards

Page 14: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Model Internals

Page 15: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Model Internals

Page 16: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 17: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 18: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 19: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 20: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 21: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 22: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 23: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 24: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 25: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 26: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 27: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 28: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 29: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 30: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 31: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 32: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 33: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 34: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 35: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 36: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,
Page 37: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Model Evaluation

Page 38: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Model Evaluation

1. Comparison with few natural ablations of the full model (RL-full-QAf)○ SL-pretrained○ Frozen-A○ Frozen-Q○ Frozen-F (regression network)

2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Page 39: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 1

1. Comparison with few natural ablations of the full model (RL-full-QAf)2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Page 40: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

1. Comparison with few natural ablations of the full model (RL-full-QAf)2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Page 41: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

1. Comparison with few natural ablations of the full model (RL-full-QAf)2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Page 42: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

1. Comparison with few natural ablations of the full model (RL-full-QAf)2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Page 43: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

1. Comparison with few natural ablations of the full model (RL-full-QAf)2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Page 44: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

Page 45: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

Page 46: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

Page 47: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

Sorting based on distance to fc7 vectors

Page 48: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

Sorting based on distance to fc7 vectors

Page 49: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 2

Rank of ground truth image = 2

Page 50: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Evaluation 3

1. Comparison with few natural ablations of the full model (RL-full-QAf)2. How well the agents perform at guessing game3. How closely they emulate human dialogs

Human interpretability study to measure:● whether humans can easily understand the Q-BOT-A-BOT dialog.● how image-discriminative the interactions are.

Mean rank for ground-truth image (lower is better)

Mean Reciprocal Rank(higher is better)

3.70 vs 2.73(SL) (RL)

0.518 vs 0.622(SL) (RL)

Page 51: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Results

Page 52: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

SL vs SL+RL

Supervised Q-BOT seemed to mimic how humans ask questions.

RL trained Q-BOT seemed to shifts strategies and asks questions that the A-BOT was better at answering.

Results

Page 53: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

SL vs SL+RL

Supervised Q-BOT seemed to mimic how humans ask questions.

RL trained Q-BOT seemed to shifts strategies and asks questions that the A-BOT was better at answering.

Dialog between the agents were NOT ‘hand engineered’ to be image discriminative. It emerged as a strategy to succeed at the image-guessing game.

Results

Page 54: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

● Emergence of Grounding (RL from scratch)

More details in the follow-up paper:Natural Language Does Not Emerge 'Naturally' in Multi-Agent

DialogKottur et al., EMNLP 2017

Results

The two bots invented their own communication protocol without any human supervision

Page 55: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Contributions● Goal-driven training of visual question answering and dialog agents.

○ Self-talk = infinite data○ Goal-based = evaluation on downstream task○ Agent-driven = agents learn to deal with consequences of their actions.

● End-to-end learning from pixels to multi-agent multi-round dialog to game reward.○ Move from SL on static datasets to RL on actual environment.

Page 56: Abhishek Das, Satwik Kottur, José M.F. Moura, …Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee,

Class Discussions

● Do you think this approach is limited to goal-driven tasks in dialog systems?○ If not, how can this be extended to open-ended conversations?

● What other reward models can be used to make SL-RL dialog systems more successful?