Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Zhou Ren¹, Xiaoyu Wang¹, Ning Zhang¹, Xutao Lv¹, Li-Jia Li²

¹ Snap Research, ² Google

Image captioning

Deep Reinforcement Learning-based Image Captioning with Embedding Reward. CVPR 2017.

[Farhadi et al. ECCV 2010] [Kulkarni et al. CVPR 2011] [Yang et al. EMNLP 2011] [Fang et al. CVPR 2015] [Lebret et al. ICLR 2015] [Mao et al. ICLR 2015] [Vinyals et al. CVPR 2015] [Karpathy et al. CVPR 2015] [Chen et al. CVPR 2015] [Xu et al. ICML 2015] [Johnson et al. CVPR 2016] [You et al. CVPR 2016] …

Previous work

[Farhadi et al. 2010; Kulkarni et al. 2011; Yang et al. 2011]

example: indoor, dark, furry, sit, eat; cat, bat, chair

Previous work

semantic attention [You et al. CVPR 2016]

word detection [Fang et al. CVPR 2015]

spatial attention [Xu et al. ICML 2015]

[Lebret et al. 2015; Mao et al. 2015; Vinyals et al. 2015; Karpathy et al. 2015]

Motivation

● Limitations of current mainstream framework (encoder-decoder)

○ only local information is utilized

○ prone to accumulate generation errors during inference

○ sensitive to beam sizes during beam search

● Our target

○ better at utilizing the global information

○ able to compensate for errors

○ less sensitive to beam sizes

Decision-Making framework with Reinforcement Learning

Why use decision-making?

Human-level gaming control [Mnih et al. Nature 2015]

Visual navigation [Zhu et al. ICRA 2017]

AlphaGo [Silver et al. Nature 2016]

Figure: agent, goal, environment, actions, reward

Image captioning reformulation in decision-making

Figure: agent, goal, environment, state, actions, reward

- Goal: to generate a visual description given an image

- Agent: the image captioning model to learn

- Environment: the given image I + the words predicted so far

- State: representation of the environment at time step t

- Action: the word to generate at time step t + 1

- Reward: the feedback for reinforcement learning
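
The reformulation above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the authors' code: the names `CaptionState`, `step`, `is_terminal`, and the `<eos>` end token are hypothetical and introduced here for the example.

```python
from dataclasses import dataclass
from typing import Tuple

EOS = "<eos>"  # hypothetical end-of-sentence token (an assumption)


@dataclass(frozen=True)
class CaptionState:
    """State at step t: the given image plus the words predicted so far."""
    image_id: str                 # stands in for the given image I
    words: Tuple[str, ...] = ()   # words generated up to step t


def step(state: CaptionState, action: str) -> CaptionState:
    """Apply an action (the next word) and return the state at step t + 1."""
    return CaptionState(state.image_id, state.words + (action,))


def is_terminal(state: CaptionState) -> bool:
    """The episode ends once the end token has been generated."""
    return bool(state.words) and state.words[-1] == EOS


# Roll out a fixed sequence of actions from the empty caption.
s = CaptionState("img_001")
for word in ["a", "cat", "sits", EOS]:
    s = step(s, word)
```

In this sketch the reward would be computed once `is_terminal(s)` holds, since it is feedback on the finished caption.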

Overview of our approach

● We propose a decision-making framework for image captioning

❏ An agent model contains a policy network, to capture the local information, and a value network, to capture the global information

❏ Training using reinforcement learning with embedding reward

❏ Testing using lookahead inference

Our approach - agent architecture

Policy network (local guidance)

Value network (global guidance)

vθ(s_t) tries to regress the final reward

Our approach - train our agent

● Pretrain policy network p with cross entropy loss

● Pretrain value network vθ with the mean squared loss

● Train p and vθ jointly using deep Reinforcement Learning

○ an Actor-Critic RL model

○ MIXER [Ranzato et al. ICLR 2016]
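
One common way to realize an actor-critic training step is a policy-gradient update in which the value estimate acts as a baseline. The sketch below assumes that form, since the slides do not spell out the update rule. The names `softmax`, `policy_logit_grad`, and `value_loss` are illustrative only, and the toy numbers are made up.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution over words."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_logit_grad(logits: List[float], action: int,
                      reward: float, value: float) -> List[float]:
    """Gradient of (reward - value) * log p(action) w.r.t. the logits.

    For a softmax policy, d log p(a) / d logit_k = 1[k == a] - p_k.
    """
    probs = softmax(logits)
    advantage = reward - value  # how much better than the critic expected
    return [advantage * ((1.0 if k == action else 0.0) - p)
            for k, p in enumerate(probs)]


def value_loss(value: float, reward: float) -> float:
    """Squared error for one state: the critic regresses the final reward."""
    return (value - reward) ** 2


logits = [0.0, 0.0, 0.0]  # toy scores over a 3-word vocabulary
grad = policy_logit_grad(logits, action=1, reward=1.0, value=0.4)
```

With a positive advantage, a gradient-ascent step raises the score of the chosen word and lowers the others; a negative advantage does the opposite.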

Reinforcement learning - reward definition

● Literature: metric-driven [Ranzato et al. ICLR 2016]

● Limitations:

○ metrics in image captioning are not perfectly defined.

○ the model needs to be retrained for each metric in isolation.

○ it has no value network (no global guidance).

Reinforcement learning - reward definition

● Visual-Semantic Embedding [Frome et al. NIPS 2013; Kiros et al. TACL 2015]

example: embedding space
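
A reward defined in a visual-semantic embedding space can be sketched as a similarity between an image embedding and a sentence embedding. The example below assumes cosine similarity and uses hand-written toy vectors. In practice both embeddings would come from learned encoders, which are not shown here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_reward(image_emb: Sequence[float],
                     sentence_emb: Sequence[float]) -> float:
    """Reward for a finished caption: closeness to the image in the shared space."""
    return cosine_similarity(image_emb, sentence_emb)


image_emb = [1.0, 0.0, 1.0]      # toy image embedding
matching_cap = [2.0, 0.0, 2.0]   # toy caption pointing the same way
unrelated_cap = [0.0, 3.0, 0.0]  # toy caption orthogonal to the image

r_match = embedding_reward(image_emb, matching_cap)
r_unrelated = embedding_reward(image_emb, unrelated_cap)
```

Because the reward depends only on the two embeddings, it does not have to be redefined for each evaluation metric.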

Our approach - inference with our agent

Lookahead inference

global guidance

local guidance
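
Lookahead inference combines the two kinds of guidance when scoring a candidate next word. The sketch below assumes a linear mix, with a weight `lam` between the log-probability from the policy (local) and the value of the resulting state (global). `toy_policy`, `toy_value`, and `lam` are hypothetical stand-ins, not the trained networks.

```python
import math
from typing import Callable, Dict, Tuple

Words = Tuple[str, ...]


def lookahead_scores(words: Words,
                     policy: Callable[[Words], Dict[str, float]],
                     value: Callable[[Words], float],
                     lam: float = 0.5) -> Dict[str, float]:
    """Score each candidate next word with local and global guidance."""
    scores = {}
    for word, prob in policy(words).items():
        local = math.log(prob)             # policy network: local guidance
        global_ = value(words + (word,))   # value network: global guidance
        scores[word] = lam * local + (1.0 - lam) * global_
    return scores


def toy_policy(words: Words) -> Dict[str, float]:
    """Made-up next-word distribution that slightly prefers 'dog'."""
    return {"dog": 0.6, "cat": 0.4}


def toy_value(words: Words) -> float:
    """Made-up value estimate that expects a better caption after 'cat'."""
    return 0.9 if words[-1] == "cat" else 0.1


scores = lookahead_scores(("a",), toy_policy, toy_value, lam=0.4)
best = max(scores, key=scores.get)
policy_only = lookahead_scores(("a",), toy_policy, toy_value, lam=1.0)
best_policy_only = max(policy_only, key=policy_only.get)
```

In this toy setting the policy alone would pick "dog", while the mixed score picks "cat" because the value estimate anticipates a better final caption. In a beam search, the same score would be accumulated per beam instead of being maximized greedily.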

Experimental Results

Results on MS-COCO

① simple policy net; simple net structure

② simple policy net; advanced net structure (semantic attention, spatial attention, object detection)

③ embedding-driven RL; metric-driven RL

④ w/o external training data; with external training data

Qualitative results

Take Home

● We proposed a novel decision-making framework for image captioning.

○ An agent model → a policy network + a value network

○ A training method → Reinforcement Learning with embedding reward

○ An inference method → lookahead inference

● Utilizing both global and local information is important for sequential generation tasks.

● Embedding can capture global information and can serve as very good global guidance.

Thank you!

zhou.ren@snap.com

Welcome to visit our poster at #9-B
