No Metrics Are Perfect: Adversarial REwardLearning for ...xwang/slides/AREL.pdf · for Visual...

Preview:

Citation preview

NoMetricsArePerfect:AdversarialREward Learning

forVisualStorytellingXin(Eric)Wang*,Wenhu Chen*,Yuan-FangWang,WilliamWang

1

2

Caption:Twoyoungkidswithbackpackssittingontheporch.

Image Captioning

VisualStorytelling

3

Story:Thebrotherdidnotwanttotalktohissister.Thesiblingsmadeup.Theystartedtotalkandsmile.Theirparentsshowedup.Theywerehappytoseethem.

Story:Thebrother didnotwanttotalktohissister.Thesiblings madeup.Theystartedtotalkandsmile.Theirparents showedup.Theywerehappy toseethem.

Imagination

Story:Thebrother didnotwanttotalktohissister.Thesiblings madeup.Theystartedtotalkandsmile.Theirparents showedup.Theywerehappytoseethem.

Emotion Subjectiveness

4

Story#2:Thebrotherandsisterwerereadyforthefirstdayofschool.Theywereexcitedtogototheirfirstdayandmeetnewfriends.Theytoldtheirmomhowhappytheywere.Theysaidtheyweregoingtomakealotofnewfriends.Thentheygotupandgotreadytogetinthecar.

VisualStorytelling

5

Behavioralcloningmethods(e.g.MLE)arenotgoodenoughforvisualstorytelling

ReinforcementLearning

6

o Directlyoptimizetheexistingmetrics• BLEU,METEOR,ROUGE,CIDEr• Reduceexposurebias

ReinforcementLearning(RL)

Environment

RewardFunction

OptimalPolicy

Rennie 2017, “Self-critical Sequence Training for Image Captioning”

7

We had a great time to have a lot of the. They were to be a of the. They were to be in the. The and it were to be the. The, and it were to be the.

AverageMETEORscore:40.2(SOTAmodel:35.0)

8

I had a great time at the restaurant today. The food was delicious. I had a lot of food. I had a great time.

BLEU-4score:0

9

NoMetricsArePerfect!

Inverse Reinforcement Learning

10

InverseReinforcementLearning(IRL)

Environment

RewardFunction

OptimalPolicy

ReinforcementLearning(RL)

Environment

RewardFunction

OptimalPolicy

Reinforcement Learning

AdversarialREward Learning(AREL)

11

Environment

AdversarialObjective PolicyModelRewardModel

Reward

GeneratedStory

ReferenceStory

RL

InverseRL

PolicyModel𝜋"

12

CNN

Mybrotherrecentlygraduatedcollege.

Itwasaformalcapandgownevent.

Mymomanddadattended.

Later,myauntandgrandmashowedup.

Whentheeventwasoverheevengotcongratulatedbythemascot.

Encoder Decoder

RewardModel𝑅$

13

Story Convolution FClayerPooling

CNN

mymomanddad

attended.

<EOS>

Reward

+

Kim 2014, “Convolutional Neural Networks for Sentence Classification”

AssociatingReward withStory

Approximatedatadistribution Partitionfunction

Story

Optimalrewardfunction𝑅$∗(𝑊) isachievedwhen

LeCun et al. 2006, “A tutorial on energy-based learning”

Energy-basedmodels associateanenergyvalue𝐸$(𝑥) withasample𝑥,modelingthedataasaBoltzmanndistribution

𝑝$ 𝑥 =exp(−𝐸$(𝑥))

𝑍

RewardFunctionRewardBoltzmannDistribution

ARELObjective

15

• TheobjectiveofRewardModel𝑅$:

Empiricaldistribution Policydistribution

RewardBoltzmanndistribution

• TheobjectiveofPolicyModel𝜋":

Therefore,wedefineanadversarialobjectivewithKL-divergence

RewardVisualization

16

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

AutomaticEvaluation

17

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1GAN 62.8 38.8 23.0 14.0 35.0 29.5 9.0AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Huang et al. 2016, “Visual Storytelling” Yu et al. 2017, “Hierarchically-Attentive RNN for Album Summarization and Storytelling”

HumanEvaluation

18

0% 10% 20% 30% 40% 50%

XE BLEU-RL CIDEr-RL GAN AREL

TuringTest

Win Unsure

-17.5 -13.7-26.1

-6.3

HumanEvaluation

19

Relevance:thestoryaccuratelydescribeswhatishappeninginthephotostreamandcoversthemainobjects.Expressiveness:coherence,grammaticallyandsemanticallycorrect,norepetition,expressivelanguagestyle.Concreteness:thestoryshouldnarrateconcretelywhatisintheimagesratherthangivingverygeneraldescriptions.

PairwiseComparison

20

XE-ss Wetookatriptothemountains.

Thereweremanydifferentkindsofdifferentkinds.

Wehadagreattime.

Hewasagreattime.

Itwasabeautifulday.XE-ss Wetookatripto

themountains.

Thereweremanydifferentkindsofdifferentkinds.

Wehadagreattime.

Hewasagreattime.

Itwasabeautifulday.

ARELThefamilydecidedtotakeatriptothecountryside.

Thereweresomanydifferentkindsofthingstosee.

Thefamilydecidedtogoonahike. Ihadagreattime.

Attheendoftheday,wewereabletotakeapictureofthebeautifulscenery.

Human-createdStory

Wewentonahikeyesterday.

Therewerealotofstrangeplantsthere.

Ihadagreattime.Wedrankalotofwaterwhilewewerehiking.

Theviewwasspectacular.

XE-ss Wetookatriptothemountains.

Thereweremanydifferentkindsofdifferentkinds.

Wehadagreattime.

Hewasagreattime.

Itwasabeautifulday.

ARELThefamilydecidedtotakeatriptothecountryside.

Thereweresomanydifferentkindsofthingstosee.

Thefamilydecidedtogoonahike. Ihadagreattime.

Attheendoftheday,wewereabletotakeapictureofthebeautifulscenery.

Takeaway

21

o Generatingandevaluatingstoriesarebothchallengingduetothecomplicatednatureofstories

o Noexistingmetricsareperfectforeithertrainingortestingo ARELisabetterlearningframeworkforvisualstorytelling

§ Canbeappliedtoothergenerationtaskso Ourapproachismodel-agnostic

§ Advancedmodelsà betterperformance

Thanks!

22

Paper:https://arxiv.org/abs/1804.09160Code: https://github.com/littlekobe/AREL

Recommended