No Metrics Are Perfect: Adversarial REwardLearning for ...xwang/slides/AREL.pdf · for Visual...

NoMetricsArePerfect:AdversarialREward Learning

forVisualStorytellingXin(Eric)Wang*,Wenhu Chen*,Yuan-FangWang,WilliamWang

Caption:Twoyoungkidswithbackpackssittingontheporch.

Image Captioning

VisualStorytelling

Story:Thebrotherdidnotwanttotalktohissister.Thesiblingsmadeup.Theystartedtotalkandsmile.Theirparentsshowedup.Theywerehappytoseethem.

Story:Thebrother didnotwanttotalktohissister.Thesiblings madeup.Theystartedtotalkandsmile.Theirparents showedup.Theywerehappy toseethem.

Imagination

Story:Thebrother didnotwanttotalktohissister.Thesiblings madeup.Theystartedtotalkandsmile.Theirparents showedup.Theywerehappytoseethem.

Emotion Subjectiveness

Story#2:Thebrotherandsisterwerereadyforthefirstdayofschool.Theywereexcitedtogototheirfirstdayandmeetnewfriends.Theytoldtheirmomhowhappytheywere.Theysaidtheyweregoingtomakealotofnewfriends.Thentheygotupandgotreadytogetinthecar.

VisualStorytelling

Behavioralcloningmethods(e.g.MLE)arenotgoodenoughforvisualstorytelling

ReinforcementLearning

o Directlyoptimizetheexistingmetrics• BLEU,METEOR,ROUGE,CIDEr• Reduceexposurebias

ReinforcementLearning(RL)

Environment

RewardFunction

OptimalPolicy

Rennie 2017, “Self-critical Sequence Training for Image Captioning”

We had a great time to have a lot of the. They were to be a of the. They were to be in the. The and it were to be the. The, and it were to be the.

AverageMETEORscore:40.2(SOTAmodel:35.0)

I had a great time at the restaurant today. The food was delicious. I had a lot of food. I had a great time.

BLEU-4score:0

NoMetricsArePerfect!

Inverse Reinforcement Learning

InverseReinforcementLearning(IRL)

Environment

RewardFunction

OptimalPolicy

ReinforcementLearning(RL)

Environment

RewardFunction

OptimalPolicy

Reinforcement Learning

AdversarialREward Learning(AREL)

Environment

AdversarialObjective PolicyModelRewardModel

Reward

GeneratedStory

ReferenceStory

InverseRL

PolicyModel𝜋"

Mybrotherrecentlygraduatedcollege.

Itwasaformalcapandgownevent.

Mymomanddadattended.

Later,myauntandgrandmashowedup.

Whentheeventwasoverheevengotcongratulatedbythemascot.

Encoder Decoder

RewardModel𝑅$

Story Convolution FClayerPooling

mymomanddad

attended.

Reward

Kim 2014, “Convolutional Neural Networks for Sentence Classification”

AssociatingReward withStory

Approximatedatadistribution Partitionfunction

Optimalrewardfunction𝑅$∗(𝑊) isachievedwhen

LeCun et al. 2006, “A tutorial on energy-based learning”

Energy-basedmodels associateanenergyvalue𝐸$(𝑥) withasample𝑥,modelingthedataasaBoltzmanndistribution

𝑝$ 𝑥 =exp(−𝐸$(𝑥))

RewardFunctionRewardBoltzmannDistribution

ARELObjective

• TheobjectiveofRewardModel𝑅$:

Empiricaldistribution Policydistribution

RewardBoltzmanndistribution

• TheobjectiveofPolicyModel𝜋":

Therefore,wedefineanadversarialobjectivewithKL-divergence

RewardVisualization

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

AutomaticEvaluation

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1GAN 62.8 38.8 23.0 14.0 35.0 29.5 9.0AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5

Huang et al. 2016, “Visual Storytelling” Yu et al. 2017, “Hierarchically-Attentive RNN for Album Summarization and Storytelling”

HumanEvaluation

0% 10% 20% 30% 40% 50%

XE BLEU-RL CIDEr-RL GAN AREL

TuringTest

Win Unsure

-17.5 -13.7-26.1

HumanEvaluation

Relevance:thestoryaccuratelydescribeswhatishappeninginthephotostreamandcoversthemainobjects.Expressiveness:coherence,grammaticallyandsemanticallycorrect,norepetition,expressivelanguagestyle.Concreteness:thestoryshouldnarrateconcretelywhatisintheimagesratherthangivingverygeneraldescriptions.

PairwiseComparison

XE-ss Wetookatriptothemountains.

Thereweremanydifferentkindsofdifferentkinds.

Wehadagreattime.

Hewasagreattime.

Itwasabeautifulday.XE-ss Wetookatripto

themountains.

Wehadagreattime.

Hewasagreattime.

Itwasabeautifulday.

ARELThefamilydecidedtotakeatriptothecountryside.

Thereweresomanydifferentkindsofthingstosee.

Thefamilydecidedtogoonahike. Ihadagreattime.

Attheendoftheday,wewereabletotakeapictureofthebeautifulscenery.

Human-createdStory

Wewentonahikeyesterday.

Therewerealotofstrangeplantsthere.

Ihadagreattime.Wedrankalotofwaterwhilewewerehiking.

Theviewwasspectacular.

XE-ss Wetookatriptothemountains.

Wehadagreattime.

Hewasagreattime.

Itwasabeautifulday.

ARELThefamilydecidedtotakeatriptothecountryside.

Thereweresomanydifferentkindsofthingstosee.

Thefamilydecidedtogoonahike. Ihadagreattime.

Attheendoftheday,wewereabletotakeapictureofthebeautifulscenery.

Takeaway

o Generatingandevaluatingstoriesarebothchallengingduetothecomplicatednatureofstories

o Noexistingmetricsareperfectforeithertrainingortestingo ARELisabetterlearningframeworkforvisualstorytelling

§ Canbeappliedtoothergenerationtaskso Ourapproachismodel-agnostic

§ Advancedmodelsà betterperformance

Thanks!

Paper:https://arxiv.org/abs/1804.09160Code: https://github.com/littlekobe/AREL

No Metrics Are Perfect: Adversarial REwardLearning for ...xwang/slides/AREL.pdf · for Visual...

Documents

project managment - wang jing yuan (Eric Wang)

EN04 Object-Oriented Modeling with PowerDesigner 9.5 Xiao Wang PowerDesigner Chief Architect, EBD xwang@sybase.com

Literature Review - faculty.uml.edufaculty.uml.edu/xwang/16.541/2013/literature review.pdfthe literature review, if the body of the literature review is not already a chronology

Interferometric optical fiber biosensor.pptfaculty.uml.edu/xwang/16.541/2013/Interferometric optical fiber biosensor.pdf · • blocking of active binding epitopes, • steric hindrance,

Zach Wang, Eric Wang, Tom Siitari, & Nick Navarro

Wang TINS 2001 Wang et al PNAS 2004 Wang Neuron 2002 400ms

Biosensors in Clinical Chemistry - Faculty Server …faculty.uml.edu/xwang/16.541/2010/invited talk Eugene.pdfMEASUREMENT OF BLOOD GLUCOSE Direct measurement of glucose is difficult

Jih-Wang Wang 2006/4/26

Bai-Ling Wang & Hang Wang

Curriculum Vitae of Mingfu Wang NAME: PRESENT TITLE: … of Mingfu Wang... · 2018-11-06 · 1 Wang M Curriculum Vitae of Mingfu Wang NAME: Mingfu Wang PRESENT TITLE: Associate Professor

1 Measurement and Analysis of LDAP Performance Xin Wang( xwang@ctr.columbia.edu ), Henning Schulzrinne, Dilip Kandlur, Dinesh Verma

Disposable Miniature Pressure Sensor for Cardiologist …faculty.uml.edu/xwang/16.541/2013/Disposable Miniature Pressure... · Disposable Miniature Pressure Sensor for Cardiologist

Xianlong Wang 1,2,*, Chengfei Wang 1 - Semantic Scholar

Wang Wang Radio

-- Speech Activities in CST Thomas Fang Zhengcslt.riit.tsinghua.edu.cn/~fzheng/TALKS/20011030_TALK_AT...2001/10/30 · Wenhu WU, “Language understanding component for Chinese dialogue

Hierarchical cluster analysis of a ... - School of Meteorologyweather.ou.edu › ~xwang › part1_20110507.pdf · Hierarchical cluster analysis of a convection-allowing ensemble during

Wang BASIC Language Reference Manual - Wang 2200

SUNING WANG Curriculum Vitae (2019)faculty.chem.queensu.ca/people/faculty/wang/Old Site/WANG...CV - Professor Suning Wang Page 1 of 34 SUNING WANG Curriculum Vitae (2019) Contact Information

Xingwei Wang - uml.edufaculty.uml.edu/xwang/16.311/MATERIALS/Notebook.pdfLaboratory Notebook Guidelines INTRODUCTION Using a Laboratory Notebook to record ideas, inventions, experimentation

Integrated Coverage and Connectivity Configuration in ...xwang/papers/sensys03_ccp.pdf · provide sufficient coverage for distributed detection. Provided scalability and fault-tolerance,