Upload
hoangkhanh
View
213
Download
0
Embed Size (px)
Citation preview
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
20
Datasetsplits
Images Questions Answers
Training 80K 443K 4.4M
Validation 40K 214K 2.1M
Dataset size is approximate23
Datasetsplits
Images Questions Answers
Training 80K 443K 4.4M
Validation 40K 214K 2.1M
Test 80K 447K
Dataset size is approximate24
TestDataset• 4splitsofapproximatelyequalsize
• Test-dev (development)– DebuggingandValidation.
• Test-standard (publications)– UsedtoscoreentriesforthePublicLeaderboard.
• Test-challenge (competitions)– Usedtorankchallengeparticipants.
• Test-reserve (checkoverfitting)– Usedtoestimateoverfitting.Scoresonthissetareneverreleased.
Slideadaptedfrom:MSCOCODetection/SegmentationChallenge,ICCV2015 25
ChallengeRunner-Ups
JointRunner-UpTeam1
SNU-BI
ChallengeAccuracy: 71.69
Jin-HwaKim(SeoulNationalUniversity)Jaehyun Jun(SeoulNationalUniversity)
Byoung-Tak Zhang(SeoulNationalUniversity&Surromind Robotics)
28
ChallengeRunner-Ups
JointRunner-UpTeam2
HDU-UCAS-USYD
ChallengeAccuracy: 71.91
ZhouYu(HangzhouDianzi University,China)JunYu(HangzhouDianzi University,China)
Chenchao Xiang(HangzhouDianzi University,China)
Jianping Fan(HangzhouDianzi University,China)
Dalu Guo (TheUnversity ofSydney, Australia)
Dacheng Tao(TheUniversityofSydney,Australia)
LiangWang(HangzhouDianzi University,China)
QingmingHuang(UniversityofChineseAcademyofSciences)
ChallengeWinner
ChallengeAccuracy: 72.41
YuJiang†(FacebookAIResearch)Vivek Natarajan†(FacebookAIResearch)Xinlei Chen†(FacebookAIResearch)
Dhruv Batra (FacebookAIResearch&GeorgiaTech)MarcusRohrbach (FacebookAIResearch)
30
DeviParikh(FacebookAIResearch&GeorgiaTech)
FAIR-A*
†equalcontribution
Easyvs.DifficultQuestions
0
10
20
30
40
50
60
70
0/10 1/10 2/10 3/10 4/10 5/10 6/10 7/10 8/10 9/10 10/10
Perc
enta
ge o
f que
stio
ns
corre
ctly
ans
wer
ed b
y te
ams
Number of top 10 teams
Easyvs.DifficultQuestions
0
10
20
30
40
50
60
70
0/10 1/10 2/10 3/10 4/10 5/10 6/10 7/10 8/10 9/10 10/10
Perc
enta
ge o
f que
stio
ns
corre
ctly
ans
wer
ed b
y te
ams
Number of top 10 teams
82.5% of questions can be answered by at least 1 method!
DifficultQuestions
Easyvs.DifficultQuestions
0
10
20
30
40
50
60
70
0/10 1/10 2/10 3/10 4/10 5/10 6/10 7/10 8/10 9/10 10/10
Perc
enta
ge o
f que
stio
ns
corre
ctly
ans
wer
ed b
y te
ams
Number of top 10 teams
DifficultQuestions
EasyQuestions
Easyvs.DifficultQuestions
0
10
20
30
40
50
60
70
0/10 1/10 2/10 3/10 4/10 5/10 6/10 7/10 8/10 9/10 10/10
Perc
enta
ge o
f que
stio
ns
corre
ctly
ans
wer
ed b
y te
ams
Number of top 10 teams
2016 2017 2018
DifficultQuestionswithRareAnswersWhatisthenameof…Whatisthenumberon…Whatiswrittenonthe…Whatdoesthesign…Whattimeisit?Whatkindof…Whattypeof…Whyisthe…
Resultson“number”questions
30
35
40
45
50
55
60FA
IR-A
*H
DU
-UC
AS-U
SYD
SNU
-BI
casi
a_iv
aTo
hoku
CV
Lab
MIL
-UT
ut-s
wk
grap
h-at
tent
ion-
msm
DC
D_Z
JUvq
abyt
e fsU
TS_Y
ZZD
Adel
aide
-Ten
eyVQ
A-R
easo
nTen
sor
UPM
C-L
IP6
wyv
ernb
aica
ptio
n_vq
acv
qana
gize
roC
FM-U
ESTC
VQA_
NTU
yudf
2010
nmla
b612
Tsin
ghua
CVL
abC
IST-
VQA
VLC
Sou
tham
pton
Rel
VQA
Uni
vers
ity o
f Gue
lph
MLR
GN
TU_R
OSE
_UST
Czh
i-sm
ileVQ
A-M
achi
ne+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
"num
ber"
accu
racy
AnswerTypeAnalyses
• SNU_BIperformsthebestfor“number”questions
• Noteamstatisticallysignificantlybetterthanthewinnerteamfor“yes/no”and“other”
Aremodelssensitivetosubtlechangesinimages?
womanDifferent answers
Similar images
man
Who is wearing glasses?
Aremodelssensitivetosubtlechangesinimages?
• Arepredictionsdifferentforcomplementaryimages?• Arepredictionsaccurateforcomplementaryimages?
Arepredictionsdifferent forcomplementaryimages?
40
45
50
55
60
65
70
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
MIL
-UT
Toho
ku C
V La
but
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
Arepredictionsaccurate forcomplementaryimages?
40
42
44
46
48
50
52
54
56
58
60
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
MIL
-UT
Toho
ku C
V La
but
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
Arepredictionsaccurateforcomplementaryimages?
40
42
44
46
48
50
52
54
56
58
60
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
MIL
-UT
Toho
ku C
V La
but
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
52.7%2017 winner
+4.8%absolute
Aremodelsdrivenbypriors?
Onlyconsiderthosequestionswhoseanswersarenotpopular(giventhequestiontype)intraining
• 1-Prior:Testanswersarenotthetop-1mostcommonintraining
• 2-Prior:Testanswerarenotthetop-2mostcommonintraining
Agrawal et al., CVPR 2018
Aremodelsdrivenbypriors?
5-6%drop
50
55
60
65
70
75
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
MIL
-UT
Toho
ku C
V La
but
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
All Questions Non-1-Prior Questions
Aremodelsdrivenbypriors?
15-16%drop
40
45
50
55
60
65
70
75
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
MIL
-UT
Toho
ku C
V La
but
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
All Questions Non-2-Prior Questions
Improvementfrom2017challenge
• 1-Prior:Bestperformanceimprovedby3.8%• 2-Prior:Bestperformanceimprovedby3.3%
Aremodelscompositional?
Onlyconsiderthosequestionswhicharecompositionallynovel:
• QApairisnotseenintraining• Constitutingconceptsseenintraining
Agrawal et al., Arxiv 2018
Aremodelscompositional?
12-13%drop
40
45
50
55
60
65
70
75
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
MIL
-UT
Toho
ku C
V La
but
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
All Questions Compositionally Novel Questions
Averageanswerrecall
• NewaccuracymetricproposedinKafle andKannan,ICCV17– Alsoknownas“Normalizedaccuracy”
• Method:– Computesaccuracyforeachuniqueanswer– Takethemeanoveralluniqueanswers
• Rewardsmodelswhichperformwellforrareanswers
Averageanswerrecall
18
20
22
24
26
28
30
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
Toho
ku C
V La
bM
IL-U
Tut
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
Averageanswerrecall
18
20
22
24
26
28
30
FAIR
-A*
HD
U-U
CAS
-USY
DSN
U-B
Ica
sia_
iva
Toho
ku C
V La
bM
IL-U
Tut
-sw
kgr
aph-
atte
ntio
n-m
smD
CD
_ZJU
vqab
yte fs
UTS
_YZZ
DAd
elai
de-T
eney
VQA-
Rea
sonT
enso
rU
PMC
-LIP
6w
yver
nbai
capt
ion_
vqa
cvqa
nagi
zero
CFM
-UES
TCVQ
A_N
TUyu
df20
10nm
lab6
12Ts
ingh
uaC
VLab
CIS
T-VQ
AVL
C S
outh
ampt
onR
elVQ
AU
nive
rsity
of G
uelp
h M
LRG
NTU
_RO
SE_U
STC
zhi-s
mile
VQA-
Mac
hine
+ xie
Vard
aan
HAC
KER
SAE
-VQ
Ada
ndel
ingh
ost
VQA-
Lear
ning
vqa-
such
owH
AIBI
Nw
indL
BLVQ
A_Sa
nvq
atea
m_m
cb_b
ench
mar
kak
shay
_isi
cal
ProgressinVQA
68
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
Accu
racy
on
v2
ProgressinVQA
69
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
2016Challengewinner
Accu
racy
on
v2
ProgressinVQA
70
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
2016Challengewinner
Accu
racy
on
v2
+7.0%absolute
ProgressinVQA
71
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
2016Challengewinner
2017Challengewinner
Challenge2017deadline
Accu
racy
on
v2
ProgressinVQA
72
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
2016Challengewinner
2017Challengewinner
Challenge2017deadline
+6.7%absolute
Accu
racy
on
v2
ProgressinVQA
73
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
2016Challengewinner
2017Challengewinner
Challenge2018deadline
2018Challengewinner
Accu
racy
on
v2
ProgressinVQA
74
50
55
60
65
70
75
12/7/15 3/16/16 6/24/16 10/2/16 1/10/17 4/20/17 7/29/17 11/6/17 2/14/18 5/25/18
ICCV15
2016Challengewinner
2017Challengewinner
+3.4%absolute2018Challenge
winner
Accu
racy
on
v2
VisualDialogChallenge2018
75
• Deadline:mid-August,2018• Results:September8th,2018 atECCV2018
visualdialog.org/challenge/2018
• ~130k images (COCO)• 10-round dialog / image• ~1.3 million QA pairs• Evaluation
• Automatic metrics• Human annotations
VisDial v1.0