Aishwarya Agrawal (Georgia Tech) Yash Goyal (Georgia Tech)

Slides: visualqa.org/static/slides/2018_workshop_yash_slides.pdf

Outline

Overview of Task and Dataset

Overview of Challenge

Winner Announcements

Analysis of Results

VQA Task

Question: What is the mustache made of?

AI System

Answer: bananas

VQA v1.0 Dataset

Example questions:

• About objects
• Fine-grained recognition
• Counting
• Common sense

VQA v2.0 Dataset

Who is wearing glasses?

• Similar images (one from VQA v1.0, one new in VQA v2.0)
• Different answers: man, woman

VQA v2.0 Dataset Stats

• >200K images
• >1.1M questions
• >11M answers

1.8x VQA v1.0

Accuracy Metric
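The VQA accuracy metric is commonly stated as min(number of humans who gave that answer / 3, 1), with 10 human answers collected per question. A minimal sketch of that simplified form is below. The function name `vqa_accuracy` is hypothetical, and the lowercase/strip normalization is an assumption; the official evaluation script applies more answer normalization and averaging over annotator subsets, which is omitted here.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy: min(#humans who gave the answer / 3, 1).

    Assumption: answers are compared after lowercasing and stripping
    whitespace only; this is a stand-in for fuller normalization.
    """
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    humans = ["bananas"] * 2 + ["fruit"] * 8
    print(vqa_accuracy("bananas", humans))
    print(vqa_accuracy("fruit", humans))
```

Under this form, an answer given by at least 3 of the human annotators receives full credit, and answers given by fewer receive partial credit.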

VQA Challenge on https://evalai.cloudcv.org/

Dataset splits

             Images   Questions   Answers
Training     80K      443K        4.4M
Validation   40K      214K        2.1M
Test         80K      447K

Dataset size is approximate

Test Dataset

• 4 splits of approximately equal size
• Test-dev (development)
  – Debugging and validation.
• Test-standard (publications)
  – Used to score entries for the Public Leaderboard.
• Test-challenge (competitions)
  – Used to rank challenge participants.
• Test-reserve (check overfitting)
  – Used to estimate overfitting. Scores on this set are never released.

Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015

Challenge Stats

• 40 teams
• >=40 institutions*
• >=8 countries*

*Statistics based on teams that have replied

Challenge Runner-Ups

Joint Runner-Up Team 1: SNU-BI

Challenge Accuracy: 71.69

Jin-Hwa Kim (Seoul National University)
Jaehyun Jun (Seoul National University)
Byoung-Tak Zhang (Seoul National University & Surromind Robotics)

Challenge Runner-Ups

Joint Runner-Up Team 2: HDU-UCAS-USYD

Challenge Accuracy: 71.91

Zhou Yu (Hangzhou Dianzi University, China)
Jun Yu (Hangzhou Dianzi University, China)
Chenchao Xiang (Hangzhou Dianzi University, China)
Jianping Fan (Hangzhou Dianzi University, China)
Dalu Guo (The University of Sydney, Australia)
Dacheng Tao (The University of Sydney, Australia)
Liang Wang (Hangzhou Dianzi University, China)
Qingming Huang (University of Chinese Academy of Sciences)

Challenge Winner

FAIR-A*

Challenge Accuracy: 72.41

Yu Jiang† (Facebook AI Research)
Vivek Natarajan† (Facebook AI Research)
Xinlei Chen† (Facebook AI Research)
Dhruv Batra (Facebook AI Research & Georgia Tech)
Marcus Rohrbach (Facebook AI Research)
Devi Parikh (Facebook AI Research & Georgia Tech)

† equal contribution

Challenge Results

[Figure: bar chart of overall accuracy (%) per team, shown at two zoom levels (60 to 74 and 67 to 73).]

+3.4% absolute

Statistical Significance

• Bootstrap samples 5000 times
• @ 95% confidence

[Figure: overall accuracy (%) per team, axis range 67 to 73.]
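The slide gives only the settings (5000 bootstrap samples, 95% confidence). One way such a comparison might be run is a paired bootstrap over per-question scores: resample questions with replacement, recompute the accuracy difference between two teams, and check whether the percentile interval excludes zero. The sketch below assumes that procedure; the function names and the choice of a paired, percentile-based interval are assumptions, not the organizers' confirmed method.

```python
import random


def bootstrap_diff_ci(scores_a, scores_b, n_boot=5000, conf=0.95, seed=0):
    """Percentile confidence interval for mean(scores_a) - mean(scores_b).

    scores_a[i] and scores_b[i] are the two teams' scores on question i.
    Questions are resampled with replacement (paired bootstrap).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    alpha = (1.0 - conf) / 2.0
    lo = diffs[int(alpha * n_boot)]
    hi = diffs[min(int((1.0 - alpha) * n_boot), n_boot - 1)]
    return lo, hi


def significantly_different(scores_a, scores_b, **kwargs):
    """True if the bootstrap interval for the difference excludes zero."""
    lo, hi = bootstrap_diff_ci(scores_a, scores_b, **kwargs)
    return lo > 0 or hi < 0
```

With per-question VQA accuracies for two submissions, `significantly_different(a, b)` would report whether the gap between them holds up under resampling at the chosen confidence level.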

Easy vs. Difficult Questions

[Figure: histogram. x-axis: number of top 10 teams (0/10 to 10/10); y-axis: percentage of questions correctly answered by teams (0 to 70). Regions are labeled "Difficult Questions" and "Easy Questions"; the final build compares 2016, 2017, and 2018.]

82.5% of questions can be answered by at least 1 method!
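The easy-versus-difficult breakdown can be sketched as a count, for each question, of how many of the top teams answered it correctly. The code below is illustrative only: the function names are hypothetical, and treating each answer as simply correct or incorrect is an assumption, since the slides do not say how fractional VQA accuracies were thresholded.

```python
def teams_correct_histogram(correct):
    """Percentage of questions answered correctly by exactly k teams.

    correct[t][q] is True if team t answered question q correctly.
    Returns a list of length n_teams + 1, indexed by k.
    """
    n_teams = len(correct)
    n_questions = len(correct[0])
    counts = [0] * (n_teams + 1)
    for q in range(n_questions):
        k = sum(1 for t in range(n_teams) if correct[t][q])
        counts[k] += 1
    return [100.0 * c / n_questions for c in counts]


def percent_answered_by_at_least_one(histogram):
    """Share of questions that at least one team answered correctly."""
    return 100.0 - histogram[0]
```

In this framing, the k = 0 bin holds the most difficult questions (no team correct) and the bin with k equal to the number of teams holds the easiest ones.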

Difficult Questions with Rare Answers

• What is the name of…
• What is the number on…
• What is written on the…
• What does the sign…
• What time is it?
• What kind of…
• What type of…
• Why is the…

Easy vs. Difficult Questions

• Difficult Questions with Frequent Answers
• Easy Questions

Answer Type Analyses

• SNU_BI performs the best for “number” questions
• No team statistically significantly better than the winner team for “yes/no” and “other”

[Figure: results on “number” questions. Bar chart of "number" accuracy (%) per team, axis range 30 to 60.]

Are models sensitive to subtle changes in images?

Who is wearing glasses? Similar images, different answers: man, woman

• Are predictions different for complementary images?
• Are predictions accurate for complementary images?

Are predictions different for complementary images?

[Figure: bar chart per team, axis range 40 to 70.]

Are predictions accurate for complementary images?

[Figure: bar chart per team, axis range 40 to 60.]

52.7% 2017 winner

+4.8% absolute
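The two questions asked about complementary images could be scored roughly as follows. This is a sketch under stated assumptions: a "pair" is the same question asked on two similar images with different ground-truth answers, "different" means the two predictions differ, and "accurate" means both predictions match their ground truth. The slides do not spell out the exact scoring, and the function name is hypothetical.

```python
def complementary_pair_stats(pairs):
    """Return (% of pairs with different predictions, % with both correct).

    pairs is a list of ((pred_1, gt_1), (pred_2, gt_2)) tuples, one per
    complementary image pair sharing a question.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    n = len(pairs)
    different = sum(1 for (p1, _), (p2, _) in pairs if p1 != p2)
    both_correct = sum(
        1 for (p1, g1), (p2, g2) in pairs if p1 == g1 and p2 == g2
    )
    return 100.0 * different / n, 100.0 * both_correct / n
```

A model that ignores the image would give the same prediction for both images in a pair, so it would score low on the first number and could get at most one image per pair right.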

Are models driven by priors?

Only consider those questions whose answers are not popular (given the question type) in training

• 1-Prior: Test answers are not the top-1 most common in training
• 2-Prior: Test answers are not the top-2 most common in training

Agrawal et al., CVPR 2018
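The k-Prior subsets described above can be sketched as a filter: for each question type, find the k most common training answers, then keep only the test questions whose answer is not among them. The data layout (lists of `(question_type, answer)` tuples) and the function names are assumptions made for illustration; ties in answer frequency are broken by `Counter.most_common` ordering, which may differ from the original analysis.

```python
from collections import Counter, defaultdict


def top_k_answers_by_type(train, k):
    """Map each question type to its k most common training answers."""
    counts = defaultdict(Counter)
    for qtype, answer in train:
        counts[qtype][answer] += 1
    return {
        qtype: {a for a, _ in c.most_common(k)} for qtype, c in counts.items()
    }


def non_k_prior_subset(test, train, k):
    """Keep test items whose answer is outside the top-k training answers
    for their question type."""
    top = top_k_answers_by_type(train, k)
    return [
        (qtype, answer)
        for qtype, answer in test
        if answer not in top.get(qtype, set())
    ]
```

Evaluating a model only on `non_k_prior_subset(test, train, 1)` or `non_k_prior_subset(test, train, 2)` removes the questions that could be answered by always guessing the most frequent training answers for that question type.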

[Figure: accuracy (%) per team, All Questions vs. Non-1-Prior Questions.]

5-6% drop

[Figure: accuracy (%) per team, All Questions vs. Non-2-Prior Questions.]

15-16% drop

Improvement from 2017 challenge

• 1-Prior: Best performance improved by 3.8%
• 2-Prior: Best performance improved by 3.3%

Are models compositional?

Only consider those questions which are compositionally novel:

• QA pair is not seen in training
• Constituting concepts seen in training

Agrawal et al., arXiv 2018
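The two conditions for a compositionally novel question can be illustrated with a crude filter. This sketch makes strong simplifying assumptions that are not from the slides: a "concept" is taken to be any lowercase word in the question or answer, and a "QA pair" is matched by exact question and answer strings. The cited work may define concepts and pairs differently, and the function names are hypothetical.

```python
def words(text):
    """Lowercase word set; a stand-in for real concept extraction."""
    return set(text.lower().replace("?", " ").split())


def compositionally_novel(test, train):
    """Keep test (question, answer) pairs that are unseen as a pair in
    training but whose words all appear somewhere in training."""
    seen_pairs = {(q.lower(), a.lower()) for q, a in train}
    seen_concepts = set()
    for q, a in train:
        seen_concepts |= words(q) | words(a)
    return [
        (q, a)
        for q, a in test
        if (q.lower(), a.lower()) not in seen_pairs
        and (words(q) | words(a)) <= seen_concepts
    ]
```

For example, if training contains a white plate and a green apple, a test question whose answer is a green plate is a new combination of concepts that were each seen before.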

[Figure: accuracy (%) per team, All Questions vs. Compositionally Novel Questions.]

12-13% drop

[Figure: accuracy (%) on compositionally novel questions, axis range 53 to 61.]

56.5% 2017 winner

+3.4% absolute

Average answer recall

• New accuracy metric proposed in Kafle and Kanan, ICCV17
  – Also known as “Normalized accuracy”
• Method:
  – Compute accuracy for each unique answer
  – Take the mean over all unique answers
• Rewards models which perform well for rare answers
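The method above can be sketched in a few lines: group questions by ground-truth answer, compute the accuracy within each group, and average across groups. The sketch assumes a single ground-truth answer per question and exact string matching, which simplifies the multi-annotator VQA setting; the function name is hypothetical.

```python
from collections import defaultdict


def average_answer_recall(predictions, ground_truths):
    """Mean of per-answer accuracies over all unique ground-truth answers."""
    if len(predictions) != len(ground_truths) or not ground_truths:
        raise ValueError("need two non-empty lists of equal length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, gt in zip(predictions, ground_truths):
        totals[gt] += 1
        if pred == gt:
            hits[gt] += 1
    return sum(hits[a] / totals[a] for a in totals) / len(totals)


if __name__ == "__main__":
    gts = ["yes", "yes", "yes", "yes", "2"]
    preds = ["yes", "yes", "yes", "yes", "3"]
    # Plain accuracy is 4/5, but the rare answer "2" is always missed.
    print(average_answer_recall(preds, gts))
```

Because every unique answer contributes equally to the mean, a model that only gets frequent answers right is penalized more heavily than under plain accuracy.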

[Figure: average answer recall (%) per team, axis range 18 to 30.]

Progress in VQA

[Figure: accuracy on v2 (%) over time, 12/7/15 to 5/25/18, with markers for ICCV15, the 2016, 2017, and 2018 Challenge winners, and the Challenge 2017 and Challenge 2018 deadlines.]

• ICCV15 to 2016 Challenge winner: +7.0% absolute
• 2016 Challenge winner to 2017 Challenge winner: +6.7% absolute
• 2017 Challenge winner to 2018 Challenge winner: +3.4% absolute

Visual Dialog Challenge 2018

• Deadline: mid-August, 2018
• Results: September 8th, 2018 at ECCV 2018

visualdialog.org/challenge/2018

VisDial v1.0

• ~130k images (COCO)
• 10-round dialog / image
• ~1.3 million QA pairs
• Evaluation
  – Automatic metrics
  – Human annotations

Thanks!

Questions?