Learnable pooling with Context Gating for Video Classification · Learnable pooling with Context...

Preview:

Citation preview

Learnable pooling with Context Gating for Video Classification

Antoine Miech, Ivan Laptev, Josef Sivic

1

Goal: Multi-modal features pooling2

POOLING MODULE

CLASSIFICATION

Recurrent model (e.g LSTM)3

LSTM

LSTM

Recurrent model (e.g LSTM)4

LSTM

LSTM

LSTM

LSTM

Recurrent model (e.g LSTM)5

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

Recurrent model (e.g LSTM)6

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

Recurrent model (e.g LSTM)7

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

LSTM

Problem ?

▹ Slow for inference/training▹ This is NOT a sequential problem▹ Needs lots of data for training▹ How about very long videos ?

But surprisingly good results !

8

Our approach9

Traditional pooling

▹ Bag-of-visual-words [Sivic and Zisserman, 2003][Csurka et al., 2004]

▹ VLAD [Jégou et al., 2010]

▹ Fisher Vector [Perronnin et al., 2007]

10

Traditional Pipeline11

Unsupervised learning of a dictionary

Learning a supervised classifier

Encoding And unsupervised

dimension reduction

End-to-end Pipeline

Supervised learning of dictionary+

Supervised dimension reduction+

Learning a supervised classifier

VS

Learnable pooling 12

Unsupervised End-to-EndBag-of-visual-Words [Sivic and Zisserman, 2003] Soft-DBoW

VLAD [Jégou et al., 2010] NetVLAD [Arandjelović et al. CVPR 2016]

Fisher Vector [Perronnin et al., 2007] NetFV

Model overview13

Context Gating14

Context Gating15

Context Gating16

Equation:

Model overview17

Results18

Generalization19

20

BonusWinning the kaggle competition

21

Effects of ensembling22

LOUPE Tensorflow toolbox23

GITHUB LOUPE repo:github.com/antoine77340/LOUPE

GITHUB Kaggle code repo:github.com/antoine77340/Youtube-8M-WILLOW

Questions ?

24

Recommended