What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

De-An Huang¹, Vignesh Ramanathan², Dhruv Mahajan², Lorenzo Torresani², Manohar Paluri², Li Fei-Fei¹, Juan Carlos Niebles¹

¹Stanford University, ²Facebook

Motivation

- Videos contain much more than just the images
- Still missing an explicit analysis of temporal information

- Analyze the video model trained on a dataset (fixed weights)
- Propose three frameworks to ablate temporal info from the test video
- A single frame is just an image and contains no temporal information
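The simplest ablation named on this poster, taking the middle frame of a test video and replicating it, can be sketched as below. This is a minimal illustration rather than the authors' code: the function name `replicate_middle_frame` and the representation of a video as a Python list of frames are assumptions made for the example.

```python
def replicate_middle_frame(video):
    """Build a 'static video' by repeating the middle frame.

    `video` is assumed to be a non-empty list of frames (any objects).
    The result has the same length as the input but carries no
    temporal information, since every frame is identical.
    """
    if not video:
        raise ValueError("video must contain at least one frame")
    middle = video[len(video) // 2]
    return [middle for _ in video]


# Hypothetical five-frame clip; strings stand in for image arrays.
clip = ["f0", "f1", "f2", "f3", "f4"]
static_clip = replicate_middle_frame(clip)
print(static_clip)
```

The replicated clip can then be passed to the fixed-weight video model in place of the original clip, so that any change in the prediction is attributable to the missing motion.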

[Figure: (a) Original Video; (b) Video matching C3D deep features of (a).]

Approach Overview

[Figure: bar chart with an axis from 0 to 90 and bars labeled Original Video and No Temporal; annotations: Generator, Selector, 6%.]

[Figure: approach pipeline. Test Video → Subsampling → Frame Selector → Selected Frame → Temporal Generator → Generated Video → C3D (Conv1 to Conv5) trained on UCF101.]

Class-Agnostic Temporal Generator

- Temporal distribution shift: the model has not seen "static videos" in training
- Generate a video from the frame to bridge the distribution shift, but without using any "real" temporal information
- Learning the Temporal Generator: the video generated from the image should be perceptually similar to the original video for the model
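One way to read "perceptually similar to the original video for the model" is as a feature-matching loss: run both videos through the fixed video model and penalize the distance between their intermediate features. The sketch below illustrates that idea only. The mean squared distance, the uniform layer weights, and the helper `toy_features` (a stand-in for a frozen network such as C3D) are all assumptions; the poster does not specify the exact loss.

```python
def layer_distance(feat_a, feat_b):
    """Mean squared difference between two equally sized feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)) / len(feat_a)


def perceptual_loss(extract_features, generated, original, layer_weights=None):
    """Weighted sum of per-layer feature distances under a fixed model.

    `extract_features(video)` is assumed to return one feature vector
    per layer of the frozen video model.
    """
    feats_gen = extract_features(generated)
    feats_orig = extract_features(original)
    if layer_weights is None:
        layer_weights = [1.0] * len(feats_gen)  # assumption: equal weights
    return sum(
        w * layer_distance(g, o)
        for w, g, o in zip(layer_weights, feats_gen, feats_orig)
    )


def toy_features(video):
    """Hypothetical two-layer 'model': frame means, then their differences."""
    means = [sum(frame) / len(frame) for frame in video]
    diffs = [b - a for a, b in zip(means, means[1:])]
    return [means, diffs]


original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
static = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # replicated middle frame

loss_same = perceptual_loss(toy_features, original, original)
loss_static = perceptual_loss(toy_features, static, original)
print(loss_same, loss_static)
```

In this toy setting the static clip has a non-zero loss against the original, which is the gap a learned generator would be trained to reduce while seeing only the single frame.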

- Key frame for us to recognize the action without temporal information
- $q(x)$: estimate of frame quality

[Figure: naive baseline. Test Video → Middle Frame → Replicate Frames → Replicated Frames → C3D (Conv1 to Conv5) trained on UCF101.]

[Figure: temporal generator. Test Video → Middle Frame → Temporal Generator → Generated Video → C3D (Conv1 to Conv5) trained on UCF101.]

Naïve Subsampling

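[Figure: training the Temporal Generator. Input Video → Subsampling → Selected Frame → Temporal Generator → Generated Video; the Video Model (C3D) compares the generated and input videos through per-layer losses $\ell_1, \ell_2, \ell_3, \ell_4, \ell_5$.]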

Motion-Invariant Frame Selector

$q(x_i) = \max_c s_c(x_i)$, where $s_c(x_i)$ is the score of class $c$

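[Figure: frame selection. The Input Video is sub-sampled into frame candidates $x_1, \ldots, x_i, \ldots, x_N$; each candidate is scored with $q(x)$ and the argmax is selected.]

- Oracle Key Frames (Upper Bound): select the frames that can give the correct prediction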
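The selection rule in this panel, scoring each candidate frame by its highest class score and taking the argmax, along with the oracle variant, can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-frame class scores are taken as already computed (how the model produces them for a single frame is not shown here), and the function names are hypothetical.

```python
def frame_quality(class_scores):
    """Quality estimate of one frame: its maximum class score."""
    return max(class_scores)


def select_frame(candidate_scores):
    """Index of the candidate frame with the highest quality estimate.

    `candidate_scores[i]` is assumed to hold the class scores the
    model assigns to sub-sampled candidate frame i.
    """
    if not candidate_scores:
        raise ValueError("need at least one candidate frame")
    return max(
        range(len(candidate_scores)),
        key=lambda i: frame_quality(candidate_scores[i]),
    )


def oracle_frames(candidate_scores, true_class):
    """Indices of candidates whose top-scoring class is the true class.

    Uses the ground-truth label, so it is an upper bound, not a
    selector usable at test time.
    """
    selected = []
    for i, scores in enumerate(candidate_scores):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == true_class:
            selected.append(i)
    return selected


# Hypothetical scores for three candidate frames over three classes.
scores = [
    [0.5, 0.3, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
best = select_frame(scores)
oracle = oracle_frames(scores, true_class=2)
print(best, oracle)
```

Note that the selector never looks at the label: it picks the frame the model is most confident about, whichever class that confidence points to, whereas the oracle keeps only frames that lead to the correct class.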

Analysis

- Analyzing Motion Information
- 40% of UCF101 and 35% of Kinetics classes do not need motion
- Temporal Generator:
- Frame Selection:
- Oracle Frame Selection

[Figure: qualitative examples comparing the original video (Original Vid) with the Temporal Generator output (Temp. Gen.) for JuggleBalls and PlayFlute; class labels shown: Sled Dog Racing, Ice Skating, Boxing Speed Bag, Ski Jumping.]
