

1. Color dropout as augmentation

2. Cycle-consistency + scheduled sampling

3. Restricted attention for higher resolution

Consequences: longer tracks, reduced drift
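
Item 1 in the list above can be illustrated with a minimal sketch. The poster only states that color dropout is used as augmentation and shows a training input with a single channel kept; the per-channel keep probability `keep_prob`, the "keep at least one channel" rule, and the function name `color_dropout` are assumptions made here for illustration, not details taken from the poster. Plain Python lists are used so the sketch has no dependencies.

```python
import random

def color_dropout(pixel_rows, keep_prob=0.5, rng=random):
    """Zero out whole color channels of an input frame, keeping at least one.

    pixel_rows: list of rows, each row a list of 3-tuples (one value per channel).
    keep_prob:  hypothetical per-channel keep probability (an assumption).
    rng:        the random module or a random.Random instance.
    """
    # Decide once per frame which channels survive.
    keep = [rng.random() < keep_prob for _ in range(3)]
    if not any(keep):
        # Never drop everything: force one randomly chosen channel to stay.
        keep[rng.randrange(3)] = True
    # Apply the same channel mask to every pixel.
    return [[tuple(v if k else 0 for v, k in zip(px, keep)) for px in row]
            for row in pixel_rows]
```

In this sketch only the input frame would be passed through `color_dropout`; the frame to be reconstructed would keep all of its channels, so the model cannot solve the task by matching raw colors alone.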

> Oral session (Video Analysis): 13:00 - 13:15, Thursday 12th

> Paper, code, and pretrained model available for download. Check out:

Self-supervised Learning for Video Correspondence Flow

Zihang Lai, Weidi Xie

VGG, University of Oxford

The objective of this paper is self-supervised learning of matching correspondences along videos, which we term correspondence flow. Learning only from unlabeled videos, we propose to train a “pointer” that reconstructs a target frame by copying pixels from a reference frame.
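
The "pointer" idea can be sketched roughly as follows: each target pixel compares its feature with reference features inside a small search window and copies a softmax-weighted mix of the reference colors found there. This is only an illustration under stated assumptions. The dot-product similarity, the softmax, the square window, and the names `reconstruct` and `radius` are choices made here and are not confirmed by the poster.

```python
import math

def reconstruct(feat_tgt, feat_ref, color_ref, radius=1):
    """Rebuild target colors as attention-weighted copies of reference colors.

    feat_tgt, feat_ref: H x W grids (lists of rows) of feature vectors.
    color_ref:          H x W grid of color vectors for the reference frame.
    radius:             hypothetical half-width of the square search window.
    """
    h, w = len(feat_ref), len(feat_ref[0])
    n_ch = len(color_ref[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Reference positions inside the local search window.
            window = [(v, u)
                      for v in range(max(0, y - radius), min(h, y + radius + 1))
                      for u in range(max(0, x - radius), min(w, x + radius + 1))]
            # Dot-product similarity between the target feature and each candidate.
            scores = [sum(a * b for a, b in zip(feat_tgt[y][x], feat_ref[v][u]))
                      for v, u in window]
            # Softmax weights, shifted by the max score for numerical stability.
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            # Copy colors from the reference frame, weighted by attention.
            row.append([sum(wt * color_ref[v][u][c]
                            for wt, (v, u) in zip(weights, window)) / total
                        for c in range(n_ch)])
        out.append(row)
    return out
```

Training would then compare the reconstruction with the true target frame, so no labels are needed. Limiting the window also keeps the number of comparisons per pixel fixed, which is one plausible reading of why restricted attention helps at higher resolution.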

Introduction

Our correspondences could be used to propagate many entities (e.g. segmentation masks, keypoints) along a video sequence.
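
As a rough illustration of such propagation, the sketch below moves a segmentation mask from one frame to the next. It assumes the correspondences are already available as an affinity matrix, where `affinity[i][j]` is the weight that pixel `i` of the new frame places on pixel `j` of the previous frame. The function name `propagate_labels` and the voting rule are hypothetical and not taken from the poster.

```python
def propagate_labels(affinity, ref_labels, num_classes):
    """Carry per-pixel class labels from a reference frame to a target frame.

    affinity:    one row per target pixel; affinity[i][j] is the weight that
                 target pixel i places on reference pixel j.
    ref_labels:  integer class label of each reference pixel.
    num_classes: number of distinct labels.
    """
    out = []
    for row in affinity:
        # Accumulate the attention mass that lands on each class.
        votes = [0.0] * num_classes
        for weight, label in zip(row, ref_labels):
            votes[label] += weight
        # The target pixel takes the class with the largest total weight.
        out.append(max(range(num_classes), key=votes.__getitem__))
    return out

# Example: three pixels, two classes (0 = background, 1 = object).
affinity = [[0.9, 0.1, 0.0],
            [0.2, 0.2, 0.6],
            [0.0, 0.3, 0.7]]
print(propagate_labels(affinity, [0, 1, 1], num_classes=2))  # [0, 1, 1]
```

Keypoints could be handled in the same way by treating each keypoint as its own class.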

Qualitative results on DAVIS and JHMDB

What to do with correspondence?

[Figure: Training: Frame t (only R), model, Frame t+1 (RGB). Testing: Frame t (RGB), model, Frame t+1 (RGB).]

A feature extractor that produces embeddings suitable for matching correspondences.

What to learn?

Objective: Learning pixel correspondence in videos without annotations!

[Figure: frame t and frame t+1 pass through the feature extractor to give feature t and feature t+1, which are then matched.]

How to learn?

On DAVIS-2017 we outperform existing self-supervised learning approaches by a significant margin (49.5 vs. 40.7 J&F mean for the next-best self-supervised method).

Results

Video segmentation (DAVIS-2017)

Method                    Supervised   J&F (Mean)
Optical Flow              ✗            26.0
Vondrick et al.           ✗            34.0
CycleTime (Wang et al.)   ✗            40.7
Ours                      ✗            49.5
SiamMask                  ✓            53.1
OSVOS                     ✓            60.3

Keypoint tracking (JHMDB dataset)

Method                    Supervised   PCK@.1
Optical Flow              ✗            49.0
Vondrick et al.           ✗            45.2
Wang et al.               ✗            57.7
Ours                      ✗            58.5
ImageNet                  ✓            58.4
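
For readers unfamiliar with the keypoint metric, PCK@.1 is commonly computed as the fraction of predicted keypoints that fall within 0.1 times a reference size of the ground truth. The sketch below follows that common definition. How the reference size `box_size` is chosen (for example, a side of the person bounding box) is an assumption here; the poster does not state the exact convention used.

```python
import math

def pck(pred, gt, box_size, alpha=0.1):
    """Percentage of Correct Keypoints, returned as a fraction in [0, 1].

    pred, gt:  lists of (x, y) keypoint coordinates, in the same order.
    box_size:  reference length used to normalise distances (an assumption).
    alpha:     tolerance as a fraction of box_size; 0.1 gives PCK@.1.
    """
    threshold = alpha * box_size
    # A keypoint counts as correct if it lies within the threshold distance.
    hits = sum(1 for (px, py), (gx, gy) in zip(pred, gt)
               if math.hypot(px - gx, py - gy) <= threshold)
    return hits / len(gt)
```

With a 100-pixel reference size, a prediction counts as correct when it lies within 10 pixels of the annotated keypoint.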

Find more...

[Figure: restricted attention. Labels: Frame t (only R), model, Frame t+1 (RGB); search region; reference frame, target frame.]
