T-CNN, Object Detection from Video
Kang, Kai and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang
CVPR 2016
[arxiv] [code]
Slides by Andrea Ferri, Computer Vision Reading Group @ UPC BarcelonaTech (Spring 2016)


http://arxiv.org/abs/1604.02532

https://github.com/myfavouritekk/T-CNN


https://github.com/imatge-upc/readcv/blob/master/README.md


Summary:

• Introduction;
• Architecture:
    I. Still-Image Detection;
    II. MCS & MGP;
    III. Tubelet Re-Scoring;
• Experiments.


Introduction:

The DET and VID challenges are strongly different.

Applying DET detectors frame by frame to VID:
→ produces large temporal fluctuations in the detection scores;
→ generates false positives.


T-CNN means: Tubelets with Convolutional Neural Networks.

Tubelets are bounding box sequences that carry:
• Temporal Information;
• Contextual Information.


Architecture:

T-CNN combines current state-of-the-art components:
• Still-Image Object Detection;
• Object Tracking Algorithm;
• A Lot of Cool Tricks.


I. Still-Image Detection

The detectors used are:
• DeepID-Net (an improvement of R-CNN);
• CRAFT (an extension of Faster R-CNN).

The two detectors use different region proposal methods, pre-trained models, and training strategies.


II. MCS & MGP

Multi-Context Suppression

→ Sort the detection scores of all proposals in a video in descending order;

→ The classes of the highest-ranked detections are taken as the confident classes for that video;

→ The scores of low-ranked classes are suppressed, while the scores of confident classes remain unchanged.
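
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout `(frame, class_name, score)` and the constant `penalty` subtracted from suppressed scores are hypothetical choices made for this sketch. The default `top_ratio` reuses the 0.0003 value listed among the main parameters of the experiments.

```python
import math


def multi_context_suppression(detections, top_ratio=0.0003, penalty=0.4):
    """Suppress scores of classes that never rank highly in a video.

    detections: list of (frame, class_name, score) for ONE video.
    top_ratio:  fraction of top-ranked detections that define the
                confident classes.
    penalty:    hypothetical constant subtracted from the other classes.
    """
    if not detections:
        return []
    # Step 1: rank every detection of the video by score.
    ranked = sorted(detections, key=lambda d: d[2], reverse=True)
    # Step 2: classes seen among the top-ranked detections are confident.
    # At least one detection is kept so short videos are still handled.
    n_top = max(1, math.ceil(top_ratio * len(ranked)))
    confident = {cls for _, cls, _ in ranked[:n_top]}
    # Step 3: leave confident classes untouched, suppress the rest.
    return [
        (frame, cls, score if cls in confident else score - penalty)
        for frame, cls, score in detections
    ]
```

Subtracting a constant is one simple way to suppress a score; any monotone reduction would fit the description on the slide equally well.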


Motion-Guided Propagation


→ In each frame, some objects are missed by the detector. However, detections on adjacent frames are complementary to each other;

→ Detections are propagated to adjacent frames. Optical flow is used to guide the propagation;

→ Propagation results in redundant boxes, which can be easily handled by non-maximum suppression (NMS).
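
A minimal sketch of propagation followed by NMS is given below. It assumes the optical flow has already been computed and is exposed through a hypothetical callback `mean_flow(t, box)` that returns the mean displacement `(dx, dy)` inside `box` between frame `t` and frame `t + 1`. Approximating backward propagation by negating the forward flow, copying the score unchanged, and the 0.5 IoU threshold are assumptions of this sketch. The 7-frame window is the value listed among the main parameters of the experiments.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shift(box, dx, dy):
    """Translate a box by (dx, dy)."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def propagate(frames, mean_flow, window=7):
    """frames: one list of (box, score) per frame.

    Returns per-frame lists holding the original detections plus the
    detections propagated from up to window // 2 frames on each side.
    """
    radius = window // 2
    out = [list(dets) for dets in frames]
    for t, dets in enumerate(frames):
        for box, score in dets:
            # Forward: follow the flow one frame at a time.
            cur = box
            for k in range(t, min(t + radius, len(frames) - 1)):
                dx, dy = mean_flow(k, cur)
                cur = shift(cur, dx, dy)
                out[k + 1].append((cur, score))
            # Backward: negate the forward flow (an approximation).
            cur = box
            for k in range(t, max(t - radius, 0), -1):
                dx, dy = mean_flow(k - 1, cur)
                cur = shift(cur, -dx, -dy)
                out[k - 1].append((cur, score))
    return out


def nms(dets, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, score) pairs."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

Running `nms` on each frame of the `propagate` output removes the redundant boxes created when the same object is detected in several neighbouring frames.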


III. Tubelet Re-Scoring

1. High Confidence Tracking;

2. Spatial Max Pooling;

3. Temporal Re-Scoring.


High Confidence Tracking

1 → Obtain detection results from still-image detectors;

2 → Choose high-confidence detections as starting points (anchors) for tracking;

3 → Obtain tubelets, which are bounding box sequences generated from tracking algorithms.
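
The anchor selection loop can be sketched as follows. The tracker itself is out of scope here and is represented by a hypothetical callback `track(frame, box)` that returns a tubelet as a list of `(frame, box)` pairs. The score threshold and the rule of skipping anchors that already overlap an existing tubelet are assumptions made for this sketch, not details taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_tubelets(detections, track, min_anchor_score=0.5, overlap=0.5):
    """detections: list of (frame, box, score) from the still-image detectors.

    track(frame, box) -> list of (frame, box): hypothetical tracker callback.
    """
    tubelets = []
    # Visit detections from the most to the least confident.
    for frame, box, score in sorted(detections, key=lambda d: d[2], reverse=True):
        if score < min_anchor_score:
            break  # remaining detections are too weak to serve as anchors
        # Skip anchors already explained by an existing tubelet.
        covered = any(
            iou(box, tube_box) > overlap
            for tube in tubelets
            for tube_frame, tube_box in tube
            if tube_frame == frame
        )
        if not covered:
            tubelets.append(track(frame, box))
    return tubelets
```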


Spatial Max Pooling

- Still-image detection results that have large overlaps with tubelet boxes are chosen for each tubelet;

- Only the detection with the maximum score is kept for each tubelet box after spatial max-pooling.

A Kalman filter is used to smooth the bounding box locations.
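
Spatial max-pooling over one tubelet might look like the sketch below. The data layout, the function names and the 0.5 IoU threshold for a "large overlap" are assumptions of this sketch. Keeping the tracked box when no detection overlaps it is also an assumption, and the Kalman smoothing step is not shown.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spatial_max_pool(tubelet, detections, min_iou=0.5):
    """Replace each tubelet box by the best overlapping still-image detection.

    tubelet:    list of (frame, box, score) produced by tracking.
    detections: dict mapping frame -> list of (box, score).
    """
    pooled = []
    for frame, box, score in tubelet:
        # Still-image detections with a large overlap with the tubelet box.
        candidates = [
            d for d in detections.get(frame, []) if iou(d[0], box) >= min_iou
        ]
        if candidates:
            # Keep only the candidate with the maximum detection score.
            best_box, best_score = max(candidates, key=lambda d: d[1])
            pooled.append((frame, best_box, best_score))
        else:
            # No overlapping detection: fall back to the tracked box.
            pooled.append((frame, box, score))
    return pooled
```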


Temporal Re-Scoring

• Tubelet Classification. Classify tubelets based on statistics of detection scores (mean, median, top-k). A linear classifier is learnt based on the statistics;

• Tubelet Re-scoring. Map detection scores of positive tubelets to [0.5, 1], negative ones to [0, 0.5].

A Bayesian classifier is used.
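
The statistics and the score mapping can be illustrated with the sketch below. The classifier itself is not reproduced: `rescore` simply takes the positive/negative decision as an input. Min-max normalisation as the mapping, the choice `k=3`, and averaging the top-k scores are assumptions of this sketch.

```python
from statistics import mean, median


def tubelet_stats(scores, k=3):
    """Summary statistics of the detection scores along one tubelet."""
    top_k = sorted(scores, reverse=True)[:k]
    return {"mean": mean(scores), "median": median(scores), "top_k": mean(top_k)}


def rescore(scores, is_positive):
    """Map tubelet scores to [0.5, 1] (positive) or [0, 0.5] (negative).

    is_positive is the output of the tubelet classifier, which is not
    part of this sketch.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # A constant-score tubelet is mapped to the middle of its target range.
    unit = [(s - lo) / span if span > 0 else 0.5 for s in scores]
    offset = 0.5 if is_positive else 0.0
    return [offset + 0.5 * u for u in unit]
```

The mapping preserves the ranking of boxes inside a tubelet while guaranteeing that every box of a positive tubelet scores at least as high as every box of a negative one.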


Experiments:

• Tricky work behind the training dataset (dataset ratio DET:VID = 2:1);
• Main Parameters:
    • MGP: window of 7 frames;
    • MCS: the top 0.0003 of boxes define the confident classes.


Results:


Reference:

• Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang. T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos.


http://arxiv.org/abs/1604.02532