Minimizing Annotation Effort
Dr. Antonio M. López
June 9th, 2019
ACKNOWLEDGMENTS
ICREA Academia Programme
MICINN Project TIN2017-88709-R ("DANA")
AGAUR 2017-SGR-01597
CERCA (Centres de Recerca de Catalunya)
ACCIÓ (Generalitat de Catalunya)
Divide-&-Conquer Engineering View: Modular approach (Perception Local Maneuver)
2008-10 2011 2012-14 2015-16 2017-19
First object detector fully trained using videogame data.
ECCV’14, ICCV’15, ECCV’16, ICCV’17, ECCV’18*
(*) Friday, Full day at room N1095ZG, VisDA Challenge
DA Virtual→Real for DPM
Virtual/Augmented Reality
for Visual Artificial Intelligence (VARVAI)
Deep Learning “starts”
for Computer Vision
Transferring & Adapting Source Knowledge
in Computer Vision (TASK-CV)
ECCV’16 & ACM-MM’16
’18: Computer Graphics for Autonomous Driving
Explosion in the use of synthetic data in Computer Vision: GTA-V, Internet models, ...
AD Challenge
@ CVPR’19
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving
Imitation Learning: No manual supervision
ALVINN (1988)¹ DAVE (2005)²
1. D. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. NIPS, 1988.
2. Y. LeCun, U. Muller, J. Ben, E. Cosatto, and B. Flepp. Off-road obstacle avoidance through end-to-end learning. NIPS, 2005.
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)
Still, many diverse experiences are required!
Index
• SYNTHIA: co-training object detectors
• CARLA: multimodal end-to-end driving
Self-Learning, under domain shift (source: SYNTHIA, target: real-world dataset)
[Diagram: an object detector runs on unlabelled real-world data; its detections become self-labelled real-world data, which are used to retrain the detector.]
Basic assumption:
The source model is already reasonably good at detecting on target data.
Basic idea:
1. Start with a detector trained on SYNTHIA.
2. Use the detector to process images of an unlabelled real-world dataset (e.g. KITTI).
3. Select the M images with the highest detection scores (threshold set for high precision, low recall).
4. Use detections and backgrounds from these M images as self-labelled real-world data.
5. Retrain the detector with the SYNTHIA data and the self-labelled data.
6. Repeat steps 2-5 for C cycles.
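The steps above can be sketched as a generic self-training loop. This is a minimal illustration, not the actual SYNTHIA/KITTI pipeline: `train_detector`, `detect`, and `select_top` are hypothetical placeholders for the real training, inference, and confidence-selection routines.

```python
def select_top(scored_images, M, threshold):
    """Keep the M images whose best detection score clears the threshold
    (threshold chosen for high precision, low recall)."""
    confident = [(img, dets) for img, dets in scored_images
                 if dets and max(s for _, s in dets) >= threshold]
    confident.sort(key=lambda x: max(s for _, s in x[1]), reverse=True)
    return confident[:M]

def self_train(source_data, unlabelled_images, train_detector, detect,
               M=100, threshold=0.9, cycles=5):
    # 1. Start with a detector trained on the (synthetic) source data only.
    detector = train_detector(source_data, [])
    for _ in range(cycles):                       # 6. Repeat for C cycles.
        # 2. Run the current detector on the unlabelled real-world images.
        scored = [(img, detect(detector, img)) for img in unlabelled_images]
        # 3-4. Self-label the M most confident images (detections + backgrounds).
        self_labelled = select_top(scored, M, threshold)
        # 5. Retrain on the source data plus the self-labelled data.
        detector = train_detector(source_data, self_labelled)
    return detector
```

The loop is agnostic to the detector family; plugging in a DPM or a deep detector only changes the two callables.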
Co-Training, under domain shift (source: SYNTHIA, target: real-world dataset)
[Diagram: two object detectors each run on unlabelled real-world data; the detections of detector #1 become self-labelled real-world data used to train detector #2, and vice versa.]
Basic assumptions:
1. The source models are reasonably good at detecting on target data.
2. The two detectors behave essentially differently.
Basic idea:
1. ~ Self-learning: one detector (#1) sends to the other (#2) the M images with the most confident detections.
2. ~ Discrepancy: from these M images, the other detector (#2) keeps only the N on which it is least confident, N < M.
3. Parallel training.
4. Repeat steps 1-3 for C cycles.
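The exchange above can be sketched as follows. Again a hypothetical illustration, not the actual implementation: `detect` here returns a single scalar confidence per image for brevity, and the two detectors are assumed to differ (e.g. different architectures or training subsets) so that the discrepancy step is meaningful.

```python
def exchange(sender, receiver, unlabelled, detect, M, N):
    """Self-learning + discrepancy: the sender proposes its M most
    confident images; the receiver keeps only the N it is itself
    least confident on (N < M)."""
    scored = [(img, detect(sender, img)) for img in unlabelled]
    scored.sort(key=lambda x: x[1], reverse=True)
    proposed = [img for img, _ in scored[:M]]             # sender's top M
    proposed.sort(key=lambda img: detect(receiver, img))  # receiver's view
    return proposed[:N]                                   # keep lowest N

def co_train(source_data, unlabelled, train_detector, detect,
             M=100, N=50, cycles=5):
    det1 = train_detector(source_data, [])
    det2 = train_detector(source_data, [])
    for _ in range(cycles):                        # repeat for C cycles
        for_det2 = exchange(det1, det2, unlabelled, detect, M, N)
        for_det1 = exchange(det2, det1, unlabelled, detect, M, N)
        # Parallel retraining with the exchanged self-labelled images.
        det1 = train_detector(source_data, for_det1)
        det2 = train_detector(source_data, for_det2)
    return det1, det2
```

The discrepancy filter is what distinguishes this from plain self-learning: each detector receives the samples it is least sure about, which is where the other detector's view adds information.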
Index
• SYNTHIA: co-training object detectors
• CARLA: multimodal end-to-end driving
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)
… by Imitation/demonstration (behavior cloning)
Branched Architecture
“End to End Driving via Conditional Imitation Learning”, Codevilla et al., ICRA’2018
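The branched architecture can be sketched as a shared backbone with one control head per high-level navigation command, the command acting as a switch at the output. A minimal sketch under stated assumptions: the four-command set, the helper names, and the toy feature/steering arithmetic are illustrative, not the CARLA implementation.

```python
COMMANDS = ("follow_lane", "turn_left", "turn_right", "go_straight")

def make_branch(steer_bias):
    """Stand-in for a learned control head: features -> (steer, throttle)."""
    def branch(features):
        steer = steer_bias + 0.1 * features[0]
        throttle = max(0.0, 1.0 - abs(steer))
        return steer, throttle
    return branch

class BranchedPolicy:
    def __init__(self):
        self.branches = {
            "follow_lane": make_branch(0.0),
            "turn_left":   make_branch(-0.5),
            "turn_right":  make_branch(+0.5),
            "go_straight": make_branch(0.0),
        }

    def backbone(self, image):
        # Stand-in for the shared CNN: reduce the image to a feature vector.
        return [sum(image) / max(len(image), 1)]

    def act(self, image, command):
        # The high-level command selects which branch produces the control
        # output; at training time only that branch receives gradients
        # for the sample.
        features = self.backbone(image)
        return self.branches[command](features)
```

Conditioning on the command resolves the ambiguity of intersections, where identical camera input can demand different maneuvers depending on the route.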
“Monocular Depth Estimation by Learning from Heterogeneous Datasets”,
A. Gurram, O. Urfalioglu, I. Halfaoui, F. Bouzaraa, A.M. Lopez,
IEEE Intelligent Vehicles Symposium, 2018
Depth ground truth: KITTI LiDAR
Semantic ground truth: Cityscapes semantic segmentation
Phase 1 – Discrete depth estimation (i.e. classification).
Phase 1 – Semantic segmentation (classification).
Phase 2 – Depth regression.
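The two-phase idea can be illustrated with the label conversion it rests on: Phase 1 turns continuous LiDAR depth into discrete class labels (depth bins), so depth can be learned as classification alongside semantic segmentation; Phase 2 then fine-tunes for continuous regression. The uniform bin layout below is an illustrative assumption, not the paper's exact discretization.

```python
def make_bins(d_min=1.0, d_max=80.0, n_bins=16):
    """Uniform depth bins over the working range (e.g. 1-80 m)."""
    width = (d_max - d_min) / n_bins
    edges = [d_min + i * width for i in range(n_bins + 1)]
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(n_bins)]
    return edges, centers

def depth_to_class(depth, edges):
    """Phase-1 label: index of the bin containing this LiDAR depth value."""
    for i in range(len(edges) - 1):
        if edges[i] <= depth < edges[i + 1]:
            return i
    return len(edges) - 2  # clamp depths at/beyond the far limit

def class_to_depth(cls, centers):
    """Coarse depth readout from a Phase-1 class prediction; Phase-2
    regression refines this to a continuous value."""
    return centers[cls]
```

The quantization error of the Phase-1 readout is bounded by half a bin width, which is what the Phase-2 regression head removes.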
KITTI: Training set (LiDAR ground truth) & Testing set
Quantitative results
Eigen et al. KITTI split. DRN - depth regression network; DC-DRN - depth regression network with a pre-trained classification network; DSC-DRN - depth regression network trained with the conditional flow approach, for depth ranges 1-80 m and 1-50 m. In Godard's approaches, "K" means training on KITTI and "CS + K" means also using Cityscapes. Bold stands for best, italics for second best.
Cityscapes Testing! (cross-domain generalization)
Photo-realistic SYNTHIA
Multimodal end-to-end driving: RGB+D multisensory / single-sensor (monocular)
Yi et al. (arXiv:1906.03199)
Address
Edifici O, Campus UAB
08193 Bellaterra
Barcelona
Phone & Fax
Direct Line: +34 93 581 2561
Fax: +34 93 581 1670
www.cvc.uab.es
E-contact
www.cvc.uab.es/~antonio
Dr. Antonio M. López, Principal Investigator UAB & CVC ADAS Group
In conclusion, we are lazy annotators!!!
Many Thanks!!! Questions?