Minimizing Annotation Effort
Dr. Antonio M. López
June 9th, 2019
ACKNOWLEDGMENTS
ICREA Academia Programme
MICINN Project TIN2017-88709-R ("DANA")
AGAUR 2017-SGR-01597
CERCA (Centres de Recerca de Catalunya)
ACCIÓ (Generalitat de Catalunya)
Divide-&-Conquer Engineering View: Modular approach (Perception Local Maneuver)
2008-10 2011 2012-14 2015-16 2017-19
First object detector fully trained using videogame data.
ECCV’14, ICCV’15, ECCV’16, ICCV’17, ECCV’18*
(*) Friday, Full day at room N1095ZG, VisDA Challenge
DA Virtual→Real for DPM
Virtual/Augmented Reality
for Visual Artificial Intelligence (VARVAI)
Deep Learning “starts”
for Computer Vision
Transferring & Adapting Source Knowledge
in Computer Vision (TASK-CV)
ECCV’16 & ACM-MM’16
’18: Computer Graphics for Autonomous Driving
Explosion in the use of synthetic data in Computer Vision: GTA-V, Internet models, ...
AD Challenge
@ CVPR’19
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving
Imitation Learning: No manual supervision
ALVINN (1988)¹ DAVE (2005)²
1. D. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. NIPS, 1988.
2. Y. LeCun, U. Muller, J. Ben, E. Cosatto, and B. Flepp. Off-road obstacle avoidance through end-to-end learning. NIPS, 2005.
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)
Still, many diverse experiences are required!
Index
• SYNTHIA: co-training object detectors
• CARLA: multimodal end-to-end driving
Self-Learning, under domain shift (source: SYNTHIA, target: real-world dataset)
[Diagram: an object detector runs on unlabelled real-world data; its detections become self-labelled real-world data, which are used to retrain the detector.]
Basic assumption:
The source model is already reasonably good at detecting on target data.
Basic idea:
1. Start with a detector trained on SYNTHIA.
2. Use the detector to process images of an unlabelled real-world dataset (e.g. KITTI).
3. Select the M images with the highest detection scores (threshold set for high precision, low recall).
4. Use detections and backgrounds from these M images as self-labelled real-world data.
5. Retrain the detector with the SYNTHIA data and the self-labelled data.
6. Repeat steps 2-5 for C cycles.
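The steps above can be sketched as a generic self-training loop. This is a minimal illustration, not the actual SYNTHIA/KITTI pipeline: `train_detector`, `detect`, and `select_top` are hypothetical placeholders for the real training, inference, and confidence-selection routines.

```python
def select_top(scored_images, M, threshold):
    """Keep the M images whose best detection score clears the threshold
    (threshold chosen for high precision, low recall)."""
    confident = [(img, dets) for img, dets in scored_images
                 if dets and max(s for _, s in dets) >= threshold]
    confident.sort(key=lambda x: max(s for _, s in x[1]), reverse=True)
    return confident[:M]

def self_train(source_data, unlabelled_images, train_detector, detect,
               M=100, threshold=0.9, cycles=5):
    # 1. Start with a detector trained on the (synthetic) source data only.
    detector = train_detector(source_data, [])
    for _ in range(cycles):                       # 6. Repeat for C cycles.
        # 2. Run the current detector on the unlabelled real-world images.
        scored = [(img, detect(detector, img)) for img in unlabelled_images]
        # 3-4. Self-label the M most confident images (detections + backgrounds).
        self_labelled = select_top(scored, M, threshold)
        # 5. Retrain on the source data plus the self-labelled data.
        detector = train_detector(source_data, self_labelled)
    return detector
```

The loop is agnostic to the detector family; plugging in a DPM or a deep detector only changes the two callables.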
Co-Training, under domain shift (source: SYNTHIA, target: real-world dataset)
[Diagram: two object detectors each run on unlabelled real-world data; the detections of detector #1 become self-labelled real-world data used to train detector #2, and vice versa.]
Basic assumptions:
1. The source models are reasonably good at detecting on target data.
2. The two detectors behave essentially differently.
Basic idea:
1. ~ Self-learning: one detector (#1) sends to the other (#2) the M images with the most confident detections.
2. ~ Discrepancy: from these M images, the other detector (#2) keeps only the N on which it is least confident, N < M.
3. Parallel training.
4. Repeat steps 1-3 for C cycles.
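The exchange above can be sketched as follows. Again a hypothetical illustration, not the actual implementation: `detect` here returns a single scalar confidence per image for brevity, and the two detectors are assumed to differ (e.g. different architectures or training subsets) so that the discrepancy step is meaningful.

```python
def exchange(sender, receiver, unlabelled, detect, M, N):
    """Self-learning + discrepancy: the sender proposes its M most
    confident images; the receiver keeps only the N it is itself
    least confident on (N < M)."""
    scored = [(img, detect(sender, img)) for img in unlabelled]
    scored.sort(key=lambda x: x[1], reverse=True)
    proposed = [img for img, _ in scored[:M]]             # sender's top M
    proposed.sort(key=lambda img: detect(receiver, img))  # receiver's view
    return proposed[:N]                                   # keep lowest N

def co_train(source_data, unlabelled, train_detector, detect,
             M=100, N=50, cycles=5):
    det1 = train_detector(source_data, [])
    det2 = train_detector(source_data, [])
    for _ in range(cycles):                        # repeat for C cycles
        for_det2 = exchange(det1, det2, unlabelled, detect, M, N)
        for_det1 = exchange(det2, det1, unlabelled, detect, M, N)
        # Parallel retraining with the exchanged self-labelled images.
        det1 = train_detector(source_data, for_det1)
        det2 = train_detector(source_data, for_det2)
    return det1, det2
```

The discrepancy filter is what distinguishes this from plain self-learning: each detector receives the samples it is least sure about, which is where the other detector's view adds information.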
Index
• SYNTHIA: co-training object detectors
• CARLA: multimodal end-to-end driving
Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)
… by Imitation/demonstration (behavior cloning)
Branched Architecture
“End to End Driving via Conditional Imitation Learning”, Codevilla et al., ICRA’2018
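The branched architecture can be sketched as a shared backbone with one control head per high-level navigation command, the command acting as a switch at the output. A minimal sketch under stated assumptions: the four-command set, the helper names, and the toy feature/steering arithmetic are illustrative, not the CARLA implementation.

```python
COMMANDS = ("follow_lane", "turn_left", "turn_right", "go_straight")

def make_branch(steer_bias):
    """Stand-in for a learned control head: features -> (steer, throttle)."""
    def branch(features):
        steer = steer_bias + 0.1 * features[0]
        throttle = max(0.0, 1.0 - abs(steer))
        return steer, throttle
    return branch

class BranchedPolicy:
    def __init__(self):
        self.branches = {
            "follow_lane": make_branch(0.0),
            "turn_left":   make_branch(-0.5),
            "turn_right":  make_branch(+0.5),
            "go_straight": make_branch(0.0),
        }

    def backbone(self, image):
        # Stand-in for the shared CNN: reduce the image to a feature vector.
        return [sum(image) / max(len(image), 1)]

    def act(self, image, command):
        # The high-level command selects which branch produces the control
        # output; at training time only that branch receives gradients
        # for the sample.
        features = self.backbone(image)
        return self.branches[command](features)
```

Conditioning on the command resolves the ambiguity of intersections, where identical camera input can demand different maneuvers depending on the route.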
“Monocular Depth Estimation by Learning from Heterogeneous Datasets”,
A. Gurram, O. Urfalioglu, I. Halfaoui, F. Bouzaraa, A.M. Lopez,
IEEE Intelligent Vehicles Symposium, 2018
Depth ground truth: KITTI LiDAR
Semantic ground truth: Cityscapes semantic segmentation
Phase 1 – Discrete depth estimation (i.e. classification).
Phase 1 – Semantic segmentation (classification).
Phase 2 – Depth regression.
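The two-phase idea can be illustrated with the label conversion it rests on: Phase 1 turns continuous LiDAR depth into discrete class labels (depth bins), so depth can be learned as classification alongside semantic segmentation; Phase 2 then fine-tunes for continuous regression. The uniform bin layout below is an illustrative assumption, not the paper's exact discretization.

```python
def make_bins(d_min=1.0, d_max=80.0, n_bins=16):
    """Uniform depth bins over the working range (e.g. 1-80 m)."""
    width = (d_max - d_min) / n_bins
    edges = [d_min + i * width for i in range(n_bins + 1)]
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(n_bins)]
    return edges, centers

def depth_to_class(depth, edges):
    """Phase-1 label: index of the bin containing this LiDAR depth value."""
    for i in range(len(edges) - 1):
        if edges[i] <= depth < edges[i + 1]:
            return i
    return len(edges) - 2  # clamp depths at/beyond the far limit

def class_to_depth(cls, centers):
    """Coarse depth readout from a Phase-1 class prediction; Phase-2
    regression refines this to a continuous value."""
    return centers[cls]
```

The quantization error of the Phase-1 readout is bounded by half a bin width, which is what the Phase-2 regression head removes.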
KITTI: Training set (LiDAR ground truth) & Testing set
Quantitative results
Eigen et al. KITTI split. DRN - depth regression network; DC-DRN - depth regression network with a pre-trained classification network; DSC-DRN - depth regression network trained with the conditional flow approach, for depth ranges 1-80 m and 1-50 m. In Godard's approaches, "K" means training on KITTI and "CS + K" means also using Cityscapes. Bold stands for best, italics for second best.
Cityscapes Testing! (cross-domain generalization)
Photo-realistic SYNTHIA
Multimodal end-to-end driving: RGB+D multisensory / single-sensor (monocular)
Yi et al. (arXiv:1906.03199)
Address
Edifici O, Campus UAB
08193 Bellaterra
Barcelona
Phone & Fax
Direct Line: +34 93 581 2561
Fax: +34 93 581 1670
www.cvc.uab.es
E-contact
www.cvc.uab.es/~antonio
Dr. Antonio M. López, Principal Investigator UAB & CVC ADAS Group
In conclusion, we are lazy annotators!!!
Many Thanks!!! Questions?