End-to-end Lane Detection through Differentiable Least-Squares Fitting
Wouter Van Gansbeke, Bert De Brabandere, Davy Neven, Marc Proesmans, Luc Van Gool
arXiv:1902.00293v3 [cs.CV] 5 Sep 2019
ECE 285 – Autonomous Driving Systems
Presented by – Anirudh Swaminathan – April 23, 2020
Why Lane Detection?
Detecting lanes is important because:
Position the vehicle within the lane
Plan future trajectories and detect lane departures
2
Lane Detection Background
Methods prior to this paper:
Two-step pipelines
First step -> segment lane-line markings
Second step -> fit a lane-line model to the post-processed mask
3
2-stage examples
Classical SIFT[20] / SURF[2] for feature extraction
RANSAC / spline / polynomial for parameters of best fitting model
Deep Learning Based Instance Segmentation – LaneNet [24]
Curve-fitting step remains largely the same
[24] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool. Towards end-to-end lane detection: an instance segmentation approach. ESAT-PSI, KU Leuven. arXiv:1802.05591v1 [cs.CV], 15 Feb 2018.
[2] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, 2006.
[20] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the IEEE International Conference on Computer Vision, 1999.
4
Objective of the Paper
Train lane detection in an end-to-end manner
Directly regress the lane curve parameters
5
Motivation
Why a single step?
Parameters are not optimized for the true task
True task – estimating the lane curve parameters
Proxy task – segmenting the lane markings
Prevents instabilities in curve fitting: two-step -> outliers
End-to-end -> implicitly learns features that prevent instabilities
6
Methodology
Key Idea -> Integrate curve fitting step as a differentiable in-network optimization step
Deep Network for the feature extraction step
Key Idea -> A geometric loss function for the network
7
Framework
The framework consists of 3 main modules:
Deep network to generate weighted pixel coordinates
Differentiable weighted least-squares fitting module
Geometric Loss Function
8
Example Architecture – Figure 1 from the paper
9
Generating Weighted Pixel Coordinates
First module of the network
Normalized coordinates -> an x map and a y map
Each coordinate -> a weight w
Feature maps -> same spatial dimensions as the input image
10
Feature Maps
Non-negative weights
Width w, height h; m = w × h
m triplets generated – (x, y, w)
One feature map for each lane
11
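The coordinate-map construction above can be sketched as follows (a minimal PyTorch sketch; the function name and the [0, 1] normalization are illustrative assumptions, not the paper's code):

```python
import torch

def coordinate_triplets(weight_map):
    """Pair each pixel's non-negative weight with its normalized (x, y) coordinate.

    weight_map: (h, w) tensor, e.g. a ReLU-activated output map of the network.
    Returns an (m, 3) tensor of (x, y, w) triplets with m = h * w.
    """
    h, w = weight_map.shape
    # Fixed coordinate maps, normalized to [0, 1], same spatial size as the map.
    ys = torch.linspace(0.0, 1.0, h).view(h, 1).expand(h, w)
    xs = torch.linspace(0.0, 1.0, w).view(1, w).expand(h, w)
    return torch.stack(
        [xs.reshape(-1), ys.reshape(-1), weight_map.reshape(-1)], dim=1
    )
```

One such weight map, and hence one set of triplets, is produced per lane.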
Example Architecture – Figure 1 from the paper
12
Weighted Least Squares Layer
m triplets (x, y, w) -> weighted points in 2D space
Fit a curve through these weighted points
Module output -> the n parameters of the best-fitting curve
13
Background - Least Squares Fitting
Xβ = Y; X ∈ ℝ^(m×n), β ∈ ℝ^(n×1), Y ∈ ℝ^(m×1)
X is the input, β are the parameters, and Y is the output
Least squares -> β = argmin_β ||Xβ − Y||²
Normal equation -> β = (XᵀX)⁻¹XᵀY
14
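As a concrete illustration of the normal equation (a NumPy sketch; the example data are made up):

```python
import numpy as np

# Fit y = b0 + b1*x via the normal equation beta = (X^T X)^{-1} X^T Y.
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2.0 + 0.5 * x                           # noiseless line: intercept 2, slope 0.5
X = np.stack([np.ones_like(x), x], axis=1)  # design matrix with a bias column
# Solving the linear system is preferred over forming the inverse explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ Y)    # beta ≈ [2.0, 0.5]
```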
Background – Weighted Least Squares
Least squares extended
W ∈ ℝ^(m×m); diagonal matrix -> one weight per observation
β = argmin_β ||W^(1/2)(Xβ − Y)||²
XᵀWXβ = XᵀWY
β = (XᵀWX)⁻¹XᵀWY
15
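Extending the sketch to the weighted case shows why per-point weights tame outliers (a NumPy sketch with made-up data):

```python
import numpy as np

def weighted_least_squares(X, Y, w):
    """Solve beta = (X^T W X)^{-1} X^T W Y with W = diag(w)."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

x = np.array([0.0, 1.0, 2.0, 10.0])
Y = np.array([1.0, 2.0, 3.0, 100.0])        # the last point is an outlier
X = np.stack([np.ones_like(x), x], axis=1)
w = np.array([1.0, 1.0, 1.0, 0.0])          # zero weight silences the outlier
beta = weighted_least_squares(X, Y, w)      # beta ≈ [1.0, 1.0], i.e. y = 1 + x
```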
Backprop through the layer
Equations involve differentiable matrix operations
Calculate the derivative of β w.r.t. W
Refer to [10] for the backprop derivation
[10] M. B. Giles. An extended collection of matrix derivative results for forward and reverse mode automatic differentiation. Technical report, University of Oxford, 2008.
16
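Because the weighted least-squares solution consists only of differentiable matrix operations, an autodiff framework can back-propagate through it directly; this PyTorch sketch (function name and parabola data are illustrative) demonstrates gradients flowing into the weights:

```python
import torch

def wls_parabola(x, y, w):
    """Differentiable weighted least-squares fit of y = b0 + b1*x + b2*x^2.

    Every operation (stack, diag, solve) is differentiable, so gradients of
    the returned parameters flow back into the per-point weights w.
    """
    X = torch.stack([torch.ones_like(x), x, x ** 2], dim=1)
    W = torch.diag(w)
    return torch.linalg.solve(X.T @ W @ X, X.T @ W @ y)

x = torch.linspace(0.0, 1.0, 8)
y = 0.2 + 0.5 * x + 1.0 * x ** 2                  # points on an exact parabola
w = torch.linspace(0.5, 1.5, 8).requires_grad_()  # per-point weights
beta = wls_parabola(x, y, w)
beta.sum().backward()                             # gradients reach w via the solve
```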
Example Architecture – Figure 1 from the paper
17
Geometric Loss Function - precursor
Usually, an L2 loss on the parameters is used for curve fitting
Here, βᵢ and β̂ᵢ -> the predicted and ground-truth curve parameters
18
Geometric Loss Function
Lane detection has a geometric interpretation
Minimize the squared area between the predicted and ground-truth curves
19
Geometric Meaning
20
Geometric Loss Function – Parabola
This paper -> lane curves modeled as parabolas
y = β₀ + β₁x + β₂x²; Δβᵢ = βᵢ − β̂ᵢ
21
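The squared-gap area between two parabolas can be sketched numerically (the paper derives a closed-form expression; this Riemann approximation and the function name are illustrative):

```python
import numpy as np

def parabola_area_loss(beta, beta_hat, t=1.0, n=1000):
    """Approximate the integral of the squared gap between two parabolas
    over x in [0, t], given coefficients (b0, b1, b2) for each curve."""
    d0, d1, d2 = np.asarray(beta) - np.asarray(beta_hat)
    x = np.linspace(0.0, t, n)
    gap = d0 + d1 * x + d2 * x ** 2   # pointwise vertical gap between the curves
    return (gap ** 2).mean() * t      # Riemann approximation of the integral
```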
Optional Transformations
Weighted coordinates -> transformed to another reference frame
Using a fixed transformation matrix H
Lane lines -> better fit by a parabola in the top-down/ortho view (BEV)
22
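Applying the fixed matrix H to the weighted coordinates amounts to a standard projective transform of 2D points (a NumPy sketch; the example H crudely stretching y is an illustrative stand-in for a real bird's-eye-view homography):

```python
import numpy as np

def transform_points(H, xy):
    """Map (m, 2) points to another reference frame with a 3x3 homography H."""
    ones = np.ones((len(xy), 1))
    pts = np.concatenate([xy, ones], axis=1)  # lift to homogeneous coordinates
    out = pts @ H.T
    return out[:, :2] / out[:, 2:3]           # divide out the projective scale

# A homography that keeps x and stretches y, mimicking a top-down re-mapping.
H_bev = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])
```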
Experiment – Ego Lane Detection
Ego lane -> the current lane of the vehicle
Two lane markings -> one left and one right
Parabola -> fitted up to a fixed distance t from the car
Overall error = averaged over the 2 lanes, then over images
23
Dataset
TuSimple Dataset [29]
Manually selected and cleaned up the annotations of 2535 images
Filtered out images where the ego lane cannot be detected unambiguously
20% of images -> validation set
Images from a single temporal sequence are never split across the train and val sets
[29] TuSimple. TuSimple benchmark, 2017.
24
Annotation
Ground truth curve parameters -> parabola
Draw curve of fixed thickness as dense label
25
Baseline – Cross-entropy training
Training -> segmentation with a per-pixel binary cross-entropy loss
Testing -> segmentation mask generated
Parabola then fitted in the least-squares sense
26
End-to-end training
ERFNet [28] -> network architecture
350 epochs; 1 GPU; 256×512 resolution; batch size 8
Adam [19] with learning rate 10⁻⁴
PyTorch [26]
[28] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo. Efficient convnet for real-time semantic segmentation. In IEEE Intelligent Vehicles Symposium, pages 1789–1794, 2017.
[19] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, 2015.
[26] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
27
Example Architecture – Figure 1 from the paper
28
Detour – ERFNet
29
ERFNet
Semantic Segmentation
Typical Encoder – Decoder architecture
Last layer -> adapted to output 2 feature maps
One for each ego lane
Transform weighted coordinates using fixed H to top-down view
[28] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo. Efficient convnet for real-time semantic segmentation. In IEEE Intelligent Vehicles Symposium, pages 1789–1794, 2017.
30
Results - Quantitative
31
Result – Training curves
32
Qualitative Results
33
Analysis
Lower error than the cross-entropy method
Convergence is slower -> the supervision signal is weaker
Generated weight maps look like segmentation maps: the network eventually discovers that the most consistent way to satisfy the loss function is to focus on the visible lane markings in the image, and to map them to a segmentation-like representation in the weight maps.
34
Further Experiments – Multi-lane detection
4 lanes in total -> in the ortho view
Two side branches: a line prediction branch and a horizon prediction branch
Horizon prediction branch -> regression -> estimates the horizon
Line prediction branch -> predicts whether each lane is present or not
35
Architecture
36
Architecture details
Side branches -> 4 conv layers -> each 3×3
Then max pooling -> FC layer
Losses for the 3 tasks -> combined linearly
37
Dataset
3626 images
20% validation set
2782 test set images
38
Comparison
ERFNet without backprop through least squares layer -> baseline
[25] Spatial CNN
[24] -> Instance Segmentation approach
[24] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool. Towards end-to-end lane detection: an instance segmentation approach. arXiv:1802.05591, 2018.
[25] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang. Spatial as deep: Spatial CNN for traffic scene understanding. In AAAI, 2018.
39
Results Quantitative
40
Results -Qualitative
41
Analysis
Improves upon the baseline by 0.7%
Faster than the benchmarks at test time -> no post-processing required
71 fps on an NVIDIA 1080 Ti
42
ADVANTAGES
Optimized for the true task -> prevents instabilities in curve fitting
Offers a degree of interpretability
Generated weight maps -> segmentation-like
Can be inspected and visualized
Geometry-aware criterion as the loss function
Handles large variance and faded lane markings
Moves complexity from post-processing into the network -> one-shot fitting
43
DISADVANTAGES
Loss function -> more complicated for higher-order curves
Fixed transformation H to the ortho view
If the ground plane differs (e.g. sloping uphill), lane parameters are poor at test time
Local minimum possible (noted by the authors)
Features at the vanishing point on the horizon / left corner of the image -> a good-enough curve -> no further improvement
Multi-lane -> fixed number of maps -> pre-defined order
Lane changes are hard; the order is ambiguous
Instance segmentation -> not subject to a specific order
Quantitative results -> comparatively worse (see slide 40)
44
KEY TAKEAWAYS
Including a differentiable in-network optimization step enables end-to-end training
A geometric loss function relevant to the true task
45
Question to the class
Why do you think that the loss in the least-squares layer is back-propagated only to the coordinate weights, and not to the coordinates themselves?
46
THANK YOU!
Questions?