End-to-end Lane Detection through Differentiable Least-Squares Fitting

Wouter Van Gansbeke, Bert De Brabandere, Davy Neven, Marc Proesmans, Luc Van Gool

arXiv:1902.00293v3 [cs.CV] 5 Sep 2019

ECE 285 – Autonomous Driving Systems

Presented by – Anirudh Swaminathan – April 23, 2020

Why Lane Detection?

Detecting lanes is important because it lets us:

Position the vehicle within the lane

Plan the future trajectory and detect lane departures

2

Lane Detection Background

Methods prior to this paper:

Two step pipelines

First step -> segment lane line markings

Second step -> fit a lane line model to post-processed mask

3

2-stage examples

Classical SIFT[20] / SURF[2] for feature extraction

RANSAC / spline / polynomial for parameters of best fitting model

Deep Learning Based Instance Segmentation – LaneNet [24]

Curve fitting step is mostly the same

[24] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool. Towards end-to-end lane detection: an instance segmentation approach. arXiv:1802.05591, 2018.

[2] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, 2006.

[20] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the IEEE International Conference on Computer Vision, 1999.

4

Objective of the Paper

End-to-end manner

Directly regress lane parameters

5

Motivation

Why single step?

Parameters not optimized for true task

True task – estimating lane curvature parameters

Proxy task – Segmenting lane markings

Prevents instabilities in curve fitting

2-step -> curve fit is sensitive to outliers

End-to-end -> implicitly learn features to prevent instabilities

6

Methodology

Key Idea -> Integrate curve fitting step as a differentiable in-network optimization step

Deep Network for the feature extraction step

Key Idea -> A geometric loss function for the network

7

Framework

The framework consists of 3 main modules:

Deep network to generate weighted pixel coordinates

Differentiable weighted least squares fitting module

Geometric Loss Function

8

Example Architecture – Figure 1 from the paper

9

Generating Weighted Pixel Coordinates

First Module of network

Normalized Coordinates -> x map and y map

Each coordinate -> weight w

Feature map -> same spatial dimensions as that of input image

10

Feature Maps

Non-negative weights

Width – w, height – h; m = w * h

m triplets generated – (x, y, w), where w in the triplet is the pixel's weight

One feature map for each lane
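As a rough illustration of the two slides above, the sketch below turns one output map into m triplets. The function name, the [0, 1] normalization, and squaring the raw output to get non-negative weights are all assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def weighted_coordinates(raw_map):
    """Turn one (height, width) output map into m = height * width triplets (x, y, weight)."""
    height, width = raw_map.shape
    # Normalized coordinate maps in [0, 1]; the exact normalization is an assumption.
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, height),
                         np.linspace(0.0, 1.0, width), indexing="ij")
    # Squaring is one simple way to obtain non-negative weights (assumed here).
    weights = raw_map ** 2
    return np.stack([xs.ravel(), ys.ravel(), weights.ravel()], axis=1)

raw = np.random.default_rng(0).normal(size=(4, 8))  # stand-in for a network output
triplets = weighted_coordinates(raw)                # shape (32, 3)
```

With one such map per lane, each lane gets its own set of triplets to feed into the fitting layer.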

11

Example Architecture – Figure 1 from the paper

12

Weighted Least Squares Layer

m triplets (x, y, w) -> weighted points in 2D space

Fit curve

Module output -> n parameters of best-fitting curve

13

Background - Least Squares Fitting

$X\beta = Y$; $X \in \mathbb{R}^{m \times n}$; $\beta \in \mathbb{R}^{n \times 1}$; $Y \in \mathbb{R}^{m \times 1}$

X is input, $\beta$ are parameters, and Y is output

Least Squares -> $\beta = \arg\min_{\beta} \|X\beta - Y\|^2$

Normal Equation -> $\beta = (X^T X)^{-1} X^T Y$
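A small numerical sketch of the normal equation, using made-up points on the line y = 1 + 2x. The data and variable names are illustrative only. The system is solved with `np.linalg.solve` rather than an explicit inverse, and compared against NumPy's own least-squares routine.

```python
import numpy as np

# Hypothetical data: points lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 1.0 + 2.0 * x
X = np.stack([np.ones_like(x), x], axis=1)  # m = 4 rows, n = 2 columns

# Normal equation (X^T X) beta = X^T Y, solved as a linear system.
beta = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from NumPy's least-squares routine.
beta_ref, *_ = np.linalg.lstsq(X, Y, rcond=None)
```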

14

Background – Weighted Least Squares

Least squares extended

$W \in \mathbb{R}^{m \times m}$; Diagonal matrix -> one weight for each observation pair

$\beta = \arg\min_{\beta} \|W^{1/2}(X\beta - Y)\|^2$

$X^T W X \beta = X^T W Y$

$\beta = (X^T W X)^{-1} X^T W Y$
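To make the role of the weights concrete, here is a minimal sketch of a weighted parabola fit, written with beta for the parameter vector. The helper name and the synthetic data are assumptions for illustration. One point is a gross outlier, and giving it zero weight removes its influence on the fit.

```python
import numpy as np

def weighted_least_squares(x, y, w, degree=2):
    """Solve (X^T W X) beta = X^T W Y for coefficients beta_0 .. beta_degree."""
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2
    XtW = X.T * w                                  # same as X.T @ np.diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

x = np.linspace(0.0, 1.0, 11)
y = 0.5 - 1.0 * x + 2.0 * x**2
y[5] += 10.0              # one gross outlier
w = np.ones_like(x)
w[5] = 0.0                # zero weight removes its influence

beta = weighted_least_squares(x, y, w)  # close to [0.5, -1.0, 2.0]
```

In the framework described here, the network plays the role of choosing `w` for every pixel.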

15

Backprop through the layer

Equations involve differentiable matrix operations

Calculate the derivative of $\beta$ w.r.t. W

Refer to [10] to derive backprop

[10] M. B. Giles. An extended collection of matrix derivative results for forward and reverse mode automatic differentiation. Technical report, University of Oxford, 2008.
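One way to see that the layer is differentiable: with beta the fitted parameter vector and w_i the diagonal entries of W, differentiating $X^T W X \beta = X^T W Y$ with respect to w_i gives $\partial\beta / \partial w_i = (X^T W X)^{-1} x_i (y_i - x_i^T \beta)$, where $x_i$ is the i-th row of X. The sketch below is a hand derivation for illustration, not the formulation from [10]. It checks the expression against central finite differences on random data.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least-squares solution for a diagonal weight vector w."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def dbeta_dw(X, y, w):
    """Jacobian of beta w.r.t. the weights, shape (n, m).

    Column i is (X^T W X)^{-1} x_i (y_i - x_i^T beta).
    """
    beta = wls(X, y, w)
    residual = y - X @ beta                    # shape (m,)
    A = (X.T * w) @ X                          # X^T W X, shape (n, n)
    return np.linalg.solve(A, X.T * residual)  # shape (n, m)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=20)
X = np.vander(x, 3, increasing=True)
y = rng.normal(size=20)
w = rng.uniform(0.5, 1.5, size=20)

analytic = dbeta_dw(X, y, w)

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.zeros_like(analytic)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric[:, i] = (wls(X, y, w + step) - wls(X, y, w - step)) / (2 * eps)
```

In practice an autodiff framework derives this automatically from the matrix operations; the explicit form is only meant to show what flows back to the weights.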

16

Example Architecture – Figure 1 from the paper

17

Geometric Loss Function - precursor

Usually, L2 loss used for curve fitting

Here, $\beta_i$ and $\hat{\beta}_i$ -> generated and ground-truth curve parameters
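The formula on this slide did not survive extraction. A standard mean-squared-error form over the n curve parameters, with $\beta_i$ the generated and $\hat{\beta}_i$ the ground-truth parameters, would look as follows. This is an assumed reconstruction, not copied from the slide.

```latex
L = \frac{1}{n} \sum_{i=1}^{n} \left( \beta_i - \hat{\beta}_i \right)^2
```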

18

Geometric Loss Function

Lane Detection -> geometric interpretation

Minimize squared area between predicted curve and ground truth curve
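Written as a formula, the criterion above can be sketched as an integral of the squared difference between the two curves. Here $y_{\beta}$ is the predicted curve, $y_{\hat{\beta}}$ the ground-truth curve, and t the distance up to which the curves are compared. The notation is chosen for this sketch and may differ from the original slide.

```latex
L = \int_{0}^{t} \left( y_{\beta}(x) - y_{\hat{\beta}}(x) \right)^2 \, dx
```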

19

Geometric Meaning

20

Geometric Loss Function – Parabola

This paper -> lane curves parabolic

$y = \beta_0 + \beta_1 x + \beta_2 x^2$; $\Delta\beta_i = \beta_i - \hat{\beta}_i$
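For a parabola, the difference between predicted and ground-truth curves is itself a parabola with coefficients $\Delta\beta_0, \Delta\beta_1, \Delta\beta_2$, so the integral of its square over [0, t] has a closed form. The sketch below expands that polynomial by hand and checks it against a numerical integral. The coefficient values and t are made up, and the function name is hypothetical.

```python
import numpy as np

def geometric_loss_parabola(beta, beta_hat, t):
    """Integral over [0, t] of the squared difference between two parabolas."""
    d0, d1, d2 = np.asarray(beta, dtype=float) - np.asarray(beta_hat, dtype=float)
    return (d2**2 * t**5 / 5
            + d1 * d2 * t**4 / 2
            + (d1**2 + 2 * d0 * d2) * t**3 / 3
            + d0 * d1 * t**2
            + d0**2 * t)

beta = [0.10, -0.50, 1.20]      # hypothetical prediction
beta_hat = [0.00, -0.40, 1.00]  # hypothetical ground truth
t = 0.7

closed_form = geometric_loss_parabola(beta, beta_hat, t)

# Numerical check with a midpoint rule.
n = 100_000
xs = (np.arange(n) + 0.5) * t / n
diff = np.polyval(np.subtract(beta, beta_hat)[::-1], xs)  # polyval wants highest degree first
numeric = np.sum(diff**2) * t / n
```

Unlike a plain L2 loss on the parameters, this weighs each coefficient by how much it actually moves the curve over the range of interest.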

21

Optional Transformations

Weighted coordinates -> another reference frame

Use fixed transformation matrix H

Lane line -> better approximated by a parabola in top-down / ortho view (bird's-eye view, BEV)
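A minimal sketch of moving coordinates into another reference frame with a fixed 3x3 matrix H. The matrix values below are invented for illustration and are not the H used in the paper. The weights attached to each coordinate are not changed by this step.

```python
import numpy as np

def apply_homography(H, x, y):
    """Map points (x, y) with a 3x3 homography H; weights are left untouched."""
    pts = np.stack([x, y, np.ones_like(x)], axis=0)  # homogeneous coords, shape (3, m)
    u, v, s = H @ pts
    return u / s, v / s

# Hypothetical matrix, for illustration only.
H = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.5, 0.0],
              [0.0, 0.5, 1.0]])

x = np.array([0.0, 1.0, 0.5])
y = np.array([0.0, 0.0, 1.0])
x_bev, y_bev = apply_homography(H, x, y)
```

Because H is fixed, the mapping is a differentiable function of the input coordinates and can sit between the network and the fitting layer.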

22

Experiment – Ego Lane Detection

Ego lane -> the current lane of the vehicle

Two lane markings -> one left and one right

Parabola -> up to a fixed distance t from the car

Overall error = average over 2 lanes, and average over images

23

Dataset

TuSimple Dataset [29]

Manually select and clean up the annotations of 2535 images

Filter out images where ego-lane cannot be detected unambiguously

20% images -> validation set

Images from a single temporal sequence never appear in both the train and val sets
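One common way to keep a temporal sequence out of both sets is to split by sequence rather than by image. The sketch below assumes each sample carries a sequence identifier; the file names and the helper are hypothetical and do not reflect the actual TuSimple layout.

```python
import random

def split_by_sequence(samples, val_fraction=0.2, seed=0):
    """samples: list of (sequence_id, image_path). Returns (train, val) with disjoint sequences."""
    sequences = sorted({seq for seq, _ in samples})
    random.Random(seed).shuffle(sequences)
    n_val = max(1, round(val_fraction * len(sequences)))
    val_sequences = set(sequences[:n_val])
    train = [s for s in samples if s[0] not in val_sequences]
    val = [s for s in samples if s[0] in val_sequences]
    return train, val

# Hypothetical file names: 10 sequences with 5 frames each.
samples = [(f"seq{i:02d}", f"seq{i:02d}/frame{j}.jpg") for i in range(10) for j in range(5)]
train, val = split_by_sequence(samples)
```

Note that this holds out 20% of the sequences, which matches 20% of the images only when sequences are of similar length.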

[29] TuSimple. TuSimple benchmark, 2017.

24

Annotation

Ground truth curve parameters -> parabola

Draw curve of fixed thickness as dense label
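A rough sketch of turning ground-truth parabola parameters into a dense label. It assumes normalized coordinates and measures thickness vertically (along y) rather than perpendicular to the curve, which is a simplification. The function name and parameter values are illustrative.

```python
import numpy as np

def rasterize_parabola(beta, height, width, thickness=0.02):
    """Binary mask of pixels within `thickness` (in normalized y units) of the curve."""
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, height),
                         np.linspace(0.0, 1.0, width), indexing="ij")
    curve = beta[0] + beta[1] * xs + beta[2] * xs**2
    return (np.abs(ys - curve) <= thickness / 2).astype(np.uint8)

mask = rasterize_parabola([0.2, 0.1, 0.5], height=64, width=128, thickness=0.05)
```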

25

Baseline – Cross-entropy training

Training: segmentation

Per-pixel binary cross-entropy loss

Testing: segmentation mask generated

Parabola fitted in least squares sense
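The test-time step of this baseline can be sketched as: threshold the predicted mask, collect the coordinates of the remaining pixels, and fit a parabola to them with unweighted least squares. The threshold, the normalization, and the synthetic mask below are assumptions for illustration.

```python
import numpy as np

def fit_parabola_to_mask(prob_map, threshold=0.5):
    """Least-squares parabola through all pixels whose probability exceeds the threshold."""
    height, width = prob_map.shape
    rows, cols = np.nonzero(prob_map > threshold)
    x = cols / (width - 1)   # normalized coordinates in [0, 1]
    y = rows / (height - 1)
    # np.polyfit returns the highest degree first; reverse to beta_0, beta_1, beta_2.
    return np.polyfit(x, y, deg=2)[::-1]

# Synthetic "segmentation output": a one-pixel-wide curve y = 0.2 + 0.1 x + 0.5 x^2.
height, width = 101, 201
prob = np.zeros((height, width))
cols = np.arange(width)
x = cols / (width - 1)
rows = np.rint((0.2 + 0.1 * x + 0.5 * x**2) * (height - 1)).astype(int)
prob[rows, cols] = 0.9

beta = fit_parabola_to_mask(prob)  # roughly [0.2, 0.1, 0.5], up to rasterization error
```

Every pixel above the threshold counts equally here, so a few misclassified pixels far from the lane can pull the fit away.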

26

End-to-end training

ERFNet [28] -> network architecture

350 epochs; 1 GPU; 256*512 resolution; batch size 8

Adam [19] with LR $10^{-4}$

PyTorch [26]

[28] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo. Efficient convnet for real-time semantic segmentation. In IEEE Intelligent Vehicles Symposium, pages 1789–1794, 2017.

[19] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, 2015.

[26] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.

27

Example Architecture – Figure 1 from the paper

28

Detour – ERFNet

29

ERFNet

Semantic Segmentation

Typical Encoder – Decoder architecture

Last layer -> adapted to output 2 feature maps

One for each ego lane

Transform weighted coordinates using fixed H to top-down view

[28] E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo. Efficient convnet for real-time semantic segmentation. In IEEE Intelligent Vehicles Symposium, pages 1789–1794, 2017.

30

Results - Quantitative

31

Result – Training curves

32

Qualitative Results

33

Analysis

Lower error than cross-entropy method

Convergence slower -> supervision signal weaker

Generated weight maps look like segmentation maps

The network eventually discovers that the most consistent way to satisfy the loss function is to focus on the visible lane markings in the image, and to map them to a segmentation-like representation in the weight maps.

34

Further Experiments – Multi-lane detection

4 lanes total -> ortho-view

Line prediction branch; horizon prediction branch

Horizon prediction branch -> regression -> estimate horizon

Line prediction branch -> whether lane is present or not

35

Architecture

36

Architecture details

Side branches -> 4 conv layers -> each 3x3

Then max pool -> FC layer

Losses for 3 tasks -> combined linearly
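A linear combination of the three task losses can be written as below. The weights $\lambda_1, \lambda_2, \lambda_3$ and the subscript names are placeholders introduced for this sketch; the slides do not state the actual values.

```latex
L_{\text{total}} = \lambda_{1} L_{\text{geometric}} + \lambda_{2} L_{\text{line}} + \lambda_{3} L_{\text{horizon}}
```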

37

Dataset

3626 images

20% validation set

2782 test set images

38

Comparison

ERFNet without backprop through least squares layer -> baseline

[25] Spatial CNN

[24] -> Instance Segmentation approach

[24] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool. Towards end-to-end lane detection: an instance segmentation approach. arXiv:1802.05591, 2018.

[25] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang. Spatial as deep: Spatial CNN for traffic scene understanding. In AAAI, 2018.

39

Results - Quantitative

40

Results - Qualitative

41

Analysis

Improves upon the baseline by 0.7%

Faster than benchmarks at test time -> no post-processing required

71 fps on NVIDIA 1080Ti

42

ADVANTAGES

Optimized for true task -> prevents instabilities in curve fitting

Offers a degree of interpretability

Generated weight maps -> segmentation-like

Can be inspected and visualized

Geometry-aware criterion as the loss function

Handle large variance, faded lane markings

Moves complexity from post-processing to network -> one-shot fitting

43

DISADVANTAGES

Loss function -> more complicated for higher order curves

Fixed transformation H to ortho-view

If ground plane is different (e.g. sloping uphill) -> bad lane parameters at test time

Local minimum possible – noted by the authors

Features at vanishing point on horizon / left corner of image -> already a good curve -> no improvement

Multi-lane -> fixed number of maps -> pre-defined order

Lane changes hard; order ambiguous

Instance segmentation -> not subject to specific order

Quantitative results -> comparatively worse (see slide 40)

44

KEY TAKEAWAYS

Include a differentiable in-network optimization step

Geometric Loss function relevant to the task

45

Question to the class

Why do you think the loss in the least-squares layer is back-propagated only to the coordinate weights, and not to the coordinates themselves?

46

THANK YOU!

Questions?
