DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

3D LiDAR based Drivable Road Region Detection for Autonomous Vehicles

JIANGPENG TAO

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


3D LiDAR based Drivable Road Region Detection for Autonomous Vehicles

JIANGPENG TAO

Master in Systems, Control and Robotics
Date: March 4, 2020
Supervisor: John Folkesson
Examiner: Patric Jensfelt
School of Electrical Engineering and Computer Science
Host company: Scania AB
Swedish title: 3D-LiDAR-baserad körbar vägregistrering för autonoma fordon


Abstract

Accurate and robust perception of surrounding objects of interest, such as on-road obstacles, ground surface, curbs and ditches, is an essential capability for path planning and localization in autonomous driving. Stereo cameras are often used for this purpose. Comparably, 3D LiDARs directly provide accurate depth measurements of the environment without the need for association of pixels in image pairs. In this project, disparity is used to bridge the gap between LiDAR and stereo cameras, therefore efficiently extracting the ground surface and obstacles from the 3D point cloud in the way of 2D image processing. Given the extracted ground points, three kinds of features are designed to detect road structures with large geometrical variation, such as curbs, ditches and grasses. Based on the feature result, a robust regression method named least trimmed squares is used to fit the final road boundary. The proposed approach is verified with a real dataset from a 64-channel LiDAR mounted on the Scania bus Klara, as well as the KITTI road benchmark, both achieving satisfying performance in some particular situations.


Sammanfattning

Accurate and robust perception of surrounding objects of interest, such as obstacles on the road, ground surfaces, curbs and ditches, is an essential capability for path planning and localization in autonomous driving. Stereo cameras are often used for this purpose. In comparison, 3D LiDAR directly provides accurate depth measurements of the environment without needing to match pixels in image pairs. In this project, disparity is used to bridge the gap between LiDAR and stereo cameras, thereby efficiently finding the ground surface and obstacles in a 3D point cloud through 2D image processing. Given the extracted ground surface, three types of features are investigated to detect road structures with large geometric variation, such as curbs, ditches and grass. Based on the feature result, a robust regression method, least trimmed squares, is used to fit the final road boundary. The proposed approach is verified with two datasets from 64-channel LiDARs, one from the Scania bus Klara and one from KITTI, and achieves satisfying performance in certain given situations.


Acknowledgements

This thesis project was performed at SCANIA CV AB, EARM group. The work was financed by the Vinnova research project iQPilot (project number 2016-02547), for which I am grateful. The objective of this project is to move a step closer to the introduction of self-driving heavy-duty vehicles in traffic environments. I would first like to express my great appreciation to Prof. John Folkesson and Dr. Batool Nazre for their persistent supervision and help throughout this thesis project. They encouraged and steered me in the right direction whenever I got into trouble.

I would also like to thank all my colleagues at Scania who have supported my work and provided valuable suggestions during my mid-term and final presentations, especially Dr. Zhan Wang and my manager Per Sahlholm.

In addition, a thank you to my thesis examiner Prof. Patric Jensfelt for his assessment and comments on this thesis.

Finally, I would like to express my profound gratitude to my parents and to my girlfriend for their unfailing support and continuous encouragement throughout my years of study and through the process of this project. This work would not have been possible to complete without them. Thank you.

Contents

1 Introduction
  1.1 Project Overview
  1.2 Research Question
  1.3 Outline

2 Related Work
  2.1 Rigid Model Fitting
  2.2 Occupancy Grid Map based
  2.3 Disparity based
  2.4 Deep Learning based
  2.5 Method Choice

3 Theory
  3.1 Stereo Vision
    3.1.1 Stereo Camera Model
    3.1.2 Stereo Matching
    3.1.3 U-V-disparity Domain
    3.1.4 3D Planes Projection in U-V-disparity Domain
  3.2 3D LiDAR
    3.2.1 LiDAR Model

4 Methodology
  4.1 Ground Surface Extraction
    4.1.1 Disparity Image Generation
    4.1.2 U-disparity Map
    4.1.3 V-disparity Map
    4.1.4 Crude Obstacle Removal
    4.1.5 Longitudinal Road Profile Extraction
    4.1.6 Road Point Extraction
  4.2 Road Boundary Detection
    4.2.1 Feature Design
    4.2.2 Boundary Model
  4.3 Implementation
    4.3.1 Dataset
    4.3.2 Experimental Equipment
    4.3.3 Practical Considerations

5 Results and Discussion
  5.1 Case study
    5.1.1 Ground Surface Extraction
    5.1.2 Candidate Boundary Point Identification
  5.2 Evaluation on KITTI Dataset
  5.3 Computation Complexity
  5.4 Good properties
  5.5 Limitations

6 Conclusions
  6.1 Conclusions
  6.2 Future Works

Bibliography

A Societal Aspects
  A.1 Ethics
  A.2 Sustainability
  A.3 Social Relevance

B Resulting Samples
  B.1 Road Boundary Extraction
    B.1.1 LiDAR Supplier's Dataset (Urban Scene)
    B.1.2 Scania Dataset (Suburb Scene)
  B.2 KITTI Road Benchmark

Chapter 1

    Introduction

Modern vehicles are increasingly equipped with advanced driver assistance systems (ADAS) to enhance driving and safety. More and more features are added and improved, including avoiding collisions and alerting to potential dangers. Such systems are expected to grow more complex and reliable towards fully autonomous driving during the following decades. Historically, autonomous vehicle research began to gain momentum with the DARPA Grand Challenge in 2005 and the Urban Challenge in 2007. Significant research work has been done on the related sensor, perception, planning and control systems. Since then, a large number of companies and research organizations have joined and developed prototypes of autonomous vehicles (AVs).

Drivable road region detection is a fundamental task in both ADAS and autonomous driving. The system is required to identify the road boundary and detect surrounding obstacles, such as vehicles, pedestrians, guardrails and buildings. Importantly, the detection result provides a straightforward perceptual cue for collision avoidance and path planning. Therefore, real-world application puts very high demands on its reliability. Research work [1] states that an acceptable error rate should be less than one error in 54,000 frames. Achieving such low error rates faces a myriad of challenges in realistic traffic scenes. There are numerous kinds of obstacles with different shapes, sizes, colors, textures and dynamic states. The ground surface can be planar, bumpy, sloped, or undulating. Even structured urban roads might be built to different standards.

A wide variety of sensors equipped on AVs have been found useful for the task of drivable road region detection, including imaging sensors, LiDAR (light detection and ranging) and RADAR (radio detection and ranging). Each kind of sensor has its advantages and drawbacks. The imaging sensor is the most


commonly used modality and can obtain regular and dense visual data with color and texture information. However, the data quality heavily depends on the visibility conditions: shadows, extreme weather conditions (rain, fog, snow) and sun glare may cause unpredictable model output. On the other hand, RADAR has high reliability in range and velocity measurement even under bad weather, but its weaknesses are also obvious. It has a very narrow field of view (less than 15 degrees) and low accuracy in lateral directions, which means a large blind region exists. In contrast, LiDARs can capture highly accurate geometry with a larger field of view, and also do not suffer from external illumination. Although the high cost currently limits the extensive use of LiDARs, experts expect this to fall to less than 100 dollars in the next 5 years. It is therefore of interest to survey the applicability of LiDAR data, without using photometric information, for the road perception task.

1.1 Project Overview

This project was conducted at the company Scania AB in Stockholm. Scania is developing a prototype of a fully autonomous bus called Klara. A 64-channel LiDAR is mounted with around 14 degrees pitch angle on the head of Klara to capture depth cues ahead. This project aims to propose an algorithm to detect the front drivable road region by classifying each 3D point as drivable road or obstacle. For this task, deep learning techniques have shown extraordinary results recently, but Scania would prefer not to implement such a data-driven black-box model, which may output unpredictable and unexplainable detection results under unseen conditions. Thus, the primary focus of this project is restricted to traditional methods exploiting geometry and projection attributes.

In this project, a solution for drivable road region detection is proposed. It proceeds in five main steps: disparity image generation, crude obstacle removal, road profile extraction, candidate boundary point identification, and road boundary regression. The block diagram of the proposed system is illustrated in Fig. 1.1. The proposed method is evaluated on real-world datasets, and the results for different traffic scenarios are analyzed. A resulting sample is shown in Fig. 1.2.


Figure 1.1: The block diagram of the proposed drivable road region detection system

Firstly, the ground surface and obstacles are extracted from the LiDAR point cloud based on the U-V-disparity technique. Originally, disparity is a concept modeled in the camera coordinates. For the case of 3D LiDAR, the disparity image is derived by projecting the forward point cloud onto an image plane and


computing the corresponding inverse depth up to a scale. In the U-disparity map, obstacles such as vehicles, pedestrians, trees and guardrails are mostly presented as pixels with high intensity. In the V-disparity map, an ideal plane is projected as a straight line. However, for the non-flat or multiplanar ground surfaces of the real world, the corresponding projection is an indeterminate combination of line segments. Instead of 2D line fitting algorithms, the Sobel edge detection filter is used to extract a road profile from the V-disparity map.

The extracted ground surface still retains some undrivable road structures, including curbs, ditches and grasses. The points corresponding to these road structures generally have a large geometrical variation from their neighbors. Therefore, we propose three kinds of features to detect such points, which are seen as candidate boundary points. The road boundary is assumed to be observable and straight. Thus, a robust linear regression method is applied to fit the left and right road boundaries. The ground points between the two resulting road boundaries represent the final drivable road region.

(a) Result of ground surface extraction and boundary regression (The ground truth of the drivable road region is presented in light blue. Yellow denotes ground surface. The green points represent obstacles. Feature points are denoted in blue, and the regressed road boundaries are plotted as red lines)

(b) Final result of drivable road region detection (The green color denotes true positives, blue color represents false positives and red color false negatives)

Figure 1.2: Resulting sample (The classified point cloud is projected and overlaid on the synchronized image for better visualization and understanding)


1.2 Research Question

A frame of point cloud from a 64-channel LiDAR contains millions of 3D points, which makes searching and filtering operations among points computationally expensive. How can we design an efficient non-data-driven method to segment the point cloud into clusters of drivable road region and obstacles? Can such a method overcome the irregularity and sparsity of LiDAR data? Can such a method remain general across different traffic scenarios?

1.3 Outline

This report is organized as follows:

    • Chapter 2 reviews the existing work in drivable road region detection.

• Chapter 3 describes the prerequisite theory to help understand the proposed method.

    • Chapter 4 explains the proposed method and experimental setup.

• Chapter 5 presents and analyzes the experimental results in different road scenarios. The advantages and limitations of the proposed method are also discussed.

• Finally, Chapter 6 concludes this thesis and suggests some potential future work.

Chapter 2

    Related Work

This chapter summarizes the related work in the area of drivable road region detection.

Over the decades, a large number of solutions based on different sensors have been proposed for road perception. Hillel [1] surveys the research progress in road and lane detection. The predominant approaches in this area apply their models to image data. Since traffic infrastructure, such as lane marks and road signs, is particularly designed and built for the human visual system, the texture and color cues in image data are the most straightforward. Besides, with current manufacturing development, cameras are cheap to access and robust in use. With prior knowledge of traffic scenes, road region detection is generally cast as a pixel-wise segmentation problem [4, 5, 6] or a vanishing point detection task [7, 8, 9]. On the other hand, stereo imaging methods extract 3D information from an image pair and model based on 3D spatial relations rather than photometric information [10, 11]. Although the depth estimation of stereo imaging cannot reach the same accuracy and reliability as LiDARs, the modeling methodology regarding object detection and classification can mostly be transferred to LiDAR applications. In addition, in order to overcome the drawbacks of a single sensor, some works focus on fusion between image and LiDAR data by projecting the LiDAR points onto the image plane [12, 13, 14].

This project aims at exploring a solution with pure LiDAR data. Thus, only work exploiting 3D cues will be discussed in the following.


2.1 Rigid Model Fitting

Traditionally, given 3D data, road perception follows this process: first, separate the traffic scene into geometrical structures such as road surfaces, curbs, poles, planes and corners; then, establish rigid models to represent these structures.

    Road surface

In general, the road surface is mostly assumed to be an ideal plane. The most conventional approaches to plane extraction are random sample consensus (RANSAC) [15], Hough-based methods [16], and normal estimation. However, these methods are restricted to planar road surfaces. For non-planar road surfaces, Ai [17] applies a quadratic road model combined with least-squares fitting. In [18], the ground surface approximation is represented by a B-Spline model.

    Road boundary

Traffic roads are generally separated by different kinds of road structures, including curbs, grasses, ditches and guardrails. Similarly, all these structures have large elevation variation in their surroundings. The differential filter is commonly used to extract curb features from a single laser scan, and can be convolved with different kinds of data representation, such as elevations [19, 20], ranges [21], and top-view Euclidean positions [22]. Some prior knowledge is utilized to improve the accuracy of curb detection, including road width and curb height [21]. In [20], the least trimmed squares regression method is applied to robustly fit the road boundary. Work by Fernández [23] proposes a method based on 3D curvature which can tackle curbs with different heights.

    Lane mark

Lane marks rarely become the perception target of LiDAR-based techniques. Since lane marks are generally painted with negligible thickness on the road, the intensity measurement would be the only useful information. However, it is actually not reliable enough and is easily affected by range and weather. Nonetheless, in [24] intensity features are first extracted by computing the gradients in single scans, and the Radon transform is then used to estimate straight lane marks. In [25], the Otsu thresholding method is applied to segment LiDAR points into asphalt and road marking.


2.2 Occupancy Grid Map based

The occupancy grid map is one of the commonly used representations of LiDAR data. It splits a certain range of space into equally sized cells. Each cell stores some predefined state variables and is recursively updated based on the Bayes rule. Hebert et al. [26] introduce the concept of the elevation map, which represents 3D information in a 2D bird's-eye view and stores the height of the surface in each cell. Triebel et al. [27] propose a multiple surface representation which extends the model's ability to handle complex structures. Compared to the conventional point cloud representation, such grid maps are more compact and organized.

In the grid map, each cell corresponds to a number of different 3D points. Generally, the variation of each set of points acts as a strong cue for the 3D object detection task. In [28] the elevation histogram of the points in each cell is combined with a graph model to detect obstacles. Zhao et al. [29] propose a robust curb and road surface detection method by extracting three spatial cues: the elevation difference, gradient value and normal orientation.

2.3 Disparity based

Disparity encodes the 3D information, and several kinds of methods model traffic scenes based on the disparity characteristics.

    U-V-disparity

First introduced by Labayrade [10], the V-disparity represents 3D ground surfaces and obstacles as simple piecewise linear curves in 2D space. The V-disparity is obtained by accumulating pixels of the same disparity value in each row of the (u, v) image coordinate system. In the V-disparity map, the longitudinal road profile is projected as a straight line, while on-road obstacles are single vertical lines. Similarly, Hu et al. [30] proposed the concept of U-V-disparity and extended its classification types to ground surfaces, obstacles, and roadside structures. Gao et al. [31] proved the applicability of the U-V-disparity technique to active 3D imaging cameras. All methods above employ curve-fitting techniques, e.g., the Hough transform, to extract line features from the U-V-disparity domain. However, they cannot tackle scenarios with highly unstructured ground surfaces, such as uphill, downhill, or undulating hills. The approach in [32] extracts such ground surfaces by a nonparametric strategy in the U-V-disparity domain. With the assistance of U-V-disparity techniques,


further perception works on other road features were carried out, including lane marks [33, 34] and potholes [35, 36].

    Stixel world

Another technique based on disparity is the "Stixel World" [11], which was proposed and mainly promoted by David Pfeiffer. The Stixel World is a medium-level representation of 3D traffic scenes that segments an image or depth map into superpixels, where each superpixel is a rectangular stick with a certain height and class label, named a "stixel". In the U-V-disparity domain, this segmentation is inferred by solving a maximum-a-posteriori (MAP) problem, i.e., by minimizing an energy function [37, 38]. In general, the Stixel World can be used to separate free space, static obstacles, moving objects, and background [11, 39, 40]. It has also proven applicable to both stereo images and 3D LiDAR range data [41, 37]. Franke further extended the availability of this technique to the cases of adverse weather [42] and slanted streets [43]. Besides disparity, image color and texture information is used to improve the performance [44, 45, 46].

    Lidar-histogram

Inspired by the U-V-disparity work, Kong et al. [47] proposed a method called the Lidar-histogram to segment a Lidar point cloud into road plane, positive and negative obstacles. Generally, the U-V-disparity technique requires a disparity map generated from stereo image matching. An alternative way of acquiring a disparity map is projecting 3D point clouds onto an image plane. Similar to the V-disparity map, the Lidar-histogram simplifies the segmentation problem into a 2D linear fitting task. To refine the detected road region, Kong et al. [48, 49] further designed a row and column scanning strategy based on the height difference. However, its performance is limited by the sparsity and discontinuity of Lidar data. Therefore, Kong explored some improvements based on the fusion of LiDAR and camera, including upsampling the point cloud [50] and combining the FCNN-based results [51].

2.4 Deep Learning based

In the past five years, almost all LiDAR-based object detection methods that rank at the top of mainstream benchmarks are based on deep learning. The unprecedented enhancement of computing power and access to large-scale datasets make deep learning training applicable. Considerably, deep learning based methods have proven more generalizable and accurate than traditional methods.

Predominantly, one category of approaches transforms the irregular LiDAR data into 2D images and treats the problem as typical image semantic segmentation, including SqueezeSeg [52] and PointSeg [53] in spherical coordinates, LiLaNet [54] in cylindrical coordinates, and work by Dewan [55] in bird's-eye view. With a 2D image representation, a great number of state-of-the-art image-based deep learning frameworks can be applied. Another category of methods projects the point cloud onto regular Cartesian voxel grids and extends convolution layers to 3D space, such as SEGCloud [56] and OctNet [57]. Although the spatial relations are kept, this kind of method struggles to deal with sparse LiDAR point clouds and to reach real-time performance. Differently, PointNet [58] proposes a network architecture which can directly consume raw point clouds. Basically, it combines CNN structures with both local and global point features which are invariant to point order. It has proven useful and successful in both object classification and semantic segmentation tasks, but is restricted to relatively small-scale scenes.

2.5 Method Choice

For this project, deep learning based methods are not preferred due to their unexplainable output and the need for heavy data annotation work. Besides, the occupancy grid map discretizes point clouds into compact and organized cells, but results in lots of empty or barren cells; processing such data is inefficient and wastes computation.

Man-made road structures generally keep certain statistical properties of their 3D points. A variety of existing features are available for road perception, such as 3D curvature, normal and elevation difference. But it is not trivial to design a robust enough feature under the influence of diverse on-road obstacles. Moreover, a frame of data captured by a 64-channel LiDAR contains millions of 3D points, which means that computing features over such large data would be a heavy calculation burden. Inspired by the Lidar-histogram work [47], the 3D ground surface and obstacle extraction can be converted into a simple 2D line detection task in the U-V-disparity domain, thereby avoiding massive pointwise operations in 3D space. In this project, improvements are made specifically for the case of non-flat or multiplanar ground surfaces. The removal of obstacle points not only largely simplifies the feature design, but also saves a large amount of computation.

Chapter 3

    Theory

The basic theories about the U-V-disparity and the 3D LiDAR sensor are explained in the following.

3.1 Stereo Vision

Stereo vision is a broad research topic in computer vision aimed at extracting 3D information from two or more images.

3.1.1 Stereo Camera Model

The most specialized and standard case of a stereo vision system consists of two identical pinhole cameras displaced horizontally from each other, in a similar manner to human binocular vision. Its general setup is illustrated in Fig. 3.1.

Under the pinhole camera model, the central projection of a point (X, Y, Z) in the camera coordinates is simply expressed as a linear mapping:

u = f_u \frac{X}{Z} + u_0, \qquad v = f_v \frac{Y}{Z} + v_0    (3.1)

where the parameters u and v denote the image coordinates; f_u = \alpha / s_u and f_v = \alpha / s_v represent the focal length in terms of pixels; \alpha is the focal length in terms of distance, and s_u and s_v are the sizes of each pixel in the u and v directions. In the following, we assume f_u = f_v and replace both with f.


Figure 3.1: The general stereo camera model, where P is a point in the real-world coordinates, b is the baseline distance, O_{l,r} are the optical centers, and p = (u_0, v_0) is the principal point.

    Homogeneous coordinates

Homogeneous coordinates are a specific coordinate system in projective geometry. They work by adding an extra coordinate to a Euclidean coordinate. For example, a coordinate triple (kx, ky, k) with any non-zero value k represents the same point (x, y) in 2D Euclidean space. In this way, the triple (x, y, 0) corresponds to the point (x/0, y/0) at infinity.

In general, points in space are expressed in terms of different Euclidean coordinate frames. Two different coordinate frames are related via a rotation and a translation. With the assistance of homogeneous coordinates, the projective mapping from world coordinates to image coordinates can be easily expressed by matrix multiplication:

k \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \mid T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}    (3.2)

K = \begin{bmatrix} f & \gamma & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad [R \mid T] = \begin{bmatrix} R_{3\times3} & T_{3\times1} \\ 0_{1\times3} & 1 \end{bmatrix}    (3.3)

where K is the intrinsic matrix corresponding to Eq. 3.1; \gamma denotes the skew coefficient between the x and y axes of the camera coordinates and is often approximately 0. Besides, [R \mid T] holds the extrinsic parameters used to transform from world coordinates to camera coordinates; R and T are respectively the rotation matrix and translation vector, and k is the scaling factor.
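To make Eq. 3.2 concrete, the following Python sketch (numpy) maps a world point to pixel coordinates. The focal length, principal point, and pose used here are made-up illustrative values, not calibration results from this thesis.

```python
import numpy as np

def project_point(X_w, K, R, T):
    """Project a 3D world point to pixel coordinates via Eq. 3.2:
    k [u, v, 1]^T = K [R|T] [Xw, Yw, Zw, 1]^T."""
    X_h = np.append(X_w, 1.0)                       # homogeneous world point
    Rt = np.vstack([np.hstack([R, T.reshape(3, 1)]),
                    [0, 0, 0, 1]])                  # 4x4 extrinsic matrix [R T; 0 1]
    uvk = K @ Rt @ X_h                              # 3-vector equal to k*[u, v, 1]
    return uvk[:2] / uvk[2]                         # divide by the scaling factor k

# Assumed example values: f = 700 px, principal point (640, 360), zero skew
f, u0, v0 = 700.0, 640.0, 360.0
K = np.array([[f, 0, u0, 0],
              [0, f, v0, 0],
              [0, 0, 1,  0]], dtype=float)
R, T = np.eye(3), np.zeros(3)                       # world and camera frames aligned
print(project_point(np.array([1.0, 0.5, 10.0]), K, R, T))   # -> pixel (u, v)
```

With R = I and T = 0 the result reduces to the plain pinhole mapping of Eq. 3.1.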

    Disparity

The disparity refers to the difference between the u coordinates of two corresponding points within a stereo image pair.

Based on Fig. 3.1, the transformation from the world coordinates to the two camera coordinates is achieved by a simple vector translation of \pm b/2 in the x direction. Therefore, the disparity can be easily obtained by:

\Delta = u_l - u_r = \frac{fb}{Z}    (3.4)

As shown in Eq. 3.4, the disparity is inversely proportional to the distance from the observer. Thus, given the disparity map we can directly recover the 3D depth of each pixel.

3.1.2 Stereo Matching

Stereo matching is the process of estimating a 3D model of the scene by finding the pixels in two or more views that correspond to the same 3D position. Its primary task is to measure the disparity map accurately and efficiently. So far, stereo matching is still an actively studied and fundamental research area.

    Basically, stereo matching techniques follow four steps:

1) Calibration: Obtains the intrinsic and extrinsic parameters of the stereo camera offline.

2) Rectification: Uses the calibrated parameters to remove lens distortions and transform the stereo pair into the standard setup of Fig. 3.1.

3) Stereo Correspondence: Aims at finding homologous points in the stereo pair and estimating the disparity map. This tough procedure faces lots of specific pitfalls, such as photometric noise, ambiguous patterns, occlusions, and discontinuities. Therefore, a wide range of methods has arisen for robustness and optimization. In general, suitable algorithms are adapted to the image data accordingly.

4) Triangulation: Computes the 3D position of each pixel based on Eqs. 3.1 and 3.4, given the calibrated parameters and the disparity map.

Although typical stereo matching algorithms are computationally expensive, utilizing FPGAs (Field Programmable Gate Arrays) or GPUs (Graphics Processing Units) can allow for real-time performance.

3.1.3 U-V-disparity Domain

The U-V-disparity domain is used to describe the relationship between the image coordinates (u, v) and the disparity in stereo vision. It is commonly used to detect ground surfaces and structured obstacles in 3D scenes.

In general, the stereo rig is mounted on a robot approximately parallel to the ground surface as shown in Fig. 3.2. We use \psi, \phi, \theta to represent the yaw, roll and pitch angles of the camera coordinates with respect to the world coordinates. These three angles are often approximately equal to 0.

Figure 3.2: The setup of the stereo camera and world coordinate system, where the parameter denotation is the same as in Fig. 3.1.

Under the setup of Fig. 3.2, the projective mapping follows Eq. 3.2 with the intrinsic and transformation matrices:

K = \begin{bmatrix} f & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}

R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \times \begin{bmatrix} \cos\psi & 0 & \sin\psi \\ 0 & 1 & 0 \\ -\sin\psi & 0 & \cos\psi \end{bmatrix} \times \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}

T_{l,r} = \begin{bmatrix} \pm \frac{b}{2} & 0 & 1 \end{bmatrix}^T    (3.5)

From Eqs. 3.2 and 3.5, we can derive the image coordinates (u, v) and the disparity:

u_{l,r} = u_0 + f \frac{X\cos\psi\cos\phi - Y\cos\psi\sin\phi + Z\sin\psi \pm b/2}{k}

v = v_0 + f \frac{X(\cos\theta\sin\phi - \sin\theta\sin\psi\cos\phi) + Y(\cos\theta\cos\phi + \sin\theta\sin\psi\sin\phi) - Z\sin\theta\cos\psi}{k}

\Delta = u_l - u_r = f \frac{b}{k}

k = X(\sin\theta\sin\phi - \cos\theta\cos\phi\sin\psi) + Y(\sin\theta\cos\phi + \cos\theta\sin\phi\sin\psi) + Z\cos\theta\cos\psi    (3.6)

To simplify Eq. 3.6, we set the yaw and roll angles to 0 and use new image coordinates (U, V) with respect to the camera principal point:

U_{l,r} = u_{l,r} - u_0 = f \frac{X \pm b/2}{Y\sin\theta + Z\cos\theta}

V = v - v_0 = f \frac{Y\cos\theta - Z\sin\theta}{Y\sin\theta + Z\cos\theta}

\Delta = u_l - u_r = \frac{fb}{Y\sin\theta + Z\cos\theta}    (3.7)

3.1.4 3D Planes Projection in U-V-disparity Domain

Man-made environments are dominated by planar horizontal, vertical and oblique surfaces. For instance, the nearby ground surface approximates horizontal or slightly oblique planes, and most obstacles and buildings stand vertically on the ground. Fig. 3.3 shows six types of planes in the world coordinates. These typical planes can be mapped as simple linear curves in the U-V-disparity domain [10][30].


Figure 3.3: The typical planes in the stereo system

    Horizontal plane

Horizontal planes (white in Fig. 3.3) in the world coordinates can be simply described as:

Y = \lambda    (3.8)

Substituting Eq. 3.8 into Eq. 3.7 gives:

\frac{\lambda}{b}\Delta = V\cos\theta + f\sin\theta    (3.9)

Therefore, a horizontal plane in the world coordinates will be projected as a straight line in the V-disparity domain.

    Vertical plane

Vertical planes (yellow in Fig. 3.3) in the world coordinates can be represented as:

Z = \lambda    (3.10)

Combining Eqs. 3.7 and 3.10, we can deduce:

\frac{\lambda}{b}\Delta = -V\sin\theta + f\cos\theta    (3.11)


It shows that a vertical plane in the world coordinates will also be projected as a straight line in the V-disparity domain. If the pitch angle \theta is sufficiently small, then Eq. 3.11 becomes equivalent to Eq. 3.4:

\Delta \approx \frac{fb}{\lambda}    (3.12)

    Side surface plane

Side surface planes (purple in Fig. 3.3) in the world coordinates can be expressed as:

X = \lambda    (3.13)

Substituting Eq. 3.13 into Eq. 3.7 derives a linear relationship in the U-disparity domain with respect to the left image:

\frac{2\lambda + b}{2b}\Delta = U_l    (3.14)

    Oblique plane

Some other, more general types of planes also exist in man-made environments. The red case in Fig. 3.3 can be described as:

Z = kY + m    (3.15)

Combining Eqs. 3.7 and 3.15, we can prove that such planes are also projected as straight lines in the V-disparity domain:

\frac{m}{b}\Delta = -V(\sin\theta + k\cos\theta) + f(\cos\theta - k\sin\theta)    (3.16)

    The green case in Fig. 3.3 can be expressed as:

    Z = kX +m (3.17)

Similarly, we can deduce the relationship between (U, V) and the disparity:

\frac{2m - kb}{2b}\Delta = -V\sin\theta - kU_l + f\cos\theta    (3.18)

When the pitch angle \theta is sufficiently small, the disparity becomes linearly related to U:

\frac{2m - kb}{2b}\Delta = -kU_l + f    (3.19)


    The blue case in Fig. 3.3 can be modeled by:

    Y = kX +m (3.20)

Following the same principle, we can obtain:

\frac{2m - kb}{2b}\Delta = kU_l + V\cos\theta + f\sin\theta    (3.21)

Differently, the projection of this type of oblique plane does not follow a linear relation.

3.2 3D LiDAR

Currently, the mechanical scanning LiDAR is the most commonly used type of laser sensor in autonomous driving. It can collect data over a wide area of up to 360 degrees by physically rotating a laser/receiver assembly, or by rotating a mirror to steer a light beam. Each emitted beam is known as one channel, and various numbers of channels are available, including 1, 4, 16, 32, 64 and 128. The range measurement can be directly calculated based on the time difference between the emitted pulse and the received pulse.

3.2.1 LiDAR Model

A typical 64-channel LiDAR can capture millions of precise distance measurement points every second. The emitted orientation of each point is generally fixed and defined by the azimuth and zenith angles. Therefore, the raw LiDAR data consist of four kinds of information: azimuth angle, zenith angle, range, and intensity. Besides, the resolution of the azimuth angle can be adjusted by the alterable rotational speed. For example, a rotational frequency of 10 Hz corresponds to collecting one point per 0.2 degree, so the azimuth resolution is 360/0.2 = 1800 points per scan.

As shown in Fig. 3.4, the scans with small zenith angles generally measure distant objects. Correspondingly, the point cloud becomes sparser as the perception distance increases. In order to avoid the irregularity and sparsity problem, each frame of LiDAR data is generally stored as regular images as shown in Fig. 3.5.
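As a concrete illustration of this sensor model, the sketch below (Python/numpy) converts raw (azimuth, zenith, range) measurements into Cartesian points and stores one frame as a regular range image. The angle convention (x forward, y left, z up, zenith measured from the horizontal plane) and the 64 × 1800 image size are assumptions for illustration, not the format of the actual sensor used in this project.

```python
import numpy as np

def spherical_to_cartesian(azimuth, zenith, r):
    """Convert LiDAR measurements (angles in radians) to x, y, z coordinates.
    Assumed convention: x forward, y left, z up; zenith from the horizontal plane."""
    x = r * np.cos(zenith) * np.cos(azimuth)
    y = r * np.cos(zenith) * np.sin(azimuth)
    z = r * np.sin(zenith)
    return np.stack([x, y, z], axis=-1)

def build_range_image(azimuth, ring, r, n_rings=64, n_cols=1800):
    """Store one LiDAR frame as a regular range image:
    rows = zenith channel (ring index), columns = discretized azimuth angle."""
    img = np.zeros((n_rings, n_cols), dtype=np.float32)
    col = (np.mod(azimuth, 2 * np.pi) / (2 * np.pi) * n_cols).astype(int) % n_cols
    img[ring, col] = r
    return img
```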


Figure 3.4: Model of mechanical scanning LiDARs (The blue cylinder denotes the LiDAR sensor and the long arrows represent laser beams)

    (a) Intensity image

    (b) Range image

Figure 3.5: Regular LiDAR image (The x-axis corresponds to the azimuth angle and the y-axis corresponds to the zenith angle)

Chapter 4

    Methodology

This chapter describes the method of detecting the drivable road region and the actual implementation in detail.

4.1 Ground Surface Extraction

The U-V-disparity techniques have proven useful and robust for obstacle and ground surface detection in traffic scenes. However, most research works derive and apply the model for the case of image data. This work extends the disparity-based method's applicability to 3D LiDAR data.

4.1.1 Disparity Image Generation

In general, the disparity in stereo vision is obtained by matching pixels in an image pair and computing their distance. Alternatively, LiDARs measure the depth of each real-world point, and the disparity can be directly obtained based on Eq. 3.4.

    Central projection

We can assume one camera plane is placed parallel to the Y-Z plane of the LiDAR coordinates as shown in Fig. 4.1. Under the basic pinhole model, each forward 3D point from the LiDAR can be mapped to image coordinates (u, v) by Eq. 3.1.

In general, the density of the LiDAR point cloud is lower than the image resolution. As a result, the projected image will be full of missing pixels as shown in Fig. 4.2. The data density is principally determined by the focal length in terms of pixels f and the range of the azimuth angle.


Figure 4.1: The setup of the virtual camera and LiDAR coordinate system. (O_l is the origin of the LiDAR coordinates and coincides with the optical center O_i)

Eq. 3.1 shows a negative correlation between the focal length and the density. If the focal length is too small, a large proportion of 3D points will be projected into the same pixels. On the contrary, the image data will be too sparse with a large focal length. On the other hand, the image data becomes sparser as the deflection angle from the Z-axis increases. Thus, we should carefully choose suitable values of the two parameters.

Due to the LiDAR's scanning property, the projected region is in the shape of an hourglass as shown in Fig. 4.2. The regular image is therefore obtained by cropping it to a rectangle.

Figure 4.2: Projected image sample with grayscale LiDAR intensity (The red bounding box covers the region to crop)


    Scaled disparity

Since U-V-disparity methods only need to compute histograms of the disparity, it is unnecessary to obtain the real disparity of stereo vision. The equivalent disparity is defined as:

\Delta = k \cdot \frac{1}{X}    (4.1)

where k is the scale factor and can be an arbitrary appropriate constant value.
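The following sketch (Python/numpy) combines the central projection and the scaled disparity of Eq. 4.1: forward LiDAR points are mapped onto a virtual image plane parallel to the Y-Z plane, and every hit pixel stores k/X. The focal length, image size, scale factor, and sign conventions are illustrative assumptions rather than the values used in the actual implementation.

```python
import numpy as np

def lidar_to_disparity_image(points, f=400.0, width=1200, height=400, k=1000.0):
    """Project forward LiDAR points (N x 3, LiDAR frame: X forward, Y left, Z up)
    onto a virtual image plane and store the scaled disparity k/X (Eq. 4.1)."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    fwd = X > 1.0                                    # keep forward points beyond 1 m
    X, Y, Z = X[fwd], Y[fwd], Z[fwd]
    u = (-f * Y / X + width / 2).astype(int)         # horizontal pixel coordinate
    v = (-f * Z / X + height / 2).astype(int)        # vertical pixel coordinate
    disp = np.zeros((height, width), dtype=np.float32)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    disp[v[valid], u[valid]] = k / X[valid]          # scaled disparity, Eq. 4.1
    return disp
```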

    Interpolation

As for the sparsity problem, the simplest solution is to linearly interpolate the missing pixels using the nearest non-zero pixels. Notably, some materials, such as water and glass, cannot bounce laser pulses back to the LiDAR receiver. Thus, in real datasets relatively large segments are often missing, so the search for non-zero pixels is limited to a particular range.

    Basically, the interpolation follows two steps:

1) For a missing pixel with coordinates (u, v), find the nearest non-zero pixels above and below within the coordinate range [v − d, v + d], then calculate the new disparity by linear interpolation:

\Delta_{new} = \frac{\Delta_{up}(v - v_{down}) + \Delta_{down}(v_{up} - v)}{v_{up} - v_{down}}    (4.2)

2) If two non-zero pixels are not found in the v direction, process similarly in the u direction using the nearest non-zero pixels to the left and right:

\Delta_{new} = \frac{\Delta_{left}(u_{right} - u) + \Delta_{right}(u - u_{left})}{u_{right} - u_{left}}    (4.3)

Through the interpolation, the result will look like a regular disparity image, as shown in Fig. 4.3. However, the interpolation not only brings in artifacts but also increases the computation complexity by O(MN) for an image with a resolution of (M, N). In the following section, we will analyze and conclude that the interpolation step is not necessary.


    (a) Original disparity image in grayscale

    (b) Interpolated disparity image in grayscale

    (c) Interpolated disparity image in pseudo color

    Figure 4.3: Resulting sample of linear interpolation

4.1.2 U-disparity Map

The U-disparity map is obtained by computing disparity histograms column by column from the disparity image. The disparity value is manually set to range from 0 to maxDisparity_U and then evenly split into bins_U bins. The histogram is defined as follows:

H^i_m = \sum_{n=0}^{rows} \xi_{m,n}, \qquad \xi_{m,n} = \begin{cases} 1, & \text{if } \Delta(m, n) \text{ is in the } i\text{th bin} \\ 0, & \text{otherwise} \end{cases}    (4.4)

where m and n are respectively the column and row index. The histograms are then arranged according to their corresponding column index. The value of a pixel in the U-disparity map is the accumulated count of a certain disparity. Therefore, the U-disparity map can be regarded as a feature image whose x-axis is the column index and whose y-axis is the discrete disparity. Fig. 4.4 illustrates the U-disparity map of the sample of Fig. 4.3.


    Figure 4.4: U-disparity map sample

4.1.3 V-disparity Map

Using a similar principle, the V-disparity is obtained by accumulating the pixels with the same disparity in a row-wise manner. The two parameters maxDisparity_V and bins_V should be set as well. The histogram is defined as:

H^i_n = \sum_{m=0}^{cols} \xi_{m,n}, \qquad \xi_{m,n} = \begin{cases} 1, & \text{if } \Delta(m, n) \text{ is in the } i\text{th bin} \\ 0, & \text{otherwise} \end{cases}    (4.5)

As shown in Fig. 4.5a, the V-disparity map provides a side-view projection of the 3D scene.
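A minimal sketch of how the histograms of Eqs. 4.4 and 4.5 can be computed (Python/numpy); the maximum disparity and bin counts below are placeholder parameters corresponding to maxDisparity_U/V and bins_U/V.

```python
import numpy as np

def uv_disparity_maps(disp, max_disp_u=1000.0, bins_u=200,
                      max_disp_v=1000.0, bins_v=200):
    """Compute the U-disparity map (column-wise histograms, Eq. 4.4) and the
    V-disparity map (row-wise histograms, Eq. 4.5). Zero pixels are ignored."""
    rows, cols = disp.shape
    u_map = np.zeros((bins_u, cols), dtype=np.int32)
    v_map = np.zeros((rows, bins_v), dtype=np.int32)
    for m in range(cols):                            # one disparity histogram per column
        d = disp[:, m]
        d = d[(d > 0) & (d < max_disp_u)]
        u_map[:, m], _ = np.histogram(d, bins=bins_u, range=(0, max_disp_u))
    for n in range(rows):                            # one disparity histogram per row
        d = disp[n, :]
        d = d[(d > 0) & (d < max_disp_v)]
        v_map[n, :], _ = np.histogram(d, bins=bins_v, range=(0, max_disp_v))
    return u_map, v_map
```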

    (a) Original (b) After obstacle removal

    Figure 4.5: V-disparity map sample


4.1.4 Crude Obstacle Removal

In traffic scenes, most obstacles, such as vehicles, pedestrians, cyclists, buildings and guardrails, can be presented as a combination of vertical planes in the world coordinates. As explained in section 3.1.4, for the case of vertical planes the disparity is linearly correlated with only the column index u. Therefore, in a certain column the pixels from obstacles will have roughly the same disparity value. That is, the pixels with high intensity in the U-disparity map correspond to obstacles.

Thus, the obstacles can be identified by applying a thresholding operation in the U-disparity map. If the intensity value of a pixel in the U-disparity map is larger than a certain threshold thr_U, its corresponding pixel in the original disparity image is labeled as an obstacle.
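A sketch of this thresholding step (Python/numpy), assuming the U-disparity map was built with the same binning as in the previous sketch; thr_u = 20 is only a placeholder value for thr_U.

```python
import numpy as np

def remove_obstacles(disp, u_map, max_disp_u=1000.0, thr_u=20):
    """Zero out disparity pixels whose U-disparity bin count exceeds thr_u,
    i.e. pixels that very likely belong to vertical obstacle surfaces."""
    bins_u = u_map.shape[0]
    rows, cols = disp.shape
    disp_out = disp.copy()
    cols_idx = np.tile(np.arange(cols), (rows, 1))                  # column index per pixel
    bin_idx = np.clip((disp / max_disp_u * bins_u).astype(int), 0, bins_u - 1)
    obstacle = (disp > 0) & (u_map[bin_idx, cols_idx] > thr_u)      # look up accumulated count
    disp_out[obstacle] = 0.0                                        # remove obstacle pixels
    return disp_out
```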

Figure 4.6: Resulting disparity image in pseudo color after obstacle removal (red means a value of zero)

As shown in Fig. 4.6, not all pixels from obstacles are removed completely. Actually, this stage is intended to simplify the V-disparity map's representation. As illustrated in Fig. 4.5b, the vertical lines with high intensity in Fig. 4.5a which correspond to obstacles are eliminated. In this way, the road profile can be extracted more accurately from the preprocessed V-disparity map, which will be discussed in the following section.

4.1.5 Longitudinal Road Profile Extraction

With the linear characteristics described in section 3.1.4, we can infer that after the previous step, the pixels with high intensity in the V-disparity map are very likely to correspond to the ground surfaces and are supposed to form a continuous curve. In general urban scenes, the well-structured ground surface mostly consists of one single plane and will be projected as an ideal straight line in the V-disparity map. Thus, typical 2D line fitting algorithms, like the Hough transform and geometric hashing, have proven applicable and reliable, but they fail for the cases of non-flat or multiplanar ground surfaces.

In general, the projection of ground surfaces is surrounded by pixels with low intensity. Instead of fitting rigid models, we use the Sobel gradient operator to detect the corresponding edges.


The operator uses two 3×3 kernels which are convolved with the V-disparity map to compute the vertical and horizontal derivative approximations:

G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I_v, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * I_v    (4.6)

where I_v is the V-disparity map and * denotes the 2D convolution operation.

The gradient in any orientation is approximated by combining the filter responses in the vertical and horizontal orientations, and the resulting gradient magnitude is defined as follows:

G = \sqrt{G_x^2 + G_y^2}    (4.7)

Using the Sobel operator enhances the projection of ground surfaces in the V-disparity map. Applying a thresholding operation can then directly separate the potential pixels as shown in Fig. 4.7. The threshold parameter is denoted as magThr.

    (a) Gradient magnitude (b) After thresholding operation

Figure 4.7: Resulting sample of the Sobel operator (Pixels in the red circles are outliers)

As shown in Fig. 4.7b, some outliers still exist but generally keep a distance away from the road profile. A simple row scanning strategy is proposed to extract the road profile on the thresholded binary image (a code sketch of this strategy follows the list):


• First, the initial road profile consists of the rightmost white pixel of each row.

• Then an outlier checking process is carried out from the second-to-bottom row to the top row. If the column index of the initial road profile \Lambda_i is out of the range [\Lambda_{i+1} - outlierThr, \Lambda_{i+1} + outlierThr], the outlier is replaced by the rightmost white pixel within this range. If no white pixel exists within this range, then:

\Lambda_i = \Lambda_{i+1} - 1    (4.8)

where i denotes the row index from 1 to the image height and outlierThr is a constant threshold value.
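A sketch of the scanning strategy (Python/numpy); binary is the thresholded gradient-magnitude image of Fig. 4.7b, and outlier_thr plays the role of outlierThr (its value is a placeholder).

```python
import numpy as np

def extract_road_profile(binary, outlier_thr=3):
    """Row-scan the thresholded V-disparity gradient image and return, for every
    row, the column index of the longitudinal road profile (-1 for empty rows)."""
    rows, cols = binary.shape
    profile = np.full(rows, -1, dtype=int)
    # initial profile: rightmost white pixel of each row
    for i in range(rows):
        white = np.flatnonzero(binary[i])
        profile[i] = white[-1] if white.size else -1
    # outlier check from the second-to-bottom row upwards
    for i in range(rows - 2, -1, -1):
        low, high = profile[i + 1] - outlier_thr, profile[i + 1] + outlier_thr
        if not (low <= profile[i] <= high):
            white = np.flatnonzero(binary[i, max(low, 0):high + 1])
            profile[i] = max(low, 0) + white[-1] if white.size else profile[i + 1] - 1
    return profile
```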

    Applicability in the case of removing interpolation

All aforementioned results are based on the interpolated disparity image of Fig. 4.3b. We found the Sobel operator also works for the case of the original disparity image without the interpolation process.

As shown in Fig. 4.8a, some rows of the V-disparity map lack pixels with high intensity due to the missing-pixel problem in the original disparity image. But the corresponding gradient magnitude remains high since the 3 by 3 Sobel filter takes the surrounding pixels into consideration. Further experiments show that the interpolation step is not necessary and that this road profile extraction method is applicable to sparse disparity images.

    (a) V-disparity map (b) Thresholded map

Figure 4.8: Resulting sample for the case of removing interpolation (The red arrows point out the missing high-intensity pixels and the red pixels are the extracted road profile).


4.1.6 Road Point Extraction

Given the longitudinal road profile in the V-disparity map, the pixels whose disparity is smaller than the road profile can only correspond to negative objects, such as ditches and potholes. Pothole detection is beyond the scope of this project, and we assume that no potholes or other road surface distress exist. Besides, ditches are regarded as a class of road boundaries and will be discussed in the following section. Thus, the region with disparity no larger than the corresponding road profile is labeled as ground surface as shown in Fig. 4.9.

Figure 4.9: Extracted road region in pseudo color (red means a value of zero)

Similarly, it is straightforward to label the LiDAR points based on the disparity of the extracted road profile as follows. The 3D points whose corresponding disparity is not larger than the road profile are labeled as ground points. Besides, the points projected outside the image bounding box are called dead points since they are not involved in the proposed method. All remaining points are then obstacle points. Fig. 4.10 illustrates the labeling result.
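A sketch of this point labeling (Python/numpy). It reuses the projection conventions and placeholder parameters of the earlier disparity-image sketch and assumes the road profile is given as one column index per V-disparity row; none of these constants are the actual values used in this thesis.

```python
import numpy as np

ROAD, OBSTACLE, DEAD = 0, 1, 2

def label_points(points, profile, f=400.0, width=1200, height=400, k=1000.0,
                 bins_v=200, max_disp_v=1000.0):
    """Label each LiDAR point as road, obstacle, or dead by comparing its own
    scaled disparity k/X with the road-profile disparity of its image row."""
    labels = np.full(len(points), DEAD, dtype=int)
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    u = (-f * Y / np.maximum(X, 1e-6) + width / 2).astype(int)
    v = (-f * Z / np.maximum(X, 1e-6) + height / 2).astype(int)
    inside = (X > 1.0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # profile[v] is the road-profile column in the V-disparity map; convert it
    # back to a disparity value using the same binning as the V-disparity map.
    road_disp = profile[v[inside]] / bins_v * max_disp_v
    point_disp = k / X[inside]
    labels[np.flatnonzero(inside)] = np.where(point_disp <= road_disp, ROAD, OBSTACLE)
    return labels
```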

    Why not label based on the image extraction result

Since comparing the disparity value pixel by pixel to the road profile can produce the image result of Fig. 4.9, one intuitive but defective solution is to label LiDAR points by checking whether the corresponding projection is within the 2D road region.

Due to the discrete image coordinate system, a large portion of LiDAR points are actually projected onto the same pixels. The overlapping rate generally reaches 20%. That is, one pixel probably corresponds to several LiDAR points which actually belong to different classes. Therefore, it is better to label based on the disparity value of each point rather than that of each image pixel. Fig. 4.11 gives the resulting difference between the two labeling ways. The color representation will remain the same in the following point cloud visualizations.


(a) Front view

    (b) Top view

Figure 4.10: Resulting sample of labeling LiDAR points (Yellow denotes the road class, blue the dead points and green the obstacle class).

    (a) Pixel based (b) Point based

Figure 4.11: Comparison between the two labeling ways (The green points in the red circles are projected onto the same pixel in the image coordinates).


4.2 Road Boundary Detection

Despite the successful obstacle removal in the previous step, the result of the ground surface extraction still retains some road structures which mostly act as the drivable road boundary, including curbs, ditches, grasses and sidewalks. These different boundary structures commonly have a large geometrical variation in their surroundings. In the following, several types of features are proposed to obtain the candidate boundary points. Then, a regression algorithm is used to fit the boundary function for the case of a simple straight road.

4.2.1 Feature Design

The following three types of feature descriptors assign a response value to each LiDAR point. A large value means the corresponding point differs strongly from its surrounding points. A simple thresholding operation is applied to filter out the road boundary candidate points. The threshold parameter is denoted as feature_thr.

    3D surface curvature

The surface curvature describes the variation along the surface normal and is defined as follows: for each point p, its surrounding points p_i within a certain radius are selected, and the curvature corresponds to the relative weight of the smallest eigenvalue:

\bar{p} = \frac{1}{k}\sum_{i=1}^{k} p_i

C = \frac{1}{k}\sum_{i=1}^{k} (p_i - \bar{p})(p_i - \bar{p})^T

\sigma = \frac{\lambda_0}{\lambda_0 + \lambda_1 + \lambda_2}, \qquad \lambda_0 < \lambda_1 < \lambda_2    (4.9)

where \lambda are the eigenvalues of the covariance matrix C. The surface curvature varies between 0 and 1, and low values correspond to flat surfaces. This feature is generally more robust and stable than the surface normal [23], but requires the point cloud to be dense enough.
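A sketch of the curvature feature of Eq. 4.9 for a single query point (Python/numpy); the neighborhood search is done by brute force here, whereas a k-d tree would normally be used, and the radius value is a placeholder.

```python
import numpy as np

def surface_curvature(p, cloud, radius=0.5):
    """Eq. 4.9: curvature = smallest eigenvalue of the neighborhood covariance
    divided by the sum of eigenvalues; values near 0 indicate flat surfaces."""
    neighbors = cloud[np.linalg.norm(cloud - p, axis=1) < radius]
    if len(neighbors) < 3:
        return 0.0                                   # not enough support points
    centered = neighbors - neighbors.mean(axis=0)
    C = centered.T @ centered / len(neighbors)       # 3x3 covariance matrix
    eigvals = np.linalg.eigvalsh(C)                  # ascending: lambda0 <= lambda1 <= lambda2
    return eigvals[0] / eigvals.sum()
```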


    Elevation variance

It is assumed that the road is the smoothest surface, with unsmooth edges on both sides. Considering each laser scan, the variance in the vertical direction provides a discriminative descriptor of the smoothness. For each scan, the points labeled as ground surface are retained and sorted by their azimuth angles. For the ith point p_i = [x_i, y_i, z_i], its front and back k points are selected. The elevation variance is obtained by calculating the variance in the z direction:

\bar{z} = \frac{1}{2k+1}\sum_{j=i-k}^{i+k} z_j

var = \frac{1}{2k+1}\sum_{j=i-k}^{i+k} (z_j - \bar{z})^2    (4.10)

As shown in Fig. 4.12, the road boundary can be identified by searching for the left and right local extreme peaks.
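A sketch of the elevation variance of Eq. 4.10 along one azimuth-sorted scan (Python/numpy), with k = 3 as in Fig. 4.12; clipping the window at the two ends of the scan is an implementation choice, not something specified in the text.

```python
import numpy as np

def elevation_variance(scan_points, k=3):
    """Eq. 4.10: for every point, the variance of z over the window of its
    k preceding and k following points along the azimuth-sorted scan."""
    z = scan_points[:, 2]
    var = np.zeros(len(z))
    for i in range(len(z)):
        window = z[max(i - k, 0):i + k + 1]          # window clipped at the scan ends
        var[i] = window.var()
    return var
```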

    (a) LiDAR scan

    (b) Elevation variance (k = 3)

Figure 4.12: Resulting samples of the elevation variance feature (Yellow denotes the ground surface points and obstacle points are plotted in green. The points in the red circles correspond to curbs)

    Least square error (LSE) of linear regression

As shown in Fig. 3.4, the laser scans emitted onto an ideal planar road surface return concentric rings. We can easily infer that, for a short enough segment of these rings, the corresponding points are expected to be linearly correlated. However, the existence of road boundary structures largely deforms such rings as illustrated in Fig. 4.13, meanwhile breaking the linear characteristic at the road edges. Therefore, considering the bird's-eye view of each scan, the sequential points with low linear correlation are very likely to correspond to the road boundary.

Figure 4.13: Model of a laser scan on road surfaces, curbs, ditches and grasses (The dotted line segments represent one laser scan)

For each scan, the points labeled as ground surface are first sorted by azimuth angle. For the ith point p_i = [x_i, y_i, z_i], the selected sequence consists of its front and back k points. It is assumed that the sequence of points can be modeled as a simple linear function:

x_j = \beta_0 + \beta_1 y_j + \varepsilon_j, \qquad j \in [i - k, i + k]    (4.11)

The parameters \beta can be easily estimated by standard linear least squares. The sum of squared residuals is a straightforward indicator of the degree of linearity:

Least square error = \sum_{j=i-k}^{i+k} \varepsilon_j^2    (4.12)

Similar to the elevation variance feature, the extreme peaks correspond to the candidate boundary points as shown in Fig. 4.14.
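A sketch of the least-square-error feature of Eqs. 4.11 and 4.12 over a sliding window of an azimuth-sorted scan (Python/numpy), with k = 5 as in Fig. 4.14.

```python
import numpy as np

def lse_feature(scan_points, k=5):
    """Eqs. 4.11-4.12: fit x = b0 + b1*y to the 2k+1 points around each point
    (bird's-eye view) and return the sum of squared residuals of the fit."""
    x, y = scan_points[:, 0], scan_points[:, 1]
    lse = np.zeros(len(x))
    for i in range(k, len(x) - k):
        xs, ys = x[i - k:i + k + 1], y[i - k:i + k + 1]
        A = np.column_stack([np.ones_like(ys), ys])      # design matrix [1, y]
        beta, res, *_ = np.linalg.lstsq(A, xs, rcond=None)
        # lstsq returns the residual sum only for well-conditioned, overdetermined fits
        lse[i] = res[0] if res.size else np.sum((xs - A @ beta) ** 2)
    return lse
```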

4.2.2 Boundary Model

The candidate points identified by the previous step inevitably contain some false positives. A robust regression method called least trimmed squares (LTS) is applied to fit the boundary function as described in Algorithm 1.


    (a) LiDAR scan

    (b) LSE (k = 5)

Figure 4.14: Resulting samples of the LSE feature (The points in the red circles correspond to curbs)

Compared to the standard least squares method, the LTS method attempts to minimize the sum of squared residuals over a subset of the input points. The feature points with large residuals are iteratively removed and do not affect the final fit.

In order to simplify this task, it is assumed that straight boundaries exist on both the left and right sides. The road boundaries are fitted to a simple linear function. Therefore, such a method cannot tackle the cases of no boundary observation, a single boundary observation, or winding boundaries.

Algorithm 1 The Least Trimmed Squares Algorithm
Input: Independent variable X = {x_1, ..., x_N},
       Dependent variable Y = {y_1, ..., y_N},
       Outlier portion p
Output: Estimated parameter \beta
1: Number of points to remove n = length(X) \times p
2: for 1 to n do
3:   \beta = EstimateParameter(X, Y)
4:   \tilde{Y} = FitData(\beta, X)
5:   residual \gamma = abs(\tilde{Y} - Y)
6:   Remove the observation (x_i, y_i) with the maximal residual
7: return \beta
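A sketch of Algorithm 1 in Python (numpy) for a straight-line model; the outlier portion p = 0.3 is a placeholder, and the returned \beta is simply the estimate from the last iteration, as in the pseudocode.

```python
import numpy as np

def least_trimmed_squares(X, Y, p=0.3):
    """Algorithm 1: repeatedly fit by ordinary least squares and remove the
    observation with the largest absolute residual, n = length(X)*p times."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n_remove = int(len(X) * p)
    beta = None
    for _ in range(max(n_remove, 1)):
        A = np.column_stack([np.ones_like(X), X])    # linear model Y = beta0 + beta1 * X
        beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
        residual = np.abs(A @ beta - Y)              # absolute residuals of the current fit
        worst = int(np.argmax(residual))             # observation with the maximal residual
        X, Y = np.delete(X, worst), np.delete(Y, worst)
    return beta
```

In the boundary fitting, X and Y would hold the two planar coordinates of the candidate boundary points on one side of the road.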


4.3 Implementation

This section introduces how the proposed method is implemented.

4.3.1 Dataset

We have evaluated the aforementioned algorithm on three datasets which encompass both urban and suburban scenes and two types of 64-channel LiDARs.

    Scania dataset

A 64-channel LiDAR is mounted with a relatively large tilt angle on the head of the Scania autonomous bus Klara as shown in Fig. 4.15. Such a large mounting angle makes the nearby (


    KITTI road dataset

The KITTI road estimation benchmark [59] consists of 289 training and 290 testing images, as well as the synchronized LiDAR data captured by the Velodyne HDL-64E. This dataset mainly focuses on well-structured urban road scenarios, but covers quite an amount of different road contexts. The ground truth of the drivable road areas has been annotated on each image. Using the provided calibration parameters between the camera and LiDAR frames, we can easily obtain the corresponding road annotation of each point.

4.3.2 Experimental Equipment

Hardware resources

In this project, a common laptop with an Intel Core i7-4720HQ CPU and 8 GB RAM was used for the proposed model.

    Software resources

Several pieces of software and programming languages were used for the implementation of the different stages. The primary development was done in Matlab 2017b. However, the LiDAR supplier cooperating with Scania only provides a ROS interface to load raw data. Therefore, some work regarding raw data preprocessing and format conversion was developed on ROS Kinetic in C++. As for the KITTI dataset, an official Matlab development kit is available for basic data processing and visualization.

4.3.3 Practical Considerations

In general, a frame of a 64-channel LiDAR contains over one million 3D points. Processing such a huge amount of data is computationally heavy. In order to speed up the proposed algorithm, several assumptions are established as follows (a small Matlab sketch of the resulting filtering steps follows the list):

• The lane width generally ranges from 2.5 to 3.7 meters. The widest road, with eight lanes and pavements, should be less than 35 meters wide. Therefore, it is reasonable to only consider 3D points within the range of [−25, 25] meters in the y direction.

• The road ceases to be visible at the horizon line. We assume the vanishing point of the road stays at a particular row horiz_row of the disparity image. The 3D points projected above this row are then directly labeled as obstacles.

• Theoretically, the disparity increases from 0 to infinity as the depth decreases. Such an enormous range would be split into numerous histogram bins for the U- and V-disparity maps. Considering the valid perception distance of LiDARs (about 1 to 120 meters), the scaled disparity is limited to the range from 0 to 65535. That is, points with a depth of less than 1 meter are directly removed.
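A minimal Matlab sketch of these filtering steps is shown below. It assumes the points are expressed in a frame with x pointing forward (depth) and y lateral, that the projected image row of each point is already available, and that the disparity scale constant maps the 1-meter minimum depth to the largest 16-bit value; these variable names and the scale choice are assumptions for illustration, not the exact thesis code.

    % pts       : N-by-3 point cloud [x y z], x = depth (forward), y = lateral
    % rows      : N-by-1 projected image row of each point
    % horiz_row : assumed row of the road vanishing point in the disparity image
    keep = abs(pts(:, 2)) <= 25 & pts(:, 1) >= 1;   % lateral range and minimum depth
    pts  = pts(keep, :);
    rows = rows(keep);

    % Scaled disparity limited to [0, 65535] (the scale constant is an assumption).
    disparity = uint16(65535 ./ pts(:, 1));

    % Points projected above the assumed horizon row are labeled as obstacles.
    is_obstacle = rows < horiz_row;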

Chapter 5

    Results and Discussion

Unfortunately, neither the Scania dataset nor the LiDAR supplier dataset contains point-wise annotations of obstacle and road surface classes. Therefore, we can only qualitatively analyze results for different traffic scenarios. Quantitative evaluation is then carried out on the KITTI benchmark. After that, both advantages and disadvantages of the proposed method are described.

5.1 Case study

Scania uses a 64-channel LiDAR with different specifications from the Velodyne HDL-64E, including a lower vertical angular resolution, a larger vertical field of view and lower precision. Moreover, the LiDAR is unconventionally mounted with a large pitch angle. Most importantly, the proposed method is expected to prove useful for such LiDAR data and setup.

5.1.1 Ground Surface Extraction

Urban Scene

Urban traffic scenes generally consist of flat, planar road surfaces, well-structured road design and vertical on-road obstacles. For a typical urban road surface, the longitudinal road profile appears as a nearly perfect straight line in the V-disparity domain. Common line extraction algorithms such as the Hough transform can be used to extract the road profile [10, 30, 47]. Although the proposed method instead formulates road profile extraction as an edge detection problem, it achieves satisfying performance on urban scenes.


As shown in Fig. 5.1, obstacles are all successfully identified, including vehicles, pedestrians, trees and buildings. Points more than 30 meters away are actually very sparse, but the method still successfully extracts the correct road surface points. More resulting samples in urban scenes are presented in Appendix B.1.1 to show its robustness and accuracy.

Figure 5.1: Resulting sample of the LiDAR supplier's dataset (the dead points are not included for better visualization): (a) front view; (b) top view and V-disparity map (the white pixels denote the extracted road profile).

Suburban Scene

Suburban scenes are dominated by nonplanar road surfaces and irregular structures, such as ditches, bushes and hills. Line fitting techniques fail in such road cases since the road profile does not map to a simple straight line in the V-disparity domain. The work in [32] proposes a road profile extraction algorithm that identifies the point with the maximum intensity value in each row of the V-disparity map. We find it is only useful for traffic scenarios with a single lane.
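For reference, the row-wise selection used by the baseline [32] amounts to a single operation on the V-disparity map; the sketch below assumes V is a matrix of image rows by disparity bins and is only an illustration of the described idea.

    % V : V-disparity map (image rows x disparity bins), entries are point counts
    [~, road_profile] = max(V, [], 2);   % per-row disparity bin with maximum intensity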

Fig. 5.2 shows a fork case where the road extends onto two different planes. Due to the higher point density on the left path, the road profile extracted by the baseline algorithm [32] mostly corresponds to the left ground surface. As a result, a mass of points on the right path, as well as some with higher elevation around the left bush, are misclassified as obstacles, as shown in Fig. 5.2c. In contrast, our proposed method can deal with this challenging case. Appendix B.1.2 presents more resulting samples in other suburban scenarios.

In summary, the proposed method is reliable and accurate for the ground surface extraction task. Although a few misclassified points exist around obstacles or on ground surfaces, it never incorrectly labels whole vehicle or pedestrian clusters as ground surface in any LiDAR frame of the two datasets.

Figure 5.2: Resulting samples of the Scania dataset (green points in the red circles represent trees): (a) result of the proposed algorithm (front and right view); (b) result of the proposed algorithm; (c) result of the baseline algorithm [32].

5.1.2 Candidate Boundary Point Identification

As illustrated in Figs. 5.3 and 5.4, the three types of features can all roughly identify the boundary points, but show clearly different performance characteristics.


The curvature feature is the most stable, but requires sufficiently dense point clouds. In contrast, the LSE feature behaves better for distant and sparse point clouds. The elevation variance feature can deal with both close and distant points, but is relatively sensitive to noise.

Figure 5.3: Resulting samples for identifying curb points: (a) curvature; (b) elevation variance; (c) LSE. The blue points are detected features; the parameter settings are: curvature, radius = 0.5, feature_thr = 0.03; elevation variance, k = 3, feature_thr = 0.0015; LSE, k = 5, feature_thr = 0.02.

Figure 5.4: Resulting samples for identifying ditch points (the parameter settings are the same as in Fig. 5.3): (a) front view; (b) curvature; (c) elevation variance; (d) LSE.


5.2 Evaluation on KITTI Dataset

Since the KITTI dataset is dominated by planar urban road scenarios, the ground plane extraction does not face much difficulty, as explained in Section 4.1.1. Moreover, the elevation variance feature is chosen after carefully tuning parameters and comparing the performance of the three feature types. However, limited by the road boundary regression model, the proposed method fails in some tricky cases, as shown in Fig. 5.5. Therefore, the road boundaries are required to be straight and observable on both sides for the boundary model to be effective. Fig. 5.6 presents some successful examples. Finally, the yellow points between the two detected road boundaries are labeled as drivable road region. More resulting samples are shown in Appendix B.2.

KITTI officially defines several pixel-based metrics for evaluation in the 2D bird's eye view space [60]. Following a similar principle, classical metrics are used for point-based evaluation as follows:

\text{Precision} = \frac{TP}{TP + FP} \qquad (5.1)

\text{Recall} = \frac{TP}{TP + FN} \qquad (5.2)

\text{False Positive Rate (FPR)} = \frac{FP}{TP + FP + TN + FN} \qquad (5.3)

\text{False Negative Rate (FNR)} = \frac{FN}{TP + FP + TN + FN} \qquad (5.4)

\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (5.5)
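Given point-wise predictions and ground truth labels, the metrics in Eqs. 5.1 to 5.5 reduce to simple counting, as in the Matlab sketch below (the variable names are illustrative).

    % pred, gt : N-by-1 logical vectors, true where a point is labeled as road
    TP = sum( pred &  gt);
    FP = sum( pred & ~gt);
    FN = sum(~pred &  gt);
    TN = sum(~pred & ~gt);
    precision = TP / (TP + FP);                     % Eq. 5.1
    recall    = TP / (TP + FN);                     % Eq. 5.2
    fpr       = FP / (TP + FP + TN + FN);           % Eq. 5.3
    fnr       = FN / (TP + FP + TN + FN);           % Eq. 5.4
    accuracy  = (TP + TN) / (TP + FP + TN + FN);    % Eq. 5.5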

Results on the test set can only be evaluated via the official KITTI website. From the available training set, we manually select 165 frames which contain two straight, observable road boundaries. The evaluation result on these 165 frames is presented in Tab. 5.1.

    Table 5.1: Evaluation on the selected 165 frames

Precision | Recall  | FPR    | FNR    | Accuracy
94.36 %   | 98.45 % | 2.07 % | 0.56 % | 97.37 %

In the most common and simplest situation, the proposed method achieves a satisfying and effective performance, with especially high recall and a low false negative rate. Generally, precision and recall are inversely related, where it is only possible to increase one at the cost of reducing the other. Considering the safety of an obstacle avoidance system, a relatively high precision is more vital. Therefore, the proposed method may encounter potential safety issues.

5.3 Computation Complexity

The proposed algorithm does not involve many complex calculations. In particular, the ground surface extraction method segments point clouds in 2D space mainly with simple comparison operations. Besides, after removing obstacles, the point cloud fed to the road boundary detection stage is largely shrunk.

The computational complexity is evaluated by the runtime of the developed Matlab programs. Based on the results in Tab. 5.2, the ground surface extraction algorithm reaches an average speed of 23.26 frames per second. Computing the curvature feature is much more time-consuming than the other two features since it requires massive point-wise search operations. When using the elevation variance feature, the average speed of the whole algorithm is about 5.46 fps. However, there is plenty of room for optimization, especially if the method is implemented in C++.

    Table 5.2: Runtime of each step on 100 LiDAR frames in Matlab

Stage                     | Step                                | Time / s
--------------------------|-------------------------------------|---------
Ground surface extraction | Project to image plane              | 0.81
                          | U-disparity                         | 3.29
                          | Remove obstacles roughly            | 1.92
                          | V-disparity                         | 0.37
                          | Extract road profile                | 0.18
                          | Classify point cloud                | 1.03
Road boundary detection   | Compute surface curvature feature   | 99.88
                          | Compute elevation variance feature  | 7.65
                          | Compute LSE feature                 | 20.60
                          | Regress road boundary               | 3.06


Figure 5.5: Failure cases of the KITTI dataset: (a) fork road; (b) too short curb; (c) winding road; (d) blocked by vehicles. The classified 3D points are all projected and overlaid on the synchronized image for better visualization and understanding. The ground truth of the drivable road region is presented in light blue. Yellow denotes points labeled as road surface. Green points represent obstacles. Feature points are denoted in blue, and the fitted road boundaries are plotted as red lines.


Figure 5.6: Resulting samples of the KITTI dataset


5.4 Good properties

Besides the detection performance and computation complexity discussed above, some other good properties of the proposed algorithm are summarized as follows:

1) Omit the stereo matching process
Most camera-based U-V-disparity approaches obtain the disparity map from dense stereo matching. However, stereo matching techniques generally require specific optimization to reach real-time performance, and the error of disparity values increases with depth. In contrast, our method directly obtains the disparity map by a simple linear projection and is adjusted to handle the sparsity of the resulting disparity maps. Therefore, such a LiDAR-based algorithm is not only more computationally efficient, but also more reliable for 3D information inference.

2) Identify outliers of LiDAR point clouds
As shown in Fig. 5.7, some extreme points exist which do not actually correspond to any real-world objects. Such outliers are generally isolated from the ground surface, thereby resulting in a large difference between their disparity values and the road profile. Therefore, the outliers are successfully identified as obstacles and do not affect the subsequent stage of road boundary detection.

Figure 5.7: Ability to identify outliers of LiDAR data (the isolated points in the red circles are definitely outliers and are all classified as obstacles).

3) Parameters are easy to tune
In total, 11 parameters need to be tuned manually and carefully, but practical experience shows that they can be easily determined with a greedy strategy; namely, an optimal value is chosen for each step without considering the interaction among parameters.


5.5 Limitations

As described in Section 4.2, the boundary detection method is restricted to the simplest case. In addition, some assumptions and simplifications are applied to the surface extraction model. The corresponding limitations are listed in the following:

1) Reduces the field of view to a camera-like case
The two types of LiDARs in this project both capture 3D data over a 360-degree scanning area. LiDARs are generally mounted horizontally on the roof of autonomous cars, thereby providing 3D cues all around. However, the proposed method only considers the points in front that are projected within the predefined image region. In the real world, perception of the environment behind and to both sides is an indispensable capability, especially when AVs back up or make a turn.

2) Requires calibration of the LiDAR's pitch and yaw angles
As explained in Section 3.1.4, the linear characteristic in the U-V-disparity domain holds only when the pitch and yaw angles are small enough; otherwise, obstacles and ground surfaces are not projected as high-intensity curves. Thus, calibration of the two angles is necessary, so that the LiDAR coordinate frame can be rotated to approximately zero pitch and yaw. In other words, a large calibration error, looseness or accidental displacement may invalidate the proposed method.

3) Requires a decent-sized road area in front
The assumption of a decent-sized road area is reasonable since AVs generally keep a certain distance from surrounding obstacles. If the ground surface is almost completely blocked by obstacles, the method will extract an incorrect road profile.

4) May fail to detect slanted obstacles
As described in Section 3.1.4, oblique planes are projected as lines in the V-disparity domain. However, oblique planes can be not only sloped ground surfaces, but also obstacles such as hills and slanted buildings. Fig. 5.8 shows a failure case of road profile extraction: the hill is falsely labeled as ground surface since its projection is also a line in the V-disparity map.


    Figure 5.8: Failure case of road profile extraction

Chapter 6

    Conclusions

6.1 Conclusions

In this project, a drivable road region detection system was developed for autonomous vehicles equipped with a 64-channel LiDAR. The thesis work consists of three contributions:

• A U-V-disparity based method is proposed to extract the ground surface from the point cloud. In the U-V-disparity domain, obstacles are projected as high-intensity points, and the ground surface is mapped as a curve. Using this property, the proposed method converts the 3D ground surface extraction problem into a simple edge detection task in a 2D image. The 3D points above the extracted ground surface are classified as obstacles.

• Three types of features are designed to filter the 3D ground points and obtain the candidate boundary points.

• A robust regression algorithm, least trimmed squares, is applied to fit the road boundaries.

Experiments have been conducted under variations involving different road scenarios, LiDAR mounting poses and two LiDAR configurations. The experimental results illustrate that the ground surface extraction method achieves real-time and robust performance. The three types of features can roughly identify the boundary points, but are all too computationally heavy due to massive filtering operations among points. Limited by the regression algorithm, the proposed system is finally restricted to the most common case: two straight boundaries are observable. Evaluation on the KITTI road benchmark shows that for such cases the overall recall and accuracy both exceed 97%.


6.2 Future Works

Potential directions for future work are suggested as follows:

    Expanding and annotating the Scania dataset

What concerns Scania most is the performance of the proposed system on the Scania Klara bus. However, the Scania dataset used in this project only contains 414 frames from a suburban scene. Moreover, the lack of point-wise annotation makes it impossible to conduct a quantitative evaluation or compare with other methods. Therefore, it is essential to expand and annotate the dataset, covering a larger amount of different road contexts.

Improving the road boundary model

The linear regression model used in this project can only fit straight road boundaries, but real-world cases are much more complicated. Nonlinear curve fitting functions, such as B-splines, may be useful for winding or fork roads. Besides, one interesting direction is to incorporate prior knowledge of the surrounding road from a map. In this way, the system would know which model is suitable for the current case.

What if the road boundary becomes unobservable

The least trimmed squares algorithm was used to mitigate the impact of false positives, but it cannot handle the situation where the road boundary is blocked by obstacles. One possible solution is to track the detected boundary points with a Kalman filter. In general, the detection part would be much more computationally heavy than the tracker, so how to choose a suitable detection frequency can be investigated in future work.

    Exploiting the information of reflectivity

In this project, it is assumed that considerable geometric variation exists around road boundaries. However, sidewalks may be separated from roadways without distinct curbs, as shown in Fig. 5.5b. In this case, using only depth information makes it difficult to identify the true boundary points. The LiDAR used by Scania can capture an intriguing additional type of information, reflectivity, whose strength depends only on the composition of the surface material. If roadways are made of a unique and consistent material, such as asphalt or concrete, reflectivity would provide a straightforward cue for the boundary detection task.


    Extending to 3D object detection

The proposed method has proven useful for obstacle extraction. Further steps might cluster the extracted obstacle points into separate objects. Various clustering methods are worth investigating, such as K-means, mean shift and DBSCAN. The corresponding results would allow determining the pose and motion state of surrounding vehicles and pedestrians.

Bibliography

[1] Aharon Bar Hillel et al. "Recent progress in road and lane detection: a survey". In: Machine vision and applications 25.3 (2014), pp. 727–745.

[2] Hazel Si Min Lim and Araz Taeihagh. "Algorithmic Decision-Making in AVs: Understanding Ethical and Technical Concerns for Smart Cities". In: Sustainability 11.20 (2019), p. 5791.

[3] World Health Organization et al. Global status report on road safety 2018: Summary. Tech. rep. World Health Organization, 2018.

[4] José M Alvarez, A Lopez, and Ramon Baldrich. "Illuminant-invariant model-based road segmentation". In: 2008 IEEE Intelligent Vehicles Symposium. IEEE. 2008, pp. 1175–1180.

[5] Ying Guo, Vadim Gerasimov, and Geoff Poulton. "Vision-based drivable surface detection in autonomous ground vehicles". In: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. 2006, pp. 3273–3278.

[6] Vijay Badrinarayanan, Ignas Budvytis, and Roberto Cipolla. "Mixture of trees probabilistic graphical model for video segmentation". In: International journal of computer vision 110.1 (2014), pp. 14–29.

[7] Hui Kong, Jean-Yves Audibert, and Jean Ponce. "Vanishing point detection for road detection". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. 2009, pp. 96–103.

[8] Jinjin Shi, Jinxiang Wang, and Fangfa Fu. "Fast and robust vanishing point detection for unstructured road following". In: IEEE Transactions on Intelligent Transportation Systems 17.4 (2015), pp. 970–979.

[9] Hui Kong, Sanjay E Sarma, and Feng Tang. "Generalizing Laplacian of Gaussian filters for vanishing-point detection". In: IEEE Transactions on Intelligent Transportation Systems 14.1 (2012), pp. 408–418.


[10] Raphael Labayrade, Didier Aubert, and J-P Tarel. "Real time obstacle detection in stereovision on non flat road geometry through "v-disparity" representation". In: Intelligent Vehicle Symposium, 2002. IEEE. Vol. 2. IEEE. 2002, pp. 646–651.

[11] Hernán Badino, Uwe Franke, and David Pfeiffer. "The stixel world - a compact medium level representation of the 3d-world". In: Joint Pattern Recognition Symposium. Springer. 2009, pp. 51–60.

[12] Luca Caltagirone et al. "LIDAR–camera fusion for road detection using fully convolutional neural networks". In: Robotics and Autonomous Systems 111 (2019), pp. 125–131.

[13] Xiaofeng Han et al. "Road detection based on the fusion of Lidar and image data". In: International Journal of Advanced Robotic Systems 14.6 (2017), p. 1729881417738102.

[14] Shuo Gu et al. "Road Detection through CRF based LiDAR-Camera Fusion". In: 2019 International Conference on Robotics and Automation (ICRA). IEEE. 2019, pp. 3832–3838.

[15] Angel Domingo Sappa et al. "An efficient approach to onboard stereo vision system pose estimation". In: IEEE Transactions on Intelligent Transportation Systems 9.3 (2008), pp. 476–490.

[16] W Shane Grant, Randolph C Voorhies, and Laurent Itti. "Finding planes in LiDAR point clouds for real-time registration". In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. 2013, pp. 4347–4354.

[17] Xiao Ai et al. "Obstacle detection using U-disparity on quadratic road surfaces". In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). IEEE. 2013, pp. 1352–1357.

[18] Andreas Wedel et al. "B-spline modeling of road surfaces for freespace estimation". In: 2008 IEEE Intelligent Vehicles Symposium. IEEE. 2008, pp. 828–833.

[19] Wende Zhang. "Lidar-based road and road-edge detection". In: 2010 IEEE Intelligent Vehicles Symposium. IEEE. 2010, pp. 845–848.

[20] Alberto Y Hata, Fernando S Osorio, and Denis F Wolf. "Robust curb detection and vehicle localization in urban environments". In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE. 2014, pp. 1257–1262.


[21] Baoxing Qin et al. "Curb-intersection feature based monte carlo localization on urban roads". In: 2012 IEEE International Conference on Robotics and Automation. IEEE. 2012, pp. 2640–2646.

[22] Jaehyun Han et al. "Enhanced road boundary and obstacle detection using a downward-looking LIDAR sensor". In: IEEE Transactions on Vehicular Technology 61.3 (2012), pp. 971–985.

[23] Carlos Fernández et al. "Curvature-based curb detection method in urban environments using stereo and laser". In: 2015 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2015, pp. 579–584.

[24] Soren Kammel and Benjamin Pitzer. "Lidar-based lane marker detection and mapping". In: 2008 IEEE Intelligent Vehicles Symposium. IEEE. 2008, pp. 1137–1142.

[25] Alberto Hata and Denis Wolf. "Road marking detection using LIDAR reflective intensity data and its application to vehicle localization". In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE. 2014, pp. 584–589.

[26] In-So Kweon et al. "Terrain mapping for a roving planetary explorer". In: IEEE International Conference on Robotics and Automation. IEEE. 1989, pp. 997–1002.

[27] Rudolph Triebel, Patrick Pfaff, and Wolfram Burgard. "Multi-level surface maps for outdoor terrain mapping and loop closing". In: 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. 2006, pp. 2276–2282.

[28] Sujit Kuthirummal, Aveek Das, and Supun Samarasekera. "A graph traversal based algorithm for obstacle detection using lidar or stereo". In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. 2011, pp. 3874–3880.

[29] Gangqiang Zhao and Junsong Yuan. "Curb detection and tracking using 3D-LIDAR scanner". In: 2012 19th IEEE International Conference on Image Processing. IEEE. 2012, pp. 437–440.

[30] Zhencheng Hu, Francisco Lamosa, and Keiichi Uchimura. "A complete uv-disparity study for stereovision based 3d driving environment analysis". In: Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM'05). IEEE. 2005, pp. 204–211.


[31] Yuan Gao et al. "UV-disparity based obstacle detection with 3D camera and steerable filter". In: 2011 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2011, pp. 957–962.

[32] Meiqing Wu, Siew-Kei Lam, and Thambipillai Srikanthan. "Nonparametric technique based high-speed road surface detection". In: IEEE Transactions on Intelligent Transportation Systems 16.2 (2014), pp. 874–884.

[33] Umar Ozgunalp et al. "Multiple lane detection algorithm based on novel dense vanishing point estimation". In: IEEE Transactions on Intelligent Transportation Systems 18.3 (2016), pp. 621–632.

[34] Rui Fan and Naim Dahnoun. "Real-time stereo vision-based lane detection system". In: Measurement Science and Technology 29.7 (2018), p. 074005.

[35] Amita Dhiman, Hsiang-Jen Chien, and Reinhard Klette. "Road surface distress detection in disparity space". In: 2017 International Conference on Image and Vision Computing New Zealand (IVCNZ). IEEE. 2017, pp. 1–6.

[36] Zhen Zhang et al. "An efficient algorithm for pothole detection using stereo vision". In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2014, pp. 564–568.

[37] David Pfeiffer and Uwe Franke. "Towards a Global Optimal Multi-Layer Stixel Representation of Dense 3D Data." In: BMVC. Vol. 11. 2011, pp. 51–1.

[38] Marius Cordts et al. "The stixel world: A medium-level representation of traffic scenes". In: Image and Vision Computing 68 (2017), pp. 40–52.

[39] Friedrich Erbs, Alexander Barth, and Uwe Franke. "Moving vehicle detection by optimal segmentation of the dynamic stixel world". In: 2011 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2011, pp. 951–956.

[40] Friedrich Erbs, Beate Schwarz, and Uwe Franke. "Stixmentation - Probabilistic Stixel based Traffic Scene Labeling." In: BMVC. 2012, pp. 1–12.

[41] David Pfeiffer et al. "Ground truth evaluation of the Stixel representation using laser scanners". In: 13th International IEEE Conference on Intelligent Transportation Systems. IEEE. 2010, pp. 1091–1097.


[42] Maximilian Muffert, Nicolai Schneider, and Uwe Franke. "Stix-Fusion: a probabilistic Stixel integration technique". In: 2014 Canadian Conference on Computer and Robot Vision. IEEE. 2014, pp. 16–23.

[43] Daniel Hernandez-Juarez et al. "Slanted Stixels: Representing San Francisco's steepest streets". In: arXiv preprint arXiv:1707.05397 (2017).

[44] Friedrich Erbs et al. "Spider-based Stixel object segmentation". In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE. 2014, pp. 906–911.

[45] Marius Cordts et al. "Object-level priors for stixel generation". In: German Conference on Pattern Recognition. Springer. 2014, pp. 172–183.

[46] Lukas Schneider et al. "Semantic stixels: Depth is not enough". In: 2016 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2016, pp. 110–117.

[47] Liang Chen, Jian Yang, and Hui Kong. "Lidar-histogram for fast road and obstacle detection". In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2017, pp. 1343–1348.

[48] Shuo Gu et al. "Lidar-based urban road detection by histograms of normalized inverse depths and line scanning". In: 2017 European Conference on Mobile Robots (ECMR). IEEE. 2017, pp. 1–6.

[49] Shuo Gu et al. "Histograms of the Normalized Inverse Depth and Line Scanning for Urban Road Detection". In: IEEE Transactions on Intelligent Transportation Systems (2018).

[50] Shuo Gu et al. "3-D LiDAR + Monocular Camera: An Inverse-Depth-Induced Fusion Framework for Urban Road Detection". In: IEEE Transactions on Intelligent Vehicles 3.3 (2018), pp. 351–360.

[51] Yigong Zhang et al. "Fusion of lidar and camera by scanning in lidar imagery and image-guided diffusion for urban road detection". In: 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2018, pp. 579–584.

[52] Bichen Wu et al. "Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud". In: 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2018, pp. 1887–1893.

[53] Yuan Wang et al. "Pointseg: Real-time semantic segmentation based on 3d lidar point cloud". In: arXiv preprint arXiv:1807.06288 (2018).

[54] Florian Piewak et al. "Boosting lidar-based semantic labeling by cross-modal training data generation". In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 0–0.


[55] Ayush Dewan, Gabriel L Oliveira, and Wolfram Burgard. "Deep semantic classification for 3d lidar data". In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2017, pp. 3544–3549.

[56] Lyne Tchapmi et al. "Segcloud: Semantic segmentation of 3d point clouds". In: 2017 International Conference on 3D Vision (3DV). IEEE. 2017, pp. 537–547.

[57] Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. "Octnet: Learning deep 3d representations at high resolutions". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 3577–3586.

[58] Charles R Qi et al. "Pointnet: Deep learning on point sets for 3d classification and segmentation". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 652–660.

[59] Andreas Geiger et al. "Vision meets robotics: The KITTI dataset". In: The International Journal of Robotics Research 32.11 (2013), pp. 1231–1237.

[60] Jannik Fritsch, Tobias Kuehnl, and Andreas Geiger. "A new performance measure and evaluation benchmark for road detection algorithms". In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). IEEE. 2013, pp. 1693–1700.

Appendix A

    Societal Aspects

A.1 Ethics

Autonomous vehicles (AVs) are already capable of driving in structured road environments. But in order to popularize AVs, the interests of various stakeholders must be balanced and explicit ethics criteria must be established. For engineers, the ethical issues are essentially algorithmic. Engineers focus on improving self-driving technology and thereby achieving an optimum of economic benefit and cost, which means they should be responsible for all possible problems, including ethical dilemmas. To meet engineering requirements, automobile component suppliers would also need to redesign their products. On the other hand, the related government departments will certainly face ethical and forensic disputes caused by AVs. Obviously, the most closely involved group is the users. Only when potential users approve of the attribution of liability and the ethics criteria will they buy the cars. Conversely, once someone voluntarily uses a self-driving service, they must accept the risks of AVs.

The core of the ethical issues lies in how much people accept a machine that might make mistakes and who should be responsible for