
End-to-End Driving in Unstructured Environments Using Conditional Imitation Learning

Joonwoo Ahn1 and Jaeheung Park1

Abstract— We propose a method using conditional imitation learning for autonomous driving in unstructured environments. The method trains a driving policy to imitate an expert's actions given the input information, and the driving policy is divided into branches according to conditional driving purposes. We show the effectiveness of the proposed method in the CARLA simulator.

I. INTRODUCTION

In order to achieve Level-4 autonomous driving, a strategy for driving in unstructured environments such as parking lots is needed. These environments are narrow and have no lane markings, and obstacles appear in various forms. The vehicle therefore encounters stochastic situations, and it is difficult to drive using a rule-based algorithm.

Deep learning methods are used to deal with such stochastic situations. Among them, reinforcement learning is typically applied in simple environments [1]. It is mostly used on robots that avoid obstacles using LiDAR in uncomplicated environments, and it has also been applied to lane-changing decisions on structured roads. However, configuring the reward function for each objective is heuristic, so the approach cannot readily be applied in real situations.

In real situations, imitation learning has shown better performance than reinforcement learning because it requires no such heuristics [2]. For training, a neural network model is given inputs labeled according to a representation of the expert's intention. However, many researchers have focused on purely reactive tasks and simple environments, such as lane following or obstacle avoidance [3]. We apply imitation learning to unstructured environments, in particular a parking lot with an intersection, using what is called conditional imitation learning.

II. PROBLEM DEFINITION

The goal is to train a conditional driving policy π in complex and narrow environments containing an intersection. There is no reference path to track, and localization data is noisy. We assume that only a camera image c and navigation-level information d are available. d is called the driving purpose and is determined by a global planner before driving. It is categorized into intersection information (d = d_s, d_l, d_r) and non-intersection information (d = d_f): d_s means go straight, d_l means turn left, and d_r means turn right at the intersection, while d_f means following at a non-intersection.

*This work was not supported by any organization.
1 The authors are with the DYROS (Dynamic Robotics Systems) Lab, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea. Jaeheung Park is the corresponding author. joonwooahn, [email protected]

Fig. 1. System architecture of end-to-end imitation learning

III. PROBLEM SOLVING APPROACH

A. Conditional Imitation Learning

The basic idea behind imitation learning is to train a function approximator F that mimics the expert. An implicit assumption of imitation learning is that the expert's actions E(c_i) are fully explained by the observation c_i, i.e., a_i = E(c_i). That is, c_i and a_i are mapped by E, and the trained π should mimic this mapping well. If this assumption holds, π will be able to fit the function E given a sufficiently large data set D = {⟨c_i, a_i⟩} generated by the expert. This is similar to a supervised learning problem, in which the parameters θ of the function approximator F(c; θ) are optimized to fit the mapping of c_i to a_i and obtain the driving policy π:

π = minimize_θ ∑_i l(F(c_i; θ), a_i)        (1)

where l is a loss function measuring the difference between the expert's action E(c_i) and the output of the function approximator F(c_i; θ).
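As a minimal, self-contained illustration of objective (1), the sketch below fits a tiny softmax policy to synthetic expert pairs (c_i, a_i) by gradient descent on a cross-entropy loss. The linear model, data dimensions, learning rate, and step count are all illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Sketch of objective (1): fit a softmax policy F(c; theta) to expert
# pairs (c_i, a_i) by minimizing a cross-entropy instance of l.
# Everything below (linear model, sizes, lr) is an illustrative assumption.
rng = np.random.default_rng(0)
N, OBS_DIM, N_ACTIONS = 200, 8, 5

# Synthetic expert data set D = {<c_i, a_i>}: a_i = E(c_i).
W_true = rng.normal(size=(OBS_DIM, N_ACTIONS))
C = rng.normal(size=(N, OBS_DIM))
A = np.argmax(C @ W_true, axis=1)          # expert labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(theta):
    p = softmax(C @ theta)
    return -np.mean(np.log(p[np.arange(N), A] + 1e-12))

theta = np.zeros((OBS_DIM, N_ACTIONS))
lr = 0.5
for _ in range(300):                       # plain full-batch gradient descent
    p = softmax(C @ theta)
    p[np.arange(N), A] -= 1.0              # dL/dlogits for cross-entropy
    theta -= lr * (C.T @ p) / N

pi = lambda c: np.argmax(c @ theta)        # trained policy mimics E
acc = np.mean([pi(c) == a for c, a in zip(C, A)])
print(f"training accuracy: {acc:.2f}")
```

The point of the sketch is only that minimizing (1) over the expert data set recovers a policy that reproduces the expert's action choices on the observations it was trained on.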

A policy trained with (1) has difficulty driving through an intersection. To address this, F is subdivided into F_d = {F_s, F_l, F_r, F_f} according to the driving purpose d = {d_s, d_l, d_r, d_f}:

π = minimize_{θ_d} ∑_i l_d(F_d(c_i; θ), a_i).        (2)

At each time step t, the driving policy π receives the camera image c_t and the driving purpose d_t, and takes the action a_t (see Fig. 1). While driving, d_t is selected considering the pose of the vehicle and the location of the intersection. If the vehicle is driving in a non-intersection area, d_t is d_f. When the vehicle approaches an intersection within the bound α and its heading is aligned with the direction of the intersection within the range β, d_f is switched to d_s, d_l, or d_r.
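The driving-purpose switch described above can be sketched as follows. The function names, geometry helpers, and the point-based intersection representation are assumptions; the threshold values follow α = 0.7 m and β = 0.25 rad given later in the evaluation.

```python
import math

# Sketch of the driving-purpose selection: stay on d_f away from the
# intersection, and switch to the planned turn (d_s, d_l, or d_r) once
# the vehicle is within ALPHA of the intersection and its heading is
# within BETA of the intersection direction. Representation is assumed.
ALPHA = 0.7   # [m]  distance bound to the intersection
BETA = 0.25   # [rad] allowed heading misalignment

def angle_diff(a, b):
    """Smallest signed difference between two angles in radians."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def select_driving_purpose(vehicle_xy, vehicle_yaw,
                           intersection_xy, intersection_dir,
                           planned_turn):
    dx = intersection_xy[0] - vehicle_xy[0]
    dy = intersection_xy[1] - vehicle_xy[1]
    close = math.hypot(dx, dy) <= ALPHA
    aligned = abs(angle_diff(vehicle_yaw, intersection_dir)) <= BETA
    return planned_turn if (close and aligned) else 'd_f'

print(select_driving_purpose((0.0, 0.0), 0.0, (10.0, 0.0), 0.0, 'd_l'))  # far: d_f
print(select_driving_purpose((9.5, 0.0), 0.0, (10.0, 0.0), 0.0, 'd_l'))  # close & aligned: d_l
```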


B. Network Architecture

The function approximator F is represented by a deep neural network based on Inception-ResNet-v2, which applies the residual-connection concept to the Inception structure. This increases accuracy and reduces the amount of computation.

The input of the Inception-ResNet-v2 layers is the camera image c, whose size is 360 × 120 × 3. Their output passes through a 50% dropout layer and is fed into a fully connected layer with n output units, which finally classifies the action a using a softmax layer; n is the number of action classes and equals 30. The Adam optimizer is used with a learning rate of 1e-5, and the categorical cross-entropy is used as the loss function. We set the batch size to 128 and the number of epochs to 30. Data augmentation is not applied, because the position and direction of the camera image c are important for deciding a.
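A rough sketch of the classification head described above (backbone features → 50% dropout → fully connected layer → softmax over n = 30 actions), with the Inception-ResNet-v2 backbone stubbed out by random features. The feature dimension and batch size are illustrative assumptions.

```python
import numpy as np

# Sketch of the head: backbone features -> 50% dropout -> fully
# connected layer with 30 units -> softmax over the 30 steering classes.
# The backbone is stubbed with random features; FEAT_DIM is an assumption.
rng = np.random.default_rng(0)
N_CLASSES = 30          # 30 discretized steering actions
FEAT_DIM = 1536         # assumed pooled backbone feature size

def dropout(x, rate=0.5, training=True):
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)      # inverted dropout keeps expectation

def head(features, W, b, training=False):
    z = dropout(features, 0.5, training) @ W + b
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)   # softmax over 30 actions

features = rng.normal(size=(4, FEAT_DIM))      # stand-in backbone output
W = rng.normal(scale=0.01, size=(FEAT_DIM, N_CLASSES))
b = np.zeros(N_CLASSES)
probs = head(features, W, b)
print(probs.shape)                             # one 30-way distribution per image
```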

C. Data Acquisition

When performing imitation learning, the key is collecting training data. We use the CARLA simulator to collect it by tracking a global reference path composed of a set of way-points. The vehicle uses the pure pursuit path-tracking algorithm to track a look-ahead point on the path at a predetermined distance. When it meets obstacles, the expert moves the look-ahead point orthogonally to the direction of the vehicle. The input-action data set D = {⟨c, a⟩} is collected 10 times per second.
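A minimal sketch of the pure pursuit steering law the expert uses to track the look-ahead point. The wheelbase and look-ahead distance are illustrative assumptions, and the obstacle-driven shifting of the look-ahead point described above is not modeled.

```python
import math

# Sketch of pure pursuit: steer so the rear axle follows a circular arc
# through the look-ahead point. WHEELBASE and LOOKAHEAD are assumptions.
WHEELBASE = 2.7      # [m]
LOOKAHEAD = 5.0      # [m] predetermined look-ahead distance on the path

def pure_pursuit_steer(vehicle_xy, vehicle_yaw, target_xy):
    dx = target_xy[0] - vehicle_xy[0]
    dy = target_xy[1] - vehicle_xy[1]
    # Transform the look-ahead point into the vehicle frame.
    x_v = math.cos(vehicle_yaw) * dx + math.sin(vehicle_yaw) * dy
    y_v = -math.sin(vehicle_yaw) * dx + math.cos(vehicle_yaw) * dy
    ld = math.hypot(x_v, y_v)
    alpha = math.atan2(y_v, x_v)                 # heading error to target
    return math.atan2(2.0 * WHEELBASE * math.sin(alpha), ld)

print(pure_pursuit_steer((0, 0), 0.0, (5.0, 0.0)))   # target dead ahead: steer 0
print(pure_pursuit_steer((0, 0), 0.0, (4.0, 3.0)) > 0)   # target to the left: positive steer
```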

IV. EVALUATION

We train and evaluate the presented approach using the CARLA simulator. α is set to 0.7 m, and β is set to 0.25 rad.

A. System Setup

As shown in Fig. 1, the driving purpose d selects the driving policy π. The selected π_d then computes the steering angle a from the camera image c, a front-view RGB camera image. a is discretized into 30 classes with a resolution of 36 degrees, and it serves as the label of the input. Finally, a is passed through a low-pass filter for smooth driving. The target speed is set to 15 km/h and is lowered in proportion to the steering angle a.
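The action post-processing described above can be sketched as follows: a predicted class index is mapped back to a steering value and smoothed with a first-order low-pass filter. The normalized steering range, the class-to-angle mapping, and the filter gain are assumptions; the text only specifies 30 classes and the use of a low-pass filter.

```python
# Sketch of the steering post-processing: class index -> steering value
# -> first-order low-pass filter. Range and gain are assumptions.
N_CLASSES = 30
STEER_MAX = 1.0      # assumed normalized steering range [-1, 1]
GAIN = 0.3           # assumed low-pass filter gain

def class_to_steer(k):
    """Map class index 0..29 linearly to [-STEER_MAX, STEER_MAX]."""
    return -STEER_MAX + 2.0 * STEER_MAX * k / (N_CLASSES - 1)

def low_pass(prev, new, gain=GAIN):
    """First-order filter: move only a fraction of the way to the new value."""
    return prev + gain * (new - prev)

steer = 0.0
for k in [14, 14, 20, 20, 20]:        # raw per-step network outputs
    steer = low_pass(steer, class_to_steer(k))
print(round(steer, 3))                # smoothed steering command
```

The filter keeps the commanded steering from jumping between discrete classes, which is what makes the discretized policy drivable.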

B. Environments Setup

Unreal Engine 4 is used to build a narrow track and a parking lot with static obstacles. The curvature of the narrow track is 0.14 and its width is 7 m (Fig. 2(a)). In the parking lot, the curvature is 0.1 and the width ranges from 2.7 m to 10 m (Fig. 2(b)).

C. Results

Fig. 2 shows the trajectory of the vehicle controlled by the trained driving policy π. On the narrow track shown in Fig. 2(a), the vehicle tended to cut the corner; nevertheless, it was able to drive the track without collision. In the complex parking lot (Fig. 2(b)), the vehicle narrowly avoided the static obstacles. When the vehicle drove the track 100 times, there were about 35 crashes. When we tested the driving policy π in non-trained environments, the vehicle could not drive well.

(a)

(b)

Fig. 2. End-to-end imitation learning test results. The red line is the vehicle trajectory, the blue dashed box is the intersection zone, and the green dashes and arrows indicate the driving purpose. (a) Narrow track; (b) parking lot.

V. CONCLUSION

We propose conditional imitation learning according to driving purpose, an approach that learns low-level controls from expert demonstrations together with high-level commands. In simulation, we applied the presented approach to camera-based driving in unstructured environments with an intersection. Our results show that the vehicle can drive safely in trained environments. Several directions for future work will be considered: i) training with velocity, ii) driving in untrained environments, iii) dynamic obstacles, and iv) real-vehicle tests.

REFERENCES

[1] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.

[2] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, "End-to-end driving via conditional imitation learning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1–9.

[3] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.