The AI Driving Olympics at NIPS 2018

Andrea Censi,1,2 Liam Paull,3∗ Jacopo Tani,1 Thomas Ackermann,1 Oscar Beijbom,2 Berabi Berkai,1 Gianmarco Bernasconi,1 Anne Kirsten Bowser, Simon Bing,1 Pin-Wei David Chen,4 Yu-Chen Chen,4 Maxime Chevalier-Boisvert,3 Breandan Considine,2,3 Justin De Castri,5 Maurilio Di Cicco,2 Manfred Diaz,3 Paul Aurel Diederichs,1 Florian Golemo,3,6 Ruslan Hristov,2 Lily Hsu,4 Yi-Wei Daniel Huang,4 Chen-Hao Peter Hung,4 Qing-Shan Jia,7 Julien Kindle,1 Dzenan Lapandic,1 Cheng-Lung Lu,1,4 Sunil Mallya,5 Bhairav Mehta,3 Aurel Neff,1 Eryk Nice,2 Yang-Hung Allen Ou,4 Abdelhakim Qbaich,3 Josefine Quack,1 Claudio Ruch,1 Adam Sigal,3 Niklas Stolz,1 Alejandro Unghia,1 Ben Weber,1 Sean Wilson,8 Zi-Xiang Xia,4 Timothius Victorio Yasin,4 Nivethan Yogarajah,1 Julian Zilly,1 Yoshua Bengio,3 Tao Zhang,7 Hsueh-Cheng Wang,4 Stefano Soatto,5 Magnus Egerstedt,8 Emilio Frazzoli1,2

Abstract

Deep learning and reinforcement learning have had dramatic recent breakthroughs. However, the ability to apply these approaches to control real physically embodied agents remains primitive compared to traditional robotics approaches.

To help bridge this gap, we are announcing the AI Driving Olympics (AI-DO), which will be a live competition at the Neural Information Processing Systems (NIPS) conference in Dec. 2018. The overall objective of the competition is to evaluate the state of the art of machine learning and artificial intelligence on a physically embodied platform. We are using the Duckietown [14] platform since it is a simple and well-specified environment that can be used for autonomous navigation.

The competition comprises five tasks of increasing complexity, from simple lane following to managing an autonomous fleet. For each task we will provide tools for competitors to use in the form of simulators, logs, low-cost access to robotic hardware, and

1 ETH Zürich, Zürich, Switzerland
2 nuTonomy, Aptiv Mobility Group, Boston, USA
3 Mila, Université de Montréal, Montréal, Canada
4 National Chiao Tung University, Taiwan
5 Amazon AWS, Seattle, USA
6 INRIA Bordeaux, France
7 Tsinghua University, Beijing, China
8 Georgia Tech, Atlanta, USA
∗ Corresponding author: paulll@iro.umontreal.ca

live “Robotariums” where submissions will be evaluated automatically.

Keywords Robotics, safety-critical AI, self-driving cars, autonomous mobility on demand.

1 Introduction

Machine Learning (ML) methods have shown remarkable success for tasks in both disembodied dataset settings and virtual environments, yet still have to prove themselves in embodied robotic task settings. The requirements for real physical systems are quite different: (a) the robot must take actions based on sensor data in real time, (b) in many cases all computation must take place onboard the platform, where resources may be limited, and (c) many physical systems are safety-critical, which imposes a necessity for performance guarantees and a good understanding of uncertainty. All of these requirements stress current ML methods, which are built on Deep Learning (DL), are resource hungry, and don’t provide performance guarantees “out of the box”.

“Classical” robotics systems are built as a composition of blocks (perception, estimation, planning, control, etc.), and an understanding of the interplay between these different components is necessary for robots to be able to achieve complex and useful tasks. However, with ML methods it is unclear whether these abstractions are necessary and whether it is


Figure 1: Duckietown is a scaled-down autonomous city where the inhabitants are duckies. The platform comprises autonomous vehicles (Duckiebots) and a well-specified environment in which the Duckiebots operate. This platform is used as the basis for the AI Driving Olympics competition, which will take place at the NIPS 2018 conference.

more effective to learn chains of these components in an “end-to-end” fashion or to separate the components out and learn them in isolation.

In order to address these issues, we have developed a new benchmark for evaluating ML on physically embodied systems, called the “AI Driving Olympics” or AI-DO. The first AI-DO event will be a live competition at the Neural Information Processing Systems (NIPS) 2018 conference in Montreal, Quebec.

A main consideration for casting this research as a competition is that solutions to embodied robotic tasks are notoriously difficult to compare [2, 4]. As argued by Behnke [4], competitions are therefore a suitable way of advancing robotics by making solutions comparable and results reproducible. Without a clear benchmark of performance, progress is difficult to measure. To drive home this message, we recall the story of the drunkard who has lost his house keys and then searches for them under the street light. When asked why he searches close to the street light, he answers that it is because there is more light there [9]. Learning from this experience, we aspire to cast this competition in a way that encourages solving the problems that need to be solved rather than the ones we know how to solve.

Furthermore, as argued in [18] in the context of computer vision, large-scale endeavors with relevant real-world applications have the potential to push the envelope of what is possible further, in a process of productive failure [10]. Productive failure is a concept pioneered by Kapur which emphasizes the importance of productive struggle in the context of learning.

1.1 Novelty and related work

We call this competition the “AI Driving Olympics” (AI-DO) because there will be a set of different trials that correspond to progressively more sophisticated behaviors for cars. These vary in complexity, from the reactive task of lane following to more complex and “cognitive” behaviors outlined in section 3.

The competition will be live at the Neural Information Processing Systems (NIPS) conference, but participants will not need to be physically present; they will just need to send their source code.

There will be qualifying rounds in simulation, similar to the recent DARPA Robotics Challenge [8], and we will make use of “robotariums” [16], facilities that allow remote experimentation in a reproducible setting, described more closely in section 2.

Many competitions exist in the robotics field. One example is the long-running annual Robocup [12], originally conceived for robot soccer (wheeled, quadruped, and biped) and later extended to other tasks (search and rescue, home assistance, etc.). Other impactful competitions are the DARPA Challenges, such as the DARPA Grand Challenges [13] in 2007-8 that rekindled the field of self-driving cars, and the recent DARPA Robotics Challenge for humanoid robots [8].

In ML in general, and at NIPS in particular, there exist no competitions that involve physical robots. Yet the interactive, embodied setting is thought to be an essential scenario in which to study intelligence [15]. The application to robotics is often stated as a motivation in recent ML literature (e.g., [5, 7] among many others). However, the vast majority of these works only report results in simulation [11] and on very simple (usually grid-like) environments.

Table 1: Competition and Benchmark Comparison

Competition            DARPA [13]   KITTI [6]   Robocup [12]   RL comp. [11]   HuroCup [3]   AI-DO
Accessibility              -            ✓            ✓               ✓               ✓           ✓
Diverse metrics            ✓            -            ✓               -               ✓           ✓
Modularity                 -            -            -               -               -           ✓
Resource constraints       ✓            -            ✓               -               ✓           ✓
Simulation/Realism         ✓            ✓            ✓               -               ✓           ✓
ML compatibility           -            ✓            -               ✓               -           ✓
Embodiment                 ✓            -            ✓               -               ✓           ✓
Teaching                   -            -            -               -               -           ✓

Figure 2: Modularity - The full robotics problem is achieved through a hierarchical composition of blocks (perception, estimation, planning, control; e.g., lane detection, object detection, image rectification, color correction). Which are the right components, and what is the right level of the hierarchy at which to apply machine learning?

To highlight what makes robotics and this competition unique, we list essential characteristics comparing existing competitions to the upcoming AI-DO, as outlined in Tab. 1.

• Accessibility: No upfront costs are required other than the option of assembling a Duckiebot. Additionally, classes teaching robotics and machine learning in the Duckietown setting will be made available online, as described in section 5.

• Resource constraints: In robotics, constraints on power, computation, memory, and actuation play a vital role.

• Modularity: More often than not, robotic pipelines can be decomposed into several modules (see Fig. 2).

• Simulation/Realism: Duckietowns will be made available both in simulation and in reality, posing interesting questions about the relationship between the two.

• ML compatibility: Past and ongoing data from vehicles in Duckietown will be made available to allow for the training of ML algorithms.

• Embodiment: As with any real robot, closed-loop and real-time interactive tasks await the participants of the competition.

• Diverse metrics: In most real-world settings, no single number determines performance on a task. Similarly, AI-DO employs multiple diverse performance metrics simultaneously.

2 The Platform

We make available a number of different resources to competitors to help them build, test, and finally evaluate their algorithms:


Figure 3: Currently 17 hours of logs are available from 38 different robots, with more being continuously added.

1. The physical Duckietown platform [14, 17] (Fig. 1): miniature vision-based vehicles and cities in which the vehicles drive. This is an affordable setup (∼$200/robot), with rigorously defined hardware and environment specifications (e.g., the appearance specification, which enables repeatable experimentation, in the online documentation [1]).

2. Data (Fig. 3): Access to many hours of logs from previous use of the platform in diverse environments (a database that will grow as the competition unfolds).

3. A cloud simulation and training environment: for testing in simulation before trying on the physical fleet of robots.

4. “Robotariums”: remotely accessible, completely autonomous environments, continuously running the Duckietown platform. Robotariums will allow participants to see their code running in controlled and reproducible conditions, and to obtain performance metrics on the tasks.

2.1 The Development Pipeline

Given the availability and relative affordability of the platform, we expect most participants to choose to build their own Duckietown and Duckiebots and develop their algorithms on the physical platform.

Alternatively, we provide a realistic cloud simulation environment. The cloud simulation also serves as a selection mechanism for access to a robotarium.

The robotariums enable reproducible testing in controlled conditions. Moreover, they will produce the scores for the tasks by means of a network of street-level image sensors. The robotarium scores are the official scores for the leaderboard, and they are used for the final selection of which code will be run at the live competition at NIPS. The participants will not need to be physically at NIPS; they can participate remotely by submitting a Docker container, which will be run for them following standardized procedures.

2.1.1 The physical Duckietown platform

The physical Duckietown platform comprises autonomous vehicles (Duckiebots) and a customizable model urban environment (Duckietown) [1].

The Robot Duckiebots are equipped with only one sensor: a front-viewing camera with a 160° fish-eye lens, streaming 640 × 480 resolution images reliably at 30 fps.

Actuation is provided through two DC motors that independently drive the front wheels (differential-drive configuration), while the rear end of the Duckiebot is equipped with a passive omnidirectional wheel. A minimum radius-of-curvature constraint is imposed, at the software level, to simulate more car-like dynamics.
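Such a software-level constraint might be implemented as a simple clamp on the commanded angular velocity before converting to wheel speeds. The following is an illustrative sketch only; the wheel baseline, minimum radius, and function names are our assumptions, not the actual Duckietown software.

```python
# Sketch of a software-imposed minimum turning radius for a
# differential-drive robot (all names and values are illustrative).

WHEEL_BASELINE = 0.10   # distance between the driven wheels [m] (assumed)
MIN_TURN_RADIUS = 0.15  # smallest allowed radius of curvature [m] (assumed)

def clamp_turn_radius(v: float, omega: float) -> tuple:
    """Limit angular velocity so the instantaneous radius |v/omega|
    never drops below MIN_TURN_RADIUS, simulating car-like dynamics."""
    if omega != 0.0 and abs(v / omega) < MIN_TURN_RADIUS:
        omega = abs(v) / MIN_TURN_RADIUS * (1 if omega > 0 else -1)
    return v, omega

def to_wheel_speeds(v: float, omega: float) -> tuple:
    """Unicycle command (v, omega) -> left/right wheel linear speeds."""
    v, omega = clamp_turn_radius(v, omega)
    v_left = v - omega * WHEEL_BASELINE / 2
    v_right = v + omega * WHEEL_BASELINE / 2
    return v_left, v_right
```

Note that under this clamp a command to turn in place (v = 0, omega ≠ 0) is suppressed entirely, which is exactly the car-like behavior the constraint is meant to simulate.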

All the computation is done onboard, on a Raspberry Pi 3 B+ computer equipped with a quad-core 1.4 GHz, 64-bit CPU and 1 GB of RAM.

We might support other configurations for the pur-poses of deploying neural networks onto the robots.

Power is provided by a 10 Ah battery, which guarantees several hours of operation.


Figure 4: The Duckietown environment is rigorously defined at the road and signal levels. When the appearance specifications are met, Duckiebots are guaranteed to navigate cities of any conforming topology.

The Environment Duckietowns are modular, structured environments built on two layers: the road layer and the signal layer (Fig. 4).

There are six well-defined road segments: straight, left and right 90° turns, 3-way intersection, 4-way intersection, and empty tile. Each is built on individual tiles, and their interlocking enables customization of city sizes and topographies. The appearance specifications detail the color and size of the lines as well as the geometry of the roads.
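A city built from interlocking tiles can be thought of as a rectangular grid over the six tile types. The sketch below is purely illustrative; the tile names and the validity check are our own invention, and the real layout rules live in the appearance specification.

```python
# Illustrative sketch: a Duckietown map as a grid over six tile types.
# Tile names are made up for this example; the real specification is in
# the online documentation.

TILE_TYPES = {"straight", "turn_left", "turn_right",
              "3way_intersection", "4way_intersection", "empty"}

# A hypothetical 3x3 town: a small loop of road around an empty center.
SMALL_LOOP = [
    ["turn_right", "straight", "turn_left"],
    ["straight",   "empty",    "straight"],
    ["turn_left",  "straight", "turn_right"],
]

def validate(town):
    """Check that the grid is rectangular and uses only known tile types."""
    width = len(town[0])
    assert all(len(row) == width for row in town), "grid must be rectangular"
    for row in town:
        for tile in row:
            assert tile in TILE_TYPES, f"unknown tile: {tile}"
    return True
```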

The signal layer comprises street signs and traffic lights.

In the baseline implementation, street signs are AprilTags (in combination with typical road-sign symbols) that enable global localization and the interpretation of intersection topologies by Duckiebots. The appearance specifications detail their size, height, and positioning in the city. Traffic lights provide a centralized solution for intersection coordination, encoding signals in different LED blinking frequencies [1].

Figure 5: The tasks (lane following, navigation, fleet management) are designed to be increasingly complex, and consequently investigate the autonomous mobility-on-demand problem at increasing levels of abstraction.

3 Tasks

Many recent works in deep (reinforcement) learning cite robotics as a potential application domain [5, 7]. However, comparatively few actually demonstrate results on physical agents. This competition is an opportunity to properly benchmark the current state of the art of these methods as applied to a real robotics system.

Our experience thus far indicates that many of the inherent assumptions made in the ML community may not be valid on real-time physically embodied systems. Additionally, considerations related to resource consumption, latency, and system engineering are rarely considered in the ML domain but are crucially important for fielding real robots. We hope this competition can be used to benchmark the state of the art as it pertains to real physical systems and, in the process, spawn a more meaningful discussion about what is necessary to move the field forward.

The best possible outcome is that a larger proportion of the ML community redirects its efforts towards real physical agents acting in the real world, and helps to address the unique characteristics of the problem. The guaranteed impact is that we can establish a baseline for where the state of the art really is in this domain.


We propose a sequence of five challenges, in increasing order of difficulty. We briefly discuss the goal of each challenge; later, we will discuss the metrics in detail.

LF Lane following on a closed course, with static obstacles (cones and parked vehicles). The robot will be placed on a conforming closed track (with no intersections) and should follow the track in the right-hand lane.

LFV Lane following on a continuous closed course, with static obstacles plus dynamic vehicles, controlled “by the matrix,” sharing the road. Now the agent needs to deal with an environment populated by other intelligent agents.

NAVV Navigation plus vehicles controlled by the matrix. Requires that the robot implement a coordination protocol for navigating the intersections.

FM Fleet management: This scenario is meant to resemble a taxi service. Customer requests to go from location “A” to location “B” arrive sequentially and need to be served in an intelligent manner. Participants are asked to submit a central dispatcher that submits navigation tasks to Duckiebots to best serve customer requests.

AMOD Autonomous mobility on demand: In addition to dynamic navigation, participants must implement a central dispatcher that provides goals to the individual robots. Points are given for the number of customers served.

3.1 Metrics and Judging

Each challenge will have specific objective metrics to quantify success. The success metrics will be based on measurable quantities such as speed, timing, and the automated detection of breaches of the rules of the road. No human judges are necessary.

There will be three classes of metrics:

1. Traffic law compliance metrics, to penalize illegal behavior (e.g., not respecting the right of way);

2. Performance metrics, such as the average speed, to penalize inefficient solutions;

3. Comfort metrics, to penalize unnatural solutions that would not be comfortable to a passenger.

We will keep these metrics separate, and evaluate the solutions on a partial order, rather than creating a total order by collapsing the metrics into a single reward function to optimize. We will impose minimum objectives (e.g., a minimum speed, a maximum number of violations) to avoid degenerate solutions (e.g., a robot that does not move is very safe, but it does not have an acceptable performance). A more detailed mathematical description of these metrics is provided in the supplementary material (section 8).

We anticipate that there will be multiple winners ineach competition (e.g. a very conservative solution,a more adventurous one, etc.).
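As an illustration of this partial-order evaluation, the sketch below is our own, with the metric tuple layout, thresholds, and function names all assumed: submissions outside the minimum-objective caps are ruled out, and the remaining ones are compared by Pareto dominance, which naturally admits several winners.

```python
# Sketch of a partial-order evaluation over cost tuples
# (traffic-law violations, performance cost, comfort cost), all minimized.
# This is our illustration, not the official AI-DO scoring code.

def feasible(costs, limits):
    """Admit a submission only if every cost is within its cap, ruling
    out degenerate solutions such as a robot that never moves."""
    return all(c <= cap for c, cap in zip(costs, limits))

def dominates(a, b):
    """a dominates b if it is no worse on every metric and strictly
    better on at least one (Pareto dominance)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(entries):
    """Keep the submissions not dominated by any other; several
    incomparable entries can survive, i.e. multiple winners."""
    return [a for a in entries
            if not any(dominates(b, a) for b in entries if b != a)]
```

Because the front is a set rather than a single ranked list, a conservative solution and a more adventurous one can both remain undominated, matching the multiple-winners outcome anticipated above.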

4 Technical Components

4.1 Localization

We are building infrastructure in our Robotariums system to automate the localization of the robots. This is essential for two reasons: 1) to evaluate the score of the agents, and 2) to robustly automate the process of resetting the agents at their initial configurations in preparation for the evaluation of the next entry.

The approach for automatic localization is based on a system of cameras mounted on traffic lights and other elevated city structures, which are designed to locate the robots and report on their locations (Fig. 6).

Additional solutions might be used to increase overall robustness (e.g., ceiling-mounted cameras).


Table 2: Overview of tasks in pictures. From left to right: lane following, lane following with dynamic vehicles, navigation, fleet management, and autonomous mobility on demand.

Figure 6: The overhead cameras are used to provide ground-truth localization of the robots inside the Robotarium environments.

4.2 Containerization

In order to lower the barrier of entry for participants and minimize the amount of code refinement between platforms (simulators, robotarium, real robot), we are developing a state-of-the-art containerization infrastructure, which is also compatible with the cloud interface.

4.3 Simulation

Figure 7: Our Docker containerization system acts as an interface between the hardware/simulator and the competitors’ developed code.

We are developing a suite of simulators that are available to the competitors for use in development. These simulators scale in fidelity with the complexity of the tasks:

• we have a very lightweight “machine-learning friendly” simulator, with minimal dependencies and a low-level API based on OpenGL (shown in Fig. 8), that is used for the lane following tasks;

• for the navigation tasks, we provide a higher-fidelity simulator based on Unity;

• for the AMOD task, we provide a fleet-level simulator.

4.4 Baselines

We are providing baseline “strawman” solutions, based both on “classical” and on learning-based approaches. These solutions are fully functional (they will get a score if submitted) and contain all the necessary components, but should be easily beatable.


Figure 8: Lightweight OpenGL-based simulator used for the lane following tasks.

Additionally, the baseline solutions will be modular, i.e., they will consist of a composition of containers, where competitors can choose to replace single containers or combinations of them (there need not be a one-to-one mapping between the containers in the baseline solutions and those in the competitor entries), as long as the well-specified APIs are preserved.
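The modularity contract can be illustrated in miniature: as long as a replacement module honors the same interface as the baseline, downstream code is unaffected by the swap. The interface, class names, and toy control law below are invented for this sketch; they are not the actual AI-DO APIs.

```python
# Illustration of swapping modules behind a fixed, well-specified API
# (names invented for this sketch; not the actual AI-DO interfaces).
from abc import ABC, abstractmethod

class LaneDetector(ABC):
    @abstractmethod
    def detect(self, image) -> float:
        """Return the estimated lateral offset from the lane center [m]."""

class ClassicalLaneDetector(LaneDetector):
    def detect(self, image) -> float:
        # e.g. color filtering + line fitting; stubbed out here
        return 0.0

class LearnedLaneDetector(LaneDetector):
    def detect(self, image) -> float:
        # e.g. a CNN regressor; stubbed out here
        return 0.0

def run_pipeline(detector: LaneDetector, image) -> float:
    # Downstream control depends only on the interface, not on which
    # implementation fills the container.
    offset = detector.detect(image)
    return -0.5 * offset  # toy proportional steering command
```

A competitor could replace only `LearnedLaneDetector` with their own container while keeping the rest of the baseline pipeline intact.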

5 Pedagogical

It is envisioned that there will be two different groups of competitors: (a) independent learners not affiliated with any institution and (b) participants working within the framework of a university class. A second version of the globally synchronized class will culminate this year with students submitting entries to the competition. We have provided a full suite of online learning materials in the “Duckiebook” [1] to support both groups. These materials include slides, theory units, exercises, and demonstration instructions. For more details about the educational components of the project, please see [17].

6 Outreach

One of the key design criteria for this project and competition is a low barrier of entry, both in terms of effort and cost. This is very important since we want to encourage participation from all geographical and demographic categories.

To achieve this objective we have:

• designed the containerization framework to allow for rapid development;

• striven to keep the hardware components of the platform accessible (mostly off-the-shelf items) and low cost;

• provided access to cloud infrastructure, to ensure a level playing field for all participants.

7 Conclusion

We are excited to announce a new competition, the AI Driving Olympics, which will take place at the NIPS 2018 conference. The main objective is to evaluate the performance of state-of-the-art machine learning systems in the real, physically embodied robot setting. The challenges inherent in deploying robots in the real world are quite different from those of most other applications where machine learning has had recent breakthroughs. We believe this is an opportunity to inspire the members of the ML community to focus more effort on the physically embodied scenario, and to provide a benchmark for the realistic comparison of algorithms. The platform is designed to have a very low barrier for entry, both in terms of cost and in terms of effort, and we therefore hope to attract participants from diverse geographical regions and underrepresented demographics.

References

[1] The Duckietown Book (Duckiebook). docs.duckietown.org.

[2] John Anderson, Jacky Baltes, and Chi Tai Cheng. Robotics competitions as benchmarks for AI research. The Knowledge Engineering Review, 26(1):11–17, 2011. doi: 10.1017/S0269888910000354.


[3] Jacky Baltes, Kuo-Yang Tu, Soroush Sadeghnejad, and John Anderson. HuroCup: competition for multi-event humanoid robot athletes. The Knowledge Engineering Review, 32, 2017.

[4] Sven Behnke. Robot competitions: ideal benchmarks for robotics research. In Proc. of IROS-2006 Workshop on Benchmarks in Robotics Research. Institute of Electrical and Electronics Engineers (IEEE), 2006.

[5] Devendra Singh Chaplot, Emilio Parisotto, and Ruslan Salakhutdinov. Active Neural Localization. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=ry6-G_66b.

[6] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3354–3361. IEEE, 2012.

[7] Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, and Alexander Lerchner. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70, pages 1480–1490, 2017.

[8] Karl Iagnemma and Jim Overholt. Introduction. Journal of Field Robotics, 32(2):187–188, 2015.

[9] Abraham Kaplan. The conduct of inquiry: Methodology for behavioural science. Routledge, 2017.

[10] Manu Kapur. Productive failure. Cognition andinstruction, 26(3):379–424, 2008.

[11] Ł. Kidzinski, S. P. Mohanty, C. Ong, J. L. Hicks, S. F. Carroll, S. Levine, M. Salathe, and S. L. Delp. Learning to Run challenge: Synthesizing physiologically accurate motion using deep reinforcement learning. ArXiv e-prints, March 2018.

[12] Hiroaki Kitano, Minoru Asada, Yasuo Kuniyoshi, Itsuki Noda, and Eiichi Osawa. Robocup: The robot world cup initiative. In Proceedings of the First International Conference on Autonomous Agents, pages 340–347. ACM, 1997.

[13] Keoni Mahelona, Jeffrey Glickman, Alexander Epstein, Zachary Brock, Michael Siripong, and K. Moyer. DARPA Grand Challenge. 2007.

[14] Liam Paull, Jacopo Tani, Heejin Ahn, Javier Alonso-Mora, Luca Carlone, Michal Cap, Yu Fan Chen, Changhyun Choi, Jeff Dusek, Yajun Fang, and others. Duckietown: an open, inexpensive and flexible platform for autonomy education and research. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 1497–1504. IEEE, 2017.

[15] Rolf Pfeifer and Christian Scheier. Understand-ing intelligence. MIT press, 2001.

[16] Daniel Pickem, Paul Glotfelter, Li Wang, Mark Mote, Aaron Ames, Eric Feron, and Magnus Egerstedt. The Robotarium: A remotely accessible swarm robotics research testbed. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 1699–1706. IEEE, 2017.

[17] Jacopo Tani, Liam Paull, Maria T. Zuber, Daniela Rus, Jonathan How, John Leonard, and Andrea Censi. Duckietown: an innovative way to teach autonomy. In International Conference EduRobotics 2016, pages 104–121. Springer, 2016.

[18] Julian Zilly, Amit Boyarski, Micael Carvalho, Amir Atapour Abarghouei, Konstantinos Amplianitis, Aleksandr Krasnov, Massimiliano Mancini, Hernán Gonzalez, Riccardo Spezialetti, Carlos Sampedro Perez, and others. Inspiring Computer Vision System Solutions. arXiv preprint arXiv:1707.07210, 2017.


8 Appendix: Performance metrics

Measuring performance in robotics is less clear-cut and more multidimensional than is traditionally encountered in machine learning settings. Nonetheless, to achieve reliable performance estimates, we assess submitted code on several episodes with different initial settings and compute statistics on the outcomes. We denote by J an objective or cost function to optimize, which we evaluate for every experiment. In the following formalization, objectives are assumed to be minimized.

In the following we summarize the objectives used to quantify how well an embodied task is completed. We produce scores in three categories: a performance objective, a traffic law objective, and a comfort objective. Note that these objectives are not merged into one single number.

8.1 Performance objective

Lane following (LF / LFV)

As a performance indicator for the "lane following" task (LF) and the "lane following with other dynamic vehicles" task (LFV), we choose the Duckiebot's speed v(t) along the road (not perpendicular to it) over time. Since we fix the time length of an episode, this measures the distance traveled per episode, which encourages both faster driving and algorithms with lower latency. An episode means running the code from a particular initial configuration.

J_{P-LF(V)}(t) = \int_0^t -v(\tau) \, d\tau

The integral of speed is evaluated over an episode up to time t = T_{eps}, where T_{eps} is the length of the episode.
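As an illustration, this integral can be approximated from sampled speeds with a simple Riemann sum. The following sketch is illustrative only; the function name and sampling scheme are our own assumptions, not part of the official evaluation code.

```python
# Illustrative sketch (not the official evaluation code): approximate
# J_P-LF(V)(t) = integral of -v(t) dt with a Riemann sum over sampled speeds.

def lane_following_cost(speeds, dt):
    """speeds: sampled along-road speeds v(t_k) in m/s; dt: sampling period in s."""
    return sum(-v * dt for v in speeds)

# Driving faster over the same fixed-length episode yields a lower (better) cost.
slow = lane_following_cost([0.1] * 100, dt=0.1)  # ~1 m traveled
fast = lane_following_cost([0.3] * 100, dt=0.1)  # ~3 m traveled
assert fast < slow
```

Because the objective is minimized, covering more distance in the fixed episode time makes the cost more negative, i.e. better.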

Navigation (NAVV)

Similarly, for the "navigation with dynamic vehicles" task (NAVV), we choose as performance indicator the time it takes to go from point A to point B within a Duckietown map. A trip from A to B is active as soon as the request is received and as long as it has not been completed. This is formalized in the integral below.

J_{P-NAVV}(t) = \int_0^t I_{AB\text{-}active} \, d\tau

The indicator function I_{AB-active} is 1 if a trip is active and 0 otherwise. Again, the integral is evaluated over an episode up to time t = T_{eps}, where T_{eps} is the length of the episode.

Fleet management (FM)

As the performance objective for task FM, we calculate the sum of trip times to go from A_i to B_i. This generalizes the objective from task NAVV to multiple trips. The difference from task NAVV is that multiple trips (A_i, B_i) may now be active at the same time. A trip is active as soon as it is requested and as long as it has not been completed. Likewise, multiple Duckiebots are now available to service the additional requests. To evaluate the metric reliably, multiple pairs of points A, B will be sampled at different times within an episode.

J_{P-FM}(t) = \sum_i \int_0^t I_{i\text{-}active} \, d\tau

The indicator function I_{i-active} is 1 if trip i is active and 0 otherwise. Again, the integral is evaluated over an episode up to time t = T_{eps}, where T_{eps} is the length of the episode.
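The active-trip indicator integrals for NAVV and FM can be sketched as follows. This is a minimal illustration with assumed names and a naive time discretization; the actual evaluation code may differ.

```python
def fleet_trip_time_cost(trips, t_eps, dt=0.1):
    """Approximate J_P-FM: sum over trips i of the time each trip is active.

    trips: list of (request_time, completion_time) pairs; with a single trip
    this reduces to the NAVV objective J_P-NAVV.
    """
    cost = 0.0
    for start, end in trips:
        for k in range(int(t_eps / dt)):
            t = k * dt
            if start <= t < end:  # the indicator I_i-active
                cost += dt
    return cost

# Two overlapping trips, active for 5 s and 6 s respectively:
assert abs(fleet_trip_time_cost([(0.0, 5.0), (2.0, 8.0)], t_eps=10.0) - 11.0) < 1e-6
```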

Autonomous mobility on demand (AMoD)

An AMoD system needs to provide the highest possible service level in terms of journey times and wait


times, while ensuring maximum fleet efficiency. We have two scoring metrics representing these goals in a simplified manner. In order to normalize their contributions, we supply a baseline case B for every scenario. In the baseline case, a simple heuristic principle is used to operate the AMoD system and its performance is recorded.

We introduce the following variables:

d_T : total distance driven by the fleet
d_E : empty distance driven by the fleet, without a customer on board
R ∈ N+ : number of requests in the scenario
w_i ∈ R : waiting time of request i
w_{i,B} : w_i in the baseline case B
N ∈ N+ : number of taxis
N_B ∈ N+ : N in the baseline case B

The first performance metric applies when the same number of vehicles as in the baseline case, N_B, is used:

J_{P-AMOD-1} = 0.5 \cdot \frac{d_E}{d_T} + 0.5 \cdot \frac{\sum_{i=1}^{R} w_i}{\sum_{i=1}^{R} w_{i,B}}

The second performance metric allows the designer to reduce the number of vehicles, if possible, or to increase it if deemed useful:

J_{P-AMOD-2} = J_{P-AMOD-1} + 0.5 \cdot \frac{N}{N_B}

For the AMoD task, only a performance metric will be evaluated. Robotic taxis are assumed to already observe the rules of the road and to drive comfortably; through the abstraction of the provided AMoD simulation, these conditions are already enforced.
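Under these definitions, the two AMoD scores can be computed as follows. This is a sketch with assumed function names, not the official scoring code.

```python
def amod_score_1(d_empty, d_total, waits, waits_baseline):
    """J_P-AMOD-1: 0.5 * empty-mileage ratio + 0.5 * normalized total waiting time."""
    return 0.5 * d_empty / d_total + 0.5 * sum(waits) / sum(waits_baseline)

def amod_score_2(d_empty, d_total, waits, waits_baseline, n_taxis, n_baseline):
    """J_P-AMOD-2: adds a fleet-size term 0.5 * N / N_B."""
    return (amod_score_1(d_empty, d_total, waits, waits_baseline)
            + 0.5 * n_taxis / n_baseline)

# Matching the baseline waiting times exactly contributes 0.5 to score 1,
# so with a 20% empty-mileage ratio the total is 0.5*0.2 + 0.5 = 0.6:
score = amod_score_1(d_empty=20.0, d_total=100.0,
                     waits=[30.0, 60.0], waits_baseline=[30.0, 60.0])
assert abs(score - 0.6) < 1e-9
```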

8.2 Traffic law objective

The following is a list of rule objectives that the Duckiebots are supposed to abide by within Duckietown. All individual rule violations are summarized in one overall traffic law objective J_T. These penalties hold for the lane following, navigation and fleet management tasks (LF, LFV, NAVV, FM).

Figure 9: Picture depicting a situation in which the staying-in-the-lane rule applies.

Quantification of "Staying in the lane" The Duckietown traffic laws say:

"The vehicle must stay at all times in the right lane, and ideally near the center."

We quantify this as follows: let d(t) be the absolute perpendicular distance of the center of mass of the Duckiebot body from the middle of the right lane, such that d(t) = 0 corresponds to the robot being in the center of the right lane at a given instant. While d(t) stays within an acceptable range, no cost is incurred. When the safety margin d_safe is violated, cost accumulates proportionally to the square of d(t), up to an upper bound d_max. If even this bound is violated, a lump penalty α is incurred.

The "stay-in-lane" cost function is therefore defined as:


J_{T-LF}(t) = \int_0^{T_{eps}} \begin{cases} 0 & d(t) < d_{safe} \\ \beta \, d(t)^2 & d_{safe} \le d(t) \le d_{max} \\ \alpha & d(t) > d_{max} \end{cases} \; dt

An example situation in which a Duckiebot does not stay in the lane is shown in Fig. 9.
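The piecewise penalty can be sketched numerically as follows. The names and example thresholds are illustrative only; the actual weights α and β will be set by the organizers.

```python
def stay_in_lane_cost(distances, dt, d_safe, d_max, alpha, beta):
    """Approximate J_T-LF by integrating the piecewise lane-deviation penalty."""
    cost = 0.0
    for d in distances:
        if d < d_safe:
            penalty = 0.0           # within the acceptable range
        elif d <= d_max:
            penalty = beta * d**2   # quadratic region
        else:
            penalty = alpha         # lump penalty beyond d_max
        cost += penalty * dt
    return cost

# Samples: centered (no cost), slightly off-center (quadratic), far out (lump).
cost = stay_in_lane_cost([0.0, 0.1, 0.2], dt=1.0,
                         d_safe=0.05, d_max=0.15, alpha=10.0, beta=100.0)
assert abs(cost - 11.0) < 1e-9
```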

Quantification of "Stopping at red intersection line" The Duckietown traffic laws say:

"Every time the vehicle arrives at an intersection with a red stop line, the vehicle should come to a complete stop in front of it, before continuing."

Figure 10: Picture depicting a Duckiebot stopping at a red intersection line.

During each intersection traversal, the vehicle is penalized by γ if there was no time t at which the vehicle was at rest (v(t) = 0) within the stopping zone, defined as the rectangular area of the same width as the red stop line, extending between 3 and 10 cm from the start of the stop line, measured perpendicular to it with respect to the Duckiebot's center of mass. This situation is demonstrated in Fig. 10. The condition that the position p_bot(t) of the center of mass of the Duckiebot is in the stopping zone is denoted by p_bot(t) ∈ S. We then write the objective as the cumulative sum of stopping-at-intersection rule infractions.

J_{T-SI}(t) = \sum_{t_k} \gamma \cdot I_{\nexists t \; \text{s.t.} \; v(t) = 0 \,\wedge\, p_{bot}(t) \in S}

Here the sum over time increments t_k denotes the time intervals in which this condition is checked. The rule penalty is only applied once the Duckiebot leaves the stopping zone; only then is it clear that it did not stop within the stopping zone.

To measure this cost, the velocities v(t) are evaluated while the robot is in the stopping zone S.
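A sketch of how this penalty might be checked per traversal follows. The function name and the "at rest" tolerance are our assumptions; a real implementation would need a velocity threshold since sampled speeds are rarely exactly zero.

```python
def stop_line_penalty(traversals, gamma, eps=1e-3):
    """Approximate J_T-SI: add gamma for every traversal with no full stop.

    traversals: for each intersection traversal, the speeds v(t) sampled
    while the Duckiebot's center of mass was inside the stopping zone S.
    eps: tolerance below which the robot counts as being at rest.
    """
    return sum(gamma for speeds in traversals
               if not any(abs(v) < eps for v in speeds))

# The first traversal contains a full stop, the second does not:
assert stop_line_penalty([[0.2, 0.0, 0.1], [0.3, 0.2]], gamma=5.0) == 5.0
```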

Quantification of "Keep safety distance" The Duckietown traffic laws say:

"Each Duckiebot should stay at an adequate distance from the Duckiebot in front of it, on the same lane, at all times."

We quantify this rule as follows: let b(t) denote the distance between the center of mass of the Duckiebot and the center of mass of the closest Duckiebot in front of it in the same lane. Furthermore, let b_safe denote a cut-off distance beyond which a Duckiebot is deemed "far away", and let δ denote a positive scalar weighting factor. Then

J_{T-SD}(t) = \int_0^t \delta \cdot \max(0, \, b_{safe} - b(\tau))^2 \, d\tau
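A discrete-time sketch of this cost, assuming (consistently with the rule text) that the penalty applies while the gap to the Duckiebot in front is below b_safe:

```python
def safety_distance_cost(gaps, dt, b_safe, delta):
    """Approximate J_T-SD: quadratic penalty while following closer than b_safe."""
    return sum(delta * max(0.0, b_safe - b)**2 * dt for b in gaps)

# One sample is far away (no cost), one is 0.1 m closer than the cut-off:
assert abs(safety_distance_cost([0.5, 0.2], dt=1.0, b_safe=0.3, delta=10.0) - 0.1) < 1e-9
```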

Quantification of "Avoiding collisions" The Duckietown traffic laws say:

"At any time, a Duckiebot shall not collide with another object or Duckiebot."

Figure 11: Picture depicting a collision situation.


The vehicle is penalized by ν if, within a time interval of length t_k, the distance ℓ(t) between the vehicle and a nearby object or other vehicle is zero or near zero. Here ℓ(t) denotes the perpendicular distance between any object and the Duckiebot's rectangular footprint. The collision cost objective is therefore

J_{T-AC}(t) = \sum_{t_k} \nu \cdot I_{\exists t \in [t - t_k, \, t) \,:\, \ell(t) < \epsilon}

Time intervals are chosen to allow for maneuvering after collisions without incurring further costs.

An illustration of a collision is displayed in Fig. 11.
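Per-interval collision charging can be sketched as below; the clearance threshold ε and penalty ν are placeholder values, and the function name is our own.

```python
def collision_cost(min_clearance_per_interval, nu, eps=0.01):
    """Approximate J_T-AC: charge nu once per time interval t_k in which the
    clearance l(t) to any object dropped below eps (i.e. a (near-)contact)."""
    return sum(nu for clearance in min_clearance_per_interval if clearance < eps)

# Two of the four intervals contain a (near-)contact event, so the cost is 2*nu:
assert collision_cost([0.5, 0.0, 0.3, 0.005], nu=20.0) == 40.0
```

Charging at most once per interval mirrors the remark above: a robot maneuvering away after a collision is not penalized repeatedly within the same interval.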

Quantification of "Yielding the right of way" The Duckietown traffic laws say:

"Every time a Duckiebot arrives at an intersection with a road joining on the right, it needs to check whether there are other Duckiebots on the right-hand lane of the joining road. If so, these vehicles shall traverse the intersection first."

Mathematically, we accumulate a penalty µ whenever the Duckiebot moves at an intersection while there is a Duckiebot (DB) in the right-hand joining lane (RHL).

J_{T-YR}(t) = \sum_{t_k} \mu \cdot I_{v(t) > 0 \,\wedge\, \exists \, \text{DB in RHL}}

The yield situation at an intersection is depicted in Fig. 12.

Hierarchy of rules To account for the relative importance of the rules, the factors α, β, γ, δ, ν, µ of the introduced rules are weighted relative to each other.

Letting > here denote "more important than", we define the following rule hierarchy:

J_{T-AC} > J_{T-SI} > J_{T-YR} > J_{T-SD} > J_{T-LF}

I.e.:

Collision avoidance > Stop line > Yielding > Safety distance > Staying in the lane.

Figure 12: Picture depicting a situation in which the yield rule applies.

This constrains the factors α, β, γ, δ, ν, µ, whose exact values will be determined empirically to enforce this relative importance.

While infractions of individual rules will be reported, as a performance indicator all rule violations are merged into one overall traffic law objective J_T. Let T denote a particular task; the rule violation objective is then the sum of all individual rule violations J_i that are an element of that task:

J_T = \sum_i I_{J_i \in T} \cdot J_{T-i}

where I_{J_i ∈ T} is the indicator function that is 1 if a rule belongs to the task and 0 otherwise.
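The task-dependent aggregation of rule penalties can be sketched as follows; the rule names and the per-task rule set below are hypothetical examples, not the official assignment.

```python
def traffic_law_objective(rule_costs, task_rules):
    """J_T: sum the rule violation costs J_T-i for the rules belonging to task T."""
    return sum(cost for rule, cost in rule_costs.items() if rule in task_rules)

# Hypothetical per-rule penalties accumulated over an episode:
costs = {"collision": 40.0, "stop_line": 5.0, "lane": 1.25, "yield": 0.0}
# Hypothetical rule set for a task (illustration only):
lf_rules = {"collision", "lane"}
assert traffic_law_objective(costs, lf_rules) == 41.25
```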

8.3 Comfort metric

Lane following and navigation (LF, LFV, NAVV)

In the single-robot setting, we encourage "comfortable" driving solutions. We therefore penalize large accelerations to achieve smoother driving. This is quantified through smoothed changes in the Duckiebot position p_bot(t), where smoothing is performed by convolving p_bot(t) with a smoothing filter k_smooth.

As a comfort metric, we measure the smoothed absolute changes in position ∆p_bot(t) over time.

J_{C-LF/LFV/NAVV}(t) = \int_0^t (k_{smooth} * \Delta p_{bot})(\tau) \, d\tau
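One possible reading of this metric, taking ∆p_bot to be the per-step absolute position changes and k_smooth a normalized moving average (both our assumptions; the official filter is not specified here), is:

```python
def comfort_cost(positions, dt, kernel):
    """Approximate J_C: integrate the smoothed absolute position changes.

    positions: sampled 1-D positions p_bot(t_k) (along-track, for simplicity)
    kernel: normalized smoothing filter k_smooth
    """
    # absolute per-step changes |Delta p_bot|
    deltas = [abs(b - a) for a, b in zip(positions, positions[1:])]
    n = len(kernel)
    # 'valid'-mode discrete convolution of the kernel with the deltas
    smoothed = [sum(k * d for k, d in zip(kernel, deltas[i:i + n]))
                for i in range(len(deltas) - n + 1)]
    return sum(s * dt for s in smoothed)

# Stop-and-go motion accumulates at least as much cost as uniform motion
# over the same number of samples:
uniform = [0.1 * k for k in range(10)]
stop_and_go = [0.0, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9, 1.2, 1.2, 1.5]
kernel = [1 / 3] * 3
assert comfort_cost(stop_and_go, 0.1, kernel) >= comfort_cost(uniform, 0.1, kernel)
```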

Fleet management (FM)

In the fleet management setting, "customer experience" is greatly influenced by how fast and dependable a service is. If it is known that a taxi arrives quickly after ordering, the overall taxi service is more convenient.

We therefore define the comfort metric as the maximal waiting time T_wait until customer pickup, where T_wait denotes the time from the reception of a ride request until the ride is started. Let S_wait(t) = {T_wait,1, ...} denote the set of waiting times of all started ride requests A_i → B_i up to time t. The comfort metric of the fleet management task is then the maximal waiting time stored in the set S_wait:

J_{C-FM}(t) = \max_{T_{wait,i} \in S_{wait}(t)} T_{wait,i}
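For completeness, a minimal sketch of this metric (function name assumed; the empty-fleet default of 0 is our own convention):

```python
def fm_comfort_cost(wait_times):
    """J_C-FM: maximal waiting time among all started ride requests so far."""
    return max(wait_times, default=0.0)

# The worst-served customer determines the score:
assert fm_comfort_cost([12.0, 45.0, 30.0]) == 45.0
```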

This concludes the exposition of the rules of the AI Driving Olympics. The rules and their evaluation are subject to change to ensure practicability and fairness of scoring.
