10
Exploration using an Information-Based Reaction-Diusion Process Maani Ghaari Jadidi, Jaime Valls Mir´o, Rafael Valencia, Juan Andrade-Cetto and Gamini Dissanayake University of Technology, Sydney (UTS), Australia {maani.ghaarijadidi, jaime.vallsmiro, gamini.dissanayake}@uts.edu.au ¨ Orebro University, Sweden [email protected] Institut de Rob` otica i Inform` atica Industrial, CSIC-UPC, Spain [email protected] Abstract In this paper, a novel solution for autonomous robotic exploration is proposed. We model the distribution of information in an unknown environment as an unsteady diusion process, which can be an appropriate mathematical for- mulation and analogy for expanding, time- varying, and dynamic environments. This in- formation distribution map is the solution of the diusion process partial dierential equa- tion, and is regressed from sensor data as a Gaussian Process. Optimization of the pro- cess parameters leads to an optimal frontier map which describes regions of interest for fur- ther exploration. Since the presented approach considers a continuous model of the environ- ment, it can be used to plan smooth exploration paths exploiting the structural dependencies of the environment whilst handling sparse sensor measurements. The performance of the ap- proach is evaluated through simulation results in the well-known Freiburg and Cave maps. 1 Introduction Exploration in unknown environments is among the ma- jor challenges for an autonomous robot. Autonomous exploration should be ecient with regards to one or more criteria (minimum energy consumption, shortest path, reduced map and localization uncertainties, re- duced computational complexity), and be performed This paper is an extended version of the paper presented in a RSS’13 workshop: M. Ghaari Jadidi, J. Valls Mir´ o, R. Valencia, J. Andrade-Cetto and G. Dissanayake. Explo- ration in Information Distribution Maps, RSS’13 Workshop on Robotic Exploration, Monitoring, and Information Con- tent, Berlin, Germany, 2013. J. Andrade acknowledges support from the Spanish Min- istry of Economy and Competitiveness project PAU+ DPI 2011-27510 and the EU FP7 project ARCAS ICT-287617. R. Valencia acknowledges support from the EU FP7 project SPENCER ICT-600877. with limited perception capabilities due to sensor lim- itations. Yamauchi [Yamauchi, 1997] expressed the central question in exploration as: “Given what you know about the world, where should you move to gain as much new information as possible?” Thus, exploration entails building a reliable and accurate world model in which to evaluate such criteria. Currently, almost all available ex- ploration methods rely on planar grid-based representa- tions. Relying on grid-based maps, however, ignores the structural dependency in the environment due to the as- sumption of independence between cells. Moreover, the use of grid-maps at a fixed resolution have scalability is- sues both in the size of the area they can handle, and in the dimension of the space to explore, leading to an increase in computational complexity. We tackle these two shortcomings of grid maps by de- vising a continuous map representation that takes into account structural dependencies by learning the environ- ment information distribution from sensor data. In this paper, information is defined as the inverse of the envi- ronment’s structural variance. This means that maxi- mization of information (completion of a map) leads to reduction of uncertainty in the environment’s map. We model the distribution of information in the environment as an unsteady diusion process, which is an appropriate mathematical formulation and analogy for expanding, time-varying, and dynamic environments. We use Gaus- sian Processes (GPs) as a regression tool in a Bayesian framework to learn the information distribution map. The resulting map is the solution of the diusion pro- cess PDE (partial dierential equation) [Stocker, 2011]. In contrast with frontier based methods which extract frontiers from an occupancy grid map, a continuous fron- tier map is computed by optimizing the process param- eters. By clustering the frontier map, possible goals for further exploration are extracted. After a review of the work related to our approach (Section 2), in Section 3 we provide a brief introduc- tion to Gaussian Processes. Next, in Section 4, we in-

Exploration using an Information-Based Reaction-Di usion

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exploration using an Information-Based Reaction-Di usion

Exploration using an Information-Based Reaction-Di↵usion Process

Maani Gha↵ari Jadidi, Jaime Valls Miro, Rafael Valencia, Juan Andrade-Cetto

and Gamini Dissanayake

University of Technology, Sydney (UTS), Australia{maani.gha↵arijadidi, jaime.vallsmiro, gamini.dissanayake}@uts.edu.au

Orebro University, Sweden [email protected] de Robotica i Informatica Industrial, CSIC-UPC, Spain [email protected]

Abstract

In this paper, a novel solution for autonomousrobotic exploration is proposed. We modelthe distribution of information in an unknownenvironment as an unsteady di↵usion process,which can be an appropriate mathematical for-mulation and analogy for expanding, time-varying, and dynamic environments. This in-formation distribution map is the solution ofthe di↵usion process partial di↵erential equa-tion, and is regressed from sensor data as aGaussian Process. Optimization of the pro-cess parameters leads to an optimal frontiermap which describes regions of interest for fur-ther exploration. Since the presented approachconsiders a continuous model of the environ-ment, it can be used to plan smooth explorationpaths exploiting the structural dependencies ofthe environment whilst handling sparse sensormeasurements. The performance of the ap-proach is evaluated through simulation resultsin the well-known Freiburg and Cave maps.

1 Introduction

Exploration in unknown environments is among the ma-jor challenges for an autonomous robot. Autonomousexploration should be e�cient with regards to one ormore criteria (minimum energy consumption, shortestpath, reduced map and localization uncertainties, re-duced computational complexity), and be performed

This paper is an extended version of the paper presented

in a RSS’13 workshop: M. Gha↵ari Jadidi, J. Valls Miro,

R. Valencia, J. Andrade-Cetto and G. Dissanayake. Explo-

ration in Information Distribution Maps, RSS’13 Workshop

on Robotic Exploration, Monitoring, and Information Con-

tent, Berlin, Germany, 2013.

J. Andrade acknowledges support from the Spanish Min-

istry of Economy and Competitiveness project PAU+ DPI

2011-27510 and the EU FP7 project ARCAS ICT-287617.

R. Valencia acknowledges support from the EU FP7 project

SPENCER ICT-600877.

with limited perception capabilities due to sensor lim-itations.

Yamauchi [Yamauchi, 1997] expressed the centralquestion in exploration as: “Given what you know aboutthe world, where should you move to gain as muchnew information as possible?” Thus, exploration entailsbuilding a reliable and accurate world model in which toevaluate such criteria. Currently, almost all available ex-ploration methods rely on planar grid-based representa-tions. Relying on grid-based maps, however, ignores thestructural dependency in the environment due to the as-sumption of independence between cells. Moreover, theuse of grid-maps at a fixed resolution have scalability is-sues both in the size of the area they can handle, andin the dimension of the space to explore, leading to anincrease in computational complexity.

We tackle these two shortcomings of grid maps by de-vising a continuous map representation that takes intoaccount structural dependencies by learning the environ-ment information distribution from sensor data. In thispaper, information is defined as the inverse of the envi-ronment’s structural variance. This means that maxi-mization of information (completion of a map) leads toreduction of uncertainty in the environment’s map. Wemodel the distribution of information in the environmentas an unsteady di↵usion process, which is an appropriatemathematical formulation and analogy for expanding,time-varying, and dynamic environments. We use Gaus-sian Processes (GPs) as a regression tool in a Bayesianframework to learn the information distribution map.The resulting map is the solution of the di↵usion pro-cess PDE (partial di↵erential equation) [Stocker, 2011].In contrast with frontier based methods which extractfrontiers from an occupancy grid map, a continuous fron-tier map is computed by optimizing the process param-eters. By clustering the frontier map, possible goals forfurther exploration are extracted.

After a review of the work related to our approach(Section 2), in Section 3 we provide a brief introduc-tion to Gaussian Processes. Next, in Section 4, we in-

Page 2: Exploration using an Information-Based Reaction-Di usion

troduce our proposed information maps learned and in-ferred from GPs, and in Section 5 we present the deriva-tion of the frontier maps as a result of modeling the in-formation distribution as a di↵usion process. Sections 6and 7 describe the process of extracting goals for explo-ration and map regeneration, respectively. Finally, sec-tions 8 and 9 describe simulation results of the proposedapproach on two well known mapping data-sets and con-clusion, including possible extensions to this work.

2 Related Work

Julia et al.

[Julia et al., 2012] studied and comparedseven methods for autonomous exploration and map-ping of unknown environments on the basis of explo-ration time and mapping error, both for single andcooperative mapping. The seven major approachesanalyzed are nearest frontier [Yamauchi, 1998], cost-utility [Gonzalez-Banos and Latombe, 2002], behaviour-based coordinated [Lau, 2003], coordinated [Burgard et

al., 2000], market-based coordinated [Zlot et al., 2002],integrated [Makarenko et al., 2002], and hybrid inte-grated [Julia et al., 2010]. The comparison yielded con-clusive results on the strategy to be used depending onthe type of scenario and number of robots used. How-ever, all of the techniques analyzed used occupancy gridsfor map representation.

Valencia et al.

[Valencia et al., 2012] presented anactive exploration strategy that tackles the limitationof the granularity of the occupancy grid by minimiz-ing overall map and path entropies. Whereas themap entropy is computed on an occupancy grid at avery coarse resolution, the path entropy is the out-come of maintaining a Pose SLAM [Ila et al., 2010;Valencia et al., 2013], a variant of SLAM in which onlythe robot trajectory is estimated through the observa-tion of relative constraints between robot poses. In thework presented here, we resort also to Pose SLAM forlocalization and navigation, but maintain instead a con-tinuous information distribution map.

In [Valencia et al., 2012] candidate goals were eitherfrontiers [Yamauchi, 1997], or loop closing entropy min-imizers. The paths to either frontiers or loop closurecandidates were computed as shortest paths in a proba-bilistic roadmap [Kavraki et al., 1996]. Another strategyto compute exploration paths is to treat the frontier be-tween explored and unexplored areas as boundary con-ditions, and the explored area as a scalar potential field.The objective here is to find optimal paths to the un-explored sections. In [Prestes e Silva et al., 2002] forinstance, the gradient is computed on this field solvingLaplace’s PDE with Dirichlet boundary conditions (set-ting 1 for obstacles and 0 for free cells). This gradientdescent direction indicates the path to the unexploredsections. Shade and Newman [Shade and Newman, 2011]

suggest the use of both Dirichlet and Neumann bound-ary conditions, and extend the method to work with oc-tomaps. Optimal paths to the boundaries of unexploredsections are computed using steepest descent on the as-sociated gradient field.

To cope with varying resolutions for the explored andunexplored regions, in [Shen et al., 2012] is proposed tomap unexplored regions sparsely and to determine theregions for further exploration based on the evolution ofa stochastic di↵erential equation that simulates the ex-pansion of a system of particles with Newtonian dynam-ics. The method is also presented for a 3D environment,although no considerations are made on how to computethe path to the goal. In [Vallve and Andrade-Cetto,2013], an information potential field by taking into ac-count the joint entropy of map and path is defined. Therobot trajectory is computed from descending on the gra-dient of the potential field, still, an accurate computa-tion of occupancy maps is preliminary to compute thepotential field.

The above mentioned methods can consider uncer-tainty in the occupancy cells but not take into accountstructural correlations in the environment. The path tothe boundary computed on the scalar field assume a uni-formly discretized occupancy map. Recent developmentsin Bayesian regression and classification methods, par-ticularly from the machine learning community, are pro-viding strong mathematical tools for continuous learn-ing and inference in complex data sets. Non-parametrickernel models, such as Gaussian Processes, have provenparticularly powerful to represent the a�nity of spa-tially correlated data, hence overcoming the traditionalassumption of independence between cells, characteris-tic of the occupancy grid method for mapping environ-ments [T O’Callaghan and Ramos, 2012]. The GP as-sociated variance surface equates to a continuous repre-sentation of uncertainty in the environment, which canbe used to highlight unexplored regions and optimizea robot’s search plan. The continuity property of theGP map can improve the flexibility of the planner byinferring directly on collected sensor data without be-ing limited by the resolution of the grid cell [Gan et al.,2009].

In this paper, we propose an exploration method thatcomputes exploration goals on the GP associated vari-ance surface. However, training a unique GP for bothoccupied and free areas results in a mixed variance sur-face and it is not possible to di↵erentiate between bound-aries of occupied-unknown and free-unknown space with-out thresholding of the continuous map (see Fig. 6in [T O’Callaghan and Ramos, 2012]). Moreover, it lim-its selection of an appropriate kernel and results in ex-trapolated obstacles or low quality free areas map. Toaddress this problem we propose training two separate

Page 3: Exploration using an Information-Based Reaction-Di usion

GPs, one for free areas and one for obstacles, and com-pute the di↵erence between them to come up with aunique information distribution map for exploration.

3 Gaussian Processes

GPs are a non-parametric Bayesian regression techniquein the sense that they do not explicitly define a functionalrelationship between inputs and outputs. Instead, statis-tical inference is employed to learn dependencies betweenpoints in a data set [Rasmussen and Williams, 2006]. AGP f(x) is described by its mean, m(x), and covariance(kernel) function, k(x, x0), as

f(x) ⇠ GP(m(x), k(x, x0)) (1)

where x and x0 are the training and test (query) inputvectors, respectively. By assuming that the target data,y, is jointly Gaussian, it follows

f(x0) ⇠ N (µ,�2) (2)

where

µ = E(f 0|x, y, x0) = k(x0, x)[k(x, x) + �2n

I]�1y, (3)

�2 = k(x0, x0)� k(x0, x)[k(x, x) + �2n

I]�1k(x, x0), (4)

and �2n

is the variance of the Gaussian observation noiseand f 0 represents the output values at the test locations.

4 Inferring Information Distribution

Maps with GPs

This section presents our solution for the mapping prob-lem. Through inference with GPs, we are able to esti-mate the full map posterior and avoid marginalization.We maintain two GPs, one for free areas and another onefor obstacles. To this end, we assume that the robot isequipped with a laser range finder and that local sensormeasurements are mapped into a global reference framewith Pose SLAM [Ila et al., 2010].

To compute the free area GP map, the training pointsare sampled along the laser beam between the robot andthe sensed obstacles as in [T O’Callaghan and Ramos,2012]. Computing the obstacle GP map is more straightforward, as it is possible to use the measured points inthe global reference frame directly. In both instances thetarget value can be simply set to one depending on thenature of the map, i.e, a binary classification problemwith static state: obstacle or free. Note that in train-ing with one GP the target value can be �1 and 1 foroccupied and free points, respectively.

The selection of an appropriate kernel lies at the heartof GP regression. The covariance function places a priorlikelihood on the possible functions used to evaluate thedependencies between the observations. Since environ-ments are constructed from sudden changes from free

areas to obstacles, we are interested in covariance andmean functions which produce as sharp a distributionas possible. However, sharp kernels are inappropriatefor covering large free areas, or for learning structuraldependencies such as walls. This imposes a trade-o↵between two competing objectives, smoothness to coverlarge areas and structure, and sharpness to model dis-continuities well. To this end, we have chosen to useMatern covariance functions [Stein, 1999],

k(x, x0) =1

�(⌫)2⌫�1

"p2⌫|x� x0|

#⌫

K⌫

p2⌫|x� x0|

!

(5)where � is the Gamma function, K

(·) is the modifiedBessel function of the second kind of order ⌫, is thecharacteristic length scale, and ⌫ is a parameter used tocontrol the smoothness of the covariance.

The nice feature of this kernel is that with ⌫ = 5/2 anda linear mean function, the resulting distribution mapsare twice mean square di↵erentiable. This continuity ofthe distribution map guarantees smooth planning on theinformation surface, a feature particularly beneficial forplanning in higher dimensions.

Regressing our two GP maps with the sensor data dur-ing navigation induces the most likely probability of oc-cupancy and free space. For a given query point in themap x0

i

, our GPs predict mean values for its occupancyand free states µ

i

, and associated variances �2i

. By in-verting these variances, we can compute the informationassociated to that location, �

i

= 1/�2i

. Querying over auniformly sampled range of robot locations, we assembleboth an obstacle information map, ⇤

o

(x, t) : R3 7! R,and a free area information map, ⇤

f

(x, t) : R3 7! R.With x representing a point in the map and t expressingthe time.

In the classical occupancy mapping scenario, the stateof a cell can be seen as a binary state (obstacle or free),and one usually computes the log odds l(x

i

) = log p(xi

)�log(1� p(x

i

)), i.e., obstacle over free, which for Normaldistributions reduces to l(x

i

) = log(|⌃o

i

|/|⌃f

i

|)+ kxo

i

�µo

i

k2⌃o

i

�kxf

i

� µf

i

k2⌃f

i

. To simplify the expression, we

assume equal mean values for free and obstacle states,i.e., no a priori knowledge. This simplification furtherreduces the log-odds expression to the di↵erence betweenthe inverse covariances. With this simile, we define ourdesired information distribution map as

⇤ = ⇤o

� ⇤f

. (6)

Figure 1 illustrates an example of a regressed informa-tion distribution map.

4.1 Selection of Training and Query Sets

Defining a reasonable training set is a key factor in GPs.In our case, the training set consists of temporally anno-

Page 4: Exploration using an Information-Based Reaction-Di usion

(a) (b) (c) (d)

Figure 1: A regressed information distribution map for the Cave environment, size of 20m⇥20m. a) Map of exploredarea, where the blue circles are robot poses and the black points are relevant laser scans. b) The obstacle distributionmap ⇤

o

. c) The free space distribution map ⇤f

. d) The overall information distribution map ⇤. For ⇤o

, ⇤f

, data islocally normalized between 0 and 1 at each iteration.

tated measurement points in the global reference frame.Given that each Pose SLAM pose is annotated with itscorresponding laser scan, and that such a map containsonly such poses that introduce a relevant amount of in-formation change into the map, the obvious choice is touse such sensor data to train the GPs.

During open loop, each new pose in the map gives anassociated scan which is fed to the GP. These updatesare local, and seldom overlap previous learned regionsin the domain of the GP. At loop closure however, thewhole map is recomputed and a new training sampleis produced for all scans at their newly computed posemeans. The map is updated globally and so is the GP.In this way, we guarantee that flow of information intothe GP is equivalent to that in Pose SLAM.

A plausible query set could be a dense uniform dis-tribution over the entire GP domain. That is, samplingall areas covered by the robot sensor range. However,a more e�cient alternative is to compute the query setlocally over a moving window of fixed size centered atthe robot’s current pose and covering the current sensorrange.

4.2 Updating the Global Information Map

After inferring a local information map, we need to fuseit with the global information map. The Bayesian com-mittee machine (BCM) [Tresp, 2000], suggests an ap-proach to combine estimators which were trained on dif-ferent data sets. Assuming a Gaussian prior with zeromean and covariance ⌃ and each GP with mean E(f 0|D

i

)and covariance cov(f 0|D

i

), where Di

= {(x, y)i

} is thedataset of observations used for each process, it followsthat

E(f 0|D) = C�1pX

i=1

cov(f 0|Di

)�1E(f 0|Di

) (7)

with

C = cov(f 0|D)�1 = �(p� 1)(⌃)�1 +pX

i=1

cov(f 0|Di

)�1

(8)where p is the total number of GPs. Note that in this pa-per, E(f 0|D

i

) represents information values from locallynormalized information distribution map.

5 Information Distribution Model

As the robot explores the environment, the concentra-tion of information (uncertainty) varies with time anddepends on the location of the sensed data. Therefore,the concentration of information is a time-dependentscalar field over the explored areas and the movementof the robot spreads the scalar field out as sensory datafill unknown areas with information. This is an anal-ogy with reaction-di↵usion processes and we can describethe distribution of information similar to a dynamic sys-tem with parabolic PDEs. Inherently, this mathemat-ical form takes into account time and space evolutionof the information and even its internal reaction com-bining newly detected and previously gained data. Theunsteady di↵usion equation takes the form

@⇤

@t= a2r2⇤� g⇤ (9)

where ⇤(x, t) is the total concentration of informationat pose x and at time t, and a2 and g are the di↵usionconstant and the information disintegration rate, respec-tively.

5.1 Transient State

In contrast to the regular procedure with di↵usion pro-cesses, we are not interested in solving this equation. Weare interested instead in computing, at each iteration,

Page 5: Exploration using an Information-Based Reaction-Di usion

the residual

R =@⇤

@t� a2r2⇤+ g⇤ . (10)

The residual contains the deviation between the pro-posed model and the real process and, therefore, showspotential areas for further exploration. Exploring theseareas result in both, satisfying the information distribu-tion model and compensating for the di↵erence betweenideal and regressed data.

To compute the optimal residual, we minimize thesquared norm of R

mina,g2R

kvec(R)k22 . (11)

This optimization leads to optimal values for a⇤ andg⇤ in each iteration and, accordingly, an optimal resid-ual, R⇤. The negative parts of the information distribu-tion map define free areas and frontiers are boundariesbetween free and unkown areas, hence, we call the neg-ative valued part of the optimal residual a frontier mapand compute it with

R⇤ =@⇤

@t� a⇤2r2⇤+ g⇤⇤ (12)

F =

⇢R⇤ R⇤ < 00 otherwise

. (13)

The imposed constraint to select only negative valuesin the frontier map is a result of the construction of theinformation map from obstacles and free areas. Onlynegative values correspond to variations of informationbetween known and unknown areas, whereas positivevalues represent information variations around obstacles.

5.2 Steady State

When @⇤@t

= 0 and g = 0, the reaction-di↵usion processPDE enters a steady state phase where the behaviour issimilar to Laplace’s equation. Hence, if after optimizingthe PDE parameters, the solution parameters are closeto zero, the PDE equation is not a valid proposition any-more, and the frontier map can be obtained solving theLaplace equation. In this case, the frontier map can bedefined as

F = {�r2⇤ : F < 0} . (14)

On the other hand, for any scalar field it holds that thecurl of the gradient is always zero, therefore

limt!1|r⇥r⇤| = 0,

and to exploit both equations concurrently in the frontiermap we can write

F = {�r2⇤|r⇥r⇤| : F < 0} . (15)

Algorithm 1 explorationIDM(probot

, plocal

, reset)

1: pglobal

transform2global(probot

, plocal

)2: p

free

lineSegmentation(probot

, plocal

)3: if firstFrame then

4: ⇤ = ⇤o

= ⇤f

= ;5: optimize GPs hyperparameters ✓

o

, ✓f

6: else if reset == true then

7: ⇤ = ⇤o

= ⇤f

= ;8: end if

9: x0o

, x0f

testData(probot

)10: x

o

, xf

trainingData(pglobal

, pfree

)11: �2

o

GP(✓o

, xo

, x0o

), �2f

GP(✓f

, xf

, x0f

)

12: �o

1/�2o

, �f

1/�2f

13: normalize �o

, �f

14: update ⇤o

,⇤f

15: ⇤ ⇤o

� ⇤f

16: find a⇤, g⇤

17: if

@⇤@t

= 0 and g = 0 then

18: F {�r2⇤|r⇥r⇤| : F < 0}19: else

20: F {R⇤ : R⇤ < 0}21: end if

22: goal goalExtraction(F , probot

)23: return goal

As a result of this separation between transient andsteady phases, the method considers both time and spa-tial evolution of the environment. In the transient state,time evolution of the map plays a key role and the mostrecent informative goals are more relevant, whereas thesteady state response o↵ers the possibility to cover allpotential regions that have not yet been explored. Fig-ure 2 shows the frontier surface evolution in transientstate during an exploration scenario in the Cave map.

6 Goal Extraction

In this section, extraction of exploration goals togetherwith summarization of the proposed algorithm is pre-sented. Once the frontier map is computed, we can gen-erate an arbitrary number of targets on the map. Theavailable continuous frontier map shows the most recentinformative regions on the map, and we are free to choosethe best target based on desired requirements and priori-ties. Greedy techniques maximize the di↵erence betweenthe information gain and the cost of an action [Thrun et

al., 2005]. We propose a clustering method using thek-means algorithm. Initially, a thresholded frontier mapwhich contains top 30% values (can be variable) is cre-ated. Afterwards, a target which balances informationgain and traverse distances can be selected

f = max{fi

= ↵mi

ni

N� d

1/2i

, i = 1, 2, ...,M} (16)

Page 6: Exploration using an Information-Based Reaction-Di usion

(a) (b) (c)

(d) (e) (f)

Figure 2: Evolution of the frontier surface in the Cave map during an exploration experiment. a� c) In pose SLAMmaps red lines and points show the planned path and the robot poses, respectively. Green lines indicate detected loop-closure. d�f) The frontier map facilitates exploration goal extractions. Negative values show information variationsbetween known and unknown areas, whereas positive values represent information variations around obstacles. Mapsare non-dimensional.

where di

is the distance from the current robot pose tothe i-th cluster centroid (squared root to prevent steepvariations), m

i

is the mean value of the valid frontierpoints in the cluster, n

i

is the number of points in the i-thcluster, N is total number of points, M is total numberof clusters, and ↵ is a factor to relate information gainto the cost of motion. Algorithm 1 shows the overallprocedure for exploration in a continuous space, takingas inputs the local sensor measurements p

local

and therobot pose p

robot

in the global reference frame.

7 Map Regeneration

In the proposed approach the traditional occupancy gridmap has been substituted by the information distribu-tion map. To show the viability of this map for planning,a simple thresholded information map containing the de-sired unoccupied spaces for safe traversal has been used.

In practice, Pose SLAM may change the map signifi-cantly due to loop closure. Large changes in the map cancause notable shifts in robot pose, hence resulting in alow quality information map. To solve this issue, one so-lution is to reset and learn the information map with allthe available data again. A possible measure to detectsuch drift in the information map can be e�ciently cal-

culated with the bounded measure provided by Jensen-Shannon divergence [Lin, 1991]. The generalized Jensen-Shannon divergence for n probability, p1, p2, ..., pn, withweights ⇡1,⇡2, ...,⇡n

is

JS⇡

(p1, p2, ..., pn) = H(nX

i=1

⇡i

pi

)�nX

i=1

⇡i

H(pi

) (17)

where H is the Shannon entropy function.

H(p) = �nX

i=1

p(xi

)log(p(xi

)) (18)

and p(xi

) is the probability associated to variable xi

.Alternatively, cumulative relative entropy by summing

the computed Jensen-Shannon entropy in each iterationcan be used. The cumulative relative entropy shows theinformation map drift over a period of time and containsthe history of map variations. Consequently, the methodis less sensitive to small sudden changes.

Fusion of local and global information maps as a mix-ture of GPs inherently provides a smooth recovery of themap. Furthermore, the proposed approach by [Valenciaet al., 2012] for replanning based on map entropy canalso be considered to improve the exploration strategyand avoid planning on significantly altered maps.

Page 7: Exploration using an Information-Based Reaction-Di usion

(a)

(b)

Figure 3: The Final occupancy grid maps; a) explorationwith the grid-based method and b) exploration in infor-mation distribution map. Inferring the information dis-tribution map with GP regression leads to a partiallyincomplete equivalent occupancy grid map. See Fig. 4for a completed experiment with GP regression.

8 Results and Discussion

This section demonstrates the proposed approachthrough exploration simulation in the Cave1 andFreiburg2 maps. These maps represent varying di�cultyof the environments which test the capabilities of themethod.

The proposed approach is compared with a state-of-the-art grid-based method, [Valencia et al., 2012], in ta-ble 1. Data are averaged over ten experiments for traveldistance, exploration time, average map entropy rate,and final map relative entropy. The presented methodoutperforms the compared grid-based method in traveldistance and map entropy rate. In spite of higher explo-ration time, larger absolute value for map entropy ratemeans fewer steps are required to complete an experi-

1Radish: The robotics dataset repository.

2www.slam.org.

Table 1: Comparison of the exploration methods in Cavemap (averaged over 10 experiments, mean ± std)

Grid-based GP-based

Travel dist. (m) 331 221 ± 22

Expl. time (min) 33.38 35.53 ± 9.39

Map ent. rate (NATS/step) -0.4438 -0.9795 ± 0.1384

Final map ent. (NATS) 143.8696 148.3280 ± 1.5910

ment. The map entropy rate reduces 54% more than thegrid-based approach, therefore, expected overall explo-ration time remains close.

Final map entropy for both methods is computed fromthe final occupancy grid map, Fig.3. For our approachoccupancy grid map is computed once at the end of theexperiment as the information distribution map is inher-ently di↵erent. The final map entropy value in our ap-proach is higher than the compared grid-based method,however, the fact that GP regression covers regions withsparse measurements and computes full posterior of amap by considering cell dependencies results in a par-tially incomplete equivalent occupancy grid map. Thisin part improves travel distance and map entropy rateas it can be seen in table 1.

8.1 Maps with varying levels of detail

The Cave map is small with connected and clear walls.The Freiburg map is relatively large in relation to Cave,and contains many points and disconnected obstacleswhich make it a challenging environment to explore.

Figure 4 illustrates the fully explored Cave map. As itcan be seen in frame (a), by exploiting the informationmap data, the robot was able to complete the map witha reduced set of movements. Despite the sparse measure-ments in some regions, the GPs are able to learn the in-formation map, depicted in Fig. 4(b), for goal extractionand planning. Note that calculation of the informationmap as the di↵erence between obstacles and free areasmap not only makes di↵erentiation between occupied-unknown and free-unknown parts possible, it results ina sharper information map around an obstacle’s cornerwhich itself improves the performance of the planner andobstacle avoidance technique.

Figure 5 shows exploration results in the Freiburg map(40m ⇥ 15m). In the illustrated example, the bottomframe shows the computed transient frontier map andthe extracted goal in which valid regions for clusteringand goal extraction have negative values. As an ad-vantage of planning in the information map, when therobot reached the target, concurrent map completionand increase in information or reduction of uncertaintyoccurs. This inherent active link between explorationand mapping is highly demanded and desirable for anyautonomous robotic scenario. However, structural com-plexity of Freiburg map and its size cause a high com-

Page 8: Exploration using an Information-Based Reaction-Di usion

0 5 10 15 20 25

0

5

10

15

20

25

x

y

Pose SLAM map

(a)

(b)

Figure 4: a) Exploration in Cave map, size of 20m⇥20m.In this experiment, loop closures detected during PoseSLAM caused small rotations in the map. b) Final in-formation distribution map; in information form, thesesharp loop-closure changes are better absorbed in its fi-nal representation. In spite of the sparse measurementsat some locations, at the end of the experiment the wholemap is completely explored based on the informationmap.

putational load and requires special computing facilities.Figure 6 shows the capabilities of the method to accom-modate to varying distributions of obstacles in a explo-ration run over the same environment. In the region inthe left part of the map, with little obstacle density, therobot is commanded with larger motion commands thanin the right part of the map, in which there is signifi-cantly more clutter, and the robot is commanded withslow, short movements.

Stop criterion depends on application, but a commonchoice in map completion is exploring till no frontier re-mains. Thus, the stop criterion is F = ; or in a morepractical way kvec(F)k1 < �. In this research, all thepresented results are achieved with � = 0.09. Anotheruseful measure is the predefined relative entropy. Nosignificant change of a map’s relative entropy over a pe-riod of time means there is no variation in the map andthe robot is in a fully explored area. In addition, cu-

(a)

(b)

(c)

Figure 5: Exploration in Freiburg map which is a chal-lenging environment due to the many discontinuities,obstacles, and rooms. In the information distributionmap, fully and partially explored regions can be easilyidentified. The information variation over boundariesof unexplored areas shows potential regions for furtherexploration. The bottom frame shows the transient fron-tier map, in which lower and negative values indicate re-gions of interest for goal extraction. The extracted goalis shown with a purple circle and, as it can be seen in thePose SLAM map, the robot successfully moved towardthe goal.

mulative relative entropy provides a measure to controlthe desired quality of the information map. Therefore,in trivial environments by selection of a higher thresh-old, it can reduce computational time of the exploration

Page 9: Exploration using an Information-Based Reaction-Di usion

(a)

(b)

Figure 6: Long-term exploration in the Freiburg map. Inthis run, the robot has been successfully exploring themap with respect to the stop criteria for a significantlylarger period of time. Note how in the eastern part of themap there are discontinuities in obstacles, and it is morechallenging for the robot to explore. b) Final informationdistribution map.

strategy.GP computations have been implemented with the

Open Source GP library in [Rasmussen and Williams,2006].

8.2 Comparison with gridmaps

Figure 7 shows three points in time during a tradi-tional frontier-based exploration with grid maps, usingthe same environment and specifications from the resultsshown in Fig. 2, with a cell size of 0.2m⇥0.2m. The robotis always driven to the nearest frontiers, with size largerthan 9 cells. Besides the loss of information due to thediscretization, such a sequence makes evident the e↵ectof the independence assumption between cells. In Fig. 7small frontiers appear because the information of nearfree and occupied cells is not propagated to the rest ofthe cells. Hence, when the robot has eventually exploredthe larger frontiers, it might be driven to such artifactsinstead of more informative regions. In a more realis-tic model, the occupancy in each place is not randomlydistributed as implicitly assumed by grid structures. In-stead, a spatial correlation between points in the mapshould exist given the structured spatial nature of the

world around us, and this is exactly what is achievedwith the proposed GP maps. Training values for freeand occupied space are appropriately propagated overthe environment, thus generating well defined frontiersas shown in Fig. 2.

9 Conclusion

A solution to the robot exploration problem is proposedin this work with the introduction of a novel informa-tion distribution map. The main improvement that thissolution brings in relation to traditional mapping ap-proaches is avoiding the need to create an occupancygrid map, thus explicitly avoiding the strong assump-tion of independence between cells, which in turn facil-itate accounting for structural dependencies in the en-vironment. Moreover, the solution is also a better fitto e�cient exploration of larger environments given thewell-known scalability issues of grid-maps at fixed res-olutions, and the increase in computational complexitythat factor leads to.

It is also plausible to extract goals directly from theglobal information distribution map. Although this mapvaries with time, working with this map to extract po-tential exploration goals is a time-independent calcula-tion, and only the currently available global informationmap is relevant at any given instant. In addition, themultivariate and continuous nature of the approach to-gether with the possibility of iterative updating appearspromising for 3D applications.

References

[Burgard et al., 2000] W. Burgard, M. Moors, D. Fox,R. Simmons, and S. Thrun. Collaborative multi-robotexploration. In Proc. IEEE Int. Conf. Robot Au-

tomat., volume 1, pages 476–481, 2000.

[Gan et al., 2009] S.K. Gan, K.J. Yang, andS. Sukkarieh. 3D online path planning in a con-tinuous Gaussian process occupancy map. In Proc.

Australian Conf. Field Robotics, 2009.

[Gonzalez-Banos and Latombe, 2002] H.H. Gonzalez-Banos and J.C. Latombe. Navigation strategies forexploring indoor environments. The Int. J. Robot.

Res., 21(10-11):829–848, 2002.

[Ila et al., 2010] V. Ila, J.M. Porta, and J. Andrade-Cetto. Information-based compact Pose SLAM. IEEETrans. Robot., 26(1):78–93, 2010.

[Julia et al., 2010] M. Julia, O. Reinoso, A. Gil,M. Ballesta, and L. Paya. A hybrid solution tothe multi-robot integrated exploration problem. Eng.Appl. Artif. Intell., 23(4):473–486, 2010.

[Julia et al., 2012] M. Julia, A. Gil, and O. Reinoso.A comparison of path planning strategies for au-

Page 10: Exploration using an Information-Based Reaction-Di usion

(a) (b) (c)

Figure 7: Three points in time during a frontier-based process using a grid map. Beyond the discretization e↵ects,the e↵ect of the independence assumption between cells can be seen as small frontiers appearing since informationof near free and occupied cells is not propagated to the rest of the cells. Cell colours indicate free (white), unknown(gray), occupied (black), and frontier (red).

tonomous exploration and mapping of unknown en-vironments. Auton. Robot., pages 1–18, 2012.

[Kavraki et al., 1996] L. Kavraki, P. Svestkaand, J. C.Latombe, and M. Overmars. Probabilistic roadmapsfor path planning in high dimensional configurationspaces. IEEE Trans. Robot., 12:566–580, 1996.

[Lau, 2003] H. Lau. Behavioural approach for multi-robot exploration. In Australasian Conf. Robot. Au-

tomat., 2003.

[Lin, 1991] J. Lin. Divergence measures based on theShannon entropy. IEEE Trans. Information Theory,37(1):145–151, 1991.

[Makarenko et al., 2002] A.A. Makarenko, S.B.Williams, F. Bourgault, and H.F. Durrant-Whyte.An experiment in integrated exploration. In Proc.

IEEE/RSJ Int. Conf. Intell. Robots Syst., volume 1,pages 534–539, 2002.

[Prestes e Silva et al., 2002] E. Prestes e Silva, P.M. En-gel, M. Trevisan, and M.A.P. Idiart. Explorationmethod using harmonic functions. Robot. Auton.

Syst., 40(1):25–42, 2002.

[Rasmussen and Williams, 2006] C.E. Rasmussen andC.K.I. Williams. Gaussian processes for machine

learning, volume 1. MIT press, 2006.

[Shade and Newman, 2011] R. Shade and P. Newman.Choosing where to go: Complete 3D exploration withstereo. In Proc. IEEE Int. Conf. Robot Automat.,pages 2806–2811, 2011.

[Shen et al., 2012] S. Shen, N. Michael, and V. Ku-mar. Stochastic di↵erential equation-based explo-ration algorithm for autonomous indoor 3d explo-ration with a micro-aerial vehicle. The Int. J. Robot.

Res., 31(12):1431–1444, 2012.

[Stein, 1999] M.L. Stein. Interpolation of spatial data:

some theory for kriging. Springer Verlag, 1999.

[Stocker, 2011] T. Stocker. Introduction to climate mod-

elling. Springer, 2011.

[T O’Callaghan and Ramos, 2012] S. T O’Callaghanand F.T. Ramos. Gaussian process occupancy maps.The Int. J. Robot. Res., 31(1):42–62, 2012.

[Thrun et al., 2005] S. Thrun, W. Burgard, and D. Fox.Probabilistic robotics, volume 1. MIT press, 2005.

[Tresp, 2000] V. Tresp. A Bayesian committee machine.Neural Computation, 12(11):2719–2741, 2000.

[Valencia et al., 2012] R. Valencia, J. Valls Miro, G. Dis-sanayake, and J. Andrade-Cetto. Active Pose SLAM.In Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst.,pages 1885–1891, 2012.

[Valencia et al., 2013] R. Valencia, M. Morta,J. Andrade-Cetto, and J.M. Porta. Planningreliable paths with pose slam. Robotics, IEEE

Transactions on, 29(4):1050–1059, 2013.

[Vallve and Andrade-Cetto, 2013] J. Vallve andJ. Andrade-Cetto. Mobile robot exploration with po-tential information fields. In 6th European Conference

on Mobile Robots, 2013.

[Yamauchi, 1997] B. Yamauchi. A frontier-based ap-proach for autonomous exploration. In Int. Sym.

Comput. Intell. Robot. Automat., pages 146–151,1997.

[Yamauchi, 1998] B. Yamauchi. Frontier-based explo-ration using multiple robots. In Int. Conf. Au-

tonomous Agents, pages 47–53, 1998.

[Zlot et al., 2002] R. Zlot, A. Stentz, M.B. Dias, andS. Thayer. Multi-robot exploration controlled by amarket economy. In Proc. IEEE Int. Conf. Robot Au-

tomat., volume 3, pages 3016–3023, 2002.