Decentralized Multi-Agent Exploration with Online-Learning ...robotics.upo.es/~lmercab/publications/papers/icra2016_GPs.pdf · to monitor dynamic processes, and makes the developed

Decentralized Multi-Agent Exploration with Online-Learning ofGaussian Processes

Alberto Viseras Ruiz1, Thomas Wiedemann1, Christoph Manss1, Lukas Magel1,Joachim Mueller1, Dmitriy Shutin1, Luis Merino2

Abstract— Exploration is a crucial problem in safety of lifeapplications, such as search and rescue missions. Gaussianprocesses constitute an interesting underlying data model thatleverages the spatial correlations of the process to be exploredto reduce the required sampling of data. Furthermore, multi-agent approaches offer well known advantages for exploration.Previous decentralized multi-agent exploration algorithms thatuse Gaussian processes as underlying data model, have onlybeen validated through simulations. However, the implementa-tion of an exploration algorithm brings difficulties that were nottackle yet. In this work, we propose an exploration algorithmthat deals with the following challenges: (i) which informationto transmit to achieve multi-agent coordination; (ii) how toimplement a light-weight collision avoidance; (iii) how to learnthe data’s model without prior information. We validate ouralgorithm with two experiments employing real robots. First,we explore the magnetic field intensity with a ground-basedrobot. Second, two quadcopters equipped with an ultrasoundsensor explore a terrain profile. We show that our algorithmoutperforms a meander and a random trajectory, as well as weare able to learn the data’s model online while exploring.

I. INTRODUCTION

A. Motivation

Exploration is a fundamental task in a wide range ofapplications, such as surveying, environmental analysis, orsearch and rescue. For these scenarios, mobile robots areoften well-suited to carry sensors to relevant sampling loca-tions. In general, it is desirable to reduce the required timeand manpower for the exploration of a given environment.This naturally leads to parallelization by means of multiplerobots, which form a self-coordinating multi-agent systemwith relatively little human supervision per robot.

Let us consider as an example a region devastated by anearthquake. In such a situation, we must explore the areaas efficiently as possible to, for instance, obtain a usablemap of the disaster area fast and with high resolution. Weforesee the use of a swarm of intelligent agents to carry outsuch exploration tasks. Here, the following questions arise: i)

1All authors are with the Institute of Communications and Navigationof the German Aerospace Center (DLR), Oberpfaffenhofen, 82234Wessling, Germany, [email protected],[email protected], [email protected],[email protected], [email protected],[email protected]

2L. Merino is with School of Engineering, Pablo de Olavide University,Crta. Utrera km 1, Seville, Spain [email protected]. His work ispartially funded by the Junta de Andalucia through the project PAIS-MultiRobot (TIC-7390)

Fig. 1: Two quadcopters exploring an a priori unknown terrainprofile with our proposed decentralized multi-agent explorationalgorithm. They are equipped with an ultrasound sensor facingdown to measure the distance to the floor.

which information should the agents communicate to coordi-nate themselves efficiently, and to decide where to measure;ii) how to avoid inter-agent collisions; iii) how to learn amodel of the map’s structure without prior information andiv) how to tackle these challenges in a decentralized mannerto increase the algorithm robustness.

In Figure 1 we show a scaled-down version of our moti-vation statement. Here, we explore a terrain profile with twoquadcopters using our proposed algorithm.

B. Related Work

The recent advances in mobile robotics, together withthe appearance of smaller and more controllable flyingrobots, have opened new frontiers for the development ofnovel exploration algorithms [1]. The trend in the roboticscommunity points towards swarms of small robots that runlight algorithms in a distributed manner. In this direction,Julian et al. present a coordination method for distributedrobots based on the maximum informativeness criterion [2].Here we extend their work by exploiting spatial correlationsto accelerate the exploration performance.

These ideas have already been explored in several works.For example, Markov random fields have been used to modelthe spatial correlation of an oceanographic process withthe goal of deriving a level curve tracking algorithm [3].We are interested in probabilistic models that could allowus to learn the process’ model from the acquired data.Specifically, Gaussian processes represent a powerful methodto model spatial phenomena, as well as to learn the process’spatial structure. Singh et al. propose a procedure to define

suitable covariance functions for Gaussian process regressionin environmental surveillance applications [4]. They extendthose covariance functions to perform informative path plan-ning. Informative path planning – also called exploration,active sensing or informative sampling – is often combinedwith Gaussian processes due to their ability to predict theremaining uncertainty about the unobserved part of theprocess. Krause et al. study what is the optimal placementfor a network of sensors in order to reduce the uncertaintyof the process under study [5]. However, in their model,the sensor placements are fixed. Here, we are interested inexploration with mobile robots. This allows us to cover largerenvironments, requires less robots, gives us more flexibilityto monitor dynamic processes, and makes the developedalgorithm more robust against agent failures.

In the literature, single mobile robots have been consid-ered for different exploration applications using Gaussianprocesses as models, such as SLAM [6] or environmentalmonitoring [7]. Online estimation of a radio signal sourcehas been studied in the Gaussian processes context as wellto model the signal propagated by the radio signal source [8].These works only focus on exploration with a single agent.We, however, argue that the appropriate cooperation within aswarm of multiple agents could increase the performance androbustness of our exploration algorithms, as we will show inthis work.

The problem of multi-robot exploration was tackled bySingh et al. to optimally plan the trajectories of multiplerobots to explore a process [9]. This process is modeled asa pre-learned Gaussian process and the optimal explorationproblem is solved in a centralized manner. Decentralizationbrings advantages in terms of the algorithm’s robustnessrespect to agent failures. Chen et al. propose the use ofGaussian processes to monitor online traffic with multiplerobots in a decentralized manner [10]. In [11], we proposeda decentralized multi-agent strategy to explore a magneticfield intensity in an indoor environment. We evaluated thealgorithm’s performance with respect to several covariancefunctions and proved its convergence and scalability withsimulations. However, both of the aforementioned worksassume the model’s hyperparameters to be known. Ouyanget al. go one step further and propose an algorithm that alsolearns these as the process’ structure [12]. They use thisto actively sense an environmental phenomenon, which ismodeled as a non-stationary Dirichlet process mixture ofGaussian processes. However, they validated the algorithmin simulations and did not tackle the challenges that ap-pear in an experiment. Similar to [5], in [13] the authorspropose a decentralized algorithm with mobile robots thatact as elements of a sensor network to monitor a physicalphenomenon in a lab environment. However, they focus onthe spatio-temporal monitoring; i.e. the sensor placementsare fixed and the robots must decide where to move nextin order to monitor the process. The main difference withour work lies on the fact that we consider all the positionsin the environment as possible sensor placements. Then, weonly aim to measure in some of the locations to reconstruct

the physical process of the complete environment with ahigh resolution. In addition, we concentrate on the online-learning of the hyperparameters and show the experimentalperformance of the learning process.

The aforementioned works for a decentralized multi-agentexploration algorithm are solely validated through simula-tions. However, the inclusion of several robots in-the-loopbrings difficulties that were not yet shown in the literature. Inthis work, we propose an algorithm that addresses the follow-ing three questions: i) which information should the robotstransmit to allow a decentralized multi-agent coordination;ii) how to implement a light-weight collision avoidancemechanism and iii) how to learn the hyperparameters online,while exploring.

The remainder of the paper is organized as follows. Sec-tion II states formally the problem. Section III summarizesthe Gaussian process model used to represent the physicalphenomena to be explored. We describe in Section IVthe multi-agent exploration algorithm. Section V presentsthe experiments performed, and is then followed by theconclusions.

II. PROBLEM STATEMENT

We wish to explore an a priori unknown physical processas accurately as possible, in the sense of minimizing thedifference between estimate and ground truth, and do soefficiently, i.e. consuming as little as possible of the limitedresources such as time, energy, or communication capacity.

In this work, we assume the following:• The borders that define the environment are a priori

known.• The physical process is time-invariant, such as magnetic

field intensity, or terrain profile.• The agent’s position is known exactly and noise-free.• The agents network is fully connected and the

communication between agents is perfect.

We employ a swarm of N agents to explore the physicalprocess under study. The position of the ith agent is xi;with xi ∈ X ⊂ R3, and i = 1, 2, ..., N , and X is theenvironment in which the robot can operate. The physicalprocess in position xi ∈ X is given by the variable yxi ∈ Y ,with Y ⊂ R. We assume a sensor model that is described aszxi

= yxi+ ε, where zxi

is the measurement of the physicalprocess taken by robot i at position xi and ε is a noise factorthat is distributed according to ε ∼ N (0, σ2

n).

III. SPATIAL GAUSSIAN PROCESS MODEL

Consider, for example, the task of measuring a terrainprofile of an unknown environment, or the magnetic fieldintensity in an indoor environment as the process at hand.If we could model the spatial correlation of this process,we could fill spatial gaps between measurements with pre-dictions [14]. The stronger the correlations and the betterthey are represented in a model, the fewer measurementsare needed to achieve a certain accuracy. Gaussian processesrepresent a method to model physical phenomena with strong

spatial variations, like ozone concentration [7], magnetic fieldintensity [15], etc.

A Gaussian process [16] is a collection of random vari-ables, any finite number of which have a multivariate Gaus-sian distribution. As such, it is fully specified by a meanfunction m(x) and a covariance function k(x,x′) for anygiven positions x and x′.

We define the following vectors for the ith robot: 1) Xi

is a matrix that contains all locations where the ith robothas measured Xi =

[x[1]i ,x

[2]i , · · · ,x

[p]i

]T; 2) zi are the

measurements taken at locations Xi according to the sensor

model from section II; 3) Xi∗ =[x[1]i∗ ,x

[2]i∗ , · · · ,x

[n]i∗

]Tis a

matrix composed by all positions where we aim to predictthe physical process’ values.

Gaussian processes are commonly used as priors in aBayesian setting. Given zi and Xi, we can predict thetarget values yi∗ for the corresponding Xi∗. The elementsin yi∗ are distributed according to: p(yi∗|Xi∗,Xi, zi) =N (µi∗,Σi∗). The mean vector µi∗ and the covariance matrixΣi∗ of the predictive distribution are calculated as:

µi∗ = m(Xi∗) + KT∗K−1(zi −m(Xi)),

Σi∗ = K∗∗ −KT∗K−1K∗,

(1)

with K,K∗,K∗∗ the following matrices defined from thecovariance function as:

K =

k(x

[1]i ,x

[1]i ) · · · k(x

[1]i ,x

[p]i )

.... . .

...k(x

[p]i ,x

[1]i ) · · · k(x

[p]i ,x

[p]i )

,

K∗ =

k(x[1],x

[1]i∗ ) · · · k(x[1],x

[n]i∗ )

.... . .

...k(x[p],x

[1]i∗ ) · · · k(x[p],x

[n]i∗ )

,

K∗∗ =

k(x

[1]i∗ ,x

[1]i∗ ) · · · k(x

[1]i∗ ,x

[n]i∗ )

.... . .

...k(x

[n]i∗ ,x

[1]i∗ ) · · · k(x

[n]i∗ ,x

[n]i∗ )

.

(2)

The definition of the covariance function assumes the notionof similarity, which means that we expect that closer pointsare more likely to be similar. We focus our interest instationary and isotropic covariance functions. In this work,we employ the squared exponential covariance function (3)because of its ability to model smooth processes, as the oneswe aim to explore.

k(x,x′) = σ2f · exp

[−(x− x′)2

2l2

]+ σ2

nδxx′ , (3)

where δxx′ = 1 iff x = x′ is the Kronecker’s delta.Let us define the hyper-parameters θ = [σ2

f , l, σ2n]

T as aset of parameters that completely define the process’ model.Gaussian processes represent a powerful method to learn thismodel from the data. Given the training data, we can computethe hyperparameters θi∗ that best fit our measurements. This

can be done by maximizing the log-marginal likelihood(LML) with respect to the hyper-parameters θ,

θi∗ = argmaxθ

{1

2zTi K−1zi +

1

2log|K|

}, (4)

where K is a function of the hyperparameters θ accordingto (2) . We employ conjugate gradients for the optimization.

IV. DECENTRALIZED MULTI-AGENTEXPLORATION

We aim to explore an a priori unknown process. Thatmeans we should learn the process’ model while exploringin order to sample at those locations that better exploitthe current model. First, we present the algorithm for thesingle-agent exploration. Second, we extend it to allow theexploration with multiple robots in a decentralized manner.

A. Single Agent Exploration

We propose an algorithm that is based on four steps:Sense, Learn, Predict and Move (see Algorithm 1). It takesas an input the set of locations X over which the exploredprocess is defined, and a stopping criterion. This stoppingcriterion could be defined by the user (e.g. time, battery life)or could be determined from the data (e.g. measure of re-maining uncertainty). We initialize the ith robot’s position asx[1]i , and the vector of measurements zi and their respective

positions Xi as empty (line 1). Then we run the algorithmuntil it fulfills the stopping criterion.

Algorithm 1 SingleAgentExploration(X , StopCriterion)

1: i = 1;xi ← x[1]i ; zi ← NULL;Xi ← NULL

2:3: while ! StopCriterion do4: zxi ← Sense(xi)5: zi ← [zi; zxi ]

6: Xi ← [Xi;xTi ]

7: θi∗ ← LearnHyp(zi,Xi)8: Xi∗ ← CalcNeighbors(X ,xi)9: µi∗,Σi∗ ← PredictGP(zi,Xi,Xi∗,θi∗)

10: xnext ← argmaxx∈Xi∗

(σ2i∗)

11: xi ← MoveTo(xnext)

First, we take several measurements at position xi andfilter them to obtain zxi (line 4). This filtering step allowsus to mitigate the sensor noise and to reject possible outliers.

Second, we incorporate the new measurement and itslocation into our measurements’ vector and positions’ vector(lines 5,6). We use those measurements to learn the model’sparameters that best represent our data (line 7). The betterthe model, the better we will be able to predict the mostinteresting areas to explore next. In our Gaussian processmodel, learning the model is equivalent to learning theoptimal hyperparameters that characterize the covariancefunction. This can be done with equation (4).

Next, we select the set of positions where the agent canmove in the current iteration (line 8). Since we are interestedin learning the local correlation between measurements,we consider a greedy approach to select the agent’s next

position. Therefore, the matrix of possible next positions Xi∗corresponds to those located within a sphere Br(xi)

1, ofradius r equal to the desired measurements’ resolution2 andcentered at the agent’s position xi:

Xi∗ = {x ∈ X | x ∈ Br(xi)} . (5)

We predict the process’ vector of means µi∗ and covari-ance matrix Σi∗ in the positions Xi∗ using the Gaussianprocess model learned in line 7. These vectors of means andvariances are calculated according to equation (1), where theinputs are the vector of measurements zi and their respectivelocations Xi (line 9). In [11], we showed that the locality ofthe predictions does not affect the algorithm’s performance.Therefore, here we just perform the prediction in a localarea centered at the agent’s position in order to preserve thescalability with respect to the number of measurements.

The predicted variance is a measure of the process en-tropy [17]. We aim to sample where the entropy is highest;i.e. at the most informative location. Therefore, we moveto the position xnext ∈ Xi∗ with the highest predicteduncertainty/variance (line 10). The variance is given byvector σ2

i∗ that is composed of the elements of the maindiagonal of matrix Σi∗. Each of the elements of vector σ2

i∗represents the variance at positions x

[j]i ∈ Xi∗.

The robot continues running the algorithm by movingtowards positions with high uncertainty. Once the algorithmstops, the collected measurements are used to predict theprocess value at any possible location in X . This alsosuppresses the measurement noise. The prediction gives usthe mean and variance of the distribution at all positionsx ∈ X . We select the mean of each of the distributions asour exploration result of the physical process under study.

B. Multi-Agent Exploration

Now, we extend the single-agent exploration algorithmto handle multiple robots (see Algorithm 2). The proposedalgorithm is able to coordinate the robots while avoidinginter-agent collisions. This algorithm runs in a decentralizedmanner, such that each of the agents does its own decisionsbased on the current information. This information consistsof the single measurements taken by the agents, togetherwith their respective locations and the positions where theagents are heading to. They correspond to an exchange often scalar values per agent in a 3-dimensional environment,which makes the information exchange feasible from thecommunication perspective. Multi-agent coordination by ex-changing a small amount of data is possible due to the factthat agents share the same data model; i.e. they all employthe same mean function and covariance function for theGaussian processes regression and learning. In the following,we explain the algorithm in detail.

Let’s consider N robots for the exploration, with eachrobot i starting at a different position x

[1]i . The vectors of

1In our particular experimental setup, where the agents move in a two-dimensonal space, the sphere Br corresponds to a circle of radius r.

2The measurements’ resolution corresponds to the resolution with whichwe aim to reconstruct the physical process under study.

Algorithm 2 MultiAgentExploration(X , StopCriterion)

1: {xi}Ni=1 ← x[1]i ; {zi}Ni=1 ← NULL; {Xi}Ni=1 ← NULL

2:3: for agent i = 1, ..., N do4: while ! StopCriterion do5: zothLast,XothLast,XothNext ← ReceiveInfo(∀j ∈ N)6: zxi ← Sense(xi)7: zi ← [zi; zothLast; zxi ]

8: Xi ← [Xi;XothLast;xiT ]

9: θi∗ ← LearnHyp(zi,Xi)10: Xi∗ ← CalcNeighborsMA(X ,XothLast,XothNext,xi)11: µi∗,Σi∗ ← PredictGP(zi,Xi,Xi∗,θi∗)12: xnext ← argmax

x∈Xi∗(σ2

i∗)

13: BroadcastInfo(zxi ,xi,xnext)14: xi ← MoveTo(xnext)

measurements and measurements’ positions of each of therobots are initialized as an empty set (line 1). Then, eachindividual robot runs its own algorithm and communicateswith the other robots until the stopping criterion is fulfilled.It is important to remark that the for-loop runs in parallelin each of the agents. However, the instructions containedin the while-loop run sequentially for each of the N agents.We assume as well that the agents are timely synchronized.

In a first step, we receive the information broadcasted bythe other agents (line 5). It consists of the last measurementtaken by each of the other agents (zothLast); as well as thepositions where those measurements were taken (XothLast),and the next positions where the agents are heading to(XothNext)3. This information is sufficient to achieve theinter-agent coordination. On the one hand, the knowledgeabout the other agents’ current position and next positionsallows us to implement a collision avoidance mechanism.On the other hand, the set of measurements provided bythe other agents act as an indirect mechanism to coordinatethe exploration efforts. Since all agents share the same datamodel, each of them can reproduce what the other agents’uncertainties look like. This leads to an implicit coordinationthat avoids, for example, that two agents measure at the sameposition if it is not strictly necessary. It is important to noticethat the reception works as a process in parallel, capturingall the broadcasted information during one iteration of thealgorithm.

In a second step, the robot incorporates the new mea-surements to its own vector of measurements (lines 6-8)and computes its next position as explained in the previoussection. However, we must consider the possible inter-agentcollisions to calculate the set of next positions Xi∗ (line10). Here, we add a new constraint to calculate Xi∗ thatforbids agents to be closer than a safety distance dsafe toeach other. We must guarantee this distance respect to theother agents’ positions XothLast, as well as to the positionsXothNext where the other agents are heading to. Now the

3Vector zothLast has dimensions (N − 1, 1). Matrices XothLast andXothNext have dimensions (N − 1, 3).

set of possible next positions is reduced to:

Xi∗ =

x ∈ X

∣∣∣∣∣∣x ∈ Br(xi),min(||x−XothLast||2) > dsafe,min(||x−XothNext||2) > dsafe,

(6)

where min(||x−X||2) is the minimum euclidean distancebetween x and any of the rows in vector X.

Once the agent has calculated its next position, it broad-casts its current measurement and its next position to visit.The agents explore the physical process until they reach theirstopping criteria. Then each of them is able to reconstructthe physical process as we described it for the single-agentexploration case.

V. EXPERIMENTS AND DISCUSSION OF RESULTSWe validate our algorithm for the exploration of two

different physical processes with two different types ofrobotic platforms (ground-based and flying robots). Inboth cases, and without loss of generality, we assume atwo-dimensional environment X ⊂ R2. For the secondexperiment, each of the quadcopters move in a 2-dimensionalspace at a constant height. We employ a commercial motioncapture system (Vicon) to provide ground truth informationof the robot’s position. Our particular setup consists of 16infrared sensitive cameras and infrared strobes.

(a) Quadcopter. (b) Slider.

Fig. 2: Left: a quadcopter equipped with an ultrasound sensorfacing down to measure the range to the floor. Right: a holonomicground-based robot measuring the magnetic field intensity with amagnetometer.

A. Experiment 1: Magnetic Field Intensity

1) Experimental Setup: We explore the magnetic fieldintensity on the floor in an indoor environment using aground-based holonomic robot equipped with a magnetome-ter (see Figure 2b). We aim to reconstruct this magnetic fieldintensity with a resolution of 10 cm in an environment thatmeasures 7.5 m × 3 m. The robot is a modified version ofthe commercially available Slider platform by CommonplaceRobotics. The magnetic field sensor module used in thereported experiments is part of a commercial integratedsensor package (Xsens MTx). The complete algorithm runson a laptop mounted on top of the robot.

We measured the complete magnetic field intensity in theexploration environment with a resolution of 10 cm and wetook these measurements as ground truth. This is a validassumption, considering that the magnetometer is consideredas almost noise-free according to its specifications.

2) Experimental Results: In this first experiment, we areinterested in testing the proposed exploration strategy forthe single-robot case. Therefore, we test the algorithm per-formance assuming we know a priori the data model; i.e. weknow the model’s hyperparameters a priori. Figure 3 showsthe evolution of the root-mean-square error (RMSE) as afunction of exploration time for three different trajectories: ameander like trajectory, a random trajectory where the targetpositions are selected according to a uniform distributionfrom the set of positions in the environment, and finally ouralgorithm’s trajectory. We predict the magnetic field valuesin the non-measured positions using Gaussian processesregression with the optimal hyperparameters learned fromthe ground truth data. We observe that the RMSE with ouralgorithm is close to zero, while we have just measured a40% of the environment.

Fig. 3: Root mean squared error of the estimated magnetic field in-tensity with respect to the ground truth for three different algorithmsduring a 23 minutes exploration.

B. Experiment 2: Terrain Profile

1) Experimental Setup: We validate our algorithm in asecond experiment with two quadcopters to explore a heightprofile. In this case, the environment measures 8 m× 3 m,and the built height profile measures approx. 60 cm from topto bottom. We intend to explore it with a lateral resolution of20 cm. Each of the quadcopters flies at a different constantheight above the floor of 1 m and 1.5 m to avoid the riskof collisions, although they are not aware of it. They areeach equipped with a commercial ultrasound sensor fromMaxBotix facing down to measure the range to the floor(see Figure 2a). The sensors each have a nominal range ofapprox. 7.5 m with opening angles of 45◦ in the near fieldand a cylindrical measurement profile for ranges greater than1.5 m. The profile’s height is calculated as the differencebetween the robot’s actual height, which is known and noise-free, and the range between the quadcopter and the heightprofile. This range is measured with the ultrasound sensor.It is important to remark that the measured terrain’s heightis independent of the quadcopter’s z component.

In order to consider valid measurements for the explo-ration task, we carry out a pre-filtering step. This is neededdue to the characteristics of the ultrasound sensor. Its work-ing principle is based on sending an impulse and waitingfor the echo to calculate the distance to the closest objectwithin the sensor’s footprint. It could happen that reflections

in the environment and oscillations of the quadcopter’s posecould lead to missing measurements (the echo does not returnto the ultrasound sensor) or not-plausible measurements (theimpulse gets affected by multiple reflections and the resultingmeasurement is inconsistent considering the environmentstructure). Therefore, multiple measurements need to betaken to mitigate such effects. We solve this problem byaveraging over 10 valid measurements taken at the positionof interest; i.e. we discard those measurements that are notplausible, such as negative heights and heights that are abovethe quadcopter actual z position. In case we obtain no validmeasurements during an interval of 30 seconds, we averageover the last 10 valid measurements, which are stored in abuffer. This approximation is possible due to the notion ofsimilarity that assumes that closer points in space are morelikely to be similar.

The algorithm uses the robot operating system (ROS) [18],and utilizes WiFi for the communication between agents. Itis important to notice that, although the algorithm itself is de-centralized, only parts of it run on the Raspberry Pi mountedon the quadcopters due to the computational limitations ofthese devices. Instead, the Gaussian processes regression andlearning of hyperparameters run in one separate central com-puter, but in a decentralized manner. This decentralization ispossible because of the nature of ROS, where nodes runin different threads. We employ the pyGPs [19] library toperform these calculations.

For the terrain profile, the ground truth corresponds to 13markers placed randomly distributed along the height profile.Their positions are recorded using the Vicon tracking system.

2) Experimental Results: We have shown in the previousexperiment that our algorithm outperforms the meander andrandom trajectory. However, we assumed the data model asknown. In a realistic scenario, such as a search and rescuemission, we have no prior information about the physicalprocess under study. Instead, we must learn the data modelonline while performing the exploration. We show in thissection results corresponding to the online learning of themodel’s hyperparameters; as well as the performance of theproposed multi-agent exploration algorithm.

First, we analyze the root-mean-square error (RMSE)between estimate and ground truth (the 13 markers men-tioned above) after running the exploration algorithm duringthe lifetime of the quadcopter’s battery, which is approx.8 minutes in our case. Exchanging the batteries of thequadcopter is a highly time-consuming process. Therefore,our goal is developing exploration algorithms that are ableto reconstruct the original process during this battery’s life-time. This is done by both developing intelligent explorationstrategies and increasing the number of robots in the swarm.

We compare in Figure 4 the performance of five explo-ration strategies over time: (i) meander-like trajectory withpre-set hyperparameters that measures all the positions inthe environment; (ii) random trajectory with online learn-ing of hyperparameters; (iii) our single-agent explorationalgorithm with online learning of hyperparameters; (iv) ourmulti-agent exploration algorithm with online learning of

hyperparameters ; (v) our multi-agent exploration algorithmwith pre-set hyperparameters. Both trajectories (i) and (ii)employ Gaussian processes regression to predict the physicalprocess values at positions that were not yet measured.Trajectories (i) and (v) carry out the regression using theoptimal hyperparameters learned from the data collected withthe meander trajectory in a previous exploration run.

Fig. 4: Terrain profile. Root mean squared error with respect tothe ground truth for five different algorithms during a 8 minutesexploration.

First thing we observe from results is that the minimumerror we achieve is approx. 7 cm. This error may seemrelatively large. However, it is equal to the best performancewe can obtain with the ultrasound sensor in the exploredenvironment. We measured all positions in the environment(we needed 3 batteries and 53 minutes) with the meandertrajectory and calculated the RMSE as benchmark. Theresulting RMSE was 7.64 cm4, although we had measuredthe complete physical process. This error is considerablylarge and is due to the sensor’s footprint and sensor’scharacteristics, and to the oscillations of the quadcopter whileflying . The sensor takes the minimum range – maximumheight – within its footprint. Therefore it is not able todistinguish between different heights that are close. Thisfact induces the error. Since that is the best performancewe can get with the sensor without any postprocessing ofthe measurements, we assume this error as the best possiblesolution we can obtain.

Next fact we notice is that, in contrast to the resultsobtained for the magnetic field intensity, the error with therandom and single-agent exploration is larger than the oneobtained with the meander trajectory. The difference in thiscase is that we are learning the model’s hyperparametersonline, while in the other situation they were pre-set to theoptimal values. Now, the amount of measurements collectedwith a single quad in the initial phase of the exploration runis not enough to learn the process model fast, which incursin an initial loss of performance that is dragged during therest of the exploration.

However, attending at results from Figure 4, performanceof the random trajectory is comparable to the performanceof the single-agent exploration algorithm. Figure 5a demon-strates that this performance of the random trajectory is a

4The fact that the meander’s error after measuring the complete processis larger than the one obtained with our algorithm is due to the sensor’snoise.

mere coincidence, since this trajectory does not guaranteethe convergence of the hyperparameters learning (see intervalbetween 210 and 320 seconds). In contrast, as we showin Figure 5b, our algorithm converges fast to the optimalhyperparameters given the available measurements becauseof the greedy nature of the algorithm.

(a) Hyperparameters learned while following a random trajectory.

(b) Hyperparameters learned by the two quads, hans and hindrich,while exploring with our algorithm.

Fig. 5: Hyperparameters learned during the exploration run. Theblack points represent the actual values. The color lines are theresult of a linear interpolation of those points.

Since one quad is not sufficient to explore this physicalprocess, we use the multi-agent exploration algorithm withonline learning of hyperparameters with two quads (trajec-tory (iv)). Here, we observe in Figure 4 that the error hasbeen reduced by one half respect to our benchmark, whichproves the correct coordination between robots. However,this error of 12.84 cm is still larger than our benchmarkerror of 7.64 cm. On the other hand, the exploration timewas 8 minutes, which is approx. seven times smaller thanthe time needed to achieve our benchmark error.

In order to understand where this remaining error lies, werun the multi-agent exploration algorithm with two quads butwith the optimal hyperparameters that where set and pre-learned from the data collected with the meander trajectory.This corresponds to trajectory (v). Here, we notice thatthe error is approx. equal as our best possible solutionwhile the exploration time is approx. seven times smaller.Figure 6 shows the trajectories of the two robots for thetrajectories (iv) and (v). We can see that for the onlinelearning case, the robots fly around in a local area in thestarting phase. This response is due to the fact that therobots have not learned a proper model and are not ableto decide correctly where to measure next. We observe thisbehavior as well in Figure 5b. In the first 220 seconds, thehyperparameters learning does not converge and this causesan inferior performance compared to the set hyperparameterstrajectory. However, we remark that the convergence time

is fast considering the non-smooth nature of the measuredphysical process.

(a) Hyperparameters learning. (b) Hyperparameters set.

Fig. 6: Reconstruction of the terrain profile (see Figure 1) afterrunning our algorithm for two different setups: online learning(left) and defined hyperparameters (right). On top we show thetrajectories of the two quadcopters. The big blue dots correspondto the markers’ positions that serve us as ground truth.

We can conclude that our multi-agent exploration algo-rithm is able to achieve the correct coordination between therobots. However, as part of the future work we must improvethe learning phase to get closer to the performance obtainedwhile using the optimal hyperparameters.

VI. CONCLUSIONS AND FUTURE WORK

The paper has presented a method for multi-agent explo-ration of spatially distributed physical phenomena. Gaussianprocesses are employed as the underlying representation.Then, the system is able to learn, online and in a decentral-ized fashion, the hyperparameters of the Gaussian process,and, at the same time, determine the best next positions ofthe agents in the team to obtain new measurements fromthe point of view of information gain. Agent coordinationand collision avoidance is achieved by sharing information.The experiments show how the method allows for a moreefficient exploration of the environment compared to otherstrategies.

Future extensions of the algorithm include considering amore complex environment populated with obstacles. In thissense, we aim to extend our previous work in [20] to proposea multi-agent exploration algorithm in complex, unknownenvironments. We would like as well to perform explorationmissions with robots with complex dynamics. In addition,in order to consider the algorithm for actual search andrescue operations, we should extend it to handle uncertaintyin the robot’s motion and positioning. As part of the futurework, we believe we could reduce the ultrasound sensor erroradding a post-processing step and considering an array of

sensors. Additionally, we are interested in exploring alter-native covariance functions to represent better the process’model.

One of the assumptions of our proposed work is that thenetwork of agents is fully connected. This assumption couldbe relaxed by considering a connected network and usingappropriate communication protocols. However, to achievescalability respect to the number of agents, we aim toconsider distributed algorithms. That includes solving theGaussian processes regression and learning of hyperparam-eters in a distributed fashion; as well as determining theexploration targets in a distributed manner by employingconsensus algorithms.

REFERENCES

[1] V. Kumar and N. Michael, “Opportunities and challenges with au-tonomous micro aerial vehicles,” The International Journal of RoboticsResearch, vol. 31, no. 11, pp. 1279–1291, 2012.

[2] B. J. Julian, M. Angermann, M. Schwager, and D. Rus, “Distributedrobotic sensor networks: An information-theoretic approach,” TheInternational Journal of Robotics Research, vol. 31, no. 10, pp. 1134–1154, 2012.

[3] R. K. Williams and G. Sukhatme, “Probabilistic spatial mapping andcurve tracking in distributed multi-agent systems,” in Robotics andAutomation (ICRA), 2012 IEEE International Conference on. IEEE,2012, pp. 1125–1130.

[4] A. Singh, F. Ramos, H. D. Whyte, and W. J. Kaiser, “Modelingand decision making in spatio-temporal processes for environmentalsurveillance,” in Robotics and Automation (ICRA), 2010 IEEE Inter-national Conference on. IEEE, 2010, pp. 5490–5497.

[5] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor place-ments in gaussian processes: Theory, efficient algorithms and empiricalstudies,” The Journal of Machine Learning Research, vol. 9, pp. 235–284, 2008.

[6] M. Ghaffari Jadidi, J. Valls Miro, R. Valencia, and J. Andrade-Cetto, “Exploration on continuous gaussian process frontier maps,” inRobotics and Automation (ICRA), 2014 IEEE International Conferenceon. IEEE, 2014, pp. 6077–6082.

[7] R. Marchant and F. Ramos, “Bayesian optimisation for informativecontinuous path planning,” in Robotics and Automation (ICRA), 2014IEEE International Conference on. IEEE, 2014, pp. 6136–6143.

[8] J. Fink and V. Kumar, “Online methods for radio signal mappingwith mobile robots,” in Robotics and Automation (ICRA), 2010 IEEEInternational Conference on. IEEE, 2010, pp. 1940–1945.

[9] A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, “Efficient informa-tive sensing using multiple robots,” Journal of Artificial IntelligenceResearch, pp. 707–755, 2009.

[10] J. Chen, K. H. Low, Y. Yao, and P. Jaillet, “Gaussian processdecentralized data fusion and active sensing for spatiotemporal trafficmodeling and prediction in mobility-on-demand systems.”

[11] A. V. Ruiz, M. Angermann, I. Wieser, M. Frassl, and J. Mueller, “Effi-cient multi-agent exploration with gaussian processes,” in AustralasianConference on Robotics and Automation (ACRA), 2014.

[12] R. Ouyang, K. H. Low, J. Chen, and P. Jaillet, “Multi-robot activesensing of non-stationary gaussian process-based environmental phe-nomena,” in Proceedings of the 2014 international conference onAutonomous agents and multi-agent systems. International Foundationfor Autonomous Agents and Multiagent Systems, 2014, pp. 573–580.

[13] R. Stranders, A. Rogers, and N. Jennings, “A decentralized, on-line coordination mechanism for monitoring spatial phenomena withmobile sensors,” 2008.

[14] N. Cressie, “Statistics for spatial data,” Terra Nova, vol. 4, no. 5, pp.613–617, 1992.

[15] A. Kemppainen, J. Haverinen, I. Vallivaara, and J. Roning, “Near-optimal slam exploration in gaussian processes,” in Multisensor Fusionand Integration for Intelligent Systems (MFI), 2010 IEEE Conferenceon. IEEE, 2010, pp. 7–13.

[16] C. E. Rasmussen and C. K. Williams, “Gaussian processes for machinelearning (adaptive computation and machine learning),” 2005.

[17] T. M. Cover and J. A. Thomas, Elements of information theory. JohnWiley & Sons, 2012.

[18] ROS (Robot Operating System), 2015 (accessed September 14, 2015).[Online]. Available: http://wiki.ros.org/

[19] pyGPs - A Package for Gaussian Processes, 2015 (accessedSeptember 14, 2015). [Online]. Available: http://www-ai.cs.uni-dortmund.de/weblab/static/api docs/pyGPs/

[20] A. Viseras Ruiz and C. Olariu, “A general algorithm for explorationwith gaussian processes in complex, unknown environments,” in IEEEInternational Conference on Robotics and Automation (ICRA), 2015.

Documents

Decentralized Multi-Agent Exploration with Online-Learning ...robotics.upo.es/~lmercab/publications/papers/icra2016_GPs.pdf · to monitor dynamic processes, and makes the developed