Anytime Planning for Decentralized Multi-Robot Active … · 2020-02-11 · Anytime Planning for Decentralized Multi-Robot Active Information Gathering Brent Schlotfeldt1 Dinesh Thakur1

Anytime Planning for Decentralized Multi-Robot Active Information Gathering

Brent Schlotfeldt¹, Dinesh Thakur¹, Nikolay Atanasov², Vijay Kumar¹, George Pappas¹

¹ GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA

² Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA

May 17, 2018


What is the problem?

Suppose the robot at x0 wants to plan a path to xT to track a target yT


Overview

1 Active Information Gathering
    Problem Statement
    Reduction to Open-Loop Control, and Optimal Solution
    (ε, δ)-Reduced Value Iteration
    Anytime Reduced Value Iteration

2 Multiple Robots
    Coordinate Descent
    Distributed Estimation
    Application: Target Tracking


Problem Statement

Consider a team of n mobile robots with sensor motion models:

$$x_{i,t+1} = f_i(x_{i,t}, u_{i,t}), \quad i \in \{1, \dots, n\} \tag{1}$$

where $x_{i,t} \in \mathcal{X}$, $u_{i,t} \in \mathcal{U}$, and the set $\mathcal{U}$ of admissible controls is finite.

The robots' goal is to track a target $y_t$ with target motion model:

$$y_{t+1} = A y_t + w_t, \quad w_t \sim \mathcal{N}(0, W), \quad t = 0, \dots, T \tag{2}$$

The robots sense the target using on-board sensors obeying the following sensor observation model:

$$z_{i,t} = H(x_{i,t})\, y_t + v_t(x_{i,t}), \quad v_t \sim \mathcal{N}(0, V(x_{i,t})) \tag{3}$$

Note the linear, Gaussian assumptions on the target and observations, but not on the robot.

To simplify notation, we refer to the state of the team at time t as $x_t$.


Active Information Gathering

Problem (Active Information Acquisition)

Given an initial sensor pose $x_0 \in \mathcal{X}$, a prior distribution over the target state $y_0$, and a finite planning horizon T, the task is to choose a sequence of control actions $u_0, \dots, u_{T-1} \in \mathcal{U}^T$ that minimizes the conditional entropy h of the target state $y_{1:T}$ given the measurement set $z_{1:T}$:

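$$\min_{u \in \mathcal{U}^T} \; J^{(n)}_T(u) := \sum_{t=1}^{T} h(y_t \mid z_{1:t}, x_{1:t})^{\dagger} \tag{4}$$

$$\begin{aligned}
\text{s.t.} \quad x_{t+1} &= f(x_t, u_t), & t &= 0, \dots, T-1 \\
y_{t+1} &= A y_t + w_t, & t &= 0, \dots, T-1 \\
z_t &= H(x_t)\, y_t + v_t, & t &= 0, \dots, T
\end{aligned} \tag{5}$$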

† Note that many other information measures are possible, such as mutual information, minimum mean-squared error, and others.¹

¹ Nikolay Atanasov et al. “Information acquisition with sensing robots: Algorithms and error bounds”. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). 2014.


Reduction to Open-Loop Control

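The prior on $y_0$ is Gaussian, $\mathcal{N}(y_0, \Sigma_0)$ ⟹ the Kalman filter is optimal.

The objective reduces to log det: for a Gaussian, $h(y_t \mid z_{1:t}, x_{1:t}) = \tfrac{1}{2} \log\det(2\pi e\, \Sigma_t)$, which is $\tfrac{1}{2}\log\det(\Sigma_t)$ plus a constant.

$\Sigma_t$ is independent of the realizations $z_{1:t}$ ⟹ open-loop control is optimal!²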

Problem (Active Information Acquisition)

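$$\min_{\mu \in \mathcal{U}^T} \; J^{(n)}_T(\mu) := \sum_{t=1}^{T} \log\det(\Sigma_t)$$

$$\begin{aligned}
\text{s.t.} \quad x_{t+1} &= f(x_t, u_t), & t &= 0, \dots, T-1 \\
\Sigma_{t+1} &= \rho^e_{t+1}(\rho^p_t(\Sigma_t), x_{t+1}), & t &= 0, \dots, T-1
\end{aligned}$$

Update: $\rho^e_t(\Sigma, x) := \Sigma - K_t(\Sigma, x) H_t(x) \Sigma$

$K_t(\Sigma, x) := \Sigma H_t(x)^T \left(H_t(x) \Sigma H_t(x)^T + V_t(x)\right)^{-1}$

Predict: $\rho^p_t(\Sigma) := A_t \Sigma A_t^T + W_t$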

² Jerome Le Ny and George J. Pappas. “On trajectory optimization for active sensing in Gaussian process models”. In: 2009 IEEE Conference on Decision and Control (CDC). 2009.
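The covariance recursion can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes time-invariant A and W, assumes the predict step is applied first and the update is evaluated at the new pose (the order used in Algorithm 1), and the names `predict`, `update`, `open_loop_cost` and the callback `sensor(x)` returning `(H(x), V(x))` are hypothetical.

```python
import numpy as np

def predict(Sigma, A, W):
    """Kalman prediction of the covariance: A Sigma A^T + W."""
    return A @ Sigma @ A.T + W

def update(Sigma, H, V):
    """Kalman measurement update of the covariance: Sigma - K H Sigma."""
    K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + V)
    return Sigma - K @ H @ Sigma

def open_loop_cost(Sigma0, A, W, sensor, poses):
    """Sum of log det(Sigma_t) along a fixed pose sequence x_1, ..., x_T.

    sensor(x) is an assumed callback returning (H(x), V(x)).
    No measurement realization is needed, which is why the plan can be
    evaluated open loop.
    """
    Sigma, cost = Sigma0, 0.0
    for x in poses:
        H, V = sensor(x)
        Sigma = update(predict(Sigma, A, W), H, V)
        cost += np.linalg.slogdet(Sigma)[1]
    return cost
```

Because the cost depends only on the pose sequence, two candidate control sequences can be compared offline by calling `open_loop_cost` on the poses each one produces.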


Optimal Solution

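The deterministic optimal control problem can be solved with Forward Value Iteration, which builds a search tree from $(x_0, \Sigma_0)$.

[Figure: search tree rooted at $(x_0, \Sigma_0)$ with branches labeled by controls $u_1$, $u_2$, $u_3$.]

Branching is due to the number of motion primitives.

Depth is due to the length of the planning horizon.

Full search has complexity $O(|\mathcal{U}|^T)$.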

Can we do better?


What if trajectories cross?

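Want: Conditions on $\Sigma_t$ to eliminate nodes if trajectories cross at level t of the tree:

[Figure: search tree rooted at $x_0$ with nodes $x_1^1, \dots, x_1^3$ at level 1 and $x_2^1, \dots, x_2^5$ at level 2; several trajectories reach the same node with covariances $\Sigma_1$, $\Sigma_2$, $\Sigma_3$.]

There is a convex combination of $\Sigma_1$ and $\Sigma_3$ that dominates $\Sigma_2$, i.e. $\Sigma_2 \succeq a\Sigma_1 + (1-a)\Sigma_3$. $\Sigma_2$ can be safely removed from the tree.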


ε - Algebraic Redundancy

Want: Conditions on $\Sigma_t$ to eliminate nodes if trajectories cross:³

Definition (ε-Algebraic Redundancy)

Let $\varepsilon \ge 0$ and let $\{\Sigma_i\}_{i=1}^K \subset \mathbb{S}^n_+$ be a finite set. A matrix $\Sigma \in \mathbb{S}^n_+$ is ε-algebraically redundant with respect to $\{\Sigma_i\}$ if there exist nonnegative constants $\{\alpha_i\}_{i=1}^K$ such that:

$$\sum_{i=1}^{K} \alpha_i = 1 \quad \text{and} \quad \Sigma + \varepsilon I_n \succeq \sum_{i=1}^{K} \alpha_i \Sigma_i$$

Example: Suppose $a\Sigma_1 + (1-a)\Sigma_3$ is a convex combination of $\Sigma_1$ and $\Sigma_3$. $\Sigma_1$ and $\Sigma_3$ cannot strictly eliminate $\Sigma_2$, because it is not contained in $a\Sigma_1 + (1-a)\Sigma_3$. However, if we relax the inequality by $\varepsilon I$, we can remove $\Sigma_2$.

³ Michael P. Vitus et al. “On efficient sensor scheduling for linear dynamical systems”. In: Automatica 48.10 (2012), pp. 2482–2493.
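The definition can be checked numerically for a given set of weights. The sketch below is only an illustration: the function name `is_redundant_for_weights` is hypothetical, and it tests one candidate choice of $\alpha_i$ through the smallest eigenvalue of the gap matrix. Deciding whether any valid weights exist is a feasibility problem over the $\alpha_i$ that this sketch does not solve.

```python
import numpy as np

def is_redundant_for_weights(Sigma, Sigmas, alphas, eps, tol=1e-12):
    """True if Sigma + eps*I - sum_i alpha_i * Sigma_i is positive semidefinite.

    alphas must be nonnegative and sum to one. Only this one candidate
    set of weights is tested.
    """
    alphas = np.asarray(alphas, dtype=float)
    if np.any(alphas < 0) or not np.isclose(alphas.sum(), 1.0):
        raise ValueError("alphas must be nonnegative and sum to 1")
    combo = sum(a * S for a, S in zip(alphas, Sigmas))
    gap = Sigma + eps * np.eye(Sigma.shape[0]) - combo
    return bool(np.linalg.eigvalsh(gap).min() >= -tol)

# Toy matrices mirroring the example: no convex combination of Sigma1 and
# Sigma3 is dominated by Sigma2, but relaxing by eps = 0.2 makes it redundant.
Sigma1 = np.diag([1.0, 3.0])
Sigma3 = np.diag([3.0, 1.0])
Sigma2 = np.diag([1.9, 1.9])
strict = is_redundant_for_weights(Sigma2, [Sigma1, Sigma3], [0.5, 0.5], eps=0.0)
relaxed = is_redundant_for_weights(Sigma2, [Sigma1, Sigma3], [0.5, 0.5], eps=0.2)
```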


(ε, δ)- Reductions

Can we remove even more trajectories?

What if the trajectories do not exactly cross at time t?

Definition (Trajectory δ-Crossing)

Trajectories $\pi^1, \pi^2 \in \mathcal{X}^T$ δ-cross at time $t \in [1, T]$ if $d_{\mathcal{X}}(\pi^1_t, \pi^2_t) \le \delta$ for $\delta \ge 0$.


(ε, δ) Reduced Value Iteration

Idea: Prune trajectories close in space and similar in informativeness

Algorithm 1 (ε, δ) Reduced Value Iteration

J_0 ← 0, S_0 ← {(x_0, Σ_0, J_0)}, S_t ← ∅ for t = 1, ..., T
for t = 1 : T do
    % Expanding nodes in the search tree:
    for all (x, Σ, J) ∈ S_{t−1} do
        for all u ∈ U do
            x_t ← f(x, u), Σ_t ← ρ^e_{t+1}(ρ^p_t(Σ, x, u), x_t)
            J_t ← J + log det(Σ_t)
            S_t ← S_t ∪ {(x_t, Σ_t, J_t)}
        end for
    end for
    % (ε, δ)-pruning:
    Sort S_t in ascending order according to log det(·)
    S′_t ← {S_t[1]}
    for all (x, Σ, J) ∈ S_t \ {S_t[1]} do
        % Find all nodes in S′_t which δ-cross x:
        Q ← {Σ′ | (x′, Σ′, J′) ∈ S′_t, d_X(x, x′) ≤ δ}
        if isempty(Q) or not(Σ is ε-alg-red wrt Q) then
            S′_t ← S′_t ∪ {(x, Σ, J)}
        end if
    end for
    S_t ← S′_t
end for
return min{J | (x, Σ, J) ∈ S_T}
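The pruning loop can be made concrete for a scalar target, where the covariance is a single variance and ε-algebraic redundancy with respect to a set Q reduces to the test `s + eps >= min(Q)`. The sketch below is a simplified illustration under that assumption, not the authors' code: the robot lives on a line, `riccati(sigma, x)` is a hypothetical callback that applies predict then update at pose x, and the noise profile `V` is invented for the example.

```python
import math

def reduced_value_iteration(x0, sigma0, controls, T, f, riccati, eps, delta):
    """(eps, delta)-reduced value iteration for a scalar target variance.

    Returns the smallest accumulated cost sum_t log(sigma_t) among kept nodes.
    """
    nodes = [(x0, sigma0, 0.0)]
    for _ in range(T):
        # Expand every surviving node with every control.
        expanded = []
        for x, sigma, J in nodes:
            for u in controls:
                x_new = f(x, u)
                s_new = riccati(sigma, x_new)
                expanded.append((x_new, s_new, J + math.log(s_new)))
        # Ascending variance is the same order as ascending log det for scalars.
        expanded.sort(key=lambda node: node[1])
        kept = [expanded[0]]
        for x, s, J in expanded[1:]:
            # Variances of kept nodes whose pose delta-crosses x.
            Q = [s2 for x2, s2, _ in kept if abs(x - x2) <= delta]
            # Keep the node unless it is eps-redundant with respect to Q.
            if not Q or s + eps < min(Q):
                kept.append((x, s, J))
        nodes = kept
    return min(J for _, _, J in nodes)

# Toy problem (all values are assumptions for illustration).
f = lambda x, u: x + u
V = lambda x: 1.0 + (x - 3.0) ** 2      # sensing noise, smallest at x = 3

def riccati(sigma, x):
    p = sigma + 1.0                     # predict with A = 1, W = 1
    return p * V(x) / (p + V(x))        # scalar Kalman update

J = reduced_value_iteration(0.0, 1.0, (-1.0, 0.0, 1.0), 4, f, riccati,
                            eps=0.1, delta=0.5)
```

Passing a negative `delta` disables pruning in this sketch, since no two poses can then δ-cross, and the function degenerates to the full search.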

Performance Guarantee

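A search tree with (ε, δ)-pruning is (ε, δ)-suboptimal, with cost $J^{\varepsilon,\delta}_T$ such that the following bound holds:⁴

$$0 \le J^{\varepsilon,\delta}_T - J^*_T \le (\zeta_T - 1)\left[J^*_T - \log\det(W)\right] + \varepsilon \Delta_T \tag{6}$$

where

$$\zeta_T := \prod_{\tau=1}^{T-1} \left(1 + \sum_{s=1}^{\tau} L_f^s L_m \delta\right) \ge 1 \tag{7}$$

and $\Delta_T$, $L_f$ and $L_m$ are constants resulting from continuity assumptions.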

Note the tradeoff in performance: $(\varepsilon, \delta) \to (0, 0) \implies J^{\varepsilon,\delta}_T \to J^*_T$.

If $(\varepsilon, \delta) \to (\infty, \infty)$, we get a greedy policy with no guarantee.

This bound is monotone in ε and δ.

⁴ Nikolay Atanasov et al. “Information acquisition with sensing robots: Algorithms and error bounds”. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). 2014.
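To get a feel for how the bound grows, the right-hand side of (6) can be evaluated directly. The sketch assumes the product in (7) runs over τ = 1, ..., T − 1 and treats $L_f$, $L_m$ and $\Delta_T$ as given numbers; the function names are hypothetical and the values used are purely illustrative.

```python
def zeta(T, L_f, L_m, delta):
    """zeta_T = prod_{tau=1}^{T-1} (1 + sum_{s=1}^{tau} L_f**s * L_m * delta)."""
    prod = 1.0
    for tau in range(1, T):
        prod *= 1.0 + sum(L_f ** s * L_m * delta for s in range(1, tau + 1))
    return prod

def suboptimality_bound(J_star, logdet_W, eps, delta, T, L_f, L_m, Delta_T):
    """Right-hand side of the (eps, delta) suboptimality bound."""
    return (zeta(T, L_f, L_m, delta) - 1.0) * (J_star - logdet_W) + eps * Delta_T
```

With `delta = 0` the factor `zeta` is exactly 1, so only the `eps * Delta_T` term remains; with both parameters at zero the bound vanishes.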


Motivation for Anytime Planning

How do ε,δ affect execution time?

What if problem dimension changes? (e.g. more targets)

What if team changes? (e.g. more robots)

All of these can be addressed by modifying the search algorithm to iteratively construct the tree, adding progressively more trajectories until execution time runs out, always returning a feasible solution. This is called an anytime algorithm.


Anytime Reduced Value Iteration

Idea: First build a greedy search tree, then iteratively ImprovePath until time expires

To efficiently re-use computations, we mark all computed nodes S_t as open and closed, i.e. O_t, C_t at all levels t in the tree⁵

Algorithm 2 Anytime Reduced Value Iteration (x_0, Σ_0, T_ARVI)

J_0 ← 0, S_0 ← {(x_0, Σ_0, J_0)}, S_t ← ∅ for t = 1, ..., T
C_0 ← ∅
O_t ← S_t for t = 0, ..., T
S ← {S_0, ..., S_T}, O ← {O_0, ..., O_T}, C ← {C_0, ..., C_T}
ε ← ∞, δ ← ∞
S, O, C ← ImprovePath(S, O, C, ε, δ)
Publish best solution from O[T]
while Time Elapsed ≤ T_ARVI do
    Decrease (ε, δ)
    S, O, C ← ImprovePath(S, O, C, ε, δ)
    Publish best solution J from O[T]
end while

⁵ Brent Schlotfeldt et al. “Anytime Planning for Decentralized Multirobot Active Information Gathering”. In: IEEE Robotics and Automation Letters 3.2 (2018), pp. 1025–1032.
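The outer loop of Algorithm 2 follows a generic anytime pattern, which can be sketched independently of the tree search. In the sketch below everything is an assumption made for illustration: `improve(eps, delta)` stands in for ImprovePath and simply returns the best cost it found, `schedule` is a user-chosen list of decreasing (ε, δ) pairs, and the clock is injectable so the loop can be tested deterministically. Unlike ARVI, this wrapper does not reuse open and closed nodes between passes.

```python
import time

def anytime_search(improve, schedule, time_budget_s, clock=time.monotonic):
    """Call improve with progressively tighter (eps, delta) until time runs out.

    The greedy pass (eps = delta = infinity) always runs, so a feasible
    solution is available even with a zero budget.
    """
    start = clock()
    best = improve(float("inf"), float("inf"))
    history = [best]
    for eps, delta in schedule:
        if clock() - start > time_budget_s:
            break
        best = min(best, improve(eps, delta))
        history.append(best)
    return best, history
```

Taking the running minimum is what makes the published cost non-increasing in this simplified wrapper.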


Tree Interpretation


Improve Path

Algorithm 3 ImprovePath (S, O, C, ε, δ)

for t = 1 : T do
    % Expanding Open, but not Closed nodes:
    for all (x, Σ, J) ∈ O_{t−1} \ C_{t−1} do
        C_{t−1} ← C_{t−1} ∪ {(x, Σ, J)}
        for all u ∈ U do
            x_t ← f(x, u), Σ_t ← ρ_{x_t}(Σ)
            J_t ← J + log det(Σ_t)
            S_t ← S_t ∪ {(x_t, Σ_t, J_t)}
        end for
    end for
    Sort S_t in ascending order according to log det(·)
    % (ε, δ)-pruning by marking nodes Open:
    O_t ← O_t ∪ {S_t[1]}
    for all (x, Σ, J) ∈ S_t \ O_t do
        % Find all nodes in O_t which δ-cross x:
        Q ← {Σ′ | (x′, Σ′, J′) ∈ O_t, d_X(x, x′) ≤ δ}
        if isempty(Q) or not(Σ is ε-alg-red wrt Q) then
            O_t ← O_t ∪ {(x, Σ, J)}
        end if
    end for
    O[t] = O_t
end for
return S, O, C

Main Result

The following holds regarding the performance of ARVI:⁶

Theorem (ARVI)

The following are satisfied by the ARVI algorithm:

(i) When ImprovePath(ε, δ) returns with finite (ε, δ), the returned solution is guaranteed to be (ε, δ)-sub-optimal.

(ii) The cost $J^{(n)}_T$ of the returned solution decreases monotonically over time.

(iii) Given infinite time, C = O ⊆ S, and O[T] will contain the optimal solution to the planning problem.

⁶ Brent Schlotfeldt et al. “Anytime Planning for Decentralized Multirobot Active Information Gathering”. In: IEEE Robotics and Automation Letters 3.2 (2018), pp. 1025–1032.


What about multiple robots?

Algorithm still scales exponentially in the number of robots.

Consider a coordinate descent approach⁷. Each robot solves its own planning problem with fixed trajectories from prior robots.

$$u^c_{1,0:(T-1)} \in \arg\min_{\mu \in \mathcal{U}_1^T} J^{(1)}_T(\mu) \tag{8}$$

This continues:

$$u^c_{2,0:(T-1)} \in \arg\min_{\mu \in \mathcal{U}_2^T} J^{(2)}_T(u^c_{1,0:(T-1)}, \mu) \tag{9}$$

...

⁷ Nikolay Atanasov et al. “Decentralized active information acquisition: Theory and application to multi-robot SLAM”. In: 2015 IEEE International Conference on Robotics and Automation (ICRA) (2015), pp. 4775–4782.
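The sequential structure of (8) and (9) can be sketched as follows. This is a toy illustration only: the per-robot planner here is an exhaustive search standing in for ARVI, all robots share one control set, and `joint_cost`, which scores a list of control sequences, is a hypothetical callback.

```python
from itertools import product

def coordinate_descent(n_robots, controls, horizon, joint_cost):
    """Plan robot by robot, holding earlier robots' sequences fixed.

    Each robot evaluates len(controls)**horizon candidates, so the total
    work grows linearly with the number of robots.
    """
    fixed = []
    for _ in range(n_robots):
        best = min(product(controls, repeat=horizon),
                   key=lambda seq: joint_cost(fixed + [seq]))
        fixed.append(best)
    return fixed

# Toy cost: the team is rewarded for covering distinct cells.
def coverage_cost(seqs):
    return -len({cell for seq in seqs for cell in seq})

plan = coordinate_descent(2, (0, 1, 2), 1, coverage_cost)
```

In this toy run the second robot avoids the cell already chosen by the first, because its cost is evaluated jointly with the first robot's fixed sequence.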


Performance

Coordinate Descent reduces the planning complexity from exponential to linear in the number of robots.

The algorithm achieves no less than 50% of the optimal performance, in the case of submodular objective functions⁸.

Results can also be extended to (approximately) submodular objective functions.

⁸ Nikolay Atanasov et al. “Decentralized active information acquisition: Theory and application to multi-robot SLAM”. In: 2015 IEEE International Conference on Robotics and Automation (ICRA) (2015), pp. 4775–4782.


What about Estimation?

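The robots can use a distributed information filter⁹, which uses the inverse covariance $\Omega_{i,t} = \Sigma_{i,t}^{-1}$ and $\omega_{i,t} = \Sigma_{i,t}^{-1} y_{i,t}$:

Update Step:

$$\omega_{i,t+1} = \sum_{j \in \mathcal{N}_i \cup \{i\}} \kappa_{i,j}\, \omega_{j,t} + H_i^T V_i^{-1} z_i(t)$$

$$\Omega_{i,t+1} = \sum_{j \in \mathcal{N}_i \cup \{i\}} \kappa_{i,j}\, \Omega_{j,t} + H_i^T V_i^{-1} H_i$$

$$y_i(t) := \Omega_{i,t}^{-1}\, \omega_{i,t}$$

Predict Step:

$$\Omega_{i,t+1} = \left(A\, \Omega_{i,t+1}^{-1} A^T + W\right)^{-1}$$

$$\omega_{i,t+1} = \Omega_{i,t+1}\, A\, y_{i,t+1} \tag{10}$$

⁹ Nikolay Atanasov et al. “Joint estimation and localization in sensor networks”. In: Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on. IEEE. 2014, pp. 6875–6882.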
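One robot's filter step can be sketched with NumPy as below. The function names and argument layout are assumptions made for illustration: `omegas` and `Omegas` hold the information vectors and matrices of the robot and its neighbors, and `weights` holds the matching $\kappa_{i,j}$. The predict step here recovers the estimate from the updated pair before propagating it, which is one reading of the time indices in (10).

```python
import numpy as np

def consensus_update(omegas, Omegas, weights, H, V, z):
    """Weighted average of own and neighbor information, plus the local measurement."""
    omega = sum(k * w for k, w in zip(weights, omegas)) + H.T @ np.linalg.solve(V, z)
    Omega = sum(k * O for k, O in zip(weights, Omegas)) + H.T @ np.linalg.solve(V, H)
    return omega, Omega

def information_predict(omega, Omega, A, W):
    """Propagate the information pair through y_{t+1} = A y_t + w_t."""
    y = np.linalg.solve(Omega, omega)          # current estimate
    Sigma = np.linalg.inv(Omega)               # current covariance
    Omega_next = np.linalg.inv(A @ Sigma @ A.T + W)
    omega_next = Omega_next @ (A @ y)
    return omega_next, Omega_next
```

With a single robot and weight 1, `consensus_update` reduces to the standard information-form Kalman update, which gives a quick sanity check against the covariance form.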


Application: Target Tracking

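We use a discretized unicycle model for $f(x, u)$:

$$\begin{bmatrix} x^1_{t+1} \\ x^2_{t+1} \\ \theta_{t+1} \end{bmatrix} = \begin{bmatrix} x^1_t \\ x^2_t \\ \theta_t \end{bmatrix} + \begin{bmatrix} \nu\, \mathrm{sinc}(\omega\tau/2) \cos(\theta_t + \omega\tau/2) \\ \nu\, \mathrm{sinc}(\omega\tau/2) \sin(\theta_t + \omega\tau/2) \\ \tau\omega \end{bmatrix} \tag{11}$$

The target model is a double integrator:

$$y_{t+1} = A y_t + w_t, \quad A = \begin{bmatrix} I_2 & \tau I_2 \\ 0 & I_2 \end{bmatrix}, \quad w_t \sim \mathcal{N}\left(0,\; q \begin{bmatrix} \tau^3/3\, I_2 & \tau^2/2\, I_2 \\ \tau^2/2\, I_2 & \tau I_2 \end{bmatrix}\right) \tag{12}$$

The robots have range and bearing sensors:

$$z_{t,m} = h(x_t, y_{t,m}) + v_t, \quad v_t \sim \mathcal{N}\left(0, V(x_t, y_{t,m})\right)$$

$$h(x, y) = \begin{bmatrix} r(x, y) \\ \alpha(x, y) \end{bmatrix} := \begin{bmatrix} \sqrt{(y^1 - x^1)^2 + (y^2 - x^2)^2} \\ \tan^{-1}\left((y^2 - x^2)/(y^1 - x^1)\right) - \theta \end{bmatrix} \tag{13}$$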






Linearization, and MPC

Due to the non-linear observation model, we linearize about a predicted target trajectory y and apply MPC:

$$\nabla_y h(x, y) = \frac{1}{r(x, y)} \begin{bmatrix} (y^1 - x^1) & (y^2 - x^2) & 0_{1 \times 2} \\ -\sin(\theta + \alpha(x, y)) & \cos(\theta + \alpha(x, y)) & 0_{1 \times 2} \end{bmatrix} \tag{14}$$

Algorithm 4 Anytime Multi-Robot Target Tracking

Input: T_max, x_0, ω_0, Ω_0, f, U, H, V, A, W, T, T_ARVI
for t = 1 : T_max do
    Send ω_{i,t} and Ω_{i,t} to neighboring robots.
    Receive measurements z_{i,t} and perform distributed update step with any neighbor ω_{i,t}, Ω_{i,t} received.
    Predict a target trajectory of length T: y_t, ..., y_T, and linearize observation model: H_t ← ∇_y h(·, y_t)
    Plan T-step trajectories with ARVI (Alg. 2) and coordinate descent.
    Apply first control input to move each robot.
end for
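The range-bearing model and its Jacobian with respect to the target state can be written out and checked numerically. The sketch makes a few assumptions: the robot pose is `(x1, x2, theta)`, the target state is `(y1, y2, velocity1, velocity2)`, the bearing uses the four-quadrant `atan2` rather than a plain arctangent, and no angle wrapping is applied. The function names are hypothetical.

```python
import math

def h(x, y):
    """Range and bearing of target y seen from robot pose x."""
    dx, dy = y[0] - x[0], y[1] - x[1]
    return (math.hypot(dx, dy), math.atan2(dy, dx) - x[2])

def grad_y_h(x, y):
    """2x4 Jacobian of h with respect to the target state y."""
    r, alpha = h(x, y)
    a = x[2] + alpha                    # absolute bearing theta + alpha
    return [[(y[0] - x[0]) / r, (y[1] - x[1]) / r, 0.0, 0.0],
            [-math.sin(a) / r, math.cos(a) / r, 0.0, 0.0]]
```

The two zero columns reflect that the measurement depends on the target position only, not on its velocity. The Jacobian is undefined when the robot and target coincide (r = 0), which a real implementation would need to guard against.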




Simulation Environment


Some Results

[Figure, top row: sweep over communication radius (legend: 0 m, 1 m, 5 m, 10 m, 15 m), plotted against timestep.
(a) Average Target Entropy: entropy per target (nats).
(b) Number of Targets: average targets detected.
(c) Average RMSE over time: RMSE per target (m).]

[Figure, bottom row: sweep over planning time (legend: T_ARVI = 0 (greedy), T_ARVI = 1.5 s, T_ARVI = 3 s), plotted against timestep.
(d) Entropy per Target: entropy per target (nats).
(e) Number of Targets Tracked: number of targets.
(f) Position RMSE per target: RMSE (m).]

Figure: Simulation results with 9 robots tracking 16 moving targets. The top row sweeps over communication radius, and the bottom row sweeps planning time.


Conclusion

Future Work:

Resilient Active Information Acquisition (IROS 2018)

Unknown Target Models

Quadrotor Motion Model, Camera Observation Model, Continuous Actions, etc.

