Towards a Theoretic Understanding of DCEE Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe http://teamcore.usc.edu Lafayette College


Towards a Theoretic Understanding of DCEE

Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe

http://teamcore.usc.edu

Lafayette College


Forward Pointer

When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty

Matthew E. Taylor, Manish Jain, Yanqin Jin, Makoto Yokoo, & Milind Tambe

Wednesday, 8:30 – 10:30 Coordination and Cooperation 1


Teamwork: Foundational MAS Concept

• Joint actions improve outcome
• But increases communication & computation
• Over two decades of work

• This paper: increased teamwork can harm team
– Even without considering communication & computation
– Only considering team reward
– Multiple algorithms, multiple settings
– But why?


DCOPs: Distributed Constraint Optimization Problems

• Multiple domains
– Meeting scheduling
– Traffic light coordination
– RoboCup soccer
– Multi-agent plan coordination
– Sensor networks

• Distributed
– Robust to failure
– Scalable

• (In)Complete
– Quality bounds


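DCOP Framework

Constraint graph: a1 - a2 - a3

Total reward: $R(a) = \sum_{S} R_S(a_S)$

Reward table for constraint (a1, a2), one entry per joint assignment: 10, 0, 0, 6

Reward table for constraint (a2, a3), one entry per joint assignment: 10, 0, 0, 6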


Different “levels” of teamwork possible

Complete solution is NP-hard
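As a minimal sketch of the framework, the total reward R(a) can be computed as a sum of per-constraint rewards and a complete solution found by brute force. The slide text does not say which joint values the four table entries 10, 0, 0, 6 belong to, so the mapping below (binary values, with (0,0) giving 10 and (1,1) giving 6) is an assumption, as are the names `PAIR_REWARD`, `total_reward` and `best_assignment`.

```python
from itertools import product

# Pairwise reward table used by both constraints on the slide (10, 0, 0, 6).
# ASSUMPTION: the slide text does not give the joint values for each entry;
# here (0,0)->10, (0,1)->0, (1,0)->0, (1,1)->6.
PAIR_REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}

# Constraints of the chain a1 - a2 - a3, as pairs of agent indices.
CONSTRAINTS = [(0, 1), (1, 2)]


def total_reward(assignment):
    """R(a): sum over constraints S of R_S(a_S)."""
    return sum(PAIR_REWARD[(assignment[i], assignment[j])] for i, j in CONSTRAINTS)


def best_assignment(num_agents=3, domain=(0, 1)):
    """Brute-force search over all joint assignments (exponential in agents)."""
    return max(product(domain, repeat=num_agents), key=total_reward)


if __name__ == "__main__":
    best = best_assignment()
    print(best, total_reward(best))
```

The exhaustive search enumerates every joint assignment, which is only feasible for tiny problems; that cost is what motivates incomplete algorithms with different levels of teamwork.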


D-CEE: Distributed Coordination of Exploration and Exploitation

• Environment may be unknown
• Maximize on-line reward over some number of rounds
– Exploration vs. Exploitation

• Demonstrated mobile ad-hoc network
– Simulation [Released] & Robots [Released Soon]


DCOP

Distributed Constraint Optimization Problem


DCOP → DCEE

Distributed Coordination of Exploration and Exploitation


DCEE Algorithm: SE-Optimistic (Will build upon later)

Chain: a1 -(99)- a2 -(50)- a3 -(75)- a4

Rewards on [1,200]

Agent's optimistic assumption: “If I move, I’d get R=200”


DCEE Algorithm: SE-Optimistic (Will build upon later)

Chain: a1 -(99)- a2 -(50)- a3 -(75)- a4

Rewards on [1,200]

“If I move, I’d gain”:
– a1: 101
– a2: 251
– a3: 275
– a4: 125

Explore or Exploit?
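The gains on this slide follow from assuming that every link of a moving agent would jump to the maximum reward of 200. The sketch below reproduces them for the four-agent chain. The rule in `movers` (an agent moves only if its gain strictly beats every neighbor's) is a simplifying assumption for illustration, and all function names are hypothetical.

```python
MAX_REWARD = 200  # upper end of the reward range [1, 200] on the slide

# Current rewards on the links of the chain a1 - a2 - a3 - a4.
LINK_REWARDS = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}
AGENTS = ["a1", "a2", "a3", "a4"]


def optimistic_gain(agent):
    """Gain the agent expects if each of its links rose to MAX_REWARD."""
    return sum(MAX_REWARD - r for link, r in LINK_REWARDS.items() if agent in link)


def movers(agents):
    """Agents whose gain strictly exceeds that of every neighbor.

    ASSUMPTION: simplified k = 1 selection rule; ties are not broken.
    """
    gains = {a: optimistic_gain(a) for a in agents}
    chosen = []
    for a in agents:
        neighbors = {b for link in LINK_REWARDS if a in link for b in link if b != a}
        if all(gains[a] > gains[b] for b in neighbors):
            chosen.append(a)
    return chosen


if __name__ == "__main__":
    print({a: optimistic_gain(a) for a in AGENTS})
    print(movers(AGENTS))
```

For example, a2 touches the links with rewards 99 and 50, so its optimistic gain is (200 - 99) + (200 - 50) = 251.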


Success! [ATSN-09][IJCAI-09]

• Both classes of (incomplete) algorithms
• Simulation and on Robots
– Ad hoc Wireless Network

[Figure: “Varying Topology”. x-axis: Chain, Density = 1/3, Density = 2/3, Full. y-axis: Scaled Cumulative Signal Strength (0 to 0.6). Series: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid. Improvement if performance > 0.]


k-Optimality

• Increased coordination – originally DCOP formulation
– In DCOP, increased k = increased team reward

• Find groups of agents to change variables
– Joint actions
– Neighbors of moving group cannot move

• Defines amount of teamwork

(Higher communication & computation overheads)


“k-Optimality” in DCEE

• k=1, 2, ...

o Groups of size k form, those with the most to gain move (change the value of their variable)

o A group can only move if no other agents in its neighborhood move


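Example: SE-Optimistic-2

Chain: a1 -(99)- a2 -(50)- a3 -(75)- a4

Rewards on [1,200]

“If I move, I’d gain”:
– a1: 101
– a2: 251
– a3: 275
– a4: 125

Best pairwise gain for each agent (sum of the two individual gains, minus the gain on the shared link, which would otherwise be counted twice; e.g. 200 - 99 = 101 for the link a1-a2):
– a1 (with a2): 101 + 251 - 101
– a2 (with a3): 251 + 275 - 150
– a3 (with a2): 275 + 251 - 150
– a4 (with a3): 125 + 275 - 125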
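The pairwise arithmetic in this example can be checked with a short sketch: a group's optimistic gain counts each link that touches the group exactly once, which is the same as adding the individual gains and subtracting the shared link. The function names are hypothetical, and restricting candidate pairs to linked agents is an assumption.

```python
MAX_REWARD = 200  # upper end of the reward range [1, 200] on the slide

# Current rewards on the links of the chain a1 - a2 - a3 - a4.
LINK_REWARDS = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}


def group_gain(group):
    """Optimistic gain of a group (a tuple of agents).

    Every link with at least one endpoint in the group is assumed to
    rise to MAX_REWARD, and each such link is counted once.
    """
    return sum(
        MAX_REWARD - r
        for link, r in LINK_REWARDS.items()
        if any(agent in group for agent in link)
    )


def best_pair():
    """Linked pair with the largest joint optimistic gain.

    ASSUMPTION: only pairs of neighboring agents are considered.
    """
    return max(LINK_REWARDS, key=group_gain)


if __name__ == "__main__":
    for pair in LINK_REWARDS:
        print(pair, group_gain(pair))
    print("moves:", best_pair())
```

Under these assumptions the pair (a2, a3) has the largest joint gain, 251 + 275 - 150 = 376.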


Sample coordination results

[Figure: Total Average Gain (y-axis, 0 to 40000) for k=1, k=2, k=3. Panels: Complete Graph and Chain Graph, each showing Omniscient and Optimistic. Omniscient uses artificially supplied rewards (DCOP).]

Omniscient: confirms DCOP result, as expected


Physical Implementation

• Create Robots
• Mobile ad-hoc Wireless Network


Confirms Team Uncertainty Penalty

• Averaged over 10 trials each
• Trend confirmed!
• (Huge standard error)

[Figure: Total Gain (y-axis) for Chain and Complete graphs.]


Problem with “k-Optimal”

• Unknown rewards
– Cannot know whether moving will increase reward!

• Define new term: L-Movement
– # of agents that can change variables per round
– Independent of exploration algorithm
– Graph dependent
– Alternate measure of teamwork


L-Movement

• Example: k = 1 algorithms
– L is the size of the maximum independent set of the graph (the largest of its maximal independent sets)
– NP-hard to calculate for a general graph
– Harder for higher k

• Consider ring & complete graphs, both with 5 vertices
– Ring graph: maximum independent set has size 2
– Complete graph: maximum independent set has size 1

• For k = 1
– L = 1 for a complete graph
– Size of the maximum independent set of a ring graph with n vertices is $\lfloor n/2 \rfloor$

General DCOP Analysis Tool?
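For small graphs the k = 1 value of L can be checked directly by brute force. The sketch below is illustrative only (function names are hypothetical, and the search is exponential, in line with the NP-hardness noted above); it computes the maximum independent set size for ring and complete graphs.

```python
from itertools import combinations


def ring_edges(n):
    """Edges of a ring (cycle) graph on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete_edges(n):
    """Edges of a complete graph on vertices 0..n-1."""
    return list(combinations(range(n), 2))


def max_independent_set_size(n, edges):
    """Size of the largest vertex set containing no edge (brute force)."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return size
    return 0


if __name__ == "__main__":
    print("ring, 5 vertices:", max_independent_set_size(5, ring_edges(5)))
    print("complete, 5 vertices:", max_independent_set_size(5, complete_edges(5)))
```

Trying sizes from largest to smallest lets the search stop at the first independent set it finds.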


Configuration Hypercube

• No (partial-)assignment is believed to be better than another
– wlog, agents can select next value when exploring

• Define configuration hypercube: C
– Each agent is a dimension
– C[x_1, ..., x_n] is the total reward when agent i takes value x_i
– Cannot be calculated without exploration
– Values drawn from known reward distribution

• Moving along an axis in hypercube → agent changing value

• Example: 3 agents (C is 3 dimensional)
– Changing from C[a, b, c] to C[a, b, c’]
– Agent A3 changes from c to c’
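A small sketch may make the hypercube concrete. It builds C for a three-agent chain as a dictionary from joint configurations to total reward. The reward model is an assumption made for illustration (each link's reward for each pair of values is drawn independently and uniformly from the integers 1..200), and `build_hypercube` is a hypothetical helper, not something from the slides.

```python
import random
from itertools import product


def build_hypercube(num_agents, num_values, links, rng):
    """Map every joint configuration to its total reward.

    ASSUMPTION: each link's reward for each pair of values is drawn
    independently and uniformly from the integers 1..200.
    """
    link_tables = {
        link: {pair: rng.randint(1, 200) for pair in product(range(num_values), repeat=2)}
        for link in links
    }
    return {
        config: sum(link_tables[(i, j)][(config[i], config[j])] for i, j in links)
        for config in product(range(num_values), repeat=num_agents)
    }


# Three agents in a chain, four candidate values each: C has 4**3 cells.
rng = random.Random(0)
C = build_hypercube(3, 4, [(0, 1), (1, 2)], rng)

# Moving along the third axis: only the third agent changes its value.
before, after = (0, 1, 2), (0, 1, 3)

if __name__ == "__main__":
    print(len(C), C[before], C[after])
```

In a real DCEE problem the cells are unknown until visited; the sketch fills them in up front only so that moves along an axis can be inspected.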


How many agents can move? (1/2) 

• In a ring graph with 5 nodes
o k = 1 : L = 2
o k = 2 : L = 3

• In a complete graph with 5 nodes
o k = 1 : L = 1
o k = 2 : L = 2


How many agents can move? (2/2)

Configuration C[x_1, ..., x_n] is reachable by an algorithm with movement L in s steps if and only if

$\sum_i x_i \le sL$ and $x_i \le s$ for all i

Example: C[2,2] reachable for L=1 if s ≥ 4
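The C[2,2] example can be reproduced with a short sketch. It assumes, as a reading of this slide rather than a confirmed statement of the result, that agents start at C[0, ..., 0], that coordinate x_i counts how many times agent i has moved, and that a configuration is reachable in s steps when the total number of moves is at most s times L and no agent needs more than s moves. The names `min_steps` and `reachable` are hypothetical.

```python
from math import ceil


def min_steps(config, L):
    """Fewest steps needed to reach `config` from the all-zero corner.

    ASSUMPTION: at most L agents move per step and each agent moves at
    most once per step, so both the total number of moves and the
    largest single coordinate bound the number of steps from below.
    """
    return max(ceil(sum(config) / L), max(config))


def reachable(config, L, s):
    """True if `config` can be reached in s steps under the assumed condition."""
    return min_steps(config, L) <= s


if __name__ == "__main__":
    for s in range(1, 6):
        print(s, reachable((2, 2), L=1, s=s), reachable((2, 2), L=2, s=s))
```

With L = 1 the two agents must take turns, so C[2,2] needs four steps; with L = 2 both can move in the same step and two steps suffice.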


L-Movement Experiments

For various DCEE problems, distributions, and L:

For steps s = 1...30:
1. Construct hypercube with s values per dimension
2. Find M, the max achievable reward in s steps, given L
3. Return average of 50 runs

Example: 2D Hypercube
o Only half reachable if L=1
o All locations reachable if L=2

[Figure: 2D hypercube with s values along each axis.]
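The experimental loop above can be sketched as follows. This is a toy version under stated assumptions, not the setup used for the reported results: cell rewards are drawn independently and uniformly from [0, 1), a cell counts as reachable when its coordinates sum to at most s times L (with s values per dimension no single coordinate can exceed s), and the problem sizes are kept small so that full enumeration stays cheap. All function names are hypothetical.

```python
import random
from itertools import product


def reachable_fraction(num_agents, s, L):
    """Share of hypercube cells reachable in s steps with movement L."""
    configs = list(product(range(s), repeat=num_agents))
    return sum(sum(c) <= s * L for c in configs) / len(configs)


def max_reachable_reward(num_agents, s, L, rng):
    """Best reward among reachable cells of one randomly drawn hypercube."""
    best = 0.0
    for config in product(range(s), repeat=num_agents):
        reward = rng.random()  # ASSUMPTION: i.i.d. uniform cell rewards
        if sum(config) <= s * L:
            best = max(best, reward)
    return best


def average_max(num_agents, s, L, runs=50, seed=0):
    """Average of the best reachable reward over `runs` random hypercubes."""
    rng = random.Random(seed)
    return sum(max_reachable_reward(num_agents, s, L, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for s in range(1, 11):
        print(s, average_max(2, s, L=1), average_max(2, s, L=2))
```

Because every cell is drawn whether or not it is reachable, two calls with the same seed see identical hypercubes, so the L = 1 and L = 2 averages are directly comparable.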


Restricting to L-Movement: Complete

L=1→2

Complete Graph
o k = 1 : L = 1
o k = 2 : L = 2

[Figure: Average Maximum Reward Discovered (y-axis).]


Restricting to L-Movement: Ring

L=2→3

Ring graph
o k = 1 : L = 2
o k = 2 : L = 3

[Figure: Average Maximum Reward Discovered (y-axis).]


1. Uniform distribution of rewards

2. 4 agents

3. Different normal distribution

[Figure: plots for each setting, shown for Complete and Ring graphs.]


k and L: 5-agent graphs

k value    Ring Graph, L value    Complete Graph, L value
1          2                      1
2          3                      2
3          3                      3
4          4                      4
5          5                      5

• Increasing k changes L less in ring than complete
• Configuration Hypercube is upper bound
– Posit a consistent negative effect
• Suggests why increasing k has different effects:
– Larger improvement in complete than ring for increasing k
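The table can be recomputed by brute force under one reading of the movement rule. The assumption here is that a set of agents may move in the same round when it splits into groups of at most k agents with no link between different groups, which is the same as requiring every connected component of the moving set to have at most k vertices. The function names are hypothetical and the search is exponential.

```python
from itertools import combinations


def ring_edges(n):
    """Edges of a ring (cycle) graph on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete_edges(n):
    """Edges of a complete graph on vertices 0..n-1."""
    return list(combinations(range(n), 2))


def component_sizes(vertices, edges):
    """Sizes of the connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    sizes = []
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for a, b in edges:
                for x, y in ((a, b), (b, a)):
                    if x == u and y in remaining:
                        remaining.remove(y)
                        stack.append(y)
        sizes.append(size)
    return sizes


def l_movement(n, edges, k):
    """Largest number of agents that can move in one round.

    ASSUMPTION: a moving set is valid when each connected component of
    the subgraph it induces has at most k vertices.
    """
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if max(component_sizes(subset, edges)) <= k:
                return size
    return 0


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, l_movement(5, ring_edges(5), k), l_movement(5, complete_edges(5), k))
```

Under this reading the sketch gives the same L values as the table for both 5-agent graphs.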


L-movement May Help Explain Team Uncertainty Penalty

• An algorithm with L = 2 will be able to explore more of C than an algorithm with L = 1
– Independent of exploration algorithm!
– Determined by k and graph structure
– C is upper bound – posit constant negative effect

• Any algorithm experiences diminishing returns as k increases
– Consistent with DCOP results

• L-movement difference between k = 1 algorithms and k = 2
– Larger difference in graphs with more agents
– For k = 1, L = 1 for a complete graph
– For k = 1, L increases with the number of vertices in a ring graph


Towards a Theoretic Understanding of DCEE

Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe

http://teamcore.usc.edu

Thank you
