Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
http://teamcore.usc.edu
Lafayette College
Forward Pointer
When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty
Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, & Milind Tambe
Wednesday, 8:30 – 10:30 Coordination and Cooperation 1
Teamwork: Foundational MAS Concept
• Joint actions improve outcome
• But increases communication & computation
• Over two decades of work

• This paper: increased teamwork can harm team
  – Even without considering communication & computation
  – Only considering team reward
  – Multiple algorithms, multiple settings
  – But why?
DCOPs: Distributed Constraint Optimization Problems

• Multiple domains
  – Meeting scheduling
  – Traffic light coordination
  – RoboCup soccer
  – Multi-agent plan coordination
  – Sensor networks
• Distributed
  – Robust to failure
  – Scalable
• (In)Complete
  – Quality bounds
DCOP Framework

R(a) = Σ_S R_S(a_S)

Constraint graph (chain): a1 – a2 – a3

Each constraint has a reward table over its two variables; in the example, each table (for constraints (a1, a2) and (a2, a3)) has the entries 10, 0, 0, and 6.
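As a sketch, the objective R(a) = Σ_S R_S(a_S) for the three-agent chain can be written in a few lines of Python. The binary domains and the row ordering of the reward tables below are assumptions for illustration; only the entry values (10, 0, 0, 6) come from the slide.

```python
# Each constraint S maps a joint assignment of its variables to a reward R_S.
# Domains are assumed binary; the (0,0)->10 ... (1,1)->6 layout is illustrative.
constraints = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
}

def team_reward(assignment):
    """Total reward R(a): sum of each constraint's reward on its variables."""
    return sum(
        table[tuple(assignment[v] for v in scope)]
        for scope, table in constraints.items()
    )

# Exhaustive search over all 2^3 assignments (this is what makes complete
# solutions NP-hard in general).
best = max(
    ({"a1": x, "a2": y, "a3": z} for x in (0, 1) for y in (0, 1) for z in (0, 1)),
    key=team_reward,
)
print(best, team_reward(best))  # {'a1': 0, 'a2': 0, 'a3': 0} 20
```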
Different “levels” of teamwork possible
Complete solution is NP-hard
D-CEE: Distributed Coordination of Exploration and Exploitation

• Environment may be unknown
• Maximize on-line reward over some number of rounds
  – Exploration vs. Exploitation
• Demonstrated on a mobile ad-hoc network
  – Simulation [Released] & Robots [Released Soon]
DCOP
Distributed Constraint Optimization Problem
DCOP → DCEE
Distributed Coordination of Exploration and Exploitation
DCEE Algorithm: SE-Optimistic (Will build upon later)

Chain of agents: a1 – a2 – a3 – a4, with current link rewards 99, 50, 75
Rewards on [1, 200]

Each agent optimistically assumes an unexplored value would yield R = 200 on each of its links:
• a1: “If I move, I’d gain 101”
• a2: “If I move, I’d gain 251”
• a3: “If I move, I’d gain 275”
• a4: “If I move, I’d gain 125”

Explore or Exploit?
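The per-agent optimistic estimate can be reproduced directly. This is a minimal sketch using the chain topology and link rewards from the slide:

```python
# SE-Optimistic: each agent assumes any unexplored value yields the maximum
# reward (200) on every link it participates in.
MAX_REWARD = 200

# Chain a1 - a2 - a3 - a4 with current link rewards from the slide.
links = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}

def optimistic_gain(agent):
    """Estimated gain if `agent` alone moves: (200 - r) summed over its links."""
    return sum(MAX_REWARD - r for scope, r in links.items() if agent in scope)

for agent in ("a1", "a2", "a3", "a4"):
    print(agent, optimistic_gain(agent))
# a1 101, a2 251, a3 275, a4 125 -- matching the slide, so a3 moves.
```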
Success! [ATSN-09] [IJCAI-09]

• Both classes of (incomplete) algorithms
• Simulation and on Robots
  – Ad hoc Wireless Network

(Improvement if performance > 0)
[Figure: Varying Topology – scaled cumulative signal strength (0–0.6) for SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, and BE-Rebid on Chain, Density = 1/3, Density = 2/3, and Full graphs]
k-Optimality

• Increased coordination – originally DCOP formulation
  – In DCOP, increased k = increased team reward
• Find groups of agents to change variables
  – Joint actions
  – Neighbors of moving group cannot move
• Defines amount of teamwork

(Higher communication & computation overheads)
“k-Optimality” in DCEE

• k = 1, 2, ...
  o Groups of size k form; those with the most to gain move (change the value of their variable)
  o A group can only move if no other agents in its neighborhood move
Example: SE-Optimistic-2

Same chain: a1 – a2 – a3 – a4 with link rewards 99, 50, 75; rewards on [1, 200]

Individual optimistic gains: a1: 101, a2: 251, a3: 275, a4: 125

A pair’s joint gain sums the two individual gains, then subtracts the shared link’s gain (counted once by each agent), e.g. 200 − 99 = 101 for link (a1, a2):
• (a1, a2): 101 + 251 − 101 = 251
• (a2, a3): 251 + 275 − 150 = 376
• (a3, a4): 275 + 125 − 125 = 275
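The pair computation can be checked in code. This sketch reuses the slide’s chain and link rewards:

```python
# SE-Optimistic-2 joint gains: sum the two agents' individual optimistic
# gains, then subtract the shared link's gain (200 - r), which both agents
# counted once each.
MAX_REWARD = 200
links = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}

def gain(agent):
    """Individual optimistic gain, as in SE-Optimistic."""
    return sum(MAX_REWARD - r for scope, r in links.items() if agent in scope)

def joint_gain(a, b):
    """Pairwise gain with the double-counted shared link removed."""
    shared = sum(MAX_REWARD - r for scope, r in links.items()
                 if a in scope and b in scope)
    return gain(a) + gain(b) - shared

print(joint_gain("a1", "a2"))  # 101 + 251 - 101 = 251
print(joint_gain("a2", "a3"))  # 251 + 275 - 150 = 376
print(joint_gain("a3", "a4"))  # 275 + 125 - 125 = 275
```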
Sample coordination results

[Figure: total average gain for Omniscient and Optimistic algorithms with k = 1, 2, 3, on complete and chain graphs; artificially supplied rewards (DCOP)]

Omniscient: confirms DCOP result, as expected
Optimistic: higher k can lower gain (?!)
Physical Implementation

• Create Robots
• Mobile ad-hoc Wireless Network
Confirms Team Uncertainty Penalty

• Averaged over 10 trials each
• Trend confirmed!
• (Huge standard error)

[Figure: total gain on chain and complete graphs from the robot experiments]
Problem with “k-Optimal”

• Unknown rewards
  – cannot know if moving will increase reward!
• Define new term: L-Movement
  – # of agents that can change variables per round
  – Independent of exploration algorithm
  – Graph dependent
  – Alternate measure of teamwork

L-Movement

• Example: k = 1 algorithms
  – L is the size of the largest maximal independent set of the graph
  – NP-hard to calculate for a general graph
  – harder for higher k
• Consider ring & complete graphs, both with 5 vertices
  – ring graph: maximum independent set has size 2
  – complete graph: maximum independent set has size 1
• For k = 1
  – L = 1 for a complete graph
  – for an n-vertex ring graph, the maximum independent set has size ⌊n/2⌋
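The claimed L values for k = 1 can be verified by brute force. This is fine for 5-node graphs, though, as the slide notes, the problem is NP-hard in general:

```python
# Brute-force maximum independent set: for k = 1, L equals this size.
# Expect 2 for a 5-node ring (floor(5/2)) and 1 for a 5-node complete graph.
from itertools import combinations

def max_independent_set_size(n, edges):
    """Largest set of nodes in {0..n-1} with no edge between any two."""
    edge_set = set(map(frozenset, edges))
    for size in range(n, 0, -1):  # try largest sets first
        for nodes in combinations(range(n), size):
            if all(frozenset(p) not in edge_set for p in combinations(nodes, 2)):
                return size
    return 0

n = 5
ring = [(i, (i + 1) % n) for i in range(n)]
complete = list(combinations(range(n), 2))
print(max_independent_set_size(n, ring))      # 2
print(max_independent_set_size(n, complete))  # 1
```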
General DCOP Analysis Tool?
Configuration Hypercube

• No (partial) assignment is believed to be better than another
• WLOG, agents can select their next value when exploring

Define configuration hypercube C:
• Each agent is a dimension
• C[v1, ..., vn] is the total reward when agent i takes value vi
• Cannot be calculated without exploration
• Values drawn from known reward distribution

Moving along an axis in the hypercube ↔ an agent changing its value

Example: 3 agents (C is 3-dimensional)
• Changing from C[a, b, c] to C[a, b, c′]: agent A3 changes from c to c′
How many agents can move? (1/2)

• In a ring graph with 5 nodes
  o k = 1 : L = 2
  o k = 2 : L = 3
• In a complete graph with 5 nodes
  o k = 1 : L = 1
  o k = 2 : L = 2
How many agents can move? (2/2)

Configuration C[x1, ..., xn] (where xi is the number of value changes agent i has made) is reachable by an algorithm with movement L in s steps if and only if

Σ_i x_i ≤ s · L   and   max_i x_i ≤ s

Example: C[2, 2] reachable for L = 1 if s ≥ 4
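The reachability test can be written as a one-line predicate. A caveat: the two inequalities used below are a reconstruction consistent with the C[2,2] example (the total number of value changes must fit into s rounds of at most L movers, and no single agent can change more than once per round):

```python
# Reachability of configuration C[x1, ..., xn] in s steps with movement L,
# where x_i is the number of value changes agent i has made.
def reachable(x, L, s):
    """True iff x fits in s rounds of at most L movers, one change each."""
    return sum(x) <= s * L and max(x) <= s

print(reachable([2, 2], L=1, s=4))  # True: 4 total moves fit in 4 rounds
print(reachable([2, 2], L=1, s=3))  # False: only 3 moves possible
print(reachable([2, 2], L=2, s=2))  # True: both agents move each round
```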
L-Movement Experiments

For various DCEE problems, distributions, and L:
For steps s = 1 ... 30:
1. Construct hypercube with s values per dimension
2. Find M, the max achievable reward in s steps, given L
3. Return average of 50 runs

Example: 2D hypercube (each axis of length s)
o Only half reachable if L = 1
o All locations reachable if L = 2
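The experiment loop can be sketched as follows. The uniform reward distribution, two-agent hypercube, and exhaustive enumeration of cells are simplifying assumptions for illustration, not the exact experimental setup:

```python
# Sketch of one L-movement experiment: build a hypercube of i.i.d. rewards,
# then find the best reward among cells reachable in s steps with movement L.
import itertools
import random

def max_reachable_reward(n_agents, L, s, rng):
    # 1. Hypercube with s values per dimension, rewards drawn i.i.d. uniform.
    cube = {c: rng.random() for c in itertools.product(range(s), repeat=n_agents)}
    # 2. Max reward over cells reachable in s steps given movement L
    #    (cell index = number of value changes that agent has made).
    return max(r for c, r in cube.items()
               if sum(c) <= s * L and max(c) <= s)

# 3. Average over 50 runs, as on the slide.
rng = random.Random(0)
for L in (1, 2):
    avg = sum(max_reachable_reward(2, L, 10, rng) for _ in range(50)) / 50
    print(L, round(avg, 3))
```

With the same random seed, the L = 1 reachable set is a subset of the L = 2 one, so the discovered maximum can only go up as L increases.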
Restricting to L-Movement: Complete

Complete graph: k = 1 : L = 1; k = 2 : L = 2

[Figure: average maximum reward discovered vs. steps, showing the jump from L = 1 to L = 2]
Restricting to L-Movement: Ring

Ring graph: k = 1 : L = 2; k = 2 : L = 3

[Figure: average maximum reward discovered vs. steps, showing the jump from L = 2 to L = 3]
[Figures (complete and ring graphs): 1. uniform distribution of rewards; 2. 4 agents; 3. different normal distribution]
k and L: 5-agent graphs

k    Ring graph, L    Complete graph, L
1        2                 1
2        3                 2
3        3                 3
4        4                 4
5        5                 5
• Increasing k changes L less in ring than complete
• Configuration Hypercube is upper bound
  – Posit a consistent negative effect
• Suggests why increasing k has different effects:
  – Larger improvement in complete than ring for increasing k
L-movement May Help Explain Team Uncertainty Penalty

• An algorithm with L = 2 can explore more of C than one with L = 1
  – Independent of exploration algorithm!
  – Determined by k and graph structure
  – C is an upper bound – posit a constant negative effect
• Any algorithm experiences diminishing returns as k increases
  – Consistent with DCOP results
• L-movement difference between k = 1 and k = 2 algorithms
  – Larger difference in graphs with more agents
  – For k = 1, L = 1 for a complete graph
  – For k = 1, L increases with the number of vertices in a ring graph
Towards a Theoretic Understanding of DCEE
Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe
http://teamcore.usc.edu
Thank you