Autonomous Target Assignment:
A Game Theoretical Formulation
Gurdal Arslan & Jeff Shamma
Mechanical and Aerospace Engineering, UCLA
AFOSR / MURI
2
Setup for Target Assignment Problem
Objective: max over assignments of Global Utility ( assignment )
3
Global Utility Example
Global Utility = $\sum_j$ ( Utility generated at target j )
Utility generated at target j, for example: E [ total value of ( destroyed target – vehicles lost ) ]
[Figure labels: "No engagement here", "Independent engagements"]
4
Joint Optimization
• Can be formulated as an integer programming problem
- Computationally hard
- Relaxation techniques available for suboptimal solutions
• Decentralized implementation
- Requires global information
- Agreement issues can arise
Global Utility : $\max_a U_g(a)$
Assignment Profile : $a = (a_1, \ldots, a_n)$
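To illustrate why joint optimization is computationally hard, here is a minimal brute-force sketch. It assumes a hypothetical global utility in which target j has a value and is destroyed independently by each assigned vehicle with a given probability, and it ignores vehicle losses. The names `VALUES`, `PKILL`, `global_utility`, and `brute_force_optimum` and all numbers are made up for illustration and are not from the slides.

```python
from itertools import product

# Hypothetical data (not from the slides): target values and kill probabilities.
VALUES = [10.0, 6.0, 4.0]                 # value of each target
PKILL = [[0.8, 0.5, 0.3],                 # PKILL[i][j]: vehicle i against target j
         [0.6, 0.7, 0.4],
         [0.5, 0.5, 0.9]]

def global_utility(assignment):
    """Expected destroyed value, assuming independent engagements."""
    total = 0.0
    for j, value in enumerate(VALUES):
        p_survive = 1.0
        for i, target in enumerate(assignment):
            if target == j:
                p_survive *= 1.0 - PKILL[i][j]
        total += value * (1.0 - p_survive)
    return total

def brute_force_optimum():
    """Enumerate all m**n assignment profiles; feasible only for tiny problems."""
    n, m = len(PKILL), len(VALUES)
    return max(product(range(m), repeat=n), key=global_utility)

best = brute_force_optimum()
print(best, global_utility(best))
```

With n vehicles and m targets the search visits m**n profiles, so this approach stops being practical well before the 40-vehicle, 40-target scale.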
5
Game Theory Formulation
• Vehicles are self-interested players with private utilities
• A vehicle need not know other vehicles’ utilities.
• Individual utilities depend on local information only.
• Vehicles negotiate an agreeable assignment.
Private utility of vehicle # i : $U_i(a)$
6
Autonomous Target Assignment Problem
Design
- Vehicle utilities $U_i(a)$
- Negotiation mechanisms
so that
vehicles agree on an assignment with high Global Utility
using
- low computational power
- low inter-vehicle communication
7
Agreeable Assignment - Nash Equilibrium
• An assignment is a ( pure ) Nash equilibrium if no player has an incentive to unilaterally deviate from it.
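• Example ( rows : player 1's action, columns : player 2's action, entries : payoff to player 1 , payoff to player 2 ) :
          1      2      3
    1    2,1    0,0    0,0
    2    0,0    1,2    0,0
    3    0,0    0,0    3,3
• Pure Nash equilibria : (1,1), (2,2), (3,3)
• Mixed Nash equilibria : ([ .54 .27 .18] , [ .27 .54 .18])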
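The equilibria of the 3×3 example can be checked mechanically. The sketch below tests every cell for unilateral deviations and then confirms that the mixed profile leaves each player indifferent between actions. The fractions 6/11, 3/11, and 2/11 are an assumed exact form of the rounded values on the slide, and the function names are illustrative.

```python
from itertools import product

# PAYOFF[r][c] = (payoff to player 1, payoff to player 2), taken from the example.
PAYOFF = [[(2, 1), (0, 0), (0, 0)],
          [(0, 0), (1, 2), (0, 0)],
          [(0, 0), (0, 0), (3, 3)]]

def is_pure_nash(r, c):
    """True if neither player gains by deviating unilaterally from (r, c)."""
    row_ok = all(PAYOFF[r][c][0] >= PAYOFF[alt][c][0] for alt in range(3))
    col_ok = all(PAYOFF[r][c][1] >= PAYOFF[r][alt][1] for alt in range(3))
    return row_ok and col_ok

# Report equilibria with the 1-based action labels used on the slide.
pure_nash = [(r + 1, c + 1)
             for r, c in product(range(3), repeat=2) if is_pure_nash(r, c)]

def expected_payoffs(p, q):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    row = [sum(q[c] * PAYOFF[r][c][0] for c in range(3)) for r in range(3)]
    col = [sum(p[r] * PAYOFF[r][c][1] for r in range(3)) for c in range(3)]
    return row, col

# Assumed exact fractions behind the rounded mixed equilibrium.
p = [6 / 11, 3 / 11, 2 / 11]
q = [3 / 11, 6 / 11, 2 / 11]
row, col = expected_payoffs(p, q)
print(pure_nash, row, col)
```

In a fully mixed equilibrium every action must earn the same expected payoff, which is what the equal entries of `row` and `col` show.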
8
Utility Design
• Vehicle utilities should be aligned with Global Utility
• Ideal alignment :
– Only globally optimal assignments should be agreeable
– Not possible without computing globally optimal assignments
• Relaxed alignment ( factoredness in Wolpert et al. 2000 ) :
$U_i(a_1,\ldots,\tilde{a}_i,\ldots,a_n) \ge U_i(a_1,\ldots,a_i,\ldots,a_n) \iff U_g(a_1,\ldots,\tilde{a}_i,\ldots,a_n) \ge U_g(a_1,\ldots,a_i,\ldots,a_n)$
– Globally optimum assignment is always agreeable (pure Nash)
9
Aligned Utilities - Team Play
• For every vehicle, $U_i(a) = U_g(a)$
• Example :
          1        2
    1   2 , 2    0 , 0
    2   0 , 0    1 , 1    ( suboptimal Nash )
• Not localized
- Each vehicle needs global information
- Low Signal-to-Noise-Ratio (Wolpert et al. 2000)
10
Aligned Utilities - Wonderful Life Utility (Wolpert et al. 2000)
• Marginal contribution of vehicle # i to Global Utility, i.e.,
$U_i(a) = U_g(a) - U_g(a : a_i = \text{“0”})$, where “0” denotes no engagement
• Localized :
- Equal to the marginal contribution to engagements within range
- Signal-to-Noise-Ratio is maximized
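The Wonderful Life Utility can be sketched in a few lines once a global utility is fixed. The example assumes a hypothetical global utility (independent engagements, vehicle losses ignored). `VALUES`, `PKILL`, `NO_ENGAGEMENT`, and the function names are illustrative and not from the slides.

```python
# Hypothetical model (an assumption, not from the slides): target j has value
# VALUES[j]; vehicle i destroys it with probability PKILL[i][j]; engagements
# are independent and vehicle losses are ignored.
VALUES = [10.0, 6.0]
PKILL = [[0.8, 0.5],
         [0.6, 0.7],
         [0.5, 0.5]]
NO_ENGAGEMENT = None   # plays the role of the "0" action

def global_utility(assignment):
    """Expected destroyed value summed over targets."""
    total = 0.0
    for j, value in enumerate(VALUES):
        p_survive = 1.0
        for i, target in enumerate(assignment):
            if target == j:
                p_survive *= 1.0 - PKILL[i][j]
        total += value * (1.0 - p_survive)
    return total

def wonderful_life_utility(i, assignment):
    """U_i(a) = U_g(a) - U_g(a with a_i replaced by no engagement)."""
    without_i = list(assignment)
    without_i[i] = NO_ENGAGEMENT
    return global_utility(assignment) - global_utility(without_i)

a = (0, 0, 1)   # vehicles 0 and 1 engage target 0, vehicle 2 engages target 1
print([wonderful_life_utility(i, a) for i in range(3)])
```

Because the subtracted term does not depend on vehicle i's own choice, a unilateral change of target changes `wonderful_life_utility` by exactly the same amount as `global_utility`. That is the alignment property this utility is designed to have.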
11
Aligned Utilities - Wonderful Life Utility (Wolpert et al. 2000)
• Aligned :
$U_i(a_1,\ldots,\tilde{a}_i,\ldots,a_n) \ge U_i(a_1,\ldots,a_i,\ldots,a_n) \iff U_g(a_1,\ldots,\tilde{a}_i,\ldots,a_n) \ge U_g(a_1,\ldots,a_i,\ldots,a_n)$
• Leads to a Potential Game with potential $U_g(a)$
• Convergent negotiation mechanisms for potential games
12
A Misaligned Utility Structure
• Equally Shared Utilities :
$U_i(a) = \dfrac{\text{utility generated by engagement with } a_i}{\text{number of participating vehicles}}$
• Hence $U_g(a) = \sum_i U_i(a)$
• Global optimum may not be Nash agreeable
• A pure Nash agreeable assignment may not exist at all !
13
Negotiation Mechanisms
• At step k, vehicle # i proposes a target $a_i(k)$
based on the past proposal profiles $a(1), \ldots, a(k-1)$
• Is there a reasonable negotiation mechanism that leads to a Nash equilibrium ?
• Adopt learning methods in repeated games
14
Fictitious Play (FP)
• Empirical frequencies of past proposals
$q_i(k) = ( 20\% , 30\% , \ldots )$ over ( target # 1 , target # 2 , ... )
• Vehicle # i proposes the best response to the other vehicles' empirical frequencies
$a_i(k) = \arg\max_{a_i} U_i\big( q_1(k-1), \ldots, a_i, \ldots, q_n(k-1) \big)$
• Initial proposals are random
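A minimal sketch of fictitious play for two vehicles, assuming the 2×2 common-payoff game from the team-play example. Initial proposals are passed in explicitly rather than drawn at random so that runs are reproducible, and ties are broken in favor of the lower-numbered action. Both are choices made for this illustration only.

```python
# Common payoff of the 2x2 coordination game: both players receive U[a0][a1].
U = [[2.0, 0.0],
     [0.0, 1.0]]

def best_response(opponent_freq, player):
    """Action maximizing expected payoff against the opponent's empirical frequencies."""
    def expected(action):
        if player == 0:
            return sum(opponent_freq[b] * U[action][b] for b in range(2))
        return sum(opponent_freq[b] * U[b][action] for b in range(2))
    return max(range(2), key=expected)   # ties go to the lower-numbered action

def fictitious_play(initial, steps):
    """Run FP from an initial proposal profile; return the profiles a(1), ..., a(steps)."""
    counts = [[0, 0], [0, 0]]            # counts[i][a]: how often player i proposed a
    profile = tuple(initial)
    history = []
    for _ in range(steps):
        history.append(profile)
        for i, a in enumerate(profile):
            counts[i][a] += 1
        freqs = [[c / sum(row) for c in row] for row in counts]
        profile = (best_response(freqs[1], 0), best_response(freqs[0], 1))
    return history

print(fictitious_play((0, 1), 5))
print(fictitious_play((1, 1), 5))
```

Actions are 0-indexed here, so the profile `(1, 1)` corresponds to both vehicles proposing target # 2.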
15
Convergence of FP in Potential Games
• Convergent to a (possibly mixed) Nash equilibrium
• Believed to be generically convergent to a pure Nash
• May be trapped at a suboptimal Nash
          1        2
    1   2 , 2    0 , 0
    2   0 , 0    1 , 1
$a(1) = ( 2 , 2 ) \Rightarrow a(k) = ( 2 , 2 )$ for all $k \ge 1$
16
Stochastic FP
• Randomness may help avoid a suboptimal assignment
• Positive probability of convergence to any pure Nash equilibrium in almost all potential games
• Conjecture :
Stochastic FP converges to one of the pure Nash equilibria almost surely, in almost all potential games.
Proposal rule : $a_i(k) \sim \arg\max_{p_i} \big[ U_i\big( q_1(k-1), \ldots, p_i, \ldots, q_n(k-1) \big) + H(p_i) \big]$
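One common way to randomize a best response is to add an entropy term to the expected utility. Assuming H is Shannon entropy weighted by a hypothetical temperature `TAU` (the slides do not state either), the maximizing distribution is a softmax of the expected payoffs, and each vehicle samples its proposal from it. The sketch below uses a 2×2 common-payoff coordination game; all names are illustrative.

```python
import math
import random

U = [[2.0, 0.0],
     [0.0, 1.0]]   # common payoff: both players receive U[a0][a1]
TAU = 0.1          # hypothetical weight on the entropy term

def smoothed_best_response(payoffs, tau=TAU):
    """Maximizer of p . payoffs + tau * H(p) for Shannon entropy H: a softmax."""
    top = max(payoffs)                                   # subtract max for stability
    weights = [math.exp((u - top) / tau) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def stochastic_fp(steps, seed=0):
    """Run stochastic FP and return the final empirical frequencies of both players."""
    rng = random.Random(seed)
    counts = [[0, 0], [0, 0]]
    profile = (rng.randrange(2), rng.randrange(2))       # random initial proposals
    for _ in range(steps):
        for i, a in enumerate(profile):
            counts[i][a] += 1
        q = [[c / sum(row) for c in row] for row in counts]
        payoff0 = [sum(q[1][b] * U[a][b] for b in range(2)) for a in range(2)]
        payoff1 = [sum(q[0][b] * U[b][a] for b in range(2)) for a in range(2)]
        profile = tuple(
            rng.choices(range(2), weights=smoothed_best_response(p))[0]
            for p in (payoff0, payoff1)
        )
    return q

print(stochastic_fp(200))
```

A smaller `TAU` makes the rule closer to the deterministic best response, while a larger one makes proposals closer to uniform.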
17
Stochastic FP with Utility Measurements
• Empirical average payoff for past proposals
$V_i(k) = \big( V_i^1(k), V_i^2(k), \ldots \big)^T$ over ( target # 1 , target # 2 , ... )
• Propose the target with largest empirical average payoff
$a_i(k) \sim \arg\max_{p_i} \big[ p_i^T V_i(k) + H(p_i) \big]$
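A rough sketch of the measurement-based variant follows. It assumes that each vehicle records only the utility it measured after each of its own proposals and averages those measurements per target. The softmax randomization, the temperature `TAU`, the zero default for untried targets, and the class and function names are all assumptions made for illustration.

```python
import math
import random

TAU = 0.1  # hypothetical randomization weight

def softmax(values, tau=TAU):
    """Probabilities proportional to exp(value / tau), computed stably."""
    top = max(values)
    weights = [math.exp((v - top) / tau) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

class PayoffLearner:
    """Tracks the average measured utility of each target this vehicle has proposed."""

    def __init__(self, n_targets):
        self.totals = [0.0] * n_targets
        self.counts = [0] * n_targets

    def averages(self):
        # Targets never proposed default to 0.0 (an arbitrary choice for this sketch).
        return [t / c if c else 0.0 for t, c in zip(self.totals, self.counts)]

    def propose(self, rng):
        probs = softmax(self.averages())
        return rng.choices(range(len(probs)), weights=probs)[0]

    def observe(self, target, payoff):
        self.totals[target] += payoff
        self.counts[target] += 1

def run(steps, seed=0):
    """Two learners repeatedly play a 2x2 common-payoff coordination game."""
    U = [[2.0, 0.0], [0.0, 1.0]]
    rng = random.Random(seed)
    learners = [PayoffLearner(2), PayoffLearner(2)]
    for _ in range(steps):
        a0, a1 = (learner.propose(rng) for learner in learners)
        payoff = U[a0][a1]            # each vehicle measures only its own utility
        learners[0].observe(a0, payoff)
        learners[1].observe(a1, payoff)
    return learners

print([learner.averages() for learner in run(50)])
```

Neither learner stores the other's proposals. Each one keeps only a running total and a count per target.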
18
Stochastic FP with Utility Measurements
• Advantage :
Don’t need to keep track of empirical frequencies
• Conjecture :
Converges to one of the pure Nash equilibria almost surely, in almost all potential games.
• Convergence may be slower than FP
19
Near Optimum Performance
• Example:
40 uniform weapons negotiate over 40 non-uniform targets
20
Simulation Environment
• Consists of entities and a battlefield
21
Entity Types
• Entities can be of different types
• Each entity type represented by a data structure
• For example:
type = 'uav'
side = 'blue'
attributes = {'radius' 'pkill' 'SARscanrate' … }
states = {'health' 'location' 'heading' 'nstores' … }
routine = 'uav_rule';
22
Entity Rules
• How an entity type interacts and performs tasks
• For example, routine for 'uav' type :
• If UAV is alive
– Find all SAMs within radius
– If no missile is launched & UAV has ammo
Target closest live enemy
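The rule above could be sketched as follows, assuming entities are plain dictionaries with 2-D locations. The field names `health`, `location`, `radius`, and `nstores` follow the entity description, while `missile_launched`, the `'sam'` type string, and the convention that a live entity has positive health are assumptions made for this illustration.

```python
import math

def uav_rule(uav, entities):
    """Return the SAM the UAV should target this step, or None if it should not fire."""
    if uav["health"] <= 0:                      # UAV is not alive
        return None
    # Live enemy SAMs within the UAV's radius.
    in_range = [
        e for e in entities
        if e["type"] == "sam"
        and e["side"] != uav["side"]
        and e["health"] > 0
        and math.dist(uav["location"], e["location"]) <= uav["radius"]
    ]
    # Fire only if no missile is already launched and ammunition remains.
    if uav.get("missile_launched") or uav["nstores"] <= 0 or not in_range:
        return None
    return min(in_range, key=lambda e: math.dist(uav["location"], e["location"]))

uav = {"type": "uav", "side": "blue", "health": 1, "location": (0.0, 0.0),
       "radius": 5.0, "nstores": 2}
sams = [{"type": "sam", "side": "red", "health": 1, "location": (3.0, 0.0)},
        {"type": "sam", "side": "red", "health": 1, "location": (1.0, 0.0)}]
print(uav_rule(uav, sams))
```

Keeping the rule as a function of the entity and the current entity list mirrors the one-routine-per-type structure, so a new entity type only needs its own routine.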
23
State Space
• State space consists of
- states of all entities
- environment states : number of iterations left, etc.
• Simulation state is updated essentially based on the entity rules
• At each step, simulation state is displayed using visualization tools.
24
Simulation
• Scalable: One routine for each entity type
• Easy to modify, introduce new types, rules, etc.
25
Extensions & Issues
• Investigate other negotiation mechanisms
- Gradient based mechanisms
- Replicator dynamics
- Finite Memory
- Asynchronous mechanisms
• Explore applications
- Traffic management
- Routing in communication networks