Strategic and Tactical Planning - The Rachel and Selim Benin

Strategic and Tactical Planning
A glimpse of the future

Zinovi Rabinovich
Jeffrey S. Rosenschein

School of Engineering and Computer Sciences
Hebrew University in Jerusalem

Strategic and Tactical Planning – p.1/23

Agenda

Planning - the common view
Blocks world example
Driving a car
Strategic vs. Tactical Planning
’Let it be’ plans
Tactics - proposed solution
Potential applicability


Planning - AI inheritance

Recall the most classical AI view of the world: State Oriented Domains (SOD) [3]:

A world is perceived to be in one of a preset group of states
and a set of actions is provided to shift the world from one state to another

Notice that it is a very nice kind of world:
World’s state is known
Actions perform in exactly the prescribed way
Closed world assumption holds

















Planning - AI inheritance (cont)

In SODs, the aim of planning is to move the world from one state to another in some optimal way.

Thus, a plan is a sequence of actions that brings the world from the current state to a certain target state.
The usual measure of optimality of such plans is the number of steps (actions) prescribed by the plan to reach the desired state.





Blocks World

For example consider the Blocks World domain...

Plan:
Move black from 2 onto white at 1
Move gray from 3 onto table at 2
Move black from 1 onto gray at 3
Move white from 1 onto gray at 2
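Since the usual optimality measure is the number of steps, a shortest plan in a small SOD can be found by breadth-first search over states. The sketch below is only an illustration: the three-position layout, the start and goal configurations, and the names `successors` and `shortest_plan` are assumptions, not the configuration from the slide's figure.

```python
from collections import deque

def successors(state):
    """Yield (action, next_state): move the top block of one position onto another."""
    for i, src in enumerate(state):
        if not src:
            continue  # nothing to pick up at this position
        block = src[-1]
        for j, dst in enumerate(state):
            if i == j:
                continue
            target = dst[-1] if dst else "table"
            nxt = list(state)
            nxt[i] = src[:-1]
            nxt[j] = dst + (block,)
            yield (f"Move {block} from {i + 1} onto {target} at {j + 1}", tuple(nxt))

def shortest_plan(start, goal):
    """Breadth-first search: returns a plan with the fewest actions, or None."""
    frontier = deque([start])
    parent = {start: None}  # state -> (action, previous state)
    while frontier:
        state = frontier.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                action, state = parent[state]
                plan.append(action)
            return plan[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (action, state)
                frontier.append(nxt)
    return None

# Hypothetical instance: each position is a stack listed bottom to top.
start = (("white", "black"), (), ("gray",))
goal = ((), (), ("gray", "white", "black"))

for step in shortest_plan(start, goal):
    print(step)
```

Breadth-first search is used here because every action has the same cost, so the first time the goal is reached the plan is guaranteed to have the minimal number of steps.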

















Rigidity of approach

However, all the major assumptions of SOD cease to hold as we attempt to move toward a more realistic setup.

Our sensory system is clogged by noise and is subject to aliasing
Actions are seldom accurate and tend to have side effects
The world keeps ’spinning’ without any intervention from us






Planning solutions to world dynamics

Conditional plans [2, 1] came to help deal with contingencies of plan execution,
as well as Partial (Global) Planning, which uses multiple participating entities and develops the plan on the fly,
and many others: via negotiation, mixed-initiative, etc.

... but they all assume that we’d like to keep the world state under control, or at least within certain descriptive bounds...




Driving a car

Imagine a car running on a single-lane road, e.g. a Formula One race car

What is the set of all possible states?
  Set of all possible margins from road edge
  We can discretize the domain for simplicity

What is the subset of all states we’d like to be in?
  Those in the middle of the road, far-far away from the edges, raw ground and pedestrians
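The discretized state set and the desired subset can be sketched in a few lines. This is a minimal illustration only; the road width, the number of bins, the safety margin and the function names are all hypothetical choices.

```python
def discretize_margin(margin_m, road_width_m, n_bins):
    """Map the distance from the left road edge (metres) to a bin index 0..n_bins-1."""
    if not 0.0 <= margin_m <= road_width_m:
        raise ValueError("car is off the road")
    index = int(margin_m / road_width_m * n_bins)
    return min(index, n_bins - 1)  # the right edge itself falls into the last bin

def desired_states(n_bins, safety_bins):
    """Bins that are at least `safety_bins` bins away from either edge."""
    return set(range(safety_bins, n_bins - safety_bins))

ROAD_WIDTH_M = 4.0  # hypothetical lane width
N_BINS = 8          # hypothetical resolution

goal = desired_states(N_BINS, safety_bins=3)
state = discretize_margin(2.0, ROAD_WIDTH_M, N_BINS)
print(state, state in goal)
```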




Driving a car (cont)

Consider the following plan: push the car into (roughly) the middle of the road and leave it there

The plan achieves the presence of a car in the middle of the road
The plan works almost always (we can’t eradicate the case where a 16-ton carrier will propel the car into oblivion) and does not need revision

Though the plan was correct, we didn’t mean for the car to stay stationary...
So what happened?







Driving a (moving) car

There are actually two different reasoning levels for driving a (moving) car:

The reason for being in that car - desire to trace a trajectory from point A to point B over time
The reason for wheels adjustment - forcing the car to stay on a given trajectory over time

We do not have stationary goal margin(s) to the road edges. Rather, we’d like the margin to develop according to a certain design.

















Strategic vs. Tactical Planning

Planning (especially in a dynamic environment) is (roughly) a two-level construction:

Strategic - high level - transforming system goals into desired system dynamics
Tactical - low level - building a sequence of actions that attempt to force the system into the desired form of dynamics


Strategic/Tactical Loop

The two levels of the hierarchy create a relentless flow of planning:

Given the global goal and the previous success in following strategic directives, update and formulate an alternative strategy
Given a strategy, provide tactical (= implementation) support and evaluation of feasibility


Compare:

In classical planning the tactical level is degenerate

Even in conditional planning we allow the system to develop freely and simply describe the desired continuation for each contingency

A strong ’tactical’ level allows re-planning procedures (should they be needed) to be part of a standard planning loop

A previously exceptional, radical, potentially fatal (strategic) plan failure now becomes a common, mild, recoverable situation, part of normal activity







Formalism

To formalize the tactical level operations we use a POMDP-like description:

Given a system described by:
A set of possible states $S$
Possible control actions $A$
System transition dynamics $T(s' \mid a, s)$
An initial state $s_0 \in S$
Set of possible observations $O$
Observation probabilities $p(o \mid s, a)$
Strategic target $q(s' \mid s)$

Find the sequence of actions such that the observed system dynamics $p_t(s' \mid s)$ would be as close as possible to $q$ - minimize tactical distance
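The ingredients listed above can be collected into a single data structure. The following is a minimal sketch under assumptions: the class name `TacticalProblem`, the dictionary-based encoding, and the two-state lane-keeping numbers are all hypothetical and serve only to make the description concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # a discrete probability distribution

@dataclass
class TacticalProblem:
    states: List[str]
    actions: List[str]
    transition: Dict[Tuple[str, str], Dist]   # (s, a) -> distribution over s'
    initial_state: str
    observations: List[str]
    observation: Dict[Tuple[str, str], Dist]  # (s, a) -> distribution over o
    target: Dict[str, Dist]                   # s -> desired distribution over s'

    def validate(self, tol=1e-9):
        """Check that every conditional distribution sums to 1."""
        for table in (self.transition, self.observation, self.target):
            for key, dist in table.items():
                if abs(sum(dist.values()) - 1.0) > tol:
                    raise ValueError(f"row {key!r} does not sum to 1")
        if self.initial_state not in self.states:
            raise ValueError("unknown initial state")
        return True

# Hypothetical lane-keeping instance with two states and two actions.
problem = TacticalProblem(
    states=["centre", "edge"],
    actions=["steer", "hold"],
    transition={
        ("centre", "steer"): {"centre": 0.9, "edge": 0.1},
        ("centre", "hold"): {"centre": 0.6, "edge": 0.4},
        ("edge", "steer"): {"centre": 0.8, "edge": 0.2},
        ("edge", "hold"): {"centre": 0.1, "edge": 0.9},
    },
    initial_state="centre",
    observations=["far", "near"],
    observation={
        (s, a): ({"far": 0.9, "near": 0.1} if s == "centre" else {"far": 0.3, "near": 0.7})
        for s in ["centre", "edge"] for a in ["steer", "hold"]
    },
    target={s: {"centre": 0.9, "edge": 0.1} for s in ["centre", "edge"]},
)
print(problem.validate())
```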


Stayin’ alive plans

How can we measure the distance between two functions?
  The functions are actually probabilities: use the Kullback-Leibler distance

How do we treat time and value over time?
  Compute the resulting probability distribution of the distance and keep the probability of breaking a threshold low - just stay alive
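For discrete distributions the Kullback-Leibler distance is $D_{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$. The sketch below computes it and, as one possible reading of "keep the probability of breaking a threshold low", estimates that probability from a list of sampled distances. The function names and the sample values are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue         # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def breach_probability(distances, threshold):
    """Fraction of sampled tactical distances that exceed the threshold."""
    return sum(d > threshold for d in distances) / len(distances)

print(kl_divergence([1.0, 0.0], [0.5, 0.5]))
print(breach_probability([0.1, 0.2, 0.9, 1.5], threshold=0.5))
```

Note that the measure is not symmetric, so the order of the observed and the target dynamics matters.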









Tactics - proposed solution

Keep track of probable system state - $p_t(s)$
Keep track of estimated system transitions - $p_t(s' \mid s)$

Given current beliefs select an action:

$$a^* = \arg\min_a \; E_{p_t(s)} \, E_{T(s' \mid a, s)} \, E_{p(o \mid s', a, s)} \; D_{KL}\big( p_{t+1}(s' \mid s) \,\|\, q(s' \mid s) \big)$$

But how can we keep track of $p_t(s)$ and $p_t(s' \mid s)$?
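A heavily simplified sketch of the action-selection step follows. It is not the full rule: instead of predicting the updated transition estimate for each possible observation, it assumes the dynamics under an action are given directly by a known model, and it averages the Kullback-Leibler distance to the target over the current state belief. The names `select_action`, `transitions` and `target`, and all numbers, are hypothetical.

```python
import math

def kl(p, q):
    """D_KL(p || q); assumes q is strictly positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def select_action(belief, transitions, target):
    """Pick the action whose dynamics are closest to the target dynamics.

    belief[s]            current probability of state s
    transitions[a][s]    distribution over next states under action a (assumed model)
    target[s]            desired distribution over next states from state s
    """
    def expected_distance(action):
        return sum(
            belief[s] * kl(transitions[action][s], target[s])
            for s in range(len(belief))
        )
    return min(transitions, key=expected_distance)

# Hypothetical two-state example: state 0 is the lane centre, state 1 the edge.
target = [[0.9, 0.1], [0.9, 0.1]]
transitions = {
    "steer": [[0.9, 0.1], [0.8, 0.2]],
    "hold": [[0.6, 0.4], [0.1, 0.9]],
}
print(select_action([0.5, 0.5], transitions, target))
```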


Proposed solution (cont)

Initialize your beliefs to some prior distribution

Use “Bayesian anti-aliasing” for $p_t(s)$:

$$p_{t+1}(s) \propto p(o \mid s, a) \sum_{s'} T(s \mid a, s') \, p_t(s')$$

For $p_t(s' \mid s)$ solve the following:

$$\min_{p(s' \mid s)} \; E_{p_t(s)} \, D_{KL}\big( p(s' \mid s) \,\|\, p_t(s' \mid s) \big)$$

s.t.

$$p_{t+1}(s') = \sum_{s} p(s' \mid s) \, p_t(s)$$

$$\sum_{s'} p(s' \mid s) = 1$$
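The state-belief update is a standard Bayes filter step and can be sketched as below. Only the belief update is shown; the constrained minimization for the transition estimate is not implemented here. The array layout, the function name `update_belief` and the numbers are assumptions made for illustration.

```python
def update_belief(belief, action, observation, transition, obs_prob):
    """One Bayes filter step.

    belief[prev]                 current probability of state prev
    transition[action][prev][s]  probability of moving from prev to s
    obs_prob[action][s][o]       probability of observing o in the new state s
    """
    n = len(belief)
    unnormalized = []
    for s in range(n):
        # Prediction: probability of reaching s before seeing the observation.
        predicted = sum(transition[action][prev][s] * belief[prev] for prev in range(n))
        # Correction: weight by how well s explains the observation.
        unnormalized.append(obs_prob[action][s][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]

# Hypothetical two-state example with a single action.
transition = {"hold": [[0.7, 0.3], [0.2, 0.8]]}
obs_prob = {"hold": [[0.9, 0.1], [0.3, 0.7]]}
posterior = update_belief([0.5, 0.5], "hold", 0, transition, obs_prob)
print(posterior)
```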


Conditional applicability

Consider a multi-agent system with communication under the following assumptions:

Communication activity does not change the environment and is integrated into the global action set

Action cost and state transition evaluation are separable: if $V(s', a, s)$ denotes the overall value of the transition from state $s$ to $s'$ under action $a$, then there exist $q$ and $c$ such that:

$$V(s', a, s) = q(s' \mid s) + c(a \mid s)$$

Then it is possible to create (optimal) control of the system using the above strategic vs. tactical planning paradigm

MAS Strategic vs. Tactical protocol

Strategic levels of different agents will agree upon a target in a communication session
  Basically creating a common evaluation function $q(s' \mid s)$, and converting it into a target distribution

Under the assumption of complete coordination, an agent will use the proposed tactical planning to comply with the strategy

Strong failure of the strategy will initiate communication to repeat the strategic layer operation









Tactical communication timing

Communication cost is equivalent to one decision step:
Tactical level communication will arise automatically as a sole action that does not have the capability to hinder the system development dynamics

$$a^* = \arg\min_a \; E_{p_t(s)} \, E_{T(s' \mid a, s)} \, E_{p(o \mid s', a, s)} \; D_{KL}\big( p_{t+1}(s' \mid s) \,\|\, q(s' \mid s) \big)$$


Tactical communication timing

Communication cost is an elaborate function $c(a \mid s)$:

Convert the function into a distribution $\hat{c}(a \mid s)$
Keep track of action usage distribution $u_t(a \mid s)$
Use the following to select an action:

$$a^* = \arg\min_a \; E_{p_t(s)} \, E_{T(s' \mid a, s)} \, E_{p(o \mid s', a, s)} \Big[ D_{KL}\big( p_{t+1}(s' \mid s) \,\|\, q(s' \mid s) \big) + D_{KL}\big( u_{t+1}(a \mid s) \,\|\, \hat{c}(a \mid s) \big) \Big]$$


Conclusion

A novel view of planning and plans was introduced.
A new optimality measure for agent behavior in a stochastic environment was developed in the framework of continual planning.
A feasible algorithm for multi-agent communication utilization under the measure was proposed.


Future Work

Investigate the connections between the classical optimality measure(s) and tactical distance
Prove/disprove that the proposed tactical solution is an optimal one relative to tactical distance
Investigate the effects of the tactical solution on a multi-agent system with communication


References

[1] Jim Blythe. An overview of planning under uncertainty. Volume 1600 of Lecture Notes in Computer Science, pages 85–??, 1999.

[2] Craig Boutilier, Thomas Dean, and Steve Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1–94, 1999.

[3] Jeffrey S. Rosenschein and Gilad Zlotkin. Rules of Encounter: Designing Conventions for Automated Negotiation Among Computers. MIT Press, Cambridge, Massachusetts, 1994.

