32
COMP5211 Lecture 2: Reactive Agents Fangzhen Lin Department of Computer Science and Engineering Hong Kong University of Science and Technology Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 1 / 32

COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

COMP5211 Lecture 2: Reactive Agents

Fangzhen Lin

Department of Computer Science and EngineeringHong Kong University of Science and Technology

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 1 / 32

Page 2: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Overview

Agent is a term referring to entities that can perform certain tasks insome environments autonomously.

An agent needs to have perception, can perform some actions, hassome purpose (goal).

We first study stimulus-response agents, then consider adding statesto these agents.

We consider how to control these agents using rules, neural networks,and genetic algorithms, and whether the programs are desiged orlearned (evolved).

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 2 / 32

Page 3: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Stimulus-Response Agents

Stimulus-Response (S-R) agents are machines without internal states thatsimply react to immediate stimuli in their environments.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 3 / 32

Page 4: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

A Boundary-Following RobotA robot in a two-dimensional grid world:

Boundary

Solidobject

The robot senses whetherthe eight surrounding cellsare free for it to occupy

A robot starting here willgo clockwise around theinside of the outer boundary

A robot starting here will gocounterclockwise around theoutside boundary of the object

s1 s2 s3

s8 s4

s7 s6 s5

© 1998 Morgan Kaufmann Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 4 / 32

Page 5: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Sensors and Actions

Sensors: eight of them s1 - s8. si = 1 if the corresponding cell is occupied,it equals to 0 otherwise.Actions: the robot can move to an adjacent free cell. There are four ofthem:

1 north - moves the robot one cell up.

2 east - moves the robot one cell to the right.

3 south - moves the robot one cell down.

4 west - moves the robot one cell to the left.

All of them have their indicated effects unless the robot attempts to moveto a cell that is not free; in that case, it have no effect.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 5 / 32

Page 6: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Basic Architecture

Designer’s job: specify a function of the sensory inputs that selects actionsappropriate for the task at the hand (boundry-following in our example).In general, it is often convenient to divide the function into twocomponents: perceptual processing and action function:

Perceptualprocessing

Actionfunction

Sensoryinput Action

Featurevector, X

Next to wall

In a corner

Designer’s intendedmeanings:

011

11

© 1998 Morgan Kaufmann Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 6 / 32

Page 7: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Perception

The robot has 8 sensory inputs s1,...,s8.

There are 28 = 256 different combinations of these values.

Although some of the combinations are not legal because of ourrestriction against tight spaces (spaces between objects andboundaries that are only one cell wide), there are still many legal ones.

For the task at the hand, we only need four binary-valued featuresabout the sensor: x1, x2, x3, x4:

x1 x2 x3 x4

In each diagram, the indicated feature has value 1 if and only if at least one of the shaded cells is not free.

© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 7 / 32

Page 8: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Action

Given the four features, we can specify a function of these features thatwill select the right action for boundary-following as follows:

if none of the four features equal to 1, then move north (it could beany direction).

if x1 = 1 and x2 = 0, then move east;

if x2 = 1 and x3 = 0, then move south;

if x3 = 1 and x4 = 0, then move west;

if x4 = 1 and x1 = 0, then move north.

We now need to represent and implement perception and action functions.To that end, we briefly review Boolean algebra below.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 8 / 32

Page 9: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Boolean Algebra

A Boolean function maps an n tuple of (0, 1) values to {0, 1};Boolean algebra is a convenient notation for representing Booleanfunctions using · (and, often omitted), + (or), and − (negation):

1 + 1 = 1, 1 + 0 = 1, 0 + 0 = 0,

1 · 1 = 1, 1 · 0 = 0, 0 · 0 = 0,

1 = 0, 0 = 1

The two binary operators are commutative and associative.

Examples: x4 is s1 + s8; the condition for the robot moving north isx1x2x3x4 + x4x1.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 9 / 32

Page 10: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Production SystemsOne convenient representation for an action function is productionsystems.

A production system is a sequence of productions, which are rules ofthe form:

c → a

meaning if condition c is true, then do a. Here a could be a primitiveaction, a call to a procedure, or a set of actions to be executedsimultaneously.

When there are more than one productions can be fired (theircondition parts are true), then the first one is applied.

The following is a production system representation of the actionfunction for our boundary-following robot:

x4x1 → north, x3x4 → west,

x2x3 → south, x1x2 → east,

1→ north.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 10 / 32

Page 11: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Here is a production system for getting the robot to a corner:

inCorner → nil ,

1 → bf ,

where inCorner is a feature of the sensory input corresponding todetecting a corner, nil is the do-nothing action, and bf is the actionthat the above boundary-folliwng production system will produce.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 11 / 32

Page 12: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

TLUs

Boolean functions, thus those production systems whose conditionsare Boolean functions can be implemented easily as programs. Theyalso be implemented directly as circuits.

A type of circuits of particular interest in AI is threshold logic unit(TLU), also called perceptron:

w1

x1

w2

x2

wixi

wn

xn

Σ

Σ

...

...

Σ x wi ii=1

n

Weighted sum

Output, f

Threshold, θ

f = 1 if

= 0 otherwise

x wi i ≥ θi=1

n

© 1998 Morgan Kaufmann Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 12 / 32

Page 13: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

TLUsNot all Boolean functions can be implemented as TLUs - those can arecalled linearly separable functions. Here are two examples:

1.5–1

1

1

x2

x1

x3

x1x2 x3

© 1998 Morgan Kaufman Publishers

0.5 x1x2

s2

s3

s4

s5

1

1

–2

–2

© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 13 / 32

Page 14: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Networks

A single production rule can sometimes be encoded as a TLU.

A production system needs a network of TLUs to implements.

Such networks are often called a neural netwroks, which can be usedto represent any Boolean function.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 14 / 32

Page 15: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Learning Action-Selection Functions

I will talk about two such learning techniques for now:

Learning TLUs;

Genetic programming.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 15 / 32

Page 16: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Supervised Learning

Given a training set consisting of

a set Σ of n-dimensional vectors (these vectors could be the vectorsof the robot’s sensory inputs, or they might be the feature vectorscomputed by the perceptual processing component);

for each vector in Σ, an associated action called label of the vector(this action could be the one that the learner observed performed bythe teacher, or simply the desired action given by the designer of arobot);

the task of learning is to compute a function that responds ”acceptably”to the members of the training set: this usually means that the functionagrees with as many members of the training set as possible.In our case, the function should be give by a TLU.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 16 / 32

Page 17: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Example - When to Move East

A training set for learning when to move east:

1

2

3

4

5

6

Input Sensory x1x2number vector (move east)1 00001100 0

2 11100000 1

3 00100000 1

4 00000000 0

5 00001000 0

6 01100000 1

© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 17 / 32

Page 18: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Learning TLUs

Recall that a TLU is determined by:

number of inputs;

for each input an associated number called its weight;

a number called the threshold.

We know the number of inputs from the training set; we can assume thatthe threshold is always o by introducing a new input with its weight set tobe the negative of the original threshold, and its input always set to 1.So what we need to learn is the vector of weights.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 18 / 32

Page 19: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

The Error-Correction ProcedureGiven a vector X from the training set (augmented by the n + 1th specialinput 1), let d be the label (desired output) of this input, and f the actualoutput of the old TLU, the weight change rule is:

W←W + c(d − f )X,

where c , a number, is called the learning rate.To use this rule to learn a TLU: start with a random initial weight vector,choose a learning rate c , and then iterate through the set of training setrepeatedly until it becomes stable.If the traning set can be captured by a linearly separable Boolean function,then this procedure will terminate. The exact number of steps neededdepends on:

Initial weight values

Learning rate in weight updating rule

Order of presentation of training examples

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 19 / 32

Page 20: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Machine Evolution

Human becomes what we are now through millions of years evolutionfrom apes.

Presumably, we could simulate this evolution process to evolve smartmachines as well.

Evolution has two key components:I reproduction (how do parents produce descendants); andI survival of the fittest (how to select which of these descendants are

going to reproduce more descendants).

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 20 / 32

Page 21: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Genetic Programming (GP)

GP is about evolving programs to solve specific problems.

The first issue is what the representation is for programs: thetechniques for evolving C programs would be different from that forML programs. The general idea is:

I decide what the legal programs are;I define a fitness function;I select a set of legal programs as generation 0;I produce the next generation until a desired program is produced.

Three common techniques for producing the next generations are:I copy (clone a good gene to the next generation);I crossover (mix-up two parent’s genes);I mutation (mutate a gene)

I will illustrate the idea using the boundary-following robot.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 21 / 32

Page 22: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

The Task

Wall-following in the following grid-world (we have to fix the environment):

nw n ne

e

sessw

w

© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 22 / 32

Page 23: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Program Representation

We shall assume that the program that we are looking for can beconstructed from the primitives ones (east, west, etc) by Boolean ops andconditionals. The example in the following slide represent a program thatdoes the same thing as the following production system:

(n + ne)e → east,

(e + se)s → south,

(s + sw)w → west,

1 → north.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 23 / 32

Page 24: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

eastAND

IF

OR NOT

n ne e

southAND

IF

OR NOT

e se s

westAND

IF

OR NOT

s sw w

north

(IF (AND (OR (n) (ne)) (NOT (e)))

(east)

(IF (AND (OR (e) (se)) (NOT (s)))

(south)

(IF (AND (OR (s) (sw)) (NOT (w)))

(west)

(north))))

© 1998 Morgan Kaufman PublishersFangzhen Lin (HKUST) Lecture 2: Reactive Agents 24 / 32

Page 25: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Fitness Function

Given a program, and a starting position for the robot, we run it untilit has carried out 60 steps, and then count the number of cells nextto the wall that are visited during these 60 steps. (Max=32, Min=0.)

For a given program, we do ten of these runs with the robot startingat ten randomly chosen starting positions. The total count of thenext-to-the-wall cels visited is taken as the fitness of the program.(Max = 320, Min=0.)

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 25 / 32

Page 26: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

The GP ProcessGeneration 0: 5000 random programs.Procedure for producing generation (n+1) from generation n:

Copy 10% of the programs from generation n to generation n+1.These programs are chosen by the following tournament selectionprocess: 7 programs are randomly selected from the population, andthe most fit of these seven is chosen.

The rest (90%) are produced from generation n by a crossoveroperation as follows: a mother and a father is chosen from generationn by the tournament selection process, and a randomly chosensubtree of the father is used to replace a randomly selected subtree ofthe mother. See next page for an example.

Sometimes a mutation operator is also used: it selects a memberfrom generation n by the tournament selection process, and then arandomly chosen subtree of it is deleted and replaced by a randomtree. When used, the mutation operation is used sparingly (maybe1% rate).

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 26 / 32

Page 27: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

An Example of Crossover Operation

NOT

AND IF

se IF AND OR se

NOT s OR se NOT nwe

west w south north

NOT

AND

se IF

NOT s AND

NOTwest se

north

Mother program Father program Child program

Randomly chosencrossover points

© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 27 / 32

Page 28: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Performance of GP

The performance of GP depends on:

the size of generation 0;

the copy rate, crossover rate, and mutation rate;

the parameters used in the tournament selection process.

For the wall-following example, it will generate a perfect one after about10 generations:

350

300

250

200

150

100

50

0

0 1 2 3 4 5 6 7 8 9 10

Fitn

ess

Generation number© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 28 / 32

Page 29: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

State Machines

Recall that S-R agents have no memory; they just respond to thecurrent stimuli.

Often one needs more powerful agents. State machines are agentswho can remember the action that they have just done, and the statethat the previous environment is in, and can have some mental states.

Action function of a state machine is then a mapping of the currentsensory inputs, the state of the environment at the previous time step,the action that the machine has just taken, and the current mentalstates.

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 29 / 32

Page 30: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Basic Architecture of State Machines

01111

Perceptualprocessing

Memory(previous featurevector andprevious action)

Actionfunction

Actionat

Sensoryinput

Xt–1at–1

Feature vector, Xt

© 1998 Morgan Kaufman Publishers

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 30 / 32

Page 31: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Sensory-Impaired Bounday-Following Robot

Consider again our boundary-following robot, assume now that it’ssomewhat sensory-impaired: its sensory inputs are only (s2, s4, s6, s8).Can you design an S-R agent for following a bounday based only on thesefour sensors?

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 31 / 32

Page 32: COMP5211 Lecture 2: Reactive Agentshome.cse.ust.hk/faculty/flin/comp5211/lect2.pdfthe task of learning is to compute a function that responds ”acceptably” to the members of the

Let w1 − w8 be the features defined as:

wi = si when i = 2, 4, 6, 8;

w1 = 1 iff at the previous time step, w2 = 1, and the robot movedeast;

w3 = 1 iff at the previous time step, w4 = 1, and the robot movedsouth;

w5 = 1 iff at the previous time step, w6 = 1, and the robot movedwest;

w7 = 1 iff at the previous time step, w8 = 1, and the robot movednorth.

Using these features, the following production system will do the job:

w2w4→ east, w4w6→ south,

w6w8→ west, w8w2→ north,

w1 → north, w3 → east,

w5 → south, w7 → west,

1→ north

Fangzhen Lin (HKUST) Lecture 2: Reactive Agents 32 / 32