
NEW TIES WP2 Agent and learning mechanisms

Decision making and learning

Agents have a controller (a decision tree, DQT)
Input: the situation (as perceived = seen/heard/interpreted)
Output: an action

Decision making = using the DQT
Learning = modifying the DQT
Decisions also depend on inheritable "attitude genes" (learned through evolution)

Example of a DQT

[Figure: example of a DQT. Legend: B = bias node (with a genetic bias, e.g. 0.2), T = test node (Boolean choice, YES/NO), A = action node (decision). A top-level bias node (0.5 / 0.5) leads to two subtrees: one tests VISUAL:FRONTFOODREACHABLE (YES → PICKUP with bias 1.0, NO → a biased choice between MOVE, TURNLEFT and TURNRIGHT with biases 0.6 / 0.2 / 0.2); the other tests BAG:FOOD (YES → EAT with bias 1.0, NO → the same biased movement choice).]
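As a rough illustration only, the sketch below shows how such a tree could be represented and traversed when making a decision. The class names are invented for this example and the bias nodes are simplified to plain weighted choices (the actual combination of learned and genetic bias is described two slides further on), so this is not the NEW TIES implementation.

```python
import random

# Illustrative DQT representation (not the actual NEW TIES code).
class TestNode:
    """Boolean test on the perceived situation; branches on YES/NO."""
    def __init__(self, concept, yes, no):
        self.concept, self.yes, self.no = concept, yes, no
    def decide(self, situation):
        branch = self.yes if situation.get(self.concept, False) else self.no
        return branch.decide(situation)

class BiasNode:
    """Stochastic choice among children; plain weights stand in here for the
    learned/genetic bias combination described later."""
    def __init__(self, children, weights):
        self.children, self.weights = children, weights
    def decide(self, situation):
        return random.choices(self.children, weights=self.weights)[0].decide(situation)

class ActionNode:
    """Leaf node: the action the agent executes."""
    def __init__(self, action):
        self.action = action
    def decide(self, situation):
        return self.action

# Rough reconstruction of one branch of the example tree above.
wander = BiasNode([ActionNode("MOVE"), ActionNode("TURNLEFT"), ActionNode("TURNRIGHT")],
                  weights=[0.6, 0.2, 0.2])
food_branch = TestNode("VISUAL:FRONTFOODREACHABLE", yes=ActionNode("PICKUP"), no=wander)
print(food_branch.decide({"VISUAL:FRONTFOODREACHABLE": True}))  # -> PICKUP
```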

Interaction evolution & individual learning

A bias node has n children, each with a bias b_i
Bias ≠ probability
The bias b_i is learned and changes over time ("learned bias")
The genetic bias g_i is inherited, part of the genome, and constant
Actual probability of choosing child x: p(b, g) = b + (1 − b) · g
Learned and inherited behaviour are linked through this formula
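A minimal sketch of this combination; normalising the per-child values p(b_i, g_i) into a probability distribution when selecting a child is an assumption of this sketch, not something stated on the slide.

```python
import random

def combined_bias(b, g):
    """p(b, g) = b + (1 - b) * g: learned bias b combined with genetic bias g."""
    return b + (1.0 - b) * g

def choose_child(children, learned_biases, genetic_biases):
    """Select a child of a bias node. Normalising the combined values into a
    probability distribution is an assumption of this sketch."""
    weights = [combined_bias(b, g) for b, g in zip(learned_biases, genetic_biases)]
    total = sum(weights)
    return random.choices(children, weights=[w / total for w in weights])[0]

# Example: learned bias 0.2 and genetic bias 0.5 give p = 0.2 + 0.8 * 0.5 = 0.6
print(combined_bias(0.2, 0.5))  # 0.6
```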

DQT nodes & parameters cont’d

Test node language: native concepts + emerging concepts

Native: see_agent, see_mother, see_food, have_food, see_mate, …

New concepts can emerge by categorisation (discrimination game)
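The slide only names the discrimination game; the sketch below illustrates the generic discrimination-game idea (categories as interval tests on a single feature, refined whenever discrimination fails), not the NEW TIES implementation, and all names and representations are illustrative.

```python
import random

def matches(category, obj):
    """A category is an interval test (feature, lo, hi) on one feature value."""
    feat, lo, hi = category
    return lo <= obj[feat] < hi

def discrimination_game(categories, topic, context):
    """Try to find a category that is true of the topic and false of every other
    object in the context. On failure, refine a random category by splitting its
    interval in two -- this is how new concepts can emerge."""
    for cat in categories:
        if matches(cat, topic) and not any(matches(cat, o) for o in context):
            return cat                              # success with an existing concept
    feat, lo, hi = random.choice(categories)
    mid = (lo + hi) / 2.0
    categories.extend([(feat, lo, mid), (feat, mid, hi)])  # two new, finer concepts
    return None                                     # failed this round; repertoire grew

# Usage: one feature "size", one initial (catch-all) category.
cats = [("size", 0.0, 1.0)]
topic, others = {"size": 0.2}, [{"size": 0.7}, {"size": 0.9}]
discrimination_game(cats, topic, others)           # fails, splits the category
print(discrimination_game(cats, topic, others))    # ('size', 0.0, 0.5) now discriminates
```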

Learning: the heart of the emergence engine

Evolutionary learning: not within an agent (not during its lifetime), but over generations, by variation + selection

Individual learning: within one agent, during its lifetime, by reinforcement learning

Social learning: during the agents' lifetime, in interacting agents, by sending/receiving + adopting knowledge pieces

Types of learning: properties

Evolutionary learning:
The agent does not create new knowledge during its lifetime
The basic DQT + genetic biases are inheritable
"Knowledge creator" = crossover and mutation

Individual learning:
The agent does create new knowledge during its lifetime
The DQT + learned biases are modified
"Knowledge creator" = reinforcement learning (driven by rewards)
Individually learnt knowledge dies with its host agent

Social learning:
The agent imports knowledge already created elsewhere (new? not new?)
Adoption of imported knowledge ≈ crossover
Importing knowledge pieces can save effort for the recipient and can create novel combinations
Exporting knowledge helps its preservation after the death of its host

Present status of types of learning

Evolutionary learning:
Demonstrated in 2 NT scenarios
Autonomous selection/reproduction causes problems with population stability (implosion/explosion)

Individual learning:
Code exists, but it has never been demonstrated in NT scenarios

Social learning:
Under construction/design, based on the "telepathy" approach
Communication protocols + adoption mechanisms are needed

Evolution: variation operators

Operators for the DQT:
Crossover = subtree swap
Mutation = substitute a subtree with a random subtree, change concepts in test nodes, or change the bias on an edge

Operators for the attitude genes:
Crossover = full arithmetic crossover
Mutation = add Gaussian noise, or replace with a random value
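A hedged sketch of these operators. The `children` attribute on DQT nodes and the tree-walking helper are assumptions of this example; the remaining DQT mutations (substituting a random subtree, changing a test concept, changing an edge bias) would follow the same pattern.

```python
import copy
import random

def all_edges(node, edges=None):
    """Collect (parent, child_index) pairs; a 'children' list on internal
    DQT nodes is an assumption of this sketch."""
    if edges is None:
        edges = []
    for i, child in enumerate(getattr(node, "children", [])):
        edges.append((node, i))
        all_edges(child, edges)
    return edges

def subtree_crossover(tree_a, tree_b):
    """DQT crossover = swap a randomly chosen subtree between two parents."""
    a, b = copy.deepcopy(tree_a), copy.deepcopy(tree_b)
    pa, ia = random.choice(all_edges(a))
    pb, ib = random.choice(all_edges(b))
    pa.children[ia], pb.children[ib] = pb.children[ib], pa.children[ia]
    return a, b

def arithmetic_crossover(genes_a, genes_b, alpha=0.5):
    """Attitude-gene crossover = full arithmetic crossover."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(genes_a, genes_b)]

def mutate_genes(genes, sigma=0.1, rate=0.1):
    """Attitude-gene mutation = add Gaussian noise (the slide also mentions
    replacement with a random value)."""
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in genes]
```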

Evolution: selection operators

Mate selection:
The mate action is chosen by the DQT
Propose – accept proposal
Adulthood required

Survivor selection:
Dead if too old (≥ 80 years)
Dead if zero energy
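The survivor-selection rule is simple enough to state as code; the agent attributes `age` and `energy` are assumed names, not taken from the NEW TIES code.

```python
MAX_AGE = 80  # dead if too old (>= 80 years)

def survives(agent):
    """Survivor selection: an agent dies when it is too old or has zero energy."""
    return agent.age < MAX_AGE and agent.energy > 0

# Applied each timestep (usage sketch):
# population = [a for a in population if survives(a)]
```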

Experiment: Simple world

Setup: Environment

World size: 200 × 200 grid cells
Only agents and food (no tokens, roads, etc.); both are variable in number
Initial distribution of agents (500): in the upper left corner
Initial distribution of food (10000): 5000 in the upper left and 5000 in the lower right corner
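A possible initialisation of this setup; only the grid size and the agent/food counts come from the slide, while the size of a "corner" region and all names are assumptions of this sketch.

```python
import random

GRID = 200                  # 200 x 200 grid cells
N_AGENTS, N_FOOD = 500, 10000
CORNER = 50                 # size of a "corner" region: an assumption of this sketch

def corner_cell(corner):
    """Random cell in the upper-left or lower-right corner region."""
    x0, y0 = {"upper_left": (0, 0), "lower_right": (GRID - CORNER, GRID - CORNER)}[corner]
    return (x0 + random.randrange(CORNER), y0 + random.randrange(CORNER))

agent_cells = [corner_cell("upper_left") for _ in range(N_AGENTS)]
food_cells = ([corner_cell("upper_left") for _ in range(N_FOOD // 2)] +
              [corner_cell("lower_right") for _ in range(N_FOOD // 2)])
```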

Experiment: Simple world

Setup: Agents

Native knowledge (concepts and DQT subtrees):
Navigating (random walk)
Eating (identify, pick up and eat plants)
Mating (identify mates, propose/agree)
Random DQT branches: differ per agent, based on the "pool" of native concepts

Experiment: Simple world

The simulation was continued for 3 months of real time to test stability

Experiment: Poisonous Food

Setup: Environment

Two types of food: poisonous (decreases energy) and edible (increases energy)
World size: 200 × 200 grid cells
Only agents and food (no tokens, roads, etc.); both are variable in number
Initial distribution of agents (500): uniform random over the grid
Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents
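The same kind of sketch for the poisonous-food world, now with uniform random placement and two food types; names are again illustrative.

```python
import random

GRID, N_AGENTS, N_PER_TYPE = 200, 500, 5000

def random_cell():
    """Uniform random cell on the 200 x 200 grid."""
    return (random.randrange(GRID), random.randrange(GRID))

agent_cells = [random_cell() for _ in range(N_AGENTS)]
food_items = ([("edible", random_cell()) for _ in range(N_PER_TYPE)] +     # increases energy
              [("poisonous", random_cell()) for _ in range(N_PER_TYPE)])   # decreases energy
```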

Experiment: Poisonous Food

Setup: Agent

Native knowledge: identical to the simple world experiment
Additional native knowledge: agents can distinguish poisonous from edible plants; a relation with eating/picking up is not built in
No random DQT branches

Experiment: Poisonous Food

Measures

Population size
Welfare (energy)
Number of poisonous and edible plants
Complexity of the controller (number of nodes)
Age
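These measures could be collected per timestep roughly as follows; all attribute names are assumptions of this sketch.

```python
def collect_measures(agents, plants):
    """Per-timestep measures; attribute names (energy, age, dqt_node_count, kind)
    are assumptions of this sketch."""
    n = max(len(agents), 1)
    return {
        "population_size": len(agents),
        "welfare": sum(a.energy for a in agents) / n,
        "edible_plants": sum(1 for p in plants if p.kind == "edible"),
        "poisonous_plants": sum(1 for p in plants if p.kind == "poisonous"),
        "controller_complexity": sum(a.dqt_node_count for a in agents) / n,
        "average_age": sum(a.age for a in agents) / n,
    }
```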

Experiment: Poisonous Food

Demo

Experiment: Poisonous Food Results

[Plot: population size, healthy plants (×10), poisonous plants (×10) and average agent energy (×100) over timesteps 1250–15000; vertical axis 0–2500.]