
Page 1: acai01-updated.ppt

Machine Learning and ILP for Multi-Agent Systems

Daniel Kudenko & Dimitar Kazakov

Department of Computer Science

University of York, UK

ACAI-01, Prague, July 2001

Page 2: acai01-updated.ppt

Why Learning Agents?

Agent designers are not able to foresee all situations that the agent will encounter.

To display full autonomy, agents need to learn from and adapt to novel environments.

Learning is a crucial part of intelligence.

Page 3: acai01-updated.ppt

A Brief History

Machine Learning: Disembodied ML → Single-Agent Learning → Multiple Single-Agent Learners → Social Multi-Agent Learners

Agents: Single-Agent System → Multiple Single-Agent System → Social Multi-Agent System

Page 4: acai01-updated.ppt

Outline

• Principles of Machine Learning (ML)
• ML for Single Agents
• ML for Multi-Agent Systems
• Inductive Logic Programming for Agents

Page 5: acai01-updated.ppt

What is Machine Learning?

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Mitchell 97]

Example: T = “play tennis”, E = “playing matches”, P = “score”

Page 6: acai01-updated.ppt

Types of Learning

• Inductive Learning (Supervised Learning)
• Reinforcement Learning
• Discovery (Unsupervised Learning)

Page 7: acai01-updated.ppt

Inductive Learning

[An inductive learning] system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge. [Michalski et al. 98]

Page 8: acai01-updated.ppt

Inductive Learning

Examples of Category C1, C2, …, Cn → Inductive Learning System → Hypothesis (a procedure to classify new examples)

Page 9: acai01-updated.ppt

Inductive Learning Example

Training examples:
• Ammo: low, Monster: near, Light: good ⇒ Category: shoot
• Ammo: low, Monster: far, Light: medium ⇒ Category: ¬shoot
• Ammo: high, Monster: far, Light: good ⇒ Category: shoot

Inductive Learning System output:
If (Ammo = high) and (Light ∈ {medium, good}) then shoot; …

Page 10: acai01-updated.ppt

Performance Measure

Classification accuracy on unseen test set.

Alternatively: a measure that incorporates the cost of false positives and false negatives (e.g., recall/precision), as in the sketch below.
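To make these measures concrete, here is a minimal Python sketch (illustrative, not from the slides) computing them from the four outcome counts:

def accuracy(tp, fp, tn, fn):
    # fraction of all test examples classified correctly
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    # fraction of predicted positives that are truly positive
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of true positives that the hypothesis finds
    return tp / (tp + fn)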

Page 11: acai01-updated.ppt

Where’s the knowledge?

• Example (or object) language
• Hypothesis (or concept) language
• Learning bias
• Background knowledge

Page 12: acai01-updated.ppt

Example Language

• Feature-value vectors, logic programs.
• Which features are used to represent examples (e.g., ammunition left)?
• For agents: which features of the environment are fed to the agent (or the learning module)?
• Constructive induction: automatic feature selection, construction, and generation.

Page 13: acai01-updated.ppt

Hypothesis Language

Decision trees, neural networks, logic programs, …

Further restrictions may be imposed, e.g., depth of decision trees, form of clauses.

Choice of hypothesis language influences choice of learning methods and vice versa.

Page 14: acai01-updated.ppt

Learning bias

Preference relation between legal hypotheses.

Accuracy on the training set. A hypothesis with zero error on the training data is not necessarily the best (noise!).

Occam's razor: the simpler hypothesis is the better one.

Page 15: acai01-updated.ppt

Inductive Learning

No “real” learning without language or learning bias.

IL is search through space of hypotheses guided by bias.

Quality of hypothesis depends on proper distribution of training examples.

Page 16: acai01-updated.ppt

Inductive Learning for Agents

What is the target concept (i.e., categories)?

Example: do(a), ¬do(a) for specific action a.

Real-valued categories/actions can be discretized.
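As a concrete illustration (the names and ranges are assumptions, not from the slides), a real-valued action such as a steering angle can be mapped to a small set of categories:

def discretize(x, lo, hi, bins):
    # map a real value in [lo, hi) to one of `bins` integer categories
    i = int((x - lo) / (hi - lo) * bins)
    return max(0, min(bins - 1, i))

# e.g., a steering angle in [-30.0, 30.0) degrees -> one of 5 actions
action = discretize(-12.5, -30.0, 30.0, 5)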

Where does the training data come from and what form does it take?

Page 17: acai01-updated.ppt

Batch vs Incremental Learning

Batch Learning: collect a set of training examples and compute hypothesis.

Incremental Learning: update hypothesis with each new training example.

Incremental learning is more suited to agents.

Page 18: acai01-updated.ppt

Batch Learning for Agents

When should (re-)computation of hypothesis take place?

Example: after the observed accuracy of the hypothesis drops below a threshold.

Which training examples should be used?

Example: sequences of actions that led to success.

Page 19: acai01-updated.ppt

Eager vs. Lazy learning

Eager learning: commit to hypothesis computed after training.

Lazy learning: store all encountered examples and perform classification based on this database (e.g. nearest neighbour).
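A minimal sketch of such a lazy learner in Python, assuming feature-value vectors and a 1-nearest-neighbour classification rule (all names illustrative):

import math

examples = []                      # the stored database of (vector, label) pairs

def store(vector, label):
    # "learning" is just remembering the example
    examples.append((vector, label))

def classify(query):
    # all work is deferred to classification time:
    # return the label of the closest stored example
    vector, label = min(examples, key=lambda e: math.dist(e[0], query))
    return label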

Page 20: acai01-updated.ppt

Active Learning

Learner decides which training data to receive (i.e. generates training examples and uses oracle to classify them).

Closed Loop ML: learner suggests hypothesis and verifies it experimentally. If hypothesis is rejected, the collected data gives rise to a new hypothesis.

Page 21: acai01-updated.ppt

Black-Box vs. White-Box

Black-Box Learning: Interpretation of the learning result is unclear to a user.

White-Box Learning: Creates (symbolic) structures that are comprehensible.

Page 22: acai01-updated.ppt

Reinforcement Learning

Agent learns from environmental feedback indicating the benefit of states.

No explicit teacher required.

Learning target: an optimal policy (i.e., a state-action mapping).

Optimality measure: e.g., cumulative discounted reward.

Page 23: acai01-updated.ppt

Q Learning

Value of a state: discounted cumulative reward V^π(s_t) = Σ_{i≥0} γ^i r(s_{t+i}, a_{t+i})

0 ≤ γ < 1 is a discount factor (γ = 0 means that only immediate reward is considered). r(s_{t+i}, a_{t+i}) is the reward determined by performing the actions specified by policy π.

Q(s,a) = r(s,a) + γ V*(δ(s,a)), where δ(s,a) is the state reached by taking action a in state s.

Optimal policy: π*(s) = argmax_a Q(s,a)

Page 24: acai01-updated.ppt

Q Learning

Initialize all Q(s,a) to 0

In some state s choose some action a. Let s’ be the resulting state.

Update Q:

Q(s,a) ← r + γ max_{a'} Q(s',a')
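A minimal tabular sketch of this update loop in Python. The environment interface (reset()/step()) and the learning rate alpha are assumptions for illustration; the slides' update is the alpha = 1 special case.

import random
from collections import defaultdict

def q_learning(env, actions, episodes=100, steps=50,
               gamma=0.9, alpha=0.5, eps=0.1):
    q = defaultdict(float)                     # Q(s, a), initialised to 0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy exploration can speed up convergence
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(s, x)])
            s2, r = env.step(a)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + gamma * max(q[(s2, x)] for x in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q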

Page 25: acai01-updated.ppt

Q Learning

Guaranteed convergence towards optimum (state-action pairs have to be visited infinitely often).

Exploration strategy can speed up convergence.

Basic Q Learning does not generalize: replace state-action table with function approximation (e.g. neural net) in order to handle unseen states.

Page 26: acai01-updated.ppt

Pros and Cons of RL

+ Clearly suited to agents acting in and exploring an environment.
+ Simple.
– Engineering a suitable reward function may be tricky.
– May take a long time to converge.
– The learning result may not be transparent (depending on the representation of the Q function).

Page 27: acai01-updated.ppt

Combination of IL and RL

Relational reinforcement learning [Dzeroski et al. 98]: leads to more general Q function representation that may still be applicable even if the goals or environment change.

Explanation-based learning and RL [Dietterich and Flann, 95].

More ILP and RL: see later.

Page 28: acai01-updated.ppt

Unsupervised Learning

Acquisition of “useful” or “interesting” patterns in input data.

Usefulness and interestingness are based on agent’s internal bias.

Agent does not receive any external feedback.

Discovered concepts are expected to improve agent performance on future tasks.

Page 29: acai01-updated.ppt

Learning and Verification

Need to guarantee agent safety.

Pre-deployment verification for non-learning agents.

What to do with learning agents?

Page 30: acai01-updated.ppt

Learning and Verification [Gordon '00]

Verification after each self-modification step.

Problem: time-consuming.

Solution 1: use property-preserving learning operators.

Solution 2: use learning operators which permit quick (partial) re-verification.

Page 31: acai01-updated.ppt

Learning and Verification

What to do if verification fails?
• Repair the (multi-)agent plan.
• Choose a different learning operator.

Page 32: acai01-updated.ppt

Learning in Multi-Agent Systems

Classification:
• Social Awareness
• Communication
• Role Learning
• Distributed Learning

Page 33: acai01-updated.ppt

Types of Multi-Agent Learning [Weiss & Dillenbourg 99]

Multiplied Learning: no interference in the learning process by other agents (except for the exchange of training data or outputs).

Divided Learning: division of the learning task on a functional level.

Interacting Learning: cooperation beyond the pure exchange of data.

Page 34: acai01-updated.ppt

Social Awareness

Awareness of the existence of other agents and (possibly) knowledge about their behavior.

Not necessary to achieve near optimal MAS behavior: rock sample collection [Steels 89].

Can it degrade performance?

Page 35: acai01-updated.ppt

Levels of Social Awareness [Vidal&Durfee 97]

0-level agent: no knowledge about the existence of other agents.

1-level agent: recognizes that other agents exist; models them as 0-level agents.

2-level agent: has some knowledge about the behavior of other agents; models them as 1-level agents.

k-level agent: models other agents as (k-1)-level agents.

Page 36: acai01-updated.ppt

Social Awareness and Q Learning

0-level agents already learn implicitly about other agents.

[Mundhe and Sen, 00]: a study of two Q-learning agents up to level 2.

Two 1-level agents display the slowest and least effective learning (worse than two 0-level agents).

Page 37: acai01-updated.ppt

Agent Models and Q Learning

Q: S × Aⁿ → ℝ, where n is the number of agents.

If the other agents' actions are not observable, an assumption about their actions is needed.

Pessimistic assumption: given an agent's action choice, the other agents will minimize reward.

Optimistic assumption: the other agents will maximize reward.

Page 38: acai01-updated.ppt

Agent Models and Q Learning

Pessimistic Assumption leads to overly cautious behavior.

Optimistic Assumption guarantees convergence towards optimum [Lauer & Riedmiller ‘00].

If knowledge of the other agents' behavior is available, the Q-value update can be based on a probabilistic computation [Claus and Boutilier '98]. But: no guarantee of optimality.
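A minimal sketch of the optimistic update in the spirit of [Lauer & Riedmiller '00]: each agent keeps a table over its own actions and never lets a Q value decrease, which implicitly assumes the other agents act so as to maximize reward. The function signature is illustrative.

def optimistic_update(q, s, a, r, s2, actions, gamma=0.9):
    # value of the next state under the agent's own action choices
    target = r + gamma * max(q.get((s2, x), 0.0) for x in actions)
    # optimistic assumption: Q(s, a) is only ever increased
    q[(s, a)] = max(q.get((s, a), 0.0), target)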

Page 39: acai01-updated.ppt

Q Learning and Communication[Tan 93]

Types of communication:
• Sharing sensation
• Sharing or merging policies
• Sharing episodes

Results:
• Communication generally helps
• Extra sensory information may hurt

Page 40: acai01-updated.ppt

Role Learning

Often useful for agents to specialize in specific roles for joint tasks.

Pre-defined roles: reduce flexibility; the optimal distribution is often not easy to define; may be expensive.

How to learn roles?

[Prasad et al. 96]: learn the optimal distribution of pre-defined roles.

Page 41: acai01-updated.ppt

Q Learning of roles

[Crites&Barto 98]: elevator domain; regular Q learning; no specialization achieved (but highly efficient behavior).

[Ono&Fukumoto 96]: Hunter-Prey domain, specialization achieved with greatest mass merging strategy.

Page 42: acai01-updated.ppt

Q Learning of Roles [Balch 99]

Three types of reward function: local performance-based, local shaped, and global.

Global reward supports specialization.

Local reward supports the emergence of homogeneous behaviors.

Some domains benefit from learning team heterogeneity (e.g., robotic soccer); others do not (e.g., multi-robot foraging).

Heterogeneity measure: social entropy.

Page 43: acai01-updated.ppt

Distributed Learning

Motivation: Agents learning a global hypothesis from local observations.

Application of MAS techniques to (inductive) learning.

Applications: Distributed Data Mining [Provost & Kolluri ‘99], Robotic Soccer.

Page 44: acai01-updated.ppt

Distributed Data Mining

[Provost & Hennessy 96]: individual learners see only a subset of all training examples and compute a set of local rules based on these.

Local rules are evaluated by other learners based on their data.

Only rules with good evaluation are carried over to the global hypothesis.
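A minimal sketch of this cooperative filtering scheme, assuming a hypothetical rule inducer learn_rules and an accuracy function (both stand-ins, not from the paper):

def distributed_rules(local_datasets, learn_rules, accuracy, threshold=0.9):
    global_hypothesis = []
    for i, data in enumerate(local_datasets):
        for rule in learn_rules(data):               # rules from local data
            others = [d for j, d in enumerate(local_datasets) if j != i]
            # a rule survives only if it also scores well on the
            # other learners' data
            if all(accuracy(rule, d) >= threshold for d in others):
                global_hypothesis.append(rule)
    return global_hypothesis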

Page 45: acai01-updated.ppt

Bibliography[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.

[Michalski et al. 98] R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.

[Dietterich & Flann 95] T. Dietterich and N. Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.

[Dzeroski et al. 98] S. Dzeroski, L. De Raedt, and H. Blockeel. Relational Reinforcement Learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming ILP-98. Springer, 1998.

[Gordon 00] D. Gordon: Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.

[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is 'Multi' in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning. Cognitive and Computational Approaches. Pergamon Press, 1999.

[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 workshop on Multiagent Learning, 1997.

[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.

[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI 98.

[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

Page 46: acai01-updated.ppt

Bibliography[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In: Proceedings of the Tenth International Conference on Machine Learning, 1993.

[Prasad et al. 96] M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.

[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. Proceedings of the First International Conference on Multi-Agent Systems, 1996.

[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.

[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With other Agents, 1999.

[Provost & Kolluri 99] F. Provost and V. Kolluri. "A Survey of Methods for Scaling Up Inductive Algorithms." Data Mining and Knowledge Discovery 3, 1999.

[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling up: Distributed Machine Learning with Cooperation. AAAI 96, 1996.

Page 47: acai01-updated.ppt

B R E A K

Page 48: acai01-updated.ppt

Machine Learning and ILP for MAS: Part II

• Integration of ML and Agents
• ILP and its potential for MAS
• Agent Applications of ILP
• Learning, Natural Selection and Language


Page 50: acai01-updated.ppt

From Machine Learning to Learning Agents

A progression: Classic Machine Learning (learning as the only goal) → Active Learning → Closed Loop Machine Learning → Learning Agent(s) (learning as one of many goals)

Page 51: acai01-updated.ppt

Integrating Machine Learning into the Agent Architecture

• Time constraints on learning
• Synchronisation between agents' actions
• Learning and Recall

Page 52: acai01-updated.ppt

Time Constraints on Learning

Machine Learning alone:
– predictive accuracy matters, time doesn't (just a price to pay)

ML in agents:
– Soft deadlines: resources must be shared with other activities (perception, planning, control)
– Hard deadlines imposed by the environment: make up your mind now! (or they'll eat you)

Page 53: acai01-updated.ppt

Doing Eager vs. Lazy Learning under Time Pressure

Eager Learning:
– Theories typically more compact…
– …and faster to use
– Takes more time to learn – do it when the agent is idle

Lazy Learning:
– Knowledge acquired at (almost) no cost
– May be much slower when a test example comes

Page 54: acai01-updated.ppt

“Clear-cut” vs. Any-time Learning

Consider two types of algorithms:

Algorithms for which running a prescribed number of steps guarantees finding a solution:
– can use worst-case complexity analysis to find an upper bound on the execution time

Any-time algorithms:
– a longer run may result in a better solution
– don't know an optimal solution when they see one
– example: Genetic Algorithms
– policies: halt learning to meet hard deadlines, or when cost outweighs the expected improvement in accuracy (see the sketch below)
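A minimal sketch of such a halting policy, assuming a hypothetical any-time learner that can estimate the accuracy gain of one more refinement step:

import time

def anytime_learn(learner, deadline, step_cost, gain_value):
    # `deadline` is a time.time() timestamp (hard deadline);
    # `step_cost` is the cost of one more step, `gain_value` the worth
    # of one unit of expected accuracy improvement
    while time.time() < deadline:
        if gain_value * learner.expected_improvement() < step_cost:
            break                    # improvement no longer worth the time
        learner.step()               # one more any-time refinement
    return learner.best_hypothesis()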

Page 55: acai01-updated.ppt

Time Constraints on Learning in Simulated Environments

Consider various cases:
– Unlimited time for learning
– An upper bound on time for learning
– Learning in real time

Gradually tightening the constraints makes integration easier.

Not limited to simulations: real-world problems have similar settings, e.g., various types of auctions.

Page 56: acai01-updated.ppt

Synchronisation vs. Time Constraints (unlimited time / upper bound / real time):

• 1 move per round, batch update: logic-based MAS for conflict simulations (Kudenko, Alonso)
• 1 move per round, immediate update: the York MA Environment (Kazakov et al.)
• Asynchronous: Multi-Agent Progol (Muggleton)

Page 57: acai01-updated.ppt

Learning and Recall

An agent must strike a balance between:
• Learning, which updates the model of the world
• Recall, which applies the existing model of the world to other tasks

Page 58: acai01-updated.ppt

Learning and Recall (2)

The agent's cycle:
1. Update sensory information
2. Recall the current model of the world to choose and carry out an action
3. Observe the action outcome
4. Learn a new model of the world

Page 59: acai01-updated.ppt

Learning and Recall (3)

The same cycle without a separate observation step:
1. Update sensory information
2. Recall the current model of the world to choose and carry out an action
3. Learn a new model of the world

• In theory, learning and recall can run in parallel
• In practice, they must share limited resources

Page 60: acai01-updated.ppt

Learning and Recall (4)

Possible strategies:
• Parallel learning and recall at all times
• Mutually exclusive learning and recall
  – After incremental, eager learning, examples are discarded…
  – …or kept if batch or lazy learning is used
• Cheap on-the-fly learning (preprocessing), with computationally expensive learning done off-line
  – reduce raw information, change the object language
  – analogy with human learning and the role of sleep

Page 61: acai01-updated.ppt

Machine Learning and ILP for MAS: Part II

• Integration of ML and Agents
• ILP and its potential for MAS
• Agent Applications of ILP
• Learning, Natural Selection and Language

Page 62: acai01-updated.ppt

Machine Learning Revisited

ML can be seen as the task of:
• taking a set of observations represented in a given object/data language, and
• representing (the information in) that set in another language, called the concept/hypothesis language.

A side effect of this step: the ability to deal with unseen observations.

Page 63: acai01-updated.ppt

Object and Concept Language

Object Language: points (x, y, +/−). Concept Language: any ellipse (5 parameters).

[Figure: positive and negative points in the plane, separated by an ellipse.]

Page 64: acai01-updated.ppt

Machine Learning Biases

The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.

The preference bias allows us to decide between two hypotheses when both classify the training data equally well.

The search bias defines the order in which hypotheses will be considered.
– Important if one does not search the whole hypothesis space.

Page 65: acai01-updated.ppt

Preference Bias, Search Bias & Version Space

Version space: the subset of hypotheses that have zero training error.

[Figure: the same points as before; the version space lies between the most specific and the most general concept.]

Page 66: acai01-updated.ppt

Inductive Logic Programming

Based on three pillars:
• Logic Programming (LP) to represent data and concepts (i.e., the object and concept languages)
• Background Knowledge to extend the concept language
• Induction as the learning method

Page 67: acai01-updated.ppt

LP as ILP Object Language

A subset of First Order Predicate Logic (FOPL) called Logic Programming.

Often limited to ground facts, i.e., propositional logic (cf. ID3 etc.).

In the latter case, data can be represented as a single table.

Page 68: acai01-updated.ppt

ILP Object Language Example

Good bargain cars and their ILP representation:

model     mileage  price  y/n  ILP representation
BMW Z3    50,000   £5000  +    gbc(z3,50000,5000).
Audi V8   30,000   £4000  +    gbc(v8,30000,4000).
Fiat Uno  90,000   £3000  -    :- gbc(uno,90000,3000).

Page 69: acai01-updated.ppt

LP as ILP Concept Language

The concept language of ILP is relations expressed as Horn clauses, e.g.:

equal(X,X).
greater(X,Y) :- X > Y.

Cf. a propositional logic representation:
(arg1=1 & arg2=1) or (arg1=2 & arg2=2) ...
– Tedious for finite domains and impossible otherwise.

Most often there is only one target predicate (concept).
– Exceptions exist, e.g., Progol 5.

Page 70: acai01-updated.ppt

Modes in ILP

Used to distinguish between:
• input attributes (mode +)
• output attributes (mode -)
of the predicate learned.

Mode # is used to describe attributes that must contain a constant in the predicate definition.

E.g., use mode car_type(+,+,#) to learn:

car_type(Doors,Roof,sports_car) :-
    Doors =< 2, Roof = convertible.


Page 73: acai01-updated.ppt

Modes in ILP

Used to distinguish between input attributes (mode +) output attributes (mode -) of the predicate

learned. Mode # used to describe attributes that must

contain a constant in the predicate definition. E.g., use mode car_type(-,-,#) to learncar_type(Doors,Roof,sports_car):-

(Doors = 1 ; Doors = 2), Roof = convertible.

Page 74: acai01-updated.ppt

Types in ILP

Types specify the range for each argument.

User-defined types are represented as unary predicates:

colour(blue). colour(red). colour(black).

Built-in types are also provided: nat/1, real/1, any/1 in Progol.

These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.

Page 75: acai01-updated.ppt

ILP Types and Modes: Example

Good bargain cars and their ILP representation (Progol):

modeh(1,gbc(+model,+mileage,+price))?

model     mileage  price  y/n  ILP representation
BMW Z3    50,000   5000   +    gbc(z3,50000,5000).
Audi V8   30,000   4000   +    gbc(v8,30000,4000).
Fiat Uno  90,000   3000   -    :- gbc(uno,90000,3000).

Page 76: acai01-updated.ppt

Positive Only Learning

A way of dealing with domains where no negative examples are available.
– E.g., learn the concept of non-self-destructive actions.

The trivial definition "anything belongs to the target concept" looks all right!

Trick: generate random examples and treat them as negative (see the sketch below).
– Requires generative type definitions.
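A minimal sketch of the trick, with illustrative domains standing in for generative type definitions (none of the specifics are from the slides):

import random

models = ["z3", "v8", "uno"]
positives = {("z3", 50000, 5000), ("v8", 30000, 4000)}

def random_example():
    # each argument is drawn from its (generative) type
    return (random.choice(models),
            random.randrange(0, 200001),    # mileage
            random.randrange(0, 20001))     # price

# random examples are overwhelmingly likely to fall outside a specific
# target concept, so they can stand in for negatives
pseudo_negatives = {e for e in (random_example() for _ in range(100))
                    if e not in positives}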

Page 77: acai01-updated.ppt

Background Knowledge

Only very simple mathematical relations, such as identity and "greater than", have been used so far:

equal(X,X).
greater(X,Y) :- X > Y.

These can also easily be hard-wired into the concept language of propositional learners.

ILP's big advantage: one can extend the concept language with user-defined concepts, or background knowledge.

Page 78: acai01-updated.ppt

Background Knowledge (2)

The use of certain BK predicates may be a necessary condition for learning the right hypothesis.

Redundant or irrelevant BK slows down the learning.

Example:
BK:     prod(Miles,Price,Threshold) :- Miles * Price < Threshold.
Modes:  modeh(1,gbc(#model,+miles,+price))?
        modeb(1,prod(+miles,+price,#threshold))?
Theory: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).

Page 79: acai01-updated.ppt

Choice of Background Knowledge

"In an ideal world one should start from a complete model of the background knowledge of the target population. In practice, even with the most intensive anthropological studies, such a model is impossible to achieve. We do not even know what it is that we know ourselves. The best that can be achieved is a study of the directly relevant background knowledge, though it is only when a solution is identified that one can know what is or is not relevant."

The Critical Villager, Eric Dudley

Page 80: acai01-updated.ppt

ILP Preference Bias

Typically a trade-off between generality and complexity:
– cover as many positive examples (and as few negative ones) as you can…
– …with as simple a theory as possible.

Some ILP learners allow users to specify their own preference bias.

Page 81: acai01-updated.ppt

Induction in ILP

Bottom-up (least general generalisation):
– Map a term into a variable
– Drop a literal from the clause body

Top-down (refinement operator):
– Instantiate a variable
– Add a literal to the clause body

Mixed techniques (e.g., Progol)

Page 82: acai01-updated.ppt

Example of Induction

BK:
q(b).
q(c).

Training examples:
p(b,a).
p(f,g).
:- p(i,j).

Generalisation lattice: the most specific clause p(b,a) :- q(b). lies below the intermediate clauses p(X,a). and p(X,Y) :- q(X)., which in turn lie below the most general clause p(X,Y).

Page 83: acai01-updated.ppt

Induction in Progol

For each training example:
– Find the most general clause T
– Find the most specific clause ⊥
– Search the space in between in a top-down fashion:

T = p(X,Y)
⊥ = p(X,a) :- q(X).
In between: p(X,a). and p(X,Y) :- q(X).

Page 84: acai01-updated.ppt

Summary of ILP Basics

• Symbolic
• Eager
• Knowledge-oriented (white-box) learner
• Complex, flexible hypothesis space
• Based on induction

Page 85: acai01-updated.ppt

Learning Pure Logic Programs vs. Decision Lists

Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.

Decision lists: the concept language includes the predicate cut (!).

The use of decision lists can make for simpler (more concise) theories.

Page 86: acai01-updated.ppt

Decision List Example

% action(Cat,ObservedAnimal,Action).

action(Cat,Animal,stay) :-
    dog(Animal),
    owner(Owner,Animal),
    owner(Owner,Cat), !.
action(Cat,Animal,run) :-
    dog(Animal), !.
action(Cat,Animal,stay).

Page 87: acai01-updated.ppt

Updating Decision Lists with Exceptions

action(Cat,caesar,run) :- !.
action(Cat,Animal,stay) :-
    dog(Animal),
    owner(Owner,Animal),
    owner(Owner,Cat), !.
action(Cat,Animal,run) :-
    dog(Animal), !.
action(Cat,Animal,stay).

Page 88: acai01-updated.ppt

Updating Decision Lists with Exceptions

Could be very beneficial for agents when immediate updating of the agent's knowledge is important: just add the exception at the top of the list (sketched below).

Computationally inexpensive: it does not require modifying the rest of the list.

Exceptions could be compiled into rules when the agent is inactive.
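A minimal Python sketch of this idea, mirroring the Prolog slides: rules are tried in order and the first match fires (the role of the cut), so an exception is simply added at the front without touching the rest of the list. All rules and names are illustrative.

from collections import deque

class DecisionList:
    def __init__(self, rules=()):
        self.rules = deque(rules)        # ordered (condition, action) pairs

    def prepend_exception(self, condition, action):
        self.rules.appendleft((condition, action))   # cheap, local update

    def act(self, cat, animal):
        for condition, action in self.rules:
            if condition(cat, animal):   # first matching rule fires
                return action

dogs = {"caesar", "rex", "blackie"}
owner = {"caesar": "richard", "rex": "richard",
         "blackie": "daniel", "felix": "daniel"}

dl = DecisionList([
    (lambda c, a: a in dogs and owner.get(a) == owner.get(c), "stay"),
    (lambda c, a: a in dogs, "run"),
    (lambda c, a: True, "stay"),
])
dl.prepend_exception(lambda c, a: a == "caesar", "run")
print(dl.act("felix", "blackie"))   # same owner -> stay
print(dl.act("felix", "caesar"))    # exception fires first -> run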

Page 89: acai01-updated.ppt

Replacing Exceptions with Rules: Before

action(Cat,caesar,run) :- !.
action(Cat,rex,run) :- !.
action(Cat,rusty,run) :- !.
action(Cat,Animal,stay) :-
    dog(Animal),
    owner(Owner,Animal),
    owner(Owner,Cat), !.

Page 90: acai01-updated.ppt

Replacing Exceptions with Rules: After

action(Cat,Animal,run) :-
    dog(Animal),
    owner(richard,Animal), !.
action(Cat,Animal,stay) :-
    dog(Animal),
    owner(Owner,Animal),
    owner(Owner,Cat), !.

Page 91: acai01-updated.ppt

Eager ILP vs. Analogical Prediction

Eager learning: learn a theory, dispose of the observations.

Lazy learning:
– keep all observations
– compare new observations with old ones to classify
– no explanation provided.

Analogical Prediction (Muggleton, Bain '98):
– Combines the often higher accuracy of lazy learning with an intelligible, explicit hypothesis typical of ILP.
– Constructs a local theory for each new observation that is consistent with the largest number of training examples.

Page 92: acai01-updated.ppt

Analogical Prediction Example

owner(richard,caesar).   action(Cat,caesar,run).
owner(richard,rex).      action(Cat,rex,run).
owner(daniel,blackie).   action(Cat,blackie,stay).
owner(richard,rusty).    action(Cat,rusty,?).

Page 93: acai01-updated.ppt

Analogical Prediction Example

owner(richard,caesar).   action(Cat,caesar,run).
owner(richard,rex).      action(Cat,rex,run).
owner(daniel,blackie).   action(Cat,blackie,stay).
owner(richard,rusty).    action(Cat,Dog,run) :- owner(richard,Dog).

Page 94: acai01-updated.ppt

Timing Analysis of Theories Learned with ILP

The more training examples, the more accurate the theory…

…but how long does it take to produce an answer?

No theoretical work on the subject so far.

Experiments show nontrivial behaviour (reminiscent of the phase transitions observed in SAT).

Page 95: acai01-updated.ppt

Timing Analysis of ILP Theories: Example

Kazakov, PhD thesis (three timing plots):
• left: a simple theory with low coverage; succeeds or quickly fails ⇒ high speed
• middle: medium coverage, fragmentary theory, lots of backtracking ⇒ low speed
• right: a general theory with high coverage; less backtracking ⇒ high speed

Page 96: acai01-updated.ppt

Machine Learning and ILP for MAS: Part II

• Integration of ML and Agents
• ILP and its potential for MAS
• Agent Applications of ILP
• Learning, Natural Selection and Language

Page 97: acai01-updated.ppt

Agent Applications of ILP

Relational Reinforcement Learning (Džeroski, De Raedt, Driessens):
• combines reinforcement learning with ILP
• generalises over previous experience and goals (the Q-table) to produce logical decision trees
• results can be used to address new situations

Don't miss the next talk (~11:40–13:10)!

Page 98: acai01-updated.ppt

Agent Applications of ILP

ILP for Verification and Validation of MAS (Jacob, Driessens, De Raedt):
• Also uses FOPL decision trees
• Observes agents' behaviour and represents it as a logical decision tree
• The rules in the decision tree can be compared with the designers' intentions
• Test domain: RoboCup

Page 99: acai01-updated.ppt

Agent Applications of ILP

Reid & Ryan 2000:
• ILP used to help hierarchical reinforcement learning
• ILP constructs high-level features that help discriminate between (state, action) transitions with non-deterministic behaviour

Page 100: acai01-updated.ppt

Agent Applications of ILP

Matsui et al. 2000:
• Proposed an ILP agent that avoids actions which will probably fail to achieve the goal.
• Application domain: RoboCup

Alonso & Kudenko '99:
• ILP and EBL for conflict simulations.

Page 101: acai01-updated.ppt

The York MA Environment

Species of 2D agents competing for renewable, limited resources.

Agents have simple hard-coded behaviour based on the notion of drives.

Each agent can optionally have an ILP (Progol) mind – a separate process receiving observations and suggesting actions.

Allows the values of inherited features to be selected through natural selection.

Page 102: acai01-updated.ppt

The York MA Environment

Page 103: acai01-updated.ppt

The York MA Environment

ILP hasn't been used in the experiments yet (to come soon).

A number of experiments using inheritance have studied kinship-driven altruism among agents.

The start-up project was sponsored by Microsoft.

Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.

Page 104: acai01-updated.ppt

Machine Learning and ILP for MAS: Part II

• Integration of ML and Agents
• ILP and its potential for MAS
• Agent Applications of ILP
• Learning, Natural Selection and Language

Page 105: acai01-updated.ppt

Learning and Natural Selection

In learning, search is trivial; choosing the right bias is hard.

But the choice of learning bias is always external to the learner!

To find the best-suited bias, one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.

Page 106: acai01-updated.ppt

Darwinian vs. Lamarckian Evolution

Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.

The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment.

What is passed on to the offspring is useful, but very general.

Page 107: acai01-updated.ppt

Darwinian vs. Lamarckian Evolution (2)

Lamarckian evolution: individual experience acquired in life can be inherited.

Not the case in nature; that doesn't mean we can't use it.

The inherited concepts may be too specific and not of general importance.

Page 108: acai01-updated.ppt

Learning and Language

Language uses concepts which are:
– specific enough to be useful to most/all speakers of that language
– general enough to correspond to shared experience (otherwise, how would one know what the other is talking about!)

The concepts of a language serve as a learning bias which is "inherited" not in genes but through education.

Page 109: acai01-updated.ppt

Communication and Learning

Language:
– helps one learn (in addition to inherited biases)
– allows one to communicate knowledge.

Distinguish between:
– Knowledge: things that one can explain to another by means of a language.
– Skills: the rest; they require individual learning and cannot be communicated.

"If watching was enough to learn, the dog would have become a butcher." Bulgarian proverb.

Page 110: acai01-updated.ppt

Communication and Learning (2)

In NLP, forgetting [examples] may be harmful (van den Bosch et al.).

"An expert is someone who does not think anymore – he knows." Frank Lloyd Wright.

It may be difficult to communicate what one has learned because of:
– Limited bandwidth (for lazy learning)
– The absence of appropriate concepts in the language (for black-box learning)

Page 111: acai01-updated.ppt

Communication and Learning (3)

In a society of communicating agents, less accurate white-box learning may be better than more accurate but expensive learning that cannot be communicated: the reduced performance can be outweighed by the much lower cost of learning.

Page 112: acai01-updated.ppt

Our Current Research

• Inductive Bias Selection (Shane Greenaway)
• Role Learning (Spiros Kapetanakis)
• Inductive Learning for Games (Alex Champandard)
• Machine Learning of Natural Language in MAS (Mark Bartlett)

Page 113: acai01-updated.ppt

The End