CS 416 Artificial Intelligence, Lecture 3: Agents


Page 1: CS 416 Artificial Intelligence, Lecture 3: Agents

CS 416 Artificial Intelligence

Lecture 3

Agents

Page 2: History of AI

Read the complete story in the text

• Early to mid 50s

– Alan Turing, Marvin Minsky (student of von Neumann), John McCarthy, Allen Newell, Herbert Simon

Page 3: History of AI: 1952-1969

Great successes!

• Solving hard math problems

• Game playing

• LISP was invented by McCarthy (1958)

– second-oldest high-level language still in use (after Fortran)

– could accept new axioms at runtime

• McCarthy went to Stanford and Marvin Minsky started a lab at MIT

– Both labs are powerhouses in AI to this day

Page 4: History of AI: 1966-1973

A dose of reality – overhyped

• Systems failed to play chess and translate Russian

– Computers were ignorant of the context of their logic

– Problems were intractable

algorithms that work in principle may not work in practice

Combinatorial Explosion / Curse of Dimensionality

– A fatal flaw in neural networks was exposed

though the flaw was first addressed in 1969, neural networks did not return to vogue until the late 1980s

Page 5: AI History: 1969-1979

Knowledge-based Systems (Expert Systems)

• Problem: general logical algorithms could not be applied to realistic problems

• Solution: accumulate specific logical algorithms

– DENDRAL – infer chemical structure

knowledge of scientists boiled down to cookbook logic

a large number of special-purpose rules worked well

Page 6: AI History: 1980-present

Let the good times roll

• The demonstrated success of AI invited investment

– from millions to billions of dollars in 10 years

• Extravagant AI promises again led to an “AI Winter” when investment in the technology dropped (1988)

Neural networks came back from the dead (1986)

Page 7: AI History: 1987-present

AI becomes a science

• More repeatability of experiments

• More development of mathematical underpinnings

• Reuse of time-tested models

Intelligent Agents (1994)

• AI systems exist in real environments with real sensory inputs

• Niches of AI need to be reorganized

Page 8: AI History: Where are We Now?

• Autonomous planning: scheduling operations aboard a robot

– Some notable failures (Dante falls into a crater after one step) and shining successes (the Mars rover Spirit)

• Game playing: Kasparov lost to IBM’s Deep Blue in chess

– Rules were changed to prevent the computer from retraining overnight and to provide human players with more examples of computerized play

Page 9: AI History: Where are We Now?

• Autonomous control: CMU’s NAVLAB drove from Pittsburgh to San Diego under computer control 98% of the time

• Stanford’s vehicle won the 2005 DARPA Grand Challenge; CMU’s 2005 vehicle crashed at the starting line

• Logistics: deployment of troops to Iraq

• Robotics: remote heart operations

• Human genome, protein folding, drug discovery

• Stock market

Page 10: Review

We’ll study systems that act rationally

• They need not necessarily “think” or act like humans

• They need not “think” in rational ways

The domain of AI research changes over time

AI research draws from many fields

• Philosophy, psychology, neuroscience, mathematics, economics, mechanical engineering, linguistics

AI has had ups and downs since 1950

Page 11: Outline

1. What is an agent?

2. What is rationality?

3. Where do agents operate?

4. What goes into an agent?

Page 12: 1. What is an agent?

Perception

• Sensors receive input from the environment

– Keyboard clicks

– Camera data

– Bump sensor

Action

• Actuators impact the environment

– Move a robotic arm

– Generate output for a computer display

Page 13: Perception

Percept

• Perceptual inputs at an instant

• May include perception of internal state

Percept Sequence

• Complete history of all prior percepts

Do you need a percept sequence to play chess?

Page 14: An agent as a function

The agent maps a percept sequence to an action

• Agent: f : P* → A, from percept sequences to actions

– The set of all inputs is known as the state space

• Repeating loop: sense, append the percept to the sequence, act on f(sequence)

We must construct f( ), our agent

• It must act rationally
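The loop above can be sketched in a few lines. This is an illustrative sketch, not the course's implementation; the "dirty"/"clean" percepts and the trivial policy inside f are invented for concreteness:

```python
# Sketch of an agent as a function from percept sequences to actions.
# Percepts and actions here are invented for illustration.

def agent_f(percept_sequence):
    """f : P* -> A — map the complete percept history to an action."""
    # A toy policy: act on the most recent percept only.
    latest = percept_sequence[-1]
    return "clean" if latest == "dirty" else "move"

def run(percepts):
    """The repeating loop: sense, append to the sequence, act."""
    sequence, actions = [], []
    for p in percepts:
        sequence.append(p)                  # percept sequence grows each step
        actions.append(agent_f(sequence))   # act on the whole history
    return actions

# run(["dirty", "clean", "dirty"]) -> ["clean", "move", "clean"]
```

Note that f receives the whole sequence even though this toy policy ignores all but the last percept; a smarter agent could use the full history.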

Page 15: The agent’s environment

What is known about the percepts?

• Quantity, range, certainty…

– If percepts are finite, could a table store the mapping?

What is known about the environment?

• Is f(a, e) a known, predictable function?

– Do you know what effect your actions will have?

More on this later

Page 16: 2. Is your agent rational?

We agree on what an agent must do

Can we evaluate its quality?

Performance Metrics

• Very important

• Frequently the hardest part of the research problem

• Design these to suit what you really want to happen

Page 17: Performance vis-à-vis rationality

For each percept sequence, a rational agent should select an action that maximizes its performance measure

Example: autonomous vacuum cleaner

• What is the performance measure?

• Penalty for eating the cat? How much?

• Penalty for missing a spot?

• Reward for speed?

• Reward for conserving power?

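One common way to answer these questions is a weighted score over logged events. The events and weights below are invented; the point of the sketch is that the designer chooses them to encode what should really happen:

```python
# Hedged sketch of a performance measure for the vacuum example.
# Event names and weights are assumptions, not from the course.

WEIGHTS = {
    "square_cleaned":  +10,
    "spot_missed":      -5,
    "unit_power_used":  -1,
    "cat_eaten":     -1000,   # "how much?" is itself a design decision
}

def performance(events):
    """Score one run of the agent from a log of {event: count} pairs."""
    return sum(WEIGHTS[event] * count for event, count in events.items())

score = performance({"square_cleaned": 8, "spot_missed": 1, "unit_power_used": 20})
# 8*10 - 1*5 - 20*1 = 55
```

A rational agent then picks, for each percept sequence, the action expected to maximize this score.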

Page 18: Learning and Autonomy

Learning

• To update the agent function, f( ), in light of the observed performance of percept-sequence-to-action pairs

– Learn what you know better

Learn from focused trial and error

– Learn to distinguish what you don’t know

What parts of the state space to explore?

Page 19: Adding intelligence to the agent function

At design time

• Some agents are designed with a clear procedure to improve performance over time. Really the engineer’s intelligence.

– Camera-based user identification

At runtime

• With experience, the agent changes its program (parameters)

Page 20: 3. Where do agents operate?

An effective way to classify agents

• What does “where” mean?

– What do the inputs to the agent look like?

Discrete, repetitive, noisy, etc.

Page 21: How big is your percept?

Dung Beetle

• Almost no perception (percept)

– Rational agents fine-tune actions based on feedback

Sphex Wasp

• Has percepts, but lacks a percept sequence

– Rational agents change plans entirely when fine-tuning fails

A Dog

• Equipped with percepts and percept sequences

– Reacts to the environment and can significantly alter behavior

Page 22: Qualities of a task environment (1)

Fully Observable

• The agent need not store any aspects of state

– Hansel and Gretel’s bread crumbs

– The volume of observables may be overwhelming

Partially Observable

• Some data is unavailable

– Maze

– Noisy sensors

Page 23: Qualities of a task environment (2)

Deterministic

• Always the same outcome for an environment/action pair

Stochastic

• Not always predictable – random

Partially Observable vs. Stochastic

• My cats think the world is stochastic

– It’s really only partially observable by them

• Physicists think the world is deterministic

– Somewhere there is a “god function” that explains it all

Page 24: Qualities of a task environment (3)

Markovian

• The future environment depends only on the current environment and action

Episodic

• The percept sequence can be segmented into independent temporal categories

– Behavior at a traffic light is independent of previous traffic

Sequential

• The current decision could affect all future decisions

Page 25: Qualities of a task environment (4)

Static

• Environment doesn’t change over time

– Crossword puzzle

Dynamic

• Environment changes over time

– Driving a car

Semi-dynamic

• Environment is static, but the performance metrics are dynamic

– You never make a second first impression

Page 26: Qualities of a task environment (5)

Discrete

• Values of a state-space feature (dimension) are constrained to distinct values from a finite set

– Blackjack: card values and hand totals form a finite set

Continuous

• A variable has infinite variation

– Antilock brakes: wheel speed varies continuously

– Are computers really continuous?

Page 27: Qualities of a task environment

Towards a terse description of problem domains

• Environment: features, dimensionality, degrees of freedom

• Observable?

• Predictable?

• Dynamic?

• Continuous?

• Performance metric
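The checklist above lends itself to a small structured record. A sketch, with field names mirroring the slide and classifications taken from standard textbook examples (chess is static only if played without a clock):

```python
# A terse, structured description of task environments.
# Field names follow the checklist above; values are illustrative.

from dataclasses import dataclass

@dataclass
class TaskEnvironment:
    name: str
    observable: str       # "fully" or "partially"
    deterministic: bool   # predictable outcome for each action?
    static: bool          # environment unchanged while agent deliberates?
    discrete: bool        # finite set of values per feature?

# Two classic contrasting examples:
chess = TaskEnvironment("chess (no clock)", "fully", True, True, True)
taxi  = TaskEnvironment("taxi driving", "partially", False, False, False)
```

Writing the description down this way forces the designer to commit to each quality before choosing an agent program.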

Page 28: 4. Building Agent Programs

Example: the table approach

• Build a table mapping states to actions

– Chess has 10^150 entries (there are about 10^80 atoms in the universe)

– I’ve said memory is free, but keep it within the confines of the boundable universe

• Still, tables have their place

We will discuss four agent program principles
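The table approach is literal enough to write down. This sketch uses an invented two-square vacuum world (locations "A" and "B"); note that the key is the entire percept sequence, which is why the table explodes for any realistic domain:

```python
# Table-driven agent: a lookup from percept sequence to action.
# The world and entries are invented; feasible only for tiny domains.

TABLE = {
    (("A", "dirty"),):                 "suck",
    (("A", "clean"),):                 "right",
    (("A", "clean"), ("B", "dirty")):  "suck",
    # ...one entry per possible percept sequence, growing without bound
}

def table_agent(percept_sequence, table=TABLE):
    """Look up the whole history; fall back to no-op if unlisted."""
    return table.get(tuple(percept_sequence), "no-op")
```

For chess the same scheme would need on the order of 10^150 rows, which is the point of the slide: the idea is sound, the storage is not.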

Page 29: Simple Reflex Agents

• Sense the environment

• Match sensations with rules in a database

• A rule prescribes an action

Reflexes can be bad

• Don’t catch a falling iron.

Inaccurate information

• Misperception can trigger a reflex when inappropriate

But rule databases can be made large and complex
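The sense/match/act cycle can be sketched as an ordered rule database of condition-action pairs. The rules and percept fields below are invented for illustration; only the current percept is consulted, never the history:

```python
# Simple reflex agent: match the current percept against condition-action
# rules; the first matching rule prescribes the action. No state is kept.

RULES = [
    # (condition on current percept, action) — illustrative entries
    (lambda p: p["object"] == "falling iron", "step back"),  # reflexes can be bad: don't catch it
    (lambda p: p["dirty"],                    "suck"),
    (lambda p: True,                          "move"),       # default rule
]

def reflex_agent(percept):
    for condition, action in RULES:
        if condition(percept):
            return action
```

A misperception (e.g. a sensor wrongly reporting a falling iron) triggers the matching reflex regardless, which is the "inaccurate information" failure mode above.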

Page 30: Simple Reflex Agents w/ Incomplete Sensing

How can you react to things you cannot see?

• Vacuum cleaning a room w/o any sensors

• Vacuum cleaning a room w/ a bump sensor

• Vacuum cleaning a room w/ GPS and a perfect map of a static environment

Randomized actions are very useful here
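Why randomization helps: with only a bump sensor the agent cannot tell where it is, so a fixed rule can loop forever in some room shapes, while a random escape action eventually breaks any such loop. A minimal sketch (the turn angles and seeded generator are assumptions for reproducibility):

```python
# Randomized reflex rule for a bump-sensor vacuum: on a bump, turn a
# random amount; otherwise drive forward. Illustrative only.

import random

_rng = random.Random(0)  # seeded so runs are reproducible

def bump_agent(bumped):
    if bumped:
        # A deterministic fixed turn could trap the agent in a cycle;
        # a random turn cannot be trapped forever.
        return ("turn", _rng.choice([90, 180, 270]))
    return ("forward", None)
```

The expected cleaning time is worse than an informed agent's, but the randomized agent needs no map and no localization.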

Page 31: CS 416 Artificial Intelligence Lecture 3 Agents Agents

Model-based Reflex Agents

So when you can’t see something, you model it!

• Create an internal variable to store your expectation of variables you can’t observe

• If I throw a ball to you and it falls short, do I know why?

– I don’t really know why…

Aerodynamics, mass, my energy levels…

– I do have a model

Ball falls short, throw harder
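The ball-throwing intuition can be sketched as an agent that keeps a single internal state variable, its current effort, and adjusts it from feedback it cannot directly explain. All the numbers and update factors here are illustrative assumptions:

```python
# Minimal sketch of a model-based reflex agent: an internal variable
# stands in for everything unobservable (aerodynamics, mass, energy...).

class Thrower:
    def __init__(self, effort=5.0):
        self.effort = effort          # internal model variable

    def act(self):
        return self.effort            # throw with the current effort

    def update_model(self, outcome):
        # We don't know *why* the throw fell short; the model just
        # adjusts: fell short -> throw harder, overshot -> throw softer.
        if outcome == "short":
            self.effort *= 1.2
        elif outcome == "long":
            self.effort *= 0.8

agent = Thrower()
agent.update_model("short")
harder = agent.act()                  # effort nudged up from 5.0
```

The agent never observes the hidden causes; it only maintains an expectation that is good enough to pick the next action.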

Page 32: CS 416 Artificial Intelligence Lecture 3 Agents

Model-based Reflex Agents

Admit it, you can’t see and understand everything

Models are very important!

• We all use models to get through our lives

– Psychologists have many names for these context-sensitive models

• Agents need models too

Page 33: CS 416 Artificial Intelligence Lecture 3 Agents

Goal-based Agents

Overall goal is known, but lacking a moment-to-moment performance measure

• Don’t know exactly which action maximizes performance at each step

Example:

• How to get from A to B?

– Current actions have future consequences

– Search and Planning are used to explore paths through state space from A to B
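A minimal sketch of what “exploring paths through state space” looks like: breadth-first search from A to B over a small hand-made graph (the map itself is an assumption for illustration):

```python
from collections import deque

# Minimal sketch: breadth-first search over a toy state space.
# Each key maps a state to the states reachable in one step.

def bfs_path(graph, start, goal):
    frontier = deque([[start]])       # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path               # first path found is shortest
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                       # no route exists

city = {"A": ["C", "D"], "C": ["B"], "D": ["C", "E"], "E": ["B"]}
route = bfs_path(city, "A", "B")      # -> ["A", "C", "B"]
```

This is exactly the goal-based setting: no per-step score, only a goal test, so the agent must look ahead at the future consequences of each move.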

Page 34: CS 416 Artificial Intelligence Lecture 3 Agents

Utility-based Agents

Goal-directed agents that have a utility function

• Function that maps internal and external states into a scalar

– A scalar is a number used to make moment-to-moment evaluations of candidate actions
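The idea can be sketched as an agent that scores each candidate action’s predicted outcome with a utility function and picks the maximizer. The state variables, dynamics, and utility weights below are made-up assumptions:

```python
# Minimal sketch of a utility-based agent: evaluate each candidate
# action's successor state with a scalar utility and choose the best.

def utility(state):
    # Hypothetical scoring: prefer states near the goal with battery left.
    return 10 * state["battery"] - state["distance_to_goal"]

def transition(state, action):
    # Toy dynamics: moving burns battery but shortens the distance.
    moves = {"wait":  (0.0, 0),
             "walk":  (-0.1, 3),
             "drive": (-0.3, 8)}
    d_batt, d_dist = moves[action]
    return {"battery": state["battery"] + d_batt,
            "distance_to_goal": state["distance_to_goal"] - d_dist}

def choose_action(current, candidates):
    """Pick the action whose predicted successor scores highest."""
    return max(candidates, key=lambda a: utility(transition(current, a)))

start = {"battery": 1.0, "distance_to_goal": 10}
best = choose_action(start, ["wait", "walk", "drive"])   # -> "drive"
```

Unlike a bare goal test, the scalar lets the agent rank every option at every moment, not just recognize the finish line.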

Page 35: CS 416 Artificial Intelligence Lecture 3 Agents

Learning Agents

Desirable to build a system that “figures it out”

• Generalizable

• Compensates for absence of designer knowledge

• Reusable

• Learning by example isn’t easy to accomplish

– What exercises do you do to learn?

– What outcomes do you observe?

– What inputs do you alter?

Page 36: CS 416 Artificial Intelligence Lecture 3 Agents

Learning Agents

Performance Element
• Selecting actions (this is the “agent” we’ve been discussing)

Problem Generator
• Provides suggestions for new tasks to explore state space

Critic
• Provides learning element with feedback about progress (are we doing good things or should we try something else?)

Learning Element
• Making improvements (how is the agent changed based on experience)

Page 37: CS 416 Artificial Intelligence Lecture 3 Agents

A taxi driver

Performance Element
• Knowledge of how to drive in traffic

Problem Generator
• Proposes new routes to try, in hopes of improving driving skills

Critic
• Observes tips from customers and horn honking from other cars

Learning Element
• Relates low tips to actions that may be the cause
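The four components can be sketched as a single loop around the taxi scenario. Everything numeric here (the feedback formula, the update rate, the route names) is an illustrative assumption, not the lecture’s design:

```python
import random

# Minimal sketch wiring the four learning-agent components together
# for the taxi example. All numbers and names are made up.

class TaxiLearningAgent:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.route_quality = {}          # learned knowledge

    def performance_element(self, route):
        # "How to drive": prefer routes believed to earn good tips.
        return self.route_quality.get(route, 0.5)

    def problem_generator(self, routes):
        # Suggest an unexplored route; otherwise exploit the best known.
        untried = [r for r in routes if r not in self.route_quality]
        if untried:
            return self.rng.choice(untried)
        return max(routes, key=self.performance_element)

    def critic(self, tip, honks):
        # Feedback signal: big tips are good, honking is bad.
        return tip - 0.1 * honks

    def learning_element(self, route, feedback):
        # Improvement: nudge the route's estimated quality toward feedback.
        old = self.route_quality.get(route, 0.5)
        self.route_quality[route] = old + 0.5 * (feedback - old)

agent = TaxiLearningAgent()
route = agent.problem_generator(["highway", "back_streets"])
agent.learning_element(route, agent.critic(tip=2.0, honks=3))
```

The division of labor matches the slide: the problem generator explores, the critic turns raw observations (tips, honks) into a progress signal, and the learning element changes what the performance element will do next time.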

Page 38: CS 416 Artificial Intelligence Lecture 3 Agents

Review

Outlined families of AI problems and solutions

I consider AI to be a problem of searching

• Countless things differentiate search problems

– Number of percepts, number of actions, amount of a priori knowledge, predictability of the world…

• Textbook is divided into sections based on these differences

Page 39: CS 416 Artificial Intelligence Lecture 3 Agents

Sections of book

• Problem solving: Searching through predictable, discrete environments

• Knowledge and Reasoning: Searching when a model of the world is known

– a leads to b and b leads to c… so go to a to reach c

• Planning: Refining search techniques to take advantage of domain knowledge

• Uncertainty: Using statistics and observations to collect knowledge

• Learning: Using observations to understand the way the world works and to act rationally within it
