Mind is About Predictions Rich Sutton AT&T Labs with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester


Mind is About Predictions

Hypothesis: Knowledge is predictive
• About what-leads-to-what, under what ways of behaving
– What will I see if I go around the corner?
– Objects: What will I see if I turn this over?
– Active vision: What will I see if I look at my hand?
– Value functions: What is the most reward I know how to get?
• Such knowledge is learnable, chainable

Hypothesis: Mental activity is working with predictions
• Learning them
• Combining them to produce new predictions (reasoning)
• Converting them to action (planning, reinforcement learning)
• Figuring out which are most useful

Philosophical and Psychological Roots

• Like classical British empiricism (1650–1800)
– Knowledge is about experience
– Experience is central
• But not anti-nativist (evolutionary experience)
• Emphasizing sequential rather than simultaneous events
– Replace association/contiguity with prediction/contingency
• Close to Tolman’s “Expectancy Theory” (1932–1950)
– Cognitive maps, vicarious trial and error
• Psychology struggled to make it a science (1890–1950)
– Introspection
– Behaviorism, operational definitions
– Objectivity

Modern Computational View of Mind

• OK to talk about insides of minds
• OK to talk about the function and purpose of a design
• We talk about Why
– Why a system works
– Why it should compute X and in manner Y
– Why such a system should achieve purpose Z
• This is new, and resolves classical struggles
– Servo-mechanisms, state-transition probabilities
– Utility and decision theory
– Information as signal – subjective (private) yet clear
• Purpose defines and constrains mental constructs

Informational View of Mind

• Mind does information processing
• Mind exchanges information with the world
• Only experience is known for sure
– Anything more public or “objective” is suspect
• World is an I-O entity, a black box
• Although we often seem to talk about what is inside, all we can sensibly talk about is I-O behavior
• This “interactionist stance” seems to follow from the Informational View of Mind (IVoM)

[Diagram: Mind and World, linked by experience]

Is Mind about Predictions? OR Is Mind about Action (or Policies)?

• Of course it is ultimately about action
• But action generation methods are relatively clear
– Value functions and decision theory
  • Pick action that maximizes expected cumulative reward
– OR
  • Policy gradient RL methods
  • Execution-time search
  • Reflexes and behavior-based robotics
  • Learning-extended reflexes and conditioning
• Flexible cognition requires more than action generation
• Most mental activity is working with predictions

An old, simple, appealing idea

• Mind as prediction engine!
• Predictions are learnable, combinable
• They represent cause and effect, and can be pieced together to yield plans
• Perhaps this old idea is essentially correct.
• Just needs
– Development, revitalization in modern forms
– Greater precision, formalization, mathematics
– The computational perspective to make it respectable
– Imagination, determination, patience
  • Not rushing to performance
  • Not building in ungrounded world knowledge

Topics

• Super-Predictions

• Combining Predictions (reasoning and planning)

• Predictions and State

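Experience: $\ldots, s_t, a_t, s_{t+1}, a_{t+1}, s_{t+2}, a_{t+2}, \ldots$

1-step Prediction (from state $X$, under action $a$, to state $Y$):

$\Pr\{s_{t+1} = Y \mid s_t = X,\ a_t = a\}$

k-step Prediction (from $X$ to $Y$):

$\Pr\{s_{t+k} = Y \mid s_t = X,\ a_t, \ldots, a_{t+k} \text{ given by } \pi\}$

In general, predictions depend on actions, on policies.
And there is a huge space of policies… which can be closed loop.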
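A k-step prediction under a fixed policy can be computed by pushing a state distribution forward one step at a time. The sketch below is only an illustration and is not part of the original slides: the two-state world, its transition probabilities, and the names `k_step_prediction`, `P`, and `policy` are assumptions chosen for the example.

```python
def k_step_prediction(P, policy, start, k):
    """Return Pr{s_{t+k} = Y | s_t = start, actions given by policy} for every Y.

    P[s][a] maps each next state to its 1-step probability (hypothetical model).
    policy[s] is the action taken in state s (a deterministic, closed-loop policy).
    """
    dist = {start: 1.0}
    for _ in range(k):
        nxt = {}
        for s, p in dist.items():
            # Apply the 1-step prediction for the action the policy picks in s.
            for s2, q in P[s][policy[s]].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
    return dist


# Hypothetical world: from X, "go" reaches Y with probability 0.9.
P = {
    "X": {"go": {"Y": 0.9, "X": 0.1}, "stay": {"X": 1.0}},
    "Y": {"go": {"Y": 1.0}, "stay": {"Y": 1.0}},
}
policy = {"X": "go", "Y": "stay"}

one_step = k_step_prediction(P, policy, "X", 1)
two_step = k_step_prediction(P, policy, "X", 2)
print(one_step["Y"], two_step["Y"])
```

In this toy world the probability of being in Y is about 0.9 after one step and about 0.99 after two. Changing `policy` changes the prediction, which is the sense in which predictions depend on policies.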

The Simplest Predictions

Simple Mixture Predictions

• Where will I be in 10–20 steps?
• Where will I be in roughly 10 steps?
• Arbitrary termination profiles are possible

[Figure: termination profiles over time, with marks at now, 10 steps, and 20 steps]

Closed-loop termination: Terminate depending on what happens
• Where will I be when X happens?

[Figure: short-term, medium-term, and long-term terminations]

Closed-loop termination loosens the time-specificity of predictions
• Instead of “what will I see at t+100?”
• Can say “what will I see when I open the box?” or “when John arrives?” “when the bus comes?” “when I get to the store?”
• Will we elect a black or a woman president first?
• Where will the tennis ball be when it reaches me?
• What time will it be when the talk starts?

A substantial increase in expressiveness

Super-Predictions

Closed-loop terminations and closed-loop policies correspond to arbitrary experiments and the results of those experiments

• What will I see if I go into the next room?
• What time will it be when the talk is over?
• Is there a dollar in the wallet in my pocket?
• Where is my car parked?
• Can I throw the ball into the basket?
• Is this a chair situation?
• What will I see if I turn this object around?

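Anatomy of a Super-Prediction

1 Predictor
Recognizes the conditions, makes the prediction
$p : S \to M$

2 Experiment
– policy: $\pi : S \to A$ or $2^A$
– termination condition: $\beta : S \to [0,1]$
– measurement function(s): $m : \{S \times A\}^* \to M$

3 Goal
A function of the anticipated measurement, to be maximized by choice of policy and termination
$g : M \to \Re$

The predictor approximates the expected measurement over trajectories $\xi = a_t s_{t+1} \ldots s_{t+k}$:

$p(s) \approx \sum_{\xi \in \{A \times S\}^*} \Pr\{\xi \mid s_t = s, \pi, \beta\}\, m(\xi) = E_{\pi,\beta}\{m(\xi)\}$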
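One way to read the definition of $p(s)$ is as an expectation that could be estimated by running the experiment many times. The following is a minimal Monte Carlo sketch under stated assumptions: the corridor world, the 0.8 success probability, and every function name here (`run_experiment`, `estimate_prediction`, `policy`, `beta`, `step`, `measure`) are hypothetical and do not come from the slides.

```python
import random


def run_experiment(s, policy, beta, step, measure, rng):
    """Follow the policy from state s until termination; return m(xi)."""
    xi = []  # trajectory a_t, s_{t+1}, ..., a_{t+k-1}, s_{t+k}
    while True:
        a = policy(s)
        s = step(s, a, rng)
        xi.append((a, s))
        # beta(s) is the probability of terminating in the new state.
        if rng.random() < beta(s):
            return measure(xi)


def estimate_prediction(s, policy, beta, step, measure, n=20000, seed=0):
    """Sample-average estimate of p(s) = E_{pi,beta}{ m(xi) }."""
    rng = random.Random(seed)
    total = sum(run_experiment(s, policy, beta, step, measure, rng) for _ in range(n))
    return total / n


GOAL = 4  # hypothetical corridor with states 0..4


def policy(s):
    return "right"


def beta(s):
    return 1.0 if s == GOAL else 0.0  # closed-loop termination: stop on arrival


def step(s, a, rng):
    # The "right" action succeeds with probability 0.8 (an assumed number).
    return s + 1 if a == "right" and rng.random() < 0.8 else s


def measure(xi):
    return len(xi)  # transient measurement: elapsed steps


estimate = estimate_prediction(0, policy, beta, step, measure)
print(round(estimate, 2))
```

In this toy world the true expected number of steps is 4 / 0.8 = 5, so the estimate should land close to 5. A learned predictor would approximate the same quantity from the state alone, without running the experiment.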

Example: Open-the-door

• Predictor: use visual input to estimate
– Probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door)
– Expected cumulative cost (sub-par reward) in trying
• Experiment
– Policy for walking up to the door, shaping grasp of handle, turning, pulling, and opening the door
– Terminate on successful opening or various failure conditions
– Measure outcome and cumulative cost
• Goal
– Sum of expected cost and expected value of outcome
– Can be used to define experiment’s policy and termination

RoboCup-Soccer Example

Safe to pass? Predict the outcome of choosing to pass

• The pass will take several steps to set up
– choosing to pass involves a whole action policy
• You may choose not to pass halfway through
• Terminations and outcomes:
– pass is aborted
– opponents touch the ball before teammate
– teammate touches first, appears to control ball
– ball goes out of bounds

Example: Pass-to-Teammate

• Predictor uses perceived positions of ball, opponents, etc. to estimate probabilities of
– Successful pass, openness of receiver
– Interception
– Reception failure
– Aborted pass, in trouble
– Aborted pass, something better to do
– Loss of time
• Experiment
– Policy for maneuvering ball, or around ball, to set up and pass
– Termination strategy for aborting, recognizing completion
– Measurement of outcome, time
• Goal
– Some combination of outcome values, time, openness of receiver





Combining Predictions I: Composition

If the mind is about predictions, then thinking is combining predictions to produce new ones

[Diagram: prediction X → Y under π1, β1 with transient measurement T1; prediction Y → Z under π2, β2 with transient measurement T2; composed prediction X → Z under “π1β1 then π2β2” with transient measurement T1 + T2]

Here each prediction is assumed to predict
• A transient measurement (e.g., elapsed time, cumulative reward)
• A final measurement (e.g., partial distribution of outcome states)

The new prediction does not necessarily have a goal

Combining Predictions I: Composition (with uncertain outcomes)

[Diagram: the first prediction from X reaches Y with probability .8 and the alternative outcomes Y′ and Y′′ with probability .1 each; the composed prediction X → Z is “π1β1 then, if Y, π2β2”, with transient measurement T1 + .8 T2 and final outcomes Z (.8), Y′ (.1), Y′′ (.1)]
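The composition with uncertain outcomes can be sketched in a few lines. This is an illustrative sketch rather than the method used in the talk: the dictionary layout, the function name `compose`, and the numbers T1 = 3 and T2 = 5 are assumptions; only the .8/.1/.1 outcome probabilities come from the slide.

```python
def compose(first, second, via):
    """Compose two predictions: run `first`, then run `second` only if `via` is reached.

    Each prediction is a dict with a transient measurement ("time") and a
    final measurement ("outcomes": a partial distribution over outcome states).
    """
    p_via = first["outcomes"].get(via, 0.0)
    # Outcomes of the first experiment other than `via` end the composite.
    outcomes = {s: p for s, p in first["outcomes"].items() if s != via}
    for s, p in second["outcomes"].items():
        outcomes[s] = outcomes.get(s, 0.0) + p_via * p
    # The second transient measurement only accrues when `via` is reached.
    return {"time": first["time"] + p_via * second["time"], "outcomes": outcomes}


x_to_y = {"time": 3.0, "outcomes": {"Y": 0.8, "Y'": 0.1, "Y''": 0.1}}  # T1 = 3 (assumed)
y_to_z = {"time": 5.0, "outcomes": {"Z": 1.0}}                          # T2 = 5 (assumed)

x_to_z = compose(x_to_y, y_to_z, via="Y")
print(x_to_z)
```

With these numbers the composed transient measurement is T1 + .8 T2 = 7, and Z is predicted with probability .8. The result has the same shape as its inputs, so composed predictions can themselves be composed.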

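Combining Predictions II: Choice

A predictor plus a goal compose to form a value function:

$p : S \to M$
$g : M \to \Re$
$p \circ g : S \to \Re$ (apply $p$, then $g$)

We can do all the usual planning backups with $p \circ g$

[Diagram: from X, the experiment π, β leads to Y with transient measurement T and g = 5; the experiment π′, β′ leads to Y′ with transient measurement T′ and g = 6]

In X, for g, π′β′ is a better choice than πβ. Store it with g.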
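The choice step amounts to scoring each candidate experiment's prediction with the goal and keeping the best. In the sketch below, the measurement fields (`time`, `p_success`), the goal function, and all names are hypothetical; the numbers are picked only so that the two options score g = 5 and g = 6 as on the slide.

```python
def value(predictor, goal, state):
    """Value function formed by applying the predictor, then the goal."""
    return goal(predictor(state))


def best_option(predictors, goal, state):
    """Name of the experiment whose predicted measurement the goal rates highest."""
    return max(predictors, key=lambda name: value(predictors[name], goal, state))


# Hypothetical predictors for two experiments available in state X.
predictors = {
    "pi_beta": lambda s: {"time": 3.0, "p_success": 0.8},
    "pi_beta_prime": lambda s: {"time": 4.0, "p_success": 1.0},
}


def goal(m):
    # Assumed goal: reward 10 for success, minus elapsed time.
    return 10.0 * m["p_success"] - m["time"]


choice = best_option(predictors, goal, "X")
stored = {("X", "g"): (choice, value(predictors[choice], goal, "X"))}
print(stored)
```

Here `pi_beta_prime` wins with value 6 against 5, and the result is stored against the pair (state, goal), mirroring "Store it with g".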

Room-to-Room Super-Predictions

[Figure: gridworld of rooms connected by hallways, showing one option's policy, its termination hallways, and its target (goal) hallway]

• 4 stochastic primitive actions: up, down, left, right (fail 33% of the time)
• 8 multi-step super-predictions (to each room's 2 hallways)
• “Options”: Precup 2000; Sutton, Precup, & Singh 1999
• Predict: Probability of reaching each terminal hallway
• Goal: minimize # steps + values for target and other outcome hallway

Planning with Super-Predictions

[Figure: value functions with V(goal) = 1 at iterations #0, #1, and #2 of planning, once with cell-to-cell primitive actions and once with room-to-room options (super-predictions)]





Predictive State Representations

• Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments
– Wallet’s contents, John’s location, presence of objects…
• Problem: So far we have assumed states, but really the world just gives information, “observations”
• There are several ways to formalize this problem
– Learning deterministic Finite State Automata
  • Rivest & Schapire, 1987
– Adding stochasticity: An alternative to Hidden Markov Models
  • Herbert Jaeger, 1999
– Adding action: An alternative to Partially Observable Markov Decision Processes
  • Littman, Sutton, & Singh 2001

PSR Formalism 1

[Diagram: Mind sends actions to World; World returns observations to Mind]

Experience: $A_1\, O_1\, A_2\, O_2 \ldots$ (random variables)

A test is a subsequence, a simple case of an experiment: if the actions are done, will the observations occur?

$t \in \{A \times O\}^*$, e.g., $t = a_1 o_1 \ldots a_k o_k$

The world is defined by the probabilities of each test from the beginning of time:

$\Pr(t) = \Pr\{O_1 = o_1, \ldots, O_k = o_k \mid A_1 = a_1, \ldots, A_k = a_k\}$

and after a finite history sequence $h$ (formally another test):

$\Pr(t \mid h) = \frac{\Pr(ht)}{\Pr(h)}$
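The two definitions above can be made concrete with a small hidden-state world. The sketch below is hedged: the two-state "sticky coin" world, the single action `look`, and the names `prob_of_test` and `conditional_prediction` are assumptions introduced for illustration. Here `M[(a, o)][i][j]` is taken to be the probability of observing `o` and moving to hidden state `j` when action `a` is done in hidden state `i`.

```python
from fractions import Fraction


def prob_of_test(b0, M, test):
    """Pr(test) from the beginning of time, given start distribution b0."""
    weights = list(b0)
    for step in test:  # step is an (action, observation) pair
        mat = M[step]
        n = len(weights)
        weights = [sum(weights[i] * mat[i][j] for i in range(n)) for j in range(n)]
    return sum(weights)


def conditional_prediction(b0, M, test, history):
    """Pr(t | h) = Pr(ht) / Pr(h)."""
    return prob_of_test(b0, M, history + test) / prob_of_test(b0, M, history)


# Hypothetical world: "look" reveals the hidden bit, which then flips w.p. 1/4.
stay, flip, zero = Fraction(3, 4), Fraction(1, 4), Fraction(0)
M = {
    ("look", 0): [[stay, flip], [zero, zero]],
    ("look", 1): [[zero, zero], [flip, stay]],
}
b0 = [Fraction(1, 2), Fraction(1, 2)]

t = (("look", 1),)
h = (("look", 1),)
print(prob_of_test(b0, M, t), conditional_prediction(b0, M, t, h))
```

From the start, seeing 1 has probability 1/2; after a history in which 1 was just seen, the same test has probability 3/4. The history changes the prediction without the agent ever naming the hidden state.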

PSR Formalism 2

A Predictive State Representation (PSR) is a set of tests

$T = \{t_i\}$

whose vector of predictions

$\vec{x}(h) = [\Pr(t_1 \mid h), \ldots, \Pr(t_n \mid h)]$

is sufficient information to predict all tests

$\Pr(t \mid h) = f_t(\vec{x}(h)) \quad \forall t, h$

i.e., whose predictions are a sufficient statistic, a state

A linear PSR is a PSR where each $f_t$ is linear

Walk/Reset Example

Actions:
• Walk: Take a random step left or right, see 0
• Reset: Jump to rightmost state, see 1 if already there

Need to remember # of Walks since last Reset

Probabilities of being rightmost are:

1 .5 .5 .375 .375 .3125 .3125…

PSR tests: Reset1, Walk0Reset1

Walk/Reset Example (step by step)

Probabilities of being rightmost: 1 .5 .5 .375 .375 .3125 .3125…

State distributions (rightmost state last):
• Start on Right: 1
• After one Walk: .5 .5
• After two Walks: .25 .25 .5
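The sequence of probabilities can be reproduced by propagating the state distribution through Walk steps. This is a sketch under assumptions the slides do not state explicitly: a chain of 5 states, and walls that leave the walker in place when a step would go past either end. The function names are hypothetical.

```python
from fractions import Fraction


def walk(dist):
    """One Walk: step left or right with probability 1/2; walls keep you in place."""
    n = len(dist)
    half = Fraction(1, 2)
    new = [Fraction(0)] * n
    for i, p in enumerate(dist):
        new[max(i - 1, 0)] += p * half
        new[min(i + 1, n - 1)] += p * half
    return new


def reset1_probabilities(n_states, n_walks):
    """Prediction for the test Reset1 after 0, 1, ..., n_walks Walks since a Reset."""
    dist = [Fraction(0)] * n_states
    dist[-1] = Fraction(1)  # a Reset puts us in the rightmost state
    probs = [dist[-1]]
    for _ in range(n_walks):
        dist = walk(dist)
        probs.append(dist[-1])  # Reset shows 1 only if we are already rightmost
    return probs


probs = reset1_probabilities(n_states=5, n_walks=6)
print([float(p) for p in probs])
# prints [1.0, 0.5, 0.5, 0.375, 0.375, 0.3125, 0.3125]
```

The prediction for Reset1 depends only on the number of Walks since the last Reset, which is why that count is what has to be remembered.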

PSR Results

• Compact, linear PSRs exist
– # tests ≤ # states in minimal POMDP
– # tests ≤ Rivest & Schapire’s Diversity
– # tests can be exponentially fewer than diversity and POMDP
• Compact simulation/update process
• Construction algorithm from POMDP
• Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs
• There are natural EM-like algorithms (current work)

Constructing Linear PSRs from POMDPs

Outcome vector u(t): the predictions for test t from all POMDP states.

A test t is said to be independent of a set of tests T if its outcome vector is linearly independent of T’s outcome vectors.

Accumulate tests whose outcome vectors are independent

Search:
• Start with T = {}
• While some extension aot of a test t ∈ T (or of the empty test) is independent, add it to T
• Else terminate, return T.
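The search can be sketched directly from the description above. This is a hedged illustration rather than the authors' implementation: it assumes the POMDP is given as matrices `M[(a, o)][i][j]` (probability of observing `o` and moving to state `j` when doing `a` in state `i`), so that u(aot) = M[(a, o)] · u(t), and it assumes a 5-state Walk/Reset world with walls that keep the walker in place. All function names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of Fraction vectors, by exact Gaussian elimination."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def walk_reset_model(n):
    """Assumed Walk/Reset POMDP with n states; W = Walk, R = Reset."""
    M = {(a, o): [[Fraction(0)] * n for _ in range(n)] for a in "WR" for o in (0, 1)}
    for i in range(n):
        M[("W", 0)][i][max(i - 1, 0)] += Fraction(1, 2)
        M[("W", 0)][i][min(i + 1, n - 1)] += Fraction(1, 2)
        M[("R", 1 if i == n - 1 else 0)][i][n - 1] += 1
    return M


def extend(M, a, o, u):
    """Outcome vector of the extended test aot, given u = u(t)."""
    n = len(u)
    return [sum(M[(a, o)][i][j] * u[j] for j in range(n)) for i in range(n)]


def find_tests(M, n):
    """Accumulate tests whose outcome vectors are linearly independent."""
    tests, vecs = [], []
    added = True
    while added:
        added = False
        # Candidates: one-step extensions of the empty test and of tests in T.
        for t, u in [((), [Fraction(1)] * n)] + list(zip(tests, vecs)):
            for (a, o) in M:
                v = extend(M, a, o, u)
                if rank(vecs + [v]) > len(vecs):
                    tests.append(((a, o),) + t)
                    vecs.append(v)
                    added = True
    return tests, vecs


tests, vecs = find_tests(walk_reset_model(5), 5)
for t in tests:
    print("".join(f"{a}{o}" for a, o in t))
```

Because outcome vectors live in a space whose dimension is the number of POMDP states, the loop can add at most that many tests before it terminates. Which particular tests are found depends on the order in which extensions are tried.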

PSR Conclusions

• A path to exorcizing the assumption of state
– Toward the goal of totally data- (experience-) oriented AI
• The predictive view of state is competitive
– Even better (more compact) in some ways
– States have data interpretations!
– And are thus potentially more learnable, refinable
• Naturally leads to constructive discovery ideas
– Searching for the right tests to understand the world
• “Tests” generalize naturally to super-predictions

Empiricism

[Diagram: Mind sends actions to World; World returns observations to Mind]

Experience is the data; it is all we really know
Experience should be the focus of AI
But by and large it is not… even in robotics, Alife, etc.

Experience is central: Knowledge is about experience

Mind is About Predictions

Hypothesis: Knowledge is predictive
• About what-leads-to-what, under what ways of behaving
• Such knowledge is learnable, chainable

Hypothesis: Mental activity is working with predictions
• Learning them
• Combining them to produce new predictions (reasoning)
• Converting them to action (planning, reinforcement learning)
• Figuring out which are most useful

Hypothesis: These ideas are newly viable
• Unfamiliar flexibility & expressiveness of “super”-predictions
• New engineering planning methods: DP/RL/Values
• New state-representation ideas

Hypothesis: Predictions are the Coin of the Mental Realm

It’s Hard to Build Large AI Systems

• Brittleness
• Unforeseen interactions
• Scaling
• Requires too much manual complexity management
– people must understand, intervene, patch and tune
– like programming
• Need more autonomy
– learning, verification
– internal coherence of knowledge and experience

AI Implications of Predictive View

• An alternative theory of knowledge and thought
– Alternative to conventional, symbolic “language of thought”
– Alternative to “database” view of knowledge
• Requires experiments to be in the machine, not just the designer: true grounding
• Automated complexity management
– Should help with brittleness and scaling
• Could permit AI systems of much greater complexity

Both Predictors and Experiments must be in the Machine

• “Classical” AI systems omit both!
– e.g., “Tweety is a bird”, “John loves Mary”
– sometimes called the “symbol grounding problem”
• Modern AI systems tend to skimp on the experiments
– supervised learning, Bayes nets, robotics…
• It is not OK to leave the experimental definitions to external, human observers
– the information is just not in the machine
– we don’t understand it; we haven’t done our job!
• Yet this is such an appealing shortcut that we have almost always done it

More Predictive Knowledge

• John is in the coffee room
• My car is in the South parking lot
• What we know about geography, navigation
• What we know about how an object looks, rotates
• What we know about how objects can be used
• Recognition strategies for objects and letters
• The portrait of Washington on the dollar in the wallet in my other pants in the laundry has a mustache on it
– Composing experiments creates a productive representation language

Relational, Propositional, and Deictic

For all objects X: if I drop X, then X will be on the floor
– Holding object X means predicting certain sensations if, for example, one directs one’s eyes toward one’s hand
– Thus, on dropping, the predicted sensations are merely transferred from the looking-at-hand prediction to the looking-at-floor prediction
– Such transfer of existing predictions should be a common part of visual knowledge, updated every time the eyes move

There exist X, Y such that Red(X), Blue(Y), and Above(X,Y)
– There is some place I can foveate and see Red
– There is some place I can foveate and see Blue
– If I foveate first the Red place, “mark” it, then the Blue place, the mark will be Above the fovea (may need to search)
– These are typical ideas of modern, active, deictic vision

Should All Knowledge be Experiential?

Allowing only predictions in terms of data

loses
• Expressiveness
– can’t talk about objects, space, people; no “is-a” or “part-of”
• External (human) coherence
– verbal labels, interpretability, explainability, calibration
– the “shortcut” of entering knowledge directly into the agent

gains
• The knowledge will have meaning to the machine
• It can be mechanically learned/verified/extended
• It will be suited for a general reasoning process
– composition and backup of predictions to yield new predictions

There is value in forcing world knowledge into prediction form

• We will finally have all the knowledge in the machine
– all will be mechanically interpretable
– we will finally really understand the knowledge’s meaning
– anything else is just an empty shell
• Agent will be able to learn/verify/extend knowledge
– provides an internal coherence for the knowledge
– enables building it up from a firm foundation
• The knowledge will flow immediately into a general reasoning engine
– the concatenation of predictions yields new predictions

Conclusions

• World knowledge must be expressed in terms of the data

• Such posterior grounding is challenging,
– lose expressiveness in the short term
– lose external (human) coherence, explainability
• But can be done step by step,
• And brings palpable benefits
– autonomous learning/verification/extension of knowledge
– autonomous complexity management due to internal coherence
– knowledge suited to general reasoning process
• We must provide this grounding!
