Reinforcement Learning. The study of thinking. 1) Problem-Solving 2) Reasoning

Reinforcement Learning

The study of thinking.

1) Problem-Solving

2) Reasoning

[Figure: cognitive processes ordered from low level to higher level: Sensation, Perception, Memory (Encoding, Retrieval), Thinking/Cognition]

Thinking is a higher-level cognitive process that requires all sorts of cognitive operations (e.g. attention, perception, memory, language) and is often a conscious, controlled process

Should we wait until we understand the lower-level processes first? Not necessarily: research on higher-level cognition can inform research on lower-level cognition, and vice versa.

The study of thinking

Modern view:

• Thinking is an internal cognitive process

• The exact nature of these processes cannot be observed directly from behavior

• However, most cognitive theories lead to testable predictions. Behavioral experiments can test these predictions. Cognitive processes are inferred indirectly from behavior.

Well-defined & Ill-defined Problems

Well-defined problems have completely specified initial conditions, goals, and operators. They work well with computer simulation.

Ill-defined problems have some aspects that are not completely specified. They sometimes require insight to see the problem in a new way.

1. Writing a good paper = ?
2. Solving an algebra problem = ?
3. Conducting a statistical significance test = ?
4. Designing a good experiment = ?
5. Choosing a president = ?
6. Reducing drunk driving = ?
7. Being a nice person = ?

Well-defined problem solving

[Figure: an INITIAL STATE and a GOAL STATE, with a question mark for the path between them]

Play the game: http://www.mazeworks.com/hanoi/

- given state
- goal state
- obstacles
- operators
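The four ingredients listed above can be made concrete for the Tower of Hanoi. The sketch below is an illustration and not part of the original slides. It assumes three pegs and three disks, treats "move the top disk of one peg to another peg" as the only operator, and treats "no disk on top of a smaller disk" as the obstacle. The names `legal_moves` and `solve` are hypothetical, and breadth-first forward search is only one of several strategies that would work here.

```python
from collections import deque

def legal_moves(state):
    """Yield every state reachable by moving one top disk (the operator)."""
    for src, src_peg in enumerate(state):
        if not src_peg:
            continue
        disk = src_peg[-1]  # top disk of the source peg
        for dst, dst_peg in enumerate(state):
            # Obstacle: a disk may not be placed on a smaller disk.
            if dst == src or (dst_peg and dst_peg[-1] < disk):
                continue
            pegs = list(state)
            pegs[src] = src_peg[:-1]
            pegs[dst] = dst_peg + (disk,)
            yield tuple(pegs)

def solve(initial, goal):
    """Breadth-first forward search; returns a shortest list of states."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in legal_moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Disks are numbered by size; each peg lists its disks from bottom to top.
initial = ((3, 2, 1), (), ())   # given state
goal = ((), (), (3, 2, 1))      # goal state
path = solve(initial, goal)
print(len(path) - 1, "moves")
```

Because every part of the problem is fully specified, a computer can enumerate the state space directly, which is what makes well-defined problems convenient for simulation.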


Problem solving strategies

How to solve the maze?
- trial and error
- forward
- backward
- means-end analysis

• Most problem-solving situations involve a combination of planning (means-end analysis), trial and error, reinforcement learning, and perhaps ... insight

• Reinforcement learning grew out of behaviorism
• Insight: the Gestaltists' view
• Planning grew out of AI and cognitive psychology

Learning by Reinforcement

Associationist theories of thinking -> thinking as response learning

Three elements of associationist theory:
1) stimulus: a problem solving situation
2) response: a particular problem solving behavior
3) associations: strength between stimulus and response

[Figure: a stimulus S linked to three responses, R1, R2, and R3]

Thorndike’s work on cats in a puzzle box

• Cats initially solved the puzzle box problem by trial and error – trying various responses until one accidentally worked

• After being placed in the box many times, a cat learned the successful response and pulled the string almost immediately

Habit Family Hierarchy

Try most dominant response first, then second strongest, etc.

1) Law of exercise: practice tends to increase S-R link

2) Law of effect: responses that solve a problem increase in strength. Responses that do not help solve problem lose strength
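The habit family hierarchy and the law of effect can be sketched in a few lines. This is an illustration under stated assumptions, not Thorndike's own model: the response names, the starting strengths, and the `gain` and `loss` parameters are all hypothetical, and the law of exercise is left out for brevity.

```python
def attempt(strengths, works, gain=1.0, loss=0.5):
    """Try responses from strongest to weakest until one solves the problem.

    strengths: dict mapping response name -> S-R association strength
    works: set of responses that actually solve the problem
    Returns the number of responses tried on this trial.
    """
    # Habit family hierarchy: most dominant response first.
    order = sorted(strengths, key=strengths.get, reverse=True)
    for tried, response in enumerate(order, start=1):
        if response in works:
            strengths[response] += gain  # law of effect: success strengthens
            return tried
        # Law of effect: an unhelpful response loses strength (not below zero).
        strengths[response] = max(0.0, strengths[response] - loss)
    return len(order)

# Hypothetical puzzle-box responses; only pulling the string opens the door.
strengths = {"scratch": 3.0, "meow": 2.0, "pull_string": 1.0}
tries_per_trial = [attempt(strengths, {"pull_string"}) for _ in range(5)]
print(tries_per_trial)  # [3, 2, 1, 1, 1]
```

The number of responses tried before success shrinks across trials, which mirrors the cat pulling the string almost immediately after many placements in the box.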


• What about response chains?

• E.g.:

• How can path from initial state to goal state be strengthened? How to avoid dead-ends?

• How can we reward a successful action that leads to success only much later in time? This is the problem of delayed reinforcement.

• Modern reinforcement learning involves passing strengths of successful responses back through a chain.
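One way to picture "passing strengths back through a chain" is a temporal-difference style update, sketched below. The slides do not name a specific algorithm, so this choice is an assumption, as are the learning rate `alpha`, the discount `gamma`, the chain length of four, and the function name `run_episode`. Only the last response in the chain is rewarded directly.

```python
def run_episode(strength, alpha=0.5, gamma=0.9):
    """Walk the chain from start to goal once, updating each link's strength.

    strength[i] is the strength of the response taken at step i.
    Only the final response receives a direct reward of 1.0.
    """
    last = len(strength) - 1
    for i in range(len(strength)):
        reward = 1.0 if i == last else 0.0
        downstream = 0.0 if i == last else strength[i + 1]
        # Move this link toward: reward + discounted strength of the next link.
        strength[i] += alpha * (reward + gamma * downstream - strength[i])

strength = [0.0] * 4  # four responses between start and goal
history = []
for _ in range(4):
    run_episode(strength)
    history.append([round(s, 3) for s in strength])
for row in history:
    print(row)
```

After the first pass only the final response has any strength. With each further pass the strength reaches one link further back, so early responses are eventually credited even though their payoff is delayed.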

[Figure: a maze with start and goal positions]

Maze example

• Reinforcement learning example for mazes

Reinforcement Learning

• Behavior follows simple associations in response chains. No planning, no mental maps, no “insight”

• Learning from very simple feedback: failure or success

• Associative strengths between response chains are learned. Passing strength back in time
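The points above can be tried out on a toy maze. The sketch below uses tabular Q-learning, a standard reinforcement learning algorithm that the slides do not name, so treat it as one possible instance rather than the method behind the demos. The maze layout, the parameters (`alpha`, `gamma`, `epsilon`, `episodes`), and all function names are assumptions made for illustration. The learner has no map and no plan; it only receives a reward of 1.0 on reaching the goal.

```python
import random

MAZE = ["S.#.",
        ".##.",
        "...G"]  # S = start, G = goal, # = wall, . = open cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def find(ch):
    """Return the (row, col) position of a character in the maze."""
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))

def step(state, action):
    """Apply a move; walls and edges leave the position unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return state

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    start, goal = find("S"), find("G")
    q = {}  # (state, action index) -> learned strength
    for _ in range(episodes):
        state = start
        for _ in range(200):  # cap on episode length
            values = [q.get((state, i), 0.0) for i in range(4)]
            if rng.random() < epsilon:
                a = rng.randrange(4)  # occasional trial and error
            else:
                best = max(values)    # strongest response, ties at random
                a = rng.choice([i for i in range(4) if values[i] == best])
            nxt = step(state, ACTIONS[a])
            reward = 1.0 if nxt == goal else 0.0
            future = 0.0 if nxt == goal else max(
                q.get((nxt, i), 0.0) for i in range(4))
            # Pass strength back: pull this link toward reward + future.
            q[(state, a)] = values[a] + alpha * (
                reward + gamma * future - values[a])
            state = nxt
            if state == goal:
                break
    return q

def greedy_path(q, limit=50):
    """Follow the strongest response from the start until the goal."""
    state, goal = find("S"), find("G")
    path = [state]
    while state != goal and len(path) <= limit:
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state = step(state, ACTIONS[a])
        path.append(state)
    return path

q = train()
path = greedy_path(q)
print(path)
```

In this layout the cell to the right of the start is a dead end. It ends up with lower strength than the route down and along the bottom row, so the learned response chain avoids it.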


Demos

Reinforcement learning in mazes: http://www.ise.pw.edu.pl/~cichosz/rl-java/

Reinforcement learning in robot-arm control: http://www.fe.dis.titech.ac.jp/~gen/robot/robodemo.html

Robot learning task of pole-balancing and devilsticking: http://www-clmc.usc.edu/movies/learning.html


Some Amazing Anagrams

Original                    Becomes...
Dormitory                   Dirty Room
Desperation                 A Rope Ends It
The Morse Code              Here Come Dots
Slot Machines               Cash Lost in 'em
Animosity                   Is No Amity
Snooze Alarms               Alas! No More Z's
Alec Guinness               Genuine Class
Semolina                    Is No Meal
The Public Art Galleries    Large Picture Halls, I Bet
A Decimal Point             I'm a Dot in Place
The Earthquakes             That Queer Shake
Eleven plus two             Twelve plus one
Contradiction               Accord not in it

Original: To be or not to be: that is the question, whether tis nobler in the mind to suffer the slings and arrows of outrageous fortune.
Becomes: In one of the Bard's best-thought-of tragedies, our insistent hero, Hamlet, queries on two fronts about how life turns rotten.

Original: "That's one small step for a man, one giant leap for mankind." -- Neil A. Armstrong
Becomes: A thin man ran; makes a large stride; left planet, pins flag on moon! On to Mars!

Stimulus (S)    Response (a new letter combination)
g o r w n       R1: g r o w n
                R2: w r o n g
                R3: w r g n o
                R4: …

Anagram solving time depends on:
- familiarity of goal word
- letter transition probability of goal word
- letter transition probability of presented word
- number of moves
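The stimulus-response picture of anagram solving, where each response is a new letter combination, can be sketched as blind trial and error. The code below is only an illustration: `LEXICON` is a hypothetical mini word list and `solve_anagram` is a made-up name. It deliberately ignores familiarity and letter transition probabilities, which is why a pure generate-and-test solver cannot account for the factors listed above.

```python
from itertools import permutations

# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"grown", "wrong", "drink", "water", "climb", "party"}

def solve_anagram(letters, lexicon=LEXICON):
    """Generate every letter order (response) and keep the real words.

    Returns (solutions, tries_to_first), where tries_to_first counts the
    responses generated up to and including the first solution.
    """
    solutions, tries_to_first = [], None
    for count, perm in enumerate(permutations(letters), start=1):
        candidate = "".join(perm)
        if candidate in lexicon and candidate not in solutions:
            solutions.append(candidate)
            if tries_to_first is None:
                tries_to_first = count
    return solutions, tries_to_first

solutions, tries = solve_anagram("gorwn")
print(sorted(solutions))  # ['grown', 'wrong']
```

An associationist account would replace the fixed enumeration order with one ranked by response strength, so that familiar words and likely letter transitions are tried earlier.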

Class Experiment

• Replicate effect of familiarity

Ready...?

• nrdki » (drink 7.0)
• aewtr » (water 3.0)
• cahtb » (batch 16.0)
• milbc » (climb 7.5)
• kcler » (clerk 17.5)
• rtypa » (party 14.0)
• huocg » (cough 23.5)
• rmcap » (cramp 12.0)


Mean solution times:

High familiarity = 7.9 sec
Low familiarity = 17.3 sec
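The reported means can be checked with a short calculation. This sketch rests on two assumptions that the slides do not state: that the number beside each word is its solution time in seconds, and that drink, water, climb, and party form the high-familiarity group. That grouping is inferred only because it reproduces the reported means.

```python
from statistics import mean

# Values listed beside each anagram (assumed to be solution times in seconds).
times = {"drink": 7.0, "water": 3.0, "batch": 16.0, "climb": 7.5,
         "clerk": 17.5, "party": 14.0, "cough": 23.5, "cramp": 12.0}

# Assumed grouping; not stated in the source.
high = ["drink", "water", "climb", "party"]
low = ["batch", "clerk", "cough", "cramp"]

high_mean = mean(times[w] for w in high)
low_mean = mean(times[w] for w in low)
print(high_mean, low_mean)  # 7.875 17.25
```

Rounded to one decimal place, with halves rounded up, these give 7.9 and 17.3 seconds.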

• Can all thinking be described by trial and error/ stimulus-response?

• What about insight? Gestaltist view

• What about planning? AI view

The Handcuffs Puzzle

The Set-Up

For this puzzle you need two people, some rope, and some empty space to work in. Each person needs a piece of rope with a loop tied in both ends, so it can be worn as handcuffs. The rope should be reasonably long, so that the person wearing it can easily step over it if they want.

Each person puts on a complete set of handcuffs. Before putting them on, the two loop their ropes around each other so that they are tied together. They then have to get themselves apart while following these rules:

• The handcuffs cannot be removed.

• Do not break, cut, saw through, bite through, or in any other way damage the rope. Damaging each other is probably a bad idea too.

content copied from: http://ccins.camosun.bc.ca/~jbritton/jbhandcuff.htm

