Machine Learning
Robert Stengel
Robotics and Intelligent Systems, MAE 345
Princeton University, 2009
• Markov Decision Processes
  – Optimal and near-optimal control
• Finding Decision Rules in Data
  – ID3 algorithm
• Search

Copyright 2009 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE345.html
Proposed Term Paper Topics
MAE 345, Fall 2009

• Multistep NN with Memory
• Maze-Navigating Robot
• Robotic Prosthetic Device
• Optimal Control of an Ambiguous Robot
• Game-Playing NN
• NN for Object Recognition
• Robotic Cloth Folder
• SAGA Simulated Creature
• NN to Optimize Problem Set Solution
• Blob-Tracking NN
• Dust-Collecting Robot that Learns
• NN for Stock Return Prediction
• Identification of key attributes and outcomes
• Taxonomies developed by experts
• First principles of science and mathematics
• Trial and error
• Probability theory and fuzzy logic
• Simulation and empirical results
Finding Decision Rules in Data

Example of On-Line Code Modification
• Execute a decision tree
  – Get wrong answer
• Add logic to distinguish between right and wrong cases
  – If Comfort Zone = Water,
    • then Animal = Hippo,
    • else Animal = Rhino
  – True, but Animal is Dinosaur, not Hippo
  – Ask user for right answer
  – Ask user for a rule that distinguishes between right and wrong answers: If Animal is extinct, …
Markov Decision Process

• Model for decision making under uncertainty

$\left[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') \right]$

where

$S$: finite set of states, $x_1, x_2, \ldots, x_K$
$A$: finite set of actions, $a_1, a_2, \ldots, a_M$
$P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\; a(t_i) = a_m \right]$
$R_{a_m}(x_k, x')$: expected immediate reward for the transition from $x_k$ to $x'$

• Optimal decision maximizes expected total reward (or minimizes expected total cost) by choosing the best set of actions (or control policy)
  – Linear-quadratic-Gaussian (LQG) control
  – Dynamic programming -> HJB equation ~> A* search
  – Reinforcement learning ~> Heuristic search
Maximizing the Utility Function of a Markov Process

Utility function:

$J = \sum_{t=0}^{\infty} \gamma(t)\, R_{a(t)}\left[ x(t), x(t+1) \right]$

$\gamma(t)$: discount rate, $0 < \gamma(t) < 1$

Utility function to go = value function:

$V = \sum_{t = t_{current}}^{\infty} \gamma(t)\, R_{a(t)}\left[ x(t), x(t+1) \right]$

• Optimal control at $t$:

$u_{opt}(t) = \arg\max_a \left\{ \sum_{t = t_{current}}^{\infty} \left( R_{a(t)}\left[ x(t), x(t+1) \right] + \gamma(t)\, P_{a(t)}\left[ x(t), x(t+1) \right] V\left[ x(t+1) \right] \right) \right\}$

• Optimized value function:

$V^*(t) = \sum_{t = t_{current}}^{\infty} \left( R_{u_{opt}(t)}\left[ x^*(t) \right] + \gamma(t)\, P_{u_{opt}(t)}\left[ x^*(t), x_{est}^*(t+1) \right] V\left[ x_{est}^*(t+1) \right] \right)$
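One standard way to compute the optimized value function above is value iteration, which sweeps the Bellman recursion until the values stop changing. The tiny two-state MDP below is a made-up illustration, not an example from the lecture:

```python
# Value iteration on a small invented MDP, iterating the recursion above:
# V*(x) = max_a sum_x' P_a(x,x') * ( R_a(x,x') + gamma * V*(x') )
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9                                  # constant discount rate

V = {0: 0.0, 1: 0.0}
for _ in range(200):                         # sweep to (near) convergence
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

# Extract the optimal policy: the action achieving the max in each state.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(policy)   # {0: 'go', 1: 'stay'}
```

With these rewards the "stay" action in state 1 earns 2 per step, so its value converges to 2/(1 − 0.9) = 20, and the best action from state 0 is to move there.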
Reinforcement (“Q”) Learning Control of a Markov Process

$u_{best}(t) = \arg\max_u Q\left[ x(t+1), u \right]$

• Q: quality of a state-action function
• Heuristic value function
• One-step philosophy for heuristic optimization
• Various algorithms for computing best control value

$Q\left[ x(t+1), u(t+1) \right] = Q\left[ x(t), u(t) \right] + \alpha(t) \left\{ R_{u(t)}\left[ x(t) \right] + \gamma(t) \max_u Q\left[ x(t+1), u \right] - Q\left[ x(t), u(t) \right] \right\}$

$\alpha(t)$: learning rate, $0 < \alpha(t) < 1$
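The one-step update above can be exercised directly on a toy problem. The two-state, two-action MDP below is invented for illustration; a minimal tabular sketch:

```python
import random

random.seed(0)

# Deterministic toy MDP: P[(state, action)] -> (next_state, reward).
P = {
    (0, 0): (0, 0.0), (0, 1): (1, 1.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma = 0.5, 0.9          # constant learning rate and discount rate

state = 0
for _ in range(500):
    action = random.choice((0, 1))           # explore uniformly
    next_state, reward = P[(state, action)]
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    # Q[x(t),u(t)] += alpha * { R + gamma * max_u Q[x(t+1),u] - Q[x(t),u(t)] }
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Greedy policy from the learned Q table; action 1 should be preferred in both states.
best = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(best)
```

Note that Q-learning finds this policy from sampled transitions alone, without ever being given the transition model P as the value-function recursion requires.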
Examples: Q-Learning Snail; Q-Learning, Ball on Plate
Q-Learning Control of a Markov Process is Analogous to LQG Control in the LTI Case

$Q\left[ x(t+1), u(t+1) \right] = Q\left[ x(t), u(t) \right] + \alpha(t) \left\{ R_{u(t)}\left[ x(t) \right] + \gamma(t) \max_u Q\left[ x(t+1), u \right] - Q\left[ x(t), u(t) \right] \right\}$

$\alpha(t)$: learning rate, $0 < \alpha(t) < 1$

Controller:

$x_{k+1} = \Phi x_k - \Gamma C \left( \hat{x}_k - x_k^* \right)$

Estimator:

$\hat{x}_k = \Phi \hat{x}_{k-1} - \Gamma C \left( \hat{x}_{k-1} - x_{k-1}^* \right) + K \left( z_k - H_x \hat{x}_{k-1} \right)$
LQG Control Optimizes a Discrete-Time LTI Markov Process

$\left[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') \right]$

where

$S$: infinite set of states, $x_1, x_2, \ldots$
$A$: infinite set of actions, $a_1, a_2, \ldots$
$P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\; a(t_i) = a_m \right]$
$R_{a_m}(x_k, x')$: expected immediate reward for the transition from $x_k$ to $x'$
Structuring an Efficient Decision Tree (Off-Line)

• Choose most important attributes first
• Recognize when no result can be deduced
• Exclude irrelevant factors
• Iterative Dichotomizer*: the ID3 Algorithm
  – Build an efficient decision tree from a fixed set of examples (supervised learning)

*Dichotomy: division into two (usually contradictory) parts or opinions
Fuzzy Ball-Game Training Set

Case # | Forecast | Temperature | Humidity | Wind   | Play Ball?
     1 | Sunny    | Hot         | High     | Weak   | No
     2 | Sunny    | Hot         | High     | Strong | No
     3 | Overcast | Hot         | High     | Weak   | Yes
     4 | Rain     | Mild        | High     | Weak   | Yes
     5 | Rain     | Cool        | Low      | Weak   | Yes
     6 | Rain     | Cool        | Low      | Strong | No
     7 | Overcast | Cool        | Low      | Strong | Yes
     8 | Sunny    | Mild        | High     | Weak   | No
     9 | Sunny    | Cool        | Low      | Weak   | Yes
    10 | Rain     | Mild        | Low      | Weak   | Yes
    11 | Sunny    | Mild        | Low      | Strong | Yes
    12 | Overcast | Mild        | High     | Strong | Yes
    13 | Overcast | Hot         | Low      | Weak   | Yes
    14 | Rain     | Mild        | High     | Strong | No

(Forecast, Temperature, Humidity, and Wind are attributes; Play Ball? is the decision.)
Parameters of the ID3 Algorithm

• Decisions, e.g., play ball or don't play ball
  – D = number of possible decisions
  – Decision: yes, no
Parameters of the ID3 Algorithm

• Attributes, e.g., temperature, humidity, wind, weather forecast
  – M = number of attributes to be considered in making a decision
  – I_m = number of values that the mth attribute can take
    • Temperature: Hot, mild, cool
    • Humidity: High, low
    • Wind: Strong, weak
    • Forecast: Sunny, overcast, rain
Parameters of the ID3 Algorithm

• Training trials, e.g., all the games played last month
  – N = number of training trials
  – n(i) = number of examples with the ith attribute
Example: Probability Spaces for Three Attributes

• Attribute #1: 2 possible values
• Attribute #2: 6 possible values
• Attribute #3: 4 possible values
• Probability of an attribute value is represented by area in the diagram
Example: Decision, Given Values of Three Attributes

• Attribute #1: 2 possible values
• Attribute #2: 6 possible values
• Attribute #3: 4 possible values
Accurate Detection of Events Depends on Their Probability of Occurrence

[Figure: detection examples for noise levels σ_noise = 0.1, 0.2, 0.4]
Entropy Measures Information Content of a Signal

• S = entropy of a signal encoding I distinct events

$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$

• i = index identifying an event encoded by a signal
• Pr(i) = probability of the ith event
• $-\log_2 \Pr(i)$ = number of bits required to characterize the probability that the ith event occurs

$0 \le \Pr(\cdot) \le 1$
$\log_2 \Pr(\cdot) \le 0$
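The entropy formula translates directly to code; a minimal sketch with made-up probability vectors (the helper name `entropy` is my own):

```python
import math

def entropy(probs):
    """S = -sum_i Pr(i) * log2 Pr(i); zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two equally likely events carry exactly one bit of information.
print(entropy([0.5, 0.5]))           # 1.0
# Four equally likely events carry two bits.
print(entropy([0.25] * 4))           # 2.0
# A lopsided pair (Pr = 0.125 / 0.875) carries much less.
print(round(entropy([0.125, 0.875]), 3))   # 0.544
```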
Entropy of Two Events with Various Frequencies of Occurrence

• Frequencies of occurrence estimate probabilities of each event (#1 and #2)
  – Pr(#1) = n(#1)/N
  – Pr(#2) = n(#2)/N = 1 – n(#1)/N

$S = S_{\#1} + S_{\#2} = -\Pr(\#1) \log_2 \Pr(\#1) - \Pr(\#2) \log_2 \Pr(\#2)$

$\log_2 \Pr(\#1 \text{ or } \#2) \le 0$

• $-\Pr(i) \log_2 \Pr(i)$ represents the channel capacity (i.e., average number of bits) required to portray the ith event
Entropy of Two Events with Various Frequencies of Occurrence

Entropies for 128 Trials (N = 128)

  n  | Pr(#1) = n/N | log2(n/N) | Pr(#2) = 1 - n/N | log2(1 - n/N) | Entropy S
   1 | 0.008        | -7        | 0.992            | -0.011        | 0.066
   2 | 0.016        | -6        | 0.984            | -0.023        | 0.116
   4 | 0.031        | -5        | 0.969            | -0.046        | 0.201
   8 | 0.063        | -4        | 0.938            | -0.093        | 0.337
  16 | 0.125        | -3        | 0.875            | -0.193        | 0.544
  32 | 0.25         | -2        | 0.75             | -0.415        | 0.811
  64 | 0.50         | -1        | 0.50             | -1            | 1
  96 | 0.75         | -0.415    | 0.25             | -2            | 0.811
 112 | 0.875        | -0.193    | 0.125            | -3            | 0.544
 120 | 0.938        | -0.093    | 0.063            | -4            | 0.337
 124 | 0.969        | -0.046    | 0.031            | -5            | 0.201
 126 | 0.984        | -0.023    | 0.016            | -6            | 0.116
 127 | 0.992        | -0.011    | 0.008            | -7            | 0.066
Best Decision is Related to Entropy and the Probability of Occurrence

$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$

• High entropy
  – Signal provides high coding precision of distinct events
  – Differences coded with few bits
• Low entropy
  – Lack of distinction between signal values
  – Detecting differences requires many bits
• Best classification of events when S = 1...
  – but that may not be achievable
Decision-Making Parameters for ID3

• S_D = entropy of all possible decisions

$S_D = -\sum_{d=1}^{D} \Pr(d) \log_2 \Pr(d)$

• G_i = information gain of the ith attribute

$G_i = S_D + \sum_{i=1}^{I_m} \Pr(i) \sum_{d=1}^{D} \Pr(i_d) \log_2 \Pr(i_d)$

• $\Pr(i_d) = n(i_d)/N(d)$ = probability that the ith attribute value correlates with the dth decision
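The gain formula can be checked against the ball-game training set above. In the sketch below (helper names `entropy` and `gain` are my own), the inner sum of $\Pr(i_d) \log_2 \Pr(i_d)$ terms is computed as a subtracted conditional entropy, which is the same quantity:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of decision labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target="Play"):
    """Information gain: G = S_D - sum over attribute values of
    (fraction of examples with that value) * (entropy of decisions given it)."""
    total = entropy([e[target] for e in examples])
    n = len(examples)
    for value in set(e[attr] for e in examples):
        subset = [e[target] for e in examples if e[attr] == value]
        total -= len(subset) / n * entropy(subset)
    return total

# The 14-case training set from the slides.
rows = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Low","Weak","Yes"), ("Rain","Cool","Low","Strong","No"),
    ("Overcast","Cool","Low","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Low","Weak","Yes"), ("Rain","Mild","Low","Weak","Yes"),
    ("Sunny","Mild","Low","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Low","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
keys = ("Forecast", "Temperature", "Humidity", "Wind", "Play")
examples = [dict(zip(keys, r)) for r in rows]

# Forecast has the largest root gain (~0.247), so ID3 chooses it first,
# matching the root-attribute ranking quoted on the slides.
for attr in keys[:-1]:
    print(attr, round(gain(examples, attr), 3))
```

ID3 recurses: it splits on the highest-gain attribute, then repeats the gain calculation within each branch's subset of examples.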
Decision Tree Produced by ID3 Algorithm

• Temperature is inconsequential and is not included in the decision tree
• Root attribute gains, G_i
  – Forecast: 0.246
  – Temperature: 0.029
  – Humidity: 0.151
  – Wind: 0.048
Decision Tree Produced by ID3 Algorithm

• Sunny branch attribute gains, G_i
  – Temperature: 0.57
  – Humidity: 0.97
  – Wind: 0.019
Search

• Typical AI textbook problems
  – Prove a theorem
  – Solve a puzzle (e.g., Tower of Hanoi)
  – Find a sequence of moves that wins a game (e.g., chess)
  – Find the shortest path connecting a set of points (e.g., traveling salesman problem)
  – Find a sequence of symbolic transformations that solve a calculus problem (e.g., Mathematica)
• The common thread: search
  – Structures for search
  – Strategies for search
Curse of Dimensionality

• Feasible search paths may grow without bound
  – Possible combinatorial explosion
  – Checkers: 5 x 10^20 possible moves
  – Chess: 10^120 moves
  – Protein folding: ?
• Limiting search complexity
  – Redefine search space
  – Employ heuristic (i.e., pragmatic) rules
  – Establish restricted search range
  – Invoke decision models that have worked in the past
Structures for Search

• Trees
  – Single path between root and any node
  – Path between adjacent nodes = arc
  – Root node: no precursors
  – Leaf node: no successors; possible terminator
Structures for Search

• Graphs
  – Multiple paths between root and some nodes
  – Trees are subsets of graphs
Directions of Search

• Forward chaining
  – Reason from premises to actions
  – Data-driven: draw conclusions from facts
• Backward chaining
  – Reason from actions to premises
  – Goal-driven: find facts that support hypotheses
Strategies for Search

• Realistic assessment
  – Not necessary to consider all 10^120 possible moves to play good chess
  – Playing excellent chess may require much forward and backward chaining, but not 10^120 evaluations
  – Most applications are more procedural
• Search categories
  – Blind search
  – Heuristic search
  – Probabilistic search
  – Optimization
• Search forward from opening?
• Search backward from end game?
• Both?
Blind Search

• Node expansion
  – Find all successors to that node
• Depth-first forward search
  – Expand nodes descended from most recently expanded node
  – Consider other paths only after reaching a node with no successors
• Breadth-first forward search
  – Expand nodes in order of proximity to the start node
  – Consider all sequences of arc number n (from root node) before considering any of number (n + 1)
  – Exhaustive, but guaranteed to find the shortest path to a terminator
Blind Search

• Bidirectional search
  – Search forward from root node and backward from one or more leaf nodes
  – Terminate when search nodes coincide
• Minimal-cost forward search
  – Each arc is assigned a cost
  – Expand nodes in order of minimum cost
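Breadth-first forward search as described above can be sketched in a few lines; the graph is a made-up example, not one from the lecture:

```python
from collections import deque

def breadth_first_path(graph, root, goal):
    """Blind breadth-first search: expand nodes in order of arc distance from
    the root, so the first path to reach the goal has the fewest arcs."""
    frontier = deque([[root]])      # FIFO queue of partial paths
    visited = {root}
    while frontier:
        path = frontier.popleft()   # shallowest unexpanded node
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):   # node expansion
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None                     # goal unreachable

# Invented example graph: multiple routes from A to F.
g = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(breadth_first_path(g, "A", "F"))   # ['A', 'B', 'D', 'F'] (3 arcs)
```

Depth-first search is the same loop with `frontier.pop()` (LIFO) instead of `popleft()`, trading the shortest-path guarantee for lower memory use.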
AND/OR Graph Search

• A node is “solved” if
  – It is a leaf node with a satisfactory goal state
  – It has solved AND nodes as successors
  – It has OR nodes as successors, at least one of which is solved
• Goal: solve the root node
Heuristic Search

• For large problems, blind search typically leads to combinatorial explosion
• Employ heuristic knowledge about the quality of possible paths
  – Decide which node to expand next
  – Discard (or prune) nodes that are unlikely to be fruitful
• Search for feasible (approximately optimal) rather than optimal solutions
• Ordered or best-first search
  – Always expand “most promising” node
Heuristic Optimal Search

Heuristic Dynamic Programming: A* Search

• Each arc bears an incremental cost
• Cost, J, estimated at kth instant =
  – Cost accrued to k
  – Remaining cost to reach final point, k_f
• Goal: minimize estimated cost by choice of remaining arcs
• Choose arc_{k+1}, arc_{k+2}, … accordingly
• Use heuristics to estimate remaining cost

$\hat{J}_{k_f} = \sum_{i=1}^{k} J_i + \sum_{i=k+1}^{k_f} \hat{J}_i(\mathrm{arc}_i)$
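A minimal sketch of A* search under these definitions: always expand the node minimizing accrued cost plus heuristic estimate of remaining cost. The graph, arc costs, and heuristic values below are invented for illustration:

```python
import heapq

def a_star(neighbors, cost, heuristic, start, goal):
    """A* search: expand the frontier node minimizing
    J_hat = (cost accrued to node) + (heuristic estimate of remaining cost).
    If the heuristic never overestimates, the returned path has minimum cost."""
    frontier = [(heuristic(start), 0.0, start, [start])]   # (J_hat, accrued, node, path)
    best_cost = {start: 0.0}
    while frontier:
        _, accrued, node, path = heapq.heappop(frontier)
        if node == goal:
            return accrued, path
        for nxt in neighbors[node]:
            new_cost = accrued + cost[(node, nxt)]
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]))
    return float("inf"), None

# Invented example; heuristic is a lookup table standing in for, e.g.,
# straight-line distance to the goal (it never overestimates here).
nbrs = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
arc_cost = {("S", "A"): 1, ("S", "B"): 4, ("A", "G"): 5, ("B", "G"): 1}
h = {"S": 3, "A": 4, "B": 1, "G": 0}.get
print(a_star(nbrs, arc_cost, h, "S", "G"))   # (5, ['S', 'B', 'G'])
```

Note how the heuristic steers the search: the cheap first arc to A looks tempting, but h(A) = 4 warns that finishing through A is expensive, so the B route is expanded first.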
Mechanical Control System

Inferential Fault Analyzer for Helicopter Control System

• Local failure analysis
  – Set of hypothetical models of specific failures
• Global failure analysis
  – Forward reasoning assesses failure impact
  – Backward reasoning deduces possible causes
• Components: cockpit controls, forward rotor, aft rotor
• Frames store facts and facilitate search and inference
  – Components and up-/downstream linkages of control system
  – Failure model parameters
  – Rule base for failure analysis (LISP)
Local Failure Analysis

Heuristic Search

• Global failure analysis
  – Determination based on aggregate of local models
• Heuristic score based on
  – Criticality of failure
  – Reliability of component
  – Extensiveness of failure
  – Implicated devices
  – Level of backtracking
  – Severity of failure
  – Net probability of failure model
Global Failure Analysis
Shortest Path Problems

• Find the shortest (or least costly) path that visits all selected cities just once
  – Traveling Salesman
  – MapQuest/GPS/GIS
• Neural network solution
• Simulated annealing solution
• Genetic algorithm solution
• Modified Dijkstra Algorithm
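The slides cite a modified Dijkstra algorithm without detailing the modification; below is a sketch of the standard algorithm it builds on, which finds least-cost paths from one node to all others. The city network is a made-up example:

```python
import heapq

def dijkstra(graph, start):
    """Dijkstra's algorithm: least path cost from start to every reachable node.
    graph[node] is a list of (neighbor, arc_cost) pairs; arc costs nonnegative."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)   # closest unsettled node
        if node in done:
            continue                    # stale heap entry
        done.add(node)
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

# Invented symmetric city network; arc costs stand in for distances.
g = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1), ("D", 4)],
    "C": [("A", 5), ("B", 1), ("D", 1)],
    "D": [("B", 4), ("C", 1)],
}
print(dijkstra(g, "A"))   # A->C costs 3 via B, and A->D costs 4 via B and C
```

This solves the single-source shortest-path problem; the traveling-salesman variant above (visit all cities exactly once) is NP-hard, which is why the heuristic, annealing, and genetic approaches are listed alongside it.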
Next Time: Knowledge Representation