Machine Learning
Robert Stengel
Robotics and Intelligent Systems, MAE 345
Princeton University, 2009
• Markov Decision Processes
  – Optimal and near-optimal control
• Finding Decision Rules in Data
  – ID3 algorithm
• Search

Copyright 2009 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE345.html
Proposed Term Paper Topics
MAE 345, Fall 2009

• Multistep NN with Memory
• Maze-Navigating Robot
• Robotic Prosthetic Device
• Optimal Control of an Ambiguous Robot
• Game-Playing NN
• NN for Object Recognition
• Robotic Cloth Folder
• SAGA Simulated Creature
• NN to Optimize Problem Set Solution
• Blob-Tracking NN
• Dust-Collecting Robot that Learns
• NN for Stock Return Prediction
• Identification of key attributes and outcomes
• Taxonomies developed by experts
• First principles of science and mathematics
• Trial and error
• Probability theory and fuzzy logic
• Simulation and empirical results
Finding Decision Rules in Data

Example of On-Line Code Modification
• Execute a decision tree
  – Get wrong answer
• Add logic to distinguish between right and wrong cases
  – If Comfort Zone = Water,
    • then Animal = Hippo,
    • else Animal = Rhino
  – True, but Animal is Dinosaur, not Hippo
  – Ask user for right answer
  – Ask user for a rule that distinguishes between right and wrong answers: If Animal is extinct, …
Markov Decision Process

• Model for decision making under uncertainty

$\left[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') \right]$

where

$S$: finite set of states, $x_1, x_2, \ldots, x_K$
$A$: finite set of actions, $a_1, a_2, \ldots, a_M$
$P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\; a(t_i) = a_m \right]$
$R_{a_m}(x_k, x')$: expected immediate reward for the transition from $x_k$ to $x'$

• Optimal decision maximizes expected total reward (or minimizes expected total cost) by choosing the best set of actions (or control policy)
  – Linear-quadratic-Gaussian (LQG) control
  – Dynamic programming -> HJB equation ~> A* search
  – Reinforcement learning ~> Heuristic search
Maximizing the Utility Function of a Markov Process

Utility function:

$J = \sum_{t=0}^{\infty} \gamma(t)\, R_{a(t)}\left[ x(t), x(t+1) \right]$

$\gamma(t)$: discount rate, $0 < \gamma(t) < 1$

Utility function to go = value function:

$V = \sum_{t = t_{current}}^{\infty} \gamma(t)\, R_{a(t)}\left[ x(t), x(t+1) \right]$

• Optimal control at $t$:

$u_{opt}(t) = \arg\max_a \left\{ \sum_{t = t_{current}}^{\infty} \left( R_{a(t)}\left[ x(t), x(t+1) \right] + \gamma(t)\, P_{a(t)}\left[ x(t), x(t+1) \right] V\left[ x(t+1) \right] \right) \right\}$

• Optimized value function:

$V^*(t) = \sum_{t = t_{current}}^{\infty} \left( R_{u_{opt}(t)}\left[ x^*(t) \right] + \gamma(t)\, P_{u_{opt}(t)}\left[ x^*(t), x_{est}^*(t+1) \right] V\left[ x_{est}^*(t+1) \right] \right)$
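One standard way to compute the optimized value function above is value iteration, which sweeps the Bellman recursion until the values stop changing. The tiny two-state MDP below is a made-up illustration, not an example from the lecture:

```python
# Value iteration on a small invented MDP, iterating the recursion above:
# V*(x) = max_a sum_x' P_a(x,x') * ( R_a(x,x') + gamma * V*(x') )
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9                                  # constant discount rate

V = {0: 0.0, 1: 0.0}
for _ in range(200):                         # sweep to (near) convergence
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

# Extract the optimal policy: the action achieving the max in each state.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(policy)   # {0: 'go', 1: 'stay'}
```

With these rewards the "stay" action in state 1 earns 2 per step, so its value converges to 2/(1 − 0.9) = 20, and the best action from state 0 is to move there.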
Reinforcement (“Q”) Learning Control of a Markov Process

$u_{best}(t) = \arg\max_u Q\left[ x(t+1), u \right]$

• Q: quality of a state-action function
• Heuristic value function
• One-step philosophy for heuristic optimization
• Various algorithms for computing best control value

$Q\left[ x(t+1), u(t+1) \right] = Q\left[ x(t), u(t) \right] + \alpha(t) \left\{ R_{u(t)}\left[ x(t) \right] + \gamma(t) \max_u Q\left[ x(t+1), u \right] - Q\left[ x(t), u(t) \right] \right\}$

$\alpha(t)$: learning rate, $0 < \alpha(t) < 1$
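The one-step update above can be exercised directly on a toy problem. The two-state, two-action MDP below is invented for illustration; a minimal tabular sketch:

```python
import random

random.seed(0)

# Deterministic toy MDP: P[(state, action)] -> (next_state, reward).
P = {
    (0, 0): (0, 0.0), (0, 1): (1, 1.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma = 0.5, 0.9          # constant learning rate and discount rate

state = 0
for _ in range(500):
    action = random.choice((0, 1))           # explore uniformly
    next_state, reward = P[(state, action)]
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    # Q[x(t),u(t)] += alpha * { R + gamma * max_u Q[x(t+1),u] - Q[x(t),u(t)] }
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Greedy policy from the learned Q table; action 1 should be preferred in both states.
best = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(best)
```

Note that Q-learning finds this policy from sampled transitions alone, without ever being given the transition model P as the value-function recursion requires.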
Examples: Q-Learning Snail; Q-Learning, Ball on Plate
Q-Learning Control of a Markov Process is Analogous to LQG Control in the LTI Case

$Q\left[ x(t+1), u(t+1) \right] = Q\left[ x(t), u(t) \right] + \alpha(t) \left\{ R_{u(t)}\left[ x(t) \right] + \gamma(t) \max_u Q\left[ x(t+1), u \right] - Q\left[ x(t), u(t) \right] \right\}$

$\alpha(t)$: learning rate, $0 < \alpha(t) < 1$

Controller:

$x_{k+1} = \Phi x_k - \Gamma C \left( \hat{x}_k - x_k^* \right)$

Estimator:

$\hat{x}_k = \Phi \hat{x}_{k-1} - \Gamma C \left( \hat{x}_{k-1} - x_{k-1}^* \right) + K \left( z_k - H_x \hat{x}_{k-1} \right)$
LQG Control Optimizes a Discrete-Time LTI Markov Process

$\left[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') \right]$

where

$S$: infinite set of states, $x_1, x_2, \ldots$
$A$: infinite set of actions, $a_1, a_2, \ldots$
$P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\; a(t_i) = a_m \right]$
$R_{a_m}(x_k, x')$: expected immediate reward for the transition from $x_k$ to $x'$
Structuring an Efficient Decision Tree (Off-Line)

• Choose most important attributes first
• Recognize when no result can be deduced
• Exclude irrelevant factors
• Iterative Dichotomizer*: the ID3 Algorithm
  – Build an efficient decision tree from a fixed set of examples (supervised learning)

*Dichotomy: division into two (usually contradictory) parts or opinions
Fuzzy Ball-Game Training Set

Case # | Forecast | Temperature | Humidity | Wind   | Play Ball?
     1 | Sunny    | Hot         | High     | Weak   | No
     2 | Sunny    | Hot         | High     | Strong | No
     3 | Overcast | Hot         | High     | Weak   | Yes
     4 | Rain     | Mild        | High     | Weak   | Yes
     5 | Rain     | Cool        | Low      | Weak   | Yes
     6 | Rain     | Cool        | Low      | Strong | No
     7 | Overcast | Cool        | Low      | Strong | Yes
     8 | Sunny    | Mild        | High     | Weak   | No
     9 | Sunny    | Cool        | Low      | Weak   | Yes
    10 | Rain     | Mild        | Low      | Weak   | Yes
    11 | Sunny    | Mild        | Low      | Strong | Yes
    12 | Overcast | Mild        | High     | Strong | Yes
    13 | Overcast | Hot         | Low      | Weak   | Yes
    14 | Rain     | Mild        | High     | Strong | No

(Forecast, Temperature, Humidity, and Wind are attributes; Play Ball? is the decision.)
Parameters of the ID3 Algorithm

• Decisions, e.g., play ball or don't play ball
  – D = number of possible decisions
  – Decision: yes, no
Parameters of the ID3 Algorithm

• Attributes, e.g., temperature, humidity, wind, weather forecast
  – M = number of attributes to be considered in making a decision
  – I_m = number of values that the mth attribute can take
    • Temperature: Hot, mild, cool
    • Humidity: High, low
    • Wind: Strong, weak
    • Forecast: Sunny, overcast, rain
Parameters of the ID3 Algorithm

• Training trials, e.g., all the games played last month
  – N = number of training trials
  – n(i) = number of examples with the ith attribute
Example: Probability Spaces for Three Attributes

• Attribute #1: 2 possible values
• Attribute #2: 6 possible values
• Attribute #3: 4 possible values
• Probability of an attribute value is represented by area in the diagram
Example: Decision, Given Values of Three Attributes

• Attribute #1: 2 possible values
• Attribute #2: 6 possible values
• Attribute #3: 4 possible values
Accurate Detection of Events Depends on Their Probability of Occurrence

[Figure: detection examples for noise levels σ_noise = 0.1, 0.2, 0.4]
Entropy Measures Information Content of a Signal

• S = entropy of a signal encoding I distinct events

$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$

• i = index identifying an event encoded by a signal
• Pr(i) = probability of the ith event
• $-\log_2 \Pr(i)$ = number of bits required to characterize the probability that the ith event occurs

$0 \le \Pr(\cdot) \le 1$
$\log_2 \Pr(\cdot) \le 0$
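The entropy formula translates directly to code; a minimal sketch with made-up probability vectors (the helper name `entropy` is my own):

```python
import math

def entropy(probs):
    """S = -sum_i Pr(i) * log2 Pr(i); zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two equally likely events carry exactly one bit of information.
print(entropy([0.5, 0.5]))           # 1.0
# Four equally likely events carry two bits.
print(entropy([0.25] * 4))           # 2.0
# A lopsided pair (Pr = 0.125 / 0.875) carries much less.
print(round(entropy([0.125, 0.875]), 3))   # 0.544
```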
Entropy of Two Events with Various Frequencies of Occurrence

• Frequencies of occurrence estimate probabilities of each event (#1 and #2)
  – Pr(#1) = n(#1)/N
  – Pr(#2) = n(#2)/N = 1 – n(#1)/N

$S = S_{\#1} + S_{\#2} = -\Pr(\#1) \log_2 \Pr(\#1) - \Pr(\#2) \log_2 \Pr(\#2)$

$\log_2 \Pr(\#1 \text{ or } \#2) \le 0$

• $-\Pr(i) \log_2 \Pr(i)$ represents the channel capacity (i.e., average number of bits) required to portray the ith event
Entropy of Two Events with Various Frequencies of Occurrence

Entropies for 128 Trials (N = 128)

  n  | Pr(#1) = n/N | log2(n/N) | Pr(#2) = 1 - n/N | log2(1 - n/N) | Entropy S
   1 | 0.008        | -7        | 0.992            | -0.011        | 0.066
   2 | 0.016        | -6        | 0.984            | -0.023        | 0.116
   4 | 0.031        | -5        | 0.969            | -0.046        | 0.201
   8 | 0.063        | -4        | 0.938            | -0.093        | 0.337
  16 | 0.125        | -3        | 0.875            | -0.193        | 0.544
  32 | 0.25         | -2        | 0.75             | -0.415        | 0.811
  64 | 0.50         | -1        | 0.50             | -1            | 1
  96 | 0.75         | -0.415    | 0.25             | -2            | 0.811
 112 | 0.875        | -0.193    | 0.125            | -3            | 0.544
 120 | 0.938        | -0.093    | 0.063            | -4            | 0.337
 124 | 0.969        | -0.046    | 0.031            | -5            | 0.201
 126 | 0.984        | -0.023    | 0.016            | -6            | 0.116
 127 | 0.992        | -0.011    | 0.008            | -7            | 0.066
Best Decision is Related to Entropy and the Probability of Occurrence

$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$

• High entropy
  – Signal provides high coding precision of distinct events
  – Differences coded with few bits
• Low entropy
  – Lack of distinction between signal values
  – Detecting differences requires many bits
• Best classification of events when S = 1...
  – but that may not be achievable
Decision-Making Parameters for ID3

• S_D = entropy of all possible decisions

$S_D = -\sum_{d=1}^{D} \Pr(d) \log_2 \Pr(d)$

• G_i = information gain of the ith attribute

$G_i = S_D + \sum_{i=1}^{I_m} \Pr(i) \sum_{d=1}^{D} \Pr(i_d) \log_2 \Pr(i_d)$

• $\Pr(i_d) = n(i_d)/N(d)$ = probability that the ith attribute value correlates with the dth decision
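The gain formula can be checked against the ball-game training set above. In the sketch below (helper names `entropy` and `gain` are my own), the inner sum of $\Pr(i_d) \log_2 \Pr(i_d)$ terms is computed as a subtracted conditional entropy, which is the same quantity:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of decision labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target="Play"):
    """Information gain: G = S_D - sum over attribute values of
    (fraction of examples with that value) * (entropy of decisions given it)."""
    total = entropy([e[target] for e in examples])
    n = len(examples)
    for value in set(e[attr] for e in examples):
        subset = [e[target] for e in examples if e[attr] == value]
        total -= len(subset) / n * entropy(subset)
    return total

# The 14-case training set from the slides.
rows = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Low","Weak","Yes"), ("Rain","Cool","Low","Strong","No"),
    ("Overcast","Cool","Low","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Low","Weak","Yes"), ("Rain","Mild","Low","Weak","Yes"),
    ("Sunny","Mild","Low","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Low","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
keys = ("Forecast", "Temperature", "Humidity", "Wind", "Play")
examples = [dict(zip(keys, r)) for r in rows]

# Forecast has the largest root gain (~0.247), so ID3 chooses it first,
# matching the root-attribute ranking quoted on the slides.
for attr in keys[:-1]:
    print(attr, round(gain(examples, attr), 3))
```

ID3 recurses: it splits on the highest-gain attribute, then repeats the gain calculation within each branch's subset of examples.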
Decision Tree Produced by ID3 Algorithm

• Temperature is inconsequential and is not included in the decision tree
• Root attribute gains, G_i
  – Forecast: 0.246
  – Temperature: 0.029
  – Humidity: 0.151
  – Wind: 0.048
Decision Tree Produced by ID3 Algorithm

• Sunny branch attribute gains, G_i
  – Temperature: 0.57
  – Humidity: 0.97
  – Wind: 0.019
Search

• Typical AI textbook problems
  – Prove a theorem
  – Solve a puzzle (e.g., Tower of Hanoi)
  – Find a sequence of moves that wins a game (e.g., chess)
  – Find the shortest path connecting a set of points (e.g., traveling salesman problem)
  – Find a sequence of symbolic transformations that solve a calculus problem (e.g., Mathematica)
• The common thread: search
  – Structures for search
  – Strategies for search
Curse of Dimensionality

• Feasible search paths may grow without bound
  – Possible combinatorial explosion
  – Checkers: 5 x 10^20 possible moves
  – Chess: 10^120 moves
  – Protein folding: ?
• Limiting search complexity
  – Redefine search space
  – Employ heuristic (i.e., pragmatic) rules
  – Establish restricted search range
  – Invoke decision models that have worked in the past
Structures for Search

• Trees
  – Single path between root and any node
  – Path between adjacent nodes = arc
  – Root node: no precursors
  – Leaf node: no successors; possible terminator
Structures for Search

• Graphs
  – Multiple paths between root and some nodes
  – Trees are subsets of graphs
Directions of Search

• Forward chaining
  – Reason from premises to actions
  – Data-driven: draw conclusions from facts
• Backward chaining
  – Reason from actions to premises
  – Goal-driven: find facts that support hypotheses
Strategies for Search

• Realistic assessment
  – Not necessary to consider all 10^120 possible moves to play good chess
  – Playing excellent chess may require much forward and backward chaining, but not 10^120 evaluations
  – Most applications are more procedural
• Search categories
  – Blind search
  – Heuristic search
  – Probabilistic search
  – Optimization
• Search forward from opening?
• Search backward from end game?
• Both?
Blind Search

• Node expansion
  – Find all successors to that node
• Depth-first forward search
  – Expand nodes descended from most recently expanded node
  – Consider other paths only after reaching a node with no successors
• Breadth-first forward search
  – Expand nodes in order of proximity to the start node
  – Consider all sequences of arc number n (from root node) before considering any of number (n + 1)
  – Exhaustive, but guaranteed to find the shortest path to a terminator
Blind Search

• Bidirectional search
  – Search forward from root node and backward from one or more leaf nodes
  – Terminate when search nodes coincide
• Minimal-cost forward search
  – Each arc is assigned a cost
  – Expand nodes in order of minimum cost
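Breadth-first forward search as described above can be sketched in a few lines; the graph is a made-up example, not one from the lecture:

```python
from collections import deque

def breadth_first_path(graph, root, goal):
    """Blind breadth-first search: expand nodes in order of arc distance from
    the root, so the first path to reach the goal has the fewest arcs."""
    frontier = deque([[root]])      # FIFO queue of partial paths
    visited = {root}
    while frontier:
        path = frontier.popleft()   # shallowest unexpanded node
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):   # node expansion
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None                     # goal unreachable

# Invented example graph: multiple routes from A to F.
g = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(breadth_first_path(g, "A", "F"))   # ['A', 'B', 'D', 'F'] (3 arcs)
```

Depth-first search is the same loop with `frontier.pop()` (LIFO) instead of `popleft()`, trading the shortest-path guarantee for lower memory use.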
AND/OR Graph Search

• A node is “solved” if
  – It is a leaf node with a satisfactory goal state
  – It has solved AND nodes as successors
  – It has OR nodes as successors, at least one of which is solved
• Goal: solve the root node
Heuristic Search

• For large problems, blind search typically leads to combinatorial explosion
• Employ heuristic knowledge about the quality of possible paths
  – Decide which node to expand next
  – Discard (or prune) nodes that are unlikely to be fruitful
• Search for feasible (approximately optimal) rather than optimal solutions
• Ordered or best-first search
  – Always expand “most promising” node
Heuristic Optimal Search

Heuristic Dynamic Programming: A* Search

• Each arc bears an incremental cost
• Cost, J, estimated at kth instant =
  – Cost accrued to k
  – Remaining cost to reach final point, k_f
• Goal: minimize estimated cost by choice of remaining arcs
• Choose arc_{k+1}, arc_{k+2}, … accordingly
• Use heuristics to estimate remaining cost

$\hat{J}_{k_f} = \sum_{i=1}^{k} J_i + \sum_{i=k+1}^{k_f} \hat{J}_i(\mathrm{arc}_i)$
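A minimal sketch of A* search under these definitions: always expand the node minimizing accrued cost plus heuristic estimate of remaining cost. The graph, arc costs, and heuristic values below are invented for illustration:

```python
import heapq

def a_star(neighbors, cost, heuristic, start, goal):
    """A* search: expand the frontier node minimizing
    J_hat = (cost accrued to node) + (heuristic estimate of remaining cost).
    If the heuristic never overestimates, the returned path has minimum cost."""
    frontier = [(heuristic(start), 0.0, start, [start])]   # (J_hat, accrued, node, path)
    best_cost = {start: 0.0}
    while frontier:
        _, accrued, node, path = heapq.heappop(frontier)
        if node == goal:
            return accrued, path
        for nxt in neighbors[node]:
            new_cost = accrued + cost[(node, nxt)]
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]))
    return float("inf"), None

# Invented example; heuristic is a lookup table standing in for, e.g.,
# straight-line distance to the goal (it never overestimates here).
nbrs = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
arc_cost = {("S", "A"): 1, ("S", "B"): 4, ("A", "G"): 5, ("B", "G"): 1}
h = {"S": 3, "A": 4, "B": 1, "G": 0}.get
print(a_star(nbrs, arc_cost, h, "S", "G"))   # (5, ['S', 'B', 'G'])
```

Note how the heuristic steers the search: the cheap first arc to A looks tempting, but h(A) = 4 warns that finishing through A is expensive, so the B route is expanded first.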
Mechanical Control System

Inferential Fault Analyzer for Helicopter Control System

• Local failure analysis
  – Set of hypothetical models of specific failures
• Global failure analysis
  – Forward reasoning assesses failure impact
  – Backward reasoning deduces possible causes
• Components: cockpit controls, forward rotor, aft rotor
• Frames store facts and facilitate search and inference
  – Components and up-/downstream linkages of control system
  – Failure model parameters
  – Rule base for failure analysis (LISP)
Local Failure Analysis

Heuristic Search

• Global failure analysis
  – Determination based on aggregate of local models
• Heuristic score based on
  – Criticality of failure
  – Reliability of component
  – Extensiveness of failure
  – Implicated devices
  – Level of backtracking
  – Severity of failure
  – Net probability of failure model
Global Failure Analysis
Shortest Path Problems

• Find the shortest (or least costly) path that visits all selected cities just once
  – Traveling Salesman
  – MapQuest/GPS/GIS
• Neural network solution
• Simulated annealing solution
• Genetic algorithm solution
• Modified Dijkstra Algorithm
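The slides cite a modified Dijkstra algorithm without detailing the modification; below is a sketch of the standard algorithm it builds on, which finds least-cost paths from one node to all others. The city network is a made-up example:

```python
import heapq

def dijkstra(graph, start):
    """Dijkstra's algorithm: least path cost from start to every reachable node.
    graph[node] is a list of (neighbor, arc_cost) pairs; arc costs nonnegative."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)   # closest unsettled node
        if node in done:
            continue                    # stale heap entry
        done.add(node)
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

# Invented symmetric city network; arc costs stand in for distances.
g = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1), ("D", 4)],
    "C": [("A", 5), ("B", 1), ("D", 1)],
    "D": [("B", 4), ("C", 1)],
}
print(dijkstra(g, "A"))   # A->C costs 3 via B, and A->D costs 4 via B and C
```

This solves the single-source shortest-path problem; the traveling-salesman variant above (visit all cities exactly once) is NP-hard, which is why the heuristic, annealing, and genetic approaches are listed alongside it.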
Next Time: Knowledge Representation