
Hidden Markov Models and Reinforcement Learning

Week 6 Presentation (Cogs 202, Spring 2014): Yashodhan, Chun, Ning

Hidden Markov Models (HMM)

Questions:
● What are HMMs useful for?
● What are some of the assumptions underlying HMMs?
● What are the three problems for HMMs? Explain each in terms of the coin toss example.

Coin Toss Example

Three coins: C1, C2, C3. Select a coin at random, flip it, and repeat.

Given only the outcome sequence HTTHTHHT, can we find out the sequence of coins that was chosen?

Coin:     C1   C2   C1   C3   C1   C1   C2   C3
Outcome:  H    T    T    H    T    H    H    T

Hidden Markov Model

State sequence i1, i2, ..., iT (coins):          C1   C2   C1   C3   C1   C1   C2   C3
Observation sequence O1, O2, ..., OT (outcomes):  H    T    T    H    T    H    H    T

N = number of distinct states (N = 3 here)
M = number of distinct observation symbols (M = 2 here)
T = length of the observation sequence (T = 8 here)
Denote the N states by 1, 2, ..., N (state i corresponds to coin i being chosen).
Denote the M observation symbols by V = {v1, ..., vM} (v1 = H, v2 = T).

Hidden Markov Model: Components

Component: Initial state distribution (πi)
Meaning:   Probability of being in state i at t = 1
Example:   For i = 1, this is the probability of choosing coin 1 at t = 1

Component: Transition matrix (A = {aij})
Meaning:   Probability of a transition from state i to state j
Example:   a12 is the probability of choosing coin 2 immediately after coin 1

Component: Emission matrix (B = {bj(k)})
Meaning:   Probability of observing vk in state j
Example:   b1(2) is the probability of observing Tails given that coin 1 has been chosen
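To make the components concrete, here is a minimal sketch of the coin-toss HMM in Python; the numeric probabilities are illustrative placeholders, not values given in the slides.

```python
import numpy as np

# Coin-toss HMM: 3 hidden states (coins C1..C3), 2 observation symbols (H, T).
# All probability values below are made-up illustrative numbers.
pi = np.array([1/3, 1/3, 1/3])        # initial state distribution: P(coin i at t = 1)
A = np.array([[0.5, 0.3, 0.2],        # A[i, j] = a_ij = P(coin j next | coin i now)
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
B = np.array([[0.7, 0.3],             # B[j, k] = b_j(k) = P(symbol k | coin j); k: 0 = H, 1 = T
              [0.4, 0.6],
              [0.5, 0.5]])

# Observation sequence HTTHTHHT encoded as indices (H = 0, T = 1)
obs = [0, 1, 1, 0, 1, 0, 0, 1]
```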

Assumptions

● Finite context: the current state depends only on a fixed, finite number of preceding states (first order: only the immediately preceding state).
● Shared distributions: the transition and emission probabilities do not change over time.

Three Problems for HMMs

1. Probability of the observation sequence
2. Choosing the most likely state sequence
3. Estimating the parameters of the HMM

Problem 1 - Direct Computation

P(O) = Σ_I P(O, I)                                                            (marginalization)
     = Σ_I P(O | I) P(I)                                                      (product rule)
     = Σ_{i1,...,iT} π_{i1} b_{i1}(O1) a_{i1 i2} b_{i2}(O2) ··· a_{i(T-1) iT} b_{iT}(OT)   (conditional independence)

This involves on the order of 2T · N^T multiplications. For N = 5, T = 100, that is ~10^72 multiplications.
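For comparison with the forward algorithm introduced next, a naive sketch of the direct computation simply enumerates all N^T state sequences (feasible only for tiny N and T); it assumes the pi, A, B, obs arrays from the earlier sketch.

```python
from itertools import product

def likelihood_brute_force(pi, A, B, obs):
    """P(O) by summing P(O, I) over all N**T state sequences (exponential cost)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in product(range(N), repeat=T):          # N**T state sequences
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total
```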

Problem 1 - Forward Vector

The forward vector α_t(i) is the probability of the first t observations together with state i at time t:

α_t(i) = P(O1, ..., Ot, i_t = i)

Recursion (by marginalization, the product rule, and conditional independence):

α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_{ij} ] b_j(O_{t+1})

Base case of the recursion:

α_1(i) = π_i b_i(O1)

Problem 1 - Forward Vector

Now we can write P(O) in terms of the forward vector (by marginalization):

P(O) = Σ_{i=1}^{N} α_T(i)

Problem 1 - Using the Forward Vector

1. Compute the forward vector at t = 1:  α_1(i) = π_i b_i(O1)
2. Compute the forward vectors for t = 2 to T:  α_t(j) = [ Σ_i α_{t-1}(i) a_{ij} ] b_j(Ot)
3. Compute the probability of the observation sequence:  P(O) = Σ_i α_T(i)

[Trellis diagrams: states i = 1..N across time steps t = 1..T]

The number of multiplications is of the order of N^2 T.
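The three steps translate directly into code; a minimal sketch, again assuming the pi, A, B, obs arrays defined earlier.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns the alpha trellis and P(O); ~N^2*T multiplications."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # step 1: alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):                         # step 2: alpha_t(j) = (sum_i alpha_{t-1}(i) a_ij) * b_j(O_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                 # step 3: P(O) = sum_i alpha_T(i)
```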

Problem 1 - Using the Backward Vector

The backward vector β_t(i) is the probability of the observation sequence from time t+1 to T, given state i at time t:

β_t(i) = P(O_{t+1}, ..., OT | i_t = i)

1. Compute the backward vector at t = T:  β_T(i) = 1
2. Compute the backward vectors for t = T-1 down to 1:  β_t(i) = Σ_j a_{ij} b_j(O_{t+1}) β_{t+1}(j)
3. Compute the probability of the observation sequence:  P(O) = Σ_i π_i b_i(O1) β_1(i)

[Trellis diagrams: states i = 1..N across time steps t = 1..T]
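A matching sketch of the backward recursion, under the same assumptions as the forward sketch; the forward and backward passes should agree on P(O), which is a convenient sanity check.

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward algorithm: returns the beta trellis and P(O)."""
    N, T = len(pi), len(obs)
    beta = np.ones((T, N))                                  # base case: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                          # beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, float(np.sum(pi * B[:, obs[0]] * beta[0])) # P(O) = sum_i pi_i b_i(O_1) beta_1(i)
```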

Reinforcement Learning

● Agent and environment interact at discrete time steps t = 0, 1, 2, ...
● The agent observes the state at step t: s_t ∈ S
● It produces an action at step t: a_t ∈ A(s_t)
● It gets a resulting reward r_{t+1} ∈ ℜ and a resulting next state s_{t+1}
● Policy at step t, π_t: a mapping from states to action probabilities, with π_t(s, a) = probability that a_t = a when s_t = s

Goals and Rewards

● Reward: a single number r_t at each time step
● Agent's goal: maximize the cumulative reward in the long run
● Examples of rewards
  ○ Maze: +1 for escape, -1 for each time step prior to escape
  ○ Walking: proportional to the robot's forward motion
● The focus is on what should be achieved, not how it should be achieved
  ○ Chess: reward only for winning, not for achieving sub-goals

Returns - Formalizing the Goal

Reward sequence after time t: r_{t+1}, r_{t+2}, r_{t+3}, ...
Return R_t: a function of the reward sequence; the agent maximizes the expected return.

Episodic tasks:
R_t = r_{t+1} + r_{t+2} + ... + r_T

Continuing tasks (discounted return):
R_t = Σ_{k=0}^{∞} γ^k r_{t+k+1},  where γ (0 ≤ γ < 1) is the discount rate
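As a small illustration, the discounted return can be computed directly from a reward sequence; gamma = 0.9 below is an arbitrary choice, not a value from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """R_t = sum_k gamma**k * r_{t+k+1} for a (finite) reward sequence after time t."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Pole-balancing as a continuing task: k zero rewards followed by -1 at failure gives
# discounted_return([0.0] * k + [-1.0], gamma) == -(gamma ** k)
```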

Example: Pole Balancing

Failure: the pole falling beyond a critical angle, or the cart hitting the end of the track.

As an episodic task (episode ends upon failure):
reward = +1 for each step before failure
return = number of steps before failure

As a continuing task with discounted return:
reward = -1 upon failure, 0 otherwise
return = -γ^k, for k steps before failure

In either case, the return is maximized by avoiding failure for as long as possible.

The Markov Property

In the general case, the environment's response at time t+1 may depend on everything that has happened earlier:

Pr{ s_{t+1} = s', r_{t+1} = r | s_t, a_t, r_t, s_{t-1}, a_{t-1}, ..., r_1, s_0, a_0 }

If the environment's state has the Markov property, the response at t+1 depends only on the state and action at time t:

Pr{ s_{t+1} = s', r_{t+1} = r | s_t, a_t }

Markov Decision Process (MDP)

Definition: a reinforcement learning task that satisfies the Markov property.

Transition probability:
P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a }

Expected value of the next reward:
R^a_{ss'} = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }

Recycling Robot MDP

❏ At each time step the robot must decide whether to 1) actively search for a can, 2) remain stationary and wait for someone to bring it a can, or 3) go back home to recharge its battery
❏ Reward = number of cans collected
❏ Searching collects more cans but drains the battery; if the battery runs out, the robot has to be rescued
❏ The decision is based solely on the energy level of the battery

Recycling Robot MDP, cont'd

❏ Searching beginning with high energy leaves the energy level high with probability α and low with probability 1 - α
❏ Searching beginning with low energy leaves the energy level low with probability β and depletes the battery with probability 1 - β
❏ Each collected can counts as one unit of reward
❏ Being rescued results in a reward of -3

Recycling Robot MDP, cont'd
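One way to write down the dynamics described above is as a table of (probability, next state, reward) triples; the sketch below uses the conventional α and β for the probabilities that searching leaves the energy level unchanged, and the numeric values of alpha, beta, r_search, and r_wait are illustrative assumptions, not values from the slides.

```python
# Recycling-robot dynamics as (state, action) -> [(probability, next_state, reward), ...].
# alpha = P(energy stays high | search from high), beta = P(energy stays low | search from low).
alpha, beta = 0.8, 0.6               # illustrative values
r_search, r_wait = 2.0, 1.0          # expected cans per step while searching / waiting (assumed)

P = {
    ('high', 'search'):   [(alpha, 'high', r_search), (1 - alpha, 'low', r_search)],
    ('high', 'wait'):     [(1.0, 'high', r_wait)],
    ('low',  'search'):   [(beta, 'low', r_search), (1 - beta, 'high', -3.0)],  # depleted -> rescued, -3
    ('low',  'wait'):     [(1.0, 'low', r_wait)],
    ('low',  'recharge'): [(1.0, 'high', 0.0)],
}
```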

State-Value Function for a Policy π

❏ State-value function: the expected return when starting in state s and following policy π thereafter:
   V^π(s) = E_π{ R_t | s_t = s }
❏ Policy: a policy π is a mapping from each state s and action a to the probability π(s, a) of taking action a in state s

State-Value Function, cont'd

Backup diagram and Bellman equation for V^π:

V^π(s) = Σ_a π(s, a) Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V^π(s') ]
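The Bellman equation suggests iterative policy evaluation: repeatedly replace V(s) by its right-hand side until the values stop changing. A minimal sketch over the (state, action) table format used in the recycling-robot sketch above; the policy format, discount rate, and tolerance are assumptions.

```python
def policy_evaluation(P, policy, gamma=0.9, tol=1e-8):
    """Iterate the Bellman equation for V^pi until the largest update falls below tol.

    P:      dict (state, action) -> list of (prob, next_state, reward)
    policy: dict state -> {action: probability}
    """
    states = {s for (s, _) in P}
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = sum(p_a * sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                    for a, p_a in policy[s].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# e.g. evaluate the "always search" policy on the recycling robot:
# policy_evaluation(P, {'high': {'search': 1.0}, 'low': {'search': 1.0}})
```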

Action-Value Function for a Policy π

Action-value function: the expected return starting from state s, taking action a, and then following policy π:

Q^π(s, a) = E_π{ R_t | s_t = s, a_t = a }

Gridworld

Actions: north, south, east, west
Rewards: 1) -1 for any action that would take the agent off the grid; 2) 0 for all other actions, except those taken in the special states A and B; 3) +10 for any action in state A; 4) +5 for any action in state B

Optimal State-Value Functions

❏ There are always one or more policies that are better than or equal to all others. These are the optimal policies, denoted π*.
❏ Optimal state-value function:
   V*(s) = max_π V^π(s)

Bellman optimality equation for V* (with backup diagram):

V*(s) = max_a Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V*(s') ]

The Bellman optimality equation has a unique solution, independent of the policy.
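Replacing the sum over actions with a max turns the policy-evaluation sweep into value iteration, one standard way to solve the Bellman optimality equation numerically; a sketch under the same assumptions as the policy-evaluation sketch.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Solve the Bellman optimality equation for V* by repeated max-backups."""
    states = {s for (s, _) in P}
    actions = {s: [a for (s2, a) in P if s2 == s] for s in states}
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                    for a in actions[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V
```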

Optimal Action-Value Function Q*

The optimal action-value function gives the expected return for taking action a in state s and then following the optimal policy:

Q*(s, a) = max_π Q^π(s, a)

Bellman optimality equation for Q* (with backup diagram):

Q*(s, a) = Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ max_{a'} Q*(s', a') ]

Greedy Policy

● For each state, there are one or more actions at which the maximum in the Bellman optimality equation is attained. Any policy that assigns non-zero probability only to those actions is an optimal policy (a greedy policy).
● If one uses the optimal value function to evaluate the one-step consequences of actions, the greedy policy is actually optimal in the long-term sense.
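Given a value function V* (e.g. from the value-iteration sketch above), extracting a greedy policy is just a one-step lookahead; a minimal sketch in the same dict format.

```python
def greedy_policy(P, V, gamma=0.9):
    """In each state, pick an action maximizing the one-step Bellman lookahead under V."""
    policy = {}
    for s in {s for (s, _) in P}:
        acts = [a for (s2, a) in P if s2 == s]
        policy[s] = max(acts, key=lambda a: sum(p * (r + gamma * V[ns])
                                                for p, ns, r in P[(s, a)]))
    return policy

# V_star = value_iteration(P); pi_star = greedy_policy(P, V_star)
```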

Bellman Optimality Equations for the Recycling Robot

Yu & Dayan 2005
● Expected uncertainty and unexpected uncertainty
● Acetylcholine (ACh) and norepinephrine (NE)
● Observed phenomena:

Task Paradigm

Model for the Task and Neurochemistry

Generic Internal Model

Three Models
● The Ideal Learner

● The Approximate Inference Model

● The Bottom-up Naive Model

Model Comparison

Cost:

NE and ACh, using the Approximate Inference Model

Without Depletion of NE or ACh

With Depletion of NE or ACh

Different “Depletion” Scenario
