Artificial Intelligence CS 165A Mar 14, 2019 Instructor: Prof. Yu-Xiang Wang ® Review 1

Artificial Intelligence - UCSByuxiangw/classes/CS165A-2019winter/Lec… · Machine Learning: How machine learning works? 5 Continuous optimization Search: Solving problems with Search


Artificial IntelligenceCS 165A

Mar 14, 2019

Instructor: Prof. Yu-Xiang Wang

→ Review


Announcement

• HW4 due

• MP2 due
  – A tournament will be conducted and the results revealed on Piazza.

• HW3 grading is being done.
  – The TA is catching a homework deadline.
  – It will be distributed in Friday’s discussion class.
  – The TA will talk about HW3 and HW4 solutions.

• Final: Next Monday, 12 - 3!


Very important: Course evaluation

• Please complete the course evaluation form
  – I need one volunteer to help collect the responses and submit the forms to the CS department office.

• This is the first time I am running CS165A.
  – Your feedback will make a difference.

• We are in the process of modernizing the course.
  – New topics, e.g., RL.
  – More rigorous treatments.

• I hope CS165A will be a course that you brag about in your job interviews and remember for many years.


MP1: Leaderboard

1. Brian Humphreys: 100%
  – Multinomial Naive Bayes: using up to 5-gram features.

2. Claudia Zeng: 99.93%

3. Calvin Wang: 98.97%

• Other winners will be notified in private emails.
  – Number 4, 5 get 20 bonus points.
  – Number 6 - 10 get 10 bonus points.


MP1: Considerations

• Accuracy:
  – 50% is trivial.
  – 80% is the baseline multinomial naïve Bayes without additional feature engineering.

• Run time: From ~10 s to more than one hour
  – Use hashtables: dictionary. Use sparse representation.

• Modularize your code
  – Feature extractor(Text) → Feature
  – Train(Feature, Label) → Model parameters
  – Predict(Feature) → Prediction
  – Evaluate(Prediction, TrueLabel) → Accuracy
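The four modules above could be sketched as follows. This is a minimal illustration and not the MP1 reference solution: the function names, the whitespace tokenizer, the bigram limit `max_n=2`, and the smoothing constant `alpha` are all assumptions made for the example. Features are stored sparsely in dictionaries, as the run-time bullet suggests.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, max_n=2):
    """Feature extractor(Text) -> Feature: sparse n-gram counts in a dict."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def train(features, labels, alpha=1.0):
    """Train(Feature, Label) -> Model parameters (counts for multinomial NB)."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, y in zip(features, labels):
        feat_counts[y].update(feats)
        vocab.update(feats)
    return {"class_counts": class_counts, "feat_counts": feat_counts,
            "vocab_size": len(vocab), "alpha": alpha}


def predict(model, feats):
    """Predict(Feature) -> Prediction: class with the largest log-posterior."""
    n_docs = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for y, n_y in model["class_counts"].items():
        counts = model["feat_counts"][y]
        # Laplace-smoothed denominator (alpha is an assumed hyperparameter).
        denom = sum(counts.values()) + model["alpha"] * model["vocab_size"]
        score = math.log(n_y / n_docs)
        for f, c in feats.items():
            score += c * math.log((counts[f] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


def evaluate(predictions, true_labels):
    """Evaluate(Prediction, TrueLabel) -> Accuracy."""
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Toy data, invented for illustration only.
train_texts = ["great fun movie", "boring bad movie", "great acting", "bad plot"]
train_labels = ["pos", "neg", "pos", "neg"]
model = train([extract_features(t) for t in train_texts], train_labels)
preds = [predict(model, extract_features(t)) for t in ["great fun", "bad boring"]]
accuracy = evaluate(preds, ["pos", "neg"])
```

Keeping the four steps separate makes it easy to swap the feature extractor (for example, longer n-grams) without touching training or evaluation.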


Course schedule

Week  Topics
1     Introduction and Course Overview; AI Problem Solving and Intelligent Agents
2     Quantifying uncertainty; Probabilistic Reasoning: Bayes Network
3     Probabilistic Reasoning: Conditional Independences; Machine Learning: Supervised Learning
4     Machine Learning: Unsupervised Learning; Machine Learning: How machine learning works?
5     Continuous optimization; Search: Solving problems with Search
6     Search: Basic search; Midterm
7     Search: Informed search; RL: RL Overview and MDP
8     RL: Multi-arm Bandits; RL: Contextual Bandits & Policy evaluation
9     RL: Tabular MDP, RL algorithms; Logic: Propositional Logic
10    Logic: First order logic; Review session
11    Final Exam. March 18, 12:00 PM - 3:00 PM

Topic areas: Probabilistic Reasoning, Machine Learning, Search, Reinforcement Learning, Logic


Final exam

• A small fraction will be about Lectures 1 - 9

• Main focus on Lectures 10 - 17
  – Search
  – RL
  – Logic

• There will be questions that use tools from Lectures 1 - 9 on topics from the second half of the class


Lecture 1-2: AI Overview

• Strong AI / Weak AI
• Turing Test
• AI for problem solving
• Rational agents
• Examples of AI in the real world


Our view of AI

• So this course is about designing rational agents
  – Constructing f
  – For a given class of environments and tasks, we seek the agent (or class of agents) with the “best” performance
  – Note: Computational limitations make complete rationality unachievable in most cases

• In practice, we will focus on problem-solving techniques (ways of constructing f), not agents per se


Different Ways of Looking at the AI

• Agent types / level of intelligence
  – Low-level: Reflex agents
  – Mid-level: Goal-based / Utility-based agents: planning
  – High-level: Knowledge-based: Logic agents

• Optimization view
  – Everything is an optimization problem

• Theoretical aspects
  – Time/space complexity
  – Algorithms and data structures
  – Statistical properties: sample complexity


Rational agents and Optimization

• What is the objective function that an AI agent optimizes?
  – Likelihood, Training error, Utility, Reward, Regret

• What is the argument over which the AI agent is optimizing?
  – Policy, action, search strategy

• What are the inputs from the environment to that objective function?
  – Observation, State, Reward, Feedback, Labels, Features.

• What are the algorithms used for these agents?
  – Gradient descent, SGD. Tree search, Graph search. Dynamic programming. Explore-Exploit.


Lecture 3-5: Probabilities and BayesNet

• CPTs
  – Count the number of independent parameters needed to represent a CPT

• Conditional, Marginal, Probabilistic Inference with Bayes Rule

• Read off conditional independences from the graph
  – d-separation
  – Markov Blanket

• Undirected graphical models (Not to be in the exam)


Example: Flu and measles

Network: Flu → Fever ← Measles → Spots

CPTs:
  P(Flu) = 0.01
  P(Measles) = 0.001
  P(Spots | Measles) = [0, 0.9]
  P(Fever | Flu, Measles) = [0.01, 0.8, 0.9, 1.0]

Compute P(Flu | Fever) and P(Flu | Fever, Spots). Are they equivalent?
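One way to answer the question is inference by enumeration over the joint distribution. The sketch below assumes an ordering for the CPT entries that the slide does not spell out: P(Spots | Measles) is read as [Measles = false, Measles = true], and P(Fever | Flu, Measles) as the (Flu, Measles) combinations (F, F), (F, T), (T, F), (T, T). The function names are made up for the example.

```python
from itertools import product

P_FLU = 0.01
P_MEASLES = 0.001
# Assumed ordering of the CPT entries (not stated on the slide).
P_SPOTS = {False: 0.0, True: 0.9}                      # P(Spots=T | Measles)
P_FEVER = {(False, False): 0.01, (False, True): 0.8,
           (True, False): 0.9, (True, True): 1.0}      # P(Fever=T | Flu, Measles)


def joint(flu, measles, fever, spots):
    """Joint probability from the Bayes net factorization."""
    p = P_FLU if flu else 1 - P_FLU
    p *= P_MEASLES if measles else 1 - P_MEASLES
    p_fever = P_FEVER[(flu, measles)]
    p *= p_fever if fever else 1 - p_fever
    p_spots = P_SPOTS[measles]
    p *= p_spots if spots else 1 - p_spots
    return p


def posterior_flu(**evidence):
    """P(Flu = true | evidence), summing the joint over all consistent worlds."""
    num = den = 0.0
    for flu, measles, fever, spots in product([False, True], repeat=4):
        world = {"flu": flu, "measles": measles, "fever": fever, "spots": spots}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(flu, measles, fever, spots)
        den += p
        if flu:
            num += p
    return num / den


p_given_fever = posterior_flu(fever=True)                # about 0.457
p_given_fever_spots = posterior_flu(fever=True, spots=True)  # about 0.0125
```

Under these assumptions the two posteriors differ: observing Spots makes Measles certain, and Measles then "explains away" the Fever, so Flu becomes much less likely.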


d-separation and Markov Blanket

d-separation: the set of nodes E d-separates sets X and Y if every path from X to Y is blocked given E. There are 3 ways to block a path from X to Y, given E.

Markov blanket of a node:

1. Parents

2. Children

3. Children’s other parents
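The three ingredients of the Markov blanket can be collected directly from the graph. The sketch below assumes the network is given as a dictionary mapping each node to its list of parents; that representation, and the function name, are choices made for this example.

```python
def markov_blanket(parents, node):
    """Return parents, children, and children's other parents of `node`.

    `parents` maps every node of the DAG to the list of its parents.
    """
    blanket = set(parents[node])                                  # 1. parents
    children = [c for c, ps in parents.items() if node in ps]
    blanket.update(children)                                      # 2. children
    for child in children:
        blanket.update(parents[child])                            # 3. co-parents
    blanket.discard(node)  # the node itself is not part of its blanket
    return blanket


# The flu/measles network: Flu -> Fever <- Measles -> Spots
flu_net = {"Flu": [], "Measles": [], "Fever": ["Flu", "Measles"], "Spots": ["Measles"]}
blanket_of_flu = markov_blanket(flu_net, "Flu")
blanket_of_measles = markov_blanket(flu_net, "Measles")
```

Here Measles ends up in the blanket of Flu only because the two share the child Fever.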


Connecting BayesNet to MDP

• A Markov decision process has a collection of random variables
  – State, Action, Reward at time 1, 2, 3, 4, …

• The MDP specifies a very specific way in which these random variables are generated
  – In the form of a probability distribution that can be factorized.

• What is the BayesNet of an MDP?

• What are the conditional independences?
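As a hint for the two questions above, here is one possible factorization, written as a sketch under assumptions the slide does not fix: a finite horizon T, a Markov policy π(a | s), and rewards that depend only on the current state and action.

```latex
% Assumptions (not from the slide): horizon T, Markov policy \pi(a \mid s),
% reward R_t generated from (S_t, A_t) only.
\[
p(s_1, a_1, r_1, \dots, s_T, a_T, r_T)
  = p(s_1) \prod_{t=1}^{T} \pi(a_t \mid s_t)\, p(r_t \mid s_t, a_t)
    \prod_{t=1}^{T-1} p(s_{t+1} \mid s_t, a_t)
\]

% One conditional independence implied by this factorization (the Markov property):
\[
S_{t+1} \perp (S_1, A_1, \dots, S_{t-1}, A_{t-1}) \mid (S_t, A_t)
\]
```

In the corresponding BayesNet, each S_{t+1} has parents S_t and A_t, each A_t has parent S_t, and each R_t has parents S_t and A_t.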


Lecture 6-9: Machine Learning

• Types of machine learning: Supervised / unsupervised…

– Examples.

• ML is about coming up with objectives to optimize

– Often MLE, MAP.

– Often there is a graphical model.

• How to optimize?

– Gradient Descent, SGD

• Statistical Learning theory

– Uniform convergence using Hoeffding’s inequality and the union bound.
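For the "How to optimize?" bullet, gradient descent and SGD can be sketched in a few lines. The objectives, step sizes, and data below are toy choices for illustration, not examples from the lectures.

```python
import random


def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeat w <- w - lr * grad(w) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def sgd(data, w0, lr=0.05, steps=200, seed=0):
    """SGD for 1-D least squares: each step uses the gradient of one sample."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        x, y = rng.choice(data)
        w = w - lr * 2 * (w * x - y) * x   # gradient of (w*x - y)^2 w.r.t. w
    return w


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimizer is w = 3.
w_gd = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)

# Toy noiseless data generated by y = 2x; the least-squares solution is w = 2.
w_sgd = sgd([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)], w0=0.0)
```

With noisy data, SGD with a constant step size would hover around the minimizer instead of converging exactly; the noiseless data keeps the example simple.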


Understanding Machine Learning

• What do we mean by saying: ML works
  – Error decomposition
  – Empirical risk, Risk, Generalization

• When does ML not work? What are the assumptions?
  – iid: Independent and Identically Distributed
  – Training data and test data are drawn from the same distribution

• Statistical Learning theory
  – How many data points do we need?
  – Hoeffding’s inequality and the union bound.
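The "how many data points" question has a closed-form answer in the simplest setting. Assuming a finite hypothesis class H and a loss bounded in [0, 1], Hoeffding's inequality gives P(|empirical risk - risk| > ε) ≤ 2 exp(-2nε²) for one fixed hypothesis, and the union bound over H multiplies the right-hand side by |H|. Setting 2|H| exp(-2nε²) ≤ δ and solving for n gives the sketch below; the function name is hypothetical.

```python
import math


def sample_complexity(num_hypotheses, epsilon, delta):
    """Smallest n with 2 * |H| * exp(-2 * n * epsilon**2) <= delta.

    Assumes a finite hypothesis class and a loss bounded in [0, 1].
    """
    n = math.log(2 * num_hypotheses / delta) / (2 * epsilon ** 2)
    return math.ceil(n)


# Example: |H| = 1000, accuracy epsilon = 0.1, failure probability delta = 0.05.
n_needed = sample_complexity(1000, 0.1, 0.05)
```

Note that n grows only logarithmically with |H| but quadratically with 1/ε.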


Lecture 10 – 13: Search

• Formulate problems as search

• Algorithms for search

• Search for playing games (more than one player)


Example: Romania


How do we evaluate a search algorithm?

• Primary criteria to evaluate search strategies
  – Completeness: Is it guaranteed to find a solution (if one exists)?
  – Optimality*: Does it find the “best” solution (if there is more than one)?
  – Time complexity: Number of nodes generated/expanded (How long does it take to find a solution?)
  – Space complexity: How much memory does it require?

• Some performance measures
  – Best case
  – Worst case
  – Average case
  – Real-world case

*Note that optimality here is about solution quality; it does not say that the algorithm’s space/time complexity is optimal.


State-Space Diagrams

• State-space description can be represented by a state-space diagram, which shows
  – States (incl. initial and goal)
  – Operators/actions (state transitions)
  – Path costs


Search Tree

[Figure: Search tree (partially expanded), over states A through H]


Uninformed search algorithms


(Section 3.4.7 in the AIMA book.)
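As a reminder of what an uninformed strategy looks like in code, here is a minimal breadth-first graph search. The adjacency-list format and the small example graph are invented for illustration and are not taken from the book or the lectures.

```python
from collections import deque


def breadth_first_search(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([[start]])     # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None


# Hypothetical graph: node -> list of successors.
example_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "G"], "D": ["G"]}
bfs_path = breadth_first_search(example_graph, "A", "G")
```

Replacing the FIFO queue with a stack gives depth-first search, and with a priority queue ordered by path cost gives uniform-cost search.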


Informed search (A* search)

f(n) = g(n) + h(n)


What do we know about A* Search?

• If the heuristic is “admissible”
  – Optimistic: it never overestimates the cost to the goal
  – Then A* Tree Search is optimal, and optimally efficient.

• If the heuristic is “consistent”
  – Then A* Graph Search is optimal.

• Question to think about
  – How to learn a heuristic function?
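A compact A* graph search, ordering the frontier by f(n) = g(n) + h(n), might look like the sketch below. The graph, the step costs, and the heuristic values are made up for the example; the heuristic was chosen to be both admissible and consistent on this graph.

```python
import heapq


def a_star(graph, h, start, goal):
    """graph: node -> list of (neighbor, step_cost); h: node -> heuristic value.

    Returns (cost, path) of the path found, or None if the goal is unreachable.
    """
    frontier = [(h[start], 0, start, [start])]      # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                                # stale entry, a cheaper path is known
        for neighbor, cost in graph.get(node, []):
            g_new = g + cost
            if g_new < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = g_new
                heapq.heappush(
                    frontier, (g_new + h[neighbor], g_new, neighbor, path + [neighbor]))
    return None


# Hypothetical weighted graph and heuristic (h never exceeds the true cost to G).
weighted_graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 3)]}
heuristic = {"S": 5, "A": 4, "B": 3, "G": 0}
astar_result = a_star(weighted_graph, heuristic, "S", "G")
```

With h set to zero everywhere the same code behaves like uniform-cost search, which is one way to see what the heuristic contributes.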


Games and minimax search

• Specification of a game:
  – State-action-transition
  – Two-player, Zero-sum, perfect information, Deterministic

• Minimax search:
  – Search assuming your opponent is behaving adversarially.

• Two ways to speed up
  – Pruning
  – Cut off minimax search early, use a heuristic

[Figure: two-level game tree. Your move (MAX) at the root has value 3; the opponent’s moves (MIN) have values 3 and -8, over the leaves 7, 3, -8, 50.]
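The small game tree on this slide (leaves 7, 3, -8, 50) can be evaluated with minimax plus alpha-beta pruning. The nested-list tree encoding below is an assumption made for the sketch: a number is a leaf value and a list is an internal node.

```python
import math


def minimax(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node` with alpha-beta pruning."""
    if not isinstance(node, list):
        return node                      # leaf: return its evaluation
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, minimax(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # MIN above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, minimax(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # MAX above already has a better option
    return value


# Root is your move (MAX); each inner list is an opponent move (MIN).
game_tree = [[7, 3], [-8, 50]]
root_value = minimax(game_tree, maximizing=True)
```

In this tree the leaf 50 is never examined: once the second MIN node sees -8, it cannot beat the value 3 already guaranteed by the first branch.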


Search depth cutoff

[Figure: Tic-Tac-Toe game tree with search depth 2. Evaluations shown for X.]


Lecture 13 - 16: Reinforcement Learning

• Overview
  – What are the problems that are best solved by RL
  – How RL is related to supervised learning / search

• Settings:
  – Multi-armed bandit
  – Contextual bandit
  – Policy evaluation and causal inference
  – Reinforcement Learning

• Key concepts:
  – Explore-Exploit: The need for exploration
  – Value function, Q function: The need for long-term planning


Reinforcement learning problem setup

• State, Action, Reward and Observation: $S_t \in \mathcal{S}$, $A_t \in \mathcal{A}$, $R_t \in \mathbb{R}$, $O_t \in \mathcal{O}$

• Policy:
  – When the state is observable: $\pi : \mathcal{S} \to \mathcal{A}$
  – Or when the state is not observable: $\pi_t : (\mathcal{O} \times \mathcal{A} \times \mathbb{R})^{t-1} \to \mathcal{A}$

• Learn the best policy that maximizes the expected reward
  – Finite horizon (episodic) RL: $\pi^* = \arg\max_{\pi \in \Pi} \mathbb{E}\left[\sum_{t=1}^{T} R_t\right]$, where T is the horizon
  – Infinite horizon RL: $\pi^* = \arg\max_{\pi \in \Pi} \mathbb{E}\left[\sum_{t=1}^{\infty} \gamma^{t-1} R_t\right]$, where γ is the discount factor
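Both objectives sum rewards along a trajectory; the only difference is the discount weight γ^(t-1). A small helper makes this concrete. The function name and the reward sequences are invented for illustration.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**(t-1) * R_t for t = 1, ..., len(rewards).

    With gamma = 1 this is the finite-horizon (undiscounted) return.
    """
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1, 2, ...
    return sum(gamma ** k * r for k, r in enumerate(rewards))


undiscounted = discounted_return([1, 1, 1])           # 1 + 1 + 1
discounted = discounted_return([1, 1, 1], gamma=0.5)  # 1 + 0.5 + 0.25
```

With γ < 1 and bounded rewards the infinite sum stays finite, which is why the discount factor appears in the infinite-horizon objective.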


Multi-arm bandits: Problem setup

• No state. k actions: $a \in \mathcal{A} = \{1, 2, \ldots, k\}$

• You decide which arm to pull in every iteration: $A_1, A_2, \ldots, A_T$

• You collect a cumulative payoff of $\sum_{t=1}^{T} R_t$

• The goal of the agent is to maximize the expected payoff.
– Or to minimize the Regret: $T \max_{a \in [k]} \mathbb{E}[R_t \mid a] - \sum_{t=1}^{T} \mathbb{E}_{a \sim \pi}\big[\mathbb{E}[R_t \mid a]\big]$
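The regret of a fixed randomized policy can be computed directly from the definition. This is a minimal sketch, not part of the slides: the names `arm_means` (standing in for E[R_t | a]) and `policy` (the pull probabilities) are hypothetical, and the policy is assumed not to change over time.

```python
def expected_regret(arm_means, policy, horizon):
    """Expected regret of playing a fixed randomized policy for `horizon` rounds.

    arm_means[a] plays the role of E[R_t | a]; policy[a] is the probability
    that arm a is pulled in each round. Both are illustrative inputs.
    """
    if abs(sum(policy) - 1.0) > 1e-9:
        raise ValueError("policy must be a probability distribution over arms")
    best = max(arm_means)                                        # max_a E[R_t | a]
    per_round = sum(p * m for p, m in zip(policy, arm_means))    # E_{a~pi} E[R_t | a]
    return horizon * best - horizon * per_round


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]
    print(expected_regret(means, [1 / 3, 1 / 3, 1 / 3], horizon=1000))  # uniform play
    print(expected_regret(means, [0, 0, 1], horizon=1000))              # always the best arm
```

Uniform play loses a constant amount per round, so its regret grows linearly in T; always pulling the best arm has zero regret.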

Exploration vs. Exploitation

(Illustration from Dan Klein and Pieter Abbeel’s course at UC Berkeley)

Contextual Bandits

Features: [Burger, Fries, Onion Ring, Fried Chicken]

Features: [Noodles, Tom Yum Soup, Poor service]

(Illustration from Dan Klein and Pieter Abbeel’s course at UC Berkeley)

Algorithms for Bandits

• Multi-armed Bandits
– Explore-First
– eps-Greedy
– Upper Confidence Bound

• Contextual bandits
– Infinite state space
– Work with a policy class instead

• The concept of regret
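The three multi-armed bandit strategies listed above can be compared in a small simulation. This is a sketch under stated assumptions: the arms are Bernoulli with hypothetical means, the upper confidence bound uses the common UCB1 bonus sqrt(2 ln t / n) (one variant among several, not necessarily the one used in lecture), and all function names are illustrative.

```python
import math
import random


def run_bandit(choose_arm, arm_means, horizon, rng):
    """Play `horizon` rounds; return the pseudo-regret sum_t (mu* - mu_{A_t})."""
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total observed reward per arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        a = choose_arm(t, counts, sums, rng)
        reward = 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        sums[a] += reward
        regret += best - arm_means[a]
    return regret


def _greedy(counts, sums):
    # Arm with the highest empirical mean; unpulled arms count as 0.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(means)), key=means.__getitem__)


def explore_first(rounds_per_arm):
    """Pull every arm `rounds_per_arm` times, then commit to the empirical best."""
    def choose(t, counts, sums, rng):
        for a, c in enumerate(counts):
            if c < rounds_per_arm:
                return a
        return _greedy(counts, sums)
    return choose


def eps_greedy(eps):
    """With probability eps pull a uniformly random arm, otherwise act greedily."""
    def choose(t, counts, sums, rng):
        if rng.random() < eps:
            return rng.randrange(len(counts))
        return _greedy(counts, sums)
    return choose


def ucb1(t, counts, sums, rng):
    """Pull each arm once, then maximize empirical mean + sqrt(2 ln t / n)."""
    for a, c in enumerate(counts):
        if c == 0:
            return a
    scores = [s / c + math.sqrt(2 * math.log(t) / c) for s, c in zip(sums, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    means, T = [0.2, 0.5, 0.8], 5000
    for name, alg in [("explore-first", explore_first(100)),
                      ("eps-greedy", eps_greedy(0.1)),
                      ("UCB1", ucb1)]:
        print(name, round(run_bandit(alg, means, T, random.Random(0)), 1))
```

For comparison, pulling arms uniformly at random on this instance has expected regret T * (0.8 - 0.5) = 1500.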

Off-policy evaluation under the contextual bandits model

• Contexts: $x_1, \ldots, x_n \sim \lambda$
– drawn iid, possibly infinite domain

• Actions: $a_i \sim \mu(a \mid x_i)$
– Taken by a randomized “Logging” policy

• Reward: $r_i \sim D(r \mid x_i, a_i)$
– Revealed only for the action taken

• Value: $v^{\mu} = \mathbb{E}_{x \sim \lambda}\, \mathbb{E}_{a \sim \mu(\cdot \mid x)}\, \mathbb{E}_{D}[r \mid x, a]$

• We collect data $(x_i, a_i, r_i)_{i=1}^{n}$ by the above processes.
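The data-collection process can be simulated to check the value definition. This is a toy sketch: the two-context domain, the logging probabilities, and the Bernoulli reward means are all hypothetical choices made only for illustration.

```python
import random

CONTEXTS = [0, 1]   # toy finite context set (the slide allows infinite domains)
ACTIONS = [0, 1]


def context_prob(x):
    """lambda(x): uniform over the two toy contexts."""
    return 0.5


def logging_policy(a, x):
    """mu(a | x): probability that the logging policy plays action a in context x."""
    p_action1 = 0.8 if x == 1 else 0.3
    return p_action1 if a == 1 else 1.0 - p_action1


def mean_reward(x, a):
    """E_D[r | x, a] for a Bernoulli reward; matching a to x pays more."""
    return 0.9 if a == x else 0.2


def collect_logged_data(n, rng):
    """Draw (x_i, a_i, r_i) for i = 1..n; only the taken action's reward is seen."""
    data = []
    for _ in range(n):
        x = rng.choice(CONTEXTS)                                 # x_i ~ lambda
        a = 1 if rng.random() < logging_policy(1, x) else 0      # a_i ~ mu(. | x_i)
        r = 1.0 if rng.random() < mean_reward(x, a) else 0.0     # r_i ~ D(r | x_i, a_i)
        data.append((x, a, r))
    return data


def exact_value():
    """v^mu = E_{x~lambda} E_{a~mu(.|x)} E_D[r | x, a], by enumeration."""
    return sum(context_prob(x) * logging_policy(a, x) * mean_reward(x, a)
               for x in CONTEXTS for a in ACTIONS)


if __name__ == "__main__":
    logged = collect_logged_data(20000, random.Random(0))
    estimate = sum(r for _, _, r in logged) / len(logged)
    print(exact_value(), estimate)
```

The sample mean of the logged rewards estimates the value of the logging policy itself; this sketch does not attempt to evaluate a different target policy.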

Clinical Trial and ATE estimation

[Figure: causal diagram with nodes T, Y, X, U, C]

Average Treatment Effect: ATE = E[Y | T = 1] - E[Y | T = 0]

Ignorability Assumption
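When treatment is assigned as in a randomized trial, the ATE formula can be estimated by a difference of sample means. This is a minimal sketch with a hypothetical function name and made-up data; without an assumption such as ignorability, the same number is only an association, not a causal effect.

```python
def ate_difference_in_means(treatments, outcomes):
    """Estimate ATE = E[Y | T = 1] - E[Y | T = 0] from paired samples."""
    treated = [y for t, y in zip(treatments, outcomes) if t == 1]
    control = [y for t, y in zip(treatments, outcomes) if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    T = [1, 1, 0, 0, 1, 0]                  # illustrative treatment indicators
    Y = [3.0, 5.0, 1.0, 2.0, 4.0, 3.0]      # illustrative outcomes
    print(ate_difference_in_means(T, Y))
```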

Reinforcement Learning

• Simplify the problem
– Make state discrete
– Make action discrete
– Make state fully observable.

• Remaining challenge: Learning / Long-term planning

• What is a Q function, what is a Value function?

• What is the optimal policy?

Value function for a specific policy

[Figure: grid world with terminal cells +1 and -1]

• reward +1 at [4,3], -1 at [4,2]
• reward -0.04 for each step

actions: UP, DOWN, LEFT, RIGHT

Action UP: 80% move UP, 10% move LEFT, 10% move RIGHT

8/9 * (-0.04 + 1.0) + 1/9 * (-0.04 + Vπ(s’))
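The value of a fixed policy can be computed by repeatedly applying its Bellman equation. This is a minimal sketch on a hypothetical one-state task (not the grid on the slide): with probability 0.8 the agent reaches the +1 goal, otherwise it stays put, and every step costs 0.04. The `transitions` format is an assumption made for this example.

```python
def evaluate_policy(transitions, gamma=1.0, tol=1e-10, max_iters=100_000):
    """Iterative policy evaluation for a fixed policy.

    transitions[s] lists (probability, reward, next_state) triples under the
    policy's action in s; a next_state of None marks termination (value 0).
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, outcomes in transitions.items():
            new_v = sum(p * (r + gamma * (values[s2] if s2 is not None else 0.0))
                        for p, r, s2 in outcomes)
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:       # stop once a full sweep changes nothing noticeable
            break
    return values


if __name__ == "__main__":
    toy = {"A": [(0.8, -0.04 + 1.0, None),   # reach the goal: step cost plus +1
                 (0.2, -0.04, "A")]}         # stay in place: step cost only
    print(evaluate_policy(toy))
```

Solving V(A) = 0.8 * 0.96 + 0.2 * (-0.04 + V(A)) by hand gives V(A) = 0.76 / 0.8 = 0.95; moving the self-loop term to the left-hand side is what produces renormalized weights in expressions of this kind.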

Algorithms for RL

• Dynamic programming
– policy evaluation: compute Vπ from π
– policy improvement: improve π based on Vπ

• Monte Carlo Methods
– For on-policy policy evaluation
– For estimating the Q function and V-function.

• Temporal Difference Learning
– Bootstrap with current belief of the Value function.

• Policy gradient
– Stochastic Gradient Descent!
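Monte Carlo evaluation and temporal difference learning can be contrasted on a toy problem. This is a sketch on a hypothetical one-state task (80% chance to reach a +1 goal, 0.04 cost per step, undiscounted); the environment, function names, and step size are assumptions for illustration.

```python
import random


def sample_step(rng):
    """One step of a hypothetical one-state task under a fixed policy.

    With probability 0.8 the episode ends with reward -0.04 + 1.0;
    otherwise the agent stays put and pays the -0.04 step cost.
    Returns (reward, done).
    """
    if rng.random() < 0.8:
        return -0.04 + 1.0, True
    return -0.04, False


def monte_carlo_value(episodes, rng):
    """Average the undiscounted return over complete episodes."""
    total = 0.0
    for _ in range(episodes):
        done, ret = False, 0.0
        while not done:
            reward, done = sample_step(rng)
            ret += reward
        total += ret
    return total / episodes


def td0_value(steps, alpha, rng):
    """TD(0): move V toward the bootstrapped target r + V(next state)."""
    v = 0.0
    for _ in range(steps):
        reward, done = sample_step(rng)
        target = reward + (0.0 if done else v)   # terminal value is 0
        v += alpha * (target - v)
    return v


if __name__ == "__main__":
    print(monte_carlo_value(20000, random.Random(0)))
    print(td0_value(20000, alpha=0.05, rng=random.Random(0)))
```

Both estimates should settle near 0.95, the solution of V = 0.8 * 0.96 + 0.2 * (-0.04 + V). Monte Carlo waits for the end of each episode, while TD(0) updates after every step using its current estimate.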

Lecture 17, 18: Logic

• Logic agent

• Knowledge Base
– Tell operation
– Ask operation

• Components of a formal mathematical logic system
– Syntax, Semantics

• Inference Algorithms.

Recap: KB Agents

• Need a formal logic system to work

• Need a data structure to represent known facts

• Need an algorithm to answer ASK questions

[Figure: TELL and ASK operate on an agent built from two parts]

Knowledge Base: Domain specific content; facts (true sentences)

Inference engine: Domain independent algorithms; can deduce new facts from the KB

Syntax and semantics

• Two components of a logic system

• Syntax --- How to construct sentences
– The symbols
– The operators that connect symbols together
– A precedence ordering

• Semantics --- Rules for assigning truth values to sentences
– For every possible world (or “model” in logic jargon)
– A truth table is one way to specify the semantics

Entailment

[Figure: in the representation, Sentences ENTAIL a Sentence; in the world, a Fact FOLLOWS from Facts; Semantics links each sentence to the fact it describes]

A is entailed by B if A is true in all possible worlds consistent with B under the semantics.
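The definition of entailment can be checked directly by enumerating possible worlds. This is a minimal sketch, assuming sentences are represented as Python callables over a model dictionary; the representation and the symbols `Rain` and `Wet` are hypothetical.

```python
from itertools import product


def entails(kb, query, symbols):
    """Return True if `query` holds in every model in which all KB sentences hold.

    Sentences are callables mapping a model (dict: symbol -> bool) to bool.
    """
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(sentence(model) for sentence in kb) and not query(model):
            return False      # a world consistent with the KB where the query fails
    return True


if __name__ == "__main__":
    rain_implies_wet = lambda m: (not m["Rain"]) or m["Wet"]
    rain = lambda m: m["Rain"]
    wet = lambda m: m["Wet"]
    print(entails([rain_implies_wet, rain], wet, ["Rain", "Wet"]))  # KB forces Wet
    print(entails([rain_implies_wet], wet, ["Rain", "Wet"]))        # Wet may be false
```

Enumeration visits 2^n models for n symbols, so it is only practical for small vocabularies.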

Inference procedure

• Inference procedure
– Rules (algorithms) that we apply (often recursively) to derive truths from other truths.
– Could be specific to a particular set of semantics, a particular realization of the world.

• Soundness and completeness of an inference procedure
– Soundness: Every truth that is discovered is valid.
– Completeness: Every truth that is entailed can be discovered.

Propositional Logic

• Syntax
– True, false, propositional symbols
– ( ), ¬ (not), ∧ (and), ∨ (or), ⇒ (implies), ⇔ (equivalent)

• Semantics:
– Five rules, given by a truth table

• Inference rules:
– Modus Ponens etc. Most important: Resolution
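The five semantic rules can be written out as a truth table, and the soundness of Modus Ponens checked against it. This is an illustrative sketch; the dictionary keys and function names are made up for this example.

```python
from itertools import product

# One entry per connective; each maps the truth values of P and Q to a result.
CONNECTIVES = {
    "not P": lambda p, q: not p,
    "P and Q": lambda p, q: p and q,
    "P or Q": lambda p, q: p or q,
    "P implies Q": lambda p, q: (not p) or q,
    "P iff Q": lambda p, q: p == q,
}


def truth_table():
    """Rows of (P, Q, {connective: value}) for all four assignments."""
    return [(p, q, {name: f(p, q) for name, f in CONNECTIVES.items()})
            for p, q in product([False, True], repeat=2)]


def modus_ponens_is_sound():
    """Whenever P and (P implies Q) are both true, Q is true."""
    implies = CONNECTIVES["P implies Q"]
    return all(q for p, q in product([False, True], repeat=2) if p and implies(p, q))


if __name__ == "__main__":
    for p, q, row in truth_table():
        print(p, q, row)
    print(modus_ponens_is_sound())
```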

Propositional logic agent

• Representation: Conjunctive Normal Forms
– Represent them in a data structure: a list, a heap, a tree?
– Efficient TELL operation

• Inference: Solve ASK question
– Using only “Resolution” on CNFs is sound and complete.
– Equivalent to SAT, NP-complete, but good heuristics / practical algorithms exist

• Possible answers to ASK:
– Valid, Satisfiable, Unsatisfiable
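An ASK query over a CNF knowledge base can be answered by resolution refutation: add the negated query and search for the empty clause. This is a naive sketch, assuming clauses are sets of string literals with `~` marking negation; that encoding and the function names are hypothetical, and practical SAT solvers are far more efficient.

```python
def negate(lit):
    """Complement of a literal written as 'P' or '~P'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of literals)."""
    return [frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
            for lit in c1 if negate(lit) in c2]


def unsatisfiable(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        clause_list = list(clauses)
        for i, c1 in enumerate(clause_list):
            for c2 in clause_list[i + 1:]:
                for resolvent in resolve(c1, c2):
                    if not resolvent:
                        return True       # empty clause: contradiction found
                    new.add(resolvent)
        if new <= clauses:
            return False                  # nothing new: no refutation exists
        clauses |= new


def ask(kb_clauses, query_literal):
    """KB entails the literal iff KB plus its negation is unsatisfiable."""
    return unsatisfiable(list(kb_clauses) + [[negate(query_literal)]])


if __name__ == "__main__":
    kb = [["~Rain", "Wet"], ["Rain"]]     # (Rain implies Wet) and Rain
    print(ask(kb, "Wet"))
    print(ask(kb[:1], "Wet"))
```

The saturation loop terminates because only finitely many clauses can be built from a finite set of literals.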

First order logic

• More expressive language
– Relations and functions of objects.
– Quantifiers such as All and Exists.

• Easier to construct a KB.
– Need a much smaller number of sentences to capture a domain.

• Follow the same structure: Symbols, Semantics

• Dedicated inference algorithms

• (FOL is not covered in the Final)

We have come a long way!

Week | Topic
1 | Introduction and Course Overview; AI Problem Solving and Intelligent Agents
2 | Quantifying uncertainty; Probabilistic Reasoning: Bayes Network
3 | Probabilistic Reasoning: Conditional Independences; Machine Learning: Supervised Learning
4 | Machine Learning: Unsupervised Learning; Machine Learning: How machine learning works?
5 | Continuous optimization; Search: Solving problems with Search
6 | Search: Basic search; Midterm
7 | Search: Informed search; RL: RL Overview and MDP
8 | RL: Multi-arm Bandits; RL: Contextual Bandits & Policy evaluation
9 | RL: Tabular MDP, RL algorithms; Logic: Propositional Logic
10 | Logic: First order logic; Review session
11 | Final Exam. March 18 12:00 PM - 3:00 PM

Topic areas: Probabilistic Reasoning, Machine Learning, Search, Reinforcement Learning, Logic

Future of AI

• More higher-level intelligence
– But more learning-based than rule-based

• More stateful systems, more reinforcement learning

• More AI in the non-iid environment
– Structured
– Adversarial

• More forms of agent’s perception
– Weak supervision
– Self-supervision (bootstrapping)

Final words

• With great power comes great responsibility.
– Ethics in AI, Privacy
– AI for good causes
– Social impacts

• Thank you!
– It’s my pleasure to work with you!
– I hope the course is / will be useful.
