Artificial Intelligence
CS 165A
Mar 14, 2019
Instructor: Prof. Yu-Xiang Wang
Review
Announcement
• HW4 due
• MP2 due
– A tournament will be conducted and the results revealed on Piazza.
• HW3 grading is being done.
– The TA is catching a homework deadline.
– It will be distributed in Friday’s discussion class.
– The TA will talk about HW3 and HW4 solutions.
• Final: Next Monday, 12 - 3!
Very important: Course evaluation
• Please complete the course evaluation form
– I need one volunteer to help collect the responses and submit the forms to the CS department office.
• This is the first time I am running CS165A.
– Your feedback will make a difference.
• We are in the process of modernizing the course.
– New topics, e.g., RL.
– More rigorous treatments.
• I hope CS165A will be a course that you brag about in your job interviews and remember for many years.
MP1: Leaderboard
1. Brian Humphreys: 100%
– Multinomial Naive Bayes: using up to 5-gram features.
2. Claudia Zeng: 99.93%
3. Calvin Wang: 98.97%
• Other winners will be notified in private emails.
– Number 4, 5 get 20 bonus points.
– Number 6 - 10 get 10 bonus points.
MP1: Considerations
• Accuracy:
– 50% is trivial.
– 80% is the baseline: multinomial naïve Bayes without additional feature engineering.
• Run time: From ~10 s to more than one hour
– Use hash tables (dictionaries). Use a sparse representation.
• Modularize your code
– Feature extractor(Text) → Feature
– Train(Feature, Label) → Model parameters
– Predict(Feature) → Prediction
– Evaluate(Prediction, TrueLabel) → Accuracy
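The four modules listed for MP1 can be sketched as four small functions. This is only an illustrative sketch, not the reference solution: the function names, the whitespace tokenizer, the add-one smoothing constant `alpha`, and the toy training sentences are all assumptions. Features are kept sparse in dictionaries (`Counter`), in line with the run-time advice.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, max_n=2):
    """Feature extractor(Text) -> Feature: sparse n-gram counts (n = 1..max_n)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def train(features, labels, alpha=1.0):
    """Train(Feature, Label) -> Model parameters (multinomial naive Bayes)."""
    counts = defaultdict(Counter)   # per-class feature counts
    vocab = set()
    for feats, y in zip(features, labels):
        counts[y].update(feats)
        vocab.update(feats)
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}, "log_unseen": {}}
    for y, n_docs in Counter(labels).items():
        total = sum(counts[y].values()) + alpha * len(vocab)
        model["log_prior"][y] = math.log(n_docs / len(labels))
        model["log_lik"][y] = {w: math.log((c + alpha) / total)
                               for w, c in counts[y].items()}
        # Smoothed log-probability of a vocabulary feature never seen in class y.
        model["log_unseen"][y] = math.log(alpha / total)
    return model


def predict(model, feats):
    """Predict(Feature) -> Prediction: class with the highest log-posterior."""
    scores = {}
    for y, log_prior in model["log_prior"].items():
        score = log_prior
        for w, c in feats.items():
            if w in model["vocab"]:   # ignore features never seen in training
                score += c * model["log_lik"][y].get(w, model["log_unseen"][y])
        scores[y] = score
    return max(scores, key=scores.get)


def evaluate(predictions, true_labels):
    """Evaluate(Prediction, TrueLabel) -> Accuracy."""
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Toy data (made up for illustration only).
train_texts = ["good great fun", "great movie good", "bad awful boring", "boring bad plot"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train([extract_features(t) for t in train_texts], train_labels)
test_texts = ["good fun movie", "awful boring"]
predictions = [predict(model, extract_features(t)) for t in test_texts]
accuracy = evaluate(predictions, ["pos", "neg"])
```

Keeping the stages separate makes it easy to swap the feature extractor (for example, a larger `max_n`) without touching training or evaluation.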
Course schedule
Week  Topic
1     Introduction and Course Overview; AI Problem Solving and Intelligent Agents
2     Quantifying uncertainty; Probabilistic Reasoning: Bayes Network
3     Probabilistic Reasoning: Conditional Independences; Machine Learning: Supervised Learning
4     Machine Learning: Unsupervised Learning; Machine Learning: How does machine learning work?
5     Continuous optimization; Search: Solving problems with Search
6     Search: Basic search; Midterm
7     Search: Informed search; RL: RL Overview and MDP
8     RL: Multi-arm Bandits; RL: Contextual Bandits & Policy evaluation
9     RL: Tabular MDP, RL algorithms; Logic: Propositional Logic
10    Logic: First order logic; Review session
11    Final Exam. March 18 12:00 PM - 3:00 PM
Topic areas: Probabilistic Reasoning, Machine Learning, Search, Reinforcement Learning, Logic
Final exam
• A small fraction will be about Lecture 1-9
• Main focus on Lecture 10 – 17
– Search
– RL
– Logic
• There will be questions that apply tools from Lecture 1-9 to material from the second half of the class
Lecture 1-2: AI Overview
• Strong AI / Weak AI
• Turing Test
• AI for problem solving
• Rational agents
• Examples of AI in the real world
Our view of AI
• So this course is about designing rational agents
– Constructing f
– For a given class of environments and tasks, we seek the agent (or class of agents) with the “best” performance
– Note: Computational limitations make complete rationality unachievable in most cases
• In practice, we will focus on problem-solving techniques (ways of constructing f), not agents per se
Different Ways of Looking at AI
• Agent types / level of intelligence
– Low-level: Reflex agents
– Mid-level: Goal-based / Utility-based agents: planning
– High-level: Knowledge-based: Logic agents
• Optimization view
– Everything is an optimization problem
• Theoretical aspects
– Time/space complexity
– Algorithms and data structures
– Statistical properties: sample complexity
Rational agents and Optimization
• What is the objective function that an AI agent optimizes?
– Likelihood, Training error, Utility, Reward, Regret
• What is the argument over which the AI agent is optimizing?
– Policy, action, search strategy
• What are the inputs from the environment to that objective function?
– Observation, State, Reward, Feedback, Labels, Features.
• What are the algorithms used for these agents?
– Gradient descent, SGD. Tree search, Graph search. Dynamic programming. Explore-Exploit.
Lecture 3-5: Probabilities and BayesNet
• CPTs
– Count the number of independent parameters needed to represent a CPT
• Conditional and marginal probabilities; probabilistic inference with Bayes’ rule
• Read off conditional independences from the graph
– d-separation
– Markov Blanket
• Undirected graphical models (Not to be in the exam)
Example: Flu and measles
Bayes net: Flu → Fever ← Measles → Spots
Factors: P(Flu), P(Measles), P(Spots | Measles), P(Fever | Flu, Measles)
CPTs:
P(Flu) = 0.01
P(Measles) = 0.001
P(Spots | Measles) = [0, 0.9]
P(Fever | Flu, Measles) = [0.01, 0.8, 0.9, 1.0]
Compute P(Flu | Fever) and P(Flu | Fever, Spots). Are they equivalent?
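One way to check an answer to the Flu and measles question is inference by enumeration over the joint distribution. The sketch below is hedged: the slide does not state how the CPT lists are ordered, so it assumes P(Spots | Measles) is listed as [Measles = false, Measles = true] and P(Fever | Flu, Measles) as [(F, F), (F, T), (T, F), (T, T)]. The helper names are made up for illustration.

```python
from itertools import product

P_FLU = 0.01
P_MEASLES = 0.001
# Assumed ordering: index m = Measles (0 = false, 1 = true).
P_SPOTS = [0.0, 0.9]
# Assumed ordering: index 2 * flu + measles.
P_FEVER = [0.01, 0.8, 0.9, 1.0]


def bern(p, x):
    """Probability that a binary variable with P(X = 1) = p takes value x."""
    return p if x else 1.0 - p


def joint(flu, measles, fever, spots):
    """Joint probability from the factorization given by the Bayes net."""
    return (bern(P_FLU, flu)
            * bern(P_MEASLES, measles)
            * bern(P_FEVER[2 * flu + measles], fever)
            * bern(P_SPOTS[measles], spots))


def posterior_flu(**evidence):
    """P(Flu = 1 | evidence), summing the joint over all consistent worlds."""
    num = den = 0.0
    for flu, measles, fever, spots in product([0, 1], repeat=4):
        world = {"flu": flu, "measles": measles, "fever": fever, "spots": spots}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(flu, measles, fever, spots)
        den += p
        if flu:
            num += p
    return num / den


p_flu_given_fever = posterior_flu(fever=1)
p_flu_given_fever_spots = posterior_flu(fever=1, spots=1)
```

Under the assumed orderings, the two posteriors come out near 0.457 and 0.012, so they are not equal: spots can only occur with measles, and measles then explains away the fever.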
d-separation and Markov Blanket
• The set of nodes E d-separates sets X and Y if every path from X to Y is blocked given E.
– There are 3 ways to block a path from X to Y, given E.
• Markov blanket of a node:
1. Parents
2. Children
3. Children’s other parents
Connecting BayesNet to MDP
• A Markov decision process has a collection of random variables
– State, Action, Reward at time 1,2,3,4,…
• The MDP specifies a very specific way in which these random variables are generated
– In the form of a probability distribution that can be factorized.
• What is the BayesNet of an MDP?
• What are the conditional independences?
Lecture 6-9: Machine Learning
• Types of machine learning: Supervised / unsupervised…
– Examples.
• ML is about coming up with objectives to optimize
– Often MLE, MAP.
– Often there is a graphical model.
• How to optimize?
– Gradient Descent, SGD
• Statistical Learning theory
– Uniform convergence using Hoeffding’s inequality and the union bound.
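To make the "How to optimize?" bullet concrete, here is a minimal sketch of gradient descent and SGD on a one-parameter least-squares objective. The data, the step sizes, and the function names are assumptions chosen for illustration; the objective is the mean of (w * x - y)^2 over the data.

```python
import random

# Toy data generated by y = 2 * x, so the minimizer is w = 2 (an assumption).
XS = [1.0, 2.0, 3.0, 4.0]
YS = [2.0, 4.0, 6.0, 8.0]


def full_gradient(w):
    """Gradient of the mean squared error over the whole data set."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)


def gradient_descent(w0, lr=0.05, steps=100):
    """Gradient descent: each step uses the full gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * full_gradient(w)
    return w


def sgd(w0, lr=0.02, steps=500, seed=0):
    """SGD: each step uses the gradient on one randomly sampled data point."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        i = rng.randrange(len(XS))
        w -= lr * 2.0 * (w * XS[i] - YS[i]) * XS[i]
    return w


w_gd = gradient_descent(0.0)
w_sgd = sgd(0.0)
```

Both runs should end close to w = 2. SGD uses a smaller step size here because a single-sample gradient can be much larger than the average gradient.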
Understanding Machine Learning
• What do we mean by saying that ML works?
– Error decomposition
– Empirical risk, Risk, Generalization
• When does ML not work? What are the assumptions?
– iid: Independent and Identically Distributed
– Training data and test data are drawn from the same distribution
• Statistical Learning theory
– How many data points do we need?
– Hoeffding’s inequality and the union bound.
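The "How many data points do we need?" question can be worked through numerically for a finite hypothesis class. The sketch assumes a loss bounded in [0, 1] and iid data: Hoeffding's inequality gives P(|empirical risk - risk| > ε) ≤ 2 exp(-2 n ε²) for one hypothesis, and the union bound over |H| hypotheses multiplies the right-hand side by |H|. Setting 2 |H| exp(-2 n ε²) ≤ δ and solving for n gives the functions below; the function names and the example numbers are assumptions.

```python
import math


def hoeffding_sample_size(num_hypotheses, epsilon, delta):
    """Smallest n with 2 * |H| * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2 * num_hypotheses / delta) / (2 * epsilon ** 2))


def uniform_deviation_bound(num_hypotheses, n, delta):
    """Epsilon such that, with probability >= 1 - delta, every hypothesis has
    |empirical risk - risk| <= epsilon (Hoeffding plus the union bound)."""
    return math.sqrt(math.log(2 * num_hypotheses / delta) / (2 * n))


# Example: 1000 hypotheses, accuracy 0.05, failure probability 0.05.
n_needed = hoeffding_sample_size(1000, epsilon=0.05, delta=0.05)
eps_at_n = uniform_deviation_bound(1000, n_needed, delta=0.05)
```

Note that the required n grows only logarithmically in the number of hypotheses but quadratically in 1/ε.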
Lecture 10 – 13: Search
• Formulate problems as search
• Algorithms for search
• Search for playing games (more than one player)
Example: Romania
How do we evaluate a search algorithm?
• Primary criteria to evaluate search strategies
– Completeness
  • Is it guaranteed to find a solution (if one exists)?
– Optimality
  • Does it find the “best” solution (if there are more than one)?
– Time complexity
  • Number of nodes generated/expanded
  • (How long does it take to find a solution?)
– Space complexity
  • How much memory does it require?
• Some performance measures
– Best case
– Worst case
– Average case
– Real-world case
Note: optimality here does not mean that the algorithm’s space/time complexity is optimal.
State-Space Diagrams
• State-space description can be represented by a state-space diagram, which shows
– States (incl. initial and goal)
– Operators/actions (state transitions)
– Path costs
Search Tree
Figure: search tree (partially expanded)
Uninformed search algorithms
(Section 3.4.7 in the AIMA book.)
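As one concrete instance of an uninformed search algorithm, here is a hedged sketch of breadth-first graph search that also reports the number of nodes expanded, which is the time-complexity measure used to evaluate search strategies. The state-space graph is a made-up example, not the one from the slides.

```python
from collections import deque


def breadth_first_search(graph, start, goal):
    """Return (path, nodes_expanded); path is None if the goal is unreachable.

    graph maps each state to the list of states reachable in one action.
    """
    frontier = deque([start])   # FIFO queue: shallowest nodes first
    parent = {start: None}      # doubles as the set of reached states
    expanded = 0
    while frontier:
        node = frontier.popleft()
        expanded += 1
        if node == goal:
            path = []
            while node is not None:   # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1], expanded
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None, expanded


# Hypothetical state space for illustration.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": ["G"],
    "F": ["G"],
    "G": [],
}
path, nodes_expanded = breadth_first_search(GRAPH, "A", "G")
```

Breadth-first search is complete on finite graphs and finds a path with the fewest actions; it is optimal in path cost only when all step costs are equal.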
Informed search (A* search)
f(n) = g(n) + h(n)
What do we know about A* search?
• If the heuristic is “admissible”
– Optimistic: it never overestimates the cost to the goal
– Then A* tree search is optimal, and optimally efficient.
• If the heuristic is “consistent”
– Then A* graph search is optimal.
• Question to think about
– How to learn a heuristic function?
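The evaluation function f(n) = g(n) + h(n) can be seen at work in a short A* graph search. This is a sketch under stated assumptions: the weighted graph and the heuristic table are invented for illustration, and the heuristic was chosen to be consistent (h(n) ≤ cost(n, n') + h(n') on every edge, and h(goal) = 0).

```python
import heapq


def a_star(graph, h, start, goal):
    """A* graph search. graph: state -> list of (neighbor, step_cost);
    h: state -> heuristic estimate of the remaining cost.
    Returns (path, cost), or (None, inf) if the goal is unreachable."""
    frontier = [(h[start], 0, start, [start])]   # entries are (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue   # stale queue entry: a cheaper path to node was found
        for nxt, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None, float("inf")


# Hypothetical weighted graph and consistent heuristic.
GRAPH = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
H = {"S": 4, "A": 3, "B": 2, "G": 0}
path, cost = a_star(GRAPH, H, "S", "G")
```

Setting every heuristic value to 0 turns the same code into uniform-cost search, which makes the role of h(n) easy to experiment with.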
Games and minimax search
• Specification of a game:
– State, action, transition
– Two-player, zero-sum, perfect information, deterministic
• Minimax search:
– Search assuming your opponent is behaving adversarially.
• Two ways to speed up
– Pruning
– Cut off minimax search early, use a heuristic
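Figure: two-ply minimax tree. Your move (MAX) is at the root and the opponent’s move (MIN) is below it; the leaves 7, 3 and -8, 50 give MIN values 3 and -8, so the MAX value is 3.
Figure: search depth cutoff. Tic-Tac-Toe with search depth 2; evaluations shown for X.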
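Minimax with pruning can be sketched on the small tree from the slide (leaves 7, 3 under the first MIN node and -8, 50 under the second). The nested-list tree encoding and the `visited` argument are assumptions made for illustration; the pruning rule is standard alpha-beta.

```python
import math


def minimax(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value with alpha-beta pruning.

    node is either a number (leaf evaluation) or a list of child nodes.
    visited, if given, collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, minimax(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break   # MIN above will never let play reach this node
        return value
    value = math.inf
    for child in node:
        value = min(value, minimax(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break       # MAX above already has a better option
    return value


tree = [[7, 3], [-8, 50]]   # root is MAX, its children are MIN nodes
visited = []
root_value = minimax(tree, True, visited=visited)
```

On this tree the root value is 3, and the leaf 50 is never evaluated: once the second MIN node sees -8, it cannot beat the 3 that MAX already has.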
Lecture 13 -16: Reinforcement Learning
• Overview
– What are the problems that are best solved by RL
– How RL is related to supervised learning / search
• Settings:
– Multi-armed bandit
– Contextual bandit
– Policy evaluation and causal inference
– Reinforcement Learning
• Key concepts:
– Explore-Exploit: The need for exploration
– Value function, Q function: The need for long-term planning
Reinforcement learning problem setup
• State, Action, Reward and Observation: $S_t \in \mathcal{S}$, $A_t \in \mathcal{A}$, $R_t \in \mathbb{R}$, $O_t \in \mathcal{O}$
• Policy:
– When the state is observable: $\pi : \mathcal{S} \to \mathcal{A}$
– Or when the state is not observable: $\pi_t : (\mathcal{O} \times \mathcal{A} \times \mathbb{R})^{t-1} \to \mathcal{A}$
• Learn the best policy that maximizes the expected reward
– Finite horizon (episodic) RL: $\pi^* = \arg\max_{\pi \in \Pi} \mathbb{E}\left[\sum_{t=1}^{T} R_t\right]$
– Infinite horizon RL: $\pi^* = \arg\max_{\pi \in \Pi} \mathbb{E}\left[\sum_{t=1}^{\infty} \gamma^{t-1} R_t\right]$
T: horizon
γ: discount factor
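The two objectives differ only in how a reward sequence is summed. As a small illustration (the function names and the reward sequence are assumptions), the quantity inside the expectation can be computed for one observed trajectory as follows.

```python
def finite_horizon_return(rewards):
    """Sum of R_t for t = 1..T, where T = len(rewards)."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Sum of gamma**(t - 1) * R_t over an observed prefix of the trajectory."""
    total = 0.0
    for t, reward in enumerate(rewards, start=1):
        total += gamma ** (t - 1) * reward
    return total


rewards = [1.0, 1.0, 1.0]
undiscounted = finite_horizon_return(rewards)   # 1 + 1 + 1
discounted = discounted_return(rewards, 0.5)    # 1 + 0.5 + 0.25
```

With a constant reward of 1 and γ < 1, the discounted sum approaches 1 / (1 - γ) as the trajectory grows, which is why the infinite-horizon objective stays finite.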
Multi-arm bandits: Problem setup
• No state. k-actions
• You decide which arm to pull in every iteration
• You collect a cumulative payoff of
• The goal of the agent is to maximize the expected payoff.– Or to minimize the Regret.
30
a 2 A = {1, 2, ..., k}<latexit sha1_base64="Pgd6Bi1XtRX/xI8rK2a2CZi9uYo=">AAACCnicbZDNSsNAFIVv6l+tf1GXbkaL4KKEpAi6EapuXFawrdCEMplO26GTSZiZCCV07cZXceNCEbc+gTvfxknbhbYeGPg4917m3hMmnCntut9WYWl5ZXWtuF7a2Nza3rF395oqTiWhDRLzWN6HWFHOBG1opjm9TyTFUchpKxxe5/XWA5WKxeJOjxIaRLgvWI8RrI3VsQ8x8plAfoT1gGCeXY7RBfIzr1KtOI5TGfrjjl12HXcitAjeDMowU71jf/ndmKQRFZpwrFTbcxMdZFhqRjgdl/xU0QSTIe7TtkGBI6qCbHLKGB0bp4t6sTRPaDRxf09kOFJqFIWmM19Zzddy879aO9W98yBjIkk1FWT6US/lSMcozwV1maRE85EBTCQzuyIywBITbdIrmRC8+ZMXoVl1PMO3p+Xa1SyOIhzAEZyAB2dQgxuoQwMIPMIzvMKb9WS9WO/Wx7S1YM1m9uGPrM8flR6YQQ==</latexit><latexit sha1_base64="Pgd6Bi1XtRX/xI8rK2a2CZi9uYo=">AAACCnicbZDNSsNAFIVv6l+tf1GXbkaL4KKEpAi6EapuXFawrdCEMplO26GTSZiZCCV07cZXceNCEbc+gTvfxknbhbYeGPg4917m3hMmnCntut9WYWl5ZXWtuF7a2Nza3rF395oqTiWhDRLzWN6HWFHOBG1opjm9TyTFUchpKxxe5/XWA5WKxeJOjxIaRLgvWI8RrI3VsQ8x8plAfoT1gGCeXY7RBfIzr1KtOI5TGfrjjl12HXcitAjeDMowU71jf/ndmKQRFZpwrFTbcxMdZFhqRjgdl/xU0QSTIe7TtkGBI6qCbHLKGB0bp4t6sTRPaDRxf09kOFJqFIWmM19Zzddy879aO9W98yBjIkk1FWT6US/lSMcozwV1maRE85EBTCQzuyIywBITbdIrmRC8+ZMXoVl1PMO3p+Xa1SyOIhzAEZyAB2dQgxuoQwMIPMIzvMKb9WS9WO/Wx7S1YM1m9uGPrM8flR6YQQ==</latexit><latexit sha1_base64="Pgd6Bi1XtRX/xI8rK2a2CZi9uYo=">AAACCnicbZDNSsNAFIVv6l+tf1GXbkaL4KKEpAi6EapuXFawrdCEMplO26GTSZiZCCV07cZXceNCEbc+gTvfxknbhbYeGPg4917m3hMmnCntut9WYWl5ZXWtuF7a2Nza3rF395oqTiWhDRLzWN6HWFHOBG1opjm9TyTFUchpKxxe5/XWA5WKxeJOjxIaRLgvWI8RrI3VsQ8x8plAfoT1gGCeXY7RBfIzr1KtOI5TGfrjjl12HXcitAjeDMowU71jf/ndmKQRFZpwrFTbcxMdZFhqRjgdl/xU0QSTIe7TtkGBI6qCbHLKGB0bp4t6sTRPaDRxf09kOFJqFIWmM19Zzddy879aO9W98yBjIkk1FWT6US/lSMcozwV1maRE85EBTCQzuyIywBITbdIrmRC8+ZMXoVl1PMO3p+Xa1SyOIhzAEZyAB2dQgxuoQwMIPMIzvMKb9WS9WO/Wx7S1YM1m9uGPrM8flR6YQQ==</latexit><latexit 
sha1_base64="Pgd6Bi1XtRX/xI8rK2a2CZi9uYo=">AAACCnicbZDNSsNAFIVv6l+tf1GXbkaL4KKEpAi6EapuXFawrdCEMplO26GTSZiZCCV07cZXceNCEbc+gTvfxknbhbYeGPg4917m3hMmnCntut9WYWl5ZXWtuF7a2Nza3rF395oqTiWhDRLzWN6HWFHOBG1opjm9TyTFUchpKxxe5/XWA5WKxeJOjxIaRLgvWI8RrI3VsQ8x8plAfoT1gGCeXY7RBfIzr1KtOI5TGfrjjl12HXcitAjeDMowU71jf/ndmKQRFZpwrFTbcxMdZFhqRjgdl/xU0QSTIe7TtkGBI6qCbHLKGB0bp4t6sTRPaDRxf09kOFJqFIWmM19Zzddy879aO9W98yBjIkk1FWT6US/lSMcozwV1maRE85EBTCQzuyIywBITbdIrmRC8+ZMXoVl1PMO3p+Xa1SyOIhzAEZyAB2dQgxuoQwMIPMIzvMKb9WS9WO/Wx7S1YM1m9uGPrM8flR6YQQ==</latexit>
A1, A2, ..., AT<latexit sha1_base64="3nwYg5V/8/2poVMB58MrV1fD5fQ=">AAAB+HicbZDNTgIxFIXv4B/iD6Mu3TQSExdkMkNMdAm6cYkJIAlMJp3SgYZOZ9J2TJDwJG5caIxbH8Wdb2OBWSh4k6Zfzrk3vT1hypnSrvttFTY2t7Z3irulvf2Dw7J9dNxRSSYJbZOEJ7IbYkU5E7Stmea0m0qK45DTh3B8O/cfHqlULBEtPUmpH+OhYBEjWBspsMuNwKs2glrVcRxztwK74jruotA6eDlUIK9mYH/1BwnJYio04Vipnuem2p9iqRnhdFbqZ4qmmIzxkPYMChxT5U8Xi8/QuVEGKEqkOUKjhfp7YopjpSZxaDpjrEdq1ZuL/3m9TEfX/pSJNNNUkOVDUcaRTtA8BTRgkhLNJwYwkczsisgIS0y0yapkQvBWv7wOnZrjGb6/rNRv8jiKcApncAEeXEEd7qAJbSCQwTO8wpv1ZL1Y79bHsrVg5TMn8Keszx9NF5Dm</latexit><latexit sha1_base64="3nwYg5V/8/2poVMB58MrV1fD5fQ=">AAAB+HicbZDNTgIxFIXv4B/iD6Mu3TQSExdkMkNMdAm6cYkJIAlMJp3SgYZOZ9J2TJDwJG5caIxbH8Wdb2OBWSh4k6Zfzrk3vT1hypnSrvttFTY2t7Z3irulvf2Dw7J9dNxRSSYJbZOEJ7IbYkU5E7Stmea0m0qK45DTh3B8O/cfHqlULBEtPUmpH+OhYBEjWBspsMuNwKs2glrVcRxztwK74jruotA6eDlUIK9mYH/1BwnJYio04Vipnuem2p9iqRnhdFbqZ4qmmIzxkPYMChxT5U8Xi8/QuVEGKEqkOUKjhfp7YopjpSZxaDpjrEdq1ZuL/3m9TEfX/pSJNNNUkOVDUcaRTtA8BTRgkhLNJwYwkczsisgIS0y0yapkQvBWv7wOnZrjGb6/rNRv8jiKcApncAEeXEEd7qAJbSCQwTO8wpv1ZL1Y79bHsrVg5TMn8Keszx9NF5Dm</latexit><latexit sha1_base64="3nwYg5V/8/2poVMB58MrV1fD5fQ=">AAAB+HicbZDNTgIxFIXv4B/iD6Mu3TQSExdkMkNMdAm6cYkJIAlMJp3SgYZOZ9J2TJDwJG5caIxbH8Wdb2OBWSh4k6Zfzrk3vT1hypnSrvttFTY2t7Z3irulvf2Dw7J9dNxRSSYJbZOEJ7IbYkU5E7Stmea0m0qK45DTh3B8O/cfHqlULBEtPUmpH+OhYBEjWBspsMuNwKs2glrVcRxztwK74jruotA6eDlUIK9mYH/1BwnJYio04Vipnuem2p9iqRnhdFbqZ4qmmIzxkPYMChxT5U8Xi8/QuVEGKEqkOUKjhfp7YopjpSZxaDpjrEdq1ZuL/3m9TEfX/pSJNNNUkOVDUcaRTtA8BTRgkhLNJwYwkczsisgIS0y0yapkQvBWv7wOnZrjGb6/rNRv8jiKcApncAEeXEEd7qAJbSCQwTO8wpv1ZL1Y79bHsrVg5TMn8Keszx9NF5Dm</latexit><latexit 
sha1_base64="3nwYg5V/8/2poVMB58MrV1fD5fQ=">AAAB+HicbZDNTgIxFIXv4B/iD6Mu3TQSExdkMkNMdAm6cYkJIAlMJp3SgYZOZ9J2TJDwJG5caIxbH8Wdb2OBWSh4k6Zfzrk3vT1hypnSrvttFTY2t7Z3irulvf2Dw7J9dNxRSSYJbZOEJ7IbYkU5E7Stmea0m0qK45DTh3B8O/cfHqlULBEtPUmpH+OhYBEjWBspsMuNwKs2glrVcRxztwK74jruotA6eDlUIK9mYH/1BwnJYio04Vipnuem2p9iqRnhdFbqZ4qmmIzxkPYMChxT5U8Xi8/QuVEGKEqkOUKjhfp7YopjpSZxaDpjrEdq1ZuL/3m9TEfX/pSJNNNUkOVDUcaRTtA8BTRgkhLNJwYwkczsisgIS0y0yapkQvBWv7wOnZrjGb6/rNRv8jiKcApncAEeXEEd7qAJbSCQwTO8wpv1ZL1Y79bHsrVg5TMn8Keszx9NF5Dm</latexit>
$\sum_{t=1}^{T} R_t$
$T \max_{a \in [k]} \mathbb{E}[R_t \mid a] - \sum_{t=1}^{T} \mathbb{E}_{a \sim \pi}\big[\mathbb{E}[R_t \mid a]\big]$
Exploration vs. Exploitation
31
(Illustration from Dan Klein and Pieter Abbeel’s course at UC Berkeley)
Contextual Bandits
32
Features: [Burger, Fries, Onion Ring, Fried Chicken]
Features: [Noodles, Tom Yum Soup, Poor service]
(Illustration from Dan Klein and Pieter Abbeel’s course at UC Berkeley)
Algorithms for Bandits
• Multi-armed Bandits
– Explore-First
– eps-Greedy
– Upper Confidence Bound
• Contextual bandits
– Infinite state space
– Work with a policy class instead
• The concept of regret
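As a minimal sketch of two of the multi-armed bandit algorithms listed above, the following simulates eps-Greedy and Upper Confidence Bound (in its common UCB1 form) on Bernoulli arms. The arm means, the helper names (`run_bandit`, `eps_greedy`, `ucb1`, `pseudo_regret`) and the exploration constant are illustrative assumptions, not taken from the course material. Regret is computed from the arms actually pulled, as the gap between always playing the best arm and the expected reward of the arms chosen.

```python
import math
import random

def run_bandit(choose, means, T, rng):
    """Play T rounds on Bernoulli arms; return how often each arm was pulled."""
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    for t in range(1, T + 1):
        a = choose(t, counts, values, rng)
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
    return counts

def eps_greedy(eps):
    """With probability eps explore uniformly, otherwise exploit the best estimate."""
    def choose(t, counts, values, rng):
        if rng.random() < eps:
            return rng.randrange(len(counts))
        return max(range(len(counts)), key=lambda a: values[a])
    return choose

def ucb1(t, counts, values, rng):
    """Pull each arm once, then pick the arm with the largest upper confidence bound."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )

def pseudo_regret(counts, means):
    """T * max_a E[R|a] minus the expected reward of the arms actually pulled."""
    best = max(means)
    return sum(n * (best - m) for n, m in zip(counts, means))
```

For example, `pseudo_regret(run_bandit(ucb1, [0.1, 0.5, 0.9], 5000, random.Random(0)), [0.1, 0.5, 0.9])` gives the regret of one simulated run; a policy that pulls arms uniformly at random would have regret growing linearly in T.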
33
Off-policy evaluation under the contextual bandits model
• Contexts: $x_1, ..., x_n \sim \lambda$
– drawn iid, possibly infinite domain
• Actions: $a_i \sim \mu(a \mid x_i)$
– Taken by a randomized “Logging” policy
• Reward: $r_i \sim D(r \mid x_i, a_i)$
– Revealed only for the action taken
• Value: $v^{\mu} = \mathbb{E}_{x \sim \lambda} \mathbb{E}_{a \sim \mu(\cdot \mid x)} \mathbb{E}_{D}[r \mid x, a]$
• We collect data $(x_i, a_i, r_i)_{i=1}^{n}$ by the above processes.
34
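The data-collection process above can be simulated directly. The sketch below uses a made-up problem with two contexts and two actions: `MU`, `PI`, `MEAN_R` and the uniform context distribution are all hypothetical numbers chosen for illustration. It also shows inverse propensity scoring (IPS), a standard off-policy estimator that the slide itself does not name, to estimate the value of a different target policy from the logged data.

```python
import random

# Hypothetical toy problem (all numbers are assumptions for illustration).
CONTEXTS = [0, 1]                          # contexts drawn uniformly at random
MU = {0: [0.8, 0.2], 1: [0.3, 0.7]}        # logging policy mu(a | x)
PI = {0: [0.1, 0.9], 1: [0.9, 0.1]}        # target policy pi(a | x)
MEAN_R = {0: [0.2, 0.6], 1: [0.5, 0.1]}    # expected reward E_D[r | x, a]

def true_value(policy):
    """Exact value: expectation over contexts, actions and rewards."""
    return sum(
        0.5 * policy[x][a] * MEAN_R[x][a] for x in CONTEXTS for a in (0, 1)
    )

def collect(n, rng):
    """Log n tuples (x_i, a_i, r_i); the reward is seen only for the action taken."""
    data = []
    for _ in range(n):
        x = rng.choice(CONTEXTS)
        a = 0 if rng.random() < MU[x][0] else 1
        r = 1.0 if rng.random() < MEAN_R[x][a] else 0.0
        data.append((x, a, r))
    return data

def on_policy_estimate(data):
    """The sample mean of logged rewards estimates the logging policy's value."""
    return sum(r for _, _, r in data) / len(data)

def ips_estimate(data, target):
    """Reweight each logged reward by target(a|x) / mu(a|x)."""
    return sum(target[x][a] / MU[x][a] * r for x, a, r in data) / len(data)
```

With enough logged samples, `on_policy_estimate(data)` should approach `true_value(MU)` and `ips_estimate(data, PI)` should approach `true_value(PI)`, even though the target policy was never run.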
35
Clinical Trial and ATE estimation
[Figure: causal graph over the variables T, Y, X, U and C]
Average Treatment Effect: ATE = E[Y | T = 1] - E[Y | T = 0]
Ignorability Assumption
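A minimal sketch of estimating the ATE from clinical-trial data by a difference in means. The simulated trial is an assumption for illustration: treatment is assigned by a fair coin flip, so ignorability holds by design, and the outcome model (baseline 1.0, treatment effect 2.0, Gaussian noise) is made up. The names `ate_difference_in_means` and `simulate_trial` are hypothetical.

```python
import random
from statistics import mean

def ate_difference_in_means(samples):
    """Estimate E[Y | T = 1] - E[Y | T = 0] from (t, y) pairs."""
    treated = [y for t, y in samples if t == 1]
    control = [y for t, y in samples if t == 0]
    return mean(treated) - mean(control)

def simulate_trial(n, effect, rng):
    """Randomized trial: T is a coin flip, Y = 1.0 + effect * T + noise."""
    samples = []
    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0
        y = 1.0 + effect * t + rng.gauss(0.0, 1.0)
        samples.append((t, y))
    return samples
```

On a large simulated trial, the estimate should be close to the `effect` value used to generate the data. Without randomization, the same difference in means can be biased by confounders.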
Reinforcement Learning
• Simplify the problem
– Make state discrete
– Make action discrete
– Make state fully observable
• Remaining challenge: Learning / Long-term planning
• What is a Q function, what is a Value function?
• What is the optimal policy?
36
Value function for a specific policy
[Figure: grid world with terminal cells +1 and -1]
• reward +1 at [4,3], -1 at [4,2]
• reward -0.04 for each step
actions: UP, DOWN, LEFT, RIGHT
Action UP: 80% move UP, 10% move LEFT, 10% move RIGHT
8/9 * (-0.04 + 1.0) + 1/9 * (-0.04 + Vπ(s’))
Algorithms for RL
• Dynamic programming
– policy evaluation: compute Vπ from π
– policy improvement: improve π based on Vπ
• Monte Carlo Methods
– For on-policy policy evaluation
– For estimating the Q function and V-function.
• Temporal Difference Learning
– Bootstrap with current belief of the Value function.
• Policy gradient
– Stochastic Gradient Descent!
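The dynamic programming item above (policy evaluation followed by policy improvement) can be sketched on a tiny tabular MDP. The two-state MDP `P`, the discount `GAMMA` and all function names below are assumptions made for illustration; they are not the grid world from the lecture.

```python
GAMMA = 0.9  # assumed discount factor

# P[state][action] = list of (probability, next_state, reward); "END" is terminal.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(0.8, "END", 1.0), (0.2, "A", 0.0)]},
}

def q_value(V, s, a):
    """Expected reward plus discounted value of the next state (terminal value is 0)."""
    return sum(p * (r + GAMMA * V.get(s2, 0.0)) for p, s2, r in P[s][a])

def policy_evaluation(policy, tol=1e-10):
    """Iterate the Bellman equation for a fixed policy until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = q_value(V, s, policy[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

def policy_improvement(V):
    """Act greedily with respect to the current value function."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}

def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        V = policy_evaluation(policy)
        new_policy = policy_improvement(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy
```

Starting from `policy_iteration({"A": "stay", "B": "stay"})`, the loop first learns that "go" is better in state B, and only then that "go" is better in state A, because the value of B has to be known before A can benefit from it.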
38
Lecture 17, 18: Logic
• Logic agent
• Knowledge Base
– Tell operation
– Ask operation
• Components of a formal mathematical logic system
– Syntax, Semantics
• Inference Algorithms.
39
• Need a formal logic system to work
• Need a data structure to represent known facts
• Need an algorithm to answer ASK questions
40
Recap: KB Agents
[Figure: KB agent with TELL and ASK operations; label: True sentences]
Knowledge Base: domain-specific content; facts
Inference engine: domain-independent algorithms; can deduce new facts from the KB
Syntax and semantics
• Two components of a logic system
• Syntax --- How to construct sentences
– The symbols
– The operators that connect symbols together
– A precedence ordering
• Semantics --- Rules for assigning truth values to sentences
– For every possible world (or “model” in logic jargon)
– The truth table is a semantics
41
42
Entailment
[Figure: in the Representation, Sentences ENTAIL a Sentence; in the World, a Fact FOLLOWS from Facts; Semantics connects sentences to facts]
A is entailed by B if A is true in all possible worlds consistent with B under the semantics.
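The definition of entailment can be checked by brute force for propositional sentences: enumerate every possible world (truth assignment) and verify that A holds wherever B holds. The sketch below represents a sentence as a Python function of a model; that encoding, the helper names and the `Rain`/`Wet` symbols are illustrative assumptions.

```python
from itertools import product

def entails(b, a, symbols):
    """Return True if sentence `a` is true in every model in which `b` is true."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if b(model) and not a(model):
            return False  # found a world consistent with b where a fails
    return True

def implies(p, q):
    """Truth-table semantics of implication."""
    return (not p) or q

SYMBOLS = ["Rain", "Wet"]

def rule(m):
    return implies(m["Rain"], m["Wet"])   # Rain => Wet

def kb(m):
    return rule(m) and m["Rain"]          # (Rain => Wet) and Rain

def wet(m):
    return m["Wet"]
```

Here `entails(kb, wet, SYMBOLS)` holds, while `entails(rule, wet, SYMBOLS)` does not, since the world where both symbols are false satisfies the rule but not `Wet`. The enumeration visits 2^n models for n symbols, so this is only practical for small examples.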
Inference procedure
• Inference procedure
– Rules (algorithms) that we apply (often recursively) to derive truths from other truths.
– Could be specific to a particular set of semantics, a particular realization of the world.
• Soundness and completeness of an inference procedure
– Soundness: All truths discovered are valid.
– Completeness: All truths that are entailed can be discovered.
43
Propositional Logic
• Syntax
– True, false, propositional symbols
– ( ), ¬ (not), ∧ (and), ∨ (or), ⇒ (implies), ⇔ (equivalent)
• Semantics:
– Five rules (the following truth table)
• Inference rules:
– Modus Ponens etc. Most important: Resolution
44
Propositional logic agent
• Representation: Conjunctive Normal Forms
– Represent them in a data structure: a list, a heap, a tree?
– Efficient TELL operation
• Inference: Solve ASK question
– Using only “Resolution” on CNFs is sound and complete.
– Equivalent to SAT, NP-complete, but good heuristics / practical algorithms exist
• Possible answers to ASK:
– Valid, Satisfiable, Unsatisfiable
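A minimal sketch of answering ASK by resolution on CNF clauses, using proof by refutation: add the negated query to the KB and saturate under the resolution rule, and the query is entailed exactly when the empty clause appears. The encoding (a clause as a `frozenset` of string literals, with `~` marking negation) and the function names are assumptions for illustration; practical SAT solvers use far more efficient data structures and heuristics.

```python
from itertools import combinations

def negate(lit):
    """Flip a literal: 'P' <-> '~P'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses (each a frozenset of literals)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

def unsatisfiable(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True   # empty clause: contradiction found
                new.add(r)
        if new <= clauses:
            return False          # nothing new can be derived
        clauses |= new

def ask(kb_clauses, negated_query_clauses):
    """The KB entails the query iff KB plus the negated query is unsatisfiable."""
    return unsatisfiable(set(kb_clauses) | set(negated_query_clauses))
```

For instance, with the KB clauses `{~Rain, Wet}` and `{Rain}`, asking `Wet` means adding the clause `{~Wet}`; two resolution steps then produce the empty clause, so the answer is yes.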
45
First order logic
• More expressive language
– Relations and functions of objects.
– Quantifiers such as All, Exists.
• Easier to construct a KB.
– Needs a much smaller number of sentences to capture a domain.
• Follow the same structure: Symbols, Semantics
• Dedicated inference algorithms
• (FOL is not covered in the Final)
46
We have come a long way!
47
Week  Topic
1     Introduction and Course Overview; AI Problem Solving and Intelligent Agents
2     Quantifying uncertainty; Probabilistic Reasoning: Bayes Network
3     Probabilistic Reasoning: Conditional Independences; Machine Learning: Supervised Learning
4     Machine Learning: Unsupervised Learning; Machine Learning: How machine learning works?
5     Continuous optimization; Search: Solving problems with Search
6     Search: Basic search; Midterm
7     Search: Informed search; RL: RL Overview and MDP
8     RL: Multi-arm Bandits; RL: Contextual Bandits & Policy evaluation
9     RL: Tabular MDP, RL algorithms; Logic: Propositional Logic
10    Logic: First order logic; Review session
11    Final Exam. March 18 12:00 PM - 3:00 PM
Topic areas: Probabilistic Reasoning, Machine Learning, Search, Reinforcement Learning, Logic
Future of AI
• More high-level intelligence
– But more learning-based than rule-based
• More stateful systems, more reinforcement learning
• More AI in non-iid environments
– Structured
– Adversarial
• More forms of agent perception
– Weak supervision
– Self-supervision (bootstrapping)
48
Final words
• With great power comes great responsibility.
– Ethics in AI, Privacy
– AI for good causes
– Social impacts
• Thank you!
– It’s my pleasure to work with you!
– I hope the course is / will be useful.
49