Artificial Intelligence & Optimization: Theory and Applications
Introduction
Artificial Intelligence and
Machine Learning
• Web Intelligence and Data Mining Lab
  – Web Data Extraction & Integration
  – Information Retrieval and Extraction
  – Data Mining and Machine Learning
https://sites.google.com/site/jahuichang/
Professor, Dept. of Computer Science and Information Engineering, National Central University
Member, Taoyuan City Research, Development and Evaluation Commission, 2015.3 - Now
Review panel member, Intelligent Computing Division, Ministry of Science and Technology, 2014.10 - Now
Standing Director, TAAI, 2004.1 - Now
Director, ACLCLP, 2012.1 - Now
Outline
• Artificial Intelligence
• Problem Solving / Search Algorithms
• Knowledge Reasoning: Expert Systems / Logic Programming
• Uncertainty Reasoning: Probability / Bayesian Networks
• Machine Learning: Supervised Learning
• Pattern Recognition / NLP: IE + IR + MT / Statistical Reasoning
What is AI?
• Thinking vs. Acting • Humanly vs. Rationally
Thinking humanly Thinking rationally
Acting humanly Acting rationally
What is AI?
• Humanly
– Thinking humanly: cognitive modeling
– Acting humanly: Turing Test
• Rationally
– Thinking rationally: "laws of thought"
– Acting rationally: rational agent
AI prehistory
• Philosophy: logic, methods of reasoning, mind as physical system, foundations of learning, language, rationality
• Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability, probability
• Economics: utility, decision theory
• Neuroscience: physical substrate for mental activity
• Psychology: phenomena of perception and motor control, experimental techniques
• Computer engineering: building fast computers
• Control theory: design systems that maximize an objective function over time
• Linguistics: knowledge representation, grammar
Abridged history of AI
• 1943 McCulloch & Pitts: Boolean circuit model of brain
• 1950 Turing's "Computing Machinery and Intelligence"
• 1956 Dartmouth meeting: "Artificial Intelligence" adopted
• 1952-69 "Look, Ma, no hands!"
• 1950s Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine
• 1965 Robinson's complete algorithm for logical reasoning
• 1966-73 AI discovers computational complexity; neural network research almost disappears
• 1969-79 Early development of knowledge-based systems
• 1980- AI becomes an industry
• 1986- Neural networks return to popularity
• 1987- AI becomes a science
• 1995- The emergence of intelligent agents
AI Renaissance
• The Internet, intranets, and the AI renaissance, Intelligence 1997
– Daniel E. O’Leary
• Why artificial intelligence is enjoying a renaissance
– The Economist, 2017
AI: State of the Art
• Deep Blue defeated the reigning world chess champion Garry Kasparov in 1997
• IBM's Watson Supercomputer Destroys Humans in Jeopardy, 2011
• Man vs. machine: Google AlphaGo wins and Lee Sedol resigns, 2016
Self-Driving Cars and the Internet of Vehicles
• Self-Driving: No hands across America, 1995, 2017
  – Google's self-driving car, now the spin-off Waymo
  – Samsung and Audi
  – BlackBerry's R&D center in Canada
  – Volkswagen enters the sharing economy with its new brand Moia
  – BMW tests driverless cars in Munich
  – UK startup Oxbotica
• A 5-minute video on why self-driving cars outperform human drivers
Part I Artificial Intelligence
Two Chapters
Part II Problem Solving
Four Chapters
Part III Knowledge and Reasoning
Six Chapters
Part IV Uncertain Knowledge and Reasoning
Five Chapters
Part V Learning
Four Chapters
Part VII Communicating, Perceiving, and Acting
Natural Language Processing
Natural Language for Communication
Perception (vision)
Robotics
Artificial Intelligence: A Modern Approach
SOLVING PROBLEMS BY SEARCH
Video lectures on YouTube:https://www.youtube.com/playlist?list=PLAwxTw4SYaPlqMkzr4xyuD6cXTIgPuzgn
Some games
• 8-queens
• Bridge
• 8-puzzle
• Sudoku
• Shortest path
• Coloring
• Dice …
14 Jan 2004 CS 3243 - Blind Search 15
Problem Types: Terminologies
• Fully vs. Partially observable
• Deterministic vs. Stochastic
• Discrete or Continuous
• Benign vs. Adversarial
Examples: Checkers, Poker, Robotic Car, Vacuum
Route Finding Problem
Search Algorithms
• Uniform-Cost Search
• Depth-First Search
• Breadth-First Search
• A* (A-Star) Search Algorithm
• Local Beam Search
• Hill Climbing
• Simulated annealing search
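The informed strategy above can be made concrete. Below is a minimal A* sketch over a small hypothetical weighted graph; the graph, heuristic values, and node names are illustrative, not from the slides:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: repeatedly expand the frontier node with the lowest
    f = g (cost so far) + h (heuristic estimate of remaining cost)."""
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, step in graph.get(node, []):
            g2 = g + step
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None, float("inf")

# Hypothetical weighted graph and admissible heuristic
# (setting h = 0 everywhere would give uniform-cost search).
graph = {"A": [("B", 2), ("C", 5)], "B": [("C", 2), ("D", 6)], "C": [("D", 2)], "D": []}
h = {"A": 4, "B": 3, "C": 2, "D": 0}
path, cost = a_star(graph, h, "A", "D")
print(path, cost)  # ['A', 'B', 'C', 'D'] 6
```

With an admissible heuristic (one that never overestimates the remaining cost), A* returns an optimal path; breadth-first and depth-first search fall out of the same frontier idea with a queue or a stack instead of a priority queue.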
Local Search in Continuous Spaces
Knowledge Reasoning
Why do we need logics?
• Problem solving agents cannot infer unobserved information.
• We want an algorithm that reasons in a way that resembles reasoning in humans.
• Logic a.k.a. Symbolic Reasoning
Wumpus World PEAS
• Performance measure
  – gold +1000, death -1000
  – -1 per step, -10 for using the arrow
• Environment
  – Shooting kills the wumpus if you are facing it
  – Shooting uses up the only arrow
• Sensors: Stench, Breeze, Glitter, Bump, Scream
• Actuators: Left turn, Right turn, Forward, Grab, Shoot
Topics for Logic Reasoning
• Knowledge-based agents
• Logic in general - models and entailment
• Propositional (Boolean) logic
• Equivalence, validity, satisfiability
• Inference rules and theorem proving for Horn clauses
– forward chaining
– backward chaining
– resolution
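Forward chaining over Horn clauses can be sketched in a few lines (a simplified PL-FC-ENTAILS; the knowledge base below is a hypothetical example):

```python
def forward_chaining(kb, query):
    """Forward chaining for definite (Horn) clauses.
    kb: list of (premises, conclusion) pairs; facts have empty premises."""
    count = {i: len(prem) for i, (prem, _) in enumerate(kb)}   # unsatisfied premises
    inferred = set()
    agenda = [concl for prem, concl in kb if not prem]         # known facts
    while agenda:
        p = agenda.pop()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i, (prem, concl) in enumerate(kb):
            if p in prem:
                count[i] -= 1
                if count[i] == 0:          # all premises proved: fire the rule
                    agenda.append(concl)
    return False

# Hypothetical Horn KB: P∧Q⇒R, L∧M⇒P, B∧L⇒M, A∧B⇒L, plus facts A, B, Q.
kb = [({"P", "Q"}, "R"), ({"L", "M"}, "P"), ({"B", "L"}, "M"),
      ({"A", "B"}, "L"), (set(), "A"), (set(), "B"), (set(), "Q")]
print(forward_chaining(kb, "R"))  # True
```

Backward chaining runs the same clauses in the other direction, from the query back to the facts; both are linear-time for Horn clauses.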
Expert Systems
Logic in general
• Logics are formal languages for representing information such that conclusions can be drawn
  – Syntax defines the sentences in the language
  – Semantics defines the "meaning" of sentences, i.e., the truth of a sentence in a world
• Entailment means that one thing follows from another: KB ╞ α
  – Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true, i.e., M(KB) ⊆ M(α)
• Models are formally structured worlds with respect to which truth can be evaluated
  – M(α) is the set of worlds where α is true
  – M(KB) is the set of all worlds where KB is true
  – Think of KB and α as collections of constraints
Inference
• KB ├i α means that sentence α can be derived from KB by procedure i
• Soundness: i is sound if whenever KB ├i α, it is also true that KB╞ α
• Completeness: i is complete if whenever KB╞ α, it is also true that KB ├i α
Propositional logic: Syntax and Semantics
• Propositional logic is the simplest logic – illustrates basic ideas
• The proposition symbols S1, S2, etc. are sentences
  – If S is a sentence, ¬S is a sentence (negation)
  – If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
  – If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
  – If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication)
  – If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional)
Reasoning Patterns
• How do we know KB ╞ α?
  – Model checking, O(2^n)
  – Application of inference rules
• Inference Rules
  – Modus Ponens
  – And-Elimination
• Monotonicity
  – If KB ╞ α, then KB ∧ β ╞ α
Inference by Model Checking
• Depth-first enumeration of all models is sound and complete
• For n symbols, time complexity is O(2^n), space complexity is O(n)
(The recursion branches on each symbol in turn: assign true to variable P, then assign false to variable P.)
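The depth-first enumeration can be sketched as a TT-ENTAILS-style routine; the knowledge base and query below are illustrative:

```python
from itertools import product

def tt_entails(kb, alpha, symbols):
    """Decide KB ╞ α by enumerating all 2^n models (sound and complete).
    kb and alpha are functions mapping a model (dict symbol -> bool) to bool."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False   # a model of KB in which α is false: no entailment
    return True

# Hypothetical KB: (P ⇒ Q) ∧ P, query α = Q  (Modus Ponens).
kb = lambda m: (not m["P"] or m["Q"]) and m["P"]
alpha = lambda m: m["Q"]
print(tt_entails(kb, alpha, ["P", "Q"]))  # True
```

A real implementation would recurse symbol by symbol rather than materialize every assignment, keeping space at O(n) as the slide notes.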
Application of Inference Rules
• Two sentences are logically equivalent iff true in same set of models: α ≡ β iff α╞ β and β╞ α
You need to know these (discrete math).
Resolution
• Inference rule for CNF: sound and complete!*

  (A ∨ B ∨ C), ¬A ⊢ (B ∨ C)
  "If A or B or C is true, but not A, then B or C must be true."

  (A ∨ B ∨ C), (¬A ∨ D ∨ E) ⊢ (B ∨ C ∨ D ∨ E)
  "If A is false then B or C must be true, or if A is true then D or E must be true; hence, since A is either true or false, B or C or D or E must be true."

  (A ∨ B), (¬A ∨ B) ⊢ (B ∨ B) = B    (simplification)

* Resolution is "refutation complete" in that it can prove the truth of any entailed sentence by refutation.
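A single resolution step can be sketched as follows (clauses as sets of literals, with '~' marking negation; a sketch of one step, not a full refutation prover):

```python
def resolve(c1, c2):
    """Return all resolvents of two clauses; a clause is a set of
    literals, and '~X' denotes ¬X."""
    def neg(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit
    # For each complementary pair, drop it and union the remaining literals.
    return [(c1 - {lit}) | (c2 - {neg(lit)}) for lit in c1 if neg(lit) in c2]

# (A ∨ B ∨ C) resolved with (¬A ∨ D ∨ E) gives (B ∨ C ∨ D ∨ E):
print(resolve({"A", "B", "C"}, {"~A", "D", "E"}))
# (A ∨ B) with (¬A ∨ B) gives just B (the set union simplifies B ∨ B):
print(resolve({"A", "B"}, {"~A", "B"}))
```

A refutation prover would add ¬α to the KB's clauses and apply this step until it derives the empty clause or no new clauses appear.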
Efficient Propositional Inference
• Two families of efficient algorithms for propositional inference:
Complete backtracking search algorithms
• DPLL algorithm (Davis, Putnam, Logemann, Loveland)
Incomplete local search algorithms
• WalkSAT algorithm
Summary for Knowledge Reasoning
• Logical agents apply inference to a knowledge base to derive new information and make decisions
• Basic concepts of logic:
  – syntax: formal structure of sentences
  – semantics: truth of sentences wrt models
  – entailment: necessary truth of one sentence given another
  – inference: deriving sentences from other sentences
  – soundness: derivations produce only entailed sentences
  – completeness: derivations can produce all entailed sentences
• Resolution is complete for propositional logic
• Forward and backward chaining are linear-time and complete for Horn clauses
• Propositional logic lacks expressive power
UNCERTAINTY REASONING
Bayesian Networks
Bayesian Networks
• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
• Syntax:
  – a directed, acyclic graph (link ≈ "directly influences")
  – a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
  – i.e., grows linearly with n, vs. O(2^n) for the full joint distribution
  – For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
Semantics
The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
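The number on the slide can be reproduced directly from the factorization (the CPT values below are the standard burglary-network figures; only those appearing in the product matter here):

```python
# CPTs for the burglary network (standard textbook values).
P_b, P_e = 0.001, 0.002                      # P(Burglary), P(Earthquake)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_j = {True: 0.90, False: 0.05}              # P(JohnCalls | Alarm)
P_m = {True: 0.70, False: 0.01}              # P(MaryCalls | Alarm)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(round(p, 8))  # 0.00062811, i.e. ≈ 0.00063
```

Each factor is a single CPT lookup, which is exactly what makes the factored representation compact.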
Local semantics
• Local semantics: each node is conditionally independent of its nondescendants given its parents
Markov blanket
• Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
Inference by Enumeration
Exact Inference vs. Approximate Inference
• Speeding Up Inference
– Pull out terms
– Maximize Independence
– Variable Elimination
• Approximate Inference by
– Sampling
– Rejection sampling
– Gibbs sampling
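Rejection sampling can be sketched on a toy two-node network; the structure (Rain → WetGrass) and CPT values are hypothetical, chosen so the exact posterior is easy to check by hand:

```python
import random
random.seed(0)

def prior_sample():
    """One sample from a toy net Rain → WetGrass (hypothetical CPTs:
    P(rain)=0.3, P(wet|rain)=0.9, P(wet|¬rain)=0.2)."""
    rain = random.random() < 0.3
    wet = random.random() < (0.9 if rain else 0.2)
    return rain, wet

def rejection_sample(n=100_000):
    """Estimate P(Rain | WetGrass=true): sample from the prior and keep
    only the samples consistent with the evidence."""
    kept = [rain for rain, wet in (prior_sample() for _ in range(n)) if wet]
    return sum(kept) / len(kept)

print(rejection_sample())  # close to the exact answer 0.27/0.41 ≈ 0.659
```

The weakness is visible in the code: every sample inconsistent with the evidence is thrown away, which is why likelihood weighting and Gibbs sampling are preferred when the evidence is unlikely.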
MACHINE LEARNING
a.k.a. Predictive analytics, Data science
Machine Learning Books
• Deep Learning by I. Goodfellow, Y. Bengio, A. Courville, 2016
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. by T. Hastie, R. Tibshirani, J. Friedman, 2009
• Pattern Recognition and Machine Learning by C. M. Bishop, 2006
• Introduction to Machine Learning, 2nd ed. by Ethem Alpaydin, 2010
• Machine Learning by Tom M. Mitchell, 1997
Topics from PRML
• Introduction
• Probability Distributions
• Linear Models for Regression
• Linear Models for Classification
• Neural Networks
• Kernel Methods
• Sparse Kernel Machines
• Graphical Models
• Approximate Inference
Topics from Deep Learning
• Applied Math and ML Basics
  – Linear Algebra
  – Probability and Information Theory
  – Numerical Computation
  – Machine Learning Basics
• Deep Networks: Modern Practices
  – Deep Feedforward Networks
  – Regularization for Deep Learning
  – Optimization for Training Deep Models
  – Convolutional Neural Networks
  – Sequence Modeling: RNNs
Learning from Examples
» Supervised Learning
  » Classification
  » Regression
  » Sequence Labeling
  » Structure Learning
  » Object Recognition
  » Summarization
» Unsupervised Learning
  » Clustering
» Semi-supervised Learning
  » Uses both labeled and unlabeled training data
» Active Learning
  » Chooses the critical data points to be labeled
» Distant Supervision
Classification: Definition
• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model.
  – Usually, the given data set is divided into training and test sets; the training set is used to build the model and the test set is used to validate it.
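The learn/evaluate protocol can be sketched with a deliberately simple classifier: a hypothetical 1-nearest-neighbour rule on made-up two-feature records (real systems would use decision trees, SVMs, etc.):

```python
def nn_classify(train, x):
    """1-nearest-neighbour: return the class of the closest training record
    (squared Euclidean distance over the feature tuple)."""
    def dist(rec):
        feats, _ = rec
        return sum((a - b) ** 2 for a, b in zip(feats, x))
    return min(train, key=dist)[1]

# Made-up records: features = (taxable income in K, refund as 0/1), class label.
train = [((125, 1), "No"), ((100, 0), "No"), ((95, 0), "Yes"), ((85, 1), "Yes")]
test = [((90, 0), "Yes"), ((120, 1), "No")]

# Accuracy on the held-out test set, as in the definition above.
accuracy = sum(nn_classify(train, x) == y for x, y in test) / len(test)
print(accuracy)
```

The split matters: measuring accuracy on the training set itself would reward memorization rather than generalization.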
Classification Example
Training Set:
Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

Test Set:
Refund  Marital Status  Taxable Income  Cheat
No      Single           75K            ?
Yes     Married          50K            ?
No      Married         150K            ?
Yes     Divorced         90K            ?
No      Single           40K            ?
No      Married          80K            ?

Learn a classifier (model) from the training set, then apply it to the test set.
Clustering: Definition
• Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that
  – Members of a cluster are close/similar to each other
  – Members of different clusters are dissimilar
• Usually:
  – Points are in a high-dimensional space
  – Similarity is defined using a distance measure
    • Euclidean (data points)
    • Cosine (vectors)
    • Jaccard (sets)
    • Edit distance (strings), …
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Example: Clusters & Outliers
(scatter-plot figure: several groups of points, one labeled "Cluster", plus an isolated point labeled "Outlier")
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
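One standard algorithm that fits this definition (k-means, not named on the slide) alternates assignment and re-centering under Euclidean distance; a minimal sketch on made-up 2-D points:

```python
import random

def kmeans(points, k, iters=20, seed=1):
    """Plain k-means: assign each point to the nearest center (Euclidean),
    then move each center to the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                                  + (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                   if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Two well-separated blobs of made-up 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
```

Swapping the squared-Euclidean expression for cosine, Jaccard, or edit distance gives the other variants listed above, though the "mean" step then needs a matching notion of centroid.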
Structural Learning: A Unified Framework
Supervised Learning
• Linear Regression
• Binary Classification
• Bayesian Networks
• Minimize Errors
• Maximize Likelihood
Bayesian Networks
• Definition:
– If the network structure of the model is a directed acyclic graph, the model represents a factorization of the joint probability of all random variables.
– Conditional Probability Tables
– Smoothing
– Inference
P(x1, x2, …, xn) = ∏_{i=1}^{n} p(xi | parents(xi))
Continuous Optimization Problem
• General form:

  minimize_x  f(x)
  subject to  g_i(x) ≤ 0, i = 1, …, m
              h_j(x) = 0, j = 1, …, p

  where
  – f(x): R^n → R is the objective function to be minimized over the variable x,
  – g_i(x) ≤ 0 are called inequality constraints, and
  – h_j(x) = 0 are called equality constraints.
• Special cases:
  – Unconstrained optimization: m = p = 0
  – Linearly constrained optimization: g_i(x) and h_j(x) are linear
  – Nonlinear constrained optimization
  – Quadratic programming
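For the unconstrained special case (m = p = 0), plain gradient descent is the workhorse; a minimal sketch on a hypothetical quadratic objective whose minimum is known:

```python
def gradient_descent(grad, x0, lr=0.1, iters=100):
    """Minimize f over R^n by stepping against its gradient
    (unconstrained case: no g_i or h_j)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical objective f(x) = (x0 - 3)^2 + 2*(x1 + 1)^2, minimum at (3, -1).
grad = lambda x: [2 * (x[0] - 3), 4 * (x[1] + 1)]
x_min = gradient_descent(grad, [0.0, 0.0])
print(x_min)  # close to [3.0, -1.0]
```

Constrained problems need more machinery (projections, Lagrange multipliers, interior-point methods), which is where the quadratic-programming form below comes in.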
Quadratic Programming
• The objective of quadratic programming is to find an n-dimensional vector x that will

  minimize_x  (1/2) xᵀQx + cᵀx
  subject to  Ax ≤ b

• Given:
  – a real-valued, n-dimensional vector c,
  – an n × n real symmetric matrix Q,
  – an m × n real matrix A, and
  – an m-dimensional real vector b.
Summary
• Related Courses
– AI, Neural Network, Data Mining, Machine Learning, Optimization, Graphical Models, Deep Learning
• Optimization Problems
– Combinatorial Optimization Problems
  • Search for values for each variable
  • Constraint Satisfaction Problems
– Continuous Optimization Problem
• Batch/Minibatch/Stochastic Gradient descent
• Inference
– Symbolic Logic vs. Statistical Models
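The batch, minibatch, and stochastic gradient descent variants mentioned above differ only in how many examples feed each gradient step. A minibatch-SGD sketch for least-squares linear regression (toy noise-free data; hyperparameters are illustrative):

```python
import random

def sgd_linear_regression(data, lr=0.02, epochs=1000, batch=2, seed=0):
    """Minibatch SGD for y ≈ w*x + b, minimizing mean squared error.
    batch=len(data) gives batch gradient descent; batch=1 gives pure SGD."""
    rng = random.Random(seed)
    data = list(data)                   # local copy; reshuffled each epoch
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch):
            mb = data[i:i + batch]
            # Gradient of the minibatch MSE w.r.t. w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in mb) / len(mb)
            gb = sum(2 * (w * x + b - y) for x, y in mb) / len(mb)
            w, b = w - lr * gw, b - lr * gb
    return w, b

# Noise-free toy data generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0, 1, 2, 3, 4]]
w, b = sgd_linear_regression(data)
print(w, b)  # close to 2 and 1
```

Smaller batches give cheaper, noisier steps; larger batches give smoother but costlier ones, which is exactly the trade-off the summary bullet names.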
Questions & Answers