Artificial Intelligence & Optimization: Theory and Applications
Introduction
Artificial Intelligence and
Machine Learning
• Web Intelligence and Data Mining Lab
  – Web Data Extraction & Integration
  – Information Retrieval and Extraction
  – Data Mining and Machine Learning
https://sites.google.com/site/jahuichang/
Professor, Dept. of Computer Science and Information Engineering, National Central University
Member, Taoyuan City Research, Development and Evaluation Commission, 2015.3 - Now
Review panel member, Intelligent Computing Division, Ministry of Science and Technology, 2014.10 - Now
Standing Director, TAAI, 2004.1 - Now
Director, ACLCLP, 2012.1 - Now
Outline
• Artificial Intelligence
• Problem Solving / Search Algorithms
• Knowledge Reasoning: Expert Systems / Logic Programming
• Uncertainty Reasoning: Probability / Bayesian Networks
• Machine Learning: Supervised Learning
• Pattern Recognition / NLP: IE + IR + MT / Statistical Reasoning
What is AI?
• Thinking vs. Acting • Humanly vs. Rationally
Thinking humanly Thinking rationally
Acting humanly Acting rationally
What is AI?
• Humanly
– Thinking humanly: cognitive modeling
– Acting humanly: Turing Test
• Rationally
– Thinking rationally: "laws of thought"
– Acting rationally: rational agent
AI prehistory
• Philosophy: logic, methods of reasoning, mind as physical system, foundations of learning, language, rationality
• Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability, probability
• Economics: utility, decision theory
• Neuroscience: physical substrate for mental activity
• Psychology: phenomena of perception and motor control, experimental techniques
• Computer engineering: building fast computers
• Control theory: design systems that maximize an objective function over time
• Linguistics: knowledge representation, grammar
Abridged history of AI
• 1943 McCulloch & Pitts: Boolean circuit model of brain
• 1950 Turing's "Computing Machinery and Intelligence"
• 1956 Dartmouth meeting: "Artificial Intelligence" adopted
• 1952-69 "Look, Ma, no hands!"
• 1950s Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine
• 1965 Robinson's complete algorithm for logical reasoning
• 1966-73 AI discovers computational complexity; neural network research almost disappears
• 1969-79 Early development of knowledge-based systems
• 1980- AI becomes an industry
• 1986- Neural networks return to popularity
• 1987- AI becomes a science
• 1995- The emergence of intelligent agents
AI Renaissance
• The Internet, intranets, and the AI renaissance, Intelligence 1997
– Daniel E. O’Leary
• Why artificial intelligence is enjoying a renaissance
– The Economist, 2017
AI: State of the Art
• Deep Blue defeated the reigning world chess champion Garry Kasparov in 1997
• IBM's Watson Supercomputer Destroys Humans in Jeopardy, 2011
• Man vs. machine: Google AlphaGo wins and Lee Sedol resigns, 2016
Self-Driving Cars and the Internet of Vehicles
• Self-Driving: No hands across America, 1995, 2017
  – Google's self-driving car, now the spin-off Waymo
  – Samsung and Audi
  – BlackBerry's R&D center in Canada
  – Volkswagen enters the sharing economy with its new brand Moia
  – BMW tests driverless cars in Munich
  – UK startup Oxbotica
• A 5-minute video on why self-driving cars outperform human drivers
Part I Artificial Intelligence
Two Chapters
Part II Problem Solving
Four Chapters
Part III Knowledge and Reasoning
Six Chapters
Part IV Uncertain Knowledge and Reasoning
Five Chapters
Part V Learning
Four Chapters
Part VII Communicating, Perceiving, and Acting
Natural Language Processing
Natural Language for Communication
Perception (vision)
Robotics
Artificial Intelligence: A Modern Approach
SOLVING PROBLEMS BY SEARCH
Video lectures on YouTube:https://www.youtube.com/playlist?list=PLAwxTw4SYaPlqMkzr4xyuD6cXTIgPuzgn
Some games
• 8-queens
• Bridge
• 8-puzzle
• Sudoku
• Shortest path
• Coloring
• Dice …
14 Jan 2004 CS 3243 - Blind Search 15
Problem Types: Terminologies
• Fully vs. Partially observable
• Deterministic vs. Stochastic
• Discrete or Continuous
• Benign vs. Adversarial
Examples: Checkers, Poker, Robotic Car, Vacuum
Route Finding Problem
Search Algorithms
• Uniform-Cost Search
• Depth-First Search
• Breadth-First Search
• A* (A-Star) Search Algorithm
• Local Beam Search
• Hill Climbing
• Simulated annealing search
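The informed strategy above can be made concrete. Below is a minimal A* sketch over a small hypothetical weighted graph; the graph, heuristic values, and node names are illustrative, not from the slides:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: repeatedly expand the frontier node with the lowest
    f = g (cost so far) + h (heuristic estimate of remaining cost)."""
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, step in graph.get(node, []):
            g2 = g + step
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None, float("inf")

# Hypothetical weighted graph and admissible heuristic
# (setting h = 0 everywhere would give uniform-cost search).
graph = {"A": [("B", 2), ("C", 5)], "B": [("C", 2), ("D", 6)], "C": [("D", 2)], "D": []}
h = {"A": 4, "B": 3, "C": 2, "D": 0}
path, cost = a_star(graph, h, "A", "D")
print(path, cost)  # ['A', 'B', 'C', 'D'] 6
```

With an admissible heuristic (one that never overestimates the remaining cost), A* returns an optimal path; breadth-first and depth-first search fall out of the same frontier idea with a queue or a stack instead of a priority queue.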
Local Search in Continuous Spaces
Knowledge Reasoning
Why do we need logics?
• Problem solving agents cannot infer unobserved information.
• We want an algorithm that reasons in a way that resembles reasoning in humans.
• Logic a.k.a. Symbolic Reasoning
Wumpus World PEAS
• Performance measure
  – gold +1000, death -1000
  – -1 per step, -10 for using the arrow
• Environment
  – Shooting kills the wumpus if you are facing it
  – Shooting uses up the only arrow
• Sensors: Stench, Breeze, Glitter, Bump, Scream
• Actuators: Left turn, Right turn, Forward, Grab, Shoot
Topics for Logic Reasoning
• Knowledge-based agents
• Logic in general - models and entailment
• Propositional (Boolean) logic
• Equivalence, validity, satisfiability
• Inference rules and theorem proving for Horn clauses
– forward chaining
– backward chaining
– resolution
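Forward chaining over Horn clauses can be sketched in a few lines (a simplified PL-FC-ENTAILS; the knowledge base below is a hypothetical example):

```python
def forward_chaining(kb, query):
    """Forward chaining for definite (Horn) clauses.
    kb: list of (premises, conclusion) pairs; facts have empty premises."""
    count = {i: len(prem) for i, (prem, _) in enumerate(kb)}   # unsatisfied premises
    inferred = set()
    agenda = [concl for prem, concl in kb if not prem]         # known facts
    while agenda:
        p = agenda.pop()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i, (prem, concl) in enumerate(kb):
            if p in prem:
                count[i] -= 1
                if count[i] == 0:          # all premises proved: fire the rule
                    agenda.append(concl)
    return False

# Hypothetical Horn KB: P∧Q⇒R, L∧M⇒P, B∧L⇒M, A∧B⇒L, plus facts A, B, Q.
kb = [({"P", "Q"}, "R"), ({"L", "M"}, "P"), ({"B", "L"}, "M"),
      ({"A", "B"}, "L"), (set(), "A"), (set(), "B"), (set(), "Q")]
print(forward_chaining(kb, "R"))  # True
```

Backward chaining runs the same clauses in the other direction, from the query back to the facts; both are linear-time for Horn clauses.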
Expert Systems
Logic in general
• Logics are formal languages for representing information such that conclusions can be drawn
  – Syntax defines the sentences in the language
  – Semantics defines the "meaning" of sentences, i.e., the truth of a sentence in a world
• Entailment means that one thing follows from another: KB ╞ α
  – Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true, i.e., M(KB) ⊆ M(α)
• Models are formally structured worlds with respect to which truth can be evaluated
  – M(α) is the set of worlds where α is true
  – M(KB) is the set of all worlds where KB is true
  – Think of KB and α as collections of constraints
Inference
• KB ├i α means that sentence α can be derived from KB by procedure i
• Soundness: i is sound if whenever KB ├i α, it is also true that KB╞ α
• Completeness: i is complete if whenever KB╞ α, it is also true that KB ├i α
Propositional logic: Syntax and Semantics
• Propositional logic is the simplest logic – illustrates basic ideas
• The proposition symbols S1, S2, etc. are sentences
  – If S is a sentence, ¬S is a sentence (negation)
  – If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
  – If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
  – If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication)
  – If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional)
Reasoning Patterns
• How do we know KB ╞ α?
  – Model checking, O(2^n)
  – Application of inference rules
• Inference Rules
  – Modus Ponens
  – And-Elimination
• Monotonicity
  – If KB ╞ α, then KB ∧ β ╞ α
Inference by Model Checking
• Depth-first enumeration of all models is sound and complete
• For n symbols, time complexity is O(2^n), space complexity is O(n)
(The recursion branches on each symbol in turn: assign true to variable P, then assign false to variable P.)
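The depth-first enumeration can be sketched as a TT-ENTAILS-style routine; the knowledge base and query below are illustrative:

```python
from itertools import product

def tt_entails(kb, alpha, symbols):
    """Decide KB ╞ α by enumerating all 2^n models (sound and complete).
    kb and alpha are functions mapping a model (dict symbol -> bool) to bool."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False   # a model of KB in which α is false: no entailment
    return True

# Hypothetical KB: (P ⇒ Q) ∧ P, query α = Q  (Modus Ponens).
kb = lambda m: (not m["P"] or m["Q"]) and m["P"]
alpha = lambda m: m["Q"]
print(tt_entails(kb, alpha, ["P", "Q"]))  # True
```

A real implementation would recurse symbol by symbol rather than materialize every assignment, keeping space at O(n) as the slide notes.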
Application of Inference Rules
• Two sentences are logically equivalent iff true in same set of models: α ≡ β iff α╞ β and β╞ α
You need to know these (discrete math).
Resolution
• Inference rule for CNF: sound and complete!*

  (A ∨ B ∨ C), ¬A ⊢ (B ∨ C)
  "If A or B or C is true, but not A, then B or C must be true."

  (A ∨ B ∨ C), (¬A ∨ D ∨ E) ⊢ (B ∨ C ∨ D ∨ E)
  "If A is false then B or C must be true, or if A is true then D or E must be true; hence, since A is either true or false, B or C or D or E must be true."

  (A ∨ B), (¬A ∨ B) ⊢ (B ∨ B) = B    (simplification)

* Resolution is "refutation complete" in that it can prove the truth of any entailed sentence by refutation.
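A single resolution step can be sketched as follows (clauses as sets of literals, with '~' marking negation; a sketch of one step, not a full refutation prover):

```python
def resolve(c1, c2):
    """Return all resolvents of two clauses; a clause is a set of
    literals, and '~X' denotes ¬X."""
    def neg(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit
    # For each complementary pair, drop it and union the remaining literals.
    return [(c1 - {lit}) | (c2 - {neg(lit)}) for lit in c1 if neg(lit) in c2]

# (A ∨ B ∨ C) resolved with (¬A ∨ D ∨ E) gives (B ∨ C ∨ D ∨ E):
print(resolve({"A", "B", "C"}, {"~A", "D", "E"}))
# (A ∨ B) with (¬A ∨ B) gives just B (the set union simplifies B ∨ B):
print(resolve({"A", "B"}, {"~A", "B"}))
```

A refutation prover would add ¬α to the KB's clauses and apply this step until it derives the empty clause or no new clauses appear.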
Efficient Propositional Inference
• Two families of efficient algorithms for propositional inference:
Complete backtracking search algorithms
• DPLL algorithm (Davis, Putnam, Logemann, Loveland)
Incomplete local search algorithms
• WalkSAT algorithm
Summary for Knowledge Reasoning
• Logical agents apply inference to a knowledge base to derive new information and make decisions
• Basic concepts of logic:
  – syntax: formal structure of sentences
  – semantics: truth of sentences wrt models
  – entailment: necessary truth of one sentence given another
  – inference: deriving sentences from other sentences
  – soundness: derivations produce only entailed sentences
  – completeness: derivations can produce all entailed sentences
• Resolution is complete for propositional logic
• Forward and backward chaining are linear-time and complete for Horn clauses
• Propositional logic lacks expressive power
UNCERTAINTY REASONING
Bayesian Networks
Bayesian Networks
• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
• Syntax:
  – a directed, acyclic graph (link ≈ "directly influences")
  – a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
  – i.e., grows linearly with n, vs. O(2^n) for the full joint distribution
  – For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
Semantics
The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
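The number on the slide can be reproduced directly from the factorization (the CPT values below are the standard burglary-network figures; only those appearing in the product matter here):

```python
# CPTs for the burglary network (standard textbook values).
P_b, P_e = 0.001, 0.002                      # P(Burglary), P(Earthquake)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_j = {True: 0.90, False: 0.05}              # P(JohnCalls | Alarm)
P_m = {True: 0.70, False: 0.01}              # P(MaryCalls | Alarm)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(round(p, 8))  # 0.00062811, i.e. ≈ 0.00063
```

Each factor is a single CPT lookup, which is exactly what makes the factored representation compact.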
Local semantics
• Local semantics: each node is conditionally independent of its nondescendants given its parents
Markov blanket
• Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
Inference by Enumeration
Exact Inference vs. Approximate Inference
• Speeding Up Inference
– Pull out terms
– Maximize Independence
– Variable Elimination
• Approximate Inference by
– Sampling
– Rejection sampling
– Gibbs sampling
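Rejection sampling can be sketched on a toy two-node network; the structure (Rain → WetGrass) and CPT values are hypothetical, chosen so the exact posterior is easy to check by hand:

```python
import random
random.seed(0)

def prior_sample():
    """One sample from a toy net Rain → WetGrass (hypothetical CPTs:
    P(rain)=0.3, P(wet|rain)=0.9, P(wet|¬rain)=0.2)."""
    rain = random.random() < 0.3
    wet = random.random() < (0.9 if rain else 0.2)
    return rain, wet

def rejection_sample(n=100_000):
    """Estimate P(Rain | WetGrass=true): sample from the prior and keep
    only the samples consistent with the evidence."""
    kept = [rain for rain, wet in (prior_sample() for _ in range(n)) if wet]
    return sum(kept) / len(kept)

print(rejection_sample())  # close to the exact answer 0.27/0.41 ≈ 0.659
```

The weakness is visible in the code: every sample inconsistent with the evidence is thrown away, which is why likelihood weighting and Gibbs sampling are preferred when the evidence is unlikely.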
MACHINE LEARNING
a.k.a. Predictive analytics, Data science
Machine Learning Books
• Deep Learning by I. Goodfellow, Y. Bengio, A. Courville, 2016
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. by T. Hastie, R. Tibshirani, J. Friedman, 2009
• Pattern Recognition and Machine Learning by C. M. Bishop, 2006
• Introduction to Machine Learning, 2nd ed. by Ethem Alpaydin, 2010
• Machine Learning by Tom M. Mitchell, 1997
Topics from PRML
• Introduction
• Probability Distributions
• Linear Models for Regression
• Linear Models for Classification
• Neural Networks
• Kernel Methods
• Sparse Kernel Machines
• Graphical Models
• Approximate Inference
Topics from Deep Learning
• Applied Math and ML Basics
  – Linear Algebra
  – Probability and Information Theory
  – Numerical Computation
  – Machine Learning Basics
• Deep Networks: Modern Practices
  – Deep Feedforward Networks
  – Regularization for Deep Learning
  – Optimization for Training Deep Models
  – Convolutional Neural Networks
  – Sequence Modeling: RNNs
Learning from Examples
» Supervised Learning
  » Classification
  » Regression
  » Sequence Labeling
  » Structure Learning
  » Object Recognition
  » Summarization
» Unsupervised Learning
  » Clustering
» Semi-supervised Learning
  » Uses both labeled and unlabeled training data
» Active Learning
  » Chooses the critical data points to be labeled
» Distant Supervision
Classification: Definition
• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model.
  – Usually, the given data set is divided into training and test sets; the training set is used to build the model and the test set is used to validate it.
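The learn/evaluate protocol can be sketched with a deliberately simple classifier: a hypothetical 1-nearest-neighbour rule on made-up two-feature records (real systems would use decision trees, SVMs, etc.):

```python
def nn_classify(train, x):
    """1-nearest-neighbour: return the class of the closest training record
    (squared Euclidean distance over the feature tuple)."""
    def dist(rec):
        feats, _ = rec
        return sum((a - b) ** 2 for a, b in zip(feats, x))
    return min(train, key=dist)[1]

# Made-up records: features = (taxable income in K, refund as 0/1), class label.
train = [((125, 1), "No"), ((100, 0), "No"), ((95, 0), "Yes"), ((85, 1), "Yes")]
test = [((90, 0), "Yes"), ((120, 1), "No")]

# Accuracy on the held-out test set, as in the definition above.
accuracy = sum(nn_classify(train, x) == y for x, y in test) / len(test)
print(accuracy)
```

The split matters: measuring accuracy on the training set itself would reward memorization rather than generalization.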
Classification Example
Training Set:
Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

Test Set:
Refund  Marital Status  Taxable Income  Cheat
No      Single           75K            ?
Yes     Married          50K            ?
No      Married         150K            ?
Yes     Divorced         90K            ?
No      Single           40K            ?
No      Married          80K            ?

Learn a classifier (model) from the training set, then apply it to the test set.
Clustering: Definition
• Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that
  – Members of a cluster are close/similar to each other
  – Members of different clusters are dissimilar
• Usually:
  – Points are in a high-dimensional space
  – Similarity is defined using a distance measure
    • Euclidean (data points)
    • Cosine (vectors)
    • Jaccard (sets)
    • Edit distance (strings), …
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Example: Clusters & Outliers
(scatter-plot figure: several groups of points, one labeled "Cluster", plus an isolated point labeled "Outlier")
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
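One standard algorithm that fits this definition (k-means, not named on the slide) alternates assignment and re-centering under Euclidean distance; a minimal sketch on made-up 2-D points:

```python
import random

def kmeans(points, k, iters=20, seed=1):
    """Plain k-means: assign each point to the nearest center (Euclidean),
    then move each center to the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                                  + (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                   if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Two well-separated blobs of made-up 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
```

Swapping the squared-Euclidean expression for cosine, Jaccard, or edit distance gives the other variants listed above, though the "mean" step then needs a matching notion of centroid.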
Structural Learning: A Unified Framework
Supervised Learning
• Linear Regression
• Binary Classification
• Bayesian Networks
• Minimize Errors
• Maximize Likelihood
Bayesian Networks
• Definition:
– If the network structure of the model is a directed acyclic graph, the model represents a factorization of the joint probability of all random variables.
– Conditional Probability Tables
– Smoothing
– Inference
P(x1, x2, …, xn) = ∏_{i=1}^{n} p(xi | parents(xi))
Continuous Optimization Problem
• General form:

  minimize_x  f(x)
  subject to  g_i(x) ≤ 0, i = 1, …, m
              h_j(x) = 0, j = 1, …, p

  where
  – f(x): R^n → R is the objective function to be minimized over the variable x,
  – g_i(x) ≤ 0 are called inequality constraints, and
  – h_j(x) = 0 are called equality constraints.
• Special cases:
  – Unconstrained optimization: m = p = 0
  – Linearly constrained optimization: g_i(x) and h_j(x) are linear
  – Nonlinear constrained optimization
  – Quadratic programming
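For the unconstrained special case (m = p = 0), plain gradient descent is the workhorse; a minimal sketch on a hypothetical quadratic objective whose minimum is known:

```python
def gradient_descent(grad, x0, lr=0.1, iters=100):
    """Minimize f over R^n by stepping against its gradient
    (unconstrained case: no g_i or h_j)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical objective f(x) = (x0 - 3)^2 + 2*(x1 + 1)^2, minimum at (3, -1).
grad = lambda x: [2 * (x[0] - 3), 4 * (x[1] + 1)]
x_min = gradient_descent(grad, [0.0, 0.0])
print(x_min)  # close to [3.0, -1.0]
```

Constrained problems need more machinery (projections, Lagrange multipliers, interior-point methods), which is where the quadratic-programming form below comes in.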
Quadratic Programming
• The objective of quadratic programming is to find an n-dimensional vector x that will

  minimize_x  (1/2) xᵀQx + cᵀx
  subject to  Ax ≤ b

• Given:
  – a real-valued, n-dimensional vector c,
  – an n × n real symmetric matrix Q,
  – an m × n real matrix A, and
  – an m-dimensional real vector b.
Summary
• Related Courses
– AI, Neural Network, Data Mining, Machine Learning, Optimization, Graphical Models, Deep Learning
• Optimization Problems
– Combinatorial Optimization Problems
  • Search for values for each variable
  • Constraint Satisfaction Problems
– Continuous Optimization Problem
• Batch/Minibatch/Stochastic Gradient descent
• Inference
– Symbolic Logic vs. Statistical Models
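The batch, minibatch, and stochastic gradient descent variants mentioned above differ only in how many examples feed each gradient step. A minibatch-SGD sketch for least-squares linear regression (toy noise-free data; hyperparameters are illustrative):

```python
import random

def sgd_linear_regression(data, lr=0.02, epochs=1000, batch=2, seed=0):
    """Minibatch SGD for y ≈ w*x + b, minimizing mean squared error.
    batch=len(data) gives batch gradient descent; batch=1 gives pure SGD."""
    rng = random.Random(seed)
    data = list(data)                   # local copy; reshuffled each epoch
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch):
            mb = data[i:i + batch]
            # Gradient of the minibatch MSE w.r.t. w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in mb) / len(mb)
            gb = sum(2 * (w * x + b - y) for x, y in mb) / len(mb)
            w, b = w - lr * gw, b - lr * gb
    return w, b

# Noise-free toy data generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0, 1, 2, 3, 4]]
w, b = sgd_linear_regression(data)
print(w, b)  # close to 2 and 1
```

Smaller batches give cheaper, noisier steps; larger batches give smoother but costlier ones, which is exactly the trade-off the summary bullet names.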
Questions & Answers