69
Machine Learning Sourangshu Bhattacharya

Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Machine Learning

Sourangshu Bhattacharya

Page 2: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Bayesian Networks

Directed Acyclic Graph (DAG)

Page 3: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Bayesian Networks

General Factorization

Page 4: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Curve Fitting Re-visited

Page 5: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Maximum Likelihood

Determine by minimizing sum-of-squares error, .

Page 6: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Curve Fitting

Polynomial

Page 7: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Curve Fitting

Plate

Page 8: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Predictive Distribution

Page 9: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

MAP: A Step towards Bayes

Determine by minimizing regularized sum-of-squares error, .

Page 10: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Bayesian Curve Fitting (3)

Input variables and explicit hyperparameters

Page 11: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Bayesian Curve Fitting—Learning

Condition on data

Page 12: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Generative Models

Causal process for generating images

Page 13: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Discrete Variables (1)

General joint distribution: K 2 { 1 parameters

Independent joint distribution: 2(K { 1) parameters

Page 14: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Discrete Variables (2)

General joint distribution over M variables: KM { 1 parameters

M -node Markov chain: K { 1 + (M { 1)K(K { 1)

parameters

Page 15: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence

a is independent of b given c

Equivalently

Notation

Page 16: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence: Example 1

Page 17: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence: Example 1

Page 18: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence: Example 2

Page 19: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence: Example 2

Page 20: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c unobserved.

Page 21: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c observed.

Page 22: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

“Am I out of fuel?”

B = Battery (0=flat, 1=fully charged)F = Fuel Tank (0=empty, 1=full)G = Fuel Gauge Reading

(0=empty, 1=full)

and hence

Page 23: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

“Am I out of fuel?”

Probability of an empty tank increased by observing G = 0.

Page 24: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

“Am I out of fuel?”

Probability of an empty tank reduced by observing B = 0. This referred to as “explaining away”.

Page 25: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

D-separation• A, B, and C are non-intersecting subsets of nodes in a

directed graph.• A path from A to B is blocked if it contains a node such that

eithera) the arrows on the path meet either head-to-tail or tail-

to-tail at the node, and the node is in the set C, orb) the arrows meet head-to-head at the node, and

neither the node, nor any of its descendants, are in the set C.

• If all paths from A to B are blocked, A is said to be d-separated from B by C.

• If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies .

Page 26: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

D-separation: Example

Page 27: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

D-separation: Example

Page 28: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

D-separation: I.I.D. Data

Page 29: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Markov Random Fields

Markov Blanket

Page 30: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Cliques and Maximal Cliques

Clique

Maximal Clique

Page 31: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Joint Distribution

where is the potential over clique C and

is the normalization coefficient; note: MK-state variables KM terms in Z.

Energies and the Boltzmann distribution

Page 32: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Illustration: Image De-Noising (1)

Original Image Noisy Image

Page 33: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Illustration: Image De-Noising (2)

Page 34: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Illustration: Image De-Noising (3)

Noisy Image Restored Image (ICM)

Page 35: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Converting Directed to Undirected Graphs (1)

Page 36: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Converting Directed to Undirected Graphs (2)

Additional links

Moralization

Moral Graph

Page 37: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Directed vs. Undirected Graphs (1)

Page 38: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Directed vs. Undirected Graphs (2)

Page 39: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference in Graphical Models

Page 40: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference in Graphical Models

• Posterior Marginalization: P xn y1, … , yn

=

𝑥1,…,𝑥𝑛−1,𝑥𝑛+1,…,𝑥𝑁

𝑃(𝑥1, … , 𝑥𝑁|𝑦1, … , 𝑦𝑀)

• M A P:𝑥1∗, … , 𝑥𝑁

∗ = max𝑥1,…,𝑥𝑁𝑃(𝑥1, … , 𝑥𝑁|𝑦1, … , 𝑦𝑀)

Page 41: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Conditional Probability:

𝑃 𝑋 𝑌 =1

𝑍exp(𝑤𝑇𝑓 𝑋, 𝑌 )

𝑓 𝑋, 𝑌 = [𝜓12 𝑥1, 𝑥2, 𝑌 , … , 𝜓𝑛−1,𝑛 𝑥𝑛−1, 𝑥𝑛, 𝑌 ]

Conditional Random Fields

𝑋1 𝑋2 𝑋𝑛

𝑌1 𝑌2 𝑌𝑛

. . .

Page 42: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

• Observation sequence: 𝑌1, … , 𝑌𝑛; State sequence: 𝑋1, … , 𝑋𝑛• Transition Prob.: 𝑃 𝑋𝑖+1 𝑋𝑖 ; Emission prob.: 𝑃(𝑌𝑖|𝑋𝑖).

𝑃 𝑋, 𝑌 = 𝑃(𝑋1)

𝑖=1

𝑛−1

𝑃(𝑋𝑖+1|𝑋𝑖)

𝑗=1

𝑛

𝑃(𝑌𝑗|𝑋𝑗)

Hidden Markov Model

𝑋1 𝑋2 𝑋𝑛

𝑌1 𝑌2 𝑌𝑛

. . .

Page 43: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference on a Chain

Page 44: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference on a Chain

Page 45: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference on a Chain

Page 46: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference on a Chain

Page 47: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Inference on a Chain

To compute local marginals:

•Compute and store all forward messages, .

•Compute and store all backward messages, .

•Compute Z at any node xm

•Compute

for all variables required.

Page 48: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Trees

Undirected Tree Directed Tree Polytree

Page 49: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Factor Graphs

Page 50: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Factor Graphs from Directed Graphs

Page 51: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Factor Graphs from Undirected Graphs

Page 52: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (1)

Objective:

i. to obtain an efficient, exact inference algorithm for finding marginals;

ii. in situations where several marginals are required, to allow computations to be shared efficiently.

Key idea: Distributive Law

Page 53: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (2)

Page 54: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (3)

Page 55: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (4)

Page 56: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (5)

Page 57: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (6)

Page 58: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (7)

Initialization

Page 59: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Sum-Product Algorithm (8)

To compute local marginals:• Pick an arbitrary node as root

• Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.

• Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.

• Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.

Page 60: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Sum-Product: Example (1)

Page 61: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Sum-Product: Example (2)

Page 62: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Sum-Product: Example (3)

Page 63: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

Sum-Product: Example (4)

Page 64: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Max-Sum Algorithm (1)

Objective: an efficient algorithm for finding

i. the value xmax that maximises p(x);

ii. the value of p(xmax).

In general, maximum marginals joint maximum.

Page 65: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Max-Sum Algorithm (2)

Maximizing over a chain (max-product)

Page 66: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Max-Sum Algorithm (3)

Generalizes to tree-structured factor graph

maximizing as close to the leaf nodes as possible

Page 67: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Max-Sum Algorithm (4)

Max-Product Max-Sum

For numerical reasons, use

Again, use distributive law

Page 68: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Max-Sum Algorithm (5)

Initialization (leaf nodes)

Recursion

Page 69: Pattern Recognition and Machine Learning : Graphical Modelscse.iitkgp.ac.in/~sourangshu/coursefiles/ML15A/graphicalmodels.pdf · Bayesian Curve Fitting—Learning Condition on data

The Max-Sum Algorithm (6)

Termination (root node)

Back-track, for all nodes i with l factor nodes to the root (l=0)