
Learning In Bayesian Networks

General Learning Problem

Set of random variables X = {X1, X2, X3, X4, …}

Training set D = {X(1), X(2), …, X(N)}

Each observation specifies the values of a subset of the variables (? = unobserved), e.g.:

X(1) = {x1, x2, ?, x4, …}

X(2) = {x1, x2, x3, x4, …}

X(3) = {?, x2, x3, x4, …}

Goal

Predict the joint distribution over some variables given others, e.g., P(X1, X3 | X2, X4)

Classes Of Graphical Model Learning Problems

Network structure known, all variables observed
Network structure known, some missing data (or latent variables)
Network structure not known, all variables observed
Network structure not known, some missing data (or latent variables)

Some of these cases we'll cover today and next class; the rest we're going to skip (not too relevant for the papers we'll read; see the optional readings for more info).

If Network Structure Is Known, The Problem Involves Learning Distributions

The distributions are characterized by parameters 𝝝.

Goal: infer the 𝝝 that best explains the data D.

Bayesian methods

Maximum a posteriori

Maximum likelihood

Max. pseudo-likelihood

Moment matching

Learning CPDs When All Variables Are Observed And Network Structure Is Known

Trivial problem if we’re doing Maximum Likelihood estimation

Network: X and Y are parents of Z.

Parameters to estimate: P(X), P(Y), and P(Z | X, Y) for each parent configuration:

X Y P(Z | X, Y)
0 0 ?
0 1 ?
1 0 ?
1 1 ?

Training data:

X Y Z
0 0 1
0 1 1
0 1 0
1 1 1
1 1 1
1 0 0
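As a minimal illustration (not from the slides), maximum likelihood here reduces to counting and normalizing; the Python sketch below fills in the tables from the training data above.

```python
from collections import Counter

# Training data from the slide: each row is (X, Y, Z).
data = [(0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 1), (1, 1, 1), (1, 0, 0)]
N = len(data)

# ML estimates are relative frequencies.
p_x1 = sum(x for x, _, _ in data) / N          # P(X = 1)
p_y1 = sum(y for _, y, _ in data) / N          # P(Y = 1)

# P(Z = 1 | X, Y): co-occurrence count divided by parent-configuration count.
joint = Counter((x, y, z) for x, y, z in data)
parents = Counter((x, y) for x, y, _ in data)
p_z1_given_xy = {(x, y): joint[(x, y, 1)] / parents[(x, y)] for (x, y) in parents}

print(p_x1, p_y1, p_z1_given_xy)
```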

Recasting Learning As Bayesian Inference

We've already used Bayesian inference in probabilistic models to compute posteriors on latent (a.k.a. hidden, unobservable) variables from data.

E.g., Weiss model: direction of motion

E.g., Gaussian mixture model: to which cluster does each data point belong

Why not treat unknown parameters in the same way?

E.g., Gaussian mixture parameters

E.g., entries in conditional probability tables

Recasting Learning As Inference

Suppose you have a coin with an unknown bias, θ ≡ P(head).

You flip the coin multiple times and observe the outcome.

From the observations, you can infer the bias of the coin.

This is learning, and it is also inference.
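A minimal sketch of this idea, assuming a conjugate Beta prior on θ (the prior and the flip data below are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import beta

# Observed flips (1 = heads); made-up data for illustration.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1])
n_heads = int(flips.sum())
n_tails = len(flips) - n_heads

# Prior over the unknown bias theta: Beta(a0, b0); (1, 1) is uniform.
a0, b0 = 1.0, 1.0

# Conjugacy: the posterior over theta is again a Beta distribution.
posterior = beta(a0 + n_heads, b0 + n_tails)

print("posterior mean of theta:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```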

Treating Conditional Probabilities As Latent Variables

Graphical model probabilities (priors, conditional distributions) can also be cast as random variables

E.g., Gaussian mixture model

Remove the knowledge “built into” the links (conditional distributions) and the nodes (prior distributions).

Create new random variables to represent that knowledge.

Hierarchical Bayesian Inference

[Figure: graphical models over latent variable z and observation x; in the hierarchical versions, the parameter λ and the hyperparameter 𝛼 appear as additional random-variable nodes.]

Slides stolen from David Heckerman tutorial

I'm going to flip a coin with unknown bias. Depending on whether it comes up heads or tails, I will flip one of two other coins, each with an unknown bias.

X: outcome of the first flip
Y: outcome of the second flip

[Figure: the model replicated for training example 1 and training example 2, with shared parameter nodes.]

Parameters might not be independent

[Figure: training example 1 and training example 2 again, now with dependencies among the parameter nodes.]

General Approach: Bayesian Learning of Probabilities in a Bayes Net

If network structure Sh known and no missing data…

We can express joint distribution over variables X in terms of model parameter vector θs

Given random sample D = {X(1), X(2), …, X(N)}, compute the posterior distribution p(θs | D, Sh)

Probabilistic formulation of all supervised and unsupervised learning problems.

Computing Parameter Posteriors

E.g., net structure X→Y

Computing Parameter Posteriors

Given complete data (all X, Y observed) and no direct dependencies among parameters, the posterior factorizes: p(θx, θy|x | D) = p(θx | D) p(θy|x | D)

Explanation: Given complete data, each set of parameters is disconnected from every other set of parameters in the graph.

[Figure: the net X → Y with parameter nodes θx and θy|x. Given complete data, θx and θy|x are d-separated, which yields parameter independence.]
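To make the payoff concrete, here is a small sketch (binary X and Y, Beta(1, 1) priors, made-up complete data, all assumptions of mine): because of parameter independence, each parameter's posterior is updated from its own counts, with no joint computation.

```python
from collections import Counter

# Complete data for the net X -> Y: each row is (X, Y); values are illustrative.
data = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 1)]

a, b = 1.0, 1.0          # Beta(1, 1) prior pseudo-counts for every parameter
n = Counter(data)
n_x1 = sum(count for (x, _), count in n.items() if x == 1)
n_x0 = len(data) - n_x1

# Each posterior is a Beta distribution updated independently from its own counts.
post_theta_x = (a + n_x1, b + n_x0)
post_theta_y_given_x = {x: (a + n[(x, 1)], b + n[(x, 0)]) for x in (0, 1)}

print(post_theta_x, post_theta_y_given_x)
```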

Posterior Predictive Distribution

Given parameter posteriors p(θs | D, Sh)

What is the prediction of the next observation X(N+1)?

p(X(N+1) | D, Sh) = ∫ p(X(N+1) | θs, Sh) p(θs | D, Sh) dθs

How can this be used for unsupervised and supervised learning?

What does this formula look like if we're doing approximate inference by sampling θs and/or X(N+1)?

In this formula, p(X(N+1) | θs, Sh) is what we talked about the past three classes; p(θs | D, Sh) is what we just discussed.
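As a rough sketch of the sampling question above, using the coin example for concreteness (the posterior parameters are assumed for illustration): draw θ from its posterior and average the predictive probability over the draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed Beta posterior over the coin bias (e.g., pseudo-counts 1 + 6 heads, 1 + 2 tails).
a_post, b_post = 7.0, 3.0

# Monte Carlo posterior predictive:
# p(X(N+1) = heads | D) ≈ (1/S) * sum_s p(X(N+1) = heads | theta_s), theta_s ~ p(theta | D).
theta_samples = rng.beta(a_post, b_post, size=100_000)
p_heads = theta_samples.mean()   # p(heads | theta) = theta, so just average the samples

print(p_heads)                   # ≈ a_post / (a_post + b_post) = 0.7
```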

Prediction Directly From Data

In many cases, prediction can be made without explicitly computing posteriors over parameters. E.g., the coin toss example from an earlier class.

With a Beta(αh, αt) prior on θ, the posterior distribution is Beta(αh + Nh, αt + Nt), where Nh and Nt are the observed counts of heads and tails.

Prediction of next coin outcome: p(X(N+1) = heads | D) = (αh + Nh) / (αh + αt + Nh + Nt)
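A tiny numerical check (with assumed pseudo-counts and counts) that this prediction comes directly from the counts, with no explicit posterior needed; it matches the posterior mean used in the sampling sketch above.

```python
# Assumed prior pseudo-counts and observed counts.
a_h, a_t = 1, 1
n_h, n_t = 6, 2

# Predictive probability of heads directly from counts.
p_next_heads = (a_h + n_h) / (a_h + a_t + n_h + n_t)
print(p_next_heads)   # 7 / 10 = 0.7
```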

Terminology Digression

Bernoulli : Binomial :: Categorical : Multinomial

Bernoulli: single coin flip
Binomial: N flips of a coin (e.g., exam score)
Categorical: single roll of a die
Multinomial: N rolls of a die (e.g., # of each major)

Generalizing To Categorical RVs In Bayes Net

Variable Xi is discrete, with values xi1, …, xiri (ri = number of values of Xi)

i: index of categorical RV
j: index over configurations of the parents of node i
k: index over values of node i

Unrestricted distribution: one parameter θijk per probability, i.e., θijk = P(Xi = xik | parents of Xi in configuration j)

[Figure: node Xi with parents Xa and Xb] (notation change)

Prediction Directly From Data: Categorical Random Variables

With a Dirichlet prior on θij· (pseudo-counts αij1, …, αijri), the posterior distribution is also Dirichlet, with counts αijk + Nijk, where Nijk is the number of cases in D in which Xi = xik and the parents of Xi are in configuration j.

Posterior predictive distribution: P(Xi = xik | parent configuration j, D) = (αijk + Nijk) / Σk′ (αijk′ + Nijk′)

i: index over nodes
j: index over values of the parents of node i
k: index over values of node i
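A minimal sketch of this predictive rule for a single node and a single parent configuration, with assumed pseudo-counts αijk and data counts Nijk:

```python
import numpy as np

# One node X_i with r_i = 3 values, one parent configuration j (all values assumed).
alpha_ijk = np.array([1.0, 1.0, 1.0])   # Dirichlet pseudo-counts
N_ijk = np.array([5, 0, 2])             # data counts for each value of X_i

# Posterior predictive is proportional to pseudo-counts plus data counts.
p_pred = (alpha_ijk + N_ijk) / (alpha_ijk + N_ijk).sum()

print(p_pred)   # [0.6, 0.1, 0.3]
```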

Other Easy Cases

Members of the exponential family

see Barber text 8.5

Linear regression with Gaussian noise (a brief sketch follows below)

see Barber text 18.1
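Not Barber's treatment itself, but a minimal conjugate sketch under simplifying assumptions (known noise variance, isotropic Gaussian prior on the weights): the posterior over the weights is Gaussian with the closed-form mean and covariance below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = w0 + w1 * x + Gaussian noise (all settings assumed).
x = rng.uniform(-1.0, 1.0, size=50)
Phi = np.column_stack([np.ones_like(x), x])        # design matrix with bias column
true_w = np.array([0.5, -2.0])
noise_var = 0.1
y = Phi @ true_w + rng.normal(0.0, np.sqrt(noise_var), size=50)

# Gaussian prior w ~ N(0, prior_var * I); with Gaussian noise the posterior is Gaussian.
prior_var = 10.0
post_cov = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(2) / prior_var)
post_mean = post_cov @ (Phi.T @ y) / noise_var

print("posterior mean of w:", post_mean)
```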
