Bayesian Learning, cont’d. Administrivia Homework 1 returned today (details in a second) Reading 2 assigned today S. Thrun, Learning occupancy grids with

Bayesian Learning, cont’d

Administrivia

•Homework 1 returned today (details in a second)

•Reading 2 assigned today

•S. Thrun, Learning occupancy grids with forward sensor models. Autonomous Robots, 2002.

•Due: Oct 26

•Much crunchier than the first! Don’t slack.

•Work with your group to sort out the math.

•Questions to mailing list and me.

•Midterm exam: Oct 21

Homework 1 results

•Mean = 30.3; std = 6.9

IID Samples

•In supervised learning, we usually assume that data points are sampled independently and from the same distribution

•IID assumption: data are independent and identically distributed

•⇒ joint PDF can be written as product of individual (marginal) PDFs:

$$f(x_1, \dots, x_N; \Theta) = \prod_{i=1}^{N} f(x_i; \Theta)$$
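As a small illustration of the factorization, the sketch below evaluates the joint density of a few points as a product of individual densities, and the log of the joint density as a sum of logs. The Gaussian model, the function names, and the sample values are assumptions made for the example, not part of the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density f(x; mu, sigma) (assumed model)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def joint_pdf(xs, mu, sigma):
    """Joint density of IID points: product of the individual densities."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in xs)

def joint_log_pdf(xs, mu, sigma):
    """Log of the joint density: the product turns into a sum of logs."""
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in xs)

xs = [0.5, -1.2, 0.3, 2.0]  # made-up sample, for illustration only
p = joint_pdf(xs, mu=0.0, sigma=1.0)
log_p = joint_log_pdf(xs, mu=0.0, sigma=1.0)
```

Working with the sum of logs is usually preferable in practice, since a product of many small densities underflows quickly.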

The max likelihood recipe

•Start with IID data

•Assume model for individual data point, f(X;Θ)

•Construct joint likelihood function (PDF):

$$L(\Theta) = \prod_{i=1}^{N} f(x_i; \Theta)$$

•Find the params Θ that maximize L

•(If you’re lucky): Differentiate L w.r.t. Θ, set the derivative to 0, and solve for Θ

•Repeat for each class

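Exercise

•Find the maximum likelihood estimator of μ for the univariate Gaussian:

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$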

•Find the maximum likelihood estimator of β for the degenerate gamma distribution:

•Hint: consider the log of the likelihood fns in both cases

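Solutions

•PDF for one data point:

$$f(x_i; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

•Joint likelihood of N data points:

$$L(\mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$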

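Solutions

•Log-likelihood:

$$\log L(\mu) = -N \log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i-\mu)^2$$

•Differentiate w.r.t. μ:

$$\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (x_i-\mu) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$$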
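A quick numerical sanity check of this result, as a sketch: for a univariate Gaussian with σ held fixed, the derivative of the log-likelihood with respect to μ should vanish at the sample mean, and the log-likelihood should be lower on either side of it. The data values, the choice σ = 1, and the function names are made up for the example.

```python
import math

def log_likelihood(xs, mu, sigma=1.0):
    """Gaussian log-likelihood of IID data xs as a function of mu."""
    n = len(xs)
    const = -n * math.log(math.sqrt(2.0 * math.pi) * sigma)
    return const - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)

def d_log_likelihood_d_mu(xs, mu, sigma=1.0):
    """Derivative of the log-likelihood with respect to mu."""
    return sum(x - mu for x in xs) / sigma ** 2

xs = [2.1, 1.9, 3.4, 2.6, 2.0]  # made-up sample
mu_hat = sum(xs) / len(xs)      # candidate estimator: the sample mean

slope_at_mu_hat = d_log_likelihood_d_mu(xs, mu_hat)
ll_at_mu_hat = log_likelihood(xs, mu_hat)
ll_left = log_likelihood(xs, mu_hat - 0.1)
ll_right = log_likelihood(xs, mu_hat + 0.1)
```

Here `slope_at_mu_hat` is zero up to rounding, and both neighbouring values of the log-likelihood are smaller than `ll_at_mu_hat`.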







Solutions

•What about for the gamma PDF?

Putting the parts together

[Figure: diagram with the labels "[X,Y]: complete training data", "Assumed distribution family (hyp. space) w/ parameters Θ", "Parameters for class a", and "Specific PDF for class a".]
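One way to read the diagram is as a per-class fitting loop: split the training data by label, then run the maximum likelihood recipe on each class separately. The sketch below assumes one-dimensional features and a Gaussian family, and uses the maximum likelihood estimates (sample mean, and standard deviation with a 1/N normalization). The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian(xs):
    """Maximum likelihood (mu, sigma) for a univariate Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # ML estimate uses 1/N
    return mu, math.sqrt(var)

def fit_per_class(X, Y):
    """Group the points in X by their label in Y and fit each class."""
    by_class = defaultdict(list)
    for x, y in zip(X, Y):
        by_class[y].append(x)
    return {label: fit_gaussian(points) for label, points in by_class.items()}

# Toy training data [X, Y]: feature values and their class labels.
X = [1.0, 1.2, 0.8, 3.9, 4.1, 4.0]
Y = ["a", "a", "a", "b", "b", "b"]
params = fit_per_class(X, Y)  # e.g. params["a"] holds (mu, sigma) for class a
```

Each entry of `params` pins down one specific PDF from the assumed family, which is the last box in the diagram.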



Gaussian Distributions

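5 minutes of math...

•Recall your friend the Gaussian PDF:

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

•I asserted that the d-dimensional form is:

$$f(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

•Let’s look at the parts...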
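To make the parts concrete, here is a minimal sketch of the d-dimensional Gaussian density specialized to d = 2, so that the determinant and inverse of the covariance matrix can be written out by hand. The function name and the test values are assumptions for illustration.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Multivariate Gaussian density for d = 2.

    x and mu are length-2 sequences; cov is a 2x2 covariance matrix
    given as nested lists.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c  # |Sigma|
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # Sigma^{-1}
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))
    norm = 2.0 * math.pi * math.sqrt(det)  # (2 pi)^{d/2} |Sigma|^{1/2}, d = 2
    return math.exp(-0.5 * quad) / norm

# Density at the mean of a standard 2-D Gaussian.
peak = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# With a diagonal covariance the density factors into two 1-D Gaussians.
diag = gaussian_pdf_2d([1.0, 2.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 9.0]])
```

The quadratic form measures the distance from the mean, scaled by the spread in each direction, and the normalizer makes the density integrate to 1.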

5 minutes of math...

•Ok, but what do the parts mean?

•Mean vector, μ: mean of data along each dimension

5 minutes of math...

•Covariance matrix, Σ

•Like variance, but describes spread of data

5 minutes of math...

•Note: covariances on the diagonal of Σ are the same as the standard variances along that dimension of the data

•But what about skewed data?

5 minutes of math...

•Off-diagonal covariances ($\sigma_{ij}$, $i \neq j$) describe the pairwise variance

•How much $x_i$ changes as $x_j$ changes (on avg)

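5 minutes of math...

•Calculating Σ from data:

•In practice: you want to measure the covariance between every pair of random variables (dimensions):

$$\sigma_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{ki} - \mu_i)(x_{kj} - \mu_j)$$

•Or, in linear algebra:

$$\Sigma = \frac{1}{N} \sum_{k=1}^{N} (\mathbf{x}_k - \boldsymbol{\mu})(\mathbf{x}_k - \boldsymbol{\mu})^{T}$$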
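The two views of the covariance calculation can be sketched side by side: one entry at a time for every pair of dimensions, or as an average of outer products of the centered data vectors. The 1/N normalization below is an assumption (it is the maximum likelihood estimate; the unbiased estimate divides by N − 1 instead), and the function names and data are made up for the example.

```python
def mean_vector(data):
    """Mean of the data along each dimension."""
    n, d = len(data), len(data[0])
    return [sum(row[j] for row in data) / n for j in range(d)]

def covariance_pairwise(data):
    """Covariance matrix computed one (i, j) pair of dimensions at a time."""
    n, d = len(data), len(data[0])
    mu = mean_vector(data)
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / n
             for j in range(d)]
            for i in range(d)]

def covariance_outer(data):
    """Covariance matrix as the average outer product of centered vectors."""
    n, d = len(data), len(data[0])
    mu = mean_vector(data)
    total = [[0.0] * d for _ in range(d)]
    for row in data:
        diff = [row[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                total[i][j] += diff[i] * diff[j]  # (x - mu)(x - mu)^T
    return [[value / n for value in r] for r in total]

# Made-up 2-D data in which the second dimension grows with the first.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
cov_a = covariance_pairwise(data)
cov_b = covariance_outer(data)
```

Both functions return the same symmetric matrix: the diagonal holds the per-dimension variances, and the positive off-diagonal entry reflects that the two dimensions increase together.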
