Bayesian Learning, Cont’d



Page 1

Bayesian Learning, Cont’d

Page 2

Administrivia

• Various homework bugs:
• Due: Oct 12 (Tues), not 9 (Sat)
• Problem 3 should read:
• (duh)
• (Some) info on naive Bayes in Sec. 4.3 of text

Page 3

Administrivia

• Another bug in last time’s lecture:
• Multivariate Gaussian should look like:
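The corrected equation on this slide was an image and did not survive extraction; the standard multivariate Gaussian density it refers to (for a d-dimensional x with mean μ and covariance Σ) is:

```latex
f(\mathbf{x}; \boldsymbol{\mu}, \Sigma) =
  \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```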

Page 4

5 minutes of math...

• Joint probabilities
• Given d different random vars,
• the “joint” probability of them taking on the simultaneous values
• is given by
• or, for shorthand,
• Closely related to the “joint PDF”
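The equations on this slide were images; in standard notation (the variable names X_1, …, X_d are my reconstruction, not the lecture's), the joint probability and its shorthand are:

```latex
P(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_d = x_d)
  \quad\equiv\quad
P(x_1, x_2, \ldots, x_d),
```

with the corresponding joint PDF written f(x_1, …, x_d).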

Page 5

5 minutes of math...

• Independence:
• Two random variables are statistically independent iff:
• Or, equivalently (usually for discrete RVs):
• For multivariate RVs:
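The missing equations here are the standard definitions; assuming RVs X and Y with joint PDF f, the two equivalent conditions and the multivariate version are:

```latex
f(x, y) = f_X(x)\, f_Y(y)
\qquad\Longleftrightarrow\qquad
P(X = x,\, Y = y) = P(X = x)\, P(Y = y),
```

and for multivariate RVs, f(x_1, …, x_d) = ∏ᵢ fᵢ(xᵢ).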

Page 6

Exercise

• Suppose you’re given the PDF:
• where z is a normalizing constant.
• What must z be to make this a legitimate PDF?
• Are the two variables independent? Why or why not?
• What about the PDF:

Page 7

Parameterizing PDFs

• Given training data, [X, Y], w/ discrete labels Y
• Break data out into sets, etc.
• Want to come up with models:
• Suppose the individual f()’s are Gaussian; need the params μ and σ
• How do you get the params?
• Now, what if the f()’s are something really funky you’ve never seen before in your life, with parameters, etc.?

Page 8

Maximum likelihood

• Principle of maximum likelihood:
• Pick the parameters that make the data as probable (or, in general, “likely”) as possible
• Regard the probability function as a function of two variables, data and parameters:
• The function L is the “likelihood function”
• Want to pick the Θ that maximizes L
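The likelihood function on the slide was an image; the usual definition, using the Θ notation that appears later in the deck, is:

```latex
L(X; \Theta) = f(X; \Theta),
\quad \text{viewed as a function of } \Theta \text{ with the data } X \text{ held fixed.}
```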

Page 9

Example

• Consider the exponential PDF:
• Can think of this as either a function of x or of τ
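The PDF image is missing. A common scale (mean) parameterization of the exponential with parameter τ is shown below; note the rate parameterization f(x; τ) = τe^{-τx} is equally common, and which one the lecture used is not recoverable from the text:

```latex
f(x; \tau) = \frac{1}{\tau}\, e^{-x/\tau},
\qquad x \ge 0,\ \tau > 0
```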

Page 10

Exponential as fn of x

Page 11

Exponential as a fn of τ
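The plot on this slide is an image that didn't survive. As a minimal sketch, assuming the scale parameterization f(x; τ) = (1/τ)e^{-x/τ} (an assumption, not confirmed by the text), the likelihood of a single observation x, viewed as a function of τ, peaks at τ = x:

```python
import numpy as np

x = 2.0                                  # one fixed observation
taus = np.linspace(0.1, 10.0, 1000)      # grid of candidate parameter values

# Likelihood of x as a function of tau: L(tau) = (1/tau) * exp(-x/tau)
L = (1.0 / taus) * np.exp(-x / taus)

tau_best = taus[np.argmax(L)]            # grid maximizer
```

Setting dL/dτ = 0 gives τ = x analytically, so the grid maximizer lands near 2.0 here.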

Page 12

Max likelihood params

• So, for a fixed set of data, X, want the parameter that maximizes L
• Hold X constant, optimize
• How?
• More important: f() is usually a function of a single data point (possibly a vector), but L is a function of a set of data
• How do you extend f() to a set of data?


Page 14

IID Samples

• In supervised learning, we usually assume that data points are sampled independently and from the same distribution
• IID assumption: data are independent and identically distributed
• ⇒ joint PDF can be written as a product of individual (marginal) PDFs:
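The product equation was an image; under the IID assumption, for n data points it is:

```latex
L(X; \Theta) = f(x_1, \ldots, x_n; \Theta) = \prod_{i=1}^{n} f(x_i; \Theta)
```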

Page 15

The max likelihood recipe

• Start with IID data
• Assume a model for an individual data point, f(X; Θ)
• Construct the joint likelihood function (PDF):
• Find the params Θ that maximize L
• (If you’re lucky): differentiate L w.r.t. Θ, set = 0, and solve
• Repeat for each class
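The recipe above can be sketched end-to-end for the exponential example, again assuming the scale form f(x; τ) = (1/τ)e^{-x/τ} (my assumption; the names and data here are illustrative, not the lecture's). For an IID sample, setting the derivative of log L to zero solves in closed form to τ̂ = the sample mean, which a grid search over the same objective confirms:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=3.0, size=10_000)   # IID sample, true tau = 3.0

def neg_log_likelihood(tau, x):
    # -log L(X; tau) for f(x; tau) = (1/tau) * exp(-x/tau), summed over IID points
    return len(x) * np.log(tau) + x.sum() / tau

# "If you're lucky" step: d/dtau log L = 0 solves to tau_hat = mean(x)
tau_closed = data.mean()

# Numeric sanity check: grid-search the same objective
taus = np.linspace(0.5, 10.0, 2000)
tau_grid = taus[np.argmin([neg_log_likelihood(t, data) for t in taus])]
```

Both estimates land near the true value 3.0; in a classifier this step would be repeated once per class, as the last bullet says.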

Page 16

Exercise

• Find the maximum likelihood estimator of μ for the univariate Gaussian:
• Find the maximum likelihood estimator of β for the degenerate gamma distribution:
• Hint: consider the log of the likelihood functions in both cases

Page 17

Putting the parts together

[Figure: the complete training data, [X, Y]]

Page 18

5 minutes of math...

• Marginal probabilities
• If you have a joint PDF:
• ... and want to know the probability of just one RV (regardless of what happens to the others)
• Marginal PDF of one variable or the other:
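The marginalization equations were images; for a joint PDF f(x, y), the standard forms (integrating out the other variable) are:

```latex
f_X(x) = \int f(x, y)\, dy
\qquad\qquad
f_Y(y) = \int f(x, y)\, dx
```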

Page 19

5 minutes of math...

• Conditional probabilities
• Suppose you have a joint PDF, f(H, W)
• Now you get to see one of the values, e.g., H = “183cm”
• What’s your probability estimate of W, given this new knowledge?
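The answer equation was an image; the standard conditional PDF, in the H/W notation of this slide, is:

```latex
f(W \mid H = h) = \frac{f(h, W)}{f_H(h)} = \frac{f(h, W)}{\int f(h, w)\, dw}
```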


Page 21

Everything’s random...

• Basic Bayesian viewpoint:
• Treat (almost) everything as a random variable
• Data/independent var: X vector
• Class/dependent var: Y
• Parameters: Θ
• E.g., mean, variance, correlations, multinomial params, etc.
• Use Bayes’ Rule to assess probabilities of classes
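Bayes’ Rule itself appeared only as an image; in the deck’s notation (class Y, data X), the class posterior is:

```latex
P(Y = y \mid X)
  = \frac{f(X \mid Y = y)\, P(Y = y)}{f(X)}
  = \frac{f(X \mid Y = y)\, P(Y = y)}{\sum_{y'} f(X \mid Y = y')\, P(Y = y')}
```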