Bayesian Learning, Cont’d



Page 1

Bayesian Learning, Cont’d

Page 2

Administrivia

• Various homework bugs:
• Due: Oct 12 (Tues), not 9 (Sat)
• Problem 3 should read:
• (duh)
• (Some) info on naive Bayes in Sec. 4.3 of text

Page 3

Administrivia

• Another bug in last time’s lecture:
• Multivariate Gaussian should look like:
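The corrected equation on this slide was an image and did not survive extraction; the standard multivariate Gaussian density it refers to (for a d-dimensional x with mean μ and covariance Σ) is:

```latex
f(\mathbf{x}; \boldsymbol{\mu}, \Sigma) =
  \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```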

Page 4

5 minutes of math...

• Joint probabilities
• Given d different random vars,
• the “joint” probability of them taking on the simultaneous values
• is given by
• or, for shorthand,
• Closely related to the “joint PDF”
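The equations on this slide were images; in standard notation (the variable names X_1, …, X_d are my reconstruction, not the lecture's), the joint probability and its shorthand are:

```latex
P(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_d = x_d)
  \quad\equiv\quad
P(x_1, x_2, \ldots, x_d),
```

with the corresponding joint PDF written f(x_1, …, x_d).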

Page 5

5 minutes of math...

• Independence:
• Two random variables are statistically independent iff:
• Or, equivalently (usually for discrete RVs):
• For multivariate RVs:
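The missing equations here are the standard definitions; assuming RVs X and Y with joint PDF f, the two equivalent conditions and the multivariate version are:

```latex
f(x, y) = f_X(x)\, f_Y(y)
\qquad\Longleftrightarrow\qquad
P(X = x,\, Y = y) = P(X = x)\, P(Y = y),
```

and for multivariate RVs, f(x_1, …, x_d) = ∏ᵢ fᵢ(xᵢ).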

Page 6

Exercise

• Suppose you’re given the PDF:
• where z is a normalizing constant.
• What must z be to make this a legitimate PDF?
• Are the two variables independent? Why or why not?
• What about the PDF:

Page 7

Parameterizing PDFs

• Given training data, [X, Y], w/ discrete labels Y
• Break data out into sets, etc.
• Want to come up with models:
• Suppose the individual f()’s are Gaussian; need the params μ and σ
• How do you get the params?
• Now, what if the f()’s are something really funky you’ve never seen before in your life, with parameters, etc.?

Page 8

Maximum likelihood

• Principle of maximum likelihood:
• Pick the parameters that make the data as probable (or, in general, “likely”) as possible
• Regard the probability function as a function of two variables, data and parameters:
• The function L is the “likelihood function”
• Want to pick the Θ that maximizes L
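The likelihood function on the slide was an image; the usual definition, using the Θ notation that appears later in the deck, is:

```latex
L(X; \Theta) = f(X; \Theta),
\quad \text{viewed as a function of } \Theta \text{ with the data } X \text{ held fixed.}
```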

Page 9

Example

• Consider the exponential PDF:
• Can think of this as either a function of x or of τ
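The PDF image is missing. A common scale (mean) parameterization of the exponential with parameter τ is shown below; note the rate parameterization f(x; τ) = τe^{-τx} is equally common, and which one the lecture used is not recoverable from the text:

```latex
f(x; \tau) = \frac{1}{\tau}\, e^{-x/\tau},
\qquad x \ge 0,\ \tau > 0
```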

Page 10

Exponential as fn of x

Page 11

Exponential as a fn of τ
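The plot on this slide is an image that didn't survive. As a minimal sketch, assuming the scale parameterization f(x; τ) = (1/τ)e^{-x/τ} (an assumption, not confirmed by the text), the likelihood of a single observation x, viewed as a function of τ, peaks at τ = x:

```python
import numpy as np

x = 2.0                                  # one fixed observation
taus = np.linspace(0.1, 10.0, 1000)      # grid of candidate parameter values

# Likelihood of x as a function of tau: L(tau) = (1/tau) * exp(-x/tau)
L = (1.0 / taus) * np.exp(-x / taus)

tau_best = taus[np.argmax(L)]            # grid maximizer
```

Setting dL/dτ = 0 gives τ = x analytically, so the grid maximizer lands near 2.0 here.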

Page 12

Max likelihood params

• So, for a fixed set of data, X, want the parameter that maximizes L
• Hold X constant, optimize
• How?
• More important: f() is usually a function of a single data point (possibly a vector), but L is a function of a set of data
• How do you extend f() to a set of data?


Page 14

IID Samples

• In supervised learning, we usually assume that data points are sampled independently and from the same distribution
• IID assumption: data are independent and identically distributed
• ⇒ joint PDF can be written as a product of individual (marginal) PDFs:
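The product equation was an image; under the IID assumption, for n data points it is:

```latex
L(X; \Theta) = f(x_1, \ldots, x_n; \Theta) = \prod_{i=1}^{n} f(x_i; \Theta)
```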

Page 15

The max likelihood recipe

• Start with IID data
• Assume a model for an individual data point, f(X; Θ)
• Construct the joint likelihood function (PDF):
• Find the params Θ that maximize L
• (If you’re lucky): differentiate L w.r.t. Θ, set = 0, and solve
• Repeat for each class
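The recipe above can be sketched end-to-end for the exponential example, again assuming the scale form f(x; τ) = (1/τ)e^{-x/τ} (my assumption; the names and data here are illustrative, not the lecture's). For an IID sample, setting the derivative of log L to zero solves in closed form to τ̂ = the sample mean, which a grid search over the same objective confirms:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=3.0, size=10_000)   # IID sample, true tau = 3.0

def neg_log_likelihood(tau, x):
    # -log L(X; tau) for f(x; tau) = (1/tau) * exp(-x/tau), summed over IID points
    return len(x) * np.log(tau) + x.sum() / tau

# "If you're lucky" step: d/dtau log L = 0 solves to tau_hat = mean(x)
tau_closed = data.mean()

# Numeric sanity check: grid-search the same objective
taus = np.linspace(0.5, 10.0, 2000)
tau_grid = taus[np.argmin([neg_log_likelihood(t, data) for t in taus])]
```

Both estimates land near the true value 3.0; in a classifier this step would be repeated once per class, as the last bullet says.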

Page 16

Exercise

• Find the maximum likelihood estimator of μ for the univariate Gaussian:
• Find the maximum likelihood estimator of β for the degenerate gamma distribution:
• Hint: consider the log of the likelihood functions in both cases

Page 17

Putting the parts together

[Figure: the complete training data, [X, Y]]

Page 18

5 minutes of math...

• Marginal probabilities
• If you have a joint PDF:
• ... and want to know the probability of just one RV (regardless of what happens to the others)
• Marginal PDF of one variable or the other:
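The marginalization equations were images; for a joint PDF f(x, y), the standard forms (integrating out the other variable) are:

```latex
f_X(x) = \int f(x, y)\, dy
\qquad\qquad
f_Y(y) = \int f(x, y)\, dx
```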

Page 19

5 minutes of math...

• Conditional probabilities
• Suppose you have a joint PDF, f(H, W)
• Now you get to see one of the values, e.g., H = “183cm”
• What’s your probability estimate of W, given this new knowledge?
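The answer equation was an image; the standard conditional PDF, in the H/W notation of this slide, is:

```latex
f(W \mid H = h) = \frac{f(h, W)}{f_H(h)} = \frac{f(h, W)}{\int f(h, w)\, dw}
```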


Page 21

Everything’s random...

• Basic Bayesian viewpoint:
• Treat (almost) everything as a random variable
• Data/independent var: X vector
• Class/dependent var: Y
• Parameters: Θ
• E.g., mean, variance, correlations, multinomial params, etc.
• Use Bayes’ Rule to assess probabilities of classes
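Bayes’ Rule itself appeared only as an image; in the deck’s notation (class Y, data X), the class posterior is:

```latex
P(Y = y \mid X)
  = \frac{f(X \mid Y = y)\, P(Y = y)}{f(X)}
  = \frac{f(X \mid Y = y)\, P(Y = y)}{\sum_{y'} f(X \mid Y = y')\, P(Y = y')}
```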