
Page 1

Bayesian Statistics

AI Friends Seminar

Ganguk Hwang

Department of Mathematical Sciences, KAIST

AI Friends Seminar Ganguk Hwang Bayesian Statistics January 2019 1 / 37

Page 2

Introduction

Page 3

Introduction

Bayesian Inference

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available.

Bayesian updating is particularly important in the dynamic analysis of a sequence of data.

Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law.

(in Wikipedia)

Page 4

Introduction

Thomas Bayes and Bayes’ Theorem

Thomas Bayes (1701–1761) was an English statistician, philosopher and Presbyterian minister who is known for formulating a specific case of the theorem that bears his name: Bayes' theorem.

Bayes never published what would become his most famous accomplishment; his notes were edited and published after his death by Richard Price.

(in Wikipedia)

Bayes' Theorem
For a partition E1, E2, …, EN of the sample space Ω and an event F,

P(Ei|F) = P(Ei ∩ F)/P(F) = P(F|Ei)P(Ei) / Σ_{j=1}^N P(F|Ej)P(Ej).
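As a concrete check of the theorem, the posterior over a finite partition can be computed directly (a minimal sketch; the numbers below are hypothetical):

```python
# Bayes' theorem over a finite partition {E_i} of the sample space.
# Hypothetical numbers: three events with priors P(E_i) and
# conditional probabilities P(F | E_i).
prior = [0.5, 0.3, 0.2]   # P(E_i), sums to 1
lik = [0.9, 0.5, 0.1]     # P(F | E_i)

# Total probability: P(F) = sum_j P(F | E_j) P(E_j)
p_f = sum(l * p for l, p in zip(lik, prior))

# Bayes' theorem: P(E_i | F) = P(F | E_i) P(E_i) / P(F)
posterior = [l * p / p_f for l, p in zip(lik, prior)]

print(p_f)         # ≈ 0.62
print(posterior)   # sums to 1; the first event dominates
```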

Page 5

Introduction

Distribution Function and Parameter

The distribution function F (x) of a random variable X is defined by

F(x) = P(X ≤ x).

The parameter of a distribution function is the value that determines the distribution.

For instance, when we consider a Bernoulli random variable X with P(X = 1) = p, P(X = 0) = 1 − p, the value p determines the distribution of X and is the parameter of the distribution of X.

Page 6

Introduction

Likelihood Function

Likelihood Function
Let X1, X2, …, Xn be independent and identically distributed (i.i.d.) random variables with parameter θ. The probability mass (or density) function of Xi is given by p(x|θ). Then the likelihood L(x1, x2, …, xn|θ) is defined by

L(x1, x2, …, xn|θ) = p(x1|θ)p(x2|θ) ⋯ p(xn|θ).

Maximum Likelihood Estimator (MLE) of θ

The MLE is the value of θ that maximizes the likelihood function.

Here, we consider the likelihood function as a function of θ.

To compute the MLE, we usually use log(L(x1, x2, · · · , xn|θ)).
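For i.i.d. Bernoulli data, maximizing the log-likelihood can be sketched as follows (hypothetical data; the grid search is only for illustration, since the Bernoulli MLE is simply the sample mean):

```python
import math

# i.i.d. Bernoulli sample (hypothetical data)
xs = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

def log_likelihood(theta, xs):
    # log L = sum_i [x_i log(theta) + (1 - x_i) log(1 - theta)]
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

# Maximize over a fine grid of theta values in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=lambda t: log_likelihood(t, xs))

# For Bernoulli the MLE has the closed form theta = sample mean
print(theta_mle)   # 0.7, matching sum(xs)/len(xs)
```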

Page 7

Bayesian models

Page 8

Bayesian models

In Bayesian models we use Bayes' rule to obtain an unknown probability.

In Bayesian models with two random variables X and Y, the following are initially given.

The data generating distribution: X|Y ∼ p(x|y)

The prior distribution: Y ∼ p(y)

We then obtain a sample value X = x and want to estimate the distribution (the posterior distribution) of Y given X = x, i.e., p(y|x):

p(y|x) = p(x, y)/p(x) = p(x|y)p(y)/p(x) = p(x|y)p(y) / ∫ p(x, y) dy.

We frequently use p(y|x) ∝ p(x|y)p(y).

Page 9

Bayesian models

Consider a random variable X having distribution function F(x) with unknown parameter θ.

In Bayesian models, the unknown parameter θ is considered stochastic. So we believe that θ ∼ p(θ), where p(θ) is called a prior distribution, before sampling. After sampling, using Bayes' rule we obtain the so-called posterior distribution as follows:

p(θ|x) = p(x, θ)/p(x) = p(x|θ)p(θ)/p(x) = p(x|θ)p(θ) / ∫ p(x, θ) dθ.

Page 10

Bayesian models

P(θ) is the prior, which is our belief about θ without considering the data (evidence) D.

P(θ|D) is the posterior, which is a refined belief about θ with the evidence D.

P(D|θ) is the likelihood, which is the probability of obtaining the data D as generated with parameter θ.

P(D) is the evidence, which is the probability of the data as determined by considering all possible values of θ.

Page 11

Bayesian models

Figure: Concept of Bayesian models

Page 12

Bayesian models

Maximum A Posteriori Estimator

Consider X ∼ p(x|θ) with a prior distribution p(θ). Here, θ is the parameter of the distribution of X.

The Maximum A Posteriori (MAP) estimator is given by

θ_MAP = argmax_θ log p(θ|X).

Note that p(θ|X) is the posterior distribution of θ and

p(θ|X) = p(θ, X)/p(X).

Since p(X) is not a function of θ, we have

θ_MAP = argmax_θ log p(θ, X).

Page 13

Bayesian models

Maximum A Posteriori Estimator

Recall that the Maximum Likelihood Estimator (MLE) is defined by

θ_MLE = argmax_θ log p(X|θ)

and the MAP estimator is given by

θ_MAP = argmax_θ log p(θ, X) = argmax_θ log p(X|θ)p(θ).

So, the MAP estimator can use the prior information, but the MLE cannot.
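Under a Bernoulli likelihood with a Beta prior, both estimators have closed forms, which makes the difference easy to see in code (a minimal sketch; the data and hyperparameters are made up):

```python
# MAP vs. MLE for a Bernoulli parameter with a Beta(a, b) prior.
# Closed forms: MLE = (sum x_i)/n; MAP maximizes theta^(a+s-1)(1-theta)^(b+n-s-1),
# giving (a + s - 1) / (a + b + n - 2) when a, b >= 1. Data are made up.
xs = [1, 1, 1, 0, 1]          # small sample, s = 4, n = 5
a, b = 2.0, 2.0               # prior pulling estimates toward 1/2
s, n = sum(xs), len(xs)

theta_mle = s / n                          # 0.8
theta_map = (a + s - 1) / (a + b + n - 2)  # (2+4-1)/(2+2+5-2) = 5/7

print(theta_mle, theta_map)   # MAP is shrunk toward the prior mean 0.5
```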

Page 14

Bayesian models

Bayesian model: an example

Let X1, X2, …, Xn be independent and identically distributed (i.i.d.) Bernoulli random variables where

P(X1 = 1|θ) = θ, P(X1 = 0|θ) = 1 − θ.

Here, θ is usually called the parameter of the Bernoulli distribution. First, observe that

P(X1 = x1, X2 = x2, …, Xn = xn|θ) = ∏_{i=1}^n P(Xi = xi|θ) = ∏_{i=1}^n θ^{xi}(1 − θ)^{1−xi} = θ^{Σ_{i=1}^n xi}(1 − θ)^{n − Σ_{i=1}^n xi}.

Page 15

Bayesian models

The prior distribution of θ is given by a uniform distribution over [0, 1], i.e.,

p(θ) = 1 for 0 ≤ θ ≤ 1.

After obtaining sample values X1 = x1, X2 = x2, …, Xn = xn, we are interested in the posterior distribution of θ:

p(θ|X1 = x1, X2 = x2, …, Xn = xn) ∝ P(X1 = x1, X2 = x2, …, Xn = xn|θ)p(θ) ∝ θ^{Σ_{i=1}^n xi}(1 − θ)^{n − Σ_{i=1}^n xi}.

Page 16

Bayesian models

Considering the normalization constant, we obtain

p(θ|X1 = x1, X2 = x2, …, Xn = xn) = [Γ(n + 2) / (Γ(1 + Σ_{i=1}^n xi) Γ(1 + n − Σ_{i=1}^n xi))] θ^{Σ_{i=1}^n xi}(1 − θ)^{n − Σ_{i=1}^n xi},

which is a Beta distribution with parameters a = Σ_{i=1}^n xi + 1 and b = n − Σ_{i=1}^n xi + 1.

cf. the Beta(a, b) density:

Beta(a, b) = [Γ(a + b) / (Γ(a)Γ(b))] θ^{a−1}(1 − θ)^{b−1}

Note that, when a = b = 1, Beta(a, b) is in fact a uniform distribution over [0, 1]. That is, the prior and the posterior of θ are both Beta distributions.
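The normalization above can be checked numerically: with a uniform prior, the posterior density Beta(Σxi + 1, n − Σxi + 1) should integrate to 1 (a sketch using the stdlib lgamma; data are made up):

```python
import math

# Numerical check that the posterior under a uniform prior is Beta(s+1, n-s+1):
# integrate the density over [0, 1] and confirm it is 1. Data are hypothetical.
xs = [1, 0, 1, 1, 0, 1]
s, n = sum(xs), len(xs)                 # s = 4, n = 6
a, b = s + 1, n - s + 1                 # Beta(5, 3)

def beta_pdf(t):
    # log of the normalization constant Gamma(a+b) / (Gamma(a) Gamma(b))
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

# Midpoint rule on a fine grid
m = 10_000
total = sum(beta_pdf((i + 0.5) / m) for i in range(m)) / m
print(total)   # ≈ 1.0
```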

Page 17

Bayesian models

Example 1

Suppose we want to buy something from Amazon.com, and there are two sellers offering it for the same price. Seller 1 has 90 positive reviews and 10 negative reviews. Seller 2 has 2 positive reviews and 0 negative reviews. Who should we buy from? (from Machine Learning by K.P. Murphy)

Let θ1 and θ2 be the unknown reliabilities of the two sellers. Since we don't know much about them, we will endow them both with uniform priors, θi ∼ Beta(1, 1), i = 1, 2. The posteriors are

p(θ1|D1) = Beta(91, 11), p(θ2|D2) = Beta(3, 1).

Hence,

P(θ1 > θ2|D1, D2) = ∫₀¹ ∫₀¹ I(θ1 > θ2) Beta(θ1|91, 11) Beta(θ2|3, 1) dθ1 dθ2 = 0.710.

Hence we are better off buying from seller 1.
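The double integral above can also be estimated by simple Monte Carlo, sampling both posteriors and counting how often θ1 exceeds θ2 (a sketch using the stdlib betavariate sampler):

```python
import random

random.seed(0)

# Monte Carlo estimate of P(theta1 > theta2) for the two Amazon sellers:
# theta1 ~ Beta(91, 11), theta2 ~ Beta(3, 1).
N = 200_000
wins = sum(
    random.betavariate(91, 11) > random.betavariate(3, 1)
    for _ in range(N)
)
print(wins / N)   # close to the exact value 0.710
```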

Page 18

Bayesian models

Example 2

Figure: Bayesian Experiment for Bernoulli distribution with p = 0.7

Page 19

Bayesian models

More generally, if we use Beta(a, b) as a prior distribution of θ, we have

p(θ|X1 = x1, X2 = x2, …, Xn = xn)

∝ P(X1 = x1, X2 = x2, …, Xn = xn|θ)p(θ)

∝ P(X1 = x1, X2 = x2, …, Xn = xn|θ) [Γ(a + b)/(Γ(a)Γ(b))] θ^{a−1}(1 − θ)^{b−1}

∝ θ^{a + Σ_{i=1}^n xi − 1}(1 − θ)^{b + n − Σ_{i=1}^n xi − 1}.

Hence, we see that

p(θ|X1 = x1, X2 = x2, …, Xn = xn) = Beta(a + Σ_{i=1}^n xi, b + n − Σ_{i=1}^n xi).

In this case, the Beta distribution is a conjugate prior.

From now on, we use the notation p(θ|x1, x2, …, xn) or p(θ|X) for simplicity.
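The conjugate update has a simple sequential form: each Bernoulli observation x just increments one of the two Beta parameters (a minimal sketch with made-up data and hyperparameters):

```python
# Conjugate Beta-Bernoulli updating: starting from Beta(a, b), each observation
# x in {0, 1} updates (a, b) -> (a + x, b + 1 - x). Data are made up.
def update(a, b, xs):
    for x in xs:
        a, b = a + x, b + (1 - x)
    return a, b

a, b = 2.0, 3.0
xs = [1, 1, 0, 1, 0, 1, 1]    # 5 ones, 2 zeros
a_post, b_post = update(a, b, xs)
print(a_post, b_post)   # (2 + 5, 3 + 2) = (7.0, 5.0)
```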

Page 20

Bayesian models

More on Conjugate Distributions

In Bayesian probability theory, if the posterior distribution p(θ|x) is in the same probability distribution family as the prior distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

As shown before, if we use a conjugate prior, we can obtain a closed-form expression for the posterior. This also shows how the likelihood function updates a prior distribution.

Page 21

Bayesian models

Conjugate Prior for Normal distribution

Assume that X is distributed according to a normal distribution with unknown mean µ and variance 1/τ (or precision τ), i.e.,

X ∼ N(µ, τ⁻¹),

and that the prior distribution of (µ, τ) is a Normal-Gamma distribution,

(µ, τ) ∼ NormalGamma(µ0, λ0, α0, β0),

for which the density p(µ, τ) satisfies

p(µ, τ) ∝ τ^{α0 − 1/2} exp[−β0 τ] exp[−λ0 τ(µ − µ0)²/2].

Page 22

Bayesian models

Suppose that X1, …, Xn | µ, τ ∼ i.i.d. N(µ, τ⁻¹),

i.e., the components of X = (X1, …, Xn) are conditionally independent given (µ, τ), and the conditional distribution given (µ, τ) is normal with expectation µ and variance 1/τ.

The posterior distribution of µ and τ given this dataset X is given by

P(τ, µ | X) ∝ L(X | τ, µ) p(τ, µ),

where L is the likelihood of the data given the parameters.

Page 23

Bayesian models

Since the data are i.i.d., the likelihood of X is given by

L(X | τ, µ) ∝ ∏_{i=1}^n τ^{1/2} exp[−(τ/2)(xi − µ)²]

∝ τ^{n/2} exp[−(τ/2) Σ_{i=1}^n (xi − µ)²]

∝ τ^{n/2} exp[−(τ/2) Σ_{i=1}^n (xi − x̄ + x̄ − µ)²]

∝ τ^{n/2} exp[−(τ/2) Σ_{i=1}^n ((xi − x̄)² + (x̄ − µ)²)]

∝ τ^{n/2} exp[−(τ/2)((n − 1)s² + n(x̄ − µ)²)],

where x̄ = (1/n) Σ_{i=1}^n xi and s² = (1/(n − 1)) Σ_{i=1}^n (xi − x̄)².
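The key algebraic step above, Σ(xi − µ)² = (n − 1)s² + n(x̄ − µ)², is easy to verify numerically (a sketch with arbitrary made-up numbers):

```python
# Numerical check of the decomposition used above:
# sum_i (x_i - mu)^2 = (n - 1) s^2 + n (xbar - mu)^2, for arbitrary mu.
xs = [2.0, 3.5, 1.0, 4.0, 2.5]
mu = 1.7                        # arbitrary value of the mean parameter
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)

lhs = sum((x - mu) ** 2 for x in xs)
rhs = (n - 1) * s2 + n * (xbar - mu) ** 2
print(lhs, rhs)                 # equal up to floating-point error
```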

Page 24

Bayesian models

So, the posterior distribution of the parameters is proportional to the prior times the likelihood, as given below.

P(τ, µ | X)

∝ L(X | τ, µ) p(τ, µ)

∝ τ^{n/2} exp[−(τ/2)((n − 1)s² + n(x̄ − µ)²)] × τ^{α0 − 1/2} exp[−β0 τ] exp[−λ0 τ(µ − µ0)²/2]

∝ τ^{n/2 + α0 − 1/2} exp[−τ((1/2)(n − 1)s² + β0)] × exp[−(τ/2)(n(x̄ − µ)² + λ0(µ − µ0)²)]

Page 25

Bayesian models

The last exponential term is simplified as follows:

n(x̄ − µ)² + λ0(µ − µ0)²

= (n + λ0)µ² − 2(n x̄ + λ0 µ0)µ + n x̄² + λ0 µ0²

= (n + λ0)(µ² − 2((n x̄ + λ0 µ0)/(n + λ0))µ) + n x̄² + λ0 µ0²

= (n + λ0)(µ − (n x̄ + λ0 µ0)/(n + λ0))² + n x̄² + λ0 µ0² − (n x̄ + λ0 µ0)²/(n + λ0)

= (n + λ0)(µ − (n x̄ + λ0 µ0)/(n + λ0))² + λ0 n(x̄ − µ0)²/(n + λ0)

Page 26

Bayesian models

Combining all the results yields

P(τ, µ | X)

∝ τ^{n/2 + α0 − 1/2} exp[−τ((1/2)(n − 1)s² + β0)] × exp[−(τ/2)((n + λ0)(µ − (n x̄ + λ0 µ0)/(n + λ0))² + λ0 n(x̄ − µ0)²/(n + λ0))]

∝ τ^{n/2 + α0 − 1/2} exp[−τ(β0 + (1/2)((n − 1)s² + λ0 n(x̄ − µ0)²/(n + λ0)))] × exp[−(τ(n + λ0)/2)(µ − (n x̄ + λ0 µ0)/(n + λ0))²]

Page 27

Bayesian models

That is, the posterior is exactly in the same form as a Normal-Gamma distribution, i.e.,

P(τ, µ | X) = NormalGamma(µ1, λ1, α1, β1)

where

µ1 = (n x̄ + λ0 µ0)/(n + λ0),

λ1 = n + λ0,

α1 = α0 + n/2,

β1 = β0 + (1/2)((n − 1)s² + λ0 n(x̄ − µ0)²/(n + λ0)).
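The four update formulas can be packaged in a small function (a sketch; the function name, data, and hyperparameters are hypothetical):

```python
# Posterior Normal-Gamma parameters from the formulas above.
# Prior hyperparameters and data are hypothetical.
def normal_gamma_posterior(xs, mu0, lam0, alpha0, beta0):
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1) if n > 1 else 0.0
    mu1 = (n * xbar + lam0 * mu0) / (n + lam0)
    lam1 = n + lam0
    alpha1 = alpha0 + n / 2
    beta1 = beta0 + 0.5 * ((n - 1) * s2 + lam0 * n * (xbar - mu0) ** 2 / (n + lam0))
    return mu1, lam1, alpha1, beta1

xs = [1.2, 0.8, 1.5, 1.1]
print(normal_gamma_posterior(xs, mu0=0.0, lam0=1.0, alpha0=1.0, beta0=1.0))
```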

Page 28

Bayesian models

Wishart Distribution

Suppose X is a p × n matrix, each column of which is independently drawn from a p-variate normal distribution with zero mean:

xi = (x_{1i}, …, x_{pi})ᵀ ∼ Np(0, Σ), 1 ≤ i ≤ n.

Then the Wishart distribution is the probability distribution of the p × p random matrix

S = Σ_{i=1}^n xi xiᵀ, S ∼ Wp(Σ, n).

The positive integer n is the degrees of freedom. For n ≥ p the matrix S is invertible with probability 1 if Σ is invertible. If p = 1 and Σ = 1, then this distribution is a chi-squared distribution with n degrees of freedom.
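The construction of S can be sketched empirically for Σ = I, where each entry of each column is an independent standard normal (hypothetical dimensions; a minimal sketch):

```python
import random

random.seed(1)

# Empirical sketch of the Wishart construction: S = sum_i x_i x_i^T where each
# x_i is a p-vector of i.i.d. N(0, 1) entries (so Sigma = I, hypothetical setup).
p, n = 3, 50
cols = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

# S is p x p, symmetric positive semidefinite
S = [[sum(x[j] * x[k] for x in cols) for k in range(p)] for j in range(p)]

assert all(abs(S[j][k] - S[k][j]) < 1e-9 for j in range(p) for k in range(p))
print([round(S[j][j], 2) for j in range(p)])  # diagonal entries ~ n on average
```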

Page 29

Bayesian models

The probability density function of S ∼ Wp(Σ, n) is given by

f(S) = |S|^{(n−p−1)/2} / (|Σ|^{n/2} 2^{np/2} Γp(n/2)) · exp(−(1/2) tr(Σ⁻¹S)),

where Γp(x) = π^{p(p−1)/4} ∏_{j=1}^p Γ(x + (1 − j)/2).

Page 30

Bayesian models

Inverse-Wishart Distribution

The Inverse-Wishart distribution is a probability distribution on positive definite matrices.

We say that T follows the Inverse-Wishart distribution with parameters Ψ and m, denoted by T ∼ InvWp(Ψ, m), if its inverse T⁻¹ follows Wp(Ψ⁻¹, m).

The probability density function of T ∼ InvWp(Ψ, m) is given by

f(T) = |Ψ|^{m/2} / (|T|^{(m+p+1)/2} 2^{mp/2} Γp(m/2)) · exp(−(1/2) tr(ΨT⁻¹)),

where Γp(x) = π^{p(p−1)/4} ∏_{j=1}^p Γ(x + (1 − j)/2).

Page 31

Bayesian models

We now show that the Inverse-Wishart distribution is a conjugate prior for the covariance matrix parameter of a multivariate normal distribution. Consider

X = (x1, …, xn), xi ∼ Np(0, Σ), Σ ∼ InvWp(Ψ, m).

Then the posterior distribution of Σ satisfies

f(Σ|X) ∝ f(X|Σ) g(Σ|Ψ, m)

∝ (2π)^{−np/2} |Σ|^{−n/2} exp(−(1/2) Σ_{i=1}^n xiᵀΣ⁻¹xi) × |Ψ|^{m/2} / (|Σ|^{(m+p+1)/2} 2^{mp/2} Γp(m/2)) · exp(−(1/2) tr(ΨΣ⁻¹))

∝ |Σ|^{−n/2 − (m+p+1)/2} exp(−(1/2) Σ_{i=1}^n xiᵀΣ⁻¹xi − (1/2) tr(ΨΣ⁻¹))

Page 32

Bayesian models

Noting that

Σ_{i=1}^n xiᵀΣ⁻¹xi = tr(Σ_{i=1}^n xiᵀΣ⁻¹xi) = Σ_{i=1}^n tr(xiᵀΣ⁻¹xi) = Σ_{i=1}^n tr(xixiᵀΣ⁻¹) = tr(SΣ⁻¹), where S = Σ_{i=1}^n xixiᵀ,

we obtain

f(Σ|X) ∝ |Σ|^{−(n+m+p+1)/2} exp(−(1/2) tr(SΣ⁻¹ + ΨΣ⁻¹)) ∝ |Σ|^{−(n+m+p+1)/2} exp(−(1/2) tr((S + Ψ)Σ⁻¹)).

That is, the posterior is again Inverse-Wishart: Σ|X ∼ InvWp(Ψ + S, m + n).
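In code, the conjugate update derived above amounts to adding the scatter matrix to the prior scale and the sample size to the degrees of freedom (a sketch; the function name and the numbers are hypothetical):

```python
# Conjugate update: with Sigma ~ InvW_p(Psi, m) and scatter matrix
# S = sum_i x_i x_i^T from n observations, the posterior is InvW_p(Psi + S, m + n).
def inv_wishart_posterior(Psi, m, S, n):
    p = len(Psi)
    Psi_post = [[Psi[j][k] + S[j][k] for k in range(p)] for j in range(p)]
    return Psi_post, m + n

Psi = [[2.0, 0.0], [0.0, 2.0]]   # prior scale matrix (hypothetical)
S = [[5.0, 1.0], [1.0, 4.0]]     # scatter matrix of the data (hypothetical)
Psi_post, m_post = inv_wishart_posterior(Psi, m=4, S=S, n=10)
print(Psi_post, m_post)   # ([[7.0, 1.0], [1.0, 6.0]], 14)
```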

Page 33

Bayesian models

Bayesian Decision Theory

The mode of a posterior distribution, e.g., the MAP estimator, is often a very poor choice as a summary because the mode is usually quite untypical of the distribution, unlike the mean and the median.

In this case we use Bayesian decision theory, where a loss function L(θ̂, θ) is considered. Here, L(θ̂, θ) is the loss we incur if the truth is θ and our estimate is θ̂.

With a loss function and a posterior distribution we try to minimize

argmin_{θ̂} E[L(θ̂, θ)|D].

By choosing different loss functions we obtain different estimators other than the MAP estimator.

Page 34

Bayesian models

Bayesian Decision Theory

L(θ̂, θ) = I(θ̂ ≠ θ): the MAP estimator

E[L(θ̂, θ)|D] = P(θ̂ ≠ θ|D) = 1 − p(θ̂|D)

L(θ̂, θ) = (θ̂ − θ)²: the mean

E[L(θ̂, θ)|D] = E[(θ̂ − θ)²|D] = E[(θ − E[θ|D])²|D] + (E[θ|D] − θ̂)²

L(θ̂, θ) = |θ̂ − θ|: the median

E[L(θ̂, θ)|D] = E[|θ̂ − θ| | D]
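For a discrete posterior, the three estimators can be computed directly, which also illustrates why the mode can be untypical of the distribution (a sketch with made-up posterior probabilities):

```python
# For a discrete posterior over theta, the three losses above give three
# different point estimates: mode (0-1 loss), mean (squared loss),
# median (absolute loss). Posterior values are made up.
thetas = [0.0, 1.0, 2.0, 10.0]
probs = [0.30, 0.25, 0.25, 0.20]   # a skewed posterior p(theta | D)

mode = thetas[probs.index(max(probs))]
mean = sum(t * p for t, p in zip(thetas, probs))

def median(thetas, probs):
    # smallest theta whose cumulative posterior probability reaches 1/2
    cum = 0.0
    for t, p in zip(thetas, probs):
        cum += p
        if cum >= 0.5:
            return t

print(mode, mean, median(thetas, probs))   # 0.0 2.75 1.0
```

Here the mode sits at 0.0 even though most of the posterior mass lies above it, which is exactly the pathology the slide describes.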

Page 35

Bayesian models

References

K.P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012.
