Probability Basics for Machine Learning CSC411 Shenlong Wang* Friday, January 15, 2015 *Based on many others’ slides and resources from Wikipedia



Page 1

Probability Basics for Machine Learning

CSC411, Shenlong Wang*

Friday, January 15, 2015

*Based on many others' slides and resources from Wikipedia

Page 2

Outline  

• Motivation
• Notation, definitions, laws
• Exponential family distributions
  – e.g., the Normal distribution

• Parameter estimation
• Conjugate prior

Page 3

Why Represent Uncertainty?

• The world is full of uncertainty
  – "Is there a person in this image?"
  – "What will the weather be like today?"
  – "Will I like this movie?"

• We're trying to build systems that understand and (possibly) interact with the real world

• We often can't prove something is true, but we can still ask how likely different outcomes are, or ask for the most likely explanation

[Image: Amazon logo]

Page 4

Why Use Probability to Represent Uncertainty?

• Write down simple, reasonable criteria that you'd want from a system of uncertainty (common-sense stuff), and you always get probability:
  – "Probability theory is nothing but common sense reduced to calculation." — Pierre Laplace, 1812

• Cox Axioms (Cox 1946); see Bishop, Section 1.2.3
• We will restrict ourselves to a relatively informal discussion of probability theory.

Page 5

Notation

• A random variable X represents outcomes or states of the world.

• We will write p(x) to mean Probability(X = x)
• Sample space: the space of all possible outcomes (may be discrete, continuous, or mixed)

• p(x) is the probability mass (density) function
  – Assigns a number to each point in sample space
  – Non-negative, sums (integrates) to 1
  – Intuitively: how often does x occur, how much do we believe in x

Page 6

Joint Probability Distribution
• Prob(X=x, Y=y)
  – "Probability of X=x and Y=y"
  – p(x, y)

Conditional Probability Distribution
• Prob(X=x | Y=y)
  – "Probability of X=x given Y=y"
  – p(x | y) = p(x, y) / p(y)

 

Page 7

The Rules of Probability

• Sum Rule (marginalization/summing out):

  p(x) = Σ_y p(x, y)

  p(x_1) = Σ_{x_2} Σ_{x_3} … Σ_{x_N} p(x_1, x_2, …, x_N)

• Product/Chain Rule:

  p(x, y) = p(y | x) p(x)

  p(x_1, …, x_N) = p(x_1) p(x_2 | x_1) … p(x_N | x_1, …, x_{N−1})

Page 8

Bayes' Rule

•  One  of  the  most  important  formulas  in  probability  theory  

• This gives us a way of "reversing" conditional probabilities

p(x | y) = p(y | x) p(x) / p(y) = p(y | x) p(x) / Σ_{x'} p(y | x') p(x')
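A minimal sketch of Bayes' rule in code (my addition, with made-up prior and likelihood values), showing how a conditional is "reversed":

```python
import numpy as np

# Hypothetical binary state x (e.g., "person in image") and binary observation y.
p_x = np.array([0.3, 0.7])                 # prior p(x): p(x=0), p(x=1)
p_y_given_x = np.array([[0.9, 0.1],        # row x=0: p(y=0|x=0), p(y=1|x=0)
                        [0.2, 0.8]])       # row x=1: p(y=0|x=1), p(y=1|x=1)

y_observed = 1
# Bayes' rule: p(x | y) = p(y | x) p(x) / sum_x' p(y | x') p(x')
numerator = p_y_given_x[:, y_observed] * p_x
p_x_given_y = numerator / numerator.sum()

print("posterior p(x | y=1) =", p_x_given_y)
```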

Page 9

Independence  

• Two random variables are said to be independent iff their joint distribution factors:

• Two random variables are conditionally independent given a third if they are independent after conditioning on the third:

X ⊥ Y  ⇔  p(x, y) = p(y | x) p(x) = p(x | y) p(y) = p(x) p(y)

X ⊥ Y | Z  ⇔  p(x, y | z) = p(y | x, z) p(x | z) = p(y | z) p(x | z)   ∀z

Page 10

Continuous Random Variables

• Outcomes are real values. Probability density functions define distributions.
  – E.g., the Gaussian density:

    p(x | µ, σ) = 1/(√(2π) σ) · exp{ −(x − µ)² / (2σ²) }

• Continuous joint distributions: replace sums with integrals, and everything holds.
  – E.g., marginalization and conditional probability:

    P(x, z) = ∫ P(x, y, z) dy = ∫ P(x, z | y) P(y) dy

Page 11

Summarizing Probability Distributions

• It is often useful to give summaries of distributions without defining the whole distribution (e.g., mean and variance)

• Mean:

  E[x] = ∫ x · p(x) dx

• Variance:

  var(x) = ∫ (x − E[x])² · p(x) dx = E[x²] − E[x]²

• Nth moment (about c):

  µ_n = ∫ (x − c)^n · p(x) dx
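As an added illustration (not from the slides), Monte Carlo estimates of these summaries for a Gaussian; the sample size and parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)   # samples from N(mu=2, sigma=1.5)

mean = x.mean()                           # estimates E[x] = 2.0
var = ((x - mean) ** 2).mean()            # estimates var(x) = sigma^2 = 2.25
var_alt = (x ** 2).mean() - mean ** 2     # same quantity via E[x^2] - E[x]^2
third_central = ((x - mean) ** 3).mean()  # 3rd moment about c = E[x] (close to 0 for a Gaussian)

print(mean, var, var_alt, third_central)
```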

Page 12

Exponential Family

• Family of probability distributions
• Many of the standard distributions belong to this family
  – Bernoulli, binomial/multinomial, Poisson, Normal (Gaussian), beta/Dirichlet, …

• Share many important properties
  – e.g., they have a conjugate prior (we'll get to that later; important for Bayesian statistics)

• First – let's see some examples

Page 13

Definition

• The exponential family of distributions over x, given parameter η (eta), is the set of distributions of the form

• x – scalar or vector, discrete or continuous
• η – the 'natural parameters'
• u(x) – some function of x (the sufficient statistic)
• g(η) – normalizer

p(x | η) = h(x) g(η) exp{ηᵀ u(x)}

g(η) ∫ h(x) exp{ηᵀ u(x)} dx = 1

Page 14

Example 1: Bernoulli

• Binary random variable: X ∈ {0, 1}
• p(heads) = µ, with µ ∈ [0, 1]
• E.g., a coin toss

p(x | µ) = µ^x (1 − µ)^(1−x)

Page 15

Example 1: Bernoulli

p(x | µ) = µ^x (1 − µ)^(1−x)
         = exp{ x ln µ + (1 − x) ln(1 − µ) }
         = (1 − µ) exp{ x ln( µ / (1 − µ) ) }

Comparing with the general form p(x | η) = h(x) g(η) exp{ηᵀ u(x)}:

h(x) = 1
u(x) = x
η = ln( µ / (1 − µ) )   ⇒   µ = σ(η) = 1 / (1 + e^(−η))
g(η) = σ(−η) = 1 − σ(η)

p(x | η) = σ(−η) exp(η x)
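A quick numerical check, added here rather than taken from the slides, that the exponential-family form σ(−η) exp(ηx) agrees with µ^x (1 − µ)^(1−x); the value of µ is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

mu = 0.3                          # p(heads), chosen arbitrarily
eta = np.log(mu / (1 - mu))       # natural parameter

for x in (0, 1):
    standard = mu**x * (1 - mu)**(1 - x)
    exp_family = sigmoid(-eta) * np.exp(eta * x)   # h(x)=1, u(x)=x, g(eta)=sigmoid(-eta)
    assert np.isclose(standard, exp_family)
    print(x, standard, exp_family)
```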

Page 16

Example 2: Multinomial

• p(value k) = µ_k

• For a single observation – a die toss
  – Sometimes called Categorical

• For multiple observations – integer counts on N trials
  – e.g., Prob(1 came out 3 times, 2 came out once, …, 6 came out 7 times, if I tossed a die 20 times)

µ_k ∈ [0, 1],   Σ_{k=1}^M µ_k = 1

P(x_1, …, x_M | µ) = ( N! / (x_1! … x_M!) ) Π_{k=1}^M µ_k^(x_k),   where   Σ_{k=1}^M x_k = N

Page 17

Example 2: Multinomial (1 observation)

P(x_1, …, x_M | µ) = Π_{k=1}^M µ_k^(x_k) = exp{ Σ_{k=1}^M x_k ln µ_k }

Comparing with the general form p(x | η) = h(x) g(η) exp{ηᵀ u(x)}:

h(x) = 1
u(x) = x
η_k = ln µ_k

p(x | η) = exp(ηᵀ x)

The parameters are not independent, due to the constraint of summing to 1; there is a slightly more involved notation to address that, see Bishop 2.4.

Page 18

Example 3: Normal (Gaussian) Distribution

•  Gaussian  (Normal)  

p(x | µ, σ) = 1/(√(2π) σ) · exp{ −(x − µ)² / (2σ²) }

Page 19

Example 3: Normal (Gaussian) Distribution

 

• µ is the mean
• σ² is the variance
• Can verify these by computing integrals, e.g.,

∫_{−∞}^{+∞} x · 1/(√(2π) σ) · exp{ −(x − µ)² / (2σ²) } dx = µ
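To verify this numerically, here is a small sketch added for this write-up, assuming SciPy is available; µ and σ are arbitrary:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.7, 0.8   # arbitrary parameters

def integrand(x):
    # x times the Gaussian density N(x | mu, sigma^2)
    return x * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

value, abs_err = quad(integrand, -np.inf, np.inf)
print(value)   # approximately 1.7 = mu
```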

Page 20

Example 3: Normal (Gaussian) Distribution

• Multivariate Gaussian

 

p(x | µ, Σ) = |2π Σ|^(−1/2) exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) }

Page 21

Example 3: Normal (Gaussian) Distribution

• Multivariate Gaussian

• x is now a vector
• µ is the mean vector
• Σ is the covariance matrix

p(x | µ, Σ) = |2π Σ|^(−1/2) exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) }

Page 22

Important Properties of Gaussians

• All marginals of a Gaussian are again Gaussian
• Any conditional of a Gaussian is Gaussian
• The product of two Gaussian densities is again Gaussian (up to normalization)
• Even the sum of two independent Gaussian random variables is Gaussian

Page 23

Exponential Family Representation

p(x | µ, σ²) = 1/(√(2πσ²)) · exp{ −(x − µ)² / (2σ²) }
             = 1/(√(2π)) · (1/σ) · exp{ −x²/(2σ²) + µx/σ² − µ²/(2σ²) }
             = (−2η₂)^(1/2) exp( η₁² / (4η₂) ) · 1/(√(2π)) · exp{ η₁ x + η₂ x² }

Matching the general form p(x | η) = h(x) g(η) exp{ηᵀ u(x)}:

h(x) = 1/√(2π)
u(x) = (x, x²)ᵀ
η = (η₁, η₂)ᵀ = (µ/σ², −1/(2σ²))ᵀ
g(η) = (−2η₂)^(1/2) exp( η₁² / (4η₂) )
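A small added check (not in the slides) that the natural-parameter form above reproduces the usual Gaussian density; parameter values are arbitrary:

```python
import numpy as np

mu, sigma = 0.5, 2.0
x = np.linspace(-5, 5, 11)

# Natural parameters and components of the exponential-family form
eta1 = mu / sigma**2
eta2 = -1.0 / (2 * sigma**2)
h = 1.0 / np.sqrt(2 * np.pi)                            # h(x), constant here
g = np.sqrt(-2 * eta2) * np.exp(eta1**2 / (4 * eta2))   # g(eta)

p_exp_family = h * g * np.exp(eta1 * x + eta2 * x**2)
p_standard = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

assert np.allclose(p_exp_family, p_standard)
```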

Page 24

Example: Maximum Likelihood For a 1D Gaussian

• Suppose we are given a data set of samples of a Gaussian random variable X, D = {x_1, …, x_N}, and told that the variance of the data is σ²

     What  is  our  best  guess  of  µ?    

*Need to assume the data are independent and identically distributed (i.i.d.)

[Figure: the samples x_1, x_2, …, x_N on a number line]

Page 25

Example: Maximum Likelihood For a 1D Gaussian

What is our best guess of µ?
• We can write down the likelihood function:

• We want to choose the µ that maximizes this expression
  – Take the log, then basic calculus: differentiate w.r.t. µ, set the derivative to 0, and solve for µ to get the sample mean

 

p(d | µ) = Π_{i=1}^N p(x_i | µ, σ) = Π_{i=1}^N 1/(√(2π) σ) · exp{ −(x_i − µ)² / (2σ²) }

µ_ML = (1/N) Σ_{i=1}^N x_i
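A minimal added sketch of this estimate: with σ assumed known, the sample mean attains the highest log-likelihood among nearby candidates. The data here are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                                  # assumed known
true_mu = 3.0
data = rng.normal(true_mu, sigma, size=500)  # the data set D = {x_1, ..., x_N}

mu_ml = data.mean()                          # closed-form ML estimate (sample mean)

def log_likelihood(mu):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * ((data - mu) / sigma) ** 2)

# The sample mean scores at least as well as nearby candidate values of mu.
for candidate in (mu_ml - 0.5, mu_ml, mu_ml + 0.5):
    print(candidate, log_likelihood(candidate))
```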

Page 26

Example: Maximum Likelihood For a 1D Gaussian

[Figure: the samples x_1, x_2, …, x_N with the maximum-likelihood Gaussian fit, centered at µ_ML with width σ_ML]

Page 27

ML estimation of model parameters for the Exponential Family

p(D | η) = Π_{n=1}^N p(x_n | η) = ( Π_{n=1}^N h(x_n) ) g(η)^N exp{ ηᵀ Σ_{n=1}^N u(x_n) }

Take the derivative with respect to η, set it to 0, and solve for ∇ ln g(η):

−∇ ln g(η_ML) = (1/N) Σ_{n=1}^N u(x_n)

• This can in principle be solved to get an estimate for η.
• The solution for the ML estimator depends on the data only through the sum over u(x_n), which is therefore called the sufficient statistic.
• It is what we need to store in order to estimate the parameters.
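To make the sufficient-statistic point concrete, here is an added Bernoulli sketch (u(x) = x, as in Example 1): the ML estimate depends on the data only through the count Σ_n x_n, so storing that single number together with N is enough:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.binomial(1, 0.3, size=1000)   # simulated coin flips from a Bernoulli(0.3)

# Sufficient statistic: the sum of u(x_n) = x_n over the data set.
sufficient_stat = data.sum()
N = data.size

mu_ml = sufficient_stat / N              # ML estimate of mu uses only (sum, N)
eta_ml = np.log(mu_ml / (1 - mu_ml))     # corresponding natural parameter
print(mu_ml, eta_ml)
```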

Page 28

Bayesian Probabilities

p(θ | d) = p(d | θ) p(θ) / p(d)

• p(d | θ) is the likelihood function
• p(θ) is the prior probability of (or our prior belief over) θ – our beliefs over which models are likely or not before seeing any data
• p(d) = ∫ p(d | θ) p(θ) dθ is the normalization constant or partition function
• p(θ | d) is the posterior distribution – a readjustment of our prior beliefs in the face of data

Page 29

Example: Bayesian Inference For a 1D Gaussian

• Suppose we have a prior belief that the mean of some random variable X is µ_0, and the variance of our belief is σ_0²

• We are then given a data set of samples of X, d = {x_1, …, x_N}, and somehow know that the variance of the data is σ²

What is the posterior distribution over (our belief about the value of) µ?

Page 30

Example: Bayesian Inference For a 1D Gaussian

[Figure: the observed samples x_1, x_2, …, x_N on a number line]

Page 31

Example: Bayesian Inference For a 1D Gaussian

[Figure: the samples x_1, …, x_N together with the prior belief over µ, a Gaussian centered at µ_0 with width σ_0]

Page 32

Example: Bayesian Inference For a 1D Gaussian

• Remember from earlier:

  p(µ | d) = p(d | µ) p(µ) / p(d)

• p(d | µ) is the likelihood function:

  p(d | µ) = Π_{i=1}^N p(x_i | µ, σ) = Π_{i=1}^N 1/(√(2π) σ) · exp{ −(x_i − µ)² / (2σ²) }

• p(µ) is the prior probability of (or our prior belief over) µ:

  p(µ | µ_0, σ_0) = 1/(√(2π) σ_0) · exp{ −(µ − µ_0)² / (2σ_0²) }

Page 33

Example: Bayesian Inference For a 1D Gaussian

p(µ | D) ∝ p(D | µ) p(µ)

p(µ | D) = Normal(µ | µ_N, σ_N),   where

µ_N = ( σ² / (N σ_0² + σ²) ) µ_0 + ( N σ_0² / (N σ_0² + σ²) ) µ_ML

1/σ_N² = 1/σ_0² + N/σ²
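An added sketch computing the posterior parameters µ_N and σ_N from the formulas above; the prior settings and simulated data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

mu0, sigma0 = 0.0, 1.0          # prior belief about the mean
sigma = 2.0                     # known data standard deviation
data = rng.normal(1.5, sigma, size=50)

N = data.size
mu_ml = data.mean()

# Posterior over mu is Normal(mu | mu_N, sigma_N):
mu_N = (sigma**2 / (N * sigma0**2 + sigma**2)) * mu0 \
     + (N * sigma0**2 / (N * sigma0**2 + sigma**2)) * mu_ml
sigma_N = np.sqrt(1.0 / (1.0 / sigma0**2 + N / sigma**2))

print(mu_N, sigma_N)   # mu_N lies between mu0 and mu_ml; sigma_N shrinks as N grows
```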

Page 34

Example: Bayesian Inference For a 1D Gaussian

[Figure: the samples x_1, …, x_N with the prior belief, a Gaussian centered at µ_0 with width σ_0]

Page 35

Example: Bayesian Inference For a 1D Gaussian

[Figure: the samples x_1, …, x_N with the prior belief (µ_0, σ_0) and the maximum-likelihood fit (µ_ML, σ_ML)]

Page 36

Example: Bayesian Inference For a 1D Gaussian

[Figure: the samples x_1, …, x_N with the prior belief, the maximum-likelihood fit, and the posterior distribution centered at µ_N with width σ_N]

Page 37

Image from xkcd.com

Page 38

Conjugate Priors

• Notice in the Gaussian parameter estimation example that the functional form of the posterior was that of the prior (Gaussian)

•  Priors  that  lead  to  that  form  are  called  ‘conjugate  priors’  

• For any member of the exponential family there exists a conjugate prior, which can be written as

• Multiply by the likelihood to obtain a posterior (up to normalization) of the form

• Notice the addition to the sufficient statistic
• ν is the effective number of pseudo-observations

p(η | χ, ν) = f(χ, ν) g(η)^ν exp{ ν ηᵀ χ }

p(η | D, χ, ν) ∝ g(η)^(ν+N) exp{ ηᵀ ( Σ_{n=1}^N u(x_n) + ν χ ) }

Page 39

Conjugate Priors – Examples

• Beta for Bernoulli/binomial
• Dirichlet for categorical/multinomial
• Normal for the mean of a Normal
• And many more...
  – Conjugate Prior Table:

• http://en.wikipedia.org/wiki/Conjugate_prior
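As an added illustration of the first entry in the list (a Beta prior is conjugate to the Bernoulli likelihood), a small conjugate-update sketch; the pseudo-counts and simulated data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

# Beta(a, b) prior over the Bernoulli parameter mu; a, b act like pseudo-observations.
a, b = 2.0, 2.0
data = rng.binomial(1, 0.7, size=100)   # simulated coin flips

# Conjugacy: the posterior is again a Beta, with the observed counts added to the pseudo-counts.
a_post = a + data.sum()
b_post = b + (data.size - data.sum())

posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)
```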