dahua-lin
Outline
You will learn the basics of probabilistic modeling in this lecture:
• Graphical Models
• Exponential Families
• Conjugate Prior
• How to formulate and analyze a graphical model in practice.
Graphical Models
• The key idea behind graphical models is factorization.
• A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.
Graphical Models
A graphical model can be viewed in two ways:
• A data structure that provides the skeleton for representing a joint distribution in a factorized manner.
• A compact representation of a set of conditional independencies about a family of distributions.
These two views are equivalent in a strict sense.
Distributions on a Graph
Consider a graph $G = (V, E)$, where edges can be directed or undirected:
• Attach a random variable $X_v$ to each vertex $v \in V$.
• The state space for $X_v$ is denoted by $\mathcal{X}_v$.
• A particular instance of $X_v$ is denoted by $x_v$.
• We can also consider a set of variables: $X_A = (X_v)_{v \in A}$ and $x_A = (x_v)_{v \in A}$.
Categories of Graphical Models
• Bayesian Networks (Directed Acyclic Graphs)
• Markov Random Fields (Undirected Graphs)
• Chain Graphs (Directed acyclic graphs over undirected components)
• Factor Graphs
Directed Acyclic Graphs
Consider a directed graph $G = (V, E)$:
• $G$ is called a directed acyclic graph (DAG) if it has no directed cycles.
Directed Acyclic Graphs (cont'd)
Consider a directed acyclic graph $G = (V, E)$:
• Given an edge $(u, v) \in E$, $u$ is called a parent of $v$, and $v$ is called a child of $u$.
• A vertex $u$ is called an ancestor of $v$, and $v$ a descendant of $u$, if there exists a directed path from $u$ to $v$.
Topological Ordering
• A topological ordering of a directed graph $G = (V, E)$ is a linear ordering of vertices such that for each edge $(u, v) \in E$, $u$ always comes before $v$.
• A finite directed graph is acyclic if and only if it has a topological ordering.
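The equivalence between acyclicity and the existence of a topological ordering can be checked constructively. The sketch below uses Kahn's algorithm (a standard technique, not something the slides prescribe); the function name and the example graphs are illustrative assumptions.

```python
from collections import deque

def topological_order(vertices, edges):
    """Return a topological ordering, or None if the graph has a directed cycle."""
    children = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    # Start from vertices with no parents.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # If some vertex was never released, it lies on or behind a directed cycle.
    return order if len(order) == len(vertices) else None

# Hypothetical examples: a diamond-shaped DAG and a 3-cycle.
dag_order = topological_order("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
cyclic_order = topological_order("abc", [("a", "b"), ("b", "c"), ("c", "a")])
```

Every parent appears before its children in `dag_order`, while the cyclic graph yields `None`.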
Bayesian Networks
Given a DAG $G = (V, E)$, we say a joint distribution over $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:
$$p(x_V) = \prod_{v \in V} p_v(x_v \mid x_{\mathrm{pa}(v)})$$
• Such a model is called a Bayesian Network over $G$.
• $\mathrm{pa}(v)$ is the set of $v$'s parents, which can be empty.
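As a small illustration of this factorization, the sketch below evaluates the joint probability of a three-variable binary network as a product of one conditional per vertex. The network structure `a -> c <- b` and all probability values are made up for the example.

```python
from itertools import product

# Hypothetical network a -> c <- b over binary variables.
parents = {"a": (), "b": (), "c": ("a", "b")}
# cpt[v][parent_values] = P(X_v = 1 | parents take parent_values)
cpt = {
    "a": {(): 0.3},
    "b": {(): 0.6},
    "c": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
}

def joint_probability(assignment):
    """Product over vertices of p(x_v | x_pa(v))."""
    prob = 1.0
    for v, pa in parents.items():
        p_one = cpt[v][tuple(assignment[u] for u in pa)]
        prob *= p_one if assignment[v] == 1 else 1.0 - p_one
    return prob

# The factorized joint sums to one over all 2^3 assignments.
total = sum(
    joint_probability(dict(zip("abc", values)))
    for values in product((0, 1), repeat=3)
)
```

Because each factor is itself a normalized conditional distribution, no separate normalizing constant is needed.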
Undirected Graphs and Cliques
Consider an undirected graph $G = (V, E)$:
• A clique is a fully connected subset of vertices.
• A clique is called maximal if it is not properly contained in another clique.
• $\mathcal{C}(G)$ denotes the set of all maximal cliques.
Markov Random Fields
Consider an undirected graph $G = (V, E)$. We say a joint distribution of $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:
$$p(x_V) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$$
• This is called a Markov Random Field over $G$.
• $\psi_C$ are called factors.
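Markov Random Fields (cont'd)
• The normalizing constant $Z$ is usually needed to ensure the distribution is properly normalized:
$$Z = \sum_{x_V} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$$
(with the sum replaced by an integral for continuous variables).
• Generally, the compatibility functions $\psi_C$ need not have any obvious relation to the marginal or conditional distributions over the cliques.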
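MRF Parameterization
[Figure: undirected graph over the vertices a, b, c]
• All MRFs can be parameterized in terms of maximal cliques. In practice, this is not necessarily the most natural way.
• Natural parameterization: $p(a, b, c) \propto \psi_{ab}(a, b) \, \psi_{bc}(b, c) \, \psi_{ac}(a, c)$
• Maximal-clique based: $p(a, b, c) \propto \psi_{abc}(a, b, c)$ (with $\psi_{abc}(a, b, c) = \psi_{ab}(a, b) \, \psi_{bc}(b, c) \, \psi_{ac}(a, c)$)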
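A brute-force sketch can make the role of the normalizing constant concrete. It assumes a fully connected graph over three binary variables and a made-up pairwise factor that rewards agreement; the maximal-clique factor is built as the product of the pairwise ones.

```python
import math
from itertools import product

def pairwise_factor(x, y, coupling=1.0):
    # Hypothetical compatibility function: larger when the two values agree.
    return math.exp(coupling if x == y else -coupling)

def clique_factor(a, b, c):
    # Maximal-clique factor built from the three pairwise factors.
    return pairwise_factor(a, b) * pairwise_factor(b, c) * pairwise_factor(a, c)

states = list(product((0, 1), repeat=3))
# Normalizing constant: sum of the unnormalized product over all states.
Z = sum(clique_factor(a, b, c) for a, b, c in states)
density = {s: clique_factor(*s) / Z for s in states}
```

The factors are not probabilities on their own; only after dividing by `Z` do the values in `density` sum to one.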
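Conditional Independence
Consider a joint distribution over $(X, Y, Z)$. $X$ and $Y$ are called conditionally independent given $Z$, denoted by $X \perp Y \mid Z$, iff
$$P(X \in A, Y \in B \mid Z) = P(X \in A \mid Z) \, P(Y \in B \mid Z) \quad \text{almost surely, for all measurable } A, B.$$
More generally,
$$E[f(X) \, g(Y) \mid Z] = E[f(X) \mid Z] \, E[g(Y) \mid Z] \quad \text{almost surely, for all bounded measurable } f, g.$$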
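Conditional Independence (cont'd)
If the conditional distributions $P_{X \mid Z}$ and $P_{Y \mid Z}$ have densities $p(x \mid z)$ and $p(y \mid z)$, then $X \perp Y \mid Z$ if the following equality holds almost surely:
$$p(x, y \mid z) = p(x \mid z) \, p(y \mid z)$$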
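For discrete variables the density condition can be verified by direct enumeration. The following sketch assumes a joint table stored as a dictionary keyed by `(x, y, z)`; the two example distributions (a common-cause model and an XOR model) are illustrative.

```python
from itertools import product

def is_conditionally_independent(joint, tol=1e-9):
    """Check p(x, y | z) = p(x | z) p(y | z) for every z with positive probability."""
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})
    for z in zs:
        p_z = sum(joint.get((x, y, z), 0.0) for x in xs for y in ys)
        if p_z == 0.0:
            continue
        for x, y in product(xs, ys):
            p_xy = joint.get((x, y, z), 0.0) / p_z
            p_x = sum(joint.get((x, y2, z), 0.0) for y2 in ys) / p_z
            p_y = sum(joint.get((x2, y, z), 0.0) for x2 in xs) / p_z
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True

# Common cause: Z generates X and Y independently, so X is independent of Y given Z.
p_z = {0: 0.4, 1: 0.6}
p_x1 = {0: 0.2, 1: 0.7}  # P(X = 1 | Z = z)
p_y1 = {0: 0.5, 1: 0.1}  # P(Y = 1 | Z = z)
common_cause = {
    (x, y, z): p_z[z]
    * (p_x1[z] if x else 1 - p_x1[z])
    * (p_y1[z] if y else 1 - p_y1[z])
    for x, y, z in product((0, 1), repeat=3)
}

# XOR: X and Y are fair coins and Z = X xor Y, so conditioning on Z couples X and Y.
xor_model = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
```

The XOR case shows that marginal independence of two variables does not imply their conditional independence.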
I-map
• Let $\mathcal{P}$ be a family of distributions (e.g. a graphical model). We define $\mathcal{I}(\mathcal{P})$ to be the set of conditional independencies of the form $X_A \perp X_B \mid X_C$ that hold for all distributions in $\mathcal{P}$.
• Given a graph $G$ associated with a set of conditional independencies $\mathcal{I}(G)$, $G$ is called an I-map of $\mathcal{P}$ if $\mathcal{I}(G) \subseteq \mathcal{I}(\mathcal{P})$.
• An I-map is a graph that captures (part of) the conditional independencies of a distribution family.
Conditional Independencies of MRFs
• The conditional independencies of an MRF can be characterized in three ways:
  • Local independencies
  • Pairwise independencies
  • Global independencies
• In the sequel, we consider an undirected graph $G = (V, E)$.
Local Independencies
• For each $v \in V$, the Markov blanket of $v$ w.r.t. $G$, denoted by $\mathcal{N}_G(v)$, is the set of all neighbors of $v$.
• Local independencies: $X_v$ is independent of the rest given its neighbors.
Pairwise Independencies
• Pairwise independencies: Given two disjoint sets $A, B \subset V$ with no direct edges between them, $X_A$ is independent of $X_B$ given the rest:
$$X_A \perp X_B \mid X_{V \setminus (A \cup B)}$$
Global Independencies
• We say $C$ separates $A$ and $B$, denoted by $\mathrm{sep}_G(A, B \mid C)$, if all paths between $A$ and $B$ go through $C$.
• Global independencies: If $C$ separates $A$ and $B$, then $X_A$ is independent of $X_B$ given $X_C$.
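Relations between Independencies
• Write $\mathcal{I}_l(G)$, $\mathcal{I}_p(G)$ and $\mathcal{I}_g(G)$ for the local, pairwise and global independencies of $G$. Every local or pairwise independency is also a global one: $\mathcal{I}_l(G) \subseteq \mathcal{I}_g(G)$ and $\mathcal{I}_p(G) \subseteq \mathcal{I}_g(G)$.
• Given a distribution or a family of distributions $P$, we say $P$ satisfies $\mathcal{I}$ if it satisfies all conditional independencies in $\mathcal{I}$, denoted by $P \models \mathcal{I}$.
• $P \models \mathcal{I}_g(G)$ implies $P \models \mathcal{I}_l(G)$ and $P \models \mathcal{I}_p(G)$.
• If $P$ is a family of positive distributions, then $P \models \mathcal{I}_p(G)$ implies $P \models \mathcal{I}_g(G)$, so the three characterizations are equivalent.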
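Soundness
• Let $P$ be a distribution that factorizes according to an undirected graph $G$. Then $P$ satisfies the global independencies of $G$, $P \models \mathcal{I}_g(G)$, or in other words, $G$ is an I-map of $P$.
• Consequently, $P$ also satisfies the local and pairwise independencies: $P \models \mathcal{I}_l(G)$ and $P \models \mathcal{I}_p(G)$.
• How to prove it?
• How is the separation assumption related to the maximal cliques?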
We have shown that if $P$ factorizes according to $G$, then $G$ is an I-map for $P$. Is the converse also true?
Hammersley-Clifford
• (Hammersley-Clifford Theorem) Let $P$ be a positive distribution over $X_V$ and $G$ be an I-map of $P$. Then $P$ factorizes according to $G$.
• Combining Soundness and Hammersley-Clifford:
  • A positive distribution $P$ factorizes according to $G$ if and only if $G$ is an I-map of $P$.
Conditional Independencies of BN
• Conditional independencies of a Bayesian network can be characterized in two ways:
  • local independencies
  • global independencies (via d-separation)
• In the sequel, we consider a directed graph $G = (V, E)$.
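Local Independencies
[Figure: example directed graph over the vertices a, b, c, d, e, f, g, h, i]
• Given $v \in V$, $X_v$ is independent of its non-descendants given its parents:
$$X_v \perp X_{\mathrm{nd}(v)} \mid X_{\mathrm{pa}(v)}$$
where $\mathrm{nd}(v)$ denotes the non-descendants of $v$ (excluding its parents) and $\mathrm{pa}(v)$ its parents.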
d-separation
When "influence" can flow from $u$ to $v$ via $w$, we say that the trail $u - w - v$ is active:
• $u \to w \to v$ is active iff $w$ is not observed.
• $u \leftarrow w \leftarrow v$ is active iff $w$ is not observed.
• $u \leftarrow w \to v$ is active iff $w$ is not observed.
• (V-structure) $u \to w \leftarrow v$ is active iff either $w$ or some of $w$'s descendants is observed.
d-separation (cont'd)
• A trail $v_1 - v_2 - \cdots - v_n$ is called active when all sub-trails $v_{i-1} - v_i - v_{i+1}$ are active.
• Let $A, B, C$ be three sets of vertices of $G$. $A$ and $B$ are d-separated by $C$, denoted by $\mathrm{dsep}_G(A, B \mid C)$, if there is neither a direct link nor an active trail between $A$ and $B$ when $C$ is observed.
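Global Independencies
[Figure: example directed graph over the vertices a, b, c, d, e, f, g, h, i]
• Given $A, B, C \subset V$, $X_A$ is independent of $X_B$ given $X_C$ if $A$ and $B$ are d-separated by $C$ on the graph $G$:
$$\mathcal{I}_g(G) = \{ X_A \perp X_B \mid X_C : A \text{ and } B \text{ are d-separated by } C \}$$
• The local independencies are a special case of the global ones: $\mathcal{I}_l(G) \subseteq \mathcal{I}_g(G)$.
• (Soundness) If $P$ factorizes according to $G$, then $P \models \mathcal{I}_g(G)$, or we say $G$ is an I-map of $P$.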
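d-separation can be tested without enumerating trails. The sketch below does not implement the trail-by-trail definition; it uses the standard equivalent criterion that $A$ and $B$ are d-separated by $C$ exactly when $C$ separates them in the moral graph of the subgraph induced by the ancestors of $A \cup B \cup C$. The function name and the dictionary-of-parents representation are assumptions made for the example.

```python
def d_separated(parents, A, B, C):
    """parents maps each vertex of a DAG to its parents; A, B, C are disjoint sets."""
    # 1. Keep only A, B, C and their ancestors.
    keep, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralize: connect each vertex to its parents and "marry" co-parents.
    nbrs = {v: set() for v in keep}
    for v in keep:
        pa = list(parents.get(v, ()))
        for u in pa:
            nbrs[u].add(v)
            nbrs[v].add(u)
        for i, u in enumerate(pa):
            for w in pa[i + 1:]:
                nbrs[u].add(w)
                nbrs[w].add(u)
    # 3. Undirected search from A that is not allowed to pass through C.
    seen, stack = set(), list(A)
    while stack:
        v = stack.pop()
        if v in seen or v in C:
            continue
        if v in B:
            return False
        seen.add(v)
        stack.extend(nbrs[v] - seen)
    return True

# Hypothetical graphs: a v-structure a -> c <- b and a chain a -> b -> c.
v_structure = {"a": [], "b": [], "c": ["a", "b"]}
chain = {"a": [], "b": ["a"], "c": ["b"]}
```

On the v-structure, `a` and `b` are d-separated by the empty set but become connected once `c` is observed; on the chain the situation is reversed.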
Moralized Graphs
• Given a directed graph $G$, we can construct a moralized graph, denoted by $\mathcal{M}(G)$, by connecting each node to its parents with undirected edges and adding undirected edges between the parents of each node.
• In $\mathcal{M}(G)$, the subgraph that spans $\{v\} \cup \mathrm{pa}(v)$ forms a clique, denoted by $C_v$.
• The procedure of constructing $\mathcal{M}(G)$ from $G$ is called moralization.
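From BN to MRF
If $P$ factorizes according to $G$ as
$$p(x_V) = \prod_{v \in V} p_v(x_v \mid x_{\mathrm{pa}(v)})$$
then $P$ factorizes according to the moralized graph $\mathcal{M}(G)$:
$$p(x_V) = \prod_{v \in V} \psi_{C_v}(x_{C_v}), \quad \text{with } \psi_{C_v}(x_{C_v}) = p_v(x_v \mid x_{\mathrm{pa}(v)}) \text{ and } C_v = \{v\} \cup \mathrm{pa}(v)$$
• $\mathcal{I}(\mathcal{M}(G)) \subseteq \mathcal{I}(G)$. Is the opposite true?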
From BN to MRF (cont'd)
• In general, moralization may cause the loss of conditional independencies.
• $\mathcal{I}(\mathcal{M}(G))$ may be a proper subset of $\mathcal{I}(G)$.
• Consider a DAG: $a \to c \leftarrow b$
• $X_a \perp X_b$ holds for $G$ but not for $\mathcal{M}(G)$.
• Not every MRF can be converted to a BN.
Factor Graphs
• An MRF does not always fully reveal the factorized structure of a distribution.
• A factor graph can sometimes give a more accurate characterization of a family of distributions.
• A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
• A variable $x_v$ and a factor $\psi$ are linked in a factor graph if the factor involves $x_v$ as an argument.
Study of Distributions
• Graphical models: structure of (in)dependencies
• Exponential families: algebraic characteristics
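Exponential Families
An exponential family $\mathcal{P}$ over a measure space $(\mathcal{X}, \mathcal{B}, \nu)$:
$$p(x \mid \theta) = \frac{1}{Z(\theta)} \, h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle \right)$$
• sufficient statistics: $\phi(x)$
• canonical parameter function: $\eta(\theta)$
• partition function: $Z(\theta)$
• base density: $h(x)$ over $\mathcal{X}$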
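Partition Function
• The partition function is given by:
$$Z(\theta) = \int_{\mathcal{X}} h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle \right) \nu(dx)$$
• The log-partition function given by $A(\theta) = \log Z(\theta)$ is often used instead of $Z(\theta)$.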
Parameter Space
• An exponential family is essentially determined by the domain $\mathcal{X}$ and the sufficient statistics $\phi$.
• The set of valid parameters is $\Theta = \{ \theta : Z(\theta) < \infty \}$.
• An exponential family can be parameterized in many ways. When $\eta(\theta) = \theta$, it is said to be in the canonical form.
Many important families of distributions are exponential families:
• Binomial distributions
• Poisson distribution
• Normal distribution
• Exponential distribution
• Beta distribution
• And many more ...
Bernoulli Distribution
Domain: $\mathcal{X} = \{0, 1\}$
Parameter: $\theta \in (0, 1)$
Density: $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$
Bernoulli distributions describe an event that may or may not happen.
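Bernoulli Distribution (cont'd)
• sufficient statistics: $\phi(x) = x$
• canonical parameter: $\eta(\theta) = \log \frac{\theta}{1 - \theta}$
• base density: $h(x) = 1$ w.r.t. counting measure
• partition function: $Z(\theta) = \frac{1}{1 - \theta}$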
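Bernoulli Distribution (cont'd)
• sufficient statistics: $\phi(x) = (1 - x, x)$
• canonical parameters: $\eta(\theta) = (\log(1 - \theta), \log \theta)$
• base density: $h(x) = 1$ w.r.t. counting measure
• partition function: $Z(\theta) = 1$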
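Poisson Distribution
Domain: $\mathcal{X} = \{0, 1, 2, \ldots\}$
Parameter: $\lambda > 0$
Density: $p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$
Poisson distributions characterize the number of independent events occurring at a certain rate $\lambda$ within a unit time.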
Poisson Distribution (cont'd)
• sufficient statistics: $\phi(x) = x$
• canonical parameter: $\eta(\lambda) = \log \lambda$
• base density: $h(x) = \frac{1}{x!}$ w.r.t. counting measure
• partition function: $Z(\lambda) = e^{\lambda}$
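The generic exponential-family form can be checked against the familiar formulas. This sketch assumes the common textbook choices $\phi(x) = x$, $\eta = \log\frac{\theta}{1-\theta}$, $Z = \frac{1}{1-\theta}$ for the Bernoulli and $\phi(x) = x$, $\eta = \log\lambda$, $h(x) = 1/x!$, $Z = e^{\lambda}$ for the Poisson; the helper names are illustrative.

```python
import math

def exp_family_density(x, eta, phi, h, Z):
    """Generic scalar form p(x) = h(x) * exp(eta * phi(x)) / Z."""
    return h(x) * math.exp(eta * phi(x)) / Z

def bernoulli_pmf(x, theta):
    return exp_family_density(
        x,
        eta=math.log(theta / (1.0 - theta)),
        phi=lambda t: t,
        h=lambda t: 1.0,
        Z=1.0 / (1.0 - theta),
    )

def poisson_pmf(x, lam):
    return exp_family_density(
        x,
        eta=math.log(lam),
        phi=lambda t: t,
        h=lambda t: 1.0 / math.factorial(t),
        Z=math.exp(lam),
    )

# The Poisson pmf sums to (numerically) one over a long enough range.
poisson_total = sum(poisson_pmf(k, 1.5) for k in range(60))
```

Both functions reproduce the usual densities $\theta^x (1-\theta)^{1-x}$ and $\lambda^x e^{-\lambda} / x!$.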
Exponential Distribution
Domain: $\mathcal{X} = [0, \infty)$
Parameter: $\lambda > 0$
Density: $p(x \mid \lambda) = \lambda e^{-\lambda x}$
Exponential distributions characterize the time interval between independent events occurring at a certain rate $\lambda$.
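Exponential Distribution (cont'd)
• sufficient statistics: $\phi(x) = x$ or $\phi(x) = -x$
• canonical parameter: $\eta(\lambda) = -\lambda$ or $\eta(\lambda) = \lambda$
• base density: $h(x) = 1$ w.r.t. Lebesgue measure
• partition function: $Z(\lambda) = \int_0^\infty e^{-\lambda x} \, dx = \frac{1}{\lambda}$, which is finite only when $\lambda > 0$.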
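Normal Distribution
Domain: $\mathcal{X} = \mathbb{R}$
Parameter: $(\mu, \sigma^2)$ with $\sigma^2 > 0$
Density: $p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$
Normal distributions are probably the most widely used distributions in probabilistic analysis.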
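Normal Distribution (cont'd)
• sufficient statistics: $\phi(x) = (x, x^2)$
• canonical parameter: $\eta(\mu, \sigma^2) = \left( \frac{\mu}{\sigma^2}, -\frac{1}{2 \sigma^2} \right)$
• base density: $h(x) = 1$ w.r.t. Lebesgue measure
• partition function: $Z(\mu, \sigma^2) = \sqrt{2 \pi \sigma^2} \, \exp\left( \frac{\mu^2}{2 \sigma^2} \right)$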
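Normal Distr. in Canonical Form
The normal distribution can be alternatively parameterized in the canonical form:
• potential coefficient: $\gamma = \mu / \sigma^2$
• precision coefficient: $J = 1 / \sigma^2$
with $p(x \mid \gamma, J) \propto \exp\left( \gamma x - \tfrac{1}{2} J x^2 \right)$ and $J > 0$.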
Regular Family
In the sequel, we focus on exponential families in the canonical form. The set of valid canonical parameters is:
$$\Theta = \{ \theta \in \mathbb{R}^d : A(\theta) < \infty \}$$
The exponential family $\mathcal{P}$ is called a regular family if $\Theta$ is an open subset of $\mathbb{R}^d$. We restrict our attention to regular families.
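Identifiability
Let $\mathcal{P} = \{ P_\theta : \theta \in \Theta \}$ be a parameterized family:
• $\mathcal{P}$ is called identifiable when each distribution in $\mathcal{P}$ corresponds to a unique parameter in $\Theta$:
$$\theta_1 \neq \theta_2 \Rightarrow P_{\theta_1} \neq P_{\theta_2}$$
• Identifiability means that the parameter of a distribution can be learned from observed samples without the need of additional constraints.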
Minimal and Overcomplete
Consider an exponential family with sufficient stats $\phi$:
• If there exist a non-zero vector $a$ and a constant $c$ such that $\langle a, \phi(x) \rangle = c$ holds almost everywhere, this is called an overcomplete representation; otherwise, it is called a minimal representation.
• An exponential family is identifiable if and only if the representation is minimal. Why?
Minimal and Overcomplete (cont'd)
• Consider an exponential family with sufficient stats $\phi$ such that $\langle a, \phi(x) \rangle$ is constant. Then for each $\theta \in \Theta$, $\theta + t a$ for each $t \in \mathbb{R}$ is also in $\Theta$ and it yields the same distribution.
• We will answer why a minimal representation is identifiable later.
• An overcomplete representation is useful as it may lead to a more natural parameterization. Also, with additional constraints, it can be made identifiable.
Bernoulli Revisited
Consider two representations:
• [R1] $\phi(x) = x$
• [R2] $\phi(x) = (1 - x, x)$
For each representation:
• Is it minimal or overcomplete?
• If it is overcomplete, find $a \neq 0$ such that $\langle a, \phi(x) \rangle$ is constant.
• Is it identifiable or unidentifiable?
Mean Parameters
• The expectations of the sufficient statistics, as below, are called mean parameters:
$$\mu = E_\theta[\phi(X)] = \int_{\mathcal{X}} \phi(x) \, p(x \mid \theta) \, \nu(dx)$$
• Under certain conditions, the distribution in an exponential family is uniquely determined by the mean parameters, which thus provide an alternative parameterization.
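Realizable Mean Parameters
• Given sufficient stats $\phi$, we say a distribution $p$ realizes a mean parameter $\mu$ if $E_p[\phi(X)] = \mu$.
• The set of (realizable) mean parameters for given sufficient stats $\phi$ is:
$$\mathcal{M} = \{ \mu \in \mathbb{R}^d : \exists \, p \text{ such that } E_p[\phi(X)] = \mu \}$$
• Here, $p$ is not restricted to the exponential family.
• $\mathcal{M}$ is a convex set. Why?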
Convex Hulls
• Given a set $S$, the convex hull of $S$, denoted by $\mathrm{conv}(S)$, is the set of all convex combinations of elements in $S$.
• $\mathrm{conv}(S)$ is the minimum convex set containing $S$.
• A convex hull of some finite set is called a convex polytope.
• Convex polytopes are compact.
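Probability Simplex
Given a finite space $\mathcal{X}$, the probability simplex over $\mathcal{X}$:
$$\mathcal{P}(\mathcal{X}) = \left\{ p : \mathcal{X} \to [0, 1] \;\middle|\; \sum_{x \in \mathcal{X}} p(x) = 1 \right\}$$
When $\mathcal{X} = \{1, \ldots, n\}$, $\mathcal{P}(\mathcal{X})$ reduces to:
$$\Delta^{n-1} = \left\{ p \in \mathbb{R}^n : p_i \geq 0, \; \sum_{i=1}^n p_i = 1 \right\}$$
and $\Delta^{n-1}$ is an $(n-1)$-dimensional polytope.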
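Polytope of Mean Parameters
When the sample space $\mathcal{X}$ is finite, given any $\phi$, the set $\mathcal{M}$ is a convex polytope:
$$\mathcal{M} = \mathrm{conv}\{ \phi(x) : x \in \mathcal{X} \}$$
Particularly, each $\mu \in \mathcal{M}$ can be written as
$$\mu = \sum_{x \in \mathcal{X}} p(x) \, \phi(x), \quad p \in \mathcal{P}(\mathcal{X})$$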
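Log-partition Function
The log-partition function given by
$$A(\theta) = \log \int_{\mathcal{X}} h(x) \exp\left( \langle \theta, \phi(x) \rangle \right) \nu(dx)$$
has the following properties:
• $\nabla A(\theta) = E_\theta[\phi(X)]$ and $\nabla^2 A(\theta) = \mathrm{Cov}_\theta[\phi(X)]$.
• $A$ is a convex function and thus $\Theta$ is a convex set.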
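Log-partition Function (cont'd)
• For an overcomplete representation with $\langle a, \phi(x) \rangle = c$, $A$ has $a^T \nabla^2 A(\theta) \, a = 0$, because:
$$a^T \nabla^2 A(\theta) \, a = \mathrm{Var}_\theta\left( \langle a, \phi(X) \rangle \right) = 0$$
• Conversely, for a minimal representation, we have $\mathrm{Var}_\theta(\langle a, \phi(X) \rangle) > 0$ for every non-zero vector $a$; hence $\nabla^2 A(\theta)$ is positive definite for every $\theta \in \Theta$ and thus $A$ is strictly convex.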
Gradient Map
The gradient map $\nabla A : \Theta \to \mathcal{M}$ is a mapping from the canonical parameters $\theta$ to the mean parameters $\mu$.
• When is $\nabla A$ injective (i.e. one-to-one)?
• When is $\nabla A$ surjective (onto $\mathcal{M}$)?
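The link between the log-partition function and the mean parameters can be checked numerically. The sketch below assumes the Bernoulli family in canonical form with $\phi(x) = x$, for which $A(\theta) = \log(1 + e^{\theta})$, and compares finite-difference derivatives of $A$ with the mean and variance computed directly from the distribution.

```python
import math

def log_partition(theta):
    # Bernoulli in canonical form with phi(x) = x: A(theta) = log(1 + e^theta).
    return math.log1p(math.exp(theta))

def mean_parameter(theta):
    # E_theta[phi(X)] computed from p(x | theta) = exp(theta * x - A(theta)).
    return sum(x * math.exp(theta * x - log_partition(theta)) for x in (0, 1))

def numerical_gradient(f, theta, step=1e-5):
    # Central finite difference.
    return (f(theta + step) - f(theta - step)) / (2.0 * step)

theta = 0.8
grad = numerical_gradient(log_partition, theta)   # approximates the gradient map
mu = mean_parameter(theta)                        # mean parameter
curvature = numerical_gradient(
    lambda t: numerical_gradient(log_partition, t), theta
)                                                 # approximates the variance of phi(X)
```

The first derivative matches the mean parameter and the second derivative matches the variance $\mu (1 - \mu)$, which is strictly positive, in line with strict convexity of $A$ for a minimal representation.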
Gradient Map (cont'd)
• The gradient map is injective if and only if the exponential representation is minimal.
• An exponential family with minimal representation is identifiable.
• How to prove? $A$ is strictly convex $\Rightarrow$ $\langle \nabla A(\theta_1) - \nabla A(\theta_2), \theta_1 - \theta_2 \rangle > 0$ for $\theta_1 \neq \theta_2$.
• With an overcomplete representation, there is a one-to-one correspondence between mean parameters and affine subsets of $\Theta$.
Gradient Map (cont'd)
• With minimal representation, $\nabla A$ is onto $\mathcal{M}^\circ$, the interior of $\mathcal{M}$.
• Each mean parameter $\mu \in \mathcal{M}^\circ$ is uniquely realized by a canonical parameter $\theta \in \Theta$.
• Given $\mu \in \mathcal{M}^\circ$, there can be many distributions that realize $\mu$, among which there is one that maximizes the entropy, which is in the exponential family associated with $\phi$ (we will see this).
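Maximum Entropy (Solution)
Using the method of Lagrange multipliers, we get the optimum $p^*$:
$$p^*(x) \propto \exp\left( \langle \theta, \phi(x) \rangle \right)$$
where $\theta$ collects the Lagrange multipliers of the expectation constraints $E_p[\phi(X)] = \mu$.
The solution to a maximum entropy problem with expectation constraints is always an exponential family distribution. This can be generalized to continuous spaces, using calculus of variations.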
Kullback-Leibler Divergence
The Kullback-Leibler divergence (or KL divergence) between two probability densities $p$ and $q$ (w.r.t. the same base measure $\nu$) is defined to be
$$\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, \nu(dx)$$
KL divergence is not symmetric.
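KL Divergence (cont'd)
• (Gibbs' inequality) $\mathrm{KL}(p \,\|\, q) \geq 0$, where the equality holds if and only if $p = q$ almost everywhere (w.r.t. the base measure).
• Given two distributions $p_{\theta_1}, p_{\theta_2}$ in the same exponential family (canonical form) with sufficient stats $\phi$ and log-partition function $A$:
$$\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = \langle \theta_1 - \theta_2, \mu_1 \rangle - A(\theta_1) + A(\theta_2), \quad \mu_1 = E_{\theta_1}[\phi(X)]$$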
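As a sanity check of the exponential-family expression $\mathrm{KL}(p_{\theta_1} \| p_{\theta_2}) = \langle \theta_1 - \theta_2, \mu_1 \rangle - A(\theta_1) + A(\theta_2)$, the sketch below compares it with the definition of KL divergence for two Bernoulli distributions. It assumes the canonical Bernoulli form with $\theta = \log\frac{\mu}{1-\mu}$ and $A(\theta) = \log(1 + e^{\theta})$; the function names are illustrative.

```python
import math

def kl_direct(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bernoulli_expfam(mu1, mu2):
    """KL between Bernoulli(mu1) and Bernoulli(mu2) via canonical parameters."""
    theta1 = math.log(mu1 / (1.0 - mu1))
    theta2 = math.log(mu2 / (1.0 - mu2))
    log_partition = lambda t: math.log1p(math.exp(t))
    return (theta1 - theta2) * mu1 - log_partition(theta1) + log_partition(theta2)

# Distributions listed as [P(X = 0), P(X = 1)].
p = [0.7, 0.3]
q = [0.6, 0.4]
forward = kl_direct(p, q)
backward = kl_direct(q, p)
```

Both routes give the same value, and swapping the arguments changes the result, which illustrates the asymmetry.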
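Projections of Distributions
Let $\mathcal{P}$ be an exponential family and $q$ be a distribution, both over the same space $\mathcal{X}$:
• The I-projection (information projection) of $q$ onto $\mathcal{P}$:
$$p^I = \operatorname*{argmin}_{p \in \mathcal{P}} \mathrm{KL}(p \,\|\, q)$$
• The M-projection (moment projection) of $q$ onto $\mathcal{P}$:
$$p^M = \operatorname*{argmin}_{p \in \mathcal{P}} \mathrm{KL}(q \,\|\, p)$$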
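Maximum Likelihood Estimation
Given a parameterized family $\{ p_\theta : \theta \in \Theta \}$ over $\mathcal{X}$, and samples $D = \{x_1, \ldots, x_n\}$, the log-likelihood of $\theta$:
$$\ell(\theta; D) = \sum_{i=1}^n \log p(x_i \mid \theta)$$
$\hat{\theta}$ is called a maximum likelihood estimate given $D$ if
$$\hat{\theta} \in \operatorname*{argmax}_{\theta \in \Theta} \ell(\theta; D)$$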
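MLE (cont'd)
• Given $D = \{x_1, \ldots, x_n\}$, $\tilde{p}_D = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$ is called an empirical probability measure, which has
$$E_{\tilde{p}_D}[f(X)] = \frac{1}{n} \sum_{i=1}^n f(x_i)$$
• The log-likelihood of $\theta$ given $D$ can be rewritten as
$$\ell(\theta; D) = n \, E_{\tilde{p}_D}[\log p(X \mid \theta)]$$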
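MLE (cont'd)
• We have
$$\frac{1}{n} \ell(\theta; D) = -\mathrm{KL}(\tilde{p}_D \,\|\, p_\theta) - H(\tilde{p}_D)$$
where $\tilde{p}_D$ is the empirical distribution of $D$ and $H$ denotes the entropy.
• Maximizing $\ell(\theta; D)$ is equivalent to minimizing $\mathrm{KL}(\tilde{p}_D \,\|\, p_\theta)$.
• Maximum likelihood estimation is equivalent to M-projection of the empirical distribution $\tilde{p}_D$ onto a given family.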
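M-projections
Given an exponential family $\mathcal{P}$ (in canonical form) and an arbitrary distribution $q$ over $\mathcal{X}$, then
$$\mathrm{KL}(q \,\|\, p_\theta) = A(\theta) - \langle \theta, E_q[\phi(X)] \rangle + \text{const}$$
where the constant does not depend on $\theta$. Thus, the optimum $\theta^*$ is
$$\theta^* = \operatorname*{argmin}_{\theta \in \Theta} \; A(\theta) - \langle \theta, \mu_q \rangle, \quad \mu_q = E_q[\phi(X)]$$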
M-projections (cont'd)
• This is a convex problem. $\theta^*$ is optimal iff:
$$\nabla A(\theta^*) = E_{\theta^*}[\phi(X)] = E_q[\phi(X)]$$
• M-projection onto an exponential family $\mathcal{P}$ is to find a distribution in $\mathcal{P}$ whose mean parameter matches the input mean $\mu_q = E_q[\phi(X)]$. ($\mu_q$ is always realizable, why?)
• With minimal representation the optimum is unique; otherwise, the set of optimal solutions is an affine subset of $\Theta$, all of which yield the same distribution.
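The moment-matching condition turns maximum likelihood into a one-line computation for simple families. The sketch below assumes the Poisson family in canonical form ($\phi(x) = x$, $A(\theta) = e^{\theta}$), so matching $\nabla A(\theta) = e^{\theta}$ to the empirical mean gives $\hat{\theta} = \log \bar{x}$; the sample values are made up.

```python
import math

def poisson_log_likelihood(theta, samples):
    # Canonical form: log p(x | theta) = theta * x - exp(theta) - log(x!).
    return sum(theta * x - math.exp(theta) - math.lgamma(x + 1) for x in samples)

def poisson_mle(samples):
    # Moment matching: exp(theta) must equal the empirical mean of phi(x) = x.
    return math.log(sum(samples) / len(samples))

samples = [2, 0, 3, 1, 4, 2]          # hypothetical observations
theta_hat = poisson_mle(samples)
best = poisson_log_likelihood(theta_hat, samples)
```

Perturbing `theta_hat` in either direction lowers the log-likelihood, as expected for the maximizer of a concave objective.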
What about I-projections?
• We will see their utility when we talk about mean field methods and variational inference.
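Convex Conjugate
Let $f$ be a real-valued function $f : \mathbb{R}^d \to \mathbb{R}$:
• The convex conjugate of $f$ is defined to be
$$f^*(y) = \sup_{x} \left( \langle x, y \rangle - f(x) \right)$$
• $f^*$ is always convex, whether or not $f$ is.
• $f^{**} = (f^*)^*$ is convex.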
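Convex Conjugate
• (Fenchel's inequality) $f(x) + f^*(y) \geq \langle x, y \rangle$
• (Fenchel-Moreau theorem) $f = f^{**}$ iff $f$ is convex and lower semi-continuous:
$$f(x) = \sup_{y} \left( \langle x, y \rangle - f^*(y) \right)$$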
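Conjugate Duality
• $(\theta, \mu)$ is called dually coupled if $\mu = \nabla A(\theta) = E_\theta[\phi(X)]$.
• The convex conjugate to a log-partition function $A$:
$$A^*(\mu) = \sup_{\theta \in \Theta} \left( \langle \theta, \mu \rangle - A(\theta) \right)$$
• The supremum is attained at $\theta$ iff $(\theta, \mu)$ is dually coupled.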
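Conjugate Duality (cont'd)
• With $A^{**} = A$, the log-partition function $A$ has:
$$A(\theta) = \sup_{\mu \in \mathcal{M}} \left( \langle \theta, \mu \rangle - A^*(\mu) \right)$$
• The supremum is attained at $\mu$ iff $(\theta, \mu)$ is dually coupled, which has $A^*(\mu) = -H(p_\theta)$.
• With a minimal representation, $\nabla A$ maps $\Theta$ one-to-one onto $\mathcal{M}^\circ$, while $\nabla A^*$ is the inverse map.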
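Prior and Posterior
• In Bayesian analysis, we usually place a prior with density $p(\theta)$ over the parameter space $\Theta$.
• A parameter $\theta$ is linked to observations $x$ via a likelihood model: $p(x \mid \theta)$.
• The posterior measure given $x$ is
$$p(\theta \mid x) = \frac{p(\theta) \, p(x \mid \theta)}{\int_\Theta p(\theta') \, p(x \mid \theta') \, d\theta'}$$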
Prior and Posterior (cont'd)
• Computing the posterior distribution is in general very difficult.
• However, under certain conditions (e.g. when the prior is conjugate to the likelihood model), the computation becomes particularly easy.
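Conjugate Prior
• A prior with density $p(\theta \mid \alpha)$ is called a conjugate prior to the likelihood model $p(x \mid \theta)$ if the posterior distribution given $x$ is in the same parameterized family, i.e. in the form $p(\theta \mid \alpha \oplus x)$.
• $\oplus$ is left-associative and satisfies $\alpha \oplus x_1 \oplus x_2 = \alpha \oplus x_2 \oplus x_1$.
• When $x_1, \ldots, x_n$ are observed, $p(\theta \mid x_1, \ldots, x_n) = p(\theta \mid \alpha \oplus x_1 \oplus \cdots \oplus x_n)$. The result is independent of the order of samples.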
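CP for Exponential Families
Generally, conjugate pairs in exponential families are in the following form:
• Prior: $p(\theta \mid \alpha, \beta) \propto \exp\left( \langle \alpha, \eta(\theta) \rangle - \beta A(\theta) \right)$
• Likelihood: $p(x \mid \theta) = h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle - A(\theta) \right)$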
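CP for Exponential Families (cont'd)
Hence, the posterior update, where $\alpha$ and $\beta$ are the prior hyperparameters multiplying $\eta(\theta)$ and $-A(\theta)$:
• with a single observation: $(\alpha, \beta) \mapsto (\alpha + \phi(x), \beta + 1)$
• with multiple observations: $(\alpha, \beta) \mapsto \left( \alpha + \sum_{i=1}^n \phi(x_i), \beta + n \right)$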
CP for Exponential Families (cont'd)
• The family of conjugate priors is largely determined by the likelihood model, particularly by the form of $\eta(\theta)$ and $A(\theta)$.
• A family of prior distributions can serve as the conjugate priors to different likelihood models.
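Example: Beta-Bernoulli
• Prior: Beta distribution, $p(\theta \mid a, b) \propto \theta^{a - 1} (1 - \theta)^{b - 1}$
• Likelihood: Bernoulli distribution, $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$
• Posterior: remains a Beta distribution, $\theta \mid x \sim \mathrm{Beta}(a + x, b + 1 - x)$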
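The Beta-Bernoulli update amounts to counting successes and failures. A minimal sketch, with an illustrative function name and made-up observations:

```python
def beta_bernoulli_update(a, b, observations):
    """Posterior Beta(a', b') after observing a list of 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures

observations = [1, 0, 1, 1]           # hypothetical coin flips
batch = beta_bernoulli_update(1, 1, observations)

# Updating one observation at a time, in reverse order, gives the same posterior.
sequential = (1, 1)
for x in reversed(observations):
    sequential = beta_bernoulli_update(*sequential, [x])

posterior_mean = batch[0] / (batch[0] + batch[1])
```

The agreement of `batch` and `sequential` reflects that the posterior does not depend on the order of the samples.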
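Example: Normal-Normal
• Prior: Normal distribution, $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$
• Likelihood: Normal distribution (fixed variance), $x \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$
• Posterior: remains a Normal distribution, $\mu \mid x \sim \mathcal{N}(\mu_1, \sigma_1^2)$ with $\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}$ and $\mu_1 = \sigma_1^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2} \right)$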
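Dirichlet Distribution
• Dirichlet distribution is a distribution over the probability simplex $\Delta^{K-1}$.
• It is often used as a conjugate prior to the Categorical distribution or the Multinomial distribution.
• With $\alpha = (\alpha_1, \ldots, \alpha_K)$, $\alpha_k > 0$, as the parameter, its density:
$$p(\theta \mid \alpha) = \frac{\Gamma\left( \sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_k^{\alpha_k - 1}$$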
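Dirichlet-Categorical
• Prior: $\theta \sim \mathrm{Dir}(\alpha)$
• Likelihood: $p(x = k \mid \theta) = \theta_k$
• Posterior: remains a Dirichlet distribution: $\theta \mid x \sim \mathrm{Dir}(\alpha + e_x)$, where $e_x$ is the indicator vector of the observed category
• When $\alpha = (1, \ldots, 1)$, $\mathrm{Dir}(\alpha)$ reduces to a uniform distribution over $\Delta^{K-1}$.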
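Dirichlet Distribution (cont'd)
• Dirichlet distributions are an exponential family:
• Canonical parameter: $\alpha - \mathbf{1}$
• Sufficient stats: $\phi(\theta) = (\log \theta_1, \ldots, \log \theta_K)$
• Log-partition: $A(\alpha) = \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma\left( \sum_{k=1}^K \alpha_k \right)$
• Hence, $E[\log \theta_k] = \frac{\partial A}{\partial \alpha_k} = \psi(\alpha_k) - \psi\left( \sum_{j=1}^K \alpha_j \right)$, where $\psi$ is the digamma function.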
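Predictive Distribution
Given observations $D = \{x_1, \ldots, x_n\}$, what is the distribution of a new sample $x$?
$$p(x \mid D) = \int p(x \mid \theta) \, p(\theta \mid D) \, d\theta$$
With exponential family and conjugacy, we have
$$p(x \mid \alpha, \beta) = h(x) \, \frac{B(\alpha + \phi(x), \beta + 1)}{B(\alpha, \beta)}, \quad B(\alpha, \beta) = \int \exp\left( \langle \alpha, \eta(\theta) \rangle - \beta A(\theta) \right) d\theta$$
where $(\alpha, \beta)$ are the hyperparameters of the current (posterior) conjugate distribution over $\theta$.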
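For the Dirichlet-Categorical pair the predictive distribution has a simple closed form: after updating the pseudo-counts with the observed counts, the probability of category $k$ is its share of the total. The sketch below assumes categories are indexed from 0; the function names and the data are illustrative.

```python
def dirichlet_posterior(alpha, observations):
    """Add one pseudo-count for each observed category index."""
    posterior = list(alpha)
    for k in observations:
        posterior[k] += 1
    return posterior

def predictive(alpha):
    """P(next sample = k) under a Dirichlet(alpha) belief: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

prior = [1.0, 1.0, 1.0]                # uniform prior over three categories
data = [0, 2, 2, 1, 2]                 # hypothetical observations
posterior = dirichlet_posterior(prior, data)
next_sample_probs = predictive(posterior)
```

With a uniform prior this reproduces add-one (Laplace) smoothing of the empirical frequencies.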
Important Conjugate Pairs
• Beta: the probability parameter of Bernoulli, Binomial, Geometric, or Negative Binomial
• Normal: the mean parameter of Normal
• InverseGamma: the variance parameter of Normal
• Gamma: the rate parameter of Exponential or Poisson, or the precision parameter of Normal
Important Conjugate Pairs (cont'd)
• Dirichlet: the probability vector of Categorical or Multinomial
• Multivariate Normal: the mean vector of Multivariate Normal
• InverseWishart: the covariance matrix of Multivariate Normal
• Wishart: the precision matrix of Multivariate Normal
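GMM
[Figure: plate diagram with nodes $\pi$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$ and plates $N$ and $M$]
A Gaussian Mixture Model (GMM) with fixed variance:
$$z_i \sim \mathrm{Categorical}(\pi), \qquad x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2)$$
• This model is not complete.
• How are $\pi$ and $\mu_k$ generated?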
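GMM (with Prior)
[Figure: plate diagram with nodes $\pi$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$, $\mu_0$, $\sigma_0^2$, $\alpha$ and plates $N$ and $M$]
With priors placed over model parameters, we get a Hierarchical Bayesian Model:
$$\pi \sim \mathrm{Dir}(\alpha), \qquad \mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2), \qquad z_i \sim \mathrm{Categorical}(\pi), \qquad x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2)$$
• Hyperparameters have no parents (top level)
• Observations have no children (bottom level)
• Each unknown variable is generated according to its parents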
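The rule that each variable is generated according to its parents translates directly into ancestral sampling. The sketch below assumes the usual hierarchical GMM (Dirichlet prior on the mixing weights, Normal prior on the component means, fixed observation variance); all hyperparameter values and function names are made up for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_gmm(n, alpha, mu0, sigma0, sigma, rng):
    """Sample top-down: hyperparameters -> parameters -> assignments -> data."""
    pi = sample_dirichlet(alpha, rng)                        # mixing weights
    mus = [rng.gauss(mu0, sigma0) for _ in alpha]            # component means
    zs = rng.choices(range(len(alpha)), weights=pi, k=n)     # assignments
    xs = [rng.gauss(mus[z], sigma) for z in zs]              # observations
    return pi, mus, zs, xs

rng = random.Random(0)
pi, mus, zs, xs = sample_gmm(
    n=200, alpha=[1.0, 1.0, 1.0], mu0=0.0, sigma0=5.0, sigma=1.0, rng=rng
)
```

The sampling order follows a topological ordering of the model's graph, so every variable is drawn only after all of its parents.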
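PLSI
[Figure: plate diagram with nodes $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]
Probabilistic Latent Semantic Indexing (PLSI):
• Each topic is associated with $\beta_k$, a distribution over the vocabulary.
• Each document $d$ comes with a vector of topic proportions $\theta_d$.
• To generate each word $w_{di}$: $z_{di} \sim \mathrm{Categorical}(\theta_d)$, $w_{di} \sim \mathrm{Categorical}(\beta_{z_{di}})$.
• This is not a complete model.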
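LDA
[Figure: plate diagram with nodes $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$, $\alpha$, $\eta$ and plates $M$, $N$, $n_d$]
Latent Dirichlet Allocation (LDA) completes PLSI by placing Dirichlet priors over latent variables:
• For each document, the topic proportions are generated as $\theta_d \sim \mathrm{Dir}(\alpha)$.
• For each topic, the word distribution is generated as $\beta_k \sim \mathrm{Dir}(\eta)$.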
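The full generative process can be written as a short sampler. The sketch below assumes symmetric Dirichlet priors with scalar concentrations `alpha` (topic proportions) and `eta` (word distributions), fixed-length documents and integer word ids; these names and all sizes are illustrative choices.

```python
import random

def sample_dirichlet(alpha, rng):
    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, doc_length, num_topics, vocab_size, alpha, eta, rng):
    # One word distribution per topic, drawn from a symmetric Dirichlet.
    topics = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Per-document topic proportions.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            z = rng.choices(range(num_topics), weights=theta)[0]   # topic of this word
            w = rng.choices(range(vocab_size), weights=topics[z])[0]  # the word itself
            words.append(w)
        corpus.append(words)
    return topics, corpus

rng = random.Random(1)
topics, corpus = generate_corpus(
    num_docs=5, doc_length=20, num_topics=3, vocab_size=10, alpha=0.5, eta=0.5, rng=rng
)
```

Dropping the two Dirichlet draws and treating `topics` and `theta` as fixed parameters recovers the PLSI word-generation step.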
Summary
• The Basics of Graphical Models
  • Bayesian Networks and Markov Random Fields.
  • How the joint distribution factorizes according to the graph.
  • Relations between graphical structure and conditional independencies.
  • Factor graphs.
Summary (cont'd)
• The Basics of Exponential Families
  • The form of exponential families
  • Minimal and overcomplete representation, identifiability
  • Convexity of log-partition function, gradient map
  • KL divergence, projections of distributions
  • Conjugate duality between log-partition function and negative entropy
Summary (cont'd)
• Conjugate Prior
  • Posterior distributions in Bayesian analysis
  • Conjugate prior, especially of exponential families
  • Important conjugate pairs