Lecture 4: Graphical Model and Exponential Family. Dahua Lin, The Chinese University of Hong Kong

MLPI Lecture 4: Graphical Model and Exponential Family


Outline

You will learn the basics of probabilistic modeling in this lecture:

• Graphical Models

• Exponential Families

• Conjugate Prior

• How to formulate and analyze a graphical model in practice.

Graphical Models

• The key idea behind graphical models is factorization.

• A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.

Graphical Models (cont'd)

A graphical model can be viewed in two ways:

• A data structure that provides the skeleton for representing a joint distribution in a factorized manner.

• A compact representation of a set of conditional independencies about a family of distributions.

These two views are equivalent in a strict sense.

Distributions on a Graph

Consider a graph $G = (V, E)$, where edges can be directed or undirected:

• Attach a random variable $X_v$ to each vertex $v \in V$.

• The state space for $X_v$ is denoted by $\mathcal{X}_v$.

• A particular instance of $X_v$ is denoted by $x_v$.

• We can also consider a set of variables: $X_A = (X_v)_{v \in A}$ and $x_A = (x_v)_{v \in A}$.

Categories of Graphical Models

• Bayesian Networks (Directed Acyclic Graphs)

• Markov Random Fields (Undirected Graphs)

• Chain Graphs (Directed acyclic graphs over undirected components)

• Factor Graphs

Directed Acyclic Graphs

Consider a directed graph $G = (V, E)$:

• $G$ is called a directed acyclic graph (DAG) if it has no directed cycles.

Directed Acyclic Graphs (cont'd)

Consider a directed acyclic graph $G = (V, E)$:

• Given an edge $(u, v) \in E$, $u$ is called a parent of $v$, and $v$ is called a child of $u$.

• A vertex $u$ is called an ancestor of $v$, and $v$ a descendant of $u$, denoted as $u \prec v$, if there exists a directed path from $u$ to $v$.

Topological Ordering

• A topological ordering of a directed graph $G = (V, E)$ is a linear ordering of vertices such that for each edge $(u, v) \in E$, $u$ always comes before $v$.

• A finite directed graph is acyclic if and only if it has a topological ordering.
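The equivalence between acyclicity and the existence of a topological ordering can be checked constructively. The sketch below uses Kahn's algorithm, which is one standard choice and not necessarily the one used in the lecture; the function name `topological_order` and the edge-list input format are illustrative assumptions.

```python
from collections import deque


def topological_order(vertices, edges):
    """Return a topological ordering of a directed graph, or None if it has a cycle.

    `vertices` is a list of hashable labels and `edges` a list of (u, v) pairs
    meaning u -> v. Both formats are assumptions made for this sketch.
    """
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1

    # Start from vertices without parents and repeatedly peel them off.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # If some vertex was never released, the graph contains a directed cycle.
    return order if len(order) == len(vertices) else None
```

Returning `None` on a cyclic input mirrors the "if and only if" statement: an ordering is produced exactly when the graph is a DAG.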

Bayesian Networks

Given a DAG $G = (V, E)$, we say a joint distribution over $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:

$$p(x) = \prod_{v \in V} p_v(x_v \mid x_{\mathrm{pa}(v)})$$

• Such a model is called a Bayesian Network over $G$.

• $\mathrm{pa}(v)$ is the set of $v$'s parents, which can be empty.
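As a concrete instance of this factorization, the sketch below evaluates the joint probability of a three-variable binary network as a product of one conditional per vertex. The network (`rain`, `sprinkler`, `wet`) and all probability values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Parent sets of a hypothetical DAG: rain -> wet <- sprinkler.
parents = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# cpt[v][parent_values] = P(v = 1 | parents); the numbers are made up.
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.3},
    "wet": {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99},
}


def joint(x):
    """Joint probability of assignment x: the product of p(x_v | x_pa(v)) over all v."""
    prob = 1.0
    for v, pa in parents.items():
        p_one = cpt[v][tuple(x[u] for u in pa)]
        prob *= p_one if x[v] == 1 else 1.0 - p_one
    return prob


# Because every factor is a normalized conditional, the joint sums to one
# without any extra normalizing constant.
total = sum(
    joint(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

No normalizing constant appears here, in contrast to the undirected case.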

Bayesian Networks: Example

[Figure: example Bayesian network]

Undirected Graphs and Cliques

Consider an undirected graph $G = (V, E)$:

• A clique is a fully connected subset of vertices.

• A clique is called maximal if it is not properly contained in another clique.

• $\mathcal{C}(G)$ denotes the set of all maximal cliques.

Undirected Graphs and Cliques (cont'd)

[Figure: cliques of an undirected graph]

Markov Random Fields

Consider an undirected graph $G = (V, E)$. We say a joint distribution of $X_V$ factorizes according to $G$ if its density $p$ can be expressed as:

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$$

• This is called a Markov Random Field over $G$.

• $\psi_C$ are called factors.

Markov Random Fields (cont'd)

• The normalizing constant $Z$ is usually needed to ensure the distribution is properly normalized:

$$Z = \int \prod_{C \in \mathcal{C}(G)} \psi_C(x_C) \, \mu(dx)$$

• Generally, the compatibility functions $\psi_C$ need not have any obvious relations with the marginal or conditional distributions over the cliques.
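For a small discrete model the normalizing constant can be computed by brute-force enumeration. The sketch below does this for a hypothetical binary MRF on a triangle with pairwise factors; the factor values and the names `psi`, `unnormalized`, and `density` are assumptions for illustration only.

```python
from itertools import product

# Hypothetical triangle graph over binary variables a, b, c.
names = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]


def psi(xu, xv):
    """Pairwise compatibility: favors equal neighbors (values are made up)."""
    return 2.0 if xu == xv else 1.0


def unnormalized(x):
    """Product of all factors for assignment x (a dict from name to 0/1)."""
    value = 1.0
    for u, v in edges:
        value *= psi(x[u], x[v])
    return value


# Enumerate the whole state space; feasible only for tiny models.
states = [dict(zip(names, values)) for values in product((0, 1), repeat=len(names))]
Z = sum(unnormalized(x) for x in states)


def density(x):
    """Normalized probability of assignment x."""
    return unnormalized(x) / Z
```

Note that `psi` is not a marginal or conditional of the resulting distribution, which illustrates the last remark above.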

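MRF Parameterization

[Figure: undirected graph over vertices a, b, c]

• All MRFs can be parameterized in terms of maximal cliques. In practice, this is not necessarily the most natural way.

• Natural parameterization: $p(a, b, c) \propto \psi_{ab}(a, b) \, \psi_{bc}(b, c) \, \psi_{ac}(a, c)$

• Maximal-clique based: $p(a, b, c) \propto \psi_{abc}(a, b, c)$

with $\psi_{abc}(a, b, c) = \psi_{ab}(a, b) \, \psi_{bc}(b, c) \, \psi_{ac}(a, c)$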

The graphical structure also encodes a set of conditional independencies among the variables.

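Conditional Independence

Consider a joint distribution over $(X, Y, Z)$. $X$ and $Y$ are called conditionally independent given $Z$, denoted by $X \perp Y \mid Z$, iff

$$P(X \in A, Y \in B \mid Z) = P(X \in A \mid Z) \, P(Y \in B \mid Z) \quad \text{for all measurable } A, B$$

More generally, the same definition applies to sets of variables, written $X_A \perp X_B \mid X_C$.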

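Conditional Independence (cont'd)

If the conditional distributions $P(X \mid Z)$ and $P(Y \mid Z)$ have densities $p(x \mid z)$ and $p(y \mid z)$, then $X \perp Y \mid Z$ if the following equality holds almost surely:

$$p(x, y \mid z) = p(x \mid z) \, p(y \mid z)$$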

I-map

• Let $\mathcal{P}$ be a family of distributions (e.g. a graphical model). We define $\mathcal{I}(\mathcal{P})$ to be the set of conditional independencies in the form of $X_A \perp X_B \mid X_C$ that hold for all distributions in $\mathcal{P}$.

• Given a graph $G$ associated with a set of conditional independencies $\mathcal{I}(G)$, $G$ is called an I-map of $\mathcal{P}$ if $\mathcal{I}(G) \subseteq \mathcal{I}(\mathcal{P})$.

• An I-map is a graph that captures (part of) the conditional independencies of a distribution family.

Conditional Independencies of MRFs

• The conditional independencies of an MRF can be characterized in three ways:

• Local independencies

• Pairwise independencies

• Global independencies

• In the sequel, we consider an undirected graph $G = (V, E)$.

Local Independencies

• For each $v \in V$, the Markov blanket of $v$ w.r.t. $G$, denoted by $\mathcal{N}_G(v)$, is the set of all neighbors of $v$.

• Local independencies $\mathcal{I}_l(G)$: $X_v$ is independent of the rest given its neighbors.

Pairwise Independencies

• Pairwise independencies $\mathcal{I}_p(G)$: Given two disjoint sets $A, B \subset V$ with no direct edges between them, $X_A$ is independent of $X_B$ given the rest:

$$X_A \perp X_B \mid X_{V \setminus (A \cup B)}$$

Global Independencies

• We say $C$ separates $A$ and $B$, denoted by $\mathrm{sep}_G(A, B \mid C)$, if all paths between $A$ and $B$ go through $C$.

• Global independencies $\mathcal{I}(G)$: If $C$ separates $A$ and $B$, then $X_A$ is independent of $X_B$ given $X_C$.
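Graph separation is a purely combinatorial condition, so it can be tested with a graph search that is forbidden from entering the separating set. The following is a minimal sketch under the assumption that the graph is given as an adjacency dictionary and that the three vertex sets are disjoint; the name `separates` is illustrative.

```python
from collections import deque


def separates(adj, A, B, C):
    """Return True if every path between A and B in the undirected graph passes through C.

    `adj` maps each vertex to an iterable of its neighbors (assumed format).
    A, B, C are assumed to be pairwise disjoint collections of vertices.
    """
    A, B, C = set(A), set(B), set(C)
    frontier = deque(A)
    seen = set(A)
    while frontier:
        u = frontier.popleft()
        if u in B:
            # Reached B without ever stepping on a vertex of C.
            return False
        for v in adj[u]:
            if v not in seen and v not in C:
                seen.add(v)
                frontier.append(v)
    return True
```

For example, on the four-cycle a-b-c-d-a, the single vertex b does not separate a from c, while the pair {b, d} does.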

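Relations between Independencies

• Given a distribution or a family of distributions $P$, we say $P$ satisfies $\mathcal{I}$ if it satisfies all conditional independencies in $\mathcal{I}$, denoted by $P \models \mathcal{I}$.

• $P \models \mathcal{I}(G) \Rightarrow P \models \mathcal{I}_l(G) \Rightarrow P \models \mathcal{I}_p(G)$.

• If $P$ is a family of positive distributions, then $P \models \mathcal{I}_p(G) \Rightarrow P \models \mathcal{I}(G)$, so the three conditions are equivalent.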

Soundness

• Let $P$ be a distribution that factorizes according to an undirected graph $G$, then $P \models \mathcal{I}(G)$, or in other words, $G$ is an I-map of $P$.

• Consequently, $P \models \mathcal{I}_l(G)$ and $P \models \mathcal{I}_p(G)$.

• How to prove it?

• How is the separation assumption related to the maximal cliques?

We have shown that if $P$ factorizes according to $G$, then $G$ is an I-map for $P$. Is the converse also true?

Hammersley-Clifford

• (Hammersley-Clifford Theorem) Let $P$ be a positive distribution over $X_V$ and $G$ be an I-map of $P$, then $P$ factorizes according to $G$.

• Combining Soundness and Hammersley-Clifford:

• A positive distribution $P$ factorizes according to $G$ if and only if $G$ is an I-map of $P$.

Conditional Independencies of BN

• Conditional independencies of a Bayesian network can be characterized in two ways:

• local independencies

• global independencies (via d-separation)

• In the sequel, we consider a directed graph $G = (V, E)$.

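Local Independencies

[Figure: DAG over vertices a, b, c, d, e, f, g, h, i]

• Given $v \in V$, $X_v$ is independent of its non-descendants given its parents $\mathrm{pa}(v)$:

$$X_v \perp X_{\mathrm{nd}(v)} \mid X_{\mathrm{pa}(v)}$$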

[Figure: three-vertex trails between X and Y through Z: indirect effect (chain), common cause, and common effect]

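d-separation

When "influence" can flow from $X$ to $Y$ via $Z$, we say that the trail $X \rightleftharpoons Z \rightleftharpoons Y$ is active:

• $X \rightarrow Z \rightarrow Y$ is active iff $Z$ is not observed.

• $X \leftarrow Z \leftarrow Y$ is active iff $Z$ is not observed.

• $X \leftarrow Z \rightarrow Y$ is active iff $Z$ is not observed.

• (V-structure) $X \rightarrow Z \leftarrow Y$ is active iff either $Z$ or some of $Z$'s descendants is observed.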

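d-separation (cont'd)

• A trail $X_1 \rightleftharpoons \cdots \rightleftharpoons X_n$ is called active when all sub-trails $X_{i-1} \rightleftharpoons X_i \rightleftharpoons X_{i+1}$ are active.

• Let $A, B, C$ be three sets of vertices of $G$. $A$ and $B$ are d-separated by $C$, denoted by $\mathrm{dsep}_G(A, B \mid C)$, if there is neither direct link nor active trail between $A$ and $B$ when $C$ are observed.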

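Global Independencies

[Figure: DAG over vertices a, b, c, d, e, f, g, h, i]

• Given $G$, $X_A$ is independent of $X_B$ given $X_C$ if $A$ and $B$ are d-separated by $C$ on the graph $G$:

$$\mathcal{I}(G) = \{ X_A \perp X_B \mid X_C : \mathrm{dsep}_G(A, B \mid C) \}$$

• (Soundness) If $P$ factorizes according to $G$, then $P \models \mathcal{I}(G)$, or we say $G$ is an I-map of $P$.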

Moralized Graphs

• Given a directed graph $G$, we can construct a moralized graph, denoted by $\mathcal{M}(G)$: an undirected graph with an edge between each node and each of its parents, and an edge between every pair of parents of the same node.

• In $\mathcal{M}(G)$, the subgraph that spans $\{v\} \cup \mathrm{pa}(v)$ forms a clique, denoted by $C_v$.

• The procedure of constructing $\mathcal{M}(G)$ from $G$ is called moralization.
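Moralization is mechanical enough to sketch in a few lines. The version below assumes the DAG is given as a dictionary from each vertex to a list of its parents, and represents undirected edges as `frozenset` pairs; both choices, and the name `moralize`, are illustrative.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edge set of the moralized graph.

    `parents` maps each vertex to a list of its parents (assumed format).
    Each undirected edge is a frozenset of its two endpoints.
    """
    edges = set()
    for v, pa in parents.items():
        # Keep every parent-child link, dropping its direction.
        for u in pa:
            edges.add(frozenset((u, v)))
        # "Marry" the parents: connect every pair of parents of v.
        for u, w in combinations(pa, 2):
            edges.add(frozenset((u, w)))
    return edges
```

For the v-structure with `c` as the common child of `a` and `b`, the result is the full triangle: the added edge between `a` and `b` is what removes the marginal independence of the two parents.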

Moralized Graphs (Illustration)

[Figure: a DAG over vertices a, b, c, d, e, f, g, h, i (left) and its moralized graph (right)]

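From BN to MRF

If $P$ factorizes according to $G$ as

$$p(x) = \prod_{v \in V} p_v(x_v \mid x_{\mathrm{pa}(v)})$$

then $P$ factorizes according to $\mathcal{M}(G)$:

$$p(x) = \prod_{v \in V} \psi_{C_v}(x_{C_v}), \quad \psi_{C_v}(x_{C_v}) = p_v(x_v \mid x_{\mathrm{pa}(v)})$$

• $\mathcal{I}(\mathcal{M}(G)) \subseteq \mathcal{I}(G)$. Is the opposite true?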

From BN to MRF (cont'd)

• In general, moralization may cause the loss of conditional independencies.

• $\mathcal{I}(\mathcal{M}(G))$ may be a proper subset of $\mathcal{I}(G)$.

• Consider a DAG: $a \rightarrow c \leftarrow b$.

• $X_a \perp X_b$ holds for $G$ but not for $\mathcal{M}(G)$.

• Not every MRF can be converted to a BN.

Factor Graphs

• An MRF does not always fully reveal the factorized structure of a distribution.

• A factor graph can sometimes give a more accurate characterization of a family of distributions.

• A factor graph is a bipartite graph with links between two types of nodes: variables and factors.

• A variable $x_v$ and a factor $\psi_C$ are linked in a factor graph if the factor involves $x_v$ as an argument.

Factor Graphs (Illustration)

[Figure: example factor graph]

Study of Distributions

• Graphical models: structure of (in)dependencies

• Exponential families: algebraic characteristics

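Exponential Families

An exponential family $\mathcal{P}$ over a measure space $(\mathcal{X}, \mathcal{F}, \nu)$:

$$p_\theta(x) = \frac{1}{Z(\theta)} \, h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle \right)$$

• sufficient statistics: $\phi: \mathcal{X} \rightarrow \mathbb{R}^d$

• canonical parameter function: $\eta: \Theta \rightarrow \mathbb{R}^d$

• partition function: $Z: \Theta \rightarrow \mathbb{R}$

• base density: $h$ over $\mathcal{X}$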

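Partition Function

• The partition function is given by:

$$Z(\theta) = \int_{\mathcal{X}} h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle \right) \nu(dx)$$

• The log-partition function given by $A(\theta) = \log Z(\theta)$ is often used instead of $Z$.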

Parameter Space

• An exponential family is essentially determined by the domain $\mathcal{X}$ and the sufficient statistics $\phi$.

• The set of valid parameters is $\Theta = \{ \theta : Z(\theta) < \infty \}$.

• An exponential family can be parameterized in many ways. When $\eta(\theta) = \theta$, it is said to be in the canonical form.

Many important families of distributions are exponential families:

• Binomial distribution

• Poisson distribution

• Normal distribution

• Exponential distribution

• Beta distribution

• And many more ...

Bernoulli Distribution

Domain: $\mathcal{X} = \{0, 1\}$

Parameter: $\theta \in (0, 1)$

Density: $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$

Bernoulli distributions describe an event that may or may not happen.

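Bernoulli Distribution (cont'd)

• sufficient statistics: $\phi(x) = x$

• canonical parameter: $\eta = \log \frac{\theta}{1 - \theta}$

• base density: $h(x) = 1$ w.r.t. counting measure

• partition function: $Z = 1 + e^{\eta} = \frac{1}{1 - \theta}$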

Bernoulli Distribution (cont'd)

• sufficient statistics: $\phi(x) = (1 - x, \; x)$

• canonical parameters: $\eta = (\log(1 - \theta), \; \log \theta)$

• base density: $h(x) = 1$ w.r.t. counting measure

• partition function: $Z = 1$
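The two exponential-family forms of the Bernoulli distribution can be compared numerically. The sketch below assumes the one-dimensional representation $\phi(x) = x$ and the two-dimensional representation $\phi(x) = (1 - x, x)$; the function names and the `shift` argument are illustrative. Adding the same constant to both canonical parameters of the two-dimensional form leaves the distribution unchanged, since the two statistics always sum to one.

```python
import math


def bernoulli(x, theta):
    """Standard Bernoulli probability mass function."""
    return theta ** x * (1.0 - theta) ** (1 - x)


def minimal_form(x, theta):
    """Exponential-family form with phi(x) = x and eta = logit(theta)."""
    eta = math.log(theta / (1.0 - theta))
    partition = 1.0 + math.exp(eta)
    return math.exp(eta * x) / partition


def two_statistic_form(x, theta, shift=0.0):
    """Exponential-family form with phi(x) = (1 - x, x).

    `shift` adds the same constant to both canonical parameters; it is a
    hypothetical knob used here to show that the parameter is not unique.
    """
    eta = (math.log(1.0 - theta) + shift, math.log(theta) + shift)
    phi = (1 - x, x)
    partition = math.exp(eta[0]) + math.exp(eta[1])
    return math.exp(eta[0] * phi[0] + eta[1] * phi[1]) / partition
```

Evaluating all three functions at the same `x` and `theta` gives the same value, and `two_statistic_form` returns it for every `shift`.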

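Poisson Distribution

Domain: $\mathcal{X} = \{0, 1, 2, \ldots\}$

Parameter: $\lambda > 0$

Density: $p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$

Poisson distributions characterize the number of independent events occurring at a certain rate $\lambda$ within a unit time.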

Poisson Distribution (cont'd)

• sufficient statistics: $\phi(x) = x$

• canonical parameter: $\eta = \log \lambda$

• base density: $h(x) = \frac{1}{x!}$ w.r.t. counting measure

• partition function: $Z = e^{\lambda} = \exp(e^{\eta})$

Exponential Distribution

Domain: $\mathcal{X} = [0, \infty)$

Parameter: $\lambda > 0$

Density: $p(x \mid \lambda) = \lambda e^{-\lambda x}$

Exponential distributions characterize the time interval between independent events occurring at a certain rate $\lambda$.

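Exponential Distribution (cont'd)

• sufficient statistics: $\phi(x) = x$ or $\phi(x) = -x$

• canonical parameter: $\eta = -\lambda$ or $\eta = \lambda$

• base density: $h(x) = 1$ w.r.t. Lebesgue measure

• partition function (for $\phi(x) = x$): $Z(\eta) = \int_0^\infty e^{\eta x} \, dx = -\frac{1}{\eta}$, which is finite only when $\eta < 0$.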

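Normal Distribution

Domain: $\mathcal{X} = \mathbb{R}$

Parameter: $(\mu, \sigma^2)$ with $\sigma^2 > 0$

Density: $p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$

Normal distributions are probably the most widely used distributions in probabilistic analysis.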

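Normal Distribution (cont'd)

• sufficient statistics: $\phi(x) = (x, \; x^2)$

• canonical parameter: $\eta = \left( \frac{\mu}{\sigma^2}, \; -\frac{1}{2 \sigma^2} \right)$

• base density: $h(x) = \frac{1}{\sqrt{2 \pi}}$ w.r.t. Lebesgue measure

• partition function: $Z = \sigma \exp\left( \frac{\mu^2}{2 \sigma^2} \right)$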

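Normal Distr. in Canonical Form

The normal distribution can be alternatively parameterized in the canonical form:

• potential coefficient: $\rho = \mu / \sigma^2$

• precision coefficient: $J = 1 / \sigma^2$

with $p(x \mid \rho, J) \propto \exp\left( \rho x - \frac{1}{2} J x^2 \right)$.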

Regular Family

In the sequel, we focus on exponential families in the canonical form. The set of valid canonical parameters is:

$$\Omega = \{ \theta \in \mathbb{R}^d : A(\theta) < \infty \}$$

The exponential family $\mathcal{P}$ is called a regular family if $\Omega$ is an open subset of $\mathbb{R}^d$. We restrict our attention to regular families.

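Identifiability

Let $\mathcal{P} = \{ p_\theta : \theta \in \Theta \}$ be a parameterized family:

• $\mathcal{P}$ is called identifiable when each distribution in $\mathcal{P}$ corresponds to a unique parameter in $\Theta$:

$$\theta_1 \neq \theta_2 \Rightarrow p_{\theta_1} \neq p_{\theta_2}$$

• Identifiability means that the parameter of a distribution can be learned from observed samples without the need of additional constraints.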

Minimal and Overcomplete

Consider an exponential family with sufficient stats $\phi$,

• If there exists a non-zero $a \in \mathbb{R}^d$ such that $\langle a, \phi(x) \rangle = c$ (a constant) holds almost everywhere, this is called an overcomplete representation; otherwise, it is called a minimal representation.

• An exponential family is identifiable if and only if the representation is minimal. Why?

Minimal and Overcomplete (cont'd)

• Consider an exponential family with sufficient stats $\phi$ and a non-zero $a$ such that $\langle a, \phi(x) \rangle$ is constant, then for each $\theta \in \Omega$, $\theta + t a$ for each $t \in \mathbb{R}$ is also in $\Omega$ and it yields the same distribution.

• We will answer why a minimal representation is identifiable later.

• Overcomplete representation is useful as it may lead to more natural parameterization. Also, with additional constraints, it can be made identifiable.

Bernoulli Revisited

Consider two representations:

• [R1] $\phi(x) = x$

• [R2] $\phi(x) = (1 - x, \; x)$

For each representation:

• Is it minimal or overcomplete?

• If it is overcomplete, find a non-zero $a$ such that $\langle a, \phi(x) \rangle$ is constant.

• Is it identifiable or unidentifiable?

Mean Parameters

• The expectations of sufficient statistics, as below, are called mean parameters:

$$\mu = \mathbb{E}_p[\phi(X)] = \int \phi(x) \, p(x) \, \nu(dx)$$

• Under certain conditions, the distribution in an exponential family is uniquely determined by the mean parameters, which thus provide an alternative parameterization.

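Realizable Mean Parameters

• Given sufficient stats $\phi$, we say a distribution $p$ realizes a mean parameter $\mu$ if $\mathbb{E}_p[\phi(X)] = \mu$.

• The set of (realizable) mean parameters for given sufficient stats $\phi$ is:

$$\mathcal{M} = \{ \mu \in \mathbb{R}^d : \exists p, \; \mathbb{E}_p[\phi(X)] = \mu \}$$

• Here, $p$ is not restricted to the exponential family.

• $\mathcal{M}$ is a convex set. Why?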

Convex Hulls

• Given a set $S$, the convex hull of $S$, denoted by $\mathrm{conv}(S)$, is the set of all convex combinations of elements in $S$.

• $\mathrm{conv}(S)$ is the minimum convex set containing $S$.

• A convex hull of some finite set is called a convex polytope.

• Convex polytopes are compact.

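Probability Simplex

Given a finite space $\mathcal{X}$, the probability simplex over $\mathcal{X}$:

$$\mathcal{P}(\mathcal{X}) = \left\{ p: \mathcal{X} \rightarrow [0, 1] \; : \; \sum_{x \in \mathcal{X}} p(x) = 1 \right\}$$

When $\mathcal{X} = \{1, \ldots, n\}$, $\mathcal{P}(\mathcal{X})$ reduces to:

$$\Delta_n = \left\{ p \in \mathbb{R}^n : p_i \geq 0, \; \sum_{i=1}^n p_i = 1 \right\}$$

and $\Delta_n$ is an $(n-1)$-dimensional polytope.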

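Polytope of Mean Parameters

When the sample space $\mathcal{X}$ is finite, given any $\phi$, the set $\mathcal{M}$ is a convex polytope:

$$\mathcal{M} = \mathrm{conv}\{ \phi(x) : x \in \mathcal{X} \}$$

Particularly, each $\mu \in \mathcal{M}$ can be written as

$$\mu = \sum_{x \in \mathcal{X}} p(x) \, \phi(x), \quad p \in \mathcal{P}(\mathcal{X})$$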

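Log-partition Function

The log-partition function given by

$$A(\theta) = \log \int_{\mathcal{X}} h(x) \exp\left( \langle \theta, \phi(x) \rangle \right) \nu(dx)$$

has the following properties:

$$\nabla A(\theta) = \mathbb{E}_\theta[\phi(X)], \quad \nabla^2 A(\theta) = \mathrm{Cov}_\theta[\phi(X)]$$

• $A$ is a convex function and thus $\Omega$ is a convex set.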
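The standard fact that the gradient of the log-partition function equals the expected sufficient statistics can be checked numerically. The sketch below does so for the Poisson family, assuming $\phi(x) = x$, base density $1/x!$, and canonical parameter $\eta = \log \lambda$, so that $A(\eta) = e^{\eta}$ and the mean is $\lambda$. The series truncation at 200 terms and the finite-difference step are arbitrary choices for this illustration.

```python
import math


def log_partition(eta, terms=200):
    """A(eta) = log sum_x exp(eta * x) / x!, with the series truncated at `terms`."""
    return math.log(
        sum(math.exp(eta * x - math.lgamma(x + 1)) for x in range(terms))
    )


def numerical_gradient(f, point, step=1e-5):
    """Central finite-difference approximation of f'(point)."""
    return (f(point + step) - f(point - step)) / (2.0 * step)


rate = 3.0
eta = math.log(rate)

# The derivative of A at eta should match the Poisson mean, which is `rate`.
mean_parameter = numerical_gradient(log_partition, eta)
```

With `rate = 3.0`, both `log_partition(eta)` and `mean_parameter` agree with `rate` up to numerical error, because for this family the log-partition function and its derivative are both $e^{\eta}$.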

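Log-partition Function (cont'd)

• For an overcomplete representation with $\langle a, \phi(x) \rangle = c$, $A$ has $a^T \nabla^2 A(\theta) \, a = 0$, because:

$$a^T \nabla^2 A(\theta) \, a = \mathrm{Var}_\theta\left( \langle a, \phi(X) \rangle \right) = 0$$

• Conversely, for a minimal representation, we have $\mathrm{Var}_\theta\left( \langle a, \phi(X) \rangle \right) > 0$ for every non-zero vector $a$, hence $\nabla^2 A(\theta)$ is positive definite for every $\theta \in \Omega$ and thus $A$ is strictly convex.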

Gradient Map

The gradient map $\nabla A: \Omega \rightarrow \mathcal{M}$ is a mapping from the canonical parameters $\theta$ to the mean parameters $\mu$.

• When is $\nabla A$ injective (i.e. one-to-one)?

• When is $\nabla A$ surjective (onto $\mathcal{M}$)?

Gradient Map (cont'd)

• The gradient map is injective if and only if the exponential representation is minimal.

• An exponential family with minimal representation is identifiable.

• How to prove? $A$ is strictly convex, so $\langle \nabla A(\theta_1) - \nabla A(\theta_2), \; \theta_1 - \theta_2 \rangle > 0$ whenever $\theta_1 \neq \theta_2$.

• With overcomplete representation, there is one-to-one correspondence between mean parameters and affine subsets of $\Omega$.

Gradient Map (cont'd)

• With minimal representation, $\nabla A$ is onto $\mathcal{M}^\circ$, the interior of $\mathcal{M}$.

• Each mean parameter $\mu \in \mathcal{M}^\circ$ is uniquely realized by a canonical parameter $\theta(\mu)$.

• Given $\mu \in \mathcal{M}^\circ$, there can be many distributions that realize $\mu$, among which there is one that maximizes the entropy, which is in the exponential family associated with $\phi$ (we will see this).

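Entropy

Given an exponential family distribution

$$p_\theta(x) = h(x) \exp\left( \langle \theta, \phi(x) \rangle - A(\theta) \right)$$

The entropy of $p_\theta$ is defined to be:

$$H(p_\theta) = -\mathbb{E}_\theta[\log p_\theta(X)] = -\int p_\theta(x) \log p_\theta(x) \, \nu(dx)$$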

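Maximum Entropy (Problem)

Consider a finite space $\mathcal{X}$, we want to

$$\text{maximize } H(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) \quad \text{s.t. } \sum_{x \in \mathcal{X}} p(x) \, \phi(x) = \mu, \; p \in \mathcal{P}(\mathcal{X})$$

What's the solution?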

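Maximum Entropy (Solution)

Using the method of Lagrange multipliers, we get the optimal $p$:

$$p(x) = \exp\left( \langle \theta, \phi(x) \rangle - A(\theta) \right), \quad \text{with } \theta \text{ chosen such that } \mathbb{E}_p[\phi(X)] = \mu$$

The solution to a maximum entropy problem with expectation constraints is always an exponential family distribution. This can be generalized to continuous spaces, using calculus of variations.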
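A small numerical instance makes the result concrete: among all distributions on the faces of a die with a prescribed mean, the maximum-entropy one has the form $p(x) \propto \exp(\theta x)$. The sketch below finds $\theta$ by bisection, relying on the fact that the mean is increasing in $\theta$. The function name, the search interval, and the iteration count are assumptions made for this illustration.

```python
import math


def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Find p(x) proportional to exp(theta * x) on `values` with mean `target_mean`.

    Returns (theta, probabilities). Bisection works because the mean of this
    one-parameter family increases with theta.
    """

    def weights(theta):
        # Subtract the largest exponent before exponentiating to avoid overflow.
        top = max(theta * v for v in values)
        return [math.exp(theta * v - top) for v in values]

    def mean(theta):
        w = weights(theta)
        return sum(wi * v for wi, v in zip(w, values)) / sum(w)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid

    theta = 0.5 * (lo + hi)
    w = weights(theta)
    total = sum(w)
    return theta, [wi / total for wi in w]


faces = [1, 2, 3, 4, 5, 6]
theta, probs = maxent_distribution(faces, 4.5)
```

A target mean above 3.5 yields a positive `theta` and probabilities that increase with the face value; a target of exactly 3.5 recovers the uniform distribution.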

Kullback-Leibler Divergence

The Kullback-Leibler divergence (or KL divergence) between two probability densities $p$ and $q$ (w.r.t. the same base measure) is defined to be

$$\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, \nu(dx)$$

KL divergence is not symmetric.

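KL Divergence (cont'd)

• (Gibbs inequality) $\mathrm{KL}(p \,\|\, q) \geq 0$, where the equality holds if and only if $p = q$ almost everywhere (w.r.t. the base measure).

• Given two distributions $p_{\theta_1}, p_{\theta_2}$ in the same exponential family with sufficient stats $\phi$, and $\mu_1 = \mathbb{E}_{\theta_1}[\phi(X)]$:

$$\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = \langle \theta_1 - \theta_2, \; \mu_1 \rangle - A(\theta_1) + A(\theta_2)$$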
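The closed form for the KL divergence within an exponential family can be compared against the definition. The sketch below uses two Poisson distributions, assuming canonical parameter $\theta = \log \lambda$, mean parameter $\mu = \lambda$, and log-partition $A(\theta) = e^{\theta}$; the function names and the 200-term truncation of the sum are illustrative choices.

```python
import math


def poisson_logpmf(x, rate):
    """Log of the Poisson probability mass function."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def kl_direct(rate1, rate2, terms=200):
    """KL(p1 || p2) from the definition, with the sum truncated at `terms`."""
    total = 0.0
    for x in range(terms):
        logp1 = poisson_logpmf(x, rate1)
        total += math.exp(logp1) * (logp1 - poisson_logpmf(x, rate2))
    return total


def kl_exponential_family(rate1, rate2):
    """KL(p1 || p2) = <theta1 - theta2, mu1> - A(theta1) + A(theta2) for Poisson."""
    theta1, theta2 = math.log(rate1), math.log(rate2)
    mu1 = rate1                       # mean parameter of p1
    a1, a2 = rate1, rate2             # A(theta) = exp(theta) = rate
    return (theta1 - theta2) * mu1 - a1 + a2
```

Swapping the two arguments gives a different value, which illustrates that the divergence is not symmetric.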

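Projections of Distributions

Let $\mathcal{P}$ be an exponential family and $q$ be a distribution, both over the same space $\mathcal{X}$:

• The I-projection (information projection) of $q$ onto $\mathcal{P}$:

$$\arg\min_{p \in \mathcal{P}} \mathrm{KL}(p \,\|\, q)$$

• The M-projection (moment projection) of $q$ onto $\mathcal{P}$:

$$\arg\min_{p \in \mathcal{P}} \mathrm{KL}(q \,\|\, p)$$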

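Maximum Likelihood Estimation

Given a parameterized family $\{ p_\theta : \theta \in \Theta \}$ over $\mathcal{X}$, and $D = \{x_1, \ldots, x_n\}$, the log-likelihood of $\theta$:

$$\ell(\theta; D) = \sum_{i=1}^n \log p_\theta(x_i)$$

$\hat{\theta}$ is called a maximum likelihood estimate given $D$ if

$$\hat{\theta} \in \arg\max_{\theta \in \Theta} \ell(\theta; D)$$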

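MLE (cont'd)

• Given $D = \{x_1, \ldots, x_n\}$, $\tilde{p}_D = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$ is called an empirical probability measure, which has

$$\mathbb{E}_{\tilde{p}_D}[f(X)] = \frac{1}{n} \sum_{i=1}^n f(x_i)$$

• The log-likelihood of $\theta$ given $D$ can be rewritten as

$$\ell(\theta; D) = n \, \mathbb{E}_{\tilde{p}_D}[\log p_\theta(X)]$$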

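MLE (cont'd)

• We have $\frac{1}{n} \ell(\theta; D) = -H(\tilde{p}_D) - \mathrm{KL}(\tilde{p}_D \,\|\, p_\theta)$.

• Maximizing $\ell(\theta; D)$ is equivalent to minimizing $\mathrm{KL}(\tilde{p}_D \,\|\, p_\theta)$.

• Maximum likelihood estimation is equivalent to M-projection of the empirical distribution $\tilde{p}_D$ onto a given family.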

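M-projections

Given an exponential family $\mathcal{P}$ and an arbitrary distribution $q$ over $\mathcal{X}$, then

$$\mathrm{KL}(q \,\|\, p_\theta) = -H(q) - \mathbb{E}_q[\log h(X)] - \langle \theta, \; \mathbb{E}_q[\phi(X)] \rangle + A(\theta)$$

Thus, the optimal $\theta$ is

$$\hat{\theta} = \arg\max_{\theta \in \Omega} \; \langle \theta, \mu_q \rangle - A(\theta), \quad \mu_q = \mathbb{E}_q[\phi(X)]$$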

M-projections (cont'd)

• This is a convex problem. The $\hat{\theta}$ is optimal iff:

$$\nabla A(\hat{\theta}) = \mathbb{E}_{\hat{\theta}}[\phi(X)] = \mu_q$$

• M-projection to an exponential family $\mathcal{P}$ is to find a distribution in $\mathcal{P}$ whose mean parameter matches the input mean $\mu_q = \mathbb{E}_q[\phi(X)]$. ($\mu_q$ is always realizable, why?)

• With minimal representation the optimum is unique; otherwise, the set of optimal solutions is an affine subset of $\Omega$, all of which yield the same distribution.

What about I-projections?

• We will see their utility when we talk about mean field methods and variational inference.

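Convex Conjugate

Let $f$ be a real-valued function $f: \mathbb{R}^d \rightarrow \mathbb{R}$:

• The convex conjugate of $f$ is defined to be

$$f^*(y) = \sup_{x} \; \langle x, y \rangle - f(x)$$

• $f^*$ is always convex, whether or not $f$ is.

• $f^{**}$ is convex.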

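Convex Conjugate (cont'd)

• (Fenchel's inequality) $f(x) + f^*(y) \geq \langle x, y \rangle$

• (Fenchel-Moreau theorem) $f = f^{**}$ iff $f$ is convex and lower semi-continuous:

$$f(x) = \sup_{y} \; \langle x, y \rangle - f^*(y)$$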

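Conjugate Duality

• $(\theta, \mu)$ is called dually coupled if $\mu = \nabla A(\theta) = \mathbb{E}_\theta[\phi(X)]$.

• The convex conjugate to a log-partition function $A$:

$$A^*(\mu) = \sup_{\theta \in \Omega} \; \langle \theta, \mu \rangle - A(\theta)$$

• The supremum is attained at $\theta$ iff $(\theta, \mu)$ is dually coupled.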

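Conjugate Duality (cont'd)

• $A^*$ has:

$$A^*(\mu) = \begin{cases} -H(p_{\theta(\mu)}) & \text{if } \mu \in \mathcal{M}^\circ \\ +\infty & \text{if } \mu \notin \overline{\mathcal{M}} \end{cases}$$

where $\theta(\mu)$ is dually coupled with $\mu$ and the entropy is taken w.r.t. the base measure.

• $A^*$ on $\overline{\mathcal{M}} \setminus \mathcal{M}^\circ$ is determined via Cauchy sequences in $\mathcal{M}^\circ$.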

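Conjugate Duality (cont'd)

• With $\theta \in \Omega$, the log-partition function $A$ has:

$$A(\theta) = \sup_{\mu \in \mathcal{M}} \; \langle \theta, \mu \rangle - A^*(\mu)$$

• The supremum is attained at $\mu$ iff $(\theta, \mu)$ is dually coupled, which has $\mu = \nabla A(\theta)$.

• With a minimal representation, $\nabla A$ maps $\Omega$ one-to-one onto $\mathcal{M}^\circ$, while $\nabla A^*$ is the inverse map.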

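Prior and Posterior

• In Bayesian analysis, we usually place a prior with density $\pi(\theta)$ over the parameter space $\Theta$.

• A parameter $\theta$ is linked to observations $x$ via a likelihood model: $p(x \mid \theta)$.

• The posterior measure given $x$ is

$$\pi(\theta \mid x) = \frac{\pi(\theta) \, p(x \mid \theta)}{\int_\Theta \pi(\theta') \, p(x \mid \theta') \, d\theta'}$$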

Prior and Posterior (cont'd)

• Computing the posterior distribution is in general very difficult.

• However, under certain conditions (e.g. when the prior is conjugate to the likelihood model), the computation becomes particularly easy.

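Conjugate Prior

• A prior with density $\pi(\theta \mid \alpha)$ is called a conjugate prior to the likelihood model $p(x \mid \theta)$ if the posterior distribution given $x$ is in the same parameterized family, i.e. in the form $\pi(\theta \mid \alpha \oplus x)$.

• $\oplus$ is left-associative and satisfies $\alpha \oplus x \oplus y = \alpha \oplus y \oplus x$.

• When $x_1, \ldots, x_n$ are i.i.d. given $\theta$, the posterior is $\pi(\theta \mid \alpha \oplus x_1 \oplus \cdots \oplus x_n)$. The result is independent of the order of samples.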

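CP for Exponential Families

Generally, conjugate pairs in exponential families are in the following form:

• Prior: $\pi(\theta \mid \alpha, \beta) = \exp\left( \langle \alpha, \eta(\theta) \rangle - \beta A(\theta) - B(\alpha, \beta) \right)$

• Likelihood: $p(x \mid \theta) = h(x) \exp\left( \langle \eta(\theta), \phi(x) \rangle - A(\theta) \right)$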

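CP for Exponential Families (cont'd)

Hence, the posterior update, for a prior with parameters $(\alpha, \beta)$ that multiply $\eta(\theta)$ and $-A(\theta)$ respectively:

• with a single observation: $(\alpha, \beta) \oplus x = (\alpha + \phi(x), \; \beta + 1)$

• with multiple observations: $(\alpha, \beta) \oplus x_1 \oplus \cdots \oplus x_n = \left( \alpha + \sum_{i=1}^n \phi(x_i), \; \beta + n \right)$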

CP for Exponential Families (cont'd)

• The family of conjugate priors is largely determined by the likelihood model, particularly by the form of $\eta(\theta)$ and $A(\theta)$.

• A family of prior distributions can serve as the conjugate priors to different likelihood models.

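Example: Beta-Bernoulli

• Prior: Beta distribution, $\pi(\theta \mid a, b) \propto \theta^{a - 1} (1 - \theta)^{b - 1}$

• Likelihood: Bernoulli distribution, $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$

• Posterior: remains a Beta distribution, $\theta \mid x \sim \mathrm{Beta}(a + x, \; b + 1 - x)$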
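The Beta-Bernoulli update amounts to counting successes and failures. The sketch below assumes the usual shape parameters `a` and `b` of the Beta prior; the function names are illustrative.

```python
def beta_bernoulli_update(a, b, observations):
    """Update Beta(a, b) with a sequence of 0/1 observations.

    Each success adds one to a, each failure adds one to b.
    """
    for x in observations:
        a, b = a + x, b + (1 - x)
    return a, b


def predictive_probability_of_one(a, b):
    """P(next x = 1) under Beta(a, b), which equals the Beta mean a / (a + b)."""
    return a / (a + b)


# Starting from a uniform prior Beta(1, 1) and observing 1, 0, 1, 1.
posterior = beta_bernoulli_update(1, 1, [1, 0, 1, 1])
```

Because the update only accumulates counts, any permutation of the observations leads to the same posterior.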

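Example: Normal-Normal

• Prior: Normal distribution, $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$

• Likelihood: Normal distribution (fixed variance), $x \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$

• Posterior: remains a Normal distribution, $\mu \mid x \sim \mathcal{N}(\mu_1, \sigma_1^2)$ with $\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}$ and $\mu_1 = \sigma_1^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2} \right)$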

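Dirichlet Distribution

• Dirichlet distribution is a distribution over the probability simplex $\Delta_K$.

• It is often used as a conjugate prior to the Categorical distribution or the Multinomial distribution.

• With $\alpha = (\alpha_1, \ldots, \alpha_K)$, $\alpha_k > 0$, as the parameter, its density:

$$p(\theta \mid \alpha) = \frac{\Gamma\left( \sum_{k=1}^K \alpha_k \right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_k^{\alpha_k - 1}$$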

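Dirichlet Distribution (cont'd)

• Mean: $\mathbb{E}[\theta_k] = \frac{\alpha_k}{\alpha_0}$ with $\alpha_0 = \sum_{k=1}^K \alpha_k$.

• Covariance: $\mathrm{Cov}(\theta_i, \theta_j) = \frac{\alpha_i (\alpha_0 \delta_{ij} - \alpha_j)}{\alpha_0^2 (\alpha_0 + 1)}$

• Mode: $\frac{\alpha_k - 1}{\alpha_0 - K}$ (when all $\alpha_k > 1$)

• Marginal: $\theta_k \sim \mathrm{Beta}(\alpha_k, \; \alpha_0 - \alpha_k)$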

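Dirichlet-Categorical

• Prior: $\theta \sim \mathrm{Dir}(\alpha)$

• Likelihood: $x \mid \theta \sim \mathrm{Cat}(\theta)$, i.e. $p(x = k \mid \theta) = \theta_k$

• Posterior: remains a Dirichlet distribution: $\theta \mid x_1, \ldots, x_n \sim \mathrm{Dir}(\alpha + c)$, where $c_k$ is the number of observations equal to $k$

• When $\alpha = (1, \ldots, 1)$, $\mathrm{Dir}(\alpha)$ reduces to a uniform distribution over $\Delta_K$.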
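As with the Beta-Bernoulli pair, the Dirichlet-Categorical posterior is obtained by adding observed counts to the prior parameter. The sketch below assumes categories are encoded as integer indices starting at zero; the function names are illustrative.

```python
def dirichlet_categorical_update(alpha, observations):
    """Return the posterior Dirichlet parameter after categorical observations.

    `alpha` is a list of positive prior parameters and `observations` a
    sequence of category indices in range(len(alpha)) (assumed encoding).
    """
    posterior = list(alpha)
    for k in observations:
        posterior[k] += 1
    return posterior


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each component is alpha_k divided by the total."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Uniform prior over three categories, then five observations.
posterior_alpha = dirichlet_categorical_update([1, 1, 1], [0, 2, 2, 1, 2])
```

The posterior mean `dirichlet_mean(posterior_alpha)` also serves as the predictive probability of each category for the next observation.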

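Dirichlet Distribution (cont'd)

• Dirichlet distributions are an exponential family:

• Canonical parameter: $\alpha$ (with base density $h(\theta) = \prod_k \theta_k^{-1}$)

• Sufficient stats: $\phi(\theta) = (\log \theta_1, \ldots, \log \theta_K)$

• Log-partition: $A(\alpha) = \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma(\alpha_0)$, with $\alpha_0 = \sum_k \alpha_k$

• Hence, $\mathbb{E}[\log \theta_k] = \frac{\partial A}{\partial \alpha_k} = \psi(\alpha_k) - \psi(\alpha_0)$, where $\psi$ is the digamma function.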

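Predictive Distribution

Given $D = \{x_1, \ldots, x_n\}$, what is the distribution of a new sample $x$?

$$p(x \mid D) = \int p(x \mid \theta) \, \pi(\theta \mid D) \, d\theta$$

With exponential family and conjugacy, we have

$$p(x \mid D) = h(x) \exp\left( B(\alpha_n + \phi(x), \; \beta_n + 1) - B(\alpha_n, \beta_n) \right)$$

where $(\alpha_n, \beta_n)$ are the posterior parameters and $B$ is the log-normalizer of the conjugate prior.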

Important Conjugate Pairs

• Beta: the probability parameter of Bernoulli, Binomial, Geometric, or Negative Binomial

• Normal: the mean parameter of Normal

• Inverse Gamma: the variance parameter of Normal

• Gamma: the rate parameter of Exponential or Poisson, or the precision parameter of Normal

Important Conjugate Pairs (cont'd)

• Dirichlet: the probability vector of Categorical or Multinomial

• Multivariate Normal: the mean vector of Multivariate Normal

• Inverse Wishart: the covariance matrix of Multivariate Normal

• Wishart: the precision matrix of Multivariate Normal

Examples of Graphical Models

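GMM

[Figure: plate diagram with nodes $\mu_k$, $\sigma^2$, $z_i$, $x_i$ and plates $N$, $M$]

A Gaussian Mixture Model (GMM) with fixed variance:

$$x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2)$$

• This model is not complete.

• How are the model parameters, such as the component means $\mu_k$, generated?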

GMM (with Prior)

[Figure: plate diagram with nodes $\mu_0$, $\sigma_0^2$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$ and plates $N$, $M$]

With priors placed over model parameters, we get a Hierarchical Bayesian Model:

• Hyperparameters have no parents (top level)

• Observations have no children (bottom level)

• Each unknown variable is generated according to its parents

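GMM (Joint Model)

[Figure: plate diagram with nodes $\mu_0$, $\sigma_0^2$, $\mu_k$, $\sigma^2$, $z_i$, $x_i$ and plates $N$, $M$]

$$p(\mu, z, x) = \prod_{k} \mathcal{N}(\mu_k \mid \mu_0, \sigma_0^2) \prod_{i} p(z_i) \, \mathcal{N}(x_i \mid \mu_{z_i}, \sigma^2)$$

This is an exponential family.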

Topic Models

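PLSI

[Figure: plate diagram with nodes $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]

Probabilistic Latent Semantic Indexing (PLSI):

• Each topic $k$ is associated with $\beta_k$, a distribution over the vocabulary.

• Each document $d$ comes with a vector of topic proportions $\theta_d$.

• To generate each word $w_{di}$: draw a topic $z_{di} \sim \mathrm{Cat}(\theta_d)$, then draw the word $w_{di} \sim \mathrm{Cat}(\beta_{z_{di}})$.

• This is not a complete model.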

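LDA

[Figure: plate diagram with nodes $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]

Latent Dirichlet Allocation (LDA) completes PLSI by placing Dirichlet priors over latent variables:

• For each document, the topic proportions are generated as $\theta_d \sim \mathrm{Dir}(\alpha)$

• For each topic, the word distribution is generated as $\beta_k \sim \mathrm{Dir}(\gamma)$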
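The generative process of LDA can be written as ancestral sampling: every variable is drawn given its parents, from the hyperparameters down to the observed words. The sketch below is an illustration under several assumptions: symmetric Dirichlet hyperparameters named `alpha` and `gamma`, a fixed document length, and Dirichlet draws obtained by normalizing Gamma variates. None of these names or defaults come from the lecture.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index in range(len(probs)) with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_length, num_topics, vocab_size,
                    alpha=0.5, gamma=0.5, seed=0):
    """Sample topic-word distributions and documents from the LDA generative process."""
    rng = random.Random(seed)
    # One word distribution per topic, drawn from a symmetric Dirichlet prior.
    beta = [sample_dirichlet([gamma] * vocab_size, rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Per-document topic proportions.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            z = sample_categorical(theta, rng)              # topic of this word
            words.append(sample_categorical(beta[z], rng))  # the word itself
        docs.append(words)
    return beta, docs
```

A call such as `generate_corpus(3, 20, 2, 10)` returns two topic-word distributions over ten word indices and three documents of twenty word indices each.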

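LDA (Joint Model)

[Figure: plate diagram with nodes $\theta_d$, $z_{di}$, $w_{di}$, $\beta_k$ and plates $M$, $N$, $n_d$]

$$p(\beta, \theta, z, w) = \prod_{k} \mathrm{Dir}(\beta_k \mid \gamma) \prod_{d} \mathrm{Dir}(\theta_d \mid \alpha) \prod_{i} \theta_{d, z_{di}} \, \beta_{z_{di}, w_{di}}$$

Again, an exponential family.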

Summary

• The Basics of Graphical Models

• Bayesian Networks and Markov Random Fields.

• How the joint distribution factorizes according to the graph.

• Relations between graphical structure and conditional independencies.

• Factor graphs.

Summary (cont'd)

• The Basics of Exponential Families

• The form of exponential families

• Minimal and overcomplete representation, identifiability

• Convexity of log-partition function, gradient map

• KL divergence, projections of distributions

• Conjugate duality between log-partition function and negative entropy

Summary (cont'd)

• Conjugate Prior

• Posterior distributions in Bayesian analysis

• Conjugate prior, especially of exponential families

• Important conjugate pairs

Summary (cont'd)

• Practice

• How to formulate a graphical model based on intuition

• Graphical representation of a model, factor graph

• Analysis of the joint distribution