CS839: Probabilistic Graphical Models
Lecture 3: Undirected Graphical Models
Theo Rekatsinas
Representing Multivariate Distributions
Section 1
• Undirected: Markov random fields
• Undirected edges simply give correlations between variables
$$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = \frac{1}{Z} \exp\big(E(X_1) + E(X_2) + E(X_1, X_3) + E(X_2, X_4) + E(X_2, X_5)\big) \exp\big(E(X_6, X_3, X_4) + E(X_7, X_6) + E(X_8, X_5, X_6)\big)$$
Review of Bayes Nets
• Let $I(G)$ be the set of local independence properties encoded by DAG $G$.
• We say that $K$ is an I-map for a set of independencies $I$ if $I(K) \subseteq I$
• A fully connected DAG $G$ is an I-map for any distribution
• A DAG $G$ is a minimal I-map for $P$ if it is an I-map for $P$, and if the removal of even a single edge from $G$ renders it not an I-map
• A distribution may have several minimal I-maps
• Each corresponds to a specific node ordering
Perfect Maps (P-Maps)
• A DAG $G$ is a perfect map (P-map) for a distribution $P$ if $I(P) = I(G)$
• Thm: Not every distribution has a perfect map as a DAG.
• Proof: Suppose we have a model where $A \perp C \mid \{B, D\}$ and $B \perp D \mid \{A, C\}$. Can we represent this with a BN? This pair of independencies cannot be represented by any BN.
• The P-map of a distribution is unique up to I-equivalence between networks. That is, a distribution $P$ can have many P-maps, but all of them are I-equivalent.
Undirected Graphical Models
• Pairwise (non-causal) relationships
• We can write down the model and score specific configurations of the RVs, but not generate samples
• Contingency constraints on node configurations
Example
The Grid Model
• Image processing, lattice physics, etc.
• Each node may represent a pixel, an atom, etc.
• The states of adjacent or nearby nodes are coupled due to pattern continuity, electromagnetic force, etc.
• The most likely joint configurations usually correspond to a "low-energy" state
Representation
• Def: an undirected graphical model represents a distribution $P(X_1, \dots, X_n)$ defined by an undirected graph $H$ and a set of positive potential functions $\psi_c$ associated with the cliques of $H$, s.t.
$$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c)$$
where $Z$ is known as the partition function:
$$Z = \sum_{X_1, \dots, X_n} \prod_{c \in C} \psi_c(X_c)$$
• A.K.A. Markov random fields, Markov networks
• The potential function can be understood as a "score" of the joint configuration
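The definition above can be made concrete by brute force on a tiny model. The sketch below is only an illustration: the three-node chain, the potential `psi`, and all names are assumptions chosen for the example, not part of the lecture.

```python
from itertools import product

# Hypothetical example: three binary variables in a chain X1 - X2 - X3,
# so the cliques are {X1, X2} and {X2, X3}.
cliques = [(0, 1), (1, 2)]

def psi(xa, xb):
    """Illustrative positive potential: favors equal neighboring values."""
    return 2.0 if xa == xb else 1.0

def unnormalized(x):
    """Product of clique potentials for one joint configuration x."""
    score = 1.0
    for a, b in cliques:
        score *= psi(x[a], x[b])
    return score

configs = list(product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in configs)       # partition function
P = {x: unnormalized(x) / Z for x in configs}   # normalized distribution
```

Summing over all configurations is exponential in the number of variables, which is why this only works for toy models.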
Global Markov Independencies
• Let $H$ be an undirected graph.
• $B$ separates $A$ and $C$ if every path from a node in $A$ to a node in $C$ passes through a node in $B$: $\mathrm{sep}_H(A; C \mid B)$
• A probability distribution satisfies the global Markov property if for any disjoint $A$, $B$, $C$ such that $B$ separates $A$ and $C$, $A$ is independent of $C$ given $B$:
$$I(H) = \{A \perp C \mid B : \mathrm{sep}_H(A; C \mid B)\}$$
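Graph separation can be checked with a plain breadth-first search that is not allowed to step onto nodes in $B$. This is a minimal sketch under the assumption that the graph is given as an adjacency dictionary and that $A$, $B$, $C$ are disjoint; the function name and the 4-cycle example are illustrative.

```python
from collections import deque

def separated(adj, A, C, B):
    """Return True if every path from A to C passes through a node in B."""
    A, C, B = set(A), set(C), set(B)
    frontier = deque(A - B)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in C:
            return False          # reached C without crossing B
        for nbr in adj[node]:
            if nbr not in seen and nbr not in B:
                seen.add(nbr)
                frontier.append(nbr)
    return True

# Hypothetical 4-cycle A - B - C - D - A
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
```

On this cycle, `separated(adj, {"A"}, {"C"}, {"B", "D"})` holds, while conditioning on `{"B"}` alone leaves the path through `D` open.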
Local Markov Independencies
• For each node $X_i \in V$ there is a unique Markov blanket of $X_i$, denoted $MB_{X_i}$, which is the set of neighbors of $X_i$ in the graph (those that share an edge with $X_i$)
• Def: The local Markov independencies associated with $H$ are:
$$I_\ell(H) = \{X_i \perp V - \{X_i\} - MB_{X_i} \mid MB_{X_i} : \forall i\}$$
In other words, $X_i$ is independent of the rest of the nodes in the graph given its direct neighbors.
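The local independencies can be read directly off the adjacency structure. The following sketch assumes an adjacency dictionary and returns each statement as a `(node, rest, blanket)` triple; the representation and the 4-cycle example are assumptions for illustration.

```python
def local_markov_independencies(adj):
    """For each node x, return (x, rest, blanket): x indep. of rest given blanket."""
    nodes = set(adj)
    statements = []
    for x in sorted(nodes):
        blanket = set(adj[x])            # Markov blanket = direct neighbors
        rest = nodes - {x} - blanket     # everything else in the graph
        statements.append((x, rest, blanket))
    return statements

# Hypothetical 4-cycle A - B - C - D - A
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
stmts = local_markov_independencies(adj)
```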
Summary of Qualitative Specification: Semantics in an MRF
• Structure: an undirected graph
• A node is conditionally independent of every other node in the network given its direct neighbors
• The potentials and the cliques in the graph completely determine the joint distribution
• Models correlations between variables, but we have no explicit way to generate samples
Quantitative Specification: Cliques
• For $G = \{V, E\}$, a clique is a subgraph $G'$ such that the nodes in $G'$ are fully interconnected.
• A (maximal) clique is a complete subgraph s.t. any superset $V''$ of its nodes is not complete.
• A sub-clique is a clique that is not necessarily maximal.
• Example:
• Max-cliques = {A,B,D}, {B,C,D}
• Sub-cliques = {A,B}, {C,D}, …
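The example can be reproduced by enumerating complete subsets of nodes. The edge set below is an assumption, chosen so that the max-cliques come out as {A,B,D} and {B,C,D}; the exhaustive search is a sketch that is only practical for very small graphs.

```python
from itertools import combinations

def is_complete(nodes, edge_set):
    """True if every pair of the given nodes is joined by an edge."""
    return all(frozenset(p) in edge_set for p in combinations(nodes, 2))

def cliques_and_maximal(nodes, edges):
    edge_set = {frozenset(e) for e in edges}
    cliques = [set(s) for r in range(1, len(nodes) + 1)
               for s in combinations(sorted(nodes), r)
               if is_complete(s, edge_set)]
    # A clique is maximal if no other clique strictly contains it.
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    return cliques, maximal

# Assumed edge set, chosen so the max-cliques match the example above.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "D"), ("B", "D"), ("B", "C"), ("C", "D")]
cliques, maximal = cliques_and_maximal(nodes, edges)
```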
Gibbs Distribution and Clique Potential
• Def: an undirected graphical model represents a distribution $P(X_1, \dots, X_n)$ defined by an undirected graph $H$ and a set of positive potential functions $\psi_c$ associated with the cliques of $H$, s.t.
$$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c) \quad \text{(Gibbs distribution)}$$
where $Z$ is known as the partition function:
$$Z = \sum_{X_1, \dots, X_n} \prod_{c \in C} \psi_c(X_c)$$
• A.K.A. Markov random fields, Markov networks
• The potential function can be understood as a "score" of the joint configuration
Interpretation of Clique Potentials
• The model implies $X \perp Z \mid Y$. This independence statement implies (by definition) that the joint must factorize as: $P(X, Y, Z) = P(Y) P(X \mid Y) P(Z \mid Y)$
• We can write: $P(X, Y, Z) = P(X, Y) P(Z \mid Y)$ or $P(X, Y, Z) = P(X \mid Y) P(Z, Y)$
• We cannot have all potentials be marginals
• We cannot have all potentials be conditionals
• The positive clique potentials can only be thought of as general "compatibility" functions over their variables, not as probability distributions.
Example MRF - max cliques
• For discrete nodes, we can represent $P(X_1, X_2, X_3, X_4)$ as two 3D tables instead of one 4D table.
Example MRF - subcliques
• We can represent $P(X_1, X_2, X_3, X_4)$ as five 2D tables instead of one 4D table.
• Pair MRFs, a popular simple special case
• $I(P')$ vs. $I(P'')$? $D(P')$ vs. $D(P'')$?
• $I(P') = I(P'')$ and $D(P') \supseteq D(P'')$
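To compare the table sizes of the three representations, one can count entries directly. This small sketch assumes every node takes the same number of states `k`, which the slides do not specify.

```python
def table_entries(k):
    """Number of table entries for four k-state nodes under each representation."""
    return {
        "one 4D table": k ** 4,
        "two 3D tables": 2 * k ** 3,
        "five 2D tables": 5 * k ** 2,
    }

sizes_binary = table_entries(2)
sizes_ten = table_entries(10)
```

For binary nodes the counts are close (16, 16, and 20 entries), so the savings from smaller cliques only show up as the number of states per node grows.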
Example MRF - canonical representation
• Most general; subsumes $P'$ and $P''$ as special cases
Hammersley-Clifford Theorem
If arbitrary potentials are utilized in the following product formula for probabilities
$$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c), \qquad Z = \sum_{X_1, \dots, X_n} \prod_{c \in C} \psi_c(X_c)$$
then the family of probability distributions obtained is exactly that set which respects the qualitative specification (the conditional independence relations) described earlier.
Thm: Let $P$ be a positive distribution over $V$ and $H$ a Markov network graph over $V$. If $H$ is an I-map for $P$, then $P$ is a Gibbs distribution over $H$.
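One direction of this correspondence is easy to check numerically: a Gibbs distribution built from positive potentials on the edges of a graph satisfies the independencies implied by separation. The sketch below assumes a 4-cycle A - B - C - D - A over binary variables with randomly drawn potentials (all illustrative choices) and measures how far $P(A, C \mid B, D)$ is from $P(A \mid B, D) P(C \mid B, D)$.

```python
import random
from itertools import product

random.seed(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # A=0, B=1, C=2, D=3
# Arbitrary positive potentials, one 2x2 table per edge (illustrative values).
psi = {e: [[random.uniform(0.5, 2.0) for _ in range(2)] for _ in range(2)]
       for e in edges}

def score(x):
    """Unnormalized product of edge potentials."""
    s = 1.0
    for (i, j) in edges:
        s *= psi[(i, j)][x[i]][x[j]]
    return s

configs = list(product([0, 1], repeat=4))
Z = sum(score(x) for x in configs)
P = {x: score(x) / Z for x in configs}

def max_ci_violation():
    """Largest |P(a,c|b,d) - P(a|b,d) P(c|b,d)| over all assignments."""
    worst = 0.0
    for b, d in product([0, 1], repeat=2):
        p_bd = sum(P[(a, b, c, d)] for a in (0, 1) for c in (0, 1))
        for a, c in product([0, 1], repeat=2):
            p_a = sum(P[(a, b, c2, d)] for c2 in (0, 1)) / p_bd
            p_c = sum(P[(a2, b, c, d)] for a2 in (0, 1)) / p_bd
            worst = max(worst, abs(P[(a, b, c, d)] / p_bd - p_a * p_c))
    return worst
```

The violation is zero up to floating-point error, so this undirected model does encode $A \perp C \mid \{B, D\}$.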
Independence properties: global independencies
• What kind of distributions can be represented by undirected graphs?
• Def: the global Markov properties of a UG $H$ are
$$I(H) = \{A \perp C \mid B : \mathrm{sep}_H(A; C \mid B)\}$$
Local and Global Markov Properties
• For directed graphs, we defined I-maps in terms of local Markov properties, and derived global independence.
• For undirected graphs, we defined I-maps in terms of global Markov properties, and will now derive local independence.
• Def: The pairwise Markov independencies associated with undirected graph $H = (V, E)$ are
$$I_p(H) = \{X \perp Y \mid V \setminus \{X, Y\} : \{X, Y\} \notin E\}$$
• Example: $X_1 \perp X_5 \mid \{X_2, X_3, X_4\}$
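The pairwise independencies are one statement per missing edge. The sketch below lists them for a hypothetical five-node chain $X_1 - X_2 - X_3 - X_4 - X_5$; the chain is an assumption, since the graph behind the example is not recoverable from the text.

```python
from itertools import combinations

def pairwise_markov_independencies(nodes, edges):
    """Return (x, y, rest) for every non-adjacent pair: x indep. of y given rest."""
    edge_set = {frozenset(e) for e in edges}
    statements = []
    for x, y in combinations(sorted(nodes), 2):
        if frozenset((x, y)) not in edge_set:
            statements.append((x, y, set(nodes) - {x, y}))
    return statements

# Hypothetical chain X1 - X2 - X3 - X4 - X5
nodes = ["X1", "X2", "X3", "X4", "X5"]
edges = [("X1", "X2"), ("X2", "X3"), ("X3", "X4"), ("X4", "X5")]
pairwise = pairwise_markov_independencies(nodes, edges)
```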
Local Markov Properties
• A distribution has the local Markov property w.r.t. a graph $H = (V, E)$ if the conditional distribution of a variable given its neighbors is independent of the remaining nodes:
$$I_\ell(H) = \{X \perp V \setminus (\{X\} \cup N_H(X)) \mid N_H(X) : X \in V\}$$
• $N_H(X)$ is the Markov blanket of $X$
Perfect maps
• Def: A Markov network $H$ is a perfect map for $P$ if for any $X$, $Y$, $Z$ we have that:
$$\mathrm{sep}_H(X; Z \mid Y) \Leftrightarrow P \models (X \perp Z \mid Y)$$
• Thm: Not every distribution has a perfect map as an MRF
• No undirected network can capture the independencies encoded in a v-structure
Exponential Families
• Constraining clique potentials to be positive could be inconvenient (e.g., the interactions between a pair of atoms can be either attractive or repulsive). We represent a clique potential $\psi_c$ in an unconstrained form using a real-valued "energy" function $\phi_c$ and have:
$$\psi_c(X_c) = \exp(-\phi_c(X_c))$$
• This gives:
$$P(X) = \frac{1}{Z} \exp\Big(-\sum_{c \in C} \phi_c(X_c)\Big) = \frac{1}{Z} \exp(-H(X))$$
where $H(X) = \sum_{c \in C} \phi_c(X_c)$ is the free energy.
• In physics, this is the Boltzmann distribution.
• In statistics, this is a log-linear model.
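The energy form can be sketched in the same brute-force way as the potential form. Here the energy function `phi`, which takes both negative and positive values, and the three-node chain are assumptions for illustration; the code checks nothing beyond the formula $P(X) = \exp(-H(X)) / Z$.

```python
import math
from itertools import product

cliques = [(0, 1), (1, 2)]   # hypothetical chain X1 - X2 - X3

def phi(xa, xb):
    """Illustrative real-valued energy: negative for agreement, positive otherwise."""
    return -1.0 if xa == xb else 0.5

def H(x):
    """Total energy: sum of clique energies."""
    return sum(phi(x[a], x[b]) for a, b in cliques)

configs = list(product([0, 1], repeat=3))
Z = sum(math.exp(-H(x)) for x in configs)
P = {x: math.exp(-H(x)) / Z for x in configs}
most_likely = max(P, key=P.get)
```

The most probable configuration is one of minimum energy, matching the "low-energy state" intuition from the grid model.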
Boltzmann machines
• A fully connected graph with pairwise potentials on binary-valued nodes is called a Boltzmann machine
$$P(X_1, X_2, X_3, X_4) = \frac{1}{Z} \exp\Big(\sum_{ij} \phi_{ij}(X_i, X_j)\Big) = \frac{1}{Z} \exp\Big(\sum_{ij} \theta_{ij} X_i X_j + \sum_{i} \alpha_i X_i + C\Big)$$
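For four binary nodes the Boltzmann machine distribution can be enumerated exactly. The parameter values for `theta` and `alpha` below are made up for illustration, and the constant $C$ is dropped because it cancels in the normalization.

```python
import math
from itertools import combinations, product

n = 4
# Hypothetical parameters: theta couples every pair, alpha biases each node.
theta = {(i, j): 0.5 for i, j in combinations(range(n), 2)}
alpha = [0.1, -0.2, 0.0, 0.3]

def log_score(x):
    """Pairwise plus unary terms inside the exponential."""
    pair = sum(theta[i, j] * x[i] * x[j] for i, j in theta)
    unary = sum(a * xi for a, xi in zip(alpha, x))
    return pair + unary   # the constant C cancels after normalization

configs = list(product([0, 1], repeat=n))
Z = sum(math.exp(log_score(x)) for x in configs)
P = {x: math.exp(log_score(x)) / Z for x in configs}
```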
Restricted Boltzmann Machines (RBMs)
Properties of RBMs
• Factors are marginally dependent
• Factors are conditionally independent given observations on visible nodes:
$$P(l \mid w) = \prod_{i} P(l_i \mid w)$$
• Iterative Gibbs sampling
• Learning with contrastive divergence
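The factorization $P(l \mid w) = \prod_i P(l_i \mid w)$ can be verified on a toy RBM by comparing per-unit conditionals with a brute-force conditional. The sketch assumes binary units, a bipartite score with weights `W` and biases `b`, `c` (all values hypothetical), and the same positive-exponent convention as the Boltzmann machine formula above.

```python
import math
from itertools import product

# Hypothetical tiny RBM: 3 visible units w, 2 hidden units l (binary).
W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]   # W[i][j] couples w_i and l_j
b = [0.1, -0.2, 0.0]                          # visible biases
c = [0.2, -0.4]                               # hidden biases

def log_score(w, l):
    s = sum(W[i][j] * w[i] * l[j] for i in range(3) for j in range(2))
    s += sum(bi * wi for bi, wi in zip(b, w))
    s += sum(cj * lj for cj, lj in zip(c, l))
    return s

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def p_hidden_given_visible(w):
    """Per-unit conditionals P(l_j = 1 | w)."""
    return [sigmoid(c[j] + sum(W[i][j] * w[i] for i in range(3)))
            for j in range(2)]

def joint_conditional(w):
    """Brute-force P(l | w) computed from the joint score, for comparison."""
    scores = {l: math.exp(log_score(w, l)) for l in product([0, 1], repeat=2)}
    total = sum(scores.values())
    return {l: s / total for l, s in scores.items()}
```

Because the hidden units factorize given the visible ones (and vice versa), a Gibbs sampler can resample a whole layer at once, which is what makes iterative Gibbs sampling and contrastive divergence practical here.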
Conditional Random Fields
$$P_\theta(Y \mid X) = \frac{1}{Z(\theta, X)} \exp\Big(\sum_{c} \theta_c f_c(X, Y_c)\Big)$$
• Discriminative
• Doesn't assume that features are independent
• When labeling $X_i$, feature observations are taken into account
Conditional Models
• Model the conditional probability P(label sequence Y | observation sequence X) rather than the joint probability P(X, Y)
• Specify the probability of possible label sequences given an observation sequence
• Allow arbitrary, non-independent features on the observation sequence X
• The probability of a transition between labels may depend on past and future observations
• Relax the strong independence assumptions made in generative models
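Conditional Distribution
• If the graph $H = (V, E)$ of $Y$ is a tree, the conditional distribution over the label sequence $Y = y$, given $X = x$, by the Hammersley-Clifford theorem of random fields is:
$$P_\theta(y \mid x) \propto \exp\Big(\sum_{e \in E, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V, k} \mu_k g_k(v, y|_v, x)\Big)$$
• $x$ is a data sequence, $y$ is a label sequence, $v$ is a vertex from $V$ (the set of label random variables), $e$ is an edge from $E$ over $V$, $f_k$ and $g_k$ are given and fixed, $g_k$ is a Boolean vertex feature, $f_k$ is a Boolean edge feature, $k$ indexes the features, $\lambda_k$ and $\mu_k$ are parameters, and $y|_e$ is the set of components of $y$ associated with edge $e$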
Conditional Distribution
• CRFs use the observation-dependent normalization $Z(x)$ for the conditional distributions
$$P_\theta(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{e \in E, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V, k} \mu_k g_k(v, y|_v, x)\Big)$$
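A linear-chain instance makes the role of $Z(x)$ concrete: it is recomputed for every observation sequence by summing over all label sequences. The sketch below is a toy under stated assumptions: the two labels, the single edge feature, the single vertex feature, and the weights `lam` and `mu` are all hypothetical, and enumeration stands in for the dynamic programming a real implementation would use.

```python
import math
from itertools import product

LABELS = ["N", "V"]

def edge_feature(y_prev, y_cur, x, t):
    """Boolean edge feature f: fires on an N-to-V transition (illustrative)."""
    return 1.0 if (y_prev, y_cur) == ("N", "V") else 0.0

def vertex_feature(y_cur, x, t):
    """Boolean vertex feature g: fires when a word ending in 's' is labeled V."""
    return 1.0 if x[t].endswith("s") and y_cur == "V" else 0.0

def score(y, x, lam=1.0, mu=2.0):
    """Weighted sum of edge and vertex features inside the exponential."""
    s = sum(lam * edge_feature(y[t - 1], y[t], x, t) for t in range(1, len(x)))
    s += sum(mu * vertex_feature(y[t], x, t) for t in range(len(x)))
    return s

def conditional(x):
    """P(y | x) with the observation-dependent normalizer Z(x)."""
    ys = list(product(LABELS, repeat=len(x)))
    Zx = sum(math.exp(score(y, x)) for y in ys)
    return {y: math.exp(score(y, x)) / Zx for y in ys}

dist = conditional(["dog", "runs"])
```

Note that the features may inspect the whole observation sequence `x` at any position `t`, which is the sense in which CRFs allow arbitrary dependencies on the input.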
Conditional Distribution
• Allow arbitrary dependencies on input
• Clique dependencies on labels
• Use approximate inference for general graphs
Summary
• Undirected graphical models capture "relatedness", "coupling", and "co-occurrence" between entities
• Local and global independence properties are identifiable via graph separation criteria
• Defined on clique potentials
• Generally intractable to compute the likelihood due to the presence of the partition function
• Likelihood-based learning
• Can be used to define either joint or conditional distributions
• Important special cases: Ising models, RBMs, CRFs