Probabilistic Modeling
- Compactly represent distributions.
- Compute their marginals efficiently.
- Compute their modes efficiently.
- Learn them efficiently.
- Graphical Models are probabilistic models:
  P(X = x) = (1/Z) ∏_k φ_k(x_{k})
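A minimal sketch of the factorized form above for two binary variables and a single pairwise factor. The factor values here are made up for illustration; they are not from the slides.

```python
from itertools import product

# One pairwise factor phi(x1, x2) over two binary variables
# (illustrative values, not from the slides).
phi = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}

# Partition function Z: sum the factor product over all states.
Z = sum(phi[s] for s in product([0, 1], repeat=2))

# Normalized distribution P(X = x) = phi(x) / Z.
P = {s: phi[s] / Z for s in phi}
```

Even in this toy case, computing Z requires touching every state; with n variables that sum has 2^n terms, which is the intractability discussed next.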
Limitations of Graphical Models
- Graphical Models can't represent all compact distributions.
  - For example: the uniform distribution over binary vectors with an even number of 1s.
- Computing the partition function is intractable in general.
- Thus, we use:
  - Approximate Inference
  - Approximate Training
Key Idea behind SPNs
- The partition function (Z) is a sum over exponentially many terms.
- We can rearrange and group some terms to reuse computations.
  - That's essentially what inference tries to do.
- Why not learn a structure that is optimized for computing Z?
  - That's essentially what SPNs try to do.
- If an SPN is valid, then Z can be computed exactly, and in linear time.
- The question now is how many such Z's exist.
  - SPNs can represent a larger set of distributions.
  - The authors argue that SPNs cover an interesting and useful set.
Sum-Product Networks
- Use the idea of a Network Polynomial: the sum, over all states x, of the unnormalized probability Φ(x) times the product Π(x) of the indicator variables set in x:
  Φ = ∑_x Φ(x) Π(x)
- Bernoulli distribution: Φ(x, x̄) = p[x] + (1 − p)[x̄]
- Graphical model A → B:
  Φ(A, B, Ā, B̄) = p(B|A)[A][B] + p(B̄|A)[A][B̄] + p(B|Ā)[Ā][B] + p(B̄|Ā)[Ā][B̄]
- For a particular state, only one monomial is active, and it gives the unnormalized probability of that state.
- P(A=1, B=0) ∝ Φ(A=1, B=0, Ā=0, B̄=1)
Network Polynomial (Contd.)
- P(A=1, B=0) ∝ Φ(A=1, B=0, Ā=0, B̄=1)
- P(A=1) ∝ Φ(A=1, B=1, Ā=0, B̄=1)
- Z = Φ(A=1, B=1, Ā=1, B̄=1)
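The three indicator settings above can be checked directly. A sketch of the network polynomial for the A → B model, with made-up CPT values (p_a, p_b_a, p_b_na are illustrative, not from the slides):

```python
def phi(a, b, na, nb, p_a=0.6, p_b_a=0.7, p_b_na=0.2):
    """Network polynomial Phi(A, B, A_bar, B_bar) for the model A -> B.
    a/na and b/nb are the indicator values for each variable and its negation."""
    return (p_a * p_b_a * a * b                   # A=1, B=1
            + p_a * (1 - p_b_a) * a * nb          # A=1, B=0
            + (1 - p_a) * p_b_na * na * b         # A=0, B=1
            + (1 - p_a) * (1 - p_b_na) * na * nb) # A=0, B=0

# A particular state: matching indicators 1, complements 0.
p_state = phi(1, 0, 0, 1)   # proportional to P(A=1, B=0)

# A marginal: set BOTH indicators of the summed-out variable to 1.
p_marg = phi(1, 1, 0, 1)    # proportional to P(A=1)

# Partition function: all indicators 1.
Z = phi(1, 1, 1, 1)         # = 1 here, since the CPT rows sum to 1
```

Each query is a single evaluation of the polynomial; the catch, as the slide notes next, is that the polynomial itself has exponentially many monomials in general.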
- Computing Z is linear in the number of terms in the network polynomial.
  - However, there are exponentially many such terms.
  - Can we do something?
- SPNs represent the network polynomial hierarchically, and thus reuse computations.
Sum-Product Networks
- Alternate sum and product layers.
- Variables at the leaves.
- Weights on the children of each sum node.
- S(x) = output of the root node.
- A polynomial number of edges implies tractable inference.
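A minimal sketch of this bottom-up evaluation (the tuple encoding of nodes and the weights are my own, not from the paper): indicator leaves, product nodes that multiply, and sum nodes that take weighted sums.

```python
def eval_spn(node, leaves):
    """Evaluate an SPN bottom-up given indicator values for the leaves."""
    kind = node[0]
    if kind == "leaf":
        return leaves[node[1]]                 # indicator value, 0 or 1
    if kind == "prod":
        out = 1.0
        for child in node[1]:
            out *= eval_spn(child, leaves)
        return out
    if kind == "sum":
        return sum(w * eval_spn(c, leaves) for w, c in node[1])
    raise ValueError(kind)

# A two-variable mixture SPN: root sum over two products of per-variable sums.
sx1 = ("sum", [(0.8, ("leaf", "x")), (0.2, ("leaf", "nx"))])
sx2 = ("sum", [(0.1, ("leaf", "x")), (0.9, ("leaf", "nx"))])
sy1 = ("sum", [(0.6, ("leaf", "y")), (0.4, ("leaf", "ny"))])
sy2 = ("sum", [(0.3, ("leaf", "y")), (0.7, ("leaf", "ny"))])
root = ("sum", [(0.5, ("prod", [sx1, sy1])), (0.5, ("prod", [sx2, sy2]))])

# All indicators 1 -> partition function (1.0, since sum weights sum to 1).
Z = eval_spn(root, {"x": 1, "nx": 1, "y": 1, "ny": 1})

# Marginal P(X=1): clamp X's indicators, leave Y's both at 1.
p_x1 = eval_spn(root, {"x": 1, "nx": 0, "y": 1, "ny": 1})
```

One upward pass answers any marginal query, so inference is linear in the number of edges.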
Valid Sum-Product Networks
- Can we make arbitrary connections? No: the sum-product network should be valid, i.e., it should represent a valid network polynomial:
  S(e) = ∑_{x ∈ e} S(x)   ∀ evidence e
- Sufficient conditions for an SPN to be valid:
  - Completeness: all of a sum node's children have the same scope.
  - Consistency: no product node has a variable in one child and its complement in the other.
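A sketch of checking these two conditions on a small node encoding (the encoding is my own: leaves are ("leaf", var, sign), internal nodes are ("sum", [(w, child), ...]) or ("prod", [child, ...])):

```python
def _children(node):
    return [c for _, c in node[1]] if node[0] == "sum" else node[1]

def scope(node):
    """Set of variables a node depends on."""
    if node[0] == "leaf":
        return {node[1]}
    return set().union(*(scope(c) for c in _children(node)))

def signed_leaves(node):
    """All (variable, sign) indicator pairs appearing under a node."""
    if node[0] == "leaf":
        return {(node[1], node[2])}
    out = set()
    for c in _children(node):
        out |= signed_leaves(c)
    return out

def is_complete(node):
    """Completeness: every sum node's children share the same scope."""
    if node[0] == "leaf":
        return True
    if node[0] == "sum":
        scopes = [scope(c) for c in _children(node)]
        if any(s != scopes[0] for s in scopes[1:]):
            return False
    return all(is_complete(c) for c in _children(node))

def is_consistent(node):
    """Consistency: no product has a variable in one child and its
    complement in another."""
    if node[0] == "leaf":
        return True
    if node[0] == "prod":
        sets = [signed_leaves(c) for c in node[1]]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if any((v, not s) in sets[j] for v, s in sets[i]):
                    return False
    return all(is_consistent(c) for c in _children(node))

good_prod = ("prod", [("leaf", "X", True), ("leaf", "Y", True)])
bad_prod  = ("prod", [("leaf", "X", True), ("leaf", "X", False)])  # X and X_bar
good_sum  = ("sum", [(0.5, ("leaf", "X", True)), (0.5, ("leaf", "X", False))])
bad_sum   = ("sum", [(0.5, ("leaf", "X", True)), (0.5, ("leaf", "Y", True))])
```

Violating either condition can make S(x) over- or under-count monomials of the network polynomial, which is why both are required for validity.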
Generalizations of SPNs
- Discrete variables: use indicators [X = x] as variables.
- Continuous variables: replace sum nodes with integration nodes.
- MAP inference: replace sum nodes with max nodes.
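The max-node substitution is a one-line change to the upward pass. A sketch (node encoding is my own; a full MAP pass would also trace the argmax child downward, omitted here):

```python
def eval_max(node, leaves):
    """Upward pass with sum nodes acting as max nodes (max-product)."""
    kind = node[0]
    if kind == "leaf":
        return leaves[node[1]]
    if kind == "prod":
        out = 1.0
        for c in node[1]:
            out *= eval_max(c, leaves)
        return out
    # sum node -> take the max over weighted children instead of the sum
    return max(w * eval_max(c, leaves) for w, c in node[1])

# Bernoulli SPN: S(x) = 0.3*[x] + 0.7*[x_bar].
root = ("sum", [(0.3, ("leaf", "x")), (0.7, ("leaf", "nx"))])

# With all indicators at 1, the root now returns the probability of the
# MAP state (X=0 here), not the partition function.
m = eval_max(root, {"x": 1, "nx": 1})
```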
Training
- If the weights of the children of every sum node sum to 1:
  - Z = 1.
  - Unnormalized probabilities become normalized probabilities.
  - We can directly maximize the log-likelihood of the network's output.
- Initialization: pick any dense, valid sum-product network with a polynomial number of edges.
- Training: minimize the negative log-likelihood of the training data.
  - Use an L0 or L1 penalty to push most weights toward 0.
  - Normalize the weights after every iteration.
- Post-Processing: prune the connections with 0 weight and any nodes left without children.
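A minimal sketch of the per-iteration step described above, for one sum node's weight vector: L1-style soft-thresholding toward 0 followed by renormalization (the function and the threshold value are my own illustration, not the paper's exact procedure):

```python
def renormalize(weights, l1=0.01):
    """Shrink each weight toward 0 by l1, then renormalize to sum to 1."""
    shrunk = [max(w - l1, 0.0) for w in weights]  # soft-threshold: small weights hit 0
    total = sum(shrunk)
    return [w / total for w in shrunk] if total > 0 else shrunk

w = renormalize([0.5, 0.3, 0.005])  # the tiny third weight is driven to exactly 0
```

Weights that reach exactly 0 are what the post-processing step prunes away, so the learned structure emerges from the dense initial network.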
Training (Contd.)
- Method 1: Backpropagation
  - However, as networks become very deep, it suffers from the vanishing gradient problem.
- Method 2: Expectation Maximization
  - The authors argue that EM suffers from the same problem as networks become very deep.
- Method 2b: Hard methods
  - Hard EM: don't estimate the full posterior; instead, pick the MAP/MLE estimates of the hidden variables.
Experiments
- Image completion task: given an image with half of it removed, complete the image.
- Benchmarked against:
  - Deep Boltzmann Machine
  - Deep Belief Network
  - PCA
  - Nearest Neighbor
- Metric: average squared distance between the actual image and the generated image.
- Training: maximize likelihood. Test: use MAP inference to fill in the image.
Key Differences with Graphical Models
- We can't reason about conditional independencies with SPNs.
- It is not easy to embed domain knowledge into the network.
- We always have to learn the structure of the network from the data itself.
- We lose the interpretability that we had in graphical models.

Key Differences with Other Deep Architectures
- We have product nodes as well.
Conclusion
- Graphical models suffer from intractable inference and thus resort to approximate inference.
- Sum-product networks explicitly optimize their network structure to make inference tractable.
- SPNs lead to exact and efficient inference.
- SPNs can compactly represent some distributions which graphical models can't.