Source: swoh.web.engr.illinois.edu/courses/IE598/handout/fall2016_slide4.pdf

Page 1

Sum-Product Networks - A new Deep Architecture

As presented by: Shashank Gupta

Probabilistic Modeling

- Compactly represent distributions.
- Compute their marginals efficiently.
- Compute their modes efficiently.
- Learn them efficiently.
- Graphical Models are probabilistic models:

  P(X = x) = (1/Z) ∏_k φ_k(x_{k})
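The factor-product form above can be made concrete with a toy two-variable model; the factor tables below are hypothetical, chosen only for illustration:

```python
import itertools

# Toy model over two binary variables A, B with two factors.
phi_A = {0: 1.0, 1: 2.0}                       # phi_1(A)
phi_AB = {(0, 0): 3.0, (0, 1): 1.0,
          (1, 0): 1.0, (1, 1): 2.0}            # phi_2(A, B)

def unnormalized(a, b):
    # Product of factors: phi_1(a) * phi_2(a, b)
    return phi_A[a] * phi_AB[(a, b)]

# Partition function Z sums the factor product over all joint states.
Z = sum(unnormalized(a, b) for a, b in itertools.product([0, 1], repeat=2))

def prob(a, b):
    # P(X = x) = (1/Z) * prod_k phi_k(x_{k})
    return unnormalized(a, b) / Z
```

Note that computing Z already requires touching every joint state, which is where the tractability problem on the next slide comes from.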


Limitations of Graphical Models

- Graphical Models can't represent all compact distributions
  - For example: the uniform distribution over binary vectors with an even number of 1s
- Computing the partition function is intractable in general
- Thus, we use:
  - Approximate Inference
  - Approximate Training


Key idea behind SPNs

- The partition function (Z) is a sum over exponentially many terms
- We can rearrange and group some terms to reuse computations
  - That's essentially what inference tries to do.
- Why not learn a structure that is optimized for computing Z?
  - Essentially what SPNs try to do
- If an SPN is valid, then Z can be computed exactly and in linear time.
- The question now is how many such Zs exist
  - Can represent a larger set of distributions.
  - Argue that SPNs cover an interesting and useful set
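The regrouping idea is just distributivity. A minimal sketch on a hypothetical chain A - B - C with invented factor values:

```python
import itertools

# Toy chain A - B - C with hypothetical pairwise factors.
phi1 = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}  # phi1(A, B)
phi2 = {(b, c): 2.0 - b + c for b in (0, 1) for c in (0, 1)}      # phi2(B, C)

# Naive Z: one product per joint state -- 2^3 terms.
Z_naive = sum(phi1[(a, b)] * phi2[(b, c)]
              for a, b, c in itertools.product([0, 1], repeat=3))

# Regrouped Z: for each b, the sums over a and over c factor out and are
# each computed once -- the reuse that SPN structure makes explicit.
Z_grouped = sum(sum(phi1[(a, b)] for a in (0, 1)) *
                sum(phi2[(b, c)] for c in (0, 1))
                for b in (0, 1))

assert abs(Z_naive - Z_grouped) < 1e-9
```

On a chain of n variables the regrouped form needs O(n) work where the naive sum needs O(2^n); an SPN is a learned network of exactly such regroupings.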


Sum Product Network

- Uses the idea of a Network Polynomial:

  Φ = ∑_x Φ(x) Π(x), where Π(x) is the product of the indicator variables set in state x

- Bernoulli Distribution: Φ(x, x̄) = p[x] + (1-p)[x̄]

- Graphical Model A -> B:
  Φ(A, B, Ā, B̄) = p(B|A)[A][B] + p(B̄|A)[A][B̄] + p(B|Ā)[Ā][B] + …

- For a particular state, only 1 monomial will be active, and it will give the unnormalized probability of that state

- P(A=1, B=0) ∝ Φ(A=1, B=0, Ā=0, B̄=1)
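As a sketch, the Bernoulli network polynomial can be evaluated directly by feeding in indicator values (p = 0.7 is an arbitrary choice):

```python
# Network polynomial of a Bernoulli variable X:
#   Phi(x, x_bar) = p*[x] + (1-p)*[x_bar]
# The indicators [x], [x_bar] are passed in as 0/1 inputs.
p = 0.7

def phi_bernoulli(ind_x, ind_xbar):
    return p * ind_x + (1 - p) * ind_xbar

# At a full state, exactly one monomial is active.
assert abs(phi_bernoulli(1, 0) - p) < 1e-12        # P(X = 1)
assert abs(phi_bernoulli(0, 1) - (1 - p)) < 1e-12  # P(X = 0)
```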


Network Polynomial (Contd.)

P(A=1, B=0) ∝ Φ(A=1, B=0, Ā=0, B̄=1)

P(A=1) ∝ Φ(A=1, B=1, Ā=0, B̄=1)

Z ∝ Φ(A=1, B=1, Ā=1, B̄=1)
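The three queries above can be reproduced on the A -> B model by setting indicators appropriately. The parameters below are hypothetical; since this toy distribution is normalized, the ∝ becomes =:

```python
# Network polynomial of the model A -> B, with hypothetical parameters.
p_A = 0.6
p_B_given = {1: 0.9, 0: 0.2}   # p(B=1 | A=a)

def phi(iA, iAbar, iB, iBbar):
    # One monomial per joint state, each weighted by its probability.
    total = 0.0
    for a in (1, 0):
        for b in (1, 0):
            pr = ((p_A if a else 1 - p_A)
                  * (p_B_given[a] if b else 1 - p_B_given[a]))
            total += pr * (iA if a else iAbar) * (iB if b else iBbar)
    return total

# Full state: P(A=1, B=0) -- one indicator per variable is set.
pAB = phi(1, 0, 0, 1)
# Marginal: P(A=1) -- set both of B's indicators to 1 to sum B out.
pA = phi(1, 0, 1, 1)
# Partition function: all indicators set to 1.
Z = phi(1, 1, 1, 1)
```

Marginalization costs nothing extra: summing a variable out is just setting both of its indicators to 1 in a single evaluation.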

- Z is linear in the number of terms in the network polynomial
  - However, there are exponentially many such terms.
  - Can we do something?
- SPNs represent the network polynomial in a hierarchical manner, and thus reuse computations.


Sum Product Network

- Alternate Sum and Product Layers
- Variables at the leaves
- Weights on the children of the sum nodes
- S(x) = output of the root node
- A polynomial number of edges implies tractable inference
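A minimal bottom-up evaluator for such a network might look like this; the two-component structure and the weights are invented for illustration:

```python
# Leaves are indicator inputs; sum nodes combine weighted children,
# product nodes multiply children.

def sum_node(weighted_children):
    return sum(w * c for w, c in weighted_children)

def product_node(children):
    result = 1.0
    for c in children:
        result *= c
    return result

def S(iA, iAbar, iB, iBbar):
    # A two-component mixture over {A, B}, combined by a root sum node.
    left = product_node([sum_node([(0.6, iA), (0.4, iAbar)]),
                         sum_node([(0.9, iB), (0.1, iBbar)])])
    right = product_node([sum_node([(0.2, iA), (0.8, iAbar)]),
                          sum_node([(0.3, iB), (0.7, iBbar)])])
    return sum_node([(0.5, left), (0.5, right)])

# Because each sum node's weights sum to 1, S(1,1,1,1) = Z = 1, so the
# root output is already a normalized probability.
assert abs(S(1, 1, 1, 1) - 1.0) < 1e-12
```

A single upward pass touches each edge once, so inference is linear in the number of edges.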

Valid Sum Product Networks

- Can we make any arbitrary connections?
- A sum product network should be valid, i.e. it should represent a valid network polynomial:

  S(e) = ∑_{x ∈ e} S(x), for all evidence e

- Sufficient conditions for an SPN to be valid:
  - Completeness: all sum nodes' children should have the same scope
  - Consistency: there should be no product node that has a variable in one child, and its complement in the other.
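A sketch of checking the two conditions on a toy node encoding; the dict representation (a 'scope' set per child, plus 'pos'/'neg' sets tracking indicator polarity) is an assumption for illustration, not from the paper:

```python
def complete(sum_node):
    # Completeness: all children of a sum node share the same scope.
    scopes = [frozenset(c['scope']) for c in sum_node['children']]
    return len(set(scopes)) <= 1

def consistent(product_node):
    # Consistency: no variable appears positively in one child and
    # negated in another.
    kids = product_node['children']
    for i, ci in enumerate(kids):
        for cj in kids[i + 1:]:
            if ci['pos'] & cj['neg'] or ci['neg'] & cj['pos']:
                return False
    return True

good_sum = {'children': [{'scope': {'A'}}, {'scope': {'A'}}]}
bad_sum = {'children': [{'scope': {'A'}}, {'scope': {'A', 'B'}}]}
assert complete(good_sum) and not complete(bad_sum)

good_prod = {'children': [{'pos': {'A'}, 'neg': set()},
                          {'pos': {'B'}, 'neg': set()}]}
bad_prod = {'children': [{'pos': {'A'}, 'neg': set()},
                         {'pos': set(), 'neg': {'A'}}]}
assert consistent(good_prod) and not consistent(bad_prod)
```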

Generalizations of SPNs

- Discrete variables: use [X = x] as the indicator variables
- Continuous variables: replace sum nodes with integration nodes
- MAP inference: replace sum nodes with Max nodes
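The MAP generalization can be sketched by swapping the sum nodes of a toy network for weighted max nodes (structure and weights are hypothetical):

```python
def max_node(weighted_children):
    # Same weights as a sum node, but take the best child instead of mixing.
    return max(w * c for w, c in weighted_children)

def product_node(children):
    result = 1.0
    for c in children:
        result *= c
    return result

def S_max(iA, iAbar, iB, iBbar):
    left = product_node([max_node([(0.6, iA), (0.4, iAbar)]),
                         max_node([(0.9, iB), (0.1, iBbar)])])
    right = product_node([max_node([(0.2, iA), (0.8, iAbar)]),
                          max_node([(0.3, iB), (0.7, iBbar)])])
    return max_node([(0.5, left), (0.5, right)])

# With all indicators set to 1, the upward pass yields the MAP score; a
# downward pass picking each max node's argmax child recovers the MAP state.
map_score = S_max(1, 1, 1, 1)
```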

Training

- If the weights of the children of all sum nodes sum to 1:
  - Z = 1
  - Unnormalized probabilities become normalized probabilities.
  - Can directly maximize the log likelihood of the output of the network
- Initialization: pick any dense, valid Sum Product Network with a polynomial number of edges.
- Training: minimize the negative log likelihood of the training data.
  - Use the L0 or L1 norm to push as many weights as possible towards 0
  - Normalize weights after every iteration
- Post-Processing: prune the connections with 0 weights and any nodes without children.
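The per-iteration normalization step can be sketched as follows; the weight layout (one list per sum node) is an assumed representation:

```python
# Renormalize each sum node's child weights to sum to 1 after an update,
# which keeps Z = 1 so the root output stays a normalized probability.

def normalize(sum_node_weights):
    normalized = []
    for ws in sum_node_weights:
        total = sum(ws)
        normalized.append([w / total for w in ws])
    return normalized

weights = [[0.2, 0.6], [3.0, 1.0]]   # arbitrary post-update weights
weights = normalize(weights)
assert all(abs(sum(ws) - 1.0) < 1e-12 for ws in weights)
```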

Training (Contd.)

- Method 1: Back Propagation
  - However, as networks become very deep, it suffers from the vanishing gradient problem.
- Method 2: Expectation Maximization
  - They argue that EM suffers from the same problem as networks become very deep.
- Method 2b: Hard Methods
  - Hard EM: don't estimate the full posterior; instead pick the MAP/MLE estimates of the hidden variables.

Experiments

- Image completion task:
  - Given a half-trimmed image, complete the image.
- Benchmarked against:
  - Deep Boltzmann Machine
  - Deep Belief Network
  - PCA
  - Nearest Neighbor
- Used the average squared distance between the actual image and the generated image as the metric
- Training: maximize likelihood
- Test: use MAP inference to fill in the image

Quantitative Results

Qualitative Results

[Image-completion examples comparing: Original, SPN, DBM, DBN, PCA, NN]

Speed

- SPN is not only more accurate because of exact inference
- It is also an order of magnitude faster

Key differences with Graphical Models

- We can't reason about conditional independencies with SPNs.
- Not easy to embed domain knowledge into the network.
- We'll always have to learn the structure of the network from the data itself.
- We lose the interpretability that we had in Graphical Models.

Key differences with other Deep Architectures

- We have product nodes as well.

Conclusion

- Graphical Models suffer due to intractable inference and thus resort to approximate inference.
- Sum product networks explicitly optimize their network structure to make inference tractable.
- SPNs lead to exact and efficient inference.
- SPNs can compactly represent some distributions which Graphical Models can't.

Questions?