Sum-Product Networks: A New Deep Architecture

Pedro DomingosDept. Computer Science & Eng.

University of Washington

Joint work with Hoifung Poon

Graphical Models: Challenges

Bayesian Network Markov Network

Sprinkler Rain

Grass Wet

Advantage: Compactly represent probability

Problem: Inference is intractable

Problem: Learning is difficult

Restricted Boltzmann Machine (RBM)

Stack many layersE.g.: DBN [Hinton & Salakhutdinov,2006]

CDBN [Lee et al.,2009]

DBM [Salakhutdinov & Hinton,2010]

Potentially much more powerful than shallow architectures [Bengio, 2009]

But … Inference is even harder Learning requires extensive effort

Deep Learning

Learning: Requires approximate inference

Inference: Still approximate

Graphical Models

E.g., hierarchical mixture model, thin junction tree, etc.

Problem: Too restricted

Graphical Models

Existing Tractable Models

This Talk: Sum-Product Networks

Compactly represent partition function using a deep network

Graphical Models

Sum-Product Networks

Graphical Models

Exact inference linear time in network size

Graphical Models

Can compactly represent many more distributions

Graphical Models

Learn optimal way to reuse computation, etc.

Outline

Sum-product networks (SPNs) Learning SPNs Experimental results Conclusion

Bottleneck: Summing out variables

E.g.: Partition function

Sum of exponentially many products

Why Is Inference Hard?

( , , ) , ,N j N

P X X X XZ

Alternative Representation

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(X) = 0.4 I[X1=1] I[X2=1]

+ 0.2 I[X1=1] I[X2=0]

+ 0.1 I[X1=0] I[X2=1]

+ 0.3 I[X1=0] I[X2=0]

Alternative Representation

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(X) = 0.4 I[X1=1] I[X2=1]

+ 0.2 I[X1=1] I[X2=0]

+ 0.1 I[X1=0] I[X2=1]

+ 0.3 I[X1=0] I[X2=0]

Shorthand for Indicators

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(X) = 0.4 X1 X2

+ 0.2 X1 X2

+ 0.1 X1 X2

+ 0.3 X1 X2

Sum Out Variables

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(e) = 0.4 X1 X2

+ 0.2 X1 X2

+ 0.1 X1 X2

+ 0.3 X1 X2

e: X1 = 1

Set X1 = 1, X1 = 0, X2 = 1, X2 = 1

Easy: Set both indicators to 1

Graphical Representation

0.2 0.1

X1 X2 X1

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

But … Exponentially Large

Example: Parity Uniform distribution over states with even number of 1’s

X1 X1 X4

But … Exponentially Large

Example: Parity Uniform distribution over states of even number of 1’s

X1 X1 X4

Can we make this more compact?

Use a Deep Network

Example: Parity Uniform distribution over states with even number of 1’s

Use a Deep Network

Induce many hidden layers

Use a Deep Network

Reuse partial computation

Sum-Product Networks (SPNs) Rooted DAG Nodes: Sum, product, input indicator Weights on edges from sum to children

0.7 0.3

0.80.30.10.20.70.90.4

Distribution Defined by SPN

P(X) S(X)

0.7 0.3

0.80.30.10.20.70.90.4

Distribution Defined by SPN

P(X) S(X)

0.7 0.3

0.80.30.10.20.70.90.4

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1e: X1 = 1

P(e) Xe S(X) S(e)

Can We Sum Out Variables?

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1e: X1 = 1

P(e) = Xe P(X) S(e)

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1e: X1 = 1

P(e) Xe S(X) S(e)

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1e: X1 = 1

P(e) Xe S(X) S(e)

Valid SPN

SPN is valid if S(e) = Xe S(X) for all e

Valid Can compute marginals efficiently

Partition function Z can be computed by setting all indicators to 1

Valid SPN: General Conditions

Theorem: SPN is valid if it is complete & consistent

Incomplete Inconsistent

Complete: Under sum, children cover the same set of variables

Consistent: Under product, no variable in one child and negation in another

S(e) Xe S(X) S(e) Xe S(X)

Semantics of Sums and Products

Product Feature Form feature hierarchy

Sum Mixture (with hidden var. summed out)

I[Yi = j]

…… …… j

…… ……

wijSum out Yi

Inference

Probability: P(X) = S(X) / Z

0.7 0.3

0.80.30.10.20.70.90.4

1 0 0 1

0.6 0.9 0.7 0.8

0.42 0.72

X: X1 = 1, X2 = 0

Inference

If weights sum to 1 at each sum node Then Z = 1, P(X) = S(X)

0.7 0.3

0.80.30.10.20.70.90.4

1 0 0 1

0.6 0.9 0.7 0.8

0.42 0.72

X: X1 = 1, X2 = 0

Inference

Marginal: P(e) = S(e) / Z

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1

0.6 0.9 1 1

0.6 0.9

0.69 = 0.51 0.18e: X1 = 1

Inference

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1

0.6 0.9 0.7 0.8

0.42 0.72

0.3 0.72 = 0.216e: X1 = 1

MAP: Replace sums with maxs

MAX MAX MAX MAX

0.7 0.42 = 0.294

Inference

0.7 0.3

0.80.30.10.20.70.90.4

1 0 1 1

0.6 0.9 0.7 0.8

0.42 0.72

0.3 0.72 = 0.216e: X1 = 1

MAX: Pick child with highest value

MAP State: X1 = 1, X2 = 0

MAX MAX MAX MAX

0.7 0.42 = 0.294

Handling Continuous Variables

Sum Integral over input

Simplest case: Indicator Gaussian

SPN compactly defines a very large mixture of Gaussians

SPNs Everywhere

Graphical models

Existing tractable models, e.g.:hierarchical mixture model, thin junction tree, etc.

SPNs can compactly represent many more distributions

SPNs Everywhere

Graphical models

Inference methods, e.g.:Junction-tree algorithm, message passing, …

SPNs can represent, combine, and learn the optimal way

SPNs Everywhere

Graphical models

SPNs can be more compact byleveraging determinism,

context-specific independence, etc.

SPNs Everywhere

Graphical models

Models for efficient inference

E.g., arithmetic circuits, AND/OR graphs, case-factor diagrams

SPN: First approach for learning directly from data

SPNs Everywhere

Graphical models

General, probabilistic convolutional network

Sum: Average-pooling

Max: Max-pooling

SPNs Everywhere

Graphical models

General, probabilistic convolutional network

Grammars in vision and language

E.g., object detection grammar,probabilistic context-free grammar

Sum: Non-terminalProduct: Production rule

Outline

General Approach

Start with a dense SPN

Find the structure by learning weightsZero weights signify absence of connections

Can also learn with EMEach sum node is a mixture over children

The Challenge

In principle, can use gradient descent

But … gradient quickly dilutes

Sum-Product Networks: A New Deep Architecture

Documents

Ptolemy: Architecture Support for Robust Deep Learning

The Cache-Coherence Problem · 2 Read Sum P 1 Write Sum = 3 P 2 Write Sum = 7 P 1 Read Sum 12. CSC/ECE 506: Architecture of Parallel Computers Cache-Coherence Problem Illustration

Exact, Tractable Inference in the Sigma Cognitive ... · Sum-product networks (SPNs) are a new kind of deep architecture that support exact, tractable inference over a large class

A Deep Learning Architecture for Psychometric Natural

Deep EvoGraphNet Architecture For Time-Dependent Brain

Virtual SAN Architecture Deep Dive - VMwaredownload3.vmware.com/vmworld/2014/downloads/session-pdfs/STO1279... · Virtual SAN Architecture Deep Dive STO1279 Christos Karamanolis,

Loan Origination Reference Architecture Deep Dive

Gemmini: Enabling Systematic Deep-Learning Architecture

Deep dive into Java security architecture

VMworld 2014: Virtual SAN Architecture Deep Dive

Mixed Sum-Product Networks: A Deep Architecture for Hybrid ... · Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains Alejandro Molina alejandro.molina@tu-dortmund.de

An introduction to Sum-Product Networks (SPNs) · An introduction to Sum-Product Networks (SPNs): A new deep probabilistic architecture Felix McGregor Prof Johan du Preez

Sum-Product Networks: A New Deep Architecture · 1 Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with

An Improved Deep Learning Architecture for Person Re ...openaccess.thecvf.com/content_cvpr_2015/papers/Ahmed_An...An Improved Deep Learning Architecture for Person Re-Identiﬁcation

A Deep Sum-Product Architecture for Robust Facial Attributes …xgwang/papers/luoWTiccv13a.pdf · 2013-10-15 · A Deep Sum-Product Architecture for Robust Facial Attributes Analysis

Exchange Server 2013 Architecture Deep Dive

2014-29-8 Why the Deep State Always Wins-Zero-Sum Perpetual War

An Architecture for Deep, Hierarchical Generative …papers.nips.cc/paper/6141-an-architecture-for-deep...An Architecture for Deep, Hierarchical Generative Models Philip Bachman phil.bachman@maluuba.com

Deep convolutional neural networks - Computer Science- …web.cs.ucdavis.edu/.../lee_lecture19_deeplearning_notes.pdf · Deep convolutional neural networks ... “Deep” architecture

Conditional Sum-Product Networks: Imposing Structure on ... · CONDITIONAL SUM-PRODUCT NETWORKS Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures