
Probabilistic and Logistic Circuits:

A New Synthesis of

Logic and Machine Learning

Guy Van den Broeck

RelationalAI ArrowCon

Feb 5, 2019

Which method to choose?

Classical AI Methods:

[Figure: a small hand-built model with nodes “Hungry?”, “$25?”, “Restaurant?”, “Sleep?”]

• Clear modeling assumptions

• Well-understood

Neural Networks:

• “Black Box”

• Good performance on image classification

Outline

• Adding knowledge to deep learning

• Probabilistic circuits

• Logistic circuits for image classification


Motivation: Video

[Lu, W. L., Ting, J. A., Little, J. J., & Murphy, K. P. (2013). Learning to track and identify players from broadcast sports videos.]

Motivation: Robotics

[Wong, L. L., Kaelbling, L. P., & Lozano-Perez, T., Collision-free state estimation. ICRA 2012]

Motivation: Language

• Non-local dependencies:

At least one verb in each sentence

• Sentence compression

If a modifier is kept, its subject is also kept

• Information extraction

• Semantic role labeling

… and many more!

[Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge],…, [Chang, M. W., Ratinov, L., & Roth, D. (2012).

Structured learning with constrained conditional models.], [https://en.wikipedia.org/wiki/Constrained_conditional_model]

Motivation: Deep Learning

[Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al.. (2016).

Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

Courses:

• Logic (L)

• Knowledge Representation (K)

• Probability (P)

• Artificial Intelligence (A)

Data

• Must take at least one of Probability or Logic.

• Probability is a prerequisite for AI.

• The prerequisite for KR is either AI or Logic.

Constraints

Running Example

L K P A   allowed by the constraints?
0 0 0 0   ✗
0 0 0 1   ✗
0 0 1 0   ✓
0 0 1 1   ✓
0 1 0 0   ✗
0 1 0 1   ✗
0 1 1 0   ✗
0 1 1 1   ✓
1 0 0 0   ✓
1 0 0 1   ✗
1 0 1 0   ✓
1 0 1 1   ✓
1 1 0 0   ✓
1 1 0 1   ✗
1 1 1 0   ✓
1 1 1 1   ✓

(unstructured space: all 16 rows; structured space: only the rows marked ✓ remain)

Structured Space

7 out of 16 instantiations are impossible.

• Must take at least one of Probability (P) or Logic (L).

• Probability is a prerequisite for AI (A).

• The prerequisite for KR (K) is either AI or Logic.


Boolean Constraints

7 out of 16 instantiations are impossible.
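Written as propositional logic (my transcription of the bullets above; the slides present these requirements as formulas over the course variables), the structured space is the set of models of:

    (P ∨ L) ∧ (A ⇒ P) ∧ (K ⇒ (A ∨ L))

The 9 rows marked ✓ in the table above are exactly the satisfying assignments of this formula.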

Learning in Structured Spaces

Data Constraints (Background Knowledge)

(Physics)

ML Model

+

Today's machine learning tools

don't take knowledge as input!

Learn

Deep Learning with

Logical Knowledge

Data Constraints

Deep Neural

Network

+

Learn

Input

Neural Network Logical Constraint

Output

Output is

probability vector p,

not Boolean logic!

Semantic Loss

Q: How close is output p to satisfying constraint?

Answer: Semantic loss function L(α,p)

• Axioms, for example:

– If p is Boolean then L(p,p) = 0

– If α implies β then L(α,p) ≥ L(β,p) (α more strict)

• Properties:

– If α is equivalent to β then L(α,p) = L(β,p)

– If p is Boolean and satisfies α then L(α,p) = 0


Semantic Loss: Definition

Theorem: The axioms imply a unique semantic loss (up to a multiplicative constant):

L(α, p) ∝ − log Σ_{x ⊨ α} Π_{i : x ⊨ X_i} p_i · Π_{i : x ⊨ ¬X_i} (1 − p_i)

The inner product is the probability of getting state x after flipping coins with probabilities p; the sum over x ⊨ α is then the probability of satisfying α after flipping coins with probabilities p.

Example: Exactly-One

• Data must have some label

We agree this must be one of the 10 digits:

• Exactly-one constraint; for 3 classes:

(x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2) ∧ (¬x2 ∨ ¬x3) ∧ (¬x1 ∨ ¬x3)

• Semantic loss:

L(exactly-one, p) ∝ − log( p1(1 − p2)(1 − p3) + (1 − p1) p2 (1 − p3) + (1 − p1)(1 − p2) p3 )

Each term is the probability that only x_i = 1 after flipping coins; the sum is the probability that exactly one x is true after flipping coins.

Semi-Supervised Learning

• Intuition: Unlabeled data must have some label

Cf. entropy constraints, manifold learning

• Minimize exactly-one semantic loss on unlabeled data

Train with: existing loss + w · semantic loss
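A minimal sketch of this combined objective in PyTorch-style code (my illustration under assumptions, not the authors' released implementation; the weight w and the use of sigmoid outputs on the unlabeled batch are assumptions):

    import torch

    def exactly_one_semantic_loss(p, eps=1e-12):
        # p: (batch, k) independent class probabilities (e.g. sigmoid outputs).
        # Weighted model count of the exactly-one constraint: for each class i,
        # the probability that only x_i comes up true when each x_j is
        # "flipped" with probability p_j.
        k = p.shape[1]
        one_hot = torch.eye(k, device=p.device)                              # (k, k)
        worlds = p.unsqueeze(1) * one_hot + (1 - p.unsqueeze(1)) * (1 - one_hot)
        wmc = worlds.prod(dim=2).sum(dim=1)                                  # (batch,)
        return -torch.log(wmc + eps).mean()

    # loss = cross_entropy(logits_labeled, y) \
    #        + w * exactly_one_semantic_loss(torch.sigmoid(logits_unlabeled))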

MNIST Experiment

Competitive with state of the art

in semi-supervised deep learning

FASHION Experiment

Outperforms Ladder Nets! Same conclusion on CIFAR10

What about real constraints? Paths (cf. the Nature paper)

• Good variable assignments (represent a route): 184

• Bad variable assignments (do not represent a route): 16,777,032

• Unstructured probability space: 184 + 16,777,032 = 2^24

Space easily encoded in logical constraints [Nishino et al.]

How to Compute Semantic Loss?

• In general: #P-hard

Negation Normal Form Circuits

[Darwiche 2002]

Δ = (sun ∧ rain ⇒ rainbow)

Bottom-up Evaluation

Input: a complete truth assignment (each leaf literal is set to 1 or 0).

[Circuit figure: values propagate bottom-up; an AND gate outputs 1 only if all its inputs are 1 (e.g. 0 = 1 AND 0), an OR gate outputs 1 if any input is 1. The root value indicates whether the input satisfies Δ.]
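As a sketch of this bottom-up pass (my illustration; the Node representation is an assumption, not any particular circuit library):

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        kind: str                      # "lit", "and", or "or"
        var: str = ""                  # variable name for "lit" nodes
        positive: bool = True          # literal polarity for "lit" nodes
        children: list = field(default_factory=list)

    def evaluate(node, assignment):
        # Bottom-up logical evaluation of an NNF circuit on a complete assignment.
        if node.kind == "lit":
            return assignment[node.var] == node.positive
        if node.kind == "and":
            return all(evaluate(c, assignment) for c in node.children)
        return any(evaluate(c, assignment) for c in node.children)   # "or"

    # e.g. Δ = (sun ∧ rain ⇒ rainbow), in NNF: ¬sun ∨ ¬rain ∨ rainbow
    # delta = Node("or", children=[Node("lit", "sun", False),
    #                              Node("lit", "rain", False),
    #                              Node("lit", "rainbow", True)])
    # evaluate(delta, {"sun": True, "rain": True, "rainbow": False})  # -> False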

Logical Circuits

Decomposable Circuits

Decomposable

[Darwiche 2002]

Tractable for Logical Inference

• Is there a solution? (SAT)

– SAT(𝛼 ∨ 𝛽) iff SAT(𝛼) or SAT(𝛽) (always)

– SAT(𝛼 ∧ 𝛽) iff SAT(𝛼) and SAT(𝛽) (decomposable)

• How many solutions are there? (#SAT)

• Complexity linear in circuit size

Deterministic Circuits

Deterministic

[Darwiche 2002]

How many solutions are there? (#SAT)

[Circuit figure: the logical circuit is turned into an arithmetic circuit (AND → ×, OR → +) whose bottom-up evaluation counts the solutions.]
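A sketch of that counting pass (my illustration, reusing the Node type from the evaluation sketch above; it additionally assumes the circuit is smooth, otherwise the OR case needs a correction for missing variables):

    def model_count(node):
        # #SAT by one bottom-up pass over a smooth, decomposable, deterministic circuit.
        if node.kind == "lit":
            return 1                                        # one satisfying value of its variable
        if node.kind == "and":
            count = 1
            for c in node.children:                         # decomposability: disjoint variables
                count *= model_count(c)
            return count
        return sum(model_count(c) for c in node.children)   # determinism: disjoint models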

Tractable for Logical Inference

• Is there a solution? (SAT)

• How many solutions are there? (#SAT)

• Stricter languages (e.g., BDD, SDD):

– Equivalence checking

– Conjoin/disjoin/negate circuits

• Complexity linear in circuit size

• Compilation into circuit language by either

– ↓ exhaustive SAT solver

– ↑ conjoin/disjoin/negate


How to Compute Semantic Loss?

• In general: #P-hard

• With a logical circuit for α: Linear!

• Example: exactly-one constraint:

L(exactly-one, p) = − log( p1(1 − p2)(1 − p3) + (1 − p1) p2 (1 − p3) + (1 − p1)(1 − p2) p3 )

• Why linear? Decomposability and determinism: the circuit for α evaluates exactly this sum of products.

Predict Shortest Paths

Add semantic loss for the path constraint.

Evaluation (columns of the results table):

• Are individual edge predictions correct?

• Is the output a path?

• Is the prediction the shortest path? ← this is the real task!

(same conclusion for predicting sushi preferences, see paper)

Outline

• Adding knowledge to deep learning

• Probabilistic circuits

• Logistic circuits for image classification

[Figure: a logical circuit over the course variables L, K, P, A encoding the constraint.]

Logical Circuits

Can we represent a distribution

over the solutions to the constraint?

[Circuit figure: the same logical circuit with a normalized distribution over the inputs of each OR gate, e.g. root weights 0.1 / 0.6 / 0.3 and internal weights 0.6 / 0.4, 0.8 / 0.2, 0.25 / 0.75, 0.9 / 0.1.]

Probabilistic Circuits

Syntax: assign a normalized probability to each OR gate input

Bottom-Up Evaluation of PSDDs

Input: a complete assignment to the variables (leaf literals set to 1 or 0).

Multiply the parameters bottom-up.

[Circuit figure: leaves take their 0/1 values, AND gates multiply their inputs, and OR gates take the parameter-weighted sum of their inputs, e.g. 0.1 = 0.1 × 1 + 0.9 × 0 and 0.24 = 0.8 × 0.3; the root yields Pr(A, B, C, D) = 0.096.]

[Circuit figure: the course-scheduling PSDD evaluated on the input where L, K, P, A are all true.]

Pr(L, K, P, A) = 0.3 × 1 × 0.8 × 0.4 × 0.25 = 0.024
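A minimal sketch of this evaluation (my illustration; the Lit/Decision classes are an assumed toy representation, not the PSDD package's API):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Lit:
        var: str
        positive: bool = True

    @dataclass
    class Decision:
        # (prime, sub, weight) inputs of an OR gate; weights sum to 1
        elements: List[Tuple[object, object, float]] = field(default_factory=list)

    def prob(node, assignment):
        # Bottom-up pass: Pr(assignment) for a complete assignment.
        if isinstance(node, Lit):
            return 1.0 if assignment[node.var] == node.positive else 0.0
        return sum(w * prob(p, assignment) * prob(s, assignment)
                   for p, s, w in node.elements)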

Alternative View of PSDDs

[Figure: the same PSDD redrawn with the parameters attached to each OR gate's inputs (0.1 / 0.6 / 0.3 at the root; 0.6 / 0.4, 0.8 / 0.2, 0.25 / 0.75, 0.9 / 0.1 below).]

Can read probabilistic independences off the circuit structure!

Each node represents

a normalized

distribution!

Can interpret every parameter as a conditional probability! (XAI)

Tractable for Probabilistic Inference

• MAP inference:

Find most-likely assignment to x given y (otherwise NP-hard)

• Computing conditional probabilities Pr(x|y) (otherwise #P-hard)

• Sample from Pr(x|y)

• Algorithms linear in circuit size (pass up, pass down, similar to backprop)
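For instance, marginal and conditional probabilities fall out of the same upward pass (a sketch extending the Lit/Decision toy classes above; it relies on the circuit being smooth, which PSDDs are):

    def marginal(node, evidence):
        # Pr(evidence) for a partial assignment: literals of unassigned
        # variables evaluate to 1, everything else as in prob() above.
        if isinstance(node, Lit):
            if node.var not in evidence:
                return 1.0
            return 1.0 if evidence[node.var] == node.positive else 0.0
        return sum(w * marginal(p, evidence) * marginal(s, evidence)
                   for p, s, w in node.elements)

    # Conditioning is a ratio of two upward passes:
    # Pr(x | y) = marginal(root, {**x, **y}) / marginal(root, y)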

Parameter Learning Algorithms

• Closed form

max likelihood

from complete data

• One pass over data to estimate Pr(x|y)

Not a lot to say: very easy!

PSDDs

…are Sum-Product Networks …are Arithmetic Circuits

[Figure: a PSDD decision node with elements (p1, s1), …, (pn, sn) and parameters θ1, …, θn corresponds to an arithmetic-circuit / SPN fragment: a sum node whose children are the products θi · pi · si.]
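In formula form (my transcription of the figure), each PSDD decision node n computes

    Pr_n(x) = Σ_i θ_i · Pr_{p_i}(x) · Pr_{s_i}(x),

which is exactly a weighted sum node over product nodes in SPN / arithmetic-circuit notation.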

Learn Mixtures of PSDD Structures

State of the art

on 6 datasets!

Q: “Help! I need to learn a

discrete probability distribution…”

A: Learn mixture of PSDDs!

Strongly outperforms

• Bayesian network learners

• Markov network learners

Competitive with

• SPN learners

• Cutset network learners

Outline

• Adding knowledge to deep learning

• Probabilistic circuits

• Logistic circuits for image classification

What if I only want to classify Y?

Pr(Y, A, B, C, D)

What if we only want to learn a classifier Pr(Y | X)?

Input: a complete assignment to A, B, C, D.

Aggregate the parameters bottom-up; apply the logistic function to the final output:

Pr(Y = 1 | A, B, C, D) = 1 / (1 + exp(−1.9)) = 0.869

Logistic Circuits: Evaluation

Alternative View on Logistic Circuits

Represents Pr(Y | A, B, C, D)

• Take all "hot" wires

• Sum their weights

• Push through logistic function
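As a sketch of this view (my illustration; the weight list stands in for whichever wires are hot for a given input):

    import math

    def logistic_circuit_predict(hot_wire_weights):
        # Sum the weights of the wires that are "hot" for this input,
        # then push the total through the logistic (sigmoid) function.
        return 1.0 / (1.0 + math.exp(-sum(hot_wire_weights)))

    # If the hot wires sum to 1.9, this reproduces the example above:
    # logistic_circuit_predict([0.5, 1.4])  -> 1 / (1 + exp(-1.9)) ≈ 0.869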

Special Case: Logistic Regression

What about logistic circuits with more general structure?

Pr(Y = 1 | A, B, C, D) = 1 / (1 + exp(−A·θ_A − ¬A·θ_¬A − B·θ_B − ⋯))

Logistic Regression

Parameter Learning

Reduce to logistic regression:

Features associated with each wire

“Global Circuit Flow” features

Learning parameters θ is convex optimization!
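A sketch of that reduction (my illustration with dummy data; in practice the binary wire-flow features would be computed by passing each example through the circuit):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_examples, n_wires = 1000, 40
    # stand-in for the real "global circuit flow" features:
    # one binary feature per wire, 1 iff that wire is hot for the example
    flow_features = rng.integers(0, 2, size=(n_examples, n_wires))
    labels = rng.integers(0, 2, size=n_examples)

    clf = LogisticRegression(max_iter=1000).fit(flow_features, labels)
    wire_weights = clf.coef_.ravel()    # one learned circuit parameter per wire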

Structure Learning Primitive

Logistic Circuit Structure Learning (a loop):

• Generate candidate operations

• Calculate gradient variance

• Execute the best operation

Comparable Accuracy with Neural Nets

Significantly Smaller in Size

Better Data Efficiency

Logistic vs. Probabilistic Circuits

Probabilities become log-odds.

Probabilistic circuit: Pr(Y, A, B, C, D)

Logistic circuit: Pr(Y | A, B, C, D)

Interpretable?

Logistic Circuits: Conclusions

• Synthesis of symbolic AI and statistical

learning

• Discriminative counterparts of probabilistic

circuits

• Convex parameter learning

• Simple heuristic for structure learning

• Good performance

• Easy to interpret

Conclusions

[Figure: circuits sit at the intersection of statistical ML (“Probability”), symbolic AI (“Logic”), and connectionism (“Deep”).]
