Augmenting Probabilistic Graphical Models with Ontology Information:
Object Classification

R. Möller, Institute of Information Systems, University of Luebeck

Based on:
Large-Scale Object Recognition using Label Relation Graphs
Jia Deng¹,², Nan Ding², Yangqing Jia², Andrea Frome², Kevin Murphy², Samy Bengio², Yuan Li², Hartmut Neven², Hartwig Adam²
¹University of Michigan, ²Google
Object Classification
• Assign semantic labels to objects
[Image of a corgi puppy: Corgi ✔, Puppy ✔, Dog ✔, Cat ✖]
Object Classification
• Assign semantic labels to objects
Probabilities: Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1
Object Classification
• Assign semantic labels to objects
Image → Feature Extractor → Features → Classifier → Probabilities
Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1
Object Classification
• Multiclass classifier: Softmax
p_i = exp(s_i) / Σ_j exp(s_j)
Corgi 0.2, Puppy 0.4, Dog 0.3, Cat 0.1 (sums to 1)
Assumes mutually exclusive labels.
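A minimal NumPy sketch of this softmax step; the logits here are hypothetical, chosen so the probabilities land near the slide's 0.2 / 0.4 / 0.3 / 0.1:

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (labels compete)."""
    e = np.exp(scores - np.max(scores))  # shift by max for numerical stability
    return e / e.sum()

labels = ["Corgi", "Puppy", "Dog", "Cat"]
scores = np.array([0.0, 0.7, 0.4, -0.7])  # hypothetical logits
probs = softmax(scores)
print(dict(zip(labels, probs.round(2))))  # roughly Corgi 0.2, Puppy 0.4, Dog 0.3, Cat 0.1
```

Because the exponentials are normalized by their sum, raising one label's score necessarily lowers every other label's probability.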
• Independent binary classifiers: Logistic Regression
p_i = 1 / (1 + exp(-s_i)), one sigmoid per label
Corgi 0.4, Puppy 0.8, Dog 0.6, Cat 0.2
No assumptions about relations.
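The independent-sigmoid alternative, again with hypothetical logits picked to match the slide's 0.4 / 0.8 / 0.6 / 0.2:

```python
import numpy as np

def sigmoid(scores):
    """Per-label probability from an independent logistic unit; no normalization."""
    return 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))

labels = ["Corgi", "Puppy", "Dog", "Cat"]
scores = np.array([-0.4, 1.4, 0.4, -1.4])  # hypothetical logits
probs = sigmoid(scores)
print(dict(zip(labels, probs.round(2))))  # roughly Corgi 0.4, Puppy 0.8, Dog 0.6, Cat 0.2
print(probs.sum())  # about 2.0: the labels do not compete
```

Unlike softmax, nothing couples the outputs, so the model can happily assign high probability to contradictory labels such as Dog and Cat.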
Object labels have rich relations
• Hierarchical: Dog → Corgi, Dog → Puppy
• Exclusion: Dog – Cat
• Overlap: Corgi – Puppy
Softmax: all labels are mutually exclusive. Logistic Regression: all labels overlap.
Goal: A new classification model
• Respects real-world label relations
Probabilities (Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1) + label relation graph over Corgi, Puppy, Dog, Cat
Visual Model + Knowledge Graph
Visual Model: Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1
Knowledge Graph + Visual Model → Joint Inference
Assumption in this work: the knowledge graph is given and fixed.
Agenda
• Encoding prior knowledge (HEX graph)
• Classification model
• Efficient Exact Inference
Hierarchy and Exclusion (HEX) Graph
Example: hierarchy Dog → Corgi, Dog → Puppy; exclusion Dog – Cat
• Hierarchical edges (directed)
• Exclusion edges (undirected)
Examples of HEX graphs
• Mutually exclusive: Car, Bird, Dog, Cat (exclusion edges between all pairs)
• All overlapping: Round, Red, Shiny, Thick (no edges)
• Combination: Person → Male, Female, Child; Child → Boy, Girl; with exclusions such as Male – Female
State Space: Legal label configurations
Graph: hierarchy Dog → Corgi, Dog → Puppy; exclusion Dog – Cat. Each edge defines a constraint.

Dog  Cat  Corgi  Puppy
 0    0    0      0
 0    0    0      1
 0    0    1      0
 0    0    1      1
 1    0    0      0
 …
 1    1    0      0
 1    1    0      1
 …

Hierarchy: (dog, corgi) can't be (0, 1), ruling out rows such as (0, 0, 1, 0).
Exclusion: (dog, cat) can't be (1, 1), ruling out rows such as (1, 1, 0, 0).
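The legal-state table can be reproduced by brute force. The edge lists below encode the slide's example graph (hierarchy Dog → Corgi, Dog → Puppy; exclusion Dog – Cat); this is an illustrative sketch, not the paper's implementation:

```python
from itertools import product

labels = ["dog", "cat", "corgi", "puppy"]
hierarchy = [("dog", "corgi"), ("dog", "puppy")]  # (parent, child): child=1 implies parent=1
exclusion = [("dog", "cat")]                      # at most one of the pair may be 1

def is_legal(state):
    no_orphans = all(state[parent] >= state[child] for parent, child in hierarchy)
    no_clashes = all(state[a] + state[b] < 2 for a, b in exclusion)
    return no_orphans and no_clashes

states = [dict(zip(labels, bits)) for bits in product([0, 1], repeat=len(labels))]
legal = [s for s in states if is_legal(s)]
print(len(legal), "legal configurations out of", len(states))  # 6 of 16 survive
for s in legal:
    print(s)
```

Only 6 of the 16 bit vectors satisfy both constraints, which is exactly the pruning the HEX model exploits.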
Agenda
• Encoding prior knowledge (HEX graph)
• Classification model
• Efficient Exact Inference
HEX Classification Model
• Pairwise Conditional Random Field (CRF) over the binary label vector y ∈ {0, 1}^n, given input scores s = f(x):
  p(y | x) = (1 / Z(x)) ∏_i φ_i(y_i) ∏_{(i,j) ∈ E} ψ_{ij}(y_i, y_j)
• Unary: φ_i(y_i) = exp(s_i · y_i), the same as logistic regression
• Pairwise: ψ_{ij}(y_i, y_j) = 0 if (y_i, y_j) violates the constraint on edge (i, j), and 1 otherwise; illegal configurations get probability zero
• Partition function Z(x): sum of the unnormalized product over all (legal) configurations
• Probability of a single label: marginalize all other labels, p(y_i = 1 | x) = Σ_{y : y_i = 1} p(y | x)
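For a handful of labels the whole model can be evaluated by brute force over legal states. The scores and edges below are hypothetical; this is a sketch of the math above, not the paper's efficient inference:

```python
from itertools import product
from math import exp

labels = ["dog", "cat", "corgi", "puppy"]
scores = {"dog": 2.0, "cat": -2.0, "corgi": 2.0, "puppy": 1.5}  # hypothetical f(x)
hierarchy = [("dog", "corgi"), ("dog", "puppy")]
exclusion = [("dog", "cat")]

def is_legal(y):
    return (all(y[p] >= y[c] for p, c in hierarchy)
            and all(y[a] + y[b] < 2 for a, b in exclusion))

# Unnormalized weight of each legal configuration: exp(sum_i s_i * y_i)
weights = {bits: exp(sum(scores[l] * y for l, y in zip(labels, bits)))
           for bits in product([0, 1], repeat=len(labels))
           if is_legal(dict(zip(labels, bits)))}
Z = sum(weights.values())  # partition function: sum over legal states only

def marginal(label):
    """p(label = 1 | x): sum the weights of legal states with that bit set."""
    i = labels.index(label)
    return sum(w for bits, w in weights.items() if bits[i] == 1) / Z

print({l: round(marginal(l), 3) for l in labels})
```

Note that the constraints propagate automatically: the marginal for dog can never fall below the marginal for corgi, and dog and cat marginals can never sum above 1.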
Special Cases of HEX Model
• Softmax: all labels mutually exclusive (e.g. Car, Bird, Dog, Cat)
• Logistic Regressions: all labels overlapping, no edges (e.g. Round, Red, Shiny, Thick)
Learning
DNN → scores for Corgi, Puppy, Dog, Cat; training label: Dog
• Maximize the marginal probability of the observed labels: Dog = 1 is observed; Corgi, Puppy, and Cat are unobserved and marginalized out
• Back propagation through the DNN
DNN = Deep Neural Network
Agenda
• Encoding prior knowledge (HEX graph)
• Classification model
• Efficient Exact Inference
Naïve Exact Inference is Intractable
• Inference requires:
  – Computing the partition function
  – Performing marginalization
• HEX-CRF can be densely connected (large treewidth)
Observation 1: Exclusions are good
Example: mutually exclusive labels Car, Bird, Dog, Cat. The number of legal states is O(n), not O(2^n).
• Lots of exclusions → small state space → efficient inference
• Realistic graphs have lots of exclusions.
• Rigorous analysis in the paper.
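A quick sanity check of the O(n) claim for a fully exclusive graph (illustrative counting, not the paper's analysis):

```python
from itertools import product

def num_legal_mutually_exclusive(n):
    """With pairwise exclusion between all n labels, the legal states are
    exactly the bit vectors with at most one 1."""
    return sum(1 for bits in product([0, 1], repeat=n) if sum(bits) <= 1)

for n in range(1, 8):
    print(n, 2 ** n, num_legal_mutually_exclusive(n))  # legal count is n + 1
```

So while the raw state space doubles with every label, the legal state space only grows by one.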
Observation 2: Equivalent graphs
[Figure: two HEX graphs over Dog, Cat, Corgi, Puppy, Pembroke Welsh Corgi, and Cardigan Welsh Corgi with different edge sets but the same set of legal states]
Observation 2: Equivalent graphs
• Sparse equivalent: small treewidth, enables dynamic programming
• Dense equivalent: prunes states, enables brute force
[Figure: the same HEX graph over Dog, Cat, Corgi, Puppy, Pembroke Welsh Corgi, and Cardigan Welsh Corgi shown with sparse and dense equivalent edge sets]
HEX Graph Inference
[Figure: a HEX graph over nodes A–G, its sparsified and densified equivalents, and the resulting junction tree with cliques such as {A, B, F}, {B, C, F}, and {B, D, E}]
1. Sparsify (offline)
2. Build junction tree (offline)
3. Densify (offline)
4. Prune clique states (offline)
5. Message passing on legal states (online)
Digression: Polytrees
• A network is singly connected (a polytree) if it contains no undirected loops.
Theorem: Inference in a singly connected network can be done in linear time*.
Main idea: in variable elimination, we need only maintain distributions over single nodes.
* in network size, including table sizes.
© Jack Breese (Microsoft) & Daphne Koller (Stanford)
The problem with loops

Network: Cloudy → Rain, Cloudy → Sprinkler, and Rain, Sprinkler → Grass-wet (a deterministic OR: the grass is wet iff it rains or the sprinkler runs).

P(c) = 0.5
P(r | c) = 0.99,  P(r | ¬c) = 0.01
P(s | c) = 0.01,  P(s | ¬c) = 0.99

The grass is dry only if no rain and no sprinklers:
P(¬g) = P(¬r, ¬s) ≈ 0
The problem with loops contd.

P(¬g) = P(¬g | r, s) P(r, s) + P(¬g | r, ¬s) P(r, ¬s)
      + P(¬g | ¬r, s) P(¬r, s) + P(¬g | ¬r, ¬s) P(¬r, ¬s)

The first three conditionals are 0 and the last is 1, so P(¬g) = P(¬r, ¬s) ≈ 0.

Treating Rain and Sprinkler as independent instead gives the wrong answer:
P(¬r) P(¬s) ≈ 0.5 · 0.5 = 0.25, which is the problem with loops.
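The slide's numbers can be checked directly. A minimal sketch, with the CPTs as given above:

```python
# Cloudy -> Rain, Cloudy -> Sprinkler; grass is wet iff rain or sprinkler
p_c = 0.5
p_r = {True: 0.99, False: 0.01}  # P(rain | cloudy)
p_s = {True: 0.01, False: 0.99}  # P(sprinkler | cloudy)

def p_cloudy(c):
    return p_c if c else 1 - p_c

# Correct: Rain and Sprinkler are dependent through Cloudy, so sum it out jointly
p_dry = sum(p_cloudy(c) * (1 - p_r[c]) * (1 - p_s[c]) for c in (True, False))

# Wrong: marginalize Cloudy separately, then multiply as if independent
p_no_rain = sum(p_cloudy(c) * (1 - p_r[c]) for c in (True, False))
p_no_sprinkler = sum(p_cloudy(c) * (1 - p_s[c]) for c in (True, False))
p_dry_wrong = p_no_rain * p_no_sprinkler

print(p_dry)        # ≈ 0.0099, essentially zero
print(p_dry_wrong)  # ≈ 0.25
```

The correlation induced by the loop through Cloudy is exactly what the independence shortcut throws away, inflating the answer by a factor of about 25.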
Variable elimination

Chain A → B → C:
P(c) = Σ_b P(c | b) Σ_a P(b | a) P(a)

P(A) × P(B | A) = P(B, A);  sum out A → P(B)
P(B) × P(C | B) = P(C, B);  sum out B → P(C)

© Jack Breese (Microsoft) & Daphne Koller (Stanford)
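The two elimination steps on the chain can be written out with NumPy; the CPT numbers are hypothetical:

```python
import numpy as np

# Chain A -> B -> C, all variables binary
p_a = np.array([0.6, 0.4])               # P(A)
p_b_given_a = np.array([[0.9, 0.1],      # P(B | A=0)
                        [0.2, 0.8]])     # P(B | A=1)
p_c_given_b = np.array([[0.7, 0.3],      # P(C | B=0)
                        [0.1, 0.9]])     # P(C | B=1)

# Eliminate A: multiply P(A) into P(B | A), then sum out A
p_b = (p_a[:, None] * p_b_given_a).sum(axis=0)   # -> P(B), ≈ [0.62, 0.38]
# Eliminate B: multiply P(B) into P(C | B), then sum out B
p_c = (p_b[:, None] * p_c_given_b).sum(axis=0)   # -> P(C), ≈ [0.472, 0.528]

print(p_b, p_c)
```

At every step only a table over at most two variables is materialized, which is why inference on a chain (or any polytree) stays linear in the network size.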
Inference as variable elimination
• A factor over X is a function from val(X) to numbers in [0, 1]:
  – A CPT is a factor
  – A joint distribution is also a factor
• BN inference:
  – factors are multiplied to give new ones
  – variables in factors are summed out
• A variable can be summed out as soon as all factors mentioning it have been multiplied.
Variable Elimination with loops

Network: Age, Gender → Smoking; Age → Exposure to Toxics; Exposure to Toxics, Smoking → Cancer; Cancer → Serum Calcium, Lung Tumor

P(A) × P(G) × P(S | A, G) = P(A, G, S);  sum out G → P(A, S)
P(A, S) × P(E | A) = P(A, E, S);  sum out A → P(E, S)
P(E, S) × P(C | E, S) = P(E, S, C);  sum out E, S → P(C)
P(C) × P(L | C) = P(C, L);  sum out C → P(L)

Complexity is exponential in the size of the intermediate factors.
Join trees*

A join tree is a partially precompiled factorization: each clique stores the product of the CPTs assigned to it, e.g. clique {A, G, S} stores P(A) × P(G) × P(S | A, G).

Cliques for the network above: {A, G, S} – {A, E, S} – {E, S, C} – {C, S-C}, {C, L}
(A = Age, G = Gender, S = Smoking, E = Exposure to Toxics, C = Cancer, S-C = Serum Calcium, L = Lung Tumor)

* aka Junction Tree, Lauritzen-Spiegelhalter, or Hugin algorithm, …
Junction Tree Algorithm
• Converts a Bayes Net into an undirected tree
  – Joint probability remains unchanged
  – Exact marginals can be computed
• Why?
  – Uniform treatment of Bayes Nets and MRFs
  – Efficient inference is possible for undirected trees