A Comparison Between Bayesian Networks and Generalized Linear Models in the Indoor/Outdoor Scene Classification Problem


Overview

- Introduce Scene Classification Problems
- Motivation for Scene Classification
- Kodak's JBJL Database and Features
- Bayesian Networks
  - Brief Overview (description, inference, structure learning)
  - Classification Results
- GLM
  - Briefer Overview
  - Classification Results
- Comparison and Conclusion


Problem Statement: Given a set of consumer digital images, can we use a statistical model to distinguish between indoor images and outdoor images?

Motivation

- Kodak
  - Increase visual appeal by processing based on classification
- Object Recognition
  - Provide context information which may give clues to scale, location, identity, etc.

Procedure

- Establish ground truth for all images
- Perform feature extraction and confidence/probability mapping for features
- Divide images into training and testing sets
- Use the training images to train a model to predict ground truth
- Use the model to predict ground truth for the test set
- Evaluate performance
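The procedure can be sketched end to end in R. This is a minimal illustration on synthetic stand-in data, not the JBJL images: the data frame `images`, the single feature `sky`, the 50/50 split, and the 0.5 decision threshold are all assumptions made for the example.

```r
set.seed(42)

# Hypothetical stand-in data: a ground-truth label and one binary feature.
n <- 400
label <- factor(sample(c("indoor", "outdoor"), n, replace = TRUE))
sky <- ifelse(label == "outdoor", rbinom(n, 1, 0.8), rbinom(n, 1, 0.1))
images <- data.frame(label = label, sky = sky)

# Divide images into training and testing sets (50/50 is an assumption).
train_idx <- sample(n, n / 2)
training <- images[train_idx, ]
testing <- images[-train_idx, ]

# Train on the training set only.
fit <- glm(label ~ sky, family = binomial(link = logit), data = training)

# Predict the held-out set; the second factor level ("outdoor") is the modeled event.
p_outdoor <- predict(fit, newdata = testing, type = "response")
predicted <- factor(ifelse(p_outdoor > 0.5, "outdoor", "indoor"),
                    levels = levels(images$label))

# Evaluate performance with a confusion table and overall accuracy.
confusion <- table(truth = testing$label, predicted = predicted)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

The same split-fit-predict-evaluate pattern applies to either model; only the fitting and prediction calls change.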

Kodak JBJL

- Consumer image database
- 615 indoor and 693 outdoor images
- Some images are difficult for HSV to determine whether they are indoor or outdoor
- Some images have indoor and outdoor parts

Features and Probability Mapping

- “Low-level” Features
  - Ohta-space color histogram (color information)
  - MSAR model (texture information)
- “Mid-level” Features
  - Grass classifier
  - Sky classifier
- K-NN Used to Extract Probs from Features
  - Quantized to nearest 10% (11 states for Mid-level, 3 states for Low-level)
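A rough R sketch of the probability mapping, under stated assumptions: the k-NN probability is taken as the fraction of the k nearest training points that carry the label, the toy vectors `train_x` and `train_y` are hypothetical, and the cut points used to reduce a probability to the three low-level states (0.1, 0.5, 0.9) are a guess, since the slides do not give them.

```r
# Hypothetical 1-D feature values with known labels (1 = concept present).
train_x <- c(0.10, 0.15, 0.20, 0.30, 0.70, 0.80, 0.85, 0.90, 0.95, 1.00)
train_y <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# k-NN probability: fraction of the k nearest training points labeled 1.
knn_prob <- function(x, k = 5) {
  nearest <- order(abs(train_x - x))[seq_len(k)]
  mean(train_y[nearest])
}

# Mid-level features: quantize to the nearest 10% (11 states: 0, 0.1, ..., 1).
quantize_mid <- function(p) round(p * 10) / 10

# Low-level features: 3 states; the cut points 1/3 and 2/3 are an assumption.
quantize_low <- function(p) c(0.1, 0.5, 0.9)[findInterval(p, c(1/3, 2/3)) + 1]

p <- knn_prob(0.75)
print(c(raw = p, mid = quantize_mid(p), low = quantize_low(p)))
```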

Feature Probs and Classes

> table( iocTable$indoor.outdoor, iocTable$color )
          0.1 0.5 0.9
  indoor  415  13 187
  outdoor 523  13 157

> table( iocTable$indoor.outdoor, iocTable$texture )
          0.1 0.5 0.9
  indoor  509   1 105
  outdoor 567   2 124

> table( iocTable$indoor.outdoor, iocTable$bluesky )
            0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
  indoor  566  20  10   0   3   3   0   0   0   1   7
  outdoor  91  14   6   0   3   7   0   0   4  10 535

> table( iocTable$indoor.outdoor, iocTable$grass )
            0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
  indoor  576   0   1   0   3   3   0   0   9  11   2
  outdoor 100 116  49   0  23  14   0   0   1   0 352

Stat. Model 1: Bayesian Network

- Graphical Model
  - Variables are represented by vertices of a graph
  - Conditional relationships are represented by directed edges
- Conditional Probability table associated with each vertex
  - Quantifies vertex relationships
  - Facilitates automated inference

Exact Inference

- Model Joint Probability
- Inference
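The equations on this slide did not survive extraction. As a hedged reminder, the standard textbook forms are given here; the notation (variables X_1, ..., X_n, parent sets Pa(X_i), class variable C, evidence e) is assumed for illustration and is not taken from the slide. The joint probability factorizes over the graph, and exact inference for the class conditions on the evidence and normalizes:

```latex
\[
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
\[
P(C = c \mid e) = \frac{P(C = c,\, e)}{\sum_{c'} P(C = c',\, e)}
\]
```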

Structure Learning Search Space

- Space of BNs
- Variable-State Combinations
  - Product of the #States per Node over all Nodes
  - 2178 possible
- Structures
  - Limited to DAGs
  - 29281 possible
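Both counts can be reproduced with a short R sketch. It assumes five nodes (the class plus the four features) with state counts 2, 3, 3, 11, 11, which is inferred from the earlier slides rather than stated on this one, and it counts labeled DAGs with Robinson's recurrence.

```r
# Joint variable-state combinations: product of the per-node state counts.
# Assumed nodes: class (2 states), color and texture (3 each), bluesky and grass (11 each).
n_states <- c(class = 2, color = 3, texture = 3, bluesky = 11, grass = 11)
n_configs <- prod(n_states)

# Number of labeled DAGs on n nodes via Robinson's recurrence:
# a(m) = sum_{k=1}^{m} (-1)^(k+1) * choose(m, k) * 2^(k * (m - k)) * a(m - k), a(0) = 1.
count_dags <- function(n) {
  a <- numeric(n + 1)   # a[i + 1] holds the count for i nodes
  a[1] <- 1             # zero nodes: one (empty) graph
  for (m in seq_len(n)) {
    k <- seq_len(m)
    a[m + 1] <- sum((-1)^(k + 1) * choose(m, k) * 2^(k * (m - k)) * a[m - k + 1])
  }
  a[n + 1]
}

n_dags <- count_dags(length(n_states))
print(c(configs = n_configs, dags = n_dags))
```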

Scoring Metric

- Score a structure based on how well the structure models the data
- We do have an expression for the probability of the data given the structure
- Unfortunately, the probability of the data itself is difficult to estimate
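Assuming the score is the posterior probability of a structure S given data D (symbols chosen here for illustration; the slide's own equation was not recoverable), Bayes' rule shows where each quantity enters. The likelihood P(D | S) has a closed-form expression, while the normalizer P(D) would require a sum over every candidate structure:

```latex
\[
P(S \mid D) = \frac{P(D \mid S)\, P(S)}{P(D)},
\qquad
P(D) = \sum_{S'} P(D \mid S')\, P(S')
\]
```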

The Bayes Dirichlet Likelihood Equivalent (BDe)

- Can compare structures 2 at a time
- What is the prior on structure?
  - Assume all structures are equally likely
  - Use #edges to penalize complex networks
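A sketch of why pairwise comparison works, using assumed symbols S_1, S_2 for two structures and D for the data: in the ratio of posteriors the intractable P(D) cancels, leaving only likelihoods and priors. With equal priors the ratio reduces to the likelihood ratio. The edge-penalizing prior shown second is one possible form, given here as an assumption, with 0 < kappa <= 1 and |E(S)| the number of edges in S.

```latex
\[
\frac{P(S_1 \mid D)}{P(S_2 \mid D)}
  = \frac{P(D \mid S_1)\, P(S_1)}{P(D \mid S_2)\, P(S_2)}
\]
\[
P(S) \propto \kappa^{|E(S)|}
\]
```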

Challenges

- Not all structures can be considered if there is only a small amount of data
  - Context dilution
  - Can't consider cases where the CPT cannot be filled in
- Finding an optimal structure is NP-hard

BDe Structure For I/O Classification

- Greedy algorithm with BDe scoring
- Naïve Bayes Model!
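For a naïve Bayes structure, fitting and inference reduce to counting. The R sketch below is illustrative only: the toy data frame `train`, the helper names `fit_nb` and `posterior_nb`, and the add-one (Laplace) smoothing are assumptions, not the authors' implementation.

```r
# Toy data: hypothetical quantized feature states, not the JBJL data.
train <- data.frame(
  class   = factor(c("indoor", "indoor", "indoor", "outdoor", "outdoor", "outdoor")),
  bluesky = factor(c("0", "0", "0.1", "1", "1", "0"), levels = c("0", "0.1", "1")),
  grass   = factor(c("0", "0", "0", "1", "0", "1"), levels = c("0", "1"))
)

# Fit: class prior plus one smoothed CPT P(feature | class) per feature.
fit_nb <- function(data, class_col, alpha = 1) {
  features <- setdiff(names(data), class_col)
  prior <- prop.table(table(data[[class_col]]))
  cpts <- lapply(features, function(f) {
    counts <- table(data[[class_col]], data[[f]]) + alpha  # Laplace smoothing
    prop.table(counts, margin = 1)                         # each row sums to 1
  })
  names(cpts) <- features
  list(prior = prior, cpts = cpts)
}

# Inference: P(class | evidence) is proportional to P(class) * prod_f P(f | class).
posterior_nb <- function(model, evidence) {
  joint <- as.numeric(model$prior)
  names(joint) <- names(model$prior)
  for (f in names(evidence)) {
    joint <- joint * model$cpts[[f]][names(joint), evidence[[f]]]
  }
  joint / sum(joint)
}

model <- fit_nb(train, "class")
post <- posterior_nb(model, list(bluesky = "1", grass = "1"))
print(round(post, 3))
```

Smoothing keeps every CPT entry nonzero, which sidesteps the unfilled-CPT problem when a state never occurs with a class in the training set.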

Result Compared to Previous

Previous Results: Indoor vs Outdoor Classification using Computed Semantic Features (Expert Opinion)

           Correct  Incorrect  Percent Correct
  Indoor       519         96            84.4%
  Outdoor      589        104            85.0%
  Overall     1108        200            84.7%

Our Results: Indoor vs Outdoor Classification using Computed Semantic Features (Model Selection)

           Correct  Incorrect  Percent Correct
  Indoor       288          9            97.0%
  Outdoor      350          7            98.0%
  Overall      638         16            97.3%

Misclassified: Inferred Outdoor

Misclassified: Inferred Indoor

Generalized Linear Model

- Outdoor and Indoor can be thought of as a binary output
- Logit kernel
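For reference, the standard logit form, with assumed notation: p is the probability that an image is outdoor, x_j are the feature values, and beta_j are the coefficients to be estimated.

```latex
\[
\log\frac{p}{1 - p} = \beta_0 + \sum_{j} \beta_j x_j
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + \exp\bigl(-(\beta_0 + \sum_{j} \beta_j x_j)\bigr)}
\]
```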

Likelihood for GLM

- Newton-Raphson
  - Get estimates of mean and variance (1st and 2nd derivative)
  - Find optimal based on estimates (Taylor Expansion)
  - Iterate
- Generally, this quickly converges to the optimal solution
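The iteration can be sketched by hand in R for a logistic model and checked against `glm`. The data here are synthetic and the variable names are hypothetical; this is a plain Newton-Raphson loop written for illustration, not a description of how `glm` is implemented internally.

```r
set.seed(1)

# Synthetic data (hypothetical): one predictor, binary response.
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
X <- cbind(1, x)                            # design matrix with intercept column

beta <- c(0, 0)                             # starting value
for (iter in 1:25) {
  p <- as.vector(plogis(X %*% beta))        # fitted probabilities (the mean)
  gradient <- t(X) %*% (y - p)              # 1st derivative of the log-likelihood
  hessian <- -t(X) %*% (X * (p * (1 - p)))  # 2nd derivative, weighted by the variance p(1 - p)
  step <- solve(hessian, gradient)
  beta <- as.vector(beta - step)            # Newton-Raphson update
  if (max(abs(step)) < 1e-10) break
}

fit <- glm(y ~ x, family = binomial(link = logit))
print(rbind(newton = beta, glm = unname(coef(fit))))
```

On well-behaved data the loop typically stops after a handful of iterations. Very large standard errors in a fitted model are a common sign that a coefficient is drifting toward infinity because a factor level almost perfectly predicts the outcome.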

glm(formula = outdoorCounts ~ color + texture + bluesky + grass,
    family = binomial(link = logit), data = trainingTable)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-2.4827352  -0.2137121   0.0004311   0.1292940   2.7534686

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.7680     0.4350  -8.663  < 2e-16 ***
color0.9      -2.0746     0.5622  -3.690 0.000224 ***
color0.5       1.4171     1.1127   1.274 0.202818
texture0.9     0.5966     0.5800   1.029 0.303678
texture0.5     9.7881  2399.5448   0.004 0.996745
bluesky0.1     2.4976     0.7470   3.343 0.000827 ***
bluesky0.2     2.2192     0.9739   2.279 0.022688 *
bluesky0.4     3.7680     1.4796   2.547 0.010877 *
bluesky0.5     3.9168     1.3167   2.975 0.002932 **
bluesky0.8    20.2739  1676.6934   0.012 0.990353
bluesky0.9     1.2633     1.7006   0.743 0.457559
bluesky1       6.8030     0.6274  10.843  < 2e-16 ***
grass          5.8175     0.6398   9.093  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Side by Side Comparison

GLM

           Correct Prediction  Incorrect Prediction  Percent Correct
  Indoor                  289                    14             95.4
  Outdoor                 346                     6             98.3
  Total                   635                    20             97

BN

           Correct Prediction  Incorrect Prediction  Percent Correct
  Indoor                  288                     9             97.0
  Outdoor                 350                     7             98.0
  Total                   638                    16             97.3
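One rough way to ask whether the overall totals differ is Fisher's exact test on the correct/incorrect counts. This sketch treats the two test sets as independent samples, which is an assumption; if both models were scored on the same images, a paired test on per-image predictions (not available in the slides) would be more appropriate.

```r
# Overall test-set counts taken from the GLM and BN totals in the comparison tables.
counts <- matrix(c(635, 20,
                   638, 16),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(model = c("GLM", "BN"),
                                 outcome = c("correct", "incorrect")))

# A large p-value would mean the counts give no evidence of a difference.
result <- fisher.test(counts)
print(counts)
print(result$p.value)
```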


Misclassified: Predicted Outdoor


Misclassified: Predicted Indoor

Conclusion

- The newer Bayesian Network model may perform classification slightly better than GLM
  - BN is more computationally intensive
  - Unclear if there is in fact a difference
  - Both models have difficulty with the same images
- Better to introduce new data than to use a new model
  - A new model gives (at most) marginal improvement



Data Given Model Prob
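The formula on this slide was not recoverable. For orientation, the standard Bayesian Dirichlet marginal likelihood is shown below as an assumption about what the slide contained. The notation is introduced here: node i has r_i states and q_i parent configurations, N_ijk counts the cases where node i is in state k with parents in configuration j, and alpha_ijk are the Dirichlet prior counts, with N_ij and alpha_ij their sums over k.

```latex
\[
P(D \mid S) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
  \prod_{k=1}^{r_i}
  \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})},
\qquad
\alpha_{ij} = \sum_{k} \alpha_{ijk},\quad
N_{ij} = \sum_{k} N_{ijk}
\]
```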