A Comparison Between Bayesian Networks and Generalized Linear Models in the Indoor/Outdoor Scene Classification Problem
Overview
- Introduce scene classification problems
- Motivation for scene classification
- Kodak's JBJL database and features
- Bayesian networks: brief overview (description, inference, structure learning); classification results
- GLM: briefer overview; classification results
- Comparison and conclusion
Problem Statement: Given a set of consumer digital images, can we use a statistical model to distinguish between indoor images and outdoor images?
Motivation
- Kodak: increase visual appeal by processing images based on their classification
- Object recognition: provide context information which may give clues to scale, location, identity, etc.
Procedure
- Establish ground truth for all images
- Perform feature extraction and confidence/probability mapping for the features
- Divide the images into a training set and a test set
- Use the training images to fit a model that predicts ground truth
- Use the model to predict ground truth for the test set
- Evaluate performance
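The procedure above can be outlined in code; a minimal sketch, with all names hypothetical (this is not the authors' implementation):

```python
import random

def split_train_test(items, train_frac=0.7, seed=0):
    """Shuffle labeled (features, label) items and split into train/test sets."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def evaluate(model_predict, test_set):
    """Fraction of test items whose predicted label matches ground truth."""
    correct = sum(1 for feats, label in test_set if model_predict(feats) == label)
    return correct / len(test_set)
```

Any classifier exposing a `model_predict(features) -> label` callable can be scored this way.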
Kodak JBJL
- Consumer image database: 615 indoor and 693 outdoor images
- Some images are difficult even for the human visual system (HVS) to classify as indoor or outdoor
- Some images have both indoor and outdoor parts
Features and Probability Mapping
- "Low-level" features: Ohta-space color histogram (color information); MSAR model (texture information)
- "Mid-level" features: grass classifier; sky classifier
- k-NN is used to extract probabilities from the features, quantized to the nearest 10% (11 states for the mid-level features, 3 states for the low-level features)
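The quantization step can be illustrated with a small sketch; `quantize_prob` is a hypothetical name, and the state sets below are read off the contingency tables in this deck:

```python
# Allowed quantized states for each feature family.
MID_LEVEL_STATES = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0 (grass, sky)
LOW_LEVEL_STATES = [0.1, 0.5, 0.9]               # color, texture

def quantize_prob(p, states):
    """Map a raw k-NN probability estimate in [0, 1] to the nearest allowed state."""
    return min(states, key=lambda s: abs(s - p))
```

For example, a raw estimate of 0.37 maps to 0.4 for a mid-level feature but to 0.5 for a low-level one.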
Feature Probs and Classes
> table( iocTable$indoor.outdoor, iocTable$color )
           0.1 0.5 0.9
  indoor   415  13 187
  outdoor  523  13 157

> table( iocTable$indoor.outdoor, iocTable$texture )
           0.1 0.5 0.9
  indoor   509   1 105
  outdoor  567   2 124

> table( iocTable$indoor.outdoor, iocTable$bluesky )
             0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
  indoor   566  20  10   0   3   3   0   0   0   1   7
  outdoor   91  14   6   0   3   7   0   0   4  10 535

> table( iocTable$indoor.outdoor, iocTable$grass )
             0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
  indoor   576   0   1   0   3   3   0   0   9  11   2
  outdoor  100 116  49   0  23  14   0   0   1   0 352
Stat. Model 1: Bayesian Network
- Graphical model: variables are represented by the vertices of a graph
- Conditional relationships are represented by directed edges
- A conditional probability table (CPT) is associated with each vertex
- The CPTs quantify the relationships between vertices and facilitate automated inference
Exact Inference
- The model joint probability factors over the graph: P(X1, ..., Xn) = Π_i P(Xi | parents(Xi))
- Inference: compute the posterior of a query variable given evidence by summing the joint over the remaining variables
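The factorization and inference steps can be sketched as exact inference by enumeration; the toy two-node network and all names below are hypothetical illustrations, not the classifier used in the deck:

```python
import itertools

def joint_prob(assign, cpts, parents):
    """P(x1, ..., xn) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, table in cpts.items():
        key = tuple(assign[pa] for pa in parents[node])
        p *= table[key][assign[node]]
    return p

def infer(query_var, evidence, cpts, parents, domains):
    """Exact inference by enumeration: P(query_var | evidence)."""
    hidden = [v for v in cpts if v != query_var and v not in evidence]
    scores = {}
    for q in domains[query_var]:
        total = 0.0
        for combo in itertools.product(*(domains[h] for h in hidden)):
            assign = dict(evidence, **dict(zip(hidden, combo)), **{query_var: q})
            total += joint_prob(assign, cpts, parents)
        scores[q] = total
    z = sum(scores.values())
    return {q: s / z for q, s in scores.items()}

# Toy network (made-up numbers): class C -> sky feature F.
parents = {"C": [], "F": ["C"]}
domains = {"C": ["indoor", "outdoor"], "F": ["sky", "no_sky"]}
cpts = {
    "C": {(): {"indoor": 0.5, "outdoor": 0.5}},
    "F": {("indoor",): {"sky": 0.1, "no_sky": 0.9},
          ("outdoor",): {"sky": 0.8, "no_sky": 0.2}},
}
# P(indoor | sky) = 0.5*0.1 / (0.5*0.1 + 0.5*0.8) = 1/9
posterior = infer("C", {"F": "sky"}, cpts, parents, domains)
```

Enumeration is exponential in the number of hidden variables, which is why the slide calls it "exact" rather than fast.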
Structure Learning: Search Space
- Variable-state combinations: (#states per node) multiplied across the nodes = 2 × 3 × 3 × 11 × 11 = 2178 possible combinations
- Structures are limited to DAGs: 29,281 possible DAGs on 5 nodes
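Both counts can be reproduced; `num_dags` below uses Robinson's recurrence for the number of labeled DAGs, a standard combinatorial result rather than something derived in this deck:

```python
from math import comb

def num_dags(n):
    """Robinson's recurrence: a(n) = sum_{k=1..n} (-1)^(k+1) C(n,k) 2^{k(n-k)} a(n-k)."""
    a = [1]  # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

# State counts for the five I/O nodes:
# indoor/outdoor (2), color (3), texture (3), bluesky (11), grass (11)
state_combos = 2 * 3 * 3 * 11 * 11
```

`num_dags(5)` gives 29281 and `state_combos` gives 2178, matching the slide.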
Scoring Metric
- Score a structure based on how well it models the data
- We do have an expression for the probability of the data given the structure
- Unfortunately, this data probability is difficult to estimate directly
- The Bayesian Dirichlet likelihood-equivalent (BDe) metric: can compare structures two at a time
- What is the prior on structures? Either assume all structures are equally likely, or use the number of edges to penalize complex networks
Challenges
- Not all structures can be considered when only a small amount of data is available: context dilution rules out cases where a CPT cannot be filled in
- Finding an optimal structure is NP-hard
BDe Structure for I/O Classification
- A greedy search with BDe scoring selects... the naïve Bayes model!
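The selected naïve Bayes structure (class node pointing to each feature, features conditionally independent given the class) can be sketched directly from the count tables shown earlier. A minimal, unsmoothed illustration with hypothetical function names; the counts are copied from the bluesky table above:

```python
# Counts from the bluesky contingency table (class x quantized feature state).
bluesky_counts = {
    "indoor":  {0: 566, 0.1: 20, 0.2: 10, 0.4: 3, 0.5: 3, 0.9: 1, 1: 7},
    "outdoor": {0: 91, 0.1: 14, 0.2: 6, 0.4: 3, 0.5: 7, 0.8: 4, 0.9: 10, 1: 535},
}
class_counts = {"indoor": 615, "outdoor": 693}

def naive_bayes_posterior(feature_tables, observed):
    """P(class | features) for a naive Bayes structure: class -> each feature."""
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / sum(class_counts.values())       # prior P(class)
        for table, value in zip(feature_tables, observed):
            p *= table[c].get(value, 0) / n_c      # P(feature = value | class)
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}
```

Note that unseen (class, state) pairs get probability zero here; a real implementation would smooth the counts.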
Result Compared to Previous

Previous results: Indoor vs. Outdoor Classification using Computed Semantic Features (Expert Opinion)
          Correct  Incorrect  Percent Correct
Indoor    519      96         84.4%
Outdoor   589      104        85.0%
Overall   1108     200        84.7%

Our results: Indoor vs. Outdoor Classification using Computed Semantic Features (Model Selection)
          Correct  Incorrect  Percent Correct
Indoor    288      9          97.0%
Outdoor   350      7          98.0%
Overall   638      16         97.3%
Misclassified: Inferred Outdoor
Misclassified: Inferred Indoor
Generalized Linear Model
- Outdoor vs. indoor can be treated as a binary outcome
- Logit link: logit(p) = log(p / (1 - p)) = Xβ
- Likelihood for the GLM: the binomial likelihood of the observed labels

Newton-Raphson
- Get estimates of the mean and variance (first and second derivatives of the log-likelihood)
- Find the optimum based on these estimates (Taylor expansion)
- Iterate
- Generally, this converges quickly to the optimal solution
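The Newton-Raphson iteration can be sketched for a one-predictor logit model; a minimal pure-Python illustration of the idea, not the R fitting code used in this deck:

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson for a logit-link GLM with one predictor:
    logit(p_i) = b0 + b1 * x_i.  Each step solves H * delta = gradient,
    i.e. maximizes a second-order Taylor expansion of the log-likelihood
    around the current estimate."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)              # Bernoulli variance (2nd-derivative weight)
            g0 += y - p                  # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                     # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step via 2x2 inverse
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With a binary predictor the model is saturated, so the fit reproduces the empirical group proportions exactly, and the iteration converges in a handful of steps.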
glm(formula = outdoorCounts ~ color + texture + bluesky + grass,
    family = binomial(link = logit), data = trainingTable)

Deviance Residuals:
       Min         1Q     Median         3Q        Max
-2.4827352 -0.2137121  0.0004311  0.1292940  2.7534686

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.7680     0.4350  -8.663  < 2e-16 ***
color0.9      -2.0746     0.5622  -3.690 0.000224 ***
color0.5       1.4171     1.1127   1.274 0.202818
texture0.9     0.5966     0.5800   1.029 0.303678
texture0.5     9.7881  2399.5448   0.004 0.996745
bluesky0.1     2.4976     0.7470   3.343 0.000827 ***
bluesky0.2     2.2192     0.9739   2.279 0.022688 *
bluesky0.4     3.7680     1.4796   2.547 0.010877 *
bluesky0.5     3.9168     1.3167   2.975 0.002932 **
bluesky0.8    20.2739  1676.6934   0.012 0.990353
bluesky0.9     1.2633     1.7006   0.743 0.457559
bluesky1       6.8030     0.6274  10.843  < 2e-16 ***
grass          5.8175     0.6398   9.093  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Side by Side Comparison

GLM
          Correct Prediction  Incorrect Prediction  Percent Correct
Indoor    289                 14                    95.4
Outdoor   346                 6                     98.3
Total     635                 20                    97.0

BN
          Correct Prediction  Incorrect Prediction  Percent Correct
Indoor    288                 9                     97.0
Outdoor   350                 7                     98.0
Total     638                 16                    97.3
Misclassified: Predicted Outdoor
Misclassified: Predicted Indoor
Conclusion
- The Bayesian network model may perform classification slightly better than the GLM
- The BN is more computationally intensive
- It is unclear whether there is in fact a difference
- Both models have difficulty with the same images
- It is better to introduce new data than to use a new model: a new model gives (at most) a marginal improvement
References
- Heckerman, D. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed. MIT Press, Cambridge, MA, 1999.
- Murphy, K. A Brief Introduction to Graphical Models and Bayesian Networks. http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html (viewed 4/1/08)
- Lehmann, E.L. and Casella, G. Theory of Point Estimation (2nd edition)
- Weisberg, S. Applied Linear Regression (3rd edition)