Wei Zhang
Akshat Surve
Xiaoli Fern
Thomas Dietterich
Learning Non-Redundant Codebooks for Classifying
Complex Objects
Contents
Learning codebooks for object classification
Learning non-redundant codebooks
  Framework
  Boost-Resampling algorithm
  Boost-Reweighting algorithm
Experiments
Conclusions and future work
Problem 1: Stonefly Recognition
Cal
Dor
Hes
Iso
Mos
Pte
Swe
Yor
Zap
Visual Codebook for Object Recognition
[Figure: Interest Region Detector → Region Descriptors → Visual Codebook → Image Attribute Vector (Term Frequency) → Classifier. The same pipeline is applied to training and testing images.]
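A minimal sketch of the last two pipeline stages, assuming the codebook has already been learned and that each region descriptor is assigned to its nearest codeword under squared Euclidean distance. The function names and the toy 2-D descriptors are hypothetical; real region descriptors are typically much higher-dimensional.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def term_frequency_vector(descriptors, codebook):
    """Count how many of an image's region descriptors map to each codeword."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    return counts


# Toy codebook with three codewords and one image with four descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (4.7, 5.3)]
tf_vector = term_frequency_vector(image_descriptors, codebook)
```

Every image yields a vector of the same length (the codebook size), regardless of how many interest regions were detected, which is what lets a standard classifier consume it.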
Problem 2: Document Classification
Variable-length document:
Through the first half of the 20th century, most of the scientific community believed dinosaurs to have been slow, unintelligent cold-blooded animals. Most research conducted since the 1970s, however, has supported the view that dinosaurs were active animals with elevated metabolisms and numerous adaptations for social interaction. The resulting transformation in the scientific understanding of dinosaurs has gradually filtered …
Fixed-length bag-of-words:
absent: 0 … active: 1 … animal: 2 … believe: 1 … dinosaur: 3 … social: 1 …
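A minimal sketch of turning a variable-length document into a fixed-length bag-of-words vector, assuming a fixed vocabulary and plain lowercase tokenization. Unlike the slide, it does no stemming, so the vocabulary lists surface forms such as "animals"; the function name and the vocabulary are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vocabulary = ["absent", "active", "animals", "dinosaurs", "social"]
text = ("Most research has supported the view that dinosaurs were active "
        "animals with numerous adaptations for social interaction.")
vector = bag_of_words(text, vocabulary)
```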
Codebook for Document Classification
Cluster the words to form code-words.
[Figure: the words of the training corpus are clustered into a codebook of K clusters:
cluster 1: dog, canine, hound, ...
cluster 2: car, automobile, vehicle, …
… cluster K
An input document is mapped through the codebook to a vector of code-word counts (20, 1, …, 0, 2), which is passed to the classifier.]
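A minimal sketch of mapping a document through a word-cluster codebook, assuming the clusters have already been learned. The two clusters reuse the examples from the slide; the function name, the tokenizer, and the sample sentence are hypothetical.

```python
import re


def codeword_counts(text, word_to_cluster, num_clusters):
    """Count, per cluster, how many tokens of the document belong to it."""
    counts = [0] * num_clusters
    for token in re.findall(r"[a-z]+", text.lower()):
        cluster = word_to_cluster.get(token)
        if cluster is not None:  # words outside the codebook are ignored
            counts[cluster] += 1
    return counts


clusters = [["dog", "canine", "hound"], ["car", "automobile", "vehicle"]]
word_to_cluster = {w: k for k, words in enumerate(clusters) for w in words}
vector = codeword_counts(
    "The hound chased the car; the dog ignored the vehicle.",
    word_to_cluster,
    len(clusters),
)
```

The vector has K entries instead of one per vocabulary word, so synonyms such as "dog" and "hound" contribute to the same feature.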
Learning Non-Redundant Codebooks
Motivation: Improve the discriminative performance of any codebook and classifier learning approach by encouraging non-redundancy in the learning process.
Approach: learn multiple codebooks and classifiers; wrap the codebook and classifier learning process inside a boosting procedure [1].
Codebook Approaches: k-means, Gaussian Mixture Modeling, Information Bottleneck, Vocabulary trees, Spatial pyramid …
Non-Redundant Learning
[1] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. ICML.
Non-Redundant Codebook and Classifier Learning Framework
[Figure: boosting loop over iterations t = 1, …, T. At iteration t, the features X are clustered based on the boosting weights Wt(B) to form codebook Dt; classifier Ct is trained with this codebook and produces predictions Lt; the boosting weights are then updated for the next iteration. The predictions L1, …, LT are combined into the final predictions L.]
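The loop in the framework can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes an AdaBoost.M1-style weight update and weighted vote (the boosting variant of the cited Freund and Schapire paper), and `learn_codebook` and `train_classifier` are hypothetical callables standing in for any codebook and classifier learner. The toy learners at the bottom exist only to make the sketch runnable.

```python
import math


def boost_codebooks(examples, labels, learn_codebook, train_classifier, num_rounds):
    """Learn up to num_rounds (codebook, classifier) pairs under boosting."""
    n = len(examples)
    weights = [1.0 / n] * n                      # W_1(B): uniform weights
    ensemble = []                                # (vote weight, classifier C_t)
    for _ in range(num_rounds):
        codebook = learn_codebook(examples, weights)                        # D_t
        classifier = train_classifier(codebook, examples, labels, weights)  # C_t
        predictions = [classifier(x) for x in examples]                     # L_t
        error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
        if error >= 0.5:                         # AdaBoost.M1 stopping rule
            break
        error = max(error, 1e-10)                # avoid log(1/0) on a perfect round
        beta = error / (1.0 - error)
        ensemble.append((math.log(1.0 / beta), classifier))
        # Down-weight correctly classified examples, then renormalize.
        weights = [w * beta if p == y else w
                   for w, p, y in zip(weights, predictions, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Final prediction L: weighted vote over all classifiers."""
    votes = {}
    for alpha, classifier in ensemble:
        label = classifier(x)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)


# Toy stand-ins: the "codebook" is a weighted-mean threshold on 1-D inputs.
def learn_codebook(examples, weights):
    return sum(w * x for w, x in zip(weights, examples))


def train_classifier(codebook, examples, labels, weights):
    def make(sign):
        return lambda x: 1 if sign * (x - codebook) > 0 else 0

    def weighted_error(clf):
        return sum(w for w, x, y in zip(weights, examples, labels) if clf(x) != y)

    return min((make(+1), make(-1)), key=weighted_error)


ensemble = boost_codebooks([0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1],
                           learn_codebook, train_classifier, num_rounds=3)
```

Because each codebook is learned under weights that emphasize the examples earlier rounds got wrong, later codebooks are pushed to capture structure the earlier ones missed.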
Instantiations of the Framework
• Boost-Reweighting (discrete feature space): Supervised clustering of the features X based on the joint distribution table Pt(X, Y), where Y represents the class labels. This table is updated at each iteration based on the new boosting weights.
• Boost-Resampling (continuous feature space): Generate a non-redundant clustering set by sampling the training examples according to the updated boosting weights. The codebook is constructed by clustering the features in this clustering set.
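The two instantiations differ only in how the boosting weights reach the clustering step, which can be sketched as below. Both functions are hypothetical illustrations: the slides do not say how many examples are drawn per round or exactly how the weights enter Pt(X, Y), so the sketch assumes sampling with replacement and a table in which each document's word counts are scaled by its boosting weight.

```python
import random


def resample_clustering_set(features_per_example, boosting_weights, num_draws, seed=0):
    """Boost-Resampling: draw examples by weight and pool their features."""
    rng = random.Random(seed)
    indices = rng.choices(range(len(features_per_example)),
                          weights=boosting_weights, k=num_draws)
    clustering_set = [f for i in indices for f in features_per_example[i]]
    return indices, clustering_set


def weighted_joint_table(word_counts_per_doc, labels, boosting_weights):
    """Boost-Reweighting: estimate P_t(X, Y) with weight-scaled word counts."""
    table = {}
    for counts, label, weight in zip(word_counts_per_doc, labels, boosting_weights):
        for word, count in counts.items():
            key = (word, label)
            table[key] = table.get(key, 0.0) + weight * count
    total = sum(table.values())
    return {key: value / total for key, value in table.items()}


# All weight on the third example, so only its features are pooled.
features_per_example = [["a1", "a2"], ["b1"], ["c1", "c2"]]
indices, clustering_set = resample_clustering_set(
    features_per_example, [0.0, 0.0, 1.0], num_draws=2)

docs = [{"dog": 2}, {"car": 1}]
table = weighted_joint_table(docs, ["pets", "autos"], [0.75, 0.25])
```

The pooled `clustering_set` would then be passed to the clustering algorithm (for example K-Means), and the reweighted table to the supervised word-clustering step.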
Codebook Learning and Classification Algorithms
Documents:
• Codebook Learning: Information Bottleneck (IB) [1]: L = I(X ; X’) − βI(X’ ; Y)
• Classification: Naïve Bayes
Objects:
• Codebook Learning: K-Means
• Classification: Bagged Decision Trees
[1] Bekkerman, R., El-yaniv, R., Tishby, N., Winter, Y., Guyon, I. and Elisseeff, A. (2003). Distributional word clusters vs. words for text categorization. JMLR.
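To make the IB objective L = I(X ; X’) − βI(X’ ; Y) concrete, the sketch below evaluates it for a hard assignment of words X to clusters X’, given a joint table P(X, Y). It only scores a given clustering and does not implement the IB optimization itself; the function names and the four-word toy table are hypothetical, and mutual information is measured in bits.

```python
import math


def mutual_information(joint):
    """I(A; B) in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_objective(joint_xy, assignment, beta):
    """L = I(X; X') - beta * I(X'; Y) for a hard word-to-cluster assignment."""
    joint_x_cluster, joint_cluster_y = {}, {}
    for (x, y), p in joint_xy.items():
        c = assignment[x]
        joint_x_cluster[(x, c)] = joint_x_cluster.get((x, c), 0.0) + p
        joint_cluster_y[(c, y)] = joint_cluster_y.get((c, y), 0.0) + p
    return mutual_information(joint_x_cluster) - beta * mutual_information(joint_cluster_y)


joint_xy = {("dog", "pets"): 0.25, ("hound", "pets"): 0.25,
            ("car", "autos"): 0.25, ("auto", "autos"): 0.25}
by_topic = {"dog": 0, "hound": 0, "car": 1, "auto": 1}
mixed = {"dog": 0, "car": 0, "hound": 1, "auto": 1}

score_by_topic = ib_objective(joint_xy, by_topic, beta=2.0)
score_mixed = ib_objective(joint_xy, mixed, beta=2.0)
```

Both clusterings compress the words equally (I(X ; X’) is 1 bit in each case), but only the topical one keeps information about the class, so it gets the lower value of L when the objective is minimized.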
Image Attributes: tf−idf Weights
Term-frequency−inverse document frequency (tf−idf) weight [1], with:
"Document" = Image
"Term" = Instance of a visual word
[Figure: Interest Regions → Region Descriptors → Visual Codebook → Image Attribute Vector → tf-idf → Classifier]
[1] Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management.
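A minimal sketch of tf−idf weighting applied to visual-word histograms. The cited Salton and Buckley paper covers several weighting variants, and the exact one used in this work is not stated here, so the sketch assumes a common form: normalized term frequency times log(N / df), with N the number of images and df the number of images containing the visual word. The function name and toy histograms are hypothetical.

```python
import math


def tf_idf(histograms):
    """Reweight per-image visual-word counts by tf * log(N / df)."""
    num_images = len(histograms)
    num_words = len(histograms[0])
    # df[k]: number of images in which visual word k occurs at least once.
    doc_freq = [sum(1 for h in histograms if h[k] > 0) for k in range(num_words)]
    weighted = []
    for h in histograms:
        total = sum(h)
        row = []
        for k, count in enumerate(h):
            if count == 0 or total == 0:
                row.append(0.0)
            else:
                row.append((count / total) * math.log(num_images / doc_freq[k]))
        weighted.append(row)
    return weighted


# Two images, three visual words; word 0 occurs in both images.
histograms = [[2, 0, 2], [1, 3, 0]]
weights = tf_idf(histograms)
```

A visual word that appears in every image gets weight zero, so the classifier sees only the words that help tell images apart.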
Experimental Results − Stonefly Recognition
Dataset      Boost    Larios [1]   Opelt [2]
STONEFLY2    97.85    79.37        70.10
STONEFLY4    98.21    82.42        /
• 3-fold cross validation experiments
• The size of each codebook K = 100
• The number of boosting iterations T = 50
[1] Larios, N., Deng, H., Zhang, W., Sarpola, M., Yuen, J., Paasch, R., Moldenke, A., Lytle, D., Ruiz Correa, S., Mortensen, E., Shapiro, L. and Dietterich, T. (2008). Automated insect identification through concatenated histograms of local appearance features. Machine Vision and Applications.
[2] Opelt, A., Pinz, A., Fussenegger, M. and Auer, P. (2006). Generic object recognition with boosting. PAMI.
Experimental Results − Stonefly Recognition (cont.)
Dataset      Boost    Single   Random
STONEFLY2    97.85    85.84    89.16
STONEFLY4    98.21    67.20    90.42
STONEFLY9    95.09    78.33    89.07
• Single: learns only a single codebook of size K×T = 5000.
• Random: weighted sampling is replaced with uniform random sampling that neglects the boosting weights.
Boost achieves a 77% error reduction compared with Single on STONEFLY9.
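The 77% figure can be reproduced from the STONEFLY9 row, assuming the table entries are accuracies in percent, so that error = 100 − accuracy. The function name is hypothetical.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Fraction of the baseline error removed by the improved method."""
    baseline_error = 100.0 - baseline_accuracy   # Single: 100 - 78.33 = 21.67
    improved_error = 100.0 - improved_accuracy   # Boost:  100 - 95.09 = 4.91
    return (baseline_error - improved_error) / baseline_error


reduction = relative_error_reduction(78.33, 95.09)  # about 0.77
```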
Experimental Results − Stonefly Recognition (cont.)
Experimental Results − Document Classification
Dataset    Boost    Random   S1000    S100
NG10       90.24    85.43    84.31    79.88
ENRON10    84.44    81.09    80.90    74.23
• S1000: learns a single codebook of size 1000.
• S100: learns a single codebook of size 100.
• Random: 10 bagged samples of the original training corpus are used to estimate the joint distribution table Pt(X, Y).
Experimental Results − Document Classification (cont.)
• [TODO]: add Figure 5 in a similar format as Figure 4
Conclusions and Future Work
Conclusions:
• Non-redundant learning is a simple and general framework to effectively improve the performance of codebooks.
Future work:
• Explore the underlying reasons for the effectiveness of non-redundant codebooks (discriminative analysis, non-redundancy tests).
• More comparison experiments on well-established datasets.
Acknowledgements
• Supported by Oregon State University insect ID project: http://web.engr.oregonstate.edu/~tgd/bugid
• Supported by NSF under grant number IIS-0705765.
Thank you!