Training Image Classifiers with Similarity Metrics, Linear Programming, and Minimal Supervision. Asilomar SSC. Karl Ni, Ethan Phelps, Katherine Bouman, Nadya Bliss, Lincoln Laboratory, Massachusetts Institute of Technology. 2 November 2012.
12012 LLNL
Training Image Classifiers with Similarity Metrics, Linear Programming, and Minimal Supervision
Asilomar SSC
Karl Ni, Ethan Phelps, Katherine Bouman, Nadya Bliss
Lincoln Laboratory, Massachusetts Institute of Technology
2 November 2012
This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
• What can a computer understand?
Applying Semantic Understanding of Images
• Who?
• What?
• When?
• Where?
Classifier Decision!
Training Data
• Query by example
• Statistically modeled
• Query by sketch
Computer vision algorithms
• Image retrieval
• Robotic navigation
• Semantic labeling
• Image sketch
• Structure from Motion
• Image localization
• Requires: Some prior knowledge
Feature Extraction
Matching & Association
Localization Algorithms: Processing Framework
Exploitation: Ground Imagery, Video; Aerial Imagery, Video; Location
Offline Setup: Multi-Modal Sources; Training Framework; World Model
World Model
• Metadata
• Graphs
• Point Clouds
• Distributions
• Terrain
• Etc.
Feature Extraction
Matching & Association
• Introduction
• Feature Pruning Background
• Matched Filter Training
• Results
• Summary
Outline
• Problems in image pattern matching
• Features are a quantitative way for machines to understand an image
Image Property (Feature Technique)
– Local color (luma + chroma histograms)
– Object texture (DCT, local & normalized)
– Shape (curvelets, shapelets)
– Lower-level gradients (DWT: Haar, Daubechies)
– Higher-level descriptors (SIFT/SURF/HoG, etc.)
– Scene descriptors (GIST), Torralba et al.
Finding the Features of Image
• Each image = 10 million pixels!
• Most dimensions are irrelevant
• Multiple concepts inside the image
• Typical chain: Feature Extraction → Training / Classifier
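As one concrete illustration of the feature extraction stage, the "local color (luma + chroma histograms)" entry above might look roughly like the sketch below. The function name `luma_chroma_histogram`, the bin count, and the choice of the BT.601 full-range YCbCr conversion are assumptions made for illustration; the slides do not say which color transform or binning was used.

```python
import numpy as np

def luma_chroma_histogram(rgb, bins=8):
    """Concatenate normalized Y, Cb, Cr histograms for an RGB patch in [0, 255].

    Hypothetical helper: the conversion constants are the BT.601 full-range
    (JPEG-style) ones, chosen here only as a plausible default.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    parts = []
    for channel in (y, cb, cr):
        hist, _ = np.histogram(channel, bins=bins, range=(0.0, 256.0))
        parts.append(hist / channel.size)  # each channel histogram sums to 1
    return np.concatenate(parts)

# A random patch stands in for a real image region.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32, 3))
feature = luma_chroma_histogram(patch)
```

The result is a short fixed-length vector per patch, which is the kind of input the later training stage assumes.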
Numerous features: subset is relevant
• FEATURES ARE:
• Red bricks on multiple buildings
• Small hedges, etc.
• Windows of a certain type
• Types of buildings are there

• FEATURES ARE:
• Arches and white buildings
• Domes and ancient architecture
• Older/speckled materials (higher frequency image content)

• FEATURES ARE:
• More suburb-like
• Larger roads
• Drier vegetation
• Shorter houses
Choice of features requires looking at multiple semantic concepts defined by entities and attributes inside of images
Feature Extraction
Training / Classifier
• Most of the features are irrelevant
• Large dimensionality and algorithmic complexity
• Keep small numbers of salient features and discard large numbers of nondescriptive features
– Feature invariance to transformations, content, and context only to an extent (e.g., SIFT, RIFT, etc.)
– Simplify classifier (both computation & supervision)
– Multiple instances of several features describing the same object
• Require a high level of abstraction
– Visual similarity does not always correlate with “semantic” similarity
Feature Descriptors
Feature Extraction
Training / Classifier
Brown et al., Lowe et al., Ng et al., Thrun et al.
• Tools to hand-label concepts
• 2006-2011
– Google Image Labeler
– Kobus’s Corel Dataset
– MIT LabelMe
– Yahoo! Games
• Problems
– Tedious
– Time consuming
– Incorrect
– Very low throughput
• Famous algorithms
– Parallelizable
– Not generalizable, unfortunately
Getting the Right Features
Feature Extraction
People can’t be flying or walking on billboards!
[Figure: image with numbered labeled regions] 1. Chair, 2. Table, 3. Road, 4. Road, 5. Table, 6. Car, 7. Keyboard
• Segmentation is a difficult manual task
• Multiple semantic concepts per single image
• Considerable amounts of noise, most often irrelevant to any concept
Automatically Learn the Best Features
Concept 1 (e.g., sky)
Concept 2 (e.g., mountain)
Concept 3 (e.g., river)
Semantic Simplex
0.2 0.3 0.05 … 0.2
Kwitt et al. (Kitware)
• Lots of work in the 1990s
– Conditional probabilities through large training data sets
– Motivated by the query by example and query by sketch problems
– Primarily based on multiple instance learning and noisy density estimation
• Learning multiple instances of an object (no noise case)
• Robustness to noise through law of large numbers
– Hope to integrate it out
– Although the area of red boxes per instance is small, their aggregate over all instances is dominant
Leveraging Related Work
Noise, if uncorrelated, will become more and more sparse
Dietterich et al.
Keeler et al.
(Query by example here refers to image retrieval as in Ballerini et al., not Zloof's IBM Query by Example for relational databases.)
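The law-of-large-numbers argument can be illustrated with a toy simulation. This is only a sketch under invented assumptions: features are reduced to discrete bins, the object always falls in one bin (`object_bin`), and noise features land in uniformly random bins. None of these modeling choices come from the slides.

```python
import numpy as np

def aggregate_votes(num_instances=200, num_bins=50, noise_per_instance=5,
                    object_bin=7, seed=0):
    """Accumulate feature occurrences over many instances of one concept.

    Every instance contains the object feature (always in `object_bin`) plus
    several noise features that land in uniformly random bins.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_bins, dtype=int)
    for _ in range(num_instances):
        counts[object_bin] += 1
        noise_bins = rng.integers(0, num_bins, size=noise_per_instance)
        np.add.at(counts, noise_bins, 1)  # handles repeated bins correctly
    return counts

counts = aggregate_votes()
dominant_bin = int(np.argmax(counts))
```

Within a single instance the object accounts for only one feature in six, yet after aggregation its bin collects one vote per instance while the uncorrelated noise is spread thinly over all bins.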
• Feature clustering in the large
• Mixture hierarchies can be incrementally trained
Parallel Calculations through Hierarchies
[Diagram: a top-level GMM built from lower-level GMMs, one per training image (image 1, image 2, image 3); the lower-level GMMs can be done in parallel. Vasconcelos et al.]
[Diagram: training images for Image Class 1, Image Class 2, ..., Image Class N, with one distribution per class (Distribution 1, Distribution 2, ..., Distribution N), each estimated from entire images.]
Automatic feature subselection has been submitted to SSP 2012
Lincoln Laboratory GRID Processing
• Introduction
• Feature Pruning Background
• Matched Filter Training
• Results
• Summary
Outline
• Hierarchical Gaussian mixtures as a density estimate
– Small-sample bias is large
– Non-convex / sensitive to initialization
– Extensive computational process to bring hierarchies together
– Each level requires supervision (#classes, initialization, etc.)
• Think discriminatively:
– Instead of: Generating centroids that represent images
– Think: Prune features to eliminate redundancy
• Sparsity optimization
– Solving directly for the features that we want to use
– Reduction of redundancy is intuitive and not generative
• Under normalization, GMM’s classifier can be implemented with matched filter instead
Finding a sparse basis set
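$$\arg\min_{c \in \{1,\dots,C\}} \|\mathbf{x} - \mathbf{y}_c\|_2^2 \;\xrightarrow{\text{normalize}}\; \arg\max_{c \in \{1,\dots,C\}} \langle \mathbf{x}, \mathbf{y}_c \rangle$$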
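The equivalence between nearest-centroid classification and matched filtering follows from expanding the squared distance: when the query x and every centroid y_c have unit norm, ||x - y_c||² = 2 - 2⟨x, y_c⟩, so the smallest distance and the largest inner product pick the same class. A minimal numerical check, with random unit vectors standing in for real centroids (the dimensions and names below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 16, 5                      # feature dimension and number of classes (arbitrary)

# Unit-norm centroids, one per column, and a unit-norm query.
Y = rng.normal(size=(d, C))
Y /= np.linalg.norm(Y, axis=0, keepdims=True)
x = rng.normal(size=d)
x /= np.linalg.norm(x)

dists = np.sum((Y - x[:, None]) ** 2, axis=0)   # squared Euclidean distances
scores = Y.T @ x                                # matched-filter responses

nearest = int(np.argmin(dists))
matched = int(np.argmax(scores))
```

Both rules select the same class here, and `dists` equals `2 - 2 * scores` up to rounding, which is why the distance computation can be replaced by a bank of dot products.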
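A Note on Notation
• Let the feature $\mathbf{x}^{(j)}$ be the jth feature in the training set, where the italicized $x_i^{(j)}$ is the ith dimension of that feature:
$$\mathbf{x}^{(j)} = \begin{bmatrix} x_1^{(j)} \\ x_2^{(j)} \\ \vdots \\ x_d^{(j)} \end{bmatrix}$$
• Let X be a d × N matrix that represents the collection of all the features, where the jth column of X is the feature vector $\mathbf{x}^{(j)}$:
$$X = \begin{bmatrix} \mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \dots, \mathbf{x}^{(j)}, \dots, \mathbf{x}^{(N)} \end{bmatrix}$$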
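Finding sparsity with linear programming
• Gaussian Mixture Models: GMM, solved via EM (non-convex optimization problem)
$$\min_{\pi_1,\dots,\pi_M,\;\theta_1,\dots,\theta_M} \; -\sum_{j=1}^{N} \log \sum_{m=1}^{M} \pi_m \, p(\mathbf{x}^{(j)} \mid \theta_m)$$
• Many optimization problems induce sparsity: Group Lasso
$$\arg\min_{\beta} \; \|X - X\beta\|_2^2 + \lambda \sum_j \|\beta_{j,:}\|_2$$
• Matched filter constraint: Max-Constraint Optimization (not convex)
$$\arg\max_{\beta} \; \mathrm{tr}(X^T X \beta) - \lambda \sum_j \|\beta_{j,:}\|_2 \quad \text{such that } \beta_{ij} \in \{0,1\} \text{ and } \mathbf{1}^T \beta_{:,j} = 1$$
• Relaxation of constraints: LP Optimization Problem
$$\arg\min_{\beta,\,t} \; -\mathrm{tr}(X^T X \beta) + \lambda \sum_i t_i \quad \text{such that } 0 \le \beta_{ij} \le t_i \le 1 \text{ and } \mathbf{1}^T \beta_{:,j} = 1$$
– Faster than G-Lasso
– Independent of dimensionality!
– Convex (unlike MF opt & GMM, EM)
– On average, scales according to N²
Feature Extraction
Training / Classifier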
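A relaxed problem of this kind can be handed to an off-the-shelf LP solver. The sketch below is one possible reading, not the authors' implementation: it assumes the objective is -tr(XᵀXβ) + λ Σᵢ tᵢ with 0 ≤ β_ij ≤ t_i ≤ 1 and each column of β summing to one. The trade-off weight `lam`, the function name `lp_feature_selection`, and the toy two-cluster data are all assumptions. It uses `scipy.optimize.linprog` with the HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog

def lp_feature_selection(X, lam):
    """Solve min -tr(X^T X beta) + lam * sum_i t_i
    s.t. 0 <= beta_ij <= t_i <= 1 and every column of beta sums to 1.

    Hypothetical formulation; columns of X are unit-norm features.
    """
    N = X.shape[1]
    S = X.T @ X
    # Decision vector z = [beta flattened row-major (N*N), t (N)].
    # tr(S beta) = sum_ij S[j, i] * beta[i, j], hence the transpose.
    c = np.concatenate([-S.T.ravel(), lam * np.ones(N)])
    # beta_ij - t_i <= 0 for every (i, j).
    A_ub = np.hstack([np.eye(N * N), -np.kron(np.eye(N), np.ones((N, 1)))])
    b_ub = np.zeros(N * N)
    # sum_i beta_ij = 1 for every column j.
    A_eq = np.hstack([np.kron(np.ones((1, N)), np.eye(N)), np.zeros((N, N))])
    b_eq = np.ones(N)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:N * N].reshape(N, N), res.x[N * N:]

# Toy data: six 2-D features forming two tight clusters.
rng = np.random.default_rng(0)
directions = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
X = np.hstack([v[:, None] + 0.05 * rng.normal(size=(2, 3)) for v in directions])
X /= np.linalg.norm(X, axis=0, keepdims=True)

lam = 0.5
beta, t = lp_feature_selection(X, lam)
objective = -np.trace(X.T @ X @ beta) + lam * t.sum()
```

Rows of `beta` with a large `t` entry would be the candidates to keep as prototypes. The dense constraint matrices grow as N³, so this form only suits small N; it is meant to make the formulation concrete rather than to scale.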
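Intuition
• Relies on similarity matrix concept
• Actual implementation does not include similarity matrix, but rather keeps track of beta indices
$$\beta^* = \arg\min_{\beta,\,t} \; -\mathrm{tr}(X^T X \beta) + \lambda \sum_i t_i \quad \text{s.t. } 0 \le \beta_{ij} \le t_i \le 1 \text{ and } \beta^T \mathbf{1} = \mathbf{1}$$
[Figure: example 4 × 4 similarity matrix $X^T X$ with ones on the diagonal and entries such as 0.95 and 0.98 for similar feature pairs]
[Figure: the entries of row i of β are bounded by t_i (rows 1 to 4 bounded by t_1, t_2, t_3, t_4), so each t_i bounds the ℓ∞-norm of row i of β]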
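The remark about tracking beta indices instead of the similarity matrix can be made concrete. When β is a hard assignment (one nonzero per column), the trace term reduces to a sum of dot products between each feature and its assigned prototype, so XᵀX never needs to be formed. The sketch below compares the two forms; the assignment vector `proto_idx` and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 8, 6
X = rng.normal(size=(d, N))
X /= np.linalg.norm(X, axis=0, keepdims=True)   # unit-norm features as columns

# Hypothetical hard assignment: feature j is represented by prototype proto_idx[j].
proto_idx = np.array([0, 0, 2, 2, 2, 5])

# Dense form: build beta and the full similarity matrix.
beta = np.zeros((N, N))
beta[proto_idx, np.arange(N)] = 1.0
S = X.T @ X
objective_dense = float(np.trace(S @ beta))
t = beta.max(axis=1)               # row-wise l-infinity norms of beta

# Index form: only N dot products, no N x N matrix.
objective_index = float(np.sum(X[:, proto_idx] * X))

num_prototypes = int(t.sum())
```

The two objective values agree, and the sum of the row bounds `t` counts the distinct prototypes in use, which is the quantity the penalty term keeps small.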
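Nonlinear Feature Selection
• The optimization problem consists solely of dot products, which act as a similarity function; the prototypes are provided by β, which selects features that are similar to a set:
$$\arg\min_{\beta,\,t} \; -\mathrm{tr}(X^T X \beta) + \lambda \sum_i t_i \quad \text{such that } 0 \le \beta_{ij} \le t_i \le 1 \text{ and } \mathbf{1}^T \beta_{:,j} = 1$$
• Nonlinearity may be introduced through a kernel function (RKHS) that induces a vector space whose mapping we do not necessarily know, subject to the same constraints: $0 \le \beta_{ij} \le t_i \le 1$ and $\mathbf{1}^T \beta_{:,j} = 1$.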
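Because the features enter the problem only through dot products, the Gram matrix XᵀX can in principle be swapped for any kernel matrix. The slides do not name a kernel; the Gaussian (RBF) kernel and the `gamma` parameter below are assumptions chosen purely for illustration.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Return K with K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the columns of X.

    Hypothetical stand-in for X^T X; any positive semidefinite kernel could be used.
    """
    sq_norms = np.sum(X ** 2, axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    np.maximum(sq_dists, 0.0, out=sq_dists)   # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))        # d = 4 dimensions, N = 5 features
K = rbf_kernel_matrix(X, gamma=0.5)
```

`K` has the same shape and symmetry as XᵀX and a unit diagonal, so it can take the place of the similarity matrix in the objective without changing the constraints.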
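Application to Classification
TRAINING: Feature Extraction, then
$$\beta^* = \arg\min_{\beta,\,t} \; -\mathrm{tr}(X^T X \beta) + \lambda \sum_i t_i \quad \text{s.t. } 0 \le \beta_{ij} \le t_i \le 1 \text{ and } \beta^T \mathbf{1} = \mathbf{1}$$
β* = BEST FEATURES
QUERY: Feature Extraction → Matching & Association → Classifying Image with Confidence
Just a faster way to classify imagery in one-versus-all frameworks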
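At query time, the selected features act as a bank of matched filters per class. The scoring rule below (for each query feature take the best filter response, then average over the query features) is an assumption made for this sketch; the slides do not specify how responses are pooled or how confidence is derived.

```python
import numpy as np

def classify(query, filter_banks):
    """Score a query against per-class filter banks and return (label, scores).

    query:        d x M array of unit-norm query features (columns).
    filter_banks: list of d x K_c arrays of unit-norm selected features.
    The pooling rule (max over filters, mean over query features) is hypothetical.
    """
    scores = np.array([
        np.mean(np.max(bank.T @ query, axis=0)) for bank in filter_banks
    ])
    return int(np.argmax(scores)), scores

# Toy example in 4-D: class 0 keeps e1, e2 as filters; class 1 keeps e3, e4.
I4 = np.eye(4)
filter_banks = [I4[:, :2], I4[:, 2:]]
query = np.array([[0.0, 0.0],
                  [0.0, 0.0],
                  [1.0, 0.6],
                  [0.0, 0.8]])     # two unit-norm query features
label, scores = classify(query, filter_banks)
```

Here the query features lie in the span of the class 1 filters, so class 1 wins with a score of 0.9 against 0.0. The gap between the top scores could serve as a rough confidence, though that too is an assumption.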
• Introduction
• Feature Pruning Background
• Matched Filter Training
• Results
• Summary
Outline
LP Feature Learning versus G-Lasso
• More intuitive grouping
– Threshold learning is unnecessary
– Post-processing is unnecessary
• 5.452% more accurate in +1/-1 learning classes
Segmentation and Classification Visual Result
[Figure: original image alongside the classifier decisions]
Interesting automatic semantic learning result
Application to Localization
Testing datasets (rows) against training datasets (columns):

Testing \ Training   MIT-Kendall   Vienna   Dubrovnik   Lubbock
MIT-Kendall          0.975         0.056    0.024       0.102
Vienna               0.050         0.896    0.035       0.060
Dubrovnik            0.015         0.024    0.905       0.057
Lubbock              0.097         0.002    0.053       0.901

• 1400 images per dataset
• Filter reduction to 356 filters per class
• Less than a minute classification time
• Coverage of cities: entire cities (Vienna, Dubrovnik, Lubbock), portion of Cambridge (MIT-Kendall)
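The localization table can be summarized with a little arithmetic. The snippet below just copies the reported values into an array and computes the mean of the matched-dataset (diagonal) entries and the largest mismatched entry. The slides do not say whether the entries are rates or confidence scores, so the variable names are neutral on purpose.

```python
import numpy as np

datasets = ["MIT-Kendall", "Vienna", "Dubrovnik", "Lubbock"]
# Rows: testing dataset, columns: training dataset (values as reported).
table = np.array([
    [0.975, 0.056, 0.024, 0.102],
    [0.050, 0.896, 0.035, 0.060],
    [0.015, 0.024, 0.905, 0.057],
    [0.097, 0.002, 0.053, 0.901],
])

matched = table.diagonal()
mean_matched = float(matched.mean())
off_diagonal = table[~np.eye(len(datasets), dtype=bool)]
max_mismatched = float(off_diagonal.max())
```

The matched entries average about 0.919, while no mismatched entry exceeds 0.102 (MIT-Kendall test images against the Lubbock model).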
• Accurate modeling must occur before we have any hope of classifying images.
• Feature pruning is equivalent to Gaussian centroid determination under normalization
• Sparse optimization enables feature pruning and matched filter creation
• Sparse optimization contains only dot products so optimization can occur with RKHS in the transductive setting
Summary
• K. Ni, E. Phelps, K. L. Bouman, N. Bliss, “Image Feature Selection via Linear Programming,” to appear in Presentation at Asilomar SSC, Pacific Grove, CA. October (Asilomar ‘12)
• S. M. Sawyer, K. Ni, N. T. Bliss. "Cluster-based 3D Reconstruction of Aerial Video." to appear in Presentation at the 1st IEEE High Performance Extreme Computing Conference, Waltham, MA. September 2012 (HPEC '12)
• H. Viggh and K. Ni, “SIFT Based Localization Using Prior World Model for Robotic Navigation in Urban Environments,” to appear in Presentation at the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition, 2012, Las Vegas, Nevada (IPCV-2012)
• K. Ni, Z. Sun, N. Bliss, "Real-time Global Motion Blur Detection", to appear in Presentation at the IEEE International Conference on Image Processing, 2012, Orlando, Florida, (ICIP-2012)
• N. Arcolano, K. Ni, B. Miller, N. Bliss, P. Wolfe, "Moments of Parameter Estimates for Chung-Lu Random Graph Models", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012, Kyoto Japan, ICASSP-2012
• A. Vasile, L. Skelly, K. Ni, R. Heinrichs, O. Camps, and M. Sznaier, “Efficient City-sized 3D Reconstruction from Ultra-High Resolution Aerial and Ground Video Imagery”, Proceedings of the IEEE International Symposium on Visual Computing, 2011, Las Vegas, NV, ISVC-2011, pp 347-358
• K. Ni, Z. Sun, N. Bliss, "3-D Image Geo-Registration Using Vision-Based Modeling", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2011, Prague, Czech Republic, ICASSP-2011, pp 1573 - 1576
• K. Ni, T. Q. Nguyen, "Empirical Type-I Filter Design for Image Interpolation", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2010, pp 866 - 869
• Z. Sun, N. Bliss, & K. Ni, "A 3-D Feature Model for Image Matching", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2010, pp 2194-2197
• K. Ni, Z. Sun, N. Bliss, & N. Snavely, "Construction and exploitation of a 3D model from 2D image features", Proceedings of SPIE International Conference on Electronic Imaging, Inverse Problems Session, SPIE-2010, Vol. 7533, San Jose, CA, U.S.A., January 2010.
References
• MIT Lincoln Laboratory
– Karl Ni
– Nicholas Armstrong-Crews
– Scott Sawyer
– Nadya Bliss
• MIT
– Katherine L. Bouman
• Boston University
– Zachary Sun
• Northeastern University
– Alexandru Vasile
• Cornell University
– Noah Snavely
Contributors and Acknowledgements
Questions?
Backup