A New Approach To The Multiclass Classification Problem: Category Vector Space


Page 1:

A New Approach To The Multiclass Classification Problem

Category Vector Space

Page 2:

Agenda

Problem
Motivation
Discussion
Preliminary Results

Page 3:

Multi-class classification through binary classification

One-vs-All
One-vs-One

Multi-class classification can often be constructed as a generalization of binary classification

In practice, multi-class classification is done by combining binary classifiers

Classification Problem

Problem

Page 4:

Object recognition: 100
Automated protein classification: 50 (300-600)
Digit recognition: 10
Phoneme recognition [Waibel, Hanzawa, Hinton, Shikano, Lang 1989]

http://www.glue.umd.edu/~zhelin/recog.html

Multi-class algorithms are computationally expensive

Multiclass Applications: Large Category Space

Problem

Page 5:

- Handwriting recognition (e.g., USPS)
- Text classification
- Face detection
- Facial expression recognition

Other Multiclass Applications

Problem

Page 6:

Data: $\{(x_i, y_i)\}$, $i = 1, \ldots, n$

Classification Setup

Question: design a classification rule

$y = f(x)$

such that, given a new x, it predicts y with minimal probability of error

Feature vector $x \in \mathbb{R}^d$, label $y \in \{-1, +1\}$

Training and test data drawn i.i.d. from a fixed but unknown probability distribution D

Labeled training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$

Problem

Page 7:

[Figure: training examples labeled + and − in the feature space, separated by a linear decision boundary with a margin]

Training examples are mapped to a (usually high-dimensional) feature space by a feature map $F(x) = (F_1(x), \ldots, F_d(x))$

Learn a linear decision boundary: trade-off between maximizing the geometric margin of the training data and minimizing margin violations

Support Vector Machines (SVMs)

Problem

Page 8:

Linear classifier defined in feature space by

$f(x) = \langle w, x \rangle + b$

SVM solution gives

$w = \sum_{i=1}^{N} \alpha_i x_i$

as a linear combination of support vectors, a subset of the training vectors

[Figure: labeled + and − training points with the separating hyperplane defined by w and b]

Definition Of SVM Classifiers

Problem
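To make the dual form concrete, here is a minimal numpy sketch; the support vectors, the signed coefficients $\alpha_i$, and the bias are made-up values for illustration, not anything from the slides:

```python
import numpy as np

# Toy support vectors and dual coefficients (hypothetical values).
X_sv = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.5]])  # support vectors x_i
alpha = np.array([0.7, -0.4, -0.3])                    # signed weights alpha_i
b = 0.1                                                # bias

w = alpha @ X_sv  # w = sum_i alpha_i x_i, a linear combination of support vectors

def f(x):
    """Linear SVM decision function f(x) = <w, x> + b."""
    return w @ x + b

print(np.sign(f(np.array([1.5, 1.0]))))  # predicted label in {-1, +1}
```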

Page 9:

Definition Of A Margin

History (Vapnik, 1965), if linearly separable: place the hyperplane “far” from the data: large margin

[Figure: a separating hyperplane placed close to the data, with a small margin: BAD]

Problem

Page 10:

Maximize The Margin

History (Vapnik, 1965), if linearly separable: place the hyperplane “far” from the data: large margin

[Figure: a maximum-margin hyperplane placed far from the data: GOOD]

A large margin classifier leads to good generalization (performance on test sets)

Problem

Page 11:

One-vs-All (OVA)
- For each class, build a classifier for that class vs. the rest
- Constructs k SVM models
- Often very imbalanced classifiers: asymmetry in the amount of training data
- Earliest implementation for SVM multiclass

One-vs-One (OVO)
- Constructs k(k-1)/2 classifiers
- Rooted binary SVMs with k leaves
- Traverse the tree to reach a leaf node

Combining Binary Classifiers

Problem
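For reference, a minimal sketch of both schemes, assuming scikit-learn is available; the 10-class digits data stands in for any k-class problem:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)  # k = 10 digit classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ova = OneVsRestClassifier(LinearSVC()).fit(X_tr, y_tr)  # k binary SVMs
ovo = OneVsOneClassifier(LinearSVC()).fit(X_tr, y_tr)   # k(k-1)/2 binary SVMs

print("OvA test accuracy:", ova.score(X_te, y_te))
print("OvO test accuracy:", ovo.score(X_te, y_te))
```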

Page 12:

Race categories: {White, Black, Asian}
Task: Map the image training set to the race labels
- Training (learning)
- Test (generalization)

Scenario: an ambiguous test image is presented
- A mixed race person
- A person drawn from a race which is not represented by the system (e.g., Hispanics, Native Americans, etc.)

There is no way of assigning a mixed label: the system cannot represent the mixed race person using a combination of categories
There is no way of representing an unknown race

Possible solution: indicate that the incoming image is outside the margin of each learned category

Example 1

Motivation

Page 13:

Musical samples generated by a single instrument (electric guitar), with a set of note categories {C, C#, D, D#, etc.}
Task: Map the training set musical notes to the labels, with reasonable learning and generalization properties

Scenario: given musical sequences
- Intervals (two notes simultaneously struck, such as {C, F#})
- Chords (containing three or more notes)

Ambiguity at the training set level: we are forced to assign new labels to intervals and chords even though they contain the same features (single notes) as the note categories
In the music sequence case, suppose we learned a conditional probability distribution p(L = l|x), where x is a music sequence and L = {C, C#, D, …, B} is the set of note labels. When x is an interval, say a tritone, there is no way of assigning high probability to the tritone

Possible solution: accommodate the tritone by assigning it a new label
- This yields a large label space, which must be truncated because of exponential size considerations

Example 2

Motivation

Page 14:

Categories are conceived as nominal labels
- No underlying geometry for the categories
- Inability of the conditional distribution to give us a measure (value) for interpolated categories
- Non-represented interpolated categories are left out
- Not easy to distinguish basic categories from compound categories

Problems With Combining Binary Classifiers

Motivation

Page 15:

Invoke the notion of a category vector space: categories are defined with a geometric structure
Assume that the set of categories (labels) forms a vector space
- A music sequence would correspond to a label $y \in \mathbb{R}^{12}$ in a twelve-dimensional vector space {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
- Each basic note C, C#, D, etc. would have its own coordinate axis

Learning problem:
- Map the training set music sequences to vectors in the 12-dimensional space such that the training and test set errors are small
- Map the training musical sequences to the 12-dimensional vector space and then (if a support vector machine approach is used) maximize the margin of the mapped vectors in the category space
- The race classification example is analogous; depending on how many races we wish to explicitly represent, map the training set to the race category vector space and maximize the margin

Generalization problem:
- Map a test set musical sequence or image into the category space and then ask if it lies within the margin of a note (or chord) or race category

Category Vector Spaces Solution

Note: Extensions to other multi-category learning applications are straightforward, assuming we can map category labels to coordinates.

Motivation
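A minimal sketch of this label encoding (the helper below is hypothetical, not the authors' code): each basic note occupies one coordinate axis of R^12, and intervals or chords become superpositions of basis vectors:

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def category_vector(notes):
    """Map a basic or compound note category to a label vector y in R^12."""
    y = np.zeros(len(NOTES))
    for note in notes:
        y[NOTES.index(note)] = 1.0
    return y

print(category_vector(["C"]))        # basic category: one coordinate axis
print(category_vector(["C", "F#"]))  # compound category: the tritone {C, F#}
```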

Page 16:

Multiclass Fisher: Related Idea

Given the feature vectors $x_i \in \mathbb{R}^M$, $i = 1, \ldots, N$, D categories, and a projected set of features defined by $z_i = W^T x_i$, where $W$ is $M \times D$, the MC-FLD maximizes

$$J(W) = \mathrm{trace}\left( \left( W^T S_T W \right)^{-1} W^T S_B W \right)$$

where $S_T = S_w + S_B$ is the total scatter matrix,

$$S_B = \sum_{a=1}^{D} N_a (m_a - m)(m_a - m)^T, \qquad S_w = \sum_{a=1}^{D} \sum_{i=1}^{N_a} (x_i - m_a)(x_i - m_a)^T,$$

$m$ is the pooled mean, and $m_a$ is the class mean of the $a$th category.

Solution: The columns of W are the top D eigenvectors (corresponding to the largest eigenvalues) of $S_w^{-1} S_B$

- Eigenvectors are orthonormal
- Columns of W constitute a category vector space
- Interpret as a category space projection
- Optimal solution is a set of orthogonal weight vectors

Discussion
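A minimal numpy/scipy sketch of the MC-FLD solution above; the small ridge added to S_w is my own numerical safeguard, not part of the slides:

```python
import numpy as np
from scipy.linalg import eigh

def mc_fld(X, y, D):
    """Return W whose columns are the top-D eigenvectors of inv(S_w) S_B.

    X: N x M feature matrix; y: length-N array of category labels.
    """
    M = X.shape[1]
    m = X.mean(axis=0)                      # pooled mean
    S_w = np.zeros((M, M))
    S_b = np.zeros((M, M))
    for a in np.unique(y):
        Xa = X[y == a]
        m_a = Xa.mean(axis=0)               # class mean of the a-th category
        S_w += (Xa - m_a).T @ (Xa - m_a)
        S_b += len(Xa) * np.outer(m_a - m, m_a - m)
    S_w += 1e-8 * np.eye(M)                 # ridge so the problem is well posed
    vals, vecs = eigh(S_b, S_w)             # solves S_b v = lambda S_w v
    return vecs[:, np.argsort(vals)[::-1][:D]]
```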

Page 17:

We avoided this approach since margins are not maximized in the category space

We have not seen a classifier take a three-class problem with labels {0, 1, 2}, map the input features into a vector space with basis vectors $(1, 0, 0)^T$, $(0, 1, 0)^T$, and $(0, 0, 1)^T$, and attempt to maximize the margin in the category vector space

Nor have we seen any previous work where a pattern from a compound category, say a combination of labels 1 and 2, is also used in training with a conversion of the compound category to a vector such as $(0, 1, 1)^T$

Discussion

Disadvantage Of Multiclass Fisher

Page 18:

Input feature vectors are mapped to the category vector space using a kernel-based approach

In the category vector space, maximizing the margin is equivalent to forming hypercones

Mapped feature vectors that lie inside the hypercone have a distinct class label

Mapped vectors that lie in between hypercones are ambiguous

Hypercones are not allowed to intersect

[Figure: hypercones around the basic category axes in the category space]

Description of Category Vector Spaces

Discussion
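One possible reading of the hypercone rule in code; the angular threshold below is a hypothetical stand-in for the margin the method actually learns:

```python
import numpy as np

def hypercone_label(z, category_vectors, max_angle_deg=30.0):
    """Assign z to a category if it falls inside that category's hypercone.

    category_vectors: K x D array, one category vector per row.
    Returns the category index, or None when z lies between hypercones.
    """
    Y = np.asarray(category_vectors, dtype=float)
    cos = (Y @ z) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(z))
    best = int(np.argmax(cos))
    angle = np.degrees(np.arccos(np.clip(cos[best], -1.0, 1.0)))
    return best if angle <= max_angle_deg else None  # None: ambiguous region
```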

Page 19:

Each pattern now exists as a linear superposition of category vectors in the category space
- Ensures ambiguity is handled at a fundamental level
- Compound categories can be directly represented in the category space
- Can maximize the compound category margin as well as the margins for the basic categories

Advantages Of Category Vector Space

Discussion

Page 20:

Regression: Each input training set feature vector $x_i \in \mathbb{R}^M$ must be mapped to a corresponding point $z_i \in \mathbb{R}^D$, where M is the number of feature dimensions and D is the cardinality of the set of basic categories

Classification: Each mapped feature vector $z_i$ must maximize its margin relative to its own category vector $y$ against the other category vectors $y' \ne y$. Here $y$ is known and corresponds to a category vector

Technical Challenges

Discussion

Page 21:

Regression In Category Space

$$\min_{\{w_a, b_a, \xi_{ia}, \xi^*_{ia}, z_{ia}\}} \; \frac{1}{2} \sum_{a=1}^{D} \langle w_a, w_a \rangle + \lambda \sum_{a=1}^{D} \sum_{i=1}^{N} \left( \xi_{ia} + \xi^*_{ia} \right)$$

subject to the constraints

$$z_{ia} - w_a * x_i - b_a \le \epsilon + \xi_{ia}, \qquad w_a * x_i + b_a - z_{ia} \le \epsilon + \xi^*_{ia}, \qquad \xi_{ia} \ge 0, \quad \xi^*_{ia} \ge 0$$

- $\epsilon$ controls the width of the interval for which there is no penalty
- The slack variable vectors $\xi, \xi^*$ are non-negative component-wise
- The weight vector $w_a$ and bias $b_a$ help map the feature vector to its category-space counterpart
- The choice of kernel K (GRBF or otherwise) is hidden in the operator ∗, which implements inner products by projecting vectors into a suitable space
- The regularization parameter $\lambda$ weighs the norm of $w_a$ against the data-fitting error: the larger the value of $\lambda$, the greater the emphasis on the data-fitting error

Discussion
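As a rough stand-in, assuming scikit-learn is available: fitting one epsilon-insensitive support vector regressor per category axis approximates the mapping above, though the slides solve a single coupled problem rather than D independent ones:

```python
import numpy as np
from sklearn.svm import SVR

def fit_category_regressors(X, Z, epsilon=0.1, C=1.0):
    """Fit one GRBF-kernel SVR per category axis; Z is the N x D target matrix."""
    return [SVR(kernel="rbf", epsilon=epsilon, C=C).fit(X, Z[:, a])
            for a in range(Z.shape[1])]

def map_to_category_space(models, X):
    """Map feature vectors into R^D with the per-axis regressors."""
    return np.column_stack([m.predict(X) for m in models])
```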

Page 22:

Associate each mapped vector $z_i$ with a category vector. Category vectors:
- Can be basis vectors (axes corresponding to basic categories) in the category space
- Or ordinary vectors (corresponding to compound categories)

In this definition of membership, no distinction is made between basic and compound categories.

We seek to maximize the margin in the category space. Minimizing the norm of the mapped vectors is equivalent to maximizing the margin, provided the inequalities can be satisfied:

$$\min_{\{z_i\}} \; \frac{1}{2} \sum_{i=1}^{N} \| z_i \|^2$$

subject to the constraints

$$\langle z_i, y_{a_i} \rangle - \langle z_i, y_b \rangle \ge 1, \qquad b = 1, \ldots, L, \; b \ne a_i, \; i = 1, \ldots, N$$

Classification In Category Space

Discussion
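A minimal sketch of this quadratic program using the cvxpy modeling library; the helper name and the use of an off-the-shelf QP solver are mine, not the authors':

```python
import cvxpy as cp
import numpy as np

def max_margin_embedding(Y, labels):
    """Solve min (1/2) sum_i ||z_i||^2 s.t. <z_i, y_{a_i}> - <z_i, y_b> >= 1.

    Y: L x D array of category vectors; labels: length-N list of indices a_i.
    """
    L, D = Y.shape
    N = len(labels)
    Z = cp.Variable((N, D))  # one mapped vector z_i per row
    constraints = [Z[i] @ (Y[labels[i]] - Y[b]) >= 1
                   for i in range(N) for b in range(L) if b != labels[i]]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Z)), constraints).solve()
    return Z.value
```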

Page 23:

Integrated Classification and Regression Objective Function

The objective function is designed so that we can obtain an integrated dual classification and regression objective:

$$E(w, b, \xi, \xi^*, z) = E_1(w, b, \xi, \xi^*; z) + E_2(z)$$

where $E_1$ is the regression objective and $E_2$ is the category-space margin objective from the preceding slides

Discussion

Page 24:

A Gaussian radial basis function (GRBF) classifier with multiple outputs, one for each basic category

Given a training set of registered and cropped face images with labels {White, Black, Asian}, the GRBF classifier maps the input feature vectors into the category space

Since we know the label of each training set pattern, we can approximate its target location in the mapped category space by its category vector

Multi-Category GRBF

Preliminary Results

$$E_{\text{GRBF}}(W, b) = \sum_{i=1}^{N} \sum_{a=1}^{K} \left( z_{ia} - w_a^T \tilde{x}_i - b_a \right)^2 + \lambda \sum_{a=1}^{K} w_a^T w_a$$

with Gaussian kernel entries and kernelized feature vectors

$$\tilde{x}_{ij} = \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right), \qquad \tilde{x}_i = \left( \tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{iN} \right)^T$$

and a closed-form solution for each category a, with $b_a$ obtained from the corresponding normal equations:

$$w_a = \left( \lambda I + \sum_{i=1}^{N} \tilde{x}_i \tilde{x}_i^T \right)^{-1} \sum_{i=1}^{N} \tilde{x}_i z_{ia}$$

Solution
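A minimal numpy sketch of the reconstructed multi-output GRBF fit; folding the bias b_a into the weights via a constant feature is a simplification on my part, and lam and sigma play the role of the two free parameters:

```python
import numpy as np

def grbf_features(X, centers, sigma):
    """Row i holds exp(-||x_i - x_j||^2 / (2 sigma^2)) over all centers j."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_grbf(X, Z, sigma=1.0, lam=10.0):
    """Regularized least squares: W = (Phi^T Phi + lam I)^-1 Phi^T Z."""
    Phi = grbf_features(X, X, sigma)                  # training points as centers
    Phi = np.hstack([Phi, np.ones((len(X), 1))])      # constant column absorbs b_a
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Z)              # (N+1) x K weight matrix

def grbf_map(W, X_train, X_new, sigma=1.0):
    """Map new feature vectors into the K-dimensional category space."""
    Phi = grbf_features(X_new, X_train, sigma)
    return np.hstack([Phi, np.ones((len(X_new), 1))]) @ W
```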

Page 25:

45 training images from the “Labeled Faces in the Wild” image database
- The database contains over 13,000 images that were captured using the Viola-Jones face detector
- Each face has been labeled with the corresponding name of the person
- Of the 5,749 people featured in the database, 1,680 individuals have multiple images, with each image being unique

In the 45 training images, 15 were from each of the three races considered
The 45 images were registered to one “standard” image (after first converting them to grayscale) using a landmark-based thin-plate spline (TPS)
The landmarks used were:
- Three (3) for each eye
- Two (2) for the nose
- Two (2) for the two ears (very approximate since the ears are often not visible)

After registering the images, they were cropped and resized to 130×90 with the intensity scale adjusted to [0, 1]
The two free parameters were set to 1 and 10; these were carefully but qualitatively chosen to get a good training set separation in category space

Experimental Setup

Category basis vectors:
- White: y1 = [1, 0, 0]^T
- Black: y2 = [0, 1, 0]^T
- Asian: y3 = [0, 0, 1]^T

Preliminary Results

Page 26:

Training set images: Top row: Asian, Middle row: Black, Bottom row: White

Race Classification Training Images

Preliminary Results

Page 27:

Training set images mapped into the category vector space

Category Space For Training Images

Preliminary Results

Page 28:

Test set images: Top row: Asian, Middle row: Black, Bottom row: White
51 test set images (17 Asian, 16 Black, 18 White)
Used the weights discovered by the GRBF classifier to map the input test set images into the category space

Race Classification Testing Images

Preliminary Results

Page 29:

In the graph above we can see the separation in the category space

Category Space Testing Images

Preliminary Results

Page 30:

Pairwise classifications
- Roughly separate each pair by drawing lines through the origin
- Removing the orthogonal subspace that is not being compared against

Pairwise Projection Of Category Space Testing Images

The pairwise separations in the category space show an improved visualization
One could in fact draw separating boundaries in the three pairwise comparisons and obtain an overall decision boundary in 3D

Preliminary Results

Page 31:

Nine ambiguous (from our perspective) faces
- Wanted to exhibit the tolerance of ambiguity that is a hallmark of category spaces
- The conclusion drawn from the result is a subjective one

Ambiguous faces mapped into the category space. Note how they cluster together.

Ambiguity Testing

Preliminary Results

Page 32:

Experiment With MPEG-7 Database

Butterfly

Bat

Bird

Preliminary Results

Page 33:

Experiment With MPEG-7 Database

Fly

Chicken

Batbird

Preliminary Results

Page 34:

3 Class Training

Preliminary Results

Page 35:

3 Class Testing

Preliminary Results

Page 36:

4 Class Training

Preliminary Results

Page 37:

4 Class Testing

Preliminary Results

Page 38:

Summary

The fundamental contribution is the learning of category spaces from patterns
- Ensures ambiguity is handled at a fundamental level
- Compound categories can be directly represented in the category space

The specific approach integrates regression and classification (iCAR)
- Combines a regression objective function (map the patterns) with a maximum margin objective function (perform multicategory classification in category space)

Page 39:

Questions & Discussion

Thank You
