MSRC Summer School - 30/06/2009
Cambridge – UK
Hybrids of generative and discriminative methods for machine learning
Motivation
Generative models
• prior knowledge
• handle missing data such as labels
Discriminative models
• perform well at classification
However
• no straightforward way to combine them
Content
Generative and discriminative methods
A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data
Generative methods
Answer: “what does a cat look like? and a dog?” => joint distribution of data and labels, $p(x, c \mid \theta)$
x : data
c : label
$\theta$ : parameters
Generative methods
Objective function: $G(\theta) = p(\theta)\, p(X, C \mid \theta)$
$G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$
1 reusable model per class, can deal with incomplete data
Example: GMMs
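As an illustration of the generative objective, here is a minimal sketch that evaluates $\sum_n \log p(x_n, c_n \mid \theta)$. The model is an assumption made for the example, not taken from the slides: one spherical Gaussian per class with a shared variance, and the $\log p(\theta)$ term is left out. The names `log_joint` and `log_G` are hypothetical.

```python
import math

def log_joint(x, k, means, priors, var=1.0):
    """log p(x, c=k | theta) for an assumed spherical Gaussian per class."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, means[k]))
    log_lik = -0.5 * sq_dist / var - 0.5 * d * math.log(2 * math.pi * var)
    return math.log(priors[k]) + log_lik

def log_G(X, C, means, priors, var=1.0):
    """sum_n log p(x_n, c_n | theta); the log p(theta) term is omitted."""
    return sum(log_joint(x, c, means, priors, var) for x, c in zip(X, C))

# Hypothetical data: two 2-D points, each sitting on its own class mean.
X = [[0.0, 0.0], [2.0, 0.0]]
C = [0, 1]
means = [[0.0, 0.0], [2.0, 0.0]]
priors = [0.5, 0.5]
print(log_G(X, C, means, priors))
```

Because each class has its own term, a class model can be fitted or reused without touching the others.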
Example of generative model
Discriminative methods
Answer: “is it a cat or a dog?” => posterior distribution of the labels, $p(c \mid x, \theta)$
x : data
c : label
$\theta$ : parameters
Discriminative methods
The objective function is $D(\theta) = p(\theta)\, p(C \mid X, \theta)$
$D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$
Focus on regions of ambiguity, make faster predictions
Example: neural networks, SVMs
Example of discriminative model
SVMs / NNs
Generative versus discriminative
No effect of the double mode on the decision boundary
Semi-supervised learning
Few labelled data / lots of unlabelled data
Discriminative methods overfit; generative models only help classification if they are “good”
Need the modelling power of generative models together with good discriminative performance => hybrid models
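Discriminative training (Bach et al., ICASSP 05)
Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$
Using a generative model: $D(\theta) = p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{p(x_n \mid \theta)}$
$D(\theta) = p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{\sum_c p(x_n, c \mid \theta)}$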
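The step from the joint to the label posterior can be sketched in code. The snippet below assumes the per-class log joint values $\log p(x_n, c \mid \theta)$ are already available as a table; the table entries are made-up numbers for illustration, and `log_D` and `log_sum_exp` are hypothetical names. The $\log p(\theta)$ term is omitted.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_D(log_joints, C):
    """sum_n [ log p(x_n, c_n | theta) - log sum_c p(x_n, c | theta) ]."""
    return sum(row[c] - log_sum_exp(row) for row, c in zip(log_joints, C))

# Hypothetical table: log_joints[n][k] = log p(x_n, c=k | theta).
log_joints = [
    [math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.2)],
]
C = [0, 1]
print(log_D(log_joints, C))
```

Only the ratio between classes matters here, so rescaling a row of the table by a constant leaves the result unchanged.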
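Convex combination (Bouchard et al., COMPSTAT 04)
Generative objective function: $G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$
Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$
Convex combination: $\log L(\theta) = \alpha \log D(\theta) + (1 - \alpha) \log G(\theta)$, $\alpha \in [0, 1]$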
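Expanding the two objectives makes the role of the weight explicit. The short derivation below writes $\alpha$ for the weight in $[0, 1]$ (the symbol is an assumption) and uses $p(c_n \mid x_n, \theta) = p(x_n, c_n \mid \theta) / p(x_n \mid \theta)$:

```latex
\begin{align*}
\log L(\theta)
  &= \alpha \log D(\theta) + (1 - \alpha) \log G(\theta) \\
  &= \log p(\theta) + \sum_n \log p(x_n, c_n \mid \theta)
     - \alpha \sum_n \log p(x_n \mid \theta)
\end{align*}
```

So $\alpha = 0$ recovers $G(\theta)$, $\alpha = 1$ recovers $D(\theta)$, and intermediate values partially discount the marginal likelihood of the data.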
A principled hybrid model
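$\theta$ - posterior distribution of the labels
$\theta'$ - marginal distribution of the data
$\theta$ and $\theta'$ communicate through a prior
Hybrid objective function:
$L(\theta, \theta') = p(\theta, \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$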
A principled hybrid model
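$\theta = \theta'$ => $p(\theta, \theta') = p(\theta)\, \delta(\theta - \theta')$
$L(\theta, \theta') = p(\theta)\, \delta(\theta - \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$
$L(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)\, p(x_n \mid \theta) = G(\theta)$ : generative case
$\theta \perp \theta'$ (independent) => $p(\theta, \theta') = p(\theta)\, p(\theta')$
$L(\theta, \theta') = \left[ p(\theta) \prod_n p(c_n \mid x_n, \theta) \right] \left[ p(\theta') \prod_n p(x_n \mid \theta') \right]$
$L(\theta, \theta') = D(\theta)\, f(\theta')$ : discriminative case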
A principled hybrid model
Anything in between – hybrid case
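Choice of prior: $p(\theta, \theta') = p(\theta)\, \mathcal{N}(\theta' \mid \theta, \sigma(\alpha))$
$\alpha \to 0$ => $\theta = \theta'$
$\alpha \to 1$ => $\sigma \to \infty$ => $\theta \perp \theta'$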
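To make the hybrid objective concrete, here is a minimal sketch for a 1-D two-class problem. Several pieces are assumptions made for the example and are not stated on the slides: the parameters are just the class means, class-conditionals are unit-variance Gaussians with equal class priors, $p(\theta)$ is flat, the second argument of the coupling Gaussian is treated as a variance, and the mapping $\sigma(\alpha) = (\alpha / (1 - \alpha))^2$ is one possible choice that has the two limits listed above. Unlabelled points are marked with `None` and contribute only to the marginal term.

```python
import math

def log_norm(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (x - mean) ** 2 / var - 0.5 * math.log(2 * math.pi * var)

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sigma(alpha):
    """Assumed mapping from alpha in (0, 1) to the coupling variance."""
    return (alpha / (1.0 - alpha)) ** 2

def log_hybrid(theta, theta_p, X, C, alpha):
    """log L(theta, theta') with flat p(theta); theta, theta_p are class means."""
    log_pi = -math.log(len(theta))  # equal class priors (assumption)
    var = sigma(alpha)
    # Coupling prior N(theta' | theta, sigma(alpha)), one term per class mean.
    total = sum(log_norm(tp, t, var) for t, tp in zip(theta, theta_p))
    for x, c in zip(X, C):
        if c is not None:
            # Discriminative term log p(c_n | x_n, theta), labelled points only.
            joint = [log_pi + log_norm(x, m, 1.0) for m in theta]
            total += joint[c] - log_sum_exp(joint)
        # Generative term log p(x_n | theta'), all points.
        total += log_sum_exp([log_pi + log_norm(x, m, 1.0) for m in theta_p])
    return total

# Hypothetical data: two labelled points and one unlabelled point.
X = [-1.0, 1.0, 0.2]
C = [0, 1, None]
print(log_hybrid([-1.0, 1.0], [-0.8, 1.1], X, C, alpha=0.5))
```

With a small `alpha` the coupling term heavily penalises any gap between the two parameter sets, which pushes the optimum towards the generative solution; with `alpha` close to 1 the two sets are nearly free to differ.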
Why principled?
Consistent with the likelihood of graphical models
=> one way to train a system
Everything can now be modelled => potential to be Bayesian
Potential to learn the trade-off $\alpha$
Learning
EM / Laplace approximation / MCMC: either intractable or too slow
Conjugate gradients: flexible, easy to check, BUT sensitive to initialisation, slow
Variational inference
Toy example
2 elongated distributions
Only spherical Gaussians allowed => wrong model
2 labelled points per class => strong risk of overfitting
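A data set of this kind could be generated as sketched below. All numbers (class centres, spreads, sample size, seed) are hypothetical and chosen only to give two elongated classes with two labelled points each; the function name `make_toy_data` is also an assumption.

```python
import random

def make_toy_data(n_per_class=100, n_labelled=2, seed=0):
    """Two elongated 2-D Gaussian classes; only a few points keep their label."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (0.0, 3.0)]  # hypothetical class centres
    std_x, std_y = 4.0, 0.5             # hypothetical: wide in x, narrow in y
    X, C = [], []
    for k, (cx, cy) in enumerate(centres):
        for i in range(n_per_class):
            X.append((rng.gauss(cx, std_x), rng.gauss(cy, std_y)))
            # The first n_labelled points of each class keep their label.
            C.append(k if i < n_labelled else None)
    return X, C

X, C = make_toy_data()
print(len(X), sum(c is not None for c in C))
```

A spherical Gaussian cannot match the different spreads along the two axes, which is the sense in which the model is wrong for this data.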
Toy example
Decision boundaries
A real example
Images are a special case, as they contain several features each
2 levels of supervision: at the image level, and at the feature level
• Image label only => weakly labelled
• Image label + segmentation => fully labelled
The underlying generative model
[Figure: graphical model with node labels Gaussian, multinomial, multinomial; shown for weakly labelled and fully labelled data]
Experimental set-up
3 classes: bikes, cows, sheep
: 1 Gaussian per class => poor generative model
75 training images for each category
HF framework
HF versus CC
Results
When increasing the proportion of fully labelled data, the trend is:
generative → hybrid → discriminative
Weakly labelled data has little influence on the trend
With sufficient fully labelled data, HF tends to perform better than CC
Experimental set-up
3 classes: lions, tigers and cheetahs
: 1 Gaussian per class => poor generative model
75 training images for each category
HF framework
HF versus CC
Results
Hybrid models consistently perform better
However, generative and discriminative models haven’t reached saturation
No clear difference between HF and CC
Conclusion
Principled hybrid framework
Possibility to learn the best trade-off
Helps for ambiguous datasets when labelled data is scarce
Problem of optimisation
Future avenues
Bayesian version (posterior distribution of $\alpha$) under study
Replace $\sigma$ by a diagonal matrix to allow flexibility => need for the Bayesian version
Choice of priors
Thank you!