Hybrids of generative and discriminative methods for machine learning

MSRC Summer School - 30/06/2009

Cambridge – UK


Motivation

Generative models
• prior knowledge
• handle missing data such as labels

Discriminative models
• perform well at classification

However
• no straightforward way to combine them

Content

Generative and discriminative methods

A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data


Generative methods

Answer: “What does a cat look like? And a dog?” => joint distribution of data and labels

x : data

c : label

θ : parameters

Generative methods

Objective function: $G(\theta) = p(\theta)\, p(X, C \mid \theta)$

$G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$

1 reusable model per class, can deal with incomplete data

Example: GMMs

Example of generative model
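As a rough illustration of the generative objective, the sketch below evaluates log G(θ) for a classifier with one spherical Gaussian per class. It assumes a flat prior p(θ), and the class names, means and variances are hypothetical values chosen for the example, not taken from the slides.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a spherical Gaussian N(x | mean, var * I)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * d * math.log(2 * math.pi * var) - 0.5 * sq / var

def log_joint(x, c, theta):
    """log p(x, c | theta) = log p(c) + log p(x | c)."""
    prior, mean, var = theta[c]
    return math.log(prior) + log_gauss(x, mean, var)

def log_G(X, C, theta):
    """Generative objective; the prior p(theta) is assumed flat and dropped."""
    return sum(log_joint(x, c, theta) for x, c in zip(X, C))

# Hypothetical parameters: class -> (class prior, mean, variance).
theta = {"cat": (0.5, (0.0, 0.0), 1.0), "dog": (0.5, (3.0, 0.0), 1.0)}
X = [(0.2, -0.1), (2.9, 0.3)]
C = ["cat", "dog"]
print(log_G(X, C, theta))
```

Each class keeps its own density model, which is why a class model can be reused and why a point with a missing label can still contribute through p(x | θ).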

Discriminative methods

Answer: “Is it a cat or a dog?” => posterior distribution of the labels

x : data

c : label

θ : parameters

Discriminative methods

The objective function is $D(\theta) = p(\theta)\, p(C \mid X, \theta)$

$D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$

Focus on regions of ambiguity, make faster predictions

Example: neural networks, SVMs

Example of discriminative model

SVMs / NNs
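A minimal sketch of the discriminative objective, using binary logistic regression as a stand-in for the models named above (logistic regression is a substitution made here for brevity, not a model the slides mention). It assumes a flat prior p(θ); the weights and data are hypothetical.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_posterior(x, c, w, b):
    """log p(c | x, theta) for a binary label c in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return log_sigmoid(z) if c == 1 else log_sigmoid(-z)

def log_D(X, C, w, b):
    """Discriminative objective; the prior p(theta) is assumed flat and dropped."""
    return sum(log_posterior(x, c, w, b) for x, c in zip(X, C))

# Hypothetical weights and data.
w, b = [1.5, -0.5], 0.2
X = [(0.2, -0.1), (2.9, 0.3)]
C = [0, 1]
print(log_D(X, C, w, b))
```

Only p(c | x, θ) is modelled, so nothing is spent on describing how x itself is distributed.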

Generative versus discriminative

No effect of the double mode on the decision boundary


Semi-supervised learning

Few labelled data / lots of unlabelled data

Discriminative methods overfit; generative models only help classification if they are “good”

Need the modelling power of generative models while performing well at discrimination => hybrid models

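Discriminative training (Bach et al., ICASSP 05)

Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$

Using a generative model: $D(\theta) = p(\theta) \prod_n \left[ p(x_n, c_n \mid \theta) / p(x_n \mid \theta) \right]$

$D(\theta) = p(\theta) \prod_n \dfrac{p(x_n, c_n \mid \theta)}{\sum_c p(x_n, c \mid \theta)}$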
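The discriminative objective of a generative model can be sketched by dividing the joint by its sum over classes, done here in log space with a log-sum-exp. This is an illustrative reading of the formula, assuming 1-D data, one Gaussian per class and a flat prior p(θ); the parameter values are hypothetical.

```python
import math

def log_joint(x, c, theta):
    """log p(x, c | theta) for a 1-D Gaussian class-conditional model."""
    prior, mean, var = theta[c]
    return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
            - 0.5 * (x - mean) ** 2 / var)

def logsumexp(vals):
    """Stable log(sum(exp(v))) over a list of floats."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_cond(x, c, theta):
    """log p(c | x, theta) = log p(x, c | theta) - log sum_c' p(x, c' | theta)."""
    return log_joint(x, c, theta) - logsumexp([log_joint(x, k, theta) for k in theta])

def log_D(X, C, theta):
    """Discriminative objective of the generative model (flat prior assumed)."""
    return sum(log_cond(x, c, theta) for x, c in zip(X, C))

# Hypothetical parameters: class -> (class prior, mean, variance).
theta = {0: (0.5, 0.0, 1.0), 1: (0.5, 3.0, 1.0)}
print(math.exp(log_cond(1.5, 0, theta)))
```

The same parameters θ are used as in generative training; only the quantity being maximised changes.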

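Convex combination (Bouchard et al., COMPSTAT 04)

Generative objective function: $G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$

Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$

Convex combination: $\log L(\theta) = \alpha \log D(\theta) + (1 - \alpha) \log G(\theta)$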
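A small sketch of the convex combination, assuming 1-D Gaussian class-conditionals and a flat prior; all parameter values are hypothetical. With a single shared parameter set, the blend reduces to log G minus the blend weight times the summed log marginal of the data, which the last lines check numerically.

```python
import math

def log_joint(x, c, theta):
    """log p(x, c | theta) for a 1-D Gaussian class-conditional model."""
    prior, mean, var = theta[c]
    return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
            - 0.5 * (x - mean) ** 2 / var)

def log_marginal(x, theta):
    """log p(x | theta) = log sum_c p(x, c | theta), computed stably."""
    vals = [log_joint(x, k, theta) for k in theta]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_G(X, C, theta):
    return sum(log_joint(x, c, theta) for x, c in zip(X, C))

def log_D(X, C, theta):
    return sum(log_joint(x, c, theta) - log_marginal(x, theta) for x, c in zip(X, C))

def log_L(alpha, X, C, theta):
    """Blend of the two objectives; alpha = 0 is generative, alpha = 1 discriminative."""
    return alpha * log_D(X, C, theta) + (1 - alpha) * log_G(X, C, theta)

# Hypothetical parameters and data.
theta = {0: (0.5, 0.0, 1.0), 1: (0.5, 3.0, 1.0)}
X, C = [0.2, 2.9, 1.0], [0, 1, 0]
alpha = 0.3
shortcut = log_G(X, C, theta) - alpha * sum(log_marginal(x, theta) for x in X)
print(log_L(alpha, X, C, theta), shortcut)
```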

A principled hybrid model

θ - posterior distribution of the labels

θ’ - marginal distribution of the data

θ and θ’ communicate through a prior

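Hybrid objective function:

$L(\theta, \theta') = p(\theta, \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$

$\theta = \theta'$ => $p(\theta, \theta') = p(\theta)\, \delta(\theta - \theta')$

$L(\theta, \theta') = p(\theta)\, \delta(\theta - \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$

$L(\theta) = G(\theta)$: generative case

$\theta \perp \theta'$ (independent) => $p(\theta, \theta') = p(\theta)\, p(\theta')$

$L(\theta, \theta') = \left[ p(\theta) \prod_n p(c_n \mid x_n, \theta) \right] \left[ p(\theta') \prod_n p(x_n \mid \theta') \right]$

$L(\theta, \theta') = D(\theta)\, f(\theta')$: discriminative case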

Anything in between – hybrid case

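Choice of prior: $p(\theta, \theta') = p(\theta)\, \mathcal{N}(\theta' \mid \theta, \sigma(\alpha))$

$\alpha \to 0$ => $\theta = \theta'$

$\alpha \to 1$ => $\sigma \to \infty$ => $\theta \perp \theta'$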
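The hybrid objective can be sketched with two copies of the parameters tied by a Gaussian coupling term. The sketch below makes several assumptions that are not in the slides: 1-D data, parameters reduced to the class means, a fixed unit variance, equal class priors, a flat p(θ), and the coupling width `sigma` passed in directly as a standard deviation, since the mapping from the trade-off parameter to the width is not given here.

```python
import math

VAR = 1.0  # fixed class variance (assumption)

def log_norm(x, mean, var):
    """Log density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_joint(x, c, means):
    """log p(x, c | means) with equal class priors (assumption)."""
    return -math.log(len(means)) + log_norm(x, means[c], VAR)

def log_marginal(x, means):
    """log p(x | means) = log sum_c p(x, c | means), computed stably."""
    vals = [log_joint(x, k, means) for k in range(len(means))]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_hybrid(theta, theta_p, X, C, sigma, X_unlab=()):
    """log L(theta, theta') with a flat p(theta) and a Gaussian coupling prior."""
    # Coupling prior N(theta' | theta, sigma^2 I) ties the two parameter sets.
    coupling = sum(log_norm(tp, t, sigma ** 2) for t, tp in zip(theta, theta_p))
    # Label posterior terms use theta and the labelled data only.
    disc = sum(log_joint(x, c, theta) - log_marginal(x, theta) for x, c in zip(X, C))
    # Data marginal terms use theta' and may include unlabelled points.
    gen = sum(log_marginal(x, theta_p) for x in list(X) + list(X_unlab))
    return coupling + disc + gen

# Hypothetical class means and data.
theta = [0.0, 3.0]
X, C = [0.2, 2.9, 1.0], [0, 1, 0]
print(log_hybrid(theta, theta, X, C, sigma=0.5))
print(log_hybrid(theta, [1.0, 4.0], X, C, sigma=0.01))
```

A very small `sigma` forces the two parameter sets together, which recovers the generative objective up to a constant; a very large `sigma` lets them drift apart, so the label terms no longer feel the data marginal.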

Why principled?

Consistent with the likelihood of graphical models

=> one way to train a system

Everything can now be modelled => potential to be Bayesian

Potential to learn the trade-off α

Learning

EM / Laplace approximation / MCMC: either intractable or too slow

Conjugate gradients: flexible, easy to check, BUT sensitive to initialisation, slow

Variational inference
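One possible reading of "easy to check" is that gradients used by a gradient-based optimiser can be verified against finite differences. The sketch below does this for a binary logistic log-likelihood, which is a stand-in objective chosen for brevity and not the hybrid objective itself; the data and parameters are hypothetical.

```python
import math

def log_lik(w, X, C):
    """Binary logistic log-likelihood; w = [slope, bias], labels in {0, 1}."""
    total = 0.0
    for x, c in zip(X, C):
        z = w[0] * x + w[1]
        # c * z - log(1 + exp(z)); fine for the small z values used here.
        total += c * z - math.log1p(math.exp(z))
    return total

def grad(w, X, C):
    """Analytic gradient: sum over n of (c_n - sigmoid(z_n)) * [x_n, 1]."""
    g = [0.0, 0.0]
    for x, c in zip(X, C):
        p = 1.0 / (1.0 + math.exp(-(w[0] * x + w[1])))
        g[0] += (c - p) * x
        g[1] += c - p
    return g

def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = []
    for i in range(len(w)):
        hi, lo = list(w), list(w)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

X, C = [-1.0, 0.5, 2.0], [0, 1, 1]
w = [0.3, -0.2]
analytic = grad(w, X, C)
numeric = numeric_grad(lambda v: log_lik(v, X, C), w)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```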


Toy example

2 elongated distributions

Only spherical Gaussians allowed => wrong model

2 labelled points per class => strong risk of overfitting
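To make the model mismatch concrete, the sketch below samples one elongated 2-D cloud and fits a spherical Gaussian to it. The slides do not give the distribution parameters, so the standard deviations, sample size and seed are hypothetical.

```python
import random

def sample_elongated(n, centre, long_sd=3.0, short_sd=0.3, rng=random):
    """Draw n points from an axis-aligned, elongated 2-D Gaussian."""
    return [(centre[0] + rng.gauss(0.0, long_sd),
             centre[1] + rng.gauss(0.0, short_sd)) for _ in range(n)]

def fit_spherical(points):
    """Maximum-likelihood spherical Gaussian: mean and one shared variance."""
    n = len(points)
    mean = tuple(sum(p[i] for p in points) / n for i in range(2))
    var = sum((p[i] - mean[i]) ** 2 for p in points for i in range(2)) / (2 * n)
    return mean, var

random.seed(0)
data = sample_elongated(2000, centre=(0.0, 0.0))
mean, var = fit_spherical(data)
print(mean, var)
```

The single fitted variance lands between the long-axis and short-axis variances, so the spherical model is too wide in one direction and too narrow in the other.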

Toy example

Decision boundaries


A real example

Images are a special case, as they contain several features each

2 levels of supervision: at the image level, and at the feature level
• Image label only => weakly labelled
• Image label + segmentation => fully labelled

The underlying generative model

[Figure: graphical model with Gaussian and multinomial components, shown for weakly and fully labelled data]

Experimental set-up

3 classes: bikes, cows, sheep

1 Gaussian per class => poor generative model

75 training images for each category

HF (hybrid framework)

HF versus CC (convex combination)

Results

When increasing the proportion of fully labelled data, the trend is:

generative → hybrid → discriminative

Weakly labelled data has little influence on the trend

With sufficient fully labelled data, HF tends to perform better than CC

Experimental set-up

3 classes: lions, tigers and cheetahs

1 Gaussian per class => poor generative model

75 training images for each category

HF framework

HF versus CC

Results

Hybrid models consistently perform better

However, generative and discriminative models haven’t reached saturation

No clear difference between HF and CC

Conclusion

Principled hybrid framework

Possibility to learn the best trade-off

Helps for ambiguous datasets when labelled data is scarce

Problem of optimisation

Future avenues

Bayesian version (posterior distribution of α) under study

Replace by a diagonal matrix to allow flexibility => need for the Bayesian version

Choice of priors

Thank you!
