
PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers


Page 1: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

ICML 2005

François Laviolette and Mario Marchand, Université Laval

Page 2: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

PLAN

The “traditional” PAC-Bayes theorem (for the usual data-independent setting)

The “generalized” PAC-Bayes theorem (for the more general sample compression setting)

Implications and follow-ups

Page 3: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

A result from folklore:
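The equation for this result did not survive the transcript. A plausible statement of the folklore (Occam's razor) bound, assuming a countable hypothesis class H and a data-independent distribution P over H, is: for any δ ∈ (0,1],

\Pr_{S\sim D^m}\!\left( \forall h\in H:\;\; R(h) \;\le\; R_S(h) + \sqrt{\frac{\ln\frac{1}{P(h)} + \ln\frac{1}{\delta}}{2m}} \right) \;\ge\; 1-\delta ,

where R(h) is the true risk of h and R_S(h) its empirical risk on the m training examples.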

Page 4: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

In particular, for Gibbs classifiers:
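The specialization shown on this slide is missing from the transcript; what follows is one natural way it could read. The Gibbs classifier G_Q draws h ~ Q for each example to classify, so

R(G_Q) = \mathbb{E}_{h\sim Q}\,R(h), \qquad R_S(G_Q) = \mathbb{E}_{h\sim Q}\,R_S(h),

and averaging the folklore bound over h ~ Q gives, with probability at least 1−δ,

R(G_Q) \;\le\; R_S(G_Q) + \mathbb{E}_{h\sim Q}\sqrt{\frac{\ln\frac{1}{P(h)} + \ln\frac{1}{\delta}}{2m}} .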

What if we choose P after observing the data?

Page 5: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The “traditional” PAC-Bayes Theorem
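The theorem statement itself is not in the transcript. The usual form (due to Langford and Seeger), presumably what the slide displayed, reads: for any data-independent prior P over H and any δ ∈ (0,1], with probability at least 1−δ over the draw of S ~ D^m,

\forall Q:\quad \mathrm{kl}\!\left(R_S(G_Q)\,\middle\|\,R(G_Q)\right) \;\le\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{m+1}{\delta}}{m},

where kl(q‖p) = q ln(q/p) + (1−q) ln((1−q)/(1−p)) and KL(Q‖P) is the Kullback–Leibler divergence between the posterior Q and the prior P.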

Page 6: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The Gibbs and the majority vote

We have a bound for G_Q, but we normally use instead the Bayes classifier B_Q (which is the Q-weighted majority vote classifier).

Consequently R(B_Q) ≤ 2 R(G_Q) (this can be improved with the “de-randomization” technique of Langford and Shawe-Taylor, 2003).

So the PAC-Bayes theorem also gives a bound on the majority vote classifier.
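The factor of 2 follows from a standard argument not spelled out on the slide: whenever B_Q errs on an example (x, y), at least half of the Q-weight of classifiers errs on it, so

\mathbb{E}_{h\sim Q}\,\mathbb{1}[h(x)\neq y] \;\ge\; \tfrac{1}{2}\,\mathbb{1}[B_Q(x)\neq y],

and taking the expectation over (x, y) drawn from the data distribution yields R(G_Q) ≥ R(B_Q)/2, i.e. R(B_Q) ≤ 2 R(G_Q).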

Page 7: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The sample compression setting

Theorem 1 is valid in the usual data-independent setting, where H is defined without reference to the training data.

Example: H = the set of all linear classifiers h: R^n → {-1, +1}.

In the more general sample compression setting, each classifier is identified by two different sources of information:

The compression set: an (ordered) subset of the training set.

A message string of additional information needed to identify a classifier.

Theorem 1 is not valid in this more general setting.

Page 8: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

To be more precise: in the sample compression setting, there exists a “reconstruction” function R that gives a classifier

h = R(σ, S_i)

when given a compression set S_i and a message string σ.

Recall that S_i is an ordered subset of the training set S, where the order is specified by i = (i_1, i_2, …, i_|i|).
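As a concrete illustration (not taken from the slides), consider the nearest-neighbour classifiers listed among the examples on the next slide: the compression set can be a set of stored prototypes and the message string can be left empty. A minimal Python sketch of such a reconstruction function, with hypothetical names:

    import numpy as np

    def reconstruct_1nn(message, compression_set):
        """Reconstruction function R(sigma, S_i) for a 1-nearest-neighbour
        classifier: the message string is unused and the classifier is fully
        determined by the labelled prototypes in the compression set."""
        xs = np.array([x for x, _ in compression_set], dtype=float)
        ys = np.array([y for _, y in compression_set])

        def classifier(x):
            # Predict the label of the closest prototype in the compression set.
            distances = np.linalg.norm(xs - np.asarray(x, dtype=float), axis=1)
            return ys[np.argmin(distances)]

        return classifier

    # Usage: a compression set of two labelled points in R^2, empty message.
    S_i = [((0.0, 0.0), -1), ((1.0, 1.0), +1)]
    h = reconstruct_1nn(message="", compression_set=S_i)
    print(h((0.9, 0.8)))  # prints 1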

Page 9: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

Examples

Set Covering Machines (SCM) [Marchand and Shawe-Taylor, JMLR 2002]

Decision List Machines (DLM) [Marchand and Sokolova, JMLR 2005]

Support Vector Machines (SVM), where the compression set is the set of support vectors

Nearest neighbour classifiers (NNC)

…

Page 10: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

Priors in the sample compression setting

We will thus use priors defined over the set of all the parameters (i, σ) needed by the reconstruction function R, once a training set S is given.

The priors must be data-independent.

The priors should be written as:
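The formula for the prior is missing from the transcript. Assuming the convention of the authors' related work, the prior is a distribution over the pairs (i, σ), commonly written in the factorized form

P(i, \sigma) \;=\; P_{\mathcal{I}}(i)\,P_{\mathcal{M}}(\sigma \mid i), \qquad \sum_{i}\sum_{\sigma} P(i,\sigma) = 1,

where P_I is a distribution over the possible index vectors i and P_M(· | i) a distribution over the message strings compatible with the compression set S_i; the notation P_I, P_M here is illustrative.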

Page 11: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The “generalized” PAC-Bayes Theorem

Page 12: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The bound incorporates Occam's principle of parsimony.

The new PAC-Bayes theorem states that the risk bound for the sample-compressed Gibbs classifier G_Q can be lower than the risk bound for any of the individual classifiers it averages over.

Page 13: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The PAC-Bayes theorem for bounded compression set size

Page 14: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

Conclusion

The new PAC-Bayes bound:

is valid in the more general sample compression setting;

automatically incorporates Occam's principle of parsimony.

A sample-compressed Gibbs classifier can have a smaller risk bound than any of its members.

Page 15: PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers

The next steps

Finding derived bounds for particular sample-compressed classifiers such as majority votes of SCMs and DLMs, SVMs, and NNCs.

Developing new learning algorithms based on the theoretical information given by the bound.

A tight risk bound for majority vote classifiers?