108
Chapter 6 The Structural Risk Minimization Principle Junping Zhang [email protected] Intelligent Information Processing Labor atory, Fudan University March 23, 2004

Chapter 6 The Structural Risk Minimization Principle Junping Zhang [email protected] Intelligent Information Processing Laboratory, Fudan University

Embed Size (px)

Citation preview

Page 1: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Chapter 6 The Structural Risk Minimization Principle

Junping [email protected]

Intelligent Information Processing Laboratory, Fudan UniversityMarch 23, 2004

Page 2: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Objectives

Page 3: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Structural risk minimization

Page 4: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Two other induction principles

Page 5: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The Scheme of the SRM induction principle

Page 6: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Real-Valued functions

Page 7: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 8: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 9: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 10: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 11: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 12: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 13: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Principle of SRM

Page 14: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 15: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 16: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 17: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 18: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 19: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 20: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 21: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

SRM

Page 22: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 23: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Minimum Description Length and SRM inductive principles

The idea about the Nature of Random Phenomena

Minimum Description Length Principle for the Pattern Recognition Problem

Bounds for the MDL SRM for the simplest Model and MDL The Shortcoming of the MDL

Page 24: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The idea about the Nature of Random Phenomena

Probability theory (1930s, Kolmogrov) Formal inference Axiomatization hasn’t considered nat

ure of randomness Axioms: given probability measures

Page 25: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The idea about the Nature of Random Phenomena

The model of randomness Solomonoff (1965), Kolmogrov (1965), Ch

aitin (1966). Algorithm (descriptive) complexity

The length of the shortest binary computer program

Up to an additive constant does not depend on the type of computer.

Universal characteristic of the object.

Page 26: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

A relatively large string describing an object is random If algorithm complexity of an object is high If the given description of an object cannot be

compressed significantly. MML (Wallace and Boulton, 1968)& MDL

(Rissanen, 1978) Algorithm Complexity as a main tool of induc

tion inference of learning machines

Page 27: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Minimum Description Length Principle for the Pattern Recognition Problem

Given l pairs containing the vector x and the binary value ω

Consider two strings: the binary string

Page 28: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Question

Q: Given (147), is the string (146) a random object?

A: to analyze the complexity of the string (146) in the spirit of Solomonoff-Kolmogorov-Chaitin ideas

Page 29: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Compress its description

Since ω i i=1,…l are binary values, the string (146) is described by l bits.

Since training pairs were drawn randomly and independently.

The value ω i depend on the vector xi but not on the vector xj.

Page 30: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Model

Page 31: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 32: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 33: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 34: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

General Case: not contain the perfect table.

Page 35: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 36: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Randomness

Page 37: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Bounds for the MDL

Q: Does the compression coefficient

K(T) determine the probability of the test error in classification (decoding) vectors x by the table T?

A: Yes

Page 38: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Comparison between the MDL and ERM in the simplest model

Page 39: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 40: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 41: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

SRM for the simplest Model and MDL

Page 42: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

SRM for the simplest Model and MDL

Page 43: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 44: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The power of compression coefficient

To obtain bound for the probability of error

Only information about the coefficient need to be known.

Page 45: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The power of compression coefficient

How many examples we used How the structure of code books was

organized Which code book was used and how

many tables were in this code book. How many errors were made by the

table from the code book we used.

Page 46: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

MDL principle

To minimize the probability of error One has to minimize the coefficient

of compression

Page 47: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The shortcoming of the MDL

MDL uses code books with a finite number of tables.

Continuously depends on parameters, one has to first quantize that set to make the tables.

Page 48: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Quantization

How do we make the ‘smart’ quantization for a given number of observations.

For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability?

Page 49: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The shortcoming of the MDL

Finding a good quantization is extremely difficult and determines the main shortcoming of MDL principle.

The MDL principle works well when the problem of constructing reasonable code books has a good solution.

Page 50: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Consistency of the SRM principle and asymptotic bounds on the rate of convergence

Q: Is the SRM consistent? What is the bound on the

(asymptotic) rate of convergence?

Page 51: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 52: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 53: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Consistency of the SRM principle.

Page 54: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Simplification version

Page 55: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 56: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 57: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 58: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 59: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Remark

To avoid choosing the minimum of functional (156) over the infinite number of elements of the structure.

Additional constraint Choose the minimum from the first l

elements of the structure where l is equal to the number of observations.

Page 60: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 61: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 62: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Discussions and Example

Page 63: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The rate of convergence is determined by two contradictory requirements on the rule n=n(l). The first summand: The larger n=n(l) , the small

er is the deviation The second summand: The larger n=n(l), the lar

ger deviation For structures with a known bound on the r

ate of approximation, select the rule that assures the largest rate of convergence.

Page 64: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 65: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 66: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 67: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 68: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 69: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 70: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Bounds for the regression estimation problem

Page 71: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The model of regression estimation by series expansion

Page 72: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 73: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 74: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 75: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 76: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 77: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Example

Page 78: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 79: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 80: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

The problem of approximating functions

Page 81: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 82: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 83: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 84: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 85: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 86: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 87: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 88: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 89: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 90: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 91: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

To get high asymptotic rate of approximation

the only constraint is that the kernel should be a bounded

function which can be described as a family of functions possessing finite VC dimension.

Page 92: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Problem of local risk minimization

Page 93: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 94: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 95: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 96: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Local Risk Minimization Model

Page 97: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 98: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 99: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 100: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 101: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 102: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 103: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 104: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 105: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University
Page 106: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Note

Using local risk minimization methods, one probably does not need rich sets of approximating functions. Whereas the classical semi-local

methods are based on using a set of constant functions.

Page 107: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Note

For local estimation functions in the one-dimensional case, it is probably enough to consider elements Sk, k=0,1,2,3 containing the polynomials of degree 0,1,2,3

Page 108: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University

Summary MDL SRM Local Risk Functional