
Chapter 6 The Structural Risk Minimization Principle

Junping Zhang (jpzhang@fudan.edu.cn)

Intelligent Information Processing Laboratory, Fudan University

March 23, 2004

Objectives

Structural risk minimization

Two other induction principles

The Scheme of the SRM induction principle

Real-valued functions

Principle of SRM

SRM

Minimum Description Length and SRM inductive principles

The idea about the Nature of Random Phenomena

Minimum Description Length Principle for the Pattern Recognition Problem

Bounds for the MDL

SRM for the simplest model and MDL

The shortcoming of the MDL

The idea about the Nature of Random Phenomena

Probability theory (1930s, Kolmogorov): formal inference.

The axiomatization did not consider the nature of randomness; the axioms take the probability measures as given.

The idea about the Nature of Random Phenomena

The model of randomness: Solomonoff (1965), Kolmogorov (1965), Chaitin (1966).

Algorithmic (descriptive) complexity

The length of the shortest binary computer program that reproduces the object.

Up to an additive constant, it does not depend on the type of computer.

It is therefore a universal characteristic of the object.

A relatively long string describing an object is random if the algorithmic complexity of the object is high, i.e., if the given description of the object cannot be compressed significantly.

MML (Wallace and Boulton, 1968) & MDL (Rissanen, 1978): algorithmic complexity as a main tool of inductive inference for learning machines.
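As a sketch of the formal definitions behind this slide (standard Solomonoff–Kolmogorov–Chaitin material; these formulas are not in the original slides):

    K_U(x) = \min \{\, |p| : U(p) = x \,\}    % length of the shortest program p printing x on universal computer U
    |K_U(x) - K_V(x)| \le c_{U,V}             % invariance: changing the computer shifts K(x) only by a constant
    K(x) \ge |x| - c                          % x counts as random: its description cannot be compressed

Here |p| is the length of the binary program p, and c, c_{U,V} are constants independent of x.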

Minimum Description Length Principle for the Pattern Recognition Problem

Given l pairs (x_1, ω_1), …, (x_l, ω_l), each containing a vector x_i and a binary value ω_i.

Consider two strings: the binary string ω_1, …, ω_l (146) and the string of vectors x_1, …, x_l (147).

Question

Q: Given (147), is the string (146) a random object?

A: Analyze the complexity of the string (146) in the spirit of the Solomonoff–Kolmogorov–Chaitin ideas: try to compress its description.

Since the ω_i, i = 1, …, l, are binary values, the string (146) is described by l bits.

Since the training pairs were drawn randomly and independently, the value ω_i depends on the vector x_i but not on the vectors x_j, j ≠ i.

Model

In the simplest model the code book contains a perfect table, i.e., a table T that reproduces the string (146) without error; describing (146) then amounts to describing the index of T.

General case: the code book does not contain a perfect table, so one must additionally describe the entries on which the best table errs.
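A sketch of the resulting compression coefficients, assuming a fixed code book of N tables and following the standard MDL-for-classification bookkeeping (lower-order terms for describing the code book itself are absorbed into \Delta):

    K(T) = \frac{\log_2 N}{l}                                   % perfect table: only the index of T is needed
    K(T) = \frac{\log_2 N + \log_2 \binom{l}{d} + \Delta}{l}    % general case: the best table makes d errors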

Randomness

Bounds for the MDL

Q: Does the compression coefficient K(T) determine the probability of the test error when classifying (decoding) vectors x by the table T?

A: Yes
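As a hedged sketch of the bound behind this answer (this is the shape of Vapnik's MDL bound; the slides do not reproduce the formula, so the constants should be checked against the original source): with probability at least 1 − η,

    \Pr\{\text{error of table } T\} < 2 \left( K(T)\ln 2 - \frac{\ln \eta}{l} \right)

so the test error is controlled by the compression coefficient K(T) alone.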

Comparison between the MDL and ERM in the simplest model

SRM for the Simplest Model and MDL

The power of the compression coefficient

To obtain a bound on the probability of error, only the value of the compression coefficient needs to be known.

The power of the compression coefficient

One does not need to know:

how many examples we used;

how the structure of code books was organized;

which code book was used and how many tables were in this code book;

how many errors were made by the table from the code book we used.

MDL principle

To minimize the probability of error, one has to minimize the coefficient of compression.
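A minimal Python sketch of this bookkeeping, under the assumptions above (the function names are illustrative, lower-order description terms are omitted, and the bound uses the shape quoted earlier):

    import math

    def compression_coefficient(l, n_tables, d=0):
        """Bits per label needed to describe l binary labels using a code
        book of n_tables tables whose best table makes d errors."""
        bits = math.log2(n_tables)              # index of the chosen table
        if d > 0:
            bits += math.log2(math.comb(l, d))  # which d labels to correct
        return bits / l

    def mdl_error_bound(K, l, eta=0.05):
        """Shape of the MDL test-error bound, holding with prob. >= 1 - eta."""
        return 2 * (K * math.log(2) - math.log(eta) / l)

    # Example: 1000 labels, a code book of 2**20 tables, best table makes 30 errors.
    K = compression_coefficient(1000, 2**20, d=30)
    print(K, mdl_error_bound(K, 1000))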

The shortcoming of the MDL

MDL uses code books with a finite number of tables.

When the admissible set of functions depends continuously on parameters, one has to first quantize that set to make the tables.

Quantization

How do we make a ‘smart’ quantization for a given number of observations?

For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability?

The shortcoming of the MDL

Finding a good quantization is extremely difficult, and this is the main shortcoming of the MDL principle.

The MDL principle works well when the problem of constructing reasonable code books has a good solution.

Consistency of the SRM principle and asymptotic bounds on the rate of convergence

Q: Is the SRM principle consistent? What is the bound on the (asymptotic) rate of convergence?

Consistency of the SRM principle.

Simplified version

Remark

To avoid choosing the minimum of functional (156) over an infinite number of elements of the structure, add an additional constraint: choose the minimum from the first l elements of the structure, where l is equal to the number of observations.
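Functional (156) is not reproduced in these slides; as a hedged sketch, in the standard SRM setting it is the guaranteed risk minimized jointly over the element S_n of the structure and the function within it, roughly of the shape

    n^* = \arg\min_{1 \le n \le l} \left\{ R_{\mathrm{emp}}(f_n) + \Phi\!\left(\frac{h_n}{l}\right) \right\}

where f_n minimizes the empirical risk in S_n, h_n is the VC dimension of S_n, and Φ is the confidence term of the corresponding VC bound.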

Discussions and Example

The rate of convergence is determined by two contradictory requirements on the rule n = n(l).

The first summand (approximation): the larger n = n(l), the smaller the deviation.

The second summand (confidence): the larger n = n(l), the larger the deviation.

For structures with a known bound on the rate of approximation, select the rule n = n(l) that assures the largest rate of convergence; a numerical illustration of this trade-off follows.
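A minimal sketch of the trade-off, assuming an illustrative model in which the n-th element of the structure has VC dimension n and a known approximation rate c/n^s (all names and constants here are hypothetical, chosen only to show how the selected n(l) grows with l):

    import math

    def guaranteed_risk(n, l, c=1.0, s=1.0):
        """Illustrative guaranteed risk: approximation term plus confidence term."""
        approximation = c / n**s                                   # first summand: shrinks as n grows
        confidence = math.sqrt(n * (math.log(2 * l / n) + 1) / l)  # second summand: grows with n
        return approximation + confidence

    def best_element(l):
        """SRM with the extra constraint: search only the first l elements."""
        return min(range(1, l + 1), key=lambda n: guaranteed_risk(n, l))

    for l in (50, 200, 1000, 5000):
        print(l, best_element(l))  # the selected n = n(l) grows slowly with l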

Bounds for the regression estimation problem

The model of regression estimation by series expansion
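As a sketch of this model (the standard setup; the slides do not reproduce the formula), the regression is estimated in the form

    f_n(x, \alpha) = \sum_{k=1}^{n} \alpha_k \,\varphi_k(x)

where {φ_k} is a fixed system of basis functions and the element S_n of the structure contains the expansions with n terms.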

Example

The problem of approximating functions

To get a high asymptotic rate of approximation, the only constraint is that the kernel should be a bounded function that can be described as a family of functions possessing finite VC dimension.

Problem of local risk minimization

Local Risk Minimization Model

Note

Using local risk minimization methods, one probably does not need rich sets of approximating functions, whereas the classical semi-local methods are based on using a set of constant functions.

Note

For local estimation of functions in the one-dimensional case, it is probably enough to consider elements S_k, k = 0, 1, 2, 3, containing the polynomials of degree 0, 1, 2, 3.
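A minimal sketch of such a local estimator, assuming a standard locally weighted least-squares formulation with a bounded kernel (the kernel choice, bandwidth, and names are illustrative, not from the slides):

    import numpy as np

    def local_poly_estimate(x0, x, y, degree=1, bandwidth=0.5):
        """Fit a degree-`degree` polynomial around x0 by weighted least
        squares with a bounded (Epanechnikov-style) kernel; return f(x0)."""
        u = (x - x0) / bandwidth
        w = np.clip(1.0 - u**2, 0.0, None)                  # bounded kernel weights
        X = np.vander(x - x0, degree + 1, increasing=True)  # 1, (x-x0), (x-x0)^2, ...
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        return beta[0]                                      # polynomial value at x0

    # Example: estimate a noisy sine at x0 = 0.3 with degrees k = 0..3.
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
    for k in range(4):
        print(k, local_poly_estimate(0.3, x, y, degree=k, bandwidth=0.15))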

Summary: MDL, SRM, local risk functional
