
IDSIA Lugano Switzerland

On the Convergence Speed of MDL Predictions for Bernoulli Sequences

or

Is MDL Really So Bad?

Jan Poland and Marcus Hutter

Big Picture

[Diagram: MDL, Bayes, and other methods, e.g. PAC-Bayes]

Bernoulli Classes

$\vartheta$   $w_\vartheta$   Code
0             1/4             00
1             1/4             01
1/2           1/4             10
1/4           1/16            1100
3/4           1/16            1101
1/8           1/64            111000
3/8           1/64            111001
5/8           1/64            111010
7/8           1/64            111011

Code structure, e.g. for $\vartheta = 5/8$:  Code $= \underbrace{111}_{1+\#\text{bits}}\;\underbrace{0}_{\text{stop}}\;\underbrace{10}_{\text{data}}$

• Set of parameters $\Theta = \{\vartheta_1, \vartheta_2, \dots\} \subset [0,1]$
• Weights $w_\vartheta$ for each $\vartheta \in \Theta$
• Weights correspond to codes: $w_\vartheta = 2^{-\ell(\mathrm{Code}_\vartheta)}$, where $\ell$ denotes the code length
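The rows above follow a simple recipe, which the short Python sketch below reconstructs. This is our illustration (the names code_for and weight are ours, not from the talk), assuming the pattern $1^{1+b}\,0\,\langle b \text{ data bits}\rangle$ for $\vartheta = (2d+1)/2^{b+1}$ and the special codes 00, 01 for the endpoints:

    from fractions import Fraction

    def code_for(theta):
        """Prefix code from this slide: '00' and '01' for the endpoints 0 and 1;
        otherwise theta = (2d+1)/2^(b+1) is coded as 1^(1+b), a stop bit 0,
        then the b data bits of d."""
        if theta == 0:
            return "00"
        if theta == 1:
            return "01"
        b = theta.denominator.bit_length() - 2   # denominator is 2^(b+1)
        d = (theta.numerator - 1) // 2           # odd numerator is 2d+1
        data = format(d, "0{}b".format(b)) if b else ""
        return "1" * (1 + b) + "0" + data

    def weight(theta):
        """Weights correspond to codes: w_theta = 2^-(code length)."""
        return Fraction(1, 2 ** len(code_for(theta)))

    # Reproduce the table above:
    for t in [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(1, 4),
              Fraction(3, 4), Fraction(1, 8), Fraction(3, 8),
              Fraction(5, 8), Fraction(7, 8)]:
        print(str(t).ljust(4), str(weight(t)).ljust(5), code_for(t))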


Estimators

• Given observed sequence $x = x_1 x_2 \dots x_n$
• Probability of $x$ given $\vartheta$:  $p_\vartheta(x) = \vartheta^{\#\mathrm{ones}(x)}\, (1-\vartheta)^{n-\#\mathrm{ones}(x)}$
• Posterior weights:  $w_\vartheta(x) = \dfrac{w_\vartheta\, p_\vartheta(x)}{\sum_\vartheta w_\vartheta\, p_\vartheta(x)}$
• Bayes mixture:  $\xi(x) = \sum_\vartheta w_\vartheta(x)\, \vartheta$
• MDL/MAP:  $\vartheta^*(x) = \arg\max_\vartheta w_\vartheta(x)$
• Maximum Likelihood (ML): same as MAP, but with prior weights set to 1
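A minimal sketch of these estimators (our illustration: $\Theta$ is truncated to the finite table from the previous slide, and argmax ties resolve to the first maximizer):

    from fractions import Fraction

    # Truncated Bernoulli class from the previous slide: w_theta = 2^-(code length)
    CLASS_W = {
        Fraction(0): Fraction(1, 4),     Fraction(1): Fraction(1, 4),
        Fraction(1, 2): Fraction(1, 4),
        Fraction(1, 4): Fraction(1, 16), Fraction(3, 4): Fraction(1, 16),
        Fraction(1, 8): Fraction(1, 64), Fraction(3, 8): Fraction(1, 64),
        Fraction(5, 8): Fraction(1, 64), Fraction(7, 8): Fraction(1, 64),
    }

    def estimators(x, class_w=CLASS_W):
        """Bayes mixture xi(x), MAP/MDL theta*(x), and the ML estimate for a
        bit sequence x over a finite weighted class {theta: w_theta}."""
        n, ones = len(x), sum(x)
        # p_theta(x) = theta^#ones(x) * (1-theta)^(n-#ones(x))
        like = {t: t ** ones * (1 - t) ** (n - ones) for t in class_w}
        post = {t: class_w[t] * like[t] for t in class_w}   # w_theta * p_theta(x)
        z = sum(post.values())                              # normalizer
        xi = sum(p * t for t, p in post.items()) / z        # Bayes mixture prediction
        theta_map = max(post, key=post.get)                 # MDL/MAP estimator
        theta_ml = max(like, key=like.get)                  # ML: prior weights set to 1
        return xi, theta_map, theta_ml

    print(estimators([0]))   # xi = 33/160 ~ 0.21, MAP = 0, ML = 0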


An Example Process

Sequence $x$ so far       Bayes mixture $\xi$   ML estimate   MAP (MDL) $\vartheta^*$
(empty)                   0.5                   0             0
0                         0.21                  0             0
01                        0.5                   0.5           0.5
010                       0.45                  0.34          0.5
0100000011                0.4                   5/16          0.5
... (32 more symbols)     0.27                  0.25          0.25
... (640 more symbols)    0.3                   5/16          5/16

True parameter $\vartheta_0 = 5/16 = 0.3125$.
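A hypothetical driver for the two sketches above, replaying a Bernoulli($5/16$) source. The random draws differ from the slide's sequence, so the numbers will not reproduce this table exactly, but the qualitative picture should be the same: $\xi$ homes in on $\vartheta_0$ quickly, while $\vartheta^*$ typically lingers at $\frac12$ much longer.

    import random
    from fractions import Fraction

    # Extend the class down to sixteenths so that theta0 = 5/16 is in Theta;
    # code_for/weight are from the first sketch, estimators from the second.
    thetas = [Fraction(0), Fraction(1)] + [Fraction(2 * d + 1, 2 ** (b + 1))
                                           for b in range(4) for d in range(2 ** b)]
    class_w = {t: weight(t) for t in thetas}

    rng = random.Random(2)                 # arbitrary seed, for reproducibility
    theta0, x = Fraction(5, 16), []
    for t in range(1, 201):
        x.append(1 if rng.random() < theta0 else 0)
        if t in (1, 2, 3, 10, 50, 200):
            xi, th_map, th_ml = estimators(x, class_w)
            print(t, round(float(xi), 3), th_map, th_ml)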


What We Know

• Let $\vartheta_0 \in \Theta$ be the true parameter, with weight $w_0$
• $\xi$ converges to $\vartheta_0$ almost surely and fast, precisely $\sum_{t=0}^{\infty} \mathbf{E}(\xi - \vartheta_0)^2 \le \ln(w_0^{-1})$
• $\vartheta^*$ converges to $\vartheta_0$ almost surely and in general slowly, precisely $\sum_{t=0}^{\infty} \mathbf{E}(\vartheta^* - \vartheta_0)^2 \le O(w_0^{-1})$
• Even true for arbitrary non-i.i.d. (semi-)measures!
• The ML estimates converge to $\vartheta_0$ almost surely; no such assertion about convergence speed is possible
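To make the gap concrete on the running example: under the earlier code construction, $\vartheta_0 = 5/16$ gets the 8-bit code $11110\,010$, so $w_0 = 2^{-8}$. A worked instance of the two bounds (our arithmetic):

    % Bayes mixture: finite, and small
    \sum_{t=0}^{\infty} \mathbf{E}(\xi - \vartheta_0)^2 \;\le\; \ln(w_0^{-1}) = 8 \ln 2 \approx 5.55
    % MDL/MAP: only the exponentially larger guarantee
    \sum_{t=0}^{\infty} \mathbf{E}(\vartheta^* - \vartheta_0)^2 \;\le\; O(w_0^{-1}) = O(2^8) = O(256)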


Is MDL Really So Bad?

• The Bayes mixture bound is description length$(\vartheta_0)$
• The MDL bound is $\exp($description length$(\vartheta_0))$
• $\Rightarrow$ MDL is exponentially worse in general
• This is also a loss bound!
• How about simple classes?
• Deterministic classes: can show a bound of huge constant $\times$ (description length$(\vartheta_0))^3$
• Simple stochastic classes, e.g. Bernoulli?
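A worked comparison of how the three known bounds scale with the description length $\ell$ of $\vartheta_0$, i.e. $w_0 = 2^{-\ell}$; the constant $c$ and the sample value $\ell = 20$ are ours:

    \underbrace{\ell \ln 2}_{\text{Bayes mixture}}
    \quad\text{vs.}\quad
    \underbrace{c\,\ell^{3}}_{\text{MDL, deterministic classes}}
    \quad\text{vs.}\quad
    \underbrace{O(2^{\ell})}_{\text{MDL, general}}
    % e.g. \ell = 20: about 14, versus c \cdot 8000, versus order 10^6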


MDL Is Really So Bad!

$\sum_t \mathbf{E}(\vartheta^* - \vartheta_0)^2 = O(w_0^{-1})$ in the following example:

• $N$ parameters, $w_\vartheta = \frac{1}{N}$ for all $\vartheta$, true parameter $\vartheta_0 = \frac{1}{2}$
• The remaining parameters sit at $\frac12 + \frac14$, $\frac12 + \frac18$, $\frac12 + \frac{1}{16}$, ...

[Figure: the parameters accumulate exponentially fast at $\vartheta_0 = \frac12$ from above]

• $\sum_t \mathbf{E}(\vartheta^* - \vartheta_0)^2\, \mathbf{1}\{\vartheta^* \in [\frac12 + \frac18,\, \frac12 + \frac14]\} = O(1)$
• $\sum_t \mathbf{E}(\vartheta^* - \vartheta_0)^2\, \mathbf{1}\{\vartheta^* \in [\frac12 + \frac{1}{16},\, \frac12 + \frac18]\} = O(1)$
• ... each of the $N$ intervals contributes $O(1)$, so the cumulative loss is of order $N = w_0^{-1}$
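A Monte Carlo sketch of this construction (our illustration: parameters $\frac12 + 2^{-k}$ as in the figure, uniform weights so that MAP and ML coincide, finite horizon; not the paper's exact experiment):

    import math
    import random

    def map_cumulative_error(N, T=20000, runs=5, seed=0):
        """Cumulative squared error of the MAP estimate when theta0 = 1/2
        competes with N-1 parameters 1/2 + 2^-k under uniform weights 1/N,
        averaged over `runs` sample paths of length T."""
        rng = random.Random(seed)
        thetas = [0.5] + [0.5 + 2.0 ** (-k) for k in range(2, N + 1)]
        total = 0.0
        for _ in range(runs):
            ones = 0
            for n in range(1, T + 1):
                ones += rng.random() < 0.5
                # uniform weights cancel in the argmax: compare log-likelihoods
                ll = [ones * math.log(t) + (n - ones) * math.log(1 - t)
                      for t in thetas]
                theta_star = thetas[max(range(len(thetas)), key=ll.__getitem__)]
                total += (theta_star - 0.5) ** 2
        return total / runs

    for N in (4, 5, 6):
        # grows roughly linearly in N = w_0^{-1}: one O(1) chunk per interval
        print(N, round(map_cumulative_error(N), 2))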


MDL Is Not That Bad!

• The instantaneous loss bound is good, precisely $\mathbf{E}(\vartheta^* - \vartheta_0)^2 \le \frac{1}{n}\, O\big(\ln(w_0^{-1})\big)$
• This does not imply a finitely bounded cumulative loss!
• The cumulative loss bound is good for certain nice classes (parameters + weights)
• Intuitively: the bound is good if parameters of equal weight are uniformly distributed
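A one-line check of why the good instantaneous bound cannot simply be summed: its partial sums grow like the harmonic series,

    \sum_{n=1}^{T} \frac{1}{n}\, O\big(\ln(w_0^{-1})\big)
      \;=\; O\big(\ln(w_0^{-1}) \cdot \ln T\big)
      \;\xrightarrow{\,T \to \infty\,}\; \infty

so each term vanishes as $n$ grows, yet the cumulative loss is not finitely bounded.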


Prepare Sharper Upper Bound

• Define an interval construction $(I_k, J_k)$ which exponentially contracts to $\vartheta_0$
• Let $K(I_k)$ be the shortest description length of some $\vartheta \in I_k$

[Figure: the unit interval with ticks at $0, \frac18, \frac14, \dots, \frac78, 1$; true parameter $\vartheta_0 = \frac14$; first split $J_0 = [0, \frac12)$ and $I_0 = [\frac12, 1]$; then $I_1$, $J_1$, ... nested around $\vartheta_0$]

Sharper Upper Bound

• Let $K(J_k)$ be the shortest description length of some $\vartheta \in J_k$
• Let $\Delta(k) = \max\{K(I_k) - K(J_k),\, 0\}$
• Theorem:  $\sum_t \mathbf{E}(\vartheta^* - \vartheta_0)^2 \;\le\; O\Big(\ln w_0^{-1} + \sum_{k=1}^{\infty} 2^{-\Delta(k)} \sqrt{\Delta(k)}\Big)$
• Corollaries: uniformly distributed weights $\Rightarrow$ good bounds
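To see what the theorem needs, here is a toy evaluation of the series $\sum_k 2^{-\Delta(k)} \sqrt{\Delta(k)}$ for two assumed profiles of $\Delta$ (illustrative choices, not taken from the talk):

    from math import sqrt

    def series(delta, kmax=60):
        """Partial sum of sum_k 2^-Delta(k) * sqrt(Delta(k))."""
        return sum(2.0 ** (-delta(k)) * sqrt(delta(k)) for k in range(1, kmax + 1))

    # Uniformly distributed weights: description lengths grow with k,
    # e.g. Delta(k) = k, and the series converges -> a finite bound:
    print(series(lambda k: k))    # ~ 1.35

    # If infinitely many I_k contain a parameter (almost) as simple as the
    # best one in J_k, e.g. Delta(k) = 1 for all k, the series diverges:
    print(series(lambda k: 1))    # = kmax / 2, unbounded as kmax grows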


The Universal Case

• $\Theta = \{$all computable $\vartheta \in [0,1]\}$
• $w_\vartheta = 2^{-K(\vartheta)}$, where $K$ denotes the prefix Kolmogorov complexity
• $\sum_k 2^{-\Delta(k)} \sqrt{\Delta(k)} = \infty$  $\Rightarrow$  Theorem not applicable
• Conjecture:  $\sum_t \mathbf{E}(\vartheta^* - \vartheta_0)^2 \;\le\; O\Big(\ln w_0^{-1} + \sum_{k=1}^{\infty} 2^{-\Delta(k)}\Big)$
• $\Rightarrow$ a bound of the form huge constant $\times$ polynomial holds for incompressible $\vartheta_0$
• Compare to the deterministic case


Conclusions

• Cumulative and instantaneous bounds are incompatible
• The main positive result generalizes to arbitrary i.i.d. classes
• Open problem: good bounds for more general classes?
• Thank you!
