On the Convergence Speed of MDL Predictions for Bernoulli Sequences
or
Is MDL Really So Bad?
Jan Poland and Marcus Hutter
IDSIA, Lugano, Switzerland
2
Big Picture
[Diagram: MDL and Bayes, alongside other methods, e.g. PAC-Bayes]
3
Bernoulli Classes
θ      w      Code
0      1/4    00
1      1/4    01
1/2    1/4    10
1/4    1/16   1100
3/4    1/16   1101
1/8    1/64   111000
3/8    1/64   111001
5/8    1/64   111010
7/8    1/64   111011

Code $= \underbrace{111}_{1+\#\text{bits}}\ \underbrace{0}_{\text{stop}}\ \underbrace{10}_{\text{data}}$
• Set of parameters $\Theta = \{\theta_1, \theta_2, \dots\} \subset [0, 1]$
• Weights $w_\theta$ for each $\theta \in \Theta$
• Weights correspond to codes: $w_\theta = 2^{-\ell(\mathrm{Code}_\theta)}$, where $\ell$ is the code length (see the sketch below)
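As a minimal sketch (my addition, not from the slides), the table above can be written down in Python; the weights are read off the code lengths, and the Kraft inequality for prefix-free codes guarantees they sum to at most 1:

from fractions import Fraction

# (theta, code) pairs from the table above
CLASS = [
    (Fraction(0), "00"), (Fraction(1), "01"), (Fraction(1, 2), "10"),
    (Fraction(1, 4), "1100"), (Fraction(3, 4), "1101"),
    (Fraction(1, 8), "111000"), (Fraction(3, 8), "111001"),
    (Fraction(5, 8), "111010"), (Fraction(7, 8), "111011"),
]

# weight = 2^(-code length)
weights = {theta: Fraction(1, 2 ** len(code)) for theta, code in CLASS}

# prefix-free codes satisfy the Kraft inequality, so the weights sum to <= 1
assert sum(weights.values()) <= 1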
4
Estimators
• Given an observed sequence $x = x_1 x_2 \dots x_n$
• Probability of $x$ given $\theta$: $p_\theta(x) = \theta^{\#\mathrm{ones}(x)} (1 - \theta)^{n - \#\mathrm{ones}(x)}$
• Posterior weights: $w_\theta(x) = \dfrac{w_\theta\, p_\theta(x)}{\sum_\theta w_\theta\, p_\theta(x)}$
• Bayes mixture: $\xi(x) = \sum_\theta w_\theta(x)\, \theta$
• MDL/MAP: $\theta^*(x) = \arg\max_\theta w_\theta(x)$
• Maximum Likelihood (ML): same as MAP, but with the prior weights set to 1 (all three are sketched in code below)
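A hedged continuation of the earlier Python sketch (the helper `estimates` is mine, not the authors'): it computes all three predictions on the class above, reusing the `weights` dict.

def estimates(x, weights):
    """Return (Bayes mixture, ML, MAP/MDL) predictions after observing x."""
    n, ones = len(x), x.count("1")
    # likelihood p_theta(x) = theta^ones * (1 - theta)^(n - ones)
    lik = {th: float(th) ** ones * (1 - float(th)) ** (n - ones)
           for th in weights}
    post = {th: weights[th] * lik[th] for th in weights}   # w_theta * p_theta(x)
    z = sum(post.values())
    xi = sum(p * float(th) for th, p in post.items()) / z  # Bayes mixture
    mdl = max(post, key=post.get)                          # MAP = argmax posterior
    ml = max(lik, key=lik.get)                             # ML: prior set to 1
    return xi, float(ml), float(mdl)

print(estimates("0", weights))   # roughly (0.21, 0.0, 0.0)
print(estimates("01", weights))  # roughly (0.5, 0.5, 0.5)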
5
An Example Process
x (new bits)   Bayes mixture ξ   ML estimate   MAP (MDL) θ*
(start)        0.5               0             0
0              0.21              0             0
1              0.5               0.5           0.5
0              0.45              0.34          0.5
0000011        0.4               5/16          0.5
...(32)...     0.27              0.25          0.25
...(640)...    0.3               5/16          5/16

True parameter $\theta_0 = 5/16 = 0.3125$
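As a sketch (mine), the process can be replayed in Python by extending the class one more level so that $\theta_0 = 5/16$ is in it (the slides' code construction continues with odd numerators $m/2^k$ at weight $4^{-k}$); the exact numbers depend on the random draw, so they will only roughly match the table.

import random
from fractions import Fraction

# extend the class to denominator 16, so theta0 = 5/16 gets weight 4^-4
for k in (2, 3, 4):
    for m in range(1, 2 ** k, 2):
        weights.setdefault(Fraction(m, 2 ** k), Fraction(1, 4 ** k))

random.seed(0)
theta0 = 5 / 16
x = ""
for t in range(1, 641):
    x += "1" if random.random() < theta0 else "0"
    if t in (1, 2, 3, 10, 42, 640):              # arbitrary checkpoints
        xi, ml, mdl = estimates(x, weights)      # helper from the sketch above
        print(t, round(xi, 2), round(ml, 2), round(mdl, 4))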
6
What We Know
• Let $\theta_0 \in \Theta$ be the true parameter, with weight $w_0$
• $\xi$ converges to $\theta_0$ almost surely and fast; precisely, $\sum_{t=0}^{\infty} E(\xi - \theta_0)^2 \le \ln(w_0^{-1})$
• $\theta^*$ converges to $\theta_0$ almost surely and, in general, slowly; precisely, $\sum_{t=0}^{\infty} E(\theta^* - \theta_0)^2 \le O(w_0^{-1})$ (numeric sketch below)
• This even holds for arbitrary non-i.i.d. (semi)measures!
• The ML estimates converge to $\theta_0$ almost surely; no such assertion about the convergence speed is possible
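A rough Monte Carlo sketch (my addition) of the two cumulative errors, reusing `estimates` and the extended `weights`; run count and horizon are arbitrary, and it only illustrates the gap between the two bounds:

import math, random

def cumulative_errors(theta0, horizon=200, runs=20):
    se_xi = se_mdl = 0.0
    for r in range(runs):
        random.seed(r)
        x = ""
        for t in range(horizon):
            xi, _, mdl = estimates(x, weights)   # predict before seeing x_t
            se_xi += (xi - theta0) ** 2
            se_mdl += (mdl - theta0) ** 2
            x += "1" if random.random() < theta0 else "0"
    return se_xi / runs, se_mdl / runs

w0 = 1 / 256                                     # weight of theta0 = 5/16
print(cumulative_errors(5 / 16), "vs ln(1/w0) =", math.log(1 / w0))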
7
Is MDL Really So Bad?
• The Bayes mixture bound is $\mathrm{description\ length}(\theta_0)$
• The MDL bound is $\exp(\mathrm{description\ length}(\theta_0))$
• $\Rightarrow$ MDL is exponentially worse in general (spelled out below)
• This is also a loss bound!
• How about simple classes?
• Deterministic classes: one can show a bound of huge constant $\times\, (\mathrm{description\ length}(\theta_0))^3$
• Simple stochastic classes, e.g. Bernoulli?
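Making the gap explicit (my addition, directly from the coding interpretation of the weights): with $w_0 = 2^{-\ell(\theta_0)}$, where $\ell(\theta_0)$ is the description length,

\sum_t E(\xi - \theta_0)^2 \le \ln(w_0^{-1}) = \ell(\theta_0)\,\ln 2
\qquad\text{vs.}\qquad
\sum_t E(\theta^* - \theta_0)^2 \le O(w_0^{-1}) = O\bigl(2^{\ell(\theta_0)}\bigr).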
8
MDL Is Really So Bad!
• $\sum_t E(\theta^* - \theta_0)^2 = O(w_0^{-1})$ is attained in the following example:
• $N$ parameters, $w_\theta = \frac{1}{N}$ for all $\theta$, $\theta_0 = \frac12$, with the remaining parameters clustered at $\frac12 + \frac14,\ \frac12 + \frac18,\ \frac12 + \frac{1}{16},\ \dots$
• Each interval between consecutive cluster points contributes $O(1)$:
  $\sum_t E(\theta^* - \theta_0)^2\, \mathbf{1}\{\theta^* \in [\frac12 + \frac18,\ \frac12 + \frac14]\} = O(1)$,
  $\sum_t E(\theta^* - \theta_0)^2\, \mathbf{1}\{\theta^* \in [\frac12 + \frac{1}{16},\ \frac12 + \frac18]\} = O(1)$, etc.
• Summing over all $\approx N$ such intervals, the cumulative loss indeed grows like $N = w_0^{-1}$ (simulation sketch below)
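A simulation sketch (mine) of this example; note the $k$-th cluster point is only selected around time $t \approx 4^k$, so the horizon has to grow like $4^N$ before every interval has contributed:

import math, random

def mdl_cumloss(N, runs=10):
    horizon = 10 * 4 ** N
    thetas = [0.5] + [0.5 + 2.0 ** -k for k in range(2, N + 1)]
    total = 0.0
    for r in range(runs):
        random.seed(r)
        n = ones = 0
        for _ in range(horizon):
            n += 1
            ones += random.random() < 0.5        # theta0 = 1/2
            # uniform weights: MAP = ML = best log-likelihood fit
            mdl = max(thetas, key=lambda th: ones * math.log(th)
                                             + (n - ones) * math.log(1 - th))
            total += (mdl - 0.5) ** 2
    return total / runs

for N in (3, 4, 5, 6):                           # loss grows roughly like N
    print(N, round(mdl_cumloss(N), 2))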
9
MDL Is Not That Bad!
• The instantaneous loss bound is good; precisely, $E(\theta^* - \theta_0)^2 \le \frac{1}{n}\, O\bigl(\ln(w_0^{-1})\bigr)$
• This does not imply a finitely bounded cumulative loss! (see the sum spelled out below)
• The cumulative loss bound is good for certain nice classes (parameters + weights)
• Intuitively: the bound is good if parameters of equal weight are uniformly distributed
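Spelling out why the $1/n$ rate alone is not enough (my addition): the harmonic series diverges, so summing the instantaneous bound gives only

\sum_{n=1}^{T} \frac{1}{n}\, O\bigl(\ln w_0^{-1}\bigr)
  = O\bigl(\ln w_0^{-1}\bigr)\, H_T
  = O\bigl(\ln w_0^{-1} \cdot \ln T\bigr)
  \xrightarrow{\;T \to \infty\;} \infty .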
10
Prepare Sharper Upper Bound
• Define an interval construction $(I_k, J_k)$ which exponentially contracts to $\theta_0$
• Let $K(I_k)$ be the shortest description length of some $\theta \in I_k$

[Figure: the unit interval marked at $0, \frac18, \frac14, \frac38, \frac12, \frac58, \frac34, \frac78, 1$, with $\theta_0 = \frac14$; $J_0 = [0, \frac12)$, $I_0 = [\frac12, 1]$, and the contracted intervals $I_1, J_1$ one level below.]
11
Sharper Upper Bound
• Let $K(J_k)$ be the shortest description length of some $\theta \in J_k$
• Let $\Delta(k) = \max\{K(I_k) - K(J_k),\, 0\}$
• Theorem: $\sum_t E(\theta^* - \theta_0)^2 \le O\Bigl(\ln w_0^{-1} + \sum_{k=1}^{\infty} 2^{-\Delta(k)} \sqrt{\Delta(k)}\Bigr)$ (numeric sketch below)
• Corollaries: "uniformly distributed weights $\Rightarrow$ good bounds"
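A rough numeric sketch (mine; the paper's actual interval construction is more careful than this dyadic-sibling guess) of how $\Delta(k)$ and the correction term could be computed for the finite class built earlier:

import math

def bound_term(weights, theta0, depth=8):
    # K(interval) = shortest description length (-log2 weight) inside it
    def K(lo, hi):
        lens = [-math.log2(w) for th, w in weights.items() if lo <= th < hi]
        return min(lens, default=math.inf)
    total = 0.0
    for k in range(1, depth + 1):
        width = 2.0 ** -k
        j_lo = math.floor(theta0 / width) * width    # dyadic J_k containing theta0
        # sibling interval I_k next to J_k (stay inside [0, 1])
        i_lo = j_lo + width if j_lo + width < 1 else j_lo - width
        delta = max(K(i_lo, i_lo + width) - K(j_lo, j_lo + width), 0)
        if delta < math.inf:                          # skip empty I_k
            total += 2.0 ** -delta * math.sqrt(delta)
    return total

print(bound_term(weights, 5 / 16))  # uses the extended class from earlier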
12
The Universal Case
• $\Theta = \{\text{all computable } \theta \in [0, 1]\}$
• $w_\theta = 2^{-K(\theta)}$, where $K$ denotes the prefix Kolmogorov complexity
• $\sum_k 2^{-\Delta(k)} \sqrt{\Delta(k)} = \infty$, so the Theorem is not applicable
• Conjecture: $\sum_t E(\theta^* - \theta_0)^2 \le O\Bigl(\ln w_0^{-1} + \sum_{k=1}^{\infty} 2^{-\Delta(k)}\Bigr)$
• $\Rightarrow$ a bound of huge constant $\times$ polynomial holds for incompressible $\theta_0$
• Compare to the deterministic case
13
Conclusions
• Cumulative and instantaneous bounds are incompatible
• The main positive result generalizes to arbitrary i.i.d. classes
• Open problem: good bounds for more general classes?
• Thank you!