• By Gil Kalai
• Institute of Mathematics and Center for Rationality, Hebrew University, Jerusalem, Israel
• presented by: Yair Cymbalista
• In this lecture we study the extent to which concepts of choice used in economic theory imply "learnable" behavior.
• Learning theory, introduced and developed by Valiant, Angluin, and others in the 1980s, deals with the question of how well a family of objects (functions, in this lecture) can be learned from examples. Valiant's learning concept is statistical, i.e. the examples are chosen randomly.
• In order to analyze learnability we will use a basic model of statistical learning theory, introduced by Valiant, called the model of PAC-learnability (PAC stands for "probably approximately correct"). It is based on choosing examples at random according to some probability distribution.
• Let F ⊆ {f | f : U → Y}. Let 0 < ε, δ < 1 and let μ be a probability distribution on U. We say that F is learnable from t examples with probability of at least 1 − δ with respect to the probability distribution μ if the following assertion holds:
• For every f ∈ F, if u_1, u_2, …, u_t are chosen at random and independently according to the probability distribution μ, and if f′ ∈ F satisfies f′(u_i) = f(u_i), i = 1, 2, …, t, then μ({u : f′(u) ≠ f(u)}) < ε with probability of at least 1 − δ. We say that F is learnable from t examples with probability of at least 1 − δ if this is the case with respect to every probability distribution μ on U.
• Given a set X of N alternatives, a choice function c is a mapping which assigns to every nonempty subset S ⊆ X an element c(S) ∈ S.
• A "rationalizable" choice function is consistent with maximizing behavior: there is a linear ordering of the alternatives in X, and c(S) is the maximum among the elements of S with respect to this ordering.
• Rationalizable choice functions are characterized by the Independence of Irrelevant Alternatives condition (IIA): the element chosen from a set is also chosen from every subset that contains it.
• A class of choice functions is symmetric if it is invariant under permutations of the alternatives.
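As an illustration, a rationalizable choice function and a brute-force IIA check can be sketched in a few lines (the function and variable names here are my own, not from the lecture):

```python
from itertools import combinations

def rationalizable(order):
    """Choice function induced by a linear order (list, best first)."""
    rank = {a: i for i, a in enumerate(order)}
    return lambda S: min(S, key=rank.__getitem__)

def satisfies_iia(c, X):
    """IIA: the element chosen from a set is also chosen from
    every subset that still contains it."""
    for r in range(2, len(X) + 1):
        for S in combinations(X, r):
            chosen = c(S)
            for q in range(2, r):
                for T in combinations(S, q):
                    if chosen in T and c(T) != chosen:
                        return False
    return True

X = ("a", "b", "c", "d")
c = rationalizable(["c", "a", "d", "b"])
assert c(("a", "b")) == "a" and c(X) == "c"
assert satisfies_iia(c, X)
```

Any choice function built from a linear order passes the IIA check, matching the characterization above.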
• In this lecture we will concentrate on the proofs of the following two theorems.
• Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives.
• Theorem 1.2. Every symmetric class of choice functions requires a number of examples at least linear in the number of alternatives for learnability in the PAC-model.
• The P-dimension of F = {f | f : U → Y}, denoted by dim_P F, is the maximal number s of elements u_1, u_2, …, u_s of U and values y_1, y_2, …, y_s of Y such that for every subset B ⊆ {1, 2, …, s} there is a function f_B ∈ F so that f_B(u_i) = y_i if i ∈ B and f_B(u_i) ≠ y_i if i ∉ B.
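For a small finite class, the P-dimension as defined above can be computed by exhaustive search; a sketch (names illustrative):

```python
from itertools import combinations, product

def p_dimension(F, U, Y):
    """Brute-force P-dimension: the largest s for which some elements
    u_1..u_s and values y_1..y_s are shattered, i.e. for every
    B subset of {1..s} some f in F has f(u_i) == y_i exactly when i in B."""
    best = 0
    for s in range(1, len(U) + 1):
        for us in combinations(U, s):
            for ys in product(Y, repeat=s):
                patterns = {tuple(f[u] == y for u, y in zip(us, ys))
                            for f in F}
                if len(patterns) == 2 ** s:
                    best = s
    return best

# Tiny example: all 4 functions from a two-point domain to {0, 1}.
U, Y = ["u1", "u2"], [0, 1]
F = [{"u1": a, "u2": b} for a in Y for b in Y]
print(p_dimension(F, U, Y))  # 2: both points can be shattered
```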
• Theorem 2.1. For a fixed value of 0 < ε, δ < 1, the number of examples t needed to learn a class of functions with probability of at least 1 − δ is bounded above and below by linear functions of the P-dimension.
• Proposition 2.2. Let F ⊆ {f | f : U → Y}. Then dim_P(F) ≤ log₂|F|.
• Proof: The proof is obvious, since in order for the P-dimension of F to be s we need at least 2^s distinct functions.
• Let F ⊆ {f | f : U → Y}, let U′ ⊆ U, and let F′ = {f|_{U′} : f ∈ F}.
• If F can be learned from t examples with probability of at least 1 − δ, then so can F′ (since we can choose μ to be supported only on U′). Similarly, dim_P(F′) ≤ dim_P(F).
• Let C be the class of rationalizable choice functions defined on nonempty subsets of a set X of alternatives, where |X| = N.
• |C| = N!
• Proposition 2.2 ⟹ dim_P C ≤ log₂ N! ≤ N log₂ N.
• Theorem 2.1 ⟹ the number of examples required to learn a rationalizable choice function in the PAC-model is O(N log N).
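The counting bound log₂ N! ≤ N log₂ N can be sanity-checked numerically:

```python
import math

# Compare log2(N!) with N*log2(N) for a few values of N.
for N in (5, 10, 20):
    lhs = math.log2(math.factorial(N))
    rhs = N * math.log2(N)
    assert lhs <= rhs
    print(N, round(lhs, 1), round(rhs, 1))
```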
• Theorem 3.1. dim_P C = N − 1.
• Proof: We will first show that dim_P C ≥ N − 1. Consider the elements u_i = ({a_1, a_{i+1}}, a_1), 1 ≤ i ≤ N − 1 (the set A_i = {a_1, a_{i+1}} with value y_i = a_1). For each B ⊆ {u_1, u_2, …, u_{N−1}}, an order relation which satisfies a_{i+1} < a_1 if u_i ∈ B and a_1 < a_{i+1} if u_i ∉ B gives an appropriate function f_B.
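The shattering construction above can be verified exhaustively for a small N; a sketch (helper names my own):

```python
from itertools import chain, combinations

N = 5
alts = list(range(1, N + 1))          # a_1, ..., a_N as 1..N

def choice_from_order(order):
    """Choice function induced by a linear order (list, best first)."""
    rank = {a: i for i, a in enumerate(order)}
    return lambda S: min(S, key=rank.__getitem__)

# For every B subset of {1..N-1}, order the alternatives so that
# a_{i+1} loses to a_1 exactly when i is in B, then check that the
# induced choice picks a_1 from {a_1, a_{i+1}} iff i is in B.
subsets = chain.from_iterable(combinations(range(1, N), r) for r in range(N))
for B in subsets:
    losers = [i + 1 for i in range(1, N) if i in B]       # ranked below a_1
    winners = [i + 1 for i in range(1, N) if i not in B]  # ranked above a_1
    c = choice_from_order(winners + [1] + losers)
    assert all((c((1, i + 1)) == 1) == (i in B) for i in range(1, N))
print("all", 2 ** (N - 1), "subsets realized")
```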
Proof of Theorem 3.1(cont.)
• We will now show that dim_P C ≤ N − 1. We need to show that for every N sets A_1, A_2, …, A_N and N elements a_i ∈ A_i, i = 1, 2, …, N, there is a subset S ⊆ {1, 2, …, N} such that there is no linear order on the ground set X for which a_i is the maximal element of A_i if and only if i ∈ S. Let s_k = |A_k|; clearly we can assume s_k > 1 for every k. Let X = (x_1, x_2, …, x_N).
Proof of Theorem 3.1(cont.)
For every j = 1, 2, …, N consider the following vector v^j = (v^j_1, v^j_2, …, v^j_N) ∈ R^N, where:
v^j_k = 0 if x_k ∉ A_j,
v^j_k = s_j − 1 if x_k = a_j,
v^j_k = −1 if x_k ∈ A_j and x_k ≠ a_j.
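The defining property of the vectors v^j (each has coordinate sum (s_j − 1) − (s_j − 1) = 0) can be checked on a concrete family; the sets A_j and elements a_j below are chosen only for illustration:

```python
# Build the vectors v^j for one concrete family of sets A_j with
# distinguished elements a_j (indices 0-based) and verify that every
# coordinate sum is 0, so all v^j lie in an (N-1)-dim subspace.
X = list(range(4))                        # N = 4 alternatives
A = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]      # A_j subsets of X
a = [0, 1, 2, 3]                          # a_j in A_j

def v(j):
    s = len(A[j])
    row = []
    for k in X:
        if k not in A[j]:
            row.append(0)
        elif k == a[j]:
            row.append(s - 1)
        else:
            row.append(-1)
    return row

vs = [v(j) for j in range(len(A))]
assert all(sum(row) == 0 for row in vs)
```

Since N vectors lie in an (N − 1)-dimensional subspace, they must be linearly dependent, which is exactly the fact used on the next slide.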
Proof of Theorem 3.1(cont.)
• Note that all the vectors v^1, v^2, …, v^N belong to the (N − 1)-dimensional subspace V ⊆ R^N of vectors whose sum of coordinates is 0. Therefore the vectors are linearly dependent. Suppose now that Σ_{j=1}^{N} r_j v^j = 0 and that not all the r_j's equal zero.
• Let S = {j : r_j > 0}. We will now show that there is no c ∈ C such that c(A_k) = a_k when k ∈ S and c(A_k) ≠ a_k when k ∉ S.
Proof of Theorem 3.1(cont.)
• Assume to the contrary that there is such a function c. Let B = ∪{A_j : r_j ≠ 0}, and let x_m = c(B). Denote by y the m-th coordinate in the linear combination Σ_{j=1}^{N} r_j v^j; we will show that y is positive.
• If r_j = 0 or if x_m ∉ A_j, then r_j v^j_m = 0.
Proof of Theorem 3.1(cont.)
• Assume now that x_m ∈ A_j and r_j ≠ 0; then, by IIA (since A_j ⊆ B and x_m = c(B) ∈ A_j), c(A_j) = x_m. There are two cases to consider:
• If r_j > 0 (j ∈ S): then a_j = c(A_j) = x_m, so r_j v^j_m = (s_j − 1) r_j > 0.
• If r_j < 0 (j ∉ S): then a_j ≠ c(A_j) = x_m, so r_j v^j_m = (−1) r_j > 0.
• Since x_m = c(B) ∈ B, we have x_m ∈ A_j for at least one j with r_j ≠ 0, and every term is nonnegative. Therefore y = Σ_{j=1}^{N} r_j v^j_m > 0, while Σ_{j=1}^{N} r_j v^j = 0. Contradiction!
• Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives.
• Proof: Theorem 1.1 follows from Theorem 3.1 and Theorem 2.1.
• Let F ⊆ {f | f : U → Y}. Let 0 < ε, δ < 1 and let μ be a probability distribution on U. Let g be an arbitrary function g : U → Y. Define the distance from g to F, dist(g, F), to be the minimum over f ∈ F of the probability, with respect to μ, that f(x) ≠ g(x).
• Given t random elements u_1, u_2, …, u_t (drawn independently according to μ), define the empirical distance of g to F as: dist_emp(g, F) = min_{f ∈ F} |{i : f(u_i) ≠ g(u_i)}| / t.
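A minimal sketch of the empirical distance, with functions represented as dicts (names illustrative):

```python
# dist_emp(g, F): the smallest fraction of the sample on which some
# f in F disagrees with g.
def dist_emp(g, F, sample):
    t = len(sample)
    return min(sum(f[u] != g[u] for u in sample) / t for f in F)

g = {1: "a", 2: "b", 3: "a", 4: "b"}
F = [{1: "a", 2: "a", 3: "a", 4: "a"},   # disagrees with g on u = 2, 4
     {1: "a", 2: "b", 3: "b", 4: "b"}]   # disagrees with g on u = 3
sample = [1, 2, 3, 4]
print(dist_emp(g, F, sample))  # 0.25: the best f misses only u = 3
```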
• Theorem 4.1. There exists K(ε, δ) such that for every probability distribution μ on U and every function g : U → Y, the number of independent random examples t needed such that |dist(g, F) − dist_emp(g, F)| < ε with probability of at least 1 − δ is at most K(ε, δ) · dim_P(F).
• Corollary 4.2. For every probability distribution μ on U and every function g : U → Y, if g agrees with a function in F on t independent random examples and t ≥ K(ε, δ) · dim_P(F), then dist(g, F) < ε with probability of at least 1 − δ.
• The class of rationalizable choice functions is symmetric under relabeling of the alternatives. Mathematically speaking, every permutation σ of X induces a symmetry among all choice functions, given by (σ(c))(S) = σ(c(σ^{-1}(S))).
• A class of choice functions will be called symmetric if it is closed under all permutations of the ground set of alternatives X.
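The action (σ(c))(S) = σ(c(σ^{-1}(S))) can be sketched directly (names my own; σ is a dict, c a function on frozensets):

```python
# Permutation action on a choice function: relabel the set back via
# sigma^{-1}, apply c, then relabel the chosen element via sigma.
def act(sigma, c):
    inv = {v: k for k, v in sigma.items()}
    return lambda S: sigma[c(frozenset(inv[x] for x in S))]

# c maximizes the order 1 < 2 < 3, i.e. chooses the largest element.
c = max
sigma = {1: 2, 2: 3, 3: 1}
c2 = act(sigma, c)
print(c2(frozenset({1, 2})))  # 1: under the relabeling, 1 = sigma(3) is best
```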
• A choice function defined on pairs of elements is an asymmetric preference relation. Every choice function describes an asymmetric preference relation by restriction to pairs of elements.
• Every choice function defined on pairs of elements of X describes a tournament whose vertices are the elements of X, such that for two elements x, y ∈ X, c({x, y}) = x if and only if the graph induced by the tournament has an edge oriented from x to y.
• Theorem 5.1. (1) The P-dimension of every symmetric class C of preference relations (considered as choice functions on pairs) on N alternatives is at least N/2. (2) When N ≥ 8, the P-dimension is at least N − 1. (3) When N ≥ 68, if the P-dimension is precisely N − 1, then the class is the class of order relations.
Proof of Theorem 5.1
(1) Let X = {x_1, x_2, …, x_N} and m = ⌊N/2⌋. Let A_i = {x_{2i−1}, x_{2i}}, 1 ≤ i ≤ m. Let c ∈ C and assume without loss of generality that c(A_i) = x_{2i−1}, 1 ≤ i ≤ m. Let R ⊆ {1, 2, …, m}. We will define a permutation σ_R as follows:
If k ∈ R then: σ_R(x_{2k−1}) = x_{2k−1} and σ_R(x_{2k}) = x_{2k}.
If k ∉ R then: σ_R(x_{2k−1}) = x_{2k} and σ_R(x_{2k}) = x_{2k−1}.
(If N is odd define σ_R(x_N) = x_N.)
Proof of Theorem 5.1(cont.)
If k ∈ R then: (σ_R(c))(A_k) = σ_R(c(σ_R^{-1}(A_k))) = σ_R(c(A_k)) = σ_R(x_{2k−1}) = x_{2k−1} = c(A_k).
If k ∉ R then: (σ_R(c))(A_k) = σ_R(c(σ_R^{-1}(A_k))) = σ_R(c(A_k)) = σ_R(x_{2k−1}) = x_{2k} ≠ c(A_k).
Therefore (σ_R(c))(A_k) = x_{2k−1} if and only if k ∈ R, and hence σ_R(c) is satisfactory: since C is symmetric, σ_R(c) ∈ C for every R, so the pairs (A_i, x_{2i−1}) are shattered and dim_P C ≥ m.
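The σ_R construction can be checked exhaustively for a small N (here c is the choice induced by a fixed linear order, and the helper names are my own):

```python
from itertools import chain, combinations

# For N = 6 (m = 3): applying sigma_R to c keeps the choice on A_k
# equal to x_{2k-1} exactly when k is in R.
N, m = 6, 3
X = list(range(1, N + 1))
A = [(2 * i - 1, 2 * i) for i in range(1, m + 1)]
c = min                                   # c(A_i) = x_{2i-1}

def act(sigma, c):
    inv = {v: k for k, v in sigma.items()}
    return lambda S: sigma[c(tuple(inv[x] for x in S))]

for R in chain.from_iterable(combinations(range(1, m + 1), r)
                             for r in range(m + 1)):
    sigma = {x: x for x in X}
    for k in range(1, m + 1):
        if k not in R:                    # swap the k-th pair
            sigma[2 * k - 1], sigma[2 * k] = 2 * k, 2 * k - 1
    cR = act(sigma, c)
    assert all((cR(A[k - 1]) == 2 * k - 1) == (k in R)
               for k in range(1, m + 1))
print("all", 2 ** m, "subsets realized")
```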
Proof of Theorem 5.1(cont.)
(2) To prove part (2) we will use the following conjecture, made by Rosenfeld and proved by Havet and Thomassé: when N ≥ 8, for every path P on N vertices with an arbitrary orientation of the edges, every tournament on N vertices contains a copy of P.
Proof of Theorem 5.1(cont.)
Let c be a choice function in the class and consider the tournament T described by c. Let A_k = {x_k, x_{k+1}}, 1 ≤ k ≤ N − 1. Every choice function c′ on A_1, A_2, …, A_{N−1} describes a directed path P. Suppose that a copy of P can be found in our tournament, and that the vertices of this copy (in the order they appear on the path) are x_{i_1}, x_{i_2}, …, x_{i_N}. Define a permutation σ by σ(x_{i_j}) = x_j. The choice function σ(c) will agree with c′ on A_1, A_2, …, A_{N−1}.
• Theorem 5.1 implies the following corollary:
Corollary 5.2. The P-dimension of every symmetric class of choice functions on N alternatives, N ≥ 8, is at least N − 1.
• Consider a tournament with N players such that for two players i and j there is a probability p_ij ∈ [0, 1] that i beats j in a match between the two. Among a set A ⊆ {1, …, N} of players, let c(A) be the player most likely to win a tournament involving the players in A.
• Consider the class W of choice functions that arise in this model.
• Theorem 6.1. The class of choice functions W requires O(N³) examples for learning in the PAC-model.
• Consider m polynomials Q_i(x_1, …, x_r), 1 ≤ i ≤ m, in r variables x_1, x_2, …, x_r. For a point c ∈ R^r, the sign pattern is s = (s_1, s_2, …, s_m) ∈ {−1, 0, 1}^m, where s_j = sign Q_j(c), namely:
s_j = −1 if Q_j(c) < 0,
s_j = 0 if Q_j(c) = 0,
s_j = 1 if Q_j(c) > 0.
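Sign patterns can be collected by brute force over sample points; a small sketch (the two polynomials are arbitrary illustrations, and sampling only gives a lower bound on the true number of patterns):

```python
# Collect the sign patterns of a few polynomials over a grid of points.
def sign(x):
    return (x > 0) - (x < 0)

Q = [lambda x, y: x * y,          # degree-2 polynomials in r = 2 variables
     lambda x, y: x - y]

pts = [(x / 4, y / 4) for x in range(-8, 9) for y in range(-8, 9)]
patterns = {tuple(sign(q(*p)) for q in Q) for p in pts}
print(len(patterns))  # 8: every pattern except (-1, 0) occurs
```

The pattern (−1, 0) is impossible here: x = y forces xy = x² ≥ 0, so xy < 0 and x − y = 0 cannot hold together.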
• Theorem 6.2 (Warren). If the degree of every Q_j is at most D and if 2m > r, then the number of sign patterns given by the polynomials Q_1, Q_2, …, Q_m is at most (8eDm/r)^r.
• Given a set A of players, |A| = s, the probability that the k-th player will be the winner in a tournament between the players of A is described by a polynomial Q(A, k) in the variables p_ij, as follows: let M = (m_ij) be an s by s 0-1 matrix representing the outcome of all matches between the players of A in a possible tournament, such that m_ij = 1 if player i won the match against player j and m_ij = 0 otherwise (so m_ii = 0 and m_ij + m_ji = 1 for i ≠ j).
Proof of Theorem 6.1(cont.)
• The probability that such a matrix M will represent the results of the matches in the tournament is: p_M = ∏_{i,j ∈ A, m_ij = 1} p_ij.
• Define Q(A, k) = Σ {p_M : k won the tournament represented by M}.
• c(A) is the player k ∈ A for which Q(A, k) is maximal.
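A tiny illustration of Q(A, k), assuming (since the slides leave the winner rule abstract) that the tournament winner is the player with the most match wins, with ties broken by the smaller index:

```python
from itertools import product

A = [0, 1, 2]
p = {(0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.5}   # p[i, j]: prob. i beats j

def Q(A, k):
    """Sum of p_M over all match-outcome matrices M in which k wins."""
    total = 0.0
    pairs = [(i, j) for i in A for j in A if i < j]
    for outcome in product([0, 1], repeat=len(pairs)):
        prob, wins = 1.0, {i: 0 for i in A}
        for (i, j), w in zip(pairs, outcome):
            prob *= p[i, j] if w else 1 - p[i, j]
            wins[i if w else j] += 1
        winner = max(A, key=lambda i: (wins[i], -i))
        if winner == k:
            total += prob
    return total

print([round(Q(A, k), 4) for k in A])
assert abs(sum(Q(A, k) for k in A) - 1.0) < 1e-9   # exactly one winner per M
```

Here c(A) = 0, the player with the best head-to-head probabilities, as expected.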
Proof of Theorem 6.1(cont.)
• Q(A, k) is a polynomial of degree at most N(N − 1)/2 in the N(N − 1)/2 variables p_ij, i, j ∈ A.
• Now consider Q(A, k, j) = Q(A, k) − Q(A, j) for all subsets A ⊆ {1, …, N} and k, j ∈ A, k ≠ j. We have altogether less than 2^N N² polynomials in the N(N − 1)/2 variables p_ij. The degree of these polynomials is at most N(N − 1)/2.
Proof of Theorem 6.1(cont.)
• Now c(A) = k ⟺ Q(A, k, j) > 0 for every j ∈ A, j ≠ k. Therefore the choice function given by a vector of probabilities (p_ij) is determined by the sign pattern of all the polynomials Q(A, k, j).
• We can now invoke Warren's theorem with r = D = N(N − 1)/2 and m = 2^N N². According to Warren's theorem, the number of different sign patterns of the polynomials is at most (8e · 2^N N²)^{N(N−1)/2}.
Proof of Theorem 6.1(cont.)
• dim_P W ≤ log₂ (8e · 2^N N²)^{N(N−1)/2} = O(N³).
• Therefore it follows from Theorem 2.1 and Proposition 2.2 that the number of examples needed to learn W in the PAC-model is O(N³).
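The arithmetic behind the O(N³) bound can be checked numerically; expanding the logarithm gives (N(N − 1)/2)·(log₂(8e) + N + 2 log₂ N), whose leading term is N³/2:

```python
import math

# log2((8e * 2^N * N^2)^(N(N-1)/2)) as an explicit function of N;
# the ratio to N^3 approaches 1/2 as N grows.
def log2_bound(N):
    return (N * (N - 1) / 2) * (math.log2(8 * math.e) + N + 2 * math.log2(N))

for N in (10, 100, 1000):
    print(N, round(log2_bound(N) / N ** 3, 3))
```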
• Our main result determined the P-dimension of the class of rationalizable choice functions and showed that the number of examples needed to learn a rationalizable choice function is linear in the number of alternatives.
• We also described a mathematical method for analyzing the statistical learnability of complicated choice models.