The 2-Catalog Segmentation Problem
Joint work with Shmuel Safra

Motivation

The Catalog Problem
Input: A set of customers C. A set of pages P. A function $\Gamma : C \to 2^P$. The catalog size r.
Output: A catalog $P' \subseteq P$ of size r s.t. $\sum_{c \in C} |P' \cap \Gamma(c)|$ is maximal.

The Catalog Problem (cont.)
Algorithm: Take the r most popular pages, i.e., the r pages with the largest number of interested customers.
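The objective $\sum_{c \in C} |P' \cap \Gamma(c)|$ equals the sum of the popularities of the chosen pages, which is why the greedy choice is optimal. A minimal Python sketch, assuming $\Gamma$ is given as a dict mapping each customer to its set of pages; `best_single_catalog` and `hits` are hypothetical names:

```python
from collections import Counter


def best_single_catalog(pages, interests, r):
    """Return r pages with the largest popularity (number of interested customers)."""
    popularity = Counter(p for wanted in interests.values() for p in wanted)
    # Sort by decreasing popularity; ties broken by page name for determinism.
    ranked = sorted(pages, key=lambda p: (-popularity[p], p))
    return set(ranked[:r])


def hits(interests, catalog):
    """Objective of the catalog problem: sum over customers of |P' ∩ Γ(c)|."""
    return sum(len(wanted & catalog) for wanted in interests.values())


pages = {"a", "b", "c", "d"}
interests = {"u": {"a", "b"}, "v": {"b", "c"}, "w": {"b", "d"}}
catalog = best_single_catalog(pages, interests, r=2)
print(sorted(catalog), hits(interests, catalog))  # ['a', 'b'] 4
```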

Catalog Segmentation

The k-Catalog Segmentation
Input: A set of customers C. A set of pages P. A function $\Gamma : C \to 2^P$. The catalog size r.
Output: k catalogs $P_1, \dots, P_k \subseteq P$ of size r each, s.t. $\sum_{c \in C} \max_{1 \le i \le k} |P_i \cap \Gamma(c)|$ is maximal.
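Once the catalogs are fixed, each customer simply takes the catalog it hits most, so the objective can be evaluated directly. A minimal sketch under the same dict-of-sets assumption (`segmentation_hits` is a hypothetical name):

```python
def segmentation_hits(interests, catalogs):
    """k-catalog objective: each customer c gets the catalog P_i maximizing |P_i ∩ Γ(c)|."""
    return sum(
        max(len(wanted & catalog) for catalog in catalogs)
        for wanted in interests.values()
    )


interests = {"u": {"a", "b"}, "v": {"c", "d"}}
print(segmentation_hits(interests, [{"a", "b"}, {"c", "d"}]))  # 4
print(segmentation_hits(interests, [{"a", "b"}, {"a", "b"}]))  # 2
```

In this toy example, two different catalogs double the number of hits compared with repeating one catalog, which is the point of segmentation.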

Representation as a Graph
We can consider the input as a bipartite graph $G = (C, P, E)$, where $E = \{(c, p) \mid c \in C,\ p \in \Gamma(c)\}$.
Then, our goal is to find k sets of vertices $P_1, \dots, P_k \subseteq P$ of size r each, and a partition of C into k sets $C_1, \dots, C_k$, s.t. $|E \cap ((C_1 \times P_1) \cup \dots \cup (C_k \times P_k))|$ is maximal.
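
Uniform Catalog Problem
Definition: A catalog problem is called uniform if there exists a number d such that the degree of every vertex $p \in P$ is d.
The maximum possible number of hits for a uniform catalog problem is krd, since each of the kr catalog pages has exactly d interested customers.
Thus, we can normalize the number of hits and define
$$\mathrm{sat}(G) = \max_{P_1,\dots,P_k,\,C_1,\dots,C_k} \frac{|E \cap ((C_1 \times P_1) \cup \dots \cup (C_k \times P_k))|}{krd}.$$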
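For toy uniform instances, $\mathrm{sat}(G)$ can be computed by exhaustive search. For fixed catalogs the best partition sends each customer to the catalog it hits most, so only the catalogs need to be enumerated. A minimal Python sketch, assuming the dict-of-sets encoding of $\Gamma$; `brute_force_sat` is a hypothetical name, and the search is exponential, so it is only meant for tiny inputs:

```python
from itertools import combinations, combinations_with_replacement


def brute_force_sat(pages, interests, k, r):
    """Normalized value sat(G) of a uniform instance, by exhaustive search."""
    degrees = {p: sum(p in wanted for wanted in interests.values()) for p in pages}
    if len(set(degrees.values())) != 1:
        raise ValueError("instance is not uniform")
    d = next(iter(degrees.values()))
    candidate_catalogs = [set(c) for c in combinations(sorted(pages), r)]
    best = 0
    # Catalogs may coincide, so choose k of them with replacement.
    for catalogs in combinations_with_replacement(candidate_catalogs, k):
        # Best partition for these catalogs: each customer takes its best catalog.
        value = sum(max(len(w & c) for c in catalogs) for w in interests.values())
        best = max(best, value)
    return best / (k * r * d)


pages = {"a", "b", "c", "d"}
interests = {"u": {"a", "b"}, "v": {"c", "d"}, "w": {"a", "c"}, "x": {"b", "d"}}
print(brute_force_sat(pages, interests, k=2, r=2))  # 0.75
```

Here every page has degree d = 2, so krd = 8; the best pair of catalogs, {a, b} and {c, d}, yields 6 hits.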

Hardness
Theorem (Kleinberg, Papadimitriou and Raghavan): It is NP-hard to compute the optimal k catalogs exactly.

Approximation
Proposition: Using the r most popular pages as each of the k catalogs gives an approximation factor of 1/k.
Proof: In the optimal solution, the customers assigned to some catalog account for at least 1/k of the hits. Thus, using only this catalog for all customers leaves us with at least 1/k of the optimal number of hits. Replacing this catalog by the r most popular pages can only increase the number of hits.
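The proposition can be sanity-checked by comparing the greedy solution (the r most popular pages used as every catalog) with a brute-force optimum on random tiny instances. A sketch under assumed parameters (5 pages, 5 customers, k in {2, 3}, r in {1, 2, 3}); all function names are hypothetical:

```python
import random
from collections import Counter
from itertools import combinations, combinations_with_replacement


def hits(interests, catalogs):
    """Each customer takes the catalog it hits most."""
    return sum(max(len(w & c) for c in catalogs) for w in interests.values())


def top_r(pages, interests, r):
    """The r most popular pages (ties broken by page label)."""
    popularity = Counter(p for w in interests.values() for p in w)
    return set(sorted(pages, key=lambda p: (-popularity[p], p))[:r])


def optimum(pages, interests, k, r):
    candidates = [set(c) for c in combinations(pages, r)]
    return max(hits(interests, cs) for cs in combinations_with_replacement(candidates, k))


def check_one_over_k(trials=200, seed=0):
    """Return True if greedy >= OPT / k on every sampled instance."""
    rng = random.Random(seed)
    pages = list(range(5))
    for _ in range(trials):
        interests = {c: set(rng.sample(pages, rng.randint(0, 4))) for c in range(5)}
        k, r = rng.randint(2, 3), rng.randint(1, 3)
        greedy = hits(interests, [top_r(pages, interests, r)] * k)
        if k * greedy < optimum(pages, interests, k, r):
            return False
    return True


print(check_one_over_k())  # True
```

Repeating the same catalog k times is equivalent to offering a single catalog, which is exactly the solution the proof compares against.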

Dense Instances
Kleinberg, Papadimitriou and Raghavan gave an approximation scheme for dense instances, i.e., instances in which each customer is interested in at least a constant fraction ε of the pages.

The PCP
A SAT instance $\Phi = (\varphi_1, \dots, \varphi_n)$ over 2 types of variables: X and Y.
The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.
The range of the variables $y \in Y$ is $\{0,1\}$.
Each $\varphi_i$ depends on exactly one $x \in X$ and one $y \in Y$, s.t. the value assigned to x determines the value of y. Thus, we can write it as a function $\varphi_{x \to y} : R_X \to \{0,1\}$.

The PCP (cont.)
It is NP-hard to distinguish between the following 2 cases:
Good: There exists an assignment A s.t. $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] = 1$.
Bad: For any assignment A, $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] < \tfrac12 + \delta$.
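
The Reduction
Given an instance Φ of the above PCP, let G be the following instance of the 2-catalog segmentation problem:
$P = \{(x, a, s) \mid x \in X,\ a \in R_X,\ s \in \{0,1\}\}$
$C = \{(y, b) \mid y \in Y,\ b \in \{0,1\}\}$
$(x, a, s) \sim (y, b)$ (customer (y, b) is interested in page (x, a, s)) $\iff \varphi_{x \to y} \in \Phi$ and $\varphi_{x \to y}(a) = b \oplus s$
$r = |X|$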
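A minimal Python sketch of this construction, assuming each constraint $\varphi_{x \to y}$ is given as a Python function on l-bit tuples stored under the key (x, y), with at most one constraint per pair; `build_reduction` and the toy constraints are hypothetical:

```python
from itertools import product


def build_reduction(X, Y, constraints, l):
    """Pages, customers' interest sets and catalog size r of the 2-catalog instance G."""
    values = list(product((0, 1), repeat=l))  # R_X = {0,1}^l
    pages = [(x, a, s) for x in X for a in values for s in (0, 1)]
    interests = {(y, b): set() for y in Y for b in (0, 1)}
    for (x, y), phi in constraints.items():
        for a in values:
            for s in (0, 1):
                # (x, a, s) ~ (y, b)  iff  phi(a) = b XOR s, i.e. b = phi(a) XOR s
                interests[(y, phi(a) ^ s)].add((x, a, s))
    return pages, interests, len(X)


constraints = {
    ("x1", "y1"): lambda a: a[0],
    ("x1", "y2"): lambda a: a[1],
    ("x2", "y1"): lambda a: a[0] ^ a[1],
    ("x2", "y2"): lambda a: a[0] & a[1],
}
pages, interests, r = build_reduction(["x1", "x2"], ["y1", "y2"], constraints, l=2)
print(len(pages), r)  # 16 2
```

Each page (x, a, s) is wanted by exactly one customer per constraint involving x, namely $(y, \varphi_{x \to y}(a) \oplus s)$. So when every x appears in the same number d of constraints (d = 2 in the toy instance), G is uniform, as the normalization by krd requires.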
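
Completeness
Theorem: If Φ is satisfiable then sat(G) = 1.
Proof: Let A be a satisfying assignment and consider the following segmentation: $\forall i \in \{0,1\}$, $P_i = \{(x, A(x), i) \mid x \in X\}$. $\forall y \in Y$, $(y, A(y))$ gets $P_0$ and $(y, \neg A(y))$ gets $P_1$.
The customers interested in the page $(x, A(x), i)$ are exactly the pairs $(y, \varphi_{x \to y}(A(x)) \oplus i) = (y, A(y) \oplus i)$, and these all get $P_i$. Thus, for every page in the catalogs, all the customers that are interested in it get it, and hence sat(G) = 1.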
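The completeness argument can be checked on a toy instance by counting the hits of this segmentation. A sketch, assuming a hypothetical 4-constraint instance (constraints stored as functions under keys (x, y)) and a satisfying assignment A found by hand; `completeness_hits` is a hypothetical name:

```python
def completeness_hits(X, constraints, A):
    """Hits of the segmentation built from an assignment A (k = 2 catalogs).

    P_i = {(x, A(x), i)}; customer (y, A(y)) gets P_0 and (y, 1 - A(y)) gets P_1.
    """
    catalogs = [{(x, A[x], i) for x in X} for i in (0, 1)]

    def catalog_of(y, b):
        return 0 if b == A[y] else 1

    total = 0
    for i, catalog in enumerate(catalogs):
        for (x, a, s) in catalog:
            # Customers interested in page (x, a, s): (y, phi(a) XOR s) for each phi_{x->y}.
            interested = {(y, phi(a) ^ s) for (x2, y), phi in constraints.items() if x2 == x}
            total += sum(catalog_of(y, b) == i for (y, b) in interested)
    return total


constraints = {
    ("x1", "y1"): lambda a: a[0],
    ("x1", "y2"): lambda a: a[1],
    ("x2", "y1"): lambda a: a[0] ^ a[1],
    ("x2", "y2"): lambda a: a[0] & a[1],
}
A = {"x1": (1, 0), "x2": (1, 0), "y1": 1, "y2": 0}  # satisfies all four constraints
print(completeness_hits(["x1", "x2"], constraints, A))  # 8
```

Here k = 2, r = |X| = 2 and every page has degree d = 2, so the 8 hits equal krd and sat(G) = 1.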

Soundness
We would like to show that for every ε > 0 there exists δ = δ(ε) > 0 s.t. if sat(G) > ½ + ε, then there exists an assignment A s.t. $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta$.
We would like to construct such an assignment from the catalogs.
Problem: A catalog might contain many pages for the same x with different assignments.
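
Refining the PCP
Solution: Changing the PCP.
Good: There exists an assignment A s.t. $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] = 1$.
Bad: For any assignment A, $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] < \tfrac12 + \delta$, and
$$\Pr_{x \in X}\Big[\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta\Big] < \gamma.$$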
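
Choosing One Catalog
Now, assume sat(G) > ½ + ε. Thus, for one of the catalogs, $P_{i'}$,
$$\Pr_{p \in P_{i'},\ c:(c,p) \in E}[c \text{ gets } P_{i'}] > \tfrac12 + \varepsilon,$$
and hence
$$\Pr_{p \in P_{i'}}\Big[\Pr_{c:(c,p) \in E}[c \text{ gets } P_{i'}] \ge \tfrac12 + \tfrac{\varepsilon}{2}\Big] \ge \tfrac{\varepsilon}{2}.$$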
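Both steps are averaging arguments. Since the instance is uniform, more than $(\tfrac12 + \varepsilon) \cdot 2rd$ hits are split between the two catalogs, so one catalog $P_{i'}$ receives more than $(\tfrac12 + \varepsilon) rd$ of them, and dividing by rd gives the first inequality. For the second, the notation f(p) and q below is introduced here only for the derivation:

```latex
\begin{aligned}
f(p) &:= \Pr_{c:(c,p)\in E}[c \text{ gets } P_{i'}], \qquad
q := \Pr_{p \in P_{i'}}\Big[f(p) \ge \tfrac12 + \tfrac{\varepsilon}{2}\Big],\\
\tfrac12 + \varepsilon &< \mathbb{E}_{p \in P_{i'}}[f(p)]
  \le q \cdot 1 + (1-q)\Big(\tfrac12 + \tfrac{\varepsilon}{2}\Big)
  \le q + \tfrac12 + \tfrac{\varepsilon}{2}
  \quad\Longrightarrow\quad q > \tfrac{\varepsilon}{2}.
\end{aligned}
```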
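
Choosing a Subset of Pages
Let $P''_{i'} = \{p \in P_{i'} \mid \Pr_{c:(c,p) \in E}[c \text{ gets } P_{i'}] \ge \tfrac12 + \tfrac{\varepsilon}{2}\}$.
Thus, $|P''_{i'}| \ge \tfrac{\varepsilon}{2} |X|$.
Now, let us keep only one page in $P''_{i'}$ for each $x \in X$, and denote the set by $P'''_{i'}$. Then $|P'''_{i'}| \ge 2^{-l} \tfrac{\varepsilon}{2} |X|$.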

Enforcing the Same s
There exists $s' \in \{0,1\}$ s.t. $|\{(x, a, s') \mid (x, a, s') \in P'''_{i'}\}| \ge 2^{-(l+1)} \tfrac{\varepsilon}{2} |X|$.
Denote the set of the corresponding x's by X'.
For an appropriate value of γ, $|X'| \ge \gamma |X|$.

Constructing an Assignment
We would like to construct an assignment as follows:
$\forall x \in X'$, assign the value of the appropriate page.
$\forall y \in Y$, if $(y, b)$ gets the catalog $P_{i'}$, assign the value $b \oplus s'$ to y.
Thus, $\forall x \in X'$, at least a $\tfrac12 + \tfrac{\varepsilon}{2}$ fraction of the clauses $\varphi_{x \to y}$ are satisfied.
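A customer (y, b) interested in a kept page (x, a, s') satisfies $\varphi_{x \to y}(a) = b \oplus s'$, so giving y the value $b \oplus s'$ satisfies that clause whenever (y, b) gets $P_{i'}$. The rule can be sketched in Python as follows, assuming pages and customers are the tuples of the reduction; the function name and the choice to keep the first value seen when both (y, 0) and (y, 1) get the catalog are illustrative assumptions:

```python
def assignment_from_catalog(kept_pages, gets_catalog, s_prime):
    """Read an assignment off the catalog P_{i'}.

    kept_pages:   pages (x, a, s') of P'''_{i'} with the common bit s', one per x
    gets_catalog: customers (y, b) that get the catalog P_{i'}
    Returns (assignment to X', assignment to Y, set of conflicting y's).
    """
    A_X = {x: a for (x, a, s) in kept_pages if s == s_prime}
    A_Y, conflicts = {}, set()
    for (y, b) in sorted(gets_catalog):
        value = b ^ s_prime
        if y in A_Y and A_Y[y] != value:
            # Both (y, 0) and (y, 1) get P_{i'}; keep the first value (arbitrary choice).
            conflicts.add(y)
        else:
            A_Y[y] = value
    return A_X, A_Y, conflicts


A_X, A_Y, conflicts = assignment_from_catalog(
    kept_pages=[("x1", (1, 0), 1)],
    gets_catalog={("y1", 0), ("y2", 0), ("y2", 1)},
    s_prime=1,
)
print(A_X, A_Y, conflicts)  # {'x1': (1, 0)} {'y1': 1, 'y2': 1} {'y2'}
```

The non-empty `conflicts` set in this example is exactly the obstacle raised by the problem that (y, 0) and (y, 1) may get the same catalog.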

Problem
For a variable $y \in Y$, both $(y, 0)$ and $(y, 1)$ might get the same catalog. Thus, we cannot obtain an assignment to Y as we would like to.

Taking Subsets of x's
Instead of taking one page for each (x, a, s), we take a page for every tuple of:
A subset of m x's.
An assignment to every x in the subset.
A bit s.
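
The PCP
$\Phi = (\varphi_1, \dots, \varphi_n)$ over variables X and Y, s.t. it is NP-hard to distinguish between:
Good: There exists an assignment A s.t. $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] = 1$.
Bad: For any assignment A,
$$\Pr_{x \in X}\Big[\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta\Big] < \gamma.$$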

$\mathrm{par}[\Phi,k]$ - Definitions
For a 3SAT formula Φ over boolean variables Y, let $Y^{(k)}$ be the set of all k-subsets of Y, and let $\Phi^{(k)}$ be the set of all k-subsets of Φ.
$\forall V \in Y^{(k)}$, let $S_V$ be the set of all assignments to V.
$\forall C \in \Phi^{(k)}$, let $S_C$ be the set of all satisfying assignments to C.

$\mathrm{par}[\Phi,k]$ - Definitions (cont.)
$\forall V \in Y^{(k)}, C \in \Phi^{(k)}$, let $V \sqsubset C$ if V is a choice of one variable of each clause in C.
$\forall V \in Y^{(k)}, C \in \Phi^{(k)}$ s.t. $V \sqsubset C$, let $a|_V$ denote the natural restriction of an $a \in S_C$ to $S_V$.

$\mathrm{par}[\Phi,k]$
Definition: For a 3SAT formula Φ over boolean variables Y, denote by $\mathrm{par}[\Phi,k]$ the following instance:
There are 2 types of variables:
W: $x[V]$ for every $V \in Y^{(k)}$, over $S_V$.
Z: $x[C]$ for every $C \in \Phi^{(k)}$, over $S_C$.
There is a local test $\varphi[C,V]$ for every $V \sqsubset C$ that accepts iff $x[C]|_V = x[V]$.
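A small Python sketch of these definitions, assuming clauses are tuples of DIMACS-style integer literals (3 means $y_3$, -3 means its negation) and that V is listed in the same order as the clauses of C; all function names and the toy formula are hypothetical:

```python
from itertools import product


def variables(clauses):
    return sorted({abs(lit) for clause in clauses for lit in clause})


def satisfying_assignments(clauses):
    """S_C: assignments to the variables of C that satisfy every clause of C."""
    vs = variables(clauses)
    result = []
    for bits in product((False, True), repeat=len(vs)):
        a = dict(zip(vs, bits))
        if all(any(a[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            result.append(a)
    return result


def is_choice(V, clauses):
    """V ⊏ C: V picks one variable from each clause of C (V listed in clause order)."""
    return len(V) == len(clauses) and all(
        v in {abs(lit) for lit in clause} for v, clause in zip(V, clauses)
    )


def local_test(x_C, x_V, V):
    """The test on x[C] and x[V]: accept iff x[C]|_V = x[V]."""
    return {v: x_C[v] for v in V} == x_V


C = ((1, 2, -3), (-1, 3, 4))  # k = 2 clauses
V = (2, 4)
S_C = satisfying_assignments(C)
print(len(S_C), is_choice(V, C))  # 12 True
```

In the toy example, 4 of the 16 assignments to $y_1, \dots, y_4$ falsify one of the two clauses, which leaves 12 satisfying assignments in $S_C$.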

$\mathrm{par}[\Phi,k]$ (cont.)
Definition: For a set of boolean clauses Φ, let sat(Φ) denote the maximal fraction of clauses of Φ that can be satisfied simultaneously.
Theorem:
If sat(Φ) = 1 then $\mathrm{sat}(\mathrm{par}[\Phi,k]) = 1$.
$\mathrm{sat}(\mathrm{par}[\Phi,k]) \le \mathrm{sat}(\Phi)^{c \cdot k}$ for some c > 0.

Long Code
Definition: An R-long-code has one bit for each boolean function $f : [R] \to \{0,1\}$. The codeword of a value $a \in [R]$ sets the bit of f to $f(a)$.
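A minimal Python sketch of long-code encoding, assuming $[R] = \{0, \dots, R-1\}$ and representing each function f by its truth table $(f(0), \dots, f(R-1))$; `long_code` is a hypothetical name:

```python
from itertools import product


def long_code(a, R):
    """R-long-code of a: one bit per f: [R] -> {0,1}, equal to f(a)."""
    return {f: f[a] for f in product((0, 1), repeat=R)}


word = long_code(1, 3)
print(len(word), word[(0, 1, 0)], word[(1, 0, 1)])  # 8 1 0
```

The codeword has $2^R$ bits, so for $R = |R_X| = 2^l$ it has $2^{2^l}$ bits; the exponential length is affordable because l is a constant in the reduction.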

The PCP of [ST]
For any bipartite graph $G = ([k], [k], E)$ we construct a SAT instance Φ(G) that contains one boolean function for every choice of:
$z \in Z$,
$v_1, \dots, v_k \in LC[z]$,
$w_1, \dots, w_k \in W$, s.t. $\forall 1 \le i \le k$, $w_i \sqsubset z$,
$u_1, \dots, u_k$, s.t. $\forall 1 \le i \le k$, $u_i \in LC[w_i]$,
$k^2$ perturbation functions $p_{1,1}, \dots, p_{k,k}$.
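
The PCP of [ST] (cont.)
$\varphi(v_1, \dots, v_k, u_1, \dots, u_k, p_{1,1}, \dots, p_{k,k}) = \mathrm{TRUE} \iff \forall (i,j) \in E$, $v_i \oplus u_j =$ ‘$v_i \oplus u_j \oplus p_{i,j}$’.
Denote $p_G = \Pr_{v_i,\, u_j,\, p_{s,t}}[\varphi(v_1, \dots, v_k, u_1, \dots, u_k, p_{1,1}, \dots, p_{k,k}) = \mathrm{TRUE}]$.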

The PCP of [ST] (cont.)
Theorem: $\forall \varepsilon > 0$, it is NP-hard to distinguish between the following 2 cases:
Good: $\forall G = ([k], [k], E)$, $p_G > (1 - \varepsilon)^{|E|}$.
Bad: $\forall G = ([k], [k], E)$, $p_G < 2^{-|E|}$.

Our PCP
A SAT instance $\Phi = (\varphi_1, \dots, \varphi_n)$ over 2 types of variables: X and Y.
The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.
The range of the variables $y \in Y$ is $\{0,1\}$.
Each $\varphi_i$ is of the type $\varphi_{x \to y} : R_X \to \{0,1\}$.

Our PCP (cont.)
Let k = l/2. Given an instance Φ(G) as above, we construct an instance as follows:
There is a variable $x \in X$ for every test $\varphi \in \Phi(G)$. An assignment to x is an assignment to the bits $v_1, \dots, v_k, u_1, \dots, u_k$.
$Y = LC[W]$.
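
Our PCP (cont.)
Theorem: $\forall \varepsilon, \delta > 0$ and for some constant $c = c(\delta) > 0$, it is NP-hard to distinguish between:
Good: There exists an assignment A s.t. $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] = 1$.
Bad: For any assignment A,
$$\Pr_{x \in X}\Big[\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta\Big] < 2^{-c l^2}.$$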

Our PCP (cont.)
Lemma: If there exists an assignment A s.t.
$$\Pr_{x \in X}\Big[\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta\Big] \ge \gamma,$$
then there exists a graph $G = (V, U, E)$ and an assignment to LC[W] and LC[Z] s.t. $p_G \ge 2^{-|E|}$.

Our PCP (cont.)
Proof: Assume there exists an assignment A s.t.
$$\Pr_{x \in X}\Big[\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta\Big] \ge \gamma.$$
We assign the bits of LC[W] the values assigned to them by A, and the bits of LC[Z] are assigned random values.

Our PCP (cont.)
We now have to construct a graph G that satisfies the lemma.
We call an x good if $\Pr_{\varphi_{x \to y}}[\varphi_{x \to y}(A(x)) = A(y)] \ge \tfrac12 + \delta$.
Let x be good and let $V_0, U_0$ be the corresponding vertices.
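
Our PCP (cont.)
[Figure: vertex sets $V_0$ and $U_0$, with $V_1 \subseteq V_0$ and $U_0$ split into $U_1$ and $U_2$.]
$V_1$: the set of vertices in $V_0$ for which at least $\tfrac12 + \tfrac{\delta}{2}$ of their edges are consistent with x; $|V_1| \ge \tfrac{\delta}{2} k$.
$U_1$: the set of vertices in $U_0$ that are consistent with x.
$U_2 = U_0 \setminus U_1$.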

Our PCP (cont.)
Proposition: There exists $i \in \{1,2\}$ s.t. $|U_i| \ge \tfrac{\delta}{4} k$, and at least $\tfrac12 + \tfrac{\delta}{4}$ of the edges between $U_i$ and $V_1$ are consistent with x.

Our PCP (cont.)
[Figure: the sets $V_1$, $U_1$ and $U_2 = U_0 \setminus U_1$ as defined above, with subsets $V' \subseteq V_1$ and $U' \subseteq U_1$ highlighted.]

Our PCP (cont.)
[Figure: $V_1$ shown against $U_1$ and against $U_2$; the definitions of $V_1$, $U_1$ and $U_2$ are as above.]

Our PCP (cont.)
Let $U' \subseteq U_i$, $V' \subseteq V_1$ s.t. $|U'| = |V'| = \tfrac{\delta}{4} k$, and at least $\tfrac12 + \tfrac{\delta}{4}$ of the edges between U' and V' are consistent with x.
There are fewer than $2^{2k}$ possibilities to choose U' and V' $\Rightarrow$ there is a subset X' containing at least a $2^{-2k}$ fraction of the good x's (and thus of size at least $2^{-2k} \gamma |X|$), all with the same choice of U' and V'.

Our PCP (cont.)
Let X'' be the subset of variables $x \in X'$ that are consistent with the random assignment to LC[Z].
The probability that A(x) is consistent with a random assignment to LC[Z] is $2^{-k}$ $\Rightarrow$ the expected size of X'' is $2^{-k} |X'|$.
Therefore, there exists an assignment to LC[Z] s.t. $|X''| \ge 2^{-3k} \gamma |X|$.
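The last step is the first-moment method. Writing γ for the lower bound on the fraction of good x's, so that $|X'| \ge 2^{-2k} \gamma |X|$, linearity of expectation gives the following, and since a random variable attains a value at least its expectation with positive probability, some fixed assignment to LC[Z] achieves the bound:

```latex
\mathbb{E}\big[|X''|\big]
  = \sum_{x \in X'} \Pr\big[A(x) \text{ is consistent with } LC[Z]\big]
  = 2^{-k} |X'|
  \ge 2^{-k} \cdot 2^{-2k} \gamma |X|
  = 2^{-3k} \gamma |X|.
```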

Our PCP (cont.)
Let $\mathcal{G}$ be the multiset of all graphs $G = (V', U', E)$ corresponding to the variables $x \in X''$, where E is the set of all edges between U' and V' that are consistent with x.
$|\mathcal{G}| \ge 2^{-3k} \gamma |X|$.
$\forall G \in \mathcal{G}$, $|E| \ge (\tfrac12 + \tfrac{\delta}{4})(\tfrac{\delta}{4} k)^2$.
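
Our PCP (cont.)
Lemma: Let $\mathcal{G}$ be a multiset of bipartite graphs on $[k'] \times [k']$, s.t. each graph in $\mathcal{G}$ has at least $(\tfrac12 + \delta') k'^2$ edges. Then, $\forall t \le \tfrac{\delta'}{2} k'^2$, $\exists G = ([k'], [k'], E)$ s.t. $|E| \ge t$ and
$$\Pr_{G' = ([k'], [k'], E') \in \mathcal{G}}[E \subseteq E'] \ge \Big(\frac{1 + \delta'}{2}\Big)^t.$$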
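One way to see a bound of this form (an illustrative averaging argument, not necessarily the proof intended in the slides): pick E as a uniformly random t-subset of the $N = k'^2$ possible edges. For any fixed $G'$ with $|E'| \ge (\tfrac12 + \delta')N$ and $t \le \tfrac{\delta'}{2} N$,

```latex
\Pr_{E}[E \subseteq E']
  = \frac{\binom{|E'|}{t}}{\binom{N}{t}}
  = \prod_{j=0}^{t-1} \frac{|E'| - j}{N - j}
  \ge \Big(\frac{|E'| - t}{N}\Big)^{t}
  \ge \Big(\tfrac12 + \delta' - \tfrac{\delta'}{2}\Big)^{t}
  = \Big(\frac{1 + \delta'}{2}\Big)^{t}.
```

Averaging over $G' \in \mathcal{G}$ and exchanging the order of the two expectations, some fixed t-subset E satisfies $\Pr_{G'}[E \subseteq E'] \ge \big(\tfrac{1+\delta'}{2}\big)^t$.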

Our PCP (cont.)
By the above lemma, for $k' = \tfrac{\delta}{4} k$ and $\delta' = \tfrac{\delta}{2}$, $\exists G = ([\tfrac{\delta}{4} k], [\tfrac{\delta}{4} k], E)$ s.t. $|E| = t = c' (\tfrac{\delta}{4} k)^2$, where $c' < \tfrac{\delta}{4}$, and all the edges of this graph are consistent in at least a $2^{-3k} \gamma (\tfrac12 + \tfrac{\delta}{4})^t$ fraction of the variables in X.
Considering this graph over the vertex sets U and V gives the desired result.