Joint work with Shmuel Safra


Motivation

The Catalog Problem

Input: A set of customers $C$. A set of pages $P$. A function $\varphi : C \to 2^P$ giving the pages each customer is interested in. The catalog size $r$.

Output: A catalog $P' \subseteq P$ of size $r$ s.t. $\sum_{c \in C} |\varphi(c) \cap P'|$ is maximal.

The Catalog Problem (cont.)

Algorithm: Take the $r$ most popular pages.
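The algorithm works because the objective is a sum, over the chosen pages, of the number of customers interested in each page, so the pages can be ranked independently. A minimal sketch in Python follows; the dictionary layout, the function name `most_popular_catalog` and the toy data are assumptions made for illustration, not part of the talk.

```python
from collections import Counter

def most_popular_catalog(interests, r):
    """Return the r pages liked by the most customers, and the hits they collect.

    `interests` maps each customer to the set of pages that customer likes.
    """
    popularity = Counter(page for liked in interests.values() for page in liked)
    # Sort by descending popularity, breaking ties by page name for determinism.
    ranked = sorted(popularity, key=lambda page: (-popularity[page], page))
    catalog = set(ranked[:r])
    hits = sum(len(liked & catalog) for liked in interests.values())
    return catalog, hits

# Hypothetical toy instance: four customers, four pages.
interests = {
    "alice": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "carol": {"p2", "p4"},
    "dave": {"p3"},
}
catalog, hits = most_popular_catalog(interests, r=2)
print(catalog, hits)
```

Here page p2 is liked by three customers and p3 by two, so these two pages form the catalog.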

Catalog Segmentation

The k-Catalog Segmentation

Input: A set of customers $C$. A set of pages $P$. A function $\varphi : C \to 2^P$. The catalog size $r$.

Output: $k$ catalogs $P_1, \dots, P_k \subseteq P$ of size $r$ each, s.t. $\sum_{c \in C} \max_{i \le k} |\varphi(c) \cap P_i|$ is maximal.
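To make the objective concrete, the sketch below evaluates a given choice of catalogs: every customer is counted with the catalog that contains the most pages they like. The function name `segmentation_hits` and the toy data are assumptions made for illustration.

```python
def segmentation_hits(interests, catalogs):
    """Total hits when every customer receives the catalog that suits them best."""
    return sum(
        max(len(liked & catalog) for catalog in catalogs)
        for liked in interests.values()
    )

# Hypothetical instance with two clearly separated customer groups.
interests = {
    "alice": {"p1", "p2"},
    "bob": {"p1", "p2"},
    "carol": {"p3", "p4"},
    "dave": {"p3", "p4"},
}
one_catalog_twice = segmentation_hits(interests, [{"p1", "p2"}, {"p1", "p2"}])
two_segments = segmentation_hits(interests, [{"p1", "p2"}, {"p3", "p4"}])
print(one_catalog_twice, two_segments)
```

On this instance, sending the same catalog to everyone collects half the hits that two tailored catalogs collect.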

Representation as a Graph

We can consider the input as a bipartite graph $G = (C, P, E)$, where $E = \{ (c,p) \mid c \in C,\ p \in \varphi(c) \}$.

Then, our goal is to find $k$ sets of vertices $P_1, \dots, P_k \subseteq P$ of size $r$ each, and a partition of $C$ into $k$ sets $C_1, \dots, C_k$, s.t. $|E \cap ((C_1 \times P_1) \cup \dots \cup (C_k \times P_k))|$ is maximal.

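Uniform Catalog Problem

Definition: A catalog problem is called uniform if there exists a number $d$ such that the degree of every vertex $p \in P$ is $d$.

The maximum possible number of hits for a uniform catalog problem is $krd$: there are $k$ catalogs of $r$ pages each, and each page is of interest to exactly $d$ customers.

Thus, we can normalize the number of hits and define

$\mathrm{sat}(G) = \max \frac{|E \cap ((C_1 \times P_1) \cup \dots \cup (C_k \times P_k))|}{k\,r\,d}$,

where the maximum is over all catalogs $P_1, \dots, P_k$ of size $r$ and all partitions $C_1, \dots, C_k$ of $C$.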
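As an illustration of the normalization, the sketch below checks that an instance is uniform and divides the hits of one given segmentation by k·r·d. It evaluates a single segmentation only; sat(G) itself is the maximum of this quantity over all segmentations. The function names and toy data are assumptions, and pages that no customer likes are assumed not to occur.

```python
def page_degrees(interests):
    """Number of interested customers per page."""
    degrees = {}
    for liked in interests.values():
        for page in liked:
            degrees[page] = degrees.get(page, 0) + 1
    return degrees

def normalized_hits(interests, catalogs, r):
    """Hits of one segmentation divided by k*r*d; requires a uniform instance."""
    degrees = set(page_degrees(interests).values())
    if len(degrees) != 1:
        raise ValueError("instance is not uniform")
    d = degrees.pop()
    hits = sum(max(len(liked & c) for c in catalogs) for liked in interests.values())
    return hits / (len(catalogs) * r * d)

# Hypothetical uniform instance: every page is liked by exactly d = 2 customers.
interests = {
    "alice": {"p1", "p2"},
    "bob": {"p1", "p2"},
    "carol": {"p3", "p4"},
    "dave": {"p3", "p4"},
}
perfect = normalized_hits(interests, [{"p1", "p2"}, {"p3", "p4"}], r=2)
half = normalized_hits(interests, [{"p1", "p2"}, {"p1", "p2"}], r=2)
print(perfect, half)
```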

Hardness

Theorem (Kleinberg, Papadimitriou and Raghavan): It is NP-hard to precisely compute the optimal $k$ catalogs.

Approximation

Proposition: Taking the $r$ most popular pages in all $k$ catalogs gives an approximation factor of $1/k$.

Proof: In the optimal solution, there is a catalog that gives at least $1/k$ of the hits. Thus, using only this catalog leaves us with at least $1/k$ of the hits. Replacing this catalog by the $r$ most popular pages can only increase the number of hits.
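The proposition can be checked on a tiny instance by exhaustive search. The sketch below compares the "most popular pages" baseline with a brute-force optimum; the helper names and the toy data are assumptions, and the brute force is only feasible for very small inputs.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

def hits(interests, catalogs):
    """Each customer is served by the catalog containing most of their pages."""
    return sum(max(len(liked & c) for c in catalogs) for liked in interests.values())

def popular_pages(interests, r):
    """The r pages liked by the most customers (ties broken by name)."""
    popularity = Counter(p for liked in interests.values() for p in liked)
    return set(sorted(popularity, key=lambda p: (-popularity[p], p))[:r])

def brute_force_optimum(interests, k, r):
    """Best value over all choices of k catalogs of size r (tiny inputs only)."""
    pages = sorted(set().union(*interests.values()))
    candidates = [set(c) for c in combinations(pages, r)]
    return max(
        hits(interests, chosen)
        for chosen in combinations_with_replacement(candidates, k)
    )

# Hypothetical instance with two disjoint customer groups.
interests = {
    "alice": {"p1", "p2"},
    "bob": {"p1", "p2"},
    "carol": {"p3", "p4"},
    "dave": {"p3", "p4"},
}
k, r = 2, 2
baseline = hits(interests, [popular_pages(interests, r)] * k)
optimum = brute_force_optimum(interests, k, r)
print(baseline, optimum)
```

On this instance the baseline collects exactly 1/k of the optimum, so the factor in the proposition cannot be improved for this algorithm.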

Dense Instances

Kleinberg, Papadimitriou and Raghavan gave an approximation scheme for dense instances, i.e. instances in which each customer is interested in at least an $\varepsilon$ fraction of the pages.

The PCP

A SAT instance $\Psi = (\psi_1, \dots, \psi_n)$ over 2 types of variables: $X$ and $Y$.
The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.
The range of the variables $y \in Y$ is $\{0,1\}$.
Each $\psi_i$ depends on exactly one $x \in X$ and one $y \in Y$, s.t. the value assigned to $x$ determines the value of $y$. Thus, we can write it as a function $\psi_{x \to y} : R_X \to \{0,1\}$.

The PCP (cont.)

It is NP-hard to distinguish between the following 2 cases:

Good: There exists an assignment $A$ s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment $A$, $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] < \frac12 + \varepsilon$.

The Reduction

Given an instance $\Psi$ for the above PCP, let $G$ be the following instance for the 2-catalog segmentation problem:

$P = \{ (x, a, s) \mid x \in X,\ a \in R_X,\ s \in \{0,1\} \}$
$C = \{ (y, b) \mid y \in Y,\ b \in \{0,1\} \}$
$(x, a, s) \in \varphi((y, b)) \iff \psi_{x \to y} \in \Psi$ and $\psi_{x \to y}(a) = b \oplus s$
$r = |X|$
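The construction can be sketched in Python as follows. Pages are triples (x, a, s), customers are pairs (y, b), and customer (y, b) is interested in page (x, a, s) exactly when a test on (x, y) exists and maps a to b XOR s. The representation of tests as a dictionary of functions, the name `build_catalog_instance` and the toy PCP instance are assumptions made for illustration.

```python
from itertools import product

def build_catalog_instance(X, Y, tests, l):
    """Build the 2-catalog segmentation instance from a toy PCP.

    `tests` maps a pair (x, y) to a function psi taking a tuple in {0,1}^l
    and returning the bit that x's value forces on y.
    """
    R_X = list(product((0, 1), repeat=l))
    pages = [(x, a, s) for x in X for a in R_X for s in (0, 1)]
    customers = [(y, b) for y in Y for b in (0, 1)]
    interests = {customer: set() for customer in customers}
    for (x, y), psi in tests.items():
        for a in R_X:
            for s in (0, 1):
                b = psi(a) ^ s  # psi(a) = b XOR s  <=>  b = psi(a) XOR s
                interests[(y, b)].add((x, a, s))
    return pages, customers, interests, len(X)  # catalog size r = |X|

# Hypothetical PCP instance: every x appears in exactly two tests.
X, Y = ["x1", "x2"], ["y1", "y2"]
tests = {
    ("x1", "y1"): lambda a: a[0],
    ("x1", "y2"): lambda a: a[1],
    ("x2", "y1"): lambda a: a[0] ^ a[1],
    ("x2", "y2"): lambda a: a[1],
}
pages, customers, interests, r = build_catalog_instance(X, Y, tests, l=2)
print(len(pages), len(customers), r)
```

Each page (x, a, s) gets exactly one interested customer per test involving x, so the instance is uniform whenever every x appears in the same number of tests.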

Completeness

Theorem: If $\Psi$ is satisfiable then $\mathrm{sat}(G) = 1$.

Proof: Let $A$ be a satisfying assignment and consider the following segmentation:
$\forall i \in \{0,1\}$, $P_i = \{ (x, A(x), i) \mid x \in X \}$.
$\forall y \in Y$, $(y, A(y))$ gets $P_0$ and $(y, \neg A(y))$ gets $P_1$.
Thus, for every page in the catalogs, all the customers that are interested in it get it, and hence $\mathrm{sat}(G) = 1$.
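The completeness argument can be verified on a small example. The sketch below rebuilds a toy reduction instance, takes a satisfying assignment A, forms the two catalogs from the proof and counts hits. The toy tests, the assignment and all names are assumptions made for illustration.

```python
from itertools import product

def interests_from_tests(Y, tests, l):
    """Customer (y, b) likes page (x, a, s) iff psi_{x->y}(a) = b XOR s."""
    R_X = list(product((0, 1), repeat=l))
    interests = {(y, b): set() for y in Y for b in (0, 1)}
    for (x, y), psi in tests.items():
        for a in R_X:
            for s in (0, 1):
                interests[(y, psi(a) ^ s)].add((x, a, s))
    return interests

X, Y, l = ["x1", "x2"], ["y1", "y2"], 2
tests = {
    ("x1", "y1"): lambda a: a[0],
    ("x1", "y2"): lambda a: a[1],
    ("x2", "y1"): lambda a: a[0] ^ a[1],
    ("x2", "y2"): lambda a: a[1],
}
# A satisfying assignment for the toy tests above.
A = {"x1": (1, 0), "x2": (1, 0), "y1": 1, "y2": 0}
assert all(psi(A[x]) == A[y] for (x, y), psi in tests.items())

interests = interests_from_tests(Y, tests, l)
catalogs = [{(x, A[x], i) for x in X} for i in (0, 1)]
# (y, A(y)) gets catalog 0; the customer with the negated bit gets catalog 1.
assigned = {
    (y, b): catalogs[0] if b == A[y] else catalogs[1]
    for y in Y for b in (0, 1)
}
hits = sum(len(interests[c] & assigned[c]) for c in interests)
d = 2  # every x appears in exactly two tests in this toy instance
sat = hits / (2 * len(X) * d)
print(hits, sat)
```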

Soundness

We would like to show that $\forall \delta$, $\exists \varepsilon = \varepsilon(\delta)$, $\gamma = \gamma(\delta)$ s.t. if $\mathrm{sat}(G) > \frac12 + \delta$ then there exists an assignment $A$ s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] \ge \frac12 + \varepsilon$.

We would like to construct an assignment according to the catalogs.

Problem: A catalog might contain many pages for the same $x$ with different assignments.

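Refining the PCP

Solution: Changing the PCP.

Good: There exists an assignment $A$ s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment $A$, the earlier condition $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] < \frac12 + \varepsilon$ is replaced by

$\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \frac12 + \varepsilon \right] < \gamma$.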

Choosing One Catalog

Now, assume $\mathrm{sat}(G) > \frac12 + \delta$. Thus, for one of the catalogs, $P_{i'}$,

$\Pr_{p \in P_{i'},\ c \,:\, p \in \varphi(c)}[c \in C_{i'}] > \frac12 + \delta$

and hence

$\Pr_{p \in P_{i'}}\left[ \Pr_{c \,:\, p \in \varphi(c)}[c \in C_{i'}] > \frac12 + \frac{\delta}{2} \right] > \frac{\delta}{2}$.
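The second bound follows from the first by an averaging argument. A sketch, using names introduced here for illustration: $\delta$ is the slack in $\mathrm{sat}(G) > \frac12 + \delta$, $q_p$ is the fraction of the customers interested in page $p$ that receive the chosen catalog, and $\mu$ is the fraction of pages $p$ in that catalog with $q_p > \frac12 + \frac{\delta}{2}$. It assumes the instance is uniform, so that a random edge leaving the catalog is the same as a random page followed by a random interested customer.

$$
\frac12 + \delta \;<\; \mathbb{E}_{p}[q_p] \;\le\; \mu \cdot 1 + (1 - \mu)\left(\frac12 + \frac{\delta}{2}\right) \;\le\; \mu + \frac12 + \frac{\delta}{2}
\quad\Longrightarrow\quad \mu > \frac{\delta}{2}.
$$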

Choosing a Subset of Pages

Let $P'_{i'} = \{ p \in P_{i'} \mid \Pr_{c \,:\, p \in \varphi(c)}[c \in C_{i'}] > \frac12 + \frac{\delta}{2} \}$.

Thus, $|P'_{i'}| \ge \frac{\delta}{2} |X|$.

Now, let us keep only one page in $P'_{i'}$ for each $x \in X$, and denote the set by $P''_{i'}$. Then $|P''_{i'}| \ge 2^{-l} \frac{\delta}{2} |X|$.

Enforcing the Same s

$\exists s' \in \{0,1\}$ s.t. $|\{ (x, a, s') \mid (x, a, s') \in P''_{i'} \}| \ge 2^{-(l+1)} \frac{\delta}{2} |X|$.

Denote the set of the corresponding $x$'s by $X'$.

For an appropriate value of $\gamma$, $|X'| \ge \gamma |X|$.

Constructing an Assignment

We would like to construct an assignment as follows:
$\forall x \in X'$, assign the value of the appropriate page.
$\forall y \in Y$, if $(y, b)$ gets the catalog $P_{i'}$, assign the value $b \oplus s'$ to $y$.

Thus, $\forall x \in X'$, $\frac12 + \frac{\delta}{2}$ of the clauses $\psi_{x \to y}$ are satisfied.

Problem

For a variable $y \in Y$, both $(y, 0)$ and $(y, 1)$ might get the same catalog. Thus, we cannot obtain an assignment to $Y$ as we would like to.

Taking Subsets of x's

Instead of taking one page for each $(x, a, s)$, we take a page for every tuple of:
A subset of $m$ $x$'s
An assignment to this subset
A bit $s$

The PCP

$\Psi = (\psi_1, \dots, \psi_n)$ over variables $X$ and $Y$, s.t. it is NP-hard to distinguish between:

Good: There exists an assignment $A$ s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment $A$, $\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \frac12 + \varepsilon \right] < \gamma$.

$\mathrm{par}[\Phi,k]$ - Definitions

For a 3SAT formula $\Phi$ over boolean variables $Y$, let $Y^{(k)}$ be the set of all $k$-subsets of $Y$, and let $\Phi^{(k)}$ be the set of all $k$-subsets of $\Phi$.

$\forall V \in Y^{(k)}$, let $S_V$ be the set of all assignments to $V$.

$\forall C \in \Phi^{(k)}$, let $S_C$ be the set of all satisfying assignments to $C$.

$\mathrm{par}[\Phi,k]$ - Definitions (cont.)

$\forall V \in Y^{(k)}$, $C \in \Phi^{(k)}$, let $V \sqsubseteq C$ if $V$ is a choice of one variable of each clause in $C$.

$\forall V \in Y^{(k)}$, $C \in \Phi^{(k)}$ s.t. $V \sqsubseteq C$, let $a|_V$ denote the natural restriction of an $a \in S_C$ to $S_V$.

$\mathrm{par}[\Phi,k]$

Definition: For a 3SAT formula $\Phi$ over boolean variables $Y$, denote by $\mathrm{par}[\Phi,k]$ the following instance:

There are 2 types of variables:
W: $x[V]$ for every $V \in Y^{(k)}$, over $S_V$
Z: $x[C]$ for every $C \in \Phi^{(k)}$, over $S_C$

There is a local test $\psi[C,V]$ for every $V \sqsubseteq C$ that accepts iff $x[C]|_V = x[V]$.
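The ingredients of this construction can be enumerated directly for a small formula. The sketch below lists the k-subsets of clauses, the satisfying assignments of one such subset, the variable sets obtained by choosing one variable per clause, and the local consistency test. The clause encoding and all names are assumptions made for illustration; in particular it assumes the chosen variables must be distinct, since the chosen set has to be a k-subset of the variables.

```python
from itertools import combinations, product

def satisfying_assignments(clauses):
    """All assignments to the variables of `clauses` that satisfy every clause.

    A clause is a tuple of literals (variable, wanted_value).
    """
    variables = sorted({var for clause in clauses for var, _ in clause})
    result = []
    for bits in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, bits))
        if all(any(a[var] == want for var, want in clause) for clause in clauses):
            result.append(a)
    return result

def variable_choices(clauses):
    """All k-subsets V obtained by picking one variable from each clause."""
    picks = product(*[[var for var, _ in clause] for clause in clauses])
    k = len(clauses)
    return {frozenset(p) for p in picks if len(set(p)) == k}

def local_test(x_C, x_V):
    """Accept iff the clause-side assignment restricted to V equals x[V]."""
    return all(x_C[var] == value for var, value in x_V.items())

# Hypothetical 3SAT formula over y1..y4.
formula = [
    (("y1", 1), ("y2", 1), ("y3", 0)),
    (("y2", 0), ("y3", 1), ("y4", 1)),
    (("y1", 0), ("y3", 1), ("y4", 0)),
]
k = 2
clause_sets = list(combinations(formula, k))  # the k-subsets of clauses
C = clause_sets[0]
S_C = satisfying_assignments(C)
choices = variable_choices(C)
print(len(clause_sets), len(S_C), len(choices))
```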

$\mathrm{par}[\Phi,k]$ (cont.)

Definition: For a set of boolean clauses $\Phi$, let $\mathrm{sat}(\Phi)$ denote the maximal fraction of clauses of $\Phi$ that can be satisfied simultaneously.

Theorem:
If $\mathrm{sat}(\Phi) = 1$ then $\mathrm{sat}(\mathrm{par}[\Phi,k]) = 1$.
$\mathrm{sat}(\mathrm{par}[\Phi,k]) \le \mathrm{sat}(\Phi)^{c \cdot k}$ for some $c > 0$.

Long Code

Definition: An $R$-long-code has one bit for each boolean function $f : [R] \to \{0,1\}$.
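In the standard long-code encoding, the codeword of an element a of [R] stores the value f(a) in the position indexed by f. A minimal sketch, with functions represented by their truth tables and elements numbered from 0 (both choices are assumptions made for illustration):

```python
from itertools import product

def long_code(a, R):
    """Long-code encoding of a in range(R): one bit f(a) for each f: [R] -> {0,1}."""
    # A function f is represented by its truth table (f(0), ..., f(R-1)).
    return {table: table[a] for table in product((0, 1), repeat=R)}

codeword = long_code(a=2, R=3)
print(len(codeword))
```

The codeword has 2^R positions, one per boolean function, which is why the code is called long.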

The PCP of [ST]

For any bipartite graph $G = ([k], [k], E)$ we construct a SAT instance $\Psi(G)$ that contains one boolean function for every choice of:
$z \in Z$
$v_1, \dots, v_k \in \mathrm{LC}[z]$
$w_1, \dots, w_k \in W$, s.t. $\forall\, 1 \le i \le k$, $w_i \sqsubseteq z$
$\forall\, 1 \le i \le k$, $u_i \in \mathrm{LC}[w_i]$
$k^2$ perturbation functions $p_{1,1}, \dots, p_{k,k}$

The PCP of [ST] (cont.)

$\psi(v_1, \dots, v_k, u_1, \dots, u_k, p_{1,1}, \dots, p_{k,k}) = \mathrm{TRUE}$

(i,j)E, vi uj = ‘vi uj pi,j’.

Denote $p = \Pr_{v_i, u_j, p_{s,t}}[\psi(v_1, \dots, v_k, u_1, \dots, u_k, p_{1,1}, \dots, p_{k,k}) = \mathrm{TRUE}]$.

The PCP of [ST] (cont.)

Theorem: $\forall \varepsilon > 0$, it is NP-hard to distinguish between the following 2 cases:

Good: G = ([k], [k], E), p > (1 - )-|E|

Bad: $\forall G = ([k], [k], E)$, $p < 2^{-|E|}$

Our PCP

A SAT instance $\Psi = (\psi_1, \dots, \psi_n)$ over 2 types of variables: $X$ and $Y$.
The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.
The range of the variables $y \in Y$ is $\{0,1\}$.
Each $\psi_i$ is of the type $\psi_{x \to y} : R_X \to \{0,1\}$.

Our PCP (cont.)

Let $k = l/2$.
Given an instance $\Psi(G)$ as above, we construct an instance $\Psi$ as follows:
There is a variable $x \in X$ for every test in $\Psi(G)$. An assignment to $x$ is an assignment to the bits $v_1, \dots, v_k, u_1, \dots, u_k$.
$Y$ = LC[W].

Our PCP (cont.)

Theorem: $\forall \varepsilon, \delta > 0$ and for some constant $c = c(\varepsilon) > 0$, it is NP-hard to distinguish between:

Good: There exists an assignment $A$ s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment $A$, $\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \frac12 + \varepsilon \right] < 2^{-c l^2}$.

Our PCP (cont.)

Lemma: If there exists an assignment $A$ s.t.

$\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \frac12 + \varepsilon \right] \ge \gamma$,

then there exists a graph $G = (V, U, E)$ and an assignment to LC[W] and LC[Z] s.t. $p \ge 2^{-|E|}$.

Our PCP (cont.)

Proof: Assume there exists an assignment $A$ s.t.

$\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \frac12 + \varepsilon \right] \ge \gamma$.

We assign the bits of LC[W] the values assigned to them by $A$, and the bits of LC[Z] are assigned random values.

Our PCP (cont.)

We now have to construct a graph $G$ that would satisfy the lemma.

We call an $x$ good if $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \frac12 + \varepsilon$.

Let $x$ be good and let $V_0$, $U_0$ be the corresponding vertices.

Our PCP (cont.)

[Figure: the vertex sets $V_0$ and $U_0$, with $V_1$ inside $V_0$ and $U_0$ split into $U_1$ and $U_2$.]

$V_1$: The set of vertices in $V_0$ for which at least $\frac12 + \frac{\varepsilon}{2}$ of their edges are consistent with $x$. $|V_1| \ge \frac{\varepsilon}{2} k$.
$U_1$: The set of vertices in $U_0$ that are consistent with $x$.
$U_2$: $U_0 \setminus U_1$.

Our PCP (cont.)

Proposition: There exists $i \in \{1,2\}$ s.t. $|U_i| \ge \frac{\varepsilon}{4} k$, and at least $\frac12 + \frac{\varepsilon}{4}$ of the edges between $U_i$ and $V_1$ are consistent with $x$.

Our PCP (cont.)

[Figure: the previous picture of $V_1$, $U_1$ and $U_0 \setminus U_1$, now with $V'$ and $U'$ marked.]

Our PCP (cont.)

Let $U' \subseteq U_i$, $V' \subseteq V_1$, s.t. $|U'| = |V'| = \frac{\varepsilon}{4} k$, and at least $\frac12 + \frac{\varepsilon}{4}$ of the edges between $U'$ and $V'$ are consistent with $x$.

There are fewer than $2^{2k}$ possibilities to choose $U'$ and $V'$ $\Rightarrow$ there is a subset $X'$ of at least $2^{-2k}$ of the good $x$'s (and thus of size at least $2^{-2k} |X|$) with the same choice of $U'$ and $V'$.

Our PCP (cont.)

Let $X''$ be the subset of variables $x \in X'$ that are consistent with the random assignment to LC[Z].

The probability that $A(x)$ is consistent with a random assignment to LC[Z] is $2^{-k}$ $\Rightarrow$ the expected size of $X''$ is $2^{-k} |X'|$.

Therefore, there exists an assignment to LC[Z] s.t. $|X''| \ge 2^{-3k} |X|$.

Our PCP (cont.)

Let $\mathcal{G}$ be the multi-set of all graphs $G = (V', U', E)$ corresponding to the variables $x \in X''$, where $E$ is the set of all edges between $U'$ and $V'$ that are consistent with $x$.

$|\mathcal{G}| \ge 2^{-3k} |X|$.

$\forall G \in \mathcal{G}$, $|E| \ge \left(\frac12 + \frac{\varepsilon}{4}\right) \left(\frac{\varepsilon}{4} k\right)^2$.

Our PCP (cont.)

Lemma: Let $\mathcal{G}$ be a multi-set of bipartite graphs on $[k'] \times [k']$, s.t. each graph in $\mathcal{G}$ has at least $\left(\frac12 + \varepsilon'\right) k'^2$ edges. Then, $\forall t \le \frac{\varepsilon'}{2} k'^2$, $\exists G = ([k'], [k'], E)$ s.t. $|E| \ge t$ and

$\Pr_{G' = ([k'], [k'], E') \in \mathcal{G}}[E \subseteq E'] \ge \left(\frac{1}{2} \varepsilon'\right)^t$.

Our PCP (cont.)

By the above lemma, for $k' = \frac{\varepsilon}{4} k$ and $\varepsilon' = \frac{\varepsilon}{2}$, $\exists G = ([\frac{\varepsilon}{4} k], [\frac{\varepsilon}{4} k], E)$ s.t. $|E| = t = c' \left(\frac{\varepsilon}{4} k\right)^2$, where $c' < \frac{\varepsilon}{4}$, and all the edges of this graph are consistent in at least a $2^{-3k} \left(\frac{\varepsilon}{4}\right)^t$ fraction of the variables in $X$.

Considering this graph over the vertex sets $U$ and $V$ gives the desired result.