Improved approximation for k-median Shi Li Department of Computer Science Princeton University Princeton, NJ, 08540 04/20/2013


Page 1

Improved approximation for k-median

Shi Li

Department of Computer Science
Princeton University
Princeton, NJ, 08540

04/20/2013

Page 2

[Figure: facility location example with dollar labels $100, $130, $10, $20, $50, $30, $30]

minimize maintenance cost + transportation cost

Page 3

Facility Location Problem

BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.

KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses.

STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.

STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.

Page 4

Uncapacitated Facility Location (UFL)

F : potential facility locations

C : set of clients

f_i , i ∈ F : cost for opening i

d : metric over F ∪ C

find S ⊆ F,

minimize facility cost + connection cost = Σ_{i ∈ S} f_i + Σ_{j ∈ C} d(j, S)

[Figure: facilities and clients, with dollar labels $30, $100, $100, $100, $20, $100]
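The UFL objective above can be evaluated directly for a candidate set S. The following is a minimal sketch, not the talk's method: the names `ufl_cost` and `ufl_brute_force`, the layout `dist[j][i]` (distance from client j to facility i), and the toy numbers are all assumptions made for illustration, and the brute force is only usable on tiny instances.

```python
from itertools import combinations

def ufl_cost(open_set, opening_cost, dist):
    """Facility cost plus connection cost of the open set S.

    dist[j][i] is the distance from client j to facility i (assumed layout).
    """
    facility = sum(opening_cost[i] for i in open_set)
    # Each client connects to its nearest open facility.
    connection = sum(min(row[i] for i in open_set) for row in dist)
    return facility + connection

def ufl_brute_force(opening_cost, dist):
    """Try every non-empty subset of facilities (exponential time)."""
    n = len(opening_cost)
    best = None
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            cost = ufl_cost(subset, opening_cost, dist)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best

# Hypothetical toy instance: 2 facilities, 2 clients.
opening_cost = [30, 100]
dist = [[10, 20],
        [50, 30]]
print(ufl_brute_force(opening_cost, dist))
```

Opening only facility 0 costs 30 + (10 + 50) = 90 here, which beats opening facility 1 alone (150) or both (170).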

Page 5

Wal-mart Stores in New Jersey

Question : Suppose you have budget for 50 stores, how will you select 50 locations?

Page 6

k-median

F : potential facility locations

C : set of clients

d : metric over F ∪ C

k : number of facilities to open (in place of the UFL opening costs f_i , i ∈ F)

find S ⊆ F, |S| = k,

minimize connection cost = Σ_{j ∈ C} d(j, S)

[Figure: facilities and clients]
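For k-median the opening costs disappear and the constraint |S| = k takes their place. A small sketch of the objective follows, with an exhaustive search that is only meant for tiny instances; `kmedian_cost`, `kmedian_brute_force`, the `dist[j][i]` layout and the example distances are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

def kmedian_cost(open_set, dist):
    """Total distance from each client to its nearest open facility.

    dist[j][i] is the distance from client j to facility i (assumed layout).
    """
    return sum(min(row[i] for i in open_set) for row in dist)

def kmedian_brute_force(k, dist):
    """Best set of exactly k facilities, by trying all of them."""
    num_facilities = len(dist[0])
    return min(
        (kmedian_cost(subset, dist), subset)
        for subset in combinations(range(num_facilities), k)
    )

# Hypothetical instance: 3 clients (rows), 3 facilities (columns).
dist = [[0, 4, 9],
        [3, 1, 6],
        [8, 5, 0]]
print(kmedian_brute_force(2, dist))
```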

Page 7

k-median clustering

Page 8

Known Results: UFL

O(log n)-approximation [Hoc82]

constant approximations: 3.16 [STA98], 2.41 [GK99], 3 [JV99], 1.853 [CG99], 1.728 [CG99], 5+ε [Kor00], 1.861 [MMSV01], 1.736 [CS03], 1.61 [JMS02], 1.582 [Svi02], 1.52 [MYZ02], 1.50 [Byr07], 1.488 [Li11]

1.463-hardness of approximation [GK98]

Page 9

4 Deterministic rounding of linear programs
4.5 The uncapacitated facility location problem

5 Random sampling and randomized rounding of linear programs
5.8 The uncapacitated facility location problem

7 The primal-dual method
7.6 The uncapacitated facility location problem

9 Further uses of greedy and local search algorithms
9.1 A local search algorithm for the uncapacitated facility location problem
9.4 A greedy algorithm for the uncapacitated facility location problem

12 Further uses of random sampling and randomized rounding of linear programs
12.1 The uncapacitated facility location problem

Page 10

Known Results: k-median

pseudo-approximation

1-approx. with O(k log n) facilities [Hoc82]

2(1+ε)-approx. with (1+1/ε)k facilities [LV92]

super-constant approximation

O(log n loglog n) [Bar96, Bar98]

O(log k loglog k) [CCGS98]

Page 11

Known Results: k-median

constant approximation

LP rounding : 6.667 [CGTS99], 3.25 [CL12]

Primal-Dual : 6 [JV99], 4 [CG99], 4 [JMS03]

Local Search : 3+ε [AGK+01]

1+√3+ε [LS13]

(1+2/e)-hardness of approximation [JMS03]

Page 12

Lloyd Algorithm [Lloyd82]

k-means clustering : min total squared distances

k-means vs k-median

• clustering: k-means is more often used

• Walmart example: k-median is more appropriate

• approximation: k-median is “easier”

Page 13

Local Search

Can we improve the solution by p swaps?

No : stop

Yes : swap and repeat

Approximation :

k-median : 3+2/p [AGK+01]

k-means : (3+2/p)² [KMN+02]
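The swap loop above can be sketched for the simplest case p = 1. This is an illustrative sketch under stated assumptions, not the algorithm of [AGK+01] as analyzed: it accepts any strictly improving swap, whereas a polynomial running time bound needs each accepted swap to improve the cost by a sufficient factor. The names `local_search` and `kmedian_cost`, the `dist[j][i]` layout and the example instance are hypothetical.

```python
def kmedian_cost(open_set, dist):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(row[i] for i in open_set) for row in dist)

def local_search(k, dist, start=None):
    """Single-swap (p = 1) local search for k-median.

    dist[j][i]: distance from client j to facility i (assumed layout).
    Stops when no single swap strictly lowers the cost.
    """
    num_facilities = len(dist[0])
    current = set(range(k)) if start is None else set(start)
    cost = kmedian_cost(current, dist)
    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for inn in range(num_facilities):
                if inn in current:
                    continue
                candidate = (current - {out}) | {inn}
                candidate_cost = kmedian_cost(candidate, dist)
                if candidate_cost < cost:
                    # Yes: swap and repeat.
                    current, cost = candidate, candidate_cost
                    improved = True
                    break
            if improved:
                break
    # No improving swap left: stop.
    return cost, sorted(current)

dist = [[0, 4, 9],
        [3, 1, 6],
        [8, 5, 0]]
print(local_search(2, dist))
```

Starting from facilities {0, 1} (cost 6), the sketch moves to {1, 2} (cost 5) and then to {0, 2} (cost 3), where no single swap helps.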

Page 14

LP for k-median

y_i : whether to open i

x_{i,j} : whether to connect j to i

open at most k facilities

client j must be connected

client j can only be connected to an open facility

integrality gap is at least 2

integrality gap is at most 3 (proof non-constructive)
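The three constraints described in words above correspond to the standard LP relaxation for k-median, written out here as a sketch. The slide's own formulas did not survive extraction, so the exact form below is the textbook one and is an assumption about what the slide showed.

```latex
\begin{align*}
\min\quad & \sum_{i \in F} \sum_{j \in C} d(i,j)\, x_{i,j} \\
\text{s.t.}\quad
  & \sum_{i \in F} y_i \le k
    && \text{(open at most $k$ facilities)} \\
  & \sum_{i \in F} x_{i,j} = 1 \quad \forall j \in C
    && \text{(client $j$ must be connected)} \\
  & x_{i,j} \le y_i \quad \forall i \in F,\ j \in C
    && \text{(connect only to an open facility)} \\
  & x_{i,j},\ y_i \ge 0 \quad \forall i \in F,\ j \in C
\end{align*}
```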

Page 15

(1+√3+ε)-approximation on k-median

Page 16

k-median and UFL

f = cost of a facility

[Figure: f versus #open facilities]

Given a black-box α-approximation A for UFL

Naïve try : find an f such that A opens k facilities

α-approximation for k-median? No.

Proof : α ≈ 1.488 for UFL, α > 1.736 for k-median

Page 17

k-median and UFL

Naïve try : find an f such that A opens k facilities

2 issues with naïve try :

1. need LMP α-approximation for UFL

α-approximation : facility cost + connection cost ≤ α · OPT

LMP α-approximation : α · (facility cost) + connection cost ≤ α · OPT

LMP = Lagrangian Multiplier Preserving

Page 18

k-median and UFL

Naïve try : find an f such that A opens k facilities

2 issues with naïve try :

1. need LMP α-approximation for UFL

2. cannot find f s.t. A opens exactly k facilities

S1 : set of k1 < k facilities

S2 : set of k2 > k facilities

bi-point solution

Page 19

k-median and UFL

2 issues with naïve try :

1. need LMP α-approximation for UFL

2. cannot find f s.t. A opens exactly k facilities

                           [JV]   [JMS]   our result
LMP approx. factor          3      2       2
bi-point → integral         × 2    × 2
final ratio for k-median    6      4

do not know how to improve

this factor of 2 is tight !!

Page 20

bi-point solution

k1 = |S1| < k ≤ |S2| = k2

a, b : a·k1 + b·k2 = k, a + b = 1

bi-point solution : aS1+bS2

cost(aS1+bS2) = a·cost(S1) + b·cost(S2)

[Figure: the facility sets S1 and S2]
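The two equations for a and b have the closed form b = (k − k1)/(k2 − k1) and a = 1 − b. The sketch below computes the weights and the bi-point cost with exact fractions; the function names and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction

def bipoint_weights(k1, k2, k):
    """Solve a*k1 + b*k2 = k, a + b = 1 for k1 < k <= k2."""
    if not (k1 < k <= k2):
        raise ValueError("need k1 < k <= k2")
    b = Fraction(k - k1, k2 - k1)
    a = 1 - b
    return a, b

def bipoint_cost(a, b, cost_s1, cost_s2):
    """cost(a*S1 + b*S2) = a*cost(S1) + b*cost(S2)."""
    return a * cost_s1 + b * cost_s2

# Hypothetical numbers: |S1| = 2, |S2| = 5, k = 4.
a, b = bipoint_weights(2, 5, 4)
print(a, b, bipoint_cost(a, b, 9, 3))
```

With these numbers a = 1/3 and b = 2/3, so the fractional solution opens (1/3)·2 + (2/3)·5 = 4 facilities and costs (1/3)·9 + (2/3)·3 = 5.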

Page 21

gap-2 instance

[Figure: S1 and S2, with labels 1, 0, and k + 1]

k1 = 1, k2 = k+1

cost(S1) = k+1, cost(S2) = 0

cost of integral solution = 2
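Plugging the values stated for this instance into the bi-point definitions shows where the factor 2 comes from. This is a worked computation from the numbers on the slide only (k1 = 1, k2 = k+1, cost(S1) = k+1, cost(S2) = 0, integral cost 2); it does not rely on the lost figure.

```latex
\begin{align*}
a \cdot 1 + b\,(k+1) = k,\quad a + b = 1
  \;&\Longrightarrow\; b = \frac{k-1}{k},\quad a = \frac{1}{k}, \\
\mathrm{cost}(aS_1 + bS_2)
  &= \frac{1}{k}\,(k+1) + \frac{k-1}{k}\cdot 0 = \frac{k+1}{k}, \\
\frac{\text{cost of integral solution}}{\mathrm{cost}(aS_1 + bS_2)}
  &= \frac{2}{(k+1)/k} = \frac{2k}{k+1} \;\longrightarrow\; 2
  \quad (k \to \infty).
\end{align*}
```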

Page 22

k-median and UFL

                           [JV]   [JMS]   our result
LMP approx. factor          3      2       2
bi-point → integral         × 2    × 2
final ratio for k-median    6      4

this factor of 2 is tight !!

bi-point → pseudo-integral

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Page 23

Main Lemma 1

A : black-box α-approximation with k+c open facilities

A' : (α+ε)-approximation with k open facilities

A' calls A n^{O(c/ε)} times.

bad instance: with k+1 open facilities, cost = 0; with k open facilities, cost huge

Page 24

Dense Facility

B_i : set of clients in a small ball around i

i is A-dense, if connection cost of B_i in OPT is ≥ A

this instance : i is A-dense for A ≈ opt

[Figure: facility i and its ball B_i]

Page 25

Dense Facility

Reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε)

can reduce to such an instance in n^{O(t)} time

[Figure: facility i and its ball B_i]

Page 26

[Awasthi-Blum-Sheffet] : ε, δ > 0 constants, OPT_{k-1} ≥ (1+δ)OPT_k ⇒ can find (1+ε)-approximation

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

k-median clustering is easy in practice

reason : there is a “meaningful” clustering

Lemma 1 from [ABS]

Page 27

Lemma 1 from [ABS]

Algorithm

Apply A to (k-c, F, C, d) ⇒ solution with k facilities of cost ≤ α·OPT_{k-c}

Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1

Output the best of the c+1 solutions

Proof

If OPT_{k-c} ≤ (1+ε)OPT_k, then done.

Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i}

[ABS] on (k-i, F, C, d) ⇒ solution of cost (1+ε)OPT_{k-i} ≤ (1+ε)² OPT_k

[ABS] : OPT_{k-1} ≥ (1+δ)OPT_k ⇒ (1+ε)-approximation

A : α-approximation algorithm for k-median with k+c medians
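The two cases of the proof can be spelled out as follows. This is a sketch of the omitted arithmetic, assuming [ABS] is invoked with δ = (1+ε)^{1/c} − 1 on the instance with k−i medians; that choice of δ is inferred from the threshold used above rather than stated on the slide.

```latex
\begin{align*}
\text{Case 1: } & \mathrm{OPT}_{k-c} \le (1+\varepsilon)\,\mathrm{OPT}_k
  \;\Longrightarrow\;
  \mathrm{cost}(A) \le \alpha\,\mathrm{OPT}_{k-c}
  \le \alpha(1+\varepsilon)\,\mathrm{OPT}_k . \\
\text{Case 2: } & \mathrm{OPT}_{k-c} > (1+\varepsilon)\,\mathrm{OPT}_k .
  \text{ If } \mathrm{OPT}_{k-i-1} < (1+\varepsilon)^{1/c}\,\mathrm{OPT}_{k-i}
  \text{ held for all } i = 0,\dots,c-1, \\
& \text{multiplying the } c \text{ inequalities would give }
  \mathrm{OPT}_{k-c} < (1+\varepsilon)\,\mathrm{OPT}_k,
  \text{ a contradiction, so a smallest such } i \le c-1 \text{ exists.} \\
& \text{By minimality of } i:\quad
  \mathrm{OPT}_{k-i} \le (1+\varepsilon)^{i/c}\,\mathrm{OPT}_k
  \le (1+\varepsilon)\,\mathrm{OPT}_k . \\
& \text{[ABS] on } (k-i, F, C, d):\quad
  \mathrm{cost} \le (1+\varepsilon)\,\mathrm{OPT}_{k-i}
  \le (1+\varepsilon)^2\,\mathrm{OPT}_k .
\end{align*}
```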

Page 28

Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

[JV] : bi-point solution of cost C ⇒ solution of cost 2C

based on improving [JV] algorithm

Page 29

JV algorithm

given : bi-point solution aS1+bS2

τ_i = nearest facility of i

select S’2 ⊆ S2 , |S’2| = |S1| = k1

with prob. a, open S1

with prob. b, open S’2

randomly open k-k1 facilities in S2 \ S’2

guarantee : either i is open, or τ_i is open

[Figure: S1 and S2, with a facility i]
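The rounding steps above can be sketched in a few lines. This is a hedged illustration rather than the exact procedure from [JV]: it assumes S1 and S2 use disjoint labels, that τ_i for i ∈ S1 is the nearest facility in S2, and that S’2 consists of all the τ_i padded with arbitrary facilities of S2 up to size k1. The function name `jv_round`, the distance callable `d`, and the example points are hypothetical.

```python
import random

def jv_round(S1, S2, k, d, rng=None):
    """Round a bi-point solution a*S1 + b*S2 to exactly k open facilities.

    d(u, v) is a distance function on facility labels (assumed interface).
    Returns the open set and the map tau.
    """
    rng = rng or random.Random()
    k1, k2 = len(S1), len(S2)
    assert k1 < k <= k2
    # tau[i]: facility of S2 nearest to i, for every i in S1.
    tau = {i: min(S2, key=lambda i2: d(i, i2)) for i in S1}
    S2_prime = set(tau.values())
    # Pad S2' with arbitrary facilities of S2 until |S2'| = k1.
    for i2 in S2:
        if len(S2_prime) >= k1:
            break
        S2_prime.add(i2)
    a = (k2 - k) / (k2 - k1)  # a = 1 - b, with b = (k - k1) / (k2 - k1)
    opened = set(S1) if rng.random() < a else set(S2_prime)
    # Open k - k1 facilities of S2 \ S2' uniformly at random.
    rest = [i2 for i2 in S2 if i2 not in S2_prime]
    opened |= set(rng.sample(rest, k - k1))
    return opened, tau

# Hypothetical facilities on a line.
pos = {"a0": 0, "a1": 10, "b0": 1, "b1": 2, "b2": 9, "b3": 20}
d = lambda u, v: abs(pos[u] - pos[v])
opened, tau = jv_round(["a0", "a1"], ["b0", "b1", "b2", "b3"], 3, d)
print(sorted(opened), tau)
```

Whichever branch is taken, exactly k1 + (k − k1) = k facilities are opened, and every i in S1 has either itself or τ_i open.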

Page 30

Analysis of JV algorithm

[Figure: client j at distance d1 from i1 and d2 from i2; i3 reached over an edge labeled ≤ d1+d2]

i1 ∈ S1 , i3 ∈ S’2 : either i1 or i3 is open

If i2 is open, connect j to i2

Otherwise, if i1 is open, connect j to i1

Otherwise connect j to i3

E[cost of j] ≤ 2 × [cost of j in aS1+bS2]
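One way to see the factor 2 is the following calculation. It is a sketch under assumptions read off the slide rather than stated on it: i1 and i2 are the facilities nearest to j in S1 and S2, i3 = τ_{i1} so that d(i1, i3) ≤ d(i1, i2) ≤ d1 + d2, and the cost of j in aS1+bS2 is a·d1 + b·d2.

```latex
\begin{align*}
d(j, i_3) &\le d(j, i_1) + d(i_1, i_3) \le d_1 + (d_1 + d_2) = 2d_1 + d_2 . \\[4pt]
\text{If } i_2 \in S'_2:\quad
\mathbb{E}[\text{cost of } j] &\le b\,d_2 + a\,d_1 . \\[4pt]
\text{If } i_2 \notin S'_2:\quad
\mathbb{E}[\text{cost of } j]
  &\le b\,d_2 + a\,\bigl(a\,d_1 + b\,(2d_1 + d_2)\bigr) \\
  &= (1+b)\,a\,d_1 + (1+a)\,b\,d_2 \\
  &\le 2\,(a\,d_1 + b\,d_2) .
\end{align*}
```

In the second case i2 is open with probability b independently of the choice between S1 and S’2, which is why the term for i2 being closed carries the factor a = 1 − b.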

Page 31

Our Algorithm

on average, d1 >> d2

[Figure: client j at distance d1 from i1 and d2 from i2; i3 reached over an edge labeled ≤ d1+d2; bounds shown for d(j, i3): 2d1+d2 and d1+2d2]

If i2 is open, connect j to i2

Otherwise, if i1 is open, connect j to i1

Otherwise connect j to i3

E[cost of j] ≤ 2 × [cost of j in aS1+bS2]

Page 32

Our Algorithm

need to guarantee : either i is open, or τ_i is open

for a star, either the center is open, or all leaves are open

first try : open each star independently?

with prob. a, open the center; with prob. b, open the leaves

problem : cannot bound the number of open facilities

idea :

big stars : always open the center, open each leaf with prob. ≈ b

group small stars of the same size : dependent rounding for each group, open 3 more facilities than expected

[Figure: a facility i and τ_i]

Page 33

small stars

small star : star of size ≤ 2/(abε)

M_h : set of stars of size h, m = |M_h|

Roughly,

for am stars, open the center

for bm stars, open the leaves

More accurately,

permute the stars and the facilities

open top centers

open bottom leaves

Page 34

big stars

size h > 2/(abε)

always open the center

randomly open leaves (≈ bh for a big star)

Page 35

Lemma : we open at most k + 6/(abε) facilities.

for a big star of size h,

FRAC : a+bh

ALG :

for a group of m small stars of size h

FRAC : m(a+bh)

ALG :

there are at most 2/(abε) groups

Page 36

Summary

                           [JV]   [JMS]   our result
LMP approx. factor          3      2       2
bi-point → integral         × 2    × 2
final ratio for k-median    6      4

bi-point → pseudo-integral

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Page 37

Open Problems

gap between integral solution with k+1 open facilities and LP value (with k open facilities)?

tight analysis?

algorithm works for k-means?

THANK YOU!

Questions?