Improved approximation for k-median
Shi Li
Department of Computer Science, Princeton University, Princeton, NJ 08540
04/20/2013
[Figure: facility location example with dollar amounts on facilities and routes; minimize maintenance cost + transportation cost]
BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.
KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses.
STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.
STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.
Facility Location Problem
Uncapacitated Facility Location (UFL)
F : potential facility locations
C : set of clients
f_i , i ∈ F : cost for opening i
d : metric over F ∪ C
find S ⊆ F,
minimize Σ_{i∈S} f_i + Σ_{j∈C} d(j, S), where d(j, S) = min_{i∈S} d(j, i)
(facility cost + connection cost)
[Figure: facilities with opening costs and the clients they serve]
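The UFL objective above can be made concrete with a small sketch. This is only an illustration: the function names `ufl_cost` and `ufl_brute_force` and the toy instance on a line are assumptions, not part of the talk, and the brute force is exponential in |F|.

```python
from itertools import combinations

def ufl_cost(S, clients, open_cost, d):
    """Facility cost plus connection cost of opening the set S."""
    facility = sum(open_cost[i] for i in S)
    connection = sum(min(d(i, j) for i in S) for j in clients)
    return facility + connection

def ufl_brute_force(facilities, clients, open_cost, d):
    """Try every nonempty S ⊆ F; only usable on tiny instances."""
    candidates = (S for r in range(1, len(facilities) + 1)
                  for S in combinations(facilities, r))
    return min(candidates, key=lambda S: ufl_cost(S, clients, open_cost, d))

# Hypothetical instance: facilities and clients are points on a line.
facilities = [0, 10]
clients = [1, 2, 9]
open_cost = {0: 5, 10: 100}
d = lambda i, j: abs(i - j)

best = ufl_brute_force(facilities, clients, open_cost, d)
best_cost = ufl_cost(best, clients, open_cost, d)
```

In this toy instance the second facility is too expensive to justify the shorter connection for the client at position 9, so only the cheap facility is opened.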
Wal-mart Stores in New Jersey
Question :
Suppose you have a budget for 50 stores. How will you select the 50 locations?
k-median
[Figure: facilities and clients]
F : potential facility locations
C : set of clients
d : metric over F ∪ C
k : number of facilities to open (k-median has no opening costs f_i)
find S ⊆ F, |S| = k,
minimize Σ_{j∈C} d(j, S)
k-median clustering
Known Results: UFL
O(log n)-approximation [Hoc82]
constant approximations
3.16 [STA98]  2.41 [GK99]  3 [JV99]  1.853 [CG99]  1.728 [CG99]  5+ε [Kor00]  1.861 [MMSV01]
1.736 [CS03]  1.61 [JMS02]  1.582 [Svi02]  1.52 [MYZ02]  1.50 [Byr07]  1.488 [Li11]
1.463-hardness of approx. [GK98]
4 Deterministic rounding of linear programs
4.5 The uncapacitated facility location problem
5 Random sampling and randomized rounding of linear programs
5.8 The uncapacitated facility location problem
7 The primal-dual method
7.6 The uncapacitated facility location problem
9 Further uses of greedy and local search algorithms
9.1 A local search algorithm for the uncapacitated facility location problem
9.4 A greedy algorithm for the uncapacitated facility location problem
12 Further uses of random sampling and randomized rounding of linear programs
12.1 The uncapacitated facility location problem
Known Results: k-median
pseudo-approximation
1-approx. with O(k log n) facilities [Hoc82]
2(1+ε)-approx. with (1+1/ε)k facilities [LV92]
super-constant approximation
O(log n loglog n) [Bar96, Bar98]
O(log k loglog k) [CCGS98]
Known Results: k-median
constant approximation
LP rounding   Primal-Dual   Local Search
6.667 [CGTS99]   6 [JV99]
4 [CG99]   4 [JMS03]   3.25 [CL12]
3+ε [AGK+01]
1+√3+ε [LS13]
(1+2/e)-hardness of approximation [JMS03]
Lloyd Algorithm [Lloyd82]
k-means clustering : min total squared distances
k-means vs k-median
• clustering: k-means is more often used
• Walmart example: k-median is more appropriate
• approximation: k-median is “easier”
Local Search
Can we improve the solution by p swaps?
No : stop
Yes : swap and repeat
Approximation :
k-median : 3+2/p [AGK+01]
k-means : (3+2/p)² [KMN+02]
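The swap loop above can be sketched for the simplest case p = 1. This is a minimal illustration, not the analyzed algorithm of [AGK+01]: it accepts any strictly improving swap, whereas a polynomial running time needs a minimum-improvement threshold. The names `kmedian_cost` and `local_search` and the toy instance are assumptions.

```python
def kmedian_cost(S, clients, d):
    """Total distance from each client to its nearest open facility."""
    return sum(min(d(i, j) for i in S) for j in clients)

def local_search(facilities, clients, d, k):
    """Single-swap local search: close one facility, open another."""
    S = set(facilities[:k])              # arbitrary initial solution
    improved = True
    while improved:
        improved = False
        current = kmedian_cost(S, clients, d)
        for out in list(S):
            for inn in facilities:
                if inn in S:
                    continue
                T = (S - {out}) | {inn}
                if kmedian_cost(T, clients, d) < current:
                    S, improved = T, True   # swap and repeat
                    break
            if improved:
                break
    return S                              # no improving swap: stop

# Hypothetical instance on a line.
facilities = [0, 1, 5, 9, 10]
clients = [0, 1, 9, 10]
d = lambda i, j: abs(i - j)
S = local_search(facilities, clients, d, k=2)
```

The loop terminates because the cost strictly decreases with every accepted swap and there are finitely many solutions.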
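LP for k-median
y_i : whether to open i
x_{i,j} : whether to connect j to i
minimize Σ_{i∈F, j∈C} d(i, j) x_{i,j}
subject to
Σ_{i∈F} y_i ≤ k (open at most k facilities)
Σ_{i∈F} x_{i,j} = 1 for all j ∈ C (client j must be connected)
x_{i,j} ≤ y_i for all i ∈ F, j ∈ C (client j can only be connected to an open facility)
x_{i,j}, y_i ≥ 0
integrality gap is at least 2
integrality gap is at most 3 (proof non-constructive)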
(1+√3+ε)-approximation on k-median
k-median and UFL
f = cost of a facility (the same for every facility)
UFL objective : connection cost + f × #open facilities
Given a black-box α-approximation A for UFL
Naïve try : find an f such that A opens k facilities
α-approximation for k-median?
No. Proof : α ≈ 1.488 for UFL, but α > 1.736 (≈ 1+2/e, the hardness bound) for k-median
2 issues with naïve try :
1. need LMP α-approximation for UFL
α-approximation : facility cost + connection cost ≤ α · OPT
LMP α-approximation : α · facility cost + connection cost ≤ α · OPT
LMP = Lagrangian Multiplier Preserving
2. cannot find f s.t. A opens exactly k facilities
S1 : set of k1 < k facilities
S2 : set of k2 > k facilities
bi-point solution
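The second issue can be seen on a small example. The sketch below binary-searches the uniform facility cost f, using an exact brute-force UFL solver as a stand-in for the black box A (an assumption; the talk's A is an approximation algorithm). The instance, a star with one central facility and k+1 leaf facilities co-located with the clients, and the names `solve_ufl` and `bracket` are also assumptions made for illustration.

```python
from itertools import combinations

def solve_ufl(f, facilities, clients, d):
    """Exact UFL with uniform opening cost f, by brute force (stand-in for A)."""
    best_cost, best_S = None, None
    for r in range(1, len(facilities) + 1):
        for S in combinations(facilities, r):
            cost = f * len(S) + sum(min(d(i, j) for i in S) for j in clients)
            if best_cost is None or cost < best_cost:
                best_cost, best_S = cost, S
    return best_S

def bracket(k, facilities, clients, d, f_hi, iters=40):
    """Binary search on f; returns (S1, S2) with |S1| <= k <= |S2|."""
    f_lo = 0.0
    S2 = solve_ufl(f_lo, facilities, clients, d)   # cheap facilities: many open
    S1 = solve_ufl(f_hi, facilities, clients, d)   # expensive facilities: few open
    for _ in range(iters):
        if len(S1) == k or len(S2) == k:
            break
        mid = (f_lo + f_hi) / 2
        S = solve_ufl(mid, facilities, clients, d)
        if len(S) > k:
            f_lo, S2 = mid, S
        else:
            f_hi, S1 = mid, S
    return S1, S2

# Hypothetical star instance: center "c" at distance 1 from every client,
# leaf facility i at distance 0 from client i and 2 from the other clients.
k = 3
clients = list(range(k + 1))
facilities = ["c"] + clients
d = lambda i, j: 1 if i == "c" else (0 if i == j else 2)

S1, S2 = bracket(k, facilities, clients, d, f_hi=10.0)
```

On this instance the optimum jumps from k+1 open facilities directly to 1 as f crosses (k+1)/k, so the search never sees a solution with exactly k facilities and ends with a pair S1, S2 on either side of k.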
k-median and UFL
                            [JV]    [JMS]    our result
LMP approx. factor           3       2        2 (do not know how to improve)
bi-point → integral         × 2     × 2      (this factor of 2 is tight !!)
final ratio for k-median     6       4
bi-point solution
k1 = |S1| < k ≤ |S2| = k2
a, b : ak1 + bk2 = k, a + b = 1
bi-point solution : aS1+bS2
cost(aS1+bS2) = a cost(S1) + b cost(S2)
[Figure: S1 and S2]
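The two conditions on a and b determine them uniquely. As a short worked step (assuming k1 < k2, which follows from k1 < k ≤ k2), substituting a = 1 - b into ak1 + bk2 = k gives:

```latex
\begin{align*}
(1-b)\,k_1 + b\,k_2 = k
  \;\Longrightarrow\;
  b = \frac{k - k_1}{k_2 - k_1},
  \qquad
  a = 1 - b = \frac{k_2 - k}{k_2 - k_1}.
\end{align*}
```

Both values lie in [0, 1], so the bi-point solution is a convex combination of S1 and S2 that opens exactly k facilities fractionally.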
gap-2 instance
[Figure: S1 and S2, with labels 1, 0 and k+1]
k1 = 1, k2 = k+1
cost(S1) = k+1, cost(S2) = 0
cost of integral solution = 2
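The numbers on this slide can be checked by brute force. The figure itself is not recoverable, so the instance below is an assumed reconstruction that matches the stated costs: S1 is a single facility at distance 1 from each of k+1 clients, and S2 has one facility co-located with each client, with distance 2 between different leaves. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def gap_instance(k):
    """Assumed star instance: center 'c' plus k+1 leaves co-located with clients."""
    clients = list(range(k + 1))
    facilities = ["c"] + clients
    def d(i, j):
        if i == "c":
            return 1
        return 0 if i == j else 2
    return facilities, clients, d

def kmedian_opt(facilities, clients, d, k):
    """Cost of the best integral solution with exactly k facilities."""
    return min(sum(min(d(i, j) for i in S) for j in clients)
               for S in combinations(facilities, k))

k = 4
facilities, clients, d = gap_instance(k)
k1, k2 = 1, k + 1
a = Fraction(k2 - k, k2 - k1)          # weight of S1
b = Fraction(k - k1, k2 - k1)          # weight of S2
bipoint = a * (k + 1) + b * 0          # a*cost(S1) + b*cost(S2) = (k+1)/k
integral = kmedian_opt(facilities, clients, d, k)
ratio = Fraction(integral) / bipoint   # 2k/(k+1), tends to 2 as k grows
```

Under this reconstruction the bi-point cost is (k+1)/k while every integral solution costs 2, which is the sense in which the factor of 2 is tight.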
k-median and UFL
                              [JV]    [JMS]    our result
LMP approx. factor             3       2        2
bi-point → integral           × 2     × 2      (this factor of 2 is tight !!)
bi-point → pseudo-integral                     × (1+√3+ε)/2
final ratio for k-median       6       4        1+√3+ε
Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities
Main Lemma 1
A : black-box α-approximation with k+c open facilities
A' : (α+ε)-approximation with k open facilities
A' calls A n^{O(c/ε)} times.
bad instance : with k+1 open facilities, cost = 0; with k open facilities, cost huge
Dense Facility
B_i : set of clients in a small ball around i
i is A-dense if the connection cost of B_i in OPT is ≥ A
[Figure: facility i and the ball B_i]
this instance : i is A-dense for A ≈ opt
Reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε)
can reduce to such an instance in n^{O(t)} time
[Awasthi-Blum-Sheffet] : ε, δ > 0 constants,
OPT_{k-1} ≥ (1+δ)OPT_k ⇒ can find a (1+ε)-approximation
Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities
k-median clustering is easy in practice
reason : there is a “meaningful” clustering
Lemma 1 from [ABS]
Algorithm
Apply A to (k-c, F, C, d) ⇒ solution with k facilities of cost ≤ α OPT_{k-c}
Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1
Output the best of the c+1 solutions
Proof
If OPT_{k-c} ≤ (1+ε)OPT_k, then done.
Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i}
[ABS] on (k-i, F, C, d) ⇒ solution of cost (1+ε)OPT_{k-i} ≤ (1+ε)² OPT_k
[ABS] : OPT_{k-1} ≥ (1+δ)OPT_k ⇒ (1+ε)-approximation
A : α-approximation algorithm for k-median with k+c medians
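The last inequality of the proof uses the minimality of i. Spelled out as a sketch, with δ chosen so that 1+δ = (1+ε)^{1/c} (an assumption about how [ABS] is instantiated): every index below i fails the gap condition, so the optimum grows by less than a factor (1+ε)^{1/c} per step.

```latex
\begin{align*}
\mathrm{OPT}_{k-i}
  &\le (1+\varepsilon)^{1/c}\,\mathrm{OPT}_{k-i+1}
   \le \dots
   \le (1+\varepsilon)^{i/c}\,\mathrm{OPT}_{k}
   \le (1+\varepsilon)\,\mathrm{OPT}_{k}
   \qquad (i \le c-1), \\
\text{cost of [ABS] on } (k-i, F, C, d)
  &\le (1+\varepsilon)\,\mathrm{OPT}_{k-i}
   \le (1+\varepsilon)^{2}\,\mathrm{OPT}_{k}.
\end{align*}
```

Such an i ≤ c-1 must exist in the second case: if no index had the gap, the same chain would give OPT_{k-c} < (1+ε)OPT_k, which is the first case.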
Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities
[JV] : bi-point solution of cost C ⇒ solution of cost 2C
based on improving [JV] algorithm
JV algorithm
given : bi-point solution aS1+bS2
τ_i = nearest facility of i in S2, for i ∈ S1
select S'2 ⊆ S2 containing every τ_i, |S'2| = |S1| = k1
with prob. a, open S1
with prob. b, open S'2
randomly open k-k1 facilities in S2 \ S'2
guarantee : either i is open, or τ_i is open
[Figure: S1, S2 and a facility i]
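The rounding steps above can be sketched as follows. This is a minimal illustration under stated assumptions: S1 and S2 are disjoint lists, S'2 is padded with arbitrary facilities of S2 when several τ_i coincide, and the name `jv_round` and the toy instance are hypothetical.

```python
import random

def jv_round(S1, S2, k, d, rng):
    """Round the bi-point solution a*S1 + b*S2 to exactly k open facilities."""
    k1, k2 = len(S1), len(S2)
    a = (k2 - k) / (k2 - k1)
    # tau[i]: facility of S2 nearest to i, for each i in S1.
    tau = {i: min(S2, key=lambda i2: d(i, i2)) for i in S1}
    S2p = set(tau.values())
    for i2 in S2:                       # pad S'2 up to size k1 if needed
        if len(S2p) == k1:
            break
        S2p.add(i2)
    rest = [i2 for i2 in S2 if i2 not in S2p]
    # With prob. a open S1, otherwise (prob. b) open S'2.
    opened = set(S1) if rng.random() < a else set(S2p)
    # Independently open k - k1 random facilities of S2 \ S'2.
    opened |= set(rng.sample(rest, k - k1))
    return opened, tau

# Hypothetical instance on a line.
S1, S2, k = [0, 10], [1, 2, 9, 11, 20], 3
d = lambda i, j: abs(i - j)
opened, tau = jv_round(S1, S2, k, d, random.Random(0))
```

Each facility of S2 \ S'2 is opened with probability (k-k1)/(k2-k1) = b, so every facility is open with exactly its bi-point weight.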
Analysis of JV algorithm
[Figure: client j and facilities i1, i2, i3, with d1 = d(j, i1), d2 = d(j, i2), d(i1, i3) ≤ d1+d2]
i1 ∈ S1 , i3 ∈ S'2
either i1 or i3 is open
If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3
E[cost of j] ≤ 2 × [cost of j in aS1+bS2]
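The factor 2 can be recovered with a short calculation. The sketch below treats the harder case i2 ∉ S'2 and assumes that i2 is opened with probability b independently of the coin that chooses between S1 and S'2; the cost of j in aS1+bS2 is a·d1 + b·d2, and d(j, i3) ≤ d1 + d(i1, i3) ≤ 2d1 + d2.

```latex
\begin{align*}
\mathbb{E}[\text{cost of } j]
  &\le b\,d_2 + a \cdot a\,d_1 + a \cdot b\,(2d_1 + d_2) \\
  &= (a + 2b)\,a\,d_1 + (1 + a)\,b\,d_2 \\
  &= (1+b)\,a\,d_1 + (1+a)\,b\,d_2
   \;\le\; 2\,(a\,d_1 + b\,d_2).
\end{align*}
```

When i2 ∈ S'2 the bound is easier: either S'2 is opened and j pays d2, or S1 is opened and j pays d1, for an expected cost of exactly a·d1 + b·d2.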
Our Algorithm
on average, d1 >> d2
[Figure: client j and facilities i1, i2, i3, with distances d1, d2 and an edge of length ≤ d1+d2]
d(j, i3) ≤ d1+2d2, in place of 2d1+d2
If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3
Our Algorithm
need to guarantee : either i is open, or τ_i is open
[Figure: facility i and τ_i]
for a star, either the center is open, or all leaves are open
first try : open each star independently?
with prob. a, open the center; with prob. b, open the leaves
problem : cannot bound the number of open facilities
idea :
big stars : always open the center, open each leaf with prob. ≈ b
small stars : group small stars of the same size, dependent rounding for each group, open 3 more facilities than expected
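The star structure can be sketched in a few lines. The reading assumed here, based on the fractional count a+bh for a star of size h, is that centers are the facilities of S1, leaves are the facilities of S2, and each leaf i joins the star of its nearest center τ_i. The names `build_stars` and `split_stars` and the toy instance are hypothetical, and the rounding itself is not shown.

```python
from collections import defaultdict

def build_stars(S1, S2, d):
    """Map each center in S1 to the leaves of S2 whose nearest center it is."""
    stars = {c: [] for c in S1}
    for i in S2:
        tau_i = min(S1, key=lambda c: d(c, i))
        stars[tau_i].append(i)
    return stars

def split_stars(stars, a, b, eps):
    """Separate big stars from small ones; group small stars by size h."""
    threshold = 2 / (a * b * eps)
    big = {c: leaves for c, leaves in stars.items() if len(leaves) > threshold}
    groups = defaultdict(dict)          # h -> {center: leaves}, the sets M_h
    for c, leaves in stars.items():
        if len(leaves) <= threshold:
            groups[len(leaves)][c] = leaves
    return big, dict(groups)

# Hypothetical instance on a line; eps is chosen large so the threshold is 2.
S1, S2 = [0, 10], [1, 2, 9, 11, 20]
d = lambda i, j: abs(i - j)
stars = build_stars(S1, S2, d)
big, groups = split_stars(stars, a=0.5, b=0.5, eps=4)
```

Grouping by size is what makes dependent rounding possible: within one group every star has the same fractional count a+bh, so opening centers for some stars and leaves for the others can be balanced across the group.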
small stars
small star : star of size ≤ 2/(abε)
M_h : set of stars of size h, m = |M_h|
Roughly,
for am stars, open the center
for bm stars, open the leaves
More accurately,
permute the stars and the facilities
open top centers
open bottom leaves
big stars
size h > 2/(abε)
always open the center
randomly open leaves
≈ bh for big star
Lemma : we open at most k + 6/(abε) facilities.
for a big star of size h,
FRAC : a+bh
ALG :
for a group of m small stars of size h
FRAC : m(a+bh)
ALG :
there are at most 2/(abε) groups
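The FRAC quantities add up to exactly k, which is why the lemma only needs to bound the rounding excess. A sketch of the accounting, assuming every facility of S1 is the center of exactly one star, every facility of S2 is a leaf of exactly one star, and h_s denotes the size of star s:

```latex
\begin{align*}
\sum_{\text{stars } s} (a + b\,h_s)
  \;=\; a\,k_1 + b\,k_2 \;=\; k .
\end{align*}
```

If, in addition, big stars open no more than their fractional count (an assumption, since the ALG entries are not recoverable here) and each of the at most 2/(abε) groups of small stars opens at most 3 facilities more than expected, the total excess is at most 3 · 2/(abε) = 6/(abε), matching the lemma.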
Summary
                              [JV]    [JMS]    our result
LMP approx. factor             3       2        2
bi-point → integral           × 2     × 2
bi-point → pseudo-integral                     × (1+√3+ε)/2
final ratio for k-median       6       4        1+√3+ε
Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities
Open Problems
gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
tight analysis?
algorithm works for k-means?
THANK YOU!
Questions?