Facility Location in Dynamic Geometric Data Streams

Christiane Lammersen, Christian Sohler


Dynamic Geometric Data Streams

• Streams of geometric data arise in
– Mobile networks
– Sensor networks
– …
• Continuously changing data
– Mobile networks: position of nodes
– Sensor networks: measured data
• Communication in form of update operations
– Update consists of ID of node, old value, new value

IITK Workshop on Algorithms for Processing Massive Data Sets, Christiane Lammersen


Hierarchical Communication Systems

• upper layer offers lower layer a certain service
• each node can be a server
• cost for server ↔ access time


Dynamic Geometric Data Streams

• m insert and delete operations
• points in low-dimensional, discrete space $\{1, \dots, \Delta\}^d$
• $\mathrm{polylog}(\Delta, m)$ memory space, one pass

[Indyk ’04]


Dynamic Uniform FLP

• point set P
• facilities have uniform opening cost f
• clients have uniform demand b
• goal: maintaining $F \subseteq P$, so as to minimize

$$f \cdot |F| + b \cdot \sum_{p \in P} \min_{q \in F} \|p - q\|$$

FLP is related to k-Median, but $|F|$ can be $\Omega(|P|)$. This is a problem in streaming, so the goal is an approximation of the cost.
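As a minimal sketch of the objective above, the following Python function evaluates the uniform facility location cost for a given facility set. The function name `flp_cost` and the use of the Euclidean distance are assumptions made for illustration; the slides do not name the metric.

```python
import math

def flp_cost(points, facilities, f, b):
    """Uniform FLP cost: f * |F| + b * (sum over p of the distance to its nearest facility).

    Assumption: distances are Euclidean (math.dist).
    """
    if not facilities:
        raise ValueError("at least one facility must be open")
    connection = sum(min(math.dist(p, q) for q in facilities) for p in points)
    return f * len(facilities) + b * connection

points = [(1, 1), (1, 2), (4, 1)]

# One facility at (1, 1): opening cost 8, connection cost 0 + 1 + 3.
one_facility = flp_cost(points, [(1, 1)], f=8, b=1)

# A facility at every point: opening cost 3 * 8, connection cost 0.
all_facilities = flp_cost(points, points, f=8, b=1)
```

The second call illustrates why the facility set itself can be as large as the input, which is what makes storing a solution in a small-space stream infeasible.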


Related Work

• P. Indyk: Algorithms for Dynamic Geometric Problems over Data Streams, STOC 04
– $O(\log^2 \Delta)$-approximation for cost of FLP
– Idea: nested square grids, open facility in all heavy cells
• G. Frahling and C. Sohler: Coresets in Dynamic Geometric Data Streams, STOC 05
– space partition based on heavy cells


Construction of Our Streaming Method

• deterministic method: $E_{det}(P) = \Theta(OPT(P))$
• randomized method: $E_{rand}(P) = \Theta(E_{det}(P))$
• streaming method: $E_{stream}(P) = \Theta(E_{rand}(P))$

Deterministic Method

• Impose $\log(\Delta)+1$ nested square grids
• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells
• For each cell size, count the number of points within cells of that size => estimator for cost:

$$E_{det}(P) = \sum_{i=0}^{\log \Delta} \sum_{C \in SP_i} n(C) \cdot 2^i$$

(here $n(C)$ is the number of points in cell $C$, and $SP_i$ denotes the level-$i$ cells of the space partition)

[Indyk ’04, Frahling and Sohler ’05]

Idea: Open one facility in each heavy cell in the space partition.

Nested Grids

• Impose $\log(\Delta)+1$ nested square grids

[Figure: nested grids for $\Delta = 16$, shown in turn for levels 4, 3, 2, 1, 0]
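The nested grids can be sketched as integer division of the coordinates. In this illustration a level-`i` cell has side length `2**i`, so level 4 is a single cell when the grid width is 16, as in the figure. The helper name `cell_of`, the variable `delta` for the grid width, and anchoring the grids at coordinate 1 are assumptions; the authors' construction may differ (for example, by shifting the grids).

```python
def cell_of(point, level):
    """Index of the level-`level` grid cell (side length 2**level) containing `point`.

    Assumption: coordinates come from {1, ..., delta} and every grid is anchored at 1.
    """
    side = 2 ** level
    return tuple((x - 1) // side for x in point)

delta = 16                            # grid width, assumed to be a power of two
levels = range(delta.bit_length())    # log2(16) + 1 = 5 levels: 0, 1, 2, 3, 4

p = (11, 6)
cells = {i: cell_of(p, i) for i in levels}

# Nesting: the level-(i+1) cell of p is the parent of its level-i cell.
nested = all(
    cells[i + 1] == tuple(c // 2 for c in cells[i])
    for i in range(len(cells) - 1)
)
```

Because each coarser cell index is obtained by halving the finer one, every cell at level `i + 1` is exactly the union of `2**d` cells at level `i`.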


Space Partition

• In each grid, identify the heavy cells
• Partition the input space based on the heavy cells

Cell in level i is heavy if it contains at least $f / 2^i$ points.

[Figure: construction of the space partition for $f = 8$, $\Delta = 16$, stepping through levels 4, 3, 2, 1, 0]
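The heavy-cell test can be sketched by counting points per cell and comparing against the threshold of `f / 2**i` points for level `i`. The function name `heavy_cells` and the grid anchoring at coordinate 1 are assumptions for illustration; this computes only the heavy cells of one level, not the full space partition, whose exact definition the slides do not spell out.

```python
from collections import Counter

def heavy_cells(points, level, f):
    """Return {cell index: point count} for all heavy cells at the given level.

    A level-i cell (side length 2**i) is heavy if it holds at least f / 2**i points.
    Assumption: coordinates start at 1 and grids are anchored there.
    """
    side = 2 ** level
    counts = Counter(tuple((x - 1) // side for x in p) for p in points)
    threshold = f / 2 ** level
    return {cell: n for cell, n in counts.items() if n >= threshold}

points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]

top = heavy_cells(points, level=4, f=8)     # threshold 0.5: the single top cell is heavy
mid = heavy_cells(points, level=1, f=8)     # threshold 4: only the cluster cell is heavy
bottom = heavy_cells(points, level=0, f=8)  # threshold 8: no unit cell is heavy
```

The threshold shrinks as cells grow, so a dense cluster stays heavy over several levels while an isolated point is only heavy in very coarse cells.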


Cost Estimator

• For each cell size, count the number of points within cells of that size => estimator for cost:

$$E_{det}(P) = \sum_{i=0}^{\log \Delta} \sum_{C \in SP_i} n(C) \cdot 2^i$$

Example: 0 points in level-0 cells, 9 points in level-1 cells, 7 points in level-2 cells:

$$0 \cdot 2^0 + 9 \cdot 2^1 + 7 \cdot 2^2 = 46$$
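The arithmetic of the estimator can be checked with a few lines of Python. The sketch takes, for each level, the total number of points lying in partition cells of that level and weights it by the cell side length `2**level`; the counts used are those of the slide example (0, 9 and 7 points in cells of levels 0, 1 and 2). The function name `cost_estimate` is hypothetical.

```python
def cost_estimate(points_per_level):
    """Sum of n * 2**level over all levels, where n is the number of points
    in partition cells of that level."""
    return sum(n * 2 ** level for level, n in points_per_level.items())

example = {0: 0, 1: 9, 2: 7}
estimate = cost_estimate(example)  # 0 * 1 + 9 * 2 + 7 * 4
```

Only per-level point counts are needed, which is what makes the estimator attractive for streaming.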


Value of Cost Estimator is $\Omega(OPT(P))$

iii CndCn 222

• Contribution of heavy cell C in level i is at most

• Contribution of light cell C in level i is at most

ii CndCnf 22

• A heavy cell in level i contains $\Omega(f / 2^i)$ points.
• The space partition is balanced.
• The distance of a cell in level i to a heavy cell is $O(2^i)$.


Value of Cost Estimator is $O(OPT(P))$

• Contribution of distant cell C in level i is at least $n(C) \cdot 2^{i-1}$
• $OPT(P) \ge f \cdot |F_{OPT}|$
• Estimated cost for near cell C in level i is $n(C) \cdot 2^i = O(f)$
• There is a constant number of near cells.
• Estimated cost for near cells is $O(f \cdot |F_{OPT}|)$

[Figure: cell in level i, radius $2^{i-1}$]


Randomized Method

Idea:
– Heavy cell in level i contains at least $f / 2^i$ points
– Sample a point in level i with probability $2^i / f$

Problem: coin flips & delete operations

Solution:
– Hash function $h_i : \{1, \dots, \Delta\}^d \to \{1, \dots, f / 2^i\}$
– Sample set $S_i = \{ p \in P \mid h_i(p) = 1 \}$

[Figure: $h_i$ maps points to the buckets $1, 2, 3, 4, \dots, f / 2^i$]
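The reason for hashing instead of coin flips can be sketched as follows: whether a point is sampled depends only on the point itself, so a later delete hits the sample exactly when the earlier insert did. The hash below is a stand-in built from `hashlib.blake2b`, not the hash family used by the authors, and the class `LevelSample` is hypothetical. It stores the sample explicitly, which is for illustration only and is not a small-space structure.

```python
import hashlib

def h(point, level, f, seed=0):
    """Stand-in for h_i: maps a point to a bucket in {1, ..., f / 2**level}."""
    buckets = max(1, f // 2 ** level)
    data = f"{seed}:{level}:{point}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % buckets + 1

class LevelSample:
    """Sample set S_i = {p | h_i(p) = 1}, maintained under inserts and deletes.

    Assumption: a point is only deleted if it is currently present.
    """

    def __init__(self, level, f, seed=0):
        self.level, self.f, self.seed = level, f, seed
        self.counts = {}  # sampled point -> multiplicity

    def _sampled(self, point):
        return h(point, self.level, self.f, self.seed) == 1

    def insert(self, point):
        if self._sampled(point):
            self.counts[point] = self.counts.get(point, 0) + 1

    def delete(self, point):
        if self._sampled(point):
            self.counts[point] -= 1
            if self.counts[point] == 0:
                del self.counts[point]

    def sample(self):
        return set(self.counts)

stream = [("insert", (3, 7)), ("insert", (12, 4)), ("insert", (3, 7)),
          ("delete", (3, 7)), ("delete", (12, 4))]

s = LevelSample(level=3, f=8)  # f / 2**3 = 1 bucket, so every point is sampled
for op, p in stream:
    getattr(s, op)(p)
remaining = s.sample()
```

With independent coin flips per operation, the delete of `(12, 4)` could miss a copy that the insert had sampled; with a hash function this cannot happen.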


Randomized Method

for each level i do
    F(i) ← set of all marked cells C in level i such that
        a) no subcell of C is marked
        b) no smaller cell within a distance of less than $2^{i-1}$ is marked
return

$$E_{rand}(P) = \sum_{i=0}^{\log \Delta} f \cdot |F(i)|$$

$E_{rand}(P) = \Theta(E_{det}(P))$


Streaming Method

Idea: Reduction to counting distinct elements

Implementation:
- For each level i count distinct elements in
  DE1(i) = {C | C is in level i and marked} $\cup$ {C | C is in level i and a) or b) fails}
  and DE2(i) = {C | C is in level i and a) or b) fails}
- Output difference as cost for level i

[Figure: DE1(i), DE2(i), DE1(i+1), DE2(i+1)]
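The reduction rests on a simple set identity: the number of marked cells for which neither a) nor b) fails equals the number of distinct elements in DE1(i) minus the number in DE2(i). The sketch below checks this with exact Python sets; in the streaming setting each exact count would be replaced by a distinct-elements estimator, which is not shown here. The function name `level_count` and the example cells are hypothetical.

```python
def level_count(marked, failing):
    """|DE1| - |DE2| with DE1 = marked ∪ failing and DE2 = failing.

    Exact sets stand in for the distinct-element counters of the streaming method.
    """
    de1 = marked | failing
    de2 = failing
    return len(de1) - len(de2)

# Hypothetical level-i cells, given by their grid indices.
marked = {(0, 0), (0, 1), (3, 2)}
failing = {(0, 1), (5, 5)}   # cells where condition a) or b) fails

count = level_count(marked, failing)
direct = len(marked - failing)  # marked cells that satisfy both conditions
```

Since DE2(i) is a subset of DE1(i), the difference is never negative when the counts are exact.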


Conclusion & Future Work

Streaming Algorithm for Dynamic FLP:
• constant factor approximation of cost
• update time: $O(\log(1/\delta) \cdot \mathrm{polylog}(\Delta))$
• space: $O(\log(1/\delta) \cdot \mathrm{polylog}(\Delta))$
• failure probability: $\delta$

Future Work:
• approximation factor not exponential in d
• $(1+\varepsilon)$-approximation algorithm


Thank you for your attention!

Department of Computer Science
Technische Universität Dortmund
Otto-Hahn-Str. 14
44221 Dortmund, Germany

Phone: +49 231 755-4762
Fax: +49 231 755-2047
Email: [email protected]
ls2-www.cs.uni-dortmund.de/~lammersen/