Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller

Page 1: Mining for Empty Rectangles in Large Data Sets

Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller

Page 2: Matrix representation

Tuples of π_{A,B}(R ⋈ S):

A  B
3  6
1  7
3  8

The same tuples as a 0-1 matrix (a 1 marks a pair (A,B) that occurs):

       A=1  A=2  A=3
B=6     0    0    1
B=7     1    0    0
B=8     0    0    1
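The matrix representation above can be sketched in a few lines of Python. This is only an illustration: the function name `build_matrix` and the choice of rows for B and columns for A are assumptions made here, not something the slides prescribe.

```python
def build_matrix(tuples, a_values, b_values):
    """Return a 0-1 matrix with one row per B value and one column per A value.

    A cell holds 1 when the pair (a, b) occurs among the tuples, else 0.
    (Hypothetical helper; the row/column orientation is an assumption.)
    """
    column_of = {a: i for i, a in enumerate(a_values)}
    row_of = {b: j for j, b in enumerate(b_values)}
    matrix = [[0] * len(a_values) for _ in b_values]
    for a, b in tuples:
        matrix[row_of[b]][column_of[a]] = 1
    return matrix


# The three (A, B) tuples shown on the slide.
tuples = [(3, 6), (1, 7), (3, 8)]
matrix = build_matrix(tuples, a_values=[1, 2, 3], b_values=[6, 7, 8])
for row in matrix:
    print(row)
```

Sorting the distinct values of each attribute before indexing keeps neighbouring rows and columns adjacent in value order, which is what makes a block of 0s readable as a range of values.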

Page 3: Find All Maximal 0-Rectangles

[Figure: the tuples (A,B) = (3,6), (1,7), (3,8) of π_{A,B}(R ⋈ S) and their 0-1 matrix (columns A = 1, 2, 3; rows B = 6, 7, 8), with maximal 0-rectangles marked on the matrix.]

       A=1  A=2  A=3
B=6     0    0    1
B=7     1    0    0
B=8     0    0    1
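To make "maximal 0-rectangle" concrete, here is a brute-force sketch, assuming a rectangle is a contiguous block of cells given as `(x1, y1, x2, y2)` with 0-based inclusive column and row indices. It is deliberately naive and is not the algorithm of the talk; the function names are hypothetical.

```python
def maximal_zero_rectangles_bruteforce(matrix):
    """List every all-zero rectangle that cannot grow by one row or column."""
    n_rows, n_cols = len(matrix), len(matrix[0])

    def is_empty(x1, y1, x2, y2):
        # Rectangles that stick out of the matrix are treated as not empty.
        if x1 < 0 or y1 < 0 or x2 >= n_cols or y2 >= n_rows:
            return False
        return all(matrix[y][x] == 0
                   for y in range(y1, y2 + 1)
                   for x in range(x1, x2 + 1))

    found = []
    for y1 in range(n_rows):
        for y2 in range(y1, n_rows):
            for x1 in range(n_cols):
                for x2 in range(x1, n_cols):
                    if not is_empty(x1, y1, x2, y2):
                        continue
                    # Try growing by one cell in each of the four directions.
                    grown = [(x1 - 1, y1, x2, y2), (x1, y1 - 1, x2, y2),
                             (x1, y1, x2 + 1, y2), (x1, y1, x2, y2 + 1)]
                    if not any(is_empty(*g) for g in grown):
                        found.append((x1, y1, x2, y2))
    return found


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
rectangles = maximal_zero_rectangles_bruteforce(matrix)
print(rectangles)
```

Checking one-step growth is enough: any larger empty rectangle containing a candidate would also contain one of its four one-step extensions.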

Page 4: Example

π_{A,B}(R ⋈ S) with attributes Car and Year:

             Year
Car          95   96   97
BMW Z3        0    0    1
Honda L2      1    0    0
Toyota 6A     0    0    1

First BMW Z3 series cars were made in 1997 (the two 0s in the BMW Z3 row, for 95 and 96).

Page 5: Relation to Previous Work

Previous work: [Naamad, Hsu, Lee], [Liu, Ku, Hsu] & [Orlowski]

Problem: Find all maximal empty rectangles
• Previous work: between points in real plane
• Our work: within a 0-1 matrix

Purpose:
• Previous work: Machine Learning, Computational Geometry
• Our work: Query Optimization

# of maximal 0-rectangles:
• Previous work: O( (# 1’s)^2 )
• Our work: O( # 0’s )

Page 6: Relation to Previous Work

Previous work: [Naamad, Hsu, Lee], [Liu, Ku, Hsu] & [Orlowski]

Time:
• Previous work: O( # 1’s log(# 1’s) + # rectangles ) = O(|X||Y|)
• Our work: O( # 0’s ) = O(|X||Y|)

Space:
• Previous work: O(|X||Y|)
• Our work: O(min(|X|, |Y|)), only two rows of matrix kept in memory

Page 7: Relation to Previous Work

Previous work: [Naamad, Hsu, Lee], [Liu, Ku, Hsu] & [Orlowski]

Practical Implementation:
• Previous work: Intensive random memory access
• Our work: Requires a single scan of the sorted data

Scalable:
• Previous work: Scales badly
• Our work: Scales well wrt
    • # of tuples in join
    • # of maximal rectangles
    • # of values |X| & |Y|

Practical?
• Our work: IBM paid us $25,000 to patent it!

Page 8: Structure of Algorithm

loop y = 1..|Y|
    loop x = 1..|X|
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
        • Maintain the loop invariant

Timing: O(1) amortized time per <x,y>

[Figure: 0-1 matrix on axes X and Y, with the current cell <x,y> marked *.]

Page 9: Designing an Algorithm

• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Inv
• Make Progress
• Initial Conditions
• Ending

[Figure: one icon per item; icon labels include "79 km to school", "75 km", "0 km" and "Exit".]

Page 10: Define the Loop Invariant

• We have read the matrix up to <x,y> and cannot reread the matrix.
• We must output all maximal 0-rectangles with <x,y> as bottom-right corner.
• What must we remember?

[Figure: 0-1 matrix on axes X and Y, with the current cell <x,y> marked *.]

Page 11: Stack of steps

[Figure: 0-1 matrix on axes X and Y with the current cell <x,y> marked *. The 0s ending at <x,y> form a staircase whose steps are labelled (x_1,y_1), (x_2,y_2), (x_3,y_3), (x_4,y_4), (x_5,y_5), with a general step (x_r,y_r). The steps are kept in a stack. The positions x* and y* are marked.]

Page 12: Constructing Maximal Rectangles

[Figure: 0-1 matrix on axes X and Y with the current cell <x,y> marked *, showing the steps (x_1,y_1) and (x_r,y_r) and the corner (x,y).]

Page 13: Constructing Maximal Rectangles

The steps fall into three groups:
• Too Narrow
• Maximal
• Too short

[Figure: 0-1 matrix on axes X and Y with the current cell <x,y> marked *, showing the steps (x_1,y_1) and (x_r,y_r) and the corner (x,y).]
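One reading of the three labels can be sketched as follows, under the assumption (not spelled out on the slide) that "too narrow" means the rectangle can still be widened by the next column and "too short" means it can still be lengthened by the next row. The rectangle is taken to come from a staircase step, so it cannot grow up or to the left. The function name and the order in which the two tests are applied are arbitrary choices made here.

```python
def classify_rectangle(matrix, x1, y1, x2, y2):
    """Label an empty rectangle whose bottom-right corner is (x2, y2).

    Coordinates are 0-based and inclusive; x indexes columns, y indexes rows.
    Assumed reading: "too narrow" = can grow right, "too short" = can grow down.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Can column x2 + 1 be added without covering a 1?
    can_widen = x2 + 1 < n_cols and all(
        matrix[y][x2 + 1] == 0 for y in range(y1, y2 + 1))
    # Can row y2 + 1 be added without covering a 1?
    can_lengthen = y2 + 1 < n_rows and all(
        matrix[y2 + 1][x] == 0 for x in range(x1, x2 + 1))
    if can_widen:
        return "too narrow"
    if can_lengthen:
        return "too short"
    return "maximal"


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
print(classify_rectangle(matrix, 0, 0, 0, 0))  # single cell in the top-left corner
print(classify_rectangle(matrix, 1, 0, 1, 1))  # middle column, first two rows
print(classify_rectangle(matrix, 1, 0, 1, 2))  # middle column, all three rows
```

This version rescans a column and a row for each test. The markers x* and y* on the slides suggest the talk answers the same two questions from remembered positions of 1s instead.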

Page 14: Constructing staircase(x,y) from staircase(x-1,y)

Case 1

[Figure: 0-1 matrix with the previous cell <x-1,y> and the current cell <x,y> each marked *, showing the staircase of 0s before and after the step from x-1 to x.]

Page 15: Constructing staircase(x,y) from staircase(x-1,y)

Case 2

[Figure: 0-1 matrix on axes X and Y with the previous cell <x-1,y> marked *, showing the steps (x_1,y_1) and (x_r,y_r) and the corner (x,y).]

Page 16: Constructing staircase(x,y) from staircase(x-1,y)

The steps of staircase(x-1,y) fall into three groups:
• Too Narrow
• Maximal
• Too short

[Figure: 0-1 matrix on axes X and Y with the previous cell <x-1,y> and the current cell <x,y> each marked *, showing the steps (x_1,y_1) and (x_r,y_r) and the corner (x,y). Steps are marked either "Delete" or "Keep".]

Page 17: Constructing x* & y*

[Figure: 0-1 matrix with the current cell <x,y> marked *, showing the steps (x_1,y_1) and (x_r,y_r), the corner (x,y), and the positions x* and y*.]

Page 18: Location of last 1 seen in each column

[Figure: large 0-1 matrix on axes X and Y with the current cell <x,y> marked *; for each column, the location of the last 1 seen is indicated.]
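The bookkeeping named on this slide can be sketched as below, assuming rows arrive one at a time and that -1 stands for "no 1 seen yet" in a column. The generator name `last_one_per_column` is hypothetical.

```python
def last_one_per_column(rows):
    """After each row, yield the row index of the last 1 seen in every column.

    Only one integer per column is kept, so memory does not grow with the
    number of rows. A value of -1 means the column has held only 0s so far.
    """
    last = None
    for y, row in enumerate(rows):
        if last is None:
            last = [-1] * len(row)
        for x, cell in enumerate(row):
            if cell:
                last[x] = y
        yield list(last)  # copy, so callers can keep earlier snapshots


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
snapshots = list(last_one_per_column(matrix))
for y, snapshot in enumerate(snapshots):
    print(y, snapshot)
```

With this array, the height of the run of 0s ending at row y in column x is simply y minus the stored value, with no need to look back at earlier rows.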

Page 19: Structure of Algorithm

loop y = 1..|Y|
    loop x = 1..|X|
        • Construct staircase(x,y)
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner

Timing: O(1) amortized time per <x,y>

[Figure: 0-1 matrix on axes X and Y, with the current cell <x,y> marked *.]
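The loop structure above can be sketched in Python as follows. This is a hedged reconstruction in the spirit of the slides, not the authors' code: it keeps a stack of steps `(start_column, height)` per row, reports a rectangle when its step is deleted (so each rectangle is emitted just after its bottom-right corner has been passed), and holds only the current row, the next row, and one height per column. All names are hypothetical, and coordinates are 0-based inclusive `(x1, y1, x2, y2)`.

```python
def maximal_zero_rectangles(rows):
    """Yield every maximal all-zero rectangle from a stream of 0/1 rows."""
    rows = iter(rows)
    current = next(rows, None)
    if current is None:
        return
    width = len(current)
    height = [0] * width      # run of 0s ending at the current row, per column
    y = 0
    while current is not None:
        following = next(rows, None)   # look-ahead row, None after the last row
        for x in range(width):
            height[x] = 0 if current[x] else height[x] + 1
        # last_one[x]: column of the last 1 at or before x in the next row.
        last_one = None
        if following is not None:
            last_one, seen = [], -1
            for x in range(width):
                if following[x]:
                    seen = x
                last_one.append(seen)
        stack = []            # steps (start_column, height), heights increasing
        for x in range(width + 1):
            cur = height[x] if x < width else 0   # sentinel column of height 0
            start = x
            while stack and stack[-1][1] > cur:
                # This step cannot grow right, left or up; test growth down.
                start, h = stack.pop()
                blocked_below = last_one is None or last_one[x - 1] >= start
                if blocked_below:
                    yield (start, y - h + 1, x - 1, y)
            if cur > 0 and (not stack or stack[-1][1] < cur):
                stack.append((start, cur))
        current = following
        y += 1


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
rectangles = list(maximal_zero_rectangles(matrix))
print(rectangles)
```

Because the input is consumed through an iterator, the same function works on rows read lazily from a sorted file. Memory use is proportional to the row width, so putting the attribute with fewer distinct values on the X axis keeps it small.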

Page 20: Timing

The steps fall into three groups:
• Too Narrow
• Maximal
• Too short

Delete: only work that is not constant time.

[Figure: 0-1 matrix on axes X and Y with the current cell <x,y> marked *, showing the steps (x_1,y_1) and (x_r,y_r), the corner (x,y), and the steps marked "Delete".]

Page 21: Timing

Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1

[Figure: 0-1 matrix with the previous cell <x-1,y> marked *, showing the staircase of 0s before and after the step from x-1 to x.]
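The amortized argument can be checked empirically with a small counter. The sketch below assumes a stack-based staircase over one row of column heights, with a trailing sentinel of height 0 that deletes whatever is left at the end of the row. The function name and the sample heights are made up for illustration.

```python
def count_step_operations(heights):
    """Return (created, deleted) step counts for one left-to-right pass."""
    stack = []                # steps (start_column, height), heights increasing
    created = deleted = 0
    for x, cur in enumerate(list(heights) + [0]):   # sentinel ends the row
        start = x
        while stack and stack[-1][1] > cur:
            start, _ = stack.pop()
            deleted += 1
        # At most one step is created per column.
        if cur > 0 and (not stack or stack[-1][1] < cur):
            stack.append((start, cur))
            created += 1
    return created, deleted


heights = [2, 1, 3, 3, 0, 4]
created, deleted = count_step_operations(heights)
print(created, deleted, len(heights))
```

A step can only be deleted after it has been created, and each column creates at most one, so the total number of deletions over a row never exceeds the number of columns, even though a single column may delete many steps at once.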

Page 22: Number of Maximal Rectangles

# of maximal 0-rectangles:
• ≤ O( (# 1’s)^2 ) [Naamad, Hsu, Lee]
• ≤ Running time of alg = O( # 0’s )

Page 23: Designing an Algorithm

• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Inv
• Make Progress
• Initial Conditions
• Ending

[Figure: one icon per item; icon labels include "79 km to school", "75 km", "0 km" and "Exit".]