20
C. Faloutsos 15-826 1 CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos 15-826 Copyright: C. Faloutsos (2009) 2 CMU SCS Must-read Material Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995 15-826 Copyright: C. Faloutsos (2009) 3 CMU SCS Optional Material Optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

1

CMU SCS

15-826: Multimedia Databases

and Data Mining

Lecture #12: Fractals - case studies Part III

(regions, quadtrees, knn queries)

C. Faloutsos

15-826 Copyright: C. Faloutsos (2009) 2

CMU SCS

Must-read Material

• Alberto Belussi and Christos Faloutsos,

Estimating the Selectivity of Spatial Queries

Using the `Correlation' Fractal Dimension

Proc. of VLDB, p. 299-310, 1995

15-826 Copyright: C. Faloutsos (2009) 3

CMU SCS

Optional Material

Optional, but very useful: Manfred Schroeder

Fractals, Chaos, Power Laws: Minutes

from an Infinite Paradise W.H. Freeman

and Company, 1991

Page 2: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

2

15-826 Copyright: C. Faloutsos (2009) 4

CMU SCS

Outline

Goal: ‘Find similar / interesting things’

• Intro to DB

• Indexing - similarity search

• Data Mining

15-826 Copyright: C. Faloutsos (2009) 5

CMU SCS

Indexing - Detailed outline

• primary key indexing

• secondary key / multi-key indexing

• spatial access methods

– z-ordering

– R-trees

– misc

• fractals

– intro

– applications

• text

• ...

15-826 Copyright: C. Faloutsos (2009) 6

CMU SCS

Indexing - Detailed outline

• fractals

– intro

– applications

• disk accesses for R-trees (range queries)

• dimensionality reduction

• selectivity in M-trees

• dim. curse revisited

• “fat fractals”

• quad-tree analysis [Gaede+]

• nn queries [Belussi+]

Page 3: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

3

15-826 Copyright: C. Faloutsos (2009) 7

CMU SCS

‘Fat’ fractals & R-tree

performance on region data• Problem [Proietti+,’99]

• Given

– N (# of data regions )

• estimate how many of them will qualify for the average range query (q1 x q2 x ... qE)

Of course, we need more info

Q: what?

15-826 Copyright: C. Faloutsos (2009) 8

CMU SCS

R-tree performance on region

dataA: the distributions of their sizes

Q: do we also need some info about the locations?

15-826 Copyright: C. Faloutsos (2009) 9

CMU SCS

R-tree performance on region

dataA: the distributions of their sizes

Q: do we also need some info about the locations?

A: no (not for range queries)

Page 4: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

4

15-826 Copyright: C. Faloutsos (2009) 10

CMU SCS

R-tree performance on region

dataA: the distributions of their sizes

Q: what exactly would we need?

15-826 Copyright: C. Faloutsos (2009) 11

CMU SCS

R-tree performance on region

dataA: the distributions of their sizes

Q: what exactly would we need?

A: for self-similar regions (~ ‘fat’ fractals), we just need the slope of the Korcak law!

(and the total area) [Proietti+]

15-826 Copyright: C. Faloutsos (2009) 12

CMU SCS

More power laws: areas –

Korcak’s law

Scandinavian lakes

Any pattern?

Page 5: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

5

15-826 Copyright: C. Faloutsos (2009) 13

CMU SCS

More power laws: areas –

Korcak’s law

Scandinavian lakes

area vs

complementary

cumulative count

(log-log axes) 0

2

4

6

8

10

0 2 4 6 8 10 12 14

Num

ber

of

regio

ns

Area

Area distribution of regions (SL dataset)

-0.85*x +10.8

log(count( >= area))

log(area)

B (patchiness

exponent)

15-826 Copyright: C. Faloutsos (2009) 14

CMU SCS

More power laws: Korcak

1

10

100

1000

1 10 100 1000 10000 100000

Nu

mb

er

of

isla

nd

s

Area

Area distribution of Japan Archipelago

Japan islands;

area vs cumulative

count (log-log axes) log(area)

log(count( >= area))

15-826 Copyright: C. Faloutsos (2009) 15

CMU SCS

Korcak’s law & “fat fractals”

Page 6: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

6

15-826 Copyright: C. Faloutsos (2009) 16

CMU SCS

R-tree performance on regions

• Once we know ‘B’ (and the total area)

• we can second-guess the individual sizes

• and then apply the [Pagel+93] formula

• Bottom line:

15-826 Copyright: C. Faloutsos (2009) 17

CMU SCS

R-tree performance on regions

15-826 Copyright: C. Faloutsos (2009) 18

CMU SCS

R-tree performance on regions

0.70190,526757REGIONS

0.60136,893470ISLANDS

0.8575,910816LAKES

BANDataset

Page 7: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

7

15-826 Copyright: C. Faloutsos (2009) 19

CMU SCS

R-tree performance on regions

query side

sel. error

15-826 Copyright: C. Faloutsos (2009) 20

CMU SCS

‘Fat’ fractals - observation

B: patchiness exp; d: embedding dim

DH: Hausdorff of periphery

B= DH / d

15-826 Copyright: C. Faloutsos (2009) 21

CMU SCS

‘Fat’ fractals - observation

Page 8: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

8

15-826 Copyright: C. Faloutsos (2009) 22

CMU SCS

‘Fat’ fractals

• intuition behind B = DH / d ?

15-826 Copyright: C. Faloutsos (2009) 23

CMU SCS

‘Fat’ fractals

• intuition behind B = DH / d ?

• A: consider ‘flooding’:

15-826 Copyright: C. Faloutsos (2009) 24

CMU SCS

‘Fat’ fractals

• intuition behind B = DH / d ?

• A: consider ‘flooding’:

Page 9: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

9

15-826 Copyright: C. Faloutsos (2009) 25

CMU SCS

Conclusions

• ‘Fat’ fractals model regions well

• patchiness exp.: B = DH / d

• can help us estimate selectivities

15-826 Copyright: C. Faloutsos (2009) 26

CMU SCS

Indexing - Detailed outline

• fractals

– intro

– applications

• disk accesses for R-trees (range queries)

• dimensionality reduction

• selectivity in M-trees

• dim. curse revisited

• “fat fractals”

• quad-tree analysis [Gaede+]

• nn queries [Belussi+]

15-826 Copyright: C. Faloutsos (2009) 27

CMU SCS

Fractals and Quadtrees

• Problem: how many quadtree nodes will we

need, to store a region in some level of

approximation? [Gaede+96]

Page 10: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

10

15-826 Copyright: C. Faloutsos (2009) 28

CMU SCS

Fractals and Quadtrees

• I.e.:

15-826 Copyright: C. Faloutsos (2009) 29

CMU SCS

Fractals and Quadtrees

• I.e.:

level of quadtree

# of quadtree

‘blocks’

?

15-826 Copyright: C. Faloutsos (2009) 30

CMU SCS

Fractals and Quadtrees

• Datasets:

FranconiaBrain Atlas

Page 11: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

11

15-826 Copyright: C. Faloutsos (2009) 31

CMU SCS

Fractals and Quadtrees

• Hint:

– assume that the boundary is self-similar, with a

given fd

– how will the quad-tree (oct-tree) look like?

15-826 Copyright: C. Faloutsos (2009) 32

CMU SCS

Fractals and Quadtrees

white

black

gray

15-826 Copyright: C. Faloutsos (2009) 33

CMU SCS

Fractals and QuadtreesLet pg(i) the prob. to find a gray node at level i.

If self-similar, what can we say for pg(i) ?

Page 12: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

12

15-826 Copyright: C. Faloutsos (2009) 34

CMU SCS

Fractals and QuadtreesLet pg(i) the prob. to find a gray node at level i.

If self-similar, what can we say for pg(i) ?

A: pg(i) = pg= constant

15-826 Copyright: C. Faloutsos (2009) 35

CMU SCS

Fractals and Quadtrees

Assume only ‘gray’ and ‘white’ nodes (ie., no volume’)

Assume that pg is given - how many gray nodes at level i?

15-826 Copyright: C. Faloutsos (2009) 36

CMU SCS

Fractals and Quadtrees

Assume only ‘gray’ and ‘white’ nodes (ie., no volume’)

Assume that pg is given - how many gray nodes at level i?

A: 1 at level 0;

4*pg

(4*pg)* (4*pg)

...

(4*pg )i

Page 13: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

13

15-826 Copyright: C. Faloutsos (2009) 37

CMU SCS

Fractals and Quadtrees

• I.e.:

level of quadtree

# of quadtree

‘blocks’

? (4*pg) i

15-826 Copyright: C. Faloutsos (2009) 38

CMU SCS

Fractals and Quadtrees

• I.e.:

level of quadtree

log(# of quadtree

‘blocks’)

? log[(4*pg) i]

15-826 Copyright: C. Faloutsos (2009) 39

CMU SCS

Fractals and Quadtrees

• Conclusion: Self-similarity leads to easy

and accurate estimation

level

log2(#blocks)

Page 14: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

14

15-826 Copyright: C. Faloutsos (2009) 40

CMU SCS

Fractals and Quadtrees

• Conclusion: Self-similarity leads to easy

and accurate estimation

level

log2(#blocks)

15-826 Copyright: C. Faloutsos (2009) 41

CMU SCS

Fractals and Quadtrees

15-826 Copyright: C. Faloutsos (2009) 42

CMU SCS

Fractals and Quadtrees

level

log(#blocks)

Page 15: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

15

15-826 Copyright: C. Faloutsos (2009) 43

CMU SCS

Fractals and Quadtrees

• Final observation: relationship between pg

and fractal dimension?

15-826 Copyright: C. Faloutsos (2009) 44

CMU SCS

Fractals and Quadtrees

• Final observation: relationship between pg

and fractal dimension?

• A: very close:

(4*pg)i = # of gray nodes at level i =

# of Hausdorff grid-cells of side (1/2)i = r

Eventually: DH = 2 + log2( pg )

and, for E-d spaces: DH = E + log2( pg )

15-826 Copyright: C. Faloutsos (2009) 45

CMU SCS

Fractals and Quadtrees

for E-d spaces: DH = E + log2( pg )

Sanity check:

- point: DH = 0 pg = ??

- line in 2-d: DH = 1 pg = ??

- plane in 2-d: DH=2 pg= ??

Page 16: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

16

15-826 Copyright: C. Faloutsos (2009) 46

CMU SCS

Fractals and Quadtrees

Final conclusions:

• self-similarity leads to estimates for # of z-

values = # of quadtree/oct-tree blocks

• close dependence on the Hausdorff fractal

dimension of the boundary

15-826 Copyright: C. Faloutsos (2009) 47

CMU SCS

Indexing - Detailed outline

• fractals

– intro

– applications

• disk accesses for R-trees (range queries)

• dimensionality reduction

• selectivity in M-trees

• dim. curse revisited

• “fat fractals”

• quad-tree analysis [Gaede+]

• nn queries [Belussi+]

15-826 Copyright: C. Faloutsos (2009) 48

CMU SCS

NN queries

• Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95]

rLinf

L2

L1

Page 17: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

17

15-826 Copyright: C. Faloutsos (2009) 49

CMU SCS

NN queries

• Q: in NN queries, what is the effect of the shape of the query region?

• that is, for L2, and self-similar data:

log(d)

log(#pairs-within(<=d))

r L2

D2

15-826 Copyright: C. Faloutsos (2009) 50

CMU SCS

NN queries

• Q: What about L1, Linf?

log(d)

log(#pairs-within(<=d))

r L2

D2

15-826 Copyright: C. Faloutsos (2009) 51

CMU SCS

NN queries

• Q: What about L1, Linf?

• A: Same slope, different intercept

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 18: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

18

15-826 Copyright: C. Faloutsos (2009) 52

CMU SCS

NN queries

• Q: What about L1, Linf?

• A: Same slope, different intercept

log(d)

log(#neighbors)

15-826 Copyright: C. Faloutsos (2009) 53

CMU SCS

NN queries

• Q: what about the intercept? Ie., what can we say about N2 and Ninf

r L2

N2 neighbors

rLinf

Ninf neighbors

volume: V2 volume: Vinf

SKIP

15-826 Copyright: C. Faloutsos (2009) 54

CMU SCS

NN queries

• Consider sphere with volume Vinf and r’radius

r L2

N2 neighbors

rLinf

Ninf neighbors

volume: V2 volume: Vinf

r’

SKIP

Page 19: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

19

15-826 Copyright: C. Faloutsos (2009) 55

CMU SCS

NN queries

• Consider sphere with volume Vinf and r’radius

• (r/r’)^E = V2 / Vinf

• (r/r’)^D2 = N2 / N2’

• N2’ = Ninf (since shape does not matter)

• and finally:

SKIP

15-826 Copyright: C. Faloutsos (2009) 56

CMU SCS

NN queries

( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E

SKIP

15-826 Copyright: C. Faloutsos (2009) 57

CMU SCS

NN queries

Conclusions: for self-similar datasets

• Avg # neighbors: grows like (distance)^D2 , regardless of query shape (circle, diamond, square, e.t.c. )

Page 20: 15-826: Multimedia Databases and Data Mining › ... › courses › 826.S09 › FOILS-pdf › 190_fra… · Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman

C. Faloutsos 15-826

20

15-826 Copyright: C. Faloutsos (2009) 58

CMU SCS

Indexing - Detailed outline

• fractals

– intro

– applications• disk accesses for R-trees (range queries)

• dimensionality reduction

• selectivity in M-trees

• dim. curse revisited

• “fat fractals”

• quad-tree analysis [Gaede+]

• nn queries [Belussi+]

– Conclusions

15-826 Copyright: C. Faloutsos (2009) 59

CMU SCS

Fractals - overall conclusions

• self-similar datasets: appear often

• powerful tools: correlation integral, NCDF, rank-frequency plot

• intrinsic/fractal dimension helps in

– estimations (selectivities, quadtrees, etc)

– dim. reduction / dim. curse

• (later: can help in image compression...)

15-826 Copyright: C. Faloutsos (2009) 60

CMU SCS

References

1. Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the

Selectivity of Spatial Queries Using the `Correlation' Fractal

Dimension. Proc. of VLDB, Zurich, Switzerland.

2. Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-

ordering Method Using the Hausdorff Fractal Dimension.

VLDB, Bombay, India.

3. Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O

complexity for range queries on region data stored using an R-

tree. International Conference on Data Engineering (ICDE),

Sydney, Australia.