115
Introduction to Fractals and Fractal Dimension Christos Faloutsos

Introduction to Fractals and Fractal Dimension Christos Faloutsos

Embed Size (px)

Citation preview

Page 1: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Introduction to Fractalsand Fractal Dimension

Christos Faloutsos

Page 2: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 2

Intro to Fractals - Outline

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More examples

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

Page 3: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 3

Road end-points of Montgomery county:

•Q : distribution?

• not uniform

• not Gaussian

• ???

• no rules/pattern at all!?

Problem #1: GIS - points

Page 4: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 4

Problem #2 - spatial d.m.

Galaxies (Sloan Digital Sky Survey w/ B. Nichol)

- ‘spiral’ and ‘elliptical’ galaxies

(stores and households ...)

- patterns?

- attraction/repulsion?

- how many ‘spi’ within r from an ‘ell’?

Page 5: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 5

Problem #3: traffic

• disk trace (from HP - J. Wilkes); Web traffic

time

#bytes

Poisson

- how many explosions to expect?

- queue length distr.?

Page 6: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 6

Common answer for all three:

• Fractals / self-similarities / power laws

• Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)

Page 7: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 7

Road map

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More examples and tools

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

Page 8: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 8

Definitions (cont’d)

• Simple 2-D figure has finite perimeter and finite area– area ≈ perimeter2

• Do all 2-D figures have finite perimeter,

finite area, and area ≈ perimeter2?

Page 9: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 9

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

...zero area;

infinite length!

Page 10: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 10

Definitions (cont’d)

• Paradox: Infinite perimeter ; Zero area!

• ‘dimensionality’: between 1 and 2

• actually: Log(3)/Log(2) = 1.58…

Page 11: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 11

Definitions (cont’d)

• “self similar”– pieces look like whole independent of scale

• “dimension”dim = log(# self-similar pieces) / log(scale)

– dim(line) = log(2)/log(2) = 1– dim(square) = log(4)/log(2) = 2– dim(cube) = log(8)/log(2) = 3

Page 12: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 12

Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• A: 1 (= log(2)/log(2))

Page 13: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 13

Intrinsic (‘fractal’) dimension

• Q: dfn for a given set of points?

42

33

24

15

yx

Page 14: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 14

Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• A: nn ( <= r ) ~ r^1(‘power law’: y=x^a)

• Q: fd of a plane?• A: nn ( <= r ) ~ r^2Fd = slope of (log(nn) vs

log(r) )

Page 15: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 15

Intrinsic (‘fractal’) dimension

• Algorithm, to estimate it?Notice• avg nn(<=r) is exactly tot#pairs(<=r) / (2*N)

including ‘mirror’ pairs

Page 16: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 16

Dfn of fd:

ONLY for a perfectly self-similar point set:

=log(n)/log(f) = log(3)/log(2) = 1.58

...zero area;

infinite length!

Page 17: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 17

Sierpinsky triangle

log( r )

log(#pairs within <=r )

1.58

== ‘correlation integral’

Page 18: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 18

Observations:

• Euclidean objects have integer fractal dimensions – point: 0– lines and smooth curves: 1– smooth surfaces: 2

• fractal dimension -> roughness of the periphery

Page 19: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 19

Important properties

• fd = embedding dimension -> uniform pointset

• a point set may have several fd, depending on scale

Page 20: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 20

Road map

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More examples and tools

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

Page 21: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 21

Cross-roads of Montgomery county:

•any rules?

Problem #1: GIS points

Page 22: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 22

Solution #1

A: self-similarity ->• <=> fractals • <=> scale-free• <=> power-laws

(y=x^a, F=C*r^(-2))• avg#neighbors(<= r )

= r^D

log( r )

log(#pairs(within <= r))

1.51

Page 23: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 23

Solution #1

A: self-similarity• avg#neighbors(<= r )

~ r^(1.51)

log( r )

log(#pairs(within <= r))

1.51

Page 24: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 24

Examples:MG county

• Montgomery County of MD (road end-points)

Page 25: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 25

Examples:LB county

• Long Beach county of CA (road end-points)

Page 26: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 26

Solution#2: spatial d.m.Galaxies ( ‘BOPS’ plot - [sigmod2000])

log(#pairs)

log(r)

Page 27: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 27

Solution#2: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

Page 28: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 28

spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

Page 29: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 29

spatial d.m.

r1r2

r1

r2

Heuristic on choosing # of clusters

Page 30: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 30

spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

Page 31: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 31

spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

-repulsion!!

-duplicates

Page 32: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 32

Solution #3: traffic

• disk traces: self-similar:

time

#bytes

Page 33: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 33

Solution #3: traffic

• disk traces (80-20 ‘law’ = ‘multifractal’)

time

#bytes

20% 80%

Page 34: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 34

Solution#3: traffic

Clarification:

• fractal: a set of points that is self-similar

• multifractal: a probability density function that is self-similar

Many other time-sequences are bursty/clustered: (such as?)

Page 35: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 35

Tape accesses

time

Tape#1 Tape# N

# tapes needed, to retrieve n records?

(# days down, due to failures / hurricanes / communication noise...)

Page 36: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 36

Tape accesses

time

Tape#1 Tape# N

# tapes retrieved

# qual. records

50-50 = Poisson

real

Page 37: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 37

Fast estimation of fd(s):

• How, for the (correlation) fractal dimension?

• A: Box-counting plot:

log( r )

rpi

log(sum(pi ^2))

Page 38: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 38

Definitions

• pi : the percentage (or count) of points in the i-th cell

• r: the side of the grid

Page 39: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 39

Fast estimation of fd(s):

• compute sum(pi^2) for another grid side, r’

log( r )

r’

pi’

log(sum(pi ^2))

Page 40: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 40

Fast estimation of fd(s):• etc; if the resulting plot has a linear part, its

slope is the correlation fractal dimension D2

log( r )

log(sum(pi ^2))

Page 41: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 41

Hausdorff or box-counting fd:

• Box counting plot: Log( N ( r ) ) vs Log ( r)

• r: grid side

• N (r ): count of non-empty cells

• (Hausdorff) fractal dimension D0:

)log(

))(log(0 r

rND

ƒƒ

−=

Page 42: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 42

Definitions (cont’d)

• Hausdorff fd:r

log(r)

log(#non-empty cells)

D0

Page 43: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 43

Observations

• q=0: Hausdorff fractal dimension

• q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius)

• q=1: Information fractal dimension

Page 44: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 44

Observations, cont’d

• in general, the Dq’s take similar, but not identical, values.

• except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’

Page 45: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 45

Examples:MG county

• Montgomery County of MD (road end-points)

Page 46: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 46

Examples:LB county

• Long Beach county of CA (road end-points)

Page 47: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 47

Summary So Far

• many fractal dimensions, with nearby values

• can be computed quickly (O(N) or O(N log(N))

• (code: on the web)

Page 48: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 48

Dimensionality Reduction with FD

Problem definition: ‘Feature selection’• given N points, with E dimensions• keep the k most ‘informative’ dimensions[Traina+,SBBD’00]

Page 49: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 49

Dim. reduction - w/ fractals

y

xxx

yy(a) Quarter-circle (c) Spike(b)Line1

0000

1

not informative

Page 50: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 50

Dim. reduction

Problem definition: ‘Feature selection’• given N points, with E dimensions• keep the k most ‘informative’ dimensionsRe-phrased: spot and drop attributes with

strong (non-)linear correlationsQ: how do we do that?

Page 51: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 51

Dim. reduction

A: Hint: correlated attributes do not affect the intrinsic/fractal dimension, e.g., if

y = f(x,z,w)we can drop y(hence: ‘partial fd’ (PFD) of a set of

attributes = the fd of the dataset, when projected on those attributes)

Page 52: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 52

Dim. reduction - w/ fractals

y

xxx

yy(a) Quarter-circle (c) Spike(b)Line1

0000

1

PFD~0

PFD=1

global FD=1

Page 53: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 53

Dim. reduction - w/ fractals

y

xxx

yy(a) Quarter-circle (c) Spike(b)Line1

0000

1

PFD=1

PFD=1

global FD=1

Page 54: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 54

Dim. reduction - w/ fractals

y

xxx

yy(a) Quarter-circle (c) Spike(b)Line1

0000

1

PFD~1

PFD~1global FD=1

Page 55: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 55

Dim. reduction - w/ fractals

• (problem: given N points in E-d, choose k best dimensions)

• Q: Algorithm?

Page 56: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 56

Dim. reduction - w/ fractals

• Q: Algorithm?

• A: e.g., greedy - forward selection:– keep the attribute with highest partial fd– add the one that causes the highest increase in

pfd– etc., until we are within epsilon from the full

f.d.

Page 57: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 57

Dim. reduction - w/ fractals

• (backward elimination: ~ reverse)– drop the attribute with least impact on the p.f.d.– repeat– until we are epsilon below the full f.d.

Page 58: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 58

Dim. reduction - w/ fractals

• Q: what is the smallest # of attributes we should keep?

Page 59: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 59

Dim. reduction - w/ fractals

• Q: what is the smallest # of attributes we should keep?

• A: we should keep at least as many as the f.d. (and probably, a few more)

Page 60: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 60

Dim. reduction - w/ fractals

• Results: E.g., on the ‘currency’ dataset

• (daily exchange rates for USD, HKD, BP, FRF, DEM, JPY - i.e., 6-d vectors, one per day - base currency: CAD)

e.g.:

USD

FRF

Page 61: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 61

E.g., on the ‘currency’ dataset

correlation integral

1.98

log(r)

log(#pairs(<=r))

Page 62: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 62

E.g., on the ‘currency’ datasetif unif + indep.

USD HKD FRF DEM BP JY

Page 63: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 63

E.g., on the eigenface dataset

16-d vectors,one for each of ~1K faces

4.25

Page 64: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 64

E.g., on the eigenface dataset

Page 65: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 65

Dim. reduction - w/ fractals

Conclusion:• can do non-linear dim. reduction

y

x

(a) Quarter-circle1

00

1

PFD~1

PFD~1

global FD=1

Page 66: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 66

More tools

• Zipf’s law

• Korcak’s law / “fat fractals”

Page 67: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 67

A famous power law: Zipf’s law

• Q: vocabulary word frequency in a document - any pattern?

aaron zoo

freq.

Page 68: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 68

A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

log(rank)

log(freq)

“a”

“the”

Page 69: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 69

A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

• similarly, in many other languages; for customers and sales volume; city populations etc etclog(rank)

log(freq)

Page 70: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 70

A famous power law: Zipf’s law

•Zipf distr:

freq = 1/ rank

•generalized Zipf:

freq = 1 / (rank)^a

log(rank)

log(freq)

Page 71: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 71

Olympic medals (Sidney):

rank

log(#medals)

Page 72: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 72

More power laws: areas – Korcak’s law

Scandinavian lakes

Any pattern?

Page 73: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 73

More power laws: areas – Korcak’s law

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(count( >= area))

log(area)

Page 74: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 74

More power laws: Korcak

Japan islands

Page 75: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 75

More power laws: Korcak

Japan islands;

area vs cumulative count (log-log axes) log(area)

log(count( >= area))

Page 76: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 76

(Korcak’s law: Aegean islands)

Page 77: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 77

Korcak’s law & “fat fractals”

How to generate such regions?

Page 78: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 78

Korcak’s law & “fat fractals”Q: How to generate such regions?A: recursively, from a single region

Page 79: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 79

Conclusions

• tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r)

• tool#2: (for categorical values) rank-frequency plot (a’la Zipf)

• tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )

Page 80: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 80

Fractals - overall conclusions

• Real data often disobey textbook assumptions: not Gaussian, not Poisson, not uniform, not independent)

• self-similar datasets: appear often• powerful tools: correlation integral, rank-

frequency plot, ...• intrinsic/fractal dimension helps:

– find patterns– dimensionality reduction

Page 81: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 81

NN queries• Q: What about L1, Linf?• A: Same slope, different intercept

log(d)

log(#neighbors)

Page 82: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 82

Practitioner’s guide:• tool#1: #pairs vs distance, for a set of objects, with a

distance function (slope = intrinsic dimensionality)

log(hops)

log(#pairs)

2.8

log( r )

log(#pairs(within <= r))

1.51

internetMGcounty

Page 83: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 83

Practitioner’s guide:• tool#2: rank-frequency plot (for categorical attributes)

log(rank)

log(degree)

-0.82

internet domains Biblelog(freq)

log(rank)

Page 84: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 84

Practitioner’s guide:• tool#3: CCDF, for (skewed) numerical attributes, eg.

areas of islands/lakes, UNIX jobs...)

log(count( >= area))

log(area)

scandinavian lakes

Page 85: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 85

Resources:

• Software for fractal dimension– http://www.cs.cmu.edu/~christos– [email protected]

Page 86: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 86

Books

• Strongly recommended intro book:– Manfred Schroeder Fractals, Chaos, Power

Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

• Classic book on fractals:– B. Mandelbrot Fractal Geometry of Nature,

W.H. Freeman, 1977

Page 87: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 87

References

•Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland.•Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India.•Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R-tree. International Conference on Data Engineering (ICDE), Sydney, Australia.

Page 88: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 88

Road map

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More tools and examples

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

Page 89: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 89

NN queries

• Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95]

rLinf

L2

L1

Page 90: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 90

NN queries• Q: in NN queries, what is the effect of the

shape of the query region?• that is, for L2, and self-similar data:

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 91: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 91

NN queries• Q: What about L1, Linf?

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 92: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 92

NN queries• Q: What about L1, Linf?• A: Same slope, different intercept

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 93: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 93

NN queries• Q: what about the intercept? Ie., what can

we say about N2 and Ninf

r L2

N2 neighbors

rLinf

Ninf neighbors

volume: V2 volume: Vinf

Page 94: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 94

NN queries• Consider sphere with volume Vinf and r’

radius

r L2

N2 neighbors

rLinf

Ninf neighbors

volume: V2 volume: Vinf

r’

Page 95: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 95

NN queries• Consider sphere with volume Vinf and r’

radius• (r/r’)^E = V2 / Vinf

• (r/r’)^D2 = N2 / N2’• N2’ = Ninf (since shape does not matter)• and finally:

Page 96: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 96

NN queries ( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E

Page 97: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 97

NN queriesConclusions: for self-similar datasets• Avg # neighbors: grows like (distance)^D2 ,

regardless of query shape (circle, diamond, square, e.t.c. )

Page 98: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 98

Internet topology

• Internet routers: how many neighbors within h hops?

Reachability function: number of neighbors within r hops, vs r (log-log).

Mbone routers, 1995log(hops)

log(#pairs)

2.8

Page 99: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 99

More power laws on the Internet

degree vs rank, for Internet domains (log-log) [sigcomm99]

log(rank)

log(degree)

-0.82

Page 100: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 100

More power laws - internet

• pdf of degrees: (slope: 2.2 )

Log(count)

Log(degree)

-2.2

Page 101: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 101

Even more power laws on the Internet

Scree plot for Internet domains (log-log) [sigcomm99]

log(i)

log( i-th eigenvalue)

0.47

Page 102: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 102

More apps: Brain scans

• Oct-trees; brain-scans

octree levels

Log(#octants)

2.63 = fd

Page 103: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 103

More apps: Medical images

[Burdett et al, SPIE ‘93]:

• benign tumors: fd ~ 2.37

• malignant: fd ~ 2.56

Page 104: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 104

More fractals:

• cardiovascular system: 3 (!)

• stock prices (LYCOS) - random walks: 1.5

• Coastlines: 1.2-1.58 (Norway!)

1 year 2 years

Page 105: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 105

Page 106: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 106

More power laws

• duration of UNIX jobs [Harchol-Balter]

• Energy of earthquakes (Gutenberg-Richter law) [simscience.org]

log(freq)

magnitudeday

amplitude

Page 107: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 107

Even more power laws:

• publication counts (Lotka’s law)• Distribution of UNIX file sizes• Income distribution (Pareto’s law)

• web hit counts [Huberman]

Page 108: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 108

Power laws, cont’ed

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

• length of file transfers [Bestavros+]

• Click-stream data (w/ A. Montgomery (CMU-GSIA) + MediaMetrix)

Page 109: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 109

Resources:

• Software for fractal dimension– http://www.cs.cmu.edu/~christos– [email protected]

Page 110: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 110

Books

• Strongly recommended intro book:– Manfred Schroeder Fractals, Chaos, Power

Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

• Classic book on fractals:– B. Mandelbrot Fractal Geometry of Nature,

W.H. Freeman, 1977

Page 111: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 111

References

– [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

– [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

Page 112: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 112

References

– [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995

– [vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.

Page 113: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 113

References

– [vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept. 1996

– [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

Page 114: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 114

References

– [icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999

– [sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

Page 115: Introduction to Fractals and Fractal Dimension Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 115

References

• [PODS94] Faloutsos, C. and I. Kamel (May 24-26, 1994). Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. Proc. ACM SIGACT-SIGMOD-SIGART PODS, Minneapolis, MN.• [Traina+, SBBD’00] Traina, C., A. Traina, et al. (2000). Fast feature selection using the fractal dimension. XV Brazilian Symposium on Databases (SBBD), Paraiba, Brazil.