24
1 CMU SCS 15-721 DB Sys. Design & Impl. Sensors & wavelets Christos Faloutsos www.cs.cmu.edu/~christos 15-721 C. Faloutsos 2 CMU SCS Roadmap 1) Roots: System R and Ingres 2) Implementation: buffering, indexing, q-opt <...> 7) Data Analysis - data mining ... sensors, time series, indexing and wavelets sensors and forecasting 8) Benchmarks 9) vision statements extras (streams/sensors, graphs, multimedia, web, fractals) 15-721 C. Faloutsos 3 CMU SCS Citation C. Faloutsos: Searching Multimedia Databases by Content, Kluwer Academic Press, 1996 chapter on GEMINI chapter on FastMap Appendices on DFT and wavelets 15-721 C. Faloutsos 4 CMU SCS Outline • Motivation Similarity Search and Indexing DSP (Digital Signal Processing) Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting • Conclusions 15-721 C. Faloutsos 5 CMU SCS Problem definition • Given : one or more sequences x 1 , x 2 , … , x t , … (y 1 , y 2 , … , y t , … ) • Find – similar sequences; forecasts – patterns; clusters; outliers 15-721 C. Faloutsos 6 CMU SCS Motivation - Applications Financial, sales, economic series Medical – ECGs +; blood pressure etc monitoring – reactions to new drugs – elderly care

Sensors Wavelets

  • Upload
    others

  • View
    16

  • Download
    0

Embed Size (px)

Citation preview

1

CMU SCS

15-721 DB Sys. Design & Impl.

Sensors & wavelets

Christos Faloutsoswww.cs.cmu.edu/~christos

15-721 C. Faloutsos 2

CMU SCS

Roadmap1) Roots: System R and Ingres2) Implementation: buffering, indexing, q-opt<...>7) Data Analysis - data mining

...sensors, time series, indexing and waveletssensors and forecasting

8) Benchmarks9) vision statements extras (streams/sensors, graphs, multimedia, web, fractals)

15-721 C. Faloutsos 3

CMU SCS

Citation

C. Faloutsos: Searching Multimedia Databases by Content,Kluwer Academic Press, 1996– chapter on GEMINI– chapter on FastMap– Appendices on DFT and wavelets

15-721 C. Faloutsos 4

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP (Digital Signal Processing)• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions

15-721 C. Faloutsos 5

CMU SCS

Problem definition

• Given: one or more sequencesx1 , x2 , … , xt , …(y1, y2, … , yt, …… )

• Find– similar sequences; forecasts– patterns; clusters; outliers

15-721 C. Faloutsos 6

CMU SCS

Motivation - Applications• Financial, sales, economic series

• Medical

– ECGs +; blood pressure etc monitoring

– reactions to new drugs

– elderly care

2

15-721 C. Faloutsos 7

CMU SCS

Motivation - Applications(cont’d)

• ‘Smart house’

– sensors monitor temperature, humidity,air quality

• video surveillance

15-721 C. Faloutsos 8

CMU SCS

Motivation - Applications(cont’d)

• civil/automobile infrastructure

– bridge vibrations [Oppenheim+02]

– road conditions / traffic monitoring

15-721 C. Faloutsos 9

CMU SCS

Stream Data: automobile traffic

Automobile traffic

0200400600800

100012001400160018002000

time

# cars

15-721 C. Faloutsos 10

CMU SCS

Motivation - Applications(cont’d)

• Weather, environment/anti-pollution

– volcano monitoring

– air/water pollutant monitoring

15-721 C. Faloutsos 11

CMU SCS

Stream Data: Sunspots

Sunspot Data

0

50

100

150

200

250

300

time

#sunspots per month

15-721 C. Faloutsos 12

CMU SCS

Motivation - Applications(cont’d)

• Computer systems

– ‘Active Disks’ (buffering, prefetching)

– web servers (ditto)

– network traffic monitoring

– ...

3

15-721 C. Faloutsos 13

CMU SCS

Stream Data: Disk accesses

Disk traffic

0

5000000

10000000

15000000

20000000

time

#bytes

15-721 C. Faloutsos 14

CMU SCS

Settings & Applications

• One or more sensors, collecting time-seriesdata

15-721 C. Faloutsos 15

CMU SCS

Settings & Applications

Each sensor collects data (x1, x2, …, xt, …)

15-721 C. Faloutsos 16

CMU SCS

Settings & Applications

Some sensors ‘report’ to others or to the central site

15-721 C. Faloutsos 17

CMU SCS

Settings & Applications

Goal #1:Finding patternsin a single time sequence

15-721 C. Faloutsos 18

CMU SCS

Settings & Applications

Goal #2:Finding patternsin many time sequences

4

15-721 C. Faloutsos 19

CMU SCS

Problem #1:

Goal: given a signal (eg., #packets over time)Find: patterns, periodicities, and/or compress

year

count lynx caught per year(packets per day;temperature per day)

15-721 C. Faloutsos 20

CMU SCS

Problem#2: ForecastGiven xt, xt-1, … , forecast xt+1

0102030405060708090

1 3 5 7 9 11Time Tick

Num

ber

of p

acke

ts s

ent

??

15-721 C. Faloutsos 21

CMU SCS

Problem#2’: Similarity searchEg., Find a 3-tick pattern, similar to the last one

0102030405060708090

1 3 5 7 9 11Time Tick

Num

ber

of p

acke

ts s

ent

??

15-721 C. Faloutsos 22

CMU SCS

Problem #3:• Given: A set of correlated time sequences• Forecast ‘Sent(t)’

0102030405060708090

1 3 5 7 9 11Time Tick

Num

ber

of p

acke

ts

sent

lost

repeated

15-721 C. Faloutsos 23

CMU SCS

Differences from DSP/Stat

• Semi-infinite streams– we need on-line, ‘any-time’ algorithms

• Can not afford human intervention– need automatic methods

• sensors have limited memory /processing / transmitting power– need for (lossy) compression

15-721 C. Faloutsos 24

CMU SCS

Important observations

Patterns, rules, forecasting and similarityindexing are closely related:

• To do forecasting, we need– to find patterns/rules– to find similar settings in the past

• to find outliers, we need to have forecasts– (outlier = too far away from our forecast)

5

15-721 C. Faloutsos 25

CMU SCS

Important topics NOT in thistutorial:

• Continuous queries– [Babu+Widom ] [Gehrke+][Madden+03]

• Categorical data streams– [Hatonen+96]

• Outlier detection (discontinuities)– [Breunig+00]

15-721 C. Faloutsos 26

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions

15-721 C. Faloutsos 27

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions: Euclidean;Time-warping– indexing– feature extraction

• DSP• ...

15-721 C. Faloutsos 28

CMU SCS

Importance of distance functions

Subtle, but absolutely necessary:• A ‘must’ for similarity indexing (->

forecasting)• A ‘must’ for clusteringTwo major families

– Euclidean and Lp norms– Time warping and variations

15-721 C. Faloutsos 29

CMU SCS

Euclidean and Lp

¦�

� n

iii yxyxD

1

2)(),(&&

x(t) y(t)

...

¦�

� n

i

piip yxyxL

1

||),(&&

•L1: city-block = Manhattan•L2 = Euclidean•L �

15-721 C. Faloutsos 30

CMU SCS

Observation #1

• Time sequence -> n-dvector

...

Day-1

Day-2

Day-n

6

15-721 C. Faloutsos 31

CMU SCS

Observation #2

Euclidean distance isclosely related to– cosine similarity– dot product– ‘cross-correlation’

function

...

Day-1

Day-2

Day-n

15-721 C. Faloutsos 32

CMU SCS

Time Warping

• allow accelerations - decelerations– (with or w/o penalty)

• THEN compute the (Euclidean) distance (+penalty)

• related to the string-editing distance

15-721 C. Faloutsos 33

CMU SCS

Time Warping

‘stutters’ :

15-721 C. Faloutsos 34

CMU SCS

Time warpingQ: how to compute it?A: dynamic programming D( i, j ) = cost to matchprefix of length i of first sequence x with prefix

of length j of second sequence y

Skip

15-721 C. Faloutsos 35

CMU SCS

Thus, with no penalty for stutter, for sequencesx1, x2, … , xi,; y1, y2, … , yj

°̄°®­

����

�� ),1()1,(

)1,1(min][][),(

jiD

jiD

jiD

jyixjiD x-stutter

y-stutter

no stutter

Time warping Time warpingSkip

15-721 C. Faloutsos 36

CMU SCS

Time warping• Complexity: O(M*N) - quadratic on the

length of the strings• Many variations (penalty for stutters; limit

on the number/percentage of stutters; … )• popular in voice processing

[Rabiner+Juang]

Skip

7

15-721 C. Faloutsos 37

CMU SCS

Other Distance functions

• piece-wise linear/flat approx.; comparepieces [Keogh+01] [Faloutsos+97]

• ‘cepstrum’ (for voice [Rabiner+Juang])– do DFT; take log of amplitude; do DFT again!

• Allow for small gaps [Agrawal+95]See tutorial by [Gunopulos Das, SIGMOD01]

15-721 C. Faloutsos 38

CMU SCS

Conclusions

Prevailing distances:– Euclidean and– time-warping

15-721 C. Faloutsos 39

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions– indexing– feature extraction

• DSP• ...

15-721 C. Faloutsos 40

CMU SCS

Indexing

Problem:• given a set of time sequences,• find the ones similar to a desirable query

sequence

15-721 C. Faloutsos 41

CMU SCS

day

$price

1 365

day

$price

1 365

day

$price

1 365

distance function: by expert

15-721 C. Faloutsos 42

CMU SCS

Idea: ‘GEMINI’

Eg., ‘find stocks similar to MSFT’Seq. scanning: too slowHow to accelerate the search?

8

15-721 C. Faloutsos 43

CMU SCS

day1 365

day1 365

S1

Sn

F(S1)

F(Sn)

‘GEMINI’ - Pictorially

eg, avg

eg,. std

15-721 C. Faloutsos 44

CMU SCS

GEMINI

Solution: Quick-and-dirty’ filter:• extract n features (numbers, eg., avg., etc.)• map into a point in n-d feature space• organize points with off-the-shelf spatial

access method (‘SAM’ )• discard false alarms

15-721 C. Faloutsos 45

CMU SCS

Examples of GEMINI

• Time sequences: DFT (up to 100 timesfaster) [SIGMOD94];

• [Kanellakis+], [Mendelzon+]

15-721 C. Faloutsos 46

CMU SCS

Examples of GEMINI

Even on other-than-sequence data:• Images (QBIC) [JIIS94]• tumor-like shapes [VLDB96]• video [Informedia + S-R-trees]• automobile part shapes [Kriegel+97]

15-721 C. Faloutsos 47

CMU SCS

Indexing - SAMs

Q: How do Spatial Access Methods (SAMs)work?

A:

15-721 C. Faloutsos 48

CMU SCS

Indexing - SAMs

Q: How do Spatial Access Methods (SAMs)work?

A: they group nearby points (or regions)together, on nearby disk pages, and answerspatial queries quickly (‘range queries’ ,‘nearest neighbor’ queries etc)

For example:

9

15-721 C. Faloutsos 49

CMU SCS

R-trees

• [Guttman84] eg., w/ fanout 4: group nearbyrectangles to parent MBRs; each group ->disk pageA

B

C

DE

FG

H

I

J

Skip

15-721 C. Faloutsos 50

CMU SCS

R-trees

• eg., w/ fanout 4:

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4F GD E

H I JA B C

Skip

15-721 C. Faloutsos 51

CMU SCS

R-trees

• eg., w/ fanout 4:

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

Skip

15-721 C. Faloutsos 52

CMU SCS

Conclusions

• Fast indexing: through GEMINI– feature extraction and– (off the shelf) Spatial Access Methods

[Gaede+98]

15-721 C. Faloutsos 53

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions– indexing– feature extraction

• DSP• ...

15-721 C. Faloutsos 54

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions– indexing– feature extraction

• DFT, DWT, DCT (data independent)• SVD, etc (data dependent)• MDS, FastMap

10

15-721 C. Faloutsos 55

CMU SCS

DFT and cousins

• very good for compressing real signals• more details on DFT/DCT/DWT: later

15-721 C. Faloutsos 56

CMU SCS

DFT and stocks

0.00

2000.00

4000.00

6000.00

8000.00

10000.00

12000.00

1 11 21 31 41 51 61 71 81 91 101 111 121

Fourier appx actual

• Dow Jones Industrialindex, 6/18/2001-12/21/2001

15-721 C. Faloutsos 57

CMU SCS

DFT and stocks

0.00

2000.00

4000.00

6000.00

8000.00

10000.00

12000.00

1 11 21 31 41 51 61 71 81 91 101 111 121

Fourier appx actual

• Dow Jones Industrialindex, 6/18/2001-12/21/2001

• just 3 DFTcoefficients give verygood approximation

1

10

100

1000

10000

100000

1000000

10000000

1 11 21 31 41 51 61 71 81 91 101 111 121

Series1

freq

Log(ampl)

15-721 C. Faloutsos 58

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions– indexing– feature extraction

• DFT, DWT, DCT (data independent)• SVD etc (data dependent)• MDS, FastMap

15-721 C. Faloutsos 59

CMU SCS

SVD

• THE optimal method for dimensionalityreduction– (under the Euclidean metric)

15-721 C. Faloutsos 60

CMU SCS

Singular Value Decomposition(SVD)

• SVD (~LSI ~ KL ~ PCA ~ spectralanalysis...) LSI: S. Dumais; M. Berry

KL: eg, Duda+Hart

PCA: eg., Jolliffe

Details: [Press+],

[Faloutsos96]

day1

day2

11

15-721 C. Faloutsos 61

CMU SCS

SVD

• Extremely useful tool– (also behind PageRank/google and Kleinberg’ s

algorithm for hubs and authorities)

• But may be slow: O(N * M * M) if N>M• any approximate, faster method?

15-721 C. Faloutsos 62

CMU SCS

SVD shorcuts• random projections (Johnson-Lindenstrauss

thm [Papadimitriou+ pods98])

15-721 C. Faloutsos 63

CMU SCS

Random projections

• pick ‘enough’ random directions (will be~orthogonal, in high-d!!)

• distances are preserved probabilistically,within epsilon

• (also, use as a pre-processing step for SVD[Papadimitriou+ PODS98])

15-721 C. Faloutsos 64

CMU SCS

Feature extraction - w/ fractals

• Main idea: drop those attributes that don’ taffect the intrinsic (‘fractal’ ) dimensionality[Traina+, SBBD 2000]

• ie., drop attributes that depend on others(linearly or non-linearly!)

Skip

15-721 C. Faloutsos 65

CMU SCS

Feature extraction - w/ fractals

y

xxx

yy(a) Quarter-circle (c) Spike(b)Line1

0000

1

PFD~1

PFD~1global FD=1

Skip

15-721 C. Faloutsos 66

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions– indexing– feature extraction

• DFT, DWT, DCT (data independent)• SVD (data dependent)• MDS, FastMap

12

15-721 C. Faloutsos 67

CMU SCS

MDS / FastMap

• but, what if we have NO points to startwith?(eg. Time-warping distance)

• A: Multi-dimensional Scaling (MDS) ;FastMap

15-721 C. Faloutsos 68

CMU SCS

MDS/FastMap

01100100100O5

10100100100O4

100100011O3

100100101O2

100100110O1

O5O4O3O2O1~100

~1

15-721 C. Faloutsos 69

CMU SCS

MDS

Multi Dimensional Scaling

15-721 C. Faloutsos 70

CMU SCS

FastMap

• Multi-dimensional scaling (MDS) can dothat, but in O(N**2) time

• FastMap [Faloutsos+95] takes O(N) time

15-721 C. Faloutsos 71

CMU SCS

FastMap: Application

VideoTrails [Kobla+97]

scene-cut detection (about 10% errors)

15-721 C. Faloutsos 72

CMU SCS

Outline

• Motivation• Similarity Search and Indexing

– distance functions– indexing– feature extraction

• DFT, DWT, DCT (data independent)• SVD (data dependent)• MDS, FastMap

13

15-721 C. Faloutsos 73

CMU SCS

Conclusions - Practitioner’ sguide

Similarity search in time sequences1) establish/choose distance (Euclidean, time-

warping,… )2) extract features (SVD, DWT, MDS), and use

an SAM (R-tree/variant) or a Metric Tree (M-tree)

2’ ) for high intrinsic dimensionalities, consider sequentialscan (it might win… )

15-721 C. Faloutsos 74

CMU SCS

Books

• William H. Press, Saul A. Teukolsky, William T.Vetterling and Brian P. Flannery: Numerical Recipes in C,Cambridge University Press, 1992, 2nd Edition. (Greatdescription, intuition and code for SVD)

• C. Faloutsos: Searching Multimedia Databases by Content,Kluwer Academic Press, 1996 (introduction to SVD, andGEMINI)

15-721 C. Faloutsos 75

CMU SCS

References

• Agrawal, R., K.-I. Lin, et al. (Sept. 1995). Fast SimilaritySearch in the Presence of Noise, Scaling and Translation inTime-Series Databases. Proc. of VLDB, Zurich,Switzerland.

• Babu, S. and J. Widom (2001). “Continuous Queries overData Streams.” SIGMOD Record 30(3): 109-120.

• Breunig, M. M., H.-P. Kriegel, et al. (2000). LOF:Identifying Density-Based Local Outliers. SIGMOD

Conference, Dallas, TX.• Berry, Michael: http://www.cs.utk.edu/~lsi/

15-721 C. Faloutsos 76

CMU SCS

References

• Ciaccia, P., M. Patella, et al. (1997). M-tree: An EfficientAccess Method for Similarity Search in Metric Spaces.VLDB.

• Foltz, P. W. and S. T. Dumais (Dec. 1992). “PersonalizedInformation Delivery: An Analysis of InformationFiltering Methods.” Comm. of ACM (CACM) 35(12): 51-60.

• Guttman, A. (June 1984). R-Trees: A Dynamic IndexStructure for Spatial Searching. Proc. ACM SIGMOD,Boston, Mass.

15-721 C. Faloutsos 77

CMU SCS

References

• Gaede, V. and O. Guenther (1998). “MultidimensionalAccess Methods.” Computing Surveys 30(2): 170-231.

• Gehrke, J. E., F. Korn, et al. (May 2001). On ComputingCorrelated Aggregates Over Continual Data Streams.ACM Sigmod, Santa Barbara, California.

15-721 C. Faloutsos 78

CMU SCS

References

• Gunopulos, D. and G. Das (2001). Time Series SimilarityMeasures and Time Series Indexing. SIGMODConference, Santa Barbara, CA.

• Hatonen, K., M. Klemettinen, et al. (1996). KnowledgeDiscovery from Telecommunication Network AlarmDatabases. ICDE, New Orleans, Louisiana.

• Jolliffe, I. T. (1986). Principal Component Analysis,Springer Verlag.

14

15-721 C. Faloutsos 79

CMU SCS

References

• Keogh, E. J., K. Chakrabarti, et al. (2001). LocallyAdaptive Dimensionality Reduction for Indexing LargeTime Series Databases. SIGMOD Conference, SantaBarbara, CA.

• Kobla, V., D. S. Doermann, et al. (Nov. 1997).VideoTrails: Representing and Visualizing Structure inVideo Sequences. ACM Multimedia 97, Seattle, WA.

15-721 C. Faloutsos 80

CMU SCS

References

• Oppenheim, I. J., A. Jain, et al. (March 2002). A MEMSUltrasonic Transducer for Resident Monitoring of SteelStructures. SPIE Smart Structures Conference SS05, SanDiego.

• Papadimitriou, C. H., P. Raghavan, et al. (1998). LatentSemantic Indexing: A Probabilistic Analysis. PODS,Seattle, WA.

• Rabiner, L. and B.-H. Juang (1993). Fundamentals ofSpeech Recognition, Prentice Hall.

15-721 C. Faloutsos 81

CMU SCS

References

• Traina, C., A. Traina, et al. (October 2000). Fast featureselection using the fractal dimension,. XV BrazilianSymposium on Databases (SBBD), Paraiba, Brazil.

15-721 C. Faloutsos 82

CMU SCS

15-721 C. Faloutsos 83

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP (DFT, DWT)• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions

15-721 C. Faloutsos 84

CMU SCS

Outline

• DFT– Definition of DFT and properties– how to read the DFT spectrum

• DWT– Definition of DWT and properties– how to read the DWT scalogram

15

15-721 C. Faloutsos 85

CMU SCS

Introduction - Problem#1

Goal: given a signal (eg., packets over time)Find: patterns and/or compress

year

count

lynx caught per year(packets per day;automobiles per hour)

-2000

0

2000

4000

6000

8000

1 14 27 40 53 66 79 92 105

15-721 C. Faloutsos 86

CMU SCS

What does DFT do?

A: highlights the periodicities

15-721 C. Faloutsos 87

CMU SCS

DFT: definition• For a sequence x0, x1, … xn-1

• the (n-point) Discrete Fourier Transform is• X0, X1, … Xn-1 :

)/2exp(*/1

)1(

1,,0)/2exp(*/1

1

0

1

0

ntfjXnx

j

nfntfjxnX

n

tft

n

ttf

��

��

����

¦

¦�

inverse DFT

Skip

15-721 C. Faloutsos 88

CMU SCS

DFT: definition

• Good news: Available in all symbolic mathpackages, eg., in ‘mathematica’x = [1,2,1,2];X = Fourier[x];Plot[ Abs[X] ];

15-721 C. Faloutsos 89

CMU SCS

DFT: Amplitude spectrum

actual mean mean+freq12

1 12 23 34 45 56 67 78 89 100

111

year

count

Freq.

Ampl.

freq=12

freq=0

)(Im)(Re 222

fff XXA � Amplitude:

15-721 C. Faloutsos 90

CMU SCS

DFT: examples

flat

0.000

0.010

0.020

0.030

0.040

0.050

0.060

0.070

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

0.2

0.4

0.6

0.8

1

1.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

time freq

Amplitude

16

15-721 C. Faloutsos 91

CMU SCS

DFT: examples

Low frequency sinusoid

-0.150

-0.100

-0.050

0.000

0.050

0.100

0.150

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

0.2

0.4

0.6

0.8

1

1.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

time freq

15-721 C. Faloutsos 92

CMU SCS

DFT: examples

• Sinusoid - symmetry property: Xf = X*n-f

-0.150

-0.100

-0.050

0.000

0.050

0.100

0.150

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

0.2

0.4

0.6

0.8

1

1.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

time freq

15-721 C. Faloutsos 93

CMU SCS

DFT: examples

• Higher freq. sinusoid

-0.080-0.060-0.040-0.0200.0000.0200.0400.0600.080

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

0.1

0.2

0.3

0.4

0.5

0.6

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

time freq

15-721 C. Faloutsos 94

CMU SCS

DFT: examples

examples

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

=

+

+

15-721 C. Faloutsos 95

CMU SCS

DFT: examples

examples

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

0.2

0.4

0.6

0.8

1

1.2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Freq.

Ampl.

15-721 C. Faloutsos 96

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions

17

15-721 C. Faloutsos 97

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP

– DFT• Definition of DFT and properties• how to read the DFT spectrum

– DWT

15-721 C. Faloutsos 98

CMU SCS

DFT: Amplitude spectrum

actual mean mean+freq12

1 12 23 34 45 56 67 78 89 100

111

year

count

Freq.

Ampl.

freq=12

freq=0

)(Im)(Re 222

fff XXA � Amplitude:

15-721 C. Faloutsos 99

CMU SCS

DFT: Amplitude spectrum

actual mean mean+freq12

1 12 23 34 45 56 67 78 89 100

111

year

count

Freq.

Ampl.

freq=12

freq=0

15-721 C. Faloutsos 100

CMU SCS

1 12 23 34 45 56 67 78 89 100

111

DFT: Amplitude spectrum

actual mean mean+freq12

year

count

Freq.

Ampl.

freq=12

freq=0

15-721 C. Faloutsos 101

CMU SCS

DFT: Amplitude spectrum

• excellent approximation, with only 2frequencies!

• so what?

actual mean mean+freq12

1 12 23 34 45 56 67 78 89 100

111

Freq.15-721 C. Faloutsos 102

CMU SCS

DFT: Amplitude spectrum

• excellent approximation, with only 2frequencies!

• so what?• A1: (lossy) compression• A2: pattern discovery 1 12 23 34 45 56 67 78 89 10

0

111

18

15-721 C. Faloutsos 103

CMU SCS

DFT: Amplitude spectrum

• excellent approximation, with only 2frequencies!

• so what?• A1: (lossy) compression• A2: pattern discovery

actual mean mean+freq12

15-721 C. Faloutsos 104

CMU SCS

DFT - Conclusions

• It spots periodicities (with the‘amplitude spectrum’)

• can be quickly computed (O( n log n)),thanks to the FFT algorithm.

• standard tool in signal processing(speech, image etc signals)

• (closely related to DCT and JPEG)

15-721 C. Faloutsos 105

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP

– DFT– DWT

• Definition of DWT and properties• how to read the DWT scalogram

15-721 C. Faloutsos 106

CMU SCS

Problem #1:

Goal: given a signal (eg., #packets over time)Find: patterns, periodicities, and/or compress

year

count lynx caught per year(packets per day;virus infections per month)

15-721 C. Faloutsos 107

CMU SCS

Wavelets - DWT

• DFT is great - but, how about compressinga spike?

value

time

0

0.2

0.4

0.6

0.8

1

1.2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

15-721 C. Faloutsos 108

CMU SCS

0

0.2

0.4

0.6

0.8

1

1.2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Wavelets - DWT

• DFT is great - but, how about compressinga spike?

• A: Terrible - all DFT coefficients needed!

00.20.40.6

0.81

1.2

1 3 5 7 9 11 13 15

Freq.

Ampl.value

time

19

15-721 C. Faloutsos 109

CMU SCS

Wavelets - DWT

• DFT is great - but, how about compressinga spike?

• A: Terrible - all DFT coefficients needed!

0

0.2

0.4

0.6

0.8

1

1.2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

value

time

0

0.2

0.4

0.6

0.8

1

1.2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

15-721 C. Faloutsos 110

CMU SCS

Wavelets - DWT

• Similarly, DFT suffers on short-durationwaves (eg., baritone, silence, soprano)

time

value

15-721 C. Faloutsos 111

CMU SCS

Wavelets - DWT

• Solution#1: Short window Fouriertransform (SWFT)

• But: how short should be the window?

time

freq

time

value

15-721 C. Faloutsos 112

CMU SCS

Wavelets - DWT

• Answer: multiple window sizes! -> DWT

time

freq

Timedomain DFT SWFT DWT

15-721 C. Faloutsos 113

CMU SCS

Haar Wavelets

• subtract sum of left half from right half• repeat recursively for quarters, eight-ths, ...

15-721 C. Faloutsos 114

CMU SCS

Wavelets - construction

x0 x1 x2 x3 x4 x5 x6 x7

20

15-721 C. Faloutsos 115

CMU SCS

Wavelets - construction

x0 x1 x2 x3 x4 x5 x6 x7

s1,0+

-

d1,0 s1,1d1,1 .......level 1

15-721 C. Faloutsos 116

CMU SCS

Wavelets - construction

d2,0

x0 x1 x2 x3 x4 x5 x6 x7

s1,0+

-

d1,0 s1,1d1,1 .......

s2,0level 2

15-721 C. Faloutsos 117

CMU SCS

Wavelets - construction

d2,0

x0 x1 x2 x3 x4 x5 x6 x7

s1,0+

-

d1,0 s1,1d1,1 .......

s2,0

etc ...

15-721 C. Faloutsos 118

CMU SCS

Wavelets - construction

d2,0

x0 x1 x2 x3 x4 x5 x6 x7

s1,0+

-

d1,0 s1,1d1,1 .......

s2,0

Q: map each coefficient

on the time-freq. plane

t

f

15-721 C. Faloutsos 119

CMU SCS

Wavelets - construction

d2,0

x0 x1 x2 x3 x4 x5 x6 x7

s1,0+

-

d1,0 s1,1d1,1 .......

s2,0

Q: map each coefficient

on the time-freq. plane

t

f

15-721 C. Faloutsos 120

CMU SCS

Haar wavelets - code#!/usr/bin/perl5# expects a file with numbers# and prints the dwt transform# The number of time-ticks should be a power of 2# USAGE# haar.pl <fname>

my @vals=();my @smooth; # the smooth component of the signalmy @diff; # the high-freq. component

# collect the values into the array @valwhile(<>){

@vals = ( @vals , split );}

my $len = scalar(@vals);my $half = int($len/2);while($half >= 1 ){ for(my $i=0; $i< $half; $i++){

$diff [$i] = ($vals[2*$i] - $vals[2*$i + 1] )/ sqrt(2); print "\t", $diff[$i]; $smooth [$i] = ($vals[2*$i] + $vals[2*$i + 1] )/ sqrt(2);

} print "\n"; @vals = @smooth; $half = int($half/2);}print "\t", $vals[0], "\n" ; # the final, smooth component

21

15-721 C. Faloutsos 121

CMU SCS

Wavelets - construction

Observation1:‘+’ can be some weighted addition‘-’ is the corresponding weighted difference

(‘Quadrature mirror filters’ )

Observation2: unlike DFT/DCT,there are *many* wavelet bases: Haar, Daubechies-

4, Daubechies-6, Coifman, Morlet, Gabor, ...

15-721 C. Faloutsos 122

CMU SCS

Wavelets - how do they looklike?

• E.g., Daubechies-4

15-721 C. Faloutsos 123

CMU SCS

Wavelets - how do they looklike?

• E.g., Daubechies-4

?

?

15-721 C. Faloutsos 124

CMU SCS

Wavelets - how do they looklike?

• E.g., Daubechies-4

15-721 C. Faloutsos 125

CMU SCS

Outline

• Motivation• Similarity Search and Indexing• DSP

– DFT– DWT

• Definition of DWT and properties• how to read the DWT scalogram

15-721 C. Faloutsos 126

CMU SCS

Wavelets - Drill#1:

t

f

• Q: baritone/silence/soprano - DWT?

time

value

22

15-721 C. Faloutsos 127

CMU SCS

Wavelets - Drill#1:

t

f

• Q: baritone/soprano - DWT?

time

value

15-721 C. Faloutsos 128

CMU SCS

Wavelets - Drill#2:

• Q: spike - DWT?

t

f

1 2 3 4 5 6 7 8

15-721 C. Faloutsos 129

CMU SCS

Wavelets - Drill#2:

t

f

• Q: spike - DWT?

1 2 3 4 5 6 7 8

0.00 0.00 0.71 0.00

0.00 0.50-0.350.35

15-721 C. Faloutsos 130

CMU SCS

Wavelets - Drill#3:

• Q: weekly + daily periodicity, + spike -DWT?

t

f

15-721 C. Faloutsos 131

CMU SCS

Wavelets - Drill#3:

• Q: weekly + daily periodicity, + spike -DWT?

t

f

15-721 C. Faloutsos 132

CMU SCS

Wavelets - Drill#3:

• Q: weekly + daily periodicity, + spike -DWT?

t

f

23

15-721 C. Faloutsos 133

CMU SCS

Wavelets - Drill#3:

• Q: weekly + daily periodicity, + spike -DWT?

t

f

15-721 C. Faloutsos 134

CMU SCS

Wavelets - Drill#3:

• Q: weekly + daily periodicity, + spike -DWT?

t

f

15-721 C. Faloutsos 135

CMU SCS

Wavelets - Drill#3:

• Q: DFT?

t

f

t

f

DWT DFT

15-721 C. Faloutsos 136

CMU SCS

Advantages of Wavelets

• Better compression (better RMSE with samenumber of coefficients - used in JPEG-2000)

• fast to compute (usually: O(n)!)• very good for ‘spikes’• mammalian eye and ear: Gabor wavelets

15-721 C. Faloutsos 137

CMU SCS

Overall Conclusions

• DFT, DCT spot periodicities• DWT : multi-resolution - matches

processing of mammalian ear/eye better• All three: powerful tools for compression,

pattern detection in real signals• All three: included in math packages

– (matlab, ‘R’ , mathematica, … - often inspreadsheets!)

15-721 C. Faloutsos 138

CMU SCS

Overall Conclusions

• DWT : very suitable for self-similartraffic

• DWT: used for summarization of streams[Gilbert+01], db histograms etc

24

15-721 C. Faloutsos 139

CMU SCS

Resources - software and urls

• http://www.dsptutor.freeuk.com/jsanalyser/FFTSpectrumAnalyser.html : Nice javaapplets for FFT

• http://www.relisoft.com/freeware/freq.htmlvoice frequency analyzer (needsmicrophone)

15-721 C. Faloutsos 140

CMU SCS

Resources: software and urls

• xwpl: open source wavelet package fromYale, with excellent GUI

• http://monet.me.ic.ac.uk/people/gavin/java/waveletDemos.html : wavelets andscalograms

15-721 C. Faloutsos 141

CMU SCS

Books

• William H. Press, Saul A. Teukolsky, William T.Vetterling and Brian P. Flannery: Numerical Recipes in C,Cambridge University Press, 1992, 2nd Edition. (Greatdescription, intuition and code for DFT, DWT)

• C. Faloutsos: Searching Multimedia Databases by Content,Kluwer Academic Press, 1996 (introduction to DFT,DWT)

15-721 C. Faloutsos 142

CMU SCS

Additional Reading

• [Gilbert+01] Anna C. Gilbert, Yannis Kotidis and S.Muthukrishnan and Martin Strauss, Surfing Wavelets onStreams: One-Pass Summaries for Approximate AggregateQueries, VLDB 2001