
CMU SCS

15-826: Multimedia Databases and Data Mining

Lecture #18: SVD - part I (definitions)

C. Faloutsos

15-826 Copyright: C. Faloutsos (2012)

Must-read Material

• Numerical Recipes in C ch. 2.6;

• MM Textbook Appendix D


Outline

Goal: ‘Find similar / interesting things’

• Intro to DB

• Indexing - similarity search

• Data Mining

Indexing - Detailed outline

• primary key indexing
• secondary key / multi-key indexing
• spatial access methods
• fractals
• text
• Singular Value Decomposition (SVD)
• multimedia
• ...

SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties

SVD - Motivation

• problem #1: text - LSI: find ‘concepts’
• problem #2: compression / dim. reduction

SVD - Motivation

• problem #1: text - LSI: find ‘concepts’

SVD - Motivation

• Customer-product, for recommendation system:

               bread  lettuce  tomatoes  beef  chicken
vegetarians      1       1        1       0       0
                 2       2        2       0       0
                 1       1        1       0       0
                 5       5        5       0       0
meat eaters      0       0        0       2       2
                 0       0        0       3       3
                 0       0        0       1       1

SVD - Motivation

• problem #2: compress / reduce dimensionality


Problem - specs

• ~10**6 rows; ~10**3 columns; no updates;

• random access to any cell(s) ; small error: OK



SVD - Definition

(reminder: matrix multiplication)

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}
\times
\begin{bmatrix} 1 \\ -1 \end{bmatrix}
=
\begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}
$$

[3 x 2] x [2 x 1] = [3 x 1]
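The reminder above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the names `M`, `x` and `y` are illustrative and do not appear on the slides.

```python
import numpy as np

# The 3 x 2 matrix and the 2 x 1 vector from the reminder slide.
M = np.array([[1, 2],
              [3, 4],
              [5, 6]])
x = np.array([[1],
              [-1]])

# (3 x 2) times (2 x 1) gives a 3 x 1 result: each entry is the
# inner product of one row of M with x, e.g. 1*1 + 2*(-1) = -1.
y = M @ x
print(y.shape)     # (3, 1)
print(y.ravel())   # [-1 -1 -1]
```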

SVD - Definition

$$A_{[n \times m]} = U_{[n \times r]} \; \Lambda_{[r \times r]} \; (V_{[m \times r]})^T$$

• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• Λ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)


SVD - Properties

THEOREM [Press+92]: it is always possible to decompose matrix A into A = U Λ V^T, where

• U, V: unique (*)
• U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
  – U^T U = I; V^T V = I (I: identity matrix)
• Λ: diagonal; its entries (the singular values) are positive, and sorted in decreasing order
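These properties can be verified numerically. The sketch below assumes NumPy's `numpy.linalg.svd`; the test matrix is arbitrary random data (not from the lecture), and the `1e-10` rank tolerance is an illustrative choice.

```python
import numpy as np

# Arbitrary 6 x 4 test matrix (random data, for illustration only).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

# full_matrices=False returns the "thin" SVD: U is n x min(n, m).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the strictly positive singular values (r = numerical rank).
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

reconstructs = np.allclose(U @ np.diag(s) @ Vt, A)   # A = U Lambda V^T
u_orthonormal = np.allclose(U.T @ U, np.eye(r))      # U^T U = I
v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(r))    # V^T V = I
sorted_desc = bool(np.all(np.diff(s) <= 0))          # decreasing order
positive = bool(np.all(s > 0))

print(reconstructs, u_orthonormal, v_orthonormal, sorted_desc, positive)
```

Note that NumPy returns the singular values as a 1-D array `s` rather than as the diagonal matrix Λ, hence the `np.diag(s)` in the reconstruction.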

SVD - Example

• A = U Λ V^T - example (rows: documents, the first four from CS and the last three from MD; columns: the terms data, inf., retrieval, brain, lung):

$$
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix}
0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27
\end{bmatrix}}_{U}
\times
\underbrace{\begin{bmatrix} 9.64 & 0 \\ 0 & 5.29 \end{bmatrix}}_{\Lambda}
\times
\underbrace{\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}}_{V^T}
$$

• U: doc-to-concept similarity matrix (first column: CS-concept; second column: MD-concept)
• 9.64: ‘strength’ of the CS-concept
• V^T: term-to-concept similarity matrix (first row: CS-concept)
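The numbers in this example can be reproduced with a few lines of code. This is a sketch assuming NumPy; since singular vectors are only determined up to sign, absolute values are printed for comparison with the slide.

```python
import numpy as np

# Document-term matrix from the example: 4 CS documents, 3 MD documents;
# columns are the terms data, inf., retrieval, brain, lung.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The matrix has rank 2, so only the first two singular values are non-zero.
r = 2
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

print(np.round(s, 2))            # [9.64 5.29]
print(np.round(np.abs(U), 2))    # doc-to-concept matrix (up to sign)
print(np.round(np.abs(Vt), 2))   # term-to-concept matrix, transposed (up to sign)
```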


SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:

• U: document-to-concept similarity matrix
• V: term-to-concept sim. matrix
• Λ: its diagonal elements: ‘strength’ of each concept

SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:

Q: if A is the document-to-term matrix, what is A^T A?
A: term-to-term ([m x m]) similarity matrix

Q: A A^T ?
A: document-to-document ([n x n]) similarity matrix
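A small numerical illustration of the two answers, as a sketch assuming NumPy and using the document-term matrix of the running example (the names `term_sim` and `doc_sim` are illustrative):

```python
import numpy as np

# Document-term matrix of the running example (n = 7 documents, m = 5 terms).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# Entry (i, j) of A^T A is the inner product of term columns i and j.
term_sim = A.T @ A      # m x m
# Entry (i, j) of A A^T is the inner product of document rows i and j.
doc_sim = A @ A.T       # n x n

print(term_sim.shape, doc_sim.shape)     # (5, 5) (7, 7)
print(term_sim[0, 1], term_sim[0, 3])    # 31.0 0.0  ('data' vs 'inf.', 'data' vs 'brain')
print(doc_sim[0, 3], doc_sim[0, 4])      # 15.0 0.0  (two CS docs, a CS doc vs an MD doc)
```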

SVD properties

• The columns of V are the eigenvectors of the covariance matrix A^T A (strictly a covariance matrix only when the columns of A are zero-mean, and up to a constant factor)
• The columns of U are the eigenvectors of the Gram (inner-product) matrix A A^T

Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.

(Slide copyright: Faloutsos, Tong (2009))
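The eigenvector property can be checked directly: if A = U Λ V^T, then (A^T A) V = V Λ² and (A A^T) U = U Λ². The sketch below assumes NumPy and reuses the matrix of the running example; the eigenvalues are the squared singular values.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, V = U[:, :2], s[:2], Vt[:2, :].T   # rank-2 part; V is m x r

# Multiplying V by s**2 scales column j of V by the eigenvalue s[j]**2.
v_is_eigen = np.allclose(A.T @ A @ V, V * s**2)   # (A^T A) V = V Lambda^2
u_is_eigen = np.allclose(A @ A.T @ U, U * s**2)   # (A A^T) U = U Lambda^2

print(v_is_eigen, u_is_eigen)
print(np.round(s**2, 2))   # eigenvalues: [93. 28.]
```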

SVD - Interpretation #2

• best axis to project on: (‘best’ = min sum of squares of projection errors)


SVD - Interpretation #2

• SVD: gives best axis to project on (minimum RMS error)
• v1: first singular vector

SVD - Interpretation #2

• A = U Λ V^T - example:

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix} 9.64 & 0 \\ 0 & 5.29 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$

• v1: the first row of V^T, i.e. (0.58, 0.58, 0.58, 0, 0)
• 9.64: variance (‘spread’) on the v1 axis
• U gives the coordinates of the points on the projection axis (up to scaling by the singular value: the coordinate of row i on v1 is 9.64 times the i-th entry of the first column of U)
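To make the last point concrete: projecting each row of A onto v1 gives A v1, which equals the first column of U scaled by the first singular value. A sketch assuming NumPy (`coords` is an illustrative name):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]          # first right singular vector (unit length)

# Coordinate of every row (document) on the v1 axis.
coords = A @ v1

# Same thing read off the decomposition: first column of U times lambda_1.
same = np.allclose(coords, s[0] * U[:, 0])
print(same)
print(np.round(np.abs(coords), 2))   # magnitudes; the overall sign is arbitrary
```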

SVD - Interpretation #2

• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero:

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix} 9.64 & 0 \\ 0 & 5.29 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$

SVD - Interpretation #2

With the smallest singular value (5.29) set to zero:

$$
A \approx
\begin{bmatrix}
0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix} 9.64 & 0 \\ 0 & 0 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$

The second column of U and the second row of V^T no longer contribute, so they can be dropped:

$$
A \approx
\begin{bmatrix}
0.18 \\ 0.36 \\ 0.18 \\ 0.90 \\ 0 \\ 0 \\ 0
\end{bmatrix}
\times
\begin{bmatrix} 9.64 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0
\end{bmatrix}
$$

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\approx
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
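The truncation step can be sketched in code as follows, assuming NumPy; `A1` is an illustrative name for the rank-1 approximation.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Dimensionality reduction: keep only the largest singular value.
k = 1
A1 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The CS block survives, the MD block is wiped out.
cs_block_kept = np.allclose(A1[:4], A[:4])
md_block_zeroed = np.allclose(A1[4:], 0.0)

# The Frobenius-norm error equals the dropped singular value (about 5.29).
error = np.linalg.norm(A - A1)
print(cs_block_kept, md_block_zeroed, round(float(error), 2))
```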

SVD - Interpretation #2

Exactly equivalent: ‘spectral decomposition’ of the matrix:

$$
A =
\begin{bmatrix} u_1 & u_2 \end{bmatrix}
\times
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\times
\begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}
$$

$$
\underbrace{A}_{n \times m} = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
$$

• each u_i is n x 1 and each v_i^T is 1 x m
• the sum has r terms
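The equivalence of the matrix form and the spectral form can be checked numerically. This is a sketch assuming NumPy; each term is an n x m rank-1 matrix built with `np.outer`.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2   # rank of this particular matrix

# Spectral form: sum of r rank-1 terms lambda_i * u_i * v_i^T.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(r)]
A_spectral = sum(terms)

print(terms[0].shape)                  # (7, 5)
print(np.allclose(A_spectral, A))      # True
```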

SVD - Interpretation #2

approximation / dim. reduction: keep only the first few terms of the spectral decomposition

$$
\underbrace{A}_{n \times m} \approx \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
$$

assume: λ1 >= λ2 >= ...

• Q: how many terms to keep?
• A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the λi’s)
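One way the energy heuristic could be coded is sketched below, assuming NumPy. The function name `rank_for_energy` and its `fraction` parameter are hypothetical, introduced here only for illustration.

```python
import numpy as np

def rank_for_energy(singular_values, fraction=0.9):
    """Smallest k whose first k singular values hold `fraction` of the energy.

    Energy = sum of squared singular values; the input is assumed to be
    sorted in decreasing order, as returned by np.linalg.svd.
    """
    sq = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(sq) / np.sum(sq)
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(sq))   # guard against round-off when fraction is near 1

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
s = np.linalg.svd(A, compute_uv=False)

# Energy split here: 93 / 121 (about 77%) in the first term, 28 / 121 in the second.
print(rank_for_energy(s, 0.75))   # 1
print(rank_for_energy(s, 0.90))   # 2
```

For this small example the first concept alone holds about 77% of the energy, so a 90% threshold keeps both concepts.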

Pictorially: matrix form of SVD

– Best rank-k approximation in L2

$$
\underbrace{A}_{n \times m} \approx U \, \Lambda \, V^T
$$

Pictorially: spectral form of SVD

– Best rank-k approximation in L2

$$
\underbrace{A}_{n \times m} \approx \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
$$

SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
  – #1: documents/terms/concepts
  – #2: dim. reduction
  – #3: picking non-zero, rectangular ‘blobs’
• Complexity
• Case studies
• Additional properties

SVD - Interpretation #3

• finds non-zero ‘blobs’ in a data matrix = ‘communities’ (bi-partite cores, here)

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix} 9.64 & 0 \\ 0 & 5.29 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$

• first blob: Row 1 to Row 4, Col 1 to Col 3
• second blob: Row 5 to Row 7, Col 4 to Col 5

SVD - Interpretation #3

• Drill: find the SVD, ‘by inspection’!
• Q: rank = ??
• A: rank = 2 (2 linearly independent rows/cols)

First attempt, with one 0/1 indicator vector per blob:

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1
\end{bmatrix}
\times
\begin{bmatrix} ?? & 0 \\ 0 & ?? \end{bmatrix}
\times
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
$$

• orthogonal?? The column vectors are orthogonal - but not unit vectors. After normalizing them:

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
1/\sqrt{3} & 0 \\ 1/\sqrt{3} & 0 \\ 1/\sqrt{3} & 0 \\ 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2}
\end{bmatrix}
\times
\begin{bmatrix} ?? & 0 \\ 0 & ?? \end{bmatrix}
\times
\begin{bmatrix}
1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} & 0 & 0 \\
0 & 0 & 0 & 1/\sqrt{2} & 1/\sqrt{2}
\end{bmatrix}
$$

• and the singular values are:

$$
\Lambda = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}
$$

SVD - Interpretation #3

• Q: How to check we are correct?
• A: SVD properties:
  – matrix product should give back matrix A
  – matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
  – ditto for matrix V
  – matrix Λ should be diagonal, with positive values
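The four checks listed above can be run on the ‘by inspection’ answer to the drill. This is a sketch assuming NumPy; `L` stands for Λ.

```python
import numpy as np

# Drill matrix: a 3 x 3 blob of ones and a 2 x 2 blob of ones.
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

a, b = 1 / np.sqrt(3), 1 / np.sqrt(2)

# Candidate factors found by inspection.
U = np.array([[a, 0], [a, 0], [a, 0], [0, b], [0, b]])
L = np.diag([3.0, 2.0])
V = np.array([[a, 0], [a, 0], [a, 0], [0, b], [0, b]])

product_ok = np.allclose(U @ L @ V.T, A)       # product gives back A
u_ok = np.allclose(U.T @ U, np.eye(2))         # U column-orthonormal
v_ok = np.allclose(V.T @ V, np.eye(2))         # ditto for V
l_ok = bool(np.all(np.diag(L) > 0)) and np.allclose(L, np.diag(np.diag(L)))

print(product_ok, u_ok, v_ok, l_ok)
```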


SVD - Complexity

• O(n * m * m) or O(n * n * m) (whichever is less)
• less work, if we just want singular values
• or if we want first k singular vectors
• or if the matrix is sparse [Berry]
• Implemented: in any linear algebra package (LINPACK, matlab, Splus/R, mathematica ...)
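As one example of such a package, NumPy exposes some of these options directly. The sketch below uses arbitrary random data (not from the lecture). Note that slicing the first k vectors out of a full decomposition, as done here, does not save any work; dedicated truncated or sparse solvers would be needed for that and are not shown.

```python
import numpy as np

# Arbitrary tall matrix (random data, for illustration only): n >> m.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 50))

# Singular values only: the singular vectors are not computed.
s_only = np.linalg.svd(A, compute_uv=False)

# Thin SVD: U is n x m instead of n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (1000, 50) (50,) (50, 50)

# First k singular triplets, obtained here by slicing the thin SVD.
k = 5
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
```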

SVD - conclusions so far

• SVD: A = U Λ V^T : unique (*)
• U: document-to-concept similarities
• V: term-to-concept similarities
• Λ: strength of each concept
• dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’)
  – SVD: picks up linear correlations
• SVD: picks up non-zero ‘blobs’

References

• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
• Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.