23
1 15-826 (c) C. Faloutsos and J-Y Pan (2007) #1 CMU SCS 15-826: Multimedia Databases and Data Mining Independent Component Analysis (ICA) Jia-Yu Pan and Christos Faloutsos 15-826 (c) C. Faloutsos and J-Y Pan (2007) #2 CMU SCS Outline • Motivation • Formulation PCA and ICA Example applications • Conclusion 15-826 (c) C. Faloutsos and J-Y Pan (2007) #3 CMU SCS Motivation: (Q1) Find patterns in data Motion capture data: broad jumps Left Knee Right Knee Energy exerted Take-off Landing Energy exerted

15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

1

15-826 (c) C. Faloutsos and J-Y Pan (2007) #1

CMU SCS

15-826: Multimedia Databases

and Data Mining

Independent Component Analysis (ICA)

Jia-Yu Pan and Christos Faloutsos

15-826 (c) C. Faloutsos and J-Y Pan (2007) #2

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

• Conclusion

15-826 (c) C. Faloutsos and J-Y Pan (2007) #3

CMU SCS

Motivation:

(Q1) Find patterns in data• Motion capture data: broad jumps

Left Knee

Right Knee

Energy exerted

Take-off

Landing

Energy exerted

Page 2: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

2

15-826 (c) C. Faloutsos and J-Y Pan (2007) #4

CMU SCS

Motivation:

(Q1) Find patterns in data

• Human would say

– Pattern 1: along

diagonal

– Pattern 2: along

vertical axis

• How to find these

automatically?Each point is the measurement

at a time tick (total 550 points).

Take-off

LandingR:L=60:1

R:L=1:1

15-826 (c) C. Faloutsos and J-Y Pan (2007) #5

CMU SCS

Motivation:

(Q2) Find hidden variables

Hidden variables

“Internet bubble”

Alcoa

American

Express

Boeing

Citi Group

“General trend”

Stock prices

15-826 (c) C. Faloutsos and J-Y Pan (2007) #6

CMU SCS

Motivation:

(Q2) Find hidden variables

“General trend” “Internet bubble”

0.940.030.63

0.64

Caterpillar Intel

Hidden variables

Page 3: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

3

15-826 (c) C. Faloutsos and J-Y Pan (2007) #7

CMU SCS

Motivation:

(Q2) Find hidden variables

Hidden variable 1 Hidden variable 2

B1,CAT

B1,INTC B2,CAT

B2,INTC?

? ?

Caterpillar Intel

15-826 (c) C. Faloutsos and J-Y Pan (2007) #8

CMU SCS

Motivation:

Find hidden variables

• There are two sound sources in a cocktail party…

Also called the “blind source separation” problem.

“Blind” because we don’t know the sources,

nor how they are mixed.

15-826 (c) C. Faloutsos and J-Y Pan (2007) #9

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

• Conclusion

Page 4: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

4

15-826 (c) C. Faloutsos and J-Y Pan (2007) #10

CMU SCS

Formulation: Finding patterns

Given n data points,

each with m attributes.

Find patterns that describe

data properties the best.

15-826 (c) C. Faloutsos and J-Y Pan (2007) #11

CMU SCS

Linear representation

• Find patterns that are vectors that describe the data set the best.

• Each point is described as a linear combination of the vectors

(patterns):

22,1, bbx 1i

vvvii hh +=

15-826 (c) C. Faloutsos and J-Y Pan (2007) #12

CMU SCS

Patterns as data “vocabulary”

Good pattern ≈ sparse coding

(Q) Given data xi’s,compute hi,j’s and bi’s that are “sparse”?

Only b1 is needed

to describe xi.

Page 5: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

5

15-826 (c) C. Faloutsos and J-Y Pan (2007) #13

CMU SCS

Patterns in motion capture data

b2

b1

n=550 ticks

2

2,1,

2,21,2

2,11,1

nxX

nn xx

xx

xx

MM

MM

Data

matrix

Left Right

222

2

2,1,

2,21,2

2,11,1

xnx

1

BH

b

b

=

−−

−−

= r

r

MM

MM

nn hh

hh

hh

Basis

vectors

Hidden

variables

? ?

Sparse ~ non-Gaussian ~ “Independent”

“Independent”: e.g., minimize mutual information.

15-826 (c) C. Faloutsos and J-Y Pan (2007) #14

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

• Conclusion

15-826 (c) C. Faloutsos and J-Y Pan (2007) #15

CMU SCS

Basis vectors and hidden variables

• Goal: Knowing X, find H and B, where

X = H B

• Problem: Under-constrained

– Need additional assumptions/constraints

X: data set

H: hidden variables

B: basis vectors

Page 6: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

6

15-826 (c) C. Faloutsos and J-Y Pan (2007) #16

CMU SCS

PCA and ICA

• PCA vectors: major variations

– Together = good “low-rank approximation”/dimensional reduction

– Individually ≠ meaningful patterns

• Luckily, ICA detects the major meaningful patterns.

PCA Vectors ICA Vectors

15-826 (c) C. Faloutsos and J-Y Pan (2007) #17

CMU SCS

PCA

• PCA (Principal Component Analysis)

– Choose vectors which are orthonormal and

– give smallest representation L2 error for

dimensional reduction

• Matrices H and B can be solved by

– SVD, neural networks, or many optimization

methods

15-826 (c) C. Faloutsos and J-Y Pan (2007) #18

CMU SCS

PCA

• Extremely popular

– Latent Semantic Indexing [Deerwester+90]

– KL transform [Duda,Hart,Stork00]

– EigenFace [Turk,Pentlend91]

– Multiple time series correlation

[Guha,Gunopulos,Koudas03]

• But, there is room for improvement.

Page 7: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

7

15-826 (c) C. Faloutsos and J-Y Pan (2007) #19

CMU SCS

ICA• ICA (Independent Component Analysis)

– Make hidden variables hi’s (columns of H)

mutually independent.

• Many implementations

– Many ways to define “independence”

– Many ways to find the most independent H.

– (B is found at the same time, since X=HB.)

15-826 (c) C. Faloutsos and J-Y Pan (2007) #20

CMU SCS

ICA

• Define “Independence”:

– Zero mutual information

– Non- Gaussianity, max. absolute Kurtosis

• To solve for H,B:

– Neural networks, optimization methods

(gradient ascent, fixed- point, …)

)()(),( jiji hphphhp =

15-826 (c) C. Faloutsos and J-Y Pan (2007) #21

CMU SCS

An non-gaussian distribution:

Laplacian pdf

)exp(2

)( xxP λλ

−=

Sharper at 0,

and more heavy tail

than Gaussian pdf

Page 8: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

8

15-826 (c) C. Faloutsos and J-Y Pan (2007) #22

CMU SCS

Maximizing non-gaussianity• Assume hi ~ non- gaussian pdf (e.g. Laplacian pdf)

– Fixed hi values, what is the most likely “B” ?

• (data point x is given and fixed)

– Find B, s.t. likelihood P(x|B) is maximized.

)det(

)()|(

)()()(

)det(

)()(

B

BxBx

Bxhh

B

hBx

Bhx

1

1

=⇒

==

=⇒

=

vv

vvv

vv

vv

Laplacian

LaplacianLaplacian

PP

PPP

P|P

15-826 (c) C. Faloutsos and J-Y Pan (2007) #23

CMU SCS

Maximize likelihood

• Likelihood P(x|B) is a function of B, f(B)

• Gradient ascent

– To find B which maximizes P(x|B)

f(B)

BB*

15-826 (c) C. Faloutsos and J-Y Pan (2007) #24

CMU SCS

ICA by maximum likelihood

H B

X: static

1−= XBH

BBB

nBHZBB

HsignZ

TTT

∆+=

−−∝∆

−=

ε

)(

Page 9: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

9

15-826 (c) C. Faloutsos and J-Y Pan (2007) #25

CMU SCS

Convergence of ICA

ICA

Start with the matrix B as

an identity matrix.

15-826 (c) C. Faloutsos and J-Y Pan (2007) #26

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

– Find topics in documents

– Hidden variables in stock prices

– Visual vocabulary for retinal images

• Conclusion

15-826 (c) C. Faloutsos and J-Y Pan (2007) #27

CMU SCS

Pattern discovery with ICA: AutoSplit[PAKDD 04][WIRI 05]

Step 1:

Data points (matrix)

Step 2:Compute patterns

Step 3:

Interpret patterns

or

or

Data mining

(Case studies)

(Q) What pattern?

(Q) Differentmodalities

(Q) How?

Video frames

Stock prices

Text documents

Page 10: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

10

15-826 (c) C. Faloutsos and J-Y Pan (2007) #28

CMU SCS

Finding patterns in high-

dimensional data

PCA finds the hyperplane. ICA finds the correct patterns.

Dimensionality

reduction

15-826 (c) C. Faloutsos and J-Y Pan (2007) #29

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

– Find topics in documents

– Hidden variables in stock prices

– Visual vocabulary for retinal images

• Conclusion

15-826 (c) C. Faloutsos and J-Y Pan (2007) #30

CMU SCS

Topic discovery on text streams

• Data: CNN headline news (Jan.-Jun. 1998)

• Documents of 10 topics in one single text

stream

– Documents are sorted by date/time

– Subsequent documents may have different

topics

Date/Time

Topic 1 Topic 3 Topic 1…

Page 11: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

11

15-826 (c) C. Faloutsos and J-Y Pan (2007) #31

CMU SCS

Topic discovery on text streams

• Known: number of topics = 10

• Unknown: (1) topic of each document (2)

topic description

Date/TimeTopic 1 Topic 3 Topic 1…

15-826 (c) C. Faloutsos and J-Y Pan (2007) #32

CMU SCS

Topic discovery in documents

X[nxm]

xi = [1, 5, …, 0]

Step 1

Windowing

Step 2

X[nxm] = H[nxm] B[mxm]

(1) Find hyperplane

(m=10)

(2) Find patterns

aaron

zoo

m=3887 (dictionary size)

(n=1659)New stories

(30 words)

Step 3

b’i = [0, 0.7, …, 0.6]

aaron

zoo

animal

B’[10x3887]

−−

−−

−−

'

'

'

10

2

1

b

b

b

M

(Q) What does b’i mean?

15-826 (c) C. Faloutsos and J-Y Pan (2007) #33

CMU SCS

Step 3: Interpret the patterns

b’i = [0, 0.7, …, 0.6]

aaron

zooanim

al

Topicsfound

A hidden topic!

Top words : “animal”, “zoo”, …

Sorted word listID

bombReFloridaTornadoPeoplJ

ReTeamGameBowlSuperI

AsianEconomJapanEconomiAsiaH

VisitCubanCastroCubePopeG

GameWomenGoldOlympMedalF

JoneFormerKillGrahamZamoraE

DoctorPillImpotDrugViagraD

CattlOprahTexaBeefWinfreiC

BirminghamAtlantaClinicRudolphbombB

ArmiMajorsexualSergeantMckinneA

General idea: related to the data attributes

−−

−−

−−

'

'

'

10

2

1

b

b

b

M

m=3887 (dictionary size)

Page 12: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

12

15-826 (c) C. Faloutsos and J-Y Pan (2007) #34

CMU SCS

Step 3: Evaluate the patterns

Tornado in Florida10

Superbowl XXXII9

The economic crisis in Asia8

The Pope’s historic visit to Cube in Winter 19987

1998 Winter Olympic games6

Diane Zamora is convicted of helping to murder her lover’s girlfriend5

New impotency drug Viagra is approved for use4

The Cattle Industry in Texas sues Oprah Winfrey for defaming beef3

A bomb explodes in a Birmingham, AL abortion clinic2

Sgt. Gene Mckinney is on trial for alleged sexual misconduct1

True TopicID

AutoSplit finds correct topics.

Sorted word listID

joneformerkillgrahamzamoraE

doctorpillImpotdrugviagraD

cattloprahtexabeefwinfreiC

birminghamatlantaclinicrudolphbombB

armimajorsexualsergeantmckinneA

15-826 (c) C. Faloutsos and J-Y Pan (2007) #35

CMU SCS

Step 3: Evaluate the patterns

AutoSplitID

joneformerkillgrahamzamoraE

doctorpillImpotdrugviagraD

cattloprahtexabeefwinfreiC

birminghamatlantaclinicrudolphbombB

armimajorsexualsergeantmckinneA

AutoSplit’s topics are better than PCA.

PCAID

olympgrahamwinfreiviagrazamoraE’

beeftexadrugwinfreiviagraD’

oprahbeeftexaviagrawinfreiC’

atlantaclinicrudolphmckinnebombB’

sergeantsexualwomenbombmckinneA’

PCAID

olympgrahamwinfreiviagrazamoraE’

beeftexadrugwinfreiviagraD’

oprahbeeftexaviagrawinfreiC’

atlantaclinicrudolphmckinnebombB’

sergeantsexualwomenbombmckinneA’

15-826 (c) C. Faloutsos and J-Y Pan (2007) #36

CMU SCS

Step 3: Evaluate the patternsAutoSplit

E

D

C

B

A

AutoSplit’s topics are better than PCA.

PCA

E’

D’

C’

B’

A’

PCA vectors mix the topics.

Topic 1

Topic 2

Page 13: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

13

15-826 (c) C. Faloutsos and J-Y Pan (2007) #37

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

– Find topics in documents

– Hidden variables in stock prices

– Visual vocabulary for retinal images

• Conclusion

15-826 (c) C. Faloutsos and J-Y Pan (2007) #38

CMU SCS

Find hidden variables (DJIA stocks)

• Weekly DJIA closing prices

– 01/02/1990- 08/05/2002, n=660 data points

– A data point: prices of 29 companies at the time

Alcoa

American

Express

Boeing

Caterpillar

Citi Group

15-826 (c) C. Faloutsos and J-Y Pan (2007) #39

CMU SCS

Formulation: Find hidden variables

AA1, …, XOM1

AAn, …, XOMn

H11, H12, …, H1m

Hn1, Hn2, …, Hnm

=

B11, B12, …, B1m

Bm1, Bm2, …, Bmm? ?

Date Hidden variable

Date

Page 14: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

14

15-826 (c) C. Faloutsos and J-Y Pan (2007) #40

CMU SCS

Characterize hidden variable by

the companies it influences

“General trend” “Internet bubble”

0.940.63 0.03

0.64

Caterpillar Intel

B1,CAT

B1,INTC B2,CAT

B2,INTC

15-826 (c) C. Faloutsos and J-Y Pan (2007) #41

CMU SCS

Companies related to hidden variable 1

0.658768Hewlett-Packard0.900317Du Pont

0.647774Home Depot0.903858Coca Cola

0.638010Intel0.906542MMM

0.624570WalMart0.911120Boeing

0.021885AT&T0.938512Caterpillar

LowestHighest

B1,j

“General trend”

15-826 (c) C. Faloutsos and J-Y Pan (2007) #42

CMU SCS

Companies related to hidden variable 1

0.658768Hewlett-Packard0.900317Du Pont

0.647774Home Depot0.903858Coca Cola

0.638010Intel0.906542MMM

0.624570WalMart0.911120Boeing

0.021885AT&T0.938512Caterpillar

LowestHighest

B1,j

All companies are affected by the “general trend”variable (with weights 0.6~0.9), except AT&T.

Page 15: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

15

15-826 (c) C. Faloutsos and J-Y Pan (2007) #43

CMU SCS

General trend (and outlier)

“General trend”

AT&T

United Technologies

Walmart

Exxon Mobil

15-826 (c) C. Faloutsos and J-Y Pan (2007) #44

CMU SCS

Companies related to hidden variable 2

0.133337Du Pont0.490529Disney

0.109576Procter and Gamble0.504871American Express

0.031678Caterpillar0.509164GE

-0.089569International Paper0.621159Hewlett-Packard

-0.194843Philip Morris0.641102Intel

LowestHighest

B2,j

2000-2001 “Internet bubble”

Tech company

15-826 (c) C. Faloutsos and J-Y Pan (2007) #45

CMU SCS

Companies related to hidden variable 2

0.133337Du Pont0.490529Disney

0.109576Procter and Gamble0.504871American Express

0.031678Caterpillar0.509164GE

-0.089569International Paper0.621159Hewlett-Packard

-0.194843Philip Morris0.641102Intel

LowestHighest

B2,j

Companies affected by the “internet bubble” variable (with weights 0.5~0.6) are tech-related.

Other companies are un-related (weights < 0.15).

Tech company

Page 16: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

16

15-826 (c) C. Faloutsos and J-Y Pan (2007) #46

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

– Find topics in documents

– Hidden variables in stock prices

– Visual vocabulary for retinal images

• Conclusion

15-826 (c) C. Faloutsos and J-Y Pan (2007) #47

CMU SCS

Mining cat retinal images [ICDM 05]

Normal 1 day after

detachment

7 days after

detachment

Retina

28 days after

detachment

3d28dr1h3dr 1d6dO2

Treatment

Detachment Development Distribution of 2 proteins

15-826 (c) C. Faloutsos and J-Y Pan (2007) #48

CMU SCS

“Vocabulary” for biomedical images?

• How to describe biomedical images?

• Analogy: the topics for text

• Football reports: “touchdown”, “punt”, etc.

• DB papers: “query”, “optimization”, etc.

• How to derive “visual vocabulary terms”?

Normal7 days after

detachment

“spongy”

Page 17: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

17

15-826 (c) C. Faloutsos and J-Y Pan (2007) #49

CMU SCS

Visual Vocabulary (ViVo) by

AutoSplit

Step 1:

Tile image

Step 2:

Extract tile

features

Step 3:

ViVo

generation

Visual

vocabulary

V1

V2

Feature 1

Fea

ture

2

8x12 tiles

15-826 (c) C. Faloutsos and J-Y Pan (2007) #50

CMU SCS

Finding ViVos

Each point is a tile. Projected to the 1st and 2nd PCA vectors.

(Feature vector: 512 color structurefeatures.)

Red lines indicate ViVos.

15-826 (c) C. Faloutsos and J-Y Pan (2007) #51

CMU SCS

Bio-mining with “ViVo”

• Visual Vocabulary for retinal images

– using AutoSplit

• Evaluation

– Qualitative: biological meanings of ViVos

– Data mining: highlight “interesting” regions

Page 18: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

18

15-826 (c) C. Faloutsos and J-Y Pan (2007) #52

CMU SCS

Biological interpretation of ViVos

V11

V8

V10

V1

ID

Co-occurring processes: Müller cell

hypertrophy and rod opsin

redistribution

Redistribution of rod opsin into cell

bodies of rod photoreceptors

Healthy outer segments of rod

photoreceptors

GFAP in inner retina (Müller cells)

Description ConditionViVo

Detached

Detached

Healthy

Healthy

15-826 (c) C. Faloutsos and J-Y Pan (2007) #53

CMU SCS

Biological interpretation of ViVos

5

4

3

2

ID

Healthy outer

segments of

rod

photoreceptors

(rod opsin)

GFAP in

hypertrophy

Müller cells

GFAP in

hypertrophy

Müller cells

GFAP in

hypertrophy

Müller cells

Description ConditionViVo

Healthy

Morphologic

al changes in

inner retina

Morphologic

al changes in

inner retina

Morphologic

al changes in

inner retina

12

9

7

6

ID

GFAP in

hypertrophy

Müller cells

Outer segment

degeneration

(rod opsin)

GFAP in

hypertrophy

Müller cells

Rod

photoreceptor

cell body

Description ConditionViVo

Morphologica

l changes in

inner retina

Detached

Morphologica

l changes in

inner retina

Background

labeling

15-826 (c) C. Faloutsos and J-Y Pan (2007) #54

CMU SCS

Bio-mining with “ViVo”

• Visual Vocabulary for retinal images

– using AutoSplit

• Evaluation

– Qualitative: biological meanings of ViVos

– Data mining: highlight “interesting” regions

Page 19: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

19

15-826 (c) C. Faloutsos and J-Y Pan (2007) #55

CMU SCS

Finding “distinguishing ViVos”

• Given: Images of two classes

– Find the class- distinguishing ViVo (“DiVo”)

– Highlight distinguishing regions

Normal Detached 3 days

DiVo: “spongy”

15-826 (c) C. Faloutsos and J-Y Pan (2007) #56

CMU SCS

Summary: a system viewpoint

Our

system

Input Output

(1) Left: “n”; Right: “3d”

Accurate classification

DiVo analysis

V8: “spongy”

(2) Regions shown (V8): “cells of rod photoreceptors”

(3) Description:“Detachment occurs!”

“Rod opsin distributes from

outer segment into cell bodies.”

ViVo interpretation

15-826 (c) C. Faloutsos and J-Y Pan (2007) #57

CMU SCS

Outline

• Motivation

• Formulation

• PCA and ICA

• Example applications

– Find topics in documents

– Hidden variables in stock prices

– Visual vocabulary for retinal images

• Conclusion

Page 20: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

20

15-826 (c) C. Faloutsos and J-Y Pan (2007) #58

CMU SCS

Conclusion

• ICA: more flexible than PCA in finding

patterns.

• Many applications

– Find topics and “vocabulary” for images

– Find hidden variables in time series (e.g., stock

prices)

– Blind source separation

15-826 (c) C. Faloutsos and J-Y Pan (2007) #59

CMU SCS

Vocabulary for embryo gene expressions

with André Balan, Christos Faloutsos, Eric P. Xing

Vocabulary

15-826 (c) C. Faloutsos and J-Y Pan (2007) #60

CMU SCS

References

• Jia-Yu Pan, Andre Guilherme Ribeiro Balan, Eric P. Xing, Agma JuciMachado Traina, and Christos Faloutsos. Automatic Mining of Fruit Fly Embryo Images. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006.

• Arnab Bhattacharya, Vebjorn Ljosa, Jia-Yu Pan, Mark R. Verardo, Hyungjeong Yang, Christos Faloutsos, and Ambuj K. Singh. ViVo: Visual Vocabulary Construction for Mining Biomedical Images. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM), 2005.

• Masafumi Hamamoto, Hiroyuki Kitagawa, Jia-Yu Pan, and ChristosFaloutsos. A Comparative Study of Feature Vector-Based Topic Detection Schemes for Text Streams. In Proceedings of International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), 2005, pp.125-130.

• Jia-Yu Pan, Hiroyuki Kitagawa, Christos Faloutsos, and MasafumiHamamoto. AutoSplit: Fast and Scalable Discovery of Hidden Variables in Stream and Multimedia Databases. In Proceedings of the The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2004.

Page 21: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

21

15-826 (c) C. Faloutsos and J-Y Pan (2007) #61

CMU SCS

Acknowledgement

• Prof. Tai Sing Lee

• Prof. Hiroyuki

Kitagawa

• Prof. HyungJeong

Yang

• Masafumi Hamamoto

• Prof. Nancy Pollard

• Prof. Jessica Hodgins

• Prof. Michael Lewicki

• Prof. Eric Xing

• CMU Informedia

project

• UCSB DB Lab

• CMU bio- imaging

center

• CMU graphics lab

15-826 (c) C. Faloutsos and J-Y Pan (2007) #62

CMU SCS

15-826 (c) C. Faloutsos and J-Y Pan (2007) #63

CMU SCS

Independence

S1

S2

r1

r2

Independent Dependent

Page 22: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

22

15-826 (c) C. Faloutsos and J-Y Pan (2007) #64

CMU SCS

Non-Gaussianity and Independence

More GaussianLess Gaussian

P(s1) P(r1)

15-826 (c) C. Faloutsos and J-Y Pan (2007) #65

CMU SCS

Non-Gaussianity and

Independence

S1

S2

r1

r2

Independent Dependent

15-826 (c) C. Faloutsos and J-Y Pan (2007) #66

CMU SCS

Non-Gaussianity and Independence

More GaussianLess Gaussian

P(s1) P(r1)

Page 23: 15-826: Multimedia Databases and Data Miningchristos/courses/826.S07/FOILS-pdf/620_ICA_2007s… · 15-826 (c) C. Faloutsos and J-Y Pan (2007) #9 CMU SCS Outline • Motivation •

23

15-826 (c) C. Faloutsos and J-Y Pan (2007) #67

CMU SCS

15-826 (c) C. Faloutsos and J-Y Pan (2007) #68

CMU SCS

Citation

• AutoSplit: Fast and Scalable Discovery of

Hidden Variables in Stream and Multimedia

Databases, Jia-Yu Pan, Hiroyuki Kitagawa,

Christos Faloutsos and Masafumi

Hamamoto

PAKDD 2004, Sydney, Australia

15-826 (c) C. Faloutsos and J-Y Pan (2007) #69

CMU SCS

References

• Aapo Hyvärinen, Juha Karhunen, Erkki

Oja: Independent Component Analysis, John

Wiley & Sons, 2001