Transcript
Page 1: Multivariate Information Bottleneck

Multivariate Information Bottleneck

Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby

Hebrew University

Page 2: Multivariate Information Bottleneck

Data Analysis

[Figure: population statistics by age; age axis ticks from 5 to 80]

Page 3: Multivariate Information Bottleneck

Information Bottleneck

Find clusters of “age” that are predictive of education level?

[Figure: education level (None, High school, Some college, Bachelor's degree, PhD) by age (17, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74)]

Page 4: Multivariate Information Bottleneck

Information Bottleneck

Find clusters of “age” that are predictive of education level?

Also cluster education attained to be predictive of age?

[Figure: education level (None, High school, Some college, Bachelor's degree, PhD) by age (17, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74)]

Page 5: Multivariate Information Bottleneck

Our contribution

Generalize Information Bottleneck:

Generic principle for specifying systems of interacting clusters

Characterization of the solution for these specs

General purpose methods for constructing solutions

Page 6: Multivariate Information Bottleneck

Information Bottleneck [Tishby, Pereira & Bialek 99]

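[Diagram: soft clustering P(T|A) compresses the joint distribution P(A,B), over A and B, into P(T,B), over T and B]

[Diagram: graph over A, B and T, annotated with I(T;A) and I(T;B)]

Minimize: $I(T;A) - \beta\, I(T;B)$

$I(T;A)$ - Compression: information lost about A

$I(T;B)$ - Preserved information about B

$\beta$ - Tradeoff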

Page 7: Multivariate Information Bottleneck

Information Bottleneck Reexamined

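G in - Actual distribution: $P(A,B)\,P(T \mid A)$, where $P(A,B)$ is the input and $P(T \mid A)$ are the parameters

G out - Desired independencies: $\mathrm{Ind}(A;B \mid T)$

[Diagrams: graphs over A, B and T for G in and G out]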

Page 8: Multivariate Information Bottleneck

Example: Symmetric Bottleneck

Simultaneous clustering of both A and B, with parameters P(TA|A) and P(TB|B)

[Diagrams: G in and G out, each a graph over A, B, TA and TB]

So that TA captures the information A contains about B

TB captures the information B contains about A

Page 9: Multivariate Information Bottleneck

General Principle

Input: P(X1,…,Xn)

G in - Compression: Tj clusters values of paj

G out - Desired (conditional) independencies

Goal: Find P(Tj|paj) in G in to “match” G out

[Diagram: graph over X1, X2, …, Xn and T1, …, Tk]

Page 10: Multivariate Information Bottleneck

Multi-information

Multi-information

Information random variables jointly contain about each other

Generalizes mutual information

$$I(X_1,\ldots,X_n) = E_P\left[\log \frac{P(X_1,\ldots,X_n)}{P(X_1)\cdots P(X_n)}\right]$$
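The definition of multi-information can be checked numerically on small discrete distributions. The sketch below is illustrative only: the function names and the two toy joints are assumptions, not part of the talk. It computes the expectation directly from a joint table stored as a dictionary, in nats.

```python
import itertools
import math


def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis`."""
    out = {}
    for values, p in joint.items():
        out[values[axis]] = out.get(values[axis], 0.0) + p
    return out


def multi_information(joint):
    """I(X1,...,Xn) = E_P[log P(x1..xn) / (P(x1)...P(xn))], in nats."""
    n = len(next(iter(joint)))
    margs = [marginal(joint, i) for i in range(n)]
    total = 0.0
    for values, p in joint.items():
        if p > 0.0:
            prod = math.prod(margs[i][values[i]] for i in range(n))
            total += p * math.log(p / prod)
    return total


# Hypothetical example: X1 is a fair coin, X2 copies X1, X3 is an independent fair coin.
joint = {(x1, x1, x3): 0.25 for x1, x3 in itertools.product((0, 1), repeat=2)}
copy_info = multi_information(joint)

# Three independent fair coins carry no information about each other.
independent = {v: 0.125 for v in itertools.product((0, 1), repeat=3)}
indep_info = multi_information(independent)
```

For n = 2 the same function returns the ordinary mutual information, which is the sense in which multi-information generalizes it.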

Page 11: Multivariate Information Bottleneck

Graph Projection

Let G be a DAG

Define:

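$$KL(P \,\|\, G) = \min_{Q \models G} KL(P \,\|\, Q)$$

[Diagram: P lies in the set of all possible distributions and is projected onto the subset of distributions consistent with G]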

Page 12: Multivariate Information Bottleneck

Graph Projection

Let G be a DAG

Define:

$$KL(P \,\|\, G) = \min_{Q \models G} KL(P \,\|\, Q)$$

Proposition:

$$KL(P \,\|\, G) = I(X_1,\ldots,X_n) - I^G$$

$I(X_1,\ldots,X_n)$ - Real multi-info

$I^G$ - Multi-info as though P is consistent with G

Page 13: Multivariate Information Bottleneck

Multi-information & Bayesian Networks

Proposition:

If P is consistent with G, that is

$$P(X_1,\ldots,X_n) = \prod_i P(X_i \mid \mathbf{pa}_i)$$

Then

$$I(X_1,\ldots,X_n) = \sum_i I(X_i; \mathbf{pa}_i)$$

Sum of local interactions

Define

$$I^G = \sum_i I(X_i; \mathbf{pa}_i)$$
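The two propositions can be sanity-checked on a toy case. The sketch below is an illustration under stated assumptions: the chain X1 -> X2 -> X3, the random joint P and all function names are hypothetical. It also relies on the standard fact that the distribution consistent with G that is closest to P in KL keeps P's own conditionals P(x_i | pa_i). It compares KL(P||G) with the real multi-information minus the sum of local interactions.

```python
import itertools
import math
import random


def marg(joint, idx):
    """Marginal of `joint` over the variable positions in `idx`."""
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_info(joint, xs, ys):
    """I(X;Y) in nats for the variable groups at positions xs and ys."""
    pxy, px, py = marg(joint, xs + ys), marg(joint, xs), marg(joint, ys)
    return sum(p * math.log(p / (px[k[:len(xs)]] * py[k[len(xs):]]))
               for k, p in pxy.items() if p > 0.0)


def multi_info(joint, n):
    """I(X1,...,Xn): KL between the joint and the product of its marginals."""
    ms = [marg(joint, (i,)) for i in range(n)]
    return sum(p * math.log(p / math.prod(ms[i][(v[i],)] for i in range(n)))
               for v, p in joint.items() if p > 0.0)


# Hypothetical random joint P over three binary variables (all entries positive).
rng = random.Random(0)
weights = {v: rng.random() + 0.1 for v in itertools.product((0, 1), repeat=3)}
z = sum(weights.values())
P = {v: w / z for v, w in weights.items()}

# Assumed DAG G: X1 -> X2 -> X3, so pa_2 = {X1} and pa_3 = {X2}.
I_G = mutual_info(P, (1,), (0,)) + mutual_info(P, (2,), (1,))

# Projection of P onto G: Q = P(x1) P(x2|x1) P(x3|x2) = P(x1,x2) P(x2,x3) / P(x2).
p12, p2, p23 = marg(P, (0, 1)), marg(P, (1,)), marg(P, (1, 2))
Q = {v: p12[v[:2]] * p23[v[1:]] / p2[(v[1],)] for v in P}
kl_to_G = sum(p * math.log(p / Q[v]) for v, p in P.items())

# Real multi-info minus the multi-info "as though P is consistent with G".
gap = multi_info(P, 3) - I_G

# Q itself is consistent with G, so its multi-info is the sum of local terms.
q_multi = multi_info(Q, 3)
q_local = mutual_info(Q, (1,), (0,)) + mutual_info(Q, (2,), (1,))
```

Here `kl_to_G` and `gap` agree up to rounding, and so do `q_multi` and `q_local`.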

Page 14: Multivariate Information Bottleneck

Optimizing Criteria

Two goals:

Lose info wrt G in

Attain conditional independencies in G out

Optimization objective:

$$L = I^{G_{in}} + \gamma\, KL(P \,\|\, G_{out})$$

$I^{G_{in}}$ - Force clusters to compress

$\gamma\, KL(P \,\|\, G_{out})$ - Minimize violations of conditional indep. in G out

Page 15: Multivariate Information Bottleneck

Additional Interpretation

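Using properties of $KL(P \,\|\, G)$ we can rewrite

$$L = I^{G_{in}} + \gamma\, KL(P \,\|\, G_{out}) = I^{G_{in}} + \gamma\,(I^{G_{in}} - I^{G_{out}})$$

Thus, we can instead minimize

$$L = I^{G_{in}} - \beta\, I^{G_{out}}$$

Minimize information in G in

Maximize information in G out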
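The rewrite on this slide can be spelled out step by step. The derivation below is a sketch, not the authors' own wording. It assumes that $\gamma$ and $\beta$ denote the tradeoff weights of the two forms of the objective, and it uses the two propositions from the earlier slides: $KL(P \,\|\, G) = I - I^G$, and $I = I^{G_{in}}$ because P is consistent with G in.

```latex
\begin{align*}
L &= I^{G_{in}} + \gamma\, KL(P \,\|\, G_{out}) \\
  &= I^{G_{in}} + \gamma \left( I(X_1,\ldots,X_n,T_1,\ldots,T_k) - I^{G_{out}} \right) \\
  &= I^{G_{in}} + \gamma \left( I^{G_{in}} - I^{G_{out}} \right)
     && \text{($P$ is consistent with $G_{in}$)} \\
  &= (1+\gamma) \left( I^{G_{in}} - \tfrac{\gamma}{1+\gamma}\, I^{G_{out}} \right)
\end{align*}
```

Dropping the positive constant factor $(1+\gamma)$ does not change the minimizer, so under these assumptions the two forms agree with $\beta = \gamma / (1+\gamma)$.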

Page 16: Multivariate Information Bottleneck

Minimization Objective - Example

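Symmetric Bottleneck

[Diagrams: G in and G out, each a graph over A, B, TA and TB]

$$L = I(T_A;A) + I(T_B;B) - \beta\, I(T_A;T_B)$$

Recall

$$P(T_A,T_B) = \sum_{A,B} P(T_A \mid A)\,P(T_B \mid B)\,P(A,B)$$

$P(T_A \mid A)$, $P(T_B \mid B)$ - Parameters we can control

$P(A,B)$ - Input (fixed)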
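As a rough illustration of how this objective depends on the two controllable conditionals, the sketch below evaluates it for a fixed input. It assumes a tradeoff weight `beta` on the I(TA;TB) term. The function names and the toy input are hypothetical and only serve the example.

```python
import math


def mutual_info(pxy):
    """I(X;Y) in nats for a joint given as a nested list pxy[x][y]."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(pxy) for j, p in enumerate(row) if p > 0.0)


def symmetric_objective(p_ab, p_ta_a, p_tb_b, beta):
    """L = I(TA;A) + I(TB;B) - beta * I(TA;TB).

    p_ab[a][b] is the fixed input; p_ta_a[a][ta] and p_tb_b[b][tb] are the
    soft-clustering parameters (each row sums to 1).
    """
    n_a, n_b = len(p_ab), len(p_ab[0])
    n_ta, n_tb = len(p_ta_a[0]), len(p_tb_b[0])
    p_a = [sum(row) for row in p_ab]
    p_b = [sum(col) for col in zip(*p_ab)]
    # Joints P(ta, a) and P(tb, b) follow from the parameters and the marginals.
    p_ta_joint = [[p_ta_a[a][t] * p_a[a] for a in range(n_a)] for t in range(n_ta)]
    p_tb_joint = [[p_tb_b[b][t] * p_b[b] for b in range(n_b)] for t in range(n_tb)]
    # P(ta, tb) = sum_{a,b} P(ta|a) P(tb|b) P(a,b)
    p_tt = [[sum(p_ta_a[a][ta] * p_tb_b[b][tb] * p_ab[a][b]
                 for a in range(n_a) for b in range(n_b))
             for tb in range(n_tb)] for ta in range(n_ta)]
    return mutual_info(p_ta_joint) + mutual_info(p_tb_joint) - beta * mutual_info(p_tt)


# Hypothetical input: A and B are perfectly correlated fair bits.
p_ab = [[0.5, 0.0], [0.0, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]   # clusters copy their variable
uniform = [[0.5, 0.5], [0.5, 0.5]]    # clusters ignore their variable

loss_copy = symmetric_objective(p_ab, identity, identity, beta=3.0)
loss_ignore = symmetric_objective(p_ab, uniform, uniform, beta=3.0)
```

With this input, copying costs log 2 of compression on each side but gains 3 log 2 from the coupling term, so `loss_copy` is below `loss_ignore`. With a small enough `beta` the ordering reverses.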

Page 17: Multivariate Information Bottleneck

Characterization of Solutions

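Thm: Minimal point if and only if

$$P(t_j \mid \mathbf{pa}_j) = \frac{P(t_j)}{Z(\mathbf{pa}_j, \beta)} \exp\{-\beta\, d(t_j, \mathbf{pa}_j)\}$$

$d(t_j, \mathbf{pa}_j)$ - measure of “distortion” between $t_j$ and $\mathbf{pa}_j$

For example, in symmetric bottleneck:

$$d(t_A, a) = KL(P(T_B \mid a) \,\|\, P(T_B \mid t_A))$$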

Page 18: Multivariate Information Bottleneck

Finding Solutions

How can we find solutions?

Asynchronous update:

Pick an index j

Update P(Tj|paj):

$$P(t_j \mid \mathbf{pa}_j) = \frac{P(t_j)}{Z(\mathbf{pa}_j, \beta)} \exp\{-\beta\, d(t_j, \mathbf{pa}_j)\}$$

Theorem: Asynchronous updates converge to (local) minima
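The update can be illustrated on the simplest case, the original bottleneck with a single cluster variable T of A. There is then only one conditional to update, so the asynchronous schedule reduces to repeating the same step. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the distortion d(t,a) = KL(P(B|a) || P(B|t)) of the original bottleneck and a tradeoff weight `beta`. The function names, the toy input and the iteration count are hypothetical.

```python
import math
import random


def mutual_info(pxy):
    """I(X;Y) in nats for a joint given as a nested list pxy[x][y]."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(pxy) for j, p in enumerate(row) if p > 0.0)


def ib_iterate(p_ab, n_t, beta, n_iter=50, seed=0):
    """Repeat P(t|a) <- P(t) exp(-beta * d(t,a)) / Z(a, beta).

    Assumes every entry of p_ab[a][b] is positive.
    """
    rng = random.Random(seed)
    n_a, n_b = len(p_ab), len(p_ab[0])
    p_a = [sum(row) for row in p_ab]
    p_b_a = [[p_ab[a][b] / p_a[a] for b in range(n_b)] for a in range(n_a)]
    # Random soft initialisation of P(t|a); every row sums to 1.
    p_t_a = []
    for _ in range(n_a):
        w = [rng.random() + 0.1 for _ in range(n_t)]
        p_t_a.append([x / sum(w) for x in w])
    for _ in range(n_iter):
        # Cluster marginals P(t) and predictions P(b|t) under the current P(t|a).
        p_t = [sum(p_t_a[a][t] * p_a[a] for a in range(n_a)) for t in range(n_t)]
        p_b_t = [[sum(p_t_a[a][t] * p_ab[a][b] for a in range(n_a)) / p_t[t]
                  for b in range(n_b)] for t in range(n_t)]
        new = []
        for a in range(n_a):
            # d(t,a) = KL(P(B|a) || P(B|t))
            d = [sum(p * math.log(p / p_b_t[t][b])
                     for b, p in enumerate(p_b_a[a]) if p > 0.0)
                 for t in range(n_t)]
            w = [p_t[t] * math.exp(-beta * d[t]) for t in range(n_t)]
            z = sum(w)  # normaliser Z(a, beta)
            new.append([x / z for x in w])
        p_t_a = new
    return p_t_a


# Hypothetical input: four values of A forming two groups with distinct P(B|a).
p_ab = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
p_t_a = ib_iterate(p_ab, n_t=2, beta=5.0)

# Joint P(t, b) = sum_a P(t|a) P(a,b), used to measure the preserved information.
p_tb = [[sum(p_t_a[a][t] * p_ab[a][b] for a in range(4)) for b in range(2)]
        for t in range(2)]
info_tb = mutual_info(p_tb)
info_ab = mutual_info(p_ab)
```

Values of A with the same P(B|a) receive the same row of P(t|a) after the first update, and the preserved information `info_tb` can never exceed `info_ab`.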

Page 19: Multivariate Information Bottleneck

Example - 20 newsgroups

20,000 messages from 20 newsgroups [Lang 1995]

A - newsgroup of the message

B - word in the message

P(a,b) - probability that choosing a random position in the corpus would select word b in a message in newsgroup a

We applied symmetric bottleneck on both attributes
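The definition of P(a,b) above amounts to counting word positions per newsgroup and normalizing by the total number of positions. A minimal sketch follows. Lowercasing and whitespace tokenization are assumptions, since the slides do not describe the preprocessing, and the function name and three-message corpus are hypothetical.

```python
from collections import Counter


def joint_from_messages(messages):
    """Estimate P(a, b) from (newsgroup, text) pairs by counting word positions."""
    counts = Counter()
    for newsgroup, text in messages:
        for word in text.lower().split():
            counts[(newsgroup, word)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


# Hypothetical three-message corpus, for illustration only.
messages = [
    ("rec.autos", "car engine car"),
    ("sci.space", "orbit launch"),
    ("rec.autos", "engine"),
]
p = joint_from_messages(messages)
```

Each entry of `p` is the fraction of all word positions in the corpus that hold word b inside a message from newsgroup a, so the entries sum to 1.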

Page 20: Multivariate Information Bottleneck

20 Newsgroup: Symmetric Bottleneck

[Figure: axes: newsgroup, word]

Page 21: Multivariate Information Bottleneck

20 Newsgroup: Symmetric Bottleneck

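Newsgroup cluster: alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*

Newsgroup cluster: comp.*, misc.forsale, sci.crypt, sci.electronics

Word cluster: car, turkish, game, team, jesus, gun, hockey, …

Word cluster: x, file, image, encryption, window, dos, mac, …

[Figure: P(TD,TW); axes: newsgroup, word]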

Page 22: Multivariate Information Bottleneck

20 Newsgroup: Symmetric Bottleneck

[Figure: P(TD,TW); axes: newsgroup, word]

Page 23: Multivariate Information Bottleneck

20 Newsgroup: Symmetric Bottleneck

[Figure: P(TD,TW); axes: newsgroup, word]

Page 24: Multivariate Information Bottleneck

20 Newsgroup: Symmetric Bottleneck

[Figure: P(TD,TW); axes: newsgroup, word]

Page 25: Multivariate Information Bottleneck

20 Newsgroup: Symmetric Bottleneck

Newsgroup cluster: alt.atheism, soc.religion.christian, talk.religion.misc

Word cluster: atheists, christianity, jesus, bible, sin, faith, …

[Figure: P(TD,TW); axes: newsgroup, word]

Page 26: Multivariate Information Bottleneck

Discussion

General framework: Defines a new family of optimization problems

… and solutions

Future directions:

Additional algorithms - agglomerative solutions

Relation to generative models

Parametric constraints in G out

Page 27: Multivariate Information Bottleneck

Example: Parallel Bottleneck

[Diagrams: G in and G out, each a graph over A, B, T1 and T2]

)];,();([);();( 212111 BTTITTIBTIATIL

))|()|((

)),|(),|((),(

aBB

BaBA

tTPaTPKL

TtBPTaBPKLatd

