Multivariate Information Bottleneck
Nir Friedman Ori Mosenzon
Noam Slonim Naftali Tishby
Hebrew University
Data Analysis
[Figure: population statistics by age; horizontal axis: Age, 5 to 80]
Information Bottleneck
Can we cluster “age” into clusters that are predictive of education level?
[Figure: education level (None, High school, Some college, Bachelor's degree, PhD) by age, 17 to 74]
Can we also cluster the education attained so that it is predictive of age?
Our contribution
Generalize Information Bottleneck:
Generic principle for specifying systems of interacting clusters
Characterization of the solution for these specs
General purpose methods for constructing solutions
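Information Bottleneck [Tishby, Pereira & Bialek 99]
Input: the joint distribution P(A,B)
Soft clustering P(T|A) of A into T, which induces P(T,B)
[Diagram: A and B with P(A,B); T and B with P(T,B); I(T;A) links T to A and I(T;B) links T to B]
Minimize: $I(T;A) - \beta\, I(T;B)$
I(T;A): compression, information lost about A
I(T;B): preserved information about B
β: tradeoff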
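As a rough numerical illustration of this objective, the sketch below evaluates I(T;A) - β I(T;B) for a given joint P(A,B) and soft clustering P(T|A). The function names, the toy distribution, and the value of β are assumptions made for illustration only; none of them come from the talk.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0  # treat 0 * log 0 as 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def ib_objective(p_ab, p_t_given_a, beta):
    """Return (L, I(T;A), I(T;B)) with L = I(T;A) - beta * I(T;B).

    p_ab has shape (|A|, |B|); p_t_given_a has shape (|A|, |T|), rows sum to 1.
    """
    p_a = p_ab.sum(axis=1)
    p_at = p_a[:, None] * p_t_given_a   # P(a,t) = P(a) P(t|a)
    p_tb = p_t_given_a.T @ p_ab         # P(t,b) = sum_a P(t|a) P(a,b)
    i_ta = mutual_information(p_at)
    i_tb = mutual_information(p_tb)
    return i_ta - beta * i_tb, i_ta, i_tb

# Hypothetical toy joint: rows 0-1 and rows 2-3 have identical P(B|a).
p_ab = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])
two_clusters = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
one_cluster = np.ones((4, 1))

L_two, i_ta_two, i_tb_two = ib_objective(p_ab, two_clusters, beta=5.0)
L_one, i_ta_one, i_tb_one = ib_objective(p_ab, one_cluster, beta=5.0)
```

In this toy case the two-cluster assignment merges values of A with identical P(B|a), so it keeps all of I(A;B) while compressing A to one bit; the single-cluster assignment compresses fully but preserves nothing about B.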
Information Bottleneck Reexamined
G in (actual distribution): P(A,B) P(T|A), where P(A,B) is the input and P(T|A) are the parameters
G out (desired independencies): Ind(A;B|T)
[Diagrams: graphs over A, B, and T for G in and G out]
Example: Symmetric Bottleneck
Simultaneous clustering of both A and B: P(TA|A) and P(TB|B)
[Diagrams: G in and G out over A, B, TA, TB]
So that TA captures the information A contains about B
and TB captures the information B contains about A
General Principle
Input: P(X1,…,Xn)
G in: compression. Tj clusters the values of paj
G out: desired (conditional) independencies
Goal: Find P(Tj|paj) in G in to “match” G out
[Diagram: variables X1, X2, …, Xn and cluster variables T1, …, Tk]
Multi-information
The information that random variables jointly contain about each other
Generalizes mutual information
$$I(X_1,\ldots,X_n) = E_P\left[\log \frac{P(X_1,\ldots,X_n)}{P(X_1)\cdots P(X_n)}\right]$$
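The definition of multi-information can be checked numerically on small discrete distributions. The following is a minimal sketch, assuming the joint distribution is stored as an n-dimensional NumPy array with one axis per variable; the function name and the toy distributions are illustrative assumptions.

```python
import numpy as np

def multi_information(p):
    """Multi-information in nats: E_P[log P(x1..xn) / (P(x1) ... P(xn))]."""
    p = np.asarray(p, dtype=float)
    product_of_marginals = np.ones_like(p)
    for i in range(p.ndim):
        other_axes = tuple(j for j in range(p.ndim) if j != i)
        # Marginal of variable i, kept broadcastable against p.
        product_of_marginals = product_of_marginals * p.sum(axis=other_axes, keepdims=True)
    mask = p > 0  # treat 0 * log 0 as 0
    return float(np.sum(p[mask] * np.log(p[mask] / product_of_marginals[mask])))

# Three independent variables: multi-information is zero.
p_indep = np.einsum('i,j,k->ijk',
                    np.array([0.3, 0.7]),
                    np.array([0.5, 0.5]),
                    np.array([0.2, 0.8]))

# Three identical fair bits: 3 H(X) - H(X) = 2 log 2.
p_copy = np.zeros((2, 2, 2))
p_copy[0, 0, 0] = p_copy[1, 1, 1] = 0.5

mi_indep = multi_information(p_indep)
mi_copy = multi_information(p_copy)
```

For two variables the same function reduces to ordinary mutual information, which is the sense in which the quantity generalizes it.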
Graph Projection
Let G be a DAG
Define:
$$KL(P \,\|\, G) = \min_{Q \models G} KL(P \,\|\, Q)$$
[Diagram: P, within the set of all possible distributions, projected onto the set of distributions consistent with G]
Proposition:
$$KL(P \,\|\, G) = I(X_1,\ldots,X_n) - I^G$$
I(X1,…,Xn): real multi-information
I^G: multi-information as though P is consistent with G
Multi-information & Bayesian Networks
Proposition:
If P is consistent with G, that is,
$$P(X_1,\ldots,X_n) = \prod_i P(X_i \mid \mathrm{pa}_i)$$
Then
$$I(X_1,\ldots,X_n) = \sum_i I(X_i; \mathrm{pa}_i)$$
Define the sum of local interactions:
$$I^G = \sum_i I(X_i; \mathrm{pa}_i)$$
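Both propositions can be sanity-checked numerically. The sketch below assumes a chain X1 → X2 → X3 as the graph G and a random, strictly positive joint P; these choices, and all names in the code, are illustrative assumptions. It builds Q from P's own conditionals (the standard projection of P onto G) and compares KL(P‖Q) with the difference between the real multi-information and the sum of local interactions.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats; assumes a strictly positive joint."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    return float(np.sum(pxy * np.log(pxy / (px * py))))

def multi_information(p):
    """Multi-information in nats; assumes a strictly positive joint."""
    prod = np.ones_like(p)
    for i in range(p.ndim):
        others = tuple(j for j in range(p.ndim) if j != i)
        prod = prod * p.sum(axis=others, keepdims=True)
    return float(np.sum(p * np.log(p / prod)))

rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))   # arbitrary joint over (X1, X2, X3)
p /= p.sum()

# Projection onto the chain X1 -> X2 -> X3:
# Q(x1,x2,x3) = P(x1,x2) P(x3|x2) = P(x1,x2) P(x2,x3) / P(x2)
p12 = p.sum(axis=2)
p23 = p.sum(axis=0)
p2 = p.sum(axis=(0, 2))
q = p12[:, :, None] * p23[None, :, :] / p2[None, :, None]

kl_p_q = float(np.sum(p * np.log(p / q)))

# Sum of local interactions for the chain; X1 has no parents, so its term is 0.
i_g = mutual_information(p12) + mutual_information(p23)

gap = multi_information(p) - i_g   # should equal KL(P || G)
i_q = multi_information(q)         # Q is consistent with G, so this should equal i_g
```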
Optimizing Criteria
Two goals:
Lose information with respect to G in
Attain the conditional independencies in G out
Optimization objective:
$$L = I^{in} + \gamma\, KL(P \,\|\, G_{out})$$
I^in: force clusters to compress
KL(P‖G out): minimize violations of conditional independencies in G out
Additional Interpretation
Using properties of KL(P‖G) we can rewrite
$$L = I^{in} + \gamma\, KL(P \,\|\, G_{out}) = I^{in} + \gamma\,(I^{in} - I^{out})$$
Thus, we can instead minimize
$$L = I^{in} - \beta\, I^{out}$$
Minimize information in G in
Maximize information in G out
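The step from the KL form to the difference form can be spelled out. This is a sketch that assumes the KL term carries a positive trade-off weight, written γ here, and that P is consistent with G in so that its multi-information equals I^in; the symbol γ and the resulting expression for β are assumed notation.

```latex
\begin{align*}
L &= I^{in} + \gamma\, KL(P \,\|\, G_{out}) \\
  &= I^{in} + \gamma\,\bigl(I^{in} - I^{out}\bigr) \\
  &= (1+\gamma)\, I^{in} - \gamma\, I^{out} \\
  &= (1+\gamma)\left[\, I^{in} - \frac{\gamma}{1+\gamma}\, I^{out} \right]
\end{align*}
```

Since 1 + γ is a positive constant, minimizing L is the same as minimizing I^in - β I^out with β = γ/(1+γ).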
Minimization Objective - Example
$$L = I(T_A;A) + I(T_B;B) - \beta\, I(T_A;T_B)$$
[Diagrams: G in and G out for the symmetric bottleneck over A, B, TA, TB]
Symmetric Bottleneck
Recall
$$P(T_A,T_B) = \sum_{A,B} P(T_A \mid A)\, P(T_B \mid B)\, P(A,B)$$
P(A,B): input (fixed)
P(TA|A), P(TB|B): parameters we can control
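To make the roles of the fixed input and the controllable parameters concrete, the sketch below computes P(TA,TB) from P(A,B), P(TA|A), and P(TB|B), and then evaluates a symmetric bottleneck objective of the form I(TA;A) + I(TB;B) - β I(TA;TB). The trade-off weight β, the function names, and the toy numbers are assumptions for illustration.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0  # treat 0 * log 0 as 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def symmetric_objective(p_ab, p_ta_given_a, p_tb_given_b, beta):
    """Return (L, P(TA,TB)) for L = I(TA;A) + I(TB;B) - beta * I(TA;TB)."""
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    # P(ta,tb) = sum_{a,b} P(ta|a) P(tb|b) P(a,b)
    p_ta_tb = p_ta_given_a.T @ p_ab @ p_tb_given_b
    i_in = (mutual_information(p_a[:, None] * p_ta_given_a)
            + mutual_information(p_b[:, None] * p_tb_given_b))
    i_out = mutual_information(p_ta_tb)
    return i_in - beta * i_out, p_ta_tb

# Hypothetical block-structured input P(A,B).
p_ab = np.array([[0.100, 0.100, 0.025, 0.025],
                 [0.100, 0.100, 0.025, 0.025],
                 [0.025, 0.025, 0.100, 0.100],
                 [0.025, 0.025, 0.100, 0.100]])
hard = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

L_value, p_ta_tb = symmetric_objective(p_ab, hard, hard, beta=10.0)
```

With hard assignments that follow the block structure, P(TA,TB) is just the 2-by-2 table of block masses, and I(TA;TB) equals I(A;B) for this toy input.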
Characterization of Solutions
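Thm: Minimal point if and only if
$$P(t_j \mid \mathrm{pa}_j) = \frac{P(t_j)}{Z(\mathrm{pa}_j, \beta)} \exp\{-\beta\, d(t_j, \mathrm{pa}_j)\}$$
d(tj,paj): measure of “distortion” between tj and paj
For example, in the symmetric bottleneck:
$$d(t_A, a) = KL\bigl(P(T_B \mid a) \,\|\, P(T_B \mid t_A)\bigr)$$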
Finding Solutions
How can we find solutions?
Asynchronous update:
Pick an index j
Update P(Tj|paj):
$$P(t_j \mid \mathrm{pa}_j) = \frac{P(t_j)}{Z(\mathrm{pa}_j, \beta)} \exp\{-\beta\, d(t_j, \mathrm{pa}_j)\}$$
Theorem: Asynchronous updates converge to (local) minima
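The asynchronous scheme can be sketched for the symmetric bottleneck, using the distortion d(tA, a) = KL(P(TB|a) ‖ P(TB|tA)) and its mirror image for TB. This is a minimal illustration under stated assumptions, not the authors' implementation: the random initialization, the fixed β, the fixed number of sweeps, the numerical floor `eps`, and all function names are choices made here.

```python
import numpy as np

def kl_rows(p, q, eps=1e-300):
    """KL(p[i] || q[k]) for every pair of rows; returns an array indexed [i, k]."""
    log_p = np.log(np.maximum(p, eps))
    log_q = np.log(np.maximum(q, eps))
    return np.sum(p * log_p, axis=1)[:, None] - p @ log_q.T

def update_side(p_xy, p_tx_given_x, p_ty_given_y, beta, eps=1e-300):
    """One update of P(T_X|X), holding P(T_Y|Y) fixed."""
    p_x = p_xy.sum(axis=1)
    p_ty_given_x = (p_xy / p_x[:, None]) @ p_ty_given_y            # P(t_y|x)
    p_tx = np.maximum(p_x @ p_tx_given_x, eps)                     # P(t_x)
    p_x_given_tx = (p_tx_given_x * p_x[:, None]).T / p_tx[:, None]
    p_ty_given_tx = p_x_given_tx @ p_ty_given_x                    # P(t_y|t_x)
    d = kl_rows(p_ty_given_x, p_ty_given_tx)                       # d[x, t_x]
    logits = np.log(p_tx)[None, :] - beta * d
    logits -= logits.max(axis=1, keepdims=True)                    # numerical stability
    new = np.exp(logits)
    return new / new.sum(axis=1, keepdims=True)                    # division plays the role of Z

def symmetric_bottleneck(p_ab, n_ta, n_tb, beta, n_sweeps=50, seed=0):
    """Alternate updates of P(TA|A) and P(TB|B) for a fixed number of sweeps."""
    rng = np.random.default_rng(seed)
    p_ta = rng.dirichlet(np.ones(n_ta), size=p_ab.shape[0])
    p_tb = rng.dirichlet(np.ones(n_tb), size=p_ab.shape[1])
    for _ in range(n_sweeps):
        p_ta = update_side(p_ab, p_ta, p_tb, beta)      # index j = TA
        p_tb = update_side(p_ab.T, p_tb, p_ta, beta)    # index j = TB
    return p_ta, p_tb

# Hypothetical block-structured input P(A,B).
p_ab = np.array([[0.100, 0.100, 0.025, 0.025],
                 [0.100, 0.100, 0.025, 0.025],
                 [0.025, 0.025, 0.100, 0.100],
                 [0.025, 0.025, 0.100, 0.100]])
p_ta, p_tb = symmetric_bottleneck(p_ab, n_ta=2, n_tb=2, beta=10.0)
```

Because the procedure reaches only local minima, a practical run would likely use several random restarts and keep the solution with the lowest objective value; that choice is also an assumption rather than something stated in the talk.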
Example - 20 newsgroup
20,000 messages from 20 newsgroups [Lang 1995]
A - newsgroup of the message
B - word in the message
P(a,b) - probability that choosing a random position in the corpus would select word b in a message in newsgroup a
We applied the symmetric bottleneck to both attributes
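The definition of P(a,b) amounts to normalizing a newsgroup-by-word count table by the total number of word positions in the corpus. A minimal sketch, assuming such a count table is already available; the counts below are made up and the function name is hypothetical.

```python
import numpy as np

def joint_from_counts(counts):
    """P(a,b) = (occurrences of word b in newsgroup a) / (total word positions)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("count table is empty")
    return counts / total

# Hypothetical counts: 2 newsgroups (rows) by 3 words (columns).
counts = np.array([[30, 10, 0],
                   [5, 5, 50]])
p_ab = joint_from_counts(counts)
```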
20 Newsgroup: Symmetric Bottleneck
[Figure: newsgroup by word matrix; axes: Newsgroup, word]
Newsgroup clusters:
alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*
comp.*, misc.forsale, sci.crypt, sci.electronics
Word clusters:
car, turkish, game, team, jesus, gun, hockey, …
x, file, image, encryption, window, dos, mac, …
[Figure: P(TD,TW); axes: Newsgroup, word]
Newsgroup cluster: alt.atheism, soc.religion.christian, talk.religion.misc
Word cluster: atheists, christianity, jesus, bible, sin, faith, …
[Figure: P(TD,TW); axes: Newsgroup, word]
Discussion
General framework:
Defines a new family of optimization problems … and solutions
Future directions:
Additional algorithms, such as agglomerative solutions
Relation to generative models
Parametric constraints in G out
Example: Parallel Bottleneck
[Diagrams: G in and G out over A, B, T1, T2]
)];,();([);();( 212111 BTTITTIBTIATIL
$$d(t_A, a) = KL\bigl(P(B \mid a, T_B) \,\|\, P(B \mid t_A, T_B)\bigr) + KL\bigl(P(T_B \mid a) \,\|\, P(T_B \mid t_A)\bigr)$$