Multivariate Information Bottleneck
Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby
Hebrew University
Data Analysis
Population
Statistics
[Figure: horizontal axis Age, ticks from 5 to 80]
Information Bottleneck
Can "age" be clustered into groups that are predictive of education level?
[Figure: education level (None, High school, Some college, Bachelor's degree, PhD) against age, 17 to 74]
Information Bottleneck
Can "age" be clustered into groups that are predictive of education level?
Can the education levels also be clustered so that they are predictive of age?
Our contribution
Generalize Information Bottleneck:
Generic principle for specifying systems of interacting clusters
Characterization of the solutions for these specifications
General purpose methods for constructing solutions
Information Bottleneck [Tishby, Pereira & Bialek 99]
Input: joint distribution P(A,B)
Soft clustering: P(T|A)
Result: P(T,B)
[Diagram: A, T, and B, labeled with I(T;A) and I(T;B)]
Minimize: $L = I(T;A) - \beta\, I(T;B)$
$I(T;A)$: compression (minimizing it loses information about A)
$I(T;B)$: preserved information about B
$\beta$: tradeoff between the two
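The tradeoff can be evaluated numerically for a small discrete example. The following is a minimal sketch, assuming the objective has the form I(T;A) - beta * I(T;B) and that distributions are stored as nested lists; the function names and the toy table are illustrative and do not come from the slides.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def ib_objective(p_ab, p_t_given_a, beta):
    """L = I(T;A) - beta * I(T;B) for a soft clustering P(T|A)."""
    n_a, n_b, n_t = len(p_ab), len(p_ab[0]), len(p_t_given_a[0])
    p_a = [sum(row) for row in p_ab]
    # P(a,t) = P(a) P(t|a)
    p_at = [[p_a[a] * p_t_given_a[a][t] for t in range(n_t)]
            for a in range(n_a)]
    # P(t,b) = sum_a P(t|a) P(a,b): T depends on B only through A
    p_tb = [[sum(p_t_given_a[a][t] * p_ab[a][b] for a in range(n_a))
             for b in range(n_b)] for t in range(n_t)]
    return mutual_information(p_at) - beta * mutual_information(p_tb)

# Toy input (hypothetical): four values of A, two values of B.
p_ab = [[0.25, 0.0], [0.0, 0.25], [0.25, 0.0], [0.0, 0.25]]
hard = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]   # two clusters
trivial = [[1.0, 0.0]] * 4                                 # one cluster
print(ib_objective(p_ab, hard, beta=2.0))
print(ib_objective(p_ab, trivial, beta=2.0))
```

In this toy case the two-cluster assignment keeps all of the information about B, so for beta above 1 it scores lower (better) than collapsing everything into one cluster.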
Information Bottleneck Reexamined
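G in: the actual distribution $P(A,B)\,P(T \mid A)$, where $P(A,B)$ is the input and $P(T \mid A)$ are the parameters
G out: the desired independencies, $\mathrm{Ind}(A;B \mid T)$
[Diagrams: G in and G out, each a graph over A, B, and T]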
Example: Symmetric Bottleneck
Simultaneous clustering of both A and B, with parameters P(T_A|A) and P(T_B|B)
[Diagrams: G in and G out, each a graph over A, B, T_A, and T_B]
So that T_A captures the information A contains about B
and T_B captures the information B contains about A
General Principle
Input: P(X_1,…,X_n)
G in - Compression: T_j clusters the values of its parents pa_j
G out - Desired (conditional) independencies
Goal: Find P(T_j|pa_j) in G in to “match” G out
[Diagram: variables X_1, X_2, …, X_n and cluster variables T_1, …, T_k]
Multi-information
Multi-information: the information that random variables jointly contain about each other
Generalizes mutual information
$I(X_1,\ldots,X_n) = E\left[\log \frac{P(X_1,\ldots,X_n)}{P(X_1)\cdots P(X_n)}\right]$
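For a discrete joint distribution the multi-information can be computed directly from its definition as an expected log-ratio. The sketch below assumes the joint is stored as a dictionary from value tuples to probabilities; the representation and helper names are illustrative choices.

```python
import math
from collections import defaultdict

def marginal(joint, idx):
    """Marginal over the variables at positions idx of {tuple: prob}."""
    out = defaultdict(float)
    for x, p in joint.items():
        out[tuple(x[i] for i in idx)] += p
    return out

def multi_information(joint):
    """E[log P(x_1..x_n) / (P(x_1) ... P(x_n))] in nats."""
    n = len(next(iter(joint)))
    singles = [marginal(joint, (i,)) for i in range(n)]
    total = 0.0
    for x, p in joint.items():
        if p > 0:
            prod = 1.0
            for i in range(n):
                prod *= singles[i][(x[i],)]
            total += p * math.log(p / prod)
    return total

# Three copies of one fair bit: every variable determines the others.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Two independent fair bits: nothing is shared.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
print(multi_information(copies), multi_information(independent))
```

With two variables the same function returns the ordinary mutual information, which is the sense in which multi-information generalizes it.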
Graph Projection
Let G be a DAG
Define:
$KL(P \,\|\, G) = \min_{Q \models G} KL(P \,\|\, Q)$
[Diagram: P within the set of all possible distributions, projected onto the set of distributions consistent with G]
Proposition:
$KL(P \,\|\, G) = I(X_1,\ldots,X_n) - I^G$
$I(X_1,\ldots,X_n)$: real multi-information
$I^G$: multi-information as though P is consistent with G
Multi-information & Bayesian Networks
Proposition:
If P is consistent with G, that is, $P(X_1,\ldots,X_n) = \prod_i P(X_i \mid \mathbf{pa}_i)$
Then $I(X_1,\ldots,X_n) = \sum_i I(X_i;\mathbf{pa}_i)$, a sum of local interactions
Define $I^G = \sum_i I(X_i;\mathbf{pa}_i)$
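The two propositions can be checked numerically on a small example. The sketch below assumes that the projection of P onto G is the product of P's own conditionals P(X_i | pa_i), a standard fact that the slides do not state. It then compares KL(P || G) with the difference between the multi-information and I^G. The DAG encoding (a dictionary from variable index to a tuple of parent indices) is a hypothetical convention.

```python
import math
from collections import defaultdict

def marginal(joint, idx):
    out = defaultdict(float)
    for x, p in joint.items():
        out[tuple(x[i] for i in idx)] += p
    return out

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def multi_information(joint):
    # Equivalent form: sum of single-variable entropies minus joint entropy.
    n = len(next(iter(joint)))
    return sum(entropy(marginal(joint, (i,))) for i in range(n)) - entropy(joint)

def graph_information(joint, dag):
    """I^G = sum_i I(X_i; pa_i); a node without parents contributes 0."""
    total = 0.0
    for i, pa in dag.items():
        if pa:
            total += (entropy(marginal(joint, (i,))) + entropy(marginal(joint, pa))
                      - entropy(marginal(joint, (i,) + pa)))
    return total

def kl_to_graph(joint, dag):
    """KL(P || Q) with Q(x) = prod_i P(x_i | pa_i), the assumed projection."""
    fam = {i: marginal(joint, (i,) + pa) for i, pa in dag.items()}
    par = {i: marginal(joint, pa) for i, pa in dag.items()}
    total = 0.0
    for x, p in joint.items():
        if p > 0:
            q = 1.0
            for i, pa in dag.items():
                pa_val = tuple(x[k] for k in pa)
                q *= fam[i][(x[i],) + pa_val] / par[i][pa_val]
            total += p * math.log(p / q)
    return total

# X0, X1 are independent fair bits and X2 = X0 xor X1.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
chain = {0: (), 1: (0,), 2: (1,)}     # X0 -> X1 -> X2, misses the xor
full = {0: (), 1: (0,), 2: (0, 1)}    # X2 has both parents
for dag in (chain, full):
    print(kl_to_graph(xor, dag),
          multi_information(xor) - graph_information(xor, dag))
```

For the chain the two printed numbers agree at log 2, and for the graph in which X2 has both parents they agree at 0, because P is consistent with that graph.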
Optimizing Criteria
Two goals:
Lose information with respect to G in
Attain conditional independencies in G out
Optimization objective: $L = I^{G_{in}} + \gamma\, KL(P \,\|\, G_{out})$
$I^{G_{in}}$: force clusters to compress
$KL(P \,\|\, G_{out})$: minimize violations of conditional independencies in G out
Additional Interpretation
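Using properties of $KL(P \,\|\, G)$ we can rewrite
$L = I^{G_{in}} + \gamma\, KL(P \,\|\, G_{out}) = I^{G_{in}} + \gamma\,(I^{G_{in}} - I^{G_{out}})$
Thus, we can instead minimize
$L = I^{G_{in}} - \beta\, I^{G_{out}}$
Minimize information in G in
Maximize information in G out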
Minimization Objective - Example
$L = I(T_A;A) + I(T_B;B) - \beta\, I(T_A;T_B)$
[Diagrams: G in and G out for the symmetric bottleneck, each a graph over A, B, T_A, and T_B]
Symmetric Bottleneck
Recall $P(T_A,T_B) = \sum_{A,B} P(T_A \mid A)\,P(T_B \mid B)\,P(A,B)$
$P(A,B)$: input (fixed)
$P(T_A \mid A)$, $P(T_B \mid B)$: parameters we can control
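As an illustration of how the fixed input and the two controllable conditionals combine, here is a minimal sketch that builds P(T_A,T_B) from the sum above and evaluates a symmetric objective of the assumed form I(T_A;A) + I(T_B;B) - beta * I(T_A;T_B). The nested-list layout, the function names, and the toy table are assumptions made for the example.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def cluster_joint(p_ab, q_a, q_b):
    """P(t_A,t_B) = sum_{a,b} P(t_A|a) P(t_B|b) P(a,b)."""
    n_ta, n_tb = len(q_a[0]), len(q_b[0])
    out = [[0.0] * n_tb for _ in range(n_ta)]
    for a, row in enumerate(p_ab):
        for b, p in enumerate(row):
            for s in range(n_ta):
                for t in range(n_tb):
                    out[s][t] += q_a[a][s] * q_b[b][t] * p
    return out

def symmetric_objective(p_ab, q_a, q_b, beta):
    p_a = [sum(row) for row in p_ab]
    p_b = [sum(col) for col in zip(*p_ab)]
    p_a_ta = [[p_a[a] * q for q in q_a[a]] for a in range(len(p_a))]
    p_b_tb = [[p_b[b] * q for q in q_b[b]] for b in range(len(p_b))]
    return (mutual_information(p_a_ta) + mutual_information(p_b_tb)
            - beta * mutual_information(cluster_joint(p_ab, q_a, q_b)))

# Toy input (hypothetical): a block-diagonal 4 x 4 joint with two blocks.
p_ab = [[0.125 if a // 2 == b // 2 else 0.0 for b in range(4)]
        for a in range(4)]
# Hard clusterings that follow the block structure on both sides.
q_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
q_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(cluster_joint(p_ab, q_a, q_b))
print(symmetric_objective(p_ab, q_a, q_b, beta=3.0))
```

Only `q_a` and `q_b` are free in this computation; `p_ab` stays fixed throughout, matching the split between input and controllable parameters.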
Characterization of Solutions
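Thm: Minimal point if and only if
$P(t_j \mid \mathbf{pa}_j) = \frac{P(t_j)}{Z(\mathbf{pa}_j,\beta)} \exp\{-\beta\, d(t_j,\mathbf{pa}_j)\}$
$d(t_j,\mathbf{pa}_j)$ - measure of “distortion” between $t_j$ and $\mathbf{pa}_j$
For example, in the symmetric bottleneck: $d(t_A,a) = KL(P(T_B \mid a) \,\|\, P(T_B \mid t_A))$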
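The symmetric-bottleneck distortion can be tabulated for every pair (t_A, a). The sketch below assumes discrete variables stored as nested lists and assumes every cluster t_A has positive probability; the function names are illustrative. It uses P(t_B|a) = sum_b P(t_B|b) P(b|a), which follows from T_B depending on A only through B.

```python
import math

def kl(p, q):
    """KL(p || q) in nats; infinite if q misses mass that p has."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def symmetric_distortion(p_ab, q_a, q_b):
    """d[t_A][a] = KL(P(T_B | a) || P(T_B | t_A))."""
    n_a, n_b = len(p_ab), len(p_ab[0])
    n_ta, n_tb = len(q_a[0]), len(q_b[0])
    p_a = [sum(row) for row in p_ab]
    # P(t_B | a) = sum_b P(t_B | b) P(b | a)
    p_tb_a = [[sum(q_b[b][t] * p_ab[a][b] for b in range(n_b)) / p_a[a]
               for t in range(n_tb)] for a in range(n_a)]
    # P(t_A, t_B) = sum_a P(t_A | a) P(a) P(t_B | a)
    p_ta_tb = [[sum(q_a[a][s] * p_a[a] * p_tb_a[a][t] for a in range(n_a))
                for t in range(n_tb)] for s in range(n_ta)]
    # P(t_B | t_A); assumes each cluster t_A has positive mass
    p_tb_ta = [[v / sum(row) for v in row] for row in p_ta_tb]
    return [[kl(p_tb_a[a], p_tb_ta[s]) for a in range(n_a)]
            for s in range(n_ta)]

# Toy input (hypothetical): block-diagonal joint, block-aligned clusters.
p_ab = [[0.125 if a // 2 == b // 2 else 0.0 for b in range(4)]
        for a in range(4)]
q_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
q_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
d = symmetric_distortion(p_ab, q_a, q_b)
print(d)
```

In this toy case each value a has zero distortion to its own cluster and infinite distortion to the other, so an update of the exponential form would keep the assignment unchanged.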
Finding Solutions
How can we find solutions?
Asynchronous update:
Pick an index j
Update P(T_j|pa_j) using
$P(t_j \mid \mathbf{pa}_j) = \frac{P(t_j)}{Z(\mathbf{pa}_j,\beta)} \exp\{-\beta\, d(t_j,\mathbf{pa}_j)\}$
Theorem: Asynchronous updates converge to (local) minima
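To make the update concrete, the sketch below iterates it for the simplest case only: the original bottleneck with a single cluster variable T whose parent is A. Its distortion is taken to be KL(P(B|a) || P(B|t)), the standard choice for that case, which is an assumption here because the slides give the distortion only for the symmetric example. This is not the general multivariate procedure. With one cluster variable every step updates the same conditional. All names and the toy table are illustrative, and the code assumes strictly positive probabilities.

```python
import math

def kl(p, q):
    # Assumes q > 0 wherever p > 0 (true for the positive toy table below).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ib_update(p_ab, q, beta):
    """One update of q[a][t] = P(t|a) toward P(t) exp(-beta d(t,a)) / Z(a)."""
    n_a, n_b, n_t = len(p_ab), len(p_ab[0]), len(q[0])
    p_a = [sum(row) for row in p_ab]
    # P(t) = sum_a P(a) P(t|a)
    p_t = [sum(p_a[a] * q[a][t] for a in range(n_a)) for t in range(n_t)]
    # P(b|t) = sum_a P(a,b) P(t|a) / P(t)
    p_b_t = [[sum(p_ab[a][b] * q[a][t] for a in range(n_a)) / p_t[t]
              for b in range(n_b)] for t in range(n_t)]
    new_q = []
    for a in range(n_a):
        p_b_a = [p_ab[a][b] / p_a[a] for b in range(n_b)]
        w = [p_t[t] * math.exp(-beta * kl(p_b_a, p_b_t[t]))
             for t in range(n_t)]
        z = sum(w)                      # normalizer Z(a, beta)
        new_q.append([wi / z for wi in w])
    return new_q

# Toy input (hypothetical): rows 0,1 share one P(B|a), rows 2,3 another.
p_ab = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
q = [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6], [0.3, 0.7]]
for _ in range(50):
    q = ib_update(p_ab, q, beta=5.0)
print(q)
```

Because the update depends on a only through P(B|a), rows of `p_ab` with identical conditionals receive identical cluster memberships after a single step.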
Example - 20 newsgroups
20,000 messages from 20 newsgroups [Lang 1995]
A - newsgroup of the message
B - word in the message
P(a,b) - probability that choosing a random position in the corpus would select word b in a message in newsgroup a
We applied the symmetric bottleneck to both attributes
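The definition of P(a,b) amounts to counting word positions. The sketch below assumes whitespace tokenization and lowercasing, which the slides do not specify, and uses two made-up messages rather than the real corpus.

```python
from collections import Counter

def joint_from_corpus(messages):
    """P(a,b): chance that a random word position holds word b in newsgroup a.

    messages is a list of (newsgroup, text) pairs; tokenization by
    whitespace is an assumption made for this sketch.
    """
    counts = Counter()
    for newsgroup, text in messages:
        for word in text.lower().split():
            counts[(newsgroup, word)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

# Made-up toy messages, not taken from the 20 newsgroups data.
messages = [("rec.autos", "car engine car"), ("sci.space", "orbit launch")]
p = joint_from_corpus(messages)
print(p[("rec.autos", "car")])
```

The resulting dictionary sums to one over all (newsgroup, word) pairs and can serve as the fixed input P(A,B).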
20 Newsgroup: Symmetric Bottleneck
[Figure: axes Newsgroup and word]
20 Newsgroup: Symmetric Bottleneck
Newsgroup clusters:
alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*
comp.*, misc.forsale, sci.crypt, sci.electronics
Word clusters:
car, turkish, game, team, jesus, gun, hockey, …
x, file, image, encryption, window, dos, mac, …
[Figure: P(T_D,T_W), axes Newsgroup and word]
20 Newsgroup: Symmetric Bottleneck
Word cluster: atheists, christianity, jesus, bible, sin, faith, …
Newsgroup cluster: alt.atheism, soc.religion.christian, talk.religion.misc
[Figure: P(T_D,T_W), axes Newsgroup and word]
Discussion
General framework: Defines a new family of optimization problems
… and solutions
Future directions:
Additional algorithms - agglomerative solutions
Relation to generative models
Parametric constraints in G out
Example: Parallel Bottleneck
[Diagrams: G in and G out, each a graph over A, B, T_1, and T_2]
)];,();([);();( 212111 BTTITTIBTIATIL
$d(t_A,a) = KL(P(B \mid a,T_B) \,\|\, P(B \mid t_A,T_B)) + KL(P(T_B \mid a) \,\|\, P(T_B \mid t_A))$