Hierarchical Co-Clustering Based on Entropy Splitting
Wei Cheng, Xiang Zhang, Feng Pan, Wei Wang


The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Hierarchical Co-Clustering Based on Entropy Splitting
Wei Cheng (1), Xiang Zhang (2), Feng Pan (3), Wei Wang (4)
(1) University of North Carolina at Chapel Hill, (2) Case Western Reserve University, (3) Microsoft, (4) University of California, Los Angeles
Speaker: Wei Cheng
The 21st ACM Conference on Information and Knowledge Management (CIKM '12)

Idea of Co-Clustering
Co-clustering combines the row clustering and the column clustering of a co-occurrence matrix so that the two bootstrap each other: the rows X and the columns Y of the matrix are clustered simultaneously.

Hierarchical Co-Clustering Based on Entropy Splitting
View the (scaled) co-occurrence matrix as a joint probability distribution between the row and column random variables.
Objective: find a hierarchical co-clustering containing a given number of clusters while preserving as much mutual information between the row clusters and the column clusters as possible.
[Figure: a co-occurrence matrix with column clusters c1-c4 and row clusters r1-r4]

Co-occurrence Matrices
The co-occurrence matrix induces a joint probability distribution between the row-cluster and column-cluster random variables.

Pipeline (Recursive Splitting)
While the termination condition is not met (i.e., until the given number of clusters is reached):
- find the row or column cluster split that retains the maximal mutual information;
- update the cluster indicators.

How to find an optimal split at each step? Randomly split cluster S into S1 and S2, then iteratively re-assign elements; the procedure converges to a local optimum.
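To make the objective concrete, here is a minimal pure-Python sketch (an illustration, not the authors' code) of the mutual information between the row and column variables of a joint distribution. Merging rows can only lose mutual information, which is exactly what the recursive splitting tries to preserve:

```python
import math

def mutual_information(P):
    """Mutual information I(X; Y) in bits of a joint distribution
    given as a 2D list whose entries sum to 1."""
    row_marg = [sum(row) for row in P]
    col_marg = [sum(col) for col in zip(*P)]
    mi = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi

# A perfectly block-diagonal distribution carries 1 bit of mutual
# information between rows and columns...
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # -> 1.0
# ...and collapsing the two rows into one cluster destroys it.
print(mutual_information([[0.5, 0.5]]))  # -> 0.0
```

The splitting criterion on the following slides searches, at each step, for the split that keeps this quantity as large as possible.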
An Entropy-Based Splitting Algorithm
Input: cluster S.
Until convergence:
- for every element x in S, re-assign it to cluster S1 or S2 so as to minimize the entropy-based objective;
- update the cluster indicators and the probability values.

Example
S = {X1, X2, X3, X4} over columns Y1, ..., Y4. A naive method would have to try 7 splits; in general, the number of candidate splits is exponential in the size of S. Instead, randomly split, e.g., S1 = {X1}, S2 = {X2, X3, X4}; re-assigning X4 to S1 yields S1 = {X1, X4}, S2 = {X2, X3}.

Experiments
Data sets:
- Synthetic data
- 20 Newsgroups data (20 classes of documents)

Results - Synthetic Data
A synthetic ...x1000 matrix (a); add noise to (a) by flipping values with probability 0.3 to obtain (b); randomly permute the rows and columns of (b) to obtain (c). The clustering result recovers the hierarchical block structure.

Results - 20 Newsgroups Data
Comparison with the baselines NVBD, ICC, and HCC on the Multi5, Multi5-subject, Multi10, and Multi10-subject data sets, reporting micro-averaged precision (m-pre) and the number of clusters; HICC (merged) is also compared with the hierarchical baselines Single-Link, UPGMA, WPGMA, and Complete-Link.
[Table: m-pre and number of clusters per method and data set; numeric values not recoverable]
Micro-averaged precision = M/N, where M is the number of documents correctly clustered and N is the total number of documents.

Thank You! Questions?
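The entropy-based splitting step described above can be sketched in pure Python. This is an illustrative reconstruction, not the authors' implementation: the slide's minimization formula is not reproduced here, so the sketch re-assigns each row to the sub-cluster whose aggregate column distribution is closer in KL divergence (in the spirit of information-theoretic co-clustering); `split_cluster`, its parameters, and the toy counts are all hypothetical. The usage mirrors the slide's example: starting from S1 = {X1}, S2 = {X2, X3, X4}, re-assignment moves X4 into S1.

```python
import math
import random

def _kl(p, q):
    """KL divergence D(p || q) in bits between discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def _distribution(vectors, eps=1e-12):
    """Column-wise sum of count vectors, normalized (lightly smoothed
    so that the KL divergence never divides by zero)."""
    totals = [sum(col) for col in zip(*vectors)]
    denom = sum(totals) + eps * len(totals)
    return [(t + eps) / denom for t in totals]

def split_cluster(rows, init=None, max_iter=100, seed=0):
    """Split row count-vectors into two sub-clusters (hypothetical helper).
    Starts from `init` (or a random 0/1 assignment) and re-assigns each
    row to the closer sub-cluster until the assignment stops changing,
    i.e. until a local optimum is reached."""
    rng = random.Random(seed)
    assign = list(init) if init is not None else [rng.randint(0, 1) for _ in rows]
    if len(set(assign)) < 2:          # make sure both sides are non-empty
        assign[0] = 1 - assign[0]
    for _ in range(max_iter):
        sides = [[r for r, a in zip(rows, assign) if a == s] for s in (0, 1)]
        cents = [_distribution(side) for side in sides]
        new = []
        for r in rows:
            d = _distribution([r])
            new.append(0 if _kl(d, cents[0]) <= _kl(d, cents[1]) else 1)
        if new == assign or len(set(new)) < 2:   # converged (or degenerate)
            break
        assign = new
    s1 = [i for i, a in enumerate(assign) if a == 0]
    s2 = [i for i, a in enumerate(assign) if a == 1]
    return s1, s2

# Toy counts in which X1 and X4 share one column pattern and X2 and X3
# share the other; start from S1 = {X1}, S2 = {X2, X3, X4} as on the slide.
rows = [[9, 1, 0, 0], [0, 0, 9, 1], [0, 0, 8, 2], [8, 2, 0, 0]]
s1, s2 = split_cluster(rows, init=[0, 1, 1, 1])
print(sorted(s1), sorted(s2))  # -> [0, 3] [1, 2]  (S1 = {X1, X4}, S2 = {X2, X3})
```

As the slides note, this local search avoids trying all 2^(|S|-1) - 1 candidate splits, at the cost of converging only to a local optimum.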