Mining 3-Clusters in Vertically Partitioned Data

Faris Alqadah & Raj Bhatnagar

University of Cincinnati

Outline

• Introduction to 3-clustering in binary (categorical), vertically partitioned data

• Proposed cluster quality measure

• 3-Clu: algorithm for enumerating 3-clusters from two datasets

Introduction

Traditional clustering

Bi-Clustering

3-Clustering

Why 3-clusters?

• Find correspondence between bi-clusters of two different datasets

• Sharpen local clusters with outside knowledge

• Alternative? “Join datasets then search”

– Does not capture underlying interactions

– Inefficient

– Not always possible

Why 3-clusters?

<A,1234>

<AB,134>

<AWB,13>

<AY,12>

<AX,24>

<AWBCYZ,1>

<ABDX,4>

Formal Definitions

Bi-cluster in Di

3-Cluster across D1 and D2

Pattern in Di

Defining 3-clusters

• D1 is the “learner”

• Maximal rectangle of 1's under suitable permutation in learner

• Best correspondence to rectangle of 1's in D2
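The "maximal rectangle of 1's" idea can be illustrated on a single binary dataset. The following is a minimal sketch, not the authors' code: the toy dataset `D1` and the helper names `common_attributes`, `supporting_objects`, and `close` are hypothetical. Starting from any set of objects, it collects the attributes they all share, then grows the object set to everything that has those attributes, which gives a rectangle that cannot be extended in either direction.

```python
# Hypothetical toy dataset: object id -> set of attributes whose value is 1.
D1 = {1: {"A", "B", "C"}, 2: {"A"}, 3: {"A", "B"}, 4: {"A", "B", "D"}}


def common_attributes(data, objects):
    """Attributes shared by every object in `objects`."""
    rows = [data[o] for o in objects]
    # With no objects, every attribute is (vacuously) shared.
    return set.intersection(*rows) if rows else set().union(*data.values())


def supporting_objects(data, attributes):
    """Objects whose row contains every attribute in `attributes`."""
    return {o for o, row in data.items() if attributes <= row}


def close(data, objects):
    """Smallest maximal rectangle of 1's that contains `objects`."""
    attrs = common_attributes(data, objects)
    return supporting_objects(data, attrs), attrs


objs, attrs = close(D1, {3})
print(sorted(objs), sorted(attrs))
```

In this toy data, object 3 has attributes A and B, and objects 1 and 4 also have both, so the rectangle grows to objects {1, 3, 4} over attributes {A, B}.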

Cluster Quality Measure

• Intuition: Maximize number of 1's while also maximizing number of items and objects

• Trade-off between objects and items

– More items... fewer objects

– More objects... fewer items

Quality Measure

– Consider bi-clusters in learner alone

• Which is preferable?

• User decides

Quality Measure

• Quality measure:

– Monotonic in both width and height

• Reflects intuition

– Balances width and height according to user-defined parameter

• Introduce β

• Amount of width (attributes) willing to trade for a single unit of height (objects)

Quality Measure

Extending to 3-clusters

• Utilize same intuition

• Width of 3-cluster is sum of individual widths

Selecting β

• Larger values yield 3-clusters that are “wide” and “short” in both D1 and D2

– Cluster key websites popular with a large number of Democrats and Republicans

• Smaller values produce 3-clusters that are “narrow” and “long”

– Discover long list of websites utilized by a few select Democrats and Republicans

3-Clu: Our Algorithm

• Search for 3-clusters similar to search for closed itemsets

• How to formulate the search space?

– Assumption that objects outnumber attributes may not hold

– Several possible orderings of the search space

Algorithm

• Define search space with primacy to objects

• Only need to maintain one search tree

• Mimic closed itemset algorithm with simultaneous pruning of search space

• Prune with quality measure
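A search space that gives primacy to objects can be sketched as a brute-force enumeration over object sets. This is an illustration only, not 3-Clu itself: it has no search tree and no quality-based pruning, and it visits every subset of objects. The toy datasets `D1` and `D2`, the helper names, and the reading of "best correspondence" as "the D2 attributes shared by all objects of the learner's rectangle" are all assumptions.

```python
from itertools import combinations

# Hypothetical vertically partitioned toy data: same objects, disjoint attributes.
D1 = {1: {"A", "B", "C"}, 2: {"A"}, 3: {"A", "B"}, 4: {"A", "B", "D"}}  # learner
D2 = {1: {"W", "Y", "Z"}, 2: {"X", "Y"}, 3: {"W"}, 4: {"X"}}


def shared(data, objects):
    """Attributes that every object in the (non-empty) `objects` has."""
    return set.intersection(*(data[o] for o in objects))


def support(data, attrs):
    """Objects whose row contains every attribute in `attrs`."""
    return frozenset(o for o, row in data.items() if attrs <= row)


def enumerate_3clusters(learner, other):
    """Map each closed object set of the learner to (learner attrs, other attrs)."""
    found = {}
    ids = sorted(learner)
    for k in range(1, len(ids) + 1):
        for combo in combinations(ids, k):
            a1 = shared(learner, combo)
            objs = support(learner, a1)  # maximal rectangle in the learner
            if objs not in found:
                # Assumed correspondence: widest rectangle of 1's in the
                # other dataset over exactly these objects.
                found[objs] = (frozenset(a1), frozenset(shared(other, objs)))
    return found


for objs, (a1, a2) in enumerate_3clusters(D1, D2).items():
    print(sorted(objs), sorted(a1), sorted(a2))
```

Because both datasets are indexed by the same objects, a single pass over object sets yields the attribute sets on both sides, which is one way to see why only one search tree is needed.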

Algorithm

• Cluster quality measure is neither monotone nor anti-monotone in the search space

• Pruning is still possible

Is C2 of higher quality?

Algorithm

• Pruning rule is very optimistic

• Can be adjusted with some a-priori information

• Example β = 0.5

• x = 2.73... can't prune

– This assumes w will stay at 15 for 3 more levels

Algorithm Analysis

• Computational cost: O(|O|*i*N)

– Only as expensive as enumerating bi-clusters in a single dataset

• Communication cost: O(N)

• Correctness guaranteed by FCA theory

Experimental Results

• Performance tests

• Randomly split benchmark datasets CHESS and CONNECT

• Genetic dataset: Genes, GO terms, Phenotypes

• Compared to LCM and CHARM

[Figure panels: Chess, Connect, GO-Pheno]

Experimental Results

• Test validity of 3-clusters

• Randomly partitioned Mushrooms dataset by attributes
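Partitioning a dataset by attributes can be sketched as follows. The slides do not say how the split was performed, so the even split, the fixed seed, the toy data, and the function name `vertical_split` are assumptions made for illustration.

```python
import random


def vertical_split(data, seed=0):
    """Randomly assign each attribute to one of two datasets over the same objects."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is repeatable
    attributes = sorted(set().union(*data.values()))
    rng.shuffle(attributes)
    half = len(attributes) // 2
    left, right = set(attributes[:half]), set(attributes[half:])
    # Every object appears in both halves; only its attributes are divided.
    d1 = {o: row & left for o, row in data.items()}
    d2 = {o: row & right for o, row in data.items()}
    return d1, d2


# Hypothetical toy data: object id -> set of attributes whose value is 1.
data = {1: {"a", "b", "c", "d"}, 2: {"a", "d"}, 3: {"b", "c"}}
d1, d2 = vertical_split(data)
print(d1)
print(d2)
```

Each object keeps its identity in both halves, so any cluster found across the two halves can be checked against the original, unsplit data.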

Conclusion

• Novel concept of 3-clusters in vertically partitioned data

• Introduced quality measure framework for 3-clusters

• Presented efficient algorithm based on closed itemset mining algorithms, with adaptations:

– Defined search space to enable simultaneous pruning

– Incorporated novel pruning method based on cluster quality measure
