Intelligent Database Systems Lab
National Yunlin University of Science and Technology (N.Y.U.S.T.), I. M.
SCALE: a scalable framework for efficiently clustering transactional data
Hua Yan · Keke Chen · Ling Liu · Zhang Yi
DMKD 2010
Reported by Wen-Chung Liao, 2010/03/02
Outline
─ Motivation
─ Objectives
─ WCD clustering
─ Evaluating clustering results
─ Experiments
─ Conclusions
─ Comments
Motivation
─ Transactional data clustering algorithms require users to manually tune at least one or two parameters.
─ There is a lack of cluster validation methods for evaluating the quality of transactional clustering results.
Objectives
─ Present a fast, memory-saving, and scalable clustering algorithm that handles large transactional datasets efficiently without manual parameter settings.
─ SCALE framework
WCD clustering
─ Example transactional dataset: {abcd, bcd, ac, de, def}
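The slide gives only the example dataset, not the measure itself. As a rough sketch, assuming the usual definition of weighted coverage density (for a cluster with N transactions, M distinct items, and S total item occurrences, WCD is the sum of squared item occurrences divided by S × N × M), the following computes it for one hypothetical split of the example. The formula, the function names, and the chosen partition are assumptions for illustration and do not come from the slides.

```python
from collections import Counter


def cluster_stats(cluster):
    """Return (coverage density, weighted coverage density) for one cluster.

    Assumed definitions (not stated on the slide):
      CD  = S / (N * M)
      WCD = sum(occur(item)^2) / (S * N * M)
    where N = number of transactions, M = number of distinct items,
    S = total item occurrences in the cluster.
    """
    # Count in how many transactions each item appears.
    occur = Counter(item for t in cluster for item in set(t))
    n_k = len(cluster)
    m_k = len(occur)
    s_k = sum(occur.values())
    cd = s_k / (n_k * m_k)
    wcd = sum(c * c for c in occur.values()) / (s_k * n_k * m_k)
    return cd, wcd


def expected_wcd(partition):
    """Size-weighted average of per-cluster WCD over a partition."""
    n = sum(len(c) for c in partition)
    return sum(len(c) * cluster_stats(c)[1] for c in partition) / n


# Each transaction is written as a string of single-letter items.
dataset = ["abcd", "bcd", "ac", "de", "def"]

# Hypothetical partition chosen for illustration only.
partition = [dataset[:3], dataset[3:]]

if __name__ == "__main__":
    for cluster in partition:
        print(cluster, cluster_stats(cluster))
    print("expected WCD:", expected_wcd(partition))
```

Under these assumed definitions, a clustering algorithm would prefer the partition with the higher expected WCD, since that value rewards clusters whose transactions share frequently occurring items.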
Evaluating clustering results
Experiments
─ Two synthetic datasets: Tc30a6r1000_2L and the TxI4Dx series
─ Three real datasets: Zoo, Mushroom, and Retail
[Figure panels: T10I4Dx, T10I4Dx, TxI4D100k]
[Figure panels: Zoo, Tc30a6r1000_2L]
Conclusion
Two unique features of SCALE
─ The WCD clustering algorithm: a fast, memory-saving, and scalable method for clustering transactional data.
─ Two cluster evaluation measures specific to transactional data: LISR and AMI.
Some promising directions
─ Perform an experimental comparison between the WCD measure and the entropy measure.
─ Design a better algorithm for determining the best K for transactional data clustering.
─ Extend the work to handle transactional data streams.
Comments
Advantages
─ No parameter setting required.
Drawbacks
─ Without BKPlot, K must be determined manually for WCD.
─ No description of how BKPlot generates K in the categorical case.
Applications
─ Transaction clustering
─ Web log clustering
─ …