SCALE: a scalable framework for efficiently clustering transactional data

Intelligent Database Systems Lab

National Yunlin University of Science and Technology

Hua Yan · Keke Chen · Ling Liu · Zhang Yi

DMKD 2010

Reported by Wen-Chung Liao, 2010/03/02

Outline

─ Motivation
─ Objective
─ WCD clustering
─ Evaluating clustering results
─ Experiments
─ Conclusions
─ Comments

Motivation

Existing transactional data clustering algorithms require users to manually tune at least one or two parameters.

There is a lack of cluster validation methods for evaluating the quality of transactional clustering results.

Objectives

Present a fast, memory-saving, and scalable clustering algorithm that can efficiently handle large transactional datasets without resorting to manual parameter settings.

SCALE framework

WCD clustering

Example transactional dataset: {abcd, bcd, ac, de, def}
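The slide gives only the example dataset, so the following is a minimal sketch rather than the authors' implementation. It assumes that the weighted coverage density (WCD) of a cluster is the sum of squared item occurrence counts divided by the product of the total item occurrences and the number of transactions, and that a partition is scored by the size-weighted average of its clusters' WCD. The function names and the two candidate partitions are illustrative assumptions, not taken from the slides.

```python
from collections import Counter


def wcd(cluster):
    """WCD of one non-empty cluster, under the definition assumed above.

    Each transaction is a string whose characters are its items.
    """
    # Count in how many transactions of the cluster each item occurs.
    occur = Counter(item for t in cluster for item in set(t))
    s_k = sum(occur.values())  # total item occurrences in the cluster
    n_k = len(cluster)         # number of transactions in the cluster
    return sum(c * c for c in occur.values()) / (s_k * n_k)


def expected_wcd(clusters):
    """Size-weighted average WCD over all clusters of a partition (assumed)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * wcd(c) for c in clusters)


# Two hypothetical ways of splitting the slide's dataset into two clusters.
partition_a = [["abcd", "bcd", "ac"], ["de", "def"]]
partition_b = [["abcd", "bcd"], ["ac", "de", "def"]]

score_a = expected_wcd(partition_a)
score_b = expected_wcd(partition_b)
print(f"partition A: {score_a:.4f}, partition B: {score_b:.4f}")
```

Under these assumptions, partition A scores higher than partition B because its clusters share more items per transaction, which is the kind of overlap a density-based criterion is meant to reward.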

Evaluating clustering results

Experiments

Two synthetic datasets:
─ Tc30a6r1000_2L
─ TxI4Dx series

Three real datasets:
─ Zoo
─ Mushroom
─ Retail

[Figure panels: T10I4Dx, T10I4Dx, TxI4D100k]

[Figure panels: Zoo, Tc30a6r1000_2L]

Conclusion

Two unique features of SCALE:
─ the WCD clustering algorithm, a fast, memory-saving, and scalable method for clustering transactional data
─ two cluster evaluation measures specific to transactional data: LISR and AMI

Some promising directions:
─ Perform an experimental comparison between the WCD measure and the entropy measure.
─ Design a better algorithm for determining the best K for transactional data clustering.
─ Extend the work to handle transactional data streams.

Comments

Advantages
─ No parameter setting required

Shortcomings
─ Without BKPlot, K must be determined manually for WCD.
─ No description of how BKPlot generates K in the categorical case.

Applications
─ Transaction clustering
─ Web log clustering
─ …