Intelligent Database Systems Lab

N.Y.U.S.T. I.M.

Visualization of multi-algorithm clustering for better economic decisions - The case of car pricing

Presenter: Wu, Jia-Hao

Authors: Ran M. Bittmann, Roy Gelbard

DSS (2009)

National Yunlin University of Science and Technology

Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Personal Comments

Motivation

Decision makers must analyze the results of diverse algorithms and parameter settings for the decision-making issues they face.

No supporting model or tool exists for comparing the different result clusters generated by these algorithms and parameters.

Objective

The authors developed a methodology called Multi Algorithms Voting (MAV).

MAV uses a “Tetris-like” visualization format, which enables a cross-algorithm presentation.

Methodology – Multi Algorithms Voting

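The Tetris-like format is composed of rows, columns and colors.

Each column represents a specific algorithm.

Each row represents a specific sample case.

Each color represents a “vote”, that is, the class an algorithm assigned to a sample case.

[Figure: Tetris-like grid with algorithms as columns, sample cases as rows, and votes as colors]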
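
The layout described above can be sketched as a small plain-text grid. This is only an illustration, not the authors' tool: the sample names, the algorithm labels and every vote value are hypothetical, and digits stand in for the colors of the original figure.

```python
# Hypothetical votes: each row is one sample case, each column one algorithm.
# The digit in a cell is the class ("vote") that the algorithm assigned.
ALGORITHMS = ["M1", "M2", "M3", "M4", "M5"]
votes = {
    "sample 1": [1, 1, 2, 1, 1],
    "sample 2": [2, 2, 2, 2, 3],
    "sample 3": [3, 1, 3, 3, 3],
}


def render_grid(votes, algorithms):
    """Return a text grid with one row per sample and one column per algorithm."""
    width = max(len(name) for name in votes)
    lines = [" " * width + "  " + " ".join(algorithms)]
    for name, row in votes.items():
        # Right-align each vote under its algorithm label.
        cells = " ".join(str(v).rjust(len(a)) for v, a in zip(row, algorithms))
        lines.append(name.ljust(width) + "  " + cells)
    return "\n".join(lines)


print(render_grid(votes, ALGORITHMS))
```

Reading along a row shows how far the algorithms agree on one sample; reading down a column shows how one algorithm partitions all samples.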

Methodology – Meter

Squared Vote Error (SVE): calculated as the square of the number of algorithms that did not vote for the chosen classification.

Classifications mostly the same (6 of 7 algorithms agree): H = (7 - 6)^2

Classifications differ (4 of 7 algorithms agree): H = (7 - 4)^2
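
A minimal sketch of the SVE calculation for a single sample case, assuming that the "chosen classification" is the class with the most votes. The function name and the two vote rows are hypothetical; the rows were picked only to reproduce the two slide examples with seven algorithms.

```python
from collections import Counter


def squared_vote_error(row):
    """Square of the number of algorithms that did not vote for the top class.

    Assumption: the "chosen classification" is the most common vote in the row.
    """
    best_votes = Counter(row).most_common(1)[0][1]
    return (len(row) - best_votes) ** 2


mostly_same = [1, 1, 1, 1, 1, 1, 2]  # 6 of 7 algorithms agree
differing = [1, 1, 1, 1, 2, 2, 3]    # 4 of 7 algorithms agree

print(squared_vote_error(mostly_same))  # (7 - 6)^2 = 1
print(squared_vote_error(differing))    # (7 - 4)^2 = 9
```

A lower value means the algorithms agree more closely on that sample.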

Methodology – Meter

Distance From Second Best (DFSB): calculated as the difference between the number of votes received by the best vote and the number received by the second-best vote.

Classifications mostly the same: H = (6 - 1)

Classifications differ: H = (4 - 2)
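
A minimal sketch of the DFSB calculation for a single sample case. The function name and the vote rows are hypothetical. Treating a unanimous row as having a second-best count of zero is an assumption made here; the slides do not cover that case.

```python
from collections import Counter


def distance_from_second_best(row):
    """Votes for the most common class minus votes for the runner-up class."""
    counts = [n for _, n in Counter(row).most_common(2)]
    # Assumption: with a unanimous vote there is no runner-up, so use 0.
    second = counts[1] if len(counts) > 1 else 0
    return counts[0] - second


mostly_same = [1, 1, 1, 1, 1, 1, 2]  # best vote 6, second best 1
differing = [1, 1, 1, 1, 2, 2, 3]    # best vote 4, second best 2

print(distance_from_second_best(mostly_same))  # 6 - 1 = 5
print(distance_from_second_best(differing))    # 4 - 2 = 2
```

Unlike SVE, a higher DFSB value indicates stronger agreement, because the winning class is further ahead of its closest competitor.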

Experiments

The case study is car pricing; the cars in the dataset were classified into three price classes.

The authors use 14 parameters for each car to perform the clustering, for example:

The car manufacturer.

The car’s engine size.

The number of air bags in the car…

Five algorithms were used to classify the whole dataset:

Average Linkage (between Groups)

Average Linkage (within Groups)

Single Linkage

Median Method

Ward Method

Experiments

M3, Single Linkage, was unable to match this class correctly.

M4, Median Method, correctly classified all the cars.

Samples 54 and 60 were classified as belonging to the second class by many algorithms.

Experiments

Samples 66 to 71 were classified as belonging to the first price class by most algorithms.

Sample 72 was classified as belonging to price class three, suggesting it is under-priced.
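
The reasoning above, comparing the class most algorithms vote for with the class a car is actually priced in, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes a higher class number means a higher price, and the function names and vote rows are hypothetical.

```python
from collections import Counter


def majority_class(row):
    """Return the class that received the most votes in the row."""
    return Counter(row).most_common(1)[0][0]


def pricing_flag(actual_class, row):
    """Compare the majority vote with the class the car is actually priced in.

    Assumption: class numbers are ordered by price, so a majority vote above
    the actual class hints at under-pricing and one below at over-pricing.
    """
    voted = majority_class(row)
    if voted > actual_class:
        return "possibly under-priced"
    if voted < actual_class:
        return "possibly over-priced"
    return "consistent"


# Hypothetical cars: (actual price class, votes from five algorithms).
print(pricing_flag(1, [3, 3, 3, 1, 3]))  # possibly under-priced
print(pricing_flag(3, [2, 2, 3, 2, 2]))  # possibly over-priced
print(pricing_flag(1, [1, 1, 2, 1, 1]))  # consistent
```

A flag of this kind is only a prompt for closer inspection; a weak majority, as measured by SVE or DFSB, would make it less reliable.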

Experiments

The price class proved to be the hardest one to classify.

M5 is an exception to the rule and proved to be quite effective in classifying cars belonging to this class.

Conclusion

Visual presentation of multi-classifications allows the decision maker to identify the right models.

MAV lets the decision maker see the result clusters of the algorithms and evaluate the algorithms themselves.

In the car pricing case, MAV was used to identify cars suspected of being over-priced or under-priced.

Comments

Advantage: An interesting format for comparing the results.

Drawback: …

Application: Cluster analysis and decision support.