Classification Data Mining Experiment Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology


Classification

Data Mining Experiment

Department of Computer Science

Shenzhen Graduate School

Harbin Institute of Technology

Data Mining Resources on the Web

1. A comprehensive site for many KDD resources: http://www.kdnuggets.com/

2. Tutorial-type articles on currently hot topics: http://www.sigkdd.org/

3. The KDD Cup (1997~2010): http://www.sigkdd.org/kddcup/index.php

4. UCI datasets: http://archive.ics.uci.edu/ml/

5. Conferences, journals, and organizations: SIGKDD, ICDM, SIGMOD, SDM, PAKDD; IEEE Transactions on Knowledge and Data Engineering; Data Mining Group

Tools

Clementine

Clementine is a data mining platform originally developed by ISL (Integral Solutions Limited). SPSS acquired ISL in 1998 and continued to develop Clementine, which became one of SPSS's flagship products. IBM acquired SPSS in 2009.

It is a data mining and text analytics workbench used to build predictive models. It has a visual interface which allows users to leverage statistical and data mining algorithms without programming.

Tools

Clementine

Workflow 1

Dataset 1

Led7

1. attribute#1, attribute#2, ..., attribute#7, label
2. 3200 instances
3. All attribute values are either 0 or 1
4. Each attribute indicates whether the corresponding light is on or not for the decimal digit
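As an illustration of this row layout, here is a minimal sketch that encodes a decimal digit as seven 0/1 attributes followed by its label. The segment order (top, top-right, bottom-right, bottom, bottom-left, top-left, middle) and the name `encode_digit` are assumptions made for this example; the order used in the actual Led7 file may differ.

```python
# Seven-segment patterns in the assumed order:
# top, top-right, bottom-right, bottom, bottom-left, top-left, middle.
SEGMENTS = {
    0: "1111110",
    1: "0110000",
    2: "1101101",
    3: "1111001",
    4: "0110011",
    5: "1011011",
    6: "1011111",
    7: "1110000",
    8: "1111111",
    9: "1111011",
}


def encode_digit(digit):
    """Return one row: seven 0/1 attributes followed by the label (hypothetical helper)."""
    attributes = [int(bit) for bit in SEGMENTS[digit]]
    return attributes + [digit]
```

For example, `encode_digit(1)` lights only the two right-hand segments, so six of the seven attributes separate it from the digit 8.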

Load the file

Operations

Partitions
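Outside the tool, a partition step can be approximated with a random split into training and testing rows. The sketch below is only an illustration: the 70/30 ratio, the fixed seed, and the function name `partition` are assumptions, not settings taken from the Clementine workflow.

```python
import random


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private generator, global state untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With the 3200 Led7 instances and the assumed ratio, this yields 2240 training rows and 960 testing rows.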

C5.0

View the model

Model analysis

CHAID

View model

Dataset 2

Listing of attributes:

label: >50K, <=50K.

Age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country

Flow

Setting

Partitions

C5.0 Analysis

CHAID Analysis

Data cleaning
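One simple cleaning step, sketched here under assumptions, is to drop records that contain missing values. The attribute list of Dataset 2 matches the UCI Adult data, where missing fields are written as `?`; that the course file follows the same convention is an assumption, and `clean_rows` is a hypothetical helper name.

```python
import csv


def clean_rows(lines, missing="?"):
    """Parse comma-separated lines and keep only rows without missing fields."""
    kept = []
    for fields in csv.reader(lines, skipinitialspace=True):
        fields = [field.strip() for field in fields]
        if not fields or missing in fields:
            continue  # skip blank lines and rows with a missing value
        kept.append(fields)
    return kept
```

Dropping rows is the bluntest option; replacing `?` with the most frequent value of the attribute is a common alternative when few records can be spared.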

Partition

Flow

C5.0 and CHAID

You can do other data preprocessing according to your requirements.

Programming

• Programming
  – Use C4.5 or Bayes classifier
  – Dataset
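For the Bayes classifier option, a minimal sketch of a naive Bayes classifier for 0/1 attributes (such as those in Led7) might look like the following. The Laplace smoothing, the row format `(attributes, label)`, and the function names are assumptions chosen for illustration; it is a starting point rather than a required design.

```python
import math
from collections import Counter


def train_naive_bayes(rows):
    """Count labels and, per label, how often each 0/1 attribute equals 1."""
    n_attrs = len(rows[0][0])
    label_counts = Counter(label for _, label in rows)
    ones = {label: [0] * n_attrs for label in label_counts}
    for attrs, label in rows:
        for i, value in enumerate(attrs):
            ones[label][i] += value
    return label_counts, ones


def predict(model, attrs):
    """Return the label with the highest log posterior for one attribute vector."""
    label_counts, ones = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(attrs):
            # Laplace-smoothed estimate of P(attribute i = 1 | label)
            p_one = (ones[label][i] + 1) / (count + 2)
            score += math.log(p_one if value == 1 else 1 - p_one)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids underflow when many attribute probabilities are multiplied, and the smoothing keeps an unseen attribute value from forcing a probability of zero.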

Programming

Compare your result with the tool.
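One way to make this comparison concrete is to score both sets of predictions on the same test rows. The sketch below assumes the tool's predictions have been copied out as a plain list in the same row order as your own; `accuracy` and `compare` are hypothetical helper names.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the two label sequences agree."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def compare(own_predictions, tool_predictions, actual):
    """Summarise both classifiers against the true labels and against each other."""
    return {
        "own_accuracy": accuracy(own_predictions, actual),
        "tool_accuracy": accuracy(tool_predictions, actual),
        "agreement": accuracy(own_predictions, tool_predictions),
    }
```

The agreement figure shows how often the two classifiers make the same prediction, which can differ noticeably even when their accuracies are close.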
