Data Mining Lab Introduction to Weka 11/12/2012
Data mining with WEKA
WEKA : the software
“Waikato Environment for Knowledge Analysis”
Data Mining Software in Java
– a collection of machine learning algorithms for data mining tasks
– http://www.cs.waikato.ac.nz/ml/weka/
Includes tools for
– data pre-processing, classification, regression, clustering, association rules, visualization
How to install WEKA
Download WEKA from
– http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
How to install WEKA
Click through the installer: Next, I Agree, Next, Next, Install
WEKA GUI chooser
– Explorer
– Experimenter
– KnowledgeFlow
– Command Line Interface
WEKA Explorer : Open file
– Open file: brings up a dialog box allowing you to browse for the data file on the local file system
– Open URL: asks for a Uniform Resource Locator address for where the data is stored
– Open DB: reads data from a database
– Generate: enables you to generate artificial data from a variety of DataGenerators
Data can be imported from a file in various formats: ARFF, CSV, C4.5
Open arff file
weather
– 14 samples
– 4 attributes
– binary class

No.  outlook   temperature  humidity  windy  class
1    sunny     85           85        FALSE  no
2    sunny     80           90        TRUE   no
3    overcast  83           86        FALSE  yes
4    rainy     70           96        FALSE  yes
5    rainy     68           80        FALSE  yes
6    rainy     65           70        TRUE   no
7    overcast  64           65        FALSE  yes
8    sunny     72           95        FALSE  no
9    sunny     69           70        FALSE  yes
10   rainy     75           80        TRUE   yes
11   sunny     75           70        TRUE   yes
12   overcast  72           90        TRUE   yes
13   overcast  81           75        FALSE  yes
14   rainy     71           91        TRUE   no
File information
attribute & class information
File information – Visualize All
You can check the class distribution for each attribute value
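The per-value class counts shown in the visualization panel can be reproduced by hand. The following is a minimal Python sketch, not WEKA code: the rows are the 14 weather instances from the table above, and the names `WEATHER` and `class_distribution` are made up for this illustration.

```python
from collections import Counter, defaultdict

# The 14 weather instances as listed on the slide:
# (outlook, temperature, humidity, windy, class)
WEATHER = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, False, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, True, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]


def class_distribution(rows, attribute_index, class_index=-1):
    """Count class labels for each distinct value of one attribute."""
    distribution = defaultdict(Counter)
    for row in rows:
        distribution[row[attribute_index]][row[class_index]] += 1
    return distribution


# Class counts per value of the first attribute (outlook).
outlook_distribution = class_distribution(WEATHER, attribute_index=0)
for value, counts in sorted(outlook_distribution.items()):
    print(value, dict(counts))
```

Passing `attribute_index=3` gives the same breakdown for `windy`. Numeric attributes such as temperature would first need to be binned, which this sketch does not do.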
Classify Section
Select a classifier
Test Option
Select class attribute
Choose a classifier for classification
NaiveBayes
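k-fold cross validation (set as k = 2)
Set test options
The data set is split into k folds
– k-1 folds form the training set and the remaining 1 fold is the test set
– this is repeated k times so that each fold serves as the test set once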
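The splitting scheme can be sketched in a few lines of Python. This is an illustration only: the function name `k_fold_splits` and the `seed` parameter are invented here, and WEKA's own fold assignment (for example its seeding and any stratification by class) may differ.

```python
import random


def k_fold_splits(n_instances, k, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        # The other k-1 folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# 14 instances (the size of the weather data) with k = 2.
splits = list(k_fold_splits(n_instances=14, k=2))
for train, test in splits:
    print(len(train), "training /", len(test), "test")
```

With k = 2 every instance is used for training in one round and for testing in the other, so each round trains on only half of the data.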
Training & test
Click the ‘Start’ Button
Prediction Accuracy
Choose other classifiers
MultilayerPerceptron & DecisionStump
Prediction result
Naïve Bayes
Multi Layer Perceptron
Decision Stump
ARFF
ARFF format
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes
ARFF files have two distinct sections
– Header: relation, attributes
– Data
ARFF data
Header:
@relation heart-disease-simplified
@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}
Data:
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
.
.
.
ARFF data – header section
@relation heart-disease-simplified
@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}
Relation - @relation <relation-name>
– The relation name is defined on the first line of the ARFF file
Attribute - @attribute <attribute-name> <datatype>
– Each @attribute statement uniquely defines the name of that attribute
– Data type is one of
   numeric (integer and real are treated as numeric)
   <nominal-specification>
   string
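To make the choice between the numeric and nominal data types concrete, here is a small Python sketch that builds an @attribute line from a column of raw values. The helper `attribute_declaration` is hypothetical and not part of WEKA; it assumes a column is numeric when every value parses as a number and nominal otherwise, and it ignores missing values, quoting, and the string type.

```python
def attribute_declaration(name, values):
    """Build an ARFF @attribute line for one column of raw string values."""
    try:
        for value in values:
            float(value)  # raises ValueError on the first non-number
        return f"@attribute {name} numeric"
    except ValueError:
        # Not every value is a number: declare a nominal attribute that
        # lists each distinct value once, in order of first appearance.
        distinct = list(dict.fromkeys(values))
        return f"@attribute {name} {{{', '.join(distinct)}}}"


# Two columns taken from the four heart-disease rows on the slides.
columns = {
    "age": ["63", "67", "67", "38"],
    "sex": ["male", "male", "male", "female"],
}
header = [attribute_declaration(name, values) for name, values in columns.items()]
print("\n".join(header))
```

A nominal specification built this way only contains the values that actually occur in the sample, so a hand-written header is still needed when the data does not show every legal value.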
ARFF data – data section
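Data - @data
– A single line denoting the start of the data segment in the file
– Each following line is one instance, with comma-separated values in the order the attributes were declared
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present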
.
.
.
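How the header and data sections fit together can be sketched with a small reader in Python. This is a simplified illustration, not WEKA's loader: the function name `parse_arff` is invented here, and the sketch assumes unquoted names and values, no missing values, and no sparse format.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lowered = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # Convert values of numeric attributes, keep the others as text.
            rows.append([
                float(v) if kind in ("numeric", "integer", "real") else v
                for v, (_, kind) in zip(values, attributes)
            ])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            _, name, datatype = line.split(None, 2)
            if datatype.startswith("{"):
                # Nominal specification: keep the list of allowed values.
                kind = [v.strip() for v in datatype.strip("{}").split(",")]
            else:
                kind = datatype.lower()
            attributes.append((name, kind))
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

The reader relies on the order of the @attribute lines to interpret each comma-separated value, which is why the header has to come before @data.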
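Lab assignment
Download a data set from the “UCI Machine Learning Repository”
Describe the characteristics of the data
Choose two classification algorithms and compare their performance
Run clustering
Submit the results as a report on the day of the final exam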