WEKA: Waikato Environment for Knowledge Analysis
Goals of the workshop
• Acquisition of functional knowledge about the WEKA platform
• Ability to process (your own) data in WEKA
• Write a seminar paper:
– identify a problem
– transform it into data
– choose an appropriate DM technique
– apply it to the data
– evaluate & interpret the results
Some basic facts about WEKA:
• WEKA (1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)
• WEKA (2) = a software ‘workbench’ incorporating several standard ML/DM techniques
• Authors = Ian H. Witten, Eibe Frank (et al.)
• Programming language = JAVA
• Origin = The University of Waikato, New Zealand
• Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with JAVA Implementations, Morgan Kaufmann, 1999
• Homepage = http://www.cs.waikato.ac.nz/~ml/weka
What is WEKA?
Objectives of WEKA
• make ML/DM techniques generally available
• apply them to practical problems (in agriculture)
• develop new ML/DM algorithms
• contribute to the theoretical framework of the field (ML/DM)
Versions of WEKA
• There are several versions of WEKA:
– WEKA 3.0: “book version”, compatible with the description in the data mining book
– WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
– WEKA 3.4: “development version”, with lots of improvements
• This workshop is based on WEKA 3.4(.3)
The input to WEKA
ARFF format (“flat” files):
• example: Play-tennis domain

% this is an example of a knowledge
% domain in ARFF format
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
. . .
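To make the structure of an ARFF file concrete, here is a minimal reading sketch in Python. It is not WEKA's own loader: the function name parse_arff is hypothetical, and the sketch assumes a dense file with no quoted values, no missing values ("?") and only nominal or numeric attributes.

```python
# Attribute types treated as numbers in this sketch (an assumption, not WEKA's full type list).
NUMERIC_TYPES = {"real", "numeric", "integer"}


def parse_arff(text):
    """Parse a small, dense ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, kind); kind is a type name or a list of nominal values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                numeric = isinstance(kind, str) and kind in NUMERIC_TYPES
                row.append(float(value) if numeric else value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # nominal attribute: the legal values are listed in braces
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True  # everything after @data is one instance per line
    return relation, attributes, rows


SAMPLE = """\
% this is an example of a knowledge
% domain in ARFF format
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

The header (@relation, @attribute) declares the columns; each line after @data is one instance whose values follow the attribute order.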
Conversion to the ARFF format?
• example: converting from MS-EXCEL to ARFF
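One possible route, sketched here under assumptions, is to save the spreadsheet as CSV and then add the ARFF header. The slides do not say which procedure was shown, so the function csv_to_arff and its type-guessing rule (a column is "real" if every value parses as a number, otherwise nominal) are illustrative only.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row + data rows) into ARFF text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # drop empty lines

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in rows]
        if all(is_number(v) for v in column):
            lines.append("@attribute %s real" % name)
        else:
            # nominal attribute: distinct values in order of first appearance
            values = list(dict.fromkeys(column))
            lines.append("@attribute %s {%s}" % (name, ", ".join(values)))
    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in rows)
    return "\n".join(lines) + "\n"


CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
"""

arff_text = csv_to_arff(CSV_TEXT, "weather")
print(arff_text)
```

The nominal value lists are derived from the data, so a value that never occurs in the sheet (here: rainy) has to be added to the header by hand.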
Starting WEKA – the GUI
A quick tour of the “explorer”
• Preprocess panel
– Domain info. panel
– Attributes panel
– Status bar
– Filters panel
– Attribute info. panel
– Log file
– Attribute visualization panel
• Classify panel
– Classifier panel
– Class attribute
– Output panel
– Test options panel
– Result panel
• Visualize panel
The command line
• example:
C:\Temp>java weka.classifiers.trees.J48
Weka exception: No training file and no object input file given.
General options:
-t <name of training file> Sets training file.
-T <name of test file> Sets test file. If missing, a cross-validation will be performed on the training data.
-c <class index> Sets index of class attribute (default: last).
-x <number of folds> Sets number of folds for cross-validation (default: 10).
-s <random number seed> Sets random number seed for cross-validation (default: 1).
-m <name of file with cost matrix> Sets file with cost matrix.
-l <name of input file> Sets model input file.
-d <name of output file> Sets model output file.
-v Outputs no statistics for training data.
-o Outputs statistics only, not the classifier.
-i Outputs detailed information-retrieval statistics for each class.
-k Outputs information-theoretic statistics.
-p Only outputs predictions for test instances.
-r Only outputs cumulative margin distribution.
-z <class name> Only outputs the source representation of the classifier, giving it the supplied name.
-g Only outputs the graph representation of the classifier.
Options specific to weka.classifiers.j48.J48:
-U Use unpruned tree.
-C <pruning confidence> Set confidence threshold for pruning. (default 0.25)
-M <minimum number of instances> Set minimum number of instances per leaf. (default 2)
-R Use reduced error pruning.
-N <number of folds> Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
-B Use binary splits only.
-S Don't perform subtree raising.
-L Do not clean up after the tree has been built.
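The -x and -s options above control cross-validation, which is performed when no test file is given. The idea can be sketched in Python as follows. This is a plain shuffled k-fold split, not WEKA's actual implementation; the function name cross_validation_folds is hypothetical, and only the defaults (10 folds, seed 1) are taken from the option listing.

```python
import random


def cross_validation_folds(n_instances, n_folds=10, seed=1):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # the seed makes the split repeatable
    for fold in range(n_folds):
        test = indices[fold::n_folds]  # every n_folds-th shuffled index
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# 14 instances, 10 folds: each instance is used for testing exactly once.
folds = list(cross_validation_folds(14, n_folds=10, seed=1))
for train, test in folds:
    print(len(train), "training instances,", len(test), "test instances")
```

Each instance is tested once by a model that never saw it during training, and the per-fold results are then combined into one estimate.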
GUI vs. command line
GUI (+):
• visualisation of data and (some) models
GUI (-):
• not all the parameters can be set (reduced functionality)
Command line (+):
• full functionality (‘saving the model’)
• batch processing
Command line (-):
• only textual visualisation of models
• awkward to use
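As a rough illustration of the batch-processing advantage, the Python sketch below builds one J48 command line per data set. It uses only options from the listing above (-t, -x, -C, -d); the helper j48_command and the file names are hypothetical, and the commands are printed rather than executed.

```python
def j48_command(train_file, model_file=None, folds=10, confidence=0.25):
    """Build the argument list for one J48 run (not executed here)."""
    command = [
        "java", "weka.classifiers.trees.J48",
        "-t", train_file,        # training file
        "-x", str(folds),        # number of cross-validation folds
        "-C", str(confidence),   # pruning confidence threshold
    ]
    if model_file is not None:
        command += ["-d", model_file]  # model output file ('saving the model')
    return command


datasets = ["weather.arff", "other.arff"]  # hypothetical file names
for name in datasets:
    command = j48_command(name, model_file=name.replace(".arff", ".model"))
    print(" ".join(command))
    # To actually run it: subprocess.run(command, check=True), assuming
    # Java is installed and the WEKA classes are on the CLASSPATH.
```

The same loop could vary -C or -M to compare parameter settings, which is tedious to do by hand in the GUI.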
PROs & CONs of WEKA
PROs:
• open source (GNU licence)
• platform-independent (JAVA)
• easy to use
• (relatively) easy to modify
CONs:
• relatively slow (JAVA)
• ‘incomplete’ documentation (some GUI features could be explained better)
• some features available only from command line
Let’s go to work