raj-endran
Data Mining: Weka, an Introduction
WEKA: AN INTRODUCTION
WEKA:
- Waikato Environment for Knowledge Analysis (WEKA)
- Developed by the Department of Computer Science, University of Waikato, New Zealand
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- http://www.cs.waikato.ac.nz/ml/weka/
Weka Interfaces:
- Explorer: preprocessing, attribute selection, learning, visualization
- Knowledge Flow: visual design of the KDD process
- Experimenter: testing and evaluating machine learning algorithms
- Command-line
Data Formats:
- Uses flat text files to describe the data
- Can work with a wide variety of data files, including its own .arff format and C4.5 file formats
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, etc.
- ARFF (Attribute-Relation File Format) is the native format; file conversions are supported
- An ARFF file begins with a relation declaration, e.g. @relation person
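As a rough illustration of the ARFF layout, the sketch below builds a header in plain Java without using Weka itself. The class `ArffSketch`, the method `header`, the attribute list, and the data row are all hypothetical and chosen only for illustration; in practice Weka's own loaders and savers handle this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Weka: assembles an ARFF header as text.
class ArffSketch {
    // One @relation line, one @attribute line per attribute, then the @data marker.
    static String header(String relation, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append("@attribute ").append(e.getKey())
              .append(' ').append(e.getValue()).append('\n');
        }
        sb.append("\n@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in insertion order.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("age", "numeric");
        attrs.put("name", "string");
        attrs.put("education", "{College, Masters, Doctorate}");
        System.out.print(header("person", attrs));
        System.out.println("25, 'Alice', College"); // made-up data row
    }
}
```

Each data row lists one value per declared attribute, in declaration order, separated by commas.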
Explorer: Applying Filters
- Supervised vs. unsupervised filters
- Attribute vs. instance filters
Unsupervised Attribute Filters:
- Add: adds a new attribute
- Normalize: scales all numeric values
- Remove: removes attributes (RemoveType / RemoveUseless)
Unsupervised Instance Filters:
- Randomize: randomizes the order of instances in a dataset
- RemoveWithValues: filters out instances with certain attribute values
Supervised Attribute Filters:
- AttributeSelection: attribute selection methods
- Discretize: converts numeric attributes to nominal
Supervised Instance Filters:
- Resample: produces a random subsample of a dataset
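To make the idea behind a filter such as Normalize concrete, here is a minimal plain-Java sketch of min-max scaling to the range [0, 1]. It is not Weka's implementation; the class `NormalizeSketch`, the method `minMax`, and the handling of a constant column are assumptions made for this example.

```java
// Hypothetical illustration of min-max scaling, independent of Weka.
class NormalizeSketch {
    // Rescales values linearly so the minimum maps to 0 and the maximum to 1.
    // If all values are equal, every output is 0 (a choice made for this sketch).
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (range == 0) ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ages = {20, 30, 40}; // made-up attribute values
        for (double v : minMax(ages)) {
            System.out.println(v);
        }
    }
}
```

An attribute filter like this transforms a column of values, whereas an instance filter (for example Randomize) adds, removes, or reorders whole rows.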
Classifiers:
- Bayes: BayesNet, NaiveBayes
- Trees: ID3, J48
- Rules: OneR, Conjunctive Rule
- Functions: Linear Regression, RBFNetwork, Multilayer Perceptron
- Lazy: KStar, IBk
- Miscellaneous: VFI
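IBk in the Lazy group is a k-nearest-neighbour learner. The following plain-Java sketch shows the basic idea only (squared Euclidean distance, majority vote); it is not Weka's code, and the class `KnnSketch`, its method names, and the toy data are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical k-nearest-neighbour sketch, independent of Weka.
class KnnSketch {
    // Squared Euclidean distance; the square root is not needed for ranking.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Returns the majority label among the k training points closest to the query.
    // Ties go to the label that reaches the top count first.
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> dist2(train[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int j = 0; j < Math.min(k, idx.length); j++) {
            String label = labels[idx[j]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{1, 1}, {1, 2}, {8, 8}, {9, 8}}; // made-up points
        String[] labels = {"a", "a", "b", "b"};
        System.out.println(classify(train, labels, new double[]{2, 1}, 3));
    }
}
```

The learner is called "lazy" because no model is built up front: all work happens when a query arrives.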
Clusterers: OPTICS, DBScan, SimpleKMeans, Cobweb
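As a sketch of the k-means idea behind a clusterer such as SimpleKMeans, the plain-Java example below alternates assignment and centroid-update steps on one-dimensional data. It is an illustration under simplifying assumptions (first k points as seeds, absolute distance), not Weka's implementation; `KMeansSketch` and `cluster` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical 1-D k-means sketch, independent of Weka.
class KMeansSketch {
    // Returns the final centroids after iterating until assignments stop changing
    // or maxIter is reached. Assumes data.length >= k.
    static double[] cluster(double[] data, int k, int maxIter) {
        double[] centroids = Arrays.copyOf(data, k); // assumption: first k points as seeds
        int[] assign = new int[data.length];
        Arrays.fill(assign, -1); // forces at least one full pass

        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(data[i] - centroids[c]) < Math.abs(data[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) break;

            // Update step: move each centroid to the mean of its points.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < data.length; i++) {
                sum[assign[i]] += data[i];
                count[assign[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 10, 11, 12}; // made-up values
        System.out.println(Arrays.toString(cluster(data, 2, 100)));
    }
}
```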
Associations: Apriori, Predictive Apriori, Filtered Associator
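Association rule learners such as Apriori rank rules by measures like support and confidence. The plain-Java sketch below computes both for a small, made-up set of transactions; it does not perform Apriori's candidate generation, and `AssociationSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical support/confidence calculation, independent of Weka.
class AssociationSketch {
    // Fraction of transactions that contain every item of the itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) hits++;
        }
        return (double) hits / transactions.size();
    }

    // Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        List<Set<String>> tx = Arrays.asList(
            items("bread", "milk"),
            items("bread", "butter"),
            items("bread", "milk", "butter"),
            items("milk"));
        System.out.println(support(tx, items("bread", "milk")));
        System.out.println(confidence(tx, items("bread"), items("milk")));
    }
}
```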
Attribute Selection:
Attribute Evaluators:
- CfsSubsetEval
- ClassifierSubsetEval
- GainRatioAttributeEval
- InfoGainAttributeEval
Search Methods:
- Best First
- Exhaustive Search
- Genetic Search
- Rank Search
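An evaluator such as InfoGainAttributeEval scores an attribute by the information gain it provides about the class. The plain-Java sketch below computes that quantity for one nominal attribute; it illustrates the formula only and is not Weka's code. The class `InfoGainSketch`, its methods, and the toy data are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical information-gain calculation, independent of Weka.
class InfoGainSketch {
    // Shannon entropy (in bits) of a distribution given as raw counts.
    static double entropy(Collection<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Gain = H(class) - sum over attribute values v of P(v) * H(class | v).
    static double infoGain(String[] attr, String[] cls) {
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < attr.length; i++) {
            classCounts.merge(cls[i], 1, Integer::sum);
            byValue.computeIfAbsent(attr[i], v -> new HashMap<>())
                   .merge(cls[i], 1, Integer::sum);
        }
        double gain = entropy(classCounts.values());
        for (Map<String, Integer> sub : byValue.values()) {
            int n = 0;
            for (int c : sub.values()) n += c;
            gain -= (double) n / attr.length * entropy(sub.values());
        }
        return gain;
    }

    public static void main(String[] args) {
        String[] attr = {"a", "a", "b", "b"}; // made-up attribute values
        String[] cls = {"y", "y", "n", "n"};  // made-up class labels
        System.out.println(infoGain(attr, cls));
    }
}
```

A search method then decides which attributes or subsets to score; a single-attribute evaluator like this one only needs a ranking.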
Knowledge Flow Interface:
- Data-flow inspired interface to WEKA
- Process data in batches or incrementally
- Process multiple batches or streams in parallel (each separate flow executes in its own thread)
- Chain filters together
- Visualize performance of incremental classifiers during processing
Experimenter Interface:
- Enables the user to create, run, modify, and analyse experiments in a more convenient manner
- Modes of operation: Simple, Advanced
- Local and remote experiments are supported
Command Line Interface:
- Plain text panel (the Simple CLI) from which commands can be entered
- java <classname> [<args>]: invokes a Java class with the given arguments (if any)
- break: stops the current thread, e.g., a running classifier, in a friendly manner
- kill: stops the current thread in an unfriendly fashion
- cls: clears the output area
- exit: exits the Simple CLI
- help [<command>]: lists the available commands, or gives help on the specified command
Weka Operation:
- The operating system's command-line interface can also be used after setting the CLASSPATH accordingly.
- All the functionality supported by Weka can also be invoked from one's own source code.
Weka Extensions:
- BioWeka: extension library for knowledge discovery in biology
- WekaMetal: meta-learning extension to WEKA
- Weka-Parallel: parallel processing for WEKA
- Grid Weka: grid computing using WEKA
ARFF example (truncated in the source):
@relation person
@attribute age numeric
@attribute name string
@attribute education {College, Masters, Doctorate}
@attribute class {>50K,