
Data Mining-Weka an Introduction


  • WEKA: AN INTRODUCTION

    Waikato Environment for Knowledge Analysis (WEKA)

    Developed by the Department of Computer Science, University of Waikato, New Zealand
    Machine learning/data mining software written in Java (distributed under the GNU General Public License)
    Used for research, education, and applications
    http://www.cs.waikato.ac.nz/ml/weka/

    Weka Interfaces
    Explorer - preprocessing, attribute selection, learning, visualization
    Knowledge Flow - visual design of the KDD process
    Experimenter - testing and evaluating machine learning algorithms
    Command-line

    Data Formats
    Uses flat text files to describe the data
    Can work with a wide variety of data files, including its own .arff format and C4.5 file formats
    Data can be imported from a file in various formats: ARFF, CSV, C4.5, etc.

    ARFF (Attribute-Relation File Format)

    @relation person

  • @attribute age numeric
    @attribute name string
    @attribute education {College, Masters, Doctorate}
    @attribute class {>50K,
  • Native format: ARFF
    Supports file conversions
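
To make the ARFF layout and the idea of file conversion concrete, here is a minimal Python sketch that turns CSV text into ARFF text. It is not Weka's own converter: the function name `csv_to_arff`, the type-guessing rule (numeric if every value parses as a float, nominal if the column is named explicitly, string otherwise) and the sample rows are assumptions made for illustration. Values containing spaces or commas would need quoting, which this sketch leaves out.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal=()):
    """Convert CSV text (first row is the header) into ARFF text.

    Assumed rule for this sketch: columns listed in `nominal` become nominal
    attributes, columns whose values all parse as floats become numeric,
    and everything else becomes a string attribute.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if name in nominal:
            # Nominal attributes list their allowed values in braces.
            values = sorted(set(column))
            lines.append(f"@attribute {name} {{{', '.join(values)}}}")
        elif all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            lines.append(f"@attribute {name} string")

    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


# Invented sample rows that reuse the attribute names from the slide.
sample = "\n".join([
    "age,name,education",
    "25,Ann,College",
    "41,Bob,Doctorate",
    "33,Cy,Masters",
])
arff = csv_to_arff(sample, "person", nominal={"education"})
print(arff)
```

The header written by the sketch has the same shape as the `@relation person` example on the slides, followed by an `@data` section with one comma-separated line per instance.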

    Explorer

    Applying Filters: Supervised vs. Unsupervised Filters, Attribute vs. Instance Filters

    Unsupervised Attribute Filters
    Add - adds a new attribute
    Normalize - scales all numeric values
    Remove - removes attributes (RemoveType / RemoveUseless)

    Unsupervised Instance Filters
    Randomize - randomizes the order of instances in a dataset
    RemoveWithValues - filters out instances with certain attribute values

    Supervised Attribute Filters
    AttributeSelection - attribute selection methods
    Discretize - converts numeric attributes to nominal

    Supervised Instance Filters
    Resample - produces a random subsample of a dataset
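
The Python sketch below imitates three of the unsupervised filters on a list of dictionaries, to show what each one does to the data. It is not Weka code. The function names, the `seed` parameter and the sample rows are assumptions, and the sketch assumes that Normalize means min-max scaling to the range [0, 1], since the slides only say that it scales all numeric values.

```python
import random


def normalize(rows, numeric_columns):
    """Attribute filter: min-max scale the given columns to [0, 1] (assumed range)."""
    scaled = [dict(row) for row in rows]
    for col in numeric_columns:
        values = [row[col] for row in rows]
        low, high = min(values), max(values)
        span = high - low
        for row in scaled:
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            row[col] = (row[col] - low) / span if span else 0.0
    return scaled


def remove_with_values(rows, column, unwanted):
    """Instance filter: drop instances whose `column` value is in `unwanted`."""
    return [row for row in rows if row[column] not in unwanted]


def randomize(rows, seed=42):
    """Instance filter: return the instances in a seeded random order."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Invented sample data for illustration.
people = [
    {"age": 20, "education": "College"},
    {"age": 30, "education": "Masters"},
    {"age": 40, "education": "Doctorate"},
]

scaled = normalize(people, ["age"])
kept = remove_with_values(people, "education", {"College"})
shuffled = randomize(people, seed=1)
print([row["age"] for row in scaled])
print([row["education"] for row in kept])
```

Attribute filters change or remove columns and leave the number of instances alone, while instance filters keep the columns and change which rows are present or in what order.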

    Classifiers:
    Bayes - BayesNet, NaiveBayes
    Trees - ID3, J48
    Rules - OneR, Conjunctive Rule
    Functions - Linear Regression, RBFNetwork, Multilayer Perceptron
    Lazy - KStar, IBk
    Miscellaneous - VFI
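
As an illustration of the simplest rule learner in the list, here is a rough Python sketch of the OneR idea: for each attribute, predict the majority class for each of its values, then keep the attribute whose rule makes the fewest training errors. It handles nominal attributes only and is not Weka's implementation. The function name `one_r` and the small weather-style dataset are invented for this example.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single best attribute.

    `rule` maps each value of the chosen attribute to the majority class
    seen with that value in the training rows.
    """
    best = None
    for attr in attributes:
        # Count class labels separately for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors: instances that do not belong to the majority class.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Invented training data for illustration.
weather = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "overcast", "windy": "true", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
]

attribute, rule, errors = one_r(weather, ["outlook", "windy"], "play")
print(attribute, errors)
```

On this data the rule built on `outlook` misclassifies one training instance and the rule built on `windy` misclassifies two, so `outlook` is selected.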

  • Clusterers: OPTICS, DBScan, SimpleKMeans, Cobweb

    Associations: Apriori, Predictive Apriori, Filtered Associator

    Attribute Selection:
    Attribute Evaluators - CfsSubsetEval, ClassifierSubsetEval, GainRatioAttributeEval, InfoGainAttributeEval
    Search Methods - Best First, Exhaustive Search, Genetic Search, Rank Search
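
The quantity behind an information-gain evaluator can be sketched in a few lines of Python: the gain of an attribute is the entropy of the class minus the entropy of the class once the attribute's value is known. The helper names `entropy` and `info_gain` and the sample data are assumptions for illustration, and the final sort is only a stand-in for a ranking step, not one of the search methods listed above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attr, target):
    """H(class) - H(class | attr) for a nominal attribute."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Weighted average entropy of the class inside each attribute value.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented data for illustration.
weather = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "overcast", "windy": "true", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
]

gains = {attr: info_gain(weather, attr, "play") for attr in ["outlook", "windy"]}
ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
for attr, gain in ranked:
    print(f"{attr}: {gain:.3f}")
```

An attribute that splits the data into purer class groups gets a higher gain, so it appears earlier in the ranking.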

    Knowledge Flow Interface:
    Data-flow inspired interface to WEKA
    Process data in batches or incrementally
    Process multiple batches or streams in parallel (each separate flow executes in its own thread)
    Chain filters together
    Visualize performance of incremental classifiers during processing
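
The data-flow idea of chaining filters and watching an incremental classifier while instances stream through can be imitated with Python generators. This is only an analogy, not the Knowledge Flow API. Every name below (`drop_missing`, `lowercase_labels`, `running_accuracy`) is hypothetical, the classifier is a trivial majority-class predictor, and evaluating each instance before training on it is an assumption about how incremental performance is measured.

```python
from collections import Counter


def drop_missing(stream):
    """Filter step: skip instances that contain a missing (None) value."""
    for inst in stream:
        if None not in inst.values():
            yield inst


def lowercase_labels(stream, target):
    """Filter step: normalise the spelling of the class label."""
    for inst in stream:
        yield {**inst, target: inst[target].lower()}


def running_accuracy(stream, target):
    """Test-then-train with a majority-class predictor.

    Yields the accuracy so far after each instance that could be tested.
    """
    seen = Counter()
    correct = total = 0
    for inst in stream:
        if seen:  # a prediction is possible once at least one label has been seen
            prediction = seen.most_common(1)[0][0]
            correct += prediction == inst[target]
            total += 1
            yield correct / total
        seen[inst[target]] += 1  # incremental update with the new instance


# Invented stream of instances for illustration.
instances = [
    {"age": 25, "play": "Yes"},
    {"age": 31, "play": "yes"},
    {"age": None, "play": "no"},
    {"age": 47, "play": "No"},
    {"age": 52, "play": "YES"},
]

# Chain the steps: each generator pulls one instance at a time from the previous one.
flow = running_accuracy(lowercase_labels(drop_missing(instances), "play"), "play")
accuracies = list(flow)
print(accuracies)
```

Because generators are lazy, no step holds the whole dataset in memory, which mirrors processing a stream instance by instance instead of in one batch.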

  • Experimenter Interface:
    Enables the user to create, run, modify, and analyse experiments more conveniently than by processing the schemes individually
    Modes of operation: Simple, Advanced
    Local and remote experiments are supported

    Command Line Interface:
    Plain text panel from which commands can be entered
    java <classname> [<args>] - invokes a Java class with the given arguments (if any)
    break - stops the current thread, e.g., a running classifier, in a friendly manner
    kill - stops the current thread in an unfriendly fashion
    cls - clears the output area
    exit - exits the Simple CLI
    help [<command>] - gives an overview of the available commands, or help on the specified command

    Weka Operation:
    The operating system's command-line interface can also be used after setting the CLASSPATH accordingly.
    All the functionality supported by Weka can also be invoked from one's own source code.
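
As a sketch of what setting the CLASSPATH and invoking a Weka class from the operating system's command line involves, the Python snippet below builds the command and environment without running them. The helper name `weka_command`, the jar location `weka.jar` and the file `person.arff` are assumptions. `weka.classifiers.trees.J48` is the J48 classifier listed earlier, and `-t` names its training file.

```python
import os


def weka_command(classpath_entries, class_name, args=()):
    """Build (command, environment) for running a Weka class with `java`.

    The CLASSPATH environment variable tells the JVM where to find weka.jar.
    """
    env = dict(os.environ)
    env["CLASSPATH"] = os.pathsep.join(classpath_entries)
    command = ["java", class_name, *args]
    return command, env


# Hypothetical paths: adjust to where weka.jar and the data file actually live.
command, env = weka_command(
    ["weka.jar"],
    "weka.classifiers.trees.J48",
    ["-t", "person.arff"],
)
print(" ".join(command))

# To actually run it (requires Java and weka.jar to be installed):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

The same class name and arguments could be typed after `java` in the Simple CLI panel, where the CLASSPATH is already set up by Weka itself.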

    Weka Extensions:
    BioWeka - extension library for knowledge discovery in biology
    WekaMetal - meta-learning extension to WEKA

  • Weka-Parallel - parallel processing for WEKA
    Grid Weka - grid computing using WEKA

