23
WEKA WEKA A . Antony Alex MCA Dr G R D College of Science – CBE Tamil Nadu - India

Weka

  • Upload
    antony

  • View
    213

  • Download
    6

Embed Size (px)

DESCRIPTION

Weka

Citation preview

Page 1: Weka

WEKAWEKA

A . Antony Alex MCADr G R D College of Science – CBE

Tamil Nadu - India

Page 2: Weka

Waikato Environment for Waikato Environment for Knowledge AnalysisKnowledge Analysis

� A collection of open source ML algorithms◦ pre-processing◦ classifiers◦ clustering◦ association ruleIt’s a data mining/machine learning tool developed by � It’s a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand.

� Weka is also a bird found only on the islands of New Zealand.

� Java based � Routines are implemented as classes and logically

arranged in packages� Comes with an extensive GUI interface

Page 3: Weka

Download and Install WEKADownload and Install WEKA

� Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

� Support multiple platforms (written in java):� Support multiple platforms (written in java):

◦Windows, Mac OS X and Linux

39/10/2012

Page 4: Weka

Main FeaturesMain Features

� 49 data preprocessing tools

� 76 classification/regression algorithms

� 8 clustering algorithms

� 3 algorithms for finding association rules

� 15 attribute/subset evaluators + 10 search � 15 attribute/subset evaluators + 10 search algorithms for feature selection

49/10/2012

Page 5: Weka

• Dataset• Classifier• Weka.filters• Weka.classifiers

java weka.core.converters.CSVLoader data.csv > data.arff

Command line interface

java weka.core.converters.CSVLoader data.csv > data.arff

java weka.core.converters.C45Loader c45_filestem > data.arff

java weka.classifiers.rules.ZeroR -t weather.arff

java weka.classifiers.trees.J48 -t weather.arff

java weka.filters.supervised.attribute.Discretize -i data/iris.arff \ -o iris-nom.arff -c last

java weka.filters.supervised.attribute.Discretize -i data/cpu.arff \ -o cpu-classvendor-nom.arff -c first

Page 6: Weka

Main GUIMain GUI

� Three graphical user interfaces

◦ “The Explorer” (exploratory data analysis)

◦ “The Experimenter” (experimental environment)

◦ “The KnowledgeFlow” (new process ◦ “The KnowledgeFlow” (new process model inspired interface)

69/10/2012

Page 7: Weka

Explorer: preExplorer: pre--processing the dataprocessing the data

� Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary

� Data can also be read from a URL or from an SQL database (using JDBC)

Pre-processing tools in WEKA are called

9/10/2012 7

� Pre-processing tools in WEKA are called “filters”

� WEKA contains filters for:◦ Discretization, normalization, resampling,

attribute selection, transforming and combining attributes, …

Page 8: Weka

� DatabaseUtils.props.hsql - HSQLDB

� DatabaseUtils.props.msaccess - MS Access

DatabaseUtils.props.mssqlserver - MS SQL Server

�jdbcDriver�jdbcURL

ACCESSING DATABASEACCESSING DATABASE

� DatabaseUtils.props.mssqlserver - MS SQL Server

� DatabaseUtils.props.mysql - MySQL

� DatabaseUtils.props.odbc - ODBC access via ODBC/JDBC bridge,

� DatabaseUtils.props.oracle - Oracle 10g

� DatabaseUtils.props.postgresql - PostgreSQL 7.4

� DatabaseUtils.props.sqlite3 - sqlite 3.x

Page 9: Weka

@relation heart-disease-simplified

@attribute age numeric

@attribute sex { female, male}

@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}

@attribute cholesterol numeric

@attribute exercise_induced_angina { no, yes}

WEKA “flat” filesWEKA “flat” files

9/10/2012 9

@attribute exercise_induced_angina { no, yes}

@attribute class { present, not_present}

@data

63,male,typ_angina,233,no,not_present

67,male,asympt,286,yes,present

67,male,asympt,229,yes,present

38,female,non_anginal,?,no,not_present

...

Page 10: Weka

9/10/2012 University of Waikato 10

Page 11: Weka

9/10/2012 University of Waikato 11

Page 12: Weka

9/10/2012 University of Waikato 12

Page 13: Weka

9/10/2012 University of Waikato 13

Page 14: Weka

WEKA:: Explorer: building “classifiers”WEKA:: Explorer: building “classifiers”

� Classifiers in WEKA are models for predicting nominal or numeric quantities

� Implemented learning schemes include:◦ Decision trees and lists, instance-based classifiers,

support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

� “Meta”-classifiers include:◦ Bagging, boosting, stacking, error-correcting output

codes, locally weighted learning, …

Page 15: Weka
Page 16: Weka

Explorer: clustering dataExplorer: clustering data

� WEKA contains “clusterers” for finding groups of similar instances in a dataset

� Implemented schemes are:

◦ k-Means, EM, Cobweb, X-means, FarthestFirst

9/10/2012 16

◦ k-Means, EM, Cobweb, X-means, FarthestFirst

� Clusters can be visualized and compared to “true” clusters

Page 17: Weka

Explorer: finding associationsExplorer: finding associations

� WEKA contains an implementation of the Apriori algorithm for learning association rules

◦ Works only with discrete data

� Can identify statistical dependencies between groups of attributes:

9/10/2012 17

groups of attributes:

◦ milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)

� Apriori can compute all rules that have a given minimum support and exceed a given confidence

Page 18: Weka

Explorer: attribute selectionExplorer: attribute selection

� Panel that can be used to investigate which (subsets of) attributes are the most predictive ones

� Attribute selection methods contain two parts:

9/10/2012 18

◦ A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking

◦ An evaluation method: correlation-based, wrapper, information gain, chi-squared, …

� Very flexible: WEKA allows (almost) arbitrary combinations of these two

Page 19: Weka

Explorer: data visualizationExplorer: data visualization

� Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem

� WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)◦ To do: rotating 3-d visualizations (Xgobi-style)

9/10/2012 19

◦ To do: rotating 3-d visualizations (Xgobi-style)

� Color-coded class values� “Jitter” option to deal with nominal

attributes (and to detect “hidden” data points)

� “Zoom-in” function

Page 20: Weka
Page 21: Weka
Page 22: Weka
Page 23: Weka

Thank UThank U