Incremental Learning using WEKA CS267: Data Mining Presentation Guided By: Dr. Tran - Rohit Vobbilisetty


DESCRIPTION

How to train a model incrementally using the WEKA 3.7 developer version. The model used is Stochastic Gradient Descent (SGD).


Page 1: Incremental Learning using WEKA

Incremental Learning using WEKA

CS267: Data Mining Presentation
Guided By: Dr. Tran

- Rohit Vobbilisetty

Page 2: Incremental Learning using WEKA

WEKA - Definition
Incremental Learning - Definition
Incremental Learning in WEKA
Steps to train an UpdateableClassifier
Stochastic Gradient Descent
Sample Code, Result and Demo

Overview

Page 3: Incremental Learning using WEKA

Weka (Waikato Environment for Knowledge Analysis) is a collection of machine learning algorithms for data mining tasks.

Weka 3.7 (Developer version)

What is WEKA?

Page 4: Incremental Learning using WEKA

Update the model one instance at a time as the dataset is read, instead of training on the whole dataset at once.

Suitable for large datasets that do not fit into the computer's memory.

Incremental Learning Definition and Need
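To make the idea of a per-instance update concrete, here is a minimal Java sketch. It is not WEKA code: a running mean stands in for the model, and the class name RunningMean and its methods are made up for this illustration. The point is only that each value is folded into the estimate and then discarded, so memory use does not grow with the size of the dataset.

```java
// Illustrative only: a running mean stands in for a model that is updated per instance.
final class RunningMean {
    private long count = 0;
    private double mean = 0.0;

    /** Folds one new value into the estimate without storing earlier values. */
    void update(double x) {
        count++;
        mean += (x - mean) / count;
    }

    double mean() {
        return mean;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        RunningMean model = new RunningMean();
        // In a real setting these values would be read one by one from a large file.
        for (double x : new double[] {2.0, 4.0, 6.0, 8.0}) {
            model.update(x);
        }
        System.out.println("instances seen: " + model.count() + ", mean: " + model.mean());
    }
}
```

An updateable classifier follows the same pattern, except that the state being updated is a set of model parameters rather than a single number.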

Page 5: Incremental Learning using WEKA

Applicable to models implementing the interface weka.classifiers.UpdateableClassifier (http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClassifier.html)

Models implementing this interface:

HoeffdingTree, IBk, KStar, LWL, MultiClassClassifierUpdateable, NaiveBayesMultinomialText, NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable, SGD, SGDText

Incremental Learning - Weka

Page 6: Incremental Learning using WEKA

Initialize an ArffLoader object and point it at the training file.

Retrieve the dataset structure from the loader and set its class index (the feature to be predicted) with setClassIndex().

Initialize the classifier on the empty structure with buildClassifier().

Iteratively retrieve each instance from the training set and update the classifier with updateClassifier().

Evaluate the trained model against the test dataset.

Steps to train an UpdateableClassifier
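The shape of these steps (stream the training rows, update after each one, then score a test set) can be sketched without WEKA. Everything below is hypothetical and only illustrative: OnlineModel is a made-up stand-in for an updateable classifier, MajorityLabel is a deliberately trivial model, and the comma-separated rows are invented rather than taken from vote.arff.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StreamingTrainingSketch {

    /** Hypothetical stand-in for an updateable classifier; not a WEKA type. */
    interface OnlineModel {
        void update(String[] row, int classIndex);
        String predict(String[] row);
    }

    /** Toy model: always predicts the class label seen most often so far. */
    static final class MajorityLabel implements OnlineModel {
        private final Map<String, Integer> counts = new HashMap<>();
        private String best = null;
        private int bestCount = 0;

        @Override
        public void update(String[] row, int classIndex) {
            int c = counts.merge(row[classIndex], 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = row[classIndex];
            }
        }

        @Override
        public String predict(String[] row) {
            return best;
        }
    }

    /** Consumes one row at a time and updates the model; rows are not accumulated. */
    static void train(Iterable<String> rows, int classIndex, OnlineModel model) {
        for (String line : rows) {
            if (line.trim().isEmpty()) continue;
            model.update(line.split(","), classIndex);
        }
    }

    /** Fraction of test rows whose predicted label matches the true label. */
    static double accuracy(Iterable<String> rows, int classIndex, OnlineModel model) {
        int total = 0;
        int correct = 0;
        for (String line : rows) {
            if (line.trim().isEmpty()) continue;
            String[] row = line.split(",");
            total++;
            if (row[classIndex].equals(model.predict(row))) correct++;
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Invented rows: two y/n features followed by the class label at index 2.
        List<String> training = Arrays.asList("y,n,democrat", "n,y,republican", "y,y,democrat");
        List<String> test = Arrays.asList("n,n,democrat", "y,n,republican");

        OnlineModel model = new MajorityLabel();
        train(training, 2, model);
        System.out.println("accuracy on test rows: " + accuracy(test, 2, model));
    }
}
```

In WEKA the loader plays the role of the row source and the classifier's updateClassifier() plays the role of update(); the toy model here ignores the features entirely and exists only to keep the loop visible.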

Page 7: Incremental Learning using WEKA

Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.

Applicable to large datasets, since each iteration involves processing only a single instance of the training dataset.

Stochastic Gradient Descent

w: the parameter (weight vector) to be estimated. Qi(w): the value of the objective function for the i-th instance of the data.
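The formula that accompanied this slide did not survive extraction. As a reminder, the standard textbook form of the method is given below; the symbols n (number of training instances) and \eta (learning rate) are introduced here and are not taken from the slide.

```latex
% Objective written as a sum of per-instance terms
Q(w) = \sum_{i=1}^{n} Q_i(w)

% One stochastic gradient descent step, using a single instance i
w \leftarrow w - \eta \, \nabla Q_i(w)
```

Batch gradient descent would use the gradient of the full sum Q(w) in each step; the stochastic variant replaces it with the gradient of one term, which is why a single instance per iteration is enough.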

Page 8: Incremental Learning using WEKA

Name: vote.arff (17 features)

Features (number of values, values):

Class Name: 2 (democrat, republican)
handicapped-infants: 2 (y,n)
water-project-cost-sharing: 2 (y,n)
adoption-of-the-budget-resolution: 2 (y,n)
physician-fee-freeze: 2 (y,n)
el-salvador-aid: 2 (y,n)
religious-groups-in-schools: 2 (y,n)
anti-satellite-test-ban: 2 (y,n)
aid-to-nicaraguan-contras: 2 (y,n)
mx-missile: 2 (y,n)
immigration: 2 (y,n)
synfuels-corporation-cutback: 2 (y,n)
education-spending: 2 (y,n)
superfund-right-to-sue: 2 (y,n)
crime: 2 (y,n)
duty-free-exports: 2 (y,n)
export-administration-act-south-africa: 2 (y,n)

Sample DataSet Description

Page 9: Incremental Learning using WEKA

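import java.io.File;
import weka.classifiers.UpdateableClassifier;
import weka.classifiers.functions.SGD;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ArffLoader;

// Open the training file; only the header is read at this point
ArffLoader loader = new ArffLoader();
loader.setFile(new File("Training File Path"));
Instances structure = loader.getStructure();
structure.setClassIndex(16); // Set the feature to be predicted

// Configure the classifier
SGD classifier = new SGD();
classifier.setEpochs(500);
classifier.setEpsilon(0.001);
// Required if dealing with binary class
classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION));

// Initialize the classifier on the header (no instances yet)
classifier.buildClassifier(structure);

// Incrementally update the classifier
Instance current;
while ((current = loader.getNextInstance(structure)) != null) {
    ((UpdateableClassifier) classifier).updateClassifier(current);
}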

Sample Code - SGD
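For intuition about what a single per-instance update with hinge loss does, here is a plain-Java sketch. It is not WEKA's SGD implementation: it leaves out regularization and any attribute preprocessing, and the class HingeSgdSketch, its fields, and the two-instance toy stream are all assumptions made for this illustration. Labels are encoded as +1 and -1.

```java
// Illustrative only: one stochastic gradient step per instance on the hinge loss.
final class HingeSgdSketch {
    private final double[] weights;
    private double bias = 0.0;
    private final double learningRate;

    HingeSgdSketch(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Linear score: bias plus the dot product of weights and features. */
    double score(double[] x) {
        double s = bias;
        for (int j = 0; j < weights.length; j++) {
            s += weights[j] * x[j];
        }
        return s;
    }

    /** One SGD step on max(0, 1 - y * score(x)); y must be +1 or -1. */
    void update(double[] x, int y) {
        // The hinge loss has a non-zero gradient only when the margin is below 1.
        if (y * score(x) < 1.0) {
            for (int j = 0; j < weights.length; j++) {
                weights[j] += learningRate * y * x[j];
            }
            bias += learningRate * y;
        }
    }

    int predict(double[] x) {
        return score(x) >= 0.0 ? 1 : -1;
    }

    public static void main(String[] args) {
        HingeSgdSketch model = new HingeSgdSketch(2, 0.1);
        double[] positive = {1.0, 0.0};
        double[] negative = {0.0, 1.0};
        // Simulated stream: the same two instances arrive repeatedly.
        for (int round = 0; round < 20; round++) {
            model.update(positive, 1);
            model.update(negative, -1);
        }
        System.out.println("prediction for positive example: " + model.predict(positive));
        System.out.println("prediction for negative example: " + model.predict(negative));
    }
}
```

Each call to update() touches only the current instance, which is the property that lets updateClassifier() in the sample code run over a file too large to hold in memory.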

Page 10: Incremental Learning using WEKA

Class =

    -0.26 handicapped-infants
 +  -0.09 water-project-cost-sharing
 +  -0.51 adoption-of-the-budget-resolution
 +   0.73 physician-fee-freeze
 +   0.33 el-salvador-aid
 +   0.04 religious-groups-in-schools
 +  -0.14 anti-satellite-test-ban
 +  -0.33 aid-to-nicaraguan-contras
 +  -0.28 mx-missile
 +   0.1  immigration
 +  -0.37 synfuels-corporation-cutback
 +   0.33 education-spending
 +   0.15 superfund-right-to-sue
 +   0.18 crime
 +  -0.25 duty-free-exports
 +   0.02 export-administration-act-south-africa
 -   0.11

Sample Output

Correctly Classified Instances         401      92.1839 %
Incorrectly Classified Instances        34       7.8161 %
Kappa statistic                          0.838
Mean absolute error                      0.0782
Root mean squared error                  0.2796
Relative absolute error                 16.482  %
Root relative squared error             57.4214 %
Coverage of cases (0.95 level)          92.1839 %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              435

Confusion Matrix:
242.0  25.0
  9.0 159.0
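The accuracy and kappa figures can be checked by hand from the confusion matrix alone. The sketch below does that arithmetic in plain Java; the class name ConfusionStats and its methods are made up for this illustration, and it assumes rows are actual classes and columns are predicted classes.

```java
// Illustrative only: recomputes accuracy and Cohen's kappa from a confusion matrix.
final class ConfusionStats {

    static double total(double[][] m) {
        double sum = 0.0;
        for (double[] row : m) {
            for (double cell : row) {
                sum += cell;
            }
        }
        return sum;
    }

    /** Observed agreement: the diagonal divided by the total count. */
    static double accuracy(double[][] m) {
        double diagonal = 0.0;
        for (int i = 0; i < m.length; i++) {
            diagonal += m[i][i];
        }
        return diagonal / total(m);
    }

    /** Cohen's kappa: (observed - expected agreement) / (1 - expected agreement). */
    static double kappa(double[][] m) {
        double n = total(m);
        double expected = 0.0;
        for (int i = 0; i < m.length; i++) {
            double rowSum = 0.0;
            double colSum = 0.0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[i][j];
                colSum += m[j][i];
            }
            expected += (rowSum * colSum) / (n * n);
        }
        double observed = accuracy(m);
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        double[][] matrix = {{242.0, 25.0}, {9.0, 159.0}};
        System.out.printf("accuracy = %.4f %%%n", 100.0 * accuracy(matrix));
        System.out.printf("kappa    = %.4f%n", kappa(matrix));
    }
}
```

With the matrix above, the diagonal sums to 401 out of 435 instances, which gives the reported 92.1839 %, and the kappa value rounds to the reported 0.838.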

Page 11: Incremental Learning using WEKA

The SGD class does not support a numeric class attribute unless it is configured to use Huber loss or squared loss.

The learning rate should be neither too small (slow convergence) nor too large (overshooting the minimum).

Some errors had to be resolved by consulting the WEKA Java code.

Challenges Faced
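The learning-rate trade-off mentioned above is easy to see on a toy problem. The sketch below runs plain gradient descent on the one-dimensional function f(w) = (w - 3)^2, which has its minimum at w = 3. This is not the vote dataset and not WEKA's SGD; the class name LearningRateDemo, the function, and the three learning rates are chosen purely for illustration.

```java
// Illustrative only: effect of the learning rate on gradient descent for f(w) = (w - 3)^2.
final class LearningRateDemo {

    /** Runs a fixed number of gradient steps from w = 0 and returns the final w. */
    static double descend(double learningRate, int steps) {
        double w = 0.0;
        for (int i = 0; i < steps; i++) {
            double gradient = 2.0 * (w - 3.0); // derivative of (w - 3)^2
            w -= learningRate * gradient;
        }
        return w;
    }

    /** Distance from the minimum at w = 3 after the given number of steps. */
    static double errorAfter(double learningRate, int steps) {
        return Math.abs(descend(learningRate, steps) - 3.0);
    }

    public static void main(String[] args) {
        int steps = 50;
        for (double rate : new double[] {0.001, 0.1, 1.1}) {
            System.out.println("learning rate " + rate
                    + " -> distance from minimum after " + steps + " steps: "
                    + errorAfter(rate, steps));
        }
    }
}
```

With the smallest rate the iterate has barely moved after 50 steps, with the middle rate it is very close to the minimum, and with the largest rate every step overshoots by more than it corrects, so the distance grows instead of shrinking.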

Page 13: Incremental Learning using WEKA

Thank You