Introduction to Weka CS4705 – Natural Language Processing Thursday, September 28


Introduction to Weka

CS4705 – Natural Language Processing

Thursday, September 28


What is weka?

● Java-based machine learning tool

● 3 modes of operation

– GUI

– Command Line

– API (not discussed here)

● To run:

– java -Xmx1024M -jar ~cs4705/bin/weka.jar &


weka Homepage

● http://www.cs.waikato.ac.nz/ml/weka/


.arff file format

● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

@relation name

@attribute attrName type

● type is one of: numeric, string, <nominal>, date

...

@data

a,b,c,d,e

● <nominal> := {class1,class2,...,classN}
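As a concrete instance of the layout above, the following Python sketch writes a tiny ARFF file and reads its header back. The relation name, attribute names, and data rows are invented for illustration, and the reader is a simplification that handles only unquoted, one-per-line declarations like the ones shown here, not the full ARFF specification.

```python
import os
import tempfile

# Hypothetical toy dataset; names and values are made up for illustration.
ARFF_TEXT = """@relation weather

@attribute temperature numeric
@attribute outlook {sunny,overcast,rainy}
@attribute play {yes,no}

@data
85,sunny,no
83,overcast,yes
70,rainy,yes
"""


def read_arff_header(path):
    """Return (relation, [(attribute name, type)], number of data rows).

    Simplified: assumes unquoted names and one declaration per line.
    """
    relation, attributes, rows = None, [], 0
    in_data = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("%"):  # '%' starts an ARFF comment
                continue
            if in_data:
                rows += 1
                continue
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@relation":
                relation = line.split(None, 1)[1]
            elif keyword == "@attribute":
                _, name, attr_type = line.split(None, 2)
                attributes.append((name, attr_type))
            elif keyword == "@data":
                in_data = True
    return relation, attributes, rows


def demo():
    """Write the toy file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.arff")
        with open(path, "w") as handle:
            handle.write(ARFF_TEXT)
        return read_arff_header(path)


if __name__ == "__main__":
    print(demo())
```

Each data row lists one value per declared attribute, in declaration order, so a quick check that every row has as many fields as there are @attribute lines catches many malformed files before Weka sees them.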


Example Arff Files

● http://sourceforge.net/projects/weka

● iris.arff

● cmc.arff


To Classify with weka GUI

1. Run weka GUI

2. Click 'Explorer'

3. 'Open file...'

4. Select 'Classify' tab

5. 'Choose' a classifier

6. Confirm options

7. Click 'Start'

8. Wait...

9. Right-click on Result list entry

a. 'Save result buffer'

b. 'Save model'


Classify

● Some classifiers to start with.

– NaiveBayes

– JRip

– J48

– SMO

● Find References by selecting a classifier

● Use Cross-Validation!
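To make the cross-validation advice concrete: N-fold cross-validation splits the data into N parts, trains on N-1 of them, and tests on the held-out part, so every instance is tested exactly once. The Python sketch below shows only the index bookkeeping. It is an illustration of the idea under simple assumptions, not Weka's implementation, which also randomizes the data before splitting; the function name is hypothetical.

```python
def k_fold_indices(n_instances, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds folds."""
    if not 2 <= n_folds <= n_instances:
        raise ValueError("need 2 <= n_folds <= n_instances")
    indices = list(range(n_instances))
    for fold in range(n_folds):
        # Every n_folds-th instance, starting at offset `fold`, is held out.
        test = indices[fold::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("train:", train, "test:", test)
```

Because no instance is ever in the training and test set of the same fold, the accuracy averaged over the folds is a fairer estimate than accuracy measured on the training data itself.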


Analyzing Results

● Important tools for Homework 2

– Accuracy

● “Correctly classified instances”

– Confusion matrix

– Save model

– Visualization
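Accuracy and the confusion matrix are linked: accuracy is the sum of the matrix diagonal divided by the total number of instances. A minimal Python sketch, assuming rows are actual classes and columns are predicted classes; the matrix values are made up, and the function names are hypothetical.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def recall_per_class(matrix):
    """For each actual class (row), the fraction predicted correctly.

    Assumes every class has at least one instance (no all-zero rows).
    """
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Made-up 3-class example: rows = actual class, columns = predicted class.
EXAMPLE = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]

if __name__ == "__main__":
    print("accuracy:", accuracy(EXAMPLE))
    print("recall per class:", recall_per_class(EXAMPLE))
```

The off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.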


Running weka from the Command Line

● Running an N-fold cross-validation experiment

– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N

● Using a predefined test set

– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff


● Saving the model

– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model

● Classifying a test set

– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
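The four invocations above differ only in their flags, so they can be generated from one helper when running many experiments. A sketch in Python, assuming the jar path given in these slides; the names weka_command and run and their parameters are hypothetical, and only the flags shown above (-t, -T, -x, -d, -l) are used.

```python
import os
import subprocess

WEKA_JAR = "~cs4705/bin/weka.jar"  # jar location given in these slides


def weka_command(classifier, train=None, test=None, folds=None,
                 save_model=None, load_model=None, jar=WEKA_JAR):
    """Build the argument list for one Weka command-line run."""
    command = ["java", "-cp", os.path.expanduser(jar), classifier]
    if train is not None:
        command += ["-t", train]          # training data
    if test is not None:
        command += ["-T", test]           # predefined test set
    if folds is not None:
        command += ["-x", str(folds)]     # number of cross-validation folds
    if save_model is not None:
        command += ["-d", save_model]     # write the trained model
    if load_model is not None:
        command += ["-l", load_model]     # read a previously saved model
    return command


def run(command):
    """Run Weka and return its standard output (requires Java and the jar)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # 10-fold cross-validation; the command is printed rather than executed.
    print(" ".join(weka_command("weka.classifiers.bayes.NaiveBayes",
                                train="trainingdata.arff", folds=10)))
```

Swapping the classifier string (for example to weka.classifiers.bayes.NaiveBayes versus another class name from the GUI's 'Choose' dialog) is then the only change needed to compare classifiers on the same data.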


● Analyzing results

– Get predictions from test data

● java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff -p range

– Then DIY with scripts

● awk and sed will be your friends
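awk and sed are the tools recommended above; the same kind of tally is sketched here in Python for readability. It assumes the predictions have already been reduced to lines of the form "actual predicted", separated by whitespace. That two-column layout is an assumption for illustration only: the columns Weka prints with -p depend on the Weka version and on the range given, so the extraction step would need adapting. The function names are hypothetical.

```python
from collections import Counter


def tally(lines):
    """Count (actual, predicted) pairs from 'actual predicted' lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:  # skip blank or malformed lines
            continue
        counts[(fields[0], fields[1])] += 1
    return counts


def accuracy_from_counts(counts):
    """Fraction of pairs where the predicted label equals the actual one."""
    correct = sum(n for (actual, predicted), n in counts.items()
                  if actual == predicted)
    return correct / sum(counts.values())


# Hypothetical reduced predictions, not real Weka output.
SAMPLE = """yes yes
no yes
no no
yes yes
"""

if __name__ == "__main__":
    counts = tally(SAMPLE.splitlines())
    for (actual, predicted), n in sorted(counts.items()):
        print(actual, "->", predicted, n)
    print("accuracy:", accuracy_from_counts(counts))
```

The pair counts are exactly the cells of a confusion matrix, so the same tally supports per-class error analysis as well as overall accuracy.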


● Getting predictions from cross-validation

– “Output Predictions” doesn't cut it.

– export CLASSPATH=~cs4705/bin/:~cs4705/bin/weka.jar

– java callClassifier weka.classifiers.bayes.NaiveBayes -t trainingdata.arff