Evaluation



Slide 1: Evaluation

Slide 2: Be a classifier!

Interactive decision tree construction
• Load segment-challenge.arff; look at the dataset
• Select UserClassifier (tree classifier)
• Use the test set segment-test.arff
• Examine the data visualizer and tree visualizer
• Plot region-centroid-row vs intensity-mean
• Rectangle, Polygon and Polyline selection tools
• … several selections …
• Right-click in the Tree visualizer and Accept the tree

Over to you: how well can you do?

Slide 3: Be a classifier!

Build a tree: what strategy did you use?

Given enough time, you could produce a “perfect” tree for the dataset
• but would it perform well on the test set?

Slide 4: Training and Testing

[Diagram: training data → ML algorithm → classifier → deploy!; the classifier is run on the test data to produce evaluation results]

Slide 5: Training and Testing

[Diagram: training data → ML algorithm → classifier → deploy!; the classifier is run on the test data to produce evaluation results]

Basic assumption: training and test sets produced by independent sampling from an infinite population

Slide 6: Training and Testing

Use J48 to analyze the segment dataset
• Open file segment-challenge.arff
• Choose J48 decision tree learner (trees > J48)
• Supplied test set segment-test.arff
• Run it: 96% accuracy
• Evaluate on training set: 99% accuracy
• Evaluate on percentage split: 95% accuracy
• Do it again: get exactly the same result!

Slide 7: Training and Testing

Basic assumption:
• training and test sets sampled independently from an infinite population

Just one dataset? Hold some out for testing

Expect slight variation in results… but Weka produces the same results each time. Why?
• E.g. J48 on the segment-challenge dataset

Slide 8: Repeated Training and Testing

Evaluate J48 on segment-challenge
• With segment-challenge and J48 (trees > J48)
• Set percentage split to 90%
• Run it: 96.7% accuracy
• [More options] Repeat with a different seed
• Use seeds 2, 3, 4, 5, 6, 7, 8, 9, 10

Accuracies obtained: 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947

Slide 9: Repeated Training and Testing

Evaluate J48 on segment-challenge

Accuracies: 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947

Sample mean: x̄ = (Σ xᵢ) / n
Variance: σ² = (Σ (xᵢ − x̄)²) / (n − 1)
Standard deviation: σ

x̄ = 0.949, σ = 0.0158
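These formulas are easy to check in a few lines of Python; the accuracy list below is taken from the slide, and `statistics.stdev` uses the same n − 1 denominator as the variance formula:

```python
# Recompute mean and standard deviation of the ten percentage-split
# accuracies (seeds 1-10) reported on the slide.
from statistics import mean, stdev

accuracies = [0.967, 0.940, 0.940, 0.967, 0.953,
              0.967, 0.920, 0.947, 0.933, 0.947]

x_bar = mean(accuracies)      # sample mean
sigma = stdev(accuracies)     # sample standard deviation (n - 1 denominator)

print(f"mean = {x_bar:.3f}, std dev = {sigma:.4f}")
# prints: mean = 0.948, std dev = 0.0158
```

From these rounded accuracies the mean comes out at 0.948; the slide's 0.949 presumably reflects the unrounded figures Weka reports.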

Slide 10: Repeated Training and Testing

Basic assumption:
• training and test sets sampled independently from an infinite population

Expect slight variation in results … get it by setting the random-number seed

Can calculate mean and standard deviation experimentally

Slide 11: Baseline Accuracy

Use the diabetes dataset and default holdout
• Open file diabetes.arff
• Test option: Percentage split
• Try these classifiers:
• trees > J48: 76%
• bayes > NaiveBayes: 77%
• lazy > IBk: 73%
• rules > PART: 74%

768 instances (500 negative, 268 positive)
Always guess “negative”: 500/768 = 65%
• rules > ZeroR: predicts the most likely class!
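ZeroR ignores every attribute and simply predicts the training set's majority class. A minimal sketch, using the diabetes class counts quoted on the slide:

```python
# ZeroR baseline: always predict the most frequent class in the training data.
from collections import Counter

def zero_r(train_labels):
    majority, _count = Counter(train_labels).most_common(1)[0]
    return lambda _instance: majority   # the instance is ignored entirely

# diabetes.arff class distribution from the slide: 500 negative, 268 positive
labels = ["negative"] * 500 + ["positive"] * 268

predict = zero_r(labels)
accuracy = sum(predict(None) == y for y in labels) / len(labels)
print(f"{accuracy:.1%}")   # prints: 65.1%
```

Any learner that cannot beat this number has learned nothing useful from the attributes.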

Slide 12: Baseline Accuracy

Sometimes baseline is best!
• Open supermarket.arff and blindly apply:
• rules > ZeroR: 64%
• trees > J48: 63%
• bayes > NaiveBayes: 63%
• lazy > IBk: 38%
• rules > PART: 63%
• Attributes are not informative
• Caution: don’t just apply Weka to a dataset: you need to understand what’s going on

Slide 13: Baseline Accuracy

Consider whether differences are significant

Always try a simple baseline, e.g. rules > ZeroR

Caution: don’t just apply Weka to a dataset: you need to understand what’s going on

Slide 14: Cross-Validation

Can we improve upon repeated holdout (i.e. reduce variance)?
• Cross-validation
• Stratified cross-validation

Slide 15: Cross-Validation

Repeated holdout: hold out 10% for testing; repeat 10 times
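Repeated holdout can be sketched in a few lines; the 100-instance dataset and majority-class "learner" below are toy stand-ins, not Weka's implementation:

```python
# Repeated holdout: hold out a random 10% for testing, 10 times over.
import random

random.seed(1)                        # a fixed seed makes the runs repeatable
labels = ["yes"] * 70 + ["no"] * 30   # hypothetical 100-instance dataset

scores = []
for _ in range(10):
    shuffled = random.sample(labels, len(labels))
    test, train = shuffled[:10], shuffled[10:]      # 10% held out
    majority = max(set(train), key=train.count)     # toy "learner"
    scores.append(sum(y == majority for y in test) / len(test))

print(scores)   # ten slightly different estimates of the same quantity
```

The spread of these ten scores is exactly the run-to-run variation the following slides set out to reduce.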

Slide 16: Cross-Validation

10-fold cross-validation
• Divide the dataset into 10 parts (folds)
• Hold out each part in turn
• Average the results
• Each data point is used once for testing, 9 times for training

Stratified cross-validation
• Ensure that each fold has the right proportion of each class value
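The fold bookkeeping described above can be sketched directly; this is an illustrative partition of 100 hypothetical instances, not Weka's implementation:

```python
# 10-fold cross-validation: 10 disjoint folds; each instance is tested once
# and used for training in the other 9 folds.
n, k = 100, 10
indices = list(range(n))
folds = [indices[i::k] for i in range(k)]      # 10 disjoint folds

tested = []
for i in range(k):
    test = folds[i]
    train = [x for j in range(k) if j != i for x in folds[j]]
    assert len(train) == n - len(test)         # 9/10 of the data to train on
    tested.extend(test)

assert sorted(tested) == indices               # every point tested exactly once
```

Stratification would additionally require each fold to preserve the class proportions of the full dataset.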

Slide 17: Cross-Validation

Cross-validation is better than repeated holdout

Stratified is even better

Practical rule of thumb:
• Lots of data? Use percentage split
• Otherwise, use stratified 10-fold cross-validation

Slide 18: Cross-Validation Results

Is cross-validation really better than repeated holdout?

Diabetes dataset
• Baseline accuracy (rules > ZeroR): 65.1%
• trees > J48, 10-fold cross-validation: 73.8%
• … with different random-number seeds:

seed:      1     2     3     4     5     6     7     8     9     10
accuracy:  73.8  75.0  75.5  75.5  74.4  75.6  73.6  74.0  74.5  73.0

Slide 19: Cross-Validation Results

holdout (10%):              75.3, 77.9, 80.5, 74.0, 71.4, 70.1, 79.2, 71.4, 80.5, 67.5
cross-validation (10-fold): 73.8, 75.0, 75.5, 75.5, 74.4, 75.6, 73.6, 74.0, 74.5, 73.0

Sample mean: x̄ = (Σ xᵢ) / n
Variance: σ² = (Σ (xᵢ − x̄)²) / (n − 1)
Standard deviation: σ

holdout: x̄ = 74.8, σ = 4.6
cross-validation: x̄ = 74.5, σ = 0.9
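The two standard deviations can be reproduced directly from the figures on the slide:

```python
# Compare the spread of repeated 10% holdout vs 10-fold cross-validation
# (J48 accuracy on the diabetes dataset; figures taken from the slide).
from statistics import mean, stdev

holdout   = [75.3, 77.9, 80.5, 74.0, 71.4, 70.1, 79.2, 71.4, 80.5, 67.5]
cross_val = [73.8, 75.0, 75.5, 75.5, 74.4, 75.6, 73.6, 74.0, 74.5, 73.0]

print(f"holdout:          mean {mean(holdout):.1f}, sd {stdev(holdout):.1f}")
print(f"cross-validation: mean {mean(cross_val):.1f}, sd {stdev(cross_val):.1f}")
# the holdout sd is roughly five times the cross-validation sd
```

Both estimators target the same quantity (means of 74.8 vs 74.5), but cross-validation pins it down far more tightly.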

Slide 20: Cross-Validation Results

Why 10-fold? E.g. 20-fold: 75.1%

Cross-validation really is better than repeated holdout
• It reduces the variance of the estimate

Slide 21: Evaluation Methods: Exercises

Slide 22: Plan

To evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.

Slide 23: Classification on Tic-Tac-Toe

Download the Tic-Tac-Toe dataset tic-tac-toe.zip from the Course Page.

Work as a team to evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.

Slide 24: Evaluation Methods

• Using training set (use 100% of instances to train and 100% of instances to test performance)
• 10-fold cross-validation
• Split 70% (use 70% of instances to train and the remaining 30% to test performance)
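The three set-ups differ only in how instances are allocated to training and testing; a sketch on a hypothetical 100-instance dataset (not Weka's code):

```python
# The three evaluation set-ups, as index bookkeeping over 100 toy instances.
import random

random.seed(42)
data = list(range(100))                    # stand-in instances

# 1. Using training set: train on everything, test on the same everything
#    (an optimistic estimate, as the earlier slides showed: 99% vs 96%).
train_all, test_all = data, data

# 2. 10-fold cross-validation: ten disjoint test folds covering the data.
folds = [data[i::10] for i in range(10)]

# 3. Split 70%: 70% of instances to train, the remaining 30% to test.
shuffled = random.sample(data, len(data))
train_70, test_30 = shuffled[:70], shuffled[70:]

assert len(train_70) == 70 and len(test_30) == 30
assert sorted(x for f in folds for x in f) == data
```

In Weka these correspond to the Test options "Use training set", "Cross-validation (Folds 10)", and "Percentage split (66%/70%)" on the Classify tab.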

Slide 25: Classifiers Being Used

Decision tree
• trees > J48

Neural network
• functions > MultilayerPerceptron (trainingTime = 50)

Bayes network
• bayes > NaiveBayes

Nearest neighbor
• lazy > IBk (k = 3)

Slide 26: Using Weka

• Extract Tic-Tac-Toe.zip to the Weka folder
• Load the Weka program
• Open Tic-Tac-Toe.arff
• Choose Explorer

Slide 27: Using Weka (cont.)

• Click the Classify tab
• Choose the J48 classifier under trees
• Set the Test options to Use training set
• Enable Output predictions in More options
• Click Start to run

Slide 28: Using Weka (cont.)

[Screenshot of the classifier output, highlighting the accuracy rate]

Slide 29: Reporting

• Download Tic-tac-toe-report.docx
• Complete the table evaluating the performance of the different learning methods in Q1
• Find the best performer in Q2, Q3, and Q4