
Slide 1: Evaluation

Slide 2: Be a classifier!

Interactive decision tree construction

• Load segment-challenge.arff; look at the dataset

• Select UserClassifier (tree classifier)

• Use the supplied test set segment-test.arff

• Examine data visualizer and tree visualizer

• Plot region-centroid-row vs. intensity-mean

• Rectangle, Polygon and Polyline selection tools

… several selections …

• Right-click in the Tree visualizer and Accept the tree

Over to you: how well can you do?

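If you prefer to script the loading step rather than use the Explorer, here is a minimal sketch using Weka's Java API (it assumes weka.jar is on the classpath and segment-challenge.arff is in the working directory):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class InspectSegmentData {
        public static void main(String[] args) throws Exception {
            // Load the ARFF file; DataSource picks the loader from the extension
            Instances data = DataSource.read("segment-challenge.arff");
            // By convention the class is the last attribute
            data.setClassIndex(data.numAttributes() - 1);
            // Per-attribute summary statistics, like the Preprocess panel
            System.out.println(data.toSummaryString());
        }
    }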

Slide 3: Be a classifier!

Build a tree: what strategy did you use?

Given enough time, you could produce a “perfect” tree for the dataset

• but would it perform well on the test set?


Slide 4: Training and Testing

[Diagram: training data → ML algorithm → classifier → deploy; the classifier is applied to test data to give evaluation results]

Slide 5: Training and Testing

[Diagram as on slide 4: training data → ML algorithm → classifier → deploy; classifier + test data → evaluation results]

Basic assumption: training and test sets produced by independent sampling from an infinite population

Slide 6: Training and Testing

Use J48 to analyze the segment dataset

• Open file segment-challenge.arff

• Choose the J48 decision tree learner (trees > J48)

• Supplied test set segment-test.arff

• Run it: 96% accuracy

• Evaluate on training set: 99% accuracy

• Evaluate on percentage split: 95% accuracy

• Do it again: get exactly the same result!

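For reference, the supplied-test-set run can also be scripted against Weka's Java API; a minimal sketch, assuming both ARFF files sit in the working directory:

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SuppliedTestSet {
        public static void main(String[] args) throws Exception {
            Instances train = DataSource.read("segment-challenge.arff");
            Instances test  = DataSource.read("segment-test.arff");
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            J48 tree = new J48();            // default parameters, as in the Explorer
            tree.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);  // evaluate on the supplied test set
            System.out.printf("Accuracy: %.1f%%%n", eval.pctCorrect());
        }
    }

Because neither the data nor the learner involves any randomness here, running this twice gives exactly the same result, which is the puzzle the next slide takes up.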

Slide 7: Training and Testing

Basic assumption:
• training and test sets sampled independently from an infinite population

Just one dataset? Then hold some out for testing

Expect slight variation in results… but Weka produces the same results each time. Why?
• E.g. J48 on the segment-challenge dataset

Slide 8: Repeated Training and Testing

Evaluate J48 on segment-challenge
• With segment-challenge and J48 (trees > J48)
• Set percentage split to 90%
• Run it: 96.7% accuracy
• [More options] Repeat with a different random number seed
• Use seeds 2, 3, 4, 5, 6, 7, 8, 9, 10

Seed:     1     2     3     4     5     6     7     8     9     10
Accuracy: 0.967 0.940 0.940 0.967 0.953 0.967 0.920 0.947 0.933 0.947
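A sketch of the repeated runs in the Java API, assuming the Explorer's percentage split behaves like shuffle-with-seed-then-split (the split handling here is an assumption about the Explorer's internals, not a documented guarantee):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RepeatedPercentageSplit {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("segment-challenge.arff");
            data.setClassIndex(data.numAttributes() - 1);

            for (int seed = 1; seed <= 10; seed++) {
                Instances copy = new Instances(data);  // leave the original untouched
                copy.randomize(new Random(seed));      // shuffle with this run's seed

                int trainSize = (int) Math.round(copy.numInstances() * 0.9);
                Instances train = new Instances(copy, 0, trainSize);
                Instances test  = new Instances(copy, trainSize,
                                                copy.numInstances() - trainSize);

                J48 tree = new J48();
                tree.buildClassifier(train);
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(tree, test);
                System.out.printf("seed %2d: %.3f%n", seed, eval.pctCorrect() / 100.0);
            }
        }
    }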

Slide 9: Repeated Training and Testing

Evaluate J48 on segment-challenge

Seed:     1     2     3     4     5     6     7     8     9     10
Accuracy: 0.967 0.940 0.940 0.967 0.953 0.967 0.920 0.947 0.933 0.947

Sample mean: $\bar{x} = \frac{\sum_i x_i}{n}$
Variance: $\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$
Standard deviation: $\sigma$

$\bar{x} = 0.949$, $\sigma = 0.0158$
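As a quick check of the arithmetic, the following computes the sample mean and standard deviation of the ten accuracies; from the rounded table values it prints roughly 0.948 and 0.0158 (the slide's 0.949 presumably comes from unrounded accuracies):

    public class MeanStdDev {
        public static void main(String[] args) {
            double[] x = {0.967, 0.940, 0.940, 0.967, 0.953,
                          0.967, 0.920, 0.947, 0.933, 0.947};
            int n = x.length;

            double sum = 0;
            for (double v : x) sum += v;
            double mean = sum / n;                    // sample mean

            double ss = 0;
            for (double v : x) ss += (v - mean) * (v - mean);
            double sd = Math.sqrt(ss / (n - 1));      // n - 1: sample variance

            System.out.printf("mean = %.3f, sd = %.4f%n", mean, sd);
        }
    }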

Slide 10: Repeated Training and Testing

Basic assumption:
• training and test sets sampled independently from an infinite population

Expect slight variation in results… get it by setting the random-number seed

Can calculate the mean and standard deviation experimentally

Slide 11: Baseline Accuracy

Use the diabetes dataset and the default holdout
• Open file diabetes.arff
• Test option: Percentage split

Try these classifiers:
• trees > J48: 76%
• bayes > NaiveBayes: 77%
• lazy > IBk: 73%
• rules > PART: 74%

768 instances (500 negative, 268 positive)
Always guess “negative”: 500/768 = 65%
• rules > ZeroR: most likely class!
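A sketch of the same baseline comparison in code, assuming a shuffled 66% holdout (the Explorer's default percentage split) and seed 1; ZeroR ignores all attributes and always predicts the majority class, which is exactly what makes it a useful baseline:

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.ZeroR;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BaselineCheck {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("diabetes.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Default holdout: shuffle, then 66% train / 34% test
            data.randomize(new Random(1));
            int trainSize = (int) Math.round(data.numInstances() * 0.66);
            Instances train = new Instances(data, 0, trainSize);
            Instances test  = new Instances(data, trainSize,
                                            data.numInstances() - trainSize);

            for (Classifier c : new Classifier[]{new ZeroR(), new J48()}) {
                c.buildClassifier(train);
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(c, test);
                System.out.printf("%s: %.1f%%%n",
                        c.getClass().getSimpleName(), eval.pctCorrect());
            }
        }
    }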

Slide 12: Baseline Accuracy

Sometimes the baseline is best!
• Open supermarket.arff and blindly apply
• rules > ZeroR: 64%
• trees > J48: 63%
• bayes > NaiveBayes: 63%
• lazy > IBk: 38%
• rules > PART: 63%

• The attributes are not informative

• Caution: don’t just apply Weka to a dataset: you need to understand what’s going on

Slide 13: Baseline Accuracy

Consider whether differences are significant

Always try a simple baseline, e.g. rules > ZeroR

Caution: don’t just apply Weka to a dataset: you need to understand what’s going on

Slide 14: Cross-Validation

Can we improve upon repeated holdout (i.e. reduce variance)?

Cross-validation

Stratified cross-validation

Slide 15: Cross-Validation

Repeated holdout: hold out 10% for testing, repeat 10 times

Slide 16: Cross-Validation

10-fold cross-validation
• Divide the dataset into 10 parts (folds)
• Hold out each part in turn
• Average the results
• Each data point is used once for testing, 9 times for training

Stratified cross-validation
• Ensure that each fold has the right proportion of each class value
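In the Java API, one call does all of this: crossValidateModel shuffles the data, stratifies the folds when the class is nominal, trains 10 models, and accumulates the results over the held-out folds (a sketch, using the diabetes data that the next slides work with):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TenFoldCV {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("diabetes.arff");
            data.setClassIndex(data.numAttributes() - 1);

            Evaluation eval = new Evaluation(data);
            // Randomizes, stratifies (nominal class), builds 10 models,
            // and aggregates statistics over the 10 held-out folds
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.printf("10-fold CV accuracy: %.1f%%%n", eval.pctCorrect());
        }
    }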

Slide 17: Cross-Validation

Cross-validation is better than repeated holdout

Stratified cross-validation is even better

Practical rule of thumb:
• Lots of data? Use percentage split
• Otherwise, use stratified 10-fold cross-validation

Slide 18: Cross-Validation Results

Is cross-validation really better than repeated holdout?

Diabetes dataset
• Baseline accuracy (rules > ZeroR): 65.1%
• trees > J48, 10-fold cross-validation: 73.8%

… with different random number seeds:

Seed:         1    2    3    4    5    6    7    8    9    10
Accuracy (%): 73.8 75.0 75.5 75.5 74.4 75.6 73.6 74.0 74.5 73.0
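The table can be reproduced with a loop over seeds; a sketch (exact figures may differ slightly across Weka versions):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CVSeeds {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("diabetes.arff");
            data.setClassIndex(data.numAttributes() - 1);

            for (int seed = 1; seed <= 10; seed++) {
                Evaluation eval = new Evaluation(data);
                // A fresh Random per run: the seed controls how instances
                // are shuffled into folds, hence the run-to-run variation
                eval.crossValidateModel(new J48(), data, 10, new Random(seed));
                System.out.printf("seed %2d: %.1f%%%n", seed, eval.pctCorrect());
            }
        }
    }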

Slide 19: Cross-Validation Results

Holdout (10%):              75.3 77.9 80.5 74.0 71.4 70.1 79.2 71.4 80.5 67.5
Cross-validation (10-fold): 73.8 75.0 75.5 75.5 74.4 75.6 73.6 74.0 74.5 73.0

Sample mean: $\bar{x} = \frac{\sum_i x_i}{n}$
Variance: $\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$
Standard deviation: $\sigma$

Holdout: $\bar{x} = 74.8$, $\sigma = 4.6$
Cross-validation: $\bar{x} = 74.5$, $\sigma = 0.9$

Slide 20: Cross-Validation Results

Why 10-fold? E.g. 20-fold: 75.1%

Cross-validation really is better than repeated holdout
• It reduces the variance of the estimate

Slide 21: Evaluation Methods Exercises

Slide 22: Plan

To evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.

Slide 23: Classification on Tic-Tac-Toe

Download the Tic-Tac-Toe dataset tic-tac-toe.zip from the Course Page.

Work as a team to evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.

Slide 24: Evaluation Methods

Using Training Set (use 100% of instances to train/learn and use 100% of instances to test performance)

10-fold Cross-Validation

Split 70% (use 70% of instances to train/learn and the remaining 30% of instances to test performance)
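These three test options correspond roughly to the following Java API calls; a sketch, assuming the class is the last attribute of tic-tac-toe.arff:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class EvaluationMethods {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("tic-tac-toe.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 1. Use training set: train and test on the same instances
            J48 tree = new J48();
            tree.buildClassifier(data);
            Evaluation onTrain = new Evaluation(data);
            onTrain.evaluateModel(tree, data);
            System.out.printf("Training set: %.1f%%%n", onTrain.pctCorrect());

            // 2. 10-fold cross-validation
            Evaluation cv = new Evaluation(data);
            cv.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.printf("10-fold CV:   %.1f%%%n", cv.pctCorrect());

            // 3. Percentage split: 70% train / 30% test after shuffling
            Instances copy = new Instances(data);
            copy.randomize(new Random(1));
            int trainSize = (int) Math.round(copy.numInstances() * 0.7);
            Instances train = new Instances(copy, 0, trainSize);
            Instances test  = new Instances(copy, trainSize,
                                            copy.numInstances() - trainSize);
            J48 splitTree = new J48();
            splitTree.buildClassifier(train);
            Evaluation split = new Evaluation(train);
            split.evaluateModel(splitTree, test);
            System.out.printf("70/30 split:  %.1f%%%n", split.pctCorrect());
        }
    }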

Slide 25: Classifiers Being Used

Decision Tree
• trees → J48

Neural Network
• functions → MultilayerPerceptron (trainingTime = 50)

Bayes Network
• bayes → NaiveBayes

Nearest Neighbor
• lazy → IBk (k = 3)
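The two non-default settings map onto setTrainingTime and setKNN in the Java API; a small sketch (the class and method names are Weka's, everything else is illustrative):

    import weka.classifiers.Classifier;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.lazy.IBk;
    import weka.classifiers.trees.J48;

    public class ClassifierSetup {
        public static Classifier[] classifiers() {
            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setTrainingTime(50);   // epochs of backpropagation (trainingTime = 50)

            IBk knn = new IBk();
            knn.setKNN(3);             // 3-nearest-neighbor voting (k = 3)

            return new Classifier[]{new J48(), mlp, new NaiveBayes(), knn};
        }
    }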

Slide 26: Using Weka

• Extract tic-tac-toe.zip to the Weka folder
• Load the Weka program and choose Explorer
• Open tic-tac-toe.arff

Slide 27: Using Weka (cont.)

• Click the Classify tab
• Choose the J48 classifier under trees
• Set the Test options to Use training set
• Enable Output predictions under More options
• Click Start to run
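An API-level sketch of the same steps; “Output predictions” is approximated here by iterating over Evaluation.predictions(), which assumes a Weka version (e.g. 3.8) where evaluateModel retains per-instance predictions by default:

    import weka.classifiers.Evaluation;
    import weka.classifiers.evaluation.Prediction;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OutputPredictions {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("tic-tac-toe.arff");
            data.setClassIndex(data.numAttributes() - 1);

            J48 tree = new J48();
            tree.buildClassifier(data);

            Evaluation eval = new Evaluation(data);
            eval.evaluateModel(tree, data);   // "Use training set"

            // Per-instance predictions, like "Output predictions" in More options
            for (Prediction p : eval.predictions()) {
                System.out.printf("actual=%s predicted=%s%n",
                        data.classAttribute().value((int) p.actual()),
                        data.classAttribute().value((int) p.predicted()));
            }
            System.out.printf("Accuracy: %.1f%%%n", eval.pctCorrect());
        }
    }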

Slide 28: Using Weka (cont.)

[Screenshot: Classifier output panel, highlighting the accuracy rate]

Slide 29: Reporting

• Download Tic-tac-toe-report.docx
• Complete the table evaluating the performance of the different learning methods in Q1
• Find the best performer in Q2, Q3, and Q4
