173
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Machine Learning with WEKA

Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression

Embed Size (px)

Citation preview

Department of Computer Science, University of Waikato, New Zealand

Eibe Frank

WEKA: A Machine Learning Toolkit

The Explorer• Classification and

Regression• Clustering• Association Rules• Attribute Selection• Data Visualization

The Experimenter The Knowledge

Flow GUI Conclusions

Machine Learning with WEKA

04/18/23 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer ([email protected])

04/18/23 University of Waikato 3

WEKA: the software Machine learning/data mining software written in

Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features:

Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods

Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms

04/18/23 University of Waikato 4

WEKA: versions There are several versions of WEKA:

WEKA 3.0: “book version” compatible with description in data mining book

WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)

WEKA 3.3: “development version” with lots of improvements

This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

04/18/23 University of Waikato 5

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

04/18/23 University of Waikato 6

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

04/18/23 University of Waikato 7

04/18/23 University of Waikato 8

04/18/23 University of Waikato 9

04/18/23 University of Waikato 10

Explorer: pre-processing the data Data can be imported from a file in various

formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL

database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for:

Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

04/18/23 University of Waikato 11

04/18/23 University of Waikato 12

04/18/23 University of Waikato 13

04/18/23 University of Waikato 14

04/18/23 University of Waikato 15

04/18/23 University of Waikato 16

04/18/23 University of Waikato 17

04/18/23 University of Waikato 18

04/18/23 University of Waikato 19

04/18/23 University of Waikato 20

04/18/23 University of Waikato 21

04/18/23 University of Waikato 22

04/18/23 University of Waikato 23

04/18/23 University of Waikato 24

04/18/23 University of Waikato 25

04/18/23 University of Waikato 26

04/18/23 University of Waikato 27

04/18/23 University of Waikato 28

04/18/23 University of Waikato 29

04/18/23 University of Waikato 30

04/18/23 University of Waikato 31

04/18/23 University of Waikato 32

Explorer: building “classifiers” Classifiers in WEKA are models for predicting

nominal or numeric quantities Implemented learning schemes include:

Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

“Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output

codes, locally weighted learning, …

04/18/23 University of Waikato 33

04/18/23 University of Waikato 34

04/18/23 University of Waikato 35

04/18/23 University of Waikato 36

04/18/23 University of Waikato 37

04/18/23 University of Waikato 38

04/18/23 University of Waikato 39

04/18/23 University of Waikato 40

04/18/23 University of Waikato 41

04/18/23 University of Waikato 42

04/18/23 University of Waikato 43

04/18/23 University of Waikato 44

04/18/23 University of Waikato 45

04/18/23 University of Waikato 46

04/18/23 University of Waikato 47

04/18/23 University of Waikato 48

04/18/23 University of Waikato 49

04/18/23 University of Waikato 50

04/18/23 University of Waikato 51

04/18/23 University of Waikato 52

04/18/23 University of Waikato 53

04/18/23 University of Waikato 54

04/18/23 University of Waikato 55

04/18/23 University of Waikato 56

04/18/23 University of Waikato 57

04/18/23 University of Waikato 58

04/18/23 University of Waikato 59

04/18/23 University of Waikato 60

04/18/23 University of Waikato 61

04/18/23 University of Waikato 62

04/18/23 University of Waikato 63

04/18/23 University of Waikato 64

04/18/23 University of Waikato 65QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 66QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 67QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 68

04/18/23 University of Waikato 69

04/18/23 University of Waikato 70

04/18/23 University of Waikato 71

04/18/23 University of Waikato 72

04/18/23 University of Waikato 73

04/18/23 University of Waikato 74

04/18/23 University of Waikato 75

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 76

04/18/23 University of Waikato 77

04/18/23 University of Waikato 78

04/18/23 University of Waikato 79

04/18/23 University of Waikato 80

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 81

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 82

04/18/23 University of Waikato 83

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

04/18/23 University of Waikato 84

04/18/23 University of Waikato 85

04/18/23 University of Waikato 86

04/18/23 University of Waikato 87

04/18/23 University of Waikato 88

04/18/23 University of Waikato 89

04/18/23 University of Waikato 90

04/18/23 University of Waikato 91

04/18/23 University of Waikato 92

Explorer: clustering data WEKA contains “clusterers” for finding groups of

similar instances in a dataset Implemented schemes are:

k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true”

clusters (if given) Evaluation based on loglikelihood if clustering

scheme produces a probability distribution

04/18/23 University of Waikato 93

04/18/23 University of Waikato 94

04/18/23 University of Waikato 95

04/18/23 University of Waikato 96

04/18/23 University of Waikato 97

04/18/23 University of Waikato 98

04/18/23 University of Waikato 99

04/18/23 University of Waikato 100

04/18/23 University of Waikato 101

04/18/23 University of Waikato 102

04/18/23 University of Waikato 103

04/18/23 University of Waikato 104

04/18/23 University of Waikato 105

04/18/23 University of Waikato 106

04/18/23 University of Waikato 107

04/18/23 University of Waikato 108

Explorer: finding associations WEKA contains an implementation of the Apriori

algorithm for learning association rules Works only with discrete data

Can identify statistical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and

support 2000) Apriori can compute all rules that have a given

minimum support and exceed a given confidence

04/18/23 University of Waikato 109

04/18/23 University of Waikato 110

04/18/23 University of Waikato 111

04/18/23 University of Waikato 112

04/18/23 University of Waikato 113

04/18/23 University of Waikato 114

04/18/23 University of Waikato 115

04/18/23 University of Waikato 116

Explorer: attribute selection Panel that can be used to investigate which

(subsets of) attributes are the most predictive ones Attribute selection methods contain two parts:

A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking

An evaluation method: correlation-based, wrapper, information gain, chi-squared, …

Very flexible: WEKA allows (almost) arbitrary combinations of these two

04/18/23 University of Waikato 117

04/18/23 University of Waikato 118

04/18/23 University of Waikato 119

04/18/23 University of Waikato 120

04/18/23 University of Waikato 121

04/18/23 University of Waikato 122

04/18/23 University of Waikato 123

04/18/23 University of Waikato 124

04/18/23 University of Waikato 125

Explorer: data visualization Visualization very useful in practice: e.g. helps to

determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and

pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values “Jitter” option to deal with nominal attributes (and

to detect “hidden” data points) “Zoom-in” function

04/18/23 University of Waikato 126

04/18/23 University of Waikato 127

04/18/23 University of Waikato 128

04/18/23 University of Waikato 129

04/18/23 University of Waikato 130

04/18/23 University of Waikato 131

04/18/23 University of Waikato 132

04/18/23 University of Waikato 133

04/18/23 University of Waikato 134

04/18/23 University of Waikato 135

04/18/23 University of Waikato 136

04/18/23 University of Waikato 137

04/18/23 University of Waikato 138

Performing experiments Experimenter makes it easy to compare the

performance of different learning schemes For classification and regression problems Results can be written into file or database Evaluation options: cross-validation, learning

curve, hold-out Can also iterate over different parameter settings Significance-testing built in!

04/18/23 University of Waikato 139

04/18/23 University of Waikato 140

04/18/23 University of Waikato 141

04/18/23 University of Waikato 142

04/18/23 University of Waikato 143

04/18/23 University of Waikato 144

04/18/23 University of Waikato 145

04/18/23 University of Waikato 146

04/18/23 University of Waikato 147

04/18/23 University of Waikato 148

04/18/23 University of Waikato 149

04/18/23 University of Waikato 150

04/18/23 University of Waikato 151

04/18/23 University of Waikato 152

The Knowledge Flow GUI New graphical user interface for WEKA Java-Beans-based interface for setting up and

running machine learning experiments Data sources, classifiers, etc. are beans and can

be connected graphically Data “flows” through components: e.g.,

“data source” -> “filter” -> “classifier” -> “evaluator” Layouts can be saved and loaded again later

04/18/23 University of Waikato 153

04/18/23 University of Waikato 154

04/18/23 University of Waikato 155

04/18/23 University of Waikato 156

04/18/23 University of Waikato 157

04/18/23 University of Waikato 158

04/18/23 University of Waikato 159

04/18/23 University of Waikato 160

04/18/23 University of Waikato 161

04/18/23 University of Waikato 162

04/18/23 University of Waikato 163

04/18/23 University of Waikato 164

04/18/23 University of Waikato 165

04/18/23 University of Waikato 166

04/18/23 University of Waikato 167

04/18/23 University of Waikato 168

04/18/23 University of Waikato 169

04/18/23 University of Waikato 170

04/18/23 University of Waikato 171

04/18/23 University of Waikato 172

04/18/23 University of Waikato 173

Conclusion: try it yourself! WEKA is available at

http://www.cs.waikato.ac.nz/ml/weka Also has a list of projects based on WEKA WEKA contributors:

Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer , Brent Martin, Peter Flach, Eibe Frank ,Gabi Schmidberger ,Ian H. Witten , J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall ,Remco Bouckaert , Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony

Voyle, Xin Xu, Yong Wang, Zhihai Wang