Hierarchical Classification by Jurgen Van Gael


About
• Computer scientist with a background in ML.
• London Machine Learning Meetup.
• Founder of the Math.NET numerical library.
• Previously at Microsoft Research.
• Data science team lead at Rangespan.

Taxonomy Classification
• Input: raw product data
• Output: classification models, classified product data

[Taxonomy diagram]
ROOT
├── Electronics
│   ├── Audio
│   │   └── Audio Cables, Amps, …
│   └── Computers, …
├── Clothing
│   └── Pants, T-Shirts, …
└── Toys
    └── Model Rockets, …

[Pipeline: Data Collection → Feature Extraction → Training/Testing → Labelling]

Feature Extraction

Name: INK-M50 Black Ink Cartridge (600 pages)
Manufacturer: Samsung
Description: null
Label: toner-inkjet-cartridges

"category": "toner-inkjet-cartridges”, "features": ["cartridge", "samsung", "black", "ink", "ink-m50", "pages”]

Feature Extraction:
• Text cleaning (stopwords, lexicalisation)
• Unigram + bigram features
• LDA topic features


h"p://radimrehurek.com/gensim

Training, Testing & Labelling

Hierarchical Classification

[Diagram: the class tree (D over A, C, B; plus E) flattened into a single layer D, A, C, E, B]

4 (5)-way multiclass classification

Hierarchical Classification

[Diagram: the same tree kept hierarchical, with one classifier per internal node: D vs E at the top, then A vs C vs B under D]

2 + 3 way multiclass classification

Naïve Bayes, Neural Network, Logistic Regression, Support Vector Machines, … ?

Logistic Regression - Model

word        printer-ink    printer-hardware
cartridge    4.0            0.3
the          0.0            0.0
samsung      0.5            0.5
black        0.5            0.3
printer     -1.0            2.0
ink          5.0           -1.7
…            …              …

For each class, for each feature present in the document, add the weight, then exponentiate and normalise:

Σ  =  10.0        -0.6
Pr =  0.99997      0.00003
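As a worked version of the sum-exponentiate-normalise step, a small Python sketch whose weight table mirrors the example above:

import math

weights = {
    "printer-ink":      {"cartridge": 4.0, "the": 0.0, "samsung": 0.5,
                         "black": 0.5, "printer": -1.0, "ink": 5.0},
    "printer-hardware": {"cartridge": 0.3, "the": 0.0, "samsung": 0.5,
                         "black": 0.3, "printer": 2.0, "ink": -1.7},
}

features = ["cartridge", "samsung", "black", "ink"]   # features present in the doc

# For each class, sum the weights of the present features...
scores = {c: sum(w.get(f, 0.0) for f in features) for c, w in weights.items()}
# ...then exponentiate and normalise (softmax).
z = sum(math.exp(s) for s in scores.values())
probs = {c: math.exp(s) / z for c, s in scores.items()}
# scores: printer-ink = 10.0, printer-hardware = -0.6
# probs:  printer-ink ≈ 0.99997, printer-hardware ≈ 0.00003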


Logistic Regression - Inference

• Optimise using Wapiti.
• Hyperparameter optimisation using grid search.
• Use a development set to stop training early?


h"p://wapiti.limsi.fr/

[Diagram: ROOT → Electronics / Clothing]


Cross Validation
• Estimate classifier errors.
• DO NOT:
  o Test on training data.
  o Leave data aside.

Calibration
• Are my probability estimates correct?
• Computation:
  o Take the data points x with p(·|x) ≈ 0.9,
  o Check that about 90% of their labels were correct.
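A minimal sketch of that calibration check, assuming the classifier's predictions are available as (predicted probability, was-correct) pairs:

from collections import defaultdict

def calibration_table(predictions, n_bins=10):
    """Bucket predictions by confidence; compare nominal vs. observed accuracy."""
    bins = defaultdict(list)
    for p, correct in predictions:
        bins[min(int(p * n_bins), n_bins - 1)].append(correct)
    for b in sorted(bins):
        observed = sum(bins[b]) / float(len(bins[b]))
        print("p in [%.1f, %.1f): observed accuracy %.2f (n=%d)"
              % (b / float(n_bins), (b + 1) / float(n_bins),
                 observed, len(bins[b])))

# Points predicted with p ≈ 0.9 should be correct about 90% of the time.
calibration_table([(0.9, True)] * 9 + [(0.9, False)])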


Training Data

[Diagram: 5-fold cross validation; per-fold errors 1.2%, 1.1%, 1.2%, 1.2%, 1.3%; averaged error = 1.2%]


[Diagram: ROOT → Electronics / Clothing]

Using Bayes rule to chain classifiers: the probability of a node is the product of the conditional probabilities along its path, e.g.

p(audio | text) = p(audio | electronics, text) × p(electronics | text)
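A sketch of that chaining in code; the node names and probabilities are made up, and each entry in classifiers stands for one trained per-node model:

def leaf_probability(path, text, classifiers):
    """Multiply the conditional probabilities down the path from ROOT."""
    prob = 1.0
    for parent, child in zip(path, path[1:]):
        prob *= classifiers[parent](text)[child]
    return prob

classifiers = {   # toy stand-ins for the per-node models
    "ROOT":        lambda text: {"Electronics": 0.9, "Clothing": 0.1},
    "Electronics": lambda text: {"Audio": 0.7, "Computers": 0.3},
}
print(leaf_probability(["ROOT", "Electronics", "Audio"], "ink cartridge",
                       classifiers))   # 0.9 * 0.7 = 0.63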

Active  Learning

[Diagram: a low-confidence item at the root classifier, p(electronics | {text}) = 0.1]

• High-probability datapoints:
  o Upload to production.
• Low-probability datapoints:
  o Subsample.
  o Acquire more labels (routing rule sketched below).
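A sketch of that routing rule; the confidence threshold and subsampling rate are assumptions, not Rangespan's actual values:

import random

THRESHOLD, SAMPLE_RATE = 0.9, 0.1     # illustrative values

def route(scored_items):
    """Split (item, probability) pairs into production and labelling queues."""
    to_production, to_labelling = [], []
    for item, prob in scored_items:
        if prob >= THRESHOLD:
            to_production.append(item)        # confident: upload to production
        elif random.random() < SAMPLE_RATE:   # subsample the uncertain items
            to_labelling.append(item)         # acquire more labels for these
    return to_production, to_labelling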


[Diagram: the low-confidence item, p(electronics | {text}) = 0.1, sent out for extra labels, e.g. Mechanical Turk]

Implementation

[Diagram: MongoDB → S3 Raw → S3 Training Data → S3 Models]

1. JSON export
2. Feature extraction
3. Training
4. Classification
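A hypothetical sketch of step 1, the JSON export from MongoDB to S3; the database, collection, and bucket names are invented, and pymongo/boto3 stand in for whatever clients were actually used:

import json
import boto3
from pymongo import MongoClient

products = MongoClient()["catalogue"]["products"]    # assumed db/collection
body = "\n".join(json.dumps(p, default=str) for p in products.find())

boto3.client("s3").put_object(Bucket="raw-products",   # assumed bucket
                              Key="export/products.json",
                              Body=body.encode("utf-8"))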

Training MapReduce
• Dumbo on Hadoop.
• 2000 classifiers.
• 5-fold CV (+ full).
• 20 hyperparameter settings on the grid.

= 200,000 training runs
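Where the 200,000 figure comes from, as a sketch: one independent training task per (taxonomy node, CV fold, grid point), each of which can be farmed out as a map task (the grid values are made up):

from itertools import product

nodes  = ["node-%04d" % i for i in range(2000)]      # ~2000 taxonomy nodes
folds  = range(5)                                    # 5-fold CV
hypers = [10 ** (k / 4.0) for k in range(-10, 10)]   # assumed 20-point grid

tasks = list(product(nodes, folds, hypers))
print(len(tasks))   # 2000 * 5 * 20 = 200,000 training runs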

Labelling
• 128 chunks.
• Full cascade on each chunk (see the sketch below).

[Diagram: Chunk 1, Chunk 2, Chunk 3, … Chunk N, each run through the full classifier cascade (the D / A C B / E tree)]
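A minimal sketch of that chunked pass; run_cascade is a hypothetical stand-in for the chained per-node classifiers described earlier:

def chunks(items, n_chunks=128):
    """Split the catalogue into n_chunks roughly equal pieces."""
    size = max(1, (len(items) + n_chunks - 1) // n_chunks)
    for i in range(0, len(items), size):
        yield items[i:i + size]

# for chunk in chunks(all_products):
#     run_cascade(chunk)   # full cascade over each chunk independently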

Thoughts
• Extras:
  o Partial labelling: stop descending the hierarchy when the probability becomes low.
  o Data ensemble learning.
• Most time was spent on feature engineering.
• Tie the parameters of the classifiers?
  o "Frustratingly Easy Domain Adaptation", Hal Daumé III.
• Partially flatten the hierarchy for training?