Hierarchical Classification by Jurgen Van Gael


About
• Computer scientist with a background in ML.
• London Machine Learning Meetup.
• Founder of the Math.NET numerical library.
• Previously at Microsoft Research.
• Data science team lead at Rangespan.

Taxonomy Classification
• Input: raw product data
• Output: classification models, classified product data

[Taxonomy diagram]
ROOT
├── Electronics
│   ├── Audio
│   │   └── Audio Cables, Amps, …
│   └── Computers, …
├── Clothing
│   └── Pants, T-Shirts, …
└── Toys
    └── Model Rockets, …

[Pipeline: Data Collection → Feature Extraction → Training/Testing → Labelling]

Feature Extraction

Name: INK-M50 Black Ink Cartridge (600 pages)
Manufacturer: Samsung
Description: null
Label: toner-inkjet-cartridges

"category": "toner-inkjet-cartridges”, "features": ["cartridge", "samsung", "black", "ink", "ink-m50", "pages”]

Feature Extraction:
• Text cleaning (stopwords, lexicalisation)
• Unigram + bigram features
• LDA topic features


h"p://radimrehurek.com/gensim

Training, Testing & Labelling

Hierarchical Classification

[Diagram: the class tree (D over A, C, B; plus E) flattened into a single layer D, A, C, E, B]

4 (5)-way multiclass classification

Hierarchical Classification

[Diagram: the same tree kept hierarchical, with one classifier per internal node: D vs E at the top, then A vs C vs B under D]

2 + 3 way multiclass classification

Naïve Bayes, Neural Network, Logistic Regression, Support Vector Machines, … ?

Logistic Regression - Model

word        printer-ink    printer-hardware
cartridge    4.0            0.3
the          0.0            0.0
samsung      0.5            0.5
black        0.5            0.3
printer     -1.0            2.0
ink          5.0           -1.7
…            …              …

For each class, for each feature present in the document, add the weight, then exponentiate and normalise:

Σ  =  10.0        -0.6
Pr =  0.99997      0.00003
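As a worked version of the sum-exponentiate-normalise step, a small Python sketch whose weight table mirrors the example above:

import math

weights = {
    "printer-ink":      {"cartridge": 4.0, "the": 0.0, "samsung": 0.5,
                         "black": 0.5, "printer": -1.0, "ink": 5.0},
    "printer-hardware": {"cartridge": 0.3, "the": 0.0, "samsung": 0.5,
                         "black": 0.3, "printer": 2.0, "ink": -1.7},
}

features = ["cartridge", "samsung", "black", "ink"]   # features present in the doc

# For each class, sum the weights of the present features...
scores = {c: sum(w.get(f, 0.0) for f in features) for c, w in weights.items()}
# ...then exponentiate and normalise (softmax).
z = sum(math.exp(s) for s in scores.values())
probs = {c: math.exp(s) / z for c, s in scores.items()}
# scores: printer-ink = 10.0, printer-hardware = -0.6
# probs:  printer-ink ≈ 0.99997, printer-hardware ≈ 0.00003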


Logistic Regression - Inference

• Optimise using Wapiti.
• Hyperparameter optimisation using grid search.
• Use a development set to stop training early?


h"p://wapiti.limsi.fr/

[Diagram: ROOT → Electronics / Clothing]


Cross Validation
• Estimate classifier errors.
• DO NOT:
  o Test on training data.
  o Leave data aside.

Calibration
• Are my probability estimates correct?
• Computation:
  o Take the data points x with p(·|x) ≈ 0.9,
  o Check that about 90% of their labels were correct.
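A minimal sketch of that calibration check, assuming the classifier's predictions are available as (predicted probability, was-correct) pairs:

from collections import defaultdict

def calibration_table(predictions, n_bins=10):
    """Bucket predictions by confidence; compare nominal vs. observed accuracy."""
    bins = defaultdict(list)
    for p, correct in predictions:
        bins[min(int(p * n_bins), n_bins - 1)].append(correct)
    for b in sorted(bins):
        observed = sum(bins[b]) / float(len(bins[b]))
        print("p in [%.1f, %.1f): observed accuracy %.2f (n=%d)"
              % (b / float(n_bins), (b + 1) / float(n_bins),
                 observed, len(bins[b])))

# Points predicted with p ≈ 0.9 should be correct about 90% of the time.
calibration_table([(0.9, True)] * 9 + [(0.9, False)])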


Training Data

[Diagram: 5-fold cross validation; per-fold errors 1.2%, 1.1%, 1.2%, 1.2%, 1.3%; averaged error = 1.2%]


[Diagram: ROOT → Electronics / Clothing]

Using Bayes rule to chain classifiers: the probability of a node is the product of the conditional probabilities along its path, e.g.

p(audio | text) = p(audio | electronics, text) × p(electronics | text)
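A sketch of that chaining in code; the node names and probabilities are made up, and each entry in classifiers stands for one trained per-node model:

def leaf_probability(path, text, classifiers):
    """Multiply the conditional probabilities down the path from ROOT."""
    prob = 1.0
    for parent, child in zip(path, path[1:]):
        prob *= classifiers[parent](text)[child]
    return prob

classifiers = {   # toy stand-ins for the per-node models
    "ROOT":        lambda text: {"Electronics": 0.9, "Clothing": 0.1},
    "Electronics": lambda text: {"Audio": 0.7, "Computers": 0.3},
}
print(leaf_probability(["ROOT", "Electronics", "Audio"], "ink cartridge",
                       classifiers))   # 0.9 * 0.7 = 0.63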

Active  Learning

[Diagram: a low-confidence item at the root classifier, p(electronics | {text}) = 0.1]

• High-probability datapoints:
  o Upload to production.
• Low-probability datapoints:
  o Subsample.
  o Acquire more labels (routing rule sketched below).
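A sketch of that routing rule; the confidence threshold and subsampling rate are assumptions, not Rangespan's actual values:

import random

THRESHOLD, SAMPLE_RATE = 0.9, 0.1     # illustrative values

def route(scored_items):
    """Split (item, probability) pairs into production and labelling queues."""
    to_production, to_labelling = [], []
    for item, prob in scored_items:
        if prob >= THRESHOLD:
            to_production.append(item)        # confident: upload to production
        elif random.random() < SAMPLE_RATE:   # subsample the uncertain items
            to_labelling.append(item)         # acquire more labels for these
    return to_production, to_labelling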


[Diagram: the low-confidence item, p(electronics | {text}) = 0.1, sent out for extra labels, e.g. Mechanical Turk]

Implementation

[Diagram: MongoDB → S3 Raw → S3 Training Data → S3 Models]

1. JSON export
2. Feature extraction
3. Training
4. Classification
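A hypothetical sketch of step 1, the JSON export from MongoDB to S3; the database, collection, and bucket names are invented, and pymongo/boto3 stand in for whatever clients were actually used:

import json
import boto3
from pymongo import MongoClient

products = MongoClient()["catalogue"]["products"]    # assumed db/collection
body = "\n".join(json.dumps(p, default=str) for p in products.find())

boto3.client("s3").put_object(Bucket="raw-products",   # assumed bucket
                              Key="export/products.json",
                              Body=body.encode("utf-8"))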

Training MapReduce
• Dumbo on Hadoop.
• 2000 classifiers.
• 5-fold CV (+ full).
• 20 hyperparameter settings on the grid.

= 200,000 training runs
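Where the 200,000 figure comes from, as a sketch: one independent training task per (taxonomy node, CV fold, grid point), each of which can be farmed out as a map task (the grid values are made up):

from itertools import product

nodes  = ["node-%04d" % i for i in range(2000)]      # ~2000 taxonomy nodes
folds  = range(5)                                    # 5-fold CV
hypers = [10 ** (k / 4.0) for k in range(-10, 10)]   # assumed 20-point grid

tasks = list(product(nodes, folds, hypers))
print(len(tasks))   # 2000 * 5 * 20 = 200,000 training runs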

Labelling
• 128 chunks.
• Full cascade on each chunk (see the sketch below).

[Diagram: Chunk 1, Chunk 2, Chunk 3, … Chunk N, each run through the full classifier cascade (the D / A C B / E tree)]
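A minimal sketch of that chunked pass; run_cascade is a hypothetical stand-in for the chained per-node classifiers described earlier:

def chunks(items, n_chunks=128):
    """Split the catalogue into n_chunks roughly equal pieces."""
    size = max(1, (len(items) + n_chunks - 1) // n_chunks)
    for i in range(0, len(items), size):
        yield items[i:i + size]

# for chunk in chunks(all_products):
#     run_cascade(chunk)   # full cascade over each chunk independently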

Thoughts
• Extras:
  o Partial labelling: stop descending the hierarchy when the probability becomes low.
  o Data ensemble learning.
• Most time was spent on feature engineering.
• Tie the parameters of the classifiers?
  o "Frustratingly Easy Domain Adaptation", Hal Daumé III.
• Partially flatten the hierarchy for training?