Hierarchical Classification by Jurgen Van Gael

Uploaded by: pydata

Page 1: Hierarchical Classification by Jurgen Van Gael

Hierarchical Classification
Jurgen Van Gael

Page 2: Hierarchical Classification by Jurgen Van Gael

About
•  Computer scientist with a background in ML.
•  London Machine Learning Meetup.
•  Founder of the Math.NET numerical library.
•  Previously at Microsoft Research.
•  Data science team lead at Rangespan.

Page 3: Hierarchical Classification by Jurgen Van Gael
Page 4: Hierarchical Classification by Jurgen Van Gael

Taxonomy Classification
•  Input: raw product data
•  Output: classification models, classified product data

ROOT
    Electronics
        Audio: Audio Cables, Amps, …
        Computers
        …
    Clothing: Pants, T-Shirts, …
    Toys: Model Rockets, …

Page 5: Hierarchical Classification by Jurgen Van Gael

Pipeline stages (diagram): Data Collection, Feature Extraction, Training, Testing, Labelling

Page 6: Hierarchical Classification by Jurgen Van Gael

Feature Extraction

Page 7: Hierarchical Classification by Jurgen Van Gael

Name: INK-M50 Black Ink Cartridge (600 pages)
Manufacturer: Samsung
Description: null
Label: toner-inkjet-cartridges

"category": "toner-inkjet-cartridges",
"features": ["cartridge", "samsung", "black", "ink", "ink-m50", "pages"]

Feature Extraction:
•  Text cleaning (stopwords, lexicalisation)
•  Unigram + Bigram Features
•  LDA Topic Features
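
A minimal sketch of the text cleaning and unigram + bigram steps, applied to the product record above. The stopword list, the tokenisation regex and the `extract_features` helper are assumptions made for illustration, not the speaker's code, and LDA topic features are left out.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "for"}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens, keeping inner hyphens (e.g. 'ink-m50')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def extract_features(record):
    """Build unigram and bigram features from the text fields of one product record."""
    fields = ("name", "manufacturer", "description")
    text = " ".join(str(record.get(field) or "") for field in fields)
    # Drop stopwords and purely numeric tokens such as "600".
    tokens = [t for t in tokenize(text) if t not in STOPWORDS and not t.isdigit()]
    unigrams = set(tokens)
    bigrams = {f"{a}_{b}" for a, b in zip(tokens, tokens[1:])}
    return {"category": record.get("label"), "features": sorted(unigrams | bigrams)}

record = {
    "name": "INK-M50 Black Ink Cartridge (600 pages)",
    "manufacturer": "Samsung",
    "description": None,
    "label": "toner-inkjet-cartridges",
}
example = extract_features(record)
```

With these assumptions the unigrams match the feature list on the slide, and bigrams such as `black_ink` are added on top.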

Page 8: Hierarchical Classification by Jurgen Van Gael

http://radimrehurek.com/gensim

Page 9: Hierarchical Classification by Jurgen Van Gael

Training, Testing & Labelling

Page 10: Hierarchical Classification by Jurgen Van Gael

Hierarchical Classification

[Figure: taxonomy tree over nodes A, B, C, D, E, flattened into a single list of classes: D, A, C, E, B]

4 (5) way multiclass classification

Page 11: Hierarchical Classification by Jurgen Van Gael

Hierarchical Classification

[Figure: the taxonomy tree over nodes A, B, C, D, E, shown split by level]

2 + 3 way multiclass classification
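
The two captions can be contrasted with a small sketch. The tree shape below is an assumption: the slides only show the node names, and ROOT → {D, E}, D → {A, C, B} is one shape consistent with "4 (5) way" and "2 + 3 way".

```python
# Assumed tree shape (a guess that matches the slide captions).
TREE = {"ROOT": ["D", "E"], "D": ["A", "C", "B"]}

def leaves(tree, node="ROOT"):
    """Return the leaf classes below `node`, left to right."""
    children = tree.get(node, [])
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(tree, child)]

def flat_problem(tree):
    """Flat approach: one multiclass problem over all leaf classes."""
    return leaves(tree)

def per_node_problems(tree):
    """Hierarchical approach: one smaller multiclass problem per internal node."""
    return {node: list(children) for node, children in tree.items()}

flat = flat_problem(TREE)                # ['A', 'C', 'B', 'E'], a 4-way problem
hierarchical = per_node_problems(TREE)   # a 2-way problem at ROOT and a 3-way problem at D
```

Counting the internal node D as a possible label as well gives the 5-way variant mentioned on the previous slide.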

Page 12: Hierarchical Classification by Jurgen Van Gael

Naïve Bayes
Neural Network
Logistic Regression
Support Vector Machines
… ?

Page 13: Hierarchical Classification by Jurgen Van Gael

Logistic Regression - Model

word         printer-ink    printer-hardware
cartridge        4.0              0.3
the              0.0              0.0
samsung          0.5              0.5
black            0.5              0.3
printer         -1.0              2.0
ink              5.0             -1.7
…                …                …

For each class
    For each feature
        Add the weight
Exponentiate & Normalize

Σ  =            10.0             -0.6
Pr =             0.99997          0.00003
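
The sums and probabilities above can be reproduced with a short sketch. It assumes that only the four product features with a row in the table (cartridge, samsung, black, ink) contribute, since the weights for the remaining features are elided on the slide.

```python
import math

CLASSES = ("printer-ink", "printer-hardware")

# Weights copied from the table above: word -> (printer-ink, printer-hardware).
WEIGHTS = {
    "cartridge": (4.0, 0.3),
    "the": (0.0, 0.0),
    "samsung": (0.5, 0.5),
    "black": (0.5, 0.3),
    "printer": (-1.0, 2.0),
    "ink": (5.0, -1.7),
}

def class_scores(features):
    """For each class, add the weights of the features that are present."""
    return {
        name: sum(WEIGHTS[f][i] for f in features if f in WEIGHTS)
        for i, name in enumerate(CLASSES)
    }

def softmax(scores):
    """Exponentiate and normalise; subtracting the max keeps exp() stable."""
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

features = ["cartridge", "samsung", "black", "ink"]
scores = class_scores(features)   # sums of about 10.0 and -0.6
probs = softmax(scores)           # about 0.999975 and 0.000025
```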

Page 14: Hierarchical Classification by Jurgen Van Gael

Logistic Regression - Inference

•  Optimise using Wapiti.
•  Hyperparameter optimisation using grid search.
•  Using development set to stop training?

Page 15: Hierarchical Classification by Jurgen Van Gael

http://wapiti.limsi.fr/

Page 16: Hierarchical Classification by Jurgen Van Gael

ROOT
    Electronics
    Clothing

Page 17: Hierarchical Classification by Jurgen Van Gael

Cross Validation
•  Estimate classifier errors.
•  DO NOT
    o  Test on training data.
    o  Leave data aside.

Calibration
•  Are my probability estimates correct?
•  Computation:
    o  Take x data points with p(.|x) = 0.9,
    o  Check that about 90% of labels were correct.
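
The calibration check can be sketched as a binned comparison of stated confidence against observed accuracy. The `calibration_table` helper and the toy data are hypothetical; the slide only describes the 0.9 case.

```python
def calibration_table(predictions, n_bins=10):
    """predictions: iterable of (confidence, was_correct) pairs.

    Returns {bin_index: (mean_confidence, accuracy, count)} for non-empty bins.
    """
    bins = {}
    for confidence, correct in predictions:
        index = min(int(confidence * n_bins), n_bins - 1)
        bins.setdefault(index, []).append((confidence, correct))
    table = {}
    for index, items in sorted(bins.items()):
        mean_confidence = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        table[index] = (mean_confidence, accuracy, len(items))
    return table

# Toy data: ten predictions made with confidence 0.9, nine of them correct.
toy = [(0.9, True)] * 9 + [(0.9, False)]
table = calibration_table(toy)
```

A well-calibrated classifier has accuracy close to mean confidence in every bin; here both are about 0.9.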

[Figure: cross-validation over the training data, one error estimate per fold]

Error per fold = 1.2%, 1.1%, 1.2%, 1.2%, 1.3%
Average: Error = 1.2%
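
A minimal sketch of how fold errors combine into one estimate. The interleaved split in `k_fold_indices` is an assumption chosen for brevity; the fold errors are the percentages shown on the slide.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) so that every item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test

def cross_validated_error(fold_errors):
    """Average the per-fold error estimates."""
    return sum(fold_errors) / len(fold_errors)

fold_errors = [1.2, 1.1, 1.2, 1.2, 1.3]            # percentages from the slide
mean_error = cross_validated_error(fold_errors)    # about 1.2
splits = list(k_fold_indices(10, k=5))
```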

Page 18: Hierarchical Classification by Jurgen Van Gael

ROOT
    Electronics
    Clothing

Using Bayes rule to chain classifiers:
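
The equation on this slide did not survive extraction. One common reading is that the probability of a leaf is the product of the per-node probabilities along its path, for example p(Audio | text) = p(Electronics | text) × p(Audio | Electronics, text). The sketch below assumes that reading; the numbers in `NODE_PROBS` are made up.

```python
# Hypothetical per-node classifier outputs p(child | parent, text) for one product.
NODE_PROBS = {
    "ROOT": {"Electronics": 0.9, "Clothing": 0.1},
    "Electronics": {"Audio": 0.8, "Computers": 0.2},
}

def leaf_probabilities(node_probs, node="ROOT", prob_so_far=1.0):
    """Multiply conditional probabilities along every root-to-leaf path."""
    children = node_probs.get(node)
    if not children:
        # No classifier below this node, so it is treated as a leaf.
        return {node: prob_so_far}
    result = {}
    for child, p in children.items():
        result.update(leaf_probabilities(node_probs, child, prob_so_far * p))
    return result

leaf_probs = leaf_probabilities(NODE_PROBS)
```

Because each node's outputs sum to one, the leaf probabilities also sum to one.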

Page 19: Hierarchical Classification by Jurgen Van Gael

Active Learning

Page 20: Hierarchical Classification by Jurgen Van Gael
Page 21: Hierarchical Classification by Jurgen Van Gael

ROOT
    Electronics
    Clothing

p(electronics|{text}) = 0.1

Page 22: Hierarchical Classification by Jurgen Van Gael

•  High probability datapoints
    o  Upload to production
•  Low probability datapoints
    o  Subsample
    o  Acquire more labels, e.g. Mechanical Turk

ROOT
    Electronics
    Clothing

p(electronics|{text}) = 0.1
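
The routing rule can be sketched as follows. The threshold, the sampling rate and the `route` helper are assumptions; the slides do not give the values used.

```python
import random

def route(predictions, threshold=0.9, sample_rate=0.1, seed=0):
    """Split predictions into production uploads and items sent for manual labels.

    predictions: list of (item_id, label, probability) tuples.
    """
    rng = random.Random(seed)
    to_production, to_label = [], []
    for item_id, label, probability in predictions:
        if probability >= threshold:
            # High probability: trust the classifier.
            to_production.append((item_id, label))
        elif rng.random() < sample_rate:
            # Low probability: subsample before paying for labels.
            to_label.append(item_id)
    return to_production, to_label

predictions = [("a", "electronics", 0.97), ("b", "electronics", 0.1)]
# sample_rate=1.0 sends every low-probability item for labelling.
uploads, label_queue = route(predictions, sample_rate=1.0)
```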

Page 23: Hierarchical Classification by Jurgen Van Gael

Implementation

Page 24: Hierarchical Classification by Jurgen Van Gael

Implementation

Components: MongoDB, S3 Raw, S3 Training Data, S3 Models

1.  JSON export
2.  Feature Extraction
3.  Training
4.  Classification

Page 25: Hierarchical Classification by Jurgen Van Gael

Training MapReduce

•  Dumbo on Hadoop

•  2000 classifiers

•  5-fold CV (+ full)

•  20 hyperparameter settings on grid

= 200,000 training runs
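
The run count follows from enumerating every (classifier, fold, grid point) combination. How the "(+ full)" fits are scheduled is not stated, so the extra count below assumes one fit on all data per classifier.

```python
from itertools import product

n_classifiers = 2000
n_folds = 5
n_grid_points = 20

def cv_jobs(classifiers, folds, grid_points):
    """Lazily enumerate (classifier, fold, grid point) training jobs."""
    return product(range(classifiers), range(folds), range(grid_points))

cv_runs = n_classifiers * n_folds * n_grid_points   # 200000

# Assumption: "(+ full)" is one extra fit on all data per classifier.
full_fits = n_classifiers                           # 2000
```

Each job is independent of the others, which is what makes the workload a natural fit for MapReduce.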

Page 26: Hierarchical Classification by Jurgen Van Gael

Labelling

•  128 chunks

•  Full Cascade each chunk

[Figure: chunks 1, 2, 3, …, N, each processed by its own copy of the full classifier tree over nodes A, B, C, D, E]
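
A sketch of chunked labelling, where each chunk is passed through the whole cascade independently. `toy_cascade` is a hypothetical stand-in for the real classifier tree, and the chunking scheme is an assumption.

```python
def split_into_chunks(items, n_chunks=128):
    """Split items into n_chunks contiguous chunks of near-equal size."""
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def label_chunk(chunk, cascade):
    """Run the full classifier cascade over every item in one chunk."""
    return [cascade(item) for item in chunk]

def toy_cascade(item):
    """Hypothetical stand-in: a real cascade would walk the taxonomy top-down."""
    return "Electronics" if "cable" in item else "Clothing"

products = [f"product {i}" for i in range(1000)] + ["audio cable"]
chunks = split_into_chunks(products)
labels = [label for chunk in chunks for label in label_chunk(chunk, toy_cascade)]
```

Since chunks share no state, each `label_chunk` call could run as a separate task.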

Page 27: Hierarchical Classification by Jurgen Van Gael

Thoughts
•  Extras:
    o  Partial labelling: stop when probability becomes low.
    o  Data ensemble learning.
•  Most time spent on feature engineering.
•  Tie the parameters of the classifiers?
    o  Frustratingly easy domain adaptation, Hal Daumé III
•  Partially flattening the hierarchy for training?
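
The partial labelling idea from the list above can be sketched as a greedy descent that stops early. Applying the threshold to the cumulative path probability is an assumption, as are the threshold value and the numbers in `NODE_PROBS`.

```python
def partial_label(node_probs, threshold=0.8, root="ROOT"):
    """Descend the taxonomy greedily and stop once the path probability would drop too low.

    Returns (path, probability_of_path).
    """
    path, prob, node = [], 1.0, root
    while node in node_probs:
        child, p = max(node_probs[node].items(), key=lambda kv: kv[1])
        if prob * p < threshold:
            break  # Not confident enough to go deeper.
        path.append(child)
        prob *= p
        node = child
    return path, prob

# Hypothetical per-node classifier outputs for one product.
NODE_PROBS = {
    "ROOT": {"Electronics": 0.95, "Clothing": 0.05},
    "Electronics": {"Audio": 0.6, "Computers": 0.4},
}
path, prob = partial_label(NODE_PROBS)
```

Here the product is labelled Electronics only, because committing to Audio would lower the path probability to 0.57.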