Hierarchical Classification Rongcheng Lin Computer Science Department



Contents

Motivation, Definition & Problem

Review of SVM

Hierarchical Classification

Path-based Approaches

Regularization-based Approaches

Motivation

Classes in the real world are structured, and are often hierarchically related. Examples: gene function prediction, document categorization, image search, …

Hierarchies or taxonomies offer clear advantages in supporting tasks such as browsing, searching, or visualization. Examples: the International Patent Classification scheme, Yahoo! Web catalogs, …

Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes.


Definition and Problem

Automatically categorize data into pre-defined topic hierarchies or taxonomies. This is a supervised learning task with structured output.


DAG and Tree Structure


Problem and solution?

Definition and Problem

Incorporate the inter-class relationship (the hierarchy) into classification.

Redefine the problem

Lower-level categories are more detailed, while upper-level categories are more general: redefine the margin.

Different classification mistakes are of different severity: redefine the loss function.


Review: Binary SVM

Binary classification: $f(x) = \operatorname{sign}(w^T x + b)$. The separating hyperplane is $w^T x + b = 0$, with $w^T x + b > 0$ on one side and $w^T x + b < 0$ on the other.

Margin: the distance from an example $x_i$ to the hyperplane is $r = \frac{w^T x_i + b}{\lVert w \rVert}$.

Loss function: $L(f(x), y)$
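As a minimal sketch of the decision rule $f(x) = \operatorname{sign}(w^T x + b)$ and of the signed distance to the hyperplane, in plain Python. The function names and the example parameters are illustrative assumptions and do not come from the slides.

```python
import math

def score(w, b, x):
    """Linear score w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def decision(w, b, x):
    """f(x) = sign(w^T x + b), returned as +1 or -1 (a score of 0 is mapped to +1 here)."""
    return 1 if score(w, b, x) >= 0 else -1

def signed_distance(w, b, x):
    """Signed distance from x to the hyperplane w^T x + b = 0."""
    return score(w, b, x) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical parameters with ||w|| = 5
w, b = [3.0, 4.0], -5.0
print(decision(w, b, [3.0, 4.0]))         # score = 9 + 16 - 5 = 20, so +1
print(signed_distance(w, b, [3.0, 4.0]))  # 20 / 5 = 4.0
print(decision(w, b, [0.0, 0.0]))         # score = -5, so -1
```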

Review: Binary SVM

General form:

$J(w) = R(w) + \sum_{i=1}^{n} L(w, x_i, y_i)$
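The general form can be evaluated directly once a regularizer and a loss are chosen. The sketch below assumes $R(w) = \frac{1}{2}\lVert w \rVert^2$ and the hinge loss, and omits the bias term for brevity; both choices, and all function names, are assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_regularizer(w):
    """R(w) = 0.5 * ||w||^2 (assumed choice of regularizer)."""
    return 0.5 * dot(w, w)

def hinge_loss(w, x, y):
    """L(w, x, y) = max(0, 1 - y * w^T x) with y in {-1, +1} (assumed choice of loss)."""
    return max(0.0, 1.0 - y * dot(w, x))

def objective(w, data, regularizer=l2_regularizer, loss=hinge_loss):
    """J(w) = R(w) + sum_i L(w, x_i, y_i)."""
    return regularizer(w) + sum(loss(w, x, y) for x, y in data)

# Toy data: (x_i, y_i) pairs
data = [([2.0, 0.0], +1), ([0.5, 0.0], +1), ([-1.0, 0.0], -1)]
w = [1.0, 0.0]
print(objective(w, data))  # 0.5 + (0 + 0.5 + 0) = 1.0
```

Passing the regularizer and the loss as arguments mirrors the way the later slides swap in different losses and penalties without changing the overall form.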

Review: Multiclass SVM

1) One-vs-the-rest
2) Crammer & Singer (pairwise)

Review: Multiclass SVM

Dedicated Loss Function

Margin: $\gamma_i(w) = w_{y_i}^T x_i - w_k^T x_i$ for $k \neq y_i$
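A small sketch of the multiclass margin, assuming one weight vector per class and taking the smallest gap over all competing classes $k \neq y_i$ (the worst-case reading; this aggregation and the function names are assumptions).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def multiclass_margin(W, x, y):
    """Smallest gap w_y^T x - w_k^T x over all classes k != y."""
    true_score = dot(W[y], x)
    return min(true_score - dot(W[k], x) for k in range(len(W)) if k != y)

def multiclass_hinge(W, x, y):
    """Hinge loss applied to the multiclass margin."""
    return max(0.0, 1.0 - multiclass_margin(W, x, y))

# Three classes, two features (toy values)
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.5]
print(multiclass_margin(W, x, 0))  # scores are 2.0, 1.5, -3.5, so the margin is 0.5
print(multiclass_hinge(W, x, 0))   # 0.5
print(multiclass_margin(W, x, 1))  # 1.5 - 2.0 = -0.5 (class 1 would be misclassified)
```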

Review: Hinge Loss Function

The more an example violates the margin, the higher the penalty.
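For reference, the standard binary hinge loss can be written as below, where $m = y\,f(x)$ denotes the margin of an example (the symbol $m$ is introduced here for illustration only).

```latex
\ell(m) = \max(0,\; 1 - m), \qquad
\ell(2) = 0, \quad \ell(0.5) = 0.5, \quad \ell(-1) = 2
```

Examples with $m \geq 1$ incur no penalty, and the penalty grows linearly as $m$ decreases.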


Loss Function

Hierarchical Classifiers

Path-based Approaches
  Large Margin Hierarchical Classification
  Hierarchical Document Categorization with Support Vector Machines
  On Large Margin Hierarchical Classification with Multiple Paths

Regularization-based Approaches
  Tree-Guided Group Lasso for Multi-Task Regression
  Hierarchical Multitask Structured Output Learning for Large-Scale Sequence Segmentation

Tree Distance

A given hierarchy induces a metric over the set of classes: the tree distance, or tree-induced error.

$\gamma(y, \hat{y})$ is defined to be the number of edges along the (unique) path from $y$ to $\hat{y}$.

Example (from the figure): for the marked nodes $y$ and $\hat{y}$, $\gamma(y, \hat{y}) = 4$.
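The tree distance can be computed from parent pointers by walking both labels up to their first common ancestor. The sketch below uses a hypothetical taxonomy; the node names and the `parent` dictionary layout are assumptions.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def tree_distance(y, y_hat, parent):
    """Number of edges on the unique path between y and y_hat."""
    steps_from_y = {node: d for d, node in enumerate(path_to_root(y, parent))}
    for steps_from_y_hat, node in enumerate(path_to_root(y_hat, parent)):
        if node in steps_from_y:  # first common ancestor
            return steps_from_y[node] + steps_from_y_hat
    raise ValueError("nodes are not in the same tree")

# Hypothetical taxonomy: root -> {science, sports}, each with two leaves
parent = {
    "root": None,
    "science": "root", "sports": "root",
    "physics": "science", "biology": "science",
    "tennis": "sports", "golf": "sports",
}
print(tree_distance("physics", "tennis", parent))   # 4
print(tree_distance("physics", "biology", parent))  # 2
print(tree_distance("physics", "science", parent))  # 1
print(tree_distance("golf", "golf", parent))        # 0
```

Confusing two siblings costs 2, while confusing leaves in different top-level branches costs 4, which is the sense in which mistakes differ in severity.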

Tree Distance

[Figure: tree over nodes numbered 0 to 9, with the true class $y$ and the predicted class $\hat{y}$ marked.]

$D(y, \hat{y}) = f_4 C_4 + f_1 C_1 + f_3 C_3$

Loss Functions

[Figure: zero-one loss, hinge loss, and hierarchical hinge loss plotted against $f_y(x) - f_{\hat{y}}(x)$; axis ticks at 1 and $D(\hat{y}, y)$.]

Path-based Approaches

Path-based approaches try to find the most likely path from the root.

Only the parameters of the misclassified nodes in the tree need to be updated.
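One way to read these two statements is sketched below, assuming that every non-root node carries its own weight vector, that a class is scored by summing the node scores along its path, and that "misclassified nodes" are the nodes on which the true path and the predicted path differ. These modeling choices and all names are assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def path(label, parent):
    """Set of nodes from `label` up to, but excluding, the root."""
    nodes = set()
    while parent[label] is not None:
        nodes.add(label)
        label = parent[label]
    return nodes

def path_score(label, x, W, parent):
    """Sum of node scores w_v^T x over the nodes on the path to `label`."""
    return sum(dot(W[v], x) for v in path(label, parent))

def predict(x, W, parent, leaves):
    """Leaf whose path from the root has the highest total score."""
    return max(leaves, key=lambda leaf: path_score(leaf, x, W, parent))

def nodes_to_update(y, y_hat, parent):
    """Nodes on exactly one of the two paths; shared ancestors are left untouched."""
    return path(y, parent) ^ path(y_hat, parent)

# Hypothetical two-level tree and node weights
parent = {"root": None, "a": "root", "b": "root",
          "a1": "a", "a2": "a", "b1": "b"}
W = {"a": [1.0, 0.0], "b": [0.0, 1.0],
     "a1": [0.5, 0.0], "a2": [-0.5, 0.0], "b1": [0.2, 0.0]}
x = [1.0, 0.0]
y_hat = predict(x, W, parent, ["a1", "a2", "b1"])
print(y_hat)                                         # a1 (scores: a1 = 1.5, a2 = 0.5, b1 = 0.2)
print(sorted(nodes_to_update("a2", y_hat, parent)))  # ['a1', 'a2']
print(sorted(nodes_to_update("b1", y_hat, parent)))  # ['a', 'a1', 'b', 'b1']
```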

Large margin hierarchical classifier

Margin requirement: $f_y(x) - f_{\hat{y}}(x) \geq \sqrt{\gamma(y, \hat{y})}$

Note: $y$ is the correct label and $\hat{y} \neq y$.
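A minimal sketch of the resulting loss, assuming the margin $f_y(x) - f_{\hat{y}}(x)$ is required to be at least $\sqrt{\gamma(y, \hat{y})}$ and that violations are penalized linearly. The function name is hypothetical.

```python
import math

def hierarchical_hinge(margin, gamma):
    """max(0, sqrt(gamma) - margin), where margin = f_y(x) - f_yhat(x)
    and gamma is the tree distance between y and y_hat."""
    return max(0.0, math.sqrt(gamma) - margin)

print(hierarchical_hinge(margin=2.5, gamma=4))  # 0.0: required margin sqrt(4) = 2 is met
print(hierarchical_hinge(margin=1.0, gamma=4))  # 1.0: distant class, margin too small
print(hierarchical_hinge(margin=1.0, gamma=1))  # 0.0: nearby class, same margin suffices
```

The same margin of 1.0 is enough against a nearby class but is penalized against a distant one, so the classifier is pushed hardest to avoid severe mistakes.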


Training Algorithm

HSVM

[Figure: plot against $f_y(x) - f_{\hat{y}}(x)$, with marked values 1 and $\Delta(y_i, y)$.]

Regularization-based Approaches

K individual classification tasks.

Use an additional regularization term to penalize the disagreement between the individual models.

Multitask Learning

Induction over multiple tasks is performed simultaneously to capture their intrinsic relatedness.

L1-Norm, L2-Norm

Penalize model complexity to avoid overfitting.

The L1 norm gives sparser estimates than the L2 norm.
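The sparsity difference shows up in the one-dimensional proximal steps of the two penalties: the L1 step (soft-thresholding) sets small coefficients exactly to zero, while the squared-L2 step only rescales them. The sketch below is illustrative; the function names and the input vector are assumptions.

```python
def soft_threshold(v, lam):
    """argmin_w 0.5 * (w - v)^2 + lam * |w| for a single coordinate."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def prox_l1(vec, lam):
    """Coordinate-wise soft-thresholding (L1 penalty)."""
    return [soft_threshold(v, lam) for v in vec]

def prox_l2_squared(vec, lam):
    """argmin_w 0.5 * (w - v)^2 + 0.5 * lam * w^2 per coordinate (squared L2 penalty)."""
    return [v / (1.0 + lam) for v in vec]

v = [3.0, -0.5, 0.25, -2.0]
print(prox_l1(v, 1.0))          # [2.0, 0.0, 0.0, -1.0]: two exact zeros
print(prox_l2_squared(v, 1.0))  # [1.5, -0.25, 0.125, -1.0]: shrunk, none zero
```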


Group Lasso and Sparse Group Lasso
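As a sketch of the two penalties, assuming non-overlapping groups and omitting the group-size weights that are often included: the group lasso sums the Euclidean norms of the groups, and the sparse group lasso mixes that with a plain L1 term. The mixing parameter `alpha` and the function names are assumptions.

```python
import math

def group_lasso_penalty(w, groups):
    """Sum over groups of the Euclidean norm of that group's coefficients."""
    return sum(math.sqrt(sum(w[j] ** 2 for j in g)) for g in groups)

def sparse_group_lasso_penalty(w, groups, alpha):
    """alpha * ||w||_1 + (1 - alpha) * group lasso penalty, with alpha in [0, 1]."""
    l1 = sum(abs(wj) for wj in w)
    return alpha * l1 + (1.0 - alpha) * group_lasso_penalty(w, groups)

w = [3.0, 4.0, 0.0, 0.0, 5.0]
groups = [[0, 1], [2, 3], [4]]  # index sets, one per group
print(group_lasso_penalty(w, groups))              # 5 + 0 + 5 = 10.0
print(sparse_group_lasso_penalty(w, groups, 0.5))  # 0.5 * 12 + 0.5 * 10 = 11.0
```

A group whose coefficients are all zero contributes nothing, so the penalty favors switching whole groups off; the added L1 term also encourages zeros inside the groups that stay active.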

HMTL: Hierarchical Multitask Learning

A trade-off parameter determines the contribution of regularization from the origin vs. the parent node’s parameters (i.e., the strength of coupling between the node and its parent).
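One plausible form of such a regularizer is sketched below: a convex combination of shrinkage toward the origin and shrinkage toward the parent's parameters. The exact form, the symbol `beta`, and the function names are assumptions, since the formula itself does not appear in the text.

```python
def sq_norm(u):
    return sum(a * a for a in u)

def hmtl_regularizer(w, w_parent, beta):
    """(1 - beta)/2 * ||w||^2 + beta/2 * ||w - w_parent||^2, with beta in [0, 1].

    beta = 0: regularize toward the origin only (node ignores its parent).
    beta = 1: regularize toward the parent only (strongest coupling).
    """
    diff = [a - b for a, b in zip(w, w_parent)]
    return 0.5 * (1.0 - beta) * sq_norm(w) + 0.5 * beta * sq_norm(diff)

w, w_parent = [2.0, 0.0], [2.0, 1.0]
print(hmtl_regularizer(w, w_parent, 0.0))  # 0.5 * 4 = 2.0
print(hmtl_regularizer(w, w_parent, 1.0))  # 0.5 * 1 = 0.5
print(hmtl_regularizer(w, w_parent, 0.5))  # 0.25 * 4 + 0.25 * 1 = 1.25
```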



Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

Original Approach:

New Approach:

Note:

Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

Each leaf node is a class.

Each inner node is a group of classes.
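The tree turns into a set of overlapping groups: every node defines the group of leaf classes below it, and each group contributes the Euclidean norm of its coefficients. The sketch below handles the coefficients of a single input feature and uses uniform node weights; the weights, the `children` layout, and the function names are assumptions.

```python
import math

def leaves_under(node, children):
    """Leaf classes in the subtree rooted at `node`."""
    if not children.get(node):
        return [node]
    return [leaf for c in children[node] for leaf in leaves_under(c, children)]

def tree_group_penalty(beta, children, root, node_weight):
    """Sum over all nodes v of node_weight[v] * ||beta restricted to the leaves under v||_2.

    `beta` maps each leaf class to its coefficient for one input feature.
    """
    total, stack = 0.0, [root]
    while stack:
        v = stack.pop()
        group = leaves_under(v, children)
        total += node_weight[v] * math.sqrt(sum(beta[leaf] ** 2 for leaf in group))
        stack.extend(children.get(v, []))
    return total

# Hypothetical tree: root -> {g1, c3}, g1 -> {c1, c2}
children = {"root": ["g1", "c3"], "g1": ["c1", "c2"]}
beta = {"c1": 3.0, "c2": 4.0, "c3": 0.0}
node_weight = {v: 1.0 for v in ["root", "g1", "c1", "c2", "c3"]}  # uniform, for illustration
print(leaves_under("g1", children))                              # ['c1', 'c2']
print(tree_group_penalty(beta, children, "root", node_weight))   # 5 + 5 + 3 + 4 + 0 = 17.0
```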


Tree-Guided Group Lasso


Advantages and Drawbacks

Assume the children are good: Tree-Guided Group Lasso

Assume the parent is good: HMTL

Assume both are not good: path-based approaches

It depends!