Efficient Process for Constructing a Hierarchical Classification System Yong-wook Yoon Dec 22, 2003 NLP Lab., POSTECH



Page 1

Efficient Process for Constructing a Hierarchical Classification System

Yong-wook Yoon, Dec 22, 2003

NLP Lab., POSTECH

Page 2

Contents

Introduction Related Work Measure for Hierarchical Classifier Hierarchical Classification Demo Experiment Contribution Future Work

Page 3

Introduction

New trends in text categorization: a massive amount of documents is produced every day, often requiring on-line classification.

Flat vs. Hierarchical Classification: the hierarchical method is feasible for a large collection of documents that has many levels of hierarchy.

Advantages of Hierarchical Classification: well suited to a large number of categories, efficient in training time, and better performance than a flat classifier.

Page 4

Issues in Hierarchical Classification

There is no appropriate measure to evaluate the performance of a hierarchical classifier, and no systematic process to construct a large, multi-level hierarchical classification system.

Our Suggestions: a new evaluation scheme that is well suited to a hierarchical classification system, and an efficient process to construct an optimal hierarchical classification system.

Page 5

Flat vs. Hierarchical Classification

[Diagram: a flat classifier assigns documents from Root directly to categories C1 … Cn; a hierarchical classifier routes them through internal nodes (e.g. Business → Grain, Oil) down to leaf categories C1 … Ci, Cj, Cj+1, … Cn.]

Page 6

Variations in Hierarchical Classification

Virtual Category Tree vs. Category Tree: categories are organized as trees (cf. DAG); in a virtual category tree, documents can be assigned to leaf categories only (cf. a category tree, where internal nodes may also hold documents).

Two methods in hierarchical classification:

Big-Bang approach: by only one classification, a document is assigned to a leaf-node class or an internal-node class.

Top-down level-based approach: a classifier at each node of the hierarchy tree; a document is classified by applying a sequence of classifiers from the root node down to a leaf node.

Page 7

Virtual Category Tree with Top-down Level-based Classification

[Diagram: a document enters at Root; the child classifiers (e.g. comp, talk, alt.atheism) each answer yes/no, and the document descends toward Class_1 … Class_N.]

At the root node there exist k classifiers, where k is the number of child nodes.

Each classifier determines whether to pass the document down to the lower level according to the sign of its SVM score → called a 'Pachinko machine'.

Finally, at the leaf nodes the correctness of the prediction is examined.
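The top-down traversal above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `label`/`children` dict encoding of the tree and the `score(node, doc)` callback (standing in for a per-node SVM decision value) are assumptions.

```python
# Hypothetical sketch of the top-down ("Pachinko-machine") traversal: each
# internal node holds one binary scorer per child, and a document descends
# along every child whose score is positive.

def classify_top_down(node, doc, score):
    """Return the leaf labels reached by doc.

    `score(node, doc)` stands in for the SVM decision value of the
    classifier attached to `node` (an assumed interface).
    """
    if not node.get("children"):          # leaf node: this is a prediction
        return [node["label"]]
    leaves = []
    for child in node["children"]:
        if score(child, doc) > 0:         # positive sign -> descend
            leaves.extend(classify_top_down(child, doc, score))
    return leaves
```

A document may reach several leaves (multi-label) or none, which is exactly why the leaf-level correctness check in the slide is needed.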

Page 8

Contents

Introduction Related Work Measure for Hierarchical Classifier Hierarchical Classification Demo Experiment Contribution Future Work

Page 9

Previous Evaluations in Hierarchical Classification

Dumais and Chen (SIGIR-2000): traditional precision and recall of each leaf-node classifier; the probability of the leaf-node classifier.

L1: internal-node classifier, L2: leaf-node classifier. Boolean scoring function: P(L1) && P(L2). Multiplicative scoring function: P(L1) * P(L2).

Limitations of Dumais and Chen: may be feasible for simple cases such as a 2-level hierarchy (but what about a large hierarchy?); no concern about internal-node performance.
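The two scoring rules attributed to Dumais and Chen can be written out directly; this is a sketch, and the 0.5 acceptance threshold in the Boolean variant is an illustrative assumption, not a value from the paper.

```python
# Combine the first-level (L1) and second-level (L2) classifier
# probabilities either by thresholding both or by multiplying them.

def boolean_score(p_l1, p_l2, threshold=0.5):
    # P(L1) && P(L2): accept only if both levels pass the threshold
    return p_l1 >= threshold and p_l2 >= threshold

def multiplicative_score(p_l1, p_l2):
    # P(L1) * P(L2): a soft combination of both levels
    return p_l1 * p_l2
```

The multiplicative form penalizes a document that barely passes both levels, while the Boolean form treats each level as an independent gate.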

Page 10

Previous Evaluations in Hierarchical Classification (2)

Aixin Sun et al. (JASIST '03): "Expanded Precision and Recall," considering category similarity and the contributions of misclassified documents.

Limitations: difficult to compare with flat methods directly; too complex to calculate; no concern about internal-node performance.

Page 11

SVM in text categorization

First suggested by Joachims (1997), who showed the superiority of SVMs over other methods with experiments on Reuters-21578 (flat method); theoretical learning model in TC (SIGIR '01).

SVM with the hierarchical method: Dumais and Chen (SIGIR '00), LookSmart Web directory (www.looksmart.com), 17,173 categories organized into a 7-level hierarchy.

Tao Li et al. (SIGIR '03): 20 Newsgroups, optimally clustered 2-level hierarchy; only measures the accuracy of a classifier.

Page 12

Contents

Introduction Related Work Measure for Hierarchical Classifier Hierarchical Classification Demo Experiment Contribution Future Work

Page 13

New evaluation of hierarchical classification

Intermediate Precision and Recall: for an internal-node classifier; used to select the classifier with optimal performance at an intermediate level.

Approximate Precision and Recall: performance of the entire system in the middle of the construction process.

Overall P and R of the hierarchical system: applicable to a hierarchical classifier; compatible with the traditional P and R of flat classification.

Page 14

Evaluation of multi-labeled hierarchical classification

Given 4 categories and 10 test documents, # of predictions: 4 × 10 = 40.

Category tree: root → A, BC, D; BC → B, C. (Ac: actual class, Pr: predicted class.)

Delayed Evaluation
  Doc_1: A (Ac -, Pr -) TN; BC (+, +) TP; D (-, +) FP; then within BC: B (+, -) FN, C (+, +) TP
  Doc_2: A (-, -) TN; BC (+, -) FN; D (-, +) FP; then within BC: B (+, -) FN, C (+, -) FN

Pre-expanded Evaluation
  Doc_1: A (-, -) TN; B (+, -) FN; C (+, +) TP; D (-, +) FP
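The per-category contingency counting illustrated above can be sketched as a small helper. This is an illustration of the counting scheme only; the dict-of-sets encoding of actual and predicted labels is an assumption.

```python
# For each (category, document) pair, compare actual vs. predicted
# membership and tally the four contingency outcomes.

def count_outcomes(categories, actual, predicted):
    """actual/predicted map a doc id to its set of category labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for doc in actual:
        for c in categories:
            in_act = c in actual[doc]
            in_pred = c in predicted.get(doc, set())
            if in_act and in_pred:
                counts["TP"] += 1
            elif in_act:
                counts["FN"] += 1
            elif in_pred:
                counts["FP"] += 1
            else:
                counts["TN"] += 1
    return counts
```

With 4 categories and 10 documents this yields exactly the 40 predictions counted on the slide.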

Page 15

Intermediate Recall of an Internal Classifier

$$P_j = \frac{TP_j}{TP_j + FP_j}, \qquad R_j = \frac{TP_j}{TP_j + FN_j \cdot NLC_j}$$

NLC_j is the weighting factor: the number of all leaf-node classifiers that are descendants of node j. Reasonable in micro-averaged evaluation.
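The intermediate measures above are straightforward to compute; a minimal sketch, with the function signature chosen for illustration:

```python
# Intermediate P and R for an internal classifier j: precision is the usual
# TP/(TP+FP), while recall weights each false negative by NLC_j, the number
# of descendant leaf classifiers blocked by that miss.

def intermediate_pr(tp, fp, fn, nlc):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn * nlc)   # each internal miss blocks nlc leaves
    return precision, recall
```

The weighting reflects that a document wrongly rejected at node j can never reach any of the nlc leaf classifiers below it.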

Page 16

Internal-node classifier vs. leaf-node classifier

[Diagram: under Business, internal nodes such as Meat (with child Pork) sit above leaf categories C1 … Ci, Cj, Cj+1, … Cn, alongside Grain and Oil.]

Ex) NLC_j(Meat) = 5

Page 17

Approximate Precision and Recall at level k

$$P_k = \frac{\sum_{i \in LC_k} TP_i + \sum_{j \in IC_k} TP_j}{\sum_{i \in LC_k} (TP_i + FP_i) + \sum_{j \in IC_k} (TP_j + FP_j)}$$

$$R_k = \frac{\sum_{i \in LC_k} TP_i + \sum_{j \in IC_k} TP_j}{\sum_{i \in LC_k} (TP_i + FN_i) + \sum_{j \in IC_k} (TP_j + WFN_j) + \sum_{m \in IC_m} WFN_m}$$

$$WFN_j = FN_j \cdot NLC_j, \qquad WFN_m = FN_m \cdot NLC_m$$

where TP_i is the # of true positives at leaf classifier i; TP_j, FP_j, and FN_j are counted at internal classifier j; LC_k and IC_k denote the leaf and internal classifiers at level k, and IC_m the remaining internal classifiers above level k.

Page 18

Overall Recall in HTC, Rh

Definition:

$$P_h = P, \qquad R_h = \frac{\sum_{i=1}^{m} TP_i}{\sum_{i=1}^{m} TP_i + \sum_{j \in \text{Leaf nodes}} FN_j + \sum_{k \in \text{Internal nodes}} WFN_k}$$

$$WFN_k = FN_k \cdot NLC_k$$

where TP_i is the # of true positives at leaf classifier i, FN_j the # of false negatives at leaf classifier j, WFN_k the weighted FN_k at internal classifier k, and NLC_k the number of all lower leaf classifiers of classifier k.
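The overall recall of the hierarchical system can be sketched directly from the definition above; the list-of-pairs representation of the internal-node counts is an illustrative choice:

```python
# Overall hierarchical recall R_h: leaf-level true positives over leaf TPs
# plus leaf FNs plus NLC-weighted false negatives at internal nodes.

def overall_recall(leaf_tp, leaf_fn, internal_fn_nlc):
    """leaf_tp, leaf_fn: per-leaf-classifier counts;
    internal_fn_nlc: list of (FN_k, NLC_k) pairs for internal classifiers."""
    tp = sum(leaf_tp)
    wfn = sum(fn * nlc for fn, nlc in internal_fn_nlc)
    return tp / (tp + sum(leaf_fn) + wfn)
```

With no internal-node misses the expression reduces to ordinary flat recall, which is what makes the measure compatible with flat classification.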

Page 19

Contents

Introduction Related Work Measure for Hierarchical Classifier Hierarchical Classification Demo Experiment Contribution Future Work

Page 20

Contents

Introduction Related Work Measure for Hierarchical Classifier Hierarchical Classification Demo Experiment Contribution Future Work

Page 21

20 Newsgroups dataset

Usenet news article collection: 19,997 documents in 20 newsgroups. Each document consists of two parts, header and body. The collection allows us to consider the intrinsic hierarchy. 4.5% of the articles have been posted to more than one newsgroup, the cause of 'multi-classes' (e.g. 'alt.atheism' and 'talk.religion.misc').

Page 22

[Diagram: the 20 Newsgroups hierarchy. root → alt, comp, misc, rec, sci, soc, talk; alt → atheism; comp → graphics, os.ms-windows.misc, sys.ibm.hardware, sys.mac.hardware, windows.x; misc → forsale; rec → autos, motorcycles, sport.baseball, sport.hockey; sci → crypt, electronics, med, space; soc → religion.christian; talk → politics.guns, politics.mideast, politics.misc, religion.misc.]

A total of 8 classifiers is required.

Page 23

Classification Result

20 Newsgroups in a three-level tree:

                            BEP    Accuracy
  flat   baseline           75.9   89.2
         SIGIR-01           88.6   91.0
  hier   without evaluation 86.0   90.1
         with evaluation    89.0   94.3
         SIGIR-03           -      96.3

Page 24

Selection of Optimal Internal Classifier using Intermediate P and R

         COST  TN    FP   WFN  WTN    P     Rj    BEP
  comp.  700   1241  127  80   18080  90.7  93.9  92.3
         500   1242  125  75   18090  90.9  94.3  92.6
         100   1244  108  65   18175  92.0  95.0  93.5
         70    1243  105  70   18190  92.2  94.7  93.4
         50    1245  107  60   18180  92.1  95.4  93.7
         30    1242  104  75   18195  92.3  94.3  93.3
  sci.   300   985   230  80   15060  81.1  92.5  86.8
         200   985   195  80   15200  83.5  92.5  88.0
         150   983   177  88   15272  84.7  91.8  88.3
         100   981   158  96   15348  86.1  91.1  88.6
         80    977   136  112  15436  87.8  89.7  88.7
         50    967   96   152  15596  91.0  86.4  88.7
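The selection step itself can be sketched in a few lines, assuming (as the table suggests, though the slides later question this) that the intermediate BEP is the criterion; the `(cost, bep)` pair representation is an illustrative choice:

```python
# Among candidate SVM cost settings for an internal node, keep the one
# whose intermediate break-even point (BEP) is highest. On a tie, Python's
# max() keeps the first maximal entry encountered.

def select_optimal_cost(results):
    """results: list of (cost, bep) pairs; return the cost with max BEP."""
    return max(results, key=lambda cb: cb[1])[0]
```

Applied to the 'sci.' rows above, costs 80 and 50 tie at BEP 88.7, so the first of the two wins.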

Page 25

Approximate P and R

For the three-level hierarchy:

  Level-k  TP    FP   FN   Pk    Rk    BEP
  0        5043  543  299  90.3  94.4  92.3
  1        4884  594  579  89.2  89.4  89.3
  2        4817  485  713  90.9  87.1  89.0

Results for other hierarchies: the 2-level tree shows better performance than the 3-level tree, and the clustered hierarchy is comparable to ours.

  Tree type  TP    FP   accFN  Ph    Rh    BEP
  Two-level  4820  481  708    90.9  87.1  89.1
  Clustered  4845  529  719    87.1  90.2  88.6

Page 26

Contents

Introduction Related Work Measure for Hierarchical Classifier Hierarchical Classification Demo Experiment Contribution Future Work

Page 27

Contribution

Evaluation measure for a hierarchical classification system: final performance in terms of P and R; fully compatible with the previous measures; makes it possible to compare performance between a flat and a hierarchical classifier, and between different hierarchical classifiers.

Algorithm to efficiently construct a hierarchical classification system with good performance: intermediate evaluation in the middle of the construction process, while maintaining the original benefits of the hierarchical method (training-time savings, the good performance of SVMs, and easy applicability in an on-line execution environment).

Page 28

Future Work

Further research is required on: the appropriate number of subclasses (2 levels perform better than 3 levels); the criterion for selecting the optimal internal-node classifier (recall, BEP, or interpolation?); and expanding the target document collections (Reuters news articles, WebKB, real Web documents).

Page 29

The End

Thank you.

Page 30

Approximate Precision and Recall

Another helpful measure for constructing a hierarchical classification system: given a category tree of height K, compute the approximate P and R at level k. Helpful for recognizing how close the approximate performance is to the final performance of the entire system.

Page 31

Selection Criteria of Optimal Internal Classifier

SVM cost (upper node) = 80:

                   TP   FP  FN  TN    P     R     BEP
  sci.crypt        237  8   10  858   96.7  96.0  96.3
  sci.electronics  219  48  23  855   82.0  90.5  86.3
  sci.med          232  9   11  782   96.3  95.5  95.9
  sci.space        240  19  6   848   92.7  97.6  95.1
  total            928  84  50  3390  91.7  94.9  93.3
  Combined total        162           85.1  88.4

SVM cost (upper node) = 150:

                   TP   FP  FN  TN    P     R     BEP
  sci.crypt        238  8   10  904   96.7  96.0  96.4
  sci.electronics  224  53  21  862   80.9  91.4  86.2
  sci.med          234  9   11  906   96.3  95.5  95.9
  sci.space        240  23  6   891   91.3  97.6  94.4
  total            936  93  48  3563  91.0  95.1  93.0
  Combined total        136           87.3  89.2

The performance at cost 150 is superior to that at cost 80!

  COST  TN   FP   WFN  WTN    P     Rj    BEP    (sci.)
  300   985  230  80   15060  81.1  92.5  86.8
  200   985  195  80   15200  83.5  92.5  88.0
  150   983  177  88   15272  84.7  91.8  88.3
  100   981  158  96   15348  86.1  91.1  88.6
  80    977  136  112  15436  87.8  89.7  88.7
  50    967  96   152  15596  91.0  86.4  88.7

Page 32

Clustered 2-level Hierarchy

[Diagram: root with 8 clusters, grouping the 20 newsgroups as follows: alt.atheism, talk.religion.misc, and soc.religion.christian; talk.politics (guns, mideast, misc); sci (electronics, space, med); comp (.graphics, .os.ms-windows.misc, .sys.ibm.pc.hardware, .sys.mac.hardware, .windows.x); rec (motorcycles, sport.baseball, sport.hockey); misc.forsale; sci.crypt; rec.autos.]

Page 33

Support Vector Machine

Widely used in text categorization recently; shows good performance in classification tasks with large amounts of data and high dimensionality.

SVM training involves solving a quadratic program (for the α_i and b). The optimal solution gives rise to a decision function which we use in the prediction phase. Given l data points {(x_1, y_1), …, (x_l, y_l)}:

$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b\right)$$
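The decision function above can be evaluated with a few lines of plain Python; this sketch assumes a linear kernel (the inner product, as in the formula) and dense list vectors:

```python
# SVM prediction: f(x) = sgn(sum_i alpha_i * y_i * <x_i, x> + b).

def svm_predict(alphas, ys, xs, b, x):
    """alphas: dual coefficients; ys: labels in {-1, +1};
    xs: support vectors; b: bias; x: the point to classify."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    s = sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b
    return 1 if s >= 0 else -1
```

In the top-down hierarchical setting, the raw score s (before the sign) is exactly the value whose sign decides whether a document descends past a node.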

Page 34

Focus of Our Paper

Suggestion of a new measuring scheme: well suited to hierarchical classification; enables efficient construction of a hierarchical classifier; compatible with the previous measures, which enables easy comparison between flat and hierarchical classifiers.

An efficient hierarchical classification model: virtual category structure + SVM, with evaluation by intermediate precision and recall.