Intelligent Database Systems. Presenter: BEI-YI JIANG. Authors: GUANSONG PANG, SHENGYI JIANG. 2013, INFORMATION PROCESSING AND MANAGEMENT. A generalized cluster centroid based classifier for text categorization

A generalized cluster centroid based classifier for text categorization



Page 1

Intelligent Database Systems Lab

Presenter : BEI-YI JIANG

Authors : GUANSONG PANG, SHENGYI JIANG

2013. INFORMATION PROCESSING AND MANAGEMENT

A generalized cluster centroid based classifier for text categorization

Page 2

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3

Motivation

• KNN
  - With the exponential growth of online textual information, how to organize text data effectively and efficiently has become an important and demanding issue.

• Rocchio
  - Fails to obtain an expressive categorization model due to its inherent linear separability assumption.

Page 4

Objectives

• To strengthen the expressiveness of the Rocchio model.

• To employ the improved Rocchio model to speed up the categorization process of KNN.

Page 5

Methodology

• KNN

• Rocchio
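The two baseline classifiers named above can be sketched as follows. This is a minimal illustration, not the formulation used on the slides: it assumes documents are already dense term-weight vectors, uses cosine similarity, and implements only the centroid-only form of Rocchio (the classic Rocchio formula also subtracts a weighted centroid of negative examples, which is omitted here). All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """KNN: majority vote among the k training documents most similar to query.

    train is a list of (vector, label) pairs. Every training document is
    compared with the query, so prediction cost grows with the training set.
    """
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Centroid-only Rocchio: one mean vector (prototype) per class."""
    sums, counts = {}, Counter()
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the query to the class whose single centroid is most similar."""
    return max(centroids, key=lambda label: cosine(centroids[label], query))
```

Because Rocchio keeps a single prototype per class, prediction needs only one comparison per class, while KNN compares the query with every training document.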

Page 6

Methodology

Page 7

Methodology

Page 8

Methodology

• GCCC

• Determine the threshold
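The slide text does not say how the cluster centroids are built or how the threshold is chosen, so the following is only a sketch of one way a cluster centroid based classifier with a similarity threshold could look. It assumes a simple one-pass clustering inside each class (a document joins the most similar existing cluster if the similarity reaches `threshold`, otherwise it starts a new cluster) and nearest-centroid prediction. The function names and the default threshold of 0.8 are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def one_pass_clusters(vectors, threshold):
    """Scan the vectors once and return the centroids of the clusters formed."""
    clusters = []  # each cluster keeps a running sum and a member count
    for vec in vectors:
        best, best_sim = None, -1.0
        for c in clusters:
            centroid = [s / c["n"] for s in c["sum"]]
            sim = cosine(centroid, vec)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["sum"] = [s + x for s, x in zip(best["sum"], vec)]
            best["n"] += 1
        else:
            clusters.append({"sum": list(vec), "n": 1})
    return [[s / c["n"] for s in c["sum"]] for c in clusters]


def fit_cluster_centroids(train, threshold=0.8):
    """Cluster each class separately; the model is a list of (centroid, label)."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    model = []
    for label, vecs in by_label.items():
        for centroid in one_pass_clusters(vecs, threshold):
            model.append((centroid, label))
    return model


def predict(model, query):
    """Return the label of the cluster centroid most similar to the query."""
    _, label = max(model, key=lambda item: cosine(item[0], query))
    return label
```

With several centroids per class, a class whose documents fall into distinct topical groups is no longer forced onto a single prototype, which is the limitation of Rocchio that the deck points to. In this sketch a higher threshold yields more, tighter clusters, and a lower one moves the model back toward one centroid per class.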

Page 9

Experiments

Page 10

Experiments

Page 11

Experiments

Page 12

Experiments

Page 13

Experiments

Page 14

Experiments

Page 15

Experiments

Page 16

Conclusions

• Strengthens the expressiveness of the Rocchio model
• GCCC and its variants achieve impressive performance
• Obtains near linear time complexity in modeling
• GCCC's modeling stage is more time-consuming

Page 17

Comments

• Advantages
  - relatively stable
  - favorable performance

• Applications
  - online categorization