Intelligent Database Systems Lab

Presenter: CHANG, SHIH-JIE

Authors: Bui Quang Hung, Masanori Otsubo, Yoshinori Hijikata, Shogo Nishida

2010.WIA.

HITS algorithm improvement using semantic text portion

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

Previous research has tried to solve the following problems of HITS using anchor-related text.

• Link-spamming problem => BHITS method

• Automatically generated links and banner ads => topic drift problem

  Solution: identify important links => Chakrabarti’s method

[Figure: Page A with authority score A and Page B with hub score B, each connected to surrounding pages P.]

Objectives

• Investigate the effectiveness of using Semantic Text Portions (STPs) for improving the HITS algorithm.

Methodology – The HITS algorithm

[Figure: HITS on the base set built from root set R; labels: authority, hub, i.]
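The slide gives only the figure for HITS, so the following is a minimal sketch of the standard iteration, not the presenters' implementation. It assumes the base set is already available as a dictionary of out-links; the names `hits`, `normalize`, and `links` and the fixed iteration count are illustrative choices.

```python
import math

def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}

def hits(links, iterations=50):
    """links: dict mapping a page to the set of pages it links to.
    Returns (authority, hub) score dictionaries."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of q: sum of hub scores of the pages that link to q.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # Hub of p: sum of authority scores of the pages that p links to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub

# Toy graph (hypothetical): h1 links to a and b, h2 links to a only.
links = {"h1": {"a", "b"}, "h2": {"a"}, "a": set(), "b": set()}
auth, hub = hits(links)
```

On this toy graph, page a receives a higher authority score than b because two hubs point to it, and h1 receives a higher hub score than h2 because it points to both authorities.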

Methodology – The BHITS method

[Figure: BHITS on the base set built from root set R; labels: authority, hub, i, hub_wt, auth_wt.]
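The figure labels `auth_wt` and `hub_wt` refer to per-edge weights. As a hedged sketch of how such weights are commonly defined for BHITS (an assumption, since the slide does not spell them out): if k documents on one host link to the same target document, each of those edges gets authority weight 1/k; if one document has l edges to documents on the same target host, each of those edges gets hub weight 1/l. The helper names `host` and `bhits_weights` and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def host(url):
    """Host part of a URL, used to group documents by site."""
    return urlparse(url).netloc

def bhits_weights(edges):
    """edges: iterable of (source_url, target_url) pairs.
    Returns (auth_wt, hub_wt), two dicts keyed by edge."""
    edges = sorted(set(edges))
    # k: edges from documents on one host into a single target document.
    into_target = Counter((host(s), t) for s, t in edges)
    # l: edges from a single source document to documents on one host.
    from_source = Counter((s, host(t)) for s, t in edges)
    auth_wt = {(s, t): 1.0 / into_target[(host(s), t)] for s, t in edges}
    hub_wt = {(s, t): 1.0 / from_source[(s, host(t))] for s, t in edges}
    return auth_wt, hub_wt

# Three pages on one host all link to the same target; one independent page does too.
edges = [
    ("http://spam.example/1", "http://target.example/x"),
    ("http://spam.example/2", "http://target.example/x"),
    ("http://spam.example/3", "http://target.example/x"),
    ("http://other.example/p", "http://target.example/x"),
]
auth_wt, hub_wt = bhits_weights(edges)
```

With these weights, the authority update becomes a weighted sum of hub scores (each term multiplied by `auth_wt`) and the hub update a weighted sum of authority scores (each term multiplied by `hub_wt`), so the three same-host links together count as much as the single independent link.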

Methodology – Chakrabarti’s method

[Figure: authority and hub pages.]

Iteratively calculates authority scores and hub scores.
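The slide does not show how links are weighted in this method. A minimal sketch, assuming the weight of a link is 1 plus the number of query-term occurrences in a fixed-size text window around the anchor: the function name `anchor_window_weight`, the character-based window of 50, and the simple word tokenization are illustrative assumptions.

```python
import re

def anchor_window_weight(page_text, anchor_start, anchor_end, query_terms, window=50):
    """Weight for one link: 1 + number of query-term occurrences in the
    text from `window` characters before the anchor to `window` after it."""
    lo = max(0, anchor_start - window)
    hi = min(len(page_text), anchor_end + window)
    tokens = re.findall(r"\w+", page_text[lo:hi].lower())
    terms = {t.lower() for t in query_terms}
    return 1 + sum(1 for tok in tokens if tok in terms)

# Hypothetical page text with the anchor "Bicycling Club".
text = "Best bicycling routes: see the Bicycling Club for maps. Unrelated banner ad here."
start = text.index("Bicycling Club")
end = start + len("Bicycling Club")
weight = anchor_window_weight(text, start, end, ["bicycling"])
```

These weights then multiply the terms of the iterative authority and hub updates, so links whose surrounding text matches the query contribute more than banner ads or automatically generated links.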

Methodology – Semantic text portion (STP)

An STP is a text portion in the original page that is semantically related to the anchor pointing to the target page.

• LSP: Local Semantic Portion

• USP: Upper-level Semantic Portion

Methodology – Example of LSP

Methodology – Example of USP

Methodology – Collecting base set I

[Figure: root set R and base set; labels: 1, i.]
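The slide shows the base set only as a figure. As a sketch of the usual HITS construction (an assumption about this step, not a detail stated on the slide), the root set is expanded with the pages it links to and with a capped number of pages linking to it. The names `collect_base_set`, `out_links`, `in_links`, and the cap `max_in_links` are hypothetical.

```python
def collect_base_set(root_set, out_links, in_links, max_in_links=50):
    """Expand a root set into a base set.
    out_links / in_links: dicts mapping a page to a list of pages it
    links to / pages that link to it. At most `max_in_links` in-linking
    pages are added per root page."""
    base = set(root_set)
    for page in root_set:
        base.update(out_links.get(page, ()))
        base.update(list(in_links.get(page, ()))[:max_in_links])
    return base

# Toy data: one root page with two out-links and three in-links, capped at two.
base = collect_base_set(
    {"r1"},
    out_links={"r1": ["a", "b"]},
    in_links={"r1": ["x", "y", "z"]},
    max_in_links=2,
)
```

The cap on in-links keeps a very popular root page from flooding the base set with pages that merely point to it.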

Experiments

Ranking results for the architecture query

Ranking results for the bicycling query

Conclusions

• The use of STPs is best for improving the HITS algorithm.

Comments

• Advantages

- Effective.

• Applications

- Web mining, ranking web pages.
