Intelligent Database Systems Presenter: CHANG, SHIH-JIE Authors: Bui Quang Hung, Masanori Otsubo, Yoshinori Hijikata, Shogo Nishida 2010.WIA. HITS algorithm improvement using semantic text portion

Intelligent Database Systems Lab

Presenter: CHANG, SHIH-JIE

Authors: Bui Quang Hung, Masanori Otsubo, Yoshinori Hijikata, Shogo Nishida

2010.WIA.

HITS algorithm improvement using semantic text portion

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

Previous research has tried to solve the following problems using anchor-related text.
• Link-spamming problem => BHITS method
• Automatically generated links, banner ads => Topic drift problem

Solution: identify important links => Chakrabarti’s method

[Figure: Page A with its authority score A and Page B with its hub score B, each shown with linked pages P]

Objectives
• Investigate the effectiveness of using Semantic Text Portions (STPs) for improving the HITS algorithm.

Methodology – The HITS algorithm

[Figure: root set R expanded into the base set; labels: authority, hub]
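The authority and hub scores named on this slide can be sketched as follows. This is a minimal sketch of the standard HITS iteration (a page's authority is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to), not the presenters' own code. The function names `hits` and `normalize`, the dictionary-based graph format, and the fixed iteration count are assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run HITS on a base set.

    links: dict mapping each page to the pages it links to (hypothetical format).
    Returns (authority, hub) score dictionaries.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages pointing to each page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the (new) authority scores of the pages pointed to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return auth, hub
```

For example, with `links = {"h1": ["a", "b"], "h2": ["a"], "h3": ["a"]}`, page `a` receives the highest authority score and `h1` the highest hub score.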

Methodology – The BHITS method

[Figure: root set R and base set; authority and hub scores with edge weights auth_wt and hub_wt]
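The edge weights `auth_wt` and `hub_wt` on this slide can be illustrated with a short sketch. It assumes the weighting commonly described for BHITS (Bharat and Henzinger): if k pages on one host link to the same page on another host, each of those links gets authority weight 1/k; if one page has l links to pages on the same other host, each gets hub weight 1/l. The slide itself does not spell this out, so treat the rule, the helper names `host` and `edge_weights`, the URL-pair input format, and the dropping of same-host links as assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host(url):
    """Return the host part of a URL, e.g. 'x.com' for 'http://x.com/1'."""
    return urlparse(url).netloc


def edge_weights(edges):
    """Compute per-link authority and hub weights.

    edges: iterable of (source_url, target_url) pairs (hypothetical format).
    Same-host links are dropped (an assumption of this sketch).
    Returns (auth_wt, hub_wt), each a dict keyed by (source_url, target_url).
    """
    edges = [(s, t) for s, t in set(edges) if host(s) != host(t)]

    into_doc = defaultdict(int)    # (source host, target page) -> k
    out_of_doc = defaultdict(int)  # (source page, target host) -> l
    for s, t in edges:
        into_doc[(host(s), t)] += 1
        out_of_doc[(s, host(t))] += 1

    auth_wt = {(s, t): 1.0 / into_doc[(host(s), t)] for s, t in edges}
    hub_wt = {(s, t): 1.0 / out_of_doc[(s, host(t))] for s, t in edges}
    return auth_wt, hub_wt
```

In the weighted iteration, each term of the authority sum is multiplied by `auth_wt` of its link and each term of the hub sum by `hub_wt`, so a single host cannot inflate a page's score just by linking to it many times.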

Methodology – Chakrabarti’s method

[Figure: authority and hub scores]

Iteratively calculates authority scores and hub scores.

Methodology – Semantic text portion (STP)

STP is a text portion in the original page which is semantically related to the anchor pointing to the target page.
• LSP: Local Semantic Portion
• USP: Upper-level Semantic Portion

Methodology – Example of LSP

[Figure: example of an LSP]

Methodology – Example of USP

[Figure: example of a USP]

Methodology –

Methodology –

Collecting base set I

[Figure: root set R and base set]

Experiments

Ranking results for the architecture query

Ranking results for the bicycling query

Conclusions
• The use of STPs is best for improving the HITS algorithm.

Comments
• Advantages
- Effective.
• Applications
- Web mining, ranking web pages.