CSE 450 – Web Mining Seminar Professor Brian D. Davison Fall 2005 A Presentation on When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics K. Bharat & G. A. Mihaila WWW10 Conference, May 2001, Hong Kong by Osama Ahmed Khan 10/06/2005


CSE 450 – Web Mining Seminar
Professor Brian D. Davison

Fall 2005

A Presentation on

When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics

K. Bharat & G. A. Mihaila
WWW10 Conference, May 2001, Hong Kong

by Osama Ahmed Khan

10/06/2005


Problem

Query on Popular Topic
Content Analysis

Solution

Most Authoritative Pages


Technical Terms

Expert Recommendation Non-affiliation


Hilltop Algorithm

1. Expert Lookup
   Detecting Host Affiliation
   Expert Selection
   Expert Indexing

2. Target Ranking
   Computing Expert Score
   Computing Target Score


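Detecting Host Affiliation

Two hosts are affiliated if either condition holds:

Same first 3 octets of IP
   e.g. 127.0.0.1 and 127.0.0.15

Same rightmost non-generic token of hostname
   e.g. www.ibm.com and www.ibm.co.mx (generic suffixes .com and .co.mx ignored; token is "ibm")

Union-Find Algorithm (affiliation is transitive, so hosts are merged into affiliation sets)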
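The two conditions and the union-find step can be sketched as below. This is a minimal illustration, not the authors' implementation: the names `GENERIC_SUFFIX_TOKENS`, `ip_prefix`, `host_key`, `UnionFind`, and `affiliate` are hypothetical, and the list of generic suffix tokens is an assumed, incomplete sample.

```python
# Assumed sample of generic hostname tokens; a real list would be much longer.
GENERIC_SUFFIX_TOKENS = {"com", "co", "org", "net", "edu", "gov", "mx", "uk"}


def ip_prefix(ip):
    """Return the first 3 octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def host_key(hostname):
    """Return the rightmost token left after stripping generic suffix tokens."""
    tokens = hostname.lower().split(".")
    while tokens and tokens[-1] in GENERIC_SUFFIX_TOKENS:
        tokens.pop()
    return tokens[-1] if tokens else hostname.lower()


class UnionFind:
    """Disjoint sets with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def affiliate(hosts):
    """hosts maps hostname -> IP. Each host is merged with a node for its
    IP prefix and a node for its name token, so hosts sharing either one
    end up in the same set."""
    uf = UnionFind()
    for host, ip in hosts.items():
        uf.union(host, ("ip", ip_prefix(ip)))
        uf.union(host, ("name", host_key(host)))
    return uf
```

Two hosts are then treated as affiliated when `uf.find(a) == uf.find(b)`.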


Expert Selection

Retrieve all webpages with:

Out-degree > Threshold (k)

(e.g. k = 5)

Expert will have:

URLs pointing to k distinct non-affiliated hosts
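The selection test above can be sketched as a small filter. This is an illustration under assumptions: `is_expert` and `group_of` are hypothetical names, and `group_of` stands in for whatever maps a hostname to its affiliation set.

```python
from urllib.parse import urlsplit


def is_expert(out_links, group_of, k=5):
    """Return True if a page qualifies as an expert.

    out_links: absolute URLs found on the page.
    group_of:  callable mapping a hostname to its affiliation set id
               (assumed to be supplied by the affiliation step).
    """
    # Cheap check first: out-degree must exceed the threshold k.
    if len(out_links) <= k:
        return False
    hosts = (urlsplit(url).hostname for url in out_links)
    groups = {group_of(host) for host in hosts if host}
    # The links must reach at least k mutually non-affiliated hosts.
    return len(groups) >= k
```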


Expert Indexing

Inverted Index Mapping Keywords to Experts
   Key Phrases
   Match Positions
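A toy version of such an index is sketched below. The posting layout (expert, phrase number, word offset) and the name `build_index` are assumptions chosen for illustration; the slide does not specify the actual storage format.

```python
from collections import defaultdict


def build_index(experts):
    """Build keyword -> list of match positions.

    experts: dict mapping an expert URL to its list of key phrases.
    Each posting is (expert, phrase_id, offset): which expert, which of
    its key phrases, and where in that phrase the keyword occurs.
    """
    index = defaultdict(list)
    for expert, phrases in experts.items():
        for phrase_id, phrase in enumerate(phrases):
            for offset, word in enumerate(phrase.lower().split()):
                index[word].append((expert, phrase_id, offset))
    return index
```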


Computing Expert Score

Condition: at least 1 URL with all query keywords

Expert Score: (S0, S1, S2)

Si = SUM{key phrases p with k-i query terms} LevelScore(p) * FullnessFactor(p,q)

(k = number of terms in query q)

Expert_Score = 2^32 * S0 + 2^16 * S1 + S2
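The combination of the three partial scores can be sketched as follows. LevelScore and FullnessFactor are not defined on the slide, so this sketch takes them as precomputed inputs; the function name `expert_score` and the tuple layout are hypothetical.

```python
def expert_score(phrases, k):
    """Combine per-phrase scores into a single expert score.

    phrases: iterable of (matched, level_score, fullness) tuples, where
             matched is how many of the k query terms the key phrase
             contains; level_score and fullness are assumed to be given.
    k:       number of terms in the query.
    """
    s = [0.0, 0.0, 0.0]
    for matched, level_score, fullness in phrases:
        i = k - matched  # i = number of query terms the phrase is missing
        if 0 <= i <= 2:
            s[i] += level_score * fullness
    # Large multipliers make S0 dominate S1, and S1 dominate S2.
    return 2**32 * s[0] + 2**16 * s[1] + s[2]
```

With these weights, a single phrase matching every query term normally outranks many phrases that each miss one term.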


Computing Target Score

Condition: at least 2 non-affiliated experts

Target Score:

Edge_Score(E,T) = Expert_Score(E) * SUM{query keywords w} occ(w,T)

Target_Score = SUM{Edge_Score(E,T)}
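A minimal sketch of the target ranking step follows, assuming the occ counts and expert scores are already available. The names `target_scores`, `edges`, and `group_of` are hypothetical, and the sketch simply sums every edge; any special handling of affiliated experts that point to the same target is not covered here.

```python
def target_scores(edges, expert_score, group_of):
    """Score targets from expert-to-target edges.

    edges:        dict mapping (expert, target) -> {keyword: occ count}.
    expert_score: dict mapping expert -> Expert_Score(E).
    group_of:     callable mapping an expert to its affiliation set id.
    """
    per_target = {}
    for (expert, target), occ in edges.items():
        # Edge_Score(E,T) = Expert_Score(E) * sum of occ(w,T) over keywords w
        edge = expert_score[expert] * sum(occ.values())
        per_target.setdefault(target, []).append((expert, edge))

    scores = {}
    for target, items in per_target.items():
        groups = {group_of(expert) for expert, _ in items}
        # Keep only targets endorsed by at least 2 non-affiliated experts.
        if len(groups) >= 2:
            scores[target] = sum(edge for _, edge in items)
    return scores
```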


Evaluation

1. Locating Specific Popular Targets


Evaluation (Contd.)

2. Gathering Relevant Pages


Conclusion

Characteristics Popular Queries Expert Subset

Hilltop vs. PageRank Topic Distillation


Thank You