Searching and Browsing Using Tags Nikos Sarkas Social Information Systems Seminar DCS, University of Toronto, Winter 2007


Searching and Browsing Using Tags

Nikos Sarkas

Social Information Systems Seminar

DCS, University of Toronto, Winter 2007


Social Resource Sharing

The del.icio.us paradigm:
  Users store links to web pages of interest, along with arbitrary, user-specified tags, on a server.
The model is independent of the resource being shared:
  Music (Last.fm)
  Photos (Flickr)
  Publications (CiteULike)
  …


Part I: Searching


Ranking Web Search Results

Two prevalent models:
  Ranking based on query-document similarity
    TF/IDF
    Metadata extraction
    Link analysis
  Query-independent static ranking
    PageRank
    “Quality” based


Similarity Ranking, Take I

Query q={q1,q2,…,qn}.

Tags of URL p, T(p)={t1,t2,…,tm}.
Define similarity as |q∩T(p)|/|T(p)|.
Problems
  Synonymy (according to the authors)
  Others?
Synonymy example: Linux, Ubuntu and Gnome
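The Take I measure can be sketched in a few lines. This is a minimal illustration, assuming the query and the tag list are both treated as sets of lowercase strings (the case folding and the function name are assumptions, not part of the slides). The second call shows the synonymy problem: a query for "linux" scores zero against a URL tagged only with related terms.

```python
def tag_match_similarity(query_terms, url_tags):
    """Return |q ∩ T(p)| / |T(p)| for a query q and the tags T(p) of a URL."""
    q = {term.lower() for term in query_terms}   # assumption: case-insensitive match
    tags = {tag.lower() for tag in url_tags}
    if not tags:
        return 0.0                               # untagged URL: define similarity as 0
    return len(q & tags) / len(tags)


# Hypothetical examples
print(tag_match_similarity(["linux"], ["linux", "opensource"]))
print(tag_match_similarity(["linux"], ["ubuntu", "gnome"]))
```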


Similarity Ranking, Take II

Use tags with “similar” meaning to enrich query.

Create 3 matrices:
  MTP, tag-URL count matrix
  ST, tag-tag similarity matrix
  SP, URL-URL similarity matrix
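As a rough illustration of the first matrix, the sketch below builds a sparse tag-URL count structure from (user, URL, tag) bookmark triples. It assumes each entry counts the distinct users who applied a tag to a URL; the triple format, the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_tag_url_counts(bookmarks):
    """Build M_TP as a nested mapping: counts[tag][url] = number of users.

    bookmarks: iterable of (user, url, tag) triples.
    """
    counts = defaultdict(Counter)
    # Deduplicate so that a user who repeats a bookmark is counted once.
    for user, url, tag in set(bookmarks):
        counts[tag][url] += 1
    return counts


# Hypothetical sample data
bookmarks = [
    ("u1", "ubuntu.com", "linux"),
    ("u2", "ubuntu.com", "linux"),
    ("u2", "ubuntu.com", "linux"),   # duplicate, ignored
    ("u1", "gnome.org", "linux"),
    ("u2", "ubuntu.com", "ubuntu"),
]
m_tp = build_tag_url_counts(bookmarks)
print(dict(m_tp["linux"]))
```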


Similarity Ranking, Take II

Iterate

$$S_T^{k+1}(t_i,t_j) = \frac{C_T}{|P(t_i)|\,|P(t_j)|} \sum_{m=1}^{|P(t_i)|} \sum_{n=1}^{|P(t_j)|} \frac{\min\left(M_{TP}(t_i,p_m),\, M_{TP}(t_j,p_n)\right)}{\max\left(M_{TP}(t_i,p_m),\, M_{TP}(t_j,p_n)\right)}\, S_P^{k}(p_m,p_n)$$

Similarly update SP, until convergence. Then, similarity between a query q and a URL p is

$$sim(q,p) = \sum_{i=1}^{n} \sum_{j=1}^{m} S_T(q_i, t_j(p))$$
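The iteration can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation. It assumes that P(t) is the set of URLs tagged with t, that M_TP is given as a nested mapping with positive counts, that self-similarity is pinned to 1, that the damping constants are 0.7, and that a fixed number of rounds stands in for a convergence test. The URL update uses the freshly updated tag similarities, which is also an assumption.

```python
def social_sim_rank(m_tp, c_t=0.7, c_p=0.7, iterations=5):
    """Return (S_T, S_P) as dicts keyed by (tag, tag) and (url, url) pairs.

    m_tp: mapping tag -> {url: count}, every tag having at least one URL.
    c_t, c_p, iterations: assumed values, not taken from the slides.
    """
    tags = sorted(m_tp)
    urls = sorted({p for row in m_tp.values() for p in row})
    pages_of = {t: sorted(m_tp[t]) for t in tags}                   # P(t)
    tags_of = {p: [t for t in tags if p in m_tp[t]] for p in urls}  # T(p)

    # Start from identity similarity matrices.
    s_t = {(a, b): float(a == b) for a in tags for b in tags}
    s_p = {(a, b): float(a == b) for a in urls for b in urls}

    def ratio(x, y):
        return min(x, y) / max(x, y)

    for _ in range(iterations):
        new_t = {}
        for ti in tags:
            for tj in tags:
                if ti == tj:
                    new_t[(ti, tj)] = 1.0    # assumption: keep the diagonal at 1
                    continue
                total = sum(
                    ratio(m_tp[ti][pm], m_tp[tj][pn]) * s_p[(pm, pn)]
                    for pm in pages_of[ti]
                    for pn in pages_of[tj]
                )
                new_t[(ti, tj)] = c_t * total / (len(pages_of[ti]) * len(pages_of[tj]))

        new_p = {}
        for pi in urls:
            for pj in urls:
                if pi == pj:
                    new_p[(pi, pj)] = 1.0
                    continue
                total = sum(
                    ratio(m_tp[tm][pi], m_tp[tn][pj]) * new_t[(tm, tn)]
                    for tm in tags_of[pi]
                    for tn in tags_of[pj]
                )
                new_p[(pi, pj)] = c_p * total / (len(tags_of[pi]) * len(tags_of[pj]))

        s_t, s_p = new_t, new_p
    return s_t, s_p


def query_similarity(query_terms, url, m_tp, s_t):
    """sim(q, p): sum of S_T(q_i, t_j) over query terms and the tags of p."""
    url_tags = [t for t in m_tp if url in m_tp[t]]
    # Query terms outside the tag vocabulary contribute nothing.
    return sum(s_t.get((q, t), 0.0) for q in query_terms for t in url_tags)


# Hypothetical counts: "linux" and "ubuntu" share URLs, "recipes" does not.
m_tp = {
    "linux": {"a": 2, "b": 1},
    "ubuntu": {"a": 1, "b": 1},
    "recipes": {"c": 3},
}
s_t, s_p = social_sim_rank(m_tp)
print(s_t[("linux", "ubuntu")], s_t[("linux", "recipes")])
```

With this measure, a query for "linux" also gives credit to URLs tagged "ubuntu", which the Take I measure cannot do.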


Social PageRank

“Popular web pages are tagged by many up-to-date users, using hot tags”.

Transfer popularity between entities.
Define matrices MPU, MUT, MTP.

Iterate

$$\begin{aligned}
U_k &= M_{PU}^{T} P_k, & T_k &= M_{UT}^{T} U_k, \\
P'_k &= M_{TP}^{T} T_k, & T'_k &= M_{TP} P'_k, & U'_k &= M_{UT} T'_k, \\
P_{k+1} &= M_{PU} U'_k
\end{aligned}$$
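One round of the iteration pushes popularity from pages to users to tags and back again. The sketch below is a minimal pure-Python version, assuming M_PU is pages × users, M_UT is users × tags and M_TP is tags × pages, all given as nested lists. The uniform starting vector, the normalization after each round and the stopping tolerance are assumptions added to keep the numbers bounded; the slides do not specify them.

```python
def mat_vec(matrix, vector):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def normalize(vector):
    """Scale a vector so its entries sum to 1 (assumed, for stability)."""
    total = sum(vector)
    return [x / total for x in vector] if total else vector


def social_pagerank(m_pu, m_ut, m_tp, max_rounds=100, tol=1e-12):
    n_pages = len(m_pu)
    p = [1.0 / n_pages] * n_pages          # assumption: uniform start
    m_pu_t, m_ut_t, m_tp_t = transpose(m_pu), transpose(m_ut), transpose(m_tp)
    for _ in range(max_rounds):
        u = mat_vec(m_pu_t, p)             # U_k   = M_PU^T P_k
        t = mat_vec(m_ut_t, u)             # T_k   = M_UT^T U_k
        p1 = mat_vec(m_tp_t, t)            # P'_k  = M_TP^T T_k
        t1 = mat_vec(m_tp, p1)             # T'_k  = M_TP P'_k
        u1 = mat_vec(m_ut, t1)             # U'_k  = M_UT T'_k
        p_next = normalize(mat_vec(m_pu, u1))   # P_{k+1} = M_PU U'_k
        if max(abs(a - b) for a, b in zip(p, p_next)) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical data: 3 pages, 2 users, 2 tags.
m_pu = [[1, 1], [1, 0], [0, 1]]    # page 0 is bookmarked by both users
m_ut = [[2, 0], [1, 1]]            # how often each user applied each tag
m_tp = [[2, 1, 0], [0, 0, 1]]      # how often each tag was applied to each page
print(social_pagerank(m_pu, m_ut, m_tp))
```

In this toy data the page bookmarked by both users under the most used tag receives the highest score.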


Putting It All Together

Train a ranking function (RankSVM) using the following features:
  BM25 similarity between query and URL content
  Simple query-URL tags similarity measure
  Complex query-URL tags similarity measure
  PageRank
  Social PageRank
Results
  Precision, NDCG at k
  Small improvement over BM25, up to 25% for NDCG and synthetic queries
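For reference, NDCG at k can be computed as below. This sketch uses one common formulation (gain 2^rel − 1, log2 rank discount) and builds the ideal ordering from the same result list; the slides do not say which variant the evaluation used, so both choices are assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k graded relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)    # rank is 0-based
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels, listed in ranked order
print(ndcg_at_k([3, 2, 1], 3))
print(ndcg_at_k([1, 2, 3], 3))
```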


Part II: Browsing


Tag Assisted Browsing

Currently two methods for tag-driven browsing:
  Keyword search
  Clouds of popular tags
We would like to support:
  Semantic browsing: also present URLs annotated with similar tags
  Hierarchical browsing: browse in a top-down fashion


Semantic Browsing

Define similarity between tags:

$$sim(t_i,t_j) = \cos(P(t_i), P(t_j))$$

Synonymic tags: similarity above a threshold.
The synonymic tags and the tag itself define its semantic concept.
Given that the user has selected L tags that define semantic concepts Sc={C1,…,CL}, related URLs are:

$$\{\, p \mid \forall C \in S_c,\ Tags(p) \cap C \neq \emptyset \,\}$$
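A small sketch of semantic browsing under stated assumptions: P(t) is taken to be the vector of per-URL counts for tag t, stored as a sparse dict; the threshold value of 0.5 is arbitrary; and a URL is considered related when its tags intersect every selected concept. All names and data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {url: count} dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_concept(tag, tag_vectors, threshold=0.5):
    """The tag itself plus every tag whose similarity exceeds the threshold."""
    return {tag} | {
        other for other, vec in tag_vectors.items()
        if other != tag and cosine(tag_vectors[tag], vec) > threshold
    }


def related_urls(selected_tags, tag_vectors, threshold=0.5):
    """URLs whose tag set intersects the concept of every selected tag."""
    url_tags = defaultdict(set)
    for tag, vec in tag_vectors.items():
        for url in vec:
            url_tags[url].add(tag)
    concepts = [semantic_concept(t, tag_vectors, threshold) for t in selected_tags]
    return {url for url, tags in url_tags.items() if all(tags & c for c in concepts)}


# Hypothetical tag vectors
tag_vectors = {
    "linux": {"a": 2, "b": 1},
    "ubuntu": {"a": 1, "b": 1, "d": 1},
    "howto": {"b": 1, "c": 1},
    "photo": {"c": 3},
}
print(sorted(related_urls(["linux"], tag_vectors)))
print(sorted(related_urls(["linux", "howto"], tag_vectors)))
```

Selecting "linux" alone also returns URL "d", which is tagged only "ubuntu"; adding "howto" narrows the result to URLs that match both concepts.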


Hierarchical Browsing

Observations
  No neat tree structure
  Multiple ways to target a resource
  URLs associated with different categories
  Dynamic structure: leaves can become inner nodes


Hierarchical Browsing

Generating sub-tags
  Train a classifier to identify which of the tags in the semantic concept are sub-tags
  Features used: ratio of tag counts, intersection size, etc.
Clustering sub-tags
  Ranks tags based on a complex formula
  Greedy clustering technique