Searching and Browsing Using Tags
Nikos Sarkas
Social Information Systems Seminar
DCS, University of Toronto, Winter 2007
Social Resource Sharing
The del.icio.us paradigm: users store links to web pages of interest on a server, along with arbitrary, user-specified tags.
The model is independent of the resource being shared:
- Music (Last.fm)
- Photos (Flickr)
- Publications (CiteULike)
- …
Part I: Searching
Ranking Web Search Results
Two prevalent models:
- Ranking based on query-document similarity
  - TF/IDF
  - Metadata extraction
  - Link analysis
- Query-independent static ranking
  - PageRank
  - "Quality" based
Similarity Ranking, Take I
Query q = {q1, q2, …, qn}.
Tags of URL p: T(p) = {t1, t2, …, tm}.
Define similarity as |q ∩ T(p)| / |T(p)|.
Problems:
- Synonymy (according to the authors)
- Others?
Synonymy example: Linux, Ubuntu and Gnome
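The ratio above is easy to try out. The following is a minimal sketch, assuming queries and tag sets are plain lists of strings; the function name `simple_tag_similarity` and the sample tags are illustrative and not taken from the slides.

```python
def simple_tag_similarity(query_terms, url_tags):
    """Return |q ∩ T(p)| / |T(p)| for a query q and the tag set T(p) of a URL."""
    q = set(query_terms)
    t = set(url_tags)
    if not t:
        # Assumption: an untagged URL gets similarity 0 instead of raising.
        return 0.0
    return len(q & t) / len(t)


# One of the four tags matches the query.
print(simple_tag_similarity(["linux", "install"], ["linux", "ubuntu", "gnome", "howto"]))

# The synonymy problem: a page tagged only with related terms scores zero.
print(simple_tag_similarity(["linux"], ["ubuntu", "gnome"]))
```

The second call illustrates the synonymy problem from the slide: exact matching gives no credit to closely related tags.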
Similarity Ranking, Take II
Use tags with "similar" meaning to enrich the query.
Create 3 matrices:
- M_TP, tag-URL count matrix
- S_T, tag-tag similarity matrix
- S_P, URL-URL similarity matrix
Similarity Ranking, Take II
Iterate

$$S_T^{k+1}(t_i, t_j) = \frac{C_T}{|P(t_i)|\,|P(t_j)|} \sum_{m=1}^{|P(t_i)|} \sum_{n=1}^{|P(t_j)|} \frac{\min(M_{TP}(t_i, p_m), M_{TP}(t_j, p_n))}{\max(M_{TP}(t_i, p_m), M_{TP}(t_j, p_n))} \, S_P^k(p_m, p_n)$$

where p_m ranges over P(t_i) and p_n over P(t_j), the URLs annotated with each tag. Similarly update S_P, until convergence. Then, the similarity between a query q and a URL p is

$$sim(q, p) = \sum_{i=1}^{n} \sum_{j=1}^{m} S_T(q_i, t_j(p))$$
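The iteration can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: both similarity matrices start as the identity, the diagonal is pinned to 1 as in SimRank, the damping constants default to 0.7, and a fixed number of iterations stands in for a convergence test. The names `social_sim_rank`, `_update` and `query_url_similarity` are hypothetical.

```python
import numpy as np


def _update(M, S_other, c):
    """One similarity update for the row entities of M (tags, or URLs if M is transposed)."""
    n = M.shape[0]
    S = np.eye(n)  # assumption: self-similarity is fixed at 1
    neighbours = [np.flatnonzero(M[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ni, nj = neighbours[i], neighbours[j]
            if len(ni) == 0 or len(nj) == 0:
                continue
            total = 0.0
            for m in ni:
                for k in nj:
                    a, b = M[i, m], M[j, k]
                    total += min(a, b) / max(a, b) * S_other[m, k]
            S[i, j] = S[j, i] = c * total / (len(ni) * len(nj))
    return S


def social_sim_rank(M_TP, c_t=0.7, c_p=0.7, iters=10):
    """Alternate tag-tag and URL-URL updates; M_TP[t, p] is the tag-URL count."""
    M_TP = np.asarray(M_TP, dtype=float)
    S_T = np.eye(M_TP.shape[0])
    S_P = np.eye(M_TP.shape[1])
    for _ in range(iters):
        S_T = _update(M_TP, S_P, c_t)
        S_P = _update(M_TP.T, S_T, c_p)
    return S_T, S_P


def query_url_similarity(query_tags, url, M_TP, S_T):
    """sim(q, p): sum of S_T over all (query tag, URL tag) pairs, using tag indices."""
    url_tags = np.flatnonzero(np.asarray(M_TP)[:, url])
    return float(sum(S_T[q, t] for q in query_tags for t in url_tags))


# Tags 0 and 1 annotate the same two URLs; tag 2 annotates an unrelated URL.
M = np.array([[2, 1, 0],
              [2, 1, 0],
              [0, 0, 3]])
S_T, S_P = social_sim_rank(M, iters=5)
print(S_T.round(3))
```

In this toy matrix, tags 0 and 1 end up with a positive similarity because they co-occur on the same URLs, while tag 2 stays at zero similarity to both.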
Social PageRank
“Popular web pages are tagged by many up-to-date users, using hot tags”.
Transfer popularity between entities. Define matrices MPU, MUT, MTP.
Iterate

$$U_k = M_{PU}^T P_k, \quad T_k = M_{UT}^T U_k, \quad P'_k = M_{TP}^T T_k,$$
$$T'_k = M_{TP} P'_k, \quad U'_k = M_{UT} T'_k, \quad P_{k+1} = M_{PU} U'_k$$
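The popularity transfer is a chain of matrix-vector products. The sketch below assumes NumPy arrays with shapes pages × users, users × tags and tags × pages, a uniform starting vector, and a renormalization of the page vector after each round to keep the values bounded; the starting vector, the normalization and the fixed iteration count are assumptions of this sketch, and `social_pagerank` is a hypothetical name.

```python
import numpy as np


def social_pagerank(M_PU, M_UT, M_TP, iters=50):
    """Propagate popularity pages -> users -> tags -> pages and back again."""
    M_PU = np.asarray(M_PU, dtype=float)  # pages x users
    M_UT = np.asarray(M_UT, dtype=float)  # users x tags
    M_TP = np.asarray(M_TP, dtype=float)  # tags x pages
    n_pages = M_PU.shape[0]
    P = np.full(n_pages, 1.0 / n_pages)   # assumption: uniform start
    for _ in range(iters):
        U = M_PU.T @ P        # users inherit popularity from their pages
        T = M_UT.T @ U        # tags inherit popularity from their users
        P1 = M_TP.T @ T       # pages inherit popularity from their tags
        T1 = M_TP @ P1        # and back: tags from pages
        U1 = M_UT @ T1        # users from tags
        P_next = M_PU @ U1    # pages from users
        P = P_next / P_next.sum()  # assumption: rescale so the scores sum to 1
    return P


# Two pages, two users, one tag: page 0 is bookmarked by both users, page 1 by one.
M_PU = [[1, 1],
        [1, 0]]
M_UT = [[2],
        [1]]
M_TP = [[2, 1]]
scores = social_pagerank(M_PU, M_UT, M_TP)
print(scores)
```

The page bookmarked by more users receives the higher score. The rescaling step assumes every round produces a non-zero page vector, which holds whenever the three matrices describe at least one connected page-user-tag triple.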
Putting It All Together
Train a ranking function (RankSVM) using the following features:
- BM25 similarity between query and URL content
- Simple query-URL tags similarity measure
- Complex query-URL tags similarity measure
- PageRank
- Social PageRank
Results:
- Precision, NDCG at k
- Small improvement over BM25, up to 25% for NDCG and synthetic queries
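For reference, the two evaluation measures can be computed as follows. This is a sketch of one common formulation, with gain 2^rel - 1 and a log2 position discount for NDCG; the slides do not say which NDCG variant was used, so the exact gain and discount here are assumptions.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k results with a relevance grade above zero."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2(rank + 1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of the returned results, in ranked order.
ranked = [0, 2, 1, 0]
print(precision_at_k(ranked, 2), ndcg_at_k(ranked, 4))
```

A ranking that already lists results in descending relevance gets an NDCG of 1; any misordering lowers it.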
Part II: Browsing
Tag Assisted Browsing
Currently two methods for tag-driven browsing:
- Keyword search
- Clouds of popular tags
We would like to support:
- Semantic browsing: also present URLs annotated with similar tags
- Hierarchical browsing: browse in a top-down fashion
Semantic Browsing
Define similarity between tags:

$$sim(t_i, t_j) = \cos(P(t_i), P(t_j))$$

Synonymic tags: similarity above a threshold.
The synonymic tags and the tag itself define its semantic concept.
Given that the user has selected L tags that define semantic concepts S_c = {C_1, …, C_L}, related URLs are:

$$\{\, p \mid \forall C \in S_c,\ Tags(p) \cap C \neq \emptyset \,\}$$
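The two definitions combine into a short procedure. The sketch below assumes P(t) is represented as the row of a tag-URL count matrix, so the cosine is taken between count vectors; the function names, the threshold value and the toy data are illustrative assumptions.

```python
import numpy as np


def tag_similarity(M_TP):
    """Cosine similarity between every pair of tag rows of the tag-URL matrix."""
    M = np.asarray(M_TP, dtype=float)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unused tags
    unit = M / norms
    return unit @ unit.T


def semantic_concept(tag, sim, threshold):
    """The tag itself plus every tag whose similarity to it exceeds the threshold."""
    return {int(t) for t in np.flatnonzero(sim[tag] > threshold)} | {tag}


def related_urls(selected_tags, M_TP, sim, threshold):
    """URLs whose tag set intersects the semantic concept of every selected tag."""
    M = np.asarray(M_TP)
    concepts = [semantic_concept(t, sim, threshold) for t in selected_tags]
    urls = []
    for p in range(M.shape[1]):
        tags_p = {int(t) for t in np.flatnonzero(M[:, p])}
        if all(tags_p & c for c in concepts):
            urls.append(p)
    return urls


# Rows: tag 0 "linux", tag 1 "ubuntu", tag 2 "cooking". Columns: URLs 0..3.
M_TP = [[3, 2, 0, 0],
        [2, 3, 1, 0],
        [0, 0, 0, 4]]
sim = tag_similarity(M_TP)
print(related_urls([0], M_TP, sim, threshold=0.8))
```

With this data, selecting tag 0 also returns URL 2, which carries only tag 1: the two tags are similar enough to fall into the same semantic concept.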
Hierarchical Browsing
Observations:
- No neat tree structure
- Multiple ways to target a resource
- URLs associated with different categories
- Dynamic structure: leaves can become inner nodes
Hierarchical Browsing
Generating sub-tags:
- Train a classifier to identify which of the tags in the semantic concept are sub-tags
- Features used: ratio of tag counts, intersection size, etc.
Clustering sub-tags:
- Rank tags based on a complex formula
- Greedy clustering technique