Page 1: Slides SRS 2012

Folksonomy-Based Adaptive Query Expansion

Claudio Biancalana, Fabio Gasparetti, Alessandro Micarelli, Alfonso Miola, and Giuseppe Sansonetti

Department of Computer Science and Automation

Artificial Intelligence Laboratory, Roma Tre University

Via della Vasca Navale, 79, 00146 Rome, Italy

SRS 2012 – Montreal, Canada, July 17, 2012

Page 2: Slides SRS 2012

State of the Art

•  1993 - Web Search Engines

•  Popular techniques to improve their performance

  Explicit Relevance Feedback and (Automatic) Query Expansion (Maron and Kuhns 1960, Rocchio 1971)

  PageRank (1998)

  (Implicitly built) User Profiles (2004)

    •  e.g., Google Personalized

  Exploiting Social Networks or Signals (2010)

    •  Facebook, YouTube, Twitter

  Implicitly Understanding User Actions


Page 3: Slides SRS 2012


Query Expansion

The process of expanding a user query with additional related words and phrases

  Original Query: $Q = \{q_1, q_2, \ldots, q_k, q_{k+1}, \ldots, q_n\}$

  Terms to Add: $Q^{+} = \{e_1, e_2, \ldots, e_m\}$

  Terms to Remove: $Q^{-} = \{q_{k+1}, \ldots, q_n\}$

Expanded Query

$EQ = (Q \cup Q^{+}) - Q^{-} = \{q_1, q_2, \ldots, q_k, e_1, e_2, \ldots, e_m\}$
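The set expression above can be illustrated with a minimal sketch. The function name `expand_query` and the example terms are hypothetical, and the sketch keeps terms in first-seen order, which the slide does not specify.

```python
def expand_query(query, to_add, to_remove):
    """Return (Q union Q+) minus Q-, keeping first-seen order (an assumption)."""
    removed = set(to_remove)
    expanded = []
    for term in list(query) + list(to_add):
        # Skip removed terms and avoid duplicates in the expanded query.
        if term not in removed and term not in expanded:
            expanded.append(term)
    return expanded


# Hypothetical example: drop a stop word, add two related terms.
expanded = expand_query(["jaguar", "speed", "the"], ["car", "engine"], ["the"])
print(expanded)
```

Here Q- contains only terms already present in Q, matching the slide's definition of the terms to remove.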

Page 4: Slides SRS 2012


Building a Co-Occ Matrix

A co-occurrence matrix is generated for each document, and the per-document matrices are then summed into a single matrix

  Usually a POS tagger extracts nouns, proper nouns, and adjectives

[Figure: 5 × 5 grid of co-occurrence counts with rows and columns labeled t1 to t5]

Co-Occurrence Matrix
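As a rough sketch of the per-document construction and summation described on this slide: the code below assumes that each document has already been reduced to a list of extracted terms (the POS-tagging step is omitted) and that every unordered pair of distinct terms counts once per document. The slide does not state the counting rule, so both points are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def doc_cooccurrence(terms):
    """Sparse co-occurrence matrix for one document.

    Assumption: each unordered pair of distinct terms counts once per document.
    """
    counts = Counter()
    for a, b in combinations(sorted(set(terms)), 2):
        counts[(a, b)] += 1
    return counts


def corpus_cooccurrence(docs):
    """Sum the per-document matrices into a single matrix."""
    total = Counter()
    for terms in docs:
        total.update(doc_cooccurrence(terms))
    return total


# Hypothetical documents, already reduced to extracted terms.
docs = [
    ["query", "expansion", "search"],
    ["search", "engine", "query"],
]
matrix = corpus_cooccurrence(docs)
print(matrix[("query", "search")])
```

Storing only the pairs that actually occur keeps the matrix sparse, and sorting each pair stores the symmetric matrix only once.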

Page 5: Slides SRS 2012


Limits of Co-Occ Matrices

•  Furnas’ Vocabulary problem (1987)

  Polysemy and Homonymy

    •  Mouth (river-sea; cave entrance; body part)

    •  River Bank or Financial Bank

•  Corpus-dependent

  Small corpora yield sparse statistics

  Relevant concepts may be missing

Page 6: Slides SRS 2012


Research Question

Is it possible to combine Query Expansion, Social Web, Semantic Search, and User Personalization in traditional Web search tools?

Page 7: Slides SRS 2012


Nereau, Master of Spiders, is the name of a divinity worshipped in the Nauru islands, in Micronesia. It is a foremost figure in many myths, some of which give it a specific role, that of endowing the mad with rationality and the mute with speech, thus making them complete human beings.

Nereau

Page 8: Slides SRS 2012


Nereau Co-Occurrence Matrix

•  Extension of Co-occurrence matrix:

  Semantic meta-data as 3rd dimension

  The user matrix is built on usage data

•  Use of Social Bookmarking Services for metadata retrieval:

  e.g., delicious, StumbleUpon, Digg

Page 9: Slides SRS 2012


•  Tags associated with visited URLs are collected and linked to the (stemmed) keywords extracted from the page content.

•  Each co-occ matrix is associated with a tag

<t1, t2, tag, co-occ>

Nereau Co-Occurrence Matrix
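One possible way to hold the `<t1, t2, tag, co-occ>` tuples is a dictionary of sparse matrices keyed by tag. The sketch below is an assumption about the data layout, not the authors' implementation: stemming and the retrieval of tags from a bookmarking service are left out, and every name in it is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_tagged_matrices(visited_pages):
    """Build {tag: Counter({(t1, t2): co-occ})} from visited pages.

    visited_pages: iterable of (keywords, tags) pairs, one per visited URL.
    Assumption: a keyword pair counts once per page under each of its tags.
    """
    matrices = defaultdict(Counter)
    for keywords, tags in visited_pages:
        pairs = list(combinations(sorted(set(keywords)), 2))
        for tag in tags:
            for pair in pairs:
                matrices[tag][pair] += 1
    return matrices


def as_tuples(matrices):
    """Flatten the structure into sorted <t1, t2, tag, co-occ> tuples."""
    return sorted(
        (t1, t2, tag, count)
        for tag, matrix in matrices.items()
        for (t1, t2), count in matrix.items()
    )


# Hypothetical usage data: keywords already extracted, tags already fetched.
pages = [
    (["jaguar", "speed", "engine"], ["cars"]),
    (["jaguar", "habitat"], ["animals", "nature"]),
]
tagged = build_tagged_matrices(pages)
print(as_tuples(tagged))
```

The tag acts as the third dimension: the same keyword pair can carry different counts under different tags, which is what lets an ambiguous term such as "jaguar" be separated by sense.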

Page 10: Slides SRS 2012


•  The expansion follows similar steps: each query term retrieves multiple co-occ matrices associated with different tags

•  The occurrences of each tag are summed over all the query terms, yielding a weighted set of tags

Nereau Co-Occurrence Matrix

Page 11: Slides SRS 2012


The co-occ keywords associated with the most relevant tags compose the new query

Nereau Co-Occurrence Matrix
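The expansion steps can be sketched as follows, under stated assumptions: the tag weight is taken to be the sum of the co-occurrence values of all pairs that contain a query term, and the numbers of tags and keywords to keep (`top_tags`, `top_terms`) are hypothetical parameters. The slides do not give the exact weighting formula, so this is only an illustration.

```python
from collections import Counter

# Hypothetical tag-indexed co-occurrence data: {tag: {(t1, t2): count}}
TAGGED = {
    "cars": {("engine", "jaguar"): 3, ("jaguar", "speed"): 2},
    "animals": {("habitat", "jaguar"): 1},
}


def tag_weights(query, tagged):
    """Sum, over all query terms, the co-occurrences stored under each tag."""
    weights = Counter()
    for term in query:
        for tag, matrix in tagged.items():
            for pair, count in matrix.items():
                if term in pair:
                    weights[tag] += count
    return weights


def expand(query, tagged, top_tags=1, top_terms=2):
    """Append the strongest co-occurring keywords of the top-weighted tags."""
    weights = tag_weights(query, tagged)
    candidates = Counter()
    for tag, _ in weights.most_common(top_tags):
        for (t1, t2), count in tagged[tag].items():
            for term, other in ((t1, t2), (t2, t1)):
                if term in query and other not in query:
                    candidates[other] += count
    return list(query) + [t for t, _ in candidates.most_common(top_terms)]


weights = tag_weights(["jaguar"], TAGGED)
new_query = expand(["jaguar"], TAGGED)
print(weights, new_query)
```

With this toy data the tag "cars" outweighs "animals", so the query is expanded with car-related keywords only.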

Page 12: Slides SRS 2012


Experimental Evaluations

Does it work?

Page 13: Slides SRS 2012


•  Three kinds of evaluations

  TREC corpus-based (500K docs, 249 queries)

    •  RF vs CoOcc vs Google vs Nereau

  ODP corpus-based

    •  RF vs CoOcc vs Google vs Nereau

  Web user-based

    •  Google vs PersGoogle vs RF vs Nereau

•  nDCG, P@n, MAP

Experimental Evaluations
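For readers unfamiliar with the metrics named on this slide, the sketch below shows one common way to compute them. It is not the evaluation code used in the experiments. It assumes the log2-discounted form of DCG, derives the ideal ranking from the same result list, and normalizes average precision by the number of relevant results found in the list. Other variants exist.

```python
import math


def precision_at(relevant_flags, n):
    """P@n: fraction of the top-n results that are relevant."""
    return sum(relevant_flags[:n]) / n


def average_precision(relevant_flags):
    """AP for one query; MAP is the mean of AP over all queries."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg(gains, k):
    """Discounted cumulative gain over the top-k graded judgments."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg(gains, k):
    """nDCG@k: DCG divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal else 0.0


# Hypothetical judgments for one ranked list.
flags = [1, 0, 1, 0]
print(precision_at(flags, 2), average_precision(flags), ndcg([3, 2, 1], 3))
```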

Page 14: Slides SRS 2012

[Figure: Web corpus evaluation, 42 users on real Web sessions, nDCG@{1,5,10}]


Experimental Evaluations

Page 15: Slides SRS 2012

Conclusions and Future Work

•  A Nereau search engine that combines:

  Traditional Query Expansion

  Social Web

  Semantic Spaces

  Basic User Personalization

•  Suitable to be included in traditional search engines

  Complexity $O(n^2 K)$

  n = training docs

  K = keywords extracted

•  Future Work

  Including more social data (e.g., networks, user authority)

  Addressing the dynamics of folksonomies

  Automatically assigning tags when no social data is available
