Using Web Snippets and Query-Logs to Measure Implicit Temporal Intents in Queries
Ricardo Campos (1, 2, 4), Alípio Jorge (3, 4), Gaël Dias (2)
1 Tomar Polytechnic Institute, Tomar, Portugal
2 Centre of Human Language Technology and Bioinformatics, University of Beira Interior, Covilhã, Portugal
3 Faculty of Sciences, University of Oporto, Oporto, Portugal
4 LIAAD-INESC Porto L.A., Oporto, Portugal
QRU 2011 – 2nd International Query Representation and Understanding Workshop, in association with SIGIR 2011, Beijing, China, July 28, 2011
[www.ipt.pt] [www.liaad.up.pt] [hultig.di.ubi.pt]
[www.linkedin.com/in/camposricardo] [www.ccc.ipt.pt/~ricardo]
INTRODUCTION
MOTIVATIONS
Query: Lady Gaga. Official Website
This is a particularly hard task that can become even more difficult if the user's purpose is not clear.
Query: Lady Gaga. Informative and Rumor texts
Informative texts:
Rihanna passes Gaga as Facebook's most popular lady…
Rumor texts:
Lady Gaga, queen of extravagant fashion, is planning to intern for ... the milliner confirmed the rumors that the 'Born This Way' singer and he were ...
Query: Lady Gaga. Biography and Discography
Biography:
Discography Release
Query: Lady Gaga. Tour Dates
Understanding the temporal nature of a query, particularly an implicit one, is one of the most interesting challenges in Temporal Information Retrieval (T-IR) (Berberich et al., 2010), as it would make it possible to apply specific strategies to improve the retrieval of web search results.
DIFFICULTIES
Dealing with Implicit Temporal Queries is Difficult
However, this may prove to be a particularly difficult task:
1. Different semantic concepts can be related to a query;
2. It is difficult to define the boundary between what is temporal and what is not, and therefore to define temporal ambiguity;
3. Even if temporal intents can be inferred by human annotators, the question remains how to transpose this to an automatic process.
OBJECTIVES
Understand the Temporal Nature of Implicit Temporal Queries
In our work, we aim to understand whether temporal information can be used to automatically disambiguate query terms, particularly in implicit temporal queries.
DIFFERENT APPROACHES IN THE EXTRACTION OF T-I
Metadata-Based Approach
The extraction of temporal information usually follows a metadata-based approach over time-tagged controlled collections, such as news articles, using the timestamp of the document.
This information can be particularly useful to resolve relative temporal expressions found in a document (e.g., today) to a concrete date (e.g., the document creation time):
Jun 16, 2009 – The city of São Paulo shall have to make use of the Credicard Hall as the venue for the 2011 Miss Universe. Today was also announced that Miss Morumbi show is going to be on July 27, 2009. From Miss Universe.Com
However, it can be a tricky process if used to date implicit temporal queries, as the time of the document can differ significantly from the time of its actual content.
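A minimal sketch of how a document timestamp could anchor a relative temporal expression such as "today". The lookup table and the function name `resolve_relative` are hypothetical and only cover three expressions; the slides do not describe a concrete resolver.

```python
from datetime import date, timedelta

# Hypothetical lookup table: day offsets for a few relative expressions.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_relative(expression: str, document_date: date) -> date:
    """Map a relative temporal expression to a concrete date,
    using the document timestamp as the anchor."""
    key = expression.strip().lower()
    if key not in RELATIVE_OFFSETS:
        raise ValueError(f"unsupported expression: {expression!r}")
    return document_date + timedelta(days=RELATIVE_OFFSETS[key])


# Timestamp of the example snippet: Jun 16, 2009.
timestamp = date(2009, 6, 16)
resolved = resolve_relative("Today", timestamp)
print(resolved.isoformat())  # 2009-06-16
```

Note that the same snippet also mentions 2011 and July 27, 2009, which is why the timestamp alone may not reflect the time of the content.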
Content Approach. Query-Logs. Query-Dependency
One possible solution is to look for related temporal references in complementary web resources:
Content-Related Resources, based on a web content approach: simply require the set of web search results.
Query-Log Resources, based on similar year-qualified queries: imply that some versions of the query have already been issued.
Content-Related Resources
Query-Log Resources
Conclusions
WEB SNIPPETS
TEMPORAL INFORMATION
Temporal Evidence within Web Pages
One of the most interesting approaches to dating implicit temporal queries is to explore the temporal evidence found within web pages.
DIFFICULTIES
Correlation between the Dates and Query Concepts
However, using web documents to date queries that do not contain any temporal information can be a tricky process.
The main problem lies in the difficulty of associating the year dates found in a document with the query.
TEMPORAL VALUE
Measures
In this work, we aim to determine the temporal value of web snippets over a set of 450 queries (e.g., Oil Spill; BP Oil Spill; Waka Waka), using the measures TSnippets(.), TTitle(.) and TUrl(.):
\[ \mathrm{TSnippets}(.) = \frac{\#\,\text{Snippets Retrieved with Dates}}{\#\,\text{Snippets Retrieved}} \]
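The ratio above can be sketched as follows. This is an illustration only: it assumes that a "date" is a standalone four-digit year, the regular expression and the function name `temporal_value` are hypothetical, and the sample snippets are invented for the example.

```python
import re

# Assumption: a "date" is a standalone four-digit year between 1000 and 2999.
YEAR = re.compile(r"\b[12]\d{3}\b")


def temporal_value(texts):
    """Fraction of texts (snippets, titles or URLs) that contain at least one year."""
    if not texts:
        return 0.0
    with_dates = sum(1 for text in texts if YEAR.search(text))
    return with_dates / len(texts)


# Invented snippets, for illustration only.
snippets = [
    "BP oil spill: timeline of the 2010 disaster in the Gulf of Mexico",
    "Live feed of the oil spill response",
    "Claims process explained for affected residents",
    "Oil spill map updated in June 2010",
]
t_snippets = temporal_value(snippets)
print(t_snippets)  # 0.5
```

The same function could be applied to titles or URLs to obtain TTitle(.) and TUrl(.) style values, assuming they are defined analogously.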
TEMPORAL CLASSIFICATION
Temporal Ambiguity Value
Each query was classified on the basis of a temporal ambiguity value, TA(q):
If (TA(q) < 10%) then
    Query is ATemporal
Else
    Query is Temporal

Conceptual Classification    Number of Queries
Ambiguous                    220
Clear                        176
Broad                        54

Temporal Classification    Number of Queries    %
ATemporal                  132                  75%
Temporal                   44                   25%
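The threshold rule can be written as a small function. This is a sketch: TA(q) is taken as a given input, since the slides do not spell out how it is computed, and the function name and the sample values are hypothetical.

```python
def classify_query(ta: float, threshold: float = 0.10) -> str:
    """Apply the rule from the slide: TA(q) below 10% means ATemporal,
    otherwise Temporal. TA(q) is expressed as a fraction (0.10 = 10%)."""
    return "ATemporal" if ta < threshold else "Temporal"


# Hypothetical TA(q) values, for illustration only.
example_values = {"query A": 0.04, "query B": 0.10, "query C": 0.37}
labels = {query: classify_query(ta) for query, ta in example_values.items()}
print(labels)
# {'query A': 'ATemporal', 'query B': 'Temporal', 'query C': 'Temporal'}
```

A value exactly at the threshold falls on the Temporal side, because the rule uses a strict "less than" comparison.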
Evaluation
To evaluate our simple classification model, we conducted a user study;
Human annotators were asked to consider each of the 176 queries, look at the web search results, and classify each query as ATemporal or Temporal;
Overall, human annotators classified 35% of the queries as implicitly temporal, while our methodology classified only 25% as such.
WEB QUERY LOGS
TEMPORAL INFORMATION
Completion Search-Engine Features
Another approach to dating implicit temporal queries is to use web query logs, based on similar year-qualified queries:
Bp oil spill
Bp oil spill live feed
Bp oil spill 2010
Bp oil spill map
Bp oil spill claims
DIFFICULTIES
Web Query Logs Drawbacks
Extremely hard to access outside the big industrial labs;
Do not cover queries that have never been typed and thus do not exist in the web search log, e.g., Blaise Pascal 1623 (his year of birth);
Highly dependent on the user's own intents;
Not adapted to concept disambiguation;
Query: Euro. Euro 2008; Euro 2012.
TEMPORAL VALUE
Measures
Explicit temporal queries represent only 1.21% of the overall set [5];
Furthermore, the simple fact that a query is year-qualified does not necessarily mean that it has a temporal intent;
Similarly to TTitle(.), TSnippets(.) and TUrl(.), we define TLogYahoo(.) and TLogGoogle(.):
\[ \mathrm{TLogGoogle}(.) = \frac{\#\,\text{Suggested Queries Retrieved with Dates}}{\#\,\text{Suggested Queries Retrieved}} \]
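As an illustration, the measure can be computed over the "bp oil spill" completion list shown earlier in the deck. This is a sketch under the assumption that a date is a four-digit year; the function name `t_log` is hypothetical, and it also returns the distinct years found, which is not part of the measure itself.

```python
import re

# Assumption: dates in suggested queries are standalone four-digit years.
YEAR = re.compile(r"\b[12]\d{3}\b")


def t_log(suggestions):
    """Return the share of suggested queries that contain a year,
    together with the sorted list of distinct years seen."""
    years = set()
    dated = 0
    for suggestion in suggestions:
        found = YEAR.findall(suggestion)
        if found:
            dated += 1
            years.update(int(year) for year in found)
    ratio = dated / len(suggestions) if suggestions else 0.0
    return ratio, sorted(years)


suggestions = [
    "bp oil spill",
    "bp oil spill live feed",
    "bp oil spill 2010",
    "bp oil spill map",
    "bp oil spill claims",
]
ratio, years = t_log(suggestions)
print(ratio, years)  # 0.2 [2010]
```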
Results
Pearson correlation coefficient between each of the dimensions TSnippets(.), TTitle(.), TUrl(.), TLogGoogle(.) and TLogYahoo(.):

              TLogGoogle    TTitle    TSnippet    TUrl
TLogYahoo     0.63          0.61      0.52        0.48
TLogGoogle                  0.69      0.63        0.44

Results show that TLogGoogle(.) correlates most strongly with TTitle(.) (0.69) and TSnippet(.) (0.63). This means that, as dates appear in the titles and snippets, they also tend to appear, albeit to a lesser extent, in the auto-complete query suggestions of Google.
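For reference, a Pearson correlation coefficient between two per-query measures can be computed as in the sketch below. The sample values are hypothetical and are not taken from the experimental datasets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-query values, for illustration only.
t_title = [0.10, 0.40, 0.25, 0.05, 0.60]
t_log_google = [0.00, 0.20, 0.10, 0.00, 0.30]
r = pearson(t_title, t_log_google)
print(round(r, 2))
```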
An additional analysis led us to conclude that temporal information is more frequent in web snippets than in either of the query logs of Google and Yahoo!;
Overall, while most of the queries have a TSnippet(.) value of around 20%, TLogYahoo(.) and TLogGoogle(.) are mostly close to 0%.
Finally, we studied how strongly a given query is associated with a set of different dates, both in web snippets and in web query logs.
To this end, we built a confidence interval for the difference of means, for paired samples, between the number of times that dates appear in web snippets and in web query logs:
TLogGoogle(.)    [5.10; 6.38]
TLogYahoo(.)     [5.12; 6.43]
Results show that the number of different dates that appear in web snippets is significantly higher than in either of the two web query logs.
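A paired-sample confidence interval for a difference of means can be sketched as below. Several points are assumptions rather than details from the slides: the 95% confidence level, the normal approximation (a Student t quantile would be more accurate for small samples), the function name, and the sample counts, which are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_mean_diff_ci(a, b, confidence=0.95):
    """Confidence interval for the mean of the paired differences a[i] - b[i],
    using a normal approximation (an assumption; suited to large samples)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need paired samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    centre = mean(diffs)
    std_error = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * std_error, centre + z * std_error


# Hypothetical counts of distinct dates per query, for illustration only.
dates_in_snippets = [7, 9, 4, 12, 6, 8, 10, 5]
dates_in_query_log = [1, 2, 0, 3, 1, 0, 2, 1]
low, high = paired_mean_diff_ci(dates_in_snippets, dates_in_query_log)
print(f"[{low:.2f}; {high:.2f}]")
```

An interval that lies entirely above zero, as in the intervals reported on the slide, indicates that the first sample has the higher mean at the chosen confidence level.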
CONCLUSIONS
Temporal Value of Web Snippets and Web Query Logs
In this paper, we showed that web snippets are a very rich source of temporal information, especially years. The occurrence of dates in snippets and in titles is often correlated;
Some of the items even have more than one date;
Results show that future dates are very common in web snippets, but seldom used in queries;
Dates mostly appear together with the categories of automotive, sports and politics, both in web snippets and in web query logs;
Contrary to web snippets, web query logs have a very small temporal value (about 1.2%), which is statistically significantly smaller than that of web snippets.
Query Understanding based on Web Snippets
Our experiments also showed that web snippets can be used for query understanding;
We introduced a simple model for the temporal classification of queries, based on the temporal value of web snippets, which showed that 25% of the queries have a temporal nature. This value contrasts with the 35% that resulted from our user study;
So, the use of complementary information, such as the number of instances or the number of different dates, should be considered in future approaches.
Thanks for your attention!
Both experimental datasets are available for download at www.ccc.ipt.pt/~ricardo/software
VipAccess is online at http://hultig.di.ubi.pt/vipaccess
HULTIG is online at http://hultig.di.ubi.pt
LIAAD is online at http://liaad.up.pt
Polytechnic Institute of Tomar is online at http://www.ipt.pt
Gaël Dias is online at http://www.di.ubi.pt/~ddg
Alípio Jorge is online at http://liaad.up.pt/~amjorge