Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs
Reporter: Hsan-Yu Lin
Outline
• Introduction
• Related Work
• Reformulation Strategies
• Reformulation Effectiveness Metrics
• Discussion and Conclusion
Introduction
• Query reformulation (refinement)
  – Users frequently modify a previous search query in the hope of retrieving better results
• Goal:
  – Look at the types of query reformulation users perform
  – Evaluate them using effectiveness metrics such as click data
Related Work
• Computer-Generated Reformulations
• Query Session Boundary Detection
  – Automatic new topic identification using multiple linear regression (Information Processing & Management, 2006)
    • using time and common words
  – Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T '06)
    • using hierarchical clustering to find a better timeout value
Procedure
1. Create a taxonomy of query reformulation strategies, defined using a formal language
2. Build an unsupervised rule-based classifier to detect the different query reformulation strategies
3. Analyze correlations between query reformulation strategies and effectiveness metrics
Reformulation Strategies
• Definitions:
  – _ : space character
  – P = {', −, .} : punctuation
  – λ : empty string
  – Σ = {[a-z], [0-9]} ∪ P : alphabet
  – c_i ∈ Σ : character
  – w_i ∈ Σ* : word
  – z_i ∈ (Σ ∪ {_})* : any string
• REFORM. 1: WORD REORDER
  – seattle pizza palace --> pizza seattle palace
• REFORM. 2: WHITESPACE AND PUNCTUATION
  – wal mart, tomatoprices --> walmart tomato prices
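The two rules above can be illustrated with a minimal Python sketch. This is not the classifier from the paper: the function names are hypothetical, and the normalization drops every non-alphanumeric character, which is somewhat broader than the punctuation set P in the definitions.

```python
from collections import Counter


def is_word_reorder(q1: str, q2: str) -> bool:
    """True if q2 uses exactly the same words as q1 in a different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and Counter(w1) == Counter(w2)


def squash(query: str) -> str:
    """Keep only letters and digits (assumption: broader than the set P)."""
    return "".join(ch for ch in query if ch.isalnum())


def is_whitespace_punct_change(q1: str, q2: str) -> bool:
    """True if q1 and q2 differ only in whitespace and punctuation."""
    return q1 != q2 and squash(q1) == squash(q2)


print(is_word_reorder("seattle pizza palace", "pizza seattle palace"))
print(is_whitespace_punct_change("wal mart, tomatoprices", "walmart tomato prices"))
```

Both calls use the example queries from the slide.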
Reformulation Strategies
• REFORM. 3: REMOVE WORDS
  – yahoo stock price --> price yahoo
• REFORM. 4: ADD WORDS
  – eastlake home --> eastlake home price index
• REFORM. 5: URL STRIPPING
  – http www.yahoo.com --> yahoo
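A rough Python sketch of rules 3 to 5 follows. It is an illustration under assumptions, not the paper's formal definitions: word removal and addition are treated as proper-subset tests on word sets (so order is ignored), and the URL pattern and the list of top-level domains are hypothetical choices.

```python
import re


def is_remove_words(q1: str, q2: str) -> bool:
    """True if q2 keeps some, but not all, of the words in q1 (order ignored)."""
    s1, s2 = set(q1.split()), set(q2.split())
    return bool(s2) and s2 < s1


def is_add_words(q1: str, q2: str) -> bool:
    """True if q2 contains every word of q1 plus at least one new word."""
    return is_remove_words(q2, q1)


def strip_url(query: str) -> str:
    """Drop a leading scheme, a leading 'www.', and a trailing domain suffix."""
    q = re.sub(r"^https?\b[:/\s]*", "", query.strip())
    q = re.sub(r"^www\.", "", q)
    # Assumed list of suffixes; a real classifier would need a fuller list.
    return re.sub(r"\.(com|net|org|edu|gov)(/.*)?$", "", q)


def is_url_stripping(q1: str, q2: str) -> bool:
    """True if q2 is q1 with its URL decoration removed."""
    return q1 != q2 and strip_url(q1) == q2


print(is_remove_words("yahoo stock price", "price yahoo"))
print(is_add_words("eastlake home", "eastlake home price index"))
print(strip_url("http www.yahoo.com"))
```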
Reformulation Strategies
• REFORM. 6: STEMMING
  – running over bridges --> run over bridge
• REFORM. 7: FORM ACRONYM
  – personal computer --> pc
• REFORM. 8: EXPAND ACRONYM
  – pda --> personal digital assistant
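The acronym rules lend themselves to a short sketch. The version below assumes an acronym is simply the first letter of each word, which may be stricter than the rule in the paper; the function names are hypothetical. Stemming is left out because it needs an external stemmer.

```python
def acronym_of(phrase: str) -> str:
    """Concatenate the first character of every word in the phrase."""
    return "".join(word[0] for word in phrase.split())


def is_form_acronym(q1: str, q2: str) -> bool:
    """True if q2 is the first-letter acronym of the multi-word query q1."""
    return len(q1.split()) > 1 and acronym_of(q1) == q2


def is_expand_acronym(q1: str, q2: str) -> bool:
    """True if q1 is the first-letter acronym of the multi-word query q2."""
    return is_form_acronym(q2, q1)


print(is_form_acronym("personal computer", "pc"))
print(is_expand_acronym("pda", "personal digital assistant"))
```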
Reformulation Strategies
• REFORM. 9: SUBSTRING
  – is there spyware on my computer --> is there spywa
• REFORM. 10: SUPERSTRING
  – nevada police rec --> nevada police records 2008
• REFORM. 11: ABBREVIATION
  – shortened dict --> short dictionary
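A possible reading of rules 9 to 11, sketched in Python. Two assumptions are made here that the slides do not state: the substring and superstring tests are implemented as prefix tests (both examples above happen to be prefixes), and an abbreviation is taken to mean that each pair of corresponding words is related by a prefix in either direction.

```python
def is_substring(q1: str, q2: str) -> bool:
    """True if q2 is a strict prefix of q1 (assumed reading of SUBSTRING)."""
    return q1 != q2 and q2 != "" and q1.startswith(q2)


def is_superstring(q1: str, q2: str) -> bool:
    """True if q1 is a strict prefix of q2 (assumed reading of SUPERSTRING)."""
    return is_substring(q2, q1)


def is_abbreviation(q1: str, q2: str) -> bool:
    """True if the queries align word by word and each pair shares a prefix."""
    w1, w2 = q1.split(), q2.split()
    if w1 == w2 or len(w1) != len(w2):
        return False
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2))


print(is_substring("is there spyware on my computer", "is there spywa"))
print(is_superstring("nevada police rec", "nevada police records 2008"))
print(is_abbreviation("shortened dict", "short dictionary"))
```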
Reformulation Strategies
• REFORM. 12: WORD SUBSTITUTION
  – Synonym: easter egg search --> easter egg hunt
  – Hyponym: crimson scarf --> red scarf
  – Hypernym: personal computer --> laptop
  – Meronym: finger --> hand
  – Holonym: automobile --> wheel
• REFORM. 13: SPELLING CORRECTION
  – reformualtion --> reformulation
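Spelling correction can be approximated with an edit-distance test. The sketch below uses plain Levenshtein distance with a threshold of 2, the value the slides give under Classifier Rule Limitations; the function names are hypothetical, and word substitution is not covered because it depends on WordNet lookups.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca with cb
            ))
        prev = curr
    return prev[-1]


def is_spelling_correction(q1: str, q2: str, max_dist: int = 2) -> bool:
    """True if the queries differ by at most max_dist edits."""
    return 0 < levenshtein(q1, q2) <= max_dist


print(levenshtein("reformualtion", "reformulation"))  # 2: the swapped "al" costs two substitutions
print(is_spelling_correction("reformualtion", "reformulation"))
```

A fixed threshold is a blunt instrument: it can miss badly misspelled queries and can also match short, unrelated words that happen to be close in edit distance.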
Undetected Reformulations
• Categories of reformulations which are not included in the taxonomy:
  – Semantic Rephrasing
    • how to calculate nutritional values --> weight watchers calculator
  – Multi-Reformulations
    • lane county gabrage --> lane county garbage disposal (add words and spelling correction)
  – Classifier Rule Limitations
    • spelling correction used a Levenshtein edit distance of 2
    • WordNet database limitations
The Rule-based Classifier
Measures For Session Boundary Detection
• Test data:
  – 100 users in the AOL query logs for evaluation
  – Same queries were removed (40.8% of queries)
  – 9,091 query pairs
  – 2,483 reformulations and 6,608 new queries (27.3% reformulations)
• Aim for high precision, but not necessarily high recall
  – interested in inter-reformulation rather than intra-reformulation
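For reference, precision and recall for a reformulation detector can be computed as in the sketch below. The labels are toy data invented for illustration, not results from the AOL evaluation, and the function name is hypothetical.

```python
def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision and recall for the positive class 'is a reformulation'."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for five query pairs (True = reformulation, False = new query).
predicted = [True, False, False, True, False]
actual = [True, True, False, True, True]
print(precision_recall(predicted, actual))  # (1.0, 0.5)
```

In the toy data every predicted reformulation is correct (precision 1.0) while half of the true reformulations are missed (recall 0.5), which is the trade-off the slide describes.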
Reformulation Effectiveness Metrics
• Data: AOL query logs (released on 08/03/2006)
• Queries: 36,389,567
  – 16,069,421 new queries
  – 14,861,326 same queries
  – 3,411,706 reformulations
• Metrics
  – Click Pattern
  – Click URL
  – Rank Change of Clicked Results
Click Pattern
• (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick)
• (SkipSkip) vs. (SkipClick)
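The two comparisons can be tallied with a small Python sketch. It assumes that the first half of each label describes the action on the initial query and the second half the action on the reformulated query; the function names and the input pairs are hypothetical toy data.

```python
from collections import Counter


def click_pattern(clicked_first: bool, clicked_second: bool) -> str:
    """Label a query pair, e.g. 'SkipClick' = no click, then a click."""
    first = "Click" if clicked_first else "Skip"
    second = "Click" if clicked_second else "Skip"
    return first + second


def compare_patterns(pairs: list[tuple[bool, bool]]) -> dict[str, tuple[int, int]]:
    """Count the two comparisons listed on the slide."""
    counts = Counter(click_pattern(a, b) for a, b in pairs)
    return {
        # Reformulation ends in a skip vs. ends in a click, over all pairs.
        "all": (counts["SkipSkip"] + counts["ClickSkip"],
                counts["SkipClick"] + counts["ClickClick"]),
        # Same comparison restricted to pairs whose initial query was skipped.
        "after_skip": (counts["SkipSkip"], counts["SkipClick"]),
    }


# Toy data: (clicked on initial query, clicked on reformulated query).
pairs = [(False, False), (False, True), (True, True), (True, False), (False, True)]
print(compare_patterns(pairs))  # {'all': (2, 3), 'after_skip': (1, 2)}
```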
Click URL
Rank Change and Median Time between Queries
Discussion
• Different reformulation strategies were effective depending on the user's action (click or skip) on the initial query
  – Word substitution
    • Skip Skip
    • Click Click
  – Spelling correction
    • Skip Click
    • Click Skip
Limitations
• Lack of Context
• Normalized Query Logs
• Ambiguous Queries
  – 'american airlines', 'delta airlines'
• Search Engine Effects
Conclusions
• Describes the human side of query reformulation and contributes to our understanding of users in search interaction
• Adding or removing words, word substitution, acronym expansion, and spelling correction seem most effective
• Acronym formation and word reordering may be less beneficial to the user