Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs Reporter Hsan-Yu Lin


Outline

• Introduction
• Related Work
• Reformulation Strategies
• Reformulation Effectiveness Metrics
• Discussion And Conclusion


Introduction

• Query reformulation (refinement)
  – Users frequently modify a previous search query in hope of retrieving better results
• Goal:
  – Look at the types of query reformulation users perform
  – Evaluate them using effectiveness metrics such as click data


Related Work

• Computer-Generated Reformulations


Related Work

• Query Session Boundary Detection
  – Automatic new topic identification using multiple linear regression (Information Processing & Management 2006)
    • using time and common words
  – Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T '06)
    • using hierarchical clustering to find better timeout value


Procedure

1. Create a taxonomy of query reformulation strategies, defined in a formal language

2. Build an unsupervised rule-based classifier to detect the different query reformulation strategies

3. Analyze the correlations between query reformulation strategies and effectiveness metrics


Reformulation Strategies

• Definitions:
  _ : space character
  P = {', -, .} : punctuation
  λ : empty string
  Σ = {[a-z], [0-9]} ∪ P : alphabet
  c_i ∈ Σ : character
  w_i ∈ Σ* : word
  z_i ∈ (Σ ∪ {_})* : any string


Reformulation Strategies

• REFORM. 1: WORD REORDER
  – seattle pizza palace --> pizza seattle palace
• REFORM. 2: WHITESPACE AND PUNCTUATION
  – wal mart, tomatoprices --> walmart tomato prices
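
The word reorder and whitespace/punctuation rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the deck does not show the classifier's code, the function names are hypothetical, and the punctuation set is taken from the P defined on the definitions slide.

```python
# Illustrative sketch only; function names are hypothetical, not from the deck.
PUNCTUATION = "'-."  # assumed to match the punctuation set P in the definitions


def is_word_reorder(q1: str, q2: str) -> bool:
    """True when both queries hold the same words in a different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and sorted(w1) == sorted(w2)


def strip_space_and_punct(q: str) -> str:
    """Drop spaces and punctuation so only letters and digits remain."""
    return "".join(ch for ch in q if ch != " " and ch not in PUNCTUATION)


def is_whitespace_punct_change(q1: str, q2: str) -> bool:
    """True when the queries differ only in whitespace or punctuation."""
    return q1 != q2 and strip_space_and_punct(q1) == strip_space_and_punct(q2)


print(is_word_reorder("seattle pizza palace", "pizza seattle palace"))
print(is_whitespace_punct_change("wal mart", "walmart"))
print(is_whitespace_punct_change("tomatoprices", "tomato prices"))
```

Comparing the queries after stripping spaces and punctuation keeps the second rule independent of where the user placed the breaks.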


Reformulation Strategies

• REFORM. 3: REMOVE WORDS
  – yahoo stock price --> price yahoo
• REFORM. 4: ADD WORDS
  – eastlake home --> eastlake home price index
• REFORM. 5: URL STRIPPING
  – http www.yahoo.com --> yahoo
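
A minimal Python sketch of these three rules follows. It assumes that word order is ignored when checking for added or removed words (which is how the remove-words example reads), and the list of URL tokens is a hypothetical choice, not something given on the slide.

```python
# Illustrative sketch only; names and the URL token list are assumptions.
URL_TOKENS = {"http", "https", "www", "com", "net", "org"}


def is_remove_words(q1: str, q2: str) -> bool:
    """True when q2 keeps a non-empty proper subset of q1's words."""
    s1, s2 = set(q1.split()), set(q2.split())
    return bool(s2) and s2 < s1


def is_add_words(q1: str, q2: str) -> bool:
    """Adding words is the mirror image of removing words."""
    return is_remove_words(q2, q1)


def strip_url(q: str) -> str:
    """Split on URL separators and drop protocol and domain tokens."""
    tokens = q.replace(".", " ").replace("/", " ").replace(":", " ").split()
    return " ".join(t for t in tokens if t not in URL_TOKENS)


def is_url_stripping(q1: str, q2: str) -> bool:
    """True when q2 is what remains of q1 after removing the URL parts."""
    return q1 != q2 and strip_url(q1) == q2


print(is_remove_words("yahoo stock price", "price yahoo"))
print(is_add_words("eastlake home", "eastlake home price index"))
print(is_url_stripping("http www.yahoo.com", "yahoo"))
```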


Reformulation Strategies

• REFORM. 6: STEMMING
  – running over bridges --> run over bridge
• REFORM. 7: FORM ACRONYM
  – personal computer --> pc
• REFORM. 8: EXPAND ACRONYM
  – pda --> personal digital assistant
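
The two acronym rules can be illustrated with a short Python sketch. As a simplifying assumption it treats the whole query as the phrase or the acronym; the function names are hypothetical.

```python
# Illustrative sketch only; whole-query matching is a simplifying assumption.
def acronym_of(phrase: str) -> str:
    """Join the first character of every word in the phrase."""
    return "".join(word[0] for word in phrase.split())


def is_form_acronym(q1: str, q2: str) -> bool:
    """True when q2 consists of the initials of a multi-word q1."""
    return len(q1.split()) > 1 and acronym_of(q1) == q2


def is_expand_acronym(q1: str, q2: str) -> bool:
    """Expanding an acronym is the reverse of forming one."""
    return is_form_acronym(q2, q1)


print(is_form_acronym("personal computer", "pc"))
print(is_expand_acronym("pda", "personal digital assistant"))
```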


Reformulation Strategies

• REFORM. 9: SUBSTRING
  – is there spyware on my computer --> is there spywa
• REFORM. 10: SUPERSTRING
  – nevada police rec --> nevada police records 2008
• REFORM. 11: ABBREVIATION
  – shortened dict --> short dictionary
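
The sketch below shows one way these three rules might be checked in Python. The definitions are assumptions made for illustration: substring is taken as plain string containment, and abbreviation as a word-by-word prefix relation in either direction.

```python
# Illustrative sketch only; the exact rule definitions are assumptions.
def is_substring(q1: str, q2: str) -> bool:
    """True when the new query q2 is a strict, non-empty part of q1."""
    return q1 != q2 and q2 != "" and q2 in q1


def is_superstring(q1: str, q2: str) -> bool:
    """True when the new query q2 strictly contains q1."""
    return is_substring(q2, q1)


def is_abbreviation(q1: str, q2: str) -> bool:
    """True when corresponding words are prefixes of one another."""
    w1, w2 = q1.split(), q2.split()
    if len(w1) != len(w2) or w1 == w2:
        return False
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2))


print(is_substring("is there spyware on my computer", "is there spywa"))
print(is_superstring("nevada police rec", "nevada police records 2008"))
print(is_abbreviation("shortened dict", "short dictionary"))
```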


Reformulation Strategies

• REFORM. 12: WORD SUBSTITUTION
  – Synonym: easter egg search --> easter egg hunt
  – Hyponym: crimson scarf --> red scarf
  – Hypernym: personal computer --> laptop
  – Meronym: finger --> hand
  – Holonym: automobile --> wheel
• REFORM. 13: SPELLING CORRECTION
  – reformualtion --> reformulation


Undetected Reformulations

• Categories of reformulations which are not included in the taxonomy:
  – Semantic Rephrasing
    • how to calculate nutritional values --> weight watchers calculator
  – Multi-Reformulations
    • lane county gabrage --> lane county garbage disposal (add words and spelling correction)
  – Classifier Rule Limitations
    • spelling correction used a Levenshtein edit distance threshold of 2
    • WordNet database limitation
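
To make the edit-distance limitation concrete, here is a small Python sketch of the standard Levenshtein distance with a cutoff of 2. Applying it to the whole query string is an assumption; the deck does not say whether the classifier compared whole queries or individual words.

```python
# Illustrative sketch only; whole-query comparison is an assumption.
MAX_EDIT_DISTANCE = 2  # the threshold named on the slide


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def is_spelling_correction(q1: str, q2: str) -> bool:
    """True when the queries differ by at most MAX_EDIT_DISTANCE edits."""
    return 0 < levenshtein(q1, q2) <= MAX_EDIT_DISTANCE


print(is_spelling_correction("reformualtion", "reformulation"))
print(is_spelling_correction("lane county gabrage", "lane county garbage disposal"))
```

Under this rule the transposed letters in "reformualtion" fall within the threshold, while the multi-reformulation example exceeds it because the added word alone costs more than two edits.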


The Rule-based Classifier


Measures For Session Boundary Detection

• Test data:
  – 100 users in the AOL query logs for evaluation
  – Same queries were removed (40.8% of queries)
  – 9,091 query pairs
  – 2,483 reformulations and 6,608 new queries (27.3% reformulations)


Measures For Session Boundary Detection

• The aim is high precision, but not necessarily high recall
  – interested in inter-reformulation rather than intra-reformulation
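
For reference, precision and recall for the reformulation-versus-new-query decision can be computed as in the Python sketch below. The labels in the usage lines are made-up values for illustration, not data from the study.

```python
# Illustrative sketch only; the example labels are invented, not study data.
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean 'is a reformulation' labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier output versus hand labels for five query pairs.
predicted = [True, True, False, False, False]
actual = [True, True, True, True, False]
print(precision_recall(predicted, actual))
```

In this made-up case every pair flagged as a reformulation really is one, while half of the true reformulations are missed, which is the trade-off the slide accepts.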


Reformulation Effectiveness Metrics

• Data: AOL query logs (released on 08/03/2006)
• Queries: 36,389,567
  – 16,069,421 new queries
  – 14,861,326 same queries
  – 3,411,706 reformulations
• Metrics
  – Click Pattern
  – Click URL
  – Rank Change of Clicked Results


Click Pattern


• (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick)
• (SkipSkip) vs. (SkipClick)
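
The two comparisons can be sketched in Python as follows. The sketch assumes each query pair is recorded as two booleans (whether the initial query and the reformulated query received a click) and reads each comparison as the share of pairs in which the reformulated query was clicked; both the data layout and the function names are hypothetical.

```python
# Illustrative sketch only; data layout and names are assumptions.
from collections import Counter


def click_pattern(first_clicked: bool, second_clicked: bool) -> str:
    """Label a query pair, e.g. 'SkipClick' for no click, then a click."""
    first = "Click" if first_clicked else "Skip"
    second = "Click" if second_clicked else "Skip"
    return first + second


def reformulation_click_share(pairs, only_after_skip: bool = False) -> float:
    """Share of pairs whose reformulated query received a click."""
    counts = Counter(click_pattern(a, b) for a, b in pairs)
    if only_after_skip:
        # (SkipSkip) vs. (SkipClick)
        clicked, skipped = counts["SkipClick"], counts["SkipSkip"]
    else:
        # (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick)
        clicked = counts["SkipClick"] + counts["ClickClick"]
        skipped = counts["SkipSkip"] + counts["ClickSkip"]
    total = clicked + skipped
    return clicked / total if total else 0.0


# Made-up pairs: (initial query clicked?, reformulated query clicked?)
pairs = [(False, True), (False, False), (True, True), (True, False), (False, True)]
print(reformulation_click_share(pairs))
print(reformulation_click_share(pairs, only_after_skip=True))
```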


Click URL


Rank Change and Median Time between Queries


Discussion

• Different reformulation strategies were effective depending on the user's action (click or skip) on the initial query
  – Word substitution
    • Skip Skip
    • Click Click
  – Spelling correction
    • Skip Click
    • Click Skip


Limitations

• Lack of Context
• Normalized Query Logs
• Ambiguous Queries
  – ‘american airlines’, ‘delta airlines’
• Search Engine Effects


CONCLUSIONS

• The study describes the human side of query reformulation and contributes to our understanding of users in search interaction

• Adding or removing words, word substitution, acronym expansion, and spelling correction seem most effective

• Acronym formation and word reordering may be less beneficial to the user