Cumulative Progress in Language Models for Information Retrieval
Antti Puurula
6/12/2013
Australasian Language Technology Workshop
University of Waikato
Ad-hoc Information Retrieval
• Ad-hoc Information Retrieval (IR) forms the basic task in IR:
  • Given a query, retrieve and rank documents in a collection
• Origins:
  • Cranfield 1 (1958-1960), Cranfield 2 (1962-1966), SMART (1961-1999)
• Major evaluations:
  • TREC Ad-hoc (1990-1999), TREC Robust (2003-2005), CLEF (2000-2009), INEX (2009-2010), NTCIR (1999-2013), FIRE (2008-2013)
Illusionary Progress in Ad-hoc IR
• TREC ad-hoc evaluations stopped in 1999, as progress plateaued
• More diverse tasks became the foci of research
• “There is little evidence of improvement in ad-hoc retrieval technology over the past decade” (Armstrong et al. 2009)
  • Weak baselines, non-cumulative improvements
  • ⟶ “no way of using LSI achieves a worthwhile improvement in retrieval accuracy over BM25” (Atreya & Elkan, 2010)
  • ⟶ “there remains very little room for improvement in ad hoc search” (Trotman & Keeler, 2011)
Progress in Language Models for IR?
• Language Models (LM) form one of the main approaches to IR
• Many improvements to LMs not adopted generally or evaluated systematically:
  • TF-IDF feature weighting
  • Pitman-Yor Process smoothing
  • Feedback models
• Are these improvements consistent across standard datasets, cumulative, and do they improve on a strong baseline?
Query Likelihood Language Models
• Query Likelihood (QL) (Kalt 1996, Hiemstra 1998, Ponte & Croft 1998) is the basic application of LMs for IR
• Unigram case: using count vectors c^d and c^q to represent documents and queries, rank documents d given a query q according to p(d|q)
• Assuming a generative model p(q|θ_d) and uniform priors over documents d, Bayes' rule gives p(d|q) ∝ p(q|θ_d)
Query Likelihood Language Models 2
• The unigram QL-score for each document becomes:
  p(q|θ_d) = B(c^q) ∏_w p(w|θ_d)^(c_w^q)
• where B(c^q) is the Multinomial coefficient (constant for a fixed query, so it can be dropped in ranking), and document models are given by the Maximum Likelihood estimates: p(w|θ_d) = c_w^d / n^d, with n^d = Σ_w c_w^d
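The QL score can be sketched in a few lines of Python. This is a minimal illustration of unsmoothed ranking, not code from the toolkit used in the experiments; function and variable names are mine. The multinomial coefficient is dropped, since it is constant across documents for a fixed query:

```python
import math
from collections import Counter

def ql_score(query, doc):
    """Unigram query-likelihood log p(q|theta_d), multinomial coefficient
    dropped. Document model is the ML estimate p(w|theta_d) = c_w^d / n^d."""
    counts = Counter(doc)
    n_d = len(doc)
    score = 0.0
    for w, q_w in Counter(query).items():
        p = counts[w] / n_d
        if p == 0.0:
            # unsmoothed ML assigns zero probability to unseen query words,
            # which is exactly why smoothing (next slides) is needed
            return float("-inf")
        score += q_w * math.log(p)
    return score

docs = [["the", "cat", "sat"], ["the", "dog", "ran", "fast"]]
query = ["the", "cat"]
ranking = sorted(range(len(docs)), key=lambda i: ql_score(query, docs[i]),
                 reverse=True)
```

The zero-probability case for unseen query words motivates the smoothing methods on the following slides.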
Pitman-Yor Process Smoothing
• Standard methods for smoothing in IR LMs are Dirichlet Prior (DP) and 2-Stage Smoothing (2SS) (Zhai & Lafferty 2004, Smucker & Allan 2007)
• Recent suggested improvement is Pitman-Yor Process smoothing (PYP), an approximation to inference on a Pitman-Yor Process (Momtazi & Klakow 2010, Huang & Renals 2010)
• All methods interpolate unsmoothed parameters with a background distribution. PYP additionally discounts the unsmoothed counts
Pitman-Yor Process Smoothing 2
• All methods share the form:
  p(w|θ_d) = m_w^d / n^d + α_d p(w|C)
  where m_w^d are the (possibly discounted) document counts and α_d is the interpolation weight given to the background model p(w|C)
• DP: m_w^d = c_w^d n^d / (n^d + μ), and α_d = μ / (n^d + μ)
• 2SS: p(w|θ_d) = (1 − λ)(c_w^d + μ p(w|C)) / (n^d + μ) + λ p(w|C)
• PYP: m_w^d = max(c_w^d − δ, 0), and α_d = δ U^d / n^d, where δ is the discount and U^d the number of unique words in the document
Pitman-Yor Process Smoothing 3
• The background model p(w|C) is most commonly estimated by concatenating all collection documents into a single document:
  p(w|C) = Σ_d c_w^d / Σ_d n^d
• Less commonly, a uniform background model is used: p(w|U) = 1/W, where W is the vocabulary size
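The three smoothing methods and the concatenated-collection background model can be sketched as follows. This is a minimal illustration: the parameter values are arbitrary placeholders, and the PYP formula is the single-discount approximation (count minus a fixed discount δ, with the freed mass δ·U^d redistributed via the background model), which is one common way to approximate PYP inference:

```python
from collections import Counter

def background_model(collection):
    """p(w|C): ML estimate over all collection documents concatenated."""
    counts = Counter(w for doc in collection for w in doc)
    total = sum(counts.values())
    return lambda w: counts[w] / total

def smoothed_model(doc, p_bg, method="pyp", mu=1000.0, lam=0.7, delta=0.5):
    """Return a function w -> p(w|theta_d) under DP, 2SS, or PYP smoothing.
    mu, lam, delta are placeholder values; in the experiments they are
    optimized on development sets."""
    c = Counter(doc)
    n_d = len(doc)
    u_d = len(c)  # number of unique words in the document

    def p(w):
        if method == "dp":    # Dirichlet Prior
            return (c[w] + mu * p_bg(w)) / (n_d + mu)
        if method == "2ss":   # 2-Stage Smoothing: DP interpolated with p(w|C)
            return (1 - lam) * (c[w] + mu * p_bg(w)) / (n_d + mu) + lam * p_bg(w)
        if method == "pyp":   # PYP approximation: discount counts by delta,
            # give the freed mass delta * u_d to the background model
            return (max(c[w] - delta, 0.0) + delta * u_d * p_bg(w)) / n_d
        raise ValueError(method)
    return p
```

Each variant yields a proper distribution over the vocabulary, which is easy to check by summing p(w) over all words in the collection.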
TF-IDF Feature Weighting
• Multinomial modelling assumptions of text can be corrected with TF-IDF weighting (Rennie et al. 2003, Frank & Bouckaert 2006)
• Traditional view: IDF-weighting unnecessary with IR LMs (Zhai & Lafferty 2004)
• Recent view: combination is complementary (Smucker & Allan 2007, Momtazi et al. 2010)
TF-IDF Feature Weighting 2
• Dataset documents can be weighted by TF-IDF:
  ĉ_w^d = TF(c_w^d) · IDF(w)
• where c^d is the unweighted count vector, N the number of documents, and df_w the number of documents where word w occurs
• First factor is a TF log transform using unique length normalization (Singhal et al. 1996)
• Second factor is the Robertson-Walker IDF (Robertson & Zaragoza 2009)
TF-IDF Feature Weighting 3
• IDF has an overlapping function with collection smoothing (Hiemstra & Kraaij 1998)
• Interaction is taken into account by replacing the collection model p(w|C) with the uniform model p(w|U) in smoothing
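A TF-IDF reweighting of count vectors can be sketched as below. Note the TF and IDF variants here (a plain log(1 + c) transform and a log(N/df) IDF) are illustrative stand-ins, not the exact Singhal et al. unique-length-normalized TF and Robertson-Walker IDF named on the slides:

```python
import math
from collections import Counter

def tfidf_weight(docs):
    """Reweight raw count vectors c^d as TF(c_w^d) * IDF(w).
    TF and IDF forms here are simple illustrative choices."""
    N = len(docs)
    df = Counter()                 # df_w: number of documents containing w
    for doc in docs:
        df.update(set(doc))
    weighted = []
    for doc in docs:
        c = Counter(doc)
        weighted.append({w: math.log(1 + c[w]) * math.log(N / df[w])
                         for w in c})
    return weighted
```

With this IDF choice, a word occurring in every document gets weight zero, which is the usual effect of IDF damping on uninformative terms.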
Model-based Feedback
• Pseudo-feedback is a traditional method in Ad-hoc IR:
  • Using the retrieved documents for the original query q, construct and rank using a new query q̂
• With LMs two different formalizations enable model-based feedback:
  • KL-Divergence Retrieval (Zhai & Lafferty 2001)
  • Relevance Models (Lavrenko & Croft 2001)
• Both enable replacing the original query counts c^q by a model q̂
Model-based Feedback 2
• Many modeling choices exist for the feedback models, such as:
  • Using the top-ranked retrieved documents
  • Truncating the word vector to words present in the original query
  • Weighting the feedback documents using their retrieval scores p(q|θ_d)
  • Interpolating the feedback model with the original query
• These modeling choices are combined here
Model-based Feedback 3
• The interpolated query model q̂ is estimated for the query words w from the top-n document models θ_d:
  p(w|q̂) = (1 − β) c_w^q / n^q + β Z⁻¹ Σ_d p(q|θ_d) p(w|θ_d)
• where β is the interpolation weight and Z is a normalizer: Z = Σ_d p(q|θ_d)
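The modeling choices above (truncation to query words, score-weighted feedback documents, interpolation with the original query) can be combined in a short sketch. The score-normalization scheme here is my assumption of one reasonable formulation, not necessarily the exact one used in the paper:

```python
def feedback_query(query_counts, top_docs, beta=0.5):
    """Interpolate the original query distribution with a model-based
    feedback distribution built from top-ranked documents.
    query_counts: dict w -> c_w^q for the original query.
    top_docs: list of (score, model) pairs, where score = p(q|theta_d)
    and model maps words to p(w|theta_d).
    Only words already in the query are kept (truncation)."""
    n_q = sum(query_counts.values())
    Z = sum(score for score, _ in top_docs)   # normalizer over doc scores
    q_hat = {}
    for w, c_w in query_counts.items():
        fb = sum(score * model.get(w, 0.0) for score, model in top_docs) / Z
        q_hat[w] = (1 - beta) * c_w / n_q + beta * fb
    return q_hat
```

Setting beta = 0 recovers the original query, so the feedback model can only be mixed in as far as it helps on development data.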
Experimental Setup
• Ad-hoc IR experiments conducted on 13 standard datasets:
  • TREC 1-5, split according to data source
  • OHSU-TREC
  • FIRE 2008-2011 English
• Preprocessing: stopword & short word removal, Porter stemming
• Each dataset split into development and evaluation subsets
Experimental Setup 2
• Software used for experiments was the SGMWeka 1.44 toolkit:• http://sourceforge.net/projects/sgmweka/
• Smoothing parameters optimized on development sets using Gaussian Random Searches (Luke 2009)
• Evaluation performed on evaluation sets, using Mean Average Precision over the top 50 documents (MAP@50)
• Significance tested with paired one-tailed t-tests between the datasets
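The evaluation measure can be sketched as follows. This uses one common convention for MAP@k (average precision over the top k, normalized by the number of relevant documents capped at k); conventions for the denominator vary between tools:

```python
def average_precision_at_k(ranked, relevant, k=50):
    """AP@k for one query: mean precision at each relevant hit in the
    top k, normalized by the number of relevant documents (capped at k)."""
    hits, ap = 0, 0.0
    for i, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            ap += hits / i
    return ap / min(len(relevant), k) if relevant else 0.0

def map_at_k(runs, qrels, k=50):
    """MAP@k: mean of AP@k over all queries.
    runs: query -> ranked doc ids; qrels: query -> set of relevant ids."""
    return sum(average_precision_at_k(runs[q], qrels[q], k)
               for q in runs) / len(runs)
```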
Results
• Significant differences:
  • PYP > DP
  • PYP+TI > 2SS
  • PYP+TI+FB > PYP+TI
• PYP+TI+FB improves on 2SS by 4.07 MAP@50 absolute, a 17.1% relative improvement
Discussion
• The 3 evaluated improvements in language models for IR:
  • require little additional computation
  • can be implemented with small modifications to existing IR systems
  • are substantial, significant and cumulative across 13 standard datasets, compared to DP and 2SS baselines (4.07 MAP@50 absolute, 17.1% relative)
• Improvements requiring more computation are possible:
  • document neighbourhood smoothing, word correlation models, passage-based LMs, bigram LMs, …
• More extensive evaluations needed for confirming progress