
Page 1

Constructing Query Models from Elaborate Query Formulations

A Few Examples Go A Long Way

Krisztian Balog (kbalog@science.uva.nl)

Wouter Weerkamp (weerkamp@science.uva.nl)

Maarten de Rijke

ISLA, University of Amsterdam

Presented by Tanvi Motwani

Page 2

AIM

• This paper introduces and compares several methods for sampling query expansion terms, using both query-independent and query-dependent techniques.

• In addition to the query, the method takes sample documents as input: additional information provided by the user, consisting of a small number of "key references" (pages that a good overview page on the topic should link to).

• The aim is to increase "aspect recall" by uncovering aspects of the information need that are not captured by the query but are present in the sample documents.

Page 3

Aspect Retrieval

Query: What are current applications of robotics?

Find as many different applications as possible.

Example Aspects

A1: spot-welding robotics

A2: controlling inventory

A3: pipe-laying robots

A4: talking robot

A5: robots for loading & unloading memory tapes

A6: robot telephone operators

A7: robot cranes

…

Aspect judgments:

      A1 A2 A3 A4 ... Ak-1 Ak
d1:    1  1  0  0 ...  0    0
d2:    0  1  1  1 ...  0    0
d3:    0  0  0  0 ...  1    0
...
dk:    1  0  1  0 ...  0    1

Page 4

Overview
• Retrieval Model
• Experimental Setup
• Query Representation
• Baseline Parameters
• Experimental Evaluation

Page 5

Overview
• Retrieval Model
  • Query Likelihood
  • Document Modeling
  • Query Modeling
• Experimental Setup
• Query Representation
• Baseline Parameters
• Experimental Evaluation


Page 7

Query (Q): What is a Rainforest?

Documents, ranked by P(D|Q):

P(D1|Q) = 0.32
P(D2|Q) = 0.26
P(D3|Q) = 0.19
P(D4|Q) = 0.12
P(D5|Q) = 0.09

Page 8

Query Likelihood

Bayes' Rule:
P(D|Q) = P(Q|D) P(D) / P(Q)

Ignoring P(Q), which is constant across documents:
P(D|Q) ∝ P(Q|D) P(D)

Assuming independence of the query terms:
P(Q|D) = ∏_i P(q_i|D)

Taking the log:
log P(D|Q) ∝ log P(D) + Σ_i log P(q_i|D)

• Using query and document models θ_Q and θ_D, the score becomes Σ_t P(t|θ_Q) log P(t|θ_D).
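As a minimal illustration in Python (not the paper's implementation; the toy counts and the smoothing weight lam are assumptions):

import math
from collections import Counter

def query_likelihood(query_terms, doc, collection, lam=0.5):
    # Score log P(Q|D): sum of log P(q|D), with the document model
    # interpolated against the collection model to avoid zero probabilities.
    doc_len = sum(doc.values())
    coll_len = sum(collection.values())
    score = 0.0
    for q in query_terms:
        p_doc = doc.get(q, 0) / doc_len           # ML estimate P(q|D)
        p_coll = collection.get(q, 0) / coll_len  # background P(q|C)
        score += math.log((1 - lam) * p_doc + lam * p_coll)
    return score

doc = Counter({"rain": 20, "forest": 30, "wild": 5})
collection = Counter({"rain": 100, "forest": 120, "wild": 50, "city": 400})
print(query_likelihood(["rain", "forest"], doc, collection))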

Page 9

What is a Rainforest?

Query (Q) ← Relevance Model → Documents

Page 10

Underlying Relevance Model

The query and relevant documents are random samples from an underlying relevance model R.

Documents are ranked based on their similarity to the query model. The Kullback-Leibler divergence between the query and document models can be used to provide this ranking.
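Concretely, since the entropy of the query model is constant for a given query, ranking by the negated KL divergence,

-KL(θ_Q || θ_D) = -Σ_t P(t|θ_Q) log [ P(t|θ_Q) / P(t|θ_D) ],

is rank-equivalent to the cross-entropy score Σ_t P(t|θ_Q) log P(t|θ_D) used above.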


Page 12

Document Modeling

Maximum likelihood estimate:

P_ml(t|D) = n(t, D) / |D|

Smoothing the ML estimate by interpolating with the collection model C:

P(t|θ_D) = (1 - λ) P_ml(t|D) + λ P(t|C)

A document that does not contain the word "Rain" will have P("Rain"|D) = 0 under the ML estimate; thus smoothing is required.
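A toy sketch of the two estimates (Jelinek-Mercer-style interpolation and the λ value are assumptions here):

from collections import Counter

def p_ml(term, doc):
    # Maximum likelihood estimate: relative frequency within the document
    return doc.get(term, 0) / sum(doc.values())

def p_smoothed(term, doc, collection, lam=0.5):
    # Interpolate the document model with the collection (background) model
    p_coll = collection.get(term, 0) / sum(collection.values())
    return (1 - lam) * p_ml(term, doc) + lam * p_coll

doc = Counter({"forest": 3, "wildlife": 2})    # no "rain" in this document
collection = Counter({"rain": 50, "forest": 80, "wildlife": 20})
print(p_ml("rain", doc))                        # 0.0: the zero-probability problem
print(p_smoothed("rain", doc, collection))      # > 0 after smoothing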

Page 13

Query Modeling

P(t|Q) is extremely sparse, and thus query expansion is necessary.

A document may not contain the words "Rain" and "Forest" but may contain related words such as "Wild Life". Expanding the query brings in different "aspects" of the topic.


Page 15

Experimental Setup

• CSIRO Enterprise Research Collection (CERC): a crawl of the *.csiro.au web domain conducted in March 2007.

• 370,715 documents

• 4.2 gigabytes in size

• 50 topics

• Judgments are made on a 3-point scale:
  2: highly relevant "key reference"
  1: candidate key page
  0: not a "key reference"

Page 16

Overview
• Retrieval Model
• Experimental Setup
• Query Representation
• Baseline Parameters
  • Maximizing Average Precision (MAX_AP)
  • Maximizing Query Log-Likelihood (MAX_QLL)
  • Best Empirical Estimate (EMP_BEST)
• Experimental Evaluation

Page 17

Parameter Estimation

Maximizing Average Precision (MAX_AP)

Maximizing Query Log likelihood (MAX_QLL)

Best Empirical Estimate (EMP_BEST)
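As an illustration of the first of these, MAX_AP amounts to sweeping candidate weights and keeping the one that maximizes mean average precision. The evaluate_ap helper below is hypothetical, standing in for a run-retrieval-and-score step:

def max_ap(weights, topics, evaluate_ap):
    # MAX_AP: sweep candidate weights, keep the one maximizing mean AP
    best_w, best_score = None, float("-inf")
    for w in weights:
        mean_ap = sum(evaluate_ap(topic, w) for topic in topics) / len(topics)
        if mean_ap > best_score:
            best_w, best_score = w, mean_ap
    return best_w

# e.g. max_ap([i / 10 for i in range(1, 10)], topics, evaluate_ap)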

Page 18

Evaluation

• The maximum AP score is reached when the weight is 0.6.

• MAX_QLL performs slightly better than MAX_AP.

Page 19

Overview
• Retrieval Model
• Experimental Setup
• Query Representation
  • Feedback Using Relevance Models
  • Relevance Models from Sample Documents
  • Query Model from Sample Documents
• Baseline Parameters
• Experimental Evaluation

Page 20

Query Representation

• The expanded query terms are combined with the original query terms.

• This prevents the topic from drifting away from the user's original information need.
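A minimal sketch of this combination, interpolating the two term distributions (the weight mu is an assumed free parameter):

def combine_query_models(p_original, p_expanded, mu=0.5):
    # Interpolate the original query model with the expanded one so the
    # topic stays anchored to the user's information need.
    terms = set(p_original) | set(p_expanded)
    return {t: mu * p_original.get(t, 0.0) + (1 - mu) * p_expanded.get(t, 0.0)
            for t in terms}

original = {"rain": 0.5, "forest": 0.5}
expanded = {"rain": 0.2, "forest": 0.2, "wildlife": 0.3, "amazon": 0.3}
print(combine_query_models(original, expanded))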


Page 22

Feedback Using Relevance Models

The joint probability of observing t together with the query terms q1, q2, …, qk, divided by the joint probability of the query terms:

P(t | q1, …, qk) = P(t, q1, …, qk) / P(q1, …, qk)

• RM1: t and the qi are assumed to be sampled independently and identically from each document in the set M:
P(t, q1, …, qk) = Σ_{D∈M} P(D) P(t|D) ∏_i P(qi|D)

• RM2: the query terms q1, …, qk are sampled dependent on t, but independently of each other:
P(t, q1, …, qk) = P(t) ∏_i Σ_{D∈M} P(qi|D) P(D|t)
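A toy, unsmoothed sketch of the RM1 computation (the counts are assumptions); it reproduces the arithmetic of the worked example on the next slide:

from collections import Counter

def rm1_joint(t, query_terms, docs, priors):
    # RM1: P(t, q1..qk) = sum over D of P(D) * P(t|D) * prod_i P(qi|D)
    total = 0.0
    for p_d, doc in zip(priors, docs):
        doc_len = sum(doc.values())
        prob = p_d * doc.get(t, 0) / doc_len
        for q in query_terms:
            prob *= doc.get(q, 0) / doc_len
        total += prob
    return total

# One contributing document of 150 terms, with document prior P(D) = 1/5:
doc = Counter({"wild": 5, "rain": 20, "forest": 30, "other": 95})
print(rm1_joint("wild", ["rain", "forest"], [doc], [1 / 5]))
# = 1/5 * 5/150 * 20/150 * 30/150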

Page 23

RM1

Assume the smoothing weight is 0.
"wild" appears 5 times in this document.
"rain" appears 20 times in this document.
"forest" appears 30 times in this document.
The document is 150 terms long.
M contains five documents, so P(D1) = 1/5; assume only this document contains the terms of interest, so the other documents contribute zero to the sum.
P("wild", "rain", "forest") = 1/5 * 5/150 * 20/150 * 30/150

Page 24

RM2

Given the term "wild", we first pick a document from the set M with probability P(D|t), and then sample the query words from that document.

Assume P(D | "wild") = 0.7.
This document contains 10 "rain" words and 20 "forest" words.
The document is 200 terms long.
P("wild") = 0.2, and M is just this document.
P("wild", "rain", "forest") = 0.2 * 0.7 * 20/200 * 10/200


Page 26

Relevance Models from Sample Documents

• Apply the relevance models to the sample documents instead of the feedback documents, i.e., set M = S.

• For RM1, assume P(D) = 1/|S|.
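In terms of the RM1 sketch above, this is simply rm1_joint(t, query_terms, sample_docs, [1 / len(sample_docs)] * len(sample_docs)), with the sample documents playing the role of the feedback set M.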


Page 28

Query Model from Sample Documents

The top K terms with the highest probability P(t|S) are used to formulate the expanded query.

1. Start from the sample document set S.
2. Select a document D from S with probability P(D|S).
3. From this document, generate term t with probability P(t|D).
4. Sum over all sample documents to obtain P(t|S) = Σ_{D∈S} P(t|D) P(D|S).
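A minimal sketch of this mixture (the term-count dictionaries and weights are toy assumptions):

from collections import Counter

def p_term_given_samples(t, sample_docs, doc_weights):
    # P(t|S) = sum over D in S of P(t|D) * P(D|S)
    return sum(w * doc.get(t, 0) / sum(doc.values())
               for doc, w in zip(sample_docs, doc_weights))

samples = [Counter({"forest": 3, "wildlife": 2}), Counter({"rain": 4, "amazon": 1})]
uniform = [1 / len(samples)] * len(samples)
terms = {t for d in samples for t in d}
top_k = sorted(terms, key=lambda t: p_term_given_samples(t, samples, uniform),
               reverse=True)[:3]
print(top_k)   # top-K expansion terms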

Page 29

Query Model from Sample Documents

• Maximum Likelihood Estimate of a term (EX-QM-ML)

• Smoothed Estimate of a term (EX-QM-SM)

• Ranking Function proposed by Ponte and Croft for unsupervised query expansion (EX-QM-EXP)
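One plausible reading of the first two options (an assumption, not the paper's exact definitions) is to use either the ML or the smoothed estimate of P(t|D) inside the P(t|S) mixture above; the Ponte and Croft ranking function for EX-QM-EXP is omitted here:

def p_t_given_d_ml(t, doc):
    # EX-QM-ML-style estimate: maximum likelihood P(t|D)
    return doc.get(t, 0) / sum(doc.values())

def p_t_given_d_smoothed(t, doc, collection, lam=0.5):
    # EX-QM-SM-style estimate: smooth the ML estimate with the collection model
    p_coll = collection.get(t, 0) / sum(collection.values())
    return (1 - lam) * p_t_given_d_ml(t, doc) + lam * p_coll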

Page 30

Query Model from Sample Documents

Three options for estimating P(D|S):

• Uniform: P(D|S) = 1/|S|

• Query-biased: P(D|S) ∝ P(Q|D), favoring sample documents that match the query.

• Inverse query-biased: gives more weight to sample documents that the query matches poorly, to promote aspects not already covered by the query.
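A sketch of the three weighting schemes; score_query is a hypothetical helper returning the query likelihood P(Q|D) for a document, and the normalization used for the inverse variant is an assumption (the paper's exact formula may differ):

def doc_weights(sample_docs, mode, score_query=None):
    # P(D|S) for each sample document under one of the three schemes
    n = len(sample_docs)
    if mode == "uniform":
        return [1 / n] * n
    scores = [score_query(d) for d in sample_docs]
    if mode == "inverse":
        # Give more weight to documents the query explains poorly
        scores = [max(scores) - s + 1e-9 for s in scores]
    total = sum(scores)
    return [s / total for s in scores]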


Page 32

Expanded Query Models

Page 33

Combination with Original Query

Page 34

Importance of Sample Document

Page 35

Topic Level Comparison


Page 37

Sampling conditioned on query

Page 38

Conclusion

• Introduced a method for sampling query expansion terms in a query-independent way, based on sample documents that reflect "aspects" of the user's information need not captured by the query.

• Introduced several versions of this expansion term selection method, based on different term selection and document importance weighting schemes, and compared them against more traditional query expansion performed in a query-biased manner.

Page 39

Questions/Discussion

• Every topic needs a sample document set; is this method feasible in real-world settings where the number of topics is unbounded?

• Aspect recall is obtained from the sample documents; aren't we then dependent on the "goodness" of the sample documents, i.e., on how many different aspects they cover, for achieving high aspect recall?

• The increase in MAP over BFB-RM2 is slight (around 0.07); will it make any noticeable difference to an end-user's experience? Is such a small gain in MAP worth the high cost of obtaining sample documents?