
Page 1

Evaluating Novelty and Diversity

Charles Clarke

School of Computer Science

University of Waterloo

two talks in one!

Page 2

Goals for Evaluation Measures

• meaningful
• tractable
• reusable

Page 3

Evaluation Framework

We examine a framework for evaluation.

Specific measures covered by the framework include:

Clarke et al. (SIGIR ’08)
Agrawal et al. (WSDM ’09)
Clarke et al. (ICTIR ’09)

Page 4

Talk #1: Evaluating Diversity

Charles Clarke

School of Computer Science

University of Waterloo

Page 5

Query: “windows”

1. Microsoft Windows
   a) When will Windows 7 be released?
   b) What’s the Windows update URL?
   c) I want to download Windows Live Essentials
2. House windows
   a) Where can I buy replacement windows?
   b) What brands are available?
   c) Aluminum or vinyl?
3. Windows Restaurant, Las Vegas

Page 6

Nuggets

Nugget = any binary property of a document

Examples (factual, topical, and navigational):
• Provides the address of a Pella dealer.
• Discusses the history of the Windows OS.
• Is the Windows update page.

Problem: potentially thousands per query.
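As a concrete, purely illustrative picture of how such judgments might be stored, here is a minimal Python sketch; the nugget names, document identifiers, and dictionary layout are assumptions of this sketch, not anything from the talk:

    # Each nugget is a binary property of a document; J[(doc, nugget)] records
    # the judgment "does this document contain this nugget?" (1 = yes, 0 = no).
    nuggets = [
        "provides-address-of-a-pella-dealer",
        "discusses-history-of-the-windows-os",
        "is-the-windows-update-page",
    ]

    J = {
        ("doc-17", "is-the-windows-update-page"): 1,
        ("doc-17", "discusses-history-of-the-windows-os"): 0,
        ("doc-42", "provides-address-of-a-pella-dealer"): 1,
    }

    def contains(doc: str, nugget: str) -> int:
        """Binary judgment J(d, i); unjudged pairs default to 0."""
        return J.get((doc, nugget), 0)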

Page 7

Evaluation

• Model user information needs using nuggets. Different users will be interested in different combinations of nuggets.

• Express judgments in terms of nuggets. Judgments may be automatic or manual. Judgments are binary: Does this document contain this nugget?

• Nuggets link users and documents.

Page 8

Interdependencies

Problem: Complex interdependencies between nuggets.

Three possible simplifying assumptions:

1. User interested in nugget A will always be interested in nugget B.

2. User interested in nugget A will never be interested in nugget B.

3. Nuggets A and B are independent.

Page 9

Possible Assumption #1

If a user interested in nugget A will always be interested in nugget B, then A and B can be treated as the same nugget.

Page 10

Possible Assumption #2

A user interested in nugget A will never be interested in nugget B (and vice versa); in other words, a user’s interest in nugget A depends on their interest in nugget B.

Nugget A and nugget B may be viewed as representing different interpretations of the query.

Page 11

Query: “windows”

1. Microsoft Windows
   a) When will Windows 7 be released?
   b) What’s the Windows update URL?
   c) I want to download Windows Live Essentials
2. House windows
   a) Where can I buy replacement windows?
   b) What brands are available?
   c) Aluminum or vinyl?
3. Windows Restaurant, Las Vegas

Page 12

Query Interpretations

• Assume M interpretations of the query.
• Compute any effectiveness measure with respect to each interpretation (Sj).
• Compute the weighted average, where pj is the probability of interpretation j (see the reconstruction below).
• Agrawal et al., 2009
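The weighted average itself was presumably shown as a formula image that is not preserved in this transcript; a plausible reconstruction, with S_j the chosen measure computed with respect to interpretation j and p_j the probability of that interpretation, is

    S_{\mathrm{IA}} = \sum_{j=1}^{M} p_j \, S_j

i.e., the intent-aware ("IA") form of the measure in the style of Agrawal et al. (WSDM ’09).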

Page 13

Possible Assumption #3

A user’s interest in nugget A is independent of their interest in nugget B.

The probability that the user is interested in nugget A is a constant (pA).

The probability that the user is interested in nugget B is a constant (pB).
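As an illustrative consequence of this assumption (not shown on the slide), joint interests factor into products:

    P(\text{interested in A and B}) = p_A \, p_B
    P(\text{interested in A or B}) = 1 - (1 - p_A)(1 - p_B)

The second form is what makes the "one minus a product" style of relevance estimate on the later slides natural.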

Page 14

Query: “windows”

1. Microsoft Windows
   a) When will Windows 7 be released?
   b) What’s the Windows update URL?
   c) I want to download Windows Live Essentials
2. House windows
   a) Where can I buy replacement windows?
   b) What brands are available?
   c) Aluminum or vinyl?
3. Windows Restaurant, Las Vegas

Page 15

Relevance framework

A document is relevant if it contains any relevant information, i.e., at least one of the N nuggets.

Page 16

Relevance

• Assume constant user probabilities.
• Assume constant document probabilities.
• J(d, i) = 1 iff document d is judged to contain nugget i.

count the nuggets
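The formula on this slide was an image that did not survive the transcript. A plausible reconstruction under these assumptions (a constant probability p_i that the user cares about nugget i, binary judgments J(d, i)) is

    P(R = 1 \mid u, d) = 1 - \prod_{i=1}^{N} \bigl( 1 - p_i \, J(d, i) \bigr)

that is, the document is estimated to be relevant to the extent that at least one nugget it contains interests the user. With a single constant p this reduces to 1 - (1 - p)^{n(d)}, where n(d) = \sum_i J(d, i) simply counts the nuggets in d.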

Page 17

Probability of Relevance

Estimated probability of relevance replaces relevance in standard evaluation measures, including nDCG, MAP, and rank-biased precision.

Assumptions #2 and #3 can then be combined.

Other estimation methods possible.
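As an illustration of how an estimated probability of relevance can stand in for a binary judgment, here is a minimal Python sketch; the function names, the per-nugget probabilities, and the choice of rank-biased precision are assumptions of this sketch, not material from the slides:

    from typing import Dict, List, Set

    def prob_relevant(doc_nuggets: Set[str], p: Dict[str, float]) -> float:
        """Estimated P(relevant) = 1 - prod_i (1 - p_i) over the nuggets the
        document is judged to contain, assuming nugget independence."""
        prob_none = 1.0
        for nugget in doc_nuggets:
            prob_none *= 1.0 - p.get(nugget, 0.0)
        return 1.0 - prob_none

    def expected_rbp(ranking: List[Set[str]], p: Dict[str, float], beta: float = 0.8) -> float:
        """Rank-biased precision with each result's binary relevance replaced
        by its estimated probability of relevance."""
        return (1.0 - beta) * sum(
            prob_relevant(doc, p) * beta ** k for k, doc in enumerate(ranking)
        )

    # Toy example: two results for "windows", three nuggets a user might care about.
    p = {"windows-7-release-date": 0.3, "windows-update-url": 0.3, "replacement-window-dealer": 0.2}
    ranking = [{"windows-update-url", "windows-7-release-date"}, {"replacement-window-dealer"}]
    print(expected_rbp(ranking, p))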

Page 18

Research Issues (talk #1)

• Identifying nuggets automatically
  – Clustering
  – Co-clicks
  – Query refinement
• Automatic judging
  – Patterns
  – Classification
• How many nuggets are enough?
• Estimating probability of relevance

Page 19

Conclusions (talk #1)

• Evaluating diversity requires us to model and represent the diversity.
• Nuggets represent one possible solution.
• Simple user model; simple assumptions; simple judging.

Page 20

Questions?

Talk #1: Evaluating Diversity

Charles Clarke

School of Computer Science

University of Waterloo

Page 21

Intermission

The TREC 2009 Web Track
• traditional adhoc task
• novelty and diversity task
• ClueWeb09 dataset (one billion pages)
• explore effectiveness measures
• http://plg.uwaterloo.ca/~trecweb

Page 22

Intermission: Free sample topic

<topic number=0>
  <query> physical therapist </query>
  <description> The user requires information regarding the profession and the services it provides. </description>
  <subtopic number=1> What does a physical therapist do? </subtopic>
  <subtopic number=2> Where can I find a physical therapist? </subtopic>
  <subtopic number=3> How much does physical therapy cost per hour? </subtopic>
  …

Page 23

Talk #2: Evaluating Novelty

Charles Clarke

School of Computer Science

University of Waterloo

Page 24

Novelty

• Novelty depends on diversity.
• Previous talk considered probability of relevance in isolation (e.g., for the top-ranked document).
• In this talk we will examine how user context impacts the probability of relevance.

Page 25

User context

Page 26

Simplest context model

• Ranked list
• User scans results 1, 2, 3, 4, 5, … in order.
• Novelty of result k is considered in light of the first k-1 results (a sketch of one such gain function follows below).
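A minimal Python sketch of one gain function consistent with this model, along the lines of Clarke et al. (SIGIR ’08); the parameter alpha and the geometric discount (1 - alpha) per prior occurrence of a nugget are details of this sketch rather than text from the slides:

    from collections import Counter
    from typing import List, Set

    def novelty_gains(ranking: List[Set[str]], alpha: float = 0.5) -> List[float]:
        """Gain of each result, discounting nuggets already seen higher up.

        Result k earns alpha * (1 - alpha)**seen[i] for each nugget i it contains,
        where seen[i] counts how many of the first k-1 results contained i."""
        seen = Counter()
        gains = []
        for doc_nuggets in ranking:
            gains.append(sum(alpha * (1.0 - alpha) ** seen[i] for i in doc_nuggets))
            seen.update(doc_nuggets)
        return gains

    # The second result repeats a nugget, so that nugget contributes less than
    # it did at its first appearance.
    ranking = [{"windows-update-url"}, {"windows-update-url", "replacement-window-dealer"}]
    print(novelty_gains(ranking))  # [0.5, 0.75]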

Page 27

Relevance framework

Page 28

Relevance

Assuming constant probabilities.
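The formulas on this and the preceding slide were images that are not preserved in this transcript. One plausible reconstruction, combining the earlier nugget-counting estimate with the ranked-list user model and constant probabilities, is the probability that the k-th result is both of interest and still novel given the first k-1 results:

    P(R_k = 1 \mid d_1, \ldots, d_{k-1})
      = 1 - \prod_{i=1}^{N} \bigl( 1 - p \, \alpha \, J(d_k, i) \, (1 - \alpha)^{r_{i,k-1}} \bigr),
    \qquad r_{i,k-1} = \sum_{j=1}^{k-1} J(d_j, i)

where p is the constant probability that the user cares about a nugget and alpha the constant probability that an earlier occurrence of that nugget already satisfied them; both symbols are assumptions of this reconstruction.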

Page 29

Beyond the ranked list

Page 30

Research issues (talk #2)

• Better user models
• Prior browsing context, local context, etc.
• Evaluating impact of result presentation methods
  – Better captions
  – Query suggestions
  – Instant answers (stock quotes, weather, product prices, definitions)

Page 31

Conclusions (talk #2)

• Modeling and representing diversity allows us to consider novelty.

• User models should be simple enough to be tractable.

• User models should be complex enough to be meaningful.

Page 32

Questions?

Talk #2: Evaluating Novelty

Charles Clarke

School of Computer Science

University of Waterloo