Yandex'10 kal-slides

Why Do Users in Real Life Use Short Queries?

Kal [email protected] work with: H Keskustalo, A Pirkola, T Sharma, M Lykke Nielsen


A Quick Answer

Because … they are good enough and effortless

But how to show that?

Outline

1. Introduction
2. Study Design
   Research Question
   Test Environment
   Experimental Protocol
3. Experimental Results
4. Conclusions

Introduction

Traditional test collection-based IR:
methods compared based on result quality
topical relevance
one query per topic
verbose queries
(long) lists of retrieved documents
often binary relevance with low threshold

Introduction

In contrast, real searchers
have various interaction strategies / expectations
consider beyond-topical relevance
use more than one query, if needed, in sessions
short queries (Jansen & al., 2000)
unstructured queries (Ruthven, 2008)
may or may not avoid sequences of topically non-relevant documents (Azzopardi, 2007)
want few, but good, documents (Järvelin & al., 2008)

The Present Talk ...

... brings opposing views closer together:
Session simulation in a test collection

topical sessions (up to 5 queries per topic)
idealized session strategies (3+1)
short queries (including 1-word sequences)
short browsing (10-document) window
task to find one (highly) relevant document
§ but other search goals can be assumed

Research Question

What is the effectiveness of a session of short queries compared to one verbose [TREC] query?

Test Environment

TREC 7-8 collection
41 topics
528 155 documents
graded relevance judgments (highly, fairly, marginally relevant, and non-relevant documents)

Lemur retrieval system
Query keys collected from test persons

Experimental Protocol

Obtaining search keys
Session strategies
Simulated session construction
Retrieval protocol

Collecting Search Keys

7+7 test persons
Intellectual analysis of 41 topics
Each topic analyzed twice: once by a student (Group A), and once by a staff member (Group B)
The task was to identify good search keys
Various session scenarios were employed

An Example

Topic number: 351
Description:
What information is available on petroleum exploration in the South Atlantic near the Falkland Islands?

Narrative:
Any document discussing petroleum exploration in the South Atlantic near the Falkland Islands is considered relevant.

Session Strategy S1

One-word queries only
Jansen & al. (2000); Stenmark (2008)
Lykke & al. (2009) (employed 21 times in the 60 real-life sessions)

Example:
falkland →
exploration →
island →
petroleum →

Session Strategy S2

Incremental query extension
One word added if the query fails
Lykke & al. (13 times of 60 real-life sessions)

Example:
petroleum →
petroleum exploration →
petroleum exploration south →
petroleum exploration south atlantic
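A minimal sketch of how an S2-style session could be generated from an ordered list of search keys. The function name `incremental_queries` is hypothetical, and the cap of five queries is taken from the session limit stated earlier in the talk; this is an illustration, not the authors' implementation.

```python
def incremental_queries(keys, max_queries=5):
    """Build an S2-style session: each query adds one more key to the previous one."""
    # Hypothetical helper: `keys` is an ordered list of search keys for one topic.
    limit = min(len(keys), max_queries)
    return [" ".join(keys[:n]) for n in range(1, limit + 1)]


session = incremental_queries(["petroleum", "exploration", "south", "atlantic"])
for query in session:
    print(query)
```

In a simulated session the next, longer query would only be issued if the previous one failed.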

Session Strategy S3

“Variations on a theme of two words”
2 fixed keys; 3rd key is varied
Lykke & al. (in 38 of 60 real-life sessions)

Example:
petroleum exploration south →
petroleum exploration atlantic →
petroleum exploration falkland → …
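A minimal sketch of S3-style query generation, assuming the two fixed keys and the candidate third keys are already chosen. The helper name `theme_variations` and the five-query cap are assumptions made for illustration.

```python
def theme_variations(fixed_keys, varied_keys, max_queries=5):
    """Build an S3-style session: two fixed keys plus a changing third key."""
    # Hypothetical helper; the two fixed keys stay in every query of the session.
    if len(fixed_keys) != 2:
        raise ValueError("S3 as described uses exactly two fixed keys")
    prefix = " ".join(fixed_keys)
    return [f"{prefix} {key}" for key in varied_keys[:max_queries]]


session = theme_variations(
    ["petroleum", "exploration"], ["south", "atlantic", "falkland"]
)
for query in session:
    print(query)
```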

Session Strategy S4

One verbose [TREC] query (title + description)
traditional baseline

Example:
falkland petroleum exploration information available petroleum exploration south atlantic falkland island

Simulations

Instead of real interactive sessions we performed session simulation

Search keys chosen randomly from the pool for each topic
Chosen keys arranged into consecutive queries according to the four strategies
Person assumed to scan the first page and stop at the first marginally/highly relevant doc.
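The random key selection could be sketched as follows for the one-word strategy S1. The function name, the example pool, the seed, and sampling without replacement are all assumptions made for illustration; the slides do not specify these details.

```python
import random


def simulate_s1_session(key_pool, max_queries=5, seed=None):
    """Draw keys at random from a topic's pool and issue them as one-word queries."""
    rng = random.Random(seed)  # seeded so that a simulated session can be repeated
    n = min(max_queries, len(key_pool))
    # Sampling without replacement is an assumption: a key is not reused in a session.
    return rng.sample(key_pool, n)


# Illustrative pool of search keys for one topic (not the study's actual pool).
pool = ["falkland", "exploration", "island", "petroleum", "atlantic", "south"]
session = simulate_s1_session(pool, seed=42)
print(session)
```

Fixing the seed is one way to get the repeatability that motivates simulation over live sessions.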

Retrieval Protocol

Construct query sessions for each strategy
Retrieve Top-10 documents using each individual query (Top-50 for S4)
Determine whether / how rapidly each query sequence succeeds/fails
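A sketch of the success check, under stated assumptions: relevance grades are coded 0 to 3, the liberal threshold accepts marginally relevant documents, and the stringent threshold requires highly relevant ones. The function name, the coding, and the toy data are hypothetical.

```python
# Graded relevance levels; the numeric coding is an assumption for this sketch.
NONRELEVANT, MARGINAL, FAIR, HIGH = 0, 1, 2, 3


def first_success(session_results, judgments, threshold, window=10):
    """Return the 1-based ordinal of the first query whose top `window`
    documents contain one judged at or above `threshold`, or None if all fail."""
    for ordinal, ranked_docs in enumerate(session_results, start=1):
        for doc_id in ranked_docs[:window]:
            if judgments.get(doc_id, NONRELEVANT) >= threshold:
                return ordinal
    return None


# Toy data, not taken from the study: two queries with made-up document ids.
judgments = {"d3": MARGINAL, "d7": HIGH}
results = [["d1", "d3", "d5"], ["d7", "d2"]]
liberal = first_success(results, judgments, threshold=MARGINAL)
stringent = first_success(results, judgments, threshold=HIGH)
print(liberal, stringent)
```

For the S4 baseline the same check could be applied to consecutive 10-document pages of the Top-50 list instead of consecutive queries.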

Results

Success of strategies S1-S4 by individual topics

Liberal Relevance (S1 S2 S3 S4), Stringent Relevance (S1 S2 S3 S4)
Topic# A B A B A B _ A B A B A B _
351 1 2 5 1 3 1 1 1 2 5 1 3 1 1
353 1 1 1 1 1 1 1 2 2 1 1
355 1 2 1 1 1 1 1 1 2 1 1 1 1 1
358 1 1 1 1 2 1 1 1 1 1 1 2 1 1
360 1 2 1 1 1 1 1 2 2 3 3 2 1 1
362 1 1 1 1 1 1 1 1 1 1 1 3 1
364 1 1 1 1 1 1 1 1 1 1 1 1 1 1
365 3 1 1 2 1 1 1 3 1 1 2 1 1 1
372 5 2 1 1 1 1 1 2 2 1 2 1
373 1 1 1 1 1 1 1 1 1 1 1 1 1 1
377 2 1 1 1 2 1 1 1
378 3 3 1 1 1
384 2 1 1 1 1 1 1 4 3 2 1 2
385 2 2 1 1 1 2 2 2 1 1
387 1 1 1 1 1 2 1 2 1 2 2 1 2 1
388 2 3 4 3 1 1 1 4
392 2 1 1 1 1 1 1 2 1 1 1 3 1 1
393 2 1 1 1 1 1 1 2 1 1 1 1 1 3
396 1 3 2 1 1 1 1 1 3 2 1 1 1 1
399 4 2 1 1 1 4 2 1 2
400 4 2 2 1 1 1 1 4 2 2 1 2 1 1
402 1 1 2 1 1 1 1 1 2 1 1 1
403 1 1 1 1 1 1 1 1 1 1 1 1 1 1
405 1 2 3 1 1 2 1 3 2
407 1 1 1 1 1 1 1 2 1 1 1 1 1 1
408 2 1 1 1 1 1 1 2 3 2 1 1
410 3 2 1 1 1 1 1 3 2 1 1 1 1 1
414 3 2 1 1 1
415 3 1 2 1 1 1 1 2 5 1 1 1
416 3 1 1 1 1 1 1 3 1 1 1 1 1 1
418 2 2 1 1 1 1 1 2 2 1 1 1 1 1
420 1 1 1 2 1 1 1 1 1 1 2 1 1 1
421 1 2 1 3 1 1 1 2 3 1 1 1
427 2 2 1 1 2 1 1 2 4 4
428 2 1 1 1 1 1 1 3 1 1 1 1 1 1
431 2 3 1 1 1 1 1 2 1 1 1 1 1 1
437 2 3 2 3
440 2 2 3 2 1 2 2 5 5
442 1 1 2 1 1 1
445 3 1 1 1 3 1 2 4 1 4 1
448 2 2 1 1 1

Count of successful sessions (max = 41), liberal relevance threshold
[Bar chart: # sessions (y-axis, 0-40) by session strategy and test group (x-axis: S1 A, S1 B, S2 A, S2 B, S3 A, S3 B, S4)]

Count of successful sessions (max = 38), stringent relevance threshold
[Bar chart: # sessions (y-axis, 0-35) by session strategy and test group (x-axis: S1 A, S1 B, S2 A, S2 B, S3 A, S3 B, S4)]

Statistical significance

Friedman’s test by the ordinal of success.
Similar results for groups A and B and for liberal and stringent relevance.
Significant pairwise differences (p=0.01) as follows:

S1 differs from S2, S3, S4
S2 differs from S4
S3 does not differ significantly from S4
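For orientation, a pure-Python sketch of the Friedman statistic computed over per-topic success ordinals. It omits the tie correction and the pairwise follow-up comparisons, and the toy data and the coding of failed sessions are assumptions; it is not the authors' analysis.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic (no tie correction) for n rows of k treatments."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up success ordinals: one row per topic, columns S1..S4.
# A failed session would need some agreed coding (e.g. 6); that is an assumption.
toy = [[3, 1, 1, 1], [2, 2, 1, 1], [1, 1, 1, 1], [4, 2, 2, 1]]
print(friedman_statistic(toy))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; with many tied ordinals, as in data of this kind, a tie-corrected version would be preferable.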

S1: Cumulative success (%)
[Line chart: percent (y-axis, 0-70) by query ordinal (x-axis, 1-5); series: S1 Group A, S1 Group B]
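A cumulative success curve of this kind could be computed from per-topic success ordinals roughly as follows. The function name and the input values are made up for illustration.

```python
def cumulative_success(success_ordinals, max_queries=5):
    """Percentage of sessions that have succeeded by each query ordinal.

    `success_ordinals` holds one entry per topic: the ordinal of the first
    successful query, or None if the session failed.
    """
    total = len(success_ordinals)
    curve = []
    for q in range(1, max_queries + 1):
        solved = sum(1 for o in success_ordinals if o is not None and o <= q)
        curve.append(100.0 * solved / total)
    return curve


# Made-up ordinals for five topics, for illustration only.
curve = cumulative_success([1, 2, None, 1, 5])
print(curve)
```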


[Line chart: percent (y-axis, 0-100) by query ordinal (x-axis, 1-5); series: S2 Group A, S2 Group B]


[Line chart: percent (y-axis, 0-90) by query ordinal (x-axis, 1-5); series: S3 Group A, S3 Group B]


[Line chart: percent (y-axis, 0-100) by ordinal of 10-document page inspected (x-axis, 1-5); series: S4 Baseline]

Non-session view: single best of all S1 query generations compared to S4 baseline
[Bar chart: P@10 in percent (y-axis, 0-30) by session strategy (x-axis: S1, S4)]
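P@10 is the standard precision at rank 10. A minimal sketch, with made-up document ids and a hypothetical pair of candidate one-word queries from which the best is kept:

```python
def precision_at_k(ranked_docs, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    hits = sum(1 for doc_id in ranked_docs[:k] if doc_id in relevant)
    return hits / k  # standard P@k divides by k even if fewer documents were retrieved


# Toy data: two candidate one-word queries for the same topic.
relevant = {"d2", "d5", "d9"}
run_a = [f"d{i}" for i in range(1, 11)]    # d1 .. d10, contains three relevant docs
run_b = [f"d{i}" for i in range(11, 21)]   # d11 .. d20, contains none
best = max(precision_at_k(run, relevant) for run in (run_a, run_b))
print(best)
```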

Non-session view: single best of all S1 query generations compared to S4 baseline
[Bar chart: AP / 38 topics in percent (y-axis, 0-25) by session strategy (x-axis: S1, S4)]
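For reference, a sketch of non-interpolated average precision (AP) for a single ranked list, using toy document ids. Averaging the per-topic values over the topics would give the figure's per-strategy score; that reading of "AP / 38 topics" is an assumption.

```python
def average_precision(ranked_docs, relevant):
    """Non-interpolated average precision for one ranked list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(ap)
```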

Effort: expected number of search keys assuming various strategies
[Bar chart: # search keys to enter (y-axis, 0-18) by session strategy (x-axis: S1, S2, S3, S4)]
Effort for equal level of success: # search keys
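One way to count typing effort for a session, assuming every query is typed in full so that repeated keys are counted again. The slides do not state how keys were counted, so this counting rule and the helper name are assumptions.

```python
def keys_entered(queries, success_ordinal=None):
    """Count the search keys typed up to and including the successful query."""
    # Assumption: every query is typed in full, so repeated keys count again.
    issued = queries if success_ordinal is None else queries[:success_ordinal]
    return sum(len(query.split()) for query in issued)


s1 = ["falkland", "exploration", "island"]
s2 = ["petroleum", "petroleum exploration", "petroleum exploration south"]
print(keys_entered(s1, success_ordinal=2))
print(keys_entered(s2, success_ordinal=3))
```

Averaging such counts over topics would give an expected number of keys per strategy.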

Effort: expected number of queries to launch to find one relevant document
[Bar chart: # queries to enter (y-axis, 0-4) by session strategy (x-axis: S1, S2, S3, S4)]
Effort for equal level of success: # queries

Discussion

Another way to look at the success of IR, motivated by observed user behavior:

short query sessions
short browsing
to find a few good documents.

Log studies justify simulations
Short queries are good enough and easy
§ even if inferior when used individually

Conclusions

Test collection-based IR evaluation could be extended to:

include multiple-query sessions
focus on how the system is used
§ querying/browsing strategies (interaction)
§ in relation to user’s specific goals

focus, in evaluation, on user viewpoint
§ strategies serving a particular goal
§ simulation approach for repeatability + control

Conclusions

Session simulations:
a promising approach to study the limits of the effectiveness of various system uses
findings can be verified with real users
§ but our results motivate the observed real user behavior

A prospect for search training:
recognize QM patterns of users
simulate them
measure session success from user point of view for a “satisfactory result”

Acknowledgement

This research was supported by the Academy of Finland grants #120996 and #124131.

Reference:
Keskustalo, H. & Järvelin, K. & Pirkola, A. & Sharma, T. & Lykke Nielsen, M. (2009). Test Collection-Based IR Evaluation Needs Extension Toward Sessions - A Case of Extremely Short Queries. In: Lee, G. & al., Proceedings of AIRS 2009, Sapporo, Japan, October 2009. Heidelberg: Springer, LNCS vol. 5839, pp. 63-74.

Thank you!
