Lower-Bounding Term Frequency Normalization. Yuanhua Lv and ChengXiang Zhai, University of Illinois at Urbana-Champaign. CIKM 2011 Best Student Award Paper. Speaker: Tom. Nov 8th, 2011


Page 1:

Lower-Bounding Term Frequency Normalization

Yuanhua Lv and ChengXiang Zhai, University of Illinois at Urbana-Champaign

CIKM 2011 Best Student Award Paper

Speaker: Tom

Nov 8th, 2011

Page 2:

It is very difficult to improve retrieval models

• BM25 [Robertson et al. 1994]: 17 years

• Pivoted length normalization (PIV) [Singhal et al. 1996]: 15 years

• Query likelihood with Dirichlet prior (DIR) [Ponte & Croft 1998; Zhai & Lafferty 2001]: 10 years

• PL2 [Amati & van Rijsbergen 2002]: 9 years

All these models remain strong baselines today after so many years!

Page 3:

1. Why does it seem to be so hard to beat these state-of-the-art retrieval models {BM25, PIV, DIR, PL2 …}?

2. Are they hitting the ceiling?

Page 4:

Key heuristic in all effective retrieval models: term frequency (TF) normalization by document length [Singhal et al. 96; Fang et al. 04]

• BM25

$$\sum_{q \in Q \cap D} \frac{(k_3+1)\,c(q,Q)}{k_3+c(q,Q)} \cdot \frac{(k_1+1)\,c(q,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(q,D)} \cdot \log\frac{N+1}{df(q)}$$

• DIR: Query likelihood with Dirichlet prior

$$\sum_{q \in Q \cap D} c(q,Q) \cdot \log\left(1+\frac{c(q,D)}{\mu\,p(q|C)}\right) + |Q| \cdot \log\frac{\mu}{|D|+\mu}$$

Components marked in the formulas: term frequency c(q, D), document length |D|, and term discrimination (df(q) in BM25, p(q|C) in DIR)

PIV and PL2 implement similar retrieval heuristics

Page 5:

However, the component of TF normalization by document length is NOT lower-bounded properly

• BM25

$$\sum_{q \in Q \cap D} \frac{(k_3+1)\,c(q,Q)}{k_3+c(q,Q)} \cdot \frac{(k_1+1)\,c(q,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(q,D)} \cdot \log\frac{N+1}{df(q)}$$

As $|D| \to \infty$, the TF normalization component of BM25 approaches 0.

• DIR: Query likelihood with Dirichlet prior

$$\sum_{q \in Q \cap D} c(q,Q) \cdot \log\left(1+\frac{c(q,D)}{\mu\,p(q|C)}\right) + |Q| \cdot \log\frac{\mu}{|D|+\mu}$$

As $|D| \to \infty$, the length term $\log\frac{\mu}{|D|+\mu}$ decreases without bound.

When a document is very long, its score from matching a query term could be too small!
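The effect can be seen numerically with a minimal sketch. It assumes the usual BM25 TF component and the single-term Dirichlet-prior score; the parameter values (k1 = 1.2, b = 0.75, µ = 2000, avdl = 500, p(q|C) = 0.001) are illustrative assumptions, not values from the slides.

```python
import math

def bm25_tf(c, doc_len, avdl, k1=1.2, b=0.75):
    """BM25 TF component: (k1+1)c / (k1(1-b+b|D|/avdl) + c)."""
    return (k1 + 1) * c / (k1 * (1 - b + b * doc_len / avdl) + c)

def dir_term(c, doc_len, p_c, mu=2000.0):
    """DIR score for a one-term query: log(1 + c/(mu p)) + log(mu/(|D|+mu))."""
    return math.log(1 + c / (mu * p_c)) + math.log(mu / (doc_len + mu))

AVDL = 500.0   # assumed average document length
P_C = 1e-3     # assumed collection probability of the query term

for doc_len in (100, 1_000, 10_000, 100_000):
    # One occurrence of the query term in documents of growing length.
    print(doc_len, bm25_tf(1, doc_len, AVDL), dir_term(1, doc_len, P_C))

# A short document that does NOT contain the term, for comparison.
short_no_match = dir_term(0, 100, P_C)
long_match = dir_term(1, 100_000, P_C)
print(short_no_match, long_match)
```

Under these assumptions the BM25 component shrinks toward 0 as the document grows, and the long matching document ends up with a lower DIR score than the short non-matching one.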

Page 6:

As a result, long documents could be overly penalized

D2 matches the query term, while D1 does not

[Figure: two panels, PL2 and DIR, each plotting Score; both mark a region where S(D2) < S(D1)]

Page 7:

Empirical evidence: long documents are indeed overly penalized

Prob. of relevance/retrieval: the probability of a randomly selected relevant/retrieved document having a certain document length [Singhal et al. 96]

[Figure: two panels comparing the Relevance and Retrieval curves against document length]

Page 8:

Functionality analysis of retrieval models

Bug: TF normalization is not lower-bounded properly, and long documents are overly penalized

Do these retrieval models share this bug because they all violate some necessary retrieval heuristics?

Can we formally capture these necessary heuristics?

White-box Testing

Page 9:

Two novel heuristics for regulating the interactions between TF and doc. length

• LB1: There should be a sufficiently large gap between the presence and absence of a query term
– Document length normalization should not cause a very long document with a non-zero TF to receive a score too close to or even lower than a short document with a zero TF

• LB2: A short document that only covers a very small subset of the query terms should not easily dominate over a very long document that contains many distinct query terms

Page 10:

Lower-bounding constraint 1 (LB1): Occurrence > Non-Occurrence

Q: w

D1: w

D2: w q

Score(Q, D1) = Score(Q, D2)

Q': w q

Score(Q', D1) < Score(Q', D2)

Page 11:

Lower-bounding constraint 2 (LB2): First Occurrence > Repeated Occurrence

Q: q1 q2

D1: q1

D2: q1

Score(Q, D1) = Score(Q, D2)

D1': q1 q1

D2': q1 q2

Score(Q, D1') < Score(Q, D2')

Page 12:

BM25 satisfies LB1 but violates LB2

$$\sum_{t \in Q \cap D} \frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)} \cdot \frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)} \cdot \log\frac{N+1}{df(t)}$$

• LB1 is satisfied unconditionally

• LB2 is equivalent to:

$$|D| < \left(\frac{2/k_1^2 + 2/k_1}{b} + 1\right) avdl$$

(Parameters: k1 > 0 and 0 < b < 1)

Long documents tend to violate LB2

Large b or k1 violates LB2 easily
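To make the dependence on b and k1 concrete, here is a small sketch. It assumes the LB2 condition for BM25 takes the form |D| < ((2/k1² + 2/k1)/b + 1)·avdl, with the reference document of average length containing the query term once; the function names and parameter values are illustrative.

```python
def bm25_tf(c, doc_len, avdl, k1, b):
    """BM25 TF component: (k1+1)c / (k1(1-b+b|D|/avdl) + c)."""
    return (k1 + 1) * c / (k1 * (1 - b + b * doc_len / avdl) + c)

def bm25_lb2_max_len(k1, b, avdl):
    """Assumed LB2 bound: documents longer than this can violate LB2."""
    return ((2 / k1**2 + 2 / k1) / b + 1) * avdl

AVDL = 500.0  # assumed average document length
K1, B = 1.2, 0.75

limit = bm25_lb2_max_len(K1, B, AVDL)

# Gain from a repeated occurrence in an average-length document ...
repeat_gain = bm25_tf(2, AVDL, AVDL, K1, B) - bm25_tf(1, AVDL, AVDL, K1, B)
# ... versus gain from a first occurrence in a document exactly at the bound.
first_gain_at_limit = bm25_tf(1, limit, AVDL, K1, B)

print(limit, repeat_gain, first_gain_at_limit)

# The bound shrinks as b or k1 grows.
for k1, b in ((1.2, 0.3), (1.2, 0.75), (2.0, 0.75)):
    print(k1, b, bm25_lb2_max_len(k1, b, AVDL))
```

At the bound, the two gains coincide; any longer document earns less for a first occurrence than the average-length document earns for a repeated one.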

Page 13:

DIR satisfies LB2 but violates LB1

$$\sum_{q \in Q \cap D} c(q,Q) \cdot \log\left(1+\frac{c(q,D)}{\mu\,p(q|C)}\right) + |Q| \cdot \log\frac{\mu}{|D|+\mu}$$

• LB2 is equivalent to:

$$\frac{\mu\,p(t|C)+n+1}{\mu\,p(t|C)+n} < \frac{\mu\,p(t|C)+1}{\mu\,p(t|C)}$$

satisfied unconditionally!

• LB1 is equivalent to:

$$|D| - avdl < \frac{1}{p(t|C)}\left(\frac{avdl}{\mu}+1\right)$$

Long documents tend to violate LB1

Large µ or non-discriminative terms violate LB1 easily
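The following sketch illustrates the LB1 limit for DIR. It assumes the condition has the form |D| − avdl < (avdl/µ + 1)/p(t|C), with the non-matching document at average length; the values of avdl, µ and p(t|C) are illustrative assumptions.

```python
import math

def dir_score(c, doc_len, p_c, mu):
    """DIR score for a one-term query: log(1 + c/(mu p)) + log(mu/(|D|+mu))."""
    return math.log(1 + c / (mu * p_c)) + math.log(mu / (doc_len + mu))

def dir_lb1_max_len(avdl, p_c, mu):
    """Assumed LB1 bound on the length of the matching document."""
    return avdl + (avdl / mu + 1) / p_c

AVDL, P_C, MU = 500.0, 1e-3, 2000.0

limit = dir_lb1_max_len(AVDL, P_C, MU)
no_match = dir_score(0, AVDL, P_C, MU)   # average-length doc without the term

print(limit)
print(dir_score(1, limit - 1, P_C, MU) > no_match)  # just under the bound
print(dir_score(1, limit + 1, P_C, MU) > no_match)  # just over the bound

# Larger mu or a more common (less discriminative) term tightens the bound.
print(dir_lb1_max_len(AVDL, P_C, 5000.0), dir_lb1_max_len(AVDL, 1e-2, MU))
```

Below the bound the matching document still outscores the non-matching one; above it, the ordering flips, which is the LB1 violation.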

Page 14:

No retrieval model satisfies both constraints

Model  LB1  LB2  Parameter and/or query restrictions
BM25   Yes  No   b and k1 should not be too large
PIV    Yes  No   s should not be too large
PL2    No   No   c should not be too small
DIR    No   Yes  µ should not be too large; query terms should be discriminative

Can we "fix" this problem for all the models in a general way?

Page 15:

Solution: a general approach to lower-bounding TF normalization

• The score of a document D from matching a query term t:

$$F\big(c(t,D),\,|D|,\,td(t)\big)$$

where td(t) is the term discrimination value of t

BM25:

$$\frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)} \cdot \log\frac{N+1}{df(t)}$$

DIR:

$$\log\left(\frac{\mu}{|D|+\mu} + \frac{c(t,D)}{(|D|+\mu)\,p(t|C)}\right)$$

PIV and PL2 also have their corresponding components

Page 16:

Solution: a general approach to lower-bounding TF normalization (Cont.)

• Objective: an improved version $F'\big(c(t,D),\,|D|,\,td(t)\big)$ that does not hurt other retrieval heuristics, but keeps a sufficiently large gap between $F'\big(1,\,|D|,\,td(t)\big)$ and $F'\big(0,\,|D|,\,td(t)\big)$

• A heuristic solution [formula not preserved], which satisfies all retrieval heuristics that are satisfied by $F\big(c(t,D),\,|D|,\,td(t)\big)$
– l can be absorbed into δ

Page 17:

Example: BM25+, a lower-bounded version of BM25

BM25:

$$\sum_{t \in Q \cap D} \frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)} \cdot \frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)} \cdot \log\frac{N+1}{df(t)}$$

BM25+:

$$\sum_{t \in Q \cap D} \frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)} \cdot \left[\frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)} + \delta\right] \cdot \log\frac{N+1}{df(t)}$$

BM25+ incurs almost no additional computational cost

Similarly, we can also improve PIV, DIR, and PL2, leading to PIV+, DIR+, and PL2+ respectively
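A minimal sketch of BM25+ scoring follows, assuming δ is added to the TF component of every matched query term and that setting δ = 0 recovers plain BM25. The function name, the token-list inputs, the `df` dictionary and the default k3 = 1000 are illustrative assumptions rather than part of the slides.

```python
import math
from collections import Counter

def bm25_plus(query, doc, df, n_docs, avdl,
              k1=1.2, b=0.75, k3=1000.0, delta=1.0):
    """Score `doc` (list of tokens) against `query` (list of tokens).

    df maps each term to its document frequency; delta=0.0 gives plain BM25.
    """
    q_tf, d_tf = Counter(query), Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term, cq in q_tf.items():
        c = d_tf.get(term, 0)
        if c == 0:
            continue  # only terms in both Q and D contribute
        query_part = (k3 + 1) * cq / (k3 + cq)
        tf_part = (k1 + 1) * c / (k1 * (1 - b + b * doc_len / avdl) + c)
        idf_part = math.log((n_docs + 1) / df[term])
        # The only change from BM25: the TF part is lower-bounded by delta.
        score += query_part * (tf_part + delta) * idf_part
    return score

doc = ["iraq"] + ["filler"] * 9          # 10 tokens, one match
df = {"iraq": 10}
base = bm25_plus(["iraq"], doc, df, n_docs=1000, avdl=10.0, delta=0.0)
plus = bm25_plus(["iraq"], doc, df, n_docs=1000, avdl=10.0, delta=1.0)
print(base, plus)
```

The extra work per matched term is a single addition, which is consistent with the claim of almost no additional computational cost.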

Page 18:

BM25+ can satisfy both LB1 and LB2

• Similarly to BM25, BM25+ satisfies LB1

• LB2 can also be satisfied unconditionally if:

$$\delta \ge \frac{k_1}{k_1+2}$$

Experiments show later that setting δ = 1.0 works very well
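A short check of why δ = 1.0 is a safe choice, assuming the LB2 threshold is δ ≥ k1/(k1 + 2) and that it comes from the gain of a repeated occurrence in an average-length document; the helper names and the k1 and b values are illustrative.

```python
def bm25_tf(c, doc_len, avdl, k1, b):
    """BM25 TF component: (k1+1)c / (k1(1-b+b|D|/avdl) + c)."""
    return (k1 + 1) * c / (k1 * (1 - b + b * doc_len / avdl) + c)

def lb2_delta_threshold(k1):
    """Assumed smallest delta that satisfies LB2 for any document length."""
    return k1 / (k1 + 2)

AVDL = 500.0  # assumed average document length

for k1 in (0.5, 1.2, 2.0, 10.0):
    # Gain from a second occurrence in an average-length document.
    repeat_gain = bm25_tf(2, AVDL, AVDL, k1, 0.75) - bm25_tf(1, AVDL, AVDL, k1, 0.75)
    print(k1, lb2_delta_threshold(k1), repeat_gain)
```

Because k1/(k1 + 2) is below 1 for every positive k1, δ = 1.0 meets the threshold without any tuning.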

Page 19:

The proposed approach can fix or alleviate the problem of all these retrieval models

Current retrieval models

Model  LB1  LB2
BM25   Yes  No
PIV    Yes  No
PL2    No   No
DIR    No   Yes

Improved retrieval models

Model  LB1         LB2
BM25+  Yes         Yes
PIV+   Yes         Yes
PL2+   Yes         Yes
DIR+   Alleviated  Yes

Page 20:

Experiment Setup

• Standard TREC document collections
– Web: WT2G, WT10G, and Terabyte
– News: Robust04

• Standard TREC query sets:
– Short (the title field): e.g., “Iraq foreign debt reduction”
– Verbose (the description field): e.g., “Identify any efforts, proposed or undertaken, by world governments to seek reduction of Iraq's foreign debt”

• 2-fold cross validation for parameter tuning

Page 21:

BM25+ improves over BM25 significantly

[Results table not preserved; column groups: Web, Web, News (σ = 2.31, σ = 2.63, σ = 1.19); row groups: Short, Verbose]

Superscripts 1/2/3/4 indicate significance at the 0.05/0.02/0.01/0.001 level

BM25+ performs better on Web data than on News data

BM25+ performs better on verbose queries?

δ = 1.0 works well, confirming the constraint analysis that $\delta \ge \frac{k_1}{k_1+2}$

Page 22:

BM25 overly penalizes long documents more seriously for verbose queries

The “condition” under which BM25 violates LB2 is

$$|D| > \left(\frac{2/k_1^2 + 2/k_1}{b} + 1\right) avdl$$

(the bound is monotonically decreasing with b & k1)

The optimal settings of b & k1 are larger for verbose queries

Page 23:

The improvement indeed comes from alleviating the problem of overly penalizing long docs

[Figure: plots labeled BM25 (short), BM25 (verbose), BM25+ (short), BM25+ (verbose)]

Page 24:

DIR+ improves over DIR significantly

[Results table not preserved; row groups: Short, Verbose]

Superscripts 1/2/3/4 indicate significance at the 0.05/0.02/0.01/0.001 level

Fixing δ = 0.05 works very well

DIR+ performs better on verbose than on short queries?

DIR can only satisfy LB1 if

$$|D| - avdl < \frac{1}{p(t|C)}\left(\frac{avdl}{\mu}+1\right)$$

[Table: optimal µ settings]

Page 25:

PL2+ improves over PL2 significantly

[Results table not preserved; row groups: Short, Verbose]

Superscripts 1/2/3/4 indicate significance at the 0.05/0.02/0.01/0.001 level

Fixing δ = 0.8 works very well

PL2+ performs better on verbose than on short queries

Optimal settings of c: the smaller, the more dangerous

Page 26:

PIV+ works as we expected

PIV+ does not consistently outperform PIV, as we expected

Superscript 1 indicates significance at the 0.05 level

PIV can satisfy LB2 if

$$|D| < \left(\frac{0.899}{s} + 1\right) avdl$$

It’s fine, as the optimal settings of s are very small
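The constant 0.899 can be reproduced with a short sketch. It assumes the standard pivoted-normalization TF component (1 + ln(1 + ln c)) / (1 − s + s|D|/avdl), which is not spelled out on the slides, and an average-length reference document containing the query term once; the values of s and avdl are illustrative.

```python
import math

def piv_tf(c, doc_len, avdl, s):
    """Assumed PIV TF component: (1 + ln(1 + ln c)) / (1 - s + s|D|/avdl)."""
    if c == 0:
        return 0.0
    return (1 + math.log(1 + math.log(c))) / (1 - s + s * doc_len / avdl)

# 1 / ln(1 + ln 2) - 1, which rounds to 0.899
LB2_CONST = 1 / math.log(1 + math.log(2)) - 1

def piv_lb2_max_len(s, avdl):
    """Assumed LB2 bound on |D| for PIV."""
    return (LB2_CONST / s + 1) * avdl

AVDL = 500.0  # assumed average document length

for s in (0.05, 0.2, 0.5):
    limit = piv_lb2_max_len(s, AVDL)
    repeat_gain = piv_tf(2, AVDL, AVDL, s) - piv_tf(1, AVDL, AVDL, s)
    first_gain_at_limit = piv_tf(1, limit, AVDL, s)
    print(s, limit, repeat_gain, first_gain_at_limit)
```

With a small s the bound is many times avdl, so few documents are long enough to violate LB2, which matches the remark that small optimal s values make PIV unproblematic.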

Page 27:

1. Why does it seem to be so hard to beat these state-of-the-art retrieval models {BM25, PIV, DIR, PL2 …}?

We weren’t able to figure out their deficiency analytically.

2. Are they hitting the ceiling?

No, they haven’t hit the ceiling yet!

Page 28:

Conclusions

• Reveal a common deficiency of current retrieval models

• Propose two novel formal constraints

• Show that current retrieval models do not satisfy both constraints, and that retrieval performance tends to be poor if either constraint is violated

• Develop a general and efficient solution, which has been shown analytically to fix/alleviate the problem of current retrieval models

• Demonstrate the effectiveness of the proposed algorithms across different collections for different types of queries


Page 29:

Our models {BM25+, DIR+, PL2+} can potentially replace current state-of-the-art retrieval models {BM25, DIR, PL2}

BM25:

$$\sum_{t \in Q \cap D} \frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)} \cdot \frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)} \cdot \log\frac{N+1}{df(t)}$$

BM25+:

$$\sum_{t \in Q \cap D} \frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)} \cdot \left[\frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)} + 1.0\right] \cdot \log\frac{N+1}{df(t)}$$

Page 30:

Future work

• This work has demonstrated the power of axiomatic analysis for fixing deficiencies of retrieval models. Are there any other deficiencies of current retrieval models? If so, can we solve them with axiomatic analysis?

• Can we go beyond bag of words with constraint analysis?

• Can we find a comprehensive set of constraints that is sufficient for deriving a unique (optimal) retrieval function?

Page 31:

Thanks!