A Static Rank Framework for Lucene/Solr Mike Schultz [email protected]


Page 1: A Static Rank Framework for  Lucene / Solr

A Static Rank Framework for Lucene/Solr

Mike Schultz
[email protected]

Page 2: A Static Rank Framework for  Lucene / Solr

Static Rank for Solr/Lucene

• Dynamic Rank

• Why Static Rank

• Combining Scores

• Static Rank Components

Page 3: A Static Rank Framework for  Lucene / Solr

Multiple Fields / Multiple Types

• PubDate: Continuous (Date, Int, Float, …)
• IsNews: Boolean (True, False)
• MediaType: Enum (Book, CD, DVD, Cassette)
• TextBody: Text (Natural Language)

Page 7: A Static Rank Framework for  Lucene / Solr

Dynamic Rank

[Diagram: Query and document fields (PubDate, IsNews, MediaType, TextBody) → TF * IDF → Dynamic Score]

• Query Dependent = F(Q,D)
• Huge dynamic range (0.001-1502.3)
• Not comparable across queries
• Not easily normalized
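To make the "query dependent" and "not comparable across queries" points concrete, here is a minimal sketch using a textbook TF * IDF sum. It is not Lucene's actual similarity, which applies further normalization; the toy corpus and the names `DOCS`, `idf` and `dynamic_score` are invented for illustration.

```python
import math

# Toy corpus, invented for illustration: document id -> list of tokens.
DOCS = {
    "d1": "solr static rank static rank".split(),
    "d2": "lucene query score".split(),
    "d3": "solr lucene solr".split(),
}


def idf(term):
    """Inverse document frequency (smoothed): rarer terms weigh more."""
    df = sum(1 for tokens in DOCS.values() if term in tokens)
    return math.log((1 + len(DOCS)) / (1 + df)) + 1.0


def dynamic_score(query, doc_id):
    """F(Q, D): sum of TF * IDF over the query terms."""
    tokens = DOCS[doc_id]
    return sum(tokens.count(term) * idf(term) for term in query.split())


# The same document gets unrelated scores under different queries.
for query in ("solr", "static rank"):
    scores = {doc_id: round(dynamic_score(query, doc_id), 3) for doc_id in DOCS}
    print(query, scores)
```

The score scale depends on how many terms the query has and how rare they are, so a score from one query says little about a score from another.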

Page 12: A Static Rank Framework for  Lucene / Solr

Why Static Rank?

[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query; Static Rank System → Static Score]

All (dynamic) things equal, I want
– Newer over older
– CD over cassette
– Arbitrary feature A over arbitrary feature B

Page 16: A Static Rank Framework for  Lucene / Solr

Static Rank

[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query; Static Rank System → Static Score]

• Query Independent = F(D)
  – i.e. static across queries
• More easily bounded

Page 18: A Static Rank Framework for  Lucene / Solr

Combined Rank

[Diagram: document fields (PubDate, IsNews, MediaType, TextBody), Query, TF * IDF and Static Rank System → Custom Query → Combined Score]

Page 19: A Static Rank Framework for  Lucene / Solr

Framework - Requirements

[Diagram: Custom Query → Combined Score]

• Intuitive, hand-tunable, debuggable
• Query-time only, no re-indexing
• Minimal parameters
• Static Rank should boost / demote
  – But not too much!
  – Docs should stay in their own dynamic rank “neighborhood”.

Page 23: A Static Rank Framework for  Lucene / Solr

Combining Scores - Approaches

[Diagram: Custom Query → Combined Score]

• Addition?
  – Dynamic(0.0001) + Static(0.3) = 0.3001
  – Dynamic(1542.1) + Static(0.3) = 1542.4
  – Difficult to get right across queries
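The two additions on this slide can be checked with a short sketch that also reports how much the static term moves each dynamic score in relative terms. The function names are illustrative, not part of any framework.

```python
def additive(dynamic, static):
    """Combine scores by plain addition."""
    return dynamic + static


def relative_change(dynamic, static):
    """Fraction by which adding the static term changes the dynamic score."""
    return (additive(dynamic, static) - dynamic) / dynamic


# The same static score of 0.3 applied to the two dynamic scores from the slide.
for dynamic in (0.0001, 1542.1):
    combined = additive(dynamic, 0.3)
    change = relative_change(dynamic, 0.3)
    print(f"dynamic={dynamic}: combined={combined:.4f}, relative change={change:.4%}")
```

With a small dynamic score the static term swamps it; with a large one the static term is negligible. That is why a single additive weight is hard to tune across queries.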

Page 24: A Static Rank Framework for  Lucene / Solr

Combining Scores - Approaches

[Diagram: Custom Query → Combined Score]

• Multiplication?
  – Dynamic(50.0) * Static(0.3) = 15.0
  – Dynamic(10.0) * Static(2.0) = 20.0
  – Could work, but awkward

Page 25: A Static Rank Framework for  Lucene / Solr

Combining Scores - Approaches

[Diagram: Linear Query → Combined Score]

1. Bound StaticScore: -1.0 to 1.0
2. CScore = DScore*(100+S%*SScore)
  – At most, staticRank will boost/demote dynamicScore by S%
  – CScore = 0.014 * (100+30*0.5)
  – CScore = 145.3 * (100+30*-0.5)
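A minimal sketch of the linear combination above, evaluated on the two example lines from the slide. The function names are illustrative and this is not the LinearQuery implementation itself; `relative_boost` is an added helper that expresses the change relative to a neutral static score of 0.

```python
def combined_score(dynamic_score, static_score, s_percent):
    """CScore = DScore * (100 + S% * SScore), with SScore bounded to [-1, 1]."""
    if not -1.0 <= static_score <= 1.0:
        raise ValueError("static_score must lie in [-1.0, 1.0]")
    return dynamic_score * (100.0 + s_percent * static_score)


def relative_boost(static_score, s_percent):
    """Change relative to a neutral static score of 0, as a fraction."""
    return s_percent * static_score / 100.0


# The two examples from the slide, with S% = 30.
for dynamic, static in ((0.014, 0.5), (145.3, -0.5)):
    print(dynamic, static,
          combined_score(dynamic, static, 30),
          relative_boost(static, 30))
```

Because the static term scales the dynamic score instead of being added to it, the relative boost or demotion is the same for small and large dynamic scores, and it never exceeds S%.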

Page 26: A Static Rank Framework for  Lucene / Solr

LinearQuery

Page 27: A Static Rank Framework for  Lucene / Solr

Static Rank

[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query; Static Rank System → Static Score]

• Extend solr.ValueSource/Parser
• Uses field cache for inputs
• Extremely fast

Page 31: A Static Rank Framework for  Lucene / Solr

Static Rank

[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (labels: 0, T, F) → years ago; MediaType]

Page 34: A Static Rank Framework for  Lucene / Solr

MuxValueSource Config

Page 35: A Static Rank Framework for  Lucene / Solr

Static Rank

[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (labels: 0, T, F) → years ago; MediaType → EnumValueSource]

Page 36: A Static Rank Framework for  Lucene / Solr

EnumValueSource Config

• Maps Fixed-Vocabulary to YEARS AGO
• A hierarchy and 3 values: MIN,0,MAX
• All things equal (dynamically), DVD = +3.3 years
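The idea of mapping a fixed vocabulary onto years can be sketched as a simple lookup. This is an illustration only: the actual config is not reproduced in this transcript, the hierarchy and the MIN/0/MAX values are not modeled, only the DVD figure comes from the slide, and the sign convention is an assumption.

```python
# Hypothetical bonus table, in years. Only the DVD figure (+3.3) comes from
# the slide; the other entries are invented for illustration.
MEDIA_BONUS_YEARS = {"DVD": 3.3, "CD": 1.0, "Book": 0.0, "Cassette": -3.3}


def enum_years_ago(media_type, table=MEDIA_BONUS_YEARS):
    """Return the years-ago contribution of an enum value.

    Assumed convention: a positive bonus makes a document look that many
    years newer, so it contributes negative years ago. Values missing from
    the table are treated as neutral.
    """
    bonus = table.get(media_type, 0.0)
    return -bonus if bonus else 0.0


for media_type in ("DVD", "Cassette", "Vinyl"):
    print(media_type, enum_years_ago(media_type))
```

Expressing the preference in years lets an enum feature be added directly to a date-based feature.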

Page 37: A Static Rank Framework for  Lucene / Solr

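Static Rank

[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (labels: 0, T, F) → years ago; MediaType → EnumValueSource → years ago; SumValueSource → years ago. Open question (?): how to map the summed years ago onto the range -1 to 1]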

Page 38: A Static Rank Framework for  Lucene / Solr

Mapping YearsAgo to -1.0 – 1.0

• Step Function: if > 10 years-ago = -1, else = +1
  – 1 parameter
  – Too abrupt
• Linear
  – No parameters (fixed)
  – Too gradual over 2000+ years
• Sigmoid
  – 2 parameters
  – Smooth over entire range
  – Easy to calculate
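The three candidate mappings can be compared side by side with a small sketch. The slides do not give the sigmoid formula, so `tanh` is an assumed choice here. The 2000-year span for the linear mapping and the default slope are also assumptions; the sigmoid zero crossing of 1.5 years is the example value shown on a later slide.

```python
import math

LINEAR_SPAN_YEARS = 2000.0  # assumed fixed range for the linear mapping


def step(years_ago, threshold=10.0):
    """Step function: -1 past the threshold, +1 otherwise (1 parameter)."""
    return -1.0 if years_ago > threshold else 1.0


def linear(years_ago):
    """Straight line from +1 at 0 years ago to -1 at the end of the span."""
    clamped = min(max(years_ago, 0.0), LINEAR_SPAN_YEARS)
    return 1.0 - 2.0 * clamped / LINEAR_SPAN_YEARS


def sigmoid(years_ago, slope=1.0, x0=1.5):
    """Smooth S-curve crossing 0 at x0 years ago (2 parameters)."""
    return -math.tanh(slope * (years_ago - x0))


for years in (0.0, 1.5, 5.0, 10.0, 50.0):
    print(years, step(years), round(linear(years), 4), round(sigmoid(years), 4))
```

Over the first few decades the linear mapping barely moves and the step function jumps once, while the sigmoid changes quickly around `x0` and flattens toward the bounds.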

Page 41: A Static Rank Framework for  Lucene / Solr

Sigmoid

[Plot: sigmoid curve of score (1.0 to -1.0) against Years-ago. Parameters: Slope and x-intercept (year); example x0 = 1.5 years ago]

Page 44: A Static Rank Framework for  Lucene / Solr

Static Rank

[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (labels: 0, T, F) → years ago; MediaType → EnumValueSource → years ago; SumValueSource → SigmoidValueSource → output in the range -1 to 1]
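The composition in this diagram can be sketched as a small tree of value sources. This is a plain-Python analogy, not the Solr ValueSource API. The class names mirror the diagram labels, but the wiring of the Mux (age counts only for news), the enum offsets, the `tanh` sigmoid, the slope and the reference date are all assumptions.

```python
import math
from datetime import date


class ValueSource:
    """Computes one float per document (a dict of field values here)."""
    def value(self, doc):
        raise NotImplementedError


class Const(ValueSource):
    def __init__(self, constant):
        self.constant = constant

    def value(self, doc):
        return self.constant


class Ago(ValueSource):
    """Years elapsed between a date field and a reference date."""
    def __init__(self, field, today):
        self.field, self.today = field, today

    def value(self, doc):
        return (self.today - doc[self.field]).days / 365.25


class Mux(ValueSource):
    """Selects one of two sources depending on a boolean field."""
    def __init__(self, field, if_true, if_false):
        self.field, self.if_true, self.if_false = field, if_true, if_false

    def value(self, doc):
        chosen = self.if_true if doc[self.field] else self.if_false
        return chosen.value(doc)


class Enum(ValueSource):
    """Maps a fixed vocabulary to a years-ago offset (0 if unlisted)."""
    def __init__(self, field, table):
        self.field, self.table = field, table

    def value(self, doc):
        return self.table.get(doc[self.field], 0.0)


class Sum(ValueSource):
    def __init__(self, *sources):
        self.sources = sources

    def value(self, doc):
        return sum(source.value(doc) for source in self.sources)


class Sigmoid(ValueSource):
    """Squashes years ago into the range -1 to 1, crossing 0 at x0."""
    def __init__(self, source, slope, x0):
        self.source, self.slope, self.x0 = source, slope, x0

    def value(self, doc):
        return -math.tanh(self.slope * (self.source.value(doc) - self.x0))


TODAY = date(2020, 1, 1)  # arbitrary fixed reference date
STATIC_RANK = Sigmoid(
    Sum(
        # Assumed wiring: age counts only for news documents.
        Mux("IsNews", if_true=Ago("PubDate", TODAY), if_false=Const(0.0)),
        # Assumed sign: DVD counts as 3.3 years newer, cassette as older.
        Enum("MediaType", {"DVD": -3.3, "Cassette": 3.3}),
    ),
    slope=0.5,
    x0=1.5,
)

fresh = {"PubDate": date(2019, 12, 1), "IsNews": True, "MediaType": "DVD"}
stale = {"PubDate": date(2005, 1, 1), "IsNews": True, "MediaType": "Cassette"}
print(STATIC_RANK.value(fresh), STATIC_RANK.value(stale))
```

Every intermediate node speaks in years ago, so features of different types can be summed before a single sigmoid bounds the result.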

Page 45: A Static Rank Framework for  Lucene / Solr

SigmoidValueSource Config

Page 46: A Static Rank Framework for  Lucene / Solr

Static Rank Config

Page 47: A Static Rank Framework for  Lucene / Solr

Conclusion

• solr.ValueSource/Parser - fast and flexible
• CScore = DScore * (100 + S% * SScore)
• -1.0 < SScore < 1.0
• “Time” as a common currency for static features