Investigating per-Topic Upper Bound for Session Search Evaluation
Zhiwen Tang
Grace Hui Yang
Department of Computer Science, Georgetown University

Session Search
• Multiple runs of search
• Complex information need
• Evaluation needs to consider the whole process

Evaluation of Session Search
• Useful information that the user gains
  • Raw relevance score
• Discounting
  • Based on document ranking
  • Based on diversity
• User's efforts
  • Time spent
  • Lengths of documents being viewed

The Problem
• Most session search metrics combine all of these factors into one overwhelmingly complex formula
• The optimal value (a.k.a. the upper bound) of these metrics varies greatly across search topics
• In Cranfield-like settings (e.g., TREC), this difference is often ignored

Toy example
• Two systems
• All systems return 5 docs per round
• Each system conducts one round of interaction
• Metric: Cube Test
  • Luo, Jiyun, et al. "The water filling model and the cube test: multi-dimensional evaluation for professional search." CIKM, 2013.

$$CT = \frac{\sum_{c} \theta_c \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{rel}_c(i,j)\, \gamma^{n(c,i,j-1)}}{\sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{cost}(i,j)}$$

Doc relevance scores regarding topic-subtopic (1-1, 1-2, 2-1, 2-2, 2-3, 2-4, 2-5):
• d1: 1, 4
• d2: 3, 4
• d3: 4
• d4: 4
• d5: 4

System | Topic 1 ranking | CT (topic 1) | Topic 2 ranking | CT (topic 2) | CT-avg | Normalized CT-avg
System 1 | d1, irrel, irrel, irrel, irrel | 1 | d1, d3, d4, d5, irrel | 16 | 8.5 | 0.596
System 2 | d2, irrel, irrel, irrel, irrel | 3 | d1, d3, d4, d5, irrel | 14 | 8.5 | 0.787
Optimal | d1, d2, irrel, irrel, irrel | 4 | d1, d2, d3, d4, d5 | 17 | |
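The last column of the table can be reproduced from the raw CT values. The sketch below assumes a lower bound of 0 on each topic and uses the "Optimal" row as the per-topic upper bound. The names `raw_ct`, `upper_bound`, `ct_avg` and `normalized_ct_avg` are illustrative and not part of any released tool.

```python
# Raw per-topic CT values copied from the toy-example table.
raw_ct = {
    "System 1": {"topic1": 1.0, "topic2": 16.0},
    "System 2": {"topic1": 3.0, "topic2": 14.0},
}
# Upper bounds from the "Optimal" row; lower bounds are assumed to be 0.
upper_bound = {"topic1": 4.0, "topic2": 17.0}


def ct_avg(scores):
    """Unnormalized average of CT over topics."""
    return sum(scores.values()) / len(scores)


def normalized_ct_avg(scores, bounds, lower=0.0):
    """Average over topics of (raw - lower) / (upper - lower)."""
    per_topic = [(s - lower) / (bounds[t] - lower) for t, s in scores.items()]
    return sum(per_topic) / len(per_topic)


for system, scores in raw_ct.items():
    print(system, ct_avg(scores), round(normalized_ct_avg(scores, upper_bound), 3))
```

Running it prints `System 1 8.5 0.596` and `System 2 8.5 0.787`: the two systems tie on raw CT-avg, but System 2 ranks higher once each topic is scaled by its own bound.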

Research Questions
• What is the optimal metric value that a system can achieve?
• How do we get the upper bound for each search topic?
• How does it affect the evaluation conclusions?
  • Variance across topics
  • Normalization

$$\text{score}_A = \sum_{\text{topic}} \frac{\text{raw\_score}(\text{topic}, A) - \text{lower\_bound}(\text{topic})}{\text{upper\_bound}(\text{topic}) - \text{lower\_bound}(\text{topic})}$$
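A direct transcription of the normalization formula might look like the following. It is a minimal sketch: the function name `normalized_score` and the dictionaries are hypothetical, and the example values are made up to mimic a metric such as EU, whose bounds can be negative because cost is subtracted.

```python
def normalized_score(raw, lower, upper):
    """score_A = sum over topics of (raw - lower) / (upper - lower).

    raw, lower and upper map each topic to system A's raw score and to the
    topic's lower and upper bound for the same metric.
    """
    total = 0.0
    for topic, r in raw.items():
        span = upper[topic] - lower[topic]
        if span <= 0:
            # Degenerate topic: every possible run gets the same score.
            raise ValueError(f"upper bound must exceed lower bound for {topic!r}")
        total += (r - lower[topic]) / span
    return total


# Hypothetical EU-style values with negative lower bounds.
raw = {"t1": -2.0, "t2": 5.0}
lower = {"t1": -10.0, "t2": -5.0}
upper = {"t1": 6.0, "t2": 15.0}
print(normalized_score(raw, lower, upper))
```

Both topics land at 0.5 of their own range, so the sum is 1.0. Rejecting topics whose bounds coincide is a choice of this sketch; such a topic cannot separate systems anyway.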

Session search metrics
• Session-DCG (sDCG)
  • Järvelin, Kalervo, et al. "Discounted cumulated gain based evaluation of multiple-query IR sessions." Advances in Information Retrieval (2008): 4-15.
• Cube Test (CT)
  • Luo, Jiyun, et al. "The water filling model and the cube test: multi-dimensional evaluation for professional search." CIKM, 2013.
• Expected Utility (EU)
  • Yang, Yiming, and Abhimanyu Lad. "Modeling expected utility of multi-session information distillation." Conference on the Theory of Information Retrieval. Springer, Berlin, Heidelberg, 2009.

$$sDCG = \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \frac{\mathrm{rel}(i,j)}{(1 + \log_b j)(1 + \log_{bq} i)}$$

$$CT = \frac{\sum_{c} \theta_c \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{rel}_c(i,j)\, \gamma^{n(c,i,j-1)}}{\sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{cost}(i,j)}$$

$$EU = \sum_{\omega} P(\omega) \sum_{(i,j) \in \omega} \Big( \sum_{c \in d_{i,j}} \theta_c\, \gamma^{n(c,i,j-1)} - a \cdot \mathrm{cost}(i,j) \Big)$$
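For concreteness, the sDCG formula can be evaluated on a session stored as one list of relevance values per iteration. This is a sketch under assumptions: the list-of-lists layout, the function name `sdcg`, and the defaults `b=2`, `bq=4` are illustrative choices, not values taken from these slides.

```python
import math


def sdcg(session, b=2.0, bq=4.0):
    """Session DCG as written on the slide.

    session[i - 1][j - 1] holds rel(i, j), the relevance of the document at
    rank j in search iteration i; b and bq are the rank and iteration log bases.
    """
    total = 0.0
    for i, ranked in enumerate(session, start=1):
        iteration_discount = 1 + math.log(i, bq)
        for j, rel in enumerate(ranked, start=1):
            total += rel / ((1 + math.log(j, b)) * iteration_discount)
    return total


# Two iterations: three documents, then one.
print(sdcg([[3, 0, 1], [2]]))
```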

Deconstruct the metrics
• Gain: the amount of useful information a user can learn from a document
• Cost: the effort the user spends on that document
• Ranking discounts:
  • Based on the original ranking position of a document
  • Assumption: the lower a document ranks, the less likely the user is to read it
• Novelty discounts:
  • Measure the user's knowledge coverage; a general form of ranking discount
  • Assumption: if a document relates to a subtopic/nugget the user has already read about, it contributes less novel information about that subtopic/nugget
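• sDCG:
$$sDCG = \text{DiscountedGain} = \sum_{d} \text{rank\_discount}_d \cdot \text{gain}_d$$
• Cube Test:
$$CT = \frac{\text{DiscountedGain}}{\text{Cost}} = \frac{\sum_{d} \sum_{c} \text{novelty\_discount}_{d,c} \cdot \text{gain}_{d,c}}{\sum_{d} \text{cost}_d}$$
• Expected Utility:
$$EU = \text{DiscountedGain} - \text{DiscountedCost} = \sum_{d} \sum_{c} \text{novelty\_discount}_{d,c} \cdot \text{rank\_discount}_d \cdot \text{gain}_{d,c} - \sum_{d} \text{rank\_discount}_d \cdot \text{cost}_d$$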
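The deconstructed forms of CT and EU can be computed from per-document factors. The sketch below makes simplifying assumptions that are not from the slides: a single fixed browsing path (one $\omega$ with $P(\omega) = 1$), a novelty discount of $\gamma^{k}$ where $k$ counts earlier documents with gain on the same subtopic, and a caller-supplied `rank_discount` per document. `Doc`, `novelty_discounts`, `ct` and `eu` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    gains: Dict[str, float]  # gain_{d,c} for each subtopic c the document covers
    cost: float              # cost_d, e.g. reading effort
    rank_discount: float     # rank_discount_d, e.g. chance the user reaches d


def novelty_discounts(docs: List[Doc], gamma: float) -> List[Dict[str, float]]:
    """novelty_discount_{d,c} = gamma ** (number of earlier docs with gain on c)."""
    seen: Dict[str, int] = {}
    result = []
    for d in docs:
        result.append({c: gamma ** seen.get(c, 0) for c in d.gains})
        for c, g in d.gains.items():
            if g > 0:
                seen[c] = seen.get(c, 0) + 1
    return result


def ct(docs: List[Doc], gamma: float) -> float:
    """CT = DiscountedGain / Cost (novelty discount only, no rank discount)."""
    nd = novelty_discounts(docs, gamma)
    gain = sum(nd[k][c] * g for k, d in enumerate(docs) for c, g in d.gains.items())
    return gain / sum(d.cost for d in docs)


def eu(docs: List[Doc], gamma: float) -> float:
    """EU = DiscountedGain - DiscountedCost along one fixed browsing path."""
    nd = novelty_discounts(docs, gamma)
    gain = sum(
        nd[k][c] * d.rank_discount * g
        for k, d in enumerate(docs)
        for c, g in d.gains.items()
    )
    cost = sum(d.rank_discount * d.cost for d in docs)
    return gain - cost


docs = [
    Doc(gains={"a": 4.0}, cost=1.0, rank_discount=1.0),
    Doc(gains={"a": 4.0, "b": 2.0}, cost=1.0, rank_discount=0.5),
]
print(ct(docs, gamma=0.5), eu(docs, gamma=0.5))
```

This prints `4.0 4.5`: the second document's gain on subtopic `a` is halved because `a` was already covered, while its gain on `b` is not.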

Optimization Method
• Factors considered in the metrics: gain, cost, ranking discount, novelty discount
• We are dealing with rankings: how do we maximize or minimize the discounted sum?

Our solution
• Rearrangement Inequality: for $x_1 \le x_2 \le \dots \le x_n$, $y_1 \le y_2 \le \dots \le y_n$, and any permutation $\sigma$,
$$x_1 y_n + x_2 y_{n-1} + \dots + x_n y_1 \le x_{\sigma(1)} y_1 + x_{\sigma(2)} y_2 + \dots + x_{\sigma(n)} y_n \le x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$
• In IR, the Probability Ranking Principle [4]: the overall effectiveness of an IR system is best achieved by ranking documents in descending order of their usefulness
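The rearrangement inequality can be checked numerically by brute force over all pairings. In the IR reading, `x` plays the role of document gains and `y` the position discounts, so pairing both in the same order (highest gain at the highest discount, as the Probability Ranking Principle prescribes) gives the maximum. `rearrangement_sums` is an illustrative helper, not part of the original work.

```python
from itertools import permutations


def rearrangement_sums(x, y):
    """Return (opposite-order sum, min, max, same-order sum) over all pairings."""
    xs, ys = sorted(x), sorted(y)
    same = sum(a * b for a, b in zip(xs, ys))
    opposite = sum(a * b for a, b in zip(xs, reversed(ys)))
    sums = [sum(a * b for a, b in zip(p, ys)) for p in permutations(xs)]
    return opposite, min(sums), max(sums), same


# x: document gains; y: position discounts (input order does not matter).
print(rearrangement_sums([3, 1, 2], [0.5, 1.0, 0.25]))
```

This prints `(2.75, 2.75, 4.25, 4.25)`: the opposite ordering attains the minimum and the same ordering the maximum.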
• But in our problem:
  • Multiple ranking lists must be optimized simultaneously
  • E.g., maximize the gain on all subtopics simultaneously
• How?
  • Optimize each required ranking list independently to approximate the overall bound

sDCG
• Only one ranking list needs to be optimized
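$$sDCG = \text{DiscountedGain} = \sum_{d} \text{rank\_discount}_d \cdot \text{gain}_d$$

$$sDCG = \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \frac{\mathrm{rel}(i,j)}{(1 + \log_b j)(1 + \log_{bq} i)}$$

$$\text{maximize} \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \frac{\mathrm{rel}(i,j)}{(1 + \log_b j)(1 + \log_{bq} i)}$$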
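Because sDCG needs only one ranking list, its bound can be sketched by sorting: collect the weight $1/((1+\log_b j)(1+\log_{bq} i))$ of every position, sort the weights and the judged relevance values in descending order, and pair them. This assumes each judged document can be shown at most once and that the number of documents per iteration is fixed; `sdcg_upper_bound` and the default bases are hypothetical.

```python
import math


def sdcg_upper_bound(relevances, list_sizes, b=2.0, bq=4.0):
    """Upper bound on sDCG for one topic.

    relevances: judged relevance values of the topic's documents, each usable once.
    list_sizes: number of documents shown in iteration i = 1, 2, ...
    By the rearrangement inequality, the largest gains go to the positions
    with the largest weights 1 / ((1 + log_b j)(1 + log_bq i)).
    """
    weights = [
        1.0 / ((1 + math.log(j, b)) * (1 + math.log(i, bq)))
        for i, size in enumerate(list_sizes, start=1)
        for j in range(1, size + 1)
    ]
    weights.sort(reverse=True)
    gains = sorted(relevances, reverse=True)
    # zip stops at the shorter list, so missing documents contribute nothing.
    return sum(w * g for w, g in zip(weights, gains))


print(sdcg_upper_bound([1, 3, 2], list_sizes=[2, 1]))
```

With these bases the first document of iteration 2 (weight 1/1.5) outweighs the second document of iteration 1 (weight 1/2), which is why the positions are sorted globally instead of within each list.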

Cube Test (CT)
• |C| + 1 ranking lists need to be optimized (C: the set of subtopics): one gain list per subtopic, plus one cost list
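$$CT = \frac{\sum_{c} \theta_c \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{rel}_c(i,j)\, \gamma^{n(c,i,j-1)}}{\sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{cost}(i,j)}$$

$$CT = \frac{\text{DiscountedGain}}{\text{Cost}} = \frac{\sum_{d} \sum_{c} \text{novelty\_discount}_{d,c} \cdot \text{gain}_{d,c}}{\sum_{d} \text{cost}_d}$$

$$\text{maximize} \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{rel}_c(i,j)\, \gamma^{n(c,i,j-1)} \quad \forall c$$

$$\text{minimize} \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{cost}(i,j)$$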
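The CT bound can be approximated as described: optimize each subtopic's gain list and the cost list independently. In this sketch the $k$-th relevant document for subtopic $c$ ($k = 0, 1, \dots$) is discounted by $\gamma^k$, the session returns a fixed number of documents, and costs are per document; these, together with the function name `ct_upper_bound`, are assumptions. With non-negative gains and positive costs, the independent optima can only overestimate the CT of a real session, since no single ranking can beat every list at once.

```python
def ct_upper_bound(rel_by_subtopic, theta, costs, n_positions, gamma):
    """Approximate CT bound by optimizing each ranking list independently.

    rel_by_subtopic: subtopic c -> relevance values of the documents relevant to c
    theta: subtopic c -> weight theta_c
    costs: cost of every candidate document
    n_positions: total number of documents the session returns
    """
    gain = 0.0
    for c, rels in rel_by_subtopic.items():
        # The k-th relevant document for c is discounted by gamma**k,
        # so descending relevance maximizes this list.
        top = sorted(rels, reverse=True)[:n_positions]
        gain += theta[c] * sum(r * gamma ** k for k, r in enumerate(top))
    # The cost list is minimized by the n_positions cheapest documents.
    min_cost = sum(sorted(costs)[:n_positions])
    return gain / min_cost


print(ct_upper_bound({"a": [1, 4], "b": [2]}, {"a": 0.5, "b": 0.5},
                     costs=[2.0, 1.0, 3.0], n_positions=2, gamma=0.5))
```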

Expected Utility (EU)
• An approximation of EU [2]
• $\omega$: the subset of documents the user checked
• |C| + 1 ranking lists need to be optimized (C: the set of subtopics)
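$$EU = \frac{1}{1-\gamma} \sum_{c} \theta_c \left(1 - \gamma^{\sum_{\omega} P(\omega)\, n(c,\omega)}\right) - a \sum_{\omega} P(\omega)\, \mathrm{len}(\omega)$$

$$\text{maximize} \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{rel}_c(i,j)\, (1-p)^{j-1} \quad \forall c$$

$$\text{minimize} \sum_{i=1}^{T} \sum_{j=1}^{|\mathrm{list}_i|} \mathrm{cost}(i,j)\, (1-p)^{j-1}$$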
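The two EU objectives can be solved the same way, with position $(i, j)$ weighted by $(1-p)^{j-1}$: gains in the same order as the weights, costs in the opposite order. This sketch treats $p$ as a given number in $[0, 1)$, assumes a fixed number of documents per iteration, and returns the two optimized sums rather than a combined EU value; `eu_objectives` is a hypothetical name.

```python
def eu_objectives(rel_by_subtopic, costs, list_sizes, p):
    """Solve the two EU objectives independently.

    Position (i, j) is weighted by (1 - p) ** (j - 1).
    Returns ({c: maximized gain sum}, minimized cost sum).
    """
    weights = sorted(
        ((1 - p) ** (j - 1) for size in list_sizes for j in range(1, size + 1)),
        reverse=True,
    )
    n = len(weights)
    max_gain = {}
    for c, rels in rel_by_subtopic.items():
        # Same order: largest relevance on the largest weight.
        top = sorted(rels, reverse=True)[:n]
        max_gain[c] = sum(w * r for w, r in zip(weights, top))
    # Opposite order: the n cheapest documents, cheapest on the largest weight.
    cheapest = sorted(costs)[:n]
    min_cost = sum(w * cst for w, cst in zip(weights, cheapest))
    return max_gain, min_cost


print(eu_objectives({"a": [1, 3, 2]}, costs=[5.0, 1.0, 2.0, 4.0],
                    list_sizes=[2, 1], p=0.5))
```

This prints `({'a': 5.5}, 5.0)`. Picking the cheapest documents and placing them opposite to the weights is optimal for the cost list because all weights are positive.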

Experiments
• Dataset: submitted runs of the TREC 2016 Dynamic Domain (DD) track
• Some statistics of the TREC 2016 DD corpus:
  • #Topics = 53
  • #Subtopics = 242
  • #Relevant docs = 14,597

Bounds on different topics
• sDCG: $sDCG = \text{DiscountedGain}$
• CT: $CT = \text{DiscountedGain} / \text{Cost}$
• EU: $EU = \text{DiscountedGain} - \text{DiscountedCost}$

Conclusion 1
• The optimal value a metric can produce differs greatly from topic to topic, and this difference should not be ignored.

Normalization Effect
• sDCG: $sDCG = \text{DiscountedGain}$
• CT: $CT = \text{DiscountedGain} / \text{Cost}$
• EU: $EU = \text{DiscountedGain} - a \cdot \text{DiscountedCost}$, with $a = 0.01$
• EU: $EU = \text{DiscountedGain} - a \cdot \text{DiscountedCost}$, with $a = 0.001$

Conclusion 2
• Normalizing scores by the per-topic bounds brings more fairness into evaluation

Summary
• Deconstruction of session search metrics
• Computing the upper bound on each search topic
• Huge variance in the upper bounds among topics
• Normalization provides another viewpoint

Discussion
• Can this bound help us design a better session search system?
• Lazy user, smart system
• If the system has completed the first $k$ iterations and knows its actual score
• And if it also knows the upper bound score for $k+1$ iterations
• Should it stop or continue?

Resource
• Used in this year's TREC DD evaluation
  • https://github.com/trec-dd/trec-dd-jig
  • http://trec-dd.org/

Thank you!

Reference
• [1] Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In European Conference on Information Retrieval. Springer, 4-15.
• [2] Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 709-714.
• [3] Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. In Conference on the Theory of Information Retrieval. Springer, 164-175.
• [4] Stephen E. Robertson. 1977. The probability ranking principle in IR. Journal of Documentation 33, 4, 294-304.