THE WATER FILLING MODEL AND THE CUBE TEST: Multi-Dimensional Evaluation for Professional Search
Jiyun Luo1 Christopher Wing1 Grace Hui Yang1 Marti A. Hearst2
1Department of Computer Science Georgetown University Washington, DC, USA {jl1749, cpw26}@georgetown.edu huiyang@cs.georgetown.edu CIKM 2013
2School of Information University of California, Berkeley Berkeley, CA, USA hearst@berkeley.edu
INTRODUCTION
• Complicated search has recently received much attention
• Professional search activities are usually complicated search tasks
  - Examples: medical record search, legal search, patent prior art search
• Evaluation metrics need to reflect this complexity
  - U-measure for whole-session evaluation [Sakai et al., SIGIR'13]
  - Time-biased gain (TBG) [Smucker and Clarke, SIGIR'12]
  - α-nDCG for diversity and novelty [Clarke et al., SIGIR'08]
  - PRES for recall-oriented search tasks [Magdy and Jones, SIGIR'10]
PROFESSIONAL SEARCH
• Rich information needs
  - Multiple aspects or subtopics
• Time-sensitive
  - It is not true that professional searchers (e.g., lawyers) are happy to read irrelevant documents because they are paid by the hour and care only about recall
• Novelty
  - Once one relevant document has been examined, subsequent relevant documents are perceived as less relevant
• Stopping criteria
  - Once a sub-information-need has been fulfilled, further relevant documents about it contribute little
• A mix of unranked and ranked retrieval
  - Boolean search and proximity search are still popular
Patent Prior Art Search
Fenestration Segment Stent-Graft and Fenestration Method (US 20090259290 A1)
ABSTRACT
A method includes deploying a fenestration segment stent-graft into a main vessel such that a fenestration section …
Claims
1. A fenestration segment stent-graft comprising: a proximal section comprising a woven graft cloth; …
2. The fenestration segment stent-graft of claim 1 wherein said proximal section comprises a proximal end and a distal end, …
3. The fenestration segment stent-graft of claim 2 wherein said attachment means comprises stitching. …
20. A fenestration segment stent-graft comprising: a proximal section; a distal section; …
21. The fenestration segment stent-graft of claim 20 wherein said fenestration section comprises: graft material comprising loose woven fibers…
• Goal: find published literature that can be used to "say no" to a patent application. A granted patent should be novel and non-trivial.
• Time constraint: less than 6 hours
[Figure: claim hierarchy, with claims labeled as independent or dependent]
We draw an analogy between professional search and filling a cube with water.
Professional Search
• An information need with multiple subtopics
• Goal: fulfill the information need with relevant documents as soon as possible
• A document can cover different subtopics
• The searcher stops looking for more relevant documents for a subtopic, or for the entire information need
The Water-filling Model
• A cube with multiple segments
• Goal: fill up the cube with water as soon as possible
• "Document water" can flow into different segments
• Once a segment reaches its cap, no more water can go there
How do we judge whether a search system is good?
• We assume the searcher wants the multiple subtopics of a task to be fulfilled as quickly and as fully as possible
The Task Cube
• The cube, with unit side length, represents the entire information need
• Each cuboid in the cube represents a subtopic
• The top of the cube is the cap that limits the maximum amount of relevant information needed
  - This is the stopping criterion
• The bottom is segmented into different areas
  - The size of each area indicates the importance of the corresponding subtopic
  - E.g., in prior art search, independent claims are assigned more weight than dependent claims
[Figure: An empty task cube for a search task with 6 subtopics]
The Water Filling Model
• A newly arriving relevant document raises the water level in every subtopic it is relevant to
• The height increment is the relevance gain from that document with respect to that subtopic
• The total height of the water in one cuboid represents the accumulated relevance gain for that subtopic
• The total volume of water in the task cube is the total gain
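The water-filling picture can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `TaskCube`, the method names, and the choice to clip each pour at the remaining room under the cap are all assumptions made for this sketch (the slides only say that a capped segment accepts no more water).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskCube:
    """Hypothetical task cube: one cuboid per subtopic, all sharing one cap."""
    areas: List[float]          # base area (importance) of each subtopic
    max_height: float = 1.0     # the cap at the top of the cube
    heights: List[float] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # The cube has unit volume, so the base areas must sum to 1.
        if abs(sum(self.areas) - 1.0) > 1e-9:
            raise ValueError("subtopic areas must sum to 1")
        self.heights = [0.0] * len(self.areas)

    def pour(self, doc_relevance: Dict[int, float]) -> float:
        """Add one document's water and return the volume it contributed.

        doc_relevance maps a subtopic index to the document's relevance
        grade for that subtopic (the height increment).
        """
        added = 0.0
        for i, rel in doc_relevance.items():
            room = self.max_height - self.heights[i]
            # Assumption: water above the cap is simply discarded.
            rise = max(0.0, min(rel, room))
            self.heights[i] += rise
            added += self.areas[i] * rise
        return added

    def volume(self) -> float:
        """Total water volume, i.e. the total gain accumulated so far."""
        return sum(a * h for a, h in zip(self.areas, self.heights))


cube = TaskCube(areas=[0.5, 0.3, 0.2])
cube.pour({0: 0.4, 2: 1.0})   # one document relevant to subtopics 0 and 2
print(cube.heights, cube.volume())
```

A document relevant to several subtopics raises several water columns at once, which is how the model credits a single document for covering multiple aspects of the task.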
The Cube Test
• Based on the water-filling model, we design a new multi-dimensional evaluation metric for professional search: the Cube Test (CT)
• CT measures how fast, and how fully, a search system fills up the task cube
• It is a speed function
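The Gain Function
$$\mathrm{Gain}(Q, d_j) = \sum_i \mathrm{area}_i \times \mathrm{height}_{i,j} \times \mathrm{KeepFilling}_i$$
• Document $d_j$'s gain is the volume of relevant "document water" that it adds across all matching subtopics in the task cube.
• A more concrete equation:
$$\mathrm{Gain}(Q, d_j) = \sum_i \Gamma \, \theta_i \, \mathrm{rel}(d_j, c_i) \, I\!\left(\sum_{k=1}^{j-1} \mathrm{rel}(d_k, c_i) < \mathrm{MaxHeight}\right)$$
where
- $\Gamma$ is a discounting factor for subtopic novelty, $\Gamma = \gamma^{\mathrm{nrel}(c_i, j-1)}$, where $\mathrm{nrel}(c_i, j-1)$ is the number of relevant documents for subtopic $c_i$ among the previously examined documents ($d_1$ to $d_{j-1}$),
- $\theta_i$ is the importance of the $i$-th subtopic, with $\sum_i \theta_i = 1$,
- $\mathrm{rel}(d_j, c_i)$ is the water height, i.e., document $d_j$'s relevance grade with respect to subtopic $c_i$,
- $I$ is the indicator function,
- MaxHeight is the cap for subtopic relevance (set to 1).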
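To make the roles of Γ, θ, rel, and the indicator concrete, here is a minimal sketch of the gain of one document given the documents examined before it. Several details are assumptions of this sketch rather than statements from the slides: a document counts as relevant to a subtopic when its grade is above zero, the indicator tests whether the undiscounted relevance accumulated by earlier documents is still below MaxHeight, and the default `gamma=0.5` is arbitrary. The function name `gain` and its parameters are hypothetical.

```python
from typing import Dict, List

Doc = Dict[int, float]  # subtopic index -> relevance grade rel(d, c_i)


def gain(doc: Doc, theta: List[float], previous: List[Doc],
         gamma: float = 0.5, max_height: float = 1.0) -> float:
    """Gain of `doc` given the documents examined before it."""
    total = 0.0
    for i, rel in doc.items():
        if rel <= 0:
            continue
        # nrel(c_i, j-1): earlier documents relevant to subtopic i.
        n_rel = sum(1 for d in previous if d.get(i, 0.0) > 0)
        # Water height already accumulated in cuboid i (assumed undiscounted).
        filled = sum(d.get(i, 0.0) for d in previous)
        # Indicator: keep filling only while the cuboid is below its cap.
        keep_filling = 1.0 if filled < max_height else 0.0
        total += (gamma ** n_rel) * theta[i] * rel * keep_filling
    return total


theta = [0.6, 0.4]
d1, d2, d3 = {0: 0.5}, {0: 0.5, 1: 1.0}, {0: 0.5, 1: 0.2}
print(gain(d1, theta, []))          # first document for subtopic 0
print(gain(d2, theta, [d1]))        # subtopic 0 is discounted by gamma
print(gain(d3, theta, [d1, d2]))    # both subtopics have reached the cap
```

In this toy example the third document earns nothing: both cuboids were filled to MaxHeight by the first two documents, which is the stopping criterion at work.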
The Total Gain Function
• The total gain for a list of examined documents is the sum of the gains of the individual documents
• Note that it does not assume any traversal order
• It does not even assume ranked retrieval
• This allows us to support ranked retrieval, unranked retrieval, or a mix of the two
The Cube Test - Recap
• It is a speed function
• The time function is the amount of time taken from the beginning up to the $t$-th document. It can be
  - the actual reading time
  - a formulation similar to TBG [Smucker & Clarke, SIGIR'12] that takes document length into account:
$$\sum_{j=1}^{t} \left(4.4 + r_i \times (0.018\, l_j + 7.8)\right)$$
  - or simply the number of documents examined so far
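Reading "speed function" as total gain divided by elapsed time, CT can be sketched as below. This reading, the function names, and the treatment of `r` as one factor per document are assumptions of the sketch: the slide writes the factor as r with subscript i and does not define it. The constants 4.4, 0.018, and 7.8 are taken from the time formula on the slide.

```python
from typing import List, Sequence


def tbg_like_time(lengths: Sequence[float], r: Sequence[float]) -> float:
    """Time to examine documents 1..t, following the TBG-like formula.

    lengths[j] is the length l_j of document j; r[j] is the factor that
    scales its reading time (assumed here to be given per document).
    """
    return sum(4.4 + r_j * (0.018 * l_j + 7.8)
               for l_j, r_j in zip(lengths, r))


def cube_test(gains: List[float], elapsed: float) -> float:
    """CT as a speed: total gain accumulated per unit of time."""
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return sum(gains) / elapsed


gains = [0.3, 0.55, 0.0]                 # per-document gains, in examined order
print(cube_test(gains, elapsed=len(gains)))                    # time = number of documents
print(cube_test(gains, tbg_like_time([1000, 400, 2500], [1.0, 1.0, 0.0])))
```

Swapping the time function changes only the denominator, so the same per-document gains can be scored against document counts, modeled reading time, or measured reading time.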
EXPERIMENTS
Datasets
USPTO
• Three million US patent applications and publications from 2001 to 2013, in XML with images removed.
• We created 33 runs for 49 prior art finding tasks.
• Office actions written by US patent examiners are parsed, and the ground truth is extracted from them automatically (PublicPair).
CLEF-IP 2012
• XML patent documents from the European Patent Office (EPO) prior to 2002 and 400,000+ documents published by the World Intellectual Property Organization (WIPO).
• We evaluate the 31 official runs from 5 teams that participated in CLEF-IP 2012.
Discriminative Power
• We compare the new metric with several well-known metrics:
  - Recall
  - I-rec [Sakai et al., EVIA'10]
  - nDCG
  - α-nDCG [Clarke et al., SIGIR'08]
  - PRES [Magdy and Jones, SIGIR'10]
  - MAP
  - TBG [Smucker & Clarke, SIGIR'12]
  - nERR-IA [Sakai & Song, SIGIR'11]
• We evaluate the evaluation metrics by their discriminative power [Sakai, SIGIR'06]
• We test a few variations of CT
• On the CLEF-IP dataset, all CT metrics show high discriminative power.
• On the USPTO dataset, Recall and I-rec show the best discriminative power; the CT metrics show good discriminative power.
Tradeoff between coverage and single relevance
• CT is able to adjust its bias between recall-oriented tasks and precision-oriented tasks
• We create two artificial runs
  - Coverage run: arranges the relevant documents of each subtopic in a round-robin fashion.
  - Single relevance run: puts all relevant documents for one subtopic first, ordered by rel(d, ci), then those for the next subtopic.
[Figure: CT vs. γ for the coverage run]
[Figure: CT vs. γ for the single relevance run]
The novelty discount base γ ranges over [0.1, 0.9]. When γ is small, CT applies a large novelty discount, is biased towards coverage, and gives more reward to runs that spread relevant documents across different subtopics. When γ is large, CT is biased towards precision and gives more reward to runs that return highly relevant documents early.
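The two artificial runs can be imitated on toy data to see how γ shifts the balance. This sketch does not reproduce the experiments or the plotted curves: the relevance grades, the equal subtopic weights, the cutoff of three documents, and the use of document count as the time function are all assumptions, and the helper names are hypothetical. The gain follows the same reading of the gain function as above (indicator on accumulated, undiscounted relevance).

```python
from typing import Dict, List

Doc = Dict[int, float]  # subtopic index -> relevance grade


def cube_test(run: List[Doc], theta: List[float], gamma: float,
              max_height: float = 1.0) -> float:
    """CT with time measured as the number of documents examined."""
    if not run:
        raise ValueError("run must contain at least one document")
    n_rel = [0] * len(theta)      # relevant documents seen per subtopic
    filled = [0.0] * len(theta)   # accumulated relevance per subtopic
    total = 0.0
    for doc in run:
        for i, rel in doc.items():
            if rel <= 0:
                continue
            if filled[i] < max_height:
                total += (gamma ** n_rel[i]) * theta[i] * rel
            n_rel[i] += 1
            filled[i] += rel
    return total / len(run)


def coverage_run(by_subtopic: List[List[Doc]]) -> List[Doc]:
    """Round-robin over subtopics, one relevant document at a time."""
    depth = max(len(docs) for docs in by_subtopic)
    return [docs[k] for k in range(depth)
            for docs in by_subtopic if k < len(docs)]


def single_relevance_run(by_subtopic: List[List[Doc]]) -> List[Doc]:
    """All documents of one subtopic, best first, then the next subtopic."""
    run: List[Doc] = []
    for i, docs in enumerate(by_subtopic):
        run.extend(sorted(docs, key=lambda d: d.get(i, 0.0), reverse=True))
    return run


# Toy data: 3 equally weighted subtopics, 3 relevant documents each.
by_subtopic = [[{i: g} for g in (0.5, 0.3, 0.2)] for i in range(3)]
theta = [1 / 3] * 3
cutoff = 3
for gamma in (0.1, 0.5, 0.9):
    cov = cube_test(coverage_run(by_subtopic)[:cutoff], theta, gamma)
    single = cube_test(single_relevance_run(by_subtopic)[:cutoff], theta, gamma)
    print(f"gamma={gamma}: coverage={cov:.4f}, single relevance={single:.4f}")
```

On this toy data the coverage run's score at the cutoff does not depend on γ, because every document is the first one for its subtopic, while the single relevance run's score rises as γ grows and the novelty discount weakens.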
Conclusions
• This paper presents a novel evaluation metric (the Cube Test), based on a novel utility model (the water filling model)
• It addresses several important dimensions of professional search, and of complicated search in general
  - Covers different aspects or subtopics
  - Subtopics need not be equally important
  - Allows a single document to cover several subtopics
  - Is time-sensitive
  - Handles the stopping criterion: adding more relevant documents to a subtopic that is already fulfilled does not improve the overall gain
  - Expresses the tradeoff between time, quality of documents, and diverse coverage of subtopics
Acknowledgments: Portions of this work were conducted to explore new concepts under the umbrella of a larger project at the US Patent and Trademark Office.
THANK YOU
CT Variations