The Water Filling Model and The Cube Test: Multi-Dimensional Evaluation for Professional Search...

THE WATER FILLING MODEL AND THE CUBE TEST: Multi-Dimensional Evaluation for Professional Search

Jiyun Luo1 Christopher Wing1 Grace Hui Yang1 Marti A. Hearst2

1Department of Computer Science Georgetown University Washington, DC, USA {jl1749, cpw26}@georgetown.edu huiyang@cs.georgetown.edu CIKM 2013

2School of Information University of California, Berkeley Berkeley, CA, USA hearst@berkeley.edu

INTRODUCTION

• Complicated search has recently received much attention

• Professional search activities are usually complicated search tasks
  - Examples: medical record search, legal search, patent prior art search

• Evaluation metrics need to reflect this complexity
  - U-measure for whole-session evaluation [Sakai et al., sigir’13]
  - Time-biased gain (TBG) [Smucker and Clarke, sigir’12]
  - α-nDCG for diversity and novelty [Clarke et al., sigir’08]
  - PRES for recall-oriented search tasks [Magdy and Jones, sigir’10]

PROFESSIONAL SEARCH

• Rich information needs
  - Multiple aspects or subtopics

• Time-sensitive
  - Contrary to the stereotype, professional searchers such as lawyers do not want to read irrelevant documents just because they are paid by the hour, and they do not care only about recall

• Novelty
  - Once one relevant document has been examined, subsequent relevant documents are perceived as less relevant

• Stopping criteria
  - Once a sub-information-need has been fulfilled, further relevant documents about it contribute little

• A mix of unranked and ranked retrieval
  - Boolean search and proximity search are still popular

Fenestration Segment Stent-Graft and Fenestration Method US 20090259290 A1

Patent Prior Art Search

ABSTRACT: A method includes deploying a fenestration segment stent-graft into a main vessel such that a fenestration section …

Claims (each labeled Independent or Dependent on the slide)

1. A fenestration segment stent-graft comprising: a proximal section comprising a woven graft cloth; …
2. The fenestration segment stent-graft of claim 1 wherein said proximal section comprises a proximal end and a distal end, …
3. The fenestration segment stent-graft of claim 2 wherein said attachment means comprises stitching. …
20. A fenestration segment stent-graft comprising: a proximal section; a distal section; …
21. The fenestration segment stent-graft of claim 20 wherein said fenestration section comprises: graft material comprising loose woven fibers…

Looking for published literature that can be used to ‘say no’ to a patent application. A granted patent should be novel and non-trivial.
• Time constraint: less than 6 hours

We draw an analogy between professional search and filling a cube with water

Professional Search
• Information need with multiple subtopics
• Goal: fulfill the information need with relevant documents as soon as possible
• A document can cover different subtopics
• Stop looking for more relevant documents for a subtopic or for the entire information need

The Water-Filling Model
• A cube with multiple segments
• Goal: fill up the cube with water as soon as possible
• “Document water” can flow into different segments
• Once a segment reaches its cap, no more water can go there

How do we judge whether a search system is good?
• We assume the searcher wants the multiple subtopics of a task to be fulfilled as quickly and as fully as possible

The Task Cube

• The Cube, with unit side length, represents the entire information need

• Each cuboid in the Cube represents a subtopic

• The top of the Cube is the cap that limits the maximum amount of relevant information needed
  - This acts as the stopping criterion

• The bottom is segmented into different areas
  - The area size indicates the importance of each subtopic
  - E.g., in prior art search, independent claims are assigned more weight than dependent claims

An empty task cube for a search task with 6 subtopics

The Water Filling Model

• Each newly examined relevant document raises the water level in every subtopic it is relevant to

• The height increment is the relevance gain from that document with regard to that subtopic

• The total height of the water in one cuboid represents the accumulated relevance gain for a subtopic

• The total volume of water in the task Cube is the total gain

The Cube Test

• Based on the water-filling model, we design a new multi-dimensional evaluation metric for professional search: the Cube Test (CT)

• CT measures how fast a search system can fill up the task cube, and how much of it gets filled

• It is a speed function

The Gain Function

$$\mathrm{Gain}(Q, d_j) = \sum_i \mathrm{area}_i \times \mathrm{height}_{i,j} \times \mathrm{KeepFilling}_i$$

• Document d_j's gain is the volume of relevant “document water” that it adds across all the subtopics in the task cube that it matches.

• A more concrete equation:

$$\mathrm{Gain}(Q, d_j) = \sum_i \Gamma \, \theta_i \, \mathrm{rel}(d_j, c_i) \, I\!\left(\sum_{k=1}^{j-1} \mathrm{rel}(d_k, c_i) < \mathrm{MaxHeight}\right)$$

where
- Γ is a discounting factor for subtopic novelty, Γ = γ^{nrel(c_i, j-1)}, where nrel(c_i, j-1) is the number of relevant documents for subtopic c_i among the previously examined documents (d_1 to d_{j-1}),
- θ_i is the importance of the i-th subtopic, with Σ_i θ_i = 1,
- rel(d_j, c_i) is the water height, i.e., document d_j's relevance grade towards subtopic c_i,
- I is the indicator function,
- MaxHeight is the cap for subtopic relevance (set to 1).
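The symbol definitions above can be turned into a small sketch of the per-document gain. This is an illustration under stated assumptions, not the authors' implementation: the function name `gain`, the dictionary-based document representation, and the default γ = 0.5 are hypothetical, and the cap check is assumed to compare the relevance accumulated from earlier documents against MaxHeight.

```python
from typing import Dict, List

MAX_HEIGHT = 1.0  # cap for subtopic relevance (set to 1 on the slide)


def gain(doc_index: int,
         docs: List[Dict[str, float]],
         theta: Dict[str, float],
         gamma: float = 0.5,
         max_height: float = MAX_HEIGHT) -> float:
    """Gain of docs[doc_index], given the documents examined before it.

    docs  -- hypothetical representation: each document is a dict mapping
             a subtopic id to its relevance grade in [0, 1]
    theta -- subtopic id -> importance weight (weights sum to 1)
    gamma -- novelty discount base (0.5 is an arbitrary default)
    """
    previous = docs[:doc_index]
    total = 0.0
    for subtopic, rel in docs[doc_index].items():
        # nrel: number of earlier documents relevant to this subtopic
        nrel = sum(1 for d in previous if d.get(subtopic, 0.0) > 0)
        # Assumed cap check: water already in this cuboid before the document
        height_so_far = sum(d.get(subtopic, 0.0) for d in previous)
        keep_filling = 1.0 if height_so_far < max_height else 0.0
        total += (gamma ** nrel) * theta[subtopic] * rel * keep_filling
    return total


docs = [{"c1": 0.5}, {"c1": 0.5, "c2": 1.0}, {"c1": 1.0}]
theta = {"c1": 0.5, "c2": 0.5}
for j in range(len(docs)):
    print(j + 1, gain(j, docs, theta))
```

In this toy list the third document earns nothing, because subtopic c1 has already reached its cap; that is the stopping criterion at work.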

The Total Gain Function

• The total gain for a list D of examined documents is the sum of the per-document gains:

$$\mathrm{Gain}(Q, D) = \sum_{j=1}^{|D|} \mathrm{Gain}(Q, d_j)$$

• Note that it does not assume any traversal order

• It does not even assume ranked retrieval

• This allows us to support ranked retrieval, unranked retrieval, or a mix of them

The Cube Test - Recap

• It is a speed function
• The time function is the amount of time taken from the beginning up to the t-th document. It can be
  - actual reading time
  - a formulation similar to TBG [Smucker & Clarke, sigir’12], taking document length into account:
    $$\sum_{j=1}^{t} \left( 4.4 + r_j \times (0.018\, l_j + 7.8) \right)$$
  - or simply the number of documents examined so far
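As a rough sketch of how the time options plug into the metric, the snippet below computes the TBG-style time and divides a total gain by it. Several things here are assumptions rather than part of the slides: CT is read as total gain divided by elapsed time (our interpretation of "speed function"), r_j is treated as a per-document weight in [0, 1] because the slides do not define it, and the function names are hypothetical.

```python
from typing import Sequence


def tbg_style_time(lengths: Sequence[float],
                   weights: Sequence[float],
                   t: int) -> float:
    """Time spent from the start through the t-th document (1-based).

    lengths -- document lengths l_j
    weights -- the r_j terms; assumed to lie in [0, 1] (undefined on the slide)
    """
    return sum(4.4 + r * (0.018 * l + 7.8)
               for l, r in zip(lengths[:t], weights[:t]))


def cube_test(total_gain: float, elapsed_time: float) -> float:
    """Assumed form of CT: how much of the cube is filled per unit of time."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return total_gain / elapsed_time


lengths = [1000, 500]
weights = [1.0, 0.0]
print(tbg_style_time(lengths, weights, t=2))   # TBG-style time
print(cube_test(0.875, elapsed_time=2))        # time = number of documents
```

Swapping the time function changes only the denominator, so the same gain computation can be reused with reading time, a TBG-style estimate, or a plain document count.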

EXPERIMENTS Datasets

USPTO
• It consists of three million US patent applications and publications from 2001 to 2013, in XML with images removed.
• We created 33 runs for 49 prior art finding tasks.
• Office actions written by US patent examiners are parsed, and the ground truth is extracted automatically from them (Public PAIR).

CLEF-IP 2012
• XML patent documents from the European Patent Office (EPO) prior to 2002 and 400,000+ documents published by the World Intellectual Property Organization (WIPO).
• We evaluate the 31 official runs from 5 teams that participated in CLEF-IP 2012.

Discriminative Power

• We compare the new metric with a few well-known metrics:
  - Recall
  - I-rec [Sakai et al., EVIA’10]
  - nDCG
  - α-nDCG [Clarke et al., sigir’08]
  - PRES [Magdy and Jones, sigir’10]
  - MAP
  - TBG [Smucker & Clarke, sigir’12]
  - nERR-IA [Sakai & Song, sigir’11]

• We evaluate the evaluation metrics by their discriminative power [Sakai, sigir’06]

• We test a few variations of CT

• In the CLEF-IP dataset, all CT metrics show high discriminative power.

• For the USPTO dataset, Recall and I-rec show the best discriminative power. CT metrics show good discriminative power.

Tradeoff between coverage and single relevance

• CT is able to adjust its bias between recall-oriented tasks and precision-oriented tasks

• We create two artificial runs
  - Coverage run: arranges the relevant documents for each subtopic in a round-robin fashion
  - Single relevance run: puts all relevant documents for one subtopic first, ordered by rel(d, c_i), then those for the next subtopic

CT vs. γ for the coverage run

CT vs. γ for the single relevance run

The novelty discount base γ ranges over [0.1, 0.9]. When γ is small, CT applies a large novelty discount, is biased towards coverage, and rewards runs that spread relevant documents across different subtopics. When γ is large, CT is biased towards precision and rewards runs that return highly relevant documents early.
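The effect of γ can be illustrated on a toy pair of runs. The sketch below is not a reproduction of the plots: the two-subtopic task, the relevance grades, the cutoff of three documents, and all function names are hypothetical. It also assumes each document covers a single subtopic, that time is the number of documents examined, and that CT is total gain divided by that time.

```python
from typing import Dict, List, Tuple

Run = List[Tuple[str, float]]  # (subtopic id, relevance grade) per document


def cube_test_at(run: Run, theta: Dict[str, float], gamma: float,
                 cutoff: int, max_height: float = 1.0) -> float:
    """CT after `cutoff` documents, with time = documents examined."""
    height = {c: 0.0 for c in theta}  # accumulated water per subtopic
    nrel = {c: 0 for c in theta}      # relevant documents seen per subtopic
    total_gain = 0.0
    for subtopic, rel in run[:cutoff]:
        if height[subtopic] < max_height:  # cuboid not yet full
            total_gain += (gamma ** nrel[subtopic]) * theta[subtopic] * rel
        height[subtopic] += rel
        if rel > 0:
            nrel[subtopic] += 1
    return total_gain / cutoff


theta = {"A": 0.5, "B": 0.5}
grades = [0.5, 0.25, 0.125]  # toy relevance grades, highest first

# Coverage run: round-robin over subtopics.
coverage_run: Run = [(c, g) for g in grades for c in ("A", "B")]
# Single relevance run: all of subtopic A by grade, then all of subtopic B.
single_relevance_run: Run = [(c, g) for c in ("A", "B") for g in grades]


def coverage_lead(gamma: float, cutoff: int = 3) -> float:
    """Ratio of the coverage run's CT to the single relevance run's CT."""
    return (cube_test_at(coverage_run, theta, gamma, cutoff)
            / cube_test_at(single_relevance_run, theta, gamma, cutoff))


for gamma in (0.1, 0.5, 0.9):
    print(gamma, round(coverage_lead(gamma), 3))
```

In this toy setting the coverage run scores higher at every γ, but its lead shrinks as γ grows, which is in line with the direction of the bias described above.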

Conclusions

• This paper presents a novel evaluation metric (the Cube Test), based on a novel utility model (the water filling model)

• It addresses several important dimensions in professional search, and in complicated search in general
  - Covers different aspects or subtopics
  - Subtopics need not be equally important
  - Allows a single document to cover several subtopics
  - Is time-sensitive
  - Handles the stopping criterion: adding more relevant documents to an already fulfilled subtopic does not improve the overall gain

• Expresses the tradeoff between time, quality of documents, and diverse coverage of subtopics

Acknowledgments: Portions of this work were conducted to explore new concepts under the umbrella of a larger project at the US Patent and Trademark Office.

THANK YOU


CT Variations
