Leveraging Conceptual Lexicon Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima Gerani, Jimmy Xiangji Huang and Fabio Crestani Source : SIGIR’13 Advisor : Jia-ling Koh Speaker : Yi-hsuan Yeh


Leveraging Conceptual Lexicon: Query Disambiguation using Proximity Information for Patent Retrieval

Date : 2013/10/30

Author : Parvaz Mahdabi, Shima Gerani, Jimmy Xiangji Huang and Fabio Crestani

Source : SIGIR’13

Advisor : Jia-ling Koh

Speaker : Yi-hsuan Yeh


Outline

1. Introduction
2. Method
3. Experiments
4. Conclusion


Introduction

Patent prior art search is a task in patent retrieval where the goal is to rank documents that describe prior art related to a patent application.

Challenges:

1. Find a focused information need and remove the ambiguous and noisy terms.
2. Query disambiguation (e.g., "bus", which can denote a vehicle or a hardware data path).


Introduction

Previous work has not fully studied the effect of using proximity information and exploiting domain-specific resources for query disambiguation.

1. Terms closer to query terms are more likely to be related to the query topic.

2. Using a domain-dependent resource leads to the extraction of more relevant expansion concepts.

Proposal: a proximity-based framework for query expansion that uses a conceptual lexicon for patent retrieval.


Framework

Diagram components (in slide order): query patent document, query, query-specific lexicon, proximity-based method, query expansion terms, re-rank result list.


Outline

1. Introduction
2. Method: Query document reduction; Building conceptual lexicon; Proximity-based framework; Document relevance score; Expansion concept selection strategies
3. Experiments
4. Conclusion


Query document reduction

A query patent consists of a title, abstract, description, and claims.

Example:

1. A chair having only two legs.
2. The chair of claim 1, further comprising at least one leg made of wood.

Claim 1 is independent because it does not reference any other claim.

Use the items in the first independent claim as the initial query.
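The reduction step can be illustrated with a minimal sketch. It assumes that a claim is dependent whenever its text mentions another claim by number; the regular expression and the function names are hypothetical and are not taken from the paper.

```python
import re

# Assumption: a dependent claim refers to another claim by number,
# e.g. "of claim 1" or "according to claims 2".
CLAIM_REF = re.compile(r"\bclaims?\s+\d+", re.IGNORECASE)


def first_independent_claim(claims):
    """Return the first claim that references no other claim, or None."""
    for text in claims:
        if not CLAIM_REF.search(text):
            return text
    return None


def initial_query(claims):
    """Lower-cased word tokens of the first independent claim."""
    claim = first_independent_claim(claims)
    return re.findall(r"[a-z]+", claim.lower()) if claim else []


claims = [
    "A chair having only two legs.",
    "The chair of claim 1, further comprising at least one leg made of wood.",
]
print(initial_query(claims))
```

Real claim sets use more varied reference phrasing, so a production version would need a more careful dependency test than this single pattern.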


Building conceptual lexicon

Source: IPC (International Patent Classification) definition pages.

Stop-words are removed.

Terms with document frequency > 10 are filtered out.

The IPC class of the query is looked up in the lexicon, and the terms matching this class are taken as candidate expansion terms.
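A minimal sketch of the lexicon construction, under stated assumptions: document frequency is counted over IPC definition pages, the stop-word list is a tiny illustrative one, and the definition texts are toy strings rather than real IPC definitions. All function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to", "with"}


def tokenize(text):
    """Lower-case word tokens with stop-words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_lexicon(ipc_definitions, max_df=10):
    """Map each IPC class to its definition terms whose document frequency
    (number of definition pages containing the term) is at most max_df."""
    tokens = {ipc: set(tokenize(text)) for ipc, text in ipc_definitions.items()}
    df = Counter(term for terms in tokens.values() for term in terms)
    return {
        ipc: sorted(t for t in terms if df[t] <= max_df)
        for ipc, terms in tokens.items()
    }


def candidate_expansion_terms(lexicon, query_ipc_class):
    """Terms listed under the query's IPC class (empty if the class is unknown)."""
    return lexicon.get(query_ipc_class, [])


# Toy definition texts; max_df=1 is used only so the filter has a visible effect.
toy_definitions = {
    "A47C": "Chairs, sofas and beds",
    "B60N": "Seats for vehicles and chairs",
}
lexicon = build_lexicon(toy_definitions, max_df=1)
print(candidate_expansion_terms(lexicon, "A47C"))
```

In the toy data, "chairs" occurs on both pages and is dropped by the frequency filter, which mirrors how the threshold removes terms too common to discriminate between classes.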


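Proximity-based framework

Assumption: an expansion term refers with higher probability to the query terms closer to its position.

Figure: document d shown as a sequence of positions (1, 12, 20, and 32 are marked), with the occurrences of an expansion term and of a query term highlighted. The legend defines the query term at a given position in document d.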


Example (figure): five documents $d_1, \dots, d_5$, a query, and the term "chair", with $p(q \mid t_j) = 2/5$.


Density kernels: Gaussian kernel, Laplace kernel, Rectangle kernel.

Figure: kernel weight $k(j, i)$ as a function of the positions $i$ and $j$.
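The kernel formulas did not survive on this slide. As a hedged sketch, the three kernels can be written in their textbook density forms, each parameterized so that its variance equals σ². That normalization and the function names are assumptions, not the paper's stated definitions.

```python
import math


def gaussian_kernel(i, j, sigma):
    """Normal density in the distance between positions i and j."""
    return math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def laplace_kernel(i, j, sigma):
    """Laplace density; scale b is chosen so that the variance 2*b^2 equals sigma^2."""
    b = sigma / math.sqrt(2)
    return math.exp(-abs(i - j) / b) / (2 * b)


def rectangle_kernel(i, j, sigma):
    """Uniform density on [-a, a]; a is chosen so that the variance a^2/3 equals sigma^2."""
    a = math.sqrt(3) * sigma
    return 1 / (2 * a) if abs(i - j) <= a else 0.0


# Weight of a term at position 12 relative to a query term at position 10.
for kernel in (gaussian_kernel, laplace_kernel, rectangle_kernel):
    print(kernel.__name__, round(kernel(12, 10, sigma=2), 3))
```

All three weights depend only on the distance |i - j| and shrink (or drop to zero) as that distance grows, which is the property the proximity assumption relies on.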


Example: Rectangle kernel

$\sigma$: bandwidth parameter. Assume bandwidth = 2.

Figure: $k(i, j)$ plotted against position, from $i-3$ to $i+3$, with the value 0.144 marked.
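The value 0.144 is consistent with a uniform (rectangle) density whose variance equals $\sigma^2$. Assuming that parameterization, which is not recoverable from the slide itself, the arithmetic works out as follows:

```latex
k(i, j) =
\begin{cases}
\dfrac{1}{2a} & \text{if } |i - j| \le a \\[4pt]
0 & \text{otherwise}
\end{cases}
\qquad a = \sqrt{3}\,\sigma

\sigma = 2 \;\Rightarrow\; a = 2\sqrt{3} \approx 3.46,
\qquad k(i, j) = \frac{1}{2a} = \frac{1}{4\sqrt{3}} \approx 0.144
```

With $a \approx 3.46$, the integer positions that receive non-zero weight are exactly $i-3, \dots, i+3$, which matches the range shown in the figure.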


Document relevance score

1. Avg position strategy
2. Max position strategy

Figure labels: expansion term, $t_1$, $t_2$, $t_3$, Documents.
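The slide names the two strategies, but their formulas did not survive extraction. A minimal sketch of one plausible reading, assuming each position in a document already carries a proximity-based score (the list `scores` is hypothetical) and that the strategies aggregate those scores by mean and by maximum respectively:

```python
def document_score(scores, strategy="avg"):
    """Aggregate per-position scores of one document into a single score.

    Assumption: "avg" averages over all scored positions and "max" keeps
    the best-scoring position; the paper's exact formulas may differ.
    """
    if not scores:
        return 0.0
    if strategy == "avg":
        return sum(scores) / len(scores)
    if strategy == "max":
        return max(scores)
    raise ValueError(f"unknown strategy: {strategy!r}")


scores = [0.2, 0.6, 0.1]  # hypothetical per-position scores
print(document_score(scores, "avg"), document_score(scores, "max"))
```

Under this reading, the max strategy rewards a document for a single strongly matching passage, while the avg strategy rewards consistent matching across the document.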


Expansion concept selection strategies

1. Explicit expansion concepts (EEC): restrict the expansion terms to those that appear in the query document.

2. Implicit expansion concepts (IEC): use all expansion terms.


3. Combine search strategies (CSS): linearly combine the query result list and the IPC expansion concepts result list.

4. Proximity-based pseudo relevance feedback (PPRF): extract expansion concepts from the feedback documents.
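The difference between the first two strategies, EEC and IEC, can be sketched as a filter over the candidate expansion terms. The function names and the sample terms are hypothetical.

```python
def explicit_expansion_concepts(candidates, query_document_terms):
    """EEC: keep only candidate terms that occur in the query document."""
    present = set(query_document_terms)
    return [t for t in candidates if t in present]


def implicit_expansion_concepts(candidates):
    """IEC: keep every candidate term, whether or not the query document has it."""
    return list(candidates)


candidates = ["seat", "backrest", "armrest"]        # hypothetical lexicon terms
query_doc = ["a", "chair", "with", "a", "backrest"]  # hypothetical query document
print(explicit_expansion_concepts(candidates, query_doc))
print(implicit_expansion_concepts(candidates))
```

EEC is the more conservative choice, since it never introduces vocabulary absent from the query patent, while IEC can add related concepts that the applicant did not mention.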


Experiments

Dataset: CLEF-IP 2010, CLEF-IP 2011

Evaluation: top 1000 results; MAP, Recall, and PRES (Patent Retrieval Evaluation Score)

Baseline: language modeling with Dirichlet smoothing + language model re-rank
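For reference, the standard Dirichlet-smoothed query likelihood estimates p(w|d) = (c(w, d) + μ·p(w|C)) / (|d| + μ). The sketch below implements that textbook formula. The value μ = 1000 and the toy documents are assumptions, since the slide does not give the baseline's settings.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc, collection_tf, collection_len, mu=1000.0):
    """Log query likelihood of `doc` under Dirichlet-smoothed unigram estimates."""
    tf = Counter(doc)
    score = 0.0
    for w in query:
        p_collection = collection_tf.get(w, 0) / collection_len
        p = (tf[w] + mu * p_collection) / (len(doc) + mu)
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


d1 = ["chair", "two", "legs"]     # toy documents
d2 = ["wooden", "table", "top"]
collection_tf = Counter(d1 + d2)
collection_len = len(d1) + len(d2)

query = ["chair"]
print(dirichlet_log_likelihood(query, d1, collection_tf, collection_len))
print(dirichlet_log_likelihood(query, d2, collection_tf, collection_len))
```

Smoothing gives d2 a non-zero probability for "chair" even though the term is absent from it, so both documents can be ranked rather than one being excluded outright.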


Motivation for Using Proximity Information

Setting: CLEF-IP 2010; 100 random queries; top 100 documents.


Effect of Density Kernel


Comparison of Max and Avg Strategy

Setting: CLEF-IP 2010; Gaussian kernel; IEC.


Number of Expansion Terms


Effect of Combination

λ = 0: only the query expansion model is used. λ = 1: only the initial query is used.
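The two endpoints of λ suggest a linear interpolation of the form λ · (initial query score) + (1 − λ) · (expansion score). A minimal sketch under that assumption, with hypothetical per-document scores that are taken to be on a comparable scale:

```python
def combine_scores(query_scores, expansion_scores, lam):
    """Rank documents by lam * query score + (1 - lam) * expansion score.

    Documents missing from one list get a score of 0.0 for that list
    (an assumption; other conventions are possible).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    docs = set(query_scores) | set(expansion_scores)
    combined = {
        d: lam * query_scores.get(d, 0.0) + (1 - lam) * expansion_scores.get(d, 0.0)
        for d in docs
    }
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


query_scores = {"d1": 0.9, "d2": 0.4}                 # hypothetical scores
expansion_scores = {"d1": 0.2, "d2": 0.7, "d3": 0.5}  # hypothetical scores
print(combine_scores(query_scores, expansion_scores, lam=1.0))
print(combine_scores(query_scores, expansion_scores, lam=0.0))
```

At λ = 1 the ranking follows the initial query alone and at λ = 0 it follows the expansion model alone, matching the two endpoints stated on the slide.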


Effect of Query Reformulation

Setting: Gaussian kernel; Max strategy; 40 expansion terms.


Conclusion

Constructed a domain-dependent conceptual lexicon that can be used as an external resource for query expansion.

The proximity-based retrieval framework provides a principled way to calculate the importance weight of expansion terms selected from the conceptual lexicon.

We showed that proximity of expansion terms to query terms is a good indicator of the importance of the expansion terms.