Query Caching in Agent-based Distributed Information Retrieval

Hemali Majithia, CADIP, UMBC
October 24, 2002




Problem Definition

  • DIR (IR) systems access their collections to perform searches and answer queries
  • Query resolution on large corpora is expensive in terms of time and resources
  • Similar queries produce similar results
  • Repetitive and redundant searching of the collections
  • Resource wastage and inefficiency

Solution: “CACHING QUERIES”


Solution

Caching Mechanism
  • Cache new queries along with their results
  • Answer future similar queries using the cached queries

New Query
  • A query that has not been answered before

Similar Query
  • A query that is identical or similar to a query already in the cache

Emphasis
  • If similar queries exist in the cache, results can be retrieved from those previously searched queries; an exact match is not required
  • Retrieval time is linear in collection size


Caching Mechanism

Two-level Caching Mechanism
  • First level: exact match
  • Second level: inverted index of the queries

Caching Algorithm
  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • Lowest Relative Value (LRV)

Similarity Metric
  • Cosine similarity
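
The two-level lookup above could be sketched roughly as follows. This is a minimal illustration, not the CARROT II implementation: the class name `TwoLevelQueryCache`, the whitespace tokenizer, the raw term-frequency vectors, the 0.8 similarity threshold, and the choice of LRU (one of the three listed policies) are all assumptions made for the example.

```python
import math
from collections import Counter, OrderedDict, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLevelQueryCache:
    """Hypothetical query cache: exact match first, then similar queries."""

    def __init__(self, capacity: int = 100, threshold: float = 0.8):
        self.capacity = capacity        # assumed maximum number of cached queries
        self.threshold = threshold      # assumed minimum cosine similarity for a hit
        self.entries = OrderedDict()    # normalized query -> results, in LRU order
        self.index = defaultdict(set)   # term -> cached queries containing that term

    @staticmethod
    def _normalize(query: str) -> str:
        return " ".join(query.lower().split())

    @staticmethod
    def _terms(query: str) -> Counter:
        return Counter(query.split())

    def get(self, query: str):
        key = self._normalize(query)
        # First level: exact match on the normalized query text.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        # Second level: the inverted index yields cached queries sharing a term.
        q = self._terms(key)
        candidates = set().union(*(self.index.get(t, set()) for t in q))
        best, best_sim = None, 0.0
        for cand in candidates:
            sim = cosine(q, self._terms(cand))
            if sim > best_sim:
                best, best_sim = cand, sim
        if best is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best)
            return self.entries[best]
        return None  # miss: the caller has to search the collection

    def put(self, query: str, results) -> None:
        key = self._normalize(query)
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = results
        for t in self._terms(key):
            self.index[t].add(key)
        # LRU eviction: drop the least recently used query and its index entries.
        if len(self.entries) > self.capacity:
            old, _ = self.entries.popitem(last=False)
            for t in self._terms(old):
                self.index[t].discard(old)
                if not self.index[t]:
                    del self.index[t]
```

The inverted index keeps the second level from comparing a new query against every cached query: only cached queries that share at least one term are scored. Swapping LRU for LFU or LRV would change only the eviction step in `put`.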


Caching in CARROT–II

[Figure: architecture diagram showing Node I and Node II, a QueryAgent, six C2Agents, six primary caches, and two secondary caches. The numbered steps in the diagram are:]

  1. User query
  2. Lookup
  3. MISS
  4. Query forwarded
  5. Miss
  6. Query forwarded to best C2
  7. Lookup
  8. HIT
  9. Update cache
  10. Results returned
  11. Response


Metrics for Evaluation of Caching Mechanism

Efficiency
  • Round Trip Time (RTT) = total time to answer queries fired at the system
  • Hit Rate = for each agent cache and total hit rate
  • Cost of caching = the overhead caused by caching (assuming that the HIT rate is 0)

Effectiveness
  • Precision = fraction of retrieved documents that are relevant
  • Recall = fraction of relevant documents that are retrieved
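
As an illustration, the metrics above could be computed as in the sketch below. The function names and the sample numbers are hypothetical. Treating the cost of caching as the difference between RTT with an always-missing cache and RTT with no cache is one possible reading of the definition, not something the slides state.

```python
def hit_rate(hits: int, lookups: int) -> float:
    """Fraction of cache lookups answered from the cache."""
    return hits / lookups if lookups else 0.0


def precision(retrieved, relevant) -> float:
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant) -> float:
    """Fraction of relevant documents that are retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def caching_overhead(rtt_all_misses: float, rtt_no_cache: float) -> float:
    """Assumed reading of 'cost of caching': extra time when every lookup misses."""
    return rtt_all_misses - rtt_no_cache


# Made-up example values, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]
print(precision(retrieved, relevant))   # 2 of 4 retrieved documents are relevant
print(recall(retrieved, relevant))      # 2 of 3 relevant documents were retrieved
print(hit_rate(hits=3, lookups=10))
```

Per-agent and total hit rates use the same `hit_rate` function: once with each agent's own counters, and once with the counters summed over all agents.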