October 24, 2002 Hemali Majithia - CADIP, UMBC 1
Query Caching in Agent-based Distributed Information Retrieval
Hemali Majithia
Problem Definition
- DIR (IR) systems access their collections to perform searches and answer queries.
- Query resolution on large corpora is expensive in terms of time and resources.
- Similar queries produce similar results.
- Repetitive and redundant searching of the collections leads to resource wastage and inefficiency.
Solution: “CACHING QUERIES”
Solution
- Caching Mechanism
  - Cache new queries along with their results.
  - Answer future similar queries using the cached queries.
- New Query
  - A query that has not been answered before.
- Similar Query
  - A query that is identical or similar to a query already in the cache.
- Emphasis
  - When similar queries exist in the cache, their results can be reused; a lookup does not depend on an exact match.
  - Retrieval time grows linearly with collection size.
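The distinction between a new query and a similar query can be made concrete with a small sketch. It assumes queries are compared as bag-of-words term-frequency vectors using cosine similarity (the metric the deck names for its caching mechanism). The function names and the 0.7 threshold are illustrative assumptions, not values from the presentation.

```python
import math
from collections import Counter


def cosine_similarity(query_a: str, query_b: str) -> float:
    """Cosine similarity between two queries as term-frequency vectors."""
    vec_a = Counter(query_a.lower().split())
    vec_b = Counter(query_b.lower().split())
    # Only terms present in both queries contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_query(query: str, cached_queries: list, threshold: float = 0.7) -> str:
    """Return 'similar' if some cached query reaches the threshold, else 'new'.

    The threshold is a hypothetical tuning parameter.
    """
    best = max((cosine_similarity(query, c) for c in cached_queries), default=0.0)
    return "similar" if best >= threshold else "new"
```

With a cache holding only "information retrieval", the query "distributed information retrieval" scores about 0.82 and would count as similar under this threshold, while "query caching" shares no terms and would count as new.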
Caching Mechanism
- Two-level caching mechanism
  - First level: exact match
  - Second level: inverted index of the queries
- Caching algorithm
  - Least Recently Used (LRU)
  - Least Frequently Used (LFU)
  - Lowest Relative Value (LRV)
- Similarity metric
  - Cosine similarity
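One way the two levels could fit together is sketched below. It is not the CARROT-II implementation. The class name, capacity, and similarity threshold are assumptions, and LRU is picked from the three listed policies only because it is the shortest to show. Level one is an exact-match lookup. Level two uses an inverted index over cached query terms to find candidates, then ranks them by cosine similarity.

```python
import math
from collections import Counter, OrderedDict, defaultdict


class TwoLevelQueryCache:
    """Illustrative two-level query cache; all names and defaults are assumptions."""

    def __init__(self, capacity: int = 100, threshold: float = 0.7):
        self.capacity = capacity
        self.threshold = threshold
        self.entries = OrderedDict()   # level 1: query text -> results, kept in LRU order
        self.index = defaultdict(set)  # level 2: term -> cached queries containing it

    @staticmethod
    def _terms(query):
        return Counter(query.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def put(self, query, results):
        if query in self.entries:
            self.entries.move_to_end(query)
        self.entries[query] = results
        for term in self._terms(query):
            self.index[term].add(query)
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # drop the least recently used query
            for term in self._terms(evicted):
                self.index[term].discard(evicted)

    def get(self, query):
        # Level 1: exact match on the query text.
        if query in self.entries:
            self.entries.move_to_end(query)
            return self.entries[query]
        # Level 2: cached queries sharing at least one term, ranked by cosine similarity.
        q_terms = self._terms(query)
        candidates = set()
        for term in q_terms:
            candidates |= self.index.get(term, set())
        best, best_score = None, 0.0
        for cand in candidates:
            score = self._cosine(q_terms, self._terms(cand))
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= self.threshold:
            self.entries.move_to_end(best)
            return self.entries[best]
        return None  # miss: the caller searches the collection, then calls put()
```

The inverted index keeps the second-level lookup from scoring every cached query: only queries that share a term with the incoming query are compared. Swapping in LFU or LRV would change only the eviction step in `put`.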
Caching in CARROT–II
[Figure: query flow through the caches. Components shown: Node I, Node II, a Query Agent, six C2 Agents, six primary caches, and two secondary caches. Labeled steps:]
1. User query
2. Lookup
3. MISS
4. Query forwarded
5. Miss
6. Query forwarded to best C2
7. Lookup
8. HIT
9. Update cache
10. Results returned
11. Response
Metrics for Evaluation of Caching Mechanism
- Efficiency
  - Round Trip Time (RTT) = total time to answer the queries fired at the system
  - Hit Rate = measured for each agent cache and as a total hit rate
  - Cost of caching = the overhead caused by caching (assuming that the hit rate is 0)
- Effectiveness
  - Precision = fraction of retrieved documents that are relevant
  - Recall = fraction of relevant documents that are retrieved
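The effectiveness metrics follow directly from their definitions, as the sketch below shows. Hit rate is taken here as hits divided by total lookups, which is the usual reading and an assumption about how the deck computes it. The function names and document identifiers are made up for illustration.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant documents that are retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def hit_rate(hits: int, lookups: int) -> float:
    """Cache hits divided by total lookups (assumed definition)."""
    return hits / lookups if lookups else 0.0


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
p = precision(retrieved, relevant)  # 2 / 4
r = recall(retrieved, relevant)     # 2 / 3
```

For a cache that answers similar queries from stored results, these two numbers indicate how much result quality is given up in exchange for a higher hit rate.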