Search Engine Caching

Rank-preserving two-level caching for scalable search engines, Patricia Correia Saraiva et al., September 2001 http://doi.acm.org.ezproxy.lib.vt.edu:8080/10.1145/383952.383959

Predictive Caching and Prefetching of Query Results in Search Engines, Ronny Lempel and Shlomo Moran, May 2003 http://doi.acm.org.ezproxy.lib.vt.edu:8080/10.1145/775152.775156

Presented by

Adam "So, is this gonna be on the test?" Edelman

The Problem

• The User: "I want my results now!"

• But...
  – Over 4 billion web pages
  – Over 1 million queries per minute

• How do we keep response times down as the web grows?

Search Engine Statistics

• 63.7% of the search phrases appear only once in the billion query log

• The 25 most popular queries in the log account for 1.5% of the submissions

• Considerable time and processing power can be saved through well implemented caching

Search Engine Statistics

• 58% of the users view only the first page of results (the top-10 results)

• No more than 12% of users browse through more than 3 result pages.

• We do not need to cache large result sets for a given query

What do we Cache?

• 36% of all queries have been retrieved before

• Can we apply caching even if the query does not exactly match any previous query?

What do we Cache?

• Saraiva et al. propose a two-level cache

• In addition to caching query results, we also cache inverted lists for popular terms

Query Cache Implementation

• Store only the first 50 references per query
  – ~25KB per query

• Query logs show that the miss ratios do not drastically improve after query result cache exceeds 10 MB
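A minimal sketch of how the sizing above could translate into a query result cache. The 50-reference limit, the ~25KB entry size, and the 10MB budget come from the slide, and the LRU policy is the one the deck attributes to Saraiva et al.; the class name `QueryResultCache` and its methods are hypothetical and are not the authors' code.

```python
from collections import OrderedDict

MAX_REFS_PER_QUERY = 50                 # slide: keep only the first 50 references
ENTRY_SIZE_BYTES = 25 * 1024            # slide: roughly 25KB per cached query
CACHE_BUDGET_BYTES = 10 * 1024 * 1024   # slide: miss ratio flattens past ~10MB


class QueryResultCache:
    """Hypothetical LRU cache: query string -> truncated list of references."""

    def __init__(self, budget_bytes=CACHE_BUDGET_BYTES, entry_bytes=ENTRY_SIZE_BYTES):
        # Assumes every entry costs the same fixed amount of memory.
        self.capacity = budget_bytes // entry_bytes
        self._entries = OrderedDict()   # oldest entry first

    def get(self, query):
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query, references):
        self._entries[query] = list(references)[:MAX_REFS_PER_QUERY]
        self._entries.move_to_end(query)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


cache = QueryResultCache()
cache.put("web caching", [f"doc{i}" for i in range(200)])
top_refs = cache.get("web caching")
```

Under these assumptions a 10MB budget holds about 400 queries, which gives a feel for how small the query result cache is relative to the log.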

Inverted List Cache Implementation

• For this data set, 50-75% of inverted lists contain documents where the term appears only once

• Use 4KB inverted list size per term
  – More work needs to be

• Asymptotic behavior is apparent after cache exceeds 200MB

• Use 250MB for IL Cache

Two-Level Cache Implementation

• Combine previous two caches

• 270MB total cache
  – Accounts for only 6.5% of overall index size

• Tested over a log of 100K queries to TodoBR
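The lookup path of a two-level cache can be sketched as follows: check the query result cache first, fall back to cached inverted lists, and touch the index only for terms that miss both. This is an illustrative sketch, not the authors' implementation. The names `LRU` and `TwoLevelCache`, the in-memory `index` dict standing in for the on-disk index, and the plain set intersection used in place of the paper's ranking are all assumptions.

```python
from collections import OrderedDict


class LRU:
    """Small LRU map used for both cache levels (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)


class TwoLevelCache:
    def __init__(self, index, query_capacity, list_capacity):
        self.index = index                    # assumption: term -> list of doc ids
        self.query_cache = LRU(query_capacity)
        self.list_cache = LRU(list_capacity)
        self.disk_reads = 0                   # counts simulated index fetches

    def _inverted_list(self, term):
        postings = self.list_cache.get(term)
        if postings is None:
            self.disk_reads += 1              # level-2 miss: read from the index
            postings = self.index.get(term, [])
            self.list_cache.put(term, postings)
        return postings

    def search(self, query):
        cached = self.query_cache.get(query)
        if cached is not None:                # level-1 hit: no query processing
            return cached
        lists = [set(self._inverted_list(t)) for t in query.split()]
        # Stand-in for ranking: documents containing every query term.
        result = sorted(set.intersection(*lists)) if lists else []
        self.query_cache.put(query, result)
        return result


index = {"search": [1, 2, 3], "engine": [2, 3, 4], "cache": [3, 5]}
engine = TwoLevelCache(index, query_capacity=2, list_capacity=2)
first = engine.search("search engine")   # two index reads
again = engine.search("search engine")   # served from the query cache
other = engine.search("search cache")    # "search" list reused, "cache" read
```

The third call shows why the second level helps: the query itself is new, but one of its terms is already cached, so only one index read is needed.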

Two-Level Cache Results

• Compared to caches of 270MB for only query results, only inverted lists, and no cache

• Queries processed reduced by 62%
  – 21% increase compared to only query result cache

• Page fetches from the database reduced by 95%
  – 3% increase compared to only inverted list cache

Two-Level Cache Results

• At more than 20 queries per second, the two-level cache needs only 20% of the disk reads required with no cache

• Two-level cache can handle 64 queries per second against 22 per second with no cache

How do we cache?

• Saraiva et al. use a least recently used (LRU) replacement policy for cache maintenance

• Users search in sessions; the next query will probably be related to the previous query

• Can we use this to improve caching?

Probability Driven Cache (PDC)

• Lempel and Moran propose a cache based on the probability of a page being requested

Page Least Recently Used (PLRU)

• Allocate a page queue that can accommodate a certain number of result pages

• When the queue is full and a new page needs to be cached, the least recently used page is removed from the cache

• Achieves hit ratios around 30% for warm, large caches
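The PLRU policy described above can be sketched as a single queue of result pages. The class name `PLRU`, the `(query, page_no)` key, and the `fetch` callback standing in for the search back end are assumptions made for illustration.

```python
from collections import OrderedDict


class PLRU:
    """Page LRU: one queue of result pages, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of result pages the queue can hold
        self.queue = OrderedDict()    # head = least recently used, tail = newest
        self.hits = 0
        self.requests = 0

    def request(self, query, page_no, fetch):
        key = (query, page_no)
        self.requests += 1
        if key in self.queue:
            self.hits += 1
            self.queue.move_to_end(key)      # refresh recency on a hit
            return self.queue[key]
        page = fetch(query, page_no)         # miss: compute the page
        self.queue[key] = page
        if len(self.queue) > self.capacity:
            self.queue.popitem(last=False)   # drop the least recently used page
        return page


def fetch(query, page_no):
    # Hypothetical back end: returns a placeholder result page.
    return f"results for {query!r}, page {page_no}"


cache = PLRU(capacity=2)
for q in ["a", "b", "a", "c", "b"]:
    cache.request(q, 1, fetch)
hit_ratio = cache.hits / cache.requests
```

In this short trace only the second request for "a" hits; "b" is evicted by "c" before it is requested again.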

Page Segmented LRU (PSLRU)

• Maintains two LRU segments, a protected segment and a probationary segment

• Pages are first placed in the probationary segment; if requested again, they are moved to the protected segment

• Pages evicted from the protected segment are moved to the probationary segment

• Pages evicted from the probationary segment are removed from the cache

• Consistently outperforms PLRU, although the difference is very small
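The four PSLRU rules above map onto two ordered queues. The sketch below assumes fixed per-segment capacities and an opaque page key; the class name `PSLRU` and the `fetch` callback are hypothetical.

```python
from collections import OrderedDict


class PSLRU:
    """Segmented LRU over result pages: probationary and protected segments."""

    def __init__(self, protected_capacity, probationary_capacity):
        self.protected_capacity = protected_capacity
        self.probationary_capacity = probationary_capacity
        self.protected = OrderedDict()       # head = least recently used
        self.probationary = OrderedDict()

    def _insert_probationary(self, key, page):
        self.probationary[key] = page
        self.probationary.move_to_end(key)
        if len(self.probationary) > self.probationary_capacity:
            self.probationary.popitem(last=False)   # leaves the cache entirely

    def request(self, key, fetch):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probationary:
            # Second request: promote to the protected segment.
            page = self.probationary.pop(key)
            self.protected[key] = page
            if len(self.protected) > self.protected_capacity:
                old_key, old_page = self.protected.popitem(last=False)
                self._insert_probationary(old_key, old_page)   # demote
            return page
        page = fetch(key)                    # miss: new pages start on probation
        self._insert_probationary(key, page)
        return page


cache = PSLRU(protected_capacity=1, probationary_capacity=2)
for key in ["a", "a", "b", "b", "c", "d"]:
    cache.request(key, fetch=str.upper)
```

After this trace "b" holds the single protected slot, "a" has been demoted and then evicted, and "c" and "d" sit in the probationary segment.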

Topic LRU (TLRU)

• Let t(q) denote the topic of the query q

• After the cache is warm, any cached result page of t(q) is moved to the tail of the queue.

• Each topic’s pages will reside contiguously in the queue
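A rough sketch of the TLRU idea: on every request, all cached pages that share the query's topic are refreshed together, so they end up adjacent in the queue. The slides do not say how t(q) is computed, so the `topic` parameter (defaulting to the query string itself) is an assumption, as are the class name `TLRU` and the `fetch` callback.

```python
from collections import OrderedDict


class TLRU:
    """Topic LRU: a request refreshes every cached page of the same topic."""

    def __init__(self, capacity, topic=lambda query: query):
        self.capacity = capacity
        self.topic = topic            # t(q); the default is an assumption
        self.queue = OrderedDict()    # (query, page_no) -> page; head is evicted first

    def request(self, query, page_no, fetch):
        key = (query, page_no)
        if key not in self.queue:
            self.queue[key] = fetch(query, page_no)
        # Move all cached pages of t(q) to the tail, keeping their relative
        # order, with the requested page last.
        t = self.topic(query)
        same_topic = [k for k in self.queue if self.topic(k[0]) == t and k != key]
        for k in same_topic + [key]:
            self.queue.move_to_end(k)
        while len(self.queue) > self.capacity:
            self.queue.popitem(last=False)
        return self.queue[key]


def fetch(query, page_no):
    # Hypothetical back end: returns a placeholder result page.
    return f"results for {query!r}, page {page_no}"


cache = TLRU(capacity=3)
cache.request("a", 1, fetch)
cache.request("b", 1, fetch)
cache.request("a", 2, fetch)   # also refreshes ("a", 1)
cache.request("c", 1, fetch)   # evicts ("b", 1), the only stale page
```

Under plain PLRU the last request would have evicted ("a", 1); here the topic refresh keeps both pages of "a" together at the expense of "b".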

Topic SLRU (TSLRU)

• All pages are initially inserted in the probationary segment

• In addition to promoting pages from probationary to protected, we also promote all pages of t(q)
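TSLRU combines the two previous ideas, and one possible reading of the rules above is sketched here. It assumes that a request promotes every already-cached page of t(q) to the protected segment, that a newly fetched page still starts in the probationary segment, and that t(q) defaults to the query string. The class name `TSLRU`, the `topic` parameter, and the `fetch` callback are hypothetical.

```python
from collections import OrderedDict


class TSLRU:
    """Topic SLRU: segmented LRU where a request promotes its whole topic."""

    def __init__(self, protected_capacity, probationary_capacity,
                 topic=lambda query: query):
        self.protected_capacity = protected_capacity
        self.probationary_capacity = probationary_capacity
        self.topic = topic                 # t(q); the default is an assumption
        self.protected = OrderedDict()     # head = least recently used
        self.probationary = OrderedDict()

    def _promote(self, key):
        if key in self.probationary:
            self.protected[key] = self.probationary.pop(key)
        self.protected.move_to_end(key)

    def _rebalance(self):
        # Protected overflow is demoted; probationary overflow leaves the cache.
        while len(self.protected) > self.protected_capacity:
            key, page = self.protected.popitem(last=False)
            self.probationary[key] = page
        while len(self.probationary) > self.probationary_capacity:
            self.probationary.popitem(last=False)

    def request(self, query, page_no, fetch):
        key = (query, page_no)
        t = self.topic(query)
        cached = list(self.protected) + list(self.probationary)
        for k in cached:
            if self.topic(k[0]) == t and k != key:
                self._promote(k)           # promote the rest of the topic
        if key in self.protected or key in self.probationary:
            self._promote(key)
            page = self.protected[key]
        else:
            page = fetch(query, page_no)
            self.probationary[key] = page  # new pages start on probation
        self._rebalance()
        return page


def fetch(query, page_no):
    # Hypothetical back end: returns a placeholder result page.
    return f"results for {query!r}, page {page_no}"


cache = TSLRU(protected_capacity=2, probationary_capacity=2)
for query, page_no in [("a", 1), ("a", 2), ("b", 1), ("a", 1)]:
    cache.request(query, page_no, fetch)
```

After this trace both pages of "a" are protected, while "b", requested only once, remains on probation.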
