Selecting the Right Cache Framework: Best Cache Framework for Your Application
@MOHAMMED FAZULUDDIN





Page 2: Selecting the right cache framework

Topics

• Overview
• Types of Caches
• Caching Algorithms
• Cache Time Based Expiration Models
• Cache Frameworks
• Cache Drawbacks

Page 3: Selecting the right cache framework

Overview

• A cache is a region of faster memory used to speed up data access by storing portions of a data set whose complete copy is slower to access.

• On most computers, disk access is very slow relative to the speed of main memory; to speed up repeated accesses to files or disk blocks, most computers cache recently accessed disk data in main memory or some other form of fast memory.

• Using caching technology across the multi-tier model can help reduce the number of back-and-forth communications.

• It avoids the expensive re-acquisition of objects by not releasing them immediately after use; instead, the objects are stored in memory and reused for subsequent client requests.

• A cache also allows higher throughput from the underlying resources.

Page 4: Selecting the right cache framework

Overview

Caching Architecture

Page 5: Selecting the right cache framework

Types of Caches

• Web Caching (Browser/Proxy/Gateway):
• Browser, proxy, and gateway caching work differently but share the same goal: reducing overall network traffic and latency.
• Browser caching is controlled at the individual user level, whereas proxy and gateway caching operate on a much larger scale.
• Commonly cached data includes DNS (Domain Name System) records, used to resolve domain names to IP addresses, and mail server records.
• Data that changes infrequently is best cached for longer periods of time by the proxy and/or gateway servers.
• Browser caching helps users quickly navigate pages they have recently visited. This caching feature is free to take advantage of, yet it is often overlooked by hosting companies and developers.

Page 6: Selecting the right cache framework

Types of Caches Web Caching (Browser/Proxy/Gateway):

Page 7: Selecting the right cache framework

Types of Caches

• Data Caching:
• Data caching is a very important tool for database-driven applications or CMS solutions. It is best used for frequent calls to data that does not change rapidly.
• Data caching helps your website or application load faster and gives users a better experience.
• It avoids extra trips to the database to retrieve data sets that have not changed. It stores the data in local memory on the server, which is the fastest way to retrieve information on a web server.
• The database is the bottleneck for almost all web applications, so the fewer DB calls the better. Most DB solutions also attempt to cache frequently used queries to reduce turnaround time; for example, MS SQL uses execution plans for stored procedures and queries to speed up processing.
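The "fewer DB calls the better" idea above can be sketched as a read-through cache in front of a loader function. This is a minimal sketch, not any particular framework's API; the loader here merely stands in for a real database query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-through data cache sketch: the loader (a stand-in for a DB query)
// is only invoked when the key is not already cached.
class DataCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // e.g. a database lookup
    private int loads = 0;                 // counts real lookups, for illustration

    DataCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        // computeIfAbsent only calls the loader on a cache miss
        return cache.computeIfAbsent(key, k -> { loads++; return loader.apply(k); });
    }

    int loads() { return loads; }
}
```

Repeated reads of the same key then hit local memory instead of the database.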

Page 8: Selecting the right cache framework

Types of Caches Data Caching:

Page 9: Selecting the right cache framework

Types of Caches

• Application/Output Caching:
• Most CMSs have these caching mechanisms built in; however, many users don't understand them and simply ignore them.
• It is best to understand what caching options you have and to implement them whenever possible.
• Application/output caching can drastically reduce your website's load time and reduce server overhead.
• Unlike data caching, which stores raw data sets, application/output caching often uses server-level techniques that cache raw HTML.
• The cached unit can be a full page, parts of a page (headers/footers), or module data, but it is usually HTML markup.

Page 10: Selecting the right cache framework

Types of Caches

• Distributed Caching:
• Distributed caching is for big applications. Most high-volume systems, such as Google, YouTube, and Amazon, use this technique.
• This approach lets web servers pull from and store into the memory of distributed servers. Once implemented, it allows a web server to simply serve pages without worrying about running out of memory.
• The distributed cache can be made up of a cluster of cheaper machines that only serve up memory. Once the cluster is set up, you can add new machines at any time without disrupting your users.
• Ever notice how large companies like Google return results so quickly even with hundreds of thousands of simultaneous users? They use clustered distributed caching, along with other techniques, to keep vast amounts of data in memory, because memory retrieval is faster than file or DB retrieval.
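For a cluster to "serve up memory", every web server must agree on which node holds a given key. One simple scheme is hash partitioning, sketched below with made-up node names (production systems usually prefer consistent hashing, which moves fewer keys when nodes join or leave):

```java
import java.util.List;

// Hash-partitioning sketch: a key deterministically maps to one node,
// so all web servers look up the same entry on the same machine.
class NodePicker {
    static String pickNode(String key, List<String> nodes) {
        // floorMod keeps the index non-negative even for negative hash codes
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

The trade-off of plain modulo hashing is that adding a node remaps most keys, which is why consistent hashing exists.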

Page 11: Selecting the right cache framework

Types of Caches Distributed Caching:

Page 12: Selecting the right cache framework

Caching Algorithms

• Some of the most popular and theoretically important algorithms are FIFO, LRU, LFU, LRU2, and 2Q.
• FIFO (First In, First Out):
• Items are added to the cache as they are accessed, placed in a queue or buffer without changing their position; when the cache is full, items are ejected in the order they were added.
• Cache access overhead is constant time regardless of the size of the cache.
• The advantage of this algorithm is that it is simple and fast; it can be implemented using just an array and an index.
• The disadvantage is that it is not very smart; it makes no effort to keep more commonly used items in the cache.
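A FIFO cache along the lines described above can be sketched in Java with `LinkedHashMap`, which keeps insertion order by default and ejects the eldest entry via `removeEldestEntry` (capacities here are arbitrary):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// FIFO cache sketch: insertion order is preserved, and once the map
// exceeds capacity, the entry that arrived first is ejected.
class FifoCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    FifoCache(int capacity) {
        super(16, 0.75f, false);   // false = insertion order, i.e. FIFO
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // eject in arrival order when full
    }
}
```

Note that reading an entry does not protect it from eviction, which is exactly the "not very smart" behavior the slide describes.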

Page 13: Selecting the right cache framework

Caching Algorithms

• LRU (Least Recently Used):
• Items are added to the cache as they are accessed; when the cache is full, the least recently used item is ejected.
• This type of cache is typically implemented as a linked list, so that when an item in the cache is accessed again it can be moved back to the head of the queue; items are ejected from the tail of the queue. Cache access overhead is again constant time.
• This algorithm is simple and fast, and it has a significant advantage over FIFO in being able to adapt somewhat to the data access pattern; frequently used items are less likely to be ejected from the cache.
• The main disadvantage is that the cache can still fill up with items that are unlikely to be re-accessed soon; in particular, it can become useless in the face of scans over more items than fit in the cache. Nonetheless, this is by far the most frequently used caching algorithm.
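The linked-list scheme above is exactly what Java's `LinkedHashMap` provides in access-order mode: each access moves the entry to the tail, so the head is always the least recently used. A minimal sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache sketch: accessOrder=true moves each read entry to the tail,
// so the eldest entry is the least recently used and is ejected first.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);    // true = access order, i.e. LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // eject least recently used when full
    }
}
```

Compare with FIFO: here a read does refresh the entry, so recently used items survive eviction.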

Page 14: Selecting the right cache framework

Caching Algorithms

• LRU2 (Least Recently Used Twice):
• Items are added to the main cache the second time they are accessed; when the cache is full, the item whose second-most-recent access is furthest in the past is ejected.
• Because of the need to track the two most recent accesses, access overhead increases logarithmically with cache size, which can be a disadvantage. In addition, accesses have to be tracked for some items not yet in the cache.
• There may also be a second, smaller, time-limited cache to capture temporally clustered accesses; the optimal size of this cache relative to the main cache depends strongly on the data access pattern, so some tuning effort is involved.
• The advantage is that it adapts to changing data patterns, like LRU, and in addition it won't fill up from scanning accesses, since items aren't retained in the main cache unless they've been accessed more than once.

Page 15: Selecting the right cache framework

Caching Algorithms

• 2Q (Two Queues):
• Items are added to an LRU cache as they are accessed. If accessed again, they are moved to a second, larger LRU cache. Items are typically ejected so as to keep the first cache at about 1/3 the size of the second.
• This algorithm attempts to provide the advantages of LRU2 while keeping cache access overhead constant rather than increasing with cache size. Published data seems to indicate that it largely succeeds.
• LFU (Least Frequently Used):
• Frequency-of-use data is kept for all items, and the most frequently used items are kept in the cache. Because of the bookkeeping requirements, cache access overhead increases logarithmically with cache size; in addition, data needs to be kept for all items, whether or not they are in the cache.
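The 2Q promotion scheme above can be sketched as a small "probation" queue plus a larger LRU main cache. This is a simplified sketch (real 2Q also tracks "ghost" entries for recently ejected keys, which is omitted here, and the queue sizes are arbitrary):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Simplified 2Q sketch: first access lands in a FIFO probation queue;
// a second access promotes the item to the larger LRU main cache,
// so one-shot scan entries never pollute the main cache.
class TwoQueueCache<K, V> {
    private final int probationSize, mainSize;
    private final LinkedHashMap<K, V> probation = new LinkedHashMap<>();   // FIFO order
    private final LinkedHashMap<K, V> main =
            new LinkedHashMap<>(16, 0.75f, true);                          // LRU order

    TwoQueueCache(int probationSize, int mainSize) {
        this.probationSize = probationSize;
        this.mainSize = mainSize;
    }

    V get(K key) {
        V v = main.get(key);                 // LRU hit also refreshes recency
        if (v != null) return v;
        v = probation.remove(key);
        if (v != null) promote(key, v);      // second access: promote to main
        return v;
    }

    void put(K key, V value) {
        if (main.containsKey(key)) { main.put(key, value); return; }
        probation.put(key, value);           // first access goes to probation
        evictOldest(probation, probationSize);
    }

    private void promote(K key, V value) {
        main.put(key, value);
        evictOldest(main, mainSize);
    }

    private static <K, V> void evictOldest(LinkedHashMap<K, V> map, int cap) {
        if (map.size() > cap) {
            Iterator<K> it = map.keySet().iterator();
            it.next();
            it.remove();                     // head = oldest / least recently used
        }
    }
}
```

A scan of never-repeated keys only churns the small probation queue, leaving promoted items untouched.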

Page 16: Selecting the right cache framework

Cache Time Based Expiration Models

• Simple time-based expiration: Data in the cache is invalidated based on absolute time periods. Items are added to the cache and remain there for a specific amount of time.

Summary for simple time-based expiration: fast, not adaptive, not scan resistant.

• Extended time-based expiration: Data in the cache is invalidated based on relative time periods. Items are added to the cache and remain there until they are invalidated at certain points in time, such as every five minutes, or each day at 12:00.

Summary for extended time-based expiration: fast, not adaptive, not scan resistant.

• Sliding time-based expiration: Data in the cache is invalidated by specifying the amount of time an item is allowed to sit idle in the cache after its last access.

Summary for sliding time-based expiration: fast, adaptive, not scan resistant.
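The sliding model above can be sketched with a per-entry last-access timestamp checked against an idle timeout. The clock is injected here purely so the expiry behavior can be simulated; this is an illustrative sketch, not any framework's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Sliding time-based expiration sketch: every successful read renews
// the entry's last-access time; entries idle longer than idleMillis
// are dropped lazily on the next access.
class SlidingCache<K, V> {
    private static final class Entry<V> { V value; long lastAccess; }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long idleMillis;
    private final LongSupplier clock;   // injected so time can be simulated

    SlidingCache(long idleMillis, LongSupplier clock) {
        this.idleMillis = idleMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        Entry<V> e = new Entry<>();
        e.value = value;
        e.lastAccess = clock.getAsLong();
        map.put(key, e);
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        long now = clock.getAsLong();
        if (now - e.lastAccess > idleMillis) {
            map.remove(key);            // expired: idle too long
            return null;
        }
        e.lastAccess = now;             // sliding: each access renews the timer
        return e.value;
    }
}
```

Simple (absolute) expiration would instead keep the insertion timestamp fixed and never renew it on access.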

Page 17: Selecting the right cache framework

Cache Frameworks

• JBoss Cache:
• It can be used in a standalone, non-clustered environment to cache frequently accessed data in memory, removing data-retrieval or calculation bottlenecks while providing "enterprise" features such as JTA compatibility, eviction, and persistence.
• JBoss Cache is also a clustered cache and can be used in a cluster to replicate state, providing a high degree of failover.
• JBoss Cache can be, and often is, used outside of JBoss AS, in other Java EE environments such as Spring, Tomcat, GlassFish, BEA WebLogic, and IBM WebSphere, and even in standalone Java programs.
• JBoss Cache works out of the box with most popular transaction managers, and even provides an API where custom transaction manager lookups can be written.

Page 18: Selecting the right cache framework

Cache Frameworks

• OSCache:
• It can be used to cache both static and dynamic web pages.
• OSCache is used by many projects, such as Jofti, Spring, and Hibernate.
• OSCache is also used by many sites, such as TheServerSide, JRoller, and JavaLobby.
• JCS (Java Caching System):
• It is a caching system for server-side Java applications.
• It is intended to speed up dynamic web applications by providing a means to manage cached data of various dynamic natures.
• Like any caching system, JCS is most useful for high-read, low-put applications.

Page 19: Selecting the right cache framework

Cache Frameworks

• EhCache:
• It is used for general-purpose caching, J2EE, and lightweight containers, and is tuned for large cache objects.
• EhCache acts as a pluggable cache for Hibernate 2.1, with a small footprint, minimal dependencies, full documentation, and production testing.
• It is used in many Java frameworks, such as Alfresco, Cocoon, Hibernate, Spring, JPOX, Jofti, Acegi, Kosmos, Tudu Lists, and Lutece.
• EhCache is the default cache for Hibernate. With EhCache you can cache both Serializable and non-Serializable objects.
• Non-Serializable objects can use all parts of EhCache except the disk store and replication.

Page 20: Selecting the right cache framework

Cache Frameworks

• JCache:
• JCache Open Source is an effort to create an open-source implementation of JSR-107 JCache.
• ShiftOne:
• ShiftOne Java Object Cache is a Java library that implements several strict object caching policies, as well as a light framework for configuring cache behavior.
• SwarmCache:
• SwarmCache is a simple but effective distributed cache. It uses IP multicast to efficiently communicate with any number of hosts on a LAN.
• It is specifically designed for use by clustered, database-driven web applications.
• SwarmCache uses JGroups internally to manage the membership and communications of its distributed cache.

Page 21: Selecting the right cache framework

Cache Frameworks

• Whirlycache:
• Whirlycache is a fast, configurable in-memory object cache for Java.
• It can be used, for example, to speed up a website or an application by caching objects that would otherwise have to be created by querying a database or by another expensive procedure.
• Jofti:
• Jofti is a simple-to-use, high-performance object indexing and searching solution for objects in a caching layer or any storage structure that supports the Map interface.
• The framework supports EhCache, JBossCache, and OSCache, and provides transparent addition, removal, and updating of objects in its index, as well as simple-to-use query capabilities for searching.
• Features include type-aware searching, configurable object property indexing, and indexing/searching by interfaces, as well as support for dynamic proxies, primitive attributes, Collections, and Arrays.

Page 22: Selecting the right cache framework

Cache Frameworks

• cache4j:
• cache4j is a cache for Java objects with a simple API and a fast implementation.
• It features in-memory caching, a design for multi-threaded environments, both synchronized and blocking implementations, a choice of eviction algorithms (LFU, LRU, FIFO), and a choice of either hard or soft references for object storage.
• Open Terracotta:
• Open Terracotta is open-source clustering for Java.
• It supports HTTP session replication, distributed caching, POJO clustering, and application coordination across a cluster's JVMs (implemented using code injection, so you don't need to modify your code).

Page 23: Selecting the right cache framework

Cache Drawbacks

• Stale data:
• When you use cached content/data, you risk presenting old data that is no longer relevant to the current situation.
• If you have cached a query of products, but in the meantime the product manager has deleted four products, users will get listings for products that no longer exist.
• Overhead:
• The business logic needed to keep your data somewhere between fast and stale adds complexity, and complexity leads to more code that you must maintain and understand.
• You can easily lose track of where data exists in the caching layers, at what level, and how to fix stale data when you encounter it.

Page 24: Selecting the right cache framework

THANKS