



Transaction processing systems are generally considered easier to scale than data warehouses. Relational databases were designed for this type of workload, and there are no esoteric hardware requirements. Mostly, it is just a matter of normalizing to the right degree and getting the indexes right. The major challenge in these systems is their extreme concurrency, which means that small temporary slowdowns can escalate to major issues very quickly. In this presentation, Gwen Shapira will explain how application developers and DBAs can work together to build a scalable and stable OLTP system - using application queues, connection pools and strategic use of caches in different layers of the system.



Queues, Pools and Caches - Everything a DBA should know of scaling modern OLTP
Gwen (Chen) Shapira, Senior Consultant, The Pythian Group
[email protected]

Scalability Problems in Highly Concurrent SystemsWhen we drive through a particularly painful traffic jam, we tend to assume that the jam has a cause. That road maintenance or an accident blocked traffic and created the slowdown. However, we often reach the end of the traffic jam without seeing any visible cause.

Traffic researcher Prof. Sugiyama and his team showed that with sufficient traffic density, traffic jams will occur with no discernible root cause. Traffic jams will even form when cars drive at a constant speed on a circular one-lane tracki.

“When a large number of vehicles, beyond the road capacity, are successively injected into the road, the density exceeds the critical value and the free flow state becomes unstable.”ii

OLTP systems are built to handle a large number of small transactions. In these systems the main requirement is servicing a large number of concurrent requests with low and predictable latency. Good scalability for an OLTP system can be defined as “Achieving maximum useful concurrency from a shared system”iii.

OLTP systems often behave exactly like the traffic jams in Prof. Sugiyama’s experiments – more and more traffic is loaded into the database. Inevitably, a traffic jam occurs, and we may not be able to find any visible root cause for it. In a wonderful video, Andrew Holdsworth of Oracle’s Real World Performance group shows how increasing traffic on a database server can dramatically increase latency without any improvement in throughput, and how reducing the number of connections to the database can improve performanceiv.

In this presentation, I’ll discuss several design patterns and frameworks that are used to improve scalability by controlling concurrency in modern OLTP systems and web-based architectures.

All the patterns and frameworks I’ll discuss are considered part of the software architecture. DBAs often take little interest in the design and architecture of the applications that use the database. But databases never operate in a vacuum: DBAs who understand application design can have a better dialog with the software team when it comes to scalability, and progress beyond finger pointing and “the database is slow” blaming. These frameworks require sizing, capacity planning and monitoring – tasks that DBAs are better qualified for than software developers. I’ll go into detail on how DBAs can help size and monitor these systems with database performance in mind.


Connection Pools

The Problem:
Scaling application servers is a well-understood problem. Through the use of horizontal scaling and stateless interactions it is relatively easy to deploy enough application capacity to support even thousands of simultaneous user requests. This scalability, however, does not extend to the database layer.

Opening and closing a database connection is a high-latency operation, due to the network protocol used between the application server and the database and the significant overhead of allocating database resources. Web applications and OLTP systems can't afford this latency on every user request.

The Solution:
Instead of opening a new connection for each application request, the application engine prepares a certain number of open database connections and caches them in a connection pool.

In Java, the DataSource class is a factory for creating database connections and the preferred way of getting a connection. Java defines a generic DataSource interface, and many vendors provide their own DataSource implementations. Many, but not all, of the implementations also include connection pooling.v

Using the generic DataSource interface, developers call getConnection(), and the DataSource class provides the connection. Since the developers write the same code regardless of whether the DataSource class they are using implements pooling or not, asking a developer whether he is using connection pooling is not a reliable method to determine if connection pooling is used.

To make things more complicated, the developer is often unaware of which DataSource class he is using. The DataSource implementation is registered with the Java Naming and Directory Interface (JNDI) and can be deployed and managed separately from the application that is using it. Finding out which DataSource is used and how the connection pool is configured can take some digging and creativity. Most application servers contain a configuration file called "server.xml" or "context.xml" that contains various resource descriptions. Searching for a resource of type "javax.sql.DataSource" will find the configuration of the DataSource class and the connection pool's minimum and maximum sizes.
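For illustration, here is a minimal sketch of the pattern described above. The JNDI name "jdbc/AppDS" is invented for the example (the real name is whatever the resource is registered under in server.xml or context.xml), and whether the lookup returns a pooling implementation depends entirely on what the container has registered.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class UserDao {
        public String getUsername(int userId) throws Exception {
            // The same code runs whether or not the registered DataSource pools connections.
            DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS");
            try (Connection conn = ds.getConnection();   // borrowed from the pool, or newly opened
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT username FROM users WHERE userid = ?")) {
                ps.setInt(1, userId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }   // close() hands the connection back to the pool rather than tearing it down
        }
    }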


The Architecture:

[Figure: connection pool architecture – Application Business Layer → Application Data Layer → DataSource interface (looked up via JNDI) → DataSource implementation with connection pool → JDBC driver → database.]

New problems:
1. When connection pools are used, all users share the same schema and the same sessions, so tracing can be difficult. We advise developers to use DBMS_APPLICATION_INFO to set extra information such as the username (typically in the client_info field), module and action to assist in future troubleshooting.

2. Deciding on the size of a connection pool is the biggest challenge in using connection pools to increase scalability. As always, the thing that gets us into trouble is the thing we don’t know that we don’t know. Most developers are well aware that if the connection pool is too small, the database will sit idle while users are either waiting for connections or are being turned away. Since the scalability limitations of small connection pools are known, developers tend to avoid them by creating large connection pools, and increasing their size at the first hint of performance problems.



However, an oversized connection pool is a much greater risk to application scalability. Here is what the scalability of an OLTP system typically looks likevi:

Amdahl’s law says that the scalability of the system is constrained by its serial component, as users wait for shared resources such as IO and CPU (this is the contention delay). According to the Universal Scalability Law there is a second delay, the “coherency delay”, which is the cost of maintaining data consistency in the system; it models waits on latches and mutexes. After a certain point, adding more users to the system will decrease throughput.

Even while throughput is still increasing, at the point where it stops growing linearly requests start to queue and response times suffer proportionally:

If you check the wait events for a system that is past the point of saturation, you will see very high CPU utilization, high “log file sync” waits as a result of the CPU contention, and high waits on concurrency events such as “buffer busy waits” and “library cache latch”.


3. Even when the negative effects of too many concurrent users on the system are made clear, developers still argue for oversized connection pools with the excuse that most of the connections will be idle most of the time. There are two significant problems with this approach:

a. While we believe that most of the connections will be idle most of the time, we can’t be certain that this will be the case. In fact, the worst performance issues I’ve seen were caused by the application actually using the entire connection pool allocated to it. This often happens when response times at the database already suffer for some reason, and the application does not receive a response in a timely manner. At this point the application or users rerun the operation, using another connection to run the exact same query. Soon there are hundreds of connections to the database, all attempting to run the same queries and waiting for the same latches.

b. The oversized connection pool has to be re-established during failover events or database restarts. The larger the connection pool is, the longer the application will take to recover from a failover event, decreasing the availability of the application as a result.

4. Connection pools typically allow setting minimum and maximum sizes for the pool. When the application starts it will open connections until the minimum number of connections is met. Whenever it runs out of connections, it will open new connections until it reaches the maximum level. If connections are idle for too long, they will be closed, but never below the minimum level. This sounds fairly reasonable, until you ask yourself - if we set the minimum to the number of connections usually needed, when will the pool run out of connections?

A connection pool can be seen as a queue. Users arrive and are serviced by the database while holding a connection. According to Little's Law, the average number of connections in use is (average DB response time) x (average user arrival rate); a worked example follows this list. It is easy to see that you will run out of connections if the rate at which users use your site increases, or if database performance degrades and response times increase.

If your connection pool can grow at these times, it means that it will open new connections – a resource-intensive operation, as we previously noted – to a database that is already abnormally busy. This will further slow things down, which can lead to a vicious cycle known as a "connection storm". It is much safer to configure the connection pool to a specific size – the maximum number of concurrent users that can run queries on the database with acceptable performance. We’ll discuss later how to determine this size. This will ensure that during peak times you will have enough connections to maximize throughput at acceptable latency, and no more.

5. Unfortunately, even if you decide on a proper number of database connections, there is the problem of multiple application servers. In most web architectures there are multiple web servers, each with a separate connection pool, all connecting to the same database server. In this case, it seems appropriate to divide the number of connections the DB will sustain by the number of servers and size the individual pools by that number. The problem with this approach is that load balancing is never perfect, so it is expected that some app servers will run out of connections while others still have spare connections. In some cases the number of application servers is so large that dividing the number of connections leaves less than one connection per server.
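To make the arithmetic concrete, here is a small worked example of Little's Law applied to pool sizing; the arrival rate and response times below are illustrative numbers, not measurements from any particular system.

\[ N = \lambda \times R \]
\[ N_{\text{normal}} = 200\ \tfrac{\text{requests}}{\text{s}} \times 0.025\ \text{s} = 5 \ \text{connections in use} \]
\[ N_{\text{slowdown}} = 200\ \tfrac{\text{requests}}{\text{s}} \times 0.25\ \text{s} = 50 \ \text{connections in use} \]

A tenfold increase in database response time demands ten times as many connections from the pool – which is exactly the situation in which an unbounded pool feeds a connection storm.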

Solutions to new problems:
As we discussed in the previous section, the key to scaling OLTP systems is limiting the number of concurrent connections to a number that the database can reasonably support even when they are all active. The challenge is in determining this number.

Keeping in mind that OLTP workloads are typically CPU-bound, the number of concurrent users the system can support is limited by the number of cores on the database server. A database with 12 cores can typically only run 12 concurrent CPU-bound sessions.

The best way to size the connection pool is by simulating the load generated by the application. Running a load test on the database is a great way of figuring out the maximum number of concurrent active sessions that can be sustained by the database. This should usually be done with assistance from the QA department, as they probably already determined the mix of various transactions that simulates the normal operations load.

It is important to test the number of concurrently active connections the database can support at its peak. Therefore, while testing, it is critical to make sure that the database is indeed at full capacity and is the bottleneck at the point where we decide the number of connections is maximal. This can reasonably be validated by checking the CPU and IO queues on the database server and correlating them with the response times of the virtual users.

In usual performance tests, you try to determine the maximum number of users the application can support, so you run the test with an increasing number of virtual users until response times become unacceptable. However, when attempting to determine the maximum number of connections in the pool, you should run the test with a fixed number of users and keep increasing the number of connections in the connection pool until the database CPU utilization goes above 60%, the wait events shift from “CPU” to concurrency events, and response times become unacceptable. Typically all three of these symptoms will start occurring at approximately the same time.

If a QA department and load testing tools are not available, it is possible to use the methodology described by James Morle in his paper "Brewing Benchmarks" and generate load testing scripts from trace files, which can later be replayed by SwingBench.

When running a load test is impractical, you will need to estimate the number of connections based on available data. The factors to consider are:

1. How many cores are available on the database server?
2. How many concurrent users or threads does the application need to support?
3. When an application thread takes a connection from the pool, how much of the time is spent holding the connection without actually running database queries? The more time the application spends “just holding” the connection, the larger the pool will need to be to support the application workload.

4. How much of the database workload is IO-bound? You can check IOWAIT on the database server to determine this. The more IO-bound your workload is, the more concurrent users you can run without running into concurrency contention (You will see a lot of IO contention though).

“Number of cores” x 4 is a good starting point for the size of the connection pool: less if the connections are heavily utilized by the application and there is little IO activity, more if the opposite is true.
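As a sketch of what a fixed-size pool along those lines might look like, assuming the Apache Commons DBCP2 BasicDataSource (any pooling DataSource with equivalent settings would do; the URL and credentials are placeholders, not anything from the paper):

    import org.apache.commons.dbcp2.BasicDataSource;

    public class PoolConfig {
        public static BasicDataSource createPool() {
            int dbCores = 12;                 // cores on the database server
            int poolSize = dbCores * 4;       // the "number of cores" x 4 starting point

            BasicDataSource ds = new BasicDataSource();
            ds.setUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL");
            ds.setUsername("app_user");
            ds.setPassword("app_password");
            ds.setInitialSize(poolSize);      // open the whole pool up front
            ds.setMaxTotal(poolSize);         // never grow beyond it during a slowdown
            ds.setMinIdle(poolSize);          // never shrink below it either,
            ds.setMaxIdle(poolSize);          // avoiding reconnect storms after idle periods
            return ds;
        }
    }

The important property is that the minimum and maximum are equal, so the pool neither grows into an already-struggling database nor has to re-open connections during a burst.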

The remaining problem is what to do if the number of application servers is large and it is inefficient to divide the connection pool limit among the application servers. Well-architected systems usually have a separate data layer that can be deployed on a separate set of servers. This data layer should be the only component of the application allowed to open connections to the database, and it provides data objects to the various application server components. In this architecture, the connections are divided between the data-layer servers, of which there are typically far fewer. This design has three great advantages: first, the data layer usually grows much more slowly than the application and rarely requires new servers to be added, which means that the pools rarely require resizing; second, application requests can be balanced between the data servers based on the remaining pool capacity; and third, if there is a need to add application-side caching to the system (such as Memcached), only the data layer needs modification.


Application Message Queues

The Problem:
By limiting the number of connections from the application servers to the database, we are preventing a large number of queries from queuing at the database server. If the total number of connections allowed from application servers to the database is limited to 400, the run queue on the database will not exceed 400 (at least not by much).

We discussed in the previous section why preventing excessive concurrency in the database layer is critical for database scalability and latency. However, we still need to discuss how the application can deal with the user requests that arrive when there is no free database connection to handle them.

Let’s assume that we limited the connection pool to 50 connections, and due to a slow-down in the database, all 50 connections are currently busy servicing user requests. However, new user requests are still arriving into the system at their usual rate. What shall we do with these requests?

1. Throw away the database request and return an error or static content to the user. Some requests have to be serviced immediately. If the front page of your website can't load within a few seconds, it is not worth servicing at all. Hopefully, the database is not a critical component in displaying these pages (we'll discuss the options when we discuss caches). If it does depend on the database and your connection pool is currently busy, you will want to display a static page and hope the customer will try again later.

2. Place the request in queue for later processing. Some requests can be put aside for later processing, giving the user the impression of immediate return. For example, if your system allows the user to request reports by email, the request can certainly be acknowledged and queued for off-line processing. This option can be mixed with the first option – limit the size of the queue to N requests and display error messages for the rest.

3. Give the request extra-high priority. The application can recognize that the request arrived from the CIO and make sure it gets to the database ahead of any other user, perhaps cancelling several user requests to get this done.

4. Give the request extra-low priority. Some requests are so non-critical that there is no reason to even attempt serving them with low latency. If a user uses your application to send a message to another user, and there is no guarantee on how soon the message will arrive, it makes sense to tell the user the message was sent while in effect waiting until a connection in the pool is idle before attempting to process the message. Recurring events are almost always lower priority than one-time events: a user signing up for the service is a one-time event, and if lost, will have immediate business impact. Auditing user activity, on the other hand, is a recurring event, and in case of delay will have lower business impact.

5. Some requests are actually a mix of requests from different sources, such as a dashboard. In these cases it is best to display the different dashboard components as the data arrives, with some components taking longer than others to show up.


In all those cases, the application is able to prioritise requests and decide on a course of action, based on information that the database did not have at the time. It makes sense to shift the queuing to the application when the database is highly loaded, because the application is better capable of dealing with the excessive load.

Databases are not the only constrained resource; application servers have their own limitations when dealing with excess load. Typically, application servers have a limited number of threads. This is done for the same reason we limit the number of connections to the database server - the server has only a limited number of cores, and an excessive number of threads will overload the server without improving throughput. Since database requests are usually the highest-latency action performed by an application thread, when the database is slow to respond, all the application server threads can end up busy waiting for the database. The CPU on the application server will be idle while the application cannot respond to additional user requests.

All this leads to the conclusion that from both the database perspective and the application perspective, it is preferable to decouple the application requests from the database requests. This allows the application to prioritise requests, hide latency and keep the application server and database server busy but not overloaded.

The Solution:
Message queues provide an asynchronous communications protocol, meaning that the sender and receiver of the message do not need to interact with the message queue at the same time. They can be used by web applications and OLTP systems as a way to hide latency or variance in latency.

Java defines a common messaging API, JMS. There are multiple implementations of this API, both open source and commercial; Oracle Advanced Queuing is bundled with the Oracle RDBMS, both SE and EE, at no extra cost. These implementations differ in their feature set, supported operations, reliability and stability. The API supports queues for point-to-point messaging, where each message is delivered to a single consumer. It also supports topics for a publish-subscribe model, where multiple consumers can subscribe to various topics and receive the messages broadcast to the topic.

Message queues are typically installed by system administrators as a separate server or component, just as databases are installed and maintained. The message queue server is called a "broker", and is usually backed by a database to ensure that messages are persistent even if the broker fails. The application server then connects to the broker by a URL, and can publish to and consume from queues by queue name.
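A minimal sketch of that point-to-point pattern, assuming ActiveMQ as the broker (the broker URL, queue name and message payload are placeholders, and in a real system the producer and consumer would live in different processes):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ReportRequestQueue {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker-host:61616");
            Connection connection = factory.createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("report.requests");

            // Producer side: acknowledge the user immediately and hand the work to the broker.
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("report request for user 42"));

            // Consumer side: a worker that owns a pooled DB connection drains the queue at its own pace.
            MessageConsumer consumer = session.createConsumer(queue);
            TextMessage msg = (TextMessage) consumer.receive(1000);   // wait up to one second
            if (msg != null) {
                System.out.println("processing: " + msg.getText());
            }
            connection.close();
        }
    }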


The Architecture:

[Figure: message queue architecture – the Application Business Layer publishes to and consumes from a Message Queue in front of the Application Data Layer, which reaches the database through the DataSource interface (JNDI), connection pool and JDBC driver as before.]

New Problems:
There are some common myths related to queue management, which may make developers reluctant to use queues when necessaryvii:

1. It is impossible to reliably monitor queues.
2. Queues are not necessary if you do proper capacity planning.
3. Message queues are unnecessarily complicated. There must be a simpler way to achieve the same goals.

Solutions to New Problems:
While queues are undeniably useful to improve throughput at both the database and application server layers, they do complicate the architecture. Let’s tackle the myths one by one:

1. If it were indeed impossible to monitor queues, you would not be able to monitor the CPU run queue, load average, average active sessions, blocking sessions, disk IO waits or latches. All systems have many queues. The only question is where each queue is managed and how easy it will be to manage that specific queue.

If you use Oracle Advanced Queues, V$AQ will show you the number of messages in the queue and the average wait for messages in the queue, which is usually all you need to determine the status of the queue. For the more paranoid, I'd recommend adding a heartbeat monitor - insert a monitoring message into the queue at regular intervals and check that your process can read it from the queue, and how long it took to arrive.

The more interesting question is what do you do with the monitoring information - at what point will you send an alert to the on-call SA and what will you want her to do when she receives the alert?



Any queuing system will have high variance in service times and in the arrival rate of work; if service times and arrival rates were constant, there would be no need for queues. The high variance is expected to lead to spikes in system utilization, which can cause false alarms - the system is behaving as it should, but messages are accumulating in the queue. Our goal is to give notice as early as possible when there is a genuine issue with the system that should be resolved, and not to send warnings when the system is behaving as expected.

To this end, I recommend monitoring the following parameters:

- Service time - this should be monitored at the consumer thread. The thread should track (i.e. instrument) and log at regular intervals the average time it took to process a message from the queue. If service time increases significantly (compared to a known baseline, taking into account the known variance in response times), it can indicate a slowdown in processing and should be investigated.

- Arrival rate - this should be monitored at the processes that are writing to the queue. How many messages are inserted into the queue every second? This should be tracked for long-term capacity planning and to determine peak usage periods.

- Queue size - the number of messages in the queue. Using Little's Law we can measure the amount of time a message spends in the queue (wait time) instead. If queue size or wait time increase significantly, this can indicate a "business issue" - i.e. an impending breach of SLA. If the wait time frequently climbs to the point where SLAs are breached, it indicates that the system does not have enough capacity to serve the current workload. In this case either service times should be reduced (i.e. tuning), or more processing servers should be added. Note that queue size can and should go up for short periods of time, and recovering from bursts can take a while (depending on the service utilization), so this is only an issue if the queue size is high and does not start declining within a few minutes, which would indicate that the system is not recovering.

- Service utilization - the percentage of time the consumer threads are busy. This can be calculated as (arrival rate x service time) / (number of consumers). The more heavily the service is utilized, the higher the probability that a newly arrived message will have other messages ahead of it in the queue, and since R = S + W, response times will suffer. Since we already measure the queue size directly, the main use of service utilization is capacity planning, and in particular detection of over-provisioned systems. For a known utilization and fixed service times, if we know the arrival rate will grow by 50% tomorrow, you can calculate the expected effect on response timesviii:
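As a rough illustration of that calculation (the numbers are made up, and a single-consumer M/M/1-style approximation is assumed, which is simpler than most real systems):

\[ R \approx \frac{S}{1 - U} \]
\[ S = 20\ \text{ms},\ U = 0.5 \Rightarrow R \approx 40\ \text{ms} \]
\[ \text{arrival rate grows } 50\% \Rightarrow U = 0.75 \Rightarrow R \approx 80\ \text{ms} \]

The same 50% growth in arrival rate doubles the response time because the service was already half busy; the closer utilization gets to 100%, the more violently response times react to small increases in load.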

Note that by replacing many small queues on the database server with one (or a few) centralized queues in the application, you are in a much better position to calculate utilization and predict the effect on response times.

2. Queues are inevitable. Capacity planning or not, the fact that arrival rates and service times are random will ensure that there will be times when requests will be queued, unless you plan to turn away a large percentage of your business.

I suspect that what is really meant by "capacity planning will eliminate need for queues" is that it is possible to over-provision a system in a way that the queue servers (consumers) will have very low utilization. In this case queues will be exceedingly rare so it may make sense to throw the queue away and have the application threads communicate with the consumers directly. The application will then have to throw away any request that arrives when the consumers are busy, but in this system it will almost never happen. This is “capacity planning by overprovisioning”. I've worked on many databases that rarely exceeded 5% CPU. You'll still need to closely monitor the service utilization to make sure you increase your capacity to keep utilization low. I would not call this type of capacity planning "proper", though.

On the other hand, the introduction of a few well-defined and well-understood queues will help capacity planning. If we assume fixed server utilization, the size of the queue is proportional to the number of servers, so on some systems it is possible to do capacity planning just by examining the queue sizes.


3. Message Queues are indeed a complicated and not always stable beast. Queues are a simple concept. How did we get to a point where we need all those servers, protocols and applications to simply create a queue? Depending on your problem definition, it is possible that message queues are an excessive overhead. Sometimes all you need is a memory structure and few pointers. My colleague Marc Fielding created a high-performance queue system with a database table and two jobs. Some developers consider the database a worse overhead and prefer to implement their queues with a file, split and xargs. If this satisfies your requirements, then by all means, use those solutions.

In other cases, I've attempted to implement a simple queuing solution, but the requirements kept piling up: What if we want to add more consumers? What if the consumer crashed and only processed some of the messages it retrieved? By the time I finished tweaking my system to address all the new requirements, it was far easier to use an existing solution. So I advise using home-grown solutions only if you are reasonably certain the requirements will remain simple. If you suspect that you'll have to start dealing with multiple subscribers, which may or may not need to retrieve the same message multiple times, may or may not want to ack messages, and may or may not want to filter specific message types, then I recommend using an existing solution.

ActiveMQ and RabbitMQ (acquired by SpringSource) are popular open-source implementations, and Oracle Advanced Queuing is free if you already have an Oracle RDBMS license. When choosing an off-the-shelf message queue, it is important to understand how the system can be monitored, and to make sure that queue size, wait times and availability of the queue can be tracked by your favorite monitoring tool. If high availability is a requirement, this should also be taken into account when choosing a message queue provider, since different queue systems support different HA options.


Application Caching:

The Problem:
The database is a sophisticated and well-optimized caching machine, but as we saw when we discussed connection pools, it has its limitations when it comes to scaling. One of those limitations is that a single database machine is limited in the amount of RAM it has, so if your data working set is larger than the amount of memory available, your application will have to access the disk occasionally. Disk access is 10,000 times slower than memory access. Even a slight increase in the amount of disk access your queries have to perform, the type that happens naturally as your system grows, can have a devastating impact on database performance.

With Oracle RAC, more cache memory is available by pooling together memory from multiple machines into a global cache. However, the performance improvement from the additional servers is not proportional to what you'd see if you added more memory to the same machine. Oracle has to maintain cache consistency between the servers, and this introduces significant overhead. RAC can scale, but not in every case, and it requires careful application design to make this happen.

The Solution:
Memcached is a distributed, memory-only, key-value store. It can be used by the application server to cache results of database queries that can be used multiple times. The great benefit of Memcached is that it is distributed and can use free memory on any server, allowing for caching to be done outside of Oracle’s scarce buffer cache. If you have 5 application servers and you allocate 1G RAM to Memcached on each server, you have 5G of additional caching.

Memcached cache is an LRU, just like the buffer cache. If the application is trying to store a new key, and there is no free memory, the oldest item in the cache will be evicted and its memory used for the new key.

According to the documentation, Memcached scales very well when adding additional servers, because the servers do not communicate with each other at all. Each client has a list of available servers and a hash function that allows it to know which server holds the value for which key. When the application requests data from the cache, it connects to a single server and accesses exactly one key. When a single cache node crashes, there will be more cache misses and therefore more database requests, but the rest of the nodes will continue operating as usual.
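To illustrate that client-side distribution, here is a minimal sketch assuming the spymemcached Java client (not something the paper prescribes); the host names, key and TTL are placeholders:

    import java.net.InetSocketAddress;
    import java.util.Arrays;
    import net.spy.memcached.MemcachedClient;

    public class CacheClientDemo {
        public static void main(String[] args) throws Exception {
            // The client knows every node; a hash of the key picks exactly one of them.
            MemcachedClient cache = new MemcachedClient(Arrays.asList(
                    new InetSocketAddress("cache-1", 11211),
                    new InetSocketAddress("cache-2", 11211)));

            cache.set("username:42", 300, "gwen");    // stored on whichever node the key hashes to
            Object name = cache.get("username:42");   // same hash, same node; a dead node is just a miss
            System.out.println(name);
            cache.shutdown();
        }
    }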

I was unable to find any published benchmarks that confirm this claim, so I ran my own unofficial benchmark using Amazon’s ElastiCache, a service which allows one to create a Memcached cluster and add nodes to it.

A few comments regarding the use of Amazon’s ElastiCache and how I ran the tests:


1. Amazon’s ElastiCache is only usable from servers on Amazon’s EC2 cloud. To run the test, I created an ElastiCache cluster with two small servers (1.3G RAM, 1 virtual core), and one EC2 micro node (613 MB, up to two virtual cores for short bursts) running Amazon’s Linux distribution.

2. I ran the test using Brutisix, a Memcached load-test framework written in PHP. The test is fairly configurable, and I ran it as follows:

- 7 gets to 3 sets read/write mix; all reads and writes were random. Values were limited to 256 bits.
- The first test ran with a key space of 10K keys, which fits easily in the memory of one Memcached node. The node was pre-warmed with the keys.
- The second test ran with the same key space and two nodes, both pre-warmed.
- The third test was one node again, with 1M keys, which do not fit in the memory of one or two nodes, and no pre-warming of the cache.
- The fourth test was two nodes with 1M keys. The second node was added after the first node was already active.
- The first three tests ran for 5 minutes each; the fourth ran for 15 minutes. The single-node tests ran with 2 threads, and the two-node tests ran with four.

3. Amazon’s cloud monitoring framework was used to monitor Memcached’s statistics. It had two annoying properties – it did not automatically refresh, and the values it showed were always 5 minutes old. In the future, it will be worth the time to install my own monitoring software on an EC2 node to track Memcached performance.

Here is a chart of the total number of gets we could run on each node:


Number of hits and misses per node:

A few conclusions from the tests I ran:

1. In the tests I ran, get latency was 2ms on the AWS cluster and 0.0068ms on my desktop. It appears that the only latency you’ll experience with Memcached is the network latency.

2. The ratio of hits and misses did not affect the total throughput of the cluster. The throughput is somewhat better with a larger key space, possibly due to fewer get collisions.

3. Throughput dropped when I added the second server, and total throughput never exceeded 60K gets per minute. It is likely that at the configuration I ran, the client could not sustain more than 60K gets per minute.

4. 60K random reads per minute at 2ms latency is pretty impressive for two very small servers, rented at 20 cents an hour. You will need a fairly high-end configuration to get the same performance from your database.

By using Memcached (or other application-side caching), the load on the database is reduced, since there are fewer connections and fewer reads. Database slowdowns will have less impact on application responsiveness: since on many pages most of the data arrives from the cache, the page can display gradually without users feeling that they are waiting forever for results. Even better, if the database is unavailable, you can still maintain partial availability of the application by displaying cached results – in the best cases, only write operations will be unavailable when the database is down.

The Architecture:

[Figure: caching architecture – the Application Business Layer talks to the Application Data Layer through a Message Queue; the data layer checks Memcached first and reaches the database through the DataSource interface (JNDI), connection pool and JDBC driver.]


New Problems:
Unlike Oracle's buffer cache, which is used by queries automatically, use of the application cache does not happen automatically and requires code changes to the application. In this sense it is somewhat similar to Oracle's result cache - it stores query results on request, rather than caching data blocks automatically. The changes required to use Memcached are usually done in the data layer. The code that queries the database is replaced by code that queries the database only if the result was not found in the cache first.

This places the burden of properly using the cache on the developers. It is said that the only difficult problems in computer science are naming things and cache invalidation. The purpose of this paper is not to solve the most difficult problem in computer science, but we will offer some advice on proper use of Memcached.

In addition, Memcached presents the usual operational questions – How big should it be, and how can it be monitored. We will discuss capacity planning and monitoring of Memcached as well.

Solutions to new problems:
The first step in integrating Memcached into your application is to re-write the functions in your data layer, so they will look for data in the cache before querying the database.

For example, the following:

function get_username(int userid) {
    username = db_select("SELECT username FROM users WHERE userid = ?", userid);
    return username;
}

Will be replaced by:

function get_username(int userid) {
    /* first try the cache */
    username = memcached_fetch("username:" + userid);
    if (!username) {
        /* not found: query the database */
        username = db_select("SELECT username FROM users WHERE userid = ?", userid);
        /* then store in the cache until the next get */
        memcached_add("username:" + userid, username);
    }
    return username;
}


We will also need to change the code that updates the database so it will update the cache as well, otherwise we risk serving stale data:

function update_username(int userid, string username) {
    /* first update the database */
    result = db_execute("UPDATE users SET username = ? WHERE userid = ?", username, userid);
    if (result) {
        /* database update successful: update the cache */
        memcached_set("username:" + userid, username);
    }
}

Of course, not every function should be cached. The cache has limited size, and there is overhead in checking the cache for data that is not actually there. The main benefit comes from caching the results of large or highly redundant queries.

To use the cache effectively without risking data corruption, keep the following in mind:

1. Use ASH data to find the queries that use the most database time. Queries that take a significant amount of time to execute, and short queries that execute very often, are good candidates for caching. Of course many of these queries use bind variables and return different results for each user. As we showed in the example, the bind variables can be used as part of the cache key to store and retrieve results for each group of binds separately. Due to the LRU nature of the cache, commonly used binds will remain in the cache and get reused, while infrequently used combinations will be evicted.

2. Memcached takes large amounts of memory (the more the merrier!) but there is evidencex that it does not scale well across a large number of cores. This makes Memcached a good candidate to share a server with an application that makes intensive use of the CPU and doesn't require as much memory. Another option is to create multiple virtual machines on a single multi-core server and install Memcached on all the virtual machines. However, this configuration means that you can lose most of your caching capacity with the crash of a single physical server.

3. Memcached is not durable. If you can't afford to lose specific information, store it in the database before you store it in Memcached. This seems to imply that you can't use Memcached to scale a system that primarily does a large number of writes. In practice, it depends on the exact bottlenecks. If your top wait event is "log file sync", you can use Memcached to reduce the total amount of work the database does, reduce the CPU load and therefore potentially reduce the "log file sync" wait.

4. Some data should be stored eventually but can be lost without critical impact to the system. Instrumentation and logging information is definitely in this category. This information can be stored in Memcached and written to the database in batches and infrequently.


5. Consider pre-populating the cache: If you rely on Memcached to keep your performance predictable, a crash of a Memcached server will send significant amounts of traffic to the database and the effects on performance will be noticeable. When the server comes back, it can take a while until all the data is loaded to the cache again, prolonging the period of reduced performance. To improve performance in the first minutes after a restart, consider a script that will pre-load data into the cache when the Memcached server starts.

6. Consider very carefully what to do when the data is updated. Sometimes it is easy to update the cache simultaneously - if a user changes his address and the address is stored in the cache, update the cache immediately after updating the database. This is the best-case scenario, as the cache remains useful through the update; the Memcached API contains functions that allow changing data atomically or avoiding race conditions. When the data in the cache is actually aggregated data, it may not be possible to update it, but it will be possible to evict the current information as irrelevant and reload it into the cache when it is next needed. This can make the cache useless when the data is updated and reloaded very frequently. Sometimes it isn't even possible to figure out which keys should be evicted from the cache when a specific field is updated, especially if the cache contains results of complex queries. This situation is best avoided, but can be dealt with by setting an expiration time on the data and preparing to serve possibly-stale data for that period of time (a short sketch of these last two strategies follows this list).
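A minimal sketch of the eviction and expiration strategies from point 6, again assuming the spymemcached client; the key names and the 60-second expiration are illustrative choices, not recommendations from the paper:

    import net.spy.memcached.MemcachedClient;

    public class CacheInvalidation {
        private final MemcachedClient cache;

        public CacheInvalidation(MemcachedClient cache) {
            this.cache = cache;
        }

        // Strategy (a): after the database update commits, evict the aggregate
        // so the next reader rebuilds it from fresh data.
        public void onOrderUpdated(int customerId) {
            cache.delete("order-totals:" + customerId);
        }

        // Strategy (b): for results we cannot reliably invalidate, cap staleness
        // with a short expiration and accept possibly-stale data for that window.
        public void cacheDashboard(int customerId, String dashboardJson) {
            cache.set("dashboard:" + customerId, 60, dashboardJson);
        }
    }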

How big should the cache be?

It is better to have many servers with less memory than few servers with a lot of memory. This minimises the impact of one crashed Memcached server. Remember that there is no performance penalty to a large number of nodes.

Losing a Memcached instance will always send additional traffic to the database. You need to have enough Memcached servers to make sure the extra traffic will not cause unacceptable latency to the application.

There are no downsides to a cache that is too large, so in general allocate to Memcached all the memory you can afford.

If the average number of gets per item is very low, you can safely reduce the amount of memory allocated.

There is no "cache size advisor" for Memcached, and it is impossible to predict the effect of adding or reducing the cache size based on the monitoring data available from Memcached. SimCache is a tool that based on detailed hit/miss logs for the existing Memcached can simulate an LRU cache and predict the hit/miss ratio in various cache sizes. In many environments keeping such detailed log is impractical, but tracking a sample of the requests could be possible and can still be used to predict cache effects.

Knowing the average latency of database reads under various loads and the latency of Memcached reads should allow you to predict changes in response time as the Memcached size and its hit ratio change. For example: you use SimCache to see that with a cache size of 10G you will have a hit ratio of 95% in Memcached. Memcached has a latency of 1ms in your system. With 5% of the queries hitting the database, you expect the database CPU utilization to be around 20%, almost 100% of the DB Time on CPU, and almost no wait time on the queue between the business and the data layers (you tested this separately when sizing your connection pool). In this case the database latency will be 5ms, so we expect the average latency for the data layer to be 0.95*1 + 0.05*5 = 1.2ms.

How do I monitor Memcached?

Monitor the number of items, gets, sets and misses. An increase in the number of cache misses will definitely mean that the database load is increasing at the same time, and can indicate that more memory is necessary. Make sure that the number of gets is higher than the number of sets. If you are setting more than getting, the cache is a waste of space. If the number of gets per item is very low, the cache may be oversized. There is no downside to an oversized cache, but you may want to use the memory for another purpose.

Monitor the number of evictions. Data is evicted when the application attempts to store a new item but there is no memory left. An increase in the number of evictions can also indicate that more memory is needed. Evicted time shows the time between the last get of an item and its eviction. If this period is short, it is a good indication that memory shortage is making the cache less effective.

It is important to note that low hit rate and high number of evictions do not immediately mean you should buy more memory. It is possible that your application is misusing the cache:

o Maybe the application sets large numbers of keys, most of which are never used again. In this case you should reconsider the way you use the cache.

o Maybe the TTL for the keys is too short. In this case you will see low hit rate but not many evictions.

o The application frequently attempts to get items that don't exist, perhaps due to data purging of some sort. Consider setting the key with a "null" value, to make sure the invalid searches do not hit the database over and over.

Monitor for swapping. Memcached is intended to speed performance by caching data in memory. If the data is spilled to disk, it is doing more harm than good.

Monitor average response time. You should see very few requests that take over 1-2ms; longer wait times can indicate that you are hitting the maximum connection limit for the server, or that CPU utilization on the server is too high.

Monitor that the number of connections to the server does not come close to the max connections settings of Memcached (configurable).

Do not monitor "stat sizes" for statistics about size of items in cache. This locks up the entire cache.


All the values I mentioned can be read from Memcached using the STAT call in its API. You can run this command and get the results directly by connecting with telnet to port 11211. Many monitoring systems, including Cacti and Ganglia, include monitoring templates for Memcached.
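As a sketch of how those counters can be pulled programmatically rather than over telnet, assuming the spymemcached Java client (not something the paper prescribes); get_hits, get_misses and evictions are standard Memcached stat names:

    import java.net.InetSocketAddress;
    import java.net.SocketAddress;
    import java.util.Arrays;
    import java.util.Map;
    import net.spy.memcached.MemcachedClient;

    public class MemcachedStatsCheck {
        public static void main(String[] args) throws Exception {
            MemcachedClient client = new MemcachedClient(Arrays.asList(
                    new InetSocketAddress("cache-1", 11211)));
            Map<SocketAddress, Map<String, String>> stats = client.getStats();
            for (Map.Entry<SocketAddress, Map<String, String>> node : stats.entrySet()) {
                Map<String, String> s = node.getValue();
                long hits = Long.parseLong(s.get("get_hits"));
                long misses = Long.parseLong(s.get("get_misses"));
                long evictions = Long.parseLong(s.get("evictions"));
                double hitRatio = (hits + misses) == 0 ? 0 : 100.0 * hits / (hits + misses);
                // Feed these numbers to whatever monitoring tool you already use.
                System.out.printf("%s: hit ratio %.1f%%, evictions %d%n",
                        node.getKey(), hitRatio, evictions);
            }
            client.shutdown();
        }
    }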


i. Traffic jam without bottleneck – experimental evidence for the physical mechanism of the formation of a jam. Yuki Sugiyama, Minoru Fukui, Macoto Kikuchi, Katsuya Hasebe, Akihiro Nakayama, Katsuhiro Nishinari, Shin-ichi Tadaki, Satoshi Yukawa. New Journal of Physics, Vol. 10 (2008), 033001.
ii. http://www.telegraph.co.uk/science/science-news/3334754/Too-many-cars-cause-traffic-jams.html
iii. Scaling Oracle8i: Building Highly Scalable OLTP System Architectures, James Morle.
iv. http://www.youtube.com/watch?v=xNDnVOCdvQ0
v. http://docs.oracle.com/javase/1.4.2/docs/guide/jdbc/getstart/datasource.html
vi. http://www.perfdynamics.com/Manifesto/USLscalability.html
vii. http://teddziuba.com/2011/02/the-case-against-queues.html
viii. http://www.cmg.org/measureit/issues/mit62/m_62_15.html
ix. http://code.google.com/p/brutis/
x. http://assets.en.oreilly.com/1/event/44/Hidden%20Scalability%20Gotchas%20in%20Memcached%20and%20Friends%20Presentation.pdf