An SRE's guide to Memcached for building scalable applications

The core of a site reliability engineer's job is building scalable applications and infrastructure.

Memcached is a general-purpose memory-caching system. This means it is designed to save (or cache) answers to questions that take a long time to compute or retrieve and are likely to be asked again. A common use case is to accelerate the use of a database: for example, if we expect to need the list of "names of all people who are on team X" repeatedly, we might put this data in Memcached rather than run a SQL query each time. (Note: Memcached is occasionally referred to as "memcache." We will stick to the full name throughout this article.)

Caches are helpful for building scalable applications and infrastructure, which is at the core of being a site reliability engineer (SRE). This article looks at what it takes to operate Memcached effectively.

Memory recommendations and daemonizing Memcached

Memcached works best when its memory limit matches the amount of memory the system can actually spare: once the stored data takes more space than the limit, Memcached evicts the least recently used items. Leave some memory for what we can call "overhead": memory needed for Memcached's administrative operations, the operating system, and ancillary processes.

The memory limit is set via the -m command-line flag (the value is in megabytes), which is likely the only flag you will need to run Memcached. The -d (daemonize) flag is usually not useful: on modern systemd-based operating systems, the service manager supervises the process, so Memcached should not daemonize itself. Likewise, if you run it under Docker, it also should not daemonize itself.

Running Memcached in a container is fine, but it is important to consider what else is running on the host and carefully tune memory requirements.
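As a rough illustration of the sizing advice above, the following Python sketch derives a candidate value for -m by subtracting an overhead reserve from the host's total memory. The function names and the reserve figures are assumptions made up for this example, not Memcached recommendations; measure your own host before settling on numbers.

```python
def memcached_memory_limit_mb(total_mb, os_reserve_mb=1024, other_processes_mb=0):
    """Return a candidate value for memcached's -m flag, in megabytes.

    The default reserve is an illustrative assumption, not a recommendation.
    """
    limit = total_mb - os_reserve_mb - other_processes_mb
    if limit <= 0:
        raise ValueError("no memory left for the cache after reserving overhead")
    return limit


def memcached_command(limit_mb, port=11211):
    """Build the argument list for a foreground (non-daemonized) memcached."""
    # No -d here: systemd or the container runtime supervises the process.
    return ["memcached", "-m", str(limit_mb), "-p", str(port)]


if __name__ == "__main__":
    # Hypothetical host: 8 GB of RAM, 512 MB used by other processes.
    limit = memcached_memory_limit_mb(total_mb=8192, other_processes_mb=512)
    print(" ".join(memcached_command(limit)))
```

The same arithmetic applies inside a container, except that the total should be the container's memory limit rather than the host's.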

Looking at Memcached data with stats

When you're running Memcached, it is often a good idea to directly connect and play around to see what is going on. It is reasonably safe, even with production instances, as long as you take a bit of care.

The safest command to run is stats. It will cause Memcached to spit out a large number of statistics and details that are often useful:

$ echo stats | nc localhost 11211
...
STAT uptime 1346
...
STAT max_connections 1024
STAT curr_connections 2
STAT total_connections 6
STAT rejected_connections 4
...
STAT get_hits 0
STAT get_misses 0
STAT get_expired 0
...
END

The most interesting statistics are usually "hits," "misses," and "expired" (get_hits, get_misses, and get_expired in the output). Together, they tell a story about how effective the cache is: the larger the share of lookups that are hits, the more work the cache is saving. If that share drops, this is a cause for concern, as it could degrade application performance.
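One way to turn these counters into a single number is a hit ratio: hits divided by hits plus misses. The sketch below parses the text returned by stats and computes that ratio. The helper names are my own, and fetch_stats assumes an unauthenticated instance speaking the text protocol on localhost:11211, as in the example above.

```python
import socket


def parse_stats(text):
    """Turn 'STAT name value' lines into a dict of name -> value strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_ratio(stats):
    """Return hits / (hits + misses), or None if there were no lookups yet."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else None


def fetch_stats(host="localhost", port=11211, timeout=2.0):
    """Send 'stats' to a Memcached server and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        data = b""
        # The reply is terminated by a line containing only END.
        while not data.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(hit_ratio(parse_stats(fetch_stats())))
```

On a freshly started instance like the one shown above, both counters are 0, so the function returns None rather than dividing by zero.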

One slightly less safe thing to try is a store-and-retrieve. This is a good way to "kick the tires" and learn how it works:

$ nc localhost 11211
set my_key 0 0 8
my_value^M
STORED
get my_key
VALUE my_key 0 8
my_value
END

After typing my_value, you need to send a DOS-style line ending: carriage return and newline. In a Linux console, press Ctrl+V and then Enter to output the carriage return character (ASCII 13, shown above as ^M), and then press Enter as usual to output the newline character (ASCII 10).

In the set command, the first 0 is for "metadata," and passing in 0 means that there is no interesting metadata. It is treated as a bitmask, so 0 has all bits turned off. The second 0 is for expiry time. It means "do not expire." In general, this is fine for testing. In production settings, it is good to set an expiry time for keys. The final number, 8, is the length of the value in bytes; it must match the data that follows.

If this is an important instance, care must be taken to not overwrite an important key. However, the ability to quickly store and retrieve via the command line allows making sure that Memcached is running correctly.
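The same store-and-retrieve exchange can be scripted, which avoids the Ctrl+V dance. The sketch below only builds the bytes for the set and get requests shown above; the function names are hypothetical, and sending the bytes over a socket is left to the caller.

```python
def build_set_command(key, value, flags=0, exptime=0):
    """Encode a text-protocol 'set' request for the given key and value."""
    payload = value.encode("utf-8")
    # The last header field is the payload length in bytes, not characters.
    header = f"set {key} {flags} {exptime} {len(payload)}\r\n".encode("ascii")
    # Both the header line and the data block end with carriage return + newline.
    return header + payload + b"\r\n"


def build_get_command(key):
    """Encode a text-protocol 'get' request for a single key."""
    return f"get {key}\r\n".encode("ascii")


if __name__ == "__main__":
    print(build_set_command("my_key", "my_value"))
    print(build_get_command("my_key"))
```

Passing a nonzero exptime, for example build_set_command("my_key", "my_value", exptime=300), produces a key that expires after 300 seconds, in line with the advice to set expiry times in production.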

Note that in a modern microservices setup, many services will want to save data in Memcached, and it is worthwhile to come up with a strategy to manage that. One option is to run a Memcached instance, or a cluster, per service. However, this is often complicated and high maintenance. Often, the right thing is to have services share a Memcached instance. In that case, it is a good idea to implement some reasonable policies; for example, mandating that each key be prefixed with the service name. This allows checking which services use how much space by using the stats items and stats cachedump commands:

$ echo 'stats items' | nc -w 1 localhost 11211|grep ':number '
STAT items:1:number 2

This command shows all the "slab" IDs in use. Memcached stores similarly sized items in slabs; this example has only one slab, with ID 1, so the next step is to dump the keys in that slab:

$ echo 'stats cachedump 1 1000' | nc -w 1 localhost 11211
ITEM my_key [8 b; 0 s]
ITEM foo [5 b; 0 s]
END

Here, there are two keys: one with an 8-byte value and one with a 5-byte value.

In a more realistic scenario with many keys, you might want to process this data with awk or a script and, by using a local convention, figure out how much space each service is using.
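A minimal sketch of such a script follows. It assumes a local convention of service:rest-of-key (the colon separator is my assumption, not something Memcached enforces) and sums the value sizes reported by stats cachedump per service prefix. Keys without a prefix are grouped under a placeholder name.

```python
import re
from collections import Counter

# Matches lines such as: ITEM my_key [8 b; 0 s]
ITEM_LINE = re.compile(r"^ITEM (\S+) \[(\d+) b; (\d+) s\]$")


def bytes_per_service(cachedump_text, separator=":"):
    """Sum value sizes from 'stats cachedump' output by key prefix."""
    usage = Counter()
    for line in cachedump_text.splitlines():
        match = ITEM_LINE.match(line.strip())
        if not match:
            continue  # skips END and anything unexpected
        key, size = match.group(1), int(match.group(2))
        if separator in key:
            service = key.split(separator, 1)[0]
        else:
            service = "(unprefixed)"
        usage[service] += size
    return dict(usage)


if __name__ == "__main__":
    # Hypothetical dump with two services following the convention.
    sample = (
        "ITEM billing:invoice:1 [8 b; 0 s]\n"
        "ITEM billing:invoice:2 [5 b; 0 s]\n"
        "ITEM search:q [20 b; 0 s]\n"
        "END\n"
    )
    print(bytes_per_service(sample))
```

Because cachedump output can be truncated for large slabs, it is safer to treat the totals as a sample of usage than as an exact accounting.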

This can be integrated into a monitoring system, like Prometheus, to track behavior over time.
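For instance, per-service byte counts (however you collect them) could be rendered in the Prometheus text exposition format and served or written out for scraping. The metric name below is invented for this example, and the sketch assumes service names are plain identifiers that need no label escaping.

```python
def to_prometheus_text(usage, metric="memcached_service_value_bytes"):
    """Render a {service: bytes} mapping as Prometheus text exposition lines."""
    lines = [
        f"# HELP {metric} Cached value bytes per service key prefix.",
        f"# TYPE {metric} gauge",
    ]
    # Sort for stable output between scrapes.
    for service in sorted(usage):
        lines.append(f'{metric}{{service="{service}"}} {usage[service]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical numbers for two services.
    print(to_prometheus_text({"search": 20, "billing": 13}), end="")
```

A gauge fits here because the amount of cached data per service can go down as well as up.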

Additionally, since many teams share the same service, it is a useful guideline to suggest that services encrypt and authenticate the data they are caching. Symmetric encryption on modern CPUs is performant, and this allows much simpler security models. One example of a library that supports this is cryptography's Fernet. (Let me know in the comments if you want to read more about this.)
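As a small sketch of what this could look like with Fernet, the helpers below encrypt a value before it is cached and return None when a cached token fails authentication. The helper names are hypothetical, the example needs the third-party cryptography package, and key distribution (each service needs its own secret key, stored outside the cache) is out of scope here.

```python
from cryptography.fernet import Fernet, InvalidToken


def seal(cipher, value):
    """Encrypt and authenticate bytes before storing them in the cache."""
    return cipher.encrypt(value)


def unseal(cipher, token):
    """Return the original bytes, or None if the token is not authentic."""
    try:
        return cipher.decrypt(token)
    except InvalidToken:
        # Tampered data, or data written with another service's key:
        # treat it like a cache miss.
        return None


if __name__ == "__main__":
    # In practice the key comes from a secret store, not generated per run.
    cipher = Fernet(Fernet.generate_key())
    token = seal(cipher, b"my_value")
    print(unseal(cipher, token))
```

Treating a failed check as a cache miss means a service simply recomputes the value instead of trusting data it cannot verify. Keep in mind that a Fernet token is larger than the plaintext, which slightly increases the space each cached value takes.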

Conclusion

Memcached is a common open source technology for an SRE to support. In this article, I looked at how to query common statistics about it, offered some configuration advice, and showed how to help multiple teams share a single Memcached cluster in an easily monitored and secure way. Do you have other questions on how to manage Memcached? Ask in the comments, and I will happily share more tips.