How to scale Ruby on Rails with Redis

Learn how Redis can fix sluggish performance as business applications grow.

Ruby on Rails is a useful framework for quickly building line-of-business applications. But as our applications grow, we face scaling challenges. There are a variety of tools we can use—but adding different technologies to our applications increases complexity. This article explores how to use the Redis in-memory data structure store as a multi-purpose tool to solve different problems.

First, we need to install Redis, which can be done with brew, apt-get, or docker. And, of course, we need to have Ruby on Rails. As an example, we will build an online event-management application. Here are the basic models.

class User < ApplicationRecord
  has_many :tickets
end
class Event < ApplicationRecord
  has_many :tickets
end
class Ticket < ApplicationRecord
  belongs_to :user
  belongs_to :event
end

Redis as a cache

The application's first requirement is to show how many tickets an event sold and how much money it earned. We will create these methods.

class Event < ApplicationRecord
  def tickets_count
    tickets.count
  end
  def tickets_sum
    tickets.sum(:amount)
  end
end

This code will fire SQL queries against our database to fetch the data. The problem is that it may become slow as the application scales. To speed things up, we can cache the results of these methods. First, we need to enable caching with Redis for our application. Add gem 'redis-rails' to the Gemfile and run bundle install. In config/environments/development.rb, configure:

config.cache_store = :redis_store, {
  expires_in: 1.hour,
  namespace: 'cache',
  redis: { host: 'localhost', port: 6379, db: 0 }
}

Specifying a cache namespace is optional, but it helps keep cache keys identifiable. This code also sets the default application-level expiration to one hour, after which Redis will purge stale data via its time-to-live (TTL) mechanism. Now we can wrap our methods in cache blocks.

class Event < ApplicationRecord
  def tickets_count
    Rails.cache.fetch([cache_key, __method__], expires_in: 30.minutes) do
      tickets.count
    end
  end
  def tickets_sum
    Rails.cache.fetch([cache_key, __method__]) do
      tickets.sum(:amount)
    end
  end
end

Rails.cache.fetch will check whether a specific key exists in Redis. If the key exists, it will return the associated value to the application without executing the block. If the key does not exist, Rails will run the code within the block and store the result in Redis. cache_key is a method provided by Rails that combines the model name, primary key, and last-updated timestamp to create a unique Redis key. We add __method__, which uses the name of the specific method to make each method's key unique. And we can optionally specify different expirations on individual methods. The data in Redis will look like this.

{"db":0,"key":"cache:events/1-20180322035927682000000/tickets_count:","ttl":1415,
"type":"string","value":"9",...}

{"db":0,"key":"cache:events/1-20180322035927682000000/tickets_sum:","ttl":3415,
"type":"string","value":"127",...}

{"db":0,"key":"cache:events/2-20180322045827173000000/tickets_count:","ttl":1423,
"type":"string","value":"16",...}

{"db":0,"key":"cache:events/2-20180322045827173000000/tickets_sum:","ttl":3423,
"type":"string","value":"211",...}

In this situation, event 1 sold nine tickets totaling $127 and event 2 sold 16 tickets totaling $211.
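Independent of Rails, the fetch-or-compute pattern behind Rails.cache.fetch can be sketched in plain Ruby. This is a simplified model only; the real cache store also handles TTLs, serialization, and the Redis connection.

```ruby
# A minimal in-memory sketch of the fetch-or-compute pattern used by
# Rails.cache.fetch. It only models the key lookup: return the cached
# value if the key exists, otherwise run the block and cache its result.
class MiniCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = MiniCache.new
calls = 0
3.times { cache.fetch("events/1/tickets_count") { calls += 1; 9 } }
# The block runs only once; the other two calls hit the cache.
```

This is why the expensive tickets.count query fires at most once per cache key until the key expires or changes.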

Cache busting

What if another ticket is sold right after we cache this data? The website will show cached content until Redis purges these keys with TTL. It might be OK in some situations to show stale content, but we want to show accurate, current data. This is where the last updated timestamp is used. We will specify a touch: true callback from the child model (ticket) to the parent (event). Rails will touch the updated_at timestamp, which will force creation of a new cache_key for the event model.

class Ticket < ApplicationRecord
  belongs_to :event, touch: true
end
# data in Redis
{"db":0,"key":"cache:events/1-20180322035927682000000/tickets_count:","ttl":1799,
  "type":"string","value":"9",...}
{"db":0,"key":"cache:events/1-20180322035928682000000/tickets_count:","ttl":1800,
  "type":"string","value":"10",...}
...

The pattern is: Once we create a combination of cache key and content, we do not change it. We create new content with a new key, and previously cached data remains in Redis until TTL purges it. This wastes some Redis RAM but it simplifies our code, and we do not need to write special callbacks to purge and regenerate cache.

We need to be careful selecting our TTL: if our data changes frequently and the TTL is long, we will store too much unused cache; if the data changes infrequently and the TTL is too short, we will regenerate the cache even when nothing has changed. I've written some suggestions on how to balance this.

A note of caution: Caching should not be a Band-Aid solution. We should look for ways to write efficient code and optimize database indexes. But sometimes caching is still necessary and can be a quick solution to buy time for a more complex refactor.

Redis as a queue

The next requirement is to generate reports for one or multiple events showing detailed stats on how much money each event received and listing the individual tickets sold with user info.

class ReportGenerator
  def initialize(event_ids)
    @event_ids = event_ids
  end

  def perform
    # query DB and output data to XLSX
  end
end

Generating these reports may be slow, as data must be gathered from multiple tables. Instead of making users wait for a response while the spreadsheet downloads, we can turn report generation into a background job and, when it finishes, send an email with an attachment or a link to the file.

Ruby on Rails has an Active Job framework that can use a variety of queues. In this example, we will leverage the Sidekiq library, which stores data in Redis. Add gem 'sidekiq' to the Gemfile and run bundle install. We will also use the sidekiq-cron gem to schedule recurring jobs.

# in config/environments/development.rb
config.active_job.queue_adapter = :sidekiq

# in config/initializers/sidekiq.rb
schedule = [
  { 'name' => 'my_job', 'class' => 'MyJob', 'cron' => '1 * * * *',
    'queue' => 'default', 'active_job' => true }
]
Sidekiq.configure_server do |config|
  config.redis = { host: 'localhost', port: 6379, db: 1 }
  Sidekiq::Cron::Job.load_from_array! schedule
end
Sidekiq.configure_client do |config|
  config.redis = { host: 'localhost', port: 6379, db: 1 }
end

Note that we are using a different Redis database for Sidekiq. It is not a requirement, but it can be useful to store cache in a separate Redis database (or even on a different server) in case we need to flush it.

We can also create another config file for Sidekiq to specify which queues it should watch. We do not want to have too many queues, but having only one queue can lead to situations where it gets clogged with low-priority jobs and delays a high-priority job. In config/sidekiq.yml:

---
:queues:
  - [high, 3]
  - [default, 2]
  - [low, 1]
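With these weights, each time a worker looks for its next job it favors high roughly 3x as often as low, so low-priority jobs are deprioritized but never starved. A rough plain-Ruby sketch of that weighted selection idea (an illustration only, not Sidekiq's actual polling implementation):

```ruby
# Sketch of weighted queue selection: each check, a queue is drawn with
# probability proportional to its weight (high:3, default:2, low:1).
# Illustrative only; Sidekiq's real polling logic differs in detail.
QUEUE_WEIGHTS = { 'high' => 3, 'default' => 2, 'low' => 1 }

def pick_queue(weights, rng = Random.new)
  expanded = weights.flat_map { |name, w| [name] * w }
  expanded[rng.rand(expanded.size)]
end

rng = Random.new(1)
counts = Hash.new(0)
10_000.times { counts[pick_queue(QUEUE_WEIGHTS, rng)] += 1 }
# high is drawn most often and low least often, but every queue is served
```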

Now we will create the job and assign it to the low-priority queue.

class ReportGeneratorJob < ApplicationJob
  queue_as :low
  self.queue_adapter = :sidekiq

  def perform(event_ids)
    # either call ReportGenerator here or move the code into the job
  end
end

We can optionally set a different queue adapter. Active Job allows us to use different queue backends for different jobs within the same application. We can have jobs that need to run millions of times per day. Redis could handle this, but we might want to use a different service like AWS Simple Queue Service (SQS). I wrote a comparison of different queue options that might be helpful to you.

Sidekiq takes advantage of many Redis data types. It uses Lists to store jobs, which makes queuing really fast. It uses Sorted Sets to delay job execution (either specifically requested by an application or when doing an exponential backoff on retry). Redis Hashes store statistics on how many jobs were executed and how long they took.
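The List-backed queue behaves like a simple pipeline: clients LPUSH serialized jobs onto one end and workers BRPOP from the other, yielding first-in, first-out order. A plain-Ruby sketch of those semantics, with an Array standing in for the Redis List:

```ruby
# Sketch of how a Redis List acts as a FIFO job queue: clients LPUSH
# onto the left end, workers (B)RPOP from the right end. An Array
# stands in for the Redis List here.
class SketchQueue
  def initialize
    @list = []
  end

  def lpush(job)   # client side: enqueue a job
    @list.unshift(job)
  end

  def rpop         # worker side: dequeue the oldest job
    @list.pop
  end
end

q = SketchQueue.new
q.lpush('job-1')
q.lpush('job-2')
q.rpop  # => "job-1" (first in, first out)
```

Both LPUSH and RPOP are O(1) in Redis, which is why enqueuing and dequeuing stay fast regardless of queue depth.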

Recurring jobs are also stored in Hashes. We could have used plain Linux cron to kick off the jobs, but that would introduce a single point of failure into our system. With Sidekiq-cron, the schedule is stored in Redis, and any of the servers where Sidekiq workers run can execute the job (the library ensures that only one worker will grab a specific job at a scheduled time). Sidekiq also has a great UI where we can view various stats and either pause scheduled jobs or execute them on demand.

Redis as a database

The last business requirement is to track how many visits there are to each event page so we can determine their popularity. For that we will use Sorted Sets. We can either create the REDIS_CLIENT directly to call native Redis commands or use the Leaderboard gem, which provides additional features.

# config/initializers/redis.rb
REDIS_CLIENT = Redis.new(host: 'localhost', port: 6379, db: 1)
# config/initializers/leaderboard.rb
redis_options = {:host => 'localhost', :port => 6379, :db => 1}
EVENT_VISITS = Leaderboard.new('events_visits', Leaderboard::DEFAULT_OPTIONS, redis_options)

Now we can call it from the controller's show action:

class EventsController < ApplicationController
  def show
    ...
    REDIS_CLIENT.zincrby('events_visits', 1, @event.id)
    # or
    EVENT_VISITS.change_score_for(@event.id, 1)
  end
end
# data in Redis
{"db":1,"key":"events_visits","ttl":-1,"type":"zset","value":[["1",1.0],...,["2",4.0],["7",22.0]],...}

Adding items to a Sorted Set eventually slows down once it holds millions of members, but Redis is plenty fast for most use cases. We can now use this Sorted Set to determine the rank and score of each event, or display the top 10 events with REDIS_CLIENT.zrevrange('events_visits', 0, 9). Note that zrevrange returns members ordered from highest score to lowest; zrange would list the least-visited events first.
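The score-and-rank logic behind that Sorted Set can be sketched in plain Ruby with a Hash of scores. This illustrates the semantics of incrementing a member's score and reading the top N members; it is not a replacement for Redis.

```ruby
# Plain-Ruby sketch of the Sorted Set operations used above: zincrby
# bumps a member's score; zrevrange lists members from highest score
# to lowest (the "top N").
class SketchLeaderboard
  def initialize
    @scores = Hash.new(0.0)
  end

  def zincrby(by, member)
    @scores[member] += by
  end

  def zrevrange(start, stop)
    @scores.sort_by { |_, score| -score }.map(&:first)[start..stop]
  end
end

board = SketchLeaderboard.new
22.times { board.zincrby(1, 'event:7') }
4.times  { board.zincrby(1, 'event:2') }
board.zincrby(1, 'event:1')
board.zrevrange(0, 1)  # => ["event:7", "event:2"]
```

In real Redis the members are kept permanently sorted (via a skip list), so reads like this do not re-sort on every call the way this sketch does.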

Since we are using Redis to store very different types of data (cache, jobs, etc.), we need to be careful not to run out of RAM. Redis will evict keys on its own, but it cannot tell the difference between a key holding stale cache vs. something important to our application.
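One way to protect important data (an assumption about your deployment, adjust to your own memory budget) is to cap Redis memory and use the volatile-lru eviction policy in redis.conf, which evicts only keys that have a TTL set. Cache entries carry a TTL and can be evicted, while persistent keys such as Sidekiq job data and our Sorted Set do not and are left alone:

```
maxmemory 256mb
maxmemory-policy volatile-lru
```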

Further information

I hope this article was a useful introduction to using Redis for a variety of purposes in a Ruby on Rails application.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.