Scaling PostgreSQL with Kubernetes Operators

Operators let users create standardized interfaces for managing stateful applications, like PostgreSQL, across Kubernetes-enabled cloud environments.

Running applications in containers and orchestrating their lifecycles with Kubernetes have transformed how teams manage large-scale deployments. Containers are typically used to perform a short-lived task, such as sending a message, or to run an application that does not need to store information, such as a web server. A new container can be created with no knowledge of any work that previously occurred (in other words, it is "stateless"), and this is completely fine for most applications.

The defining property of a database system is that it maintains state: a running process can modify the data on disk, and those changes persist even after the process terminates. In other words, when you use a database system, your data must remain stored on disk whether or not the system is running.
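The durability property described above can be illustrated with a minimal sketch. This is not how PostgreSQL stores data; the file name `counter.json` and the helpers `save_state`, `load_state`, and `run_once` are hypothetical, chosen only to show state outliving the code that wrote it.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state to disk and flush it before returning."""
    with open(path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the device


def load_state(path):
    """Read state back; with no file yet, start from an empty state."""
    if not os.path.exists(path):
        return {"orders": 0}
    with open(path) as f:
        return json.load(f)


def run_once(path):
    """Simulate one process lifetime: load, modify, persist, exit."""
    state = load_state(path)
    state["orders"] += 1
    save_state(path, state)
    return state["orders"]


data_dir = tempfile.mkdtemp()
state_file = os.path.join(data_dir, "counter.json")
first = run_once(state_file)   # first "process"
second = run_once(state_file)  # a later "process" sees the earlier write
```

A stateless container would begin from zero on every run. Here the second run picks up where the first left off because the file outlives the function call, which is the behavior a database depends on.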

On the surface, this is at odds with the stateless, "serverless" world of containers: a database has to keep its state after a process terminates. However, if an application can give Kubernetes additional knowledge about how to maintain that state beyond the lifecycle of a pod, databases can take advantage of Kubernetes' orchestration features, and teams gain the ability to run their own database-as-a-service platforms.

Kubernetes Operators & PostgreSQL

A Kubernetes Operator is a type of application that gives Kubernetes the additional, application-specific knowledge it needs to manage a stateful application. With that knowledge, Kubernetes can maintain the application's full lifecycle and ensure that critical assets, such as data, remain safe and accessible.
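Operators are commonly built around a reconcile step that compares the desired state of an application with what is actually running and works out how to close the gap. The following is a minimal sketch of that idea, not the Crunchy PostgreSQL Operator's implementation; `ClusterState`, `reconcile`, the action strings, and the version value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    version: str   # PostgreSQL major version (illustrative value below)
    replicas: int  # number of replicas in addition to the primary


def reconcile(desired, observed):
    """Return the ordered actions needed to move observed toward desired."""
    actions = []
    if observed is None:
        # Nothing exists yet: storage comes first so the data has a home.
        actions.append("provision storage")
        actions.append(f"start primary on PostgreSQL {desired.version}")
        observed = ClusterState(desired.version, 0)
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    diff = desired.replicas - observed.replicas
    if diff > 0:
        actions.extend(["add replica"] * diff)
    elif diff < 0:
        actions.extend(["remove replica"] * -diff)
    return actions


desired = ClusterState(version="11", replicas=2)
plan_new = reconcile(desired, None)
plan_steady = reconcile(desired, desired)
plan_scale_down = reconcile(desired, ClusterState("11", 3))
```

When the observed state already matches the desired state, the plan is empty, so running the step repeatedly is harmless. The application-specific knowledge lives in the ordering rules, such as provisioning storage before starting the primary.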

With a database, like the popular open source PostgreSQL database, an Operator can help with actions including:

Provisioning: allocating disk space where the data will be permanently stored
Scaling: safely creating a replica (or copy) of the database that is consistently kept up to date
High availability: ensuring applications can always read and write to the database even if a node becomes unavailable
User management: enabling user access and remembering permissions to specific databases within an environment
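To make the high-availability item in the list above concrete, here is a sketch of one common failover rule: when the primary is lost, promote the healthy replica that has replayed the most changes. This is an assumption about how such a decision could be made, not a description of any particular Operator; `Replica`, `choose_new_primary`, and the integer positions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    healthy: bool
    replayed_position: int  # how far this replica has applied changes


def choose_new_primary(replicas):
    """Pick the healthy replica that is furthest ahead, or None."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    return max(healthy, key=lambda r: r.replayed_position)


replicas = [
    Replica("replica-a", healthy=True, replayed_position=1040),
    Replica("replica-b", healthy=False, replayed_position=1100),
    Replica("replica-c", healthy=True, replayed_position=1075),
]
promoted = choose_new_primary(replicas)
```

Here `replica-b` is furthest ahead but unhealthy, so `replica-c` is chosen. Preferring the most advanced healthy replica keeps the amount of lost work as small as possible.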

An Operator can also handle additional operating requirements, such as running a particular version of the database software, using a particular storage system or hardware profile, or deploying the databases to particular server nodes in a cluster.
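Requirements like these eventually become fields in the Kubernetes objects that an Operator creates. The sketch below builds two such fragments as plain Python dictionaries, using standard Kubernetes fields (`storageClassName`, `nodeSelector`, `persistentVolumeClaim`). The helper names, the cluster name `hippo`, the image reference, the storage class, the node label, and the mount path are all hypothetical.

```python
def build_storage_claim(cluster_name, size, storage_class):
    """PersistentVolumeClaim requesting disk from a given storage system."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{cluster_name}-data"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def build_pod_spec(cluster_name, image, node_labels):
    """Pod spec pinned to labeled nodes and mounting the claim above."""
    return {
        # Only nodes carrying all of these labels are eligible.
        "nodeSelector": dict(node_labels),
        "containers": [{
            "name": "postgres",
            "image": image,  # the tag selects the database version
            "volumeMounts": [{"name": "data", "mountPath": "/pgdata"}],
        }],
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": f"{cluster_name}-data"},
        }],
    }


claim = build_storage_claim("hippo", "10Gi", "fast-ssd")
pod = build_pod_spec("hippo", "example.com/postgres:11", {"disktype": "ssd"})
```

The claim name links the two fragments: the pod mounts whatever volume satisfies `hippo-data`, so the data stays attached to the claim rather than to any single container.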

Crunchy Data's open source Crunchy PostgreSQL Operator is an Operator for PostgreSQL that is used in many production environments. It provides a simple yet powerful command-line interface that lets users deploy their own database-as-a-service system on any Kubernetes-enabled platform.

For example, with the pgo create command, which is used to provision a database, you can set up a distributed, high-availability PostgreSQL cluster with full disaster recovery support along with a database monitoring sidecar powered by pgMonitor. In other words, instead of following a complicated multi-step process or writing your own scripts, you can create the type of PostgreSQL system required for production with a single command.

While this may seem like overkill if you are managing a handful of database clusters, the value of an Operator grows significantly when you need to support hundreds or thousands of different clusters. Having a standardized set of commands with the ability to flexibly deploy clusters for different workloads both eases the administration burden and provides more options for how a team can develop and deploy workloads into production.
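One way to picture standardized deployment across many clusters is a small set of workload profiles that each new cluster starts from. This is a sketch only; the profile names, their fields, and `cluster_spec` are hypothetical and do not reflect the pgo command-line interface.

```python
# Hypothetical workload profiles shared by every team.
PROFILES = {
    "dev": {"replicas": 0, "storage": "5Gi", "backups": False},
    "production": {"replicas": 2, "storage": "500Gi", "backups": True},
}


def cluster_spec(name, profile, **overrides):
    """Build a cluster description from a profile plus optional overrides."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    spec = {"name": name, "profile": profile, **PROFILES[profile]}
    spec.update(overrides)
    return spec


# Two hundred development clusters and one tuned production cluster,
# all described through the same interface.
fleet = [cluster_spec(f"team-{i:03d}", "dev") for i in range(200)]
fleet.append(cluster_spec("billing", "production", storage="1Ti"))
```

Because every cluster is described the same way, adding the two-hundred-and-first cluster costs no more than adding the first, while overrides leave room for workloads that need something different.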

Open source as a service

While many large cloud providers offer various open source platforms "as a service," this has the effect of locking people into a particular cloud infrastructure. Using Kubernetes and an open source Operator like the Crunchy PostgreSQL Operator lets users choose the cloud environments in which they want to run their stateful services (have Kubelet, will travel!), all while managing them from a standardized interface.

Jonathan Katz will present Operating PostgreSQL with Kubernetes at Scale at the 17th annual Southern California Linux Expo (SCaLE 17x) March 7-10 in Pasadena, Calif.