Sysadmins, site reliability engineers (SREs), and cloud operators all too often struggle to feel confident in their infrastructure as it scales up. Also too often, they think the only way to solve their challenges is to write a tool for in-house use. Fortunately, there are options. There are many open source tools available to test an infrastructure's performance. Here are my favorites.
Pbench is a performance testing harness to make executing benchmarks and performance tools easier and more convenient. In short, it:
- Excels at running micro-benchmarks on large scales of hosts (bare-metal, virtual machines, containers, etc.) while automating a potentially large set of benchmark parameters
- Focuses on installing, configuring, and executing benchmark code and performance tools and not on provisioning or orchestrating the testbed (e.g., OpenStack, RHEV, RHEL, Docker, etc.)
- Is designed to work in concert with provisioning tools like BrowBeat or Ansible playbooks
Baselining is a critical aspect of infrastructure reliability. Ripsaw is a performance benchmark Operator for launching workloads on Kubernetes. It deploys as a Kuberentes Operator that then deploys common workloads, including specific applications (e.g., Couchbase) or general performance tests (e.g., Uperf) to measure and establish a performance baseline.
The collection of tools in OpenShift Scale, OpenShift's open source solution for performance testing, do everything from spinning up OpenShift on OpenStack installations (TripleO Install and ShiftStack Install), installing on Amazon Web Services (AWS), or providing containerized tooling, like running Pbench on your cluster or doing cluster limits testing, network tests, storage tests, metric tests with Prometheus, logging, and concurrent build testing.
Scale's CI suite is flexible enough to both add workloads and include your workloads when deploying to Azure or anywhere else you might run. You can see the full suite of tools on GitHub.
Browbeat calls itself "a performance tuning and analysis tool for OpenStack." You can use it to analyze and tune the deployment of your workloads. It also automates the deployment of standard monitoring and data analysis tools like Grafana and Graphite. Browbeat is maintained on GitHub.
Smallfile is a filesystem workload generator targeted for scale-out, distributed storage. It has been used to test a number of open filesystem technologies, including GlusterFS, CephFS, Network File System (NFS), Server Message Block (SMB), and OpenStack Cinder volumes. It is maintained on GitHub.
Ceph Benchmarking Tool
Ceph Benchmarking Tool (CBT) is a testing harness that can automate tasks for testing Ceph cluster performance. It records system metrics with collectl, and it can collect more information with tools including perf, blktrace, and valgrind. CBT can also do advanced testing that includes automated object storage daemon outages, erasure-coded pools, and cache-tier configurations.
Contributors have extended CBT to use Pbench monitoring tools and Ansible and to run the Smallfile benchmark. A separate Grafana visualization dashboard uses Elasticsearch data generated by Automated Ceph Test.
Satellite-performance (satperf) is a set of Ansible playbooks and helper scripts to deploy Satellite 6 environments and measure the performance of selected actions, such as concurrent registrations, remote execution, Puppet operations, repository synchronizations and promotions, and more. You can find Satperf on GitHub.
Sysadmins, SREs, and cloud operators face a wide variety of challenges as they work to scale their infrastructure, but luckily there is also a wide variety of tools to help them get past those common issues. Any of these seven tools should help you get started testing your infrastructure's performance as it scales.
Are there other open source performance and scaling tools that should be on this list? Add your favorites in the comments.