Gating production in DevOps

When we think about gates, we think about having something to protect. Gates are most often used to provide a physical boundary for the sake of security. They are made of metal, wood, or plastic, and sometimes they are even made of software. They protect something important to us from unwanted risk of damage.

Gates are essential to DevOps

Before we get into gates, let's take a step back and discuss the role of DevOps practices in the software delivery lifecycle (SDLC). I believe DevOps' role is to take accountability for, and reduce, the risk inherent in managing the SDLC. This risk is measured in every key business factor, from money to time. For a deeper look at the SDLC as it relates to DevOps, read Bryant Son's article about DevOps pipelines.

The entire set of DevOps practices is built around three pillars: continuous integration, continuous deployment/delivery, and continuous monitoring. A mistake in setting up any of these pillars leads to a troubled development process. To reduce that risk, many people recommend using the following testing approaches at the appropriate places in your SDLC:

Unit tests
Integration tests
Functional tests
Penetration tests
Acceptance tests

When it comes time to put some assurance on your software's quality and readiness, someone has to sign off and say, "go ahead." When tests are well-designed, passing them means, in effect, your product is good enough to be put in your customers' hands.
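As a rough illustration of the sign-off idea, the sketch below treats each test stage as a gate and only says "go ahead" when every stage passes. The stage names mirror the list above; the run_gates function and the lambda checks are hypothetical placeholders, not part of any particular tool.

```python
from typing import Callable, List, Optional, Tuple

# Each gate is a (name, check) pair; check returns True when the stage passes.
Gate = Tuple[str, Callable[[], bool]]


def run_gates(gates: List[Gate]) -> Tuple[bool, Optional[str]]:
    """Run gates in order and stop at the first failure.

    Returns (True, None) when every gate passes, otherwise
    (False, name_of_the_gate_that_failed).
    """
    for name, check in gates:
        if not check():
            return False, name
    return True, None


# Hypothetical stand-ins for real test suites.
example_gates: List[Gate] = [
    ("unit", lambda: True),
    ("integration", lambda: True),
    ("functional", lambda: False),  # simulate a failing stage
    ("penetration", lambda: True),
    ("acceptance", lambda: True),
]

approved, failed_at = run_gates(example_gates)
print(approved, failed_at)
```

Stopping at the first failure keeps feedback fast: later, slower stages never run against a change that has already been rejected.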

What do we have to understand about testing to appropriately protect our customers from premature changes to our product?

Types of gates

Gates need tests and approvals that are precise enough to protect the SDLC without slowing down software delivery.

There are two categories of gates I would like to discuss: manual and automated.

Manual gates

In some organizations, testing even the most basic functionality of a product is considered a full-time job for a quality assurance (QA) engineer. Manual gates require sign-off from a QA team member: QA engineers run a few tests and certify that the product is ready to be promoted to the next step in the process toward being put in customers' hands.

Manual approvals

Assume you have a release process that goes through a change management process. Before you execute a change, you need someone, typically a change manager, to review and approve your change request.
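A minimal sketch of such an approval gate might look like the following. The ChangeRequest class, the execute function, and the rule that requesters cannot approve their own changes are all assumptions made for illustration; real change management tools will differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """A hypothetical change request awaiting manual approval."""
    summary: str
    requested_by: str
    approved_by: Optional[str] = None

    def approve(self, approver: str) -> None:
        # Assumed policy: requesters may not approve their own change.
        if approver == self.requested_by:
            raise PermissionError("requester cannot approve their own change")
        self.approved_by = approver

    @property
    def is_approved(self) -> bool:
        return self.approved_by is not None


def execute(change: ChangeRequest) -> str:
    # The gate: refuse to run until someone has signed off.
    if not change.is_approved:
        raise RuntimeError("change request has not been approved")
    return f"executing: {change.summary}"
```

Calling execute before approve raises an error, which is the whole point: the change cannot proceed until a reviewer has signed off.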

Manual tests

After the manual approval, a QA engineer (or a similar position dedicated to testing) runs tests manually on the changes. Their work is usually quite thorough and can identify challenges that would be difficult to detect with automated tests.

Automated gates

Automated gates use software to manage approvals toward the next step in software development.

Automated approvals

Assume you've written an execution plan using HashiCorp's Terraform to spin up your infrastructure, leveraging the benefits of infrastructure-as-code, but you want to validate whether the resources will be created with the development team's required quantity and specifications. By running terraform apply without a saved plan file and without the -auto-approve flag, you use Terraform's built-in interactive approval process, which shows the proposed changes and puts up a gate that requires your confirmation before applying the configuration (more on the Terraform workflow). Note that applying a saved plan file, as in terraform apply -input=false my_terraform_plan, does not prompt for confirmation, because Terraform treats a saved plan as already reviewed; in that workflow, the gate is the review that happens between terraform plan and terraform apply. You can also use the Jenkins Pipeline: Input Step plugin to wait for your approval after terraform plan before applying the configuration. Jenkins is a common DevOps pipeline tool that can reduce friction in these processes.
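One way to place an explicit gate between terraform plan and terraform apply is sketched below. The gated_apply function and its approve and run callbacks are hypothetical names introduced for this example; the plan file name comes from the text above, and the Terraform flags used are -input=false and -out.

```python
import subprocess
from typing import Callable, List

PLAN_FILE = "my_terraform_plan"  # file name taken from the text above


def plan_command(plan_file: str = PLAN_FILE) -> List[str]:
    # Write the plan to a file so that exactly what was reviewed gets applied.
    return ["terraform", "plan", "-input=false", f"-out={plan_file}"]


def apply_command(plan_file: str = PLAN_FILE) -> List[str]:
    # The saved plan is applied as-is, so approval must happen before this call.
    return ["terraform", "apply", "-input=false", plan_file]


def gated_apply(approve: Callable[[], bool],
                run: Callable[[List[str]], None]) -> bool:
    """Plan, wait for approval, then apply. Returns True if applied."""
    run(plan_command())
    if not approve():
        return False
    run(apply_command())
    return True


def run_for_real(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if Terraform exits non-zero.
    subprocess.run(cmd, check=True)
```

In a real pipeline you might call gated_apply with run_for_real and an approve callback that waits for a human (or for a Jenkins input step). Passing the runner in as a parameter also makes the gate easy to exercise without touching any infrastructure.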

Automated tests

The more testing we can do before a patch makes it through a gate, the better. Automating tests increases the likelihood that the update does what we want it to do. Say you are updating your infrastructure by sending a new configuration file to your proxy server, Nginx. You can know ahead of time that the update will work as designed if you run something like InSpec to verify the Nginx state will be what you expect it to be after the deployment:

# Verify that the Nginx service is installed, enabled at boot, and running
describe service('nginx') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end

If InSpec reports a failed test, you know the updated configuration is not safe for production, and that your gates are effectively supporting your customers' need for safe deployment.
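A pipeline typically acts on the exit status of the test run rather than on its printed output. The sketch below maps an exit status to a gate decision. It assumes the exit codes documented for inspec exec (0 when all tests pass, 100 when at least one test fails, 101 when there are no failures but some tests were skipped), so check them against your installed version. The GateResult names and the choice to send skipped-only runs to a human reviewer are assumptions for this example.

```python
from enum import Enum


class GateResult(Enum):
    OPEN = "open"      # promote the change
    CLOSED = "closed"  # block the change
    REVIEW = "review"  # a human should look before promoting


def gate_from_inspec_exit(code: int) -> GateResult:
    """Map an `inspec exec` exit status to a gate decision.

    Assumed exit codes: 0 = all passed, 100 = at least one failure,
    101 = no failures but some tests skipped. Anything else is treated
    as an error and blocks the change.
    """
    if code == 0:
        return GateResult.OPEN
    if code == 101:
        return GateResult.REVIEW
    return GateResult.CLOSED
```

Treating every unknown status as CLOSED is the conservative choice: a gate that fails open protects nothing.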

In another example, let's say you deployed a Docker Swarm cluster and need to verify a service named myservice. Below is the InSpec code for this scenario:

# Verify the Docker Swarm service named "myservice"
describe docker_service('myservice') do
  it { should exist }
  its('ports') { should include '*:8080->8080/tcp' }
  its('repo') { should eq 'alpine' }
  its('tag') { should eq 'latest' }
end

These are examples of integration and functional testing, although the line between the two is often debated. InSpec is a powerful open source tool for declarative testing, and it works with standard automation tools like Terraform, Ansible, and Chef. It is one of several tools available to validate infrastructure state, from open ports to installed components and their functionality.

Which gates?

Before we delve further into when to gate, we should examine which gates to use. To understand the context, let's look at the traditional testing process and things to consider before making room for more gates and approvals.

Traditional testing

The figure below shows the traditional testing process as software is delivered using an agile process in the SDLC. The results of each step dictate what actions you need to take, then you put your code back into the cycle, and repeat until it becomes good enough to be delivered to customers.

The speed and diversity of modern software development create new issues that the traditional approach can't handle. Some points to keep in mind given this new paradigm:

Track testing code coverage so you know what percentage of the code is being tested and can get some idea about code quality.
Unit tests must cover security functions, like vulnerability scanning in the artifacts generated after the build step.
Integration and functional tests should include the platform (e.g., Kubernetes) where the software will be deployed.
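The coverage point above can be turned into a simple gate. In this sketch, the function names and the 80 percent default threshold are arbitrary assumptions chosen for illustration; in practice the line counts would come from your coverage tool's report.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Return the percentage of lines exercised by tests."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * covered_lines / total_lines


def coverage_gate(covered_lines: int, total_lines: int,
                  threshold: float = 80.0) -> bool:
    """Open the gate only when coverage meets the (assumed) threshold."""
    return coverage_percent(covered_lines, total_lines) >= threshold
```

Coverage says how much code the tests touch, not how well they test it, so a gate like this is best used alongside the other checks rather than as the only signal of quality.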

Too much automation can be bad

Don't forget that running manual tests is still important because sometimes too much automation can be counterproductive. Manual testing is often easier to get started with, and it can adapt as you figure out what exactly you want to test, how to test it, and why it matters. Until you can answer the what, how, and why, automation is not the right solution; it will likely over-engineer your testing and make simple things look complicated.

Limit the gates

You are not building a jail. The goal of gating in DevOps is to ensure a stable production environment, so you need only the gates that are necessary. While it's tempting to think everything needs to be verified before it's promoted to production, you also need to know where to put the gates and how to control them so that they don't affect the software delivery timeline or make the process overly complicated.

For example, regardless of whether the tests run in the cloud:

Unit tests must run when the code is integrated with other components to create the software package.
Infrastructure tests can be done after the infrastructure is spun up and ready for the first time.
Smoke tests must run on applications after they are deployed on the platform.
Network scanning and penetration testing can be done once the application is deployed on the platform.
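The placement rules above can be written down as a small lookup table, which makes it easy to see at a glance that each stage carries only the gates it needs. The stage names ("build", "provision", "deploy") and the gates_for helper are hypothetical labels for this sketch.

```python
from typing import Dict, List

# Pipeline stage -> gates that run once that stage completes.
# Stage names are hypothetical; the gates follow the list above.
GATES_BY_STAGE: Dict[str, List[str]] = {
    "build": ["unit tests"],
    "provision": ["infrastructure tests"],
    "deploy": ["smoke tests", "network scanning", "penetration testing"],
}


def gates_for(stage: str) -> List[str]:
    """Return the gates for a stage; stages not listed have none."""
    return GATES_BY_STAGE.get(stage, [])
```

Keeping the mapping explicit also makes it harder to add gates by accident: a stage with no entry simply has no gate.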

Also, note that not every type of approval or gate discussed in this article is required every time after the artifacts (e.g., container runtime images, virtual machine images, or software archives) are promoted to production.

Conclusion

Gating has always been part of software development. The strategy for achieving safe deployment has moved from manual to automated gating as the speed of software development has increased. Too much of either type of gating can work against the goal of releasing stable code (remember that requires both "releasing" and "stable").

It is difficult to achieve this level of gating without at least some automation at play. Use infrastructure-as-code principles wherever possible and run tests on your infrastructure to make sure it is as reliable as the software you put on top of it.