How to use GitOps to automate Terraform

Instead of using CI/CD pipelines or Terraform Cloud, try this alternative approach to automating Terraform using Flux and GitOps.

GitOps as a workflow is a perfect fit for application delivery, mostly in Kubernetes environments, but it can also be used for infrastructure. In a typical GitOps scenario, you might look at solutions like Crossplane as a Kubernetes-native alternative, while most traditional infrastructure is still managed with CI/CD pipelines. There are several benefits to building your deployment platform with Kubernetes as the base, but it also means that more people need that particular skill set. One of the benefits of an Infrastructure-as-Code tool like Terraform is that it is easy to learn and doesn't require much specialized knowledge.

When my team was building our platform services, we wanted everyone to be able to contribute. Most, if not all, of our engineers use Terraform on a daily basis. They know how to create Terraform modules that can be reused across several scenarios and for several customers. While there are several ways of automating Terraform, we wanted to use a proper GitOps workflow as much as possible.

How does the Terraform controller work?

While searching for alternatives for running Terraform with Kubernetes, I found several controllers and operators, but none that I felt had as much potential as the tf-controller from Weaveworks. We already use Flux as our GitOps tool, and the tf-controller builds on some of the core functionality from Flux, adding a custom resource for Terraform deployments. The source controller takes care of fetching our modules, the Kustomize controller applies the Terraform resources, and the tf-controller then spins up static pods (called runners) that run your Terraform commands.

The Terraform resource looks something like this:

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  interval: 1m
  approvePlan: auto
  path: ./terraform/module
  sourceRef:
    kind: GitRepository
    name: helloworld
    namespace: flux-system

There are a few things to note in the spec here. The interval controls how often the controller starts up the runner pods, which then perform a terraform plan on your root module, defined by the path parameter.

This particular resource is set to automatically approve plans. This means that if there is a difference between the plan and the current state of the target system, a new runner applies the changes automatically. This makes the process as "GitOps" as possible, but you can disable it. If you do, you have to approve plans manually, either with the Terraform controller CLI or by updating your manifests with a reference to the commit that should be applied. For more details, see the documentation on manual approval.
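To make that concrete, here is a minimal sketch of the same resource with auto-approval turned off, based on the controller's documented manual-approval flow (the plan identifier in the comment is hypothetical):

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  interval: 1m
  # An empty approvePlan holds every plan for review. To approve a pending
  # plan, set this to its identifier, for example: plan-main-b8e362c206
  approvePlan: ""
  path: ./terraform/module
  sourceRef:
    kind: GitRepository
    name: helloworld
    namespace: flux-system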

The tf-controller utilizes the source controller from Flux. The sourceRef attribute is used to define which source resource you want to use, just like a Flux Kustomization resource would.
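For completeness, the source referenced above is a standard Flux GitRepository. A minimal definition might look like this (the repository URL is a placeholder):

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: helloworld
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/helloworld
  ref:
    branch: main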

Advanced deployments

While the example above works, it's not the type of deployment my team would normally do. Without a defined backend, the state is stored in the cluster, which is fine for testing and development. But for production, I prefer that the state file is stored somewhere outside the cluster. I don't want this defined in the root module directly, because I want to reuse our root modules across several deployments. This means I have to define the backend in the Terraform resource.

Here is an example of how I set up custom backend configurations. You can find all available backends in the Terraform docs:

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  backendConfig:
    customConfiguration: |
      backend "azurerm" {
        resource_group_name  = "rg-terraform-mgmt"
        storage_account_name = "stgextfstate"
        container_name       = "tfstate"
        key                  = "helloworld.tfstate"
      }
  ...

Storing the state file outside the cluster means that I can redeploy the cluster without any storage dependency; there is no need for backups or state migration. As soon as the new cluster is up, it runs the commands against the same state, and I am back in business.

Another advanced move is dependencies between modules. Sometimes we design deployments like a two-stage rocket, where one deployment sets up resources that the next one uses. In these scenarios, we need to write our Terraform so that the first module outputs any data the second module needs as inputs, and ensure that the first module has a successful run before the second one starts.

These two examples are from code used while demonstrating dependencies, and all the code can be found on my GitHub. Some of the code is omitted for brevity's sake:

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: shared-resources
  namespace: flux-system
spec:
  ...
  writeOutputsToSecret:
    name: shared-resources-output
  ...
---
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: workload01
  namespace: flux-system
spec:
  ...
  dependsOn:
    - name: shared-resources
  ...
  varsFrom:
    - kind: Secret
      name: shared-resources-output
  ...

In the deployment that I call shared-resources, you can see that I define a secret where the outputs from the deployment are stored. In this case, the outputs are the following:

output "subnet_id" {
  value = azurerm_virtual_network.base.subnet.*.id[0]
}

output "resource_group_name" {
  value = azurerm_resource_group.base.name
}

In the workload01 deployment, I first define the dependency with the dependsOn attribute, which makes sure that shared-resources has a successful run before workload01 is scheduled. The outputs from shared-resources are then used as inputs to workload01, which is why the wait matters.
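To illustrate the wiring, here is a sketch of what the shared-resources-output secret could contain, assuming the two outputs above. The keys match the Terraform output names; the values are hypothetical:

apiVersion: v1
kind: Secret
metadata:
  name: shared-resources-output
  namespace: flux-system
type: Opaque
stringData:
  # Hypothetical values, written by the controller after a successful run
  subnet_id: /subscriptions/.../subnets/snet-example
  resource_group_name: rg-example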

Why not pipelines or Terraform Cloud?

The most common approach to automating Terraform is either CI/CD pipelines or Terraform Cloud. Using pipelines for Terraform works fine, but you usually end up copying pipeline definitions over and over again. There are solutions to that, but the tf-controller gives you a much more declarative way to define what you want your deployments to look like, rather than defining the steps in an imperative fashion.

Terraform Cloud has introduced a lot of features that overlap with the GitOps workflow, but using the tf-controller does not exclude you from using Terraform Cloud. You could use Terraform Cloud as the backend for your deployments and only automate the runs through the tf-controller.
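As a sketch of that combination, Terraform's standard remote backend can be passed through the same customConfiguration field shown earlier (the organization and workspace names are hypothetical):

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  backendConfig:
    customConfiguration: |
      backend "remote" {
        organization = "example-org"
        workspaces {
          name = "helloworld"
        }
      }
  ...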

The reason my team uses this approach is that we already deploy applications using GitOps, and it gives us much more flexibility in how we offer these capabilities as a service. We can control our implementation through APIs, making self-service more accessible to both our operators and end users. The details of our platform approach are a big enough topic that they will have to wait for their own article.


This article was originally published on the author's blog and has been republished with permission.

I build platforms that scale, at Amesto Fortytwo, your friendly neighborhood platform engineering shop. Besides work, I am also active in the CNCF, more specifically in the TAG App Delivery. There I serve as co-chair of the platforms working group and maintainer on the OpenGitOps project.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.