What you should know about GitOps with Terraform Cloud

Is GitOps just another buzzword?

Being in tech, you’re probably bombarded with buzzwords: DevOps, DevSecOps, AIOps, ChatOps... ops, ops, ops! Many of these are just buzzwords. However, GitOps is not – it's a new label for a tried and tested workflow. GitOps means performing all of your operations based on a single source of truth. Typically, this “single source of truth” is a git repository. Instead of allowing developers and sysadmins to make changes directly to your infrastructure, changes must first be made in git and then reviewed by other team members. Once changes have been made in git, automated tooling applies these to what’s in production. 

There are a number of major benefits to using a GitOps workflow:

  • You can always be confident of knowing exactly what you’re running in production. There are no surprises when making changes to a resource – everything matches the state it’s supposed to in git.

  • New team members are just as effective as seasoned veterans. There’s no “secret sauce” that only one person knows when making infrastructure changes – as long as you make the same changes in git, the result will be the same regardless of who does it.

  • GitOps allows you to enforce a policy that ensures infrastructure changes are reviewed by other team members. You can make sure that all team members are in agreement and you’re doing things in the best possible way before making big changes.

  • More members of your business will be able to see and contribute to your infrastructure setup. Although developers and executives likely won’t be making big changes at first, it gives non-operations team members the ability to make small changes and participate in reviews of big changes or provide input to your ops team. This can help improve communication with your ops team and prevent silos where your operations team becomes a single point of failure.

  • It’s easier to revert to a previous state if something breaks. You can revert the offending commits and return to a working version of your infrastructure. This can’t undo the deletion of an important resource (“oops, I just deleted our prod DB”), but it does let you recover rapidly from misconfigured load balancers and other networking snafus.

  • You can copy-and-paste infrastructure resources, and clean them up easily. This is a major time-saver when an important client or executive wants a copy of an environment and all you have to do is copy, paste, apply.

Terraform and Terraform Cloud

 
GitOps itself is not a new technique – we've been using it at stack.io for over a decade now. What has changed recently is the availability of tools that make a GitOps workflow easy to implement. Previously, you needed the know-how to build your own infrastructure continuous deployment pipeline from scratch; now you can get there with a small number of tools. In particular, we recommend using Terraform and Terraform Cloud with your favourite version control provider (GitHub, Bitbucket, GitLab, etc.).

Terraform is a tool for defining infrastructure as code. Though it has competitors (CloudFormation, Pulumi, etc.), Terraform supports the widest variety of technologies through its provider ecosystem (you can use it to deploy more), all of the important features are free, it’s easy to adopt gradually, and in our experience it has the best handling of infrastructure drift. “Infrastructure drift” is the term for when someone from your team (or an automated system) makes changes that aren’t defined in git. Terraform detects drift the next time it plans a change and gives you the option to correct it (unlike CloudFormation, which in our experience tends to fail and refuse to proceed until you fix the issue manually).
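
To make this concrete, here is a minimal sketch of what a Terraform definition can look like. The AWS provider, the region, the bucket name and the tags are assumptions chosen for illustration, not part of any particular setup.

```hcl
# Illustrative only: the provider, region, bucket name and tags are assumptions.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# The desired state of one S3 bucket, as recorded in git.
resource "aws_s3_bucket" "assets" {
  bucket = "example-company-assets" # hypothetical; bucket names must be globally unique

  tags = {
    Environment = "test"
    ManagedBy   = "terraform"
  }
}
```

If someone later edits one of these tags by hand in the AWS console, the next “terraform plan” should report the difference and offer to restore the values defined here. That is drift detection in practice.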

Terraform Cloud is a managed service for running Terraform. It provides a centralized workspace for Terraform usage, a private module registry, and, most importantly, an optional automated GitOps workflow. At the time of writing, Terraform Cloud is free for teams of fewer than 5 users (great for small ops teams) and has relatively affordable pricing beyond the free tier: $20 per user on the Teams & Governance plan, or unlimited users on the Business plan, which is billed per “terraform apply” and by the number of admin users. We recommend starting out with the free tier (especially for small ops teams) and only moving on to the paid plans if you like what you see. If you don’t like any of the available pricing tiers, there’s also the open-source tool Atlantis, which gives you a Terraform Cloud-like workflow for free: https://www.runatlantis.io/.

Get started using Terraform 

Terraform and Terraform Cloud can have a steep learning curve. If you’re not using Terraform yet, we recommend trying it out in a test project before doing anything further:

  1. Register for a Terraform Cloud account and create a new organization.

  2. Create a new workspace on Terraform Cloud using the “CLI-driven” workflow.

  3. Once you have your new workspace, select it and change the execution mode to “Local” (find this under Settings >> General >> Execution Mode >> Local). This will let you use Terraform in the “traditional way” while testing things out (executing Terraform directly on your workstation instead of remotely on Terraform Cloud).

  4. Terraform Cloud will provide you with a snippet of code to add to your new project. This tells your project to use Terraform Cloud to store its state. Add this snippet to a .tf file in a new directory and commit it to a new git repository on GitHub, Bitbucket, or your VCS provider of choice:

terraform {
  backend "remote" {
    organization = "your-organization-name"

    workspaces {
      name = "your-workspace-name"
    }
  }
}
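
Depending on when you create your workspace, the snippet Terraform Cloud gives you may use the newer “cloud” block instead of a “remote” backend. Terraform 1.1 and later support this form, and it takes the same organization and workspace names. A sketch, using the same placeholder names:

```hcl
# Alternative to the backend "remote" block, for Terraform 1.1 or later.
# Use one form or the other, not both.
terraform {
  cloud {
    organization = "your-organization-name"

    workspaces {
      name = "your-workspace-name"
    }
  }
}
```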

Once you’ve got this set up, try following along with one of the official Terraform tutorials for your cloud provider of choice: https://learn.hashicorp.com/terraform?utm_source=terraform_io
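
If you would like to confirm that state storage works before involving a cloud provider, a configuration like the following can serve as a smoke test. It is a sketch that assumes the hashicorp/random provider, which needs no credentials; the resource and output names are made up for this example.

```hcl
# Smoke test: creates nothing in any cloud account.
terraform {
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

# Generates a random two-word name and records it in the workspace state.
resource "random_pet" "smoke_test" {
  length = 2
}

# Prints the generated name after an apply.
output "smoke_test_name" {
  value = random_pet.smoke_test.id
}
```

Place it in the same directory as the Terraform Cloud snippet, then run “terraform init” followed by “terraform apply”. Afterwards, a new state version should appear in your workspace on Terraform Cloud.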

Setting up GitOps workflows on Terraform Cloud

Once you’re feeling confident with your new Terraform workspace, you can set up the full GitOps workflow on Terraform Cloud. We don’t recommend switching over until you have everything working just the way you want it to.

The first step is changing the Terraform execution mode back from “Local” to “Remote”. You can do so under Settings >> General >> Execution Mode >> Remote. You’ll likely need to provide Terraform Cloud with valid credentials so that it can run things on your behalf. Although you could give it your personal credentials, we recommend creating a new IAM user with programmatic access in whatever cloud provider you are using (your cloud provider should give you a set of access keys for the new user). This IAM user will need the appropriate permissions to manage the resources in your configuration. Add the keys to the “Variables” page for your workspace, marking them as sensitive. You can verify that you’ve set up this part correctly by running “terraform plan”. If you get an error here, it most likely means that Terraform Cloud is either missing cloud provider credentials entirely, or the credentials provided do not have the access they need.
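
As an illustration of keeping credentials out of git, the sketch below assumes AWS. The provider block contains no keys; the AWS provider picks them up from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which you would add as environment variables on the workspace’s “Variables” page. The region variable and its default are assumptions for this example.

```hcl
# Assumes AWS; other cloud providers have their own equivalents.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Hypothetical variable; the default can be overridden per workspace.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# No access keys here: they are read from AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY in the environment of the remote run.
provider "aws" {
  region = var.aws_region
}
```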

Once you’ve got remote runs working, you can set up a new VCS provider to allow Terraform Cloud to talk to your git repo – follow the appropriate documentation for your git provider here: https://www.terraform.io/docs/cloud/vcs/index.html. Once this is set up, go back to your Terraform workspace and select your new VCS provider under Settings >> Version Control. Make sure to enable “Automatic speculative plans”. Finally, once this is all complete, set up a branch protection rule on GitHub/Bitbucket/etc. and enforce that changes to your “master” branch can only happen via pull requests.
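
The workspace settings described here can also be kept in code using HashiCorp’s tfe provider, so that the GitOps setup is itself reviewed in git. This is optional and only a sketch: the organization, workspace and repository names are placeholders, and the OAuth token ID is assumed to come from the VCS provider connection you have already created.

```hcl
# Optional sketch. The tfe provider reads its API token from the
# TFE_TOKEN environment variable.
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

variable "oauth_token_id" {
  type        = string
  description = "OAuth token ID of an existing VCS provider connection (assumed)"
}

resource "tfe_workspace" "infrastructure" {
  name         = "your-workspace-name"
  organization = "your-organization-name"

  auto_apply          = false # wait for manual approval before applying
  speculative_enabled = true  # run speculative plans on pull requests

  vcs_repo {
    identifier     = "your-github-org/your-terraform-repo" # placeholder
    branch         = "master"
    oauth_token_id = var.oauth_token_id
  }
}
```

For a workspace that already exists, you would import it into state rather than create a new one.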

Were you able to set all of that up? If so, excellent – let's take a step back and examine the features we’ve put together:

  • Terraform Cloud now has the ability to execute infrastructure changes on your behalf.

  • To make new infrastructure changes, users from your organization will need to submit a PR to your new Terraform repository. Upon opening a PR, the following things will happen:

    • Terraform Cloud will run a “speculative plan” and show you what infrastructure changes would occur if you merged the PR. The plan’s output will be automatically attached to the PR for easy review.

    • If you’re on the Terraform Cloud for Business tier, Terraform Cloud may estimate the predicted cost of the changes for some resources (only works for big cloud providers like AWS and GCP).

    • You have the option to approve and merge the changes. 

  • On merging the PR, Terraform Cloud will run a “terraform plan” and wait for manual approval before applying the changes (“terraform apply”).

  • You (or someone else) can do a final approval of the changes in Terraform Cloud and apply them to production.

From now on, all you have to do to change your infrastructure is change it in git. There’s no need to fool around with awscli or the web dashboard unless you want to – just make changes and review them on GitHub. As mentioned earlier, stack.io has been using this type of GitOps workflow for over a decade now. If you’re looking for a partner to help you get started with a modern infrastructure workflow, get in touch and let us know!