Don’t Let Your IaC Configurations Drift

Written by

Albert Heinle

Configuration drift, like s—, happens.

Configuration drift happens when a system’s actual state differs from its original or intended state. We often think of configuration drift as the result of people’s actions, e.g., patching software, adding new resources, or making temporary “fixes” to prod without altering the codebase. But we also continue to see it arise from a lack of clarity about the expected state of the system, e.g., when we leverage PaaS/IaaS and accept default settings, or when we don’t pin the version of containers included in a project.

Wasn’t Infrastructure as Code (IaC) supposed to fix this? The goal is to increase development velocity: more features deployed sooner. IaC and Continuous Deployment (the CD in CI/CD) can be powerful tools for establishing a known expected state of the system. But they do require that you have full insight into all the buttons and knobs (configuration settings) of the infrastructure before it’s deployed.

With the right tools and automation techniques, it’s possible to minimize configuration drift. However, if teams (Dev, DevOps, IT, SRE, etc.) are able to make manual changes to apps or infrastructure (maybe by ssh’ing into prod, just this one time), these changes often go unnoticed until a breach or failure occurs. And when that happens, engineers/developers have to spend valuable time working out why something that works in one environment behaves differently in another.

There are two key tactics that can help manage configuration drift:

Immutability - configurations cannot be changed on a live system after it has been deployed. Changes are made in the repository, and the code is redeployed for them to take effect.
Configuration as Code - all configuration settings are stored in a central Git repository, and any change to a live environment is made through a pull request against this repository.

This requires an automated deployment environment and a central repository of configurations. But when some of the configurations are unknown, because they are not explicitly set by templates/modules or are embedded in the IaaS/PaaS provider’s settings, there is still a risk of configuration drift and misconfigurations.
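To make the idea concrete, here is a minimal Python sketch of what a drift check boils down to: compare the settings declared in your repo against the settings observed on the live system. The setting names (`endpoint_public_access`, `log_retention_days`) and both dictionaries are made up for illustration; in practice the "actual" side would come from an export of your deployed infrastructure. Note the third bucket, `unmanaged`: settings that exist on the live system but were never declared, which is where accepted defaults and module side-effects tend to hide.

```python
from typing import Any


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_drift(intended: dict[str, Any], actual: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Bucket the differences between the intended and the actual settings."""
    want, have = flatten(intended), flatten(actual)
    return {
        # Declared in the repo, but the live value differs
        "changed": {
            key: {"intended": want[key], "actual": have[key]}
            for key in sorted(want.keys() & have.keys())
            if want[key] != have[key]
        },
        # Declared in the repo, but absent from the live system
        "missing": {key: want[key] for key in sorted(want.keys() - have.keys())},
        # Present on the live system, but never declared (defaults, side-effects)
        "unmanaged": {key: have[key] for key in sorted(have.keys() - want.keys())},
    }


# Hypothetical example data, not taken from any real provider schema
INTENDED = {"cluster": {"version": "1.27", "endpoint_public_access": False}}
ACTUAL = {
    "cluster": {
        "version": "1.27",
        "endpoint_public_access": True,
        "log_retention_days": 90,
    }
}

report = find_drift(INTENDED, ACTUAL)
for bucket, entries in report.items():
    print(bucket, entries)
```

An empty `unmanaged` bucket is the goal of Configuration as Code: nothing on the live system that the repository does not know about.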

Increase Release Velocity

We’re adopting IaC and CD to go faster, to release more software sooner. But cloud-native applications are complex. We use Terraform for our infrastructure.

Take the setup of AWS’s Elastic Kubernetes Service (EKS), for example. It can be very tedious. So we cheat a little. People create modules for common use cases. This helps teams reduce the learning curve and workload of adopting new technologies, and thus go faster. For a lot of modules there is an “official” provider, for example the official HashiCorp AWS modules. And there are modules for specific use cases. Just for AWS, there are about 5000 unofficial modules covering a variety of special use cases.

Scanning Modules

Modules are great. We just want to understand the configuration and security risks associated with each module before deploying it into prod. There is no guarantee that current scanners (open source and/or commercial) will detect the configuration parameters in most modules, unless someone has extended the ruleset or queries to cover the specific modules and their configuration parameters. If you’re using community Kubernetes modules, you will probably get a false sense of security when you run your Terraform files through scanners and they generate no complaints or errors.

Example: Terraform using the kubectl-module:


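# Community (third-party) resource; the Pod manifest is embedded as a YAML heredoc
resource "kubectl_manifest" "test" {
  yaml_body = <<YAML
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - name: nginx
    image: nginx
    securityContext:
      runAsUser: 2000
      # The risky setting that a scanner should flag
      allowPrivilegeEscalation: true
YAML
}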

We have tested a number of scanners on this file (versions as of publishing: KICS v1.7.9, tfsec 1.28.5, Snyk 1.1280.0 and CoGuard 0.2.31). None of them detected that a very bad Pod is being deployed, one where privilege escalation is allowed. (The example manifest defines a Pod that runs a container where allowPrivilegeEscalation is true. This allows a process in the container to go beyond its existing permissions and gain more privileges than its parent process.)

Only CoGuard warned that a third-party package is used (flag terraform_kubernetes_do_not_use_community_modules). This should be a trigger to investigate the settings of the third-party package.
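As a rough illustration of what “looking inside” such a resource involves, the following Python sketch pulls heredoc bodies out of Terraform source text and flags any `allowPrivilegeEscalation: true` line. It is a naive, text-based check written for this article's example, not how any of the scanners named above work; the function name `find_privilege_escalation` and the regular expressions are illustrative assumptions, and a real scanner would parse both the HCL and the YAML properly.

```python
import re

# Terraform heredoc: `<<TAG` or `<<-TAG`, a body, then TAG alone on its own line.
HEREDOC = re.compile(
    r"<<-?(?P<tag>[A-Za-z_]\w*)[ \t]*\n(?P<body>.*?)\n[ \t]*(?P=tag)[ \t]*(?:\n|$)",
    re.DOTALL,
)

# A line that sets allowPrivilegeEscalation to true (optionally followed by a comment).
# Assumption: only the literal lowercase `true` is matched.
RISKY = re.compile(
    r"^[ \t]*allowPrivilegeEscalation:[ \t]*true[ \t]*(?:#.*)?$",
    re.MULTILINE,
)


def find_privilege_escalation(terraform_source: str) -> list[tuple[int, str]]:
    """Return (line number in the file, offending line) for each risky setting
    found inside a heredoc of the given Terraform source text."""
    findings = []
    for heredoc in HEREDOC.finditer(terraform_source):
        body_offset = heredoc.start("body")
        for hit in RISKY.finditer(heredoc.group("body")):
            # Convert the character offset into a 1-based line number
            line_no = terraform_source.count("\n", 0, body_offset + hit.start()) + 1
            findings.append((line_no, hit.group(0).strip()))
    return findings


# Shortened version of the manifest discussed above
SAMPLE = '''resource "kubectl_manifest" "test" {
  yaml_body = <<YAML
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    securityContext:
      allowPrivilegeEscalation: true
YAML
}
'''

for line_no, text in find_privilege_escalation(SAMPLE):
    print(f"line {line_no}: {text}")
```

The point of the sketch is the blind spot rather than the regex: a scanner that treats `yaml_body` as an opaque string never reaches the setting at all.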

First of all, when you use an unofficial or a partner module, you need to know exactly why. Most of the time the reason is DevOps/developer velocity, and that is indeed an okay reason. If that is not the reason, then stick to the official modules, whose side effects are well understood and automatically checked.

Top-down or Bottom-up

In a perfect world, all of the configuration settings would be understood and set in a centralized repository before software has been deployed. But we’re realists. We often start by looking at production systems for misconfigurations and the unintended side-effects of configurations. To figure this out, we export the configuration of deployed cloud infrastructure to Terraform.

You can do this for AWS by running:

coguard cloud aws

The current cloud infrastructure configuration is exported as Terraform files using the official providers (we do not use any community modules, it’s “pure” Terraform ;-). These files are then scanned by CoGuard for misconfigurations, security best practices and compliance. Exporting the full configuration settings gives us a way to understand what the modules have actually done as part of their inclusion and, more importantly, what needs to be adjusted. You can fix the variances/issues directly by setting the corresponding configuration parameters in your Terraform file (and checking the change into your IaC repo), or you can file a bug report with the module maintainers so that they address the issue.

Conclusion

The use of third-party modules in Terraform or other IaC solutions can cause unintended configuration drift. So that you do not have to sacrifice the great promise of having all configurations nicely as code in your repo, CoGuard offers the ability to snapshot, export and scan your production cloud configurations. You can keep using third-party modules and still ensure that you are not hit by unintended side effects.

Sign up for CoGuard and install the CoGuard CLI to create a Terraform file containing your cloud configuration settings (this assumes you have the CLI from your cloud provider installed locally).

pip3 install coguard-cli
coguard cloud [aws|gcp|azure]

Photo credit: Jakob Rosen on Unsplash
