Elevating the Standard: A Deeper Look into CoGuard

Configuration changes are just as risky as code changes. Here's our take on the tooling required to give configurations the same, rigorous quality assurance and testing as code and why it's necessary.

Written by Albert Heinle

Configuration files and security

Developers during hackathon: We built an entire application in just 3 days. Developers after hackathon: Adding that icon is going to take 3 weeks. - Unknown

Code Scanning and Configuration Files

The best part of modern software stacks is that different tools are at different points on the Configuration Complexity Clock. Configuration changes are just as risky as code changes. Similar processes and tools are necessary to avoid production outages caused by misconfiguration and security violations that are introduced in configuration parameters or code.

Image: The Configuration Complexity Clock from Mike Hadlow

Automated Tools for Code Review

All software has bugs, and some of those bugs will be security problems. We need to give developers the tools to find and fix these mistakes, because they can be exploited. Code reviews are a valuable tool for catching developer mistakes. While these reviews should also look for security and misconfiguration issues, reviewers often lack the necessary expertise in operations and in the different software systems involved. The average developer is not the best resource for finding security bugs or configuration errors at scale. Most developers are very good at building software and making code work, but their mindset differs from that of a hacker who is trying to break the code. Automated tools are needed to locate security bugs in code.

Modern development and operations teams require automated tools to address common problems in code and software configurations. Static analysis tools, aka code scanners, rapidly look at code and find common errors that lead to security bugs. The tools will not resolve underlying design flaws, but they help development and operations teams identify misconfigurations and security bugs before the code and infrastructure are provisioned for production.

Static Analysis for Configuration Files 

CoGuard provides static analysis security tools, also known as code scanners, for Infrastructure as Code (IaC) and for the configuration files of containers, applications, cloud environments and APIs.

The CoGuard solution provides:

  1. Engine and Generalized Infrastructure Model
  2. Rules and Rule-sets for Supported Services

Engine and Generalized Infrastructure Model

CoGuard Engine is built from the ground up to be fast and extensible. It is NOT a ‘glorified grep’ that matches parameters and composite values against known expressions. CoGuard Engine has a generalized model of software and hardware infrastructure, which allows for analysis of different, connected pieces and their relationships: for example, cloud configurations together with the application-level configurations that connect to specific cloud services or other compute instances.

As an example, take Apache Kafka. One of the many ways to enable synchronization between brokers is to use Zookeeper, which requires that Kafka be able to connect to Zookeeper. The following example shows a CloudFormation description of two EC2 compute instances for Kafka and Zookeeper. The EC2 instances are placed on different subnets in separate VPCs and have no way to connect to each other. This is less common when using CloudFormation, but quite common when using the AWS web tools, where operators have to select the security group from a large list of similarly named groups. Because the Kafka instance cannot connect to the Zookeeper instance in this configuration, the deployment will produce an error and potential downtime.

Sidenote: Of course, if the users use Infrastructure as Code (IaC), this mistake is easy to spot in the CloudFormation file. However, if set up in the AWS web console, this mistake is likely to happen, since a mis-click in a drop-down menu pushes you to the wrong page.

Example: CloudFormation for Kafka and Zookeeper on EC2 compute instances



AWSTemplateFormatVersion: '2010-09-09'
Description: VPC, EC2 Instances for Kafka and Zookeeper

Resources:
  KafkaVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: KafkaVPC

  ZookeeperVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.1.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: ZookeeperVPC

  KafkaSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref KafkaVPC
      CidrBlock: 10.0.0.0/24
      AvailabilityZone: us-east-1a  # Modify for your preferred AZ
      Tags:
        - Key: Name
          Value: KafkaSubnet

  ZookeeperSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref ZookeeperVPC
      CidrBlock: 10.1.0.0/24
      AvailabilityZone: us-east-1b  # Modify for your preferred AZ
      Tags:
        - Key: Name
          Value: ZookeeperSubnet

  KafkaInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro  # Adjust as needed
      SecurityGroupIds:
        - !Ref KafkaSecurityGroup
      KeyName: your-key-name
      ImageId: your-ami-id
      SubnetId: !Ref KafkaSubnet
      UserData: !Base64 |
        # Add your user data script to configure Kafka inside the instance

  ZookeeperInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro  # Adjust as needed
      SecurityGroupIds:
        - !Ref ZookeeperSecurityGroup
      KeyName: your-key-name
      ImageId: your-ami-id
      SubnetId: !Ref ZookeeperSubnet
      UserData: !Base64 |
        # Add your user data script to configure Zookeeper inside the instance

  KafkaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security Group for Kafka EC2
      VpcId: !Ref KafkaVPC
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: tcp
          FromPort: 9092  # Kafka port
          ToPort: 9092

  ZookeeperSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security Group for Zookeeper EC2
      VpcId: !Ref ZookeeperVPC
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: tcp
          FromPort: 2181  # Zookeeper port
          ToPort: 2181

And assume the Kafka configuration states:


zookeeper.connect=10.1.0.0:2181

These misconfigurations are identified and flagged by CoGuard before the infrastructure is deployed to the test or production environments. The Generalized Infrastructure Model is flexible enough to understand the configuration requirements between different applications across multiple Virtual Private Clouds and subnets.
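To make the cross-file nature of this check concrete, here is a minimal Python sketch. It is not CoGuard's implementation: the model is built by hand from the template above, the names `VPCS`, `SERVICE_VPC`, `CONNECTED_VPCS` and `check_zookeeper_reachable` are hypothetical, and it assumes that `zookeeper.connect` lists IP literals rather than hostnames.

```python
import ipaddress

# Hypothetical, hand-built model of the example above: each service lives in a
# VPC with a CIDR block, and VPCs may be connected (e.g. via peering).
VPCS = {
    "KafkaVPC": ipaddress.ip_network("10.0.0.0/16"),
    "ZookeeperVPC": ipaddress.ip_network("10.1.0.0/16"),
}
SERVICE_VPC = {"kafka": "KafkaVPC", "zookeeper": "ZookeeperVPC"}
CONNECTED_VPCS = set()  # the template defines no connection between the VPCs


def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def vpc_of(address):
    """Return the name of the VPC whose CIDR block contains the address."""
    ip = ipaddress.ip_address(address)
    for name, network in VPCS.items():
        if ip in network:
            return name
    return None


def check_zookeeper_reachable(kafka_properties):
    """Return a list of findings for the zookeeper.connect setting."""
    findings = []
    props = parse_properties(kafka_properties)
    source = SERVICE_VPC["kafka"]
    for endpoint in props.get("zookeeper.connect", "").split(","):
        if not endpoint:
            continue
        host = endpoint.rsplit(":", 1)[0]
        target = vpc_of(host)
        if target is None:
            findings.append(f"{host} is not in any known VPC")
        elif target != source and frozenset((source, target)) not in CONNECTED_VPCS:
            findings.append(f"{host} is in {target}, which {source} cannot reach")
    return findings


findings = check_zookeeper_reachable("zookeeper.connect=10.1.0.0:2181")
print(findings)
```

Neither file is wrong when read alone. The finding only appears once the Kafka setting is evaluated against the network layout described in the template.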

Rules & Rule-Sets for Supported Services

CoGuard Rules & Rule-Sets are defined in predicate logic. The relationships between different devices, applications and layers are abstracted into the generalized infrastructure model and can be accessed using a Python API. New software, cloud endpoints, applications and services are added by expanding the rule-set. The Generalized Infrastructure Model enforces interconnection analysis as part of new service integration.
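As a rough illustration of what a rule expressed as a predicate over an infrastructure model could look like, consider the sketch below. It does not show CoGuard's Python API: the `Service` class, the `connects_to` key and both functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical node in an infrastructure model."""
    name: str
    config: dict
    reachable: set = field(default_factory=set)  # names of services it can reach


def depends_on(service):
    """Names of services this one is configured to connect to (hypothetical key)."""
    return set(service.config.get("connects_to", []))


def rule_dependencies_reachable(model):
    """Predicate: for every service s and every d in depends_on(s), s can reach d."""
    return all(dep in s.reachable for s in model.values() for dep in depends_on(s))


def violations(model):
    """List the (service, dependency) pairs that falsify the predicate."""
    return [(s.name, dep) for s in model.values()
            for dep in sorted(depends_on(s)) if dep not in s.reachable]


model = {
    "kafka": Service("kafka", {"connects_to": ["zookeeper"]}),
    "zookeeper": Service("zookeeper", {}),
}
print(rule_dependencies_reachable(model))
print(violations(model))
```

Because the rule only talks to the model, supporting a new service in this sketch means describing its dependencies, not rewriting the rule.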

Our goals:

  • Identify and fix misconfigurations and security violations in configuration files
  • Reduce the time/friction to add new services, new compliance frameworks, etc.
  • Reduce the number of false positives presented to development and operations teams about security violations, i.e., this is not “CVE winner winner chicken dinner” where whoever reports the most CVE violations wins

CoGuard is dedicated to providing extensible static analysis tools for configuration files of infrastructure and applications. We recognize that code review and remediation need to adapt to each team, wherever it sits on the spectrum from manual to automated. CoGuard’s future is built on Infrastructure as Code, but we help teams adopt new tools and methods to improve the configuration, security and efficiency of their operations.

Automated Infrastructure aka Infrastructure as Code

“The future is already here - it’s just not very evenly distributed” – William Gibson

How did we get here? A desirable future state for Development + Operations (DevOps) was shown to the world at the Velocity conference in 2009. John Allspaw and Paul Hammond, then of the Flickr Operations and Engineering teams respectively, presented the organizational changes that allowed them to achieve more than 10 deployments per day. It was a unique viewpoint on the changes to the relationship between development and operations that are necessary to enable what is best for the business.

Repeatable Infrastructure Patterns & Rapid Deployments

Allspaw and Hammond presented a culture, a methodology and a set of tools that teams could adopt and adapt to align development and operations, move faster, and lower the risk of change associated with technology scaling. These were built on some core premises:

  • Automated infrastructure 
  • Role & configuration management of a task driven infrastructure 
  • Shared version control
  • One step build and deploy 
  • Shared metrics

These are the building blocks to enable trust between development teams and operations teams as things change. Operations needs to be involved in feature discussions. Development needs to be involved in infrastructure discussions. Everyone needs to trust that everyone else is doing what is best for the business, and that change management works without blame to accomplish this.

You can see the tensions between groups, or at least the lack of trust. It often rears its head during discussions about cloud adoption and managed services. Many organizations still opt for on-premise solutions because their change management processes and resilience are not fully formed. Business owners may be interested in reducing costs, shifting expenses from capital expenditure to operational expenditure, or releasing the features that customers are demanding in a timely manner, but change management and trust are part of the culture that needs to be addressed.

Technology, tools and tactics

The most important technology change to help a broader adoption of DevOps was the introduction of containers and their orchestration (think Chef, Puppet, SystemImager, Docker, Kubernetes, Ansible, CloudFormation, Terraform, etc.). Containers are not tied to specific environments. Teams were able to take all of the moving parts in their organization and transfer them into specific, immutable, reproducible, task-driven environments in a scripted way. No more manual fiddling with the knobs and levers: the knobs and levers are just parameters in code. Virtual machines could be configured automatically in code. Containers enabled faster movement between environments.

The broad availability of containers and container orchestration enabled teams to define infrastructure programmatically. The rise of infrastructure as code (IaC) has pushed teams to work from a shared source control repository. As teams are able to define their resources as code in code repositories, operations teams adopt the tools of development teams and development teams are required to think about operations and infrastructure. This has enabled change management between development and operations teams.

And whether it is serverless, Platform-as-a-Service, or whatever new tech emerges, the next question that needs to be addressed is “What about security?”

What about security?

“All code contains bugs. Some of those bugs are security bugs we must find.” 

Where should we start? Let’s start with a list of known vulnerabilities: download the latest CVE database from MITRE. Then grep it for the tools you are using and for the services your package manager reports as installed. If there is an entry, raise an alert and review it for potential remediation steps. This is a very common solution: a number of scanners, such as Trivy and Grype, do exactly this.

It can be as simple as:



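# allitems.csv is the CSV of CVE entries from MITRE
while IFS= read -r i
do
  # Count the CVE entries that mention this piece of software
  grep -i -c -- "$i" allitems.csv > "new_number_$i.txt"
  # Compare with the count from the previous run (0 if there was none)
  old=$(cat "old_number_$i.txt" 2>/dev/null || echo 0)
  test "$old" -eq "$(cat "new_number_$i.txt")" || echo "CVE count changed for $i"
  mv "new_number_$i.txt" "old_number_$i.txt"
done < your_list_of_used_software.txt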

This is often a starting point for building IaC code scanners, and companies have extended this approach. The Snyk team, for example, uses its deep knowledge of software issues in open source projects to identify and report on them when scanning a container.

Analyzing IaC for Better or Worse

IaC like Terraform, Kubernetes, CloudFormation and cloud API endpoints (all the vendors) can be analyzed too.

This wasn’t possible in the past. Cloud services and IaC have made it easy to define configurations at build time or to access existing configurations in live systems. Parameters that can be discovered include:

  • Firewall rules
  • Backup verification
  • Logging
  • Whether managed services are running with best practices
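As a small illustration of discovering one of these parameters, the following Python sketch lists firewall rules that are open to any IPv4 address. It assumes the template has already been parsed into a plain dictionary. The `resources` dictionary and the `open_ingress_rules` function are hypothetical and written for this example only.

```python
# Hypothetical, already-parsed template: a plain dict mirroring the
# Resources section of a CloudFormation file.
resources = {
    "KafkaSecurityGroup": {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "SecurityGroupIngress": [
                {"CidrIp": "0.0.0.0/0", "IpProtocol": "tcp",
                 "FromPort": 9092, "ToPort": 9092},
            ],
        },
    },
    "KafkaVPC": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}},
}


def open_ingress_rules(resources):
    """Yield (group name, from port, to port) for rules open to any IPv4 address."""
    for name, resource in resources.items():
        if resource.get("Type") != "AWS::EC2::SecurityGroup":
            continue
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        for rule in rules:
            if rule.get("CidrIp") == "0.0.0.0/0":
                yield name, rule.get("FromPort"), rule.get("ToPort")


for name, from_port, to_port in open_ingress_rules(resources):
    print(f"{name}: ports {from_port}-{to_port} are open to the internet")
```

A check like this inspects one resource at a time. It says nothing about whether the services behind those ports can actually reach each other.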

The ability to express infrastructure as code has led to the evolution of grep tools. These tools are built on Open Policy Agent (OPA). OPA policies are written in a DSL called Rego. This has created a de facto standard that can be used by anyone, but it has also made it difficult to create new policies and expand the current rule-sets.

A false sense of security

The adoption of OPA tools has created a false sense of completeness. 

These new tools provided a new way to begin the analysis of cloud endpoints and of packaged and customized containers, but it is easy to hit the ceiling of their capabilities. The limitations become obvious when configurable software runs inside each new container and is configured to interact with other containers and applications in the software infrastructure. These connection points leave room for misconfigurations, typos, and security vulnerabilities at and across each device, network and application.

For example, KICS provides IaC code scanning and is built on top of a Rego/OPA tool set. KICS supports Ansible configurations for the major cloud providers (AWS, GCP, Azure) only. It is possible to add Rego rules to extend it to other cloud providers and to add Ansible policies for other specific Linux setups (e.g. the Builtins module), but this requires significant developer commitment and has not been completed at the time of publication.

The OPA tools give development and operations teams a false sense of security: teams can claim that scans were performed on the infrastructure and that all issues were remediated, but only according to their current sets of security analyzers.

CoGuard is an alternative to OPA- and Rego-based code scanners.

CoGuard - Code Scanning for Configuration Files

CoGuard: 

  • expands easily to new software (in terms of implementation time);
  • makes maintenance and expansion of existing supported software easier;
  • scans a whole infrastructure model, not just siloed pieces;
  • enables teams in the discovery phase, since we do not restrict ourselves to the API endpoints of the cloud or to the files directly.

CoGuard is composed of 2 parts:

  • CoGuard Engine and Generalized Infrastructure Model
  • Rules and rulesets for Supported Services and Compliance Frameworks

The CoGuard Engine is a predicate logic engine. We build our own representation of the infrastructure, network, containers and applications, similar to the way a compiler does. We look at the relationships between configuration parameters at a deeper level. This gives us the flexibility not only to check and validate configurations and policies, but also to auto-remediate many configurations. Auto-remediation allows us to capture misconfigurations as part of the code commit or code review process and to create Pull Requests, so that the suggested configuration changes can be evaluated in the existing development process.
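One way to picture a reviewable remediation is as a patch against the configuration file. The Python sketch below is not CoGuard's mechanism: `propose_fix` is a hypothetical helper, the file name `server.properties` is assumed for illustration, and the address `10.0.0.10` is made up to stand for a Zookeeper node the broker can reach.

```python
import difflib


def propose_fix(path, text, key, desired):
    """Return a unified diff that sets key=desired, or '' if nothing changes."""
    original = text.splitlines(keepends=True)
    fixed, found = [], False
    for line in original:
        name = line.split("=", 1)[0].strip()
        if "=" in line and not line.lstrip().startswith("#") and name == key:
            # Rewrite the existing setting, keeping the line ending as it was
            newline = "\n" if line.endswith("\n") else ""
            fixed.append(f"{key}={desired}{newline}")
            found = True
        else:
            fixed.append(line)
    if not found:
        # The key is missing entirely, so append it at the end of the file
        if fixed and not fixed[-1].endswith("\n"):
            fixed[-1] += "\n"
        fixed.append(f"{key}={desired}\n")
    diff = difflib.unified_diff(original, fixed,
                                fromfile=f"a/{path}", tofile=f"b/{path}")
    return "".join(diff)


config = "broker.id=0\nzookeeper.connect=10.1.0.0:2181\n"
# 10.0.0.10 is a made-up address for a Zookeeper node the broker can reach.
patch = propose_fix("server.properties", config, "zookeeper.connect", "10.0.0.10:2181")
print(patch)
```

A diff in this form can be attached to a Pull Request, so the suggested change goes through the same review as any other commit.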

More than 70% of our current rules have been shown to be candidates for auto-remediation. We are committed to reducing CVE false positives, to focusing on configuration changes that prevent misconfigurations and security errors, and to fitting tools like auto-remediation into existing development workflows and code review processes. We recognize the need to save teams time and allow others to contribute to the shared code repository.

The flexibility of separate rules and rulesets allows us to add emerging security best practices, compliance frameworks, and new services. It also means that we’re able to leverage these same compliance frameworks to determine violations for infrastructure in the public cloud, or behind the firewall on-premise. 

Extending CoGuard - Our Roadmap

While the addition of rules is straightforward, it remains a manual process. We have begun to use publicly available LLMs to translate structured documents such as software manuals into policies. We know that “software is eating the world” and that all that software has been configured in some way. Automating the process will allow CoGuard to keep the rules and rulesets up-to-date with changes and to expand as new tools are developed. 

Get started today

Register for a CoGuard account

Install the CoGuard CLI


pip3 install coguard-cli

And see a sample report on common infrastructure. Or better yet, run it against your repository. 

Explore a test environment

Check out and explore a test environment to run infra audits on sample repositories of web applications and view select reports on CoGuard's interactive dashboard today.