
Dagger, the new CI/CD era?

·14 mins
ci/cd dagger automation opentofu iac infrastructure
Romain Boulanger
Cloud Architect with DevSecOps mindset
This article was created with version v0.13.5 of Dagger using the Python SDK.

What is Dagger?
#

Dagger is an open source toolkit (SDK) for building container-based CI/CD pipelines. Its core component, the Dagger engine, is distributed under the Apache 2.0 licence.

One of Dagger’s key advantages, thanks to containerisation technology, is the ability to test pipelines locally, eliminating the common “push and pray” cycle.

This is a common issue for many, and I’m sure it is for you as well! Who hasn’t had an invalid YAML error after modifying their .gitlab-ci.yml or a file in the .github/workflows directory?

Dagger aims to address this by allowing you to test changes locally, add new steps, and then deploy them to your CI/CD tool, whether that’s GitLab CI, GitHub Actions, CircleCI or even Kubernetes!

Container-based technology, whether Docker, Podman or another runtime, guarantees reusability and consistency of execution regardless of the target platform, and encourages collaboration because the code is easy to pick up.

Dagger supports three main languages: Python, Go and TypeScript. Others, such as .NET, Java, Rust and PHP, are still experimental. This approach lets you work in your application’s native language, avoiding multiple code bases and the complexity of YAML.

A key point that may slow the adoption of this kind of tool is the need, for the time being at least, to run the main container, the Dagger engine, as the root user.

Personally, I opted for Python, the language I’m most familiar with, particularly for writing quick scripts. Although I’m not an advanced Python developer, I found the documentation for the three main languages extremely well-organised and helpful for designing my first pipeline.

How it works
#

Dagger operates by invoking Dagger Functions, of which there are two types:

The ones you create yourself. These functions are marked with the @function decorator in Python and can be invoked with the dagger call command.

And of course the core Dagger Functions, provided by the Dagger SDK, which you can chain together to perform your tasks.

For instance, the block of code below creates a container by pulling the nginx:1.25-alpine image, mounting a directory at /usr/share/nginx/html and exposing port 80.

dag.container()
.from_("nginx:1.25-alpine")
.with_directory("/usr/share/nginx/html", build)  # `build` is a dagger.Directory produced earlier in the pipeline
.with_exposed_port(80)

This snippet uses several functions, such as from_, with_directory and with_exposed_port, listed among the core Dagger Functions.

The Python SDK documentation provides a wealth of information about the various APIs you can call. Behind these functions, as previously mentioned, the Dagger engine exposes a GraphQL API for core functionality such as executing containers and interacting with files or folders.

Each API request is evaluated as a Directed Acyclic Graph (DAG) to compute its result, optimising performance by caching operations. This caching feature is a key aspect of the tool, enabling fast pipeline execution and saving valuable time during the process.
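Conceptually, this caching behaves like memoisation over the nodes of the graph. Here is a toy sketch of the idea (my own illustration, not Dagger's implementation): identical operations with identical inputs are served from cache instead of being re-executed.

```python
from functools import lru_cache

calls = []  # records only real executions, not cache hits

@lru_cache(maxsize=None)
def run_step(name: str, inputs: tuple) -> str:
    """Pretend to execute one node of the DAG and return its result."""
    calls.append(name)
    return f"{name}({','.join(inputs)})"

# First run: both steps actually execute.
image = run_step("pull_image", ("nginx:1.25-alpine",))
site = run_step("with_directory", (image, "/usr/share/nginx/html"))

# Second run with unchanged inputs: both results come from the cache.
run_step("pull_image", ("nginx:1.25-alpine",))
run_step("with_directory", (image, "/usr/share/nginx/html"))

print(calls)  # ['pull_image', 'with_directory']: nothing was re-executed
```

Change any input, and every node downstream of it is recomputed, which is exactly why a full DAG makes cache invalidation precise.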

If you’d like to find out more about Dagger’s architecture, I recommend you take a look at this page, giving you an overview of the tool.

As an introduction, I recommend following the Quickstart to get your feet wet in the language of your choice, without forgetting to install the CLI first.

All of Dagger’s features are listed in the Cookbook, where I found many interesting recipes, such as mounting a secret as a file or invalidating the cache for certain operations. These points will be covered in more detail later in the post.

To avoid constantly reinventing the wheel, you can check out the Daggerverse, a directory of publicly distributed modules that are fully reusable.

Finally, for those wishing to view and debug pipelines defined in Dagger, a Dagger Cloud service is available, with a free offer for individuals.

My first pipeline
#

Now it’s time to get serious!

I had the idea to use Dagger in a context I frequently work with: deploying infrastructure as code, particularly with OpenTofu and Google Cloud.

The goal is to replicate the structure of this CI/CD pipeline (French blog post) tailored to OpenTofu by creating the different functions within Dagger.

You can get the code from GitHub and adapt it to your needs:

axinorm/dagger-tofu-pipeline

A Dagger pipeline with OpenTofu code!

OpenTofu code structure
#

The infrastructure code is fairly straightforward. It lives in the infra folder, which contains two subfolders: modules, which centralises the resource naming module, and the network layer.

This network layer deploys a VPC and a subnetwork within Google Cloud. Two configurations can be defined:

  • A .tfvars file in the configurations folder to instantiate the chosen layer, containing mandatory variables such as subnets to define the subnetworks;
  • The backend configuration associated with the file above, in the backend-configurations folder; this file must have the -backend suffix.
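This naming convention can be captured in a couple of lines. The helper below is hypothetical (it is not part of the repository); it simply derives both file paths from a layer and a configuration name, including the mandatory -backend suffix:

```python
def config_paths(layer: str, configuration: str) -> tuple[str, str]:
    """Derive the .tfvars and backend configuration paths for a layer."""
    tfvars = f"{layer}/configurations/{configuration}.tfvars"
    backend = f"{layer}/backend-configurations/{configuration}-backend.tfvars"
    return tfvars, backend

print(config_paths("network", "example"))
# ('network/configurations/example.tfvars', 'network/backend-configurations/example-backend.tfvars')
```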

An example from the code repository, with the file example.tfvars:

project_id = "" # To be modified

project_name     = "example"
environment_type = "dev"
location         = "europe-west6"

subnets = [
  {
    name          = "frontend"
    ip_cidr_range = "10.0.0.0/24"
  },
  {
    name          = "backend"
    ip_cidr_range = "10.0.1.0/24"
  },
]

and example-backend.tfvars:

bucket = "" # To be modified
prefix = "opentofu/tfstates/network/example"

These files can be fully customised to your requirements. They require a Google Cloud Storage bucket as the backend, along with a project_id and a Google Cloud service account that has permission to create the aforementioned network resources.
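As a side note, the subnets variable must contain valid, non-overlapping CIDR ranges. A quick sanity check with Python's standard ipaddress module (an illustration only, not part of the pipeline):

```python
import ipaddress

subnets = [
    {"name": "frontend", "ip_cidr_range": "10.0.0.0/24"},
    {"name": "backend", "ip_cidr_range": "10.0.1.0/24"},
]

# ip_network raises ValueError on an invalid CIDR, so parsing doubles as validation.
networks = [ipaddress.ip_network(s["ip_cidr_range"]) for s in subnets]

overlaps = any(
    a.overlaps(b)
    for i, a in enumerate(networks)
    for b in networks[i + 1:]
)
print(overlaps)  # False: the two /24 ranges are disjoint
```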

The roles/compute.networkAdmin role will be sufficient for the service account at project level.

Dagger code overview
#

The Dagger-specific code is located in the dagger/src/main directory, which contains four files:

  • __init__.py: defines the entry point for Dagger;
  • tofu_pipeline.py: contains the custom Dagger Functions to be called, i.e. deploy or destroy;
  • tofu_context.py: is used to store the OpenTofu execution context, in other words the various arguments that will be used when the tofu command is called;
  • tofu_jobs.py: defines the set of unit jobs that will be called in sequence, such as the fmt, validate, plan, apply, destroy commands, and so on. It also contains the execution definition for Tflint and Checkov.

To invoke the deploy Dagger Function, use the dagger call command like this:

dagger call deploy \
  --directory=./infra \
  --layer=network \
  --configuration=example \
  --credentials=file:./google_application_credentials.json \
  --tofu-version=1.8.3 \
  --checkov-version=3.2.257 \
  --tflint-version=v0.53.0

Several arguments are required:

  • --directory: Path of the folder containing the infrastructure as code, e.g. ./infra;
  • --layer: Name of the infrastructure folder to be deployed;
  • --configuration: Name of the configuration (.tfvars file) inside the folder defined above;
  • --credentials: Path to the JSON file for the Google Cloud service account, e.g. file:./google_application_credentials.json. The file: prefix is mandatory for mounting the content as a file;
  • --tofu-version: OpenTofu version (default: latest);
  • --checkov-version: Checkov version (default: latest);
  • --tflint-version: Tflint version (default: latest).
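For scripting purposes, the full invocation can be assembled from these arguments. The helper below is hypothetical; it simply builds the argument list, falling back to the documented latest defaults:

```python
def build_deploy_args(directory: str, layer: str, configuration: str,
                      credentials: str, tofu_version: str = "latest",
                      checkov_version: str = "latest",
                      tflint_version: str = "latest") -> list[str]:
    """Build the argv for `dagger call deploy` from named parameters."""
    return [
        "dagger", "call", "deploy",
        f"--directory={directory}",
        f"--layer={layer}",
        f"--configuration={configuration}",
        # the file: prefix tells Dagger to mount the content as a file
        f"--credentials=file:{credentials}",
        f"--tofu-version={tofu_version}",
        f"--checkov-version={checkov_version}",
        f"--tflint-version={tflint_version}",
    ]

print(" ".join(build_deploy_args(
    "./infra", "network", "example", "./google_application_credentials.json",
    tofu_version="1.8.3",
)))
```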

Get your hands dirty
#

In the tofu_jobs.py file, the _tofu function serves as the basis for creating the container that runs all OpenTofu actions (except Checkov and Tflint, which use different images):

def _tofu(self) -> dagger.Container:
    """Return a configured container for OpenTofu"""

    return (
        dag.container()
        .from_(f"ghcr.io/opentofu/opentofu:{self._tofu_context.tofu_version}")
        .with_directory("/infra", self._tofu_context.directory)
        .with_mounted_cache(
            f"/infra/{self._tofu_context.layer}/.terraform",
            dag.cache_volume("terraform"),
            sharing=dagger.CacheSharingMode.SHARED
        )
        .with_mounted_cache(
            "/infra/plan",
            dag.cache_volume("plan"),
            sharing=dagger.CacheSharingMode.PRIVATE
        )
        .with_workdir(f"/infra/{self._tofu_context.layer}")
    )

Here are a few explanations of the code. First of all, the image is pulled according to the version defined in the execution context:

dag.container()
.from_(f"ghcr.io/opentofu/opentofu:{self._tofu_context.tofu_version}")

As with traditional containers, it is possible to mount volumes, particularly for source code. To do this, use the with_directory function:

.with_directory("/infra", self._tofu_context.directory)

OpenTofu needs to store providers and modules, so a shared cache is set up to reuse these binaries across runs regardless of settings, hence dagger.CacheSharingMode.SHARED:

.with_mounted_cache(
    f"/infra/{self._tofu_context.layer}/.terraform",
    dag.cache_volume("terraform"),
    sharing=dagger.CacheSharingMode.SHARED
)

The same procedure is used for the directory that stores the plan, but sharing is deactivated with dagger.CacheSharingMode.PRIVATE, because each plan is specific to the OpenTofu execution context.

Finally, the working directory is positioned according to the infrastructure layer defined in the context:

.with_workdir(f"/infra/{self._tofu_context.layer}")

As you can see, this function is used as the core for the different executions of the tofu command. I won’t go into detail about the checkov or tflint functions, as they work in a similar way.

It is used by the format and init functions, as shown here for format:

async def format(self) -> str:
    """Check the code formatting"""
    return await (
        self._tofu()
            .with_exec([
                "tofu",
                "fmt",
                "-check",
                "-recursive",
                "-write=false",
                "-diff",
            ]).stdout()
    )

The self._tofu() call invokes the method above and allows other Dagger Functions to be chained onto it, because _tofu returns a container (dagger.Container).

The with_exec instruction takes the command and all its arguments to run the code formatting check.

Finally, the checkov and tflint functions rely on their respective configuration files, namely .checkov.yml and .tflint.hcl. These files must be located at the root of the code repository, as shown in the provided repository.

Invalidating the cache, the key point
#

When working with infrastructure as code, it’s important to generate a plan for each run, even if no changes have been made to the source code. This is particularly crucial since the infrastructure could be altered by manual actions, even though this is considered poor practice.

This is why it is necessary to invalidate Dagger’s cache system.

You can find this in the plan function:

.with_env_variable("CACHEBUSTER", str(datetime.now()))

This process, described in the documentation, can be used to force Dagger to replay the execution of commands within the container.
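To see why this works, picture the cache key as a hash of an operation's inputs, environment included. The sketch below is a toy model (not Dagger's internals): an ever-changing CACHEBUSTER value, such as the str(datetime.now()) used above, makes every run produce a fresh key and therefore a cache miss.

```python
import hashlib

def cache_key(args: list[str], env: dict[str, str]) -> str:
    """Toy cache key: a hash over the command and its environment."""
    payload = repr((args, sorted(env.items())))
    return hashlib.sha256(payload.encode()).hexdigest()

plan_args = ["tofu", "plan", "-out", "/infra/plan/example.tofuplan"]

# Without a buster, two identical runs share a key: cache hit, plan is skipped.
hit = cache_key(plan_args, {}) == cache_key(plan_args, {})

# With a timestamped buster, successive runs get distinct keys: cache miss,
# so the plan is recomputed every time.
k1 = cache_key(plan_args, {"CACHEBUSTER": "2024-10-14 10:00:00.000001"})
k2 = cache_key(plan_args, {"CACHEBUSTER": "2024-10-14 10:00:00.000002"})
print(hit, k1 != k2)  # True True
```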

Secret handling
#

In the _init function, which performs the tofu init command, the file containing the service account for connecting to Google Cloud must be mounted in the container.

For this, the Cookbook gives an example using the with_mounted_secret Dagger Function:

def _init(self) -> dagger.Container:
    """Initialise OpenTofu layer"""

    return (
        self._tofu()
            .with_mounted_secret("/infra/credentials", self._tofu_context.credentials)
[...]

This method protects sensitive information without storing it in hard-coded form.
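On top of that, values registered as secrets are scrubbed from Dagger's logs (the setSecret entry in the logs later in this post shows only a hashed name, never the value). The idea is roughly this (a toy illustration, not Dagger's code):

```python
def scrub(log: str, secret_values: list[str]) -> str:
    """Replace any registered secret value appearing in a log line."""
    for value in secret_values:
        log = log.replace(value, "***")
    return log

print(scrub('init with token="s3cr3t-token"', ["s3cr3t-token"]))
# init with token="***"
```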

Running locally
#

All good?

It’s time to deploy the OpenTofu code!

To do this, use the dagger call command:

dagger call deploy --directory=./infra --layer=network --configuration=example --credentials=file:./google_application_credentials.json --tofu-version=1.8.3 --checkov-version=3.2.257 --tflint-version=v0.53.0 --progress=plain

The --progress=plain flag is not required, but it gives better visibility into what is happening in your pipeline.

Here is a log example:

[...]
15  : TofuPipeline.deploy(
15  :     checkovVersion: "3.2.257"
15  :     configuration: "example"
15  :     credentials: setSecret(name: "a4c62b237ad8da55664161b75ebdb135eed885b818e20c0ec5573e0cd963d1f1"): Secret!
15  :     directory: ModuleSource.resolveDirectoryFromCaller(path: "./infra"): Directory!
15  :     layer: "network"
15  :     tflintVersion: "v0.53.0"
15  :     tofuVersion: "1.8.3"
15  :   ): Void
16  :   upload /Users/axinorm/dev/perso/dagger/tofu-pipeline from 3c29abyrf1rwefjc5ftbxz1k4 (client id: vmt19b2jf3z6x2m83s7crt84u, session id: p7zj7atp82diiggzwyaigavbw)
16  :   upload /Users/axinorm/dev/perso/dagger/tofu-pipeline from 3c29abyrf1rwefjc5ftbxz1k4 (client id: vmt19b2jf3z6x2m83s7crt84u, session id: p7zj7atp82diiggzwyaigavbw) DONE [0.2s]
17  :   Container.from(address: "ghcr.io/opentofu/opentofu:1.8.3"): Container!
18  :     resolving ghcr.io/opentofu/opentofu:1.8.3
18  :     resolving ghcr.io/opentofu/opentofu:1.8.3 DONE [1.1s]
19  :     cache request: pull ghcr.io/opentofu/opentofu:1.8.3
20  :   Container.withDirectory(
20  :       directory: ModuleSource.resolveDirectoryFromCaller(path: "./infra"): Directory!
20  :       exclude: []
20  :       include: []
20  :       path: "/infra"
20  :     ): Container!
21  :     copy / /infra
17  :   Container.from DONE [1.1s]
22  :   Container.withMountedCache(
22  :       cache: cacheVolume(key: "terraform"): CacheVolume!
22  :       path: "/infra/network/.terraform"
22  :     ): Container!
22  :   Container.withMountedCache DONE [0.0s]
23  :   Container.withMountedCache(
23  :       cache: cacheVolume(key: "plan"): CacheVolume!
23  :       path: "/infra/plan"
23  :       sharing: PRIVATE
23  :     ): Container!
23  :   Container.withMountedCache DONE [0.0s]
24  :   Container.withWorkdir(path: "/infra/network"): Container!
24  :   Container.withWorkdir DONE [0.0s]
25  :   Container.withExec(args: ["tofu", "fmt", "-check", "-recursive", "-write=false", "-diff"]): Container!
25  :   Container.withExec DONE [0.0s]
26  :   Container.stdout: String!
26  :   Container.stdout DONE [5.7s]
27  :   Container.withExec(args: ["tofu", "init", "-backend-config", "backend-configurations/example-backend.tfvars"]): Container!
28  :   Container.withExec(args: ["tofu", "validate"]): Container!
28  :   Container.withExec DONE [0.0s]
29  :   Container.stdout: String!
27  :   Container.withExec(args: ["tofu", "init", "-backend-config", "backend-configurations/example-backend.tfvars"]): Container!
27  :   [0.2s] |
27  :   [0.2s] | Initializing the backend...
27  :   [0.6s] |
27  :   [0.6s] | Successfully configured the backend "gcs"! OpenTofu will automatically
27  :   [0.6s] | use this backend unless the backend configuration changes.
27  :   [0.8s] | Initializing modules...
27  :   [0.8s] | - naming in ../modules/naming
27  :   [0.8s] |
27  :   [0.8s] | Initializing provider plugins...
27  :   [0.8s] | - Finding hashicorp/google versions matching "~> 6.5.0"...
27  :   [2.8s] | - Installing hashicorp/google v6.5.0...
27  :   [7.0s] | - Installed hashicorp/google v6.5.0 (signed, key ID 0C0AF313E5FD9F80)
27  :   [7.0s] |
[...]
27  :   Container.withExec DONE [0.0s]
28  :   Container.withExec DONE [9.0s]
28  :   [8.8s] | Success! The configuration is valid.
29  :   Container.stdout DONE [8.9s]
30  :   Container.from(address: "ghcr.io/terraform-linters/tflint:v0.53.0"): Container!
31  :     resolving ghcr.io/terraform-linters/tflint:v0.53.0
31  :     resolving ghcr.io/terraform-linters/tflint:v0.53.0 DONE [1.0s]
30  :   Container.from DONE [1.0s]
32  :   Container.stdout: String!
30  :   Container.from(address: "ghcr.io/terraform-linters/tflint:v0.53.0"): Container!
33  :     pull ghcr.io/terraform-linters/tflint:v0.53.0
34  :   Container.withDirectory(
34  :       directory: ModuleSource.resolveDirectoryFromCaller(path: "./infra"): Directory!
34  :       exclude: []
34  :       include: []
34  :       path: "/infra"
34  :     ): Container!
34  :   Container.withDirectory DONE [0.0s]
35  :   Container.withFile(
35  :       path: "/infra/network/.tflint.hcl"
35  :       source: Directory.file(path: ".tflint.hcl"): File!
35  :     ): Container!
35  :   Container.withFile DONE [0.0s]
36  :   Container.withWorkdir(path: "/infra/network"): Container!
36  :   Container.withWorkdir DONE [0.0s]
37  :   Container.withExec(args: ["tflint", "--recursive"]): Container!
37  :   Container.withExec DONE [0.0s]
30  :   Container.from(address: "ghcr.io/terraform-linters/tflint:v0.53.0"): Container!
33  :     pull ghcr.io/terraform-linters/tflint:v0.53.0 DONE [0.1s]
32  :   Container.stdout DONE [2.6s]
38  :   Container.from(address: "bridgecrew/checkov:3.2.257"): Container!
39  :     resolving docker.io/bridgecrew/checkov:3.2.257
39  :     resolving docker.io/bridgecrew/checkov:3.2.257 DONE [1.4s]
40  :     pull docker.io/bridgecrew/checkov:3.2.257
41  :   Container.withDirectory(
41  :       directory: ModuleSource.resolveDirectoryFromCaller(path: "./infra"): Directory!
41  :       exclude: []
41  :       include: []
41  :       path: "/infra"
41  :     ): Container!
41  :   Container.withDirectory DONE [0.0s]
42  :   Container.withFile(
42  :       path: "/infra/network/.checkov.yaml"
42  :       source: Directory.file(path: ".checkov.yaml"): File!
42  :     ): Container!
43  :     copy /.checkov.yaml /infra/network/.checkov.yaml
38  :   Container.from(address: "bridgecrew/checkov:3.2.257"): Container!
40  :     pull docker.io/bridgecrew/checkov:3.2.257 DONE [0.1s]
38  :   Container.from DONE [1.4s]
42  :   Container.withFile DONE [0.0s]
44  :   Container.withWorkdir(path: "/infra/network"): Container!
44  :   Container.withWorkdir DONE [0.0s]
45  :   Container.withExec(args: ["checkov", "-d", "."]): Container!
45  :   Container.withExec DONE [0.0s]
46  :   Container.stdout: String!
45  :   Container.withExec DONE [27.6s]
45  :   [27.4s] | terraform scan results:
45  :   [27.4s] |
45  :   [27.4s] | Passed checks: 0, Failed checks: 0, Skipped checks: 4
45  :   [27.4s] |
45  :   [27.4s] |
46  :   Container.stdout DONE [29.9s]
47  :   Container.withExec(args: ["tofu", "plan", "-var-file", "./configurations/example.tfvars", "-out", "/infra/plan/example.tofuplan"]): Container!
48  :   Container.stdout: String!
47  :   Container.withExec(args: ["tofu", "plan", "-var-file", "./configurations/example.tfvars", "-out", "/infra/plan/example.tofuplan"]): Container!
47  :   [3.0s] |
47  :   [3.0s] | OpenTofu used the selected providers to generate the following execution
47  :   [3.0s] | plan. Resource actions are indicated with the following symbols:
47  :   [3.0s] |   + create
47  :   [3.0s] |
47  :   [3.0s] | OpenTofu will perform the following actions:
47  :   [3.0s] |
47  :   [3.0s] |   # google_compute_network.this will be created
47  :   [3.0s] |   + resource "google_compute_network" "this" {
47  :   [3.0s] |       + auto_create_subnetworks                   = false
47  :   [3.0s] |       + delete_default_routes_on_create           = false
47  :   [3.0s] |       + gateway_ipv4                              = (known after apply)
47  :   [3.0s] |       + id                                        = (known after apply)
47  :   [3.0s] |       + internal_ipv6_range                       = (known after apply)
47  :   [3.0s] |       + mtu                                       = (known after apply)
47  :   [3.0s] |       + name                                      = "vpc-example-ew6"
47  :   [3.0s] |       + network_firewall_policy_enforcement_order = "AFTER_CLASSIC_FIREWALL"
47  :   [3.0s] |       + numeric_id                                = (known after apply)
47  :   [3.0s] |       + project                                   = "..."
47  :   [3.0s] |       + routing_mode                              = (known after apply)
47  :   [3.0s] |       + self_link                                 = (known after apply)
47  :   [3.0s] |     }
[...]

At the end, everything is successfully deployed! All the steps performed are visible, from fmt to plan; I’ve removed the apply output to shorten the logs, but you get the general idea.

Running in GitHub Actions
#

Once local execution has been validated, it’s time to start deploying our Dagger pipeline on GitHub Actions, for example.

You can retrieve the content and create a dagger.yml file in the .github/workflows directory.

It is based on the example from the documentation, with an added step that creates the secret file containing the Google Cloud service account.

name: dagger
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Create secret file with Google Cloud credentials
        run: |
          cat << EOF > ./google_application_credentials.json
          ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
          EOF          

      - name: Deploy infrastructure
        uses: dagger/dagger-for-github@v6
        with:
          version: 0.13.5
          dagger-flags: --progress=plain
          verb: call
          workdir: .
          module: ""
          args: deploy --directory=./infra --layer=$LAYER --configuration=$CONFIGURATION --credentials=file:./google_application_credentials.json --tofu-version=$TOFU_VERSION --checkov-version=$CHECKOV_VERSION --tflint-version=$TFLINT_VERSION
        env:
          LAYER: ${{ vars.LAYER }}
          CONFIGURATION: ${{ vars.CONFIGURATION }}
          TOFU_VERSION: ${{ vars.TOFU_VERSION }}
          CHECKOV_VERSION: ${{ vars.CHECKOV_VERSION }}
          TFLINT_VERSION: ${{ vars.TFLINT_VERSION }}

Don’t forget to define the variables LAYER, CONFIGURATION, TOFU_VERSION, CHECKOV_VERSION and TFLINT_VERSION in the GitHub Actions variables, as well as GOOGLE_APPLICATION_CREDENTIALS in the secret section.
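The “Create secret file” step above simply writes the secret’s contents to a local file via a heredoc. In isolation, with a dummy value standing in for the real secret, it behaves like this:

```shell
# Dummy value standing in for the real GOOGLE_APPLICATION_CREDENTIALS secret
GOOGLE_APPLICATION_CREDENTIALS='{"type": "service_account"}'

# Same heredoc as in the workflow: write the secret's contents to a file
cat << EOF > ./google_application_credentials.json
${GOOGLE_APPLICATION_CREDENTIALS}
EOF

cat ./google_application_credentials.json
# {"type": "service_account"}
```

The unquoted EOF delimiter lets the shell expand the variable inside the heredoc, which is what injects the secret into the file.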

To conclude
#

Dagger stands out for its container-based approach, enabling you to run your pipelines either locally or in your preferred tool. It’s easy to learn, thanks to the documentation, but you’ll need to use a language that’s compatible with the SDK or interact with the APIs available in GraphQL.

Personally, I find the Dagger concept highly promising: it gives developers the opportunity to check their code under standardised conditions using the defined Dagger Functions, without relying on third-party tools before the famous git push.

From an application developer’s perspective, since each CI/CD tool has its own structure and format, Dagger helps standardise this process, simplifying the complexity of adaptation and making migrations from one tool to another smoother.

However, as mentioned earlier, the need to run Dagger in privileged mode is a significant security consideration, particularly when using your own runners. I hope future versions will address this issue with an alternative solution.

As always, you need to look at it in the light of your own context, but it’s certainly a tool that offers an innovative approach, even for infrastructure as code!
