Introduction#
Regardless of the context, On-Premise or Cloud, it’s hard not to mention Terraform when it comes to automated infrastructure creation. Even before reaching a major release, Terraform established itself as a standard, notably thanks to its ability to manage the lifecycle of deployed resources while offering a high degree of traceability and reproducibility through its “code” aspect.
Even though its publisher, HashiCorp, has recently changed its licence, Terraform remains free for the vast majority of users, who do not sell commercial solutions associated with this tool.
Following this major change, the community split in two, leading to the creation of OpenTofu. It guarantees compatibility with Terraform as of version 1.6.0, but will gradually offer distinct and eagerly awaited features, such as state encryption, scheduled for 1.7.0.
Whether you use Terraform or OpenTofu, the way you write code is pretty much the same. That’s why, in this article, I’m going to detail a number of good practices to put in place when using this type of infrastructure-as-code tool.
It’s crucial to consider how you design Terraform code, ensuring it is as scalable as possible while adhering to the language’s constraints and standards.
For illustration purposes, I’ll give a few examples from the three main Cloud providers, namely: Google Cloud, AWS and Azure.
Finally, I won’t go into more detail about how Terraform works, as I’ve already presented it in previous articles.
Terraform code is still code#
Although the main purpose of Terraform is to deploy infrastructure in a declarative way, the code you write must meet the same requirements as code in traditional languages such as Java or Go.
That’s why I recommend including tools for checking compliance with best practices, and even unit tests, right from the first line of code.
A few months ago, I wrote an article about GitLab CI and the implementation of Terraform, detailing several small tools that perform this kind of verification. I recommend reading it to get an overview of the components I usually use to generally improve code quality and prevent the introduction of security flaws on the infrastructure side.
To be honest, these syntax checking tools are very inexpensive to implement and help you adhere to language standards. If you want to introduce them later in the development of your infrastructure, the technical debt will only be greater!
Mono repo or multi repos?#
When starting a project with Terraform, there’s a question that comes up quite often: “Should you use a single repo or multiple ones?”
In truth, there isn’t a definitive answer; it largely depends on how the team responsible for developing and maintaining the Terraform code is organised.
The classic case is the construction of a Landing Zone, regardless of the Cloud provider. Typically, a single team manages the implementation of the organization’s structure, permissions management, and network architecture.
It may be more appropriate to have a single repo for this use case, as there are a number of advantages:
- Centralisation of the whole infrastructure, with the Git code repository becoming the source of truth for the infrastructure deployed;
- A single CI/CD enabling the same checks and tests to be run regardless of the layer of infrastructure deployed;
- A centralised view on the CI/CD side for planning and applying, allowing you to see drifts at a glance.
However, if the network team is completely separate from the team managing the Cloud organisation, there may be an advantage to separating these two parts:
- Better management of developer permissions and total isolation;
- Totally independent versioning.
Nevertheless, if you use multiple repositories, ensure that you maintain a consistent code structure and apply the same checks in your CI/CD pipeline to uphold high standards across the infrastructure!
It’s up to you to decide, depending on the way you work!
Git project structure#
Classic case#
For code structure, you can have as many .tf files as you like. Bear in mind that a minimum set of base files is crucial to maintain a certain degree of consistency between your Git projects:
- main.tf: Main file to declare your resources;
- variables.tf: Variable declaration file;
- outputs.tf: Declaration file for output blocks;
- providers.tf: Lists Terraform providers and their version constraints (see the sketch below);
- backend.tf: Terraform backend configuration for tfstate management; I’ll come back to this a little later.
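As an illustration, here is a minimal providers.tf sketch, assuming an Azure deployment; the version constraints are purely indicative:
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

# The features block is required by the azurerm provider, even when empty
provider "azurerm" {
  features {}
}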
The aim is also to have a main.tf that is fairly easy to read. By this I mean that you should not hesitate to break down your resources into other files when you feel it is relevant.
It might be interesting to split up the resources associated with the network and those for computing: for example, a network.tf and a compute.tf. In this case, the main.tf file could contain our Resource Group, in the case of code for Azure.
Environment management#
If the same resources are to be deployed in different environments (development, test, production, and so on), it is wise to avoid duplicating the code as much as possible.
This is why it is preferable to rely on a mechanism integrated into Terraform: tfvars files.
In this example of deploying a Google Kubernetes Engine platform, the environments are managed using three .tfvars files located in the configurations folder. Each file corresponds to a deployment environment.
The variables.tf file defines a project_id variable that will only be instantiated through one of these three files.
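As a minimal sketch, the variable declaration and one of the three environment files could look like this (the project name is purely illustrative):
variables.tf
variable "project_id" {
  type        = string
  description = "Google Cloud project hosting the resources"
}
configurations/development.tfvars
project_id = "my-gke-project-dev"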
This approach requires the file to be passed as a parameter so that the plan and apply instructions can be carried out. See the example here:
# Plan
terraform plan -var-file=./configurations/development.tfvars
# Apply
terraform apply -var-file=./configurations/production.tfvars
Finally, it is recommended that only variables that are dependent on the environment are put in these files, to ensure better readability.
Naming conventions#
Defining a naming convention is crucial for deploying your resources uniformly. This should be established before writing any code and includes both the name of the resource itself and the suffix that all your resources will use.
How do you meet this requirement? There are several ways:
- Defining hard-coded rules for each resource: not at all scalable and does not allow the convention to be centralised in any way;
- Using a dedicated provider; for Azure, for example, the Terraform provider azurecaf fulfils this role: certainly the most accomplished and customisable way, but it requires a lot of upstream code writing to achieve a successful result;
- Using a module, again for Azure: naming, which relies on a relatively simple declaration to take advantage of its predefined outputs;
- Creating your own module and versioning it independently: easy, quick and totally reusable.
As for the fourth point, I’m going to illustrate it with an example of code that’s very quick to implement; there’s no need to build an overly complex module:
The module shown is contained in a Git code repository to be called by different Terraform code repositories. The naming convention used here is intended to be centralised so that it can be used globally.
Three input variables can be defined in a variables.tf file:
- project: the name of the project to be deployed;
- environment: the type of environment the project uses, for example dev, staging or prod, depending on how your environments are named;
- location: the region used to deploy the project resources.
In addition to the name of the resource itself, these three fields will give us a suffix that can be used for all our resources.
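As a sketch, the module’s variables.tf simply declares these three inputs:
variable "project" {
  type        = string
  description = "Name of the project to be deployed"
}

variable "environment" {
  type        = string
  description = "Deployment environment, for example dev, staging or prod"
}

variable "location" {
  type        = string
  description = "Region used to deploy the project resources"
}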
Within the module code, the main file will be a locals.tf. Example here for Azure:
locals {
  services = {
    "resource_group"  = "rg"
    "virtual_network" = "vnet"
    [...]
  }

  suffix = "${var.project}-${var.environment}-${var.location}"
}
Of course, the idea is to complete the list of services according to your needs.
All you need to do is define the services and suffix values in the outputs.tf file.
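A minimal outputs.tf for this module could look like this:
output "services" {
  value       = local.services
  description = "Map of service name abbreviations"
}

output "suffix" {
  value       = local.suffix
  description = "Suffix shared by all deployed resources"
}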
Once the module has been called in the code, it can be used as follows:
resource "azurerm_resource_group" "main" {
name = "${module.naming.service["resource_group"]}-${module.naming.suffix}"
[...]
}
Here’s an easy way to build a naming convention without breaking the bank. Bear in mind that building your own module requires you to maintain it and manage its lifecycle! This means keeping an eye on code quality, version management and so on.
Tags or labels#
Your resources can carry key/value pairs, named tags on AWS and Azure or labels on Google Cloud. No matter what Terraform project or module you are building, this aspect is crucial for traceability and, most importantly, for visibility over resource billing. This is a key point emphasised in FinOps best practices.
To be clear, you should make a tags or labels variable mandatory as input to your Terraform code.
variable "tags" {
type = map(string)
description = "Map of tags for deployed resources"
}
Here is an example of a module used within the Terraform code. The tags are defined in a terraform.tfvars file and injected into the module, as sketched below.
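Assuming a hypothetical vm module, with illustrative tag values:
terraform.tfvars
tags = {
  environment = "production"
  project     = "landing-zone"
  managed_by  = "terraform"
}
main.tf
module "vm" {
  source = "./modules/vm"
  [...]
  tags   = var.tags
}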
A good approach to making tags or labels mandatory is to add Service control policies (SCPs) in AWS that block the creation of resources if required tags are missing.
Remote Tfstate#
The classic way of operating Terraform relies on the creation of a state file called tfstate, used to list all the deployed resources and their characteristics, as well as the output blocks you have defined. In other words, it’s the memory of the infrastructure managed by Terraform.
Instead of storing this file in your Git repository, which is clearly not a good idea, especially when you’re working with several people, it’s recommended to externalise the storage of this file by specifying a backend block.
Depending on the type of Cloud, different services are used:
- For AWS, S3 can handle the storage aspect and DynamoDB will manage the lock mechanism;
terraform {
  backend "s3" {
    bucket         = "tf-buckets-a3"
    key            = "landing-zone/network"
    region         = "eu-central-2"
    # DynamoDB table handling the state lock (table name assumed for illustration)
    dynamodb_table = "tf-state-lock"
  }
}
- On Azure, azurerm uses a Storage Account and a Blob container to store the tfstate, while managing the lock mechanism natively;
- On the Google Cloud side, gcs, aka Google Cloud Storage, is used to store Terraform’s state and to handle lock management (see the sketch below).
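For instance, a gcs backend declaration could look like this (bucket name and prefix are illustrative):
terraform {
  backend "gcs" {
    bucket = "tf-states-bucket"
    prefix = "landing-zone/network"
  }
}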
Regardless of your choice, it is strongly recommended to activate versioning on the storage service in order to preserve your tfstate files across modifications.
Secondly, instead of hard-coding the backend configuration, it’s recommended to use a .tfvars configuration file to set the parameters dynamically, depending on the environment.
To do this, you need to create an empty backend block, as shown below:
terraform {
  backend "azurerm" {
  }
}
Then populate it with a .tfvars file containing information such as resource_group_name, storage_account_name, and so forth for Azure.
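As a sketch, such a file could contain the following (all values are illustrative):
configurations/production-backend.tfvars
resource_group_name  = "rg-tfstates-prod"
storage_account_name = "satfstatesprod"
container_name       = "tfstates"
key                  = "landing-zone/network.tfstate"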
Finally, during the initialisation phase, an additional argument will be required to pass these parameters to the backend block, here in the case of a production environment:
terraform init -backend-config=./configurations/production-backend.tfvars
Sensitive variables#
Sensitive or confidential variables must never be contained or stored in the infrastructure code, as they could be widely spread through Git. There are several methods for handling them:
- Use CI/CD variables: the quickest and easiest case to implement; these variables can store different formats depending on the tool used;
- Configure Cloud solutions such as Secret Manager (Google Cloud) to store this type of information; everything can be retrieved via data sources on the Terraform side (see the sketch below);
- Implement a Vault-type solution: the most complicated case, as the tool needs to be installed, configured, updated and maintained. Terraform also makes it possible to interact with it via data sources.
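To illustrate the second option, here is a minimal sketch assuming a secret named db-password already exists in Google Secret Manager:
# Read the latest version of the secret
data "google_secret_manager_secret_version" "db_password" {
  secret = "db-password"
}

# The value can then be consumed elsewhere in the code via:
# data.google_secret_manager_secret_version.db_password.secret_data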
It’s up to you to choose the most appropriate solution for your application.
Use the mechanisms of the language#
In Terraform, it is possible to create loops or conditions to make the code more dynamic.
As far as conditions are concerned, only ternary conditions are available. As you can see, this limits the possibilities considerably!
To avoid chaining this type of expression too many times, it is helpful to use functions with map variables. Here’s an example of a simple case of region selection on Azure:
output "azure_region_v1" {
value = var.my_country == "france" ? "francecentral" : (var.my_country == "suisse" ? "switzerlandnorth" : (var.my_country == "etatunis" ? "eastus" : ""))
}
It’s really very tricky to read…
The simplified version would look like this:
variables.tf
variable "azure_regions" {
type = map(string)
description = "Map of Azure regions"
default = {
"suisse" = "switzerlandnorth",
"france" = "francecentral",
"etatunis" = "eastus"
}
}
variable "my_country" {
type = string
description = "My country name"
}
outputs.tf
output "azure_region_v2" {
value = lookup(var.azure_regions, var.my_country, "")
}
The lookup function searches for a key in a map and returns a default value if the key is missing, which removes the need for chained conditions in our case.
As you can see, the code is much easier to read and maintain, especially if you need to add a fourth region!
In general, conditions should be reserved for use with the count meta-argument, to decide whether or not a resource is created, as sketched below. For other cases, there is often another way to optimise the code.
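A classic sketch of this pattern, assuming a hypothetical enable_bastion flag:
variable "enable_bastion" {
  type        = bool
  description = "Whether to deploy the bastion virtual machine"
  default     = false
}

resource "azurerm_linux_virtual_machine" "bastion" {
  # The resource is created only when the flag is set to true
  count = var.enable_bastion ? 1 : 0
  [...]
}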
A final word#
When discussing an infrastructure-as-code language, we often forget that it’s still code and must be designed to be as easily maintainable and upgradeable as possible.
I hope these points have given you some tips on how to use Terraform on a daily basis.
Of course, this list is not exhaustive; there is so much to say about such a comprehensive language. Nevertheless, I’ve tried to outline the most relevant key points.
Happy code writing!