Traces, a story of observability#
The word observability is something you’re beginning to know by heart now, having read so many of my blog posts about it.
In the previous post, I talked about logs with Loki and Alloy. The next logical step is therefore to continue down that path and talk about traces!
Yes, but what is a trace?
Well, a trace is what lets us visualise and understand the execution flow of a request, particularly in microservices architectures where a single user action can trigger dozens of calls between different services.
Each trace is made up of several segments called spans. Each span represents an individual operation performed by a component, and together they give a complete picture of the relationships and dependencies between those components.
Differences between traces and spans
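To make that concrete, here is a purely illustrative example of a trace for a single request, broken down into spans (the service names and timings are invented):

Trace 4bf92f35: GET /checkout (250 ms in total)
└── span: frontend   handle the request     (250 ms)
    ├── span: cart      validate the cart    (40 ms)
    └── span: orders    create the order     (180 ms)
        └── span: database  INSERT order     (60 ms)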
If traces are getting so much attention today, it is largely thanks to the adoption of standards such as OpenTracing and, later, OpenTelemetry, which popularised instrumenting applications to collect them and made the practice far more accessible.
Leaving traces in your Kubernetes cluster can help you!#
Kubernetes has become the environment most often mentioned when talking about Cloud Native applications, particularly when they are split into microservices, quite simply because this orchestrator's strength lies in its ability to scale each component rapidly, thereby improving resilience and availability.
Here are a few points that, in my opinion, underline the essential role of having traces in a containerised world:
Identify bottlenecks: Traces pinpoint the services or calls that consume the most time in a complete transaction;
Understand and identify dependencies: Traces reveal the relationships between different services and components, helping you understand the overall architecture of an application;
Optimise performance: When it comes to traces, performance mostly means latency. Traces show which component takes the longest to respond to a request, so you can take targeted action to optimise it.
Moreover, within the Kubernetes ecosystem, additional components such as Ingress Controllers, Gateway API, Service Mesh and Network Policy also play an important role. Traces are crucial to understanding the path of a call through an application, but also through the orchestrator's various mechanisms!
Tempo#
Grafana Tempo is an open source tool featured in the CNCF observability landscape, integrating perfectly with the Prometheus and Grafana world. It acts as a single storage backend dedicated to traces.
Observability landscape within the CNCF in March 2025.
Tempo supports connections with object storage such as S3, Google Cloud Storage, Azure Blob Storage and even Minio for storing raw traces.
What's more, it accepts a wide range of formats (Jaeger, Zipkin and OpenTelemetry) over different protocols, making it easy to adopt in existing environments.
As mentioned above, it works very well with Grafana, making it easy to match metrics and logs to traces using trace identifiers (traceId).
The duration for which traces are retained can be customised through flexible retention policies, allowing organisations to adapt it to their specific needs.
Finally, thanks to its distributed architecture, Tempo scales to match your needs and can absorb massive volumes of trace data without losing ingestion performance.
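To give an idea of what this looks like, retention is handled by the Compactor. The snippet below is only a sketch of the corresponding Tempo configuration, and the exact location of this key in the Helm chart values may vary from one chart version to another:

compactor:
  compaction:
    # Keep trace blocks for 14 days (336h) before the compactor deletes them.
    block_retention: 336h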
Architecture#
Tempo architecture from official documentation
Tempo’s architecture is made up of several components:
- Distributor receives the traces in the different supported formats and routes them to the ingesters;
- Ingester organises the traces into blocks, creating the associated filters and indexes, before storing them;
- Query Frontend exposes the API used to search for traces; this is the component Grafana queries;
- Querier sits behind the Query Frontend and retrieves the requested traces from storage or from the Ingester;
- Compactor compacts the blocks in storage to reduce the space they use;
- Metrics generator is an optional component that generates metrics from traces to populate Prometheus.
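Put end to end, and keeping only the essentials of the diagram above, the write and read paths look roughly like this:

Write path: application (OTLP, Jaeger, Zipkin) -> Distributor -> Ingester -> object storage
Read path:  Grafana -> Query Frontend -> Querier -> Ingester or object storage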
Installation modes#
Tempo comes with two installation modes:
Monolithic (Single Binary): Quick to set up for testing purposes, or for small volumes of data;
Distributed: For large-scale deployments, Tempo is broken down into microservices (ingester, compactor, distributor, etc.), enabling much more granular scaling than is possible in the first method.
For enthusiasts of this kind of mechanism, an operator also exists: Tempo Operator. It is partly based on the microservices mode described above.
After theory comes practice!#
To follow the steps, you can retrieve the configurations via this code repository:
Set up and configure Tempo with Alloy and Grafana to look for traces in your Kubernetes cluster.
Feel free to adjust the values as needed to better suit your configuration.
These different values files will be used to configure Tempo and the various associated tools.
Let’s get started!
Alloy#
Alloy goes much further than being a simple log collector: with a dedicated configuration, it can also collect traces.
To do this, the right configuration must be injected. For this use case, the http and grpc endpoints in OpenTelemetry format are required.
This configuration remains completely flexible and depends on the formats your applications or tools are able to emit.
alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap. This is passed into `tpl` allowing for templating from values.
    content: |-
      logging {
        level = "info"
      }

      otelcol.receiver.otlp "default" {
        http {}
        grpc {}

        output {
          traces = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.processor.batch "default" {
        output {
          traces = [otelcol.exporter.otlp.tempo.input]
        }
      }

      otelcol.exporter.otlp "tempo" {
        client {
          endpoint = "tempo-distributor.observability.svc.cluster.local:4317"

          tls {
            insecure = true
          }
        }
      }
The pipeline then takes care of batching the traces and exporting them to the Tempo service, deployed immediately afterwards.
By default, the Alloy chart exposes a Service that needs to be overridden to make the ports associated with the OpenTelemetry protocol (OTLP) available over both HTTP and gRPC.
# -- Extra ports to expose on the Alloy container.
extraPorts:
  - name: otlp-http
    port: 4318
    targetPort: 4318
    protocol: TCP
  - name: otlp-grpc
    port: 4317
    targetPort: 4317
    protocol: TCP
The configuration seems correct, so the next step is to deploy it.
Before you start, don’t forget to configure the source to retrieve the Helm charts from Grafana:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
Then proceed with the installation:
helm install alloy grafana/alloy --version 0.12.5 --namespace observability --create-namespace --values ./values-alloy.yaml
Alloy is now ready to take on your traces!
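To double-check that the OTLP ports are actually exposed, a quick look at the Service is enough (the names below match the release and namespace used in this walkthrough):

kubectl -n observability get svc alloy

The otlp-http (4318) and otlp-grpc (4317) ports defined above should appear alongside Alloy's default ones.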
Tempo#
To install Tempo with Helm, you can choose between two configurations:
A tempo chart in monolithic mode, a single binary to deploy;
Another chart broken down into microservices, tempo-distributed, ideal for managing the load on each component granularly.
To test Tempo’s operation in conditions close to reality, I’ve chosen the tempo-distributed chart. Feel free to choose the setup that suits you best.
In this configuration mode, object storage is strongly recommended. Personally, I use Minio, which is included directly as a Helm sub-chart, with a bucket to store the traces:
# Minio
minio:
  enabled: true
  mode: standalone
  rootUser: grafana-tempo
  rootPassword: supersecret
  buckets:
    # Default Tempo storage bucket.
    - name: tempo-traces
      policy: none
      purge: false
Next, the OTLP protocols must be enabled for Tempo to be able to receive traces from Alloy:
traces:
  otlp:
    http:
      # -- Enable Tempo to ingest Open Telemetry HTTP traces
      enabled: true
    grpc:
      # -- Enable Tempo to ingest Open Telemetry GRPC traces
      enabled: true
Lastly, we complete the setup by configuring storage in S3 mode using the parameters of the deployed Minio.
# To configure a different storage backend instead of local storage:
storage:
  trace:
    # -- The supported storage backends are gcs, s3 and azure, as specified in https://grafana.com/docs/tempo/latest/configuration/#storage
    backend: s3
    wal:
      path: /tmp/tempo/wal
    s3:
      bucket: tempo-traces
      endpoint: tempo-minio:9000
      access_key: grafana-tempo
      secret_key: supersecret
      insecure: true
      tls_insecure_skip_verify: true
To deploy everything, use the helm install command that you know (almost) by heart:
helm install tempo grafana/tempo-distributed --version 1.32.7 --namespace observability --create-namespace --values ./values-tempo.yaml
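Before wiring anything up, you can check that the distributor exposes the OTLP ports Alloy pushes to (the service name is the one referenced in the Alloy exporter configuration earlier):

kubectl -n observability get svc tempo-distributor

With the OTLP receivers enabled, ports 4317 (gRPC) and 4318 (HTTP) should be listed.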
Grafana#
Grafana, by far the best visualisation tool, benefits from a streamlined configuration: you only need to initialise a datasource in order to work with Tempo data:
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      # Tempo DataSource
      - name: Tempo
        uid: tempo
        type: tempo
        url: http://tempo-query-frontend:3100/
        access: proxy
        orgId: 1
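Since the previous post left us with Loki in place, the two can also be linked so that a trace jumps straight to its matching logs. The variant below is only a sketch: the loki uid is an assumption on my part, and the tracesToLogsV2 keys should be checked against the Grafana documentation for your version:

      # Hypothetical variant of the Tempo datasource with trace-to-logs enabled.
      # Assumes a Loki datasource provisioned with uid "loki"; adjust to your setup.
      - name: Tempo
        uid: tempo
        type: tempo
        url: http://tempo-query-frontend:3100/
        access: proxy
        orgId: 1
        jsonData:
          tracesToLogsV2:
            datasourceUid: loki
            spanStartTimeShift: "-5m"
            spanEndTimeShift: "5m"
            filterByTraceID: true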
Here too, the same rules apply for deployment:
helm install grafana grafana/grafana --version 8.10.4 --namespace observability --create-namespace --values ./values-grafana.yaml
Traefik for testing#
All that's left is to plug in an application whose traces we want to view. I use Traefik, a tool that can generate OTLP-format traces and ship them to a collector such as Alloy.
Why Traefik?
Traefik is an Ingress Controller, in other words the gateway that exposes my services to the outside world. I find it useful to see the routes and middlewares called for each request; it helps me check whether my configuration is correct.
The Traefik configuration needed for this is light: the aim is simply to read the basic traces it provides.
To send the traces in OTLP format, here is an extract of the settings to adopt:
## Tracing
# -- https://doc.traefik.io/traefik/observability/tracing/overview/
tracing: # @schema additionalProperties: false
  # -- Enables tracing for internal resources. Default: false.
  addInternals: true
  otlp:
    # -- See https://doc.traefik.io/traefik/v3.0/observability/tracing/opentelemetry/
    enabled: true
    http:
      # -- Set to true in order to send metrics to the OpenTelemetry Collector using HTTP.
      enabled: true
      # -- Format: <scheme>://<host>:<port><path>. Default: http://localhost:4318/v1/metrics
      endpoint: "http://alloy.observability:4318/v1/traces"
Unsurprisingly, we fill in the endpoint exposed by Alloy for the HTTP protocol.
In addition, addInternals: true enables tracing of all Traefik's internal layers, which is very useful when configuring a set of middlewares.
Finally, in order to generate traffic and produce traces, we can expose the service in NodePort mode to keep the use case simple:
service:
  enabled: true
  ## -- Single service is using `MixedProtocolLBService` feature gate.
  ## -- When set to false, it will create two Service, one for TCP and one for UDP.
  type: NodePort
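One small detail before deploying: the Traefik chart lives in its own repository, so add it first if it is not already configured:

helm repo add traefik https://traefik.github.io/charts
helm repo update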
Once again, Helm is involved in deploying the Traefik chart:
helm install traefik traefik/traefik --version 34.4.1 --namespace ingress --create-namespace --values ./values-traefik.yaml
The tools are now correctly deployed and, above all, operational! As you can see:
$ kubectl -n observability get po
NAME READY STATUS RESTARTS AGE
alloy-d7649 2/2 Running 0 72s
grafana-5b5dd98f75-dpdlm 1/1 Running 0 67s
tempo-compactor-5856cfc4b6-vf9sm 1/1 Running 3 (94s ago) 2m1s
tempo-distributor-776dd495cc-k2vdl 1/1 Running 3 (97s ago) 2m1s
tempo-ingester-0 1/1 Running 3 (86s ago) 2m1s
tempo-ingester-1 1/1 Running 3 (88s ago) 2m1s
tempo-ingester-2 1/1 Running 3 (89s ago) 2m1s
tempo-memcached-0 1/1 Running 0 2m1s
tempo-minio-568d558987-rvvjp 1/1 Running 0 2m1s
tempo-querier-6b7fb8f848-2ztqs 1/1 Running 3 (88s ago) 2m1s
tempo-query-frontend-685fcd8fb-fl8bg 1/1 Running 3 (97s ago) 2m1s
$ kubectl -n ingress get po
NAME READY STATUS RESTARTS AGE
traefik-59c8dbcb57-9gmpt 1/1 Running 0 78s
View traces#
Go to Grafana with a port-forward:
kubectl -n observability port-forward svc/grafana 8080:80
Generate a few traces by accessing the Traefik service, then, without further ado, head to the Explore section of Grafana and select Tempo as the DataSource.
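If you need a quick way to generate that traffic, hitting Traefik's NodePort a few times is enough. The commands below are only a sketch: NODE_IP is a placeholder for one of your nodes, and the web port name is the chart's default, so adjust if yours differs:

# Placeholders: adjust to your cluster. "web" is the chart's default entrypoint port name.
NODE_IP="<ip-of-one-of-your-nodes>"
NODE_PORT=$(kubectl -n ingress get svc traefik -o jsonpath='{.spec.ports[?(@.name=="web")].nodePort}')

# Send a handful of requests; they go through the entrypoint and produce spans.
for i in $(seq 1 10); do curl -s -o /dev/null "http://${NODE_IP}:${NODE_PORT}/"; done

Once the spans are in, the default search view already works, but a small TraceQL query helps narrow things down; the service.name value is Traefik's default resource attribute, so adapt it if you have overridden it:

{ resource.service.name = "traefik" && duration > 100ms }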
Here's an example from my personal setup; obviously, to get this result I've configured a few IngressRoutes and Middlewares within Traefik:
A few final words#
Tempo is the ideal tool for collecting traces of your applications. Thanks to its microservices-based architecture, it can take advantage of the scalability offered by Kubernetes while integrating seamlessly into the Grafana ecosystem.
The different formats and protocols enable Tempo to store a very wide range of data without having to use other third-party solutions.
Last but not least, it’s fairly quick to set up, thanks to ready-to-use documentation and Helm charts.