Post update:
March 3, 2025: Following the announcement of the Promtail deprecation, the blog post has been amended to use Alloy, its successor.
Introduction#
Observability is a key element for ensuring the reliability and performance of deployed applications, in particular for understanding their behaviour and identifying any problems encountered.
This becomes essential in a containerised environment, where there are several layers to monitor, from the application itself to the hosting infrastructure, possibly including an orchestration solution such as Kubernetes.
I’ve previously discussed observability on this blog, particularly focusing on monitoring with Prometheus (French blog post), covering metrics collection and alert definition.
Yes, but what’s behind the word “observability”?
But what is observability?#
Observability is based on three fundamental pillars: metrics, logs and traces:
- Metrics represent indicators aggregated over a period of time, for example CPU usage or the number of requests with HTTP 5xx codes over the last 10 minutes;
- Logs model application activity in the form of time-stamped records;
- Traces, the final pillar, enable you to follow the path of a request through different application or non-application components.
Writing logs is often overlooked, but they are much more than simple text messages. They bear witness to every event or error that occurs in an application or infrastructure.
Obviously this depends on how the developers of the application or tool have built these logs. There are several points to bear in mind:
- Think about the person who will operate the application, the tool or even the infrastructure, and write clear, explicit messages;
- Provide different levels (INFO, WARNING, DEBUG, and so on), with the ability to select the relevant level without consuming unnecessary storage;
- Avoid storing sensitive information such as credit card numbers or ID card identifiers.
This is not an exhaustive list, of course, but creating logs is a major challenge when it comes to studying the behaviour of a system, even more so in a containerised world where the distributed and ephemeral nature of these elements poses a number of challenges.
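To make this a little more concrete, here is a purely illustrative structured log entry that follows these guidelines (the field names are my own and not a standard imposed by any particular tool):
{
  "timestamp": "2025-03-03T10:42:17Z",
  "level": "WARNING",
  "service": "checkout",
  "message": "payment provider response exceeded the 2s threshold",
  "request_id": "f3a1c2d4",
  "duration_ms": 2310
}
A message like this is clear for the person operating the system, carries a level that can be filtered on, and contains no sensitive data.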
Kubernetes and logs: a major challenge#
Container orchestration with Kubernetes is now a standard way of deploying and managing applications. However, the logging part can quickly become complex.
These valuable logs can come from a variety of sources:
- Containers: An essential part of this approach, each container generates its own logs, reflecting the activity of the application it hosts;
- Pods or other Kubernetes objects: One level above containers, Pods can have specific logs linked to their lifecycle, accessible for example via kubectl get events or kubectl describe po ...;
- The components of the control plane: Components such as the API Server, the Scheduler and the Controller Manager generate logs that are crucial for understanding the state of the cluster and the actions it takes;
- The cluster’s nodes: In other words, the physical or virtual machines that make up the cluster, which also produce system logs, notably from installed components such as the kubelet, accessible via systemd.
As mentioned above, containers are highly ephemeral, and Kubernetes introduces additional complexity by managing scaling and ensuring high availability. It achieves this by distributing multiple instances of the same container across different machines, among other strategies.
A Pod can be created, destroyed and recreated in a matter of seconds, taking all its logs with it if they are not properly collected and centralised.
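Without a collection pipeline, kubectl only lets you read the logs of the current container and, at best, its previous instance, which quickly shows its limits (the names below are placeholders):
kubectl -n my-namespace logs my-app-7d9c4b6d5f-xk2ld
# Logs of the previous instance of the container, if it was restarted
kubectl -n my-namespace logs my-app-7d9c4b6d5f-xk2ld --previous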
The second challenge is, inevitably, storage and indexing: keeping these logs without exponentially increasing disk space or drastically reducing performance.
That’s why choosing a tool built for these constraints is a crucial step. So it’s time to introduce Loki!
Loki#
Here we go with Loki#
Loki from Grafana Labs describes itself as a high-performance solution for managing logs. It is designed to be highly available and multi-tenant, while taking its inspiration from Prometheus, a standard in metrics handling.
In fact, Loki offers a log query language called LogQL that is very similar to PromQL, a system for defining alerts and, last but not least, seamless integration with Grafana to provide a graphical interface.
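To give an idea of that similarity, here are two illustrative LogQL queries (the label values are assumptions for the sake of the example): a simple text filter, then a PromQL-style metric query.
# All log lines from the observability namespace containing "error"
{namespace="observability"} |= "error"

# PromQL flavour: per-Pod rate of error lines over the last 5 minutes
sum by (pod) (rate({namespace="observability"} |= "error" [5m]))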
It is also referenced by the Cloud Native Computing Foundation (CNCF) as part of the observability family.
CNCF observability in February 2025
Traditional log management tools like Elastic Stack index entire message contents, allowing for fast text searches. However, this comes at the expense of high resource consumption, especially CPU usage.
This is why Loki takes a different approach: it indexes only the metadata of logs, in the form of labels, rather than the content itself, and the resource savings should not be underestimated.
Labels and Kubernetes are a love story, so you’ll be able to find custom labels for your objects directly within Loki, as well as other very useful information:
- The namespace
- The name of the Pod
- The name of the container
- The node name
- The log level
- The captured output (stdout, stderr, etc.)
Sample of a log entry picked up by Loki
Loki can be installed in a number of different ways, but it has been designed to be natively easy to scale thanks to its highly decomposed architecture.
Architecture#
The architecture for setting up a logging platform with Loki is based on three main components:
- Alloy is the agent that collects logs, attaches labels to them and sends them to Loki;
- Loki centralises storage and indexing, based on labels. The chart also includes an optional object store, MinIO, to manage large-scale storage; other backends such as AWS S3, Google Cloud Storage or Azure Blob Storage can be selected instead;
- Grafana: the visualisation tool par excellence, with a huge community library of ready-to-use dashboards.
Loki has several separate components:
- Distributor is responsible for handling incoming push requests from clients;
- Ingester is responsible for data persistence and transfer to long-term storage;
- Query Frontend is an optional service that provides the Querier’s API endpoints and can be used to accelerate the read path;
- Querier is responsible for executing Log Query Language (LogQL) queries;
- Index Gateway is responsible for processing and serving metadata requests. The recommended index format is TSDB (Time Series Database), the same as Prometheus;
- Compactor compacts the multiple index files produced by the Ingesters and shipped to object storage into a single index file per day;
- Ruler manages and evaluates the defined rule expressions and/or alerts.
If you’d like to find out more about the technical specifications, I recommend consulting this excellent article in the official documentation.
Installation modes#
Loki offers different installation modes to adapt to the complexity of your use case, but also to give you the flexibility you need to address scalability issues:
- Single Binary: Very interesting for environments that do not require high availability, with a maximum of a few tens of GB per day. All components are contained in a single binary;
- Distributed: Useful for large-scale deployments, Loki is split into micro-services (ingester, querier, distributor, etc.), enabling fine-grained scalability and providing high availability. This mode is reserved for very large volumes of logs, up to 1 TB per day;
- Simple Scalable: The best of both worlds between the two previous modes, Loki is grouped into three “big” components: read, write and backend, making it easy to manage while retaining the characteristics of Distributed mode in terms of availability and log ingestion.
Clearly, the choice will depend on your needs, although the Simple Scalable mode seems to meet most expectations while showcasing the strengths of Loki described above.
To find out more, you can check out the different diagrams for each mode at this address.
It’s time to take action!
Get your keyboards ready!#
To install the whole thing, there’s nothing better than a ready-to-use Git repository!
Set up and configure Loki with Alloy and Grafana to analyse logs in your Kubernetes cluster.
Feel free to download the content and adjust the three values files as needed for installing Loki, Alloy, and Grafana.
Loki#
First step, Loki, and now we have to make a choice about the installation mode. Personally, I’ve opted for SimpleScalable because it’s a good compromise for exploiting Loki’s strengths when needed.
deploymentMode: SimpleScalable
For this mode, Loki requires object storage, so Minio comes into play via a sub-chart that can be activated:
# -- Configuration for the minio subchart
minio:
  enabled: true
For those unfamiliar with this tool, MinIO is an object storage system compatible with the AWS S3 API, enabling simple and effective scaling on both on-premises and cloud infrastructures.
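If you would rather skip MinIO and point Loki at an external bucket, the chart exposes a storage section for that purpose; a minimal sketch, assuming an existing S3 bucket (bucket names and region are placeholders):
loki:
  storage:
    type: s3
    bucketNames:
      chunks: my-loki-chunks
      ruler: my-loki-ruler
      admin: my-loki-admin
    s3:
      region: eu-west-1
In that case, the minio sub-chart can simply stay disabled.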
Don’t forget to define the schema configuration that Loki will use to store the data:
loki:
  [...]
  schemaConfig:
    configs:
      - from: 2025-02-01
        object_store: s3
        store: tsdb
        schema: v13
        index:
          prefix: index_
          period: 24h
It is also possible to specify the number of replicas for each of the three components of the SimpleScalable mode:
# -- Configuration for the write pod(s)
write:
  # -- Number of replicas for the write
  replicas: 1
# -- Configuration for the read pod(s)
read:
  # -- Number of replicas for the read
  replicas: 1
# -- Configuration for the backend pod(s)
backend:
  # -- Number of replicas for the backend
  replicas: 1
For demonstration purposes, I’ve only put in one replica, but you can easily adapt this to your requirements, especially in a high-availability context.
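As a purely indicative example, a more resilient setup could simply raise those numbers, for instance:
write:
  replicas: 3
read:
  replicas: 3
backend:
  replicas: 3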
With the Loki file now configured, the next step is deployment. Before proceeding, ensure you have set up the source for retrieving Grafana charts.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
Then proceed with the setup:
helm install loki grafana/loki --version 6.27.0 --namespace observability --create-namespace --values ./values-loki.yaml
A few seconds later…
$ kubectl -n observability get po
NAME READY STATUS RESTARTS AGE
loki-backend-0 2/2 Running 0 110s
loki-canary-688x8 1/1 Running 0 110s
loki-chunks-cache-0 2/2 Running 0 110s
loki-gateway-c589f7fd-zskwb 1/1 Running 0 110s
loki-minio-0 1/1 Running 0 110s
loki-read-64787b5cff-6c4b8 1/1 Running 0 110s
loki-results-cache-0 2/2 Running 0 110s
loki-write-0 1/1 Running 0 110s
That sounds pretty good, so we can move on!
Alloy#
Alloy, the log collector, is very easy to configure:
# -- Overrides the chart's computed fullname. Used to change the full prefix of
# resource names.
fullnameOverride: "alloy"

## Various Alloy settings. For backwards compatibility with the grafana-agent
## chart, this field may also be called "agent". Naming this field "agent" is
## deprecated and will be removed in a future release.
alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap. This is passed into `tpl` allowing for templating from values.
    content: |-
      [...]
      loki.write "default" {
        endpoint {
          url = "http://loki-gateway/loki/api/v1/push"
          tenant_id = "1"
        }
        external_labels = {}
      }

  mounts:
    # -- Mount /var/log from the host into the container for log collection.
    varlog: true
It is imperative to specify the gateway used to push the logs into Loki, along with the tenant_id, because by default Loki is multi-tenant and isolates data per tenant. You can deactivate this behaviour by setting the attribute auth_enabled: false on the Loki side and removing this information from the file above.
Within the configMap, the configuration can be edited to add, replace or remove labels. Personally, I converted the Promtail configuration from my previous installation. For further details, you can refer to this link, which provides a step-by-step explanation.
Finally, make sure you mount the directory that contains the logs (mounts.varlog), or you won’t have any data to collect.
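To give you an idea of what the elided part of the content block can contain, here is a deliberately simplified sketch (not the exact configuration from the repository) of a pipeline that discovers Pods, keeps a few Kubernetes labels and forwards the logs to the loki.write block shown above:
// Discover the Pods running in the cluster
discovery.kubernetes "pods" {
  role = "pod"
}

// Keep only the metadata we want to expose as Loki labels
discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
}

// Tail the container logs and forward them to Loki
loki.source.kubernetes "pods" {
  targets    = discovery.relabel.pods.output
  forward_to = [loki.write.default.receiver]
}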
Now it’s time to deploy Alloy:
helm install alloy grafana/alloy --version 0.12.1 --namespace observability --create-namespace --values ./values-alloy.yaml
Alloy runs as a DaemonSet, which makes sense for retrieving logs from all the nodes in the cluster. In my case, I have two Pods for two nodes:
$ kubectl -n observability get po
NAME READY STATUS RESTARTS AGE
[...]
alloy-jhlvq 1/1 Running 0 94s
alloy-vsbzb 1/1 Running 0 94s
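To check that the collector has started its pipeline correctly, a quick look at the Pod logs is a good first reflex (the output will obviously depend on your configuration):
kubectl -n observability logs daemonset/alloy --tail=20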
Grafana#
Then there’s the visualisation part, probably the most important for making sense of this blog post, but above all for consulting and visualising these precious logs!
Grafana needs a data source to read the contents of Loki:
## Configure grafana datasources
## ref: http://docs.grafana.org/administration/provisioning/#datasources
##
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      # Loki DataSource
      - name: Loki
        uid: loki
        type: loki
        url: http://loki-read:3100/
        access: proxy
        orgId: 1
        jsonData:
          httpHeaderName1: 'X-Scope-OrgID'
        secureJsonData:
          httpHeaderValue1: '1'
So we add Loki as a data source, pointing at the loki-read service to retrieve the content we need. Don’t forget the HTTP header required by Loki’s multi-tenant configuration.
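If you want to verify the read path before opening Grafana, you can also query Loki’s HTTP API directly; the example below assumes the default service ports of the chart, so adjust them if yours differ:
kubectl -n observability port-forward svc/loki-read 3100:3100
# In another terminal: list the labels known to Loki for tenant "1"
curl -s -H "X-Scope-OrgID: 1" http://localhost:3100/loki/api/v1/labels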
Second step: retrieve one (or more) of the community dashboards available on the Grafana site.
Simple and effective, I chose this one. It lets you filter by namespace, container and stream, with the option of making a custom query.
To do this, you need to define a provider in Grafana that places this dashboard in a folder:
# Configure grafana dashboard providers
## ref: http://docs.grafana.org/administration/provisioning/#dashboards
##
## `path` must be /var/lib/grafana/dashboards/<provider_name>
##
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: 'Logs'
        orgId: 1
        folder: 'Logs'
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/logs
And, above all, ask Grafana to retrieve the dashboard using the associated ID and revision:
## Configure grafana dashboard to import
## NOTE: To use dashboards you must also enable/configure dashboardProviders
## ref: https://grafana.com/dashboards
##
## dashboards per provider, use provider name as key.
##
dashboards:
  logs:
    loki:
      gnetId: 15141
      revision: 1
      datasource: Loki
To keep things simple, I’ve chosen to disable authentication:
## Grafana's primary configuration
## NOTE: values in map will be converted to ini format
## ref: http://docs.grafana.org/installation/configuration/
##
grafana.ini:
  auth.anonymous:
    enabled: true
    org_role: Admin
  auth:
    disable_login_form: true
Well, well! It’s time to deploy this last piece of the log platform:
helm install grafana grafana/grafana --version 8.10.1 --namespace observability --create-namespace --values ./values-grafana.yaml
Everything looks good here too:
$ kubectl -n observability get po
NAME READY STATUS RESTARTS AGE
grafana-7dccdbb78-2bbjm 1/1 Running 0 60s
View logs#
If, like me, you have configured Grafana without an Ingress, the easiest way to access it is to do a port-forward:
kubectl -n observability port-forward svc/grafana 8080:80
Once in your browser, connect to localhost:8080 and go to Dashboards > Logs > Loki Kubernetes Logs.
And bingo! You’re back on the dashboard you set up earlier, with the option of viewing your logs!
View from Grafana
A final word#
Loki, along with Alloy and Grafana, forms the ideal trio for collecting and displaying the logs needed to study behaviour within Kubernetes.
Loki is easy to deploy and can be configured as required, making it a reliable and robust solution that fits perfectly into the Cloud Native observability ecosystem initiated by Prometheus.
I look forward to seeing you next month for an article on the final pillar of observability: traces!