Before getting started…#
After a year off, I’m back for KubeCon and CloudNativeCon, taking place this year in London! As always, this event brings together enthusiasts and contributors of Cloud Native technologies, and everyone working closely or remotely with Kubernetes.
From 1 to 4 April, I was fortunate enough to take part in this major conference, thanks to my company, SoKube, to whom I’m sincerely grateful.
This 2025 edition made its mark with some very trendy topics, notably Artificial Intelligence, a theme that was omnipresent across the keynotes, stands and talks.
Observability was also very much present, along with OpenTelemetry, Platform Engineering and, last but not least, security, still firmly in the spotlight.
To take a fresh approach compared with my write-ups from recent years, in this post I’d like to share the key announcements from the event, along with the talks that made the biggest impression on me.
Opening keynote#
KubeCon kicked off with the traditional Keynote, and Chris Aniszczyk (CTO of the Cloud Native Computing Foundation) was the first to take to the stage.
A few statistics were given: this year’s event attracted a total of more than 12,500 people from all over the world. An impressive figure, reflecting an increasingly active and diverse open source community.
This return to London also had a symbolic dimension, since the very first European KubeCon was held there in 2016.
10 years of CNCF#
The Keynote started with a historical reminder of how it all began: Kubernetes officially became the first CNCF project in March 2016. Today, nearly 9.2 million developers identify as Cloud Native, showing just how much adoption of these technologies has accelerated in less than a decade.
To mark the CNCF’s 10th anniversary, a number of statistics and visualisations were shared, including a map showing the activity of the open source projects created and maintained under its umbrella.
The message was clear: Maintainers are the key. Indeed, maintainers are the beating heart of the ecosystem, and their role is more crucial than ever in the evolution of the CNCF.
And that sets the tone for what’s to come!
Golden Kubestronaut#
The Kubestronaut program is evolving with the arrival of a new level: Golden Kubestronaut.
The base Kubestronaut title requires the CKAD, CKA, CKS, KCNA and KCSA certifications.
For the Golden level, you also need to obtain all the other CNCF certifications, such as Prometheus Certified Associate and Certified Argo Project Associate, not forgetting the LFCS (Linux Foundation Certified System Administrator).
As you can see, this represents a real achievement for certification enthusiasts!
European sovereignty with NeoNephos#
NeoNephos is an initiative aimed at developing a sovereign cloud for Europe, with the main objective of building on CNCF projects while guaranteeing European regulatory standards.
This is a strong signal of the growing importance of digital sovereignty and technological independence in the distributed cloud.
Upcoming events#
The locations of the next two KubeCon Europe events have been revealed:
- For 2026, the venue will be Amsterdam;
- And 2027 will take place in Barcelona!
And let’s not forget that KubeCon will also be held in India in the coming months.
Let’s move on to the technical topics covered at the Keynote!
Scheduling and dynamic resource allocation#
And it got off to a great start with a presentation of DRA (Dynamic Resource Allocation), which makes it possible to allocate hardware resources such as GPUs to one or more Pods.
This section ended with the presentation of a new open source project by Google: Dynamic Multi-Cluster Orchestrator, which enables different types of workload to be managed across multiple clusters.
More information will soon be available on the GitHub code repository.
The aim of this project is to optimise resource placement across a fleet of clusters, scheduling workloads according to remaining capacity in order to avoid waste.
Finally, Argo CD is also part of the story, with a plugin that leverages this multi-cluster orchestrator to handle application deployment across all the clusters.
Observability with a mix of AI at eBay#
eBay has shared its experience of observability management within its application system, comprising 4,600 microservices!
A volume that makes manual alert triage ineffective. That’s why they have set up an AI-based anomaly detection system that not only analyses traces, metrics and logs, but also recommends the actions to take.
This is a real demonstration of a use case that uses AI to improve the reliability of an information system.
The talk was also an opportunity to introduce OpenTelemetry, a very rich toolbox for improving application observability. Perhaps eBay’s anomaly detection will be integrated into that tool in a future version?
A UI for Kubernetes#
Headlamp is set to become the official graphical interface for Kubernetes, letting you connect to and control one or more clusters.
Whether via the browser or the desktop client, it will be possible to manage all Kubernetes objects and to set permissions for granting access.
In addition to the visualisation aspects, it will be possible to execute commands, edit resources and even add plugins to control additional tools such as Flux.
Rust in the Linux kernel#
Another keynote topic was the gradual introduction of Rust into the Linux kernel. After a brief overview of the kernel’s architecture, the speakers recalled some historical limitations inherent to C, particularly in terms of memory management and concurrency, aspects that are frequently the source of critical bugs.
With the introduction of Rust, the promise is clear: enhanced memory safety without compromising performance. Thanks to its compile-time verification mechanisms, ownership model and resource-management guards, Rust drastically reduces the classic errors linked to locking and explicit allocation.
“Rust can fail like C, but more safely.”
Currently, around 25,000 lines of Rust code are integrated into the Linux kernel.
However, the introduction of Rust is not without friction. It demands reworking existing C APIs for compatibility and best practices, posing a cultural shift for long-time C developers.
To sum up, Rust in the kernel is both a cultural and a technical transition, but one that seems to be on the right track.
Artificial intelligence for sign language with Kubeflow#
The keynote continued with a demonstration of an AI/ML pipeline using Kubeflow, centred on a sign language recognition application.
Here, Kubeflow is used to handle the pre-processing of data from cameras strategically placed to capture gestures. Using the power of Kubernetes, the system scales as needed to support the transcription of this visual language.
Regarding the technical challenges, two stand out in particular:
- Adapting data transformation to a specific context, with constraints linked to the physical environment;
- Managing the location and coordination of cameras to accurately capture hand movements based on body position.
Kubernetes is the tool that enables this technical feat to be achieved, thanks to its capabilities:
- Resource optimisation to allocate GPUs when needed (see the sketch after this list);
- Scaling: adding workloads as required;
- Orchestration of the various stages: from data collection to inference (predictions or precise decisions).
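To make the GPU-allocation point concrete, here is a minimal sketch of a Pod requesting a GPU through the classic device-plugin mechanism; the image name is purely illustrative, and the node is assumed to run the NVIDIA device plugin:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  containers:
    - name: model
      # Illustrative image name, not the project's actual one
      image: registry.example.com/sign-language-inference:latest
      resources:
        limits:
          # Requires the NVIDIA device plugin on the node
          nvidia.com/gpu: 1
```

Kubernetes then schedules the Pod only onto a node with a free GPU, which is exactly the kind of resource optimisation described above.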
OpenTelemetry, the observability standard#
The Keynote wrapped up by highlighting a fast-growing theme: monitoring as code, at the crossroads of observability and automation.
OpenTelemetry is the observability project with the wind in its sails; in fact, it’s the second most active project in the CNCF, just behind Kubernetes. It centralises, structures and unifies the collection of metrics, traces and logs across different languages. A sort of big toolbox for developers.
This tool is designed to make the data accessible, easy to interpret, and most importantly, actionable for all teams.
OpenTelemetry comes with an auto-instrumentation mechanism for retrieving traces, for example, without needing to modify the underlying code or binary. A very fine demonstration was given using an application in Java and Go, and the result was quite magical.
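To give an idea of how this works in practice, here is a minimal sketch based on the OpenTelemetry Operator’s Instrumentation resource, which injects the Java agent without modifying the application; the Operator is assumed to be installed, and the collector endpoint is a placeholder:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
spec:
  exporter:
    # Placeholder endpoint: points to your OpenTelemetry Collector
    endpoint: http://otel-collector:4317
  java: {}
```

Opting a workload in is then just a matter of adding the annotation instrumentation.opentelemetry.io/inject-java: "true" to the Pod template; the Go path works differently (eBPF-based agent) and needs extra configuration.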
Ultimately, the aim is to reduce the developer’s workload while making applications more observable, a very good step towards greater visibility.
The last tool presented was Perses, a Sandbox project within the CNCF for creating fully configurable dashboards as code, embracing the GitOps philosophy.
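As a rough idea of what dashboards as code look like, here is a sketch in the spirit of Perses’ declarative format; the schema is still evolving, so the names and structure below are illustrative rather than authoritative:

```yaml
kind: Dashboard
metadata:
  name: api-overview   # illustrative name
  project: platform    # Perses groups dashboards into projects
spec:
  duration: 1h
  panels:
    latency:
      kind: Panel
      spec:
        display:
          name: p99 latency
        plugin:
          kind: TimeSeriesChart
          spec: {}
        queries:
          - kind: TimeSeriesQuery
            spec:
              plugin:
                kind: PrometheusTimeSeriesQuery
                spec:
                  # Illustrative PromQL query
                  query: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
```

Versioned in Git and applied through CI, this is exactly the GitOps philosophy mentioned above.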
A future competitor for Grafana?
My favourite talks#
Here are a few talks that I really enjoyed and that I encourage you to watch again.
Day-2’000 - Migration From Kubeadm+Ansible To ClusterAPI+Talos: A Swiss Bank’s Journey#
By Clément Nussbaumer from PostFinance
The first session offered feedback from a Swiss bank on an important topic: modernising Kubernetes cluster creation towards the Cluster API standard.
The talk began with a presentation of the current infrastructure:
- 35 Kubernetes clusters in production;
- 450 nodes spread across two on-premises datacentres;
- Aging clusters with more than 2,000 days on the clock (i.e. more than 5 years!).
All of this with the classic combo: Terraform for infrastructure management, Ansible and Puppet for machine configuration, and Argo CD for deploying resources within Kubernetes.
But here’s the thing… the idea was to bring the flexibility of Cluster API to manage clusters and their lifecycle while benefiting from the immutability of Talos Linux to adhere to rolling update practices and avoid “patching” the infrastructure.
The choice of this operating system was based on its great simplicity:
- Minimalist OS, just a few binaries to run Kubernetes;
- Secure mTLS connection between the talosctl command-line tool and the nodes;
- And, as mentioned above, the operating system is immutable and intended to remain ephemeral.
The migration process remains challenging and is currently still undergoing testing, but Clément took the opportunity to give us some feedback on this change of technology:
- Align configurations: the service-account-issuer parameter and the etcd encryption key must be identical between Talos and kubeadm;
- Import existing certificates (PKI), including the Talos-specific key from the endpoint;
- Create the Cluster API custom resources, with one namespace per cluster (see the sketch after this list);
- Join old nodes to new ones via Cluster API, gradually replacing all existing nodes.
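To give an idea of what the target setup looks like, here is a minimal sketch of the Cluster API resources involved, assuming the Sidero Labs Talos control-plane provider; all names, versions and the infrastructure provider are illustrative, not PostFinance’s actual configuration:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: payments        # illustrative cluster name
  namespace: payments   # one namespace per cluster, as recommended in the talk
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: TalosControlPlane
    name: payments-cp
  infrastructureRef:
    # Provider-specific: depends on the on-premises platform in use
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster   # assumption, for illustration only
    name: payments-infra
```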
A spectacular demonstration of node deployment via the Cluster API rounded off the session. A very interesting talk and a very rich experience!
More Nodes, More Problems: Solving Multi-Host GPU/TPU Scheduling With Dynamic Resource Allocation#
By John Belamaric and Morten Torkildsen from Google
With the explosion of AI/ML workloads, computing power requirements are constantly growing. This talk tackled a well-known issue for Kubernetes infrastructures: how to efficiently schedule resource-intensive jobs on multi-node architectures with GPUs or TPUs, while ensuring optimal performance, reliability and resilience.
To begin with, deploying AI in Kubernetes highlights certain issues:
- Artificial Intelligence workloads require nodes with large amounts of resources, which inevitably increases the risk in the event of failure;
- Need for compact placement: Pods need to be deployed on nodes close to each other to take advantage of the hardware topology;
- Concurrent job orchestration without race conditions.
This is why DRA (Dynamic Resource Allocation), recently added to Kubernetes, will be able to meet these various challenges with a fairly new approach:
- It has an API for describing devices (e.g. Nvidia GPUs);
- It offers new objects: ResourceClaim, ResourceClaimTemplate, DeviceClass and ResourceSlice; this presentation focused on the first of these, used to express resource requirements (see the sketch after this list);
- The Scheduler is able to make a decision based on the ResourceClaim definitions linked to the Pods;
- The Kubelet applies the decision made above.
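As a concrete illustration, here is a minimal sketch of a ResourceClaim and a Pod consuming it, roughly as it looks in Kubernetes 1.32; the API group is still evolving (alpha/beta depending on your version), and the DeviceClass name assumes the NVIDIA DRA driver:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
      - name: gpu
        # Assumes the NVIDIA DRA driver publishes this DeviceClass
        deviceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  resourceClaims:
    - name: gpu
      resourceClaimName: single-gpu
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # illustrative image
      resources:
        claims:
          - name: gpu
```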
Currently, the Scheduler is based on a first-fit logic, though a transition to a scoring-based system is a possibility for the future.
In addition, the speakers highlighted what happens when a component fails within a node: a NoExecute taint is applied, resulting in all the Pods being evicted and replaced together on another available TPU or GPU device.
Finally, version 1.33 of Kubernetes will enable a GPU to be split into several smaller units, a feature eagerly awaited by the community!
I found it very interesting to understand the mechanisms within DRA, but also to get a glimpse of future developments.
Beyond Security: Leveraging OPA for FinOps in Kubernetes#
By Sathish Kumar Venkatesan from DevOpsCloudJunction Foundation Inc.
Visibility and optimisation of cloud costs have become a central issue in large-scale Kubernetes environments. In this talk, Sathish presented an unusual approach: repurposing OPA (Open Policy Agent), usually used for security, in order to track costs.
Some statistics were given, with an average of 32% of cloud resources being wasted!
There are several reasons for this: over-provisioned workloads, under-utilised nodes, missing quotas and inactive resources. The end result is Kubernetes clusters that cost a lot for moderate use.
To avoid this wastage, the proposed solution is simple: express cost-governance constraints in OPA using the Rego language, an approach fully compatible with GitOps.
A few rules are suggested for putting this into place:
- Enforce tag compliance to improve cost-allocation visibility (see the sketch after this list);
- Optimise compute according to workload type;
- Cap cost allocations per namespace;
- Optimise storage according to the data it holds;
- Force the use of Spot instances for development environments;
- Tie expenditure to fixed budgets.
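To illustrate the first rule, here is a minimal sketch of a Gatekeeper ConstraintTemplate enforcing the presence of cost-allocation labels; it assumes OPA Gatekeeper is installed, and the names are illustrative:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredcostlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredCostLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredcostlabels

        # Flag any object missing one of the required cost labels
        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("missing cost-allocation label: %v", [required])
        }
```

A matching Constraint then applies the template to the relevant resources, passing something like labels: ["cost-center", "team"] as parameters.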
The approach is intended to be iterative, with the need to constantly measure efficiency and user feedback.
Finally, the OpenCost tool was mentioned as a reference solution for monitoring cost allocation.
This talk emphasises how a FinOps approach can be integrated using Policy as Code tools such as OPA/Gatekeeper or Kyverno, which help establish a standardised framework from the moment a cluster is set up.
Let’s finish with a talk on security…
Encryption, Identities, and Everything in Between; Building Secure Kubernetes Networks#
By Lior Lieberman from Google and Igor Velichkovich from Stealth Startup
In this security- and network-oriented talk, Lior Lieberman and Igor Velichkovich gave a comprehensive overview of the challenges of managing network restrictions in Kubernetes.
Their main message was clear: to go beyond the current limitations, the identity assigned to Pods must become the keystone of network policies.
The topic began with an observation made during troubleshooting phases: the well-known debug Pod with extended network access that everyone forgets to delete. This practice can introduce multiple security vulnerabilities, thereby expanding the attack surface.
Several incidents in different sectors have shown that this practice can lead to data leaks or breaches of compliance.
It should therefore be avoided at all costs… especially without network security.
Speaking of network policies, a quick state of the art was given on what Kubernetes offers today:
NetworkPolicy v1 was presented as the standardised solution, compatible with the majority of CNIs and designed with developers in mind (a minimal example follows the list below).
However, a number of limitations quickly become apparent:
- Deny is implicit, which is sometimes restrictive;
- Lack of global cluster policies for administrators;
- The rules work with IPs or labels, but not with identities.
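As a reminder, here is what a minimal NetworkPolicy looks like; note that it matches Pods purely by labels, and that once Pods are selected, anything not explicitly allowed is implicitly denied (names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend   # labels, not identities
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
```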
One solution to overcome some of these limitations is based on AdminNetworkPolicy or BaselineAdminNetworkPolicy.
These are delivered as CRDs and support explicit Deny rules, enabling tenants to be isolated within the same cluster.
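To make this concrete, here is a sketch of an AdminNetworkPolicy with an explicit Deny; the API is still alpha, so the exact schema may shift, and the tenant labels are illustrative:

```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: isolate-tenant-a
spec:
  priority: 10
  subject:
    namespaces:
      matchLabels:
        tenant: a
  ingress:
    - name: deny-from-other-tenants
      action: Deny   # explicit Deny, impossible with NetworkPolicy v1
      from:
        - namespaces:
            matchExpressions:
              - key: tenant
                operator: NotIn
                values: ["a"]
```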
But they still don’t manage identities…
But why would you want to implement identity-based policies at all costs?
The two speakers pointed out that IPs are hopelessly ephemeral, especially in a world where infrastructure is constantly being destroyed and recreated! As for labels, anyone can change them on a running Pod, whereas attaching an identity through a service account would make things immutable!
Nevertheless, Istio really stands out here with its AuthorizationPolicy feature. It not only lets you manage service accounts as identities, but also gives you fine-grained control to allow or block actions at the application layer (Layer 7) of the OSI model. Cilium also covers this Layer 7 aspect.
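Here is a minimal sketch of what that identity-based, Layer 7 control looks like with Istio, allowing only one service account to call the backend on specific routes; namespaces, names and paths are illustrative:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-identity
  namespace: backend
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            # SPIFFE-style identity derived from the service account
            principals: ["cluster.local/ns/frontend/sa/frontend-sa"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/api/*"]
```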
The point is… there is a serious lack of standardisation, and NetworkPolicy as it is known today is subject to future changes to complete its Zero Trust transition.
This was an extremely interesting talk that gave us a glimpse of the future security ambitions of Kubernetes’ network mechanisms.
A few words to conclude this wonderful event#
KubeCon 2025 in London confirmed what many had been expecting: the Kubernetes ecosystem has entered a new phase of maturity, particularly when it comes to embracing Artificial Intelligence!
Innovative solutions around automation, resource management, cost optimisation and security highlight Kubernetes’ determination to become the go-to platform for all types of workload!
But beyond the technical side of things, KubeCon is an exciting event with a very active and passionate community who love talking to each other and driving forward the future of Cloud Native and Kubernetes.
I can’t wait for the next edition!