How the latest Kubernetes release now handles AI workloads – and more


SerrNovik/Getty Images

Kubernetes, the container-management orchestration system, has proven vital for modern computing, except for one area: artificial intelligence (AI) and machine learning (ML). The problem: AI/ML demands substantial CPU, memory, and GPU resources, which are not easy to manage on Kubernetes.

Now, with the latest Kubernetes release — Kubernetes 1.31, Elli — the Cloud Native Computing Foundation (CNCF) is addressing these issues.

Also: 5 ways CIOs can manage the business demand for generative AI

Elli’s improved AI features start with alpha support for Open Container Initiative (OCI) images and artifacts as a native volume source. This may not sound like much, but it enables developers to switch out large language models (LLM) as easily as they do ordinary container images. 

Elli also brings an updated dynamic resource allocation API and design to Kubernetes. This feature will help standardize accessing and managing hardware accelerators, such as GPUs, which are essential for AI and ML. It also simplifies the implementation of features such as cluster autoscaling, which — in turn — will make it easier to run AI and ML jobs on Kubernetes. 

In the past, Kubernetes had multiple ways to access a host’s underlying hardware. The updated dynamic resource allocation (DRA) provides a more straightforward way to access system resources. The old style, which uses a DRA driver controller, is still supported via “classic DRA.”

Leaving AI behind, Kubernetes also now fully supports AppArmor, a Linux kernel security module that allows system administrators to restrict programs’ capabilities with per-program profiles. This feature has reached general availability, enabling users to set AppArmor profiles for containers directly through the Kubernetes API. Implemented properly, AppArmour support will help make Kubernetes clusters and workloads more secure. 

Elsewhere on the security front, a new optional feature enables administrators to configure the endpoints so anonymous requests for access can be blocked. This will help protect clusters from Role Based Access Control (RBAC) misconfigurations that could otherwise give anonymous users broad access to the cluster.

Also: OIN expands Linux patent protection yet again (but not to AI)

As an ever-evolving open-source program, Kubernetes 1.31 also continues to streamline and modernize its codebase by dropping out-of-date features. This includes: 

  • Removal of in-tree cloud provider integrations: Completing a process that began with version 1.26, Kubernetes 1.31 removes all in-tree cloud provider integrations. This move is part of an ongoing effort to ensure Kubernetes remains a vendor-neutral platform. You can still integrate your Kubernetes clusters with a specific cloud provider. However, you won’t be able to do so from inside Kubernetes. You’ll have to use the recommended approach via external integration.
  • Deprecation of non-CSI volume limit plugins: The release deprecates all non-CSI volume limit scheduler plugins, aligning with Kubernetes’ strategy to transition to the Container Storage Interface (CSI) for all storage-related functionalities.
  • Deprecation of Cgroups v1: Cgroups is a Linux kernel feature that enables you to allocate, prioritize, and manage system resources among processes. From here on out, cgroups v2 is what you should be using. 

Put it all together and this release is about much more than making Kubernetes AI-friendly.  Angelos Kolaitis, a Canonical senior software engineer and team leader for the Kubernetes 1.31 release, said this new release is about “taking the complexity and the implementation-specific details out of the code of Kubernetes, have Kubernetes focus on the desired state and leave all of the implementations, all of the extra source outside of it.”

Also: What is Kubernetes, and why is it so important?

Beyond the new features, Kolaitis said, “You don’t have to do anything, but Kubernetes will be taking extra steps to make sure that your workloads are running, and are reliable, and can be counted on to have your production services not breaking.” 

New features are nice, but stability is golden. For that reason alone, I recommend exploring an update to Kubernetes 1.31 sooner rather than later. 





Source link