HAMi: Heterogeneous GPU Sharing on Kubernetes

Shriira Press

Preface

A single GPU is far more powerful than most workloads need, yet Kubernetes hands the whole card to one container at a time. HAMi slices that card into fractions and shares it safely across many pods.

Welcome to HAMi: Heterogeneous GPU Sharing on Kubernetes.

HAMi (Heterogeneous AI Computing Virtualization Middleware, formerly known as k8s-vGPU-scheduler) is a CNCF sandbox project that fixes one of the most expensive habits in cloud-native AI: leaving accelerators idle. Kubernetes treats a GPU as an indivisible resource. A notebook that needs two gigabytes of memory therefore still locks an entire eighty-gigabyte card, and a cluster full of such pods burns money while its silicon sits half-asleep.

HAMi changes the unit of allocation. It combines a scheduler extender, a mutating webhook, per-vendor device plugins, and an in-container CUDA interception library called HAMi-core. Together they let a pod request a slice of a GPU (a fixed amount of memory, a percentage of compute cores, or a fraction of a device). HAMi then enforces those limits as hard caps, without any change to the application's code. It does this across NVIDIA, Cambricon, Hygon, Iluvatar, Moore Threads, Huawei Ascend, MetaX, and more, presenting a single scheduling model over very different hardware.

This book teaches HAMi from the ground up. It covers the underutilization problem and what HAMi is, how HAMi's components fit together, and how HAMi-core virtualizes memory and compute inside the container. It then explains how the scheduler places fractional requests, how to write pod specs and choose devices, scheduling policies and MIG support, observability, and finally how to run HAMi well in production. Across eight focused chapters, clear diagrams turn GPU sharing from a mystery into something you can reason about.

This title is part of the Shriira library and is free to read in full, right here. It is our small contribution to making world-class knowledge easy to reach.

A note on reading it: open the Contents menu at the top of the reader to jump between chapters, use the Aa menu to set a comfortable text size, theme (light, sepia, or night), and single- or two-page layout. Your place is saved automatically, so you can always pick up where you left off.

We hope it serves you well.

— Shriira Press

Contents

  1. Chapter 1 — What HAMi Is
  2. Chapter 2 — Architecture and Components
  3. Chapter 3 — HAMi-core and In-Container Virtualization
  4. Chapter 4 — The Scheduler and Device Plugins
  5. Chapter 5 — Requesting Fractional GPUs
  6. Chapter 6 — Scheduling Policies and Device Selection
  7. Chapter 7 — MIG, Heterogeneous Devices, and Observability
  8. Chapter 8 — HAMi in Practice