
Technology · Ebook

HAMi: Heterogeneous GPU Sharing on Kubernetes

by Shriira Press

4.5 (157) · 144 pages · Published 2026

HAMi (Heterogeneous AI Computing Virtualization Middleware) is a CNCF sandbox project that tackles one of the most expensive habits in cloud-native AI: leaving accelerators idle. By default, Kubernetes allocates a whole GPU to a single container, so even a tiny workload locks an entire card. HAMi changes the unit of allocation. Through a scheduler extender, a mutating webhook, per-vendor device plugins, and an in-container CUDA interception library called HAMi-core, it lets a pod request a slice of a GPU (a fixed amount of memory, a percentage of compute cores, or a fraction of a device) and strictly enforces those limits, without changes to the application's code.

This free book teaches HAMi from the ground up across eight focused chapters: the underutilization problem, the architecture, HAMi-core's interception, the scheduler and device plugins, fractional pod specs, scheduling policies and device selection, MIG and multi-vendor support, and running it well in production.
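To make the idea of a GPU slice concrete, here is a minimal sketch in Python that builds a pod manifest asking for part of a card. It assumes the NVIDIA resource names used in HAMi's upstream documentation (`nvidia.com/gpu`, `nvidia.com/gpumem` in MiB, `nvidia.com/gpucores` as a percentage); check them against the version you install. The helper name `fractional_gpu_pod`, the image reference, and the numeric values are hypothetical and chosen only for illustration.

```python
import json


def fractional_gpu_pod(name, image, gpus=1, gpumem_mb=3000, gpucores_pct=30):
    """Build a pod manifest that requests a slice of a GPU, not a whole card.

    Resource names are assumed from HAMi's NVIDIA documentation; the values
    are illustrative, not recommendations.
    """
    if not 0 < gpucores_pct <= 100:
        raise ValueError("gpucores_pct must be in the range (0, 100]")
    if gpumem_mb <= 0:
        raise ValueError("gpumem_mb must be positive")

    limits = {
        "nvidia.com/gpu": gpus,               # number of (virtual) GPUs
        "nvidia.com/gpumem": gpumem_mb,       # device memory per GPU, in MiB
        "nvidia.com/gpucores": gpucores_pct,  # share of compute cores, in percent
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; replace with a real CUDA workload image.
    pod = fractional_gpu_pod("gpu-slice-demo", "example.com/cuda-app:latest")
    print(json.dumps(pod, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be saved to a file and applied to a cluster where HAMi is installed. The application inside the container needs no changes; the limits live entirely in the pod spec.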

Contents

  1. Preface
  2. Chapter 1 — What HAMi Is
  3. Chapter 2 — Architecture and Components
  4. Chapter 3 — HAMi-core and In-Container Virtualization
  5. Chapter 4 — The Scheduler and Device Plugins
  6. Chapter 5 — Requesting Fractional GPUs
  7. Chapter 6 — Scheduling Policies and Device Selection
  8. Chapter 7 — MIG, Heterogeneous Devices, and Observability
  9. Chapter 8 — HAMi in Practice