Technology · Ebook
Volcano: Batch Scheduling for Kubernetes
by Shriira Press
Volcano is the CNCF batch scheduling system for Kubernetes. It brings high-performance scheduling to AI/ML, big-data, and HPC workloads, adding gang scheduling, fair-share queues, priorities, and a job-level abstraction that the default scheduler lacks. This free book teaches Volcano from the ground up in ten focused chapters with clear diagrams: the batch scheduling problem and what Volcano is; Kubernetes scheduling and batch concepts; Volcano's architecture (the scheduler, controllers, and plugins); the Volcano Job (tasks, roles, PodGroups); gang scheduling (all-or-nothing placement for distributed jobs); queues and fair sharing (DRF, priorities, preemption); scheduling policies and plugins (the composable framework); AI/ML and big-data integration (TensorFlow, PyTorch, Spark, MPI); GPU, topology, and advanced scheduling; and using Volcano in practice. The chapters make batch scheduling concrete: how to gang-schedule distributed jobs so that all pods run together (avoiding wasted GPUs and deadlocks), how to share resources fairly across teams, and how to place workloads with GPU and topology awareness, so that compute-intensive workloads run efficiently and fairly on Kubernetes.
Contents
- Preface
- Chapter 1: What Volcano Is
- Chapter 2: Kubernetes Scheduling and Batch Concepts
- Chapter 3: Volcano Architecture
- Chapter 4: The Volcano Job
- Chapter 5: Gang Scheduling
- Chapter 6: Queues and Fair Sharing
- Chapter 7: Scheduling Policies and Plugins
- Chapter 8: AI/ML and Big-Data Integration
- Chapter 9: GPU, Topology, and Advanced Scheduling
- Chapter 10: Volcano in Practice
