Volcano: Batch Scheduling for Kubernetes
Shriira Press
Run AI/ML, big-data, and HPC workloads on Kubernetes. High-performance batch scheduling with gang scheduling, fair-share queues, and GPU-aware policies the default scheduler lacks.
Welcome to Volcano: Batch Scheduling for Kubernetes.
Volcano is the CNCF batch scheduling system for Kubernetes. It brings high-performance scheduling to AI/ML, big-data, and HPC workloads, with gang scheduling, fair-share queues, priorities, and a job-level abstraction that the default scheduler lacks. This free book teaches it from the ground up: the batch scheduling problem and what Volcano is; Kubernetes scheduling and batch concepts; Volcano's architecture (the scheduler, controllers, and plugins); the Volcano Job (tasks, roles, PodGroups); gang scheduling (all-or-nothing placement for distributed jobs); queues and fair sharing (DRF, priorities, preemption); scheduling policies and plugins (the composable framework); AI/ML and big-data integration (TensorFlow, PyTorch, Spark, MPI); GPU, topology, and advanced scheduling; and using Volcano in practice. Its ten focused chapters use clear diagrams to make batch scheduling concrete. You will learn to gang-schedule distributed jobs (so that all pods run together, avoiding wasted GPUs and deadlocks), share resources fairly across teams, and place workloads with GPU and topology awareness, so that compute-intensive workloads run efficiently and fairly on Kubernetes.
This title is part of the ShriIra library and is free to read in full, right here — our small contribution to making world-class knowledge easy to reach.
A note on reading it: open the Contents menu at the top of the reader to jump between chapters, and use the Aa menu to set a comfortable text size, theme (light, sepia, or night), and single- or two-page layout. Your place is saved automatically, so you can always pick up where you left off.
We hope it serves you well.
— Shriira Press