Volcano: Batch Scheduling for Kubernetes

Shriira Press

Preface

Run AI/ML, big-data, and HPC workloads on Kubernetes with high-performance batch scheduling: gang scheduling, fair-share queues, and GPU-aware policies that the default scheduler lacks.

Welcome to Volcano: Batch Scheduling for Kubernetes.

Volcano is the CNCF batch scheduling system for Kubernetes. It brings high-performance scheduling to AI/ML, big-data, and HPC workloads, adding gang scheduling, fair-share queues, priorities, and a job-level abstraction that the default scheduler lacks. This free book teaches it from the ground up: the batch scheduling problem and what Volcano is; Kubernetes scheduling and batch concepts; Volcano's architecture (the scheduler, controllers, and plugins); the Volcano Job (tasks, roles, PodGroups); gang scheduling (all-or-nothing placement for distributed jobs); queues and fair sharing (DRF, priorities, preemption); scheduling policies and plugins (the composable framework); AI/ML and big-data integration (TensorFlow, PyTorch, Spark, MPI); GPU, topology, and advanced scheduling; and using Volcano in practice.

The ten focused chapters use clear diagrams to make batch scheduling concrete. They show how to gang-schedule distributed jobs (so that all of a job's pods run together, avoiding wasted GPUs and deadlocks), share resources fairly across teams, and place workloads with GPU and topology awareness, so that compute-intensive workloads run efficiently and fairly on Kubernetes.

This title is part of the Shriira library and is free to read in full, right here. It is our small contribution to making world-class knowledge easy to reach.

A note on reading it: open the Contents menu at the top of the reader to jump between chapters, and use the Aa menu to set a comfortable text size, theme (light, sepia, or night), and single- or two-page layout. Your place is saved automatically, so you can always pick up where you left off.

We hope it serves you well.

— Shriira Press

Contents

  1. What Volcano Is
  2. Kubernetes Scheduling and Batch Concepts
  3. Volcano Architecture
  4. The Volcano Job
  5. Gang Scheduling
  6. Queues and Fair Sharing
  7. Scheduling Policies and Plugins
  8. AI/ML and Big-Data Integration
  9. GPU, Topology, and Advanced Scheduling
  10. Volcano in Practice