Technology · Ebook

KServe: Model Serving on Kubernetes

by Shriira Press

4.7 (420) · 188 pages · Published 2026

KServe is a standardized, serverless model inference platform on Kubernetes that makes it simple to deploy, scale, and manage machine learning models in production. This free book teaches it from the ground up: the model serving problem and what KServe is; the serving landscape (training vs. serving); KServe's architecture (the InferenceService, the serverless foundation); the InferenceService abstraction; serving runtimes (multi-framework support); the standard inference protocol (the Open Inference Protocol); autoscaling and serverless operation (scale-to-zero); advanced inference (transformers, explainers, inference graphs); production serving (canary rollouts, monitoring, payload logging); and operating KServe in practice (including LLM serving with vLLM and OpenAI-compatible APIs). Ten focused chapters with clear diagrams demystify model serving, showing how to turn trained models into scalable, standard, cost-efficient production inference, including the LLMs at the center of modern AI.

Contents

  1. Preface
  2. Chapter 1 — What KServe Is
  3. Chapter 2 — The Model Serving Problem and Landscape
  4. Chapter 3 — KServe's Architecture
  5. Chapter 4 — The InferenceService
  6. Chapter 5 — Serving Runtimes
  7. Chapter 6 — The Standard Inference Protocol
  8. Chapter 7 — Autoscaling and Serverless
  9. Chapter 8 — Advanced Inference: Transformers, Explainers, and Graphs
  10. Chapter 9 — Production Serving
  11. Chapter 10 — Operating KServe in Practice