Fluid: Data Orchestration for Kubernetes
by Shriira Press
Fluid is a Kubernetes-native distributed dataset orchestration and acceleration engine for data-intensive applications (AI/ML and big data). It makes data a first-class, cached, locality-aware citizen of Kubernetes. This free book teaches Fluid from the ground up: the data problem in cloud-native compute and what Fluid is; the data-locality problem (decoupled storage, repeated fetching, idle GPUs); Fluid's architecture (Datasets, Runtimes, controllers); the Dataset abstraction; caching runtimes (Alluxio, JuiceFS, ThinRuntime); data acceleration (caching, prefetching, fast reads); data-aware scheduling (bringing compute to the data); AI/ML and big data use cases (training, serving, analytics); operating Fluid (managing caches, observability, autoscaling, consistency); and using Fluid in practice. Its ten focused chapters use clear diagrams to show how to keep expensive compute fed by caching data near compute, reusing it, and scheduling for locality, so that data stops being the bottleneck.
Contents
- Preface
- Chapter 1 — What Fluid Is
- Chapter 2 — The Data Locality Problem
- Chapter 3 — Fluid's Architecture
- Chapter 4 — The Dataset Abstraction
- Chapter 5 — Caching Runtimes
- Chapter 6 — Data Acceleration
- Chapter 7 — Data-Aware Scheduling
- Chapter 8 — AI/ML and Big Data Use Cases
- Chapter 9 — Operating Fluid
- Chapter 10 — Using Fluid in Practice
