Generative AI · Ebook

Video Generation: From Frames to World Models

by Shriira Press

4.7 (4,050) · 101 pages · Published 2026

A comprehensive, self-contained guide to how machines learn to generate video — moving, temporally coherent imagery — from the first video GANs to the latent video-diffusion transformers behind today's text-to-video systems. This is the third volume in a trilogy; it blends intuition, mathematics, and runnable code, and builds directly on its companions on machine learning and image generation.

Preface
Chapter 1 — What Is Video Generation?
Chapter 2 — Video as Data: Space, Time, and Motion
Chapter 3 — Neural Building Blocks for Video
Chapter 4 — Early Approaches: Video GANs and Autoregressive Video
Chapter 5 — Video Diffusion Fundamentals
Chapter 6 — Latent Video Diffusion and Spatiotemporal Compression
Chapter 7 — Temporal Architectures: From U-Nets to Diffusion Transformers
Chapter 8 — Text-to-Video, Image-to-Video, and Conditioning
Chapter 9 — Controlling Video: Motion, Camera, and Consistency
Chapter 10 — Long Video: Coherence, Memory, and World Models
Chapter 11 — Evaluating Generated Video
Chapter 12 — Systems, Efficiency, and Deployment
Chapter 13 — Ethics, Deepfakes, and the Future
Appendix A — Notation and Symbols
Appendix B — Further Reading

Contents