Generative AI · Ebook
Video Generation: From Frames to World Models
by Shriira Press
4.7 (4,050) · 101 pages · Published 2026
A comprehensive, self-contained guide to how machines learn to generate video (moving, temporally coherent imagery), from the first video GANs to the latent video-diffusion transformers behind today's text-to-video systems. The third volume in a trilogy, it blends intuition, mathematics, and runnable code, and builds directly on its companion volumes on machine learning and image generation.
Contents
- Preface
- Chapter 1 — What Is Video Generation?
- Chapter 2 — Video as Data: Space, Time, and Motion
- Chapter 3 — Neural Building Blocks for Video
- Chapter 4 — Early Approaches: Video GANs and Autoregressive Video
- Chapter 5 — Video Diffusion Fundamentals
- Chapter 6 — Latent Video Diffusion and Spatiotemporal Compression
- Chapter 7 — Temporal Architectures: From U-Nets to Diffusion Transformers
- Chapter 8 — Text-to-Video, Image-to-Video, and Conditioning
- Chapter 9 — Controlling Video: Motion, Camera, and Consistency
- Chapter 10 — Long Video: Coherence, Memory, and World Models
- Chapter 11 — Evaluating Generated Video
- Chapter 12 — Systems, Efficiency, and Deployment
- Chapter 13 — Ethics, Deepfakes, and the Future
- Appendix A — Notation and Symbols
- Appendix B — Further Reading
