Generative AI · Ebook

Video Generation: From Frames to World Models

by Shriira Press

4.7 (4,050) · 101 pages · Published 2026

A comprehensive, self-contained guide to how machines learn to generate video — moving, temporally coherent imagery — from the first video GANs to the latent video-diffusion transformers behind today's text-to-video systems. This is the third volume in a trilogy; it blends intuition, mathematics, and runnable code, and builds directly on its companions on machine learning and image generation.

Contents

  1. Preface
  2. Chapter 1 — What Is Video Generation?
  3. Chapter 2 — Video as Data: Space, Time, and Motion
  4. Chapter 3 — Neural Building Blocks for Video
  5. Chapter 4 — Early Approaches: Video GANs and Autoregressive Video
  6. Chapter 5 — Video Diffusion Fundamentals
  7. Chapter 6 — Latent Video Diffusion and Spatiotemporal Compression
  8. Chapter 7 — Temporal Architectures: From U-Nets to Diffusion Transformers
  9. Chapter 8 — Text-to-Video, Image-to-Video, and Conditioning
  10. Chapter 9 — Controlling Video: Motion, Camera, and Consistency
  11. Chapter 10 — Long Video: Coherence, Memory, and World Models
  12. Chapter 11 — Evaluating Generated Video
  13. Chapter 12 — Systems, Efficiency, and Deployment
  14. Chapter 13 — Ethics, Deepfakes, and the Future
  15. Appendix A — Notation and Symbols
  16. Appendix B — Further Reading