Video Generation: From Frames to World Models

Shriira Press

Preface

Welcome to Video Generation: From Frames to World Models.

This book is a comprehensive, self-contained guide to how machines learn to generate video (moving, temporally coherent imagery), from the first video GANs to the latent video-diffusion transformers behind today's text-to-video systems. It is the third volume in a trilogy; it blends intuition, mathematics, and runnable code, and builds directly on its companions on machine learning and image generation.

This title is part of the ShriIra library and is free to read in full, right here — our small contribution to making world-class knowledge easy to reach.

A note on reading it: open the Contents menu at the top of the reader to jump between chapters, and use the Aa menu to set a comfortable text size, a theme (light, sepia, or night), and a single- or two-page layout. Your place is saved automatically, so you can always pick up where you left off.

We hope it serves you well.

— Shriira Press

Contents

  1. Chapter 1 — What Is Video Generation?
  2. Chapter 2 — Video as Data: Space, Time, and Motion
  3. Chapter 3 — Neural Building Blocks for Video
  4. Chapter 4 — Early Approaches: Video GANs and Autoregressive Video
  5. Chapter 5 — Video Diffusion Fundamentals
  6. Chapter 6 — Latent Video Diffusion and Spatiotemporal Compression
  7. Chapter 7 — Temporal Architectures: From U-Nets to Diffusion Transformers
  8. Chapter 8 — Text-to-Video, Image-to-Video, and Conditioning
  9. Chapter 9 — Controlling Video: Motion, Camera, and Consistency
  10. Chapter 10 — Long Video: Coherence, Memory, and World Models
  11. Chapter 11 — Evaluating Generated Video
  12. Chapter 12 — Systems, Efficiency, and Deployment
  13. Chapter 13 — Ethics, Deepfakes, and the Future
  14. Appendix A — Notation and Symbols
  15. Appendix B — Further Reading