Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent, 3D-consistent worlds, both real and imaginary. Try our demo of RTFM today!
Generative World Models will inevitably be computationally demanding, potentially scaling beyond even the requirements of today’s LLMs. But we believe they are a crucial research direction for the future of rendering and spatial intelligence.
RTFM does not build an explicit 3D representation of the world. Instead, it takes one or more 2D images as input and directly generates new 2D images of the same scene from different points of view.
RTFM can be seen as a learned renderer: it is an autoregressive diffusion transformer trained end-to-end on large-scale video data, and it learns to model 3D geometry, reflections, shadows, and more simply by observing them in its training data.
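To make the "learned renderer" idea concrete, here is a minimal sketch of what inference with an autoregressive, pose-conditioned diffusion model can look like. All names here (FrameDiffusionModel, generate_frame, the sampler, the frame size) are hypothetical placeholders for illustration; this post does not describe RTFM's actual architecture or API.

```python
# Hypothetical sketch of an autoregressive, pose-conditioned diffusion
# renderer. All names and shapes are illustrative, not RTFM's real API.
import torch

class FrameDiffusionModel(torch.nn.Module):
    """Stand-in for a diffusion transformer that predicts a denoised frame
    from a noisy frame, conditioning frames, and a target camera pose."""
    def __init__(self, frame_shape=(3, 64, 64)):
        super().__init__()
        self.frame_shape = frame_shape
        # A real model would be a large transformer; this is a placeholder.
        self.net = torch.nn.Conv2d(frame_shape[0], frame_shape[0], 3, padding=1)

    def forward(self, noisy_frame, context_frames, target_pose, t):
        # A real model would attend over the context frames and embed the
        # camera pose and timestep; the placeholder ignores them.
        return self.net(noisy_frame)

@torch.no_grad()
def generate_frame(model, context_frames, target_pose, num_steps=50):
    """Sample one new view by iteratively denoising from pure noise,
    conditioned on previously seen frames and the requested camera pose."""
    x = torch.randn(1, *model.frame_shape)  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        pred = model(x, context_frames, target_pose, t)
        # Simplified update; a real sampler (e.g. DDIM) would mix the
        # prediction and noise according to the diffusion schedule.
        alpha = t / num_steps
        x = alpha * x + (1 - alpha) * pred
    return x

# Autoregressive rollout: each generated frame joins the context, which is
# what keeps the rendered world persistent across an interaction.
model = FrameDiffusionModel()
context = [torch.randn(1, 3, 64, 64)]          # e.g. an input photograph
for pose in [torch.eye(4) for _ in range(3)]:  # user-driven camera poses
    frame = generate_frame(model, context, pose)
    context.append(frame)
```

The detail this sketch tries to convey is the autoregressive conditioning: because every generated frame is fed back as context for the next one, the model can keep the scene consistent as the viewpoint moves, without ever constructing an explicit 3D representation.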
RTFM can also be used to reconstruct real-world locations from sparsely captured photographs. The results shown are not real videos: every frame is generated by RTFM.
For a limited time, you can try out a live demo of RTFM yourself, hosted on cloud GPUs and streamed to your device (mobile support included!).