We present a research preview of Self-Flow, a scalable approach for training multi-modal generative models. Multi-modal generation requires end-to-end learning across modalities (image, video, audio, and text) without being limited by external models for representation learning. Self-Flow addresses this with self-supervised flow matching that scales efficiently across modalities.

Results:
• Up to 2.8x faster convergence across modalities
• Improved temporal consistency in video
• Sharper text rendering and typography

This is foundational research on our path toward multi-modal visual intelligence.
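For context on the technique named above, here is a minimal sketch of a standard conditional flow-matching training step. This illustrates the general family of methods, not Self-Flow's own objective (which is not described here); the `model(xt, t)` velocity-network signature and batch shapes are illustrative assumptions.

```python
import torch

def flow_matching_loss(model, x1):
    """One conditional flow-matching step on a data batch x1 of shape [B, ...]."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1]
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # broadcast t over data dims
    xt = (1 - t_) * x0 + t_ * x1                   # linear interpolation path
    v_target = x1 - x0                             # constant target velocity
    v_pred = model(xt, t)                          # predicted velocity field (assumed signature)
    return torch.mean((v_pred - v_target) ** 2)    # velocity regression loss
```

The model learns a velocity field along noise-to-data paths; generation then integrates that field from noise, which is what lets one objective cover image, video, and audio tokens alike.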
Self-Flow improves temporal consistency in video generation. 4B-parameter multi-modal model trained on 6M videos.
Cleaner typography and text rendering. 4B-parameter multi-modal model trained on 200M images.
Joint video-audio generation from a single model (sound on). 4B-parameter multi-modal model trained on 2M audio-video pairs.
Self-Flow opens a path toward world models, combining visual scalability with semantic abstraction for planning and understanding. Here's action prediction from a 675M-parameter model.