We present a research preview of Self-Flow: a scalable approach for training multi-modal generative models.
Multi-modal generation requires end-to-end learning across modalities (image, video, audio, and text) without relying on external models for representation learning. Self-Flow addresses this with self-supervised flow matching that scales efficiently across modalities.
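The post doesn't spell out Self-Flow's training objective, so as background only: standard flow matching trains a model to regress the velocity along an interpolation path between noise and data. A minimal NumPy sketch of that loss (all names, shapes, and the random linear "model" here are illustrative, not from Self-Flow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: samples as flat vectors (a real model would use images/video/audio).
x1 = rng.normal(loc=2.0, size=(64, 16))   # data samples
x0 = rng.normal(size=(64, 16))            # noise samples
t = rng.uniform(size=(64, 1))             # per-sample times in [0, 1]

# Linear interpolation path between noise and data.
xt = (1.0 - t) * x0 + t * x1

# Regression target: the constant velocity along that path.
target_v = x1 - x0

# Stand-in "model": a random linear map over (x_t, t); a real model is a neural net.
W = rng.normal(scale=0.1, size=(17, 16))  # +1 input row for the time value
pred_v = np.concatenate([xt, t], axis=1) @ W

# Flow-matching loss: mean squared error between predicted and true velocity.
loss = float(np.mean((pred_v - target_v) ** 2))
print(loss)
```

At sampling time, the learned velocity field is integrated from noise toward data; how Self-Flow makes this self-supervised and cross-modal is not described in the post.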
Results:
• Up to 2.8x faster convergence across modalities
• Improved temporal consistency in video
• Sharper text rendering and typography
This is foundational research on our path toward multi-modal visual intelligence.

Self-Flow improves temporal consistency in video generation.
4B parameter multi-modal model trained on 6M videos.
Cleaner typography and text rendering.
4B parameter multi-modal model trained on 200M images.
Joint video-audio generation from a single model (sound on)
4B parameter multi-modal model trained on 2M audio-video pairs.
Self-Flow opens a path toward world models: combining visual scalability with semantic abstraction for planning and understanding.
Here's action prediction from a 675M parameter model.