🙌 The LLM any-to-any world welcomes Ming-flash-omni-preview, featuring a powerful 103B-A9B architecture made highly efficient through sparse MoE. It sets a new bar for open-source omni-modal performance in understanding and generation:

1. Controllable Image Generation: Introduces Generative Segmentation-as-Editing, which enables precise, pixel-level control. The model scores *0.90* on the GenEval benchmark.
2. Streaming Video Understanding: Enhanced capabilities for detailed and seamless audio-visual comprehension.
3. Dialect Recognition: Attains SOTA performance in Chinese Dialect ASR, demonstrating proficiency across diverse dialects such as Hunanese, Cantonese, and Minnanese.

#OpenSourceModels