Truly impressive release of hybrid tiny models from the Qwen team, as always! People are asking how they compare in speed, latency, and memory to @liquidai's LFMs for on-device deployment. Here is a quick profile on an Apple M3 Ultra:

> LFM2.5-1.2B is 52% faster in decode than Qwen3.5-0.8B
> LFM2-700M is 71% faster than Qwen3.5-0.8B in decode
> LFM2-2.6B matches Qwen3.5-2B in decode speed
> LFM2-700M uses 46% less peak memory than Qwen3.5-0.8B
> LFM2-2.6B uses 21% less peak memory than Qwen3.5-2B
> At the same parameter size, LFM prefill is generally 12% faster than Qwen3.5

We designed the LFM2 series with our hardware-in-the-loop meta AI design approach, which lets us find the most efficient architecture for a given processor without sacrificing quality.

Test setup: Apple M3 Ultra, 512 GB unified memory
Config:
> 512 prompt tokens, 128 generation tokens
> 5 trials per configuration
> Framework: MLX (mlx-lm / mlx-vlm)
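For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the profiling methodology (512 prompt tokens, 128 generation tokens, 5 trials, median reported). The harness below is an assumption, not the actual script used: the `fake` generator stands in for a real model call, which on MLX would wrap mlx-lm generation and return measured prefill/decode times.

```python
import time
from statistics import median

# Config matching the post: 512 prompt tokens, 128 generation tokens, 5 trials
PROMPT_TOKENS = 512
GEN_TOKENS = 128
TRIALS = 5

def profile(generate_fn):
    """Run TRIALS passes and return median (prefill, decode) tokens/sec.

    generate_fn(prompt_tokens, gen_tokens) must return
    (prefill_seconds, decode_seconds). For a real benchmark this would
    wrap an mlx-lm model call and time the two phases separately
    (hypothetical wrapper -- not the harness used in the post).
    """
    prefill_tps, decode_tps = [], []
    for _ in range(TRIALS):
        prefill_s, decode_s = generate_fn(PROMPT_TOKENS, GEN_TOKENS)
        prefill_tps.append(PROMPT_TOKENS / prefill_s)
        decode_tps.append(GEN_TOKENS / decode_s)
    return median(prefill_tps), median(decode_tps)

# Stand-in "model": 0.1 s to prefill, 2 ms per decoded token.
fake = lambda prompt_toks, gen_toks: (0.1, 0.002 * gen_toks)
pf, dec = profile(fake)
print(round(pf), round(dec))  # 5120 prefill tok/s, 500 decode tok/s
```

Separating prefill from decode matters here: the post reports the two phases independently (decode speedups of 52-71%, prefill around 12%), and a single end-to-end tokens/sec number would blur that distinction.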