Truly impressive release of hybrid tiny models from the Qwen team, as always!
People are asking how they compare in speed, latency, and memory to @liquidai’s LFMs for on-device deployment.
Here is a quick profiling run on an Apple M3 Ultra:
> LFM2.5-1.2B decodes 52% faster than Qwen3.5-0.8B.
> LFM2-700M decodes 71% faster than Qwen3.5-0.8B.
> LFM2-2.6B matches Qwen3.5-2B in decode speed.
> LFM2-700M uses 46% less peak memory than Qwen3.5-0.8B.
> LFM2-2.6B uses 21% less peak memory than Qwen3.5-2B.
> At the same parameter size, LFM prefill is generally 12% faster than Qwen3.5.
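For reference, "X% faster in decode" here presumably means relative decode throughput (tokens/s). A minimal sketch of that arithmetic, using made-up throughput numbers rather than the measured values:

```python
def pct_faster(a_tok_s: float, b_tok_s: float) -> float:
    """Percent by which model A's decode throughput exceeds model B's."""
    return (a_tok_s / b_tok_s - 1.0) * 100.0

# Hypothetical decode throughputs in tokens/s (illustrative only):
lfm_tok_s = 152.0
qwen_tok_s = 100.0
print(f"{pct_faster(lfm_tok_s, qwen_tok_s):.0f}% faster")  # → 52% faster
```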
We designed the LFM2 series with our hardware-in-the-loop meta AI design approach, which lets us find the most efficient architecture for a given processor without sacrificing quality.
This test was done on an Apple M3 Ultra with 512 GB of unified memory.
Config:
> 512 prompt tokens, 128 generation tokens
> 5 trials per configuration
> Framework: MLX (mlx-lm / mlx-vlm)
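The trial structure above (fixed prompt and generation lengths, several timed runs per configuration) can be sketched as follows. This is a sketch only: `fake_generate` is a stand-in, since a real run would load a model with mlx-lm on Apple Silicon and call its generate function, and mlx-lm also reports prompt and generation throughput separately rather than timing one combined call.

```python
import time
from statistics import median

def run_decode_trial(generate_fn, prompt_tokens: int, gen_tokens: int) -> float:
    """Time one generation call and return throughput in generated tokens/s."""
    start = time.perf_counter()
    generate_fn(prompt_tokens, gen_tokens)
    elapsed = time.perf_counter() - start
    return gen_tokens / elapsed

def benchmark(generate_fn, prompt_tokens=512, gen_tokens=128, trials=5) -> float:
    """Median throughput over several trials, after one warmup run."""
    run_decode_trial(generate_fn, prompt_tokens, gen_tokens)  # warmup
    rates = [run_decode_trial(generate_fn, prompt_tokens, gen_tokens)
             for _ in range(trials)]
    return median(rates)

# Stand-in for a real mlx-lm generation call; sleeps briefly instead of
# running a model so the harness structure can be shown end to end.
def fake_generate(prompt_tokens: int, gen_tokens: int) -> None:
    time.sleep(0.001)

print(f"median throughput: {benchmark(fake_generate):.1f} tok/s")
```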
