In just 48 hours at @RunAnywhereAI we built MetalRT: beating @Apple at their own game with the FASTEST LLM inference engine for Apple Silicon on the market right now.

- 570 tok/s decode on @liquidai LFM2.5-1.2B (4-bit)
- 658 tok/s decode on @Alibaba_Qwen Qwen3-0.6B (4-bit)
- 6.6 ms time-to-first-token
- 1.19× faster than Apple's own MLX (identical model files)
- 1.67× faster than llama.cpp on average

We crushed Apple MLX, llama.cpp, uzu (by TryMirai), and Ollama across four different 4-bit models, including the on-device-optimized LFM2.5-1.2B, all on a single M4 Max.

Excited for this one!

#ycombinator #runanywhere #ondeviceai #applesilicon #mlx