A 24-billion-parameter model just ran on a laptop and picked the right tool in under half a second. The real story is that tool-calling agents finally became fast enough to feel like software.

Liquid built LFM2-24B-A2B using a hybrid architecture that mixes convolution blocks with grouped query attention in a 1:3 ratio. Only 2.3 billion parameters activate per token, even though the full model holds 24 billion. That sparse activation pattern is why it fits in 14.5 GB of memory and dispatches tools in 385 milliseconds on an M4 Max. The architecture was designed through hardware-in-the-loop search, meaning the model structure was optimized by testing it directly on the chips it would run on.

No cloud translation layer. No API roundtrip. The model, the tools, and your data stay on the machine. This unlocks three things that were impractical before:

1. Regulated industries can run agents on employee laptops without data leaving the device.
2. Developers can prototype multi-tool workflows without managing API keys or rate limits.
3. Security teams get full audit trails without vendor subprocessors in the loop.

The model hit 80% accuracy on single-step tool selection across 67 tools spanning 13 MCP servers. If this performance holds at scale, two assumptions need updating. First, on-device agents are no longer a battery-life trade-off; they're a compliance feature. Second, the bottleneck in agentic workflows is shifting from model capability to tool ecosystem maturity.
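The numbers above can be sanity-checked with some back-of-envelope arithmetic. A sketch, assuming weights dominate the memory footprint and GB means 10^9 bytes (the quantization inference is illustrative, not confirmed by Liquid):

```python
# Figures reported in the post; the math below is a plausibility check.
TOTAL_PARAMS = 24e9    # full parameter count
ACTIVE_PARAMS = 2.3e9  # parameters activated per token
FOOTPRINT_GB = 14.5    # reported memory footprint

# Implied storage cost per parameter if all 24B weights are resident:
# ~4.8 bits/param, consistent with a 4-5 bit quantized checkpoint.
bits_per_param = FOOTPRINT_GB * 8e9 / TOTAL_PARAMS
print(f"{bits_per_param:.1f} bits/param")

# Fraction of weights touched per token -- sparse routing is what keeps
# per-token compute near a 2.3B dense model despite the 24B footprint.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%} of parameters active per token")
```

The ~4.8 bits/param result is why the full 24B model fits on a laptop at all: the memory cost scales with total parameters, but the latency scales with the 9.6% that activate per token.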
088339 · 12 hours ago
> 385ms average tool selection.
> 67 tools across 13 MCP servers.
> 14.5GB memory footprint.
> Zero network calls.

LocalCowork is an AI agent that runs on a MacBook. Open source. 🧵
Amazing work from: @liquidai @ramin_m_h