🔥 Congrats to @Zai_org on launching GLM-5: 744B parameters (40B active), trained on 28.5T tokens, integrating DeepSeek Sparse Attention to keep deployment cost manageable while preserving long-context capacity.

vLLM has day-0 support for GLM-5-FP8 with:
📖 DeepSeek Sparse Attention for efficient long-context serving
⚡️ MTP speculative decoding
⚙️ Tool calling + thinking mode

Recipe with serving configs and benchmarks: 🔗
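
Once the server is up, it exposes vLLM's standard OpenAI-compatible API. A minimal sketch of calling it with tool calling, assuming the default local port and using a placeholder model id and tool schema (see the recipe for the exact serving flags):

```python
# Minimal sketch: query a locally served GLM-5-FP8 through vLLM's OpenAI-compatible endpoint.
# Assumes the server was launched with tool calling enabled per the recipe;
# the model id, port, and weather tool schema below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Beijing right now?"}],
    tools=tools,
)

# The response message carries either a direct answer or a tool call to execute.
print(resp.choices[0].message)
```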