We’re working with the OSS community to take the guesswork out of disaggregated serving by integrating NVIDIA Dynamo into the stack, with support for all major inference serving frameworks.

🔹 The @sgl_project community is improving AI inference performance, reducing guesswork and enabling faster, more efficient, and scalable model execution.

🔹 Mooncake AI built the first SGLang backend for AIConfigurator, enabling rapid support for models like Llama, Qwen, and DeepSeek by implementing the collector layer for core operations such as GEMM and attention.

🔹 @alibaba_cloud integrated AIConfigurator into its AI Serving Stack on Kubernetes (ACK), using the RoleBasedGroup (RBG) orchestration engine to automate deployments and manage prefill/decode disaggregation. The result: 1.86× higher throughput on Qwen3-235B-FP8 while maintaining TTFT < 5 s and ITL < 40 ms.

Read the technical blog →