Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market

Beida and Alibaba Cloud

Aegaeon has been beta-deployed in Alibaba Cloud Model Studio for over three months, and currently serves tens of models ranging from 1.8B to 72B parameters. It reduces the number of GPUs required to serve these models from 1,192 to 213, an 82% saving in GPU resources.