MAJOR KV-CACHE MEMORY FIX

Fix the KV-cache of GLM-4.7-Flash with this single-line change in vLLM.

200K context now takes ~10GB of VRAM instead of ~180GB.

NVFP4 is now on HF*
- ~20.4GB weights
- Nearly zero loss vs. the 62.4GB BF16 weights

This SOTA model now runs on a single RTX 5090 (32GB VRAM)
> with the full 200K context
> with VRAM still left over

*HF: GadflyII/GLM-4.7-Flash-NVFP4
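
A rough back-of-the-envelope check of the numbers above, as a minimal sketch. It uses only the approximate figures quoted in this post and assumes that weights plus KV-cache dominate VRAM use. Activations, the CUDA context, and framework overhead are ignored, so real headroom will be somewhat smaller. The helper name `headroom_gb` is made up for illustration and is not part of vLLM.

```python
# Approximate figures quoted in the post (all in GB).
WEIGHTS_NVFP4_GB = 20.4    # NVFP4 checkpoint
WEIGHTS_BF16_GB = 62.4     # BF16 checkpoint
KV_CACHE_FIXED_GB = 10.0   # 200K context after the fix
KV_CACHE_BEFORE_GB = 180.0 # 200K context before the fix
GPU_VRAM_GB = 32.0         # single RTX 5090


def headroom_gb(weights_gb, kv_cache_gb, vram_gb=GPU_VRAM_GB):
    """VRAM left after weights and KV-cache; negative means it does not fit.

    Assumption: nothing else (activations, CUDA context) is counted.
    """
    return vram_gb - (weights_gb + kv_cache_gb)


after_fix = headroom_gb(WEIGHTS_NVFP4_GB, KV_CACHE_FIXED_GB)
before_fix = headroom_gb(WEIGHTS_NVFP4_GB, KV_CACHE_BEFORE_GB)
kv_reduction = KV_CACHE_BEFORE_GB / KV_CACHE_FIXED_GB
weight_reduction = WEIGHTS_BF16_GB / WEIGHTS_NVFP4_GB

print(f"Headroom with fix:    {after_fix:.1f} GB")
print(f"Headroom without fix: {before_fix:.1f} GB")
print(f"KV-cache reduction:   {kv_reduction:.0f}x")
print(f"Weight reduction:     {weight_reduction:.2f}x")
```

On these figures, weights and KV-cache together come to about 30.4GB, which is why the full 200K context fits in 32GB only with both the NVFP4 weights and the KV-cache fix.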