🚀 Meet HySparse: our new breakthrough in long-context LLM efficiency! We're excited to share HySparse (Hybrid Sparse Attention), a hybrid model architecture that interleaves each full attention layer with multiple sparse attention layers, where the sparse layers derive both their important-token selection and their KV caches from the preceding full attention layer!
📖 Paper link:
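
To make the interleaving pattern concrete, here's a minimal single-head PyTorch sketch of the idea as described above. This is our own toy illustration, not the paper's implementation: the class names (`FullAttention`, `SparseFromFullAttention`, `HybridBlock`), the per-query top-k selection heuristic, and the absence of a causal mask are all simplifying assumptions of ours.

```python
# Hypothetical sketch of a HySparse-style block (not the paper's code).
# Assumption: the full layer exposes its K/V cache, and each following
# sparse layer reuses that cache, attending only to a top-k token subset.
import torch
import torch.nn.functional as F
from torch import nn


class FullAttention(nn.Module):
    """Standard full attention; also returns its K/V for reuse downstream."""
    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        # Hand the KV cache back so sparse layers can share it.
        return self.out(attn @ v), (k, v)


class SparseFromFullAttention(nn.Module):
    """Sparse layer: no K/V projections of its own; it scores tokens against
    the preceding full layer's cache and attends only to the top-k."""
    def __init__(self, d_model: int, top_k: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.top_k = top_k

    def forward(self, x, kv_cache):
        k, v = kv_cache  # reused from the full layer, not recomputed
        q = self.q(x)
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        # One simple selection heuristic: keep the top-k highest-scoring
        # tokens per query and mask out the rest.
        topk = scores.topk(min(self.top_k, k.shape[-2]), dim=-1)
        masked = torch.full_like(scores, float("-inf"))
        masked.scatter_(-1, topk.indices, topk.values)
        attn = F.softmax(masked, dim=-1)
        return self.out(attn @ v)


class HybridBlock(nn.Module):
    """One full attention layer followed by several sparse layers that all
    share its KV cache, i.e., the interleaving pattern described above.
    (Bidirectional for brevity; a real LLM would add a causal mask.)"""
    def __init__(self, d_model=64, n_sparse=3, top_k=16):
        super().__init__()
        self.full = FullAttention(d_model)
        self.sparse = nn.ModuleList(
            SparseFromFullAttention(d_model, top_k) for _ in range(n_sparse)
        )

    def forward(self, x):
        x2, kv = self.full(x)
        x = x + x2
        for layer in self.sparse:
            x = x + layer(x, kv)  # every sparse layer reuses the same cache
        return x


x = torch.randn(2, 128, 64)    # (batch, seq_len, d_model)
print(HybridBlock()(x).shape)  # torch.Size([2, 128, 64])
```

Under these assumptions, the sparse layers store and recompute no K/V of their own, which is where the long-context memory and compute savings would come from.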