thu-ml / SageAttention

#11

Stars: 3,411 | Forks: 427 | +1 today | Cuda

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
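To make the idea of quantized attention concrete, here is a minimal pure-Python sketch. It is not SageAttention's kernel or API. It assumes a simple symmetric per-row INT8 scheme for Q and K, and the function names (`quantize_int8`, `attention`) are hypothetical. It only shows that the QK^T scores can be computed from integers and rescaled with small error; it says nothing about speed.

```python
import math
import random

def quantize_int8(row):
    """Symmetric INT8 quantization of one vector: returns (ints, scale)."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    ints = [max(-127, min(127, round(x / scale))) for x in row]
    return ints, scale

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, quantized=False):
    """Single-head attention; optionally computes Q.K^T from INT8 values."""
    d = len(q[0])
    if quantized:
        q_int = [quantize_int8(r) for r in q]
        k_int = [quantize_int8(r) for r in k]
    out = []
    for i in range(len(q)):
        scores = []
        for j in range(len(k)):
            if quantized:
                (qi, qs), (kj, ks) = q_int[i], k_int[j]
                # Integer dot product, then rescale back to floating point.
                s = sum(a * b for a, b in zip(qi, kj)) * qs * ks
            else:
                s = sum(a * b for a, b in zip(q[i], k[j]))
            scores.append(s / math.sqrt(d))
        p = softmax(scores)
        out.append([sum(p[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out

random.seed(0)
n, d = 8, 16  # illustrative sizes, not taken from the project
q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

ref = attention(q, k, v)
approx = attention(q, k, v, quantized=True)
max_err = max(abs(a - b) for ra, rb in zip(ref, approx) for a, b in zip(ra, rb))
print(f"max abs error vs. full precision: {max_err:.4f}")
```

On real hardware, any gain would come from running the integer matrix product on low-precision units. This sketch only checks that the rescaled result stays close to the full-precision one.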

📊 Project Info

Language: Cuda
Stars: ⭐ 3,411
Forks: 427
Today: +1
Ranking: #11
Collection: Language
Trending Date: June 5, 2026
Last Push: 1/17/2026

🏷️ Topics

attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit

📸 Screenshots