[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
📊 Project Info
- Language: Cuda
- Stars: ⭐ 3,411
- Forks: 427
- Today: +1
- Ranking: #11
- Collection: Language
- Trending Date: June 5, 2026
- Last Push: 1/17/2026
🏷️ Topics
attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit


