Cuda Trending

Trending Cuda repos on GitHub · last 7 days

karpathy / llm.c
#13
LLM training in simple, raw C/CUDA
30,133 stars · 3,625 forks · +15 stars
Cuda

alibaba / rtp-llm
#6
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
1,187 stars · 204 forks · +8 stars
Cuda
Topics: gpt, inference, llama, llm, llm-serving

deepseek-ai / DeepGEMM
#3
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
7,339 stars · 1,023 forks · +8 stars
Cuda

NVlabs / instant-ngp
#8
Instant neural graphics primitives: lightning fast NeRF and more
17,423 stars · 2,063 forks · +6 stars
Cuda
Topics: 3d-reconstruction, computer-graphics, computer-vision, cuda, function-approximation

NVIDIA / cuopt
#1
GPU accelerated decision optimization
934 stars · 190 forks · +5 stars
Cuda
Topics: cuda, gpu, linear-programming, optimization

BBuf / how-to-optim-algorithm-in-cuda
#15
How to optimize some algorithms in CUDA.
3,060 stars · 278 forks · +3 stars
Cuda
Topics: cuda, llm

NVIDIA / nvbench
#14
CUDA Kernel Benchmarking Library
870 stars · 109 forks · +2 stars
Cuda
Topics: benchmark, cuda, cuda-kernels, gpu, kernel-benchmark

rapidsai / cuvs
#12
cuVS - a library for vector search and clustering on the GPU
777 stars · 194 forks · +2 stars
Cuda
Topics: anns, clustering, cuda, distance, gpu

HazyResearch / ThunderKittens
#2
Tile primitives for speedy kernels
3,411 stars · 290 forks · +2 stars
Cuda

thu-ml / SageAttention
#11
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
3,411 stars · 427 forks · +1 star
Cuda
Topics: attention, cuda, efficient-attention, inference-acceleration, llm

NVIDIA / CUDALibrarySamples
#10
CUDA Library Samples
2,425 stars · 459 forks · +1 star
Cuda
Topics: cuda, cudss, cufft, curand, cusolver

Dao-AILab / causal-conv1d
#9
Causal depthwise conv1d in CUDA, with a PyTorch interface
896 stars · 190 forks · +1 star
Cuda

NVIDIA / nccl-tests
#5
NCCL Tests
1,542 stars · 375 forks · +1 star
Cuda

deepseek-ai / DeepEP
#4
DeepEP: an efficient expert-parallel communication library
9,696 stars · 1,279 forks · +1 star
Cuda

brucefan1983 / GPUMD
#16
Graphics Processing Units Molecular Dynamics
782 stars · 186 forks
Cuda
Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing

NVIDIA / cub
#7
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
1,833 stars · 462 forks
Cuda
Topics: algorithms, cpp, cpp11, cpp14, cpp17

karpathy / llm.c
#15
LLM training in simple, raw C/CUDA
30,118 stars · 3,624 forks · +13 stars
Cuda

NVIDIA / cuopt
#10
GPU accelerated decision optimization
929 stars · 188 forks · +11 stars
Cuda
Topics: cuda, gpu, linear-programming, optimization

HazyResearch / ThunderKittens
#5
Tile primitives for speedy kernels
3,409 stars · 290 forks · +4 stars
Cuda

thu-ml / SageAttention
#12
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
3,410 stars · 426 forks · +3 stars
Cuda
Topics: attention, cuda, efficient-attention, inference-acceleration, llm

NVIDIA / nccl-tests
#8
NCCL Tests
1,541 stars · 375 forks · +2 stars
Cuda

Dao-AILab / causal-conv1d
#6
Causal depthwise conv1d in CUDA, with a PyTorch interface
895 stars · 189 forks · +2 stars
Cuda

mirage-project / mirage
#14
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
2,290 stars · 214 forks · +1 star
Cuda

rapidsai / cuvs
#13
cuVS - a library for vector search and clustering on the GPU
775 stars · 193 forks · +1 star
Cuda
Topics: anns, clustering, cuda, distance, gpu

NVIDIA / CUDALibrarySamples
#9
CUDA Library Samples
2,424 stars · 459 forks · +1 star
Cuda
Topics: cuda, cudss, cufft, curand, cusolver

alibaba / rtp-llm
#3
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
1,179 stars · 204 forks · +1 star
Cuda
Topics: gpt, inference, llama, llm, llm-serving

deepseek-ai / DeepEP
#2
DeepEP: an efficient expert-parallel communication library
9,695 stars · 1,278 forks · +1 star
Cuda

deepseek-ai / DeepGEMM
#1
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
7,331 stars · 1,021 forks · +1 star
Cuda

NVIDIA / nvbench
#11
CUDA Kernel Benchmarking Library
868 stars · 109 forks
Cuda
Topics: benchmark, cuda, cuda-kernels, gpu, kernel-benchmark

NVIDIA / cub
#7
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
1,833 stars · 462 forks
Cuda
Topics: algorithms, cpp, cpp11, cpp14, cpp17

brucefan1983 / GPUMD
#4
Graphics Processing Units Molecular Dynamics
782 stars · 186 forks
Cuda
Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing

karpathy / llm.c
#7
LLM training in simple, raw C/CUDA
30,107 stars · 3,623 forks · +9 stars
Cuda

alibaba / rtp-llm
#1
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
1,178 stars · 204 forks · +8 stars
Cuda
Topics: gpt, inference, llama, llm, llm-serving

deepseek-ai / DeepGEMM
#2
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
7,330 stars · 1,019 forks · +6 stars
Cuda

NVIDIA / cuopt
#14
GPU accelerated decision optimization
918 stars · 184 forks · +5 stars
Cuda
Topics: cuda, gpu, linear-programming, optimization

thu-ml / SageAttention
#6
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
3,407 stars · 425 forks · +4 stars
Cuda
Topics: attention, cuda, efficient-attention, inference-acceleration, llm

deepseek-ai / DeepEP
#4
DeepEP: an efficient expert-parallel communication library
9,694 stars · 1,276 forks · +4 stars
Cuda

mirage-project / mirage
#15
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
2,289 stars · 214 forks · +2 stars
Cuda

brucefan1983 / GPUMD
#3
Graphics Processing Units Molecular Dynamics
782 stars · 186 forks · +2 stars
Cuda
Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing

NVlabs / instant-ngp
#13
Instant neural graphics primitives: lightning fast NeRF and more
17,414 stars · 2,065 forks · +1 star
Cuda
Topics: 3d-reconstruction, computer-graphics, computer-vision, cuda, function-approximation

HenryHuYu / DiffPhysDrone
#10
Published in Nature Machine Intelligence! The first real robot (quadrotor) based on differentiable physics training.
559 stars · 82 forks · +1 star
Cuda
Topics: drone, end-to-end, reinforcement-learning, robotics

Dao-AILab / causal-conv1d
#9
Causal depthwise conv1d in CUDA, with a PyTorch interface
893 stars · 188 forks · +1 star
Cuda

NVIDIA / CUDALibrarySamples
#5
CUDA Library Samples
2,423 stars · 459 forks · +1 star
Cuda
Topics: cuda, cudss, cufft, curand, cusolver

HazyResearch / ThunderKittens
#12
Tile primitives for speedy kernels
3,405 stars · 290 forks
Cuda

rapidsai / cugraph
#11
cuGraph - RAPIDS Graph Analytics Library
2,187 stars · 357 forks
Cuda
Topics: complex-networks, cuda, gpu, graph, graph-algorithms

NVIDIA / nccl-tests
#8
NCCL Tests
1,539 stars · 375 forks
Cuda

alibaba / rtp-llm
#4
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
1,170 stars · 204 forks · +16 stars
Cuda
Topics: gpt, inference, llama, llm, llm-serving

karpathy / llm.c
#3
LLM training in simple, raw C/CUDA
30,103 stars · 3,622 forks · +13 stars
Cuda

deepseek-ai / DeepGEMM
#1
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
7,325 stars · 1,014 forks · +6 stars
Cuda

HazyResearch / ThunderKittens
#12
Tile primitives for speedy kernels
3,405 stars · 290 forks · +4 stars
Cuda

deepseek-ai / DeepEP
#6
DeepEP: an efficient expert-parallel communication library
9,693 stars · 1,273 forks · +4 stars
Cuda

NVIDIA / CUDALibrarySamples
#10
CUDA Library Samples
2,422 stars · 458 forks · +3 stars
Cuda
Topics: cuda, cudss, cufft, curand, cusolver

thu-ml / SageAttention
#9
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
3,403 stars · 425 forks · +3 stars
Cuda
Topics: attention, cuda, efficient-attention, inference-acceleration, llm

NVlabs / instant-ngp
#13
Instant neural graphics primitives: lightning fast NeRF and more
17,414 stars · 2,065 forks · +2 stars
Cuda
Topics: 3d-reconstruction, computer-graphics, computer-vision, cuda, function-approximation

Dao-AILab / causal-conv1d
#11
Causal depthwise conv1d in CUDA, with a PyTorch interface
892 stars · 188 forks · +2 stars
Cuda

rapidsai / cuvs
#8
cuVS - a library for vector search and clustering on the GPU
772 stars · 192 forks · +2 stars
Cuda
Topics: anns, clustering, cuda, distance, gpu

NVIDIA / nccl-tests
#7
NCCL Tests
1,539 stars · 375 forks · +2 stars
Cuda

NVIDIA / cuopt
#2
GPU accelerated decision optimization
915 stars · 183 forks · +2 stars
Cuda
Topics: cuda, gpu, linear-programming, optimization

brucefan1983 / GPUMD
#5
Graphics Processing Units Molecular Dynamics
780 stars · 186 forks · +1 star
Cuda
Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing