GitHub Repository

Mamba SSM and Mamba-3 SISO in Rust with optional CUDA GPU acceleration. Inference and training (BPTT through SSM state, AdamW), CPU + GPU paths, custom CUDA kernels, CUDA Graph capture, f32 / bf16 / f16. Batch-invariant bf16 inference — per-row output is bit-identical across batch sizes.
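For readers unfamiliar with the recurrence behind an SSM layer, here is a minimal sketch of one inference step for a single channel with a diagonal state matrix. It uses the Mamba-1 style discretization (exp(Δ·A) for the state transition, Δ·B for the input), not the Mamba-3 variant. The `SsmChannel` type and its `step` method are hypothetical names chosen for illustration and are not taken from the repository. The state buffer is allocated once in `new`, so `step` itself performs no heap allocation.

```rust
/// One channel of a diagonal selective SSM.
/// Hypothetical illustration; not the repository's API.
pub struct SsmChannel {
    a: Vec<f32>, // diagonal of A (negative entries keep the state stable)
    d: f32,      // skip-connection weight
    h: Vec<f32>, // recurrent state, allocated once and updated in place
}

impl SsmChannel {
    pub fn new(a: Vec<f32>, d: f32) -> Self {
        let h = vec![0.0; a.len()];
        Self { a, d, h }
    }

    /// Advance by one token:
    ///   h <- exp(dt * A) * h + dt * B * x
    ///   y  = C . h + D * x
    /// `b` and `c` are passed per step because they are input-dependent
    /// in a selective SSM.
    pub fn step(&mut self, x: f32, dt: f32, b: &[f32], c: &[f32]) -> f32 {
        assert_eq!(b.len(), self.h.len());
        assert_eq!(c.len(), self.h.len());
        let mut y = self.d * x;
        let params = self.a.iter().zip(b.iter().zip(c.iter()));
        for (h_i, (&a_i, (&b_i, &c_i))) in self.h.iter_mut().zip(params) {
            let a_bar = (dt * a_i).exp(); // discretized state transition
            *h_i = a_bar * *h_i + dt * b_i * x;
            y += c_i * *h_i;
        }
        y
    }
}
```

Calling `step` once per token with that token's `x`, `dt`, `b`, and `c` reproduces autoregressive decoding for this one channel; a full layer would run many such channels plus the surrounding projections.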

10 stars · Rust

Mamba SSM in Rust – training and inference with custom CUDA kernels

by silvermpx·Mar 23, 2026·1 point·0 comments

AI Analysis

●● Solid · Wizardry · Niche Gem

Custom CUDA kernels for SSM recurrence with zero framework dependencies.

Strengths
  • Full BPTT through recurrent SSM state enables actual training, not just inference.
  • Zero-allocation single-step inference takes ~200 μs on CPU, with no GPU required.
  • Standalone design means no PyTorch, Burn, or Candle dependency chain.
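As a rough illustration of what backpropagation through time over a recurrent state involves, the sketch below trains-side differentiates a scalar linear recurrence h_t = a·h_{t-1} + b·x_t with output y_t = c·h_t and a squared-error loss. This is a toy under stated assumptions: the functions `forward`, `loss`, and `bptt` and the `Grads` struct are hypothetical names, and the code makes no claim about how the repository's kernels are organized.

```rust
/// Gradients of the loss with respect to the three scalar parameters.
#[derive(Debug, Clone, Copy)]
pub struct Grads {
    pub da: f64,
    pub db: f64,
    pub dc: f64,
}

/// Forward pass: h_t = a * h_{t-1} + b * x_t with h_0 = 0. Returns h_1..h_T.
pub fn forward(a: f64, b: f64, xs: &[f64]) -> Vec<f64> {
    let mut h = 0.0;
    xs.iter()
        .map(|&x| {
            h = a * h + b * x;
            h
        })
        .collect()
}

/// Loss: 0.5 * sum over t of (c * h_t - target_t)^2.
pub fn loss(a: f64, b: f64, c: f64, xs: &[f64], targets: &[f64]) -> f64 {
    forward(a, b, xs)
        .iter()
        .zip(targets)
        .map(|(&h, &t)| 0.5 * (c * h - t).powi(2))
        .sum()
}

/// Backward pass: walk time in reverse, carrying dL/dh across steps.
pub fn bptt(a: f64, b: f64, c: f64, xs: &[f64], targets: &[f64]) -> Grads {
    assert_eq!(xs.len(), targets.len());
    let hs = forward(a, b, xs);
    let mut g = Grads { da: 0.0, db: 0.0, dc: 0.0 };
    let mut dh_next = 0.0; // dL/dh_{t+1}; zero beyond the last step
    for t in (0..xs.len()).rev() {
        let err = c * hs[t] - targets[t];
        g.dc += err * hs[t];
        // h_t influences the loss through y_t and through h_{t+1}.
        let dh = c * err + a * dh_next;
        let h_prev = if t == 0 { 0.0 } else { hs[t - 1] };
        g.da += dh * h_prev;
        g.db += dh * xs[t];
        dh_next = dh;
    }
    g
}
```

A quick sanity check for code like this is to compare each gradient against a central finite difference of `loss`; the resulting `Grads` would then feed an optimizer such as AdamW.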
Weaknesses
  • Mamba implementations already exist in multiple languages; Rust isn't unique.
  • No benchmark comparisons against official Mamba or other ports.
Category
Target Audience

ML engineers wanting Rust-based SSM implementations

Similar To

mamba-minimal · Candle · Burn

Similar Projects

Developer Tools · ●●● Banger

cuTile Rust: Safe, data-race-free GPU kernels in Rust

Extends Rust's ownership model across GPU boundary with tile-based partitioning for data-race-free kernels.

Wizardry · Big Brain · Niche Gem
melihelibol
106184d ago
AI/ML · ●●● Banger

Glq LLM quantization using E8 lattice

E8 lattice codebooks beat GPTQ at 2-4 bpw with fused CUDA kernel skipping weight materialization.

Wizardry · Big Brain
acd
2019d ago