INT21 – Self-Improving PTX Kernel Factory
PTX kernel generation is rare, but 'self-improving' claims need verifiable benchmarks.
AI-generated x86-64 assembly vs. GCC -O3 on production kernels: 4.8–6.3x faster on base64, verified with 300K fuzz iterations.
A PSHUFB nibble trick beats GCC's lookup-table code by 4.8–6.3x on base64, backed by a reproducible fuzzing methodology.
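The listing does not say exactly how the generated kernel uses PSHUFB. As a hedged illustration only, here is a Python emulation of one published technique, the base64 encoding translation from Wojciech Muła and Daniel Lemire's SIMD base64 work. A saturating subtract and a compare collapse each 6-bit value into a class index below 16, so PSHUFB only needs the low nibble. A single 16-entry lookup then yields the offset that maps the value to its ASCII character. The helper names (`split_sextets`, `sextets_to_ascii`, `encode_no_padding`) are hypothetical. Padding is omitted. Whether the project uses this variant, or a decode-side nibble validator instead, is an assumption.

```python
import base64
import os

# Per-class offsets stored as bytes (mod 256). Index meanings:
#   0      -> sextet 26..51 ('a'..'z'), offset +71
#   1..10  -> sextet 52..61 ('0'..'9'), offset -4
#   11     -> sextet 62 ('+'), offset -19
#   12     -> sextet 63 ('/'), offset -16
#   13     -> sextet 0..25 ('A'..'Z'), offset +65
#   14, 15 -> unused
SHIFT_LUT = [71] + [(-4) & 0xFF] * 10 + [(-19) & 0xFF, (-16) & 0xFF, 65, 0, 0]


def pshufb(table, indices):
    """Emulate PSHUFB with one 16-entry table broadcast to every lane:
    output byte i is table[indices[i] & 0x0F], or 0 if bit 7 is set."""
    return [0 if idx & 0x80 else table[idx & 0x0F] for idx in indices]


def split_sextets(data):
    """Scalar stand-in for the SIMD unpack step: 3 bytes -> 4 six-bit values."""
    out = []
    for i in range(0, len(data), 3):
        n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        out += [(n >> 18) & 63, (n >> 12) & 63, (n >> 6) & 63, n & 63]
    return out


def sextets_to_ascii(sextets):
    # Saturating subtract 51 (like PSUBUSB): 0..51 -> 0, 52..63 -> 1..12.
    reduced = [max(v - 51, 0) for v in sextets]
    # Compare-and-OR: sextets below 26 are moved to class 13.
    reduced = [(r | 13) if v < 26 else r for v, r in zip(sextets, reduced)]
    # One nibble-indexed table lookup gives the offset to add to each sextet.
    offsets = pshufb(SHIFT_LUT, reduced)
    return [(v + o) & 0xFF for v, o in zip(sextets, offsets)]


def encode_no_padding(data: bytes) -> str:
    """Base64-encode input whose length is a multiple of 3 (no '=' padding)."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3 in this sketch")
    return bytes(sextets_to_ascii(split_sextets(data))).decode("ascii")


if __name__ == "__main__":
    # Differential fuzzing against the standard-library encoder.
    for _ in range(1000):
        data = os.urandom(3 * (os.urandom(1)[0] % 64))
        assert encode_no_padding(data) == base64.b64encode(data).decode("ascii")
    print("ok")
```

Comparing against a trusted reference on random inputs is the same differential-fuzzing idea the listing credits for verification. A real kernel would process 16 or 32 bytes per instruction rather than one Python list at a time.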
Low-level systems programmers, compiler engineers, AI researchers
Superoptimizer (classical) · STOKE (superoptimization tool)
Well-reasoned three-tier architecture, but lacks reference implementations and evidence of adoption.
Hand-tuned SSE assembly particle engine from 2002, now running in the browser via WASM.
6KB binary for an AI agent: it would fit on a 1.44 MB floppy disk about 240 times over.
Autonomous kernel optimizer that won an MLSys contest with a 34.93x speedup.
Beats PyTorch eager mode by 5.29x on RMSNorm using autonomous agent loops.