Rriftt_ai.h – A bare-metal, dependency-free C23 tensor engine

Author: Rriftt

by Rriftt·Mar 3, 2026·4 points·0 comments

AI Analysis

●●● Banger · Wizardry · Zero to One · Big Brain

Drop-in C23 neural network engine with zero BLAS, Python, or build-system dependency. Genuinely rare.

Strengths

•True zero-dependency implementation of the Transformer stack (RoPE, attention, RMSNorm, SwiGLU, AdamW) in a ~2KB header: architectural insight
•Strict arena allocation eliminates hidden malloc/free calls during the forward and backward passes: genuine control and predictability
•Brutalist philosophy plus rigorous C23, with the full training pipeline (tokenizer, loss, optimizer) in one file: solves a real pain point

Weaknesses

•Early-stage: 1 star, minimal examples beyond the 20-line demo; no benchmarks vs PyTorch/JAX on realistic models
•No evidence of production use; adoption and long-term maintenance unclear

Post Description

Hi HN, I built rriftt_ai.h because I hit my breaking point with the modern deep learning stack.

I wanted to train and run Transformers, but I was exhausted by gigabyte-sized Python environments, opaque C++ build systems, and deep BLAS dependency trees. I wanted to see what it actually takes to execute a forward and backward pass from absolute scratch.

The result is a single-header, stb-style C library written in strict C23.

Architectural decisions I made:

- *Zero dependencies:* It requires nothing but a C compiler and the standard math library.
- *Strict memory control:* You instantiate a `RaiArena` at boot. The engine operates entirely within that perimeter. There are zero hidden `malloc` or `free` calls during execution.
- *The Full Stack:* It natively implements Scaled Dot-Product Attention, RoPE, RMSNorm, and SwiGLU. I also built the backprop routines, Cross-Entropy loss, AdamW optimizer, and a BPE tokenizer directly into the structs.
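To make the "strict memory control" point concrete, here is a minimal sketch of how a fixed-perimeter bump allocator can work. It is not the library's code: `DemoArena`, `demo_arena_init`, `demo_arena_alloc`, and `demo_arena_reset` are hypothetical names chosen for illustration, and the real `RaiArena` layout and API may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only: DemoArena is NOT the library's RaiArena. */
typedef struct {
    unsigned char *base; /* start of the caller-provided buffer */
    size_t cap;          /* total bytes available in the buffer */
    size_t used;         /* bytes handed out so far */
} DemoArena;

/* Wrap a caller-owned buffer. The arena itself never calls malloc. */
static DemoArena demo_arena_init(void *buf, size_t cap) {
    DemoArena a = { (unsigned char *)buf, cap, 0 };
    return a;
}

/* Bump-allocate `size` bytes aligned to `align` (must be a power of two).
   Returns NULL when the request does not fit; the arena never grows. */
static void *demo_arena_alloc(DemoArena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->cap || size > a->cap - offset) {
        return NULL; /* out of space: fail loudly instead of allocating */
    }
    a->used = offset + size;
    return a->base + offset;
}

/* Release every allocation at once, e.g. between training steps. */
static void demo_arena_reset(DemoArena *a) {
    a->used = 0;
}
```

In a design like this, the caller reserves one buffer at startup, every tensor is carved out of it, and running out of space is an explicit `NULL` rather than a hidden heap call. Resetting the arena is a single store, which is what makes per-step memory use predictable.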

It is currently public domain (or MIT, your choice). The foundation is stable and deterministic, but it is currently pure C math. I built this architecture to scale, so if anyone wants to tear apart my C23 implementation, audit the memory alignment, or submit SIMD/hardware-specific optimizations for the matmul operations, I'm actively reviewing PRs.
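For readers wondering what "pure C math" means for matmul, the following is a rough sketch of a scalar, row-major matrix multiply of the kind a SIMD patch would target. It is an assumption about the general shape of such a kernel, not the engine's actual routine: `demo_matmul` and its parameter order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not a function from rriftt_ai.h.
   Computes C[m x n] = A[m x k] * B[k x n], all matrices row-major. */
static void demo_matmul(const float *restrict a, const float *restrict b,
                        float *restrict c, size_t m, size_t k, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; i++) {
        float *crow = c + i * n;
        /* Clear the output row before accumulating into it. */
        for (size_t j = 0; j < n; j++) {
            crow[j] = 0.0f;
        }
        /* i-p-j loop order: the inner loop walks B and C contiguously,
           which is friendlier to caches and to auto-vectorization than
           the textbook i-j-p order. */
        for (size_t p = 0; p < k; p++) {
            const float aip = a[i * k + p];
            const float *brow = b + p * n;
            for (size_t j = 0; j < n; j++) {
                crow[j] += aip * brow[j];
            }
        }
    }
}
```

The innermost loop is a plain multiply-accumulate over contiguous floats, so it is the natural place to substitute intrinsics or a hardware-specific path while leaving the surrounding loops untouched.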