GPT-2 inference in pure C#, 0 bytes allocated per token
GPT-2 inference in pure C# allocating zero bytes per token beats ONNX Runtime.
Static-allocation MLP inference in ANSI C using a 2-slot circular buffer with fixed-stride indexing. An easy-to-use, minimal MLP alternative to GiorgosXou/NeuralNetworks [0], enhanced with PROGMEM, int quantization, etc.
Two-slot ring buffer cuts MLP RAM usage to the practical lower bound on microcontrollers.
Embedded developers, microcontroller engineers, TinyML practitioners
TFLite Micro · uTensor · Eloquent Arduino
This project is the result of that exploration: a fully static-allocation approach to MLP inference in ANSI C, using a simple 2-slot ring buffer to keep memory usage predictable and extremely low while staying fast.
I believe this is close to the practical lower bound for RAM usage in general-purpose CPU MLP inference without sacrificing speed or introducing runtime complexity.
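A minimal sketch of the 2-slot idea, not the project's actual code. It assumes one static array holding two slots, each as wide as the widest layer, with the slot index flipped after every layer. The toy network (2 inputs, 3 hidden, 1 output), its weights, and the names `ring`, `MAX_WIDTH` and `mlp_forward` are all made up for illustration.

```c
#include <assert.h>

/* Hypothetical toy network: 2 inputs -> 3 hidden (ReLU) -> 1 output. */
#define NUM_LAYERS 2
#define MAX_WIDTH  3   /* widest layer, input layer included */

static const unsigned char layer_sizes[NUM_LAYERS + 1] = { 2, 3, 1 };

/* Weights stored layer by layer, row-major: one row per output neuron. */
static const float weights[] = {
    /* layer 0: 3 x 2 */
    1.0f, 0.0f,
    0.0f, 1.0f,
    1.0f, 1.0f,
    /* layer 1: 1 x 3 */
    1.0f, 1.0f, -1.0f
};
static const float biases[] = { 0.0f, 0.0f, 0.0f, 0.5f };

/* The only activation storage: two slots of MAX_WIDTH floats each. */
static float ring[2 * MAX_WIDTH];

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Runs the network and returns a pointer to the slot holding the output. */
const float *mlp_forward(const float *input)
{
    const float *w = weights;
    const float *b = biases;
    unsigned slot = 0;
    unsigned l, i, j;

    assert(layer_sizes[0] <= MAX_WIDTH);
    for (i = 0; i < layer_sizes[0]; i++)
        ring[i] = input[i];               /* input goes into slot 0 */

    for (l = 0; l < NUM_LAYERS; l++) {
        /* Fixed stride: slot s always starts at s * MAX_WIDTH. */
        const float *in = &ring[slot * MAX_WIDTH];
        float *out = &ring[(slot ^ 1u) * MAX_WIDTH];
        unsigned n_in = layer_sizes[l];
        unsigned n_out = layer_sizes[l + 1];

        assert(n_out <= MAX_WIDTH);
        for (j = 0; j < n_out; j++) {
            float acc = *b++;
            for (i = 0; i < n_in; i++)
                acc += *w++ * in[i];
            /* ReLU on hidden layers, linear output layer. */
            out[j] = (l + 1 < NUM_LAYERS) ? relu(acc) : acc;
        }
        slot ^= 1u;                       /* output becomes next input */
    }
    return &ring[slot * MAX_WIDTH];
}
```

With this layout, activation RAM is a compile-time constant of `2 * MAX_WIDTH` floats no matter how many layers the network has. For the toy weights above, calling `mlp_forward` on `{1.0f, 2.0f}` gives an output of `0.5`.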
A more aggressive approach I've previously used is allocating and freeing memory per layer-to-layer pair during inference, but that introduces overhead and fragmentation if not used carefully. [1]
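For contrast, a rough sketch of the per-pair allocation approach in plain C. It illustrates the general idea only and is not the code behind REDUCE_RAM_DELETE_OUTPUTS. The toy network, its weights, and the name `mlp_forward_heap` are assumptions made up for the example.

```c
#include <stdlib.h>

/* Hypothetical toy network: 2 inputs -> 3 hidden (ReLU) -> 1 output. */
#define NUM_LAYERS 2

static const unsigned char sizes[NUM_LAYERS + 1] = { 2, 3, 1 };
static const float w_all[] = {
    1.0f, 0.0f,   0.0f, 1.0f,   1.0f, 1.0f,   /* layer 0: 3 x 2 */
    1.0f, 1.0f, -1.0f                         /* layer 1: 1 x 3 */
};
static const float b_all[] = { 0.0f, 0.0f, 0.0f, 0.5f };

/* Writes the output layer into result.
 * Returns 0 on success, -1 if an allocation fails. */
int mlp_forward_heap(const float *input, float *result)
{
    const float *w = w_all;
    const float *b = b_all;
    float *in, *out;
    unsigned l, i, j;

    in = (float *)malloc(sizes[0] * sizeof *in);
    if (in == NULL)
        return -1;
    for (i = 0; i < sizes[0]; i++)
        in[i] = input[i];

    for (l = 0; l < NUM_LAYERS; l++) {
        /* Only the current input/output pair is alive at any time. */
        out = (float *)malloc(sizes[l + 1] * sizeof *out);
        if (out == NULL) {
            free(in);
            return -1;
        }
        for (j = 0; j < sizes[l + 1]; j++) {
            float acc = *b++;
            for (i = 0; i < sizes[l]; i++)
                acc += *w++ * in[i];
            /* ReLU on hidden layers, linear output layer. */
            out[j] = (l + 1 < NUM_LAYERS && acc < 0.0f) ? 0.0f : acc;
        }
        free(in);      /* previous layer's activations are no longer needed */
        in = out;
    }

    for (i = 0; i < sizes[NUM_LAYERS]; i++)
        result[i] = in[i];
    free(in);
    return 0;
}
```

Peak activation memory here is the largest adjacent pair of layer widths (5 floats for this toy network) rather than twice the widest layer (6 floats). The price is one `malloc`/`free` per layer, allocator bookkeeping, and the fragmentation risk mentioned above.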
Curious how it compares to other minimal inference implementations people have seen (or built). Feedback and edge cases welcome. Hope you like it. Have fun. <3
[0]: https://github.com/GiorgosXou/NeuralNetworks#-research
[1]: look for REDUCE_RAM_DELETE_OUTPUTS in the source of [0]