Back to browse
GitHub Repository

Machine Learning

12 starsC#

GPT-2 inference in pure C#, 0 bytes allocated per token

by dev-on-bike·May 17, 2026·1 point·1 comment

AI Analysis

●●●BangerWizardryBig Brain

GPT-2 inference in pure C# allocating zero bytes per token beats ONNX Runtime.

Strengths
  • Zero-allocation hot paths eliminate GC pressure for predictable real-time latency.
  • Matches PyTorch output within 1e-4 without native binaries or Python runtime.
  • KV-cache implementation scales O(N) while keeping managed memory usage flat.
Weaknesses
  • Supports only 14 operators currently, limiting complex model architecture imports.
  • Lacks the broad ecosystem and tooling surrounding established ONNX runtimes.
Category
Target Audience

.NET developers needing high-performance local inference

Similar To

ONNX Runtime · TensorFlow.NET · ML.NET

Similar Projects

AI/ML●●Solid

PyTorch on Java

LibTorch bindings bring CUDA and MPS backends to Java with LLaMA-3 inference included.

Niche GemBig Brain
pdsminer
2010d ago
AI/ML●●●Banger

Static-allocation MLP inference in ANSI C using a 2-slot ring buffer

Two-slot ring buffer cuts MLP RAM usage to the practical lower bound on microcontrollers.

Niche GemWizardryBig Brain
xou
4024d ago