PyTorch on Java
LibTorch bindings bring CUDA and MPS backends to Java with LLaMA-3 inference included.
Machine Learning
GPT-2 inference in pure C# allocating zero bytes per token beats ONNX Runtime.
.NET developers needing high-performance local inference
ONNX Runtime · TensorFlow.NET · ML.NET
LibTorch bindings bring CUDA and MPS backends to Java with LLaMA-3 inference included.
Hand-written FlashAttention and full gradient checks in pure CUDA with no PyTorch.
Formally verifies ResNet and ViT architectures using Lean 4 proofs.
Two-slot ring buffer cuts MLP RAM usage to the practical lower bound on microcontrollers.
Cryptographic signing of government UFO docs is a clever trust layer.
30x faster cold start than vLLM with zero PyTorch dependencies.