Llmtop – htop for LLM Inference Clusters (vLLM, SGLang, NIM, Ollama)
htop for vLLM clusters without the Prometheus overhead.
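One plausible reading of "without the Prometheus overhead": vLLM servers already expose Prometheus-format text on an HTTP `/metrics` endpoint, so a terminal UI can parse that text directly instead of running a Prometheus server. The following is a minimal sketch under that assumption. Llmtop's actual approach is not described here. The metric names such as `vllm:num_requests_running` are illustrative and may differ between vLLM versions. The `parse_metrics` and `top_row` helpers are hypothetical names.

```python
import re
from collections import defaultdict

# Prometheus text format: 'name{label="v",...} value [timestamp]'.
# Simplification: label values containing "}" are not handled.
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)


def parse_metrics(text: str) -> dict[str, float]:
    """Sum each metric over all of its label sets (e.g. across models on one server)."""
    totals: dict[str, float] = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE_RE.match(line)
        if match:
            totals[match.group("name")] += float(match.group("value"))
    return dict(totals)


def top_row(host: str, metrics: dict[str, float]) -> str:
    """Format one htop-style row; the metric names are illustrative assumptions."""
    running = int(metrics.get("vllm:num_requests_running", 0))
    waiting = int(metrics.get("vllm:num_requests_waiting", 0))
    return f"{host:<12} run={running:>3} wait={waiting:>3}"


# Hard-coded sample; a real tool would fetch http://<host>:<port>/metrics per node.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="qwen"} 4.0
vllm:num_requests_waiting{model_name="qwen"} 2.0
"""

print(top_row("node-a", parse_metrics(SAMPLE)))
```

Polling each node's endpoint on a short interval and redrawing one row per node gives the htop-like view. No time-series database is involved, so there is no history beyond what the tool keeps in memory.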
Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM
Build vLLM from scratch with PagedAttention kernels when llama.cpp already exists.
Developers learning CUDA kernel engineering and LLM inference internals
vLLM · llama.cpp · mlc-llm
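PagedAttention, as introduced with vLLM, stores each sequence's KV cache in fixed-size blocks that do not need to be contiguous in GPU memory. A per-sequence block table maps logical blocks to physical ones. The host-side bookkeeping for that idea can be sketched in Python. This is a simplified illustration with no prefix sharing, copy-on-write, or preemption. It is not this project's C++/CUDA code, and `BlockAllocator` and `Sequence` are hypothetical names.

```python
from dataclasses import dataclass, field


class BlockAllocator:
    """Free list of physical KV-cache block ids (each block holds block_size tokens)."""

    def __init__(self, num_blocks: int):
        # Reversed so that pop() hands out ids 0, 1, 2, ... in order.
        self.free = list(range(num_blocks - 1, -1, -1))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted")  # a real scheduler would preempt instead
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    block_size: int
    num_tokens: int = 0
    # block_table[i] = physical block holding logical block i of this sequence
    block_table: list[int] = field(default_factory=list)

    def append_token(self, alloc: BlockAllocator) -> None:
        # Grab a new physical block only when the last one is full (or none exists yet).
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(alloc.allocate())
        self.num_tokens += 1

    def slot(self, pos: int) -> tuple[int, int]:
        """Map a token position to (physical block id, offset inside that block)."""
        return self.block_table[pos // self.block_size], pos % self.block_size

    def free_all(self, alloc: BlockAllocator) -> None:
        for block_id in self.block_table:
            alloc.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


alloc = BlockAllocator(num_blocks=8)
a, b = Sequence(block_size=16), Sequence(block_size=16)
for _ in range(16):
    a.append_token(alloc)  # a fills block 0
for _ in range(16):
    b.append_token(alloc)  # b fills block 1
a.append_token(alloc)      # a's 17th token lands in block 2, not next to block 0
print(a.block_table, a.slot(16))  # [0, 2] (2, 0)
```

The attention kernel then reads K/V through the block table rather than from one contiguous buffer. That indirection lets the engine avoid reserving max-length memory per request.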
Teaches LLM RL training with a working Tic-Tac-Toe demo that beats gpt-5-mini.
28% faster Vulkan-to-CUDA on Qwen, but llm.c and llama.cpp already own inference.
30x faster cold start than vLLM with zero PyTorch dependencies.
E8 lattice codebooks beat GPTQ at 2-4 bpw, using a fused CUDA kernel that skips weight materialization.
INT4 inference engine beats llama.cpp on VRAM usage, but competes against established tools.
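The INT4 tagline gives no details on the engine's quantization format. As a generic illustration of where INT4 VRAM savings come from, the following is a minimal sketch of symmetric group-wise INT4 quantization with two 4-bit values packed per byte. The group size, the symmetric [-8, 7] mapping, the low-nibble-first packing, and the function names are all assumptions, not the engine's actual format.

```python
def quantize_int4(weights: list[float], group_size: int = 32) -> tuple[bytes, list[float]]:
    """Symmetric per-group INT4: each group of weights shares one float scale.

    Assumptions (illustrative only): even group_size, values mapped to [-8, 7],
    two 4-bit values packed per byte, low nibble first.
    """
    if group_size % 2:
        raise ValueError("group_size must be even so byte pairs never straddle groups")
    packed, scales = bytearray(), []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        amax = max(abs(w) for w in group)
        scale = amax / 7 if amax > 0 else 1.0  # largest magnitude maps to +/-7
        scales.append(scale)
        q = [max(-8, min(7, round(w / scale))) for w in group]
        if len(q) % 2:
            q.append(0)  # only the final, partial group can be odd; pad it
        for lo, hi in zip(q[0::2], q[1::2]):
            # "& 0xF" keeps the 4-bit two's-complement pattern of negative values
            packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed), scales


def dequantize_int4(packed: bytes, scales: list[float], n: int, group_size: int = 32) -> list[float]:
    """Unpack nibbles, sign-extend, and rescale by each value's group scale."""
    out: list[float] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            q = nibble - 16 if nibble >= 8 else nibble  # sign-extend 4-bit value
            out.append(q * scales[len(out) // group_size])
    return out[:n]  # drop padding


w = [0.5, -1.0, 0.25, 0.0, 0.75, -0.7, 1.4]
packed, scales = quantize_int4(w, group_size=4)
restored = dequantize_int4(packed, scales, len(w), group_size=4)
print(len(packed), "bytes for", len(w), "weights")  # 4 bytes for 7 weights
```

With fp16 scales and a group size of 32, storage comes to 4 + 16/32 = 4.5 bits per weight versus 16 for fp16. That is roughly 3.6x less weight memory. A GPU engine would typically dequantize inside the matmul kernel rather than materializing full-precision weights.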