Single-header C++ library for streaming OpenAI & Anthropic LLM responses, with no dependencies beyond libcurl. Drop in llm_stream.hpp and go.

3 stars · C++

Single-header C++ libraries for LLM APIs – zero deps beyond libcurl

by Shmungus · Mar 6, 2026 · 1 point · 0 comments

AI Analysis

Solid · Ship It · Cozy

Single-header C++ LLM bindings that depend only on libcurl, but streaming and caching are already available elsewhere.

Strengths
  • Minimal-dependency design (only libcurl; hand-rolled JSON parser, no nlohmann/Boost) lowers deployment friction
  • Five modular libraries (stream, cache, cost, retry, format) each solve a real pain point independently
  • Offline token counting and cost estimation, plus the LRU-backed semantic cache, are genuinely useful for C++ apps
Weaknesses
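  • Crowded space: llama.cpp already covers LLM inference in C++, and general-purpose HTTP clients such as cpp-httplib and curlpp make calling LLM APIs from C++ straightforward
  • No benchmarks against existing solutions and no evidence of a performance advantage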
Target Audience

C++ developers building LLM-integrated applications, including game engines and embedded systems

Similar To

llama.cpp · cpp-httplib · curlpp

Post Description

- llm-stream: streaming from OpenAI + Anthropic, callback-based
- llm-cache: file-backed semantic cache, LRU eviction
- llm-cost: offline token counting + cost estimation
- llm-retry: exponential backoff + circuit breaker + provider failover
- llm-format: structured output enforcer with hand-rolled JSON parser

Drop in one .hpp, link libcurl, done. No nlohmann, no boost, no Python.

https://github.com/Mattbusel/llm-stream
https://github.com/Mattbusel/llm-cache
https://github.com/Mattbusel/llm-cost
https://github.com/Mattbusel/llm-retry
https://github.com/Mattbusel/llm-format
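The llm-stream entry is described only as "callback-based". Both OpenAI and Anthropic deliver streamed responses as server-sent events (SSE), where each payload sits on a `data:` line, and libcurl's write callback may hand over chunks that split a line anywhere. The following is a minimal sketch under those assumptions, not llm-stream's actual API: `SseDataParser` and `sse_write_cb` are hypothetical names, and decoding the JSON inside each payload is left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper (not llm-stream's API): reassembles SSE "data:" payloads
// from arbitrarily split network chunks and hands each one to a callback.
class SseDataParser {
public:
    using Callback = std::function<void(const std::string&)>;

    explicit SseDataParser(Callback on_data) : on_data_(std::move(on_data)) {}

    // Feed raw bytes as they arrive; a chunk may end in the middle of a line.
    void feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::size_t pos;
        // Handle every complete line; keep any trailing partial line buffered.
        while ((pos = buffer_.find('\n')) != std::string::npos) {
            std::string line = buffer_.substr(0, pos);
            buffer_.erase(0, pos + 1);
            if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
            handle_line(line);
        }
    }

    // True once OpenAI's "data: [DONE]" sentinel has been seen.
    bool done() const { return done_; }

private:
    void handle_line(const std::string& line) {
        const std::string prefix = "data:";
        // Skip "event:" lines (used by Anthropic), comments, and blank separators.
        if (line.compare(0, prefix.size(), prefix) != 0) return;
        std::string payload = line.substr(prefix.size());
        if (!payload.empty() && payload.front() == ' ') payload.erase(0, 1);
        if (payload == "[DONE]") { done_ = true; return; }
        on_data_(payload);  // payload is still raw JSON text
    }

    Callback on_data_;
    std::string buffer_;
    bool done_ = false;
};

// Same shape as a libcurl CURLOPT_WRITEFUNCTION callback; the parser is
// expected to arrive through CURLOPT_WRITEDATA as `userdata`.
static std::size_t sse_write_cb(char* ptr, std::size_t size, std::size_t nmemb, void* userdata) {
    auto* parser = static_cast<SseDataParser*>(userdata);
    parser->feed(ptr, size * nmemb);
    return size * nmemb;  // returning fewer bytes would make libcurl abort the transfer
}
```

Wiring it up would mean passing `sse_write_cb` as `CURLOPT_WRITEFUNCTION` and a pointer to the parser as `CURLOPT_WRITEDATA`. Buffering partial lines is the key design point: treating each callback invocation as a whole event breaks as soon as the network splits a line.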
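llm-cache is described as a file-backed semantic cache with LRU eviction. The eviction half can be sketched with the classic list-plus-hash-map layout below. This is an assumption about the technique, not the library's code: `LruCache` is a hypothetical name, keys are exact strings rather than semantic (embedding-based) matches, and nothing is persisted to a file.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory LRU map from a prompt key to a cached response.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // A hit becomes the most recently used entry: move it to the front.
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Existing key: overwrite the value and mark it most recently used.
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (order_.size() == capacity_) {
            // Full: evict the least recently used entry from the back.
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recent, back = least recent
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

`std::list::splice` relinks a node without invalidating iterators, which is why the hash map can keep pointing at list nodes while entries move around; both `get` and `put` stay O(1) on average.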
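For llm-cost, "offline token counting" could be approximated without a real tokenizer by the common rule of thumb of roughly four characters per token for English text. The sketch below assumes that heuristic and caller-supplied prices; `estimate_tokens`, `Pricing`, and `estimate_cost_usd` are hypothetical names, and a provider's actual tokenizer will produce different counts.

```cpp
#include <cstddef>
#include <string>

// Rough offline estimate assuming ~4 characters per token (English text).
// A heuristic only, not a BPE tokenizer; provider counts will differ.
std::size_t estimate_tokens(const std::string& text) {
    return (text.size() + 3) / 4;  // round up so any non-empty text is >= 1 token
}

// Hypothetical pricing record: USD per one million tokens, supplied by the caller.
struct Pricing {
    double input_per_million;
    double output_per_million;
};

// Cost = input_tokens * input_price / 1e6 + output_tokens * output_price / 1e6.
double estimate_cost_usd(std::size_t input_tokens, std::size_t output_tokens, const Pricing& p) {
    return static_cast<double>(input_tokens) * p.input_per_million / 1e6 +
           static_cast<double>(output_tokens) * p.output_per_million / 1e6;
}
```

For example, 1,000,000 input tokens at a hypothetical $2 per million plus 500,000 output tokens at a hypothetical $8 per million comes to $2 + $4 = $6.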
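llm-retry combines exponential backoff, a circuit breaker, and provider failover. One minimal way to express those three pieces follows; every name (`backoff_delay`, `CircuitBreaker`, `pick_provider`) is hypothetical, the thresholds are the caller's choice, and the current time is passed in explicitly so the logic can be exercised without sleeping. It omits the random jitter that production retry code usually adds to the delay.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Delay before retry `attempt` (0-based): base * 2^attempt, capped at `cap`.
Ms backoff_delay(unsigned attempt, Ms base, Ms cap) {
    const unsigned shift = std::min(attempt, 30u);  // clamp so the shift cannot overflow
    const Ms delay = base * (Ms::rep{1} << shift);
    return std::min(delay, cap);
}

// Minimal circuit breaker: opens after `threshold` consecutive failures and
// lets calls through again once `cooldown` has elapsed (half-open). A failure
// while half-open re-opens it; a success closes it.
class CircuitBreaker {
public:
    CircuitBreaker(unsigned threshold, Ms cooldown) : threshold_(threshold), cooldown_(cooldown) {}

    bool allow(Clock::time_point now) const {
        if (failures_ < threshold_) return true;  // closed
        return now - opened_at_ >= cooldown_;     // open until the cooldown passes
    }

    void record_success() { failures_ = 0; }

    void record_failure(Clock::time_point now) {
        if (++failures_ >= threshold_) opened_at_ = now;  // (re)open, restart cooldown
    }

private:
    unsigned threshold_;
    Ms cooldown_;
    unsigned failures_ = 0;
    Clock::time_point opened_at_{};
};

// Failover: index of the first provider whose breaker allows a call, or -1.
int pick_provider(const std::vector<CircuitBreaker>& breakers, Clock::time_point now) {
    for (std::size_t i = 0; i < breakers.size(); ++i)
        if (breakers[i].allow(now)) return static_cast<int>(i);
    return -1;
}
```

Injecting `now` keeps the state machine deterministic: a caller would pass `Clock::now()` in real use, while tests can step a fixed time point forward by exact amounts.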
