GitHub Repository

Zero-dependency single-header C++ library for streaming OpenAI & Anthropic LLM responses. Drop in llm_stream.hpp and go.

3 stars · C++

Single-header C++ libraries for LLM APIs – zero deps beyond libcurl

by Shmungus · Mar 6, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Ship It · Cozy

Single-header C++ LLM bindings that depend only on libcurl, but streaming and caching already exist elsewhere.

Strengths

•No dependencies beyond libcurl (hand-rolled JSON parser, no nlohmann/boost), which lowers deployment friction
•Five modular libraries (stream, cache, cost, retry, format) each solve a real pain point independently
•Offline token counting and cost estimation, plus a file-backed semantic cache with LRU eviction, are genuinely useful for C++ apps

Weaknesses

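•Crowded adjacent space: llama.cpp (local inference), cpp-httplib (HTTP client/server), and curlpp (libcurl wrapper) already serve C++ developers, though none of them is a dedicated client for hosted LLM APIs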
•No benchmarks vs existing solutions or evidence of performance advantage

Post Description

- llm-stream: streaming from OpenAI + Anthropic, callback-based
- llm-cache: file-backed semantic cache, LRU eviction
- llm-cost: offline token counting + cost estimation
- llm-retry: exponential backoff + circuit breaker + provider failover
- llm-format: structured output enforcer with hand-rolled JSON parser
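To make the retry item concrete, here is a minimal sketch of exponential backoff combined with a simple consecutive-failure circuit breaker. It is not the llm-retry API: `RetryPolicy`, `backoff_delay`, `CircuitBreaker`, and `call_with_retry` are hypothetical names chosen for illustration, the default delays are arbitrary, and provider failover is left out. A production breaker would usually also add a cooldown or half-open state, which this sketch omits.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical names throughout; this is not the llm-retry API.
struct RetryPolicy {
    int max_attempts = 4;
    std::chrono::milliseconds base_delay{100};
    std::chrono::milliseconds max_delay{2000};
};

// Delay before the retry that follows attempt `attempt` (0-based):
// base_delay * 2^attempt, capped at max_delay.
inline std::chrono::milliseconds backoff_delay(const RetryPolicy& p, int attempt) {
    auto delay = p.base_delay;
    for (int i = 0; i < attempt && delay < p.max_delay; ++i) delay *= 2;
    return std::min(delay, p.max_delay);
}

// Opens after `threshold` consecutive failures; any success closes it again.
class CircuitBreaker {
public:
    explicit CircuitBreaker(int threshold) : threshold_(threshold) {}
    bool allow() const { return failures_ < threshold_; }
    void record_success() { failures_ = 0; }
    void record_failure() { ++failures_; }

private:
    int threshold_;
    int failures_ = 0;
};

// Calls `op` until it returns true, attempts run out, or the breaker opens.
// `sleep_fn` is injectable so callers can avoid real sleeping (e.g. in tests).
inline bool call_with_retry(
    const std::function<bool()>& op,
    const RetryPolicy& policy,
    CircuitBreaker& breaker,
    const std::function<void(std::chrono::milliseconds)>& sleep_fn =
        [](std::chrono::milliseconds d) { std::this_thread::sleep_for(d); }) {
    for (int attempt = 0; attempt < policy.max_attempts; ++attempt) {
        if (!breaker.allow()) return false;  // circuit open: fail fast
        if (op()) {
            breaker.record_success();
            return true;
        }
        breaker.record_failure();
        if (attempt + 1 < policy.max_attempts) {
            sleep_fn(backoff_delay(policy, attempt));
        }
    }
    return false;
}
```

A caller would wrap the HTTP request in `op` and share one `CircuitBreaker` per provider, so that repeated failures against one provider stop further calls to it instead of retrying indefinitely.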

Drop in one .hpp, link libcurl, done. No nlohmann, no boost, no Python.
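As a rough illustration of what callback-based streaming on top of libcurl involves, the sketch below buffers raw response bytes and invokes a user callback for each `data:` line of a server-sent-events stream, the framing that both OpenAI and Anthropic use for streamed responses. This is not the llm_stream.hpp API: `SseLineParser` and `feed` are hypothetical names. The assumption is that a libcurl write callback (registered with `CURLOPT_WRITEFUNCTION`) would forward each received chunk to `feed`. For brevity the sketch ignores `event:` lines and does not join multi-line `data:` fields into one event.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical class; this is not the llm_stream.hpp API.
class SseLineParser {
public:
    using DataCallback = std::function<void(const std::string&)>;

    explicit SseLineParser(DataCallback on_data) : on_data_(std::move(on_data)) {}

    // Accepts an arbitrary chunk of bytes. Chunks may split a line anywhere,
    // so incomplete trailing data stays buffered until the next call.
    void feed(const char* bytes, std::size_t n) {
        buffer_.append(bytes, n);
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = buffer_.find('\n', start);
            if (nl == std::string::npos) break;
            std::string line = buffer_.substr(start, nl - start);
            if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
            handle_line(line);
            start = nl + 1;
        }
        buffer_.erase(0, start);  // keep only the unfinished tail
    }

private:
    void handle_line(const std::string& line) {
        const std::string prefix = "data:";
        // Skip blank lines, comments, and fields other than data.
        if (line.compare(0, prefix.size(), prefix) != 0) return;
        std::size_t pos = prefix.size();
        if (pos < line.size() && line[pos] == ' ') ++pos;  // one optional space
        on_data_(line.substr(pos));
    }

    DataCallback on_data_;
    std::string buffer_;
};
```

The payload handed to the callback is still raw text (typically a JSON object per event), so a real client would parse it next and extract the token delta before passing it on to the application.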

https://github.com/Mattbusel/llm-stream
https://github.com/Mattbusel/llm-cache
https://github.com/Mattbusel/llm-cost
https://github.com/Mattbusel/llm-retry
https://github.com/Mattbusel/llm-format