StreamHouse – Open-source Kafka alternative
S3-native storage cuts Kafka storage costs from hundreds of dollars to roughly $23 per TB per month.
Open-source event streaming platform built on S3. Kafka-compatible APIs, a built-in SQL engine, and a schema registry ship in one Rust binary that replaces Kafka + ZooKeeper + KSQL. Retention costs pennies, not thousands.
Removes broker disk management entirely: using S3 as the durable log cuts Kafka's operational burden and storage cost.
Platform engineers and data teams running high-volume streaming workloads; cost-conscious ops teams.
Apache Kafka · Pulsar · Redpanda
I built StreamHouse, an open-source streaming platform that replaces Kafka's broker-managed storage with direct S3 writes. The goal: same semantics, fraction of the cost.
How it works: Producers batch and compress records, a stateless server manages partition routing and metadata (SQLite for dev, PostgreSQL for prod), and segments land directly in S3. Consumers read from S3 with a local segment cache. There are no broker disks to manage and no replication factor to tune, because S3 is designed for 11 nines (99.999999999%) of durability out of the box.
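To make the write and read path above concrete, here is a minimal sketch, not StreamHouse's actual code. Everything in it is an assumption for illustration: `FakeObjectStore` is an in-memory stand-in for S3, the `index` dict stands in for the SQLite/PostgreSQL metadata store, the `topic/partition/base_offset.seg` key layout is hypothetical, and `zlib` is swapped in for LZ4 so the example runs on the standard library alone.

```python
import json
import zlib


class FakeObjectStore:
    """In-memory stand-in for S3 (assumption): object key -> bytes."""

    def __init__(self):
        self.objects = {}
        self.get_calls = 0  # counts reads so cache hits are observable

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        self.get_calls += 1
        return self.objects[key]


class StatelessServer:
    """Routes batches to segments. The index dict stands in for the
    external metadata database, so the server itself holds no log data."""

    def __init__(self, store):
        self.store = store
        self.index = {}  # (topic, partition) -> [(base_offset, count, key)]

    def append_batch(self, topic, partition, records):
        segments = self.index.setdefault((topic, partition), [])
        base = segments[-1][0] + segments[-1][1] if segments else 0
        key = f"{topic}/{partition}/{base:020d}.seg"  # hypothetical layout
        # One compressed batch becomes one segment object (zlib, not LZ4).
        payload = zlib.compress(json.dumps(records).encode("utf-8"))
        self.store.put(key, payload)
        segments.append((base, len(records), key))
        return base

    def locate(self, topic, partition, offset):
        for base, count, key in self.index.get((topic, partition), []):
            if base <= offset < base + count:
                return base, key
        raise KeyError(f"offset {offset} not found")


class Consumer:
    """Reads segments from the object store through a local cache."""

    def __init__(self, server, store):
        self.server = server
        self.store = store
        self.cache = {}  # segment key -> decoded records

    def read(self, topic, partition, offset):
        base, key = self.server.locate(topic, partition, offset)
        if key not in self.cache:
            raw = zlib.decompress(self.store.get(key))
            self.cache[key] = json.loads(raw)
        return self.cache[key][offset - base]


store = FakeObjectStore()
server = StatelessServer(store)
server.append_batch("orders", 0, ["a", "b", "c"])  # offsets 0-2
server.append_batch("orders", 0, ["d", "e"])       # offsets 3-4
consumer = Consumer(server, store)
print(consumer.read("orders", 0, 1), consumer.read("orders", 0, 3))
```

Reading a second offset from a segment that is already cached does not touch the object store again, which is the point of the local segment cache.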
What's there today:
- Producer API with batching, LZ4 compression, and offset tracking (62K records/sec)
- Consumer API with consumer groups, auto-commit, and multi-partition fanout (30K+ records/sec)
- Kafka-compatible protocol (works with existing Kafka clients)
- REST API, gRPC API, CLI, and a web UI
- Docker Compose setup for trying it locally in 5 minutes
The cost model is what motivated this. Kafka's storage costs scale with replication factor × retention × volume. With S3 at $0.023/GB/month, storing a TB of events costs ~$23/month in storage charges (request and data transfer fees are billed separately) instead of hundreds on broker EBS volumes.
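The arithmetic behind that comparison can be sketched as follows. The S3 price is the one quoted above; the EBS price of $0.08/GB/month and the replication factor of 3 are assumptions chosen for illustration, and request, transfer, and compute costs are ignored on both sides.

```python
S3_PRICE_PER_GB_MONTH = 0.023   # figure quoted in the post
EBS_PRICE_PER_GB_MONTH = 0.08   # assumption: a typical general-purpose SSD rate


def s3_monthly_cost(stored_gb):
    """Storage-only cost of keeping the log in S3 (one logical copy)."""
    return stored_gb * S3_PRICE_PER_GB_MONTH


def broker_monthly_cost(stored_gb, replication_factor=3,
                        price_per_gb=EBS_PRICE_PER_GB_MONTH):
    """Storage-only cost on broker disks: every byte is stored once
    per replica, so cost scales with the replication factor."""
    return stored_gb * replication_factor * price_per_gb


stored_gb = 1000  # 1 TB of retained events
print(f"S3:     ${s3_monthly_cost(stored_gb):.2f}/month")
print(f"Broker: ${broker_monthly_cost(stored_gb):.2f}/month")
```

Under these assumptions 1 TB comes to about $23/month on S3 versus about $240/month on broker volumes, before any disk headroom is added on the broker side.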
Written in Rust, ~50K lines across 15 crates. Apache 2.0 licensed.
GitHub: https://github.com/gbram1/streamhouse
Happy to answer questions about the architecture, tradeoffs, or what I learned building this.
Stateless agents with S3 storage cut Kafka+Flink costs by 92%, but the confluent-kafka wire protocol still needs fixes.
S3-backed Kafka eliminates broker state management; inspired by Warpstream but open-source.
Wire-protocol parsing means zero Docker overhead for Kafka integration tests.
Stateless Go proxy routes LLM requests by model name to vLLM backends.
Stateless architecture means your GitHub data is never stored on their servers.