GitHub Repository

Free, Open-Source AI API Gateway with Gemini, OpenAI & Anthropic Compatibility in 1 file

44 stars · TypeScript

OpenGem – A Load-Balanced Gemini API Proxy (No API Key Required)

by ariozgun·Feb 22, 2026·7 points·3 comments

AI Analysis

Pass · Bold Bet

Pools free Google accounts behind reverse-engineered Gemini CLI auth; likely violates ToS and is unsustainable.

Strengths
  • Smart load balancer distinguishes real 429 quota exhaustion from short RPM bursts and rotates accounts automatically
  • Standardized `/v1beta` endpoint drops into existing Gen AI SDKs without code rewrites
  • Dashboard with real-time usage stats and function-calling support is polished
Weaknesses
  • Reverse-engineering Google credentials and pooling accounts likely violates Gemini ToS—legal/account-ban risk
  • Depends entirely on Google's free tier not patching the auth exploit; no moat or durability
Target Audience

Developers building AI agents, LLM applications on limited budgets

Similar To

Unkey · LiteLLM · Anthropic API gateway

Post Description

Hi HN! I built OpenGem, an open-source, load-balanced proxy for the Gemini API that requires absolutely no paid API keys.

GitHub: https://github.com/arifozgun/OpenGem

The Context: Like many developers, I was constantly hitting "429 Quota Exceeded" errors while building AI agents and processing large payloads on free tiers. I wanted to build freely without calculating API costs for every test request.

How it works: I reverse-engineered the official Gemini CLI authentication to get standard API access. However, a single free Google account quota depletes quickly. To solve this, I built a Smart Load Balancer at the core of OpenGem.

What it does:
- You connect multiple idle/free Google accounts to the dashboard via OAuth.
- OpenGem acts as a standard endpoint (`POST /v1beta/models/{model}`).
- It routes traffic to the least-used account. If an account hits a real 429 quota limit, OpenGem instantly detects it, puts that account on a 60-minute cooldown, and seamlessly retries with the next available account. It differentiates between simple RPM bursts and actual limits.
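
To make the routing behavior concrete, here is a minimal sketch of least-used selection with a 60-minute cooldown. It is not taken from the OpenGem source: the `Account` shape, `pickAccount`, `routeRequest`, and the `send` callback are all hypothetical names, and how a real quota limit is told apart from an RPM burst is left to the caller, since the post does not describe it.

```typescript
// Hypothetical account record; field names are illustrative, not OpenGem's.
interface Account {
  id: string;
  requestCount: number;  // requests routed to this account so far
  cooldownUntil: number; // epoch ms; 0 means the account is available
}

// 60-minute cooldown, as described in the post.
const COOLDOWN_MS = 60 * 60 * 1000;

type Outcome = "ok" | "quota_exhausted";

interface SendResult<T> {
  outcome: Outcome;
  body?: T;
}

// Pick the available (not cooling down) account with the fewest requests.
function pickAccount(accounts: Account[], now: number): Account | undefined {
  return accounts
    .filter((a) => a.cooldownUntil <= now)
    .reduce<Account | undefined>(
      (best, a) =>
        best === undefined || a.requestCount < best.requestCount ? a : best,
      undefined,
    );
}

// Route one request. `send` is an assumed callback that performs the upstream
// call and reports whether the account hit a real quota limit.
async function routeRequest<T>(
  accounts: Account[],
  send: (account: Account) => Promise<SendResult<T>>,
  now: () => number = Date.now,
): Promise<T> {
  for (;;) {
    const account = pickAccount(accounts, now());
    if (!account) throw new Error("All accounts are cooling down");

    account.requestCount += 1;
    const result = await send(account);
    if (result.outcome === "ok") return result.body as T;

    // Real quota exhaustion: park this account and retry with the next one.
    account.cooldownUntil = now() + COOLDOWN_MS;
  }
}
```

The loop always terminates: each failed attempt removes one account from the available set, so it either succeeds or runs out of accounts and throws.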

Tech specs:
- Fully compatible with official Google SDKs (`@google/genai`), LangChain, and standard SSE streaming (no broken [DONE] chunks).
- Supports native "tools" (Function Calling) for agentic workflows.
- Raised payload limit to 50MB for massive contexts.
- AES-256-GCM encryption for all sensitive configs and OAuth tokens at rest.
- Toggle between Firebase Firestore or a fully offline Local JSON database.
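
For readers curious about the at-rest encryption item, the sketch below shows one common way to do AES-256-GCM with Node's built-in `crypto` module. It is an illustration under assumptions, not OpenGem's actual code: the `Sealed` record and the `seal`/`unseal` helpers are hypothetical, and key management (where the 32-byte key comes from) is out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage format: base64-encoded IV, auth tag, and ciphertext.
interface Sealed {
  iv: string;
  tag: string;
  data: string;
}

// Encrypt a secret (e.g. an OAuth token) with a 32-byte key.
function seal(plaintext: string, key: Buffer): Sealed {
  const iv = randomBytes(12); // fresh 96-bit nonce per message
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypt and authenticate; throws if the key is wrong or the record was modified.
function unseal(sealed: Sealed, key: Buffer): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    key,
    Buffer.from(sealed.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(sealed.data, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```

GCM is an authenticated mode, so a tampered token record fails at `decipher.final()` rather than decrypting to garbage; the IV must never be reused with the same key.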

It’s strictly for educational purposes and personal research to bypass the friction of testing/prototyping. The entire project is MIT licensed.

I’m currently running it with my own side projects and it handles heavy agent tasks flawlessly. I would love any feedback on the load balancing logic, security implementations, or just general thoughts!

Similar Projects