OpenGem – Free, self-healing load-balanced proxy for Google Gemini API
Reverse-engineers free Gemini API; smart quota rotation, but against Google's terms of service.
Free, Open-Source AI API Gateway with Gemini, OpenAI & Anthropic Compatibility in 1 file
Reverse-engineered Gemini auth pooling free accounts—violates ToS, unsustainable.
Developers building AI agents and LLM applications on limited budgets
Unkey · LiteLLM · Anthropic API gateway
GitHub: https://github.com/arifozgun/OpenGem
The Context: Like many developers, I was constantly hitting "429 Quota Exceeded" errors while building AI agents and processing large payloads on free tiers. I wanted to build freely without calculating API costs for every test request.
How it works: I reverse-engineered the official Gemini CLI authentication to get standard API access. However, a single free Google account's quota depletes quickly. To solve this, I built a Smart Load Balancer at the core of OpenGem.
What it does:
- You connect multiple idle/free Google accounts to the dashboard via OAuth.
- OpenGem acts as a standard endpoint (`POST /v1beta/models/{model}`).
- It routes traffic to the least-used account. If an account hits a real 429 quota limit, OpenGem instantly detects it, puts that account on a 60-minute cooldown, and seamlessly retries with the next available account. It differentiates between simple RPM bursts and actual limits.
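To make the routing behaviour concrete, here is a minimal Python sketch of least-used selection with a 60-minute cooldown and retry on the next account. It is not OpenGem's actual code: `Account`, `LeastUsedBalancer`, `QuotaExhausted`, and the `call_upstream` callback are hypothetical names, and the sketch assumes the callback has already decided that a 429 is a real quota limit rather than a short RPM burst (the post does not describe how that distinction is made).

```python
import time
from dataclasses import dataclass

COOLDOWN_SECONDS = 60 * 60  # the 60-minute cooldown described in the post


@dataclass
class Account:
    """Hypothetical record for one connected Google account."""
    name: str
    requests_served: int = 0
    cooldown_until: float = 0.0  # epoch seconds; 0.0 means available now


class QuotaExhausted(Exception):
    """Hypothetical signal: upstream returned a real quota 429, not an RPM burst."""


class AllAccountsExhausted(Exception):
    """Raised when every account is cooling down."""


class LeastUsedBalancer:
    def __init__(self, accounts, clock=time.time):
        self.accounts = list(accounts)
        self.clock = clock  # injectable so cooldowns can be tested without sleeping

    def pick(self):
        """Return the available account with the fewest served requests, or None."""
        now = self.clock()
        available = [a for a in self.accounts if a.cooldown_until <= now]
        if not available:
            return None
        return min(available, key=lambda a: a.requests_served)

    def send(self, request, call_upstream):
        """Try accounts in least-used order, cooling down any that hit quota."""
        for _ in range(len(self.accounts)):  # each account is tried at most once
            account = self.pick()
            if account is None:
                break
            try:
                response = call_upstream(account, request)
            except QuotaExhausted:
                # Park this account for an hour and retry with the next one.
                account.cooldown_until = self.clock() + COOLDOWN_SECONDS
                continue
            account.requests_served += 1
            return response
        raise AllAccountsExhausted("every account is cooling down")
```

Counting served requests is only one possible meaning of "least-used"; a real deployment might track tokens or a sliding time window instead.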
Tech specs:
- Fully compatible with official Google SDKs (`@google/genai`), LangChain, and standard SSE streaming (no broken [DONE] chunks).
- Supports native "tools" (Function Calling) for agentic workflows.
- Raised payload limit to 50MB for massive contexts.
- AES-256-GCM encryption for all sensitive configs and OAuth tokens at rest.
- Toggle between Firebase Firestore or a fully offline Local JSON database.
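For readers unfamiliar with SSE framing, the following Python sketch shows how a client might read a stream of `data:` events as JSON. It is an illustration only: `iter_sse_json` is a hypothetical helper, the `{"text": ...}` payloads in the usage note are simplified placeholders rather than the real Gemini response shape, and the `[DONE]` check is a defensive assumption for streams that do emit such a sentinel.

```python
import json


def iter_sse_json(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines.

    Hypothetical helper: only the `data:` field is handled; other SSE
    fields (event:, id:, retry:) are ignored.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon is dropped
            data_parts.append(value)
            continue
        if line == "" and data_parts:
            # A blank line ends the event; multi-line data is joined with "\n".
            payload = "\n".join(data_parts)
            data_parts = []
            if payload.strip() == "[DONE]":
                return  # tolerate an end-of-stream sentinel if one appears
            yield json.loads(payload)
    # Lenient flush: accept a final event that lacks its closing blank line.
    if data_parts:
        payload = "\n".join(data_parts)
        if payload.strip() != "[DONE]":
            yield json.loads(payload)
```

Usage might look like `for chunk in iter_sse_json(response_lines): print(chunk["text"])`, where `response_lines` is whatever line iterator your HTTP client exposes and `"text"` stands in for the real field path.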
It’s strictly for educational purposes and personal research to bypass the friction of testing/prototyping. The entire project is MIT licensed.
I’m currently running it with my own side projects and it handles heavy agent tasks flawlessly. I would love any feedback on the load balancing logic, security implementations, or just general thoughts!
Multi-account rotation with cooldowns beats single-account rate limits.
Zero-dependency proxy handles 429s better than writing custom retry logic in your app.
In-process key rotation with state machine simplicity instead of LiteLLM/Redis overhead.
Zero-code sharding proxy with cross-shard aggregates in production, serving millions of QPS today.
Predictive account switching beats waiting for rate-limit errors on multiple Claude subscriptions.