Agent swarm to play ARC AGI games within Claude Code and Codex

by surferbayarea · Mar 20, 2026 · 4 points · 1 comment

AI Analysis

●●● Banger · Rabbit Hole · Crowd Pleaser · Zero to One

Live agent swarm leaderboard for ARC-AGI with no-code prompt strategies.

Strengths
  • Auto-improvement mechanism, inspired by Karpathy's autoresearch, lets agents self-reflect on their performance and refine their strategies.
  • 269 experiments running live with real-time action tracking and level progress.
  • Plain English prompts mean no coding required to enter the competition.
Weaknesses
  • Depends on Claude Code/Codex availability and pricing for actual agent execution.
  • The ARC-AGI competition already has its own established leaderboards and evaluation frameworks.
Target Audience

AI researchers, ARC-AGI competition participants, prompt engineers

Similar To

ARC-AGI Official Leaderboard · LangChain Agent Labs · AutoGen Studio

Post Description

I built an agent swarm platform where anyone can launch an AI agent to play and compete on ARC Prize ARC-AGI-3 games using plain-English strategy prompts, without writing a single line of code.

Just copy-paste a setup prompt (link below) into Claude Code/Codex, add your strategy prompt, and watch a livestream of your agent playing based on your approach and competing with other agents!

I’ve included an auto-improvement mechanism, inspired by Karpathy’s autoresearch, through which your agent reflects on its performance and refines its strategy. You can disable or tweak this mechanism at any time by chatting with your agent in Claude Code/Codex.
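One way to picture this kind of self-reflection loop is a keep-if-better cycle: the agent plays with its current strategy, reflects on the result, proposes a revised strategy, and keeps the revision only if it scores higher. The sketch below is a minimal illustration under assumptions, not the platform's actual mechanism (the post does not say how revisions are accepted). `play` and `reflect` are hypothetical stand-ins for a game run and an LLM reflection step, and the toy scorer and hint list exist only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentState:
    strategy: str                 # current plain-English strategy prompt
    best_score: float             # best score achieved so far
    history: List[Tuple[str, float]] = field(default_factory=list)


def auto_improve(
    strategy: str,
    play: Callable[[str], float],                                  # hypothetical: runs a game, returns a score
    reflect: Callable[[str, float, List[Tuple[str, float]]], str],  # hypothetical: proposes a revised strategy
    rounds: int,
) -> AgentState:
    """Assumed keep-if-better loop: revise, replay, keep the revision only if it scores higher."""
    state = AgentState(strategy=strategy, best_score=play(strategy))
    state.history.append((strategy, state.best_score))
    for _ in range(rounds):
        candidate = reflect(state.strategy, state.best_score, state.history)
        score = play(candidate)
        state.history.append((candidate, score))
        if score > state.best_score:  # otherwise the revision is discarded
            state.strategy, state.best_score = candidate, score
    return state


# Hypothetical toy stand-ins so the loop can run without a real game or LLM.
HINTS = [
    "explore every cell first",
    "undo moves that lose progress",
    "repeat what cleared the last level",
]


def toy_play(strategy: str) -> float:
    # Toy scorer: counts how many of the known-useful hints the strategy mentions.
    return float(sum(hint in strategy for hint in HINTS))


def toy_reflect(strategy: str, best_score: float, history: List[Tuple[str, float]]) -> str:
    # Toy reflection: append the first hint the strategy does not mention yet.
    for hint in HINTS:
        if hint not in strategy:
            return strategy + "; " + hint
    return strategy


if __name__ == "__main__":
    state = auto_improve("click around", toy_play, toy_reflect, rounds=4)
    print(state.best_score)  # → 3.0
    print(state.strategy)
```

Keeping only improvements makes the loop monotone in score, at the cost of never accepting a temporarily worse strategy that might lead somewhere better; a real agent could relax this rule.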

Join the swarm, track your agent on the leaderboard, and compete to find the best approach!

Similar Projects

AI/ML · ●● Solid

Solving ARC AGI 2 with interleaved thinking and stateful IPython REPL

They show a surprisingly large effect: putting models into an interleaved-thinking regime with a stateful IPython REPL yields massive score boosts (>4x on GPT-OSS-120B, double-digit gains up to frontier models). The repo isn't just a paper — it includes pragmatic engineering (a patched vLLM image, ipybox/daytona integration, solver configs) so you can reproduce the results, but expect nontrivial infra setup and API/key requirements.

Wizardry · Niche Gem
steinsgate
20 · 4mo ago
Developer Tools · ●● Solid

Tide Commander – Visual Agents Orchestrator for Claude Code and Codex

Think of an RTS game UI for your coding LLMs: spawn Claude or Codex agents, assign tasks, and watch them produce diffs and file edits in real time on a 3D or 2D canvas. The repo bundles practical developer features — built-in file explorer with git diffs, conversation history, permission controls and a command palette — which turns the spectacle into a usable workflow. It’s delightful and ambitious, but gated by the need for Claude/Codex CLIs and local infra, so expect it to appeal mostly to experimenters rather than plug-and-play users.

Eye Candy · Niche Gem
deivid11
10 · 4mo ago
Developer Tools · ●● Solid

Mimir – Shared memory and inter-agent messaging for Claude Code swarms

Mimir hooks into Claude Code lifecycle events so agents can 'mark' facts (e.g., "API uses snake_case") into a DuckDB-backed memory and RAG pipeline, then auto-injects that context as additionalContext for later agents. It's a pragmatic, well-scoped solution to the annoying problem of agent amnesia — very useful if you run agent swarms, but its impact is limited by Claude Code adoption and the need for the surrounding infra (BGE keys, hooks).

Niche Gem · Ship It
deejaydev
21 · 4mo ago