LLM Colosseum – A daily battle royale between frontier LLMs

by sanifhimani·Feb 25, 2026·2 points·0 comments

AI Analysis

●●● Banger · Rabbit Hole · Wizardry · Crowd Pleaser

Live LLM showdown with emergent strategies that static leaderboards do not capture.

Strengths

• Genuinely novel format for comparing LLM behavior: battle dynamics reveal strategic differences between models that benchmarks miss.
• Full API integration (Anthropic, OpenAI, Google, xAI) with zero scripted outcomes: emergent gameplay is real, not choreographed.
• Daily automated battles with git-backed JSON logs create a living artifact of model personalities over time.

Weaknesses

• Niche entertainment value: cool to watch, but limited practical insight into which model is 'better' for real work.
• No statistical rigor on sample size or decision-making depth; single battle outcomes are anecdotal.

Post Description

I put Claude, GPT, Gemini, and Grok in an arena and let them fight it out. Each model gets the full game state and decides how to survive: move, attack, form alliances, or betray. Every decision comes from the model's API; nothing is scripted.
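To make the "full game state in, one decision out" loop concrete, here is a minimal sketch of how a model's reply might be validated before it is applied. The `GameState` and `Action` shapes, the `parseAction` helper, and the assumption that models answer in JSON are all hypothetical; the repository may structure this quite differently.

```typescript
// Hypothetical shapes for illustration; the project's real schema may differ.
type Fighter = { id: string; x: number; y: number; hp: number; alive: boolean };
type GameState = { turn: number; size: number; fighters: Fighter[] };

type Action =
  | { kind: "move"; dx: number; dy: number }
  | { kind: "attack"; target: string }
  | { kind: "ally"; target: string }
  | { kind: "betray"; target: string }
  | { kind: "wait" };

const WAIT: Action = { kind: "wait" };

// Accept only single-step moves: -1, 0, or 1 on each axis.
const isStep = (v: unknown): v is number => v === -1 || v === 0 || v === 1;

// Turn a model's raw text reply into a validated Action.
// Malformed or illegal replies degrade to "wait" so one bad reply cannot crash a battle.
function parseAction(raw: string, state: GameState, selfId: string): Action {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return WAIT; // reply was not JSON at all
  }
  if (typeof data !== "object" || data === null) return WAIT;

  const d = data as Record<string, unknown>;
  const kind = d.kind;

  if (kind === "move") {
    const dx = d.dx;
    const dy = d.dy;
    return isStep(dx) && isStep(dy) ? { kind: "move", dx, dy } : WAIT;
  }

  if (kind === "attack" || kind === "ally" || kind === "betray") {
    const target = d.target;
    if (typeof target !== "string" || target === selfId) return WAIT;
    // The target must be a living fighter in the current state.
    const exists = state.fighters.some((f) => f.id === target && f.alive);
    return exists ? { kind, target } : WAIT;
  }

  return WAIT;
}
```

Falling back to `wait` is one possible design choice: it keeps the battle moving without inventing a decision on the model's behalf, so outcomes still come only from what the models actually returned.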

First battle ran today. Gemini won by allying with GPT early, then backstabbing at the perfect moment. Claude tried to play it safe and got eliminated. They play very differently and it's fun to watch.

The stack is React + Canvas on the frontend and Bun + Hono on the backend. There is no database: battle data is JSON committed to git. Each model talks through its native SDK (Anthropic, OpenAI, Google, xAI). A new battle runs automatically every day.
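As a rough illustration of the "JSON committed to git" approach, the sketch below builds a per-day file path and serializes a battle log with stable formatting. The `BattleLog` shape, the `battles/` directory, and both helper names are assumptions made for this example, not taken from the repository.

```typescript
// Hypothetical log shape; field names are illustrative only.
type BattleEvent = { turn: number; actor: string; action: string; target?: string };
type BattleLog = { date: string; winner: string; events: BattleEvent[] };

// One file per day, e.g. battles/2026-02-25.json (the directory name is assumed).
function battleLogPath(date: Date, dir = "battles"): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `${dir}/${day}.json`;
}

// Fixed 2-space indentation and a trailing newline keep git diffs small and readable.
function serializeBattle(log: BattleLog): string {
  return JSON.stringify(log, null, 2) + "\n";
}
```

Under Bun, the resulting string could be written with `Bun.write(path, text)` and then committed by the daily job. One file per day means each battle is added as a new file, so git history doubles as the battle history.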

Source: https://github.com/sanifhimani/llm-colosseum