GitHub Repository

A 1v1 Bomberman-style game where two LLM agents play autonomously against each other. No human plays — you watch the AIs fight. Each agent receives a text description of the board state, reasons about it, and outputs a move as JSON. The game engine executes it.

12 starsPython

A Bomberman-style 1v1 game where LLMs compete in real time

Name: A Bomberman-style 1v1 game where LLMs compete in real time
Availability: InStock
Author: sunandsurf

by sunandsurf·Apr 14, 2026·2 points·2 comments

Visit Project View on HN

AI Analysis

●●SolidNiche GemBig Brain

Real-time LLM vs LLM combat creates genuine speed-vs-reasoning tradeoffs ARC-AGI doesn't capture.

Strengths

•Text-based state harness avoids slow visual processing bottlenecks models still struggle with.
•Strategic game design forces tradeoffs between move speed and reasoning quality.
•OpenRouter integration means any model can compete without code changes.

Weaknesses

•Very early stage with zero stars and no benchmarking methodology documented.
•Limited to OpenRouter models; no local model support for offline testing.

Post Description

A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments.

I'm a big fan of these kinds of benchmarks as IMO they reveal so much more about the capabilities and limits of agentic AI than static Q&A benchmarks. They are also more intuitive to understand when you are able to actually see how the model behaves in these environments.

I wanted to build something in that spirit, but with an environment that pits two LLMs against each other. My criteria were:

1. Strategic & Real-time. The game had to create genuine tradeoffs between speed and quality of reasoning. Smaller models can make more moves but less strategic ones; larger models move slower but smarter. 2. Good harness. I deliberately avoided visual inputs — models are still too slow and not accurate enough with them (see: Claude playing Pokémon). Instead, a harness translates the game state into structured text, and the game engine renders the agents' responses as fluid animations. 3. Fun to watch. Because benchmarks don't need to be dry bread :) The end result is a Bomberman-style 1v1 game where two agents compete by destroying bricks and trying to bomb each other. You can check a demo video here: https://youtu.be/4x8tVypmuRk

Would love to hear what you think!

Similar Projects

AI/ML●●●Banger

A real-time strategy game that AI agents can play

Screeps-style RTS where LLMs code their way to victory, real iterative learning.

Big BrainWizardryRabbit Hole

__cayenne__

220783mo ago

AI/ML●●Solid

A real-time strategy game that AI agents can play

Having models emit runnable strategy code and then observe five rounds of iterative adaptation is a clever, low-abstraction way to test in-context learning and agentic behavior. The Screeps-style API plus per-frame runtime limits (1s/frame, 2,000 frames) forces practical engineering trade-offs, but the setup will be gated by compute cost and careful reproducibility choices.

WizardryBig BrainNiche Gem

__cayenne__

414mo ago

Gaming●●Solid

1v1 coding game that LLMs struggle with

LLMs can code bots but can't strategize—reveals blindspot in AI game-playing ability.

Niche GemWizardry

levmiseri

2983mo ago

Gaming●Mid

LLMs playing Poker, build your own bot or hook it up to an LLM and join

LLMs playing poker live is entertaining, but it's a novelty demo without depth or staying power for serious users.

Crowd PleaserShip It

ericlmtn

404mo ago

AI/ML●●●Banger

Rogue-Bench – LLMs play the game Rogue

Using 1980s Rogue as an LLM benchmark is genuinely novel and technically clever.

WizardryZero to One

iwhalen

1024d ago

AI/ML●●Solid

NetHack agent harness with benchmarks and livestream

You can watch an LLM play NetHack step-by-step with the model's reasoning, the exact action code, and a live game canvas — that instrumentation is the product's real selling point. The leaderboard + run/benchmark framing makes it useful for comparing agents rather than just a flashy demo, but it's still squarely for people who care about NetHack or agent evaluation; more detail on reproducible metrics and integrations would push it further.

Niche GemWizardry

kenforthewin

114mo ago