Detect any object in satellite imagery using a text prompt
Tile-based VLM inference with coordinate projection, but dense objects still need YOLO.

VLM-based satellite detection sounds good until you remember that YOLO and other specialized models handle occlusion better.
Geospatial analysts, environmental researchers, urban planners, commercial imagery users
Planet Labs API · Esri ArcGIS · OpenCV + YOLO pipelines
Pipeline: select an area and zoom level, split the region into Web Mercator tiles (mercantile), run each tile with the prompt through a VLM, convert the predicted bounding boxes to geographic coordinates (WGS84), and render the results back on the map.
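The tiling and coordinate-projection steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the standard XYZ (slippy map) tile scheme, 256-pixel tiles, and bounding boxes given as pixel coordinates with the origin at the tile's top-left corner. The function names are hypothetical, and the tile math is written in plain Python rather than through the mercantile library so the example stays self-contained.

```python
import math

TILE_SIZE = 256  # assumption: tile edge length in pixels


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (x, y, zoom) tiles covering a WGS84 bounding box."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [(x, y, zoom) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def pixel_to_lonlat(tile_x, tile_y, zoom, px, py, tile_size=TILE_SIZE):
    """Convert a pixel position inside a tile to (lon, lat) in WGS84."""
    n = 2 ** zoom
    fx = tile_x + px / tile_size  # fractional tile column
    fy = tile_y + py / tile_size  # fractional tile row
    lon = fx / n * 360.0 - 180.0
    # Inverse Web Mercator: latitude is not linear in the pixel row.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * fy / n))))
    return lon, lat


def bbox_to_wgs84(tile_x, tile_y, zoom, box, tile_size=TILE_SIZE):
    """Project a pixel box (x_min, y_min, x_max, y_max) to (west, south, east, north)."""
    x_min, y_min, x_max, y_max = box
    west, north = pixel_to_lonlat(tile_x, tile_y, zoom, x_min, y_min, tile_size)
    east, south = pixel_to_lonlat(tile_x, tile_y, zoom, x_max, y_max, tile_size)
    return west, south, east, north
```

In a pipeline like the one described, each tile from tiles_for_bbox would be fetched and passed to the VLM with the prompt, and every box the model returns for that tile would go through bbox_to_wgs84 before rendering. If the model reports normalized coordinates instead of pixels, they would need to be scaled by the tile size first.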
It works reasonably well for distinct structures in a zero-shot setting. Dense or occluded objects are still better handled by specialized detectors such as YOLO models.
There is a public demo, and no login is required. I am mainly interested in feedback on detection quality, performance tradeoffs between VLMs and specialized detectors, and potential real-world use cases.
Open-vocabulary object detection exporting to YOLO format without login requirements.
AI object detection for text placement, but Canva and Adobe Express already do this better.
GeoGuessr clone with satellite imagery—10k players but no real differentiation.
GPU-accelerated 30K object rendering is impressive, but the space tracking category already has Heavens-Above and N2YO.
Yet another prompt enhancer when PromptPerfect and dozens of AI wrappers already exist.