Claude Code for Mobile GUI Automation

Author: UgOrange

by UgOrange · Feb 19, 2026


AI Analysis


The Take

Separating the planner (Claude Code/Codex) from an orchestrator/Skill layer that handles retries, rollback, and stateful sessions is the project's best idea: it directly targets the brittleness of long LLM-driven GUI workflows. The repo provides practical pieces (a CLI, an install script, a direct coordinate-tap mode, and unified JSON outputs), but it is early and niche. It is useful if you are building LLM-controlled phone automation and less interesting for general automation users.

Post Description

Phone GUI agents (e.g., AutoGLM-Phone, GELab) can already handle natural-language-driven taps, navigation, and form filling. My observation: smaller GUI models (often in the 4B/9B parameter class) work well for single interactions but become brittle on long workflows that involve branching and recovery.

I built a Skill layer that separates planning from execution:

- Planner: Claude Code / Codex (task decomposition, decision-making, replanning)
- Orchestrator: Skill layer (state machine, retries/rollback, tool protocol)
- Executor: phone GUI model (screen parsing + UI actions + cross-app execution)
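A minimal sketch of the Orchestrator/Executor boundary, assuming a Python implementation. `Action`, `Observation`, `Executor`, `Orchestrator`, `FlakyExecutor`, `max_retries`, and the "press back once per completed action" rollback are all hypothetical names and choices for illustration, not the project's actual API. The orchestrator retries a failed action a bounded number of times and, if it still fails, undoes the actions that had succeeded; planning stays entirely outside this layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Action:
    """One atomic UI action (hypothetical shape), e.g. kind="tap", args={"x": 540, "y": 1200}."""
    kind: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the executor reports after running an action."""
    ok: bool
    screen: str  # stand-in for a screenshot or parsed UI state


class Executor(Protocol):
    """Phone GUI model: parses the screen and performs a single UI action."""
    def run(self, action: Action) -> Observation: ...


class Orchestrator:
    """Skill layer sketch: bounded retries per action, rollback when retries run out."""

    def __init__(self, executor: Executor, max_retries: int = 2):
        self.executor = executor
        self.max_retries = max_retries
        self.history: list[Action] = []  # successfully completed actions, for rollback

    def step(self, action: Action) -> Observation:
        obs = Observation(ok=False, screen="")
        for _ in range(1 + self.max_retries):  # first attempt plus retries
            obs = self.executor.run(action)
            if obs.ok:
                self.history.append(action)
                return obs
        self.rollback()  # every attempt failed: undo earlier progress
        return obs

    def rollback(self) -> None:
        # Assumption: one "back" press undoes one completed action.
        while self.history:
            self.history.pop()
            self.executor.run(Action("back"))


class FlakyExecutor:
    """Fake executor that fails the next `failures` calls, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls: list[str] = []

    def run(self, action: Action) -> Observation:
        self.calls.append(action.kind)
        if self.failures > 0:
            self.failures -= 1
            return Observation(ok=False, screen="error dialog")
        return Observation(ok=True, screen=f"after {action.kind}")


# Usage: the first tap fails once and the retry succeeds.
phone = FlakyExecutor(failures=1)
orch = Orchestrator(phone, max_retries=2)
result = orch.step(Action("tap", {"x": 540, "y": 1200}))
print(result.ok, phone.calls)  # True ['tap', 'tap']
```

Rolling back with one "back" press per completed action is a deliberate simplification; real apps may need app-specific undo steps, which is the kind of policy this layer exists to hold.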

Execution loop:

1. Goal is given in natural language or as a template
2. Planner emits a step plan with conditions and a fallback strategy
3. Skill compiles the plan into atomic actions (tap/type/swipe/wait/verify)
4. GUI executor runs on a real or cloud phone and returns screenshots/state/structured output
5. Planner/orchestrator decide the next step, looping until success or fallback

Potential use cases:

- recruiting outreach automation
- multi-platform content distribution
- social outreach workflows
- lead extraction
- competitor monitoring
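The five-step execution loop described earlier can be sketched as follows. Everything here is hypothetical: `Step`, `compile_step`, `run_goal`, `FakePhone`, the intent strings (`open_app:<name>`, `search:<query>`), and the substring-based success check are illustrative assumptions, not the project's protocol. A real executor would return screenshots or structured state from a phone rather than a text string.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass

AtomicAction = tuple[str, str]  # (kind, argument); kind is tap/type/swipe/wait/verify


@dataclass
class Step:
    """One planner step: an intent, a success condition, and an optional fallback."""
    intent: str                   # hypothetical format, e.g. "open_app:Settings"
    expect: str                   # text that must be on screen once the step is done
    fallback: Step | None = None  # step to try instead if `expect` is not met


def compile_step(step: Step) -> list[AtomicAction]:
    """Skill layer: expand a step intent into atomic actions, ending with a verify."""
    verb, _, arg = step.intent.partition(":")
    if verb == "open_app":
        actions = [("tap", arg), ("wait", "1s")]
    elif verb == "search":
        actions = [("tap", "search_box"), ("type", arg)]
    else:
        raise ValueError(f"unknown intent: {step.intent}")
    return actions + [("verify", step.expect)]


def run_goal(plan: list[Step], execute: Callable[[AtomicAction], str],
             max_steps: int = 10) -> str:
    """Orchestrator loop: run steps in order, take fallbacks, stop on success or failure."""
    queue = list(plan)
    steps_run = 0
    while queue:
        if steps_run >= max_steps:
            return "gave_up"          # guard against endless fallback chains
        step = queue.pop(0)
        steps_run += 1
        screen = ""
        for action in compile_step(step):
            screen = execute(action)  # executor returns the resulting screen state
        if step.expect in screen:
            continue                  # condition met: move on to the next planned step
        if step.fallback is None:
            return "failed"           # a real planner could replan here instead
        queue.insert(0, step.fallback)
    return "success"


class FakePhone:
    """Stand-in executor: tracks a text 'screen' instead of driving a real device."""

    def __init__(self, installed: set[str]):
        self.installed = installed
        self.screen = "home"
        self.log: list[AtomicAction] = []

    def __call__(self, action: AtomicAction) -> str:
        self.log.append(action)
        kind, arg = action
        if kind == "tap" and arg in self.installed:
            self.screen = f"{arg} main"
        elif kind == "type":
            self.screen = f"results for {arg}"
        return self.screen


# Usage: Mail is not installed, so the planned fallback (Gmail) is taken.
phone = FakePhone(installed={"Gmail"})
plan = [
    Step("open_app:Mail", "Mail main", fallback=Step("open_app:Gmail", "Gmail main")),
    Step("search:invoice", "results for invoice"),
]
print(run_goal(plan, phone))  # success
```

Keeping the success check and the fallback choice in the orchestrator means the executor only ever performs single actions, which is the regime where, per the observation above, smaller GUI models are reliable.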