GitHub Repository
14 stars·Python

Claude Code for Mobile GUI Automation

by UgOrange·Feb 19, 2026·1 point·0 comments

AI Analysis

●● Solid·Big Brain·Niche Gem
The Take

Splitting the planner (Claude/Codex) from an orchestrator/skill layer that handles retries, rollback, and stateful sessions is the project's best idea: it directly targets the brittleness of long LLM GUI workflows. The repo offers practical pieces (a CLI, an install script, a direct coordinate tap mode, and unified JSON outputs), but it's early and niche. It's useful if you're building LLM-controlled phone automation, less interesting for general automation folks.

Target Audience

Mobile automation engineers, AI/ML researchers building LLM-driven agents, RPA developers, and devs experimenting with LLM+device orchestration

Post Description

Phone GUI agents (e.g., AutoGLM-Phone, GELab) can already do NL-driven taps/navigation/form filling. My observation: smaller GUI models (often 4B/9B class) work well for single interactions, but become brittle on long workflows with branching and recovery.

I built a Skill layer that separates planning from execution:

- Planner: Claude Code / Codex (task decomposition, decision-making, replanning)
- Orchestrator: Skill layer (state machine, retries/rollback, tool protocol)
- Executor: phone GUI model (screen parsing + UI actions + cross-app execution)
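To make the three-role split concrete, here is a minimal Python sketch of how the boundaries could look. It is not taken from the repo: every name (`Step`, `Observation`, `Planner`, `Executor`, `Orchestrator`, and the two stub classes) is hypothetical, and the real Skill layer may draw the interfaces differently.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass
class Step:
    """One planned step (hypothetical shape, not the repo's schema)."""
    instruction: str        # natural-language instruction for the executor
    success_condition: str  # what should be visible once the step worked


@dataclass
class Observation:
    """What the executor reports back after acting."""
    ok: bool
    screen_summary: str = ""


class Planner(Protocol):
    """Role 1: decomposes a goal into steps (e.g. Claude Code / Codex)."""
    def plan(self, goal: str) -> List[Step]: ...


class Executor(Protocol):
    """Role 3: performs one step on the phone (e.g. a GUI model)."""
    def run(self, step: Step) -> Observation: ...


class Orchestrator:
    """Role 2: owns the session state and sequencing between the other two."""

    def __init__(self, planner: Planner, executor: Executor) -> None:
        self.planner = planner
        self.executor = executor

    def run(self, goal: str) -> List[Tuple[Step, Observation]]:
        history: List[Tuple[Step, Observation]] = []
        for step in self.planner.plan(goal):
            obs = self.executor.run(step)
            history.append((step, obs))
            if not obs.ok:
                break  # stop at the first failed step; recovery is out of scope here
        return history


class FixedPlanner:
    """Stub planner that returns a canned two-step plan."""
    def plan(self, goal: str) -> List[Step]:
        return [
            Step("open the app", "home screen visible"),
            Step(f"do: {goal}", "confirmation shown"),
        ]


class AlwaysOkExecutor:
    """Stub executor that pretends every step succeeds."""
    def run(self, step: Step) -> Observation:
        return Observation(ok=True, screen_summary=step.success_condition)
```

Because the planner and executor are only structural interfaces, either side can be swapped (a different LLM, a cloud phone instead of a real one) without touching the orchestrator, which is the point of the separation.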

Execution loop:

1. Goal in NL/template
2. Planner emits step plan + conditions + fallback strategy
3. Skill compiles into atomic actions (tap/type/swipe/wait/verify)
4. GUI executor runs on real/cloud phone, returns screenshots/state/structured output
5. Planner/orchestrator decide next step until success/fallback

Potential use cases:

- recruiting outreach automation
- multi-platform content distribution
- social outreach workflows
- lead extraction
- competitor monitoring
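Steps 3 to 5 of the execution loop described earlier (atomic actions, execution, then retry or fallback) could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: `Action`, `PlanStep`, `run_plan`, and the `execute` callback are hypothetical names, and the callback stands in for the GUI executor by returning `True` when an action succeeded.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The five atomic action kinds named in the post.
ATOMIC_KINDS = {"tap", "type", "swipe", "wait", "verify"}


@dataclass
class Action:
    """One atomic UI action (hypothetical shape)."""
    kind: str
    arg: str = ""

    def __post_init__(self) -> None:
        if self.kind not in ATOMIC_KINDS:
            raise ValueError(f"unknown action kind: {self.kind}")


@dataclass
class PlanStep:
    """A compiled step: the actions to run, plus fallback actions."""
    actions: List[Action]
    fallback: List[Action] = field(default_factory=list)


def run_actions(actions: List[Action], execute: Callable[[Action], bool]) -> bool:
    # all() short-circuits, so nothing runs after the first failed action.
    return all(execute(action) for action in actions)


def run_plan(
    steps: List[PlanStep],
    execute: Callable[[Action], bool],
    max_retries: int = 2,
) -> str:
    """Run steps in order; return "success" or "fallback"."""
    for step in steps:
        for _attempt in range(max_retries + 1):
            if run_actions(step.actions, execute):
                break  # step succeeded, move on to the next one
        else:
            # Retries exhausted: run the fallback actions and stop the workflow.
            run_actions(step.fallback, execute)
            return "fallback"
    return "success"
```

In this sketch a failed step is retried from its first action, which assumes the actions are safe to repeat; a real orchestrator would likely also hand the failure back to the planner for replanning rather than only running a fixed fallback.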
