GitHub Repository

An understudy watches. Then performs.

439 stars · TypeScript

Understudy – Teach a desktop agent by demonstrating a task once

Author: bayes-song

by bayes-song·Mar 12, 2026·120 points·41 comments


AI Analysis

●● Solid · Big Brain · Ship It

Intent extraction beats brittle coordinate macros, but desktop agents are getting crowded fast.

Strengths

• Extracts semantic intent rather than screen coordinates for resilient replays.
• Operates across browsers, terminals, desktop apps, and messaging in one session.
• Local-first runtime keeps data on-device without API dependencies.

Weaknesses

• Only 4 commits on GitHub; very early stage, with unclear production readiness.
• No Windows support mentioned; being macOS-only limits the audience significantly.

Post Description

I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.

Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0

In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, and keeps GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.
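
To make the "intent steps, route options, GUI hints as fallback" idea concrete, here is a minimal TypeScript sketch of what such a skill could look like. This is not Understudy's actual schema, which the post does not show: the names `IntentStep`, `Route`, `ROUTE_RANK`, `pickRoute`, and `planSkill`, and the `api`/`cli`/`gui` route kinds, are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical shapes: the post does not publish Understudy's real skill format.
type RouteKind = "api" | "cli" | "gui";

interface Route {
  kind: RouteKind;
  available: boolean; // whether this route can be used on the current machine
  guiHint?: string;   // free-text hint for GUI routes, used only as a last resort
}

interface IntentStep {
  intent: string;     // what the step should achieve, not where to click
  routes: Route[];
}

// Lower rank is preferred. GUI is ranked last so it acts as the fallback.
const ROUTE_RANK: Record<RouteKind, number> = { api: 0, cli: 1, gui: 2 };

// Pick the fastest available route for one step, or undefined if none is usable.
function pickRoute(step: IntentStep): Route | undefined {
  return step.routes
    .filter((r) => r.available)
    .sort((a, b) => ROUTE_RANK[a.kind] - ROUTE_RANK[b.kind])[0];
}

// Turn a skill into a human-readable plan, one line per intent step.
function planSkill(steps: IntentStep[]): string[] {
  return steps.map((step) => {
    const route = pickRoute(step);
    return route
      ? `${step.intent} via ${route.kind}`
      : `${step.intent}: no route available`;
  });
}

// Illustrative skill loosely modeled on the demo described above.
const removeBackgroundSkill: IntentStep[] = [
  {
    intent: "download a photo of the subject",
    routes: [
      { kind: "cli", available: true },
      { kind: "gui", available: true, guiHint: "image search results, first thumbnail" },
    ],
  },
  {
    intent: "remove the background",
    routes: [{ kind: "gui", available: true, guiHint: "Pixelmator Pro, Remove Background" }],
  },
  {
    intent: "send the exported image in Telegram",
    routes: [
      { kind: "api", available: false },
      { kind: "gui", available: true, guiHint: "Telegram chat window, attach file" },
    ],
  },
];

const plan = planSkill(removeBackgroundSkill);
console.log(plan.join("\n"));
```

Because each step names a goal rather than a screen position, the same skill can be replayed for a different subject, and a step only drops to its GUI hint when no faster route is marked available.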

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

npm install -g @understudy-ai/understudy
understudy wizard

GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.

Similar Projects

Design · ○ Pass

Change multiple parts of an image at once with annotations tool [video]

YouTube tutorial without a product link — can't actually try the tool.

julienreszka

2017d ago

AI/ML · ● Mid

A whiteboard for your AI coding agent [video]

Just a YouTube demo of a whiteboard feature with no code or product to try.

Ship It

kirby88

101mo ago

Developer Tools · ●● Solid

Logic gates as persistent stateful tasks – a BCD decoder built on a VM

Logic gates as stateful bytecode tasks—elegant model, but narrow use case.

Wizardry · Big Brain · Niche Gem

tracyspacy

203mo ago

AI/ML · ● Mid

OpenSkynet Your AI Terminator

Browser automation agent when BrowserUse and MultiOn already exist.

Bold Bet

jasonEinstien

5020d ago

Security · ●●● Banger

An interactive map of hidden AI dev agent action paths

Visualizes the exact four-step path where AI code assistance becomes action authority.

Big Brain · Bold Bet

davidresilify

101mo ago

Developer Tools · ●● Solid

Self-Integrating AI Agent

This repo actually wires an OpenCode agent to Membrane so the agent can find existing connectors and synthesize missing ones on the fly — intent becomes action, not just a toy prompt example. It ships a runnable Next.js UI and clear quick-start steps, which makes the idea tangible fast; what I'd like to see next are security notes, more examples of complex connector synthesis, and tests that prove the approach scales beyond demos.

Bold Bet · Big Brain

hcle25

104mo ago