I fit a 9-agent LLM pipeline into 1.5GB of RAM on iOS

by TheCosmicStage·Mar 5, 2026·2 points·0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Ship It

ExecuTorch ahead-of-time compilation and 4-bit quantization fit a 9-agent LLM pipeline into 1.5GB of RAM on iOS; speculative decoding adds speed.

Strengths
  • Blackboard pattern decouples multi-agent reasoning without sequential context degradation, solving a real architectural problem.
  • Ahead-of-time compilation of PyTorch graphs to .pte binaries removes wrapper overhead, and speculative decoding delivers a rigorously measured 2.2-3.6x speedup.
  • Tiered model strategy (1B/3B/11B) with an identical architecture across hardware tiers: thoughtful, constraint-driven design that balances capability against device limits.
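
The speculative decoding mentioned above can be illustrated with a toy sketch. The post does not say how the author implemented it, so this is the simplest greedy variant under stated assumptions: `toy_target` and `toy_draft` are hypothetical stand-ins for a large and a small model, each mapping a token context to the next token. The draft proposes `k` tokens, the target checks them, and the longest agreeing prefix is kept along with one token from the target.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": takes the token context, returns the next token.
NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Stand-in for the large, slow model."""
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[int]) -> int:
    """Stand-in for the small, fast model; deliberately wrong on some contexts."""
    return toy_target(ctx) if ctx[-1] % 4 else 0


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(draft: NextToken, target: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify rounds)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed position. A real system scores
        #    all k positions in one batched forward pass; this loop only
        #    shows the accept/reject logic.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected == tok:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's correction, stop
                break
        else:
            # Every proposal matched, so the target contributes a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], rounds
```

Because every kept token is one the target would have chosen anyway, the output matches plain greedy decoding with the target; any speedup comes from needing fewer target rounds than tokens. Sampling-based variants use a probabilistic acceptance rule instead of the exact match shown here.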
Weaknesses
  • Pre-release tech spec with no live demo, ship date, or user testing—vaporware risk outweighs the architectural innovation.
  • Whisper voice input and biometrics are promised but incomplete; the shipping timeline is unclear, and critical journaling features (export, sync, backup) are missing.
Category
Target Audience

Mobile developers, AI/ML engineers interested in on-device inference

Similar To

Ollama · llama.cpp · MLX (Apple Silicon)

Post Description

"Hey HN. I've been building a completely offline AI journal. The biggest hurdle was the memory footprint of running multiple agent personas. I ended up bypassing standard wrappers and using Meta's ExecuTorch to compile the PyTorch graphs ahead-of-time for the Apple Neural Engine, plus 4-bit quantization. Happy to answer any questions about the CoreML backend or managing the 'Blackboard' state object for the agents without killing the battery."
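
The 'Blackboard' state object is not shown in the post, so the following is only a minimal sketch of the general pattern, not the author's design. The `Blackboard` class, the `run_agents` helper, and the three agents are hypothetical names, and the agents are plain functions standing in for model calls. The point it illustrates: every agent reads the same shared state and posts a note to it, rather than receiving the previous agent's output as its only context.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Blackboard:
    """Shared state: the original journal entry plus one note per agent."""
    entry: str
    notes: Dict[str, str] = field(default_factory=dict)

    def post(self, agent_name: str, note: str) -> None:
        self.notes[agent_name] = note


# Hypothetical agent signature: read the board, return a note.
Agent = Callable[[Blackboard], str]


def length_agent(board: Blackboard) -> str:
    return str(len(board.entry.split()))


def summary_agent(board: Blackboard) -> str:
    # Reads the original entry directly, not another agent's paraphrase.
    return board.entry.split(".")[0].strip()


def synthesis_agent(board: Blackboard) -> str:
    # Combines notes that earlier agents posted to the board.
    return f"{board.notes['summary']} ({board.notes['length']} words)"


def run_agents(board: Blackboard, agents: Dict[str, Agent]) -> Blackboard:
    """Run agents in insertion order; each posts its note back to the board."""
    for name, agent in agents.items():
        board.post(name, agent(board))
    return board


board = run_agents(
    Blackboard("Slept badly. Work went fine."),
    {"length": length_agent, "summary": summary_agent, "synthesis": synthesis_agent},
)
```

Keeping one compact state object also means only one model needs the board in its prompt at a time, which is one plausible way such a design could help with memory, though the post does not confirm this.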

Similar Projects

AI/ML · ●● Solid

LLM-use – cost-effective LLM orchestrator for agents

Smart local‑first routing that only escalates to expensive cloud planners when necessary is the standout idea — combined with per‑run cost accounting and full Ollama offline support it solves a real operational itch. The repo is a pragmatic, CLI/TUI-focused toolkit (scraping + cache, MCP server mode) that feels useful for teams wanting a no‑friction orchestrator, but it’s playing in a crowded space of agent frameworks so the novelty is incremental rather than revolutionary.
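
The local-first routing idea described above can be sketched roughly as follows. This is not LLM-use's actual API: the `Router` class, the confidence score, the threshold, and the per-call cost are all assumptions made for illustration, and both "models" are stub functions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Router:
    """Try the local model first; escalate to the cloud only when unsure."""
    local: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in 0..1)
    cloud: Callable[[str], str]
    cloud_cost_per_call: float                 # assumed flat price per cloud call
    threshold: float = 0.7                     # assumed confidence cut-off
    spent: float = 0.0                         # per-run cost accounting
    escalations: int = 0

    def ask(self, prompt: str) -> str:
        answer, confidence = self.local(prompt)
        if confidence >= self.threshold:
            return answer                      # free: stayed on the local model
        self.escalations += 1
        self.spent += self.cloud_cost_per_call
        return self.cloud(prompt)


def stub_local(prompt: str) -> Tuple[str, float]:
    # Stub: pretend the local model is only confident on short prompts.
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.3
    return f"local:{prompt}", confidence


def stub_cloud(prompt: str) -> str:
    return f"cloud:{prompt}"


router = Router(local=stub_local, cloud=stub_cloud, cloud_cost_per_call=0.02)
short_answer = router.ask("summarize this note")
long_answer = router.ask("plan a multi step refactor across the whole repository")
```

After the two calls, `router.spent` and `router.escalations` give the kind of per-run accounting the blurb describes.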

Niche Gem · Big Brain
justvugg
214mo ago
AI/ML · ●● Solid

Memex – A local-first AI journal that keeps everything as Markdown

A local-first AI journal with a multi-agent architecture, in a space where most competitors store everything in the cloud.

Dark Horse · Solve My Problem
sparkleMing
1020d ago