GitHub Repository

Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

29 stars · Swift

iPhone ANE holds LLM tok/s while MLX and LiteRT thermal-throttle

by mlboy·Jun 4, 2026·1 point·0 comments

AI Analysis

●●● Banger · Dark Horse · Big Brain · Solve My Problem

LiteRT beats MLX on Gemma memory while CoreML sips power on the Neural Engine.

Strengths

• Automated `devicectl` headless mode removes manual testing friction on iOS devices.
• Compares Google LiteRT against Apple MLX and CoreML on mobile hardware.
• Reveals Neural Engine memory efficiency versus GPU throughput tradeoffs clearly.

Weaknesses

• "iPhone 17 Pro" label raises eyebrows since the device doesn't publicly exist.
• Limited model coverage favors Gemma and Qwen; broader architecture testing is needed.

Category

Target Audience

iOS AI developers, Edge ML engineers

Similar To

MLC Bench · Llama.cpp Benchmarks · Perfetto

Similar Projects

AI/ML · ●● Solid

Running Gemma 4 on an iPhone 13 Pro

Clean Swift wrapper for Gemma 4 with vision and audio on iPhone.

Niche Gem · Ship It

dengjiuhong

102mo ago

AI/ML · ●●● Banger

A tiny C program where an LLM rewires its DAG while running

LLM mutates the workflow DAG mid-run via a constrained four-verb grammar.

Big Brain · Wizardry · Zero to One

mrkn1

1541mo ago

Developer Tools · ●● Solid

Slopsome – a VRAM fit calculator and tok/s database for local LLMs

VRAM calculator with crowd-sourced tok/s benchmarks when model cards already exist.

Niche Gem · Solve My Problem

NexAIGuy

307d ago

AI/ML · ●● Solid

I ran Qwen3.5 35B on my iPhone at 5.6 tok/SEC

Runs 19.5GB Qwen3.5 on 12GB RAM iPhone via memory swapping.

Wizardry · Bold Bet

alexintosh

423mo ago

AI/ML · ●●● Banger

We've built a standalone Apple Watch app running LLMs offline, locally

First standalone Apple Watch LLM app running 700M-1.7B models completely offline.

Wizardry · Zero to One · Dark Horse

pielouNW

3017d ago

AI/ML · ●●● Banger

I wrote an LLM inference engine in pure Go – 48 tok/s zero dependencies

Pure Go LLM inference with zero dependencies at 48 tok/s; genuinely novel for the Go ecosystem.

Zero to One · Wizardry · Big Brain

computerex

203mo ago