3.125-Bit LLM quantization bypassing tensor cores

by dmaniss · May 21, 2026 · 3 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Bold Bet

Replaces Tensor Core matrix multiplies with lookup tables (LUTs) and bitwise operations for 3-bit inference on edge devices.
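
To make the one-line summary concrete, here is a minimal Python sketch of a lookup-table dot product over packed 3-bit codes. The listing does not publish the project's kernel, so everything here is an assumption for illustration: the 8-entry `LUT` values, the little-endian packing, the per-group `scale`, and the helper names `pack_codes`, `unpack_codes`, and `lut_dot` are all hypothetical.

```python
from typing import List, Sequence

BITS = 3                    # bits stored per weight code
MASK = (1 << BITS) - 1      # 0b111, selects one code after shifting

# Hypothetical 8-entry codebook: one real value per 3-bit code.
LUT = [-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]


def pack_codes(codes: Sequence[int]) -> bytes:
    """Pack 3-bit codes back to back (little-endian) into bytes."""
    acc = 0
    for i, code in enumerate(codes):
        if not 0 <= code <= MASK:
            raise ValueError(f"code {code} does not fit in {BITS} bits")
        acc |= code << (BITS * i)
    n_bytes = (BITS * len(codes) + 7) // 8
    return acc.to_bytes(n_bytes, "little")


def unpack_codes(packed: bytes, count: int) -> List[int]:
    """Recover `count` codes using only shifts and masks."""
    acc = int.from_bytes(packed, "little")
    return [(acc >> (BITS * i)) & MASK for i in range(count)]


def lut_dot(packed: bytes, count: int, scale: float, x: Sequence[float]) -> float:
    """Dot product of LUT-decoded weights with activations x."""
    codes = unpack_codes(packed, count)
    return scale * sum(LUT[c] * xi for c, xi in zip(codes, x))


codes = [0, 7, 3, 4, 1, 6, 2, 5]
packed = pack_codes(codes)          # 8 codes * 3 bits = 24 bits = 3 bytes
one_hot = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
result = lut_dot(packed, len(codes), 2.0, one_hot)  # 2.0 * LUT[0] = -2.0
```

Eight weights occupy three bytes here, and decoding needs no multiply beyond the final scale. A production kernel would presumably vectorize the unpacking on the GPU; this sketch only shows the shift, mask, and lookup steps.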

Strengths

• Targets memory bandwidth rather than compute; bandwidth is the real constraint at batch size 1.
• Data-free quantization needs no calibration dataset, which simplifies deployment.
• Challenges the industry assumption that more Tensor Cores are the only path forward.
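
The first two strengths can be illustrated with some back-of-the-envelope Python. This is a sketch under stated assumptions, not the project's method: the listing does not say how 3.125 bits arises, so the split into 3-bit codes plus one 16-bit scale per 128 weights is a guess at one common layout. The round-to-nearest quantizer is a generic stand-in for "data-free", and the 8B parameter count and 100 GB/s bandwidth are hypothetical numbers.

```python
from typing import List, Sequence, Tuple


def effective_bits_per_weight(code_bits: int, group_size: int, scale_bits: int) -> float:
    """Code bits plus the per-group scale amortized over the group."""
    return code_bits + scale_bits / group_size


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes that must be read from memory to touch every weight once."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(n_params: int, bits_per_weight: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound at batch size 1, where each token reads all weights."""
    return bandwidth_bytes_per_s / weight_bytes(n_params, bits_per_weight)


def quantize_group_data_free(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Round-to-nearest onto 8 uniform levels using only the weights
    themselves (no calibration data). Generic stand-in, not the
    project's scheme."""
    scale = max(abs(w) for w in weights) or 1.0
    codes = [round((w / scale + 1.0) / 2.0 * 7) for w in weights]
    return codes, scale


bpw = effective_bits_per_weight(code_bits=3, group_size=128, scale_bits=16)  # 3.125
n_params = 8_000_000_000          # hypothetical 8B-parameter model
bandwidth = 100_000_000_000       # hypothetical 100 GB/s memory bandwidth
fast = max_tokens_per_second(n_params, bpw, bandwidth)   # 32.0
slow = max_tokens_per_second(n_params, 16, bandwidth)    # 6.25 for 16-bit weights
```

Under these assumptions the ceiling on decode speed rises from 6.25 to 32 tokens per second purely because fewer bytes cross the memory bus, independent of how many Tensor Cores sit idle.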

Weaknesses

• No code, benchmarks, or proof-of-concept implementation has been provided yet.
• The claim that reasoning capabilities are preserved needs empirical validation.

Similar Projects

AI/ML · ●●● Banger

Glq LLM quantization using E8 lattice

E8 lattice codebooks beat GPTQ at 2-4 bits per weight (bpw), with a fused CUDA kernel that skips weight materialization.

Wizardry · Big Brain

acd

2013d ago

AI/ML · ●●● Banger

1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs

1-bit weights matching 8B model performance while running 132 tokens/sec on M4 Pro.

Big Brain · Zero to One · Wizardry

PrismML

4301532mo ago

AI/ML · ●●● Banger

Running a 1.7B parameters LLM on an Apple Watch

Runs a 1.7B LLM offline on Apple Watch using 1-bit quantization.

Wizardry · Niche Gem

pielouNW

302mo ago

AI/ML · ●●● Banger

Core Rth. A governed AI kernel for engineers who don't trust their LLMs

Proposal-first governance plus a hardware E-stop for AI controlling robots and drones: a legitimately novel safety architecture.

Big Brain · Bold Bet

christianrth

113mo ago

AI/ML · ● Mid

The Deterministic Core Architecture for AI-Augmented Applications

Interesting diagnosis of AI statelessness, but six artifacts aren't directly accessible.

Big Brain

Brandon_Bell

309d ago

AI/ML · ● Mid

Fixing AI's Core Flaws, A protocol cuts LLM token waste by 40–70%

The repo outlines a concrete seven-layer protocol (SLP, World Model Interpreter, Agent/Persona/Knowledge layers, Metacognition, etc.) and even splits each piece into its own subrepo — that modular breakdown is the repo's strongest move. But this reads more like an ambitious manifesto and design spec than a working system: good docs and diagrams are present, yet there's little visible implementation, benchmarks, or reproducible evidence for the bold claims (like 40–70% token savings).

Bold Bet · Rabbit Hole

WujieGuGavin

103mo ago