TinyVision is an evolving project focused on designing ultra-lightweight image classification models with minimal parameter counts. The goal is to explore what’s actually necessary for fundamental vision tasks by combining handcrafted feature preprocessing with highly efficient CNN architectures.
Ultra-lightweight CNNs reaching up to 86.87% test accuracy on cat vs. dog classification with under 12.5k parameters.
ML researchers, embedded engineers
MobileNet · SqueezeNet
Hello everyone,
I wanted to share my project and get some feedback on it.
Goal: Most image models today are bulky and overkill for basic tasks. This project explores how small we can make image classification models while still keeping them functional by stripping them down to the bare minimum.
Current Progress & Results:
Cat vs Dog Classification: First completed task using a 25,000-image dataset with filter bank preprocessing and compact CNNs.
Achieved up to 86.87% test accuracy with models under 12.5k parameters.
Several models under 5k parameters reached over 83% accuracy, showcasing strong efficiency-performance trade-offs.
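To illustrate what filter bank preprocessing can look like, here is a minimal sketch in plain Python. The post does not say which filters TinyVision uses, so the three 3x3 kernels below (Sobel X, Sobel Y, Laplacian) and the function names are assumptions chosen for illustration, not the project's actual code. The idea is that fixed, handcrafted filters turn one grayscale image into several feature maps before any learned layer sees it.

```python
# Hypothetical filter bank: these kernels are assumptions, not TinyVision's actual filters.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
FILTER_BANK = {"sobel_x": SOBEL_X, "sobel_y": SOBEL_Y, "laplacian": LAPLACIAN}


def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def apply_filter_bank(image, bank=FILTER_BANK):
    """Return one feature map per filter, keyed by filter name."""
    return {name: correlate2d(image, kernel) for name, kernel in bank.items()}


# Toy 5x5 grayscale image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
features = apply_filter_bank(image)

for name, fmap in features.items():
    print(name, fmap)
```

On this toy image only the horizontal-gradient filter and the Laplacian respond, since the edge is vertical. Because the filters are fixed, they add no trainable parameters, which is one plausible reason such preprocessing pairs well with very small CNNs.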
CIFAR-10 Classification: Second completed task using the CIFAR-10 dataset. This approach relies only on compact CNN architectures, without the filter bank preprocessing.
A 22.11k-parameter model achieved 87.38% accuracy.
A 31.15k-parameter model achieved 88.43% accuracy.
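For a sense of how parameter budgets in this range add up, the sketch below counts the trainable parameters of a small CNN by hand. The layer stack is hypothetical and is not one of the TinyVision models; it only shows the arithmetic. A convolution with a k x k kernel has `out_ch * (in_ch * k * k + 1)` parameters (the `+ 1` is the bias), and a fully connected layer has `out_features * (in_features + 1)`. Pooling layers are assumed to have no parameters, and normalization layers are left out for simplicity.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Trainable parameters of a standard 2D convolution with a square kernel."""
    return out_ch * (in_ch * kernel * kernel + (1 if bias else 0))


def linear_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return out_features * (in_features + (1 if bias else 0))


# Hypothetical layer stack (NOT a TinyVision architecture):
# three 3x3 convolutions, global average pooling (no parameters),
# then a 10-way classifier as used for CIFAR-10.
layers = [
    ("conv", 3, 16, 3),
    ("conv", 16, 32, 3),
    ("conv", 32, 32, 3),
    ("linear", 32, 10),
]


def total_params(layer_specs):
    """Sum the parameter counts over a list of layer specs."""
    total = 0
    for spec in layer_specs:
        if spec[0] == "conv":
            _, in_ch, out_ch, kernel = spec
            total += conv2d_params(in_ch, out_ch, kernel)
        elif spec[0] == "linear":
            _, in_f, out_f = spec
            total += linear_params(in_f, out_f)
        else:
            raise ValueError(f"unknown layer type: {spec[0]}")
    return total


print(total_params(layers))  # 448 + 4640 + 9248 + 330 = 14666
```

Even this small stack lands at roughly 14.7k parameters, and most of them sit in the last convolution. That is why channel widths matter far more than depth when trying to stay inside budgets like the ones quoted above.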
All code and experiments are available in my GitHub repository: https://github.com/SaptakBhoumik/TinyVision
I would love for you to check out the project and let me know your feedback!
Also, do leave a star if you find it interesting.