GitHub Repository

AI-enabled insights from emails, calendars, contacts, files, Slack, databases, web... Fast, private and local. Launching soon!

177 stars · Rust

Extract (financial) data from emails with local LLM

by brainless·Mar 10, 2026·1 point·0 comments

AI Analysis

● Mid · Ship It

Local LLM email parsing when Plaid and receipt scanners already exist.

Strengths

• Template-per-sender approach means the LLM runs once per sender, not once per email.
• OS keychain integration stores OAuth tokens with a single prompt on first launch.
• SQLite storage keeps all email data local with no cloud dependencies.

Weaknesses

• Author acknowledges significant extraction bugs are still present in the current version.
• Requires manual OAuth setup with Google, which adds friction for non-technical users.

Post Description

I wanted all my emails (and files) scanned for financial data: transactions and bills (including ones I may not have paid). I wanted this to run entirely locally and not depend on a Large Language Model from a cloud provider.

I started with Google Gemini 3 Flash, then switched to Ollama + Ministral 3:3b. The extraction is not exhaustive and there is much to improve, but it is working.

dwata runs locally: it runs a web backend, and the GUI runs in the browser. It connects to your email accounts and downloads the messages. You can then run financial template detection, which groups similar-looking emails by sender and sends a sample from each cluster to an LLM agent. The LLM is asked to identify the parts of the text that look like the data we are looking for. dwata then searches the email for the variables/values the LLM returned and creates a template by replacing that data with template tags. The template is saved to the DB, and dwata uses it to parse the data out of each email during extraction.
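To make the template idea concrete, here is a minimal sketch in Rust using only the standard library. It is not dwata's actual code: the function names (`group_by_sender`, `build_template`, `extract`), the `{{name}}` tag syntax, and the exact-match strategy are all assumptions for illustration. The LLM call is left out, and its output is represented by plain `(field name, value)` pairs.

```rust
use std::collections::HashMap;

/// Group email bodies by sender, so one sample per sender can go to the LLM.
/// (Hypothetical helper; input is a slice of (sender, body) pairs.)
fn group_by_sender<'a>(emails: &[(&'a str, &'a str)]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut groups: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(sender, body) in emails {
        groups.entry(sender).or_default().push(body);
    }
    groups
}

/// Build a template by replacing the first occurrence of each value the LLM
/// reported with a `{{name}}` tag (the tag syntax is an assumption).
/// Returns None if a value is empty or not found verbatim in the sample.
fn build_template(sample: &str, fields: &[(&str, &str)]) -> Option<String> {
    let mut template = sample.to_string();
    for &(name, value) in fields {
        if value.is_empty() || !template.contains(value) {
            return None;
        }
        template = template.replacen(value, &format!("{{{{{}}}}}", name), 1);
    }
    Some(template)
}

/// Extract values from `email` by walking the template: literal text must
/// match exactly, and each tag captures everything up to the next literal.
/// Simplification: adjacent tags with no literal between them are ambiguous,
/// and the first one captures the rest of the email.
fn extract(template: &str, email: &str) -> Option<HashMap<String, String>> {
    let mut out = HashMap::new();
    let mut rest_t = template;
    let mut rest_e = email;
    while let Some(open) = rest_t.find("{{") {
        // Literal text before the tag must prefix the remaining email.
        rest_e = rest_e.strip_prefix(&rest_t[..open])?;
        let close = rest_t[open..].find("}}")? + open;
        let name = &rest_t[open + 2..close];
        rest_t = &rest_t[close + 2..];
        // The literal that follows this tag marks where the value ends.
        let next_literal = match rest_t.find("{{") {
            Some(i) => &rest_t[..i],
            None => rest_t,
        };
        let end = if next_literal.is_empty() {
            rest_e.len()
        } else {
            rest_e.find(next_literal)?
        };
        out.insert(name.to_string(), rest_e[..end].to_string());
        rest_e = &rest_e[end..];
    }
    // Whatever literal text remains must match the tail of the email.
    if rest_e == rest_t { Some(out) } else { None }
}
```

With a sample such as "Your payment of $42.50 to Acme Corp was received on 2026-03-01." and the three values reported for `amount`, `vendor` and `date`, `build_template` produces "Your payment of {{amount}} to {{vendor}} was received on {{date}}.", and `extract` can then pull the same fields out of later emails from that sender without another LLM call. An exact-match template like this breaks as soon as the sender changes the wording, which is one reason a real extractor needs to be more tolerant.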

Roadmap: There is a long way to go; the extractor needs to work much, much better. dwata will also work on files soon (bank/CC statements).

I also want to extract vendors, businesses, contacts, events, places, etc., connect to different APIs, and process everything locally.

dwata will also be able to download and process data from the Hacker News API (or other similar sources) and extract the entities you care about.

Eventually, dwata should only use Ollama/llama.cpp with models that fit on 6-8GB graphics cards or in 16GB of unified memory!