Giving Claude Code persistent memory with a self-hosted MCP server
Claude memory without token costs, but requires running five services for one feature.
Self-hosted mem0 MCP server for Claude Code. Run a complete memory server against self-hosted Qdrant + Neo4j + Ollama while using Claude as the main LLM.
Claude Code memory server that auto-reads OAT tokens, routes LLM ops to local or cloud models.
Claude Code users who want session-persistent memory; developers running self-hosted AI infrastructure
mem0ai (the library being wrapped) · Continue.dev context caching · Aider session context management
Every Claude Code session starts from zero, no memory of previous sessions. This server uses mem0ai as a library and exposes 11 MCP tools for storing, searching, and managing memories. Qdrant handles vector storage, Ollama runs embeddings locally (bge-m3), and Neo4j optionally builds a knowledge graph.
Some engineering details HN might find interesting:
- Zero-config auth: auto-reads Claude Code's OAT token from ~/.claude/.credentials.json, detects token type (OAT vs API key), and configures the SDK accordingly. No separate API key needed. - Graph LLM ops (3 calls per add_memory) can be routed to Ollama (free/local), Gemini 2.5 Flash Lite (near-free), or a split-model where Gemini handles entity extraction (85.4% accuracy) and Claude handles contradiction detection (100% accuracy).
Python, MIT licensed, one-command install via uvx.
Claude memory without token costs, but requires running five services for one feature.
Runs extraction and search server-side so your local MCP is a tiny HTTP client — no local DBs, no giant RAM leaks, and an easy npx install and .mcp.json or global MCP registration. It exposes clear tools (save_memory, recall_memories, extract_memories, get_project_context) and adds project-scoped + global preferences — a pragmatic fix for Claude Code's tiny flat-file memory. The tradeoff is obvious: usefulness depends on the hosted API (privacy, uptime, cost), and the repo looks early-stage with minimal commits and docs beyond the quickstart.
Local RAG for Claude when Cursor and other editors already handle context.
Local RAG + MCP for Claude with zero external dependencies—elegant constraint execution.
Hierarchical memory that persists across Claude Code, Cursor, and Windsurf—solve context amnesia.
Claude Code memory without Redis—138 lines of diskcache beats 'use a vector DB' conventional wisdom.