GitHub Repository

Privacy middleware for LLM & RAG pipelines - consistent pseudonymization, encrypted vault, SSE streaming rehydration.

32 stars · Rust

CloakPipe – Rust privacy proxy for LLM APIs with pseudonymization

by rohansx · Mar 6, 2026 · 2 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem · Ship It

Consistent pseudonymization preserves semantic structure while hiding entities from API providers.

Strengths

• Solves a real RAG privacy leak (embedding APIs, vector DBs, LLM providers) with an elegant consistent-pseudonymization approach
• Runs as a Rust proxy, so existing applications need no code changes beyond pointing the client at it; ships on crates.io
• Rehydration architecture preserves embeddings and retrieval semantics better than naive redaction

Weaknesses

• Entity detection reliability depends on the underlying NER library; financial and domain-specific entities may be missed
• No benchmarks on latency impact at scale or on the memory overhead of vault management

Post Description

CloakPipe is a small Rust proxy that sits between your application and any OpenAI-compatible API.

It detects sensitive entities in requests, replaces them with consistent pseudonyms, forwards the sanitized request to the LLM provider, then rehydrates the response before returning it to your app.

“Consistent” means the same input always maps to the same token (e.g. "Tata Motors" → "ORG_7"). This preserves semantic structure so embeddings and retrieval still work, while ensuring the API provider never sees the real entity values.
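To make the "same input, same token" idea concrete, here is a minimal sketch of a mapping vault, not CloakPipe's actual code. The `Vault` type, its method names, the per-category counter scheme, and the caller-supplied entity list are all assumptions for illustration; the real vault is encrypted and detection is done by the proxy.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the mapping vault.
#[derive(Default)]
struct Vault {
    forward: HashMap<String, String>, // real value -> pseudonym
    reverse: HashMap<String, String>, // pseudonym -> real value
    counters: HashMap<String, usize>, // per-category counter, e.g. "ORG" -> 2
}

impl Vault {
    /// Return the existing pseudonym for `value`, or mint a new one.
    fn pseudonym(&mut self, category: &str, value: &str) -> String {
        if let Some(token) = self.forward.get(value) {
            return token.clone();
        }
        let n = self.counters.entry(category.to_string()).or_insert(0);
        *n += 1;
        let token = format!("{}_{}", category, n);
        self.forward.insert(value.to_string(), token.clone());
        self.reverse.insert(token.clone(), value.to_string());
        token
    }

    /// Replace each (category, value) entity found in `text` with its pseudonym.
    /// Entity detection itself is out of scope here; the caller supplies the list.
    fn sanitize(&mut self, text: &str, entities: &[(&str, &str)]) -> String {
        let mut out = text.to_string();
        for (category, value) in entities {
            let token = self.pseudonym(category, value);
            out = out.replace(*value, &token);
        }
        out
    }

    /// Swap pseudonyms in a provider response back to the real values.
    fn rehydrate(&self, text: &str) -> String {
        // Longest tokens first, so "ORG_10" is not clobbered by "ORG_1".
        let mut pairs: Vec<(&String, &String)> = self.reverse.iter().collect();
        pairs.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
        let mut out = text.to_string();
        for (token, value) in pairs {
            out = out.replace(token.as_str(), value);
        }
        out
    }
}
```

With this sketch, sanitizing two different requests that both mention "Tata Motors" yields the same `ORG_1` token each time, and `rehydrate` restores the original name in the response.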

The motivation came from looking at typical RAG architectures. A standard pipeline leaks data in multiple places per query:

- Raw document text sent to embedding APIs
- Embeddings stored in cloud vector databases (recent work like Zero2Text shows they can be inverted)
- Query embeddings sent to providers
- Retrieved context sent to LLM generation APIs

Existing approaches tend to fall into three buckets:

- Redaction ([REDACTED]) which destroys semantic meaning and breaks retrieval
- NER-based detection pipelines that add significant latency
- Stateless replacements that break vector search because tokens change between requests

CloakPipe tries to solve this by doing deterministic pseudonymization with a local mapping vault.

Some implementation details:

- Written in Rust as a single binary
- <5 ms overhead per request in testing
- AES-256-GCM encrypted mapping vault with zeroize memory safety
- OpenAI-compatible proxy endpoints (`/v1/chat/completions`, `/v1/embeddings`)
- Streaming response rehydration (handles tokens split across SSE chunks)
- Pattern detection for API keys, JWTs, emails, IPs, financial amounts, fiscal dates
- Custom detection rules via TOML config
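The streaming point is the subtle one: a pseudonym such as `ORG_7` can arrive as `OR`, `G_`, `7` across separate SSE chunks. One way to handle that, sketched here under assumptions rather than taken from the repo, is to hold back any tail of the buffer that could still grow into a known pseudonym. `StreamRehydrator` and its methods are hypothetical names, and the sketch works on already-extracted delta text, not on raw SSE frames.

```rust
use std::collections::HashMap;

/// Hypothetical streaming rehydrator working on plain text deltas.
struct StreamRehydrator {
    reverse: HashMap<String, String>, // pseudonym -> real value
    pending: String,                  // tail that may be an incomplete pseudonym
}

impl StreamRehydrator {
    fn new(reverse: HashMap<String, String>) -> Self {
        Self { reverse, pending: String::new() }
    }

    /// Feed one chunk; returns the text that is safe to emit now.
    fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        self.drain(false)
    }

    /// Call once the stream ends to flush whatever was held back.
    fn finish(&mut self) -> String {
        self.drain(true)
    }

    fn drain(&mut self, end_of_stream: bool) -> String {
        let buf = std::mem::take(&mut self.pending);
        let mut out = String::new();
        let mut i = 0;
        while i < buf.len() {
            let rest = &buf[i..];
            // If everything left is a proper prefix of some pseudonym,
            // wait for more data before deciding.
            let may_grow = self
                .reverse
                .keys()
                .any(|t| t.len() > rest.len() && t.starts_with(rest));
            if may_grow && !end_of_stream {
                self.pending = rest.to_string();
                break;
            }
            // Otherwise take the longest complete pseudonym starting here, if any.
            let hit = self
                .reverse
                .iter()
                .filter(|(t, _)| rest.starts_with(t.as_str()))
                .max_by_key(|(t, _)| t.len());
            match hit {
                Some((token, value)) => {
                    out.push_str(value);
                    i += token.len();
                }
                None => {
                    let ch = rest.chars().next().unwrap();
                    out.push(ch);
                    i += ch.len_utf8();
                }
            }
        }
        out
    }
}
```

Feeding the chunks `"Shares of OR"`, `"G_"`, `"7 rose"` emits `"Shares of "` first, then nothing, then `"Tata Motors rose"`: the client never sees a half-rehydrated token. The cost is a small delay on text that merely looks like the start of a pseudonym.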

It's designed to be drop-in: point your client to the proxy by changing `OPENAI_BASE_URL`.
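From the client side, "drop-in" amounts to resolving the base URL from the environment and leaving everything else alone. The sketch below shows that resolution in isolation; the function names and the proxy address `http://127.0.0.1:8900/v1` used in the note are assumptions, not values taken from the CloakPipe docs.

```rust
use std::env;

/// Upstream used when no override is set.
const DEFAULT_BASE_URL: &str = "https://api.openai.com/v1";

/// Pick the override if present, otherwise the default; drop any trailing slash.
fn resolve_base_url(override_url: Option<&str>) -> String {
    override_url
        .unwrap_or(DEFAULT_BASE_URL)
        .trim_end_matches('/')
        .to_string()
}

/// Build the chat completions endpoint from a resolved base URL.
fn chat_completions_url(base: &str) -> String {
    format!("{}/chat/completions", base)
}

/// Read the override from `OPENAI_BASE_URL`, as most OpenAI-compatible clients do.
fn base_url_from_env() -> String {
    resolve_base_url(env::var("OPENAI_BASE_URL").ok().as_deref())
}
```

With `OPENAI_BASE_URL` set to a local proxy address such as the hypothetical `http://127.0.0.1:8900/v1`, requests go to the proxy's `/v1/chat/completions` endpoint; with the variable unset, the client talks to the provider directly.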

Repo: https://github.com/rohansx/cloakpipe