GitHub Repository

Privacy middleware for LLM & RAG pipelines - consistent pseudonymization, encrypted vault, SSE streaming rehydration.

32 stars · Rust

CloakPipe – Rust privacy proxy for LLM APIs with pseudonymization

by rohansx·Mar 6, 2026·2 points·0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem · Ship It

Consistent pseudonymization preserves semantic structure while hiding entities from API providers.

Strengths
  • Solves a real RAG privacy leak (embedding APIs, vector DBs, LLM providers) with an elegant consistent pseudonymization approach
  • Rust proxy means zero code changes to existing applications; ships on crates.io
  • Rehydration architecture preserves embeddings and retrieval semantics better than naive redaction
Weaknesses
  • Entity detection reliability depends on underlying NER library; financial/domain-specific entities may be missed
  • No benchmarks on latency impact at scale or memory overhead of vault management
Category
Target Audience

Engineers building RAG pipelines and LLM applications with sensitive data

Similar To

Twingate (network-layer privacy) · Zero2Text (embedding inversion attacks) · HashiCorp Vault (secrets management)

Post Description

CloakPipe is a small Rust proxy that sits between your application and any OpenAI-compatible API.

It detects sensitive entities in requests, replaces them with consistent pseudonyms, forwards the sanitized request to the LLM provider, then rehydrates the response before returning it to your app.

“Consistent” means the same input always maps to the same token (e.g. "Tata Motors" → "ORG_7"). This preserves semantic structure so embeddings and retrieval still work, while ensuring the API provider never sees the real entity values.
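The consistency property can be illustrated with a minimal sketch. It assumes an in-memory map standing in for the vault and a simple per-category counter for numbering; the `Vault` type, its method names, and the numbering scheme are hypothetical and are not taken from the CloakPipe source. Encryption is left out entirely.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the mapping vault (hypothetical type; the
/// encryption described in the post is omitted here).
#[derive(Default)]
pub struct Vault {
    forward: HashMap<(String, String), String>, // (category, value) -> token
    reverse: HashMap<String, String>,           // token -> value
    counters: HashMap<String, u32>,             // category -> last index used
}

impl Vault {
    /// Return the token for `value`, minting a new one on first sight.
    /// Repeated calls with the same input return the same token.
    pub fn pseudonymize(&mut self, category: &str, value: &str) -> String {
        let key = (category.to_string(), value.to_string());
        if let Some(token) = self.forward.get(&key) {
            return token.clone();
        }
        // Assumed numbering scheme: a counter per category, starting at 1.
        let n = self.counters.entry(category.to_string()).or_insert(0);
        *n += 1;
        let token = format!("{}_{}", category, n);
        self.forward.insert(key, token.clone());
        self.reverse.insert(token.clone(), value.to_string());
        token
    }

    /// Map a token back to the original value, if the vault knows it.
    pub fn rehydrate(&self, token: &str) -> Option<&str> {
        self.reverse.get(token).map(String::as_str)
    }
}
```

Because the token is looked up before a new one is minted, a document embedded last week and a query sent today see the same pseudonym for the same entity, which is what keeps vector search working.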

The motivation came from looking at typical RAG architectures. A standard pipeline leaks data in multiple places per query:

- Raw document text sent to embedding APIs
- Embeddings stored in cloud vector databases (recent work like Zero2Text shows they can be inverted)
- Query embeddings sent to providers
- Retrieved context sent to LLM generation APIs

Existing approaches tend to fall into three buckets:

- Redaction ([REDACTED]), which destroys semantic meaning and breaks retrieval
- NER-based detection pipelines that add significant latency
- Stateless replacements that break vector search because tokens change between requests

CloakPipe tries to solve this by doing deterministic pseudonymization with a local mapping vault.

Some implementation details:

- Written in Rust as a single binary
- <5ms overhead per request in testing
- AES-256-GCM encrypted mapping vault with zeroize memory safety
- OpenAI-compatible proxy endpoints (`/v1/chat/completions`, `/v1/embeddings`)
- Streaming response rehydration (handles tokens split across SSE chunks)
- Pattern detection for API keys, JWTs, emails, IPs, financial amounts, fiscal dates
- Custom detection rules via TOML config
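The streaming item is the subtle one: a pseudonym such as `ORG_7` can arrive as `OR` in one chunk and `G_7` in the next. One way to handle this is to hold back any trailing run of characters that could still be part of a token. The sketch below assumes tokens consist only of uppercase ASCII letters, digits, and `_`, and that SSE framing and JSON decoding have already happened upstream; `StreamRehydrator` and its methods are hypothetical names, not CloakPipe's API.

```rust
use std::collections::HashMap;

/// Bytes that may appear in a pseudonym such as `ORG_7` (assumed format).
fn is_token_byte(b: u8) -> bool {
    b.is_ascii_uppercase() || b.is_ascii_digit() || b == b'_'
}

/// Replace every maximal run of token bytes that the map knows about.
fn replace_tokens(text: &str, map: &HashMap<String, String>) -> String {
    let bytes = text.as_bytes();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let in_token = is_token_byte(bytes[i]);
        while i < bytes.len() && is_token_byte(bytes[i]) == in_token {
            i += 1;
        }
        // Run boundaries always touch an ASCII byte, so slicing is safe.
        let run = &text[start..i];
        match map.get(run) {
            Some(original) if in_token => out.push_str(original),
            _ => out.push_str(run),
        }
    }
    out
}

/// Buffers streamed text so tokens split across chunks are still restored.
pub struct StreamRehydrator {
    map: HashMap<String, String>, // token -> original value
    pending: String,              // trailing text that may be a partial token
}

impl StreamRehydrator {
    pub fn new(map: HashMap<String, String>) -> Self {
        Self { map, pending: String::new() }
    }

    /// Feed one chunk; returns the text that is safe to emit now.
    pub fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        let bytes = self.pending.as_bytes();
        let mut split = bytes.len();
        // Walk back over the trailing run that might continue in the next chunk.
        while split > 0 && is_token_byte(bytes[split - 1]) {
            split -= 1;
        }
        let tail = self.pending.split_off(split);
        let ready = std::mem::replace(&mut self.pending, tail);
        replace_tokens(&ready, &self.map)
    }

    /// Call at end of stream to flush whatever is still buffered.
    pub fn finish(&mut self) -> String {
        let rest = std::mem::take(&mut self.pending);
        replace_tokens(&rest, &self.map)
    }
}
```

The trade-off in this sketch is a little extra latency: any chunk ending in uppercase letters or digits is held until the next chunk or the end of the stream shows whether it was part of a token.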

It's designed to be drop-in: point your client to the proxy by changing `OPENAI_BASE_URL`.
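On the client side, the switch amounts to resolving the endpoint from a base URL instead of hard-coding the provider. A minimal sketch, assuming the base URL includes the `/v1` prefix (as the official OpenAI SDKs expect) and that the proxy listens on a hypothetical local address; `chat_completions_url` is an illustrative helper, not part of CloakPipe:

```rust
/// Build the chat-completions URL from an optional base URL override.
/// In an application, `override_url` would typically come from
/// `std::env::var("OPENAI_BASE_URL").ok()`.
pub fn chat_completions_url(override_url: Option<&str>) -> String {
    // Default upstream when no override is set.
    let base = override_url.unwrap_or("https://api.openai.com/v1");
    // Tolerate a trailing slash in the configured base URL.
    format!("{}/chat/completions", base.trim_end_matches('/'))
}
```

With `OPENAI_BASE_URL` set to something like `http://127.0.0.1:8900/v1` (the port is made up for illustration), the same client code sends its requests through the proxy.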

Repo: https://github.com/rohansx/cloakpipe

Similar Projects

AI/ML · ●● Solid

I built proxy that keeps RAG working while hiding PII

Consistent pseudonymization beats redaction when RAG embeddings must survive.

Big Brain · Solve My Problem
rohansx
40 · 3mo ago
Security · ●●● Banger

OpenGuard

Drop-in LLM traffic guard with PII redaction and prompt injection detection, one command.

Solve My Problem · Slick
everlier
10 · 3mo ago