Detecting LLM hallucinations in <1ms using hidden states (RTX 3050, 4GB)
Detects hallucinations via hidden state geometry in under 1ms with no training required.
Real-time hallucination detection for LLMs via Geometric Drift Analysis in Hidden States.
Detects hallucinations mid-generation via hidden state geometry, not output analysis.
ML engineers, AI safety researchers, developers deploying LLMs locally
Anthropic Constitutional AI · Alignment Research Center guardrails
KEY RESULTS (Gemma-2B, N=1000):
• 54% of hallucinations detected (recall) at a 7% false positive rate
• <1% computational overhead (runs on RTX 3050 with 4GB VRAM)
• ROC-AUC: 0.8995
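For anyone who wants to check figures like these against their own logs, here is a minimal sketch of how a detection rate, a false positive rate, and a ROC-AUC can be computed from per-sample scores and labels. The function names, the threshold, and the toy numbers are illustrative assumptions. The column layout of raw_logs.csv is not described in this post, so the sketch does not parse that file.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (detection_rate, false_positive_rate) for a score cutoff.

    labels: 1 = hallucination, 0 = faithful output.
    A sample counts as flagged when its score is >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random hallucinated sample
    scores higher than a random faithful one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only (not taken from raw_logs.csv).
toy_scores = [0.9, 0.8, 0.35, 0.2, 0.6, 0.3, 0.1, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
detection, fpr = rates_at_threshold(toy_scores, toy_labels, threshold=0.5)
auc = roc_auc(toy_scores, toy_labels)
```

The pairwise AUC is quadratic in the number of samples, which is fine at N=1000 but would call for a rank-based version on much larger logs.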
WHY IT'S DIFFERENT:
Traditional methods analyze the output text semantically.
SIB-ENGINE monitors "geometric drift" in hidden states during generation, aiming to identify the structural collapse of the latent space before the first incorrect token is sampled.
This approach offers unique advantages:
• Real-time intervention: Stop generation mid-stream
• Language-agnostic: No semantic analysis needed
• Privacy-preserving: Works on hidden-state geometry and never parses the generated text
• Extremely lightweight: Works on consumer hardware
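The real-time intervention idea can be sketched as a decoding loop that checks a per-token instability score and stops before emitting the token that crossed the threshold. Everything named here is hypothetical: `generate_with_guard`, the `(token, score)` step format, and the 0.5 threshold are placeholders for illustration, not SIB-ENGINE's actual API.

```python
from typing import Iterable, List, Tuple


def generate_with_guard(
    steps: Iterable[Tuple[str, float]],
    threshold: float = 0.5,
) -> Tuple[List[str], bool]:
    """Consume (token, drift_score) pairs and halt on the first score
    at or above the threshold.

    Returns the tokens emitted so far and whether generation was halted.
    The guard compares scores only; it never inspects the token text.
    """
    emitted: List[str] = []
    for token, score in steps:
        if score >= threshold:
            return emitted, True  # stop before emitting the flagged token
        emitted.append(token)
    return emitted, False


# Stand-in for a model's decoding loop; tokens and scores are made up.
fake_steps = [("The", 0.05), ("capital", 0.08), ("is", 0.12), ("Mars", 0.71), (".", 0.10)]
tokens, halted = generate_with_guard(fake_steps)
```

Because the guard takes any iterable, a streaming decoder can be wrapped in a generator and passed in without buffering the whole response.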
HOW IT WORKS:
SIB-ENGINE monitors the internal stability of the model's computation. The system uses multiple structural signals to detect instability; the two primary indicators are:
• Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.
• Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.
When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
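Since the actual signals are proprietary, the following is only a guess at what indicators of this kind could look like, using cosine similarity over one token's per-layer hidden states. The names `representation_stability`, `cross_layer_alignment`, and `drift_score`, the equal weighting, and the toy vectors are all assumptions for illustration, not the SIB-ENGINE implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def representation_stability(layers: List[Sequence[float]]) -> float:
    """Similarity between the first and last layer's hidden state:
    a crude proxy for how much of the initial representation survives."""
    return cosine(layers[0], layers[-1])


def cross_layer_alignment(layers: List[Sequence[float]]) -> float:
    """Mean cosine similarity between consecutive layers:
    a crude proxy for agreement across depths."""
    sims = [cosine(a, b) for a, b in zip(layers, layers[1:])]
    return sum(sims) / len(sims)


def drift_score(layers: List[Sequence[float]]) -> float:
    """Combine both proxies into one score; higher means less stable.
    The equal weighting is an arbitrary choice for this sketch."""
    stability = representation_stability(layers)
    alignment = cross_layer_alignment(layers)
    return 1.0 - 0.5 * (stability + alignment)


# Toy hidden states for one token across three layers (made-up values).
stable = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
drifting = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
```

With Hugging Face Transformers, per-layer states of this shape can be obtained by passing `output_hidden_states=True` to the model's forward call. A real detector would likely work on tensors on the GPU instead of Python lists.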
DEMO & CODE:
• Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8
• GitHub: https://github.com/yubainu/sibainu-engine
• Raw data: raw_logs.csv (full transparency)
LIMITATIONS:
• Tested on Gemma-2B only (2.5B parameters)
• Designed to scale, but needs validation on larger models
• Catches only "structurally unstable" hallucinations (about half of the hallucinations in this test)
• Best used as first-line defense in ensemble systems
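One way to read "first-line defense" is a cascade: the cheap structural check runs on every response, and a slower verifier handles only what the structural check lets through. The sketch below assumes that design. `cascade_check`, `toy_secondary`, and the 0.5 threshold are hypothetical names and values, and the secondary check is a stand-in for whatever semantic method the ensemble uses.

```python
from typing import Callable


def cascade_check(
    drift_score: float,
    text: str,
    secondary_check: Callable[[str], bool],
    threshold: float = 0.5,
) -> str:
    """Return 'flagged_structural', 'flagged_secondary', or 'passed'."""
    if drift_score >= threshold:
        return "flagged_structural"  # cheap check fired; skip the slow path
    if secondary_check(text):
        return "flagged_secondary"  # caught by the slower verifier
    return "passed"


def toy_secondary(text: str) -> bool:
    """Placeholder for a slower semantic verifier (made up for this demo)."""
    return "Mars" in text


result_a = cascade_check(0.8, "The capital is Mars", toy_secondary)
result_b = cascade_check(0.1, "The capital is Mars", toy_secondary)
result_c = cascade_check(0.1, "The capital is Paris", toy_secondary)
```

The opposite ordering is also reasonable: sending only structurally flagged outputs to the slower verifier would cut false positives at the cost of missing whatever the structural check misses.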
TECHNICAL NOTES:
• No external models needed (unlike self-consistency methods)
• No knowledge bases required (unlike RAG approaches)
• Adds ~1% inference time vs. 300-500% for semantic methods
• Works by monitoring the process, not the product
I'd love feedback on:
• Validation on larger models (I'm looking for partners and compute resources for large-scale validation)
• Integration patterns for production systems
• Comparison with other structural approaches
• Edge cases where geometric signals fail
This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.
Happy to discuss technical details in the comments!