GitHub Repository

Real-time hallucination detection for LLMs via Geometric Drift Analysis in Hidden States.

15 stars · Python

Running hallucination detection on a $200 GPU (RTX 3050, 4GB)

by yubainu · Feb 25, 2026 · 2 points · 2 comments

AI Analysis

●● Solid · Wizardry · Big Brain

Detects hallucinations mid-generation via hidden state geometry, not output analysis.

Strengths
  • Novel approach monitors internal model stability rather than semantic output, enabling real-time intervention
  • Runs on $200 consumer GPU with <1% computational overhead, no server required
  • Published benchmarks (54% detection, 0.8995 ROC-AUC) with reproducible evaluation script and raw data
Weaknesses
  • Lite demo version limited to single detection axis; full 4-axis model results not open-sourced for verification
  • Only tested on Gemma-2B; unclear how approach scales to larger models like 7B or 13B variants
Target Audience

ML engineers, AI safety researchers, developers deploying LLMs locally

Similar To

Anthropic Constitutional AI · Alignment Research Center guardrails

Post Description

I built SIB-ENGINE, a real-time hallucination detection system that monitors LLM internal structure rather than output content.

KEY RESULTS (Gemma-2B, N=1000):

• 54% of hallucinations detected at a 7% false positive rate

• <1% computational overhead (runs on RTX 3050 with 4GB VRAM)

• ROC-AUC: 0.8995
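
For readers who want to check numbers like these against their own logs, here is a minimal sketch of how a detection rate, a false positive rate, and ROC-AUC can be computed from per-sample scores and labels. It is not the repository's evaluation script: the function names, the threshold, and the toy data are all made up for illustration, and the column layout of raw_logs.csv is not assumed.

```python
def detection_metrics(scores, labels, threshold):
    """Return (detection_rate, false_positive_rate) at a fixed threshold.

    scores: per-sample instability scores (higher = more suspicious).
    labels: 1 for a hallucinated sample, 0 for a faithful one.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    detection_rate = sum(s >= threshold for s in positives) / len(positives)
    false_positive_rate = sum(s >= threshold for s in negatives) / len(negatives)
    return detection_rate, false_positive_rate


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random hallucinated sample
    scores higher than a random faithful one (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data invented for illustration only (NOT taken from raw_logs.csv).
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1, 0.4, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

rate, fpr = detection_metrics(scores, labels, threshold=0.75)
auc = roc_auc(scores, labels)
print(rate, fpr, auc)
```

The detection rate and false positive rate depend on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is why the two kinds of numbers are reported separately.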

WHY IT'S DIFFERENT:

Traditional methods analyze the output text semantically.

SIB-ENGINE monitors "geometric drift" in hidden states during generation, aiming to identify structural collapse of the latent space before the first incorrect token is sampled.

This approach offers unique advantages:

• Real-time intervention: Stop generation mid-stream

• Language-agnostic: No semantic analysis needed

• Privacy-preserving: Operates on hidden-state geometry and never reads the generated text

• Extremely lightweight: Works on consumer hardware
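
To make the "stop generation mid-stream" idea concrete, here is a minimal sketch of a guard around a token stream. It assumes each decoding step yields a token together with an instability score; how SIB-ENGINE actually produces or thresholds its score is not described in this post, so `guarded_stream`, `threshold`, and `patience` are hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, Tuple


def guarded_stream(
    steps: Iterable[Tuple[str, float]],
    threshold: float,
    patience: int = 2,
) -> Iterator[str]:
    """Yield tokens until the score stays at or above `threshold`
    for `patience` consecutive steps, then stop.

    The guard only compares numbers; it never inspects the token text.
    """
    strikes = 0
    for token, score in steps:
        strikes = strikes + 1 if score >= threshold else 0
        if strikes >= patience:
            return  # halt mid-stream; the flagged token is not emitted
        yield token


# Hypothetical (token, instability score) pairs, invented for illustration.
steps = [("The", 0.1), ("capital", 0.2), ("of", 0.6), ("Mars", 0.7), ("is", 0.9)]
emitted = list(guarded_stream(steps, threshold=0.5))
print(emitted)
```

Because the guard is a generator, it pulls steps lazily: if `steps` is itself a generator that runs the model one token at a time, stopping here also stops further decoding work.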

HOW IT WORKS:

SIB-ENGINE monitors the internal stability of the model's computation. The system uses multiple structural signals to detect instability; the two primary indicators are:

Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.

Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.

When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
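
The actual signals are proprietary and not specified here, so the following is only a hypothetical illustration of what measurements of this kind could look like. It assumes one hidden-state vector per layer for the current token (with Hugging Face `transformers`, per-layer states can be requested with `output_hidden_states=True`), and it uses plain cosine similarity as a stand-in metric. The function names and the way the two values are combined are invented for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def representation_stability(layer_states):
    """Stand-in metric: how much the first layer's direction survives to the last layer."""
    return cosine(layer_states[0], layer_states[-1])


def cross_layer_alignment(layer_states):
    """Stand-in metric: mean cosine similarity between consecutive layers."""
    sims = [cosine(a, b) for a, b in zip(layer_states, layer_states[1:])]
    return sum(sims) / len(sims)


def instability_score(layer_states):
    """Arbitrary illustrative combination: 0 when both metrics are 1, larger when they drop."""
    stability = representation_stability(layer_states)
    alignment = cross_layer_alignment(layer_states)
    return 1.0 - 0.5 * (stability + alignment)


# Toy 2-D "hidden states" for three layers, invented for illustration.
stable = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
drifting = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(instability_score(stable), instability_score(drifting))
```

Real hidden states have thousands of dimensions and different layers need not share a directly comparable basis, so a production detector would likely need calibration against known-stable generations rather than raw cosine values.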

DEMO & CODE:

• Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8

• GitHub: https://github.com/yubainu/sibainu-engine

• Raw data: raw_logs.csv (full transparency)

LIMITATIONS:

• Tested on Gemma-2B only (2.5B parameters)

• Designed to scale, but needs validation on larger models

• Catches only "structurally unstable" hallucinations (about half of hallucinations in this test)

• Best used as first-line defense in ensemble systems
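
One way to read "first-line defense" is as a cascade: the cheap structural check runs on every generation, and a slower checker runs only on what it lets through. The sketch below assumes that arrangement. `cascade_check` and its parameters are hypothetical, and the slow check is a placeholder callable rather than any specific method.

```python
from typing import Callable


def cascade_check(
    structural_score: float,
    threshold: float,
    slow_check: Callable[[], bool],
) -> str:
    """Return 'flagged_structural', 'flagged_semantic', or 'passed'.

    `slow_check` is only called when the cheap structural check does not fire,
    so its cost is skipped for everything the first stage already caught.
    """
    if structural_score >= threshold:
        return "flagged_structural"
    if slow_check():
        return "flagged_semantic"
    return "passed"


# Placeholder slow checkers, invented for illustration.
result_a = cascade_check(0.9, 0.5, lambda: False)
result_b = cascade_check(0.1, 0.5, lambda: True)
result_c = cascade_check(0.1, 0.5, lambda: False)
print(result_a, result_b, result_c)
```

Since the structural detector is reported to catch only about half of hallucinations, the second stage still matters for the rest. The saving comes from not running it on generations that were already flagged or stopped early.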

TECHNICAL NOTES:

• No external models needed (unlike self-consistency methods)

• No knowledge bases required (unlike RAG approaches)

• Adds ~1% inference time vs. 300-500% for semantic methods

• Works by monitoring the process, not the product

I'd love feedback on:

• Validation on larger models (I'm seeking partnerships and compute resources for large-scale validation)

• Integration patterns for production systems

• Comparison with other structural approaches

• Edge cases where geometric signals fail

This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.

Happy to discuss technical details in the comments!
