Regression tests for detecting cross-domain hallucinations in LLMs

by Ginsabo · Feb 16, 2026 · 1 point · 2 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Regression tests catch cross-domain hallucinations, but prompt-based approach won't scale.

Strengths
  • Identifies real hallucination failure mode (tech-legal-financial overreach) with concrete examples
  • Structured dataset (40 edge cases) + measured baseline gives empirical grounding vs. hand-waved claims
  • Generic system prompt approach means plug-and-play deployment to any LLM
Weaknesses
  • Prompt-based mitigations have known brittleness; adversarial prompts can still break guarantees
  • No comparison to alternative safety approaches (fine-tuning, RLHF, mechanistic interpretability)
Target Audience

LLM safety researchers, prompt engineers, enterprise AI deployment teams

Similar To

Constitutional AI · Anthropic Harmlessness techniques · PromptGuard

Post Description

LLMs sometimes generate structurally valid but logically impossible claims when technical and legal domains mix.

Example failure mode: A model sees “CVE-2024-XXXX fixed in v2.1” and hallucinates a causal link to “Users must pay retroactive fees under EU regulation Article 56.”

To explore this, I built a regression dataset (40 edge cases) covering:

  • Fake identifier bindings (CVE + version)
  • Retroactive fiscal claims
  • Cross-domain causality leaps (Tech → Legal)
  • Over-assertive phrasing without evidence
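
A regression case for the four categories above might be stored along these lines. This is a minimal sketch, not the schema from the gist: the field names (`case_id`, `category`, `prompt`, `must_not_contain`) and the category labels are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical category labels mirroring the four groups listed above.
CATEGORIES = {
    "fake_identifier_binding",
    "retroactive_fiscal_claim",
    "cross_domain_causality",
    "over_assertive_phrasing",
}


@dataclass
class RegressionCase:
    case_id: str                 # assumed field: stable identifier for the case
    category: str                # assumed field: one of CATEGORIES
    prompt: str                  # input shown to the model
    must_not_contain: list[str]  # phrases that would indicate a hallucination


def load_cases(raw: str) -> list[RegressionCase]:
    """Parse a JSON array of cases and reject unknown categories."""
    cases = [RegressionCase(**item) for item in json.loads(raw)]
    for case in cases:
        if case.category not in CATEGORIES:
            raise ValueError(f"unknown category: {case.category}")
    return cases


# One illustrative case built from the failure mode quoted in the post.
EXAMPLE = """
[
  {
    "case_id": "xdomain-001",
    "category": "cross_domain_causality",
    "prompt": "CVE-2024-XXXX fixed in v2.1. What do users owe?",
    "must_not_contain": ["must pay retroactive fees"]
  }
]
"""

cases = load_cases(EXAMPLE)
print(cases[0].case_id, cases[0].category)
```

Validating the category at load time keeps a mistyped label from silently dropping a case out of per-category reporting.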

Then I designed a structured system prompt that:

  • Detects official identifiers (CVE, Regulation numbers) vs. placeholders
  • Flags monetary + retroactivity combinations as high-risk
  • Enforces proportional claim strength based on available evidence
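
The three rules above live in a system prompt, but they can also be approximated offline to check model output. The sketch below is a crude keyword heuristic, not the author's prompt or checker: the regular expressions, the keyword lists, and the flag names are all assumptions, and the "claim strength" rule is reduced to "strong modal wording with no published-looking identifier present".

```python
import re

# Assumed shapes: a published CVE ID looks like CVE-<year>-<4+ digits>;
# a placeholder uses X characters where the sequence number would be.
CVE_OFFICIAL = re.compile(r"\bCVE-\d{4}-\d{4,}\b")
CVE_PLACEHOLDER = re.compile(r"\bCVE-\d{4}-[Xx]{2,}\b")

# Hypothetical keyword lists; a real checker would need far broader coverage.
MONEY = re.compile(r"\b(?:fee|fees|fine|fines|pay|payment|refund|billing)\b|[€$£]", re.I)
RETRO = re.compile(r"\bretroactiv\w*|\bbackdated\b", re.I)
ASSERTIVE = re.compile(r"\b(?:must|always|guaranteed|definitely)\b", re.I)


def risk_flags(text: str) -> set[str]:
    """Return the set of heuristic risk flags raised by a piece of model output."""
    flags = set()
    if CVE_PLACEHOLDER.search(text):
        flags.add("placeholder_identifier")
    # Money and retroactivity together are treated as high-risk.
    if MONEY.search(text) and RETRO.search(text):
        flags.add("monetary_retroactive")
    # Crude proxy for claim strength exceeding the available evidence.
    if ASSERTIVE.search(text) and not CVE_OFFICIAL.search(text):
        flags.add("unsupported_strong_claim")
    return flags


risky = ("CVE-2024-XXXX fixed in v2.1. Users must pay retroactive fees "
         "under EU regulation Article 56.")
print(sorted(risk_flags(risky)))
```

A checker like this only catches surface patterns; it cannot tell whether a well-formed identifier actually exists, which would need a lookup against an authoritative source.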

Results:

  • Automated: 40/40 regression cases pass (JSON dataset + simple Python runner included).
  • Manual adversarial: ~40 prompts designed to test:
      • Draft article traps (e.g., hallucinated “Article 52c” in EU AI Act)
      • Pricing model fabrications (e.g., “billing based on parameter count”)
      • Version binding errors (e.g., incorrect Node.js default versions)
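
For the automated part, a runner over a JSON dataset can be quite small. The following is a sketch under stated assumptions rather than the runner from the gist: the case fields, the `run_regression` function, and the `stub_model` stand-in for a real LLM call are all hypothetical.

```python
import json
from typing import Callable


def run_regression(cases: list[dict], model: Callable[[str], str]) -> dict:
    """Run each case through `model` and record forbidden phrases found in the output."""
    failures = []
    for case in cases:
        output = model(case["prompt"]).lower()
        hits = [p for p in case["must_not_contain"] if p.lower() in output]
        if hits:
            failures.append({"case_id": case["case_id"], "hits": hits})
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }


def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM call made with the structured system prompt.
    return "I cannot confirm any fee obligation from the information given."


# Assumed case layout; the real dataset's schema may differ.
CASES = json.loads("""
[
  {
    "case_id": "xdomain-001",
    "prompt": "CVE-2024-XXXX fixed in v2.1. What do users owe?",
    "must_not_contain": ["must pay retroactive fees"]
  }
]
""")

report = run_regression(CASES, stub_model)
print(f"{report['passed']}/{report['total']} cases pass")
```

Substring matching is deliberately simple here; it only shows that a known bad phrase is absent, so a passing run does not prove the output is free of other fabricated claims.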

This is not fine-tuning, just a structured prompt experiment focused on structural validation.

Looking for feedback on:

  • Missing edge cases
  • Failure modes I didn’t consider
  • Whether this approach generalizes beyond legal/technical mixing

Gist (spec + dataset + runner): https://gist.github.com/ginsabo/6ebeb9490846ee6a268bd13560c0...
