Runtime governance layer that refuses high-risk LLM outputs

Author: milarien

by milarien·Feb 17, 2026·1 point·1 comment


AI Analysis

●● Solid · Niche Gem · Ship It

The Take

The demo implements post-generation admissibility checks and returns structured refusals (decision codes, the rule triggered, divergence metrics, and a stable prompt fingerprint) so that enforcement decisions can be audited. It's a crisp, focused proof of concept for runtime enforcement and useful as a starting pattern, but it stops short of addressing bypass and adversarial vectors, deployment integration, or the guarantees that would make it enforceable at scale.
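To make the "structured refusal" idea concrete, here is a minimal sketch of what such an audit record and a stable prompt fingerprint could look like. It is not taken from the repo: the field names, the decision code `REFUSE_HIGH_RISK`, and the normalization steps (NFKC, case folding, whitespace collapsing) are all assumptions chosen for illustration.

```python
import hashlib
import json
import unicodedata
from dataclasses import asdict, dataclass


def prompt_fingerprint(prompt: str) -> str:
    """Return a SHA-256 hex digest of a normalized prompt.

    Assumption: prompts that differ only in Unicode form, letter case,
    or whitespace should map to the same fingerprint.
    """
    normalized = unicodedata.normalize("NFKC", prompt).casefold()
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RefusalRecord:
    """Hypothetical audit record; field names are illustrative only."""
    decision: str            # e.g. "REFUSE_HIGH_RISK" (made-up code)
    pass_through: bool       # False when the output is withheld
    rule_triggered: str      # name of the rule that fired
    divergence: float        # divergence metric supplied by the caller
    prompt_fingerprint: str  # stable hash of the prompt

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs
        return json.dumps(asdict(self), sort_keys=True)


record = RefusalRecord(
    decision="REFUSE_HIGH_RISK",
    pass_through=False,
    rule_triggered="pediatric_dosage",
    divergence=0.42,
    prompt_fingerprint=prompt_fingerprint("What dose for a 3 year old?"),
)
print(record.to_json())
```

Hashing a normalized prompt rather than the raw string is one possible reading of "stable"; a stricter audit trail might hash the raw bytes instead so that no two distinct inputs are ever merged.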

Post Description

I built a minimal demo of runtime epistemic governance for LLMs. The script calls an upstream model, then applies an admissibility layer before returning the answer. For high-risk actionable claims (e.g., pediatric drug dosages), it refuses the output and logs:

- decision (pass_through: false)
- rule triggered
- divergence from baseline
- prompt fingerprint (stable hash)

This is not prompt engineering; it is post-generation enforcement at inference time.

Repo: https://github.com/milarien/aurora-governor-demo
Example refusal run: https://github.com/milarien/aurora-governor-demo/tree/main/d...

I’m interested in technical critique on whether this qualifies as enforceable runtime governance vs. guardrail filtering.
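The flow described in the post (call the upstream model, check the answer, refuse and log if a rule fires) can be sketched roughly as follows. This is not the repo's code. The rule table, the regex, the `govern` function, and the reading of "divergence from baseline" as token-set Jaccard distance against a reference answer are all assumptions made for illustration, and the upstream model is stubbed with a plain callable.

```python
import hashlib
import re
from typing import Callable, Optional

# Hypothetical rule table: rule name -> pattern for a high-risk actionable claim.
_DOSE = r"\b\d+(\.\d+)?\s*(mg|ml|mcg)\b"
_CHILD = r"\b(child|infant|pediatric|year[- ]old)\b"
RULES = {
    "pediatric_dosage": re.compile(
        f"{_DOSE}.*{_CHILD}|{_CHILD}.*{_DOSE}", re.IGNORECASE | re.DOTALL
    ),
}


def divergence(answer: str, baseline: str) -> float:
    """Jaccard distance between token sets (an assumed metric, not the repo's)."""
    a, b = set(answer.lower().split()), set(baseline.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def govern(prompt: str, upstream: Callable[[str], str], baseline: str) -> dict:
    """Run the upstream model, then decide whether its answer may pass through."""
    answer = upstream(prompt)
    record: dict = {
        "pass_through": True,
        "rule_triggered": None,
        "divergence": round(divergence(answer, baseline), 3),
        "prompt_fingerprint": hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest(),
        "output": answer,
    }
    for name, pattern in RULES.items():
        if pattern.search(answer):
            # Refuse: withhold the answer and record which rule fired.
            record.update(pass_through=False, rule_triggered=name, output=None)
            break
    return record


def fake_model(prompt: str) -> str:
    # Stand-in for a real upstream LLM call.
    return "Give 5 mg per kg to a 3 year old child."


print(govern("How much should I give my toddler?", fake_model, "Consult a pediatrician."))
```

The check runs on the generated answer rather than on the prompt, which matches the post's "post-generation enforcement" framing. A pattern table like this is also simple to evade by rephrasing, which bears directly on the question of enforceable governance vs. guardrail filtering.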