CLI to score AI prompts after a prod failure

by techcam · Mar 18, 2026 · 1 point · 1 comment

AI Analysis

●● Solid · Solve My Problem · Ship It

Prompt CVE tracking is clever, but LangSmith and Arize already cover this ground.

Strengths
  • Versioned safety scores with published benchmark methodology add credibility.
  • CLI-first workflow fits naturally into existing CI/CD pipelines.
  • Cost estimation across OpenAI, Anthropic, and Gemini helps catch budget blowups.
Weaknesses
  • Prompt analysis space is crowded with LangSmith, PromptLayer, and Portkey.
  • Safety scoring without transparency on training data feels like a black box.
Target Audience

ML engineers and AI product teams shipping prompts to production

Similar To

LangSmith · PromptLayer · Arize Phoenix

Post Description

About six months ago I shipped a customer-facing feature where the system prompt had a subtle ambiguity in the instruction hierarchy. Within two days, users found a natural-language path that caused the model to ignore the safety constraint entirely.

It wasn’t a jailbreak — just phrasing I hadn’t anticipated. The prompt looked fine. It passed code review. It failed in production.

That made me realize how little tooling exists between “write a prompt” and “ship it.”

We have linters for code. We have type checkers. We have static analysis.

For prompts, we mostly have vibes.

So I built CostGuardAI.

npm install -g @camj78/costguardai
costguardai analyze my-prompt.txt

It analyzes prompts across a few structural risk dimensions:
  • jailbreak / prompt injection surface
  • instruction hierarchy ambiguity
  • under-constrained outputs (hallucination risk)
  • conflicting directives
  • token cost + context usage
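
To make the idea of structural checks concrete, here is a minimal sketch of a heuristic prompt linter covering three of the dimensions above. It is not CostGuardAI's implementation, which the post does not describe. The `Finding` class, the `lint_prompt` function, and every regex rule are hypothetical and deliberately crude.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    dimension: str  # which risk dimension the rule belongs to
    message: str    # human-readable explanation


# Hypothetical keyword rules; a real tool would need far richer analysis.
OUTPUT_CONSTRAINT_HINTS = re.compile(
    r"\b(json|format|at most|no more than|maximum|schema)\b", re.I
)
OVERRIDE_HINTS = re.compile(
    r"\b(unless|except when|if the user (asks|insists|requests))\b", re.I
)


def lint_prompt(prompt: str) -> List[Finding]:
    """Return structural findings for a system prompt (illustrative only)."""
    findings: List[Finding] = []
    lowered = prompt.lower()

    # Conflicting directives: the same verb appears after "always" and "never".
    always = set(re.findall(r"\balways (\w+)", lowered))
    never = set(re.findall(r"\bnever (\w+)", lowered))
    for verb in sorted(always & never):
        findings.append(Finding(
            "conflicting directives",
            f"both 'always {verb}' and 'never {verb}' appear",
        ))

    # Under-constrained output: no mention of a format or length limit.
    if not OUTPUT_CONSTRAINT_HINTS.search(prompt):
        findings.append(Finding(
            "under-constrained output",
            "no output format or length constraint found",
        ))

    # Hierarchy ambiguity: a rule carries an exception a user could invoke.
    if OVERRIDE_HINTS.search(prompt):
        findings.append(Finding(
            "instruction hierarchy ambiguity",
            "a rule has an exception that user input might trigger",
        ))

    return findings
```

Keyword rules like these produce false positives and miss most real problems. The sketch only shows the shape of a deterministic, pre-deployment check, in contrast to runtime guards or model-based evals.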

It outputs a CostGuardAI Safety Score (0–100, higher = safer) and shows what’s driving the risk.

Example:

CostGuardAI Safety Score: 58 (Warning)

Top Risk Drivers:
  • instruction ambiguity
  • missing output constraints
  • unconstrained role scope
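
If you wanted to gate a CI job on this output, one option is to parse the score line from the report text. The sketch below assumes the line looks exactly like the example above. The real CLI output may differ, and it may offer a machine-readable mode or exit codes that would be a better fit. The `parse_score` and `gate` helpers and the threshold of 70 are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a line such as "CostGuardAI Safety Score: 58 (Warning)".
# The format is taken from the example in the post, not from CLI docs.
SCORE_LINE = re.compile(r"CostGuardAI Safety Score:\s*(\d{1,3})\s*\(([^)]+)\)")


def parse_score(report: str) -> Optional[Tuple[int, str]]:
    """Extract (score, label) from report text, or None if no score line."""
    match = SCORE_LINE.search(report)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def gate(report: str, minimum: int = 70) -> bool:
    """Return True when the score meets a (hypothetical) minimum.

    A missing score line fails the gate so that parse errors
    do not silently pass.
    """
    parsed = parse_score(report)
    if parsed is None:
        return False
    score, _label = parsed
    return score >= minimum
```

Failing closed on an unparseable report is a deliberate choice here: a CI gate that passes when the tool's output changes shape would hide exactly the regressions it is meant to catch.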

The scoring isn’t trying to predict every failure — it’s closer to static analysis: catching structural patterns that correlate with prompts breaking in production.

If you want to see output before installing:
https://costguardai.io/report/demo
https://costguardai.io/benchmarks

I’m a solo founder and this is still early, but it’s already caught real issues in my own prompts.

Curious what HN thinks — especially from people working on prompt evals or LLM safety tooling.

Similar Projects

Security · ●● Solid

PromptSonar – Static analysis for LLM prompt security

Static scanner catches prompt injections in code before runtime, unlike runtime guards.

Solve My Problem · Ship It
meghal86
103mo ago
Security · ●●● Banger

Promptinel – A Security Scanner for Prompts

Deterministic prompt linter flags injection, exfiltration, and obfuscation before the LLM runs, treating prompts as executable code.

Big Brain · Zero to One · Solve My Problem
cunningfatalist
103mo ago