Grafana Cloud observability plugin for Hermes Agent
Agent installs its own observability plugin via llms.txt instead of manual config.

Zero-code instrumentation via monkey-patching, but LangSmith, Helicone, and Arize already do this.
Python developers building LLM applications who need observability without code refactoring.
LangSmith · Helicone · Arize
It can also gather custom metadata about a call. This can be any key-value pairs you want, attached both before and after the request.
```python
import caliper
import anthropic

# This is all that's required for basic observability;
# no changes to LLM calls are needed for basic metrics.
caliper.init(target="s3")

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    caliper_metadata={"campaign": "q4"},  # Pre-request metadata
)
print(response.content[0].text)
caliper.annotate(sentiment="positive")  # Post-request metadata
```
You can use this to track the effectiveness of model changes across different user tiers. Maybe your free tier users don't notice if you use a cheaper model, but your paying users do? How do you know if a recent system prompt change was effective? You can track the version of the prompt in metadata and compare post-request rating annotations between prompt versions.
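As a rough illustration of that comparison, here is a minimal sketch in plain Python. The record shape (`prompt_version`, `user_tier`, `rating`) is a made-up assumption for the example, not Caliper's actual export schema; it only shows the idea of grouping calls by their metadata and averaging a post-request rating.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one dict per LLM call, combining pre-request
# metadata (prompt_version, user_tier) with a post-request rating annotation.
records = [
    {"prompt_version": "v1", "user_tier": "free", "rating": 3},
    {"prompt_version": "v1", "user_tier": "paid", "rating": 2},
    {"prompt_version": "v2", "user_tier": "free", "rating": 4},
    {"prompt_version": "v2", "user_tier": "paid", "rating": 5},
    {"prompt_version": "v2", "user_tier": "paid", "rating": 4},
]


def mean_rating_by(records, *keys):
    """Group records by the given metadata keys and average their ratings."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["rating"])
    return {group: mean(ratings) for group, ratings in groups.items()}


by_version = mean_rating_by(records, "prompt_version")
by_version_and_tier = mean_rating_by(records, "prompt_version", "user_tier")
print(by_version)
print(by_version_and_tier)
```

Grouping by both keys is what would surface a case where a change looks neutral overall but hurts one tier.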
It has a dev mode which logs locally, and it can also send files to S3. The SDK has a background queue and a worker which flushes in batches; both the batch size and the time between flushes are configurable. It exports to S3 as batched JSON files that integrate readily into most data engineering pipelines, or you can just query them directly with a tool like DuckDB.
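To make the queue-and-worker idea concrete, here is a minimal standard-library sketch of a background batcher that flushes when a batch is full or when a time interval has passed. It is not Caliper's implementation: the `BatchWorker` class, its parameters, and the `sink` callable are all hypothetical names for illustration. In a real setup the sink might write a local file or upload to S3.

```python
import json
import queue
import threading
import time


class BatchWorker:
    """Collects events on a background thread and flushes them in batches."""

    def __init__(self, sink, batch_size=100, flush_interval=5.0):
        self._sink = sink  # hypothetical callable: receives one JSON string per batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, event):
        """Enqueue an event without blocking the caller on I/O."""
        self._queue.put(event)

    def _run(self):
        batch = []
        deadline = time.monotonic() + self._flush_interval
        # Keep going until we are asked to stop and the queue is drained.
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                batch.append(self._queue.get(timeout=0.05))
            except queue.Empty:
                pass
            # Flush on whichever comes first: a full batch or the timer.
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                self._flush(batch)
                batch = []
                deadline = time.monotonic() + self._flush_interval
        self._flush(batch)  # flush whatever is left on shutdown

    def _flush(self, batch):
        if batch:
            self._sink(json.dumps(batch))

    def close(self):
        """Stop the worker and wait for the final flush."""
        self._stop.set()
        self._thread.join()


batches = []
worker = BatchWorker(sink=batches.append, batch_size=2, flush_interval=60)
for i in range(5):
    worker.record({"call": i})
worker.close()
print(batches)
```

Here the long flush interval means only the size limit and the final shutdown flush trigger writes, so five events end up in three batches.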
PyPI: https://pypi.org/project/caliper-sdk/
Edits: formatting and PyPI link
Agent installs its own observability plugin via llms.txt instead of manual config.
Zero-config RAG tracing when LangSmith needs heavy instrumentation.
Yet another LLM ops layer when LangSmith, Helicone, and Braintrust already exist.
Disk-based state with git history lets you audit and replay agent runs.
Acrobat Pro alternative that auto-redacts PII offline before LLM uploads.
Generates event catalogs from code in two commands, replacing manual schema docs.