Self-hosted AI agent observability (OTel, Grafana, bash hooks)
Complete observability for AI coding assistants, but only supports three CLIs.
Okapi is an observability stack. It ingests telemetry over OTLP, exposes queries via PromQL, and stores traces and spans. Okapi can be used to build dashboards, view metrics, and store and analyze traces in distributed systems.
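As a rough sketch of what OTLP ingestion over HTTP means on the client side, the snippet below builds the standard OpenTelemetry SDK environment variables for protobuf-over-HTTP export. The variable names and the `/v1/<signal>` paths come from the OpenTelemetry specification; the helper names `otlp_http_env` and `signal_url` and the address `http://localhost:4318` are assumptions for illustration (4318 is the conventional OTLP/HTTP port, not necessarily the one Okapi listens on, so check the README for the real endpoint).

```python
# Sketch only: standard OTel SDK env vars for OTLP over HTTP/protobuf.
# The base URL passed in is a placeholder, not Okapi's documented address.

VALID_SIGNALS = ("traces", "metrics", "logs")


def otlp_http_env(base_url: str, service_name: str) -> dict:
    """Return env vars that point an OTel SDK at an OTLP/HTTP endpoint."""
    return {
        "OTEL_SERVICE_NAME": service_name,
        # "http/protobuf" selects protobuf-over-HTTP rather than gRPC.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_EXPORTER_OTLP_ENDPOINT": base_url.rstrip("/"),
    }


def signal_url(base_url: str, signal: str) -> str:
    """OTLP/HTTP exporters append /v1/<signal> to the base endpoint."""
    if signal not in VALID_SIGNALS:
        raise ValueError(f"unknown OTLP signal: {signal!r}")
    return f"{base_url.rstrip('/')}/v1/{signal}"


if __name__ == "__main__":
    # Hypothetical local address; replace with the real collector URL.
    env = otlp_http_env("http://localhost:4318/", "checkout")
    for key, value in env.items():
        print(f"export {key}={value}")
    print(signal_url("http://localhost:4318", "traces"))
```

Running the script prints `export` lines that can be pasted into a shell before starting an instrumented service, which keeps the application code free of exporter configuration.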
Yet another observability stack when Grafana and Honeycomb already dominate the market.
DevOps engineers, SREs, platform teams
Grafana · Honeycomb · SigNoz
Features:
- OTel everywhere: it's Okapi's preferred and only ingestion mechanism :) Okapi currently supports ingestion via protobuf-over-HTTP. Here's a sample config (https://github.com/okapi-core/okapi?tab=readme-ov-file#examp...)
- Dashboards via both clicks and code: the Okapi UI has a dashboard designer that hopefully has autocomplete everywhere, so users don't have to guess metric paths. If you're not a fan of clicks, or you love GitOps, all Okapi dashboards can also be expressed as YAML templates.
- Out-of-the-box service health: for an application instrumented per OTel conventions, Okapi treats RED metrics (rate, errors, duration) as a first-class concept. Service health pages show RED breakdowns for the service, its sub-operations, and its dependent paths. The calculations depend on the application being instrumented, but following a convention hopefully makes that easy.
- And of course AI: Okapi has a limited-capability AI SRE agent affectionately called Oscar (supposed to be an okapi, but there's no mascot yet). Calling it a full-blown SRE is a stretch, since that's a tough job. You can ask Oscar questions in natural language as you would any chatbot, and it will try its best to answer. At least in integration tests, Oscar can fetch metrics, find traces matching given criteria, and do multi-step debugging that links query latencies with high CPU usage on hosts.
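To make the RED breakdown from the service health feature concrete, here is a minimal sketch of how rate, error ratio, and a duration percentile could be derived from a batch of spans. It is not Okapi's implementation: the `Span` record, the `red_summary` function, and the choice of a nearest-rank p95 are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span record (not an Okapi or OTel type).
    service: str
    operation: str
    duration_ms: float
    is_error: bool


def red_summary(spans, window_seconds: float) -> dict:
    """Compute rate, error ratio, and nearest-rank p95 duration for spans
    observed within a window of `window_seconds`."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(spans)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    errors = sum(1 for s in spans if s.is_error)
    durations = sorted(s.duration_ms for s in spans)
    # Nearest-rank percentile: smallest value with at least 95% of
    # observations at or below it.
    rank = max(1, math.ceil(0.95 * total))
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total,
        "p95_ms": durations[rank - 1],
    }


if __name__ == "__main__":
    sample = [
        Span("checkout", "POST /pay", 10.0, False),
        Span("checkout", "POST /pay", 20.0, False),
        Span("checkout", "POST /pay", 30.0, True),
        Span("checkout", "POST /pay", 40.0, False),
    ]
    print(red_summary(sample, window_seconds=2.0))
```

Grouping the spans by `service` or `operation` before calling `red_summary` would give the per-service and per-operation breakdowns the feature list describes.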
I am curious to hear feedback from the community, so check it out.
TLDR: https://github.com/okapi-core/okapi?tab=readme-ov-file#quick...
PostHog alternative with Apache Iggy, Delta Lake, and 3-tier query architecture—handles scale differently.
Sits between logs and Datadog—eliminates retry noise, saves 60–90% ingestion volume.
Single-box observability hitting 5M events/sec without Kafka or cloud clusters.
Single Docker container with SQLite beats LangSmith's heavy Postgres dependency.
Yet another SSG when Astro and Hugo already dominate