AIDiveForge

Self-Hosted LLM Observability

As of June 2026, AIDiveForge tracks 9 self-hosted LLM observability tools. Listings are verified against each tool's live website and re-checked regularly.

Last updated June 12, 2026 · 9 tools

  1. AgentMeter

    AgentMeter runs locally — no cloud sync, no account creation, no vendor dashboard to log into — and parses the tool calls, token counts, and caching splits that CLI agents like Claude Code, Gemini CLI, Codex CLI, and Copilot CLI generate. It surfaces the three-tier cost structure that prompt caching creates (input, cached-input, and output tokens each priced differently), which the raw API bill flattens into noise. The value-multiplier calculation compares API spend against estimated developer time saved, giving you a number to put in front of a manager. The wall appears when you need alerting, real-time budget enforcement, or integration with a team billing system — none of that is here.

    Free · Open Source
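
The three-tier cost split and the value multiplier described in the AgentMeter listing can be illustrated with a little arithmetic. This is a minimal sketch, not AgentMeter's code: the `Pricing` and `Usage` types, the example prices, and the multiplier formula (estimated value of time saved divided by API spend) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (not any provider's real rates)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


@dataclass(frozen=True)
class Usage:
    input_tokens: int          # uncached input tokens
    cached_input_tokens: int   # input tokens served from the prompt cache
    output_tokens: int


def session_cost(usage: Usage, pricing: Pricing) -> dict:
    """Return the cost of each tier, plus the total, in USD."""
    tiers = {
        "input": usage.input_tokens * pricing.input_per_mtok / 1_000_000,
        "cached_input": usage.cached_input_tokens * pricing.cached_input_per_mtok / 1_000_000,
        "output": usage.output_tokens * pricing.output_per_mtok / 1_000_000,
    }
    tiers["total"] = sum(tiers.values())
    return tiers


def value_multiplier(api_cost_usd: float, hours_saved: float, hourly_rate_usd: float) -> float:
    """Assumed formula: estimated value of developer time saved / API spend."""
    if api_cost_usd <= 0:
        raise ValueError("api_cost_usd must be positive")
    return hours_saved * hourly_rate_usd / api_cost_usd


# Illustrative numbers only.
pricing = Pricing(input_per_mtok=3.0, cached_input_per_mtok=0.3, output_per_mtok=15.0)
usage = Usage(input_tokens=200_000, cached_input_tokens=1_800_000, output_tokens=50_000)
costs = session_cost(usage, pricing)
multiplier = value_multiplier(costs["total"], hours_saved=2.0, hourly_rate_usd=75.0)
```

With these made-up rates, cached input is 90% of the input volume but the smallest of the three cost tiers. A single blended per-token figure hides that.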
  2. Beacon

    Beacon is an open-source endpoint telemetry layer that runs locally alongside AI agents, capturing prompts, tool calls, file modifications, and approval workflows as they happen, before that activity goes unrecorded. It normalizes the telemetry and forwards it to SIEM platforms such as Wazuh, Elastic, or Splunk, so security teams can apply the same detection logic they already run against the rest of the fleet. The architecture is self-hosted by design: no data leaves the endpoint unless you route it there yourself. The project is early-stage; the plugin ecosystem covers the major local agent harnesses, but gaps exist for less common runtimes. Teams whose agents are not yet on the supported list have to write custom collector plugins, which means more surface area to maintain.

    Free · Open Source
  3. Context Mode Insight

    Context Mode sits between your AI coding tools and your engineering metrics, correlating actual usage patterns with sprint velocity, incident rates, and individual blockers surfaced through manager 1:1 data. The Remote MCP endpoint lets AI agents call live functions (engagement health checks, blocker detection), so a manager can ask a question in Claude and get a sourced answer instead of a stale report. The platform also generates compliance audit logs formatted for CISO reviews, which keeps security teams out of your sprint. The wall appears when your org is under 50 developers: the signal-to-noise ratio on correlations drops, and the per-seat cost structure stops making sense before the insights do.

    Paid
  4. Dify

    Open-source LLM app development platform combining AI workflow, RAG pipeline, agent capabilities, model management, observability features and more.

    Paid
  5. Flightdeck

    Every LLM call, MCP event, and tool invocation your agents make streams to a live dashboard — per-agent timelines and a fleet-wide feed, not batched logs you dig through after the incident. The vendor describes token budgets and MCP allow/block rules you set before problems hit, plus the ability to issue live directives to running agents without restarting them. The self-hosted, Apache-2.0 model means no telemetry leaves your infrastructure — critical for teams in regulated environments or those burned by SaaS observability vendors billing by event volume. The project is early-stage by star count, and the operational surface you take on by self-hosting is real.

    Free · Open Source
  6. Intencion

    No verified description is available for this listing. The page content retrieved for Intencion describes a travel photography app called Spotter, not an AI agent observability platform, so production details, integration specifics, and architectural constraints could not be sourced. The fields shown here come from AIDiveForge's structured tool data only.

    Paid
  7. Selvedge

    Selvedge is a local MCP server that AI coding agents (Claude Code, Cursor, Copilot) call as they work, logging the reasoning behind every change into a SQLite file that lives next to your code under .selvedge/. Queries are entity-scoped — you ask about users.email or deps/stripe, not line numbers — so the answer surfaces in the same terms you search in. The vendor describes zero telemetry, no accounts, and no external servers; everything stays on disk. The wall appears when your team needs cross-repo provenance or wants to pipe this data into an existing observability stack — Selvedge emits records but does not integrate with those systems out of the box.

    Free · Open Source
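
To make the entity-scoped idea in the Selvedge listing concrete, here is a minimal sketch of a reasoning log keyed by entity name rather than by line number, using Python's standard `sqlite3` module. The `changes` table, its columns, and the helper functions are hypothetical and are not Selvedge's actual schema or API. The sketch uses an in-memory database where the listing describes a file under .selvedge/.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical reasoning log; the schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS changes (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            entity    TEXT NOT NULL,   -- e.g. 'users.email' or 'deps/stripe'
            agent     TEXT NOT NULL,   -- which coding agent made the change
            reasoning TEXT NOT NULL    -- why the change was made
        )
        """
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_changes_entity ON changes(entity)")
    return conn


def log_change(conn: sqlite3.Connection, entity: str, agent: str, reasoning: str) -> None:
    """Record the reasoning behind one change to one entity."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO changes (entity, agent, reasoning) VALUES (?, ?, ?)",
            (entity, agent, reasoning),
        )


def history(conn: sqlite3.Connection, entity: str) -> list:
    """Return (agent, reasoning) pairs for an entity, oldest first."""
    rows = conn.execute(
        "SELECT agent, reasoning FROM changes WHERE entity = ? ORDER BY id",
        (entity,),
    )
    return rows.fetchall()


store = open_store()
log_change(store, "users.email", "claude-code", "Added unique index to stop duplicate signups")
log_change(store, "deps/stripe", "cursor", "Pinned version after a breaking webhook change")
log_change(store, "users.email", "copilot", "Lowercased values before comparison")
email_history = history(store, "users.email")
```

Because the lookup key is the entity name, the query still works after the code has moved or been reformatted. A line-number reference would not survive that.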
  8. Spanlens

    Spanlens sits in front of your LLM provider via a single baseURL change, recording every call's cost, latency, tokens, and full request-response body with no SDK rewrite required. Agent runs surface as waterfall span trees so you can identify the one step consuming 80% of wall-clock time. The model recommender flags GPT-4o calls that look like classification tasks and shows the cost delta if you swap — with numbers from your own traffic, not benchmarks. The eval and experiment layer lets you replay a fixed dataset across prompt versions before you ship, so quality regressions don't surprise you in production. PII scanning and anomaly detection run at log time, which matters when sensitive data crosses the wire at 3 a.m. with nobody watching.

    Paid · Open Source
  9. Voker

    Voker is a passive observability platform for conversational AI agents: it ingests chat session data, surfaces frustration patterns and knowledge gaps, and ties agent behavior to downstream metrics like conversion and retention. The self-hosted deployment path means your conversation data stays on your infrastructure, a hard requirement for many enterprise teams that competing SaaS observability tools cannot meet. The platform targets teams running at least 1,000 monthly sessions; below that threshold the pattern-detection signal is thin and the tooling is underutilized. Non-engineering teams can query agent insights without filing a ticket, which removes the bottleneck between product decisions and session data. Note: the page content retrieved for this listing did not match Voker's product, so the claims here are drawn from AIDiveForge's structured tool data.

    Paid · Free Trial · 30 days

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.