AIDiveForge

License: MIT (any use, including commercial)
Local-run terms: MIT-licensed source code may be freely used, modified, and distributed commercially or non-commercially, with retention of copyright and license notices.

RunbookHermes

Free, Open Source, API, Self-Hosted, Agentic

Pricing

Model
Free

Summary

Incident response falls apart when the gap between 'something is wrong' and 'we know why' takes longer than the outage itself — and most on-call tooling just pages people faster without doing the diagnosis work. RunbookHermes is an MIT-licensed AIOps agent that closes that gap by autonomously correlating metrics, logs, and traces, proposing evidence-backed remediation, and requiring a human sign-off before anything executes.

The agent runs multi-signal diagnosis across observability data, builds a root-cause hypothesis, and generates or updates runbooks from what it learns, so the next incident with the same failure pattern starts from a documented baseline instead of a blank slate. The approval-gated remediation workflow means automated action doesn't ship without a reviewer, which matters when the blast radius is a production service.

Where it breaks: the repo is five commits deep with no closed issues, which signals early-stage software, not battle-hardened infrastructure. Teams with complex multi-service topologies will likely hit integration gaps before they hit the limits of the agent's reasoning. Self-hosting is required, so operationalizing this adds a deployment and maintenance surface your platform team owns.

Bottom line: Pick RunbookHermes for an SRE team that wants an autonomous first-responder to triage and document incidents while a human stays in the loop — but expect to build integrations yourself if your observability stack is anything beyond what the repo ships with.

Strengths

  • Evidence-driven root-cause hypothesis before remediation is proposed, so the on-call engineer reviews a reasoned diagnosis instead of raw signal noise. Sign-off can then rest on checking the presented evidence rather than on an independent investigation.
  • Approval-gated execution model, so automated remediation actions cannot ship to production without a reviewer in the loop. This avoids the class of incidents caused by runaway automation acting on a misdiagnosis.
  • Runbook generation and learning from live incidents, so operational knowledge accumulates in structured documentation rather than living exclusively in the memory of whoever was paged. This matters when the person who handled the last incident is on vacation for the next one.
  • MIT license with full self-hosted deployment, so the agent and its incident data stay inside your own infrastructure. This removes the vendor-access and data-residency concerns that block AIOps adoption in regulated environments.
  • Multi-signal ingestion across metrics, logs, and traces, so the agent correlates evidence across observability layers rather than diagnosing from a single data source. This reduces false-positive root-cause conclusions drawn from incomplete signal.

Limitations

  • The repository has five commits and no closed issues, which means there is no public evidence of the agent performing correctly under real production incident load. Teams that need a vetted tool will have to run their own failure-mode testing before trusting it on a live on-call rotation.
  • Integration coverage is bounded by what the observability MCP toolserver ships with. Teams running Datadog, Honeycomb, or custom telemetry pipelines that fall outside that surface will write and maintain their own integration connectors, at which point they own a non-trivial piece of the agent's input layer.
  • There is no community or commercial support path documented in the repo. When the agent produces a wrong root-cause hypothesis or the approval workflow misbehaves at 3 AM, the escalation path is the GitHub repo and whatever institutional knowledge your team has built. Teams that require SLA-backed support or vendor escalation are better served by a commercial AIOps platform.

About

Platforms: Linux, macOS, Docker, Kubernetes
API Available: Yes
Self-Hosted: Yes
Last Updated: 2026-06-09T05:56:26.804Z

Best For

Who it's for

  • Organizations seeking autonomous incident response with human oversight
  • Teams wanting to reduce MTTR while maintaining safety
  • Engineering cultures that treat incidents as learning opportunities
  • Multi-service deployments with observability data integration
  • SRE and Platform Engineering teams

What it does well

  • Production incident response and root-cause analysis
  • Evidence-driven remediation with human approval gates
  • Automated runbook generation and SRE knowledge capture
  • Multi-signal incident diagnosis from metrics, logs, and traces
  • Team training on fault patterns and operational procedures

Integrations

Prometheus, Loki, Alertmanager, Feishu, WeCom, OpenAI-compatible model providers, Hermes Agent ecosystem

Frequently Asked Questions

Is RunbookHermes free?
Yes — RunbookHermes is fully free to use. There is no paid tier.
Is RunbookHermes open source?
Yes. RunbookHermes is open source.
Does RunbookHermes have an API?
Yes. RunbookHermes exposes a developer API. See the official documentation at https://github.com/tommy-yw/runbookhermes for details.
Can I self-host RunbookHermes?
Yes. RunbookHermes supports self-hosting on your own infrastructure.
What platforms does RunbookHermes support?
RunbookHermes is available on: Linux, macOS, Docker, Kubernetes.

RunbookHermes

Incident diagnosis is the part of on-call that burns people out: pulling signals from three different dashboards at 2 AM, manually correlating a latency spike in traces with a log error from ten minutes earlier, then writing up a postmortem that nobody reads before the same pattern hits again. RunbookHermes addresses this as a Hermes-native AIOps agent: it autonomously ingests multi-signal observability data — metrics, logs, and traces — constructs an evidence-driven root-cause hypothesis, proposes a remediation action, and waits for a human to approve before executing. The runbook it produces from that incident becomes the starting point for the next one.
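
The manual correlation step described above (matching a latency spike in traces to a log error from ten minutes earlier) can be illustrated with a minimal sketch. This is not RunbookHermes code: the `Signal` type, the `correlate` function, the service names, and the sample events are all hypothetical, and a real agent would weigh far more than timestamp proximity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    """Hypothetical normalized observability event (not a RunbookHermes type)."""
    source: str          # "metrics", "logs", or "traces"
    service: str
    timestamp: datetime
    detail: str


def correlate(anchor: Signal, candidates: list[Signal], window: timedelta) -> list[Signal]:
    """Return candidates on the anchor's service that occurred at most
    `window` before the anchor, oldest first."""
    related = [
        s for s in candidates
        if s.service == anchor.service
        and timedelta(0) <= anchor.timestamp - s.timestamp <= window
    ]
    return sorted(related, key=lambda s: s.timestamp)


# Illustrative data only: a trace-level latency spike and three earlier events.
spike = Signal("traces", "checkout", datetime(2025, 1, 1, 2, 10), "p99 latency spike")
events = [
    Signal("logs", "checkout", datetime(2025, 1, 1, 2, 0), "connection pool exhausted"),
    Signal("logs", "search", datetime(2025, 1, 1, 2, 5), "cache miss storm"),
    Signal("metrics", "checkout", datetime(2025, 1, 1, 1, 0), "deploy marker"),
]

evidence = correlate(spike, events, window=timedelta(minutes=15))
for s in evidence:
    print(f"{s.timestamp:%H:%M} [{s.source}] {s.detail}")
```

With a 15-minute window, only the checkout log error from ten minutes before the spike is kept; the event on another service and the hour-old deploy marker are filtered out.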

The defining feature is the approval-gated remediation loop. The agent does not act autonomously end-to-end — it reasons and proposes, then the reviewer decides. This is architecturally meaningful for production environments where autonomous execution without oversight is a liability, not a feature. Combined with runbook learning, the system is designed to accumulate operational knowledge from real incidents rather than requiring an SRE team to maintain documentation separately from the work that generates it.
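
The propose-then-approve pattern can be sketched as a small state machine. This is a generic illustration of an approval gate under assumed names (`Proposal`, `Status`, `review`, `execute`), not the project's actual workflow or API; the point is only that execution refuses to run until a reviewer has approved.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class Proposal:
    """Hypothetical remediation proposal produced by an agent."""
    hypothesis: str
    action: str
    evidence: list[str] = field(default_factory=list)
    status: Status = Status.PROPOSED
    reviewer: Optional[str] = None


def review(p: Proposal, reviewer: str, approve: bool) -> None:
    """Record a human decision; only a pending proposal can be reviewed."""
    if p.status is not Status.PROPOSED:
        raise ValueError("only a pending proposal can be reviewed")
    p.reviewer = reviewer
    p.status = Status.APPROVED if approve else Status.REJECTED


def execute(p: Proposal, run: Callable[[str], None]) -> None:
    """Run the action only if a reviewer approved it."""
    if p.status is not Status.APPROVED:
        raise PermissionError("remediation requires reviewer approval")
    run(p.action)
    p.status = Status.EXECUTED


executed: list[str] = []  # stands in for a real executor in this sketch
proposal = Proposal(
    hypothesis="connection pool exhausted on checkout",
    action="restart checkout pods",
    evidence=["log: pool exhausted", "trace: p99 spike"],
)

try:
    execute(proposal, executed.append)  # attempted before any review
    blocked = False
except PermissionError:
    blocked = True

review(proposal, reviewer="oncall-sre", approve=True)
execute(proposal, executed.append)
```

Putting the gate inside `execute` rather than in the caller means an unapproved action fails closed even if a code path forgets to check the status first.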

RunbookHermes fits SRE and platform engineering teams who want to reduce mean time to resolution without removing human judgment from the remediation step. The repo includes a TUI gateway, web interface, observability MCP toolserver, ACP adapter, and plugin/skills architecture — indicating a modular design that supports extension. What the repo does not yet show is a track record at scale: with five commits and no closed issues, the gap between the architectural intent and production-hardened behavior is unknown. Teams running heterogeneous observability stacks should audit the integrations directory carefully before committing to this as a production dependency.

The project ships with Docker support, a Nix environment, Homebrew packaging, and an example environment configuration, so the self-hosting path is documented. The observability MCP toolserver in the repo is the integration surface for connecting the agent to live telemetry. The vendor describes the project as Hermes-native, meaning it is purpose-built around the Hermes Agent framework rather than layered on top of a generic agent SDK.
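
For teams whose telemetry falls outside the shipped integrations, a custom connector usually reduces to one narrow contract: fetch recent records for a service. The sketch below shows one way such a contract might look. The `TelemetrySource` protocol, `InMemorySource`, `gather`, and the record fields are assumptions made for illustration; they are not taken from the RunbookHermes repo or from the MCP toolserver's real interface.

```python
from typing import Iterable, Protocol


class TelemetrySource(Protocol):
    """Hypothetical connector contract (an assumption, not the project's API)."""
    name: str

    def fetch(self, service: str, minutes: int) -> Iterable[dict]:
        """Return records for `service` from the last `minutes` minutes."""
        ...


class InMemorySource:
    """Stand-in backend for the sketch; a real connector would call a vendor API."""

    def __init__(self, name: str, records: list[dict]) -> None:
        self.name = name
        self.records = records

    def fetch(self, service: str, minutes: int) -> Iterable[dict]:
        return [
            r for r in self.records
            if r["service"] == service and r["age_minutes"] <= minutes
        ]


def gather(sources: list[TelemetrySource], service: str, minutes: int) -> dict[str, list[dict]]:
    """Collect recent records from every source, keyed by source name."""
    return {src.name: list(src.fetch(service, minutes)) for src in sources}


logs = InMemorySource("logs", [
    {"service": "checkout", "age_minutes": 10, "message": "connection pool exhausted"},
    {"service": "search", "age_minutes": 5, "message": "cache miss storm"},
])
metrics = InMemorySource("metrics", [
    {"service": "checkout", "age_minutes": 70, "message": "deploy marker"},
])

gathered = gather([logs, metrics], service="checkout", minutes=15)
```

Keeping the contract this small is a design choice for the sketch: each new backend needs only a `name` and a `fetch` method, which limits how much of the input layer a team has to maintain per connector.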