Self-Hosted Agent Frameworks

As of June 2026, AIDiveForge tracks 44 self-hosted agent frameworks. Listings are curated, verified against each tool's live website, and re-checked regularly.

Last updated June 12, 2026 · 44 tools

1. Agent Development Kit (ADK)
ADK is the open-source agent development framework that lets you build, debug, and deploy reliable AI agents at enterprise scale.
Free
2. Agent Governance Toolkit
Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents.
Free
3. agentmemory
Orbit is an open-source agent orchestration harness that wraps coding agent runs in bounded, dependency-ordered tasks, then gates task completion on real validation: tests, lint, and type checks must pass before an orbit closes. Every run produces structured JSON artifacts — agent output, rubric scores, accept/iterate/stop recommendations, and a human-readable progress log — so you have a trail to review, not just a diff to guess at. It runs against Claude, Codex, Cursor, or any agent that speaks JSON over CLI. The demo runs without an API key, which matters when you're evaluating whether it even fits your workflow. Where it strains: teams who need a web UI, multi-agent parallelism, or cloud-managed infrastructure will hit the limits of an intentionally small CLI harness fast.
Free · Open Source
4. Agnt
AGNT is a local-first agent operating system built around an AGI loop: the agent executes a step, evaluates the result, and re-plans before moving forward — without you steering each decision. Persistent memory and skill layers mean context survives across sessions, not just within a single run. The visual workflow designer handles repeatable paths; goal-mode hands the agent an objective and lets it figure out the steps. Self-hosted deployment with Docker keeps data on your own infrastructure, which matters when your legal team has opinions about where prompts and outputs live. The custom license — not OSI-standard — is the detail that stops procurement at some organizations before the first demo.
Paid · Open Source
5. AutoGPU
The repo describes autonomous agents writing RTL, running it through real EDA tools, reading timing and layout reports, and revising the design, iterating without a human in the seat for each pass. The documented target is small systolic array architectures, specifically matrix-multiply accelerators; the codebase includes ISA definitions, physical design configs, and golden reference models. At that constrained scope, researchers report the agent loop closes. Scale the design complexity beyond what the existing module hierarchy covers and the agents lose the plot: the feedback loops that work for a MAC array do not generalize to a multi-block SoC. Teams pushing past the documented scope end up writing their own agent scaffolding on top, at which point AutoGPU is a reference rather than a runtime.
Free · Open Source
6. AutoLang
Orbit wraps each agent run in a bounded loop: it pulls one task from a dependency-ordered backlog, hands it to whatever agent you've wired up, runs tests, lint, and type checks, and refuses to close the task until validation passes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, whether a human should accept or re-queue. That audit trail is the point. The ceiling appears when your workflow needs anything beyond task-level sequencing: parallel agent execution, real-time dashboards, or integration with existing CI pipelines requires you to build the glue yourself.
Free · Open Source
7. Browser Use
Browser Use is an open-source Python library for autonomous web task automation using LLMs and computer vision. Teams use it to extract competitive data, fill forms at scale, and monitor page changes across hundreds of sites. The tool hits 89.1% success on standard benchmarks and comes with stealth browser support, CAPTCHA solving, and residential proxies across 195+ countries. The vendor also runs a cloud infrastructure option alongside the self-hosted library. Most production teams pair it with managed browser infrastructure and human approval gates for financial or sensitive actions. The sharp edge: LLMs can't reliably distinguish user instructions from webpage content, leaving agents vulnerable to indirect prompt injection attacks that succeed 24% of the time without defenses.
Paid · Open Source
8. Conversations in AI Coding Agent
Orbit is an MIT-licensed, self-hosted harness that wraps a coding agent run in a bounded loop: it selects a task from a dependency-ordered backlog, hands off to whatever agent you plug in, runs tests and lint as a hard gate, and writes structured JSON artifacts that record exactly what happened. Every closed orbit leaves four files — agent output, rubric scoring, an accept-or-iterate recommendation, and a human-readable progress log. The demo runs without an API key, which means you can verify the mechanics before committing any credentials. The harness is agent-neutral by design; the vendor page cites Claude, Codex, and Cursor as examples. Where it shows its seams: Orbit is intentionally small, so teams needing a hosted dashboard, team-level access controls, or CI/CD pipeline integration will be writing that glue themselves.
Free · Open Source
9. CopilotKit
The core model is a React and Angular SDK that connects your existing frontend to whatever agent backend you're already running — LangChain, CrewAI, or a custom setup — via the AG-UI protocol, a bi-directional event stream the vendor describes as 'the general-purpose connection between a user-facing application and any agentic backend.' Agents render rich UI cards, forms, and widgets inline as they work, not just text responses. Thread and state persistence is handled automatically across sessions. The friction point arrives when your deployment target isn't a web surface: Slack and Teams connections are flagged as early access, which means you're betting on a roadmap, not a shipping feature. Teams with strict approval gates before agent actions can wire those checkpoints in, but the docs describe this as a configuration responsibility rather than a built-in guardrail system.
Paid · Open Source
10. CrewAI
CrewAI helps enterprises operate teams of AI agents that perform complex tasks autonomously, reliably and with full control. The open-source framework (free, self-hosted) defines agents with roles, goals, and backstories, orchestrating them through tasks; the paid AMP adds a visual Studio, deployment infrastructure, tracing, guardrails, and enterprise features. The framework was rebuilt from scratch to remove the LangChain dependency; as of v1.14, it's fully standalone and works with any LLM provider. It's used by nearly half of the Fortune 500. But production friction is real: common Reddit advice is to start with CrewAI for speed and migrate to LangGraph when you hit scaling limits, which is reasonable for most projects. Users report that enthusiasm evaporates when running repeatedly on multiple components, and that executing large SELECT queries overflows the LLM context window.
Paid · Open Source
11. DataGrout Invariant
DataGrout AI's platform is built to govern agents that run across enterprise systems — CRM, ERP, accounting — where an uncontrolled action has a real cost. The vendor describes deterministic execution controls, hallucination prevention, persistent memory across sessions, and audit trails that satisfy compliance review. Observability and cost tracking are positioned as first-class features, not add-ons, so teams can see which agent step burned the most tokens before the bill arrives. The self-hosted option matters for regulated industries where data cannot leave the perimeter. Where the platform has less evidence behind it: community reports and independent benchmarks are scarce, which makes it harder to verify the hallucination reduction claims at scale before you commit.
Paid
12. Dify
Open-source LLM app development platform combining AI workflow, RAG pipeline, agent capabilities, model management, observability features and more.
Paid
13. Eidentic
The SDK centers on a temporal knowledge graph that tracks when facts were true, resolves contradictions, and consolidates between sessions — so the agent sharpens over time rather than accumulating noise. Durable runs, enforced cost ceilings, and CI-gated evals ship as part of the core, not as paid add-ons. The vendor benchmarks report 55.2% on LongMemEval versus 41.0% for full-context stuffing, and claims up to 39× fewer tokens per query. The gap shows up in support and long-running assistant workflows where session history compounds. At v0.1, the ecosystem is early — teams building anything outside the TypeScript path face a hard stop.
Free · Open Source
14. Elysia
An open-source framework that spins up an end-to-end agentic RAG application with just two terminal commands.
Free
15. Enforra
Orbit is a harness that wraps AI coding agents — Claude, Codex, Cursor, any JSON-speaking CLI — in a bounded task loop: the agent runs, tests and lint decide whether the work passes, and every run leaves inspectable JSON artifacts whether it succeeds or fails. The evidence trail is the product. You get structured output describing what the agent returned, rubric scoring for task focus and diff signal, and a human-readable progress log. Where it breaks: Orbit does not plan, does not write tasks, and does not decide what to build next — it validates and records what other agents attempt. Teams that need autonomous end-to-end execution will hit that ceiling immediately.
Free · Open Source
16. Enju
Orbit structures agent work into discrete, dependency-ordered loops: one task per run, deterministic validation gates, and four output artifacts that record exactly what the agent returned, how the run scored against a rubric, and what should happen next. The demo runs without an API key, which means you can evaluate the harness itself before spending a single token. Where it gets constrained: Orbit is a harness, not a scheduler — it does not autonomously drive through a backlog or retry failed orbits on its own. Teams wiring it into CI pipelines write the outer loop themselves.
Free · Open Source
17. FalsifyLab Alpha
The vendor describes FalsifyLab Pro as an MCP server deployable inside Claude Code, Cursor, Cline, or Windsurf, where agents autonomously call tools to pull SEC filings, DeFi vault yields, whale wallet positions, and live macro tape — SPX, VIX, on-chain signals. The free tier returns cached data with rate limits, which is enough to validate a workflow but not enough for production research latency. The Pro subscription unlocks live feeds. Self-hosted deployment is available via PyPI, so teams with data-residency requirements can run it without routing signals through vendor infrastructure. The ceiling appears when research logic grows complex: the tool surfaces data, but multi-step branching across asset classes still lives in your agent scaffolding, not inside FalsifyLab.
Paid · Free Trial (7 days)
18. Genomi
The core workflow is four steps: install the agent harness, point it at your raw genome file on disk, build a local SQLite index, then ask questions through whichever AI agent you already run — Claude Code, Cursor, Gemini CLI, Goose, and others are listed as compatible. Pharmacogenomics, carrier status, polygenic risk scores, nutrigenomics, and ancestry PCA projection are all covered through distinct skill modules backed by ClinVar, PharmCAT, PGS Catalog, HPO, GenCC, and 1000 Genomes reference data. The privacy architecture is explicit: raw genome data stays on disk, and only the specific evidence snippets relevant to a query cross the boundary to whatever LLM handles the response. The vendor marks this as experimental and not for clinical use — which means researchers and privacy-conscious individuals exploring personal data are the intended audience, not clinical teams expecting diagnostic-grade output.
Free · Open Source
19. Goose
Open-source local-first AI agent framework for automating complex tasks with any LLM provider.
Free
20. Hermes Agent
Self-improving open-source AI agent with persistent memory, skill learning, and multi-platform access.
Free
21. Hermes Agent
The agent lives on your server — not a vendor's — and connects to Telegram, Discord, Slack, WhatsApp, Signal, and email simultaneously, so the same agent handles a Slack request in the morning and a scheduled backup at night. Persistent memory and auto-generated skills mean it accumulates institutional knowledge over time rather than starting cold on each invocation. Real sandboxing across Docker, SSH, Singularity, Modal, and local backends means you can isolate risky tasks without routing them through a third party. The ceiling appears when you need managed reliability guarantees: at v0.16.0 this is early-stage software, and self-hosted operations teams carry full responsibility for uptime, credential management, and model API costs. Teams that need SLA-backed infrastructure typically wire Hermes into a managed hosting layer — which adds operational overhead the framework itself does not absorb.
Free · Open Source
22. Hugging Face Spaces
Orbit acts as a harness around any JSON-speaking coding agent — Claude, Codex, Cursor, or others — running one task per cycle, executing tests and lint checks to decide whether the work advances, and writing structured JSON artifacts for every run. The dependency-aware backlog keeps each task bounded so agents do not drift across scope. Where it breaks: Orbit is intentionally minimal, so teams expecting a hosted dashboard, a GUI, or built-in agent adapters beyond CLI-level integration will build those layers themselves. The artifact trail is machine-readable JSON and a markdown log — useful for audits, not for a non-technical stakeholder who needs a summary.
Free · Open Source
23. Kikubot
Each Kikubot container polls one IMAP mailbox, feeds incoming email into an LLM agentic loop with a configured tool set, and replies over SMTP. Multi-agent workflows emerge naturally: a coordinator agent emails specialists, specialists reply, threads become the audit trail. The architecture requires a running mail server, which adds operational surface area before a single agent does anything useful. Teams with no existing mail infrastructure will spend more time on SMTP/IMAP setup than on agent logic. When the email-as-bus metaphor stops fitting — high-frequency tasks, sub-second latency requirements, or webhooks that can't wait for a polling interval — this architecture forces a full redesign.
Free · Open Source
24. Langflow
Open-source visual builder for constructing AI agents and RAG applications via drag-and-drop interface with Python extensibility.
Paid · Open Source
25. LocalFlow
The core loop is deliberately small: Orbit selects one dependency-ordered task, hands it to whichever coding agent you wire in, runs tests, lint, and type checks, and only closes the task if the agent can prove the work passed. Every run produces four artifact files — structured result JSON, rubric-scored evaluation, a review recommendation, and a human-readable progress log. That paper trail is what lets you compare two agents on the same task by diffing artifacts instead of re-running demos. The harness runs locally with no API key required for the replay demo, so there is nothing to provision before you can see it work. The ceiling appears fast on non-coding tasks — Orbit is built for code-output validation and nothing else.
Free · Open Source
26. MemPalace
Orbit wraps agent runs in bounded loops: it selects one dependency-ordered task, hands it to your agent, runs tests and lint and type checks, and only marks work complete if validation passes. Every run produces structured JSON artifacts and a human-readable progress log, so you are reviewing evidence instead of trusting output. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare structured artifacts across runs. The tool is intentionally small — it handles the validation harness, not the full development lifecycle. Teams with sparse test coverage will find the validation gates have nothing to enforce.
Free · Open Source
27. Microsoft Agent Framework
A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.
Free
28. Mind-expander
The agent drives the canvas: it can run `npx mind-expander` in the background, load skill integrations, and build guided tours through architecture. You see the same graph the agent is reasoning about, which means review decisions and refactor plans are grounded in actual dependency structure, not the agent's approximation of it. That shared view is the differentiator. The ceiling arrives with language support: Rust and TypeScript are covered, and the docs describe more language frontends as planned. Teams whose core services are in Go, Python, or Java will hit that wall on day one.
Free · Open Source
29. Mnemo
Orbit wraps each agent run in a bounded loop: it selects a dependency-ordered task from your backlog, hands it to whichever coding agent you point at it, then runs tests, lint, and type checks before the task is allowed to close. Every run leaves structured JSON artifacts — what the agent returned, how the output scored against a rubric, and a human-readable recommendation to accept, iterate, or stop. The agent-neutral contract means you can swap Claude for Codex behind the same harness and compare artifacts instead of gut feelings. Where Orbit hits its ceiling: it is a harness, not a planner, so teams that need autonomous task decomposition or cross-repo coordination will be adding that layer themselves.
Free · Open Source
30. NanoClaw
NanoClaw is a lightweight, open-source personal AI agent that runs on your own machine, connects to messaging apps like WhatsApp, Telegram, Slack, Discord, and Signal, and is built around just 15 source files you can read in a single sitting.
Free
31. Nightwatch
The agent runs a ReAct loop: it calls tools against your live infrastructure — Kubernetes, Docker, AWS, Grafana, GitHub — reasons over what it finds, and produces ranked remediation proposals that sit in a queue waiting for your sign-off before anything touches production. Read-only investigation is the hard constraint by design, which means the agent cannot act unilaterally. That boundary is a feature for regulated or risk-averse teams and a ceiling for teams that want closed-loop auto-remediation. Self-hosted and air-gap friendly, with local inference support, it fits environments where data never leaves the building.
Paid · Open Source
32. OpenAgents
OpenAgents positions itself as the coordination backbone for distributed AI agents. You get a hosted workspace (or self-host) where agents working on separate machines discover each other, share files and browser context, and coordinate via @mentions. Installation is a one-liner: install the Launcher desktop app, point agents at a workspace token, and they join. The platform is open-source with an active but modest community. The technical surface is clean: agents register on the network, events flow between them, and context stays shared. The hard part surfaces later: when your agents are actually doing different things (some coding, some reviewing, some managing), orchestrating handoffs stays manual. This is SDK-first, not no-code. If you're building a research team of specialized agents or debugging scenarios where you need human eyes on agent reasoning in real time, the shared workspace genuinely reduces context switching. If you're running a single coding agent that sometimes needs to call another agent, you might be over-engineering it.
Free · Open Source
33. OpenFang
An open-source Agent Operating System built from scratch in Rust, designed to run autonomous agents on schedules.
Free
34. Patina
Orbit wraps each agent task in a bounded loop: the agent works, validation runs (tests, lint, type checks), and the task only closes when the checks pass. Every loop leaves structured JSON artifacts — what the agent returned, how it scored against a rubric, and a human-readable recommendation to accept, retry, or stop. This makes agent runs auditable after the fact, not just observable in the moment. The ceiling appears when your project needs multi-agent coordination or a hosted execution layer — Orbit is deliberately narrow, self-hosted only, and ships no managed runtime.
Free · Open Source
35. Preseason.ai
Orbit sits between your backlog and your coding agent, selecting one dependency-ordered task at a time, running the agent, then forcing the result through tests, lint, and type checks before marking the task done. Every run writes structured JSON artifacts — what the agent returned, how the output scored against a rubric, whether a human should accept or iterate — so you are reviewing evidence, not trusting a diff. The agent-neutral contract means you can run Claude, Codex, and Cursor against the same task and compare artifacts instead of impressions. The harness is intentionally minimal; it does not schedule, it does not host, and it does not manage secrets — which means the moment your workflow needs cross-repo coordination or cloud execution, you are writing the glue yourself.
Free · Open Source
36. ProData AI
Orbit is an open-source harness that wraps AI coding agent runs in a fixed loop: pick a task from a dependency-ordered backlog, run the agent, validate the output against tests, lint, and type checks, then record structured evidence before the task closes. Nothing advances without proof. Each run produces four artifact files — agent output, rubric scores, a recommendation, and a human-readable log — so you can inspect exactly what happened without replaying the whole session. The harness is agent-neutral; Claude, Codex, Cursor, or any JSON-speaking CLI plugs in behind the same contract. The ceiling appears quickly on teams who need anything beyond the validation-gate model — custom orchestration, parallel agent execution, or UI-driven workflow design are not in scope.
Free · Open Source
37. RoBrain
RoBrain sits between your team's AI coding tools — Claude Code, Cursor, Copilot, Codex CLI — and a shared Postgres instance, capturing not just decisions but the alternatives your team ruled out. An MCP server runs inside the editor and surfaces relevant history before the agent acts; a batch Synthesis scan reads the whole corpus on a schedule to flag contradictions and drift that no single session would catch. That cross-session contradiction detection is where it separates from alternatives that only check at insertion time or silently delete the losing decision. Self-hosted on Apache 2.0 with your own Postgres; cloud extraction and the Planning API are paid-only features.
Paid · Open Source
38. RunbookHermes
The agent runs multi-signal diagnosis across observability data, builds a root-cause hypothesis, and generates or updates runbooks from what it learns — so the next incident with the same failure pattern starts from a documented baseline instead of a blank slate. The approval-gated remediation workflow means automated action doesn't ship without a reviewer, which matters when the blast radius is a production service. Where it breaks: the repo is five commits deep with zero open issues, which signals early-stage software, not battle-hardened infrastructure. Teams with complex multi-service topologies will hit integration gaps before the agent's reasoning does. Self-hosting is required, so operationalizing this adds a deployment and maintenance surface your platform team owns.
Free · Open Source
39. Skawld
The SDK runs on Node.js 18+ and Bun 1.1+ as an ESM-only package, so it fits cleanly into modern TypeScript projects without a build-step fight. The vendor describes a minimal setup as a single `Agent` instantiation with a provider, a tool set, and a session — you are running a streaming agent loop in under a dozen lines. Where it starts to strain is on the documentation side: the README is thin, full docs live off-repo at skawld.com/docs, and community reports are sparse given the early star count. Teams who need battle-tested enterprise support or a large ecosystem of pre-built integrations will hit that ceiling fast.
Free · Open Source
40. Tab Council
Orbit wraps agent coding work in a bounded loop: it selects a dependency-ordered task, hands it to whichever agent you've wired up, then requires passing tests, lint, and type checks before the task closes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, and a human-readable progress log. Nothing advances on the agent's word alone. The ceiling appears when your workflow needs anything beyond single-task validation loops: multi-repo coordination, branching logic between tasks, or a hosted dashboard for non-engineering stakeholders all require you to build on top of Orbit yourself.
Free · Open Source
41. Tabby
Open-source, self-hosted AI coding assistant with code completion, chat, and agentic automation.
Free
42. Thunderbolt
Open-source, self-hosted enterprise AI client emphasizing data sovereignty and model choice.
Paid
43. Vmette
The threat model vmette solves is concrete: prompt injection on a fetched web page, a malicious package in an AI-suggested install, or model output that does something you didn't intend — all of it lands inside the VM, not on your host. The isolation is hardware-level, not a container namespace that a determined process can escape. Because everything runs on-device, no agent output leaves your machine to a third-party cloud sandbox. The ceiling appears at the edges: vmette is macOS-only, and teams whose agents need to run on Linux servers or in CI pipelines will need a different isolation strategy.
Free · Open Source
44. Z3r0
Z3r0 is an open-source, self-hosted workbench where a coordinating agent (Z3r0/CSO) delegates to five specialist agents — code audit, recon, exploitation validation, reverse engineering, and cryptography — each scoped to a defined domain. Sessions run against a PostgreSQL-backed timeline log with replay, so long engagements survive interruptions and context window rollovers. WorkProject records tie every finding to authorized scope, targets, and sandbox bindings, which means the evidence chain stays intact when the model context doesn't. The wall appears when your engagement requires a specialist task not covered by the six fixed roles — there is no agent plugin system described in the docs, so teams extending scope are writing new agents from scratch.
Free · Open Source
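
Several listings above describe the same validation-gated task loop: pick one dependency-ordered task, hand it to an agent, and close the task only if the checks pass, leaving a structured record either way. The sketch below illustrates that pattern only. The names `Task`, `next_ready_task`, and `run_orbit` are hypothetical, the agent and checks are stand-in callables, and none of this is the actual API of any tool listed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    # Hypothetical backlog entry: a name plus the names of tasks it depends on.
    name: str
    depends_on: list[str] = field(default_factory=list)
    done: bool = False


def next_ready_task(backlog: list[Task]) -> Optional[Task]:
    """Return the first open task whose dependencies are all closed."""
    closed = {t.name for t in backlog if t.done}
    for task in backlog:
        if not task.done and all(dep in closed for dep in task.depends_on):
            return task
    return None


def run_orbit(
    backlog: list[Task],
    agent: Callable[[Task], str],
    checks: dict[str, Callable[[str], bool]],
) -> Optional[dict]:
    """Run one bounded loop: one task, one agent call, one validation gate."""
    task = next_ready_task(backlog)
    if task is None:
        return None  # nothing is ready, or the backlog is finished

    output = agent(task)
    # Every check runs, so the record shows which gates failed, not just the first.
    results = {name: check(output) for name, check in checks.items()}
    passed = all(results.values())

    # The task closes only when every check passed.
    task.done = passed
    return {
        "task": task.name,
        "agent_output": output,
        "checks": results,
        "recommendation": "accept" if passed else "iterate",
    }


if __name__ == "__main__":
    backlog = [Task("parser"), Task("cli", depends_on=["parser"])]
    artifact = run_orbit(
        backlog,
        agent=lambda t: f"patch for {t.name}",  # stand-in for a real agent call
        checks={"tests": lambda out: True, "lint": lambda out: "patch" in out},
    )
    print(artifact)
```

In a real setup the stand-in checks would shell out to a test runner and linter, and the returned dictionary would be written to disk as the run's artifact. As several blurbs note, an outer loop that keeps calling `run_orbit` until it returns `None` is something teams write themselves.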

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.