Open Source Agent Frameworks

As of June 2026, AIDiveForge tracks 32 open source agent frameworks. Curated open source agent frameworks tracked by AIDiveForge. Each project has a verified public source repository. Listings are verified against each tool's live website and re-checked regularly.

Last updated June 12, 2026 · 32 tools

1. agentmemory
Orbit is an open-source agent orchestration harness that wraps coding agent runs in bounded, dependency-ordered tasks, then gates task completion on real validation: tests, lint, and type checks must pass before an orbit closes. Every run produces structured JSON artifacts — agent output, rubric scores, accept/iterate/stop recommendations, and a human-readable progress log — so you have a trail to review, not just a diff to guess at. It runs against Claude, Codex, Cursor, or any agent that speaks JSON over CLI. The demo runs without an API key, which matters when you're evaluating whether it even fits your workflow. Where it strains: teams who need a web UI, multi-agent parallelism, or cloud-managed infrastructure will hit the limits of an intentionally small CLI harness fast.
FreeOpen Source
2. Agnt
AGNT is a local-first agent operating system built around an AGI loop: the agent executes a step, evaluates the result, and re-plans before moving forward — without you steering each decision. Persistent memory and skill layers mean context survives across sessions, not just within a single run. The visual workflow designer handles repeatable paths; goal-mode hands the agent an objective and lets it figure out the steps. Self-hosted deployment with Docker keeps data on your own infrastructure, which matters when your legal team has opinions about where prompts and outputs live. The custom license — not OSI-standard — is the detail that stops procurement at some organizations before the first demo.
PaidOpen Source
3. AutoGPU
The repo describes autonomous agents writing RTL, running it through real EDA tools, reading timing and layout reports, and revising the design — iterating without a human in the seat for each pass. The documented target is small systolic array architectures, specifically matrix-multiply accelerators; the codebase includes ISA definitions, physical design configs, and golden reference models. At that constrained scope, researchers report the agent loop closes. Scale the design complexity beyond what the existing module hierarchy covers and the agents lose the plot — the feedback loops that work for a mac array do not generalize to a multi-block SoC. Teams pushing past the documented scope end up writing their own agent scaffolding on top, at which point AutoGPU is a reference rather than a runtime.
FreeOpen Source
4. AutoLang
Orbit wraps each agent run in a bounded loop: it pulls one task from a dependency-ordered backlog, hands it to whatever agent you've wired up, runs tests, lint, and type checks, and refuses to close the task until validation passes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, whether a human should accept or re-queue. That audit trail is the point. The ceiling appears when your workflow needs anything beyond task-level sequencing: parallel agent execution, real-time dashboards, or integration with existing CI pipelines requires you to build the glue yourself.
FreeOpen Source
5. Browser Use
Browser Use is an open-source Python library for autonomous web task automation using LLMs and computer vision. Teams use it to extract competitive data, fill forms at scale, and monitor page changes across hundreds of sites. The tool hits 89.1% success on standard benchmarks and comes with stealth browser support, CAPTCHA solving, and residential proxies across 195+ countries. The vendor also runs a cloud infrastructure option alongside the self-hosted library. Most production teams pair it with managed browser infrastructure and human approval gates for financial or sensitive actions. The sharp edge: LLMs can't reliably distinguish user instructions from webpage content, leaving agents vulnerable to indirect prompt injection attacks that succeed 24% of the time without defenses.
PaidOpen Source
6. Conversations in AI Coding Agent
Orbit is an MIT-licensed, self-hosted harness that wraps a coding agent run in a bounded loop: it selects a task from a dependency-ordered backlog, hands off to whatever agent you plug in, runs tests and lint as a hard gate, and writes structured JSON artifacts that record exactly what happened. Every closed orbit leaves four files — agent output, rubric scoring, an accept-or-iterate recommendation, and a human-readable progress log. The demo runs without an API key, which means you can verify the mechanics before committing any credentials. The harness is agent-neutral by design; the vendor page cites Claude, Codex, and Cursor as examples. Where it shows its seams: Orbit is intentionally small, so teams needing a hosted dashboard, team-level access controls, or CI/CD pipeline integration will be writing that glue themselves.
FreeOpen Source
7. CopilotKit
The core model is a React and Angular SDK that connects your existing frontend to whatever agent backend you're already running — LangChain, CrewAI, or a custom setup — via the AG-UI protocol, a bi-directional event stream the vendor describes as 'the general-purpose connection between a user-facing application and any agentic backend.' Agents render rich UI cards, forms, and widgets inline as they work, not just text responses. Thread and state persistence is handled automatically across sessions. The friction point arrives when your deployment target isn't a web surface: Slack and Teams connections are flagged as early access, which means you're betting on a roadmap, not a shipping feature. Teams with strict approval gates before agent actions can wire those checkpoints in, but the docs describe this as a configuration responsibility rather than a built-in guardrail system.
PaidOpen Source
8. CrewAI
CrewAI helps enterprises operate teams of AI agents that perform complex tasks autonomously, reliably and with full control. The open-source framework (free, self-hosted) defines agents with roles, goals, and backstories, orchestrating them through tasks; the paid AMP adds a visual Studio, deployment infrastructure, tracing, guardrails, and enterprise features. The framework was rebuilt from scratch to remove LangChain dependency; as of v1.14, it's fully standalone and works with any LLM provider. It's used by nearly half of the Fortune 500. But production friction is real: common Reddit advice is to start with CrewAI for speed and migrate to LangGraph when you hit scaling limits—reasonable for most projects. Users report that enthusiasm evaporates when running repeatedly on multiple components, and executing large SELECT queries overflows the LLM context window.
PaidOpen Source
9. Eidentic
The SDK centers on a temporal knowledge graph that tracks when facts were true, resolves contradictions, and consolidates between sessions — so the agent sharpens over time rather than accumulating noise. Durable runs, enforced cost ceilings, and CI-gated evals ship as part of the core, not as paid add-ons. The vendor benchmarks report 55.2% on LongMemEval versus 41.0% for full-context stuffing, and claims up to 39× fewer tokens per query. The gap shows up in support and long-running assistant workflows where session history compounds. At v0.1, the ecosystem is early — teams building anything outside the TypeScript path face a hard stop.
FreeOpen Source
10. Enforra
Orbit is a harness that wraps AI coding agents — Claude, Codex, Cursor, any JSON-speaking CLI — in a bounded task loop: the agent runs, tests and lint decide whether the work passes, and every run leaves inspectable JSON artifacts whether it succeeds or fails. The evidence trail is the product. You get structured output describing what the agent returned, rubric scoring for task focus and diff signal, and a human-readable progress log. Where it breaks: Orbit does not plan, does not write tasks, and does not decide what to build next — it validates and records what other agents attempt. Teams that need autonomous end-to-end execution will hit that ceiling immediately.
FreeOpen Source
11. Enju
Orbit structures agent work into discrete, dependency-ordered loops: one task per run, deterministic validation gates, and four output artifacts that record exactly what the agent returned, how the run scored against a rubric, and what should happen next. The demo runs without an API key, which means you can evaluate the harness itself before spending a single token. Where it gets constrained: Orbit is a harness, not a scheduler — it does not autonomously drive through a backlog or retry failed orbits on its own. Teams wiring it into CI pipelines write the outer loop themselves.
FreeOpen Source
12. Genomi
The core workflow is four steps: install the agent harness, point it at your raw genome file on disk, build a local SQLite index, then ask questions through whichever AI agent you already run — Claude Code, Cursor, Gemini CLI, Goose, and others are listed as compatible. Pharmacogenomics, carrier status, polygenic risk scores, nutrigenomics, and ancestry PCA projection are all covered through distinct skill modules backed by ClinVar, PharmCAT, PGS Catalog, HPO, GenCC, and 1000 Genomes reference data. The privacy architecture is explicit: raw genome data stays on disk, and only the specific evidence snippets relevant to a query cross the boundary to whatever LLM handles the response. The vendor marks this as experimental and not for clinical use — which means researchers and privacy-conscious individuals exploring personal data are the intended audience, not clinical teams expecting diagnostic-grade output.
FreeOpen Source
13. Hermes Agent
The agent lives on your server — not a vendor's — and connects to Telegram, Discord, Slack, WhatsApp, Signal, and email simultaneously, so the same agent handles a Slack request in the morning and a scheduled backup at night. Persistent memory and auto-generated skills mean it accumulates institutional knowledge over time rather than starting cold on each invocation. Real sandboxing across Docker, SSH, Singularity, Modal, and local backends means you can isolate risky tasks without routing them through a third party. The ceiling appears when you need managed reliability guarantees: at v0.16.0 this is early-stage software, and self-hosted operations teams carry full responsibility for uptime, credential management, and model API costs. Teams that need SLA-backed infrastructure typically wire Hermes into a managed hosting layer — which adds operational overhead the framework itself does not absorb.
FreeOpen Source
14. Hugging Face Spaces
Orbit acts as a harness around any JSON-speaking coding agent — Claude, Codex, Cursor, or others — running one task per cycle, executing tests and lint checks to decide whether the work advances, and writing structured JSON artifacts for every run. The dependency-aware backlog keeps each task bounded so agents do not drift across scope. Where it breaks: Orbit is intentionally minimal, so teams expecting a hosted dashboard, a GUI, or built-in agent adapters beyond CLI-level integration will build those layers themselves. The artifact trail is machine-readable JSON and a markdown log — useful for audits, not for a non-technical stakeholder who needs a summary.
FreeOpen Source
15. Kikubot
Each Kikubot container polls one IMAP mailbox, feeds incoming email into an LLM agentic loop with a configured tool set, and replies over SMTP. Multi-agent workflows emerge naturally: a coordinator agent emails specialists, specialists reply, threads become the audit trail. The architecture requires a running mail server, which adds operational surface area before a single agent does anything useful. Teams with no existing mail infrastructure will spend more time on SMTP/IMAP setup than on agent logic. When the email-as-bus metaphor stops fitting — high-frequency tasks, sub-second latency requirements, or webhooks that can't wait for a polling interval — this architecture forces a full redesign.
FreeOpen Source
16. Langflow
Open-source visual builder for constructing AI agents and RAG applications via drag-and-drop interface with Python extensibility.
PaidOpen Source
17. LocalFlow
The core loop is deliberately small: Orbit selects one dependency-ordered task, hands it to whichever coding agent you wire in, runs tests, lint, and type checks, and only closes the task if the agent can prove the work passed. Every run produces four artifact files — structured result JSON, rubric-scored evaluation, a review recommendation, and a human-readable progress log. That paper trail is what lets you compare two agents on the same task by diffing artifacts instead of re-running demos. The harness runs locally with no API key required for the replay demo, so there is nothing to provision before you can see it work. The ceiling appears fast on non-coding tasks — Orbit is built for code-output validation and nothing else.
FreeOpen Source
18. MemPalace
Orbit wraps agent runs in bounded loops: it selects one dependency-ordered task, hands it to your agent, runs tests and lint and type checks, and only marks work complete if validation passes. Every run produces structured JSON artifacts and a human-readable progress log, so you are reviewing evidence instead of trusting output. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare structured artifacts across runs. The tool is intentionally small — it handles the validation harness, not the full development lifecycle. Teams with sparse test coverage will find the validation gates have nothing to enforce.
FreeOpen Source
19. Mind-expander
The agent drives the canvas: it can run `npx mind-expander` in the background, load skill integrations, and build guided tours through architecture. You see the same graph the agent is reasoning about, which means review decisions and refactor plans are grounded in actual dependency structure — not the agent's approximation of it. That shared view is the differentiator. The ceiling arrives with language support: Rust and TypeScript are covered, the docs describe more language frontends as planned. Teams whose core services are in Go, Python, or Java will hit that wall on day one.
FreeOpen Source
20. Mnemo
Orbit wraps each agent run in a bounded loop: it selects a dependency-ordered task from your backlog, hands it to whichever coding agent you point at it, then runs tests, lint, and type checks before the task is allowed to close. Every run leaves structured JSON artifacts — what the agent returned, how the output scored against a rubric, and a human-readable recommendation to accept, iterate, or stop. The agent-neutral contract means you can swap Claude for Codex behind the same harness and compare artifacts instead of gut feelings. Where Orbit hits its ceiling: it is a harness, not a planner, so teams that need autonomous task decomposition or cross-repo coordination will be adding that layer themselves.
FreeOpen Source
21. Nightwatch
The agent runs a ReAct loop: it calls tools against your live infrastructure — Kubernetes, Docker, AWS, Grafana, GitHub — reasons over what it finds, and produces ranked remediation proposals that sit in a queue waiting for your sign-off before anything touches production. Read-only investigation is the hard constraint by design, which means the agent cannot act unilaterally. That boundary is a feature for regulated or risk-averse teams and a ceiling for teams that want closed-loop auto-remediation. Self-hosted and air-gap friendly, with local inference support, it fits environments where data never leaves the building.
PaidOpen Source
22. OpenAgents
OpenAgents positions itself as the coordination backbone for distributed AI agents. You get a hosted workspace (or self-host) where agents working on separate machines discover each other, share files and browser context, and coordinate via @mentions. Installation is one-liner: install the Launcher desktop app, point agents at a workspace token, and they join. The platform is open-source with an active but modest community. The technical surface is clean—agents register on the network, events flow between them, and context stays shared. The hard part surfaces later: when your agents are actually doing different things (some coding, some reviewing, some managing), orchestrating handoffs stays manual. This is SDK-first, not no-code. If you're building a research team of specialized agents or debugging scenarios where you need human eyes on agent reasoning in real time, the shared workspace genuinely reduces context switching. If you're running a single coding agent that sometimes needs to call another agent, you might be over-engineering it.
FreeOpen Source
23. Patina
Orbit wraps each agent task in a bounded loop: the agent works, validation runs (tests, lint, type checks), and the task only closes when the checks pass. Every loop leaves structured JSON artifacts — what the agent returned, how it scored against a rubric, and a human-readable recommendation to accept, retry, or stop. This makes agent runs auditable after the fact, not just observable in the moment. The ceiling appears when your project needs multi-agent coordination or a hosted execution layer — Orbit is deliberately narrow, self-hosted only, and ships no managed runtime.
FreeOpen Source
24. Preseason.ai
Orbit sits between your backlog and your coding agent, selecting one dependency-ordered task at a time, running the agent, then forcing the result through tests, lint, and type checks before marking the task done. Every run writes structured JSON artifacts — what the agent returned, how the output scored against a rubric, whether a human should accept or iterate — so you are reviewing evidence, not trusting a diff. The agent-neutral contract means you can run Claude, Codex, and Cursor against the same task and compare artifacts instead of impressions. The harness is intentionally minimal; it does not schedule, it does not host, and it does not manage secrets — which means the moment your workflow needs cross-repo coordination or cloud execution, you are writing the glue yourself.
FreeOpen Source
25. ProData AI
Orbit is an open-source harness that wraps AI coding agent runs in a fixed loop: pick a task from a dependency-ordered backlog, run the agent, validate the output against tests, lint, and type checks, then record structured evidence before the task closes. Nothing advances without proof. Each run produces four artifact files — agent output, rubric scores, a recommendation, and a human-readable log — so you can inspect exactly what happened without replaying the whole session. The harness is agent-neutral; Claude, Codex, Cursor, or any JSON-speaking CLI plugs in behind the same contract. The ceiling appears quickly on teams who need anything beyond the validation-gate model — custom orchestration, parallel agent execution, or UI-driven workflow design are not in scope.
FreeOpen Source
26. RoBrain
RoBrain sits between your team's AI coding tools — Claude Code, Cursor, Copilot, Codex CLI — and a shared Postgres instance, capturing not just decisions but the alternatives your team ruled out. An MCP server runs inside the editor and surfaces relevant history before the agent acts; a batch Synthesis scan reads the whole corpus on a schedule to flag contradictions and drift that no single session would catch. That cross-session contradiction detection is where it separates from alternatives that only check at insertion time or silently delete the losing decision. Self-hosted on Apache 2.0 with your own Postgres; cloud extraction and the Planning API are paid-only features.
PaidOpen Source
27. RunbookHermes
The agent runs multi-signal diagnosis across observability data, builds a root-cause hypothesis, and generates or updates runbooks from what it learns — so the next incident with the same failure pattern starts from a documented baseline instead of a blank slate. The approval-gated remediation workflow means automated action doesn't ship without a reviewer, which matters when the blast radius is a production service. Where it breaks: the repo is five commits deep with zero open issues, which signals early-stage software, not battle-hardened infrastructure. Teams with complex multi-service topologies will hit integration gaps before the agent's reasoning does. Self-hosting is required, so operationalizing this adds a deployment and maintenance surface your platform team owns.
FreeOpen Source
28. Skawld
The SDK runs on Node.js 18+ and Bun 1.1+ as an ESM-only package, so it fits cleanly into modern TypeScript projects without a build-step fight. The vendor describes a minimal setup as a single `Agent` instantiation with a provider, a tool set, and a session — you are running a streaming agent loop in under a dozen lines. Where it starts to strain is on the documentation side: the README is thin, full docs live off-repo at skawld.com/docs, and community reports are sparse given the early star count. Teams who need battle-tested enterprise support or a large ecosystem of pre-built integrations will hit that ceiling fast.
FreeOpen Source
29. Tab Council
Orbit wraps agent coding work in a bounded loop: it selects a dependency-ordered task, hands it to whichever agent you've wired up, then requires passing tests, lint, and type checks before the task closes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, and a human-readable progress log. Nothing advances on the agent's word alone. The ceiling appears when your workflow needs anything beyond single-task validation loops: multi-repo coordination, branching logic between tasks, or a hosted dashboard for non-engineering stakeholders all require you to build on top of Orbit yourself.
FreeOpen Source
30. Tabbit
Orbit wraps agent execution in bounded, dependency-ordered tasks: one unit of work at a time, with tests, lint, and type checks acting as the gate before progress is recorded. Every run produces four structured artifacts — result JSON, rubric evaluation, a review recommendation, and a human-readable progress log — so code review has evidence instead of vibes. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare artifacts on identical task sets. The ceiling appears fast: Orbit is deliberately small, so teams that need scheduling across distributed workers or CI/CD pipeline integration will be adding that infrastructure themselves. It is a harness, not a platform.
FreeOpen Source
31. Vmette
The threat model vmette solves is concrete: prompt injection on a fetched web page, a malicious package in an AI-suggested install, or model output that does something you didn't intend — all of it lands inside the VM, not on your host. The isolation is hardware-level, not a container namespace that a determined process can escape. Because everything runs on-device, no agent output leaves your machine to a third-party cloud sandbox. The ceiling appears at the edges: vmette is macOS-only, and teams whose agents need to run on Linux servers or in CI pipelines will need a different isolation strategy.
FreeOpen Source
32. Z3r0
Z3r0 is an open-source, self-hosted workbench where a coordinating agent (Z3r0/CSO) delegates to five specialist agents — code audit, recon, exploitation validation, reverse engineering, and cryptography — each scoped to a defined domain. Sessions run against a PostgreSQL-backed timeline log with replay, so long engagements survive interruptions and context window rollovers. WorkProject records tie every finding to authorized scope, targets, and sandbox bindings, which means the evidence chain stays intact when the model context doesn't. The wall appears when your engagement requires a specialist task not covered by the six fixed roles — there is no agent plugin system described in the docs, so teams extending scope are writing new agents from scratch.
FreeOpen Source

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.