Self-Hosted LLMs

As of June 2026, AIDiveForge tracks 72 self-hosted LLMs. Listings are curated, verified against each tool's live website, and re-checked regularly.

Last updated June 12, 2026 · 72 tools

1. Agent Development Kit (ADK)
ADK is the open-source agent development framework that lets you build, debug, and deploy reliable AI agents at enterprise scale.
Free
2. Agent Governance Toolkit
Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents.
Free
3. agentmemory
Orbit is an open-source agent orchestration harness that wraps coding agent runs in bounded, dependency-ordered tasks, then gates task completion on real validation: tests, lint, and type checks must pass before an orbit closes. Every run produces structured JSON artifacts — agent output, rubric scores, accept/iterate/stop recommendations, and a human-readable progress log — so you have a trail to review, not just a diff to guess at. It runs against Claude, Codex, Cursor, or any agent that speaks JSON over CLI. The demo runs without an API key, which matters when you're evaluating whether it even fits your workflow. Where it strains: teams who need a web UI, multi-agent parallelism, or cloud-managed infrastructure will hit the limits of an intentionally small CLI harness fast.
Free · Open Source
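The bounded loop described in this entry (pick one dependency-ordered task, hand it to an agent, close the task only if validation passes, record a structured artifact) can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `run_orbit`, the task dictionary shape, the stub agent, and the artifact fields are all hypothetical names chosen for the example.

```python
import json
from graphlib import TopologicalSorter


def run_orbit(tasks, agent, validators):
    """Run tasks in dependency order; close each only if every check passes.

    tasks:      dict of task id -> set of task ids it depends on (assumed shape)
    agent:      callable(task_id) -> dict with the agent's structured output
    validators: dict of check name -> callable(output) -> bool
    """
    artifacts = []
    closed = set()
    for task_id in TopologicalSorter(tasks).static_order():
        # A task whose dependencies never closed is recorded as blocked.
        if not tasks.get(task_id, set()) <= closed:
            artifacts.append({"task": task_id, "status": "blocked", "checks": {}})
            continue
        output = agent(task_id)
        checks = {name: bool(check(output)) for name, check in validators.items()}
        passed = all(checks.values())
        if passed:
            closed.add(task_id)
        artifacts.append({
            "task": task_id,
            "status": "closed" if passed else "open",
            "checks": checks,
            "recommendation": "accept" if passed else "iterate",
        })
    return artifacts


# Hypothetical backlog: "ui" depends on "api", which depends on "schema".
tasks = {"schema": set(), "api": {"schema"}, "ui": {"api"}}


def stub_agent(task_id):
    # Stand-in for a real coding agent; pretends the "api" task fails its tests.
    return {"task": task_id, "tests_passed": task_id != "api"}


validators = {"tests": lambda out: out["tests_passed"]}
artifacts = run_orbit(tasks, stub_agent, validators)
print(json.dumps(artifacts, indent=2))
```

In this toy run the failed "api" task stays open, so the dependent "ui" task is never handed to the agent. That gating behavior is the point of the sketch.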
4. Agnt
AGNT is a local-first agent operating system built around an AGI loop: the agent executes a step, evaluates the result, and re-plans before moving forward — without you steering each decision. Persistent memory and skill layers mean context survives across sessions, not just within a single run. The visual workflow designer handles repeatable paths; goal-mode hands the agent an objective and lets it figure out the steps. Self-hosted deployment with Docker keeps data on your own infrastructure, which matters when your legal team has opinions about where prompts and outputs live. The custom license — not OSI-standard — is the detail that stops procurement at some organizations before the first demo.
Paid · Open Source
5. AnyFrame
AnyFrame lets engineering, ops, and support teams spin up agents that trigger from Slack messages, Linear tickets, or GitHub PR comments and then act — rolling back a deploy, writing tests against a diff, or navigating a billing portal without touching an API. The harness layer is swappable: Claude Code, Codex, Cursor, Gemini CLI, and others sit behind the same agent surface, so a model switch doesn't break your workflow. The SDK lets you embed that same runtime inside your own product in a few lines of code. The ceiling shows up when you need strict approval before an agent acts on production — the vendor describes autonomous execution, and teams that need a mandatory human sign-off step before every consequential action will need to build that gate themselves.
Paid
6. AutoGPU
The repo describes autonomous agents writing RTL, running it through real EDA tools, reading timing and layout reports, and revising the design — iterating without a human in the seat for each pass. The documented target is small systolic array architectures, specifically matrix-multiply accelerators; the codebase includes ISA definitions, physical design configs, and golden reference models. At that constrained scope, researchers report the agent loop closes. Scale the design complexity beyond what the existing module hierarchy covers and the agents lose the plot — the feedback loops that work for a mac array do not generalize to a multi-block SoC. Teams pushing past the documented scope end up writing their own agent scaffolding on top, at which point AutoGPU is a reference rather than a runtime.
Free · Open Source
7. Autoheal
AI platform leveraging a Production Context Graph to automate alert triage, root cause investigation, and incident remediation for enterprise SRE teams.
Paid
8. AutoLang
Orbit wraps each agent run in a bounded loop: it pulls one task from a dependency-ordered backlog, hands it to whatever agent you've wired up, runs tests, lint, and type checks, and refuses to close the task until validation passes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, whether a human should accept or re-queue. That audit trail is the point. The ceiling appears when your workflow needs anything beyond task-level sequencing: parallel agent execution, real-time dashboards, or integration with existing CI pipelines requires you to build the glue yourself.
Free · Open Source
9. BGE-M3
BGE is a family of open-source embedding and reranking models from BAAI, released under MIT license with weights available on Hugging Face and PyPI, designed to run entirely on your own infrastructure. The core workflow is straightforward: generate dense embeddings, index them in a vector database, and optionally layer in sparse or multi-vector retrieval for hybrid search. Multi-lingual retrieval is a documented strength, with cross-lingual matching working across language pairs without requiring parallel training data. The ceiling appears when your domain is highly specialized — out-of-the-box embeddings on narrow technical corpora produce ranking quality that requires fine-tuning to fix, and that fine-tuning work lands entirely on your team.
Free · Open Source
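The hybrid search workflow mentioned in this entry (dense embeddings, optionally combined with sparse retrieval) can be illustrated with a small scoring sketch. The vectors and token weights below are toy values standing in for real model output, and the weighted-sum fusion with `dense_weight` is an assumption made for the example, not a documented BGE recipe.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_score(query_weights, doc_weights):
    """Lexical match: sum of weight products over tokens shared by both sides."""
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)


def hybrid_rank(query, docs, dense_weight=0.7):
    """Rank documents by a weighted sum of dense and sparse scores (assumed fusion)."""
    scored = []
    for doc_id, doc in docs.items():
        score = (dense_weight * cosine(query["dense"], doc["dense"])
                 + (1 - dense_weight) * sparse_score(query["sparse"], doc["sparse"]))
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: in practice these would come from an embedding model.
query = {"dense": [1.0, 0.0], "sparse": {"gpu": 1.0}}
docs = {
    "a": {"dense": [1.0, 0.0], "sparse": {}},
    "b": {"dense": [0.0, 1.0], "sparse": {"gpu": 1.0}},
    "c": {"dense": [1.0, 1.0], "sparse": {"gpu": 1.0}},
}
ranked = hybrid_rank(query, docs)
print(ranked)
```

Document "c" ranks first here because it scores moderately on both signals, while "a" matches only densely and "b" only lexically. In a real deployment the dense half would typically be served by a vector database rather than a Python loop.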
10. Bloom
Bloom generates targeted evaluation suites for arbitrary behavioral traits.
Free
11. Browser Use
Browser Use is an open-source Python library for autonomous web task automation using LLMs and computer vision. Teams use it to extract competitive data, fill forms at scale, and monitor page changes across hundreds of sites. The tool hits 89.1% success on standard benchmarks and comes with stealth browser support, CAPTCHA solving, and residential proxies across 195+ countries. The vendor also runs a cloud infrastructure option alongside the self-hosted library. Most production teams pair it with managed browser infrastructure and human approval gates for financial or sensitive actions. The sharp edge: LLMs can't reliably distinguish user instructions from webpage content, leaving agents vulnerable to indirect prompt injection attacks that succeed 24% of the time without defenses.
Paid · Open Source
12. Ciris
CIRIS runs a signed reasoning agent on your phone or a home device, with no warehouse in the middle for the closest privacy circles. The vendor describes two paths: fully on-device using a small model like Gemma 4, or free hosted inference for phones that can't run a local model — both paths produce cryptographically signed outputs. Every claim the agent makes carries an ed25519+post-quantum signature, so you can audit it, revoke trust, and re-open any conclusion built on a bad source. The architecture depends on a 'social circle' data model; data in your innermost circles never sends the network message that would let anyone request it. Teams needing broad third-party integrations or a hosted API endpoint will find neither here.
Free · Open Source
13. Codeium
Devin, from Cognition, operates as a self-directed agent: given a task, it plans steps, writes and executes code, runs tests, interprets the output, and iterates — without a developer holding its hand through each transition. The vendor positions it for high-volume routine tickets, legacy migrations, and exploratory codebase work where the bottleneck is throughput, not creativity. Teams delegate backlog tickets and get draft PRs back; the agent handles the scaffolding. The ceiling appears on tasks requiring deep organizational context — tribal knowledge about why a module exists, or business logic that lives in nobody's head and in no doc. At that point, a developer re-enters the loop, which partly offsets the delegation gain.
Paid
14. Command R7B
Command R7B is a smaller language model optimized for tasks that don't require reasoning at the frontier—summarization, classification, instruction-following, and document analysis. Cohere positions it as the pragmatic choice for teams tired of paying for (or waiting on) 70B+ parameter models when a tighter, faster alternative works. It's free and open source, which means no API charges and full control over deployment. The real limitation: it will struggle on abstract reasoning, mathematical proof, or multi-step logic puzzles where 70B models shine. For enterprises choosing between this and proprietary APIs, the tradeoff is real but worth calculating.
Paid · Open Source
15. Conversations in AI Coding Agent
Orbit is an MIT-licensed, self-hosted harness that wraps a coding agent run in a bounded loop: it selects a task from a dependency-ordered backlog, hands off to whatever agent you plug in, runs tests and lint as a hard gate, and writes structured JSON artifacts that record exactly what happened. Every closed orbit leaves four files — agent output, rubric scoring, an accept-or-iterate recommendation, and a human-readable progress log. The demo runs without an API key, which means you can verify the mechanics before committing any credentials. The harness is agent-neutral by design; the vendor page cites Claude, Codex, and Cursor as examples. Where it shows its seams: Orbit is intentionally small, so teams needing a hosted dashboard, team-level access controls, or CI/CD pipeline integration will be writing that glue themselves.
Free · Open Source
16. CopilotKit
The core model is a React and Angular SDK that connects your existing frontend to whatever agent backend you're already running — LangChain, CrewAI, or a custom setup — via the AG-UI protocol, a bi-directional event stream the vendor describes as 'the general-purpose connection between a user-facing application and any agentic backend.' Agents render rich UI cards, forms, and widgets inline as they work, not just text responses. Thread and state persistence is handled automatically across sessions. The friction point arrives when your deployment target isn't a web surface: Slack and Teams connections are flagged as early access, which means you're betting on a roadmap, not a shipping feature. Teams with strict approval gates before agent actions can wire those checkpoints in, but the docs describe this as a configuration responsibility rather than a built-in guardrail system.
Paid · Open Source
17. CrewAI
CrewAI helps enterprises operate teams of AI agents that perform complex tasks autonomously, reliably and with full control. The open-source framework (free, self-hosted) defines agents with roles, goals, and backstories, orchestrating them through tasks; the paid AMP adds a visual Studio, deployment infrastructure, tracing, guardrails, and enterprise features. The framework was rebuilt from scratch to remove LangChain dependency; as of v1.14, it's fully standalone and works with any LLM provider. It's used by nearly half of the Fortune 500. But production friction is real: common Reddit advice is to start with CrewAI for speed and migrate to LangGraph when you hit scaling limits—reasonable for most projects. Users report that enthusiasm evaporates when running repeatedly on multiple components, and executing large SELECT queries overflows the LLM context window.
Paid · Open Source
18. DataGrout Invariant
DataGrout AI's platform is built to govern agents that run across enterprise systems — CRM, ERP, accounting — where an uncontrolled action has a real cost. The vendor describes deterministic execution controls, hallucination prevention, persistent memory across sessions, and audit trails that satisfy compliance review. Observability and cost tracking are positioned as first-class features, not add-ons, so teams can see which agent step burned the most tokens before the bill arrives. The self-hosted option matters for regulated industries where data cannot leave the perimeter. Where the platform has less evidence behind it: community reports and independent benchmarks are scarce, which makes it harder to verify the hallucination reduction claims at scale before you commit.
Paid
19. DBRX Instruct
DBRX Instruct is a free, open-source large language model built by Databricks for instruction-following tasks in software development and enterprise applications. It uses a mixture-of-experts architecture to balance performance with efficiency, and integrates natively with Databricks' data platform—a meaningful advantage if you're already in that ecosystem. The model shows strong results on coding and reasoning benchmarks, but carries real limitations: no vision capabilities, a shorter context window than Claude or GPT-4, and less real-world adoption in mainstream enterprise settings. For teams deeply embedded in Databricks infrastructure, it's a compelling option; for everyone else, it remains a secondary choice.
Free · Open Source
20. DeepSeek V3
A fast, chat-based, Mixture-of-Experts (MoE) model from DeepSeek.
Paid · Open Source
21. Dify
Open-source LLM app development platform combining AI workflow, RAG pipeline, agent capabilities, model management, observability features and more.
Paid
22. Due Diligence Agents
The tool runs parallel analysis across Legal, Finance, Commercial, Technology, Cybersecurity, HR, Tax, Regulatory, and ESG workstreams — domains that siloed consultants hand off sequentially, bleeding weeks in the process. Each agent cross-references findings against the others, so a revenue concentration risk in the commercial workstream gets flagged against the indemnification language in legal without a human manually connecting the dots. Outputs land in Excel and Word with citations intact, ready for an IC memo. The knowledge compounds across deal runs, so repeat buyers in the same sector start with context the first team had to build from scratch. The ceiling appears when your data room contains formats the parser does not handle cleanly — and at that point, teams are pre-processing documents manually before the agents ever see them.
Free · Open Source
23. Eidentic
The SDK centers on a temporal knowledge graph that tracks when facts were true, resolves contradictions, and consolidates between sessions — so the agent sharpens over time rather than accumulating noise. Durable runs, enforced cost ceilings, and CI-gated evals ship as part of the core, not as paid add-ons. The vendor benchmarks report 55.2% on LongMemEval versus 41.0% for full-context stuffing, and claims up to 39× fewer tokens per query. The gap shows up in support and long-running assistant workflows where session history compounds. At v0.1, the ecosystem is early — teams building anything outside the TypeScript path face a hard stop.
Free · Open Source
24. Elysia
An open-source framework that spins up an end-to-end agentic RAG application with just two terminal commands.
Free
25. Enforra
Orbit is a harness that wraps AI coding agents — Claude, Codex, Cursor, any JSON-speaking CLI — in a bounded task loop: the agent runs, tests and lint decide whether the work passes, and every run leaves inspectable JSON artifacts whether it succeeds or fails. The evidence trail is the product. You get structured output describing what the agent returned, rubric scoring for task focus and diff signal, and a human-readable progress log. Where it breaks: Orbit does not plan, does not write tasks, and does not decide what to build next — it validates and records what other agents attempt. Teams that need autonomous end-to-end execution will hit that ceiling immediately.
Free · Open Source
26. Enju
Orbit structures agent work into discrete, dependency-ordered loops: one task per run, deterministic validation gates, and four output artifacts that record exactly what the agent returned, how the run scored against a rubric, and what should happen next. The demo runs without an API key, which means you can evaluate the harness itself before spending a single token. Where it gets constrained: Orbit is a harness, not a scheduler — it does not autonomously drive through a backlog or retry failed orbits on its own. Teams wiring it into CI pipelines write the outer loop themselves.
Free · Open Source
27. Extella.AI
The structured tool data describes an agentic execution platform from Chariot Technologies Lab., Inc. with primitives called Rules, Concepts, and Experts — built for research automation, cross-system operations, and persistent memory across sessions. The scraped page, however, describes Spotter: a mobile app that identifies landmarks, street food, and wildlife via camera snap and saves them as travel journal entries. There is no matching factual source to ground a production review of the intended tool. Writing a listing from the validator summary alone, without page-sourced specifics on architecture, failure modes, or integration depth, would produce claims that cannot be verified.
Free
28. FalsifyLab Alpha
The vendor describes FalsifyLab Pro as an MCP server deployable inside Claude Code, Cursor, Cline, or Windsurf, where agents autonomously call tools to pull SEC filings, DeFi vault yields, whale wallet positions, and live macro tape — SPX, VIX, on-chain signals. The free tier returns cached data with rate limits, which is enough to validate a workflow but not enough for production research latency. The Pro subscription unlocks live feeds. Self-hosted deployment is available via PyPI, so teams with data-residency requirements can run it without routing signals through vendor infrastructure. The ceiling appears when research logic grows complex: the tool surfaces data, but multi-step branching across asset classes still lives in your agent scaffolding, not inside FalsifyLab.
Paid · Free Trial · 7 days
29. GEDD
The vendor describes GEDD as a release-readiness tool for AI product managers and domain experts. A PM loads realistic launch-risk scenarios, the domain expert reviews the agent in the shape of the actual task, names failure modes in their own vocabulary, and the session exits with a release report plus a validated evaluation set. That loop converts qualitative judgment into regression gates usable in CI/CD. The ceiling appears when you need programmatic API access — GEDD exposes none, so teams that want to pipe evaluation results into downstream automation build that bridge themselves. Setup requires local installation via pip and depends on sagemaker-mlflow, grounded-evals, and mlflow.
Free · Open Source
30. Genomi
The core workflow is four steps: install the agent harness, point it at your raw genome file on disk, build a local SQLite index, then ask questions through whichever AI agent you already run — Claude Code, Cursor, Gemini CLI, Goose, and others are listed as compatible. Pharmacogenomics, carrier status, polygenic risk scores, nutrigenomics, and ancestry PCA projection are all covered through distinct skill modules backed by ClinVar, PharmCAT, PGS Catalog, HPO, GenCC, and 1000 Genomes reference data. The privacy architecture is explicit: raw genome data stays on disk, and only the specific evidence snippets relevant to a query cross the boundary to whatever LLM handles the response. The vendor marks this as experimental and not for clinical use — which means researchers and privacy-conscious individuals exploring personal data are the intended audience, not clinical teams expecting diagnostic-grade output.
Free · Open Source
31. Goose
Open-source local-first AI agent framework for automating complex tasks with any LLM provider.
Free
32. Hermes Agent
Self-improving open-source AI agent with persistent memory, skill learning, and multi-platform access.
Free
33. Hermes Agent
The agent lives on your server — not a vendor's — and connects to Telegram, Discord, Slack, WhatsApp, Signal, and email simultaneously, so the same agent handles a Slack request in the morning and a scheduled backup at night. Persistent memory and auto-generated skills mean it accumulates institutional knowledge over time rather than starting cold on each invocation. Real sandboxing across Docker, SSH, Singularity, Modal, and local backends means you can isolate risky tasks without routing them through a third party. The ceiling appears when you need managed reliability guarantees: at v0.16.0 this is early-stage software, and self-hosted operations teams carry full responsibility for uptime, credential management, and model API costs. Teams that need SLA-backed infrastructure typically wire Hermes into a managed hosting layer — which adds operational overhead the framework itself does not absorb.
Free · Open Source
34. Hermes Desktop
Hermes Studio is an open-source, self-hosted dashboard that wraps Hermes Agent in a control plane: task scheduling, multi-agent coordination, memory and skill management, cost tracking, and an approval gate for actions you don't want running unsupervised. The vendor describes it as MIT-licensed with no paid tiers, which means every feature ships without a paywall. The architecture assumes you are already running Hermes Agent locally — Hermes Studio is the interface, not the runtime. Teams that need cloud-hosted infrastructure or agents that run without a local Hermes Agent install will hit that wall immediately.
Free · Open Source
35. HermesBench
OpenResume is a browser-based resume builder and parser that keeps all data local: nothing is sent to a server, no account is required. You fill in a form, the tool renders an ATS-optimized PDF in real time, and you download it. The parser side lets you drop in an existing resume and see exactly how an automated screener will read it — which fields it finds, which it misses. The tool handles one job well. It does not support multiple resume versions with branching tailoring logic, and teams needing bulk generation or API-driven output will find no hooks to connect to.
Free · Open Source
36. Hugging Face Spaces
Orbit acts as a harness around any JSON-speaking coding agent — Claude, Codex, Cursor, or others — running one task per cycle, executing tests and lint checks to decide whether the work advances, and writing structured JSON artifacts for every run. The dependency-aware backlog keeps each task bounded so agents do not drift across scope. Where it breaks: Orbit is intentionally minimal, so teams expecting a hosted dashboard, a GUI, or built-in agent adapters beyond CLI-level integration will build those layers themselves. The artifact trail is machine-readable JSON and a markdown log — useful for audits, not for a non-technical stakeholder who needs a summary.
Free · Open Source
37. Kikubot
Each Kikubot container polls one IMAP mailbox, feeds incoming email into an LLM agentic loop with a configured tool set, and replies over SMTP. Multi-agent workflows emerge naturally: a coordinator agent emails specialists, specialists reply, threads become the audit trail. The architecture requires a running mail server, which adds operational surface area before a single agent does anything useful. Teams with no existing mail infrastructure will spend more time on SMTP/IMAP setup than on agent logic. When the email-as-bus metaphor stops fitting — high-frequency tasks, sub-second latency requirements, or webhooks that can't wait for a polling interval — this architecture forces a full redesign.
Free · Open Source
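The email-as-bus pattern in this entry (read an incoming message, run it through an agent, reply in the same thread) can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: `build_reply` and `run_agent` are hypothetical names, not Kikubot's API, and the IMAP polling and SMTP delivery steps (for example via `imaplib` and `smtplib`) are left out so the example stays self-contained.

```python
from email.message import EmailMessage


def build_reply(incoming, agent_address, run_agent):
    """Build a threaded reply whose body is the agent's answer to the incoming mail.

    incoming:      EmailMessage fetched from the mailbox
    agent_address: address this agent sends from
    run_agent:     callable(str) -> str, a stand-in for the LLM agent loop
    """
    reply = EmailMessage()
    reply["From"] = agent_address
    reply["To"] = str(incoming["From"])

    subject = str(incoming["Subject"] or "")
    reply["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"

    # Threading headers keep the exchange in one conversation, which is what
    # makes the mail thread usable as an audit trail.
    if incoming["Message-ID"]:
        reply["In-Reply-To"] = str(incoming["Message-ID"])
        reply["References"] = str(incoming["Message-ID"])

    reply.set_content(run_agent(incoming.get_content()))
    return reply


# Hypothetical message from a coordinator agent to a specialist agent.
incoming = EmailMessage()
incoming["From"] = "coordinator@example.com"
incoming["To"] = "specialist@example.com"
incoming["Subject"] = "Check disk usage"
incoming["Message-ID"] = "<msg-1@example.com>"
incoming.set_content("Please report disk usage.")

reply = build_reply(
    incoming,
    "specialist@example.com",
    lambda text: f"Handled: {text.strip()}",  # placeholder for a real agent call
)
print(reply)
```

Because each reply carries `In-Reply-To` and `References`, a multi-agent exchange stays grouped as one thread in any standard mail client.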
38. Kimi WebBridge
The platform handles long-horizon coding tasks, parallel document research, and full-stack web generation through a coordinated swarm architecture — the vendor states K2.6 scales to 300 sub-agents running concurrently. The model weights are open-source under a Modified MIT license, so teams with strict data governance can run inference locally rather than routing sensitive payloads to a cloud endpoint. Where the friction surfaces is at the edges: the scraped interface shows a broad surface — Slides, Websites, Docs, Deep Research, Sheets, Agent Swarm, Kimi Code, Kimi Claw — and integrating any of those outputs into an existing CI/CD pipeline requires API work the UI does not abstract. Teams building beyond Kimi's native surfaces reach for the API fast.
Paid
39. Langflow
Open-source visual builder for constructing AI agents and RAG applications via drag-and-drop interface with Python extensibility.
Paid · Open Source
40. Llama 3
Llama 3 is a large language model family designed to handle standard NLP workloads—text generation, translation, summarization, and sentiment analysis—across a range of scales. Meta released it as open source, meaning you can download weights, fine-tune locally, or run it on your own infrastructure instead of hitting an API. The catch: while free to use, the model is young relative to Llama 2, and local deployment requires real hardware or cloud credits. For teams building production systems, this trades managed convenience for control and lower long-term marginal costs.
Free · Open Source
41. Llama 4 Scout
Scout carries a 10M token context window, meaning you can feed it an entire codebase or a stack of legal documents in a single pass without chunking pipelines or retrieval hacks. Maverick trades raw context depth for stronger multimodal reasoning, handling interleaved image and text inputs through native early-fusion architecture rather than a bolted-on vision adapter. Both models ship as open weights, downloadable from Hugging Face after license acceptance, with no API bill required if you run them yourself. The ceiling appears at inference: the Mixture-of-Experts architecture demands hardware that most teams do not have sitting idle, and running Scout's full 10M context window in practice requires significant GPU memory that a standard cloud instance will not cover.
Free · Open Source
42. LobeHub
LobeHub lets you define a goal and have the system assemble an agent team, dispatch parallel workers across tasks, and surface results without you approving every step. The agent marketplace and skill library — reportedly over 332,000 skills and 64,000 MCP server connections — mean you're not building from scratch each time. Memory is white-box and editable, so agents don't silently drift from your preferences. Where it gets difficult: the self-hosted path requires you to manage your own infrastructure, and the complexity of multi-agent coordination means debugging a failed task chain is non-trivial. Teams running production workloads tend to add observability tooling — the Langfuse integration listed on the page suggests this is an expected pattern, not an edge case.
Paid
43. Locaible
Locaible runs AI agents entirely on your own machine: no bytes leave the device, no API calls to OpenAI or Anthropic, no telemetry. The vendor states it is GDPR and EU AI Act compliant by design, which matters when your legal or finance team needs a paper trail for the regulator, not a ToS URL. Multi-step workflows chain separate agents — one retrieves from your indexed documents, one analyses, one drafts — each running its own local model. The ceiling appears when your team scales beyond a small LAN setup: team seats authenticate over a private token and require a detected LAN IP, so distributed or remote teams hit a networking configuration wall before they hit a workflow one.
Paid · Free Trial · 7 days
44. LocalFlow
The core loop is deliberately small: Orbit selects one dependency-ordered task, hands it to whichever coding agent you wire in, runs tests, lint, and type checks, and only closes the task if the agent can prove the work passed. Every run produces four artifact files — structured result JSON, rubric-scored evaluation, a review recommendation, and a human-readable progress log. That paper trail is what lets you compare two agents on the same task by diffing artifacts instead of re-running demos. The harness runs locally with no API key required for the replay demo, so there is nothing to provision before you can see it work. The ceiling appears fast on non-coding tasks — Orbit is built for code-output validation and nothing else.
Free · Open Source
45. MagesticAI
The platform runs a pipeline of specialized agents — Planner, Coder, QA — that hand off work through isolated Git worktrees, so each task gets its own branch and a bad run does not contaminate the main codebase. You monitor execution in real-time through a web UI, which means you are not staring at terminal logs hoping the right thing happened. The vendor describes cross-session knowledge retention, so the system carries context between separate task runs. The architecture supports multiple LLM providers, which means you are not locked to one API when costs shift. At 78 stars and 184 commits, this is early-stage software — community support is thin and the blast radius of an undocumented breaking change falls entirely on your team.
Free · Open Source
46. MemPalace
Orbit wraps agent runs in bounded loops: it selects one dependency-ordered task, hands it to your agent, runs tests and lint and type checks, and only marks work complete if validation passes. Every run produces structured JSON artifacts and a human-readable progress log, so you are reviewing evidence instead of trusting output. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare structured artifacts across runs. The tool is intentionally small — it handles the validation harness, not the full development lifecycle. Teams with sparse test coverage will find the validation gates have nothing to enforce.
Free · Open Source
47. Microsoft Agent Framework
A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.
Free
48. Mind-expander
The agent drives the canvas: it can run `npx mind-expander` in the background, load skill integrations, and build guided tours through architecture. You see the same graph the agent is reasoning about, which means review decisions and refactor plans are grounded in actual dependency structure, not the agent's approximation of it. That shared view is the differentiator. The ceiling arrives with language support: Rust and TypeScript are covered, and the docs describe more language frontends as planned. Teams whose core services are in Go, Python, or Java will hit that wall on day one.
Free · Open Source
49. Mistral
Mistral offers a family of large language models ranging from the lightweight Mistral 7B to the more capable Mistral Large, accessible both as open-source downloads and via paid API. The company positions itself as the cost-conscious alternative to ChatGPT and Claude, with a free tier covering basic use cases but throttled requests that frustrate serious users. Pricing for the API starts around $0.14 per million input tokens—roughly one-third OpenAI's rate—making it genuinely cheap at scale. The catch: public API documentation remains sparse, and the free tier's limitations mean you'll likely hit a paywall faster than expected.
Free · Open Source
50. Mistral Large 2
Mistral Large 2 is a general-purpose language model trained to handle complex reasoning, code generation, and multilingual work at the scale enterprises need. It's free to use via API or self-host, sits in the same performance tier as proprietary models from OpenAI and Anthropic, and can ingest documents up to 128,000 tokens long. The core trade-off: it has a knowledge cutoff earlier than competitors and lacks serious vision capabilities, making it less suitable for tasks requiring current events or image understanding. For teams optimizing on cost and reasoning quality rather than breadth of modalities, it's a genuine alternative to paid tiers.
Free · Open Source
51. Mnemo
Orbit wraps each agent run in a bounded loop: it selects a dependency-ordered task from your backlog, hands it to whichever coding agent you point at it, then runs tests, lint, and type checks before the task is allowed to close. Every run leaves structured JSON artifacts — what the agent returned, how the output scored against a rubric, and a human-readable recommendation to accept, iterate, or stop. The agent-neutral contract means you can swap Claude for Codex behind the same harness and compare artifacts instead of gut feelings. Where Orbit hits its ceiling: it is a harness, not a planner, so teams that need autonomous task decomposition or cross-repo coordination will be adding that layer themselves.
Free · Open Source
52. NanoClaw
NanoClaw is a lightweight, open-source personal AI agent that runs on your own machine, connects to messaging apps like WhatsApp, Telegram, Slack, Discord, and Signal, and is built around just 15 source files you can read in a single sitting.
Free
53. Nightwatch
The agent runs a ReAct loop: it calls tools against your live infrastructure — Kubernetes, Docker, AWS, Grafana, GitHub — reasons over what it finds, and produces ranked remediation proposals that sit in a queue waiting for your sign-off before anything touches production. Read-only investigation is the hard constraint by design, which means the agent cannot act unilaterally. That boundary is a feature for regulated or risk-averse teams and a ceiling for teams that want closed-loop auto-remediation. Self-hosted and air-gap friendly, with local inference support, it fits environments where data never leaves the building.
Paid · Open Source
54. OpenAgents
OpenAgents positions itself as the coordination backbone for distributed AI agents. You get a hosted workspace (or self-host) where agents working on separate machines discover each other, share files and browser context, and coordinate via @mentions. Installation is a one-liner: install the Launcher desktop app, point agents at a workspace token, and they join. The platform is open-source with an active but modest community. The technical surface is clean: agents register on the network, events flow between them, and context stays shared. The hard part surfaces later: when your agents are actually doing different things (some coding, some reviewing, some managing), orchestrating handoffs stays manual. This is SDK-first, not no-code. If you're building a research team of specialized agents or debugging scenarios where you need human eyes on agent reasoning in real time, the shared workspace genuinely reduces context switching. If you're running a single coding agent that sometimes needs to call another agent, you might be over-engineering it.
Free · Open Source
55. OpenFang
An open-source Agent Operating System built from scratch in Rust, designed to run autonomous agents on schedules.
Free
56. OpenLegion
Each agent gets its own isolated container, spend cap, and vault-proxied credentials — so a rogue agent can't drain your API budget or leak credentials to the next task in the queue. The platform deploys a coordinated fleet from a plain-English description of the function you need: a sales pipeline, a content studio, a research desk. Credential handling and per-agent budgets are locked down by default, which means you're not retrofitting security after something goes wrong. The ceiling appears when your workflow needs branching logic that the template model can't express — at that point you're describing edge cases in natural language and hoping the agent interprets them correctly. Teams with deterministic multi-step requirements often add a separate orchestration layer to compensate.
Paid · Free Trial · 7 days
57. Orchestrik.ai
The scraped vendor page does not match the tool data provided. The page content describes 'Spotter,' a travel-identification app, while the structured data references an enterprise AI agent platform from ITMTB Technologies. Because the only factual source available is the Spotter page — which contains no information about multi-agent workflows, compliance features, audit trails, or backend integrations — this listing cannot be written to the publication standard required. Asserting capabilities from the structured input without page-level sourcing would violate the grounding rule. A corrected scrape of the ITMTB Technologies product page is needed before this listing can be completed accurately.
Paid
58. Patina
Orbit wraps each agent task in a bounded loop: the agent works, validation runs (tests, lint, type checks), and the task only closes when the checks pass. Every loop leaves structured JSON artifacts — what the agent returned, how it scored against a rubric, and a human-readable recommendation to accept, retry, or stop. This makes agent runs auditable after the fact, not just observable in the moment. The ceiling appears when your project needs multi-agent coordination or a hosted execution layer — Orbit is deliberately narrow, self-hosted only, and ships no managed runtime.
Free · Open Source
59. Preseason.ai
Orbit sits between your backlog and your coding agent, selecting one dependency-ordered task at a time, running the agent, then forcing the result through tests, lint, and type checks before marking the task done. Every run writes structured JSON artifacts — what the agent returned, how the output scored against a rubric, whether a human should accept or iterate — so you are reviewing evidence, not trusting a diff. The agent-neutral contract means you can run Claude, Codex, and Cursor against the same task and compare artifacts instead of impressions. The harness is intentionally minimal; it does not schedule, it does not host, and it does not manage secrets — which means the moment your workflow needs cross-repo coordination or cloud execution, you are writing the glue yourself.
Free · Open Source
60. ProData AI
Orbit is an open-source harness that wraps AI coding agent runs in a fixed loop: pick a task from a dependency-ordered backlog, run the agent, validate the output against tests, lint, and type checks, then record structured evidence before the task closes. Nothing advances without proof. Each run produces four artifact files — agent output, rubric scores, a recommendation, and a human-readable log — so you can inspect exactly what happened without replaying the whole session. The harness is agent-neutral; Claude, Codex, Cursor, or any JSON-speaking CLI plugs in behind the same contract. The ceiling appears quickly on teams who need anything beyond the validation-gate model — custom orchestration, parallel agent execution, or UI-driven workflow design are not in scope.
Free · Open Source
61. Qwen2.5 72B
Qwen2.5 72B is a free, fully open-source large language model built by Alibaba that you can run on your own hardware. It competes directly with Claude and GPT-4-class models on reasoning, code generation, and math, areas where most open alternatives historically lag, while supporting 128,000-token contexts and multiple languages. The catch is computational: you'll need serious GPU investment (roughly $200k+ in hardware) to run it at scale, and like all LLMs, it has a knowledge cutoff and may need customization for niche domains. For organizations that can afford the infrastructure, it eliminates per-API-call costs entirely.
Free · Open Source
62. RoBrain
RoBrain sits between your team's AI coding tools — Claude Code, Cursor, Copilot, Codex CLI — and a shared Postgres instance, capturing not just decisions but the alternatives your team ruled out. An MCP server runs inside the editor and surfaces relevant history before the agent acts; a batch Synthesis scan reads the whole corpus on a schedule to flag contradictions and drift that no single session would catch. That cross-session contradiction detection is where it separates from alternatives that only check at insertion time or silently delete the losing decision. Self-hosted on Apache 2.0 with your own Postgres; cloud extraction and the Planning API are paid-only features.
Paid · Open Source
63. RunbookHermes
The agent runs multi-signal diagnosis across observability data, builds a root-cause hypothesis, and generates or updates runbooks from what it learns — so the next incident with the same failure pattern starts from a documented baseline instead of a blank slate. The approval-gated remediation workflow means automated action doesn't ship without a reviewer, which matters when the blast radius is a production service. Where it breaks: the repo is five commits deep with zero open issues, which signals early-stage software, not battle-hardened infrastructure. Teams with complex multi-service topologies will hit integration gaps before the agent's reasoning does. Self-hosting is required, so operationalizing this adds a deployment and maintenance surface your platform team owns.
Free · Open Source
64. Skawld
The SDK runs on Node.js 18+ and Bun 1.1+ as an ESM-only package, so it fits cleanly into modern TypeScript projects without a build-step fight. The vendor describes a minimal setup as a single `Agent` instantiation with a provider, a tool set, and a session — you are running a streaming agent loop in under a dozen lines. Where it starts to strain is on the documentation side: the README is thin, full docs live off-repo at skawld.com/docs, and community reports are sparse given the early star count. Teams who need battle-tested enterprise support or a large ecosystem of pre-built integrations will hit that ceiling fast.
Free · Open Source
65. SynapCores Agent
The repo, published by SynapCores under MIT, routes all memory, retrieval, semantic tool selection, and generation through the SynapCores backend — one database as the entire brain. There is no LangChain, no separate vector store, no framework glue to audit or upgrade. The project ships a browser chat widget and a live debug sidebar so you can watch memory recall and tool routing decisions in real time. That transparency is the differentiating feature — and also the boundary: the agent's intelligence rides entirely on the SynapCores backend, whose self-hosted deployment requirements the repo does not fully document. Teams that need the backend running on-premise will hit that wall before they hit a code problem.
Free · Open Source
66. Tab Council
Orbit wraps agent coding work in a bounded loop: it selects a dependency-ordered task, hands it to whichever agent you've wired up, then requires passing tests, lint, and type checks before the task closes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, and a human-readable progress log. Nothing advances on the agent's word alone. The ceiling appears when your workflow needs anything beyond single-task validation loops: multi-repo coordination, branching logic between tasks, or a hosted dashboard for non-engineering stakeholders all require you to build on top of Orbit yourself.
Free · Open Source
67. Tabby
Open-source, self-hosted AI coding assistant with code completion, chat, and agentic automation.
Free
68. Teralynk
The scraped page content does not match the tool described in the structured data — the page belongs to Spotter, a travel identification app, not Teralynk's workflow automation platform. No production details about Teralynk's agent architecture, file system integrations, MCP tool use, or governance controls can be sourced from the provided page. The vendor states a freemium model with storage limits and capped workflow runs on the free tier; paid-only features unlock higher run volumes and expanded storage. Teams evaluating this for compliance auditing or multi-cloud document workflows cannot rely on this listing for verified capability claims — vendor documentation should be consulted directly.
Paid
69. Thunderbolt
Open-source, self-hosted enterprise AI client emphasizing data sovereignty and model choice.
Paid
70. Vmette
The threat model vmette addresses is concrete: prompt injection on a fetched web page, a malicious package in an AI-suggested install, or model output that does something you didn't intend. All of it lands inside the VM, not on your host. The isolation is hardware-level, not a container namespace that a determined process can escape. Because everything runs on-device, no agent output leaves your machine for a third-party cloud sandbox. The ceiling appears at the edges: vmette is macOS-only, and teams whose agents need to run on Linux servers or in CI pipelines will need a different isolation strategy.
Free · Open Source
71. WorkBuddy
WorkBuddy runs as a local-first agent on the desktop, autonomously chaining file access, web search, and document generation into single-prompt workflows. The Tencent ecosystem fit is real: WeCom and WeChat integrations mean scheduling and messaging tasks route without extra setup, which matters if your organization already lives there. Outside that ecosystem, the integration surface narrows fast. Teams running mixed SaaS stacks report reaching for MCP-compatible connectors to fill the gaps — which adds configuration overhead the tool is supposed to eliminate. Self-hosted execution is the headline privacy story, but the closed-source codebase means you audit what the vendor discloses, not the code itself.
Paid
72. Z3r0
Z3r0 is an open-source, self-hosted workbench where a coordinating agent (Z3r0/CSO) delegates to five specialist agents — code audit, recon, exploitation validation, reverse engineering, and cryptography — each scoped to a defined domain. Sessions run against a PostgreSQL-backed timeline log with replay, so long engagements survive interruptions and context window rollovers. WorkProject records tie every finding to authorized scope, targets, and sandbox bindings, which means the evidence chain stays intact when the model context doesn't. The wall appears when your engagement requires a specialist task not covered by the six fixed roles — there is no agent plugin system described in the docs, so teams extending scope are writing new agents from scratch.
Free · Open Source

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.