Skip to main content
AIDiveForge AIDiveForge

Free LLMs

As of June 2026, AIDiveForge tracks 51 free llms. Curated free llms tracked by AIDiveForge. Each tool listed is currently free. Listings are verified against each tool's live website and re-checked regularly.

Last updated June 12, 2026 · 51 tools

  1. Agent Development Kit (ADK)

    1. Agent Development Kit (ADK)

    ADK is the open-source agent development framework that lets you build, debug, and deploy reliable AI agents at enterprise scale.

    Free
  2. Agent Governance Toolkit

    2. Agent Governance Toolkit

    Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents.

    Free
  3. agentmemory

    3. agentmemory

    Orbit is an open-source agent orchestration harness that wraps coding agent runs in bounded, dependency-ordered tasks, then gates task completion on real validation: tests, lint, and type checks must pass before an orbit closes. Every run produces structured JSON artifacts — agent output, rubric scores, accept/iterate/stop recommendations, and a human-readable progress log — so you have a trail to review, not just a diff to guess at. It runs against Claude, Codex, Cursor, or any agent that speaks JSON over CLI. The demo runs without an API key, which matters when you're evaluating whether it even fits your workflow. Where it strains: teams who need a web UI, multi-agent parallelism, or cloud-managed infrastructure will hit the limits of an intentionally small CLI harness fast.

    FreeOpen Source
  4. AutoGPU

    4. AutoGPU

    The repo describes autonomous agents writing RTL, running it through real EDA tools, reading timing and layout reports, and revising the design — iterating without a human in the seat for each pass. The documented target is small systolic array architectures, specifically matrix-multiply accelerators; the codebase includes ISA definitions, physical design configs, and golden reference models. At that constrained scope, researchers report the agent loop closes. Scale the design complexity beyond what the existing module hierarchy covers and the agents lose the plot — the feedback loops that work for a mac array do not generalize to a multi-block SoC. Teams pushing past the documented scope end up writing their own agent scaffolding on top, at which point AutoGPU is a reference rather than a runtime.

    FreeOpen Source
  5. AutoLang

    5. AutoLang

    Orbit wraps each agent run in a bounded loop: it pulls one task from a dependency-ordered backlog, hands it to whatever agent you've wired up, runs tests, lint, and type checks, and refuses to close the task until validation passes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, whether a human should accept or re-queue. That audit trail is the point. The ceiling appears when your workflow needs anything beyond task-level sequencing: parallel agent execution, real-time dashboards, or integration with existing CI pipelines requires you to build the glue yourself.

    FreeOpen Source
  6. BGE-M3

    6. BGE-M3

    BGE is a family of open-source embedding and reranking models from BAAI, released under MIT license with weights available on Hugging Face and PyPI, designed to run entirely on your own infrastructure. The core workflow is straightforward: generate dense embeddings, index them in a vector database, and optionally layer in sparse or multi-vector retrieval for hybrid search. Multi-lingual retrieval is a documented strength, with cross-lingual matching working across language pairs without requiring parallel training data. The ceiling appears when your domain is highly specialized — out-of-the-box embeddings on narrow technical corpora produce ranking quality that requires fine-tuning to fix, and that fine-tuning work lands entirely on your team.

    FreeOpen Source
  7. Bloom

    7. Bloom

    Bloom generates targeted evaluation suites for arbitrary behavioral traits.

    Free
  8. Ciris

    8. Ciris

    CIRIS runs a signed reasoning agent on your phone or a home device, with no warehouse in the middle for the closest privacy circles. The vendor describes two paths: fully on-device using a small model like Gemma 4, or free hosted inference for phones that can't run a local model — both paths produce cryptographically signed outputs. Every claim the agent makes carries an ed25519+post-quantum signature, so you can audit it, revoke trust, and re-open any conclusion built on a bad source. The architecture depends on a 'social circle' data model; data in your innermost circles never sends the network message that would let anyone request it. Teams needing broad third-party integrations or a hosted API endpoint will find neither here.

    FreeOpen Source
  9. Conversations in AI Coding Agent

    9. Conversations in AI Coding Agent

    Orbit is an MIT-licensed, self-hosted harness that wraps a coding agent run in a bounded loop: it selects a task from a dependency-ordered backlog, hands off to whatever agent you plug in, runs tests and lint as a hard gate, and writes structured JSON artifacts that record exactly what happened. Every closed orbit leaves four files — agent output, rubric scoring, an accept-or-iterate recommendation, and a human-readable progress log. The demo runs without an API key, which means you can verify the mechanics before committing any credentials. The harness is agent-neutral by design; the vendor page cites Claude, Codex, and Cursor as examples. Where it shows its seams: Orbit is intentionally small, so teams needing a hosted dashboard, team-level access controls, or CI/CD pipeline integration will be writing that glue themselves.

    FreeOpen Source
  10. DBRX Instruct

    10. DBRX Instruct

    DBRX Instruct is a free, open-source large language model built by Databricks for instruction-following tasks in software development and enterprise applications. It uses a mixture-of-experts architecture to balance performance with efficiency, and integrates natively with Databricks' data platform—a meaningful advantage if you're already in that ecosystem. The model shows strong results on coding and reasoning benchmarks, but carries real limitations: no vision capabilities, a shorter context window than Claude or GPT-4, and less real-world adoption in mainstream enterprise settings. For teams deeply embedded in Databricks infrastructure, it's a compelling option; for everyone else, it remains a secondary choice.

    FreeOpen Source
  11. Due Diligence Agents

    11. Due Diligence Agents

    The tool runs parallel analysis across Legal, Finance, Commercial, Technology, Cybersecurity, HR, Tax, Regulatory, and ESG workstreams — domains that siloed consultants hand off sequentially, bleeding weeks in the process. Each agent cross-references findings against the others, so a revenue concentration risk in the commercial workstream gets flagged against the indemnification language in legal without a human manually connecting the dots. Outputs land in Excel and Word with citations intact, ready for an IC memo. The knowledge compounds across deal runs, so repeat buyers in the same sector start with context the first team had to build from scratch. The ceiling appears when your data room contains formats the parser does not handle cleanly — and at that point, teams are pre-processing documents manually before the agents ever see them.

    FreeOpen Source
  12. Eidentic

    12. Eidentic

    The SDK centers on a temporal knowledge graph that tracks when facts were true, resolves contradictions, and consolidates between sessions — so the agent sharpens over time rather than accumulating noise. Durable runs, enforced cost ceilings, and CI-gated evals ship as part of the core, not as paid add-ons. The vendor benchmarks report 55.2% on LongMemEval versus 41.0% for full-context stuffing, and claims up to 39× fewer tokens per query. The gap shows up in support and long-running assistant workflows where session history compounds. At v0.1, the ecosystem is early — teams building anything outside the TypeScript path face a hard stop.

    FreeOpen Source
  13. Elysia

    13. Elysia

    An open-source framework that spins up an end-to-end agentic RAG application with just two terminal commands.

    Free
  14. Enforra

    14. Enforra

    Orbit is a harness that wraps AI coding agents — Claude, Codex, Cursor, any JSON-speaking CLI — in a bounded task loop: the agent runs, tests and lint decide whether the work passes, and every run leaves inspectable JSON artifacts whether it succeeds or fails. The evidence trail is the product. You get structured output describing what the agent returned, rubric scoring for task focus and diff signal, and a human-readable progress log. Where it breaks: Orbit does not plan, does not write tasks, and does not decide what to build next — it validates and records what other agents attempt. Teams that need autonomous end-to-end execution will hit that ceiling immediately.

    FreeOpen Source
  15. Enju

    15. Enju

    Orbit structures agent work into discrete, dependency-ordered loops: one task per run, deterministic validation gates, and four output artifacts that record exactly what the agent returned, how the run scored against a rubric, and what should happen next. The demo runs without an API key, which means you can evaluate the harness itself before spending a single token. Where it gets constrained: Orbit is a harness, not a scheduler — it does not autonomously drive through a backlog or retry failed orbits on its own. Teams wiring it into CI pipelines write the outer loop themselves.

    FreeOpen Source
  16. Extella.AI

    16. Extella.AI

    The structured tool data describes an agentic execution platform from Chariot Technologies Lab., Inc. with primitives called Rules, Concepts, and Experts — built for research automation, cross-system operations, and persistent memory across sessions. The scraped page, however, describes Spotter: a mobile app that identifies landmarks, street food, and wildlife via camera snap and saves them as travel journal entries. There is no matching factual source to ground a production review of the intended tool. Writing a listing from the validator summary alone, without page-sourced specifics on architecture, failure modes, or integration depth, would produce claims that cannot be verified.

    Free
  17. GEDD

    17. GEDD

    The vendor describes GEDD as a release-readiness tool for AI product managers and domain experts. A PM loads realistic launch-risk scenarios, the domain expert reviews the agent in the shape of the actual task, names failure modes in their own vocabulary, and the session exits with a release report plus a validated evaluation set. That loop converts qualitative judgment into regression gates usable in CI/CD. The ceiling appears when you need programmatic API access — GEDD exposes none, so teams that want to pipe evaluation results into downstream automation build that bridge themselves. Setup requires local installation via pip and depends on sagemaker-mlflow, grounded-evals, and mlflow.

    FreeOpen Source
  18. Genomi

    18. Genomi

    The core workflow is four steps: install the agent harness, point it at your raw genome file on disk, build a local SQLite index, then ask questions through whichever AI agent you already run — Claude Code, Cursor, Gemini CLI, Goose, and others are listed as compatible. Pharmacogenomics, carrier status, polygenic risk scores, nutrigenomics, and ancestry PCA projection are all covered through distinct skill modules backed by ClinVar, PharmCAT, PGS Catalog, HPO, GenCC, and 1000 Genomes reference data. The privacy architecture is explicit: raw genome data stays on disk, and only the specific evidence snippets relevant to a query cross the boundary to whatever LLM handles the response. The vendor marks this as experimental and not for clinical use — which means researchers and privacy-conscious individuals exploring personal data are the intended audience, not clinical teams expecting diagnostic-grade output.

    FreeOpen Source
  19. Goose

    19. Goose

    Open-source local-first AI agent framework for automating complex tasks with any LLM provider.

    Free
  20. Guildly

    20. Guildly

    Each agent has a fixed role: PM writes PRDs, Manager routes tickets, SDEs work in isolated git worktrees, Reviewer signs off before anything merges. Every action traces back through a chain — line of code to ticket, ticket to PRD, PRD to the #general message that started it. The audit trail isn't a report you run after the fact; it's the structure the system runs on. That structure is also the ceiling: teams needing agents to adapt their process mid-sprint, or handle workflows that don't fit the six-role model, will hit the playbook's edges before long. The tool is in beta, with no API and no self-hosted option, so the surface you can extend is narrow.

    Free
  21. Hermes Agent

    21. Hermes Agent

    Self-improving open-source AI agent with persistent memory, skill learning, and multi-platform access.

    Free
  22. Hermes Agent

    22. Hermes Agent

    The agent lives on your server — not a vendor's — and connects to Telegram, Discord, Slack, WhatsApp, Signal, and email simultaneously, so the same agent handles a Slack request in the morning and a scheduled backup at night. Persistent memory and auto-generated skills mean it accumulates institutional knowledge over time rather than starting cold on each invocation. Real sandboxing across Docker, SSH, Singularity, Modal, and local backends means you can isolate risky tasks without routing them through a third party. The ceiling appears when you need managed reliability guarantees: at v0.16.0 this is early-stage software, and self-hosted operations teams carry full responsibility for uptime, credential management, and model API costs. Teams that need SLA-backed infrastructure typically wire Hermes into a managed hosting layer — which adds operational overhead the framework itself does not absorb.

    FreeOpen Source
  23. Hermes Desktop

    23. Hermes Desktop

    Hermes Studio is an open-source, self-hosted dashboard that wraps Hermes Agent in a control plane: task scheduling, multi-agent coordination, memory and skill management, cost tracking, and an approval gate for actions you don't want running unsupervised. The vendor describes it as MIT-licensed with no paid tiers, which means every feature ships without a paywall. The architecture assumes you are already running Hermes Agent locally — Hermes Studio is the interface, not the runtime. Teams that need cloud-hosted infrastructure or agents that run without a local Hermes Agent install will hit that wall immediately.

    FreeOpen Source
  24. HermesBench

    24. HermesBench

    OpenResume is a browser-based resume builder and parser that keeps all data local: nothing is sent to a server, no account is required. You fill in a form, the tool renders an ATS-optimized PDF in real time, and you download it. The parser side lets you drop in an existing resume and see exactly how an automated screener will read it — which fields it finds, which it misses. The tool handles one job well. It does not support multiple resume versions with branching tailoring logic, and teams needing bulk generation or API-driven output will find no hooks to connect to.

    FreeOpen Source
  25. Hugging Face Spaces

    25. Hugging Face Spaces

    Orbit acts as a harness around any JSON-speaking coding agent — Claude, Codex, Cursor, or others — running one task per cycle, executing tests and lint checks to decide whether the work advances, and writing structured JSON artifacts for every run. The dependency-aware backlog keeps each task bounded so agents do not drift across scope. Where it breaks: Orbit is intentionally minimal, so teams expecting a hosted dashboard, a GUI, or built-in agent adapters beyond CLI-level integration will build those layers themselves. The artifact trail is machine-readable JSON and a markdown log — useful for audits, not for a non-technical stakeholder who needs a summary.

    FreeOpen Source
  26. Kikubot

    26. Kikubot

    Each Kikubot container polls one IMAP mailbox, feeds incoming email into an LLM agentic loop with a configured tool set, and replies over SMTP. Multi-agent workflows emerge naturally: a coordinator agent emails specialists, specialists reply, threads become the audit trail. The architecture requires a running mail server, which adds operational surface area before a single agent does anything useful. Teams with no existing mail infrastructure will spend more time on SMTP/IMAP setup than on agent logic. When the email-as-bus metaphor stops fitting — high-frequency tasks, sub-second latency requirements, or webhooks that can't wait for a polling interval — this architecture forces a full redesign.

    FreeOpen Source
  27. Llama 3

    27. Llama 3

    Llama 3 is a large language model family designed to handle standard NLP workloads—text generation, translation, summarization, and sentiment analysis—across a range of scales. Meta released it as open source, meaning you can download weights, fine-tune locally, or run it on your own infrastructure instead of hitting an API. The catch: while free to use, the model is young relative to Llama 2, and local deployment requires real hardware or cloud credits. For teams building production systems, this trades managed convenience for control and lower long-term marginal costs.

    FreeOpen Source
  28. Llama 4 Scout

    28. Llama 4 Scout

    Scout carries a 10M token context window, meaning you can feed it an entire codebase or a stack of legal documents in a single pass without chunking pipelines or retrieval hacks. Maverick trades raw context depth for stronger multimodal reasoning, handling interleaved image and text inputs through native early-fusion architecture rather than a bolted-on vision adapter. Both models ship as open weights, downloadable from Hugging Face after license acceptance, with no API bill required if you run them yourself. The ceiling appears at inference: the Mixture-of-Experts architecture demands hardware that most teams do not have sitting idle, and running Scout's full 10M context window in practice requires significant GPU memory that a standard cloud instance will not cover.

    FreeOpen Source
  29. LocalFlow

    29. LocalFlow

    The core loop is deliberately small: Orbit selects one dependency-ordered task, hands it to whichever coding agent you wire in, runs tests, lint, and type checks, and only closes the task if the agent can prove the work passed. Every run produces four artifact files — structured result JSON, rubric-scored evaluation, a review recommendation, and a human-readable progress log. That paper trail is what lets you compare two agents on the same task by diffing artifacts instead of re-running demos. The harness runs locally with no API key required for the replay demo, so there is nothing to provision before you can see it work. The ceiling appears fast on non-coding tasks — Orbit is built for code-output validation and nothing else.

    FreeOpen Source
  30. MagesticAI

    30. MagesticAI

    The platform runs a pipeline of specialized agents — Planner, Coder, QA — that hand off work through isolated Git worktrees, so each task gets its own branch and a bad run does not contaminate the main codebase. You monitor execution in real-time through a web UI, which means you are not staring at terminal logs hoping the right thing happened. The vendor describes cross-session knowledge retention, so the system carries context between separate task runs. The architecture supports multiple LLM providers, which means you are not locked to one API when costs shift. At 78 stars and 184 commits, this is early-stage software — community support is thin and the blast radius of an undocumented breaking change falls entirely on your team.

    FreeOpen Source
  31. MemPalace

    31. MemPalace

    Orbit wraps agent runs in bounded loops: it selects one dependency-ordered task, hands it to your agent, runs tests and lint and type checks, and only marks work complete if validation passes. Every run produces structured JSON artifacts and a human-readable progress log, so you are reviewing evidence instead of trusting output. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare structured artifacts across runs. The tool is intentionally small — it handles the validation harness, not the full development lifecycle. Teams with sparse test coverage will find the validation gates have nothing to enforce.

    FreeOpen Source
  32. Microsoft Agent Framework

    32. Microsoft Agent Framework

    A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.

    Free
  33. Mind-expander

    33. Mind-expander

    The agent drives the canvas: it can run `npx mind-expander` in the background, load skill integrations, and build guided tours through architecture. You see the same graph the agent is reasoning about, which means review decisions and refactor plans are grounded in actual dependency structure — not the agent's approximation of it. That shared view is the differentiator. The ceiling arrives with language support: Rust and TypeScript are covered, the docs describe more language frontends as planned. Teams whose core services are in Go, Python, or Java will hit that wall on day one.

    FreeOpen Source
  34. Mistral

    34. Mistral

    Mistral offers a family of large language models ranging from the lightweight Mistral 7B to the more capable Mistral Large, accessible both as open-source downloads and via paid API. The company positions itself as the cost-conscious alternative to ChatGPT and Claude, with a free tier covering basic use cases but throttled requests that frustrate serious users. Pricing for the API starts around $0.14 per million input tokens—roughly one-third OpenAI's rate—making it genuinely cheap at scale. The catch: public API documentation remains sparse, and the free tier's limitations mean you'll likely hit a paywall faster than expected.

    FreeOpen Source
  35. Mistral Large 2

    35. Mistral Large 2

    Mistral Large 2 is a general-purpose language model trained to handle complex reasoning, code generation, and multilingual work at the scale enterprises need. It's free to use via API or self-host, sits in the same performance tier as proprietary models from OpenAI and Anthropic, and can ingest documents up to 128,000 tokens long. The core trade-off: it has a knowledge cutoff earlier than competitors and lacks serious vision capabilities, making it less suitable for tasks requiring current events or image understanding. For teams optimizing on cost and reasoning quality rather than breadth of modalities, it's a genuine alternative to paid tiers.

    FreeOpen Source
  36. Mnemo

    36. Mnemo

    Orbit wraps each agent run in a bounded loop: it selects a dependency-ordered task from your backlog, hands it to whichever coding agent you point at it, then runs tests, lint, and type checks before the task is allowed to close. Every run leaves structured JSON artifacts — what the agent returned, how the output scored against a rubric, and a human-readable recommendation to accept, iterate, or stop. The agent-neutral contract means you can swap Claude for Codex behind the same harness and compare artifacts instead of gut feelings. Where Orbit hits its ceiling: it is a harness, not a planner, so teams that need autonomous task decomposition or cross-repo coordination will be adding that layer themselves.

    FreeOpen Source
  37. NanoClaw

    37. NanoClaw

    NanoClaw is a lightweight, open-source personal AI agent that runs on your own machine, connects to messaging apps like WhatsApp, Telegram, Slack, Discord, and Signal, and is built around just 15 source files you can read in a single sitting.

    Free
  38. OpenAgents

    38. OpenAgents

    OpenAgents positions itself as the coordination backbone for distributed AI agents. You get a hosted workspace (or self-host) where agents working on separate machines discover each other, share files and browser context, and coordinate via @mentions. Installation is one-liner: install the Launcher desktop app, point agents at a workspace token, and they join. The platform is open-source with an active but modest community. The technical surface is clean—agents register on the network, events flow between them, and context stays shared. The hard part surfaces later: when your agents are actually doing different things (some coding, some reviewing, some managing), orchestrating handoffs stays manual. This is SDK-first, not no-code. If you're building a research team of specialized agents or debugging scenarios where you need human eyes on agent reasoning in real time, the shared workspace genuinely reduces context switching. If you're running a single coding agent that sometimes needs to call another agent, you might be over-engineering it.

    FreeOpen Source
  39. OpenFang

    39. OpenFang

    An open-source Agent Operating System built from scratch in Rust, designed to run autonomous agents on schedules.

    Free
  40. Patina

    40. Patina

    Orbit wraps each agent task in a bounded loop: the agent works, validation runs (tests, lint, type checks), and the task only closes when the checks pass. Every loop leaves structured JSON artifacts — what the agent returned, how it scored against a rubric, and a human-readable recommendation to accept, retry, or stop. This makes agent runs auditable after the fact, not just observable in the moment. The ceiling appears when your project needs multi-agent coordination or a hosted execution layer — Orbit is deliberately narrow, self-hosted only, and ships no managed runtime.

    FreeOpen Source
  41. Preseason.ai

    41. Preseason.ai

    Orbit sits between your backlog and your coding agent, selecting one dependency-ordered task at a time, running the agent, then forcing the result through tests, lint, and type checks before marking the task done. Every run writes structured JSON artifacts — what the agent returned, how the output scored against a rubric, whether a human should accept or iterate — so you are reviewing evidence, not trusting a diff. The agent-neutral contract means you can run Claude, Codex, and Cursor against the same task and compare artifacts instead of impressions. The harness is intentionally minimal; it does not schedule, it does not host, and it does not manage secrets — which means the moment your workflow needs cross-repo coordination or cloud execution, you are writing the glue yourself.

    FreeOpen Source
  42. ProData AI

    42. ProData AI

    Orbit is an open-source harness that wraps AI coding agent runs in a fixed loop: pick a task from a dependency-ordered backlog, run the agent, validate the output against tests, lint, and type checks, then record structured evidence before the task closes. Nothing advances without proof. Each run produces four artifact files — agent output, rubric scores, a recommendation, and a human-readable log — so you can inspect exactly what happened without replaying the whole session. The harness is agent-neutral; Claude, Codex, Cursor, or any JSON-speaking CLI plugs in behind the same contract. The ceiling appears quickly on teams who need anything beyond the validation-gate model — custom orchestration, parallel agent execution, or UI-driven workflow design are not in scope.

    FreeOpen Source
  43. Qwen2.5 72B

    43. Qwen2.5 72B

    Qwen2.5 72B is a free, fully open-source large language model built by Alibaba that you can run on your own hardware. It competes directly with Claude and GPT-4-class models on reasoning, code generation, and math—areas where most open alternatives historically lag—while supporting 128,000 token contexts and multiple languages. The catch is computational: you'll need serious GPU investment (roughly $200k+ in hardware) to run it at scale, and like all LLMs, it has a knowledge cutoff and may need customization for niche domains. For organizations that can afford the infrastructure, it eliminates per-API-call costs entirely.

    FreeOpen Source
  44. RunbookHermes

    44. RunbookHermes

    The agent runs multi-signal diagnosis across observability data, builds a root-cause hypothesis, and generates or updates runbooks from what it learns — so the next incident with the same failure pattern starts from a documented baseline instead of a blank slate. The approval-gated remediation workflow means automated action doesn't ship without a reviewer, which matters when the blast radius is a production service. Where it breaks: the repo is five commits deep with zero open issues, which signals early-stage software, not battle-hardened infrastructure. Teams with complex multi-service topologies will hit integration gaps before the agent's reasoning does. Self-hosting is required, so operationalizing this adds a deployment and maintenance surface your platform team owns.

    FreeOpen Source
  45. Skawld

    45. Skawld

    The SDK runs on Node.js 18+ and Bun 1.1+ as an ESM-only package, so it fits cleanly into modern TypeScript projects without a build-step fight. The vendor describes a minimal setup as a single `Agent` instantiation with a provider, a tool set, and a session — you are running a streaming agent loop in under a dozen lines. Where it starts to strain is on the documentation side: the README is thin, full docs live off-repo at skawld.com/docs, and community reports are sparse given the early star count. Teams who need battle-tested enterprise support or a large ecosystem of pre-built integrations will hit that ceiling fast.

    FreeOpen Source
  46. SynapCores Agent

    46. SynapCores Agent

    The repo, published by SynapCores under MIT, routes all memory, retrieval, semantic tool selection, and generation through the SynapCores backend — one database as the entire brain. There is no LangChain, no separate vector store, no framework glue to audit or upgrade. The project ships a browser chat widget and a live debug sidebar so you can watch memory recall and tool routing decisions in real time. That transparency is the differentiating feature — and also the boundary: the agent's intelligence rides entirely on the SynapCores backend, whose self-hosted deployment requirements the repo does not fully document. Teams that need the backend running on-premise will hit that wall before they hit a code problem.

    FreeOpen Source
  47. Tab Council

    47. Tab Council

    Orbit wraps agent coding work in a bounded loop: it selects a dependency-ordered task, hands it to whichever agent you've wired up, then requires passing tests, lint, and type checks before the task closes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, and a human-readable progress log. Nothing advances on the agent's word alone. The ceiling appears when your workflow needs anything beyond single-task validation loops: multi-repo coordination, branching logic between tasks, or a hosted dashboard for non-engineering stakeholders all require you to build on top of Orbit yourself.

    FreeOpen Source
  48. Tabbit

    48. Tabbit

    Orbit wraps agent execution in bounded, dependency-ordered tasks: one unit of work at a time, with tests, lint, and type checks acting as the gate before progress is recorded. Every run produces four structured artifacts — result JSON, rubric evaluation, a review recommendation, and a human-readable progress log — so code review has evidence instead of vibes. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare artifacts on identical task sets. The ceiling appears fast: Orbit is deliberately small, so teams that need scheduling across distributed workers or CI/CD pipeline integration will be adding that infrastructure themselves. It is a harness, not a platform.

    FreeOpen Source
  49. Tabby

    49. Tabby

    Open-source, self-hosted AI coding assistant with code completion, chat, and agentic automation.

    Free
  50. Vmette

    50. Vmette

    The threat model vmette solves is concrete: prompt injection on a fetched web page, a malicious package in an AI-suggested install, or model output that does something you didn't intend — all of it lands inside the VM, not on your host. The isolation is hardware-level, not a container namespace that a determined process can escape. Because everything runs on-device, no agent output leaves your machine to a third-party cloud sandbox. The ceiling appears at the edges: vmette is macOS-only, and teams whose agents need to run on Linux servers or in CI pipelines will need a different isolation strategy.

    FreeOpen Source
  51. Z3r0

    51. Z3r0

    Z3r0 is an open-source, self-hosted workbench where a coordinating agent (Z3r0/CSO) delegates to five specialist agents — code audit, recon, exploitation validation, reverse engineering, and cryptography — each scoped to a defined domain. Sessions run against a PostgreSQL-backed timeline log with replay, so long engagements survive interruptions and context window rollovers. WorkProject records tie every finding to authorized scope, targets, and sandbox bindings, which means the evidence chain stays intact when the model context doesn't. The wall appears when your engagement requires a specialist task not covered by the six fixed roles — there is no agent plugin system described in the docs, so teams extending scope are writing new agents from scratch.

    FreeOpen Source

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.