Free LLMs

As of June 2026, AIDiveForge tracks 51 free LLMs. Each tool listed is currently free. Listings are verified against each tool's live website and re-checked regularly.

Last updated June 12, 2026 · 51 tools

1. Agent Development Kit (ADK)
ADK is the open-source agent development framework that lets you build, debug, and deploy reliable AI agents at enterprise scale.
Free
2. Agent Governance Toolkit
Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents.
Free
3. agentmemory
Orbit is an open-source agent orchestration harness that wraps coding agent runs in bounded, dependency-ordered tasks, then gates task completion on real validation: tests, lint, and type checks must pass before an orbit closes. Every run produces structured JSON artifacts — agent output, rubric scores, accept/iterate/stop recommendations, and a human-readable progress log — so you have a trail to review, not just a diff to guess at. It runs against Claude, Codex, Cursor, or any agent that speaks JSON over CLI. The demo runs without an API key, which matters when you're evaluating whether it even fits your workflow. Where it strains: teams who need a web UI, multi-agent parallelism, or cloud-managed infrastructure will hit the limits of an intentionally small CLI harness fast.
Free · Open Source
4. AutoGPU
The repo describes autonomous agents writing RTL, running it through real EDA tools, reading timing and layout reports, and revising the design, iterating without a human in the seat for each pass. The documented target is small systolic array architectures, specifically matrix-multiply accelerators; the codebase includes ISA definitions, physical design configs, and golden reference models. At that constrained scope, researchers report the agent loop closes. Scale the design complexity beyond what the existing module hierarchy covers and the agents lose the plot: the feedback loops that work for a MAC array do not generalize to a multi-block SoC. Teams pushing past the documented scope end up writing their own agent scaffolding on top, at which point AutoGPU is a reference rather than a runtime.
Free · Open Source
5. AutoLang
Orbit wraps each agent run in a bounded loop: it pulls one task from a dependency-ordered backlog, hands it to whatever agent you've wired up, runs tests, lint, and type checks, and refuses to close the task until validation passes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, whether a human should accept or re-queue. That audit trail is the point. The ceiling appears when your workflow needs anything beyond task-level sequencing: parallel agent execution, real-time dashboards, or integration with existing CI pipelines requires you to build the glue yourself.
Free · Open Source
6. BGE-M3
BGE is a family of open-source embedding and reranking models from BAAI, released under MIT license with weights available on Hugging Face and PyPI, designed to run entirely on your own infrastructure. The core workflow is straightforward: generate dense embeddings, index them in a vector database, and optionally layer in sparse or multi-vector retrieval for hybrid search. Multi-lingual retrieval is a documented strength, with cross-lingual matching working across language pairs without requiring parallel training data. The ceiling appears when your domain is highly specialized — out-of-the-box embeddings on narrow technical corpora produce ranking quality that requires fine-tuning to fix, and that fine-tuning work lands entirely on your team.
Free · Open Source
7. Bloom
Bloom generates targeted evaluation suites for arbitrary behavioral traits.
Free
8. Ciris
CIRIS runs a signed reasoning agent on your phone or a home device, with no warehouse in the middle for the closest privacy circles. The vendor describes two paths: fully on-device using a small model like Gemma 4, or free hosted inference for phones that can't run a local model — both paths produce cryptographically signed outputs. Every claim the agent makes carries an ed25519+post-quantum signature, so you can audit it, revoke trust, and re-open any conclusion built on a bad source. The architecture depends on a 'social circle' data model; data in your innermost circles never sends the network message that would let anyone request it. Teams needing broad third-party integrations or a hosted API endpoint will find neither here.
Free · Open Source
9. Conversations in AI Coding Agent
Orbit is an MIT-licensed, self-hosted harness that wraps a coding agent run in a bounded loop: it selects a task from a dependency-ordered backlog, hands off to whatever agent you plug in, runs tests and lint as a hard gate, and writes structured JSON artifacts that record exactly what happened. Every closed orbit leaves four files — agent output, rubric scoring, an accept-or-iterate recommendation, and a human-readable progress log. The demo runs without an API key, which means you can verify the mechanics before committing any credentials. The harness is agent-neutral by design; the vendor page cites Claude, Codex, and Cursor as examples. Where it shows its seams: Orbit is intentionally small, so teams needing a hosted dashboard, team-level access controls, or CI/CD pipeline integration will be writing that glue themselves.
Free · Open Source
10. DBRX Instruct
DBRX Instruct is a free, open-source large language model built by Databricks for instruction-following tasks in software development and enterprise applications. It uses a mixture-of-experts architecture to balance performance with efficiency, and integrates natively with Databricks' data platform—a meaningful advantage if you're already in that ecosystem. The model shows strong results on coding and reasoning benchmarks, but carries real limitations: no vision capabilities, a shorter context window than Claude or GPT-4, and less real-world adoption in mainstream enterprise settings. For teams deeply embedded in Databricks infrastructure, it's a compelling option; for everyone else, it remains a secondary choice.
Free · Open Source
11. Due Diligence Agents
The tool runs parallel analysis across Legal, Finance, Commercial, Technology, Cybersecurity, HR, Tax, Regulatory, and ESG workstreams — domains that siloed consultants hand off sequentially, bleeding weeks in the process. Each agent cross-references findings against the others, so a revenue concentration risk in the commercial workstream gets flagged against the indemnification language in legal without a human manually connecting the dots. Outputs land in Excel and Word with citations intact, ready for an IC memo. The knowledge compounds across deal runs, so repeat buyers in the same sector start with context the first team had to build from scratch. The ceiling appears when your data room contains formats the parser does not handle cleanly — and at that point, teams are pre-processing documents manually before the agents ever see them.
Free · Open Source
12. Eidentic
The SDK centers on a temporal knowledge graph that tracks when facts were true, resolves contradictions, and consolidates between sessions — so the agent sharpens over time rather than accumulating noise. Durable runs, enforced cost ceilings, and CI-gated evals ship as part of the core, not as paid add-ons. The vendor benchmarks report 55.2% on LongMemEval versus 41.0% for full-context stuffing, and claims up to 39× fewer tokens per query. The gap shows up in support and long-running assistant workflows where session history compounds. At v0.1, the ecosystem is early — teams building anything outside the TypeScript path face a hard stop.
Free · Open Source
13. Elysia
An open-source framework that spins up an end-to-end agentic RAG application with just two terminal commands.
Free
14. Enforra
Orbit is a harness that wraps AI coding agents — Claude, Codex, Cursor, any JSON-speaking CLI — in a bounded task loop: the agent runs, tests and lint decide whether the work passes, and every run leaves inspectable JSON artifacts whether it succeeds or fails. The evidence trail is the product. You get structured output describing what the agent returned, rubric scoring for task focus and diff signal, and a human-readable progress log. Where it breaks: Orbit does not plan, does not write tasks, and does not decide what to build next — it validates and records what other agents attempt. Teams that need autonomous end-to-end execution will hit that ceiling immediately.
Free · Open Source
15. Enju
Orbit structures agent work into discrete, dependency-ordered loops: one task per run, deterministic validation gates, and four output artifacts that record exactly what the agent returned, how the run scored against a rubric, and what should happen next. The demo runs without an API key, which means you can evaluate the harness itself before spending a single token. Where it gets constrained: Orbit is a harness, not a scheduler — it does not autonomously drive through a backlog or retry failed orbits on its own. Teams wiring it into CI pipelines write the outer loop themselves.
Free · Open Source
16. Extella.AI
The structured tool data describes an agentic execution platform from Chariot Technologies Lab., Inc. with primitives called Rules, Concepts, and Experts — built for research automation, cross-system operations, and persistent memory across sessions. The scraped page, however, describes Spotter: a mobile app that identifies landmarks, street food, and wildlife via camera snap and saves them as travel journal entries. There is no matching factual source to ground a production review of the intended tool. Writing a listing from the validator summary alone, without page-sourced specifics on architecture, failure modes, or integration depth, would produce claims that cannot be verified.
Free
17. GEDD
The vendor describes GEDD as a release-readiness tool for AI product managers and domain experts. A PM loads realistic launch-risk scenarios, the domain expert reviews the agent in the shape of the actual task, names failure modes in their own vocabulary, and the session exits with a release report plus a validated evaluation set. That loop converts qualitative judgment into regression gates usable in CI/CD. The ceiling appears when you need programmatic API access — GEDD exposes none, so teams that want to pipe evaluation results into downstream automation build that bridge themselves. Setup requires local installation via pip and depends on sagemaker-mlflow, grounded-evals, and mlflow.
Free · Open Source
18. Genomi
The core workflow is four steps: install the agent harness, point it at your raw genome file on disk, build a local SQLite index, then ask questions through whichever AI agent you already run — Claude Code, Cursor, Gemini CLI, Goose, and others are listed as compatible. Pharmacogenomics, carrier status, polygenic risk scores, nutrigenomics, and ancestry PCA projection are all covered through distinct skill modules backed by ClinVar, PharmCAT, PGS Catalog, HPO, GenCC, and 1000 Genomes reference data. The privacy architecture is explicit: raw genome data stays on disk, and only the specific evidence snippets relevant to a query cross the boundary to whatever LLM handles the response. The vendor marks this as experimental and not for clinical use — which means researchers and privacy-conscious individuals exploring personal data are the intended audience, not clinical teams expecting diagnostic-grade output.
Free · Open Source
19. Goose
Open-source local-first AI agent framework for automating complex tasks with any LLM provider.
Free
20. Guildly
Each agent has a fixed role: PM writes PRDs, Manager routes tickets, SDEs work in isolated git worktrees, Reviewer signs off before anything merges. Every action traces back through a chain — line of code to ticket, ticket to PRD, PRD to the #general message that started it. The audit trail isn't a report you run after the fact; it's the structure the system runs on. That structure is also the ceiling: teams needing agents to adapt their process mid-sprint, or handle workflows that don't fit the six-role model, will hit the playbook's edges before long. The tool is in beta, with no API and no self-hosted option, so the surface you can extend is narrow.
Free
21. Hermes Agent
Self-improving open-source AI agent with persistent memory, skill learning, and multi-platform access.
Free
22. Hermes Agent
The agent lives on your server — not a vendor's — and connects to Telegram, Discord, Slack, WhatsApp, Signal, and email simultaneously, so the same agent handles a Slack request in the morning and a scheduled backup at night. Persistent memory and auto-generated skills mean it accumulates institutional knowledge over time rather than starting cold on each invocation. Real sandboxing across Docker, SSH, Singularity, Modal, and local backends means you can isolate risky tasks without routing them through a third party. The ceiling appears when you need managed reliability guarantees: at v0.16.0 this is early-stage software, and self-hosted operations teams carry full responsibility for uptime, credential management, and model API costs. Teams that need SLA-backed infrastructure typically wire Hermes into a managed hosting layer — which adds operational overhead the framework itself does not absorb.
Free · Open Source
23. Hermes Desktop
Hermes Studio is an open-source, self-hosted dashboard that wraps Hermes Agent in a control plane: task scheduling, multi-agent coordination, memory and skill management, cost tracking, and an approval gate for actions you don't want running unsupervised. The vendor describes it as MIT-licensed with no paid tiers, which means every feature ships without a paywall. The architecture assumes you are already running Hermes Agent locally — Hermes Studio is the interface, not the runtime. Teams that need cloud-hosted infrastructure or agents that run without a local Hermes Agent install will hit that wall immediately.
Free · Open Source
24. HermesBench
OpenResume is a browser-based resume builder and parser that keeps all data local: nothing is sent to a server, no account is required. You fill in a form, the tool renders an ATS-optimized PDF in real time, and you download it. The parser side lets you drop in an existing resume and see exactly how an automated screener will read it — which fields it finds, which it misses. The tool handles one job well. It does not support multiple resume versions with branching tailoring logic, and teams needing bulk generation or API-driven output will find no hooks to connect to.
Free · Open Source
25. Hugging Face Spaces
Orbit acts as a harness around any JSON-speaking coding agent — Claude, Codex, Cursor, or others — running one task per cycle, executing tests and lint checks to decide whether the work advances, and writing structured JSON artifacts for every run. The dependency-aware backlog keeps each task bounded so agents do not drift across scope. Where it breaks: Orbit is intentionally minimal, so teams expecting a hosted dashboard, a GUI, or built-in agent adapters beyond CLI-level integration will build those layers themselves. The artifact trail is machine-readable JSON and a markdown log — useful for audits, not for a non-technical stakeholder who needs a summary.
Free · Open Source
26. Kikubot
Each Kikubot container polls one IMAP mailbox, feeds incoming email into an LLM agentic loop with a configured tool set, and replies over SMTP. Multi-agent workflows emerge naturally: a coordinator agent emails specialists, specialists reply, threads become the audit trail. The architecture requires a running mail server, which adds operational surface area before a single agent does anything useful. Teams with no existing mail infrastructure will spend more time on SMTP/IMAP setup than on agent logic. When the email-as-bus metaphor stops fitting — high-frequency tasks, sub-second latency requirements, or webhooks that can't wait for a polling interval — this architecture forces a full redesign.
Free · Open Source
27. Llama 3
Llama 3 is a large language model family designed to handle standard NLP workloads—text generation, translation, summarization, and sentiment analysis—across a range of scales. Meta released it as open source, meaning you can download weights, fine-tune locally, or run it on your own infrastructure instead of hitting an API. The catch: while free to use, the model is young relative to Llama 2, and local deployment requires real hardware or cloud credits. For teams building production systems, this trades managed convenience for control and lower long-term marginal costs.
Free · Open Source
28. Llama 4 Scout
Scout carries a 10M token context window, meaning you can feed it an entire codebase or a stack of legal documents in a single pass without chunking pipelines or retrieval hacks. Maverick trades raw context depth for stronger multimodal reasoning, handling interleaved image and text inputs through native early-fusion architecture rather than a bolted-on vision adapter. Both models ship as open weights, downloadable from Hugging Face after license acceptance, with no API bill required if you run them yourself. The ceiling appears at inference: the Mixture-of-Experts architecture demands hardware that most teams do not have sitting idle, and running Scout's full 10M context window in practice requires significant GPU memory that a standard cloud instance will not cover.
Free · Open Source
29. LocalFlow
The core loop is deliberately small: Orbit selects one dependency-ordered task, hands it to whichever coding agent you wire in, runs tests, lint, and type checks, and only closes the task if the agent can prove the work passed. Every run produces four artifact files — structured result JSON, rubric-scored evaluation, a review recommendation, and a human-readable progress log. That paper trail is what lets you compare two agents on the same task by diffing artifacts instead of re-running demos. The harness runs locally with no API key required for the replay demo, so there is nothing to provision before you can see it work. The ceiling appears fast on non-coding tasks — Orbit is built for code-output validation and nothing else.
Free · Open Source
30. MagesticAI
The platform runs a pipeline of specialized agents — Planner, Coder, QA — that hand off work through isolated Git worktrees, so each task gets its own branch and a bad run does not contaminate the main codebase. You monitor execution in real-time through a web UI, which means you are not staring at terminal logs hoping the right thing happened. The vendor describes cross-session knowledge retention, so the system carries context between separate task runs. The architecture supports multiple LLM providers, which means you are not locked to one API when costs shift. At 78 stars and 184 commits, this is early-stage software — community support is thin and the blast radius of an undocumented breaking change falls entirely on your team.
Free · Open Source
31. MemPalace
Orbit wraps agent runs in bounded loops: it selects one dependency-ordered task, hands it to your agent, runs tests and lint and type checks, and only marks work complete if validation passes. Every run produces structured JSON artifacts and a human-readable progress log, so you are reviewing evidence instead of trusting output. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare structured artifacts across runs. The tool is intentionally small — it handles the validation harness, not the full development lifecycle. Teams with sparse test coverage will find the validation gates have nothing to enforce.
Free · Open Source
32. Microsoft Agent Framework
A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.
Free
33. Mind-expander
The agent drives the canvas: it can run `npx mind-expander` in the background, load skill integrations, and build guided tours through architecture. You see the same graph the agent is reasoning about, which means review decisions and refactor plans are grounded in actual dependency structure, not the agent's approximation of it. That shared view is the differentiator. The ceiling arrives with language support: Rust and TypeScript are covered, and the docs describe more language frontends as planned. Teams whose core services are in Go, Python, or Java will hit that wall on day one.
Free · Open Source
34. Mistral
Mistral offers a family of large language models ranging from the lightweight Mistral 7B to the more capable Mistral Large, accessible both as open-source downloads and via paid API. The company positions itself as the cost-conscious alternative to ChatGPT and Claude, with a free tier covering basic use cases but throttled requests that frustrate serious users. Pricing for the API starts around $0.14 per million input tokens—roughly one-third OpenAI's rate—making it genuinely cheap at scale. The catch: public API documentation remains sparse, and the free tier's limitations mean you'll likely hit a paywall faster than expected.
Free · Open Source
35. Mistral Large 2
Mistral Large 2 is a general-purpose language model trained to handle complex reasoning, code generation, and multilingual work at the scale enterprises need. It's free to use via API or self-host, sits in the same performance tier as proprietary models from OpenAI and Anthropic, and can ingest documents up to 128,000 tokens long. The core trade-off: it has a knowledge cutoff earlier than competitors and lacks serious vision capabilities, making it less suitable for tasks requiring current events or image understanding. For teams optimizing on cost and reasoning quality rather than breadth of modalities, it's a genuine alternative to paid tiers.
Free · Open Source
36. Mnemo
Orbit wraps each agent run in a bounded loop: it selects a dependency-ordered task from your backlog, hands it to whichever coding agent you point at it, then runs tests, lint, and type checks before the task is allowed to close. Every run leaves structured JSON artifacts — what the agent returned, how the output scored against a rubric, and a human-readable recommendation to accept, iterate, or stop. The agent-neutral contract means you can swap Claude for Codex behind the same harness and compare artifacts instead of gut feelings. Where Orbit hits its ceiling: it is a harness, not a planner, so teams that need autonomous task decomposition or cross-repo coordination will be adding that layer themselves.
Free · Open Source
37. NanoClaw
NanoClaw is a lightweight, open-source personal AI agent that runs on your own machine, connects to messaging apps like WhatsApp, Telegram, Slack, Discord, and Signal, and is built around just 15 source files you can read in a single sitting.
Free
38. OpenAgents
OpenAgents positions itself as the coordination backbone for distributed AI agents. You get a hosted workspace (or self-host) where agents working on separate machines discover each other, share files and browser context, and coordinate via @mentions. Installation is a one-liner: install the Launcher desktop app, point agents at a workspace token, and they join. The platform is open-source with an active but modest community. The technical surface is clean: agents register on the network, events flow between them, and context stays shared. The hard part surfaces later: when your agents are actually doing different things (some coding, some reviewing, some managing), orchestrating handoffs stays manual. This is SDK-first, not no-code. If you're building a research team of specialized agents or debugging scenarios where you need human eyes on agent reasoning in real time, the shared workspace genuinely reduces context switching. If you're running a single coding agent that sometimes needs to call another agent, you might be over-engineering it.
Free · Open Source
39. OpenFang
An open-source Agent Operating System built from scratch in Rust, designed to run autonomous agents on schedules.
Free
40. Patina
Orbit wraps each agent task in a bounded loop: the agent works, validation runs (tests, lint, type checks), and the task only closes when the checks pass. Every loop leaves structured JSON artifacts — what the agent returned, how it scored against a rubric, and a human-readable recommendation to accept, retry, or stop. This makes agent runs auditable after the fact, not just observable in the moment. The ceiling appears when your project needs multi-agent coordination or a hosted execution layer — Orbit is deliberately narrow, self-hosted only, and ships no managed runtime.
Free · Open Source
41. Preseason.ai
Orbit sits between your backlog and your coding agent, selecting one dependency-ordered task at a time, running the agent, then forcing the result through tests, lint, and type checks before marking the task done. Every run writes structured JSON artifacts — what the agent returned, how the output scored against a rubric, whether a human should accept or iterate — so you are reviewing evidence, not trusting a diff. The agent-neutral contract means you can run Claude, Codex, and Cursor against the same task and compare artifacts instead of impressions. The harness is intentionally minimal; it does not schedule, it does not host, and it does not manage secrets — which means the moment your workflow needs cross-repo coordination or cloud execution, you are writing the glue yourself.
Free · Open Source
42. ProData AI
Orbit is an open-source harness that wraps AI coding agent runs in a fixed loop: pick a task from a dependency-ordered backlog, run the agent, validate the output against tests, lint, and type checks, then record structured evidence before the task closes. Nothing advances without proof. Each run produces four artifact files — agent output, rubric scores, a recommendation, and a human-readable log — so you can inspect exactly what happened without replaying the whole session. The harness is agent-neutral; Claude, Codex, Cursor, or any JSON-speaking CLI plugs in behind the same contract. The ceiling appears quickly on teams who need anything beyond the validation-gate model — custom orchestration, parallel agent execution, or UI-driven workflow design are not in scope.
Free · Open Source
43. Qwen2.5 72B
Qwen2.5 72B is a free, fully open-source large language model built by Alibaba that you can run on your own hardware. It competes directly with Claude and GPT-4-class models on reasoning, code generation, and math—areas where most open alternatives historically lag—while supporting 128,000 token contexts and multiple languages. The catch is computational: you'll need serious GPU investment (roughly $200k+ in hardware) to run it at scale, and like all LLMs, it has a knowledge cutoff and may need customization for niche domains. For organizations that can afford the infrastructure, it eliminates per-API-call costs entirely.
Free · Open Source
44. RunbookHermes
The agent runs multi-signal diagnosis across observability data, builds a root-cause hypothesis, and generates or updates runbooks from what it learns — so the next incident with the same failure pattern starts from a documented baseline instead of a blank slate. The approval-gated remediation workflow means automated action doesn't ship without a reviewer, which matters when the blast radius is a production service. Where it breaks: the repo is five commits deep with zero open issues, which signals early-stage software, not battle-hardened infrastructure. Teams with complex multi-service topologies will hit integration gaps before the agent's reasoning does. Self-hosting is required, so operationalizing this adds a deployment and maintenance surface your platform team owns.
Free · Open Source
45. Skawld
The SDK runs on Node.js 18+ and Bun 1.1+ as an ESM-only package, so it fits cleanly into modern TypeScript projects without a build-step fight. The vendor describes a minimal setup as a single `Agent` instantiation with a provider, a tool set, and a session — you are running a streaming agent loop in under a dozen lines. Where it starts to strain is on the documentation side: the README is thin, full docs live off-repo at skawld.com/docs, and community reports are sparse given the early star count. Teams who need battle-tested enterprise support or a large ecosystem of pre-built integrations will hit that ceiling fast.
Free · Open Source
46. SynapCores Agent
The repo, published by SynapCores under MIT, routes all memory, retrieval, semantic tool selection, and generation through the SynapCores backend — one database as the entire brain. There is no LangChain, no separate vector store, no framework glue to audit or upgrade. The project ships a browser chat widget and a live debug sidebar so you can watch memory recall and tool routing decisions in real time. That transparency is the differentiating feature — and also the boundary: the agent's intelligence rides entirely on the SynapCores backend, whose self-hosted deployment requirements the repo does not fully document. Teams that need the backend running on-premise will hit that wall before they hit a code problem.
Free · Open Source
47. Tab Council
Orbit wraps agent coding work in a bounded loop: it selects a dependency-ordered task, hands it to whichever agent you've wired up, then requires passing tests, lint, and type checks before the task closes. Every run produces structured JSON — what the agent returned, how it scored against a rubric, and a human-readable progress log. Nothing advances on the agent's word alone. The ceiling appears when your workflow needs anything beyond single-task validation loops: multi-repo coordination, branching logic between tasks, or a hosted dashboard for non-engineering stakeholders all require you to build on top of Orbit yourself.
Free · Open Source
48. Tabbit
Orbit wraps agent execution in bounded, dependency-ordered tasks: one unit of work at a time, with tests, lint, and type checks acting as the gate before progress is recorded. Every run produces four structured artifacts — result JSON, rubric evaluation, a review recommendation, and a human-readable progress log — so code review has evidence instead of vibes. The agent-neutral contract means you can swap Claude, Codex, or Cursor behind the same harness and compare artifacts on identical task sets. The ceiling appears fast: Orbit is deliberately small, so teams that need scheduling across distributed workers or CI/CD pipeline integration will be adding that infrastructure themselves. It is a harness, not a platform.
Free · Open Source
49. Tabby
Open-source, self-hosted AI coding assistant with code completion, chat, and agentic automation.
Free
50. Vmette
The threat model vmette solves is concrete: prompt injection on a fetched web page, a malicious package in an AI-suggested install, or model output that does something you didn't intend — all of it lands inside the VM, not on your host. The isolation is hardware-level, not a container namespace that a determined process can escape. Because everything runs on-device, no agent output leaves your machine to a third-party cloud sandbox. The ceiling appears at the edges: vmette is macOS-only, and teams whose agents need to run on Linux servers or in CI pipelines will need a different isolation strategy.
Free · Open Source
51. Z3r0
Z3r0 is an open-source, self-hosted workbench where a coordinating agent (Z3r0/CSO) delegates to five specialist agents — code audit, recon, exploitation validation, reverse engineering, and cryptography — each scoped to a defined domain. Sessions run against a PostgreSQL-backed timeline log with replay, so long engagements survive interruptions and context window rollovers. WorkProject records tie every finding to authorized scope, targets, and sandbox bindings, which means the evidence chain stays intact when the model context doesn't. The wall appears when your engagement requires a specialist task not covered by the six fixed roles — there is no agent plugin system described in the docs, so teams extending scope are writing new agents from scratch.
Free · Open Source

Listings on this page are sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent — no money changes hands for inclusion.