cua
Summary
Generic sandboxes boot one OS, reset state manually, and hand you a single machine — fine for smoke tests, not for parallel RL rollouts across Linux, Windows, macOS, and Android simultaneously. Cua is the infrastructure layer built for exactly that gap.
Cua provisions cross-OS fleets from a single API, forks machine state over copy-on-write snapshots so you can reproduce failures without rebuilding from scratch, and serves pre-booted machines from warm pools that claim in milliseconds. The open-source Cua Driver runs background desktop automation on macOS and Windows — agents click, type, scroll, and inspect accessibility trees without stealing your cursor. Linux support in Cua Driver is in pre-release, so teams with Linux-heavy desktop workflows will hit that wall immediately. At scale, you either point your training loop at live warm pools or order verified trajectory datasets that arrive pre-packaged for your ingestion pipeline.
Bottom line: Cua is the right call when you need parallel agent evals across four OS families with snapshot-based rollback — it breaks down as your primary surface when the desktop automation you need runs on Linux, where the Driver is still pre-release.
Hosted & API Pricing
The open-source stack is free to self-host. These are the creator's hosted/API options.
Dedicated fleets
Hosted, BYOC, or on-prem
- Warm pools
- Verified data
- SOC 2-ready
Pricing may have changed since last verified. Check the official site for current plans.
Pricing Plans
Free tier
- Free tier on GitHub; dedicated fleets by request
Free
Open-source stack and free tier on GitHub
- Cua Driver (MIT)
- Basic sandbox and bench usage
Dedicated Fleets
Hosted, BYOC, or on-prem infrastructure
- Warm pools
- Verified data delivery
- SOC 2-ready
Pros
- One API boots Linux, Windows, macOS, and Android machines across six local runtimes or the cloud, so you stop maintaining separate provisioning scripts for each OS your agents target.
- Copy-on-write snapshot forking lets you branch from a known machine state for every parallel episode, which means failures reproduce against the exact environment that produced them — no manual state reconstruction.
- Warm pools serve pre-booted machines in milliseconds, so large parallel eval batches do not serialize on cold-start latency the way they do with on-demand VM provisioning.
- Cua Driver runs background desktop automation without capturing focus or the cursor, so an agent can operate continuously on a developer's machine without interrupting their session — the thing that makes persistent eval loops on shared hardware viable.
- MIT-licensed open-source control and eval layers mean you can audit, fork, and self-host the Driver and Bench components, so the core automation interface does not lock you into the vendor.
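The warm-pool claim described above can be illustrated with a conceptual sketch in plain Python. `WarmPool`, `Machine`, `claim`, and `release` are hypothetical names, not Cua's SDK; the sketch only shows why a claim from a pre-booted queue skips boot latency, with a cold boot as the fallback when the pool is empty. Scale-to-zero behavior is not modeled.

```python
import queue
import time
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical stand-in for a provisioned VM; not a Cua SDK type."""
    os_family: str
    booted_at: float = field(default_factory=time.monotonic)


class WarmPool:
    """Keeps pre-booted machines ready so a claim skips the boot step."""

    def __init__(self, os_family: str, size: int, boot_seconds: float = 0.0):
        self.os_family = os_family
        self.boot_seconds = boot_seconds  # simulated cold-start cost
        self._ready: "queue.Queue[Machine]" = queue.Queue()
        for _ in range(size):
            self._ready.put(self._cold_boot())  # pay boot cost up front

    def _cold_boot(self) -> Machine:
        time.sleep(self.boot_seconds)  # stand-in for real VM provisioning
        return Machine(self.os_family)

    def claim(self) -> tuple[Machine, bool]:
        """Return (machine, was_warm); fall back to a cold boot when empty."""
        try:
            return self._ready.get_nowait(), True
        except queue.Empty:
            return self._cold_boot(), False

    def release(self, machine: Machine) -> None:
        # A real system would reset or re-snapshot the machine before reuse.
        self._ready.put(machine)

    def available(self) -> int:
        return self._ready.qsize()


pool = WarmPool("linux", size=2)
first, warm_first = pool.claim()
second, warm_second = pool.claim()
third, warm_third = pool.claim()  # pool is empty, so this one cold-boots
```

With two pre-booted machines, the first two claims are warm and the third falls back to a cold boot; releasing a machine makes it claimable again.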
Cons
- Cua Driver's Linux desktop backend is in pre-release. Teams whose agents target native Linux apps cannot ship production automation against it: they either run macOS or Windows coverage and maintain a separate path for Linux, or wait on a release timeline the docs do not commit to.
- Verified trajectory datasets are produced and scored by Cua's own evaluators running on Cua's environments. Teams with strict data-provenance requirements or proprietary app surfaces that cannot be handed to a third-party fleet will need to run their own rollouts, which folds the full harness-management burden back onto them.
- The benchmark data the vendor surfaces — the best frontier agent clearing 6 of 25 expert KiCad tasks — scopes to a narrow expert domain. Teams trying to predict how their agent will perform on general enterprise UI workflows have precious little external validation data to anchor against, and will need to author their own Cua Bench evals before the infrastructure investment pays off.
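The 6-of-25 KiCad figure above is also a small sample. To get a rough sense of how little it pins down, you can compute a standard Wilson score interval on that pass rate: at 95% confidence it spans roughly 11% to 43%. The sketch below is generic statistics, not part of Cua Bench, and it assumes the 25 tasks are independent and equally weighted.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 gives ~95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - half, center + half


low, high = wilson_interval(6, 25)
print(f"pass rate {6 / 25:.0%}, 95% interval ~{low:.0%} to {high:.0%}")
```

An interval that wide is why teams generally need their own task suite before drawing conclusions about a specific workload.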
About
- Platforms
- macOS, Windows, Linux (pre-release), Android
- API Available
- Yes
- Self-Hosted
- Yes
- Last Updated
- 2026-06-18T03:26:26.824Z
Best For
Who it's for
- AI agent developers needing cross-OS desktop control
- Teams running large-scale eval or data-generation workloads
- Users requiring background, focus-preserving automation
- Organizations seeking open-source control layers with hosted scaling
What it does well
- Running parallel agent evaluations and RL loops on real machines
- Generating verified trajectory datasets for training
- Automating desktop UI tasks across multiple operating systems
- Reproducing failures from machine snapshots
- Benchmarking computer-use agents on native apps and mobile
Frequently Asked Questions
- Is cua free?
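- Partly. The open-source stack, including the MIT-licensed Cua Driver, and a basic free tier are available on GitHub; dedicated fleets (hosted, BYOC, or on-prem) are offered by request.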
- Is cua open source?
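- Yes, for the core components. The control and eval layers, Cua Driver and Cua Bench, are MIT-licensed; dedicated fleets are a hosted offering.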
- Does cua have an API?
- Yes. cua exposes a developer API. See the official documentation at https://cua.ai for details.
- Can I self-host cua?
- Yes. cua supports self-hosting on your own infrastructure.
- What platforms does cua support?
- cua is available on: macOS, Windows, Linux (pre-release), Android.
Most agent eval infrastructure treats a single VM as the unit of work. Cua treats a fleet as the unit. The core workflow is: boot machines across Linux, Windows, macOS, or Android via the Python or TypeScript SDK or the cua CLI, fork known machine states using copy-on-write snapshots, run parallel episodes across warm pools, and release machines back when the episode ends. Reward collection, failure reproduction, and dataset export all happen against the same fleet layer.
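The boot, run, and release loop described above can be sketched in plain Python. `boot`, `run_episode`, and `release` are hypothetical placeholders for the corresponding Cua SDK or CLI calls, whose real names and signatures are not given on this page, and snapshot forking is left out to keep the sketch short. The sketch shows the overall structure: one episode per machine, fanned out across OS families, with release in a `finally` block so that a crashed episode still returns its machine.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

OS_FAMILIES = ("linux", "windows", "macos", "android")
released: list[str] = []  # records hand-backs so the lifecycle can be checked


@dataclass
class Machine:
    """Hypothetical handle for a provisioned machine (not a Cua SDK type)."""
    machine_id: str
    os_family: str


def boot(os_family: str, index: int) -> Machine:
    # Placeholder for the SDK or CLI call that provisions or claims a machine.
    return Machine(machine_id=f"{os_family}-{index}", os_family=os_family)


def run_episode(machine: Machine) -> float:
    # Placeholder for the agent rollout; a real one would drive the agent
    # and score the final machine state. Returns a dummy reward of 0.0.
    return 0.0


def release(machine: Machine) -> None:
    # Placeholder for returning the machine to its pool.
    released.append(machine.machine_id)


def episode(os_family: str, index: int) -> tuple[str, float]:
    machine = boot(os_family, index)
    try:
        return machine.machine_id, run_episode(machine)
    finally:
        release(machine)  # always hand the machine back, even on failure


def run_batch(episodes_per_os: int = 2) -> dict[str, float]:
    jobs = [(os_family, i) for os_family in OS_FAMILIES for i in range(episodes_per_os)]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return dict(pool.map(lambda job: episode(*job), jobs))


rewards = run_batch()
```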
The differentiating feature is snapshot-native rollouts. Rather than tearing down and rebuilding a machine after each episode — the standard approach that serializes your batch throughput — Cua forks from a known snapshot, runs the rollout, and discards the fork. The vendor states warm pools scale to zero when idle and claim in milliseconds, which means large parallel batches do not wait on cold-start overhead.
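The fork-run-discard pattern can be modeled with Python's `collections.ChainMap`, which has copy-on-write semantics: reads fall through to the underlying mappings, while writes only touch the first one. This is an illustration of the semantics only. Machine state is modeled here as a hypothetical dict of file paths, whereas a real snapshot fork works at the disk and memory level inside the hypervisor.

```python
from collections import ChainMap

# Hypothetical "golden" machine state captured once, modeled as path -> contents.
base_snapshot = {"/etc/hostname": "eval-base", "/home/agent/notes.txt": ""}


def fork(snapshot: dict) -> ChainMap:
    """Copy-on-write fork: reads fall through to the snapshot, writes land in a
    fresh per-episode layer, so the snapshot itself is never modified."""
    return ChainMap({}, snapshot)


def rollout(state: ChainMap, episode_id: int) -> None:
    # Stand-in for an agent episode that mutates the machine.
    state["/home/agent/notes.txt"] = f"written by episode {episode_id}"
    state[f"/tmp/episode-{episode_id}.log"] = "ok"


forks = [fork(base_snapshot) for _ in range(3)]
for i, state in enumerate(forks):
    rollout(state, i)

# Each fork sees only its own writes; discarding a fork means dropping its top layer.
print(forks[1]["/home/agent/notes.txt"])  # written by episode 1
```

Because every fork shares the same untouched base, a failing episode can be reproduced by forking the same snapshot again, without rebuilding the machine.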
Cua Driver, the open-source MIT-licensed component, is the piece that sits closest to your agent’s tool calls. It runs as an MCP stdio server, a long-running daemon, or a one-shot shell command on the same binary — so the same harness serves both automated agent loops and manual human testing. Native desktop backends on macOS and Windows preserve the user session, meaning the agent operates in the background without interrupting whoever is logged in. Linux support is explicitly in pre-release, which limits the Driver’s usefulness for teams whose target surface is Linux desktop.
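When the Driver runs as an MCP stdio server, an agent harness talks to it using JSON-RPC 2.0 messages, one per line. The sketch below frames a `tools/call` request according to the MCP stdio transport rules. The tool name `click` and its `x`/`y` arguments are hypothetical, so a client should discover the Driver's real tools with a `tools/list` request before calling any of them.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call_message(tool_name: str, arguments: dict) -> str:
    """Frame an MCP `tools/call` request for the stdio transport: one JSON-RPC
    2.0 object per line, with no embedded newlines."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes any newline inside strings as "\n", so the only
    # literal newline in the frame is the trailing delimiter.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Hypothetical tool and arguments, for illustration only.
frame = tools_call_message("click", {"x": 120, "y": 48})
```

A real client would first perform MCP's `initialize` handshake and send the `notifications/initialized` notification. After that, it would write frames like this one to the Driver process's stdin and read one JSON response per line from its stdout.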
For teams that do not want to manage the harness at all, Cua offers verified trajectory datasets: the vendor runs the rollouts on the same environments and delivers pre-packaged, evaluator-scored trajectory data scoped by task and surface. BYOC and on-prem options are available for teams with data-residency requirements, and the vendor states SOC 2 readiness for the hosted offering.
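Suppose a delivered dataset arrived as JSON Lines. In that case, selecting the evaluator-scored trajectories for one surface could look like the sketch below. The JSON Lines format and the field names (`task`, `surface`, `score`, `steps`) are assumptions for illustration, since this page does not document Cua's actual delivery schema.

```python
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record shape; the real delivery schema is not documented here."""
    task: str
    surface: str
    score: float  # evaluator-assigned score
    steps: tuple  # sequence of (action, observation) pairs


def load_jsonl(stream) -> list[Trajectory]:
    records = []
    for line in stream:
        if line.strip():  # skip blank lines
            raw = json.loads(line)
            records.append(Trajectory(
                task=raw["task"],
                surface=raw["surface"],
                score=float(raw["score"]),
                steps=tuple(tuple(step) for step in raw["steps"]),
            ))
    return records


def select(records: list[Trajectory], surface: str, min_score: float) -> list[Trajectory]:
    """Keep trajectories for one surface that scored at or above a threshold."""
    return [r for r in records if r.surface == surface and r.score >= min_score]


sample = io.StringIO(
    '{"task": "export-pdf", "surface": "macos", "score": 1.0, "steps": [["click", "menu"]]}\n'
    '{"task": "export-pdf", "surface": "windows", "score": 0.5, "steps": []}\n'
    '{"task": "rename-file", "surface": "macos", "score": 0.2, "steps": []}\n'
)
kept = select(load_jsonl(sample), surface="macos", min_score=0.8)
print([r.task for r in kept])  # ['export-pdf']
```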
