Runway
Summary
You ran the agent, got a diff, and have no idea whether it actually passed your tests or just claimed to — because nothing in the loop required proof. Orbit exists to close that gap.
Orbit wraps agent runs in bounded execution cycles: one task selected from a dependency-ordered backlog, real test and lint gates that must pass before the task closes, and a structured artifact trail left after every run. You get four output files — agent result, rubric evaluation, a human-readable progress log, and an accept/iterate/stop recommendation — so you can audit what happened instead of re-running it from memory. The deterministic replay demo runs without an API key, which means you can inspect the full loop before wiring in Claude, Codex, or any other JSON-speaking CLI. The tool is intentionally scoped: it handles the harness, not the agent. Teams that need the agent itself to do more will hit that boundary fast.
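A gate is a command whose exit status decides whether the task may close. The sketch below assumes each gate is a subprocess invocation. The pytest, ruff, and mypy commands are illustrative stand-ins, not Orbit's actual configuration, and GateResult, run_gate, and run_gates are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    returncode: int
    output: str


def run_gate(name: str, argv: list[str], cwd: str = ".") -> GateResult:
    """Run one validation command; exit status 0 counts as a pass."""
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    return GateResult(name, proc.returncode == 0, proc.returncode, proc.stdout + proc.stderr)


def run_gates(gates: dict[str, list[str]], cwd: str = ".") -> tuple[bool, list[GateResult]]:
    """Run every gate without short-circuiting, so one run records all failures."""
    results = [run_gate(name, argv, cwd) for name, argv in gates.items()]
    return all(r.passed for r in results), results


# Hypothetical gate set; replace with the commands your repository already uses.
DEFAULT_GATES = {
    "tests": [sys.executable, "-m", "pytest", "-q"],
    "lint": [sys.executable, "-m", "ruff", "check", "."],
    "types": [sys.executable, "-m", "mypy", "."],
}
```

Every gate runs even after an earlier one fails. A single run therefore records the lint and type results alongside a failing test, rather than stopping at the first error. The task closes only when the returned flag is True.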
Bottom line: Pick Orbit when you need reproducible, auditable evidence that an agent's work actually passed your test suite — not when you need the agent to do sophisticated multi-step reasoning that the harness itself cannot validate.
Pricing Plans
Subscription (last verified 2 days ago)
- Price: $12/mo
- Free Tier: 125 credits (one time), 3 video editor projects, 5GB asset storage, No Gen-4 Video
Free
For individuals looking to explore Runway's AI Tools and content creation features.
- 125 credits (one time)
- 125 credits = 25s of Gen-4 Turbo or Gen-3 Alpha Turbo
- Generative Video Gen-4 Turbo (Image to Video)
- Generative Image Gen-4 (Text to Image, References)
- Gemini 3 Pro
- Gemini 2.5
- Image Apps
- Generative Audio
- Text to Speech
- Audio Apps
- 3 video editor projects
- 5GB asset storage
- No Gen-4 Video
Standard
For individuals and small teams looking for more access, more AI Tools and more export options. Max. 5 users per workspace.
- 625 credits monthly
- Access to all Apps
- Ability to run Workflows
- Generative Video Gen-4.5 (Text + Image to Video)
- Aleph (Video Editing)
- Gen-4 (Image to Video)
- Act-Two (Performance Capture)
- Veo 3.1
- Veo 3
- Video Apps
- Access to all third party video models
- Access to all third party image models
- Upscale resolution
- Remove watermarks
- Credits refresh monthly
- No rate restrictions
- Buy more credits
- 100GB asset storage
- Unlimited video editor projects
- Technical support via Runway dashboard
Pro
For individuals and teams looking to add all of Runway's features into their workflows. Max. 10 users per workspace.
- 2250 credits monthly
- Create Custom Voices for Lip Sync and Text to Speech
- 500GB asset storage
- Everything in Standard
Max
Best value for heavy usage and for experimenting. Max. 10 users per workspace.
- 9500 credits monthly
- Unused credits roll over 1 month
- First access to newest models
- Highest generation volume
- Everything in Pro
Enterprise
For teams and organizations that need customization, advanced security and support.
- Scalable for large organizations
- All Pro Plan features
- Single sign-on
- Custom credit amounts
- Configurable organization and team spaces
- Advanced security and compliance
- Enterprise-wide onboarding
- Ongoing success program
- Priority support
- Integration with internal tools
- Workspace Analytics
View full pricing on runwayml.com →
Pricing may have changed since last verified. Check the official site for current plans.
Pros
- Validation gates require tests, lint, and type checks to pass before a task closes. A task therefore closes on evidence that your own checks passed, not on the agent's self-reported success.
- Four structured output artifacts per run (agent result, rubric evaluation, review recommendation, progress log), so you can audit any run after the fact without re-executing it.
- Agent-neutral contract — any JSON-speaking CLI plugs in behind the same harness — so comparing two coding agents means inspecting their artifacts under identical conditions instead of running separate experiments with no shared baseline.
- Dependency-aware backlog selection advances one verified task at a time, which means a broken task blocks downstream work rather than silently corrupting the next step.
- The deterministic replay demo runs without an API key, so you can fully inspect the harness loop before committing any cloud API spend or credentials.
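The dependency-aware selection rule from the backlog bullet above works like this: pick the first open task whose prerequisites are all done, and skip anything downstream of a failed task. The sketch below assumes a hypothetical backlog format in which each task lists the IDs it depends on. The status values are also hypothetical, because the vendor page does not publish Orbit's backlog schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "open"  # hypothetical states: "open", "done", "failed"


def blocked_ids(backlog: dict[str, Task]) -> set[str]:
    """Return every task that sits downstream of a failed task (assumes all dep IDs exist)."""
    blocked: set[str] = set()
    changed = True
    while changed:  # propagate failures transitively until nothing new is blocked
        changed = False
        for task in backlog.values():
            if task.id in blocked:
                continue
            for dep in task.depends_on:
                if backlog[dep].status == "failed" or dep in blocked:
                    blocked.add(task.id)
                    changed = True
                    break
    return blocked


def next_task(backlog: dict[str, Task]) -> Task | None:
    """Pick one open, unblocked task whose dependencies are all done, in backlog order."""
    blocked = blocked_ids(backlog)
    for task in backlog.values():
        if task.status != "open" or task.id in blocked:
            continue
        if all(backlog[dep].status == "done" for dep in task.depends_on):
            return task
    return None
```

Consider schema, api, and ui chained in that order, plus an independent docs task. A failure on api blocks ui, while docs can still be selected. This is the containment the bullet describes.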
Cons
- Orbit supplies the harness, not the agent, the tasks, or the test suite. Teams without existing automated test infrastructure will spend the first sprint writing those prerequisites rather than running orbits.
- The artifact schema and gate logic are defined by the harness contract; teams that need custom rubric dimensions or non-standard validation steps beyond tests, lint, and type checks will need to modify the project directly, since the vendor page describes no plugin or configuration surface for that.
- The vendor page describes the project as 'intentionally small', with no commercial offering and no published roadmap. Teams that need SLA-backed support, managed hosting, or a maintained integration ecosystem will likely be better served by a commercial agent orchestration platform than by maintaining a fork of a small open-source harness.
About
- Platforms: Linux, macOS, Windows (Python)
- API Available: Yes
- Self-Hosted: Yes
- Last Updated: 2026-06-07
Best For
Who it's for
- Teams evaluating AI coding agents experimentally
- Projects requiring reproducible, auditable agent runs
- Repositories with automated test and lint validation
- Harness engineers prototyping agent workflows
What it does well
- Testing and validating AI coding agents deterministically
- Running self-healing repository workflows with test gates
- Comparing multiple coding agents via artifact inspection
- Executing dependency-ordered task backlogs with agent verification
- Auditing agent behavior through structured progress logs
Frequently Asked Questions
- Is Runway free?
- Runway has a Free plan with 125 one-time credits, 3 video editor projects, and 5GB of asset storage. Paid plans start at $12/mo.
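- Is Runway open source?
- No. Runway is a commercial hosted service. The MIT license mentioned on this page applies to Orbit, the agent harness described in the summary.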
- Does Runway have an API?
- Yes. Runway exposes a developer API. See the official documentation at https://runwayml.com for details.
- Can I self-host Runway?
- No. Runway runs as a hosted service. The self-hosting noted under About refers to Orbit, which runs locally on your own machine.
- What platforms does Runway support?
- Runway runs as a web application. The Linux, macOS, and Windows (Python) support listed under About refers to Orbit.
Most agent runs leave you with a diff and a hope. Orbit structures that run into what the vendor calls an ‘orbit’: the harness selects one task from a dependency-ordered backlog, hands it to whatever coding agent you’ve configured, runs your tests, lint, and type checks as validation gates, and only closes the task if the agent can prove the result. Every run writes four artifacts — a structured agent-result.json, a rubric-scored evaluation.json, a review.json with an accept/iterate/stop recommendation, and a human-readable progress.md — so the evidence trail is inspectable long after the run completes.
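The four files named above could be written by one small function per run. The sketch below assumes hypothetical field names and a hypothetical decision policy:

- accept when every gate passed and the average rubric score is at least 0.8
- iterate while attempts remain
- otherwise stop

The vendor page names the files but not their schemas. Appending to progress.md across runs is also an assumption.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def recommend(gates_passed: bool, rubric_score: float, attempts: int, max_attempts: int = 3) -> str:
    """Hypothetical policy: accept proven work, retry while budget remains, else stop."""
    if gates_passed and rubric_score >= 0.8:
        return "accept"
    if attempts < max_attempts:
        return "iterate"
    return "stop"


def write_artifacts(run_dir: Path, task_id: str, agent_result: dict,
                    gate_results: dict[str, bool], rubric: dict[str, float],
                    attempts: int) -> str:
    """Write the four per-run artifacts; all field names here are illustrative."""
    run_dir.mkdir(parents=True, exist_ok=True)
    gates_passed = all(gate_results.values())
    score = sum(rubric.values()) / len(rubric) if rubric else 0.0
    decision = recommend(gates_passed, score, attempts)

    (run_dir / "agent-result.json").write_text(json.dumps(agent_result, indent=2))
    (run_dir / "evaluation.json").write_text(json.dumps(
        {"task": task_id, "gates": gate_results, "rubric": rubric, "score": score}, indent=2))
    (run_dir / "review.json").write_text(json.dumps(
        {"task": task_id, "recommendation": decision, "attempts": attempts}, indent=2))

    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [f"## {task_id} ({stamp})", f"- recommendation: {decision}"]
    lines += [f"- {name}: {'pass' if ok else 'FAIL'}" for name, ok in gate_results.items()]
    # Assumed behaviour: progress.md is appended to, so the log accumulates across runs.
    with (run_dir / "progress.md").open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")
    return decision
```

Keeping the recommendation in its own file means a reviewer can read review.json without parsing the rubric details.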
The differentiating feature is the gate-and-artifact contract, not the agent. Orbit is described on the vendor page as agent-neutral: Claude, Codex, Cursor, or any CLI that speaks JSON can be dropped behind the same harness. That means you compare agents by inspecting their artifacts under identical conditions instead of relying on anecdotal impressions from different sessions. The vendor explicitly calls out ‘adapter experiments’ as a supported workflow — swap the agent, replay the same task, diff the evaluation scores.
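Agent neutrality only needs a process boundary: the harness hands a task to a command and reads JSON back. The sketch below assumes a hypothetical contract: task JSON on stdin, a single JSON object on stdout. It also assumes a hypothetical "rubric" mapping inside each evaluation.json, and uses it to compute the per-dimension score diff an adapter experiment would inspect.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path


def run_agent(argv: list[str], task: dict, timeout: float = 600.0) -> dict:
    """Invoke any CLI agent under the assumed stdin/stdout JSON contract."""
    proc = subprocess.run(argv, input=json.dumps(task), capture_output=True,
                          text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(f"agent exited {proc.returncode}: {proc.stderr.strip()}")
    result = json.loads(proc.stdout)
    if not isinstance(result, dict):
        raise ValueError("agent must print a single JSON object")
    return result


def diff_scores(eval_a: Path, eval_b: Path) -> dict[str, float]:
    """Per-dimension rubric delta (b minus a) over the dimensions both runs scored."""
    a = json.loads(eval_a.read_text())["rubric"]
    b = json.loads(eval_b.read_text())["rubric"]
    return {k: round(b[k] - a[k], 3) for k in sorted(a.keys() & b.keys())}
```

Swapping Claude for Codex, for example, then changes only the argv passed to run_agent. The gates and evaluation stay identical, so the diff reflects the agents rather than the setup.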
Orbit fits narrowly: repositories with real automated test suites, teams that need reproducible audit trails for agent behavior, and harness engineers prototyping how agents handle a backlog. It does not supply the agent, the tasks, or the test suite, so teams missing any of those pieces will have to build them before Orbit adds value. The vendor page describes the project as ‘intentionally small,’ which is an honest signal about scope: contributions are welcomed specifically when they make the harness easier to verify or replay, not when they expand what the harness does.
The deterministic replay demo — invoked with MOCK=1 ./replay.sh auth-rescue — requires no API key and demonstrates the full selection, validation, and artifact-recording loop locally. The project is MIT licensed with no paid tier described on the vendor page.
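The vendor page does not say how MOCK=1 is wired internally. One plausible shape is an adapter switch that swaps the live agent for recorded JSON fixtures, which would make a replay deterministic and key-free. This is a hypothetical sketch, not Orbit's code. The fixture layout (one file per task ID) and the names mock_agent and select_agent are illustrative assumptions.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Callable, Mapping

Agent = Callable[[dict], dict]


def mock_agent(fixture_dir: Path) -> Agent:
    """Replay a recorded response per task ID: no network call, no API key."""
    def agent(task: dict) -> dict:
        return json.loads((fixture_dir / f"{task['id']}.json").read_text())
    return agent


def select_agent(live: Agent, fixture_dir: Path, env: Mapping[str, str] | None = None) -> Agent:
    """Use recorded fixtures when MOCK=1 (assumed switch), otherwise the live agent."""
    env = os.environ if env is None else env
    return mock_agent(fixture_dir) if env.get("MOCK") == "1" else live
```

Because the mock returns the same bytes on every call, the gates and artifacts downstream of it can be compared run to run without any model variance.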
