RiddleRun
Pricing
- Model: Free
Summary
Maintaining E2E tests against a UI that keeps changing means your test scripts break faster than your engineers can fix them. RiddleRun replaces fragile selector-based scripts with an agent that reads your JSON-described user journey and navigates the browser on its own.
RiddleRun combines a CLI and an optional self-hosted web app, both running inside Docker, so your test environment travels with the repo rather than living on someone's laptop. You define a user journey in JSON — steps, assertions, expected outcomes — and a Playwright/browser-use agent executes the whole sequence autonomously. The Docker-first setup means teams can wire it into CI without installing a browser stack on the build machine. The project has two GitHub stars and one open issue at the time of curation, which signals early-stage maturity — documentation depth and community support are thin, and the agent's decision logic is largely a black box to the teams running it.
Bottom line: Pick RiddleRun for a self-hosted, AI-driven smoke test against a Wikipedia-style content site or any rapidly changing UI where rewriting selector-based tests every sprint is killing velocity — but plan for a Playwright-native alternative if your test suite needs deterministic assertions, fine-grained failure tracing, or enterprise support.
Pros
- JSON-defined test journeys decouple test authorship from code, so a product manager or QA analyst can write and update test cases without touching a Playwright script.
- Docker-first deployment means the entire test environment — browser, agent, backend — is version-controlled and reproducible, so 'works on my machine' test failures stop being a sprint tax.
- Autonomous agent execution adapts when UI elements shift position or change labels, so a redesign doesn't immediately invalidate your entire test suite the way selector-based tests do.
- Fully open-source with no paid tier, so there is no usage ceiling, no API key cost, and no vendor lock-in — the full source is forkable and auditable.
- Optional self-hosted web app alongside the CLI, so teams that want a visual interface for running and reviewing tests get one without leaving their own infrastructure.
Cons
- Agent decision logic is opaque: when a test fails, the JSON output and logs do not currently expose a step-by-step trace of what the agent attempted. Debugging a false negative on a critical checkout flow therefore requires re-running the test manually and watching the browser rather than reading a structured failure report.
- The project carries two GitHub stars and one open issue at curation, which means there is precious little community knowledge to draw on when the agent misinterprets a journey step; teams hit a wall and wait on the single maintainer rather than searching a forum or Stack Overflow thread.
- Complex assertion logic — verifying specific data values, confirming API responses correlate with UI state, or testing accessibility properties — is not described anywhere in the documented feature set; teams needing that depth will add a Playwright test layer alongside RiddleRun, at which point they are maintaining two systems.
- Teams whose CI pipeline requires parallel test execution across multiple environments will find no documented support for distributed runs; at the point where a single Docker container's serial execution makes the test suite a bottleneck, the likely move is to a Playwright-native framework or a hosted AI testing service with built-in parallelism.
About
- Platforms: Docker, CLI, self-hosted web app
- API Available: No
- Self-Hosted: Yes
- Last Updated: 2026-06-11T06:15:22.070Z
Best For
Who it's for
- Teams needing AI-assisted E2E tests
- Projects with rapidly changing UIs
- Docker-based test environments
What it does well
- Automated end-to-end webpage testing
- Agent-driven browser interaction verification
- JSON-defined user journey execution
- Self-hosted testing for teams
Frequently Asked Questions
- Is RiddleRun free?
- Yes — RiddleRun is fully free to use. There is no paid tier.
- Is RiddleRun open source?
- Yes. RiddleRun is open source.
- Can I self-host RiddleRun?
- Yes. RiddleRun supports self-hosting on your own infrastructure.
- What platforms does RiddleRun support?
- RiddleRun is available on: Docker, CLI, self-hosted web app.
RiddleRun is an open-source agentic browser testing tool that executes E2E test cases through a Playwright/browser-use agent rather than hand-written selector scripts. The core workflow is JSON-in, test-run-out: you describe a user journey as a JSON file, pass it to the CLI or the web UI, and the agent navigates a real browser to verify that journey end-to-end. Everything runs inside Docker, and the repo ships both a standard compose file and a production compose file so the path from local to server is a config swap, not a rebuild.
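To make the JSON-in, test-run-out idea concrete, here is a minimal Python sketch of what a journey file and a pre-flight check might look like. The field names (`name`, `start_url`, `steps`, `action`, `expect`) and the `validate_journey` helper are hypothetical and chosen only for illustration; they are not RiddleRun's documented schema or API, so check the repo for the real format.

```python
import json

# Hypothetical journey document. The field names are illustrative
# assumptions, not RiddleRun's documented schema.
JOURNEY_JSON = """
{
  "name": "search smoke test",
  "start_url": "https://example.com",
  "steps": [
    {"action": "Open the search box and search for 'docker'",
     "expect": "A results list is shown"},
    {"action": "Open the first result",
     "expect": "An article page with a title is shown"}
  ]
}
"""

REQUIRED_TOP_LEVEL = ("name", "start_url", "steps")
REQUIRED_STEP_KEYS = ("action", "expect")


def validate_journey(raw: str) -> dict:
    """Parse a journey and fail fast on missing fields before any browser starts."""
    journey = json.loads(raw)
    for key in REQUIRED_TOP_LEVEL:
        if key not in journey:
            raise ValueError(f"journey is missing required field: {key}")
    if not journey["steps"]:
        raise ValueError("journey must contain at least one step")
    for index, step in enumerate(journey["steps"], start=1):
        for key in REQUIRED_STEP_KEYS:
            if key not in step:
                raise ValueError(f"step {index} is missing required field: {key}")
    return journey


journey = validate_journey(JOURNEY_JSON)
print(f"{journey['name']}: {len(journey['steps'])} steps")
```

A check like this could run in CI ahead of the agent, so a malformed journey fails in milliseconds instead of after a browser session has started.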
The differentiating feature is the autonomous agent execution loop. Instead of breaking when a button moves or a class name changes, the agent interprets intent from the JSON description and finds its own path through the page — the same way a human tester would adapt rather than quit. This makes RiddleRun a credible option for UIs under active development where conventional test scripts would require constant maintenance.
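The difference between selector-based and intent-based lookup can be sketched with a toy example. This is not RiddleRun's agent logic, which is not documented at this level; it swaps in plain string similarity from Python's standard `difflib` as a stand-in for whatever the agent actually does, and the page dictionaries, element ids, and the 0.4 threshold are made up for illustration.

```python
from difflib import SequenceMatcher

# Toy page models mapping element id -> visible label. Both are invented
# for illustration; PAGE_AFTER simulates a redesign that renames ids and labels.
PAGE_BEFORE = {"btn-submit": "Submit order", "btn-cancel": "Cancel"}
PAGE_AFTER = {"cta-primary": "Place order", "cta-secondary": "Cancel"}


def find_by_selector(page, element_id):
    """Selector-style lookup: returns None as soon as the id is renamed."""
    return element_id if element_id in page else None


def find_by_intent(page, intent, threshold=0.4):
    """Intent-style lookup: pick the element whose label best matches the intent text.

    String similarity is only a stand-in here (an assumption of this sketch);
    a real agent would reason over the rendered page instead.
    """
    best_id, best_score = None, 0.0
    for element_id, label in page.items():
        score = SequenceMatcher(None, intent.lower(), label.lower()).ratio()
        if score > best_score:
            best_id, best_score = element_id, score
    return best_id if best_score >= threshold else None


# The hard-coded selector survives only on the old page.
print(find_by_selector(PAGE_BEFORE, "btn-submit"))
print(find_by_selector(PAGE_AFTER, "btn-submit"))

# The intent description still resolves to an element after the redesign.
print(find_by_intent(PAGE_AFTER, "Submit order"))
```

The same trade-off noted elsewhere in this listing shows up even in the toy version: the intent lookup keeps working after the redesign, but which element it picks depends on a score and a threshold rather than on an exact, deterministic match.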
RiddleRun fits teams that want AI-assisted coverage without a hosted testing SaaS and are comfortable owning the infrastructure. It fits less well when you need deterministic pass/fail logic tied to specific DOM states, full debugging traces of what the agent clicked and why, or a mature plugin ecosystem. The project is at an early public stage — community reports and issue history are sparse — so teams that hit an edge case are largely on their own until the maintainer responds. Teams requiring audit-grade test logs or multi-environment parallelism are the most likely to exhaust what this tool currently offers and move to a maintained Playwright framework or a commercial AI testing platform.
