Resurf
Pricing
- Model: Free
Summary
Testing browser agents today means choosing between flaky real websites and static HTML benchmarks missing state and dynamics. Resurf offers a middle path: a synthetic e-commerce site that replays deterministically and injects failure modes.
Resurf is a framework for reproducible testing of AI browser agents. It is built on a single synthetic site (shop_v1) that exercises forms, multi-step checkout, auth, and inventory logic, and it records per-step DOM snapshots, screenshots, and token counts. It supports multiple agent adapters (browser-use, stagehand) and applies modifiers (failure-mode injectors for network latency, payment declines, and server errors) through YAML configuration, with no code changes. The core appeal is reproducibility: tests replay from SQLite snapshots with seeded randomness rather than hitting live sites. The limitations are structural: v0 ships only one site, the project has minimal GitHub activity (5 stars, 31 commits), and no independent benchmarks or community validation exist yet. This is an early-stage project, suited to teams willing to build on foundational abstractions before an ecosystem exists.
Bottom line: Pick this if your team needs deterministic agent testing and can accept a single e-commerce template. Once you need multiple domains or production-scale evaluation, you'll outgrow it quickly.
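To make the reproducibility claim concrete, here is a minimal sketch of the general technique (snapshot reset plus seeded randomness), using only Python's standard library. It is not Resurf's implementation: the `inventory` table, the function names, and the simulated "purchases" are all hypothetical stand-ins.

```python
import random
import sqlite3


def make_snapshot() -> sqlite3.Connection:
    """Build a tiny in-memory database standing in for a saved site state."""
    snap = sqlite3.connect(":memory:")
    snap.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER)")
    snap.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("mug", 5), ("lamp", 2), ("desk", 1)],
    )
    snap.commit()
    return snap


def reset_from_snapshot(snapshot: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the snapshot into a fresh database so every run starts identically."""
    live = sqlite3.connect(":memory:")
    snapshot.backup(live)  # copies the full database contents into `live`
    return live


def run_episode(snapshot: sqlite3.Connection, seed: int) -> list:
    """Reset state, seed a local RNG, and simulate three random purchases."""
    db = reset_from_snapshot(snapshot)
    rng = random.Random(seed)  # local RNG: no global state leaks between runs
    skus = [row[0] for row in db.execute("SELECT sku FROM inventory ORDER BY sku")]
    for _ in range(3):
        sku = rng.choice(skus)
        # Two-argument MAX is SQLite's scalar max; stock never drops below zero.
        db.execute(
            "UPDATE inventory SET stock = MAX(stock - 1, 0) WHERE sku = ?", (sku,)
        )
    db.commit()
    return db.execute("SELECT sku, stock FROM inventory ORDER BY sku").fetchall()
```

Calling `run_episode(snapshot, 42)` twice returns identical rows, and the snapshot itself is never modified. The same two ingredients (a pristine copy of state and a fixed seed) are what make a regression run comparable to the one before it.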
Pros
- Deterministic and reproducible test execution via SQLite reset and seeding
- Failure-mode injection enables testing resilience without real-world dependencies
- Auditable success evaluation based on database state rather than LLM judges
- Multiple adapter support (browser-use, stagehand, vision-only)
- Production-shaped synthetic site covers realistic flows (auth, multi-step checkout, returns)
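The idea of judging success from database state rather than an LLM judge can be sketched as a plain SQL check. This is an illustration under stated assumptions, not Resurf's evaluator: the `orders` schema, the `paid` status value, and the function names are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical post-run database with one completed order."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, user_email TEXT, sku TEXT, status TEXT)"
    )
    db.execute(
        "INSERT INTO orders (user_email, sku, status) VALUES (?, ?, ?)",
        ("ada@example.com", "mug", "paid"),
    )
    db.commit()
    return db


def task_succeeded(db: sqlite3.Connection, user_email: str, sku: str) -> bool:
    """Pass only if exactly one paid order for this user and SKU exists."""
    (count,) = db.execute(
        "SELECT COUNT(*) FROM orders "
        "WHERE user_email = ? AND sku = ? AND status = 'paid'",
        (user_email, sku),
    ).fetchone()
    return count == 1
```

A check like this is auditable because the verdict reduces to a query anyone can rerun against the final database, and it also catches duplicate orders that a judge reading a screenshot might miss.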
Cons
- Early v0 release with single synthetic site (shop_v1); expanding to more domains requires additional content work
- Limited to e-commerce domain in current version
- Requires Docker, Python 3.11+, Node 20+ (for stagehand), and Chromium
About
- Platforms: Docker, Python, Node.js, Chromium
- API Available: No
- Self-Hosted: Yes
- Last Updated: 2026-05-15T21:33:14.849Z
Best For
Who it's for
- AI browser agent developers and researchers
- QA teams validating agent robustness
- Teams building production-grade browser automation
- Researchers benchmarking multi-step agentic workflows
- Organizations needing deterministic testing environments
What it does well
- Testing browser agent performance against simulated failures and real-world scenarios
- Benchmarking multi-step interaction flows (forms, auth, checkout)
- Evaluating agent reasoning on ambiguous UI elements
- Reproducing deterministic test runs for regression testing
- Comparing agent adapter implementations (browser-use, stagehand)
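Config-driven failure injection, as described for the modifiers, might look roughly like the sketch below. Everything here is hypothetical: the config keys, the modifier names, the `/checkout/pay` path, and the `Response` type are illustrative and are not taken from Resurf's actual YAML schema. The config is written as a Python dict shaped like a parsed YAML file, so the example needs no third-party parser.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical config, shaped like what a parsed YAML file might contain.
CONFIG = {
    "seed": 7,
    "modifiers": [
        {"type": "payment_decline", "rate": 1.0},
        {"type": "server_error", "rate": 0.0, "status": 503},
        {"type": "latency", "delay_ms": 250},
    ],
}


@dataclass(frozen=True)
class Response:
    status: int
    body: str
    delay_ms: int = 0


def apply_modifiers(path: str, response: Response, config: dict,
                    rng: random.Random) -> Response:
    """Apply each configured modifier in order and return the altered response."""
    for mod in config["modifiers"]:
        kind = mod["type"]
        if kind == "latency":
            # Add artificial delay without touching status or body.
            response = replace(response, delay_ms=response.delay_ms + mod["delay_ms"])
        elif kind == "payment_decline" and path == "/checkout/pay":
            # rng.random() is in [0, 1), so rate 1.0 always fires, 0.0 never.
            if rng.random() < mod["rate"]:
                response = replace(response, status=402, body="card_declined")
        elif kind == "server_error":
            if rng.random() < mod["rate"]:
                response = replace(response, status=mod["status"], body="server_error")
    return response
```

With `rng = random.Random(CONFIG["seed"])`, a request to `/checkout/pay` comes back as a 402 with 250 ms of added delay, while any other path keeps its 200 status and only gains the delay. Drawing every random decision from one seeded generator keeps injected failures repeatable from run to run.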
Frequently Asked Questions
- Is Resurf free?
- Yes — Resurf is fully free to use. There is no paid tier.
- Is Resurf open source?
- Yes. Resurf is open source — the source repository is at https://github.com/lightfeed/resurf.
- Can I self-host Resurf?
- Yes. Resurf supports self-hosting on your own infrastructure.
- What platforms does Resurf support?
- Resurf is available on: Docker, Python, Node.js, Chromium.
