Bloom and Resurf are both coding assistants tracked by AIDiveForge. Below is a side-by-side comparison of pricing, capabilities, platforms, and ownership — sourced from each tool's live website and verified before publishing.
Resurf is a testing framework that provides deterministic, reproducible environments for validating AI browser agents, using synthetic websites and failure-mode injection.
| Attribute | Bloom | Resurf |
| --- | --- | --- |
| Pricing | Free | Free |
| Free trial | No | No |
| Open source | No | Yes |
| Has API | Yes | No |
| Self-hosted option | Yes | Yes |
| Platforms | Python; integrates with Anthropic and OpenAI models via LiteLLM; supports Weights & Biases | Docker, Python, Node.js, Chromium |
| Languages | Python | — |
| Released | 2025-12-20 | — |
Pros
Bloom:
- Reproducible, targeted evaluations that quantify the frequency and severity of a behavior across automatically generated scenarios
- Evaluations correlate strongly with hand-labelled judgments and reliably separate baseline models from intentionally misaligned ones
- Researchers can extensively configure Bloom's behavior by choosing the model for each stage and adjusting the length and modality of interactions
- Bloom evaluations took only a few days to conceptualize, refine, and generate
- Integrates with Weights & Biases for experiments at scale and exports Inspect-compatible transcripts
Resurf:
- Deterministic and reproducible test execution via SQLite reset and seeding
- Failure-mode injection enables resilience testing without real-world dependencies
- Auditable success evaluation based on database state rather than LLM judges
- Support for multiple adapters (browser-use, stagehand, vision-only)
- Production-shaped synthetic site covers realistic flows (auth, multi-step checkout, returns)
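To make the "SQLite reset and seeding" and "database state rather than LLM judges" points concrete, here is a minimal sketch of the general technique using only Python's standard library. It is not Resurf's actual code or API: the `orders` table, its columns, and the helper names `reset_and_seed`, `snapshot`, and `checkout_succeeded` are all hypothetical, chosen only for illustration.

```python
import random
import sqlite3


def reset_and_seed(conn, seed):
    """Rebuild a hypothetical orders table so its contents depend only on `seed`."""
    rng = random.Random(seed)  # local RNG: no dependence on global random state
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER, status TEXT)"
    )
    rows = [
        (f"SKU-{rng.randint(100, 999)}", rng.randint(1, 5), "pending")
        for _ in range(3)
    ]
    conn.executemany(
        "INSERT INTO orders (sku, qty, status) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def snapshot(conn):
    """Return the full table contents in a stable order."""
    return conn.execute(
        "SELECT id, sku, qty, status FROM orders ORDER BY id"
    ).fetchall()


def checkout_succeeded(conn, order_id):
    """Decide success from database state alone (hypothetical criterion)."""
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row is not None and row[0] == "paid"


conn = sqlite3.connect(":memory:")

reset_and_seed(conn, seed=42)
first = snapshot(conn)

# Stand-in for an agent run that completes checkout for order 1.
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")
conn.commit()
passed = checkout_succeeded(conn, 1)

# Resetting with the same seed restores the exact starting state.
reset_and_seed(conn, seed=42)
second = snapshot(conn)

print(first == second, passed, checkout_succeeded(conn, 1))
```

Because the pass/fail decision is a plain SQL query over a state that can be rebuilt from a seed, a reviewer can rerun the same scenario and inspect exactly why it passed or failed.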
Cons
Bloom:
- Bloom is only as robust as the seeds and judging logic that power it; teams should treat seeds as living governance artifacts, and for ambiguous or highly contextual behaviors, periodic manual review is still necessary
- Bloom's evaluation suite is unlikely to match the precise distribution of scenarios found in existing benchmarks, and since model behavior can be sensitive to context and prompt variations, direct comparisons are unreliable
Resurf:
- Early v0 release with a single synthetic site (shop_v1); expanding to more domains requires additional content work
- Limited to the e-commerce domain in the current version
- Requires Docker, Python 3.11+, Node 20+ (for stagehand), and Chromium
Bottom line
Resurf is open source; only Bloom exposes a public API. Choose based on which difference matters most for your workflow.
Comparison data is sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent.