---
name: browser-testing-agent-loop
description: Agentic loop that exercises a web app like a new user, files bug reports with screenshots + reproduction steps.
title: Browser Testing Agent Loop
category: agents-workflows
difficulty: advanced
license: MIT
source_url: "https://github.com/browser-use/browser-use"
icon: 🕷️
input: mixed
output: structured-json
phase: post
domain: ops
tags: browser-testing,agent-loop,playwright,vision-llm,qa-automation,bug-reporting,web-app-testing,agentic-workflow,screenshot-analysis,accessibility-testing,regression-detection,structured-bug-reports
best_for:
  - pre-release QA passes on web applications
  - discovery of visual and layout regressions
  - accessibility and UX issue detection
  - reducing manual QA effort on iterative builds
---

## Description

A reusable skill wrapping Playwright + a vision-capable LLM. Given a URL and an optional user story, it navigates the app, tries variations of user flows, and files structured bug reports (title, steps, expected, actual, severity, screenshot). Acts as a tireless QA pass for pre-release builds.
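
The bug report fields listed above could be modeled roughly as follows. This is a minimal sketch, not the skill's actual schema: the `BugReport` class, the `SEVERITIES` scale, and the `append_bug` helper are hypothetical names introduced for illustration, and the one-JSON-object-per-line layout is an assumption based on the `.jsonl` extension.

```python
import json
from dataclasses import asdict, dataclass

# Assumed severity scale; the skill itself does not specify one.
SEVERITIES = ("low", "medium", "high", "critical")


@dataclass
class BugReport:
    """One structured bug record with the fields named in the description."""
    title: str
    steps: list[str]   # reproduction steps, in the order they were performed
    expected: str
    actual: str
    severity: str
    screenshot: str    # path to the captured image (assumed to be a file path)

    def __post_init__(self) -> None:
        # Reject severities outside the assumed scale early.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    def to_jsonl(self) -> str:
        """Serialize the record as a single JSON line."""
        return json.dumps(asdict(self), ensure_ascii=False)


def append_bug(path: str, bug: BugReport) -> None:
    """Append one record to a JSON Lines file such as bugs.jsonl."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(bug.to_jsonl() + "\n")
```

Appending one line per bug means a run that is interrupted partway still leaves a readable file behind.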

## Why it works

Traditional E2E test suites catch regressions only in scenarios a human has already thought of. An agent with a vision model can also surface the messy class of bugs nobody scripted: broken layouts, elements that are reachable in the DOM but not actually visible, misleading copy. The report-then-continue pattern prevents one bug from blocking discovery of others, and the screenshot-plus-DOM snapshot gives developers a concrete repro artifact, not a hallucinated bug description.

## How it works

1. Spin up a headless browser (Playwright).
2. Navigate to the start URL; capture a full-page screenshot and the accessibility tree.
3. Feed both to the vision LLM with a system prompt such as: "You are a QA engineer; choose the next action that explores a new area of this app."
4. Parse the returned action: click, type, or navigate.
5. Execute the action and re-capture. If a condition from the user story is violated (404, network error, visual regression vs a baseline), emit a bug record with: title, reproduction steps from the action log, severity, and the attached screenshot.
6. Continue until the budget is hit or every link has been touched.
7. Output: `bugs.jsonl` plus a browsable HTML report.
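
The control flow of these steps can be sketched without committing to a particular browser or model client. In the sketch below, `observe`, `decide`, `execute`, and `check` are hypothetical callables supplied by the caller, and the JSON action shape (`kind`, `target`, `text`) is an assumption, not a format the skill defines. A `None` reply from `decide` stands in for "every link has been touched".

```python
import json
from typing import Callable, Optional

ALLOWED_ACTIONS = {"click", "type", "navigate"}


def parse_action(raw: str) -> dict:
    """Parse the model's reply into an action dict; reject anything unexpected."""
    action = json.loads(raw)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(action, dict) or action.get("kind") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return action


def describe(action: dict) -> str:
    """Render an action as one human-readable reproduction step."""
    return " ".join(str(action[k]) for k in ("kind", "target", "text") if k in action)


def run_loop(
    start_url: str,
    observe: Callable[[], dict],                      # returns screenshot path, tree, etc.
    decide: Callable[[dict, list], Optional[str]],    # model reply, or None when done
    execute: Callable[[dict], None],                  # performs the action in the browser
    check: Callable[[dict], Optional[dict]],          # returns {"title", "severity"} or None
    max_steps: int = 20,                              # the exploration budget
) -> list:
    first = {"kind": "navigate", "target": start_url}
    execute(first)
    action_log = [describe(first)]
    bugs: list = []

    def record_if_violated(observation: dict) -> None:
        violation = check(observation)
        if violation:
            # Report, then keep exploring: one bug does not stop the run.
            bugs.append({
                **violation,
                "steps": list(action_log),
                "screenshot": observation.get("screenshot"),
            })

    observation = observe()
    record_if_violated(observation)
    for _ in range(max_steps):
        raw = decide(observation, action_log)
        if raw is None:
            break  # nothing new left to explore
        try:
            action = parse_action(raw)
        except ValueError:
            continue  # malformed reply still consumes budget
        execute(action)
        action_log.append(describe(action))
        observation = observe()
        record_if_violated(observation)
    return bugs
```

With Playwright's Python API, `execute` would plausibly map `navigate` to `page.goto`, `click` to `page.click`, and `type` to `page.fill`, with `observe` calling `page.screenshot(full_page=True)`. Keeping those behind callables lets the loop be exercised with fakes before a real browser is attached.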