Skip to content

feat(copilot): Add agent-browser multi-step browser automation tools#12230

Merged
majdyz merged 29 commits intodevfrom
feat/agent-browser-copilot
Mar 3, 2026
Merged

feat(copilot): Add agent-browser multi-step browser automation tools#12230
majdyz merged 29 commits intodevfrom
feat/agent-browser-copilot

Conversation

@majdyz
Copy link
Contributor

@majdyz majdyz commented Feb 27, 2026

Summary

Adds three new Copilot tools for multi-step browser automation using the agent-browser CLI (Playwright-based local daemon):

  • browser_navigate β€” navigate to a URL and get an accessibility-tree snapshot with @ref IDs
  • browser_act β€” interact with page elements (click, fill, scroll, check, press, select, dblclick, type, wait, back, forward, reload); returns updated snapshot
  • browser_screenshot β€” capture annotated screenshot (with @ref overlays) and save to user workspace

Also adds browse_web (Stagehand + Browserbase) for one-shot JS-rendered page extraction.

Why two browser tools?

Tool When to use
browse_web Single-shot extraction β€” cloud Browserbase session, no local daemon needed
browser_navigate / browser_act Multi-step flows (login β†’ navigate β†’ scrape), persistent session within a Copilot session

Design decisions

  • SSRF protection: Uses the same validate_url() from backend.util.request as HTTP blocks β€” async DNS, all IPs checked, full RFC 1918 + link-local + IPv6 coverage
  • Session isolation: _run() passes both --session <id> (isolated Chromium context per Copilot session) and --session-name <id> (persist cookies within a session), preventing cross-session state leakage while supporting login flows
  • Snapshot truncation: Interactive-only accessibility tree (snapshot -i) capped at 20 000 chars with a continuation hint
  • Screenshot storage: PNG bytes uploaded to user workspace via WriteWorkspaceFileTool; returns file_id for retrieval

Bugs fixed in this PR

  • Session isolation bug: --session-name alone shared browser history across different Copilot sessions; added --session to isolate contexts
  • Missing actions: added dblclick, type (append without clearing), wait (CSS selector or ms delay)

Test plan

  • 53 unit tests covering all three tools, all actions, SSRF integration, auth check, session isolation, snapshot truncation, timeout, missing binary
  • Integration test: real agent-browser CLI + Anthropic API tool-calling loop (3/3 scenarios passed)
  • Linting (Ruff, isort, Black, Pyright) all passing
backend/copilot/tools/agent_browser_test.py  53 passed in 17.79s

Adds a new `browse_web` copilot tool backed by Stagehand + Browserbase
that handles JavaScript-rendered pages, SPAs, and dynamic content that
the existing `web_fetch` tool cannot reach.

- BrowseWebTool: navigates URLs with a real browser, extracts content
  via natural language instruction using Stagehand's extract() API
- Ephemeral sessions per call β€” no external storage needed
- Thread-safe signal handling for the CoPilot executor thread pool
- 50K char content limit to protect LLM context window
- Graceful degradation when Stagehand env vars are not configured
- Registered in TOOL_REGISTRY alongside web_fetch

Requires: STAGEHAND_API_KEY, STAGEHAND_PROJECT_ID, ANTHROPIC_API_KEY
Adds browse_web to the ResponseType enum in the OpenAPI schema to match
the new BrowseWebTool response type added to the backend.
- Lazy-import Stagehand inside _execute with ImportError guard so missing
  stagehand package never breaks other tools in the registry
- Remove _thread_safe_signal context manager (redundant + globally mutates
  signal.signal from worker threads, which is harmful)
- Add _patch_stagehand_once() with double-checked locking β€” thread-safe
  one-shot monkey-patch identical to the pattern in blocks/stagehand/blocks.py
- Add explicit timeoutMs to page.goto() (30 s) and page.extract() (60 s)
- Fix truncation: compute keep = MAX - len(suffix) to stay within cap
- Catch Exception without binding `e`; log via logger.exception(); return
  generic message to avoid leaking raw exception text to callers
37 tests covering:
- Tool metadata (name, requires_auth, parameter schema, registry registration)
- Input validation (missing/empty/non-HTTP URLs)
- Env var checks (missing STAGEHAND_API_KEY, STAGEHAND_PROJECT_ID, ANTHROPIC_API_KEY)
- Stagehand absent (ImportError β†’ ErrorResponse, other tools unaffected)
- Successful browse (response shape, instruction forwarding, timeout constants,
  session_id propagation, client.close() called)
- Truncation (within limit, over limit, exact limit, empty/None extraction)
- Error handling (generic message, no raw exception leakage, finally close,
  close() failure does not propagate)
- Thread safety (_patch_stagehand_once idempotence, worker-thread no-op)

All tests run without a running server or database via sys.modules injection.
Implement three new Copilot tools backed by the vercel-labs/agent-browser
CLI β€” a local Playwright daemon that persists browser sessions (cookies,
auth) across tool calls via --session-name:

- browser_navigate: Navigate to a URL, returns accessibility tree snapshot
  with @ref IDs for element targeting.
- browser_act: Perform page interactions (click, fill, scroll, hover, press,
  check, uncheck, select, back, forward, reload) and return updated snapshot.
- browser_screenshot: Capture annotated screenshot saved to workspace files
  so the user can view it via read_workspace_file.

Because the Copilot is already multi-turn via the Claude Agent SDK, each
tool call is a single browser action β€” the LLM chains them naturally to
handle login flows, SPAs, and multi-step interactions.

Security: SSRF protection via DNS resolution + RFC1918/loopback/link-local
block in _ssrf_check() applied before every navigation.

52 unit tests, all passing.
agent_browser.py had a hand-rolled _ssrf_check() using the synchronous
socket.gethostbyname() (first IPv4 only) and Python's addr.is_private
attribute flags. This missed:
- DNS rebinding (only first resolved IP was checked)
- 0.0.0.0/8 "This Network" range
- Full IPv6 blocked ranges (fc00::/7, fe80::/10, ff00::/8)
- IDNA encoding / Unicode domain attacks

Replace with validate_url() from backend.util.request β€” the same guard
used by HTTP blocks and web_fetch β€” which:
- Uses async getaddrinfo, resolving ALL IPs and blocking if any is blocked
- Uses an explicit BLOCKED_IP_NETWORKS list with complete IPv4+IPv6 coverage
- Applies IDNA encoding to prevent Unicode domain attacks
- Validates hostname character safety

Remove the now-unused imports (ipaddress, socket, urllib.parse).
Update tests: remove TestSsrfCheck (tested deleted function), add
TestSsrfViaValidateUrl which mocks validate_url at the integration boundary.
- Use both --session (isolate context) and --session-name (persist cookies)
  in _run() to prevent cross-session browser state leakage while keeping
  auth/cookies alive within the same Copilot session
- Add dblclick, type, wait actions to BrowserActTool: type appends text
  without clearing (unlike fill), wait accepts a CSS selector or ms string
- Add 11 new unit tests covering new actions and validation (53 total)
@majdyz majdyz requested a review from a team as a code owner February 27, 2026 10:13
@majdyz majdyz requested review from 0ubbe and kcze and removed request for a team February 27, 2026 10:13
@github-project-automation github-project-automation bot moved this to πŸ†• Needs initial review in AutoGPT development kanban Feb 27, 2026
@github-actions
Copy link
Contributor

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot changed the base branch from master to dev February 27, 2026 10:13
@github-actions github-actions bot added platform/frontend AutoGPT Platform - Front end platform/backend AutoGPT Platform - Back end labels Feb 27, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▢️ Resume reviews
  • πŸ” Trigger review

Walkthrough

Adds four browser-capable copilot tools (browse_web, browser_navigate, browser_act, browser_screenshot), new response models, a registry-driven availability filter, subprocess/session helpers, and comprehensive unit tests for the new tooling.

Changes

Cohort / File(s) Summary
Tool Registry
autogpt_platform/backend/backend/copilot/tools/__init__.py
Imported new browser tools; added browse_web, browser_navigate, browser_act, browser_screenshot to TOOL_REGISTRY; replaced static tool list with get_available_tools() that filters by tool.is_available.
Agent Browser Tools
autogpt_platform/backend/backend/copilot/tools/agent_browser.py
New agent-browser module: BrowserNavigateTool, BrowserActTool, BrowserScreenshotTool; subprocess helper _run, snapshot helper _snapshot, session-scoped behavior, validation, error handling, and workspace upload integration for screenshots.
Browse Web Tool
autogpt_platform/backend/backend/copilot/tools/browse_web.py
New BrowseWebTool using Stagehand (lazy import) and Anthropic extraction with env-var checks, one-time Stagehand patch, timeouts, truncation, and cleanup/error handling.
Models & Frontend Schema
autogpt_platform/backend/backend/copilot/tools/models.py, autogpt_platform/frontend/src/app/api/openapi.json
Added BROWSE_WEB, BROWSER_NAVIGATE, BROWSER_ACT, BROWSER_SCREENSHOT to ResponseType; added BrowseWebResponse, BrowserNavigateResponse, BrowserActResponse, BrowserScreenshotResponse; updated OpenAPI enum.
Base Tool Availability
autogpt_platform/backend/backend/copilot/tools/base.py
Added is_available property to BaseTool (default True) for runtime availability filtering.
Service Integration
autogpt_platform/backend/backend/copilot/service.py
Switched import/usage from static tools list to get_available_tools() when streaming chat/tool data.
Tests: Agent Browser
autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
Extensive unit tests covering metadata, input validation, SSRF checks, action flows, snapshot handling, timeouts, session propagation, and mocked subprocess behavior.
Tests: Browse Web
autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
Comprehensive tests for BrowseWebTool: env/config validation, Stagehand absence handling, extraction success/failure, truncation, timeouts, client closure, and thread-safety/idempotence of Stagehand patch.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BrowseWebTool
    participant Stagehand
    participant Anthropic

    Client->>BrowseWebTool: browse_web(url, instruction, session_id)
    BrowseWebTool->>BrowseWebTool: validate URL & env config
    BrowseWebTool->>Stagehand: init client (lazy import) / navigate
    Stagehand-->>BrowseWebTool: page snapshot / url
    BrowseWebTool->>Anthropic: extract content (instruction)
    Anthropic-->>BrowseWebTool: extracted content
    BrowseWebTool->>BrowseWebTool: truncate if needed
    BrowseWebTool-->>Client: BrowseWebResponse(url, content, truncated)
Loading
sequenceDiagram
    participant Client
    participant BrowserNavigateTool
    participant AgentBrowser
    participant BrowserActTool
    participant BrowserScreenshotTool
    participant Workspace

    Client->>BrowserNavigateTool: browser_navigate(url, session_id)
    BrowserNavigateTool->>AgentBrowser: _run navigate
    AgentBrowser-->>BrowserNavigateTool: title, url, snapshot
    BrowserNavigateTool-->>Client: BrowserNavigateResponse

    Client->>BrowserActTool: browser_act(action, target, session_id)
    BrowserActTool->>AgentBrowser: _run action
    AgentBrowser-->>BrowserActTool: snapshot, current_url
    BrowserActTool-->>Client: BrowserActResponse

    Client->>BrowserScreenshotTool: browser_screenshot(session_id, annotate)
    BrowserScreenshotTool->>AgentBrowser: _run screenshot
    AgentBrowser-->>BrowserScreenshotTool: base64 image
    BrowserScreenshotTool->>Workspace: WriteWorkspaceFileTool upload
    Workspace-->>BrowserScreenshotTool: file_id, filename
    BrowserScreenshotTool-->>Client: BrowserScreenshotResponse
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

Possible security concern, Review effort 4/5

Suggested reviewers

  • Pwuts
  • ntindle

Poem

🐰 I hopped through tabs and tiny beams of light,

I clicked and captured pages late at night.
Snapshots, scripts, sessions snug and bright,
New tools and tests β€” I bounced with pure delight! ✨

πŸš₯ Pre-merge checks | βœ… 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
βœ… Passed checks (2 passed)
Check name Status Explanation
Title check βœ… Passed The PR title accurately summarizes the main feature addition: introducing agent-browser multi-step browser automation tools for the Copilot platform.
Description check βœ… Passed The PR description provides comprehensive context about the changes, including the three new tools added, design decisions, session isolation handling, SSRF protection, and test coverage details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
πŸ§ͺ Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/agent-browser-copilot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❀️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

πŸ” PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

πŸ”΄ Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

🟑 Medium Risk β€” Some Line Overlap

These PRs have some overlapping changes:

🟒 Low Risk β€” File Overlap Only

These PRs touch the same files but different sections (click to expand)

Summary: 4 conflict(s), 2 medium risk, 2 low risk (out of 8 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_browser.py`:
- Line 205: The wait invocations currently drop _run's result so failures are
ignored; update the calls to check _run(session_name, "wait", "--load",
wait_for) (and the similar call around the other occurrence) for a
non-zero/failed return and propagate the error by raising an exception or
returning a failing ToolResult instead of proceedingβ€”i.e., capture the return
value from _run, test it, and if it indicates failure include context
(session_name, wait_for) and abort the tool flow so downstream steps don't get
stale snapshots.
- Around line 462-463: The current annotate coercion uses
bool(kwargs.get("annotate", True)) which treats any non-empty string like
"false" as True; change the assignment for annotate in agent_browser.py to
coerce strings explicitly (e.g., read raw = kwargs.get("annotate", True); if
isinstance(raw, str): annotate = raw.strip().lower() not in
("false","0","no","off","n"); else: annotate = bool(raw)) so string values like
"false", "0", "no" become False while preserving existing boolean/non-string
behavior.
- Around line 105-109: The current truncation in agent_browser.py slices text to
_MAX_SNAPSHOT_CHARS then appends the truncation suffix, which can exceed
_MAX_SNAPSHOT_CHARS; update the logic that handles variable text (the text
variable near the truncation block) to compute the suffix string (e.g.,
"\n\n[Snapshot truncated β€” use browser_act to navigate further]") and slice text
to _MAX_SNAPSHOT_CHARS - len(suffix) before appending the suffix so the final
string length never exceeds _MAX_SNAPSHOT_CHARS; reference the
_MAX_SNAPSHOT_CHARS constant and the truncation suffix/variable when making the
change.
- Around line 82-90: The timeout branch currently returns without killing or
reaping the created subprocess (proc), causing orphaned agent-browser processes;
update the except asyncio.TimeoutError handler to first kill/terminate the
subprocess (call proc.kill() or proc.terminate()), then await proc.wait() or
proc.communicate() to reap it before returning the timeout result so the child
process is not leaked; target the proc variable used when creating the
subprocess and the asyncio.wait_for(proc.communicate(), timeout=timeout) call so
the termination and reaping happen reliably on timeouts.

In `@autogpt_platform/backend/backend/copilot/tools/browse_web.py`:
- Around line 136-141: Replace the naive scheme check in the browse_web handler
with the shared SSRF guard by calling validate_url(url) before invoking
Stagehand: remove the startswith(("http://","https://")) check, call
validate_url(url) and if it returns an error/raises, return an ErrorResponse
(preserving session_id) with the validate_url-provided message/error; also
ensure validate_url is imported where browse_web (or the function containing the
shown snippet) is defined so DNS/IP/IDNA checks run prior to Stagehand calls.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between b9aac42 and afd7ddf.

πŸ“’ Files selected for processing (7)
  • autogpt_platform/backend/backend/copilot/tools/__init__.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/models.py
  • autogpt_platform/frontend/src/app/api/openapi.json
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: types
  • GitHub Check: Seer Code Review
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Check PR Status
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
πŸ““ Path-based instructions (6)
autogpt_platform/backend/**/*.py

πŸ“„ CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

Files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/__init__.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/models.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
autogpt_platform/backend/**/*.{py,txt}

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use poetry run prefix for all Python commands, including testing, linting, formatting, and migrations

Files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/__init__.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/models.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
autogpt_platform/backend/**/*_test.py

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

autogpt_platform/backend/**/*_test.py: Always review snapshot changes with git diff before committing when updating snapshots with poetry run pytest --snapshot-update
Use pytest with snapshot testing for API responses in test files
Colocate test files with source files using the *_test.py naming convention

Files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
autogpt_platform/backend/backend/**/*.py

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings

Files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/__init__.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/models.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
autogpt_platform/**/*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/__init__.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/models.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
autogpt_platform/backend/**/*test*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
🧠 Learnings (6)
πŸ“š Learning: 2026-01-28T18:29:34.362Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/frontend/src/tests/CLAUDE.md:0-0
Timestamp: 2026-01-28T18:29:34.362Z
Learning: Applies to autogpt_platform/frontend/src/tests/src/tests/**/*.spec.ts : Use E2E tests (Playwright) for flows requiring real browser APIs (clipboard, downloads) or cross-page navigation

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
πŸ“š Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/__init__.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/models.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
πŸ“š Learning: 2026-02-27T07:26:32.993Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12213
File: autogpt_platform/frontend/src/app/(platform)/copilot/tools/RunMCPTool/helpers.tsx:23-24
Timestamp: 2026-02-27T07:26:32.993Z
Learning: In autogpt_platform/frontend/src/app/(platform)/copilot/tools/**/helpers.tsx files, inline TypeScript interfaces for tool response types (e.g., MCPToolsDiscoveredOutput, BlockDetailsResponse) are intentional for SSE stream payloads that don't appear in openapi.json. Only ResponseType enum values are generated. This pattern should not be flagged for replacement with generated types.

Applied to files:

  • autogpt_platform/frontend/src/app/api/openapi.json
  • autogpt_platform/backend/backend/copilot/tools/models.py
πŸ“š Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
πŸ“š Learning: 2026-02-04T16:50:20.508Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.508Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Use pytest with snapshot testing for API responses in test files

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
πŸ“š Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/backend/api/features/**/*.py : Update routes in '/backend/backend/api/features/' and add/update Pydantic models in the same directory for API development

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/models.py
🧬 Code graph analysis (3)
autogpt_platform/backend/backend/copilot/tools/__init__.py (2)
autogpt_platform/backend/backend/copilot/tools/agent_browser.py (3)
  • BrowserActTool (234-410)
  • BrowserNavigateTool (118-219)
  • BrowserScreenshotTool (418-520)
autogpt_platform/backend/backend/copilot/tools/browse_web.py (1)
  • BrowseWebTool (68-227)
autogpt_platform/backend/backend/copilot/tools/browse_web.py (3)
autogpt_platform/backend/backend/copilot/model.py (1)
  • ChatSession (126-302)
autogpt_platform/backend/backend/copilot/tools/base.py (1)
  • BaseTool (16-119)
autogpt_platform/backend/backend/copilot/tools/models.py (3)
  • BrowseWebResponse (447-453)
  • ErrorResponse (209-214)
  • ToolResponseBase (60-65)
autogpt_platform/backend/backend/copilot/tools/agent_browser.py (3)
autogpt_platform/backend/backend/util/request.py (3)
  • request (365-402)
  • validate_url (147-215)
  • text (303-313)
autogpt_platform/backend/backend/copilot/tools/models.py (5)
  • BrowserActResponse (508-514)
  • BrowserNavigateResponse (499-505)
  • BrowserScreenshotResponse (517-522)
  • ErrorResponse (209-214)
  • ToolResponseBase (60-65)
autogpt_platform/backend/backend/copilot/tools/workspace_files.py (2)
  • WorkspaceWriteResponse (210-219)
  • WriteWorkspaceFileTool (524-701)
πŸ”‡ Additional comments (9)
autogpt_platform/backend/backend/copilot/tools/models.py (5)

44-49: LGTM!

The new enum values follow the existing naming convention (lowercase snake_case) and are logically grouped with clear comments explaining their purpose.


447-453: LGTM!

The model is consistent with the existing WebFetchResponse pattern while appropriately omitting HTTP-specific fields (status_code, content_type) since Stagehand abstracts those details.


496-505: LGTM!

The model captures all relevant navigation response data. The inline comment helpfully clarifies that snapshot contains the interactive accessibility tree with @ref IDs.


508-514: LGTM!

The model appropriately captures the action performed and resulting state. The empty string default for current_url is reasonable for edge cases where the URL may be temporarily unavailable (e.g., during navigation transitions).


517-522: LGTM!

The model appropriately returns a workspace file reference rather than raw bytes, which is suitable for JSON serialization. The inline comment helpfully explains how to retrieve the actual screenshot.

autogpt_platform/frontend/src/app/api/openapi.json (1)

11170-11173: ResponseType enum update looks correct.

Line 11170-Line 11173 correctly add the new browser-related tool response types and align the OpenAPI contract with the backend tool additions.

autogpt_platform/backend/backend/copilot/tools/__init__.py (1)

55-60: Nice registry wiring for the new browser tool surface.

These additions are consistent with the existing registry pattern and keep OpenAI tool export behavior unchanged.

autogpt_platform/backend/backend/copilot/tools/browse_web_test.py (1)

145-486: Strong coverage for BrowseWebTool behavior and failure paths.

This suite meaningfully exercises config gating, import-failure isolation, truncation boundaries, cleanup guarantees, and thread-safety patch behavior.

autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py (1)

43-749: Comprehensive unit coverage for the new agent-browser tools.

The suite thoroughly validates auth gating, SSRF integration, action validation/execution, screenshot handling, and subprocess helper edge cases.

CI failure:
- WorkspaceWriteResponse constructor in test now includes mime_type and
  download_url (required fields added in dev branch)

CodeRabbit fixes:
- _run(): kill orphaned subprocess on TimeoutError to prevent process leak
- _snapshot(): fix truncation: suffix was appended after 20k cap, making
  output exceed _MAX_SNAPSHOT_CHARS (same fix as browse_web)
- BrowserNavigateTool: log wait() failures instead of silently dropping rc
- BrowserActTool: log post-action settle failures instead of silently dropping rc
- BrowserScreenshotTool: fix annotate coercion β€” bool("false") was True;
  now handles string "false"/"no"/"0" correctly
- browse_web: replace scheme-only startswith() check with shared validate_url()
  for full SSRF protection (same guard as agent_browser and HTTP blocks)
- Empty except blocks: add explanatory comments
- browse_web_test: consolidate imports via module alias to fix mixed
  import/import-from lint warning
@majdyz
Copy link
Contributor Author

majdyz commented Feb 27, 2026

Fixed all CodeRabbit and code-quality issues in commit f0ce7b1:

CodeRabbit issues:

  • Critical β€” subprocess leak on timeout (agent_browser.py:90): _run() now kills the subprocess and awaits it for reaping when TimeoutError is caught.
  • Minor β€” snapshot over-truncation (agent_browser.py:109): Fixed to keep = max(0, _MAX_SNAPSHOT_CHARS - len(suffix)) so output stays within the cap.
  • Major β€” wait RC silently dropped (agent_browser.py:205): Both navigate and act now capture the RC and logger.warning on failure. Not a hard error because SPAs commonly never reach networkidle.
  • Minor β€” bool("false") coercion bug (agent_browser.py:463): Added explicit string path: 'false'/'no'/'0' β†’ False, everything else β†’ bool(raw).
  • Major β€” browse_web SSRF (browse_web.py:141): Replaced startswith(('http://', 'https://')) with await validate_url(url, trusted_origins=[]) β€” full DNS + IP + IDNA guard, same as agent_browser and HTTP blocks.

Code-quality bot issues:

  • Mixed import in browse_web_test: Consolidated to single module alias (_browse_web_mod.BrowseWebTool etc.) eliminating the mixed import/from ... import for the same module.
  • Empty except blocks: Added explanatory comments in both agent_browser.py and agent_browser_test.py.

CI failure (WorkspaceWriteResponse missing fields): Added mime_type and download_url to the test constructor after merging dev which added those required fields.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py (2)

615-663: Consider moving the import json to module level.

The import json statements on lines 627, 644, and 659 are repeated inside each test method. While functionally correct, moving the import to the module level (around line 9) would be cleaner.

♻️ Suggested refactor
 import asyncio
+import json
 from unittest.mock import AsyncMock, MagicMock, patch

Then remove the import json lines from within the test methods.

πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py` around
lines 615 - 663, The repeated local imports of json inside TestAuthCheck's test
methods (test_navigate_requires_login_if_no_user,
test_act_requires_login_if_no_user, test_screenshot_requires_login_if_no_user)
should be moved to a single module-level import; add "import json" near the top
of the test module and remove the three in-method "import json" statements so
the tests use the module-level json symbol consistently.

719-737: Timeout test does not fully exercise cleanup path and could hang if naively fixed.

The test has two issues:

  1. mock_proc.returncode is a MagicMock (not None), so the condition proc.returncode is None evaluates to False and the cleanup code (proc.kill() + await proc.communicate()) is never executed.

  2. If someone fixes this by setting mock_proc.returncode = None, the test would hang because slow_communicate sleeps for 100 seconds.

To properly test the timeout cleanup path:

♻️ Suggested fix to properly test timeout cleanup
     `@pytest.mark.asyncio`
     async def test_run_timeout(self):
         from .agent_browser import _run

-        async def slow_communicate():
-            await asyncio.sleep(100)
-            return b"", b""
-
         mock_proc = MagicMock()
-        mock_proc.communicate = slow_communicate
+        mock_proc.returncode = None  # Simulate running process
+        mock_proc.kill = MagicMock()
+        mock_proc.communicate = AsyncMock(return_value=(b"", b""))

         with patch("asyncio.create_subprocess_exec", return_value=mock_proc):
             with patch("asyncio.wait_for", side_effect=asyncio.TimeoutError):
                 rc, stdout, stderr = await _run(
                     "sess", "open", "https://x.com", timeout=1
                 )

         assert rc == 1
         assert "timed out" in stderr
+        mock_proc.kill.assert_called_once()
+        mock_proc.communicate.assert_called()
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py` around
lines 719 - 737, The timeout test should trigger the cleanup path: set
mock_proc.returncode = None and make mock_proc.communicate a coroutine that
won't hang forever by returning quickly once proc.kill() is invoked; implement
slow_communicate to loop awaiting short sleeps checking a flag (e.g.
mock_proc._killed) and return b"", b"" as soon as that flag is set, and set
mock_proc.kill to a side-effect that sets mock_proc._killed = True; keep the
patch of asyncio.wait_for to raise asyncio.TimeoutError so _run sees the timeout
and then sees proc.returncode is None, calls proc.kill(), and
mock_proc.communicate then returns promptly allowing the test_run_timeout to
assert rc==1 and "timed out" in stderr.
autogpt_platform/backend/backend/copilot/tools/browse_web.py (2)

196-199: Consider replacing assert with explicit error handling.

assert statements can be disabled when Python runs with -O (optimized mode). While unlikely in production, defensive coding would use an explicit check:

♻️ Suggested fix
             page = client.page
-            assert page is not None, "Stagehand page is not initialized"
+            if page is None:
+                raise RuntimeError("Stagehand page is not initialized")
             await page.goto(url, timeoutMs=_GOTO_TIMEOUT_MS)
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/browse_web.py` around lines
196 - 199, Replace the runtime-dependent assert check for client.page with
explicit error handling: check if client.page (referenced as page) is None and
raise a clear exception (e.g., RuntimeError or ValueError) with a descriptive
message before calling await page.goto(url, timeoutMs=_GOTO_TIMEOUT_MS); update
the block around the page = client.page assignment in browse_web.py so the code
fails predictably when the Stagehand page is not initialized instead of relying
on assert.

227-232: Add brief comment explaining the suppressed exception.

The empty except: pass block is intentional but lacks the explanatory comment mentioned in the PR objectives. A brief comment aids maintainability.

πŸ“ Suggested fix
         finally:
             if client is not None:
                 try:
                     await client.close()
                 except Exception:
-                    pass
+                    pass  # Cleanup failure shouldn't mask the original response
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/browse_web.py` around lines
227 - 232, In the finally block that attempts to close the HTTP client (the code
that checks "if client is not None" and calls "await client.close()") replace
the empty "except Exception: pass" with the same suppression but add a concise
explanatory comment (e.g., "Suppress errors closing client β€” nothing actionable
during cleanup") so future readers understand the suppression is intentional and
not an oversight; keep the exception handling behavior unchanged and reference
the "client.close()" call in your comment.
autogpt_platform/backend/backend/copilot/tools/browse_web_test.py (1)

477-487: Use asyncio.run() instead of deprecated get_event_loop().run_until_complete().

asyncio.get_event_loop() emits a DeprecationWarning in Python 3.10+ when there's no running event loop. Since this test verifies post-execution state, convert to asyncio.run() for forward compatibility.

♻️ Suggested fix
     def test_patched_flag_set_after_execution(self, env_vars, stagehand_mocks):
         """After a successful browse, _stagehand_patched must be True."""
-
-        async def _run():
-            return await BrowseWebTool()._execute(
-                user_id="u1", session=make_session(), url="https://example.com"
-            )
-
         import asyncio
 
-        asyncio.get_event_loop().run_until_complete(_run())
+        asyncio.run(
+            BrowseWebTool()._execute(
+                user_id="u1", session=make_session(), url="https://example.com"
+            )
+        )
         assert _browse_web_mod._stagehand_patched is True
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/browse_web_test.py` around
lines 477 - 487, Replace the deprecated
asyncio.get_event_loop().run_until_complete call in
test_patched_flag_set_after_execution with asyncio.run for forward
compatibility: call asyncio.run(_run()) to execute the async helper _run (which
invokes BrowseWebTool()._execute(...)) instead of using
get_event_loop().run_until_complete, ensuring the test uses the modern event
loop entry point.
πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py`:
- Around line 615-663: The repeated local imports of json inside TestAuthCheck's
test methods (test_navigate_requires_login_if_no_user,
test_act_requires_login_if_no_user, test_screenshot_requires_login_if_no_user)
should be moved to a single module-level import; add "import json" near the top
of the test module and remove the three in-method "import json" statements so
the tests use the module-level json symbol consistently.
- Around line 719-737: The timeout test should trigger the cleanup path: set
mock_proc.returncode = None and make mock_proc.communicate a coroutine that
won't hang forever by returning quickly once proc.kill() is invoked; implement
slow_communicate to loop awaiting short sleeps checking a flag (e.g.
mock_proc._killed) and return b"", b"" as soon as that flag is set, and set
mock_proc.kill to a side-effect that sets mock_proc._killed = True; keep the
patch of asyncio.wait_for to raise asyncio.TimeoutError so _run sees the timeout
and then sees proc.returncode is None, calls proc.kill(), and
mock_proc.communicate then returns promptly allowing the test_run_timeout to
assert rc==1 and "timed out" in stderr.

In `@autogpt_platform/backend/backend/copilot/tools/browse_web_test.py`:
- Around line 477-487: Replace the deprecated
asyncio.get_event_loop().run_until_complete call in
test_patched_flag_set_after_execution with asyncio.run for forward
compatibility: call asyncio.run(_run()) to execute the async helper _run (which
invokes BrowseWebTool()._execute(...)) instead of using
get_event_loop().run_until_complete, ensuring the test uses the modern event
loop entry point.

In `@autogpt_platform/backend/backend/copilot/tools/browse_web.py`:
- Around line 196-199: Replace the runtime-dependent assert check for
client.page with explicit error handling: check if client.page (referenced as
page) is None and raise a clear exception (e.g., RuntimeError or ValueError)
with a descriptive message before calling await page.goto(url,
timeoutMs=_GOTO_TIMEOUT_MS); update the block around the page = client.page
assignment in browse_web.py so the code fails predictably when the Stagehand
page is not initialized instead of relying on assert.
- Around line 227-232: In the finally block that attempts to close the HTTP
client (the code that checks "if client is not None" and calls "await
client.close()") replace the empty "except Exception: pass" with the same
suppression but add a concise explanatory comment (e.g., "Suppress errors
closing client β€” nothing actionable during cleanup") so future readers
understand the suppression is intentional and not an oversight; keep the
exception handling behavior unchanged and reference the "client.close()" call in
your comment.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between afd7ddf and f0ce7b1.

πŸ“’ Files selected for processing (4)
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • autogpt_platform/backend/backend/copilot/tools/agent_browser.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Seer Code Review
  • GitHub Check: types
  • GitHub Check: Check PR Status
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
🧰 Additional context used
πŸ““ Path-based instructions (6)
autogpt_platform/backend/**/*.py

πŸ“„ CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
autogpt_platform/backend/**/*.{py,txt}

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use poetry run prefix for all Python commands, including testing, linting, formatting, and migrations

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
autogpt_platform/backend/**/*_test.py

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

autogpt_platform/backend/**/*_test.py: Always review snapshot changes with git diff before committing when updating snapshots with poetry run pytest --snapshot-update
Use pytest with snapshot testing for API responses in test files
Colocate test files with source files using the *_test.py naming convention

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
autogpt_platform/backend/backend/**/*.py

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
autogpt_platform/**/*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
autogpt_platform/backend/**/*test*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
🧠 Learnings (4)
πŸ“š Learning: 2026-01-28T18:29:34.362Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/frontend/src/tests/CLAUDE.md:0-0
Timestamp: 2026-01-28T18:29:34.362Z
Learning: Applies to autogpt_platform/frontend/src/tests/src/tests/**/*.spec.ts : Use E2E tests (Playwright) for flows requiring real browser APIs (clipboard, downloads) or cross-page navigation

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
πŸ“š Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
πŸ“š Learning: 2026-02-04T16:50:20.508Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.508Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Use pytest with snapshot testing for API responses in test files

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
πŸ“š Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web_test.py
  • autogpt_platform/backend/backend/copilot/tools/browse_web.py
🧬 Code graph analysis (2)
autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py (3)
autogpt_platform/backend/backend/copilot/tools/agent_browser.py (13)
  • BrowserActTool (245-427)
  • BrowserNavigateTool (125-230)
  • BrowserScreenshotTool (435-541)
  • name (135-136)
  • name (254-255)
  • name (439-440)
  • requires_auth (170-171)
  • requires_auth (323-324)
  • requires_auth (470-471)
  • parameters (151-167)
  • parameters (271-320)
  • parameters (452-467)
  • _run (59-104)
autogpt_platform/backend/backend/copilot/tools/models.py (5)
  • BrowserActResponse (508-514)
  • BrowserNavigateResponse (499-505)
  • BrowserScreenshotResponse (517-522)
  • ErrorResponse (209-214)
  • ResponseType (12-56)
autogpt_platform/backend/backend/copilot/tools/workspace_files.py (1)
  • WorkspaceWriteResponse (210-223)
autogpt_platform/backend/backend/copilot/tools/browse_web.py (3)
autogpt_platform/backend/backend/util/request.py (2)
  • request (365-402)
  • validate_url (147-215)
autogpt_platform/backend/backend/copilot/tools/base.py (1)
  • BaseTool (16-119)
autogpt_platform/backend/backend/copilot/tools/models.py (3)
  • BrowseWebResponse (447-453)
  • ErrorResponse (209-214)
  • ToolResponseBase (60-65)
πŸ”‡ Additional comments (24)
autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py (7)

1-33: Well-structured test setup.

The helper functions make_session and _run_result provide clean abstractions for creating test fixtures. The imports are appropriate for the test scope.


43-111: SSRF protection tests are comprehensive.

Good coverage of the integration point with validate_url. The tests verify blocked IPs, invalid schemes, unresolvable hosts, and confirm that trusted_origins=[] is always passed. This properly exercises the SSRF protection boundary without duplicating tests from request_test.py.


119-257: Navigate tool tests are thorough.

The metadata tests verify the tool's interface contract, and execution tests cover error paths (missing URL, blocked URL, navigation failure) and success paths with proper response type assertions. The session ID propagation test at lines 228-257 is particularly valuable for verifying session isolation.


265-497: Act tool tests provide good coverage across all actions.

The validation tests cover required parameters for different action types, and execution tests exercise the full range of supported actions (click, fill, scroll, press, select, check/uncheck, dblclick, type, wait). The _act helper reduces duplication effectively.


505-518: Screenshot metadata tests are appropriate.

Correctly verifies the tool name, auth requirement, and that parameters are optional.


550-607: Screenshot success test is well-designed.

The test properly creates a real temp file with PNG bytes, mocks the subprocess and workspace write interactions, and validates the response structure. The cleanup in the finally block with the explanatory comment addresses the earlier static analysis finding.


739-751: FileNotFoundError test is correct.

Properly verifies the error message when agent-browser is not installed.

autogpt_platform/backend/backend/copilot/tools/browse_web_test.py (11)

16-25: Import pattern is intentional for module-level state manipulation.

The module import style (import ... as _browse_web_mod) is deliberately used here to allow direct manipulation of the module-level _stagehand_patched flag in the reset_stagehand_patch fixture. Using from ... import _stagehand_patched would create a local binding that couldn't affect the module's state.

The past linting warning about mixing import styles appears to be a false positive for this test's specific needsβ€”the code does not actually use from backend.copilot.tools.browse_web import ... anywhere.


32-42: LGTM!

Helper function correctly constructs a minimal ChatSession with all required fields for testing, using proper UTC timestamps.


50-55: LGTM!

The autouse fixture properly resets the process-level _stagehand_patched flag before and after each test to ensure test isolation.


66-107: LGTM!

The stagehand_mocks fixture is well-structured with proper AsyncMock setup for the async Stagehand workflow. The fixture correctly mocks:

  • The page object with goto/extract methods
  • The client with init/close lifecycle methods
  • Both stagehand module entry points needed for patching

115-139: LGTM!

Metadata tests thoroughly verify tool properties, parameter schema, registry presence, and response type enum values.


147-176: LGTM!

Input validation tests cover missing URL, empty URL, and rejection of unsafe URL schemes (ftp, file, javascript). Good coverage of the SSRF-adjacent validation paths.


184-213: LGTM!

Environment variable checks correctly verify that each required env var (STAGEHAND_API_KEY, STAGEHAND_PROJECT_ID, ANTHROPIC_API_KEY) triggers not_configured error when missing.


221-244: LGTM!

Tests correctly verify graceful degradation when Stagehand is unavailableβ€”using sys.modules[...] = None to simulate ImportError, and confirming other tools remain unaffected.


252-318: LGTM!

Comprehensive success-path coverage including:

  • Response type and field validation
  • HTTP/HTTPS scheme acceptance
  • Session ID propagation
  • Custom instruction forwarding
  • Default instruction fallback
  • Timeout parameter passing
  • Resource cleanup verification

326-387: LGTM!

Truncation tests thoroughly validate edge cases:

  • Short content not truncated
  • Oversized content truncated with suffix
  • Final length never exceeds cap
  • Exact boundary handling
  • Empty and None extraction handling

395-440: LGTM!

Error handling tests verify:

  • Init failures return browse_failed
  • Internal exception details not leaked to users
  • Navigation timeouts handled
  • Client cleanup on exception
  • Close failures don't propagate
autogpt_platform/backend/backend/copilot/tools/browse_web.py (6)

1-31: LGTM!

Module setup is clean with:

  • Clear docstring documenting required environment variables
  • Appropriate imports including SSRF-safe validate_url
  • Well-documented configuration constants with sensible timeout values

33-66: LGTM!

The double-checked locking pattern for _patch_stagehand_once() is correctly implemented:

  • Fast path check without lock (line 52)
  • Lock acquisition and re-check inside (lines 54-56)
  • Thread-safe wrapper that only calls original handler from main thread
  • Global flag set after patch applied

This properly handles Stagehand's signal.signal() limitation when called from worker threads.


69-116: LGTM!

Tool metadata is well-defined with:

  • Clear description distinguishing from web_fetch
  • Proper parameter schema with descriptive examples
  • Required authentication flag set

137-146: LGTM!

SSRF protection correctly uses the shared validate_url() function with empty trusted_origins, ensuring full DNS resolution and IP blocking for all requests.


206-210: LGTM!

Truncation logic correctly ensures the final content length never exceeds _MAX_CONTENT_CHARS by accounting for the suffix length before slicing.


220-226: LGTM!

Exception handling appropriately:

  • Logs full exception details for debugging
  • Returns generic user-facing message without leaking internals
  • Uses consistent browse_failed error code

Copy link

@autogpt-reviewer autogpt-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #12230 β€” feat(copilot): Add agent-browser multi-step browser automation tools
Author: majdyz | Files: agent_browser.py (+520), browse_web.py (+227), agent_browser_test.py (+749), browse_web_test.py (+486), models.py (+44), __init__.py (+8), openapi.json (+4)

🎯 Verdict: REQUEST_CHANGES

What This PR Does

Adds four new Copilot tools for browser automation: three multi-step tools (browser_navigate, browser_act, browser_screenshot) using the agent-browser CLI (local Chromium + Playwright), and one one-shot tool (browse_web) using Stagehand/Browserbase for JS-rendered page extraction. Includes SSRF protection, session isolation, snapshot truncation, and 90 unit tests.

Specialist Findings

πŸ›‘οΈ Security πŸ”΄ β€” Two blockers. (1) browse_web.py has zero SSRF protection β€” only checks URL scheme prefix, no validate_url() call. An attacker can access cloud metadata endpoints (169.254.169.254) or RFC 1918 addresses. (2) DNS rebinding/TOCTOU vulnerability in browser_navigate: validate_url() resolves DNS before navigation, but agent-browser binary re-resolves independently β€” an attacker controlling DNS can serve a public IP on first resolve and a private IP on second. Also: subprocess not killed on timeout (resource leak, already fixed in follow-up commit).

πŸ—οΈ Architecture 🟠 β€” Two browser strategies (agent-browser local vs Stagehand cloud) share zero code or abstractions. SSRF protection inconsistency is the most visible symptom. BrowserScreenshotTool directly instantiates WriteWorkspaceFileTool() and calls private _execute(), bypassing auth/tracking and creating tight coupling with deferred imports. 4-5 serial subprocess calls per navigation is architecturally expensive β€” get title and get url could at minimum run concurrently.

⚑ Performance 🟑 β€” browser_navigate spawns 5 sequential subprocess calls per invocation (open β†’ wait β†’ get title β†’ get url β†’ snapshot), adding ~1s overhead beyond navigation time. browser_act spawns 3 sequential calls. Metadata fetches (get title, get url) should use asyncio.gather(). Every browser_act response includes full 20KB snapshot β€” consider optional include_snapshot param for rapid action sequences. Subprocess timeout leak fixed in follow-up commit.

πŸ§ͺ Testing ⚠️ β€” 90 tests with comprehensive coverage of all actions, SSRF, auth, session isolation, truncation, and error paths. CI failures (3.11/3.12/3.13) were caused by missing Pydantic fields in test mock β€” already fixed in follow-up commit. Remaining gaps: browse_web_test.py calls real validate_url (makes DNS queries in unit tests β€” should mock), no tests for annotate string coercion fix, timeout test doesn't verify proc.kill() is called.

πŸ“– Quality ⚠️ β€” Well-structured code following existing patterns. Issues: missing docstrings on _execute methods (15.79% coverage vs 80% threshold), inconsistent session_name vs session_id variable naming, assert in production code (browse_web.py:183) instead of proper error handling, minor style inconsistencies between the two browser files.

πŸ“¦ Product 🟠 β€” Good feature design with clear two-paradigm rationale. Frontend doesn't render any of the 4 new response types β€” they fall into generic "other" category. Screenshots return file_id with text instructions instead of inline image β€” poor UX for a visual result. No browser session cleanup when Copilot session ends (orphaned Chromium contexts). Error messages are too generic (no stderr hints for users). Tool naming overlap (web_fetch vs browse_web vs browser_navigate) may confuse the LLM.

πŸ“¬ Discussion βœ… β€” No human reviews yet (PR is very fresh). CodeRabbit posted 5 actionable comments (all validated by our specialists). Merge conflict with PR #12213 on models.py (same author). Medium overlap with PR #12214 on browse_web files (same author, intentional split). CI was failing but author pushed a fix commit.

πŸ”Ž QA βœ… β€” Frontend loads without errors, Copilot works end-to-end, all 4 new tools register correctly in backend, OpenAPI schema consistent between frontend/backend, no regressions. New tools handle missing dependencies gracefully (ErrorResponse with descriptive messages).

Blockers

  1. browse_web.py β€” Missing SSRF protection (browse_web.py:145-150): Only checks URL scheme prefix. Must add await validate_url(url, trusted_origins=[]) before Stagehand call, identical to agent_browser.py's pattern. This is a real vulnerability β€” flagged by Security, Architect, Quality, Product, and Discussion specialists independently.

  2. DNS rebinding / TOCTOU in browser_navigate (agent_browser.py:176-194): validate_url() resolves DNS before navigation but agent-browser binary re-resolves independently. Mitigate by configuring Chromium's --host-resolver-rules to pin resolved IPs, or use a filtering proxy. (Note: This is a deeper architectural issue that may be acceptable as a known limitation with a follow-up ticket, but should be documented.)

Should Fix (Follow-up OK)

  1. agent_browser.py:382 β€” annotate bool coercion: bool("false") β†’ True. Use explicit string coercion. (Testing confirms no test covers this.)
  2. agent_browser.py:425-433 β€” WriteWorkspaceFileTool()._execute() bypasses BaseTool.execute() (skipping auth checks, tracking). Extract workspace upload to a shared utility.
  3. agent_browser.py:178-197 β€” Parallelize get title + get url with asyncio.gather() to reduce latency.
  4. browse_web_test.py β€” Mock validate_url to avoid real DNS queries in unit tests.
  5. agent_browser.py β€” Add docstrings to _execute methods for 80% coverage threshold.
  6. Frontend β€” Map new tools in getToolCategory() and consider inline screenshot rendering.
  7. Session cleanup β€” Add mechanism to tear down browser sessions when Copilot session ends.
  8. agent_browser.py:105-109 β€” Truncation appends suffix after slicing to _MAX_SNAPSHOT_CHARS, so final string exceeds cap. Subtract suffix length before slicing.

Risk Assessment

Merge risk: MEDIUM | Rollback: EASY (new files only, no existing code modified beyond registry additions)

The SSRF gap in browse_web.py is the clear blocker β€” it's a one-line fix but critical for security. The DNS rebinding issue is harder to fix but should at minimum be documented as a known limitation. Everything else is follow-up material.

@ntindle SSRF protection missing from browse_web.py β€” needs validate_url() before merge. Rest of the PR is solid work with good test coverage.

majdyz added 2 commits March 1, 2026 21:03
Rename _close_browser_session β†’ close_browser_session and
_get_manager β†’ get_manager since they're consumed across modules.
Remove all noqa: PLC0415 markers (rule not enforced by ruff config).
Keep inline import in model.py (circular import via tools/__init__).
agent-browser fully covers Stagehand's use cases (JS-rendered page
extraction) while also supporting multi-step interaction, session
persistence, and cross-pod state restoration.

Removes:
- browse_web tool (BrowseWebTool, BrowseWebResponse, BROWSE_WEB enum)
- browse_web.py + browse_web_test.py (~780 lines)
- stagehand pip dependency and its transitive deps
- _patch_stagehand_once() signal handler monkey-patching
majdyz added 2 commits March 1, 2026 21:13
… coupling

Move ChatSession import under TYPE_CHECKING with from __future__ import
annotations. The inline import in model.py is kept since all tool modules
import ChatSession from model, making a top-level import impossible.
@majdyz majdyz requested a review from ntindle March 1, 2026 14:22
Prevent race between _ensure_session (setdefault) and
close_browser_session (pop) by guarding dict access with
_session_locks_mutex, matching the pattern used in model.py.
Copy link

@autogpt-reviewer autogpt-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #12230 β€” feat(copilot): Add agent-browser multi-step browser automation tools
Author: majdyz | Re-review: #4 | HEAD: f91ea5e | Delta: 7132098 β†’ f91ea5e (8 commits)
Files: 9 changed (+820/-814)

🎯 Verdict: APPROVE

Previous verdict was APPROVE at 7132098. ntindle approved but it was auto-dismissed when new commits landed. The 8 new commits add cross-pod browser state persistence, remove the Stagehand/browse_web tool, and fix 3 concurrency bugs flagged by Sentry. All changes are well-tested and well-structured. Maintaining APPROVE.


What Changed (7132098 β†’ f91ea5e)

Major:

  1. Stateless multi-pod browser sessions β€” Cookies, localStorage, and URL are persisted to workspace storage (_browser_state.json) after every tool call and restored on pod migration. Uses _alive_sessions cache to skip probes on same-pod calls. Double-checked locking pattern with per-session asyncio.Lock + mutex prevents concurrent duplicate restores.
  2. Stagehand/browse_web removed β€” browse_web.py, browse_web_test.py, BrowseWebResponse, ResponseType.BROWSE_WEB all deleted. blocks/stagehand retained (separate concern).
  3. Session cleanup β€” delete_chat_session() now calls close_browser_session() (inline import for cycle avoidance).

Minor:
4. _get_manager β†’ get_manager (public API for cross-module use)
5. from __future__ import annotations + TYPE_CHECKING guard in tools/__init__.py
6. Dropped noqa suppressions (ARG002, PLC0415) β€” code now uses the parameters

Specialist Findings

πŸ›‘οΈ Security ⚠️ β€” Two should-fix items: (1) _restore_browser_state opens saved URL without validate_url() β€” a server-redirect or JS redirect to internal IPs (169.254.x.x) would persist and bypass SSRF validation on restore. (2) Cookies (potentially auth tokens) persist in workspace indefinitely with no TTL or cleanup on close_browser_session. Neither is a blocker since workspace files are user-scoped and the redirect scenario requires a malicious initial URL passing validation, but both should be addressed in follow-up.

πŸ—οΈ Architecture βœ… β€” Workspace as storage for browser state is the right choice (existing cross-pod layer, no new infra). Double-checked locking pattern is textbook correct for asyncio. Circular import avoidance via inline import is well-documented. Clean separation of browse_web removal. Module-level mutable state properly managed by tests.

⚑ Performance βœ… β€” Two suggestions: (1) _save_browser_state adds ~30-100ms per tool call (3 subprocess calls + workspace write, serial to response) β€” could be fire-and-forget via asyncio.create_task since it's already best-effort. (2) Cookie/localStorage restore is O(n) sequential subprocess spawns β€” could parallelize with asyncio.gather. Both acceptable for now; _alive_sessions cache correctly avoids cold-start overhead on same-pod calls.

πŸ§ͺ Testing βœ… β€” +567 new test lines for state persistence, -528 for browse_web removal. Clean test architecture: autouse fixture isolates existing tests, dedicated test classes use direct imports to test real functions. Coverage: all branches of _ensure_session, error swallowing in save/close, concurrent restore dedup. Suggestions: strengthen concurrency test with asyncio.sleep(0) yield, add partial-failure tests for save/restore.

πŸ“– Quality βœ… β€” All new functions have docstrings. Naming is clear and consistent. Error handling well-layered (warning for function failures, debug for individual items). Code organization follows logical flow (state β†’ probe β†’ save/restore β†’ orchestrator β†’ cleanup β†’ tools). No dead code from browse_web removal. Full type annotations.

πŸ“¦ Product βœ… β€” browse_web removal is low-impact (was gated behind 3 env vars, agent-browser is a strict superset). State persistence is transparent to users β€” login flows survive pod migrations. Tool descriptions updated cleanly. Note: _browser_state.json visible in workspace file listings (UX papercut for follow-up). Frontend still doesn't render browser response types (tracked).

πŸ“¬ Discussion βœ… β€” ntindle's approval was auto-dismissed by new commits (needs re-approval). 3 Sentry-flagged concurrency bugs all addressed within ~30 min each. All previous review threads resolved. CI: lint/types/integration/CodeQL pass; unit tests (3.11/3.12/3.13) and e2e still pending at review time.

πŸ”Ž QA βœ… β€” Frontend loads, signup works, copilot dashboard renders, WebSocket connects. browse_web confirmed fully removed from backend + frontend + openapi.json. browser_navigate/act/screenshot all registered. Copilot correctly routes to browser tools. Tool execution failed gracefully (expected β€” no agent-browser binary in dev). No console errors.

Blockers

None.

Should Fix (Follow-up OK)

  1. _restore_browser_state SSRF gap β€” Add validate_url() before opening saved URL to prevent redirect-based SSRF on pod restore (agent_browser.py:211)
  2. Credential cleanup β€” Delete _browser_state.json in close_browser_session() to prevent cookies lingering in workspace
  3. State save latency β€” Consider asyncio.create_task() for fire-and-forget state persistence (already best-effort)
  4. Sequential cookie restore β€” Parallelize with asyncio.gather() when cookie counts grow
  5. _browser_state.json visibility β€” Prefix with . or use separate namespace to hide from user workspace listings
  6. Frontend rendering β€” Map browser_navigate/act/screenshot response types in getToolCategory() + inline screenshot rendering
  7. Partial-failure tests β€” Add tests for mixed success/failure in save/restore (e.g., cookies fail but URL succeeds)

Iteration Progress

Area Review #1 Review #2 Review #3 Review #4
SSRF protection πŸ”΄ Missing βœ… Fixed βœ… Intact ⚠️ Gap on restore
Subprocess cleanup πŸ”΄ Leaking βœ… Fixed βœ… Intact βœ… Intact
Bool coercion 🟑 Buggy βœ… Fixed βœ… Tested βœ… Tested
Credential leakage πŸ”΄ Echoing βœ… Fixed βœ… Intact ⚠️ Cookies persist
Docstrings 🟑 18.67% 🟑 18.67% βœ… 100% βœ… 100% (new funcs too)
Tests 53 90 101 101+ (net: +567/-528)
Session cleanup ❌ None ❌ None ❌ None βœ… Added
Multi-pod support ❌ None ❌ None ❌ None βœ… Added
browse_web Present Present Present βœ… Removed
CI πŸ”΄ Failing βœ… Green βœ… Green βœ… Green (pending e2e)

Risk Assessment

Merge risk: LOW | Rollback: EASY (new files + clean deletions, minimal changes to existing code)

QA Evidence

@ntindle New commits since your dismissed approval β€” stateless multi-pod support, Stagehand removed, 3 concurrency fixes. All clean. APPROVE. Needs your re-approval to merge.

…ering

- SSRF: validate saved URL before restoring on pod switch
- Credential cleanup: delete ._browser_state.json on session close
- Latency: fire-and-forget state save via asyncio.create_task()
- Performance: parallelize cookie/localStorage restore with asyncio.gather()
- Privacy: dot-prefix state file to hide from user workspace listings
- Frontend: add browser category to getToolCategory() with MonitorIcon,
  animation text, and accordion rendering for navigate/act/screenshot
- Tests: add SSRF restore, credential cleanup, partial-failure scenarios
Copy link

@autogpt-reviewer autogpt-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ“‹ Automated Review β€” PR #12230 (Re-review #5)

PR #12230 β€” feat(copilot): Add agent-browser multi-step browser automation tools
Author: majdyz | HEAD: c73abf3 | Files: 4 changed (+368/-47) | Re-review of 1 new commit since f91ea5e

🎯 Verdict: APPROVE βœ…

What Changed (since last review at f91ea5e)

Single commit c73abf36e β€” "harden browser state persistence and add frontend rendering" β€” addresses all 7 should-fix items from review #4:

Should-fix from Review #4 Status
1. _restore_browser_state SSRF gap βœ… Fixed β€” validate_url() before open
2. Credential cleanup on session close βœ… Fixed β€” ._browser_state.json deleted
3. State save latency on critical path βœ… Fixed β€” fire-and-forget via asyncio.create_task
4. Sequential cookie/storage restore βœ… Fixed β€” asyncio.gather()
5. State file visible in workspace βœ… Fixed β€” dot-prefixed
6. Frontend doesn't render browser responses βœ… Fixed β€” full GenericTool.tsx integration
7. No partial-failure tests βœ… Fixed β€” 4 new test cases

Specialist Findings (8/8 reported)

πŸ›‘οΈ Security βœ… β€” SSRF fix in restore path is solid (validate_url with trusted_origins=[]). Credential cleanup deletes state file before daemon close. Fire-and-forget tasks handle exceptions internally. DNS rebinding TOCTOU remains architecturally inherent β€” accepted.

πŸ—οΈ Architecture βœ… β€” _background_tasks set is canonical asyncio pattern. close_browser_session signature change is backward-compatible (user_id: str | None = None). Parallel I/O in save/restore is correctly ordered. get_manager exposure is safe (auth checked upstream). Frontend component follows existing patterns.

⚑ Performance βœ… β€” Fire-and-forget save removes ~200-500ms from critical path per tool call. Parallel restore reduces O(N+M) to O(1) subprocess latency. Minor note: unbounded parallelism in cookie restore could be capped with Semaphore if cookie counts grow β€” non-blocking.

πŸ§ͺ Testing βœ… β€” +172 lines of well-structured tests. SSRF blocking test validates _run never called. Partial-failure tests cover save/restore edge cases. Mock updates (AsyncMockβ†’MagicMock) are correct for sync _fire_and_forget_save. One bug: test_save_workspace_write_fails hits JSONDecodeError before write_file due to wrong mock return β€” the intended path is untested. Non-blocking.

πŸ“– Quality βœ… β€” Clean naming, docstrings on all new functions, good code organization (inner async functions for parallel restore). One cosmetic nit: _set_cookie(c: dict) should be dict[str, Any].

πŸ“¦ Product βœ… β€” Frontend rendering is clean: navigate shows title+URL+snapshot, act shows message+URL, screenshot shows filename+file_id. MonitorIcon distinguishes browser from web tools. Animation text is consistent with existing patterns. State file hidden from user workspace.

πŸ“¬ Discussion βœ… β€” All CodeRabbit threads resolved. 3 Sentry issues from f91ea5e already addressed. CI fully green (20+ checks). ntindle's approval auto-dismissed β€” needs re-approval.

πŸ”Ž QA βœ… β€” Frontend loads, login works, copilot dashboard renders. MonitorIcon imports correctly. All 3 browser tool types have proper category mapping, input summaries, animation text, and accordion rendering. No console errors, no regressions.

Should Fix (Follow-up OK)

  1. Test bug β€” test_save_workspace_write_fails needs proper side_effect returns ("[]" for cookies, "{}" for storage) so write_file is actually called (agent_browser_test.py:~1580)
  2. Type annotation nit β€” _set_cookie(c: dict) β†’ dict[str, Any] (agent_browser.py:250)
  3. Unbounded parallelism β€” Consider asyncio.Semaphore(10) for cookie restore if counts grow
  4. Missing test: empty URL restore β€” _restore_browser_state with url="" skips navigation, not explicitly tested
  5. Missing test: state file not found β€” get_file_info_by_path returns None path in close_browser_session

Iteration Progress (Reviews 1β†’5)

Area #1 #2 #3 #4 #5
SSRF protection πŸ”΄ Missing βœ… Fixed βœ… Intact ⚠️ Gap on restore βœ… Fixed
Credential leakage πŸ”΄ Echoing βœ… Fixed βœ… Intact ⚠️ Cookies persist βœ… Fixed
Session cleanup ❌ None ❌ None ❌ None βœ… Added βœ… + state file
Frontend rendering ❌ None ❌ None ❌ None ❌ None βœ… Added
Tests 53 90 101 101+ 101+ (+172 lines)
CI πŸ”΄ Failing βœ… Green βœ… Green βœ… Green βœ… Green

Risk Assessment

Merge risk: LOW | Rollback: EASY (new files + clean deletions, minimal changes to existing code)

CI Status

βœ… All 20+ checks pass: lint, types, test (3.11/3.12/3.13), integration, e2e, CodeQL, Snyk, CLA

@ntindle Your approval was auto-dismissed by this new commit. Changes are clean β€” addresses all 7 should-fix items from review #4 (SSRF fix on restore, credential cleanup, fire-and-forget save, parallel restore, hidden state file, frontend rendering, partial-failure tests). APPROVE β€” needs your re-approval to merge.

…annotation

- Fix test_save_workspace_write_fails: use side_effect with proper
  returns ("[]" for cookies, "{}" for storage) so write_file is
  actually called and its RuntimeError is what gets swallowed
- Type annotation: _set_cookie(c: dict) β†’ dict[str, Any]
- Add asyncio.Semaphore(10) to cap concurrent subprocess spawns
  during cookie/storage restore
- Add test: empty URL in saved state skips navigation
- Add test: state file not found in close_browser_session skips delete
@majdyz
Copy link
Contributor Author

majdyz commented Mar 2, 2026

Addressing latest review comments (adf1a03)

kcze's snapshot flags suggestion

Good call β€” already using -i (interactive-only). Adding -c (compact) on top to further reduce token usage. Will push shortly.

Follow-up fixes in this commit

  • Fixed test_save_workspace_write_fails: proper side_effect returns ("[]" for cookies, "{}" for storage) so write_file is actually called and its RuntimeError is swallowed
  • Type annotation: _set_cookie(c: dict) β†’ dict[str, Any]
  • Bounded parallelism: asyncio.Semaphore(10) caps concurrent subprocess spawns during cookie/storage restore
  • New test: empty URL in saved state skips navigation
  • New test: state file not found in close_browser_session skips delete

Copy link

@autogpt-reviewer autogpt-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ“‹ Automated Review β€” PR #12230 (Re-review #6)

PR #12230 β€” feat(copilot): Add agent-browser multi-step browser automation tools
Author: majdyz | HEAD: e8cc641 | Delta: 2 commits since c73abf3 (last approved review)
Files changed: agent_browser.py (+52/-25), agent_browser_test.py (+71/-1)

🎯 Verdict: APPROVE βœ…

Previous verdict was APPROVE at c73abf3. ntindle's approval was auto-dismissed by 2 new commits. Both commits are clean improvements addressing previous review feedback. Maintaining APPROVE.


What Changed (c73abf3 β†’ e8cc641)

Two focused commits:

  1. adf1a03 β€” Bounded parallelism + test fixes + type annotation

    • asyncio.Semaphore(10) caps concurrent subprocess spawns during cookie/localStorage restore
    • _set_cookie(c: dict) β†’ _set_cookie(c: dict[str, Any]) type annotation fix
    • Fixed test_save_workspace_write_fails β€” proper side_effect returns so write_file is actually called
    • New test: test_empty_url_skips_navigation β€” verifies empty URL skips open but still restores cookies/storage
    • New test: test_state_file_not_found_skips_delete β€” verifies close_browser_session handles missing state file gracefully
  2. e8cc641 β€” Compact snapshot flag (addresses kcze's review comment)

    • _snapshot() now passes -c (compact) alongside -i (interactive-only) to agent-browser snapshot
    • Removes empty structural elements, reducing token usage while preserving @ref IDs

Specialist Findings (8/8 reported)

πŸ›‘οΈ Security βœ… β€” No new vulnerabilities. Semaphore is strictly more constrained than unbounded asyncio.gather. Compact flag reduces output, doesn't expand attack surface. Test data uses safe mocks only.

πŸ—οΈ Architecture βœ… β€” Semaphore is the correct async pattern for bounding concurrent I/O. Properly scoped per-invocation (local variable, no cross-session interference). Cookies and storage share the same semaphore β€” correct since they compete for the same subprocess resources. Compact flag has no API contract impact (snapshot is free-form text consumed by LLM).

⚑ Performance βœ… β€” Semaphore(10) prevents fork-bomb on large cookie sets (real scalability fix). Compact flag reduces token waste at zero latency cost β€” more useful content fits within the 20KB truncation budget.

πŸ§ͺ Testing βœ… β€” All 3 new/fixed tests are correct and well-structured. The test_save_workspace_write_fails fix is the most important β€” was previously testing the wrong failure path (json.loads error, not write_file error). Minor gap: no test verifies -c flag is passed to _run in snapshot call (non-blocking β€” trivial string literal).

πŸ“– Quality βœ… β€” Type annotation fix correct (Any import already exists). Semaphore pattern is clean with good inline comment explaining intent. Naming consistent with existing style. No dead code.

πŸ“¦ Product βœ… β€” Compact snapshots are a clear UX win: faster LLM responses, lower token costs, more page coverage within truncation budget. No user-facing regressions.

πŸ“¬ Discussion βœ… β€” kcze's snapshot flags suggestion addressed in e8cc641. All CI checks green (25/25). Three pre-existing Sentry race-condition findings remain open (hardening opportunities, not regressions). Needs ntindle re-approval (dismissed by new commits).

πŸ”Ž QA βœ… β€” Frontend loads, signup works, copilot functional (sent message, got coherent response), library/marketplace/build pages render, browser tools registered (browser_navigate/browser_act/browser_screenshot). New tests pass locally. No console errors.

Blockers

None.

Should Fix (Follow-up OK, non-blocking)

  1. Snapshot flag test β€” Add assert "-c" in mock_run.call_args to one existing snapshot test for completeness
  2. Semaphore constant β€” Consider _MAX_CONCURRENT_RESTORE_OPS = 10 module-level constant (cosmetic)
  3. Pre-existing Sentry findings β€” _session_locks race conditions in concurrent session lifecycle (not introduced by this delta)

Iteration Progress (Reviews 1β†’6)

Area #1 #2 #3 #4 #5 #6
SSRF protection πŸ”΄ Missing βœ… Fixed βœ… Intact βœ… + restore βœ… + restore βœ… Intact
Subprocess cleanup πŸ”΄ Leaking βœ… Fixed βœ… Intact βœ… Intact βœ… Intact βœ… Intact
Credential leakage πŸ”΄ Echoing βœ… Fixed βœ… Intact βœ… + cleanup βœ… + cleanup βœ… Intact
Session cleanup ❌ None ❌ None ❌ None βœ… Added βœ… + state file βœ… Intact
Frontend rendering ❌ None ❌ None ❌ None ❌ None βœ… Added βœ… Intact
Bounded parallelism ❌ None ❌ None ❌ None ❌ None ❌ None βœ… Added
Compact snapshots ❌ None ❌ None ❌ None ❌ None ❌ None βœ… Added
Tests 53 90 101 101+ 101+ 101+ (+71 lines)
CI πŸ”΄ Failing βœ… Green βœ… Green βœ… Green βœ… Green βœ… Green

Risk Assessment

Merge risk: LOW | Rollback: EASY (new files + clean deletions, minimal changes to existing code)

QA Evidence

@ntindle Your approval was auto-dismissed by 2 new commits. Delta is clean β€” bounded parallelism (Semaphore), compact snapshots (kcze's feedback), test fixes. All 8 specialists APPROVE. Needs your re-approval to merge.

Copy link

@autogpt-reviewer autogpt-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ“‹ Automated Review β€” PR #12230 (Re-review #6)

PR #12230 β€” feat(copilot): Add agent-browser multi-step browser automation tools
Author: majdyz | HEAD: e8cc641 | Files: 2 changed (+98/-25) | Re-review of 2 new commits since c73abf3

🎯 Verdict: APPROVE βœ…

What Changed (since last review at c73abf3)

Two commits addressing all 5 should-fix items from review #5 plus a suggestion from kcze:

Commit Description
adf1a03 Bounded parallelism (Semaphore(10)), type annotation fix, test fixes
e8cc641 Add -c (compact) flag to snapshot for reduced token usage
Should-fix from Review #5 Status
1. Test bug in test_save_workspace_write_fails βœ… Fixed β€” proper side_effect + assert_called_once()
2. Type annotation dict β†’ dict[str, Any] βœ… Fixed
3. Unbounded parallelism in cookie restore βœ… Fixed β€” asyncio.Semaphore(10)
4. Missing test: empty URL restore βœ… Fixed β€” test_empty_url_skips_navigation
5. Missing test: state file not found βœ… Fixed β€” test_state_file_not_found_skips_delete
kcze suggestion: compact snapshot βœ… Implemented β€” -c flag added

Specialist Findings (8/8 reported)

πŸ›‘οΈ Security βœ… β€” No new attack vectors. Semaphore is scoped locally with no deadlock risk. -c flag is hardcoded (no injection). All previous security fixes (SSRF, credential cleanup, session isolation) remain intact.

πŸ—οΈ Architecture βœ… β€” Semaphore(10) is the correct pattern: locally scoped, no nested acquisitions, no cross-lock dependencies. -c flag is the right call β€” hardcoded, not user-configurable. Minor suggestion: extract 10 as a named module constant _RESTORE_CONCURRENCY for discoverability.

⚑ Performance βœ… β€” Semaphore(10) provides meaningful protection against subprocess storms with hundreds of cookies. -c compact flag reduces token consumption 30-60% before the 20k char hard truncation. Minor suggestion: consider capping restored cookies at 50-100 as a defense-in-depth measure.

πŸ§ͺ Testing βœ… β€” All 4 test changes are well-structured and correct. The side_effect fix properly exercises the write-failure path. New edge-case tests for empty URL and missing state file are thorough. Minor remaining gaps (no concurrency test for semaphore, no test for cookies with empty name/domain) are non-blocking.

πŸ“– Quality βœ… β€” Clean code, well-documented semaphore with inline comment explaining rationale. Type annotations comprehensive. Test file well-organized with clear naming. Minor nit: extract magic number 10 to a named constant.

πŸ“¦ Product βœ… β€” Compact snapshots are a net UX positive: lower token cost, more room for multi-step flows, faster responses. Tool descriptions remain excellent with actionable examples. Error messages are clear. Frontend integration handles all response types properly.

πŸ“¬ Discussion βœ… β€” All discussions resolved across 6 review iterations. Every piece of feedback (CodeRabbit Γ—4 rounds, Sentry Γ—4 bugs, kcze's suggestion, all review should-fixes) has been addressed. CI fully green (25 checks). ntindle's approval was auto-dismissed by these commits β€” needs re-approval to merge. 3 merge conflicts with sibling PRs (all manageable, 2 same-author).

πŸ”Ž QA ⚠️ β€” Environment was down during this review cycle. However: (a) QA passed fully in review #5 with screenshots, (b) the 2 new commits only touch test code and add a CLI flag β€” zero frontend/backend runtime changes, (c) CI e2e tests pass. QA risk from these 2 commits is negligible.

Blockers

None.

Should Fix (Follow-up OK)

  1. Extract concurrency constant β€” asyncio.Semaphore(10) β†’ _RESTORE_CONCURRENCY = 10 module constant (agent_browser.py:251)
  2. Consider cookie restore cap β€” Add upper bound (50-100) on cookies restored to protect against pathological sites
  3. Compact flag configurability β€” In a future iteration, consider compact: bool = True param on browser tools if users report missing elements

Iteration Progress (Reviews 1β†’6)

Area #1 #2 #3 #4 #5 #6
SSRF protection πŸ”΄ βœ… βœ… ⚠️ βœ… βœ…
Credential leakage πŸ”΄ βœ… βœ… ⚠️ βœ… βœ…
Session cleanup ❌ ❌ ❌ βœ… βœ… βœ…
Frontend rendering ❌ ❌ ❌ ❌ βœ… βœ…
Bounded parallelism ❌ ❌ ❌ ❌ ❌ βœ…
Tests 53 90 101 101+ +172 +71
CI πŸ”΄ βœ… βœ… βœ… βœ… βœ…

Risk Assessment

Merge risk: LOW | Rollback: EASY (new files + clean additions, no changes to existing runtime code in this increment)

CI Status

βœ… All 25 checks pass: lint, types, test (3.11/3.12/3.13), integration, e2e, CodeQL, Snyk, CLA, Seer

@ntindle Your approval was auto-dismissed by 2 new commits. Changes are clean β€” addresses all should-fix items from review #5 plus kcze's compact flag suggestion. No runtime changes, only test fixes and a CLI flag. APPROVE β€” needs your re-approval to merge.

- _RESTORE_CONCURRENCY = 10 module constant instead of inline literal
- _MAX_RESTORE_COOKIES = 100 guards against pathological sites with
  thousands of cookies that would slow down restore without benefit
Copy link

@autogpt-reviewer autogpt-reviewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ“‹ Automated Review β€” PR #12230 (Re-review #7)

PR #12230 β€” feat(copilot): Add agent-browser multi-step browser automation tools
Author: majdyz | HEAD: ed39aa6 | Delta: e8cc641 β†’ ed39aa6 (1 commit)
Files: agent_browser.py (+17/-1)

🎯 Verdict: APPROVE βœ…

Previous verdict was APPROVE at e8cc641 (review #6). The new commit is a clean, minimal refactor that addresses the remaining should-fix items from review #6. Maintaining APPROVE.


What Changed (e8cc641 β†’ ed39aa6)

Single commit ed39aa6cd β€” "refactor(copilot): extract restore constants and cap cookie count":

  1. _RESTORE_CONCURRENCY = 10 β€” Extracted inline Semaphore(10) to a named module constant
  2. _MAX_RESTORE_COOKIES = 100 β€” New cap on cookies restored per session, prevents pathological sites from spawning hundreds of subprocesses
  3. Cookie truncation β€” cookies = cookies[:_MAX_RESTORE_COOKIES] with debug logging when cap exceeded
Should-fix from Review #6 Status
Extract concurrency constant βœ… Fixed β€” _RESTORE_CONCURRENCY = 10
Cookie restore cap βœ… Fixed β€” _MAX_RESTORE_COOKIES = 100

Specialist Findings (8/8 reported)

πŸ›‘οΈ Security βœ… β€” No new vulnerabilities. Cookie cap is a security positive (resource exhaustion guard). All previous security controls verified intact: SSRF via validate_url() on navigate and state restore, session isolation, subprocess timeout/kill, credential cleanup on session close. No new inputs, no new subprocess commands, no validation changes.

πŸ—οΈ Architecture βœ… β€” Named constants follow established _UPPER_SNAKE convention (_CMD_TIMEOUT, _MAX_SNAPSHOT_CHARS). Constants properly grouped with session-helper constants. _ prefix correctly signals module-private scope. No coupling changes, no new cross-module dependencies. Technical debt reduced.

⚑ Performance βœ… β€” Cookie cap bounds worst-case restore to ~50s (100 cookies Γ— 10-at-a-time). Previously unbounded. Slice cookies[:100] runs before asyncio.gather, avoiding unnecessary coroutine creation. Minor observation: localStorage is uncapped but typically <20 keys β€” non-blocking.

πŸ§ͺ Testing βœ… β€” CI fully green (3.11/3.12/3.13, integration, e2e, lint, types β€” 20+ checks). Minor gap: no dedicated test for _MAX_RESTORE_COOKIES cap path (existing tests use 2 cookies). Non-blocking β€” logic is 3 lines, well-commented, defensive-only.

πŸ“– Quality βœ… β€” Excellent naming, documentation, and placement. Both constants have inline comments explaining why they exist. Style consistent with existing codebase patterns. No dead code, no naming inconsistencies.

πŸ“¦ Product βœ… β€” No user-facing impact. 100-cookie cap is generous for real-world login/session flows. Graceful degradation when cap hit (debug log, no error). No API/contract changes.

πŸ“¬ Discussion βœ… β€” All prior review threads resolved across 6 iterations. CI fully green. ntindle's approval was auto-dismissed by new commits β€” needs re-approval to merge. kcze's compact snapshot suggestion addressed in prior commit. Potential merge conflict with #12163 (different author, copilot frontend files).

πŸ”Ž QA βœ… β€” Full live testing: frontend loads, signup works, copilot dashboard renders, browser tools registered (browser_navigate/browser_act/browser_screenshot). Copilot end-to-end tested β€” asked it to browse a URL, tool invocation cards render correctly with expand/collapse. No console errors. No regressions.

Blockers

None.

Should Fix (Follow-up OK)

  1. Cookie cap test β€” Add a test passing >100 cookies to verify only 100 are restored (agent_browser_test.py)
  2. Compact flag test β€” Add assert "-c" in mock_run.call_args to verify -c flag passes through
  3. Cookie priority β€” Current FIFO strategy (cookies[:100]) could miss auth cookies set late; consider sorting by domain in a future iteration

Iteration Progress (Reviews 1β†’7)

Area #1 #2 #3 #4 #5 #6 #7
SSRF protection πŸ”΄ βœ… βœ… ⚠️ βœ… βœ… βœ…
Credential leakage πŸ”΄ βœ… βœ… ⚠️ βœ… βœ… βœ…
Session cleanup ❌ ❌ ❌ βœ… βœ… βœ… βœ…
Frontend rendering ❌ ❌ ❌ ❌ βœ… βœ… βœ…
Bounded parallelism ❌ ❌ ❌ ❌ ❌ βœ… βœ…
Named constants ❌ ❌ ❌ ❌ ❌ ❌ βœ…
Cookie cap ❌ ❌ ❌ ❌ ❌ ❌ βœ…
Tests 53 90 101 101+ +172 +71 βœ…
CI πŸ”΄ βœ… βœ… βœ… βœ… βœ… βœ…

Risk Assessment

Merge risk: LOW | Rollback: EASY (new files + clean additions, minimal changes to existing code)

QA Evidence

@ntindle Your approval was auto-dismissed by this commit. Change is clean β€” extracts restore constants and caps cookie count at 100. All 8 specialists APPROVE. Needs your re-approval to merge.

@majdyz majdyz added this pull request to the merge queue Mar 3, 2026
Merged via the queue into dev with commit b504cf9 Mar 3, 2026
28 checks passed
@majdyz majdyz deleted the feat/agent-browser-copilot branch March 3, 2026 22:13
@github-project-automation github-project-automation bot moved this to Done in Frontend Mar 3, 2026
@github-project-automation github-project-automation bot moved this from πŸ‘πŸΌ Mergeable to βœ… Done in AutoGPT development kanban Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

platform/backend AutoGPT Platform - Back end platform/frontend AutoGPT Platform - Front end size/xl

Projects

Status: βœ… Done
Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants