
feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect#12199

Closed
majdyz wants to merge 5 commits into dev from feat/async-polling-agent-generator

Conversation


@majdyz majdyz commented Feb 24, 2026

Summary

  • Platform service client now uses submit-and-poll pattern for all agent-generator calls (10s poll interval, 30min timeout via time.monotonic())
  • asyncio.sleep() in poll loop yields to event loop, keeping SSE heartbeats alive through GCP's L7 load balancer
  • Transient HTTP errors (429, 503, 504, 408) and connection errors are retried (up to 5 consecutive), not fatal
  • agent_json responses validated before returning (prevents silent None propagation)
  • Frontend auto-reconnects up to 3 times when SSE drops mid-stream
  • agentgenerator_timeout setting wired into HTTP client (was hardcoded)

Why

GCP's L7 load balancer kills SSE connections at ~5 minutes. Agent generation takes 5–30 minutes, so the old synchronous HTTP call would block for the entire duration, causing the SSE connection to die. The new polling pattern keeps each request short (~30s timeout) while asyncio.sleep(10) yields control for heartbeat delivery.
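The submit-and-poll shape described here can be sketched roughly as follows (a minimal sketch with a stubbed transport; `submit`/`poll` are illustrative placeholders, not the PR's actual client API):

```python
import asyncio
import time

POLL_INTERVAL_SECONDS = 10
MAX_POLL_TIME_SECONDS = 30 * 60  # 30-minute overall budget

async def submit_and_poll(submit, poll, *, interval=POLL_INTERVAL_SECONDS,
                          budget=MAX_POLL_TIME_SECONDS):
    """Submit a job, then poll until it completes, fails, or the budget expires.

    `submit()` returns a job_id; `poll(job_id)` returns a dict with a
    "status" key ("running" / "completed" / "failed").
    """
    job_id = await submit()
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        result = await poll(job_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        # asyncio.sleep() yields to the event loop, so SSE heartbeats
        # scheduled on the same loop keep flowing during the wait.
        await asyncio.sleep(interval)
    raise TimeoutError(f"job {job_id} exceeded {budget}s polling budget")
```

In the real service each poll is a short HTTP GET with its own ~30s timeout, so no single request outlives the load balancer's idle window.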

Changes

  • service.py — _submit_and_poll() helper with retry logic for transient errors, agent_json validation, agentgenerator_timeout wired into client, time.monotonic() for wall-clock timeout
  • settings.py — agentgenerator_timeout default 1800 → 30 (now per-request, not total)
  • useCopilotPage.ts — auto-reconnect on mid-stream SSE drop (max 3 attempts, 1s delay)
  • test_service.py — 35 tests: polling happy path, failures, retry, timeout, transient HTTP retry, missing agent_json validation

Companion PR

Review comments addressed

  • autogpt-reviewer: Wire agentgenerator_timeout setting into HTTP client
  • autogpt-reviewer: Use time.monotonic() for accurate wall-clock timeout
  • autogpt-reviewer: Restore decompose_goal_external docstring with return-type enumeration
  • coderabbitai: Retry transient HTTP statuses (429/503/504/408) during polling
  • coderabbitai: Validate agent_json exists before returning from 3 endpoints

Test plan

  • 35/35 backend tests pass
  • CI green (23/23 checks)
  • Deploy both PRs to dev and test full agent generation flow
  • Test SSE drop recovery: wait for LB kill at 5min → verify frontend reconnects
  • Verify polling heartbeats keep SSE alive during long generation

…nd SSE reconnect

Platform service now submits jobs to agent-generator and polls for results
(10s interval) instead of blocking on a single HTTP call for up to 30 min.
asyncio.sleep in the poll loop yields to the event loop, keeping SSE
heartbeats alive through GCP's L7 load balancer.

Frontend auto-reconnects up to 3 times when SSE drops mid-stream.
@majdyz majdyz requested review from a team as code owners February 24, 2026 14:18
@majdyz majdyz requested review from Swiftyos and ntindle and removed request for a team February 24, 2026 14:18
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Feb 24, 2026

greptile-apps bot commented Feb 24, 2026

Too many files changed for review. (484 files found, 100 file limit)

@github-actions

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot changed the base branch from master to dev February 24, 2026 14:18
@github-actions github-actions bot added platform/backend AutoGPT Platform - Back end platform/blocks labels Feb 24, 2026

coderabbitai bot commented Feb 24, 2026

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title accurately and concisely summarizes the two main changes: async polling for agent-generator in the backend and SSE auto-reconnect for the frontend.
  • Description check — ✅ Passed: The description provides detailed context on the changes, motivation (GCP L7 load balancer timeout), technical implementation details, and test coverage. It directly relates to the changeset.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 95.74%, which is sufficient. The required threshold is 80.00%.



github-actions bot commented Feb 24, 2026

πŸ” PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections.

Summary: 1 conflict(s), 0 medium risk, 4 low risk (out of 5 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.


@autogpt-reviewer autogpt-reviewer left a comment


PR #12199 — feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz (Zamil Majdy) | Files: service.py (+183/-239), settings.py (+2/-2), useCopilotPage.ts (+40/-1), test_service.py (+372/-203)

🎯 Verdict: APPROVE WITH CONDITIONS

What This PR Does

GCP's L7 load balancer kills SSE connections at ~5 minutes. Agent generation takes 5–30 minutes, so the old synchronous HTTP call blocked for the entire duration, causing SSE to die. This PR switches the backend to a submit-and-poll pattern (submit job → poll every 10s → up to 30min budget) so each HTTP request stays short and asyncio.sleep() yields control for SSE heartbeat delivery. The frontend adds auto-reconnect (3 attempts, 1s delay) when SSE drops mid-stream.

Specialist Findings

🛡️ Security ✅ — No vulnerabilities found. The job ID is server-side only (never exposed to users), SSE reconnect uses the same authenticated transport, and no new endpoints are added. _submit_and_poll is entirely backend→agent-generator with no user-controlled input in URL construction. Only informational: defensive job_id format validation would harden against a compromised upstream service.

🏗️ Architecture ✅ — Textbook refactoring. _submit_and_poll() is a clean single-responsibility abstraction that eliminated 4× duplicated HTTP/error-handling blocks (net -56 lines despite new functionality). The timeout model is now correct: 30s per request vs a 30-min polling budget. Error classification helpers are preserved and reused. Frontend reconnect is cleanly contained in the existing useEffect status watcher.
⚠️ The agentgenerator_timeout setting in settings.py:374-376 is now dead code — nothing reads it. Must wire it back into _REQUEST_TIMEOUT or remove it.

⚡ Performance ✅ — Polling overhead is negligible (0.1 req/s per job; 100 concurrent sessions = 10 req/s). No memory leaks — poll responses aren't accumulated. Connection pool adequate.
⚠️ service.py:185-187: elapsed += POLL_INTERVAL_SECONDS drifts from wall-clock time — it doesn't account for HTTP request duration (up to 30s on timeout). Under pathological conditions, the loop could run ~5.4× longer than the 30-min cap. Should use time.monotonic().
⚠️ No cap on consecutive transient poll errors — a sustained network partition retries for the entire 30-min budget.

🧪 Testing ⚠️ — Backend polling tests are solid: 8 new TestSubmitAndPoll tests cover the happy path, failed job, HTTP errors, connection errors, missing job_id, transient retry, 404 expired job, and timeout. Public function tests properly mock _submit_and_poll.
❌ Frontend SSE reconnect has zero test coverage — 40 lines of new stateful reconnect logic (the user-facing feature that solves the GCP LB problem) with no automated tests. Missing: reconnect trigger, counter cap, counter reset, cleanup on unmount.
⚠️ No test for consecutive transient failures, malformed poll JSON, or unexpected poll status values.

📖 Quality ✅ — Clean, well-organized code. Constants properly extracted (POLL_INTERVAL_SECONDS, MAX_POLL_TIME_SECONDS, MAX_RECONNECT_ATTEMPTS). Good section dividers. The frontend reconnect logic is readable with a proper state machine.
⚠️ The decompose_goal_external docstring lost its useful return-type enumeration during the refactor (service.py:225-228).
⚠️ setTimeout(() => resumeStream(), 1_000) — the inline magic number should be RECONNECT_DELAY_MS.

📦 Product ✅ — Solves a real, painful UX problem (SSE dying during long agent generation). Backend polling is invisible to users. Frontend auto-reconnect is well-guarded with attempt limits and proper cleanup. Deployment-independent — backend and frontend can be deployed separately.
⚠️ The "Connection lost" toast after 3 failures is a dead end — no retry button; it asks the user to manually refresh. The backend job may still be running.
⚠️ No user-visible progress indicator during 30-minute generation jobs — an indefinite spinner.

📬 Discussion ✅ — CI fully green (25/25 checks pass). No human reviews submitted yet. No unresolved review threads.
⚠️ The author's manual test checklist (5 items) is entirely unchecked — deployment validation pending.
⚠️ Merge conflict detected with PR #12071 (Otto-AGPT) — needs coordination.
⚠️ Both Greptile and CodeRabbit skipped review due to an inflated file count (299 files — the branch likely includes a dev merge).

🔎 QA ⚠️ — Backend changes verified: containers running pr-12199 code, _submit_and_poll present, health endpoint returns 200, clean startup logs. Storybook loads correctly.
⚠️ Frontend could not be live-tested — port 3000 was serving a different worktree (pr-12141), and the Supabase client was unavailable for auth. The SSE reconnect code was verified correct in the worktree source but not exercised in the browser.
(Screenshots: landing, storybook)

Conditions (must fix before merge)

  1. agentgenerator_timeout is dead code — settings.py:374-376: The setting exists (updated default + description) but nothing reads it. _REQUEST_TIMEOUT is hardcoded to httpx.Timeout(30.0) at service.py:82. Either wire _get_settings().config.agentgenerator_timeout into the timeout, or remove the dead setting. Dead config confuses operators.

  2. Use a monotonic clock for elapsed tracking — service.py:185-187: Replace elapsed += POLL_INTERVAL_SECONDS with a time.monotonic() delta. The current approach doesn't account for HTTP request duration and can overshoot the 30-min budget by a large margin under pathological conditions (every poll hitting the 30s timeout).
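The drift matters because `elapsed += POLL_INTERVAL_SECONDS` counts only the sleeps, while a `time.monotonic()` deadline also counts request time. A small, self-contained illustration with a simulated clock (the 30s request duration is the pathological case described above):

```python
POLL_INTERVAL_SECONDS = 10
BUDGET_SECONDS = 1800  # the 30-minute polling budget

def iterations_with_counter() -> int:
    """Counter-based tracking: only sleep time is counted, never request time."""
    elapsed, iterations = 0, 0
    while elapsed < BUDGET_SECONDS:
        iterations += 1
        elapsed += POLL_INTERVAL_SECONDS
    return iterations

def iterations_with_deadline(request_seconds: float) -> int:
    """Deadline-based tracking: a simulated monotonic clock counts request time too."""
    now, iterations = 0.0, 0
    deadline = now + BUDGET_SECONDS
    while now < deadline:
        iterations += 1
        now += request_seconds + POLL_INTERVAL_SECONDS
    return iterations

# With every poll hitting a 30s timeout, the counter version still runs 180
# iterations (~2 hours of wall time), while the deadline version stops at 45.
```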

Should Fix (Follow-up OK)

  1. service.py:192-195 — Add a consecutive transient error cap (e.g., bail after 5 consecutive RequestErrors) or exponential backoff. Currently retries indefinitely within the budget.
  2. useCopilotPage.ts:281 — Exponential backoff on reconnect (1s → 2s → 4s) instead of a fixed 1s. All 3 attempts fire within ~3s if the LB is actively killing connections.
  3. useCopilotPage.ts:~249 — Move MAX_RECONNECT_ATTEMPTS to module scope (standard React pattern).
  4. useCopilotPage.ts:278 — Extract RECONNECT_DELAY_MS = 1_000 as a named constant.
  5. service.py:225-228 — Restore the return-type enumeration in the decompose_goal_external docstring (possible types: instructions, clarifying_questions, unachievable_goal, vague_goal, error).
  6. Frontend SSE reconnect tests — Add at least unit tests for the reconnect state machine (trigger, counter cap, counter reset, cleanup).
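For the backoff suggestions above, an illustrative doubling schedule (the 2s start and 30s cap are the review's suggested figures, not merged code):

```python
def backoff_intervals(start: float = 2.0, cap: float = 30.0, total: float = 120.0):
    """Yield poll delays that double from `start` up to `cap`, until `total` elapses."""
    delay, elapsed = start, 0.0
    while elapsed < total:
        yield delay
        elapsed += delay
        delay = min(delay * 2, cap)

# For the first 2 minutes of a job:
# list(backoff_intervals(total=120.0)) -> [2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```

Fast jobs are detected within seconds, while long jobs settle at one request per 30s instead of one per 10s.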

Risk Assessment

Merge risk: LOW | Rollback: EASY (revert 4 files, no schema changes, no data migrations)

@ntindle Well-architected refactor that solves the GCP LB SSE timeout problem cleanly. Two conditions before merge: wire or remove the orphaned agentgenerator_timeout setting, and fix the elapsed time tracking to use monotonic clock. Everything else is follow-up material. The core mechanism is sound β€” net code reduction with better reliability.


@autogpt-reviewer autogpt-reviewer left a comment


PR #12199 — feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz (Zamil Majdy) | Files: service.py (+183/-239), settings.py (+2/-2), useCopilotPage.ts (+40/-1), test_service.py (+372/-203)

🎯 Verdict: APPROVE WITH CONDITIONS


What This PR Does

GCP's L7 load balancer kills SSE connections at ~5 minutes, but agent generation takes 5–30 minutes. This PR switches the platform backend from a single long-lived HTTP call to a submit-and-poll pattern (POST → 202 + job_id → GET poll every 10s, 30min budget). The frontend adds auto-reconnect (3 attempts, 1s delay) when SSE drops mid-stream. Together, these two layers solve the connection-drop problem at both backend and frontend.

Specialist Findings

πŸ›‘οΈ Security βœ… β€” No blockers. job_id flows entirely within backend-to-microservice boundary β€” no IDOR risk. SSE reconnect reuses existing authenticated session β€” no auth bypass. Minor nit: job_id is interpolated directly into URL path (service.py:186) without format validation; a UUID/alphanumeric check would harden against hypothetical path traversal from a compromised internal service.

πŸ—οΈ Architecture βœ… β€” Clean, well-isolated abstraction. _submit_and_poll() has single responsibility; all 4 external functions delegate uniformly. Massive DRY win β€” eliminated 4Γ— identical try/except blocks. Frontend and backend layers are independent (good separation). ⚠️ agentgenerator_timeout setting (settings.py:374) is now dead code β€” _REQUEST_TIMEOUT is hardcoded at service.py:82. Wire it back or delete the setting.

⚑ Performance βœ… β€” asyncio.sleep(10) correctly yields to event loop for heartbeat delivery. Singleton httpx.AsyncClient with connection pooling is efficient. No memory leaks in poll loop. ⚠️ Fixed 10s interval = up to 180 requests per generation; exponential backoff (2sβ†’30s cap) would reduce to ~50 while detecting fast jobs sooner. ⚠️ Elapsed time tracking (service.py:190) drifts from wall clock β€” uses counter instead of time.monotonic().

πŸ§ͺ Testing ⚠️ β€” Backend TestSubmitAndPoll (8 tests) is solid: happy path, failures, transient retry, timeout all covered. Public function tests properly mock at the _submit_and_poll boundary. However: ❌ Frontend SSE auto-reconnect (40 lines of stateful useEffect logic) has zero test coverage β€” this is the user-facing reliability fix. ❌ customize_template_external has no test class at all (missing TestCustomizeTemplateExternal). Missing edge cases: all-transient-errors-until-timeout, malformed poll JSON, empty result on completed job.

πŸ“– Quality βœ… β€” Genuinely cleaner refactor (net -56 lines). Good naming conventions (_submit_and_poll, POLL_INTERVAL_SECONDS). Well-organized section comments. ⚠️ Docstrings trimmed too aggressively β€” decompose_goal_external lost useful return-type enumeration (instructions/clarifying_questions/unachievable_goal/vague_goal/error). ⚠️ setTimeout(() => resumeStream(), 1_000) β€” extract as RECONNECT_DELAY_MS for consistency.

πŸ“¦ Product βœ… β€” Solves a real, painful user problem. Happy path is completely transparent to users. Reconnect is well-guarded with attempt limits and proper cleanup. Toast on exhaustion is clear. ⚠️ No progress indicator during long generation (25 min with zero feedback). ⚠️ Unclear if Stop button cancels the _submit_and_poll loop. ⚠️ Consider "Reconnecting..." indicator instead of brief error-state flash during 1s reconnect delay.

πŸ“¬ Discussion ⚠️ β€” Zero human reviews (PR opened ~3 hours ago). Both Greptile and CodeRabbit skipped due to inflated diff (299 files from dev merge β€” only 4 actual changes). πŸ”΄ Merge conflict with #12071 (stale/approved). Companion PR dependency: AutoGPT-Agent-Generator/pull/120 must deploy alongside for polling endpoints to exist. All 5 manual test plan checkboxes unchecked.

πŸ”Ž QA ⚠️ β€” Environment partially broken (backend/Supabase down, only Next.js frontend running). Login blocked, copilot page inaccessible. Code inspection confirms reconnect logic is well-implemented. Could not perform live SSE reconnect testing.


Conditions for Merge

  1. agentgenerator_timeout setting is dead code — settings.py:374-376 defines it, but service.py:82 hardcodes _REQUEST_TIMEOUT = httpx.Timeout(30.0) instead. Either wire settings.config.agentgenerator_timeout into the timeout, or delete the setting. Dead config misleads operators. (Flagged by: Security, Architect, Quality)

  2. Restore decompose_goal_external return-type documentation — The docstring lost the enumeration of possible return types (instructions, clarifying_questions, unachievable_goal, vague_goal, error). One-line abbreviated Args: sections for all 4 functions would also help callers. (Flagged by: Quality)

Should Fix (Follow-up OK)

  1. Frontend SSE reconnect needs tests — 40 lines of stateful useEffect logic with reconnectAttemptsRef, setTimeout cleanup, counter reset. At minimum: RTL test verifying reconnect triggers on streaming→error + hasActiveStream, counter exhaustion shows toast, counter resets on success. (Flagged by: Testing)
  2. Add TestCustomizeTemplateExternal — The test class is missing entirely, and the function has its own response-mapping logic. (Flagged by: Testing)
  3. Exponential backoff for polling — A fixed 10s interval means 180 requests worst case. Start at 2s, cap at 30s → ~50 requests and faster detection of quick jobs. (Flagged by: Performance, Architect)
  4. Use time.monotonic() for elapsed tracking — The current counter drifts by the HTTP request duration per iteration. (Flagged by: Performance, Quality)
  5. Extract RECONNECT_DELAY_MS = 1_000 — Inline magic number in the frontend. (Flagged by: Quality)
  6. Rebase to reduce diff noise — The 299-file diff from the dev merge blocks automated reviewers; only 4 files actually changed. (Flagged by: Discussion)

Risk Assessment

Merge risk: LOW | Rollback: EASY (revert 4 files, no migrations)

Deployment note: Requires companion PR (AutoGPT-Agent-Generator/pull/120) deployed alongside for polling endpoints. Without it, backend will get unexpected responses from agent-generator. Dummy mode still works independently.

@ntindle Two small conditions (dead setting + restore docstring), then this is good to go. Backend refactor is a clean DRY win, frontend reconnect is well-implemented. Main gap is frontend test coverage β€” not blocking merge but should be a fast follow-up.

@github-actions github-actions bot added platform/frontend AutoGPT Platform - Front end and removed platform/blocks labels Feb 25, 2026

majdyz commented Feb 25, 2026

Review Items Addressed (0bc098a)

Must Fix (both done):

  1. ✅ Wired the agentgenerator_timeout setting into the HTTP client — _get_client() now reads settings.config.agentgenerator_timeout dynamically instead of the hardcoded constant
  2. ✅ Replaced elapsed += POLL_INTERVAL_SECONDS with a time.monotonic() delta for accurate wall-clock timeout tracking

Should Fix (done in same commit):
3. ✅ Added MAX_CONSECUTIVE_POLL_ERRORS = 5 — the polling loop now gives up after 5 consecutive transient RequestError failures instead of retrying for the full 30-min budget
4. ✅ Restored the decompose_goal_external docstring with the full return-type enumeration

Tests updated: test_poll_timeout now mocks time.monotonic, new test_poll_gives_up_after_consecutive_transient_errors test added. All 32 tests pass.
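The MAX_CONSECUTIVE_POLL_ERRORS behavior described in item 3 can be sketched as a streak counter that resets on any successful poll (illustrative, not the PR's exact code):

```python
MAX_CONSECUTIVE_POLL_ERRORS = 5

def run_polls(outcomes):
    """Walk through poll outcomes ('ok' / 'error'); abort after 5 errors in a row."""
    consecutive = 0
    for outcome in outcomes:
        if outcome == "error":
            consecutive += 1
            if consecutive >= MAX_CONSECUTIVE_POLL_ERRORS:
                raise RuntimeError("giving up: too many consecutive transient failures")
        else:
            consecutive = 0  # any successful poll resets the streak
    return "completed"
```

Intermittent failures survive indefinitely within the 30-min budget; only an unbroken run of 5 failures aborts the job.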


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 320-325: The code currently returns result.get("agent_json")
directly which can be None if the job payload is malformed; update each
agent-producing endpoint that returns result.get("agent_json") (the three
occurrences using result.get("agent_json")) to validate that agent_json is
present and is the expected type (e.g., a dict) before returning it, and if
invalid or missing return an invalid_response-style error object (use the
existing invalid_response helper or a
{"type":"error","error":"invalid_response", ...} payload) so callers get a clear
diagnostic instead of None; apply this change to all three return sites that
currently return result.get("agent_json").
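What CodeRabbit is asking for amounts to a presence-and-type check before returning; a rough sketch (the error payload shape follows the comment's wording, not necessarily the repo's existing invalid_response helper):

```python
def extract_agent_json(result: dict):
    """Validate that a completed job actually carries a usable agent_json dict."""
    agent_json = result.get("agent_json")
    if not isinstance(agent_json, dict):
        # Malformed or missing payload: return a diagnostic error object
        # instead of silently propagating None to the caller.
        return {
            "type": "error",
            "error": "invalid_response",
            "detail": "agent-generator job completed without a valid agent_json",
        }
    return agent_json
```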
- Around line 189-197: The poll handler currently treats transient HTTP statuses
as fatal; after calling _classify_http_error(e) inside the httpx.HTTPStatusError
except block, detect transient cases (e.g., status codes 429, 503, 504, 408 or
error_type values that _classify_http_error returns for rate limiting/temporary
upstream errors) and do NOT return an immediate _create_error_response; instead
log the transient condition via logger.error and surface a retryable signal (for
example raise a new RetryablePollError or re-raise the original exception) so
the caller can retry the poll; only call and return _create_error_response for
non-transient error_type values. Ensure you reference and update the
httpx.HTTPStatusError except block, the use of _classify_http_error,
logger.error, and _create_error_response accordingly.


📥 Commits

Reviewing files that changed from the base of the PR and between 6e61cb1 and 0bc098a.

📒 Files selected for processing (4)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/backend/util/settings.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Seer Code Review
  • GitHub Check: types
  • GitHub Check: Check PR Status
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
πŸ”‡ Additional comments (5)
autogpt_platform/backend/backend/util/settings.py (1)

374-377: Per-request timeout config update looks consistent with the new polling model.

The new default/value description pair is clear and matches submit/poll request-level timeout semantics.

autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts (2)

261-266: Session cache invalidation on stream end is a good addition.

This avoids stale hydrated session data after stream termination and improves remount/navigation consistency.


268-297: Reconnect strategy is well-bounded and safely gated.

The streaming→error guard with hasActiveStream and capped retry attempts prevents runaway reconnect loops while still recovering common mid-stream SSE drops.

autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)

181-185: Good use of monotonic timing and non-blocking poll waits.

Using time.monotonic() with await asyncio.sleep(...) is the right pattern for stable elapsed-time checks in async loops.
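The pattern praised here can be sketched as follows. Names, signatures, and defaults are illustrative only and do not reproduce the actual `_submit_and_poll` implementation:

```python
import asyncio
import time

POLL_INTERVAL_SECONDS = 10
MAX_POLL_TIME_SECONDS = 1800


async def poll_until_done(poll_once, interval=POLL_INTERVAL_SECONDS,
                          budget=MAX_POLL_TIME_SECONDS):
    """Poll `poll_once` until a terminal status or until the budget elapses.

    time.monotonic() is immune to system clock adjustments, so the elapsed
    budget is tracked correctly even if the wall clock jumps, and
    asyncio.sleep() yields to the event loop between polls so other tasks
    (e.g. SSE heartbeats) keep running.
    """
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        data = await poll_once()
        status = data.get("status")
        if status == "completed":
            return data.get("result", {})
        if status == "failed":
            return {"error": data.get("error", "Job failed")}
        await asyncio.sleep(interval)  # yield control between polls
    return {"error": "timed_out"}
```

The key design choice is that each poll request stays short while the overall budget is enforced locally, so no single HTTP call outlives the load balancer's idle timeout.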

autogpt_platform/backend/test/agent_generator/test_service.py (1)

52-274: Great coverage on the new submit-and-poll core path.

These tests exercise success, timeout, transient network failures, 404 expiry, and capped retry behavior in a way that matches the new helper semantics.

@majdyz majdyz changed the title feat(copilot): async polling for agent-generator + SSE auto-reconnect feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect Feb 25, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 231-233: In _submit_and_poll, after detecting status ==
"completed" (and before returning poll_data.get("result", {})), validate that
poll_data.get("result") is a dict; if it is, return it, otherwise log an error
including job_id and the unexpected result type/value and return a structured
invalid response (e.g. {"invalid_response": True, "raw_result": <result>}) so
downstream callers that call .get(...) won't raise AttributeError.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 0bc098a and 9b3e25d.

πŸ“’ Files selected for processing (2)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Seer Code Review
  • GitHub Check: types
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
  • GitHub Check: Check PR Status
🧰 Additional context used
πŸ““ Path-based instructions (6)
autogpt_platform/backend/**/*.py

πŸ“„ CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*.{py,txt}

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use poetry run prefix for all Python commands, including testing, linting, formatting, and migrations

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/backend/**/*.py

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/**/*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/test/**/*.py

πŸ“„ CodeRabbit inference engine (.github/copilot-instructions.md)

Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Files:

  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*test*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

  • autogpt_platform/backend/test/agent_generator/test_service.py
🧬 Code graph analysis (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
  • generate_agent_dummy (101-115)
  • customize_template_dummy (140-151)
πŸ”‡ Additional comments (2)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)

189-208: Transient poll retry behavior is a solid reliability improvement.

Handling retryable HTTP/request errors with a consecutive-error cap is a good balance between resilience and bounded wait time.

Also applies to: 212-225
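The retry gate described above can be reduced to a small predicate. The constants match the PR description (transient codes 429/503/504/408, cap of 5), but the function name and shape are illustrative, not the real service.py code:

```python
TRANSIENT_STATUS_CODES = {408, 429, 503, 504}
MAX_CONSECUTIVE_POLL_ERRORS = 5


def should_retry(status_code: int, consecutive_errors: int) -> bool:
    """Retry transient HTTP statuses until the consecutive-error cap is hit.

    Non-transient errors (e.g. 500) fail immediately. A successful poll is
    expected to reset the caller's `consecutive_errors` counter to zero, so
    only an unbroken run of failures exhausts the cap.
    """
    return (
        status_code in TRANSIENT_STATUS_CODES
        and consecutive_errors < MAX_CONSECUTIVE_POLL_ERRORS
    )
```

Keeping the cap on *consecutive* rather than total errors is what bounds the wait time while still tolerating brief load-balancer blips spread across a long job.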

autogpt_platform/backend/test/agent_generator/test_service.py (1)

52-336: Great expansion of polling-path test coverage.

The new cases for timeout, transient retries, and consecutive-error aborts materially reduce regression risk for the async polling flow.

Also applies to: 338-850


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (2)

344-350: Consider deduplicating agent_json extraction/validation.

The same validation block appears in three endpoints. A small helper would reduce drift and keep error responses consistent.

♻️ Suggested refactor
+def _extract_agent_json_or_error(result: dict[str, Any]) -> dict[str, Any]:
+    agent_json = result.get("agent_json")
+    if isinstance(agent_json, dict):
+        return agent_json
+    return _create_error_response(
+        "Agent Generator returned no agent_json in result", "invalid_response"
+    )
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)

Also applies to: 391-396, 441-446

πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 344 - 350, Create a small helper (e.g.,
_extract_agent_json(result)) in service.py that encapsulates the repeated logic:
get "agent_json" from the result, verify it's a dict, and return either the dict
or call _create_error_response("Agent Generator returned no agent_json in
result", "invalid_response"). Replace the three duplicated blocks (the
extraction/validation around agent_json at the locations using variable
agent_json) with calls to this helper so all endpoints use the same validation
and error response.

240-245: Handle unknown poll statuses explicitly instead of treating all as β€œstill running”.

Right now, any status other than "completed" / "failed" keeps polling until timeout. If upstream returns an unexpected terminal state, this delays failure diagnosis by up to 30 minutes.

♻️ Suggested hardening
         if status == "completed":
             logger.info(f"Agent Generator job {job_id} completed")
             result = poll_data.get("result", {})
             if not isinstance(result, dict):
                 return _create_error_response(
                     "Agent Generator returned invalid result payload",
                     "invalid_response",
                 )
             return result
         elif status == "failed":
             error_msg = poll_data.get("error", "Job failed")
             logger.error(f"Agent Generator job {job_id} failed: {error_msg}")
             return _create_error_response(error_msg, "job_failed")
-        # else: still running β€” keep polling
+        elif status in {"pending", "queued", "running"}:
+            continue
+        else:
+            logger.error(f"Unknown job status for {job_id}: {status!r}")
+            return _create_error_response(
+                f"Unknown job status: {status}",
+                "invalid_response",
+            )
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 240 - 245, The polling loop currently treats any
non-"completed"/"failed" status as "still running"; update the branch that
checks status (around the elif status == "failed" block handling job_id and
poll_data) to explicitly detect unexpected terminal statuses: if status is a
known running state (e.g., "pending","running","in_progress") continue polling,
otherwise log an error including job_id and the unexpected status/poll_data and
return _create_error_response with a distinct code (e.g., "unknown_status") so
callers fail fast instead of waiting the full timeout.
πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 344-350: Create a small helper (e.g., _extract_agent_json(result))
in service.py that encapsulates the repeated logic: get "agent_json" from the
result, verify it's a dict, and return either the dict or call
_create_error_response("Agent Generator returned no agent_json in result",
"invalid_response"). Replace the three duplicated blocks (the
extraction/validation around agent_json at the locations using variable
agent_json) with calls to this helper so all endpoints use the same validation
and error response.
- Around line 240-245: The polling loop currently treats any
non-"completed"/"failed" status as "still running"; update the branch that
checks status (around the elif status == "failed" block handling job_id and
poll_data) to explicitly detect unexpected terminal statuses: if status is a
known running state (e.g., "pending","running","in_progress") continue polling,
otherwise log an error including job_id and the unexpected status/poll_data and
return _create_error_response with a distinct code (e.g., "unknown_status") so
callers fail fast instead of waiting the full timeout.


πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 9b3e25d and 8ef8bec.

πŸ“’ Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: types
  • GitHub Check: Seer Code Review
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.12)
  • GitHub Check: Analyze (python)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: Check PR Status
πŸ”‡ Additional comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (3)

125-135: Nice improvement on timeout configuration.

Using settings.config.agentgenerator_timeout in _get_client() removes the hardcoded timeout and keeps request timeout behavior centrally configurable.


181-227: Polling loop resiliency is solid.

time.monotonic() budget tracking, asyncio.sleep() yielding, and capped consecutive transient error retries are well implemented for long-running jobs.


503-508: close_client() lifecycle handling looks correct.

Closing the singleton client and resetting _client to None is clean and safe for reinitialization.

@majdyz

majdyz commented Feb 25, 2026

Addressed both nitpicks from the latest coderabbit review in ec7c7eb:

  1. Code dedup β€” extracted _extract_agent_json(result) helper, used in all 3 agent-producing endpoints.
  2. Unknown poll status β€” now explicitly handles running/pending/queued as known running states; unknown statuses fail fast with invalid_response instead of polling for 30 minutes.

Comment on lines +281 to +282
const timer = setTimeout(() => resumeStream(), 1_000);
return () => clearTimeout(timer);

Bug: A race condition in the useEffect hook for auto-reconnection can prevent it from firing. If a dependency changes, the reconnect timer is cleared but not reset.
Severity: MEDIUM

Suggested Fix

To fix this, track the previous status in a ref that is updated on every render, and store the reconnect timer ID in a useRef so the cleanup only clears a timer that actually exists. That way the streaming-to-error transition is still detected when the effect re-runs, and a new timer can be scheduled instead of being silently dropped.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts#L281-L282

Potential issue: A race condition exists in the auto-reconnect logic within the
`useEffect` hook. When a stream's `status` transitions from "streaming" to "error", a
one-second timer is set to call `resumeStream`. However, if a dependency in the hook's
dependency array (e.g., the `resumeStream` function) is recreated within that one-second
window, the effect's cleanup function is triggered, clearing the timer. When the effect
re-runs, the `prev` status is now "error", causing the condition `prev === "streaming"`
to fail. This prevents the timer from being set again, and the auto-reconnection never
occurs.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (3)

227-229: Guard poll_resp.json() against DecodingError.

Same as the submit step: a 200 response with a non-JSON body causes httpx.DecodingError to escape the poll loop, producing an opaque error at the caller site.

♻️ Proposed fix
         consecutive_errors = 0
-        poll_data = poll_resp.json()
+        try:
+            poll_data = poll_resp.json()
+        except (httpx.DecodingError, ValueError) as e:
+            return _create_error_response(
+                f"Agent Generator returned non-JSON poll response for job {job_id}: {e}",
+                "invalid_response",
+            )
         status = poll_data.get("status")
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 227 - 229, Guard the call to poll_resp.json() against
httpx.DecodingError by wrapping it in a try/except: catch httpx.DecodingError
around poll_resp.json(), log the decoding error (include poll_resp.status_code
and poll_resp.text if available), increment the existing consecutive_errors
counter and continue the polling loop instead of letting the exception escape;
ensure poll_data is only used when parsing succeeded (i.e., move usage of
poll_data and status into the try block or after a successful parse).

159-176: Guard response.json() against DecodingError; consider retrying transient submit errors.

Two related concerns on the submit step:

  1. Unguarded response.json() (line 171): If the agent-generator returns a non-JSON body with a 202 (even transiently), httpx.DecodingError escapes _submit_and_poll unhandled. It is caught only by each caller's broad except Exception, producing an opaque "Unexpected error" message. Catching it here gives a more actionable diagnostic.

  2. No retry for transient submit errors: The poll loop retries 429/503/504/408 up to MAX_CONSECUTIVE_POLL_ERRORS times, but a single transient HTTP error during the submit POST fails the entire operation. If brief LB restarts during submission are a concern, a small retry budget here would be consistent.

♻️ Proposed fix for issue 1 (JSON decoding guard)
     data = response.json()
+    except (httpx.DecodingError, ValueError) as e:
+        return _create_error_response(
+            f"Agent Generator returned non-JSON submit response: {e}",
+            "invalid_response",
+        )
     job_id = data.get("job_id")

Or inline within the existing try block:

     try:
         response = await client.post(endpoint, json=payload)
         response.raise_for_status()
+        data = response.json()
     except httpx.HTTPStatusError as e:
         error_type, error_msg = _classify_http_error(e)
         logger.error(error_msg)
         return _create_error_response(error_msg, error_type)
     except httpx.RequestError as e:
         error_type, error_msg = _classify_request_error(e)
         logger.error(error_msg)
         return _create_error_response(error_msg, error_type)
+    except (httpx.DecodingError, ValueError) as e:
+        return _create_error_response(
+            f"Agent Generator returned non-JSON submit response: {e}",
+            "invalid_response",
+        )
-
-    data = response.json()
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 159 - 176, The submit step in _submit_and_poll should guard against
httpx.DecodingError when calling response.json() and should retry transient
submit failures: wrap the client.post call in a small retry loop (e.g., 2–3
attempts with brief backoff) that treats transient status codes (429, 503, 504,
408) and RequestError as retryable (consistent with MAX_CONSECUTIVE_POLL_ERRORS
behavior), use _classify_http_error/_classify_request_error to classify/log
failures, and on final failure return _create_error_response with a clear
"invalid_response" or appropriate error_type; additionally catch
httpx.DecodingError after a successful response and return
_create_error_response with a diagnostic message instead of letting it
propagate.

34-36: Consider exposing poll interval/budget in Settings alongside agentgenerator_timeout.

agentgenerator_timeout (per-request HTTP timeout) is now a configurable setting, but POLL_INTERVAL_SECONDS (10 s) and MAX_POLL_TIME_SECONDS (1800 s) are hardcoded constants. For deployments with different job latency profiles these are operationally useful knobs. This is a low-priority improvement, but keeping the three values together in Settings would be consistent.
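Promoted to settings as suggested, the shape could look roughly like this. The field names are the reviewer's proposals, not actual settings.py code, and a plain dataclass stands in for the project's pydantic Settings model only to keep the sketch self-contained:

```python
from dataclasses import dataclass


@dataclass
class AgentGeneratorConfig:
    """Hypothetical grouping of the agent-generator knobs in one place.

    Defaults mirror the values currently hardcoded in service.py and
    settings.py, so behavior is unchanged unless a deployment overrides them.
    """
    agentgenerator_timeout: float = 30.0             # per-request HTTP timeout
    agentgenerator_poll_interval: float = 10.0       # seconds between polls
    agentgenerator_poll_timeout: float = 1800.0      # total polling budget
    agentgenerator_max_consecutive_poll_errors: int = 5
```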

πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 34 - 36, The hardcoded polling constants POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS should be moved into the
app Settings alongside agentgenerator_timeout so they are configurable per
deployment; add new Settings fields (e.g., agentgenerator_poll_interval,
agentgenerator_poll_timeout, agentgenerator_max_consecutive_poll_errors) with
the current defaults (10.0, 1800.0, 5), update
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py to
remove the constants and read those values from Settings (use the same Settings
accessor used for agentgenerator_timeout) wherever POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS are referenced, and
ensure any validation/type casting (float/int) is applied and unit tests/usage
still behave the same by default.
πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 227-229: Guard the call to poll_resp.json() against
httpx.DecodingError by wrapping it in a try/except: catch httpx.DecodingError
around poll_resp.json(), log the decoding error (include poll_resp.status_code
and poll_resp.text if available), increment the existing consecutive_errors
counter and continue the polling loop instead of letting the exception escape;
ensure poll_data is only used when parsing succeeded (i.e., move usage of
poll_data and status into the try block or after a successful parse).
- Around line 159-176: The submit step in _submit_and_poll should guard against
httpx.DecodingError when calling response.json() and should retry transient
submit failures: wrap the client.post call in a small retry loop (e.g., 2–3
attempts with brief backoff) that treats transient status codes (429, 503, 504,
408) and RequestError as retryable (consistent with MAX_CONSECUTIVE_POLL_ERRORS
behavior), use _classify_http_error/_classify_request_error to classify/log
failures, and on final failure return _create_error_response with a clear
"invalid_response" or appropriate error_type; additionally catch
httpx.DecodingError after a successful response and return
_create_error_response with a diagnostic message instead of letting it
propagate.
- Around line 34-36: The hardcoded polling constants POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS should be moved into the
app Settings alongside agentgenerator_timeout so they are configurable per
deployment; add new Settings fields (e.g., agentgenerator_poll_interval,
agentgenerator_poll_timeout, agentgenerator_max_consecutive_poll_errors) with
the current defaults (10.0, 1800.0, 5), update
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py to
remove the constants and read those values from Settings (use the same Settings
accessor used for agentgenerator_timeout) wherever POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS are referenced, and
ensure any validation/type casting (float/int) is applied and unit tests/usage
still behave the same by default.


πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 8ef8bec and ec7c7eb.

πŸ“’ Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.11)
  • GitHub Check: Check PR Status
πŸ”‡ Additional comments (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)

143-252: _submit_and_poll polling logic is well-structured overall.

The poll loop correctly:

  • Uses time.monotonic() for accurate wall-clock budget tracking.
  • Resets consecutive_errors after any successful poll response (line 227).
  • Distinguishes 404 (non-retryable) from transient statuses (retryable) before computing status_code.
  • Fails fast on unknown job statuses (invalid_response) rather than burning the full poll budget.
  • Returns a structured error dict in every exit path.

All three items from previous review rounds (transient HTTP retry, agent_json validation, result dict type guard) are correctly addressed.


@autogpt-reviewer autogpt-reviewer left a comment


PR #12199 β€” Re-review #2: feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz | HEAD: ec7c7eb | Previous: d78e0ee (4 new commits)

🎯 Verdict: APPROVE


What Changed Since Last Review

Author addressed both conditions and most should-fix items from our first review in 4 focused commits:

Commit What
0bc098a Wire agentgenerator_timeout setting into client, use time.monotonic()
9b3e25d Retry transient HTTP errors (429/503/504/408), validate job_id, MAX_CONSECUTIVE_POLL_ERRORS=5
8ef8bec Validate completed job result type
ec7c7eb Extract _extract_agent_json helper, fail fast on unknown status

Condition Status

  1. βœ… agentgenerator_timeout dead code β€” FIXED. _get_client() now reads settings.config.agentgenerator_timeout. Hardcoded _REQUEST_TIMEOUT removed.
  2. βœ… Restore decompose_goal_external docstring β€” FIXED. Full return-type enumeration restored with all 5 variants.

Should-Fix Status

Item Status
time.monotonic() for elapsed tracking βœ… FIXED
Consecutive transient error cap βœ… FIXED (MAX_CONSECUTIVE_POLL_ERRORS=5)
Transient HTTP retry (429/503/504/408) βœ… FIXED (new, beyond what we asked)
Validate result type on completed job βœ… FIXED
Unknown job status β†’ error βœ… FIXED (new β€” explicit running/pending/queued allowlist)
_extract_agent_json DRY helper βœ… FIXED (replaces 3Γ— result.get("agent_json"))
5 new tests for above βœ… FIXED
Frontend reconnect tests ❌ Not addressed (follow-up)
TestCustomizeTemplateExternal ❌ Not addressed (follow-up)
RECONNECT_DELAY_MS constant ❌ Not addressed (minor)

New Code Quality Assessment

The new code is well-structured:

  • Transient HTTP retry correctly differentiates 429/503/504/408 (retry with counter) from 500/other (fail immediately). Good distinction.
  • consecutive_errors counter resets to 0 on successful poll β€” only consecutive failures trigger give-up. Correct behavior.
  • _extract_agent_json eliminates 3Γ— identical .get("agent_json") calls with proper validation. Returns typed error on missing/invalid.
  • Explicit status allowlist (running/pending/queued) with fallthrough error for unknown statuses β€” defensive and correct.
  • Test coverage for all new paths: transient HTTP retry + recovery, non-transient immediate fail, consecutive error give-up, monotonic time mocking, missing agent_json validation.

Remaining (Non-blocking Follow-ups)

  1. Frontend SSE reconnect tests — 40 lines of stateful useEffect still untested. Not blocking because the code was reviewed by 3 specialists and the logic is straightforward, but they should be added.
  2. TestCustomizeTemplateExternal — still missing. Low risk (mirrors generate_agent_patch_external logic).
  3. RECONNECT_DELAY_MS constant — minor style nit.

Risk Assessment

Merge risk: LOW | Rollback: EASY (4 files, no migrations)
CI: All tests pass (3.11/3.12/3.13), lint ✅, types ✅, e2e ✅. The Check PR Status failure is a race condition (it ran before the other checks finished).

⚠️ Deployment: Still requires companion PR (AutoGPT-Agent-Generator/pull/120) for polling endpoints.

@ntindle Clean iteration — author addressed all conditions and most should-fix items with well-tested code. Ready to merge.

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Feb 26, 2026
@github-actions

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@ntindle

ntindle commented Mar 4, 2026

is this still needed?


@ntindle ntindle left a comment


Code Review — feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect

The submit-and-poll approach is the right architectural decision for surviving GCP L7 LB kill-offs, and the overall structure is clean. However, there are several issues ranging from a behavioral loop bug to missing safety nets that need to be addressed before merge.


🔴 Blocking

1. Infinite reconnect loop in useCopilotPage.ts (line ~1603)

The reconnect counter is reset every time status === "streaming" — which happens immediately after each successful resumeStream(). This means the 3-attempt guard is per-error-burst, not per-session, and can loop indefinitely:

streaming → error    (attempt 1, counter = 1)
resumeStream() called
streaming            (counter RESETS to 0 ← bug)
streaming → error    (attempt 1 again, counter = 1)
... loops forever if the LB keeps killing connections every ~5 min

Fix: Only reset the counter on status === "ready" (stream fully and successfully complete), not on status === "streaming":

// Reset reconnect counter ONLY when stream completes normally
if (status === "ready") {
  reconnectAttemptsRef.current = 0;
}

2. response.json() / poll_resp.json() not guarded in _submit_and_poll (service.py, lines ~142 & ~199)

If the agent-generator (or an intervening proxy) returns a non-JSON body (HTML error page, plain-text gateway timeout, etc.) both data = response.json() and poll_data = poll_resp.json() will throw a json.JSONDecodeError. This propagates out of _submit_and_poll unhandled, gets caught by the bare except Exception in callers, and surfaces as a cryptic "unexpected_error".

Fix: Wrap both .json() calls:

try:
    data = response.json()
except Exception as exc:
    return _create_error_response(
        f"Agent Generator returned non-JSON submit response: {exc}",
        "invalid_response",
    )

Do the same for poll_data = poll_resp.json() inside the polling loop.

3. Silent removal of operation_id / session_id parameters is a breaking API change (service.py)

generate_agent_patch_external and customize_template_external previously accepted operation_id: str = "" and session_id: str = "" keyword arguments that enabled Redis Streams async-callback mode. These are gone without a deprecation shim.

Any caller that passes those keyword arguments will receive an immediate TypeError at runtime. The companion PR description notes that the {"status": "accepted"} response type is also removed, but there is no evidence that the internal callers of these functions (copilot router, etc.) have been audited.

Required: Show the call sites of both functions and confirm no caller passes operation_id or session_id. If they do, add **_ignored_kwargs or raise a DeprecationWarning with a grace-period PR.

4. agentgenerator_timeout semantic change silently breaks operator overrides (settings.py, line ~656)

Changing the default from 1800 → 30 is fine for new deployments, but:

  • Anyone who has overridden agentgenerator_timeout=60 (for example) expecting 60 seconds total time will now have a 60-second per-request HTTP timeout and still poll for up to 30 minutes.
  • The hardcoded MAX_POLL_TIME_SECONDS = 1800.0 in service.py cannot be overridden via settings at all.

Fix: Expose MAX_POLL_TIME_SECONDS as a settings field (e.g., agentgenerator_poll_timeout: int = Field(default=1800, ...)). Update the agentgenerator_timeout description to say "per-request HTTP timeout" explicitly and note the default change.
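The suggested settings split could look like the following pydantic sketch. Only `agentgenerator_timeout` exists today; `agentgenerator_poll_timeout` is the reviewer's proposal, and the surrounding class name is hypothetical.

```python
from pydantic import BaseModel, Field

class AgentGeneratorSettings(BaseModel):
    # Per-request HTTP timeout for each submit/poll call. Note the semantic
    # change: before this PR the same field meant total generation time.
    agentgenerator_timeout: int = Field(
        default=30,
        description="Per-request HTTP timeout (seconds) for agent-generator calls",
    )
    # Total wall-clock budget for the polling loop, previously hardcoded as
    # MAX_POLL_TIME_SECONDS = 1800.0 in service.py.
    agentgenerator_poll_timeout: int = Field(
        default=1800,
        description="Total polling timeout (seconds) for agent-generator jobs",
    )
```

Splitting the two budgets makes an operator override of one independent of the other, which is exactly the confusion the 1800 → 30 default change introduced.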


🟠 High Severity

5. MAX_RECONNECT_ATTEMPTS defined inside the hook (useCopilotPage.ts, line ~1559)

const MAX_RECONNECT_ATTEMPTS = 3;   // recreated on every render

Move to module scope. This is also flagged by most lint/ESLint no-magic-numbers rules.

6. resumeStream in useEffect deps may not be stable (useCopilotPage.ts, line ~1606)

resumeStream is listed in the useEffect dependency array. If resumeStream is not wrapped in useCallback with a stable dependency set, it will be a new function reference on every render, causing the effect to fire on every render β€” potentially triggering reconnect logic unexpectedly. Please confirm resumeStream has a stable reference, or wrap the call inside the effect with a ref to avoid the dependency.

7. No tests for customize_template_external polling behavior (test_service.py)

customize_template_external was migrated to _submit_and_poll but there is no corresponding test class in the diff. It is the only migrated endpoint without coverage for:

  • Happy-path agent_json extraction
  • clarifying_questions passthrough
  • Error propagation from _submit_and_poll
  • _extract_agent_json validation failure

Add a TestCustomizeTemplateExternal class mirroring TestGenerateAgentPatchExternal.

8. No cancellation of in-flight polling job on caller abort (service.py, _submit_and_poll)

If the FastAPI request handler is cancelled (client closes connection, timeout), asyncio.CancelledError will propagate correctly out of asyncio.sleep, but the job submitted to the agent-generator will continue running on the remote side until completion. There is no cleanup path.

Fix: Catch asyncio.CancelledError, issue a best-effort DELETE /api/jobs/{job_id}, then re-raise:

except asyncio.CancelledError:
    logger.warning(f"Polling cancelled for job {job_id}, attempting cleanup")
    with contextlib.suppress(Exception):
        await client.delete(f"/api/jobs/{job_id}")
    raise

🟡 Medium / Suggestions

9. asyncio.sleep patched globally in tests (test_service.py)

patch("asyncio.sleep", new_callable=AsyncMock) patches the global symbol, not the one imported by the module under test. Prefer:

patch("backend.copilot.tools.agent_generator.service.asyncio.sleep", new_callable=AsyncMock)

This is more precise and won't accidentally suppress sleeps in other coroutines sharing the event loop.
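One caveat worth knowing: the module-qualified target only behaves differently from `patch("asyncio.sleep")` when the module under test binds the symbol into its own namespace (`from asyncio import sleep`); otherwise both forms patch the same shared `asyncio` module object. The stdlib-only toy below models the `from asyncio import sleep` case, where the patch-target location genuinely matters; the module name and functions are invented for illustration.

```python
import asyncio
import sys
import types
from unittest.mock import AsyncMock, patch

# Toy stand-in for the service module under test. Assigning asyncio.sleep
# onto the module models `from asyncio import sleep` in real code.
service = types.ModuleType("toy_service")
service.sleep = asyncio.sleep

async def poll_loop() -> str:
    # The name is looked up on the toy module at call time, so patching
    # "toy_service.sleep" intercepts it without touching asyncio.sleep.
    await service.sleep(0.5)
    return "done"

service.poll_loop = poll_loop
sys.modules["toy_service"] = service  # make the module importable by patch()

# Patch the symbol where it is looked up, not the global asyncio.sleep.
with patch("toy_service.sleep", new_callable=AsyncMock) as fake_sleep:
    result = asyncio.run(service.poll_loop())

fake_sleep.assert_awaited_once_with(0.5)
```

Because the mock replaces only the module-local binding, unrelated coroutines that call `asyncio.sleep` directly keep their real delays.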

10. _extract_agent_json return type is ambiguous (service.py, lines ~226-236)

_extract_agent_json returns either valid agent JSON or an error-response dict, both typed as dict[str, Any]. Callers check result.get("type") == "error" before calling it, so errors from validation inside _extract_agent_json silently bubble back as undetected error dicts.

Consider returning None on validation failure (and generating the error response in the caller) so type checkers can distinguish the success case from the error case.

11. _submit_and_poll tests reach into private API (test_service.py, TestSubmitAndPoll)

await service._submit_and_poll(...) tests the private helper directly. These tests are fragile — renaming or splitting the function breaks the entire class. Most of the scenarios (transient retry, 404, timeout) are already implicitly exercised by the public-API tests. Consider whether direct private-function tests are necessary or if they can be folded into the public-API test classes.

12. _client singleton ignores runtime agentgenerator_timeout changes (service.py, _get_client())

This was true before this PR but becomes more operationally important now that agentgenerator_timeout controls per-request HTTP timeouts in a polling loop. A comment documenting that the client must be reset (set service._client = None) after changing the setting would save future debugging time.

13. hasActiveStream timing in useEffect (useCopilotPage.ts)

hasActiveStream is read inside useEffect, which runs after the render commit. If status → "error" and hasActiveStream → false are batched into the same React update, the effect may see hasActiveStream === false and skip the reconnect even though the backend job is still running. Document the ordering assumption or use a useRef for hasActiveStream to guarantee synchronous reads.


📝 Nits

  • The float() cast in timeout = httpx.Timeout(float(settings.config.agentgenerator_timeout)) is redundant β€” httpx.Timeout already accepts int. Remove float().
  • Docstrings on _create_error_response, _classify_http_error, and _classify_request_error were shortened to one-liners. The removed Args / Returns sections on _classify_* functions (which have non-obvious tuple return types) should be preserved for discoverability.
  • decompose_goal_external, generate_agent_external, and generate_agent_patch_external still declare -> dict[str, Any] | None but the new implementation never returns None. Update return types to -> dict[str, Any] to avoid misleading callers.

Summary of required changes before merge:

  1. Fix infinite reconnect loop (reset counter on "ready" only, not "streaming")
  2. Guard .json() calls against JSONDecodeError in _submit_and_poll
  3. Audit and confirm operation_id/session_id removal doesn't break any callers
  4. Expose MAX_POLL_TIME_SECONDS via settings; update agentgenerator_timeout description
  5. Add TestCustomizeTemplateExternal test class
  6. Handle asyncio.CancelledError with best-effort job cleanup

@github-project-automation github-project-automation bot moved this from 🆕 Needs initial review to 🚧 Needs work in AutoGPT development kanban Mar 4, 2026
@majdyz majdyz closed this Mar 5, 2026
@github-project-automation github-project-automation bot moved this from 🚧 Needs work to ✅ Done in AutoGPT development kanban Mar 5, 2026
@github-project-automation github-project-automation bot moved this to Done in Frontend Mar 5, 2026