
feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect#12199

Closed
majdyz wants to merge 5 commits into dev from feat/async-polling-agent-generator

Conversation


@majdyz majdyz commented Feb 24, 2026

Summary

  • Platform service client now uses submit-and-poll pattern for all agent-generator calls (10s poll interval, 30min timeout via time.monotonic())
  • asyncio.sleep() in poll loop yields to event loop, keeping SSE heartbeats alive through GCP's L7 load balancer
  • Transient HTTP errors (429, 503, 504, 408) and connection errors are retried (up to 5 consecutive), not fatal
  • agent_json responses validated before returning (prevents silent None propagation)
  • Frontend auto-reconnects up to 3 times when SSE drops mid-stream
  • agentgenerator_timeout setting wired into HTTP client (was hardcoded)

Why

GCP's L7 load balancer kills SSE connections at ~5 minutes. Agent generation takes 5–30 minutes, so the old synchronous HTTP call would block for the entire duration, causing the SSE connection to die. The new polling pattern keeps each request short (~30s timeout) while asyncio.sleep(10) yields control for heartbeat delivery.
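The submit-and-poll shape described here can be sketched roughly as follows (a minimal sketch with a stubbed transport; `submit`/`poll` are illustrative placeholders, not the PR's actual client API):

```python
import asyncio
import time

POLL_INTERVAL_SECONDS = 10
MAX_POLL_TIME_SECONDS = 30 * 60  # 30-minute overall budget

async def submit_and_poll(submit, poll, *, interval=POLL_INTERVAL_SECONDS,
                          budget=MAX_POLL_TIME_SECONDS):
    """Submit a job, then poll until it completes, fails, or the budget expires.

    `submit()` returns a job_id; `poll(job_id)` returns a dict with a
    "status" key ("running" / "completed" / "failed").
    """
    job_id = await submit()
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        result = await poll(job_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        # asyncio.sleep() yields to the event loop, so SSE heartbeats
        # scheduled on the same loop keep flowing during the wait.
        await asyncio.sleep(interval)
    raise TimeoutError(f"job {job_id} exceeded {budget}s polling budget")
```

In the real service each poll is a short HTTP GET with its own ~30s timeout, so no single request outlives the load balancer's idle window.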

Changes

  • service.py — _submit_and_poll() helper with retry logic for transient errors, agent_json validation, agentgenerator_timeout wired into client, time.monotonic() for wall-clock timeout
  • settings.py — agentgenerator_timeout default 1800 → 30 (now per-request, not total)
  • useCopilotPage.ts — auto-reconnect on mid-stream SSE drop (max 3 attempts, 1s delay)
  • test_service.py — 35 tests: polling happy path, failures, retry, timeout, transient HTTP retry, missing agent_json validation

Companion PR

Review comments addressed

  • autogpt-reviewer: Wire agentgenerator_timeout setting into HTTP client
  • autogpt-reviewer: Use time.monotonic() for accurate wall-clock timeout
  • autogpt-reviewer: Restore decompose_goal_external docstring with return-type enumeration
  • coderabbitai: Retry transient HTTP statuses (429/503/504/408) during polling
  • coderabbitai: Validate agent_json exists before returning from 3 endpoints

Test plan

  • 35/35 backend tests pass
  • CI green (23/23 checks)
  • Deploy both PRs to dev and test full agent generation flow
  • Test SSE drop recovery: wait for LB kill at 5min → verify frontend reconnects
  • Verify polling heartbeats keep SSE alive during long generation

…nd SSE reconnect

Platform service now submits jobs to agent-generator and polls for results
(10s interval) instead of blocking on a single HTTP call for up to 30 min.
asyncio.sleep in the poll loop yields to the event loop, keeping SSE
heartbeats alive through GCP's L7 load balancer.

Frontend auto-reconnects up to 3 times when SSE drops mid-stream.
@majdyz majdyz requested review from a team as code owners February 24, 2026 14:18
@majdyz majdyz requested review from Swiftyos and ntindle and removed request for a team February 24, 2026 14:18
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Feb 24, 2026

greptile-apps bot commented Feb 24, 2026

Too many files changed for review. (484 files found, 100 file limit)

@github-actions

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot changed the base branch from master to dev February 24, 2026 14:18
@github-actions github-actions bot added platform/backend AutoGPT Platform - Back end platform/blocks labels Feb 24, 2026

coderabbitai bot commented Feb 24, 2026

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title accurately and concisely summarizes the two main changes: async polling for agent-generator in the backend and SSE auto-reconnect for the frontend.
  • Description check — ✅ Passed: The description provides detailed context on the changes, motivation (GCP L7 load balancer timeout), technical implementation details, and test coverage. It directly relates to the changeset.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 95.74%, which is sufficient. The required threshold is 80.00%.



github-actions bot commented Feb 24, 2026

πŸ” PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections.

Summary: 1 conflict(s), 0 medium risk, 4 low risk (out of 5 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.


@autogpt-reviewer autogpt-reviewer left a comment


PR #12199 — feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz (Zamil Majdy) | Files: service.py (+183/-239), settings.py (+2/-2), useCopilotPage.ts (+40/-1), test_service.py (+372/-203)

🎯 Verdict: APPROVE WITH CONDITIONS

What This PR Does

GCP's L7 load balancer kills SSE connections at ~5 minutes. Agent generation takes 5–30 minutes, so the old synchronous HTTP call blocked for the entire duration, causing SSE to die. This PR switches the backend to a submit-and-poll pattern (submit job → poll every 10s → up to 30min budget) so each HTTP request stays short and asyncio.sleep() yields control for SSE heartbeat delivery. The frontend adds auto-reconnect (3 attempts, 1s delay) when SSE drops mid-stream.

Specialist Findings

🛡️ Security ✅ — No vulnerabilities found. The job ID is server-side only (never exposed to users), SSE reconnect uses the same authenticated transport, and no new endpoints are added. _submit_and_poll is entirely backend→agent-generator with no user-controlled input in URL construction. Only informational: defensive job_id format validation would harden against a compromised upstream service.

🏗️ Architecture ✅ — Textbook refactoring. _submit_and_poll() is a clean single-responsibility abstraction that eliminated 4× duplicated HTTP/error-handling blocks (net -56 lines despite new functionality). The timeout model is now correct: 30s per request vs a 30-min polling budget. Error classification helpers are preserved and reused. Frontend reconnect is cleanly contained in the existing useEffect status watcher.
⚠️ The agentgenerator_timeout setting in settings.py:374-376 is now dead code — nothing reads it. Must wire it back into _REQUEST_TIMEOUT or remove it.

⚡ Performance ✅ — Polling overhead is negligible (0.1 req/s per job; 100 concurrent sessions = 10 req/s). No memory leaks — poll responses aren't accumulated. Connection pool adequate.
⚠️ service.py:185-187: elapsed += POLL_INTERVAL_SECONDS drifts from wall-clock time — it doesn't account for HTTP request duration (up to 30s on timeout). Under pathological conditions, the loop could run ~5.4× longer than the 30-min cap. Should use time.monotonic().
⚠️ No cap on consecutive transient poll errors — a sustained network partition retries for the entire 30-min budget.

🧪 Testing ⚠️ — Backend polling tests are solid: 8 new TestSubmitAndPoll tests cover the happy path, failed job, HTTP errors, connection errors, missing job_id, transient retry, 404 expired job, and timeout. Public function tests properly mock _submit_and_poll.
❌ Frontend SSE reconnect has zero test coverage — 40 lines of new stateful reconnect logic (the user-facing feature that solves the GCP LB problem) with no automated tests. Missing: reconnect trigger, counter cap, counter reset, cleanup on unmount.
⚠️ No test for consecutive transient failures, malformed poll JSON, or unexpected poll status values.

📖 Quality ✅ — Clean, well-organized code. Constants properly extracted (POLL_INTERVAL_SECONDS, MAX_POLL_TIME_SECONDS, MAX_RECONNECT_ATTEMPTS). Good section dividers. The frontend reconnect logic is readable with a proper state machine.
⚠️ The decompose_goal_external docstring lost its useful return-type enumeration during the refactor (service.py:225-228).
⚠️ setTimeout(() => resumeStream(), 1_000) — the inline magic number should be RECONNECT_DELAY_MS.

📦 Product ✅ — Solves a real, painful UX problem (SSE dying during long agent generation). Backend polling is invisible to users. Frontend auto-reconnect is well-guarded with attempt limits and proper cleanup. Deployment-independent — backend and frontend can be deployed separately.
⚠️ The "Connection lost" toast after 3 failures is a dead end — no retry button; it asks the user to manually refresh. The backend job may still be running.
⚠️ No user-visible progress indicator during 30-minute generation jobs — an indefinite spinner.

📬 Discussion ✅ — CI fully green (25/25 checks pass). No human reviews submitted yet. No unresolved review threads.
⚠️ The author's manual test checklist (5 items) is entirely unchecked — deployment validation pending.
⚠️ Merge conflict detected with PR #12071 (Otto-AGPT) — needs coordination.
⚠️ Both Greptile and CodeRabbit skipped review due to an inflated file count (299 files — the branch likely includes a dev merge).

🔎 QA ⚠️ — Backend changes verified: containers running pr-12199 code, _submit_and_poll present, health endpoint returns 200, clean startup logs. Storybook loads correctly.
⚠️ Frontend could not be live-tested — port 3000 was serving a different worktree (pr-12141), and the Supabase client was unavailable for auth. The SSE reconnect code was verified correct in the worktree source but not exercised in the browser.
(Screenshots: landing, storybook)

Conditions (must fix before merge)

  1. agentgenerator_timeout is dead code — settings.py:374-376: The setting exists (updated default + description) but nothing reads it. _REQUEST_TIMEOUT is hardcoded to httpx.Timeout(30.0) at service.py:82. Either wire _get_settings().config.agentgenerator_timeout into the timeout, or remove the dead setting. Dead config confuses operators.

  2. Use a monotonic clock for elapsed tracking — service.py:185-187: Replace elapsed += POLL_INTERVAL_SECONDS with a time.monotonic() delta. The current approach doesn't account for HTTP request duration and can overshoot the 30-min budget by a large margin under pathological conditions (every poll hitting the 30s timeout).
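The drift matters because `elapsed += POLL_INTERVAL_SECONDS` counts only the sleeps, while a `time.monotonic()` deadline also counts request time. A small, self-contained illustration with a simulated clock (the 30s request duration is the pathological case described above):

```python
POLL_INTERVAL_SECONDS = 10
BUDGET_SECONDS = 1800  # the 30-minute polling budget

def iterations_with_counter() -> int:
    """Counter-based tracking: only sleep time is counted, never request time."""
    elapsed, iterations = 0, 0
    while elapsed < BUDGET_SECONDS:
        iterations += 1
        elapsed += POLL_INTERVAL_SECONDS
    return iterations

def iterations_with_deadline(request_seconds: float) -> int:
    """Deadline-based tracking: a simulated monotonic clock counts request time too."""
    now, iterations = 0.0, 0
    deadline = now + BUDGET_SECONDS
    while now < deadline:
        iterations += 1
        now += request_seconds + POLL_INTERVAL_SECONDS
    return iterations

# With every poll hitting a 30s timeout, the counter version still runs 180
# iterations (~2 hours of wall time), while the deadline version stops at 45.
```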

Should Fix (Follow-up OK)

  1. service.py:192-195 — Add a consecutive transient error cap (e.g., bail after 5 consecutive RequestErrors) or exponential backoff. Currently retries indefinitely within the budget.
  2. useCopilotPage.ts:281 — Exponential backoff on reconnect (1s → 2s → 4s) instead of a fixed 1s. All 3 attempts fire within ~3s if the LB is actively killing connections.
  3. useCopilotPage.ts:~249 — Move MAX_RECONNECT_ATTEMPTS to module scope (standard React pattern).
  4. useCopilotPage.ts:278 — Extract RECONNECT_DELAY_MS = 1_000 as a named constant.
  5. service.py:225-228 — Restore the return-type enumeration in the decompose_goal_external docstring (possible types: instructions, clarifying_questions, unachievable_goal, vague_goal, error).
  6. Frontend SSE reconnect tests — Add at least unit tests for the reconnect state machine (trigger, counter cap, counter reset, cleanup).
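For the backoff suggestions above, an illustrative doubling schedule (the 2s start and 30s cap are the review's suggested figures, not merged code):

```python
def backoff_intervals(start: float = 2.0, cap: float = 30.0, total: float = 120.0):
    """Yield poll delays that double from `start` up to `cap`, until `total` elapses."""
    delay, elapsed = start, 0.0
    while elapsed < total:
        yield delay
        elapsed += delay
        delay = min(delay * 2, cap)

# For the first 2 minutes of a job:
# list(backoff_intervals(total=120.0)) -> [2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```

Fast jobs are detected within seconds, while long jobs settle at one request per 30s instead of one per 10s.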

Risk Assessment

Merge risk: LOW | Rollback: EASY (revert 4 files, no schema changes, no data migrations)

@ntindle Well-architected refactor that solves the GCP LB SSE timeout problem cleanly. Two conditions before merge: wire or remove the orphaned agentgenerator_timeout setting, and fix the elapsed time tracking to use monotonic clock. Everything else is follow-up material. The core mechanism is sound β€” net code reduction with better reliability.


@autogpt-reviewer autogpt-reviewer left a comment


PR #12199 — feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz (Zamil Majdy) | Files: service.py (+183/-239), settings.py (+2/-2), useCopilotPage.ts (+40/-1), test_service.py (+372/-203)

🎯 Verdict: APPROVE WITH CONDITIONS


What This PR Does

GCP's L7 load balancer kills SSE connections at ~5 minutes, but agent generation takes 5–30 minutes. This PR switches the platform backend from a single long-lived HTTP call to a submit-and-poll pattern (POST → 202 + job_id → GET poll every 10s, 30min budget). The frontend adds auto-reconnect (3 attempts, 1s delay) when SSE drops mid-stream. Together, these two layers solve the connection-drop problem at both backend and frontend.

Specialist Findings

πŸ›‘οΈ Security βœ… β€” No blockers. job_id flows entirely within backend-to-microservice boundary β€” no IDOR risk. SSE reconnect reuses existing authenticated session β€” no auth bypass. Minor nit: job_id is interpolated directly into URL path (service.py:186) without format validation; a UUID/alphanumeric check would harden against hypothetical path traversal from a compromised internal service.

πŸ—οΈ Architecture βœ… β€” Clean, well-isolated abstraction. _submit_and_poll() has single responsibility; all 4 external functions delegate uniformly. Massive DRY win β€” eliminated 4Γ— identical try/except blocks. Frontend and backend layers are independent (good separation). ⚠️ agentgenerator_timeout setting (settings.py:374) is now dead code β€” _REQUEST_TIMEOUT is hardcoded at service.py:82. Wire it back or delete the setting.

⚑ Performance βœ… β€” asyncio.sleep(10) correctly yields to event loop for heartbeat delivery. Singleton httpx.AsyncClient with connection pooling is efficient. No memory leaks in poll loop. ⚠️ Fixed 10s interval = up to 180 requests per generation; exponential backoff (2sβ†’30s cap) would reduce to ~50 while detecting fast jobs sooner. ⚠️ Elapsed time tracking (service.py:190) drifts from wall clock β€” uses counter instead of time.monotonic().

πŸ§ͺ Testing ⚠️ β€” Backend TestSubmitAndPoll (8 tests) is solid: happy path, failures, transient retry, timeout all covered. Public function tests properly mock at the _submit_and_poll boundary. However: ❌ Frontend SSE auto-reconnect (40 lines of stateful useEffect logic) has zero test coverage β€” this is the user-facing reliability fix. ❌ customize_template_external has no test class at all (missing TestCustomizeTemplateExternal). Missing edge cases: all-transient-errors-until-timeout, malformed poll JSON, empty result on completed job.

πŸ“– Quality βœ… β€” Genuinely cleaner refactor (net -56 lines). Good naming conventions (_submit_and_poll, POLL_INTERVAL_SECONDS). Well-organized section comments. ⚠️ Docstrings trimmed too aggressively β€” decompose_goal_external lost useful return-type enumeration (instructions/clarifying_questions/unachievable_goal/vague_goal/error). ⚠️ setTimeout(() => resumeStream(), 1_000) β€” extract as RECONNECT_DELAY_MS for consistency.

πŸ“¦ Product βœ… β€” Solves a real, painful user problem. Happy path is completely transparent to users. Reconnect is well-guarded with attempt limits and proper cleanup. Toast on exhaustion is clear. ⚠️ No progress indicator during long generation (25 min with zero feedback). ⚠️ Unclear if Stop button cancels the _submit_and_poll loop. ⚠️ Consider "Reconnecting..." indicator instead of brief error-state flash during 1s reconnect delay.

πŸ“¬ Discussion ⚠️ β€” Zero human reviews (PR opened ~3 hours ago). Both Greptile and CodeRabbit skipped due to inflated diff (299 files from dev merge β€” only 4 actual changes). πŸ”΄ Merge conflict with #12071 (stale/approved). Companion PR dependency: AutoGPT-Agent-Generator/pull/120 must deploy alongside for polling endpoints to exist. All 5 manual test plan checkboxes unchecked.

πŸ”Ž QA ⚠️ β€” Environment partially broken (backend/Supabase down, only Next.js frontend running). Login blocked, copilot page inaccessible. Code inspection confirms reconnect logic is well-implemented. Could not perform live SSE reconnect testing.


Conditions for Merge

  1. agentgenerator_timeout setting is dead code — settings.py:374-376 defines it, but service.py:82 hardcodes _REQUEST_TIMEOUT = httpx.Timeout(30.0) instead. Either wire settings.config.agentgenerator_timeout into the timeout, or delete the setting. Dead config misleads operators. (Flagged by: Security, Architect, Quality)

  2. Restore decompose_goal_external return-type documentation — The docstring lost the enumeration of possible return types (instructions, clarifying_questions, unachievable_goal, vague_goal, error). One-line abbreviated Args: sections for all 4 functions would also help callers. (Flagged by: Quality)

Should Fix (Follow-up OK)

  1. Frontend SSE reconnect needs tests — 40 lines of stateful useEffect logic with reconnectAttemptsRef, setTimeout cleanup, counter reset. At minimum: RTL test verifying reconnect triggers on streaming→error + hasActiveStream, counter exhaustion shows toast, counter resets on success. (Flagged by: Testing)
  2. Add TestCustomizeTemplateExternal — The test class is missing entirely, and the function has its own response-mapping logic. (Flagged by: Testing)
  3. Exponential backoff for polling — A fixed 10s interval means 180 requests worst case. Start at 2s, cap at 30s → ~50 requests and faster detection of quick jobs. (Flagged by: Performance, Architect)
  4. Use time.monotonic() for elapsed tracking — The current counter drifts by the HTTP request duration per iteration. (Flagged by: Performance, Quality)
  5. Extract RECONNECT_DELAY_MS = 1_000 — Inline magic number in the frontend. (Flagged by: Quality)
  6. Rebase to reduce diff noise — The 299-file diff from the dev merge blocks automated reviewers; only 4 files actually changed. (Flagged by: Discussion)

Risk Assessment

Merge risk: LOW | Rollback: EASY (revert 4 files, no migrations)

Deployment note: Requires companion PR (AutoGPT-Agent-Generator/pull/120) deployed alongside for polling endpoints. Without it, backend will get unexpected responses from agent-generator. Dummy mode still works independently.

@ntindle Two small conditions (dead setting + restore docstring), then this is good to go. Backend refactor is a clean DRY win, frontend reconnect is well-implemented. Main gap is frontend test coverage β€” not blocking merge but should be a fast follow-up.

@github-actions github-actions bot added platform/frontend AutoGPT Platform - Front end and removed platform/blocks labels Feb 25, 2026

majdyz commented Feb 25, 2026

Review Items Addressed (0bc098a)

Must Fix (both done):

  1. ✅ Wired the agentgenerator_timeout setting into the HTTP client — _get_client() now reads settings.config.agentgenerator_timeout dynamically instead of the hardcoded constant
  2. ✅ Replaced elapsed += POLL_INTERVAL_SECONDS with a time.monotonic() delta for accurate wall-clock timeout tracking

Should Fix (done in same commit):
3. ✅ Added MAX_CONSECUTIVE_POLL_ERRORS = 5 — the polling loop now gives up after 5 consecutive transient RequestError failures instead of retrying for the full 30-min budget
4. ✅ Restored the decompose_goal_external docstring with the full return-type enumeration

Tests updated: test_poll_timeout now mocks time.monotonic, new test_poll_gives_up_after_consecutive_transient_errors test added. All 32 tests pass.
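The MAX_CONSECUTIVE_POLL_ERRORS behavior described in item 3 can be sketched as a streak counter that resets on any successful poll (illustrative, not the PR's exact code):

```python
MAX_CONSECUTIVE_POLL_ERRORS = 5

def run_polls(outcomes):
    """Walk through poll outcomes ('ok' / 'error'); abort after 5 errors in a row."""
    consecutive = 0
    for outcome in outcomes:
        if outcome == "error":
            consecutive += 1
            if consecutive >= MAX_CONSECUTIVE_POLL_ERRORS:
                raise RuntimeError("giving up: too many consecutive transient failures")
        else:
            consecutive = 0  # any successful poll resets the streak
    return "completed"
```

Intermittent failures survive indefinitely within the 30-min budget; only an unbroken run of 5 failures aborts the job.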


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 320-325: The code currently returns result.get("agent_json")
directly which can be None if the job payload is malformed; update each
agent-producing endpoint that returns result.get("agent_json") (the three
occurrences using result.get("agent_json")) to validate that agent_json is
present and is the expected type (e.g., a dict) before returning it, and if
invalid or missing return an invalid_response-style error object (use the
existing invalid_response helper or a
{"type":"error","error":"invalid_response", ...} payload) so callers get a clear
diagnostic instead of None; apply this change to all three return sites that
currently return result.get("agent_json").
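What CodeRabbit is asking for amounts to a presence-and-type check before returning; a rough sketch (the error payload shape follows the comment's wording, not necessarily the repo's existing invalid_response helper):

```python
def extract_agent_json(result: dict):
    """Validate that a completed job actually carries a usable agent_json dict."""
    agent_json = result.get("agent_json")
    if not isinstance(agent_json, dict):
        # Malformed or missing payload: return a diagnostic error object
        # instead of silently propagating None to the caller.
        return {
            "type": "error",
            "error": "invalid_response",
            "detail": "agent-generator job completed without a valid agent_json",
        }
    return agent_json
```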
- Around line 189-197: The poll handler currently treats transient HTTP statuses
as fatal; after calling _classify_http_error(e) inside the httpx.HTTPStatusError
except block, detect transient cases (e.g., status codes 429, 503, 504, 408 or
error_type values that _classify_http_error returns for rate limiting/temporary
upstream errors) and do NOT return an immediate _create_error_response; instead
log the transient condition via logger.error and surface a retryable signal (for
example raise a new RetryablePollError or re-raise the original exception) so
the caller can retry the poll; only call and return _create_error_response for
non-transient error_type values. Ensure you reference and update the
httpx.HTTPStatusError except block, the use of _classify_http_error,
logger.error, and _create_error_response accordingly.


📥 Commits

Reviewing files that changed from the base of the PR and between 6e61cb1 and 0bc098a.

📒 Files selected for processing (4)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/backend/util/settings.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Seer Code Review
  • GitHub Check: types
  • GitHub Check: Check PR Status
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
πŸ”‡ Additional comments (5)
autogpt_platform/backend/backend/util/settings.py (1)

374-377: Per-request timeout config update looks consistent with the new polling model.

The new default/value description pair is clear and matches submit/poll request-level timeout semantics.

autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts (2)

261-266: Session cache invalidation on stream end is a good addition.

This avoids stale hydrated session data after stream termination and improves remount/navigation consistency.


268-297: Reconnect strategy is well-bounded and safely gated.

The streaming→error guard with hasActiveStream and capped retry attempts prevents runaway reconnect loops while still recovering common mid-stream SSE drops.

autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)

181-185: Good use of monotonic timing and non-blocking poll waits.

Using time.monotonic() with await asyncio.sleep(...) is the right pattern for stable elapsed-time checks in async loops.
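The pattern praised here can be sketched as follows. Names, signatures, and defaults are illustrative only and do not reproduce the actual `_submit_and_poll` implementation:

```python
import asyncio
import time

POLL_INTERVAL_SECONDS = 10
MAX_POLL_TIME_SECONDS = 1800


async def poll_until_done(poll_once, interval=POLL_INTERVAL_SECONDS,
                          budget=MAX_POLL_TIME_SECONDS):
    """Poll `poll_once` until a terminal status or until the budget elapses.

    time.monotonic() is immune to system clock adjustments, so the elapsed
    budget is tracked correctly even if the wall clock jumps, and
    asyncio.sleep() yields to the event loop between polls so other tasks
    (e.g. SSE heartbeats) keep running.
    """
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        data = await poll_once()
        status = data.get("status")
        if status == "completed":
            return data.get("result", {})
        if status == "failed":
            return {"error": data.get("error", "Job failed")}
        await asyncio.sleep(interval)  # yield control between polls
    return {"error": "timed_out"}
```

The key design choice is that each poll request stays short while the overall budget is enforced locally, so no single HTTP call outlives the load balancer's idle timeout.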

autogpt_platform/backend/test/agent_generator/test_service.py (1)

52-274: Great coverage on the new submit-and-poll core path.

These tests exercise success, timeout, transient network failures, 404 expiry, and capped retry behavior in a way that matches the new helper semantics.

@majdyz majdyz changed the title feat(copilot): async polling for agent-generator + SSE auto-reconnect feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect Feb 25, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 231-233: In _submit_and_poll, after detecting status ==
"completed" (and before returning poll_data.get("result", {})), validate that
poll_data.get("result") is a dict; if it is, return it, otherwise log an error
including job_id and the unexpected result type/value and return a structured
invalid response (e.g. {"invalid_response": True, "raw_result": <result>}) so
downstream callers that call .get(...) won't raise AttributeError.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 0bc098a and 9b3e25d.

πŸ“’ Files selected for processing (2)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Seer Code Review
  • GitHub Check: types
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
  • GitHub Check: Check PR Status
🧰 Additional context used
πŸ““ Path-based instructions (6)
autogpt_platform/backend/**/*.py

πŸ“„ CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*.{py,txt}

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use poetry run prefix for all Python commands, including testing, linting, formatting, and migrations

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/backend/**/*.py

πŸ“„ CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/**/*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/test/**/*.py

πŸ“„ CodeRabbit inference engine (.github/copilot-instructions.md)

Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Files:

  • autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*test*.py

πŸ“„ CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

  • autogpt_platform/backend/test/agent_generator/test_service.py
🧬 Code graph analysis (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
  • generate_agent_dummy (101-115)
  • customize_template_dummy (140-151)
πŸ”‡ Additional comments (2)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)

189-208: Transient poll retry behavior is a solid reliability improvement.

Handling retryable HTTP/request errors with a consecutive-error cap is a good balance between resilience and bounded wait time.

Also applies to: 212-225
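The retry gate described above can be reduced to a small predicate. The constants match the PR description (transient codes 429/503/504/408, cap of 5), but the function name and shape are illustrative, not the real service.py code:

```python
TRANSIENT_STATUS_CODES = {408, 429, 503, 504}
MAX_CONSECUTIVE_POLL_ERRORS = 5


def should_retry(status_code: int, consecutive_errors: int) -> bool:
    """Retry transient HTTP statuses until the consecutive-error cap is hit.

    Non-transient errors (e.g. 500) fail immediately. A successful poll is
    expected to reset the caller's `consecutive_errors` counter to zero, so
    only an unbroken run of failures exhausts the cap.
    """
    return (
        status_code in TRANSIENT_STATUS_CODES
        and consecutive_errors < MAX_CONSECUTIVE_POLL_ERRORS
    )
```

Keeping the cap on *consecutive* rather than total errors is what bounds the wait time while still tolerating brief load-balancer blips spread across a long job.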

autogpt_platform/backend/test/agent_generator/test_service.py (1)

52-336: Great expansion of polling-path test coverage.

The new cases for timeout, transient retries, and consecutive-error aborts materially reduce regression risk for the async polling flow.

Also applies to: 338-850


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (2)

344-350: Consider deduplicating agent_json extraction/validation.

The same validation block appears in three endpoints. A small helper would reduce drift and keep error responses consistent.

♻️ Suggested refactor
+def _extract_agent_json_or_error(result: dict[str, Any]) -> dict[str, Any]:
+    agent_json = result.get("agent_json")
+    if isinstance(agent_json, dict):
+        return agent_json
+    return _create_error_response(
+        "Agent Generator returned no agent_json in result", "invalid_response"
+    )
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)

Also applies to: 391-396, 441-446

πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 344 - 350, Create a small helper (e.g.,
_extract_agent_json(result)) in service.py that encapsulates the repeated logic:
get "agent_json" from the result, verify it's a dict, and return either the dict
or call _create_error_response("Agent Generator returned no agent_json in
result", "invalid_response"). Replace the three duplicated blocks (the
extraction/validation around agent_json at the locations using variable
agent_json) with calls to this helper so all endpoints use the same validation
and error response.

240-245: Handle unknown poll statuses explicitly instead of treating all as β€œstill running”.

Right now, any status other than "completed" / "failed" keeps polling until timeout. If upstream returns an unexpected terminal state, this delays failure diagnosis by up to 30 minutes.

♻️ Suggested hardening
         if status == "completed":
             logger.info(f"Agent Generator job {job_id} completed")
             result = poll_data.get("result", {})
             if not isinstance(result, dict):
                 return _create_error_response(
                     "Agent Generator returned invalid result payload",
                     "invalid_response",
                 )
             return result
         elif status == "failed":
             error_msg = poll_data.get("error", "Job failed")
             logger.error(f"Agent Generator job {job_id} failed: {error_msg}")
             return _create_error_response(error_msg, "job_failed")
-        # else: still running β€” keep polling
+        elif status in {"pending", "queued", "running"}:
+            continue
+        else:
+            logger.error(f"Unknown job status for {job_id}: {status!r}")
+            return _create_error_response(
+                f"Unknown job status: {status}",
+                "invalid_response",
+            )
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 240 - 245, The polling loop currently treats any
non-"completed"/"failed" status as "still running"; update the branch that
checks status (around the elif status == "failed" block handling job_id and
poll_data) to explicitly detect unexpected terminal statuses: if status is a
known running state (e.g., "pending","running","in_progress") continue polling,
otherwise log an error including job_id and the unexpected status/poll_data and
return _create_error_response with a distinct code (e.g., "unknown_status") so
callers fail fast instead of waiting the full timeout.
πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 344-350: Create a small helper (e.g., _extract_agent_json(result))
in service.py that encapsulates the repeated logic: get "agent_json" from the
result, verify it's a dict, and return either the dict or call
_create_error_response("Agent Generator returned no agent_json in result",
"invalid_response"). Replace the three duplicated blocks (the
extraction/validation around agent_json at the locations using variable
agent_json) with calls to this helper so all endpoints use the same validation
and error response.
- Around line 240-245: The polling loop currently treats any
non-"completed"/"failed" status as "still running"; update the branch that
checks status (around the elif status == "failed" block handling job_id and
poll_data) to explicitly detect unexpected terminal statuses: if status is a
known running state (e.g., "pending","running","in_progress") continue polling,
otherwise log an error including job_id and the unexpected status/poll_data and
return _create_error_response with a distinct code (e.g., "unknown_status") so
callers fail fast instead of waiting the full timeout.


πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 9b3e25d and 8ef8bec.

πŸ“’ Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: types
  • GitHub Check: Seer Code Review
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.12)
  • GitHub Check: Analyze (python)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: Check PR Status
πŸ”‡ Additional comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (3)

125-135: Nice improvement on timeout configuration.

Using settings.config.agentgenerator_timeout in _get_client() removes the hardcoded timeout and keeps request timeout behavior centrally configurable.


181-227: Polling loop resiliency is solid.

time.monotonic() budget tracking, asyncio.sleep() yielding, and capped consecutive transient error retries are well implemented for long-running jobs.


503-508: close_client() lifecycle handling looks correct.

Closing the singleton client and resetting _client to None is clean and safe for reinitialization.

@majdyz

majdyz commented Feb 25, 2026

Addressed both nitpicks from the latest coderabbit review in ec7c7eb:

  1. Code dedup β€” extracted _extract_agent_json(result) helper, used in all 3 agent-producing endpoints.
  2. Unknown poll status β€” now explicitly handles running/pending/queued as known running states; unknown statuses fail fast with invalid_response instead of polling for 30 minutes.

Comment on lines +281 to +282
const timer = setTimeout(() => resumeStream(), 1_000);
return () => clearTimeout(timer);

Bug: A race condition in the useEffect hook for auto-reconnection can prevent it from firing. If a dependency changes, the reconnect timer is cleared but not reset.
Severity: MEDIUM

Suggested Fix

To fix this, track the previous status in a ref that is updated on every render, and store the reconnect timer ID in a useRef so the cleanup only clears a timer that actually exists. That way the streaming-to-error transition is still detected when the effect re-runs, and a new timer can be scheduled instead of being silently dropped.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts#L281-L282

Potential issue: A race condition exists in the auto-reconnect logic within the
`useEffect` hook. When a stream's `status` transitions from "streaming" to "error", a
one-second timer is set to call `resumeStream`. However, if a dependency in the hook's
dependency array (e.g., the `resumeStream` function) is recreated within that one-second
window, the effect's cleanup function is triggered, clearing the timer. When the effect
re-runs, the `prev` status is now "error", causing the condition `prev === "streaming"`
to fail. This prevents the timer from being set again, and the auto-reconnection never
occurs.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (3)

227-229: Guard poll_resp.json() against DecodingError.

Same as the submit step: a 200 response with a non-JSON body causes httpx.DecodingError to escape the poll loop, producing an opaque error at the caller site.

♻️ Proposed fix
         consecutive_errors = 0
-        poll_data = poll_resp.json()
+        try:
+            poll_data = poll_resp.json()
+        except (httpx.DecodingError, ValueError) as e:
+            return _create_error_response(
+                f"Agent Generator returned non-JSON poll response for job {job_id}: {e}",
+                "invalid_response",
+            )
         status = poll_data.get("status")
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 227 - 229, Guard the call to poll_resp.json() against
httpx.DecodingError by wrapping it in a try/except: catch httpx.DecodingError
around poll_resp.json(), log the decoding error (include poll_resp.status_code
and poll_resp.text if available), increment the existing consecutive_errors
counter and continue the polling loop instead of letting the exception escape;
ensure poll_data is only used when parsing succeeded (i.e., move usage of
poll_data and status into the try block or after a successful parse).

159-176: Guard response.json() against DecodingError; consider retrying transient submit errors.

Two related concerns on the submit step:

  1. Unguarded response.json() (line 171): If the agent-generator returns a non-JSON body with a 202 (even transiently), httpx.DecodingError escapes _submit_and_poll unhandled. It is caught only by each caller's broad except Exception, producing an opaque "Unexpected error" message. Catching it here gives a more actionable diagnostic.

  2. No retry for transient submit errors: The poll loop retries 429/503/504/408 up to MAX_CONSECUTIVE_POLL_ERRORS times, but a single transient HTTP error during the submit POST fails the entire operation. If brief LB restarts during submission are a concern, a small retry budget here would be consistent.

♻️ Proposed fix for issue 1 (JSON decoding guard)
     data = response.json()
+    except (httpx.DecodingError, ValueError) as e:
+        return _create_error_response(
+            f"Agent Generator returned non-JSON submit response: {e}",
+            "invalid_response",
+        )
     job_id = data.get("job_id")

Or inline within the existing try block:

     try:
         response = await client.post(endpoint, json=payload)
         response.raise_for_status()
+        data = response.json()
     except httpx.HTTPStatusError as e:
         error_type, error_msg = _classify_http_error(e)
         logger.error(error_msg)
         return _create_error_response(error_msg, error_type)
     except httpx.RequestError as e:
         error_type, error_msg = _classify_request_error(e)
         logger.error(error_msg)
         return _create_error_response(error_msg, error_type)
+    except (httpx.DecodingError, ValueError) as e:
+        return _create_error_response(
+            f"Agent Generator returned non-JSON submit response: {e}",
+            "invalid_response",
+        )
-
-    data = response.json()
πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 159 - 176, The submit step in _submit_and_poll should guard against
httpx.DecodingError when calling response.json() and should retry transient
submit failures: wrap the client.post call in a small retry loop (e.g., 2–3
attempts with brief backoff) that treats transient status codes (429, 503, 504,
408) and RequestError as retryable (consistent with MAX_CONSECUTIVE_POLL_ERRORS
behavior), use _classify_http_error/_classify_request_error to classify/log
failures, and on final failure return _create_error_response with a clear
"invalid_response" or appropriate error_type; additionally catch
httpx.DecodingError after a successful response and return
_create_error_response with a diagnostic message instead of letting it
propagate.

34-36: Consider exposing poll interval/budget in Settings alongside agentgenerator_timeout.

agentgenerator_timeout (per-request HTTP timeout) is now a configurable setting, but POLL_INTERVAL_SECONDS (10 s) and MAX_POLL_TIME_SECONDS (1800 s) are hardcoded constants. For deployments with different job latency profiles these are operationally useful knobs. This is a low-priority improvement, but keeping the three values together in Settings would be consistent.
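Promoted to settings as suggested, the shape could look roughly like this. The field names are the reviewer's proposals, not actual settings.py code, and a plain dataclass stands in for the project's pydantic Settings model only to keep the sketch self-contained:

```python
from dataclasses import dataclass


@dataclass
class AgentGeneratorConfig:
    """Hypothetical grouping of the agent-generator knobs in one place.

    Defaults mirror the values currently hardcoded in service.py and
    settings.py, so behavior is unchanged unless a deployment overrides them.
    """
    agentgenerator_timeout: float = 30.0             # per-request HTTP timeout
    agentgenerator_poll_interval: float = 10.0       # seconds between polls
    agentgenerator_poll_timeout: float = 1800.0      # total polling budget
    agentgenerator_max_consecutive_poll_errors: int = 5
```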

πŸ€– Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`
around lines 34 - 36, The hardcoded polling constants POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS should be moved into the
app Settings alongside agentgenerator_timeout so they are configurable per
deployment; add new Settings fields (e.g., agentgenerator_poll_interval,
agentgenerator_poll_timeout, agentgenerator_max_consecutive_poll_errors) with
the current defaults (10.0, 1800.0, 5), update
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py to
remove the constants and read those values from Settings (use the same Settings
accessor used for agentgenerator_timeout) wherever POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS are referenced, and
ensure any validation/type casting (float/int) is applied and unit tests/usage
still behave the same by default.
πŸ€– Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 227-229: Guard the call to poll_resp.json() against
httpx.DecodingError by wrapping it in a try/except: catch httpx.DecodingError
around poll_resp.json(), log the decoding error (include poll_resp.status_code
and poll_resp.text if available), increment the existing consecutive_errors
counter and continue the polling loop instead of letting the exception escape;
ensure poll_data is only used when parsing succeeded (i.e., move usage of
poll_data and status into the try block or after a successful parse).
- Around line 159-176: The submit step in _submit_and_poll should guard against
httpx.DecodingError when calling response.json() and should retry transient
submit failures: wrap the client.post call in a small retry loop (e.g., 2–3
attempts with brief backoff) that treats transient status codes (429, 503, 504,
408) and RequestError as retryable (consistent with MAX_CONSECUTIVE_POLL_ERRORS
behavior), use _classify_http_error/_classify_request_error to classify/log
failures, and on final failure return _create_error_response with a clear
"invalid_response" or appropriate error_type; additionally catch
httpx.DecodingError after a successful response and return
_create_error_response with a diagnostic message instead of letting it
propagate.
- Around line 34-36: The hardcoded polling constants POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS should be moved into the
app Settings alongside agentgenerator_timeout so they are configurable per
deployment; add new Settings fields (e.g., agentgenerator_poll_interval,
agentgenerator_poll_timeout, agentgenerator_max_consecutive_poll_errors) with
the current defaults (10.0, 1800.0, 5), update
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py to
remove the constants and read those values from Settings (use the same Settings
accessor used for agentgenerator_timeout) wherever POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS are referenced, and
ensure any validation/type casting (float/int) is applied and unit tests/usage
still behave the same by default.


πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 8ef8bec and ec7c7eb.

πŸ“’ Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
πŸ“œ Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.11)
  • GitHub Check: Check PR Status
πŸ”‡ Additional comments (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)

143-252: _submit_and_poll polling logic is well-structured overall.

The poll loop correctly:

  • Uses time.monotonic() for accurate wall-clock budget tracking.
  • Resets consecutive_errors after any successful poll response (line 227).
  • Distinguishes 404 (non-retryable) from transient statuses (retryable) before computing status_code.
  • Fails fast on unknown job statuses (invalid_response) rather than burning the full poll budget.
  • Returns a structured error dict in every exit path.

All three items from previous review rounds (transient HTTP retry, agent_json validation, result dict type guard) are correctly addressed.


@autogpt-reviewer autogpt-reviewer left a comment


PR #12199 β€” Re-review #2: feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz | HEAD: ec7c7eb | Previous: d78e0ee (4 new commits)

🎯 Verdict: APPROVE


What Changed Since Last Review

Author addressed both conditions and most should-fix items from our first review in 4 focused commits:

Commit What
0bc098a Wire agentgenerator_timeout setting into client, use time.monotonic()
9b3e25d Retry transient HTTP errors (429/503/504/408), validate job_id, MAX_CONSECUTIVE_POLL_ERRORS=5
8ef8bec Validate completed job result type
ec7c7eb Extract _extract_agent_json helper, fail fast on unknown status

Condition Status

  1. βœ… agentgenerator_timeout dead code β€” FIXED. _get_client() now reads settings.config.agentgenerator_timeout. Hardcoded _REQUEST_TIMEOUT removed.
  2. βœ… Restore decompose_goal_external docstring β€” FIXED. Full return-type enumeration restored with all 5 variants.

Should-Fix Status

Item Status
time.monotonic() for elapsed tracking βœ… FIXED
Consecutive transient error cap βœ… FIXED (MAX_CONSECUTIVE_POLL_ERRORS=5)
Transient HTTP retry (429/503/504/408) βœ… FIXED (new, beyond what we asked)
Validate result type on completed job βœ… FIXED
Unknown job status β†’ error βœ… FIXED (new β€” explicit running/pending/queued allowlist)
_extract_agent_json DRY helper βœ… FIXED (replaces 3Γ— result.get("agent_json"))
5 new tests for above βœ… FIXED
Frontend reconnect tests ❌ Not addressed (follow-up)
TestCustomizeTemplateExternal ❌ Not addressed (follow-up)
RECONNECT_DELAY_MS constant ❌ Not addressed (minor)

New Code Quality Assessment

The new code is well-structured:

  • Transient HTTP retry correctly differentiates 429/503/504/408 (retry with counter) from 500/other (fail immediately). Good distinction.
  • consecutive_errors counter resets to 0 on successful poll β€” only consecutive failures trigger give-up. Correct behavior.
  • _extract_agent_json eliminates 3Γ— identical .get("agent_json") calls with proper validation. Returns typed error on missing/invalid.
  • Explicit status allowlist (running/pending/queued) with fallthrough error for unknown statuses β€” defensive and correct.
  • Test coverage for all new paths: transient HTTP retry + recovery, non-transient immediate fail, consecutive error give-up, monotonic time mocking, missing agent_json validation.

Remaining (Non-blocking Follow-ups)

  1. Frontend SSE reconnect tests — 40 lines of stateful useEffect still untested. Not blocking because the code was reviewed by 3 specialists and the logic is straightforward, but they should be added.
  2. TestCustomizeTemplateExternal — still missing. Low risk (mirrors generate_agent_patch_external logic).
  3. RECONNECT_DELAY_MS constant — minor style nit.

Risk Assessment

Merge risk: LOW | Rollback: EASY (4 files, no migrations)
CI: All tests pass (3.11/3.12/3.13), lint ✅, types ✅, e2e ✅. The Check PR Status failure is a race condition (it ran before the other checks finished).

⚠️ Deployment: Still requires companion PR (AutoGPT-Agent-Generator/pull/120) for polling endpoints.

@ntindle Clean iteration — author addressed all conditions and most should-fix items with well-tested code. Ready to merge.

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Feb 26, 2026
@github-actions

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@ntindle

ntindle commented Mar 4, 2026

is this still needed?


@ntindle ntindle left a comment


Code Review — feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect

The submit-and-poll approach is the right architectural decision for surviving GCP L7 LB kill-offs, and the overall structure is clean. However, there are several issues ranging from a behavioral loop bug to missing safety nets that need to be addressed before merge.


🔴 Blocking

1. Infinite reconnect loop in useCopilotPage.ts (line ~1603)

The reconnect counter is reset every time status === "streaming" — which happens immediately after each successful resumeStream(). This means the 3-attempt guard is per-error-burst, not per-session, and can loop indefinitely:

streaming → error    (attempt 1, counter = 1)
resumeStream() called
streaming            (counter RESETS to 0 ← bug)
streaming → error    (attempt 1 again, counter = 1)
... loops forever if the LB keeps killing connections every ~5 min

Fix: Only reset the counter on status === "ready" (stream fully and successfully complete), not on status === "streaming":

// Reset reconnect counter ONLY when stream completes normally
if (status === "ready") {
  reconnectAttemptsRef.current = 0;
}

2. response.json() / poll_resp.json() not guarded in _submit_and_poll (service.py, lines ~142 & ~199)

If the agent-generator (or an intervening proxy) returns a non-JSON body (HTML error page, plain-text gateway timeout, etc.) both data = response.json() and poll_data = poll_resp.json() will throw a json.JSONDecodeError. This propagates out of _submit_and_poll unhandled, gets caught by the bare except Exception in callers, and surfaces as a cryptic "unexpected_error".

Fix: Wrap both .json() calls:

try:
    data = response.json()
except Exception as exc:
    return _create_error_response(
        f"Agent Generator returned non-JSON submit response: {exc}",
        "invalid_response",
    )

Do the same for poll_data = poll_resp.json() inside the polling loop.

3. Silent removal of operation_id / session_id parameters is a breaking API change (service.py)

generate_agent_patch_external and customize_template_external previously accepted operation_id: str = "" and session_id: str = "" keyword arguments that enabled Redis Streams async-callback mode. These are gone without a deprecation shim.

Any caller that passes those keyword arguments will receive an immediate TypeError at runtime. The companion PR description notes that the {"status": "accepted"} response type is also removed, but there is no evidence that the internal callers of these functions (copilot router, etc.) have been audited.

Required: Show the call sites of both functions and confirm no caller passes operation_id or session_id. If they do, add **_ignored_kwargs or raise a DeprecationWarning with a grace-period PR.

4. agentgenerator_timeout semantic change silently breaks operator overrides (settings.py, line ~656)

Changing the default from 1800 → 30 is fine for new deployments, but:

  • Anyone who has overridden agentgenerator_timeout=60 (for example) expecting 60 seconds total time will now have a 60-second per-request HTTP timeout and still poll for up to 30 minutes.
  • The hardcoded MAX_POLL_TIME_SECONDS = 1800.0 in service.py cannot be overridden via settings at all.

Fix: Expose MAX_POLL_TIME_SECONDS as a settings field (e.g., agentgenerator_poll_timeout: int = Field(default=1800, ...)). Update the agentgenerator_timeout description to say "per-request HTTP timeout" explicitly and note the default change.
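The suggested settings split could look like the following pydantic sketch. Only `agentgenerator_timeout` exists today; `agentgenerator_poll_timeout` is the reviewer's proposal, and the surrounding class name is hypothetical.

```python
from pydantic import BaseModel, Field

class AgentGeneratorSettings(BaseModel):
    # Per-request HTTP timeout for each submit/poll call. Note the semantic
    # change: before this PR the same field meant total generation time.
    agentgenerator_timeout: int = Field(
        default=30,
        description="Per-request HTTP timeout (seconds) for agent-generator calls",
    )
    # Total wall-clock budget for the polling loop, previously hardcoded as
    # MAX_POLL_TIME_SECONDS = 1800.0 in service.py.
    agentgenerator_poll_timeout: int = Field(
        default=1800,
        description="Total polling timeout (seconds) for agent-generator jobs",
    )
```

Splitting the two budgets makes an operator override of one independent of the other, which is exactly the confusion the 1800 → 30 default change introduced.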


🟠 High Severity

5. MAX_RECONNECT_ATTEMPTS defined inside the hook (useCopilotPage.ts, line ~1559)

const MAX_RECONNECT_ATTEMPTS = 3;   // recreated on every render

Move to module scope. This is also flagged by most lint/ESLint no-magic-numbers rules.

6. resumeStream in useEffect deps may not be stable (useCopilotPage.ts, line ~1606)

resumeStream is listed in the useEffect dependency array. If resumeStream is not wrapped in useCallback with a stable dependency set, it will be a new function reference on every render, causing the effect to fire on every render β€” potentially triggering reconnect logic unexpectedly. Please confirm resumeStream has a stable reference, or wrap the call inside the effect with a ref to avoid the dependency.

7. No tests for customize_template_external polling behavior (test_service.py)

customize_template_external was migrated to _submit_and_poll but there is no corresponding test class in the diff. It is the only migrated endpoint without coverage for:

  • Happy-path agent_json extraction
  • clarifying_questions passthrough
  • Error propagation from _submit_and_poll
  • _extract_agent_json validation failure

Add a TestCustomizeTemplateExternal class mirroring TestGenerateAgentPatchExternal.

8. No cancellation of in-flight polling job on caller abort (service.py, _submit_and_poll)

If the FastAPI request handler is cancelled (client closes connection, timeout), asyncio.CancelledError will propagate correctly out of asyncio.sleep, but the job submitted to the agent-generator will continue running on the remote side until completion. There is no cleanup path.

Fix: Catch asyncio.CancelledError, issue a best-effort DELETE /api/jobs/{job_id}, then re-raise:

except asyncio.CancelledError:
    logger.warning(f"Polling cancelled for job {job_id}, attempting cleanup")
    with contextlib.suppress(Exception):
        await client.delete(f"/api/jobs/{job_id}")
    raise

🟡 Medium / Suggestions

9. asyncio.sleep patched globally in tests (test_service.py)

patch("asyncio.sleep", new_callable=AsyncMock) patches the global symbol, not the one imported by the module under test. Prefer:

patch("backend.copilot.tools.agent_generator.service.asyncio.sleep", new_callable=AsyncMock)

This is more precise and won't accidentally suppress sleeps in other coroutines sharing the event loop.
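One caveat worth knowing: the module-qualified target only behaves differently from `patch("asyncio.sleep")` when the module under test binds the symbol into its own namespace (`from asyncio import sleep`); otherwise both forms patch the same shared `asyncio` module object. The stdlib-only toy below models the `from asyncio import sleep` case, where the patch-target location genuinely matters; the module name and functions are invented for illustration.

```python
import asyncio
import sys
import types
from unittest.mock import AsyncMock, patch

# Toy stand-in for the service module under test. Assigning asyncio.sleep
# onto the module models `from asyncio import sleep` in real code.
service = types.ModuleType("toy_service")
service.sleep = asyncio.sleep

async def poll_loop() -> str:
    # The name is looked up on the toy module at call time, so patching
    # "toy_service.sleep" intercepts it without touching asyncio.sleep.
    await service.sleep(0.5)
    return "done"

service.poll_loop = poll_loop
sys.modules["toy_service"] = service  # make the module importable by patch()

# Patch the symbol where it is looked up, not the global asyncio.sleep.
with patch("toy_service.sleep", new_callable=AsyncMock) as fake_sleep:
    result = asyncio.run(service.poll_loop())

fake_sleep.assert_awaited_once_with(0.5)
```

Because the mock replaces only the module-local binding, unrelated coroutines that call `asyncio.sleep` directly keep their real delays.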

10. _extract_agent_json return type is ambiguous (service.py, lines ~226-236)

_extract_agent_json returns either valid agent JSON or an error-response dict, both typed as dict[str, Any]. Callers check result.get("type") == "error" before calling it, so errors from validation inside _extract_agent_json silently bubble back as undetected error dicts.

Consider returning None on validation failure (and generating the error response in the caller) so type checkers can distinguish the success case from the error case.

11. _submit_and_poll tests reach into private API (test_service.py, TestSubmitAndPoll)

await service._submit_and_poll(...) tests the private helper directly. These tests are fragile — renaming or splitting the function breaks the entire class. Most of the scenarios (transient retry, 404, timeout) are already implicitly exercised by the public-API tests. Consider whether direct private-function tests are necessary or if they can be folded into the public-API test classes.

12. _client singleton ignores runtime agentgenerator_timeout changes (service.py, _get_client())

This was true before this PR but becomes more operationally important now that agentgenerator_timeout controls per-request HTTP timeouts in a polling loop. A comment documenting that the client must be reset (set service._client = None) after changing the setting would save future debugging time.

13. hasActiveStream timing in useEffect (useCopilotPage.ts)

hasActiveStream is read inside useEffect, which runs after the render commit. If status → "error" and hasActiveStream → false are batched into the same React update, the effect may see hasActiveStream === false and skip the reconnect even though the backend job is still running. Document the ordering assumption or use a useRef for hasActiveStream to guarantee synchronous reads.


📝 Nits

  • The float() cast in timeout = httpx.Timeout(float(settings.config.agentgenerator_timeout)) is redundant β€” httpx.Timeout already accepts int. Remove float().
  • Docstrings on _create_error_response, _classify_http_error, and _classify_request_error were shortened to one-liners. The removed Args / Returns sections on _classify_* functions (which have non-obvious tuple return types) should be preserved for discoverability.
  • decompose_goal_external, generate_agent_external, and generate_agent_patch_external still declare -> dict[str, Any] | None but the new implementation never returns None. Update return types to -> dict[str, Any] to avoid misleading callers.

Summary of required changes before merge:

  1. Fix infinite reconnect loop (reset counter on "ready" only, not "streaming")
  2. Guard .json() calls against JSONDecodeError in _submit_and_poll
  3. Audit and confirm operation_id/session_id removal doesn't break any callers
  4. Expose MAX_POLL_TIME_SECONDS via settings; update agentgenerator_timeout description
  5. Add TestCustomizeTemplateExternal test class
  6. Handle asyncio.CancelledError with best-effort job cleanup

@github-project-automation github-project-automation bot moved this from 🆕 Needs initial review to 🚧 Needs work in AutoGPT development kanban Mar 4, 2026
@majdyz majdyz closed this Mar 5, 2026
@github-project-automation github-project-automation bot moved this from 🚧 Needs work to ✅ Done in AutoGPT development kanban Mar 5, 2026
@github-project-automation github-project-automation bot moved this to Done in Frontend Mar 5, 2026