feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect #12199
Conversation
…nd SSE reconnect
Platform service now submits jobs to agent-generator and polls for results (10s interval) instead of blocking on a single HTTP call for up to 30 min. `asyncio.sleep` in the poll loop yields to the event loop, keeping SSE heartbeats alive through GCP's L7 load balancer. Frontend auto-reconnects up to 3 times when SSE drops mid-stream.
Too many files changed for review.
Pre-merge checks | ✅ Passed checks (3 passed)
Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
PR Overlap Detection: This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.
🔴 Merge Conflicts Detected: The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.
🟢 Low Risk (File Overlap Only): These PRs touch the same files but different sections.
Summary: 1 conflict(s), 0 medium risk, 4 low risk (out of 5 PRs with file overlap). Auto-generated on push.
autogpt-reviewer left a comment
PR #12199: feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz (Zamil Majdy) | Files: service.py (+183/-239), settings.py (+2/-2), useCopilotPage.ts (+40/-1), test_service.py (+372/-203)
🎯 Verdict: APPROVE WITH CONDITIONS
What This PR Does
GCP's L7 load balancer kills SSE connections at ~5 minutes. Agent generation takes 5–30 minutes, so the old synchronous HTTP call blocked for the entire duration, causing SSE to die. This PR switches the backend to a submit-and-poll pattern (submit job → poll every 10s → up to a 30-min budget) so each HTTP request stays short and `asyncio.sleep()` yields control for SSE heartbeat delivery. The frontend adds auto-reconnect (3 attempts, 1s delay) when SSE drops mid-stream.
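The submit-and-poll flow described above can be sketched as follows. `submit_job` and `poll_job` are hypothetical stand-ins for the PR's actual HTTP calls, not its real identifiers; the constants mirror the review's 10s interval and 30-min budget.

```python
import asyncio
import time

POLL_INTERVAL_SECONDS = 10
MAX_POLL_TIME_SECONDS = 30 * 60  # 30-minute polling budget


async def submit_and_poll(submit_job, poll_job):
    """Submit a long-running job, then poll for its result with short requests.

    `submit_job` and `poll_job` are illustrative async callables standing in
    for the real HTTP calls. Each individual request stays well under the
    load balancer's idle timeout, and `asyncio.sleep` yields to the event
    loop so SSE heartbeats keep flowing while the job runs.
    """
    job_id = await submit_job()
    deadline = time.monotonic() + MAX_POLL_TIME_SECONDS
    while time.monotonic() < deadline:
        status, result = await poll_job(job_id)
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        await asyncio.sleep(POLL_INTERVAL_SECONDS)  # yields to the event loop
    raise TimeoutError(f"job {job_id} exceeded polling budget")
```

The key property is that no single await blocks for longer than one poll cycle, so heartbeat writes interleave freely with the polling.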
Specialist Findings
🛡️ Security ✅
✅ No vulnerabilities found. Job ID is server-side only (never exposed to users), SSE reconnect uses the same authenticated transport, and no new endpoints were added. `_submit_and_poll` is entirely backend→agent-generator with no user-controlled input in URL construction. Only informational: defensive `job_id` format validation would harden against a compromised upstream service.
🏗️ Architecture ✅
✅ Textbook refactoring. `_submit_and_poll()` is a clean single-responsibility abstraction that eliminated 4× duplicated HTTP/error-handling blocks (net -56 lines despite new functionality). The timeout model is now correct: 30s per request vs. a 30-min polling budget. Error-classification helpers are preserved and reused. Frontend reconnect is cleanly contained in the existing `useEffect` status watcher.
⚠️ The `agentgenerator_timeout` setting in settings.py:374-376 is now dead code: nothing reads it. Must wire it back into `_REQUEST_TIMEOUT` or remove it.
⚡ Performance ✅
✅ Polling overhead is negligible (0.1 req/s per job; 100 concurrent sessions = 10 req/s). No memory leaks: poll responses aren't accumulated. Connection pool adequate.
⚠️ service.py:185-187: `elapsed += POLL_INTERVAL_SECONDS` drifts from wall-clock time because it doesn't account for HTTP request duration (up to 30s on timeout). Under pathological conditions, the loop could run ~5.4× longer than the 30-min cap. Should use `time.monotonic()`.
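The drift the reviewer describes is easy to quantify with a toy model. All numbers here are illustrative; the ~5.4× figure above is the review's own estimate under its assumptions, while this simplified model (every poll spending the full 30s request timeout) gives 4×.

```python
def drift_demo(poll_interval=10.0, request_seconds=30.0, budget=1800.0):
    """Show why a counter like `elapsed += POLL_INTERVAL_SECONDS` overshoots.

    The counter only adds the sleep interval, so time spent inside the HTTP
    request itself (up to the per-request timeout) is never counted against
    the polling budget.
    """
    counted = 0.0  # what the buggy counter believes has elapsed
    real = 0.0     # what a wall clock would actually measure
    while counted < budget:
        real += request_seconds + poll_interval
        counted += poll_interval
    return real / budget  # overshoot factor vs. the intended budget
```

A `time.monotonic()` deadline (`deadline = time.monotonic() + budget`, checked on every iteration) avoids the drift entirely because it measures real elapsed time regardless of how long each request takes.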
🧪 Testing
`TestSubmitAndPoll` tests cover the happy path, failed job, HTTP errors, connection errors, missing job_id, transient retry, 404 expired job, and timeout. Public function tests properly mock `_submit_and_poll`.
⚠️ Frontend SSE reconnect has zero test coverage: 40 lines of new stateful reconnect logic (the user-facing feature that solves the GCP LB problem) with no automated tests. Missing: reconnect trigger, counter cap, counter reset, and cleanup on unmount.
Quality ✅
✅ Clean, well-organized code. Constants properly extracted (`POLL_INTERVAL_SECONDS`, `MAX_POLL_TIME_SECONDS`, `MAX_RECONNECT_ATTEMPTS`). Good section dividers. Frontend reconnect logic is readable with a proper state machine.
⚠️ The `decompose_goal_external` docstring lost its useful return-type enumeration during the refactor (service.py:225-228).
⚠️ `setTimeout(() => resumeStream(), 1_000)`: the inline magic number should be `RECONNECT_DELAY_MS`.
📦 Product ✅
✅ Solves a real, painful UX problem (SSE dying during long agent generation). Backend polling is invisible to users. Frontend auto-reconnect is well-guarded with attempt limits and proper cleanup. Deployment-independent: backend and frontend can be deployed separately.
💬 Discussion ✅
✅ CI fully green (25/25 checks pass). No human reviews submitted yet. No unresolved review threads.
QA: `_submit_and_poll` present, health endpoint returns 200, clean startup logs. Storybook loads correctly.

Conditions (must fix before merge)
- `agentgenerator_timeout` is dead code (settings.py:374-376): The setting exists (updated default + description) but nothing reads it. `_REQUEST_TIMEOUT` is hardcoded to `httpx.Timeout(30.0)` at service.py:82. Either wire `_get_settings().config.agentgenerator_timeout` into the timeout, or remove the dead setting. Dead config confuses operators.
- Use a monotonic clock for elapsed tracking (service.py:185-187): Replace `elapsed += POLL_INTERVAL_SECONDS` with a `time.monotonic()` delta. The current approach doesn't account for HTTP request duration and can overshoot the 30-min budget by a large margin under pathological conditions (every poll hitting the 30s timeout).
Should Fix (Follow-up OK)
- service.py:192-195: Add a consecutive transient-error cap (e.g., bail after 5 consecutive `RequestError`s) or exponential backoff. Currently retries indefinitely within the budget.
- useCopilotPage.ts:281: Exponential backoff on reconnect (1s → 2s → 4s) instead of a fixed 1s. All 3 attempts fire within ~3s if the LB is actively killing connections.
- useCopilotPage.ts:~249: Move `MAX_RECONNECT_ATTEMPTS` to module scope (standard React pattern).
- useCopilotPage.ts:278: Extract `RECONNECT_DELAY_MS = 1_000` as a named constant.
- service.py:225-228: Restore the return-type enumeration in the `decompose_goal_external` docstring (possible types: `instructions`, `clarifying_questions`, `unachievable_goal`, `vague_goal`, `error`).
- Frontend SSE reconnect tests: Add at least unit tests for the reconnect state machine (trigger, counter cap, counter reset, cleanup).
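The first item, a cap on consecutive transient errors, could look like this minimal sketch. `poll_once` and `TransientError` are illustrative names, not identifiers from the PR, and the interval is shortened for demonstration.

```python
import asyncio

MAX_CONSECUTIVE_TRANSIENT_ERRORS = 5
POLL_INTERVAL_SECONDS = 0.01  # shortened for illustration


class TransientError(Exception):
    """Stand-in for retryable failures (timeouts, 429/503 responses)."""


async def poll_with_error_cap(poll_once):
    """Poll until `poll_once` returns a result, tolerating a bounded streak
    of transient errors. Any successful poll resets the error counter."""
    consecutive_errors = 0
    while True:
        try:
            result = await poll_once()
        except TransientError:
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_TRANSIENT_ERRORS:
                raise RuntimeError("too many consecutive transient errors")
            await asyncio.sleep(POLL_INTERVAL_SECONDS)
            continue
        consecutive_errors = 0  # success resets the streak
        if result is not None:
            return result
        await asyncio.sleep(POLL_INTERVAL_SECONDS)
```

Resetting the counter on success is the important detail: it distinguishes a dead upstream (abort fast) from an upstream that is merely flaky (keep going within the budget).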
Risk Assessment
Merge risk: LOW | Rollback: EASY (revert 4 files, no schema changes, no data migrations)
@ntindle Well-architected refactor that solves the GCP LB SSE timeout problem cleanly. Two conditions before merge: wire or remove the orphaned `agentgenerator_timeout` setting, and fix the elapsed-time tracking to use a monotonic clock. Everything else is follow-up material. The core mechanism is sound: net code reduction with better reliability.
autogpt-reviewer left a comment
PR #12199: feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz (Zamil Majdy) | Files: service.py (+183/-239), settings.py (+2/-2), useCopilotPage.ts (+40/-1), test_service.py (+372/-203)
🎯 Verdict: APPROVE WITH CONDITIONS
What This PR Does
GCP's L7 load balancer kills SSE connections at ~5 minutes, but agent generation takes 5–30 minutes. This PR switches the platform backend from a single long-lived HTTP call to a submit-and-poll pattern (POST → 202 + job_id → GET poll every 10s, 30-min budget). The frontend adds auto-reconnect (3 attempts, 1s delay) when SSE drops mid-stream. Together, these two layers solve the connection-drop problem at both the backend and the frontend.
Specialist Findings
🛡️ Security ✅
✅ No blockers. `job_id` flows entirely within the backend-to-microservice boundary (no IDOR risk). SSE reconnect reuses the existing authenticated session (no auth bypass). Minor nit: `job_id` is interpolated directly into the URL path (service.py:186) without format validation; a UUID/alphanumeric check would harden against hypothetical path traversal from a compromised internal service.
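The suggested defensive check might look like the sketch below. Both helpers are hypothetical (the PR currently performs no such validation); they simply reject any value that could alter the URL path when interpolated.

```python
import re
import uuid

# Allow short tokens of URL-safe characters only; anything containing
# '/', '..', whitespace, etc. fails the pattern.
_JOB_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def is_safe_job_id(job_id: str) -> bool:
    """Loose check: plain alphanumeric/dash/underscore tokens only."""
    return bool(_JOB_ID_RE.fullmatch(job_id))


def validate_uuid_job_id(job_id: str) -> bool:
    """Stricter variant: require a well-formed UUID."""
    try:
        return str(uuid.UUID(job_id)) == job_id.lower()
    except ValueError:
        return False
```

Either check, applied before building the poll URL, turns a compromised upstream's malicious job ID into an immediate error instead of a crafted request path.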
🏗️ Architecture ✅
✅ Clean, well-isolated abstraction. `_submit_and_poll()` has a single responsibility; all 4 external functions delegate uniformly. Massive DRY win: eliminated 4× identical try/except blocks. Frontend and backend layers are independent (good separation). ⚠️ The `agentgenerator_timeout` setting (settings.py:374) is now dead code; `_REQUEST_TIMEOUT` is hardcoded at service.py:82. Wire it back or delete the setting.
⚡ Performance ✅
✅ `asyncio.sleep(10)` correctly yields to the event loop for heartbeat delivery. The singleton `httpx.AsyncClient` with connection pooling is efficient. No memory leaks in the poll loop. ⚠️ Elapsed tracking (service.py:190) drifts from wall clock: it uses a counter instead of `time.monotonic()`.
🧪 Testing
✅ `TestSubmitAndPoll` (8 tests) is solid: happy path, failures, transient retry, and timeout are all covered. Public function tests properly mock at the `_submit_and_poll` boundary. However: ⚠️ Frontend SSE auto-reconnect (40 lines of stateful `useEffect` logic) has zero test coverage, and this is the user-facing reliability fix. ⚠️ `customize_template_external` has no test class at all (missing `TestCustomizeTemplateExternal`). Missing edge cases: all-transient-errors-until-timeout, malformed poll JSON, empty result on a completed job.
Quality ✅
✅ Genuinely cleaner refactor (net -56 lines). Good naming conventions (`_submit_and_poll`, `POLL_INTERVAL_SECONDS`). Well-organized section comments. ⚠️ `decompose_goal_external` lost its useful return-type enumeration (instructions/clarifying_questions/unachievable_goal/vague_goal/error). ⚠️ `setTimeout(() => resumeStream(), 1_000)`: extract as `RECONNECT_DELAY_MS` for consistency.
📦 Product ✅
✅ Solves a real, painful user problem. The happy path is completely transparent to users. Reconnect is well-guarded with attempt limits and proper cleanup. Toast on exhaustion is clear.
💬 Discussion
QA
Conditions for Merge
- `agentgenerator_timeout` setting is dead code: settings.py:374-376 defines it, but service.py:82 hardcodes `_REQUEST_TIMEOUT = httpx.Timeout(30.0)` instead. Either wire `settings.config.agentgenerator_timeout` into the timeout, or delete the setting. Dead config misleads operators. (Flagged by: Security, Architect, Quality)
- Restore `decompose_goal_external` return-type documentation: The docstring lost the enumeration of possible return types (`instructions`, `clarifying_questions`, `unachievable_goal`, `vague_goal`, `error`). One-line abbreviated `Args:` sections for all 4 functions would also help callers. (Flagged by: Quality)
Should Fix (Follow-up OK)
- Frontend SSE reconnect needs tests: 40 lines of stateful `useEffect` logic with `reconnectAttemptsRef`, `setTimeout` cleanup, and counter reset. At minimum: an RTL test verifying reconnect triggers on streaming→error + `hasActiveStream`, counter exhaustion shows a toast, and the counter resets on success. (Flagged by: Testing)
- Add `TestCustomizeTemplateExternal`: Missing test class entirely. The function has its own response-mapping logic. (Flagged by: Testing)
- Exponential backoff for polling: Fixed 10s = 180 requests worst case. Start at 2s, cap at 30s → ~50 requests and faster detection of quick jobs. (Flagged by: Performance, Architect)
- Use `time.monotonic()` for elapsed tracking: The current counter drifts by the HTTP request duration on every iteration. (Flagged by: Performance, Quality)
- Extract `RECONNECT_DELAY_MS = 1_000`: Inline magic number in the frontend. (Flagged by: Quality)
- Rebase to reduce diff noise: The 299-file diff from the dev merge blocks automated reviewers; only 4 files actually changed. (Flagged by: Discussion)
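The exponential-backoff suggestion can be sanity-checked with a small model. The function below is illustrative, not code from the PR; the exact request count depends on the chosen start/cap/factor, and the ~50 figure above is the reviewer's own estimate (this parameterization yields 63).

```python
def backoff_intervals(start=2.0, cap=30.0, factor=2.0, budget=1800.0):
    """Yield poll wait times: exponential growth from `start`, capped at
    `cap`, until the cumulative wait exhausts `budget` (the 30-min window)."""
    elapsed, interval = 0.0, start
    while elapsed < budget:
        wait = min(interval, cap, budget - elapsed)
        yield wait
        elapsed += wait
        interval *= factor


# Fixed 10s polling over the same budget: 1800 / 10 = 180 requests.
# With 2s start / 30s cap / doubling: 2+4+8+16 then 30s steps = 63 requests,
# and a job finishing in 3s is detected after 2s instead of 10s.
fixed_polls = int(1800 / 10)
backoff_polls = sum(1 for _ in backoff_intervals())
```

The win is on both ends: quick jobs are noticed within a couple of seconds, while long jobs converge to one request per 30s instead of one per 10s.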
Risk Assessment
Merge risk: LOW | Rollback: EASY (revert 4 files, no migrations)
Deployment note: Requires companion PR (AutoGPT-Agent-Generator/pull/120) deployed alongside for polling endpoints. Without it, backend will get unexpected responses from agent-generator. Dummy mode still works independently.
@ntindle Two small conditions (dead setting + restore docstring), then this is good to go. The backend refactor is a clean DRY win, and the frontend reconnect is well-implemented. The main gap is frontend test coverage; not blocking merge, but it should be a fast follow-up.
… clock, cap poll errors
Review Items Addressed (0bc098a)
Must Fix (both done):
Should Fix (done in same commit):
Tests updated:
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 320-325: The code currently returns result.get("agent_json")
directly which can be None if the job payload is malformed; update each
agent-producing endpoint that returns result.get("agent_json") (the three
occurrences using result.get("agent_json")) to validate that agent_json is
present and is the expected type (e.g., a dict) before returning it, and if
invalid or missing return an invalid_response-style error object (use the
existing invalid_response helper or a
{"type":"error","error":"invalid_response", ...} payload) so callers get a clear
diagnostic instead of None; apply this change to all three return sites that
currently return result.get("agent_json").
- Around line 189-197: The poll handler currently treats transient HTTP statuses
as fatal; after calling _classify_http_error(e) inside the httpx.HTTPStatusError
except block, detect transient cases (e.g., status codes 429, 503, 504, 408 or
error_type values that _classify_http_error returns for rate limiting/temporary
upstream errors) and do NOT return an immediate _create_error_response; instead
log the transient condition via logger.error and surface a retryable signal (for
example raise a new RetryablePollError or re-raise the original exception) so
the caller can retry the poll; only call and return _create_error_response for
non-transient error_type values. Ensure you reference and update the
httpx.HTTPStatusError except block, the use of _classify_http_error,
logger.error, and _create_error_response accordingly.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
Files selected for processing (4)
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
- autogpt_platform/backend/backend/util/settings.py
- autogpt_platform/backend/test/agent_generator/test_service.py
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Seer Code Review
- GitHub Check: types
- GitHub Check: Check PR Status
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: end-to-end tests
- GitHub Check: Analyze (python)
🧰 Additional context used
Path-based instructions (15)
autogpt_platform/backend/**/*.py
CodeRabbit inference engine (.github/copilot-instructions.md)
- Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
- Always run 'poetry run format' (Black + isort) before linting in backend development
- Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
- autogpt_platform/backend/backend/util/settings.py
- autogpt_platform/backend/test/agent_generator/test_service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/**/*.{py,txt}
CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
- Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
- autogpt_platform/backend/backend/util/settings.py
- autogpt_platform/backend/test/agent_generator/test_service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/backend/**/*.py
CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
- Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
- autogpt_platform/backend/backend/util/settings.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/**/*.py
CodeRabbit inference engine (AGENTS.md)
- Format Python code with `poetry run format`
Files:
- autogpt_platform/backend/backend/util/settings.py
- autogpt_platform/backend/test/agent_generator/test_service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/**/test/**/*.py
CodeRabbit inference engine (.github/copilot-instructions.md)
- Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Files:
- autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*test*.py
CodeRabbit inference engine (AGENTS.md)
- Run `poetry run test` for backend testing (runs pytest with docker based postgres + prisma)
Files:
- autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}
CodeRabbit inference engine (.github/copilot-instructions.md)
- Use Node.js 21+ with pnpm package manager for frontend development
- Always run 'pnpm format' for formatting and linting code in frontend development
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/**/*.{tsx,ts}
CodeRabbit inference engine (.github/copilot-instructions.md)
- Use function declarations for components and handlers (not arrow functions) in React components
- Only use arrow functions for small inline lambdas (map, filter, etc.) in React components
- Use PascalCase for component names and camelCase with 'use' prefix for hook names in React
- Use Tailwind CSS utilities only for styling in frontend components
- Use design system components from 'src/components/' (atoms, molecules, organisms) in frontend development
- Never use 'src/components/legacy/' in frontend code
- Only use Phosphor Icons (@phosphor-icons/react) for icons in frontend components
- Use generated API hooks from '@/app/api/generated/endpoints/' instead of deprecated 'BackendAPI' or 'src/lib/autogpt-server-api/'
- Use React Query for server state (via generated hooks) in frontend development
- Default to client components ('use client') in Next.js; only use server components for SEO or extreme TTFB needs
- Use the ErrorCard component for rendering errors in frontend UI; use toast notifications for mutation errors; use 'Sentry.captureException()' for manual exceptions
- Separate render logic from data/behavior in React components; keep comments minimal (code should be self-documenting)
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/**/*.{ts,tsx}
CodeRabbit inference engine (.github/copilot-instructions.md)
- No barrel files or 'index.ts' re-exports in frontend code
- Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/**/*.{ts,tsx}
CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
- Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
- Use function declarations (not arrow functions) for components and handlers
- Separate render logic (.tsx) from business logic (`use*.ts` hooks)
- Use shadcn/ui (Radix UI primitives) with Tailwind CSS styling for UI components
- Use Phosphor Icons only for icons
- Use ErrorCard for render errors, toast for mutations, and Sentry for exceptions
- Use design system components from `src/components/` (atoms, molecules, organisms)
- Never use `src/components/__legacy__/*` components
- Use generated API hooks from `@/app/api/__generated__/endpoints/` with pattern `use{Method}{Version}{OperationName}`
- Use Tailwind CSS only for styling, with design tokens
- Do not use `useCallback` or `useMemo` unless asked to optimize a given function
- Never type with `any` unless a variable/attribute can ACTUALLY be of any type
- Structure components as `ComponentName/ComponentName.tsx` + `useComponentName.ts` + `helpers.ts` and use design system components from `src/components/` (atoms, molecules, organisms)
- Use generated API hooks from `@/app/api/__generated__/endpoints/` with pattern `use{Method}{Version}{OperationName}` and regenerate with `pnpm generate:api`
- Use function declarations (not arrow functions) for components and handlers
- Separate render logic from business logic with component.tsx + useComponent.ts + helpers.ts structure
- Colocate state when possible, avoid creating large components, use sub-components in a local `components` folder
- Avoid large hooks, abstract logic into `helpers.ts` files when sensible
- Use arrow functions only for callbacks, not for component declarations
- Avoid comments at all times unless the code is very complex
- Do not use `useCallback` or `useMemo` unless asked to optimize a given function
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/**/*use*.ts
CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
- Do not type hook returns, let TypeScript infer as much as possible
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx}
CodeRabbit inference engine (AGENTS.md)
- Format frontend code using `pnpm format`
- Never use components from `src/components/__legacy__/*`
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx,css}
CodeRabbit inference engine (AGENTS.md)
- Use Tailwind CSS only for styling, use design tokens, and use Phosphor Icons only
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/**/*.ts
CodeRabbit inference engine (AGENTS.md)
- Do not type hook returns, let TypeScript infer as much as possible
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/**/*.{ts,tsx}
CodeRabbit inference engine (AGENTS.md)
- Never type with `any`; if no types are available use `unknown`
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
🧠 Learnings (2)
Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR | Repo: Significant-Gravitas/AutoGPT | PR: 0 | File: .github/copilot-instructions.md:0-0
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Applied to files:
- autogpt_platform/backend/test/agent_generator/test_service.py
Learning: 2026-02-04T16:50:20.508Z
Learnt from: CR | Repo: Significant-Gravitas/AutoGPT | PR: 0 | File: autogpt_platform/backend/CLAUDE.md:0-0
Learning: Applies to autogpt_platform/backend/**/*_test.py : Always review snapshot changes with `git diff` before committing when updating snapshots with `poetry run pytest --snapshot-update`
Applied to files:
- autogpt_platform/backend/test/agent_generator/test_service.py
🧬 Code graph analysis (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (2)
autogpt_platform/backend/backend/util/request.py (2)
- post (569-570), json (295-301)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
- generate_agent_dummy (101-115), customize_template_dummy (140-151)
Additional comments (5)
autogpt_platform/backend/backend/util/settings.py (1)
- 374-377: The per-request timeout config update looks consistent with the new polling model. The new default value/description pair is clear and matches submit/poll request-level timeout semantics.
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts (2)
- 261-266: Session cache invalidation on stream end is a good addition. This avoids stale hydrated session data after stream termination and improves remount/navigation consistency.
- 268-297: The reconnect strategy is well-bounded and safely gated. The streaming→error guard with `hasActiveStream` and capped retry attempts prevents runaway reconnect loops while still recovering common mid-stream SSE drops.
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)
- 181-185: Good use of monotonic timing and non-blocking poll waits. Using `time.monotonic()` with `await asyncio.sleep(...)` is the right pattern for stable elapsed-time checks in async loops.
autogpt_platform/backend/test/agent_generator/test_service.py (1)
- 52-274: Great coverage of the new submit-and-poll core path. These tests exercise success, timeout, transient network failures, 404 expiry, and capped retry behavior in a way that matches the new helper semantics.
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
…ent_json responses
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 231-233: In _submit_and_poll, after detecting status ==
"completed" (and before returning poll_data.get("result", {})), validate that
poll_data.get("result") is a dict; if it is, return it, otherwise log an error
including job_id and the unexpected result type/value and return a structured
invalid response (e.g. {"invalid_response": True, "raw_result": <result>}) so
downstream callers that call .get(...) won't raise AttributeError.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
Files selected for processing (2)
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
- autogpt_platform/backend/test/agent_generator/test_service.py
Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Seer Code Review
- GitHub Check: types
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: end-to-end tests
- GitHub Check: Analyze (python)
- GitHub Check: Check PR Status
🧰 Additional context used
Path-based instructions (6)
autogpt_platform/backend/**/*.py
CodeRabbit inference engine (.github/copilot-instructions.md)
- Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
- Always run 'poetry run format' (Black + isort) before linting in backend development
- Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
- autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*.{py,txt}
CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
- Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
- autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/backend/**/*.py
CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
- Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/**/*.py
CodeRabbit inference engine (AGENTS.md)
- Format Python code with `poetry run format`
Files:
- autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
- autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/test/**/*.py
CodeRabbit inference engine (.github/copilot-instructions.md)
- Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Files:
- autogpt_platform/backend/test/agent_generator/test_service.py
autogpt_platform/backend/**/*test*.py
CodeRabbit inference engine (AGENTS.md)
- Run `poetry run test` for backend testing (runs pytest with docker based postgres + prisma)
Files:
- autogpt_platform/backend/test/agent_generator/test_service.py
🧠 Learnings (2)
Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR | Repo: Significant-Gravitas/AutoGPT | PR: 0 | File: .github/copilot-instructions.md:0-0
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Applied to files:
- autogpt_platform/backend/test/agent_generator/test_service.py
Learning: 2026-02-04T16:50:20.508Z
Learnt from: CR | Repo: Significant-Gravitas/AutoGPT | PR: 0 | File: autogpt_platform/backend/CLAUDE.md:0-0
Learning: Applies to autogpt_platform/backend/**/*_test.py : Always review snapshot changes with `git diff` before committing when updating snapshots with `poetry run pytest --snapshot-update`
Applied to files:
- autogpt_platform/backend/test/agent_generator/test_service.py
🧬 Code graph analysis (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
- generate_agent_dummy (101-115), customize_template_dummy (140-151)
Additional comments (2)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)
- 189-208: Transient poll retry behavior is a solid reliability improvement. Handling retryable HTTP/request errors with a consecutive-error cap is a good balance between resilience and bounded wait time. Also applies to: 212-225.
autogpt_platform/backend/test/agent_generator/test_service.py (1)
- 52-336: Great expansion of polling-path test coverage. The new cases for timeout, transient retries, and consecutive-error aborts materially reduce regression risk for the async polling flow. Also applies to: 338-850.
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
Outdated
🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (2)
344-350: Consider deduplicating `agent_json` extraction/validation. The same validation block appears in three endpoints; a small helper would reduce drift and keep error responses consistent.
♻️ Suggested refactor

```diff
+def _extract_agent_json_or_error(result: dict[str, Any]) -> dict[str, Any]:
+    agent_json = result.get("agent_json")
+    if isinstance(agent_json, dict):
+        return agent_json
+    return _create_error_response(
+        "Agent Generator returned no agent_json in result", "invalid_response"
+    )
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
@@
-    agent_json = result.get("agent_json")
-    if not isinstance(agent_json, dict):
-        return _create_error_response(
-            "Agent Generator returned no agent_json in result", "invalid_response"
-        )
-    return agent_json
+    return _extract_agent_json_or_error(result)
```

Also applies to: 391-396, 441-446
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py` around lines 344 - 350, Create a small helper (e.g., _extract_agent_json(result)) in service.py that encapsulates the repeated logic: get "agent_json" from the result, verify it's a dict, and return either the dict or call _create_error_response("Agent Generator returned no agent_json in result", "invalid_response"). Replace the three duplicated blocks (the extraction/validation around agent_json at the locations using variable agent_json) with calls to this helper so all endpoints use the same validation and error response.
240-245: Handle unknown poll statuses explicitly instead of treating all as "still running". Right now, any status other than `"completed"`/`"failed"` keeps polling until timeout. If upstream returns an unexpected terminal state, this delays failure diagnosis by up to 30 minutes.

♻️ Suggested hardening
```diff
         if status == "completed":
             logger.info(f"Agent Generator job {job_id} completed")
             result = poll_data.get("result", {})
             if not isinstance(result, dict):
                 return _create_error_response(
                     "Agent Generator returned invalid result payload",
                     "invalid_response",
                 )
             return result
         elif status == "failed":
             error_msg = poll_data.get("error", "Job failed")
             logger.error(f"Agent Generator job {job_id} failed: {error_msg}")
             return _create_error_response(error_msg, "job_failed")
-        # else: still running → keep polling
+        elif status in {"pending", "queued", "running"}:
+            continue
+        else:
+            logger.error(f"Unknown job status for {job_id}: {status!r}")
+            return _create_error_response(
+                f"Unknown job status: {status}",
+                "invalid_response",
+            )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py` around lines 240 - 245, The polling loop currently treats any non-"completed"/"failed" status as "still running"; update the branch that checks status (around the elif status == "failed" block handling job_id and poll_data) to explicitly detect unexpected terminal statuses: if status is a known running state (e.g., "pending","running","in_progress") continue polling, otherwise log an error including job_id and the unexpected status/poll_data and return _create_error_response with a distinct code (e.g., "unknown_status") so callers fail fast instead of waiting the full timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 344-350: Create a small helper (e.g., _extract_agent_json(result))
in service.py that encapsulates the repeated logic: get "agent_json" from the
result, verify it's a dict, and return either the dict or call
_create_error_response("Agent Generator returned no agent_json in result",
"invalid_response"). Replace the three duplicated blocks (the
extraction/validation around agent_json at the locations using variable
agent_json) with calls to this helper so all endpoints use the same validation
and error response.
- Around line 240-245: The polling loop currently treats any
non-"completed"/"failed" status as "still running"; update the branch that
checks status (around the elif status == "failed" block handling job_id and
poll_data) to explicitly detect unexpected terminal statuses: if status is a
known running state (e.g., "pending","running","in_progress") continue polling,
otherwise log an error including job_id and the unexpected status/poll_data and
return _create_error_response with a distinct code (e.g., "unknown_status") so
callers fail fast instead of waiting the full timeout.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: types
- GitHub Check: Seer Code Review
- GitHub Check: end-to-end tests
- GitHub Check: test (3.12)
- GitHub Check: Analyze (python)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
🔇 Additional comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (3)
125-135: Nice improvement on timeout configuration. Using `settings.config.agentgenerator_timeout` in `_get_client()` removes the hardcoded timeout and keeps request timeout behavior centrally configurable.
181-227: Polling loop resiliency is solid. `time.monotonic()` budget tracking, `asyncio.sleep()` yielding, and capped consecutive transient error retries are well implemented for long-running jobs.
503-508: `close_client()` lifecycle handling looks correct. Closing the singleton client and resetting `_client` to `None` is clean and safe for reinitialization.
…nknown poll status
Addressed both nitpicks from the latest coderabbit review in ec7c7eb:
```ts
const timer = setTimeout(() => resumeStream(), 1_000);
return () => clearTimeout(timer);
```
Bug: A race condition in the useEffect hook for auto-reconnection can prevent it from firing. If a dependency changes, the reconnect timer is cleared but not reset.
Severity: MEDIUM
Suggested Fix
To fix this, the logic for tracking the previous status needs to be more robust against re-renders. One approach is to use useRef to store the timer ID and only clear it if it exists, while ensuring the condition to set a new timer correctly evaluates the status transition, perhaps by managing the 'previous status' state outside the ref that gets updated every render.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts#L281-L282
Potential issue: A race condition exists in the auto-reconnect logic within the
`useEffect` hook. When a stream's `status` transitions from "streaming" to "error", a
one-second timer is set to call `resumeStream`. However, if a dependency in the hook's
dependency array (e.g., the `resumeStream` function) is recreated within that one-second
window, the effect's cleanup function is triggered, clearing the timer. When the effect
re-runs, the `prev` status is now "error", causing the condition `prev === "streaming"`
to fail. This prevents the timer from being set again, and the auto-reconnection never
occurs.
Did we get this right? 👍 / 👎 to inform future reviews.
🧹 Nitpick comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (3)
227-229: Guard `poll_resp.json()` against `DecodingError`. Same as the submit step: a 200 response with a non-JSON body causes `httpx.DecodingError` to escape the poll loop, producing an opaque error at the caller site.

♻️ Proposed fix
```diff
             consecutive_errors = 0
-            poll_data = poll_resp.json()
+            try:
+                poll_data = poll_resp.json()
+            except (httpx.DecodingError, ValueError) as e:
+                return _create_error_response(
+                    f"Agent Generator returned non-JSON poll response for job {job_id}: {e}",
+                    "invalid_response",
+                )
             status = poll_data.get("status")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py` around lines 227 - 229, Guard the call to poll_resp.json() against httpx.DecodingError by wrapping it in a try/except: catch httpx.DecodingError around poll_resp.json(), log the decoding error (include poll_resp.status_code and poll_resp.text if available), increment the existing consecutive_errors counter and continue the polling loop instead of letting the exception escape; ensure poll_data is only used when parsing succeeded (i.e., move usage of poll_data and status into the try block or after a successful parse).
159-176: Guard `response.json()` against `DecodingError`; consider retrying transient submit errors. Two related concerns on the submit step:

1. Unguarded `response.json()` (line 171): If the agent-generator returns a non-JSON body with a 202 (even transiently), `httpx.DecodingError` escapes `_submit_and_poll` unhandled. It is caught only by each caller's broad `except Exception`, producing an opaque "Unexpected error" message. Catching it here gives a more actionable diagnostic.
2. No retry for transient submit errors: The poll loop retries 429/503/504/408 up to `MAX_CONSECUTIVE_POLL_ERRORS` times, but a single transient HTTP error during the submit POST fails the entire operation. If brief LB restarts during submission are a concern, a small retry budget here would be consistent.

♻️ Proposed fix for issue 1 (JSON decoding guard)
```diff
+    try:
         data = response.json()
+    except (httpx.DecodingError, ValueError) as e:
+        return _create_error_response(
+            f"Agent Generator returned non-JSON submit response: {e}",
+            "invalid_response",
+        )
     job_id = data.get("job_id")
```

Or inline within the existing try block:

```diff
     try:
         response = await client.post(endpoint, json=payload)
         response.raise_for_status()
+        data = response.json()
     except httpx.HTTPStatusError as e:
         error_type, error_msg = _classify_http_error(e)
         logger.error(error_msg)
         return _create_error_response(error_msg, error_type)
     except httpx.RequestError as e:
         error_type, error_msg = _classify_request_error(e)
         logger.error(error_msg)
         return _create_error_response(error_msg, error_type)
+    except (httpx.DecodingError, ValueError) as e:
+        return _create_error_response(
+            f"Agent Generator returned non-JSON submit response: {e}",
+            "invalid_response",
+        )
-
-    data = response.json()
```

🤖 Prompt for AI Agents
34-36: Consider exposing poll interval/budget in `Settings` alongside `agentgenerator_timeout`. `agentgenerator_timeout` (per-request HTTP timeout) is now a configurable setting, but `POLL_INTERVAL_SECONDS` (10 s) and `MAX_POLL_TIME_SECONDS` (1800 s) are hardcoded constants. For deployments with different job latency profiles these are operationally useful knobs. This is a low-priority improvement, but keeping the three values together in `Settings` would be consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py` around lines 34 - 36, The hardcoded polling constants POLL_INTERVAL_SECONDS, MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS should be moved into the app Settings alongside agentgenerator_timeout so they are configurable per deployment; add new Settings fields (e.g., agentgenerator_poll_interval, agentgenerator_poll_timeout, agentgenerator_max_consecutive_poll_errors) with the current defaults (10.0, 1800.0, 5), update autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py to remove the constants and read those values from Settings (use the same Settings accessor used for agentgenerator_timeout) wherever POLL_INTERVAL_SECONDS, MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS are referenced, and ensure any validation/type casting (float/int) is applied and unit tests/usage still behave the same by default.
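A minimal sketch of the configuration surface suggested above; a plain dataclass stands in for the project's pydantic `Settings`, and the field names/defaults are the review's suggestions rather than existing code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGeneratorSettings:
    # Existing knob: per-request HTTP timeout in seconds.
    agentgenerator_timeout: float = 30.0
    # Proposed knobs, currently hardcoded constants in service.py:
    agentgenerator_poll_interval: float = 10.0   # seconds between polls
    agentgenerator_poll_timeout: float = 1800.0  # total wall-clock budget
    agentgenerator_max_consecutive_poll_errors: int = 5


settings = AgentGeneratorSettings()
# With the defaults, the budget allows at most 180 poll attempts.
max_polls = int(settings.agentgenerator_poll_timeout // settings.agentgenerator_poll_interval)
```

Keeping the three polling values next to `agentgenerator_timeout` puts all latency knobs in one place, which is the consistency the nitpick asks for.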
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py`:
- Around line 227-229: Guard the call to poll_resp.json() against
httpx.DecodingError by wrapping it in a try/except: catch httpx.DecodingError
around poll_resp.json(), log the decoding error (include poll_resp.status_code
and poll_resp.text if available), increment the existing consecutive_errors
counter and continue the polling loop instead of letting the exception escape;
ensure poll_data is only used when parsing succeeded (i.e., move usage of
poll_data and status into the try block or after a successful parse).
- Around line 159-176: The submit step in _submit_and_poll should guard against
httpx.DecodingError when calling response.json() and should retry transient
submit failures: wrap the client.post call in a small retry loop (e.g., 2β3
attempts with brief backoff) that treats transient status codes (429, 503, 504,
408) and RequestError as retryable (consistent with MAX_CONSECUTIVE_POLL_ERRORS
behavior), use _classify_http_error/_classify_request_error to classify/log
failures, and on final failure return _create_error_response with a clear
"invalid_response" or appropriate error_type; additionally catch
httpx.DecodingError after a successful response and return
_create_error_response with a diagnostic message instead of letting it
propagate.
- Around line 34-36: The hardcoded polling constants POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS should be moved into the
app Settings alongside agentgenerator_timeout so they are configurable per
deployment; add new Settings fields (e.g., agentgenerator_poll_interval,
agentgenerator_poll_timeout, agentgenerator_max_consecutive_poll_errors) with
the current defaults (10.0, 1800.0, 5), update
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py to
remove the constants and read those values from Settings (use the same Settings
accessor used for agentgenerator_timeout) wherever POLL_INTERVAL_SECONDS,
MAX_POLL_TIME_SECONDS, and MAX_CONSECUTIVE_POLL_ERRORS are referenced, and
ensure any validation/type casting (float/int) is applied and unit tests/usage
still behave the same by default.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: end-to-end tests
- GitHub Check: test (3.11)
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py
🧬 Code graph analysis (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (2)
autogpt_platform/backend/backend/util/request.py (2)
- `post` (569-570)
- `json` (295-301)

autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
- `generate_agent_dummy` (101-115)
- `customize_template_dummy` (140-151)
🔇 Additional comments (1)
autogpt_platform/backend/backend/copilot/tools/agent_generator/service.py (1)
143-252: `_submit_and_poll` polling logic is well-structured overall. The poll loop correctly:
- Uses `time.monotonic()` for accurate wall-clock budget tracking.
- Resets `consecutive_errors` after any successful poll response (line 227).
- Distinguishes 404 (non-retryable) from transient statuses (retryable) before computing `status_code`.
- Fails fast on unknown job statuses (`invalid_response`) rather than burning the full poll budget.
- Returns a structured error dict in every exit path.

All three items from previous review rounds (transient HTTP retry, `agent_json` validation, `result` dict type guard) are correctly addressed.
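Stripped of HTTP specifics, the loop shape described above can be sketched as follows; the `submit`/`poll` callables, the `ConnectionError` stand-in for transient HTTP failures, and the error-dict shapes are illustrative placeholders, not the project's actual API:

```python
import asyncio
import time

POLL_INTERVAL_SECONDS = 10.0
MAX_POLL_TIME_SECONDS = 1800.0
MAX_CONSECUTIVE_POLL_ERRORS = 5


async def submit_and_poll(submit, poll, *,
                          interval: float = POLL_INTERVAL_SECONDS,
                          budget: float = MAX_POLL_TIME_SECONDS) -> dict:
    """Submit a job, then poll until a terminal status or the budget expires."""
    job_id = await submit()
    deadline = time.monotonic() + budget  # wall-clock budget, immune to clock jumps
    consecutive_errors = 0
    while time.monotonic() < deadline:
        await asyncio.sleep(interval)  # yields to the event loop (SSE heartbeats keep flowing)
        try:
            data = await poll(job_id)
        except ConnectionError:  # stand-in for transient HTTP errors (429/503/504/408)
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_POLL_ERRORS:
                return {"type": "error", "error_type": "poll_failed"}
            continue
        consecutive_errors = 0  # reset after any successful poll response
        status = data.get("status")
        if status == "completed":
            result = data.get("result")
            if isinstance(result, dict):
                return result
            return {"type": "error", "error_type": "invalid_response"}
        if status == "failed":
            return {"type": "error", "error_type": "job_failed"}
        if status not in {"pending", "queued", "running"}:
            return {"type": "error", "error_type": "invalid_response"}  # fail fast on unknown
    return {"type": "error", "error_type": "timeout"}
```

With `interval=0` and fake callables the loop runs instantly, which is one way tests can exercise it without real sleeps.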
autogpt-reviewer
left a comment
PR #12199 – Re-review #2: feat(copilot): async polling for agent-generator + SSE auto-reconnect
Author: majdyz | HEAD: ec7c7eb | Previous: d78e0ee (4 new commits)
🎯 Verdict: APPROVE
What Changed Since Last Review
Author addressed both conditions and most should-fix items from our first review in 4 focused commits:
| Commit | What |
|---|---|
| `0bc098a` | Wire `agentgenerator_timeout` setting into client, use `time.monotonic()` |
| `9b3e25d` | Retry transient HTTP errors (429/503/504/408), validate `job_id`, `MAX_CONSECUTIVE_POLL_ERRORS=5` |
| `8ef8bec` | Validate completed job `result` type |
| `ec7c7eb` | Extract `_extract_agent_json` helper, fail fast on unknown status |
Condition Status
- ✅ `agentgenerator_timeout` dead code → FIXED. `_get_client()` now reads `settings.config.agentgenerator_timeout`. Hardcoded `_REQUEST_TIMEOUT` removed.
- ✅ Restore `decompose_goal_external` docstring → FIXED. Full return-type enumeration restored with all 5 variants.
Should-Fix Status
| Item | Status |
|---|---|
| `time.monotonic()` for elapsed tracking | ✅ FIXED |
| Consecutive transient error cap | ✅ FIXED (`MAX_CONSECUTIVE_POLL_ERRORS=5`) |
| Transient HTTP retry (429/503/504/408) | ✅ FIXED (new, beyond what we asked) |
| Validate `result` type on completed job | ✅ FIXED |
| Unknown job status → error | ✅ FIXED (new: explicit running/pending/queued allowlist) |
| `_extract_agent_json` DRY helper | ✅ FIXED (replaces 3× `result.get("agent_json")`) |
| 5 new tests for above | ✅ FIXED |
| Frontend reconnect tests | ❌ Not addressed (follow-up) |
| `TestCustomizeTemplateExternal` | ❌ Not addressed (follow-up) |
| `RECONNECT_DELAY_MS` constant | ❌ Not addressed (minor) |
New Code Quality Assessment
The new code is well-structured:
- Transient HTTP retry correctly differentiates 429/503/504/408 (retry with counter) from 500/other (fail immediately). Good distinction.
consecutive_errorscounter resets to 0 on successful poll β only consecutive failures trigger give-up. Correct behavior._extract_agent_jsoneliminates 3Γ identical.get("agent_json")calls with proper validation. Returns typed error on missing/invalid.- Explicit status allowlist (
running/pending/queued) with fallthrough error for unknown statuses β defensive and correct. - Test coverage for all new paths: transient HTTP retry + recovery, non-transient immediate fail, consecutive error give-up, monotonic time mocking, missing agent_json validation.
Remaining (Non-blocking Follow-ups)
- Frontend SSE reconnect tests → 40 lines of stateful `useEffect` still untested. Not blocking because the code was reviewed by 3 specialists and logic is straightforward, but should be added.
- `TestCustomizeTemplateExternal` → Still missing. Low risk (mirrors `generate_agent_patch_external` logic).
- `RECONNECT_DELAY_MS` constant → Minor style nit.
Risk Assessment
Merge risk: LOW | Rollback: EASY (4 files, no migrations)
CI: All tests pass (3.11/3.12/3.13); lint ✅, types ✅, e2e ✅. Check PR Status failure is a race condition (ran before other checks finished).
@ntindle Clean iteration β author addressed all conditions and most should-fix items with well-tested code. Ready to merge.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request. |
is this still needed? |
ntindle
left a comment
Code Review – feat(backend/copilot): async polling for agent-generator + SSE auto-reconnect
The submit-and-poll approach is the right architectural decision for surviving GCP L7 LB kill-offs, and the overall structure is clean. However, there are several issues ranging from a behavioral loop bug to missing safety nets that need to be addressed before merge.
🔴 Blocking
1. Infinite reconnect loop in useCopilotPage.ts (line ~1603)
The reconnect counter is reset every time `status === "streaming"`, which happens immediately after each successful `resumeStream()`. This means the 3-attempt guard is per-error-burst, not per-session, and can loop indefinitely:
```
streaming → error   (attempt 1, counter = 1)
resumeStream() called
streaming           (counter RESETS to 0 ← bug)
streaming → error   (attempt 1 again, counter = 1)
... loops forever if LB keeps killing connections every ~5 min
```
Fix: Only reset the counter on status === "ready" (stream fully and successfully complete), not on status === "streaming":
```ts
// Reset reconnect counter ONLY when stream completes normally
if (status === "ready") {
  reconnectAttemptsRef.current = 0;
}
```

2. `response.json()` / `poll_resp.json()` not guarded in `_submit_and_poll` (service.py, lines ~142 & ~199)
If the agent-generator (or an intervening proxy) returns a non-JSON body (HTML error page, plain-text gateway timeout, etc.) both data = response.json() and poll_data = poll_resp.json() will throw a json.JSONDecodeError. This propagates out of _submit_and_poll unhandled, gets caught by the bare except Exception in callers, and surfaces as a cryptic "unexpected_error".
Fix: Wrap both .json() calls:
```python
try:
    data = response.json()
except Exception as exc:
    return _create_error_response(
        f"Agent Generator returned non-JSON submit response: {exc}",
        "invalid_response",
    )
```

Do the same for `poll_data = poll_resp.json()` inside the polling loop.
3. Silent removal of operation_id / session_id parameters is a breaking API change (service.py)
generate_agent_patch_external and customize_template_external previously accepted operation_id: str = "" and session_id: str = "" keyword arguments that enabled Redis Streams async-callback mode. These are gone without a deprecation shim.
Any caller that passes those keyword arguments will receive an immediate TypeError at runtime. The companion PR description notes the {"status": "accepted"} response type is also removed, but the internal callers of these functions (copilot router, etc.) have not been shown to be audited.
Required: Show the call sites of both functions and confirm no caller passes operation_id or session_id. If they do, add **_ignored_kwargs or raise a DeprecationWarning with a grace-period PR.
4. agentgenerator_timeout semantic change silently breaks operator overrides (settings.py, line ~656)
Changing the default from 1800 → 30 is fine for new deployments, but:
- Anyone who has overridden `agentgenerator_timeout=60` (for example) expecting 60 seconds total time will now have a 60-second per-request HTTP timeout and still poll for up to 30 minutes.
- The hardcoded `MAX_POLL_TIME_SECONDS = 1800.0` in service.py cannot be overridden via settings at all.
Fix: Expose MAX_POLL_TIME_SECONDS as a settings field (e.g., agentgenerator_poll_timeout: int = Field(default=1800, ...)). Update the agentgenerator_timeout description to say "per-request HTTP timeout" explicitly and note the default change.
🟠 High Severity
5. MAX_RECONNECT_ATTEMPTS defined inside the hook (useCopilotPage.ts, line ~1559)
```ts
const MAX_RECONNECT_ATTEMPTS = 3; // recreated on every render
```

Move to module scope. This is also flagged by most lint/ESLint no-magic-numbers rules.
6. resumeStream in useEffect deps may not be stable (useCopilotPage.ts, line ~1606)
resumeStream is listed in the useEffect dependency array. If resumeStream is not wrapped in useCallback with a stable dependency set, it will be a new function reference on every render, causing the effect to fire on every render β potentially triggering reconnect logic unexpectedly. Please confirm resumeStream has a stable reference, or wrap the call inside the effect with a ref to avoid the dependency.
7. No tests for customize_template_external polling behavior (test_service.py)
customize_template_external was migrated to _submit_and_poll but there is no corresponding test class in the diff. It is the only migrated endpoint without coverage for:
- Happy-path `agent_json` extraction
- `clarifying_questions` passthrough
- Error propagation from `_submit_and_poll`
- `_extract_agent_json` validation failure
Add a TestCustomizeTemplateExternal class mirroring TestGenerateAgentPatchExternal.
8. No cancellation of in-flight polling job on caller abort (service.py, _submit_and_poll)
If the FastAPI request handler is cancelled (client closes connection, timeout), asyncio.CancelledError will propagate correctly out of asyncio.sleep, but the job submitted to the agent-generator will continue running on the remote side until completion. There is no cleanup path.
Fix: Catch asyncio.CancelledError, issue a best-effort DELETE /api/jobs/{job_id}, then re-raise:
```python
except asyncio.CancelledError:
    logger.warning(f"Polling cancelled for job {job_id}, attempting cleanup")
    with contextlib.suppress(Exception):
        await client.delete(f"/api/jobs/{job_id}")
    raise
```

🟡 Medium / Suggestions
9. asyncio.sleep patched globally in tests (test_service.py)
patch("asyncio.sleep", new_callable=AsyncMock) patches the global symbol, not the one imported by the module under test. Prefer:
```python
patch(
    "backend.copilot.tools.agent_generator.service.asyncio.sleep",
    new_callable=AsyncMock,
)
```

This is more precise and won't accidentally suppress sleeps in other co-routines sharing the event loop.
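If the module under test calls `asyncio.sleep` as a module attribute, both patch targets happen to resolve to the same attribute on the `asyncio` module; the distinction becomes load-bearing when a module binds the name at import time. A small stdlib-only demo of the "patch where it is looked up" rule (the `poll_once` coroutine here is hypothetical, standing in for the code under test):

```python
import asyncio
from unittest.mock import AsyncMock, patch

# Stand-in for a module that binds `sleep` at import time,
# i.e. `from asyncio import sleep` in the code under test.
from asyncio import sleep as _sleep


async def poll_once() -> str:
    await _sleep(0)  # uses the import-time binding, not the asyncio.sleep attribute
    return "ok"


# Patching the global name does NOT replace the import-time binding:
with patch("asyncio.sleep", new_callable=AsyncMock) as mocked:
    result = asyncio.run(poll_once())

assert result == "ok"
assert mocked.await_count == 0  # the real sleep ran; the mock was never awaited
```

Targeting the module's own reference (as the review suggests) avoids this class of silently-ineffective mocks.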
10. _extract_agent_json return type is ambiguous (service.py, lines ~226-236)
_extract_agent_json returns either valid agent JSON or an error-response dict, both typed as dict[str, Any]. Callers check result.get("type") == "error" before calling it, so errors from validation inside _extract_agent_json silently bubble back as undetected error dicts.
Consider returning None on validation failure (and generating the error response in the caller) so type checkers can distinguish the success case from the error case.
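A sketch of the `None`-returning variant suggested here; `extract_agent_json` and the caller's error shape are hypothetical illustrations, not the current helper:

```python
from typing import Any, Optional


def extract_agent_json(result: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the agent JSON payload, or None when it is missing or not a dict."""
    agent_json = result.get("agent_json")
    return agent_json if isinstance(agent_json, dict) else None


def handle(result: dict[str, Any]) -> dict[str, Any]:
    # Caller keeps error-shaping local; type checkers can narrow on None.
    agent_json = extract_agent_json(result)
    if agent_json is None:
        return {"type": "error", "error_type": "invalid_response"}  # placeholder error shape
    return agent_json
```

Returning `Optional` lets mypy/pyright distinguish the success branch instead of threading error-shaped dicts through the success type.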
11. _submit_and_poll tests reach into private API (test_service.py, TestSubmitAndPoll)
await service._submit_and_poll(...) tests the private helper directly. These tests are fragile β renaming or splitting the function breaks the entire class. Most of the scenarios (transient retry, 404, timeout) are already implicitly exercised by the public-API tests. Consider whether direct private-function tests are necessary or if they can be folded into the public-API test classes.
12. _client singleton ignores runtime agentgenerator_timeout changes (service.py, _get_client())
This was true before this PR but becomes more operationally important now that agentgenerator_timeout controls per-request HTTP timeouts in a polling loop. A comment documenting that the client must be reset (set service._client = None) after changing the setting would save future debugging time.
13. hasActiveStream timing in useEffect (useCopilotPage.ts)
hasActiveStream is read inside useEffect, which runs after the render commit. If status → "error" and hasActiveStream → false are batched into the same React update, the effect may see hasActiveStream === false and skip the reconnect even though the backend job is still running. Document the ordering assumption or use a useRef for hasActiveStream to guarantee synchronous reads.
📌 Nits
- The `float()` cast in `timeout = httpx.Timeout(float(settings.config.agentgenerator_timeout))` is redundant; `httpx.Timeout` already accepts `int`. Remove `float()`.
- Docstrings on `_create_error_response`, `_classify_http_error`, and `_classify_request_error` were shortened to one-liners. The removed `Args`/`Returns` sections on `_classify_*` functions (which have non-obvious tuple return types) should be preserved for discoverability.
- `decompose_goal_external`, `generate_agent_external`, and `generate_agent_patch_external` still declare `-> dict[str, Any] | None` but the new implementation never returns `None`. Update return types to `-> dict[str, Any]` to avoid misleading callers.
Summary of required changes before merge:
- Fix infinite reconnect loop (reset counter on `"ready"` only, not `"streaming"`)
- Guard `.json()` calls against `JSONDecodeError` in `_submit_and_poll`
- Audit and confirm `operation_id`/`session_id` removal doesn't break any callers
- Expose `MAX_POLL_TIME_SECONDS` via settings; update `agentgenerator_timeout` description
- Add `TestCustomizeTemplateExternal` test class
- Handle `asyncio.CancelledError` with best-effort job cleanup

Summary
- `time.monotonic()` for wall-clock timeout tracking
- `asyncio.sleep()` in poll loop yields to event loop, keeping SSE heartbeats alive through GCP's L7 load balancer
- `agent_json` responses validated before returning (prevents silent `None` propagation)
- `agentgenerator_timeout` setting wired into HTTP client (was hardcoded)

Why
GCP's L7 load balancer kills SSE connections at ~5 minutes. Agent generation takes 5–30 minutes, so the old synchronous HTTP call would block for the entire duration, causing the SSE connection to die. The new polling pattern keeps each request short (~30s timeout) while `asyncio.sleep(10)` yields control for heartbeat delivery.

Changes
- service.py: `_submit_and_poll()` helper with retry logic for transient errors, `agent_json` validation, `agentgenerator_timeout` wired into client, `time.monotonic()` for wall-clock timeout
- settings.py: `agentgenerator_timeout` default 1800 → 30 (now per-request, not total)
- useCopilotPage.ts: SSE auto-reconnect (up to 3 attempts when the stream drops mid-stream)
- test_service.py: expanded polling-path test coverage

Companion PR
Review comments addressed
- Wired `agentgenerator_timeout` setting into HTTP client
- `time.monotonic()` for accurate wall-clock timeout
- Restored `decompose_goal_external` docstring with return-type enumeration
- Validated `agent_json` exists before returning from 3 endpoints

Test plan