fix(platform/copilot): fix stuck sessions, stop button, and StreamFinish reliability#12191
Conversation
…ution

Removes background task spawning for long-running tools (create_agent, edit_agent) in favor of synchronous blocking execution. This simplifies the conversation flow by returning actual results to Claude in a single LLM continuation instead of two separate continuations.

Changes:
- Remove asyncio.create_task() background task spawning
- Execute tools synchronously with await
- Fix database access to use get_chat_session().messages
- Remove unused pending_msg/started_msg variables
- Fix dummy agent generator to match real behavior
- Add callProviderMetadata for frontend mini-game indication

Benefits:
- Single LLM continuation with actual result (not "operation started")
- Clearer conversation flow
- Frontend mini-game still shows via SSE during blocking execution
- Simplified codebase without background task tracking
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
Walkthrough

Adds provider metadata to tool-input SSE and surfaces long-running tools to the frontend; persists tool_use_id in a ContextVar for selected tools; converts long-running tool execution into a synchronous streaming path that returns final results; introduces async publish behavior for dummy agent tools and stale-task auto-completion in the stream registry.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Frontend as Frontend
    participant Service as Copilot Service
    participant Adapter as SDK Adapter
    participant Tool as Tool Executor
    participant Registry as Stream Registry
    Frontend->>Service: Request long-running tool (e.g. create_agent)
    Service->>Frontend: Emit StreamToolInputAvailable (callProviderMetadata: {isLongRunning:true})
    Frontend-->>Frontend: Render long-running UI state
    Service->>Adapter: Invoke tool handler
    Adapter->>Adapter: Set _current_tool_use_id ContextVar
    Adapter->>Tool: Execute tool synchronously (streaming)
    Tool-->>Adapter: Stream outputs / final result
    Adapter-->>Service: Return final result content
    Service->>Registry: Save tool call state & emit StreamToolOutputAvailable (result)
    Frontend-->>Frontend: Display final result and clear indicator
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
🔍 PR Overlap Detection: 8 conflict(s), 0 medium risk, 4 low risk (out of 12 PRs with file overlap). Consider coordinating with the authors of conflicting PRs.
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
Additional Comments (2)
The function signature indicates it should return
These debug logging statements (lines 66-79) with serialization tests add overhead to every tool call. The PR description mentions this passed format/lint, but they should be removed for production.
Actionable comments posted: 4
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- autogpt_platform/backend/backend/copilot/response_model.py
- autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
- autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
- autogpt_platform/backend/backend/copilot/sdk/service.py
- autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
- autogpt_platform/backend/backend/copilot/service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
- autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
- autogpt_platform/backend/backend/copilot/response_model.py
- autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
- autogpt_platform/backend/backend/copilot/sdk/service.py
- autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
- autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use the `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
(same seven copilot backend files as listed above)
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
(same seven copilot backend files as listed above)
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
(same seven copilot backend files as listed above)
🧬 Code graph analysis (4)
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (1)
autogpt_platform/backend/backend/copilot/response_model.py (6)
StreamToolInputAvailable (149-161), to_sse (48-60), to_sse (76-84), to_sse (178-187), to_sse (212-222), to_sse (237-239)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (1)
autogpt_platform/backend/backend/copilot/completion_consumer.py (1)
publish_operation_complete(319-349)
autogpt_platform/backend/backend/copilot/sdk/service.py (3)
autogpt_platform/backend/backend/copilot/model.py (1)
get_chat_session (342-394)
autogpt_platform/backend/backend/copilot/db.py (1)
get_chat_session (26-32)
autogpt_platform/backend/backend/copilot/response_model.py (1)
StreamToolInputAvailable(149-161)
autogpt_platform/backend/backend/copilot/service.py (5)
autogpt_platform/backend/backend/copilot/tools/models.py (2)
ErrorResponse (207-212), OperationInProgressResponse (451-459)
autogpt_platform/backend/backend/copilot/tools/create_agent.py (1)
is_long_running (51-52)
autogpt_platform/backend/backend/copilot/tools/base.py (1)
is_long_running (40-47)
autogpt_platform/backend/backend/copilot/response_model.py (2)
StreamToolInputAvailable (149-161), StreamToolOutputAvailable (164-187)
autogpt_platform/backend/backend/copilot/stream_registry.py (3)
create_task (86-177), mark_task_completed (644-700), get_task (725-750)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: types
- GitHub Check: Greptile Review
- GitHub Check: Seer Code Review
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: Check PR Status
- GitHub Check: Analyze (python)
🔇 Additional comments (12)
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py (1)
189-198: LGTM: contextvar capture for long‑running tool IDs.

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (2)
67-71: LGTM: context var for tool_use_id tracking.
301-309: LGTM: long‑running callback observability.

autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
107-162: LGTM: async/sync dummy generator flow + delayed publish.
171-238: LGTM: patch flow mirrors async/sync behavior cleanly.

autogpt_platform/backend/backend/copilot/sdk/service.py (2)
218-283: LGTM: synchronous long‑running callback flow is clearer.
908-912: LGTM: tool input event logging includes provider metadata.

autogpt_platform/backend/backend/copilot/service.py (5)
66-66: LGTM: import cleanup for updated response models.
1422-1434: LGTM: provider metadata for long‑running tools.
1632-1650: LGTM: no auto‑continuation for legacy background path.
1663-1803: LGTM: synchronous + async‑polling path is well structured.
1843-1851: LGTM: skip pending update for fallback IDs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/response_model.py`:
- Around line 50-59: The SSE serialization is logging full JSON at INFO which
can leak sensitive tool inputs; in the to_sse serialization path (use symbols
model_dump_json, callProviderMetadata, and logger.info) only serialize when
debugging is enabled and log at DEBUG level: check
logger.isEnabledFor(logging.DEBUG) before calling model_dump_json and replace
logger.info with logger.debug, and apply the same change to the other occurrence
referenced (the block around lines 158-161) so sensitive payloads are not
produced or logged at INFO.
In `@autogpt_platform/backend/backend/copilot/sdk/response_adapter.py`:
- Around line 118-149: Change the info-level debug logs that serialize tool
inputs to debug-level and guard expensive/sensitive serialization behind a
debug-enabled check: replace logger.info calls that reference provider_metadata,
the created StreamToolInputAvailable (tool_input_obj),
tool_input_obj.callProviderMetadata, tool_input_obj.model_dump_json(...) and
tool_input_obj.to_sse() with logger.debug and wrap the model_dump_json/to_sse
calls in if logger.isEnabledFor(logging.DEBUG): so serialization only happens
when DEBUG is enabled; keep responses.append(tool_input_obj) unchanged.
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 165-178: The helper is checking tool_call.get("name") but
session.messages stores OpenAI-format tool_calls with the tool name nested under
the "function" key, so update the lookup in the loop over msg.tool_calls (the
block referencing msg.tool_calls, tool_name, tool_use_id, session_id, logger) to
first try extracting the name from tool_call.get("function", {}).get("name") and
fall back to tool_call.get("name") for backward compatibility; keep the existing
MCP prefix check (f"mcp__copilot__{tool_name}") and preserve the logic that logs
tool_use_id and returns it when found.
In `@autogpt_platform/backend/backend/copilot/service.py`:
- Around line 1465-1531: The synchronous execution path that awaits
_execute_long_running_tool_with_streaming currently holds the SSE connection
idle and must be updated to send periodic heartbeat StreamToolOutputAvailable
events while waiting: wrap the await call in an async heartbeat loop (similar to
the normal tool heartbeat implementation) that schedules periodic yields of a
small heartbeat payload (e.g., {"heartbeat": true}) using the same tool_call_id/
task_id/session_id, and only stop heartbeats when the awaited
_execute_long_running_tool_with_streaming completes or raises; on exception
ensure you still mark the task via stream_registry.mark_task_completed and send
the error StreamToolOutputAvailable before returning, and on success send the
final result_str as currently done.
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt-reviewer
left a comment
📋 Final Review Summary
PR: #12191 — fix(backend/copilot): refactor long-running tools to synchronous execution
Author: majdyz | Reviewers: Swiftyos, Pwuts | Size: XL (+434/-230, 7 files) | CI: ✅ All green | CLA: ✅ Signed
Specialist Verdicts
| Reviewer | Verdict | Key Finding |
|---|---|---|
| 🛡️ Security | ✅ | No auth bypass or secrets exposure; ContextVar usage is safe |
| 🏗️ Architecture | ⚠️ | Fundamental concern: sync blocking ties up SSE connection for 5+ min; ContextVar set but unused |
| ⚡ Performance | ⚠️ | Hard polling loop (1s × 300) is wasteful; exclude_none=False increases all SSE payloads |
| 🧪 Testing | ⚠️ | No new tests for critical new code paths (polling loop, _find_latest_tool_use_id, timeout handling) |
| 📖 Quality | ⚠️ | Excessive debug logging (5+ lines per tool call in response_adapter.py); dead ContextVar code |
| 📦 Product | ⚠️ | Claude now gets real results (good!), but removal of LLM continuation breaks reconnection UX |
| 📬 Discussion | ✅ | No unresolved threads from maintainers |
| 🔎 QA Validator | ⚠️ | SSE contract change (exclude_none=False) affects all events; frontend compatibility unverified |
Blockers (Must Fix Before Merge)
1. Dead ContextVar — `_current_tool_use_id` set but never read
- `security_hooks.py:193` sets `_current_tool_use_id` for long-running tools
- But `_build_long_running_callback` in `sdk/service.py` uses `_find_latest_tool_use_id()` (a DB query) instead
- The ContextVar is imported in `security_hooks.py` from `tool_adapter.py` but never consumed by the callback
- Fix: Either use the ContextVar in the callback (faster, no DB query) or remove the dead code
2. `exclude_none=False` is a global SSE change with unverified frontend impact
- `response_model.py:50` — `to_sse()` on `StreamBaseResponse` now includes ALL None fields in every SSE event
- This affects every SSE message type (text chunks, tool starts, tool outputs, usage events, etc.), not just long-running tools
- Increases payload size for every event and may break frontend parsers that don't expect `null` fields
- Fix: Apply `exclude_none=False` only to a `StreamToolInputAvailable.to_sse()` override, not the base class
3. Hard polling loop for 202 Accepted webhooks
- `service.py:1720-1745` — `asyncio.sleep(1.0)` in a loop up to 300 iterations (5 minutes)
- This holds the SSE generator, an async task slot, and the stream lock for the entire duration
- Should use Redis pub/sub, `asyncio.Event`, or a `stream_registry` notification instead of busy-waiting
- Fix: Replace polling with event-driven notification from `completion_consumer`
Should Fix
4. Excessive debug logging in production code
- `response_adapter.py:127-146` — 5 sequential `logger.info()` calls per tool call, including:
  - Full JSON serialization test (`model_dump_json`)
  - `to_sse()` test call (serializes twice)
  - Field existence verification
- These should be `logger.debug()` at most, or removed entirely before merge
5. Fire-and-forget `asyncio.create_task()` in dummy.py
- `dummy.py:127` and `dummy.py:192` — `asyncio.create_task()` without storing a reference
- The task can be garbage-collected before completion
- Fix: Store it in a module-level `_background_tasks` set (the same pattern already used elsewhere in this codebase)
6. Removed LLM continuation breaks reconnection experience
- `service.py:1632-1636` and `service.py:1647-1650` — Both success and error paths now skip `_generate_llm_continuation`
- If a user disconnects during the 5-minute wait and reconnects, they'll see raw tool output with no AI explanation
- The PR description doesn't mention this as an intentional UX change
Nice to Have
7. callProviderMetadata hardcoded for only two tools
- `response_adapter.py:128` and `service.py:1428` — `if tool_name in ("create_agent", "edit_agent")` is fragile
- Should use the `tool.is_long_running` property (already available) instead of hardcoded names
- The non-SDK path (`service.py:1428`) correctly uses `tool.is_long_running`, but the SDK path (`response_adapter.py:128`) hardcodes names
8. `_find_latest_tool_use_id` is O(n) on session message count
- Iterates `reversed(session.messages)`, which loads all messages; for long sessions this could be slow
- If the ContextVar approach (blocker #1) is adopted, this function becomes unnecessary
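A minimal sketch of the ContextVar approach from blocker 1, which would also make the O(n) message scan unnecessary. The helper names mirror the review discussion, but the module layout is an assumption:

```python
from contextvars import ContextVar
from typing import Optional

# Set by the security hook when a long-running tool starts.
_current_tool_use_id: ContextVar[Optional[str]] = ContextVar(
    "_current_tool_use_id", default=None
)

def set_tool_use_id(tool_use_id: str) -> None:
    # Hook side: record the ID for the current execution context.
    _current_tool_use_id.set(tool_use_id)

def get_tool_use_id() -> Optional[str]:
    # Callback side: read the ID without a DB round-trip. A None result
    # would fall through to the existing "sdk-" fallback ID generation.
    return _current_tool_use_id.get()
```

Because ContextVars are scoped per async context, concurrent tool calls each see their own value without locking.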
Risk Assessment
- Merge risk: MEDIUM — Core architectural change from async→sync with several edge cases in webhook polling and SSE serialization
- Rollback difficulty: EASY — Clean revert, no migrations or schema changes
🎯 Final Verdict: REQUEST CHANGES
Recommendation
The core idea is sound — giving Claude actual tool results instead of "operation started" placeholders is a significant improvement for copilot response quality. However, three issues need addressing before merge:
- The dead ContextVar code suggests incomplete refactoring — either the DB-query approach or the ContextVar approach should be chosen, not both partially implemented
- The `exclude_none=False` change on the base class is a shotgun change that affects all SSE consumers — scope it to just the events that need it
- The hard polling loop is a reliability concern for production — a 5-minute busy-wait holding SSE resources will cause problems under load
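Scoping the serialization change to one event type, as recommended, could look like the following stdlib stand-in. The real models are Pydantic (where the override would pass `exclude_none=False` to `model_dump_json`); the dataclass fields here are illustrative:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class StreamBaseResponse:
    type: str
    usage: Optional[dict] = None

    def to_sse(self) -> str:
        # Base behavior: drop None fields so existing SSE consumers
        # that don't expect nulls are unaffected.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return f"data: {json.dumps(payload)}\n\n"

@dataclass
class StreamToolInputAvailable(StreamBaseResponse):
    type: str = "tool-input-available"
    toolCallId: str = ""
    callProviderMetadata: Optional[dict] = None

    def to_sse(self) -> str:
        # Only this event preserves explicit nulls (the
        # exclude_none=False equivalent), scoped away from the base class.
        return f"data: {json.dumps(asdict(self))}\n\n"
```

With this shape, every other event keeps its existing wire format, and only the tool-input event carries explicit `null` fields.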
The debug logging and fire-and-forget tasks are lower priority but should be cleaned up.
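Guarding the expensive serialization behind a DEBUG check, as the review asks, is a small change. A hedged sketch (the logger name and `log_tool_input` helper are illustrative; `serialize` stands in for a call like `model_dump_json`):

```python
import logging

logger = logging.getLogger("copilot.sse")

def log_tool_input(serialize) -> None:
    # Serialize only when DEBUG is enabled: expensive and potentially
    # sensitive payloads are never produced or logged at INFO.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("tool input payload: %s", serialize())
```

Passing a callable (rather than a pre-built string) means the serialization cost is paid only when the log line will actually be emitted.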
For Maintainers
@Swiftyos @Pwuts — This PR is closely related to #12190 (same author, same infrastructure). The sync-vs-async decision is architecturally important for the copilot's long-running tool story. Recommend reviewing blockers #1-3 with the author before approving.
…d cleanup

This commit completes the synchronous long-running tool execution refactor by:

1. **Fixes all PR review issues**:
   - Move logging import to module level in response_model.py
   - Change INFO logging to DEBUG with isEnabledFor guards to prevent sensitive data leakage
   - Fix tool name lookup in SDK to check function.name (OpenAI format)
   - Add heartbeats during synchronous execution to keep SSE connection alive
   - Fix polling timeout to release Redis lock and mark task as failed
   - Track background tasks in dummy.py to prevent garbage collection

2. **Add frontend support for callProviderMetadata**:
   - Update CreateAgent.tsx to detect long-running via callProviderMetadata.isLongRunning
   - Update EditAgent.tsx with the same detection logic
   - Maintain backwards compatibility with operation_started/pending output types

3. **Remove dead backend code**:
   - Remove unused _background_tasks set from service.py (only used in dummy.py now)
   - Remove dead _execute_long_running_tool function (never called)
   - Add comments clarifying legacy vs new execution paths

4. **Update frontend comments**:
   - Clarify that useLongRunningToolPolling is for the async fallback and legacy sessions
   - Document that modern synchronous execution bypasses polling

The synchronous execution flow now works end-to-end:
- Backend sets callProviderMetadata.isLongRunning on tool input
- Backend blocks synchronously with heartbeats
- Frontend shows mini-game immediately when detecting isLongRunning
- Backend returns actual result after completion
- Frontend displays result (single LLM continuation)

The async fallback path (202 Accepted + webhooks) is still supported for external services.
Actionable comments posted: 5
🧹 Nitpick comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (1)
154-166: Add `exc_info=True` to error logs in background publish functions to preserve tracebacks

Both `_publish_dummy_result_after_delay` and `_publish_dummy_patch_after_delay` log only `str(e)` on failure, discarding the full traceback. This makes diagnosing Redis connectivity or serialization failures in dev/CI very difficult.

♻️ Proposed fix (applies to both functions):

```diff
-        except Exception as e:
-            logger.error(f"[Dummy] Failed to publish to Redis Streams: {e}")
+        except Exception:
+            logger.error("[Dummy] Failed to publish to Redis Streams", exc_info=True)
```

Also applies to: 233-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py` around lines 154-166: The error handlers in _publish_dummy_result_after_delay and _publish_dummy_patch_after_delay currently log only the exception string, losing tracebacks; update both except blocks to pass exc_info=True to logger.error (keeping the existing message and exception variable e) so the full traceback is preserved for debugging when publish_operation_complete or publish_operation_patch fails.

autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx (1)
124-126: Extract the `callProviderMetadata` cast into a shared helper to reduce duplication.

The same `as unknown as { callProviderMetadata?: ... }` cast appears in both `CreateAgent.tsx` and `EditAgent.tsx`. Consider extracting a small typed helper (e.g., in a shared helpers file) that safely extracts `isLongRunning` from a tool part.

♻️ Example helper:

```ts
// e.g., in a shared copilot/tools/helpers.ts
export function isLongRunningTool(part: { [key: string]: unknown }): boolean {
  const meta = (part as Record<string, unknown>).callProviderMetadata;
  if (meta && typeof meta === "object" && "isLongRunning" in meta) {
    return (meta as { isLongRunning?: boolean }).isLongRunning === true;
  }
  return false;
}
```
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx` around lines 124-126: Extract the repeated cast logic into a shared helper (e.g., isLongRunningTool) and use it from both CreateAgent.tsx and EditAgent.tsx: implement a small function that accepts the tool part (typed like Record<string, unknown>) and safely reads callProviderMetadata?.isLongRunning, returning a boolean; then replace the inline cast expression in CreateAgent (the isLongRunning computation) and the analogous expression in EditAgent with calls to this helper.

autogpt_platform/backend/backend/copilot/sdk/service.py (1)
140-191: Race condition concern is mitigated by cache synchronization and fallback ID generation.

The session message is persisted before `_callback` fires. At line 956, `upsert_chat_session()` saves the `AssistantMessage` with tool calls to both the database and Redis cache synchronously via `cache_chat_session()`. The callback then queries the cached session, which should reflect the update. However, in a distributed system, a brief staleness window is theoretically possible. The fallback ID generation at lines 225–231 adequately handles this case by generating an `sdk-` ID if the tool_use_id lookup fails. The comment at line 220 asserting "CRITICAL" and "already saved" is accurate for database persistence, but the fallback design already accounts for any transient cache inconsistencies, making additional retry/delay logic unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 140-191: The comment asserting "CRITICAL" and that the tool_use_id is "already saved" should be softened to reflect that while upsert_chat_session persists to DB and cache, a transient cache staleness window is possible and is handled by the fallback `sdk-` ID generation; update the wording near upsert_chat_session and the fallback ID generation (the code that emits an `sdk-` ID) to remove the absolute claim, note that _find_latest_tool_use_id may return None if the cache is briefly stale, and reference that the fallback path will generate a safe `sdk-` ID so no additional retry/delay logic is required.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- autogpt_platform/backend/backend/copilot/response_model.py
- autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
- autogpt_platform/backend/backend/copilot/sdk/service.py
- autogpt_platform/backend/backend/copilot/service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- autogpt_platform/backend/backend/copilot/response_model.py
🧰 Additional context used
📓 Path-based instructions (15)
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}: Use Node.js 21+ with pnpm package manager for frontend development
Always run 'pnpm format' for formatting and linting code in frontend development
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/**/*.{tsx,ts}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/frontend/**/*.{tsx,ts}: Use function declarations for components and handlers (not arrow functions) in React components
Only use arrow functions for small inline lambdas (map, filter, etc.) in React components
Use PascalCase for component names and camelCase with 'use' prefix for hook names in React
Use Tailwind CSS utilities only for styling in frontend components
Use design system components from 'src/components/' (atoms, molecules, organisms) in frontend development
Never use 'src/components/legacy/' in frontend code
Only use Phosphor Icons (@phosphor-icons/react) for icons in frontend components
Use generated API hooks from '@/app/api/generated/endpoints/' instead of deprecated 'BackendAPI' or 'src/lib/autogpt-server-api/'
Use React Query for server state (via generated hooks) in frontend development
Default to client components ('use client') in Next.js; only use server components for SEO or extreme TTFB needs
Use the 'ErrorCard' component for rendering errors in frontend UI; use toast notifications for mutation errors; use 'Sentry.captureException()' for manual exceptions
Separate render logic from data/behavior in React components; keep comments minimal (code should be self-documenting)
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/frontend/**/*.{ts,tsx}: No barrel files or 'index.ts' re-exports in frontend code
Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
autogpt_platform/frontend/src/**/*.{ts,tsx}: Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
Use function declarations (not arrow functions) for components and handlers
Separate render logic (`.tsx`) from business logic (`use*.ts` hooks)
Use shadcn/ui (Radix UI primitives) with Tailwind CSS styling for UI components
Use Phosphor Icons only for icons
Use ErrorCard for render errors, toast for mutations, and Sentry for exceptions
Use design system components from `src/components/` (atoms, molecules, organisms)
Never use `src/components/__legacy__/*` components
Use generated API hooks from `@/app/api/__generated__/endpoints/` with pattern `use{Method}{Version}{OperationName}`
Use Tailwind CSS only for styling, with design tokens
Do not use `useCallback` or `useMemo` unless asked to optimize a given function
Never type with `any` unless a variable/attribute can ACTUALLY be of any type
autogpt_platform/frontend/src/**/*.{ts,tsx}: Structure components as `ComponentName/ComponentName.tsx` + `useComponentName.ts` + `helpers.ts` and use design system components from `src/components/` (atoms, molecules, organisms)
Use generated API hooks from `@/app/api/__generated__/endpoints/` with pattern `use{Method}{Version}{OperationName}` and regenerate with `pnpm generate:api`
Use function declarations (not arrow functions) for components and handlers
Separate render logic from business logic with component.tsx + useComponent.ts + helpers.ts structure
Colocate state when possible, avoid creating large components, use sub-components in a local `components/` folder
Avoid large hooks, abstract logic into `helpers.ts` files when sensible
Use arrow functions only for callbacks, not for component declarations
Avoid comments at all times unless the code is very complex
Do not use `useCallback` or `useMemo` unless asked to optimize a given function
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/src/**/*.tsx
📄 CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
Component props should be `type Props = { ... }` (not exported) unless they need to be used outside the component

Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx}: Format frontend code using `pnpm format`
Never use components from `src/components/__legacy__/*`
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx,css}
📄 CodeRabbit inference engine (AGENTS.md)
Use Tailwind CSS only for styling, use design tokens, and use Phosphor Icons only
Files:
(same three copilot frontend files as listed above)
autogpt_platform/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Never type with `any`; if no types are available, use `unknown`
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/app/(platform)/**/*.tsx
📄 CodeRabbit inference engine (AGENTS.md)
If adding protected frontend routes, update `frontend/lib/supabase/middleware.ts`
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/frontend/src/**/*use*.ts
📄 CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
Do not type hook returns, let TypeScript infer as much as possible
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
Do not type hook returns, let Typescript infer as much as possible
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
🧠 Learnings (1)
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/frontend/**/*.{ts,tsx} : Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development
Applied to files:
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
🧬 Code graph analysis (5)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx (1)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/helpers.tsx (3)
isOperationStartedOutput (75-82), isOperationPendingOutput (84-88), isOperationInProgressOutput (90-97)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx (1)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/helpers.tsx (3)
isOperationStartedOutput (75-82), isOperationPendingOutput (84-88), isOperationInProgressOutput (90-97)
autogpt_platform/backend/backend/copilot/sdk/service.py (3)
autogpt_platform/backend/backend/copilot/db.py (1)
get_chat_session (26-32)
autogpt_platform/backend/backend/copilot/model.py (1)
get_chat_session (342-394)
autogpt_platform/backend/backend/copilot/response_model.py (1)
StreamToolInputAvailable (149-161)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (3)
autogpt_platform/backend/backend/util/logging.py (2)
info (41-43), error (49-51)
autogpt_platform/backend/backend/copilot/stream_registry.py (1)
create_task (86-177)
autogpt_platform/backend/backend/copilot/completion_consumer.py (1)
publish_operation_complete (319-349)
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (2)
autogpt_platform/backend/backend/util/logging.py (1)
debug (53-55)
autogpt_platform/backend/backend/copilot/response_model.py (6)
StreamToolInputAvailable (149-161), to_sse (51-60), to_sse (76-84), to_sse (178-187), to_sse (212-222), to_sse (237-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: types
- GitHub Check: Seer Code Review
- GitHub Check: end-to-end tests
- GitHub Check: Analyze (python)
- GitHub Check: Check PR Status
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
🔇 Additional comments (9)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (1)
16-17: Background task tracking pattern is correct
`_background_tasks.add(bg_task)` followed by `bg_task.add_done_callback(_background_tasks.discard)` is the idiomatic asyncio pattern for preventing premature GC of fire-and-forget tasks, and the callback receives the task as its sole argument, matching `set.discard`'s signature.
Also applies to: 123-127, 189-195
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (1)
117-150: LGTM — debug logging properly guarded and provider metadata cleanly set.
The previous review feedback has been addressed: logging is at `DEBUG` level and serialization is gated behind `logger.isEnabledFor(logging.DEBUG)`.
autogpt_platform/backend/backend/copilot/sdk/service.py (2)
243-284: Synchronous execution path looks correct.
The refactor from async background tasks to synchronous blocking is clean:
- Tool executes via `await _execute_long_running_tool_with_streaming`
- Fallback to a generic message if `result_content` is `None` (Line 272)
- Returns actual content to Claude as a single MCP result
One minor note: Line 272 returns a hardcoded JSON string `'{"message": "Operation completed"}'` while `_execute_long_running_tool_with_streaming` (in service.py Line 1545) uses the same fallback string. This is consistent.
909-912: LGTM — safe attribute access for debug logging.
Using `getattr(response, "callProviderMetadata", None)` is appropriate here since `response` is typed as `StreamBaseResponse`, which doesn't necessarily have that field.
autogpt_platform/backend/backend/copilot/service.py (4)
1420-1433: LGTM — provider metadata for long-running tools in the non-SDK path.
Consistent with the SDK path in `response_adapter.py`. The `get_tool` lookup and `is_long_running` check is a cleaner approach than hardcoding tool names.
1484-1535: Heartbeat loop during synchronous long-running execution looks correct.
This addresses the prior review feedback. The `asyncio.shield` + `asyncio.wait_for` pattern is consistent with the normal tool heartbeat loop (Lines 1563-1568) and correctly protects the inner task from cancellation on timeout.
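A minimal sketch of the `shield` + `wait_for` heartbeat idiom, under the assumption that heartbeats are emitted between timeouts (the `long_tool` name and delays are illustrative, not the real service code):

```python
import asyncio


async def long_tool() -> str:
    # Stand-in for a long-running tool call.
    await asyncio.sleep(0.3)
    return "done"


async def run_with_heartbeats() -> tuple[str, int]:
    heartbeats = 0
    inner = asyncio.ensure_future(long_tool())
    while True:
        try:
            # shield() decouples the inner task from wait_for's cancellation:
            # a timeout cancels only the shield wrapper, never the tool itself,
            # so each TimeoutError is just a chance to emit a keep-alive.
            result = await asyncio.wait_for(asyncio.shield(inner), timeout=0.1)
            return result, heartbeats
        except asyncio.TimeoutError:
            heartbeats += 1  # the real service would publish an SSE heartbeat here


result, beats = asyncio.run(run_with_heartbeats())
print(result, beats)
```

Without `shield`, the first `wait_for` timeout would cancel the tool task outright, and the retry loop would await an already-cancelled future.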
1796-1803: LGTM — skip DB update for fallback SDK IDs.
Correctly prevents attempting to update messages by a generated `sdk-` ID that doesn't match any stored message.
66-66: LGTM — import cleanup consistent with removal of pending/started flow.
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts (1)
230-233: LGTM — comment accurately reflects the updated architecture.
The documentation now correctly distinguishes between the async webhook fallback path, legacy operation states, and the new synchronous execution path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/service.py`:
- Around line 1673-1731: The polling success path returns result_content without
clearing the running-operation key, so when delegated_to_async is True the
finally block won't run cleanup; add an awaited call to
_mark_operation_completed(tool_call_id) immediately after detecting the task
completed (before querying the DB or before returning result_content) to ensure
RUNNING_OPERATION_PREFIX+tool_call_id is cleared—place the await
_mark_operation_completed(tool_call_id) in the same branch where
task_status.status != "running" (or just before returning result_content) and
keep error handling minimal (let any exception bubble or log) so the final DB
lookup and return still proceed.
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py`:
- Around line 228-231: The background task _publish_dummy_patch_after_delay
currently captures current_agent by reference and only calls
current_agent.copy() after awaiting asyncio.sleep(delay_seconds), which risks
patching a mutated dict; fix by making a defensive shallow copy of current_agent
(e.g., saved_agent = current_agent.copy()) and any data needed from
update_request before the asyncio.sleep call, then use saved_agent to build
patched and set patched["description"] = f"{saved_agent.get('description', '')}
(updated: {update_request})" after the delay so the patch operates on the
immutable snapshot.
- Around line 154-166: The two helper functions
_publish_dummy_result_after_delay and _publish_dummy_patch_after_delay emit
inconsistent result payloads—one sends result={"agent_json": agent_json} while
the other sends result={"type":"agent","agent_json": patched"}—and the "type"
field is unused downstream; standardize both to the same shape by removing the
unused "type" key (or add it to both if intentionally needed) so both publish
helpers emit result with the identical top-level keys (e.g.,
result={"agent_json": ...}), and update the logger messages if needed to reflect
the standardized payload; ensure completion_handler.py behavior remains the same
by keeping agent_json present.
In `@autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx`:
- Around line 122-133: hasExpandableContent currently only checks part.state ===
"output-available", so when a tool is synchronous-blocking (isLongRunning ===
true) the UI sets isOperating but the ToolAccordion/MiniGame never renders;
update the hasExpandableContent logic to also treat isLongRunning (or
isOperating) as allowing expandable content (i.e., include isLongRunning ||
isOperating in the condition that previously required part.state ===
"output-available"), and apply the same change in the corresponding EditAgent
component where hasExpandableContent is computed.
In `@autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx`:
- Around line 107-119: hasExpandableContent currently only returns true when
part.state === "output-available", so ToolAccordion/MiniGame won't render during
synchronous blocking; update the hasExpandableContent computation to also allow
isOperating (which is computed from isLongRunning and output checks) to enable
UI while the backend is blocking—modify the expression that defines
hasExpandableContent in EditAgent.tsx to include isOperating (or isLongRunning)
alongside the existing part.state === "output-available" check so ToolAccordion
and MiniGame render during long-running synchronous operations.
---
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 140-191: The comment asserting "CRITICAL" and that the tool_use_id
is "already saved" should be softened to reflect that while upsert_chat_session
persists to DB and cache, a transient cache staleness window is possible and is
handled by the fallback sdk- ID generation; update the wording near
upsert_chat_session and the fallback ID generation (the code that emits an sdk-
ID) to remove the absolute claim, note that _find_latest_tool_use_id may return
None if cache is briefly stale, and reference that the fallback path will
generate a safe sdk- ID so no additional retry/delay logic is required.
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py`:
- Around line 154-166: The error handlers in _publish_dummy_result_after_delay
and _publish_dummy_patch_after_delay currently log only the exception string,
losing tracebacks; update both except blocks to pass exc_info=True to
logger.error (keeping the existing message and exception variable e) so the full
traceback is preserved for debugging when publish_operation_complete or
publish_operation_patch fails.
In
`@autogpt_platform/frontend/src/app/`(platform)/copilot/tools/CreateAgent/CreateAgent.tsx:
- Around line 124-126: Extract the repeated cast logic into a shared helper
(e.g., isLongRunningTool) and use it from both CreateAgent.tsx and
EditAgent.tsx: implement a small function that accepts the tool part (type like
Record<string, unknown> or { [key: string]: unknown }) and safely reads
callProviderMetadata?.isLongRunning returning a boolean, then replace the inline
cast expression in CreateAgent (the isLongRunning computation) and the analogous
expression in EditAgent to call this helper.
…y race condition
- Fix hasExpandableContent to check isLongRunning first (CreateAgent/EditAgent). Without this, MiniGame accordion won't expand during synchronous blocking
- Fix Redis lock not released in polling success path (service.py:1731)
- Fix race condition in dummy.py: copy current_agent BEFORE sleep
- Standardize dummy helper payloads (remove inconsistent "type" field)
- Add exc_info=True to error logging in dummy helpers
0198c06 to 952a25c
- Fix _stream_listener infinite loop when session metadata expires:
hget("status") returns None after TTL, but `None and x` is False
so listener never breaks, keeping SSE open and chat locked forever.
Now treats missing metadata as "not running" and exits.
- Cancel now breaks the stream loop immediately instead of draining
- Remove redundant StreamFinishStep publish from error handler
- Publish StreamFinishStep to Redis (step-level signal for frontend)
- Catch BaseException in executor safety net
- Log errors from safety-net mark_session_completed instead of silencing
- Enable refetchOnMount/refetchOnReconnect on frontend session query
952a25c to 5d9b122
…initialization
The session-scoped async test data fixtures do DB operations via Prisma but didn't depend on the `server` fixture (which calls `db.connect()`). Without the explicit dependency, pytest can't guarantee `server` runs first, causing "Event loop is closed" errors in CI.
Issues attributed to commits in this pull request
This pull request was merged and Sentry observed the following issues:
Summary
- Fixed `_stream_listener` infinite xread loop when Redis session metadata TTL expired — `hget` returned `None`, which bypassed the `status != "running"` break condition. Fixed by treating `None` status as non-running.
- `mark_session_completed` is invoked when the executor doesn't respond within 5s. Returns `cancelled=True` for already-expired sessions.
- `yield StreamFinish()` removed from service layers (sdk/service.py, service.py, dummy.py). `mark_session_completed` is now the single atomic source of truth for publishing StreamFinish via Lua CAS script.
- Frontend session query now uses `refetchOnMount: true` so page refresh re-fetches session state.
Test plan