fix(platform/copilot): fix stuck sessions, stop button, and StreamFinish reliability#12191
Conversation
…ution

Removes background task spawning for long-running tools (create_agent, edit_agent) in favor of synchronous blocking execution. This simplifies the conversation flow by returning actual results to Claude in a single LLM continuation instead of two separate continuations.

Changes:
- Remove asyncio.create_task() background task spawning
- Execute tools synchronously with await
- Fix database access to use get_chat_session().messages
- Remove unused pending_msg/started_msg variables
- Fix dummy agent generator to match real behavior
- Add callProviderMetadata for frontend mini-game indication

Benefits:
- Single LLM continuation with actual result (not "operation started")
- Clearer conversation flow
- Frontend mini-game still shows via SSE during blocking execution
- Simplified codebase without background task tracking
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
Walkthrough

Adds provider metadata to tool-input SSE and surfaces long-running tools to the frontend; persists tool_use_id in a ContextVar for selected tools; converts long-running tool execution into a synchronous streaming path that returns final results; introduces async publish behavior for dummy agent tools and stale-task auto-completion in the stream registry.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Frontend as Frontend
    participant Service as Copilot Service
    participant Adapter as SDK Adapter
    participant Tool as Tool Executor
    participant Registry as Stream Registry
    Frontend->>Service: Request long-running tool (e.g. create_agent)
    Service->>Frontend: Emit StreamToolInputAvailable (callProviderMetadata: {isLongRunning:true})
    Frontend-->>Frontend: Render long-running UI state
    Service->>Adapter: Invoke tool handler
    Adapter->>Adapter: Set _current_tool_use_id ContextVar
    Adapter->>Tool: Execute tool synchronously (streaming)
    Tool-->>Adapter: Stream outputs / final result
    Adapter-->>Service: Return final result content
    Service->>Registry: Save tool call state & emit StreamToolOutputAvailable (result)
    Frontend-->>Frontend: Display final result and clear indicator
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
🔍 PR Overlap Detection: 8 conflict(s), 0 medium risk, 4 low risk (out of 12 PRs with file overlap). Consider coordinating with the authors of conflicting PRs.
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
Additional Comments (2)
The function signature indicates it should return
These debug logging statements (lines 66-79) with serialization tests add overhead to every tool call. The PR description mentions this passed format/lint, but they should be removed for production.
Actionable comments posted: 4
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- autogpt_platform/backend/backend/copilot/response_model.py
- autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
- autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
- autogpt_platform/backend/backend/copilot/sdk/service.py
- autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
- autogpt_platform/backend/backend/copilot/service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
- autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
- autogpt_platform/backend/backend/copilot/response_model.py
- autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
- autogpt_platform/backend/backend/copilot/sdk/service.py
- autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
- autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use the `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
(same seven copilot backend files as listed above)
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
(same seven copilot backend files as listed above)
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
(same seven copilot backend files as listed above)
🧬 Code graph analysis (4)
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (1)
autogpt_platform/backend/backend/copilot/response_model.py (6)
StreamToolInputAvailable (149-161), to_sse (48-60), to_sse (76-84), to_sse (178-187), to_sse (212-222), to_sse (237-239)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (1)
autogpt_platform/backend/backend/copilot/completion_consumer.py (1)
publish_operation_complete(319-349)
autogpt_platform/backend/backend/copilot/sdk/service.py (3)
autogpt_platform/backend/backend/copilot/model.py (1)
get_chat_session (342-394)
autogpt_platform/backend/backend/copilot/db.py (1)
get_chat_session (26-32)
autogpt_platform/backend/backend/copilot/response_model.py (1)
StreamToolInputAvailable(149-161)
autogpt_platform/backend/backend/copilot/service.py (5)
autogpt_platform/backend/backend/copilot/tools/models.py (2)
ErrorResponse (207-212), OperationInProgressResponse (451-459)
autogpt_platform/backend/backend/copilot/tools/create_agent.py (1)
is_long_running (51-52)
autogpt_platform/backend/backend/copilot/tools/base.py (1)
is_long_running (40-47)
autogpt_platform/backend/backend/copilot/response_model.py (2)
StreamToolInputAvailable (149-161), StreamToolOutputAvailable (164-187)
autogpt_platform/backend/backend/copilot/stream_registry.py (3)
create_task (86-177), mark_task_completed (644-700), get_task (725-750)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: types
- GitHub Check: Greptile Review
- GitHub Check: Seer Code Review
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: Check PR Status
- GitHub Check: Analyze (python)
🔇 Additional comments (12)
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py (1)
189-198: LGTM: contextvar capture for long‑running tool IDs.

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (2)
67-71: LGTM: context var for tool_use_id tracking.
301-309: LGTM: long‑running callback observability.

autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (2)
107-162: LGTM: async/sync dummy generator flow + delayed publish.
171-238: LGTM: patch flow mirrors async/sync behavior cleanly.

autogpt_platform/backend/backend/copilot/sdk/service.py (2)
218-283: LGTM: synchronous long‑running callback flow is clearer.
908-912: LGTM: tool input event logging includes provider metadata.

autogpt_platform/backend/backend/copilot/service.py (5)
66-66: LGTM: import cleanup for updated response models.
1422-1434: LGTM: provider metadata for long‑running tools.
1632-1650: LGTM: no auto‑continuation for legacy background path.
1663-1803: LGTM: synchronous + async‑polling path is well structured.
1843-1851: LGTM: skip pending update for fallback IDs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/response_model.py`:
- Around line 50-59: The SSE serialization is logging full JSON at INFO which
can leak sensitive tool inputs; in the to_sse serialization path (use symbols
model_dump_json, callProviderMetadata, and logger.info) only serialize when
debugging is enabled and log at DEBUG level: check
logger.isEnabledFor(logging.DEBUG) before calling model_dump_json and replace
logger.info with logger.debug, and apply the same change to the other occurrence
referenced (the block around lines 158-161) so sensitive payloads are not
produced or logged at INFO.
In `@autogpt_platform/backend/backend/copilot/sdk/response_adapter.py`:
- Around line 118-149: Change the info-level debug logs that serialize tool
inputs to debug-level and guard expensive/sensitive serialization behind a
debug-enabled check: replace logger.info calls that reference provider_metadata,
the created StreamToolInputAvailable (tool_input_obj),
tool_input_obj.callProviderMetadata, tool_input_obj.model_dump_json(...) and
tool_input_obj.to_sse() with logger.debug and wrap the model_dump_json/to_sse
calls in if logger.isEnabledFor(logging.DEBUG): so serialization only happens
when DEBUG is enabled; keep responses.append(tool_input_obj) unchanged.
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 165-178: The helper is checking tool_call.get("name") but
session.messages stores OpenAI-format tool_calls with the tool name nested under
the "function" key, so update the lookup in the loop over msg.tool_calls (the
block referencing msg.tool_calls, tool_name, tool_use_id, session_id, logger) to
first try extracting the name from tool_call.get("function", {}).get("name") and
fall back to tool_call.get("name") for backward compatibility; keep the existing
MCP prefix check (f"mcp__copilot__{tool_name}") and preserve the logic that logs
tool_use_id and returns it when found.
In `@autogpt_platform/backend/backend/copilot/service.py`:
- Around line 1465-1531: The synchronous execution path that awaits
_execute_long_running_tool_with_streaming currently holds the SSE connection
idle and must be updated to send periodic heartbeat StreamToolOutputAvailable
events while waiting: wrap the await call in an async heartbeat loop (similar to
the normal tool heartbeat implementation) that schedules periodic yields of a
small heartbeat payload (e.g., {"heartbeat": true}) using the same tool_call_id/
task_id/session_id, and only stop heartbeats when the awaited
_execute_long_running_tool_with_streaming completes or raises; on exception
ensure you still mark the task via stream_registry.mark_task_completed and send
the error StreamToolOutputAvailable before returning, and on success send the
final result_str as currently done.
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt-reviewer
left a comment
📋 Final Review Summary
PR: #12191 — fix(backend/copilot): refactor long-running tools to synchronous execution
Author: majdyz | Reviewers: Swiftyos, Pwuts | Size: XL (+434/-230, 7 files) | CI: ✅ All green | CLA: ✅ Signed
Specialist Verdicts
| Reviewer | Verdict | Key Finding |
|---|---|---|
| 🛡️ Security | ✅ | No auth bypass or secrets exposure; ContextVar usage is safe |
| 🏗️ Architecture | ⚠️ | Fundamental concern: sync blocking ties up SSE connection for 5+ min; ContextVar set but unused |
| ⚡ Performance | ⚠️ | Hard polling loop (1s × 300) is wasteful; exclude_none=False increases all SSE payloads |
| 🧪 Testing | ⚠️ | No new tests for critical new code paths (polling loop, _find_latest_tool_use_id, timeout handling) |
| 📖 Quality | ⚠️ | Excessive debug logging (5+ lines per tool call in response_adapter.py); dead ContextVar code |
| 📦 Product | ⚠️ | Claude now gets real results (good!), but removal of LLM continuation breaks reconnection UX |
| 📬 Discussion | ✅ | No unresolved threads from maintainers |
| 🔎 QA Validator | ⚠️ | SSE contract change (exclude_none=False) affects all events; frontend compatibility unverified |
Blockers (Must Fix Before Merge)
1. Dead ContextVar — `_current_tool_use_id` set but never read
- `security_hooks.py:193` sets `_current_tool_use_id` for long-running tools
- But `_build_long_running_callback` in `sdk/service.py` uses `_find_latest_tool_use_id()` (a DB query) instead
- The ContextVar is imported in `security_hooks.py` from `tool_adapter.py` but never consumed by the callback
- Fix: Either use the ContextVar in the callback (faster, no DB query) or remove the dead code
2. `exclude_none=False` is a global SSE change with unverified frontend impact
- `response_model.py:50` — `to_sse()` on `StreamBaseResponse` now includes ALL None fields in every SSE event
- This affects every SSE message type (text chunks, tool starts, tool outputs, usage events, etc.), not just long-running tools
- Increases payload size for every event and may break frontend parsers that don't expect `null` fields
- Fix: Apply `exclude_none=False` only to a `StreamToolInputAvailable.to_sse()` override, not the base class
3. Hard polling loop for 202 Accepted webhooks
- `service.py:1720-1745` — `asyncio.sleep(1.0)` in a loop up to 300 iterations (5 minutes)
- This holds the SSE generator, an async task slot, and the stream lock for the entire duration
- Should use Redis pub/sub, `asyncio.Event`, or a `stream_registry` notification instead of busy-waiting
- Fix: Replace polling with event-driven notification from `completion_consumer`
Should Fix
4. Excessive debug logging in production code
- `response_adapter.py:127-146` — 5 sequential `logger.info()` calls per tool call, including:
  - Full JSON serialization test (`model_dump_json`)
  - `to_sse()` test call (serializes twice)
  - Field existence verification
- These should be `logger.debug()` at most, or removed entirely before merge
5. Fire-and-forget `asyncio.create_task()` in dummy.py
- `dummy.py:127` and `dummy.py:192` — `asyncio.create_task()` without storing a reference
- The task can be garbage-collected before completion
- Fix: Store it in a module-level `_background_tasks` set (the same pattern already used elsewhere in this codebase)
6. Removed LLM continuation breaks reconnection experience
- `service.py:1632-1636` and `service.py:1647-1650` — Both success and error paths now skip `_generate_llm_continuation`
- If a user disconnects during the 5-minute wait and reconnects, they'll see raw tool output with no AI explanation
- The PR description doesn't mention this as an intentional UX change
Nice to Have
7. callProviderMetadata hardcoded for only two tools
- `response_adapter.py:128` and `service.py:1428` — `if tool_name in ("create_agent", "edit_agent")` is fragile
- Should use the `tool.is_long_running` property (already available) instead of hardcoded names
- The non-SDK path (`service.py:1428`) correctly uses `tool.is_long_running`, but the SDK path (`response_adapter.py:128`) hardcodes names
8. `_find_latest_tool_use_id` is O(n) on session message count
- Iterates `reversed(session.messages)`, which loads all messages; for long sessions this could be slow
- If the ContextVar approach (blocker #1) is adopted, this function becomes unnecessary
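A minimal sketch of the ContextVar approach from blocker 1, which would also make the O(n) message scan unnecessary. The helper names mirror the review discussion, but the module layout is an assumption:

```python
from contextvars import ContextVar
from typing import Optional

# Set by the security hook when a long-running tool starts.
_current_tool_use_id: ContextVar[Optional[str]] = ContextVar(
    "_current_tool_use_id", default=None
)

def set_tool_use_id(tool_use_id: str) -> None:
    # Hook side: record the ID for the current execution context.
    _current_tool_use_id.set(tool_use_id)

def get_tool_use_id() -> Optional[str]:
    # Callback side: read the ID without a DB round-trip. A None result
    # would fall through to the existing "sdk-" fallback ID generation.
    return _current_tool_use_id.get()
```

Because ContextVars are scoped per async context, concurrent tool calls each see their own value without locking.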
Risk Assessment
- Merge risk: MEDIUM — Core architectural change from async→sync with several edge cases in webhook polling and SSE serialization
- Rollback difficulty: EASY — Clean revert, no migrations or schema changes
🎯 Final Verdict: REQUEST CHANGES
Recommendation
The core idea is sound — giving Claude actual tool results instead of "operation started" placeholders is a significant improvement for copilot response quality. However, three issues need addressing before merge:
- The dead ContextVar code suggests incomplete refactoring — either the DB-query approach or the ContextVar approach should be chosen, not both partially implemented
- The `exclude_none=False` change on the base class is a shotgun change that affects all SSE consumers — scope it to just the events that need it
- The hard polling loop is a reliability concern for production — a 5-minute busy-wait holding SSE resources will cause problems under load
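Scoping the serialization change to one event type, as recommended, could look like the following stdlib stand-in. The real models are Pydantic (where the override would pass `exclude_none=False` to `model_dump_json`); the dataclass fields here are illustrative:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class StreamBaseResponse:
    type: str
    usage: Optional[dict] = None

    def to_sse(self) -> str:
        # Base behavior: drop None fields so existing SSE consumers
        # that don't expect nulls are unaffected.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return f"data: {json.dumps(payload)}\n\n"

@dataclass
class StreamToolInputAvailable(StreamBaseResponse):
    type: str = "tool-input-available"
    toolCallId: str = ""
    callProviderMetadata: Optional[dict] = None

    def to_sse(self) -> str:
        # Only this event preserves explicit nulls (the
        # exclude_none=False equivalent), scoped away from the base class.
        return f"data: {json.dumps(asdict(self))}\n\n"
```

With this shape, every other event keeps its existing wire format, and only the tool-input event carries explicit `null` fields.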
The debug logging and fire-and-forget tasks are lower priority but should be cleaned up.
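Guarding the expensive serialization behind a DEBUG check, as the review asks, is a small change. A hedged sketch (the logger name and `log_tool_input` helper are illustrative; `serialize` stands in for a call like `model_dump_json`):

```python
import logging

logger = logging.getLogger("copilot.sse")

def log_tool_input(serialize) -> None:
    # Serialize only when DEBUG is enabled: expensive and potentially
    # sensitive payloads are never produced or logged at INFO.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("tool input payload: %s", serialize())
```

Passing a callable (rather than a pre-built string) means the serialization cost is paid only when the log line will actually be emitted.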
For Maintainers
@Swiftyos @Pwuts — This PR is closely related to #12190 (same author, same infrastructure). The sync-vs-async decision is architecturally important for the copilot's long-running tool story. Recommend reviewing blockers #1-3 with the author before approving.
…d cleanup

This commit completes the synchronous long-running tool execution refactor by:

1. **Fixes all PR review issues**:
   - Move logging import to module level in response_model.py
   - Change INFO logging to DEBUG with isEnabledFor guards to prevent sensitive data leakage
   - Fix tool name lookup in SDK to check function.name (OpenAI format)
   - Add heartbeats during synchronous execution to keep SSE connection alive
   - Fix polling timeout to release Redis lock and mark task as failed
   - Track background tasks in dummy.py to prevent garbage collection

2. **Add frontend support for callProviderMetadata**:
   - Update CreateAgent.tsx to detect long-running via callProviderMetadata.isLongRunning
   - Update EditAgent.tsx with the same detection logic
   - Maintain backwards compatibility with operation_started/pending output types

3. **Remove dead backend code**:
   - Remove unused _background_tasks set from service.py (only used in dummy.py now)
   - Remove dead _execute_long_running_tool function (never called)
   - Add comments clarifying legacy vs new execution paths

4. **Update frontend comments**:
   - Clarify that useLongRunningToolPolling is for the async fallback and legacy sessions
   - Document that modern synchronous execution bypasses polling

The synchronous execution flow now works end-to-end:
- Backend sets callProviderMetadata.isLongRunning on tool input
- Backend blocks synchronously with heartbeats
- Frontend shows mini-game immediately when detecting isLongRunning
- Backend returns actual result after completion
- Frontend displays result (single LLM continuation)

The async fallback path (202 Accepted + webhooks) is still supported for external services.
Actionable comments posted: 5
🧹 Nitpick comments (3)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (1)
154-166: Add `exc_info=True` to error logs in background publish functions to preserve tracebacks

Both `_publish_dummy_result_after_delay` and `_publish_dummy_patch_after_delay` log only `str(e)` on failure, discarding the full traceback. This makes diagnosing Redis connectivity or serialization failures in dev/CI very difficult.

♻️ Proposed fix (applies to both functions):

```diff
-        except Exception as e:
-            logger.error(f"[Dummy] Failed to publish to Redis Streams: {e}")
+        except Exception:
+            logger.error("[Dummy] Failed to publish to Redis Streams", exc_info=True)
```

Also applies to: 233-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py` around lines 154-166: The error handlers in _publish_dummy_result_after_delay and _publish_dummy_patch_after_delay currently log only the exception string, losing tracebacks; update both except blocks to pass exc_info=True to logger.error (keeping the existing message and exception variable e) so the full traceback is preserved for debugging when publish_operation_complete or publish_operation_patch fails.

autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx (1)
124-126: Extract the `callProviderMetadata` cast into a shared helper to reduce duplication.

The same `as unknown as { callProviderMetadata?: ... }` cast appears in both `CreateAgent.tsx` and `EditAgent.tsx`. Consider extracting a small typed helper (e.g., in a shared helpers file) that safely extracts `isLongRunning` from a tool part.

♻️ Example helper:

```ts
// e.g., in a shared copilot/tools/helpers.ts
export function isLongRunningTool(part: { [key: string]: unknown }): boolean {
  const meta = (part as Record<string, unknown>).callProviderMetadata;
  if (meta && typeof meta === "object" && "isLongRunning" in meta) {
    return (meta as { isLongRunning?: boolean }).isLongRunning === true;
  }
  return false;
}
```
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx` around lines 124-126: Extract the repeated cast logic into a shared helper (e.g., isLongRunningTool) and use it from both CreateAgent.tsx and EditAgent.tsx: implement a small function that accepts the tool part (typed like Record<string, unknown>) and safely reads callProviderMetadata?.isLongRunning, returning a boolean; then replace the inline cast expression in CreateAgent (the isLongRunning computation) and the analogous expression in EditAgent with calls to this helper.

autogpt_platform/backend/backend/copilot/sdk/service.py (1)
140-191: Race condition concern is mitigated by cache synchronization and fallback ID generation.

The session message is persisted before `_callback` fires. At line 956, `upsert_chat_session()` saves the `AssistantMessage` with tool calls to both the database and Redis cache synchronously via `cache_chat_session()`. The callback then queries the cached session, which should reflect the update. However, in a distributed system, a brief staleness window is theoretically possible. The fallback ID generation at lines 225–231 adequately handles this case by generating an `sdk-` ID if the tool_use_id lookup fails. The comment at line 220 asserting "CRITICAL" and "already saved" is accurate for database persistence, but the fallback design already accounts for any transient cache inconsistencies, making additional retry/delay logic unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 140-191: The comment asserting "CRITICAL" and that the tool_use_id is "already saved" should be softened to reflect that while upsert_chat_session persists to DB and cache, a transient cache staleness window is possible and is handled by the fallback `sdk-` ID generation; update the wording near upsert_chat_session and the fallback ID generation (the code that emits an `sdk-` ID) to remove the absolute claim, note that _find_latest_tool_use_id may return None if the cache is briefly stale, and reference that the fallback path will generate a safe `sdk-` ID so no additional retry/delay logic is required.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- autogpt_platform/backend/backend/copilot/response_model.py
- autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
- autogpt_platform/backend/backend/copilot/sdk/service.py
- autogpt_platform/backend/backend/copilot/service.py
- autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- autogpt_platform/backend/backend/copilot/response_model.py
🧰 Additional context used
📓 Path-based instructions (15)
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}: Use Node.js 21+ with pnpm package manager for frontend development
Always run 'pnpm format' for formatting and linting code in frontend development
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/**/*.{tsx,ts}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/frontend/**/*.{tsx,ts}: Use function declarations for components and handlers (not arrow functions) in React components
Only use arrow functions for small inline lambdas (map, filter, etc.) in React components
Use PascalCase for component names and camelCase with 'use' prefix for hook names in React
Use Tailwind CSS utilities only for styling in frontend components
Use design system components from 'src/components/' (atoms, molecules, organisms) in frontend development
Never use 'src/components/legacy/' in frontend code
Only use Phosphor Icons (@phosphor-icons/react) for icons in frontend components
Use generated API hooks from '@/app/api/generated/endpoints/' instead of deprecated 'BackendAPI' or 'src/lib/autogpt-server-api/'
Use React Query for server state (via generated hooks) in frontend development
Default to client components ('use client') in Next.js; only use server components for SEO or extreme TTFB needs
Use the 'ErrorCard' component for rendering errors in frontend UI; use toast notifications for mutation errors; use 'Sentry.captureException()' for manual exceptions
Separate render logic from data/behavior in React components; keep comments minimal (code should be self-documenting)
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/frontend/**/*.{ts,tsx}: No barrel files or 'index.ts' re-exports in frontend code
Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
autogpt_platform/frontend/src/**/*.{ts,tsx}: Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
Use function declarations (not arrow functions) for components and handlers
Separate render logic (`.tsx`) from business logic (`use*.ts` hooks)
Use shadcn/ui (Radix UI primitives) with Tailwind CSS styling for UI components
Use Phosphor Icons only for icons
Use ErrorCard for render errors, toast for mutations, and Sentry for exceptions
Use design system components from `src/components/` (atoms, molecules, organisms)
Never use `src/components/__legacy__/*` components
Use generated API hooks from `@/app/api/__generated__/endpoints/` with pattern `use{Method}{Version}{OperationName}`
Use Tailwind CSS only for styling, with design tokens
Do not use `useCallback` or `useMemo` unless asked to optimize a given function
Never type with `any` unless a variable/attribute can ACTUALLY be of any type
autogpt_platform/frontend/src/**/*.{ts,tsx}: Structure components as `ComponentName/ComponentName.tsx` + `useComponentName.ts` + `helpers.ts` and use design system components from `src/components/` (atoms, molecules, organisms)
Use generated API hooks from `@/app/api/__generated__/endpoints/` with pattern `use{Method}{Version}{OperationName}` and regenerate with `pnpm generate:api`
Use function declarations (not arrow functions) for components and handlers
Separate render logic from business logic with component.tsx + useComponent.ts + helpers.ts structure
Colocate state when possible, avoid creating large components, use sub-components in a local `components/` folder
Avoid large hooks, abstract logic into `helpers.ts` files when sensible
Use arrow functions only for callbacks, not for component declarations
Avoid comments at all times unless the code is very complex
Do not use `useCallback` or `useMemo` unless asked to optimize a given function
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/src/**/*.tsx
📄 CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
Component props should be `type Props = { ... }` (not exported) unless they need to be used outside the component

Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
Files:
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
- autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx}: Format frontend code using `pnpm format`
Never use components from `src/components/__legacy__/*`
Files:
(same three copilot frontend files as listed above)
autogpt_platform/frontend/**/*.{js,jsx,ts,tsx,css}
📄 CodeRabbit inference engine (AGENTS.md)
Use Tailwind CSS only for styling, use design tokens, and use Phosphor Icons only
Files:
(same three copilot frontend files as listed above)
autogpt_platform/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Never type with `any`; if no types are available, use `unknown`
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/app/(platform)/**/*.tsx
📄 CodeRabbit inference engine (AGENTS.md)
If adding protected frontend routes, update `frontend/lib/supabase/middleware.ts`
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/backend/**/*.{py,txt}
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use `poetry run` prefix for all Python commands, including testing, linting, formatting, and migrations
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/backend/backend/**/*.py
📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)
Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py
autogpt_platform/backend/backend/copilot/service.py
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
autogpt_platform/frontend/src/**/*use*.ts
📄 CodeRabbit inference engine (autogpt_platform/frontend/CLAUDE.md)
Do not type hook returns, let TypeScript infer as much as possible
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
autogpt_platform/frontend/src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
Do not type hook returns, let Typescript infer as much as possible
Files:
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
🧠 Learnings (1)
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/frontend/**/*.{ts,tsx} : Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development
Applied to files:
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
🧬 Code graph analysis (5)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx (1)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/helpers.tsx (3)
isOperationStartedOutput (75-82), isOperationPendingOutput (84-88), isOperationInProgressOutput (90-97)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx (1)
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/helpers.tsx (3)
isOperationStartedOutput (75-82), isOperationPendingOutput (84-88), isOperationInProgressOutput (90-97)
autogpt_platform/backend/backend/copilot/sdk/service.py (3)
autogpt_platform/backend/backend/copilot/db.py (1)
get_chat_session (26-32)
autogpt_platform/backend/backend/copilot/model.py (1)
get_chat_session (342-394)
autogpt_platform/backend/backend/copilot/response_model.py (1)
StreamToolInputAvailable (149-161)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (3)
autogpt_platform/backend/backend/util/logging.py (2)
info (41-43), error (49-51)
autogpt_platform/backend/backend/copilot/stream_registry.py (1)
create_task (86-177)
autogpt_platform/backend/backend/copilot/completion_consumer.py (1)
publish_operation_complete (319-349)
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (2)
autogpt_platform/backend/backend/util/logging.py (1)
debug (53-55)
autogpt_platform/backend/backend/copilot/response_model.py (6)
StreamToolInputAvailable (149-161), to_sse (51-60), to_sse (76-84), to_sse (178-187), to_sse (212-222), to_sse (237-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: types
- GitHub Check: Seer Code Review
- GitHub Check: end-to-end tests
- GitHub Check: Analyze (python)
- GitHub Check: Check PR Status
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
🔇 Additional comments (9)
autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py (1)
16-17: Background task tracking pattern is correct
`_background_tasks.add(bg_task)` followed by `bg_task.add_done_callback(_background_tasks.discard)` is the idiomatic asyncio pattern for preventing premature GC of fire-and-forget tasks, and the callback receives the task as its sole argument, matching `set.discard`'s signature.
Also applies to: 123-127, 189-195
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (1)
117-150: LGTM — debug logging properly guarded and provider metadata cleanly set.
The previous review feedback has been addressed: logging is at `DEBUG` level and serialization is gated behind `logger.isEnabledFor(logging.DEBUG)`.
autogpt_platform/backend/backend/copilot/sdk/service.py (2)
243-284: Synchronous execution path looks correct.
The refactor from async background tasks to synchronous blocking is clean:
- Tool executes via `await _execute_long_running_tool_with_streaming`
- Fallback to a generic message if `result_content` is `None` (Line 272)
- Returns actual content to Claude as a single MCP result
One minor note: Line 272 returns a hardcoded JSON string `'{"message": "Operation completed"}'` while `_execute_long_running_tool_with_streaming` (in service.py Line 1545) uses the same fallback string. This is consistent.
909-912: LGTM — safe attribute access for debug logging.
Using `getattr(response, "callProviderMetadata", None)` is appropriate here since `response` is typed as `StreamBaseResponse`, which doesn't necessarily have that field.
autogpt_platform/backend/backend/copilot/service.py (4)
1420-1433: LGTM — provider metadata for long-running tools in the non-SDK path.
Consistent with the SDK path in `response_adapter.py`. The `get_tool` lookup and `is_long_running` check is a cleaner approach than hardcoding tool names.
1484-1535: Heartbeat loop during synchronous long-running execution looks correct.
This addresses the prior review feedback. The `asyncio.shield` + `asyncio.wait_for` pattern is consistent with the normal tool heartbeat loop (Lines 1563-1568) and correctly protects the inner task from cancellation on timeout.
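A minimal sketch of the `shield` + `wait_for` heartbeat idiom, under the assumption that heartbeats are emitted between timeouts (the `long_tool` name and delays are illustrative, not the real service code):

```python
import asyncio


async def long_tool() -> str:
    # Stand-in for a long-running tool call.
    await asyncio.sleep(0.3)
    return "done"


async def run_with_heartbeats() -> tuple[str, int]:
    heartbeats = 0
    inner = asyncio.ensure_future(long_tool())
    while True:
        try:
            # shield() decouples the inner task from wait_for's cancellation:
            # a timeout cancels only the shield wrapper, never the tool itself,
            # so each TimeoutError is just a chance to emit a keep-alive.
            result = await asyncio.wait_for(asyncio.shield(inner), timeout=0.1)
            return result, heartbeats
        except asyncio.TimeoutError:
            heartbeats += 1  # the real service would publish an SSE heartbeat here


result, beats = asyncio.run(run_with_heartbeats())
print(result, beats)
```

Without `shield`, the first `wait_for` timeout would cancel the tool task outright, and the retry loop would await an already-cancelled future.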
1796-1803: LGTM — skip DB update for fallback SDK IDs.
Correctly prevents attempting to update messages by a generated `sdk-` ID that doesn't match any stored message.
66-66: LGTM — import cleanup consistent with removal of pending/started flow.
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts (1)
230-233: LGTM — comment accurately reflects the updated architecture.
The documentation now correctly distinguishes between the async webhook fallback path, legacy operation states, and the new synchronous execution path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/service.py`:
- Around line 1673-1731: The polling success path returns result_content without
clearing the running-operation key, so when delegated_to_async is True the
finally block won't run cleanup; add an awaited call to
_mark_operation_completed(tool_call_id) immediately after detecting the task
completed (before querying the DB or before returning result_content) to ensure
RUNNING_OPERATION_PREFIX+tool_call_id is cleared—place the await
_mark_operation_completed(tool_call_id) in the same branch where
task_status.status != "running" (or just before returning result_content) and
keep error handling minimal (let any exception bubble or log) so the final DB
lookup and return still proceed.
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py`:
- Around line 228-231: The background task _publish_dummy_patch_after_delay
currently captures current_agent by reference and only calls
current_agent.copy() after awaiting asyncio.sleep(delay_seconds), which risks
patching a mutated dict; fix by making a defensive shallow copy of current_agent
(e.g., saved_agent = current_agent.copy()) and any data needed from
update_request before the asyncio.sleep call, then use saved_agent to build
patched and set patched["description"] = f"{saved_agent.get('description', '')}
(updated: {update_request})" after the delay so the patch operates on the
immutable snapshot.
- Around line 154-166: The two helper functions
_publish_dummy_result_after_delay and _publish_dummy_patch_after_delay emit
inconsistent result payloads—one sends result={"agent_json": agent_json} while
the other sends result={"type":"agent","agent_json": patched"}—and the "type"
field is unused downstream; standardize both to the same shape by removing the
unused "type" key (or add it to both if intentionally needed) so both publish
helpers emit result with the identical top-level keys (e.g.,
result={"agent_json": ...}), and update the logger messages if needed to reflect
the standardized payload; ensure completion_handler.py behavior remains the same
by keeping agent_json present.
In `@autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx`:
- Around line 122-133: hasExpandableContent currently only checks part.state ===
"output-available", so when a tool is synchronous-blocking (isLongRunning ===
true) the UI sets isOperating but the ToolAccordion/MiniGame never renders;
update the hasExpandableContent logic to also treat isLongRunning (or
isOperating) as allowing expandable content (i.e., include isLongRunning ||
isOperating in the condition that previously required part.state ===
"output-available"), and apply the same change in the corresponding EditAgent
component where hasExpandableContent is computed.
In `@autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx`:
- Around line 107-119: hasExpandableContent currently only returns true when
part.state === "output-available", so ToolAccordion/MiniGame won't render during
synchronous blocking; update the hasExpandableContent computation to also allow
isOperating (which is computed from isLongRunning and output checks) to enable
UI while the backend is blocking—modify the expression that defines
hasExpandableContent in EditAgent.tsx to include isOperating (or isLongRunning)
alongside the existing part.state === "output-available" check so ToolAccordion
and MiniGame render during long-running synchronous operations.
---
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 140-191: The comment asserting "CRITICAL" and that the tool_use_id
is "already saved" should be softened to reflect that while upsert_chat_session
persists to DB and cache, a transient cache staleness window is possible and is
handled by the fallback sdk- ID generation; update the wording near
upsert_chat_session and the fallback ID generation (the code that emits an sdk-
ID) to remove the absolute claim, note that _find_latest_tool_use_id may return
None if cache is briefly stale, and reference that the fallback path will
generate a safe sdk- ID so no additional retry/delay logic is required.
In `@autogpt_platform/backend/backend/copilot/tools/agent_generator/dummy.py`:
- Around line 154-166: The error handlers in _publish_dummy_result_after_delay
and _publish_dummy_patch_after_delay currently log only the exception string,
losing tracebacks; update both except blocks to pass exc_info=True to
logger.error (keeping the existing message and exception variable e) so the full
traceback is preserved for debugging when publish_operation_complete or
publish_operation_patch fails.
In
`@autogpt_platform/frontend/src/app/`(platform)/copilot/tools/CreateAgent/CreateAgent.tsx:
- Around line 124-126: Extract the repeated cast logic into a shared helper
(e.g., isLongRunningTool) and use it from both CreateAgent.tsx and
EditAgent.tsx: implement a small function that accepts the tool part (type like
Record<string, unknown> or { [key: string]: unknown }) and safely reads
callProviderMetadata?.isLongRunning returning a boolean, then replace the inline
cast expression in CreateAgent (the isLongRunning computation) and the analogous
expression in EditAgent to call this helper.
…y race condition
- Fix hasExpandableContent to check isLongRunning first (CreateAgent/EditAgent). Without this, MiniGame accordion won't expand during synchronous blocking
- Fix Redis lock not released in polling success path (service.py:1731)
- Fix race condition in dummy.py: copy current_agent BEFORE sleep
- Standardize dummy helper payloads (remove inconsistent "type" field)
- Add exc_info=True to error logging in dummy helpers
0198c06 to 952a25c
- Fix _stream_listener infinite loop when session metadata expires:
hget("status") returns None after TTL, but `None and x` is False
so listener never breaks, keeping SSE open and chat locked forever.
Now treats missing metadata as "not running" and exits.
- Cancel now breaks the stream loop immediately instead of draining
- Remove redundant StreamFinishStep publish from error handler
- Publish StreamFinishStep to Redis (step-level signal for frontend)
- Catch BaseException in executor safety net
- Log errors from safety-net mark_session_completed instead of silencing
- Enable refetchOnMount/refetchOnReconnect on frontend session query
952a25c to 5d9b122
…initialization
The session-scoped async test data fixtures do DB operations via Prisma but didn't depend on the `server` fixture (which calls `db.connect()`). Without the explicit dependency, pytest can't guarantee `server` runs first, causing "Event loop is closed" errors in CI.
Issues attributed to commits in this pull request
This pull request was merged and Sentry observed the following issues:
Summary
- Fixed `_stream_listener` infinite xread loop when Redis session metadata TTL expired — `hget` returned `None`, which bypassed the `status != "running"` break condition. Fixed by treating `None` status as non-running.
- `mark_session_completed` is invoked when the executor doesn't respond within 5s. Returns `cancelled=True` for already-expired sessions.
- `yield StreamFinish()` removed from service layers (sdk/service.py, service.py, dummy.py). `mark_session_completed` is now the single atomic source of truth for publishing StreamFinish via Lua CAS script.
- Frontend session query now uses `refetchOnMount: true` so page refresh re-fetches session state.
Test plan