A common refrain is “why can’t agents just be tools?”. This post offers a point of view on why tools and agents should be treated differently, and why a different way of interacting with each is needed.
When asking whether agent interoperation is simply using one agent as a tool inside another, we first need to define the kinds of operational boundaries that exist. To do this, we must shift our thinking to where control flow decisions originate and the scope within which those decisions are made.
The well-defined nature of a tool
Consider the case of an agent that can use multiple tools. The tools are used to take action: gather information, make a change, or transform information from one form to another. In each of these cases there is a simple temporal relationship: input is provided and some output is returned. This is illustrated in the figure below, which also includes an example of a streaming action.
Long running operations (LROs) are conceptually identical to the standard flow, simply with a longer time axis. Practically, the output is a reference to the LRO, with a separate action to inspect its state.
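As a rough illustration, an LRO can be modeled as a submit call whose output is only a handle, plus a separate status check. This is a minimal sketch with hypothetical names (`submit`, `get_status`, `OperationState`), not any specific API:

```python
import enum
import uuid

class OperationState(enum.Enum):
    WORKING = "working"
    COMPLETED = "completed"
    ERROR = "error"

# In-memory registry standing in for a real backend.
_operations: dict[str, OperationState] = {}

def submit(action: str) -> str:
    """Start a long running operation; the return value is only a reference."""
    op_id = str(uuid.uuid4())
    _operations[op_id] = OperationState.WORKING
    return op_id

def get_status(op_id: str) -> OperationState:
    """Separate action to inspect the LRO's state."""
    return _operations[op_id]

op = submit("reindex the catalog")
print(get_status(op))  # OperationState.WORKING until the backend finishes
```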
When talking about a unit of action, there is an implicit expectation of a temporal structure that follows the sequence request action, undertake action, complete action, with error handling when an action can’t be completed. In this sequence there is no state besides working, completed, or error. This aligns with traditional API structures and is the basis of the concept of a tool. A tool is something that can be asked to take an action, can be awaited for completion of that action, and can report errors.
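That contract can be sketched directly: a tool is a callable that is requested, awaited, and either completes or raises. The function below (`convert_currency`) is hypothetical, chosen only to show the shape:

```python
import asyncio

async def convert_currency(amount: float, rate: float) -> float:
    """A tool: request the action, await its completion, or observe an error."""
    if amount < 0:
        raise ValueError("amount must be non-negative")  # the only non-success path
    await asyncio.sleep(0)  # stands in for the action being undertaken
    return amount * rate

# The caller awaits completion; there is no state besides working, completed, or error.
print(asyncio.run(convert_currency(100.0, 1.5)))  # 150.0
```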
For problem solving, an interaction with another agent must be able to report a request for more information or a suggestion for addressing a deficiency. One may try to fold these into error-code handling, or extend the completion payload in the tool definition, but this moves the tool from being a unit of action to being a participant in problem solving. Things become even more complicated if a user (human or agent) changes the goal during the problem-solving process. The position of this post is that we should model these two scenarios as different interactions.
Agents are problem-solving collaborators
Agents are expected to have autonomy. This means they are empowered to make decisions based on the information available and the environment they are operating in. They are able to handle changing requirements (“I changed my mind, instead of A, I’d like B”). Take, for example, an agent that is asked to “update the user’s address to X”. This sounds like an action request that should simply complete with a success or failure outcome. But in fact, depending on the agent’s environment and knowledge, it could be much more complex (a sketch after this list shows one way these cases surface). For example:
- What if the only address the agent tracks is an email address and the address provided is a physical mailing address?
- What if the system the agent is using requires some sort of verification before making such a change? For example, providing proof of address?
- What if the system of record the agent is talking to has expectations, like all user accounts with addresses must also have phone numbers?
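To make this concrete, here is a minimal sketch of the address example, assuming an illustrative `AgentResponse` type with an `input_required` state (the names are hypothetical, loosely in the spirit of A2A task states):

```python
import dataclasses

@dataclasses.dataclass
class AgentResponse:
    # "completed" | "failed" | "input_required"; the third state is the one
    # a plain tool contract has no clean way to express.
    state: str
    message: str

def update_address(address: str) -> AgentResponse:
    """Hypothetical agent handling 'update the user's address to X'."""
    if "@" in address:
        return AgentResponse("completed", "Email address updated.")
    # A physical mailing address needs verification before the change can land.
    return AgentResponse(
        "input_required",
        "This address must be verified first; follow the steps at "
        "http://verify.my.address/ and then resume this request.",
    )

print(update_address("221B Baker Street, London").state)  # input_required
```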
In this world, the agent is not a tool; it is a problem solver. To be a problem solver, it needs to engage with the caller to determine the best course of action. The diagram below shows the multi-turn nature of the agent interface, with a tool interface used by the agent.
With this interface, the agent can return information to the caller to work together to solve a problem. For example:
- Sorry, I need an email address to update the user’s profile; you provided a mailing address
- Yes, I can update the address. But first the address must be verified by following the steps at http://verify.my.address/
- I’d love to update the address. It looks like we don’t have a phone number for the account. I’ll need that to complete the address update.
At this point the caller can make its own decisions about how to proceed. Maybe it has the necessary information available and can provide it, maybe the question needs to be surfaced to a human for action, and so on. The key difference between this and the tool flow is that the action is not guaranteed to be complete upon return. The action is in an incomplete or interrupted state: it has started, perhaps some subactions have completed, but the original action is incomplete. Further, it is possible that the action will never be completed. For example, if the user doesn’t verify their address, the action to change the address never completes.
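The caller’s side can then be sketched as a loop that treats “interrupted” as a first-class outcome rather than an error. `agent_call` and `get_more_info` below are hypothetical stand-ins:

```python
def resolve(agent_call, initial_request: str, get_more_info) -> str:
    """Drive a multi-turn interaction; the action may pause, or never complete.

    agent_call(msg) returns (state, text) with state in {"completed",
    "failed", "input_required"}; get_more_info(text) asks the caller's own
    user (human or agent) for whatever the remote agent still needs.
    """
    message = initial_request
    while True:
        state, text = agent_call(message)
        if state == "completed":
            return text
        if state == "failed":
            raise RuntimeError(text)
        # Interrupted, not failed: some subactions may be done, but the
        # original action stays incomplete until the caller supplies more.
        message = get_more_info(text)

# Toy stand-ins to make the sketch runnable.
turns = iter([("input_required", "I need a phone number for the account."),
              ("completed", "Address updated.")])
print(resolve(lambda m: next(turns), "update address to X", lambda t: "555-0100"))
```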
Structured vs unbounded interactions
Digging a little deeper, there is a distinction in the shape of the inputs and outputs of a tool versus an agent. As mentioned above, tools define a unit of action, and this definition can be made with a highly structured input and output schema. This structure limits the input domain, 𝔻, and the output range, ℝ, so that the action space is tightly constrained. This tight constraint allows for a clear interpretation of the error space: anything that doesn’t conform to the input domain is an error, and anything that can’t be represented by the output range is an error. How errors are handled is up to the caller, but the representation of the error state is minimally defined by an error code.
Thus, we define a tool as a time-boxed action that uses structured inputs and outputs to define the functional mapping ƒ(x ∈ 𝔻) → y ∈ ℝ, with an error state if x ∉ 𝔻 or the function produces y ∉ ℝ. This definition allows tool usage to adhere to structural expectations and supports model-based reasoning about that usage, i.e. the model can ensure it generates an appropriate x ∈ 𝔻 and that the target output is some y ∈ ℝ. If an error is returned, the model can reason that one of these two conditions was not met. It can also assume that either an error or a y ∈ ℝ will be returned upon completion of the action. The error state implies that the action failed; the action won’t be resumed when called again, instead a new action with a new x ∈ 𝔻 is executed. This is a key distinction between a tool and an agent.
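One way to make the ƒ(x ∈ 𝔻) → y ∈ ℝ contract concrete is to validate at both ends. Here is a minimal sketch with a hypothetical temperature-conversion tool:

```python
def run_tool(tool, in_domain, in_range, x):
    """Enforce f(x in D) -> y in R: anything outside the schema is an error."""
    if not in_domain(x):
        raise ValueError("x is outside the input domain")
    y = tool(x)
    if not in_range(y):
        raise ValueError("tool produced y outside the output range")
    return y

# Hypothetical tool: celsius -> fahrenheit over a physically plausible domain.
to_fahrenheit = lambda c: c * 9 / 5 + 32
print(run_tool(to_fahrenheit,
               lambda c: -273.15 <= c <= 1000.0,   # the input domain
               lambda y: isinstance(y, float),     # the output range
               100.0))                             # 212.0
```

On error, the caller simply retries with a new x; nothing is resumed.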
In an agent interaction, however, both 𝔻 and ℝ are effectively unbounded. Additionally, the action might not be complete upon the return of y ∈ ℝ; instead, y may contain incremental information that the caller should consume, transform, and pass back to the agent to continue. This iterative process toward completion introduces a new set of challenges, and it is why we propose that this interface is different from the tool interface: agents should not be treated as tools. The tool interface is a degenerate case of the agent interface. Agent-as-a-tool should only be used in situations where the degenerate case is the only one you wish to support, i.e. where the agent can take an action and see it completed or errored, and never reach an interrupted state that needs resumption.
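If an agent is wrapped as a tool anyway, the wrapper is only sound when that degenerate case holds, which a sketch can make explicit (names hypothetical):

```python
def agent_as_tool(agent_call, request: str) -> str:
    """Collapse the agent interface down to the tool interface.

    Valid only when the agent completes or fails in one shot: an
    interrupted state has no tool-shaped representation, so it becomes
    an error here and any partial progress is lost.
    """
    state, text = agent_call(request)
    if state == "completed":
        return text
    if state == "input_required":
        raise RuntimeError("agent paused for input; it cannot be a tool here")
    raise RuntimeError(f"agent failed: {text}")

print(agent_as_tool(lambda r: ("completed", "done"), "translate this to French"))
```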
Illustrative example: The Trip Planner
We turn to a more concrete example to illustrate how problem solving is key to reaching the goal. This example is based on the one found at [a2a-samples]. Here the problem is to help a user plan a trip. The trip planning solution has multiple participants: an Orchestrator, a Planner, a flight booking agent, a hotel booking agent, and a car rental booking agent. The system is designed to help a user with a task like “I would like to go to London for 5 days in July for under $4000.”
Here the planner must come up with criteria to begin the search, for example an allocation of budget and parameters around locations and timelines. The plan is used to interact with the three different agents to develop a proposal. Each part of the plan may need specific information that requires the orchestrator to surface questions to the user, for example “what airport are you departing from?” or “what tube stop in London do you want to be close to?”. The planner could even be smart enough to ask “most people don’t rent cars in London, are you sure you want one?”.
The key is that this is a problem-solving interaction: what is the right set of parameters to fit the overall constraints of the problem space? These constraints aren’t even known a priori; instead, they are developed through multiple interactions.
Once the full set of constraints is known (including budgets, time frames, and more specific personal preferences, around airline class or a more precise location, for example), it becomes easier to identify a set of best options. The user can be presented with these options and the bookings can be made. The searches and the bookings are all done via tools, while the navigation through the constraints and the search space is done via open-ended, repeated engagements across numerous agents.
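The division of labor can be sketched as tools for the bounded actions (search, book) and agent turns for the open-ended constraint gathering. Everything below is hypothetical and only loosely modeled on the [a2a-samples] scenario:

```python
# Tools: time-boxed, structured inputs and outputs.
def search_flights(origin: str, dest: str, budget: float) -> list[dict]:
    """Hypothetical flight search tool; returns structured candidates."""
    return [{"carrier": "XX", "from": origin, "to": dest,
             "price": min(budget, 900.0)}]

def book(option: dict) -> str:
    """Hypothetical booking tool; completes with a confirmation or errors."""
    return f"confirmation-{abs(hash(str(option))) % 10_000}"

# Agent turn: open-ended; constraints are developed through interaction.
def plan_trip(request: str, ask_user) -> str:
    constraints = {"dest": "LHR", "days": 5, "budget": 4000.0}
    constraints["origin"] = ask_user("What airport are you departing from?")
    flights = search_flights(constraints["origin"], constraints["dest"],
                             constraints["budget"] * 0.4)  # assumed budget split
    return book(flights[0])

print(plan_trip("London for 5 days in July under $4000", lambda q: "SFO"))
```

The searches and bookings stay tool-shaped; only the question to the user crosses the agent boundary.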
GOTO and the path forward
This framing can be thought of as analogous to the use of GOTO in traditional programming. The agent interface introduces a GOTO-like circumstance: the control flow can be forced to leave the expected execution context and potentially never return, or it might be pushed by another GOTO into yet another context. This type of circumstance is much harder to manage and design around. Just as the arguments around GOTO in programming recommend isolating its usage to a few constructs, like breaking out of loops, we propose isolating it to the agent-to-agent boundary. We should not push it to the tool boundary, where the expectations of structured programming (readability, interpretability, and debuggability) can be enforced.
If we follow these principles, each individual agent can be defined to operate in its action space in a more controlled and constrained manner, improving our ability to scale solutions.

