The Token Cost Hidden Inside Agent Hype

TL;DR: MCP (the Model Context Protocol) is useful when the model needs to make a judgment, work through ambiguity, or use tools in context. But it is usually a higher-cost way to run a known operation. The model spends tokens inspecting tools, choosing actions, preparing arguments, reading responses, and reasoning through outcomes. If the job has known inputs and a known provider, a direct API call is often cheaper, safer, and easier to test. Let the LLM help with thinking and drafting. Let backend code handle reliable execution.

One of the most under-discussed costs in agent systems is not the subscription price, the per-token rate on the model API, or the fee charged by the downstream provider.

It is the token cost of putting the model inside the execution path for work that does not actually need model judgment.

This gets hidden by the promise of agents.

The pitch is attractive: connect tools through MCP, give the agent access to the systems it needs, and let it get work done.

That is a real capability.

It can be extremely useful when the work is ambiguous, context-heavy, or dependent on judgment.

But an MCP tool call does not have the same cost shape as a direct API call.

That distinction matters.

A direct API call is boring in the best possible way. The application already knows what it wants to do. It has a typed request shape. It validates the inputs. It calls the provider. It handles the response. It stores the result. It exposes status to the user.

The model does not need to read the tool list. It does not need to choose the tool. It does not need to infer which operation should run. It does not need to turn natural language into provider arguments. It does not need to inspect a raw provider response just to decide whether a known operation succeeded.
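To make the "boring" path concrete, here is a minimal Python sketch of a typed request and a validation step that run with no model in the loop. The `EmailRequest` type, the `validate` helper, and the deliberately loose email pattern are hypothetical illustrations, not code from any particular app.

```python
import re
from dataclasses import dataclass, field

# Deliberately loose pattern (hypothetical); real recipient validation is out of scope here.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class EmailRequest:
    """A typed request: the app already knows exactly which operation it wants."""
    to: str
    subject: str
    template: str
    variables: dict[str, str] = field(default_factory=dict)


def validate(req: EmailRequest, required_vars: set[str]) -> list[str]:
    """Return validation errors; an empty list means the request may be sent."""
    errors: list[str] = []
    if not _EMAIL_RE.match(req.to):
        errors.append(f"invalid recipient: {req.to!r}")
    if not req.subject.strip():
        errors.append("subject is empty")
    if not req.template:
        errors.append("template id is missing")
    missing = required_vars - set(req.variables)
    if missing:
        errors.append(f"missing template variables: {sorted(missing)}")
    return errors
```

Every check here is a plain comparison that can be unit-tested; none of it needs a tool list, a tool choice, or a model reading the result.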

With an MCP-mediated workflow, the shape is different.

The model may have to inspect available tools, decide which tool fits the request, prepare the arguments, call the tool, read the result, interpret the response, explain the outcome, and sometimes reason through a retry.

Every one of those steps can add tokens.

Sometimes that token spend is worth it.

If the agent needs to compare context, decide whether an email should be sent, draft a message, choose between possible audiences, detect missing information, or ask the operator for clarification, then the model is doing useful work.

In that case, tool use is not just execution. It is part of a reasoning loop.

But if the operation is already known, repeatable, and sensitive, the same token spend starts to look like overhead rather than reasoning.

This is where Mailgun becomes a useful example.

Imagine adding email sending to a local-first app like pm_bot.

The app already has strong boundaries. The browser is the operator interface. Runtime state stays inside the repo. Filesystem access is backend-only. Secrets should remain backend-only. Content workflows have explicit steps for drafting, staging, importing, retrying, and publishing.

Now suppose the goal is simple: send a known email through Mailgun.

The agent-native temptation is to expose Mailgun through MCP and let the agent send it.

That may work. The email may go out. Mailgun may return a message id. The app may record that something happened.

But the cost model has changed.

The model is no longer just helping with the language of the email. It is now participating in the operational act of sending it. It may be reading tool schemas, processing tool results, and interpreting provider responses that the application could have handled deterministically.

That is the part agent enthusiasm often skips.

Tool access is usually framed as capability expansion: the model can now do more things.

In production, tool access is also cost expansion.

More tool context. More intermediate reasoning. More provider output in the conversation. More chances for the model to spend tokens on work that application code could have performed without involving a model at all.

For Mailgun, a direct integration is usually the cleaner production path.

The app can validate the recipient, subject, template id, variables, and content before sending. It can keep the API key on the backend. It can call Mailgun directly. It can store the remote message id. It can persist status clearly: pending, succeeded, failed, retryable. It can show failures in the UI without asking a model to interpret every provider response.
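A minimal sketch of that direct path in Python, using only the standard library. It assumes Mailgun's messages endpoint (`POST https://api.mailgun.net/v3/<domain>/messages` with HTTP Basic auth, user `api`) and passes template variables through the `h:X-Mailgun-Variables` form field; check both against the current Mailgun docs, and note that EU-region accounts use a different host. The function names, the injectable `transport`, and the status mapping (network errors, 429 and 5xx treated as retryable) are assumptions made for illustration.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class SendStatus(Enum):
    PENDING = "pending"  # recorded by the app before the call is made
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    RETRYABLE = "retryable"


@dataclass
class SendResult:
    status: SendStatus
    http_status: int
    message_id: Optional[str]  # remote message id, stored for the audit trail
    detail: str  # raw response body, shown in the UI rather than fed to a model


# A transport takes a prepared request and returns (HTTP status, response body).
Transport = Callable[[urllib.request.Request], tuple[int, str]]


def urllib_transport(req: urllib.request.Request) -> tuple[int, str]:
    """Real network transport; status 0 means no HTTP response was received."""
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code, err.read().decode("utf-8", "replace")
    except urllib.error.URLError as err:  # DNS failure, refused connection, connect timeout
        return 0, str(err.reason)


def build_mailgun_request(domain: str, api_key: str, sender: str, to: str,
                          subject: str, template: str,
                          variables: dict[str, str]) -> urllib.request.Request:
    """Build the send request on the backend; the API key never leaves this process."""
    form = urllib.parse.urlencode({
        "from": sender,
        "to": to,
        "subject": subject,
        "template": template,
        "h:X-Mailgun-Variables": json.dumps(variables),
    }).encode("utf-8")
    token = base64.b64encode(f"api:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://api.mailgun.net/v3/{domain}/messages",
        data=form,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def classify(http_status: int, body: str) -> SendResult:
    """Map the provider response to an app status with plain code, no model involved."""
    if 200 <= http_status < 300:
        try:
            payload = json.loads(body)
        except ValueError:
            payload = {}
        message_id = payload.get("id") if isinstance(payload, dict) else None
        return SendResult(SendStatus.SUCCEEDED, http_status, message_id, body)
    if http_status == 0 or http_status == 429 or http_status >= 500:
        return SendResult(SendStatus.RETRYABLE, http_status, None, body)
    return SendResult(SendStatus.FAILED, http_status, None, body)


def send(req: urllib.request.Request, transport: Transport = urllib_transport) -> SendResult:
    return classify(*transport(req))
```

Because the transport is injected, tests can exercise the succeeded, retryable, and failed paths without calling Mailgun, and the API key only ever exists as a backend function argument.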

That is not anti-agent.

It is good system design.

The LLM still has an important role.

Let it draft the email. Let it refine the subject line. Let it summarize the newsletter. Let it suggest which content might deserve a notification. Let it help the operator think.

Then let the app validate the result.

Let the operator approve the send when approval is part of the workflow.

Let backend code call Mailgun directly.

Let the app persist the send state, provider response, retry status, and audit trail.
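One way to keep that split explicit is a small state machine in which the model's draft is only data, and nothing reaches the send step without an operator approval. The send states mirror the statuses listed earlier; the `SendRecord` type, the `TRANSITIONS` table, and the audit tuple layout are hypothetical, sketched in Python.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions; anything else is rejected rather than guessed at.
TRANSITIONS: dict[str, set[str]] = {
    "drafted": {"approved", "discarded"},  # only an operator decision leaves "drafted"
    "approved": {"pending"},
    "pending": {"succeeded", "failed", "retryable"},
    "retryable": {"pending"},
    "succeeded": set(),
    "failed": set(),
    "discarded": set(),
}


@dataclass
class SendRecord:
    body: str  # LLM-drafted text, stored as content, never as an instruction
    state: str = "drafted"
    # Audit entries: (UTC timestamp, actor, "old->new", note)
    audit: list[tuple[str, str, str, str]] = field(default_factory=list)

    def transition(self, new_state: str, actor: str, note: str = "") -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        ts = datetime.now(timezone.utc).isoformat()
        self.audit.append((ts, actor, f"{self.state}->{new_state}", note))
        self.state = new_state
```

A draft cannot jump straight to `pending`, so the model can write the email but cannot authorize sending it.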

That split matters because language generation and operational authority are not the same responsibility.

MCP is strongest when the task genuinely benefits from an agent using tools in context. Direct APIs are strongest when the application already knows the operation and needs reliable execution.

The mature question is not whether MCP is good or bad.

The mature question is: why is the model involved in this step?

If the answer is judgment, ambiguity, synthesis, or operator support, model involvement may be worth the token cost.

If the answer is simply that a known email needs to be sent through a known provider with known inputs, then the model is probably in the wrong part of the workflow.

This is especially important for local-first tools.

A local-first app should not turn a sensitive send operation into an opaque agent action just because the tool interface exists. Secrets should stay out of browser-visible state. Failures should remain visible. Retries should be explicit. Publishing and sending should have clear lifecycle boundaries.
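Explicit retries can be as simple as a bounded loop that only retries outcomes already classified as retryable and records every attempt, so the final state stays visible instead of being buried in an agent transcript. A minimal Python sketch; the function name, the status strings, and the exponential backoff schedule are assumptions.

```python
import time
from typing import Callable


def send_with_explicit_retries(
    attempt_send: Callable[[], str],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, list[str]]:
    """Retry only 'retryable' outcomes, a bounded number of times, logging each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts: list[str] = []
    status = "pending"
    for attempt in range(1, max_attempts + 1):
        status = attempt_send()
        attempts.append(status)
        if status != "retryable":
            break  # "succeeded" and "failed" are terminal; another send could duplicate mail
        if attempt < max_attempts:
            sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s, ...
    # If every attempt was retryable, the caller still gets "retryable" and can surface it.
    return status, attempts
```

Injecting `sleep` keeps the backoff testable, and the returned attempt list is exactly what the UI or audit trail needs to show.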

Those requirements do not disappear because the agent can call a tool.

In fact, they become more important.

The more capable agents become, the more deliberate we need to be about where they belong.

The LLM should not become the default execution engine for every integration. Sometimes it should be the thinking layer. Sometimes it should be the drafting layer. Sometimes it should stay out of the path entirely.

Mailgun makes the point concrete, but the pattern is broader.

If you are connecting Stripe, Slack, GitHub, Notion, HubSpot, Mailgun, or any other provider, ask the same question before reaching for an MCP tool:

Does this step need judgment, or does it need reliable execution?

If it needs judgment, use the agent where its reasoning has value.

If it needs reliable execution, use application code.

That is where the real production discipline lives.

Not in connecting every possible tool to an agent, but in deciding which parts of the workflow deserve expensive model attention and which parts should be cheap, deterministic, testable code.

The hidden cost of MCP is not merely that it uses tokens. Everything involving a model uses tokens.

The hidden cost is using higher-cost model attention where you did not need a model in the first place.

-----------
If you find this content useful, please share it with this link: [https://patrickmichael.co.za/subscribe](https://patrickmichael.co.za/subscribe)
