Sub-Agents Need a Budget, Not Just a Rubric

TL;DR: Sub-agent loops can make AI work more trustworthy, but they are not free. Every extra scan, plan, review, and retry can consume context and tokens. The answer is not to avoid sub-agents. The answer is to put a budget around what each agent receives and returns.

A graded sub-agent loop is powerful because it separates jobs.

Our typical loop process:

Plan with a phased checklist -> Review -> PASS

Orchestrate -> Implement -> Review against the grading rubric (PASS/FAIL) -> Loop until PASS

One agent implements. Another reviews. The reviewer returns PASS or FAIL. If there are must-fix issues, the implementer fixes only those issues, then the reviewer checks again.
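Here is a minimal Python sketch of that loop. It assumes two hypothetical callables that wrap whatever actually invokes the sub-agents: `implement(brief)` and `review()`. `review` returns a list of must-fix items, and an empty list means PASS. The `max_rounds` cap is an added assumption, not part of the process above, so a phase that never passes is escalated instead of looping forever.

```python
from typing import Callable


def run_phase(
    implement: Callable[[str], None],   # hypothetical: sends a brief to the implementer
    review: Callable[[], list[str]],    # hypothetical: returns must-fix items ([] == PASS)
    phase_brief: str,
    max_rounds: int = 3,                # assumption: retry cap, then escalate to a human
) -> tuple[str, int]:
    """Implement, review, and retry with must-fix items only, until PASS."""
    brief = phase_brief
    for round_no in range(1, max_rounds + 1):
        implement(brief)
        must_fix = review()
        if not must_fix:
            return "PASS", round_no
        # FAIL: the next brief is only the numbered must-fix list.
        brief = "\n".join(f"{n}. {item}" for n, item in enumerate(must_fix, start=1))
    return "FAIL", max_rounds
```

With fake agents, a phase that fails once and then passes looks like this: the implementer receives the phase brief first and only `"1. Add input validation"` on the retry.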

That structure creates confidence. You are not relying on one assistant to do the work and grade its own homework. You get a second role whose job is judgment.

But there is a hidden cost.

If every agent receives the full conversation, the full plan, all previous commentary, the complete diff, and the whole architecture context, the loop becomes expensive fast. The process may be correct but still wasteful. Worse, the parent session gets heavier each time the loop runs.
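A back-of-the-envelope sketch shows how quickly this compounds. The token counts are invented for illustration, not measured, and the model is a simplifying assumption: each round is one implementer call plus one reviewer call, and in the full-context case every call also carries all the commentary produced so far.

```python
def full_context_tokens(rounds: int, full_context: int, commentary_per_call: int) -> int:
    """Input tokens when every call gets the full context plus all prior commentary."""
    total, history = 0, 0
    for _ in range(rounds):
        for _role in ("implementer", "reviewer"):
            total += full_context + history
            history += commentary_per_call  # each call leaves more text behind
    return total


def scoped_tokens(rounds: int, scoped_brief: int) -> int:
    """Input tokens when every call gets only a fixed, scoped brief."""
    return rounds * 2 * scoped_brief


# Illustrative numbers only.
print(full_context_tokens(3, 20_000, 2_000))  # 150000
print(scoped_tokens(3, 3_000))                # 18000
```

Under these made-up numbers, three rounds cost roughly eight times more input with full context, and the gap widens with every extra retry because the history term keeps growing.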

That was the practical tradeoff in a recent working session.

We were optimising the Codex sub-agent architecture for a real project. The goal was not just to use sub-agents. The goal was to use them without turning every phase into a context dump.

The control pattern was simple.

Run one phase at a time.

Pass the plan path, not the full plan text.
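A sketch of how a path can stand in for the plan text: the sub-agent (or a small helper it calls) opens the file and reads only the current phase. The `## Phase N` heading convention and the `read_phase` helper are assumptions for illustration; adapt them to however your plan file is actually laid out.

```python
import re
from pathlib import Path


def read_phase(plan_path: str, phase: int) -> str:
    """Return only one phase's section from a plan file.

    Assumes a hypothetical layout where each phase starts with a
    line like '## Phase 2'.
    """
    text = Path(plan_path).read_text(encoding="utf-8")
    # Split just before every '## Phase <n>' heading (zero-width split).
    sections = re.split(r"(?m)^(?=## Phase \d+\b)", text)
    for section in sections:
        # \b stops 'Phase 1' from matching 'Phase 10'.
        if re.match(rf"## Phase {phase}\b", section):
            return section.strip()
    raise ValueError(f"Phase {phase} not found in {plan_path}")
```

The parent only ever handles the string `plan_path` and a phase number, so the rest of the plan never has to sit in its chat.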

Give the implementer only the checklist for the current phase, the relevant files, and a few invariants.
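One way to make that boundary explicit is a small typed brief that simply has no field for anything else. This is a sketch; the `ImplementerBrief` class and its text layout are hypothetical, not a Codex API.

```python
from dataclasses import dataclass, field


@dataclass
class ImplementerBrief:
    """What the implementer receives for one phase, and nothing more."""
    phase: int
    checklist: list[str]  # current phase only, not the whole plan
    files: list[str]      # files relevant to this phase
    invariants: list[str] = field(default_factory=list)  # a few rules that must hold

    def render(self) -> str:
        lines = [f"Phase {self.phase} checklist:"]
        lines += [f"- {item}" for item in self.checklist]
        lines.append("Files: " + ", ".join(self.files))
        if self.invariants:
            lines.append("Invariants:")
            lines += [f"- {rule}" for rule in self.invariants]
        return "\n".join(lines)


brief = ImplementerBrief(
    phase=2,
    checklist=["Add retry to HTTP client"],
    files=["src/client.py", "tests/test_client.py"],
    invariants=["Public API unchanged"],
)
print(brief.render())
```

Because scan notes and prior commentary have nowhere to go in this structure, leaving them out is the default rather than something the orchestrator has to remember.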

Give the reviewer only the changed files and the instruction to inspect `git diff -- <files>` against the phase checklist.
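A sketch of that reviewer handoff. The reviewer is told which command to run rather than being handed the diff, so the diff never passes through the parent. The helper names and the prompt wording are hypothetical; `git diff -- <paths>` itself is standard git, where `--` marks everything after it as paths so the diff is limited to those files.

```python
import shlex


def diff_command(files: list[str]) -> list[str]:
    """Build `git diff -- <files>` as an argument list."""
    return ["git", "diff", "--", *files]


def reviewer_brief(phase: int, files: list[str]) -> str:
    """Reviewer prompt: changed files and what to check, without pasting the diff."""
    command = shlex.join(diff_command(files))  # shell-quotes paths containing spaces
    return (
        f"Review phase {phase}. Changed files: {', '.join(files)}.\n"
        f"Run `{command}` and check the changes against the phase {phase} checklist.\n"
        "Reply PASS, or FAIL followed by a numbered must-fix list."
    )


print(reviewer_brief(3, ["src/client.py", "tests/test_client.py"]))
```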

If review fails, send the implementer only the numbered must-fix list. Do not resend the scan notes, the full plan, prior reviewer commentary, or the whole diff.
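A sketch of the retry handoff, assuming the reviewer's result arrives as a dictionary with a `must_fix` list alongside other fields. That report shape is hypothetical. The point is that only `must_fix` is forwarded; notes and commentary stay behind.

```python
def fix_brief(phase: int, review_report: dict) -> str:
    """Build the retry prompt after a FAIL from a reviewer report.

    Only the numbered must-fix items are forwarded. Any other fields
    (notes, suggestions, commentary) are deliberately dropped.
    """
    must_fix = review_report.get("must_fix", [])
    if not must_fix:
        raise ValueError("A FAIL report should contain at least one must-fix item")
    items = "\n".join(f"{n}. {item}" for n, item in enumerate(must_fix, start=1))
    return f"Phase {phase} review: FAIL. Fix only these items:\n{items}"


report = {
    "verdict": "FAIL",
    "must_fix": ["Handle empty input", "Add timeout test"],
    "notes": "Scanned 40 files; style is mostly fine",  # not forwarded
}
print(fix_brief(2, report))
```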

That is disciplined context control.

It keeps the parent agent in an orchestration role. The parent does not try to carry every detail in chat. It routes the right context to the right sub-agent at the right time.

For a business user, the important point is cost control and confidence at the same time.

The rubric and review loop increase trust. The bounded handoffs keep the process from becoming unnecessarily expensive. Both matter.

Without grading, you get fast output with weak confidence.

Without context gating, you get confident output with hidden cost.

The useful middle is a controlled loop: one phase, one implementer, one reviewer, scoped evidence, explicit PASS or FAIL, and must-fix-only retries.
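An explicit PASS or FAIL is easiest to enforce when the reviewer's reply has a fixed shape that can be checked mechanically. The sketch below assumes a hypothetical reply format: the first non-empty line is exactly `PASS` or `FAIL`, and must-fix items are numbered lines. Free-form commentary is ignored, so it never reaches the retry.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    must_fix: list[str]


def parse_verdict(reply: str) -> Verdict:
    """Parse a reviewer reply whose first line must be PASS or FAIL."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines or lines[0] not in ("PASS", "FAIL"):
        raise ValueError("Reviewer reply must start with PASS or FAIL")
    if lines[0] == "PASS":
        return Verdict(passed=True, must_fix=[])
    # Keep only numbered items like '1. ...' or '2) ...'; drop commentary.
    items = [m.group(1) for line in lines[1:] if (m := re.match(r"\d+[.)]\s+(.*)", line))]
    if not items:
        raise ValueError("FAIL must include at least one numbered must-fix item")
    return Verdict(passed=False, must_fix=items)


print(parse_verdict("FAIL\n1. Handle empty input\nOverall looks close.\n2) Add timeout test"))
```

Rejecting an ambiguous reply such as "Looks mostly fine" forces the reviewer to commit, which is what makes the loop's exit condition trustworthy.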

That is how sub-agents become operational instead of theatrical.

The lesson is not "always use more agents." The lesson is: use specialized agents when the work deserves them, and make every handoff earn its place.

A good sub-agent workflow should answer two questions before it runs: what does this agent need to know, and what is it allowed to return?
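Those two questions can be written down as a contract before the agent runs. A minimal sketch: `AgentContract`, the context key names, and the character budget are all hypothetical, and a character limit is only a crude stand-in for a real token budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    role: str
    needs: frozenset[str]  # context keys this agent may receive
    max_return_chars: int  # budget for what it may send back

    def brief_from(self, context: dict[str, str]) -> dict[str, str]:
        """Pass only the declared keys; everything else stays with the parent."""
        missing = self.needs.difference(context)
        if missing:
            raise KeyError(f"{self.role} is missing context: {sorted(missing)}")
        return {key: context[key] for key in sorted(self.needs)}

    def accept(self, reply: str) -> str:
        """Reject replies that exceed the agreed return budget."""
        if len(reply) > self.max_return_chars:
            raise ValueError(f"{self.role} reply exceeds {self.max_return_chars} chars")
        return reply


reviewer = AgentContract("reviewer", frozenset({"changed_files", "checklist"}), 2000)
parent_context = {
    "changed_files": "src/client.py",
    "checklist": "- [ ] Add retry",
    "scan_notes": "long notes",  # never sent to the reviewer
    "full_plan": "entire plan",  # never sent to the reviewer
}
print(reviewer.brief_from(parent_context))
```

The parent's context can hold as much as it needs; the contract decides what crosses the boundary in each direction.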

If you can answer those clearly, the loop gives you better outputs and a reason to trust them without letting context costs run away.

-----------
If you find this content useful, please share it with this link: [https://patrickmichael.co.za/subscribe](https://patrickmichael.co.za/subscribe)
