Hardening a Voice Squad Means Hardening the Workflow Around It

A multi-agent voice squad does not fail only inside the model.

It can fail in the workflow around the model.

That was the useful lesson from hardening the PM Squad deployment flow.

The immediate problem looked simple: after creating a new squad member assistant, the assistant existed in Vapi and the local `<slug>_current.json` snapshot existed, but the squad config still had the old member list. The next operator step, `pm_vapi_squad.py update`, reads the static squad JSON it is given. It does not discover assistant snapshots.

So the system had two true facts at once.

The assistant had been created successfully.

The squad still did not know about it.

That is exactly the kind of gap that makes multi-agent systems feel unreliable. The runtime may be healthy, the assistant may be valid, and the prompt may be correct, but the orchestration layer is stale.

The fix was not to make the live update command smarter by reaching into every assistant directory and guessing intent.

The better move was to add a local-only sync gate.

The new command is explicit:

`python3 pm_vapi_squad.py sync-member --member pm_squad_booking --config agent-configs/dev/vapi/squads/pm_squad_current_reset.squad.json`

It reads the created assistant ID from `agent-configs/dev/assistants/<slug>/<slug>_current.json`.

It reads the member role from `agent-configs/dev/assistants/<slug>/<slug>.deploy.json`.

It adds the assistant ID to the squad `members` array only if it is absent.

It updates `_meta.currentResetState.deployedMembers`.

It removes the slug from `pendingMembersWithoutAssistantIds`.

It is idempotent.

That last property matters. A hardening command should be safe to rerun when the operator is uncertain. If the command can only be run once, it becomes another source of operational anxiety. If it can be rerun and report “already synced,” it becomes a recovery tool.
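The idempotence property can be sketched in a few lines. This is a minimal illustration, not the actual `sync-member` code: the function name `sync_member`, the shape of each member entry (`{"assistantId": ...}`), the placeholder IDs, and the assumption that `pendingMembersWithoutAssistantIds` sits next to `deployedMembers` are all hypothetical.

```python
import copy
from typing import Tuple


def sync_member(squad: dict, slug: str, assistant_id: str) -> Tuple[dict, bool]:
    """Return (updated_config, changed) without mutating the input config."""
    updated = copy.deepcopy(squad)
    members = updated.setdefault("members", [])
    state = updated.setdefault("_meta", {}).setdefault("currentResetState", {})
    deployed = state.setdefault("deployedMembers", [])
    # Assumption: the pending list lives beside deployedMembers.
    pending = state.setdefault("pendingMembersWithoutAssistantIds", [])

    changed = False
    # Add the member only if no existing entry already references this ID.
    if not any(m.get("assistantId") == assistant_id for m in members):
        members.append({"assistantId": assistant_id})
        changed = True
    if slug not in deployed:
        deployed.append(slug)
        changed = True
    if slug in pending:
        pending.remove(slug)
        changed = True
    return updated, changed


squad = {
    "members": [{"assistantId": "asst-orchestrator"}],
    "_meta": {
        "currentResetState": {
            "deployedMembers": ["pm_squad_orchestrator"],
            "pendingMembersWithoutAssistantIds": ["pm_squad_booking"],
        }
    },
}

once, changed = sync_member(squad, "pm_squad_booking", "asst-booking")
print("synced" if changed else "already synced")        # synced
twice, changed_again = sync_member(once, "pm_squad_booking", "asst-booking")
print("synced" if changed_again else "already synced")  # already synced
```

Returning a `changed` flag is what lets the command report "already synced" instead of silently rewriting the file on a rerun.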

The second hardening decision was to keep membership sync generic but handoff wiring explicit.

Adding a member to a squad is mechanical. If the assistant has a valid ID, the local squad config can reference it.

Adding handoff routes is not mechanical. It is product logic.

So the implementation uses role templates. Booking has declared edges: Orchestrator to Booking, Knowledge to Booking, Booking to Orchestrator, and Booking to Messaging. Email Capture has declared edges between Booking and Email Capture. Unknown roles still sync membership by default, but they warn that no handoff wiring was added. Operators can opt into stricter behavior with `--require-wiring-template`.

This is a useful pattern: make the safe, structural operation generic; make the behavioral operation explicit.
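One way to picture the split between generic membership and explicit wiring is a role-to-edges table with a strict switch. The sketch below is hypothetical: the role keys, the `edges_for_role` helper, and the exception name are invented for illustration, and the two Email Capture edges assume wiring in both directions, which the description above does not state.

```python
import warnings
from typing import List, Tuple

Edge = Tuple[str, str]  # (source_role, destination_role)

# Hypothetical template table mirroring the edges described above.
WIRING_TEMPLATES = {
    "booking": [
        ("orchestrator", "booking"),
        ("knowledge", "booking"),
        ("booking", "orchestrator"),
        ("booking", "messaging"),
    ],
    # Assumption: Email Capture is wired in both directions.
    "email_capture": [
        ("booking", "email_capture"),
        ("email_capture", "booking"),
    ],
}


class MissingWiringTemplate(Exception):
    """Raised in strict mode when a role has no declared handoff edges."""


def edges_for_role(role: str, require_template: bool = False) -> List[Edge]:
    """Return the declared edges for a role, or warn (or fail) if none exist."""
    edges = WIRING_TEMPLATES.get(role)
    if edges is not None:
        return list(edges)
    if require_template:
        # Corresponds to the stricter opt-in behavior.
        raise MissingWiringTemplate(f"no wiring template for role {role!r}")
    warnings.warn(f"role {role!r} synced as a member only; no handoff wiring added")
    return []
```

The default path never blocks membership sync. Only the strict path turns a missing template into a failure.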

The third hardening decision was to generate handoff tools in the squad context, not mutate the base assistant tools.

The generated Vapi handoffs go under member-level `assistantOverrides["tools:append"]`.

They do not replace `assistantOverrides.model.tools`.

They do not rewrite the saved assistant snapshot.

They do not overwrite existing handoffs.

That boundary keeps squad-only routing attached to the squad configuration. A specialist assistant can remain a reusable assistant artifact, while the squad config owns the squad-specific graph.
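A minimal sketch of that boundary, assuming each squad member is a plain dict: the helper `append_handoff` is hypothetical, and the handoff tool shape shown here (`type`, `destinations`, `assistantId`, `description`) is illustrative and should be checked against the Vapi API reference before use.

```python
def append_handoff(member: dict, destination_id: str, description: str) -> bool:
    """Append a handoff tool under tools:append; return False if already present."""
    overrides = member.setdefault("assistantOverrides", {})
    appended = overrides.setdefault("tools:append", [])

    # Never duplicate or overwrite an existing handoff to the same destination.
    for tool in appended:
        if tool.get("type") != "handoff":
            continue
        if any(d.get("assistantId") == destination_id
               for d in tool.get("destinations", [])):
            return False

    appended.append({
        "type": "handoff",  # illustrative tool shape, not verified against Vapi
        "destinations": [{
            "type": "assistant",
            "assistantId": destination_id,
            "description": description,
        }],
    })
    return True


member = {
    "assistantId": "asst-orchestrator",
    "assistantOverrides": {"model": {"tools": [{"type": "endCall"}]}},
}
append_handoff(member, "asst-booking", "Route booking requests")
append_handoff(member, "asst-booking", "Route booking requests")  # no-op on rerun
```

The helper only touches the `tools:append` key, so whatever already sits under `assistantOverrides.model.tools` is left exactly as it was.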

The fourth hardening decision was to make the deploy script call local sync only after a successful create.

The create path now does the assistant-level work first: render payload, create assistant, validate the response, write the deploy response, write the accepted current snapshot, and initialize config patch artifacts.

Only after those steps succeed does it sync the local squad config.

It still does not run `pm_vapi_squad.py update`.

That distinction is the safety boundary.

Assistant create is a live assistant mutation. Local squad sync is a file mutation. Live squad update is a separate live squad mutation. Combining all three into one opaque action would be convenient, but it would remove the operator review point.
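The ordering can be sketched with the three steps passed in as callables. Everything here is a hypothetical stand-in for the real deploy script: the function name, the callables, and the assumption that the create response carries the assistant ID under an `id` key.

```python
from typing import Callable


def create_then_sync(
    create_assistant: Callable[[], dict],
    write_snapshot: Callable[[dict], None],
    sync_local_squad: Callable[[str], None],
) -> str:
    """Run the assistant-level work first, then the local file sync."""
    response = create_assistant()        # live assistant mutation
    assistant_id = response.get("id")    # assumption: ID is under "id"
    if not assistant_id:
        # Validation failed: stop before any file is touched.
        raise ValueError("create response has no assistant ID; local sync skipped")
    write_snapshot(response)             # accepted current snapshot
    sync_local_squad(assistant_id)       # file mutation only
    # Deliberately no live squad update here; that stays a separate,
    # operator-reviewed step.
    return assistant_id
```

Because the function ends after the local sync, the live squad update can only happen when the operator runs it on purpose.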

The fifth hardening decision was to make the dry-run artifact the review contract.

After sync, the next command is:

`python3 pm_vapi_squad.py update agent-configs/dev/vapi/squads/pm_squad_current_reset.squad.json --dry-run -v`

That writes `output/squad_preview.json`.

The preview is the exact payload that would be sent to Vapi, with `_meta` stripped. For the Booking case, the preview should show four members: Orchestrator, Knowledge, Messaging, and Booking. It should show Booking handoff wiring under `assistantOverrides["tools:append"]`. It should not show handoffs under `assistantOverrides.model.tools`.

A small distinction between two artifacts prevented confusion during the session.

`output/squad_preview.json` is for dry-run review.

`output/squad_response.json` is for the live Vapi squad response after a real update or get.

Those two files answer different questions.

The preview answers: what are we about to send?

The response answers: what does Vapi say is live?

If those are blurred together, an operator may inspect the wrong file and think the flow failed.
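As a rough sketch of the preview side, assuming the squad config is a JSON object with a top-level `_meta` key: the helpers `build_preview` and `write_preview` are hypothetical names, not functions from `pm_vapi_squad.py`.

```python
import copy
import json
from pathlib import Path


def build_preview(squad_config: dict) -> dict:
    """Return the payload that would be sent, with local-only _meta removed."""
    payload = copy.deepcopy(squad_config)
    payload.pop("_meta", None)  # assumption: _meta appears only at the top level
    return payload


def write_preview(squad_config: dict, path: str = "output/squad_preview.json") -> dict:
    """Write the dry-run preview to disk; no network call is made here."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    preview = build_preview(squad_config)
    target.write_text(json.dumps(preview, indent=2) + "\n", encoding="utf-8")
    return preview
```

Nothing in this path writes `output/squad_response.json`, so a stale response file cannot be mistaken for a fresh preview.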

The sixth hardening decision was to document the recovery path.

The automatic create-to-sync path was implemented, but the operator later ran the dry-run and found only three members in the preview. That exposed a documentation gap: the manual said the sync should happen automatically but did not clearly show the manual `sync-member` command for recovery.

That is part of hardening too.

A process is not hardened when only the happy path is documented. It is hardened when the operator knows what to check when the output looks wrong.

The recovery rule is now clear:

If the dry-run preview still has the old member count, run `sync-member` manually, rerun the dry-run update, and review `output/squad_preview.json` again before any live squad update.

That is not a workaround. It is an explicit operational escape hatch.
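The review itself can be partly mechanized. The sketch below is a hypothetical checker, not part of the described tooling: `review_preview` and its messages are invented, and it assumes members are dicts and handoff tools carry `"type": "handoff"`.

```python
from typing import List


def review_preview(preview: dict, expected_members: int) -> List[str]:
    """Return a list of problems found in a dry-run preview (empty means OK)."""
    problems = []
    if "_meta" in preview:
        problems.append("_meta should be stripped from the preview")

    members = preview.get("members", [])
    if len(members) != expected_members:
        problems.append(
            f"expected {expected_members} members, found {len(members)}: "
            "run sync-member, then rerun the dry-run update"
        )

    for index, member in enumerate(members):
        overrides = member.get("assistantOverrides", {})
        model_tools = overrides.get("model", {}).get("tools", [])
        # Handoffs belong under tools:append, not model.tools.
        if any(tool.get("type") == "handoff" for tool in model_tools):
            problems.append(f"member {index} has a handoff under model.tools")
    return problems
```

For the Booking case, calling this with `expected_members=4` on a stale three-member preview would return the member-count problem, which points straight back at the recovery rule.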

The seventh hardening decision was to test the behavior at the boundary level.

The focused tests check more than whether a function returns true. They check the operator contract:

Booking sync into a three-member config adds the fourth member.

Generic roles sync membership and warn without adding handoffs.

Strict mode fails for unknown roles.

Booking and Email Capture templates generate the expected role edges.

Generated handoffs are appended through `assistantOverrides["tools:append"]`.

Existing overrides are preserved.

Running sync twice does not duplicate members or handoffs.

Missing snapshots and missing assistant IDs fail clearly.

Dry-run update can read the synced four-member config without calling Vapi.

That last test is important because it validates the safety boundary directly. Dry-run should produce a preview artifact, not mutate the live squad.
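A boundary-level test for that last case might look like the following. It is a sketch under stated assumptions: `update_squad`, its `dry_run` flag, and `ExplodingClient` are hypothetical stand-ins, not the real test suite or client.

```python
import unittest


class ExplodingClient:
    """Fake client that fails the test if any live call is attempted."""

    def update_squad(self, payload: dict) -> dict:
        raise AssertionError("dry-run must not call Vapi")


def update_squad(config: dict, client, dry_run: bool) -> dict:
    """Hypothetical update entry point: preview on dry-run, live call otherwise."""
    payload = {key: value for key, value in config.items() if key != "_meta"}
    if dry_run:
        return payload
    return client.update_squad(payload)


class DryRunBoundaryTest(unittest.TestCase):
    def test_dry_run_reads_four_members_without_live_call(self):
        config = {
            "members": [{"assistantId": f"asst-{n}"} for n in range(4)],
            "_meta": {"currentResetState": {}},
        }
        preview = update_squad(config, ExplodingClient(), dry_run=True)
        self.assertEqual(len(preview["members"]), 4)
        self.assertNotIn("_meta", preview)
```

Saved as a test module, this runs with `python3 -m unittest`. The fake client makes the safety boundary an assertion rather than a convention.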

The broader lesson is this:

A voice squad is not just a set of assistants. It is a chain of state transitions.

Assistant source files become rendered payloads.

Rendered payloads become live assistants.

Live assistant IDs become local squad member references.

Local role metadata becomes squad handoff wiring.

The local squad config becomes a dry-run preview.

The reviewed preview becomes a live squad update.

The live squad response becomes the final verification artifact.

Hardening means each transition has an owner, an artifact, a failure mode, and a recovery path.

That is what makes the workflow trustworthy.

Not because it eliminates human review.

Because it gives human review the right file, at the right time, with a safe command to rerun when the state is stale.

