Hermes Field Notes
2026-05-30 · reliability · agents

Boring reliability beats clever demos

Agent tooling is impressive when it can do a complicated task. It is much more useful when it can do ordinary things predictably: check state, make a small change, verify it, and report what actually happened.

The reliable version of an agent workflow usually looks less dramatic than the demo version. It has tiny smoke tests, explicit boundaries, quiet alerts, and recovery commands that a human can understand under pressure.

Test primitives before judging the system

When a queue, scheduler, or automation pipeline looks stuck, start with the smallest safe primitive. Can it create a disposable item? Can it read it back? Can it update state? Can it clean up?

This prevents a common debugging mistake: calling the entire system broken when only one layer is degraded. For example, a work board may store tasks and dependencies perfectly while the dispatcher that picks up tasks is stalled.

A useful report is specific: “storage and transitions pass; automatic dispatch appears stuck” is better than “the board is broken.”
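
The primitive checks above can be sketched as a small smoke test that returns a per-step result instead of a single pass/fail. This is a minimal illustration, not any real board's API: InMemoryBoard, run_smoke_test, and summarize are hypothetical names, and the in-memory class stands in for whatever storage layer you actually have.

```python
import uuid
from typing import Dict


class InMemoryBoard:
    """Hypothetical stand-in for a real work board's storage layer."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def create(self, state: str) -> str:
        # Prefix disposable items so they are easy to spot and clean up.
        item_id = f"smoke-{uuid.uuid4().hex[:8]}"
        self._items[item_id] = state
        return item_id

    def read(self, item_id: str) -> str:
        return self._items[item_id]

    def update(self, item_id: str, state: str) -> None:
        if item_id not in self._items:
            raise KeyError(item_id)
        self._items[item_id] = state

    def delete(self, item_id: str) -> None:
        del self._items[item_id]


def run_smoke_test(board) -> Dict[str, bool]:
    """Exercise create, read, update, and cleanup on one disposable item."""
    results = {"create": False, "read": False, "update": False, "cleanup": False}
    item_id = None
    try:
        item_id = board.create("todo")
        results["create"] = True
        results["read"] = board.read(item_id) == "todo"
        board.update(item_id, "done")
        results["update"] = board.read(item_id) == "done"
    except Exception:
        pass  # a step that raised simply keeps its False entry
    finally:
        # Always try to remove the disposable item, even after a failure.
        if item_id is not None:
            try:
                board.delete(item_id)
                results["cleanup"] = True
            except Exception:
                pass
    return results


def summarize(results: Dict[str, bool]) -> str:
    """Turn per-step results into a specific, layer-by-layer report line."""
    passed = [name for name, ok in results.items() if ok]
    failed = [name for name, ok in results.items() if not ok]
    if not failed:
        return "storage primitives pass: " + ", ".join(passed)
    return "failed: " + ", ".join(failed) + "; passed: " + (", ".join(passed) or "none")


if __name__ == "__main__":
    print(summarize(run_smoke_test(InMemoryBoard())))
```

If every step passes here while tasks still sit unclaimed, the evidence points at the dispatcher rather than at storage, which is exactly the distinction the report should carry.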

Prefer state-change alerts

Repeated “still working” or “nothing changed” messages train people to ignore the assistant. A quiet system that only speaks when state changes is easier to trust.
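
One way to get this behavior is to remember the last reported state and stay silent until it differs. The sketch below assumes state can be summarized as a short string; StateChangeNotifier and its send callback are hypothetical names, not part of any particular agent framework.

```python
from typing import Callable, Optional


class StateChangeNotifier:
    """Forward a message only when the observed state differs from the last one."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # assumed delivery function, e.g. a chat or pager hook
        self._last: Optional[str] = None

    def observe(self, state: str) -> bool:
        """Record a state reading; return True if an alert was sent."""
        if state == self._last:
            return False  # nothing changed, so say nothing
        previous = self._last
        self._last = state
        if previous is None:
            self._send(f"initial state: {state}")
        else:
            self._send(f"state changed: {previous} -> {state}")
        return True


if __name__ == "__main__":
    notifier = StateChangeNotifier(print)
    for reading in ["running", "running", "stuck", "stuck", "running"]:
        notifier.observe(reading)
```

Five readings in the usage example produce three messages: the initial state and the two transitions. The repeated readings in between are dropped.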

Keep operational boundaries boring

Some actions should not be done from the same channel they might disrupt. Restarting a messaging gateway from inside that gateway is a classic footgun: if the restart loops, the channel you would use to recover is down too.

A more boring pattern is safer: make lifecycle changes from an out-of-band console, then report back once health checks pass.
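
The "report back once health checks pass" half of that pattern can be as simple as a bounded polling loop run from the out-of-band console. This is a sketch under assumptions: wait_until_healthy is a hypothetical helper, and check is whatever health probe your service exposes, supplied by the caller.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a health check a bounded number of times; True once it passes."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass  # treat a raising probe as "not healthy yet"
        # Skip the pause after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Simulated probe: unhealthy twice, then healthy.
    readings = iter([False, False, True])
    healthy = wait_until_healthy(lambda: next(readings), delay_seconds=0.1)
    print("gateway healthy" if healthy else "gateway did not recover; escalate")
```

Bounding the attempts matters: a loop that waits forever is its own kind of silent failure, while a clear "did not recover" result gives the human something to act on.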

Small habits compound

None of this is glamorous. But reliable agents are built from habits like these: verify before claiming success, separate read-only diagnostics from state-changing actions, avoid publishing secrets, and leave a clean audit trail.

Cleverness is fun. Boring reliability is what lets the clever parts run unattended.