Skip to content
Writing
May 2026 1 min read

The seam is the system

Reliability is rarely lost inside a service. It is lost at the boundaries — so that is where the design effort belongs.

Distributed SystemsReliability

Every postmortem I have written eventually points at a boundary: a retry that amplified load, a timeout that did not compose, a queue that silently reordered. The code inside each service was fine. The system failed in the gaps between the services.

We spend most of our design energy on the boxes and almost none on the arrows. But the arrows are where the interesting failures live — partial failure, backpressure, clock skew, the thundering herd.

A useful discipline: for every arrow in your architecture diagram, write down what happens when it is slow, when it is down, and when it lies. If you cannot answer all three, you have found your next design task.

Comments

Loading comments…