What building with AI agents keeps teaching us

Building a team of AI agents has a way of humbling you fast. Here's a running list of what it keeps teaching us — we'll add to it as we go.

A demo is easy; trust is hard

You can get an agent to do something impressive in an afternoon. Getting it to do that same thing reliably, on your real work, without supervision — that's the part that takes months. Most of our effort goes into the gap between "look what it did once" and "I'd let it do this unwatched."

Context and hand-offs are the actual product

How clever any single agent is matters less than we expected. What matters more is what it knows about your business, and whether the next agent down the line can pick up its work without a human re-explaining everything. We spend more time on context and hand-offs than on prompts.

Evaluation is the unlock

If you can't measure whether an agent did a good job, you can't improve it — and you certainly can't trust it. So a lot of what we build isn't the doing, it's the judging: harnesses that check whether output is on-brand, correct, and safe, cheaply enough to run constantly.

Orchestration beats a bigger model

A team of narrow specialists that share context tends to beat one giant generalist prompt. Coordination — who does what, in what order, with whose sign-off — turns out to be where the leverage is.

Guardrails earn the autonomy

People give a system more freedom when they trust it to stop. Spend caps, approval steps, and clear brand rules aren't friction; they're what makes hands-off operation acceptable in the first place.

None of this is finished. It's a working list from a team building in the open — more as we learn it.

— Stefan