Building a team of AI agents has a way of humbling you fast. Here's a running list of what it keeps teaching us — we'll add to it as we go.
A demo is easy; trust is hard
You can get an agent to do something impressive in an afternoon. Getting it to do that same thing reliably, on your real work, without supervision — that's the part that takes months. Most of our effort goes into the gap between "look what it did once" and "I'd let it do this unwatched."
Context and hand-offs are the actual product
How clever any single agent is matters less than we expected. What matters more: what it knows about your business, and whether the next agent down the line can pick up its work without a human re-explaining everything. We spend more time on context and hand-offs than on prompts.
Evaluation is the unlock
If you can't measure whether an agent did a good job, you can't improve it — and you certainly can't trust it. So a lot of what we build isn't the doing, it's the judging: harnesses that check whether output is on-brand, correct, and safe, cheaply enough to run constantly.
Orchestration beats a bigger model
A team of narrow specialists that share context tends to beat one giant generalist prompt. Coordination — who does what, in what order, with whose sign-off — turns out to be where the leverage is.
Guardrails earn the autonomy
People give a system more freedom when they trust it to stop. Spend caps, approval steps, and clear brand rules aren't friction; they're what makes hands-off operation acceptable in the first place.
None of this is finished. It's a working list from a team building in the open — more as we learn it.
— Stefan