Here's a pattern we keep seeing: someone creates a single agent, gives it Sonnet, writes a vague task, and hopes it figures things out. It runs for 10 minutes, burns through credits, and produces something half-right.
Then someone else breaks the same job into three Haiku agents in a workflow. Each one has a specific task, specific tools, and a focused prompt. The whole thing runs in a fraction of the time, costs less, and produces better results.
The second approach wins every time.
Why smaller models work better for agents
A well-prompted Haiku agent with a narrow task doesn't need to be smart. It needs to be reliable. When the prompt says "extract all URLs from this page and output them as JSON," there's one right answer. Haiku nails it. Sonnet also nails it — but costs more to get the same result.
The bigger model helps when the task is ambiguous. But ambiguity in agent tasks is usually a prompt problem, not a model problem. If your agent needs Sonnet to understand what to do, the prompt isn't specific enough.
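To make "one right answer" concrete, here's what the URL-extraction task pins down. This is a deterministic sketch in plain Python, not what the agent runs internally; the point is that the task's output contract is unambiguous, so any reliable model (or no model at all) produces the same result:

```python
import json
import re

def extract_urls(page_text: str) -> str:
    """Narrow, well-defined task: find all URLs, output them as JSON."""
    urls = re.findall(r"https?://[^\s\"'<>]+", page_text)
    # Deduplicate and sort so the output is fully deterministic.
    return json.dumps(sorted(set(urls)))

page = 'See <a href="https://example.com/a">A</a> and https://example.org/b too.'
print(extract_urls(page))  # → ["https://example.com/a", "https://example.org/b"]
```

When the prompt is this specific, there's nothing left for a bigger model to figure out.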
The cost math
Haiku is roughly 1/30th the cost of Sonnet per token. A three-step workflow of Haiku agents costs less than a single Sonnet run doing the same work. And because each agent has a narrower scope, each step tends to use fewer tokens, so the gap widens further.
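A quick back-of-the-envelope version of that math, using illustrative per-token prices (the specific numbers and token counts below are hypothetical; real prices vary by model version):

```python
# Hypothetical prices in dollars per 1M tokens, chosen to reflect a ~30x gap.
HAIKU_PRICE = 0.25
SONNET_PRICE = 7.50

def run_cost(price_per_million: float, tokens: int) -> float:
    """Cost of a run at the given per-million-token price."""
    return price_per_million * tokens / 1_000_000

# One broad Sonnet run vs three narrow Haiku steps. The narrower scope means
# each Haiku step uses fewer tokens than the monolithic run.
sonnet_total = run_cost(SONNET_PRICE, 60_000)
haiku_total = sum(run_cost(HAIKU_PRICE, t) for t in (12_000, 15_000, 8_000))

print(f"Sonnet: ${sonnet_total:.2f}")   # → Sonnet: $0.45
print(f"Haiku x3: ${haiku_total:.4f}")  # → Haiku x3: $0.0088
```

Under these assumptions the three-step Haiku workflow comes in around 50x cheaper: the price gap compounds with the token savings from narrower scope.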
Real example: a competitor research workflow.
- Step 1: Haiku agent scrapes a list of competitor URLs from search results
- Step 2: Haiku agent visits each URL and extracts key data points
- Step 3: Haiku agent summarises findings into a structured report
Three focused tasks. Each one straightforward enough that Haiku handles it without breaking a sweat. Total cost: a fraction of what a single Sonnet agent would burn trying to do all three at once.
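The shape of that workflow can be sketched as a simple pipeline, where each step is one narrow agent and its output feeds the next. The functions below are hypothetical stand-ins for the model calls (in a real workflow each would be a Haiku agent with its own focused prompt and tools):

```python
from typing import Callable

def scrape_competitor_urls(query: str) -> list[str]:
    # Step 1: a Haiku agent would search and return competitor URLs.
    return ["https://example.com/rival-a", "https://example.com/rival-b"]

def extract_data_points(urls: list[str]) -> list[dict]:
    # Step 2: a Haiku agent would visit each URL and pull out key fields.
    return [{"url": u, "pricing": "unknown"} for u in urls]

def summarise(points: list[dict]) -> str:
    # Step 3: a Haiku agent would turn the structured data into a report.
    return f"Compared {len(points)} competitors."

# Output flows from one step to the next; no step needs to know the others exist.
steps: list[Callable] = [scrape_competitor_urls, extract_data_points, summarise]
result = "competitor pricing"
for step in steps:
    result = step(result)

print(result)  # → Compared 2 competitors.
```

Each function only has to be reliable at its one job, which is exactly the regime where a small model holds up.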
When to use Sonnet
Sonnet earns its cost when the task genuinely requires reasoning across complex, ambiguous inputs. Code review where context matters. Analysis that requires judgement. Tasks where the "right answer" depends on nuance.
But those tasks are rarer than you'd think. Most agent work is structured: fetch this, parse that, format the output. Haiku territory.
Workflows make this practical
The reason people default to one big Sonnet agent is that breaking work into steps used to be tedious: you'd have to create each agent, wire them together, and handle passing output from one step to the next.
Svortie workflows handle that. Describe the job, the system creates specialised agents for each step, and output flows from one to the next. You get the cost benefits of small models without the coordination overhead.
The principle
When you reach for the model picker, start with Haiku. If the task is well-defined — and it should be — Haiku will handle it. Save Sonnet for the jobs that genuinely need it.
Compose cheap specialists. Not expensive generalists.