AI-assisted development has mostly followed a simple pattern: one model, one prompt, one task at a time. Tools like Cursor or Google AI Studio made that workflow fast enough to be useful.
But as systems became more interconnected, that approach started to break down: not because the models weren’t capable, but because the structure around them was too limited.
This post looks at what changed when we moved from that setup to a multi-agent system built on Oh My OpenAgent.
What OpenAgent Actually Is
Oh My OpenAgent is an open-source multi-agent orchestration framework. It doesn’t introduce a new model. Instead, it sits above existing models and coordinates them.
The system decomposes a task, assigns parts of it to different agents, and routes each agent to a model that fits the job. Planning, execution, and validation are separated rather than handled inside a single prompt.
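A minimal sketch of that separation follows. It assumes each agent is just a Python callable standing in for a model call. `Subtask`, `run_task`, and the toy agents are hypothetical names for illustration. They are not Oh My OpenAgent's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch, not Oh My OpenAgent's API: agents are plain callables.


@dataclass
class Subtask:
    domain: str        # which execution agent should handle it, e.g. "backend"
    description: str


@dataclass
class Result:
    subtask: Subtask
    output: str


# Hypothetical agent signatures; each would wrap a call to some model.
Planner = Callable[[str], list[Subtask]]
Executor = Callable[[Subtask], str]
Validator = Callable[[list[Result]], list[str]]


def run_task(
    task: str,
    plan: Planner,
    executors: dict[str, Executor],
    validate: Validator,
) -> tuple[list[Result], list[str]]:
    """Plan once, run each subtask on its domain agent, then validate all outputs together."""
    subtasks = plan(task)                                            # planning phase
    results = [Result(s, executors[s.domain](s)) for s in subtasks]  # execution phase
    issues = validate(results)                                       # validation sees every output at once
    return results, issues


# Toy stand-ins for model-backed agents.
def toy_plan(task: str) -> list[Subtask]:
    return [
        Subtask("backend", "add field to endpoint"),
        Subtask("frontend", "update mutation payload"),
    ]


toy_executors: dict[str, Executor] = {
    "backend": lambda s: f"backend: {s.description}",
    "frontend": lambda s: f"frontend: {s.description}",
}


def toy_validate(results: list[Result]) -> list[str]:
    domains = {r.subtask.domain for r in results}
    return [] if {"backend", "frontend"} <= domains else ["a layer was skipped"]


results, issues = run_task("ship feature", toy_plan, toy_executors, toy_validate)
print([r.output for r in results], issues)
```

The relevant design choice is that validation runs over all results in one pass. That pass is where inconsistencies between layers can surface.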
We didn’t build it in-house. We adopted it and integrated it into an existing stack that already included FastAPI, React, and multiple LLM providers.
Where the Single-Model Workflow Broke
The limitations showed up in tasks that crossed boundaries.
A typical example was a feature that required:
• updating a FastAPI endpoint
• modifying frontend state with TanStack Query
• adjusting an AI pipeline using DSPy
With a single model, the result looked complete at first: the code compiled and the API responded, but something subtle would be wrong. In one case, the frontend assumed an optimistic update had succeeded while the backend silently failed because of a schema mismatch. This is the same TanStack Query mismatch I covered in the previous post, and it showed up again here. The single-model workflow had no way to catch it; the multi-agent setup did.
Fixing it took multiple prompt cycles, and each one meant reintroducing the context manually.
The issue wasn’t that the model couldn’t solve it.
It was that the model solved each piece in isolation.
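A separate validation step can run checks that a single-model flow never performs. One example is a contract comparison between layers. The sketch below is minimal and assumes the field sets have already been extracted, for example from a Pydantic model on the backend and the request type the frontend mutation sends. The field names and the `contract_mismatches` helper are hypothetical.

```python
# Hypothetical helper and field names, for illustration only.
def contract_mismatches(
    required: set[str], optional: set[str], sent: set[str]
) -> dict[str, set[str]]:
    """Compare the fields a client sends with what the server schema accepts."""
    accepted = required | optional
    return {
        "unknown": sent - accepted,   # fields the server may ignore or reject
        "missing": required - sent,   # required fields the client never sends
    }


# Hypothetical drift: the backend renamed `due_date` to `due_at`,
# but the frontend mutation still sends the old name.
backend_required = {"id", "title", "due_at"}
backend_optional = {"notes"}
frontend_payload = {"id", "title", "due_date"}

report = contract_mismatches(backend_required, backend_optional, frontend_payload)
print(report)  # {'unknown': {'due_date'}, 'missing': {'due_at'}}
```

An empty report doesn't prove the layers agree, because types and semantics can still differ. A non-empty report, though, flags the kind of silent mismatch that lets an optimistic update look successful.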
How the Multi-Agent Setup Was Structured
After moving to OpenAgent, the workflow changed from one model doing everything to a set of agents with defined roles.
In practice, it looked like this:
• A planning agent running on Claude Sonnet handled task decomposition and dependency mapping
• Execution agents were split by domain:
◦ backend (FastAPI, database logic)
◦ frontend (React, state handling)
◦ AI pipeline (DSPy and prompt orchestration)
• A validation step (also using Claude Sonnet) reviewed outputs and resolved inconsistencies
For lighter transformations, faster models were used where latency mattered, but all planning and validation ran on Claude Sonnet.
The key change wasn’t the models themselves. It was that no single model had to do everything anymore.
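The role-to-model split can be expressed as a small routing table. This is a sketch with placeholder identifiers: `claude-sonnet`, `fast-model`, and `default-execution-model` are not real provider model IDs. The post doesn't say which model the execution agents used, so here they fall back to an assumed default.

```python
from enum import Enum


class Role(Enum):
    PLANNER = "planner"
    BACKEND = "backend"
    FRONTEND = "frontend"
    AI_PIPELINE = "ai_pipeline"
    LIGHT_TRANSFORM = "light_transform"
    VALIDATOR = "validator"


# Placeholder identifiers, not real provider model IDs.
STRONG_MODEL = "claude-sonnet"                        # planning and validation
FAST_MODEL = "fast-model"                             # latency-sensitive, lighter transformations
DEFAULT_EXECUTION_MODEL = "default-execution-model"   # assumption: not specified in the post

ROUTES: dict[Role, str] = {
    Role.PLANNER: STRONG_MODEL,
    Role.VALIDATOR: STRONG_MODEL,
    Role.LIGHT_TRANSFORM: FAST_MODEL,
}


def model_for(role: Role) -> str:
    """Return the model a role is routed to; unlisted roles use the default."""
    return ROUTES.get(role, DEFAULT_EXECUTION_MODEL)


for role in Role:
    print(f"{role.value:>16} -> {model_for(role)}")
```

Keeping the table in one place makes the post's rule easy to audit: planning and validation always run on the stronger model.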
A Concrete Example: Where It Helped
One example was an email automation workflow running through Temporal.
Previously, a single-model flow generated the email logic and retry behavior. It worked under normal conditions, but when reply parsing failed, the retry mechanism triggered duplicate sends. Nothing crashed, but the system behaved incorrectly.
With the multi-agent setup:
• the planning step identified retry and idempotency as separate concerns
• one agent handled email generation
• another handled retry logic and validation
That separation made the issue visible earlier, before it showed up in production.
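Treating idempotency as its own concern usually means giving each logical send a stable key. A retry can then recognize that the email already went out. The sketch below is plain Python, not the Temporal SDK. The in-memory dict stands in for a durable store, and `EmailSender` and its key scheme are hypothetical. Because Temporal retries failed activities, an activity that sends email generally needs this kind of guard.

```python
import hashlib

# Hypothetical sketch, not the Temporal SDK.


class EmailSender:
    """Sends each logical email at most once per idempotency key."""

    def __init__(self) -> None:
        self.sent: dict[str, str] = {}            # key -> message id (stand-in for a durable store)
        self.outbox: list[tuple[str, str]] = []   # emails that actually went out

    @staticmethod
    def idempotency_key(thread_id: str, step: str) -> str:
        # Derived from what the email is, not from the retry attempt,
        # so every retry of the same step produces the same key.
        return hashlib.sha256(f"{thread_id}:{step}".encode()).hexdigest()

    def send(self, thread_id: str, step: str, to: str, body: str) -> str:
        key = self.idempotency_key(thread_id, step)
        if key in self.sent:
            return self.sent[key]                 # already sent: report it, don't resend
        message_id = f"msg-{len(self.outbox) + 1}"
        self.outbox.append((to, body))            # stand-in for the real provider call
        self.sent[key] = message_id
        return message_id


sender = EmailSender()
# Simulate reply parsing failing and the step being retried three times.
for attempt in range(3):
    sender.send("thread-42", "follow-up", "lead@example.com", "Following up on our call")
print(len(sender.outbox))  # 1
```

In a real system, the check and the send must be atomic with respect to a durable store, or the email provider must accept an idempotency key itself. Otherwise, a crash between sending and recording the key can still produce a duplicate.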
What Actually Changed (With Numbers)
Instead of broad percentages, we tracked a small set of features over one week:
• 12 features (mix of API changes, UI updates, and AI pipeline work)
• Measured:
◦ number of prompt iterations
◦ total time from start to working implementation
Iteration cycles dropped from an average of ~6 per feature to ~3.
This wasn’t because the model got “smarter,” but because fewer loops were needed to reconcile mismatches across layers.
Development time for mid-sized features (touching 2–3 parts of the system) went from roughly 2–3 hours to about 1.5–2 hours.
These aren’t universal numbers. They reflect one team, one stack, and a limited sample. But the pattern was consistent across tasks that involved multiple components.
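Tracking the two measurements listed above takes very little machinery. The sketch below assumes one record per feature. `FeatureRecord`, `summarize`, and the sample values are made up for illustration. They are not the data behind the reported numbers.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FeatureRecord:            # hypothetical record type
    name: str
    prompt_iterations: int      # prompt cycles until the feature worked
    minutes_to_working: float   # start to working implementation


def summarize(records: list[FeatureRecord]) -> dict[str, float]:
    """Average the two tracked measurements across features."""
    return {
        "avg_iterations": mean(r.prompt_iterations for r in records),
        "avg_minutes": mean(r.minutes_to_working for r in records),
    }


# Illustrative values only, not the team's data.
sample = [
    FeatureRecord("feature-a", 3, 100.0),
    FeatureRecord("feature-b", 5, 140.0),
]
print(summarize(sample))
```

With a sample of around a dozen features, it can also be worth looking at the per-feature spread, because one large outlier can shift an average noticeably.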
Trade-offs (What Got Worse)
The multi-agent setup introduced real costs.
Token usage increased noticeably.
Running multiple agents in parallel meant more total tokens. In practice, this translated to roughly 1.5–2× higher usage per complex task.
Debugging became less linear.
When something failed, it wasn’t always clear which agent caused it. Instead of inspecting a single response, debugging required tracing execution across multiple steps and logs.
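One way to make multi-agent logs traceable is to tag every log line with a per-task trace ID and the name of the agent that emitted it. The sketch below uses Python's standard `logging` and `contextvars` modules. `run_agent`, `run_traced_task`, `CollectingHandler`, and the log format are hypothetical choices, not part of OpenAgent.

```python
import contextvars
import logging
import uuid
from typing import Callable

# Hypothetical helpers; only logging, contextvars, and uuid are real stdlib APIs.
trace_id = contextvars.ContextVar("trace_id", default="-")
agent_name = contextvars.ContextVar("agent_name", default="-")


class TraceFilter(logging.Filter):
    """Copy the current trace ID and agent name onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id.get()
        record.agent = agent_name.get()
        return True


class CollectingHandler(logging.Handler):
    """Keep formatted lines in memory so one task's trace can be filtered later."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


logger = logging.getLogger("agents")
logger.setLevel(logging.INFO)
logger.propagate = False
collector = CollectingHandler()
collector.addFilter(TraceFilter())
collector.setFormatter(logging.Formatter("%(trace_id)s %(agent)s %(levelname)s %(message)s"))
logger.addHandler(collector)


def run_agent(name: str, step: Callable[[], object]) -> object:
    token = agent_name.set(name)
    try:
        logger.info("start")
        result = step()
        logger.info("done")
        return result
    except Exception:
        logger.exception("failed")   # trace ID and agent name show which step broke
        raise
    finally:
        agent_name.reset(token)


def run_traced_task(steps: list[tuple[str, Callable[[], object]]]) -> str:
    """Run agent steps under one trace ID and return that ID."""
    tid = uuid.uuid4().hex[:8]
    token = trace_id.set(tid)
    try:
        for name, step in steps:
            run_agent(name, step)
    finally:
        trace_id.reset(token)
    return tid


tid = run_traced_task([("planner", lambda: "plan"), ("backend", lambda: "code")])
for line in collector.lines:
    if line.startswith(tid):
        print(line)
```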
Setup complexity increased.
Defining agent roles, choosing models, and structuring workflows added upfront overhead that didn’t exist in single-model setups.
When Multi-Agent Doesn’t Help
For small, well-scoped tasks, the overhead isn’t worth it.
If the work is:
• writing a simple function
• fixing a small bug
• generating a single component
a single model is faster and simpler.
The multi-agent approach only pays off when:
• tasks span multiple parts of the system
• coordination matters more than raw generation
• failures are likely to occur between components
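The checklist above can be written down as a rough heuristic. `TaskProfile`, its fields, and the choice to require both conditions are assumptions for illustration, not a rule from the post.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:                 # hypothetical task description
    components: set[str]           # parts of the system the task touches, e.g. {"backend", "frontend"}
    shared_contracts: int          # interfaces between those parts that must stay in sync


def worth_multi_agent(task: TaskProfile) -> bool:
    """Hypothetical heuristic: reach for multiple agents only when the work crosses seams."""
    spans_multiple_parts = len(task.components) >= 2
    failures_likely_between_parts = task.shared_contracts > 0
    return spans_multiple_parts and failures_likely_between_parts


print(worth_multi_agent(TaskProfile({"frontend"}, 0)))                            # False
print(worth_multi_agent(TaskProfile({"backend", "frontend", "ai_pipeline"}, 2)))  # True
```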
How the Work Changed
The biggest shift wasn’t speed. It was how work was approached.
Instead of trying to write the perfect prompt, the focus moved to structuring the problem correctly:
• what needs to happen first
• what depends on what
• where failures are likely
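Writing the dependencies down explicitly is enough to derive both an execution order and the steps that can run in parallel. The sketch below uses Python's standard `graphlib` module (3.9+). The step names are a hypothetical breakdown of the cross-layer feature described earlier, not output from the planning agent.

```python
from graphlib import TopologicalSorter

# Hypothetical breakdown: each key lists the steps that must finish before it can start.
deps = {
    "update API schema": set(),
    "update FastAPI endpoint": {"update API schema"},
    "update TanStack Query mutation": {"update API schema"},
    "adjust DSPy pipeline": {"update FastAPI endpoint"},
    "validate end to end": {"update TanStack Query mutation", "adjust DSPy pipeline"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
    batches.append(ready)               # these could be handed to agents in parallel
    sorter.done(*ready)

for i, batch in enumerate(batches, 1):
    print(i, batch)
```

The second batch contains both the endpoint and the mutation change. Once the schema is settled, they can proceed side by side.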
The models didn’t become more powerful.
The system around them became more aligned with how the work actually happens.