Agentic AI swarms are just as dysfunctional as human managers, recreating many of the inefficiencies AI aims to eliminate, according to a just-released report, The Organizational Physics of Multi-Agent AI. This doesn’t mean AI is inefficient, but that it is susceptible to the same flaws that affect human middle managers.
Specifically, “AI systems fail for the same structural reasons as human organizations, despite the removal of every human-specific causal factor,” report author Jeremy McEntire, head of engineering at Wander and independent researcher for Cage & Mirror Publishing, points out.
“The day-to-day implication is this: multi-agent AI systems deployed in manufacturing workflows will tend to optimize for internal process compliance rather than external outcome accuracy—exactly what the pipeline architecture did in our study,” McEntire tells GEN. “In a batch manufacturing context, that means an AI coordination layer could approve a deviation, flag an issue as resolved, or clear a batch record not because the underlying issue is actually resolved, but because the documentation fits the channel. The agents are reviewing each other’s work against the same compressed criteria. Nothing fails obviously. The batch ships.”
Therefore, McEntire says, “For high-consequence manufacturing applications [like biopharma manufacturing], integrated success criteria outperform external guardrails. An AI system whose success is anchored directly to batch yield, deviation rate, or release outcomes—not to process-adherence metrics—has a selection environment that keeps it honest. External compliance checklists bolted on after the fact don’t provide the same guarantee, because a sufficiently degraded channel will satisfy the checklist while missing the point.
“The same compression and selection dynamics that produce dysfunction in research institutions and biotech organizations appear identically in AI pipelines—because AI was trained on the outputs of those organizations,” McEntire says.
AI “committees” fail
McEntire used four different AI-based approaches to complete 28 programming tasks.
When a single AI agent was used, all of the tasks were completed. However, when hierarchical agents ("worker" agents reporting to a single overarching agent) were used, completion dropped to 64%. Stigmergic agents (which coordinate indirectly with each other through a shared environment) completed only 32% of the tasks. Pipeline agents, which work in sequence, fared worst: they made plans but completed nothing, and they proved ill-suited for complex tasks requiring near real-time responses.
Basically, multi-agent AI systems face a coordination ceiling akin to that of human collaboration. Below roughly 25 participants, a single AI agent (or person) can maintain context without much coordination overhead. Above that threshold, work must be distributed among multiple agents, which leads to operational degradation.
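One way to see why a coordination ceiling exists (this illustration is not from the report): the number of pairwise communication channels among agents grows quadratically with team size, so overhead that is negligible for a handful of agents balloons past the ~25-participant mark.

```python
def pairwise_channels(n: int) -> int:
    """Distinct communication pairs among n agents: n choose 2."""
    return n * (n - 1) // 2

# A single agent needs no coordination channels at all.
assert pairwise_channels(1) == 0
# A small team stays manageable...
assert pairwise_channels(5) == 10
# ...but near the ~25-agent ceiling, overhead has grown 30-fold...
assert pairwise_channels(25) == 300
# ...and it keeps growing quadratically from there.
assert pairwise_channels(50) == 1225
```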
Other researchers corroborate this. For example, one team reports that every multi-agent variant it tested degraded sequential reasoning performance by 39–70%; generally, those failures are system design issues. Another group reports that "large language model teams underperform their best individual member by 8–38%…by averaging expert and non-expert views rather than deferring to expertise."
In two subsequent studies, McEntire found that adding anti-dysfunction mechanisms to pipeline swarms actually produced the dysfunction those mechanisms were designed to prevent, including rejections that weren’t based on facts, governance conflicts, and verification theater (results that appear reliable, but are incomplete, unverified, or misleading).
“The failure modes we documented in AI agent swarms are information-theoretic, not implementation-specific,” he says. For example, when multiple AI agents work together using summaries rather than the full data sets, they often fail to coordinate properly because they have lost potentially important details and their nuances. That underscores why single-agent AIs outperform multi-agent AI options.
“The single-agent result is also relevant here: for bounded, well-defined tasks—drafting batch records, flagging anomalies in process data, summarizing deviation history—single agents perform strongly and reliably,” McEntire stresses.
In contrast, “The coordination complexity of multi-agent pipelines adds risk without adding proportionate capability, at least at current maturity levels.
“The environments where you’d want to be most cautious are exactly the ones biopharma operates in: high formalization, high documentation load, long handoff chains between functions.”
McEntire advises biopharma manufacturers using AI to employ “the fewest agents capable of completing a task,” and, more broadly, to “be skeptical about claims you’re hearing about agentic AI.”
