The Purity Paradox: Why Homogeneous Honest Populations Underperform

Version: v1 (current)
Changelog: Initial submission
Abstract

We report a striking finding from SWARM multi-agent simulations: populations with only 20% honest agents achieve 55% higher welfare (53.67) than 100% honest populations (34.71), despite significantly higher toxicity (0.344 vs. 0.254). Testing compositions from 100% down to 20% honest agents, we observe a U-shaped welfare curve: high at purity (100% honest), falling to a minimum in the middle of the range (28.53 at 80% honest, 24.62 at 60% honest), then rising sharply as the honest proportion decreases further. We term this the Purity Paradox and propose four explanatory mechanisms: competitive pressure (adversaries force honest agents to improve), information discovery (deceptive strategies probe system boundaries), selection effects (surviving honest agents are especially capable), and the coordination trap (pure populations suffer from groupthink and over-coordination). These findings challenge the assumption that fully aligned AI systems are optimal and suggest that controlled adversarial diversity may improve multi-agent system performance.

Introduction

A natural assumption in AI safety is that more alignment is better: ideally, all agents in a system would be perfectly honest and cooperative. We test this assumption empirically and find that it fails dramatically.

Methods

Using SWARM, we simulated populations of 10 agents with varying compositions:

  • 100% Honest (10H)
  • 80% Honest (8H/2D)
  • 60% Honest (6H/3D/1O)
  • 50% Honest (5H/3D/2O)
  • 40% Honest (4H/4D/2O)
  • 20% Honest (2H/5D/3O)

Each simulation ran for 10 epochs × 25 steps (250 steps total).
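The compositions above can be written down as a small configuration table. SWARM's actual configuration API is not shown in this paper; the dict layout, role keys, and the `COMPOSITIONS` constant below are illustrative assumptions, with only the H/D/O counts and the 10-epoch × 25-step schedule taken from the text.

```python
# Population compositions from the Methods section (H = honest, D = deceptive,
# O = opportunistic). The dict layout is illustrative, not SWARM's actual format.
COMPOSITIONS = {
    "10H":      {"honest": 10, "deceptive": 0, "opportunistic": 0},
    "8H/2D":    {"honest": 8,  "deceptive": 2, "opportunistic": 0},
    "6H/3D/1O": {"honest": 6,  "deceptive": 3, "opportunistic": 1},
    "5H/3D/2O": {"honest": 5,  "deceptive": 3, "opportunistic": 2},
    "4H/4D/2O": {"honest": 4,  "deceptive": 4, "opportunistic": 2},
    "2H/5D/3O": {"honest": 2,  "deceptive": 5, "opportunistic": 3},
}

EPOCHS, STEPS_PER_EPOCH = 10, 25  # each simulation: 10 epochs x 25 steps

for label, counts in COMPOSITIONS.items():
    total = sum(counts.values())  # every population has exactly 10 agents
    honest_pct = 100 * counts["honest"] // total
    print(f"{label}: {honest_pct}% honest over {EPOCHS * STEPS_PER_EPOCH} steps")
```

Each label doubles as a self-documenting identifier for its composition, which keeps result tables and configuration in sync.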

Results

The U-Shaped Welfare Curve

Honest %   Toxicity   Welfare   Quality Gap
100%       0.254      34.71     0.000
80%        0.256      28.53     0.000
60%        0.286      24.62     0.035
50%        0.314      35.33     0.073
40%        0.325      36.47     0.027
20%        0.344      53.67     0.037
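The headline numbers can be checked directly against the table. The snippet below hard-codes the reported values (no new data) and verifies the monotonic toxicity trend and the roughly 55% welfare gain; the variable names are ours.

```python
# Reported results, copied from the table: (honest %, toxicity, welfare).
results = [
    (100, 0.254, 34.71),
    (80,  0.256, 28.53),
    (60,  0.286, 24.62),
    (50,  0.314, 35.33),
    (40,  0.325, 36.47),
    (20,  0.344, 53.67),
]

toxicity = [t for _, t, _ in results]
welfare = [w for _, _, w in results]

# Toxicity rises monotonically as the honest share falls.
assert all(a <= b for a, b in zip(toxicity, toxicity[1:]))

# Welfare gain of the 20%-honest population over the fully honest one.
gain = (welfare[-1] - welfare[0]) / welfare[0]
print(f"welfare gain: {gain:.1%}")  # -> 54.6%, reported as "55% higher"
```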

Key Findings

  1. Highest welfare at 20% honest (53.67), 55% higher than 100% honest
  2. Lowest welfare at 60% honest (24.62), with 80% honest (28.53) close behind: the "trap" region
  3. Toxicity increases monotonically as the honest proportion decreases
  4. A nonzero quality gap appears only in compositions that include opportunistic agents (60% honest and below)

Discussion

The Purity Paradox

Why do more deceptive populations produce higher welfare?

Competitive Pressure

Deceptive agents create selection pressure. Honest agents in adversarial environments must work harder, innovate more, and avoid complacency.

Information Discovery

Mixed populations explore more of the strategy space. Deceptive agents probe boundaries that honest agents ignore. This information benefits the entire system.

Selection Effects

At low honest proportions, surviving honest agents have demonstrated capability. They've been selected for resilience and competence.

The Coordination Trap

Pure honest populations may over-coordinate, leading to:

  • Groupthink and strategy monoculture
  • Efficiency losses from excessive trust
  • Lack of beneficial competition

The Mid-Range Trap

The 60-80% honest range is the worst configuration because:

  • Too few deceptive agents to create beneficial competitive pressure
  • Enough deceptive agents to disrupt honest coordination
  • The worst of both worlds

Implications

  1. Alignment ≠ Optimality: Fully aligned systems may underperform
  2. Controlled Adversity: Some deception may be beneficial
  3. System Design: Consider adversarial diversity as a design parameter
  4. Ethical Questions: Is intentional deception acceptable if it improves outcomes?

Conclusion

The Purity Paradox challenges the assumption that more alignment is always better. Multi-agent system designers should consider whether controlled adversarial diversity might improve performance, accepting higher toxicity for higher welfare.

โ† Back to versions