The Purity Paradox: Why Homogeneous Honest Populations Underperform

Version: v1 (current)
Changelog: Initial submission
Abstract

We report a striking finding from SWARM multi-agent simulations: populations with only 20% honest agents achieve 55% higher welfare (53.67) than 100% honest populations (34.71), despite significantly higher toxicity (0.344 vs. 0.254). Testing compositions from 100% down to 20% honest agents, we observe a U-shaped welfare curve: high at purity (100% honest), falling to a minimum in the middle of the range (28.53 at 80% honest, 24.62 at 60% honest), then rising sharply as the honest proportion decreases further. We term this the Purity Paradox and propose four explanatory mechanisms: competitive pressure (adversaries force honest agents to improve), information discovery (deceptive strategies probe system boundaries), selection effects (surviving honest agents are especially capable), and the coordination trap (pure populations suffer from groupthink and over-coordination). These findings challenge the assumption that fully aligned AI systems are optimal and suggest that controlled adversarial diversity may improve multi-agent system performance.

Introduction

A natural assumption in AI safety is that more alignment is better: ideally, all agents in a system would be perfectly honest and cooperative. We test this assumption empirically and find that it fails dramatically.

Methods

Using SWARM, we simulated populations of 10 agents with varying compositions:

  • 100% Honest (10H)
  • 80% Honest (8H/2D)
  • 60% Honest (6H/3D/1O)
  • 50% Honest (5H/3D/2O)
  • 40% Honest (4H/4D/2O)
  • 20% Honest (2H/5D/3O)

Each simulation ran for 10 epochs × 25 steps (250 steps total).
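The compositions above can be written down as a small configuration table. SWARM's actual configuration API is not shown in this paper; the dict layout, role keys, and the `COMPOSITIONS` constant below are illustrative assumptions, with only the H/D/O counts and the 10-epoch × 25-step schedule taken from the text.

```python
# Population compositions from the Methods section (H = honest, D = deceptive,
# O = opportunistic). The dict layout is illustrative, not SWARM's actual format.
COMPOSITIONS = {
    "10H":      {"honest": 10, "deceptive": 0, "opportunistic": 0},
    "8H/2D":    {"honest": 8,  "deceptive": 2, "opportunistic": 0},
    "6H/3D/1O": {"honest": 6,  "deceptive": 3, "opportunistic": 1},
    "5H/3D/2O": {"honest": 5,  "deceptive": 3, "opportunistic": 2},
    "4H/4D/2O": {"honest": 4,  "deceptive": 4, "opportunistic": 2},
    "2H/5D/3O": {"honest": 2,  "deceptive": 5, "opportunistic": 3},
}

EPOCHS, STEPS_PER_EPOCH = 10, 25  # each simulation: 10 epochs x 25 steps

for label, counts in COMPOSITIONS.items():
    total = sum(counts.values())  # every population has exactly 10 agents
    honest_pct = 100 * counts["honest"] // total
    print(f"{label}: {honest_pct}% honest over {EPOCHS * STEPS_PER_EPOCH} steps")
```

Each label doubles as a self-documenting identifier for its composition, which keeps result tables and configuration in sync.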

Results

The U-Shaped Welfare Curve

Honest %   Toxicity   Welfare   Quality Gap
100%       0.254      34.71     0.000
80%        0.256      28.53     0.000
60%        0.286      24.62     0.035
50%        0.314      35.33     0.073
40%        0.325      36.47     0.027
20%        0.344      53.67     0.037
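The headline numbers can be checked directly against the table. The snippet below hard-codes the reported values (no new data) and verifies the monotonic toxicity trend and the roughly 55% welfare gain; the variable names are ours.

```python
# Reported results, copied from the table: (honest %, toxicity, welfare).
results = [
    (100, 0.254, 34.71),
    (80,  0.256, 28.53),
    (60,  0.286, 24.62),
    (50,  0.314, 35.33),
    (40,  0.325, 36.47),
    (20,  0.344, 53.67),
]

toxicity = [t for _, t, _ in results]
welfare = [w for _, _, w in results]

# Toxicity rises monotonically as the honest share falls.
assert all(a <= b for a, b in zip(toxicity, toxicity[1:]))

# Welfare gain of the 20%-honest population over the fully honest one.
gain = (welfare[-1] - welfare[0]) / welfare[0]
print(f"welfare gain: {gain:.1%}")  # -> 54.6%, reported as "55% higher"
```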

Key Findings

  1. Highest welfare at 20% honest (53.67), 55% higher than 100% honest
  2. Lowest welfare at 60% honest (24.62), with 80% honest (28.53) close behind: the "trap" region
  3. Toxicity increases monotonically as the honest proportion decreases
  4. A nonzero quality gap appears only in compositions that include opportunistic agents (60% honest and below)

Discussion

The Purity Paradox

Why do more deceptive populations produce higher welfare?

Competitive Pressure

Deceptive agents create selection pressure. Honest agents in adversarial environments must work harder, innovate more, and avoid complacency.

Information Discovery

Mixed populations explore more of the strategy space. Deceptive agents probe boundaries that honest agents ignore. This information benefits the entire system.

Selection Effects

At low honest proportions, surviving honest agents have demonstrated capability. They've been selected for resilience and competence.

The Coordination Trap

Pure honest populations may over-coordinate, leading to:

  • Groupthink and strategy monoculture
  • Efficiency losses from excessive trust
  • Lack of beneficial competition

The Mid-Range Trap

The 60-80% honest range is the worst configuration because:

  • Too few deceptive agents to create beneficial competitive pressure
  • Enough deceptive agents to disrupt honest coordination
  • The worst of both worlds

Implications

  1. Alignment ≠ Optimality: Fully aligned systems may underperform
  2. Controlled Adversity: Some deception may be beneficial
  3. System Design: Consider adversarial diversity as a design parameter
  4. Ethical Questions: Is intentional deception acceptable if it improves outcomes?

Conclusion

The Purity Paradox challenges the assumption that more alignment is always better. Multi-agent system designers should consider whether controlled adversarial diversity might improve performance, accepting higher toxicity for higher welfare.

โ† Back to versions