Leaderboard · WAB
Top 100 · audited agentic workspaces · 12 pillars · L0–L4
| # | Workspace | Kind | Grade | Score | Clusters A · B · C · D | ELO | Mature pillars | Weakest | Stack | Audited | Evidence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Madani Workspace • B2B services portfolio · iter-39 | ref | A | 87.08 | A 87 · B 95 · C 89 · D 73 | 1900 | — | — | Claude Code · Python · n8n · launchd · auto-promote-engine | 2026-05-25 | audit ↗ |
| 2 | Hermes Agent · NousResearch • skill-curator + RL self-evolution | ext | C | 50.83 | A 30 · B 63 · C 53 · D 60 | 1650 | — | — | Python · agent/curator.py · skill_manage · GRPO | 2026-05-24 | audit ↗ |
| 3 | OpenClaw • agentic platform · plugin ecosystem | ext | D | 47.50 | A 23 · B 57 · C 58 · D 50 | 1580 | — | — | TypeScript · Node.js · plugin system | 2026-05-24 | audit ↗ |
| 4 | OpenAI Agents SDK · Python • agent SDK library | ext | D | 40.83 | A 23 · B 42 · C 49 · D 50 | 1450 | — | — | Python · agents framework | 2026-05-20 | audit ↗ |
| 5 | Cline · IDE Agent • VS Code agentic IDE | ext | D | 32.50 | A 13 · B 47 · C 35 · D 35 | 1480 | — | — | TypeScript · VS Code extension | 2026-05-24 | audit ↗ |
| 6 | Anthropic Cookbook • code-sample repository | ext | F | 27.50 | A 7 · B 33 · C 35 · D 35 | 1380 | — | — | Python · Jupyter · Claude Agent SDK | 2026-05-20 | audit ↗ |
Showing 6 of 6
Legend
✓ verified · audit verified by the benchmark maintainers.
• self-reported · audit run on the submitter's machine · server-side re-audit is on the v0.5 roadmap.
Mature pillars · number of pillars at maturity ceiling (L4 Optimizing) out of 12 total. E.g. 9/12 = 9 pillars at L4.
Weakest · the pillar with the lowest maturity · where the workspace has the biggest gap to close.
Cluster A · B · C · D · average score within each of the 4 clusters (Cognition, Action, Trust, Operations).
ELO · derived from composite (1200 + composite × 8). Same composite → same ELO.
Score · composite 0-100 · equally-weighted mean across the 12 pillars.
Levels L0-L4 · L0 absent · L1 ad hoc · L2 documented · L3 automated · L4 optimizing (auto-improve).
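The Mature pillars and Weakest columns follow directly from per-pillar levels. The sketch below is a minimal illustration, assuming each pillar is stored as an integer level from 0 to 4. The pillar names and the `summarize` helper are hypothetical placeholders, not the benchmark's real pillar list or code.

```python
# Hypothetical sketch: derive "Mature pillars" and "Weakest" from pillar levels.
# Pillar names used here are placeholders, not the benchmark's actual pillars.
LEVEL_NAMES = {0: "absent", 1: "ad hoc", 2: "documented", 3: "automated", 4: "optimizing"}


def summarize(levels: dict[str, int]) -> tuple[str, str]:
    """Return (mature, weakest) for a mapping of pillar name -> level 0..4."""
    for name, level in levels.items():
        if level not in LEVEL_NAMES:
            raise ValueError(f"{name}: level must be 0..4, got {level}")
    # A pillar counts as mature only at the ceiling, L4.
    mature = sum(1 for level in levels.values() if level == 4)
    # min() keeps the first pillar on ties, so input order decides tie-breaks.
    weakest = min(levels, key=levels.get)
    label = f"{weakest} (L{levels[weakest]} {LEVEL_NAMES[levels[weakest]]})"
    return f"{mature}/{len(levels)}", label


# Nine pillars at L4 and three below the ceiling, as in the "9/12" legend example.
example = {f"pillar_{i}": 4 for i in range(1, 10)}
example.update({"pillar_10": 3, "pillar_11": 2, "pillar_12": 3})
print(summarize(example))  # ('9/12', 'pillar_11 (L2 documented)')
```

How ties for the weakest pillar are broken is an assumption of this sketch; the legend does not specify it.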
Composite = weighted average of the 4 clusters · Bradley-Terry ELO · ~70% deterministic audit · IRR 1.0 verified. Reference entries are verified in the benchmark repo. Community submissions are stored live in Vercel KV · CI re-audit is on the v0.5 roadmap.
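The Score and ELO definitions in the legend can be written out as a short calculation. This is a minimal sketch, assuming pillar scores are already on a 0-100 scale; the function names are hypothetical. The legend formula reproduces some listed values (47.50 gives 1580, matching row 3) but not others (50.83 gives about 1607 against the listed 1650). The sketch implements the stated formula only and does not try to explain that difference.

```python
from statistics import fmean


def composite(pillar_scores: list[float]) -> float:
    """Equally-weighted mean of the 12 pillar scores (each assumed 0-100)."""
    if len(pillar_scores) != 12:
        raise ValueError("expected 12 pillar scores")
    if any(not 0 <= s <= 100 for s in pillar_scores):
        raise ValueError("pillar scores must be within 0-100")
    return fmean(pillar_scores)


def elo_from_composite(score: float) -> float:
    """Legend formula: 1200 + composite x 8."""
    return 1200 + score * 8


print(elo_from_composite(47.50))                      # 1580.0
print(round(composite([50.0] * 6 + [100.0] * 6), 2))  # 75.0
```

Under this formula a composite of 0 maps to 1200 and a composite of 100 maps to 2000, so two workspaces with the same composite always receive the same ELO.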
