
Leaderboard · WAB

Top 100 · audited agentic workspaces · 12 pillars · L0–L4

| # | Workspace | Kind | Grade | Score | Cluster A | Cluster B | Cluster C | Cluster D | ELO | Mature pillars | Weakest | Stack | Audited | Evidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Madani Workspace (B2B services portfolio · iter-39) | ref | A | 87.08 | 87 | 95 | 89 | 73 | 1900 | | | Claude Code · Python · n8n · launchd · auto-promote-engine | 2026-05-25 | audit ↗ |
| 2 | Hermes Agent · NousResearch (skill-curator + RL self-evolution) | ext | C | 50.83 | 30 | 63 | 53 | 60 | 1650 | | | Python · agent/curator.py · skill_manage · GRPO | 2026-05-24 | audit ↗ |
| 3 | OpenClaw (agentic platform · plugin ecosystem) | ext | D | 47.50 | 23 | 57 | 58 | 50 | 1580 | | | TypeScript · Node.js · plugin system | 2026-05-24 | audit ↗ |
| 4 | OpenAI Agents SDK · Python (agent SDK library) | ext | D | 40.83 | 23 | 42 | 49 | 50 | 1450 | | | Python · agents framework | 2026-05-20 | audit ↗ |
| 5 | Cline · IDE Agent (VS Code agentic IDE) | ext | D | 32.50 | 13 | 47 | 35 | 35 | 1480 | | | TypeScript · VS Code extension | 2026-05-24 | audit ↗ |
| 6 | Anthropic Cookbook (code-sample repository) | ext | F | 27.50 | 7 | 33 | 35 | 35 | 1380 | | | Python · Jupyter · Claude Agent SDK | 2026-05-20 | audit ↗ |

Showing 6 of 6

Legend

verified · audit verified by the benchmark maintainers.
self-reported · audit ran on the submitter's machine · server-side re-audit is on the v0.5 roadmap.
Mature pillars · number of pillars at the maturity ceiling (L4 optimizing), out of 12. E.g. 9/12 means 9 pillars are at L4.
Weakest · the pillar with the lowest maturity level, i.e. where the workspace has the biggest gap to close.
Cluster A·B·C·D · average score within each of the 4 clusters: A Cognition, B Action, C Trust, D Operations.
ELO · derived from composite (1200 + composite × 8). Same composite → same ELO.
Score · composite, 0–100 · equally weighted mean across the 12 pillars.
Levels L0–L4 · L0 absent · L1 ad hoc · L2 documented · L3 automated · L4 optimizing (auto-improve).
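The legend's per-workspace definitions can be sketched in Python. This is a minimal, hypothetical illustration: the `Pillar` record, the pillar labels `P1`–`P12` and the per-pillar 0–100 scores are assumptions, since the page does not say how a pillar's score is derived from its level. The sketch applies only the rules stated above: Score as the equally weighted mean of 12 pillar scores, ELO = 1200 + composite × 8, Mature pillars as the count at L4, and Weakest as the lowest-level pillar.

```python
from dataclasses import dataclass

MATURITY_CEILING = 4  # L4 optimizing, the top of the L0-L4 scale
PILLAR_COUNT = 12


@dataclass
class Pillar:
    name: str     # hypothetical pillar label
    level: int    # maturity level, 0 (absent) to 4 (optimizing)
    score: float  # hypothetical per-pillar score on a 0-100 scale


def composite(pillars: list[Pillar]) -> float:
    """Equally weighted mean of the 12 per-pillar scores."""
    if len(pillars) != PILLAR_COUNT:
        raise ValueError(f"expected {PILLAR_COUNT} pillars, got {len(pillars)}")
    return sum(p.score for p in pillars) / len(pillars)


def elo_from_composite(score: float) -> float:
    """Legend formula: ELO = 1200 + composite * 8."""
    return 1200 + score * 8


def mature_pillars(pillars: list[Pillar]) -> int:
    """Count of pillars at the maturity ceiling (L4)."""
    return sum(1 for p in pillars if p.level == MATURITY_CEILING)


def weakest_pillar(pillars: list[Pillar]) -> Pillar:
    """Pillar with the lowest maturity level; the first one listed wins ties."""
    return min(pillars, key=lambda p: p.level)


# Hypothetical workspace: nine pillars at L4, three lagging behind.
example = [Pillar(f"P{i}", 4, 90.0) for i in range(1, 10)]
example += [Pillar("P10", 3, 70.0), Pillar("P11", 2, 50.0), Pillar("P12", 1, 30.0)]

score = composite(example)
print(f"{mature_pillars(example)}/{PILLAR_COUNT}", weakest_pillar(example).name, score, elo_from_composite(score))
# prints: 9/12 P12 80.0 1840.0
```

With nine pillars at L4 the example reports 9/12 mature, matching the legend's example. Resolving ties for the weakest pillar to the first one listed is an arbitrary choice in this sketch.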

Composite = weighted average of the 4 clusters · Bradley-Terry ELO · ~70% of the audit is deterministic · inter-rater reliability (IRR) of 1.0 verified. Reference entries are verified in the benchmark repo. Community submissions are stored live in Vercel KV · CI re-audit is on the v0.5 roadmap.
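The composite and Bradley-Terry claims can be sketched as follows, assuming equal cluster weights (the page says "weighted average" but does not publish the weights) and the conventional base-10, 400-point Elo scale for the Bradley-Terry comparison, which the page also does not state. `weighted_composite` and `expected_score` are hypothetical helper names. The two rows used, Cline and Anthropic Cookbook, are ones whose displayed cluster values average exactly to their listed scores.

```python
# Displayed cluster averages (A, B, C, D) and listed scores, copied from two table rows.
ROWS = {
    "Cline · IDE Agent": ((13, 47, 35, 35), 32.50),
    "Anthropic Cookbook": ((7, 33, 35, 35), 27.50),
}


def weighted_composite(clusters, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted average of the 4 cluster scores; equal default weights are an assumption."""
    if len(clusters) != len(weights):
        raise ValueError("need exactly one weight per cluster")
    return sum(c * w for c, w in zip(clusters, weights)) / sum(weights)


def expected_score(elo_a, elo_b):
    """Bradley-Terry (logistic Elo) probability that A outranks B, base-10, 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


for name, (clusters, listed) in ROWS.items():
    print(f"{name}: {weighted_composite(clusters):.2f} (listed {listed:.2f})")

# Madani Workspace (1900) vs Hermes Agent (1650): roughly 0.81.
print(f"{expected_score(1900, 1650):.2f}")
```

Under this model a 250-point ELO gap means the higher-rated workspace is expected to come out ahead in about 81% of pairwise comparisons, and equal ratings give exactly 0.5.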