Leaderboard · WAB

Top 100 · audited agentic workspaces · 12 pillars · L0–L4

#	Workspace	Kind	Grade	Score	Cluster A·B·C·D	ELO	Mature pillars	Weakest	Stack	Audited	Evidence
1	Madani Workspace (•) · B2B services portfolio · iter-39	ref	A	87.08	A 87 · B 95 · C 89 · D 73	1900	—	—	Claude Code · Python · n8n · launchd · auto-promote-engine	2026-05-25	audit ↗
2	Hermes Agent · NousResearch (•) · skill-curator + RL self-evolution	ext	C	50.83	A 30 · B 63 · C 53 · D 60	1650	—	—	Python · agent/curator.py · skill_manage · GRPO	2026-05-24	audit ↗
3	OpenClaw (•) · agentic platform · plugin ecosystem	ext	D	47.50	A 23 · B 57 · C 58 · D 50	1580	—	—	TypeScript · Node.js · plugin system	2026-05-24	audit ↗
4	OpenAI Agents SDK · Python (•) · agent SDK library	ext	D	40.83	A 23 · B 42 · C 49 · D 50	1450	—	—	Python · agents framework	2026-05-20	audit ↗
5	Cline · IDE Agent (•) · VS Code agentic IDE	ext	D	32.50	A 13 · B 47 · C 35 · D 35	1480	—	—	TypeScript · VS Code extension	2026-05-24	audit ↗
6	Anthropic Cookbook (•) · code-sample repository	ext	F	27.50	A 7 · B 33 · C 35 · D 35	1380	—	—	Python · Jupyter · Claude Agent SDK	2026-05-20	audit ↗

Showing 6 of 6

Legend

✓ verified · audit verified by the benchmark maintainers.
• self-reported · audit run on the submitter's machine; server-side re-audit is on the v0.5 roadmap.
Mature pillars · number of pillars at the maturity ceiling (L4, optimizing), out of 12. E.g. 9/12 means 9 pillars at L4.
Weakest · the pillar with the lowest maturity level, i.e. where the workspace has the biggest gap to close.
Cluster A·B·C·D · average score for each of the 4 clusters (A Cognition, B Action, C Trust, D Operations).
ELO · derived from the composite as 1200 + composite × 8, so equal composites always give equal ELO.
Score · composite on a 0–100 scale; the equally weighted mean of the 12 pillar scores.
Levels L0–L4 · L0 absent · L1 ad hoc · L2 documented · L3 automated · L4 optimizing (auto-improving).
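The legend's column definitions can be sketched in a few lines. This is a minimal sketch, not the benchmark's audit code: the `Pillar` type, the pillar names and the per-pillar 0–100 scores are hypothetical inputs, since the legend does not say how a maturity level maps to a score. With the stated formula, `elo(47.50)` returns 1580.0, matching OpenClaw's listed ELO; it does not reproduce every listed value (for 50.83 it gives about 1606.6, against the listed 1650).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pillar:
    name: str      # hypothetical pillar label
    level: int     # maturity level: 0 (L0 absent) .. 4 (L4 optimizing)
    score: float   # hypothetical per-pillar score on a 0-100 scale


def composite(pillars: List[Pillar]) -> float:
    """Score: equally weighted mean of the 12 pillar scores."""
    if len(pillars) != 12:
        raise ValueError(f"expected 12 pillars, got {len(pillars)}")
    return sum(p.score for p in pillars) / len(pillars)


def elo(composite_score: float) -> float:
    """ELO as stated in the legend: 1200 + composite x 8."""
    return 1200 + composite_score * 8


def mature_pillars(pillars: List[Pillar]) -> str:
    """Count of pillars at the L4 ceiling, formatted like '9/12'."""
    at_ceiling = sum(1 for p in pillars if p.level == 4)
    return f"{at_ceiling}/{len(pillars)}"


def weakest(pillars: List[Pillar]) -> str:
    """Name of the lowest-level pillar; ties go to the first one listed."""
    return min(pillars, key=lambda p: p.level).name


if __name__ == "__main__":
    # Hypothetical workspace: 9 pillars at L4 (score 100), 3 at L2 (score 50).
    ps = [Pillar(f"pillar-{i}", 4 if i < 9 else 2, 100.0 if i < 9 else 50.0)
          for i in range(12)]
    print(mature_pillars(ps), weakest(ps), composite(ps), elo(composite(ps)))
    # prints: 9/12 pillar-9 87.5 1900.0
```

Because ELO is a fixed linear function of the composite here, ranking by ELO and ranking by Score are the same ordering.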

Composite = weighted average of the 4 cluster scores · Bradley-Terry ELO · ~70% of the audit is deterministic · inter-rater reliability (IRR) of 1.0 verified. Reference entries are verified in the benchmark repo. Community submissions are stored live in Vercel KV; CI re-audit is on the v0.5 roadmap.
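Two of the modeling terms above can be illustrated with a sketch. `weighted_composite` averages the four cluster scores with caller-supplied weights; the weights are hypothetical, since the page does not publish them, and equal weighting is the default. `bt_win_probability` uses the logistic Bradley-Terry form that underlies Elo ratings, P(A over B) = 1 / (1 + 10^((R_B - R_A) / 400)); the base-10, 400-point scale is the conventional chess Elo choice and an assumption here, not something the leaderboard states.

```python
from typing import Dict, Optional


def weighted_composite(cluster_scores: Dict[str, float],
                       weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the cluster scores (keys such as 'A'..'D').

    `weights` is hypothetical: the leaderboard does not publish its
    cluster weights, so equal weighting is used when none are given.
    """
    if weights is None:
        weights = {k: 1.0 for k in cluster_scores}
    total_weight = sum(weights[k] for k in cluster_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(cluster_scores[k] * weights[k] for k in cluster_scores) / total_weight


def bt_win_probability(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the logistic
    Bradley-Terry (Elo) model; scale=400 is an assumed convention."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))
```

Under these assumptions, equal ratings give 0.5, and a 400-point gap (1900 vs 1500) gives 10/11, about 0.91, for the higher-rated workspace.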