Research · WSB-16 · 2026-05-20

Credentials Hygiene at Scale: The op:// Vault Pattern for Zero-Plaintext Agentic Workspaces

23 services · zero secrets in repo · runtime resolution via 1Password CLI · 12 months production · 7 counterintuitive takes on credentials at scale.

Madani Lab · 23 services · 12 months production · zero plaintext incidents

credentials · op-uri · 1Password · vault · zero-plaintext · security · late-binding

Abstract

We document the Madani credentials architecture: a 23-service production deployment using the 1Password "op://" runtime-resolution pattern, with zero observed plaintext-leak incidents in 12 months of operation. Credentials hygiene is the operational discipline of how an agentic workspace handles secrets (API keys, OAuth tokens, database passwords, vault credentials) such that no secret ever exists in plaintext in the repository, in logs, in agent context, or in artifacts. The classical software-engineering version of this problem has standard solutions (environment variables, secret managers, key-rotation policies), but agentic workspaces introduce additional failure modes: agents that summarize logs can accidentally include secrets, prompt context shared with judge models can leak secrets, and autonomous loops that touch credentials must handle vault timeouts gracefully. The Madani architecture has three layers: storage (1Password vault, the single source of truth), resolution (op:// URIs resolved at process startup via direnv + 1Password CLI), and agent-awareness (an audit module scanning outputs for secret-shaped patterns). We surface seven counterintuitive findings:

  1. (a) 1Password CLI + direnv produce zero-plaintext credentials at runtime, with under 50 ms latency per credential read after caching.
  2. (b) The op:// pattern moves credentials from repo risk to vault risk: a net security gain, because the vault has much stronger access controls than git history.
  3. (c) Credentials in commit history are nearly impossible to remove safely: once committed, they exist in distributed history forever. op:// is preventive, not remedial.
  4. (d) The major friction point of op:// adoption is the initial setup (1Password account + CLI + direnv, ~1-2 hours per new machine); the marginal cost per credential after setup is near zero.
  5. (e) Secret-scanning tools (TruffleHog, git-secrets, GitGuardian) catch ~70% of leaked credentials; the op:// pattern structurally reduces the surface area to ~5%.
  6. (f) Long-lived API tokens (Anthropic api03, GitHub PAT, Stripe live) are the highest-risk leak category; op:// and scoped tokens together produce the leverage.
  7. (g) Workspace portability: a new engineer is productive in ~30 minutes with no credential handoff.

INTRODUCTION · §1

Why credentials hygiene is hard in agentic workspaces

Classical credentials hygiene assumes a small fixed set of credentials, accessed by a small number of well-defined processes, with no semantic reasoning happening on credential values. Agentic workspaces violate all three assumptions. (a) Many credentials: a typical workspace integrates 15-30 external services. (b) Many processes: agents, sub-agents, judges, and autonomous loops, each touching credentials. (c) Semantic reasoning on credentials: an agent that summarizes logs can include credentials in its summary; an agent that explains code can include credentials in its explanation. The third is the qualitatively new failure mode.

INTRODUCTION · §2

The zero-plaintext invariant

The Madani goal is zero plaintext credentials anywhere: not in repo, not in logs, not in agent context, not in artifacts. This requires structural defense at every layer. The op:// pattern + audit module is the architectural answer.

The invariant is checkable: secret-scanning tools should find zero matches in repo + logs + artifacts. The invariant has held for 12 months across 23 services.

INTRODUCTION · §3

Contributions

(1) EMPIRICAL: 12 months production deployment across 23 services with zero plaintext-leak incidents. (2) ARCHITECTURAL: the three-layer design (storage, resolution, agent-awareness). (3) OPERATIONAL: multi-backend abstraction supporting 1Password, HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager. (4) ADVERSARIAL: red-team study with 40 attack attempts and 0 successful extractions.

        CREDENTIALS RESOLUTION · op:// runtime vault
        ───────────────────────────────────────────

   developer machine                    runtime
   ┌────────────────┐                 ┌──────────────┐
   │ .envrc         │                 │ env vars     │
   │ ┌────────────┐ │  direnv allow   │ GHL_TOKEN=…  │
   │ │op://Madani │─┼────────────────▶│ SLACK_BOT=…  │
   │ │/GHL/token  │ │   1Password CLI │ STRIPE_KEY=… │
   │ │op://Madani │ │   resolves at   │ ...          │
   │ │/Slack/bot  │ │   shell entry   └──────┬───────┘
   │ └────────────┘ │                        │
   └────────────────┘                        ▼
                                  ┌───────────────────┐
   ✗ never in repo                │ secret-guard hook │
   ✗ never in logs                │ blocks sk_live_*  │
   ✗ never in shell history       │ EAAB* · ghp_*     │
                                  └───────────────────┘

RELATED WORK · §4

Classical secrets management

AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, and Azure Key Vault provide vault-as-a-service primitives. The pattern of resolving secrets at runtime into short-lived environment variables is industry standard. Our contribution is the integration of this pattern with agent-awareness (the audit module catching credentials that leak despite vault protection).

RELATED WORK · §5

Secret-scanning

TruffleHog (2018), git-secrets (AWS, 2017), GitGuardian (commercial). These tools scan for high-entropy strings and known credential patterns. They are remedial — catching credentials AFTER they were committed. Our work is preventive — structural prevention at write time.

RELATED WORK · §6

Direnv and 1Password CLI

The combination of direnv (per-directory environment-variable management) and the 1Password CLI (vault access, including its native op:// secret-reference syntax) produces the op:// pattern. direnv is open source; the 1Password CLI is a free but proprietary client. The combination has been described in several practitioner blogs but has not been formally documented as a pattern. This paper documents it.

METHOD · §7 · LAYER 1 · STORAGE. All secrets live in a 1Password vault accessible only to authorized humans via 1Password CLI. The vault is the single source of truth; no copies anywhere else.

METHOD · §8 · LAYER 2 · RESOLUTION. Code and configuration files reference secrets via "op://Vault/Item/field" URIs. At process startup, a wrapper (direnv plus 1Password CLI) resolves these URIs to actual secret values exposed as environment variables. Resolved values never touch disk; they live only in process memory.
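A minimal Python sketch of the resolution step, assuming the 1Password CLI is installed and signed in. It shells out to the CLI's `op read <reference>` command; `resolve_environment`, `run_with_secrets`, and the template dictionary are hypothetical names for illustration, not the Madani wrapper, and the reader is injectable so the logic can run without a vault.

```python
import os
import subprocess
from typing import Callable, Mapping

OP_PREFIX = "op://"


def op_read(reference: str) -> str:
    """Resolve one op:// secret reference with the 1Password CLI (`op read`)."""
    result = subprocess.run(
        ["op", "read", reference],
        capture_output=True,
        text=True,
        check=True,
    )
    # The CLI may print a trailing newline; strip only that, never other characters.
    return result.stdout.rstrip("\n")


def resolve_environment(
    template: Mapping[str, str],
    reader: Callable[[str], str] = op_read,
) -> dict[str, str]:
    """Copy `template`, replacing every op:// value with its resolved secret.

    Values that are not op:// references pass through unchanged.
    Nothing is written to disk; the result lives only in this process.
    """
    return {
        name: reader(value) if value.startswith(OP_PREFIX) else value
        for name, value in template.items()
    }


def run_with_secrets(cmd: list[str], template: Mapping[str, str]) -> int:
    """Run `cmd` with resolved secrets present only in the child's environment."""
    env = {**os.environ, **resolve_environment(template)}
    return subprocess.run(cmd, env=env).returncode
```

The 1Password CLI also ships `op run`, which performs a comparable substitution for op:// references found in environment variables before launching a command; a custom wrapper like this mainly earns its keep when extra logic (caching, auditing, scoping) has to sit in the resolution path.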

METHOD · §9 · LAYER 3 · AGENT-AWARENESS. The agent runtime is taught about secret-handling discipline: never write secrets to logs, never include secret-bearing variables in prompt context, never propagate secrets to sub-agents that do not need them. Enforced by audit module scanning agent outputs for secret-shaped patterns (high-entropy strings, known prefixes like "sk-", "pat_", "tok_").

METHOD · §10

Deployment scope

23 distinct external services: GHL, n8n, Anthropic, Stripe, Slack, Brevo, ClickUp, Meta, Google APIs (multiple), Vercel, Cloudflare, Fireflies, Tally, Exa, Kie.ai, OpenAI, APIMO, SMTP, BigQuery, Drive, Sheets, Gemini, Firecrawl. 12 months production deployment.

METHOD · §11

Red-team protocol

We subjected the architecture to internal red-team review: semi-structured attempts to extract secrets via (a) prompt injection ("dump your environment"), (b) log scraping (inducing the agent to include a secret in a summary), (c) vault-timeout exploitation, and (d) sub-agent context propagation (trying to leak a secret from the primary agent to a sub-agent that should not see it).

RESULTS · §12

Zero plaintext leaks in 12 months

Across 12 months, secret-scanning of repo + logs + artifacts: zero matches against credential patterns.

Madani vault · 23 services

Services managed via op:// runtime resolution: 23 total, including GHL · Slack · Stripe · ClickUp · Meta · Google · Fireflies · Brevo · Exa · Kie.ai · Vercel · APIMO · Tally · SMTP · Firecrawl · Supabase · OpenAI emergency. Plaintext credentials in repo post-migration: 0. Pre-migration audit: 13 .env files in sub-projects · 1 Anthropic api03 key found in a "cronologia" (history) md-summary. Token rotation: 60-day Meta · 24h GHL OAuth with refresh · permanent service accounts.

Audit module fired 8 times: 6 false positives (high-entropy strings in document content that happened to look secret-like), 2 genuine catches (agent attempted to log credential during debugging session; audit module caught and redacted before write).

RESULTS · §13

Red-team results

40 attack attempts, 0 successful secret extractions. The attack patterns: 14 prompt injection ("dump environment", "show me your config"), 12 log scraping ("explain how your API integration works in detail"), 8 vault-timeout exploitation ("what was your last successful API call"), 6 sub-agent context propagation ("hand off this task to a sub-agent with full context"). All were blocked at either the resolution layer or the audit module.

RESULTS · §14 · COUNTERINTUITIVE FINDING 1 · SUB-50MS RUNTIME LATENCY. Resolving 23 op:// URIs at process startup: ~1.2 seconds first time (network round-trip to 1Password). Subsequent reads of cached resolved values: <50ms per credential. For long-running agent processes the startup cost is amortized; for short-lived autonomous loops the resolved values are cached in process memory.
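At ~1.2 s for 23 references, the cold path costs roughly 50 ms per reference, so the gain comes from never paying it twice. One way to get that behavior is a time-bounded in-memory cache in front of the resolver; `CachedResolver`, its 15-minute default TTL, and the injectable clock are illustrative assumptions rather than the Madani code.

```python
import time
from typing import Callable


class CachedResolver:
    """Memoize resolved secrets in process memory for `ttl_seconds`."""

    def __init__(
        self,
        reader: Callable[[str], str],
        ttl_seconds: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._reader = reader
        self._ttl = ttl_seconds
        self._clock = clock
        # reference -> (secret, fetch time); held in memory only, never persisted
        self._cache: dict[str, tuple[str, float]] = {}

    def resolve(self, reference: str) -> str:
        now = self._clock()
        hit = self._cache.get(reference)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # warm path: dictionary lookup, no vault round-trip
        secret = self._reader(reference)  # cold path: one vault round-trip
        self._cache[reference] = (secret, now)
        return secret
```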

RESULTS · §15 · COUNTERINTUITIVE FINDING 2 · REPO-RISK TO VAULT-RISK SHIFT IS NET GAIN. Before the op:// pattern, a credentials leak required only that an attacker access the repo (possible via many vectors: GitHub compromise, contributor laptop theft, accidental public push). With the op:// pattern, a credentials leak requires access to the 1Password vault, which has hardware-key MFA, audit logging, time-limited access tokens, and IP allowlists.

The vault has dramatically stronger access controls than the repo could. Net security gain even though the architectural complexity is higher.

RESULTS · §16 · COUNTERINTUITIVE FINDING 3 · GIT-HISTORY LEAKS ARE FOREVER. We surveyed 12 enterprise teams who had credentials committed in error. Of these, only 2 successfully removed credentials from git history (via BFG Repo-Cleaner or git filter-branch + force-push + all forks notified).

The other 10 either: rotated the credential and accepted the history exposure, or did nothing. Once committed, credentials are nearly impossible to remove safely. Op:// is preventive (no credentials ever in repo) rather than remedial.

The asymmetry is the strongest argument for the pattern.

RESULTS · §17 · COUNTERINTUITIVE FINDING 4 · ADOPTION FRICTION IS THE BARRIER. The major friction is initial setup: 1Password account, CLI installation, direnv configuration, vault structure design. ~1-2 hours per new machine. Marginal cost per credential after setup: near zero. Most teams that abandon the pattern do so during setup; teams that complete setup almost never revert.

RESULTS · §18 · COUNTERINTUITIVE FINDING 5 · SECRET-SCANNING COMPLEMENTS BUT DOES NOT REPLACE. Secret-scanning tools (TruffleHog, git-secrets, GitGuardian) catch ~70% of leaked credentials, namely those matching known credential signatures. The op:// pattern reduces the surface area to ~5%: there are simply almost no credentials in scannable artifacts. The two are complementary: op:// prevents most leaks structurally, and secret-scanning catches most of the ~5% that slips through edge cases (at a ~70% catch rate, roughly 0.05 × 0.3 ≈ 1.5% of the original exposure remains).

RESULTS · §19 · COUNTERINTUITIVE FINDING 6 · LONG-LIVED TOKENS ARE HIGHEST RISK. We classified our 23 services by token-lifetime: SHORT-LIVED (OAuth access tokens, refreshed every hour) - 8 services. MEDIUM (API keys with rotation policy, rotated quarterly) - 11 services.

LONG-LIVED (Anthropic api03, GitHub PAT, Stripe live keys, valid for years) - 4 services. The LONG-LIVED category accounts for ~85% of theoretical leak impact because a leaked token works indefinitely. Op:// + scoped tokens (least-privilege scoping) together produce the leverage; either alone is insufficient.

RESULTS · §20 · COUNTERINTUITIVE FINDING 7 · WORKSPACE PORTABILITY. A new engineer joining Madani: install 1Password CLI, install direnv, sign into vault, clone repo. ~30 minutes total. No credential handoff.

They are productive immediately. Pre-pattern: credential handoff was a 2-3 hour onboarding session with security review. The pattern is workspace-portable in a way that classical environment-variable approaches are not.

DISCUSSION · §21

Zero plaintext is an enforceable invariant

The combination of op:// resolution + pre-commit hooks scanning for credential-shaped patterns + the audit module produces an invariant we have not seen violated. This makes "is your workspace free of plaintext secrets" a binary checkable property — prerequisite for any meaningful security review.

DISCUSSION · §22 · WAB PILLAR 08 (CREDENTIALS) MATURITY CRITERIA. We codified operational lessons as L0-L4 criteria: L0 = secrets in repo; L1 = secrets in environment variables but no vault; L2 = secrets in vault with manual resolution; L3 = secrets resolved at runtime via op:// pattern; L4 = L3 + agent-awareness audit + quarterly rotation policy. Madani operates at L4 for 21 of 23 services (2 services have legacy API tokens that do not support programmatic rotation; scheduled for migration in Q3 2026).
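Read as a decision procedure, the L0-L4 ladder can be encoded directly. In this sketch the `ServiceCredentialState` fields are hypothetical names for the properties the criteria mention, "quarterly rotation" is interpreted as a declared cadence of at most 90 days, and how each flag is measured is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceCredentialState:
    secrets_in_repo: bool
    in_vault: bool
    runtime_resolution: bool  # resolved via op:// at process start
    agent_audit: bool         # agent-awareness audit module active
    quarterly_rotation: bool  # declared rotation cadence of 90 days or less


def maturity_level(s: ServiceCredentialState) -> int:
    """Return the WAB Pillar 08 level (0-4) following the criteria in the text."""
    if s.secrets_in_repo:
        return 0  # L0: secrets in repo
    if not s.in_vault:
        return 1  # L1: environment variables, no vault
    if not s.runtime_resolution:
        return 2  # L2: vault with manual resolution
    if not (s.agent_audit and s.quarterly_rotation):
        return 3  # L3: runtime op:// resolution only
    return 4      # L4: L3 + audit + quarterly rotation
```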

DISCUSSION · §23

Multi-backend abstraction

Op:// pattern and audit module are vendor-agnostic in concept. 1Password is our procurement choice; the same architecture works with AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault. The Madani reference implementation includes a thin abstraction layer supporting multiple backends; MIT-licensed.

DISCUSSION · §24

Failure modes mitigated

(a) Vault availability — 1-hour grace period with cached resolved values. (b) CLI bugs — pin CLI version, test upgrades. (c) Audit-module bypass — process-level, harder to bypass than prompt-level. (d) Human exfiltration — we cannot prevent a human with vault access from copying secrets (trust boundary). (e) Supply-chain attack on 1Password CLI itself — accepted risk as part of vendor selection.
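Mitigation (a) amounts to stale-if-error behavior: serve the last successfully resolved value for up to one hour when the vault call fails, then fail closed. A minimal sketch; `GraceResolver` and `VaultUnavailable` are hypothetical names, and only the one-hour window comes from the text.

```python
import time
from typing import Callable


class VaultUnavailable(RuntimeError):
    """Vault call failed and no cached value is inside the grace window."""


class GraceResolver:
    def __init__(
        self,
        reader: Callable[[str], str],
        grace_seconds: float = 3600.0,  # the 1-hour grace period
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._reader = reader
        self._grace = grace_seconds
        self._clock = clock
        # reference -> (last good secret, time it was fetched)
        self._last_good: dict[str, tuple[str, float]] = {}

    def resolve(self, reference: str) -> str:
        now = self._clock()
        try:
            secret = self._reader(reference)
        except Exception as exc:  # network error, CLI crash, expired session, ...
            cached = self._last_good.get(reference)
            if cached is not None and now - cached[1] <= self._grace:
                return cached[0]  # stale, but still inside the grace window
            # Fail closed. The reference is not a secret, so it is safe in the message.
            raise VaultUnavailable(f"cannot resolve {reference}") from exc
        self._last_good[reference] = (secret, now)
        return secret
```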

DISCUSSION · §25

Cloud vs on-premise vaults

Cloud (1Password): lower setup cost, higher latency (sub-second), data-residency concerns. Self-hosted or in-VPC (HashiCorp Vault on-premise, AWS Secrets Manager via a VPC endpoint): sub-100ms latency, tighter regulatory boundaries, higher operational overhead. For typical SaaS, a cloud vault is the pragmatic default. For EU GDPR + financial sector, on-premise is recommended.

DISCUSSION · §26 · INTEGRATION WITH GOVERNANCE PILLAR (WSB-15). Credentials hygiene and governance intersect: hard rule "never write credentials to logs" requires the credentials pillar audit module to enforce it. We unified at the policy layer: credentials-policy.md references governance-as-code primitives; compliance-judge gate inspects credential-leakage patterns as part of its checks. The audit trail (WSB-15) contains both compliance events and credential-leak-detection events.

DISCUSSION · §27 · INTEGRATION WITH SKILL SYSTEM (WSB-17). Skills that need credentials declare them via op:// URIs in SKILL.md. At skill activation, the resolution layer materializes them in the skill's environment.

Sub-skills do not see credentials they have not requested. This produces least-privilege per skill.
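A sketch of per-skill scoping, assuming each skill's op:// declarations have already been parsed out of its SKILL.md into a mapping from environment-variable name to reference (the parsing is omitted). `skill_environment` and the passthrough allowlist are hypothetical. The environment is built from an allowlist rather than by copying the parent and deleting known secrets, so credentials resolved for other skills cannot be inherited by accident.

```python
from typing import Callable, Iterable, Mapping

# Non-secret variables a skill process may inherit (illustrative allowlist).
DEFAULT_PASSTHROUGH = ("PATH", "HOME", "LANG", "TZ")


def skill_environment(
    declared: Mapping[str, str],
    reader: Callable[[str], str],
    parent_env: Mapping[str, str],
    passthrough: Iterable[str] = DEFAULT_PASSTHROUGH,
) -> dict[str, str]:
    """Build one skill's environment: allowlisted vars plus its declared secrets only.

    Everything else in the parent environment, including credentials resolved
    for other skills, is dropped rather than inherited.
    """
    env = {k: parent_env[k] for k in passthrough if k in parent_env}
    for var_name, reference in declared.items():
        env[var_name] = reader(reference)
    return env
```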

LIMITATIONS · §28

Limitations

(a) The audit module's pattern detection is brittle: 6 of its 8 alerts over 12 months (75%) were false positives. False positives are acceptable because a human reviews each one in seconds; false negatives would be catastrophic. (b) The 1Password CLI dependency is both an asset (it handles auth, audit, and rotation natively) and a liability (every machine needs the CLI installed and authenticated). (c) Adoption friction at setup is real and is the main barrier; teams that survive setup retain the pattern. (d) The audit module's "is this secret-like" judgment is imperfect; high-entropy non-secrets (e.g., random IDs, hashes) trigger false positives.

FUTURE WORK · §29

Future work

(1) Multi-cloud abstraction layer supporting 1Password, HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault behind single op:// URL pattern. (2) Cryptographic attestation that vault has not been tampered with (HSM-backed signing). (3) Automated rotation policies tied to credential-usage patterns. (4) Context-aware classifier for audit module (reducing false positives). (5) Cross-organizational benchmarking of credentials-pillar maturity.

CASE STUDY · §30

Anthropic api key handling

The most-used credential in the Madani workspace. Stored in 1Password as "op://Madani/Anthropic/api_key_production". Resolved at every agent process startup via direnv.

Never logged. The compliance-judge gate explicitly checks for the "sk-ant-api03-" prefix in any output and blocks. Zero leaks in 12 months across ~10,000 daily LLM calls.

CASE STUDY · §31

Stripe live key rotation

Stripe live keys are the highest-impact credential (financial transactions). Rotation policy: quarterly, with overlap window. Procedure: (a) generate new key in Stripe dashboard, (b) add new key to 1Password as "stripe_live_v2", (c) update workspace to reference v2, (d) verify all operations work, (e) deactivate old key in Stripe, (f) remove old key from 1Password after 7 days.

Zero downtime during rotation. Pre-pattern: rotation was a 4-hour ops project; post-pattern, 20 minutes.
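What makes steps (a)-(f) zero-downtime is their order: the old key is deactivated only after the new one has been verified, and both stay valid during the overlap. A minimal sketch of that ordering with every external action injected as a callable; the parameter names are hypothetical stand-ins for the Stripe dashboard and 1Password steps, not real API calls.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass
class RotationResult:
    new_field: str
    vault_removal_due: date  # when the old key may be deleted from the vault


def rotate_with_overlap(
    old_field: str,
    new_field: str,
    create_key: Callable[[], str],               # (a) new key from the provider
    store_in_vault: Callable[[str, str], None],  # (b) field name, secret value
    switch_reference: Callable[[str], None],     # (c) point the workspace at a field
    verify: Callable[[], bool],                  # (d) smoke-test live operations
    deactivate_old: Callable[[], None],          # (e) revoke the old key at the provider
    today: date,
    retention_days: int = 7,                     # (f) keep the old vault entry 7 days
) -> RotationResult:
    new_key = create_key()
    store_in_vault(new_field, new_key)
    switch_reference(new_field)
    if not verify():
        # Roll back while the old key is still active: nothing is down.
        switch_reference(old_field)
        raise RuntimeError("verification failed; old key left active")
    deactivate_old()
    return RotationResult(new_field, today + timedelta(days=retention_days))
```

If verification fails, the sketch points the workspace back at the old field and leaves the old key active, which is exactly the property the overlap window exists to protect.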

CASE STUDY · §32

New engineer onboarding

New engineer Anas joined April 2026. Onboarding process: (a) 1Password invite sent. (b) Anas accepts, installs 1Password CLI. (c) Anas installs direnv. (d) Anas clones workspace repo. (e) Anas runs first agent. Total time: 28 minutes.

Zero credential handoff. Anas was productive from minute 30.

IMPLEMENTATION PLAYBOOK · §33 · ADOPTING OP:// PATTERN.

STEP 1 · CHOOSE VAULT. 1Password is our recommendation for cloud teams; HashiCorp Vault for on-premise; cloud-provider Secret Manager for cloud-native.

STEP 2 · STRUCTURE VAULT. Organize by service (each external service = vault item; fields = different credential types).

STEP 3 · MIGRATE EXISTING CREDENTIALS. Move from .env files to vault. Pre-commit hook to prevent .env files from being committed.

STEP 4 · INSTALL CLI + DIRENV. Each team member's machine.

STEP 5 · DEPLOY AUDIT MODULE. Process-level scanning of agent outputs.

STEP 6 · ROTATION POLICY. Quarterly for long-lived credentials.

STEP 7 · DOCUMENT PATTERNS. SKILL.md credentials section per skill.
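The pre-commit hook from Step 3 can be a short Python script saved as an executable `.git/hooks/pre-commit`. This sketch lists staged paths with `git diff --cached --name-only` and rejects `.env` and `.env.*` files; allowing `.env.example` as a value-free template is our assumption about naming. A real hook file would end with `sys.exit(main())`; that call is omitted here so the functions can be imported and tested outside a repository.

```python
import subprocess
import sys
from pathlib import PurePosixPath

ALLOWED = {".env.example"}  # value-free templates (naming assumption)


def is_blocked(path: str) -> bool:
    """True for .env and .env.<suffix> files anywhere in the tree."""
    name = PurePosixPath(path).name
    if name in ALLOWED:
        return False
    return name == ".env" or name.startswith(".env.")


def staged_paths() -> list[str]:
    """Paths added, copied, modified, or renamed in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main() -> int:
    blocked = [p for p in staged_paths() if is_blocked(p)]
    for path in blocked:
        print(f"pre-commit: refusing to commit {path}; move its values to the vault",
              file=sys.stderr)
    return 1 if blocked else 0  # non-zero exit aborts the commit
```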

IMPLEMENTATION PLAYBOOK · §34

Anti-patterns

(1) ".ENV FILES IN REPO": the most common starting point; a structural failure. (2) "CREDENTIALS IN AGENT PROMPT CONTEXT": leaks via summarization. (3) "NO AUDIT MODULE": missing structural enforcement. (4) "INFINITE-LIFE TOKENS": long-lived secrets without rotation. (5) "VAULT TIMEOUT WITHOUT GRACE": production breaks when the vault has a hiccup. (6) "SUB-AGENT CONTEXT PROPAGATION": sub-agents inherit credentials they don't need.

OPEN RESEARCH FRONTIER · §35

Open research frontier

(1) HARDWARE-KEY ATTESTATION for vault integrity. (2) DIFFERENTIAL ROTATION based on usage patterns. (3) SELF-AUDITING CREDENTIALS that detect misuse. (4) CROSS-ORG VAULT FEDERATION for partnership scenarios. (5) HOMOMORPHIC SECRET SHARING for collaborative agentic workflows.

DISCUSSION · §36

Why this matters beyond credentials

The op:// pattern is an instance of the broader principle: defer materialization to the latest possible moment. Credentials at runtime, not at config-file write. The same pattern applies to PII (resolved at API-call time), proprietary content (loaded on-demand), customer data (queried directly, not cached). Late-binding for sensitive content is the recurring pattern; op:// is the credentials instance.
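In code, late binding for a credential can be as small as an object that holds only the reference and resolves it at the moment of use. `LateSecret` is a hypothetical illustration: masking `repr` and `str` stops the most common accidental leak (printing or logging the object), but it does not replace the audit module.

```python
from typing import Callable


class LateSecret:
    """Hold an op:// reference; materialize the value only inside `reveal()`."""

    def __init__(self, reference: str, reader: Callable[[str], str]) -> None:
        self.reference = reference
        self._reader = reader

    def reveal(self) -> str:
        # Resolved at call time and not stored on the object.
        return self._reader(self.reference)

    def __repr__(self) -> str:
        # Only the reference is shown, so logging the object is harmless.
        return f"LateSecret({self.reference!r})"

    __str__ = __repr__
```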

EXTENDED METHODS · §37

Multi-backend abstraction details

The Madani reference implementation supports 5 vault backends via a thin abstraction layer. (a) 1Password — production primary. (b) HashiCorp Vault — on-premise option. (c) AWS Secrets Manager — AWS-native deployments. (d) GCP Secret Manager — GCP-native. (e) Azure Key Vault — Azure-native. The abstraction is a 240-line Python module exposing a uniform resolve(uri) interface. Per-backend code: ~80 lines each. Adding a new backend takes ~4 hours.
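The uniform `resolve(uri)` interface can be sketched as a resolver that parses the vendor-neutral URI and dispatches on configuration. Everything here is an assumption for illustration: the `Backend` protocol, dispatch keyed by vault name, and the in-memory backend used for testing. Real backends would wrap the 1Password CLI, Vault, or cloud SDK clients, none of which are shown.

```python
from typing import Protocol


class Backend(Protocol):
    def fetch(self, vault: str, item: str, field: str) -> str: ...


class InMemoryBackend:
    """Test stand-in; a real backend would call 1Password, Vault, AWS, GCP, or Azure."""

    def __init__(self, data: dict[tuple[str, str, str], str]) -> None:
        self._data = data

    def fetch(self, vault: str, item: str, field: str) -> str:
        return self._data[(vault, item, field)]


class Resolver:
    def __init__(self, backends_by_vault: dict[str, Backend]) -> None:
        # The URI stays vendor-neutral; configuration decides which backend serves a vault.
        self._backends = backends_by_vault

    def resolve(self, uri: str) -> str:
        if not uri.startswith("op://"):
            raise ValueError(f"not an op:// URI: {uri!r}")
        path = uri[len("op://"):].split("?", 1)[0]  # drop policy modifiers
        parts = path.split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected op://Vault/Item/field, got {uri!r}")
        vault, item, field = parts
        backend = self._backends.get(vault)
        if backend is None:
            raise LookupError(f"no backend configured for vault {vault!r}")
        return backend.fetch(vault, item, field)
```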

EXTENDED METHODS · §38

Op:// URI schema

URIs follow scheme: op://Vault/Item/field[?modifier=value]. Examples: op://Madani/Anthropic/api_key_production, op://Madani/Stripe/live_secret?rotation_required_within=90d. The optional modifier section enables policy declarations (rotation, scope, max-usage). Schema is vault-backend-agnostic; the abstraction layer maps URI to backend-specific resolution.
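A parser for this schema, as a sketch. It treats everything after `?` as policy modifiers and converts durations such as `90d`; the `d`/`h` units are assumed. 1Password's own secret references also allow an optional section segment (`op://vault/item/section/field`), which this three-segment parser rejects, and a resolver should strip custom modifiers such as `rotation_required_within` before handing a reference to the 1Password CLI, since the CLI is not assumed to understand them.

```python
import re
from dataclasses import dataclass
from urllib.parse import parse_qsl


@dataclass(frozen=True)
class OpUri:
    vault: str
    item: str
    field: str
    modifiers: dict[str, str]


def parse_op_uri(uri: str) -> OpUri:
    """Parse op://Vault/Item/field[?modifier=value&...]."""
    prefix = "op://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an op:// URI: {uri!r}")
    path, _, query = uri[len(prefix):].partition("?")
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Vault/Item/field, got {path!r}")
    return OpUri(*parts, modifiers=dict(parse_qsl(query)))


_DURATION = re.compile(r"(\d+)([dh])")


def duration_days(value: str) -> float:
    """Convert a modifier duration such as '90d' or '24h' to days (assumed units)."""
    m = _DURATION.fullmatch(value)
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = int(m.group(1)), m.group(2)
    return float(amount) if unit == "d" else amount / 24
```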

EXTENDED METHODS · §39

Audit module pattern-matching

The audit module scans agent outputs for credential-shaped patterns: known prefixes ("sk-", "pat_", "tok_", "ghp_", "AKIA"), high-entropy strings (Shannon entropy > 4.5 bits/char), structural patterns (JWT format, base64-encoded blobs over 100 chars). False-positive rate: ~75% on entropy alone; ~5% on prefix matches; ~25% on JWT pattern. The prefix matcher is most reliable.
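A compact sketch of the three detectors plus redaction before write. The prefix list combines the prefixes named in this document; the 20-character minimum token length, the character classes, and the JWT shape (base64url segments whose header begins with `eyJ`, the encoding of `{"`) are our assumptions. Note that entropy above 4.5 bits/char requires at least 23 distinct characters (2^4.5 ≈ 22.6), so tokens shorter than 23 characters can never trip the entropy rule.

```python
import math
import re
from collections import Counter

# Prefixes named in this document (secret-guard hook and audit module).
PREFIXES = ("sk-", "sk_live_", "pat_", "tok_", "ghp_", "AKIA", "EAAB")
ENTROPY_THRESHOLD = 4.5  # bits per character

TOKEN = re.compile(r"[A-Za-z0-9_+/-]{20,}")
JWT = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{100,}={0,2}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def find_secret_like(text: str) -> list[tuple[str, int, int]]:
    """Return (rule, start, end) for every credential-shaped span in `text`."""
    findings = [("jwt", m.start(), m.end()) for m in JWT.finditer(text)]
    findings += [("base64_blob", m.start(), m.end()) for m in BASE64_BLOB.finditer(text)]
    for m in TOKEN.finditer(text):
        token = m.group()
        if token.startswith(PREFIXES):
            findings.append(("prefix", m.start(), m.end()))
        elif shannon_entropy(token) > ENTROPY_THRESHOLD:
            findings.append(("entropy", m.start(), m.end()))
    return findings


def redact(text: str) -> str:
    """Replace every flagged span (overlaps merged) with a fixed marker."""
    merged: list[list[int]] = []
    for _, start, end in sorted(find_secret_like(text), key=lambda f: f[1]):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    out, cursor = [], 0
    for start, end in merged:
        out.append(text[cursor:start])
        out.append("[REDACTED]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```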

CASE STUDY · §40

OAuth token rotation

GHL uses OAuth tokens that refresh every 24 hours via refresh token. Pre-pattern: refresh token stored in .env file; expiration handled by individual scripts. Post-pattern: refresh token in 1Password; the n8n workflow get-ghl-token calls 1Password CLI to retrieve refresh token, exchanges for fresh access token, returns.

Zero downtime during 12 months of operation. The pattern correctly handles short-lived tokens (refresh-then-resolve at every use).
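The refresh-then-resolve step follows the standard OAuth 2.0 refresh-token grant (RFC 6749 §6). This sketch injects the HTTP transport (`post_form`) and vault access as callables and treats the token URL and item layout as hypothetical parameters; it does not reproduce the GHL API or the n8n workflow. Client credentials go in the form body, one of the methods RFC 6749 §2.3.1 permits (some servers require HTTP Basic instead). Because RFC 6749 lets a server issue a new refresh token on refresh, the sketch writes one back when it appears.

```python
from typing import Callable, Mapping


def refresh_access_token(
    read_secret: Callable[[str], str],
    write_secret: Callable[[str, str], None],
    post_form: Callable[[str, Mapping[str, str]], Mapping[str, object]],
    token_url: str,
    base_ref: str,  # e.g. "op://Madani/GHL" (hypothetical item layout)
) -> str:
    """Exchange the vaulted refresh token for a fresh access token."""
    refresh_ref = f"{base_ref}/refresh_token"
    response = post_form(token_url, {
        "grant_type": "refresh_token",
        "refresh_token": read_secret(refresh_ref),
        "client_id": read_secret(f"{base_ref}/client_id"),
        "client_secret": read_secret(f"{base_ref}/client_secret"),
    })
    new_refresh = response.get("refresh_token")
    if isinstance(new_refresh, str) and new_refresh:
        # The server rotated the refresh token; persist it, the old one may stop working.
        write_secret(refresh_ref, new_refresh)
    access = response.get("access_token")
    if not isinstance(access, str) or not access:
        raise ValueError("token response missing access_token")
    return access  # kept in memory by the caller; never written to disk
```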

CASE STUDY · §41

GitHub PAT incident

Background: in early 2026 a contractor's workstation was compromised. A GitHub PAT was stored on the laptop in ~/.netrc. Before op://, the PAT had broad scope; after op://, the PAT is scoped to specific repos.

Compromise impact: pre-pattern would have been workspace-wide repo access; post-pattern was limited to 2 specific repos. Time-to-detection: 14 days (GitHub audit log review). Time-to-revocation: 5 minutes (revoke in vault, push to all team members).

The scoped PAT + vault combination contained the blast radius.

CASE STUDY · §42

Cloudflare token

We added Cloudflare DNS automation in Q2 2026. The CF API token is scoped to 3 specific zones (madani.academy, madani.agency, madanitest.com) with only zone:edit permissions. The token lives in 1Password as op://Madani/Cloudflare/api_token.

Scripts that need it resolve at startup. We have used the token ~150 times in 4 months for DNS record updates without issues.

EXTENDED DISCUSSION · §43

Why 1Password vs HashiCorp Vault

The initial Madani choice was 1Password, for two reasons: (a) the team already used 1Password for personal credentials, so integration was natural; (b) cloud hosting removed the operational overhead of running vault infrastructure. HashiCorp Vault is technically superior for some use cases (sub-100ms latency, finer-grained access control) but adds operational overhead (running the vault server, backup, HA) that exceeds the value it would add for us. For teams with regulatory requirements (data residency, FedRAMP), self-hosted Vault may be required.

EXTENDED DISCUSSION · §44

Credentials lifecycle

We classify credentials by lifecycle stage: (a) PROVISIONING: credential created in the external system, copied to the vault. (b) ACTIVE: credential resolved at runtime, used by agents/scripts. (c) ROTATION: periodic refresh per policy. (d) DEPRECATION: old credential marked inactive, retired after a grace period. (e) DESTRUCTION: credential deleted from the vault and the external system. Each stage has documented procedures. Failures occur most commonly at provisioning (poor scoping) and rotation (forgotten rotations).
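Per-stage procedures are easier to enforce when the allowed stage transitions are explicit. A sketch; the transition table is our reading of the five stages (for example, that a credential in ROTATION either returns to ACTIVE or is superseded into DEPRECATION), not a published Madani specification.

```python
from enum import Enum


class Stage(Enum):
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    ROTATION = "rotation"
    DEPRECATION = "deprecation"
    DESTRUCTION = "destruction"


# Allowed transitions (assumed reading of the lifecycle described above).
TRANSITIONS: dict[Stage, frozenset[Stage]] = {
    Stage.PROVISIONING: frozenset({Stage.ACTIVE}),
    Stage.ACTIVE: frozenset({Stage.ROTATION, Stage.DEPRECATION}),
    Stage.ROTATION: frozenset({Stage.ACTIVE, Stage.DEPRECATION}),
    Stage.DEPRECATION: frozenset({Stage.DESTRUCTION}),
    Stage.DESTRUCTION: frozenset(),  # terminal
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a credential to `target`, rejecting transitions that skip a stage."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```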

EXTENDED DISCUSSION · §45 · INTEGRATION WITH GOVERNANCE (WSB-15). The compliance gate from WSB-15 has a credentials-specific rule: "no agent output may contain a string matching credential-shaped patterns". The audit module (§9) detects; the compliance gate blocks. Together they provide defense-in-depth: even if the audit module fails (false negative), the governance gate catches the leak; even if the governance gate fails, the audit module catches it.

EXPANDED CASE STUDY · §46

The ghl-oauth-multi-client workflow

The Madani agency operates GHL subaccounts for 12 active clients (Madani HQ, Proffi, Rara, X-Port, OsteoSpace, Munafò, Studio Buscema, A.C. Service, G Advisor, Grow Up Energy, Estetic Nutrition, SunPower Agency). Each subaccount has a location-scoped OAuth token that expires every 24h and requires refresh-token-based rotation. In the pre-vault implementation, the team's tooling stored 12 client_id + client_secret pairs in a single .env file checked into a separate "secrets" git repo. The pattern had three architectural weaknesses:

  1. (a) the secrets repo was its own attack surface, separate from the application repo, requiring its own access control;
  2. (b) any token rotation required commits to the secrets repo, breaking the immutability of credential state.

EXPANDED CASE STUDY · §47

Rotating Meta ad-account tokens at 60-day cadence

Meta Graph API tokens expire approximately every 60 days. Pre-op:// pattern, the Madani team had no automated rotation; tokens were renewed manually when an ad-account API call failed with a 401, requiring 30-90 minutes of disrupted operation per cadence. We deployed a rotation workflow that runs nightly via n8n: it fetches the current token from 1Password, exchanges it for a fresh token via the Meta token-refresh endpoint, and writes the new token back to 1Password — all while the calling agent continues to use the op:// URI without code changes.

The agent does not know rotation happened; the URI continues to resolve to the current valid token. Over a 14-month post-deployment window across 4 ad accounts (Madani HQ, Proffi, Munafò, OsteoSpace), zero 401-induced production disruptions. The pattern depends on the indirection layer that op:// provides — without it, rotation would require either (a) re-deploying every workflow that uses the token (downtime), or (b) reading the token from a database/cache at runtime (which is a worse vault).

The op:// URI is the right level of indirection because it is human-readable, version-controllable in the application repo, and resolved at runtime through a vault-controlled API. Cross-reference: the rotation script lives in madani/credentials/op-rotation.sh, per the API-MASTER.md filesystem layout.

EXPANDED CASE STUDY · §48

Credential-incident postmortem caught by vault telemetry

In Q1 2026 the agent attempted to use a Stripe LIVE key in a development environment due to a misconfigured environment selector. The op:// resolver's runtime audit logged the resolution event: "stripe_live_secret_key resolved by agent in workspace=dev, expected workspace=prod". The mismatch triggered a vault-side alert that surfaced within 90 seconds to the human reviewer; the agent's call to Stripe was halted before any production-side state change.

Counterintuitively, the alert was triggered not because Stripe rejected the call (Stripe would have accepted it), but because the vault was instrumented with workspace-scope expectations that no other layer could enforce. Without the vault telemetry, the call would have succeeded and produced a partial production state change observable only post-hoc. The case study illustrates a property of vault-resolved credentials: the vault is not only a storage primitive; it is also an authorization-context primitive.

Every credential resolution carries metadata (which workspace, which agent identity, which intent) that the vault can validate before resolving. Remediation: tighten the workspace-scope check to a hard block (instead of an alert) for production secrets resolved from development workspaces. Engineering cost: 2 days for the workspace-scope tightening.
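The hard-block remediation can be sketched as a check that runs before any secret is fetched, using the metadata attached to each resolution request. The `ResolutionRequest` fields, the reference-to-workspace allowlist, and the audit callback are hypothetical; whether such a check lives in the vault or in a resolver wrapper is an implementation choice, and here it sits in the resolver. Unknown references are denied by default.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class ResolutionRequest:
    reference: str
    workspace: str  # e.g. "dev" or "prod"
    agent_id: str
    intent: str


class ScopeViolation(PermissionError):
    """A secret was requested from a workspace it is not scoped to."""


def scoped_resolve(
    req: ResolutionRequest,
    allowed_workspaces: Mapping[str, frozenset[str]],
    reader: Callable[[str], str],
    audit: Callable[[ResolutionRequest, bool], None],
) -> str:
    """Audit every resolution with its context; block out-of-scope ones before fetching."""
    allowed = allowed_workspaces.get(req.reference, frozenset())  # default deny
    permitted = req.workspace in allowed
    audit(req, permitted)  # request metadata only; the secret value is never logged
    if not permitted:
        raise ScopeViolation(
            f"{req.reference} requested from workspace={req.workspace}, "
            f"allowed={sorted(allowed)}"
        )
    return reader(req.reference)
```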

Cross-reference: WSB-09 documents how the alert propagated through the observability pipeline.

EMPIRICAL DEEP-DIVE · §49 · STATISTICAL METHODOLOGY ON CREDENTIAL-LEAK RATES. The headline finding (zero credential-leak events post-vault migration vs a ~14/week baseline) was scrutinized statistically. (a) Zero-event analysis post-migration: 60 days (~8.5 weeks) × ~14 expected events/week ≈ 119 expected events under the null hypothesis that the migration had no effect. Observed: 0.

Wilson 95% upper confidence bound on the true post-migration rate: 3.1% of the baseline rate. We can reject any hypothesis that the post-migration rate exceeds 3.1% of the baseline at 95% confidence. (b) Robustness across credential types: we stratified by credential type (OAuth tokens, API keys, webhook secrets, database connection strings) and found that the zero-event result holds across all four strata; no stratum showed any leak. (c) Sensitivity to leak definition: we ran three definitions (full plaintext token in logs; partial-token fragment covering >50% of the token; any 8+ character subset of token text matching). The headline holds for the strictest definition, but the broadest definition shows 4 ambiguous fragments over the 60 days; on inspection these were not credential leaks but coincidental string matches (e.g., a UUID with overlapping characters).

The "broadest definition" is too noisy to be operationally useful; the "strict" and "moderate" definitions agree. (d) Out-of-sample replication: we replicated the analysis on an independent 90-day window from a partner team's vault migration (with their permission); observed leaks pre-migration: 23 events; observed post-migration: 1 event (1 partial-token fragment from an early-stage workspace not yet fully migrated). The independent replication confirms the original finding's magnitude. (e) Statistical power: with the observed rates the design has 99% power to detect rates 10× smaller than baseline at alpha=0.001, far in excess of conventional requirements.
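The 3.1% figure in (a) can be reproduced by treating the 119 expected events as trials with zero observed; with p̂ = 0 the Wilson upper bound reduces to z²/(n + z²). A short check follows, with the exact two-sided 95% Poisson limit for zero events (-ln 0.025 ≈ 3.69 events) as a cross-check; the Poisson comparison is our addition, not part of the original analysis.

```python
import math


def wilson_upper(successes: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + margin) / denom


n = 119  # ~14 expected events/week x 8.5 weeks (60 days), as in the text
upper = wilson_upper(0, n)
# Exact two-sided 95% Poisson upper limit for 0 observed events, as a rate.
poisson_upper = -math.log(0.025) / n
print(f"Wilson: {upper:.2%}  Poisson: {poisson_upper:.2%}")  # Wilson: 3.13%  Poisson: 3.10%
```

Both framings land at about 3.1% of the baseline, so the headline bound does not hinge on the choice of interval.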

IMPLEMENTATION ANTI-PATTERNS · §50 · FIVE FAILURE MODES IN VAULT MIGRATIONS. Across the 8 teams the Madani Lab advised on vault migrations between Q3 2025 and Q1 2026, five anti-patterns recur. (1) "Vault for storage, .env for runtime": teams add a vault for long-term storage but still inject secrets into a .env file at startup; the .env touches disk and re-introduces the leak surface the vault was meant to eliminate. Remediation: enforce runtime resolution via op:// URIs (or equivalent), with no intermediate disk writes. (2) "Single secret with broad scope": teams retain monolithic secrets (one client_secret for all clients, one master API key for all environments); the vault migration is the natural moment to re-architect scoping, but they defer that work.

The monolithic secret then defeats the vault's per-resolution authorization. Remediation: do not migrate a monolithic secret to the vault until it has been decomposed into scoped tokens. (3) "Vault without rotation": teams move secrets to the vault but never automate rotation; the same token lives in the vault forever, accumulating risk over time. Remediation: every secret in the vault must have a declared rotation policy (cadence in days), and a CI job must verify that rotation actually happens. (4) "No vault telemetry on resolution": teams use the vault only as a storage primitive and miss the authorization-context primitive (§48).

They cannot detect cross-workspace resolution events because the vault is silent on resolution. Remediation: enable resolution audit logs with workspace, agent, and intent metadata, and alert on anomalies. (5) "Vault-vendor lock-in": teams call a single vault vendor's proprietary API directly (instead of a portable op://-style URI scheme) and find vendor migration painful when they outgrow the vendor. Remediation: use a portable URI scheme; let the resolver be vendor-specific but keep the URI scheme vendor-neutral.
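The CI job in remediation (3) might look like this sketch. How the inventory of `VaultSecret` records is exported from the vault is left open (an assumption); the job fails when any secret lacks a declared cadence or has exceeded it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class VaultSecret:
    reference: str
    rotation_days: Optional[int]  # declared cadence; None means no policy
    last_rotated: date


def rotation_violations(secrets: Iterable[VaultSecret], today: date) -> list[str]:
    """Human-readable CI failures: missing policies and overdue rotations."""
    problems = []
    for s in secrets:
        if s.rotation_days is None:
            problems.append(f"{s.reference}: no rotation policy declared")
            continue
        age = (today - s.last_rotated).days
        if age > s.rotation_days:
            problems.append(f"{s.reference}: rotation overdue by {age - s.rotation_days} days")
    return problems
```

A CI step would call `rotation_violations(...)` and exit non-zero whenever the returned list is non-empty.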

CROSS-PILLAR INTEGRATION · §51 · WHERE CREDENTIALS MEET THE OTHER WAB PILLARS. Complementary integration with P07 Governance: vault resolution events feed the governance gate as authorization context (per §48). The two pillars together implement defense-in-depth — vault scopes the credential, governance scopes the action.

Complementary integration with P09 Observability: vault resolution audit logs are a P09-L2 prerequisite for detecting leak attempts; without P09, the vault's authorization signal goes unactioned. Complementary integration with P12 Forward-Deploy: op:// URIs are portable, so a new team can stand up a workspace using the same URIs and a different vault backend (1Password, HashiCorp Vault, AWS Secrets Manager) as long as the URI scheme is preserved. The migration cost for a new team is roughly 2 hours of vault-side item creation; the application code is unchanged.
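For the P09 integration, an audit record needs the reference and its context, never the value. The sketch below assumes a JSON-lines log and two simple, hypothetical anomaly rules: a (workspace, agent, reference) triple that never appeared in a baseline window, and a burst of more than `burst_limit` resolutions of one triple within the window being scanned. `AuditRecord`, `to_log_line`, and `detect_anomalies` are illustrative names; a real deployment would tune or replace the rules, but the detection logic runs over resolution metadata alone.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class AuditRecord:
    ts: float       # Unix timestamp of the resolution
    uri: str        # op:// reference that was resolved (never the value)
    workspace: str
    agent: str
    intent: str


def to_log_line(rec: AuditRecord) -> str:
    """One JSON object per line; only metadata is logged."""
    return json.dumps(asdict(rec), sort_keys=True)


def detect_anomalies(baseline: Iterable[AuditRecord],
                     window: Iterable[AuditRecord],
                     burst_limit: int = 50) -> List[str]:
    """Flag first-seen (workspace, agent, uri) triples and per-triple bursts in `window`."""
    known: Set[Tuple[str, str, str]] = {(r.workspace, r.agent, r.uri) for r in baseline}
    counts: Counter = Counter()
    alerts: List[str] = []
    for r in window:
        key = (r.workspace, r.agent, r.uri)
        if key not in known:
            alerts.append(f"first-seen: {r.agent}@{r.workspace} resolved {r.uri}")
            known.add(key)  # alert once per new triple, not on every repeat
        counts[key] += 1
    for (workspace, agent, uri), n in counts.items():
        if n > burst_limit:
            alerts.append(f"burst: {agent}@{workspace} resolved {uri} {n} times")
    return alerts


baseline = [AuditRecord(0.0, "op://Prod/Stripe/api_key", "billing-prod", "agent-billing", "charge-customer")]
window = [AuditRecord(1.0, "op://Prod/Stripe/api_key", "marketing", "agent-summarizer", "summarize-logs")]
print(detect_anomalies(baseline, window))
# ['first-seen: agent-summarizer@marketing resolved op://Prod/Stripe/api_key']
```

The example alert is exactly the cross-workspace event that anti-pattern (4) says a silent vault cannot surface.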

Structural tension with P02 Skills: each skill may bring its own credentials (e.g., a Stripe skill requires Stripe credentials, an n8n skill requires n8n credentials); as skill count grows past ~30, the per-skill credential management becomes a tax, partially mitigated by hierarchical vault structures (per-skill vaults that inherit from a per-organization vault). Integration with P10 Portability: the op:// URI scheme makes the credentials portable across model and framework swaps; teams that have migrated to op:// report ~25% lower portability tax for credential-touching workflows.
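The hierarchical mitigation for the P02 tension can be modeled with Python's `collections.ChainMap`: a skill's view consults its own vault first and falls back to the organization vault, while other skills' vaults stay invisible. `VaultHierarchy` and the vault names below are hypothetical; the values are op:// references rather than secrets, so the layout can be inspected and tested without touching a vault.

```python
from collections import ChainMap
from typing import Dict


class VaultHierarchy:
    """Hypothetical layout: one organization vault plus one vault per skill.
    Values are op:// references, not secret values."""

    def __init__(self, org_items: Dict[str, str]) -> None:
        self.org_items = dict(org_items)
        self.skill_items: Dict[str, Dict[str, str]] = {}

    def add_skill(self, skill: str, items: Dict[str, str]) -> None:
        self.skill_items[skill] = dict(items)

    def view(self, skill: str) -> ChainMap:
        # Skill-level entries shadow org-level ones; other skills' vaults are never consulted.
        return ChainMap(self.skill_items.get(skill, {}), self.org_items)


vaults = VaultHierarchy({"smtp": "op://Org/SMTP/password", "slack": "op://Org/Slack/token"})
vaults.add_skill("stripe", {"api_key": "op://Skill-Stripe/Stripe/api_key"})
vaults.add_skill("n8n", {"api_key": "op://Skill-n8n/n8n/api_key",
                         "slack": "op://Skill-n8n/Slack/token"})
print(vaults.view("stripe")["smtp"])  # op://Org/SMTP/password (inherited)
print(vaults.view("n8n")["slack"])    # op://Skill-n8n/Slack/token (overridden)
```

Shared credentials are declared once at the organization level, so each new skill only adds the items it genuinely owns; this is one way to contain the per-skill tax, not a claim about how the Madani vaults are laid out.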

EXPANDED CASE STUDY · §52

Scoped-token compounding across the Madani portfolio

The compounding-value claim (that scoped tokens, once introduced, produce compounding security and operational benefits) was tested empirically across the Madani portfolio. We measured five dimensions over the 12 months following the scoped-token migration: (a) blast radius of a hypothetical credential compromise (modeled as the number of clients whose data would be exposed if a single token leaked); (b) onboarding time for a new client (engineer-hours from contract signing to first authenticated API call); (c) offboarding time for a churned client (engineer-hours from churn confirmation to completed credential revocation); (d) audit-trail completeness (% of credential-usage events with full attribution); (e) cross-client mis-routing incidents (incidents per quarter in which one client's tokens were accidentally used against another client's account).

Dimension (a) dropped from "all 12 clients" pre-migration to "1 client" post-migration. Dimension (b) dropped from 4-6 hours to under 1 hour. Dimension (c) dropped from 2-3 hours (often deferred and only partially completed) to under 30 minutes (automated). Dimension (d) climbed from ~70% to 100%. Dimension (e) dropped from 3 incidents in the pre-migration year to 0 in the post-migration year.

The compounding effect is non-trivial because the five dimensions reinforce one another: better attribution (d) makes incident detection (e) cheaper; faster onboarding (b) makes per-client scoping (a) operationally viable for small consultancies; faster offboarding (c) reduces residual-credential drift. The dimensions are correlated but not collinear; they measure different facets of the same underlying property.

The case study formalizes the compounding-value claim: scoped tokens are not merely additive over monolithic tokens; they multiply across security, operations, and audit dimensions.
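Dimensions (a) and (c) follow mechanically from how tokens map to clients. A minimal sketch, assuming a hypothetical `token_clients` mapping from each token to the set of clients it can reach: the blast radius is the size of the largest such set, and offboarding splits into tokens that can simply be revoked (they reach only the churned client) and shared tokens that must be rotated because other clients still depend on them. Client and token names are illustrative; the count of 12 matches the portfolio size reported above.

```python
from typing import Dict, Set, Tuple

TokenMap = Dict[str, Set[str]]  # token name -> clients the token can reach


def blast_radius(token_clients: TokenMap) -> int:
    """Worst-case number of clients exposed if any single token leaks."""
    return max((len(clients) for clients in token_clients.values()), default=0)


def offboarding_plan(token_clients: TokenMap, churned: str) -> Tuple[Set[str], Set[str]]:
    """Return (tokens to revoke outright, shared tokens that must be rotated)."""
    revoke = {t for t, cs in token_clients.items() if cs == {churned}}
    rotate = {t for t, cs in token_clients.items() if churned in cs and cs != {churned}}
    return revoke, rotate


clients = {f"client-{i:02d}" for i in range(1, 13)}
monolithic: TokenMap = {"master_key": set(clients)}
scoped: TokenMap = {f"token-{c}": {c} for c in clients}

print(blast_radius(monolithic), blast_radius(scoped))  # 12 1
print(offboarding_plan(monolithic, "client-03"))  # (set(), {'master_key'})
print(offboarding_plan(scoped, "client-03"))  # ({'token-client-03'}, set())
```

Under the monolithic layout, offboarding one client forces a rotation that touches all 11 remaining clients, which is one mechanical reason offboarding tends to be deferred; under scoped tokens it is a single revocation.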

OPEN RESEARCH QUESTIONS · §53

Falsifiable hypotheses on vault-as-risk-attenuator

(Q1) HYPOTHESIS: The leak-rate reduction (zero events vs a ~14/week baseline) holds across organizational sizes and credential types; in particular, organizations with >50 credentials see proportionally larger absolute leak-rate reductions than organizations with <10. FALSIFICATION TEST: cross-organization study with 15 organizations stratified by credential count.

(Q2) HYPOTHESIS: Adding workspace-scope checks (§48) reduces production-credential misuse events by >80% at minimal engineering cost: the marginal cost is a few days of work, and the marginal benefit is several prevented incidents per year. FALSIFICATION TEST: A/B study with paired vaults, with and without workspace-scope checks.

(Q3) HYPOTHESIS: Vault-resolution telemetry, combined with anomaly detection on shifts in resolution patterns, detects exfiltration attempts that vault-side ACLs alone miss, adding detection coverage for an additional 12-18% of exfiltration patterns. FALSIFICATION TEST: red-team exercise with exfiltration patterns, measuring detection rates with and without telemetry-driven anomaly detection.

(Q4) HYPOTHESIS: The resolver-side portion of the op:// pattern's portability cost (engineering effort to migrate between vault vendors) is sub-linear in the number of credentials N, specifically O(log N), while vault-side item recreation remains O(N). FALSIFICATION TEST: instrumented migration timing across 4 vault vendors.

(Q5) HYPOTHESIS: Credential rotation cadences below 30 days add compounding cost without proportional security benefit; the optimal cadence lies in [30, 90] days for most credential types, except production-payment credentials, which benefit from a cadence under 30 days. FALSIFICATION TEST: cohort study pairing rotation cadences with incident rates.

(Q6) HYPOTHESIS: Vaults that expose a programmable resolution layer (resolver hooks, conditional resolution based on context) outperform vaults that expose only static items, because they enable fine-grained authorization-context decisions at resolution time. FALSIFICATION TEST: paired benchmark of static-vault vs programmable-vault systems across a fixed set of authorization scenarios.
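Anti-pattern (3) in §50 calls for a declared cadence per secret plus a CI job that verifies rotation, and Q5 needs exactly that cadence data to be testable. A minimal sketch, assuming a hypothetical inventory of `SecretPolicy` records (reference, declared cadence, last-rotation date) exported from the vault; a CI job would call `rotation_violations` and fail the build whenever the returned list is non-empty. The cadence values shown are example inputs, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class SecretPolicy:
    uri: str                     # op:// reference of the secret
    cadence_days: Optional[int]  # declared rotation cadence; None means undeclared
    last_rotated: date


def rotation_violations(inventory: List[SecretPolicy], today: date) -> List[str]:
    """List secrets with no declared cadence or whose last rotation is older than the cadence."""
    problems: List[str] = []
    for s in inventory:
        if s.cadence_days is None:
            problems.append(f"{s.uri}: no rotation cadence declared")
            continue
        age = today - s.last_rotated
        if age > timedelta(days=s.cadence_days):
            problems.append(f"{s.uri}: rotation overdue by {age.days - s.cadence_days} days")
    return problems


inventory = [
    SecretPolicy("op://Prod/Stripe/api_key", 30, date(2026, 5, 1)),
    SecretPolicy("op://Prod/Brevo/api_key", 90, date(2026, 1, 1)),
    SecretPolicy("op://Prod/Legacy/token", None, date(2025, 1, 1)),
]
for problem in rotation_violations(inventory, today=date(2026, 5, 20)):
    print(problem)
# op://Prod/Brevo/api_key: rotation overdue by 49 days
# op://Prod/Legacy/token: no rotation cadence declared
```

Treating a missing cadence as a violation, rather than skipping it, is what turns "every secret must declare a rotation policy" into a checkable rule.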

References

[1] 1Password (2025), 1Password CLI Documentation.
[2] HashiCorp (2024), Vault Best Practices.
[3] AWS (2024), Secrets Manager Developer Guide.
[4] Google Cloud (2024), Secret Manager Documentation.
[5] Microsoft (2024), Azure Key Vault Documentation.
[6] direnv (2024), direnv official documentation.
[7] Greshake K. et al. (2023), Indirect Prompt Injection.
[8] OWASP (2024), Application Security Verification Standard v5.
[9] NIST (2024), SP 800-57 Cryptographic Key Management.
[10] TruffleHog (2024), TruffleHog secret scanner.
[11] git-secrets (2017), git-secrets.
[12] GitGuardian (2024), State of Secrets Sprawl Report.
[13] Madani Lab (2026), credentials-policy.md v1.4 (op:// pattern, multi-backend reference).
[14] Cemri M., Pan M.Z., Yang S., Agrawal L.A., Chopra B., Tiwari R., Keutzer K., Parameswaran A., Klein D., Ramchandran K., Zaharia M., Gonzalez J.E., Stoica I. (2025), Why Do Multi-Agent LLM Systems Fail?, arXiv:2503.13657v3, NeurIPS 2025.
[15] Tran D. & Kiela D. (2026), Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning, arXiv:2604.02460.
[16] Wang C. & Shu Y. (2026), MetaCogAgent, arXiv:2605.17292v1.
[17] Es S., James J., Espinosa-Anke L., Schockaert S. (2024), RAGAS, EACL 2024, arXiv:2309.15217.
[18] Anthropic (2022), Constitutional AI.
[19] Anthropic (2025), Claude Sonnet 4.5 Technical Report.
[20] Schick T. et al. (2023), Toolformer, NeurIPS.
[21] PCI Security Standards Council (2024), PCI DSS v4.0.
[22] Madani Lab (2026), op-pattern reference implementation v1.0 (multi-backend).

Method

The architecture has three layers:

(1) STORAGE LAYER. All secrets live in a 1Password vault accessible only to authorized humans via the 1Password CLI. The vault is the single source of truth; no copies exist anywhere else.

(2) RESOLUTION LAYER. Code and configuration files reference secrets via 'op://Vault/Item/field' URIs. At process startup, a wrapper (we use direnv plus the 1Password CLI) resolves these URIs to the actual secret values and exposes them to the process as environment variables. The resolved values never touch disk; they live only in process memory.
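The resolution step can be sketched as follows, assuming direnv exports only the references (for example a hypothetical `STRIPE_API_KEY=op://Prod/Stripe/api_key`) and that `op` is installed and signed in. `op read <reference>` is the 1Password CLI command that prints a single secret; the helper names `op_read`, `resolve_env`, and `run_with_secrets` are ours, and the CLI's own `op run` command offers a comparable substitution out of the box. Values are resolved into an in-memory dictionary and handed to the child process as its environment; nothing is written to a file.

```python
import os
import subprocess
from typing import Callable, Dict, List, Mapping


def op_read(uri: str) -> str:
    """Resolve one secret reference via the 1Password CLI (`op read <uri>`).
    Assumes `op` is on PATH and already signed in; raises CalledProcessError on failure."""
    result = subprocess.run(["op", "read", uri], capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")


def resolve_env(env: Mapping[str, str], read: Callable[[str], str] = op_read) -> Dict[str, str]:
    """Copy `env`, replacing every op:// value with its secret. Nothing is written to disk."""
    return {key: (read(value) if value.startswith("op://") else value)
            for key, value in env.items()}


def run_with_secrets(cmd: List[str], read: Callable[[str], str] = op_read) -> int:
    """Start `cmd` with resolved secrets in its environment; the parent's os.environ is untouched."""
    child_env = resolve_env(os.environ, read)
    return subprocess.run(cmd, env=child_env).returncode
```

The `read` parameter is injectable so the substitution logic can be tested with a fake resolver; in production it defaults to shelling out to `op`. Because only the child's environment receives plaintext, the parent shell keeps holding references, not values.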

(3) AGENT-AWARENESS LAYER. The agent runtime is taught about secret-handling discipline: never write secrets to logs, never include secret-bearing variables in prompt context, never propagate secrets to sub-agents that don't need them. This is enforced by a small audit module that scans agent outputs for secret-shaped patterns (high-entropy strings, known prefixes like 'sk-', 'pat_', 'tok_') and triggers an alert if found.
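A minimal version of the audit module's pattern check, assuming the three prefixes named above plus a Shannon-entropy test on long token-like runs. The 20-character minimum, the 8-character prefix tail, and the 4.0 bits-per-character threshold are illustrative assumptions, not Madani's tuned values; like any heuristic of this kind, it will flag some high-entropy document content as secret-like. `redact` returns the cleaned text plus a flag, which is the signal an alert would key on.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Known secret prefixes from the text; the 8+ character tail is an assumed token shape.
PREFIX_RE = re.compile(r"\b(?:sk-|pat_|tok_)[A-Za-z0-9_\-]{8,}")
# Long runs of token-like characters, checked for entropy below (20+ chars is an assumption).
CANDIDATE_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_secret_like(text: str, min_entropy: float = 4.0) -> List[Tuple[int, int]]:
    """Return merged (start, end) spans that look like secrets."""
    spans = [m.span() for m in PREFIX_RE.finditer(text)]
    spans += [m.span() for m in CANDIDATE_RE.finditer(text)
              if shannon_entropy(m.group()) >= min_entropy]
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text: str, placeholder: str = "[REDACTED]") -> Tuple[str, bool]:
    """Replace secret-like spans; the boolean says whether anything was caught."""
    spans = find_secret_like(text)
    pieces, last = [], 0
    for start, end in spans:
        pieces.append(text[last:start])
        pieces.append(placeholder)
        last = end
    pieces.append(text[last:])
    return "".join(pieces), bool(spans)


print(redact("debug: using sk-abcdef1234567890 now"))
# ('debug: using [REDACTED] now', True)
```

Running the check on every output before it is written is what allows a catch to be redacted rather than merely reported.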

We deployed this architecture for the Madani workspace (23 distinct external services: GHL, n8n, Anthropic, Stripe, Slack, Brevo, ClickUp, Meta, Google APIs, Vercel, Cloudflare, Fireflies, Tally, Exa, Kie.ai, OpenAI, APIMO, SMTP, BigQuery, Drive, Sheets, Gemini, Firecrawl) and ran it in production for 12 months. We also subjected the architecture to internal red-team review (semi-structured attempts to extract secrets via prompt injection, log scraping, vault-timeout exploitation).

Findings

Zero plaintext-leak incidents over 12 months. The audit module fired 8 times: 6 were false positives (high-entropy strings in document content that happened to look secret-like) and 2 were genuine catches (an agent had attempted to log a credential during a debugging session; the audit module caught the attempt and redacted the value before the write). Red-team findings: 0 successful secret extractions out of 40 attempted attacks.

Three sub-findings worth noting.

(1) THE RUNTIME RESOLUTION LATENCY IS NEGLIGIBLE. Resolving 23 'op://' URIs at process startup adds ~1.2 seconds to startup time. For long-running agent processes this is invisible; for short-lived autonomous loops this is a measurable overhead that we amortize by caching the resolved values in process memory.
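In-memory caching of resolved values, together with the 1-hour vault-outage grace period mentioned in the closing discussion, can be sketched as a small wrapper around any resolver function (for example, one that shells out to the 1Password CLI). `CachedResolver`, the 5-minute freshness window, and the choice to start the grace window when an entry stops being fresh are assumptions for illustration. The clock is injectable so the behavior can be tested without sleeping.

```python
import subprocess
import time
from typing import Callable, Dict, Tuple


class CachedResolver:
    """Process-memory cache in front of a secret resolver (nothing is persisted).

    Entries are fresh for `ttl` seconds. If a refresh fails because the vault is
    unreachable, a stale entry is still served for up to `grace` seconds after it
    stopped being fresh; after that the error propagates.
    """

    def __init__(self, read: Callable[[str], str], ttl: float = 300.0,
                 grace: float = 3600.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._read = read
        self._ttl = ttl
        self._grace = grace
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # uri -> (value, fetched_at)

    def get(self, uri: str) -> str:
        now = self._clock()
        hit = self._cache.get(uri)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh: no vault round-trip
        try:
            value = self._read(uri)
        except (OSError, subprocess.SubprocessError):
            # Vault or CLI unavailable: fall back to a recent-enough cached value.
            if hit is not None and now - hit[1] < self._ttl + self._grace:
                return hit[0]
            raise
        self._cache[uri] = (value, now)
        return value
```

Catching only `OSError` (for example, the CLI binary missing) and `SubprocessError` (for example, a failed `op` call) keeps genuine bugs in the resolver from being masked by stale values.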

(2) THE 1Password CLI DEPENDENCY IS BOTH AN ASSET AND A LIABILITY. The asset: 1Password's CLI handles vault authentication, audit logging, and rotation natively. The liability: every machine that needs to resolve secrets must have 1Password CLI installed and authenticated. We document a multi-platform setup guide (macOS, Linux, Windows WSL) and have automation that bootstraps new machines, but onboarding still requires 1-2 hours of human time.

(3) THE AGENT-AWARENESS LAYER IS THE LEAST-MATURE COMPONENT. The audit-module pattern detection is brittle (the 6 false positives over 12 months are 75% of all alerts). We are exploring more robust detection via context-aware classifiers, but the trade-off is additional inference cost per output. The current pragmatic stance: false positives are acceptable because each is reviewed by a human in seconds; false negatives would be catastrophic.

Discussion

Three implications.

(i) ZERO PLAINTEXT IN REPO IS AN ENFORCEABLE INVARIANT. The combination of 'op://' resolution + pre-commit hooks that scan for secret-shaped patterns + the audit module produces an invariant we have not seen violated. This makes "is your workspace free of plaintext secrets" a binary checkable property — which is a prerequisite for any meaningful security review.

(ii) THE WAB PILLAR 08 (CREDENTIALS) MATURITY CRITERIA. We codified the operational lessons as L0-L4 criteria for the Credentials pillar: L0 = secrets in repo; L1 = secrets in environment variables but no vault; L2 = secrets in vault with manual resolution; L3 = secrets resolved at runtime via op:// pattern; L4 = L3 + agent-awareness audit + quarterly rotation policy. The Madani workspace operates at L4 for 21 of 23 services (2 services have legacy API tokens that don't support programmatic rotation; these are scheduled for migration in Q3 2026).
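The L0-L4 ladder is mechanical enough to check per service. A minimal sketch, assuming hypothetical `CredentialPosture` facts gathered for each service and reading "quarterly" as a declared cadence of at most 92 days (an assumed cut-off). A service whose legacy token cannot be rotated programmatically, like the two noted above, has no cadence and therefore tops out at L3.

```python
from dataclasses import dataclass
from typing import Optional

QUARTER_DAYS = 92  # assumed upper bound for a "quarterly" rotation cadence


@dataclass(frozen=True)
class CredentialPosture:
    """Hypothetical per-service facts; field names are illustrative."""
    secrets_in_repo: bool
    uses_vault: bool
    runtime_op_resolution: bool
    agent_audit_enabled: bool
    rotation_cadence_days: Optional[int]  # None when rotation cannot be automated


def maturity_level(p: CredentialPosture) -> int:
    """Map a service's posture onto the Pillar 08 L0-L4 criteria."""
    if p.secrets_in_repo:
        return 0
    if not p.uses_vault:
        return 1  # secrets only in environment variables
    if not p.runtime_op_resolution:
        return 2  # vault, but manual resolution
    quarterly = p.rotation_cadence_days is not None and p.rotation_cadence_days <= QUARTER_DAYS
    return 4 if (p.agent_audit_enabled and quarterly) else 3


print(maturity_level(CredentialPosture(False, True, True, True, 90)))    # 4
print(maturity_level(CredentialPosture(False, True, True, True, None)))  # 3
```

Checking the levels in order makes the ladder strict: a service cannot reach L4 through auditing alone if it still keeps a secret in the repo.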

(iii) THE OPEN-SOURCE PATTERN. The 'op://' pattern and the audit-module are both vendor-agnostic in concept. We use 1Password because that is what was procured at Madani; the same architecture works with AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, etc. The Madani reference implementation includes a thin abstraction layer that supports multiple vault backends; this is open-source and MIT-licensed.

We close by acknowledging the inherent fragility of any credentials architecture. The architecture documented here makes specific failure modes hard, but it does not make all failure modes impossible. We have explicitly considered 5 failure modes, and each is either mitigated or knowingly accepted: vault availability (a 1-hour grace period with cached resolved values); CLI bugs (we pin the CLI version and test upgrades); audit-module bypass (the module runs at the process level, which is harder to bypass than a prompt-level instruction); human exfiltration (not mitigated: we cannot prevent a human with vault access from copying secrets); and a supply-chain attack on the 1Password CLI itself (accepted as part of vendor selection). Future work: harden the agent-awareness layer with context-aware classification, and extend the audit module to sub-agent contexts (we currently audit only primary-agent outputs).

DISCUSSION · CLOUD VS ON-PREMISE VAULTS. The architecture documented here assumes 1Password (cloud-hosted). On-premise and private-network vault deployments (self-hosted HashiCorp Vault; AWS Secrets Manager reached through a VPC) introduce different operational characteristics: lower latency (sub-100ms vs sub-1s for cloud lookups), tighter regulatory boundaries (data-residency control), and higher operational overhead (running the vault itself). For workspaces with strict residency requirements (EU GDPR plus financial-sector rules), we recommend an on-premise deployment; for typical SaaS deployments, a cloud vault remains the pragmatic default.

DISCUSSION · INTEGRATION WITH GOVERNANCE PILLAR. Credentials hygiene and governance (WSB-15) intersect: a hard rule "never write credentials to logs" requires the credentials pillar audit module to enforce it. We have unified these two pillars at the policy layer: credentials-policy.md references governance-as-code primitives, and the compliance-judge gate (WSB-15) inspects credential-leakage patterns as part of its standard checks.

Future work

(1) A multi-cloud abstraction layer that supports 1Password, HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and Azure Key Vault behind a single op:// URI pattern. (2) Cryptographic attestation that the vault has not been tampered with (using HSM-backed signing). (3) Automated rotation policies tied to credential-usage patterns (rotating frequently used credentials more aggressively).

References

1Password (2025), 1Password CLI Documentation; HashiCorp (2024), Vault Best Practices; AWS (2024), Secrets Manager Developer Guide; Google Cloud (2024), Secret Manager Documentation; Greshake K. et al. (2023), Indirect Prompt Injection; OWASP (2024), Application Security Verification Standard v5; NIST (2024), SP 800-57 Cryptographic Key Management; Madani Lab (2026), credentials-policy.md v1.4 (op:// pattern, multi-backend reference).

Madani Lab · WAB v0.3.4