# Claude Managed Agents Drop — KAIRO Adoption Decision

Source: @ClaudeDevs X-thread, 2026-05-06.
URL: https://x.com/claudedevs/status/2052069321355182447

## TL;DR decision matrix

| Feature | Verdict | Effort | Order |
|---|---|---|---|
| **Outcomes-loop (rubric self-grading)** | ✅ ADOPT NOW | LOW (write 1 markdown rubric) | 1st |
| **Webhooks → Cloudflare Worker → NTFY/GHL** | ✅ ADOPT WITH BRIDGE | MED (deploy 1 Worker) | 2nd |
| **Multi-agent orchestration** | ✅ ADOPT for V2 Prospect Cloner build | MED-HIGH (rebuild on Managed Agents API) | 3rd |
| **/claude-api skill** | ✅ ALREADY ACTIVE | NONE (auto-loaded in this session) | done |
| **Dreaming (memory curation)** | ❌ SKIP for now, revisit at scale | HIGH | n/a |

---

## 1. Outcomes-Loop — adopt now

**What:** Plain Markdown rubric uploaded as a file. A grader sub-agent (in its own context window, deliberately bias-isolated from the main agent) evaluates each iteration of work against the rubric and feeds back a per-criterion gap list. The loop runs until the rubric is satisfied, the iteration cap is hit (default 3, ceiling 20), or the session is interrupted.

**Cost:** Standard token rates. Grader call ~2,400 input + 350 output tokens per iteration in the docs example (so $0.005-$0.02 per grade depending on model).

**Code shape (Python):**
```python
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a prospect dossier for ICE COLD YETI",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,
    }],
)
```

**Why now:** The 4 prevention rules I saved to memory after the Yeti v0 disaster (no-pricing-on-first-touch, recheck-brief-before-each-artifact, all-hands-build-agents, open-every-artifact-judgment) translate one-for-one into criteria in a single rubric file. The grader catches the failures BEFORE Anthony reviews. Estimated value: every Yeti-style 3-round redo avoided saves roughly $9-18 in API spend plus 30-60 minutes of Anthony's review time.

**KAIRO Prospect Dossier Rubric v1 (drop-in ready):**

```markdown
# KAIRO Prospect Dossier Rubric

## Brand Identity
- Brand name used throughout is either the confirmed locked name or the exact phrase "marketing and AI company". No other variations.
- No em dashes appear anywhere in the copy. Use commas, periods, semicolons, parentheses, or rewrite.
- All logos referenced were opened and visually confirmed against the prospect's existing mark before use.

## Contact Accuracy
- The phone number in any tel: or sms: link is Anthony's number 361-549-6417, not the prospect's.
- The prospect's address and business category were verified against a live source, not inferred.

## Audience Positioning
- The target audience is named by job title or buying role, not described as "general" or "everyone".
- No pricing figures, service fees, or cost estimates appear anywhere in the deliverable.
- The deliverable references the in-person conversation with the prospect, not a cold-discovery framing.

## Audio + Visual Alignment
- If audio is present, the script matches the page copy in tone and substance, no contradictions.
- Audio for prospect-facing artifacts opens with "Hi, this is Anthony's research agent" and stays in agent voice.

## Output Verification
- A completed deliverable file exists at the expected path. A plan or outline alone does not satisfy.
- The deployment URL resolves and returns HTTP 200 before marking complete.
- All asset paths embedded in the HTML resolve to real files in the deploy bundle.
```

**Action:** save this rubric at `~/KAIRO/projects/orion-ai-co/prospect-engine/rubrics/prospect-dossier-v1.md` and reference it via the Files API on every dossier-generating session once we move to Managed Agents.
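Several of these criteria are mechanically checkable before any grader runs. A minimal pre-flight sketch, assuming the deliverable is a single HTML string (the function name and regexes are mine, not part of any API; the phone number comes from the Contact Accuracy criterion):

```python
import re

ANTHONYS_DIGITS = "3615496417"  # 361-549-6417, per the Contact Accuracy criterion

def preflight_check(html: str) -> list[str]:
    """Return the rubric violations that are mechanically detectable."""
    violations = []
    # Brand Identity: no em dashes anywhere in the copy.
    if "\u2014" in html:
        violations.append("em dash found in copy")
    # Contact Accuracy: tel:/sms: links must carry Anthony's number.
    for number in re.findall(r"(?:tel|sms):([\d\-+() ]+)", html):
        # endswith() tolerates a leading country code like +1
        if not re.sub(r"\D", "", number).endswith(ANTHONYS_DIGITS):
            violations.append(f"link points at wrong number: {number}")
    # Audience Positioning: no pricing figures anywhere in the deliverable.
    if re.search(r"\$\s?\d", html):
        violations.append("pricing figure found")
    return violations
```

Running this before the grader (or before any manual review) turns three rubric lines into a zero-cost gate; the grader still owns the judgment-call criteria like audio tone and audience naming.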

---

## 2. Webhooks — adopt with thin Cloudflare Worker bridge

**What:** Anthropic POSTs JSON event payloads to your registered HTTPS endpoint when session state changes. At-least-once delivery, idempotency via `event.id`. Console-only registration (UI), HTTPS port 443 only, signed with `X-Webhook-Signature` header. SDK has an `unwrap()` helper that verifies signature and rejects payloads more than 5 minutes old.

**Event types worth subscribing to:**
- `session.outcome_evaluation_ended` (the big one: fires when the grader finishes an iteration)
- `session.status_idled` (agent waiting for input)
- `session.thread_created` / `session.thread_terminated` (multi-agent observability)
- `session.status_terminated` (terminal failure)

**Why a Cloudflare Worker bridge:** Anthropic webhooks send `Content-Type: application/json` with a complex schema. NTFY expects a raw text body with title/priority/tags as headers. They don't speak the same wire format. A short Cloudflare Worker (about 20 lines) translates one to the other:

```javascript
export default {
  async fetch(request, env) {
    // NOTE: verify X-Webhook-Signature before trusting the payload
    // (omitted here for brevity; see the verification notes above).
    const event = await request.json();
    if (event.data.type !== 'session.outcome_evaluation_ended') {
      return new Response('ignored', { status: 200 });
    }
    // fetchSession is a hypothetical helper: GET the session from the
    // Anthropic API to recover the artifact URL.
    const session = await fetchSession(event.data.id, env.ANTHROPIC_KEY);
    const dossierUrl = session.output_url; // or whatever the artifact URL is
    await fetch('https://ntfy.sh/kairo-ant-updates', {
      method: 'POST',
      headers: {
        // Raw emoji in a header value throws in Workers fetch (non-ISO-8859-1);
        // ntfy renders the "dart" tag as 🎯 instead.
        'Title': 'Prospect dossier ready',
        'Tags': 'dart',
        'Priority': 'high',
        'Actions': `view, Open dossier, ${dossierUrl}, clear=true`,
      },
      body: `Dossier ready for review: ${dossierUrl}`,
    });
    return new Response('ok');
  },
};
```

**Architecture for v2 Prospect Engine:**
```
Plod transcript drops in iCloud / inbox folder
  → file-watch trigger (Cloudflare R2 event or polled)
  → POSTs to thin starter Worker
  → Worker creates Managed Agent session via Anthropic API
  → Agent runs prospect research with multi-agent orchestration + outcomes-loop rubric
  → On session.outcome_evaluation_ended, Anthropic webhook fires our Cloudflare Worker
  → Worker fetches dossier artifact URL from session
  → Worker POSTs to NTFY → Anthony's phone with action button
```
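Until the R2 event trigger exists, the polled file-watch step is a few lines. A sketch with the POST injected as a callable so the loop logic stands alone (inbox path, `.txt` extension, and Worker URL are all placeholders):

```python
from pathlib import Path
from typing import Callable

def scan_inbox(inbox: Path, seen: set[str], post: Callable[[Path], None]) -> None:
    """POST each not-yet-seen transcript to the starter Worker, exactly once."""
    for transcript in sorted(inbox.glob("*.txt")):  # transcript extension assumed
        if transcript.name not in seen:
            post(transcript)  # e.g. requests.post(STARTER_WORKER_URL, data=transcript.read_bytes())
            seen.add(transcript.name)

# Run as: while True: scan_inbox(INBOX, seen, post); time.sleep(30)
```

The `seen` set is in-memory, so a restart re-posts everything; the idempotency dedupe on the webhook side (or a sidecar state file) covers that.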

**Action:** deploy 2 thin Cloudflare Workers (starter + completion bridge) when we build v2, and configure 1 webhook in the Console pointing at the completion-bridge Worker URL.

---

## 3. Multi-Agent Orchestration — adopt for V2 build

**What:** The coordinator agent has a `multiagent.agents` list of pre-created sub-agent IDs. At runtime, the coordinator delegates to any of them via the `agent_toolset_20260401` tools. Each sub-agent runs in its own thread (isolated context window, own model, own system prompt, own tools) but they share the same container/filesystem.

**Limits:**
- 25 concurrent threads per session
- 1 level of nesting (coordinator → sub-agents; sub-sub-agents not supported)
- 20 unique agents max in the roster
- Up to 25 active threads can hit a single sub-agent (multi-instance)

**Code shape (Python):**
```python
coordinator = client.beta.agents.create(
    name="Prospect Cloner Coordinator",
    model="claude-opus-4-7",
    system="You receive a Plod transcript and orchestrate prospect research...",
    tools=[{"type": "agent_toolset_20260401"}],
    multiagent={
        "type": "coordinator",
        "agents": [
            {"type": "agent", "id": web_auditor.id},
            {"type": "agent", "id": social_auditor.id},
            {"type": "agent", "id": ad_scanner.id},
            {"type": "agent", "id": reviews_scanner.id},
            {"type": "agent", "id": gbp_scanner.id},
            {"type": "agent", "id": positioning_strategist.id},
        ],
    },
)
```

**Why now (vs Claude Code Agent tool):** The Claude Code Agent tool (what I currently use to spawn `general-purpose` and `Explore` agents) only works inside a Claude Code session. It can't run as a standalone Python service. It also shares the parent context window, so spawning 6 research agents pollutes my context. Managed Agents gives hard context isolation and runs as a standalone service, which is exactly what the v2 Prospect Cloner needs: triggered by a Plod transcript drop, no Claude Code session involved.

**Migration cost:** medium. Pre-create 6 specialized sub-agents (one per recon surface) once, store IDs. Build a thin Python service or Cloudflare Worker that creates a session per Plod transcript and streams events. Replaces the hand-rolled `Agent` tool spawning + `/tmp` polling.
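The pre-create-once step could look like the following, with the actual `client.beta.agents.create` call injected as a function so the roster logic stands alone (system prompts are abbreviated placeholders; the create-call shape is assumed from the coordinator example above):

```python
from typing import Callable

# One spec per recon surface; prompts abbreviated, real ones would be longer.
SUBAGENT_SPECS = [
    ("web-auditor", "Audit the prospect's website: stack, speed, copy, CTAs."),
    ("social-auditor", "Audit social profiles: cadence, engagement, gaps."),
    ("ad-scanner", "Scan ad libraries for active and historical campaigns."),
    ("reviews-scanner", "Scan review platforms for themes and complaints."),
    ("gbp-scanner", "Audit the Google Business Profile listing."),
    ("positioning-strategist", "Synthesize recon into a positioning angle."),
]

def build_roster(create: Callable[[str, str], str]) -> list[dict]:
    """Pre-create each sub-agent once and return the multiagent.agents list.

    `create(name, system)` wraps the client.beta.agents.create(...).id call;
    persist the returned ids so this runs only once.
    """
    return [{"type": "agent", "id": create(name, system)}
            for name, system in SUBAGENT_SPECS]
```

Store the resulting ids in a config file next to the rubric; the per-transcript service then only creates sessions, never agents.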

**Action:** build into the v2 Prospect Cloner architecture. Not a same-day adopt because we're not building v2 today.

---

## 4. /claude-api Skill — already active

**What:** Anthropic's official skill that gives Claude deep, opinionated knowledge of the Anthropic SDK. Auto-fires on imports of `anthropic`/`@anthropic-ai/sdk`. Enforces:
- Default model: `claude-opus-4-7` unless explicitly overridden
- Default thinking: `{type: "adaptive"}` on 4.6+
- Default streaming: ON for any long input/output or high `max_tokens`
- Default `max_tokens`: ~16k non-streaming, ~64k streaming
- Use SDK objects, never raw `requests`/`fetch`
- Prompt caching enforcement: 4096-token minimum for Opus 4.7/4.6/Haiku 4.5; place breakpoints on system prompts
- Migration handling: knows every retired model and its drop-in replacement

**Status in this session:** already loaded (visible in the available-skills list at the top of every session). No install action needed.

**Where it actually applies in KAIRO:** the skill's SKIP rule (`file imports openai`) excludes most existing KAIRO scripts because they use OpenAI for image gen and TTS, not Anthropic. The skill's value lights up when we write new Claude-API scripts. The two highest-value triggers:

1. Any new script in the v2 Prospect Cloner build (will use `import anthropic` heavily for Managed Agents)
2. Any KAIRO orchestration script that wants to summarize/categorize content (e.g. a brain-summarizer agent over `~/KAIRO/brain/insights/`)

**Action:** no install action. When we write new KAIRO scripts touching the Anthropic SDK, expect the skill to enforce caching, model, and thinking defaults automatically.

---

## 5. Dreaming — skip for now

**What:** A manually triggered async job. POSTs to `/v1/dreams` with an input memory store and up to 100 session transcripts. Outputs a NEW memory store with duplicates merged, stale entries replaced, and new patterns surfaced. The input store is never modified.

**Why skip:** Dreaming operates on Anthropic-hosted memory stores (cloud API resource: `memstore_01...`). Your existing memory at `~/.claude/projects/-Users-ant-mba-KAIRO/memory/` is a local `.md` file system. The two are completely separate storage layers with no bridge.

To use Dreaming you would need to:
1. Manually populate a Managed Agents memory store via the Memory Stores API with all current `.md` content (one-time migration script)
2. Rewrite the write pipeline so future memories go to the API store instead of `.md` files
3. Lose the git-versionability and grep-ability of the current system
4. Pay for Opus or Sonnet to run over all input sessions on every dream
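For the record, the local half of step 1 is simple file gathering; the upload call is left as a stubbed comment because the Memory Stores API shape is not something confirmed here:

```python
from pathlib import Path

def collect_memory_entries(memory_dir: Path) -> list[dict]:
    """Read every local .md memory file into an upload-ready entry."""
    return [
        {"name": md.stem, "content": md.read_text()}
        for md in sorted(memory_dir.rglob("*.md"))
    ]

# Upload loop would go here, e.g. over the current local store:
#   memory_dir = Path.home() / ".claude/projects/-Users-ant-mba-KAIRO/memory"
#   for entry in collect_memory_entries(memory_dir):
#       client.beta.memory_stores.entries.create(store_id=..., **entry)  # shape unverified
```

Even if we never run Dreaming, the same gatherer is useful for any future backup or export of the local store.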

**Risk vs Memory File Protection rule:** Dreaming is structurally non-destructive (input store untouched, output is a new store you review or discard), so it partially aligns with your rule. But there's no built-in diff/approval gate. You'd be reviewing two potentially large memory stores side by side without UI assistance.

**When to revisit:** if KAIRO scales to hundreds of sessions per week and manual curation becomes impractical. Today, at 29 active memory entries, you have firm control; manual curation stays the right call.

---

## Build sequence (recommended)

1. **Today (10 min):** save the Prospect Dossier rubric to `prospect-engine/rubrics/prospect-dossier-v1.md`. Use it as a checklist on the next manual prospect dossier. Even without Managed Agents, the rubric is a quality gate I can self-grade against per the existing "open every artifact, write one-line judgment" memory rule.

2. **This week (1-2 hr):** prove out Managed Agents with a tiny one-shot script. Pre-create 1 coordinator + 2 sub-agents. Run a single small prospect (a hypothetical clone of the Yeti recon) end-to-end. Verify outcomes-loop catches a deliberate violation (e.g. plant a $5,000 price in the dossier, see the rubric flag it).

3. **Next 1-2 weeks (4-6 hr):** build the v2 Prospect Cloner architecture:
   - 6 sub-agents: web-auditor, social-auditor, ad-scanner, reviews-scanner, gbp-scanner, positioning-strategist
   - Coordinator with `multiagent.agents`
   - Cloudflare Worker starter + completion-bridge
   - Webhook registered in Console
   - End-to-end test: drop a Plod transcript, dossier hosted at Cloudflare Pages within 10 minutes, NTFY notification with action button

4. **Whenever Plod has a webhook trigger:** wire Plod transcript → Cloudflare R2 upload → R2 event → starter Worker. Until then, manual file-watch is fine.

5. **Defer:** Dreaming until KAIRO operates at scale that warrants it.

## What this means for billing / cost

Estimated incremental cost per prospect run on the v2 architecture:
- 6 sub-agents × Sonnet × ~3-5 minutes each = ~$0.50-1.50
- Coordinator orchestration overhead = ~$0.10-0.30
- Outcomes-loop grader, 2-3 iterations = ~$0.02-0.06
- Image generation (gpt-image-2 logos, separate from Anthropic) = ~$2-3
- Audio briefing (OpenAI Nova TTS, separate) = ~$0.05-0.10
- Cloudflare Pages deploy = $0
- Cloudflare Workers + R2 = ~$0 (free tier)
- NTFY = $0

**Total: roughly $3-5 per prospect.** Same ballpark as today, with much less manual orchestration.
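Summing the bands confirms the ballpark:

```python
# (low, high) USD bands per prospect run, from the list above
bands = {
    "sub-agents": (0.50, 1.50),
    "coordinator": (0.10, 0.30),
    "grader": (0.02, 0.06),
    "image gen": (2.00, 3.00),
    "audio": (0.05, 0.10),
    "cloudflare + ntfy": (0.00, 0.00),
}
low = sum(b[0] for b in bands.values())
high = sum(b[1] for b in bands.values())
print(f"${low:.2f}-${high:.2f} per prospect")  # $2.67-$4.96 per prospect
```

The strict sum is $2.67-$4.96, so "roughly $3-5" holds once you allow for the occasional extra grader iteration or re-run.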

## Sources

- [Multi-agent orchestration](https://platform.claude.com/docs/en/managed-agents/multi-agent)
- [Define outcomes (rubrics)](https://platform.claude.com/docs/en/managed-agents/define-outcomes)
- [Webhooks + supported event types](https://platform.claude.com/docs/en/managed-agents/webhooks#supported-event-types)
- [Dreaming (memory curation)](https://platform.claude.com/docs/en/managed-agents/dreams)
- [/claude-api skill OSS repo](https://github.com/anthropics/skills/tree/main/skills/claude-api)
- @ClaudeDevs X-thread anchor: https://x.com/claudedevs/status/2052069321355182447

Full agent transcripts retained at `/private/tmp/claude-501/.../tasks/` (do not delete).
