# Design: Migrate hu-agent agentic provider from `cursor-agent` to Claude Code CLI - **Date:** 2026-05-28 - **Author:** Iván Gómez Yaury - **Status:** Draft (pending review) - **Working with:** Agustín De Luca (will run the plan and test jointly) ## Context `hu-agent` today delegates all agentic coding work to the **Cursor CLI agent**: `CursorCliClient.runAgent(prompt, workDir)` spawns the `cursor-agent` binary as a subprocess inside a freshly cloned target repo. The binary runs the full agentic loop (reads files, edits them, runs commands, makes git changes), streams `stream-json` events to stdout, and the client extracts the final assistant text. The pipelines (`fix`, `pr-comment`, `pr-mention`, `slack-mention`) build prompts with sentinel markers (`PR_TITLE_*`, `PR_BODY_*`, `REPLY_*`, `VERDICT/SUMMARY/QUESTIONS`) and parse those markers out of the returned text. The AI Task Force decided (Slack `#tech-ai-task-force`, 2026-05-28) to: 1. **Migrate hu-agent to Claude first** (Matías D'Elia — an Anthropic API key is provided for now). 2. Move to **Glados** (Humand's internal LLM gateway) as a **medium-term** goal. Glados cannot host the agentic loop yet: it does not expose `tool_use`/functions and discards the model's `thinking` blocks (Agustín De Luca's blocker, 2026-05-27). Since the agent needs file editing and command execution, Claude-direct is the only way to move forward now. Going Claude-direct also **restores** the reasoning/thinking visibility Agustín flagged we'd lose — Anthropic returns thinking blocks, unlike Glados today. Agustín's explicit constraint: a **minimal migration, focused on Claude, with no architecture changes**, due to time. ## Goal Replace the agentic provider with **Claude Code CLI in headless mode** with the smallest possible diff, keeping pipelines, prompt markers, and the `runAgent(prompt, workDir) → CliRunResult` contract untouched. Ship behind a provider toggle for safe rollback, and leave a configuration seam for the future Glados switch. ## Non-Goals - **No structured/tool output.** The `PR_*` / `REPLY_*` / `VERDICT` markers stay exactly as they are. Switching to tool-based structured output is an architecture change and is out of scope. - **No `CursorCloudClient` migration.** The Cursor cloud API (launches an agent that auto-creates a PR from a repo URL, used by a debug route) has no clean Claude analog. It stays on Cursor, unchanged. - **No Claude Agent SDK.** Adopting the SDK changes the consumption model (async message iteration, permission callbacks, session management). That is the "cambios en arquitectura" Agustín asked to avoid. - **No Glados migration now.** Blocked on Glados supporting `tool_use` + thinking passthrough. This spec only prepares the seam (`ANTHROPIC_BASE_URL`). ## Approach (chosen over alternatives) Three options were considered: 1. **Claude Code CLI headless (chosen).** Swap the spawned binary. `cli-client.ts` already does subprocess spawn + `stream-json` parsing + final-text extraction; the cursor→claude invocation maps almost 1:1. Smallest diff, matches the "minimal migration" constraint, and keeps deploy consistent with how `cursor-agent` is already shipped (curl installer). 2. **Claude Agent SDK** — cleaner long-term architecture but larger diff and an architecture change. Rejected for now. 3. **Raw Anthropic API + custom agentic loop** — maximum control, re-implements what Claude Code gives for free, largest surface area. Rejected. ## Architecture ### Provider abstraction + toggle Extract a minimal interface that the current `CursorCliClient` already satisfies de facto: ```ts export interface AgentRunner { runAgent( prompt: string, workDir: string, timeoutMs?: number, options?: RunAgentOptions, ): Promise; } ``` Add `ClaudeCliClient implements AgentRunner`. In `index.ts`, a switch on `AGENT_PROVIDER` (`claude` | `cursor`) selects which implementation is injected into the pipelines. **The pipelines do not change** — they keep depending on the `AgentRunner` shape (currently typed as `CursorCliClient`; the field type widens to `AgentRunner`). ### `ClaudeCliClient` (new — mirrors `cli-client.ts`) Reuse the existing skeleton verbatim where possible: subprocess spawn, stdout/stderr streaming, timeout timer with `SIGTERM`→`SIGKILL` grace kill, prompt-to-file fallback when the prompt exceeds the argv limit, per-issue log tagging. Changes vs the cursor client: | Concern | cursor-agent (today) | Claude Code CLI | |---|---|---| | Invocation | `cursor-agent agent "

" --api-key K --model M --yolo --print --trust --output-format stream-json --stream-partial-output` | `claude -p "

" --model M --output-format stream-json --include-partial-messages --verbose --dangerously-skip-permissions` | | Auth | `--api-key` flag + `CURSOR_API_KEY` env | `ANTHROPIC_API_KEY` env (no flag) | | Endpoint | n/a | `ANTHROPIC_BASE_URL` env (Glados seam, unset by default) | | Permission bypass | `--yolo --trust` | `--dangerously-skip-permissions` | | Partial streaming | `--stream-partial-output` | `--include-partial-messages` (requires `--verbose` in print mode) | | Working dir | spawn `cwd` | spawn `cwd` (no flag needed for primary dir) | | Timeout | external subprocess kill | external subprocess kill (Claude Code has no `--timeout`) | | Container env | — | `DISABLE_AUTOUPDATER=1` | Binary install in the Dockerfile: `curl -fsSL https://claude.ai/install.sh | bash` (same pattern as the existing `cursor-agent` install). > **Verify before coding:** the exact CLI flag names and behavior (especially > `--include-partial-messages` + `--verbose` coupling and `--dangerously-skip-permissions`) > must be confirmed against the installed `claude --help` / current docs at > implementation time — this spec records the intended mapping, not a frozen contract. ### stream-json parser Rewrite `handleStreamLine` (live stderr→Datadog logging) and `extractAssistantText` (final text extraction) to Claude Code's `stream-json` event schema. The broad shape: - a `system` / `init` first event with session metadata (model, tools, session id), - per-turn assistant message events carrying `content[]` blocks (`text`, `thinking`, `tool_use`), - tool result events, - partial-message delta events (when `--include-partial-messages`), - a final event carrying the last assistant message / result. **The exact `type` values and field paths must be verified empirically** by running `claude -p --output-format stream-json` against a sample task once the Anthropic key is available, and captured as fixtures. The parser must keep: - streaming thinking + tool-call events to stderr (preserves the production traceability Agustín worried about), and - returning the **final** assistant text (with markers intact) as `CliRunResult.output`, so the existing marker parsers in `prompts.ts` keep working unchanged. This is the only behaviorally-risky part of the migration and gets its own verification checkpoint in the implementation plan. ### Config / env (`utils/config.ts`) Add (Zod-validated, the only place reading `process.env`): - `AGENT_PROVIDER`: `claude` | `cursor`. **Default `cursor`** (see Rollout — the first deploy must be inert). - `ANTHROPIC_API_KEY`: required when `AGENT_PROVIDER=claude`. - `ANTHROPIC_MODEL`: default `claude-opus-4-8`. Configurable (drop to Sonnet if cost/volume demands). - `ANTHROPIC_BASE_URL`: optional. Unset = api.anthropic.com. The Glados seam. Keep all `CURSOR_*` vars while the toggle exists. Update `.env.example`. Env names are provider-specific on purpose: the `claude` binary reads `ANTHROPIC_API_KEY` and `ANTHROPIC_BASE_URL` natively, so these are not arbitrary. ## Error handling Mirror the cursor client: typed `CursorAgentError` equivalent (reuse the existing `core/errors.ts` classes — `CursorAgentError`/`JobTimeoutError`, or a renamed provider-neutral `AgentError` if cheap), non-zero exit → `success: false`, timeout → `JobTimeoutError`, process-died-immediately guard, best-effort temp-dir cleanup. No new error semantics. ## Testing - New `test/cursor/claude-cli-client.test.ts` (mirrors the cursor client tests): mock `Bun.spawn`, feed `stream-json` line fixtures, assert `extractAssistantText` output and that the marker parsers (`extractReply`, `extractPrTitle`, `parseVerdictStructuredOutput`, etc.) still extract correctly from it. - Fixtures captured from a real `claude -p` run (the verification checkpoint). - Reuse `test/_helpers` (`mock-logger`, fixture helpers). - Keep coverage ≥ 80% (`bunfig.toml` threshold); the new file must be tested, not added to `coveragePathIgnorePatterns`. ## Rollout / rollback 1. **Deploy with `AGENT_PROVIDER=cursor`** (the default). This validates the whole refactor is inert in production — cursor keeps running exactly as before with all the new code merged. 2. Once confirmed healthy, **flip `AGENT_PROVIDER=claude` and redeploy.** Test the Claude path jointly (Ivo + Agustín), ideally first against a real ticket end to end. 3. If Claude misbehaves, **flip back to `cursor`** (env change + redeploy) — no code revert needed. 4. After Claude is validated in production, a follow-up PR removes `cursor-agent`, the `CURSOR_*` vars, and the toggle (cloud client decision revisited separately). ## Open questions / dependencies - Anthropic API key from Matías (in flight). - Exact Claude Code `stream-json` event schema — verified at implementation time against real output. - Glados medium-term path stays blocked on Glados `tool_use` + thinking passthrough; tracked separately (Rula). This spec's `ANTHROPIC_BASE_URL` seam is the only prep for it.