# Design: Migrate hu-agent agentic provider from `cursor-agent` to Claude Code CLI

- **Date:** 2026-05-28
- **Author:** Iván Gómez Yaury
- **Status:** Draft (pending review)
- **Working with:** Agustín De Luca (will run the plan and test jointly)

## Context

`hu-agent` today delegates all agentic coding work to the **Cursor CLI agent**:
`CursorCliClient.runAgent(prompt, workDir)` spawns the `cursor-agent` binary as a
subprocess inside a freshly cloned target repo. The binary runs the full agentic
loop (reads files, edits them, runs commands, makes git changes), streams
`stream-json` events to stdout, and the client extracts the final assistant text.
The pipelines (`fix`, `pr-comment`, `pr-mention`, `slack-mention`) build prompts
with sentinel markers (`PR_TITLE_*`, `PR_BODY_*`, `REPLY_*`, `VERDICT/SUMMARY/QUESTIONS`)
and parse those markers out of the returned text.

The AI Task Force decided (Slack `#tech-ai-task-force`, 2026-05-28) to:

1. **Migrate hu-agent to Claude first** (Matías D'Elia — an Anthropic API key is
   provided for now).
2. Move to **Glados** (Humand's internal LLM gateway) as a **medium-term** goal.

Glados cannot host the agentic loop yet: it does not expose `tool_use`/functions
and discards the model's `thinking` blocks (Agustín De Luca's blocker, 2026-05-27).
Since the agent needs file editing and command execution, Claude-direct is the only
way to move forward now. Going Claude-direct also **restores** the reasoning/thinking
visibility Agustín flagged we'd lose — Anthropic returns thinking blocks, unlike
Glados today.

Agustín's explicit constraint: a **minimal migration, focused on Claude, with no
architecture changes**, due to time.

## Goal

Replace the agentic provider with **Claude Code CLI in headless mode** with the
smallest possible diff, keeping pipelines, prompt markers, and the
`runAgent(prompt, workDir) → CliRunResult` contract untouched. Ship behind a
provider toggle for safe rollback, and leave a configuration seam for the future
Glados switch.

## Non-Goals

- **No structured/tool output.** The `PR_*` / `REPLY_*` / `VERDICT` markers stay
  exactly as they are. Switching to tool-based structured output is an
  architecture change and is out of scope.
- **No `CursorCloudClient` migration.** The Cursor cloud API (launches an agent
  that auto-creates a PR from a repo URL, used by a debug route) has no clean
  Claude analog. It stays on Cursor, unchanged.
- **No Claude Agent SDK.** Adopting the SDK changes the consumption model
  (async message iteration, permission callbacks, session management). That is
  the "cambios en arquitectura" Agustín asked to avoid.
- **No Glados migration now.** Blocked on Glados supporting `tool_use` + thinking
  passthrough. This spec only prepares the seam (`ANTHROPIC_BASE_URL`).

## Approach (chosen over alternatives)

Three options were considered:

1. **Claude Code CLI headless (chosen).** Swap the spawned binary. `cli-client.ts`
   already does subprocess spawn + `stream-json` parsing + final-text extraction;
   the cursor→claude invocation maps almost 1:1. Smallest diff, matches the
   "minimal migration" constraint, and keeps deploy consistent with how
   `cursor-agent` is already shipped (curl installer).
2. **Claude Agent SDK** — cleaner long-term architecture but larger diff and an
   architecture change. Rejected for now.
3. **Raw Anthropic API + custom agentic loop** — maximum control, re-implements
   what Claude Code gives for free, largest surface area. Rejected.

## Architecture

### Provider abstraction + toggle

Extract a minimal interface that the current `CursorCliClient` already satisfies
de facto:

```ts
export interface AgentRunner {
  runAgent(
    prompt: string,
    workDir: string,
    timeoutMs?: number,
    options?: RunAgentOptions,
  ): Promise<CliRunResult>;
}
```

Add `ClaudeCliClient implements AgentRunner`. In `index.ts`, a switch on
`AGENT_PROVIDER` (`claude` | `cursor`) selects which implementation is injected
into the pipelines. **The pipelines do not change** — they keep depending on the
`AgentRunner` shape (currently typed as `CursorCliClient`; the field type widens
to `AgentRunner`).

### `ClaudeCliClient` (new — mirrors `cli-client.ts`)

Reuse the existing skeleton verbatim where possible: subprocess spawn, stdout/stderr
streaming, timeout timer with `SIGTERM`→`SIGKILL` grace kill, prompt-to-file
fallback when the prompt exceeds the argv limit, per-issue log tagging.

Changes vs the cursor client:

| Concern | cursor-agent (today) | Claude Code CLI |
|---|---|---|
| Invocation | `cursor-agent agent "<p>" --api-key K --model M --yolo --print --trust --output-format stream-json --stream-partial-output` | `claude -p "<p>" --model M --output-format stream-json --include-partial-messages --verbose --dangerously-skip-permissions` |
| Auth | `--api-key` flag + `CURSOR_API_KEY` env | `ANTHROPIC_API_KEY` env (no flag) |
| Endpoint | n/a | `ANTHROPIC_BASE_URL` env (Glados seam, unset by default) |
| Permission bypass | `--yolo --trust` | `--dangerously-skip-permissions` |
| Partial streaming | `--stream-partial-output` | `--include-partial-messages` (requires `--verbose` in print mode) |
| Working dir | spawn `cwd` | spawn `cwd` (no flag needed for primary dir) |
| Timeout | external subprocess kill | external subprocess kill (Claude Code has no `--timeout`) |
| Container env | — | `DISABLE_AUTOUPDATER=1` |

Binary install in the Dockerfile: `curl -fsSL https://claude.ai/install.sh | bash`
(same pattern as the existing `cursor-agent` install).

> **Verify before coding:** the exact CLI flag names and behavior (especially
> `--include-partial-messages` + `--verbose` coupling and `--dangerously-skip-permissions`)
> must be confirmed against the installed `claude --help` / current docs at
> implementation time — this spec records the intended mapping, not a frozen contract.

### stream-json parser

Rewrite `handleStreamLine` (live stderr→Datadog logging) and
`extractAssistantText` (final text extraction) to Claude Code's `stream-json`
event schema. The broad shape:

- a `system` / `init` first event with session metadata (model, tools, session id),
- per-turn assistant message events carrying `content[]` blocks (`text`, `thinking`,
  `tool_use`),
- tool result events,
- partial-message delta events (when `--include-partial-messages`),
- a final event carrying the last assistant message / result.

**The exact `type` values and field paths must be verified empirically** by
running `claude -p --output-format stream-json` against a sample task once the
Anthropic key is available, and captured as fixtures. The parser must keep:

- streaming thinking + tool-call events to stderr (preserves the production
  traceability Agustín worried about), and
- returning the **final** assistant text (with markers intact) as `CliRunResult.output`,
  so the existing marker parsers in `prompts.ts` keep working unchanged.

This is the only behaviorally-risky part of the migration and gets its own
verification checkpoint in the implementation plan.

### Config / env (`utils/config.ts`)

Add (Zod-validated, the only place reading `process.env`):

- `AGENT_PROVIDER`: `claude` | `cursor`. **Default `cursor`** (see Rollout — the
  first deploy must be inert).
- `ANTHROPIC_API_KEY`: required when `AGENT_PROVIDER=claude`.
- `ANTHROPIC_MODEL`: default `claude-opus-4-8`. Configurable (drop to Sonnet if
  cost/volume demands).
- `ANTHROPIC_BASE_URL`: optional. Unset = api.anthropic.com. The Glados seam.

Keep all `CURSOR_*` vars while the toggle exists. Update `.env.example`. Env names
are provider-specific on purpose: the `claude` binary reads `ANTHROPIC_API_KEY` and
`ANTHROPIC_BASE_URL` natively, so these are not arbitrary.

## Error handling

Mirror the cursor client: typed `CursorAgentError` equivalent (reuse the existing
`core/errors.ts` classes — `CursorAgentError`/`JobTimeoutError`, or a renamed
provider-neutral `AgentError` if cheap), non-zero exit → `success: false`, timeout
→ `JobTimeoutError`, process-died-immediately guard, best-effort temp-dir cleanup.
No new error semantics.

## Testing

- New `test/cursor/claude-cli-client.test.ts` (mirrors the cursor client tests):
  mock `Bun.spawn`, feed `stream-json` line fixtures, assert `extractAssistantText`
  output and that the marker parsers (`extractReply`, `extractPrTitle`,
  `parseVerdictStructuredOutput`, etc.) still extract correctly from it.
- Fixtures captured from a real `claude -p` run (the verification checkpoint).
- Reuse `test/_helpers` (`mock-logger`, fixture helpers).
- Keep coverage ≥ 80% (`bunfig.toml` threshold); the new file must be tested, not
  added to `coveragePathIgnorePatterns`.

## Rollout / rollback

1. **Deploy with `AGENT_PROVIDER=cursor`** (the default). This validates the whole
   refactor is inert in production — cursor keeps running exactly as before with all
   the new code merged.
2. Once confirmed healthy, **flip `AGENT_PROVIDER=claude` and redeploy.** Test the
   Claude path jointly (Ivo + Agustín), ideally first against a real ticket end to end.
3. If Claude misbehaves, **flip back to `cursor`** (env change + redeploy) — no code
   revert needed.
4. After Claude is validated in production, a follow-up PR removes `cursor-agent`,
   the `CURSOR_*` vars, and the toggle (cloud client decision revisited separately).

## Open questions / dependencies

- Anthropic API key from Matías (in flight).
- Exact Claude Code `stream-json` event schema — verified at implementation time
  against real output.
- Glados medium-term path stays blocked on Glados `tool_use` + thinking passthrough;
  tracked separately (Rula). This spec's `ANTHROPIC_BASE_URL` seam is the only prep
  for it.