# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [2.0.15] - 2026-06-02

### Fixed

- **Qwen 3.7 reasoning compat** — `qwen/qwen3.7-max` on Cline/OpenRouter uses DeepSeek-style `reasoning_content` format. Added `DEEPSEEK_PROXY_COMPAT` so Pi preserves and replays reasoning tokens correctly, preventing plan-mode hangs ([#213](https://github.com/apmantza/pi-free/pull/213)).

- **Kimi K2.6 reasoning compat** — Kimi models on NVIDIA/OpenRouter need `requiresReasoningContentOnAssistantMessages: true` to correctly replay reasoning tokens in assistant messages. Without it, the model gets stuck when trying to call tools or produce output after thinking. Refs [earendil-works/pi#5309](https://github.com/earendil-works/pi/issues/5309) ([#213](https://github.com/apmantza/pi-free/pull/213)).

- **MiniMax reasoning compat** — MiniMax M3 and other MiniMax models now have full DeepSeek-style compat (`thinkingFormat: "deepseek"`, `requiresReasoningContentOnAssistantMessages: true`). Previously, for models marked `reasoning: true` without `thinkingFormat`, Pi entered plan mode but could not parse the reasoning tokens, resulting in hangs ([#212](https://github.com/apmantza/pi-free/pull/212), [#213](https://github.com/apmantza/pi-free/pull/213)).

### Added

- **`/probe-routeway` command** — Tests each Routeway model with a minimal chat request and auto-hides models that return 5xx or 404 errors. Runs lazily on first `session_start` with 24h probe cache TTL. Follows the same pattern as `/probe-nvidia` ([#213](https://github.com/apmantza/pi-free/pull/213)).

## [2.0.14] - 2026-06-02

### Added

- **Routeway provider** — OpenAI-compatible gateway (`api.routeway.ai/v1`) with 219 models, 16 free (`:free` suffix). Set `ROUTEWAY_API_KEY` or add `routeway_api_key` to `~/.pi/free.json`. Toggle with `/toggle-routeway` ([#209](https://github.com/apmantza/pi-free/pull/209)).

### Fixed

- **Cline free model merging** — Free-to-try models (e.g. `qwen3.7-plus`) from Cline's recommended list now appear in the free model picker even when absent from the main catalog ([#209](https://github.com/apmantza/pi-free/pull/209)).

- **`_pricingKnown` / `_freeKnown` authoritative flags** — Providers can now signal whether pricing data is authoritative via `_pricingKnown`. When `false`, `isFreeModel` falls back to name-based detection. Kilo's `isFree` API flag now flows through as `_freeKnown` ([#209](https://github.com/apmantza/pi-free/pull/209)).

- **MiniMax reasoning compat** — MiniMax M3 and other MiniMax models now have `supportsReasoningEffort: true` compat settings. Previously, models marked `reasoning: true` without compat caused Pi to enter plan mode without knowing the thinking format, resulting in hangs.

## [2.0.13] - 2026-05-21

### Added

- **OpenCode static headers injection** — pi-free now injects required OpenCode headers (`x-opencode-client`, `x-opencode-session`, `x-opencode-request`, `x-opencode-project`, `User-Agent`) when capturing/re-registering pi's built-in OpenCode models **and** when dynamically fetching/registering OpenCode models from `opencode.ai/zen/v1`. Prevents requests from hanging indefinitely when pi's model generation omits these headers ([pi#4680](https://github.com/earendil-works/pi/issues/4680), [#171](https://github.com/apmantza/pi-free/issues/171), [#173](https://github.com/apmantza/pi-free/issues/173), [#174](https://github.com/apmantza/pi-free/issues/174)). Headers are now regenerated per-call with fresh session and request IDs. Uses native `ses_`/`msg_` prefixed ULID identifiers matching OpenCode's `Identifier.descending()` format to avoid daily rate-limit throttling ([#175](https://github.com/apmantza/pi-free/issues/175)).

- **OpenCode endpoint detection** — Replaced regex-based OpenCode endpoint check with a simple string comparison, reducing overhead on every streaming request.

### Fixed

- **Lazy-load Pi AI stream providers** — Pi-ai's OpenAI completions and Anthropic stream modules are now imported lazily on first use rather than at extension load time. Eliminates start-up failures when pi-ai exports are not yet resolvable ([#177](https://github.com/apmantza/pi-free/issues/177)).

- **Subpath resolution for isolated extension context** — Pi loads pi-free from a directory tree that does not contain `@earendil-works/pi-ai` in its `node_modules`. `createRequire().resolve()` only understands CJS resolution, but pi-ai is ESM-only with strict exports. The new fallback resolves a pi-ai dependency from Pi's entry point, walks up to `node_modules`, reads `pi-ai/package.json`, and maps the `exports` field to the actual file path. Fixes module resolution for both `anthropic` and `openai-completions` subpaths. Includes integration test.

- **Security: shell injection in test** — Replaced `execSync` with `execFileSync` in the OpenCode session integration test to avoid shell injection risk.
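
The `exports` mapping step in the subpath-resolution fix above can be illustrated with a minimal sketch. The function names, the sample `exports` map, and its file paths are hypothetical and are not taken from pi-free or pi-ai. Pattern exports (`*`) are not handled, and reading `package.json` from disk is left out.

```typescript
// Hypothetical sketch: map a package subpath to a file via the `exports` field.
type ExportTarget = string | null | { [condition: string]: ExportTarget };

// Walk a conditional target in key order, accepting "default" or any
// condition the caller supports (this mirrors how Node orders conditions).
function pickTarget(target: ExportTarget, conditions: string[]): string | null {
  if (target === null) return null;
  if (typeof target === "string") return target;
  for (const [condition, value] of Object.entries(target)) {
    if (condition === "default" || conditions.includes(condition)) {
      const resolved = pickTarget(value, conditions);
      if (resolved !== null) return resolved;
    }
  }
  return null;
}

function resolveExportsSubpath(
  pkgDir: string,
  exportsField: { [subpath: string]: ExportTarget },
  subpath: string, // e.g. "./anthropic"
  conditions: string[] = ["import"], // assumption: ESM-only consumer
): string | null {
  if (!(subpath in exportsField)) return null;
  const relative = pickTarget(exportsField[subpath], conditions);
  if (relative === null) return null;
  // Targets are written relative to the package root ("./dist/x.js").
  return `${pkgDir}/${relative.replace(/^\.\//, "")}`;
}
```

Returning `null` for an unknown subpath lets the caller fall through to another resolution strategy instead of throwing.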

### Security

- **Bump `brace-expansion` 5.0.5 → 5.0.6** — Patches a minor dependency vulnerability and resolves the `npm audit` finding ([#172](https://github.com/apmantza/pi-free/issues/172)).

## [2.0.12] - 2026-05-13

### Added

- **Novita AI provider** — OpenAI-compatible API at `api.novita.ai/openai/v1` with 100+ open-source models. Non-standard but rich metadata: per-model pricing (`input_token_price_per_m`), context size, max output tokens, reasoning/vision features, and model descriptions. 3 free models, 99 paid.

- **FastRouter provider** — OpenRouter-compatible API at `api.fastrouter.ai/api/v1` with 170+ models. Always discovered (no auth needed for model listing). Full pricing, context lengths, and feature metadata. 129 text models (6 free, 123 paid) after filtering image/video. Set `FASTROUTER_API_KEY` for chat completions.

- **Dynamic model fetching for OpenCode and OpenRouter** — Pi's built-in providers now get their models fetched dynamically from the API (`opencode.ai/zen/v1/models` and `openrouter.ai/api/v1/models`), same as Mistral, Groq, Cerebras, and xAI. Overwrites Pi's defaults with the full model list. OpenCode uses name-based free detection (API returns no pricing); OpenRouter uses full cost-based detection.

- **API key reading from `~/.pi/agent/auth.json`** — `getOpencodeApiKey()` and `getOpenrouterApiKey()` now fall back to Pi's auth.json when the env var isn't set, matching how Pi's built-in providers read their keys.

### Changed

- **`_pricingKnown` guard in `isFreeModel`** — Providers can now signal whether pricing data is authoritative. When `_pricingKnown` is explicitly `false` (API returned no pricing), `isFreeModel` falls back to name-only detection (checks for "free" in the model name). This eliminates false positives where missing pricing data was treated as $0 cost. All affected providers (ZenMux, Together, CrofAI, dynamic-built-in, fetchOpenAICompatibleModels, deepinfra, sambanova, novita) now set this flag correctly.

- **All providers now use `isFreeModel` consistently** — Together switched from hardcoded `cost===0` check to `isFreeModel`. DeepInfra and SambaNova switched from manual free lists to `isFreeModel` with proper `_pricingKnown` metadata. NVIDIA, Codestral, and Ollama explicitly documented as free-tier providers (`freeModels = allModels`).

- **Unified OpenRouter-based providers** — Kilo, OpenRouter, and Cline now share the same `fetchOpenRouterCompatibleModels` / OpenRouter API logic.
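
The `_pricingKnown` guard described above can be sketched roughly as follows. The `ModelInfo` shape and the case-insensitive name check are assumptions for illustration; the real `isFreeModel` may differ in both.

```typescript
// Hypothetical model shape; only the fields needed for the sketch.
interface ModelInfo {
  name: string;
  cost: { input: number; output: number };
  _pricingKnown?: boolean; // false = the API returned no pricing for this model
}

function hasFreeInName(model: ModelInfo): boolean {
  return model.name.toLowerCase().includes("free");
}

function isFreeModel(model: ModelInfo): boolean {
  // Pricing absent: a cost of 0 is only a default, so just the name counts.
  if (model._pricingKnown === false) return hasFreeInName(model);
  // Pricing authoritative: zero cost or a "free" name both qualify.
  const zeroCost = model.cost.input === 0 && model.cost.output === 0;
  return zeroCost || hasFreeInName(model);
}
```

With this guard, a model whose pricing was missing and defaulted to 0 is no longer reported as free unless its name says so.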

### Removed

- **`DEFAULT_MIN_SIZE_B` (30B minimum model size filter)** — Removed from `model-fetcher.ts` and `cline-models.ts`. All models are now shown regardless of parameter count. NVIDIA still uses its own 70B threshold (`NVIDIA_MIN_SIZE_B`).

### Fixed

- **ZenMux false free classifications** — Models without `pricings` data (DeepSeek Chat V3.1, Kimi K2 0711, Claude 3.7 Sonnet) were incorrectly classified as free because missing pricing defaulted to $0. Fixed to 3 genuinely free models (down from 6 false positives).

- **Together AI, CrofAI, dynamic-built-in missing-pricing false positives** — Same `?? 0` pattern across multiple providers could mark unpriced models as free. All now set `_pricingKnown: false` when pricing is absent from the API response.

## [2.0.10] - 2026-05-08

### Fixed

- **Config wipe on JSON parse failure** — `saveConfig` used `loadConfigFile()` which returns `{}` on any parse error, causing `{ ...{}, ...updates }` to write a partial config that permanently destroyed all API keys. Now reads the raw file directly and refuses to save if corrupt. `ensureConfigFile` also refuses to overwrite corrupt files.

- **Built-in provider keys removed from pi-free config** — `mistral_api_key`, `groq_api_key`, `cerebras_api_key`, `xai_api_key`, and `hf_token` are no longer in `~/.pi/free.json`. These are pi's own built-in providers; their keys come from environment variables only.
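
The config-wipe fix above comes down to never treating a parse failure as an empty config. A minimal sketch of that idea, written as a pure function over the raw file text; `mergeConfigText` is a hypothetical name, and the actual `saveConfig` also handles file I/O:

```typescript
type Config = Record<string, unknown>;

// Returns the JSON text to write, or throws if the existing content is corrupt.
// `rawExisting` is null when the config file does not exist yet.
function mergeConfigText(rawExisting: string | null, updates: Config): string {
  let existing: Config = {};
  if (rawExisting !== null && rawExisting.trim() !== "") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(rawExisting);
    } catch {
      // Do NOT fall back to {}: that would overwrite every stored key.
      throw new Error("Config file is corrupt; refusing to save");
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("Config file is not a JSON object; refusing to save");
    }
    existing = parsed as Config;
  }
  return JSON.stringify({ ...existing, ...updates }, null, 2);
}
```

Throwing leaves the corrupt file on disk untouched, so the user can repair it by hand rather than losing its contents.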

## [2.0.9] - 2026-05-08

### Added

- **Together AI provider** — Fast inference on 200+ open-source models (Llama, DeepSeek, Qwen, etc.) through an OpenAI-compatible API. $1 trial credit on signup, no credit card required. Set `TOGETHER_AI_API_KEY`.

- **Per-model metadata for Ollama Cloud** — Fetches `/api/show` details for every Ollama Cloud model to detect real capabilities: thinking/vision support, actual context windows (up to 1M tokens), and thinking level maps (`reasoning_effort`). Models now show parameter size and quantization in display names.

- **Thinking level maps** — Four curated maps (`DEFAULT`, `GPT_OSS`, `QWEN3`, `NO_OFF`) for Ollama Cloud models that map Pi's thinking levels to Ollama's `reasoning_effort` values, based on per-model API testing.

- **`/ollama-cloud-refresh` command** — Re-fetch Ollama Cloud models from the API and update the provider live, no restart needed.

- **Persistent Ollama Cloud cache** — Models cached via `provider-cache.ts` for fast startup. Stale cache auto-refreshes on `session_start`. Fallback models used when cache is unavailable.

### Fixed

- **ZenMux pricing** — Fixed `pricings` key (was reading `pricing`, always returned $0). Now correctly extracts per-model pricing (per-million-tokens ÷ 1M). Also uses `display_name`, `input_modalities` (vision detection), and `capabilities.reasoning` from API.

- **CrofAI model metadata** — Custom fetch now reads per-model `name`, `custom_reasoning`, `context_length`, `max_completion_tokens`, and per-million-token `pricing` from the API.

- **DeepInfra model metadata** — Extracts real model data from the `metadata` sub-object (context_length, max_tokens, pricing, reasoning tags). Filters non-chat models (embedding, rerank, whisper).

- **Ollama Cloud model names** — Enriched with parameter size and quantization (e.g., `deepseek-v4-pro (671B, Q4_0)`). Set `supportsDeveloperRole: false` (fixes GLM models silently ignoring prompts). Bumped `maxTokens` from 4096 to 32768.

- **SambaNova model accuracy** — `fetchOpenAICompatibleModels` now reads per-model `context_length`, `max_completion_tokens`, and `pricing` from SambaNova's extended API response. Also reads `reasoning`, `input_modalities`, and accepts plain array responses.

### Changed

- **Package scope migration** — Updated all peer dependency imports from `@mariozechner/*` to `@earendil-works/*` (`pi-ai`, `pi-coding-agent`, `pi-tui`) to match the upstream scope rename in `@earendil-works/pi` v0.74.0.

## [2.0.8] - 2026-05-07

### Added

- **Codestral provider** — Mistral's code-focused model via `codestral.mistral.ai`.
  Free tier (Experiment plan): 2 req/min, 500K tokens/min, 1B tokens/month.
  Uses pi's built-in Mistral SDK (`mistral-conversations` API type).

- **LLM7.io provider** — OpenAI-compatible API gateway routing across
  multiple providers (OpenAI, Mistral, Google, DeepSeek, etc.). Free tier:
  default/fast selectors, 100 req/hr, 20 req/min.

- **DeepInfra provider** — AI inference cloud with 100+ open-source models.
  $5 one-time credit on signup (no credit card). Models fetched dynamically.
  Shown as trial credit provider in `/free-providers`.

- **SambaNova provider** — Fast inference on custom RDU hardware with
  OpenAI-compatible API. All models accessible on free tier (no credit card):
  20-480 RPM. Models include Llama 3.3 70B, DeepSeek-V3/R1, Llama 4 Maverick.
  Shown as freemium provider in `/free-providers`.

### Changed

- **Codestral: fixed HTTP 422 error** — Switched API type from
  `openai-completions` to `mistral-conversations`. The OpenAI completions
  adapter was sending unrecognized fields (`stream_options`, `store`,
  `max_completion_tokens`) that Mistral's API rejects with 422.

### Fixed

- **Toggle commands persist across sessions for all providers** — Providers using
  `setupProvider` (zenmux, crofai, llm7, sambanova, deepinfra) were always
  registering `freeModels` on startup, ignoring the persisted `show_paid` config.
  Now each provider reads its config getter and registers the correct initial
  model set. Fixes #149.

### Security

- **Log injection prevention** — `scripts/update-benchmarks.ts` sanitizes external
  API data (CRLF stripping) before logging. Fixes SonarCloud S1075.

### Reliability

- **Prefer `String#replaceAll()` over `String#replace()`** — Replaced all 7 flagged
  instances. Where regex is unnecessary (2/7), switched to string literal form.
  Fixes SonarCloud S4144.

### Added

- **`agents.md`** — Codebase guide for AI agents covering architecture, patterns,
  conventions, testing, and the Pi extension API.

- **Passive quota monitoring** — Extracts rate-limit headers from every
  provider response via `after_provider_response` event (no extra API calls).
  Tries 6 header format variants (`x-ratelimit-remaining`,
  `ratelimit-remaining-requests-day`, etc.). Shows remaining quota in the
  status bar with warning icons when ≤25% or ≤10%. Fixes #147.
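
The passive quota check above can be sketched as a pure function over response headers. Only the two remaining-quota header names come from this changelog; the limit header names, the `QuotaLevel` labels, and the assumption that header keys arrive lowercased are illustrative.

```typescript
type QuotaLevel = "ok" | "warning" | "critical";

// Header names to try, in order. The limit header names are assumptions.
const REMAINING_HEADERS = ["x-ratelimit-remaining", "ratelimit-remaining-requests-day"];
const LIMIT_HEADERS = ["x-ratelimit-limit", "ratelimit-limit-requests-day"];

// Return the first header value that parses as a finite number.
function firstNumber(headers: Record<string, string>, names: string[]): number | undefined {
  for (const name of names) {
    const value = Number.parseFloat(headers[name] ?? "");
    if (Number.isFinite(value)) return value;
  }
  return undefined;
}

function quotaLevel(headers: Record<string, string>): QuotaLevel | undefined {
  const remaining = firstNumber(headers, REMAINING_HEADERS);
  const limit = firstNumber(headers, LIMIT_HEADERS);
  if (remaining === undefined || limit === undefined || limit <= 0) return undefined;
  const fraction = remaining / limit;
  if (fraction <= 0.1) return "critical"; // at or below 10% left
  if (fraction <= 0.25) return "warning"; // at or below 25% left
  return "ok";
}
```

Returning `undefined` when no usable headers are present keeps the status bar silent for providers that do not report quota.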

### Fixed

- **Missing `g` flag on `replaceAll` regexps broke model filtering** —
  `String.prototype.replaceAll()` requires a global RegExp; 20+ patterns in
  `benchmark-lookup.ts` were missing it, causing a `TypeError` that prevented
  models from appearing for providers like cline and kilo. Added `/g` flag to
  all affected patterns. Fixes #151.
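
For reference, the `replaceAll` failure mode above is easy to reproduce. The model ID below is only a sample string, and the snippet assumes an ES2021 or later runtime.

```typescript
const modelId = "meta/llama-3.1-70b-instruct";

// Non-global RegExp: replaceAll throws a TypeError at runtime.
let threwTypeError = false;
try {
  modelId.replaceAll(/-/, " ");
} catch (err) {
  threwTypeError = err instanceof TypeError;
}

// Global RegExp: every match is replaced.
const spaced = modelId.replaceAll(/-/g, " ");

// String literal form: no RegExp needed for a fixed substring.
const sameResult = modelId.replaceAll("-", " ");
```

The string literal form is the simpler choice whenever the pattern is a fixed substring, since it cannot hit the missing-flag error.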

### Changed

- **Resolved ~280 SonarCloud issues across 21 files** — Bulk code-quality
  cleanup including: stripping trailing zeros from `toFixed()` (S7748),
  `global` → `globalThis` (S7764), `parseFloat` → `Number.parseFloat` (S7773),
  naming unnamed async exports (S7726), `String.raw` for path strings (S7780),
  top-level await over promise chains (S7785), re-export from source (S7763),
  `.at(-1)` over `[length-1]` (S7755), `node:fs` protocol imports (S7772),
  and logging user-controlled data sanitization (S5145). Fixes #148.

### Security

- **Bump `basic-ftp` 5.3.0 → 5.3.1** — Patches GHSA-rpmf-866q-6p89 (high
  severity): malicious FTP server could cause client-side DoS via unbounded
  multiline control response buffering. Fixes `npm audit` finding.

### Refactored

- **Extracted shared model-fetch helper** — `fetchOpenAICompatibleModels()`
  in `lib/util.ts` eliminates ~120 lines of duplicated fetch→parse→map
  boilerplate across CrofAI, DeepInfra, and SambaNova providers.

## [2.0.6] - 2026-05-02

### Security

- **5x S5852 regex super-linear runtime** — Replaced all flagged regex patterns
  (nested quantifiers in model size extraction) with manual char-by-char string
  parsing in `parseModelSize()`, `normalizeSizeTokenOrder()`, and test helpers.
  Eliminates catastrophic backtracking risk.

- **4x S4036 PATH variable security** —
  - `open-browser.ts`: Added `resolveExe()` helper that prefers known absolute
    paths (`/usr/bin/open`, `C:\Windows\System32\...\powershell.exe`) before
    falling back to PATH lookup
  - `check-extensions.mjs`: Removed hardcoded PATH override; resolved `npm` via
    `execFileSync` with known absolute paths

- **1x S4721 command injection** — Replaced `execSync` with `execFileSync` in
  `resolveExe()` helper. `execFileSync` takes separate arguments and never
  spawns a shell, eliminating the injection vector.
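
The difference that motivates the `execFileSync` switch above can be shown with a small sketch. It uses the running Node binary as a harmless demo command; the `untrusted` value is made up for illustration.

```typescript
import { execFileSync } from "node:child_process";

// Text that would run a second command if it were interpolated into a shell string.
const untrusted = "hello; echo injected";

// Arguments are passed as an array, so no shell ever parses `untrusted`.
const output = execFileSync(
  process.execPath, // the current Node binary, used here as a safe demo command
  ["-e", "process.stdout.write(process.argv[1])", untrusted],
  { encoding: "utf8" },
);
```

The child process receives the whole string as a single argument, so the `;` is plain data rather than a command separator.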

### Changed

- **Banner image** — Converted `banner.svg` to `banner.png` for reliable
  rendering across all GitHub surfaces (mobile, email, dark mode readers).

## [2.0.5] - 2026-05-02

### Added

- **NVIDIA model probe auto-discovery** — Lazy auto-probe for NVIDIA models on
  first `session_start` (once per session). Broken 404 models detected and
  auto-hidden without requiring manual `/probe-nvidia`.

### Changed

- **Ollama provider updates** — Improved cloud model detection and configuration.

## [2.0.4] - 2026-05-02

### Fixed

- **OpenRouter key resolution no longer falls back to `free.json`** —
  `getOpenrouterApiKey()` now only checks the `OPENROUTER_API_KEY` environment variable.
  Previously it fell back to `~/.pi/free.json`, which could contain stale/revoked keys
  that conflict with pi's built-in OpenRouter provider (which reads from
  `~/.pi/agent/auth.json`).

- **Removed `openrouter_api_key` from `PiFreeConfig` interface and config template** —
  Prevents future persistence of OpenRouter keys in `free.json`, eliminating the
  source of stale key conflicts for built-in providers.

## [2.0.3] - 2026-05-02

### Added

- **Consistent `isFreeModel` helper with Route A/B logic** — Created a unified helper for free model detection that automatically detects whether a provider exposes pricing:
  - **Route A (pricing-exposed)**: Model is free if `cost === 0` OR `"free"` in name (OR logic)
  - **Route B (non-pricing-exposed)**: Model is free only if `"free"` in name
  - Dynamic detection: if ALL models have `cost === 0`, assumes pricing is not exposed → uses Route B
  - If ANY model has `cost > 0`, assumes pricing is exposed → uses Route A
  - All providers (Cline, Kilo, NVIDIA, Ollama, dynamic built-in) now use this consistent helper

- **CrofAI provider (PAID)** — Added new **paid** provider for CrofAI (https://crof.ai), an OpenAI-compatible LLM inference API. **Note: CrofAI is a paid provider** — users must have a CrofAI API key with credits. The provider uses Route B detection (name-only) since CrofAI's API doesn't expose per-model pricing. Only models with `"free"` in their names are marked as free (none currently).

- **ZenMux provider (PAID)** — Added new **paid** provider for ZenMux AI gateway (https://zenmux.ai), a unified API for 200+ models from OpenAI, Anthropic, Google, etc. **Note: ZenMux is a paid provider** — users must have a ZenMux API key with credits. The provider uses Route A detection (OR logic) since ZenMux exposes pricing. Models are marked as free only if `cost === 0` OR `"free"` is in the name (2 free models identified: GLM 4.7 Flash Free, GLM 4.6v Flash Free).

- **Comprehensive `isFreeModel` test suite** — Added 30+ unit tests covering Route A, Route B, freemium behavior, and edge cases. Tests verify correct classification on actual OpenRouter API data (371 models, 30 free).

- **Toggle commands for dynamic built-in providers** — Added `/toggle-mistral`, `/toggle-groq`,
  `/toggle-cerebras`, `/toggle-xai`, and `/toggle-huggingface` commands. These providers were
  registered with the global toggle system but lacked per-provider toggle commands, making
  free/paid switching inaccessible without editing config files.

- **Lazy auto-probe for NVIDIA models** — Extracted `runNvidiaProbe()` into a shared function
  called automatically on first `session_start` (once per session). Previously, users had to
  manually run `/probe-nvidia` to discover 404 models. Now broken models are detected and
  auto-hidden on first use.
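
The Route A/B selection described for the `isFreeModel` helper above can be sketched as follows. The `PricedModel` shape, the single combined `cost` field, and the function names are assumptions made for illustration.

```typescript
// Hypothetical model shape with one combined cost value.
interface PricedModel {
  name: string;
  cost: number; // 0 when free, or when the API exposes no pricing at all
}

const nameSaysFree = (m: PricedModel): boolean => m.name.toLowerCase().includes("free");

// Route A applies when at least one model has a non-zero cost.
function pricingExposed(models: PricedModel[]): boolean {
  return models.some((m) => m.cost > 0);
}

function freeModels(models: PricedModel[]): PricedModel[] {
  const routeA = pricingExposed(models);
  return models.filter((m) =>
    // Route A: zero cost OR "free" in the name. Route B: name only.
    routeA ? m.cost === 0 || nameSaysFree(m) : nameSaysFree(m),
  );
}
```

When every cost is 0, the list is treated as unpriced, so only name matches survive instead of the whole catalog being reported as free.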

### Changed

- **Cline provider now uses `isFreeModel`** — Fixed Cline to use the consistent `isFreeModel` helper instead of `m.cost.input === 0`. It previously used cost-only filtering; it now uses the OR logic that applies to pricing-exposed providers.

- **NVIDIA test expectations updated** — Updated tests to reflect strict Route B behavior (name-only detection for non-pricing-exposed providers). Added test for models with `"free"` in name being marked as free.

### Fixed

- **`provider-factory.ts` — `beforeProviderRequest` hook now scoped to owning provider** —
  The hook was firing for **all** provider requests regardless of which provider the factory
  was configuring. Now checks `evt.provider !== def.providerId` and returns early if the
  event doesn't belong to the owning provider.

- **`provider-factory.ts` — `reRegister` callback no longer corrupts stored model lists** —
  When toggling between free/paid modes, the callback was overwriting `stored.all` with only
  the filtered subset, losing the original full model list. Now preserves the original model
  lists for correct subsequent toggling.

- **`lib/types.ts` — Removed leftover `LspTestInterface`** — Removed a test interface that
  was left in production code.

- **`index.ts` — Removed redundant `.catch()` on deprecated Qwen provider** — The `.catch()`
  was unnecessary since `Promise.allSettled` already handles rejections.

### Removed

- **Qwen provider (deprecated)** — Removed Qwen OAuth provider as the 1,000 req/day free tier is no longer available. Provider remains functional for existing authenticated users but new free tier registrations are not supported.

- **Modal provider** — Removed single-model Modal provider (only had GLM-5.1 FP8). Users should use other providers for GLM models.

- **Cloudflare provider** — Removed Cloudflare Workers AI provider as it's now built into pi core. Users can use pi's built-in Cloudflare provider instead.

- **Qwen test file** — Removed `tests/qwen.test.ts` along with the deprecated provider.

## [2.0.2] - 2026-04-26

### Added

- **Model matching debug logging** — Added `~/.pi/modelmatch.log` to diagnose which models get Coding Index scores and which don't:
  - Logs every matching attempt with provider, model ID, normalization strategy, and result
  - CSV-like format: `timestamp|provider|modelId|modelName|action|strategy|normalizedId|matchKey|codingIndex|details`
  - Provider-specific normalizers for better matching:
    - **NVIDIA**: Strips vendor prefixes (`meta/`, `mistralai/`, `microsoft/`, `qwen/`, etc.)
    - **Cloudflare**: Strips `@cf/namespace/` prefixes
    - **Groq**: Removes `-versatile` and numeric context suffixes (`-32768`)
    - **Cerebras**: Normalizes `llama3.1` → `llama-3.1`, auto-adds `instruct` suffix
    - **Mistral**: Strips `-latest` suffix
    - **Ollama**: Converts `model:tag` → `model-tag`
  - Common suffix stripping: `:free`, date codes (`-20250514`), versions (`-v1.1`), `-it`, `-fp8`/`-bf16`

- **Enhanced benchmark lookup** — `enhanceModelNameWithCodingIndex()` now accepts optional `provider` parameter for provider-aware normalization

- **Static 404 model blocklist for NVIDIA** — Probed all 136 models from `integrate.api.nvidia.com/v1/models` and identified 57 that return 404 "Function not found" on `/v1/chat/completions`. These are now hard-filtered so they never appear in the model selector:
  - Covers discontinued models (`databricks/dbrx-instruct`, `meta/codellama-70b`, `meta/llama2-70b`, `ibm/granite-*`, etc.)
  - Covers embedding-only models listed as chat-capable (`nvidia/nv-embed-v1`, `nvidia/nv-embedqa-*`, `snowflake/arctic-embed-l`, etc.)
  - Covers stale API catalog entries (`mistralai/mistral-large`, `mistralai/mistral-large-2-instruct`, `writer/palmyra-*`, etc.)
  - Full list in `NVIDIA_KNOWN_404_MODELS` in `providers/nvidia/nvidia.ts`

- **`/probe-nvidia` command** — On-demand model health check. Tests every registered NVIDIA model with a minimal `max_tokens: 1` request, auto-hides any new 404s in `~/.pi/free.json`, and re-registers the provider immediately.

- **`scripts/probe-nvidia.mjs`** — Standalone Node.js script to reproduce the probe. Reads `~/.pi/free.json` for the API key, batches 20 requests at a time with 10s timeout, and prints all broken model IDs for adding to the blocklist.

- **Ollama Cloud 403 handling** — Same pattern as NVIDIA 404s for Ollama Cloud:
  - `OLLAMA_KNOWN_403_MODELS` blocklist for models that return 403 "access denied"
  - `/probe-ollama` command to test all models on-demand, auto-hide broken ones, and re-register
  - `scripts/probe-ollama.mjs` standalone script for blocklist maintenance

- **Provider-scoped hidden models** — Hidden models are now provider-specific:
  - Format: `"provider/model-id"` (e.g., `"ollama/kimi-k2.6"`, `"nvidia/broken-model"`)
  - A model hidden from one provider doesn't hide it from other providers
  - Backward compatible with old global `"model-id"` format
  - All providers updated: NVIDIA, Ollama, Cloudflare, Cline, Kilo, Modal
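
The provider-scoped lookup above, including the legacy fallback, might look like the following minimal sketch. `isModelHidden` is a hypothetical name, and the real implementation may resolve the two formats differently.

```typescript
function isModelHidden(hidden: string[], provider: string, modelId: string): boolean {
  // New format: the entry is scoped to one provider.
  if (hidden.includes(`${provider}/${modelId}`)) return true;
  // Legacy format: a bare model ID hides the model for every provider.
  return hidden.includes(modelId);
}
```

One caveat of this simple approach: model IDs that themselves contain a slash cannot be told apart from scoped entries by inspection, so an exact string match is the only safe comparison.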

### Fixed

- **Probe commands timeout handling** — Added `fetchWithTimeout` with 10-second timeout to `/probe-nvidia` and `/probe-ollama` commands. Prevents the coding harness from freezing when individual model probe requests hang indefinitely.

- **NVIDIA provider now sends `authHeader: true`** — Explicitly enables `Authorization: Bearer` header injection. Previously relied on pi's implicit behavior which could fail in some configurations.

### Removed

- **NVIDIA 404 model warning log** — Removed the `console.warn("[nvidia] Skipping known 404 model: ...")` output when filtering out known broken models. The filter still works silently; use `/probe-nvidia` to identify new 404s if needed.

### Changed

- **Cloudflare provider now fetches models dynamically** — Replaced static 19-model hardcoded list with live API fetch from `api.cloudflare.com/client/v4/accounts/{account_id}/ai/models`:
  - Automatically discovers all 30+ text generation models (was manually maintaining 19)
  - Smart filtering excludes embeddings, image generation, speech, translation, and vision-only models via regex patterns
  - Metadata inference from model IDs: detects vision (`vision`/`multimodal`), reasoning (`r1`/`thinking`/`qwq`), context windows, and estimated costs
  - Fixed Mistral Small ID: changed from incorrect `@cf/mistralai/...` to correct `@cf/mistral/...`
  - Added new fallback models: Kimi K2.6, OpenAI GPT-OSS 120B/20B, Qwen 2.5 Coder 32B, QwQ 32B, Llama 3.2 11B Vision
  - Graceful fallback to expanded 18-model hardcoded list if API fetch fails

- **NVIDIA provider now queries NVIDIA's API directly** — Source of truth switched from `models.dev` curated JSON to `https://integrate.api.nvidia.com/v1/models`:
  - Eliminates 57 missing models and 25 stale entries from the old third-party source
  - Models not in `models.dev` get inferred metadata (128k context, 4k output, vision/reasoning heuristics)
  - Added regex-based non-chat model filtering for unknown models (embeddings, whisper, reward models, safety guards, parsers, detectors, etc.)
  - Graceful fallback to `models.dev` if NVIDIA API is unreachable
  - Removed paid/free toggle filtering — NVIDIA is freemium (all models use free credits)

## [2.0.1] - 2026-04-24

### Added

- **Built-in provider toggle support** (`lib/built-in-toggle.ts`) — Enables free/paid filtering for Pi's built-in providers that expose per-model pricing:
  - **OpenCode (`/toggle-opencode`)** — Captures built-in OpenCode models on session start and filters to free-only by default
  - **OpenRouter (`/toggle-openrouter`)** — Now uses the built-in toggle system for consistency
  - Toggle works in the current session (no restart needed)
  - Persisted via `opencode_show_paid` and `openrouter_show_paid` in `~/.pi/free.json`

### Changed

- **OpenRouter moved to built-in toggle system** — OpenRouter is now handled by `lib/built-in-toggle.ts` alongside OpenCode for a unified approach:
  - Removed from `providers/dynamic-built-in/index.ts`
  - Eliminated duplicate toggle command registration logic
  - Consolidated toggle persistence with other built-in providers

- **Standardized all toggle commands to `toggle-{provider}`** — Renamed from `{provider}-toggle` for consistency:
  - `/kilo-toggle` → `/toggle-kilo`
  - `/cline-toggle` → `/toggle-cline`
  - `/openrouter-toggle` → `/toggle-openrouter`
  - `/nvidia-toggle` → `/toggle-nvidia`
  - `/cloudflare-toggle` → `/toggle-cloudflare`
  - `/ollama-toggle` → `/toggle-ollama`
  - `/mistral-toggle` → `/toggle-mistral`
  - `/groq-toggle` → `/toggle-groq`
  - `/cerebras-toggle` → `/toggle-cerebras`
  - `/toggle-opencode` (new)

### Fixed

- **Ollama Cloud model fetching endpoint** — Corrected the `/v1/models` → `/models` endpoint path in `providers/ollama/ollama.ts`:
  - The previous fix (2.0.0) incorrectly used `/v1/models` for listing; on Ollama Cloud the `/v1` path serves chat completions, while models are listed at `/models`
  - This ensures model fetching works correctly with the OpenAI-compatible API

### Removed

- **Global `/free` command** — Removed the global free-only toggle. Per-provider toggles (`/toggle-{provider}`) are now the only way to switch between free and paid models. The `/free-providers` status command remains.

## [2.0.0] - 2026-04-23

### Breaking Changes

- **Removed Fireworks provider** — Fireworks is now a built-in Pi provider (added in pi 0.68.1), so the extension's Fireworks provider has been removed to avoid conflicts:
  - Deleted `providers/fireworks/fireworks.ts` and `tests/fireworks.test.ts`
  - Removed all Fireworks configuration options from `config.ts` (`fireworks_api_key`, `fireworks_show_paid`)
  - Users should now use Pi's built-in Fireworks support with `FIREWORKS_API_KEY`

- **Renamed Ollama provider to `ollama-cloud`** — Changed provider ID from `"ollama"` to `"ollama-cloud"` to avoid collision with Pi's built-in local Ollama provider:
  - This prevents provider ID conflicts when both are registered
  - All log messages and documentation now reference "Ollama Cloud"

### Removed

- **Dropped `@sinclair/typebox` peer dependency** — Pi 0.69.0 migrated from `@sinclair/typebox` to `typebox` 1.x. The extension didn't directly import this package, so it was removed from `peerDependencies` to avoid potential conflicts.

### Fixed

- **Ollama Cloud API endpoint** — Fixed broken Ollama Cloud integration:
  - Changed `BASE_URL_OLLAMA` from `https://ollama.com` to `https://ollama.com/v1` — the OpenAI-compatible API endpoint
  - Fixed model fetching to use `/v1/models` instead of `/api/tags` — ensures model IDs work with chat completions endpoint
  - Previously, calls went to the HTML homepage instead of the API endpoints, causing 404 errors

### Removed

- **Removed paid model warning on selection** — Deleted the `model_select` event handler that showed:
  - `⚠️ Paid model selected (${model.id}). Use "/free off" to enable paid models.`
  - This warning was redundant since the global `/free` toggle and provider toggles already control model visibility

- **Removed pointless `/modal-toggle` command** — Modal provider only has 1 free model (GLM-5.1 FP8), so there was nothing meaningful to toggle:
  - Added `skipToggle` option to `ProviderDefinition` and `ProviderSetupConfig` interfaces
  - Modal provider now sets `skipToggle: true` to prevent toggle command creation

### Changed

- **Marked Qwen provider as fully deprecated** — Updated messaging to clarify the provider is broken:
  - Changed model name from `"Qwen Coder — Free 1k/day"` to `"Qwen Coder — DEPRECATED (free tier discontinued)"`
  - Updated all JSDoc comments to clearly state auth is broken and free tier is no longer available
  - Provider remains for backward compatibility but should not be used

### Added

- **Cloudflare Workers AI provider** — New provider for Cloudflare's serverless GPU platform:
  - 50+ open-source models: Llama 4, Mistral Small 3.1, Qwen 2.5/3, DeepSeek R1, Gemma 4, Kimi K2.5/2.6, and more
  - **10,000 Neurons/day FREE tier** (resets daily at 00:00 UTC)
  - **$0.011 per 1,000 Neurons** beyond free allocation
  - Only requires `CLOUDFLARE_API_TOKEN` — account ID auto-derived from token
  - Toggle with `/cloudflare-toggle`
  - Create token at https://dash.cloudflare.com/profile/api-tokens

- **Unified dynamic built-in providers module** — New `providers/dynamic-built-in/` module that dynamically fetches models from Pi's built-in providers when users have API keys:
  - **Mistral** (`MISTRAL_API_KEY`) — Fetches from `api.mistral.ai/v1/models`
  - **Groq** (`GROQ_API_KEY`) — Fetches from `api.groq.com/openai/v1/models`
  - **Cerebras** (`CEREBRAS_API_KEY`) — Fetches from `api.cerebras.ai/v1/models`
  - **xAI** (`XAI_API_KEY`) — Fetches from `api.x.ai/v1/models`
  - **Hugging Face** (`HF_TOKEN` — optional) — Fetches public + authenticated models
  - **OpenRouter** — Moved from `index.ts` to unified module with dynamic fetch
  - All integrate with global `/free` toggle and have per-provider toggle commands (`/mistral-toggle`, `/groq-toggle`, etc.)

- **Global `/free` toggle system** — New centralized free/paid filtering across ALL providers:
  - `/free on/off/status` — Toggle free-only view globally
  - `/free-providers` — Show free/paid model counts by provider
  - `FREE_ONLY` config option and `PI_FREE_ONLY` environment variable
  - Providers register via `registerWithGlobalToggle()` for unified filtering
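
One way a central toggle like this could be wired is sketched below. The signature of `registerWithGlobalToggle()`, the callback type, the `setFreeOnly` helper, and the default value are all assumptions made for illustration; the entry above confirms only that providers register through a function with that name.

```typescript
// Callback a provider supplies so the global toggle can re-filter its models.
type ApplyFreeOnly = (freeOnly: boolean) => void;

const registered = new Map<string, ApplyFreeOnly>();

// Assumed default; the real default would come from FREE_ONLY / PI_FREE_ONLY.
let freeOnly = true;

// Hypothetical signature for registerWithGlobalToggle().
function registerWithGlobalToggle(provider: string, apply: ApplyFreeOnly): void {
  registered.set(provider, apply);
  apply(freeOnly); // bring the new provider in line with the current state
}

// Roughly what `/free on` and `/free off` might do:
// update the flag, then notify every registered provider.
function setFreeOnly(value: boolean): void {
  freeOnly = value;
  registered.forEach((apply) => apply(value));
}
```

Keeping the flag in one place means a provider that registers late still starts in the same state as the others.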

### Fixed

- **Toggle commands now actually filter models from UI** — Previously, toggle commands only showed notifications but didn't remove paid models from the model picker:
  - **OpenRouter (`/openrouter-toggle`)**: Now uses `registerProvider`/`unregisterProvider` to actually filter models from the picker UI
  - **NVIDIA (`/nvidia-toggle`)**: Added dynamic `showPaid` parameter to `fetchNvidiaModels()` so toggle properly switches between free and paid model sets
  - **Fireworks**: Removed broken toggle command — all models are paid with no free tier, so there was nothing to toggle

### Added

- **OpenRouter per-provider free model toggle** — Added `/openrouter-toggle` command for the built-in OpenRouter provider:
  - `/openrouter-toggle` — Switch between showing only free models vs all models (including paid)
  - New config flag `openrouter_show_paid` in `~/.pi/free.json` (default: `false`)
  - Environment variable: `OPENROUTER_SHOW_PAID=true` to show paid models by default
  - This brings OpenRouter (a built-in pi provider) in line with extension providers that have per-provider toggles

### Deprecated

- **Qwen provider** — The 1,000 requests/day free tier is no longer available from Qwen/DashScope. The provider code remains for backward compatibility but is now deprecated:
  - Added `@deprecated` JSDoc tags to all Qwen-related exports
  - Added deprecation warning when Qwen provider loads
  - Added warning when `QWEN_SHOW_PAID` config is used
  - Consider migrating to other free providers: Kilo, Cline, NVIDIA, or Modal

### Added

- **Go provider** — OpenCode Go subscription gateway (⚠️ paid only — $5 first month, then $10/month, no free tier) with models: GLM-5, Kimi K2.5, MiMo-V2-Pro, MiMo-V2-Omni, MiniMax M2.7, MiniMax M2.5
  - Set `OPENCODE_GO_API_KEY` or `opencode_go_api_key` in `~/.pi/free.json`
  - Toggle with `/go-toggle`

### Fixed

- **All providers now show Coding Index scores in model selector** — Added `enhanceWithCI()` to the factory-based providers (nvidia, fireworks, mistral, modal, ollama) and to cline, so every provider now displays CI scores in the `/models` command (pi-models extension).

- **All providers now show in `--list-models`** — Providers (zen, openrouter, go) that registered models only in `session_start` were missing from `pi --list-models`, which runs before the session starts. Added immediate registration for these providers:
  - **zen**: Added model caching to `~/.pi/provider-cache.json` for immediate registration + dynamic refresh
  - **openrouter**: Immediate model registration at extension load (like kilo/cline)
  - **go**: Immediate registration with static model list (no API to fetch from)
  - All 11 providers now visible in `--list-models`

### Changed

- Updated README with clear free vs paid provider distinction (9 free + 2 paid-only: Go, Fireworks)
- Added Go and Fireworks provider documentation under new "💳 Paid-Only Providers" section
- Added `opencode_go_api_key` to config file template
- Updated `package.json` description and keywords to include all 11 providers

### Added

- **Provider model cache** (`lib/provider-cache.ts`) — New utility for caching provider model lists to `~/.pi/provider-cache.json`. Used by zen provider for faster startup and offline access after first successful fetch.

## [1.0.9] - 2026-04-14

### Fixed

- **Qwen OAuth breaks other OAuth providers** — `modifyModels` receives all models across every registered provider, not just Qwen's. The previous `map()` stamped the Qwen dashscope `baseUrl` onto every model, causing other OAuth providers (Kilo, OpenRouter, etc.) to return 404 after a `/login qwen` flow. Now only models with `provider === PROVIDER_QWEN` are patched; others pass through unchanged.
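
The shape of the fix can be sketched as follows. The `Model` interface, the `patchQwenBaseUrl` name, and the value of `PROVIDER_QWEN` are assumptions for illustration; only the constant's name and the `provider === PROVIDER_QWEN` check come from the entry above.

```typescript
// Assumed value; only the constant name appears in the entry above.
const PROVIDER_QWEN = "qwen";

// Hypothetical minimal model shape.
interface Model {
  id: string;
  provider: string;
  baseUrl: string;
}

// Patch only Qwen models; models from every other provider
// pass through unchanged.
function patchQwenBaseUrl(models: Model[], qwenBaseUrl: string): Model[] {
  return models.map((m) =>
    m.provider === PROVIDER_QWEN ? { ...m, baseUrl: qwenBaseUrl } : m,
  );
}
```

The function is deliberately synchronous and returns a plain array rather than a `Promise`.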

## [1.0.8] - 2026-04-13

### Added

- **Modal provider** — Free access to GLM-5.1 FP8 (128k context, 16k max output) during promotional period (free until April 30, 2026)
  - Requires a free Modal API key (`MODAL_API_KEY` or `modal_api_key` in `~/.pi/free.json`)
  - Model: `zai-org/GLM-5.1-FP8` — 128k context window, 16k max output tokens
- **Qwen provider** — Free access to Qwen Coder (1,000 requests/day) via OAuth device flow
  - Run `/login qwen` to authenticate through Qwen Studio (chat.qwen.ai)
  - Uses `coder-model` alias (maps to Qwen3.6-Plus on the backend)
  - 131k context window, 16k max output tokens, zero cost

### Fixed

- **Qwen OAuth browser launch on Windows** — URLs with `&` query params were truncated by `cmd.exe`'s `&` command separator; switched to `powershell.exe Start-Process` which passes the URL as a literal string
- **Qwen API endpoint** — Replicates qwen-code's `getCurrentEndpoint()` logic: uses `resource_url` from OAuth token response (`dashscope.aliyuncs.com` for Chinese accounts, `portal.qwen.ai` for international), with fallback to `dashscope.aliyuncs.com/compatible-mode/v1`
- **Qwen DashScope headers** — Added all headers required by DashScope's OpenAI-compatible API: `X-DashScope-AuthType: qwen-oauth`, `X-DashScope-CacheControl: enable`, `X-DashScope-UserAgent`, `Client-Code: QwenCode`
- **Qwen modifyModels crash** — `modifyModels` must be synchronous; making it async caused the pi framework to receive a `Promise` instead of a `Model[]`, breaking `ModelRegistry.find()`
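
The endpoint selection described in the **Qwen API endpoint** fix might look roughly like the sketch below. The function name and the normalization steps (adding `https://` and a `/v1` suffix when missing) are assumptions; the entry states only that `resource_url` is used when present and that `dashscope.aliyuncs.com/compatible-mode/v1` is the fallback.

```typescript
// Fallback named in the entry above.
const DEFAULT_QWEN_ENDPOINT = "dashscope.aliyuncs.com/compatible-mode/v1";

// Picks the API base URL from the `resource_url` field of the OAuth token response.
// Assumption: the value may arrive without a scheme or a `/v1` suffix,
// so both are added when missing.
function resolveQwenEndpoint(resourceUrl?: string): string {
  const trimmedInput = resourceUrl ? resourceUrl.trim() : "";
  const base = trimmedInput !== "" ? trimmedInput : DEFAULT_QWEN_ENDPOINT;
  const withScheme = /^https?:\/\//.test(base) ? base : `https://${base}`;
  const noTrailingSlash = withScheme.replace(/\/+$/, "");
  return noTrailingSlash.endsWith("/v1") ? noTrailingSlash : `${noTrailingSlash}/v1`;
}
```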

## [1.0.5] - 2025-04-03

### Fixed

- **NVIDIA provider non-chat model filtering** (comment/implementation mismatch)
  - Added modalities-based filtering to exclude embedding, speech-to-text, OCR, and image-gen models
  - Filters models where `output` is not `["text"]` (e.g., image generation like `black-forest-labs/flux.1-dev`)
  - Filters models where `input` lacks `"text"` (e.g., OCR like `nvidia/nemoretriever-ocr-v1`, speech-to-text like `openai/whisper-large-v3`)
  - Updated file comment to accurately describe the filtering behavior
  - Added 8 comprehensive unit tests for model filtering logic
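
The two rules above amount to a small predicate, sketched here under stated assumptions. The `NvidiaModel` shape and the function names are hypothetical; the `input` and `output` field names and the two conditions come from the entry.

```typescript
// Hypothetical model shape; `input` and `output` list the modalities.
interface NvidiaModel {
  id: string;
  input: string[];
  output: string[];
}

// A chat model accepts text and produces only text.
function isChatModel(model: NvidiaModel): boolean {
  const textOnlyOutput =
    model.output.length === 1 && model.output[0] === "text";
  return textOnlyOutput && model.input.includes("text");
}

// Keeps only models that pass both checks.
function filterChatModels(models: NvidiaModel[]): NvidiaModel[] {
  return models.filter(isChatModel);
}
```

Under this predicate, an image-generation model fails the output check, while OCR and speech-to-text models fail the input check.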

## [1.0.4] - 2025-04-03

### Fixed

- **All tests now passing** (127/127)
  - Fixed mock paths in `kilo.test.ts`, `zen.test.ts`, `ollama.test.ts`
  - Fixed `createCtxReRegister` mocks in `zen.test.ts` and `openrouter.test.ts`
  - Fixed `cline.test.ts` to test actual provider re-registration behavior
  - Added missing `DEFAULT_MIN_SIZE_B` constant to openrouter mock

### Changed

- **Code quality improvements**
  - Refactored usage modules to break circular dependency (`limits.ts` ↔ `formatters.ts`)
  - Created `usage/types.ts` with shared interfaces (`FreeTierLimit`, `FreeTierUsage`)
  - Bumped version to 1.0.4

## [1.0.3] - 2025-04-03

### Changed

- Updated `package.json` metadata (name, description, keywords, repository URL)
- Updated `.npmignore` for cleaner publishes

## [1.0.0] - 2024-03-28

### Added

- Initial release with 6 providers: Kilo, Zen, OpenRouter, NVIDIA, Cline, Fireworks
- Free tier usage tracking across all sessions
- Provider failover with model hopping
- Autocompact integration for rate limit recovery
- Usage widget with glimpseui
- Command toggles for free/all model filtering
- Hardcoded benchmark data from Artificial Analysis

### Changed

- **Major refactoring**: Split `free-tier-limits.ts` into `usage/*` modules
  - `usage/tracking.ts` - runtime session tracking
  - `usage/cumulative.ts` - persistent storage
  - `usage/formatters.ts` - display formatting
  - 77% line reduction (741 → 166 lines)
- **Major refactoring**: Split `usage-widget.ts` into `widget/*` modules
  - `widget/data.ts` - data collection
  - `widget/format.ts` - formatting utilities
  - `widget/render.ts` - HTML generation
  - 74% line reduction (~350 → 90 lines)
- **Refactoring**: Extracted functions from `cline-auth.ts`
  - `fetchAuthorizeUrl()` - auth URL fetching
  - `waitForAuthCode()` - callback handling
  - `exchangeCodeForTokens()` - token exchange
  - `parseManualInput()` - manual input parsing
- **Refactoring**: Simplified `model-hop.ts` complexity
  - Extracted `handleDowngradeDecision()`
  - Extracted `tryAlternativeModel()`
- **Deduplication**: Created shared modules
  - `lib/json-persistence.ts` - file I/O with caching
  - `lib/logger.ts` - structured logging
  - `providers/model-fetcher.ts` - OpenRouter-compatible fetching
- Replaced ~30 `console.log` statements with structured logging
- Fixed all 9 pre-existing test failures
  - `fetchWithRetry` now throws after the last retry
  - Fixed auth pattern matching (added `key.*not.*valid`)
  - Updated capability ranking tests
  - Added `resetUsageStats()` for test isolation

### Fixed

- `fetchWithRetry()` now properly throws after exhausting retries
- Auth error pattern matching now handles more message variants
- Test isolation for `free-tier-limits` tests
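
The retry behavior described in the first fix can be illustrated with a generic helper. `retryOrThrow` is a hypothetical stand-in, not the project's `fetchWithRetry()`, whose signature is not shown in this changelog; backoff and error classification are omitted to keep the sketch short.

```typescript
// Hypothetical stand-in for a retrying fetch: runs `op` up to `attempts`
// times and rethrows the last error once every attempt has failed.
async function retryOrThrow<T>(
  op: () => Promise<T>,
  attempts: number,
): Promise<T> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

The point of the fix is the final `throw`: without it, a caller would see the function resolve with no value after the last failed attempt instead of receiving the error.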