# Shared Agent Memory RFC

## Summary

This document defines a shared memory architecture for agents working across many projects for one human operator, with one active machine at a time.

The design goal is not a generic vector database. The goal is a durable memory system that helps agents understand:

- who Sebas is, what he prefers, and how he tends to decide
- what each project is trying to do
- which decisions and constraints already exist
- which knowledge from one project should inform another

The v1 architecture is intentionally simple:

- `SQLite/libSQL` is the canonical store
- embeddings are auxiliary, not central
- all agents talk to one local `memory-service`
- one local `memory-engine` handles ingestion, extraction, consolidation, embeddings, and index refresh
- memory is captured through explicit actions, not passive continuous listening

This keeps the system local-first, auditable, and easy to evolve without committing to distributed complexity too early.

## Goals

- Share durable memory across agents and projects
- Preserve global personal context about Sebas
- Preserve project-specific facts, conventions, and decisions
- Reuse useful knowledge across repositories
- Support hybrid retrieval: metadata + keyword/full-text + semantic search
- Keep memory explainable and evidence-backed
- Work well with many local agent processes on one machine

## Non-Goals

- Multi-writer conflict resolution across simultaneously active machines
- Passive always-on ingestion of ambient real-world conversation
- Dedicated external vector database in v1
- Rich graph-native storage in v1
- Markdown-vault-first storage as the canonical source of truth

## Core Architecture

The system has four main pieces:

1. `SQLite/libSQL` database
   The source of truth for all memory records, links, sources, ingestion history, and retrieval logs.

2. `memory-service`
   A local shared API surface used by all agents and tools. It is the only supported read/write path into memory.

3. `memory-engine`
   A local background worker responsible for asynchronous processing: extraction, consolidation, embedding generation, and index maintenance.

4. Agent clients
   Any coding agent, assistant, or automation process that queries or writes memory through the service.

### Design Choice

`SQLite/libSQL` is chosen over Postgres and OpenViking for v1 because the actual operating model is:

- one user
- one active machine at a time
- local self-hosting
- moderate write concurrency
- low operational tolerance for heavy infrastructure

For this operating model, a local embedded database is enough to support a serious memory layer without overbuilding.

## Execution Model: Memory Pipeline Engine

The system must explicitly define when memory enters, who triggers it, and when it becomes durable.

### Components

- `memory-service`
  Accepts writes and reads. Creates ingestion events. Exposes search and context APIs.

- `memory-engine`
  Runs as an event-driven worker. Pulls jobs from a small local queue stored in the database or managed by the service.

### What Triggers Ingestion

Only these ingestion moments are allowed in v1:

1. `manual write`
   A user or agent explicitly writes a memory record, preference, project note, or decision through the API.

2. `session close`
   An agent run, coding session, or task execution finishes and submits its transcript, summary, or artifacts for extraction.

3. `conversation capture`
   A human conversation is intentionally saved into the system as notes, transcript, import, or agent-written summary.

4. `scheduled maintenance`
   Background maintenance runs to generate embeddings, refresh indexes, suggest consolidation, and detect duplicates.

### What Does Not Trigger Ingestion

The system does not do any of the following in v1:

- passive live listening to real-world conversations
- always-on capture from microphones or chat streams
- automatic ingestion of all repo text by default
- promotion of raw conversation snippets directly into durable truth

### Human Conversation Policy

Human conversations are harder than agent sessions because the system should not infer permission or significance by default.

In v1, a conversation enters memory only if one of these happens:

- a transcript or note is pasted/imported explicitly
- an agent writes a summary and it is intentionally saved
- a user captures a conversation artifact through the API

This means the memory system is `explicit-capture`, not `ambient-capture`.

### State Flow

All memory processing follows this lifecycle:

1. `capture`
   Raw material is submitted. This creates a `source` and an `ingestion_event`.

2. `extract`
   The engine derives candidate memories from the raw material. These land as `inbox` records.

3. `consolidate`
   The engine or a later approval pass deduplicates, merges, links, and scopes those candidates.

4. `promote`
   A consolidated record becomes `active` and available as durable memory.

5. `index`
   Embeddings and full-text indexes are updated asynchronously.
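
The lifecycle above can be sketched as a handful of functions over an in-memory store. This is only an illustration under stated assumptions: the `Store` class, the dictionary field names, and the "one candidate per non-empty line" extractor are hypothetical placeholders, not part of this RFC. Consolidation and indexing are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical id generator for the sketch


@dataclass
class Store:
    """In-memory stand-in for the canonical database (illustration only)."""
    sources: dict = field(default_factory=dict)
    ingestion_events: dict = field(default_factory=dict)
    memories: dict = field(default_factory=dict)


def capture(store: Store, raw_text: str, trigger: str) -> int:
    """Step 1: store the raw material and record an ingestion event."""
    source_id = next(_ids)
    store.sources[source_id] = raw_text
    event_id = next(_ids)
    store.ingestion_events[event_id] = {
        "trigger": trigger,
        "source_id": source_id,
        "state": "pending",
    }
    return event_id


def extract(store: Store, event_id: int) -> list:
    """Step 2: derive candidate memories; they always land as `inbox`."""
    event = store.ingestion_events[event_id]
    raw_text = store.sources[event["source_id"]]
    memory_ids = []
    # Placeholder extractor (assumption): one candidate per non-empty line.
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        memory_id = next(_ids)
        store.memories[memory_id] = {
            "content": line,
            "status": "inbox",
            "source_ref": event["source_id"],
        }
        memory_ids.append(memory_id)
    event["state"] = "done"
    return memory_ids


def promote(store: Store, memory_id: int) -> None:
    """Step 4: only an `inbox` record with a source reference becomes `active`."""
    record = store.memories[memory_id]
    if record["status"] != "inbox":
        raise ValueError("only inbox records can be promoted")
    if record["source_ref"] is None:
        raise ValueError("promoted memories must retain a source reference")
    record["status"] = "active"
```

Raw material stays in `sources` regardless of what happens later, and nothing reaches `active` without passing through `inbox` first.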

### Timing Rules

- Manual writes for profile or project facts are synchronous.
- Session-close extraction is asynchronous.
- Conversation capture is asynchronous after explicit save/import.
- Embedding generation must never block storage.
- Retrieval must still work when embeddings are missing by falling back to metadata and full-text search.

### Ownership Rules

- Agents may auto-create `episode` records and candidate `inbox` facts.
- Durable `profile` facts about Sebas should require explicit confirmation or a promotion rule.
- `decision` records should preserve rationale and source before activation.
- Contradictory or low-confidence memories should remain explicit, not silently overwritten.

### Failure Rules

- Failed jobs remain retryable in `ingestion_events`.
- Raw source material is never deleted because extraction failed.
- Every promoted memory must retain source/evidence references.
- Partial indexing failure must not make the record unreadable.
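
One way to keep failed jobs retryable is to let `ingestion_events` double as the queue. The sketch below uses Python's standard `sqlite3` module; the column names, the `job_state` values (`pending`, `failed`, `done`), and the `processor` callback are assumptions made for illustration, not a schema this RFC prescribes.

```python
import sqlite3


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) ingestion_events table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ingestion_events (
            id            INTEGER PRIMARY KEY,
            trigger_type  TEXT NOT NULL,
            source_ref    TEXT NOT NULL,
            job_state     TEXT NOT NULL DEFAULT 'pending',
            retry_count   INTEGER NOT NULL DEFAULT 0,
            error_message TEXT
        )
        """
    )
    return conn


def enqueue(conn: sqlite3.Connection, trigger_type: str, source_ref: str) -> int:
    with conn:  # commits on success
        cur = conn.execute(
            "INSERT INTO ingestion_events (trigger_type, source_ref) VALUES (?, ?)",
            (trigger_type, source_ref),
        )
    return cur.lastrowid


def run_next(conn: sqlite3.Connection, processor):
    """Process one pending or failed job; return its new state, or None if idle."""
    row = conn.execute(
        "SELECT id, source_ref FROM ingestion_events "
        "WHERE job_state IN ('pending', 'failed') "
        "ORDER BY retry_count, id LIMIT 1"  # fresh jobs before repeat offenders
    ).fetchone()
    if row is None:
        return None
    event_id, source_ref = row
    try:
        processor(source_ref)
    except Exception as exc:
        # The job stays in the table and stays retryable; the source is untouched.
        with conn:
            conn.execute(
                "UPDATE ingestion_events SET job_state = 'failed', "
                "retry_count = retry_count + 1, error_message = ? WHERE id = ?",
                (str(exc), event_id),
            )
        return "failed"
    with conn:
        conn.execute(
            "UPDATE ingestion_events SET job_state = 'done', error_message = NULL "
            "WHERE id = ?",
            (event_id,),
        )
    return "done"
```

A failure only updates the event row, so the audit trail and the retry queue are the same record.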

## Memory Model

The system stores multiple kinds of memory. These are not all equally trustworthy.

### Memory Classes

- `profile`
  Global facts, preferences, worldview hints, working style, communication preferences, recurring intent.

- `project`
  Facts tied to one project, repo, product, or codebase.

- `decision`
  Important choices with rationale, alternatives, and scope.

- `artifact`
  Reusable documents, links, snippets, references, or procedures.

- `episode`
  Session observations, conversation fragments, run summaries, transient context.

- `task_hint`
  Reusable next-step patterns or operational hints that help agents act better.

### Stable Types

#### `MemoryRecord`

Canonical stored unit. Fields (required unless marked optional):

- `id`
- `type`
- `scope`
- `status`
- `project_id`
- `repo_id`
- `agent_id`
- `source_kind`
- `title`
- `content`
- `summary`
- `confidence`
- `freshness`
- `created_at`
- `updated_at`
- `observed_at`
- `source_ref`
- `evidence_ref`
- optional embedding vector

#### `MemoryScope`

Allowed values:

- `global`
- `project`
- `repo`
- `agent`
- `session`

#### `MemoryStatus`

Allowed values:

- `inbox`
- `active`
- `superseded`
- `contradicted`
- `archived`

#### `SourceKind`

Allowed values:

- `manual`
- `conversation`
- `run`
- `document`
- `import`
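
As a rough illustration, `MemoryRecord` and the enumerations above could be expressed as Python types. The field names and allowed values come from this document; the concrete Python types (`str` ids, `float` for `confidence` and `freshness`, `datetime` timestamps) and the nullable `project_id`, `repo_id`, and `agent_id` are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Sequence


class MemoryType(str, Enum):
    PROFILE = "profile"
    PROJECT = "project"
    DECISION = "decision"
    ARTIFACT = "artifact"
    EPISODE = "episode"
    TASK_HINT = "task_hint"


class MemoryScope(str, Enum):
    GLOBAL = "global"
    PROJECT = "project"
    REPO = "repo"
    AGENT = "agent"
    SESSION = "session"


class MemoryStatus(str, Enum):
    INBOX = "inbox"
    ACTIVE = "active"
    SUPERSEDED = "superseded"
    CONTRADICTED = "contradicted"
    ARCHIVED = "archived"


class SourceKind(str, Enum):
    MANUAL = "manual"
    CONVERSATION = "conversation"
    RUN = "run"
    DOCUMENT = "document"
    IMPORT = "import"


@dataclass
class MemoryRecord:
    id: str
    type: MemoryType
    scope: MemoryScope
    status: MemoryStatus
    project_id: Optional[str]  # assumption: None for global-scope records
    repo_id: Optional[str]     # assumption: None when not tied to a repo
    agent_id: Optional[str]    # assumption: None for manual writes
    source_kind: SourceKind
    title: str
    content: str
    summary: str
    confidence: float          # assumption: numeric score
    freshness: float           # assumption: numeric score
    created_at: datetime
    updated_at: datetime
    observed_at: datetime
    source_ref: str
    evidence_ref: str
    embedding: Optional[Sequence[float]] = None  # optional, filled asynchronously
```

Deriving the enumerations from `str` keeps the stored values identical to the literals listed in this section, which makes them easy to persist as plain text columns.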

#### `IngestionEvent`

Tracks pipeline execution:

- event id
- trigger type
- source reference
- requested scope
- job state
- timestamps
- processor outcome
- retry count
- error message if any

#### `RetrievalResult`

Contains:

- memory record
- combined score
- score breakdown by retrieval component
- matched scope
- evidence reference
- explanation text

#### `ContextBundle`

The final payload an agent consumes. It should include:

- global profile facts
- relevant project/repo facts
- active decisions
- recent episodes when useful
- citations/explanations

### Relationships

Memory links support these relations:

- `supports`
- `contradicts`
- `supersedes`
- `applies_to`
- `derived_from`
- `related_to`

## Logical Schema

The implementation must support at least these logical tables or equivalent collections:

- `projects`
- `repos`
- `memories`
- `memory_links`
- `sources`
- `ingestion_events`
- `retrieval_logs`
- `profiles` or `saved_views`

### Schema Notes

- `memories` is the canonical table for all memory types.
- `sources` stores original artifacts: transcript refs, note refs, document refs, run refs.
- `memory_links` stores semantic and operational relations.
- `ingestion_events` is both audit log and retry queue anchor.
- `retrieval_logs` make memory behavior inspectable.

### Indexing Requirements

The system should maintain:

- metadata indexes on scope/project/repo/type/status
- FTS index over `title`, `summary`, `content`
- a vector index for embeddings, if the chosen runtime supports one

Embeddings remain inside the main store in v1. No separate vector database.
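
A minimal sketch of these indexing requirements in plain SQLite, using Python's `sqlite3` module. The column list is a reduced, hypothetical subset of `MemoryRecord`, and the names `memories_fts` and `idx_memories_filter` are illustrative. SQLite's FTS5 extension is not compiled into every build, so the sketch probes for it and falls back to `LIKE` matching when it is missing.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id         TEXT PRIMARY KEY,
    type       TEXT NOT NULL,
    scope      TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'inbox',
    project_id TEXT,
    repo_id    TEXT,
    title      TEXT NOT NULL,
    summary    TEXT,
    content    TEXT NOT NULL,
    embedding  BLOB  -- optional; stays NULL until the engine fills it in
);
CREATE INDEX IF NOT EXISTS idx_memories_filter
    ON memories (scope, project_id, repo_id, type, status);
"""


def create_schema(conn: sqlite3.Connection) -> bool:
    """Create tables and indexes; return True if an FTS5 index is available."""
    conn.executescript(SCHEMA)
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts "
            "USING fts5(title, summary, content)"
        )
        return True
    except sqlite3.OperationalError:
        return False  # this SQLite build lacks FTS5


def insert_memory(conn: sqlite3.Connection, has_fts: bool, record: dict) -> None:
    with conn:
        cur = conn.execute(
            "INSERT INTO memories (id, type, scope, status, project_id, "
            "title, summary, content) VALUES (:id, :type, :scope, :status, "
            ":project_id, :title, :summary, :content)",
            record,
        )
        if has_fts:
            # Keep the FTS row aligned with the base table's rowid.
            conn.execute(
                "INSERT INTO memories_fts (rowid, title, summary, content) "
                "VALUES (?, ?, ?, ?)",
                (cur.lastrowid, record["title"], record["summary"], record["content"]),
            )


def search_text(conn: sqlite3.Connection, has_fts: bool, query: str) -> list:
    """Full-text search over active records, with a LIKE fallback."""
    if has_fts:
        rows = conn.execute(
            "SELECT m.id FROM memories_fts "
            "JOIN memories AS m ON m.rowid = memories_fts.rowid "
            "WHERE memories_fts MATCH ? AND m.status = 'active'",
            (query,),
        )
    else:
        like = f"%{query}%"
        rows = conn.execute(
            "SELECT id FROM memories WHERE status = 'active' "
            "AND (title LIKE ? OR summary LIKE ? OR content LIKE ?)",
            (like, like, like),
        )
    return [row[0] for row in rows]
```

The `embedding` column lives in the same table as everything else, so a record is fully readable and searchable by metadata and text before any vector exists.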

## Retrieval Behavior

Hybrid retrieval is mandatory.

### Retrieval Pipeline

1. Apply metadata filters first
   Filter by scope, project, repo, type, status, and recency bounds.

2. Run full-text retrieval
   Use keyword and lexical matching for explicit facts and exact terms.

3. Run semantic retrieval
   Use embeddings for concept-level similarity.

4. Merge and rerank
   Combine keyword score, semantic score, scope relevance, recency, and confidence.

5. Build response explanation
   Return why each item was selected.
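
The merge-and-rerank step could look roughly like the weighted sum below. The weights, the `Candidate` fields, and the choice to renormalize weights when a record has no embedding are all assumptions made for illustration; this RFC does not fix a scoring formula. Each component score is assumed to be pre-normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical weights; a real deployment would tune these.
WEIGHTS: Dict[str, float] = {
    "keyword": 0.35,
    "semantic": 0.35,
    "scope": 0.15,
    "recency": 0.10,
    "confidence": 0.05,
}


@dataclass
class Candidate:
    memory_id: str
    keyword: float
    scope: float
    recency: float
    confidence: float
    semantic: Optional[float] = None  # None when the record has no embedding yet


def score(candidate: Candidate) -> dict:
    """Combine component scores and keep a per-component breakdown."""
    parts = {
        "keyword": candidate.keyword,
        "scope": candidate.scope,
        "recency": candidate.recency,
        "confidence": candidate.confidence,
    }
    if candidate.semantic is not None:
        parts["semantic"] = candidate.semantic
    # Renormalize over the components that exist, so a missing embedding
    # does not push an otherwise strong record out of the results.
    total_weight = sum(WEIGHTS[name] for name in parts)
    breakdown = {
        name: WEIGHTS[name] * value / total_weight for name, value in parts.items()
    }
    strongest = max(breakdown, key=breakdown.get)
    return {
        "memory_id": candidate.memory_id,
        "combined_score": sum(breakdown.values()),
        "score_breakdown": breakdown,
        "explanation": f"strongest signal: {strongest}",
    }


def rerank(candidates: List[Candidate], limit: int = 5) -> List[dict]:
    """Return the top `limit` candidates, best first."""
    scored = [score(c) for c in candidates]
    scored.sort(key=lambda result: result["combined_score"], reverse=True)
    return scored[:limit]
```

Returning the breakdown alongside the combined score is what makes step 5 cheap: the explanation is derived from numbers the reranker already computed.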

### Default Search Scope

For most agent requests, search should consider:

- global profile memory
- current project and repo memory
- linked cross-project memories only when relevance is above a configured threshold

This prevents unrelated noise while still allowing useful transfer.

### Retrieval Guarantees

- Retrieval must not depend solely on embeddings.
- Records with missing embeddings must still be discoverable.
- Results must include evidence or source references whenever possible.
- Returned context should prefer fewer, stronger memories over many weak ones.

## Ingestion and Consolidation

The system uses two ingestion lanes.

### Lane 1: Manual Curated Ingestion

Used for:

- personal profile facts
- communication preferences
- worldview and working style
- project conventions
- explicit decisions

These writes are synchronous and can become durable immediately if written in structured form.

### Lane 2: Automatic Extracted Ingestion

Used for:

- agent session transcripts
- run summaries
- imported conversations
- documents and notes requiring extraction

These writes first create `inbox` records. They are candidates, not durable truth.

### Consolidation Rules

Consolidation should:

- detect duplicates or near-duplicates
- merge equivalent facts
- preserve provenance
- create contradiction links when claims disagree
- promote only sufficiently grounded memories to `active`
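
The duplicate-merging and provenance rules can be illustrated with a small near-duplicate pass. This sketch uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure; the `0.9` threshold and the `content` / `source_refs` dictionary shape are assumptions, and contradiction detection and promotion are deliberately left out because they need more than string similarity.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def merge_near_duplicates(candidates: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Merge candidates whose normalized content is nearly identical.

    The first occurrence wins as the surviving text; every merged
    candidate contributes its source references so provenance is kept.
    """
    merged: List[Dict] = []
    for candidate in candidates:
        text = normalize(candidate["content"])
        for existing in merged:
            ratio = SequenceMatcher(None, text, normalize(existing["content"])).ratio()
            if ratio >= threshold:
                for ref in candidate["source_refs"]:
                    if ref not in existing["source_refs"]:
                        existing["source_refs"].append(ref)
                break
        else:
            # No close match found: keep as a new, distinct candidate.
            merged.append(
                {
                    "content": candidate["content"],
                    "source_refs": list(candidate["source_refs"]),
                }
            )
    return merged
```

For example, two session extractions that both say Sebas prefers small pull requests collapse into one candidate that cites both sessions.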

### Truth Policy

- `episode` records may be stored automatically.
- candidate facts extracted from episodes stay `inbox` until consolidation.
- profile/worldview memories should be promoted with explicit confirmation or policy.
- latest write does not automatically replace canonical truth.

## Service API

The exact transport can be local HTTP, local RPC, or MCP-compatible wrapping. The behavioral API should include:

- `ingest(memory_input)`
- `capture_session(source_ref, scope)`
- `capture_conversation(source_ref, scope, capture_mode)`
- `search(query, scopes, filters)`
- `context_for(project, repo, agent, task)`
- `consolidate(inbox_items)`
- `upsert_profile_fact(...)`
- `explain(memory_id | retrieval_id)`
- `export()`
- `import()`

### API Expectations

- all writes go through the service
- all agent context reads go through the service
- service responses should be deterministic enough to debug: the same query against the same store state should return the same results in the same order
- search and context calls should expose explanation fields, not just payloads
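
The behavioral API could be pinned down as a transport-neutral interface. The sketch below uses `typing.Protocol`; the method names and parameter names follow the list above, while every type annotation is an assumption. `upsert_profile_fact` keeps an open signature because this RFC does not specify its parameters, and `import` is spelled `import_` only because `import` is a reserved word in Python.

```python
from typing import Any, Mapping, Optional, Protocol, Sequence, runtime_checkable


@runtime_checkable
class MemoryService(Protocol):
    """Behavioral surface of the memory service (types are illustrative)."""

    def ingest(self, memory_input: Mapping[str, Any]) -> Any: ...

    def capture_session(self, source_ref: str, scope: str) -> Any: ...

    def capture_conversation(
        self, source_ref: str, scope: str, capture_mode: str
    ) -> Any: ...

    def search(
        self, query: str, scopes: Sequence[str], filters: Mapping[str, Any]
    ) -> Any: ...

    def context_for(self, project: str, repo: str, agent: str, task: str) -> Any: ...

    def consolidate(self, inbox_items: Sequence[str]) -> Any: ...

    # Parameters are not specified in this RFC, so the signature stays open.
    def upsert_profile_fact(self, *args: Any, **kwargs: Any) -> Any: ...

    # Exactly one of the two ids is expected (assumption).
    def explain(
        self, memory_id: Optional[str] = None, retrieval_id: Optional[str] = None
    ) -> Any: ...

    def export(self) -> Any: ...

    def import_(self) -> Any: ...
```

A local HTTP client, an RPC stub, and an MCP wrapper could each implement this interface, so agents depend on behavior rather than on a transport.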

## Operational Model

This system should be treated as shared infrastructure for all local agents.

### Concurrency

Multiple agents may read and write at once on the same machine. The design assumes:

- one active machine
- many local processes
- moderate concurrent writes

The service layer should serialize or coordinate writes as needed so agents do not each invent their own storage rules.
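
If the service sits on plain SQLite, write coordination between local processes can lean on SQLite's own locking. The sketch below is one possible setup, not a requirement of this RFC: it enables write-ahead logging, sets a busy timeout so a blocked writer waits instead of failing immediately, and takes the write lock up front with `BEGIN IMMEDIATE`. The helper names and the 5-second timeout are illustrative.

```python
import sqlite3


def open_shared(path: str) -> sqlite3.Connection:
    """Open a connection configured for many local readers and writers."""
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions are controlled explicitly below.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")   # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s for a lock
    return conn


def write_tx(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> None:
    """Run one write inside a transaction that acquires the write lock first."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(sql, params)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

SQLite still allows only one writer at a time, which matches the "moderate concurrent writes" assumption: writers queue briefly rather than conflict.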

### Backup and Portability

V1 backup strategy can be simple:

- regular snapshot/export of the database
- optional export of sources or source references
- manual transfer between machines when switching active device

Because there is no simultaneous multi-device usage requirement, this is acceptable for v1.
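
For a plain SQLite file, a consistent snapshot can be taken while the service is running by using the online backup API that Python exposes as `sqlite3.Connection.backup`. The function name and paths below are hypothetical; scheduling and retention are out of scope for the sketch.

```python
import sqlite3


def snapshot(source_path: str, dest_path: str) -> None:
    """Copy the live database into a standalone snapshot file."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source; safe while other connections write.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

The resulting file is an ordinary database, so moving to another machine amounts to copying the snapshot and pointing the service at it.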

## Acceptance Scenarios

The future implementation should satisfy these scenarios:

1. An agent in project A retrieves Sebas's global preferences plus project A conventions without project B noise.

2. A decision recorded in project A appears in project B only when linked or strongly relevant.

3. A manually saved profile preference is available immediately even before embeddings exist.

4. A conversation imported from outside the system stays raw until explicitly captured, then becomes `inbox`, then later may be promoted.

5. Session-close extraction does not block the agent workflow.

6. Two contradictory facts can coexist with explicit status and evidence.

7. Retrieval combines lexical and semantic matches and can explain result selection.

8. Export/import preserves ids, timestamps, links, statuses, and ingestion history.

9. Multiple local agents can safely use the same shared memory service.

## Deferred Ideas / Future Extensions

These ideas are useful, but should not be implemented in v1.

### OpenViking-Inspired Ideas

- Hierarchical context loading (`L0/L1/L2`)
  Useful later because agents often need summary-first then drill-down retrieval.
  Not now because the core memory model and retrieval service should exist before multi-level context packaging.

- Filesystem-style browsing and recursive retrieval
  Useful later because it gives humans and agents a navigable knowledge shape.
  Not now because SQL-backed canonical memory is simpler and more reliable for v1.

- Richer separation of user, agent, resource, and skill namespaces
  Useful later because it makes larger agent ecosystems cleaner.
  Not now because current scope is one operator and one shared memory domain.

### Arscontexta-Inspired Ideas

- Optional Markdown vault as a capture surface
  Useful later because Markdown is portable, easy to inspect, and good for human-curated notes.
  Not now because canonical truth should stay in one structured store first.

- Reflection and reweave workflows over notes
  Useful later because long-running knowledge bases benefit from periodic restructuring.
  Not now because the system first needs reliable ingestion, status handling, and consolidation.

### Other Deferred Extensions

- Graph/entity memory
  Useful later when cross-project entity relationships become a retrieval bottleneck.
  Not now because links in SQL are enough for early stages.

- Multi-device sync and conflict handling
  Useful later if more than one machine becomes active in parallel.
  Not now because the operating model explicitly avoids that complexity.

- Live conversation ingestion
  Useful later only if a clear privacy, consent, and review model exists.
  Not now because passive capture is too risky and noisy.

## Recommended v1 Principle

Treat memory as a structured, evidence-backed, consolidating knowledge system.

Do not treat memory as:

- a dump of all conversations
- a pure vector search problem
- an autonomous truth machine

The system should help agents remember better, not hallucinate with more confidence.