Metadata-Version: 2.4
Name: recollect
Version: 0.5.3
Summary: Human-like memory for AI applications
License-Expression: MIT
Requires-Python: >=3.12.4
Requires-Dist: asyncpg>=0.31.0
Requires-Dist: fastembed>=0.7.4
Requires-Dist: orjson>=3.11.7
Requires-Dist: pgvector>=0.4.2
Requires-Dist: pydantic-ai>=1.0
Requires-Dist: pydantic>=2.12.5
Description-Content-Type: text/markdown

# recollect

Cognitive memory system for AI applications. Activation-based retrieval with time decay, spreading activation, and token-budgeted recall.

## Install

```bash
pip install recollect  # pydantic-ai provider included
```

## Quick Start

```python
import asyncio
from recollect import CognitiveMemory

async def main():
    memory = CognitiveMemory()
    await memory.connect()

    await memory.experience(
        "The team decided to migrate from Redis to PostgreSQL for persistence."
    )

    thoughts = await memory.think_about("database decisions", token_budget=500)
    for thought in thoughts:
        print(f"[{thought.activation:.2f}] {thought.content}")

    await memory.close()

asyncio.run(main())
```

## How it works

Bi-encoder cosine similarity reflects semantic overlap, not causal relationships. A query about "Friday dinner plans" has near-zero similarity to "Alex has a severe peanut allergy" -- yet the allergy is safety-critical for a Thai restaurant dinner. Three mechanisms address this gap: concept attention at query time, spreading activation through pre-computed associations, and Hebbian recall tokens that link causally related memories at write time.

**Write path.** `experience(content)` sends the text to an LLM which extracts entities, concepts, context tags, and a significance score. A 768-dimensional embedding is generated locally via FastEmbed (nomic-embed-text-v1.5-Q). Both the embedding and the extracted metadata are stored in PostgreSQL with an HNSW index.

**Read path.** `think_about(query)` embeds the query and runs an HNSW search for candidate traces. Concept attention (ColBERT-style MaxSim over per-trace concept vectors stored at write time) re-ranks those candidates. Spreading activation then traverses the association graph -- a recursive CTE in PostgreSQL that follows temporal, entity, and semantic edges -- to surface traces that did not rank in the initial search but are strongly linked to those that did. Recall tokens apply a gated score bonus for traces that share ambiguous entities with the query. The final list is clipped to fit within the `token_budget`.
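
The concept-attention step can be illustrated with a minimal MaxSim sketch. This is not the library's implementation: the function names, the toy 2-d vectors, and the choice to average (rather than sum) the per-query maxima are assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def concept_maxsim(
    query_vecs: list[list[float]], concept_vecs: list[list[float]]
) -> float:
    """For each query vector, take its best-matching concept vector, then average.

    Averaging keeps the result in [-1, 1]; this normalization is an assumption.
    """
    if not query_vecs or not concept_vecs:
        return 0.0
    best = [max(cosine(q, c) for c in concept_vecs) for q in query_vecs]
    return sum(best) / len(best)


# Hypothetical toy data: two query vectors and two candidate traces.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "trace_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query vectors
    "trace_b": [[1.0, 0.0]],              # covers only the first
}

# Re-rank candidates by MaxSim, best first.
ranked = sorted(
    candidates,
    key=lambda name: concept_maxsim(query, candidates[name]),
    reverse=True,
)
```

A trace scores highly only when every part of the query finds a close concept vector, which is what lets this step re-order candidates that a single pooled embedding treats as equally similar.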

**Working memory.** A 7 +/- 2 slot buffer (range 5-9, enforced) mirrors Miller's Law. When it is full, the weakest trace is displaced to storage.

**Strength and decay.** Every trace has a `strength` in [0.0, 1.0] that decays exponentially over time. Retrieval boosts strength. `consolidate()` merges related traces and removes those below the consolidation threshold.
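
As a worked illustration, exponential decay with a retrieval boost might look like the sketch below. The exact formula, the time unit of `elapsed`, and the multiplicative form of the boost are assumptions; only the [0.0, 1.0] range, the `decay_rate` parameter name, and the `factor=1.1` default come from this README.

```python
import math


def decayed_strength(strength: float, decay_rate: float, elapsed: float) -> float:
    """Exponential decay, clamped to [0.0, 1.0]. `elapsed` is in arbitrary time units."""
    return max(0.0, min(1.0, strength * math.exp(-decay_rate * elapsed)))


def reinforced_strength(strength: float, factor: float = 1.1) -> float:
    """Multiplicative boost on retrieval, capped at 1.0 (assumed form)."""
    return min(1.0, strength * factor)


# With decay_rate = 0.05, strength halves every ln(2) / 0.05 time units.
half_life = math.log(2) / 0.05
halved = decayed_strength(1.0, decay_rate=0.05, elapsed=half_life)
```

Under these assumptions a smaller `decay_rate` lengthens the half-life proportionally, which is the main lever for how long unreinforced traces stay above the consolidation threshold.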

**Recall token lifecycle.** Each recall token also carries a `strength` in [0.0, 1.0]. When a token participates in a successful recall, its strength increments by 0.1 (capped at 1.0). During `consolidate()`, inactive token strengths decay by a factor of 0.9 per pass. Tokens that fall below 0.01 are archived: they become invisible to query-time activation but retain their label, stamps, and significance score. If a future write-time assessment extends or revises an archived token, it reactivates with `strength = significance` -- a health or safety token with significance 0.85 comes back strong, a low-significance token comes back weak but viable. Archived tokens are never deleted; the situational group survives at negligible storage cost and can re-enter the active pool whenever the situation recurs.
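
The lifecycle can be traced with a small state sketch. The `RecallToken` class and its method names are hypothetical; the constants (0.1 boost, 0.9 decay, 0.01 archive cutoff, reactivation at `strength = significance`) are the values stated above.

```python
from dataclasses import dataclass

REINFORCE_BOOST = 0.1   # added on each successful recall
DECAY_FACTOR = 0.9      # applied to inactive tokens per consolidation pass
ARCHIVE_BELOW = 0.01    # tokens weaker than this are archived


@dataclass
class RecallToken:
    label: str
    significance: float
    strength: float
    archived: bool = False

    def on_recall(self) -> None:
        """Token took part in a successful recall."""
        self.strength = min(1.0, self.strength + REINFORCE_BOOST)

    def on_consolidate(self, was_active: bool) -> None:
        """One consolidation pass; only inactive, non-archived tokens decay."""
        if self.archived or was_active:
            return
        self.strength *= DECAY_FACTOR
        if self.strength < ARCHIVE_BELOW:
            self.archived = True  # kept, but invisible to query-time activation

    def on_revision(self) -> None:
        """A write-time assessment extended or revised this token."""
        if self.archived:
            self.archived = False
            self.strength = self.significance


token = RecallToken(label="peanut-allergy", significance=0.85, strength=0.5)
passes = 0
while not token.archived:
    token.on_consolidate(was_active=False)
    passes += 1

token.on_revision()  # comes back at strength 0.85
```

Starting from 0.5, an unused token needs 38 consolidation passes to drop below 0.01, so archival only affects tokens that have gone unused for a long stretch.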

**Scoring.** All candidates from the five retrieval sources merge through a single formula:

```
effective_sim = 0.7 * concept_maxsim + 0.3 * biencoder_cosine
                                        [falls back to biencoder when concept_maxsim = 0]

score = effective_sim
      + significance * 0.15
      + |valence| * 0.05
      + activation_level * 0.10         [spreading activation candidates]
      + entity_sim * 0.1 * significance * concept_sim   [entity match; zero when concept_maxsim = 0]
      + propagated_sim * 0.50           [recall token candidates]
```

The entity bonus is multiplicatively gated by `concept_sim`: entity name matches contribute zero signal when the trace has no semantic overlap with the query, preventing unrelated traces from floating up because they share a name. All parameters are tunable via TOML config.
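
The formula translates directly into a small function. This is a sketch with the default weights hard-coded, not the library's code; it assumes `concept_sim` in the entity term is the same quantity as `concept_maxsim`, and that terms which do not apply to a candidate are passed as 0.0.

```python
def score(
    biencoder_cosine: float,
    concept_maxsim: float = 0.0,
    significance: float = 0.0,
    valence: float = 0.0,
    activation_level: float = 0.0,  # spreading activation candidates only
    entity_sim: float = 0.0,        # entity match candidates only
    propagated_sim: float = 0.0,    # recall token candidates only
) -> float:
    # Fall back to the bi-encoder score when no concept vectors matched.
    if concept_maxsim > 0:
        effective_sim = 0.7 * concept_maxsim + 0.3 * biencoder_cosine
    else:
        effective_sim = biencoder_cosine

    return (
        effective_sim
        + significance * 0.15
        + abs(valence) * 0.05
        + activation_level * 0.10
        # Gated: contributes nothing when concept_maxsim is 0.
        + entity_sim * 0.1 * significance * concept_maxsim
        + propagated_sim * 0.50
    )


# Hypothetical candidate: strong concept match, high significance, exact entity match.
example = score(
    biencoder_cosine=0.4,
    concept_maxsim=0.8,
    significance=0.85,
    valence=-0.6,
    entity_sim=1.0,
)
```

For this candidate the terms are 0.68 (effective similarity) + 0.1275 (significance) + 0.03 (valence) + 0.068 (gated entity bonus), giving 0.9055. Setting `concept_maxsim=0.0` drops the entity term to zero regardless of `entity_sim`.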

## API

| Method | Description |
|--------|-------------|
| `connect(db_url=None)` | Connect to PostgreSQL. Uses `DATABASE_URL` env var if no argument. |
| `experience(content)` | Store a memory trace. LLM extracts entities, concepts, significance. |
| `think_about(query, token_budget)` | Retrieve memories that fit within a token limit. Returns `list[Thought]`. |
| `consolidate(threshold=None)` | Merge and prune weak traces. |
| `forget(trace_id)` | Remove a trace. |
| `reinforce(trace_id, factor=1.1)` | Strengthen a trace. |
| `facts(subject=None)` | List persona facts. |
| `start_session(user_id)` | Begin a scoped session. |
| `close()` | Disconnect and release resources. |

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `DATABASE_URL` | Yes | `postgresql://localhost:5432/memory_sdk` | PostgreSQL connection string. |
| `PYDANTIC_AI_MODEL` | No | -- | pydantic-ai model string in `provider:model` format (e.g., `ollama:ministral-3`, `anthropic:claude-haiku-4-5-20251001`). |
| `ANTHROPIC_API_KEY` | For Anthropic models | -- | Anthropic API key. Read by pydantic-ai's Anthropic backend. |
| `OPENAI_API_KEY` | For OpenAI models | -- | OpenAI API key. Read by pydantic-ai's OpenAI backend. |
| `OLLAMA_BASE_URL` | No | `http://localhost:11434/v1` | Ollama API endpoint. |
| `MEMORY_EXTRACTION_MAX_TOKENS` | No | `8192` | Max tokens for LLM extraction. Reasoning models consume thinking tokens before output; 8192 covers most cases. |
| `MEMORY_CONFIG` | No | -- | Path to custom TOML config file. |
| `MEMORY_EXTRACTION_INSTRUCTIONS` | No | -- | Override extraction prompt instructions. |
| `MEMORY_RECALL_TOKENS_ENABLED` | No | `true` | Enable write-time token stamping and query-time activation. |
| `MEMORY_RECALL_TOKENS_TOP_K` | No | `5` | Max related traces to consider for token assessment. |
| `MEMORY_RECALL_TOKENS_THRESHOLD` | No | `0.3` | Min cosine similarity to consider a trace as related. |
| `MEMORY_RECALL_TOKENS_STRENGTH_THRESHOLD` | No | `0.1` | Min token strength to activate at query time. |
| `MEMORY_RECALL_TOKENS_SCORE_BONUS` | No | `0.1` | Gated additive bonus: `token_strength * bonus * effective_sim`. |
| `MEMORY_RECALL_TOKENS_REINFORCE_BOOST` | No | `0.1` | Strength increment on token activation (capped at 1.0). |
| `MEMORY_RECALL_TOKENS_DECAY_FACTOR` | No | `0.9` | Multiply inactive token strength by this during consolidation. |
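
To show how the recall-token variables fit together, the sketch below reads them with the defaults from the table and computes the gated bonus. `RecallTokenSettings`, `load_recall_token_settings`, and the accepted boolean spellings are hypothetical; the library's own parsing may differ.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallTokenSettings:
    # Defaults copied from the table above.
    enabled: bool = True
    top_k: int = 5
    threshold: float = 0.3
    strength_threshold: float = 0.1
    score_bonus: float = 0.1
    reinforce_boost: float = 0.1
    decay_factor: float = 0.9


def load_recall_token_settings(
    env: Mapping[str, str] = os.environ,
) -> RecallTokenSettings:
    """Read MEMORY_RECALL_TOKENS_* variables, falling back to the defaults."""
    p = "MEMORY_RECALL_TOKENS_"
    d = RecallTokenSettings()
    truthy = {"1", "true", "yes", "on"}  # assumed spellings
    return RecallTokenSettings(
        enabled=env.get(p + "ENABLED", "true").strip().lower() in truthy,
        top_k=int(env.get(p + "TOP_K", d.top_k)),
        threshold=float(env.get(p + "THRESHOLD", d.threshold)),
        strength_threshold=float(env.get(p + "STRENGTH_THRESHOLD", d.strength_threshold)),
        score_bonus=float(env.get(p + "SCORE_BONUS", d.score_bonus)),
        reinforce_boost=float(env.get(p + "REINFORCE_BOOST", d.reinforce_boost)),
        decay_factor=float(env.get(p + "DECAY_FACTOR", d.decay_factor)),
    )


def gated_bonus(
    token_strength: float, effective_sim: float, s: RecallTokenSettings
) -> float:
    """token_strength * bonus * effective_sim; zero below the strength threshold."""
    if not s.enabled or token_strength < s.strength_threshold:
        return 0.0
    return token_strength * s.score_bonus * effective_sim
```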

## Configuration

Defaults ship in `config.toml`. Override by placing a `memory.toml` in your working directory, or set `MEMORY_CONFIG` to a custom path. Only include keys you want to change:

```toml
[memory]
decay_rate = 0.05

[retrieval]
max_retrievals = 10

[extraction]
max_tokens = 2048
pydantic_ai_model = "ollama:ministral-3"   # pydantic-ai provider:model format
```

### Config sections

| Section | Controls | Key parameters |
|---------|----------|----------------|
| `[database]` | PostgreSQL connection | `url` |
| `[memory]` | Core memory model | `initial_strength`, `consolidation_threshold`, `decay_rate` |
| `[working_memory]` | Working memory capacity | `capacity` (default 7, range 5-9) |
| `[retrieval]` | Retrieval pipeline tuning | `max_retrievals`, `search_limit`, `selection_threshold` |
| `[extraction]` | LLM extraction | `max_tokens`, `max_concepts`, `max_relations`, `pydantic_ai_model` |
| `[embedding]` | Local embedding model | `model`, `dimensions` |
| `[persona]` | Persona fact management | `auto_extract`, `confidence_threshold` |
| `[session]` | Session summaries | `summary_strength`, `summary_max_tokens` |

Full defaults: [`config.toml`](src/recollect/config.toml)

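Alternatively, pass a config path directly in code:

```python
from pathlib import Path

from recollect import CognitiveMemory
from recollect.config import MemoryConfig

# Load settings from an explicit TOML file instead of ./memory.toml
config = MemoryConfig(config_path=Path("./my-config.toml"))
memory = CognitiveMemory(config=config)
```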

## LLM Provider

recollect ships a single provider behind the `LLMProvider` protocol. It routes calls through pydantic-ai's Agent abstraction, giving access to 20+ model backends through a single dependency.

The model string uses pydantic-ai format (`provider:model`). Credentials are read from the environment by the underlying provider (e.g., `ANTHROPIC_API_KEY` for Anthropic, `OLLAMA_BASE_URL` for Ollama).

```python
from recollect.llm.pydantic_ai import PydanticAIProvider

# Model configured via PYDANTIC_AI_MODEL env var, or pass explicitly:
provider = PydanticAIProvider()  # uses PYDANTIC_AI_MODEL
provider = PydanticAIProvider(model="anthropic:claude-sonnet-4-6")
provider = PydanticAIProvider(model="ollama:llama3")
```

### Reasoning models

Models that use internal chain-of-thought (OpenAI o1/o3, Qwen3, DeepSeek-R1) consume thinking tokens from the `max_tokens` budget, so too small a budget can leave nothing for the actual output. If extraction returns empty responses, raise the budget in `memory.toml` or via `MEMORY_EXTRACTION_MAX_TOKENS`:

```toml
# memory.toml
[extraction]
max_tokens = 16384
```

The default is 8192, which provides sufficient headroom for most models.

## Requirements

- Python 3.12.4+
- PostgreSQL 17 with [pgvector](https://github.com/pgvector/pgvector)
- `DATABASE_URL` environment variable

## License

MIT
