Metadata-Version: 2.4
Name: hippotorch
Version: 0.4.2
Summary: Differentiable episodic memory for reinforcement learning.
Author: Döme Zsolt
Keywords: reinforcement-learning,episodic-memory,pytorch,replay-buffer,rl
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: numpy>=1.21
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.7; extra == "dev"
Requires-Dist: ruff>=0.1.7; extra == "dev"
Requires-Dist: isort>=5.12; extra == "dev"
Requires-Dist: mypy>=1.7; extra == "dev"
Requires-Dist: pre-commit>=3.5; extra == "dev"
Requires-Dist: hypothesis>=6.80; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24; extra == "docs"
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Provides-Extra: envs
Requires-Dist: gymnasium==0.28.1; extra == "envs"
Requires-Dist: minigrid>=3.0.0; extra == "envs"
Requires-Dist: pygame>=2.4.0; extra == "envs"
Provides-Extra: hub
Requires-Dist: huggingface_hub>=0.20; extra == "hub"
Requires-Dist: safetensors>=0.4; extra == "hub"
Requires-Dist: jsonschema>=4.0; extra == "hub"
Provides-Extra: umap
Requires-Dist: umap-learn>=0.5; extra == "umap"
Provides-Extra: robotics
Requires-Dist: gymnasium-robotics>=1.4.2; extra == "robotics"
Requires-Dist: mujoco>=2.3.0; extra == "robotics"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7.4; extra == "faiss"
Provides-Extra: faiss-gpu
Requires-Dist: faiss-gpu>=1.7.4; extra == "faiss-gpu"
Dynamic: license-file

# hippotorch

[![PyPI](https://img.shields.io/pypi/v/hippotorch?logo=pypi&logoColor=white)](https://pypi.org/project/hippotorch/)
[![pipeline status](https://gitlab.com/domezsolt/hippotorch/badges/main/pipeline.svg)](https://gitlab.com/domezsolt/hippotorch/-/pipelines)
[![coverage](https://gitlab.com/domezsolt/hippotorch/badges/main/coverage.svg)](https://gitlab.com/domezsolt/hippotorch/-/pipelines)

> **Differentiable episodic memory for reinforcement learning. Retrieves what matters. Forgets what doesn't.**

[Changelog](CHANGELOG.md)

Hippotorch is a drop-in upgrade for replay buffers. It keeps experiences in a learnable memory so agents can remember rare successes, connect distant cause and effect, and transfer knowledge between similar worlds. Under the hood it uses reward-aware contrastive learning, but you mostly interact with a friendly API.

---

## Highlights

- **Memory that adapts with you.** Dual encoders organize episodes by usefulness instead of mere recency.
- **Semantic + uniform sampling.** A single buffer can surface hard-to-find wins while still covering the full state space.
- **Production-friendly extras.** Hugging Face Hub export, FAISS retrieval, Gymnasium wrappers, and health reports ship in the box.
- **Batteries included.** Dozens of scripts and docs show exactly how to benchmark, visualize, and share results.

If you already converge with a plain replay buffer, keep it. Hippotorch shines when agents forget early lessons, face sparse rewards, or operate in partially observed environments.

---

## Installation

```bash
pip install hippotorch              # minimal setup
pip install "hippotorch[faiss]"     # fast nearest-neighbor retrieval
pip install "hippotorch[envs]"      # Gymnasium helpers + examples
pip install "hippotorch[hub]"       # Hugging Face Hub + safetensors
pip install "hippotorch[umap]"      # projector UMAP export
```

Requirements: Python ≥3.9, PyTorch ≥2.0, NumPy ≥1.21
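
To see which optional extras are importable in the current environment, a small standard-library sketch like the one below may help. The mapping from extra name to import name is an assumption based on the usual module names of those packages, and `available_extras` is a hypothetical helper, not part of hippotorch.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module that the extra installs.
EXTRA_MODULES = {
    "faiss": "faiss",
    "envs": "gymnasium",
    "hub": "huggingface_hub",
    "umap": "umap",
}


def available_extras(modules=EXTRA_MODULES):
    """Return {extra: bool} telling whether each extra's module can be found."""
    return {extra: find_spec(name) is not None for extra, name in modules.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:6s} {'installed' if ok else 'missing'}")
```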

---

## Quick Tour

Create an encoder + memory, add episodes, then mix semantic and uniform samples:

```python
import torch
from hippotorch import Episode, DualEncoder, MemoryStore, HippocampalReplayBuffer

state_dim, action_dim = 4, 1
encoder = DualEncoder(input_dim=state_dim + action_dim + 1, embed_dim=128)
memory = MemoryStore(embed_dim=128, capacity=50_000)
buffer = HippocampalReplayBuffer(memory=memory, encoder=encoder, mixture_ratio=0.3)

states = torch.randn(32, state_dim)
actions = torch.randn(32, action_dim)
rewards = torch.randn(32)
buffer.add_episode(Episode(states=states, actions=actions, rewards=rewards))

# Query-aware sampling
query_state = torch.cat([states[0], torch.zeros(action_dim), rewards[:1]])
batch = buffer.sample(batch_size=64, query_state=query_state, top_k=5)

# Sleep/consolidate occasionally
metrics = buffer.consolidate(steps=50, batch_size=64, report_quality=True)
print(metrics["loss"])
```
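
The `mixture_ratio=0.3` argument suggests that each batch blends retrieved items with uniformly drawn ones. The source does not say which side the ratio applies to, so the sketch below assumes it is the semantic fraction. `mixed_indices` is a hypothetical helper in plain Python that only illustrates the idea; it is not hippotorch's implementation.

```python
import random


def mixed_indices(scores, batch_size, mixture_ratio, rng=None):
    """Pick a semantic share from the highest scores and the rest uniformly.

    scores: one similarity score per stored item (higher = more relevant).
    mixture_ratio: ASSUMED to be the fraction of the batch taken semantically.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    rng = rng or random.Random()
    n_semantic = min(round(batch_size * mixture_ratio), len(scores))
    # Semantic part: indices of the top-scoring items.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    semantic = ranked[:n_semantic]
    # Uniform part: sample with replacement from the whole store for coverage.
    uniform = [rng.randrange(len(scores)) for _ in range(batch_size - n_semantic)]
    return semantic, uniform


scores = [0.1, 0.9, 0.4, 0.8, 0.2]
semantic, uniform = mixed_indices(
    scores, batch_size=10, mixture_ratio=0.3, rng=random.Random(0)
)
print(semantic)      # [1, 3, 2]: the three highest-scoring indices
print(len(uniform))  # 7: the remaining slots
```

Drawing the uniform share with replacement keeps the sketch short; a real buffer may sample without replacement or deduplicate against the semantic share.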

Rolling with Stable Baselines 3 or Gymnasium? Wrap your existing replay buffer with `SB3ReplayBufferWrapper` or the `HippotorchMemoryWrapper` and keep the rest of your pipeline untouched.

Need hyperparameter guidance? Start with `docs/hyperparameter_guide.md` for recommended ranges, then see `docs/diagnostics.md` for health checks and `docs/curriculum.md` for training tips.

---

## Everyday Tools

### Recall While Acting
- Use the lightweight read API: `from hippotorch import query`.
- Pipe `query(..., top_k=5)` results into policies or logging code.
- The Gymnasium adapter emits dict observations so SB3 policies can consume retrieval features alongside pixels.
- Examples: `examples/query_inference_demo.py`, `examples/minigrid_memory_wrapper.py`.

### Portable Brains
- Share trained memories with `push_memory_to_hub` / `load_memory_from_hub`.
- Choose local folders for offline passes or the Hugging Face Hub for team-wide reuse.
- `scripts/hub_roundtrip_smoke.py` is a 30-second sanity check.
- Docs: `docs/hub.md`.

### Glass-Box Diagnostics
- `buffer.health_report()` returns retrievability, staleness, collapse indicators, and alignment scores.
- Log with `report.to_tensorboard(writer, step)` or `report.to_wandb(run)`.
- See `docs/diagnostics.md` for visuals.
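
As a rough intuition for what a collapse indicator can measure, the sketch below computes the mean per-dimension spread of a set of embeddings; values near zero mean every embedding looks alike. This is a common heuristic and an assumption here: the source does not state how `health_report()` computes its indicators, and `collapse_score` is a hypothetical name.

```python
from statistics import pstdev


def collapse_score(embeddings):
    """Mean per-dimension standard deviation; values near 0 suggest collapse."""
    dims = list(zip(*embeddings))  # transpose: one tuple per embedding dimension
    return sum(pstdev(d) for d in dims) / len(dims)


healthy = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed = [[0.5, 0.5]] * 4
print(collapse_score(healthy))    # clearly above zero
print(collapse_score(collapsed))  # 0.0
```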

### Batch Retrieval for Low Latency
- `buffer.query_batch(query_vecs, top_k=K)` handles `[B,T,D]` tensors in one go.
- Matches single-query results without a Python loop.
- Works with both torch and FAISS backends.
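
To illustrate what batched retrieval over `[B,T,D]` queries can look like, here is a minimal sketch in plain PyTorch using cosine similarity and `torch.topk`. It is not hippotorch's implementation: `batched_top_k` and the flat `keys` matrix are hypothetical, and the choice of cosine similarity is an assumption.

```python
import torch
import torch.nn.functional as F


def batched_top_k(queries, keys, top_k):
    """Cosine top-k for every query in a [B, T, D] tensor against [N, D] keys.

    Returns (scores, indices), each shaped [B, T, top_k].
    """
    q = F.normalize(queries, dim=-1)  # [B, T, D], unit-length queries
    k = F.normalize(keys, dim=-1)     # [N, D], unit-length keys
    sims = q @ k.T                    # [B, T, N], one matmul for all queries
    return torch.topk(sims, k=top_k, dim=-1)


torch.manual_seed(0)
keys = torch.randn(1000, 16)
queries = torch.randn(4, 8, 16)
scores, indices = batched_top_k(queries, keys, top_k=5)
print(indices.shape)  # torch.Size([4, 8, 5])

# Compare one batch element against a single-query call.
_, single = batched_top_k(queries[2:3, 3:4], keys, top_k=5)
print(torch.equal(single[0, 0], indices[2, 3]))
```

The single matrix multiply replaces a Python loop over `B * T` queries, which is where most of the latency saving would come from in a setup like this.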

### Multi-GPU Encoding
- Set `multi_gpu=True` on `DualEncoder`/`VisualEpisodeEncoder` or `Consolidator` to enable `torch.nn.DataParallel` when multiple GPUs are present.
- Snapshots handle `module.` prefixes transparently; save/load works across single- and multi-GPU runs.
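
`torch.nn.DataParallel` stores the wrapped model under a `module` attribute, so its state-dict keys gain a `module.` prefix. The sketch below shows one common way to normalise such keys so a checkpoint loads in either setup. `strip_module_prefix` is a hypothetical helper that works on any plain mapping; hippotorch's own snapshot code may handle this differently.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Keys as saved from a DataParallel-wrapped model (values stand in for tensors).
saved = {"module.encoder.weight": 1, "module.encoder.bias": 2}
print(strip_module_prefix(saved))
# {'encoder.weight': 1, 'encoder.bias': 2}
```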

---

## Ready-to-Run Samples

Pick a script, set a seed, and you get a reproducible snapshot:

- **Benchmarks & diagnostics**
  - Retrieval perf: `python scripts/bench_retrieval.py --sizes 10000 100000`
  - Visualization: `python scripts/export_projector_embeddings.py --snapshot run.pt`
  - Retrieval heatmap: `python scripts/retrieval_heatmap.py --memory-checkpoint ...`
- **Environments**
  - CartPole smoke: `bash scripts/quick_cartpole.sh`
  - Corridor curriculum/oracle: `bash scripts/corridor_curriculum.sh`, `bash scripts/corridor_oracle_zn.sh`
  - MiniGrid sweeps: `python scripts/minigrid_memory_benchmark.py --steps 8000 --seeds 3`
  - FetchReach benchmark: `bash scripts/fetchreach_benchmark.sh`
  - HER comparison (FetchReach): `bash scripts/her_comparison.sh`
  - Intrinsic curiosity example: `python -m examples.intrinsic_demo --episodes 20`
- **Ablations & studies**
  - Rank-weighted consolidation: `bash scripts/run_rank_ablation.sh`
  - Consolidation micro bench: `bash scripts/run_consolidation_micro.sh`
  - Visual MiniGrid clustering: `python -m examples.minigrid_visual --steps 2000`

All scripts keep runtime under a couple of minutes unless stated otherwise. Longer jobs (corridor oracle full run, curriculum sweeps) note their expected duration in the script header.

---

## Learn More

- [docs/benchmarks.md](docs/benchmarks.md) – retrieval setups, FAISS parity, and profiling tips.
- [docs/curriculum.md](docs/curriculum.md) – how to stage corridor tasks and measure regret.
- [docs/usage.md](docs/usage.md) – wrappers, segmenters, and rollout recipes.
- [docs/hub.md](docs/hub.md) – how to move memories between machines or teammates.
- Getting started notebook: `docs/tutorials/getting_started.ipynb`
- API Reference (MkDocs): build locally with `make docs` and open `site/index.html` (source: [docs/api.md](docs/api.md)).
- Sparse Atari pilot (Montezuma’s Revenge): `bash scripts/atari_pilot.sh` or `python -u scripts/atari_sparse_pilot.py --env ALE/MontezumaRevenge-v5 --steps 10000`. Requires optional extras: `pip install "gymnasium[atari]" autorom`, then `AutoROM --accept-license`. See `docs/atari_pilot.md`.

Problems or ideas? Open an issue or send a Merge Request on GitLab.
