Metadata-Version: 2.4
Name: citall
Version: 0.1.0
Summary: Interactive 3D PCA visualization from PyTorch embeddings with Plotly
Author: citall contributors
Project-URL: Homepage, https://example.com/citall
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: plotly>=5.18
Requires-Dist: numpy>=1.24

# citall

`citall` is a Python library for interactive 3D PCA exploration of embeddings.

You provide:
- a `.pt` file with vectors (and optionally ids)
- a `.json` or `.jsonl` file with metadata
- the metadata key used to match points
- an optional image directory whose files are named after the matching id

The output is an interactive Plotly HTML app where:
- you can rotate/zoom/select in 3D
- hover shows quick point details
- click opens a side panel with full metadata and image preview

## Install (uv)

```bash
uv sync
```

Then run commands through the project environment:

```bash
uv run citall --help
```

## Input formats

### Embeddings `.pt`

Supported payloads:
- `torch.Tensor` of shape `(N, D)`
- `dict` containing vectors and optional ids

Default dict key discovery:
- vectors: `vectors`, `embeddings`, `tensor`, `data`
- ids: `ids`, `keys`, `labels`, `names`

You can override via `vector_key` and `id_key`.
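The key discovery described above can be pictured with the following minimal sketch. It is not `citall`'s actual implementation: the helper `find_key`, the constant names, and the "first match wins" order are assumptions made for illustration only.

```python
from typing import Any, Mapping, Optional, Sequence

# Candidate keys, in the order listed above (the precedence is an assumption).
VECTOR_KEYS = ("vectors", "embeddings", "tensor", "data")
ID_KEYS = ("ids", "keys", "labels", "names")


def find_key(
    payload: Mapping[str, Any],
    candidates: Sequence[str],
    override: Optional[str] = None,
) -> Optional[str]:
    """Return the override if given, else the first candidate present in payload."""
    if override is not None:
        if override not in payload:
            raise KeyError(f"key {override!r} not found in payload")
        return override
    for name in candidates:
        if name in payload:
            return name
    return None  # no matching key, e.g. a payload without ids


# A stand-in for the dict loaded from a .pt file (plain lists instead of tensors).
payload = {"embeddings": [[0.1, 0.2], [0.3, 0.4]], "names": ["a", "b"]}
vector_key = find_key(payload, VECTOR_KEYS)
id_key = find_key(payload, ID_KEYS)
```

If your payload uses different names, passing `vector_key` and `id_key` explicitly avoids relying on any discovery order.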

### Metadata `.json` / `.jsonl`

Supported:
- `.jsonl`: one JSON object per line
- `.json`: list of objects
- `.json`: object map from id -> metadata object

You must provide `metadata_key` so `citall` can match embedding ids to metadata rows.
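To make the three supported layouts concrete, here is a hedged sketch of how they could be normalized into a single id-to-row mapping. The function `load_metadata` is hypothetical and is not part of the `citall` API; converting ids to strings is also an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_metadata(path: str, metadata_key: str) -> Dict[str, Dict[str, Any]]:
    """Load .jsonl or .json metadata into a dict keyed by str(id)."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    if p.suffix == ".jsonl":
        # One JSON object per line; blank lines are skipped.
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        data = json.loads(text)
        if isinstance(data, dict):
            # Object map: the map key already is the id.
            return {str(k): v for k, v in data.items()}
        rows = data  # list of objects
    # For row-based layouts, the id is read from metadata_key in each row.
    return {str(row[metadata_key]): row for row in rows}
```

With a mapping like this, matching an embedding id to its metadata is a single dictionary lookup, which is why the key has to be known up front.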

### Images

If `images_dir` is set, `citall` looks for:
- `<images_dir>/<id>`
- `<images_dir>/<id>.png|.jpg|.jpeg|.webp|.gif`
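The lookup above can be sketched as follows. This is an illustration, not the library's code: `resolve_image` is a hypothetical helper, and trying the exact name before the extensions, in the order listed, is an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions listed above; the search order is an assumption.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".gif")


def resolve_image(images_dir: str, sample_id: str) -> Optional[Path]:
    """Return the first existing image file for sample_id, or None."""
    base = Path(images_dir)
    exact = base / sample_id  # <images_dir>/<id>
    if exact.is_file():
        return exact
    for ext in IMAGE_EXTENSIONS:
        candidate = base / f"{sample_id}{ext}"  # <images_dir>/<id>.<ext>
        if candidate.is_file():
            return candidate
    return None  # no image found for this point
```

A helper like this is handy for checking, before building the plot, which ids will have no image preview.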

## Python API

```python
from citall import CitallExplorer

explorer = CitallExplorer.from_files(
    vectors_pt="embeddings.pt",
    metadata_path="metadata.jsonl",
    metadata_key="sample_id",
    images_dir="./images",
    vector_key="embeddings",  # optional
    id_key="ids",             # optional
    strict_metadata=False,
)

explorer.compute_pca(n_components=3)
explorer.save_html(
    output_html="citall_plot.html",
    title="My dataset",
    hover_fields=["label", "source"],
    color_by="label",
    legend_title="Class",
)

print(explorer.summary())
```

## CLI

```bash
uv run citall \
  --vectors embeddings.pt \
  --metadata metadata.jsonl \
  --metadata-key sample_id \
  --images-dir ./images \
  --output citall_plot.html \
  --title "Embedding explorer" \
  --hover-fields label,source \
  --color-by label \
  --legend-title "Class"
```

## Python Script Execution (uv)

```bash
uv run python examples/basic_usage.py
```

## Notes

- PCA is computed with PyTorch, and the embeddings are projected onto the first three principal components.
- Plot rendering uses Plotly.
- HTML output is standalone and shareable.
- Use `color_by` / `--color-by` to split the cloud into colored legend groups.
- For very large datasets, consider downsampling before plotting: a standalone HTML file embeds every point, so file size and rendering time grow with the dataset.
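
One simple way to downsample is a uniform random subset, sketched below. The helper `sample_indices` and the 50,000-point cap are assumptions chosen for illustration; `citall` does not prescribe a limit.

```python
import random
from typing import List


def sample_indices(n_points: int, max_points: int, seed: int = 0) -> List[int]:
    """Return sorted indices of a uniform random subset of at most max_points."""
    if n_points <= max_points:
        return list(range(n_points))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(rng.sample(range(n_points), max_points))


keep = sample_indices(n_points=1_000_000, max_points=50_000)
# Apply the same indices to vectors and ids so they stay aligned, e.g.
# vectors = vectors[keep]; ids = [ids[i] for i in keep]
```

Uniform sampling preserves the overall shape of the cloud but can drop rare classes; sampling per class is an alternative when `color_by` groups are very unbalanced.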

## Roadmap

- filtering and search UI in side panel
- cluster overlays
- notebook widget mode
- optional UMAP/t-SNE backends
