Metadata-Version: 2.4
Name: lineage-simulator
Version: 0.1.0
Summary: A Single-Cell Lineage Simulator with Fate-Aware Gene Expression
Author-email: "Haizhi (Gary) Lai" <hglai.png@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anndata>=0.10
Requires-Dist: h5py>=3.15.1
Requires-Dist: llvmlite>=0.43
Requires-Dist: numba>=0.60
Requires-Dist: numpy<2.4
Requires-Dist: pandas>=3.0.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: scanpy>=1.9.8
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: scipy>=1.17.0
Requires-Dist: tabulate>=0.9.0
Requires-Dist: umap-learn>=0.5
Provides-Extra: download
Requires-Dist: gdown>=5.2.1; extra == "download"
Dynamic: license-file

# LineageSim

A simulator for single-cell lineage tracing data that generates fate-aware gene expression: progenitor cells carry transcriptomic signatures of their descendants' fates, a key property of real developmental data.

## Installation

Requires Python >= 3.11.

```bash
pip install lineage-simulator
```

For development, install the dependencies from a clone of the repository with [uv](https://docs.astral.sh/uv/):

```bash
uv sync
```

## CLI Usage

```bash
# See all available options
lineagesim generate --help

# Generate with default parameters
lineagesim generate -o dataset.h5ad

# Set the simulation parameters explicitly
lineagesim generate \
    --n-cells 4096 --n-genes 500 --n-cif 32 \
    --beta 0.3 --sigma 0.5 --commit-depth 7 \
    --gene-effect-prob 0.3 --scale-s 1.0 \
    --protocol UMI --skip-technical-noise \
    --seed 42 -o dataset.h5ad
```
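
For parameter sweeps it can be convenient to build these command lines programmatically. The following is a minimal sketch, not part of the package: `build_generate_command` is a hypothetical helper, and it assumes that every flag is simply the lowercase, dash-separated form of the keyword name, as in the example above.

```python
import shlex


def build_generate_command(output, **params):
    """Build an argument list for `lineagesim generate`.

    Hypothetical helper: keyword names are turned into lowercase,
    dash-separated flags (n_cells -> --n-cells), matching the flags
    shown in the CLI example. Boolean values become bare switches.
    """
    cmd = ["lineagesim", "generate"]
    for name, value in params.items():
        flag = "--" + name.lower().replace("_", "-")
        if isinstance(value, bool):
            if value:  # emit the switch only when it is enabled
                cmd.append(flag)
        else:
            cmd.extend([flag, str(value)])
    cmd.extend(["-o", output])
    return cmd


# One command per value of beta, with everything else held fixed.
commands = [
    build_generate_command(f"beta_{beta}.h5ad", n_cells=4096, beta=beta, seed=42)
    for beta in (0.1, 0.3, 0.5)
]
for cmd in commands:
    print(shlex.join(cmd))
```

With the package installed, each list can be passed to `subprocess.run(cmd, check=True)` to run the sweep.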

## Python Quickstart

```python
from lineagesim.simulator import LineageSim

sim = LineageSim()
tree, expression = sim.generate_dataset(
    n_cells=8192,
    n_genes=1000,
    beta=0.2,
)
# tree: dict mapping cell IDs to metadata (fate, parent, depth, etc.)
# expression: (n_cells, n_genes) array of observed counts
```
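
As a rough illustration of working with the returned `tree`, the sketch below walks a small hand-written stand-in rather than real simulator output. The key names (`"fate"`, `"parent"`, `"depth"`) are assumed from the comment above, and using `None` for the root's parent and for uncommitted fates is also an assumption; check the actual structure before relying on it.

```python
from collections import Counter

# Hypothetical stand-in for the `tree` returned by generate_dataset().
# Key names and the use of None are assumptions, not the documented schema.
tree = {
    0: {"parent": None, "depth": 0, "fate": None},
    1: {"parent": 0, "depth": 1, "fate": "A"},
    2: {"parent": 0, "depth": 1, "fate": "B"},
    3: {"parent": 1, "depth": 2, "fate": "A"},
    4: {"parent": 1, "depth": 2, "fate": "A"},
}


def ancestors(tree, cell_id):
    """Return ancestor IDs from the cell's parent up to the root."""
    chain = []
    parent = tree[cell_id]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = tree[parent]["parent"]
    return chain


# Count how many cells carry each fate label, skipping unlabeled cells.
fate_counts = Counter(
    meta["fate"] for meta in tree.values() if meta["fate"] is not None
)

print(ancestors(tree, 4))
print(fate_counts)
```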

## Parameters

`generate_dataset()` accepts the following parameters:

| Parameter              | Type    | Default | Description                                           |
| ---------------------- | ------- | ------- | ----------------------------------------------------- |
| `n_cells`              | `int`   | 8192    | Number of cells to simulate                           |
| `n_genes`              | `int`   | 3000    | Number of genes to simulate                           |
| `n_CIF`                | `int`   | 32      | Number of Cell Identity Factors (latent dimensions)   |
| `sigma`                | `float` | 0.2     | Standard deviation of the Brownian motion noise       |
| `beta`                 | `float` | 0.2     | Fate signal strength                                  |
| `gene_effect_prob`     | `float` | 0.3     | Probability that a CIF affects a gene                 |
| `scale_s`              | `float` | 1.0     | Transcription rate scaling factor                     |
| `protocol`             | `str`   | `"UMI"` | Sequencing protocol (`"UMI"` or `"nonUMI"`)           |
| `skip_technical_noise` | `bool`  | `False` | If `True`, return true counts without technical noise |
| `commit_depth`         | `int`   | 7       | Depth at which fate commitment occurs                 |

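One way to keep a parameter set together and catch obvious mistakes before a long run is a small dataclass that mirrors the table. This is a sketch only: `SimulationParams` is a hypothetical name, the defaults are copied from the table above, and the validation rules are assumptions rather than the checks the package itself performs.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimulationParams:
    """Hypothetical container mirroring the parameter table."""

    n_cells: int = 8192
    n_genes: int = 3000
    n_CIF: int = 32
    sigma: float = 0.2
    beta: float = 0.2
    gene_effect_prob: float = 0.3
    scale_s: float = 1.0
    protocol: str = "UMI"
    skip_technical_noise: bool = False
    commit_depth: int = 7

    def __post_init__(self):
        # Assumed sanity checks; the package may validate differently.
        if self.protocol not in ("UMI", "nonUMI"):
            raise ValueError(f"unknown protocol: {self.protocol!r}")
        if not 0.0 <= self.gene_effect_prob <= 1.0:
            raise ValueError("gene_effect_prob must lie in [0, 1]")
        for name in ("n_cells", "n_genes", "n_CIF"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


# Override only what differs from the defaults.
params = SimulationParams(n_cells=4096, beta=0.3)
kwargs = asdict(params)
print(kwargs)
```

The resulting dictionary uses the same names as the table, so it could be passed as `sim.generate_dataset(**kwargs)`.
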
## License

MIT
