Metadata-Version: 2.4
Name: histoseg
Version: 0.1.7
Summary: ...
Author-email: Taobo Hu <taobo.hu@scilifelab.se>
License: Required Notice: Copyright (c) 2025 SPATHO AB.
        Required Notice: Copyright (c) 2025 Mengping Long, Taobo Hu, Mats Nilsson.
        Required Notice: Any commercial use requires a separate commercial license from SPATHO AB.
        
        # PolyForm Noncommercial License 1.0.0
        
        <https://polyformproject.org/licenses/noncommercial/1.0.0>
        
        ## Acceptance
        
        In order to get any license under these terms, you must agree to them as both strict obligations and conditions to all your licenses.
        
        ## Copyright License
        
        The licensor grants you a copyright license for the software to do everything you might do with the software that would otherwise infringe the licensor's copyright in it for any permitted purpose. However, you may only distribute the software according to [Distribution License](#distribution-license) and make changes or new works based on the software according to [Changes and New Works License](#changes-and-new-works-license).
        
        ## Distribution License
        
        The licensor grants you an additional copyright license to distribute copies of the software. Your license to distribute covers distributing the software with changes and new works permitted by [Changes and New Works License](#changes-and-new-works-license).
        
        ## Notices
        
        You must ensure that anyone who gets a copy of any part of the software from you also gets a copy of these terms or the URL for them above, as well as copies of any plain-text lines beginning with `Required Notice:` that the licensor provided with the software. For example:
        
        > Required Notice: Copyright Yoyodyne, Inc. (http://example.com)
        
        ## Changes and New Works License
        
        The licensor grants you an additional copyright license to make changes and new works based on the software for any permitted purpose.
        
        ## Patent License
        
        The licensor grants you a patent license for the software that covers patent claims the licensor can license, or becomes able to license, that you would infringe by using the software.
        
        ## Noncommercial Purposes
        
        Any noncommercial purpose is a permitted purpose.
        
        ## Personal Uses
        
        Personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, amateur pursuits, or religious observance, without any anticipated commercial application, is use for a permitted purpose.
        
        ## Noncommercial Organizations
        
        Use by any charitable organization, educational institution, public research organization, public safety or health organization, environmental protection organization, or government institution is use for a permitted purpose regardless of the source of funding or obligations resulting from the funding.
        
        ## Fair Use
        
        You may have "fair use" rights for the software under the law. These terms do not limit them.
        
        ## No Other Rights
        
        These terms do not allow you to sublicense or transfer any of your licenses to anyone else, or prevent the licensor from granting licenses to anyone else. These terms do not imply any other licenses.
        
        ## Patent Defense
        
        If you make any written claim that the software infringes or contributes to infringement of any patent, your patent license for the software granted under these terms ends immediately. If your company makes such a claim, your patent license ends immediately for work on behalf of your company.
        
        ## Violations
        
        The first time you are notified in writing that you have violated any of these terms, or done anything with the software not covered by your licenses, your licenses can nonetheless continue if you come into full compliance with these terms, and take practical steps to correct past violations, within 32 days of receiving notice. Otherwise, all your licenses end immediately.
        
        ## No Liability
        
        As far as the law allows, the software comes as is, without any warranty or condition, and the licensor will not be liable to you for any damages arising out of these terms or the use or nature of the software, under any kind of legal claim.
        
        ## Definitions
        
        The **licensor** is the individual or entity offering these terms, and the **software** is the software the licensor makes available under these terms.
        
        **You** refers to the individual or entity agreeing to these terms.
        
        **Your company** is any legal entity, sole proprietorship, or other kind of organization that you work for, plus all organizations that have control over, are under the control of, or are under common control with that organization. **Control** means ownership of substantially all the assets of an entity, or the power to direct its management and policies by vote, contract, or otherwise. Control can be direct or indirect.
        
        **Your licenses** are all the licenses granted to you for the software under these terms.
        
        **Use** means anything you do with the software requiring one of your licenses.
        
Project-URL: Homepage, https://github.com/hutaobo/HistoSeg
Project-URL: Documentation, https://histoseg.readthedocs.io
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.rst
Dynamic: license-file

<div align="center">

# HistoSeg

<p align="center">
  <a href="https://pypi.org/project/histoseg/"><img alt="PyPI" src="https://img.shields.io/pypi/v/histoseg.svg"></a>
  <a href="https://histoseg.readthedocs.io/en/latest/"><img alt="Docs" src="https://readthedocs.org/projects/histoseg/badge/?version=latest"></a>
  <a href="https://github.com/hutaobo/HistoSeg/actions/workflows/publish.yml"><img alt="Publish to PyPI" src="https://github.com/hutaobo/HistoSeg/actions/workflows/publish.yml/badge.svg?branch=master"></a>
  <a href="https://polyformproject.org/licenses/noncommercial/1.0.0/"><img alt="License: PolyForm Noncommercial 1.0.0" src="https://img.shields.io/badge/License-PolyForm--Noncommercial%201.0.0-blue.svg"></a>
</p>

</div>

HistoSeg is a Python toolkit for **spatial transcriptomics segmentation / geometry extraction**.

The current focus is **Pattern1 isoline (0.5)** contour generation from cell clusters (e.g., 10x Xenium GraphClust output):

- Pick a set of “target clusters” (Pattern1)
- Fit a KNN regressor to estimate *P(target)* over space
- Smooth the probability field
- Extract a contour (isoline) at **level = 0.5**
- Save contour vertices and a quick preview plot
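
The KNN step above can be illustrated on toy data. The sketch below is not HistoSeg's implementation: it is a brute-force, pure-Python stand-in, and the names `knn_probability`, `points`, and `labels` as well as the coordinates are made up for illustration. Each cell is labeled 1 if its cluster belongs to the target set and 0 otherwise; the estimate at a query location is the mean label of its k nearest cells, so the 0.5 isoline runs where target and non-target neighbors balance.

```python
import math

def knn_probability(query, points, labels, k=3):
    """Estimate P(target) at `query` as the mean label of its k nearest points."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Brute force: rank every point by Euclidean distance to the query.
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    nearest = order[:k]
    return sum(labels[i] for i in nearest) / k

# Toy data (hypothetical): target cells on the left, other cells on the right.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (4.0, 0.0), (4.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

p_left = knn_probability((0.5, 0.5), points, labels, k=3)   # inside the target group
p_right = knn_probability((4.5, 0.5), points, labels, k=3)  # inside the other group
p_mid = knn_probability((2.5, 0.5), points, labels, k=4)    # halfway between the groups
print(p_left, p_right, p_mid)
```

In this toy layout the midpoint sees two target and two non-target neighbors, which is exactly the situation the 0.5 level is meant to trace.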

---

## Quick links

- **Documentation:** https://histoseg.readthedocs.io/en/latest/
- **Source code:** https://github.com/hutaobo/HistoSeg
- **Issue tracker:** https://github.com/hutaobo/HistoSeg/issues

> ⚠️ **License note**
>
> This project is distributed under the **PolyForm Noncommercial 1.0.0** license.
> **Academic and other noncommercial use is permitted.**
> **Any commercial use requires a separate commercial license from SPATHO AB.**
> See `LICENSE` for the full terms.

---

## Installation

### Install from PyPI (recommended)

```bash
pip install -U histoseg
```

### Install from source (for development)

```bash
git clone https://github.com/hutaobo/HistoSeg.git
cd HistoSeg
pip install -U pip
pip install -e .
```

### Dependencies

The Pattern1 isoline workflow uses:

- numpy, pandas
- scipy
- scikit-learn
- matplotlib
- a Parquet engine (**pyarrow is recommended**)

If you run into missing imports, install them explicitly:

```bash
pip install -U numpy pandas pyarrow scipy scikit-learn matplotlib
```

Optional:

- Hugging Face downloader: `pip install -U huggingface_hub`

---

## Tutorial: Pattern1 isoline (0.5)

### What you need (inputs)

The isoline workflow reads the following files (the third is optional):

1. `clusters.csv`
   - Typically from GraphClust: `analysis/clustering/gene_expression_graphclust/clusters.csv`
   - Must contain columns: `Barcode`, `Cluster`

2. `cells.parquet`
   - A cell-level table with spatial coordinates (x/y-like columns)
   - Must contain at least:
     - coordinate columns (e.g. `x`/`y` or `x_centroid`/`y_centroid`)
     - an id column that can be aligned with `clusters.csv:Barcode` (the code tries several common column names)

3. `tissue_boundary.csv` (optional but recommended if you enable synthetic background)
   - Must contain columns `x,y` **or** `X,Y`
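
Before launching a run, it can help to check the CSV headers against the requirements above. The following is a minimal sketch using only the standard library; the helper names (`read_header`, `check_clusters_csv`, `check_boundary_csv`) are hypothetical and not part of HistoSeg. It does not inspect `cells.parquet`, which would need a Parquet engine such as pyarrow.

```python
import csv

REQUIRED_CLUSTER_COLUMNS = {"Barcode", "Cluster"}

def read_header(path):
    """Return the column names from the first row of a CSV file."""
    # utf-8-sig tolerates a leading byte-order mark, which some exports add.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return next(csv.reader(handle), [])

def check_clusters_csv(path):
    """Return the required columns that are missing from clusters.csv."""
    return sorted(REQUIRED_CLUSTER_COLUMNS - set(read_header(path)))

def check_boundary_csv(path):
    """Return True if the file has `x,y` or `X,Y` columns."""
    header = set(read_header(path))
    return {"x", "y"} <= header or {"X", "Y"} <= header
```

An empty list from `check_clusters_csv` means both required columns are present.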

### What you get (outputs)

By default, the pipeline writes into `out_dir`:

- `params.json` — all parameters + inferred join columns
- `pattern1_isoline_<level>_<i>.npy` — contour vertices (Nx2 arrays)
- `pattern1_isoline_<level>.png` — quick preview plot
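
To pick the outputs up again in a later session, something like the sketch below can be used. It assumes only the file naming listed above; the helper name `collect_outputs` is hypothetical, and the wildcard patterns avoid guessing how `<level>` is formatted in the file names.

```python
import json
from pathlib import Path

def collect_outputs(out_dir):
    """Gather params, contour files, and preview images found in out_dir."""
    out_dir = Path(out_dir)
    params_path = out_dir / "params.json"
    params = None
    if params_path.exists():
        params = json.loads(params_path.read_text(encoding="utf-8"))
    return {
        "params": params,
        # One .npy file per contour, sorted by file name.
        "contours": sorted(out_dir.glob("pattern1_isoline_*.npy")),
        "previews": sorted(out_dir.glob("pattern1_isoline_*.png")),
    }
```

Each path in `"contours"` can then be read with `numpy.load(path)` to get the Nx2 vertex array.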

---

## Quickstart

### One-liner (from a Hugging Face dataset repo)

This follows the example notebook in `examples/contour_generation_pattern1_from_hf.ipynb`.

```python
# pip install -U histoseg
# pip install -U huggingface_hub pandas pyarrow numpy scipy scikit-learn matplotlib

from histoseg import run_pattern1_isoline_from_hf

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

result = run_pattern1_isoline_from_hf(
    repo_id="hutaobo/output-XETG00082_C105",
    revision="main",  # or a commit hash for strict reproducibility
    out_dir="outputs/pattern1_isoline0p5_from_graphclust",
    pattern1_clusters=PATTERN1,

    # Defaults are intentionally exposed for tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

print("Outputs folder:", result.out_dir)
print("Preview image:", result.preview_png)
print("Contours:", len(result.contours))
```

### Run on local files

```python
from histoseg import Pattern1IsolineConfig, run_pattern1_isoline

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

cfg = Pattern1IsolineConfig(
    clusters_csv="/path/to/analysis/clustering/gene_expression_graphclust/clusters.csv",
    cells_parquet="/path/to/cells.parquet",
    tissue_boundary_csv="/path/to/tissue_boundary.csv",
    out_dir="outputs/pattern1_isoline0p5",
    pattern1_clusters=PATTERN1,

    # Optional tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

result = run_pattern1_isoline(cfg)
print(result)
```

---

## How it works (workflow overview)

```mermaid
flowchart TD
  A["clusters.csv<br/>Barcode/Cluster"] --> C["Align barcodes<br/>with cells.parquet"]
  B["cells.parquet<br/>x/y + id-like column"] --> C
  C --> D["Select target clusters<br/>(Pattern1)"]
  D --> E["Sample background points<br/>(other cells)"]
  F["tissue_boundary.csv"] --> G["Generate synthetic background<br/>(optional)"]
  G --> E
  D --> H["KNN regression<br/>predict P(target)"]
  E --> H
  H --> I["Predict on mesh grid"]
  I --> J["Gaussian smoothing"]
  J --> K["Mask by tissue<br/>(nearest-cell threshold)"]
  K --> L["Extract isoline<br/>level = 0.5"]
  L --> M["Filter loops<br/>min_cells_inside"]
  M --> N["Save params.json<br/>+ contours .npy<br/>+ preview .png"]
```
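
The "Filter loops" step can be pictured as counting how many cells fall inside each closed contour and dropping loops that enclose too few. The sketch below shows one common way to do this (a ray-casting point-in-polygon test). It is an illustration under that assumption, not necessarily how HistoSeg implements the filter, and `point_in_polygon`, `keep_loops`, and the toy coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside the closed `polygon`."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def keep_loops(loops, cells, min_cells_inside):
    """Keep only the loops that enclose at least `min_cells_inside` cells."""
    kept = []
    for loop in loops:
        count = sum(point_in_polygon(cell, loop) for cell in cells)
        if count >= min_cells_inside:
            kept.append(loop)
    return kept

# Toy data: a large loop holding three cells and a tiny loop holding one.
big_loop = [(0, 0), (10, 0), (10, 10), (0, 10)]
small_loop = [(20, 20), (21, 20), (21, 21), (20, 21)]
cells = [(1, 1), (5, 5), (9, 2), (20.5, 20.5)]

kept = keep_loops([big_loop, small_loop], cells, min_cells_inside=2)
```

With `min_cells_inside=2` only the large loop survives, which is why lowering this threshold can recover small contours.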

---

## Troubleshooting & tuning

If no contour is found, try:

- Decrease `min_cells_inside` (e.g. 10 → 3)
- Increase `smooth_sigma` (e.g. 5 → 8)
- Increase `knn_k` (e.g. 30 → 50)

If a run is slow or uses too much memory, reduce `grid_n` (the `grid_n=1200` used in the examples above can be heavy).
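
One way to apply the no-contour suggestions systematically is a small retry helper. The sketch below rests on two assumptions that should be checked against the installed version: that `Pattern1IsolineConfig` is a regular dataclass (so `dataclasses.replace` works on it), and that a run with no contour returns an empty `result.contours` rather than raising. The names `run_with_relaxed_params` and `RELAXATION_STEPS` are hypothetical.

```python
from dataclasses import replace

def run_with_relaxed_params(run, cfg, overrides_sequence):
    """Call run(cfg); while no contours come back, retry with the next overrides."""
    result = run(cfg)
    for overrides in overrides_sequence:
        if len(result.contours) > 0:
            break
        # Copy the config with the given fields changed; earlier changes are kept.
        cfg = replace(cfg, **overrides)
        result = run(cfg)
    return cfg, result

# Example values taken from the suggestions above, applied one after another.
RELAXATION_STEPS = [
    {"min_cells_inside": 3},
    {"smooth_sigma": 8.0},
    {"knn_k": 50},
]
```

Usage would look like `cfg, result = run_with_relaxed_params(run_pattern1_isoline, cfg, RELAXATION_STEPS)`; the returned `cfg` records which settings finally produced a contour.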

---

## API reference (high-level)

### Pattern1 isoline

- `Pattern1IsolineConfig`  
  Dataclass holding all parameters and input paths.

- `run_pattern1_isoline(cfg) -> Pattern1IsolineResult`  
  Runs the full pipeline on local files.

- `run_pattern1_isoline_from_hf(repo_id, revision="main", ...) -> Pattern1IsolineResult`  
  Convenience wrapper that downloads required files from a Hugging Face *dataset repo* and then runs the pipeline.

### Hugging Face I/O helpers

- `download_xenium_outs(repo_id, revision="main", clusters_relpath=..., cache_dir=None)`  
  Downloads `cells.parquet`, `tissue_boundary.csv`, and the specified `clusters.csv` from a dataset repo.

### SFPlot utilities (legacy / optional)

The package also bundles a small subset of SFPlot-style utilities and re-exports the following functions:

- `compute_cophenetic_distances_from_df(df, ...)`
- `plot_cophenetic_heatmap(matrix, ...)`

---

## GUI (experimental)

A GUI entry point is configured as:

```bash
histoseg-gui
```

Notes:

- The current GUI code path is still in flux and may require extra dependencies (e.g., Pillow) and/or an external `sfplot` installation.
- For production workflows, prefer the Python API shown above.

---

## Contributing

Issues and pull requests are welcome.

When reporting a bug, please include:

- OS + Python version
- `histoseg` version
- Minimal reproducible code (or a small input subset)
- Expected vs. actual behavior
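
The first two items can be collected with the standard library alone. This is a small convenience sketch; the function name `environment_report` is hypothetical and not part of HistoSeg.

```python
import platform
import sys
from importlib import metadata

def environment_report(package="histoseg"):
    """Return OS, Python version, and the installed version of `package`."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        package: version,
    }

# Print the report so it can be pasted into a bug report.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```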

---

## License

This project is distributed under the **PolyForm Noncommercial 1.0.0** license.
Noncommercial use (including academic research) is permitted.
Any commercial use requires a separate commercial license from **SPATHO AB**.
See `LICENSE` for details.
