Metadata-Version: 2.4
Name: pybiogsp
Version: 1.1.0
Summary: Biological Graph Signal Processing for Spatial Data Analysis
Home-page: https://github.com/BMEngineeR/PyBioGSP
Author: Yuzhou Chang
Author-email: Yuzhou Chang <yuzhou.chang@osumc.edu>
License: GPL-3.0-only
Project-URL: Homepage, https://github.com/BMEngineeR/PyBioGSP
Project-URL: Documentation, https://pybiogsp.readthedocs.io
Project-URL: Source, https://github.com/BMEngineeR/PyBioGSP
Project-URL: Issues, https://github.com/BMEngineeR/PyBioGSP/issues
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: torch>=1.9.0
Requires-Dist: scikit-learn>=0.24.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: viz
Requires-Dist: matplotlib>=3.4.0; extra == "viz"
Requires-Dist: seaborn>=0.11.0; extra == "viz"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0.0; extra == "docs"
Requires-Dist: furo>=2024.1.29; extra == "docs"
Requires-Dist: myst-parser>=2.0.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=2.0.0; extra == "docs"
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0.0; extra == "dev"
Requires-Dist: black>=21.0.0; extra == "dev"
Requires-Dist: flake8>=3.9.0; extra == "dev"
Requires-Dist: mypy>=0.900; extra == "dev"
Requires-Dist: build>=1.2.0; extra == "dev"
Requires-Dist: twine>=5.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# PyBioGSP

**Biological Graph Signal Processing for Spatial Data Analysis**

A Python implementation of Graph Signal Processing (GSP) methods, including the Spectral Graph Wavelet Transform (SGWT), for analyzing spatial patterns in biological data. It uses PyTorch to accelerate matrix decomposition.

Based on Hammond, Vandergheynst, and Gribonval (2011), "Wavelets on Graphs via Spectral Graph Theory", and on the biological application described in Stephanie, Yao, Yuzhou (2024).

## Features

- **Multi-scale analysis** of spatial signals using Spectral Graph Wavelet Transform
- **PyTorch acceleration** for fast eigendecomposition and matrix operations
- **Multiple kernel families**: Mexican Hat, Meyer, and Heat kernels
- **Graph Fourier Transform** (GFT) and Inverse GFT
- **Similarity analysis** using energy-normalized weighted similarity in Fourier domain
- **Simulation tools** for generating test patterns (circles, stripes, checkerboards)
- **Visualization functions** for SGWT decomposition, kernels, and patterns

## Installation

### From PyPI (recommended)

```bash
pip install pybiogsp
```

### From source (development)

```bash
git clone https://github.com/BMEngineeR/PyBioGSP.git
cd PyBioGSP
pip install -e ".[viz]"  # includes matplotlib & seaborn
```

## Quick Start

```python
from pybiogsp import SGWT

# 1. Initialize with a DataFrame containing X, Y coordinates and signal columns
sg = SGWT(data=df, x_col="X", y_col="Y",
          signals=["signal_1", "signal_2"],
          J=3, scaling_factor=5, kernel_type="heat")

# 2. Build spectral graph (k-NN -> Laplacian -> eigendecomposition)
sg.run_spec_graph(k=12, laplacian_type="normalized",
                  length_eigenvalue=900, verbose=False, use_torch=True)

# 3. Forward & inverse SGWT
sg.run_sgwt(use_batch=True, verbose=False, use_torch=True)

# 4a. Compare two signals in wavelet domain
result = sg.run_sgcc("signal_1", "signal_2", return_parts=True)
print(f"Overall similarity:  {result['S']:.4f}")
print(f"Low-freq similarity: {result['c_low']:.4f}")
print(f"High-freq similarity:{result['c_nonlow']:.4f}")

# 4b. Or compute all-pairs SGCC matrix at once (matrix multiplication, no loop)
sgcc_df = sg.run_sgcc_matrix()
print(sgcc_df)

# 5. Energy analysis
energy_df = sg.energy_analysis("signal_1")
print(energy_df)
```
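The Quick Start assumes a DataFrame `df` already exists. As a minimal sketch, one could build such a table with NumPy and pandas as shown here. The grid size and the two disc-shaped signals are illustrative assumptions, not data shipped with the package; only the column names `X`, `Y`, `signal_1`, and `signal_2` come from the Quick Start.

```python
import numpy as np
import pandas as pd

# Hypothetical 30 x 30 grid of spatial spots (an assumption for illustration).
grid_size = 30
xs, ys = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
x = xs.ravel()
y = ys.ravel()


def disc(cx, cy, radius):
    """Binary signal: 1.0 inside a disc centred at (cx, cy), 0.0 elsewhere."""
    return ((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2).astype(float)


df = pd.DataFrame(
    {
        "X": x,
        "Y": y,
        "signal_1": disc(15, 15, 6),   # small disc
        "signal_2": disc(15, 15, 10),  # larger disc with the same centre
    }
)
```

Any table with one row per spot, two coordinate columns, and one column per signal should fit the same call pattern.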

## Workflow

```text
SGWT(data) -> run_spec_graph() -> run_sgwt() -> run_sgcc() / energy_analysis()
```

| Step | Method                                     | What it does                                                       |
| ---- | ------------------------------------------ | ------------------------------------------------------------------ |
| 1    | `SGWT(data, ...)`                        | Initialize with DataFrame, coordinates, signals, kernel parameters |
| 2    | `run_spec_graph(k, laplacian_type, ...)` | Build k-NN graph -> Laplacian -> eigendecomposition                |
| 3    | `run_sgwt(use_batch, ...)`               | Forward SGWT (wavelet coefficients) + inverse (reconstruction)     |
| 4a   | `run_sgcc(signal1, signal2, ...)`        | Energy-weighted cosine similarity -> `c_low`, `c_nonlow`, `S` |
| 4a'  | `run_sgcc_matrix()`                      | All-pairs SGCC matrix via matrix multiplication (no loop needed)   |
| 4b   | `energy_analysis(signal_name)`           | Per-scale energy distribution                                      |

## API Overview

### SGWT Class — Main Entry Point

```python
from pybiogsp import SGWT

sg = SGWT(
    data,                    # DataFrame with coordinates and signals
    x_col="x", y_col="y",   # Coordinate column names
    signals=["sig1"],        # Signal columns (None = auto-detect)
    J=5,                     # Number of wavelet scales
    scaling_factor=2.0,      # Ratio between consecutive scales
    kernel_type="heat",      # "heat" | "mexican_hat" | "meyer"
)

sg.run_spec_graph(k=25, laplacian_type="normalized", length_eigenvalue=None, use_torch=True)
sg.run_sgwt(use_batch=True, use_torch=True)
result = sg.run_sgcc("sig1", "sig2", return_parts=True)   # -> {c_low, c_nonlow, S, ...}
sgcc_df = sg.run_sgcc_matrix()                              # -> p×p DataFrame of all-pairs S
energy = sg.energy_analysis("sig1")                        # -> DataFrame
```
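
The exact SGCC formula is not spelled out in this overview. The sketch below shows one plausible reading of "energy-weighted cosine similarity": cosine similarities of the low-pass and band-pass coefficients are combined using each part's share of the pooled energy. The function name `sgcc_sketch`, the weighting scheme, and the input layout are all assumptions for illustration and may differ from what `run_sgcc` actually computes.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small guard against zero-norm vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def sgcc_sketch(low_a, nonlow_a, low_b, nonlow_b):
    """Hypothetical energy-weighted similarity between two coefficient sets."""
    c_low = cosine(low_a, low_b)
    c_nonlow = cosine(nonlow_a, nonlow_b)
    # Energy of each part, pooled over both signals (an assumed weighting).
    e_low = low_a @ low_a + low_b @ low_b
    e_nonlow = nonlow_a @ nonlow_a + nonlow_b @ nonlow_b
    w_low = e_low / (e_low + e_nonlow)
    S = w_low * c_low + (1.0 - w_low) * c_nonlow
    return {"c_low": c_low, "c_nonlow": c_nonlow, "S": float(S)}


rng = np.random.default_rng(1)
low = rng.normal(size=20)      # stand-in for scaling (low-pass) coefficients
nonlow = rng.normal(size=60)   # stand-in for stacked wavelet coefficients

same = sgcc_sketch(low, nonlow, low, nonlow)       # identical signals
flipped = sgcc_sketch(low, nonlow, low, -nonlow)   # fine detail sign-flipped
```

Under this reading, two signals can agree at coarse scale (`c_low` near 1) while disagreeing in fine detail (`c_nonlow` near -1), and `S` summarizes both according to where the energy sits.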

### Kernel Types

| Kernel            | Scaling (low-pass) | Wavelet (band-pass)    | Best for                            |
| ----------------- | ------------------ | ---------------------- | ----------------------------------- |
| `"heat"`        | `exp(-t)`        | `t * exp(-t)`        | Smooth diffusion patterns (default) |
| `"mexican_hat"` | `exp(-0.5t^2)`   | `t^2 * exp(-0.5t^2)` | Oscillatory / edge patterns         |
| `"meyer"`       | Smooth step        | Smooth band-pass       | Sharp frequency separation          |
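
The closed-form entries in the table can be evaluated directly. The sketch below does so with NumPy, assuming (as in Hammond et al.) that the kernel argument is `t = scale * eigenvalue`; the function names and the example scale are hypothetical, and the Meyer kernel is omitted because the table gives no formula for it.

```python
import numpy as np


def heat_scaling(t):
    return np.exp(-t)                        # low-pass: equals 1 at t = 0


def heat_wavelet(t):
    return t * np.exp(-t)                    # band-pass: equals 0 at t = 0


def mexican_hat_scaling(t):
    return np.exp(-0.5 * t ** 2)


def mexican_hat_wavelet(t):
    return t ** 2 * np.exp(-0.5 * t ** 2)


lam = np.linspace(0.0, 2.0, 201)  # eigenvalue range of a normalized Laplacian
scale = 4.0                       # hypothetical wavelet scale

h = heat_scaling(scale * lam)
g = heat_wavelet(scale * lam)
peak = lam[np.argmax(g)]          # eigenvalue where the band-pass response peaks
```

Because `t * exp(-t)` is maximal at `t = 1`, the heat wavelet at scale `s` responds most strongly near eigenvalue `1 / s`, so larger scales pick out lower graph frequencies.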

### Core & Utility Functions

| Module    | Functions                                                                                                                                                          |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `core`  | `sgwt_get_kernels`, `compute_sgwt_filters`, `sgwt_auto_scales`, `sgwt_forward`, `sgwt_inverse`, `compare_kernel_families`                              |
| `utils` | `cal_laplacian`, `fast_decomposition_lap`, `gft`, `igft`, `cosine_similarity`, `cosine_similarity_matrix`, `find_knee_point`, `build_knn_graph`, `check_kband`, `get_device` |
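
For orientation, the Graph Fourier Transform projects a signal onto the Laplacian eigenvectors, and the inverse transform maps it back. The sketch below shows this on a toy path graph using NumPy only; it does not call the package's `gft` or `igft`, whose signatures are not documented here.

```python
import numpy as np

# Combinatorial Laplacian L = D - A of a path graph on 6 nodes (toy example).
n = 6
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Columns of U are the graph Fourier modes, ordered by ascending eigenvalue.
eigenvalues, U = np.linalg.eigh(L)

signal = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])
signal_hat = U.T @ signal   # forward GFT: coefficients per graph frequency
recovered = U @ signal_hat  # inverse GFT: back to the vertex domain
```

Since `U` is orthogonal, the round trip is exact up to floating-point error and the signal energy is the same in both domains.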

### Simulation

Simulation functions return `Dict[str, DataFrame]` with columns `X`, `Y`, `signal_1`, `signal_2`, except `simulate_checkerboard`, which returns a single DataFrame.

| Function                        | Pattern             | Key Parameters                                                  |
| ------------------------------- | ------------------- | --------------------------------------------------------------- |
| `simulate_multiscale`         | Concentric circles  | `grid_size`, `Ra_seq`, `n_steps`, `n_centers`           |
| `simulate_stripe_patterns`    | Parallel stripes    | `grid_size`, `gap_seq`, `width_seq`, `theta_seq`        |
| `simulate_multiscale_overlap` | Overlapping circles | `grid_size`, `n_centers`, `Ra_seq`, `Rb_seq`            |
| `simulate_moving_circles`     | Approaching circles | `grid_size`, `radius_seq`, `n_steps`, `center_distance` |
| `simulate_checkerboard`       | Grid tiles          | `grid_size`, `tile_size` (returns single DataFrame)         |
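
As a rough picture of what a checkerboard test pattern looks like, the sketch below generates one with NumPy and pandas. It is a stand-in, not `simulate_checkerboard` itself: the function name `checkerboard_sketch`, the default values, and the choice to store the inverted board in `signal_2` are assumptions.

```python
import numpy as np
import pandas as pd


def checkerboard_sketch(grid_size=8, tile_size=2):
    """Hypothetical generator: alternating square tiles on a square grid."""
    xs, ys = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
    x, y = xs.ravel(), ys.ravel()
    # Tiles alternate whenever the tile row index plus tile column index is odd.
    parity = ((x // tile_size) + (y // tile_size)) % 2
    return pd.DataFrame(
        {
            "X": x,
            "Y": y,
            "signal_1": parity.astype(float),
            "signal_2": (1 - parity).astype(float),  # inverted board
        }
    )


board = checkerboard_sketch(grid_size=8, tile_size=2)
```

Smaller `tile_size` values push the pattern's energy toward higher graph frequencies, which makes such boards a convenient probe for the wavelet scales.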

### Visualization

| Function                                                                                                          | Purpose                                                |
| ----------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| `plot_sgwt_decomposition(sg, signal_name)`                                                                      | Wavelet coefficients on spatial grid per scale         |
| `plot_fourier_modes(sg, mode_type)`                                                                             | Graph eigenvectors (`"low"`, `"high"`, `"both"`) |
| `visualize_sgwt_kernels(eigenvalues, ...)`                                                                      | Filter bank plot (scaling + wavelet kernels)           |
| `visualize_similarity_xy(results)`                                                                              | Scatter `c_low` vs `c_nonlow` from batch analysis  |
| `visualize_multiscale`, `visualize_stripe_patterns`, `visualize_moving_circles`, `visualize_checkerboard` | Grid view of simulation patterns                       |

## GPU Acceleration

PyBioGSP auto-selects the best available device (CUDA > MPS > CPU). Pass `use_torch=True` (default) to `run_spec_graph()` and `run_sgwt()` for accelerated computation.
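
The CUDA > MPS > CPU preference can be sketched with standard PyTorch availability checks. This is an illustration of the selection order only, not the package's `get_device`; the helper name `pick_device_name` is hypothetical, and the import guard is there so the snippet also runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # let the sketch run without PyTorch
    torch = None


def pick_device_name():
    """Return "cuda", "mps", or "cpu", preferring them in that order."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device_name = pick_device_name()
```

The returned name can be passed to `torch.device(...)` when PyTorch is available.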

## Applications

- **Spatial transcriptomics**: Analyzing gene expression patterns (Visium, MERFISH, etc.)
- **Multiplexed imaging**: Cell type distributions (CODEX, IMC, etc.)
- **Neuroscience**: Brain connectivity and signal analysis
- **Developmental biology**: Spatial pattern formation
- **Pathology**: Tumor microenvironment analysis

## Copilot Agent Skill

This repo includes a [VS Code Copilot agent skill](https://code.visualstudio.com/docs/copilot/copilot-customization) at `.github/skills/pybiogsp-analysis/` that provides Copilot with full PyBioGSP workflow knowledge — parameter meanings, result interpretation, troubleshooting, and batch analysis patterns. When using Copilot in this workspace, it can guide you through the SGWT pipeline end-to-end.

## References

1. Hammond, D. K., Vandergheynst, P., & Gribonval, R. (2011). Wavelets on graphs via spectral graph theory. *Applied and Computational Harmonic Analysis*, 30(2), 129-150.
2. Stephanie, Yao, Yuzhou (2024). [Biological Application]. *bioRxiv*. doi:10.1101/2024.12.20.629650

## License

GPL-3.0-only

## Author

Yuzhou Chang (yuzhou.chang@osumc.edu)
