Metadata-Version: 2.4
Name: cnsbm
Version: 1.0.0
Summary: Categorical Block Modelling For Primary and Residual Copy Number Variation
Author-email: Kevin Lam <kevin.lam@stat.ubc.ca>
License-Expression: MIT
Project-URL: Homepage, https://github.com/lamke07/CNSBM
Project-URL: Documentation, https://github.com/lamke07/CNSBM#readme
Project-URL: Repository, https://github.com/lamke07/CNSBM
Project-URL: Bug Tracker, https://github.com/lamke07/CNSBM/issues
Project-URL: Paper, https://arxiv.org/abs/2506.22963
Keywords: machine learning,stochastic block model,copy number variation,bioinformatics,clustering,variational inference
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: seaborn>=0.11.0
Requires-Dist: jax>=0.3.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Provides-Extra: gpu
Requires-Dist: jax[cuda]>=0.3.0; extra == "gpu"
Dynamic: license-file

# CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
This repository contains the implementation of the model described in the paper: **CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation** ([arXiv:2506.22963](https://arxiv.org/abs/2506.22963)), to appear in MLCB 2025.

# Note on NumPy Version
If you plan to use the saved pickle files in an environment with NumPy < 2, install `numpy<2` (for example, `pip install "numpy<2"`) for compatibility. Otherwise, NumPy 2 or newer is fine.

# Installation

## Option 1: Install from PyPI (Recommended)

```bash
pip install cnsbm
```

## Option 2: Install from source

```bash
git clone https://github.com/lamke07/CNSBM.git
cd CNSBM
pip install .
```

## Option 3: Development installation

```bash
git clone https://github.com/lamke07/CNSBM.git
cd CNSBM
pip install -e .
```

## Option 4: Using Conda (Alternative)

```bash
conda env create -f environment.yml
conda activate cnsbm
pip install .
```

### GPU Support (Optional)

For GPU acceleration with JAX:

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "cnsbm[gpu]"
```
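
After installing, one way to confirm that JAX can see a GPU is to list its devices. This is a minimal sketch that uses only JAX's own `jax.devices()` and `jax.default_backend()`; it does not touch `cnsbm` itself. On a CPU-only setup the backend is typically reported as `cpu`.

```python
import jax

# List the devices JAX can see and the backend it uses by default.
devices = jax.devices()
backend = jax.default_backend()
print(f"backend: {backend}; devices: {devices}")
```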

# Simple usage

```python
import os
import jax.numpy as jnp
from cnsbm import CNSBM

cwd = os.getcwd()

# C is a categorical matrix with integer-encoded categories starting from 0;
# missing values are encoded as -1. The number of categories is inferred from C.max().
# C must already be defined before this point. For an example of how to construct
# and use C, see cn_vi-simple.ipynb in this repository.
C = jnp.asarray(C)
K, L = 15, 10

# Initialize the JAX model
sbm_test = CNSBM(C, K, L, rand_init='spectral_bi', fill_na=2)
# Run batch variational inference
_ = sbm_test.batch_vi(75, batch_print=1, fitted=False, tol=1e-6)

# Plot the reordered output and get summary information
sbm_test.plt_blocks(plt_init=True)
sbm_test.summary()
_ = sbm_test.ICL(verbose=True, slow=True)

# Save model outputs and export cluster labels / probabilities
os.makedirs(os.path.join(cwd, 'output'), exist_ok=True)
sbm_test.export_outputs_csv(os.path.join(cwd, 'output'), model_name='test_sbm')
sbm_test.save_jax_model(os.path.join(cwd, 'output', 'test_sbm.pickle'))
```
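
The example above assumes `C` already exists. As a rough sketch of one way to build it from a raw table, the snippet below integer-encodes a small toy DataFrame using `pandas.Categorical`, which assigns consecutive codes starting at 0 and uses -1 for missing entries. The data, the names `raw`, `C_np`, and `states`, and the choice of pandas are illustrative assumptions, not part of the package; the notebook `cn_vi-simple.ipynb` remains the reference for how `C` is prepared.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are samples, columns are genomic bins,
# entries are copy-number states; NaN marks a missing call.
raw = pd.DataFrame(
    [[2, 2, 3, np.nan],
     [1, 2, 2, 4],
     [2, np.nan, 3, 4]],
    index=["s1", "s2", "s3"],
    columns=["bin1", "bin2", "bin3", "bin4"],
)

# Encode the observed states as consecutive integers starting at 0.
# pandas gives missing entries the code -1.
flat = pd.Categorical(raw.to_numpy().ravel())
C_np = flat.codes.reshape(raw.shape).astype(np.int32)

# states[i] is the original value behind code i, for mapping results back.
states = list(flat.categories)

print(C_np)
print(states)
```

The resulting array can then be passed on as `C = jnp.asarray(C_np)`. Keeping `states` around makes it possible to translate the integer codes back to the original copy-number values afterwards.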
