Metadata-Version: 2.1
Name: scbig
Version: 0.1.0
Summary: scBiG: a novel scRNA-seq representation learning method based on graph node embedding
Home-page: https://github.com/sldyns/scBiG
Author: Kun Qian, Ting Li
Author-email: kun_qian@foxmail.com
Maintainer: Kun Qian
Maintainer-email: kun_qian@foxmail.com
License: MIT Licence
Keywords: single-cell RNA-sequencing,Graph node embedding,Dimensionality reduction
Platform: any
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: scanpy
Requires-Dist: h5py
Requires-Dist: torch
Requires-Dist: dgl
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: louvain

# scBiG: a novel scRNA-seq representation learning method based on graph node embedding

## Overview

![Overview of the scBiG architecture](overview.png)

scBiG is a graph autoencoder network. Its encoder, built from multi-layer graph convolutional networks, extracts high-order representations of cells and genes from the cell-gene bipartite graph. Its decoder, based on the zero-inflated negative binomial (ZINB) model, uses these representations to reconstruct the gene expression matrix. Through this model-driven, self-supervised training paradigm, scBiG learns low-dimensional representations of both cells and genes that are suitable for a range of downstream analyses.
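To make the cell-gene bipartite graph concrete, here is a minimal NumPy sketch. It assumes that a cell and a gene are linked whenever the gene has a nonzero count in that cell; this rule, the toy matrix, and the variable names are illustrative assumptions, not necessarily how scBiG builds its graph internally.

```python
import numpy as np

# Toy expression matrix: 3 cells (rows) x 4 genes (columns); values are made up.
X = np.array([
    [1, 0, 3, 0],
    [0, 2, 0, 0],
    [4, 0, 0, 5],
])

# Assumption for this sketch: one edge per nonzero entry (gene detected in cell).
cell_idx, gene_idx = np.nonzero(X)
edges = list(zip(cell_idx.tolist(), gene_idx.tolist()))

# Edge weights taken from the expression values on those entries.
weights = X[cell_idx, gene_idx]

# Node degrees on each side of the bipartite graph.
n_cells, n_genes = X.shape
cell_degree = np.bincount(cell_idx, minlength=n_cells)
gene_degree = np.bincount(gene_idx, minlength=n_genes)
```

Each edge connects a cell node to a gene node and never two nodes of the same type, which is what makes the graph bipartite.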

## Installation

Install `scBiG` from PyPI with:

```bash
pip install scbig
```

Alternatively, clone this repository and run the following from its root:

```bash
pip install -e .
```

## Quick start

Load the data to be analyzed:

```python
import scanpy as sc

# `data` is your count matrix, with cells as rows and genes as columns
adata = sc.AnnData(data)
```

Perform data pre-processing:

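```python
import numpy as np

# Basic filtering
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.filter_cells(adata, min_genes=200)

# Keep a copy of the filtered, unnormalized data
adata.raw = adata.copy()

# Total-count normalize, compute the cell size factor, logarithmize the data,
# then compute the gene size factor (per-gene maximum of the log-normalized values)
sc.pp.normalize_per_cell(adata)
adata.obs['cs_factor'] = adata.obs.n_counts / np.median(adata.obs.n_counts)
sc.pp.log1p(adata)
# Note: this line assumes adata.X is a dense NumPy array
adata.var['gs_factor'] = np.max(adata.X, axis=0, keepdims=True).reshape(-1)
```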
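As a rough illustration of what the two factors contain, the sketch below repeats the same arithmetic in plain NumPy on a toy matrix. The matrix values and variable names are made up for the example, and it assumes the normalization rescales every cell to the median total count.

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 4 genes (columns); values are made up.
counts = np.array([
    [1.0, 0.0, 3.0, 0.0],
    [2.0, 2.0, 0.0, 4.0],
    [0.0, 6.0, 6.0, 0.0],
])

# Total counts per cell and the cell size factor (total / median total).
n_counts = counts.sum(axis=1)
cs_factor = n_counts / np.median(n_counts)

# Assumed normalization: rescale each cell so its total equals the median total.
normalized = counts / cs_factor[:, None]

# Log-transform, then take the per-gene maximum as the gene size factor.
logged = np.log1p(normalized)
gs_factor = logged.max(axis=0)
```

Here `cs_factor` has one value per cell and `gs_factor` one value per gene, matching where the snippet above stores them (`adata.obs` and `adata.var`).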

Run the scBiG method:

```python
from scbig import run_scbig
adata = run_scbig(adata)
```

The output `adata` contains the cell embeddings in `adata.obsm['feat']` and the gene embeddings in `adata.varm['feat']`. The embeddings can be used as input to other downstream analyses.
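As one small example of a downstream use, the sketch below finds the cells closest to a query cell by cosine similarity in embedding space. The helper `top_k_similar` is hypothetical (it is not part of scBiG), and the array is a made-up stand-in for `adata.obsm['feat']`.

```python
import numpy as np

def top_k_similar(embeddings, query, k=2):
    """Return indices of the k rows most cosine-similar to row `query` (hypothetical helper)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    sims = unit @ unit[query]
    sims[query] = -np.inf  # exclude the query itself
    return np.argsort(-sims)[:k]

# Stand-in for adata.obsm['feat'] (cells x embedding dimensions); values are made up.
cell_emb = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])

nearest = top_k_similar(cell_emb, query=0, k=2)
```

With real output you would pass `adata.obsm['feat']` in place of `cell_emb`.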

Please refer to `tutorial.ipynb` for a detailed description of scBiG's usage.

