Metadata-Version: 2.4
Name: limma_py
Version: 0.1.0
Summary: Python implementation of R's limma package for differential expression analysis
Home-page: https://github.com/wd1566/limma_py
Author: Zhang Xian
Author-email: Zhang Xian <2967569628@qq.com>
License: MIT
Project-URL: Homepage, https://github.com/wd1566/limma_py
Project-URL: Repository, https://github.com/wd1566/limma_py
Project-URL: Issues, https://github.com/wd1566/limma_py/issues
Keywords: bioinformatics,limma,differential-expression,genomics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: statsmodels>=0.13.0
Dynamic: license-file

# limma_py

A Python implementation of the popular R limma package for differential expression analysis of gene expression data.

[![Python Version](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Documentation](https://img.shields.io/badge/docs-mkdocs-blue.svg)](https://wd1566.github.io/limma_py/)

## Overview

`limma_py` is a Python port of the widely used R limma (Linear Models for Microarray Data) package. It provides statistical methods for analyzing gene expression data from microarray and RNA-seq experiments, with a focus on differential expression analysis.

### Key Features

- ✅ **Core Functionality** - Provides the same core workflow as R's limma package (model fitting, contrasts, empirical Bayes, result tables)
- ✅ **Linear Model Fitting** - Supports complex experimental design matrices
- ✅ **Empirical Bayes Moderation** - Improves statistical power for small sample data
- ✅ **Multiple Group Comparisons** - Supports complex contrast analysis
- ✅ **Pythonic API** - User-friendly interface designed for Python users
- ✅ **High Performance Computing** - Optimized implementation based on NumPy and SciPy

## Installation

### Via pip
```bash
pip install limma_py
```

### From source
```bash
git clone https://github.com/wd1566/limma_py
cd limma_py
pip install -e .
```

## Quick Start

```python
import pandas as pd
import numpy as np
import limma_py

# Read CSV file, header=0 indicates the first row contains column names
data = pd.read_csv("data/Harmine_iTSA.csv", header=0)

# Extract gene names (first column)
gene_names = data.iloc[:, 0].values

# Extract expression matrix (all columns starting from the second)
expr_data = data.iloc[:, 1:]

# Define experimental groups: first 5 samples are control group (V_group), last 5 are treatment group (D_group)
group = np.array(["V_group"] * 5 + ["D_group"] * 5)

# Create design matrix using one-hot encoding
design_df = pd.get_dummies(group, drop_first=False)[["V_group", "D_group"]]

# Ensure design matrix is of integer type
design = design_df.astype(int)

# Create a copy of the expression matrix and set gene names as row indices
expr_matrix = expr_data.copy()
expr_matrix.index = gene_names

# Perform linear model fitting using limma
fit_python = limma_py.lmFit(expr_matrix, design)

# Set up contrast matrix: compare differences between treatment group (D_group) and control group (V_group)
contrasts = limma_py.make_contrasts('D_group - V_group', levels=design)

# Perform contrast analysis on the fitted results
fit_python = limma_py.contrasts_fit(fit_python, contrasts)

# Moderate standard errors using empirical Bayes method
eb_python = limma_py.eBayes(fit_python)

# Extract differential expression analysis result table
res = limma_py.toptable(eb_python)
```
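
To make the design matrix and the `'D_group - V_group'` contrast more concrete, here is a minimal sketch of the underlying arithmetic for a single gene. It uses only the standard library and does not call `limma_py`; the helper names `one_hot_design` and `fit_group_means` and the expression values are hypothetical, chosen for illustration. It relies on one fact: for a one-hot design with no intercept, the least-squares coefficient of each group is that group's mean.

```python
def one_hot_design(groups, levels):
    """Return one row per sample with a 1 in the column of its group."""
    return [[1 if g == level else 0 for level in levels] for g in groups]


def fit_group_means(values, design):
    """Least-squares coefficients for a one-hot, no-intercept design.

    X^T X is diagonal (the group sizes) and X^T y holds the group sums,
    so each coefficient reduces to the mean of its group.
    """
    n_coef = len(design[0])
    sums = [0.0] * n_coef
    counts = [0] * n_coef
    for row, y in zip(design, values):
        for j, x in enumerate(row):
            sums[j] += x * y
            counts[j] += x
    return [s / c for s, c in zip(sums, counts)]


levels = ["V_group", "D_group"]
groups = ["V_group"] * 3 + ["D_group"] * 3
design = one_hot_design(groups, levels)

# Illustrative log-expression values for one gene (not real data)
gene_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

coef = fit_group_means(gene_values, design)  # [2.0, 5.0]

# The contrast 'D_group - V_group' weights the coefficients by -1 and +1
contrast = [-1, 1]
log_fc = sum(c * b for c, b in zip(contrast, coef))  # 3.0
```

In the Quick Start, `lmFit` and `contrasts_fit` play these two roles for every gene at once; how `limma_py` implements them internally is not shown here.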

## Data Format Requirements

Input data should be in CSV format with the following structure:

```csv
Gene,Sample1,Sample2,Sample3,Sample4,Sample5,Sample6,Sample7,Sample8,Sample9,Sample10
Gene_1,8.45,7.89,8.12,15.67,14.89,16.23,8.33,7.95,8.21,8.09
Gene_2,12.34,11.89,12.56,11.45,10.98,12.11,25.67,24.89,26.01,25.34
Gene_3,5.67,6.01,5.89,18.90,19.45,18.23,6.12,5.78,6.34,5.95
```

**Format specifications:**

- First column: gene or protein names (any identifier)
- Subsequent columns: numeric expression values, one column per sample
- Files must be in CSV format; convert other formats first
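
Before running an analysis, it can help to check that a file matches this layout. The following is a small sketch using only the standard library; `read_expression_csv` is a hypothetical helper, not part of `limma_py`, and the Quick Start itself loads data with `pandas.read_csv`.

```python
import csv
import io


def read_expression_csv(handle):
    """Parse a gene-by-sample CSV into (sample_names, {gene: [values]})."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # first header cell labels the gene column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, cells = row[0], row[1:]
        if len(cells) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(cells)}"
            )
        # float() raises ValueError if a cell is not numeric
        matrix[gene] = [float(c) for c in cells]
    return samples, matrix


# A shortened version of the example file above
example = io.StringIO(
    "Gene,Sample1,Sample2,Sample3\n"
    "Gene_1,8.45,7.89,8.12\n"
    "Gene_2,12.34,11.89,12.56\n"
)
samples, matrix = read_expression_csv(example)
```

A row with a missing or non-numeric value raises `ValueError`, which surfaces formatting problems before they reach the model fit.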

## Core Functions

- **`lmFit()`** - Linear model fitting for gene expression data
- **`make_contrasts()`** - Generate contrast matrices for group comparisons
- **`contrasts_fit()`** - Re-express a fitted model in terms of the specified contrasts
- **`eBayes()`** - Empirical Bayes moderation of standard errors
- **`toptable()`** - Extract top-ranked genes from analysis results
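
As an illustration of what empirical Bayes moderation does, the sketch below applies the standard limma shrinkage formula to a single gene: the gene's residual variance is pulled toward a prior variance, weighted by the respective degrees of freedom. The function `moderated_t` is hypothetical and the prior values are simply given as inputs; in practice the prior is estimated from all genes, and this sketch makes no claim about how `limma_py` does that or which attribute names it exposes.

```python
import math


def moderated_t(coef, stdev_unscaled, s2, df_residual, s2_prior, df_prior):
    """Return (moderated t, posterior variance, total degrees of freedom).

    s2_post = (df_prior * s2_prior + df_residual * s2) / (df_prior + df_residual)
    t       = coef / (stdev_unscaled * sqrt(s2_post))
    """
    df_total = df_prior + df_residual
    s2_post = (df_prior * s2_prior + df_residual * s2) / df_total
    t = coef / (stdev_unscaled * math.sqrt(s2_post))
    return t, s2_post, df_total


# Illustrative inputs (not real data): a noisy gene with few replicates
t_mod, s2_post, df_total = moderated_t(
    coef=2.0,            # estimated contrast (log fold change)
    stdev_unscaled=0.5,  # unscaled standard deviation of the coefficient
    s2=6.0,              # gene-wise residual variance
    df_residual=4,       # residual degrees of freedom
    s2_prior=2.0,        # prior variance (assumed known here)
    df_prior=4,          # prior degrees of freedom (assumed known here)
)

# The ordinary t-statistic uses the unmoderated variance, for comparison
t_ordinary = 2.0 / (0.5 * math.sqrt(6.0))
```

Here the posterior variance (4.0) sits between the gene-wise variance (6.0) and the prior (2.0), so the moderated statistic is larger than the ordinary one and is evaluated on 8 rather than 4 degrees of freedom. Refinements such as robust estimation are omitted.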

## Documentation

For detailed documentation, examples, and API reference, visit our [documentation site](https://wd1566.github.io/limma_py/).

- [Installation Guide](https://wd1566.github.io/limma_py/installation/)
- [Usage Examples](https://wd1566.github.io/limma_py/examples/)
- [API Reference](https://wd1566.github.io/limma_py/api/)

## Use Cases

- **Microarray Data Analysis**
- **RNA-seq Differential Expression Analysis**
- **Multiple Group Experimental Designs**
- **Time Series Expression Data**
- **Any expression data analysis requiring linear modeling**

## Dependencies

- pandas >= 1.3.0
- numpy >= 1.19.0
- scipy >= 1.7.0
- statsmodels >= 0.13.0

## Contributing

Contributions are welcome! Please feel free to submit pull requests, report bugs, or suggest new features.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
