Metadata-Version: 2.4
Name: proxai-ms
Version: 0.1.1
Summary: Mass spectrometry machine learning utilities derived from ProXAI notebooks.
Author: Benjamin Nouri Nigjeh
License: MIT
Keywords: mass spectrometry,proteomics,machine learning,saliency,tensorflow
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: matplotlib>=3.8
Provides-Extra: ml
Requires-Dist: tensorflow>=2.14; extra == "ml"
Requires-Dist: scikit-learn>=1.3; extra == "ml"
Requires-Dist: scipy>=1.11; extra == "ml"
Provides-Extra: raw
Requires-Dist: h5py>=3.10; extra == "raw"
Requires-Dist: tqdm>=4.66; extra == "raw"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: jupyter>=1.0; extra == "dev"
Requires-Dist: nbformat>=5.10; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="https://raw.githubusercontent.com/benjaminnigjeh/proxai-ms/main/assets/logo.png" alt="ProXAI-MS Logo" width="220"/>
</p>

<h1 align="center">ProXAI-MS</h1>

<p align="center">
  Transforming MS1 Data into Learnable Representations<br>
  via Gradient-Based Pseudo-MS1 Spectra
</p>

<p align="center">
  <img src="https://img.shields.io/badge/python-3.10+-blue"/>
  <img src="https://img.shields.io/badge/status-active-success"/>
  <img src="https://img.shields.io/badge/license-MIT-green"/>
</p>

---

## 🚀 Overview

**ProXAI-MS** is a machine learning framework that converts MS1 data into **interpretable pseudo-MS1 spectra** using gradient-based saliency mapping.

It enables:
- Binary classification (control vs experiment)
- Gradient-based feature attribution
- Reconstruction of spectra from learned signal importance

---

## 🔥 Key Features

- Train ML models on binned MS1 data  
- Gradient-based explainability  
- Separate **positive vs negative gradients**  
- Convert gradients → **pseudo-MS1 spectra**  
- Supports flexible dataset formats (long or wide)  
- CLI + Python API  

---

## 📦 Installation

### From TestPyPI

    pip install -i https://test.pypi.org/simple/ proxai-ms==0.1.1

### From source

    git clone https://github.com/<your-username>/proxai-ms.git
    cd proxai-ms
    pip install -e .

---

## ⚙️ CLI Usage

Run the full pipeline:

    proxai-ms run ^
      --csv "F:\20251110\dataset_rt.csv" ^
      --label-column target ^
      --bin-column bin ^
      --bin-values 15 ^
      --control-labels 0 1 2 ^
      --experiment-labels 3 4 ^
      --out-prefix "F:\20261110\proxai_test"

---

### 🔹 Arguments

- `--csv` → Input dataset  
- `--label-column` → Label column  
- `--bin-column` → Column for bin grouping  
- `--bin-values` → Number of bins per sample  
- `--control-labels` → Control group labels  
- `--experiment-labels` → Experiment group labels  
- `--out-prefix` → Output prefix  

---

### 📤 Outputs

- `<prefix>_pseudo_ms1_positive.csv`  
- `<prefix>_pseudo_ms1_negative.csv`  
- `<prefix>_pseudo_ms1_plot.png`  
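
For scripted runs, the same invocation can be assembled from Python, which sidesteps shell-specific line continuations (`^` in the Windows example above). The sketch below uses only the flags and output suffixes listed in this README. The helper names `build_command` and `expected_outputs` are hypothetical and not part of the package.

```python
from pathlib import Path

# Output suffixes as listed in the Outputs section.
OUTPUT_SUFFIXES = (
    "_pseudo_ms1_positive.csv",
    "_pseudo_ms1_negative.csv",
    "_pseudo_ms1_plot.png",
)


def build_command(csv, out_prefix, label_column="target", bin_column="bin",
                  bin_values=15, control_labels=(0, 1, 2),
                  experiment_labels=(3, 4)):
    """Assemble the `proxai-ms run` argument list from the CLI example."""
    return [
        "proxai-ms", "run",
        "--csv", str(csv),
        "--label-column", label_column,
        "--bin-column", bin_column,
        "--bin-values", str(bin_values),
        "--control-labels", *[str(v) for v in control_labels],
        "--experiment-labels", *[str(v) for v in experiment_labels],
        "--out-prefix", str(out_prefix),
    ]


def expected_outputs(out_prefix):
    """Return the three file paths a run should write for this prefix."""
    return [Path(f"{out_prefix}{suffix}") for suffix in OUTPUT_SUFFIXES]


cmd = build_command("dataset.csv", "results/run1")
outputs = expected_outputs("results/run1")
# To execute (requires proxai-ms on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
# Afterwards, [p for p in outputs if not p.exists()] lists missing files.
```

Passing the arguments as a list means paths with spaces or backslashes need no extra quoting.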

---

## 🧪 Python API

    from proxai_ms import run_pipeline

    result = run_pipeline(
        csv_path="dataset.csv",
        label_column="target",
        bin_column="bin",
        bin_values=15,
        control_labels=[0, 1, 2],
        experiment_labels=[3, 4],
    )

---

## 📊 Input Format

Both wide and long layouts are supported:

### Wide format
Rows = samples, columns = m/z bins

### Long format
- bin column (grouping index)
- intensity values
- label column
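
The two layouts carry the same information. As a rough illustration, the standard-library sketch below pivots long-format records into one wide row per sample. The `sample` and `intensity` keys are assumptions made for this example (the description above does not say how samples are identified or how the intensity column is named), and the values are toy data. ProXAI-MS may reshape data differently internally.

```python
from collections import defaultdict


def long_to_wide(rows, sample_key="sample", bin_key="bin",
                 value_key="intensity", label_key="target"):
    """Pivot long-format records into a wide layout.

    Returns (bins, wide, labels): the sorted bin values, a dict mapping each
    sample to its intensity vector over those bins, and a dict mapping each
    sample to its label. Bins missing for a sample are filled with 0.0.
    """
    bins = sorted({row[bin_key] for row in rows})
    position = {b: i for i, b in enumerate(bins)}
    wide = defaultdict(lambda: [0.0] * len(bins))
    labels = {}
    for row in rows:
        wide[row[sample_key]][position[row[bin_key]]] += row[value_key]
        labels[row[sample_key]] = row[label_key]
    return bins, dict(wide), labels


rows = [
    {"sample": "s1", "bin": 500, "intensity": 10.0, "target": 0},
    {"sample": "s1", "bin": 501, "intensity": 2.0, "target": 0},
    {"sample": "s2", "bin": 501, "intensity": 7.5, "target": 3},
]
bins, wide, labels = long_to_wide(rows)
print(bins)    # [500, 501]
print(wide)    # {'s1': [10.0, 2.0], 's2': [0.0, 7.5]}
print(labels)  # {'s1': 0, 's2': 3}
```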

---

## 🔬 Core Concept

ProXAI learns signal importance via gradients:

- Positive gradients → experiment signal  
- Negative gradients → control signal  

These are mapped back to spectral space to form:

> **Pseudo-MS1 = learned biochemical representation**

Unlike traditional pipelines:
- No peak picking required  
- No manual feature engineering  
- Fully data-driven representation learning  
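
The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's implementation: the classifier is a toy logistic model with made-up weights standing in for a trained network, and central finite differences stand in for the gradient an autodiff framework would normally supply. All function names here are hypothetical.

```python
import math


def predict(x, weights, bias):
    """Toy classifier: probability that a binned spectrum is 'experiment'."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of f with respect to each bin."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def split_by_sign(grad):
    """Return two non-negative vectors: positive part, negative magnitude."""
    positive = [g if g > 0 else 0.0 for g in grad]
    negative = [-g if g < 0 else 0.0 for g in grad]
    return positive, negative


weights = [1.5, -2.0, 0.0]  # hypothetical; stands in for a trained model
spectrum = [0.2, 0.1, 0.7]  # intensities in three m/z bins
grad = input_gradient(lambda x: predict(x, weights, 0.0), spectrum)
pseudo_pos, pseudo_neg = split_by_sign(grad)
# pseudo_pos is non-zero only at bin 0 (pushes toward 'experiment'),
# pseudo_neg only at bin 1 (pushes toward 'control'); bin 2 has no influence.
```

Each of the two vectors has one value per m/z bin, so it can be plotted on the original m/z axis like an ordinary spectrum.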

---

## 🧱 Project Structure

    proxai-ms/
    ├── assets/
    ├── src/proxai_ms/
    │   ├── training.py
    │   ├── explain.py
    │   ├── pipeline.py
    │   └── cli.py
    ├── notebooks/
    ├── docs/
    ├── scripts/
    └── pyproject.toml

---

## ⚠️ Important Notes

- If gradients collapse to zero, try disabling normalization  
- Do not average signed gradients across samples directly: contributions of opposite sign cancel  
- Always separate positive and negative gradients first, then aggregate each part  
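
To see why the order of operations matters, consider two samples whose gradients at the same bin have opposite signs. The values and function names below are made up for illustration.

```python
def mean_columns(matrix):
    """Column-wise mean of a list of equal-length rows."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


def split_then_mean(gradients):
    """Split each sample's gradient by sign, then average each part."""
    pos = [[g if g > 0 else 0.0 for g in row] for row in gradients]
    neg = [[-g if g < 0 else 0.0 for g in row] for row in gradients]
    return mean_columns(pos), mean_columns(neg)


gradients = [
    [0.5, -0.25],   # sample A: bin 0, bin 1
    [-0.5, -0.75],  # sample B: bin 0, bin 1
]
naive = mean_columns(gradients)
pos_mean, neg_mean = split_then_mean(gradients)
print(naive)     # [0.0, -0.5]  -> bin 0 cancels and looks unimportant
print(pos_mean)  # [0.25, 0.0]  -> bin 0 keeps its positive evidence
print(neg_mean)  # [0.25, 0.5]  -> and its negative evidence
```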

---

## 🛣️ Roadmap

- Deep learning models (CNN / Transformer)  
- Multi-class classification  
- UniDec integration  
- mzML export  
- GUI (ProXAI Desktop)  

---

## 👨‍🔬 Author

Benjamin Nouri Nigjeh  
Proteomics • Machine Learning • Mass Spectrometry  

---

## 📜 License

MIT License
