Metadata-Version: 2.4
Name: bca-survival
Version: 0.2.6
Summary: A package for survival analysis with body composition analysis data
Author-email: Eric Frodl <eric.frodl@gmx.de>
License: MIT
Project-URL: Homepage, https://github.com/eFroD/bca-survival-analyzer
Project-URL: Documentation, https://eFroD.github.io/bca-survival-analyzer
Project-URL: Repository, https://github.com/eFroD/bca-survival-analyzer
Project-URL: Issues, https://github.com/eFroD/bca-survival-analyzer/issues
Project-URL: Changelog, https://github.com/eFroD/bca-survival-analyzer/blob/main/CHANGELOG.md
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.2.1
Requires-Dist: numpy>=1.26.4
Requires-Dist: scikit-learn>=1.4.2
Requires-Dist: lifelines>=0.28.0
Requires-Dist: matplotlib>=3.9.1
Requires-Dist: seaborn>=0.13.2
Requires-Dist: statsmodels>=0.14.0
Requires-Dist: tqdm>=4.66.4
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.3.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=4.5.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.17.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.19.0; extra == "docs"
Dynamic: license-file

[![CI](https://github.com/eFroD/bca-survival-analyzer/actions/workflows/ci.yml/badge.svg)](https://github.com/eFroD/bca-survival-analyzer/actions/workflows/ci.yml)
[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://eFroD.github.io/bca-survival-analyzer/)
[![PyPI](https://img.shields.io/pypi/v/bca-survival.svg)](https://pypi.org/project/bca-survival/)
[![Python Version](https://img.shields.io/pypi/pyversions/bca-survival.svg)](https://pypi.org/project/bca-survival/)


# Survival Analysis Package

A Python package for analyzing survival data with a focus on body composition assessment. It is designed to work with the results of the [BOA - Body and Organ Analysis](https://github.com/UMEssen/Body-and-Organ-Analysis) workflow. This repository provides tools to reorganize the BOA output and merge it with the patient table, utilities for data cleaning, and a [lifelines](https://zenodo.org/records/10456828) wrapper for automated exploratory analysis of survival outcomes based on the body composition results.

## Features

- **Survival Analysis**: Cox proportional hazards regression and Kaplan-Meier survival curves
- **Body Composition Analysis**: Tools for processing and analyzing BCA data
- **BOA Extractor**: Command-line tool for extracting measurements from BOA data
- **Data Preprocessing**: Utilities for cleaning and preparing survival data
- **CLI Tools**: Command-line utilities for data merging, format conversion, and PDF encryption

## Installation

```bash
pip install bca-survival
```

## Usage

### Basic Survival Analysis
```python
import pandas as pd

from bca_survival.analyzer import BCASurvivalAnalyzer

# Load your data, sharing the same identifiers
df_main = pd.read_csv('clinical_data.csv')
df_measurements = pd.read_csv('bca_measurements.csv')

# Initialize the analyzer
analyzer = BCASurvivalAnalyzer(
    df_main, df_measurements,
    main_id_col='patient_id', measurement_id_col='id',
    start_date_col='diagnosis_date', event_date_col='event_date', event_col='event_status'
)

# Perform univariate analysis
columns = ['l5::WL::imat::mean_ml', 'l5::WL::tat::mean_ml', 'age', 'gender']
results = analyzer.univariate_cox_regression(columns)

# Generate Kaplan-Meier plot
analyzer.kaplan_meier_plot('l5::WL::imat::mean_ml', split_strategy='median')

# Perform multivariate analysis
model = analyzer.multivariate_cox_regression(columns)
```

## Command-Line Tools

The package includes several command-line tools for common data processing tasks:

### BOA Extractor

Extract measurements from BOA (Body and Organ Analysis) data:
```bash
boa-extract /path/to/data /path/to/output
```

**Purpose**: Processes BOA data files and extracts relevant measurements for survival analysis.

**Arguments**:
- `data_path`: Path to the directory containing BOA data files
- `output_path`: Path where extracted measurements will be saved

---

### BCA Merger

Merge two Excel files based on ID columns:
```bash
bca-merge <first_file> <second_file> <id_column_name>
```

**Purpose**: Combines clinical data with body composition measurements by matching on ID columns.

**Arguments**:
- `first_file`: Path to the first Excel file (e.g., clinical data)
- `second_file`: Path to the second Excel file (e.g., BCA measurements)
- `id_column_name`: Name of the ID column in the first file to match with 'StudyID' in the second file

**Example**:
```bash
bca-merge clinical_data.xlsx bca_measurements.xlsx patient_id
```

**Output**: Creates a file named `{first_file}_merged.xlsx` with:
- All rows from both files (outer join)
- Matched records combined into single rows
- Date columns formatted as DD.MM.YYYY
- No duplicate StudyID columns

**Notes**:
- The second file must have a column named 'StudyID'
- Uses outer merge to preserve all data from both files
- Automatically removes duplicate ID columns
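
The merge behavior described above can be approximated in a few lines of pandas. This is a minimal sketch, not the tool's actual implementation: the function name `merge_on_study_id` and the sample columns are hypothetical, and only the outer join on `StudyID`, the single ID column, and the DD.MM.YYYY date format are taken from the description.

```python
import pandas as pd


def merge_on_study_id(first: pd.DataFrame, second: pd.DataFrame, id_column: str) -> pd.DataFrame:
    """Outer-join two tables, matching `id_column` in `first` to 'StudyID' in `second`."""
    merged = first.merge(second, how="outer", left_on=id_column, right_on="StudyID")
    if id_column != "StudyID":
        # Rows found only in the second table have no value in `id_column`;
        # fill it from 'StudyID' so every row keeps an identifier.
        merged[id_column] = merged[id_column].fillna(merged["StudyID"])
        # Keep a single ID column.
        merged = merged.drop(columns="StudyID")
    # Render datetime columns as DD.MM.YYYY strings.
    for col in merged.select_dtypes(include="datetime").columns:
        merged[col] = merged[col].dt.strftime("%d.%m.%Y")
    return merged


# Hypothetical sample data: patient "A" has no measurements, "C" has no clinical record.
clinical = pd.DataFrame(
    {
        "patient_id": ["A", "B"],
        "diagnosis_date": pd.to_datetime(["2020-01-31", "2021-02-15"]),
    }
)
measurements = pd.DataFrame({"StudyID": ["B", "C"], "imat": [1.5, 2.5]})

merged = merge_on_study_id(clinical, measurements, "patient_id")
print(merged)
```

Because the join is an outer join, unmatched rows from either table are kept, with missing values in the columns that come from the other table.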

---

### Survival Result Converter

Convert Excel files to multiple formats (PDF, CSV, TXT):
```bash
survival-result-converter [directory]
```

**Purpose**: Batch converts Excel files to multiple formats for reporting and data sharing.

**Arguments**:
- `directory`: Directory to scan for Excel files (default: current directory)

**Example**:
```bash
# Convert all Excel files in current directory
survival-result-converter

# Convert Excel files in specific directory
survival-result-converter /path/to/results
```

**Output Structure**:
```
directory/
├── PDF/
│   ├── file1.pdf
│   └── file2.pdf
├── CSV/
│   ├── file1.csv
│   ├── file2_sheet1.csv
│   └── file2_sheet2.csv
└── TXT/
    ├── file1.txt
    └── file2.txt
```

**Features**:
- Recursively processes all `.xlsx` files in the directory tree
- Creates separate output folders (PDF, CSV, TXT)
- For multi-sheet Excel files:
  - PDF: All sheets in single file
  - CSV: Separate file per sheet
  - TXT: All sheets in single file with separators
- PDF generation supports two methods:
  - Windows: Uses COM automation for high-quality output
  - Cross-platform: Uses fpdf library with automatic column sizing

**PDF Features**:
- Landscape orientation for better table visibility
- Automatic column width adjustment
- Fits tables to page width
- Handles large tables (up to 1000 rows per sheet)
- Text wrapping for long content
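
The per-format naming rules can be sketched as a small path-planning helper. The function `planned_outputs` is hypothetical and only mirrors the output structure shown above (one PDF and one TXT per workbook, one CSV per sheet for multi-sheet workbooks); the real tool may name or place files differently, for example for workbooks found in subdirectories.

```python
from pathlib import Path


def planned_outputs(workbook: Path, sheet_names: list[str], root: Path) -> dict[str, list[Path]]:
    """Return the output files expected for one workbook, grouped by format folder."""
    stem = workbook.stem
    if len(sheet_names) > 1:
        # Multi-sheet workbook: one CSV per sheet, suffixed with the sheet name.
        csv_files = [root / "CSV" / f"{stem}_{name}.csv" for name in sheet_names]
    else:
        # Single-sheet workbook: one CSV named after the workbook.
        csv_files = [root / "CSV" / f"{stem}.csv"]
    return {
        "PDF": [root / "PDF" / f"{stem}.pdf"],  # all sheets in a single PDF
        "CSV": csv_files,
        "TXT": [root / "TXT" / f"{stem}.txt"],  # all sheets in a single TXT
    }


outputs = planned_outputs(Path("results/file2.xlsx"), ["sheet1", "sheet2"], Path("results"))
for fmt, paths in outputs.items():
    print(fmt, [p.name for p in paths])
```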

---

### PDF Report Extractor

Encrypt and organize PDF files from a directory tree:
```bash
pdf-report-extractor <input_path> <output_path> <password>
```

**Purpose**: Finds PDF files in a directory structure, copies them with standardized names, and encrypts them for secure distribution.

**Arguments**:
- `input_path`: Root directory to search for PDF files
- `output_path`: Destination directory for encrypted PDFs
- `password`: Password to encrypt the PDFs with

**Example**:
```bash
pdf-report-extractor /data/patient_reports /encrypted_reports MySecureP@ss123
```

**Behavior**:
- Recursively searches for all `.pdf` files
- Copies files to destination with naming pattern: `encrypted_{parent_folder_name}.pdf`
- Encrypts each file using user password protection
- Requires `pdftk` to be installed

**Check pdftk Installation**:
```bash
pdf-report-extractor --check-pdftk
```

**Installing pdftk**:
- Ubuntu/Debian: `sudo apt-get install pdftk`
- macOS: `brew install pdftk-java`
- Windows: Download from [PDFtk website](https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/)

**Output Summary**:
```
Processing: /data/patient_reports/folder1/report.pdf
  -> /encrypted_reports/encrypted_folder1.pdf
  -> Encrypted successfully

Processing complete:
  - Files processed successfully: 15
  - Errors: 0
```

**Notes**:
- Original files remain unchanged
- If encryption fails, the unencrypted copy is removed from destination
- Parent folder name is used for output filename (one level up from the PDF)
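
The naming and encryption steps can be sketched as follows. The helper names are hypothetical and this is not the tool's actual code; it assumes that encryption uses pdftk's standard `output ... user_pw ...` operation, which sets a user password on the output file.

```python
import shutil
from pathlib import Path


def encrypted_name(pdf_path: Path) -> str:
    """Build the output filename from the folder that directly contains the PDF."""
    return f"encrypted_{pdf_path.parent.name}.pdf"


def pdftk_encrypt_command(source: Path, destination: Path, password: str) -> list[str]:
    """Build the argument list: pdftk <in> output <out> user_pw <password>."""
    return ["pdftk", str(source), "output", str(destination), "user_pw", password]


def pdftk_available() -> bool:
    """Check whether a pdftk executable is on the PATH."""
    return shutil.which("pdftk") is not None


source = Path("/data/patient_reports/folder1/report.pdf")
destination = Path("/encrypted_reports") / encrypted_name(source)
command = pdftk_encrypt_command(source, destination, "example-password")
print(destination.name)
print(command)
```

If pdftk is installed, the command could be executed with `subprocess.run(command, check=True)`. In this sketch, two PDFs located in the same folder would map to the same output name.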

---

## Documentation

Detailed documentation lives in the `docs/` directory. To build it locally:

1. Install the package with documentation dependencies:
   ```bash
   pip install -e ".[docs]"
   ```

2. Build the documentation on Windows:
   ```bash
   cd docs
   make.bat html
   ```

   Or on Linux/macOS:
   ```bash
   cd docs
   make html
   ```

3. Open `docs/build/html/index.html` in your browser

## Development

Clone the repository and install in development mode:
```bash
git clone https://github.com/eFroD/bca-survival-analyzer.git
cd bca-survival-analyzer
pip install -e ".[dev]"
```

## Requirements

### Core Dependencies
- pandas
- openpyxl (for Excel file handling)
- lifelines (for survival analysis)

### Optional Dependencies
- **For PDF conversion** (survival-result-converter):
  - Windows: pywin32
  - Cross-platform: fpdf, openpyxl
- **For PDF encryption** (pdf-report-extractor):
  - pdftk (external dependency)

## Common Workflows

### Workflow 1: Complete Data Processing Pipeline
```bash
# 1. Merge clinical and BCA data
bca-merge clinical.xlsx measurements.xlsx PatientID

# 2. Perform survival analysis (Python)
# ... (use BCASurvivalAnalyzer)

# 3. Convert results to multiple formats
survival-result-converter ./results

# 4. Encrypt PDF reports for distribution
pdf-report-extractor ./results/PDF ./encrypted_reports SecurePassword123
```

### Workflow 2: Quick Data Conversion
```bash
# Convert a directory of Excel results to PDF
survival-result-converter /path/to/results

# PDFs are created in /path/to/results/PDF/
```
