Metadata-Version: 2.4
Name: rfx-ml
Version: 1.0.1
Summary: High-Performance Random Forests with GPU Acceleration and QLORA Compression
Home-page: https://github.com/chriskuchar/RFX
Author: Chris Kuchar
Author-email: chrisjkuchar@gmail.com
Project-URL: Bug Reports, https://github.com/chriskuchar/RFX/issues
Project-URL: Source, https://github.com/chriskuchar/RFX
Project-URL: Documentation, https://github.com/chriskuchar/RFX/blob/main/README.md
Keywords: random forest,machine learning,gpu,cuda,classification,visualization,proximity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: C++
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: pybind11>=2.6.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov>=2.10.0; extra == "dev"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.3.0; extra == "viz"
Requires-Dist: seaborn>=0.11.0; extra == "viz"
Requires-Dist: plotly>=5.0.0; extra == "viz"
Provides-Extra: examples
Requires-Dist: scikit-learn>=0.24.0; extra == "examples"
Requires-Dist: pandas>=1.3.0; extra == "examples"
Requires-Dist: tqdm>=4.60.0; extra == "examples"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# RFX: Random Forests X

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)

**RFX** (Random Forests X) is a high-performance Python implementation of Breiman and Cutler's original Random Forest methodology with GPU acceleration and QLORA compression.

## Key Features

- **Complete classification**: Out-of-bag error, confusion matrices, class probabilities
- **Local importance**: Per-sample feature importance (similar to SHAP, built-in)
- **Proximity matrices**: Pairwise sample similarities for outlier detection and visualization
- **QLORA compression**: 12,500× memory reduction (80 GB → 6.4 MB) for large-scale proximity analysis
- **Full GPU acceleration**: CUDA for trees, importance, and proximity matrices
- **Interactive visualization**: Python-native rfviz with 3D MDS and parallel coordinates

**Result:** Proximity-based workflows now scale to 200K–1M+ samples.
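
To make the proximity idea concrete, here is a minimal NumPy sketch that is independent of RFX's internals: two samples count as similar when they land in the same leaf in many trees. The `leaves` array (samples × trees, holding leaf ids) and both function names are hypothetical, and the outlier score is a simplified, class-agnostic variant of the usual "n divided by the sum of squared proximities" measure, not something confirmed about RFX.

```python
import numpy as np

def proximity_from_leaves(leaves: np.ndarray) -> np.ndarray:
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves: int array of shape (n_samples, n_trees); leaves[i, t] is the id
    of the leaf that sample i reaches in tree t (hypothetical input).
    """
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        # Outer comparison: True where samples i and j share a leaf in tree t.
        prox += col[:, None] == col[None, :]
    return prox / n_trees

def outlier_scores(prox: np.ndarray) -> np.ndarray:
    """Simplified outlier score: n / sum of squared proximities to others."""
    n = prox.shape[0]
    off_diag = prox - np.diag(np.diag(prox))
    # The floor avoids division by zero for fully isolated samples.
    return n / np.maximum((off_diag ** 2).sum(axis=1), 1e-12)

# Toy input: 4 samples, 3 trees. Samples 0 and 1 always share a leaf;
# sample 3 never shares a leaf with any other sample.
leaves = np.array([
    [0, 1, 2],
    [0, 1, 2],
    [0, 3, 2],
    [5, 6, 7],
])
prox = proximity_from_leaves(leaves)
scores = outlier_scores(prox)
```

The dense matrix built here grows as the square of the sample count, which is the cost that low-rank compression is meant to avoid.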

## Installation

```bash
# Basic installation
pip install rfx-ml

# With visualization dependencies (quotes keep shells such as zsh from expanding the brackets)
pip install "rfx-ml[viz]"

# With visualization and example dependencies
pip install "rfx-ml[viz,examples]"
```

**Prerequisites:** CMake 3.12+, Python 3.7+, a C++ compiler with C++17 support, and CUDA toolkit 11.0+ (required at build time; using the GPU at runtime is optional).

`pip install` builds the package from source, so install the prerequisites before running it.
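
Before starting a source build, it can help to confirm that the build tools are visible on `PATH`. The following is a small standard-library sketch; the function name `check_build_prerequisites` is hypothetical, the compiler names it probes are common defaults rather than a list taken from RFX, and it only checks for presence, not for the minimum versions listed above.

```python
import shutil
import sys

def check_build_prerequisites() -> dict:
    """Report which build tools from the prerequisites list are on PATH."""
    return {
        "python>=3.7": sys.version_info >= (3, 7),
        "cmake": shutil.which("cmake") is not None,
        "nvcc (CUDA toolkit)": shutil.which("nvcc") is not None,
        # Any one of these common C++ compiler names is treated as sufficient.
        "c++ compiler": any(shutil.which(c) for c in ("g++", "clang++", "c++")),
    }

if __name__ == "__main__":
    for name, ok in check_build_prerequisites().items():
        print(f"{name:22s} {'found' if ok else 'MISSING'}")
```

Version requirements (CMake 3.12+, CUDA 11.0+, C++17) still need to be checked by hand, for example with `cmake --version` and `nvcc --version`.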

## Quick Start

```python
import numpy as np
import RFX as rf

# Load sample data
X, y = rf.load_wine()

# Train Random Forest
model = rf.RandomForestClassifier(
    ntree=100,
    compute_importance=True,
    compute_local_importance=True,
    compute_proximity=True,
    use_gpu=False  # Set to True for GPU acceleration
)

model.fit(X, y)

# Get predictions and metrics
oob_error = model.get_oob_error()
print(f"OOB Error: {oob_error:.4f}")

predictions = model.predict(X)
importance = model.feature_importances_()
local_imp = model.get_local_importance()

# Interactive visualization
rf.rfviz(
    rf_model=model,
    X=X,
    y=y,
    output_file="rfviz_example.html"
)
```
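
Once predictions are available, a confusion matrix can be tabulated with plain NumPy. This sketch does not call RFX; it assumes labels are integers in `0..n_classes-1`, and the helper name `confusion_matrix` and the toy labels are illustrative only.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same (row, col) pair repeats.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels standing in for y and model.predict(X).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()
```

Accuracy measured on the training data is optimistic; the out-of-bag error is the fairer estimate of generalization.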

## GPU Acceleration & QLORA

For large datasets, enable GPU acceleration and QLORA compression:

```python
import RFX as rf

# X, y: feature matrix and labels, loaded as in the Quick Start
# Large-scale proximity analysis with QLORA
model = rf.RandomForestClassifier(
    ntree=500,
    use_gpu=True,
    compute_proximity=True,
    use_qlora=True,
    rank=32,  # Low-rank approximation
    quant_mode="int8"
)

model.fit(X, y)

# Get low-rank factors (memory efficient)
A, B, rank = model.get_lowrank_factors()

# Compute MDS directly from factors (no reconstruction!)
mds_coords = model.compute_mds_from_factors(k=3)
```
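
Working from factors is cheap because a low-rank product never has to be expanded. As a sketch, assume the proximity matrix is approximated by `A @ B.T` with `A` and `B` of shape `(n_samples, rank)`; this layout is an assumption, not a documented property of `get_lowrank_factors()`, and the helper names are hypothetical. Matrix-vector products and single rows then cost O(n · rank) instead of O(n²).

```python
import numpy as np

def lowrank_matvec(A: np.ndarray, B: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Compute (A @ B.T) @ v without forming the n x n product."""
    return A @ (B.T @ v)

def lowrank_row(A: np.ndarray, B: np.ndarray, i: int) -> np.ndarray:
    """Row i of A @ B.T: approximate proximities of sample i to all samples."""
    return B @ A[i]

# Random stand-in factors, small enough to verify against the dense product.
rng = np.random.default_rng(0)
n, r = 200, 8
A = rng.random((n, r))
B = rng.random((n, r))
v = rng.random(n)

fast = lowrank_matvec(A, B, v)
dense = (A @ B.T) @ v
row_ok = np.allclose(lowrank_row(A, B, 5), (A @ B.T)[5])
```

Iterative eigensolvers need only matrix-vector products like this one, which is one plausible way to obtain an embedding from factors; the README does not say which method RFX uses.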

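**Memory savings:** For 100K samples, the full proximity matrix takes 74.5 GiB (80 GB), while rank-100 QLORA factors take about 19 MiB, roughly 4,000× less. Note that this figure is for rank 100, not the `rank=32` setting in the example above.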
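
The quoted sizes can be reproduced with simple arithmetic, assuming a dense float64 matrix (8 bytes per entry) and two int8 factors of shape `(n_samples, rank)` (1 byte per entry). This accounting is inferred from the numbers themselves and ignores any quantization scales or metadata; the function names are hypothetical.

```python
def full_matrix_bytes(n_samples: int, bytes_per_value: int = 8) -> int:
    """Dense n x n proximity matrix, float64 by default."""
    return n_samples * n_samples * bytes_per_value

def lowrank_bytes(n_samples: int, rank: int, bytes_per_value: int = 1) -> int:
    """Two n x rank factors, one byte per value for int8."""
    return 2 * n_samples * rank * bytes_per_value

n = 100_000
full = full_matrix_bytes(n)          # 80_000_000_000 bytes
r100 = lowrank_bytes(n, rank=100)    # 20_000_000 bytes
r32 = lowrank_bytes(n, rank=32)      # 6_400_000 bytes

print(f"full matrix: {full / 1e9:.0f} GB = {full / 2**30:.1f} GiB")
print(f"rank 100:    {r100 / 1e6:.0f} MB = {r100 / 2**20:.1f} MiB, {full // r100}x smaller")
print(f"rank 32:     {r32 / 1e6:.1f} MB, {full // r32}x smaller")
```

Under these assumptions, rank 100 gives the 4,000× ratio and rank 32 gives the 12,500× ratio (80 GB → 6.4 MB) listed under Key Features.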

## Documentation

For complete documentation, examples, and advanced usage, visit:
- **GitHub**: https://github.com/chriskuchar/RFX
- **Full README**: https://github.com/chriskuchar/RFX/blob/main/README.md

## License

MIT License. See the [LICENSE](https://github.com/chriskuchar/RFX/blob/main/LICENSE) file for details.

## Links

- **Source Code**: https://github.com/chriskuchar/RFX
- **Bug Reports**: https://github.com/chriskuchar/RFX/issues
- **Documentation**: https://github.com/chriskuchar/RFX/blob/main/README.md

