Metadata-Version: 2.4
Name: opforch
Version: 2.0.0
Summary: PyTorch-Inspired Optimum-Path Forest Classifier
Home-page: https://github.com/gugarosa/opforch
Author: Gustavo de Rosa
Author-email: gustavo.rosa@unesp.br
License: Apache 2.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.5
Requires-Dist: torch>=2.0.0
Provides-Extra: tests
Requires-Dist: coverage; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-pep8; extra == "tests"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: summary

# OPForch: A PyTorch-Powered Optimum-Path Forest Classifier

[![Latest release](https://img.shields.io/github/release/gugarosa/opforch.svg)](https://github.com/gugarosa/opforch/releases)
[![Open issues](https://img.shields.io/github/issues/gugarosa/opforch.svg)](https://github.com/gugarosa/opforch/issues)
[![License](https://img.shields.io/github/license/gugarosa/opforch.svg)](https://github.com/gugarosa/opforch/blob/master/LICENSE)

## Welcome to OPForch.

*Note that this implementation relies purely on the standard [LibOPF](https://github.com/jppbsi/LibOPF). Therefore, if you use our package, please also cite the original LibOPF [authors](https://github.com/jppbsi/LibOPF/wiki/Additional-information).*

OPForch is a **PyTorch-based** implementation of the Optimum-Path Forest (OPF) classifier, migrated from the original [OPFython](https://github.com/gugarosa/opfython) package. By replacing per-node Python objects with dense tensors and scalar Numba loops with batched tensor operations, OPForch delivers **massive speedups** while maintaining **zero prediction mismatches** against the reference implementation.

### Key Highlights

| Metric | Result |
|--------|--------|
| **Accuracy Parity** | 0 prediction mismatches across all 4 classifiers |
| **Predict Speedup** | Up to **484×** faster at N=10,000 |
| **Fit Speedup** | Up to **19×** faster at N=10,000 |
| **Distance Matrix** | Up to **413×** faster (batched tensor vs N² scalar loop) |
| **GPU Acceleration** | **12.7×** additional speedup on RTX 4070 for distance computation |
| **Device Support** | CPU, CUDA, and Multi-GPU via `DeviceManager` |

### Use OPForch if you need:

* Graph-based classification without hyperparameter tuning
* Deterministic training with competitive accuracy
* GPU-accelerated distance computation and prediction
* A drop-in replacement for OPFython with orders-of-magnitude speedups

OPForch is compatible with: **Python 3.8+** and **PyTorch 2.0+**.

---

## Package Structure

```
opforch/
├── core/
│   ├── heap.py          # Tensor-backed binary heap
│   ├── subgraph.py      # Dense tensor columns (13 state tensors)
│   └── opf.py           # Abstract base (torch.save/load, device)
├── math/
│   ├── distance.py      # 47 batched (N,D)×(M,D)→(N,M) distance metrics
│   ├── general.py       # Accuracy, confusion matrix, normalize, purity
│   └── random.py        # Tensor-based random generators
├── models/
│   ├── supervised.py        # MST + competition + batched predict
│   ├── knn_supervised.py    # KNN density clustering + k-selection
│   ├── semi_supervised.py   # Labeled + unlabeled propagation
│   └── unsupervised.py      # Density clustering + normalized cut
├── stream/
│   ├── loader.py        # CSV/TXT/JSON → torch.Tensor
│   ├── parser.py        # Extract features + labels
│   └── splitter.py      # Train/test split
├── subgraphs/
│   └── knn.py           # KNNSubgraph (torch.topk, vectorized PDF)
├── utils/
│   ├── constants.py     # EPSILON, FLOAT_MAX, status codes
│   ├── converter.py     # Binary OPF format converters
│   ├── device.py        # DeviceManager (CPU/GPU/multi-GPU)
│   ├── exception.py     # Custom exception hierarchy
│   └── logging.py       # Timed rotating file logger
├── report/              # Migration report, benchmarks, and plots
└── examples/            # Usage scripts for all 4 classifiers
```
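
The `(N,D)×(M,D)→(N,M)` shape contract noted for `math/distance.py` can be illustrated with plain broadcasting. The sketch below is not the package's code: `pairwise_squared_euclidean` is a hypothetical name, and squared Euclidean distance is chosen only because it is the simplest metric to show.

```python
import torch


def pairwise_squared_euclidean(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: squared Euclidean distance between every row pair.

    a has shape (N, D), b has shape (M, D); the result has shape (N, M).
    """
    # Broadcasting (N, 1, D) against (1, M, D) yields all pairwise differences.
    diff = a[:, None, :] - b[None, :, :]  # (N, M, D)
    return (diff * diff).sum(dim=-1)      # (N, M)


a = torch.arange(6, dtype=torch.float32).reshape(3, 2)  # rows: [0,1], [2,3], [4,5]
b = torch.tensor([[0.0, 0.0], [1.0, 1.0]])
dist = pairwise_squared_euclidean(a, b)
print(dist.shape)  # torch.Size([3, 2])
```

This form materializes an `(N, M, D)` intermediate, so it trades memory for the absence of Python-level loops; for large inputs a chunked variant would likely be needed.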

---

## Installation

Install from source:

```bash
git clone https://github.com/gugarosa/opforch.git
cd opforch
pip install -e .
```

For GPU support, install PyTorch with CUDA:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu124
```

---

## Quick Start

### Supervised Classification

```python
import torch
from opforch.models import SupervisedOPF
from opforch.stream import loader, parser, splitter

# Load data
data = loader.load_txt("data/boat.txt")
X, Y = parser.parse_loader(data)
X_train, X_test, Y_train, Y_test = splitter.split(X, Y, percentage=0.5)

# Train and predict (CPU)
opf = SupervisedOPF(distance="log_squared_euclidean")
opf.fit(X_train, Y_train)
predictions = opf.predict(X_test)

# GPU: just change the device
opf_gpu = SupervisedOPF(distance="euclidean", device="cuda:0")
opf_gpu.fit(X_train.cuda(), Y_train.cuda())
predictions = opf_gpu.predict(X_test.cuda())
```
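
The GPU lines above assume a CUDA device is present. A minimal sketch for choosing the device at runtime follows; it uses only standard PyTorch calls, and `pick_device` is a hypothetical helper rather than part of OPForch (the package's own `DeviceManager` may already cover this).

```python
import torch


def pick_device(preferred: str = "cuda:0") -> torch.device:
    """Hypothetical helper: use the preferred CUDA device if one exists, else CPU."""
    if preferred.startswith("cuda") and torch.cuda.is_available():
        return torch.device(preferred)
    return torch.device("cpu")


device = pick_device()

# Move data to the selected device once, before fitting or predicting.
X = torch.rand(8, 3).to(device)
Y = torch.randint(0, 2, (8,)).to(device)
print(device.type, X.device.type)
```

The resulting `device` could then be passed wherever the examples above use `device="cuda:0"`, assuming the constructor accepts a device string such as `str(device)`.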

### Available Classifiers

| Classifier | Description |
|-----------|-------------|
| `SupervisedOPF` | MST-based prototype detection + cost competition |
| `KNNSupervisedOPF` | k-NN density clustering with validation-driven k |
| `SemiSupervisedOPF` | Extends supervised with unlabeled data propagation |
| `UnsupervisedOPF` | Density-based clustering with normalized cut |

All classifiers support `fit()`, `predict()`, `save()`, and `load()`, and accept a `device` parameter for CPU/GPU execution.

---

## Benchmarks

Run the benchmark suite to compare performance on your hardware:

```bash
# Baseline benchmarks (47 metrics, 4 models, scaling)
python report/benchmark.py

# Extended benchmarks (up to N=10K, GPU, dimensionality)
python report/benchmark_extended.py

# Generate plots
python report/plot_benchmarks.py
python report/plot_extended.py
```

For the full migration report with detailed analysis, see [`report/REPORT.md`](report/REPORT.md).

---

## Architecture

The key architectural change from OPFython is the elimination of per-node Python objects in favor of dense tensor columns:

```
OPFython:  subgraph.nodes[i].cost = 5.0        # Python object attribute
OPForch:   subgraph.costs[i] = 5.0             # Tensor element (GPU-ready)
```
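
The contrast above can be made concrete with a small, self-contained sketch. The `Node` class and the `costs`/`labels` column names are illustrative only and are not taken from either package's source.

```python
from dataclasses import dataclass

import torch


@dataclass
class Node:
    """Illustrative per-node object, one Python instance per sample."""
    cost: float = 0.0
    label: int = 0


# Object layout: updates and queries touch one Python object at a time.
nodes = [Node() for _ in range(4)]
nodes[2].cost = 5.0
cheap_count_objects = sum(1 for n in nodes if n.cost < 1.0)

# Column layout: one tensor per attribute, indexed by node id.
costs = torch.zeros(4)
labels = torch.zeros(4, dtype=torch.long)
costs[2] = 5.0
cheap_count_tensor = int((costs < 1.0).sum())  # single vectorized comparison

print(cheap_count_objects, cheap_count_tensor)  # 3 3
```

With the column layout, moving the whole graph state to another device is one `.to(device)` call per tensor rather than a traversal of objects.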

Prediction is fully batched; a single tensor operation replaces the O(N×M) Python loop:

```python
dist_matrix = distance_fn(train_features, test_features)      # (N, M)
path_costs = torch.maximum(train_costs[:, None], dist_matrix)  # (N, M)
predictions = train_labels[path_costs.argmin(dim=0)]           # (M,)
```
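
The three lines above are a fragment. A runnable sketch of the same rule is given below, assuming Euclidean distance via `torch.cdist` in place of `distance_fn`; `batched_predict` and the toy tensors are hypothetical and serve only to show how a training node's cost can change the winner.

```python
import torch


def batched_predict(train_x, train_costs, train_labels, test_x):
    """Hypothetical helper: label each test row by its minimum path cost."""
    dist_matrix = torch.cdist(train_x, test_x)                      # (N, M)
    # Path cost through node i is max(cost of i, distance from i to the sample).
    path_costs = torch.maximum(train_costs[:, None], dist_matrix)   # (N, M)
    return train_labels[path_costs.argmin(dim=0)]                   # (M,)


train_x = torch.tensor([[0.0, 0.0], [10.0, 10.0]])
train_labels = torch.tensor([0, 1])
train_costs = torch.tensor([8.0, 0.0])  # node 0 is expensive to route through

test_x = torch.tensor([[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]])
print(batched_predict(train_x, train_costs, train_labels, test_x))
# tensor([0, 1, 1])
```

The midpoint `[5, 5]` is equidistant from both training nodes, yet it takes label 1 because node 0's own cost (8.0) exceeds the distance (about 7.07), which is the effect of the `torch.maximum` step.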

For the complete architecture documentation, see [`ARCHITECTURE.md`](ARCHITECTURE.md).

---

## Citation

If you use OPForch to fulfill any of your needs, please cite us:

```
J. P. Papa, A. X. Falcão and C. T. N. Suzuki.
Supervised Pattern Classification based on Optimum-Path Forest.
International Journal of Imaging Systems and Technology (2009).
```

---

## Datasets

Looking for datasets? We have some pre-loaded into OPF file format in the `data/` directory. More are available at [recogna.tech](http://recogna.tech).

---

## Support

If you ever need to report a bug, talk to us, or suggest improvements, please open an issue. We will do our best to help.

---
