Metadata-Version: 2.4
Name: vfa
Version: 0.1.0
Summary: Variance Feature Analysis for binary classification feature selection
Author-email: mohdadil <mohdadil@live.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/nqmn/vfa
Project-URL: Repository, https://github.com/nqmn/vfa
Project-URL: Bug Tracker, https://github.com/nqmn/vfa/issues
Keywords: feature-selection,machine-learning,variance-analysis,classification
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Dynamic: license-file

# VFA - Variance Feature Analysis

A Python package for binary classification feature selection using variance-based analysis.

## Overview

Variance Feature Analysis (VFA) implements a feature selection method based on the Class-Variance Ratio (CVR). It scores each feature by the ratio of its between-class variance to its total variance, then keeps the highest-scoring features as the most discriminative ones for a binary classification task.

## Installation

```bash
pip install vfa
```

## Features

- Fast variance-based feature selection for binary classification
- Automatic feature ranking using Class-Variance Ratio (CVR)
- Weighted feature aggregation
- Compatible with scikit-learn workflows
- Lightweight with minimal dependencies (only NumPy)

## Usage

```python
from vfa import variance_feature_analysis
import numpy as np

# Example data
X = np.random.rand(100, 20)  # 100 samples, 20 features
y = np.random.randint(0, 2, 100)  # Binary labels

# Select top 5 features
X_selected, f_aggregated, selected_indices, scores = variance_feature_analysis(X, y, k=5)

print(f"Selected feature indices: {selected_indices}")
print(f"Feature scores: {scores[selected_indices]}")
```

## Parameters

- `X` (array-like): Training data features of shape (n_samples, n_features)
- `y` (array-like): Target labels of shape (n_samples,) - must be binary
- `k` (int, default=8): Number of top features to select
- `epsilon` (float, default=1e-12): Small constant to prevent division by zero
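
The effect of `epsilon` is easiest to see on a constant feature, where both variance terms are zero. The snippet below is a minimal sketch with hand-picked values for `B` (between-class variance) and `W` (within-class variance). It assumes `epsilon` is added to the denominator of the ratio; where exactly the package applies it is not stated here.

```python
import numpy as np

# Hand-picked variance terms for two hypothetical features.
# Feature 0 is constant (no variance at all); feature 1 is ordinary.
B = np.array([0.0, 0.25])   # between-class variance
W = np.array([0.0, 0.75])   # within-class variance
epsilon = 1e-12

# Without a guard, the constant feature gives 0 / 0, which is NaN.
with np.errstate(invalid="ignore"):
    unguarded = B / (B + W)

# Assumption: epsilon is added to the denominator, so the constant
# feature scores 0.0 and the other score is essentially unchanged.
guarded = B / (B + W + epsilon)

print(unguarded)
print(guarded)
```

A NaN score would make the ranking of features ill-defined, so guarding the division keeps constant features at the bottom of the ranking instead.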

## Returns

- `X_selected`: Selected feature subset
- `f_aggregated`: Weighted aggregation of selected features
- `selected_idx`: Indices of the selected features (unpacked as `selected_indices` in the usage example)
- `scores`: CVR scores for all features

## How It Works

The algorithm:

1. Computes the within-class variance (W) and between-class variance (B) of each feature
2. Calculates the Class-Variance Ratio, CVR = B / (B + W), for each feature
3. Selects the top-k features with the highest CVR scores
4. Returns the selected features and their weighted aggregation
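
The steps above can be sketched in plain NumPy. This is an illustrative sketch, not the package's source: the function name `cvr_sketch` is hypothetical, the variance terms use the class-proportion-weighted decomposition (so that B + W equals the total variance), and the aggregation weights are assumed to be the selected CVR scores normalized to sum to one. The package may define any of these differently.

```python
import numpy as np

def cvr_sketch(X, y, k=8, epsilon=1e-12):
    """Illustrative CVR-based selection (hypothetical helper, not vfa's code)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("y must contain exactly two classes")

    # Step 1: per-feature between-class (B) and within-class (W) variance,
    # each weighted by the class proportion p (assumed decomposition).
    overall_mean = X.mean(axis=0)
    B = np.zeros(X.shape[1])
    W = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        p = Xc.shape[0] / X.shape[0]
        B += p * (Xc.mean(axis=0) - overall_mean) ** 2
        W += p * Xc.var(axis=0)

    # Step 2: Class-Variance Ratio, guarded against division by zero.
    scores = B / (B + W + epsilon)

    # Step 3: indices of the k highest-scoring features, best first.
    k = min(k, X.shape[1])
    selected_idx = np.argsort(scores)[::-1][:k]
    X_selected = X[:, selected_idx]

    # Step 4: weighted aggregation (assumption: weights are the
    # selected scores normalized to sum to one).
    weights = scores[selected_idx] / (scores[selected_idx].sum() + epsilon)
    f_aggregated = X_selected @ weights

    return X_selected, f_aggregated, selected_idx, scores


# Usage: feature 2 is shifted by class, so it should rank first.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 6))
X[:, 2] += 5.0 * y
X_sel, f_agg, idx, scores = cvr_sketch(X, y, k=3)
print(idx, scores.round(3))
```

With this decomposition every score lies between 0 and 1: a value near 1 means almost all of a feature's variance comes from the gap between the two class means, and a value near 0 means the classes overlap almost completely on that feature.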

## Requirements

- Python >=3.8
- NumPy >=1.20.0

## Development

```bash
# Clone the repository
git clone https://github.com/nqmn/vfa.git
cd vfa

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest
```

## License

MIT License - see LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Citation

If you use this package in your research, please cite:

```bibtex
@software{vfa2024,
  title={VFA: Variance Feature Analysis},
  author={Mohd Adil Mokti},
  year={2026},
  url={https://github.com/nqmn/vfa}
}
```
