Metadata-Version: 2.4
Name: womac
Version: 0.1.1
Summary: Rank experts in prediction competitions using the Wisdom of the Most Accurate Crowd (WOMAC) algorithm
Author: Sid Srinivasan
License: MIT License
        
        Copyright (c) 2025 Siddarth Srinivasan
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.12.7
Requires-Dist: numpy>=2.2.5
Requires-Dist: omegaconf>=2.3.0
Requires-Dist: pandas>=2.2.3
Requires-Dist: torch>=2.7.0
Requires-Dist: tqdm>=4.67.1
Description-Content-Type: text/markdown

# WOMAC

WOMAC (Wisdom of the Most Accurate Crowd) is a Python library for identifying prediction competition winners. "Experts" (predictors) are ranked based on their performance against reference predictions derived from jackknifed peer experts' predictions and observed targets.


---

## Installation

We recommend using [`pyenv`](https://github.com/pyenv/pyenv) for Python installation and [`uv`](https://docs.astral.sh/uv/getting-started/installation/) for Python package management:

1. Clone the repo:
   ```bash
   git clone https://github.com/sidsrinivasan/womac.git
   cd womac
   ```

2. Create & activate a virtual environment:
   ```bash
   pyenv install 3.12.7
   pyenv local 3.12.7
   uv venv
   source .venv/bin/activate
   ```

3. Install dependencies:

   With **uv**:
   ```bash
   uv sync
   ```

---

## Quickstart

1. __Binary Outcomes__
```python
import pandas as pd
import numpy as np
from womac import Womac
from womac.config import WomacConfig

womac_config = WomacConfig.binary_outcome()
model = Womac()
X = pd.read_csv("X.csv").to_numpy()  # (m, n) matrix of expert predictions
Y = pd.read_csv("Y.csv").to_numpy()  # (m,) vector of target values
results = model.score_competition(X, Y, womac_config=womac_config)
```

2. __Continuous Outcomes__
```python
import pandas as pd
import numpy as np
from womac import Womac
from womac.config import WomacConfig

womac_config = WomacConfig.continuous_outcome()
model = Womac()
X = pd.read_csv("X.csv").to_numpy()  # (m, n) matrix of expert predictions
Y = pd.read_csv("Y.csv").to_numpy()  # (m,) vector of target values
results = model.score_competition(X, Y, womac_config=womac_config)
```

---

### Configuration Objects

All configuration classes live in [`src/womac/config.py`](src/womac/config.py).


1. **WomacConfig**
   Top-level WOMAC configuration, used in [`Womac.score_competition`](src/womac/womac.py#L333):
   - `target` (Target): Whether the target labels/outcomes are `BINARY` or `CONTINUOUS`.
   - `reference_pool_config` (ReferencePoolConfig): Configuration for selecting the reference pool.
   - `feature_config` (FeatureConfig): Configuration for data processing (whether to use logits, feature construction).
   - `model_config` (ModelConfig): Configuration for choice of model and model training options.

   Convenience constructors:
   - `WomacConfig.binary_outcome()`: Preset for binary outcomes. Applies a logit transformation to predictions and defaults to `FitMethod.UNIFORM`; thus the WOMAC reference is a uniform geometric pooling of the selected reference pool.
   - `WomacConfig.continuous_outcome()`: Preset for continuous outcomes. Defaults to `FitMethod.UNIFORM`; thus the WOMAC reference is a uniform arithmetic pooling of the selected reference pool.


2. **ReferencePoolConfig**
   Configuration for how the reference pool of peer experts is selected in [`Womac.select_reference_pool`](src/womac/womac.py#L212). Parameters:
   - `jackknife_row` (bool, default=`True`): If `True`, select a potentially different reference pool for each task by holding that task out of the marginal feature screening (MFS) calculations.
   - `jackknife_col` (bool, default=`True`): If `True`, select a potentially different reference pool for each expert by holding that expert out of the MFS calculations.
   - `drop_row_nans` (bool, default=`True`): Drop experts with missing predictions for the held-out row. Only valid when `jackknife_row=True`.
   - `missing_data_strategy` (MissingDataStrategy, default=`IMPUTE_OUTCOME_MEAN`): Strategy for handling missing entries in the error computation and reference pool:
     - `MissingDataStrategy.IMPUTE_COLUMN_MEAN`: impute with the mean of each expert's non-missing predictions.
     - `MissingDataStrategy.IMPUTE_OUTCOME_MEAN`: impute with the overall mean of observed targets.
     - `MissingDataStrategy.IGNORE_NANS`: ignore `NaN` values when computing errors.
   - `mfs_config` (`MFSConfig`, default=`MFSConfig()`): Configuration for marginal feature screening (MFS): the screening method, the cutoff, and the sweep parameters, which are used only when `mfs_method=AUTO`.


3. **MFSConfig**
   Configuration for marginal feature screening (MFS) parameters, used in `ReferencePoolConfig`. Parameters:
   - `mfs_method` (`MFSMethod | None`, default=`MFSMethod.PERCENTILE`): Method for marginal feature screening to filter peer experts. Options:
     - `MFSMethod.PERCENTILE`: filter peers by error percentile (lower errors retained).
     - `MFSMethod.THRESHOLD`: filter peers by an absolute error threshold.
     - `MFSMethod.AUTO`: identify the best cutoff by cross-validation.
     - `MFSMethod.NONE`: disable screening and include all experts.
   - `mfs_cutoff` (float, default=`15`): Cutoff value for MFS. For `PERCENTILE`, cutoff is in (0, 100); for `THRESHOLD`, cutoff is positive. Ignored if `AUTO` or `NONE`.
   - `start_sweep` (float, default=`1`): Starting value for automatic cutoff sweep (must be > 0) when `MFSMethod.AUTO` is used.
   - `end_sweep` (float, default=`100`): Ending value for sweep (must be > `start_sweep` and ≤ 100) when `MFSMethod.AUTO` is used.
   - `step_size` (float, default=`1`): Step size for sweep (must be > 0) when `MFSMethod.AUTO` is used.


4. **FeatureConfig**
   Controls how features are constructed from predictions and missing data in [`src/womac/womac.py`](src/womac/womac.py#L477):
   - `missing_data_strategy` (MissingDataStrategy, default=`IMPUTE_OUTCOME_MEAN`): Strategy for handling missing entries when scoring experts. Options are (`IGNORE_NANS` not allowed):
     - `IMPUTE_COLUMN_MEAN`: impute missing predictions with the mean of each expert's non-missing predictions.
     - `IMPUTE_OUTCOME_MEAN`: impute missing predictions with the overall mean of observed targets.
   - `missingness_features` (FeaturizeAs | None, default=`None`): Determines how to encode missing values:
     - `None`: do not add missingness features.
     - `AGGREGATE`: add a single feature summarizing missingness (overall fraction of missing predictions) on each task.
     - `SEPARATE`: add one missingness indicator per expert, capturing which expert's predictions were missing on each task.
   - `pred_features` (FeaturizeAs, default=`AGGREGATE`): Determines how to encode prediction values:
     - `AGGREGATE`: collapse predictions of all experts into a summary statistic (mean) per reference split, yielding one feature.
     - `SEPARATE`: include each expert's prediction as a separate feature, yielding as many features as experts.
   - `logit_feats` (bool, default=`False`): Apply a logit (inverse-sigmoid) transform to the features; only valid for binary outcomes.
   - `score_as_probs` (bool, default=`False`): If `True` and `logit_feats=True`, compute MSE in probability space; ignored otherwise.

5. **ModelConfig**
   Configuration parameters for model fitting in [`src/womac/model.py`](src/womac/model.py) and [`Womac._fit`](src/womac/womac.py#L586):
   - `fit_method` (FitMethod): Strategy for fitting:
     - `UNIFORM`: uniform averaging of experts' predictions (no training).
     - `WEIGHTED`: learn weights over experts (use training).
   - `epochs` (int, default=`500`): Number of training epochs for optimization.
   - `lr` (float, default=`0.1`): Learning rate for the optimizer.
   - `l2` (float, default=`0.1`): L2 regularization coefficient (weight decay).
   - `log_interval` (int, default=`50`): Number of epochs between training log outputs.
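
As a rough illustration of the `PERCENTILE` and `THRESHOLD` screening options described above, the following sketch scores each expert by MSE against the targets and keeps the best ones. It is not the library's implementation: the helper names (`expert_mse`, `screen_percentile`, `screen_threshold`) are hypothetical, jackknifing and missing data are ignored, and the rounding rule for the percentile cutoff (round up, keep at least one expert) is an assumption.

```python
import math


def expert_mse(X, Y):
    """MSE of each expert (column of X) against the targets Y."""
    m, n = len(X), len(X[0])
    return [sum((X[i][k] - Y[i]) ** 2 for i in range(m)) / m for k in range(n)]


def screen_percentile(errors, cutoff):
    """Keep the `cutoff` percent of experts with the lowest error.

    Assumption: the count is rounded up and at least one expert is kept.
    """
    n_keep = max(1, math.ceil(len(errors) * cutoff / 100))
    order = sorted(range(len(errors)), key=lambda k: errors[k])
    keep = set(order[:n_keep])
    return [k in keep for k in range(len(errors))]


def screen_threshold(errors, cutoff):
    """Keep every expert whose error is at most the absolute `cutoff`."""
    return [e <= cutoff for e in errors]


# Toy data: 3 tasks (rows), 4 experts (columns), binary targets.
X = [
    [0.9, 0.6, 0.2, 0.5],
    [0.1, 0.4, 0.7, 0.5],
    [0.8, 0.7, 0.4, 0.5],
]
Y = [1.0, 0.0, 1.0]

errors = expert_mse(X, Y)                        # roughly [0.02, 0.14, 0.50, 0.25]
mask_pct = screen_percentile(errors, cutoff=50)  # best half of the experts
mask_thr = screen_threshold(errors, cutoff=0.3)  # experts with MSE <= 0.3

print(mask_pct)  # [True, True, False, False]
print(mask_thr)  # [True, True, False, True]
```

The two masks differ because the percentile rule fixes how many experts are retained, while the threshold rule fixes how accurate a retained expert must be.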

### Notes

1. When selecting the reference pool with MFS, predictions are compared directly against targets without applying a logit transformation, since in the binary case the targets are 0/1 labels, whose logits are not finite and would produce NaNs.

2. For both reference pool selection and expert scoring, imputation is applied before any logit transformation. In the binary case, missing data is therefore always imputed by arithmetic pooling.

3. In the binary case, if `logit_feats=True` and `score_as_probs=False`, the reference matrix is computed in logit space.
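
Notes 2 and 3 can be illustrated with a small sketch for a single task in the binary case: the missing prediction is imputed in probability space first, and only then are the predictions pooled, either arithmetically or in logit space (geometric pooling). The function names are hypothetical, and `IMPUTE_OUTCOME_MEAN` is mimicked here with a plain mean of the observed targets; the library's own code may differ.

```python
import math


def logit(p):
    """Inverse sigmoid; only defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def impute_outcome_mean(preds, targets):
    """Replace NaN predictions with the mean of the observed targets."""
    fill = sum(targets) / len(targets)
    return [fill if math.isnan(p) else p for p in preds]


def arithmetic_pool(probs):
    """Uniform average in probability space."""
    return sum(probs) / len(probs)


def geometric_pool(probs):
    """Uniform average in logit space, mapped back to a probability."""
    return sigmoid(sum(logit(p) for p in probs) / len(probs))


targets = [1.0, 0.0, 1.0, 1.0]      # observed binary outcomes (mean 0.75)
preds = [0.9, float("nan"), 0.6]    # one task, three experts, one missing

filled = impute_outcome_mean(preds, targets)  # imputation happens before any logit
pooled_arith = arithmetic_pool(filled)
pooled_geo = geometric_pool(filled)

print(filled)                                          # [0.9, 0.75, 0.6]
print(round(pooled_arith, 4), round(pooled_geo, 4))    # 0.75 0.7745
```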

### Public API

The `Womac` class exposes the following methods for running the full pipeline or individual steps. All live in `src/womac/womac.py`.

1. **Constructor**
   ```python
   Womac(
       min_responses_per_task: int = 2,
       min_responses_per_expert: int = 2,
       device: str = DEVICE,
       seed: Optional[int] = None,  # random seed for reproducibility (torch & numpy)
   )
   ```
   - `min_responses_per_task`: Minimum number of experts required per task; tasks with fewer responses are dropped.
   - `min_responses_per_expert`: Minimum number of tasks per expert; experts with fewer responses are dropped.
   - `device`: Torch device for tensor operations (`cpu`, `cuda`, or `mps`).
   - `seed`: If provided, sets torch random seed for reproducibility.


2. **select_reference_pool**
   ```python
   mask: torch.BoolTensor = model.select_reference_pool(
       X: torch.Tensor,  # shape (m, n)
       Y: torch.Tensor,  # shape (m,)
       reference_pool_config: ReferencePoolConfig
   )
   ```
   - Applies missing-data rules and performs marginal feature screening to choose which experts serve as references for each task-expert pair.
   - Returns boolean mask of shape `(m, dim2, n)`, where `dim2 = n` if `jackknife_col=True` else `1`.
   - `mask[i, j, k] = True` indicates that expert `k` is included in the reference pool for the `i`th row-jackknife split and `j`th column-jackknife split. (`i` and `j` may be broadcast to `m` and `n` respectively if `jackknife_row` or `jackknife_col` is `False`.)

3. **get_reference_matrix**
   ```python
   Z: torch.Tensor = model.get_reference_matrix(
       X: torch.Tensor,  # shape (m, n)
       Y: torch.Tensor,  # shape (m,)
       womac_config: WomacConfig
   )
   ```
   - Runs `select_reference_pool` then computes reference predictions according to `FeatureConfig`.
   - Returns `Z` of shape `(m, dim2)`, the aggregated reference solution, where `dim2 = n` if `jackknife_col=True` else `1`.

4. **tune_mfs**
   ```python
   best_cutoff: float = model.tune_mfs(
       X: torch.Tensor,  # shape (m, n)
       Y: torch.Tensor,  # shape (m,)
       womac_config: WomacConfig
   )
   ```
   - Runs a sweep over MFS cutoffs to find the percentile cutoff where MSE between predictions and targets is minimized.
   - Returns the best cutoff value for marginal feature screening.

5. **score_competition**
   ```python
   result: WomacResult = model.score_competition(
       X: TensorLike,    # shape (m, n)
       Y: TensorLike,    # shape (m,)
       womac_config: WomacConfig
   )
   ```
   - Executes the full WOMAC workflow: tune MFS cutoff (optional) → identify reference pool and compute reference solutions → score experts → rank experts.
   - Returns `WomacResult`, which includes:
     - `womac_ranked_indices`: `torch.Tensor` (n,) expert IDs sorted by WOMAC MSE.
     - `womac_ranked_scores`: `torch.Tensor` (n,) final ranking scores by WOMAC MSE.
     - `womac_scores`: `torch.Tensor` (n,) raw WOMAC MSE of each expert per original indexing.
     - `outcome_ranked_indices`: `torch.Tensor` (n,) expert IDs sorted by outcome MSE.
     - `outcome_ranked_scores`: `torch.Tensor` (n,) final ranking scores by outcome MSE.
     - `outcome_scores`: `torch.Tensor` (n,) raw outcome MSE of each expert per original indexing.
     - `reference_matrix`: `torch.Tensor` (m, dim2) used to derive final scores (where `dim2` is `n` if `jackknife_col=True`, `1` otherwise).
     - `fitted_model`: `ModelResult`, which contains `ModelResult.coef_`, `ModelResult.bias_`, `ModelResult.train_losses`, and `ModelResult.val_losses`.

   **`WomacResult` helper methods & properties**
   - `result.winner(by_womac: bool = True)`: (original) index of the top expert (defaults to using the WOMAC score).
   - `result.reference_solution`: aggregated reference for each task (and potentially each expert, if `jackknife_col=True`).
   - `result.reference_for_expert(idx)`: full reference solution on the `m` tasks for the expert at original index `idx`.
   - `result.top_k(k, by_womac)`: top-`k` expert indices (defaults to using the WOMAC score).
   - `result.score_of(idx, by_womac)`, `result.rank_of(idx, by_womac)`: individual expert stats (defaults to using the WOMAC score).
   - `result.leaderboard(limit: int = None, by_womac: bool = True)`: str of rankings (defaults to using the WOMAC score).
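
To make the workflow concrete, here is a deliberately simplified, pure-Python sketch of column-jackknifed scoring with uniform pooling. It assumes complete data, skips MFS and row jackknifing, and uses hypothetical helper names. The variable names only mirror the `WomacResult` fields listed above; they are not produced by the library.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def column(X, k):
    """Predictions of expert k on every task."""
    return [row[k] for row in X]


def jackknife_reference(X, j):
    """Uniform mean of all experts except j, for every task."""
    n = len(X[0])
    return [sum(row[k] for k in range(n) if k != j) / (n - 1) for row in X]


def rank(scores):
    """Expert indices sorted from lowest (best) to highest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k])


# Toy data: 3 tasks (rows), 3 experts (columns), binary targets.
X = [
    [0.9, 0.6, 0.2],
    [0.1, 0.4, 0.7],
    [0.8, 0.7, 0.4],
]
Y = [1.0, 0.0, 1.0]
n = len(X[0])

# Each expert is scored against a reference built only from its peers.
womac_scores = [mse(column(X, j), jackknife_reference(X, j)) for j in range(n)]
# For comparison, each expert is also scored directly against the targets.
outcome_scores = [mse(column(X, j), Y) for j in range(n)]

womac_ranked_indices = rank(womac_scores)
outcome_ranked_indices = rank(outcome_scores)

print(womac_ranked_indices)    # [1, 0, 2]
print(outcome_ranked_indices)  # [0, 1, 2]
```

On this toy data the two rankings happen to differ: with only three unscreened experts, the peer-based reference sits closest to the middle expert.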

### Example

See `notebooks/tutorial.ipynb` for an example.

---

## Command-line Interface

WOMAC can be run from the terminal using the `womac` entrypoint installed via the `pyproject.toml` script. Use the following flags:

```bash
womac \
  --predictions /path/to/predictions.csv \
  --targets /path/to/targets.csv \
  --config   /path/to/config.yaml \
  [--output  /path/to/leaderboard.csv] \
  [--top_k <int>]
```

Flags:
- `--predictions, -p`: Path to your predictions matrix (CSV or NumPy `.npy`).
- `--targets,     -t`: Path to your target values (CSV or `.npy`).
- `--config,      -c`: YAML configuration file defining `ReferencePoolConfig`, `FeatureConfig`, `ModelConfig`, `fit_method`, and `seed`.
- `--output,      -o`: (optional) Path to save the expert leaderboard as CSV.
- `--top_k,       -k`: (optional) Log leaderboard with top K expert indices (default: `25`).

A sample `config.yaml` skeleton is provided at the project root. Edit the sections to customize your run.

```yaml
# sample womac configuration file

# WOMAC settings
target: "BINARY"                 # Options: BINARY, CONTINUOUS

# Reference pool selection settings
reference_pool_config:
  jackknife_row: true
  jackknife_col: true
  drop_row_nans: true
  missing_data_strategy: "IMPUTE_OUTCOME_MEAN"  # Options: IMPUTE_COLUMN_MEAN, IMPUTE_OUTCOME_MEAN, IGNORE_NANS
  mfs_config:
    mfs_method: "AUTO"            # Options: AUTO, PERCENTILE, THRESHOLD, NONE
    mfs_cutoff: 15                # ignored if mfs_method is NONE or AUTO
    start_sweep: 1
    end_sweep: 100
    step_size: 1

# Feature construction settings
feature_config:
  missing_data_strategy: "IMPUTE_OUTCOME_MEAN"  # Options: IMPUTE_COLUMN_MEAN, IMPUTE_OUTCOME_MEAN
  missingness_features: null     # Options: AGGREGATE, SEPARATE, null
  pred_features: "AGGREGATE"     # Options: AGGREGATE, SEPARATE
  logit_feats: true              # If true, applies logit transformation to features (geometric pooling)
  score_as_probs: true           # Ignored if logit_feats is false; determines whether MSE is computed in logit space or probability space

# Model training settings
model_config:
  fit_method: "UNIFORM"          # Options: UNIFORM, WEIGHTED
  epochs: 500       # Number of training epochs, ignored if fit_method is UNIFORM
  lr: 0.1           # Learning rate, ignored if fit_method is UNIFORM
  l2: 0.1           # L2 regularization coefficient, ignored if fit_method is UNIFORM
  log_interval: 50  # Epoch interval for logging, ignored if fit_method is UNIFORM

# WOMAC Misc settings
min_responses_per_task: 2
min_responses_per_expert: 2
device: "cpu"                     # Options: "cpu", "cuda", "mps"
seed: 42
```
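
The sweep fields in `mfs_config` (`start_sweep`, `end_sweep`, `step_size`) define a grid of candidate percentile cutoffs. The sketch below shows one plausible way such a grid could be evaluated on toy continuous data: screen experts at each cutoff, pool the survivors uniformly, and keep the cutoff with the lowest MSE against the targets. It is an in-sample simplification with hypothetical helper names, and it does not reproduce the library's cross-validation or jackknifing.

```python
import math


def expert_mse(X, Y):
    """MSE of each expert (column of X) against the targets Y."""
    m, n = len(X), len(X[0])
    return [sum((X[i][k] - Y[i]) ** 2 for i in range(m)) / m for k in range(n)]


def percentile_pool(errors, cutoff):
    """Indices of the `cutoff` percent lowest-error experts (at least one)."""
    n_keep = max(1, math.ceil(len(errors) * cutoff / 100))
    order = sorted(range(len(errors)), key=lambda k: errors[k])
    return order[:n_keep]


def pooled_mse(X, Y, pool):
    """MSE between the uniform mean of the pooled experts and the targets."""
    pooled = [sum(row[k] for k in pool) / len(pool) for row in X]
    return sum((p - y) ** 2 for p, y in zip(pooled, Y)) / len(Y)


def sweep_cutoffs(X, Y, start_sweep, end_sweep, step_size):
    """Try every cutoff on the grid and return the best one with its loss."""
    errors = expert_mse(X, Y)
    # Small epsilon guards against float error when counting grid points.
    n_steps = int(math.floor((end_sweep - start_sweep) / step_size + 1e-9)) + 1
    best_cutoff, best_loss = None, math.inf
    for i in range(n_steps):
        cutoff = start_sweep + i * step_size
        loss = pooled_mse(X, Y, percentile_pool(errors, cutoff))
        if loss < best_loss:
            best_cutoff, best_loss = cutoff, loss
    return best_cutoff, best_loss


# Toy data: 3 tasks (rows), 4 experts (columns), continuous targets.
X = [
    [3.0, 1.2, 6.0, 0.0],
    [5.0, 3.2, 0.0, 8.0],
    [7.0, 5.2, 9.0, 2.0],
]
Y = [2.0, 4.0, 6.0]

best_cutoff, best_loss = sweep_cutoffs(X, Y, start_sweep=25, end_sweep=100, step_size=25)
print(best_cutoff, round(best_loss, 4))  # 50 0.01
```

Here the two most accurate experts err in opposite directions, so pooling them beats either one alone, and the sweep settles on the 50th percentile.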

---

## Testing

This project uses `pytest` and includes coverage for all major functionality.

```bash
# from project root, with venv activated
uv run pytest
```

---

## License

This project is licensed under the [MIT License](LICENSE).

---

## References

- [`src/womac/womac.py`](src/womac/womac.py) – core algorithm
- [`src/womac/reference_pool.py`](src/womac/reference_pool.py) – reference‐pool selection routines
- [`src/womac/model.py`](src/womac/model.py) – regression models
- [`src/womac/config.py`](src/womac/config.py) – configuration/dataclasses
- [`tests/test_womac.py`](src/womac/tests/test_womac.py) – unit tests
