Metadata-Version: 2.1
Name: fedops-dataset
Version: 0.3.4
Summary: Local-first dataset toolkit for multimodal federated learning artifacts (partition/feature/simulation)
Author: FedOps Dataset Team
License: MIT
Keywords: federated-learning,multimodal,dataset,huggingface,fedops
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: huggingface-hub>=0.25.0
Requires-Dist: typing-extensions>=4.9.0

# fedops-dataset

`fedops-dataset` is a local-first dataset toolkit for multimodal federated learning (FL) experiments (FedMS2-v8 style).

It supports:
- raw data bootstrap (`fetch-raw`)
- partition/feature/simulation generation (`create-v8`)
- Python API for runtime-driven usage (`FedOpsLocalDataset`)

Python requirement: `>=3.8`

## Install

```bash
pip install fedops-dataset
```

## Local-First Quickstart

### 1) Fetch or set up raw data

```bash
# CREMA-D
fedops-dataset fetch-raw --dataset crema_d --data-root /path/to/fed_multimodal/data

# PTB-XL
fedops-dataset fetch-raw --dataset ptb-xl --data-root /path/to/fed_multimodal/data

# Hateful Memes (auto-download from HF repo)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-repo-id neuralcatcher/hateful_memes \
  --hateful-memes-revision main

# Hateful Memes (manually prepared source folder)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-source-dir /path/to/hateful_memes_source \
  --hateful-memes-mode symlink
```

### 2) Validate raw roots

```bash
fedops-dataset check-raw-datasets \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes
```

### 3) Generate v8 artifacts (`alpha`, `ps`, `pm`)

```bash
# Dry run first
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes \
  --dry-run

# Real run
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 50 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes
```
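
To generate artifacts for several (`alpha`, `ps`, `pm`) combinations, a small driver can assemble one `create-v8` command per setting. The sketch below only builds and prints the commands, using the flags shown above. The helper name `build_create_v8_command` and the grid values are illustrative and not part of the package.

```python
import itertools
import shlex

# Placeholder paths, as in the commands above.
REPO_ROOT = "/path/to/fed-multimodal"
DATA_ROOT = "/path/to/fed_multimodal/data"
HATEFUL_MEMES_ROOT = "/path/to/fed_multimodal/data/hateful_memes"


def build_create_v8_command(alpha, ps, pm, dry_run=True):
    """Return the argv list for one `fedops-dataset create-v8` call (hypothetical helper)."""
    argv = [
        "fedops-dataset", "create-v8",
        "--dataset", "hateful_memes",
        "--alpha", str(alpha),
        "--sample-missing-rate", str(ps),
        "--modality-missing-rate", str(pm),
        "--repo-root", REPO_ROOT,
        "--data-root", DATA_ROOT,
        "--hateful-memes-root", HATEFUL_MEMES_ROOT,
    ]
    if dry_run:
        argv.append("--dry-run")
    return argv


# Example grid; the values are chosen for illustration only.
alphas = [0.1, 50]
sample_missing_rates = [0.2]
modality_missing_rates = [0.2, 0.8]

commands = [
    build_create_v8_command(alpha, ps, pm)
    for alpha, ps, pm in itertools.product(
        alphas, sample_missing_rates, modality_missing_rates
    )
]

for argv in commands:
    # Print shell-quoted commands; nothing is executed here.
    print(shlex.join(argv))
```

Once the dry-run output looks right, the same argv lists (built with `dry_run=False`) can be passed to `subprocess.run(argv, check=True)`.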

Note on `alpha`:
- both `--alpha 5.0` and `--alpha 50` resolve to artifact token `alpha50`
- `--alpha 0.1` resolves to `alpha01`
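
The three mappings above are consistent with simply dropping the decimal point from the value as typed. The sketch below reproduces them under that assumption; `alpha_token` is a hypothetical helper, and the package's actual normalization may differ.

```python
def alpha_token(alpha) -> str:
    """Hypothetical helper: drop the decimal point from the value's text form."""
    return "alpha" + str(alpha).replace(".", "")


for value in ("5.0", "50", "0.1"):
    print(value, "->", alpha_token(value))
```

Because `5.0` and `50` share a token, artifacts generated for one would be indistinguishable by name from artifacts generated for the other, so it is worth avoiding both values in the same experiment tree.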

## Python API (Runtime-Driven)

### Direct local usage

```python
from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="hateful_memes",
    alpha=0.1,
    sample_missing_rate=0.2,   # ps
    modality_missing_rate=0.8, # pm
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed_multimodal/data",
    hateful_memes_root="/path/to/fed_multimodal/data/hateful_memes",
)

ds.prepare(dry_run=False)
partition = ds.load_partition()
simulation = ds.load_simulation()
client0_records = ds.client_records(0, use_simulation=True)
```

### Flower-style runtime config usage

```python
from fedops_dataset import FedOpsLocalDataset

run_config = {
    "repo-root": "/path/to/fed-multimodal",
    "data-root": "/path/to/fed_multimodal/data",
    "hateful-memes-root": "/path/to/fed_multimodal/data/hateful_memes",
}

# Simulation mode example (Flower simulation engine)
node_config = {"partition-id": 0, "num-partitions": 10}

ds = FedOpsLocalDataset.from_runtime_config(
    dataset="crema_d",
    alpha=0.1,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    run_config=run_config,
    node_config=node_config,
)

mode = ds.node_mode(node_config)  # "simulation"
records = ds.client_records_from_node_config(node_config, use_simulation=True)
```

## Path Semantics

- Simulation mode:
  - detected when `node_config` has `partition-id` and `num-partitions`
  - client records can be resolved from `partition-id`
- Deployment mode:
  - if `node_config` has `data-path`, it is used as runtime data root
  - each node can point to its own local data path
- No hardcoded path is required:
  - pass `run_config`/`node_config`, CLI args, or env vars
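
The rules above can be restated as a short standalone sketch. This is not the library's implementation: `detect_mode` and `runtime_data_root` are hypothetical names, and the `"unknown"` result and the fallback to `run_config["data-root"]` are assumptions for cases the list does not cover.

```python
from typing import Any, Dict, Optional


def detect_mode(node_config: Dict[str, Any]) -> str:
    """Hypothetical restatement of the rules above; not the library's code."""
    if "partition-id" in node_config and "num-partitions" in node_config:
        return "simulation"
    if "data-path" in node_config:
        return "deployment"
    # Assumption: the README does not describe this case.
    return "unknown"


def runtime_data_root(
    node_config: Dict[str, Any], run_config: Dict[str, Any]
) -> Optional[str]:
    """Pick the data root: the node's own `data-path` in deployment mode."""
    if detect_mode(node_config) == "deployment":
        return node_config["data-path"]
    # Assumption: otherwise fall back to the shared run_config value.
    return run_config.get("data-root")


run_config = {"data-root": "/path/to/fed_multimodal/data"}
print(detect_mode({"partition-id": 0, "num-partitions": 10}))
print(runtime_data_root({"data-path": "/local/node/data"}, run_config))
```

In real code, prefer `ds.node_mode(node_config)` from the example above, which reflects the package's own logic.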

## Environment Variables

```bash
export FEDOPS_REPO_ROOT=/path/to/fed-multimodal
export FEDOPS_OUTPUT_DIR=/path/to/fed-multimodal/fed_multimodal/output
export FEDOPS_DATA_ROOT=/path/to/fed_multimodal/data
export HATEFUL_MEMES_ROOT=/path/to/fed_multimodal/data/hateful_memes
```
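
If you prefer to make the environment explicit in code, the variables above can be collected into a `run_config` dictionary before it is passed to `FedOpsLocalDataset.from_runtime_config`. The sketch below is a minimal illustration: `run_config_from_env` is a hypothetical helper, only the three keys shown earlier in this README are mapped, and letting explicit overrides win over environment values is an assumption rather than documented behavior.

```python
import os

# run_config keys used in this README, mapped to the env vars above.
ENV_BY_KEY = {
    "repo-root": "FEDOPS_REPO_ROOT",
    "data-root": "FEDOPS_DATA_ROOT",
    "hateful-memes-root": "HATEFUL_MEMES_ROOT",
}


def run_config_from_env(overrides=None, environ=None):
    """Build a run_config dict from env vars (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    config = {}
    for key, var in ENV_BY_KEY.items():
        value = environ.get(var)
        if value:  # skip unset or empty variables
            config[key] = value
    # Assumption: explicitly passed values take precedence over the environment.
    config.update(overrides or {})
    return config


print(run_config_from_env())
```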

## Optional HF Artifact Client

`FedOpsDatasetClient` remains available if you also host artifacts in a Hugging Face dataset repo.
It is not required for local (original-data) mode.

## Maintainer Release

```bash
cd fedops_dataset
export TWINE_USERNAME=__token__
export TWINE_PASSWORD="<pypi-token>"  # replace with your PyPI API token
./scripts/publish_pypi.sh
```
