Metadata-Version: 2.1
Name: unidq
Version: 0.1.0
Summary: Unified Transformer for Multi-Task Data Quality
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/unidq
Project-URL: Documentation, https://unidq.readthedocs.io
Project-URL: Repository, https://github.com/yourusername/unidq
Project-URL: Issues, https://github.com/yourusername/unidq/issues
Keywords: data-quality,machine-learning,transformers,error-detection,data-cleaning,imputation,multi-task-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Database
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Requires-Dist: numpy>=1.19.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scikit-learn>=0.24.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=4.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0; extra == "docs"

# UNIDQ: Unified Data Quality

[![PyPI version](https://badge.fury.io/py/unidq.svg)](https://pypi.org/project/unidq/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

**A unified transformer architecture for multi-task data quality assessment.**

UNIDQ addresses 6 data quality tasks with a single model:
- ✅ Error Detection (F1=0.894, +42% vs Raha)
- ✅ Data Repair
- ✅ Missing Value Imputation (R²=0.941, +295% vs MICE)
- ✅ Label Noise Detection (F1=0.856, +28% vs Cleanlab)
- ✅ Label Classification
- ✅ Data Valuation
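The README reports F1 for error detection and R² for imputation, but it does not say how either score is computed. The sketch below assumes a common convention: F1 is computed at the cell level over a boolean error mask, and R² is measured only on the cells that were missing in the dirty input. The helper names `error_detection_f1` and `imputation_r2` are hypothetical and not part of the unidq API. `f1_score` and `r2_score` come from scikit-learn, which unidq already lists as a dependency.

```python
import numpy as np
from sklearn.metrics import f1_score, r2_score


def error_detection_f1(true_mask: np.ndarray, pred_mask: np.ndarray) -> float:
    """Hypothetical helper: cell-level F1 for error detection.

    Assumption: every cell of the table counts as one binary prediction
    (1 = erroneous, 0 = clean), so both masks are flattened before scoring.
    """
    return f1_score(true_mask.ravel().astype(int), pred_mask.ravel().astype(int))


def imputation_r2(clean: np.ndarray, imputed: np.ndarray, missing_mask: np.ndarray) -> float:
    """Hypothetical helper: R² restricted to the originally missing cells.

    Assumption: cells that were observed in the dirty input are excluded,
    since scoring them would inflate R² with trivially copied values.
    """
    return r2_score(clean[missing_mask], imputed[missing_mask])
```

Restricting R² to the imputed cells is a design choice: including observed cells would inflate the score with values the model simply copied.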

## Installation
```bash
pip install unidq
```
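The package metadata also declares two optional extras. `dev` pulls in pytest, pytest-cov, black and flake8, and `docs` pulls in sphinx and sphinx-rtd-theme. pip installs them with the bracket syntax, e.g. `pip install "unidq[dev]"`. To confirm from Python which version was installed, you could use a small check built on the standard-library `importlib.metadata` module (Python 3.8+). `installed_version` is a hypothetical helper, not part of unidq.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "unidq") -> Optional[str]:
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # unidq's metadata declares Requires-Python >=3.8
    assert sys.version_info >= (3, 8), "unidq requires Python 3.8 or newer"
    print(installed_version() or "unidq is not installed")
```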

## Quick Start
```python
from unidq import UNIDQ, MultiTaskDataset, UNIDQTrainer

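# Load your data. X_dirty / X_clean are feature matrices of shape
# (n_samples, n_features), errors marks the erroneous entries of X_dirty,
# and y holds the labels; all four must be defined beforehand.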
dataset = MultiTaskDataset(
    dirty_features=X_dirty,
    clean_features=X_clean,
    error_mask=errors,
    labels=y
)

# Initialize model
model = UNIDQ(n_features=X_dirty.shape[1])

# Train
trainer = UNIDQTrainer(model)
trainer.fit(dataset)

# Predict on new data with the same n_features columns
results = model.predict(X_new)
print(f"Detected errors: {results['errors']}")
print(f"Imputed values: {results['imputed']}")
```
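The Quick Start leaves `X_dirty`, `X_clean`, `errors` and `y` undefined. If you have a dirty table and a cleaned reference version of it, the sketch below shows one way to derive those inputs. `build_inputs` is a hypothetical helper, not part of unidq. It assumes that both DataFrames are aligned, that every feature column is numeric, and that `MultiTaskDataset` accepts NumPy arrays with a cell-level error mask. It takes labels from the dirty table (the observed, possibly noisy labels). Whether UNIDQ expects observed or clean labels is also an assumption.

```python
import numpy as np
import pandas as pd


def build_inputs(dirty: pd.DataFrame, clean: pd.DataFrame, label_col: str):
    """Hypothetical helper: derive Quick Start inputs from a dirty/clean table pair.

    Assumes both frames share the same index and columns and that every
    feature column is numeric.
    """
    if not dirty.columns.equals(clean.columns) or not dirty.index.equals(clean.index):
        raise ValueError("dirty and clean frames must be aligned")

    feature_cols = [c for c in dirty.columns if c != label_col]
    x_dirty = dirty[feature_cols].to_numpy(dtype=float)
    x_clean = clean[feature_cols].to_numpy(dtype=float)

    # A cell is flagged as an error when its dirty value differs from the
    # clean one (NaN vs. a number counts as different); cells missing in
    # both tables are not flagged.
    both_nan = np.isnan(x_dirty) & np.isnan(x_clean)
    error_mask = (x_dirty != x_clean) & ~both_nan

    # Assumption: labels are the observed (possibly noisy) ones.
    y = dirty[label_col].to_numpy()
    return x_dirty, x_clean, error_mask, y
```

Usage: `X_dirty, X_clean, errors, y = build_inputs(dirty_df, clean_df, label_col="target")`, where `dirty_df`, `clean_df` and `"target"` are placeholders for your own data.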


## Citation

If you use UNIDQ in your research, please cite:
```bibtex
@inproceedings{unidq2026,
  title={UNIDQ: A Unified Transformer Architecture for Multi-Task Data Quality},
  author={Your Name},
  booktitle={Proceedings of the VLDB Endowment},
  year={2026}
}
```

## License

Released under the MIT License (see `LICENSE`).
