Metadata-Version: 2.4
Name: dqm-ml-core
Version: 1.1.5
Summary: Python library designed to provide core DQM-ML metrics without heavy dependencies, as well as the common API shared by all metrics
Author-email: Safenai <support@safenai.io>, IRT SystemX <support@irt-systemx.fr>
License-Expression: Apache-2.0
Project-URL: Homepage, https://irt-systemx.github.io/dqm-ml
Project-URL: Documentation, https://irt-systemx.github.io/dqm-ml
Project-URL: Repository, https://github.com/IRT-SystemX/dqm-ml
Keywords: ml,metrics,data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pyarrow>=6.0.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0

# DQM-ML Core

This package defines the foundational API and core metrics for the DQM-ML V2 framework.

## Key Concepts

### `DatametricProcessor`

The base class for all metrics and feature extractors. It supports a streaming architecture by splitting computation into two phases:
1. Batch Level: `compute_batch_metric()` updates intermediate statistics for a single chunk of data.
2. Dataset Level: `compute()` aggregates these statistics into final scores.
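
The two-phase split can be pictured with a minimal, self-contained sketch. The `RunningMean` class below is a hypothetical stand-in written for illustration only: it borrows the method names `compute_batch_metric()` and `compute()` from the description above, but its signatures and data types are assumptions and may not match the real `DatametricProcessor`.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Hypothetical illustration; not the real DatametricProcessor API."""

    count: int = 0      # number of values seen so far
    total: float = 0.0  # sum of values seen so far

    def compute_batch_metric(self, batch: list[float]) -> None:
        # Batch level: fold one chunk into the intermediate statistics.
        self.count += len(batch)
        self.total += sum(batch)

    def compute(self) -> float:
        # Dataset level: turn the accumulated statistics into a final score.
        return self.total / self.count if self.count else float("nan")


processor = RunningMean()
# Chunks are processed one at a time, so the full dataset never
# has to be held in memory at once.
for chunk in ([1.0, 2.0], [3.0], [4.0, 5.0, 6.0]):
    processor.compute_batch_metric(chunk)

mean = processor.compute()
```

Only the small intermediate state (`count` and `total` here) survives between chunks, which is what makes the approach suitable for streaming.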

## Included Metrics

* Completeness: Analyzes null/missing values.
* Representativeness: Statistical distribution analysis (Chi-Square, KS, etc.).
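
As an illustration of the kind of statistic named above, the following sketch computes a textbook Pearson chi-square statistic in plain Python. It is not taken from the package: the function name `chi_square_statistic` and its arguments are hypothetical, and the library's own implementation may differ.

```python
from collections import Counter


def chi_square_statistic(values: list[str], expected_probs: dict[str, float]) -> float:
    """Pearson chi-square statistic of observed categories vs. expected shares.

    Hypothetical helper for illustration; not part of dqm-ml-core.
    """
    observed = Counter(values)
    n = len(values)
    statistic = 0.0
    for category, prob in expected_probs.items():
        expected = n * prob  # expected count under the reference distribution
        statistic += (observed.get(category, 0) - expected) ** 2 / expected
    return statistic


# 6 "a" and 4 "b" compared against a 50/50 reference distribution.
sample = ["a"] * 6 + ["b"] * 4
stat = chi_square_statistic(sample, {"a": 0.5, "b": 0.5})
```

A statistic close to zero means the sample matches the reference distribution closely; larger values indicate a larger deviation.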

## For Developers

To create a new metric:
1. Subclass `dqm_ml_core.api.data_processor.DatametricProcessor`.
2. Define `needed_columns()`, `generated_features()`, and `generated_metrics()`.
3. Implement the streaming logic in `compute_batch_metric()` and `compute()`.
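
The steps above might look roughly like the following sketch. Because the real base class signatures are not shown here, the example defines its own stand-in, `ProcessorSketch`, instead of importing `DatametricProcessor`. The argument types, return types, the `NullRatio` class, and the `null_ratio_<column>` metric name are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class ProcessorSketch(ABC):
    """Hypothetical stand-in for DatametricProcessor; real signatures may differ."""

    @abstractmethod
    def needed_columns(self) -> list[str]: ...

    @abstractmethod
    def generated_features(self) -> list[str]: ...

    @abstractmethod
    def generated_metrics(self) -> list[str]: ...

    @abstractmethod
    def compute_batch_metric(self, batch: dict[str, list[Any]]) -> None: ...

    @abstractmethod
    def compute(self) -> dict[str, float]: ...


class NullRatio(ProcessorSketch):
    """Toy metric: share of missing values in one column."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.rows = 0   # intermediate statistic: rows seen
        self.nulls = 0  # intermediate statistic: missing values seen

    def needed_columns(self) -> list[str]:
        return [self.column]

    def generated_features(self) -> list[str]:
        return []  # this toy metric adds no per-row features

    def generated_metrics(self) -> list[str]:
        return [f"null_ratio_{self.column}"]

    def compute_batch_metric(self, batch: dict[str, list[Any]]) -> None:
        values = batch[self.column]
        self.rows += len(values)
        self.nulls += sum(v is None for v in values)

    def compute(self) -> dict[str, float]:
        ratio = self.nulls / self.rows if self.rows else float("nan")
        return {f"null_ratio_{self.column}": ratio}


metric = NullRatio("age")
for batch in ({"age": [31, None, 45]}, {"age": [None, 22, 60, 19, 38]}):
    metric.compute_batch_metric(batch)
result = metric.compute()
```

Keeping only counters as state means `compute()` stays cheap regardless of how many batches were processed.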
