Metadata-Version: 2.4
Name: dataprep-ai
Version: 0.1.4
Summary: One-line, opinionated data cleaning for pandas/Polars. Fixes missing values, categories, outliers, and duplicates with transparent logs and a reproducible report.
Project-URL: Homepage, https://github.com/RohitRajdev/dataprep-ai
Project-URL: Documentation, https://github.com/RohitRajdev/dataprep-ai#readme
Project-URL: Issues, https://github.com/RohitRajdev/dataprep-ai/issues
Author-email: Rohit Rajdev <rohit@sandscript.ai>
License: Apache-2.0
License-File: LICENSE
Keywords: data cleaning,data wrangling,pandas,polars,preprocessing
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.13,>=3.9
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: pyarrow>=10
Requires-Dist: pydantic>=2.6
Requires-Dist: rich>=13.7.0
Requires-Dist: scikit-learn>=1.2
Provides-Extra: app
Requires-Dist: matplotlib>=3.7; extra == 'app'
Requires-Dist: streamlit>=1.35.0; extra == 'app'
Provides-Extra: full
Requires-Dist: matplotlib>=3.7; extra == 'full'
Requires-Dist: polars>=1.0.0; extra == 'full'
Requires-Dist: streamlit>=1.35.0; extra == 'full'
Requires-Dist: ydata-profiling>=4.6; (python_version >= '3.9') and extra == 'full'
Description-Content-Type: text/markdown

![CI](https://github.com/RohitRajdev/dataprep-ai/actions/workflows/ci.yml/badge.svg)
[![PyPI](https://img.shields.io/pypi/v/dataprep-ai.svg)](https://pypi.org/project/dataprep-ai/)
[![Python Versions](https://img.shields.io/pypi/pyversions/dataprep-ai.svg)](https://pypi.org/project/dataprep-ai/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
[![Downloads](https://img.shields.io/pypi/dm/dataprep-ai.svg)](https://pypistats.org/packages/dataprep-ai)



# dataprep-ai

**One-line, opinionated data cleaning for pandas/Polars.**  
Fix missing values, inconsistent categories, outliers, and duplicates with transparent logs and a reproducible report.

---

## Installation

```bash
pip install dataprep-ai
```

For the optional explorer app:

```bash
pip install "dataprep-ai[app]"
```

## Requirements

- Python: 3.9 to 3.12
- OS: Linux, macOS, Windows
- Required libraries (installed automatically): pandas, numpy, pyarrow, scikit-learn, pydantic, rich
- Optional:
  - polars (enabled automatically where supported): Polars round-trip I/O. It is included in the `full` extra: `pip install "dataprep-ai[full]"`
  - streamlit, matplotlib: only needed for the explorer

## Quickstart

```python
import pandas as pd
from dataprep_ai import clean, CleaningConfig

# Small frame with missing values, an implausible age and income,
# inconsistent city labels, and a repeated id.
df = pd.DataFrame({
    "age": [23, None, 25, 1000],
    "income": [52000, 58000, None, 1200000],
    "city": ["NY", "New York", "nyc", None],
    "id": [1, 2, 2, 4],
})

result = clean(df, CleaningConfig(
    id_columns=["id"],
    outlier_strategy="iqr_cap",
    categorical_normalization=True,
    drop_duplicates=False,
))

print(result.summary_markdown)  # see cleaning report
df_clean = result.df            # cleaned DataFrame
result.to_json("clean_report.json")
```

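The `outlier_strategy="iqr_cap"` option in the Quickstart suggests interquartile-range (IQR) capping. As a rough illustration of that general technique, here is a minimal pandas-only sketch. The helper name `iqr_cap`, the multiplier `k=1.5`, and the sample values are assumptions made for this example and are not taken from dataprep-ai; the library's own implementation may differ.

```python
import pandas as pd


def iqr_cap(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; missing values stay missing.

    Hypothetical helper for illustration only, not part of dataprep-ai.
    """
    q1 = values.quantile(0.25)  # quantile() skips NaN by default
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Made-up sample: one extreme value (100) and one missing entry.
ages = pd.Series([10, 12, 11, 13, 12, 11, 100, None])
capped = iqr_cap(ages)
print(capped.tolist())
```

On very small samples the fences can be wide enough that nothing is capped, so it is worth checking the computed bounds before relying on any IQR-based rule.
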
## Streamlit Explorer

```bash
pip install "dataprep-ai[app]"
streamlit run -m dataprep_ai.explore -- --csv your.csv
```

## Backends

- Input `pandas.DataFrame` → output `pandas.DataFrame`
- Input `polars.DataFrame` → output `polars.DataFrame` (converted internally via pandas in v0.1)

## License

Apache-2.0


