Metadata-Version: 2.4
Name: dfcleanerpro
Version: 0.2.2
Summary: Simple DataFrame cleaning toolkit
Home-page: https://github.com/Nishanth9696/dfcleaner
Author: Nishanth_K
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# dfcleanerpro

![PyPI](https://img.shields.io/pypi/v/dfcleanerpro)
![Python](https://img.shields.io/badge/python-3.8%2B-blue)
![License](https://img.shields.io/badge/license-MIT-green)

> Professional DataFrame cleaning toolkit for Data Engineers and Data Scientists.

---

## Install

```bash
pip install dfcleanerpro
```

---

## Features

- One-line DataFrame cleaning
- Fluent pipeline API with method chaining
- Audit log: track every transformation
- Schema validator: enforce column types
- Numeric-safe missing value filling
- Dataset quality analyzer

---

## Quick Start

### Simple Clean

```python
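import pandas as pd
from dfcleanerpro import clean_dataframe

# Example input; any pandas DataFrame works here
df = pd.DataFrame({"First Name": ["Ann", "Ann", "Bob"], "Age": [30, 30, None]})

cleaned = clean_dataframe(
    df,
    drop_duplicates=True,    # remove duplicate rows
    snake_case=True,         # convert column names to snake_case
    fill_missing="mean",     # 'zero', 'mean', 'median', or None
    remove_empty_cols=True,  # drop all-null columns
)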
```

### Pipeline API

```python
from dfcleanerpro import DFPipeline

result, audit = (
    DFPipeline(df)
    .snake_case()
    .remove_duplicates()
    .drop_empty_cols()
    .fill_missing("mean")
    .run(audit=True)
)

print(audit)
# {
#   'steps_applied': ['snake_case', 'remove_duplicates', 'drop_empty_cols', 'fill_missing_mean'],
#   'duplicates_removed': 3,
#   'cols_dropped': ['empty_col'],
#   'cols_filled': ['age', 'salary'],
#   'rows_before': 100,
#   'rows_after': 97
# }
```
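
To make the chaining and audit idea concrete, here is a minimal sketch in plain pandas. `MiniPipeline` is a hypothetical class written for illustration only; it is not dfcleanerpro's implementation, and it assumes that steps are queued and applied when `run()` is called.

```python
import pandas as pd


class MiniPipeline:
    """Hypothetical illustration of a fluent pipeline with an audit dict."""

    def __init__(self, df: pd.DataFrame):
        self._df = df
        self._steps = []  # (name, function) pairs, applied in run()

    def remove_duplicates(self):
        self._steps.append(("remove_duplicates", lambda d: d.drop_duplicates()))
        return self  # returning self is what enables method chaining

    def drop_empty_cols(self):
        self._steps.append(("drop_empty_cols", lambda d: d.dropna(axis=1, how="all")))
        return self

    def run(self, audit: bool = False):
        out = self._df.copy()
        log = {"steps_applied": [], "rows_before": len(out)}
        for name, step in self._steps:
            out = step(out)
            log["steps_applied"].append(name)
        log["rows_after"] = len(out)
        return (out, log) if audit else out


sample = pd.DataFrame({"a": [1, 1, 2], "empty": [None, None, None]})
result, log = MiniPipeline(sample).remove_duplicates().drop_empty_cols().run(audit=True)
```

Each method returns `self`, so calls can be chained, and the audit dict is built as the queued steps run.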

### Schema Validator

```python
result = (
    DFPipeline(df)
    .validate_schema({"age": "int", "salary": "float"})
    .clean()
    .run()
)
```
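
What "enforce column types" means in practice can be sketched with plain pandas. The helper `schema_mismatches` below is hypothetical and only reports mismatches; whether dfcleanerpro raises, casts, or logs on a mismatch is not specified here, so treat this as an assumption-laden illustration.

```python
import pandas as pd
from pandas.api import types as pdt

# Map the short type names used in the schema dict to pandas dtype checks
CHECKS = {"int": pdt.is_integer_dtype, "float": pdt.is_float_dtype}


def schema_mismatches(df: pd.DataFrame, schema: dict) -> dict:
    """Hypothetical helper: return {column: problem} for every violation."""
    problems = {}
    for col, expected in schema.items():
        if col not in df.columns:
            problems[col] = "missing column"
        elif not CHECKS[expected](df[col]):
            problems[col] = f"expected {expected}, got {df[col].dtype}"
    return problems


sample = pd.DataFrame({"age": [25, 31], "salary": ["50000", "62000"]})
problems = schema_mismatches(sample, {"age": "int", "salary": "float"})
```

Here `salary` holds strings, so it is the only column flagged.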

### Dataset Analyzer

```python
from dfcleanerpro import analyze_dataframe

report = analyze_dataframe(df)
# {
#   'rows': 500,
#   'columns': 8,
#   'duplicate_rows': 12,
#   'missing_values': {'age': 3, 'salary': 7},
#   'dtypes': {'age': 'int64', 'name': 'object'}
# }
```
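
For readers who want to see how each report field relates to the data, the following sketch computes similar fields with plain pandas. `sketch_analyze` is a hypothetical stand-in, not the library's code, and it assumes `missing_values` lists only columns that contain at least one missing value.

```python
import pandas as pd


def sketch_analyze(df: pd.DataFrame) -> dict:
    """Hypothetical stand-in that mirrors the report fields shown above."""
    missing = df.isna().sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        # Assumption: only columns with at least one missing value are listed
        "missing_values": {col: int(n) for col, n in missing.items() if n > 0},
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


sample = pd.DataFrame({"age": [30, 30, None], "name": ["a", "a", "b"]})
report = sketch_analyze(sample)
```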

---

## API Reference

### `clean_dataframe(df, ...)`

| Parameter           | Type        | Default | Description                                 |
|---------------------|-------------|---------|---------------------------------------------|
| `drop_duplicates`   | bool        | True    | Remove duplicate rows                       |
| `snake_case`        | bool        | True    | Convert column names to snake_case          |
| `fill_missing`      | str or None | None    | `'zero'`, `'mean'`, `'median'`, or `None`   |
| `remove_empty_cols` | bool        | True    | Drop all-null columns                       |
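
As an illustration of the `snake_case` option, one common way to normalize column names uses two regular expressions. The helper `to_snake_case` is hypothetical; the exact rules dfcleanerpro applies (for example, to acronyms or digits) may differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: convert one column name to snake_case."""
    name = name.strip()
    # Insert an underscore at camelCase boundaries: totalSalary -> total_Salary
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse runs of spaces, hyphens, and other punctuation into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


columns = ["First Name", "totalSalary", "Dept-ID"]
renamed = [to_snake_case(c) for c in columns]
```

With a pandas DataFrame, the same function could be applied via `df.rename(columns=to_snake_case)`.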

### `DFPipeline(df)`

| Method                   | Description                                         |
|--------------------------|-----------------------------------------------------|
| `.snake_case()`          | Convert column names to snake_case                  |
| `.remove_duplicates()`   | Drop duplicate rows                                 |
| `.drop_empty_cols()`     | Drop all-null columns                               |
| `.fill_missing(method)`  | Fill numeric NaNs using `zero`, `mean`, or `median` |
| `.validate_schema(dict)` | Enforce column types                                |
| `.run(audit=False)`      | Execute pipeline, optionally with audit             |
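
The "numeric NaNs" wording for `.fill_missing(method)` suggests that only numeric columns are filled. A minimal sketch of that behavior in plain pandas follows; `fill_numeric` is a hypothetical function, and leaving non-numeric columns untouched is an assumption about the library rather than a documented guarantee.

```python
import pandas as pd


def fill_numeric(df: pd.DataFrame, method: str = "mean") -> pd.DataFrame:
    """Hypothetical helper: fill NaNs in numeric columns only."""
    out = df.copy()  # leave the caller's DataFrame unchanged
    for col in out.select_dtypes(include="number").columns:
        if method == "zero":
            value = 0
        elif method == "mean":
            value = out[col].mean()
        elif method == "median":
            value = out[col].median()
        else:
            raise ValueError(f"unknown method: {method!r}")
        out[col] = out[col].fillna(value)
    return out


sample = pd.DataFrame({"age": [20.0, None, 40.0], "city": ["Pune", None, "Goa"]})
filled = fill_numeric(sample, "mean")
```

Restricting the fill to numeric columns avoids computing a mean or median over text, which is why the missing `city` value stays missing in this sketch.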

---

## Roadmap

- [ ] `auto_dtype_conversion()`
- [ ] `trim_string_columns()`
- [ ] `detect_outliers()`
- [ ] `data_quality_report()` — HTML export
- [ ] CLI support: `dfcleanerpro clean file.csv`

---

## License

MIT
