Metadata-Version: 2.4
Name: dataset-inspector
Version: 0.0.2
Summary: Lightweight CLI tool to analyze CSV datasets and detect common ML data issues.
Author: Quentin
Project-URL: Homepage, https://github.com/mqtin/dataset-inspector
Project-URL: Repository, https://github.com/mqtin/dataset-inspector
Project-URL: Issues, https://github.com/mqtin/dataset-inspector/issues
Keywords: machine-learning,data-quality,csv,cli,data-validation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5
Requires-Dist: numpy>=1.23
Dynamic: license-file

# dataset-inspector

![Author](https://img.shields.io/badge/author-mqtin-orange)

![Issues](https://img.shields.io/github/issues/mqtin/dataset-inspector)
![PRs](https://img.shields.io/github/issues-pr/mqtin/dataset-inspector)

![Forks](https://img.shields.io/github/forks/mqtin/dataset-inspector)
![Stars](https://img.shields.io/github/stars/mqtin/dataset-inspector?style=social)

![Last commit](https://img.shields.io/github/last-commit/mqtin/dataset-inspector)
![Repo size](https://img.shields.io/github/repo-size/mqtin/dataset-inspector)

![PyPI](https://img.shields.io/pypi/v/dataset-inspector)
![Python versions](https://img.shields.io/pypi/pyversions/dataset-inspector)
![Wheel](https://img.shields.io/pypi/wheel/dataset-inspector)
![License](https://img.shields.io/pypi/l/dataset-inspector)



A lightweight CLI tool to quickly analyze CSV datasets and detect common issues before training ML models.

## Features

- **Basic statistics** – mean, std, min, max, median, quartiles for every numeric column
- **Missing values** – per-column count and percentage
- **Duplicate rows** – exact duplicate detection
- **Class imbalance** – minority/majority ratio check for categorical columns
- **Anomaly detection** – IQR or z-score based outlier flagging
- **Terminal report** – clean, colour-coded output
- **JSON export** – optional machine-readable report
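
The checks listed above can be approximated with a few lines of pandas. The following is a minimal sketch, not the package's actual implementation: the function names, the `k = 1.5` IQR multiplier, the z-score threshold of `3.0`, and the `sample` data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_values(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values."""
    count = df.isna().sum()
    percent = (count / max(len(df), 1) * 100).round(2)
    return pd.DataFrame({"count": count, "percent": percent})


def duplicate_rows(df: pd.DataFrame) -> int:
    """Number of rows that exactly repeat an earlier row."""
    return int(df.duplicated().sum())


def imbalance_ratio(series: pd.Series) -> float:
    """Minority/majority class ratio; 1.0 means perfectly balanced."""
    counts = series.value_counts(dropna=True)
    if counts.empty:
        return float("nan")
    return float(counts.min() / counts.max())


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask of values whose absolute z-score exceeds the threshold."""
    std = series.std(ddof=0)
    if not std or np.isnan(std):
        # Constant or empty column: nothing can be flagged.
        return pd.Series(False, index=series.index)
    return ((series - series.mean()) / std).abs() > threshold


# Hypothetical demo data, not the bundled examples/sample.csv.
sample = pd.DataFrame(
    {
        "age": [22, 25, 24, 23, 26, 25, 24, 95],
        "income": [30.0, 32.5, None, 31.0, 29.5, 32.5, 30.5, 31.5],
        "label": ["a", "a", "a", "a", "a", "a", "b", "b"],
    }
)

if __name__ == "__main__":
    print(missing_values(sample))
    print("duplicate rows:", duplicate_rows(sample))
    print("imbalance ratio:", round(imbalance_ratio(sample["label"]), 3))
    print("IQR outliers:", sample.loc[iqr_outliers(sample["age"]), "age"].tolist())
```

On small datasets the z-score method rarely reaches a threshold of 3, because a single extreme value also inflates the standard deviation. The IQR rule is usually the more sensitive choice in that case.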

## Installation

```bash
pip install dataset-inspector
```

## Quick start

```bash
# Analyse the example dataset (examples/sample.csv lives in the repository, so run this from a clone)
python -m dataset_inspector --input examples/sample.csv

# Specify a target column for class-imbalance check
python -m dataset_inspector --input data.csv --target label

# Use z-score anomaly detection and export JSON
python -m dataset_inspector --input data.csv --anomaly-method zscore --json report.json

# Enable verbose / debug logging
python -m dataset_inspector --input data.csv -v
```

## Project structure

```text
dataset_inspector/
├── __init__.py      # package metadata
├── __main__.py      # python -m entry point
├── cli.py           # argument parsing (argparse)
├── analyzer.py      # core analysis logic
├── report.py        # terminal + JSON report formatting
└── utils.py         # logging helpers
examples/
└── sample.csv       # small demo dataset
```
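
To illustrate how `cli.py` might map the Quick start flags onto `argparse`, here is a hedged sketch. The flag names (`--input`, `--target`, `--anomaly-method`, `--json`, `-v`) come from the commands above. Everything else is an assumption: the `build_parser` name, the help strings, the defaults, and the `iqr` choice value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the Quick start."""
    parser = argparse.ArgumentParser(
        prog="dataset_inspector",
        description="Analyze a CSV dataset and report common ML data issues.",
    )
    parser.add_argument("--input", required=True, help="path to the CSV file")
    parser.add_argument(
        "--target",
        default=None,
        help="column to use for the class-imbalance check",
    )
    parser.add_argument(
        "--anomaly-method",
        choices=["iqr", "zscore"],  # "iqr" as a value and as default is assumed
        default="iqr",
        help="outlier detection method",
    )
    parser.add_argument(
        "--json",
        metavar="PATH",
        default=None,
        help="write a machine-readable report to PATH",
    )
    parser.add_argument(
        "-v",
        dest="verbose",
        action="store_true",
        help="enable verbose / debug logging",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Keeping parser construction in its own function lets `__main__.py` stay a thin entry point and makes the argument handling testable without spawning a subprocess.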
