Metadata-Version: 2.4
Name: databroom
Version: 0.3.1
Summary: A cross-language DataFrame cleaning assistant with interactive GUI and one-click code export
Author-email: Oliver Lozano <onlozanoo@gmail.com>
Maintainer-email: Oliver Lozano <onlozanoo@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/onlozanoo/databroom
Project-URL: Documentation, https://github.com/onlozanoo/databroom/blob/main/README.md
Project-URL: Repository, https://github.com/onlozanoo/databroom
Project-URL: Issues, https://github.com/onlozanoo/databroom/issues
Project-URL: Changelog, https://github.com/onlozanoo/databroom/releases
Keywords: data-cleaning,pandas,streamlit,data-preprocessing,code-generation,gui,dataframe
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: streamlit>=1.28.0
Requires-Dist: unidecode>=1.3.0
Requires-Dist: jinja2>=3.0.0
Requires-Dist: pathlib2>=2.3.0
Requires-Dist: typer>=0.7.0
Requires-Dist: rich>=12.0.0
Provides-Extra: cli
Requires-Dist: pandas>=1.3.0; extra == "cli"
Requires-Dist: numpy>=1.20.0; extra == "cli"
Requires-Dist: unidecode>=1.3.0; extra == "cli"
Requires-Dist: jinja2>=3.0.0; extra == "cli"
Requires-Dist: pathlib2>=2.3.0; extra == "cli"
Requires-Dist: typer>=0.7.0; extra == "cli"
Requires-Dist: rich>=12.0.0; extra == "cli"
Provides-Extra: gui
Requires-Dist: pandas>=1.3.0; extra == "gui"
Requires-Dist: numpy>=1.20.0; extra == "gui"
Requires-Dist: streamlit>=1.28.0; extra == "gui"
Requires-Dist: unidecode>=1.3.0; extra == "gui"
Requires-Dist: jinja2>=3.0.0; extra == "gui"
Requires-Dist: pathlib2>=2.3.0; extra == "gui"
Dynamic: license-file

# Databroom

A DataFrame cleaning tool with CLI, GUI, and code generation capabilities.

## Why Databroom?

**Manual pandas approach:**
```python
# 15+ lines of repetitive code
import pandas as pd
import unicodedata

df = pd.read_csv("messy_data.csv")
# Remove empty columns
df = df.loc[:, df.isnull().mean() < 0.9]
# Clean column names
df.columns = df.columns.str.lower().str.replace(' ', '_')
# Remove accents from text values
def clean_text(text):
    if pd.isna(text):
        return text
    normalized = unicodedata.normalize('NFKD', str(text))
    return ''.join(c for c in normalized if not unicodedata.combining(c))
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(clean_text)
df.to_csv("clean_data.csv", index=False)
```

**Databroom approach:**
```bash
# Single command
databroom clean messy_data.csv --clean-all --output-file clean_data.csv
```

## Installation

```bash
pip install databroom
```

## Quick Start

### Command Line Interface

```bash
# Clean everything (recommended)
databroom clean data.csv --clean-all --output-file cleaned.csv

# Clean only columns
databroom clean data.csv --clean-columns --output-file cleaned.csv

# Clean with code generation
databroom clean data.csv --clean-all --output-code script.py

# Generate R code
databroom clean data.csv --clean-all --output-code script.R --lang r

# Launch interactive GUI
databroom gui
```

### Python API

```python
from databroom.core.broom import Broom

# Load and clean data
broom = Broom.from_csv('data.csv')
cleaned = broom.clean_all()  # Smart clean everything

# Or use specific operations
cleaned = broom.clean_columns().clean_rows()

# Get cleaned DataFrame
df = cleaned.get_df()
```

## Features

- **Smart Operations**: `--clean-all`, `--clean-columns`, `--clean-rows`
- **Advanced Options**: Fine-tune with `--no-snakecase`, `--empty-threshold`, etc.
- **Code Generation**: Export Python/pandas or R/tidyverse scripts
- **Interactive GUI**: Streamlit-based web interface
- **File Support**: CSV, Excel, JSON input/output

## Available Operations

| Operation | Description |
|-----------|-------------|
| `clean_all()` | Complete cleaning: columns + rows with all operations |
| `clean_columns()` | Clean column names: snake_case + remove accents + remove empty columns |
| `clean_rows()` | Clean row data: snake_case + remove accents + remove empty rows |
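
To make the "snake_case + remove accents" step concrete, here is a minimal sketch in plain pandas and the standard library. It is an illustration of the idea, not Databroom's implementation: the helper names `strip_accents` and `to_snake_case` are hypothetical, and the exact rules Databroom applies (for example, how it treats camelCase or punctuation) may differ.

```python
import re
import unicodedata

import pandas as pd


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def to_snake_case(name: str) -> str:
    """Lowercase a label and join its alphanumeric runs with underscores."""
    name = strip_accents(str(name))
    # Split camelCase boundaries: "totalSales" -> "total Sales"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


df = pd.DataFrame({"Año Fiscal": [2023], "totalSales (USD)": [10.5]})
df.columns = [to_snake_case(c) for c in df.columns]
print(list(df.columns))  # ['ano_fiscal', 'total_sales_usd']
```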

### Legacy operations (still supported)
- `remove_empty_cols()`, `remove_empty_rows()`
- `standardize_column_names()`, `normalize_column_names()`
- `normalize_values()`, `standardize_values()`

## CLI Parameters

```bash
# Smart Operations
--clean-all               # Clean everything
--clean-columns           # Clean column names only
--clean-rows              # Clean row data only

# Advanced Options
--no-snakecase            # Keep original text case
--no-remove-accents-vals  # Keep accents in values
--empty-threshold 0.8     # Missing-value threshold for treating columns/rows as empty

# Output
--output-file clean.csv   # Save cleaned data
--output-code script.py   # Generate reproducible code
--lang python             # Language for generated code (python or r)
```
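
As a rough illustration of what an empty threshold means, the sketch below drops columns by their share of missing values using plain pandas. It mirrors the manual snippet in "Why Databroom?" (`df.isnull().mean() < 0.9`). The helper name `drop_sparse_columns`, the default of `0.9`, and the strict `<` comparison are assumptions made for this example, not Databroom's documented behavior.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Keep only columns whose fraction of missing values is below `threshold`."""
    missing_ratio = df.isna().mean()  # per-column share of NaN/None values
    return df.loc[:, missing_ratio < threshold]


df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "notes": [np.nan, np.nan, np.nan, "ok"],     # 75% missing
    "unused": [np.nan, np.nan, np.nan, np.nan],  # 100% missing
})

print(list(drop_sparse_columns(df, threshold=0.9).columns))  # ['id', 'notes']
print(list(drop_sparse_columns(df, threshold=0.7).columns))  # ['id']
```

Under these assumptions, lowering the threshold makes the cleaning stricter: at `0.7`, the `notes` column (75% missing) is dropped as well.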

## Examples

### Data Science Workflow
```bash
databroom clean survey.xlsx \
  --clean-all \
  --empty-threshold 0.7 \
  --output-file clean.csv \
  --output-code analysis.py
```

### R/Tidyverse Code Generation
```bash
databroom clean data.csv \
  --clean-all \
  --output-code analysis.R \
  --lang r
```

### Batch Processing
```bash
for file in *.csv; do
  databroom clean "$file" --clean-columns --output-file "clean_$file"
done
```
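
The shell loop above can also be driven from Python, which may be convenient on systems without a POSIX shell. This is a sketch under stated assumptions: it only uses the `clean`, `--clean-columns`, and `--output-file` arguments shown in this README, the helper names `build_clean_command` and `clean_directory` are hypothetical, and it assumes the CLI returns a non-zero exit status on failure.

```python
import subprocess
from pathlib import Path
from typing import List


def build_clean_command(source: Path) -> List[str]:
    """Build the same `databroom clean` call used in the shell loop."""
    target = source.with_name(f"clean_{source.name}")
    return [
        "databroom", "clean", str(source),
        "--clean-columns",
        "--output-file", str(target),
    ]


def clean_directory(directory: Path) -> List[Path]:
    """Run the command for every CSV in `directory`; return the files that failed."""
    failed = []
    for source in sorted(directory.glob("*.csv")):
        if source.name.startswith("clean_"):
            continue  # skip outputs left over from a previous run
        # Assumption: a non-zero exit status signals a failed clean
        result = subprocess.run(build_clean_command(source))
        if result.returncode != 0:
            failed.append(source)
    return failed


# Show the command that would be run for one file (nothing is executed here)
print(" ".join(build_clean_command(Path("sales.csv"))))
```

Calling `clean_directory(Path("."))` processes the current directory. Unlike the shell loop, this version skips files that already start with `clean_`, so rerunning it does not clean its own output.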

## GUI Interface

Launch the interactive web interface:

```bash
databroom gui
# Opens http://localhost:8501
```

Features:
- Drag & drop file upload
- Live preview of operations
- Interactive parameter tuning
- Real-time code generation
- One-click download

## Method Chaining

```python
from databroom.core.broom import Broom

result = (Broom.from_csv('messy_data.csv')
          .clean_columns(empty_threshold=0.8)
          .clean_rows(snakecase=False)
          .get_df())
```

## Code Generation

All operations automatically generate reproducible code:

```python
# Generated Python code
import pandas as pd
from databroom.core.broom import Broom

broom_instance = Broom.from_csv("data.csv")
broom_instance = broom_instance.clean_all()
df_cleaned = broom_instance.pipeline.df
```
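
One way to picture this kind of code generation is as a recorded list of operations that is rendered into a script. The sketch below is purely illustrative and is not how Databroom is implemented: the `History` format and the `render_script` helper are hypothetical, and the emitted lines simply follow the layout of the generated example above, with method names and keyword arguments taken from the Method Chaining section.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical history format: (method name, keyword arguments) per applied step.
History = List[Tuple[str, Dict[str, Any]]]


def render_script(source_file: str, history: History) -> str:
    """Render a recorded operation history as a standalone Python script."""
    lines = [
        "from databroom.core.broom import Broom",
        "",
        f"broom_instance = Broom.from_csv({source_file!r})",
    ]
    for method, kwargs in history:
        # repr() keeps strings quoted and booleans/numbers as valid literals
        args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
        lines.append(f"broom_instance = broom_instance.{method}({args})")
    lines.append("df_cleaned = broom_instance.pipeline.df")
    return "\n".join(lines) + "\n"


history: History = [
    ("clean_columns", {"empty_threshold": 0.8}),
    ("clean_rows", {"snakecase": False}),
]
script = render_script("data.csv", history)
print(script)
```

Keeping the history as plain data is what would make it possible to render the same steps in another language, such as R, from a second template.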

## License

MIT License - see LICENSE file for details.
