Metadata-Version: 2.4
Name: datashaper-ai
Version: 0.1.0
Summary: Local-first data cleaning for CSV files. No cloud uploads, complete privacy.
Author-email: DataShaper Team <support@datashaper.ai>
License: MIT
Project-URL: Homepage, https://datashaper.ai
Project-URL: Documentation, https://datashaper.ai/docs
Project-URL: Repository, https://github.com/yourusername/datashaper
Project-URL: Changelog, https://github.com/yourusername/datashaper/blob/main/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/yourusername/datashaper/issues
Keywords: data-cleaning,csv,polars,duckdb,privacy,local-first,data-quality
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: polars>=0.19.0
Requires-Dist: duckdb>=0.9.0
Requires-Dist: click>=8.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: structlog>=23.0.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"

# DataShaper CLI

**Local-first data cleaning for CSV files. No cloud uploads, complete privacy.**

[![PyPI version](https://badge.fury.io/py/datashaper-ai.svg)](https://pypi.org/project/datashaper-ai/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

## 🚀 Quick Start

### Installation

```bash
pip install datashaper-ai
```

### Basic Usage

```bash
# Clean a CSV file
datashaper clean mydata.csv

# Output: mydata_cleaned.csv + quality report
```

---

## ✨ Features

- **🔒 100% Local Processing** - Your data never leaves your device
- **⚡ Dual-Engine Architecture** - Polars (fast) + DuckDB (stable)
- **🧹 Auto-Detection** - Finds duplicates, nulls, formatting issues
- **📊 Quality Scoring** - Get a data quality score (0-100%)
- **🎯 Smart Routing** - Automatically selects the best engine for your file
- **💼 Enterprise Ready** - Hardware-bound licensing, Team features

---

## 📋 Commands

### 1. Clean Data

```bash
datashaper clean input.csv
```

**What it does:**
- Removes duplicate rows
- Handles missing values
- Trims whitespace
- Validates data types
- Outputs: `input_cleaned.csv`

**Example Output:**
```
🔑 License: PRO tier
🧹 Cleaning: customer_data.csv
✅ Cleaned data saved to: customer_data_cleaned.csv

📊 Summary:
  • Issues found: 847
  • Fixes applied: 847
  • Quality score: 94.2%
  • Engine used: Polars
```
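The cleaning steps listed above can be approximated with the standard library alone. The sketch below is illustrative only, not DataShaper's implementation (which, per this README, runs on Polars or DuckDB): the helper names `cleaned_path` and `clean_rows`, and the set of tokens treated as missing values, are assumptions. Type validation is not modeled.

```python
import csv
import io
from pathlib import Path
from typing import List

# Assumption: tokens treated as "missing"; DataShaper's actual rules are not documented here.
NULL_MARKERS = {"", "na", "n/a", "null", "none"}


def cleaned_path(input_path: str) -> str:
    """Hypothetical helper: input.csv -> input_cleaned.csv, matching the output names above."""
    p = Path(input_path)
    return str(p.with_name(f"{p.stem}_cleaned{p.suffix}"))


def clean_rows(rows: List[List[str]], missing: str = "") -> List[List[str]]:
    """Trim whitespace, normalize null markers, then drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        trimmed = [cell.strip() for cell in row]
        normalized = [missing if cell.lower() in NULL_MARKERS else cell for cell in trimmed]
        key = tuple(normalized)
        if key in seen:  # duplicate once whitespace and nulls are normalized
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


raw = "name,city\n Alice ,Paris\nAlice,Paris\nBob,N/A\n"
header, *body = csv.reader(io.StringIO(raw))
print(cleaned_path("customer_data.csv"))  # customer_data_cleaned.csv
print(clean_rows(body))  # [['Alice', 'Paris'], ['Bob', '']]
```

Normalizing before deduplicating means ` Alice ` and `Alice` count as the same row; whether the real tool orders its steps this way is not stated.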

### 2. Create Bundle (Pro+)

```bash
datashaper bundle input.csv -o output.zip
```

**Creates a ZIP archive containing:**
- `cleaned.csv` - Cleaned data
- `report.html` - Detailed quality report
- `schema.json` - Data schema
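A bundle with this layout can be produced with Python's `zipfile` module. This is a sketch under assumptions: `build_bundle` is a hypothetical helper, and the `schema.json` structure shown is invented for illustration, since this README does not specify it.

```python
import io
import json
import zipfile


def build_bundle(cleaned_csv: str, report_html: str, schema: dict) -> bytes:
    """Hypothetical helper: pack the three bundle files into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("cleaned.csv", cleaned_csv)
        zf.writestr("report.html", report_html)
        zf.writestr("schema.json", json.dumps(schema, indent=2))
    return buf.getvalue()


# Assumed schema shape, for illustration only
schema = {"columns": [{"name": "name", "type": "string"}, {"name": "age", "type": "integer"}]}
bundle = build_bundle("name,age\nAlice,30\n", "<html><body>Report</body></html>", schema)

with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
    print(sorted(zf.namelist()))  # ['cleaned.csv', 'report.html', 'schema.json']
```

Writing the bytes to disk with `pathlib.Path("output.zip").write_bytes(bundle)` would give the `output.zip` named in the command above.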

### 3. Activate License

```bash
datashaper activate DS-ABCD-1234-WXYZ-5678
```

**Features:**
- Binds to your hardware (prevents sharing)
- Works offline after activation
- Stored in `~/.datashaper/license.json`
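Two details above are concrete enough to sketch: the key shape from the example command and the storage path. The pattern below simply mirrors `DS-ABCD-1234-WXYZ-5678` ("DS" followed by four hyphen-separated groups of four uppercase letters or digits). The real validation rules, the JSON fields inside `license.json`, and how hardware binding works are not documented here, so `looks_like_license_key`, `license_path`, and `load_license` are hypothetical helpers.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Assumption: inferred from the single example key in this README.
KEY_PATTERN = re.compile(r"DS(?:-[A-Z0-9]{4}){4}")


def looks_like_license_key(key: str) -> bool:
    """Cheap offline format check, done before any network validation."""
    return KEY_PATTERN.fullmatch(key.strip()) is not None


def license_path(home: Optional[Path] = None) -> Path:
    """~/.datashaper/license.json, the location stated in this README."""
    return (home or Path.home()) / ".datashaper" / "license.json"


def load_license(home: Optional[Path] = None) -> Optional[dict]:
    """Return the stored license record, or None if nothing has been activated yet."""
    path = license_path(home)
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


print(looks_like_license_key("DS-ABCD-1234-WXYZ-5678"))  # True
print(looks_like_license_key("YOUR-LICENSE-KEY"))  # False
```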

### 4. Check Status

```bash
datashaper status
```

**Shows:**
- Current tier (FREE, PRO, TEAM, ENTERPRISE)
- File size limit
- Feature availability
- License expiry (if applicable)

---

## 🎯 Tier Comparison

| Feature | Free | Pro | Team | Enterprise |
|---------|------|-----|------|------------|
| File Size | 25 MB | 1 GB | 5 GB | Unlimited* |
| Engine | Polars | Both | Both | Both + Chunked |
| ZIP Export | ❌ | ✅ | ✅ | ✅ |
| HTML Reports | ❌ | ✅ | ✅ | ✅ |
| Team Sharing | ❌ | ❌ | ✅ (10 seats) | ✅ (Custom) |
| Priority Support | ❌ | ❌ | ✅ | ✅ |
| Offline Activation | ✅ | ✅ | ✅ | ✅ |

*Unlimited with recommended hardware (see [HARDWARE.md](./HARDWARE.md))
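The file-size limits and the ZIP Export row of the table can be expressed as simple lookups. In this sketch `TIER_LIMITS`, `ZIP_EXPORT`, and `within_limit` are hypothetical names, and treating MB/GB as binary units (1 MB = 1024² bytes) is an assumption; the tool may use decimal units. Enterprise is modeled as having no fixed cap, in line with the footnote.

```python
from typing import Dict, Optional

MB = 1024 ** 2  # assumption: binary megabyte
GB = 1024 ** 3  # assumption: binary gigabyte

# File-size caps from the tier table; None = no fixed cap (Enterprise)
TIER_LIMITS: Dict[str, Optional[int]] = {
    "FREE": 25 * MB,
    "PRO": 1 * GB,
    "TEAM": 5 * GB,
    "ENTERPRISE": None,
}

# "ZIP Export" row of the table
ZIP_EXPORT: Dict[str, bool] = {"FREE": False, "PRO": True, "TEAM": True, "ENTERPRISE": True}


def within_limit(tier: str, size_bytes: int) -> bool:
    """True if a file of size_bytes may be processed on the given tier."""
    limit = TIER_LIMITS[tier.upper()]
    return limit is None or size_bytes <= limit


print(within_limit("free", 30 * MB))  # False
print(within_limit("pro", 30 * MB))  # True
```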

---

## 🏗️ Architecture

### CrashGuard Engine

**Intelligent engine selection:**
```
File < 1GB → Polars (fast, in-memory)
File > 1GB → DuckDB (stable, out-of-core)
Polars OOM → Auto-retry with DuckDB
```

**Why two engines?**
- **Polars**: 10x faster for typical files
- **DuckDB**: Handles files larger than RAM
- **CrashGuard**: Falls back to DuckDB instead of crashing when Polars runs out of memory
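The routing rules above can be sketched as a size check plus a fallback. This is not CrashGuard's code: `choose_engine` and `run_with_fallback` are hypothetical, the engines are passed in as plain callables, and sending a file of exactly 1 GB to DuckDB is an assumption (the rules above leave that boundary open). The sketch also assumes an out-of-memory condition surfaces as Python's `MemoryError`; in practice the OS may kill the process outright, in which case a real fallback would need to run the first engine in a separate process.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
ONE_GB = 1024 ** 3  # assumption: binary gigabyte threshold


def choose_engine(size_bytes: int, threshold: int = ONE_GB) -> str:
    """Polars below the threshold (in-memory), DuckDB at or above it (out-of-core)."""
    return "polars" if size_bytes < threshold else "duckdb"


def run_with_fallback(
    size_bytes: int,
    run_polars: Callable[[], T],
    run_duckdb: Callable[[], T],
) -> Tuple[str, T]:
    """Run the selected engine; if Polars raises MemoryError, retry once with DuckDB."""
    if choose_engine(size_bytes) == "duckdb":
        return "duckdb", run_duckdb()
    try:
        return "polars", run_polars()
    except MemoryError:
        return "duckdb", run_duckdb()


def simulated_oom() -> str:
    raise MemoryError("simulated Polars out-of-memory")


print(run_with_fallback(200 * 1024 ** 2, lambda: "polars result", lambda: "duckdb result"))
# ('polars', 'polars result')
print(run_with_fallback(200 * 1024 ** 2, simulated_oom, lambda: "duckdb result"))
# ('duckdb', 'duckdb result')
```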

---

## 🔒 Privacy & Security

### Your Data Never Leaves Your Device

**Zero cloud uploads:**
- All processing happens locally
- No servers see your data
- Works completely offline (after activation)

**Outbound network requests (only these three kinds):**
1. License validation (license key only)
2. Auto-update check (version number only)
3. Team sync (Team tier only, shared rules)
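One way to enforce the "license key only / version number only" promise in code is a per-request allowlist of payload fields. The request-kind names and field names below are hypothetical labels for the three requests listed above, not DataShaper's actual protocol; the point is that any field not on the list (a file name, a column value) is rejected before anything is sent.

```python
from typing import Dict, FrozenSet, Mapping

# Hypothetical labels for the three documented request kinds and their permitted fields
ALLOWED_FIELDS: Dict[str, FrozenSet[str]] = {
    "license_validation": frozenset({"license_key"}),
    "update_check": frozenset({"version"}),
    "team_sync": frozenset({"shared_rules"}),
}


def is_allowed(kind: str, payload: Mapping[str, object]) -> bool:
    """Permit only known request kinds whose payload keys are all on the allowlist."""
    allowed = ALLOWED_FIELDS.get(kind)
    return allowed is not None and set(payload) <= allowed


print(is_allowed("update_check", {"version": "0.1.0"}))  # True
print(is_allowed("update_check", {"version": "0.1.0", "file": "data.csv"}))  # False
print(is_allowed("telemetry", {}))  # False
```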

**Read more:**
- [SECURITY.md](./SECURITY.md) - Complete security overview
- [PRIVACY.md](./PRIVACY.md) - Privacy policy
- [NETWORK_AUDIT.md](./NETWORK_AUDIT.md) - Complete network transparency

---

## 💻 Desktop App

**Cross-platform desktop app available:**

- Windows (`.exe`)
- macOS (`.dmg`)
- Linux (`.AppImage`)

**Features:**
- Drag-and-drop file loading (files stay on your device)
- Real-time progress
- Visual results dashboard
- Team collaboration (Team tier)
- Auto-updates

[Download Desktop App →](https://datashaper.ai/download)

---

## 📦 Enterprise Features

### Team Tier

**Collaboration features:**
- 10 user seats
- Shared cleaning rules
- Team activity audit logs
- Centralized billing

### Enterprise Tier

**For large organizations:**
- Unlimited seats
- Custom file size limits
- On-premises deployment
- SSO integration
- Dedicated support
- SLA guarantees

[Contact Sales →](mailto:enterprise@datashaper.ai)

---

## 🛠️ Development

### From Source

```bash
git clone https://github.com/yourusername/datashaper.git
cd datashaper
pip install -e ".[dev]"
```

### Run Tests

```bash
pytest tests/
```

### Build Desktop App

```bash
cd desktop
npm install
npm run tauri:build
```

---

## 📚 Documentation

- [Security & Privacy](./SECURITY.md)
- [Hardware Recommendations](./HARDWARE.md)
- [Network Audit](./NETWORK_AUDIT.md)
- [Privacy Policy](./PRIVACY.md)

---

## 🤝 Support

### Community

- [GitHub Issues](https://github.com/yourusername/datashaper/issues)
- [Discussions](https://github.com/yourusername/datashaper/discussions)

### Enterprise

- **Email**: enterprise@datashaper.ai
- **Support**: support@datashaper.ai
- **Security**: security@datashaper.ai

---

## 📄 License

MIT License - see [LICENSE](LICENSE) for details

**Commercial licensing is available for the Enterprise tier.**

---

## 🚀 Getting Started

1. **Install**: `pip install datashaper-ai`
2. **Activate**: `datashaper activate YOUR-LICENSE-KEY`
3. **Clean**: `datashaper clean yourfile.csv`

**That's it!** Your data is cleaned locally, privately, and securely.

---

**Made with ❤️ for data professionals who value privacy.**
