Metadata-Version: 2.4
Name: databeak
Version: 0.0.2
Summary: DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools
Project-URL: Homepage, https://github.com/jonpspri/databeak
Project-URL: Documentation, https://github.com/jonpspri/databeak#readme
Project-URL: Repository, https://github.com/jonpspri/databeak
Project-URL: Issues, https://github.com/jonpspri/databeak/issues
Project-URL: Changelog, https://github.com/jonpspri/databeak/blob/main/CHANGELOG.md
Project-URL: Release Notes, https://github.com/jonpspri/databeak/releases
Author-email: Jonathan Springer <jps@s390x.com>
Maintainer-email: Jonathan Springer <jps@s390x.com>
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: csv,data-analysis,data-manipulation,data-profiling,data-quality,data-validation,fastmcp,mcp,model-context-protocol,outlier-detection,pandas
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: chardet>=5.2.0
Requires-Dist: fastmcp>=2.11.3
Requires-Dist: httpx>=0.27.0
Requires-Dist: numpy>=2.1.3
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: pandas>=2.2.3
Requires-Dist: psutil>=7.0.0
Requires-Dist: pyarrow>=17.0.0
Requires-Dist: pydantic-settings>=2.10.1
Requires-Dist: pydantic>=2.10.4
Requires-Dist: python-dateutil>=2.9.0
Requires-Dist: pytz>=2024.2
Requires-Dist: scipy>=1.16.1
Requires-Dist: simpleeval>=1.0.3
Requires-Dist: tabulate>=0.9.0
Provides-Extra: all
Requires-Dist: bottleneck>=1.4.0; extra == 'all'
Requires-Dist: faker>=30.0.0; extra == 'all'
Requires-Dist: fastparquet>=2024.11.0; extra == 'all'
Requires-Dist: hypothesis>=6.122.0; extra == 'all'
Requires-Dist: ipython>=8.29.0; extra == 'all'
Requires-Dist: mkdocs-material>=9.5.0; extra == 'all'
Requires-Dist: mkdocs-mermaid2-plugin>=1.1.0; extra == 'all'
Requires-Dist: mkdocs>=1.6.0; extra == 'all'
Requires-Dist: mkdocstrings[python]>=0.26.0; extra == 'all'
Requires-Dist: mypy>=1.13.0; extra == 'all'
Requires-Dist: numexpr>=2.10.0; extra == 'all'
Requires-Dist: pandas-stubs>=2.2.3; extra == 'all'
Requires-Dist: pre-commit>=4.0.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'all'
Requires-Dist: pytest-benchmark>=5.0.0; extra == 'all'
Requires-Dist: pytest-cov>=5.0.0; extra == 'all'
Requires-Dist: pytest-mock>=3.14.0; extra == 'all'
Requires-Dist: pytest>=8.3.0; extra == 'all'
Requires-Dist: rich>=13.9.0; extra == 'all'
Requires-Dist: ruff>=0.7.0; extra == 'all'
Requires-Dist: types-aiofiles>=24.1.0; extra == 'all'
Requires-Dist: types-pytz>=2024.2; extra == 'all'
Requires-Dist: types-tabulate>=0.9.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: ipython>=8.29.0; extra == 'dev'
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pandas-stubs>=2.2.3; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-benchmark>=5.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: rich>=13.9.0; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Requires-Dist: types-aiofiles>=24.1.0; extra == 'dev'
Requires-Dist: types-pytz>=2024.2; extra == 'dev'
Requires-Dist: types-tabulate>=0.9.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs-mermaid2-plugin>=1.1.0; extra == 'docs'
Requires-Dist: mkdocs>=1.6.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.26.0; extra == 'docs'
Provides-Extra: performance
Requires-Dist: bottleneck>=1.4.0; extra == 'performance'
Requires-Dist: fastparquet>=2024.11.0; extra == 'performance'
Requires-Dist: numexpr>=2.10.0; extra == 'performance'
Provides-Extra: test
Requires-Dist: faker>=30.0.0; extra == 'test'
Requires-Dist: hypothesis>=6.122.0; extra == 'test'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'test'
Requires-Dist: pytest-benchmark>=5.0.0; extra == 'test'
Requires-Dist: pytest-cov>=5.0.0; extra == 'test'
Requires-Dist: pytest-mock>=3.14.0; extra == 'test'
Requires-Dist: pytest>=8.3.0; extra == 'test'
Description-Content-Type: text/markdown

# DataBeak

## AI-Powered CSV Processing via Model Context Protocol

Transform how AI assistants work with CSV data. DataBeak provides 40+
specialized tools for data manipulation, analysis, and validation through the
Model Context Protocol (MCP).

## Features

- 🔄 **Complete Data Operations** - Load, transform, analyze, and export CSV data
- 📊 **Advanced Analytics** - Statistics, correlations, outlier detection, data
  profiling
- ✅ **Data Validation** - Schema validation, quality scoring, anomaly detection
- 💾 **Auto-Save & History** - Never lose work, thanks to configurable auto-save
  strategies and undo/redo
- ⚡ **High Performance** - Handles large datasets with streaming and chunking
- 🔒 **Session Management** - Multi-user support with isolated sessions
- 🌟 **Production Quality** - Zero ruff violations, 100% mypy compliance,
  comprehensive test coverage
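The **Auto-Save & History** feature implies an undo/redo mechanism. This README does not describe DataBeak's own history format or storage. The following is therefore only a sketch of one common technique, two stacks of snapshots, built around a hypothetical `History` class:

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class History(Generic[T]):
    """Two-stack undo/redo over snapshots (hypothetical, not DataBeak's API)."""

    current: T
    _undo: list[T] = field(default_factory=list)  # older states, newest last
    _redo: list[T] = field(default_factory=list)  # undone states, newest last

    def apply(self, new_state: T) -> None:
        """Record a new state; a fresh edit discards any redo history."""
        self._undo.append(self.current)
        self._redo.clear()
        self.current = new_state

    def undo(self) -> bool:
        """Step back one state; return False if there is nothing to undo."""
        if not self._undo:
            return False
        self._redo.append(self.current)
        self.current = self._undo.pop()
        return True

    def redo(self) -> bool:
        """Re-apply the last undone state; return False if there is none."""
        if not self._redo:
            return False
        self._undo.append(self.current)
        self.current = self._redo.pop()
        return True


# Rows as tuples: a header plus a duplicated data row.
table = History(current=(("id", "price"), (1, 9.5), (1, 9.5)))
table.apply(table.current[:2])  # e.g. after dropping the duplicate row
table.undo()
print(len(table.current))  # 3
```

Storing whole snapshots is the simplest approach, but it costs memory on large tables. Recording operations or diffs is a common alternative.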

## Getting Started

The fastest way to use DataBeak is with `uvx`, which fetches and runs the server
on demand, so no separate installation step is required.

### For Claude Desktop

Add this to your Claude Desktop configuration file (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
```
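If your configuration file already lists other MCP servers, merge the `databeak` entry into the existing `mcpServers` object instead of replacing the whole file. The following Python sketch performs that merge. The helpers `add_databeak_server` and `update_config_file` are hypothetical and not part of DataBeak. The file's location depends on your platform.

```python
import json
from pathlib import Path
from typing import Any

# The same server entry shown in the JSON snippet above.
DATABEAK_ENTRY: dict[str, Any] = {
    "command": "uvx",
    "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak",
    ],
}


def add_databeak_server(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with a `databeak` entry under `mcpServers`.

    Hypothetical helper: other servers are preserved and an existing
    `databeak` entry is overwritten. The input dict is not modified.
    """
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["databeak"] = json.loads(json.dumps(DATABEAK_ENTRY))
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into the JSON file at `path`, creating it if absent."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    text = json.dumps(add_databeak_server(config), indent=2) + "\n"
    path.write_text(text, encoding="utf-8")
```

After calling `update_config_file(Path("…/claude_desktop_config.json"))` with your platform's path, restart the client so it picks up the change.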

### For Other AI Clients

DataBeak works with Continue, Cline, Windsurf, and Zed. See the
[installation guide](https://jonpspri.github.io/databeak/installation) for
specific configuration examples.

### Quick Test

Once configured, ask your AI assistant:

```text
"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
```
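The third prompt asks for outlier detection. This README does not state which method or thresholds DataBeak uses. For illustration only, the common interquartile-range rule (Tukey's fences) flags values outside `[Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]`. The following is a stdlib-only sketch; the function name `iqr_outliers` is hypothetical.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(values) < 2:
        return []  # quartiles are not meaningful for fewer than two points
    # method="inclusive" uses the same linear interpolation as the defaults
    # of numpy.percentile and pandas.Series.quantile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


prices = [9.5, 10.0, 10.2, 10.4, 10.9, 11.1, 95.0]
print(iqr_outliers(prices))  # [95.0]
```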

## Documentation

📚 **[Complete Documentation](https://jonpspri.github.io/databeak/)**

- [Installation Guide](https://jonpspri.github.io/databeak/installation) - Setup
  for all AI clients
- [Quick Start Tutorial](https://jonpspri.github.io/databeak/tutorials/quickstart) - Learn
  the basics in 10 minutes
- [API Reference](https://jonpspri.github.io/databeak/api/overview) - All 40+
  tools documented
- [Architecture](https://jonpspri.github.io/databeak/architecture) - Technical
  details

## Environment Variables

| Variable                    | Default | Description                                  |
| --------------------------- | ------- | -------------------------------------------- |
| `DATABEAK_MAX_FILE_SIZE_MB` | `1024`  | Maximum file size, in megabytes              |
| `DATABEAK_CSV_HISTORY_DIR`  | `.`     | Directory where operation history is stored  |
| `DATABEAK_SESSION_TIMEOUT`  | `3600`  | Session timeout, in seconds                  |
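These variables are read from the server's process environment. You can export them in your shell or add them to an `env` object alongside `command` and `args` in the client's server entry. DataBeak depends on `pydantic-settings`, but its actual settings class is not shown here. The following stdlib sketch illustrates only how the documented names and defaults resolve; `Settings` and `load_settings` are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical mirror of the documented variables and their defaults."""

    max_file_size_mb: int = 1024
    csv_history_dir: str = "."
    session_timeout: int = 3600  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DATABEAK_* variables, falling back to the documented defaults."""
    d = Settings()
    return Settings(
        max_file_size_mb=int(
            env.get("DATABEAK_MAX_FILE_SIZE_MB", d.max_file_size_mb)
        ),
        csv_history_dir=env.get("DATABEAK_CSV_HISTORY_DIR", d.csv_history_dir),
        session_timeout=int(env.get("DATABEAK_SESSION_TIMEOUT", d.session_timeout)),
    )
```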

## Contributing

We welcome contributions! Please:

1. Fork the repository
1. Create a feature branch (`git checkout -b feature/amazing-feature`)
1. Make your changes with tests
1. Run the tests and quality checks: `uv run -m pytest`, `uv run ruff check`, and
   `uv run mypy`
1. Submit a pull request

**Note**: All changes must go through pull requests. Direct commits to `main`
are blocked by pre-commit hooks.

## Development

```bash
# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync

# Run the server locally
uv run databeak

# Run tests
uv run -m pytest tests/unit/          # Unit tests (primary)
uv run -m pytest                      # All tests

# Run quality checks
uv run ruff check
uv run mypy
```
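Before opening a pull request, it can help to run all three checks in sequence and stop at the first failure. The following is a minimal Python sketch; `CHECKS` and `run_checks` are hypothetical and not part of the repository. From a script, call `sys.exit(run_checks(CHECKS))`.

```python
import subprocess
from collections.abc import Sequence

# The quality-check commands listed above, cheapest first.
CHECKS: list[list[str]] = [
    ["uv", "run", "ruff", "check"],
    ["uv", "run", "mypy"],
    ["uv", "run", "-m", "pytest"],
]


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for cmd in commands:
        print("$", " ".join(cmd), flush=True)
        result = subprocess.run(list(cmd), check=False)
        if result.returncode != 0:
            return result.returncode  # stop at the first failing check
    return 0
```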

### Testing Structure

DataBeak's test suite currently focuses on unit tests. Integration and end-to-end
(E2E) tests are planned:

- **Unit Tests** (`tests/unit/`) - Fast, isolated module tests (current focus)
- **Integration Tests** (`tests/integration/`) - Future: FastMCP Client-based
  testing
- **E2E Tests** (`tests/e2e/`) - Future: Complete workflow validation

**Current Test Execution:**

```bash
# `-n auto` runs tests in parallel and requires the pytest-xdist plugin
uv run pytest -n auto tests/unit/          # Run unit tests (primary)
uv run pytest -n auto --cov=src/databeak   # Run with coverage analysis
```

See the
[Testing Guide](https://github.com/jonpspri/databeak/blob/main/tests/README.md)
for more testing details.

## License

Apache 2.0. See the
[LICENSE](https://github.com/jonpspri/databeak/blob/main/LICENSE) file.

## Support

- **Issues**: [GitHub Issues](https://github.com/jonpspri/databeak/issues)
- **Discussions**:
  [GitHub Discussions](https://github.com/jonpspri/databeak/discussions)
- **Documentation**:
  [jonpspri.github.io/databeak](https://jonpspri.github.io/databeak/)
