Metadata-Version: 2.4
Name: databeak
Version: 0.1.2
Summary: DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools
Project-URL: Homepage, https://github.com/jonpspri/databeak
Project-URL: Documentation, https://github.com/jonpspri/databeak#readme
Project-URL: Repository, https://github.com/jonpspri/databeak
Project-URL: Issues, https://github.com/jonpspri/databeak/issues
Project-URL: Changelog, https://github.com/jonpspri/databeak/blob/main/CHANGELOG.md
Project-URL: Release Notes, https://github.com/jonpspri/databeak/releases
Author-email: Jonathan Springer <jps@s390x.com>
Maintainer-email: Jonathan Springer <jps@s390x.com>
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: csv,data-analysis,data-manipulation,data-profiling,data-quality,data-validation,fastmcp,mcp,model-context-protocol,outlier-detection,pandas
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: chardet>=5.2.0
Requires-Dist: fastapi>=0.117.1
Requires-Dist: fastmcp>=2.12.4
Requires-Dist: httpx>=0.27.0
Requires-Dist: numpy>=2.1.3
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: pandas>=2.2.3
Requires-Dist: pandera>=0.26.1
Requires-Dist: psutil>=7.0.0
Requires-Dist: pyarrow>=17.0.0
Requires-Dist: pydantic-settings>=2.10.1
Requires-Dist: pydantic>=2.10.4
Requires-Dist: python-dateutil>=2.9.0
Requires-Dist: pytz>=2024.2
Requires-Dist: scipy>=1.16.1
Requires-Dist: simpleeval>=1.0.3
Requires-Dist: tabulate>=0.9.0
Requires-Dist: uvicorn>=0.36.0
Description-Content-Type: text/markdown

# DataBeak

[![Tests](https://github.com/jonpspri/databeak/actions/workflows/test.yml/badge.svg)](https://github.com/jonpspri/databeak/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/jonpspri/databeak/branch/main/graph/badge.svg)](https://codecov.io/gh/jonpspri/databeak)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

## AI-Powered CSV Processing via Model Context Protocol

Transform how AI assistants work with CSV data. DataBeak provides 40+
specialized tools for data manipulation, analysis, and validation through the
Model Context Protocol (MCP).

## Features

- 🔄 **Complete Data Operations** - Load, transform, analyze, and export CSV data
- 📊 **Advanced Analytics** - Statistics, correlations, outlier detection, data
  profiling
- ✅ **Data Validation** - Schema validation, quality scoring, anomaly detection
- 🎯 **Stateless Design** - Clean MCP architecture with external context
  management
- ⚡ **High Performance** - Handles large datasets with streaming and chunking
- 🔒 **Session Management** - Multi-user support with isolated sessions
- 🌟 **Code Quality** - Zero ruff violations, 100% mypy compliance, perfect MCP
  documentation standards, comprehensive test coverage

## Getting Started

The fastest way to use DataBeak is with `uvx` (no installation required):

### For Claude Desktop

Add this to your MCP Settings file:

```json
{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
```

### For Other AI Clients

DataBeak works with Continue, Cline, Windsurf, and Zed. See the
[installation guide](https://jonpspri.github.io/databeak/installation) for
specific configuration examples.

### HTTP Mode (Advanced)

For HTTP-based AI clients or custom deployments:

```bash
# Run in HTTP mode
uv run databeak --transport http --host 0.0.0.0 --port 8000

# Access server at http://localhost:8000/mcp
# Health check at http://localhost:8000/health
```

### Quick Test

Once configured, ask your AI assistant:

```text
"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
```

## Documentation

📚 **[Complete Documentation](https://jonpspri.github.io/databeak/)**

- [Installation Guide](https://jonpspri.github.io/databeak/installation) - Setup
  for all AI clients
- [Quick Start Tutorial](https://jonpspri.github.io/databeak/tutorials/quickstart)
  \- Learn in 10 minutes
- [API Reference](https://jonpspri.github.io/databeak/api/overview) - All 40+
  tools documented
- [Architecture](https://jonpspri.github.io/databeak/architecture) - Technical
  details

## Environment Variables

| Variable                    | Default | Description               |
| --------------------------- | ------- | ------------------------- |
| `DATABEAK_MAX_FILE_SIZE_MB` | 1024    | Maximum file size         |
| `DATABEAK_CSV_HISTORY_DIR`  | "."     | History storage location  |
| `DATABEAK_SESSION_TIMEOUT`  | 3600    | Session timeout (seconds) |

## Known Limitations

DataBeak is designed for interactive CSV processing with AI assistants. Be aware
of these constraints:

- **File Size**: Maximum 1024MB per file (configurable via
  `DATABEAK_MAX_FILE_SIZE_MB`)
- **Session Management**: Maximum 100 concurrent sessions, 1-hour timeout
  (configurable)
- **Memory**: Large datasets may require significant memory; monitor with
  `system_info` tool
- **CSV Dialects**: Assumes standard CSV format; complex dialects may require
  pre-processing
- **Concurrency**: Single-threaded processing per session; parallel sessions
  supported
- **Data Types**: Automatic type inference; complex types may need explicit
  conversion
- **URL Loading**: HTTPS only; blocks private networks (127.0.0.1, 192.168.x.x,
  10.x.x.x) for security

For production deployments with larger datasets, consider adjusting environment
variables and monitoring resource usage.

## Contributing

We welcome contributions! Please:

1. Fork the repository
1. Create a feature branch (`git checkout -b feature/amazing-feature`)
1. Make your changes with tests
1. Run quality checks: `uv run -m pytest`
1. Submit a pull request

**Note**: All changes must go through pull requests. Direct commits to `main`
are blocked by pre-commit hooks.

## Development

```bash
# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync

# Run the server locally
uv run databeak

# Run tests
uv run -m pytest tests/unit/          # Unit tests (primary)
uv run -m pytest                      # All tests

# Run quality checks
uv run ruff check
uv run mypy src/databeak/
```

### Testing Structure

DataBeak implements comprehensive unit and integration testing:

- **Unit Tests** (`tests/unit/`) - 940+ fast, isolated module tests
- **Integration Tests** (`tests/integration/`) - 43 FastMCP Client-based
  protocol tests across 7 test files
- **E2E Tests** (`tests/e2e/`) - Planned: Complete workflow validation

**Test Execution:**

```bash
uv run pytest -n auto tests/unit/          # Run unit tests (940+ tests)
uv run pytest -n auto tests/integration/   # Run integration tests (43 tests)
uv run pytest -n auto --cov=src/databeak   # Run with coverage analysis
```

See [Testing Guide](tests/README.md) for comprehensive testing details.

## License

Apache 2.0 - see [LICENSE](LICENSE) file.

## Support

- **Issues**: [GitHub Issues](https://github.com/jonpspri/databeak/issues)
- **Discussions**:
  [GitHub Discussions](https://github.com/jonpspri/databeak/discussions)
- **Documentation**:
  [jonpspri.github.io/databeak](https://jonpspri.github.io/databeak/)
