Metadata-Version: 2.4
Name: docalyze-mcp-server
Version: 0.2.1
Summary: MCP server for reading and visually analyzing local documents (PDF, Excel, CSV, Word, PowerPoint, images). No API keys required — works with GitHub Copilot and any MCP-compatible AI host.
Project-URL: Repository, https://github.com/LunarPerovskite/docalyze-mcp-server
Author: Juan Esteban Mosquera
License-Expression: MIT
License-File: LICENSE
Keywords: copilot,docalyze,documents,excel,mcp,pdf,vscode
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.10
Requires-Dist: mcp>=1.0.0
Requires-Dist: openpyxl>=3.1.2
Requires-Dist: pandas<3.0,>=2.2.0
Requires-Dist: pdfplumber>=0.11.0
Provides-Extra: all
Requires-Dist: pillow>=10.3.0; extra == 'all'
Requires-Dist: pytesseract>=0.3.10; extra == 'all'
Requires-Dist: python-docx>=1.1.0; extra == 'all'
Requires-Dist: python-pptx>=1.0.0; extra == 'all'
Provides-Extra: docx
Requires-Dist: python-docx>=1.1.0; extra == 'docx'
Provides-Extra: ocr
Requires-Dist: pillow>=10.3.0; extra == 'ocr'
Requires-Dist: pytesseract>=0.3.10; extra == 'ocr'
Provides-Extra: pptx
Requires-Dist: python-pptx>=1.0.0; extra == 'pptx'
Description-Content-Type: text/markdown

<!-- mcp-name: io.github.LunarPerovskite/docalyze -->
<p align="center">
  <img src="logo.svg" alt="Docalyze Logo" width="150" height="150">
</p>

# Docalyze MCP Server

An MCP (Model Context Protocol) server that lets AI assistants read and visually analyze local documents — PDFs, Excel spreadsheets, CSV files, Word documents, PowerPoint presentations, and images.

No API keys required. The host AI (GitHub Copilot, Claude, etc.) does all the reasoning directly.

## Supported Formats

| Format | Extensions | Read | Visual |
|--------|-----------|:----:|:------:|
| PDF | `.pdf` | ✅ | ✅ |
| Excel | `.xlsx`, `.xls` | ✅ | ✅ |
| CSV / TSV | `.csv`, `.tsv` | ✅ | — |
| JSON | `.json` | ✅ | — |
| Word | `.docx` | ✅ | ✅ |
| PowerPoint | `.pptx` | ✅ | ✅ |
| Plain text | `.txt`, `.md` | ✅ | — |
| Images | `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.webp` | — | ✅ |
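
The table above can be read as a lookup from file extension to the modes a file supports. The sketch below encodes it that way. The function name `supported_modes` and the three set constants are hypothetical, chosen for illustration, and are not part of the server's API.

```python
from pathlib import Path

# Extension groups copied from the Supported Formats table.
# The grouping into three sets is an illustrative layout, not the server's code.
READ_AND_VISUAL = {".pdf", ".xlsx", ".xls", ".docx", ".pptx"}
READ_ONLY = {".csv", ".tsv", ".json", ".txt", ".md"}
VISUAL_ONLY = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"}


def supported_modes(path: str) -> set[str]:
    """Return the subset of {"read", "visual"} the table lists for this file."""
    ext = Path(path).suffix.lower()  # compare case-insensitively
    if ext in READ_AND_VISUAL:
        return {"read", "visual"}
    if ext in READ_ONLY:
        return {"read"}
    if ext in VISUAL_ONLY:
        return {"visual"}
    return set()  # extension not listed in the table


print(sorted(supported_modes("report.PDF")))  # ['read', 'visual']
print(sorted(supported_modes("data.csv")))    # ['read']
```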

## Tools

| Tool | Description |
|------|-------------|
| `list_documents` | List files under a directory, filtered by glob pattern |
| `document_info` | Get metadata (size, modified date, sheets) for a file |
| `read_document` | Extract text content from a document with pagination |
| `visual_evaluate_document` | Return page images inline so the AI can analyze charts, tables, and diagrams |
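
To illustrate what "filtered by glob pattern" means for `list_documents`, here is a minimal standalone sketch using `pathlib`. It is not the server's implementation: the helper name `list_matching`, the recursive search, and the sorted relative-path output are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def list_matching(root: str, pattern: str = "*") -> list[str]:
    """Return files under `root` whose name matches `pattern`, as sorted relative paths."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob(pattern)  # assumption: search subdirectories too
        if p.is_file()
    )


# Demo on a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "reports").mkdir()
    (Path(tmp) / "reports" / "q1.pdf").touch()
    (Path(tmp) / "notes.md").touch()
    print(list_matching(tmp, "*.pdf"))  # ['reports/q1.pdf']
```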

## Installation

### From VS Code (recommended)

Search for **docalyze** in the MCP server gallery (Extensions sidebar → MCP tab) and click Install.

### From PyPI

```bash
pip install docalyze-mcp-server
```

### From npm

```bash
npx docalyze-mcp-server
```

This requires [uv](https://docs.astral.sh/uv/) or pipx to be installed. The npm wrapper calls `uvx` to run the Python package automatically.

### Manual setup

Add the following to your VS Code `mcp.json` (or `settings.json`):

```jsonc
{
  "servers": {
    "docalyze": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "docalyze_mcp_server"],
      "env": {
        "PYTHONIOENCODING": "utf-8"
      }
    }
  }
}
```

Or, if you installed via pip and want to use the entry point:

```jsonc
{
  "servers": {
    "docalyze": {
      "type": "stdio",
      "command": "docalyze-mcp-server"
    }
  }
}
```

## Optional Dependencies

The base install handles PDF, Excel, CSV, JSON, and plain text. For additional formats:

```bash
# Quotes keep shells such as zsh from expanding the square brackets.

# Word documents
pip install "docalyze-mcp-server[docx]"

# PowerPoint
pip install "docalyze-mcp-server[pptx]"

# OCR (requires Tesseract installed on your system)
pip install "docalyze-mcp-server[ocr]"

# Everything
pip install "docalyze-mcp-server[all]"
```
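
To check which extras are usable in the current environment, a short script along these lines may help. It is a sketch, not something shipped with the package: the helper name `available_extras` and the result keys are hypothetical. The module names are the import names of the packages listed above (`python-docx` imports as `docx`, `python-pptx` as `pptx`, `pillow` as `PIL`).

```python
import importlib.util
import shutil

# Import names of the packages behind each extra.
EXTRA_MODULES = {
    "docx": ["docx"],
    "pptx": ["pptx"],
    "ocr": ["PIL", "pytesseract"],
}


def available_extras() -> dict[str, bool]:
    """Report which extras have all of their modules importable."""
    status = {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in EXTRA_MODULES.items()
    }
    # OCR also needs the Tesseract binary itself, which pip does not install.
    status["tesseract_binary"] = shutil.which("tesseract") is not None
    return status


for name, ok in available_extras().items():
    print(f"{name}: {'available' if ok else 'missing'}")
```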

## Configuration

The server reads documents from a configurable root directory. Set the `DOCUMENTS_ROOT` environment variable to change it:

```jsonc
{
  "servers": {
    "docalyze": {
      "type": "stdio",
      "command": "docalyze-mcp-server",
      "env": {
        "DOCUMENTS_ROOT": "/path/to/your/documents"
      }
    }
  }
}
```

If `DOCUMENTS_ROOT` is not set, the server defaults to the directory containing the server script.

## License

MIT
