Metadata-Version: 2.4
Name: docalyze-mcp-server
Version: 0.1.0
Summary: MCP server for reading and visually analyzing local documents (PDF, Excel, CSV, Word, PowerPoint, images). No API keys required — works with GitHub Copilot and any MCP-compatible AI host.
Project-URL: Repository, https://github.com/LunarPerovskite/docalyze-mcp-server
Author: Juan Esteban Mosquera
License-Expression: MIT
License-File: LICENSE
Keywords: copilot,docalyze,documents,excel,mcp,pdf,vscode
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.10
Requires-Dist: mcp>=1.0.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: pandas>=2.0
Requires-Dist: pdfplumber>=0.10
Provides-Extra: all
Requires-Dist: pillow>=10.0; extra == 'all'
Requires-Dist: pytesseract>=0.3; extra == 'all'
Requires-Dist: python-docx>=1.0; extra == 'all'
Requires-Dist: python-pptx>=0.6; extra == 'all'
Provides-Extra: docx
Requires-Dist: python-docx>=1.0; extra == 'docx'
Provides-Extra: ocr
Requires-Dist: pillow>=10.0; extra == 'ocr'
Requires-Dist: pytesseract>=0.3; extra == 'ocr'
Provides-Extra: pptx
Requires-Dist: python-pptx>=0.6; extra == 'pptx'
Description-Content-Type: text/markdown

<!-- mcp-name: io.github.LunarPerovskite/docalyze -->

# Docalyze MCP Server

An MCP (Model Context Protocol) server that lets AI assistants read and visually analyze local documents — PDFs, Excel spreadsheets, CSV files, Word documents, PowerPoint presentations, and images.

No API keys required. The host AI (GitHub Copilot, Claude, etc.) does all the reasoning directly.

## Supported Formats

| Format | Extensions | Read | Visual |
|--------|-----------|:----:|:------:|
| PDF | `.pdf` | ✅ | ✅ |
| Excel | `.xlsx`, `.xls` | ✅ | ✅ |
| CSV / TSV | `.csv`, `.tsv` | ✅ | — |
| JSON | `.json` | ✅ | — |
| Word | `.docx` | ✅ | ✅ |
| PowerPoint | `.pptx` | ✅ | ✅ |
| Plain text | `.txt`, `.md` | ✅ | — |
| Images | `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.webp` | — | ✅ |

## Tools

| Tool | Description |
|------|-------------|
| `list_documents` | List files under a directory, filtered by glob pattern |
| `document_info` | Get metadata (size, modified date, sheets) for a file |
| `read_document` | Extract text content from a document with pagination |
| `visual_evaluate_document` | Return page images inline so the AI can analyze charts, tables, and diagrams |
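
To inspect these tools outside an editor, any MCP client can connect over stdio and ask the server what it exposes. The sketch below assumes the `mcp` Python SDK client API (`ClientSession`, `StdioServerParameters`, `stdio_client`) and that the `docalyze-mcp-server` entry point is on `PATH`. The function name `list_docalyze_tools` is hypothetical. It prints each tool's input schema instead of guessing argument names, since this README does not document them.

```python
import asyncio


async def list_docalyze_tools(command: str = "docalyze-mcp-server") -> list[str]:
    """Start the server over stdio and return the names of the tools it advertises."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=command, args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                # inputSchema is the JSON Schema describing the tool's arguments.
                print(tool.name, tool.inputSchema)
            return [tool.name for tool in result.tools]


# Example (requires the server to be installed):
# asyncio.run(list_docalyze_tools())
```

If the install is healthy, the returned names should match the table above.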

## Installation

### From VS Code (recommended)

Search for **docalyze** in the MCP server gallery (Extensions sidebar → MCP tab) and click Install.

### From PyPI

```bash
pip install docalyze-mcp-server
```

### Manual setup

Add the following to your VS Code `mcp.json` (in `settings.json`, nest the same `servers` object under the `mcp` key):

```jsonc
{
  "servers": {
    "docalyze": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "docalyze_mcp_server"],
      "env": {
        "PYTHONIOENCODING": "utf-8"
      }
    }
  }
}
```

Or, if you installed via pip and want to use the entry point:

```jsonc
{
  "servers": {
    "docalyze": {
      "type": "stdio",
      "command": "docalyze-mcp-server"
    }
  }
}
```

## Optional Dependencies

The base install handles PDF, Excel, CSV, JSON, and plain text. For additional formats:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets.

# Word documents
pip install "docalyze-mcp-server[docx]"

# PowerPoint
pip install "docalyze-mcp-server[pptx]"

# OCR (requires Tesseract installed on your system)
pip install "docalyze-mcp-server[ocr]"

# Everything
pip install "docalyze-mcp-server[all]"
```
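
To check which extras are usable in the current environment, you can probe for the modules they provide. This is a minimal sketch, not part of the package: `EXTRA_MODULES` and `available_extras` are hypothetical names, and the mapping relies only on the import names of the listed dependencies (`docx` for python-docx, `pptx` for python-pptx, `PIL` for pillow, `pytesseract`).

```python
from importlib.util import find_spec

# Import names provided by each extra's dependencies.
EXTRA_MODULES = {
    "docx": ["docx"],
    "pptx": ["pptx"],
    "ocr": ["PIL", "pytesseract"],
}


def available_extras() -> dict[str, bool]:
    """Report, per extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra}: {'installed' if ok else 'missing'}")
```

This check covers only the Python packages. The `ocr` extra also needs the Tesseract binary on your system, which it does not detect.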

## Configuration

The server reads documents from a configurable root directory. Set the `DOCUMENTS_ROOT` environment variable to change it:

```jsonc
{
  "servers": {
    "docalyze": {
      "type": "stdio",
      "command": "docalyze-mcp-server",
      "env": {
        "DOCUMENTS_ROOT": "/path/to/your/documents"
      }
    }
  }
}
```

If `DOCUMENTS_ROOT` is not set, the root defaults to the directory containing the server script.
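
The fallback rule can be sketched in a few lines. This illustrates the behavior described above and is not the server's actual code; `resolve_documents_root` and its parameters are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_documents_root(
    script_file: str, environ: Optional[Mapping[str, str]] = None
) -> Path:
    """Return DOCUMENTS_ROOT if set, otherwise the directory of the server script."""
    env = os.environ if environ is None else environ
    configured = env.get("DOCUMENTS_ROOT")
    if configured:
        # An explicit setting wins; normalize it to an absolute path.
        return Path(configured).expanduser().resolve()
    # Assumed fallback: the folder that contains the script itself.
    return Path(script_file).resolve().parent


print(resolve_documents_root(__file__ if "__file__" in globals() else "server.py"))
```

Passing `environ` explicitly makes it easy to try both branches without touching the real environment.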

## License

MIT
