Metadata-Version: 2.4
Name: p6plab-datasage
Version: 1.0.0
Summary: MCP server for secure local file system access
Project-URL: Homepage, https://github.com/p6plab/datasage
Project-URL: Repository, https://github.com/p6plab/datasage
Project-URL: Issues, https://github.com/p6plab/datasage/issues
Author-email: P6P Lab <contact@p6plab.com>
License: MIT License
        
        Copyright (c) 2024 P6P Lab
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: fastmcp>=2.0.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# DataSage MCP Server

DataSage is a Model Context Protocol (MCP) server that provides AI assistants with secure access to local file systems. It enables generative AI tools like Amazon Q, Claude Desktop, and other MCP-compatible clients to search, read, and navigate local files and directories through a standardized interface.

## Features

- **Secure File Access**: Configurable path restrictions prevent access outside specified directories
- **Full-Text Search**: Search file contents and filenames with fuzzy matching, regex, and exact matching
- **Directory Traversal**: Navigate directory structures with configurable depth limits
- **Text File Support**: Automatic detection and handling of text-based files with encoding support
- **MCP Compliant**: Follows Model Context Protocol specification for seamless AI integration
- **FastMCP v2**: Built on the latest FastMCP framework for optimal performance

## Installation

Run DataSage with uvx (recommended; uvx fetches the package and runs it in an isolated environment, so no separate install step is needed):

```bash
uvx p6plab-datasage
```

Or install with pip:

```bash
pip install p6plab-datasage
```

## Quick Start

1. **Create a configuration file** (`datasage.yaml`):

```yaml
server:
  name: "My DataSage"
  description: "Local file server for AI assistants"

paths:
  - path: "~/Documents"
    description: "Personal documents"
  - path: "~/Code"
    description: "Source code files"

settings:
  max_depth: 10
  max_file_size: 10485760  # 10 MiB (10 * 1024 * 1024 bytes)
```

2. **Start the server**:

```bash
# STDIO transport (for Claude Desktop, etc.)
uvx p6plab-datasage

# HTTP transport (for web-based clients)
uvx p6plab-datasage --transport http --port 8000

# Custom configuration
uvx p6plab-datasage --config my-config.yaml
```

## Configuration

### Configuration File Format

DataSage uses YAML configuration files with the following structure:

```yaml
server:
  name: "DataSage"                    # Server name
  description: "File server for AI"   # Server description

paths:                                # Allowed file paths
  - path: "~/Documents"
    description: "Documents folder"
  - path: "/Users/shared/projects"
    description: "Shared projects"

settings:
  max_depth: 10                       # Maximum directory depth
  max_file_size: 10485760            # Maximum file size in bytes (10 MiB)
  text_detection: "auto"             # Text file detection method
  excluded_extensions:               # Binary file extensions to skip
    - ".exe"
    - ".jpg"
    - ".pdf"

tools:
  search:
    description: "Search files"       # Tool descriptions
    max_results: 50
  get_page:
    description: "Get file content"
  get_page_children:
    description: "List directory contents"

search:
  fuzzy_threshold: 0.8               # Fuzzy matching threshold
  enable_regex: true                 # Enable regex search
  index_content: true                # Index file contents
```

### Environment Variables

Environment variables take priority over the corresponding values in the configuration file:

```bash
export DATASAGE_NAME="Custom DataSage"
export DATASAGE_DESCRIPTION="Custom description"
export DATASAGE_PATHS="/path1,/path2"
export DATASAGE_MAX_DEPTH=5
export DATASAGE_TOOL_SEARCH_DESC="Custom search description"
```
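The precedence rule above can be pictured as a merge in which environment values are applied on top of the parsed YAML. The following is only a minimal sketch of that idea, not DataSage's actual loader: the `ENV_MAP` table, the `apply_env_overrides` helper, and the assumption that `DATASAGE_PATHS` replaces the whole `paths` section are all hypothetical.

```python
import os

# Hypothetical mapping from the environment variables shown above to
# (section, key, converter) in the YAML structure; the real mapping may differ.
ENV_MAP = {
    "DATASAGE_NAME": ("server", "name", str),
    "DATASAGE_DESCRIPTION": ("server", "description", str),
    "DATASAGE_MAX_DEPTH": ("settings", "max_depth", int),
}


def apply_env_overrides(config, environ=None):
    """Return a copy of config with environment overrides applied on top."""
    environ = os.environ if environ is None else environ
    # Copy one level deep so the caller's config is left untouched
    merged = {
        section: dict(values) if isinstance(values, dict) else values
        for section, values in config.items()
    }
    for var, (section, key, convert) in ENV_MAP.items():
        if var in environ:
            merged.setdefault(section, {})[key] = convert(environ[var])
    # Assumption: DATASAGE_PATHS is a comma-separated list that replaces "paths"
    if "DATASAGE_PATHS" in environ:
        merged["paths"] = [
            {"path": p.strip()}
            for p in environ["DATASAGE_PATHS"].split(",")
            if p.strip()
        ]
    return merged
```

Passing an explicit `environ` mapping instead of reading `os.environ` directly keeps the merge easy to test in isolation.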

## Available Tools

DataSage provides three MCP tools:

### 1. `search`
Search files by content or filename with multiple matching algorithms.

**Parameters:**
- `query` (required): Search query string
- `file_type` (optional): File extension filter (e.g., ".py", ".md")
- `search_type` (optional): "content", "filename", or "both" (default: "both")
- `match_type` (optional): "exact", "fuzzy", or "regex" (default: "fuzzy")
- `max_results` (optional): Maximum results to return (default: 20)
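
To illustrate how the three `match_type` values differ, here is a small sketch using only the standard library. It is not DataSage's search implementation: the `matches` helper is hypothetical, "exact" is assumed to mean a substring match, and fuzzy matching is approximated with `difflib.SequenceMatcher` against each word, using the `fuzzy_threshold` value of 0.8 from the configuration example.

```python
import re
from difflib import SequenceMatcher


def matches(query, text, match_type="fuzzy", fuzzy_threshold=0.8):
    """Return True if text matches query under the given match type."""
    if match_type == "exact":
        # Assumption: "exact" means the query appears verbatim in the text
        return query in text
    if match_type == "regex":
        return re.search(query, text) is not None
    if match_type == "fuzzy":
        # Compare the query with each whitespace-separated word, ignoring case
        q = query.lower()
        return any(
            SequenceMatcher(None, q, word.lower()).ratio() >= fuzzy_threshold
            for word in text.split()
        )
    raise ValueError(f"unknown match_type: {match_type!r}")
```

With this sketch, a misspelled query such as `"functon"` still matches text containing `function` in fuzzy mode, while exact mode rejects it.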

### 2. `get_page`
Retrieve the content of a specific file.

**Parameters:**
- `path` (required): File path to read
- `encoding` (optional): Text encoding (default: "utf-8")
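
A file read that honors the `max_file_size` setting and the `encoding` parameter might look roughly like the sketch below. The `read_text_file` helper and the null-byte heuristic for spotting binary files are assumptions for illustration; the source does not describe how DataSage detects text files.

```python
from pathlib import Path


def read_text_file(path, encoding="utf-8", max_file_size=10485760):
    """Read a text file, refusing oversized or binary-looking files."""
    p = Path(path)
    size = p.stat().st_size
    if size > max_file_size:
        raise ValueError(f"{p} is {size} bytes, above the {max_file_size} byte limit")
    data = p.read_bytes()
    # Heuristic (assumption): a NUL byte near the start suggests binary content
    if b"\x00" in data[:8192]:
        raise ValueError(f"{p} looks like a binary file")
    return data.decode(encoding)
```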

### 3. `get_page_children`
List the contents of a directory with optional recursion.

**Parameters:**
- `path` (required): Directory path to list
- `max_depth` (optional): Maximum recursion depth (default: 1)
- `include_files` (optional): Include files in results (default: true)
- `include_dirs` (optional): Include directories in results (default: true)
- `file_filter` (optional): File extension filter
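
The interaction between `max_depth`, `include_files`, `include_dirs`, and `file_filter` can be sketched with `pathlib`. This is an illustrative approximation, not the server's code: the `list_children` helper is hypothetical, and it assumes that `max_depth=1` lists only the direct children and that `file_filter` is compared against the file suffix.

```python
from pathlib import Path


def list_children(path, max_depth=1, include_files=True, include_dirs=True, file_filter=None):
    """List entries under path, descending at most max_depth levels."""
    results = []

    def walk(directory, depth):
        for entry in sorted(directory.iterdir()):
            if entry.is_dir():
                if include_dirs:
                    results.append(entry)
                # Do not follow symlinked directories, to avoid cycles
                if depth < max_depth and not entry.is_symlink():
                    walk(entry, depth + 1)
            elif include_files and (file_filter is None or entry.suffix == file_filter):
                results.append(entry)

    walk(Path(path), 1)
    return results
```

Directories are still descended into when `include_dirs` is false, so a call such as `list_children(root, max_depth=3, include_dirs=False, file_filter=".py")` returns only the Python files found within three levels.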

## Usage Examples

### With Claude Desktop

Add to your Claude Desktop MCP configuration:

```json
{
  "mcpServers": {
    "datasage": {
      "command": "uvx",
      "args": ["p6plab-datasage", "--config", "/path/to/datasage.yaml"]
    }
  }
}
```

### With FastMCP Client

```python
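import asyncio
from fastmcp import Client
from fastmcp.client.transports import StdioTransport

async def main():
    # Launch the server as a subprocess over STDIO. Client does not parse a
    # shell command string, so the command and its arguments are passed explicitly.
    transport = StdioTransport(command="uvx", args=["p6plab-datasage"])
    async with Client(transport) as client:
        # Search the contents of Python files for "function"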
        result = await client.call_tool("search", {
            "query": "function",
            "file_type": ".py",
            "search_type": "content"
        })
        print(result.content[0].text)

asyncio.run(main())
```

### Command Line Options

```bash
# Basic usage
uvx p6plab-datasage

# HTTP server
uvx p6plab-datasage --transport http --port 8000

# Custom configuration
uvx p6plab-datasage --config /path/to/config.yaml

# Bind to all interfaces
uvx p6plab-datasage --transport http --host 0.0.0.0 --port 8000

# Show help
uvx p6plab-datasage --help
```

## Security

DataSage implements multiple security measures:

- **Path Validation**: Only allows access to explicitly configured paths
- **Directory Traversal Protection**: Rejects `../` sequences that would escape the configured paths
- **File Type Filtering**: Automatically excludes binary files
- **Size Limits**: Configurable maximum file sizes
- **Permission Checking**: Respects file system permissions
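
Path validation and traversal protection are commonly implemented by resolving the requested path and checking that it still lies under an allowed root. The sketch below shows that general technique under stated assumptions; the `is_allowed` helper is hypothetical and is not taken from DataSage's source.

```python
from pathlib import Path


def is_allowed(requested, allowed_roots):
    """Return True if requested resolves to a location inside an allowed root."""
    # resolve() collapses ".." segments and follows symlinks before comparing
    target = Path(requested).expanduser().resolve()
    for root in allowed_roots:
        base = Path(root).expanduser().resolve()
        # is_relative_to compares whole path components, so "/docs-other"
        # is not treated as being inside "/docs"
        if target.is_relative_to(base):
            return True
    return False
```

Resolving before comparing matters: a plain string prefix check would accept `~/Documents/../.ssh/id_rsa` because it starts with an allowed path.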

## Development

### Running from Source

```bash
git clone <repository>
cd datasage
pip install -e .
python -m p6plab_datasage.server --config examples/datasage.yaml
```

### Running Tests

```bash
pip install -e ".[dev]"
pytest
```

### Using FastMCP CLI

```bash
fastmcp run src/p6plab_datasage/server.py
fastmcp run src/p6plab_datasage/server.py --transport http --port 8000
```

## License

MIT License - see LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.
