Metadata-Version: 2.4
Name: source-coop-mcp
Version: 0.2.6
Summary: MCP server for Source Cooperative auto-discovery and data exploration
Project-URL: Homepage, https://github.com/yharby/source-coop-mcp
Project-URL: Repository, https://github.com/yharby/source-coop-mcp
Project-URL: Documentation, https://github.com/yharby/source-coop-mcp#readme
Project-URL: Issues, https://github.com/yharby/source-coop-mcp/issues
Project-URL: Changelog, https://github.com/yharby/source-coop-mcp/blob/main/CHANGELOG.md
Author-email: yharby <me@youssefharby.com>
License: MIT
Keywords: geospatial,mcp,model-context-protocol,object-storage,s3,source-cooperative
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: fastmcp>=2.13.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: obstore>=0.8.2
Provides-Extra: dev
Requires-Dist: pre-commit>=4.3.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.2.0; extra == 'dev'
Requires-Dist: pytest>=8.4.2; extra == 'dev'
Requires-Dist: ruff>=0.14.4; extra == 'dev'
Description-Content-Type: text/markdown

# Source Cooperative MCP Server

[![Tests](https://github.com/yharby/source-coop-mcp/actions/workflows/test-and-report.yml/badge.svg)](https://github.com/yharby/source-coop-mcp/actions)
[![PyPI version](https://badge.fury.io/py/source-coop-mcp.svg)](https://pypi.org/project/source-coop-mcp/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Discover and access 800TB+ of geospatial data through AI agents.**

An MCP (Model Context Protocol) server for [Source Cooperative](https://source.coop), a collaborative repository with datasets from Maxar, Harvard, ESA, USGS, and 94+ organizations in total.

---

## 🏗️ Architecture Overview

```mermaid
graph TB
    subgraph "AI Clients"
        A1[Claude Desktop]
        A2[Claude Code]
        A3[Cursor]
        A4[Cline]
        A5[Zed]
        A6[Continue.dev]
    end

    subgraph "MCP Server"
        MCP[Source Cooperative MCP<br/>FastMCP + obstore]
    end

    subgraph "6 Available Tools"
        T1[list_accounts<br/>94+ orgs]
        T2[list_products<br/>hybrid S3+API]
        T3[get_product_details<br/>+ README]
        T4[list_product_files<br/>tree mode]
        T5[get_file_metadata<br/>no download]
        T6[search<br/>hybrid fuzzy]
    end

    subgraph "Data Sources"
        S1[HTTP API<br/>source.coop/api]
        S2[S3 Direct<br/>opendata.source.coop]
    end

    A1 -->|JSON-RPC| MCP
    A2 -->|JSON-RPC| MCP
    A3 -->|JSON-RPC| MCP
    A4 -->|JSON-RPC| MCP
    A5 -->|JSON-RPC| MCP
    A6 -->|JSON-RPC| MCP

    MCP --> T1
    MCP --> T2
    MCP --> T3
    MCP --> T4
    MCP --> T5
    MCP --> T6

    T1 --> S2
    T2 --> S1
    T2 --> S2
    T3 --> S1
    T3 --> S2
    T4 --> S2
    T5 --> S2
    T6 --> S1
    T6 --> S2

    style MCP fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
    style S1 fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style S2 fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
```

**Key Features:**
- ✅ **Token Optimized** - 72% reduction for large datasets
- ✅ **Smart Partitions** - Auto-detects Hive-style patterns
- ✅ **Fuzzy Search** - Handles typos and partial matches
- ✅ **No Auth** - All 800TB+ is public

---

## 🚀 Quick Start

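### Install

No separate install step is needed: `uvx` (part of `uv`) fetches the package from PyPI and runs the server in one command.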

```bash
uvx source-coop-mcp
```

### Configure Your AI Client

#### **Claude Desktop / Claude Code / Cursor / Cline**

Add to config file:
- **Claude Desktop**: `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)
- **Claude Code**: `.mcp.json` in the project root (or `claude mcp add`)
- **Cursor**: Cursor settings
- **Cline**: Cline MCP settings

```json
{
  "mcpServers": {
    "source-coop": {
      "command": "uvx",
      "args": ["source-coop-mcp"]
    }
  }
}
```
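
If the config file already lists other servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file has already been loaded into a dict; `add_source_coop_server` and the `other` server entry are hypothetical names used only for illustration:

```python
import json


def add_source_coop_server(config: dict) -> dict:
    """Return a copy of `config` with the source-coop entry added.

    Existing entries under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["source-coop"] = {"command": "uvx", "args": ["source-coop-mcp"]}
    merged["mcpServers"] = servers
    return merged


# Example: a config that already has one (made-up) server configured.
existing = {"mcpServers": {"other": {"command": "npx", "args": ["other-server"]}}}
updated = add_source_coop_server(existing)
print(json.dumps(updated, indent=2))
```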

#### **Zed**

Add to Zed settings:

```json
{
  "context_servers": {
    "source-coop": {
      "command": "uvx",
      "args": ["source-coop-mcp"]
    }
  }
}
```

#### **Continue.dev**

Add to Continue config (`~/.continue/config.json`):

```json
{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "uvx",
          "args": ["source-coop-mcp"]
        }
      }
    ]
  }
}
```

**Restart your AI client and start exploring!**

---

## 🛠️ Available Tools

| Tool | Purpose | Performance |
|------|---------|-------------|
| `list_accounts()` | Find all 94+ organizations | ~850ms |
| `list_products()` | **Hybrid:** S3 mode (default) for ALL datasets + file counts | ~240ms |
| `list_products(include_unpublished=False)` | API mode for published datasets with rich metadata | ~500ms |
| `get_product_details()` | Get metadata + README automatically | ~650ms |
| `list_product_files()` | List files with S3/HTTP paths | ~240ms |
| `list_product_files(show_tree=True)` | Tree view (72% token savings) | ~980ms |
| `get_file_metadata()` | Get file info without downloading | ~230ms |
| `search(query)` | **Hybrid:** Search accounts + products (published + unpublished), top 5 results | ~5-10s |

---

## 💡 What You Can Do

### Discover Data

```
"List all organizations in Source Cooperative"
→ Returns 94+ organizations: maxar, planet, harvard, etc.

"Find all datasets for harvard-lil"
→ Discovers published + unpublished products

"Search for climate datasets"
→ Smart fuzzy search handles typos and partial matches
```

### Access Files

```
"List files in harvard-lil/gov-data"
→ Returns S3 paths and HTTP URLs ready for analysis

"Show me the file tree with partition detection"
→ Smart visualization: year={2020,2021,...+5 more}/ [partitioned]

"Get file metadata without downloading"
→ Size, last modified, ETag
```
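
The partition summary shown above can be approximated as follows. This is a sketch of one way to collapse Hive-style `key=value` directories, not the server's actual implementation; the function name `summarize_partitions` and the `max_shown` parameter are assumptions.

```python
import re
from collections import defaultdict

# A Hive-style partition directory looks like "key=value", e.g. "year=2020".
HIVE_SEGMENT = re.compile(r"^([A-Za-z_][\w-]*)=(.+)$")


def summarize_partitions(dir_names, max_shown=2):
    """Collapse sibling `key=value` directories into one summary line per key."""
    values_by_key = defaultdict(list)
    others = []
    for name in dir_names:
        match = HIVE_SEGMENT.match(name)
        if match:
            values_by_key[match.group(1)].append(match.group(2))
        else:
            others.append(name)

    lines = []
    for key, values in values_by_key.items():
        values = sorted(values)
        shown = ",".join(values[:max_shown])
        hidden = len(values) - max_shown
        suffix = f",...+{hidden} more" if hidden > 0 else ""
        lines.append(f"{key}={{{shown}{suffix}}}/ [partitioned]")
    # Non-partition directories are listed unchanged.
    lines.extend(f"{name}/" for name in others)
    return lines


dirs = [f"year={y}" for y in range(2020, 2027)] + ["docs"]
for line in summarize_partitions(dirs):
    print(line)
```

Seven `year=` directories become a single line, which is where the extra token savings for partitioned datasets come from.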

### Smart Search

```
"Search for climte" (typo)
→ Finds "climate" datasets (fuzzy matching)

"Search for geo" (partial)
→ Finds "geospatial", "geocoding", etc.
```
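
One plausible way to get both behaviors (typo tolerance and partial matches) is to combine a substring check with a similarity ratio. The sketch below uses the standard library's `difflib`; the scoring scheme, the `cutoff` value, and the sample names are assumptions and may differ from what the server does internally.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, names, cutoff=0.6, limit=5):
    """Rank `names` against `query`, tolerating typos and partial input."""
    query = query.lower()
    scored = []
    for name in names:
        lowered = name.lower()
        if query in lowered:
            score = 1.0  # partial match: the query is a substring
        else:
            score = SequenceMatcher(None, query, lowered).ratio()
        if score >= cutoff:
            scored.append((score, name))
    # Best score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


NAMES = ["climate", "geospatial", "geocoding", "landcover"]
print(fuzzy_search("climte", NAMES))  # typo
print(fuzzy_search("geo", NAMES))     # partial
```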

---

## ⚡ Features

| Feature | Description |
|---------|-------------|
| **Complete Discovery** | Finds unpublished products the official API doesn't show |
| **No Authentication** | All 800TB+ data is public |
| **Fast Performance** | Rust-backed S3 client (`obstore`; up to 9x faster than boto3) |
| **Token Optimized** | Tree mode: 72% token reduction for large datasets |
| **Smart Partitions** | Auto-detects patterns: `year={2020,2021,...}` |
| **Fuzzy Search** | Handles typos and partial matches |
| **README Integration** | Documentation automatically included |
| **800TB+ Data** | 94+ organizations, geospatial datasets |

---

## 📋 Example Workflow

```
1. "List all organizations"
   → Get 94+ account names

2. "Show me all datasets from maxar"
   → Discover published + unpublished products

3. "Search for climate data"
   → Smart fuzzy search finds relevant datasets

4. "Get details for harvard-lil/gov-data"
   → Full metadata + README content

5. "List files in this dataset with tree view"
   → Token-optimized tree with partition detection
```

---

## 🎯 Why This Server?

### Problem
Source Cooperative has 800TB+ of valuable data, but:
- Official API only shows **published** products
- No auto-discovery of organizations
- Requires knowing what you're looking for

### Solution
This MCP server provides:
- ✅ Complete auto-discovery (published + unpublished)
- ✅ Smart search with fuzzy matching
- ✅ Direct S3 access for all files
- ✅ Token-optimized outputs (72% reduction)
- ✅ Smart partition detection (10-88% additional savings)
- ✅ README documentation included automatically
- ✅ No authentication required

---

## 📊 Performance

All operations except `search` complete in **under 1 second**:

```
list_accounts():                          ~850ms  (94+ organizations)
list_products():                          ~240ms  (S3 mode - ALL datasets + file counts)
list_products(include_unpublished=False): ~500ms  (API mode - published with metadata)
list_product_files():                     ~240ms  (simple list)
list_product_files(show_tree=True):       ~980ms  (72% token savings)
get_file_metadata():                      ~230ms  (HEAD only)
search(query):                            ~5-10s  (hybrid search - 1 recursive S3 scan, top 5 enriched)
```
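
The "HEAD only" note for `get_file_metadata()` means the size, last-modified time, and ETag come from response headers, so no file body is transferred. A standard-library sketch of that idea follows; the function names are hypothetical, and any URL passed to `head_metadata` is up to the caller.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def metadata_from_headers(headers):
    """Extract size, last-modified time, and ETag from HTTP response headers."""
    size = headers.get("Content-Length")
    modified = headers.get("Last-Modified")
    etag = headers.get("ETag")
    return {
        "size_bytes": int(size) if size is not None else None,
        "last_modified": parsedate_to_datetime(modified) if modified else None,
        "etag": etag.strip('"') if etag else None,
    }


def head_metadata(url, timeout=10.0):
    """Issue a HEAD request (no body download) and parse the headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return metadata_from_headers(response.headers)


# Offline example using made-up header values.
sample = {
    "Content-Length": "1048576",
    "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT",
    "ETag": '"abc123"',
}
info = metadata_from_headers(sample)
print(info)
```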

### Token Optimization Impact

| Dataset Size | Without Tree | With Tree | Saved |
|--------------|--------------|-----------|-------|
| 10 files | 1,500 tokens | 415 tokens | 72.3% |
| 100 files | 15,000 tokens | 4,150 tokens | 72.3% |
| 1,000 files | 150,000 tokens | 41,500 tokens | 72.3% |

With partition detection (1,000 partitions): **88% total savings!**
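
The savings come from not repeating the shared path prefix on every line. A small sketch of the idea, assuming a plain indented tree (the server's real output format may differ) and using the table's token counts to check the percentage; `render_tree` and `savings_pct` are hypothetical helpers, and the sample paths are made up:

```python
def render_tree(paths):
    """Render slash-separated paths as an indented tree, one name per line."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})

    lines = []

    def walk(node, depth):
        for name in sorted(node):
            children = node[name]
            # Directories (nodes with children) get a trailing slash.
            lines.append("  " * depth + name + ("/" if children else ""))
            walk(children, depth + 1)

    walk(tree, 0)
    return "\n".join(lines)


def savings_pct(flat_tokens, tree_tokens):
    """Percentage of tokens saved by the tree view, rounded to one decimal."""
    return round(100 * (1 - tree_tokens / flat_tokens), 1)


paths = [
    "acct/prod/data/a.parquet",
    "acct/prod/data/b.parquet",
    "acct/prod/README.md",
]
print(render_tree(paths))
print(savings_pct(1500, 415))
```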

---

## 🔧 Requirements

- **Python**: 3.11 or higher
- **Package Manager**: `uv` (provides the `uvx` command used above)
- **Operating Systems**: macOS, Linux, Windows

---

## 🤝 Development

See [DEVELOPMENT.md](DEVELOPMENT.md) for:
- Architecture details
- Testing instructions
- Contributing guidelines
- Performance benchmarks
- Token optimization details

---

## 📝 Support

- **Issues**: [GitHub Issues](https://github.com/yharby/source-coop-mcp/issues)

---

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.
