Metadata-Version: 2.4
Name: wise-pizza-mcp
Version: 0.1.2
Summary: MCP server for wise-pizza: find interesting segments in multidimensional data
Project-URL: Homepage, https://github.com/MotleyAI/wise-pizza-mcp
Project-URL: Repository, https://github.com/MotleyAI/wise-pizza-mcp
Project-URL: Bug Tracker, https://github.com/MotleyAI/wise-pizza-mcp/issues
License-Expression: MIT
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: kaleido
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: plotly
Requires-Dist: wise-pizza>=0.2.9
Provides-Extra: dev
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Description-Content-Type: text/markdown

# wise-pizza-mcp

**Slice through your data like a hot knife through pizza.**

An MCP server that wraps the [wise-pizza](https://github.com/transferwise/wise-pizza) library, letting LLMs find the most interesting segments in multidimensional data. Load CSVs once, then run multiple analyses — all through the Model Context Protocol.

## What it does

wise-pizza finds segments in your data that are unusual or that explain changes between time periods. This MCP server exposes four analysis methods:

- **explain_levels** — Find segments whose average is most different from the global average
- **explain_changes_in_totals** — Explain what drove changes in totals between two datasets
- **explain_changes_in_average** — Explain what drove changes in averages between two datasets
- **explain_timeseries** — Find segments with different temporal patterns in panel data

Plus dataset management tools to load, list, and remove CSV data.

## Prerequisites

This server uses [uv](https://docs.astral.sh/uv/) to run. Install it if you don't have it:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

## Quick start

### Install with uvx

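Add the server to your Claude Desktop config file, `claude_desktop_config.json` (on macOS it lives in `~/Library/Application Support/Claude/`, on Windows in `%APPDATA%\Claude\`):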

```json
{
  "mcpServers": {
    "wise-pizza": {
      "command": "uvx",
      "args": ["wise-pizza-mcp"]
    }
  }
}
```
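
If the config file already lists other MCP servers, the entry has to be merged into the existing `mcpServers` object rather than pasted over it. A minimal Python sketch of that merge follows; `add_wise_pizza_server` is a hypothetical helper name, and the `existing` config is made up for illustration.

```python
import json


def add_wise_pizza_server(config):
    """Return a copy of a Claude Desktop config dict with the wise-pizza entry added."""
    merged = dict(config)
    # Copy the server map so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["wise-pizza"] = {"command": "uvx", "args": ["wise-pizza-mcp"]}
    merged["mcpServers"] = servers
    return merged


# Made-up existing config with one unrelated server already registered.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_wise_pizza_server(existing)
print(json.dumps(updated, indent=2))
```

To apply it to a real file, read the config with `json.load`, pass the dict through the helper, and write the result back with `json.dump`.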

If you're working in a clone of this repo with Claude Code, the server is auto-configured via `.mcp.json` — no manual setup needed.

## Tools

### `load_data`

Load a CSV file into the in-memory dataset store.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | yes | Path to a CSV file on disk |
| `dataset_name` | string | yes | Unique name to refer to this dataset |
| `dims` | list[string] | yes | Column names to use as dimensions (categorical columns to segment by) |
| `total_name` | string | yes | Column name containing the totals (the value to analyze) |
| `size_name` | string | no | Column name containing segment sizes/weights |

**Example:**
```
Load the file /data/sales.csv as "sales_q1" with dimensions ["region", "product", "channel"],
total_name "revenue", and size_name "transactions"
```
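
Before asking the model to load a file, it can help to confirm that the CSV header actually contains the columns you plan to pass as `dims`, `total_name`, and `size_name`. The sketch below is a standalone pre-check using only the standard library; `missing_columns` is a hypothetical helper, the sample rows are invented, and nothing here is claimed to mirror the validation the server itself performs.

```python
import csv
import tempfile
from pathlib import Path


def missing_columns(csv_path, dims, total_name, size_name=None):
    """Return the requested column names that are absent from the CSV header."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])
    wanted = list(dims) + [total_name] + ([size_name] if size_name else [])
    return [name for name in wanted if name not in header]


# Invented sample data shaped like the example above.
rows = [
    ["region", "product", "channel", "revenue", "transactions"],
    ["US", "A", "web", "1500.0", "200"],
    ["EU", "B", "retail", "900.0", "150"],
]
sample_path = Path(tempfile.mkdtemp()) / "sales.csv"
with open(sample_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# All requested columns exist, so nothing is reported as missing.
print(missing_columns(sample_path, ["region", "product", "channel"], "revenue", "transactions"))
# "country" is not in the header, so it is reported.
print(missing_columns(sample_path, ["region", "country"], "revenue"))
```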

### `list_loaded_datasets`

Returns a list of all dataset names currently in the store.

### `remove_loaded_dataset`

Remove a dataset from the store by name.

### `explain_levels`

Find segments whose average is most different from the global one. Great for answering: "Which segments are unusual in this dataset?"

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dataset_name` | string | required | Name of a loaded dataset |
| `max_segments` | int | auto | Maximum segments to find |
| `min_depth` | int | 1 | Minimum dimensions per segment |
| `max_depth` | int | 2 | Maximum dimensions per segment |
| `constrain_signs` | bool | true | Constrain each segment's weight to have the same sign as its naive segment average |
| `cluster_values` | bool | false | Consider clusters of similar dimension values |
| `image_size` | list[int] | none | [width, height] to generate a PNG chart |

**Example:**
```
Run explain_levels on "sales_q1" with max_segments=5 and image_size=[1200, 600]
```
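
Under the hood, a natural-language request like the one above is turned by the client into an MCP `tools/call` message. The sketch below builds that JSON-RPC payload by hand to show how the table's parameters map onto `arguments`; `tool_call_request` is a hypothetical helper, and in practice an MCP client library constructs and sends this message for you.

```python
import json


def tool_call_request(name, arguments, request_id=1):
    """Build the JSON-RPC 2.0 body of an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request(
    "explain_levels",
    {"dataset_name": "sales_q1", "max_segments": 5, "image_size": [1200, 600]},
)
print(json.dumps(request, indent=2))
```

Parameters left out of `arguments` fall back to the defaults listed in the table.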

### `explain_changes_in_totals`

Find segments that explain differences between totals of two datasets.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dataset_name_1` | string | required | Baseline dataset |
| `dataset_name_2` | string | required | Comparison dataset |
| `max_segments` | int | auto | Maximum segments to find |
| `min_depth` | int | 1 | Minimum dimensions per segment |
| `max_depth` | int | 2 | Maximum dimensions per segment |
| `how` | string | "totals" | "totals", "split_fits", "extra_dim", or "force_dim" |
| `constrain_signs` | bool | true | Constrain each segment's weight to have the same sign as its naive segment average |
| `cluster_values` | bool | false | Consider clusters of similar dimension values |
| `image_size` | list[int] | none | [width, height] for PNG chart |

The `how` parameter controls decomposition:
- `"totals"` — Combined analysis of segment total changes
- `"split_fits"` — Separately decompose size changes vs average changes (returns `SplitAnalysisResult`)
- `"extra_dim"` — Treat size/average contribution as an additional dimension
- `"force_dim"` — Like extra_dim, but each segment must specify a Change_from constraint
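
To see why separating size changes from average changes can be informative, recall that a segment's total is its size times its average. The sketch below uses one common textbook split of the change in a total into a size part and an average part. It is an illustration only and is not claimed to be the decomposition wise-pizza performs internally; the function name and the numbers are hypothetical.

```python
def split_change(size_1, avg_1, size_2, avg_2):
    """Split the change in total (size * average) into a size part and an average part."""
    # Part of the change explained by the size moving, valued at the old average.
    size_effect = (size_2 - size_1) * avg_1
    # Part explained by the average moving, applied to the new size.
    avg_effect = size_2 * (avg_2 - avg_1)
    return size_effect, avg_effect


# Hypothetical segment: 200 orders at 7.5 each, then 240 orders at 8.0 each.
size_effect, avg_effect = split_change(size_1=200, avg_1=7.5, size_2=240, avg_2=8.0)
total_change = 240 * 8.0 - 200 * 7.5

print(size_effect, avg_effect, total_change)  # 300.0 120.0 420.0
```

The two parts always add up to the full change in the total, which is what makes it meaningful to analyze them separately.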

### `explain_changes_in_average`

Takes the same parameters as `explain_changes_in_totals`, but explains the change in the average value (total / size) between the two datasets rather than the change in the total.

### `explain_timeseries`

Split time series panel data into segments with different temporal patterns.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dataset_name` | string | required | Name of a loaded dataset |
| `time_name` | string | required | Column containing time values |
| `num_segments` | int | auto | Number of segments to find |
| `max_depth` | int | 2 | Maximum dimensions per segment |
| `fit_sizes` | bool | auto | Also fit segment sizes over time |
| `num_breaks` | int | 3 | Breakpoints in stylized time series |
| `n_jobs` | int | 10 | Parallel jobs for tree building |
| `ignore_averages` | bool | true | Ignore overall level when comparing |
| `image_size` | list[int] | none | [width, height] for PNG chart |

## Response format

### AnalysisResult

Most tools return an `AnalysisResult` JSON object:

```json
{
  "task": "levels",
  "segments": [
    {
      "segment": {"region": "US", "product": "A"},
      "total": 1500.0,
      "seg_size": 200.0,
      "naive_avg": 7.5,
      "impact": 120.0,
      "avg_impact": 0.6
    }
  ],
  "relevant_clusters": {},
  "global_average": 5.2,
  "markdown_summary": "| ... |",
  "image_path": "/tmp/tmpXXXXXX.png"
}
```

- **segment**: Dimension-value pairs defining this segment
- **total**: Total value within the segment
- **seg_size**: Size/weight of the segment
- **naive_avg**: Simple average in the segment (total / seg_size)
- **impact**: Segment's impact on the overall total (lasso solver only)
- **avg_impact**: Impact per unit weight (lasso solver only)
- **relevant_clusters**: Cluster definitions, populated when `cluster_values` is `true`
- **image_path**: Path to generated PNG file (when `image_size` is set)
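
As a rough sketch of how a client might consume this response, the snippet below parses the sample payload, checks the documented relation `naive_avg = total / seg_size`, and prints each segment's average next to the global one. `summarize` is a hypothetical helper, and the payload is the sample above trimmed to the fields it uses.

```python
import json

raw = """
{
  "task": "levels",
  "segments": [
    {
      "segment": {"region": "US", "product": "A"},
      "total": 1500.0,
      "seg_size": 200.0,
      "naive_avg": 7.5,
      "impact": 120.0,
      "avg_impact": 0.6
    }
  ],
  "global_average": 5.2
}
"""


def summarize(result):
    """Return one human-readable line per segment of an AnalysisResult dict."""
    lines = []
    for seg in result["segments"]:
        label = ", ".join(f"{dim}={value}" for dim, value in seg["segment"].items())
        diff = seg["naive_avg"] - result["global_average"]
        lines.append(f"{label}: avg {seg['naive_avg']:.2f} ({diff:+.2f} vs global)")
    return lines


result = json.loads(raw)
first = result["segments"][0]
# naive_avg is documented as total / seg_size; verify it on the sample.
assert abs(first["total"] / first["seg_size"] - first["naive_avg"]) < 1e-9

summaries = summarize(result)
print("\n".join(summaries))
```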

### SplitAnalysisResult

With `how="split_fits"`, the response is a `SplitAnalysisResult` that holds two `AnalysisResult` objects, one for size changes and one for average changes:

```json
{
  "task": "changes (split)",
  "size_analysis": { "...AnalysisResult..." },
  "average_analysis": { "...AnalysisResult..." },
  "image_path": "/tmp/tmpXXXXXX.png"
}
```
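
Client code that handles both response shapes can branch on the presence of the `size_analysis` and `average_analysis` keys. The following is a minimal sketch under that assumption; `iter_analyses` is a hypothetical helper and the sample dicts are invented and abbreviated.

```python
def iter_analyses(response):
    """Yield (label, AnalysisResult dict) pairs from either response shape."""
    if "size_analysis" in response and "average_analysis" in response:
        # SplitAnalysisResult: two nested analyses.
        yield "size", response["size_analysis"]
        yield "average", response["average_analysis"]
    else:
        # Plain AnalysisResult: the response is the analysis itself.
        yield response.get("task", "analysis"), response


# Invented, abbreviated responses for illustration.
plain = {"task": "levels", "segments": []}
split = {
    "task": "changes (split)",
    "size_analysis": {"task": "changes", "segments": []},
    "average_analysis": {"task": "changes", "segments": []},
}

plain_labels = [label for label, _ in iter_analyses(plain)]
split_labels = [label for label, _ in iter_analyses(split)]
print(plain_labels, split_labels)
```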

## Use case walkthroughs

### 1. Finding unusual segments in current data

```
1. Load my sales data: load_data("/data/sales_jan.csv", "jan_sales",
   dims=["region", "product", "channel"], total_name="revenue", size_name="orders")

2. Find unusual segments: explain_levels("jan_sales", max_segments=5, image_size=[1200, 600])
```

This identifies segments where the average revenue per order differs most from the global average — e.g., "APAC + Premium products have 3x higher revenue per order than average".
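
The quantity being compared here is total divided by size, computed per segment and overall. The sketch below works that out by brute force on invented rows. It only illustrates the arithmetic behind "average revenue per order" and is not the search wise-pizza runs, which selects a small set of segments rather than ranking every combination; `segment_averages` is a hypothetical helper.

```python
from collections import defaultdict

# Invented rows shaped like the walkthrough's CSV (dims, total, size).
rows = [
    {"region": "APAC", "product": "Premium", "revenue": 900.0, "orders": 30},
    {"region": "APAC", "product": "Basic", "revenue": 200.0, "orders": 20},
    {"region": "EU", "product": "Premium", "revenue": 300.0, "orders": 30},
    {"region": "EU", "product": "Basic", "revenue": 200.0, "orders": 20},
]


def segment_averages(rows, dims, total_name, size_name):
    """Return {segment tuple: total / size} for each combination of dim values."""
    totals = defaultdict(float)
    sizes = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[total_name]
        sizes[key] += row[size_name]
    return {key: totals[key] / sizes[key] for key in totals}


global_avg = sum(r["revenue"] for r in rows) / sum(r["orders"] for r in rows)
averages = segment_averages(rows, ["region", "product"], "revenue", "orders")
# Rank segments by how far their average sits from the global average.
ranked = sorted(averages.items(), key=lambda kv: abs(kv[1] - global_avg), reverse=True)

for segment, avg in ranked:
    print(segment, f"avg={avg:.1f}", f"global={global_avg:.1f}")
```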

### 2. Explaining changes between periods

```
1. Load both periods:
   load_data("/data/sales_q1.csv", "q1", dims=["region", "product"], total_name="revenue", size_name="orders")
   load_data("/data/sales_q2.csv", "q2", dims=["region", "product"], total_name="revenue", size_name="orders")

2. What drove the total change?
   explain_changes_in_totals("q1", "q2", image_size=[1000, 800])

3. What about average changes specifically?
   explain_changes_in_average("q1", "q2")

4. Break down size vs average contributions:
   explain_changes_in_totals("q1", "q2", how="split_fits")
```

### 3. Time series segmentation

```
1. Load panel data:
   load_data("/data/monthly_metrics.csv", "monthly",
   dims=["region", "product"], total_name="revenue", size_name="customers")

2. Find segments with different trends:
   explain_timeseries("monthly", time_name="month", image_size=[1200, 800])
```

This finds groups like "EU + Product B is declining while US + Product A is growing".

## Development

```bash
git clone https://github.com/MotleyAI/wise-pizza-mcp.git
cd wise-pizza-mcp
pip install -e ".[dev]"
pytest tests/ -v
```

## License

MIT
