Metadata-Version: 2.4
Name: token-optimizer-mcp
Version: 0.1.0
Summary: High-performance MCP server for minimizing LLM token usage and API costs via structural code analysis and precision chunking
Project-URL: Homepage, https://github.com/Tjpatel16/token-optimizer-mcp
Project-URL: Repository, https://github.com/Tjpatel16/token-optimizer-mcp
Project-URL: Issues, https://github.com/Tjpatel16/token-optimizer-mcp/issues
Author: TJ
License: MIT
License-File: LICENSE
Keywords: ai,code-editor,context-optimization,cost-optimization,llm,mcp,token-optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Requires-Dist: aiofiles>=23.0
Requires-Dist: mcp[cli]>=1.0.0
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: faiss-cpu>=1.7; extra == 'embeddings'
Requires-Dist: numpy>=1.24; extra == 'embeddings'
Description-Content-Type: text/markdown

<div align="center">
  <h1>🚀 Token Optimizer MCP</h1>
  <p><strong>A high-performance Model Context Protocol (MCP) server that slashes LLM token consumption and API overhead. Using structural code summarization, intelligent file chunking, and semantic vector search, it ensures your AI agent receives high-fidelity context with minimal payload—maximizing both speed and cost-efficiency.</strong></p>

[![Python >= 3.10](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![MCP SDK](https://img.shields.io/badge/MCP-FastMCP-orange.svg)](https://modelcontextprotocol.io/)

</div>

---

## 📖 Introduction

Working with Agentic LLMs and large codebases usually means burning through massive context windows, leading to exorbitant API costs and slow response times.

**Token Optimizer MCP** solves this by providing a suite of highly optimized tools that intercept requests and ensure the LLM _only receives what it needs_. It combines structural chunking, code skeleton extraction, unified Git diffing, and strict character limits to compress payloads while preserving the semantic context the LLM requires to function.

Additionally, the built-in **Token Tracker** keeps a persistent running estimate of how many tokens (and dollars) you save on every request.

---

## 🛠️ Installation

Requirements: **Python 3.10+**. You can install the package via `pip`. Choose the installation method based on whether you need semantic search capabilities.

### 1. Base Installation (Recommended)

This installs the core MCP server and all standard token-optimization tools. It is lightweight and sufficient for most users.

```bash
pip install token-optimizer-mcp
```

### 2. Installation with Embeddings

Use this command if you want to enable the **Semantic Search (Vector Store)** feature. This installation includes additional dependencies like `faiss-cpu` and `numpy`.

> [!TIP]
> **Already installed the base package?** No problem. Running this command will simply "upgrade" your installation by adding the missing embedding-related dependencies.

```bash
pip install "token-optimizer-mcp[embeddings]"
```

---

## ⚡ Features

- **Smart File Chunking**: Never send a 10,000-line file again. Read exact line ranges with hard caps.
- **Structural Summarization**: Extract classes, imports, and function signatures to give the LLM a topological map of a file without the payload of raw code.
- **Trimmed Search**: Recursive codebase searching that returns only file paths and micro-snippets.
- **Delta Diffing**: Returns only the modified lines (`git diff`), ensuring unmodified code is never re-processed.
- **Memory Summarization**: A persistent JSON store for the LLM to stash compressed conversational context, preventing the need to replay huge histories.
- **Semantic Code Search (Embeddings)**:
  - **On-the-fly Indexing**: Automatically chunks and indexes your codebase into a vector space.
  - **Similarity Retrieval**: Allows the LLM to find relevant code sections by meaning rather than just keywords, using an optimized FAISS vector store.
  - **Efficient Context**: Bridges the gap between "knowing the file exists" and "finding the exact relevant snippet" without reading the whole repo.
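
As a rough illustration of structural summarization, the sketch below uses Python's standard `ast` module to reduce a source file to its imports, class names, and function signatures. It is not the server's actual implementation: `summarize_source` and `SAMPLE` are hypothetical names, and the sketch handles Python source only.

```python
import ast


def summarize_source(source: str) -> list[str]:
    """Return a compact outline: imports, classes, and function signatures."""
    outline: list[str] = []

    def visit(nodes: list[ast.stmt], indent: str = "") -> None:
        for node in nodes:
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                outline.append(indent + ast.unparse(node))
            elif isinstance(node, ast.ClassDef):
                outline.append(f"{indent}class {node.name}:")
                # Descend into the class body to list its methods.
                visit(node.body, indent + "    ")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                # Keep the signature only; the function body is dropped.
                outline.append(f"{indent}{keyword} {node.name}({ast.unparse(node.args)})")

    visit(ast.parse(source).body)
    return outline


# Hypothetical input file used only for this demonstration.
SAMPLE = '''
import os
from pathlib import Path

class Cache:
    def get(self, key: str, default=None):
        return default

def main(argv):
    return 0
'''

print("\n".join(summarize_source(SAMPLE)))
```

The outline keeps every name the LLM might need to navigate the file while omitting function bodies, which is where most of the characters usually are.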

---

## 🚀 Usage

Once installed, the CLI tool acts as both the MCP Server entry point and a management interface.

### 1. MCP Client Configuration

To hook the optimizer up to your agent (e.g., Claude Desktop, Cursor, Codex, or Antigravity), define the environment variables and point the command to `token-optimizer-mcp`:

```json
{
  "mcpServers": {
    "token-optimizer": {
      "command": "token-optimizer-mcp",
      "args": ["run"],
      "env": {
        "ENABLE_TOKEN_TRACKING": "true",
        "TOKEN_COST_PER_1K": "0.003"
      }
    }
  }
}
```

### 2. Management CLI Commands

You can run these commands manually in your terminal to manage the server's cache and view your savings.

| Command                            | Description                                      |
| :--------------------------------- | :----------------------------------------------- |
| `token-optimizer-mcp run`          | Starts the MCP server via `stdio` (Standard).    |
| `token-optimizer-mcp run --sse`    | Starts the MCP server via SSE (HTTP).            |
| `token-optimizer-mcp stats`        | Prints your lifetime token and monetary savings. |
| `token-optimizer-mcp reset-stats`  | Wipes the historical token tracker data.         |
| `token-optimizer-mcp clear-memory` | Purges the persisted LLM memory stash.           |

> [!NOTE]
> The `run` commands are typically handled automatically by your MCP client (like Claude Desktop) once configured. You generally don't need to run them manually unless you are testing or debugging.

#### Example: Checking your savings

```bash
$ token-optimizer-mcp stats

📊 Lifetime Token Savings
======================================
Total Requests:      24
Tokens Used:         8,421
Tokens Saved:        114,290
Estimated Savings:   $0.3428
Tracking Since:      2024-05-12T10:00:00Z
--------------------------------------
```

---

## ⚙️ Configuration File (Environment Variables)

The optimizer is configured entirely via environment variables. Edit these in your `mcpServers` config block to tune how aggressively the optimizer trims responses.

| Variable                | Default | Description                                                                                                                                                                                |
| :---------------------- | :------ | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PROJECT_ROOT`          | `.`     | Path to the repository you are analyzing, preferably absolute. The default `.` resolves to the server's working directory.                                                                 |
| `MAX_FILE_LINES`        | `300`   | Hard cap on the number of lines returned by file read tools.                                                                                                                               |
| `MAX_OUTPUT_CHARS`      | `8000`  | Global cutoff limit applied to _all_ tool responses.                                                                                                                                       |
| `MAX_PREVIEW_CHARS`     | `200`   | Snippet length returned alongside search hits.                                                                                                                                             |
| `MAX_SUMMARY_CHARS`     | `2000`  | Truncation limit for structural file summaries.                                                                                                                                            |
| `ENABLE_TOKEN_TRACKING` | `true`  | Toggle the persistent `~/.cache` token tracker on/off.                                                                                                                                     |
| `TOKEN_COST_PER_1K`     | `0.003` | **Critical for Cost Estimation.** This value represents the cost (in USD) of **1,000 input tokens** for the model you are using. The tracker uses this to calculate your monetary savings. |

### 💡 Understanding `TOKEN_COST_PER_1K`

Token savings are estimated with a common heuristic: **1 token ≈ 4 characters**. For a realistic dollar figure, set `TOKEN_COST_PER_1K` to your model's input-token price:

- **Claude Sonnet 4.6**: `0.003` (Default)
- **Claude Opus 4.6**: `0.015`
- **Google Gemini 3 Pro**: `0.0025`
- **GPT-5-codex**: `0.0020`

> [!IMPORTANT]
> **Check your provider's current pricing.** The values above are illustrative and model pricing changes frequently. Always verify the latest "Input Token" price from your AI provider's official documentation.

> [!NOTE]
> The metric used is always **Input/Prompt tokens**, as that is where the optimization (and savings) occur.
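
To make the arithmetic concrete, here is a minimal sketch of the estimate described above, assuming the dollar figure is simply `tokens_saved / 1000 * TOKEN_COST_PER_1K`. The function names are hypothetical and not part of the package's API.

```python
import os


def estimate_tokens(text: str) -> int:
    """Approximate token count with the 1 token ≈ 4 characters heuristic."""
    return len(text) // 4


def estimate_savings_usd(tokens_saved: int, cost_per_1k: float) -> float:
    """Convert saved input tokens into an estimated dollar amount."""
    return tokens_saved / 1000 * cost_per_1k


# Read the price from the environment, falling back to the documented default.
cost_per_1k = float(os.environ.get("TOKEN_COST_PER_1K", "0.003"))

saved = estimate_tokens("x" * 400_000)  # 400,000 characters ≈ 100,000 tokens
print(f"{saved:,} tokens saved ≈ ${estimate_savings_usd(saved, cost_per_1k):.4f}")
```

Because the token count itself is a character-based approximation, the resulting dollar amount is an estimate rather than a billing-grade number.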

### Optional Embeddings Variables

_(Only relevant if installed via `[embeddings]`)_

- `EMBEDDING_CHUNK_SIZE` (Default: `50`): Lines per chunk for vectorization.
- `FAISS_INDEX_PATH`: Absolute path to store the FAISS `.faiss` database.
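
To show what "lines per chunk" means in practice, the sketch below splits text into fixed-size, non-overlapping line ranges. This is an assumption about the chunking strategy, not a description of the package's internals; `chunk_lines` is a hypothetical helper, and the embedding and FAISS steps are deliberately left out.

```python
import os


def chunk_lines(text: str, chunk_size: int) -> list[tuple[int, int, str]]:
    """Split text into (start_line, end_line, chunk_text) tuples, 1-based and inclusive."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), chunk_size):
        block = lines[start:start + chunk_size]
        chunks.append((start + 1, start + len(block), "\n".join(block)))
    return chunks


# Use the documented default of 50 lines when the variable is not set.
chunk_size = int(os.environ.get("EMBEDDING_CHUNK_SIZE", "50"))

sample = "\n".join(f"line {i}" for i in range(1, 121))  # a 120-line stand-in file
for start, end, _ in chunk_lines(sample, chunk_size):
    print(f"chunk covers lines {start}-{end}")
```

Smaller chunks tend to give more precise search hits at the cost of more vectors to store; larger chunks do the opposite.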

---

## 🧠 Under the Hood

Because Claude, ChatGPT, and most agent frameworks behave as if they had unlimited context, they frequently ask for the entire `app.js` file just to check an import. **Token Optimizer MCP** intercepts this behavior:

1. **Tool Invocation**: The LLM calls `read_file_chunk(path="app.js", start_line=1, end_line=5000)`.
2. **Cap Enforcement**: The server intercepts the request and limits it to `MAX_FILE_LINES` (e.g. 300).
3. **Savings Calculation**: `token_tracker.py` calculates the token delta between sending the 5000 lines vs the returned 300 lines using an industry-standard heuristic (`len(str) / 4`).
4. **Piggybacked Stats**: The server returns the payload _alongside_ the `_token_stats` object. The LLM sees exactly how many tokens it just saved by being constrained.
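
The four steps above can be sketched as follows. This is an illustrative approximation rather than the server's code: `read_capped` is a hypothetical stand-in for the real tool, and the field names inside `_token_stats` are assumptions.

```python
MAX_FILE_LINES = 300  # documented default cap


def estimate_tokens(text: str) -> int:
    """Approximate token count using the len(str) / 4 heuristic."""
    return len(text) // 4


def read_capped(lines: list[str], start_line: int, end_line: int,
                max_lines: int = MAX_FILE_LINES) -> dict:
    """Return at most max_lines of the requested range, plus savings stats."""
    requested = lines[start_line - 1:end_line]  # 1-based, inclusive range
    returned = requested[:max_lines]            # cap enforcement

    requested_text = "\n".join(requested)
    returned_text = "\n".join(returned)

    tokens_used = estimate_tokens(returned_text)
    return {
        "content": returned_text,
        # Stats travel alongside the payload so the caller can see the saving.
        "_token_stats": {
            "tokens_used": tokens_used,
            "tokens_saved": estimate_tokens(requested_text) - tokens_used,
        },
    }


# A 5000-line stand-in file, requested in full as in step 1.
file_lines = ["x" * 39] * 5000
result = read_capped(file_lines, start_line=1, end_line=5000)
print(result["_token_stats"])
```

Here the saving is the difference between the estimated cost of the full requested range and the estimated cost of the capped response.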

---

## 🐛 Bug Reports & Feature Requests

If you encounter a bug or unexpected behavior, please [open an issue](https://github.com/Tjpatel16/token-optimizer-mcp/issues) to report it.

Likewise, if you need a new feature or have an idea for an improvement, you can request it by opening a feature request on GitHub.

---

## 📝 License

Distributed under the MIT License. See `LICENSE` for more information.
