Metadata-Version: 2.4
Name: agent-eval-mcp
Version: 0.1.0
Summary: Deterministic evaluation tools for AI coding agents, exposed as an MCP server.
License: MIT
Keywords: ai-agents,code-review,evaluation,mcp
Requires-Python: >=3.10
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: pydantic>=2.0
Description-Content-Type: text/markdown

# 🛡️ agent-eval-mcp

**Deterministic Evaluation and Guardrails for AI Coding Agents.**

[![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-green.svg)](#)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](#)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](#)

Building autonomous coding agents is easy. Evaluating whether their changes are actually any good is incredibly hard.

`agent-eval-mcp` is a stateless, deterministic Model Context Protocol (MCP) server that flags lazy, unverified, or hallucinated code written by AI agents. It provides language-agnostic rulesets and hybrid scoring to grade AI-generated revisions *before* they are merged.

## ⚠️ The Problem

When you ask an LLM to evaluate its own code, it is prone to sycophancy: it will confidently tell you its fix is correct, even when it has:
* Generated placeholder code such as `new HashMap<>()` or `pass` instead of a real implementation.
* Left `// TODO: implement this` in the production patch.
* Hallucinated the surrounding context in a `SEARCH/REPLACE` block, so the patch no longer applies.
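
The first two failure modes can be caught without asking an LLM anything. The sketch below is an illustration only, not this package's implementation: it assumes the revision is Python source, uses the standard-library `ast` module to flag functions whose body is just `pass`, `...`, or a docstring, and uses a regular expression to flag `TODO`/`FIXME` comments. The function names and the regex are hypothetical.

```python
import ast
import re

# Illustrative pattern: a `#` or `//` comment containing TODO or FIXME (any case).
TODO_RE = re.compile(r"(#|//)\s*(TODO|FIXME)\b", re.IGNORECASE)


def _is_const_expr(stmt: ast.stmt, kind: type) -> bool:
    """True if `stmt` is a bare expression statement holding a constant of type `kind`."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, kind)
    )


def find_todo_lines(code: str) -> list[int]:
    """Return the 1-based line numbers that contain a TODO/FIXME comment."""
    return [i for i, line in enumerate(code.splitlines(), start=1) if TODO_RE.search(line)]


def find_stub_functions(code: str) -> list[str]:
    """Return names of Python functions whose body is only `pass`, `...`, or a docstring."""
    stubs = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        # Ignore a leading docstring; the remaining statements decide whether this is a stub.
        if body and _is_const_expr(body[0], str):
            body = body[1:]
        if all(isinstance(s, ast.Pass) or _is_const_expr(s, type(Ellipsis)) for s in body):
            stubs.append(node.name)
    return stubs


patch = '''
def load_config(path):
    # TODO: implement this
    pass

def add(a, b):
    return a + b
'''
print(find_todo_lines(patch))      # [3]
print(find_stub_functions(patch))  # ['load_config']
```

Comments never reach the AST, so the TODO check has to work on raw text, while the stub check relies on the parsed tree; real rules for Java or TypeScript would need a parser for those languages.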

## 💡 The Solution

This package exposes objective evaluation tools to your agentic workflows via the **Model Context Protocol (MCP)**. It evaluates AI-generated `<<<< SEARCH ==== >>>> REPLACE` blocks using fuzzy matching and language-specific Abstract Syntax Tree (AST) rules (Java, Python, TypeScript) to catch hallucinations deterministically.
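
How the package's fuzzy matcher works internally is not described here. As a rough illustration, the sketch below uses the standard library's `difflib.SequenceMatcher` to check whether a SEARCH block actually exists in the target file: it slides a window with the same number of lines over the file and keeps the best similarity ratio. The per-line whitespace normalization, the 0.9 threshold, and the function names are assumptions chosen for this example.

```python
from difflib import SequenceMatcher


def _normalize(lines: list[str]) -> str:
    # Assumption: leading/trailing whitespace on each line is not significant.
    return "\n".join(line.strip() for line in lines)


def best_match(file_text: str, search_text: str) -> tuple[int, float]:
    """Slide a window the height of the SEARCH block over the file and return
    (0-based start line, similarity ratio) for the closest window."""
    file_lines = file_text.splitlines()
    search_lines = search_text.splitlines()
    n = len(search_lines)
    if n == 0:
        return (-1, 0.0)  # an empty SEARCH block can never be grounded
    target = _normalize(search_lines)
    best = (-1, 0.0)
    for start in range(len(file_lines) - n + 1):
        window = _normalize(file_lines[start:start + n])
        ratio = SequenceMatcher(None, window, target).ratio()
        if ratio > best[1]:
            best = (start, ratio)
    return best


def search_block_is_grounded(file_text: str, search_text: str, threshold: float = 0.9) -> bool:
    """Accept a SEARCH block only if its closest window in the real file scores >= threshold."""
    return best_match(file_text, search_text)[1] >= threshold


original = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
print(best_match(original, "def sub(a, b):\n        return a - b"))  # (3, 1.0)
print(search_block_is_grounded(original, "class Cache:\n    pass"))      # False
```

A re-indented but otherwise correct SEARCH block still scores 1.0, while a block quoting code that is not in the file falls below the threshold and can be rejected before the patch is applied.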

This decouples the heavy lifting of code validation from your LLM orchestration layer.
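
Because the server speaks MCP, the orchestration layer only needs to send a standard JSON-RPC 2.0 `tools/call` request and read back the result. The sketch below builds such a request with the standard library. The tool name `evaluate_revision` and its argument names are hypothetical placeholders, not this package's documented tool schema; an MCP client can discover the real tools with a `tools/list` request.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call(
    1,
    "evaluate_revision",
    {"file_path": "src/app.py", "search": "return a - b", "replace": "return a + b"},
)
print(message)
```

In practice an MCP client library handles transport framing and the `initialize` handshake; this only shows the shape of the message the orchestrator sends.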

## 🚀 Quickstart

### 1. Install the Package
Install globally via pip so your MCP clients can execute it:

```bash
pip install agent-eval-mcp