Metadata-Version: 2.4
Name: ax-prover
Version: 0.1.0
Summary: LangGraph agent for Lean4 theorem formalization and proving
Author-email: Borja Requena <borja@axiomatic-ai.com>, Austin Letson <austin@axiomatic-ai.com>, Krystian Nowakowski <krystian@axiomatic-ai.com>, Izan Beltran Ferreiro <izan@axiomatic-ai.com>, Leopoldo Sarra <leopoldo@axiomatic-ai.com>
License-Expression: AGPL-3.0-only
Project-URL: Homepage, https://github.com/Axiomatic-AI/ax-prover-base
Project-URL: Repository, https://github.com/Axiomatic-AI/ax-prover-base
Project-URL: Issues, https://github.com/Axiomatic-AI/ax-prover-base/issues
Keywords: lean4,theorem-proving,formalization,langgraph,mathlib
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: google-auth>=2.23.0
Requires-Dist: langgraph>=1.0.3
Requires-Dist: langchain>=1.0.8
Requires-Dist: langchain-anthropic>=1.3.0
Requires-Dist: langchain-google-genai>=4.2.0
Requires-Dist: langchain-openai>=1.1.7
Requires-Dist: lean-interact>=0.1.0
Requires-Dist: jsonargparse[omegaconf]>=4.35.0
Requires-Dist: omegaconf>=2.3.0
Requires-Dist: platformdirs>=4.0
Requires-Dist: pydantic>=2.12.4
Requires-Dist: pylatexenc>=2.10
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: requests>=2.31.0
Requires-Dist: tavily-python>=0.7.13
Requires-Dist: wikipedia>=1.4.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=4.5.0; extra == "dev"
Requires-Dist: ruff>=0.14.6; extra == "dev"
Requires-Dist: pytest>=9.0.1; extra == "dev"
Requires-Dist: pytest-cov>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=1.3.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.4.0; extra == "dev"
Requires-Dist: setuptools-scm>=8.0; extra == "dev"
Requires-Dist: twine>=6.0.0; extra == "dev"
Dynamic: license-file

# ax-prover

**A minimal agent for automated theorem proving in Lean 4**

[![CI](https://github.com/Axiomatic-AI/ax-prover-base/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/Axiomatic-AI/ax-prover-base/actions/workflows/unit_tests.yml)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue)](https://www.python.org/downloads/)
[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](LICENSE)
[![PyPI version](https://img.shields.io/pypi/v/ax-prover)](https://pypi.org/project/ax-prover/)

A simple, modular agent that proves Lean 4 theorems through iterative refinement.
It uses off-the-shelf LLMs (no fine-tuning) with a feedback loop, a memory system, and library search tools to achieve competitive results against highly-engineered systems that rely on specialized training and orders of magnitude more compute.

## Key Results

| Benchmark | AxProverBase | Best Comparable |
|-----------|----------|-----------------|
| **PutnamBench** | **54.7%** (pass@1) | 13.0% (Goedel V2, pass@184) |
| **FATE-M** | **98.0%** | 62.7% (DeepSeek V2, pass@64) |
| **FATE-H** | **66.0%** | 3.0% (DeepSeek V2) |
| **FATE-X** | **24.0%** | 0.0% (all others) |
| **LeanCat** | **59.0%** | 14.0% (Gemini 3 Pro) |

All results with Claude Opus 4.5, 50 iterations, pass@1. See our [paper](#citation) for full details and comparisons.

## How It Works

<p align="center">
  <img src="assets/figure1.png" alt="ax-prover architecture" width="500">
</p>

The agent runs an iterative loop:

1. **Proposer** — An LLM writes Lean 4 proof code, optionally using tools (LeanSearch, web search) to find relevant Mathlib lemmas
2. **Compiler** — Builds the code with `lake`; extracts goal states at `sorry` locations to provide structured feedback
3. **Reviewer** — Verifies statement preservation and proof validity (no `sorry`, no cheating tactics)
4. **Memory** — Summarizes lessons from failed attempts into a concise "lab notebook" to prevent repeating mistakes

The loop continues until the proof is complete or the iteration budget is exhausted (default: 50).

## Quick Start

```bash
pip install ax-prover
```

```bash
# Configure your API keys
ax-prover configure

# Navigate to a Lean 4 project
cd /path/to/lean4-project

# Prove a theorem
ax-prover prove MyModule:my_theorem
```

## Installation

```bash
pip install ax-prover
# or
uv add ax-prover

# For development (includes ruff, pytest, pre-commit)
pip install -e ".[dev]"
```

<details>
<summary><strong>Prerequisites</strong></summary>

- **Python 3.11+**
- **Lean 4** with `lake` available on PATH ([installation guide](https://leanprover-community.github.io/get_started.html))
- **LLM API key** — at least one of:
  - `ANTHROPIC_API_KEY` (recommended — Claude Opus 4.5 gives best results)
  - `OPENAI_API_KEY`
  - `GOOGLE_API_KEY`
- **Tavily API key** (optional, for web search) — `TAVILY_API_KEY`

Set up your API keys interactively:

```bash
ax-prover configure
```

Or export them directly in your shell:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

</details>

## Usage

### Proving theorems

```bash
# Prove a specific theorem by module path
ax-prover prove MyModule.Path:theorem_name

# Prove a specific theorem by file path
ax-prover prove MyProject/Algebra/Ring.lean:theorem_name

# Prove the theorem at a specific line
ax-prover prove MyProject/Algebra/Ring.lean#L42

# Prove all unproven theorems in a file
ax-prover prove MyProject/Algebra/Ring.lean

# Skip lake build (if repo is already built)
ax-prover prove MyModule:theorem_name --skip-build

# Save JSON output to file (for scripting/automation)
ax-prover prove MyModule:theorem_name -o result.json
```

### Running experiments

Run batch evaluations on [LangSmith](https://smith.langchain.com) datasets:

```bash
# Run experiment on a dataset
ax-prover experiment dataset_name

# With custom concurrency
ax-prover experiment dataset_name --max-concurrency 8
```

<details>
<summary><strong>Configuration</strong></summary>

Customize behavior with YAML config files and CLI overrides:

```yaml
# my_config.yaml
prover:
  max_iterations: 75
  prover_llm:
    model: "anthropic:claude-opus-4-20250514"
    temperature: 0.5
    thinking:
      type: enabled
      budget_tokens: 32000
```

```bash
# Use a config file
ax-prover --config my_config.yaml prove MyModule:theorem

# Override values from the CLI
ax-prover prove MyModule:theorem prover.max_iterations=100

# Save your current configuration for later reuse
ax-prover --save-config my_setup prove MyModule:theorem
```

</details>

## Contributing

We welcome contributions of all kinds — bug reports, feature requests, documentation, and code.
See our [Contributing Guide](CONTRIBUTING.md) to get started.

## License

This project is licensed under the [AGPL-3.0](LICENSE).

## Citation

If you use ax-prover in your research, please cite:

```bibtex
@article{axproverbase2026,
  title={A Minimal Agent for Automated Theorem Proving},
  author={Requena Pozo, Borja and Letson, Austin and Nowakowski, Krystian and Beltran Ferreiro, Izan and Sarra, Leopoldo},
  year={2026}
}
```
