Metadata-Version: 2.4
Name: mdmin
Version: 1.0.0
Summary: Rule-based markdown compression for LLM consumption. Reduces token usage by 20-35%.
License-Expression: AGPL-3.0-only
Project-URL: Homepage, https://mdmin.dev
Project-URL: Repository, https://github.com/Dean86/mdmin
Project-URL: Documentation, https://mdmin.dev
Keywords: markdown,compression,llm,tokens,minify,ai,context-window,gpt,claude,openai,anthropic
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# mdmin

Rule-based markdown compression for LLM consumption. Reduces token usage by roughly 20–35% at the `medium` and `aggressive` levels, and by about 10% at `light`.

**Website:** [mdmin.dev](https://mdmin.dev) • **npm:** [npmjs.com/package/mdmin](https://npmjs.com/package/mdmin)

## Install

```bash
pip install mdmin
```

Zero dependencies. Python 3.9+.

## Usage

### Python API

```python
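from mdmin import compress, estimate_tokens

# Load any markdown document to compress
with open("README.md", encoding="utf-8") as f:
    text = f.read()

result = compress(text, level="medium")

print(result.output)           # compressed text
print(result.stats.saved)      # tokens saved
print(result.stats.pct)        # % reduction
print(estimate_tokens(text))   # estimated token count of the original text
# Example result:
# CompressResult(output=..., stats=CompressionStats(input_tokens=2273, output_tokens=1765, saved=508, pct=22.3, ...))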
```

### CLI

```bash
# Compress a file (output to stdout)
mdmin compress README.md

# Save to file
mdmin compress README.md -o README.min.md

# Compression level
mdmin compress README.md --level aggressive

# Show token stats across all levels
mdmin stats README.md

# Pipe from stdin
cat file.md | mdmin compress -
```

## Compression Levels

| Level | Savings | What it does |
|---|---|---|
| `light` | ~10% | Whitespace, comments, basic verbose patterns |
| `medium` | ~20-25% | + more verbose patterns, table compression, formatting cleanup |
| `aggressive` | ~25-35% | + article stripping, list compression, bold removal, dictionary dedup |
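
Because each level builds on the previous one, one possible workflow is to pick the mildest level whose output fits a token budget. The sketch below relies only on the documented `compress(text, level=...)` call and the `stats.output_tokens` field. The names `pick_level` and `budget` are hypothetical and not part of mdmin, and the compress function is passed in as a parameter so the helper can be exercised without the package installed.

```python
LEVELS = ("light", "medium", "aggressive")  # documented levels, mildest first


def pick_level(text, budget, compress_fn):
    """Return (level, result) for the mildest level whose output fits `budget`.

    `compress_fn` is assumed to behave like mdmin's documented
    compress(text, level=...) and to return an object exposing
    .stats.output_tokens. If no level fits, the result of the most
    aggressive level is returned.
    """
    level = None
    result = None
    for level in LEVELS:
        result = compress_fn(text, level=level)
        if result.stats.output_tokens <= budget:
            break
    return level, result
```

With the package installed, this would be called as `pick_level(text, 2000, mdmin.compress)`.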

## What It Compresses

- **Verbose phrases**: 150+ patterns, such as "In order to" → "To" and "Due to the fact that" → "Because"
- **Whitespace**: Blank lines, trailing spaces, decorative horizontal rules
- **Tables**: Markdown tables → compact CSV or key:value format
- **Formatting**: Redundant bold on headers, deep heading nesting, emphasis markers
- **Lists**: Short bullet lists → inline comma-separated (aggressive)
- **Links**: Empty titles, unused references, verbose alt text
- **Dictionary dedup**: Repeated phrases replaced with §1, §2 tokens
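
To make two of these rules concrete, here is a minimal sketch of verbose-phrase substitution and dictionary dedup. It is not mdmin's implementation: the function names, the two-entry pattern list, the n-gram detection, and the token table format are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Two of the substitutions quoted above; a real rule set would be much larger.
VERBOSE_PATTERNS = [
    (re.compile(r"\bdue to the fact that\b", re.IGNORECASE), "because"),
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
]


def _match_case(original, replacement):
    """Capitalize the replacement when the matched phrase was capitalized."""
    return replacement.capitalize() if original[0].isupper() else replacement


def shorten_phrases(text):
    """Apply each verbose-phrase rule in turn."""
    for pattern, replacement in VERBOSE_PATTERNS:
        text = pattern.sub(lambda m, r=replacement: _match_case(m.group(0), r), text)
    return text


def dedup_phrases(text, n=4, min_count=2):
    """Replace repeated word n-grams with §1, §2, ... tokens.

    Returns the rewritten text and a token -> phrase table. Both the
    n-gram detection and the table format are assumptions of this sketch.
    """
    words = text.split()
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    table = {}
    for phrase, count in grams.most_common():
        if count < min_count:
            break
        if text.count(phrase) < min_count:
            continue  # overlaps a phrase that was already replaced
        token = f"§{len(table) + 1}"
        text = text.replace(phrase, token)
        table[token] = phrase
    return text, table
```

In this sketch the caller would need to ship the token table alongside the text so the reader (or model) can expand each `§` token.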

## API Reference

```python
compress(text: str, level: str = "medium") -> CompressResult
```

- `level`: `"light"` | `"medium"` | `"aggressive"`
- Returns `CompressResult` with `.output` (str) and `.stats` (CompressionStats)

```python
estimate_tokens(text: str) -> int
```

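Returns a fast, approximate BPE token count with no external dependencies. Because it is an estimate, the number may differ from a specific model's tokenizer.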

### CompressResult

```python
result.output          # str — compressed text
result.stats           # CompressionStats
```

### CompressionStats

```python
stats.input_tokens     # int
stats.output_tokens    # int
stats.saved            # int (input - output)
stats.pct              # float (% saved)
stats.input_chars      # int
stats.output_chars     # int
stats.level            # str
stats.dictionary       # int (dedup entries created)
```
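
The derived fields follow from the token counts. The sketch below uses a hypothetical `StatsSketch` class (not mdmin's own type) to reproduce the numbers from the example result in the Python API section; rounding `pct` to one decimal place is an assumption inferred from that example.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical stand-in for the token fields of CompressionStats."""

    input_tokens: int
    output_tokens: int

    @property
    def saved(self) -> int:
        # Tokens removed by compression: input - output
        return self.input_tokens - self.output_tokens

    @property
    def pct(self) -> float:
        # Percentage saved relative to the input; guard against empty input
        if self.input_tokens == 0:
            return 0.0
        return round(100 * self.saved / self.input_tokens, 1)


stats = StatsSketch(input_tokens=2273, output_tokens=1765)
print(stats.saved, stats.pct)  # 508 22.3
```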

## License

AGPL-3.0-only
