Metadata-Version: 2.4
Name: mdmin
Version: 1.1.1
Summary: Rule-based markdown compression + context extraction for LLM consumption. Reduces token usage by 20-95%.
License-Expression: AGPL-3.0-only
Project-URL: Homepage, https://mdmin.dev
Project-URL: Repository, https://github.com/Dean86/mdmin
Project-URL: Documentation, https://mdmin.dev
Keywords: markdown,compression,llm,tokens,minify,ai,context-window,gpt,claude,openai,anthropic
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# mdmin

Rule-based Markdown compression and context extraction for LLM consumption. Compression saves roughly 13–35% of tokens; query-driven extraction cuts 70–95%.

**Website:** [mdmin.dev](https://mdmin.dev) • **npm:** [npmjs.com/package/mdmin](https://npmjs.com/package/mdmin)

## Install

```bash
pip install mdmin
```

Zero dependencies. Python 3.9+.

## Compress

Strips verbose phrases, redundant formatting, and structural waste. Typical token savings are 13–35%, depending on the level.

```python
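from pathlib import Path

from mdmin import compress, estimate_tokens

text = Path("README.md").read_text(encoding="utf-8")  # any markdown string works

result = compress(text, level="medium")
print(result.output)          # compressed text
print(result.stats.pct)       # e.g. 22.3 (%)
print(result.stats.saved)     # tokens saved
print(estimate_tokens(text))  # estimated token count of the input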
```

```bash
mdmin compress README.md                    # stdout
mdmin compress README.md -o README.min.md   # save to file
mdmin compress README.md --level aggressive
mdmin stats README.md                       # compare all levels
cat file.md | mdmin compress -              # stdin
```

## Extract

Given a large document and a query, `extract` returns only the relevant chunks that fit within a token budget.
Ranking is TF-IDF based: no external API, no vector database, and it runs in milliseconds. Targeted queries see a 70–95% reduction.

```python
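from pathlib import Path

from mdmin import extract

large_doc = Path("bigdoc.md").read_text(encoding="utf-8")  # any large markdown document

result = extract(large_doc, "how does auth work", max_tokens=2000)
print(result.text)                    # relevant chunks only
print(result.stats.reduction)         # e.g. 91.2 (%)
print(result.stats.chunks_extracted)  # e.g. 2 (of result.stats.chunks_total, e.g. 24)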
```

```bash
mdmin extract bigdoc.md -q "how does auth work"
mdmin extract bigdoc.md -q "database schema" --max 1500
```

For more control, index a document with `ContextExtractor` and query it directly:

```python
from mdmin import ContextExtractor

extractor = ContextExtractor()
extractor.index(large_doc)
result = extractor.extract("auth flow", max_tokens=2000)

# Multi-doc: score chunks globally across files
scored = extractor.score_chunks("auth flow")
```
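
To make the idea behind extraction concrete, here is a minimal, self-contained sketch of TF-IDF chunk ranking under a token budget. It does not use mdmin and is not its implementation: the heading-based chunking, the smoothed IDF formula, the 4-characters-per-token estimate, and all function names (`split_chunks`, `rank_chunks`, `select_chunks`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_chunks(doc):
    """Split markdown into one chunk per heading section (assumed strategy)."""
    parts = re.split(r"(?m)^(?=#{1,6} )", doc)
    return [p.strip() for p in parts if p.strip()]


def rank_chunks(chunks, query):
    """Return (score, index) pairs, highest TF-IDF score first."""
    counts = [Counter(tokenize(c)) for c in chunks]
    n = len(counts)
    scored = []
    for i, tf in enumerate(counts):
        total = sum(tf.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in counts if term in c)
            if df == 0:
                continue  # term appears nowhere in the document
            idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
            score += (tf[term] / total) * idf
        scored.append((score, i))
    return sorted(scored, key=lambda s: (-s[0], s[1]))


def select_chunks(doc, query, max_tokens):
    """Greedily keep the best-scoring chunks that fit the token budget."""
    chunks = split_chunks(doc)
    picked, used = [], 0
    for score, i in rank_chunks(chunks, query):
        cost = len(chunks[i]) // 4 + 1  # crude token estimate (assumption)
        if score <= 0 or used + cost > max_tokens:
            continue
        picked.append(i)
        used += cost
    # Emit the kept chunks in their original document order.
    return "\n\n".join(chunks[i] for i in sorted(picked))


doc = (
    "# Auth\nLogin uses tokens. Auth tokens expire hourly.\n\n"
    "# Database\nTables store users and orders.\n\n"
    "# Deploy\nShip with Docker.\n"
)
result = select_chunks(doc, "how does auth work", max_tokens=50)
print(result)
```

Only the `# Auth` section shares a term with the query, so it is the only chunk kept. A real extractor would likely also handle stop words, very long sections, and code fences.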

## Compression Levels

| Level | Savings | What it does |
|---|---|---|
| `light` | ~10% | Whitespace, comments, basic verbose patterns |
| `medium` | ~20–25% | + more verbose patterns, table compression, formatting cleanup |
| `aggressive` | ~25–35% | + article stripping, list compression, bold removal, dictionary dedup |
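
As a rough illustration of what rule-based compression at the `light` end of the table can look like, the sketch below applies three kinds of rules: comment removal, verbose-phrase replacement, and whitespace cleanup. The patterns and the `light_compress` function are hypothetical and written for this example; mdmin's actual rule set is not shown here and may differ.

```python
import re

# Hypothetical rules for illustration only; not mdmin's real patterns.
VERBOSE_PATTERNS = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bdue to the fact that\b", re.IGNORECASE), "because"),
    (re.compile(r"\bplease note that\s+", re.IGNORECASE), ""),
]


def light_compress(markdown):
    """Apply comment, verbose-phrase, and whitespace rules to markdown text."""
    # Drop HTML comments, which carry no meaning for an LLM reader.
    text = re.sub(r"<!--.*?-->", "", markdown, flags=re.DOTALL)
    # Replace wordy phrases with shorter equivalents.
    for pattern, replacement in VERBOSE_PATTERNS:
        text = pattern.sub(replacement, text)
    # Strip trailing spaces, then collapse runs of blank lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


sample = "# Setup\n\n<!-- internal note -->\n\n\nRun pip in order to install.   \n"
compressed = light_compress(sample)
print(compressed)
```

This sketch rewrites text everywhere, including inside fenced code blocks; a production compressor would need to leave code untouched.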

## API Reference

### compress

```python
compress(text: str, level: str = "medium") -> CompressResult
```

Returns `CompressResult` with `.output` (str) and `.stats` (CompressionStats):

```python
stats.input_tokens     # int
stats.output_tokens    # int
stats.saved            # int
stats.pct              # float (% saved)
stats.input_chars      # int
stats.output_chars     # int
stats.level            # str
```
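
The token fields presumably relate in the usual way: `saved` is the difference between input and output tokens, and `pct` is that difference as a percentage of the input. The sketch below spells out that arithmetic with a hypothetical stand-in class, `StatsSketch`; it is an assumption about how the numbers fit together, not mdmin's `CompressionStats`.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical stand-in used to illustrate the assumed arithmetic."""

    input_tokens: int
    output_tokens: int

    @property
    def saved(self):
        # Tokens removed by compression.
        return self.input_tokens - self.output_tokens

    @property
    def pct(self):
        # Percentage of input tokens saved, rounded to one decimal place.
        if self.input_tokens == 0:
            return 0.0
        return round(100.0 * self.saved / self.input_tokens, 1)


stats = StatsSketch(input_tokens=1200, output_tokens=932)
print(stats.saved, stats.pct)
```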

### extract

```python
extract(text: str, query: str, *, max_tokens: int = 2000) -> ExtractResult
```

Returns `ExtractResult` with `.text` (str) and `.stats` (ExtractStats):

```python
stats.total_doc_tokens    # int
stats.extracted_tokens    # int
stats.chunks_total        # int
stats.chunks_extracted    # int
stats.reduction           # float (% reduction)
stats.top_scores          # list[TopScore]
```

### estimate_tokens

```python
estimate_tokens(text: str) -> int
```

Fast BPE token count estimate (no external dependencies).

## License

AGPL-3.0-only
