Metadata-Version: 2.4
Name: llm-cleaner
Version: 0.1.1
Summary: A robust JSON repair and extraction tool for LLM outputs.
Author-email: Jiajun Jiao <jiaojiajunqd@gmail.com>
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# LLM Cleaner

A robust, zero-dependency Python utility designed to extract and repair JSON from "dirty" LLM outputs. 

Large Language Models often output JSON wrapped in Markdown, cluttered with comments, or truncated due to token limits. This library cleans up the mess and ensures you get valid JSON strings ready for parsing.

*Future updates will support other formats (XML, YAML).*

## Features

- **Markdown Extraction**: Automatically detects and extracts JSON from Markdown code fences (such as those opened with `` ```json ``).
- **Truncation Handling**: Detects cut-off JSON and automatically adds missing closing brackets/braces.
- **Comment Removal**: Removes `//` and `/* */` comments (including those inside Markdown code blocks) while leaving the `//` in URLs such as `https://...` intact.
- **Garbage Collection**: Truncates conversational text after the JSON (e.g., "Hope this helps!").
- **Syntax Repair**:
  - Converts Python `True`/`False`/`None` to JSON `true`/`false`/`null`.
  - Converts single-quoted keys and strings (`'key': 'value'`) to double quotes.
  - Removes trailing commas.
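The hard part of comment removal is telling a real comment apart from the `//` inside a URL. The sketch below shows one common way to do it, assuming comments only ever appear outside string literals. It walks the text once, tracks whether the cursor is inside a quoted string, and drops comment text only when it is not. `strip_json_comments` is a hypothetical helper written for illustration. It is not llm-cleaner's API or its actual implementation.

```python
def strip_json_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments outside string literals.

    Hypothetical helper for illustration; not part of llm-cleaner's API.
    Both single- and double-quoted strings are tracked, so `//` inside
    "https://..." or 'http://...' is kept.
    """
    out = []
    i = 0
    n = len(text)
    quote = None  # active string delimiter (' or "), or None outside strings
    while i < n:
        ch = text[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to (but not including) the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */, or to the end if unterminated.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = '{"url": "https://example.com", // note\n "ok": true /* done */}'
    print(strip_json_comments(sample))
```

A regex such as `//.*$` would be shorter, but it would also cut `"https://example.com"` in half, which is why the scanner tracks string state instead.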

## Installation

```bash
pip install llm-cleaner
```

## Usage
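````python
import json

from llm_cleaner.json_cleaner import clean

# A typical "dirty" response: conversational preamble, a Markdown fence,
# a single-quoted key, a // comment, and output cut off before the closing brace.
raw_output = """
Here is the data:
```json
{
  'id': 1,
  "status": "active", // comment
"""

cleaned_json = clean(raw_output)  # returns a JSON string
data = json.loads(cleaned_json)
print(data)
````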

Output:
```
{'id': 1, 'status': 'active'}
```

## How it works

1. **Extract**: Locates the most likely JSON block using regular expressions.
2. **Sanitize**: Strips comments and conversational filler.
3. **Balance**: Uses a stack-based approach to close any unclosed brackets if the output was truncated.
4. **Repair**: Applies heuristic replacements for common syntax errors (Python literals, single quotes, trailing commas).
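The Balance step can be illustrated with a small stack-based sketch. It assumes truncation only cuts off the end of the text and that surrounding prose has already been stripped, so any quote character opens or closes a string. `close_unbalanced` is a hypothetical name, and this is not the library's actual code.

```python
def close_unbalanced(text: str) -> str:
    """Append the closers needed to balance a truncated JSON-like string.

    Hypothetical helper for illustration. Brackets inside string literals
    are ignored, mismatched closers are skipped, and an unterminated
    string is closed before the open brackets.
    """
    closers = {"{": "}", "[": "]"}
    stack = []       # expected closers, innermost last
    quote = None     # active string delimiter (' or "), or None
    escaped = False  # previous character was a backslash inside a string
    for ch in text:
        if quote:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch in closers:
            stack.append(closers[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
    if quote:
        if escaped:
            # Drop a dangling backslash so the added quote is not escaped.
            text = text[:-1]
        text += quote
    # Close the innermost bracket first.
    return text + "".join(reversed(stack))


if __name__ == "__main__":
    print(close_unbalanced('{"a": [1, 2, {"b": "x'))
```

Balancing alone does not guarantee valid JSON. For example, `'{"a": 1,'` becomes `'{"a": 1,}'`, which still needs the trailing-comma fix from the Repair step.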

## License

MIT
