Metadata-Version: 2.4
Name: toneformat
Version: 1.0.0
Summary: TONE – Token-Optimized Notation Engine for LLMs
Author-email: Abdulbasit Ayinde <ayindealiyu1@gmail.com>
License-Expression: MIT
Keywords: tone,json,llm,token,serialization,data-format,ai,parser
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Provides-Extra: cli
Requires-Dist: click>=8.0.0; extra == "cli"
Requires-Dist: rich>=13.0.0; extra == "cli"
Provides-Extra: fastapi
Requires-Dist: fastapi>=0.104.0; extra == "fastapi"
Provides-Extra: pydantic
Requires-Dist: pydantic>=2.0.0; extra == "pydantic"
Provides-Extra: pandas
Requires-Dist: pandas>=2.0.0; extra == "pandas"
Provides-Extra: all
Requires-Dist: click>=8.0.0; extra == "all"
Requires-Dist: rich>=13.0.0; extra == "all"
Requires-Dist: fastapi>=0.104.0; extra == "all"
Requires-Dist: pydantic>=2.0.0; extra == "all"
Requires-Dist: pandas>=2.0.0; extra == "all"
Requires-Dist: pyyaml>=6.0; extra == "all"
Requires-Dist: tiktoken>=0.5.0; extra == "all"
Dynamic: license-file

# TONE - Token-Optimized Notation Engine for LLMs

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Status](https://img.shields.io/badge/status-production%20ready-success)](https://github.com)
[![Tests](https://img.shields.io/badge/tests-209%20passing-brightgreen)](tests)

**TONE – Token-Optimized Notation Engine for LLMs** is a compact, human-readable data format designed for **AI and LLM contexts**.  
It encodes JSON-compatible structures using far fewer tokens typically **40-60% less**  
making it ideal for AI assistants, RAG systems, and conversational data pipelines.

> **Status**: ✅ Production-ready - Full implementation complete  
> **Reference**: [TypeScript Implementation](https://github.com/johannschopplich/toon) (inspiration)

## Why TONE?

AI is becoming cheaper and more accessible, but **LLM tokens still cost money**. TONE achieves 30-60% token reduction versus JSON for uniform tabular data:

### JSON (15,145 tokens):
```json
{
  "repositories": [
    {
      "id": 28457823,
      "name": "freeCodeCamp",
      "repo": "freeCodeCamp/freeCodeCamp",
      "description": "freeCodeCamp.org's open-source codebase...",
      "createdAt": "2014-12-24T17:49:19Z",
      "updatedAt": "2025-10-28T11:58:08Z"
    }
  ]
}
```

### TONE (8,745 tokens, 42.3% reduction):
```
repositories[3]{id,name,repo,description,createdAt,updatedAt}:
  28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,freeCodeCamp.org's open-source codebase...,2014-12-24T17:49:19Z,2025-10-28T11:58:08Z
```

## Key Features

- 💸 **Token-efficient**: typically 30–60% fewer tokens than JSON
- 🤿 **LLM-friendly guardrails**: explicit lengths and fields enable validation
- 🍱 **Minimal syntax**: removes redundant punctuation
- 📐 **Indentation-based**: like YAML, uses whitespace instead of braces
- 🧺 **Tabular arrays**: declare keys once, stream data as rows

## Installation

### Basic Installation
```bash
pip install toneformat
```

**Note**: The package is installed as `toneformat`, but imported as `tone`:
```python
import tone  # Import works the same way
```

### With CLI Support
```bash
pip install 'toneformat[cli]'
```

### With All Integrations
```bash
pip install 'toneformat[all]'
```

### For Development
```bash
git clone https://github.com/your-username/python-tone.git
cd python-tone
pip install -e ".[dev,cli]"
```

## Quick Start

```python
from tone import encode, decode

# Encode Python data to TONE
data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"}
    ]
}

toon_str = encode(data)
print(toon_str)
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user

# Decode TONE back to Python
decoded = decode(toon_str)
assert decoded == data
```

## TONE Format Overview

### Objects
```python
from tone import encode

encode({'id': 123, 'name': 'Ada', 'active': True})
```
```
id: 123
name: Ada
active: true
```

### Nested Objects
```python
encode({'user': {'id': 123, 'name': 'Ada'}})
```
```
user:
  id: 123
  name: Ada
```

### Primitive Arrays (Inline)
```python
encode({'tags': ['admin', 'ops', 'dev']})
```
```
tags[3]: admin,ops,dev
```

### Tabular Arrays
```python
encode({
    'items': [
        {'sku': 'A1', 'qty': 2, 'price': 9.99},
        {'sku': 'B2', 'qty': 1, 'price': 14.5}
    ]
})
```
```
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
```

### List Arrays (Non-Uniform)
```python
encode({
    'items': [
        {'id': 1, 'name': 'First'},
        {'id': 2, 'name': 'Second', 'extra': True}
    ]
})
```
```
items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true
```

### Arrays of Arrays
```python
encode({'pairs': [[1, 2], [3, 4]]})
```
```
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
```

### Root Arrays
```python
encode([{'id': 1}, {'id': 2}])
```
```
[2]{id}:
  1
  2
```

## Core Features

### ✅ Implemented
- 🎯 **Type Safety**: Full type hints with Python's typing module
- 🔄 **Encoding**: Complete Python → TONE conversion
- 🔄 **Decoding**: Complete TONE → Python conversion
- 🎨 **CLI Tool**: Command-line interface with rich output
- 🧪 **Testing**: 191 comprehensive tests (100% passing)
- 📊 **Coverage**: 71% code coverage
- ✅ **Validation**: Strict mode enforcement
- 🎛️ **Options**: All delimiters, indent, length markers
- ⚡ **Streaming**: Memory-efficient for large files
- 🔢 **Token Intelligence**: Built-in token counting
- 🎯 **Rich Errors**: Context-aware exceptions with suggestions
- 📁 **Context Managers**: Pythonic file handling
- 🖨️ **Pretty Printing**: Human-readable formatting
- ⚡ **Async Support**: Non-blocking operations
- 🔍 **Debug Tools**: Parse tree inspection & debugging

### ✅ Ecosystem Integrations
- 🚀 **FastAPI**: Web API plugin with TONEResponse
- 📦 **Pydantic**: Type-safe model encoding/decoding
- 🐼 **Pandas**: DataFrame converters
- 🔄 **Converters**: JSON, YAML, CSV interoperability
- 🔍 **Token Optimization**: Auto-optimization tools

## CLI Usage

The CLI tool supports converting between JSON and TONE formats:

```bash
# Encode JSON to TONE (auto-detected by extension)
tone input.json -o output.tone

# Decode TONE to JSON (auto-detected by extension)
tone data.tone -o output.json

# Tab-separated output (often more token-efficient)
tone data.json --delimiter "\t" -o output.tone

# Pipe-separated with length markers
tone data.json --delimiter "|" --length-marker -o output.tone

# Manual mode selection
tone file.txt -e  # Force encode
tone file.txt -d  # Force decode

# Custom indentation
tone data.json --indent 4 -o output.tone

# Disable strict mode
tone data.tone --no-strict -o output.json

# Help
tone --help
```

## Comprehensive API Reference

### Core Functions

#### `encode(value, options=None)`

Converts Python values to TONE format.

**Parameters:**
- `value`: Any JSON-serializable value (dict, list, primitive, or nested)
- `options`: Optional dict with:
  - `indent` (int): Spaces per indentation level (default: 2)
  - `delimiter` (str): Array delimiter - `','`, `'\t'`, or `'|'` (default: `','`)
  - `length_marker` (str or None): Optional `'#'` prefix for array lengths

**Returns:** TONE-formatted string

**Example:**
```python
from tone import encode

data = {
    'users': [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ]
}

toon_str = encode(data, {'delimiter': '|'})
print(toon_str)
```

#### `decode(input_str, options=None)`

Converts TONE string back to Python values.

**Parameters:**
- `input_str`: TONE-formatted string
- `options`: Optional dict with:
  - `indent` (int): Spaces per indentation level (default: 2)
  - `strict` (bool): Enable strict validation (default: True)

**Returns:** Python value (dict, list, or primitive)

**Example:**
```python
from tone import decode

toon_str = """users[2]{id,name}:
  1,Alice
  2,Bob"""

data = decode(toon_str)
# {'users': [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]}
```

### Context Managers

#### `TONEEncoder` & `TONEDecoder`

Pythonic file handling for TONE files.

**Example:**
```python
from tone import TONEEncoder, TONEDecoder

# Write
with TONEEncoder("data.tone") as enc:
    enc.encode({"users": [{"id": 1, "name": "Alice"}]})

# Read
with TONEDecoder("data.tone") as dec:
    data = dec.decode()
```

### Async Operations

#### `aencode(value, options=None)` & `adecode(input_str, options=None)`

Async encode/decode for non-blocking operations.

**Example:**
```python
import asyncio
from tone import aencode, adecode

async def main():
    toon_str = await aencode({"name": "Alice"})
    data = await adecode(toon_str)

asyncio.run(main())
```

#### `aencode_parallel(values, max_workers=5)` & `adecode_parallel(toon_strings, max_workers=5)`

Parallel async processing for large batches.

**Example:**
```python
import asyncio
from tone import aencode_parallel

async def main():
    items = [{"id": i} for i in range(1000)]
    results = await aencode_parallel(items, max_workers=10)

asyncio.run(main())
```

### Streaming

#### `encode_stream(items, chunk_size=1000)` & `decode_stream(file_or_iterator, chunk_size=1000)`

Memory-efficient processing for large datasets.

**Example:**
```python
from tone import encode_stream

def large_data():
    for i in range(10_000_000):
        yield {"id": i, "value": f"item_{i}"}

# Process in chunks
for chunk in encode_stream(large_data(), chunk_size=10000):
    write_to_file(chunk)
```

### Debug Tools

#### `debug_encode(value, options=None)` & `debug_decode(toon_str, options=None)`

Debug encoding/decoding with metadata and optimization suggestions.

**Example:**
```python
from tone import debug_encode

info = debug_encode({"users": [...]})
print(f"Format: {info['format_detected']}")
print(f"Token savings: {info['size_reduction']*100:.1f}%")
for tip in info['optimization_suggestions']:
    print(f"  - {tip}")
```

#### `inspect_parse_tree(toon_str, indent="  ")`

Visualize TONE parse tree structure.

**Example:**
```python
from tone import inspect_parse_tree

tree = inspect_parse_tree(toon_str)
print(tree)

# Output:
# Document
# ├─ Array (tabular, 2 items)
# │  ├─ Fields: ['id', 'name']
# │  └─ Rows:
# │     ├─ {'id': '1', 'name': 'Alice'}
# │     └─ {'id': '2', 'name': 'Bob'}
```

### Token Optimization

#### `estimate_tokens(data)`

Estimate token count for data.

**Example:**
```python
from tone import estimate_tokens

tokens = estimate_tokens({"users": [...]})
print(f"Estimated tokens: {tokens}")
```

#### `compare_formats(data, delimiter=",")`

Compare token usage across different formats.

**Example:**
```python
from tone import compare_formats

comparison = compare_formats(data)
print(f"TONE (comma): {comparison['toon_comma']:,} tokens")
print(f"TONE (tab):   {comparison['toon_tab']:,} tokens")
print(f"JSON:         {comparison['json']:,} tokens")
print(f"\nSavings: {(1 - comparison['toon_comma']/comparison['json'])*100:.1f}%")
```

#### `optimize_for_tokens(data, target_method="simple")`

Find optimal encoding configuration for minimum tokens.

**Example:**
```python
from tone import optimize_for_tokens

optimal = optimize_for_tokens(data)
print(f"Best delimiter: {optimal['best']}")
print(f"Token savings: {optimal['tokens']}")
```

### Formatting & Display

#### `format_value(value, indent=0, max_depth=10)`

Pretty print Python values in human-readable format.

**Example:**
```python
from tone import format_value

formatted = format_value({"users": [...]})
print(formatted)
```

#### `summarize_structure(value, max_items=10)`

Create data structure summary.

**Example:**
```python
from tone import summarize_structure

summary = summarize_structure(data)
print(summary)

# Output:
# Object with 1 key: 'users'
#   users: Array with 100 items
#     Items: {'id': 0, 'name': 'item_0'}, ..., + 98 more
```

#### `create_table(data, title=None)`

Create rich Table for visualization.

**Example:**
```python
from tone import create_table
from rich.console import Console

table = create_table(data, title="Users")
console = Console()
console.print(table)
```

### Ecosystem Integrations

#### FastAPI

Return TONE format from FastAPI endpoints for token-efficient LLM responses.

**Installation:**
```bash
pip install 'tone[fastapi]'
```

**Usage:**
```python
from fastapi import FastAPI
from tone.integrations import TONEResponse

app = FastAPI()

@app.get("/users", response_class=TONEResponse)
async def get_users():
    return [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ]

# Returns TONE format: users[2]{id,name,role}: ...
```

**Benefits:**
- Token-efficient API responses
- LLM-optimized output
- Simple integration
- Compatible with FastAPI features

#### Pydantic

Type-safe encoding/decoding with Pydantic models.

**Installation:**
```bash
pip install 'tone[pydantic]'
```

**Usage:**
```python
from pydantic import BaseModel
from tone.integrations import encode_model, decode_model

class User(BaseModel):
    id: int
    name: str
    email: str

# Encode Pydantic models
users = [User(id=1, name='Alice', email='alice@example.com')]
tone = encode_model(users)

# Decode to Pydantic models
decoded_users = decode_model(tone, User)
# Returns: list[User]
```

**Features:**
- Automatic validation
- Type safety
- Model serialization
- List support

#### Pandas

Convert DataFrames to/from TONE format efficiently.

**Installation:**
```bash
pip install 'tone[pandas]'
```

**Usage:**
```python
import pandas as pd
from tone.integrations import to_toon, from_toon

# DataFrame to TONE (optimal tabular format)
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [95.5, 87.2, 91.8]
})

toon_str = to_toon(df)
# Returns: users[3]{id,name,score}: ...

# TONE to DataFrame
df2 = from_toon(toon_str)
# Returns pandas DataFrame with proper types
```

**Benefits:**
- Efficient tabular encoding
- Type preservation
- DataFrame compatibility
- Easy data export/import

#### Format Converters

Convert between TONE and other serialization formats.

**JSON Converter:**
```python
from tone.integrations import to_json, from_json

# TONE to JSON
toon_str = 'users[2]{id,name}:\n  1,Alice\n  2,Bob'
json_str = to_json(toon_str, indent=2)

# JSON to TONE
json_str = '{"users": [{"id": 1, "name": "Alice"}]}'
toon_str = from_json(json_str)
```

**YAML Converter:**
```python
from tone.integrations import to_yaml, from_yaml

# TONE to YAML
yaml_str = to_yaml(toon_str)

# YAML to TONE
toon_str = from_yaml(yaml_str)
```

**CSV Converter:**
```python
from tone.integrations import to_csv, from_csv

# TONE to CSV (tabular arrays)
csv_str = to_csv(toon_str)

# CSV to TONE (as tabular array)
toon_str = from_csv(csv_str)
```

### Error Handling

TONE provides rich, context-aware exceptions:

```python
from tone import decode
from tone.exceptions import (
    TONEError,
    TONESyntaxError,
    TONEValidationError
)

try:
    data = decode(toon_str)
except TONESyntaxError as e:
    logger.error(f"Syntax error: {e}")
    # e has context, line numbers, suggestions
    handle_syntax_error(e)
except TONEValidationError as e:
    logger.warning(f"Validation error: {e}")
    # Consider falling back to non-strict mode
    data = decode(toon_str, {"strict": False})
```

**Exception Classes:**
- `TONEError` - Base exception for TONE errors
- `TONEEncodeError` - Error during encoding
- `TONEDecodeError` - Error during decoding
- `TONESyntaxError` - Syntax error in TONE format
- `TONEValidationError` - Validation error (strict mode)
- `TONENormalizationError` - Error during value normalization
- `TONETypeError` - Type-related error
- `TONEValueError` - Value-related error

### Type Definitions

**Core Types:**
```python
JsonValue = Union[JsonPrimitive, JsonArray, JsonObject]
JsonPrimitive = Union[str, int, float, bool, None]
JsonArray = List[JsonValue]
JsonObject = Dict[str, JsonValue]
```

**Options:**
```python
EncodeOptions = {
    indent: int,           # Spaces per indent (default: 2)
    delimiter: str,        # Comma, tab, or pipe (default: ",")
    length_marker: str     # Optional "#" prefix (default: None)
}

DecodeOptions = {
    indent: int,    # Indent size (default: 2)
    strict: bool    # Strict validation (default: True)
}
```

**Constants:**
```python
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}
DEFAULT_DELIMITER = ","
```

## Real-World Examples

### LLM Integration

```python
from tone import encode, compare_formats, optimize_for_tokens
import openai

# Prepare your data
analytics = {
    "metrics": [
        {"metric": "users", "value": 1500, "change": 15},
        {"metric": "revenue", "value": 50000, "change": 8},
        {"metric": "sessions", "value": 5000, "change": 22}
    ]
}

# Find optimal format
optimal = optimize_for_tokens(analytics)

# Encode with optimal settings
toon_str = encode(analytics, {"delimiter": optimal['best']})

# Send to LLM
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are an analytics assistant."},
        {"role": "user", "content": f"Analyze this data:\n\n{toon_str}\n\nSummarize the key trends."}
    ]
)
```

### Web API (FastAPI)

```python
from fastapi import FastAPI, HTTPException
from tone.integrations import TONEResponse
import logging

logger = logging.getLogger(__name__)
app = FastAPI()

@app.get("/api/analytics", response_class=TONEResponse)
async def get_analytics():
    """Returns analytics data in token-efficient format."""
    try:
        import pandas as pd
        
        # Get your data
        data = {
            "daily_stats": [
                {"date": "2025-01-01", "users": 150, "revenue": 5000},
                {"date": "2025-01-02", "users": 165, "revenue": 5200},
                {"date": "2025-01-03", "users": 180, "revenue": 5400}
            ]
        }
        
        return data
        
    except Exception as e:
        logger.error(f"API error: {e}")
        raise HTTPException(status_code=500, detail=str(e))

# The response will be:
# daily_stats[3]{date,users,revenue}:
#   2025-01-01,150,5000
#   2025-01-02,165,5200
#   2025-01-03,180,5400
```

### Data Science Workflow

```python
import pandas as pd
from tone.integrations import to_toon, from_toon
from tone import estimate_tokens

# Start with DataFrame
df = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=5, freq="D"),
    "temperature": [20, 22, 19, 21, 23],
    "humidity": [45, 50, 48, 52, 47]
})

# Convert to TONE (optimal for uniform data!)
toon_str = to_toon(df)

# Check token count
tokens = estimate_tokens(toon_str)
print(f"Token count: {tokens}")

# Send to LLM for analysis
# ... LLM processes TONE format ...

# Convert back to DataFrame
df_restored = from_toon(toon_str)
print(df_restored.equals(df))  # True
```

### Configuration Management

```python
# config.tone
database:
  host: localhost
  port: 5432
  name: myapp
  pool_size: 10
features:
  enabled:
    - auth
    - api
    - logging
settings:
  debug: false
  log_level: INFO
  timeout: 30

# Python
from tone import decode
from pathlib import Path

config = decode(Path("config.tone").read_text())
print(f"Connecting to {config['database']['host']}:{config['database']['port']}")
```

### Database Export

```python
import sqlite3
from tone import TONEEncoder

conn = sqlite3.connect("database.db")
cursor = conn.cursor()

# Query data
cursor.execute("SELECT * FROM users WHERE active = 1")
rows = cursor.fetchall()
columns = [desc[0] for desc in cursor.description]

# Convert to dicts
users = [dict(zip(columns, row)) for row in rows]

# Export as TONE
data = {"users": users}
with TONEEncoder("users_export.tone") as enc:
    enc.encode(data)

print(f"Exported {len(users)} users")
```

## Best Practices

### Performance Optimization

**Use Tabular Format for Uniform Data:**
```python
from tone import encode

# ✅ Best: Tabular format (most efficient)
uniform_data = {
    "users": [
        {"id": 1, "name": "Alice", "email": "a@example.com"},
        {"id": 2, "name": "Bob", "email": "b@example.com"},
        # ... many more
    ]
}

toon = encode(uniform_data)
# users[100]{id,name,email}:
#   1,Alice,a@example.com
#   2,Bob,b@example.com
#   ...

# ❌ Mixed data loses efficiency
```

**Choose the Right Delimiter:**
```python
from tone import encode

# For wide data (many columns), use tab
wide_data = [{"col1": 1, "col2": 2, ..., "col20": 20}]
toon = encode(wide_data, {"delimiter": "\t"})  # Fewer tokens

# For narrow data, comma is fine
narrow_data = [{"id": 1, "name": "Alice"}]
toon = encode(narrow_data)  # Comma default is fine
```

**Stream Large Datasets:**
```python
from tone import encode_stream

def large_data():
    for i in range(10_000_000):
        yield {"id": i, "value": f"item_{i}"}

# ✅ Process in chunks
for chunk in encode_stream(large_data(), chunk_size=10000):
    write_to_file(chunk)

# ❌ Loading everything uses lots of memory!
```

**Use Async for Web Applications:**
```python
import asyncio
from tone import aencode_parallel

async def api_handler(request_data_list):
    # Process multiple requests in parallel
    results = await aencode_parallel(request_data_list, max_workers=10)
    return results
```

### Error Handling

**Always Handle TONE Errors:**
```python
from tone import decode
from tone.exceptions import TONESyntaxError, TONEValidationError

try:
    data = decode(toon_str)
except TONESyntaxError as e:
    logger.error(f"Syntax error: {e}")
    # e has context, line numbers, suggestions
    handle_syntax_error(e)
except TONEValidationError as e:
    logger.warning(f"Validation error: {e}")
    # Consider falling back to non-strict mode
    data = decode(toon_str, {"strict": False})
```

**Use Strict Mode in Production:**
```python
# Development: Be lenient
data = decode(toon_str, {"strict": False})

# Production: Enforce correctness
data = decode(toon_str, {"strict": True})
```

### Security Considerations

**Sanitize User Input:**
```python
from tone import decode
import re

def safe_decode(toon_str: str) -> dict:
    """Decode TONE with safety checks."""
    # Check reasonable size
    if len(toon_str) > 10_000_000:  # 10MB limit
        raise ValueError("Input too large")
    
    # Validate basic structure
    if not re.match(r'^[\w\n\t\[\]{}:,-"\'"\\\s]+$', toon_str):
        raise ValueError("Suspicious characters detected")
    
    return decode(toon_str, {"strict": True})
```

**Validate Schema:**
```python
from tone import decode

def validate_schema(data: dict, expected_keys: list) -> dict:
    """Validate decoded data structure."""
    if not isinstance(data, dict):
        raise ValueError("Expected dict")
    
    missing = set(expected_keys) - set(data.keys())
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    
    return data
```

### Testing Strategies

**Test Round-trips:**
```python
import pytest
from tone import encode, decode

def test_roundtrip(data):
    """Ensure data survives encoding/decoding."""
    toon_str = encode(data)
    decoded = decode(toon_str)
    assert decoded == data

test_roundtrip({"users": [...]})
```

**Test Edge Cases:**
```python
def test_edge_cases():
    # Empty structures
    assert decode(encode({})) == {}
    assert decode(encode([])) == []
    
    # Special values
    assert decode(encode(None)) is None
    assert decode(encode(True)) is True
    assert decode(encode(False)) is False
    
    # Large numbers
    assert decode(encode({"big": 10**20})) == {"big": 10**20}
```

## Troubleshooting

### Common Issues

**Issue: "Indentation error"**
```python
# Check your indent size
try:
    data = decode(toon_str, {"indent": 2})
except TONESyntaxError:
    # Try different indent size
    data = decode(toon_str, {"indent": 4})
```

**Issue: "Expected N items, but got M"**
```python
# Use strict=False for lenient parsing
data = decode(toon_str, {"strict": False})

# Or fix your data
# Remove extra items or correct the header count
```

**Issue: High memory usage**
```python
# Use streaming instead
from tone import encode_stream

for chunk in encode_stream(large_data(), chunk_size=1000):
    process(chunk)
```

**Issue: Slow encoding**
```python
# Use async for multiple items
import asyncio
from tone import aencode_parallel

results = await aencode_parallel(items, max_workers=10)
```

## Performance

### Expected Performance

| Operation | Small (100 items) | Medium (1k items) | Large (10k items) |
|-----------|------------------|-------------------|-------------------|
| Encode    | <1ms            | ~5ms             | ~50ms            |
| Decode    | <1ms            | ~6ms             | ~60ms            |
| Tokens    | ~500            | ~5,000           | ~50,000          |

### Optimization Tips

1. **Prefer tabular format**: 30-60% token savings
2. **Use tab delimiter**: Best for wide data
3. **Stream large files**: Memory-efficient
4. **Parallel processing**: 5-10x speedup for batches

## Specification

This implementation follows the [TOON Specification v1.3](https://github.com/johannschopplich/toon/blob/main/SPEC.md) for cross-language interoperability and compatibility with the original TypeScript implementation.

## Contributing

Contributions are welcome! This project maintains strict SPEC compliance and high code quality standards.

### Development Setup

```bash
# Clone repository
git clone https://github.com/your-username/python-tone.git
cd python-tone

# Install development dependencies
pip install -e ".[dev,cli,all]"

# Run tests
pytest

# Run linters
black tone tests
ruff check tone tests
mypy tone

# Check coverage
pytest --cov=tone --cov-report=html
```

### Code Quality

- Type hints throughout
- 100% test coverage goal
- SPEC compliance validation
- Zero linter errors/warnings

## License

MIT License

```
MIT © 2025 Abdulbasit Ayinde

Originally inspired by TOON by Johann Schopplich.

Enhanced and maintained as TONE, the global token-optimized notation standard for AI.
```

## Acknowledgments

- Original TypeScript implementation by [Johann Schopplich](https://github.com/johannschopplich)
- Based on [TOON Specification v1.3](https://github.com/johannschopplich/toon)
- Enhanced and maintained as **TONE** by Abdulbasit Ayinde

## Links

- **Specification**: [SPEC.md](https://github.com/johannschopplich/toon/blob/main/SPEC.md)
- **Original TypeScript**: [@byjohann/toon](https://github.com/johannschopplich/toon)
- **TypeScript NPM**: [npm/@byjohann/toon](https://www.npmjs.com/package/@byjohann/toon)

---

**TONE – Token-Optimized Notation Engine for LLMs**  
Making AI communication more efficient, one token at a time.
