Metadata-Version: 2.4
Name: contexthub
Version: 0.1.0
Summary: A lightweight context management library for LLM applications
Author: SecretSettler
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/SecretSettler/ContextHub
Project-URL: Repository, https://github.com/SecretSettler/ContextHub
Project-URL: Issues, https://github.com/SecretSettler/ContextHub/issues
Keywords: llm,context,ai,context-window,chunking
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5.0; extra == "tiktoken"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# ContextHub

A lightweight Python library for managing LLM context windows. It helps structure context items, estimate token usage, prioritize what stays when budget is limited, and split long text into smaller chunks.

## Features

- **Context window management** with token budgets
- **Priority-aware context selection** (CRITICAL, HIGH, MEDIUM, LOW)
- **Chunking utilities** — by tokens, sentences, paragraphs, or custom separator
- **Zero-dependency** default install with optional `tiktoken` integration
- **OpenAI-compatible** message format export

## Installation

```bash
pip install contexthub
```

Optional tokenizer support for accurate token counting:

```bash
pip install "contexthub[tiktoken]"
```

## Quick Start

### Build a prioritized context window

```python
from contexthub import ContextWindow, Priority

window = ContextWindow(max_tokens=50)

window.add("You are a precise assistant.", priority=Priority.CRITICAL, role="system")
window.add("Summarize this document in 3 bullet points.", priority=Priority.HIGH, role="user")
window.add("Previous conversation detail that can be dropped first.", priority=Priority.LOW)

selected_items = window.build()
messages = window.to_messages()

print([item.priority.name for item in selected_items])
print(messages)
```

### Chunk long text

```python
from contexthub import chunk_by_tokens, chunk_by_sentences, chunk_by_paragraphs, chunk_by_separator

text = "This is sentence one. This is sentence two. This is sentence three."

print(chunk_by_sentences(text, max_sentences=2))
print(chunk_by_tokens(text, max_tokens=10, overlap=2))
print(chunk_by_paragraphs("Paragraph one.\n\nParagraph two.\n\nParagraph three."))
print(chunk_by_separator("a|b|c", separator="|"))
```

### Utility helpers

```python
from contexthub import count_tokens, trim_to_fit

text = "This is a long context block for a tight budget"
print(count_tokens(text))
print(trim_to_fit(text, max_tokens=6))
```

## API Summary

### ContextWindow

- `ContextWindow(max_tokens=4096, tokenizer=None)`
  - `add(content, priority=Priority.MEDIUM, role="user", metadata=None)` — Add a context item
  - `build()` — Build context fitting within token budget, prioritized
  - `to_messages()` — Export as OpenAI-compatible message list
  - `clear()` — Clear all items
  - `total_tokens` — Current total token count
  - `remaining_tokens` — Remaining token budget
  - `items` — All items in the window

### Types

- `ContextItem` — Dataclass with content, priority, role, metadata, token_count
- `Priority` — Enum: CRITICAL, HIGH, MEDIUM, LOW

### Chunking

- `chunk_by_tokens(text, max_tokens, overlap=0, tokenizer=None)`
- `chunk_by_sentences(text, max_sentences=5)`
- `chunk_by_paragraphs(text)`
- `chunk_by_separator(text, separator="\n")`

### Priority Helpers

- `select_by_priority(items, max_tokens, tokenizer=None)`
- `trim_to_fit(text, max_tokens, tokenizer=None, suffix="...")`

### Tokenizer

- `count_tokens(text, model=None)`

## Development

```bash
pip install -e ".[dev]"
python -m pytest tests/ -v
python -m build
```

## License

Apache License 2.0. See [LICENSE](LICENSE).
