Metadata-Version: 2.4
Name: tpress
Version: 0.2.0
Summary: Press repositories, code files, and HTML into compact, prompt-ready context bundles.
Author-email: copypasteitworks <copypasteitworks@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/copypasteitworks/TextPress
Project-URL: Repository, https://github.com/copypasteitworks/TextPress
Project-URL: Issues, https://github.com/copypasteitworks/TextPress/issues
Project-URL: Documentation, https://github.com/copypasteitworks/TextPress#readme
Keywords: llm,context,compression,cli,prompt,rag,repository,bundler
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: tiktoken>=0.7.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: ruff>=0.6.0; extra == "dev"
Provides-Extra: html
Requires-Dist: beautifulsoup4>=4.12.0; extra == "html"
Dynamic: license-file

# TextPress (tpress)

> Press repositories and code into compact, prompt-ready context bundles for LLMs.

[![PyPI version](https://img.shields.io/pypi/v/tpress.svg)](https://pypi.org/project/tpress/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Tests](https://github.com/copypasteitworks/TextPress/actions/workflows/ci.yml/badge.svg)](https://github.com/copypasteitworks/TextPress/actions)

## Why tpress?

When working with LLMs, you often need to share code context. Copy-pasting files one by one is tedious, and a full repository often exceeds the model's token limit. **tpress** solves this by:

- 📦 **Bundling** entire repos into a single markdown file
- 🎯 **Prioritizing** code over data files (Python/JS first, JSON last)
- 🔐 **Redacting** secrets automatically before sharing
- ✂️ **Compressing** content to fit a token budget (summarize data files, strip comments, drop low-relevance files)

## Install

```bash
pip install tpress
```

## Quick Start

```bash
# Bundle current directory → bundle.md
tpress .

# Bundle with custom output path
tpress . -o context.md

# Compact a single file
tpress README.md -o readme.txt

# Custom token limit
tpress . -t 50000
```

## Usage

### Default Command (Auto-detect)

tpress auto-detects whether you're passing a file or directory:

```bash
tpress .                    # Directory → repo mode
tpress src/main.py          # File → file mode
tpress /path/to/project     # Directory → repo mode
```
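
The dispatch above can be pictured with a minimal sketch. The function name `detect_mode` is hypothetical and is not taken from the tpress source; it only illustrates the file-versus-directory check.

```python
from pathlib import Path


def detect_mode(path: str) -> str:
    """Return "repo" for a directory and "file" for a regular file.

    Hypothetical helper: tpress's real dispatch may differ.
    """
    p = Path(path)
    if p.is_dir():
        return "repo"
    if p.is_file():
        return "file"
    raise FileNotFoundError(f"No such file or directory: {path}")
```

With this sketch, `detect_mode(".")` returns `"repo"`, and a path to an existing file returns `"file"`.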

### Repository Bundling

```bash
tpress repo . -o bundle.md
```

**Output:**
```
✓ bundle.md · 47k tokens · 19 files
```

#### Adaptive Bundling

tpress uses relevance-based adaptive bundling by default:

- **File Ranking** (0 = most relevant):
  - Rank 0: Core code (`.py`, `.js`, `.ts`, `.go`, `.rs`)
  - Rank 1-4: Compiled, shell, SQL, config
  - Rank 5-6: Documentation, HTML/CSS
  - Rank 7-8: Data files (`.json`, `.csv`)

- **Smart Ordering**: Most relevant files appear first in the bundle

- **Token Budget**: The default soft limit is 100,000 tokens. When a bundle exceeds it, tpress reduces content in this order:
  1. Summarize large data files (JSON/CSV → schema only)
  2. Compress lower-priority files (strip docstrings/comments)
  3. Exclude lowest-relevance files if needed

```bash
# Verbose output
tpress . -v

# Very verbose (full analysis)
tpress . -vv

# Disable adaptive bundling
tpress repo . --no-adaptive
```
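
To make the ranking and the last reduction step concrete, here is a rough sketch. It is not tpress's implementation: only rank 0 and the `.json`/`.csv` entries come from the list above, while the other extensions, the exact rank numbers, and the names `RANKS`, `order_files`, and `reduce_to_budget` are assumptions. Steps 1 and 2 (summarizing and compressing) are left out; only ordering and exclusion are shown.

```python
from pathlib import PurePosixPath

# Extension -> rank (0 = most relevant). Rank 0 and the data-file entries
# follow the README; every other entry is an illustrative guess.
RANKS = {
    ".py": 0, ".js": 0, ".ts": 0, ".go": 0, ".rs": 0,
    ".sh": 2, ".sql": 3, ".toml": 4, ".yaml": 4,
    ".md": 5, ".html": 6, ".css": 6,
    ".json": 7, ".csv": 8,
}
DEFAULT_RANK = 8  # assumption: unknown extensions are treated as least relevant


def rank(path: str) -> int:
    return RANKS.get(PurePosixPath(path).suffix.lower(), DEFAULT_RANK)


def order_files(paths) -> list[str]:
    """Most relevant first; ties are broken alphabetically."""
    return sorted(paths, key=lambda p: (rank(p), p))


def reduce_to_budget(files: dict[str, int], limit: int) -> tuple[list[str], int]:
    """Drop the least relevant files until the token total fits the limit.

    `files` maps each path to its token count.
    """
    kept = order_files(files)
    total = sum(files.values())
    while kept and total > limit:
        dropped = kept.pop()  # the least relevant file is last
        total -= files[dropped]
    return kept, total
```

For example, with `src/app.py` (400 tokens), `run.sh` (50), `README.md` (200), and `data/dump.json` (900) under a limit of 700, this sketch drops only `data/dump.json` and keeps 650 tokens.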

#### Filtering Modes

```bash
tpress repo .           # Default: code, config, docs
tpress repo . --strict  # Code only
tpress repo . --all     # Everything including data files
```
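
The three modes can be thought of as increasingly permissive filters. The sketch below assumes a simple extension-based classification; the extension sets and the function `include` are hypothetical and only illustrate the idea.

```python
from pathlib import PurePosixPath

# Illustrative extension sets, not tpress's actual lists.
CODE = {".py", ".js", ".ts", ".go", ".rs"}
CONFIG = {".toml", ".yaml", ".yml", ".ini"}
DOCS = {".md", ".rst", ".txt"}


def include(path: str, mode: str = "default") -> bool:
    """Decide whether a file belongs in the bundle for the given mode."""
    suffix = PurePosixPath(path).suffix.lower()
    if mode == "all":
        return True  # everything, including data files
    if mode == "strict":
        return suffix in CODE  # code only
    if mode == "default":
        return suffix in CODE | CONFIG | DOCS  # code, config, docs
    raise ValueError(f"unknown mode: {mode}")
```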

#### Split Large Repos

```bash
tpress repo . --split -t 50000
# Creates: bundle_1.md, bundle_2.md, ...
```
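
One plausible way to picture splitting is greedy packing: files are added to the current part until the next one would exceed the limit. This is an assumption about the approach, not a description of tpress's code, and `split_bundle` and `part_names` are hypothetical names.

```python
def split_bundle(files: list[tuple[str, int]], limit: int) -> list[list[str]]:
    """Greedily pack (path, token_count) pairs into parts of at most `limit` tokens.

    A single file larger than the limit still gets a part of its own.
    """
    parts: list[list[str]] = []
    current: list[str] = []
    used = 0
    for path, tokens in files:
        if current and used + tokens > limit:
            parts.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        parts.append(current)
    return parts


def part_names(count: int, stem: str = "bundle") -> list[str]:
    """Name the parts bundle_1.md, bundle_2.md, ... as in the example above."""
    return [f"{stem}_{i}.md" for i in range(1, count + 1)]
```

Greedy packing keeps the file order intact, so the most relevant files still land in the first part.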

### Single File Compaction

```bash
tpress main.py                    # Output to stdout
tpress main.py -o compact.py      # Output to file
tpress main.py -p llm-safe        # With secret redaction
```

### HTML Extraction

The optional `html` extra installs `beautifulsoup4` (`pip install "tpress[html]"`).

```bash
tpress html page.html -o text.md
```
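
As a rough illustration of what text extraction involves, the sketch below pulls visible text out of an HTML string. It deliberately uses the standard library's `html.parser` so that it runs without extra dependencies; it is not tpress's extraction code, and the names `TextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Each text fragment ends up on its own line, which is cruder than a real extractor but enough to show the idea.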

## Profiles

| Profile | Description |
|---------|-------------|
| `lossless` | Conservative normalization (whitespace only) |
| `readable` | Unwrap paragraphs, light cleanup |
| `llm` | Aggressive (dedupe, structure-aware) |
| `llm-safe` | Same as `llm` + automatic secret redaction |
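
To give a feel for the two lightest profiles, here is a sketch of what "whitespace only" normalization and paragraph unwrapping might look like. The exact rules tpress applies are not documented here, so both functions are guesses that only share a name with the profiles.

```python
import re


def lossless(text: str) -> str:
    """Guess at whitespace-only cleanup: line endings, trailing spaces, blank runs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip("\n") + "\n"


def readable(text: str) -> str:
    """Guess at paragraph unwrapping: join hard-wrapped lines within a paragraph."""
    paragraphs = re.split(r"\n{2,}", lossless(text).strip("\n"))
    return "\n\n".join(" ".join(p.split("\n")) for p in paragraphs) + "\n"
```

Unwrapping like this suits prose; applied to source code or lists it would merge lines that must stay separate, which is presumably why it is a separate profile.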

## Secret Redaction

The `llm-safe` profile automatically detects and redacts:

- API keys (OpenAI, AWS, Google Cloud, Azure)
- Tokens (GitHub, Slack, JWT)
- Private keys and connection strings
- High-entropy strings

```bash
# Force redaction on any profile
tpress . --redact

# Disable redaction
tpress . --no-redact
```
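
Redaction of this kind is typically a mix of known-format patterns and an entropy check. The sketch below shows that combination under stated assumptions: the two token formats are the publicly documented AWS access key ID and GitHub classic token shapes, while the placeholder text, the 32-character minimum, and the 4.0 bits-per-character threshold are illustrative choices, not tpress's actual rules.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; tpress's real rule set is not shown in this README.
PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "PRIVATE_KEY": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
}
# Long runs of token-like characters are candidates for the entropy check.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_-]{32,}")


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def redact(text: str, entropy_threshold: float = 4.0) -> str:
    # Known formats first, so they get a specific label.
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)

    def mask(match: re.Match) -> str:
        token = match.group(0)
        if shannon_entropy(token) >= entropy_threshold:
            return "[REDACTED:HIGH_ENTROPY]"
        return token

    return CANDIDATE.sub(mask, text)
```

An entropy check catches secrets with no recognizable prefix, at the cost of occasional false positives on long hashes or encoded data.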

## CLI Reference

```
tpress [PATH] [OPTIONS]

Arguments:
  PATH          File or directory to process

Options:
  -o, --out PATH          Output file path
  -c, --copy              Copy output to clipboard (no file created)
  -p, --profile TEXT      Compression profile [default: llm]
  -t, --token-limit INT   Soft token limit [default: 100000]
  -v, --verbose           Verbosity: -v steps, -vv detailed
  --redact/--no-redact    Enable/disable secret redaction
  -V, --version           Show version
  --help                  Show help

Subcommands:
  repo     Bundle a repository (full options)
  file     Compact a single file
  html     Extract text from HTML
```

### Clipboard Support

Copy output directly to clipboard instead of writing a file:

```bash
# Copy bundle to clipboard
tpress . -c

# Works with all modes
tpress file.py -c
tpress doc.html -c --html
```

### Token Counting

Token counts are displayed after each operation:

```
✓ bundle.md · 47k tokens · 19 files
```

Tokens are counted with [tiktoken](https://github.com/openai/tiktoken), OpenAI's BPE tokenizer, so counts for other model families are approximate.
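
A minimal sketch of counting and formatting follows. The `cl100k_base` encoding, the four-characters-per-token fallback, the rounding to whole thousands, and the function names are all assumptions; the README does not say which encoding or rounding tpress uses.

```python
import math


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken, falling back to a rough estimate."""
    try:
        import tiktoken

        return len(tiktoken.get_encoding(encoding_name).encode(text))
    except Exception:
        # tiktoken is missing or the encoding could not be loaded (e.g. offline):
        # fall back to the common heuristic of about 4 characters per token.
        return math.ceil(len(text) / 4)


def human_tokens(count: int) -> str:
    """Format 47312 as "47k"; small counts are shown as-is."""
    return f"{round(count / 1000)}k" if count >= 1000 else str(count)


def summary(path: str, tokens: int, file_count: int) -> str:
    """Build a status line in the style shown above."""
    return f"✓ {path} · {human_tokens(tokens)} tokens · {file_count} files"
```

Calling `summary("bundle.md", 47312, 19)` reproduces the status line shown above.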

## Roadmap

- [x] Repository bundling with adaptive token limits
- [x] Relevance-based file ordering
- [x] Secret redaction
- [ ] HTML pipeline: `extract | clean | outline`
- [ ] Language-aware strategies (AST-based)
- [ ] Custom redaction patterns via config

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT
