Metadata-Version: 2.4
Name: pagefuse
Version: 0.1.0
Summary: Fuse any document, any format — PDF, DOCX, PPTX, images and more
Home-page: https://github.com/raptorgold14/pagefuse
Author: RaptorGold
Author-email: hello@pagefuse.net
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Office/Business
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pikepdf>=8.0.0
Requires-Dist: python-docx>=1.0.0
Requires-Dist: python-pptx>=0.6.0
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: img2pdf>=0.4.0
Requires-Dist: pypdfium2>=4.0.0
Requires-Dist: requests>=2.28.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PageFuse

Fuse any document, any format. Combine pages from PDFs, Word docs, PowerPoint slides, images, and more into a single output.

## Supported Input Formats

| Format | Extensions | Requires LibreOffice |
|--------|-----------|----------------------|
| PDF | `.pdf` | No |
| Images | `.png`, `.jpg`, `.jpeg`, `.tiff` | No |
| Word | `.docx`, `.doc` | Yes |
| PowerPoint | `.pptx`, `.ppt` | Yes |
| OpenDocument | `.odt`, `.odp`, `.ods` | Yes |
| Other | `.rtf`, `.html`, `.csv`, `.xlsx`, `.xls` | Yes |

## Supported Output Formats

The output format is determined by the extension of the output filename given in your config file or on the command line:

| Format | Extension | Requires LibreOffice | Notes |
|--------|-----------|----------------------|-------|
| PDF | `.pdf` | No | Default — fast, lossless |
| Image | `.png`, `.jpg`, `.tiff` | No | Single page → file; multi-page → ZIP |
| Word | `.docx`, `.odt` | Yes | |
| PowerPoint | `.pptx`, `.odp` | Yes | |
| Web | `.html` | Yes | |
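
The table above can be read as a simple lookup from extension to "needs LibreOffice or not". The following is a minimal sketch of that lookup, not PageFuse's actual code: the function name `output_needs_libreoffice` and the two set names are hypothetical, and the extension groups are copied from the table.

```python
from pathlib import Path

# Extension groups copied from the output table; the names are illustrative only.
NATIVE_OUTPUTS = {".pdf", ".png", ".jpg", ".tiff"}
LIBREOFFICE_OUTPUTS = {".docx", ".odt", ".pptx", ".odp", ".html"}


def output_needs_libreoffice(output_path: str) -> bool:
    """Return True if the output extension is listed as requiring LibreOffice."""
    suffix = Path(output_path).suffix.lower()
    if suffix in NATIVE_OUTPUTS:
        return False
    if suffix in LIBREOFFICE_OUTPUTS:
        return True
    # Assumption: extensions outside the table are rejected.
    raise ValueError(f"unsupported output extension: {suffix!r}")
```

For example, `output_needs_libreoffice("board_pack.pdf")` returns `False`, while `output_needs_libreoffice("board_pack.docx")` returns `True`.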

## Installation

```bash
# Linux (recommended — avoids system Python restrictions)
pipx install pagefuse

# macOS
pip install pagefuse

# Windows
pip install pagefuse

# Or inside a virtual environment (Linux/macOS shown; on Windows, activate with venv\Scripts\activate)
python3 -m venv venv && source venv/bin/activate
pip install pagefuse
```

> **Linux note:** If you see `error: externally-managed-environment`, use `pipx` instead of `pip`.
> Install pipx with: `sudo apt install pipx && pipx ensurepath`

### Uninstall

```bash
pipx uninstall pagefuse   # if installed via pipx
pip uninstall pagefuse    # if installed via pip
```

### LibreOffice

**LibreOffice** is required only for Office/OpenDocument/HTML formats. PDF and image
formats work on all platforms without it.

```bash
# Ubuntu / Debian
sudo apt install libreoffice

# macOS
brew install --cask libreoffice

# Windows
# Download from https://www.libreoffice.org/download and add soffice.exe to PATH
```
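
To confirm that LibreOffice is reachable before converting Office formats, you can check whether the `soffice` executable is on your `PATH`. This is a small sketch using only the standard library; the helper name `find_soffice` is hypothetical, and how PageFuse itself locates LibreOffice is not documented here.

```python
import shutil
from typing import Optional


def find_soffice(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path to the soffice executable, or None if it is not found.

    search_path defaults to the PATH environment variable. On Windows,
    shutil.which also tries the extensions in PATHEXT, so soffice.exe is matched.
    """
    return shutil.which("soffice", path=search_path)


if __name__ == "__main__":
    location = find_soffice()
    if location is None:
        print("soffice not found on PATH; only PDF and image formats will work")
    else:
        print(f"soffice found at {location}")
```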

## Usage

### Quick assembly (no config file)

```bash
# Assemble into a PDF
pagefuse quick output.pdf cover.pdf:1 terms.docx:all pricing.pdf:1-3 slides.pptx:2,4,6

# Assemble and export as Word document
pagefuse quick output.docx cover.pdf:1 terms.docx:all

# Assemble and export as images (multi-page → output.zip)
pagefuse quick output.png report.pdf:1-3
```

Each source is `file:pages`. Omit `:pages` to include all pages.
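
As an illustration of the `file:pages` convention, here is a minimal sketch of how such an argument could be split. It is not PageFuse's own parser: `split_source` and `looks_like_page_spec` are hypothetical names, and the handling of Windows drive letters is an assumption made for the example.

```python
from typing import Tuple


def looks_like_page_spec(text: str) -> bool:
    """True for 'all' or a non-empty string of digits, commas, and hyphens."""
    return text == "all" or (text != "" and all(c in "0123456789,-" for c in text))


def split_source(arg: str) -> Tuple[str, str]:
    """Split 'file:pages' into (file, pages); pages defaults to 'all'."""
    path, sep, spec = arg.rpartition(":")
    # Only treat the tail as a page spec if it looks like one, so that a
    # Windows path such as C:\docs\a.pdf is not split at the drive colon.
    if sep and looks_like_page_spec(spec):
        return path, spec
    return arg, "all"
```

With this sketch, `split_source("slides.pptx:2,4,6")` gives `("slides.pptx", "2,4,6")` and `split_source("terms.docx")` gives `("terms.docx", "all")`.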

### Config file assembly

```bash
pagefuse assemble board_pack.fuse
```

Example `board_pack.fuse`:

```
# Output format is determined by the extension (.pdf, .docx, .html, .png, …)
# Add multiple output: lines to export to several formats in one run.
output: board_pack.pdf
output: board_pack.docx
output: board_pack_preview.png

# Metadata (title defaults to output filename if omitted)
title:   Q4 Board Pack
author:  Finance Team
subject: Board meeting materials

file: templates/cover_letter.pdf       1
file: reports/financial_data.docx      all
file: slides/main_deck.pptx            1-4
file: reports/charts.pdf               3,5,7
file: templates/signature_page.pdf     1
```
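
To make the structure of an assemble config concrete, the sketch below reads the example into a dictionary. It is inferred from the example only and is not PageFuse's parser: the function name, the returned layout, and the rule that a `file:` line without a page spec means `all` are assumptions.

```python
def parse_assemble_config(text: str) -> dict:
    """Parse 'key: value' lines from a .fuse assemble config (illustrative only)."""
    config = {"outputs": [], "files": [], "meta": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        key, value = key.strip().lower(), value.strip()
        if key == "output":
            config["outputs"].append(value)
        elif key == "file":
            # The page spec is the last whitespace-separated token, if it looks like one.
            parts = value.rsplit(None, 1)
            is_spec = len(parts) == 2 and (
                parts[1] == "all" or all(c in "0123456789,-" for c in parts[1])
            )
            # Assumption: a missing page spec means every page.
            config["files"].append((parts[0], parts[1]) if is_spec else (value, "all"))
        else:
            config["meta"][key] = value  # title, author, subject, ...
    return config
```

Applied to the example above, `config["outputs"]` would hold the three output filenames and `config["files"][0]` would be `("templates/cover_letter.pdf", "1")`.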

Generate a template config:

```bash
pagefuse init                        # assemble config (default) → config.fuse
pagefuse init --split                # split config → config.fuse
pagefuse init --output my.fuse       # custom filename
pagefuse init --split --output split.fuse
```

### Split a document into parts

Inline (no config file):

```bash
pagefuse split report.pdf cover.pdf:1 body.pdf:2-10 appendix.pdf:11-20

# Each output can be a different format
pagefuse split report.pdf summary.pdf:1 full.docx:all preview.png:1
```

Or use a `.fuse` config file:

```bash
pagefuse split split.fuse
```

Example `split.fuse`:

```
source: annual_report.pdf

output: cover.pdf              1
output: executive_summary.pdf  2-5
output: financials.pdf         6-20
output: appendix.docx          21-30
output: cover_preview.png      1
```

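On the command line, each output is `file:pages`; omit `:pages` to copy all pages. In a `.fuse` file, the page spec follows the filename, separated by whitespace, as in the example above.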

### Inspect a document

```bash
pagefuse info report.pdf
pagefuse info slides.pptx
pagefuse info photo.png
```

## Page Specification Syntax

| Spec        | Meaning                  |
|-------------|--------------------------|
| `all`       | Every page               |
| `5`         | Page 5 only              |
| `1-3`       | Pages 1 through 3        |
| `1,3,5`     | Pages 1, 3, and 5        |
| `1-3,5,7-9` | Mixed ranges and singles |

Page numbers are 1-based.
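
The syntax in the table can be expanded into an explicit page list. The following is a minimal sketch of one way to do that, not PageFuse's implementation: `parse_page_spec` is a hypothetical name, and raising `ValueError` for out-of-range or reversed ranges is an assumption about the desired behavior.

```python
from typing import List


def parse_page_spec(spec: str, page_count: int) -> List[int]:
    """Expand a page spec such as '1-3,5,7-9' into a list of 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))
    pages = []
    for part in spec.split(","):
        part = part.strip()
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        # Assumption: reversed or out-of-range parts are treated as errors.
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_spec("1-3,5,7-9", page_count=10))  # [1, 2, 3, 5, 7, 8, 9]
```

Pages are returned in the order written, so `3,1` yields `[3, 1]` in this sketch; whether PageFuse preserves that order is not stated here.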

## Development

```bash
git clone https://github.com/raptorgold14/pagefuse.git
cd pagefuse
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install -e .
```

Run tests:

```bash
pytest
```

See `examples/` for sample `.fuse` configs; run `examples/generate_pdfs.py` to regenerate the fixture files.

## Roadmap

| Phase | Scope | Status |
|-------|-------|--------|
| 1 — CLI MVP | PDF, DOCX, PPTX, images, OpenDocument, multi-format output | ✅ Done |
| 2 — Validate | Launch, gather feedback, iterate | 🔜 Next |
| 3 — Native formats | Pure Python DOCX/PPTX extraction (no LibreOffice dep) | Planned |
| 4 — GUI | Tauri-based GUI wrapping the CLI | Planned |

## License

MIT
