Metadata-Version: 2.4
Name: pagefuse
Version: 0.1.1
Summary: Your pages, your way — PDF, DOCX, images and more
Home-page: https://github.com/raptorgold14/pagefuse
Author: RaptorGold
Author-email: hello@pagefuse.net
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Office/Business
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pikepdf>=8.0.0
Requires-Dist: python-docx>=1.0.0
Requires-Dist: python-pptx>=0.6.0
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: img2pdf>=0.4.0
Requires-Dist: pypdfium2>=4.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: markdown>=3.4.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PageFuse

Your pages, your way. Combine pages from PDFs, Word docs, PowerPoint slides, images, Markdown, and more into a single output.

## Supported Input Formats

| Format | Extensions | Requires LibreOffice |
|--------|-----------|----------------------|
| PDF | `.pdf` | No |
| Images | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.tif` | No |
| Markdown | `.md`, `.markdown` | No |
| Word | `.docx`, `.doc` | Yes |
| PowerPoint | `.pptx`, `.ppt` | Yes |
| OpenDocument | `.odt`, `.odp` | Yes |
| Web | `.html` | Yes |

## Supported Output Formats

Output format is determined by the file extension in your config or command:

| Format | Extension | Requires LibreOffice | Notes |
|--------|-----------|----------------------|-------|
| PDF | `.pdf` | No | Default — fast, lossless |
| Image | `.png`, `.jpg`, `.tiff` | No | Single page → file; multi-page → `.png.zip` |
| HTML | `.html` | No | Self-contained — pages rendered as embedded images |
| Word | `.docx`, `.odt` | Yes | Not valid output from presentation sources |
| Presentation | `.odp` | Yes | Not valid output from word-processor sources |

## Installation

```bash
# Linux (recommended — avoids system Python restrictions)
pipx install pagefuse

# macOS
pip install pagefuse

# Windows
pip install pagefuse

# Or inside a virtual environment (any platform)
python3 -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install pagefuse

# Via Cargo (requires Python 3.9+ on PATH)
cargo install pagefuse
```

> **Linux note:** If you see `error: externally-managed-environment`, use `pipx` instead of `pip`.
> Install pipx with: `sudo apt install pipx && pipx ensurepath`

### Uninstall

```bash
pipx uninstall pagefuse   # if installed via pipx
pip uninstall pagefuse    # if installed via pip
```

### LibreOffice

**LibreOffice** is required only for Word, PowerPoint, OpenDocument, and HTML input and for
Word and OpenDocument output. PDF, image, and Markdown input and PDF, image, and HTML output
all work without it.

```bash
# Ubuntu / Debian
sudo apt install libreoffice

# macOS
brew install --cask libreoffice

# Windows
# Download from https://www.libreoffice.org/download and add soffice.exe to PATH
```
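
To confirm that a LibreOffice executable is reachable before running a conversion, a quick check along these lines may help. This is a minimal sketch: the helper name `find_libreoffice` is hypothetical, and the candidate executable names (`soffice`, `libreoffice`) are an assumption, since how PageFuse itself locates LibreOffice is not described here.

```python
import shutil
from typing import Optional


def find_libreoffice() -> Optional[str]:
    """Return the path of a LibreOffice executable found on PATH, or None."""
    # Assumed executable names; PageFuse's own lookup may differ.
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    path = find_libreoffice()
    print(path if path else "LibreOffice not found on PATH")
```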

## Usage

### Global options

These options apply to all subcommands and must be placed before the subcommand name:

```bash
pagefuse [OPTIONS] COMMAND [ARGS]...
```

| Option | Default | Description |
|--------|---------|-------------|
| `--lo-timeout SECS` | `300` | LibreOffice conversion timeout in seconds. No hard limit — increase for large or complex files. |
| `--version` | | Print version and exit. |

Example:

```bash
pagefuse --lo-timeout 600 assemble output.docx big_report.pdf:all
```

### Assemble documents

Combine pages from multiple documents into one output. Pass a `.fuse` config file, or use inline arguments:

```bash
# Inline — output first, then sources
pagefuse assemble output.pdf cover.pdf:1 terms.docx:all pricing.pdf:1-3 slides.pptx:2,4,6

# Export as Word document
pagefuse assemble output.docx cover.pdf:1 terms.docx:all

# Export as self-contained HTML
pagefuse assemble output.html report.pdf:1-5

# Export as images (multi-page → output.png.zip)
pagefuse assemble output.png report.pdf:1-3

# From a config file
pagefuse assemble board_pack.fuse

# Preview without writing any files
pagefuse assemble --dry-run board_pack.fuse
pagefuse assemble --dry-run output.pdf cover.pdf:1 terms.docx:all
```

Each source is `file:pages`. Omit `:pages` to include all pages.
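
The `file:pages` convention can be sketched in Python as follows. This is an illustration only, not PageFuse's actual parser: the function name `split_source_arg` is hypothetical, and the handling of colons inside paths (such as Windows drive letters) is an assumption.

```python
from typing import Tuple


def split_source_arg(arg: str) -> Tuple[str, str]:
    """Split 'file:pages' into (file, pages); pages defaults to 'all'."""
    path, sep, spec = arg.rpartition(":")
    # Treat the suffix as a page spec only if it looks like one; otherwise
    # the colon is assumed to belong to the path (e.g. a drive letter).
    looks_like_spec = spec == "all" or (
        spec != "" and all(c.isdigit() or c in ",-" for c in spec)
    )
    if sep and path and looks_like_spec:
        return path, spec
    return arg, "all"


if __name__ == "__main__":
    for arg in ("cover.pdf:1", "terms.docx", "slides.pptx:2,4,6"):
        print(split_source_arg(arg))
```

Splitting on the last colon rather than the first keeps paths that themselves contain a colon intact.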

Example `board_pack.fuse`:

```
# Output format is determined by the extension (.pdf, .docx, .html, .png, …)
# Add multiple output: lines to export to several formats in one run.
output: board_pack.pdf
output: board_pack.docx
output: board_pack_preview.png

# Metadata (title defaults to output filename if omitted)
title:   Q4 Board Pack
author:  Finance Team
subject: Board meeting materials

file: templates/cover_letter.pdf       1
file: reports/financial_data.docx      all
file: slides/main_deck.pptx            1-4
file: reports/charts.pdf               3,5,7
file: templates/signature_page.pdf     1
```
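
For readers who want to generate or inspect `.fuse` files programmatically, the layout above can be read with a few lines of Python. This is a rough sketch based only on the example shown: the function name `parse_fuse` and the returned dictionary shape are hypothetical, and the real grammar (quoting, paths containing spaces, a `file:` line with no page spec) may differ.

```python
from typing import Dict, List, Tuple


def parse_fuse(text: str) -> Dict[str, object]:
    """Parse assemble-style .fuse text into outputs, files, and metadata."""
    outputs: List[str] = []
    files: List[Tuple[str, str]] = []
    meta: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        key, value = key.strip().lower(), value.strip()
        if key == "output":
            outputs.append(value)
        elif key == "file":
            # Assumption: the page spec is the last whitespace-separated token.
            parts = value.rsplit(None, 1)
            files.append((parts[0], parts[1]) if len(parts) == 2 else (value, "all"))
        else:
            meta[key] = value  # title, author, subject, ...
    return {"outputs": outputs, "files": files, "meta": meta}
```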

### Split a document into parts

Extract pages from one document into multiple outputs. Pass a `.fuse` config file, or use inline arguments:

```bash
# Inline — source first, then outputs with page specs
pagefuse split report.pdf cover.pdf:1 body.pdf:2-10 appendix.pdf:11-20

# Each output can be a different format
pagefuse split report.pdf summary.pdf:1 full.docx:all preview.png:1

# From a config file
pagefuse split split.fuse

# Preview without writing any files
pagefuse split --dry-run report.pdf cover.pdf:1 body.pdf:2-10
pagefuse split --dry-run split.fuse
```

> **Note:** Images (`.png`, `.jpg`, etc.) cannot be used as split sources. Presentation sources
> (`.pptx`, `.ppt`, `.odp`) cannot produce word-processor outputs (`.docx`, `.odt`), and
> word-processor sources cannot produce `.odp`. Run `pagefuse info <file>` to see what is
> supported for a given file.
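
The rules in the note above can be expressed as a small lookup. The following is a sketch, not PageFuse's implementation: `split_allowed` is a hypothetical name, and treating `.docx`, `.doc`, and `.odt` as the word-processor sources is an assumption drawn from the input table. It does not check whether the output extension is supported at all.

```python
from pathlib import Path

IMAGE = {".png", ".jpg", ".jpeg", ".tiff", ".tif"}
PRESENTATION = {".pptx", ".ppt", ".odp"}
WORD_PROCESSOR = {".docx", ".doc", ".odt"}  # assumed set of word-processor sources


def split_allowed(source: str, output: str) -> bool:
    """Return False for source/output pairs the note above rules out."""
    src = Path(source).suffix.lower()
    out = Path(output).suffix.lower()
    if src in IMAGE:
        return False  # images cannot be split sources
    if src in PRESENTATION and out in {".docx", ".odt"}:
        return False  # presentation -> word-processor output
    if src in WORD_PROCESSOR and out == ".odp":
        return False  # word-processor -> presentation output
    return True
```

`pagefuse info <file>` remains the authoritative way to see what a given file supports.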

Example `split.fuse`:

```
source: annual_report.pdf

# Metadata (optional — defaults to source file metadata)
title:   Annual Report
author:  Finance Team

output: cover.pdf              1
output: executive_summary.pdf  2-5
output: financials.pdf         6-20
output: appendix.docx          21-30
output: cover_preview.png      1
```

On the command line, each output is `file:pages`. Omit `:pages` to copy all pages.

### Generate a config template

Use `pagefuse init` to generate a starter `.fuse` file:

```bash
pagefuse init                            # assemble config → config.fuse
pagefuse init --output board_pack.fuse   # custom filename

pagefuse init --split                    # split config → config.fuse
pagefuse init --split --output split.fuse
```

The `--split` flag generates a split-style template (with `source:` and `output:` lines) instead of the default assemble-style template (with `file:` and `output:` lines).

### Inspect a document

Show page count, metadata, and format support for one or more files:

```bash
pagefuse info report.pdf
pagefuse info report.pdf slides.pptx photo.png
```

Output includes a **Format Support** table showing which commands accept the file as input and what output formats are available:

```
  File    slides.pptx
  Format  PPTX
  Pages   12

              Format Support
 ┌──────────┬───────────────┬─────────────────────────────────────┐
 │ Command  │ Input support │ Output support                      │
 ├──────────┼───────────────┼─────────────────────────────────────┤
 │ assemble │ yes           │ .html  .jpg  .jpeg  .odp  .pdf  ... │
 │ split    │ yes           │ .html  .jpg  .jpeg  .odp  .pdf  ... │
 └──────────┴───────────────┴─────────────────────────────────────┘
```

### Version

```bash
pagefuse --version
```

## Page Specification Syntax

| Spec        | Meaning                  |
|-------------|--------------------------|
| `all`       | Every page               |
| `5`         | Page 5 only              |
| `1-3`       | Pages 1 through 3        |
| `1,3,5`     | Pages 1, 3, and 5        |
| `1-3,5,7-9` | Mixed ranges and singles |

Page numbers are 1-based.
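
As an illustration of the syntax in the table, the sketch below expands a spec into a list of 1-based page numbers. It is not PageFuse's parser: `parse_page_spec` is a hypothetical name, and keeping pages in the order written (including any duplicates) is an assumption.

```python
from typing import List


def parse_page_spec(spec: str, page_count: int) -> List[int]:
    """Expand a spec such as '1-3,5,7-9' into 1-based page numbers."""
    spec = spec.strip()
    if spec == "all":
        return list(range(1, page_count + 1))
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"'{part}' is out of range (1-{page_count})")
        pages.extend(range(start, end + 1))
    return pages


if __name__ == "__main__":
    print(parse_page_spec("1-3,5,7-9", page_count=10))
```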

## Error Handling

PageFuse validates all inputs before any work starts:

- **File not found** — all missing files are reported together
- **Wrong format** — unsupported extensions are caught early
- **Invalid page spec** — space instead of colon (e.g. `file.pdf 1`) is detected and corrected
- **Invalid page range** — all out-of-range specs across all files are reported at once, showing the filename and its actual page count
- **Output format** — unsupported output extensions are caught before conversion begins
- **Format incompatibility** — presentation sources cannot produce word-processor outputs, and word-processor sources cannot produce presentation outputs; `pagefuse info <file>` shows what is allowed

Example:

```
Error: Page specification errors:
  range '2-50' is invalid in 'report.pdf' (12 pages total)
  page 15 does not exist in 'cover.pdf' (3 pages total)
```
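
The "report everything at once" behaviour can be sketched as a validation pass that collects messages instead of stopping at the first failure. This is an assumption about the approach, not PageFuse's code: `collect_page_errors` and its input shape are hypothetical, and the message wording simply mirrors the example above.

```python
from typing import List, Tuple


def collect_page_errors(requests: List[Tuple[str, str, int]]) -> List[str]:
    """Check every (filename, spec, page_count) and return all error messages."""
    errors: List[str] = []
    for filename, spec, page_count in requests:
        if spec.strip() == "all":
            continue
        for part in spec.split(","):
            part = part.strip()
            total = f"({page_count} pages total)"
            if "-" in part:
                start, end = (int(n) for n in part.split("-", 1))
                if start < 1 or end < start or end > page_count:
                    errors.append(f"range '{part}' is invalid in '{filename}' {total}")
            else:
                page = int(part)
                if page < 1 or page > page_count:
                    errors.append(f"page {page} does not exist in '{filename}' {total}")
    return errors


if __name__ == "__main__":
    problems = collect_page_errors([("report.pdf", "2-50", 12), ("cover.pdf", "15", 3)])
    if problems:
        print("Error: Page specification errors:")
        for message in problems:
            print(f"  {message}")
```

Collecting messages in a list lets a user fix every bad spec in one pass rather than rerunning the command after each error.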

## Performance

- **Parallel input conversion** — up to 4 source files converted simultaneously
- **Parallel output writing** — multiple output formats written simultaneously
- **Resource estimation** — estimated peak memory (worst-case concurrent footprint) and disk usage shown before work starts; warns if disk space is tight
- **Live progress table** — shows all tasks upfront with animated spinner on active tasks, checkmark on completed, file sizes, memory usage, and elapsed time per task
- **Thread-safe rendering** — pypdfium2 rendering serialised to prevent crashes on concurrent image/HTML output
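
The general pattern behind bounded parallel conversion with serialised rendering can be sketched as a thread pool plus a lock. This is a generic illustration, not PageFuse's internals: `convert_source` and `render_page` are hypothetical placeholders that only build strings, standing in for the real LibreOffice and pypdfium2 work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List

MAX_WORKERS = 4                  # "up to 4 source files" from the list above
_render_lock = threading.Lock()  # only one render runs at a time


def convert_source(path: str) -> str:
    """Placeholder for converting one source file to an intermediate PDF."""
    return path + ".pdf"


def render_page(pdf_path: str, page: int) -> str:
    """Placeholder render step, guarded so calls never overlap."""
    with _render_lock:
        return f"{pdf_path}#page{page}"


def convert_all(paths: List[str]) -> List[str]:
    """Convert sources concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(convert_source, paths))


if __name__ == "__main__":
    print(convert_all(["terms.docx", "slides.pptx"]))
```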

## Development

```bash
git clone https://github.com/raptorgold14/pagefuse.git
cd pagefuse
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install -e .
```

Run tests:

```bash
pytest
```

See `examples/` for sample `.fuse` configs and `examples/generate_pdfs.py` to regenerate fixture files.

## License

MIT
