Metadata-Version: 2.4
Name: pdfimgq
Version: 0.1.1
Summary: Desktop and CLI tool for verifying the technical quality of raster images embedded in PDF documents.
Author: Petr Novák
License-Expression: AGPL-3.0-or-later
Project-URL: Issues, https://github.com/novax414/pdfimgq-issues/issues
Keywords: pdf,image-quality,dpi,pymupdf,pyside6
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyMuPDF>=1.26.6
Requires-Dist: Pillow>=11.3.0
Requires-Dist: numpy>=2.3.4
Requires-Dist: PySide6>=6.10.1
Dynamic: license-file

# PDFIMGQ — PDF Image Quality Checker

A Python desktop application and command-line tool that extracts images embedded in PDF files and evaluates their print-readiness using visual quality metrics such as effective DPI, shadow/highlight clipping, contrast, and colorfulness.

It is designed primarily for academic papers, theses, and professional publications, to help check that embedded figures and images meet commonly recommended quality thresholds.

---

## Features
- **Graphical User Interface (GUI):** A user-friendly desktop app for visual inspection and filtering.
- **Command Line Interface (CLI):** Automate analysis and generate CSV reports for multiple PDFs.
- **Data Extraction:** Automatically extracts and saves all images from PDFs in their original formats.
- **Comprehensive Metrics:** Computes effective DPI, tonal clipping, contrast, and color richness.
- **Smart Recommendations:** Provides non-blocking, heuristic-based tags (e.g., *Low DPI*, *High Shadow Clip*) to quickly spot problematic figures.

---

## Installation

`pdfimgq` requires Python 3.11 or higher. 

**Install via PyPI (Recommended):**
```bash
pip install pdfimgq
```

---

## Usage

### Graphical User Interface (GUI)

To launch the desktop application, run the following command in your terminal or command prompt:

```bash
pdfimgq --gui
```

**Workflow:**

1. Click **Select PDFs** to load your documents.
2. Select a PDF from the dropdown menu.
3. The table will populate with all detected images, their metrics, and recommendations. Click on any row to view the extracted image and detailed analysis.
4. Output files (extracted images and CSVs) are saved by default to the `PDF_Image_Quality_Outputs` folder in your `Documents` directory.

### Command Line Interface (CLI)

To run the tool directly from the terminal, use the `pdfimgq` command. Running `pdfimgq` without arguments prints help. For batch processing, pass `--input` and optionally `--outdir`.

```bash
# Check available options
pdfimgq --help

# Process a specific PDF and save results to a specific folder
pdfimgq --input ./MyThesis.pdf --outdir ./results

# Process all PDFs in a directory recursively
pdfimgq --input ./documents --outdir ./results --recursive
```
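
For scripted batch runs, the same commands can be assembled from Python. The following is a minimal sketch that uses only the `--input`, `--outdir`, and `--recursive` flags shown above; the helper names `build_command` and `run_batch` are hypothetical and not part of `pdfimgq`, and the sketch assumes the `pdfimgq` executable is on your `PATH`.

```python
import subprocess


def build_command(input_path, outdir=None, recursive=False):
    """Assemble a pdfimgq command line from the documented flags."""
    cmd = ["pdfimgq", "--input", str(input_path)]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    if recursive:
        cmd.append("--recursive")
    return cmd


def run_batch(input_path, outdir=None, recursive=False):
    """Run pdfimgq and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(input_path, outdir, recursive), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.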

---

## Output Files

For each analyzed PDF (e.g., `MyThesis.pdf`), the tool generates:

* `OUTPUT_DIR/MyThesis.csv`: A detailed CSV report with one row per image draw call (that is, one row for each time an image is placed on a page).
* `OUTPUT_DIR/MyThesis/`: A directory containing the extracted image files.
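
The naming scheme above can be sketched in Python to locate a report after a run. The helper names `expected_outputs` and `read_report` are hypothetical. The CSV column names are not documented here, so the sketch loads rows as generic dictionaries, and the UTF-8 encoding is an assumption.

```python
import csv
from pathlib import Path


def expected_outputs(pdf_path, outdir):
    """Return (csv_report_path, image_directory) for one analyzed PDF."""
    stem = Path(pdf_path).stem          # "MyThesis.pdf" -> "MyThesis"
    outdir = Path(outdir)
    return outdir / f"{stem}.csv", outdir / stem


def read_report(csv_path):
    """Load the report as a list of dicts keyed by the CSV header row."""
    # Assumption: the report is UTF-8 encoded.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `expected_outputs("./MyThesis.pdf", "./results")` yields `results/MyThesis.csv` and `results/MyThesis`.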

---

## Understanding the Metrics

### 1. Effective DPI

The image’s effective resolution based on its pixel dimensions and physical size on the PDF page. Higher DPI generally means a sharper print.

* **< 200 DPI:** Often too soft for quality print (`Low DPI`).
* **200–299 DPI:** Borderline; usually fine on screen, acceptable but not ideal for print (`Borderline DPI`).
* **≥ 300 DPI:** Commonly recommended for printed theses and high-quality figures.
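
The underlying arithmetic can be sketched as follows, assuming the placed size is known in PDF points (72 points per inch in default user space). The function name `effective_dpi` is hypothetical, and this is an illustration of the calculation rather than the tool's actual code.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space units per inch


def effective_dpi(pixel_width, pixel_height, width_pt, height_pt):
    """Return (horizontal, vertical) DPI of an image placed on a PDF page."""
    if width_pt <= 0 or height_pt <= 0:
        raise ValueError("placed size must be positive")
    dpi_x = pixel_width / (width_pt / POINTS_PER_INCH)
    dpi_y = pixel_height / (height_pt / POINTS_PER_INCH)
    return dpi_x, dpi_y
```

A 1200 × 900 pixel image placed at 288 × 216 pt (4 × 3 inches) comes out at 300 DPI in both directions. Scaling the same image up to 8 × 6 inches on the page halves that to 150 DPI, which is why effective DPI depends on placement and not only on the image file.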

### 2. Shadow & Highlight Clip [%]

Percentage of pixels that are exactly pure black (`0`) or exactly pure white (`255`). High values may mean lost detail.

* **< 0.5%:** Very good; detail is preserved.
* **0.5–5%:** Moderate clipping (`Moderate Shadow/Highlight Clip`). Usually acceptable, but check if important details are lost.
* **> 5%:** High clipping (`High Shadow/Highlight Clip`). Likely visible loss of detail.
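
A minimal sketch of this measurement with NumPy (already a dependency of `pdfimgq`) is shown below. It assumes an 8-bit grayscale array; whether the tool measures clipping on luminance or per channel is not stated here, so treat that choice as an assumption. The name `clip_percentages` is hypothetical.

```python
import numpy as np


def clip_percentages(gray):
    """Return (shadow %, highlight %) for an 8-bit grayscale image array."""
    gray = np.asarray(gray, dtype=np.uint8)
    if gray.size == 0:
        raise ValueError("empty image")
    # Count pixels that sit exactly at the two ends of the 0..255 range.
    shadow = 100.0 * np.count_nonzero(gray == 0) / gray.size
    highlight = 100.0 * np.count_nonzero(gray == 255) / gray.size
    return float(shadow), float(highlight)
```

For an 8-pixel image with two pixels at `0` and one at `255`, this returns 25% shadow clip and 12.5% highlight clip.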

### 3. Contrast (P1–P99) [%]

A simple contrast indicator based on the difference between the 1st and 99th percentile brightness levels, normalized to 0–100%.

* **< 5%:** Image likely looks very flat (`Low Contrast`).
* **5–10%:** Close to the low-contrast threshold (`Borderline Contrast`).
* **≥ 10%:** Not flagged by this heuristic.
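
As an illustration, the percentile spread can be computed with NumPy as below. The sketch assumes 8-bit brightness values (0–255) and NumPy's default linear interpolation for percentiles; the tool's exact percentile method is not documented here, and `contrast_p1_p99` is a hypothetical name.

```python
import numpy as np


def contrast_p1_p99(gray):
    """Return the P1-P99 brightness spread as a percentage of the 0..255 range."""
    gray = np.asarray(gray, dtype=np.float64)
    if gray.size == 0:
        raise ValueError("empty image")
    p1, p99 = np.percentile(gray, [1, 99])
    return float(100.0 * (p99 - p1) / 255.0)
```

Using the 1st and 99th percentiles instead of the minimum and maximum keeps a handful of stray dark or bright pixels from making a flat image look high-contrast.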

### 4. Colorfulness (HS)

An estimate of perceived colorfulness using the Hasler–Süsstrunk opponent-channel measure. Higher values indicate more vibrant colors; grayscale images are near zero.

* **< 15:** Not colorful or only slightly colorful; essentially grayscale or very muted (`Low Colorfulness`).
* **15–33:** Moderately colorful; clearly some color, but not vivid (`Limited Colorfulness`).
* **> 33:** Not flagged by this heuristic.
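
The Hasler–Süsstrunk measure, as it is commonly stated, combines the spread and the mean of two opponent channels, `rg = R - G` and `yb = 0.5 * (R + G) - B`. The sketch below follows that common formulation for an 8-bit RGB array. Whether `pdfimgq` uses exactly these constants is an assumption, and `colorfulness_hs` is a hypothetical name.

```python
import numpy as np


def colorfulness_hs(rgb):
    """Hasler-Suesstrunk colorfulness for an (..., 3) RGB array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                    # red-green opponent channel
    yb = 0.5 * (r + g) - b        # yellow-blue opponent channel
    std_root = np.hypot(rg.std(), yb.std())
    mean_root = np.hypot(rg.mean(), yb.mean())
    return float(std_root + 0.3 * mean_root)
```

For any image where `R == G == B` in every pixel, both opponent channels are zero everywhere, so the score is exactly zero, which matches the grayscale behavior described above.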

### 5. Color Richness [%]

An estimate of overall color diversity (0–100%), computed as normalized Shannon entropy of a quantized RGB histogram. Lower values suggest fewer effectively used colors.

* **< 20%:** Extremely limited palette, roughly fewer than 8 effective colors (`Low Color Richness`).
* **20–33.3%:** Limited palette, roughly 8–32 effective colors (`Limited Color Richness`).
* **> 33.3%:** Typical for most illustrations, photos, and visualizations.
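
A sketch of normalized histogram entropy is shown below. The quantization depth is an assumption: the effective-color figures quoted above (8 at 20%, 32 at 33.3%) are consistent with `2 ** H` for a 15-bit histogram, so the sketch defaults to 5 bits per channel, but the tool's actual bin count is not stated here. The name `color_richness` is hypothetical.

```python
import numpy as np


def color_richness(rgb, bits_per_channel=5):
    """Normalized Shannon entropy (0..100 %) of a quantized RGB histogram."""
    pixels = np.asarray(rgb, dtype=np.uint8).reshape(-1, 3)
    if pixels.shape[0] == 0:
        raise ValueError("empty image")
    # Keep only the top bits of each channel, then pack them into one bin index.
    q = (pixels >> (8 - bits_per_channel)).astype(np.int64)
    index = (q[:, 0] << (2 * bits_per_channel)) | (q[:, 1] << bits_per_channel) | q[:, 2]
    counts = np.bincount(index)
    p = counts[counts > 0] / index.size
    entropy_bits = float(np.sum(p * np.log2(1.0 / p)))
    max_bits = 3 * bits_per_channel   # entropy of a perfectly uniform histogram
    return 100.0 * entropy_bits / max_bits
```

Under this assumption, an image that uses 8 distinct colors equally often has 3 bits of entropy out of 15, which gives 20%.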

---

## Recommendations (Tag Meanings)

These tags are **friendly suggestions**, not strict errors. Always consider the actual content and purpose of your image. For instance, a purposefully minimalist diagram will naturally be flagged as having limited color richness.

| Tag | Condition | Note |
| --- | --- | --- |
| **Low DPI** | Under 200 in either dimension | May print blurry or pixelated. |
| **Borderline DPI** | 200–299 in either dimension | Consider higher resolution for professional print. |
| **Moderate Shadow Clip** | 0.5% – 5% | Check dark areas for lost detail. |
| **High Shadow Clip** | > 5% | Significant shadow detail loss. |
| **Moderate Highlight Clip** | 0.5% – 5% | Check bright areas for blown-out pixels. |
| **High Highlight Clip** | > 5% | Significant highlight detail loss. |
| **Low Contrast** | < 5% | Image lacks depth/punch. |
| **Borderline Contrast** | 5% – 10% | Image may appear somewhat flat. |
| **Low Colorfulness** | < 15 | Image is nearly grayscale. |
| **Limited Colorfulness** | 15 – 33 | Modest color intensity. |
| **Low Color Richness** | < 20% | Extremely limited color palette. |
| **Limited Color Richness** | 20% – 33.3% | Limited color palette. |
| **All Good** | None of the above | No notable technical issues detected by heuristics. |
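
The table can be read as a simple threshold cascade. The sketch below is one possible reading, not the tool's actual code: the function name `recommend` is hypothetical, DPI is judged by the lower of the two dimensions, and the handling of values that fall exactly on a boundary (for example exactly 5% clip or a colorfulness of exactly 33) is an assumption.

```python
def recommend(dpi_x, dpi_y, shadow_clip, highlight_clip,
              contrast, colorfulness, richness):
    """Map metric values to the tag names listed in the table."""
    tags = []

    worst_dpi = min(dpi_x, dpi_y)   # "in either dimension"
    if worst_dpi < 200:
        tags.append("Low DPI")
    elif worst_dpi < 300:
        tags.append("Borderline DPI")

    for kind, percent in (("Shadow", shadow_clip), ("Highlight", highlight_clip)):
        if percent > 5:
            tags.append(f"High {kind} Clip")
        elif percent >= 0.5:
            tags.append(f"Moderate {kind} Clip")

    if contrast < 5:
        tags.append("Low Contrast")
    elif contrast < 10:
        tags.append("Borderline Contrast")

    if colorfulness < 15:
        tags.append("Low Colorfulness")
    elif colorfulness <= 33:
        tags.append("Limited Colorfulness")

    if richness < 20:
        tags.append("Low Color Richness")
    elif richness <= 33.3:
        tags.append("Limited Color Richness")

    return tags or ["All Good"]
```

Because each metric is checked independently, a single image can carry several tags at once, and `All Good` appears only when no other tag applies.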

---

## Dependencies & License

This project relies on several open-source libraries, most notably **PyMuPDF**, which is dual-licensed under the GNU AGPLv3 or a commercial license from Artifex. If you use `pdfimgq` under its open-source license, you must comply with the AGPLv3 requirements.

`pdfimgq` itself is released under the **GNU Affero General Public License, version 3 or later (AGPL-3.0-or-later)**. See the `LICENSE` file for more details.

