Metadata-Version: 2.4
Name: docorient
Version: 0.1.1
Summary: Document image orientation detection and correction using projection profile analysis and optional Tesseract OSD.
Project-URL: Homepage, https://github.com/cebraspe-lab/docorient
Project-URL: Repository, https://github.com/cebraspe-lab/docorient
Project-URL: Issues, https://github.com/cebraspe-lab/docorient/issues
Author: Cebraspe Lab
License-Expression: MIT
License-File: LICENSE
Keywords: correction,document,image,ocr,orientation,rotation,tesseract
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: pillow>=10.0
Requires-Dist: tqdm>=4.60
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.10; extra == 'ocr'
Description-Content-Type: text/markdown

# docorient

Document image orientation detection and correction.

Detects and fixes rotation (0°, 90°, 180°, 270°) in scanned document images using projection profile analysis and optional Tesseract OSD.

## Installation

```bash
pip install docorient
```

For 180° detection via Tesseract OSD:

```bash
pip install "docorient[ocr]"
```

> **Note:** The `[ocr]` extra requires the [Tesseract](https://github.com/tesseract-ocr/tesseract) binary to be installed on your system.

## Quick Start

### Detect orientation

```python
from PIL import Image
from docorient import detect_orientation

image = Image.open("document.jpg")
result = detect_orientation(image)

print(result.angle)     # 0, 90, 180, or 270
print(result.method)    # detection method used
print(result.reliable)  # confidence flag
```

### Correct a single image

```python
from PIL import Image
from docorient import correct_image

image = Image.open("document.jpg")
corrected = correct_image(image)
corrected.save("fixed.jpg")
```

### Correct with metadata

```python
from PIL import Image
from docorient import correct_image

image = Image.open("document.jpg")
result = correct_image(image, return_metadata=True)
print(result.orientation.angle)
result.image.save("fixed.jpg")
```

### Correct multi-page document (majority voting)

```python
from PIL import Image
from docorient import correct_document_pages

pages = [Image.open(f"page_{i}.jpg") for i in range(5)]
corrected_pages = correct_document_pages(pages)
```
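
The voting idea itself can be illustrated without the library. The following is a minimal sketch, assuming each page has already produced its own angle estimate. `majority_angle` is a hypothetical helper and not part of docorient; how the library weighs unreliable pages or breaks ties is not described here.

```python
from collections import Counter


def majority_angle(page_angles):
    """Return the most common angle among per-page estimates.

    Hypothetical helper for illustration only; not docorient's API.
    Ties go to the angle that appears first in the input.
    """
    if not page_angles:
        raise ValueError("page_angles must not be empty")
    counts = Counter(page_angles)
    angle, _count = counts.most_common(1)[0]
    return angle


# Five pages from one document: three say 90, two say 0.
votes = [0, 90, 90, 90, 0]
document_angle = majority_angle(votes)
```

Applying one shared angle to every page means a single misdetected page (for example, a nearly blank one) does not end up rotated differently from the rest of its document.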

### Batch process a directory

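> **Note (macOS/Windows):** `process_directory` uses multiprocessing internally.
> On these platforms worker processes start with the `spawn` method, which re-imports the main module,
> so always call it inside `if __name__ == "__main__":` when running as a script.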

```python
from docorient import process_directory, OrientationConfig

if __name__ == "__main__":
    config = OrientationConfig(workers=4, output_quality=95)
    summary = process_directory("./scans", output_dir="./fixed", config=config)

    print(f"Corrected: {summary.corrected}/{summary.total_pages}")
```

### CLI

```bash
docorient ./scans --output ./fixed --workers 4
docorient ./scans --dry-run
docorient ./scans --no-ocr --limit 100
```

## How It Works

1. **Projection profile analysis** detects 90° and 270° rotations by comparing horizontal vs vertical text energy
2. **Tesseract OSD** (optional) detects 180° rotation with confidence thresholding
3. **Majority voting** across pages of the same document improves reliability
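
The projection profile step can be sketched in plain Python. This is an illustration of the general technique, not docorient's implementation: the function names are hypothetical, the page is assumed to be already binarized into a grid of 0/1 ink values, and variance is used as one possible measure of "energy". Upright text produces alternating dense and empty rows, so the row profile varies much more than the column profile; for sideways text the relationship flips.

```python
from statistics import pvariance


def profile_energy(binary_rows):
    """Return (row_energy, column_energy) for a 2D grid of 0/1 ink values.

    Energy here is the population variance of each projection profile
    (an assumption made for this sketch).
    """
    row_profile = [sum(row) for row in binary_rows]
    column_profile = [sum(col) for col in zip(*binary_rows)]
    return pvariance(row_profile), pvariance(column_profile)


def looks_sideways(binary_rows):
    """True when the column profile carries more energy than the row profile."""
    row_energy, column_energy = profile_energy(binary_rows)
    return column_energy > row_energy


# Synthetic 8x8 "page": a solid text line on every other row.
upright = [[1] * 8 if r % 2 == 0 else [0] * 8 for r in range(8)]
# Transposing swaps rows and columns, which mimics a quarter turn.
sideways = [list(col) for col in zip(*upright)]

upright_is_sideways = looks_sideways(upright)
sideways_is_sideways = looks_sideways(sideways)
```

A comparison like this only separates upright-or-upside-down pages from sideways ones; it carries no information about which way is up, which is why a second signal such as OSD is needed for the 180° case.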

## Supported Formats

Any format readable by Pillow: JPEG, PNG, TIFF, BMP, GIF, WebP, and more.

## Configuration

```python
from docorient import OrientationConfig

config = OrientationConfig(
    osd_confidence_threshold=2.0,
    output_quality=92,
    max_osd_dimension=1200,
    projection_target_dimension=800,
    workers=4,
    resume_enabled=True,
)
```
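
To give a feel for what a setting like `osd_confidence_threshold` might control, here is a minimal sketch that parses Tesseract's plain-text OSD report and accepts a 180° result only above a threshold. The helper names are hypothetical, and the exact comparison docorient performs (for example `>=` versus `>`) is an assumption; only the report format comes from Tesseract itself.

```python
def parse_osd(osd_text):
    """Parse Tesseract's plain-text OSD report into a {key: value} dict."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            fields[key.strip()] = value.strip()
    return fields


def is_upside_down(osd_text, confidence_threshold=2.0):
    """Hypothetical check: trust a 180° result only if confidence is high enough."""
    fields = parse_osd(osd_text)
    rotate = int(fields.get("Rotate", 0))
    confidence = float(fields.get("Orientation confidence", 0.0))
    # Assumption: the threshold is inclusive.
    return rotate == 180 and confidence >= confidence_threshold


sample_report = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 12.54
Script: Latin
Script confidence: 3.33"""

accepted = is_upside_down(sample_report)
rejected = is_upside_down(sample_report, confidence_threshold=20.0)
```

Raising the threshold trades missed upside-down pages for fewer false flips, which is usually the safer direction for pages with little text.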

## License

MIT
