Metadata-Version: 2.4
Name: ocr-to-excel
Version: 0.1.2
Summary: Convert DepEd OCR PDFs to Excel.
License-Expression: LicenseRef-Proprietary
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: openpyxl
Requires-Dist: pillow
Requires-Dist: rich
Requires-Dist: colorama

# DepEdPDFtoExcel

Convert DepEd OCR PDFs to Excel using Ghostscript + Tesseract OCR.
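The two-stage flow (Ghostscript renders PDF pages to images, Tesseract reads text from each image) can be illustrated with the command lines below. This is a minimal sketch, not the code in `ocr_to_excel.py`: the helper names, the `png16m` device, and the output file pattern are assumptions chosen for illustration.

```python
from typing import List


def ghostscript_command(pdf_path: str, out_pattern: str,
                        dpi: int = 300, gs: str = "gswin64c") -> List[str]:
    """Build a Ghostscript command that renders each PDF page to a PNG.

    Hypothetical helper: the device and output pattern are illustrative.
    """
    return [
        gs,
        "-dNOPAUSE",            # do not pause between pages
        "-dBATCH",              # exit when the input is processed
        "-dSAFER",              # restrict file access during rendering
        "-sDEVICE=png16m",      # 24-bit colour PNG output
        f"-r{dpi}",             # rendering resolution in DPI
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def tesseract_command(image_path: str, out_base: str,
                      tesseract: str = "tesseract") -> List[str]:
    """Build a Tesseract command; text is written to `<out_base>.txt`."""
    return [tesseract, image_path, out_base]


if __name__ == "__main__":
    print(ghostscript_command("input.pdf", "page-%03d.png"))
    print(tesseract_command("page-001.png", "page-001"))
```

Either list can be passed to `subprocess.run(..., check=True)` once both tools are installed.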

## Requirements

- Python 3.9+
- Ghostscript (for PDF to image conversion)
- Tesseract OCR
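
To confirm that both external tools are reachable before running the converter, a quick check along these lines may help. It is a sketch only: `find_tools` is a hypothetical helper, and the executable names assume a Windows install (`gswin64c`, as in the `GS_PATH` description below); on other platforms Ghostscript is typically invoked as `gs`.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_tools(names: Sequence[str] = ("gswin64c", "tesseract")) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```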

## Quick start

```powershell
python -m pip install -r requirements.txt
python ocr_to_excel.py
```

## Automated dependency install (Windows)

The `scripts/install_dependencies.ps1` script installs Ghostscript and Tesseract using `winget`.

```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
./scripts/install_dependencies.ps1
```

Optionally, pass `-SetEnv` to set the user environment variables after installation (install paths may vary by version):

```powershell
./scripts/install_dependencies.ps1 -SetEnv
```

If `winget` is missing, install **App Installer** from the Microsoft Store and retry.

## Environment variables

- `TESSERACT_CMD` - path to `tesseract.exe` (optional)
- `GS_PATH` - path to `gswin64c.exe` (optional)
- `OCR_DPI` - image DPI (default 300)
- `CROP_TOP_RATIO` - top crop ratio (default 0.2)
- `CROP_BOTTOM_RATIO` - bottom crop ratio (default 0.9)
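
The variables and defaults above could be read as in the following sketch. The `OcrSettings` class and both functions are hypothetical, not part of the package. Treating the crop ratios as fractions of the page height (keep the band between 20% and 90% from the top) is an assumption about how the script uses them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class OcrSettings:
    tesseract_cmd: Optional[str]   # None means "rely on PATH"
    gs_path: Optional[str]         # None means "rely on PATH"
    dpi: int
    crop_top_ratio: float
    crop_bottom_ratio: float


def load_settings(env: Mapping[str, str] = os.environ) -> OcrSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return OcrSettings(
        tesseract_cmd=env.get("TESSERACT_CMD") or None,
        gs_path=env.get("GS_PATH") or None,
        dpi=int(env.get("OCR_DPI", "300")),
        crop_top_ratio=float(env.get("CROP_TOP_RATIO", "0.2")),
        crop_bottom_ratio=float(env.get("CROP_BOTTOM_RATIO", "0.9")),
    )


def crop_box(width: int, height: int, settings: OcrSettings) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) in pixels.

    Assumption: the ratios are fractions of the page height.
    """
    top = round(height * settings.crop_top_ratio)
    bottom = round(height * settings.crop_bottom_ratio)
    return (0, top, width, bottom)
```

The tuple order matches what Pillow's `Image.crop` expects, so `image.crop(crop_box(*image.size, settings))` would apply it to a rendered page.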

## Output

Generated Excel files are placed in `outputs/excel/`.
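
To see what a run produced, the output folder can be listed as in this sketch. `list_excel_outputs` is a hypothetical helper, and the `.xlsx` extension is an assumption based on the `openpyxl` dependency.

```python
from pathlib import Path
from typing import List


def list_excel_outputs(root: str = "outputs/excel") -> List[Path]:
    """Return the .xlsx files under `root`, sorted by name (empty if the folder is missing)."""
    directory = Path(root)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.xlsx"))


if __name__ == "__main__":
    for path in list_excel_outputs():
        print(path)
```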
