Metadata-Version: 2.1
Name: pdf2zh
Version: 1.6.5
Summary: Latex PDF Translator
Home-page: https://github.com/Byaidu/PDFMathTranslate
Author: Byaidu
Author-email: byaidux@gmail.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: charset-normalizer
Requires-Dist: cryptography
Requires-Dist: requests
Requires-Dist: pymupdf
Requires-Dist: tqdm
Requires-Dist: tenacity
Requires-Dist: doclayout-yolo
Requires-Dist: numpy
Requires-Dist: ollama

# PDFMathTranslate

<p align="center">
  <!-- PyPI -->
  <a href="https://pypi.org/project/pdf2zh/">
    <img src="https://img.shields.io/pypi/v/pdf2zh"/>
  </a>
  <!-- License -->
  <a href="./LICENSE">
    <img src="https://img.shields.io/github/license/Byaidu/PDFMathTranslate"/>
  </a>
</p>

Translate scientific papers in PDF format and generate a bilingual document for comparison.

- 📊 Retain formulas and charts.

- 📄 Preserve the table of contents.

- 🌐 Support multiple translation services.

## Installation

```bash
pip install pdf2zh
```

## Usage

Run the translation command from the command line. It generates the translated document `example-zh.pdf` and the bilingual document `example-dual.pdf` in the current directory.

### Translate the entire document

```bash
pdf2zh example.pdf
```
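
To call the command from a script, a minimal Python sketch could look like the following. The helper names `expected_outputs` and `translate` are hypothetical and not part of pdf2zh. The output file names follow the `-zh` and `-dual` naming described above, and the sketch assumes the `pdf2zh` executable is on `PATH`.

```python
import subprocess
from pathlib import Path


def expected_outputs(pdf: Path) -> tuple[Path, Path]:
    """Return the (translated, bilingual) paths described in the usage notes.

    Assumption: both files are written to the current working directory and
    are named after the input file's stem.
    """
    cwd = Path.cwd()
    return cwd / f"{pdf.stem}-zh.pdf", cwd / f"{pdf.stem}-dual.pdf"


def translate(pdf: Path) -> tuple[Path, Path]:
    """Run `pdf2zh <pdf>` and return the paths where output is expected."""
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(["pdf2zh", str(pdf)], check=True)
    return expected_outputs(pdf)
```

For example, `translate(Path("example.pdf"))` would run the command shown above and return the paths of `example-zh.pdf` and `example-dual.pdf` in the current directory.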

### Translate part of the document

```bash
pdf2zh example.pdf -p 1-3,5
```
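
The page specification combines ranges and single pages. The sketch below shows one way such a string can be expanded. It is an illustration only, not pdf2zh's own parser, and it assumes that ranges are inclusive and that page numbers start at 1. The function name `parse_pages` is hypothetical.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-3,5" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # Assumption: "a-b" means every page from a to b, inclusive.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1-3,5"))  # [1, 2, 3, 5]
```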

### Translate with specified source and target languages

See [Language Codes](https://developers.google.com/admin-sdk/directory/v1/languages).

```bash
pdf2zh example.pdf -li en -lo ja
```

### Translate with Ollama

See [Ollama](https://github.com/ollama/ollama).

```bash
pdf2zh example.pdf -s gemma2
```

### Use regex to specify formula fonts and characters to preserve

```bash
pdf2zh BDA3.pdf -f "(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```
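
To see what these two patterns select, they can be tried against sample font names and characters with Python's `re` module. This is a sketch for experimenting with patterns before passing them to `-f` and `-c`. It assumes the patterns are matched from the start of the string (`re.match`), which may differ from how pdf2zh applies them. The helper names and the sample font names are illustrative.

```python
import re

# The patterns from the command above, as Python raw strings.
FONT_PATTERN = r"(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_formula_font(name: str) -> bool:
    """True if the font name matches the formula-font pattern."""
    return re.match(FONT_PATTERN, name) is not None


def is_formula_char(ch: str) -> bool:
    """True if the character matches the preserved-character pattern."""
    return re.match(CHAR_PATTERN, ch) is not None


# Computer Modern math fonts (CMMI, CMSY) match; the roman and typewriter
# text fonts (CMR, CMTT) are excluded by the [^RT] character class.
for font in ["CMMI10", "CMSY10", "CMR10", "CMTT10", "MSBM10", "Times-Italic"]:
    print(font, is_formula_font(font))

# Brackets, operators, digits, and non-ASCII symbols match; plain letters do not.
for ch in ["(", "+", "7", "α", "a"]:
    print(repr(ch), is_formula_char(ch))
```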

## Preview

![image](https://github.com/user-attachments/assets/57e1cde6-c647-4af8-8f8f-587a40050dde)

![image](https://github.com/user-attachments/assets/0e6d7e44-18cd-443a-8a84-db99edf2c268)

![image](https://github.com/user-attachments/assets/5fe6af83-2f5b-47b1-9dd1-4aee6bc409de)

## Acknowledgement

Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)

Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)

Document extraction: [MinerU](https://github.com/opendatalab/MinerU)

Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)

Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)

## Star History

<a href="https://star-history.com/#Byaidu/PDFMathTranslate&Date">
 <picture>
   <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date&theme=dark" />
   <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" />
   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" />
 </picture>
</a>
