Metadata-Version: 2.1
Name: rapidocr-pdf
Version: 0.0.1
Summary: Tools for extracting PDF content based on RapidOCR
Home-page: https://github.com/RapidAI/RapidOCRPDF
Author: SWHL
Author-email: liekkaskono@163.com
License: Apache-2.0
Keywords: rapidocr_pdf,rapidocr_onnxruntime,ocr,onnxruntime,openvino
Platform: Any
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.6,<=3.10
Description-Content-Type: text/markdown
Requires-Dist: filetype
Requires-Dist: pymupdf
Provides-Extra: onnxruntime
Requires-Dist: rapidocr-onnxruntime ; extra == 'onnxruntime'
Provides-Extra: openvino
Requires-Dist: rapidocr-openvino ; extra == 'openvino'

## RapidOCRPDF
<p>
    <a href=""><img src="https://img.shields.io/badge/Python->=3.7,<=3.10-aff.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-pink.svg"></a>
</p>

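- Built on the [RapidOCR](https://github.com/RapidAI/RapidOCR) repository, it quickly extracts text from PDFs, including scanned PDFs and encrypted PDFs.
- Layout restoration is not yet supported.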

### Usage
1. Install the `rapidocr_pdf` package
   ```bash
   # Based on rapidocr_onnxruntime
   pip install rapidocr_pdf[onnxruntime]

   # Based on rapidocr_openvino
   pip install rapidocr_pdf[openvino]
   ```
2. How to use
    ```python
    from rapidocr_pdf import PDFExtracter

    pdf_extracter = PDFExtracter()

    pdf_path = 'tests/test_files/direct_and_image.pdf'
    texts = pdf_extracter(pdf_path)
    print(texts)
    ```
3. Input and output
   - **Input**: `Union[str, Path, bytes]`
   - **Output**: a `List` of \[**page number**, **text content**, **confidence**\] entries; see the example below:
       ```python
       [
           ['0', '达大学拉斯维加斯分校）的一次中文评测中获得最', '0.8969868'],
           ['1', 'ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network∗\nYuliang Liu‡†', '0.8969868'],
       ]
       ```
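The output shape above lends itself to simple post-processing. The following is a minimal sketch, assuming every entry is a three-item list of strings exactly as in the example. The helper `group_by_page` and its `min_score` parameter are hypothetical names used for illustration and are not part of `rapidocr_pdf`.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_page(results: List[List[str]], min_score: float = 0.0) -> Dict[int, str]:
    """Join the text of each page, skipping entries whose confidence is below min_score.

    Hypothetical helper: assumes each entry is [page, text, confidence],
    with all three fields given as strings.
    """
    pages: Dict[int, List[str]] = defaultdict(list)
    for page, text, score in results:
        if float(score) >= min_score:
            pages[int(page)].append(text)
    # Sort by page number and join the chunks of each page with newlines.
    return {page: "\n".join(chunks) for page, chunks in sorted(pages.items())}


# Made-up sample data in the same shape as the output example.
sample = [
    ["0", "first page text", "0.8969868"],
    ["1", "second page text", "0.8969868"],
]
print(group_by_page(sample, min_score=0.5))
```

In practice, `sample` would be replaced by the `texts` value returned from `pdf_extracter(pdf_path)`, provided it matches the shape shown.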


