Metadata-Version: 2.1
Name: gpt-pdf-md
Version: 0.1
Summary: A Python package that utilizes GPT-4V and other tools to convert PDFs into Markdown files.
Author: Max Hager
Author-email: maxhager28@gmail.com
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cachetools==5.3.2
Requires-Dist: certifi==2023.7.22
Requires-Dist: cffi==1.16.0
Requires-Dist: charset-normalizer==3.3.2
Requires-Dist: cryptography==41.0.5
Requires-Dist: google-api-core==2.13.0
Requires-Dist: google-auth==2.23.4
Requires-Dist: google-cloud-core==2.3.3
Requires-Dist: google-cloud-storage==2.13.0
Requires-Dist: google-crc32c==1.5.0
Requires-Dist: google-resumable-media==2.6.0
Requires-Dist: googleapis-common-protos==1.61.0
Requires-Dist: idna==3.4
Requires-Dist: load-dotenv==0.1.0
Requires-Dist: pdf2image==1.16.3
Requires-Dist: pdfminer.six==20221105
Requires-Dist: Pillow==10.1.0
Requires-Dist: protobuf==4.25.0
Requires-Dist: pyasn1==0.5.0
Requires-Dist: pyasn1-modules==0.3.0
Requires-Dist: pycparser==2.21
Requires-Dist: PyMuPDF==1.23.6
Requires-Dist: PyMuPDFb==1.23.6
Requires-Dist: pypandoc==1.12
Requires-Dist: PyPDF2==3.0.1
Requires-Dist: python-dotenv==1.0.0
Requires-Dist: requests==2.31.0
Requires-Dist: rsa==4.9
Requires-Dist: urllib3==2.0.7

# GPT PDF Reader

GPT PDF Reader is a Python package that utilizes GPT-4V and other tools to extract and process information from PDF files.

## Features

- Extracts figures from PDF files using the `pdffigures2` Scala library.
- Converts PDF pages to images and uploads them to a Google Cloud Storage bucket.
- Uses GPT-4V (GPT-4 with Vision) to generate Markdown from the PDF, then inserts the image URLs into the Markdown.
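The final step, inserting image URLs into the generated Markdown, can be illustrated with a minimal sketch. It assumes, hypothetically, that the model emits figure placeholders of the form `![caption](figure-N)` and that you already have a mapping from figure number to uploaded URL. The package's actual placeholder format is not documented here, and `insert_image_urls` is an illustrative name, not part of its API.

```python
import re
from typing import Dict

# Hypothetical placeholder convention: ![caption](figure-N), where N is the figure number.
PLACEHOLDER = re.compile(r"!\[([^\]]*)\]\(figure-(\d+)\)")


def insert_image_urls(markdown: str, urls: Dict[int, str]) -> str:
    """Replace figure placeholders with uploaded image URLs.

    Placeholders whose figure number has no entry in `urls` are left unchanged.
    """

    def substitute(match: "re.Match[str]") -> str:
        caption, number = match.group(1), int(match.group(2))
        url = urls.get(number)
        if url is None:
            return match.group(0)  # no URL for this figure: keep the placeholder
        return f"![{caption}]({url})"

    return PLACEHOLDER.sub(substitute, markdown)


if __name__ == "__main__":
    page_md = "Intro text.\n\n![Model overview](figure-1)\n\n![Results](figure-2)"
    # Example URL in the public Google Cloud Storage format; "my-bucket" is a placeholder.
    uploaded = {1: "https://storage.googleapis.com/my-bucket/figure-1.png"}
    print(insert_image_urls(page_md, uploaded))
```

Leaving unmatched placeholders in place makes missing uploads easy to spot in the output instead of silently producing broken image links.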

## Additional Dependencies

This package requires the `pdffigures2` Scala library to extract figures from PDF files. You can obtain it by cloning the `pdffigures2` repository.


## Installation

The installation process requires Java and Scala. The following instructions are for macOS users:

```bash
brew tap AdoptOpenJDK/openjdk
brew install --cask adoptopenjdk11
brew install jenv
echo 'export PATH="$HOME/.jenv/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(jenv init -)"' >> ~/.zshrc
```

After updating your shell configuration, close and reopen your terminal, then set Java 11 as the global version using jenv:

```bash
jenv add /Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home/
jenv global 11.0.11
```
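To confirm that `java` on your `PATH` now resolves to Java 11, a quick check can help. The following is a hypothetical helper, not part of GPT PDF Reader: it runs `java -version`, which prints its version banner to stderr, and parses the major version from both the modern (`11.0.11`) and legacy (`1.8.0_292`) numbering schemes.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_292" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Return the major version of `java` on PATH, or None if missing or unparsable."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # java prints its version banner to stderr, not stdout.
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 11:
        print("Java 11 detected.")
    else:
        print(f"Expected Java 11, found: {major}")
```

Running the script directly reports whether Java 11 is active; `parse_java_major` can also be reused in your own setup checks.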

Install GPT PDF Reader via pip:

```bash
pip install gptpdfreader
```

Set the required environment variables in a `.env` file, with no spaces around `=` and no unnecessary quotes around the values:

```env
OPENAI_API_KEY=open_ai_key
GOOGLE_ID=google_project_id
GOOGLE_BUCKET=google_bucket_name
```
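A missing or mistyped variable is easier to diagnose before any API call is made. The sketch below is a hypothetical fail-fast helper (`require_env` is an illustrative name, not part of the package): it checks that the three variables from the example above are set and flags values with leading or trailing spaces or quotes. To read the `.env` file into `os.environ` first, you could call `load_dotenv()` from `python-dotenv`, which is one of the listed dependencies.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

# Variable names from the .env example above.
REQUIRED_VARS = ("OPENAI_API_KEY", "GOOGLE_ID", "GOOGLE_BUCKET")


def require_env(
    names: Sequence[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the named variables, raising ValueError if any are missing or malformed."""
    env = os.environ if environ is None else environ
    problems = []
    for name in names:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip() or value[0] in "'\"" or value[-1] in "'\"":
            # Leading/trailing spaces or quotes suggest a mistyped .env line.
            problems.append(f"{name} has surrounding spaces or quotes")
    if problems:
        raise ValueError("; ".join(problems))
    return {name: env[name] for name in names}


if __name__ == "__main__":
    # To read the .env file first, install python-dotenv and call:
    #   from dotenv import load_dotenv; load_dotenv()
    try:
        config = require_env()
        print(f"Using bucket {config['GOOGLE_BUCKET']} in project {config['GOOGLE_ID']}")
    except ValueError as err:
        print(f"Configuration error: {err}")
```

Accepting an optional `environ` mapping keeps the check testable without touching the real process environment.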

## Usage

To process a PDF and generate Markdown content:

```python
from gptpdfreader.reader import main

main('path_to_your_pdf.pdf')
```

This will process the specified PDF and output a Markdown file with the extracted information in the same directory.
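To convert several PDFs, you can call `main` once per file. The helper below is a hypothetical sketch (`convert_directory` is not part of the package) that takes the conversion function as a parameter. It assumes, as in the usage example, that `main` accepts a single path string, and further assumes that a failed conversion raises an exception.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def convert_directory(
    pdf_dir: str, convert: Callable[[str], object]
) -> Tuple[List[Path], Dict[Path, Exception]]:
    """Run `convert` on each PDF in `pdf_dir` (non-recursive), in sorted order.

    Returns the PDFs that converted successfully and a mapping of failed PDFs to
    the exception they raised, so one bad file does not abort the whole batch.
    """
    pdfs = sorted(
        p for p in Path(pdf_dir).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    succeeded: List[Path] = []
    failed: Dict[Path, Exception] = {}
    for pdf in pdfs:
        try:
            convert(str(pdf))
            succeeded.append(pdf)
        except Exception as exc:  # keep going; report failures at the end
            failed[pdf] = exc
    return succeeded, failed
```

With the package installed, `convert_directory("papers", main)` (after `from gptpdfreader.reader import main`) would process every PDF in a hypothetical `papers/` folder and return which files failed. Passing the function in, rather than importing it inside the helper, also lets you test the loop with a stand-in.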

## Limitations 

some limitations

## Contributing

We welcome contributions! Please open an issue or submit a pull request on our GitHub repository.

## Support

For questions and support, please open an issue in the GitHub issue tracker.

## License




