Metadata-Version: 2.4
Name: bib-extractor
Version: 0.1.2
Summary: Extract DOIs or titles from PDF papers and generate a BibTeX bibliography
Author: msrtarit
License: MIT
Project-URL: Homepage, https://github.com/msrtarit/bib_extractor
Project-URL: Repository, https://github.com/msrtarit/bib_extractor.git
Keywords: bibtex,doi,pdf,extractor
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# Bib Extractor

[![PyPI version](https://badge.fury.io/py/bib-extractor.svg)](https://pypi.org/project/bib-extractor/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A tiny, **pip‑installable** Python utility that scans a folder of PDF papers, extracts their DOI (or a title fallback), and produces a JSON file that can be turned into a BibTeX bibliography.

---

## ✨ Features
- Works on any folder of PDFs.
- Uses `pdftotext` (Poppler) to read text from PDFs.
- Detects DOI strings with a robust regular expression.
- **Multiple API Support**: Queries `doi.org` and falls back to **Crossref** for metadata.
- **Auto-Rename**: Automatically renames PDFs to `Year - Author - Title.pdf`.
- **Formatted Citations**: Generates **APA/MLA** style reference lists in a separate text file.
- **Visual Progress**: Includes a terminal progress bar for high‑volume processing.
- Zero external Python dependencies (standard library only).
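
The DOI detection step can be illustrated with a minimal sketch. The pattern and the `find_doi` helper below are assumptions made for illustration; they are not taken from the package, whose actual expression may differ.

```python
import re

# Assumed pattern (hypothetical, not the package's own): "10.", a 4-9 digit
# registrant code, a slash, then a suffix of non-whitespace characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Sentence punctuation often trails a DOI in running text; strip it.
    return match.group(0).rstrip(".,;:)]}")


print(find_doi("See https://doi.org/10.1109/XYZ.2023.123456."))
```

Stripping trailing punctuation is a design choice that trades a rare false negative (a DOI that really ends in a period) for cleaner matches in running text.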

---

## 📦 Installation
### From PyPI (recommended)
```bash
pip install bib-extractor
```
The package calls `pdftotext` at runtime, so Poppler must also be installed and available on your `PATH` (see step 1 under "From source").
### From source
1. **Install Poppler** – `pdftotext` is required.
   - **Windows**: download from <https://github.com/oschwartz10612/poppler-windows/releases> and add the `bin` folder to your `PATH`.
   - **macOS**: `brew install poppler`
   - **Linux**: `sudo apt-get install poppler-utils`
2. **Clone the repository**
```bash
git clone https://github.com/msrtarit/bib_extractor.git
cd bib_extractor
```
3. Install the package in editable mode, optionally inside a virtual environment:
```bash
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux
pip install -e .
```

---

## 🚀 Usage
### Extract DOIs / titles
```bash
# Using the installed command (if installed via pip)
bib-extractor --dir path/to/papers --output paper_info.json

# Or run the script directly from the source checkout
python extract_bib_info.py --dir path/to/papers --output paper_info.json
```
- `--dir` defaults to the current working directory.
- `--output` defaults to `paper_info.json`.
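
For readers curious how text might be pulled from a PDF with `pdftotext`, here is a rough sketch using only the standard library. The helper names and the two-page limit are assumptions for illustration; the package's own invocation may use different options.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, last_page=2):
    """Build a pdftotext call that prints the first pages to stdout."""
    # "-l N" stops after page N; the trailing "-" sends text to stdout.
    return ["pdftotext", "-l", str(last_page), str(pdf_path), "-"]


def read_pdf_text(pdf_path, last_page=2):
    """Return extracted text, or "" if pdftotext is missing or fails."""
    if shutil.which("pdftotext") is None:
        return ""
    result = subprocess.run(
        pdftotext_command(pdf_path, last_page),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout if result.returncode == 0 else ""
```

Limiting extraction to the first pages is an assumption based on where DOIs and titles usually appear; it keeps large folders fast to scan.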

### Fetch BibTeX entries & Auto-Rename
```bash
# Fetch entries and automatically rename your PDFs
bib-fetch --input paper_info.json --output papers.bib --rename --dir path/to/papers
```
- `--input`: The JSON file from the extractor.
- `--output`: The destination `.bib` file.
- `--citations`: (Optional) Output file for a formatted reference list (e.g., `refs.txt`).
- `--style`: (Optional) Citation style for the list (`apa` or `mla`, default is `apa`).
- `--rename`: (Optional) Automatically renames the files in `--dir` to a standard format: `Year - Author - Title.pdf`.
- `--dir`: (Required if renaming) The folder where your original PDFs are located.
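
The `Year - Author - Title.pdf` naming scheme needs some care with characters that filesystems reject. The sketch below shows one way to build such a name; `build_pdf_name` and its `max_title_len` parameter are hypothetical and not part of the package's API.

```python
import re


def build_pdf_name(year, author, title, max_title_len=80):
    """Return 'Year - Author - Title.pdf' with unsafe characters removed."""

    def clean(value):
        # Drop characters that Windows or Unix filesystems reject.
        value = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", str(value))
        # Collapse whitespace runs and trim surrounding spaces and dots.
        return re.sub(r"\s+", " ", value).strip(" .")

    # Truncate long titles so the full path stays within common limits.
    short_title = clean(title)[:max_title_len].rstrip(" .")
    return f"{clean(year)} - {clean(author)} - {short_title}.pdf"


print(build_pdf_name(2023, "Doe", "Deep Learning: A Survey?"))
```

With the sample arguments this prints `2023 - Doe - Deep Learning A Survey.pdf`.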

`bib-extractor` prints its progress and writes a JSON array such as the following, which `bib-fetch` then reads through `--input`:
```json
[
  {"file": "1.pdf", "doi": "10.1109/XYZ.2023.123456"},
  {"file": "2.pdf", "title": "An Interesting Study on …"}
]
```
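
As an illustration of how such a file could be consumed, the sketch below separates DOI records from title-only records and prepares a `doi.org` request that asks for BibTeX through HTTP content negotiation. The function names and the sample title are hypothetical, and the package's own request logic may differ; no network call is made here.

```python
import json
import urllib.parse
import urllib.request


def bibtex_request(doi):
    """Build a doi.org request that asks for BibTeX via content negotiation."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def split_entries(entries):
    """Separate records that carry a DOI from title-only fallbacks."""
    with_doi = [e for e in entries if e.get("doi")]
    title_only = [e for e in entries if not e.get("doi") and e.get("title")]
    return with_doi, title_only


# Sample data mirroring the JSON layout; the title is a made-up placeholder.
sample = json.loads(
    '[{"file": "1.pdf", "doi": "10.1109/XYZ.2023.123456"},'
    ' {"file": "2.pdf", "title": "Example title"}]'
)
with_doi, title_only = split_entries(sample)
request = bibtex_request(with_doi[0]["doi"])
print(request.full_url)
print(len(title_only))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; title-only records would need a search service such as Crossref instead.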

---

## Next steps
- Convert the generated `.bib` file to the citation style you need.
- Extend the workflow with additional scripts or integrate into your bibliography manager.

---

## 🤝 Contributing
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to fork the repo, set up a development environment, and submit pull requests.

---

## 📜 License
This project is licensed under the **MIT License** – see the [LICENSE](LICENSE) file for details.


