Metadata-Version: 2.4
Name: huntpdf
Version: 1.0.0
Summary: CLI tool to fetch PDFs of arxiv, DOI, and PubMed articles
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: arxiv>=2.1
Requires-Dist: httpx>=0.27
Requires-Dist: playwright>=1.40
Provides-Extra: dev
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: respx; extra == 'dev'
Description-Content-Type: text/markdown

# huntpdf

<img width="512" height="512" alt="image" src="https://github.com/user-attachments/assets/c71cc3a5-8f75-4c64-8912-4c19ac177fdd" />


A Python-based CLI tool to fetch PDFs of arxiv, DOI, and PubMed articles.

## Installation

```bash
pip install huntpdf
playwright install chromium  # one-time setup for browser fallback
```

## Usage

```bash
huntpdf <query> [-o OUTPUT]
```

Where `<query>` can be:

| Input Type | Example |
|------------|---------|
| arXiv ID | `huntpdf 2301.07041` |
| DOI | `huntpdf 10.1038/nature12373` |
| PMC ID | `huntpdf PMC4056847` |
| PubMed ID | `huntpdf 25428566` |
| URL | `huntpdf https://arxiv.org/abs/2301.07041` |

Output is JSON:

```json
{"status": "success", "pdf_path": "/absolute/path/to/file.pdf"}
```

### Options

- `-o, --output PATH` — Save PDF to a specific path (default: auto-generated in current directory)

### Supported URL patterns

- `arxiv.org/abs/{id}` and `arxiv.org/pdf/{id}`
- `doi.org/{doi}`
- `pubmed.ncbi.nlm.nih.gov/{pmid}`
- `pmc.ncbi.nlm.nih.gov/articles/{PMCID}`
- Direct PDF links (any URL serving `application/pdf`)
- Other URLs (attempted via headless browser fallback)

### Download strategy

For each URL, huntpdf tries:

1. **Direct HTTP download** — fast streaming via httpx
2. **Headless browser** — Playwright-based fallback for JavaScript-rendered pages

For DOI lookups specifically, if the Unpaywall PDF URL fails (e.g. paywall), huntpdf queries the **Semantic Scholar API** for alternate open-access copies and tries existing resolvers (arXiv, PMC) with the discovered identifiers.

### Environment variables

- `UNPAYWALL_EMAIL` **(required for DOI lookups)** — Your email address for Unpaywall API requests

## Development

```bash
pip install -e ".[dev]"
pytest
```

Run only unit tests (skip integration tests that hit real APIs):

```bash
pytest -m "not integration"
```
