Metadata-Version: 2.4
Name: arxivy
Version: 0.1.0
Summary: CLI for the arXiv API, for humans and agents alike
Project-URL: Homepage, https://github.com/mrshu/arxivy
Project-URL: Documentation, https://github.com/mrshu/arxivy#readme
Project-URL: Repository, https://github.com/mrshu/arxivy
Author: mrshu
License-Expression: MIT
Keywords: academic,arxiv,bibtex,cli,papers,research
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: httpx>=0.25.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest-httpx>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# arxivy

A command-line interface for the [arXiv API](https://info.arxiv.org/help/api/index.html), designed for both human researchers and AI agents.

Part of a family of academic CLI tools ([s2cli](https://github.com/mrshu/s2cli), [dblpcli](https://github.com/mrshu/dblpcli), [openalexcli](https://github.com/mrshu/openalexcli)) that share a consistent interface and output conventions.

## Installation

```bash
pip install arxivy
```

Or with [uv](https://github.com/astral-sh/uv):

```bash
uv pip install arxivy
```

## Quick Start

Run directly without installing using [uvx](https://docs.astral.sh/uv/guides/tools/):

```bash
uvx arxivy search "attention is all you need"
```

Or after installing:

```bash
# Search for papers (shows table in terminal)
arxivy search "attention mechanism transformers"

# Get paper details
arxivy paper 1706.03762

# Export BibTeX
arxivy bibtex 1706.03762 >> references.bib

# Browse latest papers in a category
arxivy new cs.AI --limit 5
```

## Output Formats

arxivy picks its default output format based on where the output is going, so the same command works for both humans and AI agents:

| Context | Default Output | Behavior |
|---------|----------------|----------|
| Terminal (interactive) | Human-readable table | Easy to scan and read |
| Piped to another command | Compact JSON | Machine-parseable for scripts |
| `--json` flag | Pretty JSON | Explicit JSON when you need it |
| `--bibtex` / `-b` flag | BibTeX | Ready for LaTeX |

```bash
# Terminal: shows a nice Rich table
arxivy search "transformers"

# Piped: automatically outputs JSON for jq, scripts, AI agents
arxivy search "transformers" | jq '.results[0].title'

# Explicit JSON (pretty-printed in terminal)
arxivy search "transformers" --json

# BibTeX output
arxivy search "transformers" --bibtex
```
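
The selection rules in the table above can be sketched in a few lines of Python. This is only an illustration of the convention, not arxivy's actual code: the function name `choose_format`, the format labels, and the precedence of `--bibtex` over `--json` are assumptions.

```python
import sys
from typing import Optional


def choose_format(json_flag: bool = False,
                  bibtex_flag: bool = False,
                  is_tty: Optional[bool] = None) -> str:
    """Pick an output format label (hypothetical helper, not arxivy's API)."""
    if is_tty is None:
        # True when stdout is an interactive terminal, False when piped.
        is_tty = sys.stdout.isatty()
    if bibtex_flag:
        # Assumption: --bibtex wins if both flags are given.
        return "bibtex"
    if json_flag:
        return "json-pretty"
    # No explicit flag: table for humans, compact JSON for pipes.
    return "table" if is_tty else "json-compact"
```

Passing `is_tty` explicitly keeps the function easy to test without a real terminal.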

## Commands

| Command | Description |
|---------|-------------|
| `arxivy search <query>` | Search papers by keyword |
| `arxivy paper <id>...` | Get paper details (single = detail view, multiple = table) |
| `arxivy bibtex <id>...` | Export BibTeX citations |
| `arxivy new <category>` | Browse recent papers in a category |

### `arxivy search`

Search papers across all of arXiv.

```bash
# Basic search
arxivy search "attention is all you need"

# Filter by arXiv category
arxivy search "transformers" --category cs.CL
arxivy search "transformers" -c cs.CL

# Limit results
arxivy search "deep learning" --limit 20
arxivy search "deep learning" -n 20

# Sort by submission date or last updated
arxivy search "diffusion models" --sort submittedDate
arxivy search "diffusion models" --sort lastUpdatedDate --order ascending

# Pagination
arxivy search "neural networks" --limit 10 --offset 20

# Show abstracts in the table
arxivy search "reinforcement learning" --abstract
arxivy search "reinforcement learning" -a

# Combine options
arxivy search "vision transformer" -c cs.CV -n 5 --sort submittedDate --json
```

**Options:**

| Flag | Short | Description |
|------|-------|-------------|
| `--limit` | `-n` | Maximum number of results (default: 10) |
| `--offset` | | Pagination offset |
| `--category` | `-c` | Filter by arXiv category (e.g. `cs.AI`, `math.CO`, `hep-ph`) |
| `--sort` | | Sort by: `relevance`, `lastUpdatedDate`, `submittedDate` |
| `--order` | | Sort order: `ascending`, `descending` |
| `--json` | | Output as JSON |
| `--bibtex` | `-b` | Output as BibTeX |
| `--abstract` | `-a` | Include abstracts in table output |
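
The `--sort` and `--order` values match the arXiv API's `sortBy` and `sortOrder` parameters. The sketch below shows one plausible way the flags could map onto an API request. How arxivy actually builds its query is not documented here, so the helper names, the `all:` / `cat:` query construction, and the sort defaults are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"


def build_params(query: str,
                 category: Optional[str] = None,
                 limit: int = 10,
                 offset: int = 0,
                 sort: str = "relevance",
                 order: str = "descending") -> dict:
    """Map CLI-style options to arXiv API query parameters (hypothetical)."""
    # "all:" searches every field; "cat:" restricts to one category.
    search_query = f"all:{query}"
    if category:
        search_query += f" AND cat:{category}"
    return {
        "search_query": search_query,
        "start": offset,        # --offset
        "max_results": limit,   # --limit
        "sortBy": sort,         # --sort
        "sortOrder": order,     # --order
    }


def build_url(params: dict) -> str:
    """Encode the parameters into a full request URL."""
    return f"{API_URL}?{urlencode(params)}"
```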

### `arxivy paper`

Fetch one or more papers by arXiv ID.

```bash
# Single paper: shows detailed panel with abstract, authors, links
arxivy paper 1706.03762

# Multiple papers: shows comparison table
arxivy paper 1706.03762 2010.11929 1810.04805

# Accepts full URLs
arxivy paper https://arxiv.org/abs/1706.03762

# Old-style arXiv IDs
arxivy paper hep-ph/0601001

# Versioned IDs (version is stripped automatically)
arxivy paper 1706.03762v7

# Export as JSON or BibTeX
arxivy paper 1706.03762 --json
arxivy paper 1706.03762 --bibtex
```

### `arxivy bibtex`

Export BibTeX entries for one or more papers. This is a shortcut for `arxivy paper <id>... --bibtex`.

```bash
# Single paper
arxivy bibtex 1706.03762

# Multiple papers
arxivy bibtex 1706.03762 2010.11929 1810.04805

# Save to file
arxivy bibtex 1706.03762 2010.11929 > references.bib

# Append to existing file
arxivy bibtex 1810.04805 >> references.bib
```

BibTeX entries use `@article` when a journal reference is present, `@misc` otherwise. All entries include `eprint`, `archiveprefix`, and `primaryclass` fields for proper arXiv citation.
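
As a rough illustration of that rule, the following sketch builds an entry from a paper record shaped like the JSON output documented below. It is not arxivy's real formatter: the function name `to_bibtex`, the citation-key scheme, and the exact set of fields besides `eprint`, `archiveprefix`, and `primaryclass` are assumptions.

```python
def to_bibtex(paper: dict) -> str:
    """Render a paper record as a BibTeX entry (illustrative sketch)."""
    # @article when a journal reference is present, @misc otherwise.
    entry_type = "article" if paper.get("journal_ref") else "misc"
    # Hypothetical citation key derived from the arXiv ID.
    key = "arxiv_" + paper["arxiv_id"].replace("/", "_")
    fields = {
        "title": paper["title"],
        "author": " and ".join(a["name"] for a in paper["authors"]),
        "year": paper["published"][:4],
        "eprint": paper["arxiv_id"],
        "archiveprefix": "arXiv",
        "primaryclass": paper["primary_category"],
    }
    if entry_type == "article":
        fields["journal"] = paper["journal_ref"]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"
```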

### `arxivy new`

Browse the most recently submitted papers in an arXiv category.

```bash
# Latest cs.AI papers
arxivy new cs.AI

# Limit results
arxivy new cs.CL --limit 20
arxivy new cs.CL -n 20

# With abstracts
arxivy new cs.LG -n 5 --abstract

# Export as JSON or BibTeX
arxivy new math.CO --json
arxivy new hep-ph -n 10 --bibtex
```

## arXiv ID Formats

arxivy automatically normalizes various ID formats:

| Input | Normalized to |
|-------|---------------|
| `1706.03762` | `1706.03762` |
| `1706.03762v7` | `1706.03762` |
| `https://arxiv.org/abs/1706.03762v7` | `1706.03762` |
| `http://arxiv.org/abs/1706.03762` | `1706.03762` |
| `hep-ph/0601001` | `hep-ph/0601001` |
| `hep-ph/0601001v2` | `hep-ph/0601001` |
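
The mappings in this table can be reproduced with two regular expressions. This is a minimal sketch that covers only the input shapes listed above. The function name `normalize_arxiv_id` is hypothetical, and arxivy's own normalization may handle more cases.

```python
import re

# Matches the http(s)://arxiv.org/abs/ prefix of a paper URL.
_URL_PREFIX = re.compile(r"^https?://arxiv\.org/abs/")
# Matches a trailing version marker such as "v7".
_VERSION_SUFFIX = re.compile(r"v\d+$")


def normalize_arxiv_id(raw: str) -> str:
    """Strip the URL prefix and version suffix from an arXiv identifier."""
    without_url = _URL_PREFIX.sub("", raw.strip())
    return _VERSION_SUFFIX.sub("", without_url)
```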

## arXiv Categories

Some commonly used arXiv categories:

| Category | Field |
|----------|-------|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision and Pattern Recognition |
| `cs.LG` | Machine Learning |
| `cs.RO` | Robotics |
| `cs.SE` | Software Engineering |
| `math.CO` | Combinatorics |
| `math.ST` | Statistics Theory |
| `stat.ML` | Machine Learning (Statistics) |
| `hep-ph` | High Energy Physics - Phenomenology |
| `quant-ph` | Quantum Physics |
| `cond-mat` | Condensed Matter |

Full list: https://arxiv.org/category_taxonomy

## JSON Output Structure

### Search / New results

```json
{
  "results": [
    {
      "arxiv_id": "1706.03762",
      "title": "Attention Is All You Need",
      "summary": "The dominant sequence transduction models...",
      "authors": [
        {"name": "Ashish Vaswani", "affiliation": "Google Brain"}
      ],
      "published": "2017-06-12T17:57:34Z",
      "updated": "2023-08-02T01:31:28Z",
      "categories": ["cs.CL", "cs.LG"],
      "primary_category": "cs.CL",
      "comment": "15 pages, 5 figures",
      "journal_ref": "Advances in Neural Information Processing Systems 30 (NIPS 2017)",
      "doi": "10.48550/arXiv.1706.03762",
      "pdf_url": "https://arxiv.org/pdf/1706.03762v7",
      "abstract_url": "https://arxiv.org/abs/1706.03762v7"
    }
  ],
  "meta": {
    "total_results": 100,
    "start_index": 0,
    "items_per_page": 10,
    "query": "attention is all you need"
  }
}
```
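
The `meta` block carries enough information to drive pagination with `--offset`. A small sketch, assuming `start_index` is zero-based as in the example above; the helper name `next_offset` is hypothetical:

```python
from typing import Optional


def next_offset(meta: dict) -> Optional[int]:
    """Return the --offset for the next page, or None on the last page."""
    candidate = meta["start_index"] + meta["items_per_page"]
    return candidate if candidate < meta["total_results"] else None
```

With the `meta` values shown above (`start_index` 0, `items_per_page` 10, `total_results` 100), the next page would start at offset 10.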

### Single paper

```json
{
  "result": {
    "arxiv_id": "1706.03762",
    "title": "Attention Is All You Need",
    "...": "..."
  }
}
```

### Errors

```json
{
  "error": {
    "code": "NOT_FOUND",
    "message": "Paper not found: 9999.99999",
    "suggestion": "Check the arXiv ID format (e.g. 1706.03762 or hep-ph/0601001)",
    "documentation": "https://info.arxiv.org/help/api/index.html"
  }
}
```
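
A script consuming this output can branch on the top-level key to tell the three shapes apart. The sketch below parses a JSON string that has already been captured from arxivy. The `ArxivyError` class and `unwrap` function are hypothetical names, not part of arxivy.

```python
import json


class ArxivyError(Exception):
    """Raised when the payload contains an "error" object (hypothetical)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def unwrap(payload_text: str) -> list:
    """Return the papers in a payload as a list, or raise on an error payload."""
    payload = json.loads(payload_text)
    if "error" in payload:
        err = payload["error"]
        raise ArxivyError(err["code"], err["message"])
    if "result" in payload:
        # Single-paper shape: wrap it so callers always get a list.
        return [payload["result"]]
    return payload["results"]
```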

## Examples

### Human Workflows

```bash
# Find the seminal transformer paper
arxivy search "attention is all you need" -n 5

# Look up a specific paper with full details
arxivy paper 1706.03762

# Export a bibliography for a literature review
arxivy bibtex 1706.03762 2010.11929 1810.04805 > transformers.bib

# Browse the 20 most recent ML papers
arxivy new cs.LG -n 20

# Search within a category, sorted by date
arxivy search "RLHF" -c cs.AI --sort submittedDate

# Read abstracts at a glance
arxivy search "chain of thought" -n 5 --abstract
```

### AI Agent / Scripting Workflows

```bash
# Quick context gathering (auto-JSON when piped)
arxivy search "retrieval augmented generation" | jq '.results[:3]'

# Extract just titles
arxivy search "vision transformers" | jq -r '.results[].title'

# Get arXiv IDs for further processing
arxivy search "BERT" | jq -r '.results[].arxiv_id'

# Batch BibTeX export from a list of IDs
arxivy bibtex 1706.03762 1810.04805 2010.11929

# Get the latest paper in a category
arxivy new cs.AI -n 1 | jq '.results[0]'

# Check if a paper exists (jq -e exits non-zero when the title is null or missing)
arxivy paper 1706.03762 --json | jq -e '.result.title'
```

## Rate Limiting

arXiv asks clients to wait at least 3 seconds between requests. arxivy enforces this automatically by throttling its own requests, so no action is needed on your part. If the arXiv API returns a server error (5xx), arxivy retries with exponential backoff (up to 3 retries).
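
The two mechanisms can be sketched as follows. This is an illustration under assumptions, not arxivy's implementation: the `Throttle` class, the `backoff_delays` helper, and the 1-second base delay are all hypothetical. Only the 3-second interval and the limit of 3 retries come from the description above.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def backoff_delays(retries=3, base=1.0):
    """Delays before each retry, doubling each time (base delay is assumed)."""
    return [base * 2 ** attempt for attempt in range(retries)]
```

A client would call `Throttle.wait()` before every request and, on a 5xx response, sleep for each delay from `backoff_delays()` in turn before giving up.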

## Development

```bash
# Clone and install
git clone https://github.com/mrshu/arxivy.git
cd arxivy
uv sync --all-extras

# Run tests
uv run pytest -v

# Lint
uv run ruff check src/ tests/
```

## Publishing

This project follows the same publishing approach as the sibling tools in this family.

### Trusted Publishing (recommended)

Publishing to PyPI is automated via GitHub Actions using PyPI Trusted Publishing.

One-time setup: in PyPI, add a Trusted Publisher for:

- Owner: `mrshu`
- Repository: `arxivy`
- Workflow: `.github/workflows/publish.yml`

To publish a new version:

1. Bump `version` in `pyproject.toml`
2. Update `CHANGELOG.md`
3. Tag and push a release tag:

```bash
git tag v0.1.0
git push origin v0.1.0
```

The `publish` workflow builds the package and runs `uv publish`.

### Manual publish (fallback)

```bash
uv build
uv publish
```

## Design Philosophy

Based on [CLI best practices](https://clig.dev/) and consistent with the sibling tools:

1. **Human-first by default**: Rich tables in terminals, detail panels for single papers
2. **Machine-friendly when piped**: Automatic compact JSON for scripts and AI agents
3. **Explicit overrides**: `--json` and `--bibtex` flags when you need control
4. **No API key needed**: arXiv's API is free and open
5. **Respectful throttling**: Proactive 3-second delays between requests, in line with arXiv's guidance

## License

MIT
