Metadata-Version: 2.4
Name: mcsci-hub
Version: 2026.2.16
Summary: FastMCP server for academic paper search, metadata, PDFs, citations, and BibTeX
Project-URL: Repository, https://git.supported.systems/MCP/mcsci-hub
Author-email: Ryan Malloy <ryan@supported.systems>
License-Expression: MIT
Keywords: academic,bibtex,crossref,mcp,openalex,papers,sci-hub,semantic-scholar
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.11
Requires-Dist: beautifulsoup4>=4.14.3
Requires-Dist: fastmcp>=2.14.5
Requires-Dist: httpx>=0.28.1
Requires-Dist: pydantic-settings>=2.13.0
Requires-Dist: pydantic>=2.11.0
Requires-Dist: tenacity>=9.1.4
Description-Content-Type: text/markdown

# mcsci-hub

An [MCP](https://modelcontextprotocol.io/) server that gives LLMs access to academic literature. Search papers, retrieve metadata, download PDFs, traverse citation graphs, and generate BibTeX -- all through a unified interface backed by four academic data sources.

Built with [FastMCP](https://gofastmcp.com/).

```
search_papers("CRISPR delivery") --> get_paper(doi) --> fetch_pdf(doi) --> [mcp-pdf] --> analyze
```

## How It Works

mcsci-hub aggregates data from multiple academic APIs in parallel, merging results into a single coherent response:

| Source | What it provides | Access |
|--------|-----------------|--------|
| **CrossRef** | Authoritative metadata -- title, authors, journal, volume, pages | Free (polite pool) |
| **OpenAlex** | Abstracts, open access URLs, semantic search | Free (polite pool) |
| **Semantic Scholar** | Citation counts, influential citations, fields of study, BibTeX | Free (optional API key) |
| **Sci-Hub** | Paywalled PDF access via mirror scraping | Fallback only |

For PDFs, the strategy is to always try open access first; Sci-Hub is a last resort. When any source fails, the others still return what they can (graceful degradation).
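
Here is a minimal sketch of that OA-first ordering. It assumes each source is wrapped in a hypothetical async `Fetcher` that takes a DOI and either returns raw bytes or raises. The function name `fetch_first_available`, the `%PDF` magic-byte check, and the source labels are illustrative assumptions, not mcsci-hub's actual code.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical source signature: DOI in, PDF bytes out; raises on any failure.
Fetcher = Callable[[str], Awaitable[bytes]]


async def fetch_first_available(
    doi: str, fetchers: list[tuple[str, Fetcher]]
) -> tuple[str, bytes]:
    """Try sources in order; return (source_name, pdf_bytes) from the first that works."""
    errors: dict[str, str] = {}
    for name, fetch in fetchers:
        try:
            data = await fetch(doi)
        except Exception as exc:  # network error, 404, blocked mirror, etc.
            errors[name] = repr(exc)
            continue
        if data.startswith(b"%PDF"):  # reject HTML pages served in place of a PDF
            return name, data
        errors[name] = "response was not a PDF"
    raise LookupError(f"no source returned a PDF for {doi}: {errors}")


if __name__ == "__main__":
    # Stand-in sources for a quick local run (hypothetical).
    async def no_oa_copy(doi: str) -> bytes:
        raise ConnectionError("no open access copy")

    async def mirror(doi: str) -> bytes:
        return b"%PDF-1.7 placeholder"

    sources = [("open_access", no_oa_copy), ("sci-hub.se", mirror)]
    print(asyncio.run(fetch_first_available("10.1038/nature12373", sources)))
```

In this sketch the list order encodes the fallback policy: the open access fetcher goes first, followed by one entry per mirror in `SCIHUB_MIRRORS` order.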

## Tools

| Tool | Description |
|------|-------------|
| `search_papers(query, ...)` | Keyword/topic search via OpenAlex with CrossRef fallback. Filters by year range and open access status. |
| `get_paper(doi)` | Full metadata from all sources, merged and cached. Returns title, authors, abstract, citation counts, fields of study, OA status. |
| `fetch_pdf(doi)` | Downloads the PDF, trying the OA URL first and then Sci-Hub. Saves the file locally and points to mcp-pdf for parsing. |
| `get_citations(doi, max_results)` | Forward citations -- papers that cite this one. Includes Semantic Scholar's "influential" flag. |
| `get_references(doi, max_results)` | Backward references -- papers this one cites. Traces intellectual lineage. |
| `get_bibtex(doi)` | BibTeX entry from Semantic Scholar, or constructed from CrossRef metadata as fallback. |

## Resources

| URI | Description |
|-----|-------------|
| `scihub://mirrors` | Configured mirror domains and count |
| `scihub://config` | Full server configuration (cache, timeouts, PDF directory, API key status) |
| `doi://{prefix}/{suffix}` | Dynamic paper metadata lookup by DOI (e.g. `doi://10.1038/nature12373`) |
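
This README does not spell out how the template's two placeholders map back to a DOI. One plausible sketch assumes the handler rejoins `prefix` and `suffix` with a slash, percent-decodes the result, and lowercases it, since DOIs are case-insensitive. `doi_from_resource_uri` and the validation regex are illustrative assumptions. The README also does not say whether the server's `{suffix}` placeholder accepts suffixes that themselves contain slashes.

```python
import re
from urllib.parse import unquote

# Loose DOI shape check: "10." + registrant code + "/" + non-empty suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_from_resource_uri(uri: str) -> str:
    """Turn 'doi://10.1038/nature12373' into a normalized DOI (hypothetical helper)."""
    scheme = "doi://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a doi:// URI: {uri!r}")
    # Split on the first slash only; anything after it belongs to the suffix.
    prefix, sep, suffix = uri[len(scheme):].partition("/")
    if not sep or not suffix:
        raise ValueError(f"missing DOI suffix in {uri!r}")
    doi = unquote(f"{prefix}/{suffix}").strip().lower()
    if not DOI_RE.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return doi
```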

## Prompts

| Prompt | Description |
|--------|-------------|
| `literature_review(topic, depth)` | Guided workflow: search, detail top papers, traverse citations, synthesize. `depth='deep'` adds PDF analysis via mcp-pdf. |
| `paper_deep_dive(doi)` | Single paper analysis: metadata, PDF, full-text parsing, citation context. |
| `extract_findings(doi)` | Focused data extraction from a paper's PDF -- tables, results, conclusions. |

## Composability with mcp-pdf

`fetch_pdf` saves PDFs locally and returns the file path. Tool responses and prompts guide the calling LLM to use [mcp-pdf](https://pypi.org/project/mcp-pdf/) for full-text extraction, creating the pipeline:

```
search --> fetch --> parse --> analyze
```

Install mcp-pdf alongside mcsci-hub to unlock the full workflow.

## Installation

Requires Python 3.11+.

```bash
# Clone and install
git clone git@git.supported.systems:MCP/mcsci-hub.git
cd mcsci-hub
uv sync

# Copy and edit environment config
cp .env.example .env
```

### Add to Claude Code

```bash
# Local development
claude mcp add mcsci-hub -- uv run --directory /path/to/mcsci-hub mcsci-hub

# From PyPI (once published)
claude mcp add mcsci-hub -- uvx mcsci-hub
```

### Run standalone (stdio transport)

```bash
uv run mcsci-hub
```

## Configuration

All settings are managed via environment variables (loaded from `.env` by pydantic-settings).

| Variable | Default | Description |
|----------|---------|-------------|
| `SCIHUB_MIRRORS` | `sci-hub.se,sci-hub.st,sci-hub.ru` | Comma-separated mirror domains, tried in order |
| `CROSSREF_MAILTO` | *(empty)* | Email for CrossRef polite pool (higher rate limits) |
| `OPENALEX_MAILTO` | *(empty)* | Email for OpenAlex polite pool |
| `S2_API_KEY` | *(empty)* | Optional Semantic Scholar API key for higher rate limits |
| `CACHE_MAX_SIZE` | `1000` | Maximum entries in the TTL cache |
| `CACHE_TTL` | `3600` | Cache entry lifetime in seconds |
| `HTTP_TIMEOUT` | `30` | Request timeout in seconds |
| `PDF_SAVE_DIR` | `/tmp/mcsci-hub-pdfs` | Directory where downloaded PDFs are saved |

See `.env.example` for a ready-to-use template.
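
The project uses pydantic-settings for this. As a dependency-free illustration of the same mapping, the stdlib sketch below reads the variables above with their documented defaults. The `Settings` field names are assumptions. Unlike pydantic-settings, this version does not read a `.env` file and does no validation beyond `int`/`float` conversion.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Mirrors the documented environment variables and defaults (field names assumed)."""

    scihub_mirrors: tuple[str, ...] = ("sci-hub.se", "sci-hub.st", "sci-hub.ru")
    crossref_mailto: str = ""
    openalex_mailto: str = ""
    s2_api_key: str = ""
    cache_max_size: int = 1000
    cache_ttl: int = 3600
    http_timeout: float = 30.0
    pdf_save_dir: str = "/tmp/mcsci-hub-pdfs"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        d = cls()  # instance holding the defaults
        raw_mirrors = env.get("SCIHUB_MIRRORS", "")
        # Comma-separated, order preserved, blanks dropped.
        mirrors = tuple(m.strip() for m in raw_mirrors.split(",") if m.strip())
        return cls(
            scihub_mirrors=mirrors or d.scihub_mirrors,  # unset/empty -> defaults
            crossref_mailto=env.get("CROSSREF_MAILTO", d.crossref_mailto),
            openalex_mailto=env.get("OPENALEX_MAILTO", d.openalex_mailto),
            s2_api_key=env.get("S2_API_KEY", d.s2_api_key),
            cache_max_size=int(env.get("CACHE_MAX_SIZE", d.cache_max_size)),
            cache_ttl=int(env.get("CACHE_TTL", d.cache_ttl)),
            http_timeout=float(env.get("HTTP_TIMEOUT", d.http_timeout)),
            pdf_save_dir=env.get("PDF_SAVE_DIR", d.pdf_save_dir),
        )
```

This sketch treats an empty `SCIHUB_MIRRORS` as "use the defaults". The real server may handle that case differently.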

## Data Source Strategy

Each field in the aggregated response has a primary source and, where one exists, a fallback source:

| Field | Primary | Fallback | Notes |
|-------|---------|----------|-------|
| Title, Authors | CrossRef | OpenAlex, S2 | CrossRef is the authoritative registry |
| Abstract | OpenAlex | S2 | Decoded from inverted index format |
| Citation count | S2 | OpenAlex | S2 distinguishes influential citations |
| Open Access URL | OpenAlex | S2 | OpenAlex integrates Unpaywall data |
| Paywalled PDF | Sci-Hub | -- | Mirror scraping, fallback only |
| BibTeX | S2 | Constructed from CrossRef | S2 has `citationStyles.bibtex` |
| Fields of study | S2 | -- | Better classification taxonomy |

All sources are queried in parallel via `asyncio.gather` with `return_exceptions=True` -- a slow or failing source never blocks the others.

## Project Structure

```
src/mcsci_hub/
  server.py                  # FastMCP app, lifespan, resources, prompts
  config.py                  # pydantic-settings env config
  cache.py                   # TTL-aware LRU cache
  models.py                  # Pydantic response models
  clients/
    crossref.py              # CrossRef API (polite pool, rate limited)
    openalex.py              # OpenAlex API (abstracts, OA URLs)
    semantic_scholar.py      # S2 API (citations, BibTeX)
    scihub.py                # Sci-Hub scraper (mirror rotation, retry)
  tools/
    search.py                # search_papers
    paper.py                 # get_paper + aggregate_paper()
    pdf.py                   # fetch_pdf
    citations.py             # get_citations, get_references
    bibtex.py                # get_bibtex
```
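
`cache.py` is described only as a TTL-aware LRU cache. One way such a cache can work is sketched below with `collections.OrderedDict`. Entries expire `ttl` seconds after they are written, reads refresh recency, and writes beyond `max_size` evict the least recently used entry. The class name, the injectable `clock`, and lazy expiry on read are assumptions of this sketch. The defaults echo `CACHE_MAX_SIZE` and `CACHE_TTL`.

```python
import time
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any


class TTLCache:
    """LRU cache whose entries also expire ttl seconds after being written."""

    def __init__(
        self,
        max_size: int = 1000,
        ttl: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        # key -> (expires_at, value); iteration order tracks recency, oldest first.
        self._data: OrderedDict[Hashable, tuple[float, Any]] = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```

Expiry is lazy here: stale entries are dropped when they are read or when LRU eviction reaches them. This keeps `set` cheap at the cost of briefly holding expired data. The class has no locking, which is adequate for a single asyncio event loop but not for multi-threaded use.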

## Development

```bash
# Install with dev dependencies
uv sync

# Run tests (55 tests, all mocked with respx + FastMCP in-process transport)
uv run pytest -v

# Run tests with coverage
uv run pytest -v --cov=mcsci_hub

# Lint
uv run ruff check src/ tests/
```

## License

MIT
