Metadata-Version: 2.4
Name: scholar-mcp
Version: 0.1.0
Summary: MCP server for academic paper search, powered by Semantic Scholar (214M+ papers)
License: MIT
Project-URL: Homepage, https://github.com/Liyux3/scholar-mcp
Project-URL: Repository, https://github.com/Liyux3/scholar-mcp
Project-URL: Issues, https://github.com/Liyux3/scholar-mcp/issues
Keywords: mcp,semantic-scholar,academic,papers,arxiv,citations
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastmcp>=2.0.0
Requires-Dist: feedparser>=6.0.0
Requires-Dist: pypdf>=4.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: lxml>=4.9.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Dynamic: license-file

# scholar-mcp

[![PyPI version](https://badge.fury.io/py/scholar-mcp.svg)](https://pypi.org/project/scholar-mcp)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![MCP](https://img.shields.io/badge/MCP-compatible-green.svg)](https://modelcontextprotocol.io)

MCP server for academic paper search, powered by [Semantic Scholar](https://www.semanticscholar.org/) (214M+ papers, 2.49B citations). Search, cite, download, and read papers directly from Claude Code or any MCP client.

## Quick Start

**One-liner for Claude Code:**

```bash
claude mcp add scholar -- uvx scholar-mcp
```

**Or with an API key for higher rate limits:**

```bash
claude mcp add scholar -e S2_API_KEY=your_key -- uvx scholar-mcp
```

**For Claude Desktop**, add this to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "scholar": {
      "command": "uvx",
      "args": ["scholar-mcp"]
    }
  }
}
```

> [!NOTE]
> Requires Python 3.10+ and [uv](https://docs.astral.sh/uv/). No API key is needed for basic use: the free tier allows 100 requests per 5 minutes.

## Features

- **Search** across 214M+ papers with filters (year, venue, field of study, citation count, open access)
- **Paper details** with TLDR summaries, BibTeX, venue metadata
- **Citation graph** traversal (who cites this paper, what does it reference)
- **Recommendations** for similar/related papers
- **Author search** with h-index, affiliations, paper counts
- **PDF download** with smart fallback chain (Semantic Scholar -> arXiv -> bioRxiv/medRxiv)
- **Full-text extraction** from downloaded PDFs
- **Fallback search** via arXiv and Google Scholar when Semantic Scholar is unavailable

## Tools

| Tool | Description |
|------|-------------|
| `search_papers` | Search 214M+ papers with year, venue, field, citation filters. Falls back to arXiv, then Google Scholar |
| `get_paper` | Paper details by Semantic Scholar ID, DOI, ArXiv ID (`ArXiv:xxxx`), or PMID (`PMID:xxxx`) |
| `get_citations` | Papers that cite a given paper (up to 1000) |
| `get_references` | Papers referenced by a given paper (up to 1000) |
| `recommend_papers` | Similar/related papers via S2 recommendation engine (up to 500) |
| `search_authors` | Find researchers with h-index, affiliations, paper/citation counts |
| `download_paper` | Download PDF: tries S2 open access, arXiv, bioRxiv/medRxiv |
| `read_paper` | Download a PDF and extract its full text (with optional page limit) |

## Configuration

Configuration is done through environment variables, all of which are optional:

| Variable | Default | Description |
|----------|---------|-------------|
| `S2_API_KEY` | — | [Semantic Scholar API key](https://www.semanticscholar.org/product/api#api-key-form) for higher rate limits |
| `SCHOLAR_DOWNLOAD_DIR` | `./downloads` | Directory for downloaded PDFs |
| `S2_TIMEOUT` | `30` | API request timeout in seconds |
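
To make the defaults in the table above concrete, here is a minimal sketch of how the three variables could be read. The `Settings` class and `load_settings` function are hypothetical names chosen for illustration, not the server's actual code, and treating an empty `S2_API_KEY` as unset is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    api_key: Optional[str]  # None means unauthenticated (free tier)
    download_dir: Path      # where downloaded PDFs are stored
    timeout: float          # API request timeout in seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        # Assumption: an empty string is treated the same as "not set".
        api_key=env.get("S2_API_KEY") or None,
        download_dir=Path(env.get("SCHOLAR_DOWNLOAD_DIR", "./downloads")),
        timeout=float(env.get("S2_TIMEOUT", "30")),
    )
```

Passing a plain dict instead of `os.environ` (for example `load_settings({"S2_TIMEOUT": "5"})`) makes the defaults easy to check in isolation.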

**Rate limits:** Free tier allows 100 requests per 5 minutes. With an API key: ~100 requests per second.
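
If you call the API from your own scripts, a client-side limiter can help you stay under the free-tier budget. The sketch below is one possible sliding-window approach; the class name `SlidingWindowLimiter` and its methods are illustrative assumptions and are not part of scholar-mcp.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window`-second span."""

    def __init__(self, max_calls: int = 100, window: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        # The oldest recorded call must age out before another one fits.
        return self.window - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._stamps.append(self.clock())
```

A caller would typically run `time.sleep(limiter.wait_time())` before each request and `limiter.record()` right after sending it.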

## Examples

Search with filters:
```python
search_papers("transformer architecture", year="2020-2024", venue="NeurIPS", min_citations=100)
```

Look up specific papers:
```python
get_paper("ArXiv:1706.03762")      # by arXiv ID
get_paper("10.1038/nature12373")   # by DOI
get_paper("PMID:19872477")         # by PubMed ID
```
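
The three identifier styles above can be told apart by their prefix. The following is a rough sketch of such a check; `classify_paper_id` is a hypothetical helper, and both the case-insensitive matching and the "anything else is a Semantic Scholar ID" rule are assumptions rather than documented behavior.

```python
def classify_paper_id(paper_id: str) -> str:
    """Return 'arxiv', 'pmid', 'doi', or 's2' for an identifier string."""
    lowered = paper_id.strip().lower()
    if lowered.startswith("arxiv:"):
        return "arxiv"
    if lowered.startswith("pmid:"):
        return "pmid"
    # DOIs begin with "10." followed by a registrant code and a slash.
    if lowered.startswith("10.") and "/" in lowered:
        return "doi"
    # Assumption: everything else is treated as a Semantic Scholar paper ID.
    return "s2"
```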

Explore the citation graph:
```python
get_citations("ArXiv:1706.03762", limit=20)   # who cites this?
get_references("ArXiv:1706.03762", limit=20)  # what does it cite?
recommend_papers("ArXiv:1706.03762")           # find similar work
```
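
Repeated `get_citations` calls amount to a breadth-first walk over the citation graph. The sketch below shows that idea with an injected `fetch_citing` callable standing in for the tool call; the function name `citation_levels` and its parameters are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def citation_levels(root: str,
                    fetch_citing: Callable[[str], Iterable[str]],
                    max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first walk over citing papers; returns {paper_id: depth}."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue  # do not expand papers at the depth limit
        for citing in fetch_citing(current):
            if citing not in depth:  # skip papers already visited
                depth[citing] = depth[current] + 1
                queue.append(citing)
    return depth
```

Keeping `max_depth` small matters in practice, since each expanded paper costs one request against the rate limit.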

Download and read:
```python
download_paper("ArXiv:1706.03762")
read_paper("ArXiv:1706.03762", max_pages=5)
```

## Development

```bash
git clone https://github.com/Liyux3/scholar-mcp.git
cd scholar-mcp
uv venv && uv pip install -e ".[dev]"
uv run pytest tests/
```

## How It Works

```
Query -> Semantic Scholar API (214M+ papers)
            |  fails?
            v
         arXiv API (2.5M preprints)
            |  fails?
            v
         Google Scholar (scraping, last resort)
```

PDF downloads follow a similar chain: the Semantic Scholar open access URL first, then arXiv directly, then bioRxiv/medRxiv for biology and medical preprints.
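
Both chains follow the same pattern: try each source in order and stop at the first one that succeeds. Here is a minimal sketch of that pattern, assuming each source is wrapped in a zero-argument callable that raises on failure. `first_success` is a hypothetical helper, not the server's actual implementation.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_success(sources: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, call) pair in order; return the first result that does not raise."""
    errors: List[str] = []
    for name, call in sources:
        try:
            return name, call()
        except Exception as exc:  # a real server would likely catch narrower errors
            errors.append(f"{name}: {exc}")
    # Every source failed: surface all collected error messages at once.
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

For search, the list would hold Semantic Scholar, arXiv, and Google Scholar in that order; for downloads, the open access URL, arXiv, and bioRxiv/medRxiv.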

## License

MIT
