Metadata-Version: 2.4
Name: bibtex-ref-extractor
Version: 0.1.0
Summary: Extract and manage BibTeX references from PDF files with MCP support
Project-URL: Homepage, https://github.com/Atun-tunz/bibtex-ref-extractor--
Project-URL: Documentation, https://github.com/Atun-tunz/bibtex-ref-extractor--#readme
Project-URL: Repository, https://github.com/Atun-tunz/bibtex-ref-extractor--
Project-URL: Issues, https://github.com/Atun-tunz/bibtex-ref-extractor--/issues
Author-email: Atun-tunz <xuhaoyang2002@163.com>
License: MIT
License-File: LICENSE
Keywords: academic,bibtex,citation,doi,mcp,paper,pdf,reference
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Requires-Dist: mcp>=1.0.0
Requires-Dist: pymupdf>=1.23.0
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# BibTeX Reference Extractor

Extract references from PDFs, look up their DOIs, and manage a BibTeX citation library.

[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)
[![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-green)](https://modelcontextprotocol.io/)
[![MIT License](https://img.shields.io/badge/License-MIT-yellow)](https://opensource.org/licenses/MIT)

***

## English

### What is this?

A tool that extracts the reference list from a PDF paper, looks up each reference's DOI, and generates the corresponding BibTeX entries automatically.

### Project Structure

```
bibtex-ref-extractor/
├── src/bibtex_extractor/     # Core Python package
│   ├── pdf_reader.py         # Extract refs from PDF
│   ├── reference_lookup.py   # Query DOI from databases
│   ├── bibtex_manager.py     # Manage .bib files
│   └── cli.py                # Command line interface
├── mcp_server/               # MCP server for AI assistants
│   └── server.py
└── pyproject.toml
```
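To give a feel for the kind of work the extraction step involves, here is a minimal sketch that splits an already-extracted reference section into individual entries. It assumes IEEE-style `[n]` markers, and `split_numbered_references` is a hypothetical helper for illustration. It is not the code in `pdf_reader.py`, which may work quite differently.

```python
import re

# Hypothetical helper, not part of bibtex_extractor: assumes each reference
# starts with a bracketed number such as "[1]".
_MARKER = re.compile(r"\[(\d+)\]\s*")


def split_numbered_references(text: str) -> list[str]:
    """Return one whitespace-normalized string per "[n]"-numbered reference."""
    # With one capture group, re.split yields
    # [prefix, number, body, number, body, ...].
    parts = _MARKER.split(text)
    bodies = parts[2::2]
    # Collapse line breaks and indentation left over from the PDF layout.
    return [" ".join(body.split()) for body in bodies if body.strip()]


# Placeholder entries, not real citations.
sample = """References
[1] A. Author. First example title.
    Example Venue, 2020.
[2] B. Author. Second example title. Example Venue, 2021.
"""
refs = split_numbered_references(sample)
for ref in refs:
    print(ref)
```

Real reference lists also use author-year or unnumbered styles, so a production extractor would need more than one pattern.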

### Installation

```bash
# Clone and install
git clone https://github.com/Atun-tunz/bibtex-ref-extractor--.git
cd bibtex-ref-extractor--
pip install -e .
```

### Usage

**Command Line:**

```bash
# Look up a paper
bibtex-extractor lookup "Deep Knowledge Tracing" -a Piech

# Extract from PDF
bibtex-extractor extract paper.pdf

# Process PDF to BibTeX
bibtex-extractor process paper.pdf -b refs.bib
```

**Python API:**

```python
import sys
sys.path.insert(0, 'src')  # Only needed when running from the repo root without `pip install -e .`

from bibtex_extractor import lookup_reference, BibTeXManager

# Look up a paper
result = lookup_reference("Deep Knowledge Tracing", "Piech")
print(result['bibtex'])

# Manage library
manager = BibTeXManager("refs.bib")
manager.add_entry(result['bibtex'])
manager.save()
```

**MCP for AI Assistants:**

Add the following to the Claude Desktop config (`claude_desktop_config.json`), replacing `cwd` with the path to your clone:

```json
{
  "mcpServers": {
    "bibtex": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/path/to/bibtex-ref-extractor"
    }
  }
}
```
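If the config file already lists other servers, the `bibtex` entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dict. `add_bibtex_server` is a hypothetical helper, not something this project ships, and the `other` server entry is made up for the example.

```python
import json


def add_bibtex_server(config: dict, project_dir: str) -> dict:
    """Return a copy of `config` with the `bibtex` MCP server entry added."""
    # Copy the existing server map so the caller's dict is left untouched.
    servers = dict(config.get("mcpServers", {}))
    servers["bibtex"] = {
        "command": "python",
        "args": ["-m", "mcp_server.server"],
        "cwd": project_dir,
    }
    return {**config, "mcpServers": servers}


# Made-up existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
merged = add_bibtex_server(existing, "/path/to/bibtex-ref-extractor")
print(json.dumps(merged, indent=2))
```

Writing `merged` back to `claude_desktop_config.json` is left out on purpose, since the file's location differs between operating systems.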

### Data Sources

CrossRef → Semantic Scholar → arXiv → DBLP
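Assuming the arrows denote a fallback order (try the next source only when the previous one finds nothing), the lookup can be pictured as a simple chain. The sketch below uses stub functions in place of real API calls. `first_match` and the stubs are hypothetical names, and the actual logic in `reference_lookup.py` may differ.

```python
from typing import Callable, Optional

# A source takes a paper title and returns a metadata dict, or None on a miss.
Source = Callable[[str], Optional[dict]]


def first_match(title: str, sources: list[tuple[str, Source]]) -> Optional[dict]:
    """Query each source in order and return the first non-empty result."""
    for name, query in sources:
        try:
            record = query(title)
        except Exception:
            # Treat a failing source (timeout, bad response) as a miss.
            continue
        if record:
            return {"source": name, **record}
    return None


# Stub sources for illustration only; real ones would call the remote APIs.
calls: list[str] = []


def crossref_stub(title: str) -> Optional[dict]:
    calls.append("CrossRef")
    return None  # simulate a miss


def semantic_scholar_stub(title: str) -> Optional[dict]:
    calls.append("Semantic Scholar")
    return {"doi": "10.0000/placeholder"}  # placeholder, not a real DOI


def arxiv_stub(title: str) -> Optional[dict]:
    calls.append("arXiv")
    return None


chain = [
    ("CrossRef", crossref_stub),
    ("Semantic Scholar", semantic_scholar_stub),
    ("arXiv", arxiv_stub),
]
hit = first_match("Deep Knowledge Tracing", chain)
print(hit, calls)
```

Stopping at the first hit keeps the number of remote requests low, at the cost of never comparing results across sources.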

***

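## MCP Notes

### What is MCP?

MCP (Model Context Protocol) is a standard protocol that lets AI assistants such as Claude call external tools.

### MCP implementation in this project

This project includes an MCP server (`mcp_server/server.py`) that provides the following tools:

| Tool                          | Purpose                            |
| ----------------------------- | ---------------------------------- |
| `extract_references_from_pdf` | Extract references from a PDF      |
| `lookup_reference`            | Look up a DOI and BibTeX entry     |
| `add_to_bibtex`               | Add an entry to the BibTeX library |
| `search_bibtex`               | Search entries                     |
| `process_pdf_references`      | Run the full workflow              |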

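### Does MCP need to be deployed?

**No online deployment is needed.** The MCP server runs locally, and the AI assistant communicates with it over standard input and output (stdio).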
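To make the stdio communication concrete: MCP messages are JSON-RPC 2.0 objects, and over the stdio transport each message is written as a single newline-terminated line. The sketch below builds the `tools/call` request a client might send for the `lookup_reference` tool. The argument names `title` and `author` are assumptions for illustration, since the tool's actual input schema is defined in `mcp_server/server.py`.

```python
import json


def tool_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one stdio line."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without `indent` emits no raw newlines, so the trailing
    # "\n" is the only one and acts as the message delimiter.
    return json.dumps(payload, ensure_ascii=False) + "\n"


# Argument names are assumed, not taken from the server's schema.
line = tool_call_message(
    1,
    "lookup_reference",
    {"title": "Deep Knowledge Tracing", "author": "Piech"},
)
print(line, end="")
```

In practice the AI assistant's MCP client does this for you, and it performs an `initialize` handshake before any tool call.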

***

## License

MIT License - see [LICENSE](LICENSE)
