Metadata-Version: 2.4
Name: SmartResume
Version: 0.1.1
Summary: Intelligent Resume Parsing System
License: Apache-2.0
Project-URL: Homepage, https://github.com/alibaba/SmartResume
Project-URL: Repository, https://github.com/alibaba/SmartResume
Project-URL: Documentation, https://github.com/alibaba/SmartResume#readme
Project-URL: Bug Tracker, https://github.com/alibaba/SmartResume/issues
Keywords: resume,parsing,ocr,llm,pdf,layout
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: pillow>=9.5
Requires-Dist: opencv-python>=4.7
Requires-Dist: pdfplumber>=0.10.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tiktoken>=0.5.1
Requires-Dist: openai>=1.30.0
Requires-Dist: json-repair>=0.16.0
Requires-Dist: easyocr>=1.7.0
Requires-Dist: langdetect>=1.0.9
Requires-Dist: vllm>=0.2.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: accelerate>=0.20.0
Requires-Dist: modelscope>=1.9.0
Requires-Dist: requests>=2.25.0
Requires-Dist: onnxruntime>=1.17.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0; extra == "docs"
Provides-Extra: gradio
Requires-Dist: gradio<6,>=5.34; extra == "gradio"
Requires-Dist: gradio-pdf>=0.0.22; extra == "gradio"
Dynamic: license-file

# SmartResume - Intelligent Resume Parsing System

<div align="center">
  <img src="assets/logo.png" alt="SmartResume Logo" width="80%" >
</div>

<p align="center">
    💻 <a href="https://github.com/alibaba/SmartResume">Code</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://www.modelscope.cn/models/Alibaba-EI/SmartResume">Model</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/studios/Alibaba-EI/SmartResumeDemo/summary">Demo</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://arxiv.org/abs/2510.09722">Technical Report</a>
</p>

<p align="right"><b>English</b> | <a href="README_CN.md">中文</a></p>


## Project Introduction
SmartResume is a layout-aware resume parsing system. It ingests resumes in PDF, image, and common Office formats, extracts clean text (OCR + PDF metadata), reconstructs reading order with layout detection, and uses LLMs to convert the content into structured fields such as basic info, education, and work experience.

[demo](https://github.com/user-attachments/assets/5814b880-cdb5-41d8-9534-cf6e6909c136)

## Quick Start

### Requirements

- Python >= 3.9
- CUDA >= 11.0 (optional, for GPU acceleration)
- Memory >= 8GB
- Storage >= 10GB

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/alibaba/SmartResume.git
cd SmartResume
```

2. **Create conda environment**
```bash
conda create -n resume_parsing python=3.9
conda activate resume_parsing
```

3. **Install dependencies**
```bash
pip install -e .
```

4. **Configure environment**
```bash
# Copy configuration template
cp configs/config.yaml.example configs/config.yaml
# Edit configuration file and add API keys
vim configs/config.yaml
```

### Basic Usage

#### Method 1: Command Line Interface (Recommended)

```bash
# Parse a single resume file
python scripts/start.py --file resume.pdf

# Specify extraction types
python scripts/start.py --file resume.pdf --extract_types basic_info work_experience education
```
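
To run the CLI over a whole folder, the command above can be driven from a small script. The following is only a sketch: `build_command` and `parse_folder` are hypothetical helpers (not part of SmartResume), the folder name is a placeholder, and only the `--file` and `--extract_types` flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Sequence


def build_command(resume: Path, extract_types: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for scripts/start.py using the flags shown above."""
    cmd = [sys.executable, "scripts/start.py", "--file", str(resume)]
    if extract_types:
        cmd += ["--extract_types", *extract_types]
    return cmd


def parse_folder(folder: Path, extract_types: Optional[Sequence[str]] = None) -> None:
    """Run the CLI once per PDF in `folder` (hypothetical helper)."""
    for resume in sorted(folder.glob("*.pdf")):
        # check=False keeps the loop going if a single resume fails to parse.
        completed = subprocess.run(build_command(resume, extract_types), check=False)
        if completed.returncode != 0:
            print(f"failed: {resume}", file=sys.stderr)


# Example (run from the repository root; "resumes" is a placeholder folder):
# parse_folder(Path("resumes"), ["basic_info", "education"])
```

Starting a new process per file reloads the models each time, so for large batches the Python API is likely the better fit.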

#### Method 2: Python API

```python
from smartresume import ResumeAnalyzer

# Initialize analyzer
analyzer = ResumeAnalyzer(init_ocr=True, init_llm=True)

# Parse resume
result = analyzer.pipeline(
    cv_path="resume.pdf",
    resume_id="resume_001",
    extract_types=["basic_info", "work_experience", "education"]
)

print(result)
```
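
The same call can be reused across many files with a single analyzer instance. The sketch below assumes only the `pipeline(cv_path=..., resume_id=..., extract_types=...)` call shown above; `parse_many` and `save_results` are hypothetical helpers, and since the structure of the returned result is not documented here, it is stored as-is with a string fallback for JSON serialization.

```python
import json
from pathlib import Path
from typing import Any, Dict, Sequence


def parse_many(analyzer: Any, folder: Path, extract_types: Sequence[str]) -> Dict[str, Any]:
    """Call analyzer.pipeline() for every PDF in `folder`, keyed by file stem."""
    results: Dict[str, Any] = {}
    for path in sorted(folder.glob("*.pdf")):
        try:
            results[path.stem] = analyzer.pipeline(
                cv_path=str(path),
                resume_id=path.stem,
                extract_types=list(extract_types),
            )
        except Exception as exc:
            # Record the failure and continue with the remaining files.
            results[path.stem] = {"error": str(exc)}
    return results


def save_results(results: Dict[str, Any], out_file: Path) -> None:
    """Write all results to one JSON file."""
    # default=str is a fallback in case a result contains non-JSON types.
    out_file.write_text(
        json.dumps(results, ensure_ascii=False, indent=2, default=str),
        encoding="utf-8",
    )
```

With the analyzer from the example above, a call might look like `save_results(parse_many(analyzer, Path("resumes"), ["basic_info"]), Path("results.json"))`, where both paths are placeholders.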

### Local Model Deployment

SmartResume now supports local model deployment using vLLM, reducing dependency on external APIs:

```bash
# Download the Qwen-0.6B-resume model
python scripts/download_models.py

# Deploy model
bash scripts/start_vllm.sh
```

For a detailed local model deployment guide, see [LOCAL_MODELS](docs/local-models.md).
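
Once the server is running, a quick reachability check can help before pointing SmartResume at it. This sketch relies on the `/v1/models` endpoint of vLLM's OpenAI-compatible server; the address `http://localhost:8000/v1` is an assumption (vLLM's default port) and may differ from what `scripts/start_vllm.sh` configures. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_models(
    base_url: str = "http://localhost:8000/v1",  # assumed address, adjust as needed
    timeout: float = 5.0,
) -> Optional[List[str]]:
    """Return the model ids served at `base_url`, or None if it is unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return model_ids(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None
```

A return value of `None` suggests the server is not up yet or listens on a different address; a list shows the model names the server accepts in requests.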


## Key Features

| Metric Category | Specific Metric | Value | Description |
|----------------|----------------|-------|-------------|
| **Layout Detection** | mAP@0.5 | **92.1%** | High layout detection accuracy |
| **Information Extraction** | Overall Accuracy | **93.1%** | High accuracy |
| **Processing Speed** | Single Page Time | **1.22s** | High performance |
| **Language Support** | Supported Languages | **many** | Covering major global languages |


### Benchmark Results

For detailed benchmark results, see [Benchmark Results](docs/benchmark-results-en.md).

## Configuration

For detailed configuration options, see the [Configuration Guide](docs/configuration.md).

## License Information

This project is licensed under the Apache-2.0 license; see [LICENSE](LICENSE).

Some models in this project were trained with third-party detectors. We plan to replace them with models under more permissive licenses to give users more flexibility.

## Acknowledgments

- [PDFplumber](https://github.com/jsvine/pdfplumber)
- [EasyOCR](https://github.com/JaidedAI/EasyOCR)

## Citation
```bibtex
@article{Zhu2025SmartResume,
  title={Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation},
  author={Fanwei Zhu and Jinke Yu and Zulong Chen and Ying Zhou and Junhao Ji and Zhibo Yang and Yuxue Zhang and Haoyuan Hu and Zhenghao Liu},
  journal={arXiv preprint arXiv:2510.09722},
  year={2025},
  url={https://arxiv.org/abs/2510.09722}
}
```

---

**Note**: Please ensure compliance with applicable laws, regulations, and privacy policies.
