Metadata-Version: 2.4
Name: bibfusion
Version: 1.0.0
Summary: A Python package for merging, preprocessing, and enriching bibliographic data from WoS, Scopus, and OpenAlex.
Author: Universidad Nacional de Colombia
License: MIT
Project-URL: Repository, https://github.com/ladmepaz/preprocessing
Keywords: bibliometrics,scientometrics,wos,scopus,openalex,tree of science
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: rapidfuzz>=3.0
Requires-Dist: requests>=2.30
Dynamic: license-file

# Preprocessing Package

This package preprocesses, cleans, and harmonizes bibliographic data from multiple scientific sources, primarily **Web of Science (WoS)** and **Scopus**.
It supports bibliometric and scientometric analyses by transforming raw exports into structured pandas DataFrames.

⚠️ **Status:** under active development. APIs and internal structures may change.

---

## Features

- Parsing of raw bibliographic exports into structured DataFrames
- Support for multiple data sources (WoS, Scopus, Crossref, OpenAlex)
- Reference enrichment and linkage across sources
- Designed for reproducible research workflows

---

## Core Functions

- **`wos_df()`**  
  Transforms Web of Science `.txt` export files into pandas DataFrames.

- **`scopus_df()`**  
  Converts Scopus `.bib` export files into pandas DataFrames.

- **`doi_crossref()`**  
  Queries the Crossref API to retrieve metadata associated with a given DOI.

- **`scopus_ref()`**  
  Processes and links article references, identifying relationships between cited documents.
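
To illustrate the kind of transformation `wos_df()` performs, here is a minimal sketch of parsing the WoS tagged plain-text format into one dictionary per record. It is not the package's implementation: the function name `parse_wos_text`, the `MULTI_VALUE_TAGS` set, and the sample data are all hypothetical, and the sketch assumes the usual WoS layout (two-letter field tags, indented continuation lines, `ER` closing each record).

```python
from collections import defaultdict

# Hypothetical choice: tags whose continuation lines are separate items
# (authors, cited references) rather than wrapped text.
MULTI_VALUE_TAGS = {"AU", "AF", "CR"}


def parse_wos_text(text):
    """Parse WoS tagged plain text into a list of dicts, one per record."""
    records = []
    current = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        head, value = line[:2], line[3:].strip()
        if head == "ER":
            # End of record: join multi-item fields with "; ", wrapped text with " ".
            records.append({
                key: ("; " if key in MULTI_VALUE_TAGS else " ").join(values)
                for key, values in current.items()
            })
            current, tag = defaultdict(list), None
        elif head in ("FN", "VR", "EF"):
            continue  # file header and end-of-file markers carry no record data
        elif head.strip():
            tag = head  # a new field starts on this line
            current[tag].append(value)
        elif tag is not None:
            current[tag].append(value)  # indented continuation of the previous field
    return records


# Made-up sample data in the assumed layout.
SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Smith, J
   Doe, A
TI An example title that wraps
   onto a second line
PY 2020
ER

PT J
AU Lee, K
TI Another title
PY 2021
ER

EF
"""

records = parse_wos_text(SAMPLE)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(records)` to obtain one row per record and one column per field tag.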

---

## Installation

After cloning the repository, install the required dependencies from its root directory:

```bash
pip install -r requirements.txt
