Metadata-Version: 2.4
Name: kgdata_core
Version: 4.0.2
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Dist: python-dotenv>=0.19.0,<0.20.0 ; extra == 'dev'
Requires-Dist: pytest>=8.3.2,<9.0.0 ; extra == 'dev'
Requires-Dist: kgdata-core[dev] ; extra == 'all'
Provides-Extra: dev
Provides-Extra: all
License-File: LICENSE
Summary: Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)
Home-Page: https://github.com/binh-vu/kgdata
Author-email: Binh Vu <binh@toan2.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: homepage, https://github.com/binh-vu/kgdata
Project-URL: repository, https://github.com/binh-vu/kgdata

# kgdata ![PyPI](https://img.shields.io/pypi/v/kgdata) ![Documentation](https://readthedocs.org/projects/kgdata/badge/?version=latest&style=flat)

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

- Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
- Create embedded key-value databases to access entities from the dumps.
- Extract Wikidata ontology.
- Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
- Create Pyserini indices to search Wikidata’s entities.
- And more.

For full documentation, see [the website](https://kgdata.readthedocs.io/).

## Installation

From PyPI (using pre-built binaries):

```bash
pip install "kgdata[spark]"   # omit the [spark] extra if you need to install a Spark version that matches your cluster
```

