Metadata-Version: 2.1
Name: kgdata
Version: 3.4.2
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Dist: orjson >= 3.8.2, < 4.0.0
Requires-Dist: tqdm >= 4.64.0, < 5.0.0
Requires-Dist: beautifulsoup4 >= 4.9.3, < 5.0.0
Requires-Dist: cityhash >= 0.4.2, < 0.5.0
Requires-Dist: pyspark >= 3.3.0, < 4.0.0
Requires-Dist: loguru >= 0.6.0, < 0.7.0
Requires-Dist: rdflib >= 6.1.1, < 7.0.0
Requires-Dist: six >= 1.16.0, < 2.0.0
Requires-Dist: ruamel.yaml >= 0.17.21, < 0.18.0
Requires-Dist: chardet >= 5.0.0, < 6.0.0
Requires-Dist: ujson >= 5.5.0, < 6.0.0
Requires-Dist: redis >= 3.5.3, < 4.0.0
Requires-Dist: numpy >= 1.22.3, < 2.0.0
Requires-Dist: fastnumbers >= 3.2.1, < 4.0.0
Requires-Dist: requests >= 2.28.0, < 3.0.0
Requires-Dist: sem-desc >= 4.4.2, < 5.0.0
Requires-Dist: click >= 8.0.0, <= 8.0.4
Requires-Dist: parsimonious >= 0.8.1, < 0.9.0
Requires-Dist: hugedict >= 2.9.1, < 3.0.0
Requires-Dist: rsoup >= 2.5.1, < 3.0.0
Requires-Dist: lxml >= 4.9.0, < 5.0.0
Requires-Dist: ray >= 2.0.1, < 3.0.0
Requires-Dist: python-dotenv >= 0.19.0, < 0.20.0; extra == 'dev'
Requires-Dist: pytest >= 7.1.3, < 8.0.0; extra == 'dev'
Requires-Dist: black >= 22.10.0, < 23.0.0; extra == 'dev'
Provides-Extra: dev
License-File: LICENSE
Summary: Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)
Author-email: Binh Vu <binh@toan2.com>
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: repository, https://github.com/binh-vu/kgdata
Project-URL: homepage, https://github.com/binh-vu/kgdata

# kgdata ![PyPI](https://img.shields.io/pypi/v/kgdata) ![Documentation](https://readthedocs.org/projects/kgdata/badge/?version=latest&style=flat)

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

- Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
- Create embedded key-value databases to access entities from the dumps.
- Extract Wikidata ontology.
- Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
- Create Pyserini indices to search Wikidata entities.
- and more
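
To make the clean-up step in the first bullet concrete, here is a minimal, library-independent sketch of what "resolving redirects" and "removing dangling references" can mean on a toy link graph. It does not use kgdata's API: the function names `resolve_redirect` and `clean_links`, and the dictionary-based data layout, are assumptions made purely for illustration.

```python
from typing import Dict, List


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain to its final target, stopping on cycles."""
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


def clean_links(
    entities: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Rewrite each link to its redirect target and drop links to unknown entities."""
    cleaned = {}
    for entity_id, targets in entities.items():
        resolved = (resolve_redirect(t, redirects) for t in targets)
        # A reference is "dangling" if its target is not in the dump at all.
        cleaned[entity_id] = [t for t in resolved if t in entities]
    return cleaned


# Hypothetical toy data: Q2 redirects to Q3, and Q9 does not exist.
entities = {"Q1": ["Q2", "Q9"], "Q3": ["Q1"]}
redirects = {"Q2": "Q3"}
result = clean_links(entities, redirects)
```

In this toy example, the link from `Q1` to `Q2` is rewritten to `Q3`, and the link to the missing `Q9` is dropped. The real pipeline works on full dumps and may handle these cases differently; see the documentation for the actual behavior.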

For full documentation, see [the website](https://kgdata.readthedocs.io/).

## Installation

From PyPI (using pre-built binaries):

```bash
pip install kgdata
```
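
One way to confirm the installation succeeded is to query the installed distribution's version through the standard library. This sketch relies only on `importlib.metadata` (available in Python 3.8+, matching the package's minimum version); the helper name `installed_version` is hypothetical and not part of kgdata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if kgdata is installed, otherwise None.
    print(installed_version("kgdata"))
```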

