Metadata-Version: 2.4
Name: kgdata
Version: 7.0.7
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Dist: orjson>=3.9.0,<4.0.0
Requires-Dist: tqdm>=4.64.0,<5.0.0
Requires-Dist: beautifulsoup4>=4.9.3,<5.0.0
Requires-Dist: loguru>=0.7.0,<0.8.0
Requires-Dist: rdflib>=7.0.0,<8.0.0
Requires-Dist: six>=1.16.0,<2.0.0
Requires-Dist: ruamel-yaml>=0.17.21,<0.18.0
Requires-Dist: chardet>=5.0.0,<6.0.0
Requires-Dist: ujson>=5.5.0,<6.0.0
Requires-Dist: redis>=3.5.3,<4.0.0
Requires-Dist: numpy>=2.1.1,<3.0.0
Requires-Dist: requests>=2.28.0,<3.0.0
Requires-Dist: sem-desc>=6.11.2,<7.0.0
Requires-Dist: click>=8.1.3,<9.0.0
Requires-Dist: parsimonious>=0.8.1,<0.9.0
Requires-Dist: hugedict>=2.12.10,<3.0.0
Requires-Dist: rsoup>=3.1.7,<4.0.0
Requires-Dist: lxml>=4.9.0,<5.0.0
Requires-Dist: pqdict>=1.3.0,<2.0.0
Requires-Dist: ftfy>=6.1.3,<7.0.0
Requires-Dist: python-dotenv>=0.19.0,<0.20.0 ; extra == 'dev'
Requires-Dist: pytest>=7.1.3,<8.0.0 ; extra == 'dev'
Requires-Dist: black>=22.10.0,<23.0.0 ; extra == 'dev'
Requires-Dist: pyspark>=3.5.0,<4.0.0 ; extra == 'spark'
Requires-Dist: ray>=2.0.1,<3.0.0 ; extra == 'ray'
Requires-Dist: kgdata[dev,spark,ray] ; extra == 'all'
Provides-Extra: dev
Provides-Extra: spark
Provides-Extra: ray
Provides-Extra: all
License-File: LICENSE
Summary: Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)
Keywords: knowledge-graph,wikidata,wikipedia,dbpedia
Home-Page: https://github.com/binh-vu/kgdata
Author-email: Binh Vu <binh@toan2.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: homepage, https://github.com/binh-vu/kgdata
Project-URL: repository, https://github.com/binh-vu/kgdata

# kgdata ![PyPI](https://img.shields.io/pypi/v/kgdata) ![Documentation](https://readthedocs.org/projects/kgdata/badge/?version=latest&style=flat)

KGData is a library for processing dumps of Wikipedia and Wikidata. What it can do:

- Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
- Create embedded key-value databases to access entities from the dumps.
- Extract Wikidata ontology.
- Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
- Create Pyserini indices to search Wikidata’s entities.
- and more

For full documentation, see [the website](https://kgdata.readthedocs.io/).

## Installation

From PyPI (using pre-built binaries):

```bash
pip install "kgdata[spark]"   # omit the [spark] extra and install PySpark yourself if your cluster runs a different version
```

