Metadata-Version: 2.4
Name: wikimine
Version: 0.0.6
Project-URL: Documentation, https://github.com/dataset.sh/wikimine#readme
Project-URL: Issues, https://github.com/dataset.sh/wikimine/issues
Project-URL: Source, https://github.com/dataset.sh/wikimine
Author-email: Hao Wu <haowu@dataset.sh>
License-Expression: MIT
License-File: LICENSE.txt
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.8
Requires-Dist: click
Requires-Dist: cmfn
Requires-Dist: mwparserfromhell
Requires-Dist: mwxml
Requires-Dist: peewee
Requires-Dist: tqdm
Description-Content-Type: text/markdown

# wikimine

[![PyPI - Version](https://img.shields.io/pypi/v/wikimine.svg)](https://pypi.org/project/wikimine)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/wikimine.svg)](https://pypi.org/project/wikimine)

-----

## Table of Contents

- [Installation](#installation)
- [Motivation](#motivation)
- [Data Modeling](#data-modeling)
- [Usage](#usage)
- [License](#license)

## Installation

```console
pip install wikimine
```

## Motivation

`wikidata` contains a large amount of knowledge modeled as a very powerful graph structure.

This data structure enables many applications,
but it also has a steep learning curve for most programmers.

To be able to use `wikidata`, a programmer needs to understand its

It also relies on a `triplestore/graph` database and a new query language, `sparql`,
both of which currently have limited learning resources.

This project translates wikidata into a data model that is more familiar to most developers
and uses only `sqlite`, removing the need to set up any new database system.

As a result,
our approach loses some of the functionality of `wikidata`'s original design,
but it is more familiar to most developers while still retaining much of `wikidata`'s usefulness.

Developers can explore `wikidata` with a familiar mindset and familiar tools.
We hope `wikimine` can serve as a gateway to wikidata, graph databases, and the semantic web,
and allow more people to contribute to those related projects.

## Data Modeling

`wikimine` contains the following tables (using peewee ORM):

```python
# Imports are a sketch: BaseModel is wikimine's own peewee base class (not shown),
# and JSONField is assumed to be the SQLite one from peewee's playhouse extension.
from peewee import CharField
from playhouse.sqlite_ext import JSONField


class WikidataEntityLabel(BaseModel):
    """
    wikidata entity labels.
    example: (Q9191, 'en', 'René Descartes')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityDescriptions(BaseModel):
    """
    wikidata entity descriptions: one short descriptive phrase per entity and language.
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataEntityAliases(BaseModel):
    """
    wikidata entity aliases.
    example:
    (Q9191, 'en', 'Descartes')
    (Q9191, 'en', 'Cartesius')
    """
    entity_id = CharField()
    language = CharField()
    value = CharField()


class WikidataClaim(BaseModel):
    """
    A wikidata claim is a statement about a wikidata item, stored as
    (item, property, value).

    You can read more about this concept here:
    https://www.wikidata.org/wiki/Wikidata:Introduction

    This table is indexed by
        (source_entity, property_id, target_entity).
        (property_id, target_entity).
        (target_entity).
    """
    source_entity = CharField()  # entity id of the item the claim is about.
    property_id = CharField()
    body = JSONField()  # the full claim body.
    target_entity = CharField(null=True)  # only set when mainsnak.datavalue is a wikibase-entityid

```
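
To make the indexing scheme concrete, here is a minimal sketch that rebuilds the claim table by hand with Python's built-in `sqlite3` module. The table name `claim`, the index names, and the helper `instances_of` are hypothetical; only the column names and the three indexed column groups come from the model above. The claim bodies are toy JSON, not real dump content.

```python
import json
import sqlite3

# Hypothetical table and index names; only the columns mirror WikidataClaim.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claim (
        source_entity TEXT NOT NULL,
        property_id   TEXT NOT NULL,
        body          TEXT NOT NULL,  -- JSON-encoded claim body
        target_entity TEXT            -- NULL unless the value is another entity
    );
    CREATE INDEX claim_spo ON claim (source_entity, property_id, target_entity);
    CREATE INDEX claim_po  ON claim (property_id, target_entity);
    CREATE INDEX claim_o   ON claim (target_entity);
""")

rows = [
    # Q9191 (René Descartes) --P31 (instance of)--> Q5 (human)
    ("Q9191", "P31", json.dumps({"toy": "entity-valued claim"}), "Q5"),
    # Q9191 --P569 (date of birth)--> a time value, so target_entity stays NULL
    ("Q9191", "P569", json.dumps({"toy": "time-valued claim"}), None),
]
conn.executemany("INSERT INTO claim VALUES (?, ?, ?, ?)", rows)


def instances_of(class_id):
    """Return ids of entities with an 'instance of' (P31) claim pointing at class_id."""
    cur = conn.execute(
        "SELECT source_entity FROM claim WHERE property_id = ? AND target_entity = ?",
        ("P31", class_id),
    )
    return [row[0] for row in cur]


humans = instances_of("Q5")
print(humans)
```

A lookup like this filters on `property_id` and `target_entity`, which is the access pattern the `(property_id, target_entity)` index is meant to serve.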

## Usage

### Process the wikidata json dump

After downloading the dump, run:

```shell
# first split the dump into smaller pieces for easier processing.
python -m wikimine.cli split ./path-to-dump ./path-to-workspace-folder
# parse and import to sqlite.
python -m wikimine.cli import /path/to/db ./path-to-workspace-folder
# build indices.
python -m wikimine.cli index /path/to/db
```

### Connect to db

```python
from wikimine import auto_connect, connect
# auto_connect() searches the following sources for the database path
# and connects to it automatically:
#   1. the environment variable WIKIMINE_WIKIDATA_DB
#   2. ~/.wikimine.config.json: {"db_path": "/path/to/db"}
auto_connect()
# or
connect('/path/to/db')
```
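
The lookup order described above can be sketched as a small standalone function. This is not `wikimine`'s implementation: `resolve_db_path` and its parameters are hypothetical, and it assumes the listed order is also the precedence (environment variable first, then the config file).

```python
import json
import os
from pathlib import Path


def resolve_db_path(env=None, config_file=None):
    """Hypothetical helper: return the first database path found, or None."""
    env = os.environ if env is None else env
    if config_file is None:
        config_file = Path.home() / ".wikimine.config.json"
    config_file = Path(config_file)

    # 1. Assumed precedence: the environment variable wins if it is set.
    from_env = env.get("WIKIMINE_WIKIDATA_DB")
    if from_env:
        return from_env

    # 2. Otherwise fall back to the JSON config file, if it exists.
    if config_file.is_file():
        with config_file.open(encoding="utf-8") as fh:
            return json.load(fh).get("db_path")

    return None


print(resolve_db_path(env={"WIKIMINE_WIKIDATA_DB": "/path/to/db"}))
```

Passing `env` and `config_file` explicitly keeps the sketch easy to try without touching the real environment or home directory.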

### Label and Link lookup

```python
from wikimine import lookup_label, lookup_wikilink
import wikimine.relations as rel
import wikimine.entity as ent

print(ent.People.Descartes)
print(lookup_label(ent.People.Descartes))
print(lookup_wikilink(ent.People.Descartes))

print(lookup_label(rel.People.lang_written))
```

#### Other commonly used entities and relations

```python
from wikimine.utils import list_static_class_members
import wikimine.relations as rel
import wikimine.entity as ent

print('class People:')
for k, v in list_static_class_members(ent.People):
    print(f'  {k}: {v}')
print()

print('class Location:')
for k, v in list_static_class_members(ent.Location):
    print(f'  {k}: {v}')
print()

print('class Company:')
for k, v in list_static_class_members(ent.Company):
    print(f'  {k}: {v}')
print()

print('class WrittenWork:')
for k, v in list_static_class_members(ent.WrittenWork):
    print(f'  {k}: {v}')
print()

```

### Query the knowledge graph

```python
from wikimine.query import (
    list_instances_of,
    get_common_classes,
    get_common_edges,
    get_classes_of_instance,
    get_profile,
    get_tree,
)

import wikimine.relations as rel
import wikimine.entity as ent
import pprint

# list first 50 people
print("List first 50 people")
people = list_instances_of(ent.CommonTypes.Human, limit=50)
pprint.pp(people)
print('\n ----- \n')

# get the types shared by a group of instances
print("Get the common types of a group of instances")
common_classes = get_common_classes([
    ent.Company.VW,
    ent.Company.Xerox,
    ent.Company.Apple,
])
common_classes.print_summary()
print('\n ----- \n')

print('List outgoing relations common to a group of entities')
common_edges = get_common_edges(people)
common_edges.print_summary()
print('\n ----- \n')

print('List all classes of a given entity')
classes = get_classes_of_instance(ent.Company.Apple)
pprint.pp(classes)
print('\n ----- \n')

print('Get all outgoing edges and their values for a given entity')
profile = get_profile(ent.WrittenWork.A_Mathematical_Theory_of_Communication)
pprint.pp(profile)
print('\n ----- \n')


print('Show all types that have human as a subtype, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of)
tree.show()
print('\n ----- \n')

print('Show all types that are subtypes of human, recursively')
tree = get_tree(ent.CommonTypes.Human, rel.TypingRelations.subclass_of, direction='backward')
tree.show()

```
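
To illustrate what the two `get_tree` directions mean, here is a small self-contained sketch of a recursive traversal over `subclass_of`-style edges. It is not how `wikimine` implements `get_tree`: the function `walk`, the nested-dict result, and the toy edge list are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy (child, parent) pairs standing in for subclass_of claims; not query results.
SUBCLASS_OF = [
    ("dog", "mammal"),
    ("cat", "mammal"),
    ("mammal", "animal"),
]


def walk(start, edges, direction="forward"):
    """Return a nested dict of the nodes reachable from start.

    'forward' follows child -> parent (towards supertypes);
    'backward' follows parent -> child (towards subtypes).
    """
    neighbours = defaultdict(list)
    for child, parent in edges:
        if direction == "forward":
            neighbours[child].append(parent)
        else:
            neighbours[parent].append(child)

    def build(node, seen):
        # `seen` holds the nodes on the current path, which guards against cycles.
        return {
            nxt: build(nxt, seen | {nxt})
            for nxt in neighbours[node]
            if nxt not in seen
        }

    return build(start, {start})


supertypes = walk("mammal", SUBCLASS_OF)
subtypes = walk("mammal", SUBCLASS_OF, direction="backward")
print(supertypes)
print(subtypes)
```

In this toy data, the forward walk from `mammal` reaches only `animal`, while the backward walk reaches `dog` and `cat`.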

## License

`wikimine` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.
