Metadata-Version: 2.1
Name: pykeen
Version: 1.0.2
Summary: A package for training and evaluating multimodal knowledge graph embeddings
Home-page: https://github.com/pykeen/pykeen
Author: Mehdi Ali
Author-email: mehdi.ali@cs.uni-bonn.de
Maintainer: Mehdi Ali
Maintainer-email: mehdi.ali@cs.uni-bonn.de
License: MIT
Download-URL: https://github.com/pykeen/pykeen/releases
Project-URL: Bug Tracker, https://github.com/pykeen/pykeen/issues
Keywords: Knowledge Graph Embeddings,Machine Learning,Data Mining,Linked Data
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: dataclasses-json
Requires-Dist: numpy
Requires-Dist: click
Requires-Dist: click-default-group
Requires-Dist: sklearn
Requires-Dist: torch
Requires-Dist: tqdm
Requires-Dist: requests
Requires-Dist: optuna (<1.2.0,>=1.0.0)
Requires-Dist: pandas (>=1.0.0)
Requires-Dist: tabulate
Requires-Dist: dataclasses ; python_version < "3.7"
Provides-Extra: docs
Requires-Dist: sphinx ; extra == 'docs'
Requires-Dist: sphinx-rtd-theme ; extra == 'docs'
Requires-Dist: sphinx-click ; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints ; extra == 'docs'
Requires-Dist: sphinx-automodapi ; extra == 'docs'
Requires-Dist: texext ; extra == 'docs'
Provides-Extra: mlflow
Requires-Dist: mlflow (>=1.8.0) ; extra == 'mlflow'
Provides-Extra: plotting
Requires-Dist: matplotlib ; extra == 'plotting'
Requires-Dist: seaborn ; extra == 'plotting'
Provides-Extra: templating
Requires-Dist: jinja2 ; extra == 'templating'

<p align="center">
  <img src="docs/source/logo.png" height="150">
</p>

<h1 align="center">
  PyKEEN
</h1>

<p align="center">
  <a href="https://travis-ci.com/pykeen/pykeen">
    <img src="https://travis-ci.com/pykeen/pykeen.svg?token=2tyMYiCcZbjqYscNWXwZ&branch=master"
         alt="Travis CI">
  </a>

  <a href='https://opensource.org/licenses/MIT'>
    <img src='https://img.shields.io/badge/License-MIT-blue.svg' alt='License'/>
  </a>

  <a href="https://zenodo.org/badge/latestdoi/242672435">
    <img src="https://zenodo.org/badge/242672435.svg" alt="DOI">
  </a>

  <a href="https://badge.fury.io/py/pykeen">
    <img src="https://badge.fury.io/py/pykeen.svg" alt="PyPI version" height="18">
  </a>
</p>

<p align="center">
    <b>PyKEEN</b> (<b>P</b>ython <b>K</b>nowl<b>E</b>dge <b>E</b>mbeddi<b>N</b>gs) is a Python package designed to
    train and evaluate knowledge graph embedding models (incorporating multi-modal information). It is part of the
    <a href="https://github.com/pykeen">KEEN Universe</a>.
</p>

<p align="center">
  <a href="#installation">Installation</a> •
  <a href="#quickstart">Quickstart</a> •
  <a href="#datasets-13">Datasets</a> •
  <a href="#models-23">Models</a> •
  <a href="#supporters">Support</a>
</p>

## Installation

The latest stable version of PyKEEN can be downloaded and installed from
[PyPI](https://pypi.org/project/pykeen/) on Python 3.7+ with:

```bash
$ pip install pykeen
```

The development version of PyKEEN can be downloaded and installed from
[GitHub](https://github.com/pykeen/pykeen) on Python 3.7+ with:

```bash
$ git clone https://github.com/pykeen/pykeen.git pykeen
$ cd pykeen
$ pip install -e .
$ # Install pre-commit
$ pip install pre-commit
$ pre-commit install
```

PyKEEN has several installation extras that are defined in the ``[options.extras_require]`` section
of the ``setup.cfg``. They can be included at installation time using the bracket notation, as in
``pip install pykeen[docs]`` or ``pip install -e .[docs]``. Several extras can be listed, comma-delimited, as in
``pip install pykeen[docs,plotting]``.

| Name | Description |
|------|-------------|
| ``plotting`` | Plotting with ``seaborn`` and generation of word clouds  |
| ``mlflow`` | Tracking of results with ``mlflow`` |
| ``docs`` | Building of the documentation |
| ``templating`` | Building of templated documentation, like the README |

## Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. 
See [CONTRIBUTING.md](/CONTRIBUTING.md) for more information on getting involved.

## Quickstart [![Documentation Status](https://readthedocs.org/projects/pykeen/badge/?version=latest)](https://pykeen.readthedocs.io/en/latest/?badge=latest)

This example shows how to train a model on a data set and evaluate it.

The fastest way to get up and running is to use the pipeline function. It
provides a high-level entry into the extensible functionality of this package.
The following example shows how to train and evaluate the TransE model on the
Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training
approach and evaluates with rank-based evaluation.

```python
from pykeen.pipeline import pipeline
result = pipeline(
    model='TransE',
    dataset='nations',
)
```

The results are returned in a dataclass that has attributes for the trained
model, the training loop, and the evaluation.

PyKEEN is extensible such that:

- Each model has the same API, so anything from ``pykeen.models`` can be dropped in
- Each training loop has the same API, so ``pykeen.training.LCWATrainingLoop`` can be dropped in
- Triples factories can be generated by the user with ``pykeen.triples.TriplesFactory``
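
Because every model shares the same API, swapping components amounts to changing the
arguments passed to the pipeline. The following is a minimal sketch rather than a
definitive recipe: it assumes that ``pipeline`` accepts a ``training_loop`` keyword and
that the lowercase names from the Training Loops table (``lcwa``, ``slcwa``) are valid
values for it. The names ``CONFIGURATIONS`` and ``run_configuration`` are hypothetical
helpers introduced only for illustration.

```python
from typing import Any, Dict, List

# Keyword arguments for pykeen.pipeline.pipeline().
# Assumption: the `training_loop` keyword and its string values ('slcwa', 'lcwa')
# follow the names listed in this README's Training Loops table.
CONFIGURATIONS: List[Dict[str, Any]] = [
    {"model": "TransE", "dataset": "nations", "training_loop": "slcwa"},
    {"model": "DistMult", "dataset": "nations", "training_loop": "lcwa"},
]


def run_configuration(config: Dict[str, Any]):
    """Train and evaluate one configuration with the PyKEEN pipeline."""
    # Imported lazily so the configurations can be inspected without PyKEEN installed.
    from pykeen.pipeline import pipeline

    return pipeline(**config)
```

Calling ``run_configuration(CONFIGURATIONS[1])`` would then train DistMult on Nations
under these assumptions, with no other change to the surrounding code.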

## Implementation

Below are the models, data sets, training modes, evaluators, and metrics implemented
in ``pykeen``.

### Datasets (13)

| Name          | Reference                       | Description                                                                                        |
|---------------|---------------------------------|----------------------------------------------------------------------------------------------------|
| fb15k         | `pykeen.datasets.FB15k`         | The FB15k data set.                                                                                |
| fb15k237      | `pykeen.datasets.FB15k237`      | The FB15k-237 data set.                                                                            |
| hetionet      | `pykeen.datasets.Hetionet`      | The Hetionet dataset is a large biological network.                                                |
| kinships      | `pykeen.datasets.Kinships`      | The Kinships data set.                                                                             |
| nations       | `pykeen.datasets.Nations`       | The Nations data set.                                                                              |
| openbiolink   | `pykeen.datasets.OpenBioLink`   | The OpenBioLink dataset.                                                                           |
| openbiolinkf1 | `pykeen.datasets.OpenBioLinkF1` | The PyKEEN First Filtered OpenBioLink 2020 Dataset.                                                |
| openbiolinkf2 | `pykeen.datasets.OpenBioLinkF2` | The PyKEEN Second Filtered OpenBioLink 2020 Dataset.                                               |
| openbiolinklq | `pykeen.datasets.OpenBioLinkLQ` | The low-quality variant of the OpenBioLink dataset.                                                |
| umls          | `pykeen.datasets.UMLS`          | The UMLS data set.                                                                                 |
| wn18          | `pykeen.datasets.WN18`          | The WN18 data set.                                                                                 |
| wn18rr        | `pykeen.datasets.WN18RR`        | The WN18-RR data set.                                                                              |
| yago310       | `pykeen.datasets.YAGO310`       | The YAGO3-10 data set is a subset of YAGO3 that only contains entities with at least 10 relations. |

### Models (23)

| Name                | Reference                           | Citation                     |
|---------------------|-------------------------------------|------------------------------|
| ComplEx             | `pykeen.models.ComplEx`             | Trouillon *et al.*, 2016     |
| ComplExLiteral      | `pykeen.models.ComplExLiteral`      | Agustinus *et al.*, 2018     |
| ConvE               | `pykeen.models.ConvE`               | Dettmers *et al.*, 2018      |
| ConvKB              | `pykeen.models.ConvKB`              | Nguyen *et al.*, 2018        |
| DistMult            | `pykeen.models.DistMult`            | Yang *et al.*, 2014          |
| DistMultLiteral     | `pykeen.models.DistMultLiteral`     | Agustinus *et al.*, 2018     |
| ERMLP               | `pykeen.models.ERMLP`               | Dong *et al.*, 2014          |
| ERMLPE              | `pykeen.models.ERMLPE`              | Sharifzadeh *et al.*, 2019   |
| HolE                | `pykeen.models.HolE`                | Nickel *et al.*, 2016        |
| KG2E                | `pykeen.models.KG2E`                | He *et al.*, 2015            |
| NTN                 | `pykeen.models.NTN`                 | Socher *et al.*, 2013        |
| ProjE               | `pykeen.models.ProjE`               | Shi *et al.*, 2017           |
| RESCAL              | `pykeen.models.RESCAL`              | Nickel *et al.*, 2011        |
| RGCN                | `pykeen.models.RGCN`                | Schlichtkrull *et al.*, 2018 |
| RotatE              | `pykeen.models.RotatE`              | Sun *et al.*, 2019           |
| SimplE              | `pykeen.models.SimplE`              | Kazemi *et al.*, 2018        |
| StructuredEmbedding | `pykeen.models.StructuredEmbedding` | Bordes *et al.*, 2011        |
| TransD              | `pykeen.models.TransD`              | Ji *et al.*, 2015            |
| TransE              | `pykeen.models.TransE`              | Bordes *et al.*, 2013        |
| TransH              | `pykeen.models.TransH`              | Wang *et al.*, 2014          |
| TransR              | `pykeen.models.TransR`              | Lin *et al.*, 2015           |
| TuckER              | `pykeen.models.TuckER`              | Balazevic *et al.*, 2019     |
| UnstructuredModel   | `pykeen.models.UnstructuredModel`   | Bordes *et al.*, 2014        |

### Losses (7)

| Name            | Reference                           | Description                                                                                       |
|-----------------|-------------------------------------|---------------------------------------------------------------------------------------------------|
| bce             | `pykeen.losses.BCELoss`             | A wrapper around the PyTorch binary cross entropy loss.                                           |
| bceaftersigmoid | `pykeen.losses.BCEAfterSigmoidLoss` | A loss function which uses the numerically unstable version of explicit Sigmoid + BCE.            |
| crossentropy    | `pykeen.losses.CrossEntropyLoss`    | Evaluate cross entropy after softmax output.                                                      |
| marginranking   | `pykeen.losses.MarginRankingLoss`   | A wrapper around the PyTorch margin ranking loss.                                                 |
| mse             | `pykeen.losses.MSELoss`             | A wrapper around the PyTorch mean square error loss.                                              |
| nssa            | `pykeen.losses.NSSALoss`            | An implementation of the self-adversarial negative sampling loss function proposed by Sun *et al.*, 2019. |
| softplus        | `pykeen.losses.SoftplusLoss`        | A loss function for the softplus.                                                                 |

### Regularizers (5)

| Name     | Reference                                 | Description                                              |
|----------|-------------------------------------------|----------------------------------------------------------|
| combined | `pykeen.regularizers.CombinedRegularizer` | A convex combination of regularizers.                    |
| lp       | `pykeen.regularizers.LpRegularizer`       | A simple L_p norm based regularizer.                     |
| no       | `pykeen.regularizers.NoRegularizer`       | A regularizer which does not perform any regularization. |
| powersum | `pykeen.regularizers.PowerSumRegularizer` | A simple x^p based regularizer.                          |
| transh   | `pykeen.regularizers.TransHRegularizer`   | A regularizer for the soft constraints in TransH.        |
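
To make the difference between the ``lp`` and ``powersum`` entries concrete, the sketch
below computes an L_p norm and a plain power sum for a small vector. It illustrates
only the underlying formulas and is not PyKEEN's implementation, which may apply
additional weighting or normalization; the function names ``lp_norm`` and ``power_sum``
are hypothetical.

```python
from typing import Sequence


def lp_norm(x: Sequence[float], p: float = 2.0) -> float:
    """L_p norm: the p-th root of the sum of |x_i|^p."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def power_sum(x: Sequence[float], p: float = 2.0) -> float:
    """Power sum: the sum of |x_i|^p, without taking the p-th root."""
    return sum(abs(v) ** p for v in x)


vector = [3.0, -4.0]
norm_value = lp_norm(vector, p=2.0)     # sqrt(9 + 16)
power_value = power_sum(vector, p=2.0)  # 9 + 16
```

The two quantities differ only by the final root, so the power sum penalizes large
values more heavily for p > 1.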

### Optimizers (6)

| Name     | Reference              | Description                                                             |
|----------|------------------------|-------------------------------------------------------------------------|
| adadelta | `torch.optim.Adadelta` | Implements Adadelta algorithm.                                          |
| adagrad  | `torch.optim.Adagrad`  | Implements Adagrad algorithm.                                           |
| adam     | `torch.optim.Adam`     | Implements Adam algorithm.                                              |
| adamax   | `torch.optim.Adamax`   | Implements Adamax algorithm (a variant of Adam based on infinity norm). |
| adamw    | `torch.optim.AdamW`    | Implements AdamW algorithm.                                             |
| sgd      | `torch.optim.SGD`      | Implements stochastic gradient descent (optionally with momentum).      |

### Training Loops (2)

| Name   | Reference                           | Description                                                                               |
|--------|-------------------------------------|-------------------------------------------------------------------------------------------|
| lcwa   | `pykeen.training.LCWATrainingLoop`  | A training loop that uses the local closed world assumption training approach.            |
| slcwa  | `pykeen.training.SLCWATrainingLoop` | A training loop that uses the stochastic local closed world assumption training approach. |

### Negative Samplers (2)

| Name      | Reference                                  | Description                                                                            |
|-----------|--------------------------------------------|----------------------------------------------------------------------------------------|
| basic     | `pykeen.sampling.BasicNegativeSampler`     | A basic negative sampler.                                                              |
| bernoulli | `pykeen.sampling.BernoulliNegativeSampler` | An implementation of the Bernoulli negative sampling approach proposed by Wang *et al.*, 2014. |

### Stoppers (2)

| Name   | Reference                      | Description                   |
|--------|--------------------------------|-------------------------------|
| early  | `pykeen.stoppers.EarlyStopper` | A harness for early stopping. |
| nop    | `pykeen.stoppers.NopStopper`   | A stopper that does nothing.  |

### Evaluators (2)

| Name      | Reference                              | Description                                   |
|-----------|----------------------------------------|-----------------------------------------------|
| rankbased | `pykeen.evaluation.RankBasedEvaluator` | A rank-based evaluator for KGE models.        |
| sklearn   | `pykeen.evaluation.SklearnEvaluator`   | An evaluator that uses a Scikit-learn metric. |

### Metrics (6)

| Metric                  | Description                                                                                                        | Evaluator   | Reference                                  |
|-------------------------|--------------------------------------------------------------------------------------------------------------------|-------------|--------------------------------------------|
| Adjusted Mean Rank      | The mean over all chance-adjusted ranks: mean_i (2r_i / (num_entities+1)). Lower is better.                        | rankbased   | `pykeen.evaluation.RankBasedMetricResults` |
| Average Precision Score | The area under the precision-recall curve, between [0.0, 1.0]. Higher is better.                                   | sklearn     | `pykeen.evaluation.SklearnMetricResults`   |
| Hits At K               | The hits at k for different values of k, i.e. the relative frequency of ranks not larger than k. Higher is better. | rankbased   | `pykeen.evaluation.RankBasedMetricResults` |
| Mean Rank               | The mean over all ranks: mean_i r_i. Lower is better.                                                              | rankbased   | `pykeen.evaluation.RankBasedMetricResults` |
| Mean Reciprocal Rank    | The mean over all reciprocal ranks: mean_i (1/r_i). Higher is better.                                              | rankbased   | `pykeen.evaluation.RankBasedMetricResults` |
| Roc Auc Score           | The area under the ROC curve between [0.0, 1.0]. Higher is better.                                                 | sklearn     | `pykeen.evaluation.SklearnMetricResults`   |
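
The rank-based formulas in the table above can be worked through on a toy example.
The sketch below follows the formulas exactly as written in the table and is
independent of PyKEEN's own evaluator; the function name ``rank_based_metrics``, the
dictionary keys, and the example ranks are hypothetical.

```python
from typing import Dict, Sequence


def rank_based_metrics(
    ranks: Sequence[int],
    num_entities: int,
    ks: Sequence[int] = (1, 3, 10),
) -> Dict[str, float]:
    """Compute the rank-based metrics from a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        # mean_i r_i (lower is better)
        "mean_rank": sum(ranks) / n,
        # mean_i (1 / r_i) (higher is better)
        "mean_reciprocal_rank": sum(1.0 / r for r in ranks) / n,
        # mean_i (2 r_i / (num_entities + 1)) (lower is better)
        "adjusted_mean_rank": sum(2.0 * r / (num_entities + 1) for r in ranks) / n,
    }
    for k in ks:
        # Relative frequency of ranks not larger than k (higher is better)
        metrics[f"hits_at_{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Three test triples whose true entities were ranked 1st, 2nd, and 4th
# among 9 candidate entities.
example = rank_based_metrics([1, 2, 4], num_entities=9)
```

For these ranks the mean rank is 7/3, the mean reciprocal rank is 1.75/3, and two of
the three triples count as hits at k = 3.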

## Hyper-parameter Optimization

### Samplers (2)

| Name   | Reference                       | Description                                                     |
|--------|---------------------------------|-----------------------------------------------------------------|
| random | `optuna.samplers.RandomSampler` | Sampler using random sampling.                                  |
| tpe    | `optuna.samplers.TPESampler`    | Sampler using TPE (Tree-structured Parzen Estimator) algorithm. |

## Experimentation

### Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark
experiments. They can be accessed and run like:

```bash
pykeen experiments reproduce tucker balazevic2019 fb15k
```

The three arguments are the model name, the reference, and the data set.
The output directory can optionally be set with `-d`.

### Ablation

PyKEEN includes the ability to specify ablation studies using the
hyper-parameter optimization module. They can be run like:

```bash
pykeen experiments ablation ~/path/to/config.json
```

## Acknowledgements

### Supporters

This project has been supported by several organizations (in alphabetical order):

- [Bayer](https://www.bayer.com/)
- [Enveda Therapeutics](https://envedatherapeutics.com/)
- [Fraunhofer Institute for Algorithms and Scientific Computing](https://www.scai.fraunhofer.de)
- [Fraunhofer Institute for Intelligent Analysis and Information Systems](https://www.iais.fraunhofer.de)
- [Fraunhofer Center for Machine Learning](https://www.cit.fraunhofer.de/de/zentren/maschinelles-lernen.html)
- [Ludwig-Maximilians-Universität München](https://www.en.uni-muenchen.de/index.html)
- [Munich Center for Machine Learning (MCML)](https://mcml.ai/)
- [Siemens](https://new.siemens.com/global/en.html)
- [Smart Data Analytics Research Group (University of Bonn & Fraunhofer IAIS)](https://sda.tech)
- [Technical University of Denmark - DTU Compute - Section for Cognitive Systems](https://www.compute.dtu.dk/english/research/research-sections/cogsys)
- [Technical University of Denmark - DTU Compute - Section for Statistics and Data Analysis](https://www.compute.dtu.dk/english/research/research-sections/stat)
- [University of Bonn](https://www.uni-bonn.de/)

### Logo

The PyKEEN logo was designed by Carina Steinborn.


