Metadata-Version: 2.4
Name: semantic-gan
Version: 0.1.0
Summary: A library for Knowledge Graph Completion using Wasserstein GANs
Home-page: https://github.com/erdemonal/SemanticGAN
Author: Erdem Önal
License-Expression: GPL-3.0
Project-URL: Homepage, https://github.com/erdemonal/SemanticGAN
Project-URL: Repository, https://github.com/erdemonal/SemanticGAN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas
Requires-Dist: tqdm
Requires-Dist: scikit-learn
Requires-Dist: plotly
Requires-Dist: matplotlib>=3.7.0
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Wasserstein GAN for Knowledge Graph Completion

[![Sync Results](https://github.com/erdemonal/SemanticGAN/actions/workflows/sync-results.yml/badge.svg)](https://github.com/erdemonal/SemanticGAN/actions/workflows/sync-results.yml)
[![pages-build-deployment](https://github.com/erdemonal/SemanticGAN/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/erdemonal/SemanticGAN/actions/workflows/pages/pages-build-deployment)

This repository contains an experimental research system for knowledge graph completion on the DBLP Computer Science Bibliography using Wasserstein GANs.

The system uses a Wasserstein GAN to generate candidate RDF triples from an evolving publication graph. Model training is executed periodically using an automated workflow.

## Technical Report

A detailed description of the model, training procedure, and evaluation is provided in the technical report:

[`paper/knowledge-graph-completion-wasserstein-gan.pdf`](paper/knowledge-graph-completion-wasserstein-gan.pdf)

The LaTeX source is available in [`paper/main.tex`](paper/main.tex)

## Results

Training outputs and generated RDF triples are available at:
https://erdemonal.github.io/SemanticGAN

## Methodology

The system processes the DBLP XML dump from https://dblp.uni-trier.de/xml to extract a knowledge graph with entity types Publication, Author, Venue, and Year. Relations include dblp:wrote, dblp:hasAuthor, dblp:publishedIn, and dblp:inYear.

The preprocessing script `scripts/prepare_dblp_kg.py` reads the XML file incrementally and produces RDF triples in tab separated format. The preprocessed 1M triple dataset is versioned and maintained in the [Hugging Face Dataset Hub](https://huggingface.co/datasets/erdemonal/SemanticGAN-Dataset).

The WGAN model consists of a Generator that produces tail entity embeddings from noise and relation embeddings, and a Discriminator that scores triples using a scalar Wasserstein distance. Training uses RMSprop with gradient clipping to enforce the Lipschitz constraint.

An automated training workflow is orchestrated via GitHub Actions. Training is executed on external compute infrastructure, and the resulting artifacts are synchronized after each run.

## Model Storage and Data Decoupling

Model weights and processed knowledge graph artifacts are hosted on the Hugging Face Hub across two repositories:

Model Hub: [erdemonal/SemanticGAN](https://huggingface.co/erdemonal/SemanticGAN) stores the persistent WGAN checkpoints.

Dataset Hub: [erdemonal/SemanticGAN-Dataset](https://huggingface.co/datasets/erdemonal/SemanticGAN-Dataset) contains the processed DBLP triples and ID mappings.

The automated training workflow fetches processed data from the Dataset Hub and restores model states from the Model Hub before each training run.

## Data Availability

The DBLP dataset is publicly available from https://dblp.uni-trier.de/xml

Documentation is available at https://dblp.org/xml/docu/dblpxml.pdf

