Metadata-Version: 2.4
Name: rdfsolve
Version: 0.0.1
Summary: Wraps several RDF schema solver tools
Author-email: Javier Millán Acosta <javier.millanacosta@maastrichtuniversity.nl>
Maintainer-email: Javier Millán Acosta <javier.millanacosta@maastrichtuniversity.nl>
License: MIT License
        
        Copyright (c) 2024 Javier Millán Acosta
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Bug Tracker, https://github.com/jmillanacosta/rdfsolve/issues
Project-URL: Homepage, https://github.com/jmillanacosta/rdfsolve
Project-URL: Repository, https://github.com/jmillanacosta/rdfsolve.git
Project-URL: Documentation, https://rdfsolve.readthedocs.io
Keywords: snekpack,cookiecutter
Classifier: Development Status :: 1 - Planning
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Framework :: Pytest
Classifier: Framework :: tox
Classifier: Framework :: Sphinx
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rdflib>=6.0.0
Requires-Dist: sparqlwrapper>=2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy
Requires-Dist: bioregistry
Requires-Dist: linkml>=1.9.1
Requires-Dist: linkml-runtime>=1.9.0
Requires-Dist: deprecated>=1.2.13
Requires-Dist: pyld
Requires-Dist: more_itertools
Requires-Dist: tqdm
Requires-Dist: pyarrow==22.0.0
Requires-Dist: click
Requires-Dist: more_click
Requires-Dist: flask
Requires-Dist: jupyter
Requires-Dist: nbconvert
Requires-Dist: ipykernel
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: coverage; extra == "tests"
Provides-Extra: docs
Requires-Dist: sphinx>=8; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=3.0; extra == "docs"
Requires-Dist: sphinx-click; extra == "docs"
Requires-Dist: sphinx-automodapi; extra == "docs"
Requires-Dist: plotly>=5.0.0; extra == "docs"
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Provides-Extra: web
Requires-Dist: flask>=2.0; extra == "web"
Provides-Extra: notebooks
Requires-Dist: matplotlib>=3.10.7; extra == "notebooks"
Requires-Dist: seaborn>=0.11.0; extra == "notebooks"
Requires-Dist: plotly>=5.0.0; extra == "notebooks"
Requires-Dist: networkx>=2.8; extra == "notebooks"
Requires-Dist: jupyterlab>=4.0.0; extra == "notebooks"
Requires-Dist: ipywidgets>=8.0.0; extra == "notebooks"
Requires-Dist: narwhals==2.11.0; extra == "notebooks"
Requires-Dist: plotly==6.4.0; extra == "notebooks"
Dynamic: license-file

# RDFSolve

<p align="center">
    <a href="https://github.com/jmillanacosta/rdfsolve/actions/workflows/tests.yml">
        <img alt="Tests" src="https://github.com/jmillanacosta/rdfsolve/actions/workflows/tests.yml/badge.svg" /></a>
    <a href="https://pypi.org/project/rdfsolve">
        <img alt="PyPI" src="https://img.shields.io/pypi/v/rdfsolve" /></a>
    <a href="https://pypi.org/project/rdfsolve">
        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/rdfsolve" /></a>
    <a href="https://github.com/jmillanacosta/rdfsolve/blob/main/LICENSE">
        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/rdfsolve" /></a>
    <a href='https://rdfsolve.readthedocs.io/en/latest/?badge=latest'>
        <img src='https://readthedocs.org/projects/rdfsolve/badge/?version=latest' alt='Documentation Status' /></a>
</p>

Extract RDF schemas from SPARQL endpoints and convert to multiple formats (VoID, LinkML, JSON-LD).

## Installation

```bash
uv pip install rdfsolve
```

## Quick Start

### CLI

Extract schema and convert to multiple formats:

```bash
# Discover existing VoID metadata (fast)
rdfsolve discover --endpoint https://sparql.rhea-db.org/sparql

# Extract schema (uses discovered VoID if available)
rdfsolve extract --endpoint https://sparql.rhea-db.org/sparql \
  --output-dir ./output

# Export to different formats
rdfsolve export --void-file ./output/void_description.ttl \
  --format all --output-dir ./output
```

**Extract Command Options:**

```bash
# Force fresh generation (bypasses discovered VoID)
rdfsolve extract --endpoint URL --force-generate

# Custom naming and URIs
rdfsolve extract --endpoint URL \
  --dataset-name mydata \
  --void-base-uri "http://example.org/mydata/well-known/void"

# Filter specific graphs
rdfsolve extract --endpoint URL \
  --graph-uri http://example.org/graph1 \
  --graph-uri http://example.org/graph2
```

**Export Formats:**

- `csv` - Schema patterns table
- `jsonld` - JSON-LD representation
- `linkml` - LinkML YAML schema
- `shacl` - SHACL shapes for RDF validation
- `rdfconfig` - RDF-config YAML files (model, prefix, endpoint)
- `coverage` - Pattern frequency analysis
- `all` - All formats (default)

**Export with custom LinkML schema:**

```bash
rdfsolve export --void-file void_description.ttl \
  --format linkml \
  --schema-name custom_schema \
  --schema-uri "http://example.org/schemas/custom" \
  --schema-description "Custom schema description"
```

**Export SHACL shapes for RDF validation:**

```bash
# Export closed SHACL shapes (strict validation)
rdfsolve export --void-file void_description.ttl \
  --format shacl \
  --shacl-closed \
  --shacl-suffix Shape

# Export open SHACL shapes (flexible validation)
rdfsolve export --void-file void_description.ttl \
  --format shacl \
  --shacl-open
```

SHACL (Shapes Constraint Language) shapes define constraints on RDF data and can be used to validate RDF instances against the extracted schema. Closed shapes only allow properties explicitly defined in the schema, while open shapes are more permissive.

**Export RDF-config files:**

```bash
rdfsolve export --void-file void_description.ttl \
  --format rdfconfig \
  --endpoint-url https://sparql.example.org/sparql \
  --graph-uri http://example.org/graph \
  --output-dir ./output
```

Creates a directory `{dataset}_config/` containing:

- `model.yml` - Class and property structure
- `prefix.yml` - Namespace prefix definitions  
- `endpoint.yml` - SPARQL endpoint configuration

This structure is required by the [rdf-config](https://github.com/dbcls/rdf-config) tool.

**Count instances per class:**

```bash
rdfsolve count --endpoint URL --output counts.csv
```

**Service graph filtering:**

By default, `extract` and `count` exclude Virtuoso system graphs and well-known URIs. Use `--include-service-graphs` to include them.

### Python API

```python
from rdfsolve.api import (
    generate_void_from_endpoint,
    load_parser_from_graph,
    count_instances_per_class,
    to_shacl_from_file,
    to_rdfconfig_from_file,
)

# Generate VoID from endpoint
void_graph = generate_void_from_endpoint(
    endpoint_url="https://sparql.example.org/",
    graph_uris=["http://example.org/graph"],
    void_base_uri="http://example.org/void",  # Custom partition URIs
)

# Load parser and extract schema
parser = load_parser_from_graph(void_graph)

# Export to different formats
schema_df = parser.to_schema()  # Pandas DataFrame
schema_jsonld = parser.to_jsonld()  # JSON-LD
linkml_yaml = parser.to_linkml_yaml(
    schema_name="my_schema",
    schema_base_uri="http://example.org/schemas/my_schema"
)

# Export to SHACL shapes for validation
shacl_ttl = parser.to_shacl(
    schema_name="my_schema",
    schema_base_uri="http://example.org/schemas/my_schema",
    closed=True,  # Closed shapes for strict validation
    suffix="Shape",  # Append "Shape" to class names
)

# Or use the convenience function
shacl_ttl = to_shacl_from_file(
    "void_description.ttl",
    schema_name="my_schema",
    closed=True,
)

# Export to RDF-config format
rdfconfig = to_rdfconfig_from_file(
    "void_description.ttl",
    endpoint_url="https://sparql.example.org/",
    graph_uri="http://example.org/graph",
)
# Save to {dataset}_config/ directory structure
import os
os.makedirs("dataset_config", exist_ok=True)
with open("dataset_config/model.yml", "w") as f:
    f.write(rdfconfig["model"])
with open("dataset_config/prefix.yml", "w") as f:
    f.write(rdfconfig["prefix"])
with open("dataset_config/endpoint.yml", "w") as f:
    f.write(rdfconfig["endpoint"])

# Count instances per class
class_counts = count_instances_per_class(
    "https://sparql.example.org/",
    graph_uris=["http://example.org/graph"],
)
```

## Features

- Extract RDF schemas from SPARQL endpoints using VoID partitions
- Discover existing VoID metadata or generate fresh
- Export to multiple formats: CSV, JSON-LD, LinkML, SHACL, RDF-config, coverage analysis
- SHACL shapes generation for RDF data validation
- RDF-config export for schema documentation (compatible with rdf-config tool)
- Customizable dataset naming and VoID partition URIs
- Service graph filtering (excludes Virtuoso system graphs by default)
- Instance counting per class with optional sampling

## Documentation

- Documentation: [rdfsolve.readthedocs.io](https://rdfsolve.readthedocs.io)
- Results dashboard: [jmillanacosta.github.io/rdfsolve](https://jmillanacosta.github.io/rdfsolve)

## License

MIT License - see [LICENSE](LICENSE) for details.

[![Powered by the Bioregistry](https://img.shields.io/static/v1?label=Powered%20by&message=Bioregistry&color=BA274A&style=flat&logo=image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACgAAAAoCAYAAACM/rhtAAAACXBIWXMAAAEnAAABJwGNvPDMAAAAGXRFWHRTb2Z0d2FyZQB3d3cuaW5rc2NhcGUub3Jnm+48GgAACi9JREFUWIWtmXl41MUZxz/z291sstmQO9mQG0ISwHBtOOSwgpUQhApWgUfEowKigKI81actypaqFbWPVkGFFKU0Vgs+YgvhEAoqEUESrnDlEEhCbkLYJtlkk9399Y/N/rKbzQXt96+Zed+Z9/t7Z+adeecnuA1s5yFVSGrLOAf2qTiEEYlUZKIAfYdKE7KoBLkQSc4XgkPfXxz/owmT41ZtiVtR3j94eqxQq5aDeASIvkVb12RBtt0mb5xZsvfa/5XgnqTMcI3Eq7IQjwM+7jJJo8YvNhK/qDBUOl8A7JZWWqqu01Jeg6Pd1nW4NuBjjax6eWrRruv/M8EDqTMflmXeB0Jcbb6RIRhmTCJ0ymgC0wYjadTd9nW0tWMu+In63NNU7c3FWtvgJpXrZVlakVGU8/ltEcwzGjU3miI/ABa72vwTB5K45AEi7x2PUEl9fZsHZLuDmgPHuLJpJ82lle6iTSH6mpXp+fnt/Sa4yzhbp22yfwFkgnMaBy17kPhFmQh1997qLxztNkq35XB505fINtf0iz1WvfTQ7Pxdlj4Jdnjuny5yvpEhjHh7FQOGD/YyZi4owS86HJ+QQMDpJaBf3jUXlHD21+8q0y4LDppV/vfNO7+jzV3Pa6SOac0E8I8fSPonpm7JAVR+eRhzwU/Ofj+e49tpT/HdtGXcyLvQJ8HAtCTGfmJCF2dwfpTMz4NszX/uqqdyr+xPyVwoEK+C03PGrDX4GkJ7NBJ+txH/hCgAit7cRlNxOY62dmzmZgwzJvZJUh2gI/xnRmoOHsfe3AqQ/kho0qXs+pLzLh3FgwdT54YKxLsAQq0mbf1zHuTsltZejemHJSrlgGGDPGTXc09zdM5qTi59jZbKOg+Zb1QYI95+XokEQogPDifPDnPJFQ8uCkl8FyGmACQtn4dhxp3KINX7jnHi0ZeJnT8dla8Plbu+48zzfyJ08kh8ggIACB4zlIAhsURm3EnML6eB6Fzep1a+SUt5DS2VddTs+4GQccPRhgV1kowIQRaChhMXAPxkIev/Vl+8R/HgnqTMmI4gjH/iQOIXZSqdzQUlXDB9RPyi+1DrdVx67WMursvCkDERXYxB0ROSIOKecURMG+tBzkXAhbYbZk6teNPLkwmPzUIX71wuMiw+MHx2nEJQrWIFHSdE4pIHlFDisLZxYe1HhIwfTtLK+RSu30rVnlxGvrOapOcW9DsW3vH6CgKS4zxIXlz3Fw8dSaMmcfEcV9XHYbc/DSCZMEkgFoJzY0TeO17pVL7jANbaBoauWUJlTi4VOw+T9sazBKYl0ZB/qV/kALThQRi3vOJB0lpzw0vPMONOtOHOqRcyi7bzkEqanJo3HogBMGROUrziaGundGsOsQsyUPn6UPx2NvELZxIybhinn3uLyx9uVwaW7XbqjxdQmr2X0uy93Dh+Dtlu9zCu9vdj1PsvEWwcii7OwJAXFnoRFCoVhoxJrmr0gOQWo9qBfaorXodOHq0o1x8roN3cSMyC6ZT942uQBIlL53Jl804sV6oY9/fXAGg4WcjFdZuxlFV7GNPFRzFs7VKCRiV7ejJrTa/eDr1rFKXZOQCocEyTgHQAyUdD4B2d4cF8pohg4zC0YUFU7z5C9Jy7sVvbKPtsH6GT0tCGBtFwspBTz/zRixyApbSKk8te5+aZ4l4JdUVQWpIScmQhjGocUjJCRhcTieSjURQTF89FtttpuVaLpaya8Knp1B3OQ5Zlag/nU//9cmScS6EnONrauWjazIQv3kCoVD3quUPS+uAXHU7z1SpATpEQchSA78AwD0WVnxa1XkdjURlCJRGQHMfN/EuEjk9jyr4NRN47Hltjc58Gm0sraTjZ/w3l5BLuKkZJdFzT1f5+3Sq3NZjRDNAjaX1orb2BX2wEmkA9fvGGbvW7Q+OlUu+2wlIqdx+h3dzkJVPrda5iQJ93p+DRqcQ/PhsAw8xJ6AfHdkhuIVvoEribLl/jxKOv4Gi34T8omgnb1yOk7sdTA01AiK3J6yoGgP+gaPwHOdOP6LlTlXb3mNYXAlI8da9/e0pJBZovV2BrakYzQK/I3bg0SsiiCqClqs/0wAPB6UOVo6k3+CdEETwm1aPtP+dLlLJPSKAHOYDWCoVLlYTkKAKcCU4vO7IrhErFsLVLPXZ+V0haDcN+v8xjB9strdQfPavUA0ckefRxWNuwVNS6rBRKQB44r+Lmc5f7TRAgaFQyYzb9Dv/4gd18ASQ8/gsC0zwJNJVcw97aeWmOcDtaAW6eLXZLBchTC8EhWXbW6o+cInhMipetuu9OUvTWNnwNodzx+krlvAQIGjmECV+spyH/Ak3F5QDok+OoPXicip2HiJiWTuH6rQx6eh7BxlT0STH4xUbSUl6Df/xAIqaO9bBVn3taKUuy/ZAwYZImpvx4FYjVRgQzOec9r1vK0TmrldMiIDkO45ZXegxLLrRW13P0/heQHQ4CUhIYvfElNIHOtWaztNJ4qZQBqfFKLg3OMz135rNY624ClB0tHJcomTA5ZMGnANbaBmoOHPMy5hvZebNuLCoj71frXIN0i9pDJzj24IsIlUTCo7NI3/KyQg5ArfMleEyKBzmA6r1HO8eV+dSEySEB2G3yRpwZP1c2f+n1GjB07RIlcwNoKi7j3G839EhQF2cg6fmHmbznPRKevJ/GorIedV1wtLVzJesrV9WqQtoIHRfWjreSjwGar1ZRui3Ho7PfwHBGb3jRg6S1roGeoIuNJGBIPKV/zSF31irOrn4HXAu9B1zduhtLecelQxZZ9xTtrgC342Df8IwQyaYqBMKEWo0xaw1BI4d4DNJSWcfF32fRWnuD5NWPEDZ5lIe8NDuHq1v+ha2xGdkho4szYJg1hbj501EH6OgJ5oIS8hf/oWPm5HqNrE51vdt4nC/7k+9bIIT8GYA2Ipixn5jwjQrrZsju0XT5GubTRfiEBqFPisUvOrzPPi0VdeQ9YcJ63bWmxbzphTk7XHKvA/DrlJkfAU+Bcy2N+fA3vZK0WVoxny4idOKIfn+IO7lTz7zRObWCjdMv7VnhruOV9dws9F8u4CsAS1k1J54wYS4o6arWaaS8hvLP998yuZtnisl7wuROLkdjsKzqqtfL45FjB8gzwZnIJy6dS8Jjs3p8ausvHG3tXN26mytZO5W8Rcjsbg1Qze/X45ELHY9I7wHLXG26+CgSl8zFkDGh3zdkF2S7nep9PzhzmnK3FEGwUWOwrJr6zTdeL529EnRhf3LmfCHEBkBZiNrwIAwZkwi9a5Qzh9D6dNvXYW3jZkEJ9UdOOYPwdY/gXgdiufuGuC2C4Hy3kWXrOhmeBLQeA6jV6GLC8Y0KR613Hn+2phZaK69jqah1P/hdsCKLLIfGtnbG+f3eyfHtEHTh38mzom2SY4WQWQjE9tnBE+XIZKuQNrqCcH9wSwRdMGGSJiTnpatwTJOFMIKcgvPVX/kNIcM1gSgC8iTZfii3aEL+7fyG+C+6O8izl1GE5gAAAABJRU5ErkJggg==)](https://github.com/biopragmatics/bioregistry)
