Metadata-Version: 2.4
Name: lodkit
Version: 0.2.8
Author-email: Lukas Plank <lupl@tuta.io>
License-File: LICENSE
Requires-Python: ~=3.11
Requires-Dist: hypothesis[pytz]<7,>=6.112.2
Requires-Dist: langcodes<4,>=3.4.0
Requires-Dist: loguru<0.8,>=0.7.2
Requires-Dist: pytz~=2024.2
Requires-Dist: rdflib<8,>=7.0.0
Requires-Dist: typeguard<5,>=4.3.0
Description-Content-Type: text/markdown

<img src="lodkit.svg" width="50%" height="50%" />

# LODKit
![tests](https://github.com/lu-pl/lodkit/actions/workflows/tests.yaml/badge.svg)
[![coverage](https://coveralls.io/repos/github/lu-pl/lodkit/badge.svg?branch=main&kill_cache=1)](https://coveralls.io/github/lu-pl/lodkit?branch=main&kill_cache=1)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version](https://badge.fury.io/py/lodkit.svg)](https://badge.fury.io/py/lodkit)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)

<!-- <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a> -->

LODKit is a collection of Python utilities for working with Linked Open Data.


# Installation

LODKit is available on PyPI:

```shell
pip install lodkit
```

# Usage

## RDF Importer

`lodkit.RDFImporter` is a custom importer for importing RDF files as if they were modules.

Assuming `graphs/some_graph.ttl` exists in the import path, `lodkit.RDFImporter` makes it possible to do the following:
```python
import lodkit
from graphs import some_graph

type(some_graph)  # <class 'rdflib.graph.Graph'>
```

Note that `lodkit.RDFImporter` becomes available as soon as `lodkit` is imported, which is why the example runs `import lodkit` before importing the graph.
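
Custom importers of this kind are usually built on Python's import hooks (`sys.meta_path`). The following stdlib-only sketch illustrates that mechanism and is not LODKit's implementation: the hypothetical `JSONFinder` and `JSONLoader` make `*.json` files importable and expose the parsed content as a `data` attribute of a regular module, whereas `lodkit.RDFImporter` yields an `rdflib.Graph`.

```python
import importlib.abc
import importlib.util
import json
import sys
from pathlib import Path


class JSONLoader(importlib.abc.Loader):
    """Hypothetical loader: parses a JSON file into a module attribute."""

    def __init__(self, file: Path) -> None:
        self.file = file

    def exec_module(self, module) -> None:
        # Populate the freshly created module with the parsed file content.
        module.data = json.loads(self.file.read_text(encoding="utf-8"))


class JSONFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: resolves `import name` to a file `name.json`."""

    def find_spec(self, fullname, path=None, target=None):
        name = fullname.rpartition(".")[-1]
        # `path` is None for top-level imports; fall back to sys.path then.
        for entry in path or sys.path:
            candidate = Path(entry) / f"{name}.json"
            if candidate.is_file():
                return importlib.util.spec_from_loader(fullname, JSONLoader(candidate))
        return None  # let other finders (or ImportError) handle the name


# Registering the finder is the side effect that makes the import statement work.
sys.meta_path.append(JSONFinder())
```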

## Types
`lodkit.lod_types` defines several useful type aliases (`typing.TypeAlias`) and literal types (`typing.Literal`) for working with RDFLib-based Python functionalities.

## URI Tools

### uriclass, make_uriclass

`uriclass` and `make_uriclass` provide dataclass-inspired URI constructor functionality.

With `uriclass`, class-level attributes are converted to URIs according to `uri_constructor`.
For class attributes with only a type annotation, URIs are constructed using UUIDs;
for class attributes with string values, URIs are constructed by hashing that string.

```python
from lodkit import uriclass
from rdflib import Namespace

@uriclass(Namespace("https://test.org/test/"))
class uris:
    x1: str

    y1 = "hash value 1"
    y2 = "hash value 1"

# Attribute access happens outside the class body, after decoration.
print(uris.x1)             # Namespace("https://test.org/test/<UUID>")
print(uris.y1 == uris.y2)  # True
```
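The behaviour described above can be approximated with a small class decorator. This is a sketch under assumptions, not LODKit's implementation: `uriclass_sketch` is a hypothetical name, SHA-256 stands in for whatever hash function LODKit uses, and plain strings stand in for RDFLib terms.

```python
import hashlib
import uuid


def uriclass_sketch(namespace: str):
    """Hypothetical stand-in for a uriclass-style decorator."""

    def decorator(cls):
        own = dict(vars(cls))  # snapshot, because the class is mutated below
        for name, value in own.items():
            # String-valued attributes: the same input string gives the same URI.
            if not name.startswith("__") and isinstance(value, str):
                digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
                setattr(cls, name, f"{namespace}{digest}")
        for name in getattr(cls, "__annotations__", {}):
            # Annotation-only attributes: a fresh UUID-based URI.
            if name not in own:
                setattr(cls, name, f"{namespace}{uuid.uuid4()}")
        return cls

    return decorator


@uriclass_sketch("https://test.org/test/")
class uris:
    x1: str

    y1 = "hash value 1"
    y2 = "hash value 1"


print(uris.y1 == uris.y2)  # True
```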

`make_uriclass` provides equivalent functionality but is better suited to dynamic use.

```python
from lodkit import make_uriclass

uris = make_uriclass(
    cls_name="TestURIFun",
    namespace="https://test.org/test/",
    fields=("x", ("y1", "hash value 1"), ("y2", "hash value 1")),
)

# "x" is the UUID-based field declared in `fields` above.
print(uris.x)              # Namespace("https://test.org/test/<UUID>")
print(uris.y1 == uris.y2)  # True
```
	
### uritools.utils
`uritools.utils` defines base functionality for generating UUID-based and hashed URIs.

`URIConstructorFactory` (alias of `mkuri_factory`) constructs a callable for generating URIs.
The returned callable takes an optional `str` argument `hash_value`:
if a hash value is given, the last URI segment is generated using a hash function; otherwise it is generated from a UUID.

```python
from lodkit import URIConstructorFactory

mkuri = URIConstructorFactory("https://test.namespace/")
print(mkuri())                         # URIRef("https://test.namespace/<UUID>")
print(mkuri("test") == mkuri("test"))  # True
```

## Triple Tools

Triple tools defines `lodkit.ttl`, a triple constructor implementing a Turtle-like interface.

The `lodkit.ttl` constructor takes a triple subject and arbitrary predicate-object pairs (emulating [Turtle Predicate List notation](https://www.w3.org/TR/turtle/#predicate-lists)) and generates 3-tuples of RDFLib objects. 


Objects in predicate-object pairs can be

- `URIRef`, `BNode`, `Literal` (`lodkit._TripleObject`); strings are also permissible and are interpreted as `Literal`
- `ttl` objects (resolved recursively)
- tuples of any object accepted by `ttl` in the object position (resolved as [Turtle Object Lists](https://www.w3.org/TR/turtle/#object-lists))
- lists of any predicate-object pairs accepted by `ttl` (resolved as [Turtle Blank Nodes](https://www.w3.org/TR/turtle/#BNodes))

More formally, the type of a predicate-object pair accepted by `lodkit.ttl` is expressed like so:

```python
type _TPredicateObjectPairObject = (
    _TripleObject
    | str
    | list[_TPredicateObjectPair]
    | Iterator[_TPredicateObjectPair]
    | tuple[_TPredicateObjectPairObject, ...]
    | ttl
)

type _TPredicateObjectPair = tuple[URIRef, _TPredicateObjectPairObject]
```
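To make the resolution rules above concrete, here is a minimal sketch of how such pairs could be expanded into triples lazily. It is an illustration only, not LODKit's implementation: `expand` is a hypothetical name, plain strings stand in for RDFLib terms, blank node labels come from a counter, and nested `ttl` objects and iterators are left out.

```python
from collections.abc import Iterator
from itertools import count


def expand(subject, *pairs, _ids=None) -> Iterator[tuple]:
    """Hypothetical sketch: lazily expand predicate-object pairs into triples."""
    ids = count() if _ids is None else _ids
    for predicate, obj in pairs:
        if isinstance(obj, tuple):
            # Object list: one triple per object, same subject and predicate.
            for item in obj:
                yield from expand(subject, (predicate, item), _ids=ids)
        elif isinstance(obj, list):
            # Blank node: link to a fresh node, then describe that node.
            bnode = f"_:b{next(ids)}"
            yield (subject, predicate, bnode)
            yield from expand(bnode, *obj, _ids=ids)
        else:
            yield (subject, predicate, obj)


for triple in expand("ex:s", ("ex:p", ("ex:o", [("ex:p2", "literal")]))):
    print(triple)
# ('ex:s', 'ex:p', 'ex:o')
# ('ex:s', 'ex:p', '_:b0')
# ('_:b0', 'ex:p2', 'literal')
```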

Note that `lodkit.ttl` objects implement the `Iterable` protocol and thus can be *chained*, e.g. with `itertools.chain`.

> One of the main ideas of `lodkit.ttl` is to provide a functional DSL for RDF generation that lazily generates triple streams, which can be composed into adaptable and modular RDF generation pipelines.

The `lodkit.ttl.to_graph` method generates an `rdflib.Graph` instance from a `lodkit.ttl` object.

### Examples and Usage

The following snippets provide examples of triple generation using `lodkit.ttl`; the corresponding Turtle RDF output produced by calling `lodkit.ttl.to_graph` is shown after each snippet.

- Turtle Predicate List notation	
```python
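from collections.abc import Iterable

from lodkit import _Triple, ttl
from rdflib import Namespace

# `ex` and the imports above are reused by the snippets below.
ex = Namespace("https://example.com/")

triples: Iterable[_Triple] = ttl(
    ex.s,
    (ex.p, ex.o),
    (ex.p2, "literal")
)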
```
```ttl
@prefix ex: <https://example.com/> .

ex:s ex:p ex:o ;
    ex:p2 "literal" .
```
- Turtle Object List notation
```python
triples: Iterable[_Triple] = ttl(
    ex.s,
    (ex.p, (ex.o, ex.o2)),
)
```
```ttl
@prefix ex: <https://example.com/> .

ex:s ex:p ex:o, ex:o2 .
```
- Turtle Blank Node notation
```python
triples: Iterable[_Triple] = ttl(
    ex.s,
    (ex.p, [(ex.p2, "literal")])
)	
```
```ttl
@prefix ex: <https://example.com/> .

ex:s ex:p [ ex:p2 "literal" ] .
```
- Nested `lodkit.ttl` object
```python
triples: Iterable[_Triple] = ttl(
    ex.s,
    (ex.p, ttl(ex.s2, (ex.p2, "literal")))
)
```
```ttl
@prefix ex: <https://example.com/> .

ex:s ex:p ex:s2 .
ex:s2 ex:p2 "literal" .
```

- Advanced example with multiple nested objects in an object list
```python
triples: Iterable[_Triple] = ttl(
    ex.s,
    (
        ex.p,
        (
            ttl(ex.s2, (ex.p2, ex.o)),
            [
                (ex.p3, ttl(
                    ex.s3,
                    (ex.p4, (ex.o2, [(ex.p5, ex.o3)])))),
                (ex.p6, [(ex.p7, "literal")]),
            ],
        ),
    ),
)
```
```ttl
@prefix ex: <https://example.com/> .

ex:s ex:p [
     ex:p3 ex:s3 ;
     ex:p6 [ ex:p7 "literal" ]
     ],
     ex:s2 .

ex:s2 ex:p2 ex:o .

ex:s3 ex:p4 [ ex:p5 ex:o3 ],
        ex:o2 .
```

- Simple RDF generation pipeline example

```python
class TripleGenerator:

    def triple_generator_1(self) -> Iterator[_Triple]:
        if conditional:
            yield (s, p, o)
        yield from ttl(s, ...)

    # more triple generator method definitions
    ...

    def __iter__(self) -> Iterator[_Triple]:
        return itertools.chain(
            self.triple_generator_1(),
            self.triple_generator_2(),
            self.triple_generator_3(),
            ...
        )

triples: Iterator[_Triple] = itertools.chain(TripleGenerator(), ...)
```


## Namespace Tools

### NamespaceGraph
`lodkit.NamespaceGraph` is a simple rdflib.Graph subclass for easy and convenient namespace binding.

```python
from lodkit import NamespaceGraph
from rdflib import Namespace

class CLSGraph(NamespaceGraph):
    crm = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
    crmcls = Namespace("https://clscor.io/ontologies/CRMcls/")
    clscore = Namespace("https://clscor.io/entity/")

graph = CLSGraph()

ns_check: bool = all(
    ns in map(lambda x: x[0], graph.namespaces())
    for ns in ("crm", "crmcls", "clscore")
)

print(ns_check)  # True
```

### ClosedOntologyNamespace, DefinedOntologyNamespace
`lodkit.ClosedOntologyNamespace` and `lodkit.DefinedOntologyNamespace` are subclasses of `rdflib.namespace.ClosedNamespace` and `rdflib.namespace.DefinedNamespace`
that load their namespace members from an ontology.

```python
from lodkit import ClosedOntologyNamespace

crm = ClosedOntologyNamespace(ontology="./CIDOC_CRM_v7.1.3.ttl")

crm.E39_Actor   # URIRef('http://www.cidoc-crm.org/cidoc-crm/E39_Actor')
crm.E39_Author  # AttributeError
```

```python
from lodkit import DefinedOntologyNamespace

class crm(DefinedOntologyNamespace):
    ontology = "./CIDOC_CRM_v7.1.3.ttl"

crm.E39_Actor   # URIRef('http://www.cidoc-crm.org/cidoc-crm/E39_Actor')
crm.E39_Author  # URIRef('http://www.cidoc-crm.org/cidoc-crm/E39_Author') + UserWarning
```
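The difference in lookup behaviour shown above can be sketched with two stdlib-only classes. They are hypothetical stand-ins, not the RDFLib or LODKit classes: members are passed in as a set instead of being read from an ontology, both classes are instantiated for brevity, and plain strings stand in for `URIRef`.

```python
import warnings


class ClosedSketch:
    """Hypothetical closed namespace: unknown members raise AttributeError."""

    def __init__(self, base: str, members: set[str]) -> None:
        self._base = base
        self._members = frozenset(members)

    def __getattr__(self, name: str) -> str:
        # Only called when normal attribute lookup fails.
        if name.startswith("_") or name not in self._members:
            raise AttributeError(name)
        return self._base + name


class LenientSketch(ClosedSketch):
    """Hypothetical lenient namespace: unknown members only trigger a warning."""

    def __getattr__(self, name: str) -> str:
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._members:
            warnings.warn(f"{name!r} is not a known member", UserWarning)
        return self._base + name


base = "http://www.cidoc-crm.org/cidoc-crm/"
closed = ClosedSketch(base, {"E39_Actor"})
lenient = LenientSketch(base, {"E39_Actor"})

print(closed.E39_Actor)               # http://www.cidoc-crm.org/cidoc-crm/E39_Actor
print(hasattr(closed, "E39_Author"))  # False
```

The strict variant catches typos in term names early; the lenient variant keeps code running while still flagging terms that are missing from the vocabulary.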


Note that `rdflib.namespace.ClosedNamespace` is meant to be instantiated and `rdflib.namespace.DefinedNamespace` is meant to be subclassed,
which is reflected in `lodkit.ClosedOntologyNamespace` and `lodkit.DefinedOntologyNamespace`.


## Testing Tools
`lodkit.testing_tools` aims to provide general definitions (e.g. graph format options) and [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) strategies useful for testing RDFLib-based Python code.

For example, the `TripleStrategies.triples` strategy generates random triples utilizing all permissible subject, predicate and object types, including language-tagged and XSD-typed literals.
The following test combines the `triples` strategy with the Hypothesis `lists` strategy to create random graphs:

```python
from hypothesis import given, strategies as st
from lodkit import tst
from rdflib import Graph


@given(triples=st.lists(tst.triples, min_size=1, max_size=10))
def test_some_function(triples):
    graph = Graph()
    for triple in triples:
        graph.add(triple)

    # A graph is a set of triples, so duplicates in the generated list collapse.
    assert len(graph) == len(set(triples))
```

By default, Hypothesis generates up to 100 examples (see [settings](https://hypothesis.readthedocs.io/en/latest/settings.html)); each example is a list of 1-10 `tuple[_TripleSubject, URIRef, _TripleObject]` that is passed to the test function.

> Warning: The API of `lodkit.testing_tools` is very likely to change soon! Strategies should be module-level callables and not properties of a singleton.
