Metadata-Version: 2.1
Name: cheka
Version: 0.7
Summary: This tool validates a data graph against a set of SHACL shape graphs that it extracts from a hierarchy of Profiles (Standards/Specifications and/or profiles of them).
Home-page: https://github.com/surroundaustralia/cheka
Author: Nicholas Car
Author-email: nicholas.car@surroundaustralia.com
License: LICENSE
Download-URL: https://github.com/surroundaustralia/cheka/archive/0.7.tar.gz
Project-URL: Bug Reports, https://github.com/surroundaustralia/cheka/issues
Project-URL: Source, https://github.com/surroundaustralia/cheka/
Keywords: Semantic Web,OWL,ontology,SHACL,validation,profiles
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Utilities
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Requires-Dist: rdflib (>=6.0.0)
Requires-Dist: pyshacl
Requires-Dist: requests
Requires-Dist: pytest

![](style/cheka.png)

![](https://zenodo.org/badge/DOI/10.5281/zenodo.3676330.svg)  
# CHEKA
A profile hierarchy-based RDF graph validation tool written in Python


1. [Installation](#installation)
2. [Use](#use)
3. [Testing](#testing)
4. [License](#license)
6. [Citation](#citation)
5. [Contacts](#contacts)


This tool validates a data graph against a set of SHACL shape graphs that it extracts from a hierarchy of Profiles (Standards/Specifications and/or profiles of them). It uses conformance claims in the data graph to a Profile to collate and then use all the 
*validator* SHACL files within the hierarchy of other Profiles and Standards to which that Profile profiles.

Cheka uses [Profiles Vocabulary (PROF)](https://www.w3.org/TR/dx-prof/) descriptions of Profiles and Standards and both 
traverses up a Profile hierarchy (following `prof:isProfileOf` properties) and across from `prof:Profiles`s to 
`prof:ResourceDescriptors` that describe the constraints implemented for them. These constraints are currently limited 
to [Shapes Constraint Language (SHACL)](https://www.w3.org/TR/shacl/) files and must have the `prof:Role` of 
`role:validation` to be recognised by Cheka. The [pySHACL](https://github.com/RDFLib/pySHACL) Python SHACL validator is 
used to perform SHACL validation.


## Installation
1. Ensure Python 3 in available on your system
2. Clone this repo
2. Install requirements in *requirements.txt*, e.g. `~$ pip3 install -r requirements.txt`
3. Execute scripts as per *Use* below


## Use
### Input requirements
To use Cheka, you must supply it with both a data (an RDF graph) to be validated and a profiles hierarchy (another RDF graph). It will then use 
one of several selected strategies to validate objects within the data using validating resources it locates using the 
profiles hierarchy.

You *may* supply it with a couple of other flags too for other functions.

The command line arguments (Python & BASH) are:

**Flag** | **Input values** | **Requirement** | **Notes**  
--- | --- | --- | ---
`-d` / `--data` | an RDF file's path | mandatory | Can be in most RDF formats with conventional file endings (e.g. `.ttl` for Turtle, `.jsonld` for JSON-LD)
`-p` / `--profiles` | a profile file's path | mandatory | As above. Profiles description must be formulated according to [PROF](https://www.w3.org/TR/dx-prof/) 
`-s` / `--strategy` | 'shacl' or 'profile' | optional, 'shacl' default | Which strategy to use. See [Strategies](#strategies) description below
`-u` / `--profile-uri` | the URI of a profile in the profile hierarchy | sometimes mandatory | If strategy 'profile' is selected, a profile URI must be give. The data is then validated using validators within that profile's hierarchy only
`-r` / `--get-remotes` | none | optional, default False | If True, Cheka will pull in profile and validating SHACL artifacts referenced, but not described, in the profiles hierarchy, i.e. remote profiles online


#### Data graph
This must be an RDF file with the part(s) to be validated indicating their conformance to a *profile* as per the 
Profiles Vocabulary.

Typically this will look like this:

```
@prefix dct: <http://purl.org/dc/terms/> .

<Object_X>
    a <Class_Y> ;
    dct:conformsTo <Profile_Z> ;
    ...
```

This says that `<Object_X>` is meant to conform to `<Profile_Z>`.

See the `tests/` folder for example data graphs.


#### Profiles hierarchy
This must also be an RDF file that contains a hierarchy of `prof:Profile` objects (including `dct:Standard` objects) 
that are related to one another via the `prof:isProfileOf` property and each of which has a validating resource 
indicated by relating it to a `prof:Profile` via a `prof:ResourceDescriptor` like this:

```
@prefix dct: <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role:  <http://www.w3.org/ns/dx/prof/role/> .


<Standard_A>
    a dct:Standard ;
    prof:hasResource [
        a prof:ResourceDescriptor ;
        prof:hasRole role:validation ;
        prof:hasArtifact <File_or_Uri_J> ;
    ]
.

<Profile_B>
    a prof:Profile ;
    prof:isProfileOf <Standard_A> ;
    prof:hasResource <Resource_Descriptor_P> ;
.   

<Resource_Descriptor_P>
    a prof:ResourceDescriptor ;
    prof:hasRole role:validation ;
    prof:hasArtifact <File_or_Uri_K> ;
.

<Profile_C>
    a prof:Profile ; 
    prof:isProfileOf <Profile_B> ;    
    prof:hasResource [
        a prof:ResourceDescriptor ;
        prof:hasRole role:validation ;
        prof:hasArtifact <File_or_Uri_L> ;
    ] ;
.
```

This says `<Profile_C>` is a profile of `<Profile_B>` which is, in turn, a profile of `<Standard_A>`. The two profiles 
and the standard have resources `<File_or_Uri_J>`, `<File_or_Uri_K>` & `<File_or_Uri_L>` respectively which are 
indicated to be validators by the `prof:ResourceDescriptor` classes that associate them with their profiles/standard.

See the `tests/` folder for example profiles graphs.


#### Strategies
The following different strategies may be selected for use.

**Name** | **Description**
--- | ---
*shacl* | Standard SHACL validation: all the SHACL validators from all the profiles found in the profiles hierarchy are used to validate the the given data using the SHACL validators' targeting (usually per class)
*profile* | Validates given data using the validators found linked to a profile and all the profiles in that profile's hierarchy. This is the "main" Cheka strategy, as opposed to *shacl* which is "normal" SHACL validation
*claims* | *Not implemented yet, likely February 2021*

***shacl*** is the default strategy

Note that the strategy is applied using the `-s` flag. When using Cheka as a Python module, a different strategy may be applied per call to `Cheka.validate()`.


### Running
Cheka uses the profiles graph to find all the SHACL validators it needs to validate a data graph. It returns a pySHACL 
result with an additional element - the URI of the profile used for validation: [conforms, results_graph, results_text, profile_uri]. *conforms* is either True or False.

#### As a Python module
A Python program can import Cheka (`import cheka`) after installing it (`pip install cheka`). Then Cheka can be called in code like this:

```python
import cheka

c = cheka.Cheka("data.ttl", "profiles_hierarchy.ttl")

# to tell Cheka to pull in profiles/validators 
# referenced but not defined in the profiles_hierarchy.ttl
c.get_remote_profiles = True  

# a simple validation - basic, default, shacl-only (no use of profiles)
c.validate()

# profile-based vaidation, starting with the profile Profile_C 
c.validate(
    strategy="profile", 
    profile_uri="http://example.org/profile/Profile_C"
)
```

#### As a Python command line utility
```
~$ python3 cli.py -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE
```
*(and potentially other optional args)*

If you make the cli.py script executable (`sudo chmod a+x cli.py`) then you can run it like this:

```
~$ ./cli.py -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE
```

#### As a BASH script
The file `cheka` in the `bin/` directory is a BASH shell script that calls `cli.py`. Make it executable 
(`sudo chmod a+x cheka`) then you can run it like this:

```
~$ ./cheka -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE
```
*(and potentially other optional args)*

#### As a Windows executable
*coming!*


## Testing
Tests are included in the `tests/` directory. They use [pytest](https://docs.pytest.org/en/latest/) should be able to be run from the command line. They have 
no dependencies, other than pytest and Cheka itself.

Tests are annotated with what they are testing.

### Test profile hierarchy
The profiles and validators used for the tests in this code are combined in the file [test-profile.hierarchy.ttl](cheka/test/test-profile.hierarchy.ttl). This hierarchy can be used in other applications as an example of a profile hierarchy.


## License  
This code is licensed using the GPL v3 licence. See the [LICENSE file](LICENSE) for the deed. 

Note [Citation](#citation) below for attribution.


## Citation
To cite this software, please use the following BibTex:

```
@software{10.5281/zenodo.3676330,
  author = {{Nicholas J. Car}},
  title = {Cheka: A profile hierarchy-based RDF graph validation tool written in Python},
  version = {0.5},
  date = {2020},
  publisher = "SURROUND Australia Pty. Ltd.",
  doi = {10.5281/zenodo.3676330},
  url = {https://doi.org/10.5281/zenodo.3676330}
}
```

Or the following RDF:

```
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sdo: <https://schema.org/> .
@prefix wiki: <https://www.wikidata.org/wiki/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://doi.org/10.5281/zenodo.3676330>
    a sdo:SoftwareSourceCode , owl:NamedIndividual ;
    sdo:codeRepository <https://github.com/surroundaustralia/cheka> ;
    dcterms:type wiki:Q7397 ; # "software"
    dcterms:creator "Nicholas J. Car" ;
    dcterms:date "2020"^^xsd:gYear ;
    dcterms:title "Cheka: A profile hierarchy-based RDF graph validation tool written in Python" ;
    sdo:version "0.5" ;
    dcterms:publisher [
        a sdo:Organization ;
        sdo:name "SURROUND Pty Ltd" ;
        sdo:url <https://surroundaustralia.com> ;
    ]
.
```


## Contacts

*publisher:*  
![](style/SURROUND-logo-100.png)  
**SURROUND Australia Pty. Ltd.**  
<https://surroundaustralia.com>  

*creator:*  
**Dr Nicholas J. Car**  
*Data Systems Architect*  
SURROUND Australia Pty. Ltd.  
<nicholas.car@surroudaustralia.com>  


