Metadata-Version: 2.4
Name: semase
Version: 0.0.1
Summary: Semantic integration of ASE with SPARQL knowledge graphs
Author-email: Daniel Hernandez <daniel@degu.cl>
Maintainer-email: Daniel Hernandez <daniel@degu.cl>
License-Expression: MIT
Project-URL: Homepage, https://git.degu.cl/daniel/SemASE
Project-URL: Repository, https://git.degu.cl/daniel/SemASE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ase
Requires-Dist: rdflib
Requires-Dist: SPARQLWrapper
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pyoxigraph; extra == "test"
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pyoxigraph; extra == "dev"
Dynamic: license-file

# SemASE

SemASE integrates the [Atomic Simulation Environment
(ASE)](https://gitlab.com/ase/ase) with knowledge graphs stored in a SPARQL
endpoint. It extends ASE classes with methods to save and load atomic simulation
data as RDF triples, enabling semantic interoperability and structured querying
of simulation results.

SemASE works by patching ASE classes at runtime, so existing code requires
minimal changes. After activation, classes like `Atoms` and `Cell` gain
`save_to_kg()` and `update_from_kg()` methods that serialize their state into an
RDF knowledge graph and reconstruct it from that graph.
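
As a rough illustration of what runtime patching means, the sketch below attaches a method to a class after instances already exist. It is not SemASE's implementation: the `Atoms` class here is a stand-in for `ase.Atoms`, and the method body and the `hasSymbol` predicate are hypothetical.

```python
class Atoms:
    """Stand-in for ase.Atoms, used only to keep this sketch dependency-free."""

    def __init__(self, symbols):
        self.symbols = list(symbols)


def save_to_kg(self, uri):
    """Hypothetical method body: return (subject, predicate, object) tuples."""
    return [(uri, "hasSymbol", symbol) for symbol in self.symbols]


def activate():
    """Attach the new method to the class itself, not to individual instances."""
    if not hasattr(Atoms, "save_to_kg"):
        Atoms.save_to_kg = save_to_kg


# An object created before activation also gains the method, because
# attribute lookup falls back to the class at call time.
water = Atoms(["O", "H", "H"])
activate()
triples = water.save_to_kg("ex:water-001")
print(triples)
```

Because the method lives on the class, code that already constructs `Atoms` objects does not need to change how it creates them.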

**This project is under development**

## Installation

```bash
pip install semase
```

### Dependencies

- [ASE](https://gitlab.com/ase/ase) -- Atomic Simulation Environment
- [rdflib](https://rdflib.readthedocs.io/) -- RDF triple construction and
  serialization
- [SPARQLWrapper](https://sparqlwrapper.readthedocs.io/) -- SPARQL endpoint
  communication

## Examples

### Activating SemASE

Call `semase.activate()` once at the start of your script. After activation, all
ASE classes patched by SemASE gain knowledge graph methods.

SemASE can work against a remote SPARQL endpoint (e.g., Apache Jena Fuseki):

```python
import semase

semase.activate(endpoint="http://localhost:3030/ase/sparql")
```

Alternatively, you can use [pyoxigraph](https://pyoxigraph.readthedocs.io/) as a
local store, which requires no server setup. A `pyoxigraph.Store()` created with
no arguments is in-memory, so its data is lost when the process exits:

```python
import semase
import pyoxigraph

semase.activate()
store = pyoxigraph.Store()  # in-memory, not persistent
```

To persist data across sessions, pass a directory path. Oxigraph saves the
database in that directory and reloads it automatically:

```python
store = pyoxigraph.Store("/path/to/my-data")  # persistent on disk
```

Then pass `store=` to `save_to_kg()` and `update_from_kg()`:

```python
from ase import Atoms

# 'OH2' puts the oxygen first, so it takes the position at the origin
water = Atoms('OH2', positions=[(0, 0, 0), (0, 0.76, 0.59), (0, -0.76, 0.59)])
water.save_to_kg(uri="ase:water-001", store=store)

loaded = Atoms()
loaded.update_from_kg(uri="ase:water-001", store=store)
print(loaded.get_chemical_formula())  # H2O
```

### Saving a molecule to the knowledge graph

```python
from ase import Atoms

# Build a water molecule: oxygen at the origin, followed by the two hydrogens
water = Atoms('OH2', positions=[(0, 0, 0), (0, 0.76, 0.59), (0, -0.76, 0.59)])

# Save it to the knowledge graph
water.save_to_kg(uri="ase:water-001")
```

The `save_to_kg()` method serializes the `Atoms` object -- including its `Cell`,
atomic symbols, and positions -- into RDF triples and inserts them into the
configured SPARQL endpoint.
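
To make the serialization step more concrete, here is a minimal sketch of how symbols and positions could be flattened into triples and wrapped in a SPARQL `INSERT DATA` update. The `semase:` namespace and `AtomicSystem` class come from the query example in this README; the `hasAtom`, `symbol`, `x`, `y`, `z` predicates, the `example.org` IRI, and both helper functions are assumptions made for illustration, not SemASE's actual vocabulary or code.

```python
SEMASE = "https://semase.org/ontology#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def atoms_to_triples(system_iri, symbols, positions):
    """Return (subject, predicate, object) terms in SPARQL syntax."""
    system = f"<{system_iri}>"
    triples = [(system, "a", f"<{SEMASE}AtomicSystem>")]
    for index, (symbol, position) in enumerate(zip(symbols, positions)):
        atom = f"<{system_iri}/atom/{index}>"  # hypothetical IRI scheme
        triples.append((system, f"<{SEMASE}hasAtom>", atom))
        triples.append((atom, f"<{SEMASE}symbol>", f'"{symbol}"'))
        for axis, value in zip("xyz", position):
            literal = f'"{float(value)}"^^<{XSD_DOUBLE}>'
            triples.append((atom, f"<{SEMASE}{axis}>", literal))
    return triples


def to_insert_data(triples):
    """Wrap the triples in a SPARQL 1.1 INSERT DATA update."""
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return "INSERT DATA {\n" + body + "\n}"


triples = atoms_to_triples(
    "https://example.org/ase/water-001",
    ["O", "H", "H"],
    [(0, 0, 0), (0, 0.76, 0.59), (0, -0.76, 0.59)],
)
update = to_insert_data(triples)
print(update)
```

Giving each atom its own node is one possible modelling choice; it keeps the symbol and coordinates of an atom grouped under a single subject.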

### Loading a molecule from the knowledge graph

```python
from ase import Atoms

# Reconstruct the Atoms object from the knowledge graph
water = Atoms()
water.update_from_kg(uri="ase:water-001")

print(water.get_chemical_formula())  # H2O
print(water.positions)
```

The `update_from_kg()` method queries the SPARQL endpoint for the given URI and
populates the object's attributes from the stored triples.
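
The loading direction can be sketched in the same spirit. A SPARQL endpoint typically returns `SELECT` results in the standard SPARQL 1.1 JSON results format, where every value is wrapped in a small object with a `value` string. The example below rebuilds symbols and positions from such a response; the variable names (`index`, `symbol`, `x`, `y`, `z`) and the `rebuild` helper are hypothetical, and the response is hand-written rather than produced by SemASE.

```python
def lit(value):
    """Wrap a value the way the SPARQL JSON results format wraps a literal."""
    return {"type": "literal", "value": str(value)}


# Hypothetical response. Rows are deliberately out of order, since SPARQL
# results are unordered unless the query uses ORDER BY.
response = {
    "head": {"vars": ["index", "symbol", "x", "y", "z"]},
    "results": {
        "bindings": [
            {"index": lit(2), "symbol": lit("H"),
             "x": lit(0), "y": lit(-0.76), "z": lit(0.59)},
            {"index": lit(0), "symbol": lit("O"),
             "x": lit(0), "y": lit(0), "z": lit(0)},
            {"index": lit(1), "symbol": lit("H"),
             "x": lit(0), "y": lit(0.76), "z": lit(0.59)},
        ]
    },
}


def rebuild(response):
    """Sort rows by atom index and convert the string values back to numbers."""
    rows = sorted(
        response["results"]["bindings"],
        key=lambda row: int(row["index"]["value"]),
    )
    symbols = [row["symbol"]["value"] for row in rows]
    positions = [
        tuple(float(row[axis]["value"]) for axis in "xyz") for row in rows
    ]
    return symbols, positions


symbols, positions = rebuild(response)
print(symbols)
print(positions)
```

Sorting by an explicit index matters here because atom order is significant in ASE, while triples in a graph carry no inherent order.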

### Saving a periodic crystal

```python
from ase.build import bulk

si = bulk('Si', 'diamond', a=5.43)
si.save_to_kg(uri="ase:silicon-diamond")
```

The `Atoms` object and its associated `Cell` are saved together. The cell
vectors and periodic boundary conditions are stored as part of the graph.
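
For the silicon example, ASE's `bulk` builds the two-atom primitive cell by default, whose vectors are those of the underlying fcc lattice. The sketch below writes those vectors out by hand and flattens them, together with the periodicity flags, into simple records. The predicate names and the `cell_records` helper are hypothetical and only suggest the kind of information that has to be stored.

```python
a = 5.43  # lattice constant in angstrom, as in the example above
b = a / 2

# Primitive cell vectors of the fcc lattice underlying the diamond structure.
cell = [(0.0, b, b), (b, 0.0, b), (b, b, 0.0)]
pbc = (True, True, True)


def volume(cell):
    """Cell volume as the absolute value of the scalar triple product."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = cell
    return abs(
        ax * (by * cz - bz * cy)
        - ay * (bx * cz - bz * cx)
        + az * (bx * cy - by * cx)
    )


def cell_records(uri, cell, pbc):
    """Flatten the cell into (subject, predicate, value) records.

    The predicate names are hypothetical.
    """
    records = []
    for i, (vector, periodic) in enumerate(zip(cell, pbc), start=1):
        for axis, value in zip("xyz", vector):
            records.append((uri, f"cellVector{i}_{axis}", value))
        records.append((uri, f"periodic{i}", periodic))
    return records


records = cell_records("ase:silicon-diamond", cell, pbc)
print(len(records), "records, cell volume:", volume(cell))
```

The primitive cell volume equals a quarter of the conventional cubic cell volume, which gives a quick sanity check on values loaded back from the graph.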

### Querying the knowledge graph directly

SemASE also exposes its SPARQL client for custom queries:

```python
from semase.sparql.client import SparqlClient

client = SparqlClient(store=store)  # or SparqlClient(endpoint="http://...")

results = client.query("""
    PREFIX semase: <https://semase.org/ontology#>

    SELECT ?system ?formula WHERE {
        ?system a semase:AtomicSystem ;
                semase:chemicalFormula ?formula .
    }
""")

for row in results:
    print(row["system"], row["formula"])
```

### Updating an existing entry

If a URI already exists in the knowledge graph, `save_to_kg()` replaces its
triples with the current state of the object:

```python
from ase import Atoms

water = Atoms()
water.update_from_kg(uri="ase:water-001")

# Modify the structure
water.positions[1] += [0, 0.01, 0]

# Save the updated version back
water.save_to_kg(uri="ase:water-001")
```
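
The reason replacement matters, rather than a plain insert, is that a graph is a set of triples: inserting a new value for a property does not remove the old one. The sketch below models this with a Python set. The `replace_entry` helper and the `ex:` names are hypothetical; against a real endpoint, the equivalent would be a SPARQL 1.1 Update that deletes the existing triples for the URI before inserting the new ones.

```python
def replace_entry(graph, subject, new_triples):
    """Drop every triple about `subject`, then add the new ones."""
    kept = {triple for triple in graph if triple[0] != subject}
    return kept | set(new_triples)


graph = {
    ("ex:water-001", "ex:y1", 0.76),
    ("ex:other", "ex:y1", 1.0),
}
updated = [("ex:water-001", "ex:y1", 0.77)]

# Plain insert: the stale value survives next to the new one.
naive = graph | set(updated)

# Replace: only the new value remains, and other entries are untouched.
replaced = replace_entry(graph, "ex:water-001", updated)

print(sorted(naive))
print(sorted(replaced))
```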

## License

[MIT](LICENSE)

## Roadmap

The following ideas are candidates for the roadmap of this project.

1. We need to test whether this works with a toy data example.

2. This project saves objects to a knowledge graph shared among researchers. We could improve this by integrating it with a file store. At the University of Stuttgart we use [DARUS](https://darus.uni-stuttgart.de/), so a module that saves simulation data to DARUS would be useful.

3. We should also integrate this tool with research workflows to make computations repeatable.
