Metadata-Version: 2.1
Name: gimie
Version: 0.2.0
Summary: Extract structured metadata from git repositories.
Home-page: https://github.com/SDSC-ORD/gimie
License: GPLv3
Keywords: metadata,git,extraction,linked-data
Author: Swiss Data Science Center
Author-email: contact@datascience.ch
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: PyDriller (>=2.3,<3.0)
Requires-Dist: calamus (>=0.4.1,<0.5.0)
Requires-Dist: pyshacl (>=0.20.0,<0.21.0)
Requires-Dist: requests (>=2.28.2,<3.0.0)
Requires-Dist: typer (>=0.7.0,<0.8.0)
Description-Content-Type: text/markdown

# Gimie

Gimie (GIt Meta Information Extractor) is a Python library and command-line tool to extract structured metadata from git repositories.

:warning: Gimie is at an early development stage. It is not yet functional.

## Context
Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to make it easy to extract structured metadata from any git repository. The following sources of information are planned; checked items are already supported:

* [x] GitHub API
* [ ] GitLab API
* [ ] Local Git metadata
* [ ] License text
* [ ] Free text in README
* [ ] Renku project metadata

## Installation

To install the development version from GitHub:

```shell
pip install git+https://github.com/SDSC-ORD/gimie.git#egg=gimie
```

## Usage

As a command-line tool:
```shell
gimie data https://github.com/numpy/numpy
```
As a Python library:

```python
from gimie.project import Project
proj = Project("https://github.com/numpy/numpy")

# To retrieve the rdflib.Graph object
g = proj.to_graph()

# To retrieve the serialized graph
proj.serialize(format='ttl')
```

Or to extract only from a specific source:
```python
from gimie.sources.remote import GithubExtractor
gh = GithubExtractor('https://github.com/SDSC-ORD/gimie')
gh.extract()

# To retrieve the rdflib.Graph object
g = gh.to_graph()

# To retrieve the serialized graph
gh.serialize(format='ttl')
```

## Outputs

The default output is JSON-LD, a JSON serialization of the [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) data model. We follow the schema recommended by [codemeta](https://codemeta.github.io/).
Supported formats are json-ld, turtle and n-triples.
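
To give a rough idea of what a JSON-LD record looks like, here is a minimal sketch built with the standard library only. The properties shown are schema.org terms chosen for illustration; the actual set of properties and values produced by Gimie is an assumption here and may differ.

```python
import json

# Hypothetical record: keys are schema.org terms, but this is NOT actual
# Gimie output, only an illustration of the general JSON-LD shape.
doc = {
    "@context": "https://schema.org",
    "@type": "SoftwareSourceCode",
    "@id": "https://github.com/SDSC-ORD/gimie",
    "name": "gimie",
    "description": "Extract structured metadata from git repositories.",
    "codeRepository": "https://github.com/SDSC-ORD/gimie",
}

# JSON-LD is plain JSON, so any JSON tool can write and read it back.
serialized = json.dumps(doc, indent=2)
print(serialized)
```

The `@context` key maps the short property names to full IRIs, which is what lets the same document be read as an RDF graph.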

## Contributing

All contributions are welcome. New functions and classes should have associated tests and docstrings following the [numpy style guide](https://numpydoc.readthedocs.io/en/latest/format.html).

The code formatting standard we use is [black](https://github.com/psf/black), with `--line-length=79` to follow [PEP8](https://peps.python.org/pep-0008/) recommendations. We use [pytest](https://docs.pytest.org/en/7.2.x/) as our testing framework. This project uses [pyproject.toml](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/) to define package information, requirements and tooling configuration.

For local development, you can clone the repository and install the package in editable mode, either using [pip](https://pip.pypa.io/en/stable/):

```shell
git clone https://github.com/SDSC-ORD/gimie && cd gimie
pip install -e .
```
Or [poetry](https://python-poetry.org/), to work in an isolated virtual environment:
```shell
git clone https://github.com/SDSC-ORD/gimie && cd gimie
poetry install
```

## Releases and Publishing on PyPI

Releases are made through GitHub releases:

- Creating a release triggers a GitHub workflow that publishes the package on PyPI.
- Make sure to update the version in `pyproject.toml` before making the release.
- Publishing can be tested on TestPyPI by running a manual workflow: go to GitHub Actions and run the workflow 'Publish on Pypi Test'.
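
As a quick sanity check before tagging a release, the declared version can be read back from `pyproject.toml`. The sketch below is not part of Gimie. It assumes the version lives in a `[tool.poetry]` table (a guess based on the Poetry setup described above), and the helper name `read_poetry_version` is hypothetical.

```python
import re


def read_poetry_version(pyproject_text: str) -> str:
    """Return the version string declared under [tool.poetry]."""
    # Isolate the [tool.poetry] table: everything up to the next table header.
    table = re.search(
        r"^\[tool\.poetry\]\s*$(.*?)(?=^\[|\Z)",
        pyproject_text,
        re.MULTILINE | re.DOTALL,
    )
    if table is None:
        raise ValueError("no [tool.poetry] table found")
    version = re.search(
        r'^version\s*=\s*"([^"]+)"', table.group(1), re.MULTILINE
    )
    if version is None:
        raise ValueError("no version field found in [tool.poetry]")
    return version.group(1)


# Hypothetical file content, for illustration only.
example = '''[tool.poetry]
name = "gimie"
version = "0.2.0"

[tool.poetry.dependencies]
python = ">=3.8,<4.0"
'''
print(read_poetry_version(example))
```

In practice the text would come from reading the real `pyproject.toml`, and the result can be compared by hand with the tag chosen for the release.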

