Metadata-Version: 2.1
Name: gimie
Version: 0.1.0
Summary: Extract structured metadata from git repositories.
Home-page: https://github.com/SDSC-ORD/gimie
License: GPLv3
Keywords: metadata,git,extraction,linked-data
Author: Swiss Data Science Center
Author-email: contact@datascience.ch
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: PyDriller (>=2.3,<3.0)
Requires-Dist: extruct (>=0.14.0,<0.15.0)
Requires-Dist: pyshacl (>=0.20.0,<0.21.0)
Requires-Dist: typer (>=0.7.0,<0.8.0)
Description-Content-Type: text/markdown

# Gimie

Gimie (GIt Meta Information Extractor) is a Python library and command line tool to extract structured metadata from git repositories.

## Context
Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to easily extract structured metadata from generic git repositories, using the following sources of information:

* [ ] Git metadata
* [ ] Filenames
* [ ] License
* [ ] HTML in web page
* [ ] Freetext content in README and other files
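
As an illustration of the "Filenames" source, the sketch below groups the top-level files of a local checkout by the kind of metadata their names suggest. It is a minimal, standard-library-only sketch; the `FILENAME_HINTS` table and the `scan_filenames` function are hypothetical and do not reflect gimie's actual detection rules.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from lower-cased filename stems to the kind of
# metadata the file probably carries (illustrative, not gimie's rules).
FILENAME_HINTS: Dict[str, str] = {
    "license": "license",
    "licence": "license",
    "copying": "license",
    "readme": "readme",
    "citation": "citation",
    "codemeta": "codemeta",
}


def scan_filenames(repo_dir: str) -> Dict[str, List[str]]:
    """Group the top-level files of a local checkout by metadata kind."""
    found: Dict[str, List[str]] = {}
    for path in sorted(Path(repo_dir).iterdir()):
        if not path.is_file():
            continue  # directories such as docs/ are ignored here
        # "LICENSE", "License.md" and "license.txt" all share the stem
        # "license" once lower-cased.
        kind = FILENAME_HINTS.get(path.stem.lower())
        if kind is not None:
            found.setdefault(kind, []).append(path.name)
    return found
```

Matching on the file stem rather than the full name keeps the table small, at the cost of ignoring files whose metadata role is only visible from their content.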

## Installation

To install the development version from GitHub:

```shell
pip install git+https://github.com/SDSC-ORD/gimie.git#egg=gimie
```

## Usage

As a command line tool:
```shell
gimie https://github.com/numpy/numpy
```
As a python library:

```python
import gimie

# Point gimie at the URL of the repository to analyze
repo = gimie.Repo("https://github.com/numpy/numpy")
```

## Outputs

The default output is JSON-LD, a JSON serialization of the [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) data model. We follow the schema recommended by [CodeMeta](https://codemeta.github.io/).
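
For orientation, here is a minimal sketch of what a CodeMeta record serialized as JSON-LD can look like, built with the standard `json` module. The helper `to_codemeta_jsonld`, the selection of properties and the example values (`example/my-tool`, `Jane Doe`) are hypothetical placeholders; gimie's actual output may contain other or additional properties.

```python
import json
from typing import Any, Dict, List

# JSON-LD context of the CodeMeta 2.0 vocabulary.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def to_codemeta_jsonld(
    name: str, repo_url: str, license_url: str, authors: List[str]
) -> Dict[str, Any]:
    """Build a minimal CodeMeta record for a source code repository."""
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        # Use the repository URL as the node identifier.
        "@id": repo_url,
        "name": name,
        "codeRepository": repo_url,
        "license": license_url,
        "author": [{"@type": "Person", "name": a} for a in authors],
    }


# Placeholder values for a hypothetical repository.
record = to_codemeta_jsonld(
    name="my-tool",
    repo_url="https://github.com/example/my-tool",
    license_url="https://spdx.org/licenses/GPL-3.0-or-later",
    authors=["Jane Doe"],
)
print(json.dumps(record, indent=2))
```

Because the `@context` maps each key to a CodeMeta term, any JSON-LD processor can expand this plain JSON into RDF triples.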

## Contributing

All contributions are welcome. New functions and classes should have associated tests and docstrings following the [NumPy style guide](https://numpydoc.readthedocs.io/en/latest/format.html).
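
A hypothetical example of a new function following these conventions: a NumPy-style docstring (`Parameters`, `Returns`, `Raises` and `Examples` sections) together with associated tests that pytest would discover as plain `test_*` functions. The `split_repo_url` helper is an assumption made for illustration and is not part of gimie's API.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_repo_url(url: str) -> Tuple[str, str]:
    """Split a repository URL into its owner and repository name.

    Parameters
    ----------
    url : str
        URL of the repository, e.g. ``https://github.com/numpy/numpy``.

    Returns
    -------
    Tuple[str, str]
        The owner (user or organization) and the repository name.

    Raises
    ------
    ValueError
        If the URL path does not contain both an owner and a name.

    Examples
    --------
    >>> split_repo_url("https://github.com/numpy/numpy")
    ('numpy', 'numpy')
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner and name in {url!r}")
    owner, name = parts[0], parts[1]
    # Drop a trailing ".git" so clone URLs and web URLs give the same result.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return owner, name


# Associated tests, e.g. in a hypothetical tests/test_utils.py.
def test_split_repo_url_web():
    owner, name = split_repo_url("https://github.com/numpy/numpy")
    assert (owner, name) == ("numpy", "numpy")


def test_split_repo_url_clone():
    owner, name = split_repo_url("https://github.com/SDSC-ORD/gimie.git")
    assert (owner, name) == ("SDSC-ORD", "gimie")
```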

The code formatting standard we use is [black](https://github.com/psf/black), with `--line-length=79` to follow [PEP8](https://peps.python.org/pep-0008/) recommendations. We use [pytest](https://docs.pytest.org/en/7.2.x/) as our testing framework. This project uses [pyproject.toml](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/) to define package information, requirements and tooling configuration.

For local development, you can clone the repository and install the package in editable mode, either using [pip](https://pip.pypa.io/en/stable/):

```shell
git clone https://github.com/SDSC-ORD/gimie && cd gimie
pip install -e .
```
Or [poetry](https://python-poetry.org/), to work in an isolated virtual environment:
```shell
git clone https://github.com/SDSC-ORD/gimie && cd gimie
poetry install
```

