Metadata-Version: 2.1
Name: ssb-datadoc
Version: 0.4.0
Summary: Document dataset metadata. For use in Statistics Norway's metadata system.
Home-page: https://github.com/statisticsnorway/datadoc
License: MIT
Author: Statistics Norway
Author-email: stat-dev@ssb.no
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: Dash
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Dist: dapla-toolbelt (>=1.3.3)
Requires-Dist: dash (>=2.11)
Requires-Dist: dash-bootstrap-components (>=1.1.0)
Requires-Dist: flask-healthz (>=0.0.3)
Requires-Dist: gcsfs (>=2022.7.1)
Requires-Dist: gunicorn (>=21.2.0)
Requires-Dist: pandas (>=1.4.2)
Requires-Dist: pyarrow (>=8.0.0)
Requires-Dist: pydantic (>2)
Requires-Dist: ssb-datadoc-model (==4.1.2)
Project-URL: Repository, https://github.com/statisticsnorway/datadoc
Description-Content-Type: text/markdown

# Datadoc

![Datadoc Unit tests](https://github.com/statisticsnorway/datadoc/actions/workflows/unit-tests.yml/badge.svg) ![Code coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/mmwinther/0c0c5bdfc360b59254f2c32d65914025/raw/pytest-coverage-badge-datadoc.json) [![PyPI version](https://img.shields.io/pypi/v/ssb-datadoc)](https://pypi.org/project/ssb-datadoc/) ![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)

Document datasets in Statistics Norway

## Usage

![Datadoc in use](./doc/change-language-example.gif)

### From Jupyter

1. Open <https://jupyter.dapla-staging.ssb.no> or another Jupyter Lab environment
1. Datadoc comes preinstalled in Statistics Norway environments. Elsewhere, run `pip install ssb-datadoc` to install it
1. Upload a dataset to your Jupyter server (e.g. <https://github.com/statisticsnorway/datadoc/blob/master/klargjorte_data/befolkning/person_testdata_p2021-12-31_p2021-12-31_v1.parquet>)
1. Run the [demo.ipynb](./demo.ipynb) Notebook
1. Datadoc will open in the notebook

## Contributing

### Local environment

Poetry is used for dependency management. [Poe the Poet](https://github.com/nat-n/poethepoet) is used for running tasks within Poetry's virtualenv. After cloning this project, first install the necessary dependencies, then run the tests to verify that everything is working.

#### 1. Prerequisites

- Python >=3.10
- Poetry, install via `curl -sSL https://install.python-poetry.org | python3 -`
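
The prerequisites above can be checked with a short script. This is a minimal sketch using only the standard library; the function names are illustrative and not part of Datadoc, and the version bounds are taken from the package's `Requires-Python: >=3.10,<4.0` metadata.

```python
import shutil
import sys


def python_version_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) satisfies >=3.10,<4.0."""
    return (3, 10) <= version < (4, 0)


def poetry_on_path() -> bool:
    """Return True if a `poetry` executable can be found on PATH."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    print(f"Python {current[0]}.{current[1]} supported: {python_version_supported(current)}")
    print(f"Poetry found on PATH: {poetry_on_path()}")
```

If either check reports `False`, install the missing prerequisite before continuing to the next step.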

#### 2. Install dependencies

```shell
poetry install
```

#### 3. Install pre-commit hooks

```shell
poetry run pre-commit install
```

#### 4. Run tests

```shell
poetry run poe test
```

### Add dependencies

#### Main

```shell
poetry add <python package name>
```

#### Dev

```shell
poetry add --group dev <python package name>
```

### Run project locally

To run the project locally:

```shell
poetry run poe datadoc
```

### Run project locally in Jupyter

To run the project locally in Jupyter, run:

```shell
poetry run poe jupyter
```

A Jupyter instance should open in your browser. Open the `.ipynb` file and run its cells to demo Datadoc.

### Run the Dockerized application locally

```bash
docker run -p 8050:8050 \
  -v "$HOME/.config/gcloud/application_default_credentials.json:/application_default_credentials.json" \
  -e GOOGLE_APPLICATION_CREDENTIALS="/application_default_credentials.json" \
  datadoc
```
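
The command above bind-mounts a Google credentials file into the container. If that file does not exist on the host, Docker typically creates a directory at that path instead, which then fails in confusing ways inside the container. The sketch below checks for the file before assembling the same arguments. It assumes that an image tagged `datadoc` has already been built and that the credentials were created by `gcloud auth application-default login`; the function names are hypothetical and not part of Datadoc.

```python
from pathlib import Path

# Path inside the container, as used in the `docker run` command above.
CONTAINER_CREDENTIALS = "/application_default_credentials.json"


def host_credentials_path(home: Path) -> Path:
    """Default location of Application Default Credentials on Linux/macOS."""
    return home / ".config" / "gcloud" / "application_default_credentials.json"


def docker_run_args(home: Path, image: str = "datadoc") -> list[str]:
    """Build the `docker run` argument list, failing early if credentials are missing."""
    credentials = host_credentials_path(home)
    if not credentials.is_file():
        raise FileNotFoundError(f"No credentials file at {credentials}")
    return [
        "docker", "run", "-p", "8050:8050",
        "-v", f"{credentials}:{CONTAINER_CREDENTIALS}",
        "-e", f"GOOGLE_APPLICATION_CREDENTIALS={CONTAINER_CREDENTIALS}",
        image,
    ]
```

Calling `docker_run_args(Path.home())` returns a list that can be passed to `subprocess.run`, or raises `FileNotFoundError` when the credentials file is missing.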

### Release process

Run the relevant version command on a branch, for example one of:

```shell
poetry version patch
```

```shell
poetry version minor
```

Commit with a message like `Bump version x.x.x -> y.y.y`.

Open and merge a PR.

Use GitHub to tag and release.
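
To illustrate what the version commands and commit message convention above produce, here is a small sketch. It is not Poetry's implementation: it only handles plain `major.minor.patch` strings, and `bump` and `commit_message` are hypothetical helper names.

```python
def bump(version: str, part: str) -> str:
    """Bump a plain `major.minor.patch` version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "patch":
        patch += 1
    elif part == "minor":
        # A minor bump resets the patch component.
        minor, patch = minor + 1, 0
    elif part == "major":
        # A major bump resets both minor and patch.
        major, minor, patch = major + 1, 0, 0
    else:
        raise ValueError(f"Unknown part: {part!r}")
    return f"{major}.{minor}.{patch}"


def commit_message(old: str, new: str) -> str:
    """Format the commit message used when bumping the version."""
    return f"Bump version {old} -> {new}"
```

Starting from version `0.4.0`, a patch bump gives `0.4.1` and a minor bump gives `0.5.0`, so the commit message for a patch release would be `Bump version 0.4.0 -> 0.4.1`.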

