Metadata-Version: 2.1
Name: investigraph
Version: 0.0.4
Summary: etl pipeline for investigations with follow the money data
Home-page: https://investigraph.vercel.app
License: MIT
Author: Simon Wörpel
Author-email: simon@investigativedata.org
Requires-Python: >=3.10,<4.0
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: banal (==1.0.6)
Requires-Dist: cachelib (==0.10.2)
Requires-Dist: dateparser (==1.1.8)
Requires-Dist: fakeredis (==2.11.2)
Requires-Dist: followthemoney (==3.3.0)
Requires-Dist: followthemoney-store (==3.0.3)
Requires-Dist: html2text (==2020.1.16)
Requires-Dist: nomenklatura (==2.9.3)
Requires-Dist: pandas (==2.0.0)
Requires-Dist: pantomime (==0.6.0)
Requires-Dist: prefect (==2.10.11)
Requires-Dist: prefect-dask (==0.2.4)
Requires-Dist: redis (==4.5.4)
Requires-Dist: requests (==2.28.2)
Requires-Dist: smart-open[all] (==6.3.0)
Requires-Dist: sqlalchemy (==1.4.48)
Requires-Dist: tabulate (==0.9.0)
Requires-Dist: typer (==0.9.0)
Requires-Dist: zavod (==0.6.1)
Project-URL: Bug Tracker, https://github.com/investigativedata/investigraph-etl/issues
Project-URL: Documentation, https://investigativedata.github.io/investigraph/
Project-URL: Repository, https://github.com/investigativedata/investigraph-etl
Description-Content-Type: text/markdown

# investigraph

**Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.**

Uses [prefect.io](https://www.prefect.io/) to process FollowTheMoney (FtM) data pipelines.

[Documentation](https://investigativedata.github.io/investigraph/)

[Tutorial](https://investigativedata.github.io/investigraph/tutorial/)

## installation

    pip install investigraph

## example datasets

Example datasets are kept in a dedicated [repository](https://github.com/investigativedata/investigraph-datasets), which can be registered as a [Block](https://docs.prefect.io/2.10.11/concepts/blocks/) within the prefect.io deployment.

## deployment

### docker

Use `docker-compose.yml` for local development and testing, and `docker-compose.prod.yml` as a starting point for a production setup. [More instructions here](https://investigativedata.github.io/investigraph/deployment/)

## run locally

Clone the [repository](https://github.com/investigativedata/investigraph-etl) first.

Install the app and its dependencies (preferably inside a virtualenv):

    pip install -e .

After installation, the `investigraph` command should be available:

    investigraph --help

Quickly run a dataset defined in a local config file:

    investigraph run <dataset_name> -c ./path/to/config.yml

Register a block for a local datasets directory:

    investigraph add-block -b local-file-system/investigraph-local -u ./datasets

Register a GitHub datasets block:

    investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git

Run the pipeline for a dataset defined in a registered block:

    investigraph run ec_meetings

View the prefect dashboard:

    make server

## test

    make install
    make test

## supported by

[Media Tech Lab Bayern batch #3](https://github.com/media-tech-lab)

<a href="https://www.media-lab.de/en/programs/media-tech-lab">
    <img src="https://raw.githubusercontent.com/media-tech-lab/.github/main/assets/mtl-powered-by.png" width="240" title="Media Tech Lab powered by logo">
</a>

