Metadata-Version: 2.1
Name: viadot2
Version: 2.0a21
Summary: A simple data ingestion library to guide data flows from some places to other places.
Author-email: acivitillo <acivitillo@dyvenia.com>, trymzet <mzawadzki@dyvenia.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: azure-core==1.25.0
Requires-Dist: azure-storage-blob==12.13.1
Requires-Dist: awswrangler==2.19.0
Requires-Dist: s3fs==2022.11.0
Requires-Dist: boto3==1.24.59
Requires-Dist: pandas==1.4.4
Requires-Dist: pyarrow==10.0.1
Requires-Dist: pyodbc<4.1.0,>=4.0.34
Requires-Dist: openpyxl==3.0.10
Requires-Dist: jupyterlab==3.2.4
Requires-Dist: azure-identity==1.7.1
Requires-Dist: matplotlib>=3.8.3
Requires-Dist: adlfs==2022.9.1
Requires-Dist: Shapely==1.8.0
Requires-Dist: imagehash==4.2.1
Requires-Dist: visions==0.7.5
Requires-Dist: sharepy<2.1.0,>=2.0.0
Requires-Dist: simple_salesforce==1.11.5
Requires-Dist: sql-metadata==2.3.0
Requires-Dist: duckdb==0.5.1
Requires-Dist: sendgrid==6.9.7
Requires-Dist: pandas-gbq==0.19.1
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: pydantic==1.10.11
Requires-Dist: aiolimiter==1.0.0
Requires-Dist: trino==0.326.*
Requires-Dist: sqlalchemy==2.0.*
Requires-Dist: minio<8.0,>=7.0
Requires-Dist: databricks-connect==11.3.*

# Viadot
[![build status](https://github.com/dyvenia/viadot/actions/workflows/build.yml/badge.svg)](https://github.com/dyvenia/viadot/actions/workflows/build.yml)
[![formatting](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![codecov](https://codecov.io/gh/Trymzet/dyvenia/branch/main/graph/badge.svg?token=k40ALkXbNq)](https://codecov.io/gh/Trymzet/dyvenia)

---

**Documentation**: <a href="https://dyvenia.github.io/viadot/" target="_blank">https://dyvenia.github.io/viadot/</a>

**Source Code**: <a href="https://github.com/dyvenia/viadot" target="_blank">https://github.com/dyvenia/viadot</a>

---

A simple data ingestion library to guide data flows from some places to other places.

## Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. The examples below use the UK Carbon Intensity public API.

```python
from viadot.sources.uk_carbon_intensity import UKCarbonIntensity

ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()

print(df)
```

**Output:**
|      | from              | to                | forecast | actual | index    |
| ---: | :---------------- | :---------------- | -------: | -----: | :------- |
|    0 | 2021-08-10T11:00Z | 2021-08-10T11:30Z |      211 |    216 | moderate |

The `df` above is a pandas `DataFrame` containing the data that `viadot` downloaded from the UK Carbon Intensity API.
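The API itself returns JSON, so a `to_df()` step has to flatten it into the tabular shape shown above. The sketch below illustrates that flattening. It is not viadot's implementation, and it assumes the documented `/intensity` response shape, where each item in `data` holds `from`, `to`, and a nested `intensity` object with `forecast`, `actual`, and `index`. The function name `flatten_intensity` is hypothetical.

```python
from typing import Any


def flatten_intensity(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: flatten a Carbon Intensity `/intensity` response.

    Assumes the response shape
    {"data": [{"from": ..., "to": ..., "intensity": {"forecast": ..., "actual": ..., "index": ...}}]}.
    """
    records = []
    for item in payload.get("data", []):
        # The intensity values are nested one level down; lift them to the top level.
        intensity = item.get("intensity", {})
        records.append(
            {
                "from": item["from"],
                "to": item["to"],
                "forecast": intensity.get("forecast"),
                "actual": intensity.get("actual"),
                "index": intensity.get("index"),
            }
        )
    return records


# Example payload built from the row in the output table above.
sample = {
    "data": [
        {
            "from": "2021-08-10T11:00Z",
            "to": "2021-08-10T11:30Z",
            "intensity": {"forecast": 211, "actual": 216, "index": "moderate"},
        }
    ]
}

records = flatten_intensity(sample)
print(records)
```

Passing `records` to `pandas.DataFrame` yields a frame whose columns are `from`, `to`, `forecast`, `actual`, and `index`, in that order.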

## Loading Data to a Source
Depending on the source, `viadot` provides different methods of uploading data. For example, SQL sources load data via bulk inserts, while data lake sources load it via file uploads. For ready-made pipelines that include data validation steps using `dbt`, see [prefect-viadot](https://github.com/dyvenia/prefect-viadot).
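To make the two loading styles concrete, here is a minimal sketch using only the standard library. `sqlite3` stands in for a SQL source, where all rows go through one bulk `executemany` call, and a local temporary directory stands in for a data lake, where the data is serialized and written as a single file. The functions `load_to_sql` and `load_to_lake` are hypothetical and are not part of viadot's API.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Rows to load, reusing the Carbon Intensity example above.
ROWS = [
    ("2021-08-10T11:00Z", "2021-08-10T11:30Z", 211, 216, "moderate"),
]
# "from" is an SQL keyword, so the column names are adjusted here.
COLUMNS = ("from_ts", "to_ts", "forecast", "actual", "intensity_index")


def load_to_sql(conn: sqlite3.Connection, rows) -> int:
    """Hypothetical SQL-style load: create the table, then bulk insert all rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS intensity "
        "(from_ts TEXT, to_ts TEXT, forecast INTEGER, actual INTEGER, intensity_index TEXT)"
    )
    # executemany runs one prepared INSERT statement for every row.
    conn.executemany("INSERT INTO intensity VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM intensity").fetchone()[0]


def load_to_lake(lake_dir: Path, name: str, rows) -> Path:
    """Hypothetical lake-style load: serialize the rows and write them as one file."""
    path = lake_dir / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return path


conn = sqlite3.connect(":memory:")
print(load_to_sql(conn, ROWS))  # → 1
conn.close()

with tempfile.TemporaryDirectory() as tmp:
    written = load_to_lake(Path(tmp), "intensity", ROWS)
    print(written.read_text().splitlines()[0])
```

In a real deployment the SQL side would target a database server and the lake side an object store, but the shape of each operation stays the same.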


## Getting started
### Prerequisites
We assume that you have [Docker](https://www.docker.com/) installed.

### Installation
Clone the `2.0` branch, then set up and run the environment:
```sh
git clone https://github.com/dyvenia/viadot.git -b 2.0 && \
  cd viadot/docker && \
  sh update.sh && \
  sh run.sh && \
  cd ../
```

### Configuration
Before using a source, you must configure it with the required credentials. Credentials can be specified either in the viadot config file (by default, `$HOME/.config/viadot/config.yaml`) or passed directly to the source's `credentials` parameter.

You can find specific information about each source's credentials in [the documentation](https://dyvenia.github.io/viadot/references/sql_sources/).
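The two ways of supplying credentials can be sketched as a small lookup: credentials passed explicitly are used as-is, and otherwise they are read from the parsed config file. This is a hypothetical helper, not viadot's API. Both the config layout it expects (a top-level `sources` mapping keyed by source name, each entry holding a `credentials` mapping) and the precedence of explicit credentials over the config file are assumptions, so check the documentation linked above for the actual format.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Default location mentioned above; the layout of its contents is an assumption.
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "viadot" / "config.yaml"


def load_config(path: Path = DEFAULT_CONFIG_PATH) -> dict[str, Any]:
    """Parse the YAML config file (requires PyYAML, which viadot depends on)."""
    import yaml  # imported lazily so resolve_credentials works without PyYAML

    with path.open() as f:
        return yaml.safe_load(f) or {}


def resolve_credentials(
    source_name: str,
    credentials: dict[str, Any] | None = None,
    config: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical lookup: explicit credentials win, else fall back to the config."""
    if credentials is not None:
        return credentials
    # Assumed layout: {"sources": {"<source_name>": {"credentials": {...}}}}
    source_cfg = (config or {}).get("sources", {}).get(source_name)
    if source_cfg is None or "credentials" not in source_cfg:
        raise KeyError(f"No credentials configured for source {source_name!r}")
    return source_cfg["credentials"]


# Example with an in-memory config ("my_sql_source" is a hypothetical name).
example_config = {"sources": {"my_sql_source": {"credentials": {"user": "alice"}}}}
print(resolve_credentials("my_sql_source", config=example_config))
print(resolve_credentials("my_sql_source", credentials={"user": "bob"}, config=example_config))
```

In practice, `config` would come from `load_config()` rather than an inline dictionary.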
