Metadata-Version: 2.1
Name: viadot2
Version: 2.0a21
Summary: A simple data ingestion library to guide data flows from some places to other places.
Author-email: acivitillo <acivitillo@dyvenia.com>, trymzet <mzawadzki@dyvenia.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: azure-core ==1.25.0
Requires-Dist: azure-storage-blob ==12.13.1
Requires-Dist: awswrangler ==2.19.0
Requires-Dist: s3fs ==2022.11.0
Requires-Dist: boto3 ==1.24.59
Requires-Dist: pandas ==1.4.4
Requires-Dist: pyarrow ==10.0.1
Requires-Dist: pyodbc <4.1.0,>=4.0.34
Requires-Dist: openpyxl ==3.0.10
Requires-Dist: jupyterlab ==3.2.4
Requires-Dist: azure-identity ==1.7.1
Requires-Dist: matplotlib >=3.8.3
Requires-Dist: adlfs ==2022.9.1
Requires-Dist: Shapely ==1.8.0
Requires-Dist: imagehash ==4.2.1
Requires-Dist: visions ==0.7.5
Requires-Dist: sharepy <2.1.0,>=2.0.0
Requires-Dist: simple-salesforce ==1.11.5
Requires-Dist: sql-metadata ==2.3.0
Requires-Dist: duckdb ==0.5.1
Requires-Dist: sendgrid ==6.9.7
Requires-Dist: pandas-gbq ==0.19.1
Requires-Dist: pyyaml >=6.0.1
Requires-Dist: pydantic ==1.10.11
Requires-Dist: aiolimiter ==1.0.0
Requires-Dist: trino ==0.326.*
Requires-Dist: sqlalchemy ==2.0.*
Requires-Dist: minio <8.0,>=7.0
Requires-Dist: databricks-connect ==11.3.*

# Viadot
[![build status](https://github.com/dyvenia/viadot/actions/workflows/build.yml/badge.svg)](https://github.com/dyvenia/viadot/actions/workflows/build.yml)
[![formatting](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![codecov](https://codecov.io/gh/Trymzet/dyvenia/branch/main/graph/badge.svg?token=k40ALkXbNq)](https://codecov.io/gh/Trymzet/dyvenia)
---

**Documentation**: <a href="https://dyvenia.github.io/viadot/" target="_blank">https://dyvenia.github.io/viadot/</a>

**Source Code**: <a href="https://github.com/dyvenia/viadot" target="_blank">https://github.com/dyvenia/viadot</a>

---

A simple data ingestion library to guide data flows from some places to other places.

## Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. The UK Carbon Intensity public API is currently supported, and the examples below are based on it.

```python
from viadot.sources.uk_carbon_intensity import UKCarbonIntensity

# Query the current carbon intensity and load the response into a DataFrame.
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()

print(df)
```

**Output:**

|      | from              | to                | forecast | actual | index    |
| ---: | :---------------- | :---------------- | -------: | -----: | :------- |
|    0 | 2021-08-10T11:00Z | 2021-08-10T11:30Z |      211 |    216 | moderate |

The above `df` is a pandas `DataFrame` containing the data that `viadot` downloaded from the UK Carbon Intensity API.

## Loading Data to a Source
The upload method that `viadot` provides depends on the source: SQL sources use bulk inserts, while data lake sources use file uploads. For ready-made pipelines, including data validation steps using `dbt`, see [prefect-viadot](https://github.com/dyvenia/prefect-viadot).
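
To illustrate the difference between the two upload patterns, here is a minimal sketch that uses only the Python standard library. It does not use `viadot`'s own upload methods: an in-memory SQLite database stands in for a SQL source, a local temporary CSV file stands in for the file that would be uploaded to a data lake, and the table and column names are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# The sample row from the output table above; column names are hypothetical.
columns = ["from_ts", "to_ts", "forecast", "actual", "intensity_index"]
rows = [("2021-08-10T11:00Z", "2021-08-10T11:30Z", 211, 216, "moderate")]

# Pattern 1: bulk insert into a SQL table (in-memory SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE intensity "
    "(from_ts TEXT, to_ts TEXT, forecast INTEGER, actual INTEGER, intensity_index TEXT)"
)
conn.executemany("INSERT INTO intensity VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
inserted = conn.execute("SELECT COUNT(*) FROM intensity").fetchone()[0]

# Pattern 2: serialize the data to a file, which would then be uploaded as a whole.
out_path = Path(tempfile.mkdtemp()) / "intensity.csv"
with out_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)

print(f"Inserted {inserted} row(s); wrote {out_path.name}")
```

The bulk insert sends all rows to the database in one batched statement, whereas the file-based pattern produces a single artifact that is transferred in one piece.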


## Getting Started
### Prerequisites
We assume that you have [Docker](https://www.docker.com/) installed.

### Installation
Clone the `2.0` branch, and set up and run the environment:
  ```sh
  git clone https://github.com/dyvenia/viadot.git -b 2.0 && \
    cd viadot/docker && \
    sh update.sh && \
    sh run.sh && \
    cd ../
  ```

### Configuration
Before you can use a source, you must configure it with the required credentials. Credentials can either be specified in the viadot config file (by default, `$HOME/.config/viadot/config.yaml`) or passed directly to a source's `credentials` parameter.

You can find specific information about each source's credentials in [the documentation](https://dyvenia.github.io/viadot/references/sql_sources/).
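
The two ways of supplying credentials can be pictured with the following sketch. It is not `viadot`'s implementation: `resolve_credentials` is a hypothetical helper, the `api_key` field is a placeholder, and the rule that explicitly passed credentials take precedence over the config file is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional

# Default config location named in the text above.
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "viadot" / "config.yaml"


def resolve_credentials(
    explicit: Optional[dict], file_credentials: Optional[dict]
) -> dict:
    """Hypothetical helper: use explicit credentials if given, else the config file's."""
    if explicit is not None:
        return explicit
    if file_credentials is not None:
        return file_credentials
    raise ValueError(
        f"No credentials were passed and none were found in {DEFAULT_CONFIG_PATH}."
    )


# Placeholder standing in for credentials already parsed from the config file.
from_file = {"api_key": "placeholder-from-config"}

print(resolve_credentials({"api_key": "placeholder-explicit"}, from_file))
print(resolve_credentials(None, from_file))
```

Failing loudly when neither option is available makes a missing configuration visible before any request is sent to the source.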
