Metadata-Version: 2.1
Name: ascend-io-dbt-utils
Version: 0.11.3
Summary: Utilities for dbt and Ascend
Home-page: https://github.com/michaelhyatt/ascend_dbt_transform
Description-Content-Type: text/markdown

# Utility to convert a dbt project with compiled SQL models into Ascend transforms

## Overview
This utility allows you to convert a dbt project with compiled SQL models into Ascend transforms. It will use the SQL files generated with `dbt compile` in conjunction with `manifest.json` generated by `dbt compile` to create the transforms.

## Usage
Run `--help` to see the usage:
```
Usage: ascend_dbt_utils [OPTIONS] COMMAND [ARGS]...

  Collection of utilities to help convert dbt projects into Ascend dataflows.

Options:
  --help  Show this message and exit.

Commands:
  dbt-model   Operations to perform on an individual dbt model.
  dbt-models  Operations to perform on a collection of dbt models together.
  dbt-tests   Operations to perform on dbt tests.
```
### Prerequisites
Super-useful to run using venv:
```
python3 -m venv .venv
source venv/bin/activate
```
Clone the Github repo, you know how to do it. Install the required modules.
```
pip install -r requirements.txt
```
May require installing additional modules that are not mentioned in the `requirements.txt` file, depending on your environment. The following modules are optional if you want to use Ascend CLI or do dbt compilation locally:
```
pip install ascend-io-cli
pip install dbt-core
pip install dbt-snowflake
```
Place the dbt project in a subdirectory. Compile the project with `dbt compile`. Ensure the `target/compiled` directory is present. The `manifest.json` file should be present in the `target/compiled` directory, as well as the SQL files for each model.
#### `.env` file support
To save the amount of command line params typing, the utility supports the `.env` file to specify the parameters instead of specifying them on the command line. The `.env` file should be placed in the same directory where the `ascend_dbt_utils` utility is run from. The following environment variables are supported:
```
ASCEND_HOSTNAME=myhost.ascend.io
ASCEND_DATA_SERVICE=My_Snowflake_Service
ASCEMD_DATAFLOW=dbt_model_flow
DBT_MANIFEST_FILE=target/manifest.json
ASCEND_LANGUAGE=bigquery-sql # Or, "snowflake-sql" by default when omitted.
```
### Workflow
#### `ascend_dbt_utils` supports .env file instead of specifying the parameters on the command line
Example of the `.env` file:
```
ASCEND_HOSTNAME=myhost.ascend.io
ASCEND_DATA_SERVICE=My_Snowflake_Service
ASCEND_DATAFLOW=dbt_model_flow
DBT_MANIFEST_FILE=target/manifest.json
```
#### Use the `dbt-models show` command first to see the dependencies, seeds and models
The `dbt-models show` command will display the list of seeds and models defined in the dbt project. Seeds will have to be pre-created in Ascend dataflow to use them as transformer inputs. Models will be converted into transforms. The following output is an example of the `dbt-models show` command:
```
Node: seed.acme.subscription_periods
  -> Depends on: 
Node: model.acme.util_months
  -> Depends on: 
Node: model.acme.customer_revenue_by_month
  -> Depends on: seed.acme.subscription_periods
  -> Depends on: model.acme.util_months
Node: model.acme.customer_churn_month
  -> Depends on: model.acme.customer_revenue_by_month
Node: model.acme.mrr
  -> Depends on: model.acme.customer_revenue_by_month
  -> Depends on: model.acme.customer_churn_month
``` 
The output above requires us to have an Ascend dataflow with a connector or a data share connection called `subscription_periods` to allow the transforms connect to it. The transforms will be created in the dataflow with the following names:
```
util_months
customer_revenue_by_month
customer_churn_month
mrr
```
It is useful to check the following before creating the transforms:
1. The transforms will be created in the dataflow with the same names as the models. If there are any name conflicts, the transforms will not be created.
2. The utility doesn't interfere with the schema, if the transforms SQL relies on certain fields, they should be present in the seed connector and other transforms. Some manual cleanup may be required after the transforms are created to clean up errors.
3. dbt allows for models without inputs, but Ascend doesn't. Ensure that all the generated SQL files are referencing other models or seeds. If needed, add a dummy SQL statement connecting the model to a seed or another model.
4. The utility will determine the right order of model transforms creation based on `manifest.json`. If there are any dependencies missing, the transforms will not be created.
#### Use the `dbt-models validate` command to ensure all the required seeds are present in the flow
The `dbt-models validate` command will validate that all the seeds required by the transforms are present in the dataflow. The following output is an example of the `dbt-models validate` command:
```
Nodes present in the dataflow:
  raw_customers
  raw_orders
  raw_payments
Nodes absent in the dataflow:
```
#### Use the `dbt-models merge` command to create the transforms or merge the transforms into an existing flow
The `dbt-models merge` command will merge the transforms into the dataflow. The following output is an example of the `dbt-models merge` command:
```
I1021 12:52:07.756062 75551 applier.py:350] Apply Dataflow: (ds=Mikes_Snowflake_demos df=dbt_test)
I1021 12:52:08.019690 75551 applier.py:360] Update Dataflow: (ds=Mikes_Snowflake_demos df=dbt_test)
I1021 12:52:09.084092 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test util_months)
I1021 12:52:09.356551 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test util_months)
I1021 12:52:10.335402 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test customer_revenue_by_month)
I1021 12:52:10.617655 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test customer_revenue_by_month)
I1021 12:52:11.593106 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test customer_churn_month)
I1021 12:52:11.856432 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test customer_churn_month)
I1021 12:52:12.824820 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test mrr)
I1021 12:52:13.091333 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test mrr)
```
The `dbt-models merge` command requires the pointers to the `manifest.json` file that contains the required information. The transforms will be created in the dataflow with the same names as the models. If there are any name conflicts, the transforms will not be created. The `--default-seed` option allows you to specify the default seed to use for the transforms that are not connected to anything, but this requires their SQL files to be edited to reference the default seed as `XXX.YYY.subscription_periods`. If the transforms SQL relies on certain fields, they should be present in the seed connector and other transforms. Some manual cleanup may be required after the transforms are created to clean up errors.
#### Use the `dbt-models update-sql` command to only update the SQL statements of the existing transforms
The `dbt-models update-sql` command will only update the SQL statements of the existing transforms and leave the rest of the configuration as-is. This is useful to update in place the SQL of the existing transforms that were created from the dbt models.
#### If needed, use the `dbt-models delete` command
The `dbt-models delete` command will delete the transforms created from dbt models from the dataflow. Make sure to delete the deployed tests first with `dbt-tests delete`, otherwise the deletion of transforms will fail. The following output is an example of the `dbt-models delete` command:
```
Deleting node mrr
Deleting node customer_churn_month
Deleting node customer_revenue_by_month
Deleting node util_months
```
### Testing
To run dbt tests in Ascend you will need to first deploy the tests. Once deployed, the tests will appear as grouped transform components in the data flow. Once deployed, the tests will automatically run and will remain running continuously in the flow. Hence, the dbt tests can not only be used to validate the deployment but can also serve as continuously running data quality tests to validate the correct operation of your data flow.
The `dbt-tests deploy-tests` command will deploy the tests to the dataflow. The `dbt-tests delete-tests` command will remove the deployed tests. The `dbt-tests check-test-results` command will check the test results. The following output is an example of the `dbt-tests check-test-results` command:
```markdown
Checking test results based on the number of records returned by tests...
Test unique_customers_customer_id produced 0 records. Test result is 'Ok'.
Test not_null_customers_customer_id produced 0 records. Test result is 'Ok'.
Test unique_orders_order_id produced 0 records. Test result is 'Ok'.
Test not_null_orders_order_id produced 0 records. Test result is 'Ok'.
...

Checking test results based on data quality check results...
Test unique_customers_customer_id produced 'passed'
Test not_null_customers_customer_id produced 'passed'
Test unique_orders_order_id produced 'passed'
Test not_null_orders_order_id produced 'passed'
...
```
### Creating individual transforms
The `dbt-model create-component` command will create a single transform from the model. This command can also create the associated tests and group them separately per model. 
### Updating the SQL of individually created transforms
The `dbt-model update-sql` command will update the SQL of the transform created from the model.
