Metadata-Version: 2.1
Name: dbt-gp
Version: 1.8.0
Summary: The greenplum adapter plugin for dbt (data build tool)
Home-page: https://github.com/markporoshin/dbt-greenplum
Author: Vladimir Mozzhukhin
Author-email: vladimir.bm@yandex.ru
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: dbt-core~=1.8.0
Requires-Dist: dbt-postgres~=1.8.0
Requires-Dist: psycopg2-binary~=2.8

<p align="center">
  <img src="https://raw.githubusercontent.com/dbt-labs/dbt/ec7dee39f793aa4f7dd3dae37282cc87664813e4/etc/dbt-logo-full.svg" alt="dbt logo" width="500"/>
</p>
<p align="center">
  <a href="https://github.com/dbt-labs/dbt-redshift/actions/workflows/main.yml">
    <img src="https://github.com/dbt-labs/dbt-redshift/actions/workflows/main.yml/badge.svg?event=push" alt="Unit Tests Badge"/>
  </a>
  <a href="https://github.com/dbt-labs/dbt-redshift/actions/workflows/integration.yml">
    <img src="https://github.com/dbt-labs/dbt-redshift/actions/workflows/integration.yml/badge.svg?event=push" alt="Integration Tests Badge"/>
  </a>
</p>

**[dbt](https://www.getdbt.com/)** enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.

## dbt-greenplum

The `dbt-greenplum` package contains the code enabling dbt to work with Greenplum. This adapter based on [postgres-adapter](https://github.com/dbt-labs/dbt-core/blob/main/plugins/postgres/dbt/include/postgres/profile_template.yml) with a bit difference for a greenplum specific features

## Installation

Easiest way to start use dbt-greenplum is to install it using pip
`pip install dbt-greenplum==<version>`

Where `<version>` is same as your dbt version

Available versions:
 - 0.19.2
 - 1.0.4
 - 1.2.0
 - 1.4.0
 - 1.5.0
 - 1.8.0

## Supported Features

You can specify following settings:
 - Storage type
   - heap
   - appendoptimized
 - Distribution
   - `distributed randomly` by defaut
   - `distributed by (column, [ ... ] )` by setting up `distributed_by` parameter in the model config
   - `distributed replicated` by setting up `distributed_replicated=true` parameter in the model config
 - Table orientation
   - `orientation=colum` by default
   - `orientation=row` by setting up `orientation` parameter in `row` in the model config
 - Compress type, level and blocksize with default values
   ```bash
    compresstype=ZLIB,
    compresslevel=1,
    blocksize=32768
   ``` 
    You can also specify `blocksize`, `compresstype`, `compresslevel` in the model config
 - `appendoptimized` preference by default is `true`, also you can override it by setting up `appendoptimized` field in the model config
 - Partitions (see "Partition" chapter below)
 - Additional incremental strategy
   - `truncate-insert` by setting up `incremental_strategy="truncate+insert"` parameter in the model config or `+incremental_strategy: truncate_insert` in the dbt_project.yml

### Heap table example

To create heap table set `appendoptimized` parameter value to `false` 

```SQL
{{
   config(
      ...
      materialized='table',
      appendoptimized=false
      ...
   )
}}

select 1 as "id"
```

will produce following SQL code

```SQL
create  table "<db_name>"."<schema_name>"."<table_name>"
with (
   appendoptimized=false
) as (
   select 1 as "id"
)
DISTRIBUTED RANDOMLY;
```

### Appendoptimized table example
You can use `appendopimized` or `appendonly`(legacy) to create appendoptimized table

Such model definition

```SQL
{{
    config(
        materialized='table',
        distributed_by='id',
        appendoptimized=true,
        orientation='column',
        compresstype='ZLIB',
        compresslevel=1,
        blocksize=32768
    )
}}

with source_data as (

    select 1 as id
    union all
    select null as id

)

select *
from source_data
```

will produce following sql code

```SQL
create  table "dvault"."dv"."my_first_dbt_model__dbt_tmp"
with (
    appendoptimized=true,
    blocksize=32768,
    orientation=column,
    compresstype=ZLIB,
    compresslevel=1
)
as (
  with source_data as (
      select 1 as id
      union all
      select null as id
    )
  select *
  from source_data
)  
distributed by (id);

  
alter table "dvault"."dv"."my_first_dbt_model__dbt_tmp" rename to "my_first_dbt_model";
```

### Partitions

Greenplum does not support partitions with `create table as` [construction](https://gpdb.docs.pivotal.io/6-9/ref_guide/sql_commands/CREATE_TABLE_AS.html), so you need to build model in two steps
 - create table schema
 - insert data

To implement partitions into you dbt-model you need to specify on of the following config parameters:
 - `fields_string` - definition of columns name, type and constraints
 - one of following way to configure partitions
   - `raw_partition` by default
   - `partition_type`, `partition_column`, `partition_spec`
   - `partition_type`, `partition_column`, `partition_start`, `partition_end`, `partition_every`
   - `partition_type`, `partition_column`, `partition_values`
 - `default_partition_name` - name of default partition 'other' by default

Let consider examples of definition model with partitions

 - using `raw_partition` parameter
   ```SQL
   {% set fields_string %}
        id int4 null,
        incomingdate timestamp NULL
   {% endset %}


   {% set raw_partition %}
       PARTITION BY RANGE (incomingdate)
       (
           START ('2021-01-01'::timestamp) INCLUSIVE
           END ('2023-01-01'::timestamp) EXCLUSIVE
           EVERY (INTERVAL '1 day'),
           DEFAULT PARTITION extra
       );
   {% endset %}

   {{
       config(
           materialized='table',
           distributed_by='id',
           appendoptimized=true,
           orientation='column',
           compresstype='ZLIB',
           compresslevel=1,
           blocksize=32768,
           fields_string=fields_string,
           raw_partition=raw_partition,
           default_partition_name='other_data'
       )
   }}
   
   with source_data as (
   
       select
           1 as id,
           '2022-02-22'::timestamp as incomingdate
       union all
       select
           null as id,
           '2022-02-25'::timestamp as incomingdate
   )
   select *
   from source_data
   ```
   will produce following sql code
   ```SQL
   create table if not exists "database"."schema"."my_first_dbt_model__dbt_tmp" (
       id int4 null,
       incomingdate timestamp NULL
   )
   with (
       appendoptimized=true,
       blocksize=32768,
       orientation=column,
       compresstype=ZLIB,
       compresslevel=1
   )
   DISTRIBUTED BY (id)
   PARTITION BY RANGE (incomingdate)
   (
       START ('2021-01-01'::timestamp) INCLUSIVE
       END ('2023-01-01'::timestamp) EXCLUSIVE
       EVERY (INTERVAL '1 day'),
       DEFAULT PARTITION extra
   );
   
   insert into "database"."schema"."my_first_dbt_model__dbt_tmp" (
       with source_data as (
   
           select
               1 as id,
               '2022-02-22'::timestamp as incomingdate
           union all
           select
               null as id,
               '2022-02-25'::timestamp as incomingdate
       )
       select *
       from source_data
   );
   alter table "dvault"."dv"."my_first_dbt_model" rename to "my_first_dbt_model__dbt_backup";
   drop table if exists "dvault"."dv"."my_first_dbt_model__dbt_backup" cascade;
   alter table "database"."schema"."my_first_dbt_model__dbt_tmp" rename to "my_first_dbt_model";
   ```
 - Same result you can get using `partition_type`, `partition_column`, `partition_spec` parameters
   ```SQL
   {% set fields_string %}
       id int4 null,
       incomingdate timestamp NULL
   {% endset %}

   {%- set partition_type = 'RANGE' -%}
   {%- set partition_column = 'incomingdate' -%}
   {% set partition_spec %}
       START ('2021-01-01'::timestamp) INCLUSIVE
       END ('2023-01-01'::timestamp) EXCLUSIVE
       EVERY (INTERVAL '1 day'),
       DEFAULT PARTITION extra
   {% endset %}
   
   {{
       config(
           materialized='table',
           distributed_by='id',
           appendoptimized=true,
           orientation='column',
           compresstype='ZLIB',
           compresslevel=1,
           blocksize=32768,
           fields_string=fields_string,
           partition_type=partition_type,
           partition_column=partition_column,
           partition_spec=partition_spec,
           default_partition_name='other_data'
       )
   }}
   
   with source_data as (
   
       select
           1 as id,
           '2022-02-22'::timestamp as incomingdate
       union all
       select
           null as id,
           '2022-02-25'::timestamp as incomingdate
   )
   select *
   from source_data
   ```
 - also, you can use third way
   ```SQL
   {% set fields_string %}
       id int4 null,
       incomingdate timestamp NULL
   {% endset %}
   
   
   {%- set partition_type = 'RANGE' -%}
   {%- set partition_column = 'incomingdate' -%}
   {%- set partition_start = "'2021-01-01'::timestamp" -%}
   {%- set partition_end = "'2022-01-01'::timestamp" -%}
   {%- set partition_every = '1 day' -%}
   
   
   {{
       config(
           materialized='table',
           distributed_by='id',
           appendoptimized=true,
           orientation='column',
           compresstype='ZLIB',
           compresslevel=1,
           blocksize=32768,
           fields_string=fields_string,
           partition_type=partition_type,
           partition_column=partition_column,
           partition_start=partition_start,
           partition_end=partition_end,
           partition_every=partition_every,
           default_partition_name='other_data'
       )
   }}
   
   with source_data as (
   
       select
           1 as id,
           '2022-02-22'::timestamp as incomingdate
       union all
       select
           null as id,
           '2022-02-25'::timestamp as incomingdate
   )
   select *
   from source_data
   ```
 - example of partition_type `LIST` is coming soon

#### Table partition hints

Too check generate sql script use `-d` option:
`dbt -d run <...> -m <models>`

If you want implement complex partition logic with subpartition or something else use `raw_partition` parameter

### `Truncate+insert` incremental strategy

You can use this incremental strategy to safely reload models without cascade dropping dependent objects due to Greenplum's transactional `TRUNCATE` operation realization  

Model definition example:
```SQL
{{
   config(
      ...
      materialized='incremental',
      incremental_strategy='truncate+insert',
      ...
   )
}}

select *
from source_data
```

## Getting started

- [Install dbt](https://docs.getdbt.com/docs/installation)
- Read the [introduction](https://docs.getdbt.com/docs/introduction/) and [viewpoint](https://docs.getdbt.com/docs/about/viewpoint/)

## Join the dbt Community

- Be part of the conversation in the [dbt Community Slack](http://community.getdbt.com/)
- Read more on the [dbt Community Discourse](https://discourse.getdbt.com)

## Reporting bugs and contributing code

- Want to report a bug or request a feature? Let us know on [Slack](http://community.getdbt.com/), or open [an issue](https://github.com/markporoshin/dbt-greenplum/issues/new)
- Want to help us build dbt? Check out the [Contributing Guide](https://github.com/dbt-labs/dbt/blob/HEAD/CONTRIBUTING.md)

## Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the [dbt Code of Conduct](https://community.getdbt.com/code-of-conduct).
