Metadata-Version: 2.1
Name: recce
Version: 0.2.0
Summary: Environment diff tool for dbt
Home-page: https://github.com/InfuseAI/recce
Author: InfuseAI Dev Team
Author-email: dev@infuseai.io
Project-URL: Bug Tracker, https://github.com/InfuseAI/recce/issues
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ruamel.yaml <0.18.0
Requires-Dist: click >=7.1
Requires-Dist: dbt-core >=1.5
Requires-Dist: pandas
Requires-Dist: portalocker
Requires-Dist: fastapi
Requires-Dist: itsdangerous
Requires-Dist: uvicorn
Requires-Dist: pydantic
Requires-Dist: jinja2
Requires-Dist: requests >=2.28.1
Requires-Dist: rich >=12.0.0
Requires-Dist: sentry-sdk
Requires-Dist: watchdog
Requires-Dist: websockets
Provides-Extra: dev
Requires-Dist: pytest >=4.6 ; extra == 'dev'
Requires-Dist: pytest-flake8 ; extra == 'dev'
Requires-Dist: flake8 ==3.9.2 ; extra == 'dev'
Requires-Dist: pytest-mypy ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: tox ; extra == 'dev'

# Recce

[![install](https://img.shields.io/badge/pip_install-recce-006DAD?style=flat-square)](https://pypi.org/project/recce/)
[![pypi](https://img.shields.io/pypi/v/recce?style=flat-square)](https://pypi.org/project/recce/)
[![Python](https://img.shields.io/pypi/pyversions/recce?style=flat-square)](https://pypi.org/project/recce/)
[![downloads](https://img.shields.io/pypi/dw/recce?style=flat-square)](https://pypi.org/project/recce/#files)
[![license](https://img.shields.io/github/license/DataRecce/recce?style=flat-square)](https://github.com/DataRecce/recce/blob/main/LICENSE)

[![InfuseAI Discord Invite](https://img.shields.io/discord/664381609771925514?color=%237289DA&label=chat&logo=discord&logoColor=white&style=flat-square)](https://discord.com/invite/5zb2aK9KBV)

`Recce` is a PR review tool for dbt projects. It compares the results of two environments, such as development and production, and helps you identify the differences between them.

## Features

- **[Lineage diff](#lineage-diff)**
- **[Schema diff](#schema-diff)**
- **[Row count diff](#row-count-diff)**
- **[Ad-Hoc Query and Query Diff](#ad-hoc-query-and-query-diff)**
- **[Profile Diff](#profile-diff)**
- **[Value Diff](#value-diff)**
- **[Checklist](#checklist)**

## Use cases

Recce is primarily designed for PR review, but it is also useful in the following cases:

1. During development, you can verify new results by comparing them with production before pushing your changes.
2. While reviewing a PR, you can understand the scope of the changes and their impact before merging.
3. When troubleshooting, you can run ad-hoc diff queries to pinpoint root causes.

# Usage

## Prerequisites

You need at least two [environments](https://docs.getdbt.com/docs/core/dbt-core-environments) in your dbt project, for example one for development and one for production. One way to set this up is to define two targets with separate [schemas](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles#understanding-target-schemas) in your dbt profile. Here is an example `profiles.yml`:

```yaml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: jaffle_shop.duckdb
      schema: dev
    prod:
      type: duckdb
      path: jaffle_shop.duckdb
      schema: main
```

## Getting Started

[5-minute walkthrough with the jaffle shop example](./docs/get-started-jaffle-shop.md)

1. Installation

   ```shell
   pip install recce
   ```

1. Go to your dbt project.
   ```shell
   cd your-dbt-project/
   ```
1. **Prepare base artifacts**: dbt generates [artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts) on every invocation. You can find these files in the `target/` folder.

   | Artifact                | dbt command                |
   | ----------------------- | -------------------------- |
   | manifest.json           | `dbt run`, `dbt build`, .. |
   | catalog.json (optional) | `dbt docs generate`        |

   Copy the artifacts of the base environment to the `target-base/` folder. `catalog.json` is optional, but schema diff and value diff require it.

1. Run the recce server.

   ```shell
   recce server
   ```

   Recce compares the environment in `target/` with the base environment in `target-base/`.
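
The copy step for the base artifacts can be scripted. The following is a minimal sketch, assuming `target/` currently holds artifacts generated against the base environment (for example, after `dbt run --target prod` and `dbt docs generate --target prod` with the profile above). The helper name `save_base_artifacts` is hypothetical and not part of recce.

```python
import shutil
from pathlib import Path

# manifest.json is required; catalog.json is optional (schema diff and value diff need it).
REQUIRED = ["manifest.json"]
OPTIONAL = ["catalog.json"]


def save_base_artifacts(project_dir: str = ".") -> list:
    """Hypothetical helper: copy artifacts from <project>/target to <project>/target-base.

    Assumes target/ was produced from the base environment. Returns the copied file names.
    """
    src = Path(project_dir) / "target"
    dst = Path(project_dir) / "target-base"
    # Fail early if a required artifact has not been generated yet.
    missing = [name for name in REQUIRED if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src}: {missing}; run dbt first")
    dst.mkdir(exist_ok=True)
    copied = []
    for name in REQUIRED + OPTIONAL:
        if (src / name).is_file():  # optional artifacts are skipped when absent
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied
```

Call `save_base_artifacts()` from the dbt project root, then switch back to your development target and rebuild `target/` before starting `recce server`.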

## Lineage Diff

![](./docs/assets/lineage_diff.png)

## Ad-Hoc Query and Query Diff

![](./docs/assets/query_diff.png)

You can use dbt Jinja functions such as `ref`, as well as any macros installed in your project.

```jinja
select * from {{ ref("mymodel") }}
```

Query diff compares the results from the two environments in the browser, so all result rows from both environments are pulled to the client side. Keep query results small (for example, select only the columns you need and filter the rows) so that query diff stays fast.
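
To see why result size matters, here is an illustrative sketch of a client-side comparison of two result sets. It is not recce's actual implementation; `diff_results` is a hypothetical name, and rows are assumed to be tuples of hashable column values. Both result sets are loaded fully into memory and treated as multisets, so the cost grows with the number of rows returned.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def diff_results(base_rows: Iterable[tuple], current_rows: Iterable[tuple]) -> Tuple[List[tuple], List[tuple]]:
    """Hypothetical helper: compare two full query results as multisets of rows.

    Returns (rows only in base, rows only in current). A changed row shows up
    in both lists: its old version in the first, its new version in the second.
    """
    base = Counter(base_rows)        # every base row is held in memory
    current = Counter(current_rows)  # every current row is held in memory
    only_in_base = base - current     # keeps counts > 0 only, so duplicates are handled
    only_in_current = current - base
    return list(only_in_base.elements()), list(only_in_current.elements())


removed, added = diff_results(
    [(1, "a"), (2, "b"), (3, "c")],
    [(1, "a"), (2, "B"), (4, "d")],
)
# removed == [(2, "b"), (3, "c")], added == [(2, "B"), (4, "d")]
```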

> [!TIP]
> Hotkeys:
> - Cmd + Enter: Run query
> - Cmd + Shift + Enter: Run query diff

## Schema diff

![](./docs/assets/schema_diff.png)

To use schema diff, make sure that both environments have a `catalog.json` (generated by `dbt docs generate`).
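
`catalog.json` is what makes schema diff possible: for each node it records the warehouse column names and types. The sketch below illustrates the idea by comparing one model's columns across two catalogs. It is an assumption-laden illustration, not recce's code: `load_columns` and `schema_diff` are hypothetical names, and it relies on the catalog layout `nodes -> <unique_id> -> columns -> <name> -> type`.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_columns(catalog_path: str, unique_id: str) -> Dict[str, str]:
    """Hypothetical helper: return {column name: data type} for one node in a catalog.json."""
    catalog = json.loads(Path(catalog_path).read_text())
    node = catalog.get("nodes", {}).get(unique_id, {})  # unknown node -> no columns
    return {name: col["type"] for name, col in node.get("columns", {}).items()}


def schema_diff(base_cols: Dict[str, str], current_cols: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical helper: classify columns as added, removed, or type-changed."""
    common = set(base_cols) & set(current_cols)
    return {
        "added": sorted(set(current_cols) - set(base_cols)),
        "removed": sorted(set(base_cols) - set(current_cols)),
        "type_changed": sorted(c for c in common if base_cols[c] != current_cols[c]),
    }
```

For example, `schema_diff(load_columns("target-base/catalog.json", "model.jaffle_shop.customers"), load_columns("target/catalog.json", "model.jaffle_shop.customers"))` would compare the (hypothetical) `customers` model between the base and current environments.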

## Row count diff

![](./docs/assets/row_count_diff.png)

## Profile diff

![](./docs/assets/profile_diff.png)

Profile diff uses the `get_profile` macro from [dbt_profiler](https://hub.getdbt.com/data-mie/dbt_profiler/latest/). Make sure that this package is installed in your project, for example by adding it to `packages.yml`:

```yaml
packages:
  - package: data-mie/dbt_profiler
    version: <version>
```

Refer to the [dbt_profiler documentation](https://hub.getdbt.com/data-mie/dbt_profiler/latest/) for the definition of each profiling statistic.

## Value diff

![](./docs/assets/value_diff.png)

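1. **Added**: PKs that exist only in the current environment.
2. **Removed**: PKs that exist only in the base environment.
3. **Matched**: For each column, the number of common PKs whose values match in both environments.
4. **Matched %**: For each column, the ratio of matched values to common PKs.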

_PK: Primary key_
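
The four metrics above can be illustrated with a small in-memory sketch. Recce itself computes value diff in the warehouse through audit_helper, so this is only a restatement of the definitions, not its implementation. `value_diff` is a hypothetical name, each environment is assumed to be a dict mapping PK to a row dict, and treating two `None` (NULL) values as a match is an assumption.

```python
from typing import Any, Dict, List

Rows = Dict[Any, Dict[str, Any]]  # primary key -> {column name: value}


def value_diff(base: Rows, current: Rows, columns: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: compute Added, Removed, Matched, and Matched % as defined above."""
    base_pks, current_pks = set(base), set(current)
    common = base_pks & current_pks
    result: Dict[str, Any] = {
        "added": len(current_pks - base_pks),    # PKs only in current
        "removed": len(base_pks - current_pks),  # PKs only in base
        "columns": {},
    }
    for col in columns:
        # Assumption: None == None counts as a match (SQL NULL semantics may differ).
        matched = sum(1 for pk in common if base[pk].get(col) == current[pk].get(col))
        pct = 100.0 * matched / len(common) if common else None  # undefined with no common PKs
        result["columns"][col] = {"matched": matched, "matched_pct": pct}
    return result


base = {1: {"status": "paid"}, 2: {"status": "open"}, 3: {"status": "paid"}}
current = {1: {"status": "paid"}, 2: {"status": "paid"}, 4: {"status": "open"}}
# value_diff(base, current, ["status"]) reports added=1, removed=1, and for "status"
# matched=1 out of 2 common PKs, i.e. matched_pct=50.0
```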

Value diff uses the `compare_column_values` macro from [audit_helper](https://hub.getdbt.com/dbt-labs/audit_helper/latest/). Make sure that this package is installed in your project, for example by adding it to `packages.yml`:

```yaml
packages:
  - package: dbt-labs/audit_helper
    version: <version>
```

Value diff requires you to select a column as the primary key. `catalog.json` is required to list the available columns.

## Checklist

![](./docs/assets/check.png)

When a query is worth recording, add it to the checklist, give it a title, and leave a description. This makes it easy to post the results in PR comments later.

# Q&A

### Q: How does `recce` connect to my data warehouse? Does it support my data warehouse?

`recce` uses the [dbt adapter](https://docs.getdbt.com/docs/connect-adapters) to connect to your warehouse, so it should work with any data warehouse that your dbt project can connect to.

### Q: Which credentials does `recce` use to connect to the two environments?

Recce uses a single target from your profile to connect to the warehouse for both environments. For example, if you use the default target `dev`, its credentials are used to query both environments, so make sure those credentials can access both of them.
