Metadata-Version: 2.1
Name: reladiff
Version: 0.4.0
Summary: Command-line tool and Python library to efficiently diff rows across two different databases.
Home-page: https://github.com/erezsh/reladiff
License: MIT
Author: Erez Shinan
Author-email: erezshin@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Typing :: Typed
Provides-Extra: clickhouse
Provides-Extra: duckdb
Provides-Extra: mysql
Provides-Extra: oracle
Provides-Extra: postgresql
Provides-Extra: preql
Provides-Extra: presto
Provides-Extra: snowflake
Provides-Extra: trino
Provides-Extra: vertica
Requires-Dist: click (>=8.1,<9.0)
Requires-Dist: clickhouse-driver; extra == "clickhouse"
Requires-Dist: cryptography; extra == "snowflake"
Requires-Dist: dsnparse
Requires-Dist: duckdb (>=0.6.0,<0.7.0); extra == "duckdb"
Requires-Dist: mysql-connector-python (==8.0.29); extra == "mysql"
Requires-Dist: presto-python-client; extra == "presto"
Requires-Dist: psycopg2; extra == "postgresql"
Requires-Dist: rich
Requires-Dist: runtype (>=0.4.2,<0.5.0)
Requires-Dist: snowflake-connector-python (>=2.7.2,<3.0.0); extra == "snowflake"
Requires-Dist: sqeleton (==0.1.0)
Requires-Dist: toml (>=0.10.2,<0.11.0)
Requires-Dist: trino (>=0.314.0,<0.315.0); extra == "trino"
Project-URL: Documentation, https://reladiff.readthedocs.io/en/latest/
Project-URL: Repository, https://github.com/erezsh/reladiff
Description-Content-Type: text/markdown

# **reladiff**

## What is `reladiff`?
reladiff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable, even at massive scale.

## Documentation

[**🗎 Documentation website**](https://reladiff.readthedocs.io/en/latest/) - our detailed documentation has everything you need to start diffing.

### Databases we support

- PostgreSQL >=10
- MySQL
- Snowflake
- BigQuery
- Redshift
- Oracle
- Presto
- Databricks
- Trino
- ClickHouse
- Vertica
- DuckDB >=0.6
- SQLite (coming soon)

For their corresponding connection strings, check out our [detailed table](TODO).

#### Looking for a database not on the list?
If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/erezsh/reladiff/issues) to discuss it, or vote on existing requests to push them up our todo list.

## Use cases

### Diff Tables Between Databases
#### Quickly identify issues when moving data between databases

<p align="center">
  <img alt="diff2" src="https://user-images.githubusercontent.com/1799931/196754998-a88c0a52-8751-443d-b052-26c03d99d9e5.png" />
</p>

### Diff Tables Within a Database
#### Improve code reviews by identifying data problems you don't have tests for
<p align="center">
  <a href="https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2" target="_blank">
    <img alt="Intro to Diff" src="https://user-images.githubusercontent.com/1799931/196576582-d3535395-12ef-40fd-bbbb-e205ccae1159.png" width="50%" height="50%" />
  </a>
</p>

&nbsp;
&nbsp;

## Get started

### Installation

#### First, install `reladiff` using `pip`.

```
pip install reladiff
```

#### Then, install one or more driver(s) specific to the database(s) you want to connect to.

- `pip install 'reladiff[mysql]'`

- `pip install 'reladiff[postgresql]'`

- `pip install 'reladiff[snowflake]'`

- `pip install 'reladiff[presto]'`

- `pip install 'reladiff[oracle]'`

- `pip install 'reladiff[trino]'`

- `pip install 'reladiff[clickhouse]'`

- `pip install 'reladiff[vertica]'`

- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/

_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._

### Run your first diff

Once you've installed `reladiff`, you can run it from the command line.

```
reladiff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
```

Be sure to read [the docs](TODO) for detailed instructions on how to build one of these commands for your database setup.

#### Code Example: Diff Tables Between Databases
Here's an example command to copy and paste, taken from the screenshot above, where we diffed data between Postgres and Snowflake.

```
reladiff \
  postgresql://<username>:'<password>'@localhost:5432/<database> \
  <table> \
  "snowflake://<username>:<password>@<ACCOUNT>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
  <TABLE> \
  -k activity_id \
  -c activity \
  -w "event_timestamp < '2022-10-10'"
```
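
If you launch diffs from a script, the same command can be assembled as an argument list. The sketch below is only an illustration: `build_reladiff_command` is a hypothetical helper (not part of `reladiff`), the connection strings and table names are made-up sample values, and it covers only the two-database form with the `-k`, `-c`, and `-w` flags shown above.

```python
import shlex
from typing import List, Optional, Sequence


def build_reladiff_command(
    db1_uri: str,
    table1: str,
    db2_uri: str,
    table2: str,
    key_column: str,
    extra_columns: Sequence[str] = (),
    where: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build argv shaped like
    reladiff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]."""
    argv = ["reladiff", db1_uri, table1, db2_uri, table2, "-k", key_column]
    for column in extra_columns:
        argv += ["-c", column]  # one -c flag per extra column to compare
    if where is not None:
        argv += ["-w", where]  # filter expression passed as a single argument
    return argv


# Sample values only; replace them with your own connection details.
command = build_reladiff_command(
    "postgresql://user:secret@localhost:5432/analytics",
    "activity_log",
    "snowflake://user:secret@my_account/ANALYTICS/PUBLIC?warehouse=WH&role=ANALYST",
    "ACTIVITY_LOG",
    key_column="activity_id",
    extra_columns=["activity"],
    where="event_timestamp < '2022-10-10'",
)

# shlex.join quotes each argument so the printed line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it without going through the shell, so the `-w` expression needs no extra quoting.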

#### Code Example: Diff Tables Within a Database

Here's a code example from [the video](https://www.loom.com/share/682e4b7d74e84eb4824b983311f0a3b2), where we compare data between two Snowflake tables within one database.

```
reladiff \
  "snowflake://<username>:<password>@<ACCOUNT>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \
  <SCHEMA_2>.<TABLE_2> \
  -k org_id \
  -c created_at -c is_internal \
  -w "org_id != 1 and org_id < 2000" \
  -m test_results_%t \
  --materialize-all-rows \
  --table-write-limit 10000
```

In both code examples, I've used angle brackets (`<>`) to mark values that **should be replaced with your own** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like.
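
When filling in the placeholders, credentials that contain characters such as `@`, `:` or `/` can be mistaken for URI delimiters. Here is a minimal sketch of one way to build a Snowflake-style connection string with percent-encoded credentials. It rests on several assumptions: `snowflake_uri` is a hypothetical helper, the segment after `@` is taken to be the Snowflake account identifier, the sample values are invented, and the URI parser is assumed to decode percent-escapes as standard URL parsers do.

```python
from urllib.parse import quote, urlencode


def snowflake_uri(user, password, account, database, schema, warehouse, role):
    """Hypothetical helper: fill the placeholders of a Snowflake-style URI."""
    # Percent-encode credentials so '@', ':' or '/' are not read as delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # Assumption: warehouse and role are passed as query parameters, as in the examples.
    query = urlencode({"warehouse": warehouse, "role": role})
    return f"snowflake://{credentials}@{account}/{database}/{schema}?{query}"


# Sample values only.
uri = snowflake_uri(
    "alice", "p@ss/word", "my_account", "ANALYTICS", "PUBLIC", "COMPUTE_WH", "ANALYST"
)
print(uri)
# snowflake://alice:p%40ss%2Fword@my_account/ANALYTICS/PUBLIC?warehouse=COMPUTE_WH&role=ANALYST
```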

### We're here to help!

We know that in some cases, the reladiff command can become long and dense. And maybe you're new to the command line.

* We're here to help [on Slack](https://locallyoptimistic.slack.com/archives/C03HUNGQV0S) if you have ANY questions as you use `reladiff` in your workflow.
* You can also post a question in [GitHub Discussions](https://github.com/erezsh/reladiff/discussions).


To get a Slack invite, [click here](https://locallyoptimistic.com/community/).

## How to Use

* [How to use from the shell (command line)](TODO)
* [How to use from Python](TODO)
* [How to use with TOML configuration file](TODO)
* [Usage Analytics & Data Privacy](TODO)

## How to Contribute
* Feel free to open an issue or contribute to the project by working on an existing issue.
* Please read the [contributing guidelines](https://github.com/erezsh/reladiff/blob/master/CONTRIBUTING.md) to get started.

Big thanks to everyone who has contributed so far:

<a href="https://github.com/erezsh/reladiff/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=erezsh/reladiff" />
</a>

## Technical Explanation

Check out this [technical explanation](TODO) of how reladiff works.

## License

This project is licensed under the terms of the [MIT License](https://github.com/erezsh/reladiff/blob/master/LICENSE).

