Metadata-Version: 2.4
Name: snowflake-data-exchange-agent
Version: 1.5.0
Summary: Data exchange agent for migrations and validation
Project-URL: Bug Tracker, https://github.com/snowflake-eng/migrations-data-validation/issues
Project-URL: Source code, https://github.com/snowflake-eng/migrations-data-validation/
Project-URL: homepage, https://www.snowflake.com/
Author-email: "Snowflake, Inc." <snowflake-python-libraries-dl@snowflake.com>
License: Snowflake Conversion Software Terms
Keywords: Snowflake,analytics,cloud,data,data-analysis,data-analytics,data-engineering,data-management,data-processing,data-science,data-visualization,data-warehouse,database
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: Other Environment
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: SQL
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: azure-identity>=1.25.0
Requires-Dist: azure-storage-blob>=12.26.0
Requires-Dist: boto3>=1.40.41
Requires-Dist: dependency-injector>=4.48.2
Requires-Dist: flask>=3.1.2
Requires-Dist: psutil>=7.1.0
Requires-Dist: pyarrow>=22.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyodbc>=5.0.0
Requires-Dist: requests>=2.32.5
Requires-Dist: snowflake-connector-python>=4.0.0
Requires-Dist: snowflake-data-validation
Requires-Dist: sqlparse==0.5.4
Requires-Dist: toml==0.10.2
Requires-Dist: urllib3>=2.6.3
Requires-Dist: waitress>=3.0.2
Provides-Extra: all
Requires-Dist: parameterized>=0.9.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest-mock>=3.10.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: ty>=0.0.1a5; extra == 'all'
Provides-Extra: development
Requires-Dist: parameterized>=0.9.0; extra == 'development'
Requires-Dist: pytest-cov>=4.0.0; extra == 'development'
Requires-Dist: pytest-mock>=3.10.0; extra == 'development'
Requires-Dist: pytest>=7.0.0; extra == 'development'
Requires-Dist: ruff>=0.1.0; extra == 'development'
Requires-Dist: ty>=0.0.1a5; extra == 'development'
Description-Content-Type: text/markdown

# Snowflake Data Exchange Agent

[![Python](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/downloads/)

The Data Exchange Agent is the **Worker** component of the Cloud Data Migration solution. It connects to source databases (SQL Server, Amazon Redshift, Teradata, PostgreSQL), extracts data, and uploads it to Snowflake stages for ingestion by the **Data Migration Orchestrator** (`snowflake-data-migration-orchestrator`).

## Installation

```bash
pip install snowflake-data-exchange-agent
```

**Python Version**: 3.11 or higher

## Usage

```bash
# Start with a configuration file
data-exchange-agent -c <configuration-file-path>

# Start with default configuration.toml in current directory
data-exchange-agent

# Custom port and parallelism
data-exchange-agent --max-parallel-tasks 8 --port 8080

# Debug mode
data-exchange-agent --debug --port 5001
```

## Worker Configuration

The Worker configuration file uses [TOML](https://toml.io/) format.

| Section | Property | Type | Description |
|---------|----------|------|-------------|
| **Top Level** | `selected_task_source` | String | Must currently be set to `"snowflake_stored_procedure"`. |
| `[application]` | `max_parallel_tasks` | Integer | Maximum number of tasks the worker will process in parallel (using threads). |
| `[application]` | `task_fetch_interval` | Integer | Interval (in seconds) between attempts to fetch new tasks from the Orchestrator. |
| `[application]` | `snowflake_database_for_metadata` | String | Optional. Database where the orchestrator deployed the task queue (default `SNOWCONVERT_AI`). Must match the orchestrator’s `CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA` if you override it there. |
| `[application]` | `snowflake_schema_for_data_migration_metadata` | String | Optional. Schema for `PULL_TASKS` / `COMPLETE_TASK` / `FAIL_TASK` (default `DATA_MIGRATION`). Must match the orchestrator’s `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA` if overridden. |
| `[connections.source.*]` | | Object | Configuration for source system connections. The Worker typically requires an ODBC driver. See examples below. |
| `[connections.target.snowflake_connection_name]` | `connection_name` | String | The name of the connection entry in the `~/.snowflake/config.toml` file to use. |

When `selected_task_source` is `snowflake_stored_procedure`, the worker issues `CALL` statements against the task queue using the database and schema from `application.snowflake_database_for_metadata` and `application.snowflake_schema_for_data_migration_metadata`. These settings are independent of the session defaults in the Snowflake connection profile (`SNOWFLAKE_DATABASE`, `SNOWFLAKE_SCHEMA`).

**Example: SQL Server (Standard Authentication)**

```toml
[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433
```

**Example: Amazon Redshift (IAM Authentication)**

```toml
[connections.source.redshift]
username = "demo-user"
database = "demo_db"
auth_method = "iam-provisioned-cluster"
cluster_id = "my-aws-cluster"
region = "us-west-2"
access_key_id = "your-access-key-id"
secret_access_key = "your-secret-access-key"
```

**Example: Amazon Redshift (Standard Authentication)**

```toml
[connections.source.redshift]
username = "myuser"
password = "mypassword"
database = "mydatabase"
host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
port = 5439
auth_method = "standard"
```

**Example: Teradata (ODBC)**

Set `driver_name` to the exact name returned by your environment’s ODBC driver list (for example from `pyodbc.drivers()`). The default port is `1025`. Use `dbc_name` when your Teradata COP / TDPID alias differs from `host`.

```toml
[connections.source.teradata]
driver_name = "Teradata Database ODBC Driver 17.20"
host = "your-teradata-host.example.com"
port = 1025
database = "tpcds"
username = "your_username"
password = "your_password"
# dbc_name = "TDPID_ALIAS"  # optional; defaults to host
```
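
To find the exact string for `driver_name`, you can list the installed ODBC drivers and keep the Teradata entries. This is a small sketch: `pyodbc.drivers()` is the real pyodbc call mentioned above, while the helper `teradata_driver_names` is a hypothetical name introduced only for this example.

```python
def teradata_driver_names(installed: list[str]) -> list[str]:
    """Keep only the driver names that mention Teradata (case-insensitive)."""
    return [name for name in installed if "teradata" in name.lower()]


if __name__ == "__main__":
    try:
        import pyodbc  # declared as a dependency of the agent
    except ImportError:
        print("pyodbc is not available in this environment")
    else:
        # Each printed line is a candidate value for driver_name.
        for name in teradata_driver_names(pyodbc.drivers()):
            print(name)
```

Copy the printed name into `driver_name` exactly as shown, including spaces and version numbers.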

> **Note:** Only one source connection is needed. The Snowflake target connection should point to a valid entry in your `~/.snowflake/config.toml`.

### ODBC Driver Auto-Detection

The agent automatically detects the best available ODBC driver for SQL Server connections. If no `odbc_driver` is specified in the configuration, it prefers the newest installed driver (ODBC Driver 18 > 17 > 13 > 11). If a specific driver is requested but not installed, the agent logs a warning and falls back to the best available driver.

To manually specify a driver:

```toml
[connections.source.sqlserver]
odbc_driver = "ODBC Driver 17 for SQL Server"
```

### ODBC Encryption (SQL Server)

The `encrypt` and `trust_server_certificate` parameters are optional. By default, they are omitted from the connection string, allowing the ODBC driver to use its default behavior:

- **ODBC Driver 17 and below**: Encryption is disabled by default.
- **ODBC Driver 18 and above**: Encryption is mandatory by default.

```toml
[connections.source.sqlserver]
username = "sa"
password = "mypassword"
database = "mydb"
host = "my-server.example.com"
port = 1433
encrypt = true
trust_server_certificate = false
```

For development environments or SQL Server instances that do not support encryption, either omit the encryption parameters (with ODBC Driver 17 or below) or set `encrypt = false` explicitly.
