Metadata-Version: 2.4
Name: unitycatalog-ai
Version: 0.2.0
Summary: Official Python library for Unity Catalog AI support
Project-URL: homepage, https://www.unitycatalog.io/
Project-URL: issues, https://github.com/unitycatalog/unitycatalog/issues
Project-URL: repository, https://github.com/unitycatalog/unitycatalog
Author-email: Serena Ruan <serena.ruan@databricks.com>, Ben Wilson <benjamin.wilson@databricks.com>
Maintainer-email: Serena Ruan <serena.ruan@databricks.com>, Ben Wilson <benjamin.wilson@databricks.com>
License: Apache-2.0
Keywords: ai,databricks,function,genai,tool,unitycatalog
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Other Audience
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: User Interfaces
Requires-Python: >=3.9
Requires-Dist: asyncio
Requires-Dist: nest-asyncio
Requires-Dist: pydantic
Requires-Dist: typing-extensions
Requires-Dist: unitycatalog-client
Provides-Extra: databricks
Requires-Dist: databricks-connect==15.1.0; extra == 'databricks'
Requires-Dist: databricks-sdk>=0.32.0; extra == 'databricks'
Requires-Dist: pandas; extra == 'databricks'
Provides-Extra: databricks-dev
Requires-Dist: databricks-connect==15.1.0; extra == 'databricks-dev'
Requires-Dist: databricks-sdk>=0.32.0; extra == 'databricks-dev'
Requires-Dist: hatch; extra == 'databricks-dev'
Requires-Dist: pandas; extra == 'databricks-dev'
Requires-Dist: pytest; extra == 'databricks-dev'
Requires-Dist: ruff==0.9.3; extra == 'databricks-dev'
Provides-Extra: dev
Requires-Dist: databricks-connect; extra == 'dev'
Requires-Dist: databricks-sdk>=0.32.0; extra == 'dev'
Requires-Dist: hatch; extra == 'dev'
Requires-Dist: pandas; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: ruff==0.9.3; extra == 'dev'
Description-Content-Type: text/markdown

# Unity Catalog AI Core library

The Unity Catalog AI Core library provides convenient APIs to interact with Unity Catalog functions, including the creation, retrieval and execution of functions.
The library includes clients for interacting with both Unity Catalog servers and Databricks-managed Unity Catalog services, in support of UC functions as tools in agents.

## Installation

```sh
pip install unitycatalog-ai
```

If you are using the Databricks-managed version of Unity Catalog, you can install the optional Databricks dependencies by specifying the `databricks` extra:

```sh
pip install unitycatalog-ai[databricks]
```

## Get started

### Unity Catalog Function Client

The Unity Catalog (UC) function client is a core component of the Unity Catalog AI Core Library, enabling seamless interaction with a Unity Catalog server. This client allows you to manage and execute UC functions, providing both asynchronous and synchronous interfaces to cater to various application needs. Whether you're integrating UC functions into GenAI workflows or managing them directly, the UC client offers robust and flexible APIs to facilitate your development process.

#### Key Features

- **Asynchronous and Synchronous Operations**: Flexibly choose between async and sync methods based on your application's concurrency requirements.
- **Comprehensive Function Management**: Easily create, retrieve, list, execute, and delete UC functions.
- **Wrapped Function Support**: In addition to standard single-function creation, you can create *wrapped functions* that in-line additional helper functions within a function's definition to simplify code reuse and modularity.
- **Integration with GenAI**: Seamlessly integrate UC functions as tools within Generative AI agents, enhancing intelligent automation workflows.
- **Type Safety and Caching**: Enforce strict type validation and utilize caching mechanisms to optimize performance and reduce redundant executions.

#### Caveats

When using the `UnitycatalogFunctionClient` for UC, be mindful of the following considerations:

- **Asynchronous API Usage**:
    - The `UnitycatalogFunctionClient` is built on top of the asynchronous [unitycatalog-client SDK](https://pypi.org/project/unitycatalog/), which utilizes aiohttp for REST communication with the UC server.
    - The function client for Unity Catalog offers **both asynchronous and synchronous methods**. The synchronous methods are wrappers around the asynchronous counterparts, ensuring compatibility with environments that may not support asynchronous operations.
    - **Important**: Avoid creating additional event loops in environments that already have a running loop (e.g., Jupyter Notebooks) to prevent conflicts and potential runtime errors.
- **Security Considerations**:
    - **Warning**: Function execution occurs **locally** within the environment where your application is running.
    - **Caution**: Executing GenAI-generated Python code can pose security risks, especially if the code includes operations like file system access or network requests.
    - **Recommendation**: Run your application in an isolated and secure environment with restricted permissions to mitigate potential security threats.
- **External Dependencies**:
    - Ensure that any external libraries required by your UC functions are pre-installed in the execution environment.
    - **Best Practice**: Import external dependencies within the function body to guarantee their availability during execution.
- **Function Overwriting**:
    - The `create_function`, `create_function_async`, `create_python_function`, `create_wrapped_function`, and `create_wrapped_function_async` methods allow overwriting existing functions by setting the `replace` parameter to `True`.
    - **Warning**: Overwriting functions can disrupt workflows that depend on existing function definitions. Use this feature judiciously and ensure that overwriting is intentional.
- **Type Validation and Compatibility**:
    - The client performs strict type validation based on the defined schemas. Ensure that your function parameters and return types adhere to the expected types to prevent execution errors.
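
The event-loop caveat above can be illustrated with the standard library alone. The sketch below is not the client's implementation: `fetch_value`, `loop_is_running`, and `run_sync` are hypothetical names used only for illustration. It shows one way a synchronous wrapper can check for an already-running loop before calling `asyncio.run`, which always creates a new loop.

```python
import asyncio


async def fetch_value() -> int:
    """Hypothetical coroutine standing in for an asynchronous client call."""
    await asyncio.sleep(0)
    return 42


def loop_is_running() -> bool:
    """Return True when called from inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_sync(coro):
    """Run a coroutine to completion from synchronous code.

    asyncio.run() creates a new event loop, so it must not be used
    where a loop is already running (for example, in a Jupyter cell).
    """
    if loop_is_running():
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        raise RuntimeError("An event loop is already running; await the coroutine instead.")
    return asyncio.run(coro)


result = run_sync(fetch_value())
print(result)
```

In a plain script no loop is running, so `run_sync` completes normally. Inside a notebook cell the same call would raise, which is the situation the caveat warns about.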

#### Prerequisites

Before using the UC functions client, ensure that your environment meets the following requirements:

- **Python Version**: Python 3.10 or higher is recommended to leverage all functionalities, including function creation and execution.

- **Dependencies**: Install the necessary packages using pip:

    ```sh
    pip install unitycatalog-client unitycatalog-ai
    ```

- **Unity Catalog Server**: Ensure that you have access to a running instance of the open-source Unity Catalog server. Follow the [Unity Catalog Installation Guide](https://docs.unitycatalog.io/quickstart/) to set up your server if you haven't already.

#### Client Initialization

To interact with UC functions, initialize the `UnitycatalogFunctionClient` as shown below:

```python
import asyncio
from unitycatalog.ai.core.client import UnitycatalogFunctionClient
from unitycatalog.client import ApiClient, Configuration

# Configure the Unity Catalog API client
config = Configuration(
    host="http://localhost:8080/api/2.1/unity-catalog"  # Replace with your UC server URL
)

# Initialize the asynchronous ApiClient
api_client = ApiClient(configuration=config)

# Instantiate the UnitycatalogFunctionClient
uc_client = UnitycatalogFunctionClient(api_client=api_client)

# Example catalog and schema names
CATALOG = "my_catalog"
SCHEMA = "my_schema"
```

#### Creating a UC Function

You can create a UC function either by providing a Python callable or by submitting a `FunctionInfo` object. The recommended approach, shown below, is the `create_python_function` API, which accepts a Python callable (function) as input.

To create a UC function from a Python function, define your function with appropriate type hints and a Google-style docstring:

```python
def add_numbers(a: float, b: float) -> float:
    """
    Adds two numbers and returns the result.

    Args:
        a (float): First number.
        b (float): Second number.

    Returns:
        float: The sum of the two numbers.
    """
    return a + b

# Create the function within the Unity Catalog catalog and schema specified
function_info = uc_client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=False,  # Set to True to overwrite if the function already exists
)

print(function_info)
```

#### Creating a Wrapped UC Function

In addition to standard function creation, you can create *wrapped functions*. A wrapped function uses a primary function as the interface while in-lining additional helper functions (wrapped functions) into the primary function’s definition. This feature is useful when you want to keep helper logic bundled together with the main function without needing to replicate existing common utilities within your function definitions.

For example, consider the following helper functions and the primary wrapper function that has direct dependencies on the helper functions:

```python
def a(x: int) -> int:
    return x + 1

def b(y: int) -> int:
    return y + 2

def wrapper(x: int, y: int) -> int:
    """
    Wrapper function that in-lines helper functions a and b.

    Args:
        x (int): The first argument.
        y (int): The second argument.

    Returns:
        int: The combined result of a(x) and b(y).
    """
    return a(x) + b(y)
```

To register this wrapped function as a single UC function, you can call the `create_wrapped_function` API:

```python
function_info = uc_client.create_wrapped_function(
    primary_func=wrapper,
    functions=[a, b],
    catalog=CATALOG,
    schema=SCHEMA,
    replace=False,  # Set to True to overwrite if the function already exists
)
```
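
Conceptually, the registered function behaves as if the helpers were defined inside the primary function's body. The sketch below is plain Python written by hand to illustrate that idea; `wrapper_inlined` is a hypothetical name, and the exact code the client generates may differ.

```python
def wrapper_inlined(x: int, y: int) -> int:
    """Hand-written equivalent of `wrapper` with helpers a and b in-lined."""

    def a(x: int) -> int:
        return x + 1

    def b(y: int) -> int:
        return y + 2

    return a(x) + b(y)


print(wrapper_inlined(1, 2))  # (1 + 1) + (2 + 2) = 6
```

Because the helpers travel with the primary function, the result is self-contained and does not rely on `a` or `b` existing in the execution environment.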

#### Retrieving a UC Function

To retrieve details of a specific UC function, use the `get_function` method with the full function name in the format `<catalog>.<schema>.<function_name>`:

```python
full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"

# Retrieve the function information and metadata
function_info = uc_client.get_function(full_func_name)

print(function_info)
```

#### Listing Functions

```python
# List all created functions within a given schema
functions = uc_client.list_functions(
    catalog=CATALOG,
    schema=SCHEMA,
    max_results=10  # Paginated results will contain a continuation token that can be submitted with additional requests
)

for func in functions.items:
    print(func)
```
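
Continuation-token pagination generally follows the loop sketched below. This is a generic illustration, not the client's API: `Page`, `iter_all`, and `fake_fetch_page` are hypothetical, and the parameter and attribute names the real client uses for the token are not shown in this document, so check them before adapting the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical page of results plus the token for the next page."""

    items: List[str]
    next_token: Optional[str]


def iter_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield items from every page until no continuation token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.items
        if not page.next_token:
            break
        token = page.next_token


# In-memory stand-in for a paginated listing call (not the real client).
_NAMES = ["add_numbers", "wrapper", "dep_check", "test"]
_PAGE_SIZE = 3


def fake_fetch_page(token: Optional[str]) -> Page:
    start = int(token) if token else 0
    end = start + _PAGE_SIZE
    next_token = str(end) if end < len(_NAMES) else None
    return Page(items=_NAMES[start:end], next_token=next_token)


all_names = list(iter_all(fake_fetch_page))
print(all_names)
```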

#### Executing a Function

Note that function execution occurs in the same process that calls this API. Review the security considerations above regarding the execution of untrusted code before calling it.

```python
full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
parameters = {"a": 10.5, "b": 5.5}

# Execute the function synchronously
result = uc_client.execute_function(full_func_name, parameters)

print(result.value)  # Outputs: 16.0
```

#### Function Parameter Defaults

Functions defined with parameter defaults behave like standard Python functions with default arguments. If a parameter that has a default value is omitted when calling the `execute_function` API, the default value is applied to the function invocation.

If you use defaults in your function signatures, make sure the parameter descriptions are accurate and state the default value, so that agents calling your function use it correctly.
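
The behavior described above mirrors ordinary Python argument binding. The sketch below uses only the standard library to show a parameter dictionary being bound to a function with an omitted default; `greet` and `call_with_defaults` are hypothetical names, and this is not how the client itself is implemented. Note that the docstring states the default value explicitly.

```python
import inspect
from typing import Any, Dict


def greet(name: str, greeting: str = "Hello") -> str:
    """
    Builds a greeting for a person.

    Args:
        name (str): The person's name.
        greeting (str): The greeting word. Defaults to "Hello".

    Returns:
        str: The formatted greeting.
    """
    return f"{greeting}, {name}!"


def call_with_defaults(func, parameters: Dict[str, Any]):
    """Bind a parameter dictionary to func, filling in omitted defaults."""
    bound = inspect.signature(func).bind(**parameters)
    bound.apply_defaults()
    return func(*bound.args, **bound.kwargs)


print(call_with_defaults(greet, {"name": "Ada"}))  # default greeting is used
print(call_with_defaults(greet, {"name": "Ada", "greeting": "Hi"}))
```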

#### Deleting a Function

To delete a function that you have write authority to, you can use the following API:

```python
full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"

uc_client.delete_function(full_func_name)
```

### Databricks-managed UC

To use Databricks-managed Unity Catalog with this package, follow the [instructions](https://docs.databricks.com/en/dev-tools/cli/authentication.html#authentication-for-the-databricks-cli) to authenticate to your workspace and ensure that your access token has workspace-level privilege for managing UC functions.

#### Prerequisites

- **[Highly recommended]** Use Python 3.10 or higher to access all functionality, including function creation and function execution.
- For creating UC functions with a SQL body definition, **only [serverless compute](https://docs.databricks.com/en/compute/use-compute.html#use-serverless-compute) is supported**.
  Install the `databricks-connect` package with `pip install databricks-connect==15.1.0` to access serverless compute. This version of the package requires **Python 3.10 or higher**.
- For executing UC functions within Databricks, use either a SQL warehouse or Databricks Connect with serverless compute:
    - SQL warehouse: create a SQL warehouse following [these instructions](https://docs.databricks.com/en/compute/sql-warehouse/create.html), and pass the warehouse ID when initializing the client.
      NOTE: **only the `serverless` [SQL warehouse type](https://docs.databricks.com/en/admin/sql/warehouse-types.html#sql-warehouse-types) is supported** because of performance concerns.
    - Databricks Connect with serverless compute: install the `databricks-connect` package with `pip install databricks-connect==15.1.0`. No configuration needs to be passed when initializing the client.

#### Client initialization

This example uses serverless compute.

```python
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()
```

#### Create a UC function

When creating a UC function from a SQL string, the string should follow [this syntax](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html#create-function-sql-and-python).

```python
# make sure you have privilege in the corresponding catalog and schema for function creation
CATALOG = "..."
SCHEMA = "..."
func_name = "test"
sql_body = f"""CREATE FUNCTION {CATALOG}.{SCHEMA}.{func_name}(s string)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return s
$$
"""

function_info = client.create_function(sql_function_body=sql_body)
```

#### Dependencies and Environments

Databricks Runtime version 17 and higher supports specifying dependencies for a function's execution environment. Earlier runtime
versions do not support this feature and will raise an error if the `dependencies` or `environment` arguments are submitted with a `create_python_function` or `create_wrapped_function` call.

The minimal example below shows how to specify PyPI dependencies to include in your execution environment:

```python
# Define a function that requires an external PyPI dependency

def dep_check(x: str) -> str:
    """
    A function to test the dependency support for UC

    Args:
        x: An input string
    
    Returns:
        A string that reports the dependency support for UC
    """

    import scrapy  # NOTE that you must still import the library to use within the function.

    return scrapy.__version__

# Create the function and supply the dependency in standard PyPI format
client.create_python_function(
    func=dep_check,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=True,
    dependencies=["scrapy==2.10.1"],
)
```

#### Retrieve a UC function

The client also provides an API to retrieve UC function details. The function name passed in must be the full name in the format `<catalog>.<schema>.<function_name>`.

```python
full_func_name = f"{CATALOG}.{SCHEMA}.{func_name}"
client.get_function(full_func_name)
```

#### List UC functions

To list the functions stored in a catalog and schema, use the `list_functions` API:

```python
client.list_functions(catalog=CATALOG, schema=SCHEMA, max_results=5)
```

#### Execute a UC function

Parameters passed into `execute_function` must be a dictionary whose keys map to the input parameters defined by the UC function.

```python
result = client.execute_function(full_func_name, {"s": "some_string"})
assert result.value == "some_string"
```

##### Function execution arguments configuration

To control function execution behavior in the Databricks client, the following environment variables are available:

| Environment Variable                                    | Description                                                                                         | Default Value |
|---------------------------------------------------------|-----------------------------------------------------------------------------------------------------|---------------|
| `UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS`            | Maximum number of attempts to retry refreshing the session client in case of token expiry.          | `5`           |
| `UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT` | Maximum number of rows when executing functions using serverless compute with `databricks-connect`. | `100`         |
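
These variables can be exported in the shell or set from Python. The sketch below sets them through `os.environ`; the values `10` and `500` are arbitrary examples, and it assumes (not confirmed by this document) that the variables are set before the client is created so the library sees them when it reads its configuration.

```python
import os

# Set overrides before creating the client; environment values are always strings.
os.environ["UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS"] = "10"
os.environ["UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT"] = "500"

# Read the values back to confirm what the current process will see.
retry_attempts = int(os.environ["UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS"])
row_limit = int(os.environ["UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT"])
print(retry_attempts, row_limit)
```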

#### Reminders

- If the function contains a `DECIMAL` type parameter, it is converted to a Python `float` for execution, and this conversion may lose precision.
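
The precision loss can be seen with Python's standard `decimal` module. This is a standalone illustration of decimal-to-float conversion in general, not of the client's internal conversion code; the value used is an arbitrary example.

```python
from decimal import Decimal

exact = Decimal("1234567890.12345678901234567890")
as_float = float(exact)  # binary floating point keeps roughly 15-17 significant digits

# Converting back shows that the original digits were not preserved.
round_trip = Decimal(repr(as_float))
print(exact)
print(round_trip)
print(round_trip == exact)  # False: the trailing digits were lost
```

If exact decimal arithmetic matters for your function, consider passing the value as a string and constructing a `Decimal` inside the function body.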
