Metadata-Version: 2.1
Name: laktory
Version: 0.2.1
Summary: A DataOps framework for building a lakehouse
Author-email: Olivier Soucy <olivier.soucy@okube.ai>
License: MIT
Project-URL: Homepage, https://github.com/opencubes-ai/laktory
Project-URL: Bug Tracker, https://github.com/opencubes-ai/laktory/issues
Keywords: one,two
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pydantic :: 2
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: databricks-sdk
Requires-Dist: inflect
Requires-Dist: planck
Requires-Dist: prompt_toolkit
Requires-Dist: pulumi
Requires-Dist: pulumi_databricks>=1.36
Requires-Dist: pulumi_random
Requires-Dist: pyyaml
Requires-Dist: pydantic>=2
Requires-Dist: python-dateutil
Requires-Dist: settus
Requires-Dist: typer[all]
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: flit; extra == "dev"
Requires-Dist: mkdocs; extra == "dev"
Requires-Dist: mkdocstrings[python]; extra == "dev"
Requires-Dist: mkdocs-material; extra == "dev"
Requires-Dist: mkdocs-video; extra == "dev"
Provides-Extra: spark
Requires-Dist: pandas; extra == "spark"
Requires-Dist: pyarrow; extra == "spark"
Requires-Dist: pyspark; extra == "spark"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-examples; extra == "test"
Requires-Dist: yfinance; extra == "test"
Provides-Extra: azure
Requires-Dist: azure-identity; extra == "azure"
Requires-Dist: azure-storage-blob; extra == "azure"
Requires-Dist: pulumi_azure; extra == "azure"
Requires-Dist: pulumi_azure_native; extra == "azure"
Requires-Dist: settus[azure]; extra == "azure"
Provides-Extra: aws
Requires-Dist: boto3; extra == "aws"
Requires-Dist: pulumi_aws; extra == "aws"
Requires-Dist: settus[aws]; extra == "aws"
Provides-Extra: gcp
Requires-Dist: settus[gcp]; extra == "gcp"


# Laktory

[![pypi](https://img.shields.io/pypi/v/laktory.svg)](https://pypi.org/project/laktory/)
[![test](https://github.com/okube-ai/laktory/actions/workflows/test.yml/badge.svg)](https://github.com/okube-ai/laktory/actions/workflows/test.yml)
[![downloads](https://static.pepy.tech/badge/laktory/month)](https://pepy.tech/project/laktory)
[![versions](https://img.shields.io/pypi/pyversions/laktory.svg)](https://github.com/okube-ai/laktory)
[![license](https://img.shields.io/github/license/okube-ai/laktory.svg)](https://github.com/okube-ai/laktory/blob/main/LICENSE)

A DataOps framework for building a Databricks lakehouse.

<img src="docs/images/logo_sg.png" alt="laktory logo" width="85"/>

Laktory makes it possible to express and bring to life your data vision, from raw to enriched analytics-ready datasets and finely tuned AI models, while adhering to basic DevOps best practices such as source control, code reviews and CI/CD.

Using a declarative approach, you define your datasets and transformations, validate them, and deploy them into Databricks workspaces.
After deployment, you can also use Laktory for debugging and monitoring.

<img src="docs/images/what_is_laktory.png" alt="what is laktory" width="400"/>

## Help
See [documentation](https://www.laktory.ai/) for more details.

## Installation
Install with pip:
```commandline
pip install laktory[{cloud_provider}]
```
where `{cloud_provider}` is `azure`, `aws` or `gcp`. 

For more installation options,
see the [Install](https://www.laktory.ai/install/) section in the documentation.
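
To confirm that the install worked, one option is to query the installed package metadata from Python. The sketch below uses only the standard library (`importlib.metadata`). The helper name `installed_version` is a hypothetical name chosen for illustration and is not part of Laktory.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not present in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("laktory")
    if version is None:
        print("laktory is not installed in this environment")
    else:
        print(f"laktory {version} is installed")
```

Run it with the same interpreter that `pip` installed into. A virtual environment mismatch is a common reason for a package to appear missing.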

## A Basic Example
```py
from laktory import models

table = models.Table(
    name="brz_stock_prices",
    catalog_name="prod",
    schema_name="finance",
    timestamp_key="data.created_at",
    builder={
        "layer": "SILVER",
        "table_source": {
            "name": "brz_stock_prices",
        },
        "spark_chain": {
            "nodes": [
                {
                    "column": {"name": "symbol"},
                    "type": "string",
                    "sql_expression": "data.symbol"
                }
            ]
        }
    },
)

print(table)
#> catalog_name='prod' columns=[Column(catalog_name='prod', comment=None, name='symbol', pii=None, schema_name='finance', spark_func_args=[], spark_func_kwargs={}, spark_func_name=None, sql_expression='data.symbol', table_name='brz_stock_prices', type='string', unit=None)] comment=None data=None grants=None name='brz_stock_prices' primary_key=None schema_name='finance' timestamp_key='data.created_at' builder=TableBuilder(drop_source_columns=True, drop_duplicates=None, event_source=None, joins=[], pipeline_name=None, table_source=TableDataSource(read_as_stream=True, catalog_name='prod', cdc=None, selects=None, filter=None, from_pipeline=True, name='brz_stock_prices', schema_name='finance', watermark=None), layer='SILVER')
```

To get started with a more useful example, jump into the [Quickstart](https://www.laktory.ai/quickstart/).



## A Full DataOps Template
A comprehensive template showing how to deploy a lakehouse as code with Laktory is maintained in the
[lakehouse-as-code](https://github.com/okube-ai/lakehouse-as-code) repository.

This template uses four Pulumi projects:
- `{cloud_provider}_infra`: Deploys the required resources on your cloud provider
- `unity-catalog`: Sets up users, groups, catalogs and schemas, and manages grants
- `workspace-conf`: Sets up secrets, clusters and warehouses
- `workspace`: Deploys the data workflows that build your lakehouse

## Okube Company
<img src="docs/images/okube.png" alt="okube logo" width="85"/>

[Okube](https://www.okube.ai) is dedicated to building open-source frameworks, known as the *kubes*, that empower businesses to build, deploy and operate highly scalable data platforms and AI models.

