Metadata-Version: 2.1
Name: laktory
Version: 0.0.7
Summary: A DataOps framework for building a lakehouse
Author-email: Olivier Soucy <olivier.soucy@okube.ai>
License: MIT
Project-URL: Homepage, https://github.com/opencubes-ai/laktory
Project-URL: Bug Tracker, https://github.com/opencubes-ai/laktory/issues
Keywords: one,two
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: databricks-sdk
Requires-Dist: jsonref
Requires-Dist: pulumi
Requires-Dist: pulumi_databricks
Requires-Dist: pyyaml
Requires-Dist: pydantic>=2
Requires-Dist: settus
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: databricks-sdk; extra == "dev"
Requires-Dist: databricks-sql-connector; extra == "dev"
Requires-Dist: flit; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: azure
Requires-Dist: azure-identity; extra == "azure"
Requires-Dist: azure-storage-blob; extra == "azure"
Requires-Dist: pulumi_azure; extra == "azure"
Requires-Dist: pulumi_azure_native; extra == "azure"
Provides-Extra: aws
Requires-Dist: boto3; extra == "aws"
Requires-Dist: pulumi_aws; extra == "aws"
Provides-Extra: gcp

# Laktory

[![pypi](https://img.shields.io/pypi/v/laktory.svg)](https://pypi.org/project/laktory/)
[![downloads](https://static.pepy.tech/badge/laktory/month)](https://pepy.tech/project/laktory)
[![versions](https://img.shields.io/pypi/pyversions/laktory.svg)](https://github.com/okube-ai/laktory)
[![license](https://img.shields.io/github/license/okube-ai/laktory.svg)](https://github.com/okube-ai/laktory/blob/main/LICENSE)

A DataOps framework for building a Databricks lakehouse.

## Okube Company

Okube is committed to developing open source data and ML engineering tools. This is an open space, and contributions are more than welcome.


## Help
TODO: Build full help documentation

## Installation
Install using `pip install laktory`.

TODO: Full installation instructions

## A Basic Example
This example demonstrates how to send data events to a data lake and how to
define a data pipeline that specifies the transformation layers of its tables.

### Generate data events
A data event class defines the specifications of an event and provides methods
for writing that event to a Databricks mount or to cloud storage.

```py
from laktory import models
from datetime import datetime


events = [
    models.DataEvent(
        name="stock_price",
        producer={
            "name": "yahoo-finance",
        },
        data={
            "created_at": datetime(2023, 8, 23),
            "symbol": "GOOGL",
            "open": 130.25,
            "close": 132.33,
        },
    ),
    models.DataEvent(
        name="stock_price",
        producer={
            "name": "yahoo-finance",
        },
        data={
            "created_at": datetime(2023, 8, 24),
            "symbol": "GOOGL",
            "open": 132.00,
            "close": 134.12,
        },
    )
]

for event in events:
    event.to_databricks_mount()

```
These events may now be sent to your cloud storage of choice.
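
To make the shape of such an event concrete, here is a minimal sketch that uses only the standard library. It is not Laktory's implementation: the helper names (`event_to_record`, `event_path`, `to_json`) and the storage layout are assumptions made for illustration, and the actual paths and serialization used by `DataEvent` may differ.

```python
import json
from datetime import datetime


def event_to_record(name: str, producer: str, data: dict) -> dict:
    """Build a plain dict holding the event name, its producer and its payload."""
    return {"name": name, "producer": {"name": producer}, "data": data}


def event_path(record: dict, root: str = "events") -> str:
    """Hypothetical layout: <root>/<producer>/<event>/<YYYY>/<MM>/<DD>/<file>.json"""
    created_at = record["data"]["created_at"]
    stamp = created_at.strftime("%Y%m%dT%H%M%S")
    return (
        f"{root}/{record['producer']['name']}/{record['name']}/"
        f"{created_at:%Y/%m/%d}/{record['name']}_{stamp}.json"
    )


def to_json(record: dict) -> str:
    """Serialize the record, writing datetimes as ISO 8601 strings."""
    return json.dumps(record, default=lambda v: v.isoformat(), sort_keys=True)


record = event_to_record(
    "stock_price",
    "yahoo-finance",
    {
        "created_at": datetime(2023, 8, 23),
        "symbol": "GOOGL",
        "open": 130.25,
        "close": 132.33,
    },
)
path = event_path(record)
payload = to_json(record)
```

Partitioning the path by producer, event name and date is one common way to keep raw events easy to locate and to load incrementally. It is shown here as a design option, not as the layout Laktory uses.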

### Define data pipeline and data tables
A pipeline class defines how raw data events are transformed into curated
(silver) and consumption (gold) layers.

```py
from laktory import models

pl = models.Pipeline(
    name="pl-stock-prices",
    tables=[
        models.Table(
            name="brz_stock_prices",
            timestamp_key="data.created_at",
            event_source=models.EventDataSource(
                name="stock_price",
                producer=models.Producer(
                    name="yahoo-finance",
                )
            ),
            zone="BRONZE",
        ),
        models.Table(
            name="slv_stock_prices",
            table_source=models.TableSource(
                name="brz_stock_prices",
            ),
            zone="SILVER",
            columns=[
                {
                    "name": "created_at",
                    "type": "timestamp",
                    "func_name": "coalesce",
                    "input_cols": ["_created_at"],
                },
                {
                    "name": "open",
                    "type": "double",
                    "func_name": "coalesce",
                    "input_cols": ["data.open"],
                },
                {
                    "name": "close",
                    "type": "double",
                    "func_name": "coalesce",
                    "input_cols": ["data.close"],
                },
            ],
        ),
    ]
)
```
Laktory will provide the required framework for deploying this pipeline as a
Delta Live Tables pipeline in Databricks, along with all the associated
notebooks and jobs.
TODO: link to help
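
To illustrate what a column definition with `"func_name": "coalesce"` expresses, the sketch below evaluates such definitions against a single nested record in plain Python. It assumes `coalesce` carries its usual SQL meaning (first non-null input) and that dotted names such as `data.open` address nested fields. The helpers `get_nested`, `coalesce` and `build_row` are hypothetical and do not reflect how Laktory evaluates columns in Databricks.

```python
from typing import Any, Optional


def get_nested(record: dict, dotted: str) -> Optional[Any]:
    """Follow a dotted path such as "data.open"; return None if a key is missing."""
    value: Any = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def coalesce(record: dict, input_cols: list) -> Optional[Any]:
    """Return the first non-null value among the input columns."""
    for col in input_cols:
        value = get_nested(record, col)
        if value is not None:
            return value
    return None


def build_row(record: dict, columns: list) -> dict:
    """Evaluate each column definition; only "coalesce" is handled in this sketch."""
    row = {}
    for spec in columns:
        if spec["func_name"] != "coalesce":
            raise ValueError(f"unsupported func_name: {spec['func_name']}")
        row[spec["name"]] = coalesce(record, spec["input_cols"])
    return row


# Hypothetical bronze record: a timestamp column plus the raw event payload.
bronze_record = {
    "_created_at": "2023-08-23T00:00:00",
    "data": {"symbol": "GOOGL", "open": 130.25, "close": 132.33},
}
columns = [
    {"name": "created_at", "func_name": "coalesce", "input_cols": ["_created_at"]},
    {"name": "open", "func_name": "coalesce", "input_cols": ["data.open"]},
    {"name": "low", "func_name": "coalesce", "input_cols": ["data.low"]},
]
row = build_row(bronze_record, columns)
```

In this sketch, a column whose input field is absent from the event payload (`data.low` here) resolves to `None`, so it is worth checking that every entry in `input_cols` matches a field the events actually carry.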


## Contributing
TODO
