Metadata-Version: 2.4
Name: tempora-ai
Version: 0.1.0
Summary: Tempora AI Python Client Library
License-Expression: LicenseRef-Proprietary
License-File: LICENSE
Keywords: machine learning,data preparation,data pipelines,time series
Author: Tempora AI
Author-email: support@tempora.ai
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: jax
Provides-Extra: pytorch
Provides-Extra: tensorflow
Requires-Dist: boto3 (>=1.28)
Requires-Dist: cloudpickle (>=3.0)
Requires-Dist: eval-type-backport (>=0.2.2,<0.3.0)
Requires-Dist: jax ; extra == "jax"
Requires-Dist: jaxtyping (>=0.2.14)
Requires-Dist: numpy (>=1.20)
Requires-Dist: pandas (>=1.2)
Requires-Dist: pyarrow (>=14)
Requires-Dist: pydantic (==2.10.6)
Requires-Dist: tensorflow ; extra == "tensorflow"
Requires-Dist: tensorflow-io-gcs-filesystem ; extra == "tensorflow"
Requires-Dist: tomli (>=2.2.1,<3.0.0)
Requires-Dist: torch ; extra == "pytorch"
Requires-Dist: torchvision ; extra == "pytorch"
Requires-Dist: typeguard (>=3.0)
Requires-Dist: typing-extensions (>=4.4.0)
Project-URL: Documentation, https://docs.tempora.ai/
Project-URL: Homepage, https://tempora.ai/
Description-Content-Type: text/markdown

# Tempora AI Python Client

[Tempora](https://tempora.ai) helps you build *ML-ready batches* from structured or sequential data
that resides locally, in cloud storage, or in enterprise data systems. It provides
dataset abstractions, batch samplers, target specification, and feature transform
hooks, so you can build ML data prep workflows in a few lines of Python.

## Tempora Server
The `tempora-ai` package is a Python client that *connects* to a Tempora **server** instance,
which in turn handles all the computation and data processing required for data prep workflows.

In production deployments, the server runs as a containerized application in your *own
environment* (for example in your VPC or on-prem infrastructure). However, if you would like to
try Tempora before deploying it, we offer a [free hosted trial](https://tempora.ai/trial).


## Installation

```bash
pip install tempora-ai
```

### Optional extras

```bash
pip install tempora-ai[pytorch]
pip install tempora-ai[tensorflow]
pip install tempora-ai[jax]
```

## Example Workflow

```python
import pandas as pd
import tempora
from tempora.datasets import Pivot, SnowflakeDataset, connect_snowflake
from tempora.samplers import RandomSampler, PredictionWindowTargetSpec, EventEncoder

# Log in to the Tempora server and connect to Snowflake
tempora.login(host='10.0.8.0', username='tempora')
connect_snowflake(user='nick', account='xy12345', warehouse='my_warehouse')

# Define the datasets for source data in Snowflake.
# Specify a pivot transformation for the sensor data that is in EAV form
sensor_data = SnowflakeDataset(
    database='industrial', table='sensor_data',
    time_column='datetime', entity_keys=['Machine Serial'],
    pivot=Pivot(on='param', using='value', agg_function='first'),
)
failure_data = SnowflakeDataset(
    database='industrial', table='failure_data', time_column='Failure Date',
)

# Join the two datasets using an ASOF join, then apply a custom row filter
dataset = sensor_data.join(failure_data, ['Machine Serial'], asof_join=True)
dataset = dataset.filter("Model = 'XC-500' AND ENGINE_HOURS IS NOT NULL")

# Create a random segment sampler with event-based targets
target_spec = PredictionWindowTargetSpec(
    window_len=pd.Timedelta('20h'), columns=['Failure Date'],
    transform_spec=EventEncoder(),
)
sampler = RandomSampler(
    context_len=pd.Timedelta('50h'), batch_size=32, output_format='torch',
    target_spec=target_spec,
)

# Generate 1000 batches from the sampler and serialize them to S3 as parquet
sampler.write_batches(dataset, 1000, 's3://bucket-name/tempora_batches/')
```
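
To make the pivot and ASOF join steps in the workflow above more concrete, here is a small local illustration using plain pandas rather than the Tempora API. The sample rows and the `temp`/`rpm` parameter names are hypothetical, and the backward match direction is an assumption; the server's actual join semantics may differ.

```python
import pandas as pd

# Hypothetical sensor readings in EAV form: one row per (entity, time, attribute).
eav = pd.DataFrame({
    "Machine Serial": ["A1", "A1", "A1", "A1"],
    "datetime": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:00",
        "2024-01-01 01:00", "2024-01-01 01:00",
    ]),
    "param": ["temp", "rpm", "temp", "rpm"],
    "value": [70.0, 1500.0, 72.0, 1480.0],
})

# Pivot: each distinct `param` becomes a column filled from `value`,
# keeping the first value if a (machine, time, param) combination repeats.
wide = eav.pivot_table(
    index=["Machine Serial", "datetime"],
    columns="param",
    values="value",
    aggfunc="first",
).reset_index()
wide.columns.name = None

# Hypothetical failure events for the same machine.
failures = pd.DataFrame({
    "Machine Serial": ["A1"],
    "Failure Date": pd.to_datetime(["2024-01-01 00:30"]),
})

# ASOF join: for each sensor row, attach the most recent failure at or before
# that timestamp for the same machine (pandas' default "backward" direction,
# chosen here only for illustration).
joined = pd.merge_asof(
    wide.sort_values("datetime"),
    failures.sort_values("Failure Date"),
    left_on="datetime",
    right_on="Failure Date",
    by="Machine Serial",
)
print(joined)
```

After the pivot, each machine and timestamp occupies a single row with one column per parameter, which is the shape the join and the sampler operate on.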

## Core Concepts

- `datasets`: Define datasets backed by file systems, object stores, data warehouses or databases.
- `samplers`: Sample training/eval batches according to different strategies (random, sequential, series).
- `target specification`: Specify the targets or labels for each sampled segment in the batch.
- `transforms`: Apply custom transforms to context window data, target window data, or both.

## Environment Variables

- `TEMPORA_SERVER_HOST`: Server address in `hostname:port` format.
- `TEMPORA_SERVER_USERNAME`: Username to authenticate with.
- `TEMPORA_SERVER_PASSWORD`: Password to authenticate with.
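
The variables above can be checked before starting a session. The following is a minimal sketch using only the standard library; the helper names `parse_server_host` and `read_server_settings` are hypothetical and not part of the `tempora` package, and the port `8080` in the usage note is only an example value.

```python
import os
from typing import Dict, Mapping, Tuple

# Variable names taken from the list above.
REQUIRED = (
    "TEMPORA_SERVER_HOST",
    "TEMPORA_SERVER_USERNAME",
    "TEMPORA_SERVER_PASSWORD",
)


def parse_server_host(value: str) -> Tuple[str, int]:
    """Split a `hostname:port` string into its two parts."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'hostname:port', got {value!r}")
    return host, int(port)


def read_server_settings(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Collect the connection settings, failing early if any variable is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    host, port = parse_server_host(env["TEMPORA_SERVER_HOST"])
    return {
        "host": host,
        "port": port,
        "username": env["TEMPORA_SERVER_USERNAME"],
        "password": env["TEMPORA_SERVER_PASSWORD"],
    }
```

For example, with `TEMPORA_SERVER_HOST=10.0.8.0:8080` set, `read_server_settings()` returns a dictionary whose `host` is `'10.0.8.0'` and whose `port` is `8080`.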

## Documentation

https://docs.tempora.ai/

## Support

[support@tempora.ai](mailto:support@tempora.ai)

