Metadata-Version: 2.4
Name: xorq
Version: 0.2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: User Interfaces
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Dist: dask==2025.1.0 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: attrs>=24.0.0,<26 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: pyarrow>=13.0.0,<20 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: structlog>=24.2.0,<26 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: pytest-mock>=3.14.0,<4 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: cityhash>=0.4.7,<1 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: pandas>=1.5.3,<3 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: pyarrow-hotfix>=0.4,<1 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: geoarrow-types>=0.2,<1 ; python_full_version >= '3.10' and python_full_version < '4.0'
Requires-Dist: pythran>=0.17.0 ; sys_platform == 'darwin'
Requires-Dist: atpublic>=5.1
Requires-Dist: parsy>=2
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: pytz>=2022.7
Requires-Dist: sqlglot==25.20.2
Requires-Dist: toolz>=0.11
Requires-Dist: typing-extensions>=4.3.0
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: cloudpickle>=3.1.1
Requires-Dist: envyaml>=1.10.211231
Requires-Dist: duckdb>=1.1.3 ; extra == 'duckdb'
Requires-Dist: datafusion>=0.6,<46 ; python_full_version >= '3.10' and python_full_version < '4.0' and extra == 'datafusion'
Requires-Dist: snowflake-connector-python>=3.10.1,<4 ; python_full_version >= '3.10' and python_full_version < '4.0' and extra == 'snowflake'
Requires-Dist: quickgrove>=0.1.2 ; extra == 'quickgrove'
Requires-Dist: fsspec>=2024.6.1,<2025.3.1 ; python_full_version >= '3.10' and python_full_version < '4.0' and extra == 'examples'
Requires-Dist: pins[gcs]>=0.8.3,<1 ; python_full_version >= '3.10' and python_full_version < '4.0' and extra == 'examples'
Requires-Dist: xgboost>=1.6.1 ; python_full_version >= '3.10' and python_full_version < '4.0' and extra == 'examples'
Requires-Dist: duckdb>=0.10.3,<2 ; python_full_version >= '3.10' and python_full_version < '4.0' and extra == 'examples'
Requires-Dist: quickgrove>=0.1.2 ; extra == 'examples'
Requires-Dist: scikit-learn>=1.4.0,<2.0.0 ; extra == 'examples'
Requires-Dist: openai>=1.65.4 ; extra == 'examples'
Requires-Dist: tenacity>=9.0.0 ; extra == 'examples'
Requires-Dist: adbc-driver-postgresql>=1.4.0 ; extra == 'examples'
Requires-Dist: psycopg2-binary>=2.9.10 ; extra == 'examples'
Requires-Dist: adbc-driver-postgresql>=1.4.0 ; extra == 'postgres'
Requires-Dist: psycopg2-binary>=2.9.10 ; extra == 'postgres'
Provides-Extra: duckdb
Provides-Extra: datafusion
Provides-Extra: snowflake
Provides-Extra: quickgrove
Provides-Extra: examples
Provides-Extra: postgres
License-File: LICENSE
Summary: Data processing library built on top of Ibis and DataFusion to write multi-engine data workflows.
Author-email: Hussain Sultan <hussain@letsql.com>
Maintainer-email: Dan Lovell <dan@letsql.com>, Daniel Mesejo <mesejo@letsql.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://www.letsql.com/
Project-URL: Repository, https://github.com/letsql/xorq.git
Project-URL: Issues, https://github.com/letsql/xorq/issues
Project-URL: Changelog, https://github.com/letsql/xorq/blob/main/CHANGELOG.md

# xorq: Multi-engine ML pipelines made simple

[![Downloads](https://static.pepy.tech/badge/letsql)](https://pepy.tech/project/letsql)
![PyPI - Version](https://img.shields.io/pypi/v/letsql)
![GitHub License](https://img.shields.io/github/license/letsql/letsql)
![PyPI - Status](https://img.shields.io/pypi/status/letsql)
![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/letsql/letsql/ci-test.yml)
![Codecov](https://img.shields.io/codecov/c/github/letsql/letsql)

xorq is a deferred computational framework that brings the replicability and
performance of declarative pipelines to the Python ML ecosystem. It lets you
write pandas-style transformations that never run out of memory,
automatically cache intermediate results, and move seamlessly between SQL
engines and Python UDFs, all while maintaining replicability. xorq is built on
top of Ibis and DataFusion.

| Feature | Description |
|---------|-------------|
| **Declarative expressions** | Express and execute complex data processing logic via declarative functions. Define transformations as Ibis expressions so that you are not tied to a specific execution engine. |
| **[Multi-engine](https://docs.xorq.dev/core_concepts#multi-engine-system)** | Create unified ML workflows that leverage the strengths of different data engines in a single pipeline. xorq orchestrates data movement between engines (e.g., Snowflake for initial extraction, DuckDB for transformations, and Python for ML model training). |
| **[Built-in caching](https://docs.xorq.dev/core_concepts#caching-system)** | xorq automatically caches intermediate pipeline results, minimizing repeated work. |
| **Serializable pipelines** | All pipeline definitions, including UDFs, are serialized to YAML, enabling version control, reproducibility, and CI/CD integration. This ensures consistent results across environments and makes it easy to track changes over time. |
| **Portable UDFs** | Build pipelines as UDxFs: aggregates, windows, and transformations. The DataFusion-based xorq engine provides a portable runtime for UDF execution. |
| **Arrow-native architecture** | Built on the Apache Arrow columnar memory format and Arrow Flight transport layer, xorq achieves high-performance data transfer without cumbersome serialization overhead. |


## Getting Started
xorq functions as both an interactive library for building expressions and a
command-line interface. This dual nature enables a seamless transition
from exploratory research to production-ready artifacts. The steps below
guide you through using both the CLI and library components to get started.

> [!CAUTION]
> This library does not currently have a stable release. Both the
> API and implementation are subject to change, and future updates may not be
> backward compatible.

### Installation

xorq is available as [`xorq`](https://pypi.org/project/xorq/) on PyPI:

```shell
pip install xorq
```

> [!NOTE]
> We are changing the name from LETSQL to xorq.

### Usage

```python
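# your_pipeline.py
import xorq as xo


# Connect to Postgres (settings are read from the environment) and to DuckDB
pg = xo.postgres.connect_env()
db = xo.duckdb.connect()

batting = pg.table("batting")
awards_players = xo.examples.awards_players.fetch(backend=db)

# Left side: 2015 batting rows from Postgres
left = batting.filter(batting.yearID == 2015)

# Right side: NL award winners from DuckDB, moved into the Postgres backend
right = (awards_players.filter(awards_players.lgID == "NL")
                       .drop("yearID", "lgID")
                       .into_backend(pg, "filtered"))

# Semi-join on playerID, cache the joined result, then keep two columns
expr = (left.join(right, ["playerID"], how="semi")
            .cache()
            .select(["yearID", "stint"]))

# Run the deferred expression
result = expr.execute()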
```

xorq provides a CLI that enables you to build serialized artifacts from expressions, making your pipelines reproducible and deployable:

```shell
# Build an expression from a Python script
xorq build your_pipeline.py -e "expr" --target-dir builds
```
This will create a build artifact directory named by its expression hash:
```
builds
└── fce90c2d4bb8
   ├── abe2c934f4fe.sql
   ├── cec2eb9706bc.sql
   ├── deferred_reads.yaml
   ├── expr.yaml
   ├── metadata.json
   ├── profiles.yaml
   └── sql.yaml
```

The CLI converts Ibis expressions into serialized artifacts that capture the complete execution graph, ensuring consistent results across environments.
More info can be found in the tutorial [Building with xorq](https://docs.xorq.dev/tutorials/build).
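
To check what a build produced, the artifact directory can be walked with the standard library alone. The sketch below is a minimal illustration, not part of xorq: `list_build_artifacts` is a hypothetical helper, and it assumes only the layout shown above (one subdirectory per expression hash under the target directory). It does not parse the YAML, SQL, or JSON files, whose schemas are not described here.

```python
from pathlib import Path


def list_build_artifacts(target_dir):
    """Map each build subdirectory name to the sorted file names inside it.

    Hypothetical helper: it only assumes one subdirectory per expression
    hash under `target_dir`, as in the tree shown above.
    """
    root = Path(target_dir)
    if not root.is_dir():
        # No builds yet (or a wrong path): report nothing instead of failing.
        return {}
    return {
        build.name: sorted(p.name for p in build.iterdir() if p.is_file())
        for build in sorted(root.iterdir())
        if build.is_dir()
    }


if __name__ == "__main__":
    # "builds" matches the --target-dir value used in the command above.
    for build_hash, files in list_build_artifacts("builds").items():
        print(build_hash)
        for name in files:
            print(f"  {name}")
```

Comparing the returned directory names across machines or commits is one simple way to see whether an expression changed, under the assumption that the directory name is the expression hash.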

For more examples of how to use xorq, check the
[examples](https://github.com/letsql/xorq/tree/main/examples) directory. Note
that to run some of the scripts there, you need to install the library with
the `examples` extra:

```shell
pip install 'xorq[examples]'
```
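
Before running an example script, it can help to confirm that the optional dependencies are importable. The following is a minimal sketch using only the standard library. `missing_modules` and `EXAMPLE_MODULES` are hypothetical names, and the list covers only a subset of the packages in the `examples` extra, using the import names those packages are assumed to expose (for instance `sklearn` for scikit-learn and `psycopg2` for psycopg2-binary).

```python
from importlib.util import find_spec

# Assumed import names for some packages pulled in by the `examples` extra.
EXAMPLE_MODULES = [
    "duckdb",
    "fsspec",
    "openai",
    "psycopg2",
    "sklearn",
    "tenacity",
    "xgboost",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXAMPLE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install 'xorq[examples]'")
    else:
        print("All checked example dependencies are importable.")
```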

## Contributing

Contributions are welcome and highly appreciated. To get started, check out the [contributing guidelines](https://github.com/letsql/xorq/blob/main/CONTRIBUTING.md).

## Acknowledgements

This project relies heavily on [Ibis](https://github.com/ibis-project/ibis) and [DataFusion](https://github.com/apache/datafusion).

## License

This repository is licensed under the [Apache License](https://github.com/letsql/xorq/blob/main/LICENSE).

