Metadata-Version: 2.1
Name: pystarburst
Version: 0.9.0
Summary: PyStarburst DataFrame API allows you to query and transform data in Starburst products in a data pipeline without having to download the data locally.
Home-page: https://starburst.io
License: Apache-2.0
Author: Starburst Data
Author-email: info@starburstdata.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: pandas
Requires-Dist: pandas (>=2.2,<3.0) ; extra == "pandas"
Requires-Dist: pydantic (>=2.7.4,<3.0.0)
Requires-Dist: python-dateutil (>=2.8.2,<3.0.0)
Requires-Dist: trino (>=0.329.0,<0.330.0)
Requires-Dist: urllib3 (>=2.2.0,<3.0.0)
Requires-Dist: zstandard (>=0.22.0,<0.23.0)
Project-URL: Repository, https://github.com/starburstdata/pystarburst-examples
Description-Content-Type: text/markdown

# PyStarburst DataFrame API

The PyStarburst DataFrame API lets you query and transform data in Starburst products from within a data pipeline, without downloading the data locally.

## Documentation

See the PyStarburst API [documentation](https://pystarburst.eng.starburstdata.net/) and the examples [repository](https://github.com/starburstdata/pystarburst-examples).

## Getting started

Install `pystarburst` with pip:

```bash
pip install pystarburst
```

### Connect to a Starburst server

The connection parameters are the same as those accepted by the Trino Python Client.

```python
from pystarburst import Session

connection_parameters = {
    "host": "localhost",
    "port": 8080,
    "user": "admin",
    "catalog": "tpch",
    "schema": "tiny"
}

session = Session.builder.configs(connection_parameters).create()
```
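
In a pipeline you may prefer not to hard-code these values. The following is a minimal sketch that builds the same dictionary from environment variables. The `STARBURST_*` variable names and both helper functions are hypothetical, not part of PyStarburst; only the dictionary keys and the `Session.builder` call come from the example above.

```python
import os


def connection_parameters_from_env(env=None):
    """Build a connection dict from environment variables.

    The STARBURST_* names are hypothetical; the returned keys are the
    ones used in the connection example above, with the same defaults.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("STARBURST_HOST", "localhost"),
        # Environment values are strings, so convert the port to an int.
        "port": int(env.get("STARBURST_PORT", "8080")),
        "user": env.get("STARBURST_USER", "admin"),
        "catalog": env.get("STARBURST_CATALOG", "tpch"),
        "schema": env.get("STARBURST_SCHEMA", "tiny"),
    }


def create_session(env=None):
    """Create a session from the environment (hypothetical helper)."""
    # Imported lazily so the helper above can be used and tested
    # without pystarburst installed.
    from pystarburst import Session

    return Session.builder.configs(connection_parameters_from_env(env)).create()
```

Calling `create_session()` with no arguments reads from `os.environ`; passing a plain dictionary instead makes the parameter-building step easy to check without a running server.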

### Using SQL

```python
from pystarburst import Session

session = Session.builder.configs({ ... }).create()

session.sql("SELECT 1 as a").show()
```

### Querying a table

```python
from pystarburst import Session

session = Session.builder.configs({ ... }).create()

df = session.table("nation")
print(df.schema)
df.show()
```

### Filtering a data frame

```python
from pystarburst import Session

session = Session.builder.configs({ ... }).create()

df = session.table("nation")
df.filter(df.col("regionkey") == 0).show()
```

### Joining data frames

```python
from pystarburst import Session

session = Session.builder.configs({ ... }).create()

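# Join nation to region on their shared regionkey column
# (assumes the tpch.tiny schema, which contains both tables)
nation = session.table("nation")
region = session.table("region")
nation.join(region, nation.col("regionkey") == region.col("regionkey")).show()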
```

### Aggregation

```python
from pystarburst import Session
from pystarburst.functions import col

session = Session.builder.configs({ ... }).create()
df = session.table("nation")
df.agg((col("regionkey"), "max"), (col("regionkey"), "avg")).show()
```

