Metadata-Version: 2.4
Name: dremioframe
Version: 0.6.0
Summary: A dataframe-like library for Dremio Cloud & Dremio Software
Author-email: Alex Merced <alexmerced@alexmerced.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
License-File: LICENSE
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: polars>=0.20.0
Requires-Dist: matplotlib
Requires-Dist: typer>=0.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: datafusion>=33.0.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: pyiceberg>=0.5.0
Requires-Dist: celery ; extra == "celery"
Requires-Dist: redis ; extra == "celery"
Requires-Dist: rich ; extra == "cli"
Requires-Dist: prompt_toolkit ; extra == "cli"
Requires-Dist: pytest>=7.0.0 ; extra == "dev"
Requires-Dist: pytest-asyncio ; extra == "dev"
Requires-Dist: requests-mock ; extra == "dev"
Requires-Dist: pyyaml ; extra == "dq"
Requires-Dist: kaleido ; extra == "image-export"
Requires-Dist: mysql-connector-python ; extra == "mysql"
Requires-Dist: psycopg2-binary ; extra == "postgres"
Requires-Dist: boto3 ; extra == "s3"
Requires-Dist: apscheduler ; extra == "scheduler"
Project-URL: Homepage, https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe
Project-URL: Issues, https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe/issues
Provides-Extra: celery
Provides-Extra: cli
Provides-Extra: dev
Provides-Extra: dq
Provides-Extra: image-export
Provides-Extra: mysql
Provides-Extra: postgres
Provides-Extra: s3
Provides-Extra: scheduler

# DremioFrame (currently in alpha)

DremioFrame is a Python library that provides a dataframe-style builder interface for Dremio Cloud and Dremio Software. It lets you browse the catalog, query data, perform CRUD operations, and administer Dremio resources through a familiar API.

## Documentation

- [Architecture](architecture.md)
- [Connection Guide](docs/connection.md)
- [Administration](docs/admin.md)
- [Catalog & Admin](docs/catalog.md)
- [Testing Guide](docs/testing.md)
- [Dataframe Builder](docs/builder.md)
- [Aggregation](docs/aggregation.md)
- [Sorting & Distinct](docs/sorting.md)
- [Joins](docs/joins.md)
- [Iceberg Features](docs/iceberg.md)
- [Advanced Features](docs/advanced.md)
- [Charting](docs/charting.md)
- [Data Export](docs/export.md)
- [API Ingestion](docs/ingestion.md)
- [Ingestion Patterns](docs/ingestion_patterns.md)
- [Working with Files](docs/files.md)
- [SQL Functions](docs/functions.md)
    - [Aggregate](docs/functions/aggregate.md)
    - [Math](docs/functions/math.md)
    - [String](docs/functions/string.md)
    - [Date](docs/functions/date.md)
    - [Window](docs/functions/window.md)
    - [Conditional](docs/functions/conditional.md)
    - [AI](docs/functions/ai.md)
    - [Complex Types](docs/functions/complex.md)
- [Local Caching](docs/caching.md)
- [Interactive Plotting](docs/plotting.md)
- [Raw SQL Querying](docs/querying.md)
- [Source Management](docs/admin.md#source-management)
- [Query Profiling](docs/profiling.md)
- [Iceberg Client](docs/iceberg.md)
- [UDF Manager](docs/udf.md)
- [CLI Tool](docs/cli.md)
- [Async Client](docs/async_client.md)
- [Orchestration](docs/orchestration.md)
    - [Orchestration Best Practices](docs/orchestration_best_practices.md)
    - [Orchestration Backend](docs/orchestration_backend.md)
    - [Distributed Execution](docs/orchestration_distributed.md)
    - [Orchestration Scheduling](docs/orchestration_scheduling.md)
    - [Dremio Job Integration](docs/orchestration_dremio_jobs.md)
    - [Iceberg Maintenance](docs/orchestration_iceberg.md)
    - [Data Quality Task](docs/orchestration_dq_task.md)
    - [Reflection Management](docs/orchestration_reflections.md)
    - [Web UI](docs/orchestration_ui.md)
    - [General Tasks](docs/orchestration_tasks.md)
    - [CLI](docs/orchestration_cli.md)
    - [Deployment](docs/orchestration_deployment.md)
    - [Extensions (dbt, Sensors)](docs/orchestration_extensions.md)
- [Data Quality Framework](docs/data_quality.md)
- [Pydantic Integration](docs/pydantic_integration.md)
- [SCD2 Guide](docs/scd2_guide.md)

## Installation

```bash
pip install dremioframe
```

To install with optional dependencies (e.g., for static image export):
```bash
pip install "dremioframe[image-export]"
```
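
To see which optional extras are already usable in the current environment, a small check along the following lines may help. It is only a sketch: the extra names come from the package metadata, but the importable module name assumed for each dependency (for example `yaml` for `pyyaml`, `psycopg2` for `psycopg2-binary`) and the helper names `module_available` and `installed_extras` are illustrative assumptions, not part of DremioFrame. The `dev` extra is left out on purpose.

```python
import importlib.util

# Extras listed in the package metadata, mapped to the top-level module
# each dependency is assumed to expose (module names are assumptions).
EXTRA_MODULES = {
    "celery": ["celery", "redis"],
    "cli": ["rich", "prompt_toolkit"],
    "dq": ["yaml"],
    "image-export": ["kaleido"],
    "mysql": ["mysql"],
    "postgres": ["psycopg2"],
    "s3": ["boto3"],
    "scheduler": ["apscheduler"],
}


def module_available(name: str) -> bool:
    """Return True if `name` can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def installed_extras() -> dict:
    """Map each extra to True when all of its assumed modules are present."""
    return {
        extra: all(module_available(mod) for mod in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

Any extra reported as missing can be added with `pip install "dremioframe[<extra>]"`.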

## Quick Start

### Dremio Cloud

```python
from dremioframe.client import DremioClient

# Assumes DREMIO_PAT and DREMIO_PROJECT_ID are set in env
client = DremioClient()

# Query a table
df = client.table("Samples.samples.dremio.com.zips.json").select("city", "state").limit(5).collect()
print(df)
```
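
Because the Dremio Cloud example relies on `DREMIO_PAT` and `DREMIO_PROJECT_ID` being present in the environment, it can be useful to verify them before constructing the client. The following is a minimal sketch using only the standard library; the helper name `missing_dremio_env` is hypothetical and not part of DremioFrame.

```python
import os

# Environment variables the Dremio Cloud quick start assumes are set.
REQUIRED_VARS = ("DREMIO_PAT", "DREMIO_PROJECT_ID")


def missing_dremio_env(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise in tests.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_dremio_env({"DREMIO_PAT": "example-token"}))
```

In a script, calling `missing_dremio_env()` with no arguments checks the real environment; a non-empty result means `DremioClient()` would have no credentials to pick up.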

### Dremio Software

```python
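from dremioframe.client import DremioClient

# Connect to a self-hosted Dremio instance with username and password
client = DremioClient(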
    hostname="localhost",
    port=32010,
    username="admin",
    password="password123",
    tls=False
)
```

## Features

```python
from dremioframe.client import DremioClient

client = DremioClient(pat="YOUR_PAT", project_id="YOUR_PROJECT_ID")

# List catalog
print(client.catalog.list_catalog())

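# Query data (keep `df` as a builder so the chained calls below work;
# collect() executes the query and returns the results)
df = client.table("Samples.samples.dremio.com.zips.json").select("city", "state", "pop").filter("state = 'MA'")
print(df.collect())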

# Calculated Columns
df.mutate(total_pop="pop * 2").show()

# Aggregation
df.group_by("state").agg(avg_pop="AVG(pop)").show()

# Joins
df.join("other_table", on="left_tbl.id = right_tbl.id").show()

# Iceberg Time Travel
df.at_snapshot("123456789").show()

# API Ingestion
client.ingest_api(
    url="https://api.example.com/users",
    table_name="users",
    mode="merge",
    pk="id"
)

# Charting
df.chart(kind="bar", x="category", y="sales", save_to="sales.png")

# Export
df.to_csv("data.csv")
df.to_parquet("data.parquet")

# Insert Data (Batched)
import pandas as pd
data = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
client.table("my_table").insert("my_table", data=data, batch_size=1000)

# SQL Functions
from dremioframe import F

client.table("sales") \
    .select(
        F.col("dept"),
        F.sum("amount").alias("total_sales"),
        F.rank().over(F.Window.order_by("amount")).alias("rank")
    ) \
    .show()

# Merge (Upsert)
client.table("target").merge(
    target_table="target",
    on="id",
    matched_update={"name": "source.name"},
    not_matched_insert={"id": "source.id", "val": "source.val"},
    data=data
)

# Data Quality
df.quality.expect_not_null("city")
df.quality.expect_row_count("pop > 1000000", 5, "ge") # Expect at least 5 cities with pop > 1M

# Query Explanation
print(df.explain())

# Reflection Management
client.admin.create_reflection(dataset_id="...", name="my_ref", type="RAW", display_fields=["col1"])

# Async Client
# async with AsyncDremioClient(pat="...") as client: ...

# CLI
# dremio-cli query "SELECT 1"

# Local Caching
# client.table("source").cache("my_cache", ttl_seconds=300).sql("SELECT * FROM my_cache").show()

# Interactive Plotting
# df.chart(kind="scatter", backend="plotly").show()

# UDF Manager
# client.udf.create("add_one", {"x": "INT"}, "INT", "x + 1")

# Raw SQL
# df = client.query("SELECT * FROM my_table")

# Source Management
# client.admin.create_source_s3("my_datalake", "bucket")

# Query Profiling
# client.admin.get_job_profile("job_123").visualize().show()

# Iceberg Client
# client.iceberg.list_tables("my_namespace")

# Orchestration CLI
# dremio-cli pipeline list
# dremio-cli pipeline ui --port 8080

# Data Quality Framework
# dremio-cli dq run tests/dq
```
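
The batched insert in the Features block takes a `batch_size` argument. How DremioFrame splits data internally is not documented here, so the following is only a conceptual sketch of batching in general: a hypothetical helper, `iter_batches`, that cuts a list of rows into chunks of at most `batch_size` rows.

```python
from typing import Iterator, List, Sequence


def iter_batches(rows: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 2,500 rows with batch_size=1000 produce two full batches and one partial batch.
rows = [{"id": i, "name": f"row-{i}"} for i in range(2500)]
sizes = [len(batch) for batch in iter_batches(rows, batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Smaller batches keep each statement short at the cost of more round trips, so the right `batch_size` depends on row width and on the limits of the target environment.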

