Metadata-Version: 2.4
Name: dremioframe
Version: 0.10.0
Summary: A dataframe-like library for Dremio Cloud & Dremio Software
Author-email: Alex Merced <alexmerced@alexmerced.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
License-File: LICENSE
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: polars>=0.20.0
Requires-Dist: matplotlib
Requires-Dist: typer>=0.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: datafusion>=33.0.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: pyiceberg>=0.5.0
Requires-Dist: langchain ; extra == "ai"
Requires-Dist: langchain-openai ; extra == "ai"
Requires-Dist: langchain-anthropic ; extra == "ai"
Requires-Dist: langchain-google-genai ; extra == "ai"
Requires-Dist: fastavro ; extra == "avro"
Requires-Dist: celery ; extra == "celery"
Requires-Dist: redis ; extra == "celery"
Requires-Dist: rich ; extra == "cli"
Requires-Dist: prompt_toolkit ; extra == "cli"
Requires-Dist: pytest>=7.0.0 ; extra == "dev"
Requires-Dist: pytest-asyncio ; extra == "dev"
Requires-Dist: requests-mock ; extra == "dev"
Requires-Dist: mkdocs ; extra == "docs"
Requires-Dist: mkdocs-material ; extra == "docs"
Requires-Dist: mkdocstrings[python] ; extra == "docs"
Requires-Dist: pyyaml ; extra == "dq"
Requires-Dist: openpyxl ; extra == "excel"
Requires-Dist: lxml ; extra == "html"
Requires-Dist: html5lib ; extra == "html"
Requires-Dist: kaleido ; extra == "image-export"
Requires-Dist: pylance ; extra == "lance"
Requires-Dist: mysql-connector-python ; extra == "mysql"
Requires-Dist: psycopg2-binary ; extra == "postgres"
Requires-Dist: boto3 ; extra == "s3"
Requires-Dist: apscheduler ; extra == "scheduler"
Project-URL: Homepage, https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe
Project-URL: Issues, https://github.com/developer-advocacy-dremio/dremio-cloud-dremioframe/issues
Provides-Extra: ai
Provides-Extra: avro
Provides-Extra: celery
Provides-Extra: cli
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: dq
Provides-Extra: excel
Provides-Extra: html
Provides-Extra: image-export
Provides-Extra: lance
Provides-Extra: mysql
Provides-Extra: postgres
Provides-Extra: s3
Provides-Extra: scheduler

# DremioFrame (currently in alpha)

DremioFrame is a Python library that provides a dataframe builder interface for Dremio Cloud & Dremio Software. It lets you browse the catalog, query data, perform CRUD operations, and administer Dremio resources through a familiar, chainable API.

## Documentation

### 🚀 Getting Started
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Configuration](docs/getting_started/configuration.md)
- [Connecting to Dremio](docs/getting_started/connection.md)
- [Tutorial: ETL Pipeline](docs/getting_started/tutorial_etl.md)
- [Cookbook / Recipes](docs/getting_started/cookbook.md)
- [Troubleshooting](docs/getting_started/troubleshooting.md)

### 🛠️ Data Engineering
- [Dataframe Builder API](docs/data_engineering/builder.md)
- [Querying Data](docs/data_engineering/querying.md)
- [Joins & Transformations](docs/data_engineering/joins.md)
- [Aggregation](docs/data_engineering/aggregation.md)
- [Sorting & Filtering](docs/data_engineering/sorting.md)
- [Ingestion API](docs/data_engineering/ingestion.md)
- [Ingestion Patterns](docs/data_engineering/ingestion_patterns.md)
- [File Upload](docs/data_engineering/file_upload.md)
- [Exporting Data](docs/data_engineering/export.md)
- [Working with Files](docs/data_engineering/files.md)
- [Caching](docs/data_engineering/caching.md)
- [Pydantic Integration](docs/data_engineering/pydantic_integration.md)
- [Iceberg Tables](docs/data_engineering/iceberg.md)
- [Iceberg Lakehouse Management](docs/data_engineering/guide_iceberg_management.md)

### 📊 Analysis & Visualization
- [Charting & Plotting](docs/analysis/charting.md)
- [Interactive Plotting](docs/analysis/plotting.md)
- [Query Profiling](docs/analysis/profiling.md)

### 🧠 AI Capabilities
- [Script Generation](docs/ai/generation.md)
- [SQL Generation](docs/ai/sql.md)
- [API Call Generation](docs/ai/api.md)
- [DremioAgent Class](docs/ai/agent.md)

### 📐 Data Modeling
- [Medallion Architecture](docs/modeling/medallion.md)
- [Dimensional Modeling](docs/modeling/dimensional.md)
- [Slowly Changing Dimensions](docs/modeling/scd.md)
- [Semantic Views](docs/modeling/views.md)
- [Documenting Datasets](docs/modeling/documentation.md)

### ⚙️ Orchestration
- [Overview](docs/orchestration/overview.md)
- [Tasks & Sensors](docs/orchestration/tasks.md)
- [Extensions](docs/orchestration/extensions.md)
- [Scheduling](docs/orchestration/scheduling.md)
- [Dremio Jobs](docs/orchestration/dremio_jobs.md)
- [Iceberg Tasks](docs/orchestration/iceberg.md)
- [Reflection Tasks](docs/orchestration/reflections.md)
- [Data Quality Task](docs/orchestration/dq_task.md)
- [Distributed Execution](docs/orchestration/distributed.md)
- [Deployment](docs/orchestration/deployment.md)
- [CLI & UI](docs/orchestration/cli.md)
- [Web UI](docs/orchestration/ui.md)
- [Backends](docs/orchestration/backend.md)
- [Best Practices](docs/orchestration/best_practices.md)

### ✅ Data Quality
- [DQ Framework](docs/data_quality/framework.md)

### 🔧 Administration & Governance
- [Administration](docs/admin_governance/admin.md)
- [Catalog Management](docs/admin_governance/catalog.md)
- [Reflections Management](docs/admin_governance/reflections.md)
- [User Defined Functions (UDFs)](docs/admin_governance/udf.md)
- [Security Best Practices](docs/admin_governance/security.md)
- [Security Patterns](docs/admin_governance/security_patterns.md)
- [Governance: Masking & Row Access](docs/admin_governance/masking_and_row_access.md)
- [Governance: Tags](docs/admin_governance/tags.md)
- [Governance: Lineage](docs/admin_governance/lineage.md)
- [Governance: Privileges](docs/admin_governance/privileges.md)
- [Space & Folder Management](docs/admin_governance/spaces_folders.md)

### 🚀 Performance & Deployment
- [Performance Tuning](docs/performance/tuning.md)
- [CI/CD & Deployment](docs/deployment/cicd.md)

### 📚 Reference
- [Function Reference](docs/reference/function_reference.md)
- [SQL Functions Guide](docs/reference/functions_guide.md)
- [CLI Reference](docs/reference/cli.md)
- [API Reference](docs/reference/client.md)
- [Async Client](docs/reference/async_client.md)
- [Advanced Usage](docs/reference/advanced.md)
- [Architecture](architecture.md)
- [Testing Guide](docs/reference/testing.md)
- [Contributing](CONTRIBUTING.md)

## Installation

```bash
pip install dremioframe
```

To install with optional dependencies (e.g., for static image export):
```bash
pip install "dremioframe[image-export]"
```
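
Before relying on an optional feature, it can help to check whether the packages behind its extra are importable. The following is a minimal sketch using only the standard library. The `EXTRA_MODULES` mapping and both helper names are assumptions made for this example (based on the packages each extra is declared to pull in); they are not part of DremioFrame.

```python
from importlib.util import find_spec

# Hypothetical mapping for illustration: extra name -> top-level modules
# that the extra is expected to make importable.
EXTRA_MODULES = {
    "image-export": ["kaleido"],
    "excel": ["openpyxl"],
    "avro": ["fastavro"],
    "s3": ["boto3"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the modules for `extra` that cannot be found in this environment."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


def extra_is_installed(extra: str) -> bool:
    """True when every module associated with `extra` can be found."""
    return not missing_modules(extra)


# Report the status of each known extra and suggest the install command if needed.
for extra in EXTRA_MODULES:
    if extra_is_installed(extra):
        print(f"{extra}: available")
    else:
        print(f'{extra}: missing, try: pip install "dremioframe[{extra}]"')
```

Several extras can be requested at once by separating them with commas, for example `pip install "dremioframe[excel,s3]"`.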

## Quick Start

### Dremio Cloud

```python
from dremioframe.client import DremioClient

# Assumes DREMIO_PAT and DREMIO_PROJECT_ID are set in env
client = DremioClient()

# Query a table
df = client.table("Samples.samples.dremio.com.zips.json").select("city", "state").limit(5).collect()
print(df)
```
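
Because the no-argument constructor relies on environment variables, a quick pre-flight check can produce a clearer error than a failed connection. This is a sketch under the assumption that `DREMIO_PAT` and `DREMIO_PROJECT_ID` (the names mentioned in the comment above) are the only variables needed; `missing_env_vars` is a hypothetical helper, not a DremioFrame function.

```python
import os

# Variable names taken from the Quick Start comment; adjust if your setup differs.
REQUIRED_VARS = ("DREMIO_PAT", "DREMIO_PROJECT_ID")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before calling DremioClient():", ", ".join(missing))
else:
    print("Dremio Cloud credentials found in the environment.")
```

If the values live in a `.env` file, calling `load_dotenv()` from `python-dotenv` before the check loads them into the process environment.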

### Dremio Software

```python
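from dremioframe.client import DremioClient

# Replace these placeholder values with your own host and credentials
client = DremioClient(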
    hostname="localhost",
    port=32010,
    username="admin",
    password="password123",
    tls=False
)
```
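
Hard-coding a password is fine for a local test instance, but for anything shared it is usually safer to read connection settings from the environment. The sketch below builds the same keyword arguments shown above (`hostname`, `port`, `username`, `password`, `tls`). The `DREMIO_SOFTWARE_*` variable names and the `software_connection_kwargs` helper are hypothetical, chosen only for this example.

```python
import os


def software_connection_kwargs(env=None):
    """Build DremioClient keyword arguments from environment variables.

    The DREMIO_SOFTWARE_* names are assumptions made for this example.
    A missing user or password raises KeyError rather than connecting
    with a silent default.
    """
    env = os.environ if env is None else env
    return {
        "hostname": env.get("DREMIO_SOFTWARE_HOST", "localhost"),
        "port": int(env.get("DREMIO_SOFTWARE_PORT", "32010")),
        "username": env["DREMIO_SOFTWARE_USER"],
        "password": env["DREMIO_SOFTWARE_PASSWORD"],
        "tls": env.get("DREMIO_SOFTWARE_TLS", "false").lower() == "true",
    }


# Usage (requires dremioframe and a reachable Dremio instance):
# from dremioframe.client import DremioClient
# client = DremioClient(**software_connection_kwargs())
```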

## Features

```python
from dremioframe.client import DremioClient

client = DremioClient(pat="YOUR_PAT", project_id="YOUR_PROJECT_ID")

# List catalog
print(client.catalog.list_catalog())

# Query data
df = client.table("Samples.samples.dremio.com.zips.json").select("city", "state").filter("state = 'MA'").collect()
print(df)

# Calculated Columns
df.mutate(total_pop="pop * 2").show()

# Aggregation
df.group_by("state").agg(avg_pop="AVG(pop)").show()

# Joins
df.join("other_table", on="left_tbl.id = right_tbl.id").show()

# Iceberg Time Travel
df.at_snapshot("123456789").show()

# API Ingestion
client.ingest_api(
    url="https://api.example.com/users",
    table_name="users",
    mode="merge",
    pk="id"
)

# Charting
df.chart(kind="bar", x="category", y="sales", save_to="sales.png")

# Export
df.to_csv("data.csv")
df.to_parquet("data.parquet")

# Insert Data (Batched)
import pandas as pd
data = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
client.table("my_table").insert("my_table", data=data, batch_size=1000)

# SQL Functions
from dremioframe import F

client.table("sales") \
    .select(
        F.col("dept"),
        F.sum("amount").alias("total_sales"),
        F.rank().over(F.Window.order_by("amount")).alias("rank")
    ) \
    .show()

# Merge (Upsert)
client.table("target").merge(
    target_table="target",
    on="id",
    matched_update={"name": "source.name"},
    not_matched_insert={"id": "source.id", "val": "source.val"},
    data=data
)

# Data Quality
df.quality.expect_not_null("city")
df.quality.expect_row_count("pop > 1000000", 5, "ge")  # Expect at least 5 cities with pop > 1M

# Query Explanation
print(df.explain())

# Reflection Management
client.admin.create_reflection(dataset_id="...", name="my_ref", type="RAW", display_fields=["col1"])

# Async Client
# async with AsyncDremioClient(pat="...") as client: ...

# CLI
# dremio-cli query "SELECT 1"

# Local Caching
# client.table("source").cache("my_cache", ttl_seconds=300).sql("SELECT * FROM my_cache").show()

# Interactive Plotting
# df.chart(kind="scatter", backend="plotly").show()

# UDF Manager
# client.udf.create("add_one", {"x": "INT"}, "INT", "x + 1")

# Raw SQL
# df = client.query("SELECT * FROM my_table")

# Source Management
# client.admin.create_source_s3("my_datalake", "bucket")

# Query Profiling
# client.admin.get_job_profile("job_123").visualize().show()

# Iceberg Client
# client.iceberg.list_tables("my_namespace")

# Orchestration CLI
# dremio-cli pipeline list
# dremio-cli pipeline ui --port 8080

# Data Quality Framework
# dremio-cli dq run tests/dq
```
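
For readers new to the "Merge (Upsert)" example, the sketch below illustrates the general idea in plain Python: rows whose key already exists in the target are updated, and rows with a new key are inserted. It is a simplified, in-memory illustration and not how DremioFrame executes a merge. In particular, it overwrites every column found in the source row, whereas the `matched_update` and `not_matched_insert` arguments above choose columns explicitly. The `upsert` function is hypothetical.

```python
def upsert(target, source, key):
    """Return new rows: source rows update matching target rows, others are appended.

    `target` and `source` are lists of dicts; `key` is the column used to match rows.
    """
    # Index target rows by key, copying each row so the inputs are not mutated.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched: update the existing row. Not matched: start from an empty row.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


target_rows = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
source_rows = [{"id": 2, "name": "B2"}, {"id": 3, "name": "C"}]

for row in upsert(target_rows, source_rows, key="id"):
    print(row)
```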

