Metadata-Version: 2.4
Name: soigia
Version: 0.1.9
Summary: Python SDK for Soi Gia
Author-email: Soi Gia <sojgja@gmail.com>
License-Expression: BSD-3-Clause
Keywords: soigia,sdk,datatable,pipeline,streamlit
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: jinja2
Requires-Dist: loguru
Requires-Dist: pyarrow
Requires-Dist: pyyaml
Requires-Dist: tabulate
Requires-Dist: rich
Requires-Dist: sqlalchemy
Dynamic: license-file

# soigia

`soigia` is a Python SDK for dataframe workflows, internal UI screens, shared YAML configuration, and business pipelines.

## Getting Started

```bash
pip install soigia
```

```python
import soigia

df = soigia.DataTable({"name": ["alice"], "age": [19]})
print(df)
```

For local development:

```bash
pip install -e .
```

## What It Includes

- `soigia.datatable.DataTable` for pandas-like data work with extra helpers.
- `soigia.ui` for Streamlit-based internal screens.
- `soigia.pipeline.BasePipeline` for business workflows.
- `soigia.config` for shared YAML configuration and multi-account storage.

## Quick Examples

### DataTable

```python
from soigia.datatable import DataTable

df = DataTable({"name": ["alice", "bob"], "age": [19, 24]})
print(df.objects.filter(age__gte=20).values("name", "age"))
```
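For readers coming from plain pandas, the query above corresponds roughly to a boolean mask followed by a column projection. This sketch uses only pandas and assumes `values()` returns one dict per row (the "record-style output" listed under Core Ideas). It approximates the result; it is not soigia's implementation.

```python
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "age": [19, 24]})

# filter(age__gte=20): keep rows where age >= 20
adults = df[df["age"] >= 20]

# values("name", "age"): keep only those columns, one dict per row
records = adults[["name", "age"]].to_dict(orient="records")
print(records)
```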

### Reconciliation

```python
from soigia.datatable import DataTable

left = DataTable(
    {
        "id": [1, 2],
        "name": ["alice", "bob"],
        "amount": [10, 20],
    }
)
right = DataTable(
    {
        "id": [1, 2],
        "name": ["alice", "bobby"],
        "amount": [10, 25],
    }
)

bundle = left.reconcile_bundle(
    right,
    key_columns=["id"],
    title="Oracle vs Kudu Reconciliation",
    html_path="tests/reconcile-report.html",
    json_path="tests/reconcile-report.json",
    open_report_in_browser=True,
)

print(bundle["html_path"])
print(bundle["json_path"])
print(bundle["payload"]["summary"])
```

If you only want the HTML string, use `reconcile_html_report()`. If you want the report opened immediately, use `reconcile_open()`.
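To see what a key-based reconciliation compares, here is a plain-pandas sketch that outer-joins two tables on the key columns, counts unmatched keys on each side, and lists the keys whose shared columns differ. The function name `reconcile_summary` and the shape of its result are hypothetical; they do not describe the structure of soigia's `payload["summary"]`.

```python
import pandas as pd


def reconcile_summary(left, right, key_columns):
    """Hypothetical helper: compare two frames row by row on key_columns."""
    merged = left.merge(
        right,
        on=key_columns,
        how="outer",
        suffixes=("_left", "_right"),
        indicator=True,  # adds a "_merge" column: left_only / right_only / both
    )
    both = merged[merged["_merge"] == "both"]
    shared = [c for c in left.columns if c in right.columns and c not in key_columns]

    mismatches = {}
    for col in shared:
        # Note: NaN != NaN, so a value missing on both sides counts as a mismatch here.
        differs = both[f"{col}_left"] != both[f"{col}_right"]
        mismatches[col] = both.loc[differs, key_columns].to_dict(orient="records")

    return {
        "matched": len(both),
        "left_only": int((merged["_merge"] == "left_only").sum()),
        "right_only": int((merged["_merge"] == "right_only").sum()),
        "mismatches": mismatches,
    }


left = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"], "amount": [10, 20]})
right = pd.DataFrame({"id": [1, 2], "name": ["alice", "bobby"], "amount": [10, 25]})
summary = reconcile_summary(left, right, key_columns=["id"])
print(summary)
```

The `indicator=True` column is what separates "missing on one side" from "present on both sides but different", which are usually reported separately in a reconciliation.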

### UI

```python
from soigia.ui import ui

orders = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 88.5, "status": "pending"},
]


@ui.page("Orders")
def orders_page(ctx):
    ui.text("Order Dashboard")
    keyword = ui.text_input("Search")
    ui.metric("Total orders", len(orders))
    ui.table(lambda: [row for row in orders if keyword.value.lower() in row["customer"].lower()])


ui.run()
```
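The filtering logic inside the `ui.table` lambda is ordinary Python, so it can be moved into a named function and tested without starting the UI. The helper name `filter_orders` is hypothetical, and the sketch assumes `keyword.value` may be `None` or empty before the user types, treating that as "show every row".

```python
def filter_orders(orders, keyword):
    """Return rows whose customer name contains keyword, ignoring case.

    An empty, whitespace-only, or None keyword matches every row.
    """
    needle = (keyword or "").strip().lower()
    return [row for row in orders if needle in row["customer"].lower()]


orders = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 88.5, "status": "pending"},
]
print(filter_orders(orders, "ali"))
```

Inside the page, the table call would then read `ui.table(lambda: filter_orders(orders, keyword.value))`.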

### Pipeline

```python
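import pandas as pd

from soigia.pipeline import BasePipeline


def build_orders():
    # Placeholder data source for this example; replace with your real loader.
    return pd.DataFrame({"id": [1, 2], "amount": ["120.0", "88.5"]})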


class SalesPipeline(BasePipeline):
    stages = ["normalize", "summarize"]

    def load_data(self):
        return build_orders()

    def normalize(self):
        self.data_df = self.data_df.copy()
        self.data_df["amount"] = self.data_df["amount"].astype(float)

    def summarize(self):
        self.model.total_rows = len(self.data_df)


pipeline = SalesPipeline(name="sales")
result = pipeline.run()
print(result.summary_path)
```
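The `stages` list names methods on the class. A simplified mental model, assuming stages run in list order after `load_data()`, is a loop that resolves each name with `getattr`. `ToyPipeline` is a hypothetical stand-in written for illustration; it does not reproduce `BasePipeline`'s file outputs, logging, `self.model`, or result object.

```python
import pandas as pd


class ToyPipeline:
    """Hypothetical stand-in for BasePipeline: runs named stages in order."""

    stages = ()

    def load_data(self):
        raise NotImplementedError

    def run(self):
        self.data_df = self.load_data()
        self.completed = []
        for name in self.stages:
            stage = getattr(self, name)  # resolve the stage name to a bound method
            stage()
            self.completed.append(name)
        return self


class SalesToy(ToyPipeline):
    stages = ["normalize", "summarize"]

    def load_data(self):
        return pd.DataFrame({"id": [1, 2], "amount": ["120.0", "88.5"]})

    def normalize(self):
        self.data_df = self.data_df.copy()
        self.data_df["amount"] = self.data_df["amount"].astype(float)

    def summarize(self):
        self.total_rows = len(self.data_df)


toy = SalesToy().run()
print(toy.completed, toy.total_rows)
```

Keeping stage order in a plain list makes it declarative; in this toy model a typo in `stages` surfaces as an `AttributeError` when `run()` reaches it.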

## Documentation

- [Module Reference](docs/module-reference.md)
- [Pipeline End-to-End Guide](docs/pipeline-end-to-end.md)
- [UI End-to-End Guide](docs/ui-end-to-end.md)

## Configuration

Use `soigia.config` to load and save a shared YAML file for the whole project.

Recommended workflow:

- keep `config.example.yaml` in git
- copy it to `config.yaml` on your machine
- keep the real `config.yaml` out of version control
- keep secrets such as tokens, keys, and chat IDs only in the real `config.yaml`, and never commit them

Pipeline modules can also auto-load a `config.yaml` file placed next to the pipeline file. It is exposed as `self.config` with dot access:

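```python
class SalesPipeline(BasePipeline):
    def normalize(self):
        if self.config.min_amount:  # value from the config.yaml beside this file
            ...
```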
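One way to get this kind of dot access over a YAML file is a small wrapper around the parsed mapping. This is a sketch, not soigia's loader: the `DotConfig` and `load_config_beside` names are hypothetical, it assumes the file's top level is a mapping, and it assumes missing keys should read as `None` so that checks like `if self.config.min_amount:` do not raise.

```python
from pathlib import Path

import yaml  # provided by pyyaml, which soigia already depends on


class DotConfig:
    """Hypothetical read-only view of a parsed YAML mapping with dot access."""

    def __init__(self, data=None):
        self._data = data or {}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)  # keep private/dunder lookups out of the mapping
        value = self._data.get(name)  # assumption: missing keys read as None
        return DotConfig(value) if isinstance(value, dict) else value


def load_config_beside(module_file):
    """Load config.yaml from the same directory as module_file, if it exists."""
    path = Path(module_file).with_name("config.yaml")
    if not path.exists():
        return DotConfig()
    with path.open(encoding="utf-8") as fh:
        return DotConfig(yaml.safe_load(fh))


config = DotConfig(yaml.safe_load("min_amount: 50\nalerts:\n  enabled: true\n"))
print(config.min_amount, config.alerts.enabled, config.max_amount)
```

Inside a pipeline module, `load_config_beside(__file__)` would pick up the `config.yaml` sitting next to that file.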

## Examples

Run the bundled demos:

```bash
python -m examples.pipeline_final_template
python -m examples.pipeline_yaml_config_demo
make ui-config
make ui
make ui-users
make ui-csv
make ui-markdown
make ui-jinja
```

## Release

Run the test suite before release:

```bash
python -m pytest tests -q
```

Build distribution artifacts:

```bash
python -m pip install --upgrade build
python -m build
```

Upload to PyPI:

```bash
python -m pip install --upgrade twine
python -m twine upload dist/*
```

## Publish Checklist

1. Bump the version in `pyproject.toml`.
2. Update `CHANGELOG.md`.
3. Run `python -m pytest tests -q`.
4. Run `python -m build --no-isolation`.
5. Upload with `python -m twine upload dist/*`.
6. Tag the release in git.
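Steps 1 and 2 are easy to get out of sync. A small pre-release check can compare the version in `pyproject.toml` with the headings in `CHANGELOG.md`. This sketch assumes a static `version = "..."` line under `[project]` (no dynamic versioning) and a changelog whose headings contain the bare version number. It uses regular expressions instead of a TOML parser so it also runs on Python 3.8, where `tomllib` (added in 3.11) is unavailable. All function names are hypothetical.

```python
import re
from pathlib import Path


def project_version(pyproject_text):
    """Return the static version declared under [project], or None."""
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_project = stripped == "[project]"
            continue
        match = re.match(r"""version\s*=\s*["']([^"']+)["']""", stripped)
        if in_project and match:
            return match.group(1)
    return None


def changelog_has_version(changelog_text, version):
    """True if some markdown heading mentions the exact version string."""
    # Lookarounds stop 0.1.9 from matching 0.1.90 or 10.1.9.
    pattern = rf"^#+.*(?<![\d.]){re.escape(version)}(?![\d.])"
    return re.search(pattern, changelog_text, re.MULTILINE) is not None


def check_release(root="."):
    root = Path(root)
    version = project_version((root / "pyproject.toml").read_text(encoding="utf-8"))
    if version is None:
        raise SystemExit("no static [project] version found in pyproject.toml")
    changelog = (root / "CHANGELOG.md").read_text(encoding="utf-8")
    if not changelog_has_version(changelog, version):
        raise SystemExit(f"CHANGELOG.md has no heading for {version}")
    print(f"ready to release {version}")
```

Calling `check_release()` from the repository root before step 3 stops with a message if the two files disagree.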

## Project Layout

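- `soigia/`: package source
- `examples/`: runnable demos, started with `python -m examples.<name>`
- `docs/`: guides and the module reference
- `tests/`: test suite, run with `python -m pytest tests -q`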

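## Generated Files

Pipeline runs write the following files, where `<ClassName>` is the pipeline class name and `<step>` is the name of a pipeline step:
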
- `data/<ClassName>/<step>.csv`
- `data/<ClassName>/<ClassName>.csv`
- `data/<ClassName>/<ClassName>.sqlite3`
- `data/<ClassName>/<ClassName>.parquet`
- `data/<ClassName>/pipeline_summary.md`
- `logs/<ClassName>.log`
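To inspect these outputs after a run, a sketch like the following builds the expected paths from the layout above and lists the tables in the SQLite file. The helper names are hypothetical, it assumes the paths are relative to the directory the pipeline ran in, and it queries `sqlite_master` instead of guessing table names, since this README does not document them.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def output_paths(class_name, root="."):
    """Hypothetical helper: map the documented output layout to concrete paths."""
    root = Path(root)
    data_dir = root / "data" / class_name
    return {
        "csv": data_dir / f"{class_name}.csv",
        "sqlite": data_dir / f"{class_name}.sqlite3",
        "parquet": data_dir / f"{class_name}.parquet",
        "summary": data_dir / "pipeline_summary.md",
        "log": root / "logs" / f"{class_name}.log",
    }


def sqlite_tables(db_path):
    """List table names in a SQLite file; return [] if the file does not exist."""
    if not Path(db_path).exists():
        return []  # avoid sqlite3.connect silently creating an empty file
    with closing(sqlite3.connect(str(db_path))) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


paths = output_paths("SalesPipeline")
if paths["sqlite"].exists():
    print(sqlite_tables(paths["sqlite"]))
```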

## Optional Features

- `soigia.ui` screens need `streamlit`, which is not installed as a dependency of `soigia`.
- Google Sheets helpers usually need `gspread` and Google service-account credentials.
- Excel helpers need an Excel engine such as `openpyxl`.
- Fake data helpers work best with `faker`, but `soigia` also includes a small fallback generator.
- Shared account config screens (`pyyaml`), Jinja template examples (`jinja2`), and Parquet and Feather helpers (`pyarrow`) rely on packages that are already installed as core dependencies.
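A quick way to see which optional packages are present is `importlib.util.find_spec`, which locates a top-level module without importing it. The list below covers the packages named in this section plus `streamlit`, since `soigia.ui` is Streamlit-based; it checks import names, which are not always identical to PyPI project names.

```python
import importlib.util

# Import names of optional packages; streamlit backs the Streamlit-based soigia.ui.
OPTIONAL = ["streamlit", "gspread", "openpyxl", "faker"]


def missing_packages(names):
    """Return the names that cannot be found on the current interpreter's path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(OPTIONAL)
if missing:
    print("Optional packages not installed:", ", ".join(missing))
else:
    print("All optional packages are available.")
```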

## Core Ideas

`DataTable` keeps the pandas mental model, but adds a few opinionated helpers:

- `objects` for queryset-style chaining
- `values()` for record-style output
- `filter()` and `exclude()` for row filtering
- `order_by()` and `distinct()` for common table operations
- `join()` for relational merges
- `validate_schema()` for lightweight schema checks
- `init_versions()`, `snapshot()`, and `auto_version()` for version tracking
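To make the `filter()` and `exclude()` lookups concrete, here is a minimal sketch of how Django-style `column__operator=value` keywords can be translated into pandas boolean masks. It works on plain DataFrames and is not soigia's implementation; apart from `gte`, which appears in the DataTable example, the operator set is an assumption.

```python
import pandas as pd

# Assumed operator set; each maps a column Series and a value to a boolean mask.
_OPS = {
    "exact": lambda s, v: s == v,
    "gt": lambda s, v: s > v,
    "gte": lambda s, v: s >= v,
    "lt": lambda s, v: s < v,
    "lte": lambda s, v: s <= v,
    "in": lambda s, v: s.isin(v),
    "contains": lambda s, v: s.astype(str).str.contains(v, regex=False),
}


def lookup_mask(df, **lookups):
    """AND together one mask per `column__op=value` keyword (op defaults to exact)."""
    mask = pd.Series(True, index=df.index)
    for key, value in lookups.items():
        column, _, op = key.partition("__")
        op = op or "exact"
        if op not in _OPS:
            raise ValueError(f"unsupported lookup: {key}")
        mask &= _OPS[op](df[column], value)
    return mask


def filter_rows(df, **lookups):
    return df[lookup_mask(df, **lookups)]


def exclude_rows(df, **lookups):
    return df[~lookup_mask(df, **lookups)]


people = pd.DataFrame({"name": ["alice", "bob"], "age": [19, 24]})
print(filter_rows(people, age__gte=20))
```

Because each helper returns a DataFrame, calls chain naturally, for example `exclude_rows(filter_rows(people, age__lt=30), name__contains="ali")`.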

## Security Note

`from_pickle()` should only be used with trusted input. Pickle files can execute code during loading.
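If you do need pickles, one common safeguard is to record the SHA-256 digest of a file you produced yourself and refuse to load anything that does not match. This only confirms the bytes are unchanged; it does not make an untrusted pickle safe. The helper names are hypothetical and unrelated to the signature of `from_pickle()`. For exchanging data with others, formats such as Parquet or CSV avoid this code-execution risk.

```python
import hashlib
import pickle
from pathlib import Path


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def loads_pinned(data, expected_sha256):
    """Unpickle bytes only if their SHA-256 matches a digest recorded earlier."""
    actual = sha256_hex(data)
    if actual != expected_sha256:
        raise ValueError(f"refusing to unpickle: digest {actual} != {expected_sha256}")
    return pickle.loads(data)


def load_pinned(path, expected_sha256):
    """File-based wrapper around loads_pinned."""
    return loads_pinned(Path(path).read_bytes(), expected_sha256)


# Record the digest when writing, check it when reading.
payload = pickle.dumps({"rows": 2})
digest = sha256_hex(payload)
print(loads_pinned(payload, digest))
```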
