Metadata-Version: 2.4
Name: soigia
Version: 0.1.12
Summary: Python SDK for Soi Gia
Author-email: Soi Gia <sojgja@gmail.com>
License-Expression: BSD-3-Clause
Keywords: soigia,sdk,datatable,pipeline,streamlit
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: jinja2
Requires-Dist: loguru
Requires-Dist: pyarrow
Requires-Dist: pyyaml
Requires-Dist: tabulate
Requires-Dist: rich
Requires-Dist: sqlalchemy
Dynamic: license-file

# soigia

`soigia` is a Python SDK built around four practical workspaces:

- `DataTable` for dataframe-style work with a friendlier API
- `Pipeline` for repeatable business workflows
- `ORM` for model-driven business data with backend adapters
- `UI` for lightweight internal screens on top of Streamlit

## Quick Index

- [DataTable](#1-datatable)
- [Pipeline](#2-pipeline)
- [ORM](#3-orm)
- [UI](#4-ui)

This README focuses on those four areas only.

## Install

```bash
pip install soigia
```

For local development:

```bash
pip install -e .
```

## 1. DataTable

`DataTable` is the main table abstraction in `soigia`. It keeps the pandas mental model, but adds a more opinionated, workflow-friendly API for cleaning, filtering, joining, validating, and exporting data.

Use `DataTable` when you want:

- a table object that behaves like a dataframe
- chainable query-style operations
- simple data loading helpers
- schema checks before a dataset moves to the next step
- snapshot and version helpers for auditability

### Typical workflow

1. Load rows from records, CSV, JSON, Excel, Parquet, Feather, pickle, SQL, or Google Sheets.
2. Clean and normalize the data.
3. Filter, sort, deduplicate, or join with another table.
4. Validate the final shape and types.
5. Export or snapshot the result.

### Example

```python
from soigia.datatable import DataTable

orders = DataTable.from_records(
    [
        {"order_id": 1001, "customer": "Alice", "amount": 120.5, "city": "Hanoi"},
        {"order_id": 1002, "customer": "Bob", "amount": 88.0, "city": "Saigon"},
        {"order_id": 1003, "customer": "Carol", "amount": 240.0, "city": "Danang"},
    ]
)

large_orders = orders.objects.filter(amount__gte=100).order_by("amount")
print(large_orders.to_rows())
```
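
The `amount__gte=100` keyword follows the Django-style `field__operator` lookup convention. The sketch below shows, in plain Python, one way such lookups can be resolved over a list of dicts. `matches`, `filter_records`, `exclude_records`, `order_by`, and the operator names are hypothetical helpers for illustration, not `soigia` internals:

```python
import operator

# Assumed operator names, modeled on Django's field lookups (hypothetical).
OPERATORS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
}


def matches(row, lookups):
    """True if the row satisfies every lookup (lookups are ANDed)."""
    for key, expected in lookups.items():
        # "amount__gte" -> ("amount", "gte"); a bare "city" means "exact".
        field, _, op = key.partition("__")
        if not OPERATORS[op or "exact"](row[field], expected):
            return False
    return True


def filter_records(rows, **lookups):
    return [row for row in rows if matches(row, lookups)]


def exclude_records(rows, **lookups):
    return [row for row in rows if not matches(row, lookups)]


def order_by(rows, field):
    # A leading "-" sorts descending, as in Django's order_by.
    descending = field.startswith("-")
    return sorted(rows, key=lambda row: row[field.lstrip("-")], reverse=descending)


orders = [
    {"order_id": 1001, "customer": "Alice", "amount": 120.5, "city": "Hanoi"},
    {"order_id": 1002, "customer": "Bob", "amount": 88.0, "city": "Saigon"},
    {"order_id": 1003, "customer": "Carol", "amount": 240.0, "city": "Danang"},
]

large_orders = order_by(filter_records(orders, amount__gte=100), "amount")
print([row["order_id"] for row in large_orders])  # [1001, 1003]
```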

### Practical things you can do

- `DataTable.from_records(records)` for raw Python data
- `DataTable.from_csv(path)` for flat files
- `DataTable.from_json(path)` for payloads and exports
- `DataTable.from_excel(path)` for business spreadsheets
- `DataTable.from_parquet(path)` for analytics datasets
- `DataTable.from_sql(...)` for database-backed loading
- `df.objects.filter(...)` for query-like row selection
- `df.objects.exclude(...)` for inverse filtering
- `df.objects.order_by(...)` for sorting
- `df.objects.distinct(...)` for deduplication
- `df.objects.group_by(...)` for grouped summaries
- `df.join(...)` and related helpers for relational work
- `df.validate_schema(...)` for checks before release
- `df.snapshot(...)` and `df.auto_version(...)` for versioned outputs
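
The signature of `validate_schema(...)` is not shown in this README. To illustrate the kind of check it is meant for, here is a hypothetical stand-alone validator that assumes a schema shaped as `{column: expected_type}` and collects problems instead of raising on the first one:

```python
def validate_schema(rows, schema):
    """Return a list of problems; an empty list means every row passes.

    `schema` maps column name -> expected type (or a tuple of types).
    This is a hypothetical sketch, not soigia's validate_schema.
    """
    problems = []
    for index, row in enumerate(rows):
        missing = [column for column in schema if column not in row]
        if missing:
            problems.append(f"row {index}: missing columns {missing}")
        for column, expected in schema.items():
            if column in row and not isinstance(row[column], expected):
                actual = type(row[column]).__name__
                problems.append(f"row {index}: {column!r} is {actual}")
    return problems


schema = {"order_id": int, "customer": str, "amount": (int, float)}
rows = [
    {"order_id": 1001, "customer": "Alice", "amount": 120.5},
    {"order_id": "1002", "customer": "Bob", "amount": 88.0},
    {"order_id": 1003, "amount": 240.0},
]

for problem in validate_schema(rows, schema):
    print(problem)
# row 1: 'order_id' is str
# row 2: missing columns ['customer']
```

Note that `isinstance(True, int)` is true in Python, so a stricter check would reject booleans explicitly.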

### When to use it

`DataTable` is a good fit when:

- you need dataframe behavior, but want a stricter workflow layer
- you are passing data between cleaning steps and business logic
- you want tests to assert shape and content more clearly
- you need small, readable transformations instead of scattered pandas calls

### DataTable reference

For the full API list and examples, see:

- [Module Reference](docs/module-reference.md)

## 2. Pipeline

`Pipeline` is the workflow layer for business processing. It is designed for repeatable jobs where data comes in, gets normalized, goes through a few stages, and then writes out artifacts and summary files.

Use `Pipeline` when you want:

- a predictable execution order
- stage-based processing
- automatic output files
- rollback hooks for side effects
- config-driven behavior
- a shared model namespace for computed results

### Mental model

1. `load_data()` brings data into memory.
2. Each stage mutates or enriches the working dataset.
3. `self.datatable` is the main working dataset.
4. `self._data` stores extra data loaded from DB tables.
5. `self.envs` gives fast SQLite table access for ad-hoc reads.
6. `self.model` stores computed results.
7. `self.config` exposes YAML values.
8. `save_outputs()` writes final artifacts.
9. `on_rollback(...)` registers a handler that undoes an external side effect if a stage fails.
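
To make the execution order and rollback behavior concrete, here is a toy stage runner built on stated assumptions: stages run in the listed order, a failure stops the run, and registered rollback handlers are called newest-first. `MiniPipeline` is hypothetical; the real `Pipeline` also handles config, the DB, and output files, and its exact rollback ordering is not documented here:

```python
class MiniPipeline:
    """Hypothetical, stripped-down stage runner; not soigia's Pipeline class."""

    stages = []

    def __init__(self):
        self.datatable = None
        self._rollbacks = []
        self.log = []

    def load_data(self):
        raise NotImplementedError

    def on_rollback(self, handler):
        # Register an undo action for a side effect that has already happened.
        self._rollbacks.append(handler)

    def run(self):
        self.datatable = self.load_data()
        try:
            for name in self.stages:
                getattr(self, name)()
                self.log.append(name)
        except Exception as error:
            # Undo side effects newest-first, then report which stage failed.
            while self._rollbacks:
                self._rollbacks.pop()()
            return False, f"{name}: {error}"
        return True, None


written = []  # stand-in for files or rows written outside the process


class Demo(MiniPipeline):
    stages = ["clean", "save", "summarize"]

    def load_data(self):
        return [{"id": 1, "amount": 100.0}, {"id": 2, "amount": None}]

    def clean(self):
        self.datatable = [row for row in self.datatable if row["amount"] is not None]

    def save(self):
        written.append("sales.csv")
        self.on_rollback(lambda: written.remove("sales.csv"))

    def summarize(self):
        raise RuntimeError("summary template missing")


ok, error = Demo().run()
print(ok, error, written)  # False summarize: summary template missing []
```

Keeping rollback handlers limited to external effects (here, the `written` list) matches the advice below: in-memory dataframe changes are simply discarded when the run fails.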

### Example

```python
from soigia.pipeline import Pipeline
from soigia.datatable import DataTable


class SalesPipeline(Pipeline):
    stages = ["clean", "enrich", "save", "summarize"]

    def load_data(self):
        return DataTable(
            [
                {"id": 1, "amount": 100.0},
                {"id": 2, "amount": 250.0},
            ]
        )

    def clean(self):
        self.datatable = self.datatable.dropna().reset_index(drop=True)

    def enrich(self):
        self.datatable["net_amount"] = self.datatable["amount"] * 0.98

    def save(self):
        self.datatable.save_to_csv("data/sales/clean_sales.csv")
        self.push("sales", self.datatable, mode="replace")

    def summarize(self):
        self.model.total_rows = len(self.datatable)
        self.model.total_amount = float(self.datatable["amount"].sum())


pipeline = SalesPipeline(name="sales")
result = pipeline.run()

print(result.success)
print(result.summary_path)
print(result.csv_path)
```

### What a pipeline usually includes

- input loading from files, databases, or APIs
- normalization and cleaning
- business-specific enrichment
- summary or scoring logic
- CSV, SQLite, and Parquet outputs
- a Markdown summary for traceability
- log files for debugging and audit

### Recommended structure

- keep the pipeline class small and stage-oriented
- put reusable config in `config.yaml`
- use `config.example.yaml` as the checked-in template
- use `pipeline.config.example.yaml` as the pipeline runtime template
- use `self.model` for computed values that later stages need
- keep rollback handlers focused on external effects, not dataframe-only work

### Pipeline DB defaults

`Pipeline` creates a SQLite database automatically on first use, so subclasses can read and write without extra setup.

The main working attribute is `self.datatable`. Extra tables loaded from DB are cached in `self._data`.

`self.datatable` is a `DataTable`, so you can use helpers like `save_to_csv()`, `save_to_parquet()`, `load_from_db()`, and `save_to_db()` directly. For SQLite, `connection_string` is optional; if you omit it, Soi Gia uses the default local DB:

```python
self.datatable.save_to_csv("data/output.csv")
self.datatable.save_to_parquet("data/output.parquet")
self.datatable.save_to_db("sales")
self.datatable = DataTable.load_from_db("select * from sales")
```

For quick reads, `self.envs` exposes table handles:

```python
orders = self.envs.orders.filter(self.envs.orders.id > 1)
```

You can also use keyword lookups:

```python
orders = self.envs.orders.filter(id__gt=1)
```
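
The expression form `self.envs.orders.id > 1` and the keyword form `id__gt=1` select the same rows. A common way to support the expression form is operator overloading: comparing a column handle returns a predicate instead of a boolean. The `Column` and `Table` classes below are a hypothetical sketch of that technique, not the `self.envs` implementation:

```python
import operator

KEYWORD_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "exact": operator.eq}


class Column:
    """Hypothetical column handle: comparisons build predicates, not booleans."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        return lambda row: op(row[self.name], value)

    def __gt__(self, value):
        return self._compare(operator.gt, value)

    def __ge__(self, value):
        return self._compare(operator.ge, value)

    def __lt__(self, value):
        return self._compare(operator.lt, value)


class Table:
    """Hypothetical table handle that accepts both lookup styles."""

    def __init__(self, rows):
        self.rows = rows

    def __getattr__(self, name):
        # Only called for unknown attributes, so `orders.id` becomes Column("id").
        return Column(name)

    def filter(self, *predicates, **lookups):
        # Translate keyword lookups such as id__gt=1 into the same predicates.
        for key, value in lookups.items():
            field, _, op = key.partition("__")
            predicates += (Column(field)._compare(KEYWORD_OPS[op or "exact"], value),)
        return [row for row in self.rows if all(p(row) for p in predicates)]


orders = Table([{"id": 1, "total": 10}, {"id": 2, "total": 25}, {"id": 3, "total": 40}])
print(orders.filter(orders.id > 1))  # rows with id 2 and 3
print(orders.filter(id__gt=1))       # the same rows
```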

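A full pipeline can combine these helpers: `seed` writes the whole table with `self.db.set()`, `sync` upserts rows with `self.db.upsert_rows()`, and `reload` reads the result back into `self.datatable`:

```python
from soigia.datatable import DataTable
from soigia.pipeline import Pipeline


class DemoPipeline(Pipeline):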
    stages = ["seed", "sync", "reload"]

    def load_data(self):
        return DataTable([{"id": 1, "name": "alice"}])

    def seed(self):
        self.db.set("users", self.datatable)

    def sync(self):
        self.db.upsert_rows(
            "users",
            [{"id": 1, "name": "alice-updated"}, {"id": 2, "name": "bob"}],
        )

    def reload(self):
        self.datatable = DataTable.load_from_db("select * from users")
```

Short aliases are also available:

```python
self.push("users", self.datatable)
self.pull("users")
self.push_datatable("users", self.datatable, mode="upsert")
self.pull_datatable("users")
```

Default upsert keys can be configured per table:

```yaml
db_upsert_keys:
  users:
    - id
  orders:
    - order_id
```
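
Upsert keys decide which incoming rows count as "the same" row. As a rough illustration of those semantics (not `soigia`'s implementation), here is the `DemoPipeline` `sync` step expressed with the standard-library `sqlite3` module and SQLite's `ON CONFLICT ... DO UPDATE` clause, assuming `id` is the key for `users`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# "seed": initial rows, like the DemoPipeline seed stage.
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))

# "sync": upsert keyed on id; existing ids are updated, new ids are inserted.
# ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
incoming = [{"id": 1, "name": "alice-updated"}, {"id": 2, "name": "bob"}]
conn.executemany(
    "INSERT INTO users (id, name) VALUES (:id, :name) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
    incoming,
)

users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(users)  # [(1, 'alice-updated'), (2, 'bob')]
```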

### Pipeline reference

For the end-to-end guide, see:

- [Pipeline End-to-End Guide](docs/pipeline-end-to-end.md)

## 3. ORM

`ORM` is the model-driven data layer in `soigia`. It borrows the class-based model style of Odoo and Django and delegates storage to backend adapters such as memory, pandas, and SQLite.

Use `ORM` when you want:

- declarative models with fields
- query helpers like `filter()`, `exclude()`, `order_by()`, and `limit()`
- record creation, updates, deletes, and browsing by id
- model inheritance and Odoo-style `_inherit`
- business methods on the model itself
- a backend-agnostic API that can switch storage engines

### Mental model

1. Define a model with `class MyModel(Model)`.
2. Declare fields directly on the class.
3. Bind the model to an `Env`.
4. Use `env["model.name"]` or `Model.objects` to query data.
5. Let the backend handle storage, while the model handles business behavior.

### Example

```python
from soigia.db import Env, MemoryBackend, Model, fields


class Customer(Model):
    _name = "demo.customer"

    name = fields.String(required=True)


env = Env(backend=MemoryBackend())
customer = env["demo.customer"].create({"name": "Alice"})

print(customer.id)
print(env["demo.customer"].search([("name", "=", "Alice")]).ids)
```
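
The `search([("name", "=", "Alice")])` call takes an Odoo-style domain: a list of `(field, operator, value)` triples that must all match. The hypothetical sketch below evaluates such a domain over plain dicts; it handles only the implicit AND case, and the operator set is an assumption rather than `soigia`'s supported list:

```python
import operator

# Hypothetical operator table for (field, operator, value) domain triples.
DOMAIN_OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
    # Case-insensitive substring match, as Odoo's "ilike" behaves.
    "ilike": lambda value, pattern: pattern.lower() in str(value).lower(),
}


def search(records, domain):
    """Return ids of records matching every triple in the domain (implicit AND)."""
    return [
        record["id"]
        for record in records
        if all(DOMAIN_OPERATORS[op](record.get(field), value) for field, op, value in domain)
    ]


customers = [
    {"id": 1, "name": "Alice", "city": "Hanoi"},
    {"id": 2, "name": "Bob", "city": "Saigon"},
    {"id": 3, "name": "alicia", "city": "Hanoi"},
]

print(search(customers, [("name", "=", "Alice")]))  # [1]
print(search(customers, [("name", "ilike", "ali"), ("city", "=", "Hanoi")]))  # [1, 3]
```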

### ORM reference

For the full guide and API list, see:

- [ORM End-to-End Guide](docs/db-end-to-end.md)
- [Module Reference](docs/module-reference.md)
- [ORM Examples](examples/README.md)


## 4. UI

`soigia.ui` is a declarative Streamlit layer for internal tools. It is meant for dashboards, admin screens, review pages, and lightweight operational views.

Use `UI` when you want:

- a single-file screen definition
- simple page composition
- forms, metrics, tables, and filters
- a fast way to expose operational workflows

### Mental model

1. Define a page with `ui.page()`.
2. Compose layout blocks with `ui.sidebar()`, `ui.columns()`, `ui.tabs()`, and `ui.form()`.
3. Render data with `ui.table()` and `ui.metric()`.
4. Add inputs with `ui.text_input()`, `ui.select()`, and `ui.number_input()`.
5. Run the app with `ui.run()`.
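
The decorator-then-run pattern in steps 1 and 5 can be illustrated with a tiny page registry: `@page(...)` records the function under a title, and `run()` looks it up and calls it. `MiniUI` and `mini_ui` are hypothetical names; `soigia.ui` renders through Streamlit and may select pages differently:

```python
class MiniUI:
    """Hypothetical page registry illustrating register-then-run; not soigia.ui."""

    def __init__(self):
        self.pages = {}

    def page(self, title):
        def register(func):
            self.pages[title] = func
            return func  # leave the decorated function usable on its own
        return register

    def run(self, title=None, ctx=None):
        # Render the requested page, or the first one registered.
        if not self.pages:
            raise RuntimeError("no pages registered")
        title = title or next(iter(self.pages))
        return self.pages[title](ctx)


mini_ui = MiniUI()


@mini_ui.page("Orders Dashboard")
def orders_page(ctx):
    return f"rendering Orders Dashboard for {ctx}"


print(mini_ui.run(ctx="ops"))  # rendering Orders Dashboard for ops
```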

### Example

```python
from soigia.ui import ui

orders = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 88.5, "status": "pending"},
    {"id": 3, "customer": "Carol", "amount": 220.0, "status": "paid"},
]


@ui.page("Orders Dashboard")
def orders_page(ctx):
    ui.markdown("Track orders, filter data, and prepare quick actions.")

    with ui.sidebar():
        keyword = ui.text_input("Search customer")
        status = ui.select("Status", ["all", "paid", "pending"])
        min_amount = ui.number_input("Minimum amount", value=0)

    filtered = [
        row
        for row in orders
        if keyword.value.lower() in row["customer"].lower()
        and (status.value == "all" or row["status"] == status.value)
        and row["amount"] >= min_amount.value
    ]

    ui.metric("Orders", len(filtered))
    ui.metric("Total amount", sum(row["amount"] for row in filtered))
    ui.table(filtered)


ui.run()
```
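
Because the filtering in this page is plain Python, one option is to move it into a helper so the rules can be unit-tested without Streamlit. `filter_orders` is a hypothetical name; inside the page it would be called as `filter_orders(orders, keyword.value, status.value, min_amount.value)`:

```python
def filter_orders(rows, keyword="", status="all", min_amount=0):
    """Same filtering rules as the dashboard, without any widgets."""
    keyword = keyword.lower()
    return [
        row
        for row in rows
        if keyword in row["customer"].lower()
        and (status == "all" or row["status"] == status)
        and row["amount"] >= min_amount
    ]


orders = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 88.5, "status": "pending"},
    {"id": 3, "customer": "Carol", "amount": 220.0, "status": "paid"},
]

paid = filter_orders(orders, status="paid", min_amount=150)
print([row["id"] for row in paid])  # [3]
```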

### Common building blocks

- `ui.page()` for page registration
- `ui.sidebar()` for filters and navigation
- `ui.columns()` for split layouts
- `ui.tabs()` for grouped views
- `ui.form()` for update flows
- `ui.table()` for records and results
- `ui.metric()` for KPI cards
- `ui.text()`, `ui.markdown()`, and input widgets for interaction

### When to use it

`UI` works best for:

- internal admin dashboards
- review screens for ops or business teams
- quick forms and compact workflows
- tools that should stay simple enough to maintain in one file

### UI reference

For the end-to-end UI guide, see:

- [UI End-to-End Guide](docs/ui-end-to-end.md)

## Quick Summary

- Use `DataTable` when the problem is mostly table manipulation.
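- Use `Pipeline` when the problem is a staged business workflow.
- Use `ORM` when the problem is model-driven business data that should stay independent of the storage backend.
- Use `UI` when the problem is an internal screen or dashboard.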

## Need More Detail?

The focused docs live here:

- [Module Reference](docs/module-reference.md)
- [Pipeline End-to-End Guide](docs/pipeline-end-to-end.md)
- [ORM End-to-End Guide](docs/db-end-to-end.md)
- [UI End-to-End Guide](docs/ui-end-to-end.md)
