Metadata-Version: 2.4
Name: soigia
Version: 0.1.14
Summary: Python SDK for Soi Gia
Author-email: Soi Gia <sojgja@gmail.com>
License-Expression: BSD-3-Clause
Keywords: soigia,sdk,datatable,pipeline,streamlit
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: jinja2
Requires-Dist: loguru
Requires-Dist: pyarrow
Requires-Dist: pyyaml
Requires-Dist: tabulate
Requires-Dist: rich
Requires-Dist: sqlalchemy
Dynamic: license-file

# soigia

`soigia` is a Python SDK built around four practical workspaces:

- `DataTable` for dataframe-style work with a friendlier API
- `Pipeline` for repeatable business workflows
- `ORM` for model-driven business data with backend adapters
- `UI` for lightweight internal screens on top of Streamlit

## Philosophy

Soi Gia keeps the codebase practical and explicit:

- domain rules stay close to the business language
- public APIs stay small and predictable
- shared concepts live in `common`, not scattered across modules
- each bounded context owns its own entities, services, policies, and value objects
- runtime code does not depend on docs, and docs do not depend on external references
- examples in this README are meant to be copy-paste friendly and match the real API

## Quick Index

- [DataTable](#1-datatable)
- [Pipeline](#2-pipeline)
- [ORM](#3-orm)
- [UI](#4-ui)
- [Domain](#5-domain) - module-based bounded contexts for business rules

This README focuses on those four areas plus the domain layer.

## Quickstart

Install the package:

```bash
pip install soigia
```

Or for Django projects:

```bash
pip install "soigia[django]"
```

Try a small domain example:

```python
from decimal import Decimal
from soigia.domain import common, sales

customer = common.Customer(name="Alice")
product = common.Product(
    sku=common.SKU("COFFEE-001"),
    name="Coffee",
    price=common.Money(Decimal("50000")),
    stock_qty=100,
)

order = sales.SalesOrder(customer=customer)
order.add_line(product=product, quantity=common.Quantity(2))

workflow = sales.SalesWorkflowService()
workflow.confirm(order)

print(order.amount_total)
```

## Install

For local development:

```bash
pip install -e .
```

For an editable install with the Django extra:

```bash
pip install -e ".[django]"
```

If you only need the runtime package, `pip install soigia` is enough.

## 1. DataTable

`DataTable` is the main table abstraction in `soigia`. It keeps the pandas mental model, but adds a more opinionated, workflow-friendly API for cleaning, filtering, joining, validating, and exporting data.

Use `DataTable` when you want:

- a table object that behaves like a dataframe
- chainable query-style operations
- simple data loading helpers
- schema checks before a dataset moves to the next step
- snapshot and version helpers for auditability

### Typical workflow

1. Load rows from records, CSV, JSON, Excel, Parquet, Feather, pickle, SQL, or Google Sheets.
2. Clean and normalize the data.
3. Filter, sort, deduplicate, or join with another table.
4. Validate the final shape and types.
5. Export or snapshot the result.

### Example

```python
from soigia.datatable import DataTable

orders = DataTable.from_records(
    [
        {"order_id": 1001, "customer": "Alice", "amount": 120.5, "city": "Hanoi"},
        {"order_id": 1002, "customer": "Bob", "amount": 88.0, "city": "Saigon"},
        {"order_id": 1003, "customer": "Carol", "amount": 240.0, "city": "Danang"},
    ]
)

large_orders = orders.objects.filter(amount__gte=100).order_by("amount")
print(large_orders.to_rows())
```
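
The `amount__gte=100` argument resembles the Django-style `field__operator` lookup convention. The sketch below shows one way such keywords can be turned into comparisons over plain dictionaries. It is only an illustration: `LOOKUPS` and `filter_rows` are hypothetical names, and this is not how `DataTable` is necessarily implemented.

```python
import operator

# Hypothetical lookup table: suffix -> comparison function.
# It mirrors the Django-style convention; it is not soigia's actual code.
LOOKUPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def filter_rows(rows, **lookups):
    """Keep the rows that satisfy every `field__op=value` lookup."""

    def matches(row):
        for key, expected in lookups.items():
            # "amount__gte" -> ("amount", "gte"); "amount" -> ("amount", "")
            field, _, op = key.partition("__")
            compare = LOOKUPS[op or "exact"]
            if not compare(row[field], expected):
                return False
        return True

    return [row for row in rows if matches(row)]


rows = [
    {"order_id": 1001, "amount": 120.5},
    {"order_id": 1002, "amount": 88.0},
    {"order_id": 1003, "amount": 240.0},
]

large = sorted(filter_rows(rows, amount__gte=100), key=lambda r: r["amount"])
print([r["order_id"] for r in large])  # [1001, 1003]
```

A bare keyword such as `order_id=1002` falls back to an equality check in this sketch.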

### Practical things you can do

- `DataTable.from_records(records)` for raw Python data
- `DataTable.from_csv(path)` for flat files
- `DataTable.from_json(path)` for payloads and exports
- `DataTable.from_excel(path)` for business spreadsheets
- `DataTable.from_parquet(path)` for analytics datasets
- `DataTable.from_sql(...)` for database-backed loading
- `df.objects.filter(...)` for query-like row selection
- `df.objects.exclude(...)` for inverse filtering
- `df.objects.order_by(...)` for sorting
- `df.objects.distinct(...)` for deduplication
- `df.objects.group_by(...)` for grouped summaries
- `df.join(...)` and related helpers for relational work
- `df.validate_schema(...)` for checks before release
- `df.snapshot(...)` and `df.auto_version(...)` for versioned outputs
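
To make "schema checks before release" concrete, here is a minimal sketch of what such a check can look like on plain records. The function name `check_schema` and the column-to-type mapping are assumptions for illustration; the real `validate_schema(...)` arguments may differ.

```python
def check_schema(rows, schema):
    """Return a list of human-readable problems; an empty list means the rows pass."""
    problems = []
    for index, row in enumerate(rows):
        # Report columns that the schema expects but the row lacks.
        for column in sorted(set(schema) - set(row)):
            problems.append(f"row {index}: missing column '{column}'")
        # Report values whose Python type does not match the schema.
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(
                    f"row {index}: '{column}' should be {expected_type.__name__}"
                )
    return problems


schema = {"order_id": int, "customer": str, "amount": float}
rows = [
    {"order_id": 1001, "customer": "Alice", "amount": 120.5},
    {"order_id": "1002", "customer": "Bob"},
]

problems = check_schema(rows, schema)
for problem in problems:
    print(problem)
```

Returning a list of problems instead of raising on the first one keeps the report complete, which tends to be more useful when a dataset is reviewed before the next step.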

### When to use it

`DataTable` is a good fit when:

- you need dataframe behavior, but want a stricter workflow layer
- you are passing data between cleaning steps and business logic
- you want tests to assert shape and content more clearly
- you need small, readable transformations instead of scattered pandas calls

### DataTable in practice

`DataTable` is the most direct way to work with structured records in Soi Gia. The main thing to remember is that it keeps the dataframe mental model, but the API is optimized for business workflows rather than generic data science notebooks.

If you are choosing between raw pandas and `DataTable`, use `DataTable` when:

- you want a clearer chainable API for row filtering and transformation
- you need the SDK to keep row-level intent readable in tests
- you want built-in helpers for export, validation, and snapshotting

## 2. Pipeline

`Pipeline` is the workflow layer for business processing. It is designed for repeatable jobs where data comes in, gets normalized, goes through a few stages, and then writes out artifacts and summary files.

Use `Pipeline` when you want:

- a predictable execution order
- stage-based processing
- automatic output files
- rollback hooks for side effects
- config-driven behavior
- a shared model namespace for computed results

### Mental model

1. `load_data()` brings data into memory.
2. Each stage mutates or enriches the working dataset.
3. `self.datatable` is the main working dataset.
4. `self._data` stores extra data loaded from DB tables.
5. `self.envs` gives fast SQLite table access for ad-hoc reads.
6. `self.model` stores computed results.
7. `self.config` exposes YAML values.
8. `save_outputs()` writes final artifacts.
9. `on_rollback(...)` protects side effects when a stage fails.
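
The mental model above can be sketched as a small stage runner. This is a simplified illustration, not the real `Pipeline` class: `MiniPipeline` is a hypothetical name, and the exact signature of `on_rollback(...)` in `soigia` may differ.

```python
class MiniPipeline:
    """Hypothetical stage runner; not the real soigia.pipeline.Pipeline."""

    stages = []

    def __init__(self):
        self._rollbacks = []
        self.completed = []

    def on_rollback(self, handler):
        # Register an undo action for a side effect that just happened.
        self._rollbacks.append(handler)

    def run(self):
        try:
            for name in self.stages:
                getattr(self, name)()  # stages run in the declared order
                self.completed.append(name)
        except Exception:
            # Undo side effects in reverse order, newest first.
            for handler in reversed(self._rollbacks):
                handler()
            return False
        return True


class Demo(MiniPipeline):
    stages = ["write_file", "fail"]

    def __init__(self):
        super().__init__()
        self.files = []

    def write_file(self):
        self.files.append("out.csv")        # simulated side effect
        self.on_rollback(self.files.clear)  # how to undo it

    def fail(self):
        raise ValueError("bad input")


demo = Demo()
ok = demo.run()
print(ok, demo.completed, demo.files)  # False ['write_file'] []
```

Running handlers in reverse order means the most recent side effect is undone first, which matches how nested external changes usually need to be unwound.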

### Example

```python
from soigia.pipeline import Pipeline
from soigia.datatable import DataTable


class SalesPipeline(Pipeline):
    stages = ["clean", "enrich", "save", "summarize"]

    def load_data(self):
        return DataTable(
            [
                {"id": 1, "amount": 100.0},
                {"id": 2, "amount": 250.0},
            ]
        )

    def clean(self):
        self.datatable = self.datatable.dropna().reset_index(drop=True)

    def enrich(self):
        self.datatable["net_amount"] = self.datatable["amount"] * 0.98

    def save(self):
        self.datatable.save_to_csv("data/sales/clean_sales.csv")
        self.push("sales", self.datatable, mode="replace")

    def summarize(self):
        self.model.total_rows = len(self.datatable)
        self.model.total_amount = float(self.datatable["amount"].sum())


pipeline = SalesPipeline(name="sales")
result = pipeline.run()

print(result.success)
print(result.summary_path)
print(result.csv_path)
```

### What a pipeline usually includes

- input loading from files, databases, or APIs
- normalization and cleaning
- business-specific enrichment
- summary or scoring logic
- CSV, SQLite, and Parquet outputs
- a Markdown summary for traceability
- log files for debugging and audit

### Recommended structure

- keep the pipeline class small and stage-oriented
- put reusable config in `config.yaml`
- use `config.example.yaml` as the checked-in template
- use `pipeline.config.example.yaml` as the pipeline runtime template
- use `self.model` for computed values that later stages need
- keep rollback handlers focused on external effects, not dataframe-only work

### Pipeline DB defaults

`Pipeline` creates a SQLite database automatically on first use, so subclasses can read and write without extra setup.

The main working attribute is `self.datatable`. Extra tables loaded from DB are cached in `self._data`.

`self.datatable` is a `DataTable`, so you can use helpers like `save_to_csv()`, `save_to_parquet()`, `load_from_db()`, and `save_to_db()` directly. For SQLite, `connection_string` is optional; if you omit it, Soi Gia uses the default local DB:

```python
self.datatable.save_to_csv("data/output.csv")
self.datatable.save_to_parquet("data/output.parquet")
self.datatable.save_to_db("sales")
self.datatable = DataTable.load_from_db("select * from sales")
```

For quick reads, `self.envs` exposes table handles:

```python
orders = self.envs.orders.filter(self.envs.orders.id > 1)
```
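
Expression-style filters such as `orders.id > 1` are commonly built with operator overloading: the comparison returns a predicate instead of a boolean. The sketch below shows the idea on plain dictionaries. `Column` and `Table` are hypothetical classes for illustration and do not describe how `self.envs` is implemented.

```python
import operator


class Column:
    """Hypothetical column handle that turns comparisons into predicates."""

    def __init__(self, name):
        self.name = name

    def _predicate(self, compare, other):
        # Defer the comparison until a concrete row is available.
        return lambda row: compare(row[self.name], other)

    def __gt__(self, other):
        return self._predicate(operator.gt, other)

    def __lt__(self, other):
        return self._predicate(operator.lt, other)


class Table:
    """Hypothetical table handle exposing one Column attribute per column."""

    def __init__(self, rows, columns):
        self._rows = rows
        for name in columns:
            setattr(self, name, Column(name))

    def filter(self, predicate):
        return [row for row in self._rows if predicate(row)]


orders = Table([{"id": 1}, {"id": 2}, {"id": 3}], columns=["id"])
print(orders.filter(orders.id > 1))  # [{'id': 2}, {'id': 3}]
```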

You can also use keyword lookups:

```python
orders = self.envs.orders.filter(id__gt=1)
```

The following pipeline seeds a table, upserts rows, and reloads the result through the default database:

```python
class DemoPipeline(Pipeline):
    stages = ["seed", "sync", "reload"]

    def load_data(self):
        return DataTable([{"id": 1, "name": "alice"}])

    def seed(self):
        self.db.set("users", self.datatable)

    def sync(self):
        self.db.upsert_rows(
            "users",
            [{"id": 1, "name": "alice-updated"}, {"id": 2, "name": "bob"}],
        )

    def reload(self):
        self.datatable = DataTable.load_from_db("select * from users")
```

Short aliases are also available:

```python
self.push("users", self.datatable)
self.pull("users")
self.push_datatable("users", self.datatable, mode="upsert")
self.pull_datatable("users")
```

Default upsert keys can be configured per table:

```yaml
db_upsert_keys:
  users:
    - id
  orders:
    - order_id
```

### Pipeline in practice

`Pipeline` is the orchestration layer for repeatable business jobs. Treat it as a stage runner with explicit loading, cleaning, enrichment, persistence, and summary steps.

Use `Pipeline` when you need:

- deterministic stage execution
- automatic output artifacts
- rollback hooks for side effects
- config-driven behavior
- a shared `model` object for computed values

## 3. ORM

`ORM` is the model-driven data layer in `soigia`. It keeps the class-based model style of Odoo and Django, while separating storage through adapters like memory, pandas, and SQLite.

Use `ORM` when you want:

- declarative models with fields
- query helpers like `filter()`, `exclude()`, `order_by()`, and `limit()`
- record creation, updates, deletes, and browsing by id
- model inheritance and Odoo-style `_inherit`
- business methods on the model itself
- a backend-agnostic API that can switch storage engines

### Mental model

1. Define a model with `class MyModel(Model)`.
2. Declare fields directly on the class.
3. Bind the model to an `Env`.
4. Use `env["model.name"]` or `Model.objects` to query data.
5. Let the backend handle storage, while the model handles business behavior.

### Example

```python
from soigia.db import Env, MemoryBackend, Model, fields


class Customer(Model):
    _name = "demo.customer"

    name = fields.String(required=True)


env = Env(backend=MemoryBackend())
customer = env["demo.customer"].create({"name": "Alice"})

print(customer.id)
print(env["demo.customer"].search([("name", "=", "Alice")]).ids)
```

### ORM in practice

The ORM layer follows a model-centric style. Define fields on the model, bind it to an environment, and let the backend manage storage while the model keeps business behavior.

Use `ORM` when you need:

- declarative models with field definitions
- query helpers like `filter()`, `exclude()`, `order_by()`, and `limit()`
- record lifecycle methods on the model itself
- a backend-agnostic layer that can switch storage engines

## 4. UI

`soigia.ui` is a declarative Streamlit layer for internal tools. It is meant for dashboards, admin screens, review pages, and lightweight operational views.

Use `UI` when you want:

- a single-file screen definition
- simple page composition
- forms, metrics, tables, and filters
- a fast way to expose operational workflows

### Mental model

1. Define a page with `ui.page()`.
2. Compose layout blocks with `ui.sidebar()`, `ui.columns()`, `ui.tabs()`, and `ui.form()`.
3. Render data with `ui.table()` and `ui.metric()`.
4. Add inputs with `ui.text_input()`, `ui.select()`, and `ui.number_input()`.
5. Run the app with `ui.run()`.

### Example

```python
from soigia.ui import ui

orders = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 88.5, "status": "pending"},
    {"id": 3, "customer": "Carol", "amount": 220.0, "status": "paid"},
]


@ui.page("Orders Dashboard")
def orders_page(ctx):
    ui.markdown("Track orders, filter data, and prepare quick actions.")

    with ui.sidebar():
        keyword = ui.text_input("Search customer")
        status = ui.select("Status", ["all", "paid", "pending"])
        min_amount = ui.number_input("Minimum amount", value=0)

    filtered = [
        row
        for row in orders
        if keyword.value.lower() in row["customer"].lower()
        and (status.value == "all" or row["status"] == status.value)
        and row["amount"] >= min_amount.value
    ]

    ui.metric("Orders", len(filtered))
    ui.metric("Total amount", sum(row["amount"] for row in filtered))
    ui.table(filtered)


ui.run()
```
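
The filtering logic in the page above can also live in a plain function, which makes it testable without starting Streamlit. The sketch below extracts the same three conditions; `filter_orders` is a hypothetical helper name, not part of `soigia.ui`.

```python
def filter_orders(orders, keyword="", status="all", min_amount=0):
    """Apply the same three filters as the dashboard, without any UI."""
    needle = keyword.lower()
    return [
        row
        for row in orders
        if needle in row["customer"].lower()
        and (status == "all" or row["status"] == status)
        and row["amount"] >= min_amount
    ]


orders = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 88.5, "status": "pending"},
    {"id": 3, "customer": "Carol", "amount": 220.0, "status": "paid"},
]

paid = filter_orders(orders, status="paid", min_amount=200)
print([row["customer"] for row in paid])  # ['Carol']
```

Inside the page function, the widget values would be passed in as arguments, for example `filter_orders(orders, keyword.value, status.value, min_amount.value)`.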

### Common building blocks

- `ui.page()` for page registration
- `ui.sidebar()` for filters and navigation
- `ui.columns()` for split layouts
- `ui.tabs()` for grouped views
- `ui.form()` for update flows
- `ui.table()` for records and results
- `ui.metric()` for KPI cards
- `ui.text()`, `ui.markdown()`, and input widgets for interaction

### When to use it

`UI` works best for:

- internal admin dashboards
- review screens for ops or business teams
- quick forms and compact workflows
- tools that should stay simple enough to maintain in one file

### UI in practice

The UI layer is intentionally simple: compose a page, place a few widgets, render data, and keep the screen definition readable.

## Quick Summary

- Use `DataTable` when the problem is mostly table manipulation.
- Use `Pipeline` when the problem is a staged business workflow.
- Use `ORM` when the problem is model-driven business data.
- Use `UI` when the problem is an internal screen or dashboard.

## 5. Domain

`soigia.domain` is the pure business layer of the SDK. It is organized by bounded context, and each context stays flat so the package is easy to scan when published on PyPI.

### Domain map

- `common` contains shared primitives used by more than one business module.
- `anki` contains spaced-repetition behavior for cards, decks, and review scheduling.
- `crm` contains lead, contact, activity, stage, and workflow behavior.
- `sales` contains sales order behavior and sales workflow rules.
- `purchases` contains supplier, purchase order, approval, and purchase workflow rules.
- `obsidian` contains note, vault, task, template, and canvas behavior for knowledge-base style workflows.

### Domain structure

Each bounded context uses the same flat layout:

- `entities.py`
- `policies.py`
- `services.py`
- `value_objects.py` when needed
- `enums.py`
- `exceptions.py`

That layout keeps the package readable without deep folder nesting.

### What belongs where

- `entities.py` holds entities, aggregates, and their internal behavior.
- `value_objects.py` holds immutable business primitives and validation rules.
- `policies.py` holds business decision rules and strategy objects.
- `services.py` holds orchestration that coordinates multiple domain objects.
- `enums.py` holds state and type enums.
- `exceptions.py` holds domain-specific errors.
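
As a rough illustration of the split between value objects and policies, the sketch below uses a frozen dataclass for a validated primitive and a separate class for a decision rule. Both `Money` and `BulkDiscountPolicy` are hypothetical stand-ins written for this example; they are not the classes shipped in `soigia.domain`, and the discount rule is invented purely for demonstration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical value object: immutable and validated on creation."""

    amount: Decimal

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("amount must not be negative")

    def times(self, factor: int) -> "Money":
        # Return a new instance instead of mutating this one.
        return Money(self.amount * factor)


class BulkDiscountPolicy:
    """Hypothetical policy: a decision rule kept separate from the entity."""

    def __init__(self, threshold: int, rate: Decimal):
        self.threshold = threshold
        self.rate = rate

    def apply(self, unit_price: Money, quantity: int) -> Money:
        total = unit_price.times(quantity)
        if quantity >= self.threshold:
            return Money(total.amount * (Decimal("1") - self.rate))
        return total


policy = BulkDiscountPolicy(threshold=10, rate=Decimal("0.10"))
discounted = policy.apply(Money(Decimal("20000")), 10)
print(discounted.amount)  # 180000.00
```

Swapping in a different policy object changes the rule without touching the value object or any entity that uses it.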

### Example imports

```python
from soigia.domain import anki, crm, obsidian, purchases, sales

deck = anki.Deck(name=anki.DeckName("Japanese"))
lead = crm.Lead(name="Website inquiry", stage=crm.Stage(name="Qualification", sequence=10))
vault = obsidian.Vault(name=obsidian.VaultName("Main"))
```

### Domain usage rules

- Keep business rules inside the domain.
- Keep I/O, database access, API calls, and UI out of the domain.
- Prefer value objects for validated business data.
- Prefer services for cross-entity orchestration.
- Prefer policies for rule variation and decision logic.
- Keep tests in `tests/domain`, not inside the runtime package.

### Current domain modules

- `common` for shared order, product, money, quantity, and SKU behavior.
- `anki` for review cards, decks, and scheduling.
- `crm` for contacts, leads, activities, and lead scoring.
- `sales` for sales order workflow.
- `purchases` for purchase order workflow.
- `obsidian` for notes, templates, tasks, and canvas structure.

### Domain test coverage

The domain layer is covered by dedicated tests under `tests/domain`. Those tests are the contract for the package and should be treated as the source of truth for behavior changes.

### Install variants

```bash
pip install soigia
pip install "soigia[django]"
pip install -e .
pip install -e ".[django]"
```

Use the plain install if you only need the SDK, and use the Django extra if you need the Django integration layer.

### Domain cookbook

#### Common

```python
from decimal import Decimal
from soigia.domain import common

money = common.Money(Decimal("120000"))
quantity = common.Quantity(3)
sku = common.SKU("SKU-001")
product = common.Product(sku=sku, name="Coffee", price=money, stock_qty=10)
```

#### Anki

```python
from datetime import date
from soigia.domain import anki

deck = anki.Deck(name=anki.DeckName("Japanese"))
card = anki.Card(front=anki.CardText("猫"), back=anki.CardText("cat"))
deck.add_card(card)

service = anki.ReviewService()
service.review(card, anki.ReviewRating.GOOD, reviewed_on=date(2026, 4, 4))
```

#### CRM

```python
from decimal import Decimal
from soigia.domain import crm

stage = crm.Stage(name="Qualification", sequence=10, probability=Decimal("0.40"))
lead = crm.Lead(name="Website inquiry", stage=stage)
workflow = crm.LeadWorkflowService()
workflow.qualify(lead, crm.Stage(name="Proposal", sequence=20, probability=Decimal("0.75")))
```

#### Sales

```python
from decimal import Decimal
from soigia.domain import common, sales

customer = common.Customer(name="Alice")
product = common.Product(
    sku=common.SKU("COFFEE-001"),
    name="Coffee",
    price=common.Money(Decimal("50000")),
    stock_qty=100,
)

order = sales.SalesOrder(customer=customer)
order.add_line(product=product, quantity=common.Quantity(2))
workflow = sales.SalesWorkflowService()
workflow.confirm(order)
```

#### Purchases

```python
from decimal import Decimal
from soigia.domain import common, purchases

supplier = purchases.Supplier(name="ACME Supplies")
product = common.Product(
    sku=common.SKU("PAPER-A4"),
    name="Paper",
    price=common.Money(Decimal("20000")),
    stock_qty=500,
)

order = purchases.PurchaseOrder(supplier=supplier)
order.add_line(product=product, quantity=common.Quantity(10))
workflow = purchases.PurchaseWorkflowService()
workflow.confirm(order)
```

#### Obsidian

```python
from datetime import date
from soigia.domain import obsidian

vault = obsidian.Vault(name=obsidian.VaultName("Main"))
note = obsidian.Note(
    title=obsidian.NoteTitle("Daily Note"),
    path=obsidian.NotePath("Daily/2026-04-04.md"),
    content="# 2026-04-04\n- [ ] Follow up",
)

vault.add_note(note)
obsidian.TaskService().sync_note_tasks(note)
obsidian.VaultService().create_daily_note(vault, on_date=date(2026, 4, 4))
```

### Domain API notes

- `common` is the shared core. Keep pricing, stock, money, quantity, and SKU rules there.
- `anki` is for spaced repetition. `ReviewService` updates the card state and review history.
- `crm` is for leads and opportunities. `LeadWorkflowService` moves stages and refreshes scoring.
- `sales` is for sales orders. It owns sales state transitions and pricing behavior.
- `purchases` is for purchase orders. It owns approval logic and purchase workflow transitions.
- `obsidian` is for note-based knowledge work. It owns note, vault, task, template, and canvas behavior.

### Domain invariants to remember

- Do not put database code, API calls, or UI code inside `soigia.domain`.
- Keep entity methods focused on business state and invariant enforcement.
- Keep services focused on multi-object workflow orchestration.
- Keep policy objects focused on rule selection and transition decisions.
- Keep tests outside the runtime package in `tests/domain`.
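
To show what "business state and invariant enforcement" can look like in an entity, here is a minimal sketch. `Order`, `OrderState`, and `OrderError` are hypothetical names for illustration; the specific rules are assumptions and may not match `sales.SalesOrder`.

```python
from enum import Enum


class OrderState(Enum):
    DRAFT = "draft"
    CONFIRMED = "confirmed"


class OrderError(Exception):
    """Hypothetical domain-specific error."""


class Order:
    """Hypothetical entity that guards its own state transitions."""

    def __init__(self):
        self.state = OrderState.DRAFT
        self.lines = []

    def add_line(self, sku, quantity):
        # Invariant: only draft orders accept new lines.
        if self.state is not OrderState.DRAFT:
            raise OrderError("lines can only be added to a draft order")
        if quantity <= 0:
            raise OrderError("quantity must be positive")
        self.lines.append((sku, quantity))

    def confirm(self):
        # Invariant: an empty order cannot be confirmed.
        if not self.lines:
            raise OrderError("cannot confirm an order without lines")
        self.state = OrderState.CONFIRMED


order = Order()
order.add_line("COFFEE-001", 2)
order.confirm()
print(order.state.value)  # confirmed
```

The entity contains no I/O, so a test for it needs no database, API client, or UI.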
