Metadata-Version: 2.1
Name: stepping
Version: 0.0.1
Summary: Incremental View Maintenance for Python backends
Author-email: Oliver Russell <ojhrussell@gmail.com>
Maintainer-email: Oliver Russell <ojhrussell@gmail.com>
Keywords: IVM,postgres
Classifier: Programming Language :: Python
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: immutables (>=0.19)
Requires-Dist: mashumaro (>=3.5)
Requires-Dist: mypy (>=1.2.0)
Requires-Dist: psycopg (>=3.1.8)
Requires-Dist: psycopg-binary (>=3.1.8)
Requires-Dist: psycopg-pool (>=3.1.7)
Requires-Dist: tabulate (>=0.9.0)
Provides-Extra: dev
Requires-Dist: testing.postgresql (>=1.3.0) ; extra == 'dev'
Requires-Dist: pytest (>=7.2.2) ; extra == 'dev'
Requires-Dist: autoflake (>=2.0.2) ; extra == 'dev'
Requires-Dist: black (>=23.1.0) ; extra == 'dev'
Requires-Dist: icdiff (>=2.0.6) ; extra == 'dev'
Requires-Dist: isort (>=5.12.0) ; extra == 'dev'
Requires-Dist: prettyprinter (>=0.18.0) ; extra == 'dev'
Requires-Dist: pydot (>=1.4.2) ; extra == 'dev'
Requires-Dist: snakeviz (>=2.1.1) ; extra == 'dev'

# `stepping`

Based on the paper: [DBSP: Automatic Incremental View Maintenance for Rich Query Languages](https://github.com/vmware/database-stream-processor/blob/e6cdbb538bbce8adb90018ff75f8ae8251b3e206/doc/theory/main.pdf).
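
As rough orientation, the paper models tables as Z-sets, where each row carries an integer weight (negative for deletions), and treats updates as a stream of such deltas. The sketch below is our own simplification, not `stepping`'s data structures or API. It uses a plain `dict` as a Z-set to show the property that makes incremental maintenance cheap for linear operators such as a filter: applying the operator to each delta and then integrating (summing) the outputs gives the same result as applying it to the fully integrated table.

```python
from collections.abc import Callable, Hashable, Iterable

# A Z-set, simplified here to a dict mapping row -> integer weight.
# Positive weights are insertions, negative weights are deletions.
ZSet = dict[Hashable, int]


def add(a: ZSet, b: ZSet) -> ZSet:
    """Z-set addition: sum weights, dropping rows whose weight becomes 0."""
    out = dict(a)
    for row, w in b.items():
        out[row] = out.get(row, 0) + w
        if out[row] == 0:
            del out[row]
    return out


def integrate(deltas: Iterable[ZSet]) -> ZSet:
    """Integration: the running sum of a stream of deltas (final value only)."""
    total: ZSet = {}
    for d in deltas:
        total = add(total, d)
    return total


def filter_z(pred: Callable[[Hashable], bool], z: ZSet) -> ZSet:
    """Filter is linear: it keeps matching rows with their weights unchanged."""
    return {row: w for row, w in z.items() if pred(row)}


def is_adult(row: Hashable) -> bool:
    return row[1] >= 18  # rows are hypothetical (name, age) tuples


deltas: list[ZSet] = [
    {("alice", 30): 1, ("bob", 17): 1},  # insert two rows
    {("carol", 45): 1},                  # insert one row
    {("alice", 30): -1},                 # delete alice
]

# Non-incremental: integrate the whole table, then filter it.
full = filter_z(is_adult, integrate(deltas))
# Incremental: filter each small delta, then integrate the outputs.
incremental = integrate(filter_z(is_adult, d) for d in deltas)
```

Both `full` and `incremental` end up as `{("carol", 45): 1}`, but the incremental path only ever touches the rows in each delta.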

## Installation

```bash
pip install stepping
```

### Development installation

```bash
git clone git@github.com:leontrolski/stepping.git
cd stepping
python -m venv .env
source .env/bin/activate
pip install -e '.[dev]'
pytest
mypy src tests
```

# Internals

## Todos

### Ergonomics

- Rename to `stepping` and namespace to `stepping/db` or something.
- Think of a nice way to implement nested collections with efficient operations.
- Wrap `run.iteration` with further nice interface.
- Is there anything funky we can do, like the `immerframe` lib?
- Add a commit timestamp for the tables for a future API.
- Write everything up and email the authors of the original DBSP paper.

### Operator level

- Allow for (and test) arbitrary depth grouped nesting and joining in a grouped setting - is this necessary, or can it always just be achieved outside the group?
- Look at section 11.8, "Window aggregates", of the DBSP paper.
- Change `name: str` everywhere to be `provenance: Provenance`
- Change `transform.finalize` to be like `with freshly_numbered_vertices():` and namespace tables. (There is `reset_vertex_counter(...)` now if that helps).
- Simplify `haitch`.
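
One possible shape for the proposed `with freshly_numbered_vertices():` interface is sketched below. It assumes a `contextvars`-based vertex counter; `_vertex_counter` and `next_vertex_id` are hypothetical names for illustration, not `stepping`'s internals, and it does not call the existing `reset_vertex_counter(...)`, whose signature is not shown here. Inside the block, vertex IDs restart from zero, and the outer numbering resumes once the block exits.

```python
import contextlib
import itertools
from collections.abc import Iterator
from contextvars import ContextVar

# Hypothetical global vertex counter (stepping's real counter may differ).
_vertex_counter: ContextVar[Iterator[int]] = ContextVar(
    "_vertex_counter", default=itertools.count()
)


def next_vertex_id() -> int:
    """Hand out the next vertex ID from whichever counter is currently active."""
    return next(_vertex_counter.get())


@contextlib.contextmanager
def freshly_numbered_vertices() -> Iterator[None]:
    """Use a brand-new counter inside the block, then restore the previous one."""
    token = _vertex_counter.set(itertools.count())
    try:
        yield
    finally:
        _vertex_counter.reset(token)


outer_a = next_vertex_id()
with freshly_numbered_vertices():
    inner = [next_vertex_id() for _ in range(3)]  # [0, 1, 2]
outer_b = next_vertex_id()  # the outer counter carries on from outer_a
```

Using a `ContextVar` rather than a bare module global keeps the fresh numbering scoped to the current context, which may matter if graphs are ever built concurrently.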

### Types level

- `s/T/TSerializable/`
- Add a `maybe` function that allows for `pick_index(Left, maybe(left.a).foo)`.
- `s/[T, K]/[K, T]/` everywhere.

## Performance

- In a fairly basic test (1 million reads, two joins, group by date), inserts are much slower with stepping than with the baseline, but querying the integrated data set takes `0.0003s` as opposed to `0.5s`. (Details are in `test_profile_cute.py`.)
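
A rough illustration of this trade-off follows. It is our own simplification, not `stepping`'s implementation, and the `(read_id, date)` rows are hypothetical. A count-per-date view is updated from each insert/delete delta as it arrives, so a query is a dictionary lookup, while the non-incremental `recompute` scans every row on each query. The extra per-insert work of updating the view is where the slower insert times come from.

```python
from collections.abc import Iterable

Row = tuple[int, str]  # hypothetical (read_id, date) row


class CountByDate:
    """Keeps count-per-date current from deltas, so reads are dict lookups."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def apply(self, delta: dict[Row, int]) -> None:
        # Work is proportional to the size of the delta, not of the whole table.
        for (_, date), weight in delta.items():
            new = self.counts.get(date, 0) + weight
            if new:
                self.counts[date] = new
            else:
                self.counts.pop(date, None)

    def read(self, date: str) -> int:
        return self.counts.get(date, 0)


def recompute(rows: Iterable[Row]) -> dict[str, int]:
    """Non-incremental baseline: scan every row for each query."""
    counts: dict[str, int] = {}
    for _, date in rows:
        counts[date] = counts.get(date, 0) + 1
    return counts


view = CountByDate()
table: set[Row] = set()  # the raw rows, kept only to check against recompute
deltas: list[dict[Row, int]] = [
    {(1, "2023-01-01"): 1, (2, "2023-01-01"): 1, (3, "2023-01-02"): 1},
    {(2, "2023-01-01"): -1, (4, "2023-01-02"): 1},  # delete row 2, insert row 4
]
for delta in deltas:
    view.apply(delta)
    for row, weight in delta.items():
        if weight > 0:
            table.add(row)
        else:
            table.discard(row)
```

After both deltas, `view.counts` and `recompute(table)` agree on `{"2023-01-01": 1, "2023-01-02": 2}`.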
