Metadata-Version: 2.4
Name: antarctic
Version: 0.9.9
Summary: ...
Project-URL: repository, https://github.com/tschm/antarctic
Project-URL: homepage, https://tschm.github.io/antarctic/book
Author-email: Thomas Schmelzer <thomas.schmelzer@gmail.com>
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: fastparquet>=0.8.0
Requires-Dist: mongoengine>=0.25.0
Requires-Dist: pandas>=2.0
Requires-Dist: polars>=1.37.1
Requires-Dist: pyarrow>=22.0.0
Description-Content-Type: text/markdown

# [Antarctic](https://tschm.github.io/antarctic)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![PyPI Downloads](https://img.shields.io/pypi/dm/antarctic)](https://pypi.org/project/antarctic/)
[![Coverage](https://img.shields.io/endpoint?url=https://tschm.github.io/antarctic/tests/coverage-badge.json)](https://tschm.github.io/antarctic/tests/html-coverage/index.html)
[![Book](https://github.com/tschm/antarctic/actions/workflows/rhiza_book.yml/badge.svg)](https://github.com/tschm/antarctic/actions/workflows/rhiza_book.yml)
[![Release](https://github.com/tschm/antarctic/workflows/Release/badge.svg)](https://github.com/tschm/antarctic/actions/)
[![CodeFactor](https://www.codefactor.io/repository/github/tschm/antarctic/badge)](https://www.codefactor.io/repository/github/tschm/antarctic)
[![Renovate enabled](https://img.shields.io/badge/renovate-enabled-brightgreen.svg)](https://github.com/renovatebot/renovate)

A project to persist Pandas and Polars data structures in a MongoDB database.

## Installation

```bash
pip install antarctic
```

## Usage

This project (unlike the popular arctic project, which I admire)
is built on top of [MongoEngine](https://pypi.org/project/mongoengine/).
MongoEngine is a Document-Object Mapper (an ORM for document databases) for MongoDB,
which stores data as documents.
We introduce new fields and extend the Document class
to make Antarctic a convenient choice for storing Pandas and Polars (time series) data.

### PandasField

We first introduce the PandasField for storing Pandas DataFrames.

```python
import mongomock
import pandas as pd
import numpy as np

from mongoengine import Document, connect
from antarctic.pandas_field import PandasField

# Connect to your existing MongoDB.
# (Here we use mongomock, a popular library that mocks a MongoDB server.)
client = connect('mongoenginetest',
                  host='mongodb://localhost',
                  mongo_client_class=mongomock.MongoClient,
                  uuidRepresentation="standard")

# Define the blueprint for a portfolio document
class Portfolio(Document):
    nav = PandasField()
    weights = PandasField()
    prices = PandasField()

```

Portfolio objects work exactly the way you would expect:

```python
data = pd.read_csv("tests/test_antarctic/resources/price.csv", index_col=0, parse_dates=True)

p = Portfolio()
p.nav = data["A"].to_frame(name="nav")
p.prices = data[["B", "C", "D"]]  # any pd.DataFrame works here
portfolio = p.save()

nav = p.nav["nav"]
prices = p.prices

```

Behind the scenes, the DataFrame objects are converted
into Parquet byte streams and
stored in the MongoDB database.

Since Parquet is a language-agnostic format, the stored data should also be readable from R.

### PolarsField

Antarctic also supports storing Polars DataFrames using the PolarsField.

```python
import polars as pl
from mongoengine import Document, StringField
from antarctic.polars_field import PolarsField

class Artist(Document):
    name = StringField(unique=True, required=True)
    data = PolarsField()

```

The PolarsField works similarly to PandasField:

```python
a = Artist(name="Artist1")
a.data = pl.DataFrame({"A": [2.0, 2.0], "B": [2.0, 2.0]})
a.save()

# Retrieve the data
df = a.data

```

PolarsField uses zstd compression by default for efficient storage,
but you can specify other compression algorithms:

```python
class CustomArtist(Document):
    name = StringField(unique=True, required=True)
    data = PolarsField(compression="snappy")  # Options: lz4, uncompressed, snappy, gzip, brotli, zstd

```

### XDocument

In most cases we store many similar documents,
e.g. a collection of Portfolios or Symbols rather than a single Portfolio or Symbol.
For this purpose we have developed the abstract `XDocument` class,
built on the Document class of MongoEngine.
It provides convenient tools to simplify looping
over all, or a subset, of the documents of the same type, e.g.:

```python
from antarctic.document import XDocument
from antarctic.pandas_field import PandasField

class Symbol(XDocument):
    price = PandasField()

```

We define a few symbols and assign a price to each of them (or to some of them):

```python
s1 = Symbol(name="A", price=data["A"].to_frame(name="price")).save()
s2 = Symbol(name="B", price=data["B"].to_frame(name="price")).save()

# We can access subsets like
for symbol in Symbol.subset(names=["B"]):
    _ = symbol  # no-op: avoid printing during tests

# often we need a dictionary of Symbols:
symbols = Symbol.to_dict(objects=[s1, s2])

# Each XDocument also provides a field for reference data:
s1.reference["MyProp1"] = "ABC"
s2.reference["MyProp2"] = "BCD"

# You can loop over (subsets of) Symbols and extract reference and/or series data
_reference = Symbol.reference_frame(objects=[s1, s2])
_frame = Symbol.frame(series="price", key="price")
_applied = list(Symbol.apply(func=lambda x: x.price["price"].mean(), default=np.nan))

```

The XDocument class exposes DataFrames for both reference and time series data.
Its `apply` method applies a function to all documents (or to a subset of them).
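To make the shapes of these results concrete, here is a plain-pandas analogue that uses no database at all. The `documents` dict and the `apply` helper are hypothetical stand-ins, not Antarctic's classes. We assume that reference data becomes one row per document, that time series become one column per document, and that `default` is returned when the function fails for a document; check the `XDocument` source for the actual behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two Symbol documents (made-up values).
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
documents = {
    "A": {
        "reference": {"MyProp1": "ABC"},
        "price": pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=dates),
    },
    "B": {
        "reference": {"MyProp2": "BCD"},
        "price": pd.DataFrame({"price": [4.0, 5.0, 6.0]}, index=dates),
    },
}

# Reference data: one row per document, one column per reference key;
# keys a document lacks become NaN.
reference = pd.DataFrame({name: doc["reference"] for name, doc in documents.items()}).T

# Time series data: one column per document, aligned on the shared index.
frame = pd.DataFrame({name: doc["price"]["price"] for name, doc in documents.items()})


def apply(func, default=np.nan):
    """Hypothetical helper: yield (name, result), using `default` if func raises."""
    for name, doc in documents.items():
        try:
            yield name, func(doc)
        except Exception:
            yield name, default


means = dict(apply(lambda doc: doc["price"]["price"].mean()))
print(means)  # {'A': 2.0, 'B': 5.0}
```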

### Database vs. Datastore

Storing JSON or byte-stream representations of Pandas objects
does not make a database. Appending is expensive: one has
to extract the original Pandas object, append to it, and convert
the new object back into a JSON or byte-stream representation.
Clever sharding can mitigate such effects, but at the end of the day
you shouldn't update such objects too often. Practitioners often
use a small database for recording recent data (e.g. the last 24 hours) and
update the MongoDB database once a day. Reading the Pandas objects
out of such a construction is extremely fast.

Such concepts are often called DataStores.

## uv

Running

```bash
make install
```

will install [uv](https://github.com/astral-sh/uv) and create
the virtual environment defined in
`pyproject.toml` and locked in `uv.lock`.

## marimo

We install [marimo](https://marimo.io) on the fly within the aforementioned
virtual environment. Executing

```bash
make marimo
```

will install and start marimo.
