Metadata-Version: 2.1
Name: pbumongo
Version: 1.3.0
Summary: Basic MongoDB wrapper for object-oriented collection handling
Home-page: https://github.com/ilfrich/pbu-mongo
Author: Peter Ilfrich
Author-email: das-peter@gmx.de
License: Apache-2.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pymongo ==3.11.2
Requires-Dist: pbu >=1.1.17

# Python Basic Utilities - Mongo `pbumongo`

Available on [PyPI](https://pypi.org/project/pbumongo/)

**Table of Contents**

1. [Installation](#installation)
2. [Usage](#usage)
3. [Classes](#classes)
    1. [AbstractMongoStore](#abstractmongostore) - abstract class for handling MongoDB collection access
       1. [MongoConnection](#mongoconnection) - a helper class to assist with creating multiple store instances 
    2. [AbstractMongoDocument](#abstractmongodocument) - abstract class for wrapping MongoDB BSON documents
    3. [ProgressUpdater](#progressupdater) - a collection of classes to help with updating job progress 
4. [Archives](#archives)


## Installation

Install via pip:

```bash
pip install pbumongo
```

## Usage

It is good practice to associate a sub-class of `AbstractMongoDocument` with a sub-class of `AbstractMongoStore`. This is
done through the `deserialised_class` parameter in the `super()` constructor call of the store class. Any method for
querying documents will use that class to deserialise the BSON document into the provided class, which should extend
`AbstractMongoDocument`.

Example: let's say we want to implement access to a collection containing user documents. We'll define a class `User`
that extends `AbstractMongoDocument` and a class `UserStore` that extends `AbstractMongoStore`.

```python
# main imports
from pbumongo import AbstractMongoDocument, AbstractMongoStore
# supporting imports
import crypt  # Unix-only; deprecated since Python 3.11 and removed in Python 3.13
from typing import List, Optional
from time import time


# this is an example of a minimum viable class
class User(AbstractMongoDocument):
    def __init__(self):
        super().__init__()
        # define attributes with meaningful defaults
        self.username: Optional[str] = None
        self.password: Optional[str] = None
        self.permissions: List[str] = []
        self.last_login: int = 0

    def get_attribute_mapping(self) -> dict:
        # the values are what is used inside MongoDB documents
        return {
            "username": "username",
            "password": "password",
            "permissions": "permissions",
            "last_login": "lastLogin"
        }

    @staticmethod
    def from_json(json: dict):
        user = User()
        user.extract_system_fields(json)
        return user


class UserStore(AbstractMongoStore):
    def __init__(self, mongo_url, mongo_db, collection_name):
        super().__init__(mongo_url, mongo_db, collection_name, deserialised_class=User, data_model_version=1)

    def login(self, username, password) -> Optional[User]:
        user: Optional[User] = self.query_one({"username": username})
        # crypt.METHOD_MD5 generates a new random salt on every call, so hashing the password again would not
        # reproduce the stored hash; instead, re-hash the password using the stored hash as salt and compare
        if user is None or crypt.crypt(password, user.password) != user.password:
            return None
        # update last_login attribute and save it in database as well
        user.last_login = round(time())
        self.update_one(AbstractMongoStore.id_query(user.id),
                        AbstractMongoStore.set_update("lastLogin", user.last_login))
        return user

    def create_user(self, username, password) -> User:
        # check if this user already exists
        existing = self.query_one({"username": username})
        if existing is not None:
            raise ValueError(f"User with username '{username}' already exists.")
        # create new user object
        user = User()
        user.username = username
        user.password = crypt.crypt(password, crypt.METHOD_MD5)
        # store in database and return document
        user_id = self.create(user)
        return self.get(user_id)
```
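The attribute mapping returned by `get_attribute_mapping()` translates between Python attribute names and the keys stored in the MongoDB document (e.g. `last_login` is stored as `lastLogin`). The library-free sketch below illustrates that translation in both directions. `PlainUser`, `to_document` and `from_document` are hypothetical helpers, not part of `pbumongo` or `pbu`, and the assumption that missing keys simply keep the constructor defaults is mine.

```python
from typing import Any, Dict

# attribute name -> key used inside the MongoDB document (same mapping as User above)
USER_MAPPING: Dict[str, str] = {
    "username": "username",
    "password": "password",
    "permissions": "permissions",
    "last_login": "lastLogin",
}


class PlainUser:
    """Hypothetical stand-in for User, without any pbumongo dependency."""

    def __init__(self) -> None:
        self.username = None
        self.password = None
        self.permissions = []
        self.last_login = 0


def to_document(obj: Any, mapping: Dict[str, str]) -> dict:
    # read each attribute and store it under its document key
    return {doc_key: getattr(obj, attr) for attr, doc_key in mapping.items()}


def from_document(doc: dict, obj: Any, mapping: Dict[str, str]) -> Any:
    # only copy keys that are present, so missing keys keep the constructor defaults
    for attr, doc_key in mapping.items():
        if doc_key in doc:
            setattr(obj, attr, doc[doc_key])
    return obj


user = from_document({"username": "admin", "lastLogin": 1700000000}, PlainUser(), USER_MAPPING)
print(user.last_login)                               # 1700000000
print(user.permissions)                              # []
print(to_document(user, USER_MAPPING)["lastLogin"])  # 1700000000
```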

#### MongoConnection

To use these classes in your application, you can either use the `MongoConnection` helper or create the `UserStore`
instance directly. The `MongoConnection` helper is useful when you have many collections and don't want to repeat the
Mongo connection URL and DB name for every constructor.

```python
from pbumongo import MongoConnection
from mypackage import UserStore  # see implementation above

con = MongoConnection("mongodb://localhost:27017", "myDbName")
user_store = con.create_store(store_class=UserStore, collection_name="users")

user = user_store.login(username="admin", password="mypassword")
```
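The convenience of `MongoConnection` comes from binding the connection URL and DB name once and forwarding them to each store constructor. The following is a hypothetical sketch of that idea only; `ConnectionSketch` and `DummyStore` are illustrative names, and the real `pbumongo` implementation may differ. `DummyStore` just records its arguments, so nothing connects to MongoDB.

```python
from typing import Type, TypeVar

T = TypeVar("T")


class DummyStore:
    """Hypothetical store that only records its constructor arguments."""

    def __init__(self, mongo_url: str, mongo_db: str, collection_name: str) -> None:
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db
        self.collection_name = collection_name


class ConnectionSketch:
    """Binds URL and DB name once, then creates any number of stores."""

    def __init__(self, mongo_url: str, mongo_db: str) -> None:
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db

    def create_store(self, store_class: Type[T], collection_name: str) -> T:
        # forward the shared connection details plus the per-store collection name
        return store_class(self.mongo_url, self.mongo_db, collection_name)


con = ConnectionSketch("mongodb://localhost:27017", "myDbName")
users = con.create_store(store_class=DummyStore, collection_name="users")
invoices = con.create_store(store_class=DummyStore, collection_name="invoices")
```

The store constructor signature `(mongo_url, mongo_db, collection_name)` matches the `UserStore` example above, which is why a single factory method can serve every store class.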

## Classes

### `AbstractMongoStore`

This is an abstract class and cannot be instantiated directly. Instead, define a class that extends this class.

**Constructor**

`__init__(mongo_url, mongo_db, collection_name, deserialised_class, data_model_version=1, archive_store)`

- `mongo_url` - this is the Mongo connection URL containing the host, port and, optionally, username and password
- `mongo_db` - this is the Mongo DB name - the one you provide when using `use <dbname>` on the Mongo shell
- `collection_name` - the name of the collection - e.g. `myCollection` for `db.myCollection.find({})` on the Mongo shell
- `deserialised_class` - used for all the query methods to deserialise the BSON document into a class with attributes
  for easier access
- `data_model_version` - a number that can be used for database migrations as an app develops over time
- `archive_store` - another store instance used for backups/archives, see [Archives](#archives)

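How `data_model_version` drives a migration is left to the application. One possible approach is sketched below: a chain of upgrade functions applied until a document reaches the current version. This assumes each stored document records its version in a field called `dataModelVersion` (the field name `pbumongo` actually writes may differ), and the `MIGRATIONS` table and the schema changes in it are hypothetical.

```python
from typing import Callable, Dict

CURRENT_VERSION = 3


def _v1_to_v2(doc: dict) -> dict:
    # hypothetical change: v2 renamed "lastlogin" to "lastLogin"
    doc = dict(doc)
    doc["lastLogin"] = doc.pop("lastlogin", 0)
    return doc


def _v2_to_v3(doc: dict) -> dict:
    # hypothetical change: v3 added a "permissions" list
    doc = dict(doc)
    doc.setdefault("permissions", [])
    return doc


# maps a version to the function upgrading a document from that version to the next
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def migrate(doc: dict) -> dict:
    # documents without a version field are treated as version 1 (an assumption)
    version = doc.get("dataModelVersion", 1)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc["dataModelVersion"] = version
    return doc


old = {"username": "admin", "lastlogin": 42, "dataModelVersion": 1}
print(migrate(old))
# {'username': 'admin', 'dataModelVersion': 3, 'lastLogin': 42, 'permissions': []}
```

Each upgrade function copies the document first, so the input passed to `migrate` is never modified in place.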

**Methods**

- `get(doc_id: str)` - fetches a single document with a matching `doc_id == document["_id"]`
- `get_all()` - fetches the entire collection content and deserialises every document. Careful, this is not an iterator,
  but returns a `list` of all the documents and can consume quite a bit of compute and memory.
- `create(document)` - creates a new document and returns the `_id` of the newly created BSON document as a string. The
  `document` can be either a `dict` or an instance of the `deserialised_class` provided in the `super().__init__(..)` call.
  - Since version 1.0.1, `create(document, return_doc=True)` returns the entire document/object instead of just the
    `_id` of the newly created document.
- `query_one(query: dict)` - fetches a single document and deserialises it or returns `None` if no document can be found
- `query(query: dict, sorting, paging)` - fetches multiple documents and deserialises them. `sorting` can be an
  attribute name (as provided in the BSON) or a dictionary with the sort order. `paging` is an instance of
  `pbumongo.PagingInformation`.
- `update_one(query: dict, update: dict)` - proxies the `db.collection.updateOne(..)` function from the Mongo shell
- `update(query: dict, update: dict)` - same as `update_one`, but will update multiple documents, if the query matches
- `update_full(document)` - shortcut for updating the entire document with an updated version, the query will be
  constructed from the `id`/`_id` provided by the `document`.
- `delete(doc_id)` - deletes a single document with the provided document ID
- `delete_many(query: dict)` - deletes multiple documents matching the query.
- `set_archive_store(archive_store: AbstractMongoStore)` - pass another store instance used for backups/archives. This
  method should also be used to create indexes in the main store, see [Archives](#archives)
- `run_archive(options: Optional)` - can be overridden by the sub-class; by default it does nothing. `options` can be
  anything the implementing class needs

**Static Methods**

- `AbstractMongoStore.id_query(string_id: str)` - creates a query `{ "_id": ObjectId(string_id) }`, which can be used to
  query the database
- `AbstractMongoStore.set_update(keys, values)` - creates a `$set` update statement. If only a single attribute is
  updated, you can pass the key and value directly as parameters, e.g. updating a key `"checked": True` can be done by
  `.set_update("checked", True)`. If you update multiple attributes, provide keys and values as lists in matching order.
- `AbstractMongoStore.unset_update(keys)` - creates an `$unset` update statement with the attributes listed as `keys`.
  Similar to `.set_update`, you can provide a single key without a list for ease of use.
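The update helpers only build plain MongoDB update documents. The dependency-free sketch below shows the shapes such statements take; `build_set_update` and `build_unset_update` are hypothetical names, not the library's code. Treating a list value as a single value when `keys` is a plain string is an assumption, and so is the `""` placeholder in `$unset` entries (MongoDB ignores that value). `id_query` is left out because it needs `bson.ObjectId` from pymongo.

```python
from typing import Any, Sequence, Union


def build_set_update(keys: Union[str, Sequence[str]], values: Any) -> dict:
    if isinstance(keys, str):
        # single attribute: the value is used as-is, even if it is a list
        return {"$set": {keys: values}}
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    # multiple attributes: pair keys and values by position
    return {"$set": dict(zip(keys, values))}


def build_unset_update(keys: Union[str, Sequence[str]]) -> dict:
    key_list = [keys] if isinstance(keys, str) else list(keys)
    # MongoDB ignores the value of each $unset entry; "" is a common placeholder
    return {"$unset": {key: "" for key in key_list}}


print(build_set_update("checked", True))           # {'$set': {'checked': True}}
print(build_set_update(["a", "b"], [1, 2]))        # {'$set': {'a': 1, 'b': 2}}
print(build_unset_update("tmp"))                   # {'$unset': {'tmp': ''}}
```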
  
### `AbstractMongoDocument`

This is an abstract class and cannot be instantiated directly. Instead, define a class that extends this class.

**Constructor**

`__init__(doc_id=None, data_model_version=None)`

The parameters are entirely optional. Generally, it is recommended to use the static method `from_json(json: dict)` to
create instances from BSON documents loaded from the database, instead of calling the constructor. For new documents,
you would not provide the `_id`, as the store class handles that.

**Methods**

For methods and static methods please see the documentation of `JsonDocument` from `pbu`. `AbstractMongoDocument` 
extends that class.


### `ProgressUpdater`

The `ProgressUpdater` class is part of a set of classes that assist with keeping track of job progress. The other classes
are:

- `ProgressObject`: a database object with fields for a status (see `pbu` > `JobStatus`), start and end timestamp, 
 total count, processed count, a list of errors and a main error.
- `ProgressObjectStore`: an abstract class that provides store methods to update status, progress and errors of a 
 `ProgressObject`
- `ProgressError`: a JSON document containing an error message as well as a dictionary for data related to the error. 
 These objects will be appended to a `ProgressObject`'s `errors` list.
- `ProgressUpdater`: an object to pass into a processor, which holds references to the progress store and progress 
 object and provides methods for updating progress and handling errors.

Both `ProgressObject` and `ProgressObjectStore` are abstract classes and should be extended with the remaining attributes
of a process / job definition (like a name/label, extra configuration, etc.). `ProgressObject` is an
`AbstractMongoDocument` and `ProgressObjectStore` is an `AbstractMongoStore`.
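To make the division of responsibilities concrete, here is an in-memory sketch of the updater pattern: the processor only talks to the updater, which tracks counts and collects errors on the progress object. All names here (`ErrorEntry`, `InMemoryProgress`, `ProgressUpdaterSketch`, `increment`, `add_error`) are hypothetical, and the status strings are placeholders for `pbu`'s `JobStatus`. The real `ProgressUpdater` persists changes through a `ProgressObjectStore`.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ErrorEntry:
    # mirrors the idea of ProgressError: a message plus related data
    message: str
    data: Dict[str, Any] = field(default_factory=dict)


@dataclass
class InMemoryProgress:
    # mirrors the fields described for ProgressObject; status strings are placeholders
    status: str = "CREATED"
    start: Optional[float] = None
    end: Optional[float] = None
    total_count: int = 0
    processed_count: int = 0
    errors: List[ErrorEntry] = field(default_factory=list)
    main_error: Optional[str] = None


class ProgressUpdaterSketch:
    """Passed into a processor; holds the progress object and updates it."""

    def __init__(self, progress: InMemoryProgress) -> None:
        self.progress = progress

    def start(self, total_count: int) -> None:
        self.progress.status = "RUNNING"
        self.progress.start = time.time()
        self.progress.total_count = total_count

    def increment(self, count: int = 1) -> None:
        self.progress.processed_count += count

    def add_error(self, message: str, **data: Any) -> None:
        self.progress.errors.append(ErrorEntry(message, data))

    def finish(self, main_error: Optional[str] = None) -> None:
        self.progress.end = time.time()
        self.progress.main_error = main_error
        self.progress.status = "FAILED" if main_error else "COMPLETED"


def process(items: List[int], updater: ProgressUpdaterSketch) -> None:
    # the processor never touches the progress object directly
    updater.start(len(items))
    for item in items:
        if item < 0:
            updater.add_error("negative value", item=item)
        updater.increment()
    updater.finish()


progress = InMemoryProgress()
process([1, -2, 3], ProgressUpdaterSketch(progress))
```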

## Archives

Since 1.3.0, each `AbstractMongoStore` provides an interface for archives/backups with the following goals in mind:
- When a collection contains lots of documents and has several indexes to speed up queries, MongoDB's memory
  consumption can get quite high. In that case it can make sense to move older documents into a separate
  store/collection that is identical, but doesn't have these indexes.
- The main store should have access to its own archive.
- It's possible to provide a different store class as archive store.

Usage:
```python
from pbumongo import MongoConnection
from my_stores import InvoiceStore

con = MongoConnection("mongodb://localhost:27017", "myDbName")
invoice_store = con.create_store(
    store_class=InvoiceStore, 
    collection_name="invoices"
).set_archive_store(
    con.create_store(
        store_class=InvoiceStore, 
        collection_name="invoicesArchive"
    )
)

```

This creates two instances of `InvoiceStore`, each with its own collection name. The `invoice_store` (the main store)
knows about its archive store and can access it as `self.archive_store`.

The second instance of `InvoiceStore` (the archive store) can detect whether it is the archive by checking
`if self.archive_store is None`.

The `set_archive_store` method is only called for the main store, which makes it the best place to create indexes 
instead of doing this in the constructor.

```python
from pbumongo import AbstractMongoStore


class InvoiceStore(AbstractMongoStore):
    def set_archive_store(self, archive_store):
        # create/ensure your indexes on the main collection only; the field names are placeholders
        self.collection.create_index("customerId")
        self.collection.create_index("timestamp")

        # make sure to call super() - this will set self.archive_store
        return super().set_archive_store(archive_store)
```

You can pass an instance of a different store class as `archive_store`. I would not recommend doing this, as it
complicates things if you want to use the archive for lookups as well, in case your main store does not return any
results (e.g. for a start/end date query, see example below). This can however be mitigated by the archive store 
translating its own document structure into the structure of the main store.

By default, no other method uses the archive store; it is purely there for convenience, and so is the `run_archive()` method.

Archive lookups can be expensive and slow down a system. Be deliberate about when to allow access to the archive store,
and allow for longer query times when you do, since the archive collection shouldn't have the same indexes as the main store.

Example of query that uses the archive store:
```python
from pbumongo import AbstractMongoStore


class MyStore(AbstractMongoStore):
    def find_by_dates(self, start: int, end: int):
        query = {"timestamp": {"$gte": start, "$lte": end}}
        result = self.query(query)

        if len(result) == 0 and self.archive_store is not None:
            # regular query did not return anything, proxy the query to the archive store
            # (the archive store has no archive of its own, so this recursion stops there)
            return self.archive_store.find_by_dates(start, end)
        return result
```

You can also combine archive results with regular results, provided they map to the same object. The `start` parameter
in the above example is well suited for this: check whether `start < last_archive_date`.
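A minimal sketch of that combination follows. It assumes documents are plain dicts with a numeric `timestamp` and that everything older than a hypothetical `last_archive_date` has been moved to the archive. Plain lists stand in for the two collections so the logic runs without MongoDB, and `find_by_dates_combined` is an illustrative name, not part of `pbumongo`.

```python
from typing import List


def _in_range(docs: List[dict], start: int, end: int) -> List[dict]:
    # stand-in for self.query({"timestamp": {"$gte": start, "$lte": end}})
    return [doc for doc in docs if start <= doc["timestamp"] <= end]


def find_by_dates_combined(main_docs: List[dict], archive_docs: List[dict],
                           start: int, end: int, last_archive_date: int) -> List[dict]:
    result = _in_range(main_docs, start, end)
    if start < last_archive_date:
        # part of the requested range predates the archive cut-off, so consult the archive too;
        # nothing newer than the cut-off lives in the archive, so its range can be capped there
        archived = _in_range(archive_docs, start, min(end, last_archive_date))
        result = archived + result
    return sorted(result, key=lambda doc: doc["timestamp"])


main = [{"timestamp": 150}, {"timestamp": 200}]
archive = [{"timestamp": 50}, {"timestamp": 90}]
print(find_by_dates_combined(main, archive, 80, 160, last_archive_date=100))
# [{'timestamp': 90}, {'timestamp': 150}]
```

Because each document lives in exactly one of the two collections, concatenating both result lists cannot produce duplicates.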
