Metadata-Version: 2.1
Name: dtool-lookup-server-direct-mongo-plugin
Version: 0.2.0
Summary: This plugin allows to submit mongo queries and aggregation
Home-page: https://github.com/IMTEK-Simulation/dtool-lookup-server-direct-mongo-plugin
Download-URL: https://github.com/IMTEK-Simulation/dtool-lookup-server-direct-mongo-plugin/tarball/0.2.0
Author: Johannes Hörmann
Author-email: johannes.hoermann@imtek.uni-freiburg.de
License: MIT
License-File: LICENSE

Dtool Lookup Server Direct Mongo Plugin
=======================================

.. image:: https://img.shields.io/github/actions/workflow/status/livMatS/dtool-lookup-server-direct-mongo-plugin/test.yml?branch=main
    :target: https://github.com/livMatS/dtool-lookup-server-direct-mongo-plugin/actions/workflows/test.yml
    :alt: GitHub Workflow Status
.. image:: https://img.shields.io/pypi/v/dtool-lookup-server-direct-mongo-plugin
    :alt: PyPI
    :target: https://pypi.org/project/dtool-lookup-server-direct-mongo-plugin/
.. image:: https://img.shields.io/github/v/tag/livMatS/dtool-lookup-server-direct-mongo-plugin
    :alt: GitHub tag (latest by date)
    :target: https://github.com/livMatS/dtool-lookup-server-direct-mongo-plugin/tags
    
- GitHub: https://github.com/livMatS/dtool-lookup-server-direct-mongo-plugin
- PyPI: https://pypi.org/project/dtool-lookup-server-direct-mongo-plugin/
- Free software: MIT License


Features
--------

- Query datasets using the MongoDB query language
- Funnel datasets through aggregation pipelines


Introduction
------------

`dtool <https://dtool.readthedocs.io>`_ is a command line tool for packaging
data and metadata into a dataset. A dtool dataset manages data and metadata
without the need for a central database.

However, if one has to manage more than a hundred datasets it can be helpful
to have the datasets' metadata stored in a central server to enable one to
quickly find datasets of interest.

The `dtool-lookup-server <https://github.com/jic-dtool/dtool-lookup-server>`_
provides a web API for registering datasets' metadata
and provides functionality to look up, list and search for datasets.

This plugin allows submitting plain MongoDB queries and aggregation pipelines
directly to the lookup server.


Configuration
-------------

Inform this plugin about the Mongo database to use by setting the environment
variables::

    export MONGO_URI="mongodb://localhost:27017/"
    export MONGO_DB="dtool_lookup_server"
    export MONGO_COLLECTION="metadata"

If the Mongo search and retrieve plugins are used, then you may use the same
database, but must use a different collection.

Use::

    export ALLOW_DIRECT_QUERY=true
    export ALLOW_DIRECT_AGGREGATION=false

to enable or disable direct mongo queries and aggregations in this plugin.

ATTENTION: While direct queries respect per-user access rights to database
entries at the lookup server level, aggregation pipelines are, by design, not
guaranteed to do so. Do not enable direct aggregation in a production
environment.
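
Before starting the server, it can help to check which of the variables above
are actually set. The following is a minimal sketch, not part of the plugin:
the helper name ``summarize_settings`` is hypothetical, and treating only the
string ``true`` (in any letter case) as "enabled" is an assumption about how
the flags are interpreted.

.. code-block:: python

    import os

    MONGO_KEYS = ("MONGO_URI", "MONGO_DB", "MONGO_COLLECTION")
    FLAG_KEYS = ("ALLOW_DIRECT_QUERY", "ALLOW_DIRECT_AGGREGATION")


    def summarize_settings(environ=None):
        """Return (settings, missing) for the variables documented above."""
        if environ is None:
            environ = os.environ
        # Names of documented variables that are not set at all.
        missing = [key for key in MONGO_KEYS + FLAG_KEYS if key not in environ]
        # Connection settings are passed through unchanged.
        settings = {key: environ[key] for key in MONGO_KEYS if key in environ}
        for key in FLAG_KEYS:
            if key in environ:
                # Assumption: only "true" (any letter case) enables a feature.
                settings[key] = environ[key].strip().lower() == "true"
        return settings, missing

Calling ``summarize_settings()`` without arguments inspects the current process
environment; passing a plain dictionary allows checking a configuration
without exporting anything.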

Authentication
--------------

The dtool lookup server uses the ``Authorization`` header to pass through the
JSON web token for authorization. Below we create shell variables for the
token and the header used in the following ``curl`` command samples::

    $ TOKEN=$(flask user token test-user)
    $ HEADER="Authorization: Bearer $TOKEN"

Refer to the core documentation of `dtool-lookup-server <https://github.com/jic-dtool/dtool-lookup-server>`_ for more information.

Direct query
------------

To look for a specific field ``key2: 42`` in a dataset's README.yml (provided
the file is properly YAML-formatted), use::

    $ curl -H "$HEADER" -H "Content-Type: application/json" -X POST \
        -d '{"query": {"readme.key2": 42}}' http://localhost:5000/mongo/query

Response content::

    [
      {
        "base_uri": "s3://test-bucket",
        "created_at": 1683797360.056,
        "creator_username": "jotelha",
        "dtoolcore_version": "3.18.2",
        "frozen_at": 1683797362.855,
        "name": "test_dataset_2",
        "number_of_items": 1,
        "size_in_bytes": 19347,
        "tags": [],
        "type": "dataset",
        "uri": "s3://test-bucket/26785c2a-e8f8-46bf-82a1-cec92dbdf28f",
        "uuid": "26785c2a-e8f8-46bf-82a1-cec92dbdf28f"
      }
    ]
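
The same request can be issued from Python. The following is a minimal sketch
using only the standard library. The helper names ``build_query_request`` and
``send_query`` are hypothetical, and the server address is assumed to be the
one used in the ``curl`` example.

.. code-block:: python

    import json
    import urllib.request


    def build_query_request(token, query, base_url="http://localhost:5000"):
        """Prepare a POST request for the /mongo/query route."""
        body = json.dumps({"query": query}).encode("utf-8")
        return urllib.request.Request(
            url=base_url + "/mongo/query",
            data=body,
            headers={
                # Same header as $HEADER in the curl samples.
                "Authorization": "Bearer " + token,
                "Content-Type": "application/json",
            },
            method="POST",
        )


    def send_query(token, query, base_url="http://localhost:5000"):
        """Send the query and return the decoded JSON response."""
        request = build_query_request(token, query, base_url)
        with urllib.request.urlopen(request) as response:
            return json.load(response)

With a running server and a valid token,
``send_query(token, {"readme.key2": 42})`` should return a list of dataset
entries like the one shown in the response above.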


Direct aggregation
------------------

The following example of an aggregation pipeline identifies
and counts instances of the same dataset at different base URIs::

    $ curl -H "$HEADER" -H "Content-Type: application/json" -X POST \
        -d '{"aggregation": [
                {
                    "$sort": {"base_uri": 1}
                }, {
                    "$group":  {
                        "_id": "$name",
                        "count": {"$sum": 1},
                        "available_at": {"$push": "$base_uri"}
                    }
                }, {
                    "$project": {
                        "name": "$_id",
                        "count": true,
                        "available_at": true,
                        "_id": false
                    }
                }, {
                    "$sort": {"name": 1}
                }
            ]
        }' http://localhost:5000/mongo/aggregate

Response content::

    [
      {
        "available_at": [
          "s3://test-bucket"
        ],
        "count": 1,
        "name": "test_dataset_1"
      },
      {
        "available_at": [
          "s3://test-bucket",
          "smb://test-share"
        ],
        "count": 2,
        "name": "test_dataset_2"
      }
    ]
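
To see what each pipeline stage contributes, the same steps can be emulated in
plain Python on a list of dataset entries. This is only an illustration of the
pipeline's semantics, not how MongoDB or the plugin executes it. The function
name ``count_dataset_instances`` and the sample entries are hypothetical and
chosen to mirror the response above.

.. code-block:: python

    from collections import defaultdict


    def count_dataset_instances(datasets):
        """Emulate the $sort, $group, $project, $sort stages locally."""
        # Stage 1: {"$sort": {"base_uri": 1}}
        ordered = sorted(datasets, key=lambda entry: entry["base_uri"])
        # Stage 2: {"$group": ...} collects the base URIs of each dataset name.
        groups = defaultdict(list)
        for entry in ordered:
            groups[entry["name"]].append(entry["base_uri"])
        # Stages 3 and 4: rename "_id" to "name", then sort by name.
        return [
            {"available_at": uris, "count": len(uris), "name": name}
            for name, uris in sorted(groups.items())
        ]


    # Hypothetical sample entries, reduced to the two fields the pipeline uses.
    sample = [
        {"name": "test_dataset_2", "base_uri": "smb://test-share"},
        {"name": "test_dataset_1", "base_uri": "s3://test-bucket"},
        {"name": "test_dataset_2", "base_uri": "s3://test-bucket"},
    ]
    result = count_dataset_instances(sample)

For this sample, ``result`` has the same content as the response shown above:
one instance of ``test_dataset_1`` and two instances of ``test_dataset_2``.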


Testing
-------

Running unit tests with ``pytest`` requires a working lookup server installation
and the availability of required services such as databases. Please refer to
the documentation of the core
`dtool-lookup-server <https://github.com/jic-dtool/dtool-lookup-server>`_
for setup instructions.
