Metadata-Version: 2.1
Name: dynamodb-traverse
Version: 0.1.2
Summary: High performance, thread safe traversing tool for AWS DynamoDB
Home-page: https://github.com/holyshipt/dynamodb_traverse
Author: Lawrence He
Author-email: ruyangmao1001@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# dynamodb-traverse
High performance, thread safe, hackable traversing tool for AWS DynamoDB based on aioboto3.
<p align="center">
<a href="https://travis-ci/holyshipt/dynamodb_traverse"><img alt="Build Status" src="https://travis-ci.org/holyshipt/dynamodb_traverse.svg?branch=master"></a>
</p>

### Installation/Uninstallation
Prerequisite: python 3.8+ and aioboto3>=6.4.1 (bleeding edge)

Run following command to install requirements:

`pip install aioboto3`

Next, install dynamodb-traverse by running:

`pip install dynamodb-traverse`

To uninstall dynamodb-traverse, run:

`pip uninstall dynamodb-traverse`

### Usage

Here's a sample program to traverse through a demo table called "default":
```python

import asyncio
import time

from dynamodb_migration.counter import AsyncCounter
from dynamodb_migration.main import DynamoDBClient, cardinality
import dynamodb_migration.ddb_const as cst

if __name__ == '__main__':
    loop = asyncio.get_event_loop()

    # dynamodb-migration will look at your aws credentials defined at ~/.aws/credentials and pick up a profile named 'prod-api'
    # make sure it's correctly align to your environment
    client = DynamoDBClient(queue=asyncio.Queue(loop=loop), **{'profile': 'prod-api'})

    config_table_name = 'default'

    if True:
        # a counter to keep counting items being scanned
        counter = AsyncCounter()

        # here we define a coroutine as main entrance of our program
        async def main():

            # async context manager for dynamodb 
            async with client.get_dynamodb() as dynamodb:
                table = dynamodb.Table(config_table_name)

                # check API document section for complete schema of the parameters 
                await client.traverse(
                    **{
                        cst.PRODUCER: {
                            cst.SOURCE_TABLE: table,
                            cst.THREAD_COUNT: 3,
                            cst.SCAN_BATCH_SIZE: 35,
                            # INDEXNAME: 'indexname'
                        },
                        cst.CONSUMER: {
                            cst.THREAD_COUNT: 3,
                            cst.FUNCTION: cardinality,
                            cst.TIMEOUT: 3,
                            cst.ARGS: [AsyncCounter()]
                        }
                    }
                )

        start = time.perf_counter()
        loop.run_until_complete(main())
        elapsed = time.perf_counter() - start
        loop.close()
        print(f'finished in {elapsed:0.5f} sec')
```

### create client
```
client = DynamoDBClient(
            queue=asyncio.Queue(loop=event_loop), 
            profile='string',
            log_file='string',
            local='boolean',
            **kwargs
         )
```

#### Parameters
* queue (Queue) [REQUIRED] - (async) in memory buffer queue 
    * event_loop (loop) [REQUIRED] - if use async queue, a loop need to be specified
* profile (string) [REQUIRED] - name of aws profile to use, which is defined in `.aws/credentials`
* local (boolean) [OPTIONAL] - a flag to indicate if it's local or prod env. Default to True.
* kwargs [OPTIONAL] - check [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#client) for advanced usage

### traverse
```
 client.traverse(**{
       'producer': {
           'source': 'string',
           'TotalSegments': 'number',
           'Limit': 'number',
           'IndexName': 'string',
            **kwargs
        },
       'consumer': {
           'TotalSegments': 'number',
           'function': 'function_label',
           'timeout': 'number',
           'args': 'list',
           **kwargs
        }
 })
```

#### Parameters
* producer (hash) [REQUIRED] - a hash describing the producer thread
    * source (string) [REQUIRED] - name of the source table in dynamodb
    * TotalSegments (number) [REQUIRED] - same in [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.scan)
    * Limit (number) [REQUIRED] - same in [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.scan)
    * IndexName (string) [OPTIONAL] - name of the source table index. If specified, we are scanning data from target index, instead of full table. 
    * kwargs (OPTIONAL) check [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.scan) for more advanced usage.

* consumer (hash) [REQUIRED] - a hash describing the consumer thread
    * TotalSegments (number) - same in [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.scan)
    * function (function_label) - pass a function to this consumer!
    * args (list) - pass a list of args to the function you just supplied. currently we only support position based args
    * timeout (number) - how many second should consumer wait if there's no work load available to it

### Benchmark (in progress)

### Road map
We are currently working on a distributed traversing tool that takes traversing tasks execution to next level. So stay tuned!  




