Metadata-Version: 2.1
Name: dask_elk
Version: 0.1.1
Summary: Dask connection with Elasticsearch
Home-page: https://github.com/avlahop/dask-elk
Maintainer: Apostolos Vlachopoulos
Maintainer-email: avlahop@gmail.com
License: GPLv3
Project-URL: Documentation, https://dask-elk.readthedocs.io
Description: [![CircleCI](https://circleci.com/gh/avlahop/dask-elk/tree/master.svg?style=svg)](https://circleci.com/gh/avlahop/dask-elk/tree/master)
        
        # dask-elk
        Use Dask to fetch data from Elasticsearch in parallel by sending the request to each shard separately.
        
        # Table of Contents
        1. [Introduction](#introduction)
        1. [Usage](#usage)
        
        ## Introduction <a name='introduction' />
        The library tries to imitate the functionality of the ES-Hadoop plugin for Spark. `dask-elk` performs a parallel read across all shards of the target indices.
        To achieve that, it uses the Elasticsearch scroll mechanism.
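        
        As a rough illustration of the idea of one scrolling request per shard, the sketch below builds one set of search arguments for each shard. It is not taken from the `dask-elk` source: the helper name `shard_search_kwargs` and its defaults are hypothetical, and it assumes the standard Elasticsearch search parameters `preference=_shards:<n>` (restrict a search to one shard) and `scroll` (keep a scroll context open). Each resulting dictionary could then be handed to a separate task that calls the search API of an Elasticsearch client.
        
        ```python
        def shard_search_kwargs(index, num_shards, query=None, scroll="5m", size=1000):
            """Build one set of search keyword arguments per shard (hypothetical helper)."""
            # Fall back to a match-all query when no query is pushed down.
            body = query if query is not None else {"query": {"match_all": {}}}
            return [
                {
                    "index": index,
                    "body": body,
                    "scroll": scroll,  # how long Elasticsearch keeps the scroll context alive
                    "size": size,      # number of hits returned per scroll page
                    "preference": "_shards:{}".format(shard),  # target a single shard
                }
                for shard in range(num_shards)
            ]
        
        
        requests = shard_search_kwargs("my-index", num_shards=3)
        for request in requests:
            print(request["preference"])
        ```
        
        Because each request touches a different shard, the reads are independent of each other and can run in parallel.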
        
        
        ## Usage <a name="usage" />
        To use the library and read from an index:
        
        ```python
        from dask_elk.client import DaskElasticClient
        
        # First create a client
        client = DaskElasticClient() # localhost Elasticsearch
        
        index = 'my-index'
        df = client.read(index=index, doc_type='_doc')
        ```
        
        You can also pass a query to push down to Elasticsearch, so that any filtering is done on the Elasticsearch side. Because `dask-elk` uses the scroll mechanism, aggregations are not supported.
        ```python
        from dask_elk.client import DaskElasticClient
        
        # First create a client
        client = DaskElasticClient() # localhost Elasticsearch
        query = {
            "query" : {
                "term" : { "user" : "kimchy" }
            }
        }
        index = 'my-index'
        df = client.read(query=query, index=index, doc_type='_doc')
        ```
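        
        Since aggregations are not supported, it can help to check a query body before passing it to `client.read`. The sketch below is one possible guard, not part of the `dask-elk` API: the names `contains_aggregations` and `check_scrollable` are hypothetical, and the check only assumes that Elasticsearch search bodies declare aggregations under the top-level `aggs` or `aggregations` key.
        
        ```python
        def contains_aggregations(query):
            """Return True if the search body requests aggregations at the top level."""
            return any(key in query for key in ("aggs", "aggregations"))
        
        
        def check_scrollable(query):
            """Return the query unchanged, or raise if it asks for aggregations."""
            if contains_aggregations(query):
                raise ValueError("aggregations cannot be combined with a scrolling read")
            return query
        
        
        term_query = {"query": {"term": {"user": "kimchy"}}}
        agg_query = {"aggs": {"by_user": {"terms": {"field": "user"}}}}
        
        print(contains_aggregations(term_query))
        print(contains_aggregations(agg_query))
        ```
        
        A filter-only query such as `term_query` passes the check and can be forwarded as the `query` argument shown above.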
        
        Read the documentation [here](https://dask-elk.readthedocs.io).
Keywords: elasticsearch dask parallel
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Requires-Python: >=2.7
Description-Content-Type: text/markdown
