Metadata-Version: 2.1
Name: servicex
Version: 1.0.0b2
Summary: Front-end for the ServiceX Data Server
Home-page: https://github.com/iris-hep/func_adl_xAOD
Author: G. Watts (IRIS-HEP/UW Seattle)
Author-email: gwatts@uw.edu
Maintainer: Gordon Watts (IRIS-HEP/UW Seattle)
Maintainer-email: gwatts@uw.edu
License: TBD
Description: # ServiceX_frontend
        Client access library for ServiceX
        
        [![GitHub Actions Status](https://github.com/ssl-hep/ServiceX_frontend/workflows/CI/CD/badge.svg)](https://github.com/ssl-hep/ServiceX_frontend/actions)
        [![Code Coverage](https://codecov.io/gh/ssl-hep/ServiceX_frontend/graph/badge.svg)](https://codecov.io/gh/ssl-hep/ServiceX_frontend)
        
        [![PyPI version](https://badge.fury.io/py/servicex.svg)](https://badge.fury.io/py/servicex)
        [![Supported Python versions](https://img.shields.io/pypi/pyversions/servicex.svg)](https://pypi.org/project/servicex/)
        
        # Introduction
        
        Given a selection string, this library manages submitting it to a ServiceX instance and retrieving the data locally for you.
        The selection string is often generated by another front-end library, for example:
        
        - func_adl.xAOD (for ATLAS xAODs)
        - func_adl.XXX (for flat ntuples)
        - xxx for columns
        
        These libraries are still under development, so this list is only an outline.
        
        # Prerequisites
        
        Before you install this library you'll need:
        
        - An environment based on Python 3.7 or later
        - A ServiceX end-point. For example, `http://localhost:5000/servicex`.
        
        # Usage
        
        The following lines will return a `pandas.DataFrame` containing all the jet pT's from an ATLAS xAOD file containing Z->ee Monte Carlo:
        
        ```
        import sys
        import servicex
        # The ServiceX end-point is read from the command line, e.g. http://localhost:5000/servicex
        endpoint = sys.argv[1]
        query = "(call ResultTTree (call Select (call SelectMany (call EventDataset (list 'localds:bogus')) (lambda (list e) (call (attr e 'Jets') 'AntiKt4EMTopoJets'))) (lambda (list j) (/ (call (attr j 'pt')) 1000.0))) (list 'JetPt') 'analysis' 'junk.root')"
        dataset = "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00"
        r = servicex.get_data(query, dataset, servicex_endpoint=endpoint)
        print(r)
        ```
        Running the above script in a terminal window produces the following output (it takes about 1-2 minutes to complete):
        ```
        python scripts\run_test.py http://localhost:5000/servicex
                    JetPt
        entry
        0       38.065707
        1       31.967096
        2        7.881337
        3        6.669581
        4        5.624053
        ...           ...
        710183  42.926141
        710184  30.815709
        710185   6.348002
        710186   5.472711
        710187   5.212714
        
        [11355980 rows x 1 columns]
        ```
        
        If your query is badly formed or there is another problem with the backend, an exception will be raised.
        
        If you'd like to submit multiple queries and have them run on the ServiceX back end in parallel, use the `asyncio` interface: it is called `get_data_async` and has the identical signature to `get_data`.
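        
        As a rough illustration of the parallel case, the sketch below submits several queries at once with `asyncio.gather`. The helper `run_queries` and its `fetch` parameter are hypothetical names introduced here and are not part of this library; `fetch` is assumed to be `servicex.get_data_async` (or any coroutine function accepting the same arguments as `get_data`).
        
        ```python
        import asyncio
        from typing import Any, Awaitable, Callable, List
        
        
        async def run_queries(
            fetch: Callable[..., Awaitable[Any]],
            queries: List[str],
            dataset: str,
            endpoint: str,
        ) -> List[Any]:
            """Submit every query at once and wait for all of the results.
        
            `run_queries` and `fetch` are illustrative names, not library API.
            """
            # Each call creates a coroutine; gather() runs them concurrently
            # and returns the results in the same order as `queries`.
            return await asyncio.gather(
                *(fetch(q, dataset, servicex_endpoint=endpoint) for q in queries)
            )
        
        
        # Intended use (assumes a reachable ServiceX end-point):
        #   import servicex
        #   frames = asyncio.run(
        #       run_queries(servicex.get_data_async, [query_1, query_2], dataset, endpoint)
        #   )
        ```
        
        Passing the fetch function in as an argument keeps the sketch runnable without a live ServiceX instance. Note that `asyncio.run` requires Python 3.7 or later.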
        
        # Features
        
        Implemented:
        
        - Accepts a `qastle` formatted query
        - Exceptions are used to report back errors of all sorts from the service to the user's code.
        - Data is returned as a `pandas.DataFrame` or an `awkward` array (see the `data_type` parameter)
        - Complete returned data must fit in the process' memory
        - Runs in an async or a non-async environment; the non-async methods accommodate either automatically (including `jupyter` notebooks).
        - Supports up to 100 simultaneous queries from a laptop-like front end without overwhelming the local machine (hopefully ServiceX will be overwhelmed!)
        - Starts downloading files as soon as they are ready (before ServiceX is done with the complete transform).
        
        Coming:
        
        - Data is returned as a list of ROOT files located in a specified directory
        - Make it easy to submit the same query for 100 different datasets
        
        # Testing
        
        This code has been tested in several environments:
        
        - Windows, Linux, macOS
        - Python 3.6, 3.7, 3.8
           - 3.8.0 and 3.8.1 only. Unfortunately, 3.8.2 has caused `nest_asyncio` to fail. Until that package is updated we are stuck at 3.8.1.
        - Jupyter Notebooks (not automated) and regular Python source files invoked from the command line
        
        # Development
        
        For any changes please feel free to submit pull requests!
        
        To do development, please set up your environment with the following steps:
        
        1. Create a Python 3.7 development environment
        1. Pull down this package, XX
        1. `python -m pip install -e .[test]`
        1. Run the tests to make sure everything is good: `pytest`.
        
        Then add tests as you develop. When you are done, submit a pull request with any required changes to the documentation and the online tests will run.
        
Platform: Any
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development
Classifier: Topic :: Utilities
Requires-Python: >=3.6, <=3.8.1
Description-Content-Type: text/markdown
Provides-Extra: test
