Metadata-Version: 2.1
Name: py-hive-iomete
Version: 2.1.2
Summary: Python interface to iomete (Hive)
Home-page: https://github.com/iomete/py-hive-iomete
Author: Vusal Dadalov
Author-email: vusal@iomete.com
License: Apache License, Version 2.0
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database :: Front-Ends
License-File: LICENSE
Requires-Dist: future
Requires-Dist: python-dateutil
Requires-Dist: thrift==0.13.0
Provides-Extra: sqlalchemy
Requires-Dist: sqlalchemy<=1.4.46,>=1.3.0; extra == "sqlalchemy"
Provides-Extra: test
Requires-Dist: mock>=1.0.0; extra == "test"
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: requests>=1.0.0; extra == "test"
Requires-Dist: sqlalchemy<=1.4.46,>=1.3.0; extra == "test"
Requires-Dist: thrift==0.13.0; extra == "test"
Provides-Extra: presto
Requires-Dist: requests>=1.0.0; extra == "presto"
Provides-Extra: trino
Requires-Dist: requests>=1.0.0; extra == "trino"
Provides-Extra: hive
Requires-Dist: sasl>=0.2.1; extra == "hive"
Requires-Dist: thrift>=0.10.0; extra == "hive"
Requires-Dist: thrift_sasl>=0.1.0; extra == "hive"
Provides-Extra: kerberos
Requires-Dist: requests_kerberos>=0.12.0; extra == "kerberos"

==============
py-hive-iomete
==============

py-hive-iomete is a collection of Python `DB-API <http://www.python.org/dev/peps/pep-0249/>`_ and
`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for
`iomete hive <http://hive.apache.org/>`_.

Usage
=====

DB-API
------
.. code-block:: python

    from pyhive import hive

    connection = hive.connect(
        host="<data_plane_host>",
        port=<data_plane_port>,
        scheme="http", # or "https"
        lakehouse="<lakehouse_cluster_name>",
        data_plane=None,  # or data_plane (namespace)
        database="default",
        username="<username>",
        password="<password>"
    )

    cursor = connection.cursor()
    cursor.execute("SELECT * FROM my_awesome_data LIMIT 10")

    print(cursor.fetchone())
    print(cursor.fetchall())

DB-API (asynchronous)
---------------------
.. code-block:: python

    from pyhive import hive
    from TCLIService.ttypes import TOperationState

    connection = hive.connect(
        host="<data_plane_host>",
        port=<data_plane_port>,
        scheme="http", # or "https"
        lakehouse="<lakehouse_cluster_name>",
        data_plane=None,  # or data_plane (namespace)
        database="default",
        username="<username>",
        password="<password>"
    )

    cursor = connection.cursor()

    cursor.execute("SELECT * FROM my_awesome_data LIMIT 10", async_=True)

    status = cursor.poll().operationState

    while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
        logs = cursor.fetch_logs()
        for message in logs:
            print(message)

        # If needed, an asynchronous query can be cancelled at any time with:
        # cursor.cancel()

        status = cursor.poll().operationState

    print(cursor.fetchall())

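The polling loop above spins without pausing and has no upper bound on how long it waits. A minimal
sketch of a bounded variant follows. It assumes Python 3 and relies only on the ``poll()``,
``fetch_logs()`` and ``cancel()`` cursor methods shown above; the helper name ``wait_for_query``
and its ``poll_interval`` and ``timeout`` parameters are hypothetical and not part of py-hive-iomete.

.. code-block:: python

    import time


    def wait_for_query(cursor, running_states, poll_interval=1.0, timeout=None):
        """Poll an asynchronous query until it leaves ``running_states``.

        Hypothetical helper, not part of py-hive-iomete. Returns the final
        operation state, or cancels the query and raises TimeoutError once
        ``timeout`` seconds have elapsed.
        """
        deadline = None if timeout is None else time.monotonic() + timeout
        status = cursor.poll().operationState

        while status in running_states:
            # Print any log lines produced since the previous poll.
            for message in cursor.fetch_logs():
                print(message)

            if deadline is not None and time.monotonic() >= deadline:
                cursor.cancel()
                raise TimeoutError("query did not finish within %s seconds" % timeout)

            # Sleep between polls so the loop does not hammer the server.
            time.sleep(poll_interval)
            status = cursor.poll().operationState

        return status

With the cursor from the example above, this could be called as
``wait_for_query(cursor, (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE), timeout=300)``
before ``cursor.fetchall()``.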

SQLAlchemy
----------
First install this package to register it with SQLAlchemy (see ``setup.py``).

.. code-block:: python

    from sqlalchemy.engine import create_engine
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.schema import MetaData, Table

    # Possible dialects (hive and iomete operate identically):
    # hive+http
    # hive+https
    # iomete+http
    # iomete+https
    engine = create_engine(
        'iomete+https://<username>:<password>@<data_plane_host>:<data_plane_port>/<database>?lakehouse=<lakehouse_cluster_name>')

    # or with data_plane specified
    # engine = create_engine(
    #    'iomete+https://<username>:<password>@<data_plane_host>:<data_plane_port>/<database>?lakehouse=<lakehouse_cluster_name>&data_plane=<data_plane>')

    # Alternatively, the "hive" driver can be used
    # engine = create_engine(
    #    'hive+https://<username>:<password>@<data_plane_host>:<data_plane_port>/<database>?lakehouse=<lakehouse_cluster_name>')

    session = sessionmaker(bind=engine)()
    records = session.query(Table('my_awesome_data', MetaData(bind=engine), autoload=True)) \
        .limit(10) \
        .all()
    print(records)

Note: query generation functionality is not exhaustive or fully tested, but there should be no
problem with raw SQL.

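Usernames and passwords that contain characters such as ``@`` or ``/`` must be URL-encoded before
they are placed in the connection string. A minimal sketch of assembling the URL shown above
follows, assuming Python 3. The helper name ``build_iomete_url`` and its parameters are
hypothetical and not part of py-hive-iomete; only the URL layout is taken from the example above.

.. code-block:: python

    from urllib.parse import quote_plus, urlencode


    def build_iomete_url(username, password, host, port, database,
                         lakehouse, data_plane=None, scheme="https"):
        """Assemble an ``iomete+http(s)`` SQLAlchemy URL (hypothetical helper)."""
        query = {"lakehouse": lakehouse}
        if data_plane is not None:
            query["data_plane"] = data_plane

        return "iomete+{scheme}://{user}:{password}@{host}:{port}/{database}?{query}".format(
            scheme=scheme,
            # Escape reserved characters in the credentials.
            user=quote_plus(username),
            password=quote_plus(password),
            host=host,
            port=port,
            database=database,
            query=urlencode(query),
        )

The result can be passed directly to ``create_engine``, for example
``create_engine(build_iomete_url("<username>", "<password>", "<data_plane_host>", 443, "default", "<lakehouse_cluster_name>"))``.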

Requirements
============

Install using

- ``pip install 'py-hive-iomete'`` for the DB-API interface
- ``pip install 'py-hive-iomete[sqlalchemy]'`` for the SQLAlchemy interface

py-hive-iomete works with

- Python 2.7 / Python 3

Changelog
=========
See https://github.com/iomete/py-hive-iomete/releases.

Contributing
============
- Changes must come with tests, with the exception of trivial things like fixing comments. See .travis.yml for the test environment setup.
- Notes on project scope:

  - This project is intended to be a minimal iomete (hive) client that does one thing and nothing else.
    Features that can be implemented on top of py-hive-iomete, such as integration with your favorite data analysis library, are likely out of scope.
  - We prefer having a small number of generic features over a large number of specialized, inflexible features.

Updating TCLIService
====================

The TCLIService module is autogenerated from a ``TCLIService.thrift`` file. To update it, run
``python generate.py <TCLIServiceURL>``. When the URL is omitted, the version for Hive 2.3 is
downloaded.
