Metadata-Version: 2.1
Name: hivejdbc
Version: 0.2.3
Summary: Hive database driver via jdbc
Home-page: https://github.com/OpenBigDataPlatform/hivejdbc
Author: OpenBigDataPlatform
License: Apache-2.0
Keywords: dbapi jdbc hive hadoop
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Java
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Java Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Requires-Dist: pyjdbc (==0.2.2)

# hivejdbc
hivejdbc is `db-api-2.0` compliant **Apache Hive** driver that supports
- kerberos
- ssl
- service discovery via zookeeper
- host-connection list
- and all other jdbc driver options

# installation
```properties
pip3 install hivejdbc
```

## Cursors
`hivejdbc` can use a `dictonary` cursor if desired.

```python
from hivejdbc import connect, DictCursor
conn = connect('example.com', 'default', cursor=DictCursor)

```

### Cursors support `with`
```python
from hivejdbc import connect
conn = connect('example.com', database='default')
with conn.cursor() as cursor:
    cursor.execute('select * from test.persons')
    rows = cursor.fetchall()
```

### Cursors are iterable
```python
from hivejdbc import connect
conn = connect('example.com', database='default')
cursor = conn.cursor()
cursor.execute('select * from test.persons')
for row in cursor:
    print(row[0])
cursor.close()
```

### Cursors Support
- `fetchone()`
- `fetchmany()`
- `fetchall()`

```python
from hivejdbc import connect
conn = connect('example.com', database='default')
cursor = conn.cursor()
cursor.execute('select * from test.persons')
cursor.fetchone() # fetch first row or None
cursor.fetchmany(5) # fetch next 5 rows
cursor.fetchall() # fetch remaining rows or empty list
cursor.close()
```

## Connection Strings
`hivejdbc` features many `connect` function arguments. Many of these arguments can be ignored 
and are simply present to offer the full options provided by the **Hive** jdbc driver.

To import the `hivejdbc` connect function:
```python
from hivejdbc import connect
```

### Unsecured Hive Instance
to connect to an unsecured hive instance listening on the default port `10000`, and the `default` database:
```python
conn = connect('example.com', 'default')
```

unless all required `hive-jars` are on the classpath already you'll need to define the driver path  
Java uses `jar` files to combine many libraries into one. We'll use our `fatjar` to provide all the required 
dependencies in one place.  
Make sure you're using the correct driver for your **Hive** version.
```python
conn = connect('example.com', 'default', driver='hive-client-hive-2.1.1-hdfs-3.0.3-fatjar.jar')
```

to connect with a custom port of `10015`
```python
conn = connect('example.com', 'default', port=10015)
```

### Username and Password
```python
conn = connect(host='example.com', 
               database='default', 
               port=10015, 
               user='hive_user', 
               password='secret')
```

### SSL
If the hive-server has `ssl` enabled you'll need to provide a `jks` trust store that contains the servers public 
certificate.
```python
conn = connect(host='hive2.example.com',
               port=10015,
               database='default',
               driver='hive-client-hive-2.1.1-hdfs-3.0.3-fatjar.jar',
               ssl=True,
               trust_store='./truststore.jks',
               trust_password='changeit',
               principal='hive/hive2.example.com@EXAMPLE.COM',
               user_principal='hive/hive2.example.com',
               user_keytab='hive.keytab',
               realm='EXAMPLE.COM',
               kdc='kerberosdc.example.com:88')
```

### Kerberos
Authenticating with kerberos can be done a few ways:
1. get valid kerberos credentials via `kinit` before running `hivejdbc`
1. rely on `hivejdbc` to obtain kerberos credentials via a `user-principal` and `user-keytab` provided 
   to the program.


#### Operating System `kinit`
Connect to...
- a `ssl` enabled cluster
- a secured cluster (`kerberos`)
- using the operating systems kerberos configuration
  default locations are searched depending on platform
- using the operating system `kinit` token  
  default locations for the `token-cache` are searched
- if `kinit` has not been performed, or a `token-cache` cannot be found an exception will be thrown
```python
conn = connect(host='hive2.example.com',
               port=10015,
               database='default',
               driver='hive-client-hive-2.1.1-hdfs-3.0.3-fatjar.jar',
               ssl=True,
               trust_store='./truststore.jks',
               trust_password='changeit',
               principal='hive/hive2.example.com@EXAMPLE.COM')
```

#### `hivejdbc` does the `kinit` via `keytab` and a custom `krb5.conf`
connect to... 
- a `ssl` enabled cluster
- a secured cluster (`kerberos`)
- using the operating systems kerberos configuration krb5.conf
- using a `keytab` for authentication  
  the keytab will be used to login via java's built-in kerberos implementation
  avoiding the need for any operating system dependency
- we will provide the `kdc` and `realm` via the `krb5_conf` argument
  if we didn't provide `krb5_conf` argument default locations would be searched within various system paths
```python
conn = connect(host='hive2.example.com',
               port=10015,
               database='default',
               driver='hive-client-hive-2.1.1-hdfs-3.0.3-fatjar.jar',
               ssl=True,
               trust_store='./truststore.jks',
               trust_password='changeit',
               principal='hive/hive2.example.com@EXAMPLE.COM',
               krb5_conf='kerberos/custom_krb5.conf',
               user_principal='hive/hive2.example.com',
               user_keytab='user.keytab')
```



#### `hivejdbc` does the `kinit` via `keytab` with no `krb5.conf`
connect to...
- an ssl enabled cluster
- a secured cluster (kerberos)
- not using the operating system or relying on any of its configurations
- manually setting the realm, and the kerberos "kdc" to authenticate to
- using a keytab for authentication
- this configuration is the most portable...
```python
conn = connect(host='hive2.example.com',
               port=10015,
               database='default',
               driver='hive-client-hive-2.1.1-hdfs-3.0.3-fatjar.jar',
               ssl=True,
               trust_store='./truststore.jks',
               trust_password='changeit',
               principal='hive/hive2.example.com@EXAMPLE.COM',
               user_principal='hive/hive2.example.com',
               user_keytab='hive.keytab',
               realm='EXAMPLE.COM',
               kdc='kerberosdc.example.com:88')
```

## Queries and Parameters

For these examples we'll setup a `test` database with a `persons` table...

```python
cursor = conn.cursor()
cursor.execute('CREATE DATABASE IF NOT EXISTS test')
cursor.execute('DROP TABLE IF EXISTS test.persons')
cursor.execute('CREATE TABLE test.persons (name VARCHAR(64), age INT, address STRING, '
               'first TIMESTAMP, balance DECIMAL(12,2))')
```

Our table sql will have 5 columns defined in the above statement:
```sql
CREATE TABLE test.persons (
    name VARCHAR(64), 
    age INT, 
    address STRING,
    first TIMESTAMP, 
    balance DECIMAL(12,2)
)
```

### single insert
Let's insert a single record:
```python
cursor.execute('''    
INSERT INTO TABLE test.persons (name, age, address, first, balance)
VALUES ('john doe', 35, '1583 Whistling Pines Dr, Redstone CO 80612', '08-22-1981 00:00:00', '100.10')
''')
```

### `positional` parameterized sql query
Insert a single record, using paramterized arguments that will automatically be escaped.    
This prevents sql injection as well

```python
cursor.execute('''    
INSERT INTO TABLE test.persons (name, age, address, first, balance)
VALUES (%s, %s, %s, %s, %s)
''', ['Kevin Jones', 28, '802 1st st, Raleigh NC', '12-23-2020 00:00:00', 85.25])
```

The signature of `execute` is:
```python
def execute(sql, params=None):
    ""
```
- **sql** is the sql statement
- **params** are `named (dict)` or `positional (sequence)` arguments used by the sql statement for variable 
  substitution

### `named` parameterized sql query
**INSERT** with named parameters 

In addition to positional parameters using `%s` we support `named parameters` as well.

You can see the named arguments are defined below in the `sql` statement as: `(:name, :age, :addr, :dt, :bal)`  

The second parameter to the `execute` method is a `dictionary` where the keys are equal to the parameters defined in the sql
```python
cursor.execute('''
INSERT INTO TABLE test.persons (name, age, address, first, balance)
VALUES (:name, :age, :addr, :dt, :bal)
''', {'name': 'Bob Clark',
      'age': 41,
      'addr': '348 W Dickinson Rd, Norfolk VA',
      'dt': '12-23-2020 00:00:00',
      'bal': 200.20})
```

### Using `executemany`
You can execute many queries in one python statement using `executemany`  

Note that this is for programmer ease of use; hive's `jdbc` driver does not support `batch-mode`, so this functionality is faked and is no more 
efficient than executing 3 statements individually.
```
cursor.executemany('''
INSERT INTO TABLE test.persons (name, age, address, first, balance)
VALUES (%s, %s, %s, %s, %s)
''', [
    ('john doe', 35, '1583 Whistling Pines Dr, Redstone CO 80612', '08-22-1981 00:00:00', 100.10),
    ('Kevin Jones', 28, '802 1st st, Raleigh NC', '12-23-2020 00:00:00', 85.25),
    ('Bob Clark', 41, '348 W Dickinson Rd, Norfolk VA', '12-23-2020 00:00:00', 200.20)
])
```


