Metadata-Version: 2.1
Name: pygbq
Version: 0.25
Summary: Easily integrate data in BigQuery
Home-page: https://github.com/ZiggerZZ/pygbq
Author: Zigfrid Zvezdin
Author-email: ziggerzz@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: pandas-gbq
Requires-Dist: pytz
Requires-Dist: google-auth
Requires-Dist: google-cloud-bigquery
Requires-Dist: google-cloud-logging
Requires-Dist: google-cloud-secret-manager
Requires-Dist: bigquery-schema-generator
Requires-Dist: tqdm
Requires-Dist: pyarrow

# PyGBQ

> Easily integrate data in BigQuery

## Example

```python
from pygbq import Client
import requests

client = Client()
token = client.get_secret('secret_name')
headers = {'Authorization': f'Bearer {token}'}
url = ...
data = requests.get(url, headers=headers).json()
response = client.update_table_using_temp(data=data, table_id='mydataset.mytable', how=['id'])
```

This snippets gets some data from an `url` and (merges)[https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement] ((upserts)[https://en.wikipedia.org/wiki/Merge_(SQL)]) on `id` column it to the table `mytable` in the dataset `mydataset`.


## Install and set up

`pip install pygbq`

Set up the [authentication](https://cloud.google.com/docs/authentication/getting-started).

## How it works

### how=['column1', 'column2', ...]

PyGBQ generates one or many temporary tables that are merged into the target table. During the merge *all* the columns of the target table are updated.
Here's how it looks like:
1. Split `data` into batches.
2. For every batch create `mydataset.mytable_tmp_SOMERANDOMPOSTFIX`, put it inside and run
```sql
MERGE myproject.mydataset.mytable T
USING myproject.mydataset.mytable_tmp_SOMERANDOMPOSTFIX S
ON T.column1 = S.column1 AND T.column2 = S.column2
WHEN NOT MATCHED THEN
	INSERT ROW
WHEN MATCHED THEN
	UPDATE SET
column1 = S.column1,
column2 = S.column2,
column3 = S.column3,
column4 = S.column4
...
```

### how='replace'

1. Creates a table `mydataset.mytable` with schema automatically generated by `bigquery-schema-generator`.
2. Splits data into batches and inserts it to `mydataset.mytable`.

### how='fail'

Identical to `how='replace'` except that it fails if `mydataset.mytable` exists. 

### how='insert'

Splits data into batches and inserts (appends) it to `mydataset.mytable`.


For more details look at [Documentation](#documentation) section.


## Documentation

Here's the documentation with default parameters.


### Client

#### init

```python
from pygbq import Client
client = Client(default_dataset=None, path_to_key=None)
```

Initalizes a client. You can specify:

* `default_dataset` - (str) default dataset that the `client` will be using to reference tables
* `path_to_key` - (str) By default PyGQB uses `from google.auth import default` to get credentials, but you can specify this parameter if you wish to use `from google.auth import load_credentials_from_file` instead.

#### update_table_using_temp

```python
client.update_table_using_temp(data, table_id, how, schema: Union[str, List[dict]] = None, expiration=1, max_insert_num_rows=4000)
```

Updates table.

* `data` - list of dict
* `table_id` - (str) Table id, could have one of the following forms:
	* `table_name` if `default_dataset` is set
	* `dataset_name.table_name`
	* `project_id.dataset_name.table_name`
* `how` - (str or List[dict]) Look at [How it works](#how-it-works) section
* `expiration` - (float) temporary tables expiration time in hours
* `max_insert_num_rows` - (int) how many rows per temporary table is inserted

#### get_secret

```python
client.get_secret(self, secret_id, version="latest")
```

Get a secret stored in Secret Manager.

* `secret_id` - (str) Secret name
* `version` - Secret version

#### add_secret

```python
client.get_secret(self, secret_id, version="latest")
```

Adds a new secret version in Secret Manager.

* `secret_id` - (str) Secret name
* `data` - (str) Secret value

### read_jsonl

```python
from pygbq import read_jsonl
read_jsonl(name: str = "data.jsonl")
```

Reads a new line delimited json.

