Metadata-Version: 2.4
Name: bqdf
Version: 0.0.5
Summary: Ergonomic wrapper for pandas_gbq that simplifies loading BigQuery data into DataFrames
Home-page: https://github.com/motdam/bqdf
Author: motdam
Author-email: tmad@hotmail.co.uk
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: pandas
Requires-Dist: google-cloud-bigquery
Requires-Dist: google-cloud-bigquery-storage
Requires-Dist: pandas-gbq
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# bqdf


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Usage

### Installation

Install latest from the GitHub
[repository](https://github.com/motdam/bqdf):

``` sh
$ pip install git+https://github.com/motdam/bqdf.git
```

or from [conda](https://anaconda.org/motdam/bqdf)

``` sh
$ conda install -c motdam bqdf
```

or from [pypi](https://pypi.org/project/bqdf/)

``` sh
$ pip install bqdf
```

### Documentation

Documentation can be found hosted on this GitHub
[repository](https://github.com/motdam/bqdf)’s
[pages](https://motdam.github.io/bqdf/). Additionally you can find
package manager specific guidelines on
[conda](https://anaconda.org/motdam/bqdf) and
[pypi](https://pypi.org/project/bqdf/) respectively.

## How to use

This lib provides convenience functions for streamlining the interface
of the pandas-gbq library to perform CRUD operations in BigQuery more
quickly

``` python
import pandas_gbq
import pandas as pd
```

``` python
top_terms_query = """
-- todays top 10 search terms in England
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms` 
WHERE country_name = 'United Kingdom'
  and refresh_date = current_date - 1
  and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""
```

### Reading a BigQuery table

``` python
df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()
```

    Downloading:   0%|          |Downloading: 100%|██████████|
    Loaded 5 rows × 7 cols (0.0000 GB) from query in 0.72s
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 7 columns):
     #   Column        Non-Null Count  Dtype         
    ---  ------        --------------  -----         
     0   refresh_date  5 non-null      datetime64[ns]
     1   rank          5 non-null      Int64         
     2   term          5 non-null      object        
     3   score         5 non-null      Int64         
     4   percent_gain  5 non-null      Float64       
     5   country_name  5 non-null      object        
     6   week          5 non-null      dbdate        
    dtypes: Float64(1), Int64(2), datetime64[ns](1), dbdate(1), object(2)
    memory usage: 427.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | refresh_date | rank | term | score | percent_gain | country_name | week |
|----|----|----|----|----|----|----|----|
| 0 | 2025-11-27 | 1 | liverpool vs psv | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-27 | 2 | psg vs tottenham | 70 | 32.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-27 | 3 | richard branson | 100 | 32.0 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-27 | 4 | arsenal vs bayern | 100 | 31.5 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-27 | 5 | rinky dink | 100 | 30.5 | United Kingdom | 2025-11-23 |

</div>

To recreate the above with the original library you would need the below
boiler plate to inspect the results and convert columns into pandas
friendly dtypes.

``` python
df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
    'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()
```

    Downloading:   0%|          |Downloading: 100%|██████████|
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 7 columns):
     #   Column        Non-Null Count  Dtype         
    ---  ------        --------------  -----         
     0   refresh_date  5 non-null      datetime64[ns]
     1   rank          5 non-null      Int64         
     2   term          5 non-null      object        
     3   score         5 non-null      Int64         
     4   percent_gain  5 non-null      Float64       
     5   country_name  5 non-null      object        
     6   week          5 non-null      datetime64[ns]
    dtypes: Float64(1), Int64(2), datetime64[ns](2), object(2)
    memory usage: 427.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | refresh_date | rank | term | score | percent_gain | country_name | week |
|----|----|----|----|----|----|----|----|
| 0 | 2025-11-27 | 1 | liverpool vs psv | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-27 | 2 | psg vs tottenham | 70 | 32.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-27 | 3 | richard branson | 100 | 32.0 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-27 | 4 | arsenal vs bayern | 100 | 31.5 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-27 | 5 | rinky dink | 100 | 30.5 | United Kingdom | 2025-11-23 |

</div>

### Writing a df to BigQuery

The rest [`to`](https://motdam.github.io/bqdf/core.html#to) function is
unchanged beyond removing the redundant \_gbq suffix. We can write our
df back into BigQuery using hte
[`to`](https://motdam.github.io/bqdf/core.html#to) function.

``` python
# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace')
```

      0%|          | 0/1 [00:00<?, ?it/s]100%|██████████| 1/1 [00:00<00:00, 5899.16it/s]

    Sent 5 rows × 7 cols (0.0000 GB) to bq-sandbox-motdam.temporary.top_10_eng_search_terms in 2.86s

### Executing SQL in BigQuery

The [`ex`](https://motdam.github.io/bqdf/core.html#ex) fucntion enables
non df based CRUD operations within the same api which can be useful for
creating feature processing pipelines.

``` python
project = 'bq-sandbox-motdam'

def create_top_terms(period, days):
    return f"""
    CREATE OR REPLACE TABLE `{project}.temporary.top_terms_{period}` AS
    WITH ranked AS (
      SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
      FROM `bigquery-public-data.google_trends.international_top_rising_terms`
      WHERE country_name = 'United Kingdom'
        AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
        AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
        AND rank <= 100
      GROUP BY region_name, term
    )
    SELECT region_name, term as top_term_{period}
    FROM ranked WHERE rn = 1
    """

ex(create_top_terms('today', 1), project_id=project)
ex(create_top_terms('week', 8), project_id=project)
ex(create_top_terms('month', 31), project_id=project)
ex(create_top_terms('year', 366), project_id=project)

final_query = f"""
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `{project}.temporary.top_terms_today` as t
JOIN `{project}.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `{project}.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `{project}.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name
"""

read(final_query, project_id=project)
```

    Processed 0.3874 GB, 0 rows affected in 2.13s
    Processed 3.0023 GB, 0 rows affected in 2.21s
    Processed 11.6263 GB, 0 rows affected in 2.48s
    Processed 12.1607 GB, 0 rows affected in 2.22s
    Downloading:   0%|          |Downloading: 100%|██████████|
    Loaded 4 rows × 5 cols (0.0000 GB) from query in 0.74s
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 5 columns):
     #   Column          Non-Null Count  Dtype 
    ---  ------          --------------  ----- 
     0   region_name     4 non-null      object
     1   top_term_today  4 non-null      object
     2   top_term_week   4 non-null      object
     3   top_term_month  4 non-null      object
     4   top_term_year   4 non-null      object
    dtypes: object(5)
    memory usage: 292.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | region_name | top_term_today | top_term_week | top_term_month | top_term_year |
|----|----|----|----|----|----|
| 0 | England | liverpool vs psv | man united vs everton | ftse 100 | ftse 100 |
| 1 | Northern Ireland | liverpool vs psv | man united vs everton | ftse 100 | ftse 100 |
| 2 | Scotland | liverpool vs psv | man united vs everton | ftse 100 | ftse 100 |
| 3 | Wales | liverpool vs psv | man united vs everton | ftse 100 | ftse 100 |

</div>

## Developer Guide

If you are new to using `nbdev` here are some useful pointers to get you
started.

### Install bqdf in Development mode

``` sh
# make sure bqdf package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to bqdf
$ nbdev_prepare
```
