Metadata-Version: 2.4
Name: bqdf
Version: 0.0.3
Summary: Ergonomic wrapper for pandas_gbq that simplifies loading BigQuery data into DataFrames
Home-page: https://github.com/motdam/bqdf
Author: motdam
Author-email: tmad@hotmail.co.uk
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: pandas
Requires-Dist: google-cloud-bigquery
Requires-Dist: google-cloud-bigquery-storage
Requires-Dist: pandas-gbq
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# bqdf


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Usage

### Installation

Install latest from the GitHub
[repository](https://github.com/motdam/bqdf):

``` sh
$ pip install git+https://github.com/motdam/bqdf.git
```

or from [conda](https://anaconda.org/motdam/bqdf)

``` sh
$ conda install -c motdam bqdf
```

or from [pypi](https://pypi.org/project/bqdf/)

``` sh
$ pip install bqdf
```

### Documentation

Documentation can be found hosted on this GitHub
[repository](https://github.com/motdam/bqdf)’s
[pages](https://motdam.github.io/bqdf/). Additionally you can find
package manager specific guidelines on
[conda](https://anaconda.org/motdam/bqdf) and
[pypi](https://pypi.org/project/bqdf/) respectively.

## How to use

This lib provides convenience functions for streamlining the interface
of the pandas-gbq library to perform CRUD operations in BigQuery more
quickly

``` python
import pandas_gbq
import pandas as pd
```

``` python
top_terms_query = """
-- todays top 10 search terms in England
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms` 
WHERE country_name = 'United Kingdom'
  and refresh_date = current_date - 1
  and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""
```

### Reading a BigQuery table

``` python
df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()
```

    Downloading:   0%|          |Downloading: 100%|██████████|
    Loaded 5 rows × 7 cols (0.0000 GB) from query in 1.31s
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 7 columns):
     #   Column        Non-Null Count  Dtype         
    ---  ------        --------------  -----         
     0   refresh_date  5 non-null      datetime64[ns]
     1   rank          5 non-null      Int64         
     2   term          5 non-null      object        
     3   score         5 non-null      Int64         
     4   percent_gain  5 non-null      Float64       
     5   country_name  5 non-null      object        
     6   week          5 non-null      dbdate        
    dtypes: Float64(1), Int64(2), datetime64[ns](1), dbdate(1), object(2)
    memory usage: 427.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | refresh_date | rank | term | score | percent_gain | country_name | week |
|----|----|----|----|----|----|----|----|
| 0 | 2025-11-24 | 1 | liverpool vs nottm forest | 15 | 86.0 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-24 | 2 | leeds united vs aston villa | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-24 | 3 | arsenal vs tottenham | 100 | 62.0 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-24 | 4 | newcastle vs man city | 26 | 51.0 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-24 | 5 | chayote | 9 | 35.0 | United Kingdom | 2025-11-23 |

</div>

To recreate the above with the original library you would need the below
boiler plate to inspect the results and convert columns into pandas
friendly dtypes.

``` python
df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
    'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()
```

    Downloading:   0%|          |Downloading: 100%|██████████|
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 7 columns):
     #   Column        Non-Null Count  Dtype         
    ---  ------        --------------  -----         
     0   refresh_date  5 non-null      datetime64[ns]
     1   rank          5 non-null      Int64         
     2   term          5 non-null      object        
     3   score         5 non-null      Int64         
     4   percent_gain  5 non-null      Float64       
     5   country_name  5 non-null      object        
     6   week          5 non-null      datetime64[ns]
    dtypes: Float64(1), Int64(2), datetime64[ns](2), object(2)
    memory usage: 427.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | refresh_date | rank | term | score | percent_gain | country_name | week |
|----|----|----|----|----|----|----|----|
| 0 | 2025-11-24 | 1 | liverpool vs nottm forest | 15 | 86.0 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-24 | 2 | leeds united vs aston villa | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-24 | 3 | arsenal vs tottenham | 100 | 62.0 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-24 | 4 | newcastle vs man city | 26 | 51.0 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-24 | 5 | chayote | 9 | 35.0 | United Kingdom | 2025-11-23 |

</div>

### Writing a df to BigQuery

The rest [`to`](https://motdam.github.io/bqdf/core.html#to) function is
unchanged beyond removing the redundant \_gbq suffix. We can write our
df back into BigQuery using hte
[`to`](https://motdam.github.io/bqdf/core.html#to) function.

``` python
# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace')
```

      0%|          | 0/1 [00:00<?, ?it/s]100%|██████████| 1/1 [00:00<00:00, 9198.04it/s]

    Sent 5 rows × 7 cols (0.0000 GB) to bq-sandbox-motdam.temporary.top_10_eng_search_terms in 3.53s

### Executing SQL in BigQuery

The [`ex`](https://motdam.github.io/bqdf/core.html#ex) fucntion enables
non df based CRUD operations within the same api which can be useful for
creating feature processing pipelines.

``` python
project = 'bq-sandbox-motdam'

def create_top_terms(period, days):
    return f"""
    CREATE OR REPLACE TABLE `{project}.temporary.top_terms_{period}` AS
    WITH ranked AS (
      SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
      FROM `bigquery-public-data.google_trends.international_top_rising_terms`
      WHERE country_name = 'United Kingdom'
        AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
        AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
        AND rank <= 100
      GROUP BY region_name, term
    )
    SELECT region_name, term as top_term_{period}
    FROM ranked WHERE rn = 1
    """

ex(create_top_terms('today', 1), project_id=project)
ex(create_top_terms('week', 8), project_id=project)
ex(create_top_terms('month', 31), project_id=project)
ex(create_top_terms('year', 366), project_id=project)

final_query = f"""
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `{project}.temporary.top_terms_today` as t
JOIN `{project}.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `{project}.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `{project}.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name
"""

read(final_query, project_id=project)
```

    Processed 0.3883 GB, 0 rows affected in 2.21s
    Processed 2.9971 GB, 0 rows affected in 2.35s
    Processed 11.6400 GB, 0 rows affected in 2.17s
    Processed 12.1727 GB, 0 rows affected in 2.54s
    Downloading:   0%|          |Downloading: 100%|██████████|
    Loaded 4 rows × 5 cols (0.0000 GB) from query in 0.63s
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 5 columns):
     #   Column          Non-Null Count  Dtype 
    ---  ------          --------------  ----- 
     0   region_name     4 non-null      object
     1   top_term_today  4 non-null      object
     2   top_term_week   4 non-null      object
     3   top_term_month  4 non-null      object
     4   top_term_year   4 non-null      object
    dtypes: object(5)
    memory usage: 292.0+ bytes
    None

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | region_name | top_term_today | top_term_week | top_term_month | top_term_year |
|----|----|----|----|----|----|
| 0 | England | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
| 1 | Northern Ireland | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
| 2 | Scotland | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
| 3 | Wales | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |

</div>

British search history in a nutshell: ‘Is it raining?’ followed
immediately by ‘Can I afford to move somewhere sunny?’

## Developer Guide

If you are new to using `nbdev` here are some useful pointers to get you
started.

### Install bqdf in Development mode

``` sh
# make sure bqdf package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to bqdf
$ nbdev_prepare
```
